# I Initialize

This notebook walks through wrapping the second step of the ABC Enhancer Gene Prediction model, `run.neighborhoods.py`, as a CWL tool.

[broadinstitute/ABC-Enhancer-Gene-Prediction: Cell type specific enhancer-gene predictions using ABC model (Fulco, Nasser et al, Nature Genetics 2019)](https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction)  

From the repo:

Quantifying Enhancer Activity: 

```run.neighborhoods.py``` will count DNase-seq (or ATAC-seq) and H3K27ac ChIP-seq reads in candidate enhancer regions. It also makes GeneList.txt, which counts reads in gene bodies and promoter regions.

Replicate epigenetic experiments should be included as comma delimited list of files. Read counts in replicate experiments will be averaged when computing enhancer Activity.
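
As a rough illustration of the averaging described above, the sketch below averages per-region read counts across two replicates. The function name `average_replicates` and the counts are hypothetical, and this is not the code used inside `run.neighborhoods.py`; it only shows the arithmetic under the assumption of a plain mean per region.

```python
from statistics import mean


def average_replicates(replicate_counts):
    """Average per-region read counts across replicates.

    replicate_counts: one list of counts per replicate, all covering
    the same regions in the same order (hypothetical layout).
    """
    if not replicate_counts:
        raise ValueError("need at least one replicate")
    n_regions = len(replicate_counts[0])
    if any(len(rep) != n_regions for rep in replicate_counts):
        raise ValueError("replicates must cover the same regions")
    # zip(*...) groups the counts of each region across replicates
    return [mean(region) for region in zip(*replicate_counts)]


# Made-up counts for three candidate regions in two replicates
rep1 = [10, 0, 4]
rep2 = [14, 2, 4]
averaged = average_replicates([rep1, rep2])
print(averaged)
```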

Sample Command:

```
python src/run.neighborhoods.py \
--candidate_enhancer_regions example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.candidateRegions.bed \
--genes example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.chr22.bed \
--H3K27ac example_chr22/input_data/Chromatin/ENCFF384ZZM.chr22.bam \
--DHS example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam,example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep2.chr22.bam \
--expression_table example_chr22/input_data/Expression/K562.ENCFF934YBO.TPM.txt \
--chrom_sizes example_chr22/reference/chr22 \
--ubiquitously_expressed_genes reference/UbiquitouslyExpressedGenesHG19.txt \
--cellType K562 \
--outdir example_chr22/ABC_output/Neighborhoods/ 
```

Main output files:

  * **EnhancerList.txt**: Candidate enhancer regions with DNase-seq and H3K27ac ChIP-seq read counts
  * **GeneList.txt**: DNase-seq and H3K27ac ChIP-seq read counts on gene bodies and gene promoter regions


## Load packages

In [1]:
#from sevenbridges import Api, ImportExportState
import yaml
import time
import json
import importlib
import getpass
import sevenbridges

In [2]:
%load_ext yamlmagic

# II CWL Description

## Section 1/4 Tool Label and Documentation

In [3]:
%%yaml label_and_description

#### rarely changing header and boilerplate
cwlVersion: v1.2
class: CommandLineTool
$namespaces:
  sbg: https://sevenbridges.com
hints:
- class: sbg:SaveLogs
  value: '*.sh'  

#### Tool Label  
label: Run Neighborhood

#### Tool Description

doc: |-

  Quantifying Enhancer Activity: 

  ```run.neighborhoods.py``` will count DNase-seq (or ATAC-seq) and H3K27ac ChIP-seq reads in candidate enhancer regions. It also makes GeneList.txt, which counts reads in gene bodies and promoter regions.

  Replicate epigenetic experiments should be included as comma delimited list of files. Read counts in replicate experiments will be averaged when computing enhancer Activity.  

  Main output files:

    * **EnhancerList.txt**: Candidate enhancer regions with DNase-seq and H3K27ac ChIP-seq read counts
    * **GeneList.txt**: DNase-seq and H3K27ac ChIP-seq read counts on gene bodies and gene promoter regions



## Section 2/4 Tool Inputs

In [4]:
%%yaml inputs

inputs:
- id: candidate_enchancer_regions
  type: File
  sbg:fileTypes: BED

- id: genes
  type: File
  sbg:fileTypes: BED
  
- id: H3K27ac
  type: File
  doc: ENCFF384ZZM.chr22.bam
  sbg:fileTypes: BAM
  secondaryFiles:
  - pattern: .bai

- id: dhs
  type: File
  doc: wgEncodeUwDnaseK562AlnRep1.chr22.bam
  sbg:fileTypes: BAM
  secondaryFiles:
  - pattern: .bai

- id: expression_table
  type: File
  doc: K562.ENCFF934YBO.TPM.txt
  sbg:fileTypes: TXT

- id: chrom_sizes
  type: File

- id: ubiquitously_expressed_genes
  type: File
  doc: UbiquitouslyExpressedGenesHG19.txt
  sbg:fileTypes: TXT

- id: cell_type
  type: string



## Section 3/4 Scripts and Other Requirements

In [7]:
%%yaml base_command_and_requirements
baseCommand:
- bash
- run.neighborhoods.sh
requirements:
- class: ShellCommandRequirement
- class: InlineJavascriptRequirement
- class: DockerRequirement
  dockerPull: images.sb.biodatacatalyst.nhlbi.nih.gov/andrewblair/cardiac-compendium:2023042401
- class: InitialWorkDirRequirement
  listing:
  - entryname: run.neighborhoods.sh
    writable: false
    entry: |-
    
      python src/run.neighborhoods.py \
      --candidate_enhancer_regions $(inputs.candidate_enchancer_regions.path) \
      --genes $(inputs.genes.path) \
      --H3K27ac $(inputs.H3K27ac.path) \
      --DHS $(inputs.dhs.path) \
      --expression_table $(inputs.expression_table.path) \
      --chrom_sizes $(inputs.chrom_sizes.path) \
      --ubiquitously_expressed_genes $(inputs.ubiquitously_expressed_genes.path) \
      --cellType $(inputs.cell_type) \
      --outdir ./
     


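
Every `$(inputs.<id>...)` expression in the script has to name an input id declared in the inputs cell, spelled exactly the same way. A minimal sketch of such a check is shown here; `undeclared_inputs` is a hypothetical helper, and the script and id list are abbreviated copies of the cells in this notebook rather than the notebook's own variables.

```python
import re


def undeclared_inputs(script, declared_ids):
    """Return input ids referenced as $(inputs.<id>...) but never declared."""
    referenced = set(re.findall(r"\$\(inputs\.([A-Za-z0-9_]+)", script))
    return sorted(referenced - set(declared_ids))


# Abbreviated copy of the run.neighborhoods.sh entry
script = r"""
python src/run.neighborhoods.py \
--candidate_enhancer_regions $(inputs.candidate_enchancer_regions.path) \
--genes $(inputs.genes.path) \
--cellType $(inputs.cell_type) \
--outdir ./
"""

# Ids as declared in the inputs cell (abbreviated)
declared = ["candidate_enchancer_regions", "genes", "cell_type"]

missing = undeclared_inputs(script, declared)
print(missing)  # an empty list means every reference is declared
```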

## Section 4/4 Tool Outputs

In [6]:
%%yaml outputs
outputs:
- id: enchancer_list
  type: File
  doc: Candidate enhancer regions with DNase-seq and H3K27ac ChIP-seq read counts
  outputBinding:
    glob: '*EnhancerList.txt'
- id: counts
  type: File
  doc: DNase-seq and H3K27ac ChIP-seq read counts on gene bodies and gene promoter regions
  outputBinding:
    glob: '*GeneList.txt'


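
Because both outputs are typed `File`, each glob is expected to resolve to a single file in the output directory. The sketch below approximates the matching with Python's `fnmatchcase`, which is only a stand-in for the glob matching a CWL runner performs; the list of produced file names is hypothetical.

```python
from fnmatch import fnmatchcase

# Output ids and glob patterns copied from the outputs cell
patterns = {
    "enchancer_list": "*EnhancerList.txt",
    "counts": "*GeneList.txt",
}

# Hypothetical contents of the tool's output directory
produced = ["EnhancerList.txt", "GeneList.txt", "run.neighborhoods.sh"]

matches = {
    output_id: [name for name in produced if fnmatchcase(name, pattern)]
    for output_id, pattern in patterns.items()
}

for output_id, names in matches.items():
    # A File-typed output should end up with exactly one match
    status = "ok" if len(names) == 1 else "check glob"
    print(output_id, names, status)
```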

# III Test tool

In [10]:
with open('../cwl/step02_run_neighborhoods.tool.cwl', 'w') as f:
    # yaml.dump returns None when given a stream, so the result is not assigned
    yaml.dump(label_and_description | inputs | base_command_and_requirements | outputs, f, sort_keys=False, default_flow_style=False)
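
The `|` operator used above merges dictionaries and needs Python 3.9 or later. The sketch below shows its behavior on abbreviated stand-ins for the dictionaries created by the `%%yaml` cells: keys keep the order in which the sections are merged, and on a duplicate key the right-hand dictionary wins.

```python
# Abbreviated stand-ins for the dictionaries created by the %%yaml cells
label_and_description = {"cwlVersion": "v1.2", "class": "CommandLineTool"}
inputs = {"inputs": [{"id": "genes", "type": "File"}]}
outputs = {"outputs": [{"id": "counts", "type": "File"}]}

# Dict union (Python 3.9+): later operands override earlier ones on key clashes
tool = label_and_description | inputs | outputs
print(list(tool))

# Illustration of the override rule with a deliberately duplicated key
overridden = {"label": "old"} | {"label": "Run Neighborhood"}
print(overridden)
```

Since each section cell here defines distinct top-level keys, nothing is overridden, and `sort_keys=False` lets the dumped file keep this section order.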

In [9]:
%%bash
cwltool --tool-help ../step02_run_neighborhoods.tool.cwl

INFO /home/codespace/.python/current/bin/cwltool 3.1.20230325110543
INFO Resolved '../step02_run_neighborhoods.tool.cwl' to 'file:///workspaces/cwl-notebooks/abc_enchancer_gene_prediction/step02_run_neighborhoods.tool.cwl'
                                          `https://sevenbridges.comSaveLogs`
INFO ../step02_run_neighborhoods.tool.cwl:6:3: Unknown hint https://sevenbridges.comSaveLogs


usage: ../step02_run_neighborhoods.tool.cwl [-h] --candidate_enchancer_regions
                                            CANDIDATE_ENCHANCER_REGIONS
                                            --genes GENES --H3K27ac H3K27AC
                                            --dhs DHS --expression_table
                                            EXPRESSION_TABLE --chrom_sizes
                                            CHROM_SIZES
                                            --ubiquitously_expressed_genes
                                            UBIQUITOUSLY_EXPRESSED_GENES
                                            --cell_type CELL_TYPE
                                            [job_order]

Quantifying Enhancer Activity: ```run.neighborhoods.py``` will count DNase-seq
(or ATAC-seq) and H3K27ac ChIP-seq reads in candidate enhancer regions. It
also makes GeneList.txt, which counts reads in gene bodies and promoter
regions. Replicate epigenetic experiments should be included as comma
de

In [None]:
%%bash
cwltool ../step02_run_neighborhoods.tool.cwl \
  --candidate_enchancer_regions wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.candidateRegions.bed \
  --genes RefSeqCurated.170308.bed.CollapsedGeneBounds.chr22.bed \
  --H3K27ac ENCFF384ZZM.chr22.bam \
  --dhs wgEncodeUwDnaseK562AlnRep1.chr22.bam \
  --expression_table K562.ENCFF934YBO.TPM.txt \
  --chrom_sizes chr22 \
  --ubiquitously_expressed_genes UbiquitouslyExpressedGenesHG19.txt \
  --cell_type K562
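
The `[job_order]` argument in the usage message means the same inputs can be supplied as a job order document instead of command-line flags. A minimal sketch of building one in Python follows; it assumes the example files sit in the working directory, and `file_entry` is a hypothetical helper name.

```python
import json


def file_entry(path):
    """Build a CWL File object for a job order document."""
    return {"class": "File", "path": path}


# Keys are the input ids declared in the inputs cell
job = {
    "candidate_enchancer_regions": file_entry(
        "wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.candidateRegions.bed"
    ),
    "genes": file_entry("RefSeqCurated.170308.bed.CollapsedGeneBounds.chr22.bed"),
    "H3K27ac": file_entry("ENCFF384ZZM.chr22.bam"),
    "dhs": file_entry("wgEncodeUwDnaseK562AlnRep1.chr22.bam"),
    "expression_table": file_entry("K562.ENCFF934YBO.TPM.txt"),
    "chrom_sizes": file_entry("chr22"),
    "ubiquitously_expressed_genes": file_entry("UbiquitouslyExpressedGenesHG19.txt"),
    "cell_type": "K562",
}

job_text = json.dumps(job, indent=2)
print(job_text)
```

Saved to a file (for example a hypothetical `step02_job.json`), this text can be passed to `cwltool` as the positional job order argument after the tool path.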

# IV Push tool to platform

In [11]:
%%bash
sbpack bdc dave/abc-development-scratch-project/run-neighborhoods "../cwl/step02_run_neighborhoods.tool.cwl"


sbpack v2022.03.16
Upload CWL apps to any Seven Bridges powered platform
(c) Seven Bridges 2020



Packing ../cwl/step02_run_neighborhoods.tool.cwl


# V References

https://docs.sevenbridges.com/reference/add-an-app-using-raw-cwl