# I Initialize

This notebook walks through the third step of the ABC (Activity-by-Contact) Enhancer-Gene Prediction model: computing ABC scores.

[broadinstitute/ABC-Enhancer-Gene-Prediction: Cell type specific enhancer-gene predictions using ABC model (Fulco, Nasser et al, Nature Genetics 2019)](https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction)  

From the repo:

Computing the ABC Score

Compute ABC scores by combining Activity (as calculated by run.neighborhoods.py) and Hi-C.

```
python src/predict.py \
--enhancers example_chr22/ABC_output/Neighborhoods/EnhancerList.txt \
--genes example_chr22/ABC_output/Neighborhoods/GeneList.txt \
--HiCdir example_chr22/input_data/HiC/raw/ \
--chrom_sizes example_chr22/reference/chr22 \
--hic_resolution 5000 \
--scale_hic_using_powerlaw \
--threshold .02 \
--cellType K562 \
--outdir example_chr22/ABC_output/Predictions/ \
--make_all_putative
```
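In Fulco, Nasser et al. (2019), the ABC score of a candidate element E for gene G is E's Activity times its Hi-C Contact with G, divided by the sum of Activity × Contact over all candidate elements considered for G. The sketch below shows only that normalization plus the `--threshold .02` cutoff on made-up numbers. The `Element` record and the function names are hypothetical, and `predict.py` handles further details (for example the `--scale_hic_using_powerlaw` scaling) that are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical toy record: one candidate element paired with one gene."""
    name: str
    activity: float  # e.g. geometric mean of DNase-seq and H3K27ac signal
    contact: float   # Hi-C contact frequency between element and gene promoter


def abc_scores(elements: list[Element]) -> dict[str, float]:
    """ABC = (A * C) / sum over all elements of (A * C), for a single gene."""
    products = {e.name: e.activity * e.contact for e in elements}
    total = sum(products.values())
    if total == 0:
        # No activity or no contact anywhere: every score is zero
        return {name: 0.0 for name in products}
    return {name: p / total for name, p in products.items()}


def predicted_pairs(scores: dict[str, float], threshold: float = 0.02) -> list[str]:
    """Keep elements scoring at or above the threshold (cf. --threshold .02).

    Whether predict.py's cutoff is inclusive is an assumption here.
    """
    return [name for name, s in scores.items() if s >= threshold]


if __name__ == "__main__":
    toy = [
        Element("enh1", activity=10.0, contact=0.05),
        Element("enh2", activity=4.0, contact=0.01),
        Element("enh3", activity=1.0, contact=0.001),
    ]
    scores = abc_scores(toy)
    for name, s in scores.items():
        print(f"{name}\t{s:.4f}")
    print(predicted_pairs(scores))
```

Because the scores are normalized per gene, they sum to 1 across that gene's elements, so adding a strong competing element lowers every other element's score.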

Main input files (written by the previous Neighborhoods step, `run.neighborhoods.py`):

  * **EnhancerList.txt**: Candidate enhancer regions with DNase-seq and H3K27ac ChIP-seq read counts
  * **GeneList.txt**: DNase-seq and H3K27ac ChIP-seq read counts on gene bodies and gene promoter regions
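In the ABC model (Fulco, Nasser et al. 2019), an element's Activity is the geometric mean of its DNase-seq and H3K27ac signal, which is why both counts are carried in these files. A minimal sketch on hypothetical counts; any read-depth normalization the pipeline applies is not modeled, and the `activity` function name is illustrative.

```python
import math


def activity(dnase_count: float, h3k27ac_count: float) -> float:
    """Geometric mean of DNase-seq and H3K27ac signal for one element."""
    if dnase_count < 0 or h3k27ac_count < 0:
        raise ValueError("read counts must be non-negative")
    return math.sqrt(dnase_count * h3k27ac_count)


if __name__ == "__main__":
    # Hypothetical (DNase-seq, H3K27ac) counts for three candidate elements
    for dnase, k27ac in [(100, 400), (25, 25), (0, 300)]:
        print(dnase, k27ac, activity(dnase, k27ac))
```

With a geometric mean, an element with no signal in either assay gets zero Activity, however strong the other assay is.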


## Load packages

In [42]:
#from sevenbridges import Api, ImportExportState
import yaml
import time
import json
import importlib
import getpass
import sevenbridges

In [43]:
%load_ext yamlmagic



# II CWL Description

## Section 1/4 Tool Label and Documentation

In [44]:
%%yaml label_and_description

#### Rarely changing header and boilerplate
cwlVersion: v1.2
class: CommandLineTool
$namespaces:
  sbg: https://sevenbridges.com
hints:
- class: sbg:SaveLogs
  value: '*.sh'  

#### Tool Label  
label: Compute ABC Score

#### Tool Description
doc: |-

  Compute ABC scores by combining Activity (as calculated by run.neighborhoods.py) and Hi-C.  
  



## Section 2/4 Tool Inputs

In [45]:
%%yaml inputs

inputs:
- id: enhancers
  type: File
  doc: EnhancerList.txt
  sbg:fileTypes: TXT
  
- id: genes
  type: File
  doc: GeneList.txt
  sbg:fileTypes: TXT

- id: hi_c_directory
  type: Directory
  doc: example_chr22/input_data/HiC/raw/

- id: chrom_sizes
  type: File
  doc: example_chr22/reference/chr22
  secondaryFiles:
  - pattern: .bed

- id: hi_c_resolution
  type: int

- id: cell_type
  type: string


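To run the tool locally, cwltool accepts a job order file (its positional `job_order` argument) whose keys are the input ids above; per the CWL spec, `File` and `Directory` inputs are written as objects with `class` and `path`. The builder below is a hypothetical helper, and its defaults simply mirror the README example (`--hic_resolution 5000`, `--cellType K562`).

```python
import json


def make_job_order(
    enhancers: str,
    genes: str,
    hic_dir: str,
    chrom_sizes: str,
    hic_resolution: int = 5000,
    cell_type: str = "K562",
) -> dict:
    """Build a CWL job order whose keys match the tool's input ids."""
    return {
        "enhancers": {"class": "File", "path": enhancers},
        "genes": {"class": "File", "path": genes},
        "hi_c_directory": {"class": "Directory", "path": hic_dir},
        # Assumption: the runner finds the '.bed' secondary file next to this path
        "chrom_sizes": {"class": "File", "path": chrom_sizes},
        "hi_c_resolution": hic_resolution,
        "cell_type": cell_type,
    }


if __name__ == "__main__":
    job = make_job_order(
        "example_chr22/ABC_output/Neighborhoods/EnhancerList.txt",
        "example_chr22/ABC_output/Neighborhoods/GeneList.txt",
        "example_chr22/input_data/HiC/raw/",
        "example_chr22/reference/chr22",
    )
    print(json.dumps(job, indent=2))
```

Saved as, say, `step03_job.json` (a hypothetical file name), it would be passed as `cwltool ../cwl/step03_compute_abc_score.tool.cwl step03_job.json`.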

## Section 3/4 Scripts and Other Requirements

In [46]:
%%yaml base_command_and_requirements
baseCommand:
- bash
- predict.sh
requirements:
- class: ShellCommandRequirement
- class: InlineJavascriptRequirement
- class: DockerRequirement
  dockerPull: images.sb.biodatacatalyst.nhlbi.nih.gov/andrewblair/cardiac-compendium:2023042901
- class: InitialWorkDirRequirement
  listing:
  - entryname: predict.sh
    writable: false
    entry: |-
      python /usr/src/app/src/predict.py \
      --enhancers $(inputs.enhancers.path) \
      --genes $(inputs.genes.path) \
      --HiCdir $(inputs.hi_c_directory.path) \
      --chrom_sizes $(inputs.chrom_sizes.path) \
      --hic_resolution $(inputs.hi_c_resolution) \
      --scale_hic_using_powerlaw \
      --threshold .02 \
      --cellType $(inputs.cell_type) \
      --outdir ./ \
      --make_all_putative

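The `entry` above relies on CWL parameter references such as `$(inputs.genes.path)`, which the runner replaces with input values before `predict.sh` runs. The toy renderer below imitates that substitution for the two reference shapes used here, so that a misspelled input id surfaces as an error. It is a hypothetical helper, not how cwltool evaluates references (the real runner, for instance, substitutes staged file paths).

```python
import re

# Matches references of the form $(inputs.name) or $(inputs.name.path)
REF = re.compile(r"\$\(inputs\.([A-Za-z_][A-Za-z0-9_]*)(\.path)?\)")


def render(template: str, job: dict) -> str:
    """Substitute $(inputs.x) / $(inputs.x.path) using a job-order dict.

    Unknown input ids raise KeyError instead of rendering silently.
    """
    def sub(match: re.Match) -> str:
        name, wants_path = match.group(1), match.group(2)
        if name not in job:
            raise KeyError(f"no input named {name!r}")
        value = job[name]
        # File/Directory inputs are dicts; '.path' pulls out their location
        return value["path"] if wants_path else str(value)

    return REF.sub(sub, template)


if __name__ == "__main__":
    job = {"cell_type": "K562", "genes": {"class": "File", "path": "GeneList.txt"}}
    print(render("--cellType $(inputs.cell_type) --genes $(inputs.genes.path)", job))
```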

## Section 4/4 Tool Outputs

In [47]:
%%yaml outputs
outputs:
- id: enhancer_list
  type: File
  outputBinding:
    glob: '*EnhancerList.txt'



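The `outputBinding.glob` pattern is matched against file names in the tool's output directory. The sketch below uses `fnmatch.fnmatchcase` as a stand-in for that shell-style matching, to check which hypothetical file names `*EnhancerList.txt` would collect. Since the output is declared as `type: File`, a single match is expected, so a pattern matching zero or several files is worth catching before a platform run; this check is illustrative and not part of the CWL runner.

```python
from fnmatch import fnmatchcase

OUTPUT_GLOB = "*EnhancerList.txt"  # pattern from the outputBinding above


def collected(filenames: list[str], pattern: str = OUTPUT_GLOB) -> list[str]:
    """Return the file names a case-sensitive shell-style glob would match."""
    return [name for name in filenames if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    # Hypothetical directory listing
    names = ["EnhancerList.txt", "K562.EnhancerList.txt", "GeneList.txt", "EnhancerList.txt.gz"]
    print(collected(names))
```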

# III Test tool

In [48]:
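with open('../cwl/step03_compute_abc_score.tool.cwl', 'w') as f:
    # Merge the four section dicts (dict `|` needs Python 3.9+) into one CWL document.
    # yaml.dump writes to the stream and returns None, so nothing needs to be assigned.
    yaml.dump(label_and_description | inputs | base_command_and_requirements | outputs, f, sort_keys=False, default_flow_style=False)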
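With `a | b`, any key present in both dicts silently takes the value from `b`, so an accidental overlap between the section cells would go unnoticed. A hypothetical helper that merges the sections while rejecting duplicate keys and checking a few top-level fields might look like this; the required-key list is a minimal assumption, and `cwltool --validate` remains the real check.

```python
import json

REQUIRED_KEYS = ("cwlVersion", "class", "inputs", "outputs")


def merge_sections(*sections: dict) -> dict:
    """Merge top-level CWL sections, refusing duplicate keys."""
    merged: dict = {}
    for section in sections:
        overlap = merged.keys() & section.keys()
        if overlap:
            raise ValueError(f"duplicate top-level keys: {sorted(overlap)}")
        merged.update(section)
    missing = [k for k in REQUIRED_KEYS if k not in merged]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return merged


if __name__ == "__main__":
    header = {"cwlVersion": "v1.2", "class": "CommandLineTool"}
    tool_inputs = {"inputs": [{"id": "cell_type", "type": "string"}]}
    tool_outputs = {"outputs": []}
    print(json.dumps(merge_sections(header, tool_inputs, tool_outputs), indent=2))
```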

In [49]:
%%bash
cwltool --tool-help ../cwl/step03_compute_abc_score.tool.cwl

[1;30mINFO[0m /home/codespace/.python/current/bin/cwltool 3.1.20230425144158
[1;30mINFO[0m Resolved '../cwl/step03_compute_abc_score.tool.cwl' to 'file:///workspaces/workflow-notebooks/abc_enchancer_gene_prediction/cwl/step03_compute_abc_score.tool.cwl'
                                              to 'https://sevenbridges.comSaveLogs'
[1;30mINFO[0m ../cwl/step03_compute_abc_score.tool.cwl:6:3: Unknown hint https://sevenbridges.comSaveLogs


usage: ../cwl/step03_compute_abc_score.tool.cwl [-h] --enhancers ENHANCERS
                                                --genes GENES --hi_c_directory
                                                HI_C_DIRECTORY --chrom_sizes
                                                CHROM_SIZES --hi_c_resolution
                                                HI_C_RESOLUTION --cell_type
                                                CELL_TYPE
                                                [job_order]

Compute ABC scores by combining Activity (as calculated by
run.neighborhoods.py) and Hi-C.

positional arguments:
  job_order             Job input json file

options:
  -h, --help            show this help message and exit
  --enhancers ENHANCERS
                        EnhancerList.txt
  --genes GENES         GeneList.txt
  --hi_c_directory HI_C_DIRECTORY
                        example_chr22/input_data/HiC/raw/
  --chrom_sizes CHROM_SIZES
                        example_chr22/reference/ch

In [50]:
%%script false --no-raise-error
#%%bash
cwltool ../step02_run_neighborhoods.tool.cwl \
  --candidate_enchancer_regions wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.candidateRegions.bed \
  --genes RefSeqCurated.170308.bed.CollapsedGeneBounds.chr22.bed \
  --H3K27ac ENCFF384ZZM.chr22.bam \
  --dhs wgEncodeUwDnaseK562AlnRep1.chr22.bam \
  --expression_table K562.ENCFF934YBO.TPM.txt \
  --chrom_sizes chr22 \
  --ubiquitously_expressed_genes UbiquitouslyExpressedGenesHG19.txt \
  --cell_type K562

In [51]:
%%script false --no-raise-error
python /usr/src/app/src/run.neighborhoods.py \
--candidate_enhancer_regions wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.candidateRegions.bed \
--genes RefSeqCurated.170308.bed.CollapsedGeneBounds.chr22.bed \
--H3K27ac ENCFF384ZZM.chr22.bam \
--DHS wgEncodeUwDnaseK562AlnRep1.chr22.bam \
--expression_table K562.ENCFF934YBO.TPM.txt \
--chrom_sizes chr22 \
--ubiquitously_expressed_genes UbiquitouslyExpressedGenesHG19.txt \
--cellType K562 \
--outdir ./

# IV Push tool to platform

In [52]:
%%bash
sbpack bdc dave/abc-development-scratch-project/compute-abc-score ../cwl/step03_compute_abc_score.tool.cwl


sbpack v2022.03.16
Upload CWL apps to any Seven Bridges powered platform
(c) Seven Bridges 2020



Packing ../cwl/step03_compute_abc_score.tool.cwl
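The push above follows the pattern `sbpack <profile> <app id> <cwl file>`, where `bdc` is presumably a profile in the Seven Bridges credentials file. A hypothetical wrapper that assembles the same call, checks the app id shape (assumed here to be `<user>/<project>/<app>`, as in `dave/abc-development-scratch-project/compute-abc-score`), and only runs sbpack if it is installed:

```python
import shutil
import subprocess


def sbpack_args(profile: str, app_id: str, cwl_path: str) -> list[str]:
    """Assemble an sbpack call mirroring the cell above."""
    parts = app_id.split("/")
    # Assumption: app ids have exactly three non-empty parts
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <user>/<project>/<app>, got {app_id!r}")
    return ["sbpack", profile, app_id, cwl_path]


def push(profile: str, app_id: str, cwl_path: str) -> int:
    """Run sbpack if it is on PATH and return its exit code."""
    if shutil.which("sbpack") is None:
        raise RuntimeError("sbpack is not on PATH")
    return subprocess.run(sbpack_args(profile, app_id, cwl_path)).returncode


if __name__ == "__main__":
    print(" ".join(sbpack_args(
        "bdc",
        "dave/abc-development-scratch-project/compute-abc-score",
        "../cwl/step03_compute_abc_score.tool.cwl",
    )))
```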


# V Issues Encountered while Debugging

Encountered the same issue as [gzip: stdout: Broken pipe · Issue #74 · broadinstitute/ABC-Enhancer-Gene-Prediction](https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction/issues/74). The Docker image will need to be corrected and rebuilt.


# VI References

https://docs.sevenbridges.com/reference/add-an-app-using-raw-cwl