# I Initialize

This notebook walks through describing step 1B, the second part of step 1, of the ABC Enhancer Gene Prediction model as a CWL tool.

[broadinstitute/ABC-Enhancer-Gene-Prediction: Cell type specific enhancer-gene predictions using ABC model (Fulco, Nasser et al, Nature Genetics 2019)](https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction)  

We wrap the step that calls candidate regions from the MACS2 peaks. The example command in the GitHub repo is:

```
conda env create -f abcenv.yml

python src/makeCandidateRegions.py \
--narrowPeak example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted \
--bam example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam \
--outDir example_chr22/ABC_output/Peaks/ \
--chrom_sizes example_chr22/reference/chr22 \
--regions_blocklist reference/wgEncodeHg19ConsensusSignalArtifactRegions.bed \
--regions_includelist example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.chr22.bed \
--peakExtendFromSummit 250 \
--nStrongestPeaks 3000 

```

## Load packages

In [2]:
#from sevenbridges import Api, ImportExportState
import yaml
import time
import json
import importlib
import getpass
import sevenbridges

In [3]:
%load_ext yamlmagic

# II CWL Description

## Section 1/4 - Tool Label and Documentation

In [4]:
%%yaml label_and_description

#### rarely changing header and boilerplate
cwlVersion: v1.2
class: CommandLineTool
$namespaces:
  sbg: https://sevenbridges.com
hints:
- class: sbg:SaveLogs
  value: '*.sh'  

#### Tool Label  
label: MACS2 Call Candidate Regions

#### Tool Description

doc: |-

  The call candidate regions tool does this...
  



## Section 2/4 Tool Inputs

Example input definition:
```
- id: bam
  type: File
  secondaryFiles:
  - pattern: .bai
    required: true
  sbg:fileTypes: BAM
```
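To make the `secondaryFiles` pattern concrete: in CWL, a pattern such as `.bai` is appended to the primary file name, and each leading `^` first strips one extension from it. The sketch below illustrates that rule in Python. The helper name `resolve_secondary` is hypothetical, and this is not how cwltool implements it.

```python
def resolve_secondary(primary: str, pattern: str) -> str:
    """Apply a CWL secondaryFiles pattern to a primary file name (illustrative helper)."""
    path = primary
    # Each leading caret removes the last extension, if there is one.
    while pattern.startswith("^"):
        pattern = pattern[1:]
        dot = path.rfind(".")
        # Only strip when the dot belongs to the file name, not to a directory.
        if dot > path.rfind("/"):
            path = path[:dot]
    # The rest of the pattern is appended to the (possibly shortened) name.
    return path + pattern


bam = "wgEncodeUwDnaseK562AlnRep1.chr22.bam"
print(resolve_secondary(bam, ".bai"))   # extension appended to the BAM name
print(resolve_secondary(bam, "^.bai"))  # caret form: .bam swapped for .bai
```

With the `.bai` pattern used in this notebook, the index is therefore expected next to the BAM as `<name>.bam.bai`.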

In [5]:
%%yaml inputs

inputs:
#narrowPeak example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted \
- id: narrow_peak
  type: File

#bam example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam
- id: bam
  type: File
  sbg:fileTypes: BAM
  secondaryFiles:
  - pattern: .bai

#chrom_sizes example_chr22/reference/chr22 \
- id: chr_sizes
  type: File

#regions_blocklist reference/wgEncodeHg19ConsensusSignalArtifactRegions.bed
- id: regions_blocklist
  type: File
  sbg:fileTypes: BED

#regions_includelist example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.chr22.bed
- id: regions_includelist
  type: File
  sbg:fileTypes: BED



## Section 3/4 Scripts and Other Requirements

Inline JavaScript example:
```
$(inputs.bam.path) -n $(inputs.bam.nameroot).macs2
```
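In the expression above, `path` is the location of the staged input file and `nameroot` is its basename with the final extension removed. A rough Python equivalent of those `File` fields is sketched here. The `/data` directory is a made-up staging location, and `cwl_file_fields` is a hypothetical helper.

```python
import os.path


def cwl_file_fields(path: str) -> dict:
    """Mimic the basename, nameroot and nameext fields of a CWL File object."""
    basename = os.path.basename(path)
    nameroot, nameext = os.path.splitext(basename)
    return {"path": path, "basename": basename, "nameroot": nameroot, "nameext": nameext}


# /data is a made-up staging directory used only for illustration.
bam = cwl_file_fields("/data/wgEncodeUwDnaseK562AlnRep1.chr22.bam")
expanded = f"{bam['path']} -n {bam['nameroot']}.macs2"
print(expanded)
```

Only the last extension is dropped, so `.chr22` stays part of `nameroot`.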

In [6]:
%%yaml base_command_and_requirements
baseCommand:
- bash
- call_candidate_regions.sh
requirements:
- class: ShellCommandRequirement
- class: InlineJavascriptRequirement
- class: DockerRequirement
  dockerPull: images.sb.biodatacatalyst.nhlbi.nih.gov/andrewblair/cardiac-compendium:2023042401
- class: InitialWorkDirRequirement
  listing:
  - entryname: call_candidate_regions.sh
    writable: false
    entry: |-
      #conda env create -f abcenv.yml

      python3 /usr/src/app/src/makeCandidateRegions.py \
      --narrowPeak $(inputs.narrow_peak.path) \
      --bam $(inputs.bam.path) \
      --outDir ./ \
      --chrom_sizes $(inputs.chr_sizes.path) \
      --regions_blocklist $(inputs.regions_blocklist.path) \
      --regions_includelist $(inputs.regions_includelist.path) \
      --peakExtendFromSummit 250 \
      --nStrongestPeaks 3000 
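
The `entry` above is a template: before the job runs, each `$(inputs.<id>.path)` reference is replaced with the path of the corresponding input, and the result is written to `call_candidate_regions.sh`. The sketch below imitates that substitution with a regular expression. It is a simplified stand-in for CWL parameter references, the `render` helper is hypothetical, and the `/staged/...` paths are invented.

```python
import re


def render(template: str, inputs: dict) -> str:
    """Replace $(inputs.<id>.<field>) references with values from `inputs`."""
    def lookup(match):
        input_id, field = match.group(1), match.group(2)
        return str(inputs[input_id][field])
    return re.sub(r"\$\(inputs\.(\w+)\.(\w+)\)", lookup, template)


template = "--bam $(inputs.bam.path) --chrom_sizes $(inputs.chr_sizes.path)"
# Hypothetical staged paths; the real ones are chosen by the CWL runner.
staged = {
    "bam": {"path": "/staged/sample.bam"},
    "chr_sizes": {"path": "/staged/chr22"},
}
print(render(template, staged))
```

Note that the input ids in the references must match the ids declared in the inputs section (`chr_sizes`, not `chrom_sizes`).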
     



## Section 4/4 Tool Outputs

In [7]:
%%yaml outputs
outputs:
- id: candidate_regions
  type: File
  outputBinding:
    glob: '*candidateRegions.bed'
- id: counts
  type: File
  outputBinding:
    glob: '*Counts.bed'


In [None]:
# Merge the four sections in order and write the tool description (dict union with | needs Python 3.9+).
with open('../cwl/step01B_macs2_call_candidate_regions.cwl', 'w') as f:
    yaml.dump(label_and_description | inputs | base_command_and_requirements | outputs, f, sort_keys=False, default_flow_style=False)
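
The `|` operator merges the four dictionaries left to right, and a key that appears in more than one section is silently taken from the rightmost one. The sketch below shows a stricter merge that refuses duplicates. The `merge_sections` helper is hypothetical, and the four dictionaries are abbreviated stand-ins for what the `%%yaml` cells produce.

```python
def merge_sections(*sections: dict) -> dict:
    """Merge CWL sections left to right, refusing duplicate top-level keys."""
    merged = {}
    for section in sections:
        overlap = merged.keys() & section.keys()
        if overlap:
            raise ValueError(f"duplicate top-level keys: {sorted(overlap)}")
        merged.update(section)  # same effect as merged = merged | section
    return merged


# Abbreviated stand-ins for the four dictionaries built by the %%yaml cells.
label_and_description = {"cwlVersion": "v1.2", "class": "CommandLineTool"}
inputs = {"inputs": [{"id": "bam", "type": "File"}]}
base_command_and_requirements = {"baseCommand": ["bash", "call_candidate_regions.sh"]}
outputs = {"outputs": [{"id": "counts", "type": "File"}]}

tool = merge_sections(label_and_description, inputs, base_command_and_requirements, outputs)
print(list(tool))
```

Insertion order is preserved, which together with `sort_keys=False` keeps the written CWL in the same order as the notebook sections.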

# III Test tool

In [None]:
%%bash
cwltool --tool-help step01B_macs2_call_candidate_regions.cwl

INFO /home/codespace/.python/current/bin/cwltool 3.1.20230325110543
INFO Resolved 'step01B_macs2_call_candidate_regions.cwl' to 'file:///workspaces/cwl-notebooks/abc_enchancer_gene_prediction/step01B_macs2_call_candidate_regions.cwl'
                                              to `https://sevenbridges.comSaveLogs`
INFO step01B_macs2_call_candidate_regions.cwl:6:3: Unknown hint https://sevenbridges.comSaveLogs


usage: step01B_macs2_call_candidate_regions.cwl [-h] --narrow_peak NARROW_PEAK
                                                --bam BAM --chr_sizes
                                                CHR_SIZES --regions_blocklist
                                                REGIONS_BLOCKLIST
                                                --regions_includelist
                                                REGIONS_INCLUDELIST
                                                [job_order]

The call candidate regions tool does this...

positional arguments:
  job_order             Job input json file

options:
  -h, --help            show this help message and exit
  --narrow_peak NARROW_PEAK
  --bam BAM
  --chr_sizes CHR_SIZES
  --regions_blocklist REGIONS_BLOCKLIST
  --regions_includelist REGIONS_INCLUDELIST
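
The `Unknown hint https://sevenbridges.comSaveLogs` message in the log above follows from how prefixed names are expanded: `sbg:SaveLogs` becomes the `sbg` namespace URI with `SaveLogs` appended directly, and cwltool does not recognize the resulting Seven Bridges hint. Unknown hints are not fatal, so the tool help is still printed. A minimal sketch of the expansion, using a hypothetical `expand` helper:

```python
def expand(term: str, namespaces: dict) -> str:
    """Expand a prefixed term such as 'sbg:SaveLogs' by plain concatenation."""
    prefix, sep, name = term.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix] + name
    return term


namespaces = {"sbg": "https://sevenbridges.com"}
print(expand("sbg:SaveLogs", namespaces))
```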


In [None]:
#%%bash
#cwltool step01B_macs2_call_candidate_regions.cwl \
#    --narrow_peak /workspaces/cwl-notebooks/tools/ABC-Enhancer-Gene-Prediction/example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562.mergedPeaks.slop175.withTSS500bp.chr22.bed \
#    --bam /workspaces/cwl-notebooks/tools/ABC-Enhancer-Gene-Prediction/example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam \
#    --chr_sizes /workspaces/cwl-notebooks/tools/ABC-Enhancer-Gene-Prediction/example_chr22/reference/chr22 \
#    --regions_blocklist /workspaces/cwl-notebooks/tools/ABC-Enhancer-Gene-Prediction/reference/wgEncodeHg19ConsensusSignalArtifactRegions.bed \
#    --regions_includelist /workspaces/cwl-notebooks/tools/ABC-Enhancer-Gene-Prediction/example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.chr22.bed
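
The `job_order` positional argument shown in the help output is an alternative to the long flags: a JSON (or YAML) file that maps each input id to a value, with files written as objects of class `File`. The sketch below builds such a document from the paths in the commented command. The `cwl_file` helper is hypothetical.

```python
import json

base = "/workspaces/cwl-notebooks/tools/ABC-Enhancer-Gene-Prediction"


def cwl_file(path: str) -> dict:
    """Describe a local file the way a CWL job order expects."""
    return {"class": "File", "path": path}


job = {
    "narrow_peak": cwl_file(f"{base}/example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562.mergedPeaks.slop175.withTSS500bp.chr22.bed"),
    "bam": cwl_file(f"{base}/example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam"),
    "chr_sizes": cwl_file(f"{base}/example_chr22/reference/chr22"),
    "regions_blocklist": cwl_file(f"{base}/reference/wgEncodeHg19ConsensusSignalArtifactRegions.bed"),
    "regions_includelist": cwl_file(f"{base}/example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.chr22.bed"),
}
job_text = json.dumps(job, indent=2)
print(job_text)
```

Saved to a file (the name `job.json` is an arbitrary choice), this can be passed as `cwltool step01B_macs2_call_candidate_regions.cwl job.json`.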

# IV Push tool to platform

In [14]:
%%bash
sbpack bdc dave/abc-development-scratch-project/makecandidateregions step01B_macs2_call_candidate_regions.cwl


sbpack v2022.03.16
Upload CWL apps to any Seven Bridges powered platform
(c) Seven Bridges 2020



Packing step01B_macs2_call_candidate_regions.cwl



# V References

https://docs.sevenbridges.com/reference/add-an-app-using-raw-cwl