# Introduction

This notebook walks through writing a CWL description for step 1B, the second part of step 1, of the ABC Enhancer Gene Prediction model.

[broadinstitute/ABC-Enhancer-Gene-Prediction: Cell type specific enhancer-gene predictions using ABC model (Fulco, Nasser et al, Nature Genetics 2019)](https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction)  

We wrap the MACS2 call candidate regions step. The example command from the GitHub repository is:

```
conda env create -f abcenv.yml

python src/makeCandidateRegions.py \
--narrowPeak example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted \
--bam example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam \
--outDir example_chr22/ABC_output/Peaks/ \
--chrom_sizes example_chr22/reference/chr22 \
--regions_blocklist reference/wgEncodeHg19ConsensusSignalArtifactRegions.bed \
--regions_includelist example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.chr22.bed \
--peakExtendFromSummit 250 \
--nStrongestPeaks 3000 

```

# Load packages

In [1]:
pip install yamlmagic pyyaml sevenbridges-python



[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0.1[0m[39;49m -> [0m[32;49m23.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [2]:
#from sevenbridges import Api, ImportExportState
import yaml
import time
import json
import importlib
import getpass
import sevenbridges

In [3]:
%load_ext yamlmagic

# A. CWL Description

## Section 1/4 - Tool Description

In [4]:
%%yaml label_and_description

#### Rarely changing header and boilerplate
cwlVersion: v1.2
class: CommandLineTool
$namespaces:
  sbg: https://sevenbridges.com
hints:
- class: sbg:SaveLogs
  value: '*.sh'  

#### Label  
label: MACS2 Call Candidate Regions

#### Tool Description

doc: |-

  The call candidate regions tool does this...
  



## Section 2/4 Tool Inputs

Example of an input with a required secondary file:
```
- id: bam
  type: File
  secondaryFiles:
  - pattern: .bai
    required: true
  sbg:fileTypes: BAM
```
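
The `pattern` in `secondaryFiles` is resolved against the primary file's path. As a rough sketch of that rule as described in the CWL specification (each leading `^` removes one extension, then the remainder is appended), the helper below computes where a runner would look for the index. The function names and the `/data/sample.bam` path are hypothetical and only serve the illustration.

```python
import os


def secondary_file_path(primary: str, pattern: str) -> str:
    """Apply a CWL secondaryFiles pattern to a primary file path."""
    directory, basename = os.path.split(primary)
    # Each leading caret strips the last extension from the basename.
    while pattern.startswith("^"):
        if "." in basename:
            basename = basename.rsplit(".", 1)[0]
        pattern = pattern[1:]
    # The rest of the pattern is appended to whatever remains.
    return os.path.join(directory, basename + pattern)


def missing_secondary_files(primary: str, patterns: list[str]) -> list[str]:
    """Return the expected secondary files that do not exist on disk."""
    expected = [secondary_file_path(primary, p) for p in patterns]
    return [path for path in expected if not os.path.exists(path)]


# Hypothetical path, used only to show the two pattern styles.
print(secondary_file_path("/data/sample.bam", ".bai"))   # /data/sample.bam.bai
print(secondary_file_path("/data/sample.bam", "^.bai"))  # /data/sample.bai
```

With the `.bai` pattern used in this notebook, the index is expected next to the BAM as `<name>.bam.bai`, so running `missing_secondary_files` before calling the tool is one way to catch a missing index early.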

In [5]:
%%yaml inputs

inputs:
#narrowPeak example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted \
- id: narrow_peak
  type: File

#bam example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam
- id: bam
  type: File
  sbg:fileTypes: BAM
  secondaryFiles:
  - pattern: .bai

#chrom_sizes example_chr22/reference/chr22 \
- id: chr_sizes
  type: File

#regions_blocklist reference/wgEncodeHg19ConsensusSignalArtifactRegions.bed
- id: regions_blocklist
  type: File
  sbg:fileTypes: BED

#regions_includelist example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.chr22.bed
- id: regions_includelist
  type: File
  sbg:fileTypes: BED



## Section 3/4 Scripts and Other Requirements

Inline JavaScript expression example:
```
$(inputs.bam.path) -n $(inputs.bam.nameroot).macs2
```
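
To make the `path` and `nameroot` properties concrete, here is a small Python sketch that derives them the way the CWL specification describes for `File` objects (`nameroot + nameext == basename`, with `nameext` holding at most one period). The `file_properties` helper and the `/data/` directory are assumptions for illustration; a CWL runner fills in these fields itself.

```python
import os


def file_properties(path: str) -> dict:
    """Derive the File properties commonly used in CWL expressions."""
    basename = os.path.basename(path)
    # splitext splits off only the final extension, e.g. ".bam".
    nameroot, nameext = os.path.splitext(basename)
    return {
        "path": path,
        "basename": basename,
        "nameroot": nameroot,
        "nameext": nameext,
    }


# Hypothetical staging directory; the file name comes from the example data.
bam = file_properties("/data/wgEncodeUwDnaseK562AlnRep1.chr22.bam")

# Equivalent of: $(inputs.bam.path) -n $(inputs.bam.nameroot).macs2
rendered = f"{bam['path']} -n {bam['nameroot']}.macs2"
print(rendered)
```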

In [6]:
%%yaml base_command_and_requirements
baseCommand:
- bash
- call_candidate_regions.sh
requirements:
- class: ShellCommandRequirement
- class: InlineJavascriptRequirement
- class: DockerRequirement
  dockerPull: images.sb.biodatacatalyst.nhlbi.nih.gov/andrewblair/cardiac-compendium:2023042401
- class: InitialWorkDirRequirement
  listing:
  - entryname: call_candidate_regions.sh
    writable: false
    entry: |-
      #conda env create -f abcenv.yml

      python3 /usr/src/app/src/makeCandidateRegions.py \
      --narrowPeak $(inputs.narrow_peak.path) \
      --bam $(inputs.bam.path) \
      --outDir ./ \
      --chrom_sizes $(inputs.chr_sizes.path) \
      --regions_blocklist $(inputs.regions_blocklist.path) \
      --regions_includelist $(inputs.regions_includelist.path) \
      --peakExtendFromSummit 250 \
      --nStrongestPeaks 3000 
     


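
The `entry` script above is a template: each `$(inputs.<id>.path)` reference is replaced with the staged file path before the script runs. The sketch below imitates that substitution with a regular expression so the rendered command can be previewed. It is a deliberately simplified stand-in (real CWL runners evaluate general expressions), and the `render` helper and `/in/...` paths are hypothetical.

```python
import re

# Matches simple references of the form $(inputs.<id>.<attribute>).
PARAM_REF = re.compile(r"\$\(inputs\.(\w+)\.(\w+)\)")


def render(template: str, inputs: dict) -> str:
    """Replace simple parameter references with values from `inputs`."""
    def lookup(match: re.Match) -> str:
        input_id, attribute = match.groups()
        # A KeyError here means the template names an undeclared input.
        return str(inputs[input_id][attribute])

    return PARAM_REF.sub(lookup, template)


template = "--narrowPeak $(inputs.narrow_peak.path) --bam $(inputs.bam.path)"
inputs = {
    "narrow_peak": {"path": "/in/peaks.narrowPeak"},  # hypothetical path
    "bam": {"path": "/in/sample.bam"},                # hypothetical path
}
command = render(template, inputs)
print(command)
```

A reference to an input id that is not declared (for example a typo such as `chrom_sizes` instead of `chr_sizes`) raises `KeyError` in this sketch, which mirrors the kind of mismatch worth checking between the inputs section and the script.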

## Section 4/4 Tool Outputs

In [7]:
%%yaml outputs
outputs:
- id: candidate_regions
  type: File
  outputBinding:
    glob: '*candidateRegions.bed'
- id: counts
  type: File
  outputBinding:
    glob: '*Counts.bed'

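
Each output is collected by matching a glob against the files left in the working directory, and a `type: File` output is expected to resolve to a single file. The sketch below uses Python's `fnmatch` to preview which names the two globs would pick up. The file names in `files` are hypothetical examples, not confirmed output names of `makeCandidateRegions.py`.

```python
from fnmatch import fnmatchcase

# Globs copied from the outputs section above.
OUTPUT_GLOBS = {
    "candidate_regions": "*candidateRegions.bed",
    "counts": "*Counts.bed",
}


def match_outputs(filenames: list[str]) -> dict:
    """Group file names by the output id whose glob they match."""
    return {
        output_id: [name for name in filenames if fnmatchcase(name, pattern)]
        for output_id, pattern in OUTPUT_GLOBS.items()
    }


# Hypothetical working-directory listing after the script has run.
files = [
    "peaks.narrowPeak.sorted.candidateRegions.bed",
    "peaks.narrowPeak.sorted.sample.bam.Counts.bed",
    "call_candidate_regions.sh",
]
matched = match_outputs(files)
for output_id, names in matched.items():
    print(output_id, names)
```

If either list came back empty or with more than one entry, the glob would need tightening or the output type would need to change to an array.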

# B.  Create CWL and Test tool

In [8]:
with open('step01B_macs2_call_candidate_regions.cwl', 'w') as f:
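    # Merge the four YAML sections (dict union needs Python 3.9+) and write them as one CWL file.
    # yaml.dump returns None when given a stream, so the result is not assigned.
    yaml.dump(label_and_description | inputs | base_command_and_requirements | outputs, f, sort_keys=False, default_flow_style=False)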
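
The `|` operator merges the dictionaries left to right, and a key that appears in more than one section is silently overwritten by the later one. As an optional safeguard, a merge helper along these lines fails loudly instead. `merge_sections` is a hypothetical name and the small dictionaries stand in for the real sections.

```python
def merge_sections(*sections: dict) -> dict:
    """Merge top-level CWL sections, refusing to overwrite existing keys."""
    merged: dict = {}
    for section in sections:
        duplicates = merged.keys() & section.keys()
        if duplicates:
            raise ValueError(f"duplicate top-level keys: {sorted(duplicates)}")
        merged.update(section)
    return merged


# Stand-ins for the dictionaries created by the %%yaml cells.
header = {"cwlVersion": "v1.2", "class": "CommandLineTool"}
io_sections = {"inputs": [], "outputs": []}

tool = merge_sections(header, io_sections)
print(list(tool))  # insertion order is preserved, matching sort_keys=False
```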

In [9]:
%%bash
#pip install cwltool

In [10]:
%%bash
cwltool --tool-help step01B_macs2_call_candidate_regions.cwl

[1;30mINFO[0m /home/codespace/.python/current/bin/cwltool 3.1.20230325110543
[1;30mINFO[0m Resolved 'step01B_macs2_call_candidate_regions.cwl' to 'file:///workspaces/cwl-notebooks/abc_enchancer_gene_prediction/step01B_macs2_call_candidate_regions.cwl'
                                              to `https://sevenbridges.comSaveLogs`
[1;30mINFO[0m step01B_macs2_call_candidate_regions.cwl:6:3: Unknown hint https://sevenbridges.comSaveLogs


usage: step01B_macs2_call_candidate_regions.cwl [-h] --narrow_peak NARROW_PEAK
                                                --bam BAM --chr_sizes
                                                CHR_SIZES --regions_blocklist
                                                REGIONS_BLOCKLIST
                                                --regions_includelist
                                                REGIONS_INCLUDELIST
                                                [job_order]

The call candidate regions tool does this...

positional arguments:
  job_order             Job input json file

options:
  -h, --help            show this help message and exit
  --narrow_peak NARROW_PEAK
  --bam BAM
  --chr_sizes CHR_SIZES
  --regions_blocklist REGIONS_BLOCKLIST
  --regions_includelist REGIONS_INCLUDELIST


--narrow_peak NARROW_PEAK
  --bam wgEncodeUwDnaseK562AlnRep1.chr22.bam
  --chr_sizes CHR_SIZES
  --regions_blocklist REGIONS_BLOCKLIST
  --regions_includelist REGIONS_INCLUDELIST
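
The help text also lists an optional `job_order` argument, a job input file that can replace the individual flags. A minimal sketch of building such a file in Python follows, assuming the input ids defined earlier and the repo-relative example paths from the introduction. The helper names are hypothetical.

```python
import json


def file_input(path: str) -> dict:
    """Describe one input as a CWL File object."""
    return {"class": "File", "path": path}


def build_job_order(paths: dict) -> dict:
    """Map each tool input id to a File object."""
    return {input_id: file_input(path) for input_id, path in paths.items()}


# Paths are relative to a checkout of the ABC-Enhancer-Gene-Prediction repo.
job = build_job_order({
    "narrow_peak": "example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted",
    "bam": "example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam",
    "chr_sizes": "example_chr22/reference/chr22",
    "regions_blocklist": "reference/wgEncodeHg19ConsensusSignalArtifactRegions.bed",
    "regions_includelist": "example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.chr22.bed",
})

job_json = json.dumps(job, indent=2)
print(job_json)
```

Saved to a file such as `job.json`, this could be passed as `cwltool step01B_macs2_call_candidate_regions.cwl job.json`. Relative paths in a job file are typically resolved against the job file's location, so the file would need to live in the repository root for these paths to work.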

In [11]:
%%bash
cwltool macs2_call_candidate_regions.cwl \
    --narrow_peak /workspaces/cwl-notebooks/tools/ABC-Enhancer-Gene-Prediction/example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562.mergedPeaks.slop175.withTSS500bp.chr22.bed \
    --bam /workspaces/cwl-notebooks/tools/ABC-Enhancer-Gene-Prediction/example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam \
    --chr_sizes /workspaces/cwl-notebooks/tools/ABC-Enhancer-Gene-Prediction/example_chr22/reference/chr22 \
    --regions_blocklist /workspaces/cwl-notebooks/tools/ABC-Enhancer-Gene-Prediction/reference/wgEncodeHg19ConsensusSignalArtifactRegions.bed \
    --regions_includelist /workspaces/cwl-notebooks/tools/ABC-Enhancer-Gene-Prediction/example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.chr22.bed

[1;30mINFO[0m /home/codespace/.python/current/bin/cwltool 3.1.20230325110543
[1;30mINFO[0m Resolved 'macs2_call_candidate_regions.cwl' to 'file:///workspaces/cwl-notebooks/abc_enchancer_gene_prediction/macs2_call_candidate_regions.cwl'
                                       `https://sevenbridges.comSaveLogs`
[1;30mINFO[0m macs2_call_candidate_regions.cwl:64:3: Unknown hint https://sevenbridges.comSaveLogs
[1;30mERROR[0m [31mWorkflow error, try again with --debug for more information:
macs2_call_candidate_regions.cwl:8:3: Missing required secondary file
                                      'wgEncodeUwDnaseK562AlnRep1.chr22.bam.bai' from file object: {
                                          "class": "File",
                                          "location":
                                          "file:///workspaces/cwl-notebooks/tools/ABC-Enhancer-Gene-Prediction/example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam",
                                 

CalledProcessError: Command 'b'cwltool macs2_call_candidate_regions.cwl \\\n    --narrow_peak /workspaces/cwl-notebooks/tools/ABC-Enhancer-Gene-Prediction/example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562.mergedPeaks.slop175.withTSS500bp.chr22.bed \\\n    --bam /workspaces/cwl-notebooks/tools/ABC-Enhancer-Gene-Prediction/example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam \\\n    --chr_sizes /workspaces/cwl-notebooks/tools/ABC-Enhancer-Gene-Prediction/example_chr22/reference/chr22 \\\n    --regions_blocklist /workspaces/cwl-notebooks/tools/ABC-Enhancer-Gene-Prediction/reference/wgEncodeHg19ConsensusSignalArtifactRegions.bed \\\n    --regions_includelist /workspaces/cwl-notebooks/tools/ABC-Enhancer-Gene-Prediction/example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.chr22.bed\n'' returned non-zero exit status 1.

# C. Push tool to platform

In [12]:
%%bash
sbpack bdc dave/abc-development-scratch-project/makecandidateregions step01B_macs2_call_candidate_regions.cwl


sbpack v2022.03.16
Upload CWL apps to any Seven Bridges powered platform
(c) Seven Bridges 2020



Packing step01B_macs2_call_candidate_regions.cwl
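
The `sbpack` call takes three positional arguments: a credentials profile, an app id of the form `user/project/app`, and the CWL file. As a small sketch, the helper below assembles that command from its parts so the same push can be repeated for other tools. `sbpack_command` is a hypothetical helper, and the values are the ones used in the cell above.

```python
import shlex


def sbpack_command(profile: str, user: str, project: str, app: str, cwl_path: str) -> list[str]:
    """Build the argument list for an sbpack upload."""
    app_id = f"{user}/{project}/{app}"
    return ["sbpack", profile, app_id, cwl_path]


cmd = sbpack_command(
    profile="bdc",
    user="dave",
    project="abc-development-scratch-project",
    app="makecandidateregions",
    cwl_path="step01B_macs2_call_candidate_regions.cwl",
)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` when the upload should run from Python rather than a `%%bash` cell.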


# References

https://docs.sevenbridges.com/reference/add-an-app-using-raw-cwl