# Introduction

This notebook walks through describing the second step of the ABC Enhancer Gene Prediction model, calling candidate regions, as a CWL tool.

[broadinstitute/ABC-Enhancer-Gene-Prediction: Cell type specific enhancer-gene predictions using ABC model (Fulco, Nasser et al, Nature Genetics 2019)](https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction)  

We wrap the step that calls candidate regions from MACS2 peaks. The example command in the GitHub repository is:

```
conda env create -f abcenv.yml

python src/makeCandidateRegions.py \
--narrowPeak example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted \
--bam example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam \
--outDir example_chr22/ABC_output/Peaks/ \
--chrom_sizes example_chr22/reference/chr22 \
--regions_blocklist reference/wgEncodeHg19ConsensusSignalArtifactRegions.bed \
--regions_includelist example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.chr22.bed \
--peakExtendFromSummit 250 \
--nStrongestPeaks 3000 

```

# Load packages

In [11]:
pip install yamlmagic pyyaml sevenbridges-python

Note: you may need to restart the kernel to use updated packages.


In [12]:
import yaml

In [13]:
%load_ext yamlmagic

The yamlmagic extension is already loaded. To reload it, use:
  %reload_ext yamlmagic


In [14]:
#from sevenbridges import Api, ImportExportState
import time
import json
import importlib
import getpass
import sevenbridges

# Section 1/6 Tool Label

In [15]:
%%yaml label
label: MACS2 Call Candidate Regions


# Section 2/6 Tool Inputs

Example of an input that requires a secondary index file:
```
- id: bam
  type: File
  secondaryFiles:
  - pattern: .bai
    required: true
  sbg:fileTypes: BAM
```
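How the `pattern` turns into a file name can be sketched in plain Python. Under the CWL rules, a pattern without a leading `^` is appended to the primary file's name, while each leading `^` first strips one extension. This is a simplified sketch; the helper name `secondary_file_name` is made up for the illustration and is not part of any CWL library.

```python
import os


def secondary_file_name(primary: str, pattern: str) -> str:
    """Apply a CWL-style secondaryFiles pattern to a primary file name."""
    name = primary
    # Each leading caret removes the last extension from the primary name.
    while pattern.startswith("^"):
        name = os.path.splitext(name)[0]
        pattern = pattern[1:]
    # Whatever remains of the pattern is appended as a suffix.
    return name + pattern


bam_name = "wgEncodeUwDnaseK562AlnRep1.chr22.bam"
appended = secondary_file_name(bam_name, ".bai")   # index named <name>.bam.bai
replaced = secondary_file_name(bam_name, "^.bai")  # index named <name>.bai
print(appended)
print(replaced)
```

With the `.bai` pattern used in this notebook, the runner looks for the index next to the BAM under the first of the two names.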

In [16]:
%%yaml inputs

inputs:
#narrowPeak example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted \
- id: narrow_peak
  type: File

#bam example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam
- id: bam
  type: File
  sbg:fileTypes: BAM
  secondaryFiles:
  - pattern: .bai

#chrom_sizes example_chr22/reference/chr22 \
- id: chr_sizes
  type: File

#regions_blocklist reference/wgEncodeHg19ConsensusSignalArtifactRegions.bed
- id: regions_blocklist
  type: File
  sbg:fileTypes: BED

#regions_includelist example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.chr22.bed
- id: regions_includelist
  type: File
  sbg:fileTypes: BED



# Section 3/6 Base command

In [17]:
%%yaml base_command
baseCommand:
- bash
- call_candidate_regions.sh


# Section 4/6 Requirements

Example of inline JavaScript parameter references:  
```
$(inputs.bam.path) -n $(inputs.bam.nameroot).macs2
```
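The `path` and `nameroot` properties used in that example are derived from the input file's location. A minimal Python sketch of the derivation, assuming a hypothetical `/data/...` path; the helper `file_properties` is illustrative only and does not belong to any CWL library:

```python
import os


def file_properties(path: str) -> dict:
    """Return the basename, nameroot and nameext that CWL derives for a File."""
    basename = os.path.basename(path)
    # nameroot + nameext == basename; nameext is the final extension, if any.
    nameroot, nameext = os.path.splitext(basename)
    return {"path": path, "basename": basename, "nameroot": nameroot, "nameext": nameext}


# Hypothetical staged location of the BAM input.
bam_file = file_properties("/data/wgEncodeUwDnaseK562AlnRep1.chr22.bam")
rendered = f"{bam_file['path']} -n {bam_file['nameroot']}.macs2"
print(rendered)
```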

In [18]:
%%yaml requirements
requirements:
- class: ShellCommandRequirement
- class: DockerRequirement
  dockerPull: images.sb.biodatacatalyst.nhlbi.nih.gov/andrewblair/cardiac-compendium:2023042401
- class: InitialWorkDirRequirement
  listing:
  - entryname: call_candidate_regions.sh
    writable: false
    entry: |-
      #conda env create -f abcenv.yml

      python3 /usr/src/app/src/makeCandidateRegions.py \
      --narrowPeak $(inputs.narrow_peak.path) \
      --bam $(inputs.bam.path) \
      --outDir ./ \
      --chrom_sizes $(inputs.chr_sizes.path) \
      --regions_blocklist $(inputs.regions_blocklist.path) \
      --regions_includelist $(inputs.regions_includelist.path) \
      --peakExtendFromSummit 250 \
      --nStrongestPeaks 3000 
     
- class: InlineJavascriptRequirement

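Before the script is written into the working directory, the runner replaces each `$(inputs.<id>.path)` reference in `entry` with the staged file path. The sketch below imitates that substitution with a regular expression. It is not the real CWL expression evaluator, the input paths are hypothetical, and only a shortened version of the script is shown.

```python
import re

# Hypothetical staged input paths; real paths are chosen by the CWL runner.
job_inputs = {
    "narrow_peak": {"path": "/inputs/peaks.narrowPeak.sorted"},
    "bam": {"path": "/inputs/sample.bam"},
}

# Shortened version of the script template from the requirements cell.
entry = (
    "python3 /usr/src/app/src/makeCandidateRegions.py \\\n"
    "--narrowPeak $(inputs.narrow_peak.path) \\\n"
    "--bam $(inputs.bam.path)\n"
)

# Matches references of the form $(inputs.<id>.<field>).
REFERENCE = re.compile(r"\$\(inputs\.(\w+)\.(\w+)\)")


def render(text: str, values: dict) -> str:
    """Replace each $(inputs.<id>.<field>) with the matching value."""
    return REFERENCE.sub(lambda m: str(values[m.group(1)][m.group(2)]), text)


script = render(entry, job_inputs)
print(script)
```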

# Section 5/6 Outputs

In [25]:
%%yaml outputs
outputs:
- id: candidate_regions
  type: File
  outputBinding:
    glob: '*candidateRegions.bed'
- id: counts
  type: File
  outputBinding:
    glob: '*Counts.bed'

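Each output is collected by matching its `glob` against the files left in the working directory, and an output of `type: File` expects exactly one match. A rough sketch with Python's `fnmatch`, which approximates glob matching for flat file names; the file names listed are hypothetical stand-ins, not the script's confirmed output names:

```python
from fnmatch import fnmatchcase

# Hypothetical contents of the working directory after the tool has run.
produced = [
    "call_candidate_regions.sh",
    "peaks.narrowPeak.sorted.candidateRegions.bed",
    "peaks.narrowPeak.sorted.sample.bam.Counts.bed",
]

globs = {"candidate_regions": "*candidateRegions.bed", "counts": "*Counts.bed"}


def collect(pattern: str, names: list) -> str:
    """Return the single file matching pattern, as a `type: File` output expects."""
    matches = [name for name in names if fnmatchcase(name, pattern)]
    if len(matches) != 1:
        raise ValueError(f"{pattern!r} matched {len(matches)} files, expected 1")
    return matches[0]


collected = {output_id: collect(pattern, produced) for output_id, pattern in globs.items()}
print(collected)
```

If a pattern could match more than one file, declaring the output as `File[]` instead of `File` would be the usual alternative.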

# Section 6/6 Misc settings

In [20]:
%%yaml misc_settings
### below is boiler plate and rarely changes
cwlVersion: v1.2
class: CommandLineTool

$namespaces:
  sbg: https://sevenbridges.com
  
hints:
- class: sbg:SaveLogs
  value: '*.sh'  


# Push tool to platform

In [9]:
api_token = getpass.getpass()

In [10]:
api = sevenbridges.Api(url = "https://api.sb.biodatacatalyst.nhlbi.nih.gov/v2",  token = api_token)

In [29]:
api.apps.install_app(
    id='dave/abc-development-scratch-project/makecandidateregions/8',
    raw=label | inputs | base_command | requirements | outputs | misc_settings)

<App: id=dave/abc-development-scratch-project/makecandidateregions rev=8>
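The `raw=` argument is assembled with the dict union operator `|` (Python 3.9+), which merges the section dictionaries into one tool description. Because each section contributes different top-level keys, nothing is overwritten. A small sketch with abbreviated stand-in dictionaries (the `*_part` names are illustrative, chosen so they do not shadow the notebook's own variables):

```python
# Abbreviated stand-ins for the dictionaries created by the %%yaml cells.
label_part = {"label": "MACS2 Call Candidate Regions"}
base_command_part = {"baseCommand": ["bash", "call_candidate_regions.sh"]}
misc_part = {"cwlVersion": "v1.2", "class": "CommandLineTool"}

# `|` returns a new dict; the operands are left unchanged.
tool = label_part | base_command_part | misc_part
print(sorted(tool))

# If two sections shared a key, the right-hand operand would win.
override = {"label": "old"} | label_part
print(override["label"])
```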

In [26]:
print(yaml.safe_dump(label | inputs | base_command | requirements | outputs | misc_settings))


$namespaces:
  sbg: https://sevenbridges.com
baseCommand:
- bash
- call_candidate_regions.sh
class: CommandLineTool
cwlVersion: v1.2
hints:
- class: sbg:SaveLogs
  value: '*.sh'
inputs:
- id: narrow_peak
  type: File
- id: bam
  sbg:fileTypes: BAM
  secondaryFiles:
  - pattern: .bai
  type: File
- id: chr_sizes
  type: File
- id: regions_blocklist
  sbg:fileTypes: BED
  type: File
- id: regions_includelist
  sbg:fileTypes: BED
  type: File
label: MACS2 Call Candidate Regions
outputs:
- id: candidate_regions
  outputBinding:
    glob: '*candidateRegions.bed'
  type: File
- id: counts
  outputBinding:
    glob: '*Counts.bed'
  type: File
requirements:
- class: ShellCommandRequirement
- class: DockerRequirement
  dockerPull: images.sb.biodatacatalyst.nhlbi.nih.gov/andrewblair/cardiac-compendium:2023042401
- class: InitialWorkDirRequirement
  listing:
  - entry: '#conda env create -f abcenv.yml


      python3 /usr/src/app/src/makeCandidateRegions.py \

      --narrowPeak $(inputs.narrow_p

In [28]:
with open('macs2_call_candidate_regions.cwl', 'w') as f:
    # yaml.dump returns None when given a stream, so there is nothing to assign
    yaml.dump(label | inputs | base_command | requirements | outputs | misc_settings, f, sort_keys=False, default_flow_style=False)

# Push this notebook to files tab

In [15]:
upload = api.files.upload(
    path='macs2_call_candidate_regions.cwl.ipynb',
    overwrite=True,
    parent = "644299a6dc22f20baf6998fc",
)


# Create Docker Image

In [8]:
%%writefile Dockerfile
# TODO: add Tabix install

FROM continuumio/anaconda3:2022.10
# Install required libraries
RUN apt-get update && apt-get install -y python python3 virtualenv python3-pip  zlib1g-dev zlib1g libbz2-dev liblzma-dev wget libncurses5-dev

# Set the working directory
WORKDIR /usr/src/app

# Setup the python requirements
#RUN pip2 install --no-cache-dir numpy
RUN pip3 install Cython
RUN pip3 install --no-cache-dir numpy pandas scipy pyBigWig pyranges
#RUN pip2 install pysam

# Setup samtools
RUN wget -O samtools-0.1.19.tar.bz2 https://sourceforge.net/projects/samtools/files/samtools/0.1.19/samtools-0.1.19.tar.bz2/download &&  tar xjf samtools-0.1.19.tar.bz2 && cd  /usr/src/app/samtools-0.1.19 &&  make -j 4

# Doesn't work?
# Setup tabix
# RUN wget -O tabix-0.2.5.tar.bz2 https://sourceforge.net/projects/samtools/files/tabix/tabix-0.2.5.tar.bz2/download &&  tar xjf tabix-0.2.5.tar.bz2 && cd  /usr/src/app/tabix-0.2.5 &&  make -j 4

# Update the path
ENV PATH=/usr/src/app/samtools-0.1.19/:${PATH}

# Setup bedtools
RUN wget -O bedtools-2.26.0.tar.gz https://github.com/arq5x/bedtools2/releases/download/v2.26.0/bedtools-2.26.0.tar.gz && tar xzf bedtools-2.26.0.tar.gz && cd /usr/src/app/bedtools2/ &&  make -j 12

# Update the path
ENV PATH=/usr/src/app/bedtools2/bin/:${PATH}

# Install python packages
#RUN pip2 install MACS2 && pip2 install progressbar &&  pip3 install progressbar
#RUN pip3 install MACS2
COPY macs.yml /usr/src/app/macs.yml
RUN conda env create -f macs.yml
COPY abcenv.yml /usr/src/app/abcenv.yml
RUN conda env create -f abcenv.yml


# Copy the required scripts
COPY src/ src/

Writing Dockerfile


In [9]:
%%bash
#git clone https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction.git
mv Dockerfile ABC-Enhancer-Gene-Prediction
cd ABC-Enhancer-Gene-Prediction/
#docker build -t images.sb.biodatacatalyst.nhlbi.nih.gov/andrewblair/cardiac-compendium:2023042401 ./


In [9]:
user_name = getpass.getpass()

In [11]:
%%bash -s "$myPythonVar" "$myOtherVar"
echo "This bash script knows about $1 and $2"

This bash script knows about $myPythonVar and $myOtherVar
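The values after `-s` reach the script as the positional parameters `$1` and `$2`. The output above shows the literal text `$myPythonVar`, most likely because no Python variable with that name was defined when the cell ran, so nothing was expanded. A plain-Python sketch of the same mechanism using `subprocess`, assuming `bash` is available on the PATH and using placeholder values:

```python
import subprocess

# The script is read from stdin; arguments after -s become $1, $2, ...
script_text = 'echo "This bash script knows about $1 and $2"\n'

result = subprocess.run(
    ["bash", "-s", "first-value", "second-value"],  # placeholder values
    input=script_text,
    text=True,
    capture_output=True,
    check=True,
)
output = result.stdout.strip()
print(output)
```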


In [17]:
%%bash -s "$user_name" "$api_token"
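# Pass the token on stdin so it does not appear in the process list
printf '%s' "$2" | docker login images.sb.biodatacatalyst.nhlbi.nih.gov -u "$1" --password-stdin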

https://docs.docker.com/engine/reference/commandline/login/#credentials-store



Login Succeeded


In [10]:
%%bash
docker images

REPOSITORY                                                               TAG          IMAGE ID       CREATED          SIZE
images.sb.biodatacatalyst.nhlbi.nih.gov/andrewblair/cardiac-compendium   2023042401   652e6fabded5   2 minutes ago    8.4GB
<none>                                                                   <none>       9f241e70b291   21 minutes ago   1.01GB
images.sb.biodatacatalyst.nhlbi.nih.gov/andrewblair/cardiac-compendium   <none>       6849e918aac7   4 hours ago      985MB


In [5]:
%%bash
docker push images.sb.biodatacatalyst.nhlbi.nih.gov/andrewblair/cardiac-compendium:2023042401

The push refers to repository [images.sb.biodatacatalyst.nhlbi.nih.gov/andrewblair/cardiac-compendium]
c328c9630971: Preparing
8ea20722af9d: Preparing
8ed486dc87e3: Preparing
414cf4ba15ab: Preparing
1312649f8d59: Preparing
8778bf8006bf: Preparing
4e1e398d52e3: Preparing
cf9824e7a29e: Preparing
3ecfd4cdc847: Preparing
7b9c35392376: Preparing
5ac6f5a1ae3e: Preparing
6f56466ea64f: Preparing
497abe10839f: Preparing
ec4a38999118: Preparing
4e1e398d52e3: Waiting
cf9824e7a29e: Waiting
3ecfd4cdc847: Waiting
6f56466ea64f: Waiting
7b9c35392376: Waiting
5ac6f5a1ae3e: Waiting
497abe10839f: Waiting
8778bf8006bf: Waiting
ec4a38999118: Waiting
8ed486dc87e3: Pushed
1312649f8d59: Pushed
c328c9630971: Pushed
4e1e398d52e3: Pushed
3ecfd4cdc847: Layer already exists
7b9c35392376: Layer already exists
5ac6f5a1ae3e: Layer already exists
6f56466ea64f: Layer already exists
497abe10839f: Layer already exists
ec4a38999118: Layer already exists
8778bf8006bf: Pushed
cf9824e7a29e: Pushed
8ea20722af9d: Pushed
414cf4

In [29]:
%%bash
pip install cwltool

Collecting cwltool
  Downloading cwltool-3.1.20230325110543-py3-none-any.whl (1.3 MB)
Collecting shellescape<3.9,>=3.4.1
  Downloading shellescape-3.8.1-py2.py3-none-any.whl (3.1 kB)
Collecting rdflib<6.4.0,>=4.2.2
  Downloading rdflib-6.3.2-py3-none-any.whl (528 kB)
Collecting prov==1.5.1
  Downloading prov-1.5.1-py2.py3-none-any.whl (426 kB)
Collecting ruamel.yaml<0.17.22,>=0.15
  Downloading ruamel.yaml-0.17.21-py3-none-any.whl (109 kB)
Collecting coloredlogs
  Downloading coloredlog


In [1]:
%%bash
cwltool --tool-help macs2_call_candidate_regions.cwl

INFO /home/codespace/.python/current/bin/cwltool 3.1.20230325110543
INFO Resolved 'macs2_call_candidate_regions.cwl' to 'file:///workspaces/cwl-notebooks/tools/macs2_call_candidate_regions.cwl'
INFO macs2_call_candidate_regions.cwl:60:3: Unknown hint https://sevenbridges.comSaveLogs


usage: macs2_call_candidate_regions.cwl [-h] --narrow_peak NARROW_PEAK --bam
                                        BAM --chr_sizes CHR_SIZES
                                        --regions_blocklist REGIONS_BLOCKLIST
                                        --regions_includelist
                                        REGIONS_INCLUDELIST
                                        [job_order]

MACS2 Call Candidate Regions

positional arguments:
  job_order             Job input json file

options:
  -h, --help            show this help message and exit
  --narrow_peak NARROW_PEAK
  --bam BAM
  --chr_sizes CHR_SIZES
  --regions_blocklist REGIONS_BLOCKLIST
  --regions_includelist REGIONS_INCLUDELIST
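
The help text lists an optional positional `job_order` argument, described as a job input JSON file. Supplying the inputs through such a file avoids long multi-line shell commands. The sketch below builds one with the standard `json` module. The paths are taken from the repository's chr22 example command and are relative to the repository checkout, so they may need adjusting; `job.json` in the note that follows is a hypothetical file name.

```python
import json

# Paths follow the repository's chr22 example and are relative to the
# ABC-Enhancer-Gene-Prediction checkout; adjust them to your own layout.
paths = {
    "narrow_peak": "example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted",
    "bam": "example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam",
    "chr_sizes": "example_chr22/reference/chr22",
    "regions_blocklist": "reference/wgEncodeHg19ConsensusSignalArtifactRegions.bed",
    "regions_includelist": "example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.chr22.bed",
}

# Each File input is described by an object with class "File" and a path.
job_order = {input_id: {"class": "File", "path": path} for input_id, path in paths.items()}

job_text = json.dumps(job_order, indent=2)
print(job_text)
```

Saving the printed text as `job.json` and running `cwltool macs2_call_candidate_regions.cwl job.json` should then be equivalent to passing the five `--<input>` options on the command line.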


In [None]:
--narrow_peak NARROW_PEAK
  --bam wgEncodeUwDnaseK562AlnRep1.chr22.bam
  --chr_sizes CHR_SIZES
  --regions_blocklist REGIONS_BLOCKLIST
  --regions_includelist REGIONS_INCLUDELIST

In [4]:
%%bash
cwltool macs2_call_candidate_regions.cwl \
    --narrow_peak /workspaces/cwl-notebooks/tools/ABC-Enhancer-Gene-Prediction/example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562.mergedPeaks.slop175.withTSS500bp.chr22.bed \
    --bam /workspaces/cwl-notebooks/tools/ABC-Enhancer-Gene-Prediction/example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam \
    --chr_sizes /workspaces/cwl-notebooks/tools/ABC-Enhancer-Gene-Prediction/example_chr22/reference/chr22 \
    --regions_blocklist /workspaces/cwl-notebooks/tools/ABC-Enhancer-Gene-Prediction/reference/wgEncodeHg19ConsensusSignalArtifactRegions.bed \
    --regions_includelist /workspaces/cwl-notebooks/tools/ABC-Enhancer-Gene-Prediction/example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.chr22.bed

INFO /home/codespace/.python/current/bin/cwltool 3.1.20230325110543
INFO Resolved 'macs2_call_candidate_regions.cwl' to 'file:///workspaces/cwl-notebooks/tools/macs2_call_candidate_regions.cwl'
INFO macs2_call_candidate_regions.cwl:60:3: Unknown hint https://sevenbridges.comSaveLogs
usage: macs2_call_candidate_regions.cwl [-h] --narrow_peak NARROW_PEAK --bam
                                        BAM --chr_sizes CHR_SIZES
                                        --regions_blocklist REGIONS_BLOCKLIST
                                        --regions_includelist
                                        REGIONS_INCLUDELIST
                                        [job_order]
macs2_call_candidate_regions.cwl: error: the following arguments are required: --regions_blocklist, --regions_includelist
bash: line 5: --regions_blocklist: command not found


CalledProcessError: Command 'b'cwltool macs2_call_candidate_regions.cwl \\\n    --narrow_peak /workspaces/cwl-notebooks/tools/ABC-Enhancer-Gene-Prediction/example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562.mergedPeaks.slop175.withTSS500bp.chr22.bed \\\n    --bam /workspaces/cwl-notebooks/tools/ABC-Enhancer-Gene-Prediction/example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam \\\n    --chr_sizes /workspaces/cwl-notebooks/tools/ABC-Enhancer-Gene-Prediction/example_chr22/reference/chr22\n    --regions_blocklist /workspaces/cwl-notebooks/tools/ABC-Enhancer-Gene-Prediction/reference/wgEncodeHg19ConsensusSignalArtifactRegions.bed \\\n    --regions_includelist /workspaces/cwl-notebooks/tools/ABC-Enhancer-Gene-Prediction/example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.chr22.bed\n'' returned non-zero exit status 127.

The command echoed in the `CalledProcessError` shows that, when this output was produced, the `--chr_sizes` line had no trailing `\`. Bash therefore ended the `cwltool` command at that line and tried to run `--regions_blocklist` as a separate command, which gives exit status 127. The cell as shown includes the missing backslash.

# References

https://docs.sevenbridges.com/reference/add-an-app-using-raw-cwl