# Introduction

This notebook walks through the second part of step 1 of the ABC Enhancer Gene Prediction model.

[broadinstitute/ABC-Enhancer-Gene-Prediction: Cell type specific enhancer-gene predictions using ABC model (Fulco, Nasser et al, Nature Genetics 2019)](https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction)  

We wrap the MACS2 call-candidate-regions step as a CWL tool. The example command in the GitHub repository is:

```
conda env create -f abcenv.yml

python src/makeCandidateRegions.py \
--narrowPeak example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted \
--bam example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam \
--outDir example_chr22/ABC_output/Peaks/ \
--chrom_sizes example_chr22/reference/chr22 \
--regions_blocklist reference/wgEncodeHg19ConsensusSignalArtifactRegions.bed \
--regions_includelist example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.chr22.bed \
--peakExtendFromSummit 250 \
--nStrongestPeaks 3000 

```
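
To make the flag-to-value mapping explicit before it is wrapped in CWL, the example command can be assembled from a dictionary. This is only an illustrative sketch: `build_command` is a hypothetical helper, while the flags and values are copied from the example command.

```python
def build_command(params, script="src/makeCandidateRegions.py"):
    """Return the argv list for makeCandidateRegions.py (hypothetical helper)."""
    argv = ["python", script]
    for flag, value in params.items():
        # Every option in the example takes exactly one value.
        argv += [f"--{flag}", str(value)]
    return argv


# Flags and values copied from the example command.
params = {
    "narrowPeak": "example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted",
    "bam": "example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam",
    "outDir": "example_chr22/ABC_output/Peaks/",
    "chrom_sizes": "example_chr22/reference/chr22",
    "regions_blocklist": "reference/wgEncodeHg19ConsensusSignalArtifactRegions.bed",
    "regions_includelist": "example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.chr22.bed",
    "peakExtendFromSummit": 250,
    "nStrongestPeaks": 3000,
}

command = build_command(params)
print(" \\\n".join(command))
```

The five file-valued options become CWL inputs in Section 2/6, and the two numeric options are hard-coded in the wrapper script in Section 4/6.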

# Load packages

In [1]:
pip install yamlmagic pyyaml sevenbridges-python

Collecting sevenbridges-python
  Downloading sevenbridges_python-2.9.1-py3-none-any.whl (102 kB)
Installing collected packages: sevenbridges-python
Successfully installed sevenbridges-python-2.9.1

Note: you may need to restart the kernel to use updated packages.


In [2]:
import yaml

In [3]:
%load_ext yamlmagic

In [14]:
#from sevenbridges import Api, ImportExportState
import time
import json
import importlib
import getpass
import sevenbridges

# Section 1/6 Tool Label

In [5]:
%%yaml label
label: MACS2 Call Candidate Regions


# Section 2/6 Tool Inputs

Example of an input that requires a secondary index file:
```
- id: bam
  type: File
  secondaryFiles:
  - pattern: .bai
    required: true
  sbg:fileTypes: BAM
```
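
The `pattern: .bai` entry tells the runner where to look for the index relative to the BAM. As I read the CWL rules, a plain pattern is appended to the primary path, and each leading `^` first strips one extension. A minimal sketch of that rule follows; `secondary_file_path` is a hypothetical helper and the file name is illustrative.

```python
import os


def secondary_file_path(primary_path, pattern):
    """Apply a CWL secondaryFiles pattern to a primary file path."""
    path = primary_path
    # Each leading caret removes one file extension from the primary path.
    while pattern.startswith("^"):
        pattern = pattern[1:]
        path, _ext = os.path.splitext(path)
    # The remainder of the pattern is appended as-is.
    return path + pattern


print(secondary_file_path("sample.chr22.bam", ".bai"))   # sample.chr22.bam.bai
print(secondary_file_path("sample.chr22.bam", "^.bai"))  # sample.chr22.bai
```

With the pattern used in this notebook, the index must therefore be named `<bam file name>.bai` and sit next to the BAM.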

In [6]:
%%yaml inputs

inputs:
#narrowPeak example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted \
- id: narrow_peak
  type: File

#bam example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam
- id: bam
  type: File
  sbg:fileTypes: BAM
  secondaryFiles:
  - pattern: .bai
    required: true

#chrom_sizes example_chr22/reference/chr22 \
- id: chr_sizes
  type: File
  required: true  

#regions_blocklist reference/wgEncodeHg19ConsensusSignalArtifactRegions.bed
- id: regions_blocklist
  type: File
  required: true
  sbg:fileTypes: BED

#regions_includelist example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.chr22.bed
- id: regions_includelist
  type: File
  required: true 
  sbg:fileTypes: BED


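
In CWL, whether an input is required is normally encoded in its type (`File` is required, `File?` is optional), so the `required: true` keys on the inputs are probably redundant. The sketch below flags keys that are neither standard nor namespaced. The field list is my reading of the CWL v1.2 `CommandInputParameter` schema and should be treated as an assumption; the helper names are hypothetical.

```python
# Assumed list of CommandInputParameter fields in CWL v1.2.
STANDARD_INPUT_FIELDS = {
    "id", "type", "label", "doc", "secondaryFiles", "streamable",
    "format", "loadContents", "loadListing", "default", "inputBinding",
}


def nonstandard_fields(input_def):
    """Return keys that are neither standard fields nor namespaced extensions."""
    return sorted(
        key for key in input_def
        if key not in STANDARD_INPUT_FIELDS and ":" not in key
    )


def is_optional(input_def):
    """A trailing '?' on the type marks an input as optional."""
    return str(input_def["type"]).endswith("?")


# Two of the inputs from this section, written as plain dictionaries.
chr_sizes = {"id": "chr_sizes", "type": "File", "required": True}
bam = {
    "id": "bam",
    "type": "File",
    "sbg:fileTypes": "BAM",
    "secondaryFiles": [{"pattern": ".bai", "required": True}],
}

print(nonstandard_fields(chr_sizes))  # ['required']
print(nonstandard_fields(bam))        # []
print(is_optional(chr_sizes))         # False
```

The `required: true` inside `secondaryFiles` is a different field and is part of the secondary file schema.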

# Section 3/6 Base command

In [7]:
%%yaml base_command
baseCommand:
- bash
- call_candidate_regions.sh


# Section 4/6 Requirements

Example of inline expressions that reference input properties:
```
$(inputs.bam.path) -n $(inputs.bam.nameroot).macs2
```
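
To show what such references resolve to, here is a rough Python imitation of the substitution. It handles only the simple `$(inputs.<id>.<field>)` form, not general JavaScript; `file_object` and `render` are hypothetical helpers and the `/data/...` path is made up for illustration.

```python
import os
import re


def file_object(path):
    """Build a minimal stand-in for a CWL File object."""
    basename = os.path.basename(path)
    # nameroot is the basename without its last extension.
    nameroot, nameext = os.path.splitext(basename)
    return {"path": path, "basename": basename,
            "nameroot": nameroot, "nameext": nameext}


def render(template, inputs):
    """Replace $(inputs.<id>.<field>) references with values from `inputs`."""
    def lookup(match):
        input_id, field = match.group(1), match.group(2)
        return inputs[input_id][field]
    return re.sub(r"\$\(inputs\.(\w+)\.(\w+)\)", lookup, template)


inputs = {"bam": file_object("/data/wgEncodeUwDnaseK562AlnRep1.chr22.bam")}
rendered = render("$(inputs.bam.path) -n $(inputs.bam.nameroot).macs2", inputs)
print(rendered)
# /data/wgEncodeUwDnaseK562AlnRep1.chr22.bam -n wgEncodeUwDnaseK562AlnRep1.chr22.macs2
```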

In [8]:
%%yaml requirements
requirements:
- class: ShellCommandRequirement
- class: DockerRequirement
  dockerPull: images.sb.biodatacatalyst.nhlbi.nih.gov/andrewblair/cardiac-compendium:2023042401
- class: InitialWorkDirRequirement
  listing:
  - entryname: call_candidate_regions.sh
    writable: false
    entry: |-
      #conda env create -f abcenv.yml

      python3 /usr/src/app/src/makeCandidateRegions.py \
      --narrowPeak $(inputs.narrow_peak.path) \
      --bam $(inputs.bam.path) \
      --outDir ./ \
      --chrom_sizes $(inputs.chr_sizes.path) \
      --regions_blocklist $(inputs.regions_blocklist.path) \
      --regions_includelist $(inputs.regions_includelist.path) \
      --peakExtendFromSummit 250 \
      --nStrongestPeaks 3000 
     
- class: InlineJavascriptRequirement


# Section 5/6 Outputs

In [9]:
%%yaml outputs
outputs:
- id: macs2_outputs
  type: File[]?
  outputBinding:
    glob: '*.macs2*'

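
The glob `*.macs2*` collects every file in the working directory whose name contains `.macs2`. The sketch below approximates that matching with Python's `fnmatch`; CWL runners use their own glob implementation, and the directory listing here is hypothetical.

```python
from fnmatch import fnmatchcase


def collect_outputs(filenames, pattern="*.macs2*"):
    """Return the file names that a glob-style pattern would select."""
    return [name for name in filenames if fnmatchcase(name, pattern)]


# Hypothetical working-directory listing after the job finishes.
listing = [
    "wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.candidateRegions.bed",
    "call_candidate_regions.sh",
    "job.err.log",
]

matched = collect_outputs(listing)
print(matched)
```

The pattern only works because the output names inherit `.macs2` from the input peak file name; a peak file named differently would produce outputs that this glob misses.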

# Section 6/6 Misc settings

In [10]:
%%yaml misc_settings
### Below is boilerplate that rarely changes
cwlVersion: v1.2
class: CommandLineTool

$namespaces:
  sbg: https://sevenbridges.com
  
hints:
- class: sbg:SaveLogs
  value: '*.sh'  

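
Each of the six cells produces a Python dictionary holding one or more top-level CWL keys. The notebook combines them with the `|` operator, which needs Python 3.9 or later and lets the right-hand dictionary win silently when a key is repeated. A hedged alternative that fails loudly on duplicates could look like this; `merge_sections` is a hypothetical helper and the dictionaries are abbreviated stand-ins for the real sections.

```python
def merge_sections(*sections):
    """Merge section dictionaries, refusing duplicate top-level keys."""
    merged = {}
    for section in sections:
        duplicates = merged.keys() & section.keys()
        if duplicates:
            raise ValueError(f"duplicate top-level keys: {sorted(duplicates)}")
        merged.update(section)
    return merged


# Abbreviated stand-ins for the dictionaries created by the %%yaml cells.
label = {"label": "MACS2 Call Candidate Regions"}
base_command = {"baseCommand": ["bash", "call_candidate_regions.sh"]}
misc_settings = {"cwlVersion": "v1.2", "class": "CommandLineTool"}

app = merge_sections(label, base_command, misc_settings)
print(sorted(app))  # ['baseCommand', 'class', 'cwlVersion', 'label']
```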

# Push tool to platform

In [11]:
api_token = getpass.getpass()
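
For non-interactive runs, the token can be read from the environment first and only prompted for as a fallback. This is a sketch under assumptions: `get_api_token` is a hypothetical helper, and `SB_AUTH_TOKEN` is assumed here as the variable name.

```python
import getpass
import os


def get_api_token(env=None, prompt=getpass.getpass, var="SB_AUTH_TOKEN"):
    """Return the token from the environment, or prompt for it if unset."""
    env = os.environ if env is None else env
    token = env.get(var)
    if token:
        return token
    # Fall back to a hidden interactive prompt.
    return prompt("Seven Bridges API token: ")


# Demonstration with a fake environment so nothing is prompted.
demo_token = get_api_token(env={"SB_AUTH_TOKEN": "example-token"})
print(demo_token == "example-token")  # True
```

In the notebook, `api_token = get_api_token()` would then replace the bare `getpass` call.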

In [16]:
api = sevenbridges.Api(url="https://api.sb.biodatacatalyst.nhlbi.nih.gov/v2", token=api_token)

In [17]:
api.apps.install_app(
    id='dave/abc-development-scratch-project/makecandidateregions/6',
    raw=label | inputs | base_command | requirements | outputs | misc_settings)

<App: id=dave/abc-development-scratch-project/makecandidateregions rev=6>
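
The id passed to `install_app` appears to follow the form `owner/project/app/revision`, and the returned object reports the same app at `rev=6`. A small sketch for taking such an id apart, for example to bump the revision on the next push, is shown here; `split_app_id` is a hypothetical helper, not part of `sevenbridges-python`.

```python
def split_app_id(app_id):
    """Split 'owner/project/app[/revision]' into its components."""
    parts = app_id.split("/")
    if len(parts) == 3:
        owner, project, name = parts
        revision = None
    elif len(parts) == 4:
        owner, project, name, rev = parts
        revision = int(rev)
    else:
        raise ValueError(f"unexpected app id: {app_id!r}")
    return {"owner": owner, "project": project, "name": name, "revision": revision}


parsed = split_app_id("dave/abc-development-scratch-project/makecandidateregions/6")
print(parsed["name"], parsed["revision"])  # makecandidateregions 6

# Build the id for the following revision.
next_id = "/".join(
    [parsed["owner"], parsed["project"], parsed["name"], str(parsed["revision"] + 1)]
)
print(next_id)  # dave/abc-development-scratch-project/makecandidateregions/7
```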

In [18]:
print(yaml.safe_dump(label | inputs | base_command | requirements | outputs | misc_settings))


$namespaces:
  sbg: https://sevenbridges.com
baseCommand:
- bash
- call_candidate_regions.sh
class: CommandLineTool
cwlVersion: v1.2
hints:
- class: sbg:SaveLogs
  value: '*.sh'
inputs:
- id: narrow_peak
  type: File
- id: bam
  sbg:fileTypes: BAM
  secondaryFiles:
  - pattern: .bai
    required: true
  type: File
- id: chr_sizes
  required: true
  type: File
- id: regions_blocklist
  required: true
  sbg:fileTypes: BED
  type: File
- id: regions_includelist
  required: true
  sbg:fileTypes: BED
  type: File
label: MACS2 Call Candidate Regions
outputs:
- id: macs2_outputs
  outputBinding:
    glob: '*.macs2*'
  type: File[]?
requirements:
- class: ShellCommandRequirement
- class: DockerRequirement
  dockerPull: images.sb.biodatacatalyst.nhlbi.nih.gov/andrewblair/cardiac-compendium:2023042401
- class: InitialWorkDirRequirement
  listing:
  - entry: '#conda env create -f abcenv.yml


      python3 /usr/src/app/src/makeCandidateRegions.py \

      --narrowPeak $(inputs.narrow_peak.path) \

# Push this notebook to the Files tab

In [15]:
upload = api.files.upload(
    path='macs2_call_candidate_regions.cwl.ipynb',
    overwrite=True,
    parent="644299a6dc22f20baf6998fc",
)


# Create Docker Image

```
# TODO: add Tabix install

FROM ubuntu:18.04

# Install required libraries
RUN apt-get update && apt-get install -y python python3 virtualenv python-pip python3-pip  zlib1g-dev zlib1g libbz2-dev liblzma-dev wget libncurses5-dev

# Set the working directory
WORKDIR /usr/src/app

# Setup the python requirements
RUN pip2 install --no-cache-dir numpy
RUN pip3 install Cython
RUN pip3 install --no-cache-dir numpy pandas scipy pyBigWig pyranges

# Setup samtools
RUN wget -O samtools-0.1.19.tar.bz2 https://sourceforge.net/projects/samtools/files/samtools/0.1.19/samtools-0.1.19.tar.bz2/download &&  tar xjf samtools-0.1.19.tar.bz2 && cd  /usr/src/app/samtools-0.1.19 &&  make -j 4

# Doesn't work?
# Setup tabix
# RUN wget -O tabix-0.2.5.tar.bz2 https://sourceforge.net/projects/samtools/files/tabix/tabix-0.2.5.tar.bz2/download &&  tar xjf tabix-0.2.5.tar.bz2 && cd  /usr/src/app/tabix-0.2.5 &&  make -j 4

# Update the path
ENV PATH=/usr/src/app/samtools-0.1.19/:${PATH}

# Setup bedtools
RUN wget -O bedtools-2.26.0.tar.gz https://github.com/arq5x/bedtools2/releases/download/v2.26.0/bedtools-2.26.0.tar.gz && tar xzf bedtools-2.26.0.tar.gz && cd /usr/src/app/bedtools2/ &&  make -j 12

# Update the path
ENV PATH=/usr/src/app/bedtools2/bin/:${PATH}

# Install python packages
#RUN pip2 install MACS2 && pip2 install progressbar &&  pip3 install progressbar
RUN pip3 install MACS2

# Copy the required scripts
COPY src/ src/
```

In [9]:
user_name = getpass.getpass()

In [11]:
%%bash -s "$myPythonVar" "$myOtherVar"
echo "This bash script knows about $1 and $2"

This bash script knows about $myPythonVar and $myOtherVar


In [17]:
%%bash -s "$user_name" "$api_token"
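# Quote the arguments and pass the token on stdin rather than with -p
printf '%s' "$2" | docker login images.sb.biodatacatalyst.nhlbi.nih.gov -u "$1" --password-stdin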

https://docs.docker.com/engine/reference/commandline/login/#credentials-store



Login Succeeded
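
As an alternative to the `%%bash` cell, the login can be driven from Python so that the token is sent on standard input and never appears in an argument list. This is a sketch, assuming the `docker` CLI is installed; `docker_login_args` and `docker_login` are hypothetical helpers.

```python
import subprocess

REGISTRY = "images.sb.biodatacatalyst.nhlbi.nih.gov"


def docker_login_args(user_name, registry=REGISTRY):
    """Build the docker login argv; the password is read from stdin."""
    return ["docker", "login", registry, "-u", user_name, "--password-stdin"]


def docker_login(user_name, api_token, registry=REGISTRY):
    """Run docker login, feeding the token through stdin."""
    return subprocess.run(
        docker_login_args(user_name, registry),
        input=api_token,
        text=True,
        check=True,  # raise CalledProcessError if the login fails
    )


# Only the argument list is printed here; calling docker_login() needs docker.
login_args = docker_login_args("my-user")
print(login_args)
```

In the notebook this would be called as `docker_login(user_name, api_token)`.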


In [20]:
%%bash
docker images

REPOSITORY                                                               TAG          IMAGE ID       CREATED          SIZE
images.sb.biodatacatalyst.nhlbi.nih.gov/andrewblair/cardiac-compendium   2023042401   6849e918aac7   23 minutes ago   985MB


In [23]:
%%bash
docker push images.sb.biodatacatalyst.nhlbi.nih.gov/andrewblair/cardiac-compendium:2023042401

The push refers to repository [images.sb.biodatacatalyst.nhlbi.nih.gov/andrewblair/cardiac-compendium]
ec3ce3022ee3: Preparing
460d6060cde9: Preparing
28c24111cd91: Preparing
1d7de5e7fedc: Preparing
ee70ad3ab83e: Preparing
f9772a127b90: Preparing
66817e1a8325: Preparing
6ec50e2215b4: Preparing
0ffaed1c3fac: Preparing
b7e0fa7bfe7f: Preparing
f9772a127b90: Waiting
66817e1a8325: Waiting
6ec50e2215b4: Waiting
0ffaed1c3fac: Waiting
b7e0fa7bfe7f: Waiting
ec3ce3022ee3: Pushed
1d7de5e7fedc: Pushed
460d6060cde9: Pushed
6ec50e2215b4: Pushed
f9772a127b90: Pushed
66817e1a8325: Pushed
28c24111cd91: Pushed
b7e0fa7bfe7f: Pushed
ee70ad3ab83e: Pushed
0ffaed1c3fac: Pushed
2023042401: digest: sha256:2f8d5d06c170adff171383d70bbb24e3429975bb738f7495ed92785ac4f2ed31 size: 2427
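
The digest printed at the end of the push identifies the exact image contents, whereas a tag can later be moved to a different image. Docker accepts references of the form `name@sha256:...`, so the tag can be swapped for the digest when reproducibility matters. The sketch below builds such a reference; `pin_by_digest` is a hypothetical helper, and whether the platform accepts digest references in `dockerPull` is an assumption to verify.

```python
def pin_by_digest(image_ref, digest):
    """Replace the tag of an image reference with a content digest."""
    if not digest.startswith("sha256:"):
        raise ValueError("expected a sha256 digest")
    name = image_ref
    last_slash = image_ref.rfind("/")
    last_colon = image_ref.rfind(":")
    # A colon after the last slash separates the tag; an earlier colon
    # would belong to a registry port and must be kept.
    if last_colon > last_slash:
        name = image_ref[:last_colon]
    return f"{name}@{digest}"


pinned = pin_by_digest(
    "images.sb.biodatacatalyst.nhlbi.nih.gov/andrewblair/cardiac-compendium:2023042401",
    "sha256:2f8d5d06c170adff171383d70bbb24e3429975bb738f7495ed92785ac4f2ed31",
)
print(pinned)
```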


# References

[Seven Bridges API reference: add an app using raw CWL](https://docs.sevenbridges.com/reference/add-an-app-using-raw-cwl)