# Introduction

This notebook walks through wrapping the second part of step 1 of the ABC Enhancer Gene Prediction model as a CWL tool.

[broadinstitute/ABC-Enhancer-Gene-Prediction: Cell type specific enhancer-gene predictions using ABC model (Fulco, Nasser et al, Nature Genetics 2019)](https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction)  

We wrap the MACS2 call-candidate-regions step (`makeCandidateRegions.py`). The example command in the GitHub repo is:

```
conda env create -f abcenv.yml

python src/makeCandidateRegions.py \
--narrowPeak example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted \
--bam example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam \
--outDir example_chr22/ABC_output/Peaks/ \
--chrom_sizes example_chr22/reference/chr22 \
--regions_blocklist reference/wgEncodeHg19ConsensusSignalArtifactRegions.bed \
--regions_includelist example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.chr22.bed \
--peakExtendFromSummit 250 \
--nStrongestPeaks 3000 

```

# Load packages

In [1]:
pip install yamlmagic pyyaml

Collecting yamlmagic
  Downloading yamlmagic-0.2.0-py2.py3-none-any.whl (5.5 kB)
Installing collected packages: yamlmagic
Successfully installed yamlmagic-0.2.0
Note: you may need to restart the kernel to use updated packages.


In [2]:
import yaml

In [3]:
%load_ext yamlmagic
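
The `%%yaml NAME` cell magic used in the sections below parses the cell body as YAML and stores the result in a Python variable called `NAME`. A minimal sketch of a roughly equivalent plain-Python step, assuming PyYAML's `yaml.safe_load`; the `cell_body` string is a stand-in for a notebook cell and is not part of the original notebook:

```python
import yaml

# Stand-in for the body of a `%%yaml label` cell (illustrative only).
cell_body = """
label: MACS2 Call Candidate Regions
"""

# Parse the YAML text into a plain Python dict, as the magic would
# before binding it to the variable named on the magic line.
label = yaml.safe_load(cell_body)
print(label)
```

Because each section ends up as an ordinary dict, the sections can later be merged into one tool description.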

In [4]:
from sevenbridges import Api, ImportExportState
import time
import json
import importlib
import getpass

# Section 1/6 - Tool Label

In [5]:
%%yaml label
label: MACS2 Call Candidate Regions


# Section 2/6 - Tool Inputs

Example of an input definition:
```
- id: bam
  type: File
  secondaryFiles:
  - pattern: .bai
    required: true
  sbg:fileTypes: BAM
```
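
The `secondaryFiles` pattern in the example tells the runner which companion file must travel with the BAM. In CWL, a pattern without a leading caret is appended to the primary file name, and each leading `^` first strips one extension. A small sketch of that rule; the function name and the `sample.bam` file name are made up for illustration:

```python
def secondary_file_name(primary: str, pattern: str) -> str:
    """Apply a CWL secondaryFiles pattern to a primary file name."""
    name = primary
    # Each leading caret removes one extension from the primary name.
    while pattern.startswith("^"):
        pattern = pattern[1:]
        if "." in name:
            name = name[: name.rindex(".")]
    # The remainder of the pattern is appended to what is left.
    return name + pattern


print(secondary_file_name("sample.bam", ".bai"))   # sample.bam.bai
print(secondary_file_name("sample.bam", "^.bai"))  # sample.bai
```

With the `.bai` pattern used here, the index is expected next to the BAM as `<name>.bam.bai`.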

In [6]:
%%yaml inputs

inputs:
#narrowPeak example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted \
- id: narrow_peak
  type: File

#bam example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam
- id: bam
  type: File
  sbg:fileTypes: BAM
  secondaryFiles:
  - pattern: .bai
    required: true

#chrom_sizes example_chr22/reference/chr22 \
- id: chr_sizes
  type: File
  required: true  

#regions_blocklist reference/wgEncodeHg19ConsensusSignalArtifactRegions.bed
- id: regions_blocklist
  type: File
  required: true
  sbg:fileTypes: BED

#regions_includelist example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.chr22.bed
- id: regions_includelist
  type: File
  required: true 
  sbg:fileTypes: BED



# Section 3/6 - Base command

In [7]:
%%yaml base_command
baseCommand:
- bash
- call_candidate_regions.sh


# Section 4/6 - Requirements

Example of an inline JavaScript expression:
```
$(inputs.bam.path) -n $(inputs.bam.nameroot).macs2
```
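
In the expression above, `path` is the full location of the input file and `nameroot` is its base name without the final extension. A rough Python sketch of how those fields relate, using `os.path` as a stand-in for the CWL runner; the `/data/sample.chr22.bam` path and the helper name are hypothetical:

```python
import os.path


def cwl_file_fields(path: str) -> dict:
    """Approximate the basename, nameroot and nameext fields of a CWL File."""
    basename = os.path.basename(path)
    # splitext removes only the last extension, which matches nameroot/nameext.
    nameroot, nameext = os.path.splitext(basename)
    return {
        "path": path,
        "basename": basename,
        "nameroot": nameroot,
        "nameext": nameext,
    }


bam = cwl_file_fields("/data/sample.chr22.bam")
# Mirrors: $(inputs.bam.path) -n $(inputs.bam.nameroot).macs2
print(f"{bam['path']} -n {bam['nameroot']}.macs2")
```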

In [8]:
%%yaml requirements
requirements:
- class: ShellCommandRequirement
- class: DockerRequirement
  dockerPull: quay.io/jnasser/abc-container
- class: InitialWorkDirRequirement
  listing:
  - entryname: call_candidate_regions.sh
    writable: false
    entry: |-
      #conda env create -f abcenv.yml

      python3 /usr/src/app/src/makeCandidateRegions.py \
      --narrowPeak $(inputs.narrow_peak.path) \
      --bam $(inputs.bam.path) \
      --outDir ./ \
      --chrom_sizes $(inputs.chr_sizes.path) \
      --regions_blocklist $(inputs.regions_blocklist.path) \
      --regions_includelist $(inputs.regions_includelist.path) \
      --peakExtendFromSummit 250 \
      --nStrongestPeaks 3000 
     
- class: InlineJavascriptRequirement
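
Before the script in `InitialWorkDirRequirement` is written to the working directory, the runner replaces each `$(inputs.<id>.path)` reference with the staged path of that input. The sketch below imitates that substitution with a regular expression. It is not the real CWL expression evaluator, and the paths and helper name are invented for illustration:

```python
import re


def fill_parameter_references(template: str, inputs: dict) -> str:
    """Replace $(inputs.<id>.path) with the path recorded for that input."""

    def lookup(match: re.Match) -> str:
        return inputs[match.group(1)]["path"]

    return re.sub(r"\$\(inputs\.(\w+)\.path\)", lookup, template)


template = "--narrowPeak $(inputs.narrow_peak.path) --bam $(inputs.bam.path)"
# Hypothetical staged paths; a real runner chooses these locations.
inputs = {
    "narrow_peak": {"path": "/data/peaks.narrowPeak.sorted"},
    "bam": {"path": "/data/sample.bam"},
}
rendered = fill_parameter_references(template, inputs)
print(rendered)
```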


# Section 5/6 - Outputs

In [9]:
%%yaml outputs
outputs:
- id: macs2_outputs
  type: File[]?
  outputBinding:
    glob: '*.macs2*'
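
The `glob` pattern collects every file in the working directory whose name contains `.macs2`. A quick way to check which names a pattern would pick up is Python's `fnmatch`, which is close to, but not identical to, the POSIX glob matching that CWL uses. The file names below are hypothetical:

```python
from fnmatch import fnmatchcase

pattern = "*.macs2*"

# Hypothetical working-directory contents after the tool has run.
candidates = [
    "sample.macs2_peaks.candidateRegions.bed",
    "sample.bam",
    "call_candidate_regions.sh",
]

# Keep only the names that the pattern matches (case-sensitive).
matched = [name for name in candidates if fnmatchcase(name, pattern)]
print(matched)
```

Note that this glob only captures outputs when the input peak file name itself contains `.macs2`, as it does in the example data.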


# Section 6/6 - Misc settings

In [10]:
%%yaml misc_settings
### Below is boilerplate that rarely changes
cwlVersion: v1.2
class: CommandLineTool

$namespaces:
  sbg: https://sevenbridges.com
  
hints:
- class: sbg:SaveLogs
  value: '*.sh'  


# Push tool to platform

In [11]:
api_token = getpass.getpass()



In [12]:
api = Api(url = "https://api.sb.biodatacatalyst.nhlbi.nih.gov/v2",  token = api_token)

In [13]:
api.apps.install_app(
    id='dave/abc-development-scratch-project/makecandidateregions/5',
    raw=label | inputs | base_command | requirements | outputs | misc_settings)

<App: id=dave/abc-development-scratch-project/makecandidateregions rev=5>
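
The `raw` argument above is built by merging the per-section dicts with the `|` operator, which is available for dicts from Python 3.9 onward. A small sketch with trimmed-down stand-ins for the section dicts, plus an unpacking form that also works on older interpreters:

```python
# Trimmed-down stand-ins for the dicts produced by the %%yaml cells.
label = {"label": "MACS2 Call Candidate Regions"}
base_command = {"baseCommand": ["bash", "call_candidate_regions.sh"]}
misc_settings = {"cwlVersion": "v1.2", "class": "CommandLineTool"}

# Dict union (Python 3.9+): on duplicate keys the right-hand dict wins.
merged = label | base_command | misc_settings

# Same result via unpacking, for Python 3.5 to 3.8.
merged_compat = {**label, **base_command, **misc_settings}

print(sorted(merged))
```

Since every section uses a distinct top-level key, the merge order does not change the result here.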

In [14]:
print(yaml.safe_dump(label | inputs | base_command | requirements | outputs | misc_settings))


$namespaces:
  sbg: https://sevenbridges.com
baseCommand:
- bash
- call_candidate_regions.sh
class: CommandLineTool
cwlVersion: v1.2
hints:
- class: sbg:SaveLogs
  value: '*.sh'
inputs:
- id: narrow_peak
  type: File
- id: bam
  sbg:fileTypes: BAM
  secondaryFiles:
  - pattern: .bai
    required: true
  type: File
- id: chr_sizes
  required: true
  type: File
- id: regions_blocklist
  required: true
  sbg:fileTypes: BED
  type: File
- id: regions_includelist
  required: true
  sbg:fileTypes: BED
  type: File
label: MACS2 Call Candidate Regions
outputs:
- id: macs2_outputs
  outputBinding:
    glob: '*.macs2*'
  type: File[]?
requirements:
- class: ShellCommandRequirement
- class: DockerRequirement
  dockerPull: quay.io/jnasser/abc-container
- class: InitialWorkDirRequirement
  listing:
  - entry: '#conda env create -f abcenv.yml


      python3 /usr/src/app/src/makeCandidateRegions.py \

      --narrowPeak $(inputs.narrow_peak.path) \

      --bam $(inputs.bam.path) \

      --outDir .

# Push this notebook to files tab

In [15]:
upload = api.files.upload(
    path='macs2_call_candidate_regions.cwl.ipynb',
    overwrite=True,
    parent = "644299a6dc22f20baf6998fc",
)


# Create Docker Image

```
ARG BASE_CONTAINER=jupyter/datascience-notebook:latest
FROM $BASE_CONTAINER

USER root
RUN apt-get update && apt-get install -y \
    default-jdk \
    gawk \
    gcc \
    git \
    libz-dev \
    locales \
    make \
    unzip \
    bzip2 \
    libbz2-dev \
    zlib1g-dev \
    zlib1g \
    liblzma-dev \
    wget \
    libncurses5-dev \
&& rm -rf /var/lib/apt/lists/*

# GAWK has the 'and' function, needed for chimeric_blacklist
RUN echo 'alias awk=gawk' >> ~/.bashrc

# Need to be sure we have this for stats
RUN locale-gen en_US.UTF-8

WORKDIR /opt/

ADD https://github.com/lh3/bwa/archive/v0.7.17.zip .
RUN unzip v0.7.17.zip 
RUN cd bwa-0.7.17/ && make
RUN ln -s bwa-0.7.17/bwa bwa

RUN conda install -c bioconda/label/cf201901 samtools
RUN conda install -c bioconda tabix
RUN conda install -c bioconda bedtools
RUN conda install -c bioconda pyranges

RUN git clone https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction.git
RUN conda env create -f ABC-Enhancer-Gene-Prediction/macs.yml
RUN conda env create -f ABC-Enhancer-Gene-Prediction/abcenv.yml

RUN git clone https://github.com/aidenlab/juicer.git
```

```
# TO DO
# Add Tabix install

FROM ubuntu:18.04

# Install required libraries
RUN apt-get update && apt-get install -y python python3 virtualenv python-pip python3-pip  zlib1g-dev zlib1g libbz2-dev liblzma-dev wget libncurses5-dev

# Set the working directory
WORKDIR /usr/src/app

# Setup the python requirements
RUN pip2 install --no-cache-dir numpy
RUN pip3 install Cython
RUN pip3 install --no-cache-dir numpy pandas scipy pyBigWig pyranges

# Setup samtools
RUN wget -O samtools-0.1.19.tar.bz2 https://sourceforge.net/projects/samtools/files/samtools/0.1.19/samtools-0.1.19.tar.bz2/download &&  tar xjf samtools-0.1.19.tar.bz2 && cd  /usr/src/app/samtools-0.1.19 &&  make -j 4

# Doesn't work?
# Setup tabix
# RUN wget -O tabix-0.2.5.tar.bz2 https://sourceforge.net/projects/samtools/files/tabix/tabix-0.2.5.tar.bz2/download &&  tar xjf tabix-0.2.5.tar.bz2 && cd  /usr/src/app/tabix-0.2.5 &&  make -j 4

# Update the path
ENV PATH=/usr/src/app/samtools-0.1.19/:${PATH}

# Setup bedtools
RUN wget -O bedtools-2.26.0.tar.gz https://github.com/arq5x/bedtools2/releases/download/v2.26.0/bedtools-2.26.0.tar.gz && tar xzf bedtools-2.26.0.tar.gz && cd /usr/src/app/bedtools2/ &&  make -j 12

# Update the path
ENV PATH=/usr/src/app/bedtools2/bin/:${PATH}

# Install python packages
#RUN pip2 install MACS2 && pip2 install progressbar &&  pip3 install progressbar
RUN pip3 install MACS2

# Copy the required scripts
COPY src/ src/
```

# References

https://docs.sevenbridges.com/reference/add-an-app-using-raw-cwl