# ATAC-seq Pipeline
## Quick-Start Guide
This is a quick-start guide to the ATAC-seq pipeline. The pipeline is structured as a Python library that can be imported with:

In [130]:
import ATACseqPipeline
import importlib
importlib.reload(ATACseqPipeline)

<module 'ATACseqPipeline' from '/gpfs/group/home/snagaraja/ATACseqPipeline/core/ATACseqPipeline.py'>


The sample sheet must be a tab-delimited file containing the following columns:

- SampleName: a unique name for every FASTQ R1/R2 pair. Replicates of the same sample must still have unique names.
- Read1: path to the read 1 FASTQ file.
- Read2: path to the read 2 FASTQ file.
- Status: the sample name. Replicates must share the same status.
- C/T: whether the sample is a control or a treatment. The example sheet printed below uses the header `CT` for this column.
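The requirements above can be checked before handing a sheet to the pipeline. The following is a minimal sketch, not part of ATACseqPipeline: `check_sample_sheet` is a hypothetical helper, the control/treatment header is assumed to be spelled `CT`, and the FASTQ file names are placeholders.

```python
import csv
import io

# Assumed header names; the control/treatment column is assumed to be "CT".
REQUIRED_COLUMNS = ["SampleName", "Read1", "Read2", "Status", "CT"]


def check_sample_sheet(handle, required=REQUIRED_COLUMNS):
    """Return a list of problems found in a tab-delimited sample sheet."""
    reader = csv.DictReader(handle, delimiter="\t")
    problems = []
    # Every required column must appear in the header row.
    missing = [c for c in required if c not in (reader.fieldnames or [])]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
        return problems
    # Every FASTQ R1/R2 pair needs its own SampleName.
    seen = set()
    for row in reader:
        name = row["SampleName"]
        if name in seen:
            problems.append("duplicate SampleName: " + name)
        seen.add(name)
    return problems


# Placeholder sheet with two replicates that share a Status.
example = io.StringIO(
    "SampleName\tRead1\tRead2\tStatus\tCT\n"
    "shCD4_1\tshCD4_1_R1.fastq.gz\tshCD4_1_R2.fastq.gz\tshCD4\tC\n"
    "shCD4_2\tshCD4_2_R1.fastq.gz\tshCD4_2_R2.fastq.gz\tshCD4\tC\n"
)
print(check_sample_sheet(example))  # an empty list means no problems were found
```

To check a real sheet, pass an open file handle instead of the `io.StringIO` object.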

The Pipeline object sets up an instance of the pipeline. It can be configured from a sample sheet as shown below. Note that setting dry_run=True when the pipeline is created causes the shell scripts to be created but not submitted through SLURM.

In [132]:
myexp = ATACseqPipeline.Pipeline(data_path='/gpfs/home/snagaraja/Exp122_Chd7', dry_run=True, app_path='/gpfs/home/snagaraja/ATACseqPipeline/core')
myexp.from_ssheet(ssheet_path='/gpfs/home/snagaraja/ATACseqPipeline/data/exp122_ssheet.txt')

/gpfs/group/home/snagaraja/ATACseqPipeline/core
/gpfs/home/snagaraja/Exp122_Chd7/submissions
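To make the effect of dry_run concrete, here is a minimal sketch of the write-then-maybe-submit pattern. It is not the actual ATACseqPipeline implementation: `write_job` and the script contents are hypothetical. Only the `<step>_<sample>.sh` naming follows the "created ..." messages shown in this guide.

```python
import subprocess
import tempfile
from pathlib import Path


def write_job(script_dir, step, sample, command, dry_run=True):
    """Write a submission script; submit it with sbatch only if dry_run is False."""
    script_dir = Path(script_dir)
    script_dir.mkdir(parents=True, exist_ok=True)
    script = script_dir / f"{step}_{sample}.sh"
    script.write_text("#!/bin/bash\n" + command + "\n")
    print("created", script)
    if not dry_run:
        # Needs a SLURM cluster; sbatch reports "Submitted batch job <id>".
        subprocess.run(["sbatch", str(script)], check=True)
    return script


# Dry run in a temporary directory: the script is written, nothing is submitted.
with tempfile.TemporaryDirectory() as tmp:
    path = write_job(tmp, "align_fastqs", "shCD4_1", "echo aligning", dry_run=True)
    print(path.name)
```

With dry_run=True the generated scripts can be inspected before anything reaches the scheduler.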


In [133]:
import pandas as pd 
sampsheet = pd.read_csv('/gpfs/home/snagaraja/ATACseqPipeline/data/exp122_ssheet.txt', sep='\t')
print(sampsheet)

   SampleName                                              Read1  \
0     shCD4_1  /gpfs/home/snagaraja/ATACseqPipeline/data/Exp1...   
1     shCD4_2  /gpfs/home/snagaraja/ATACseqPipeline/data/Exp1...   
2      shCD19  /gpfs/home/snagaraja/ATACseqPipeline/data/Exp1...   
3  shChd7.1_1  /gpfs/home/snagaraja/ATACseqPipeline/data/Exp1...   
4  shChd7.1_2  /gpfs/home/snagaraja/ATACseqPipeline/data/Exp1...   
5  shChd7.2_1  /gpfs/home/snagaraja/ATACseqPipeline/data/Exp1...   
6  shChd7.2_2  /gpfs/home/snagaraja/ATACseqPipeline/data/Exp1...   
7  shChd7.3_1  /gpfs/home/snagaraja/ATACseqPipeline/data/Exp1...   
8  shChd7.3_2  /gpfs/home/snagaraja/ATACseqPipeline/data/Exp1...   

                                               Read2    Status CT Batch   
0  /gpfs/home/snagaraja/ATACseqPipeline/data/Exp1...     shCD4  C    One  
1  /gpfs/home/snagaraja/ATACseqPipeline/data/Exp1...     shCD4  C    Two  
2  /gpfs/home/snagaraja/ATACseqPipeline/data/Exp1...    shCD19  C    One  
3  /gpfs/home/snaga

Now uncomment each of the following lines of code and run them one at a time. You can monitor your jobs with `squeue -u username`. For now, you must run a single step and start the next one only after the previous step has finished. Future versions will run all jobs automatically.

In [134]:
# myexp.configure_envs()

In [135]:
myexp.align_fastqs(genome_path='/gpfs/group/pipkin/tvenable/MusRef/MouseRefseq39_bowtie')

/gpfs/home/snagaraja/Exp122_Chd7/bams/
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/align_fastqs_shCD4_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/align_fastqs_shCD4_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/align_fastqs_shCD19.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/align_fastqs_shChd7.1_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/align_fastqs_shChd7.1_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/align_fastqs_shChd7.2_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/align_fastqs_shChd7.2_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/align_fastqs_shChd7.3_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/align_fastqs_shChd7.3_2.sh
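Because each step must finish before the next one starts, it can help to block until the queue is empty rather than checking `squeue` by hand. The sketch below is an assumption-laden helper and not part of ATACseqPipeline: `count_jobs` and `wait_for_jobs` are hypothetical names, and the code assumes `squeue` is on the PATH and that every job listed for the user belongs to the current step.

```python
import subprocess
import time


def count_jobs(username, run=subprocess.run):
    """Count the user's pending or running jobs using `squeue -h -u <user>`."""
    result = run(["squeue", "-h", "-u", username],
                 capture_output=True, text=True, check=True)
    # With -h (no header), every non-empty line is one job.
    return sum(1 for line in result.stdout.splitlines() if line.strip())


def wait_for_jobs(username, poll_seconds=60, run=subprocess.run, sleep=time.sleep):
    """Block until squeue lists no jobs for the user."""
    while count_jobs(username, run=run) > 0:
        sleep(poll_seconds)


# Example usage on the cluster ("username" is a placeholder):
# myexp.align_fastqs(genome_path=...)
# wait_for_jobs("username")
# myexp.remove_mito()
```

The `run` and `sleep` parameters exist only so the helper can be exercised without a cluster. On SLURM itself they can be left at their defaults.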


In [136]:
myexp.remove_mito()

/gpfs/home/snagaraja/Exp122_Chd7/bams_noDups_noMito/
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/remove_mito_shCD4_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/remove_mito_shCD4_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/remove_mito_shCD19.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/remove_mito_shChd7.1_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/remove_mito_shChd7.1_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/remove_mito_shChd7.2_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/remove_mito_shChd7.2_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/remove_mito_shChd7.3_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/remove_mito_shChd7.3_2.sh


In [137]:
myexp.remove_duplicates()

created /gpfs/home/snagaraja/Exp122_Chd7/submissions/remove_duplicates_shCD4_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/remove_duplicates_shCD4_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/remove_duplicates_shCD19.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/remove_duplicates_shChd7.1_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/remove_duplicates_shChd7.1_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/remove_duplicates_shChd7.2_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/remove_duplicates_shChd7.2_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/remove_duplicates_shChd7.3_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/remove_duplicates_shChd7.3_2.sh


In [138]:
myexp.call_peaks_macs2()

created /gpfs/home/snagaraja/Exp122_Chd7/submissions/call_peaks_macs2_shCD4_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/call_peaks_macs2_shCD4_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/call_peaks_macs2_shCD19.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/call_peaks_macs2_shChd7.1_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/call_peaks_macs2_shChd7.1_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/call_peaks_macs2_shChd7.2_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/call_peaks_macs2_shChd7.2_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/call_peaks_macs2_shChd7.3_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/call_peaks_macs2_shChd7.3_2.sh


In [139]:
myexp.filter_blacklist('/gpfs/home/snagaraja/ATACseqPipeline/refs/blacklist.bed')

created /gpfs/home/snagaraja/Exp122_Chd7/submissions/filter_blacklist_shCD4_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/filter_blacklist_shCD4_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/filter_blacklist_shCD19.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/filter_blacklist_shChd7.1_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/filter_blacklist_shChd7.1_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/filter_blacklist_shChd7.2_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/filter_blacklist_shChd7.2_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/filter_blacklist_shChd7.3_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/filter_blacklist_shChd7.3_2.sh


In [140]:
myexp.get_insert_sizes()

created /gpfs/home/snagaraja/Exp122_Chd7/submissions/get_insert_sizes_shCD4_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/get_insert_sizes_shCD4_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/get_insert_sizes_shCD19.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/get_insert_sizes_shChd7.1_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/get_insert_sizes_shChd7.1_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/get_insert_sizes_shChd7.2_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/get_insert_sizes_shChd7.2_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/get_insert_sizes_shChd7.3_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/get_insert_sizes_shChd7.3_2.sh


In [141]:
myexp.plot_insert_sizes()

created /gpfs/home/snagaraja/Exp122_Chd7/submissions/plot_insert_sizes.sh


In [142]:
myexp.merge_peaks()

created /gpfs/home/snagaraja/Exp122_Chd7/submissions/merge_peaks.sh


In [143]:
myexp.peak_counts()

created /gpfs/home/snagaraja/Exp122_Chd7/submissions/peak_counts.sh


In [144]:
myexp.fastqc()

created /gpfs/home/snagaraja/Exp122_Chd7/submissions/fastqc_shCD4_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/fastqc_shCD4_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/fastqc_shCD19.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/fastqc_shChd7.1_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/fastqc_shChd7.1_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/fastqc_shChd7.2_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/fastqc_shChd7.2_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/fastqc_shChd7.3_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/fastqc_shChd7.3_2.sh


In [145]:
myexp.multiqc_report()

created /gpfs/home/snagaraja/Exp122_Chd7/submissions/multiqc.sh


In [146]:
myexp.bam_to_bed()

created /gpfs/home/snagaraja/Exp122_Chd7/submissions/bam_to_bed_shCD4_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/bam_to_bed_shCD4_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/bam_to_bed_shCD19.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/bam_to_bed_shChd7.1_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/bam_to_bed_shChd7.1_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/bam_to_bed_shChd7.2_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/bam_to_bed_shChd7.2_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/bam_to_bed_shChd7.3_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/bam_to_bed_shChd7.3_2.sh


In [147]:
myexp.merge_peaks()

created /gpfs/home/snagaraja/Exp122_Chd7/submissions/merge_peaks.sh


In [148]:
myexp.peak_counts()

created /gpfs/home/snagaraja/Exp122_Chd7/submissions/peak_counts.sh


In [149]:
myexp.bed_to_gtf()

created /gpfs/home/snagaraja/Exp122_Chd7/submissions/bed_to_gtf.sh


In [150]:
myexp.peak_counts()

created /gpfs/home/snagaraja/Exp122_Chd7/submissions/peak_counts.sh


In [151]:
myexp.shift_peaks('/gpfs/home/snagaraja/ATACseqPipeline/core/GCF_000001635.27_GRCm39_genomic.size')

created /gpfs/home/snagaraja/Exp122_Chd7/submissions/shift_peaks_shCD4_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/shift_peaks_shCD4_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/shift_peaks_shCD19.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/shift_peaks_shChd7.1_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/shift_peaks_shChd7.1_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/shift_peaks_shChd7.2_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/shift_peaks_shChd7.2_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/shift_peaks_shChd7.3_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/shift_peaks_shChd7.3_2.sh


In [152]:
myexp.bam_to_bigWig()

created /gpfs/home/snagaraja/Exp122_Chd7/submissions/bam_to_bigWig_shCD4_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/bam_to_bigWig_shCD4_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/bam_to_bigWig_shCD19.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/bam_to_bigWig_shChd7.1_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/bam_to_bigWig_shChd7.1_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/bam_to_bigWig_shChd7.2_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/bam_to_bigWig_shChd7.2_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/bam_to_bigWig_shChd7.3_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7/submissions/bam_to_bigWig_shChd7.3_2.sh


In [153]:
print("done")

done


In [154]:
myexp.main()

Submitted batch job 25148337

Submitted batch job 25148338

Submitted batch job 25148339

Submitted batch job 25148340

Submitted batch job 25148341

Submitted batch job 25148342

Submitted batch job 25148343

Submitted batch job 25148344

Submitted batch job 25148345

Submitted batch job 25148346

Submitted batch job 25148347

Submitted batch job 25148348

Submitted batch job 25148349

Submitted batch job 25148350

Submitted batch job 25148351

Submitted batch job 25148352

Submitted batch job 25148353

Submitted batch job 25148354

Submitted batch job 25148355

Submitted batch job 25148356

Submitted batch job 25148357

Submitted batch job 25148358

Submitted batch job 25148359

Submitted batch job 25148360

Submitted batch job 25148361

Submitted batch job 25148362

Submitted batch job 25148363

Submitted batch job 25148364

Submitted batch job 25148365

Submitted batch job 25148366

Submitted batch job 25148367

Submitted batch job 25148368

Submitted batch job 25148369

Submitted 

In [2]:
a = ["hello", "hi", "shalom"]
for idx, sample in enumerate(a):
    print(idx, sample)

0 hello
1 hi
2 shalom


In [4]:
mydict = {'a': 1, 'b': 2, 'c': 3}
mydict[1]

KeyError: 1