# ATAC-seq Pipeline
## Quick-Start Guide
This is a quick-start guide to the ATAC-seq pipeline. The pipeline is structured as a Python library that can be imported with:

In [13]:
import ATACseqPipeline
import importlib
importlib.reload(ATACseqPipeline)

<module 'ATACseqPipeline' from '/gpfs/group/home/snagaraja/ATACseqPipeline/core/ATACseqPipeline.py'>


The sample sheet must be a tab-delimited file containing the following columns: SampleName (a unique name for every FASTQ R1/R2 pair; replicates of the same sample must still have unique names), Read1 (path to the read 1 FASTQ file), Read2 (path to the read 2 FASTQ file), Status (the sample's group name; replicates must share the same Status), and CT (whether the sample is a control or a treatment; controls are marked C in the example sheet).
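Before handing a sheet to the pipeline, it can help to check it against the rules above. The following is a minimal sketch using only the standard library, not part of ATACseqPipeline itself. The function name `check_sample_sheet`, the placeholder paths, and the assumption that treatments are marked `T` are illustrative; the `CT` column name follows the header printed from the example sheet.

```python
import csv
import io

# Columns named in the guide; "CT" follows the header printed from the example sheet.
REQUIRED_COLUMNS = ["SampleName", "Read1", "Read2", "Status", "CT"]


def check_sample_sheet(handle):
    """Return a list of problems found in a tab-delimited sample sheet."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]

    problems = []
    seen = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        name = row["SampleName"]
        if name in seen:
            problems.append(f"line {line_no}: duplicate SampleName {name!r}")
        seen.add(name)
        # Assumption: controls are "C" (as in the example sheet) and treatments are "T".
        if row["CT"] not in {"C", "T"}:
            problems.append(f"line {line_no}: CT should be C or T, got {row['CT']!r}")
    return problems


# Hypothetical two-row sheet; the paths are placeholders, not real files.
example = (
    "SampleName\tRead1\tRead2\tStatus\tCT\n"
    "ctrl_1\t/path/to/ctrl_1_R1.fastq.gz\t/path/to/ctrl_1_R2.fastq.gz\tctrl\tC\n"
    "ctrl_2\t/path/to/ctrl_2_R1.fastq.gz\t/path/to/ctrl_2_R2.fastq.gz\tctrl\tC\n"
)
print(check_sample_sheet(io.StringIO(example)))  # [] means no problems were found
```

To check a real sheet, open the file and pass the handle, for example `check_sample_sheet(open(path, newline=""))`.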

The Pipeline object sets up an instance of the pipeline for one experiment. It can be configured from a sample sheet as shown below. Note that setting dry_run=True when the pipeline is created causes the shell scripts to be created but not submitted through SLURM.
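To make the dry-run behaviour concrete, here is a rough sketch of the general pattern: the script is always written to disk, and the `sbatch` call is skipped when `dry_run` is set. The helper `write_and_maybe_submit` and the script contents are hypothetical and are not taken from the pipeline's source; how ATACseqPipeline implements this internally may differ.

```python
import subprocess
import tempfile
from pathlib import Path


def write_and_maybe_submit(script_text, script_path, dry_run=True):
    """Write a job script; submit it with sbatch unless dry_run is set.

    Hypothetical helper for illustration, not the pipeline's own code.
    """
    script_path = Path(script_path)
    script_path.parent.mkdir(parents=True, exist_ok=True)
    script_path.write_text(script_text)
    print(f"created {script_path}")
    if dry_run:
        return None  # the script stays on disk for inspection; nothing is queued
    # Only reached on a machine where SLURM's sbatch is available.
    result = subprocess.run(
        ["sbatch", str(script_path)], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "submissions" / "example_step.sh"
    outcome = write_and_maybe_submit("#!/bin/bash\necho hello\n", path, dry_run=True)
    print(outcome)        # None, because nothing was submitted
    print(path.exists())  # True, because the script was still written
```

A dry run is a convenient way to read through the generated scripts before committing cluster time to them.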

In [11]:
myexp = ATACseqPipeline.Pipeline(data_path='/gpfs/home/snagaraja/Exp122_Chd7_0.01', dry_run=True, app_path='/gpfs/home/snagaraja/ATACseqPipeline/core')
myexp.from_ssheet(ssheet_path='/gpfs/home/snagaraja/ATACseqPipeline/data/exp122_ssheet.txt')

/gpfs/group/home/snagaraja/ATACseqPipeline/core


In [4]:
import pandas as pd 
sampsheet = pd.read_csv('/gpfs/home/snagaraja/ATACseqPipeline/data/exp122_ssheet.txt', sep='\t')
print(sampsheet)

   SampleName                                              Read1  \
0     shCD4_1  /gpfs/home/snagaraja/ATACseqPipeline/data/Exp1...   
1     shCD4_2  /gpfs/home/snagaraja/ATACseqPipeline/data/Exp1...   
2      shCD19  /gpfs/home/snagaraja/ATACseqPipeline/data/Exp1...   
3  shChd7.1_1  /gpfs/home/snagaraja/ATACseqPipeline/data/Exp1...   
4  shChd7.1_2  /gpfs/home/snagaraja/ATACseqPipeline/data/Exp1...   
5  shChd7.2_1  /gpfs/home/snagaraja/ATACseqPipeline/data/Exp1...   
6  shChd7.2_2  /gpfs/home/snagaraja/ATACseqPipeline/data/Exp1...   
7  shChd7.3_1  /gpfs/home/snagaraja/ATACseqPipeline/data/Exp1...   
8  shChd7.3_2  /gpfs/home/snagaraja/ATACseqPipeline/data/Exp1...   

                                               Read2    Status CT Batch   
0  /gpfs/home/snagaraja/ATACseqPipeline/data/Exp1...     shCD4  C    One  
1  /gpfs/home/snagaraja/ATACseqPipeline/data/Exp1...     shCD4  C    Two  
2  /gpfs/home/snagaraja/ATACseqPipeline/data/Exp1...    shCD19  C    One  
3  /gpfs/home/snaga

Now uncomment each of these lines of code and run them one cell at a time. You can monitor your jobs with `squeue -u username`. For now, you must run one step at a time and start the next step only once the previous one is complete. Future versions will run all jobs automagically.
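Since each step has to finish before the next one starts, a small polling helper can take over the manual `squeue` checks. The sketch below is an assumption about how one might do this and is not part of the pipeline; `parse_job_ids` and `wait_for_jobs` are hypothetical names. It relies on `squeue -h -u <user> -o %i`, which prints one job ID per line with no header. Note that it waits for all of the user's jobs, including any unrelated to this experiment.

```python
import subprocess
import time


def parse_job_ids(squeue_output):
    """Extract job IDs from `squeue -h -o %i` output (one ID per line)."""
    return [line.strip() for line in squeue_output.splitlines() if line.strip()]


def wait_for_jobs(username, poll_seconds=60):
    """Block until squeue lists no jobs for the given user."""
    while True:
        result = subprocess.run(
            ["squeue", "-h", "-u", username, "-o", "%i"],
            capture_output=True, text=True, check=True,
        )
        remaining = parse_job_ids(result.stdout)
        if not remaining:
            return
        print(f"{len(remaining)} job(s) still queued or running")
        time.sleep(poll_seconds)


# The parsing can be checked without a cluster, using made-up output:
print(parse_job_ids("1001\n1002\n\n"))  # ['1001', '1002']

# On the cluster, call this between steps (replace with your own user name):
# wait_for_jobs("username")
```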

In [5]:
# myexp.configure_envs()

In [6]:
myexp.align_fastqs(genome_path='/gpfs/group/pipkin/tvenable/MusRef/MouseRefseq39_bowtie')

/gpfs/home/snagaraja/Exp122_Chd7_0.01/bams/
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/align_fastqs_shCD4_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/align_fastqs_shCD4_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/align_fastqs_shCD19.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/align_fastqs_shChd7.1_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/align_fastqs_shChd7.1_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/align_fastqs_shChd7.2_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/align_fastqs_shChd7.2_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/align_fastqs_shChd7.3_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/align_fastqs_shChd7.3_2.sh


In [7]:
myexp.remove_mito()

/gpfs/home/snagaraja/Exp122_Chd7_0.01/bams_noDups_noMito/
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/remove_mito_shCD4_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/remove_mito_shCD4_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/remove_mito_shCD19.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/remove_mito_shChd7.1_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/remove_mito_shChd7.1_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/remove_mito_shChd7.2_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/remove_mito_shChd7.2_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/remove_mito_shChd7.3_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/remove_mito_shChd7.3_2.sh


In [8]:
myexp.remove_duplicates()

created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/remove_duplicates_shCD4_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/remove_duplicates_shCD4_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/remove_duplicates_shCD19.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/remove_duplicates_shChd7.1_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/remove_duplicates_shChd7.1_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/remove_duplicates_shChd7.2_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/remove_duplicates_shChd7.2_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/remove_duplicates_shChd7.3_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/remove_duplicates_shChd7.3_2.sh


In [9]:
myexp.call_peaks_macs2()

created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/call_peaks_macs2_shCD4_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/call_peaks_macs2_shCD4_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/call_peaks_macs2_shCD19.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/call_peaks_macs2_shChd7.1_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/call_peaks_macs2_shChd7.1_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/call_peaks_macs2_shChd7.2_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/call_peaks_macs2_shChd7.2_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/call_peaks_macs2_shChd7.3_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/call_peaks_macs2_shChd7.3_2.sh


In [10]:
myexp.filter_blacklist('/gpfs/home/snagaraja/ATACseqPipeline/refs/blacklist.bed')

created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/filter_blacklist_shCD4_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/filter_blacklist_shCD4_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/filter_blacklist_shCD19.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/filter_blacklist_shChd7.1_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/filter_blacklist_shChd7.1_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/filter_blacklist_shChd7.2_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/filter_blacklist_shChd7.2_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/filter_blacklist_shChd7.3_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/filter_blacklist_shChd7.3_2.sh


In [11]:
myexp.get_insert_sizes()

created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/get_insert_sizes_shCD4_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/get_insert_sizes_shCD4_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/get_insert_sizes_shCD19.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/get_insert_sizes_shChd7.1_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/get_insert_sizes_shChd7.1_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/get_insert_sizes_shChd7.2_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/get_insert_sizes_shChd7.2_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/get_insert_sizes_shChd7.3_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/get_insert_sizes_shChd7.3_2.sh


In [12]:
myexp.plot_insert_sizes()

created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/plot_insert_sizes.sh


In [13]:
myexp.merge_peaks()

created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/merge_peaks.sh


In [14]:
myexp.peak_counts()

created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/peak_counts.sh


In [15]:
myexp.fastqc()

created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/fastqc_shCD4_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/fastqc_shCD4_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/fastqc_shCD19.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/fastqc_shChd7.1_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/fastqc_shChd7.1_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/fastqc_shChd7.2_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/fastqc_shChd7.2_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/fastqc_shChd7.3_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/fastqc_shChd7.3_2.sh


In [16]:
myexp.multiqc_report()

created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/multiqc.sh


In [17]:
myexp.bam_to_bed()

created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/bam_to_bed_shCD4_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/bam_to_bed_shCD4_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/bam_to_bed_shCD19.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/bam_to_bed_shChd7.1_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/bam_to_bed_shChd7.1_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/bam_to_bed_shChd7.2_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/bam_to_bed_shChd7.2_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/bam_to_bed_shChd7.3_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/bam_to_bed_shChd7.3_2.sh


In [18]:
myexp.merge_peaks()

created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/merge_peaks.sh


In [19]:
myexp.peak_counts()

created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/peak_counts.sh


In [20]:
myexp.bed_to_gtf()

created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/bed_to_gtf.sh


In [21]:
myexp.peak_counts()

created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/peak_counts.sh


In [22]:
myexp.shift_peaks('/gpfs/home/snagaraja/ATACseqPipeline/core/GCF_000001635.27_GRCm39_genomic.size')

created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/shift_peaks_shCD4_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/shift_peaks_shCD4_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/shift_peaks_shCD19.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/shift_peaks_shChd7.1_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/shift_peaks_shChd7.1_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/shift_peaks_shChd7.2_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/shift_peaks_shChd7.2_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/shift_peaks_shChd7.3_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/shift_peaks_shChd7.3_2.sh


In [23]:
myexp.bam_to_bigWig()

created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/bam_to_bigWig_shCD4_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/bam_to_bigWig_shCD4_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/bam_to_bigWig_shCD19.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/bam_to_bigWig_shChd7.1_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/bam_to_bigWig_shChd7.1_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/bam_to_bigWig_shChd7.2_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/bam_to_bigWig_shChd7.2_2.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/bam_to_bigWig_shChd7.3_1.sh
created /gpfs/home/snagaraja/Exp122_Chd7_0.01/submissions/bam_to_bigWig_shChd7.3_2.sh



In [25]:
myexp.main()

{'align_fastqs': ['25148457', '25148458', '25148459', '25148460', '25148461', '25148462', '25148463', '25148464', '25148465'], 'remove_mito': ['25148466', '25148467', '25148468', '25148469', '25148470', '25148471', '25148472', '25148473', '25148474'], 'remove_duplicates': ['25148475', '25148476', '25148477', '25148478', '25148479', '25148480', '25148481', '25148482', '25148483'], 'bam_to_bed': ['25148484', '25148485', '25148486', '25148487', '25148488', '25148489', '25148490', '25148491', '25148492'], 'shift_peaks': ['25148493', '25148494', '25148495', '25148496', '25148497', '25148498', '25148499', '25148500', '25148501'], 'call_peaks_macs2': ['25148502', '25148503', '25148504', '25148505', '25148506', '25148507', '25148508', '25148509', '25148510'], 'filter_blacklist': ['25148511', '25148512', '25148513', '25148514', '25148515', '25148516', '25148517', '25148518', '25148519'], 'merge_peaks': ['25148520'], 'bed_to_gtf': ['25148521'], 'peak_counts': ['25148522'], 'bam_to_bigWig': ['251