ATACprofWS (ATAC-seq Profiler with Spike-in)

In-house scripts for profiling ATAC-seq peaks examined with Spike-in

These scripts compare the genome-wide accessibility dynamics caused by a TF KO, along with normalizing read counts and by performing statistical tests.

These scripts calculate scale factors and detect common/unique ATAC-seq peaks within a KO sample and a control sample.

Requirements

BEDTools
MACS2
R library(edgeR)

Preparing

Edit "data/mapped_readCount.txt" with counting your mapped reads either to the Spikein genome or the Target genome. As an example, we prepared three samples with two replications per each (NC, PARP1, and TFDP1).

#Sample	Spike	Target
NC_1	4165657	15361208
NC_2	3753537	12121573
PARP1_1	4652827	14347168
PARP1_2	3585646	13066181
TFDP1_1	2909301	10177613
TFDP1_2	2348308	11196191

Edit "ATACprofWS.conf" with your PC environment. This conf file will be imported by PERL script as follows:

require("./ATACprofWS.conf");

Therefore, this conf file must be located in the current directory.

Step 1: calculating scale factors

%>perl script/1.scaleFactor.pl data/mapped_readCount.txt scaleFactor.txt
##Reading data/mapped_readCount.txt
##	6 samples read
##	min(total_mapped): 13086914
##	min(spike_mapped): 2348308
##	min(spike_norm): 2268896.62

You may see "scaleFactor.txt"

Step 2-1: preparing scripts for peak calling

%>perl script/2.callpeak_MACS2.pl scaleFactor.txt /your_directory/data/BED/ /your_directory/MACS2
## Reading scale factors from scaleFactor.txt
#	6 SFs
#	3 SFs: TFDP1 PARP1 NC

## run sh /home/park/Miyanari/ATAC/ATACprofWS/MACS2/TFDP1/TFDP1_macs2.age
## or
## qsub /home/park/Miyanari/ATAC/ATACprofWS/MACS2/TFDP1/TFDP1_macs2.age

## run sh /home/park/Miyanari/ATAC/ATACprofWS/MACS2/PARP1/PARP1_macs2.age
## or
## qsub /home/park/Miyanari/ATAC/ATACprofWS/MACS2/PARP1/PARP1_macs2.age

## run sh /home/park/Miyanari/ATAC/ATACprofWS/MACS2/NC/NC_macs2.age
## or
## qsub /home/park/Miyanari/ATAC/ATACprofWS/MACS2/NC/NC_macs2.age

Here, we need BED files that include the location of mapped reads. You may convert BAMs to BED as follows:

%>bamToBed -i bamfile_of_Bowtie2_mapping | awk 'BEGIN{OFS="\t"}{$4="N";$5="1000";print $0}' > output_bed.tmp
%>awk -F $'\t' 'BEGIN {OFS = FS}{ if($6=="+"){$2=$2+4}elseif($6=="-"){$3=$3-5} print $0}' output_bed.tmp > output.bed

We have prepared the example BED files ("data/BED/"), which included the top 500,000 lines only in the original BED files.

Step 2-2: running the peak-calling scripts

%>sh MACS2/NC/NC_macs2.age
%>sh MACS2/PARP1/PARP1_macs2.age 
%>sh MACS2/TFDP1/TFDP1_macs2.age

This process merges the two replicates after running MACS2.

Step 3-1: preparing scripts for capturing final peak sets

%>perl script/3.post_callpeak_MACS2_v4-1.pl scaleFactor.txt /your_directory/MACS2 /your_directory/postMACS2_v4-1

This process adjusts the read counts at each MACS2 peak using scale Factors. This process tests statistical significance. The thresholds were defined in ATACprofWS.conf. This process also generates files for de novo peaks (unique in a KO sample) and common peaks (found in both KO and Control samples). This process also addresses peaks within 10k-bin genomic windows.

Step 3-2: running the scripts

%>sh postMACS2_v4-1/PARP1/PARP1_pstMACS2.age 
%>sh postMACS2_v4-1/TFDP1/TFDP1_pstMACS2.age

Step 4: gathering all the results to be used for further analysis

%>perl script/4.get_Profile.pl scaleFactor.txt /home/park/Miyanari/ATAC/ATACprofWS/postMACS2_v4-1/ /home/park/Miyanari/ATAC/ATACprofWS/Profile

Now, the directory "Profile" includes the common and de-novo "Peak_Positions" and "Peak_Windows_10K". The file "Numbers.txt" shows the number of ATAC peaks in each category. For example, "TFDP1_KO_denovo.bed" is for the unique peaks that appeared after TFDP1 KO."TFDP1_NC_denovo.bed" is for the disappeared peaks in the TFDP1-KO sample (= unique peaks in the Control sample).

Step 5: downstream analysis

%>perl script/5.plot_Distribution.pl Profile/Peak_Windows_10K/ NULL Profile/Boxplot
%>perl script/5.plot_Distribution.pl Profile/Peak_Windows_10K/ 2 Profile/Boxplot
%>perl script/6.plot_PeakPie.pl Profile/Peak_Positions/ Profile/Diff_Peaks/ 2

By using the files in the directory "Profile/", these scripts look at the genome-wide fold-change of accessibility of KO samples, differential peaks, distribution of unique/common peaks, etc.

Example:

%>tar xfvz example.tar.gz
%>cat run.log

For guidance purposes, the some of output results have been archived in this example directory. In addition, the file "run.log" lists the command lines used above.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

data

data

script

script

ATACprofWS.conf

ATACprofWS.conf

README.md

README.md

example.tar.gz

example.tar.gz

run.log

run.log

scaleFactor.txt

scaleFactor.txt

Repository files navigation

ATACprofWS (ATAC-seq Profiler with Spike-in)

In-house scripts for profiling ATAC-seq peaks examined with Spike-in

These scripts compare the genome-wide accessibility dynamics caused by a TF KO, along with normalizing read counts and by performing statistical tests.

These scripts calculate scale factors and detect common/unique ATAC-seq peaks within a KO sample and a control sample.

Requirements

Preparing

Step 1: calculating scale factors

Step 2-1: preparing scripts for peak calling

Step 2-2: running the peak-calling scripts

Step 3-1: preparing scripts for capturing final peak sets

Step 3-2: running the scripts

Step 4: gathering all the results to be used for further analysis

Step 5: downstream analysis

Example:

About

Releases 1

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
data		data
script		script
ATACprofWS.conf		ATACprofWS.conf
README.md		README.md
example.tar.gz		example.tar.gz
run.log		run.log
scaleFactor.txt		scaleFactor.txt

Park-Sung-Joon/ATACprofWS

Folders and files

Latest commit

History

Repository files navigation

ATACprofWS (ATAC-seq Profiler with Spike-in)

In-house scripts for profiling ATAC-seq peaks examined with Spike-in

These scripts compare the genome-wide accessibility dynamics caused by a TF KO, along with normalizing read counts and by performing statistical tests.

These scripts calculate scale factors and detect common/unique ATAC-seq peaks within a KO sample and a control sample.

Requirements

Preparing

Step 1: calculating scale factors

Step 2-1: preparing scripts for peak calling

Step 2-2: running the peak-calling scripts

Step 3-1: preparing scripts for capturing final peak sets

Step 3-2: running the scripts

Step 4: gathering all the results to be used for further analysis

Step 5: downstream analysis

Example:

About

Resources

Stars

Watchers

Forks

Languages