ATAC-seq Integrative Analysis Pipeline

Pipeline for QC metrics construction, data analysis, and visualization of ATAC-seq data.
Current version: AIAP_v1.1. Last update: 2019.12.9

Advisor: Bo Zhang
Contributors: Cheng Lyu and Shaopeng Liu

For any questions, please contact shaopeng.liu@wustl.edu

Documentation:

Pipeline documentation: analysis details and QC metrics information
Please click here
Potential transcription factor binding region prediction algorithm:
Please click here
Update logfile: pipeline change record
Please click here

Usage:

Test data:

One paired-end mm10 dataset with 0.25M reads is provided for testing. It can be downloaded with:

wget http://regmedsrv1.wustl.edu/Public_SPACE/shaopengliu/Singularity_image/atac-seq/test_mm10_data/mm10_1.fastq.gz
wget http://regmedsrv1.wustl.edu/Public_SPACE/shaopengliu/Singularity_image/atac-seq/test_mm10_data/mm10_2.fastq.gz
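
After downloading, it can be worth confirming that both files arrived intact and contain the same number of reads. The helper below is a minimal sketch and not part of AIAP: `count_reads` is a hypothetical name, and it assumes the standard FASTQ layout of four lines per read.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of AIAP): print the number of reads in a
# gzip-compressed FASTQ file, assuming four lines per read.
count_reads() {
    local fq="$1"
    # gzip -t tests the archive for corruption without writing any output.
    gzip -t "$fq" || { echo "corrupt or incomplete file: $fq" >&2; return 1; }
    local lines
    lines=$(gzip -dc "$fq" | wc -l)
    echo $(( lines / 4 ))
}
```

Running `count_reads mm10_1.fastq.gz` and `count_reads mm10_2.fastq.gz` should print the same number for a properly paired dataset.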

General IAP version:

Step 1. Download the Singularity image and the reference files. You only need to download them ONCE and can then reuse them directly. If the pipeline is updated you may need to download a new image, but the reference files usually do NOT change:

Download the singularity image:

wget http://regmedsrv1.wustl.edu/Public_SPACE/shaopengliu/Singularity_image/atac-seq/ATAC_IAP_v1.1.simg

If you want to use a previous version, you can find it by clicking here

Download the reference files for the genome you need (mm10 is shown here):

wget http://regmedsrv1.wustl.edu/Public_SPACE/shaopengliu/Singularity_image/atac-seq/ref_file/atac_mm10_ref.tar.gz

More genome builds are available: click here. Currently we have: mm9/mm10, hg19/hg38, danRer10/danRer11, rn6 and dm6.

Decompress the reference files and place them in a folder of your choice:

tar -xzf atac_mm10_ref.tar.gz
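
Because the bind option in Step 2 expects the parent folder of the reference directory, it helps to know which folder the archive unpacks into. The sketch below lists the top-level entries of a `.tar.gz` archive without extracting it. `tar_top_level` is a hypothetical helper, and the actual folder name inside the archive is not stated in this README, so inspect the output rather than assuming it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of AIAP): list the top-level entries of a
# .tar.gz archive without extracting it.
tar_top_level() {
    # -t lists members; strip a leading "./", keep the first path component,
    # drop empty lines, and de-duplicate.
    tar -tzf "$1" | sed 's|^\./||' | cut -d/ -f1 | grep -v '^$' | sort -u
}
```

For example, `tar_top_level atac_mm10_ref.tar.gz` prints the folder(s) that `tar -xzf` will create in the current directory. The directory you extract into is then the path to give to the `-B ...:/atac_seq/Resource/Genome` option.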

Step 2. Process data with the Singularity image:

Please run the command from the directory that contains your data. For example, if your data is in /home/example, run `cd /home/example` first. The image and reference files can be stored anywhere.

singularity run -B ./:/process -B <path-to-parent-folder-of-ref-file>:/atac_seq/Resource/Genome  <path-to-downloaded-image> -r <SE/PE> -g <mm9/mm10/hg38 etc.>  -o <read_file1>  -p <read_file2>

This may look a little confusing at first, but it becomes friendly once you are familiar with Singularity :)
For example, if
a) you downloaded the image to /home/image/ATAC_IAP_v1.1.simg
b) the reference files are in /home/src/mm10
c) and your data files are read1.fastq.gz and read2.fastq.gz in the folder /home/data

Then you need to:

cd /home/data
singularity run -B ./:/process -B /home/src:/atac_seq/Resource/Genome /home/image/ATAC_IAP_v1.1.simg -r PE -g mm10 -o read1.fastq.gz -p read2.fastq.gz
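
When processing many samples, assembling the command from named pieces can reduce copy-and-paste mistakes. The function below is a minimal sketch, not part of AIAP: `aiap_cmd` is a hypothetical name, it only prints the command (a dry run) so it can be reviewed before use, and it assumes that paths contain no spaces.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of AIAP): print the `singularity run`
# command for one sample instead of executing it.
# Arguments: image, parent folder of the reference, SE|PE, genome, read1, [read2]
aiap_cmd() {
    local image="$1" ref_parent="$2" read_type="$3" genome="$4" read1="$5" read2="${6:-}"
    local cmd
    cmd="singularity run -B ./:/process -B ${ref_parent}:/atac_seq/Resource/Genome ${image} -r ${read_type} -g ${genome} -o ${read1}"
    case "$read_type" in
        PE)
            # Paired-end input needs the second reads file passed with -p.
            [ -n "$read2" ] || { echo "PE input needs a second reads file" >&2; return 1; }
            cmd="${cmd} -p ${read2}"
            ;;
        SE) ;;
        *)  echo "read type must be SE or PE" >&2; return 1 ;;
    esac
    echo "$cmd"
}
```

Calling `aiap_cmd /home/image/ATAC_IAP_v1.1.simg /home/src PE mm10 read1.fastq.gz read2.fastq.gz` from /home/data prints the same command line as the example above.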

TaRGET II version:

Click here to find the TaRGET image and download it to your server,
then run the command below from the directory that contains your data:
singularity run -B ./:/process <path-to-image> -r <SE/PE> -g <mm10/hg38> -o <read_file1> -p <read_file2>

Soft links to files are supported, but you need to use the full path of the file and mount the original location. For example:

ln -s `pwd`/myfile* /scratch/test
cd /scratch/test
singularity run -B ./:/process -B /scratch/test:/scratch/test <path-to-image>  -r <SE/PE> -g <mm10/hg38>  -o <myfile_1>  -p <myfile_2>

Explanation:
The command has this form: singularity run <options> <singularity_image_to_run> <pipeline_parameters>

Soft link introduction: Soft links are much more convenient when you have a lot of data. To use them, you only need to add one bind option for Singularity: -B <full-path-of-original-position>:<full-path-of-original-position>
For example, to soft link my data from /scratch and run in my own folder /home/example:

ln -s /scratch/mydata.fastq.gz /home/example  # please make sure you use the absolute path
cd /home/example
singularity run -B ./:/process -B /home/src:/atac_seq/Resource/Genome -B /scratch:/scratch /home/image/ATAC_IAP_v1.00.simg -r PE -g mm10 -o read1.fastq.gz -p read2.fastq.gz
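
The extra bind option can be derived from the link itself. The helper below is a sketch under stated assumptions: `link_bind_option` is a hypothetical name, and it relies on `readlink -f` from GNU coreutils to resolve the link to the real file. It binds the directory that holds the original file, which is narrower than binding all of /scratch as in the example above; either should make the target visible inside the container.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of AIAP): print the extra -B option needed
# so that the target of a soft-linked input file is visible in the container.
link_bind_option() {
    local link="$1" target dir
    [ -L "$link" ] || { echo "not a symbolic link: $link" >&2; return 1; }
    # readlink -f (GNU coreutils) resolves the link to an absolute real path.
    target=$(readlink -f "$link") || return 1
    dir=$(dirname "$target")
    # Mount the original location at the same path inside the container.
    echo "-B ${dir}:${dir}"
}
```

For a link created with `ln -s /scratch/mydata.fastq.gz /home/example`, running `link_bind_option /home/example/mydata.fastq.gz` prints the option to add to the `singularity run` command.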

#parameters:
-h: help information
-r: SE for single-end, PE for paired-end
-g: genome reference. Each simg is designed for ONLY one species due to the file size. Currently supported genomes: <mm10/mm9/hg19/hg38/danRer10/danRer11/rn6/dm6>
-o: reads file 1, or the SE reads file; must end in .fastq, .fastq.gz or .sra (for both SE and PE)
-p: reads file 2 for PE input; must end in .fastq or .fastq.gz
-c: (optional) specify read length minimum cutoff for methylQA filtering, default 38
-t: (optional) specify number of threads to use, default 24
-i: (optional) insertion-free region (IFR) finding parameters used by the Wellington algorithm (Jason Piper et al., 2013); see the documentation for more details.
      If you do NOT want to run the IFR finding step, simply omit the -i option. To run IFR finding with the default parameters, specify -i 0. The defaults are:
      min_lfp=5
      max_lfp=15
      step_lfp=2
      min_lsh=50
      max_lsh=200
      step_lsh=20
      method=BH
      p_cutoff=0.05
      If you want to specify your own parameters, please make sure they are in the same order and separated by commas
      Example: -i 5,15,2,50,200,20,BH,0.05
      You can check the pipe log file for the parameters used by IFR code
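
Since the -i value is positional, a quick check before launching a long run can catch a missing or misplaced field. The function below is a minimal sketch, not part of AIAP: `check_ifr_params` is a hypothetical name, and the rules it applies (six non-negative integers, a non-empty method name, and a numeric p-value cutoff) are assumptions inferred from the defaults listed above, not the pipeline's own validation.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of AIAP) for the -i option value.
# Accepts "0" (use defaults) or eight comma-separated fields in the order
# min_lfp,max_lfp,step_lfp,min_lsh,max_lsh,step_lsh,method,p_cutoff.
check_ifr_params() {
    local value="$1" n
    if [ "$value" = "0" ]; then
        return 0
    fi
    n=$(printf '%s\n' "$value" | awk -F, '{ print NF }')
    if [ "$n" -ne 8 ]; then
        echo "expected 8 comma-separated fields, got $n" >&2
        return 1
    fi
    # Assumption: fields 1-6 are integers, field 7 is a method name such as
    # BH, and field 8 is a decimal number such as 0.05.
    printf '%s\n' "$value" | awk -F, '{
        for (i = 1; i <= 6; i++) if ($i !~ /^[0-9]+$/) exit 1
        if ($7 == "") exit 1
        if ($8 !~ /^[0-9]*\.?[0-9]+$/) exit 1
    }' || { echo "malformed -i value: $value" >&2; return 1; }
}
```

For example, `check_ifr_params 5,15,2,50,200,20,BH,0.05` succeeds, while a value with only seven fields is rejected with a message on standard error.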
