Home
- RNA-seq alignment files in BAM format (89 samples) can be downloaded from ArrayExpress (https://www.ebi.ac.uk/arrayexpress/experiments/E-GEUV-1/). All links are listed in "test_data_RNA.links.txt" in the GitHub repository. Download the files with the command below:
wget -i test_data_RNA.links.txt
- Genotype data in VCF 4.1 format can also be downloaded from ArrayExpress (https://www.ebi.ac.uk/arrayexpress/experiments/E-GEUV-1/). The links are listed in "test_data_genotype.links.txt" in the GitHub repository. Download the files with the command below:
wget -i test_data_genotype.links.txt
- RefSeq gene annotation file (hg19_refseq_whole_gene.bed) and a map from RefSeq transcript ID to gene symbol (hg19_refseq_id_to_symbol.txt). Both files can be downloaded from the UCSC Table Browser.
NOTE: Users who reuse this pipeline on their own dataset should download the reference genome version that is consistent with their RNA-seq data.
1. Download and install the following software and scripts
wget -c https://repo.anaconda.com/archive/Anaconda3-5.3.0-Linux-x86_64.sh
bash Anaconda3-5.3.0-Linux-x86_64.sh
conda install -c r r
conda install -c conda-forge r-dplyr
conda install -c bioconda bioconductor-impute
git clone https://github.com/3UTR/DaPars2.git
conda install -c bioconda bedtools
conda install -c bioconda vcftools
conda install -c bioconda plink=1.90
conda install -c bioconda samtools
conda install -c bioconda r-peer
conda install -c bioconda matrixeqtl
# start R
R
# use the following R function (from the "remotes" package) to install the latest susieR
> remotes::install_github("stephenslab/susieR")
git clone https://github.com/3UTR/3aQTL-pipe.git
Data required before starting the pipeline
Data description | Example Data |
---|---|
RNA-seq alignment files (bam files) | The 89 bam files in GEUVADIS RNA-seq project (CEU subpopulation) |
Genotype data in VCF format | The VCF files in GEUVADIS RNA-seq project |
A text file listing all samples | sample_list.txt (in the 3aQTL_pipe repository) |
A reference genome annotation (BED) | hg19_refseq_whole_gene.bed |
ID mapping between Refseq ID and gene symbol | hg19_refseq_transcript2geneSymbol.txt |
A text file listing the VCF file(s) | vcf_list.txt |
A tab-delimited text file containing known covariates, e.g. gender (optional) | known_covariates.txt |
git clone https://github.com/3UTR/3aQTL-pipe.git
cd 3aQTL-pipe
To repeat the pipeline on the GEUVADIS dataset ("CEU" sub-population), users can use the "hg19_refseq_whole_gene.bed" and "hg19_refseq_transcript2geneSymbol.txt" files in the GitHub repository and prepare a "sample_list.txt" and a "vcf_list.txt" for the dataset, following the example files provided in the repository.
Note: the bam files in "sample_list.txt" and the VCF file(s) in "vcf_list.txt" should include their paths.
For example:
sample1_id /path/to/bam/sample1_id.bam # change "/path/to/bam/" to the location where you put bam files
/path/to/vcf/testdat.vcf # change "/path/to/vcf/" to the location where you put vcf files
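Before running the pipeline it can help to sanity-check the sample list. The sketch below is not part of the pipeline; it assumes "sample_list.txt" is tab-delimited with a sample ID and a bam path per line (an assumption based on the example above), and the function name `check_sample_list` is hypothetical.

```python
from pathlib import Path


def check_sample_list(path, must_exist=True):
    """Return a list of problems found in a sample list file.

    Assumed format (not confirmed by the pipeline documentation):
    one sample per line, "<sample_id><TAB><path to bam file>".
    """
    problems = []
    seen = set()
    lines = Path(path).read_text().splitlines()
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            problems.append(
                f"line {lineno}: expected 2 tab-separated fields, got {len(fields)}"
            )
            continue
        sample_id, bam = fields
        if sample_id in seen:
            problems.append(f"line {lineno}: duplicate sample id {sample_id}")
        seen.add(sample_id)
        if must_exist and not Path(bam).is_file():
            problems.append(f"line {lineno}: bam file not found: {bam}")
    return problems
```

Calling `check_sample_list("sample_list.txt")` returns an empty list when every line has two fields, sample IDs are unique, and each bam path points to an existing file.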
cd 3aQTL-pipe
bash ./src/prepare_inputs_for_apa_quant.sh -s sample_list.txt -g hg19_refseq_whole_gene.bed -r hg19_refseq_id_to_symbol.txt
This shell script runs with 8 threads in parallel by default; users can change this with option "-t".
Three files will be returned: refseq_3utr_annotation.bed, wigFile_and_readDepth.txt, Dapars2_running_configure.txt
# analyze single chromosome
python ./src/Dapars2_Multi_Sample.py Dapars2_running_configure.txt chr1
# alternatively, users can analyze all chromosomes with "DaPars2_Multi_Sample_Multi_Chr.py"
cut -f 1 refseq_3utr_annotation.bed | sort | uniq | grep -v "MT" > chrList.txt
Rscript ./src/merge_dapars2_res_by_chr.R Dapars2_out Dapars2 sample_list.txt chrList.txt
This will generate a final APA quantitative profile "Dapars2_res.all_chromosomes.txt"
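For readers who prefer to build the chromosome list outside the shell, here is a rough Python sketch of the same idea: take the first column of the 3'UTR annotation, keep unique values, and drop any name containing "MT". The function name `list_chromosomes` is hypothetical, and the sort order may differ from the locale-dependent order of the shell `sort`.

```python
def list_chromosomes(bed_path, exclude="MT"):
    """Collect unique chromosome names from column 1 of a BED file.

    Names containing `exclude` are dropped, mirroring `grep -v "MT"`.
    """
    chroms = set()
    with open(bed_path) as bed:
        for line in bed:
            if not line.strip():
                continue  # skip blank lines
            chrom = line.split("\t", 1)[0].strip()
            if exclude not in chrom:
                chroms.add(chrom)
    return sorted(chroms)


def write_chr_list(bed_path, out_path="chrList.txt"):
    """Write one chromosome name per line to `out_path`."""
    with open(out_path, "w") as out:
        for chrom in list_chromosomes(bed_path):
            out.write(chrom + "\n")
```

A call such as `write_chr_list("refseq_3utr_annotation.bed")` would then produce a "chrList.txt" in the current directory.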
bash ./src/prepare_inputs_for_3aQTL_mapping.sh -c known_covariates.txt
Note: option “-c” takes a tab-delimited covariate file (default is “NA” if no covariate file is available). An example can be found at https://github.com/3UTR/3aQTL-pipe.
Rscript ./src/run_3aQTL_mapping.R
Note: The cis 3’aQTLs “Cis_3aQTL_all_control_gene_exprs.txt” and trans 3’aQTLs “Trans_3aQTL_all_control_gene_exprs.txt” will be output in directory “Matrix_eQTL/”.
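As a minimal sketch of how the association results might be inspected afterwards, the snippet below keeps SNP-transcript pairs under an FDR cutoff. It assumes the output file has a header with the columns "SNP", "gene" and "FDR", as in standard Matrix eQTL output; the function name `significant_pairs` and the 0.05 cutoff are illustrative choices, not settings taken from the pipeline.

```python
import csv


def significant_pairs(path, fdr_cutoff=0.05):
    """Return (SNP, gene, FDR) tuples with FDR below `fdr_cutoff`.

    Assumes a tab-delimited file with a header containing the
    columns "SNP", "gene" and "FDR" (an assumption, see above).
    """
    hits = []
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            fdr = float(row["FDR"])
            if fdr < fdr_cutoff:
                hits.append((row["SNP"], row["gene"], fdr))
    return hits
```

For example, `significant_pairs("Matrix_eQTL/Cis_3aQTL_all_control_gene_exprs.txt")` would list the cis pairs passing the chosen cutoff, provided the column names match.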
Rscript ./src/QTL_plot.R -s "chr7_128640188_A_G" -g "NM_001347928.2|IRF5|chr7|+"
Note: The first parameter specifies the SNP for visualization. The second one specifies the related transcript. This will generate a publication-ready plot “chr7_128640188_A_G.IRF5.pdf”
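The two identifiers follow a visible pattern, which the sketch below splits into fields and uses to predict the plot file name from the example above. The field names ("allele1", "allele2", "refseq_id", and so on) are assumptions read off the example strings, and the helper functions are hypothetical rather than part of QTL_plot.R.

```python
def parse_snp_id(snp_id):
    """Split an ID like "chr7_128640188_A_G" into its four parts.

    rsplit keeps chromosome names that contain underscores intact.
    """
    chrom, pos, allele1, allele2 = snp_id.rsplit("_", 3)
    return {"chrom": chrom, "pos": int(pos), "allele1": allele1, "allele2": allele2}


def parse_transcript_id(transcript_id):
    """Split an ID like "NM_001347928.2|IRF5|chr7|+" on the pipe character."""
    refseq_id, symbol, chrom, strand = transcript_id.split("|")
    return {"refseq_id": refseq_id, "symbol": symbol, "chrom": chrom, "strand": strand}


def expected_plot_name(snp_id, transcript_id):
    """Build the plot file name in the "<SNP>.<gene symbol>.pdf" pattern."""
    symbol = parse_transcript_id(transcript_id)["symbol"]
    return f"{snp_id}.{symbol}.pdf"
```

With the example arguments, `expected_plot_name("chr7_128640188_A_G", "NM_001347928.2|IRF5|chr7|+")` gives "chr7_128640188_A_G.IRF5.pdf", the file name mentioned in the note.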
bash ./src/prepare_inputs_for_finemapping.sh
Note: The shell script requires “3UTR_location.txt” and the genetic association results (e.g. “Cis_3aQTL_all_control_gene_exprs.txt”), which are generated in step 5 and step 6, respectively, as well as the genotype data (“Genotype_matrix.txt”) generated in step 5. It outputs a directory for each significant transcript containing “3aQTL.vcf” and “expr.phen”.
bash ./src/run_fine_mapping.sh -t 8
# option "-t" specifies the number of threads to be used, default is "1"
Note: susieR will generate three files in each transcript directory: a plot of the independent signals (Figure 2), an R binary file containing the susieR fine-mapping results, and a text file listing all independent signals, with the suffixes “.pdf”, “.rds”, and “.txt”, respectively.
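A quick completeness check after fine mapping could look like the sketch below, which reports transcript directories lacking a file with one of the three suffixes. It is a sketch under assumptions: the parent directory holding the per-transcript directories is passed in as `root` (its actual location is not stated here), and the function name `missing_outputs` is hypothetical.

```python
from pathlib import Path

# Suffixes of the three files expected in each transcript directory.
EXPECTED_SUFFIXES = (".pdf", ".rds", ".txt")


def missing_outputs(root):
    """Map each subdirectory of `root` to the expected suffixes it lacks.

    Directories that contain at least one file per suffix are omitted.
    """
    missing = {}
    subdirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for subdir in subdirs:
        present = {f.suffix for f in subdir.iterdir() if f.is_file()}
        absent = [s for s in EXPECTED_SUFFIXES if s not in present]
        if absent:
            missing[subdir.name] = absent
    return missing
```

An empty dictionary means every transcript directory has at least one file per suffix; the check only looks at suffixes and does not validate file contents.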
Rscript ./src/merge_finemap_results.R
We have built a Docker image named "3aqtl_pipe" that contains the whole pipeline, including the source code of all scripts involved. The image has been pushed to Docker Hub; users can pull it, start a new container from it, and run the pipeline inside that container.
Quick guide
We assume Docker is installed on the server. If not, please ask the server administrator to install Docker first.
- Pull down the docker image "3aqtl_pipe"
docker pull 3utr/3aqtl_pipe:miniv4 # "miniv4" denotes the tag
docker image ls # list all images, including "3utr/3aqtl_pipe:miniv4"
- Run a docker container from the pulled docker image
Before creating a container, put all bam files and vcf files in one directory on the local server (e.g. "bam_vcf", optionally with two subdirectories inside it) and mount this local directory into the Docker container (set with option "-v").
docker run -it --name="3aqtl_container" -v /local/path/to/bam_vcf:/home/bam_vcf 3utr/3aqtl_pipe:miniv4 /bin/bash
# the -v option mounts the local host directory into the container at "/home/bam_vcf" (created if it does not exist)
# the initial working directory of the container is set to /home
ls # list the contents of the current location; you will see a "3aQTL_pipe" directory and a "bam_vcf" directory
Users will see a "3aQTL_pipe" directory and a "bam_vcf" directory in the container. Change to the workspace with "cd 3aQTL_pipe".
"3aQTL_pipe" contains all source codes of 3'aQTL pipeline
- Create a "sample_list.txt" and "vcf_list.txt" in "3aQTL_pipe" directory
The two files list the samples with the locations of their bam files, and the locations of the vcf files; they should therefore be created after the container is running.
Here is an example way to do this:
# we assume the bam files and vcf files are located in the directory "bam_vcf" in the container's initial location; if not, change
# the location in the commands below
cd 3aQTL_pipe
for i in ../bam_vcf/*.bam; do sample=${i##*/}; sample=${sample%%.*}; echo -e "$sample\t$i"; done > sample_list.txt
ls ../bam_vcf/*.vcf* > vcf_list.txt
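The shell loop above derives each sample ID from the bam file name by cutting at the first dot. A minimal Python sketch of the same logic is given below for readers who want to adapt it; the function names are hypothetical and the output is assumed to be tab-delimited, as produced by the shell version.

```python
from pathlib import Path


def build_sample_lines(bam_dir):
    """Return "<sample_id><TAB><bam path>" lines for every *.bam in bam_dir.

    The sample ID is the file name up to the first dot, matching the
    shell parameter expansion ${sample%%.*}.
    """
    lines = []
    for bam in sorted(Path(bam_dir).glob("*.bam")):
        sample = bam.name.split(".", 1)[0]
        lines.append(f"{sample}\t{bam}")
    return lines


def write_sample_list(bam_dir, out_path="sample_list.txt"):
    """Write the sample list, one sample per line."""
    with open(out_path, "w") as out:
        for line in build_sample_lines(bam_dir):
            out.write(line + "\n")
```

Running `write_sample_list("../bam_vcf")` from the "3aQTL_pipe" directory would write paths relative to that directory; replacing `bam` with `bam.resolve()` in the f-string is one way to record absolute paths instead.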
After this, users can jump to "Quick Start" to go through the whole pipeline step by step.
1. Example of APA quantification across samples
An example of format of Dapars2 output can be found in the wiki of Dapars2
2. Example of 3'aQTL mapping by Matrix-eQTL
Matrix-eQTL will report the association statistics of tested SNP-Gene pair in tab-delimited text file as shown below:
3. Example of fine-mapping results
Xudong Zou, Ruofan Ding, Wenyan Chen, Gao Wang, Shumin Cheng, Wei Li, Lei Li
Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
- Code and Execution:
[Ref TBD]
- The first 3'aQTL atlas of human tissues:
An atlas of alternative polyadenylation quantitative trait loci contributing to complex trait and disease heritability
Lei Li, Kai-Lieh Huang, Yipeng Gao, Ya Cui, Gao Wang, Nathan D. Elrod, Yumei Li, Yiling Elaine Chen, Ping Ji, Fanglue Peng, William K. Russell, Eric J. Wagner & Wei Li. Nature Genetics 53, 994-1005 (2021). DOI: https://doi.org/10.1038/s41588-021-00864-5
https://www.nature.com/articles/s41588-021-00864-5
For any issues, please create a GitHub Issue.