An integrated pipeline for neoantigen prediction from NGS data.
Authors: Zhan Zhou, Jingcheng Wu, Xingzheng Lyu, Jianan Ren
Date: July 2021
Version: 2.0.1
License: TSNAD is released under GNU license
System: Linux
Contact: zhanzhou@zju.edu.cn
An integrated software for cancer somatic mutation and tumour-specific neoantigen detection.
There are two ways to install TSNAD:
- Install with Docker, which needs no other pre-installed tools (strongly recommended; works on both Linux and Windows).
- Install from GitHub, with all required tools installed manually (Linux only).
First, install Docker (https://docs.docker.com/).
Then run the following command to install TSNAD:
docker pull biopharm/tsnad:latest
The download may take several hours because the image is large.
Enter the TSNAD running environment, mounting the directories that hold your WES/WGS and RNA-seq data, with the following command (RNA-seq data is optional):
docker run -it -v [dir of WES/WGS]/:/home/tsnad/samples -v [dir of RNA-seq]:/home/tsnad/RNA-seq -v [output dir]:/home/tsnad/results biopharm/tsnad:latest /bin/bash
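The mount layout above can be assembled programmatically, which helps when running many samples. The following is a minimal sketch, not part of TSNAD: the function name `build_docker_run` is hypothetical, and leaving out the RNA-seq mount when no RNA-seq data is available is an assumption based on the note that RNA-seq is optional.

```python
def build_docker_run(wes_dir, out_dir, rna_dir=None, image="biopharm/tsnad:latest"):
    """Return the `docker run` argument list using the mount points from this README.

    Hypothetical helper: the container paths are taken from the command above;
    skipping the RNA-seq mount when rna_dir is None is an assumption.
    """
    args = ["docker", "run", "-it", "-v", f"{wes_dir}:/home/tsnad/samples"]
    if rna_dir is not None:
        args += ["-v", f"{rna_dir}:/home/tsnad/RNA-seq"]
    args += ["-v", f"{out_dir}:/home/tsnad/results", image, "/bin/bash"]
    return args
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems when directory names contain spaces.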
Then run the following commands to start neoantigen prediction from WES/WGS data:
cd /home/tsnad
bash uncompress.sh
python TSNAD.py -I samples/ -R RNA-seq/ -V [grch37/grch38] -O results/
All results are stored in the folder results/; the final neoantigen results are in results/deephlapan_results/.
TSNAD uses the following software and libraries:
- Trimmomatic 0.39 (In Tools/)
- BWA 0.7.17 (In Tools/)
- SAMtools 1.13 (In Tools/)
- GATK 4.2.0.0
- VEP 104
- hisat2 2.2.1
- Stringtie 2.1.6 (In Tools/)
- OptiType 1.3.5 (In Tools/)
- STAR 2.7 (In Tools/)
- Arriba 1.1.0 (In Tools/)
- DeepHLApan 1.1 (In Tools/)
- JAVA 1.8
- Python 2.7
- Perl 5.22
The first 11 tools in the list above (Trimmomatic through DeepHLApan) are best placed in the folder Tools/.
- Trimmomatic
unzip Trimmomatic-*.zip
- BWA
tar -xjvf bwa-*.tar.bz2
cd bwa-*
make
vim ~/.bashrc
# add the following line to ~/.bashrc, then save
export PATH=$PATH:/home/tsnad/Tools/bwa-0.7.17/
source ~/.bashrc
- SAMtools
sudo apt-get install libncurses5-dev
sudo apt-get install libbz2-dev
sudo apt-get install liblzma-dev
tar -xjvf samtools-*.tar.bz2
cd samtools-*
./configure
make
sudo make install
- GATK
unzip gatk-*.zip
sudo apt install openjdk-8-jdk-headless
The necessary files for grch37:
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_phase1.snps.high_confidence.b37.vcf.gz
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_phase1.snps.high_confidence.b37.vcf.idx.gz
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/dbsnp_138.b37.vcf.gz
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/dbsnp_138.b37.vcf.idx.gz
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.gz
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.idx.gz
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/human_g1k_v37.fasta.gz
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/human_g1k_v37.fasta.fai.gz
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/human_g1k_v37.fasta.ann.gz
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/human_g1k_v37.fasta.bwt.gz
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/human_g1k_v37.fasta.amb.gz
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/human_g1k_v37.fasta.pac.gz
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/human_g1k_v37.fasta.sa.gz
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/human_g1k_v37.2bit.gz
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/human_g1k_v37.dict.gz
The necessary files for grch38:
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/dbsnp_146.hg38.vcf.gz
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/dbsnp_146.hg38.vcf.gz.tbi
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/Homo_sapiens_assembly38.fasta.gz
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/Homo_sapiens_assembly38.fasta.fai
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/Homo_sapiens_assembly38.fasta.64.alt
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/Homo_sapiens_assembly38.dict
Uncompress all the downloaded files and put them in the same folder (e.g. gatk-*/b37/).
Note that the chromosome names in the dbSNP file differ from those in the other files, so they need to be transformed as follows:
perl sub/transform.pl dbsnp_138.b37.vcf dbsnp_138.b37_adj.vcf
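What sub/transform.pl does internally is not described here. To illustrate the general idea of harmonising chromosome names in a VCF file, the sketch below assumes the mismatch is the presence or absence of a `chr` prefix; that assumption, the function name `rename_chrom` and its parameters are all hypothetical and may not match the real script.

```python
def rename_chrom(line, add_prefix=True, prefix="chr"):
    """Rewrite the CHROM column of one VCF data line.

    Assumption: the naming mismatch is only a 'chr' prefix. Header lines
    (starting with '#') are returned unchanged, so '##contig' entries are
    NOT rewritten by this sketch.
    """
    if line.startswith("#"):
        return line
    # CHROM is the first tab-separated column of a VCF data line.
    chrom, sep, rest = line.partition("\t")
    if add_prefix and not chrom.startswith(prefix):
        chrom = prefix + chrom
    elif not add_prefix and chrom.startswith(prefix):
        chrom = chrom[len(prefix):]
    return chrom + sep + rest
```

Applying the function to every line of the uncompressed dbSNP file and writing the result to a new file would mirror the input/output usage of the Perl command above.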
- VEP
unzip ensembl-vep-release-*.zip
cd ensembl-vep-release-*
perl INSTALL.pl
During installation, download the API and the cache: homo_sapiens_merged_vep_104_GRCh37.tar.gz for grch37, or homo_sapiens_merged_vep_104_GRCh38.tar.gz for grch38.
If that does not work, try the following steps:
cd
mkdir src
cd src
wget ftp://ftp.ensembl.org/pub/ensembl-api.tar.gz
wget https://cpan.metacpan.org/authors/id/C/CJ/CJFIELDS/BioPerl-1.6.924.tar.gz
tar -xvf ensembl-api.tar.gz
tar -xvf BioPerl-1.6.924.tar.gz
PERL5LIB=${PERL5LIB}:${HOME}/src/BioPerl-1.6.924
PERL5LIB=${PERL5LIB}:${HOME}/src/ensembl/modules
PERL5LIB=${PERL5LIB}:${HOME}/src/ensembl-compara/modules
PERL5LIB=${PERL5LIB}:${HOME}/src/ensembl-variation/modules
PERL5LIB=${PERL5LIB}:${HOME}/src/ensembl-funcgen/modules
PERL5LIB=${PERL5LIB}:${HOME}/src/ensembl-io/modules
PERL5LIB=${PERL5LIB}:${HOME}/src/ensembl-tools
export PERL5LIB
sudo perl -MCPAN -e shell
install Bio::PrimarySeqI
install DBI
- Hisat2
unzip hisat2-*.zip
cd hisat2-*
The necessary files for grch37:
wget https://genome-idx.s3.amazonaws.com/hisat/grch37_genome.tar.gz
wget http://ftp.ensembl.org/pub/grch37/release-104/gtf/homo_sapiens/Homo_sapiens.GRCh37.87.gtf.gz
tar -zxvf grch37_genome.tar.gz
gunzip Homo_sapiens.GRCh37.87.gtf.gz -d
The necessary files for grch38:
wget https://genome-idx.s3.amazonaws.com/hisat/grch38_genome.tar.gz
wget http://ftp.ensembl.org/pub/release-104/gtf/homo_sapiens/Homo_sapiens.GRCh38.104.gtf.gz
tar -zxvf grch38_genome.tar.gz
gunzip Homo_sapiens.GRCh38.104.gtf.gz -d
- Stringtie
tar -zxvf stringtie-*.tar.gz
- OptiType
unzip OptiType.zip -d OptiType
cd OptiType/glpk-5.0
./configure
make && make install
cd ../OptiType/hdf5-1.12.1
./configure
make && make install
vim /etc/ld.so.conf
# add the following line to /etc/ld.so.conf, then save
/usr/local/lib
/sbin/ldconfig -v
pip install numpy
pip install pyomo
pip install pysam
pip install matplotlib
pip install tables
pip install pandas
pip install future
- STAR
unzip STAR-master.zip
cd STAR-master/source
make STAR
The necessary files for grch37:
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz
The necessary files for grch38:
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.annotation.gtf.gz
- Arriba
tar -xvf arriba_v1.1.0.tar.gz
cd arriba_v1.1.0 && make
- DeepHLApan
unzip deephlapan.zip -d deephlapan
cd deephlapan
python setup.py install
- Configure the file in the directory /config, taking grch38 as an example:
trimmomatic_tool /home/tsnad/Tools/Trimmomatic-0.39/trimmomatic-0.39.jar
bwa_folder /home/tsnad/Tools/bwa-0.7.17/
samtools_folder /home/tsnad/Tools/samtools-1.13/
gatk_tool /home/tsnad/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar
VEP_folder /home/tsnad/Tools/ensembl-vep/
hisat2_folder /home/tsnad/Tools/hisat2-2.1.0/
stringtie_tool /home/tsnad/Tools/hisat2-2.1.0/stringtie-1.3.5.Linux_x86_64/stringtie
Optitype_folder /home/tsnad/Tools/OptiType/
star_folder /home/tsnad/Tools/STAR/
arriba_folder /home/tsnad/Tools/arriba_v1.1.0/
ref_human_file /home/tsnad/Tools/gatk-4.2.0.0/grch38/Homo_sapiens_assembly38.fasta
ref_1000G_file /home/tsnad/Tools/gatk-4.2.0.0/grch38/1000G_phase1.snps.high_confidence.hg38.vcf
ref_Mills_file /home/tsnad/Tools/gatk-4.2.0.0/grch38/Mills_and_1000G_gold_standard.indels.hg38.vcf
ref_dbsnp_file /home/tsnad/Tools/gatk-4.2.0.0/grch38/dbsnp_144.hg38_adj.vcf
headcrop 10
leading 3
minlen 35
needRevisedData True
normal_f 0
normal_reads 6
slidingwindow 4:15
threadNum 6
trailing 3
tumor_alt 5
tumor_f 0.05
tumor_reads 10
typeNum 2
laneNum 1
partNum 2
Replace the path of each tool and reference file with your own. Do not change the other parameters, from headcrop to partNum, unless you understand their meaning.
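Because a wrong path in the configuration only shows up once the pipeline reaches the affected step, it can be useful to check the file beforehand. The sketch below is not part of TSNAD: it assumes the file holds one whitespace-separated `key value` pair per line, and that path entries can be recognised by the suffixes `_tool`, `_folder` and `_file` seen in the example; `parse_config` and `missing_paths` are hypothetical names.

```python
import os


def parse_config(text):
    """Parse 'key value' lines into a dict of strings; blank lines are skipped.

    Assumption: one pair per line, key and value separated by whitespace.
    """
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        parts = line.split(None, 1)
        config[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return config


def missing_paths(config):
    """Return the keys of path-like entries whose target does not exist."""
    path_suffixes = ("_tool", "_folder", "_file")  # inferred from the example above
    return [key for key, value in config.items()
            if key.endswith(path_suffixes) and not os.path.exists(value)]
```

Calling `missing_paths(parse_config(open(path).read()))` on the edited configuration file returns an empty list when every tool and reference path resolves.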
- After configuration, return to the directory where TSNAD.py is located and run:
python TSNAD.py -I [dir of WES/WGS] -R [dir of RNA-seq] -V [grch37/grch38] -O [dir of outputs]
headcrop: Cut the specified number of bases from the start of the read, default 10, used by trimmomatic
leading: Cut bases off the start of a read, if below a threshold quality, default 3, used by trimmomatic
minlen: Drop the read if it is below a specified length, default 35, used by trimmomatic
slidingwindow: Perform a sliding window trimming, cutting once the average quality within the window falls below a threshold, default 4:15, used by trimmomatic
normal_f: The maximum fraction of single nucleotide variant in normal sample, default 0, used for somatic mutation filtering.
normal_reads: The minimum number of sequence reads in normal sample, default 6, used for somatic mutation filtering.
tumor_alt: The minimum number of single nucleotide variant in tumor sample, default 5, used for somatic mutation filtering.
tumor_f: The minimum fraction of single nucleotide variant in tumor sample, default 0.05, used for somatic mutation filtering.
tumor_reads: The minimum number of sequence reads in tumor sample, default 10, used for somatic mutation filtering.
typeNum: The number of types of input files (i.e. tumor and normal: 2, tumor only: 1), default: 2. In this tool, it is always 2.
laneNum: The number of lanes used in sequencing, default: 1.
partNum: Single-read sequencing: 1, paired-end sequencing: 2, default: 2.
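The five somatic filtering parameters above combine into a single pass/fail decision per variant. The sketch below shows one way to read them; it is not TSNAD's own code. The function name, the argument names, and the choice of inclusive comparisons (`>=` and `<=`) are assumptions.

```python
def passes_somatic_filter(normal_depth, normal_alt_reads, tumor_depth, tumor_alt_reads,
                          normal_f=0.0, normal_reads=6,
                          tumor_alt=5, tumor_f=0.05, tumor_reads=10):
    """Apply the README's filtering thresholds to one candidate variant.

    Assumption: thresholds are inclusive. Depths are total read counts at the
    site; *_alt_reads are the reads supporting the variant allele.
    """
    if normal_depth <= 0 or tumor_depth <= 0:
        return False
    # Both samples need enough coverage for the call to be trusted.
    if normal_depth < normal_reads or tumor_depth < tumor_reads:
        return False
    normal_fraction = normal_alt_reads / normal_depth
    tumor_fraction = tumor_alt_reads / tumor_depth
    return (normal_fraction <= normal_f
            and tumor_alt_reads >= tumor_alt
            and tumor_fraction >= tumor_f)
```

With the defaults, a variant seen in 8 of 40 tumor reads and 0 of 20 normal reads passes, while a single supporting read in the normal sample is enough to reject it because normal_f is 0.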
With the default parameters, the input WGS/WES files in the input directory should be named:
normal_L1_R1.fastq.gz
normal_L1_R2.fastq.gz
tumor_L2_R1.fastq.gz
tumor_L2_R2.fastq.gz
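A quick check of the input directory before a long run can catch misnamed files. The sketch below infers a naming pattern from the four example names above (`normal` or `tumor`, a lane tag, a read tag, then `.fastq.gz`); the pattern and the helper name `group_inputs` are assumptions, and the pipeline itself may accept names this pattern rejects.

```python
import re

# Pattern inferred from the example file names; an assumption, not TSNAD's rule.
FASTQ_NAME = re.compile(r"^(normal|tumor)_L(\d+)_R([12])\.fastq\.gz$")


def group_inputs(filenames):
    """Group file names by sample type and collect names that do not match."""
    grouped = {"normal": [], "tumor": []}
    unmatched = []
    for name in sorted(filenames):
        match = FASTQ_NAME.match(name)
        if match:
            grouped[match.group(1)].append(name)
        else:
            unmatched.append(name)
    return grouped, unmatched
```

Passing `os.listdir(input_dir)` to `group_inputs` gives the tumor and normal files found, plus any unexpected names to review before starting.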
The sample files can be downloaded from the following links:
normal_L1_R1.fastq.gz
normal_L1_R2.fastq.gz
tumor_L2_R1.fastq.gz
tumor_L2_R2.fastq.gz
rna_L1_R1.fastq.gz
rna_L1_R2.fastq.gz
To generate usable neoantigen predictions, the minimum depth is 15X for WGS and 50X for WES; the recommended depth is 30X for WGS and 100X for WES. For a sample with WES tumor/normal data and RNA-seq data, neoantigen prediction takes about 50 hours on an Ubuntu system with 64G of memory and 512G of hard disk space.
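Whether a dataset meets these depths can be estimated before running the pipeline. The sketch below uses the common approximation that mean depth equals read count times read length divided by target size; it ignores duplicates, unmapped reads and off-target reads, so it overestimates real coverage. The function names and status labels are hypothetical; only the thresholds come from the text above.

```python
MIN_DEPTH = {"WGS": 15, "WES": 50}           # minimum depths stated above
RECOMMENDED_DEPTH = {"WGS": 30, "WES": 100}  # recommended depths stated above


def mean_depth(read_count, read_length, target_size):
    """Approximate mean depth as total sequenced bases over target size in bases."""
    return read_count * read_length / target_size


def depth_status(depth, assay):
    """Classify a depth for 'WGS' or 'WES' against the thresholds above."""
    if depth < MIN_DEPTH[assay]:
        return "below minimum"
    if depth < RECOMMENDED_DEPTH[assay]:
        return "usable"
    return "recommended"
```

For example, 600 million reads of 150 bases over a genome of roughly 3 billion bases (an approximate figure) gives `mean_depth(600_000_000, 150, 3_000_000_000)`, which evaluates to 30.0 and is classified as "recommended" for WGS.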
2021.07
- replace SOAP-HLA and Kourami with OptiType
- the version of each tool is listed as follows:
  Trimmomatic 0.39, BWA 0.7.17, SAMtools 1.13, GATK 4.2.0.0, VEP 104, Hisat2 2.2.1, Stringtie 2.1.6, OptiType 1.3.5, STAR 2.7, Arriba 1.1.0, DeepHLApan 1.1
2019.09
- provide the neoantigen prediction from indel and gene fusion
- replace NetMHCpan with DeepHLApan
- provide the docker version of TSNAD
- provide the web-service of TSNAD (http://biopharm.zju.edu.cn/tsnad/)
2019.05
- VEP v94 -> v96
- Add the selection of grch38 when calling mutations.
2018.11
- Trimmomatic v0.35 -> v0.38
- BWA v0.7.12 -> v0.7.17
- SAMtools v1.3 -> v1.9
- Picard v1.140 -> embedded in GATK
- GATK v3.5 -> v4.0.11.0
- Annovar -> VEP v94
- NetMHCpan v2.8 -> v4.0
- Add the function of RNA-seq analysis for neoantigen filter.
2017.04
- GUI for neoantigen prediction
- Two parts: one for somatic mutation detection, another for HLA-peptide binding prediction.