The TronFlow alignment pipeline is part of a collection of computational workflows for tumor-normal pair somatic variant calling.
This pipeline aligns paired-end and single-end FASTQ files with the BWA aln and mem algorithms and with BWA-MEM2.
For RNA-seq data, STAR is also supported. To increase sensitivity to novel splice junctions, use `--star_two_pass_mode`
(recommended for RNA-seq variant calling).
It also includes an initial step of read trimming using FASTP.
Run it from GitHub as follows:
nextflow run tron-bioinformatics/tronflow-alignment -profile conda --input_files $input --output $output --algorithm aln --library paired
Otherwise download the project and run as follows:
nextflow run main.nf -profile conda --input_files $input --output $output --algorithm aln --library paired
Find the help as follows:
$ nextflow run tron-bioinformatics/tronflow-alignment --help
N E X T F L O W ~ version 19.07.0
Launching `main.nf` [intergalactic_shannon] - revision: e707c77d7b
Usage:
nextflow main.nf --input_files input_files [--reference reference.fasta]
Input:
* input_fastq1: the path to a FASTQ file (incompatible with --input_files)
* input_name: name of the sample (only needed if input_fastq1 is used)
* input_files: the path to a tab-separated values file containing in each row the sample name and two paired FASTQs (incompatible with --fastq1 and --fastq2)
when `--library paired`, or a single FASTQ file when `--library single`
Example input file:
name1 fastq1.1 fastq1.2
name2 fastq2.1 fastq2.2
* reference: path to the indexed FASTA genome reference or the star reference folder in case of using star
Optional input:
* input_fastq2: the path to a second FASTQ file (incompatible with --input_files, incompatible with --library paired)
* output: the folder where to publish output (default: output)
* algorithm: determines the BWA algorithm, either `aln`, `mem`, `mem2` or `star` (default `aln`)
* library: determines whether the sequencing library is paired or single end, either `paired` or `single` (default `paired`)
* cpus: determines the number of CPUs for each job, with the exception of bwa sampe and samse steps which are not parallelized (default: 8)
* memory: determines the memory required by each job (default: 32g)
* inception: if enabled it uses an inception, only valid for BWA aln, it requires a fast file system such as flash (default: false)
* skip_trimming: skips the read trimming step
* star_two_pass_mode: activates STAR two-pass mode, increasing sensitivity of novel junction discovery, recommended for RNA variant calling (default: false)
* additional_args: additional alignment arguments, only effective in BWA mem, BWA mem 2 and STAR (default: none)
Output:
* A BAM file ${name}.bam and its index
* FASTP read trimming stats report in HTML format ${name}.fastp_stats.html
* FASTP read trimming stats report in JSON format ${name}.fastp_stats.json
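The table with FASTQ files expects three tab-separated columns without a header when `--library paired` (sample name, FASTQ 1 and FASTQ 2), or two columns (sample name and a single FASTQ) when `--library single`: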
Sample name | FASTQ 1 | FASTQ 2 |
---|---|---|
sample_1 | /path/to/sample_1.1.fastq | /path/to/sample_1.2.fastq |
sample_2 | /path/to/sample_2.1.fastq | /path/to/sample_2.2.fastq |
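
For many samples, the table can be generated rather than typed by hand. The following is a minimal sketch, not part of the pipeline: `make_input_table` is a hypothetical helper, and it assumes paired files are named `<sample>.1.fastq` and `<sample>.2.fastq` as in the example above. Adapt the suffixes to your own naming scheme.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the pipeline).
# Prints one tab-separated row per sample: name, FASTQ 1, FASTQ 2.
# Assumption: mates are named <sample>.1.fastq and <sample>.2.fastq.
make_input_table() {
    local fastq_dir="$1"
    local fq1 fq2 name
    for fq1 in "$fastq_dir"/*.1.fastq; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$fq1" ] || continue
        fq2="${fq1%.1.fastq}.2.fastq"
        if [ ! -e "$fq2" ]; then
            # Samples without a second mate are reported and left out.
            echo "missing mate for $fq1" >&2
            continue
        fi
        name="$(basename "$fq1" .1.fastq)"
        printf '%s\t%s\t%s\n' "$name" "$fq1" "$fq2"
    done
}
```

A call such as `make_input_table /path/to/fastqs > input_files.tsv` would then produce a file suitable for `--input_files`.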
The reference genome has to be provided in FASTA format and it requires two sets of indexes:
- FAI index. Create with
samtools faidx your.fasta
- BWA indexes. Create with
bwa index your.fasta
For bwa-mem2 a specific index is needed:
bwa-mem2 index your.fasta
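
Before launching a long run, it can help to confirm that the indexes sit next to the FASTA. The sketch below is an assumption-laden pre-flight check, not part of the pipeline: `check_reference_indexes` is a hypothetical name, and it only looks for the `.fai` file written by `samtools faidx` and the `.amb`, `.ann`, `.bwt`, `.pac` and `.sa` files written by `bwa index`. It does not cover the bwa-mem2 index.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not shipped with the pipeline).
# Returns 0 if the FAI and BWA index files exist next to the FASTA, 1 otherwise.
check_reference_indexes() {
    local fasta="$1"
    local missing=0 suffix
    # fai: samtools faidx; amb, ann, bwt, pac, sa: bwa index
    for suffix in fai amb ann bwt pac sa; do
        if [ ! -e "${fasta}.${suffix}" ]; then
            echo "missing: ${fasta}.${suffix}" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For instance, `check_reference_indexes your.fasta && echo "reference ready"` prints the confirmation only when every expected file is present.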
For STAR, a reference folder prepared with STAR has to be provided. Preparing it requires the reference genome in FASTA format and the gene annotations in GTF format. Run a command as follows:
STAR --runMode genomeGenerate --genomeDir $YOUR_FOLDER --genomeFastaFiles $YOUR_FASTA --sjdbGTFfile $YOUR_GTF
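
Once the STAR reference folder exists, an RNA-seq run combines it with the options described above. The sketch below only assembles and prints such a command for review; `star_run_command` is a hypothetical helper, and the choice of `-profile conda` and `--library paired` is an assumption carried over from the earlier examples. Every option it uses appears in the help text above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the pipeline).
# Prints, without executing, a command line for a paired-end RNA-seq run
# with STAR in two-pass mode.
# Arguments: input table, STAR reference folder, output folder.
star_run_command() {
    local input_table="$1" star_reference="$2" output_dir="$3"
    echo nextflow run tron-bioinformatics/tronflow-alignment \
        -profile conda \
        --input_files "$input_table" \
        --reference "$star_reference" \
        --output "$output_dir" \
        --algorithm star \
        --library paired \
        --star_two_pass_mode
}
```

Calling `star_run_command input_files.tsv "$YOUR_FOLDER" output` shows the full command, which can then be copied and run once the paths look right.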
- Li H., Durbin R. (2010). Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics, 26(5), 589-595. https://doi.org/10.1093/bioinformatics/btp698
- Chen S., Zhou Y., Chen Y., Gu J. (2018). fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34(17), i884-i890. https://doi.org/10.1093/bioinformatics/bty560
- Vasimuddin Md., Misra S., Li H., Aluru S. (2019). Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. IEEE Parallel and Distributed Processing Symposium (IPDPS).
- Dobin A., Davis C.A., Schlesinger F., Drenkow J., Zaleski C., Jha S., Batut P., Chaisson M., Gingeras T.R. (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 29(1), 15-21. https://doi.org/10.1093/bioinformatics/bts635. PMID: 23104886; PMCID: PMC3530905.