Illumina_variant_calling

Illumina_variant_calling is a pipeline for performing quality check, alignment and small variant calling with Illumina paired-end reads.

Getting started

Prerequisites

  • Miniconda3. Tested with conda 4.10.3. Running which conda should return the path to the executable. If you don't have Miniconda3 installed, you can download and install it with:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod 755 Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh

Then, after completing the Illumina_variant_calling installation, set the MINICONDA_DIR variable in config_Variant_calling_pipeline.sh to the full path of the miniconda3 directory.

  • A reference sequence in fasta format

  • A pair of fastq files containing R1 and R2 paired-end Illumina reads

Installation

git clone https://github.com/MaestSi/Illumina_variant_calling.git
cd Illumina_variant_calling
chmod 755 *
./install.sh

Alternatively, you can download a Docker image with:

docker pull maestsi/illumina_variant_calling:latest

The install.sh script creates a conda environment named Illumina_variant_calling_env, in which fastqc, fastp, samtools, bwa, picard, gatk and qualimap are installed. Afterwards, open the config_Variant_calling_pipeline.sh file with a text editor and set the variables PIPELINE_DIR and MINICONDA_DIR to the values suggested by the installation step.
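If you want to double-check the value to use for MINICONDA_DIR, the sketch below derives the Miniconda3 root directory from the path of the conda executable. The function name miniconda_dir_from_conda_path is hypothetical and not part of this repository; it assumes conda lives in a bin/ or condabin/ folder directly under the Miniconda3 root, which is the usual layout.

```shell
# Hypothetical helper (not part of the pipeline): derive the Miniconda3 root
# from the path of the conda executable.
# Assumes the layout <root>/bin/conda or <root>/condabin/conda.
miniconda_dir_from_conda_path() {
    # First dirname strips the executable name, second strips bin/ or condabin/.
    dirname "$(dirname "$1")"
}

# Illustrative path only; on a real system you would pass "$(which conda)".
miniconda_dir_from_conda_path "/home/user/miniconda3/bin/conda"
```

The printed directory is a candidate value for MINICONDA_DIR; the value suggested by the installation step takes precedence if the two differ.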

Usage

Variant_calling_pipeline.sh

Usage: ./Variant_calling_pipeline.sh -1 <sample_name_reads_R1.fastq> -2 <sample_name_reads_R2.fastq> -r <reference.fasta>

Note: activate the conda environment with conda activate Illumina_variant_calling_env before running.

Inputs:

  • <sample_name_reads_R1.fastq>: fastq file containing R1 reads
  • <sample_name_reads_R2.fastq>: fastq file containing R2 reads
  • <reference.fasta>: fasta sequence to be used as a reference for alignment and variant calling
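Before launching the pipeline, it may help to confirm that the two fastq files really form a pair. The following is a minimal sketch of such a pre-flight check, not something the pipeline is known to do itself. The function names are hypothetical, and the check assumes uncompressed fastq files with exactly four lines per read.

```shell
# Hypothetical pre-flight check; not part of the pipeline.
# Assumes uncompressed FASTQ with exactly four lines per record.
count_fastq_reads() {
    # Number of lines divided by four gives the number of records.
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

check_read_pairs() {
    r1="$1"
    r2="$2"
    # Both files must exist and be non-empty.
    for f in "$r1" "$r2"; do
        if [ ! -s "$f" ]; then
            echo "Missing or empty file: $f" >&2
            return 1
        fi
    done
    # Paired-end files should contain the same number of reads.
    n1=$(count_fastq_reads "$r1")
    n2=$(count_fastq_reads "$r2")
    if [ "$n1" -ne "$n2" ]; then
        echo "Read count mismatch: R1=$n1 R2=$n2" >&2
        return 1
    fi
    echo "OK: $n1 read pairs"
}
```

A call such as check_read_pairs sample_name_reads_R1.fastq sample_name_reads_R2.fastq returns a non-zero status when a file is missing or the read counts differ.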

Outputs:

  • QC: folder containing sequencing and mapping quality reports
  • <sample_name>_mapped_to_<reference_name>_MarkDup_Clipped.bam: post-processed bam file
  • <sample_name>.variants.filtered.vcf.gz: vcf file containing filtered variants
  • <sample_name>.complete.raw.g.vcf.gz: gvcf file containing genotype at each genomic locus
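After a run, the presence of these outputs can be checked with a short script. The sketch below is an illustration under assumptions: the function names are hypothetical, the sample and reference names are supplied by the caller (how the pipeline derives them from the input file names is not covered here), and the bam pattern is read as <sample_name>_mapped_to_<reference_name>_MarkDup_Clipped.bam.

```shell
# Hypothetical helper: print the output names from the Outputs list for a
# given sample name and reference name (both supplied by the caller).
expected_outputs() {
    sample="$1"
    reference="$2"
    echo "QC"
    echo "${sample}_mapped_to_${reference}_MarkDup_Clipped.bam"
    echo "${sample}.variants.filtered.vcf.gz"
    echo "${sample}.complete.raw.g.vcf.gz"
}

# Report which of the expected outputs are present in a directory.
report_outputs() {
    dir="$1"
    sample="$2"
    reference="$3"
    expected_outputs "$sample" "$reference" | while IFS= read -r name; do
        if [ -e "$dir/$name" ]; then
            echo "found    $name"
        else
            echo "missing  $name"
        fi
    done
}
```

For example, report_outputs . sample_name reference_name lists each expected file or folder in the current directory as found or missing.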

Citation

Lopatriello G, Maestri S, Alfano M, Papa R, Di Vittori V, De Antoni L, Bellucci E, Pieri A, Bitocchi E, Delledonne M, Rossato M. CRISPR/Cas9-Mediated Enrichment Coupled to Nanopore Sequencing Provides a Valuable Tool for the Precise Reconstruction of Large Genomic Target Regions. International Journal of Molecular Sciences. 2023; 24(2):1076. https://doi.org/10.3390/ijms24021076

For further information, please refer to the following manuscripts or repositories:

FastQC

Shifu Chen, Yanqing Zhou, Yaru Chen, Jia Gu, fastp: an ultra-fast all-in-one FASTQ preprocessor, Bioinformatics, Volume 34, Issue 17, 01 September 2018, Pages i884–i890, https://doi.org/10.1093/bioinformatics/bty560

Li H. (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997v2

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8. PMID: 19505943; PMCID: PMC2723002.

fgbio

Picard Tools

Konstantin Okonechnikov, Ana Conesa and Fernando García-Alcalde, Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data, Bioinformatics (2015)

Van der Auwera GA & O'Connor BD. (2020). Genomics in the Cloud: Using Docker, GATK, and WDL in Terra (1st Edition). O'Reilly Media
