Variant Calling Workflows

This repository contains several variant calling workflows. The workflows are designed to identify variants from sequence data produced by several different types of NGS technologies. The pipelines follow the rigorous standards described in the reference material associated with the Genome Analysis Toolkit at https://sites.google.com/a/broadinstitute.org/legacy-gatk-forum-discussions/best-practices-workflows and the same general workflow depicted below:

GATK's Best Practices Workflow for DNA-Seq Variant Calling

The pipelines are ordered in a logical sequence: initial discovery of variants on a large, cost-effective SNP array, followed by validation of those variants via targeted sequencing. To keep targeted sequencing cost-effective, thorough variant discovery practices are employed to reduce the number of false positives. The pipelines also differ in how the samples are considered, which is described further in the individual section for each workflow.

SNP-array Analysis Pipeline:

This workflow was designed to use Illumina GSA SNP-array data as input.

This pipeline identifies variants from Illumina's Infinium GSA SNP-array IDAT files. The pipeline uses Illumina's proprietary iaap-cli command-line software to convert the IDAT files to GTC files. The rest of the pipeline relies on bcftools plugins and the Genome Analysis Toolkit to annotate, phase, filter, and identify variants. The pipeline is more thoroughly described .

SNP-array Analysis Overview

Convert IDAT to GTC (Illumina's iaap-cli gencall)
Convert GTC to VCF (bcftools +gtc2vcf)
Annotate variants (bcftools annotate)
Extract ACMG59 table (bcftools view & bcftools query)
Perform SNP QC (GATK)
Phase genotypes
Run MoChA (MoChA)
Compute principal components and ancestry
Extract final tables
Generate MoChA call plots (MoChA)
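
The first two steps of the overview above can be sketched as command lines. This is a minimal, hedged illustration and not the repository's actual scripts: the helper names (gencall_cmd, gtc2vcf_cmd) and all file paths are hypothetical, and the flags reflect the documented usage of iaap-cli gencall and the bcftools gtc2vcf plugin as best understood, so they should be checked against the installed versions. The code only builds and prints the commands; it does not run them.

```python
import shlex


def gencall_cmd(bpm, egt, idat_dir, gtc_dir):
    """Step 1: convert IDAT files to GTC files with iaap-cli gencall.

    bpm/egt are the array manifest and cluster files (hypothetical paths).
    """
    return [
        "iaap-cli", "gencall",
        bpm, egt, gtc_dir,            # manifest, cluster file, output folder
        "--idat-folder", idat_dir,    # folder holding the input IDAT files
        "--output-gtc",               # write GTC output
    ]


def gtc2vcf_cmd(bpm, egt, gtc_dir, fasta_ref, out_bcf):
    """Step 2: convert GTC files to a BCF with the bcftools gtc2vcf plugin."""
    return [
        "bcftools", "+gtc2vcf",
        "--bpm", bpm,
        "--egt", egt,
        "--gtcs", gtc_dir,
        "--fasta-ref", fasta_ref,
        "--output-type", "b",         # compressed BCF
        "--output", out_bcf,
    ]


if __name__ == "__main__":
    # Dry run with placeholder file names: print the commands, do not execute.
    steps = [
        gencall_cmd("GSA.bpm", "GSA.egt", "idats/", "gtcs/"),
        gtc2vcf_cmd("GSA.bpm", "GSA.egt", "gtcs/", "ref.fa", "cohort.bcf"),
    ]
    for cmd in steps:
        print(shlex.join(cmd))
```

Building each command as an argument list (rather than one shell string) keeps paths with spaces safe and lets the same list be handed to subprocess.run once the dry-run output looks right.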

Targeted Sequencing Pipeline:

This is a targeted sequencing variant analysis workflow. This analysis requires Illumina MiSeq or Illumina NextSeq targeted sequencing data as input.

This pipeline identifies variants from targeted sequencing data. Data produced by an Illumina MiSeq or NextSeq can be input into this pipeline. The pipeline aligns reads with BWA MEM and then relies on Picard, bcftools, and the Genome Analysis Toolkit to identify variants.

Align data (BWA MEM)
Remove duplicates (Picard)
Recalibrate base quality scores (GATK)
Estimate target coverages (bedtools coverage)
Run Mutect2 (GATK)
Run HaplotypeCaller (GATK)
Merge calls (bcftools merge)
Annotate variants (bcftools annotate & bcftools csq)
Extract final table with ACMG59 (bcftools query)
Generate IGV plots (IGV)
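
The alignment-through-calling steps above can be sketched as a list of command lines. This is a rough illustration under stated assumptions, not the repository's actual scripts: the function name targeted_seq_commands, the output file naming, and the read-group fields are hypothetical; a picard wrapper executable is assumed to be on the PATH; and the explicit Picard SortSam step is an assumption added here so that duplicate marking receives coordinate-sorted input. Flags should be verified against the installed tool versions. The code only builds and prints commands.

```python
import shlex


def targeted_seq_commands(sample, ref, r1, r2, known_sites, targets_bed):
    """Return (step name, argv) pairs for one sample; nothing is executed."""
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    recal_table = f"{sample}.recal.table"
    recal_bam = f"{sample}.recal.bam"
    # Hypothetical read group; bwa expands the literal "\t" sequences itself.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        ("align", ["bwa", "mem", "-R", read_group, "-o", sam, ref, r1, r2]),
        # Assumed extra step: coordinate-sort before duplicate removal.
        ("sort", ["picard", "SortSam", f"I={sam}", f"O={sorted_bam}",
                  "SORT_ORDER=coordinate"]),
        ("remove_duplicates", ["picard", "MarkDuplicates", f"I={sorted_bam}",
                               f"O={dedup_bam}", f"M={sample}.dup_metrics.txt",
                               "REMOVE_DUPLICATES=true", "CREATE_INDEX=true"]),
        ("build_recal_table", ["gatk", "BaseRecalibrator", "-R", ref,
                               "-I", dedup_bam, "--known-sites", known_sites,
                               "-O", recal_table]),
        ("apply_recal", ["gatk", "ApplyBQSR", "-R", ref, "-I", dedup_bam,
                         "--bqsr-recal-file", recal_table, "-O", recal_bam]),
        # bedtools writes the per-target coverage table to standard output.
        ("target_coverage", ["bedtools", "coverage", "-a", targets_bed,
                             "-b", recal_bam]),
        ("mutect2", ["gatk", "Mutect2", "-R", ref, "-I", recal_bam,
                     "-L", targets_bed, "-O", f"{sample}.mutect2.vcf.gz"]),
        ("haplotypecaller", ["gatk", "HaplotypeCaller", "-R", ref,
                             "-I", recal_bam, "-L", targets_bed,
                             "-O", f"{sample}.hc.vcf.gz"]),
    ]


if __name__ == "__main__":
    # Dry run with placeholder inputs.
    for name, argv in targeted_seq_commands(
        "sample1", "ref.fa", "sample1_R1.fastq.gz", "sample1_R2.fastq.gz",
        "known_sites.vcf.gz", "targets.bed",
    ):
        print(f"# {name}")
        print(shlex.join(argv))
```

The merge, annotation, table-extraction, and IGV steps are left out of the sketch because their exact inputs depend on how the two call sets are combined in this repository.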

GATK's Germline Variant Discovery for Analysis of a Cohort of Samples

GATK's Germline Variant Discovery for Analysis of Individual Samples
