ahalfpen727/VariantAnalysisScripts

Variant Calling Workflows

This repository contains several variant calling workflows. The workflows identify variants from SNP-array genotyping data and from data produced by several types of next-generation sequencing (NGS) technologies. The pipelines follow the best practices described in the reference material for the Genome Analysis Toolkit (GATK) at https://sites.google.com/a/broadinstitute.org/legacy-gatk-forum-discussions/best-practices-workflows and the general workflow depicted below:

[Figure: GATK's Best Practices Workflow for DNA-Seq Variant Calling]

The pipelines are ordered logically: variants are first discovered with a large, cost-effective SNP array and then validated by targeted sequencing. To keep targeted sequencing cost-effective, thorough variant discovery practices are used to reduce the number of false positives carried forward for validation. The pipelines also differ in how samples are analyzed (for example, individually or as a cohort), as described in the section for each workflow.
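As a rough illustration of the validation step, the sketch below compares variants called from the SNP array with variants called by targeted sequencing and reports the fraction of array calls that sequencing confirmed. It is a hypothetical helper, not code from this repository: it assumes both call sets were already normalized the same way (for example with bcftools norm), restricted to the targeted regions, and filtered to sites where the sample carries a non-reference genotype, and it matches variants on (CHROM, POS, REF, ALT) only.

```python
"""Hypothetical sketch: fraction of SNP-array calls confirmed by targeted sequencing.

Matching on (CHROM, POS, REF, ALT) is an assumption; inputs are expected to be
normalized, restricted to the targets, and limited to non-reference genotypes."""

from typing import Iterable, Set, Tuple

VariantKey = Tuple[str, int, str, str]


def variant_keys(vcf_lines: Iterable[str]) -> Set[VariantKey]:
    """Collect (CHROM, POS, REF, ALT) keys from VCF text lines, splitting multi-allelic ALTs."""
    keys: Set[VariantKey] = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):
            if alt != ".":  # "." means no alternate allele at this site
                keys.add((chrom, pos, ref, alt))
    return keys


def validation_rate(array_calls: Set[VariantKey], seq_calls: Set[VariantKey]) -> float:
    """Fraction of array calls that targeted sequencing also called."""
    if not array_calls:
        return 0.0
    return len(array_calls & seq_calls) / len(array_calls)
```

For example, if two of three array calls inside the targets are also found by sequencing, validation_rate returns about 0.67; a low rate points to array false positives that stricter discovery filtering would need to remove.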

SNP-array Analysis Pipeline:

This workflow is designed to take Illumina GSA SNP-array data as input.

This pipeline identifies variants from Illumina Infinium Global Screening Array (GSA) IDAT files. It uses Illumina's proprietary iaap-cli command-line software to convert the IDAT files to GTC files. The rest of the pipeline relies on bcftools plugins and the Genome Analysis Toolkit to annotate, phase, filter, and identify variants. The pipeline is described in more detail here.
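The IDAT-to-VCF conversion can be sketched as a pair of command builders. This is a minimal sketch, assuming the iaap-cli and bcftools +gtc2vcf options as we understand them from their documentation (gencall's positional manifest/cluster/output arguments, --idat-folder, --output-gtc, and gtc2vcf's --bpm, --csv, --egt, --gtcs, --fasta-ref); the sort and norm stages and all file names are assumptions, so check every flag against your installed versions before use.

```python
"""Hypothetical sketch: build the IDAT -> GTC -> VCF commands.

Flags follow the iaap-cli and bcftools +gtc2vcf documentation as we
understand it; check them against your installed versions. All paths
are placeholders."""

import shlex
import subprocess
from typing import List


def idat_to_gtc(bpm: str, egt: str, idat_dir: str, gtc_dir: str) -> List[str]:
    """iaap-cli gencall: genotype the IDAT files in idat_dir and write GTC files to gtc_dir."""
    return ["iaap-cli", "gencall", bpm, egt, gtc_dir,
            "--idat-folder", idat_dir, "--output-gtc"]


def gtc_to_vcf(bpm: str, csv: str, egt: str, gtc_dir: str,
               ref: str, out_bcf: str) -> List[List[str]]:
    """Pipeline stages: gtc2vcf -> sort -> normalize into a compressed BCF."""
    return [
        ["bcftools", "+gtc2vcf", "--no-version", "-Ou",
         "--bpm", bpm, "--csv", csv, "--egt", egt,
         "--gtcs", gtc_dir, "--fasta-ref", ref],
        ["bcftools", "sort", "-Ou"],
        # -c x excludes records whose REF allele does not match the FASTA
        ["bcftools", "norm", "--no-version", "-Ob", "-c", "x",
         "-f", ref, "-o", out_bcf],
    ]


def as_shell(stages: List[List[str]]) -> str:
    """Render pipeline stages as one shell-quoted command line."""
    return " | ".join(shlex.join(stage) for stage in stages)


def run_pipeline(stages: List[List[str]]) -> None:
    """Run the stages in bash with pipefail so a failing stage is not masked."""
    subprocess.run(["bash", "-o", "pipefail", "-c", as_shell(stages)], check=True)
```

Keeping the commands as argument lists makes them easy to print with as_shell and review before anything is executed.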

SNP-array Analysis Overview

  1. Convert IDAT to GTC (Illumina's iaap-cli gencall)
  2. Convert GTC to VCF (bcftools +gtc2vcf)
  3. Annotate variants (bcftools annotate)
  4. Extract ACMG59 table (bcftools view & bcftools query)
  5. Perform SNP QC (GATK)
  6. Phase genotypes
  7. Detect mosaic chromosomal alterations (MoChA)
  8. Compute principal components and ancestry
  9. Extract final tables
  10. Generate MoChA call plots (MoChA)
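Step 4 (extracting the ACMG59 table) can be sketched as a bcftools view | bcftools query pipeline plus a small parser that lists carriers. This is a hypothetical example: acmg59.bed stands for a BED file of ACMG59 gene regions that would be prepared separately, the input VCF/BCF must be indexed for -R to work, and the GT="alt" filter keeps sites where at least one sample carries an alternate allele.

```python
"""Hypothetical sketch of step 4 (extract an ACMG59 table).

acmg59.bed is a placeholder name for a BED file of ACMG59 gene regions."""

import shlex
from typing import Iterable, List, Tuple

# bcftools expands \t and \n itself, so the backslashes are passed literally.
QUERY_FORMAT = r"%CHROM\t%POS\t%REF\t%ALT[\t%SAMPLE=%GT]\n"


def acmg59_query(vcf: str, bed: str = "acmg59.bed") -> str:
    """Shell pipeline: restrict to ACMG59 regions, keep alt sites, print a table."""
    view = ["bcftools", "view", "-R", bed, "-i", 'GT="alt"', "-Ou", vcf]
    query = ["bcftools", "query", "-f", QUERY_FORMAT]
    return shlex.join(view) + " | " + shlex.join(query)


Row = Tuple[str, str, int, str, str, str]


def carriers(query_lines: Iterable[str]) -> List[Row]:
    """Return (sample, chrom, pos, ref, alt, gt) for every non-reference genotype."""
    rows: List[Row] = []
    for line in query_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        chrom, pos, ref, alt = fields[:4]
        for entry in fields[4:]:
            sample, gt = entry.rsplit("=", 1)
            alleles = gt.replace("|", "/").split("/")
            # a genotype counts if any allele is neither reference (0) nor missing (.)
            if any(a not in ("0", ".") for a in alleles):
                rows.append((sample, chrom, int(pos), ref, alt, gt))
    return rows
```

Splitting each per-sample entry on the last "=" keeps sample names that themselves contain "=" intact.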


Targeted Sequencing Pipeline:

This is a targeted sequencing variant analysis workflow. It requires Illumina MiSeq or Illumina NextSeq targeted sequencing data as input.

This pipeline identifies variants from targeted sequencing data, such as data produced by an Illumina MiSeq. Reads are aligned with BWA-MEM and deduplicated with Picard; the rest of the pipeline relies on the Genome Analysis Toolkit and bcftools to call, merge, and annotate variants.

  1. Align data (BWA MEM)
  2. Remove duplicates (picard)
  3. Recalibrate base quality scores (GATK BQSR)
  4. Estimate target coverage (bedtools coverage)
  5. Run Mutect2 (GATK)
  6. Run HaplotypeCaller (GATK)
  7. Merge calls (bcftools merge)
  8. Annotate variants (bcftools annotate & bcftools csq)
  9. Extract final table with ACMG59 (bcftools query)
  10. Generate IGV plots (IGV)
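Steps 1 to 4 of the targeted sequencing pipeline can be sketched as a list of commands. This is a minimal sketch under assumptions the workflow does not state: paired-end FASTQ input, samtools for coordinate sorting, GATK4's bundled copy of Picard's MarkDuplicates, and a known-sites VCF (for example dbSNP) for base quality score recalibration. File names are placeholders.

```python
"""Hypothetical sketch of targeted-sequencing steps 1-4 as command lists.

Assumptions: paired-end FASTQ input, samtools for sorting, GATK4's bundled
MarkDuplicates, and a known-sites VCF for BQSR. Paths are placeholders."""

import subprocess
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    name: str
    argv: List[str]
    stdout: Optional[str] = None  # file that captures the command's stdout, if any


def preprocessing_steps(sample: str, ref: str, r1: str, r2: str,
                        known_sites: str, targets_bed: str,
                        threads: int = 4) -> List[Step]:
    # GATK requires a read group; bwa mem expands the literal \t itself.
    read_group = rf"@RG\tID:{sample}\tSM:{sample}\tPL:ILLUMINA"
    return [
        Step("align", ["bwa", "mem", "-t", str(threads), "-R", read_group,
                       ref, r1, r2], stdout=f"{sample}.sam"),
        Step("sort", ["samtools", "sort", "-o", f"{sample}.sorted.bam", f"{sample}.sam"]),
        Step("dedup", ["gatk", "MarkDuplicates", "-I", f"{sample}.sorted.bam",
                       "-O", f"{sample}.dedup.bam", "-M", f"{sample}.dup_metrics.txt",
                       "--REMOVE_DUPLICATES", "true", "--CREATE_INDEX", "true"]),
        Step("bqsr_table", ["gatk", "BaseRecalibrator", "-R", ref,
                            "-I", f"{sample}.dedup.bam", "--known-sites", known_sites,
                            "-O", f"{sample}.recal.table"]),
        Step("bqsr_apply", ["gatk", "ApplyBQSR", "-R", ref, "-I", f"{sample}.dedup.bam",
                            "--bqsr-recal-file", f"{sample}.recal.table",
                            "-O", f"{sample}.recal.bam"]),
        # -mean reports the mean read depth for each target interval
        Step("coverage", ["bedtools", "coverage", "-a", targets_bed,
                          "-b", f"{sample}.recal.bam", "-mean"],
             stdout=f"{sample}.target_coverage.tsv"),
    ]


def run(steps: List[Step]) -> None:
    """Execute each step in order, stopping at the first failure."""
    for step in steps:
        if step.stdout:
            with open(step.stdout, "w") as out:
                subprocess.run(step.argv, stdout=out, check=True)
        else:
            subprocess.run(step.argv, check=True)
```

Returning plain data (the Step list) keeps the plan inspectable and testable without any of the tools installed; run is only needed when executing for real.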

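Steps 5 to 8 of the targeted sequencing pipeline (calling, merging, and annotation) can be sketched as command lists as well. This is a hypothetical sketch under stated assumptions: tumor-only Mutect2 with GATK 4.1+ syntax, calling restricted to the target BED file, an indexed VCF (for example dbSNP) whose IDs bcftools annotate copies, and an Ensembl-style GFF3 for bcftools csq. Because both callers name the sample column after the read group's SM tag, bcftools merge rejects the duplicate name unless --force-samples is given; using that flag is our assumption, and it renames the second column (for example to "2:S1").

```python
"""Hypothetical sketch of targeted-sequencing steps 5-8 as command lists.

Assumptions: tumor-only Mutect2 (GATK 4.1+), calls restricted to the target
BED, an indexed ID-source VCF, and an Ensembl GFF3. Paths are placeholders."""

from typing import List


def calling_steps(sample: str, ref: str, targets_bed: str,
                  id_vcf: str, gff3: str) -> List[List[str]]:
    bam = f"{sample}.recal.bam"
    return [
        # 5. Mutect2 calls; GATK indexes .vcf.gz outputs by default
        ["gatk", "Mutect2", "-R", ref, "-I", bam, "-L", targets_bed,
         "-O", f"{sample}.mutect2.vcf.gz"],
        # 6. HaplotypeCaller germline calls over the same targets
        ["gatk", "HaplotypeCaller", "-R", ref, "-I", bam, "-L", targets_bed,
         "-O", f"{sample}.haplotypecaller.vcf.gz"],
        # 7. Merge; --force-samples accepts the duplicate sample name
        ["bcftools", "merge", "--force-samples", "-Oz",
         "-o", f"{sample}.merged.vcf.gz",
         f"{sample}.mutect2.vcf.gz", f"{sample}.haplotypecaller.vcf.gz"],
        ["bcftools", "index", "-t", f"{sample}.merged.vcf.gz"],
        # 8a. Copy variant IDs from the annotation VCF into the ID column
        ["bcftools", "annotate", "-a", id_vcf, "-c", "ID", "-Oz",
         "-o", f"{sample}.annotated.vcf.gz", f"{sample}.merged.vcf.gz"],
        # 8b. Predict consequences; -p a takes unphased genotypes as they are
        ["bcftools", "csq", "-f", ref, "-g", gff3, "-p", "a", "-Oz",
         "-o", f"{sample}.csq.vcf.gz", f"{sample}.annotated.vcf.gz"],
    ]
```

Without a -p choice, bcftools csq expects phased genotypes, which targeted calls usually lack; that is why the sketch sets one explicitly.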
GATK's Germline Variant Discovery for Analysis of a Cohort of Samples

[Figure: GVC (GATK germline variant discovery workflow for a cohort of samples)]
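For a cohort, GATK's germline workflow calls each sample in GVCF mode and then genotypes all samples jointly. The sketch below builds those commands, assuming GATK4 with GenomicsDBImport for consolidation, one analysis-ready BAM per sample, and an intervals file for -L (which GenomicsDBImport requires). The workspace name and output file are placeholders, and GenomicsDBImport expects the workspace directory not to exist yet.

```python
"""Hypothetical sketch of GATK4 joint genotyping for a cohort.

Assumes one analysis-ready BAM per sample and an intervals file (-L), which
GenomicsDBImport requires. Paths and the workspace name are placeholders."""

from typing import Dict, List


def cohort_steps(bams: Dict[str, str], ref: str, intervals: str,
                 workspace: str = "cohort_db") -> List[List[str]]:
    steps: List[List[str]] = []
    gvcfs: List[str] = []
    # 1. Per-sample calling in GVCF mode (also records reference-confidence blocks)
    for sample, bam in sorted(bams.items()):
        gvcf = f"{sample}.g.vcf.gz"
        steps.append(["gatk", "HaplotypeCaller", "-R", ref, "-I", bam,
                      "-L", intervals, "-ERC", "GVCF", "-O", gvcf])
        gvcfs.append(gvcf)
    # 2. Consolidate every GVCF into one GenomicsDB workspace
    import_cmd = ["gatk", "GenomicsDBImport",
                  "--genomicsdb-workspace-path", workspace, "-L", intervals]
    for gvcf in gvcfs:
        import_cmd += ["-V", gvcf]
    steps.append(import_cmd)
    # 3. Joint genotyping across the whole cohort
    steps.append(["gatk", "GenotypeGVCFs", "-R", ref,
                  "-V", f"gendb://{workspace}", "-O", "cohort.vcf.gz"])
    return steps
```

Calling per sample first means a new sample only needs its own GVCF before re-running the cheaper joint genotyping step.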

GATK's Germline Variant Discovery for Analysis of Individual Samples

[Figure: GVS (GATK germline variant discovery workflow for individual samples)]
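A single sample usually has too few variants to train GATK's VQSR, so hard filtering is one option GATK documents for this case. The sketch below builds a HaplotypeCaller, SelectVariants, and VariantFiltration sequence using GATK's commonly cited SNP hard-filter thresholds (QD < 2.0, FS > 60.0, MQ < 40.0, SOR > 3.0, MQRankSum < -12.5, ReadPosRankSum < -8.0). Whether the individual-sample workflow here filters this way is an assumption; the filter names and file names are placeholders.

```python
"""Hypothetical sketch: single-sample calling plus GATK-style SNP hard filters."""

from typing import List, Tuple

# (filter name, JEXL expression) pairs; thresholds from GATK's hard-filtering guidance
SNP_FILTERS: List[Tuple[str, str]] = [
    ("QD2", "QD < 2.0"),
    ("FS60", "FS > 60.0"),
    ("MQ40", "MQ < 40.0"),
    ("SOR3", "SOR > 3.0"),
    ("MQRankSum-12.5", "MQRankSum < -12.5"),
    ("ReadPosRankSum-8", "ReadPosRankSum < -8.0"),
]


def single_sample_steps(sample: str, bam: str, ref: str) -> List[List[str]]:
    raw = f"{sample}.vcf.gz"
    snps = f"{sample}.snps.vcf.gz"
    # Each failing expression adds its name to the FILTER column; records are kept
    filt = ["gatk", "VariantFiltration", "-V", snps,
            "-O", f"{sample}.snps.filtered.vcf.gz"]
    for name, expr in SNP_FILTERS:
        filt += ["-filter", expr, "--filter-name", name]
    return [
        ["gatk", "HaplotypeCaller", "-R", ref, "-I", bam, "-O", raw],
        ["gatk", "SelectVariants", "-V", raw, "-select-type", "SNP", "-O", snps],
        filt,
    ]
```

Indels would need their own thresholds in a separate VariantFiltration run; the sketch covers SNPs only.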

About

Several variant analysis workflows for different types of sequencing data
