gatk-workflows/gatk4-basic-joint-genotyping

gatk4-basic-joint-genotyping

Basic joint genotyping with GATK4. NOT Best Practices, only for teaching/demo purposes.

Inputs and outputs

Required inputs

  • One or more per-sample GVCF files (.g.vcf), provided as an array
  • Genomic resources: reference genome in FASTA format (.fasta) and its accessory files (.fasta.fai and .dict)
  • List of intervals to process in GATK intervals list format (.list)
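The required inputs above can be sanity-checked before launching a run. The following is a minimal sketch, not part of the workflow itself: the function names are hypothetical, and it assumes the usual convention that `ref.fasta` is accompanied by `ref.fasta.fai` and `ref.dict`, and that the intervals file holds one interval per line.

```python
import os


def reference_accessories(fasta):
    """Return the accessory paths conventionally expected next to a FASTA.

    Assumption: the index is <fasta>.fai and the sequence dictionary
    replaces the final extension with .dict.
    """
    return {
        "fai": fasta + ".fai",                       # ref.fasta -> ref.fasta.fai
        "dict": os.path.splitext(fasta)[0] + ".dict",  # ref.fasta -> ref.dict
    }


def parse_intervals(text):
    """Parse the contents of a .list file: one interval per line, blanks skipped."""
    return [line.strip() for line in text.splitlines() if line.strip()]
```

For example, `parse_intervals("chr20\nchr21\n")` yields two intervals, which would correspond to two scattered shards in the interval-based steps.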

Optional inputs

  • Resource and environment parameters, including memory, disk space, and container, are all customizable

Outputs

  • A multi-sample VCF of variants joint-called across the cohort, block-gzipped (.gz) with tabix index (.gz.tbi)

Overview of the pipeline

This workflow consists of four steps:

RenameAndIndexFile

Ensures that the input GVCF files have the appropriate file extensions (.g.vcf.gz) and creates an index file (.tbi).

  • Per file, scattered by input file
  • Expects an input GVCF
  • Outputs a copy of the GVCF (renamed if it did not have the right extension) and its index file.
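The renaming rule can be illustrated with a small sketch. This is not the workflow's actual task code: the function names are hypothetical, and it assumes the input is already block-gzipped and only the extension needs normalizing (an uncompressed file would need compressing first, which is not shown).

```python
import os

# Extensions are checked longest-first so that ".g.vcf.gz" wins over ".vcf.gz".
_KNOWN_EXTENSIONS = (".g.vcf.gz", ".gvcf.gz", ".vcf.gz", ".gz")


def normalized_gvcf_name(path):
    """Return the file name with a .g.vcf.gz extension (assumes gzipped input)."""
    name = os.path.basename(path)
    for ext in _KNOWN_EXTENSIONS:
        if name.endswith(ext):
            return name[: -len(ext)] + ".g.vcf.gz"
    # No recognized extension: append the expected one.
    return name + ".g.vcf.gz"


def index_name(gvcf_name):
    """Return the tabix index name that sits next to the GVCF."""
    return gvcf_name + ".tbi"
```

A file that already ends in `.g.vcf.gz` keeps its name, so only its index needs to be created.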

ImportGVCFs

Imports data from the input GVCFs into a GenomicsDB datastore.

  • Across all inputs, scattered by genome interval
  • Expects an array of input GVCFs
  • Outputs a tarred GenomicsDB datastore
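As a rough illustration of this step for a single interval, the sketch below assembles a `gatk GenomicsDBImport` command line and a `tar` command that packs the resulting workspace. The helper names and the default workspace name `genomicsdb` are assumptions made for the example; the workflow's own task may pass additional options.

```python
def genomicsdb_import_cmd(gvcfs, interval, workspace="genomicsdb"):
    """Build the argument list to import all GVCFs for one interval."""
    if not gvcfs:
        raise ValueError("at least one GVCF is required")
    cmd = [
        "gatk", "GenomicsDBImport",
        "--genomicsdb-workspace-path", workspace,
        "-L", interval,
    ]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]  # one -V per input sample
    return cmd


def tar_workspace_cmd(workspace="genomicsdb"):
    """Build the argument list that packs the workspace directory into a tarball."""
    return ["tar", "-cf", workspace + ".tar", workspace]
```

Returning argument lists rather than a shell string keeps the sketch easy to pass to `subprocess.run` without quoting issues.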

GenotypeGVCFs

Applies joint genotyping to all samples present in the datastore.

  • Across all inputs, scattered by genome interval
  • Expects a tarred GenomicsDB datastore
  • Outputs a VCF file with variant calls made across the cohort
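For one interval, the genotyping call might look like the sketch below, which assumes the tarred datastore has already been unpacked into a local directory. The helper name and argument names are hypothetical; `gendb://` is the URI prefix GATK uses to read from a GenomicsDB workspace.

```python
def genotype_gvcfs_cmd(reference, workspace, interval, output):
    """Build the argument list to joint-genotype one interval from a GenomicsDB workspace."""
    return [
        "gatk", "GenotypeGVCFs",
        "-R", reference,
        "-V", "gendb://" + workspace,  # read variants from the unpacked datastore
        "-L", interval,
        "-O", output,
    ]
```

An output name ending in `.vcf.gz` would be a natural choice here, given that the final result is described as block-gzipped.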

MergeVCFs

Merges the per-interval VCF files generated by the scatter above.

  • Across genomic intervals
  • Expects an array of per-interval VCFs
  • Outputs the final cohort VCF
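The gather step can be sketched as a single `gatk MergeVcfs` invocation over the per-interval outputs. The helper name is hypothetical, and the sketch simply preserves the order in which the shard files are given.

```python
def merge_vcfs_cmd(per_interval_vcfs, output):
    """Build the argument list that merges per-interval VCFs into one cohort VCF."""
    if not per_interval_vcfs:
        raise ValueError("nothing to merge")
    cmd = ["gatk", "MergeVcfs"]
    for vcf in per_interval_vcfs:
        cmd += ["-I", vcf]  # one -I per shard, in the order given
    cmd += ["-O", output]
    return cmd
```

With two shards, `merge_vcfs_cmd(["chr20.vcf.gz", "chr21.vcf.gz"], "cohort.vcf.gz")` produces a command with two `-I` arguments followed by the output path.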
