How to generate inputs using GATK

Amaro Taylor-Weiner edited this page Aug 16, 2018 · 5 revisions

Mutation statistics file

DeTiN requires a candidate somatic site metrics file, two files listing heterozygous site allele counts in the tumor and normal samples, and a ".seg" file with tumor segmented allelic copy number. We used MuTect version 1.1.6 (hg19) to generate "call stats" files as the candidate somatic site metrics input, using the following parameters:

--cosmic gs://firecloud-tcga-open-access/tutorial/reference/hg19_cosmic_v54_120711.vcf

--dbsnp gs://firecloud-tcga-open-access/tutorial/reference/dbsnp_134_b37.leftAligned.vcf

--fraction_contamination = 0.0001

--downsample_to_coverage = 10000
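As a rough illustration, the parameters above could be assembled into a MuTect 1 command line as sketched below. This is a sketch under stated assumptions, not the exact command we ran: the jar name, the BAM, reference and output paths are hypothetical placeholders, the reference VCFs are assumed to have been copied locally, and the underscore spelling of the flags follows the usual MuTect 1 conventions.

```python
import shlex


def mutect_command(tumor_bam, normal_bam, reference_fasta,
                   cosmic_vcf, dbsnp_vcf, out_call_stats,
                   jar="mutect-1.1.6.jar"):  # hypothetical jar name
    """Build an argument list for a MuTect 1 run that writes a call stats file."""
    return [
        "java", "-jar", jar,
        "--analysis_type", "MuTect",
        "--reference_sequence", reference_fasta,
        "--cosmic", cosmic_vcf,
        "--dbsnp", dbsnp_vcf,
        "--input_file:normal", normal_bam,
        "--input_file:tumor", tumor_bam,
        # Values taken from the parameter list above.
        "--fraction_contamination", "0.0001",
        "--downsample_to_coverage", "10000",
        "--out", out_call_stats,
    ]


# All paths below are placeholders for illustration only.
cmd = mutect_command(
    tumor_bam="tumor.bam",
    normal_bam="normal.bam",
    reference_fasta="reference.fasta",
    cosmic_vcf="cosmic.vcf",
    dbsnp_vcf="dbsnp.vcf",
    out_call_stats="pair.call_stats.txt",
)
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with unusual file names.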

DeTiN is effective with other callers; however, MuTect has several advantages for use with deTiN. First, MuTect emits every candidate somatic site that meets the tumor log odds threshold. This is a key feature, since deTiN can only rescue mutations at, and measure TiN from, sites that are present in the input files. Second, MuTect explicitly reports which filters rejected a candidate variant. This allows deTiN to treat as candidate somatic mutations those sites that were filtered because of evidence in the normal, rather than because of mapping quality or other artifact modes.
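To make the second point concrete, the sketch below selects sites from a call stats table that either passed or were rejected only for reasons tied to evidence in the normal. It is an illustration of the idea, not deTiN's actual selection logic. The column names (`judgement`, `failure_reasons`) and the reason labels in `NORMAL_EVIDENCE_REASONS` are assumptions about the MuTect call stats format and should be checked against your own files.

```python
import csv

# Assumed labels for rejections that reflect evidence in the normal sample.
NORMAL_EVIDENCE_REASONS = {"normal_lod", "alt_allele_in_normal", "germline_risk"}


def candidate_sites(call_stats_text):
    """Return rows that passed, or were rejected only for normal-evidence reasons."""
    # Drop blank lines and the '#' comment header before parsing the table.
    lines = [ln for ln in call_stats_text.splitlines()
             if ln and not ln.startswith("#")]
    kept = []
    for row in csv.DictReader(lines, delimiter="\t"):
        reasons = {r.strip() for r in row["failure_reasons"].split(",") if r.strip()}
        only_normal_evidence = bool(reasons) and reasons <= NORMAL_EVIDENCE_REASONS
        if row["judgement"] == "KEEP" or only_normal_evidence:
            kept.append(row)
    return kept


# Tiny made-up table for illustration; not real data.
example = "\n".join([
    "## hypothetical header comment",
    "contig\tposition\tjudgement\tfailure_reasons",
    "1\t100\tKEEP\t",
    "1\t200\tREJECT\tnormal_lod,alt_allele_in_normal",
    "2\t300\tREJECT\tpoor_mapping_region_alternate_allele_mapq",
    "2\t400\tREJECT\tnormal_lod,clustered_read_position",
])
for site in candidate_sites(example):
    print(site["contig"], site["position"], site["judgement"])
```

A site rejected for both a normal-evidence reason and an artifact reason (the last row) is excluded, because the artifact filter alone is enough to distrust it.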

Allelic Copy Number data

To generate the heterozygous allele counts and segmented allelic copy number data we used GATK4. For a complete guide to running GATK4 CNV, see these documents: "(How to) Call somatic copy number variants using GATK4 CNV", "Overview of GetBayesianHetCoverage for heterozygous SNP calling", and "Description and examples of the steps in the ACNV case workflow". These modules are available for download and use in FireCloud (upon request). We used the following parameter settings:

GetBayesianHetCoverage --readDepthThreshold = 15, --minimumMappingQuality = 30, --minimumBaseQuality = 20, --hetCallingStringency = 30, --minimumAbnormalFraction = 0.8, --maximumAbnormalFraction = 0.9, --maximumCopyNumber = 2, --quadratureOrder = 200, --errorAdjustmentFactor = 1

PadTargets --padding = 250

PerformSegmentation --alpha = 0.01, --eta = 0.05, --kmax = 25, --minWidth = 2, --nmin = 200, --nperm = 10000, --trim = 0.025, --undoPrune = 0.05, --undoSD = 3, --undoSplits = NONE, --pmethod = HYBRID

AllelicCNV --smallSegmentThreshold = 3, --numSamplesCopyRatio = 100, --numBurnInCopyRatio = 50, --numSamplesAlleleFraction = 200, --numBurnInAlleleFraction = 100, --intervalThresholdCopyRatio = 2, --intervalThresholdAlleleFraction = 2, --maxNumIterationsSimSeg = 25, --maxNumIterationsSNPSeg = 25, --useAllCopyRatioSegments = false
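For scripting, it can help to keep these settings in one place. The sketch below records the values listed above in a dictionary and renders them as `--name value` pairs. The rendering style is an assumption about how the tools accept arguments, and the input and output arguments each tool also needs are deliberately left out; see the GATK4 CNV documents for the full invocations.

```python
# Parameter values copied from the settings above, kept as strings as written.
GATK_CNV_SETTINGS = {
    "GetBayesianHetCoverage": {
        "readDepthThreshold": "15",
        "minimumMappingQuality": "30",
        "minimumBaseQuality": "20",
        "hetCallingStringency": "30",
        "minimumAbnormalFraction": "0.8",
        "maximumAbnormalFraction": "0.9",
        "maximumCopyNumber": "2",
        "quadratureOrder": "200",
        "errorAdjustmentFactor": "1",
    },
    "PadTargets": {"padding": "250"},
    "PerformSegmentation": {
        "alpha": "0.01",
        "eta": "0.05",
        "kmax": "25",
        "minWidth": "2",
        "nmin": "200",
        "nperm": "10000",
        "trim": "0.025",
        "undoPrune": "0.05",
        "undoSD": "3",
        "undoSplits": "NONE",
        "pmethod": "HYBRID",
    },
    "AllelicCNV": {
        "smallSegmentThreshold": "3",
        "numSamplesCopyRatio": "100",
        "numBurnInCopyRatio": "50",
        "numSamplesAlleleFraction": "200",
        "numBurnInAlleleFraction": "100",
        "intervalThresholdCopyRatio": "2",
        "intervalThresholdAlleleFraction": "2",
        "maxNumIterationsSimSeg": "25",
        "maxNumIterationsSNPSeg": "25",
        "useAllCopyRatioSegments": "false",
    },
}


def render_arguments(tool):
    """Render one tool's settings as a flat list of '--name', 'value' pairs."""
    args = []
    for name, value in GATK_CNV_SETTINGS[tool].items():
        args += ["--" + name, value]
    return args


for tool in GATK_CNV_SETTINGS:
    print(tool, " ".join(render_arguments(tool)))
```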