New qual doesn't count spanning deletions as support for variant qual #4801

davidbenjamin · 2018-05-23T10:56:21Z

@ldgauthier The new qual was not doing the right thing for spanning deletions. This fixes it. Would you mind reviewing? And would you like me to re-run those GVCFs just in case before going ahead with #4614?

ldgauthier

Did you confirm that both those tests fail in master? I pasted them into my branch and testPresenceOfUnlikelySpanningDeletionDoesntAffectResults passes, but it's possible some changes I made in my branch are affecting the results.

davidbenjamin · 2018-05-23T14:42:05Z

testSpanningDeletionIsNotConsideredVariant fails in master. testPresenceOfUnlikelySpanningDeletionDoesntAffectResults passes here and in master, as it should. Its purpose is to check that the spanning deletion logic doesn't break anything.

ldgauthier · 2018-05-23T18:42:44Z

If the second test passed in master then I don't consider it to be a good test of your fix. Can you make a test where the PLs with and without span del would lead to different QUALs in the old version? Maybe a case with a confident deletion in sample1 and a low quality SNP in sample2 that causes a spanning deletion in sample1. You could probably do this by modifying your first test by adding a sample that has A or B called with low GQ with respect to the reference.

davidbenjamin · 2018-05-23T19:44:09Z

The point of the second test is to make sure that the new code path involving the spanning deletion has no effect if the spanning deletion has low likelihood. You could imagine that I introduced a bug where you lose sensitivity whenever a spanning deletion appears in the alleles list, even if there's very little evidence for it, just because its potential existence sends the code down some new, incorrect logic path.

I think testSpanningDeletionIsNotConsideredVariant already does what you want: before this fix, the example VC is definitely considered variant even though the only non-ref evidence is for the symbolic * allele. The PLs in this example are 50, 100, 100, 0, 100, 100, which means that it is overwhelmingly likely that this sample's genotype is REF/SPAN_DEL. In the old code you get a qual of about 50 (the hom ref PL), whereas in the new code the probability of a variant is about 1 in 10^10, since the only truly variant genotypes have PLs of 100 and a PL of 100 corresponds to a relative likelihood of 10^(-100/10) = 10^-10.
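The arithmetic in this comment can be checked with a minimal sketch. It assumes the alleles are ordered REF, SNP alt, `*` (consistent with the PL of 0 landing on REF/SPAN_DEL) and uses a flat genotype prior. That is a simplification for illustration, not the allele-frequency model in `AlleleFrequencyCalculator`, so the numbers are only approximate.

```python
import math

# PLs quoted in the comment, in VCF genotype order for three alleles.
# ASSUMPTION: alleles are ordered REF, SNP alt (ALT), spanning deletion (*).
GENOTYPES = ["REF/REF", "REF/ALT", "ALT/ALT", "REF/*", "ALT/*", "*/*"]
PLS = [50, 100, 100, 0, 100, 100]


def posteriors(pls):
    """Normalized genotype probabilities under a flat prior (a simplification)."""
    likelihoods = [10 ** (-pl / 10) for pl in pls]
    total = sum(likelihoods)
    return [lk / total for lk in likelihoods]


def phred(p):
    """Phred-scale a probability."""
    return -10 * math.log10(p)


post = dict(zip(GENOTYPES, posteriors(PLS)))

# Old behaviour as described: every genotype other than hom-ref counts as
# variant, so the qual is the phred-scaled probability of REF/REF.
old_qual = phred(post["REF/REF"])

# New behaviour as described: only genotypes carrying the real alt allele
# count as variant; REF/* and */* do not.
p_variant = sum(p for genotype, p in post.items() if "ALT" in genotype)
new_qual = phred(1 - p_variant)

print(f"old qual  ~ {old_qual:.2f}")
print(f"P(variant) ~ {p_variant:.2e}")
print(f"new qual  ~ {new_qual:.2e}")
```

Under these assumptions the old-style qual comes out near 50 and the variant probability is on the order of 10^-10, matching the figures quoted above.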

Now, if you think a multi-sample case would be useful that sounds good to me. However, a "confident deletion in sample1 and a low quality SNP in sample2" would not look very different since the deletion would only exist in an upstream VC, not the VC being tested, so basically we would have sample1 = REF/SPAN_DEL, sample2 = REF/SNP with low quality. The current test is the same thing with just sample1. Is this what you had in mind?

ldgauthier · 2018-05-25T18:47:26Z

Let's be rigorous and use two samples so that you're testing a case that could actually occur in the wild. It should be easy enough to add sample2 with PLs like [10,0,40,100,70,300]. Then can you add a comment about where the "magic number" in the assert statement comes from? Or maybe to be super duper rigorous you could compare the QUAL from that variant to the QUAL for sample3 and sample4 where 3 and 4 have the same PLs as 1 and 2, but without the spanning deletion.
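To see why these PLs describe a low-quality SNP, it helps to label each value with its genotype. The sketch below uses the standard VCF ordering for diploid PLs, where genotype j/k (j <= k) sits at index k(k+1)/2 + j. The allele order REF, SNP, `*` is an assumption made for illustration; the thread does not state it explicitly.

```python
def genotype_order(n_alleles):
    """Diploid genotypes (j, k) with j <= k, in VCF PL order."""
    return [(j, k) for k in range(n_alleles) for j in range(k + 1)]


ALLELES = ["REF", "SNP", "*"]  # ASSUMED allele ordering
SAMPLE2_PLS = [10, 0, 40, 100, 70, 300]  # PLs suggested in the comment

# Attach a readable genotype label to each PL.
labelled = {
    f"{ALLELES[j]}/{ALLELES[k]}": pl
    for (j, k), pl in zip(genotype_order(len(ALLELES)), SAMPLE2_PLS)
}

# The called genotype is the one with the lowest PL; the gap to the
# next-best PL is a rough stand-in for genotype quality.
best = min(labelled, key=labelled.get)
ranked = sorted(SAMPLE2_PLS)
gap_to_next_best = ranked[1] - ranked[0]

for genotype, pl in labelled.items():
    print(f"{genotype:8s} PL={pl}")
print(f"best genotype: {best}, gap to next best: {gap_to_next_best}")
```

With this labelling, the best genotype is REF/SNP and hom-ref is only 10 phred units behind, which is the "low GQ with respect to the reference" case requested earlier in the thread.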

davidbenjamin · 2018-05-27T03:54:41Z

@ldgauthier I did all of those things. Note that to be precise, when you remove the spanning deletion the "het" REF / SPAN_DEL must be treated as haploid, which the new tests do.
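One way to read the haploid remark is sketched below. The helper `drop_span_del` is hypothetical and is not taken from the PR; the tests added here may build their PLs differently. It assumes diploid PLs over the alleles REF, SNP, `*` in VCF order. A sample whose genotype includes `*` keeps only the genotypes with exactly one `*` and becomes haploid, while a sample without `*` simply drops every genotype containing it.

```python
def drop_span_del(pls, carries_span_del):
    """Reduce diploid PLs over (REF, SNP, *) to PLs over (REF, SNP).

    Hypothetical helper for illustration only.
    ASSUMPTION: pls are in VCF order for alleles REF, SNP, *.
    """
    ref_ref, ref_snp, snp_snp, ref_del, snp_del, _del_del = pls
    if carries_span_del:
        # One chromosome is deleted here, so the other behaves like a
        # haploid genotype: REF/* -> REF, SNP/* -> SNP.
        reduced = [ref_del, snp_del]
    else:
        # Ordinary diploid sample: keep only genotypes without '*'.
        reduced = [ref_ref, ref_snp, snp_snp]
    best = min(reduced)
    return [pl - best for pl in reduced]  # renormalize so the best PL is 0


# sample1 (REF/SPAN_DEL) becomes a haploid sample3 that is confidently REF.
sample3_pls = drop_span_del([50, 100, 100, 0, 100, 100], carries_span_del=True)

# sample2 (low-quality REF/SNP) becomes a diploid sample4 over REF and SNP.
sample4_pls = drop_span_del([10, 0, 40, 100, 70, 300], carries_span_del=False)

print(sample3_pls)
print(sample4_pls)
```

Under this reading, the span-del sample contributes a single confident REF chromosome, which is why it should act like a haploid ref when comparing QUALs with and without the spanning deletion.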

ldgauthier

Beautiful new tests. I completely agree with the logic about the span-del het being like a haploid ref.

codecov-io · 2018-05-30T14:02:29Z

Codecov Report

Merging #4801 into master will increase coverage by 0.348%.
The diff coverage is 90.476%.

@@               Coverage Diff               @@
##              master     #4801       +/-   ##
===============================================
+ Coverage     80.131%   80.479%   +0.348%     
- Complexity     17488     18032      +544     
===============================================
  Files           1085      1089        +4     
  Lines          63245     65070     +1825     
  Branches       10200     10630      +430     
===============================================
+ Hits           50679     52368     +1689     
- Misses          8579      8632       +53     
- Partials        3987      4070       +83

| Impacted Files | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| ...org/broadinstitute/hellbender/utils/MathUtils.java | `77.162% <100%> (+0.308%)` | `194 <2> (+2)` | ⬆️ |
| ...rs/genotyper/afcalc/AlleleFrequencyCalculator.java | `85.227% <86.667%> (-0.299%)` | `28 <4> (+5)` | |
| ...forms/markduplicates/MarkDuplicatesSparkUtils.java | `87.64% <0%> (-1.857%)` | `69% <0%> (+7%)` | |
| ...s/spark/sv/discovery/SvDiscoveryInputMetaData.java | `100% <0%> (ø)` | `9% <0%> (+7%)` | ⬆️ |
| .../utils/read/markduplicates/DuplicationMetrics.java | `85.366% <0%> (ø)` | `13% <0%> (?)` | |
| ...covery/inference/CpxVariantReInterpreterSpark.java | `100% <0%> (ø)` | `5% <0%> (?)` | |
| ...ils/test/testers/AbstractMarkDuplicatesTester.java | `79.487% <0%> (ø)` | `17% <0%> (?)` | |
| ...nce/SegmentedCpxVariantSimpleVariantExtractor.java | `93.96% <0%> (ø)` | `71% <0%> (?)` | |
| .../sv/StructuralVariationDiscoveryPipelineSpark.java | `88.652% <0%> (+0.081%)` | `13% <0%> (ø)` | ⬇️ |
| ...iscoverFromLocalAssemblyContigAlignmentsSpark.java | `77.019% <0%> (+0.177%)` | `3% <0%> (+1%)` | ⬆️ |

... and 30 more

ldgauthier · 2018-05-30T14:18:16Z

@davidbenjamin I kicked Travis and now tests are green so you can merge.

new qual doesn't count spanning deletions as support for variant qual

f4f815d

davidbenjamin assigned ldgauthier May 23, 2018

ldgauthier reviewed May 23, 2018

added some unit tests

64f3fd9

ldgauthier approved these changes May 30, 2018

davidbenjamin merged commit ecc32f6 into master May 30, 2018

davidbenjamin deleted the db_new_qual_span_del branch May 30, 2018 15:40

davidbenjamin mentioned this pull request May 30, 2018

make newQual the default for GenotypeGVCFs #4614

Closed

droazen mentioned this pull request Jul 2, 2018

New qual parameter errors GenotypeGVCFs in v4.0.5.0. #4975

Closed
