Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixing bugs in CalculateGenotypePosteriers #4352

Merged
merged 2 commits into from Feb 20, 2018

Conversation

lbergelson
Copy link
Member

fixing 3 issues in CalclualteGenotypePosteriers

@lbergelson
Copy link
Member Author

@ldgauthier Could you take a look at this? I changed a bunch of logic to bring it inline with modern tool design. I think it's right, but maybe I'm breaking something that I didn't understand?

@codecov-io
Copy link

codecov-io commented Feb 6, 2018

Codecov Report

Merging #4352 into master will increase coverage by 0.015%.
The diff coverage is 85.714%.

@@              Coverage Diff               @@
##             master     #4352       +/-   ##
==============================================
+ Coverage     79.04%   79.055%   +0.015%     
- Complexity    16447     16449        +2     
==============================================
  Files          1047      1047               
  Lines         59189     59185        -4     
  Branches       9672      9670        -2     
==============================================
+ Hits          46783     46789        +6     
+ Misses         8644      8637        -7     
+ Partials       3762      3759        -3
Impacted Files Coverage Δ Complexity Δ
...kers/variantutils/CalculateGenotypePosteriors.java 92.537% <85.714%> (+6.622%) 14 <7> (+1) ⬆️
...park/sv/discovery/alignment/AlignmentInterval.java 89.272% <0%> (ø) 73% <0%> (ø) ⬇️
...kers/variantutils/PosteriorProbabilitiesUtils.java 77.966% <0%> (+0.847%) 43% <0%> (+1%) ⬆️
...oadinstitute/hellbender/utils/gcs/BucketUtils.java 80% <0%> (+1.29%) 39% <0%> (ø) ⬇️
...e/hellbender/engine/spark/SparkContextFactory.java 73.973% <0%> (+2.74%) 11% <0%> (ø) ⬇️

Copy link
Contributor

@ldgauthier ldgauthier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You took out a weird-looking conditional that was a fix for a particular issue. I'm open to an alternate fix that's not so awkward.
While we're here, I'm uncomfortable with the level of reference checking. The supportVariants VCF has to be on the same reference as the input.

if ( header.hasInfoLine(GATKVCFConstants.MLE_ALLELE_COUNT_KEY) ) {
final VCFInfoHeaderLine mleLine = header.getInfoHeaderLine(GATKVCFConstants.MLE_ALLELE_COUNT_KEY);
if ( mleLine.getCountType() != VCFHeaderLineCount.A ) {
throw new UserException("VCF does not have a properly formatted MLEAC field: the count type should be \"A\"");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no regression test for this, but it's from a particular issue: https://github.com/broadinstitute/gsa-unstable/issues/845
The issue was caused by a 4.0 VCF as the supplemental VCF.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmn, I think I thought this was being checked somewhere else now, but I can't find any evidence that it is. Not sure why I took that out...

fixing 3 issues in CalclualteGenotypePosteriers
* no longer emit duplicate variants if you have overlapping VCF records in the input
* specifying vcf.gz output will produce a vcf.gz instead of an unzipped file called vcf.gz
* fix potential NPE during shutdown
@lbergelson lbergelson force-pushed the lb_fixes_for_calculateGenotypeConcordance branch from fa09fce to acdd17c Compare February 20, 2018 20:27
@lbergelson
Copy link
Member Author

@ldgauthier I undid that change that made no sense and added some new tests so it doesn't happen again.

@lbergelson
Copy link
Member Author

@ldgauthier I believe that all of the features reference dictionaries are already checked against eachother for "compatibility". That check is currently kind of strange, but it catches bizarre inconsistencies.

if (!skipPopulationPriors) {
vc_bothPriors = PosteriorProbabilitiesUtils.calculatePosteriorProbs(vc_familyPriors, otherVCs, missing * numRefIfMissing, globalPrior, !ignoreInputSamples, defaultToAC, useACoff);
} else {
final VariantContextBuilder builder2 = new VariantContextBuilder(vc_familyPriors);
VariantContextUtils.calculateChromosomeCounts(builder, false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, this should probably be builder2, right? All this time and no one ever said anything.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It definitely looks like it should be. It's effectively the same thing, but it's very confusing and if anyone ever make changes it could lead to problems... I might start a separate PR to address that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait, what am I talking about... the thing that gets output should be the the result of modifying builder2 and we modify builder. That seems super wrong.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, i'm back to "it's dumb but fine"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new pr to fix this #4431

@ldgauthier
Copy link
Contributor

Take a look at the builder2 issue, but otherwise LGTM! 👍

@lbergelson lbergelson merged commit 7838ffd into master Feb 20, 2018
@lbergelson lbergelson deleted the lb_fixes_for_calculateGenotypeConcordance branch February 20, 2018 21:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Duplicate records in VCF produced by CalculateGenotypePosteriors
3 participants