Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix several issues with M2 and HC force-calling mode #5874

Merged
merged 2 commits into from Apr 18, 2019
Merged

Conversation

davidbenjamin
Copy link
Contributor

Closes #5857.

@ldgauthier Here it is, with the 5-haplotype compromise

Copy link
Contributor

@ldgauthier ldgauthier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor things and one unit test request. Feel free to merge after you add the test.

for (final Allele givenAllele : unassembledGivenAlleles) {
for (final Haplotype baseHaplotype : baseHaplotypes) {
// make sure this allele doesn't collide with a variant on the haplotype
if (baseHaplotype.getEventMap()!= null && baseHaplotype.getEventMap().getVariantContexts().stream().anyMatch(vc -> vc.overlaps(givenVC))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if the GGA allele is a SNP that is spanned by a deletion in the discovered variants the it's only added to the reference haplotype, right? And it will still get output in the vcf?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and yes.

EventMap.buildEventMapsForHaplotypes(haplotypeList, fullReferenceWithPadding, paddedReferenceLoc, debug, maxMnpDistance);
variationEvents = EventMap.getAllVariantContexts(haplotypeList);

final boolean sameMnpDistance = lastMaxMnpDistanceUsed.isPresent() && maxMnpDistance == lastMaxMnpDistanceUsed.getAsInt();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would the MNP distance change within the same tool execution?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't ever change, but I lean toward keeping the logic self-consistent without assuming anything about the tools that invoke it. Of course, some of these classes are so entwined with HC and M2 that it's safe to assume things, but in this case the price of caution is small.

Also, the code for exactly when event maps are cached was quite brittle in that you could write reasonable code and encounter gotchas, so I felt like making it all more explicit as to what had and had not been computed previously.

@@ -542,9 +542,6 @@ public ActivityProfileState isActive( final AlignmentContext context, final Refe
final AssemblyResultSet untrimmedAssemblyResult = AssemblyBasedCallerUtils.assembleReads(region, givenAlleles, hcArgs, readsHeader, samplesList, logger, referenceReader, assemblyEngine, aligner, !hcArgs.doNotCorrectOverlappingBaseQualities);

final SortedSet<VariantContext> allVariationEvents = untrimmedAssemblyResult.getVariationEvents(hcArgs.maxMnpDistance);
// TODO - line bellow might be unnecessary : it might be that assemblyResult will always have those alleles anyway
// TODO - so check and remove if that is the case:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels so good to finally take care of that TODO that's been around for longer than I have.

* @param readErrorCorrector a ReadErrorCorrector object, if read are to be corrected before assembly. Can be null if no error corrector is to be used.
* @param aligner
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're going to modify the java doc anyway, can you add a blurb for the aligner?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* @param reads the reads we're going to assemble
* @param refHaplotype the reference haplotype
* @param aligner
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

java doc update please

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* @param allowLowComplexityGraphs if true, do not check for low-complexity graphs
* @param allowNonUniqueKmersInRef if true, do not fail if the reference has non-unique kmers
* @param aligner
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -61,7 +61,7 @@ public EventMap(final Collection<VariantContext> stateForTesting) {
/**
*
* @param maxMnpDistance Phased substitutions separated by this distance or less are merged into MNPs. More than
* two substitutions occuring in the same alignment block (ie the same M/X/EQ CIGAR element)
* two substitutions occurring in the same alignment block (ie the same M/X/EQ CIGAR element)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Boy Scout Rule

@@ -464,6 +464,33 @@ public void testGivenAllelesMode() throws Exception {
}
}

/**
* Here we give Mutect2 ridiculous kmer settings in order to force assembly to fail.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was where the previous behavior was an infinite loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous behavior was: assembly fails, therefore GGA alleles are injected in vain and don't get genotyped. Assembly still fails now, but we then inject the GGA alleles into the reference without using a graph.

Assert.assertEquals(assemblyResultSet.getHaplotypeCount(), 4);
Assert.assertEquals(assemblyResultSet.getHaplotypeList().get(2).getBaseString(), "AAAAGCCCGGGGTTTT");
Assert.assertEquals(assemblyResultSet.getHaplotypeList().get(3).getBaseString(), "ACAAGCCCGGGGTTTT");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like your code should handle multi-allelic input alleles, but can you add a test please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@codecov-io
Copy link

codecov-io commented Apr 18, 2019

Codecov Report

Merging #5874 into master will increase coverage by 6.735%.
The diff coverage is 97.163%.

@@               Coverage Diff               @@
##              master     #5874       +/-   ##
===============================================
+ Coverage     80.106%   86.841%   +6.735%     
- Complexity     30643     32319     +1676     
===============================================
  Files           1991      1991               
  Lines         149202    149308      +106     
  Branches       16480     16486        +6     
===============================================
+ Hits          119520    129661    +10141     
+ Misses         23875     13633    -10242     
- Partials        5807      6014      +207
Impacted Files Coverage Δ Complexity Δ
...walkers/haplotypecaller/HaplotypeCallerEngine.java 78.547% <ø> (-0.074%) 74 <0> (ø)
...institute/hellbender/utils/haplotype/EventMap.java 84.53% <ø> (ø) 40 <0> (ø) ⬇️
...pecaller/readthreading/ReadThreadingAssembler.java 66.25% <100%> (-1.827%) 48 <0> (-4)
...lotypecaller/AssemblyBasedCallerUtilsUnitTest.java 98.542% <100%> (+0.155%) 89 <2> (+2) ⬆️
...r/tools/walkers/mutect/Mutect2IntegrationTest.java 88.06% <100%> (+0.284%) 95 <4> (+4) ⬆️
...der/tools/walkers/mutect/M2ArgumentCollection.java 90.244% <100%> (ø) 16 <1> (+1) ⬆️
.../readthreading/ReadThreadingAssemblerUnitTest.java 98.712% <100%> (ø) 38 <0> (ø) ⬇️
...otypecaller/HaplotypeCallerArgumentCollection.java 100% <100%> (ø) 4 <1> (+1) ⬆️
...ecaller/AssemblyBasedCallerArgumentCollection.java 100% <100%> (ø) 2 <0> (ø) ⬇️
...ols/walkers/haplotypecaller/AssemblyResultSet.java 76.836% <83.333%> (+0.505%) 51 <3> (+5) ⬆️
... and 169 more

@davidbenjamin davidbenjamin merged commit 5381421 into master Apr 18, 2019
@davidbenjamin davidbenjamin deleted the db_gga branch April 18, 2019 11:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve force-calling / GGA mode
3 participants