Fix several issues with M2 and HC force-calling mode #5874
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -43,6 +43,7 @@ public final class AssemblyResultSet { | |
private boolean wasTrimmed = false; | ||
private final SortedSet<Integer> kmerSizes; | ||
private SortedSet<VariantContext> variationEvents; | ||
private OptionalInt lastMaxMnpDistanceUsed = OptionalInt.empty(); | ||
private boolean debug; | ||
private static final Logger logger = LogManager.getLogger(AssemblyResultSet.class); | ||
|
||
|
@@ -92,6 +93,7 @@ public AssemblyResultSet trimTo(final AssemblyRegion trimmedAssemblyRegion) { | |
result.setRegionForGenotyping(trimmedAssemblyRegion); | ||
result.setFullReferenceWithPadding(fullReferenceWithPadding); | ||
result.setPaddedReferenceLoc(paddedReferenceLoc); | ||
result.variationPresent = haplotypes.stream().anyMatch(Haplotype::isNonReference); | ||
if (result.refHaplotype == null) { | ||
throw new IllegalStateException("missing reference haplotype in the trimmed set"); | ||
} | ||
|
@@ -510,14 +512,24 @@ private void updateReferenceHaplotype(final Haplotype newHaplotype) { | |
*/ | ||
public SortedSet<VariantContext> getVariationEvents(final int maxMnpDistance) { | ||
ParamUtils.isPositiveOrZero(maxMnpDistance, "maxMnpDistance may not be negative."); | ||
if (variationEvents == null) { | ||
final List<Haplotype> haplotypeList = getHaplotypeList(); | ||
EventMap.buildEventMapsForHaplotypes(haplotypeList, fullReferenceWithPadding, paddedReferenceLoc, debug, maxMnpDistance); | ||
variationEvents = EventMap.getAllVariantContexts(haplotypeList); | ||
|
||
final boolean sameMnpDistance = lastMaxMnpDistanceUsed.isPresent() && maxMnpDistance == lastMaxMnpDistanceUsed.getAsInt(); | ||
How would the MNP distance change within the same tool execution?
It shouldn't ever change, but I lean toward keeping the logic self-consistent without assuming anything about the tools that invoke it. Of course, some of these classes are so entwined with HC and M2 that it's safe to assume things, but in this case the price of caution is small. Also, the code for exactly when event maps are cached was quite brittle, in that you could write reasonable code and still encounter gotchas, so I felt like making it all more explicit as to what had and had not been computed previously.
||
lastMaxMnpDistanceUsed = OptionalInt.of(maxMnpDistance); | ||
|
||
if (variationEvents == null || !sameMnpDistance || haplotypes.stream().anyMatch(hap -> hap.isNonReference() && hap.getEventMap() == null)) { | ||
regenerateVariationEvents(maxMnpDistance); | ||
} | ||
return variationEvents; | ||
} | ||
|
||
public void regenerateVariationEvents(int maxMnpDistance) { | ||
final List<Haplotype> haplotypeList = getHaplotypeList(); | ||
EventMap.buildEventMapsForHaplotypes(haplotypeList, fullReferenceWithPadding, paddedReferenceLoc, debug, maxMnpDistance); | ||
variationEvents = EventMap.getAllVariantContexts(haplotypeList); | ||
lastMaxMnpDistanceUsed = OptionalInt.of(maxMnpDistance); | ||
variationPresent = haplotypeList.stream().anyMatch(Haplotype::isNonReference); | ||
} | ||
|
||
public void setDebug(boolean debug) { | ||
this.debug = debug; | ||
} | ||
|
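The caching rule discussed in the review comment above (recompute unless a result exists and was built with the same `maxMnpDistance`) can be sketched in isolation. This is a minimal illustration, not GATK code: `ParamKeyedCache`, its `compute` function, and `getComputeCount` are hypothetical names introduced here, and the sketch leaves out the diff's third condition (a non-reference haplotype with a null event map).

```java
import java.util.OptionalInt;
import java.util.function.IntFunction;

/** Hypothetical stand-in for the caching pattern in getVariationEvents; not GATK code. */
final class ParamKeyedCache<T> {
    private final IntFunction<T> compute;                 // assumed expensive computation
    private OptionalInt lastParam = OptionalInt.empty();  // parameter used for the cached value
    private T cached;
    private int computeCount = 0;                         // only here to make recomputation observable

    ParamKeyedCache(final IntFunction<T> compute) {
        this.compute = compute;
    }

    /** Returns the cached value, recomputing if nothing is cached or the parameter changed. */
    T get(final int param) {
        if (param < 0) {
            throw new IllegalArgumentException("param may not be negative.");
        }
        final boolean sameParam = lastParam.isPresent() && lastParam.getAsInt() == param;
        if (cached == null || !sameParam) {
            regenerate(param);
        }
        return cached;
    }

    /** Unconditionally recomputes and records which parameter produced the cached value. */
    void regenerate(final int param) {
        cached = compute.apply(param);
        lastParam = OptionalInt.of(param);
        computeCount++;
    }

    int getComputeCount() {
        return computeCount;
    }
}
```

Recording the parameter inside `regenerate` rather than in `get` means a direct call to `regenerate` leaves the cache in a consistent state, which seems to be the kind of explicitness the author is after.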
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -542,9 +542,6 @@ public List<VariantContext> callRegion(final AssemblyRegion region, final Featur | |
final AssemblyResultSet untrimmedAssemblyResult = AssemblyBasedCallerUtils.assembleReads(region, givenAlleles, hcArgs, readsHeader, samplesList, logger, referenceReader, assemblyEngine, aligner, !hcArgs.doNotCorrectOverlappingBaseQualities); | ||
|
||
final SortedSet<VariantContext> allVariationEvents = untrimmedAssemblyResult.getVariationEvents(hcArgs.maxMnpDistance); | ||
// TODO - line bellow might be unnecessary : it might be that assemblyResult will always have those alleles anyway | ||
// TODO - so check and remove if that is the case: | ||
It feels so good to finally take care of that TODO that's been around for longer than I have.
||
allVariationEvents.addAll(givenAlleles); | ||
|
||
final AssemblyRegionTrimmer.Result trimmingResult = trimmer.trim(region, allVariationEvents); | ||
|
||
|
So if the GGA allele is a SNP that is spanned by a deletion in the discovered variants, then it's only added to the reference haplotype, right? And it will still get output in the VCF?
Yes and yes.