Added a check to VariantAnnotator that all read samples are present in the VCF #6944
…t in the provided vcf before running variant annotator
@jamesemery I think the check you added here might be the reverse of the one required
final Set<String> readsSamples = getHeaderForReads().getReadGroups().stream().map(rg -> rg.getSample()).collect(Collectors.toSet());
readsSamples.forEach(readSample -> {
    if (!samples.contains(readSample))
        throw new UserException(String.format("Reads sample '%s' from readgroups tags does not match any sample in the variant genotypes", readSample));
Upon further investigation, @jamesemery and I confirmed that this patch is correct, and the original bug report was backwards. The NPE is triggered when there are samples in the bam not present in the vcf.
Thanks for looking into it, @droazen @jamesemery. I checked again, and it looks like the user said there were no samples in common.
If you look at the version that user was running, the NPE actually occurs on this line:
returnMap.get(ReadUtils.getSampleName(read, header)).add(read);
This means the problem arises when the returnMap (which gets generated right above from the VCF samples list) doesn't contain a sample that is present in one of the reads at that site. It sounded like the user had accidentally matched up the wrong BAM with the wrong VCF, and that's why they got the crash. Technically what they said is accurate, but it is not exactly what caused the bug: the cause was the reverse of what they reported. Given that multisample BAMs exist, perhaps there is something more robust that can be done to fix the bug if this ever comes up again.
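To make the failure mode concrete, here is a minimal, self-contained sketch that does not use any GATK classes. The class name `SampleCheckSketch`, both method names, and the representation of a read as a `{readName, sampleName}` pair are hypothetical and chosen only for illustration. The first method mirrors the direction of the check in this patch (read samples that are missing from the VCF). The second shows one possible more lenient option for multisample BAMs, which is an assumption on my part and not something this PR implements: skip reads whose sample is not in the VCF instead of dereferencing a null map entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names exist in GATK.
final class SampleCheckSketch {

    // Same direction as the check in this patch: report every sample that
    // appears in the reads but not in the VCF.
    static Set<String> readSamplesMissingFromVcf(Set<String> vcfSamples, Set<String> readSamples) {
        Set<String> missing = new TreeSet<>(readSamples);
        missing.removeAll(vcfSamples);
        return missing;
    }

    // The map is seeded from the VCF samples, like returnMap in the thread.
    // Each read is modeled as {readName, sampleName} (an assumption).
    static Map<String, List<String>> groupReadsBySample(Set<String> vcfSamples, List<String[]> reads) {
        Map<String, List<String>> returnMap = new HashMap<>();
        for (String sample : vcfSamples) {
            returnMap.put(sample, new ArrayList<>());
        }
        for (String[] read : reads) {
            // Calling returnMap.get(...).add(...) directly would throw an NPE
            // here for a read whose sample is absent from the VCF.
            List<String> bucket = returnMap.get(read[1]);
            if (bucket != null) {
                bucket.add(read[0]);
            }
        }
        return returnMap;
    }

    public static void main(String[] args) {
        Set<String> vcfSamples = new HashSet<>(Arrays.asList("NA12878"));
        Set<String> readSamples = new HashSet<>(Arrays.asList("NA12878", "NA12891"));
        System.out.println(readSamplesMissingFromVcf(vcfSamples, readSamples));

        List<String[]> reads = new ArrayList<>();
        reads.add(new String[] {"read1", "NA12878"});
        reads.add(new String[] {"read2", "NA12891"});
        System.out.println(groupReadsBySample(vcfSamples, reads));
    }
}
```

Whether silently skipping such reads is acceptable, or whether a warning or a hard failure is preferable, is a design decision for the maintainers. The up-front check fails fast on a mismatched BAM and VCF, while the lenient version would tolerate a multisample BAM paired with a VCF covering only some of its samples.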
Fixes #6915