Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added warning for unsorted genotype field #7887

Merged
merged 3 commits into from Jun 10, 2022

Conversation

orlicohen
Copy link
Contributor

Fixes #7732

@@ -472,6 +478,16 @@ protected GenomicsDBOptions getGenomicsDBOptions() {
public void onTraversalStart() {
final Map<String, VCFHeader> vcfHeaders = Collections.singletonMap(getDrivingVariantsFeatureInput().getName(), getHeaderForVariants());

List<String> genotypeField = getHeaderForVariants().getGenotypeSamples();
if(!ParsingUtils.isSorted(genotypeField)){
logger.warn("Detected unsorted genotype fields on input. This could result in very slow traversal for large files.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#7732 give how dangerous the risk of 10x slower runs for very large VCF files I would recommend making this warning much louder and hard to miss. I would recommend making the warning behave like the warning in HaplotypeCaller.onTraversalStart() so it is much more visible. I would also change the text here "This could result in very slow traversal for large files" -> "SelectVariants will sort the genotypes on output which will result in very slow traversal on large inputs" or something along those lines.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this warning could be contextual and only happen if we would otherwise not unpack the genotypes, but that might be difficult to check for.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly tying the warning to the filesize/the number of genotypes in the first place might also be worth it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, make the size of the warning font be dependent on the number of genotypes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

@codecov
Copy link

codecov bot commented Jun 10, 2022

Codecov Report

Merging #7887 (4578723) into master (d4f083d) will decrease coverage by 0.000%.
The diff coverage is 83.333%.

@@               Coverage Diff               @@
##              master     #7887       +/-   ##
===============================================
- Coverage     86.936%   86.935%   -0.000%     
- Complexity     36945     36953        +8     
===============================================
  Files           2221      2221               
  Lines         173803    173821       +18     
  Branches       18775     18777        +2     
===============================================
+ Hits          151097    151112       +15     
- Misses         16075     16077        +2     
- Partials        6631      6632        +1     
Impacted Files Coverage Δ
...rs/variantutils/SelectVariantsIntegrationTest.java 97.712% <75.000%> (-0.424%) ⬇️
...der/tools/walkers/variantutils/SelectVariants.java 81.395% <90.000%> (+0.228%) ⬆️

Copy link
Collaborator

@jamesemery jamesemery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.

@@ -489,8 +510,7 @@ public void onTraversalStart() {

// Look at the parameters to decide which analysis to perform
discordanceOnly = discordanceTrack != null;
if (discordanceOnly) {
logger.info("Selecting only variants discordant with the track: " + discordanceTrack.getName());
if (discordanceOnly) {logger.info("Selecting only variants discordant with the track: " + discordanceTrack.getName());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should fix the spacing/indentation here before merging

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^I agree with david can you revert this please?

@@ -489,8 +510,7 @@ public void onTraversalStart() {

// Look at the parameters to decide which analysis to perform
discordanceOnly = discordanceTrack != null;
if (discordanceOnly) {
logger.info("Selecting only variants discordant with the track: " + discordanceTrack.getName());
if (discordanceOnly) {logger.info("Selecting only variants discordant with the track: " + discordanceTrack.getName());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^I agree with david can you revert this please?

@orlicohen orlicohen merged commit 9f49f1f into master Jun 10, 2022
@orlicohen orlicohen deleted the oc_fix7732_unsortedvcfwarning branch June 10, 2022 19:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add warning to SelectVariants about poor performance if the samples are not sorted in the VCF header
4 participants