
Added check for non-finite copy ratios in ModelSegments pipeline. #4292

Merged: 2 commits into master from sl_no_cr_nans on Feb 21, 2018

Conversation

samuelklee (Contributor)

@LeeTL1220 This should fail earlier in situations like the one you ran into with the low coverage test data. I don't think there should be any unintended side effects. I'm fine if this doesn't make it in before the point release if you want to run some sanity checks on real data with it (since users should be able to figure out what is going on relatively quickly if they run into the issue), but I'll leave it up to you.

codecov-io commented Jan 29, 2018

Codecov Report

Merging #4292 into master will increase coverage by 0.188%.
The diff coverage is 100%.

@@              Coverage Diff               @@
##              master    #4292       +/-   ##
==============================================
+ Coverage     79.082%   79.27%   +0.188%     
- Complexity     16655    17036      +381     
==============================================
  Files           1050     1050               
  Lines          59681    60796     +1115     
  Branches        9754     9946      +192     
==============================================
+ Hits           47197    48193      +996     
- Misses          8696     8787       +91     
- Partials        3788     3816       +28
Impacted Files Coverage Δ Complexity Δ
.../tools/copynumber/denoising/SVDDenoisingUtils.java 79.221% <100%> (+0.273%) 46 <2> (+2) ⬆️
...er/tools/copynumber/formats/records/CopyRatio.java 76% <100%> (+2.087%) 9 <0> (ø) ⬇️
...broadinstitute/hellbender/utils/read/GATKRead.java 50% <0%> (-18.75%) 20% <0%> (+7%)
...er/tools/spark/sv/evidence/EvidenceTargetLink.java 70.513% <0%> (-6.05%) 18% <0%> (+3%)
...ender/engine/datasources/ReferenceMultiSource.java 76.471% <0%> (-0.452%) 15% <0%> (+4%)
...lbender/utils/read/SAMRecordToGATKReadAdapter.java 92.027% <0%> (-0.433%) 270% <0%> (+134%)
...ute/hellbender/utils/read/ArtificialReadUtils.java 93.909% <0%> (-0.186%) 122% <0%> (+52%)
...itute/hellbender/engine/spark/GATKRegistrator.java 100% <0%> (ø) 4% <0%> (+2%) ⬆️
...utils/test/ReadsPreprocessingPipelineTestData.java 1.25% <0%> (+0.403%) 1% <0%> (ø) ⬇️
...tructuralVariationDiscoveryArgumentCollection.java 97.143% <0%> (+0.476%) 0% <0%> (ø) ⬇️
... and 15 more

LeeTL1220 (Contributor) left a comment


Can you just add a bit more for a user to diagnose?

LeeTL1220 (Contributor)

@samuelklee I'm fine without this one in the next release.

samuelklee (Contributor, Author)

@LeeTL1220 OK, tweaked the message a bit. I think I'm OK with this going in for the next point release. This is the sort of thing for which it will be nice to have the automatic validations, as a sanity check.

LeeTL1220 (Contributor)

@samuelklee Is it possible to expand the error message a bit so that users get something actionable? For example: "Does the interval list match what was covered in the BAM file? Does this BAM have coverage in all intervals?"

samuelklee (Contributor, Author) commented Feb 14, 2018

I'd rather keep the message more generic, and think of the check as simply defining what a valid CopyRatio object can be: an interval associated with a finite double value. One might imagine that someone would try to create such an object that does not originate from a BAM (perhaps for test data, or for imputing missing values in pre-existing data, etc.). This check says that they must create it with some finite value.
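The invariant described here could be sketched as follows. This is an illustrative simplification, not the actual GATK CopyRatio class; the field names and interval representation are hypothetical, but the core check (reject non-finite values at construction) is the one discussed.

```java
// Illustrative sketch of a CopyRatio-like record whose constructor
// enforces the invariant: an interval associated with a FINITE value.
// Names and structure are simplified; this is not the GATK API.
public class CopyRatioSketch {
    private final String contig;  // interval simplified to contig/start/end
    private final int start;
    private final int end;
    private final double log2CopyRatioValue;

    public CopyRatioSketch(final String contig, final int start, final int end,
                           final double log2CopyRatioValue) {
        // The validity check: the value must be finite (not NaN or infinite),
        // regardless of whether the record originated from a BAM.
        if (!Double.isFinite(log2CopyRatioValue)) {
            throw new IllegalArgumentException(
                    "Copy-ratio value must be finite, but was: " + log2CopyRatioValue);
        }
        this.contig = contig;
        this.start = start;
        this.end = end;
        this.log2CopyRatioValue = log2CopyRatioValue;
    }

    public double getLog2CopyRatioValue() {
        return log2CopyRatioValue;
    }
}
```

With this shape, any attempt to construct a record from NaN or infinite data fails immediately at the point of creation rather than deep inside the pipeline.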

A more appropriate place for the sort of message you suggest is in the relevant denoising method. In the edge case you encountered, you used a BAM that was almost completely uncovered in all bins at the specified resolution, resulting in a sample median of zero. Since one of the steps in standardization divides by the sample median, this results in a division by zero. I've added the corresponding check.
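The denoising-side check could look roughly like the sketch below. The method name and message are illustrative assumptions, not the actual SVDDenoisingUtils code: the point is that a zero sample median is detected before the division, with an actionable error instead of silent non-finite output.

```java
import java.util.Arrays;

// Illustrative sketch of the standardization step described above.
// divideBySampleMedian is a hypothetical name, not the GATK method.
public class StandardizeSketch {
    public static double[] divideBySampleMedian(final double[] counts) {
        final double[] sorted = counts.clone();
        Arrays.sort(sorted);
        final int n = sorted.length;
        final double median = (n % 2 == 1)
                ? sorted[n / 2]
                : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
        // Fail fast: a zero median means division would yield non-finite
        // values, typically because most bins have no coverage.
        if (median == 0.) {
            throw new IllegalArgumentException(
                    "Sample median is zero; the sample may have little or no coverage "
                    + "in most bins. Check that the intervals match the regions "
                    + "actually covered by the BAM.");
        }
        final double[] result = new double[n];
        for (int i = 0; i < n; i++) {
            result[i] = counts[i] / median;
        }
        return result;
    }
}
```

A BAM with coverage in almost no bins gives a count vector dominated by zeros, so its median is zero and the method throws instead of propagating infinities downstream.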

samuelklee (Contributor, Author)

@LeeTL1220 Is this good to go?

LeeTL1220 (Contributor)

LeeTL1220 commented Feb 21, 2018 via email

@samuelklee samuelklee merged commit ac32f83 into master Feb 21, 2018
@samuelklee samuelklee deleted the sl_no_cr_nans branch February 21, 2018 22:29