Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix and test for calculating CombineGVCFs intermediate band interval start locations. #4681

Merged
merged 1 commit into from Apr 25, 2018

Conversation

cmnbroad
Copy link
Collaborator

Fixes #4672.

Copy link
Collaborator

@jamesemery jamesemery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment

@@ -157,8 +157,14 @@ void createIntermediateVariants(SimpleInterval intervalToClose) {

// Break up the GVCF according to the provided reference blocking scheme
if ( multipleAtWhichToBreakBands > 0) {
for (int i = ((intervalToClose.getStart())/multipleAtWhichToBreakBands)*multipleAtWhichToBreakBands; i <= intervalToClose.getEnd(); i+=multipleAtWhichToBreakBands) {
sitesToStop.add(i-1); // Subtract 1 here because we want to split before this base
// if the intermediate interval to close starts before the end of the first interval,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a perfectly reasonable fix but i would probably add an extra layer of protection here. Below where sitesToStop is converted into intervalLocs you should also add a check that the intervals generated by this or the other break code are not going to be invalid. Also i think this is still broken with multipleAtWhichToBreakBands = 1 because you get a start position = 0

Copy link
Collaborator Author

@cmnbroad cmnbroad Apr 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamesemery I'm not clear on what kind of interval validation you think we should add ? If we ever do still create an invalid interval it would fail the same way it does now, which I think is the outcome we want. You're right about the start position 0 - I changed the banded stop site calculation, and wound up factoring it out into a separate method and added unit tests, since fine resolution can result in large test files, and I wanted to test a number of cases.

//Test that trailing reference blocks are emitted with correct intervals
{new File[]{getTestFile("gvcfExample1WithTrailingReferenceBlocks.g.vcf"), getTestFile("gvcfExample2WithTrailingReferenceBlocks.g.vcf")}, getTestFile("gvcfWithTrailingReferenceBlocksExpected.g.vcf"), NO_EXTRA_ARGS, b38_reference_20_21},
// same test as the previous one, except with a band multiple specified
{new File[]{getTestFile("gvcfExample1WithTrailingReferenceBlocks.g.vcf"), getTestFile("gvcfExample2WithTrailingReferenceBlocks.g.vcf")},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a test with either/both of useBPResolution or bands of size 1 to demonstrate that there isn't a band being made at 0.

@codecov-io
Copy link

codecov-io commented Apr 22, 2018

Codecov Report

Merging #4681 into master will increase coverage by 0.121%.
The diff coverage is 100%.

@@               Coverage Diff               @@
##              master     #4681       +/-   ##
===============================================
+ Coverage     79.845%   79.966%   +0.121%     
- Complexity     17338     17444      +106     
===============================================
  Files           1074      1074               
  Lines          62956     63296      +340     
  Branches       10189     10290      +101     
===============================================
+ Hits           50267     50615      +348     
+ Misses          8707      8694       -13     
- Partials        3982      3987        +5
Impacted Files Coverage Δ Complexity Δ
...stitute/hellbender/tools/walkers/CombineGVCFs.java 94% <100%> (+0.25%) 62 <4> (+2) ⬆️
...ls/walkers/mutect/M2FiltersArgumentCollection.java 100% <0%> (ø) 1% <0%> (ø) ⬇️
...lkers/ReferenceConfidenceVariantContextMerger.java 95.5% <0%> (+0.243%) 122% <0%> (+51%) ⬆️
...nder/utils/runtime/StreamingProcessController.java 70.37% <0%> (+0.823%) 50% <0%> (ø) ⬇️
...hellbender/utils/haplotype/HaplotypeBAMWriter.java 100% <0%> (+1.923%) 19% <0%> (+9%) ⬆️
...der/tools/walkers/mutect/M2ArgumentCollection.java 96.875% <0%> (+2.757%) 3% <0%> (ø) ⬇️
...r/tools/walkers/mutect/Mutect2FilteringEngine.java 88.406% <0%> (+6.169%) 61% <0%> (+20%) ⬆️
...ellbender/tools/walkers/vqsr/CNNScoreVariants.java 81.69% <0%> (+6.804%) 48% <0%> (+7%) ⬆️
...r/engine/filters/ReadGroupBlackListReadFilter.java 90.667% <0%> (+7.03%) 32% <0%> (+15%) ⬆️
...utils/smithwaterman/SmithWatermanIntelAligner.java 80% <0%> (+30%) 3% <0%> (+2%) ⬆️

@cmnbroad cmnbroad force-pushed the cn_combine_gvcfs_bands branch 2 times, most recently from 667ba35 to e33b5d5 Compare April 23, 2018 14:10
Copy link
Collaborator

@jamesemery jamesemery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two very very minor comments that you could probably just ignore and merge.

{ new SimpleInterval("contig", 10, 10), 5, Arrays.asList(9) },

// TODO ?? is this the correct result ?
{ new SimpleInterval("contig", 10, 10), 10, Arrays.asList(9) },
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks reasonably correct to me, how does it significantly differ from the above two? These all get filtered out after this method anyway before we actually build the intervals so its not an issue for the output list to have a result outside of the interval space.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ehhh... i don't consider this critical or even important at all, but all of these tests that return Collections.EMPTY_LIST as a result are only ever failing on the first breakpoint interval, which technically goes through a new codepath. If you wanted you could add a test { new SimpleInterval("contig", 110, 120), 100, Collections.EMPTY_LIST }, to be more through.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on the first comment. I view the purpose of this test as validating that we don't create invalid intermediate intervals at this stage of the calculation, since thats what was originally failing - I think the fact that they get filtered out later is ok. Ideally we'd have additional unit tests for the subsequent parts of the calculation, but they require much more state.

I think the additional test you mentioned should return {"99"}, not the empty list, for this method. Again, that stop site will get filtered later as its outside of the interval. Its testing the case where the band multiple is smaller than the start of the close interval, which is essentially the same as the { new SimpleInterval("contig", 10, 20), 5, Arrays.asList(9, 14, 19) } test. Doesn't hurt to add it though.

@cmnbroad cmnbroad merged commit 01fa44d into master Apr 25, 2018
@cmnbroad cmnbroad deleted the cn_combine_gvcfs_bands branch April 25, 2018 01:26
cwhelan pushed a commit to cwhelan/gatk-linked-reads that referenced this pull request May 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants