Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize some Mutect-related tools #5073

Merged
merged 4 commits into from Aug 2, 2018

Conversation

@davidbenjamin
Copy link
Contributor

commented Jul 31, 2018

@LeeTL1220 The first commit makes all concordance tools and MC3/M2 vcf merge much faster in Firecloud -- tasks that have been taking an hour will take a few minutes. The second commit makes GetPileupSummaries, hence the contamination task much faster. Previously that tool has cached all reads for the 100,000 bases around each site, so basically it had to do a whole bam's worth of I/O just to get ~60,000 pileups.

@@ -77,7 +77,7 @@ private void initializeTruthVariantsIfNecessary() {
}

if (truthVariants == null) {
truthVariants = new FeatureDataSource<>(new FeatureInput<>(truthVariantsFile, "truth"), CACHE_LOOKAHEAD, VariantContext.class);
truthVariants = new FeatureDataSource<>(new FeatureInput<>(truthVariantsFile, "truth"), CACHE_LOOKAHEAD, VariantContext.class, cloudPrefetchBuffer, cloudIndexPrefetchBuffer);

This comment has been minimized.

Copy link
@cmnbroad

cmnbroad Jul 31, 2018

Collaborator

@davidbenjamin You should make this same change to the evalVariants FeatureDataSource.

This comment has been minimized.

Copy link
@davidbenjamin

davidbenjamin Jul 31, 2018

Author Contributor

Thanks @cmnbroad, did it.

@davidbenjamin davidbenjamin force-pushed the db_m2_optimizations branch from d6da091 to 518b9b7 Jul 31, 2018

@codecov-io

This comment has been minimized.

Copy link

commented Jul 31, 2018

Codecov Report

Merging #5073 into master will increase coverage by 0.034%.
The diff coverage is 92.308%.

@@               Coverage Diff               @@
##              master     #5073       +/-   ##
===============================================
+ Coverage     86.385%   86.419%   +0.034%     
- Complexity     28822     29084      +262     
===============================================
  Files           1791      1791               
  Lines         133561    134336      +775     
  Branches       14902     15138      +236     
===============================================
+ Hits          115377    116092      +715     
- Misses         12791     12808       +17     
- Partials        5393      5436       +43
Impacted Files Coverage Δ Complexity Δ
...ntamination/GetPileupSummariesIntegrationTest.java 95.455% <100%> (+0.216%) 4 <0> (ø) ⬇️
...e/hellbender/engine/AbstractConcordanceWalker.java 87.059% <100%> (ø) 13 <0> (ø) ⬇️
...ools/walkers/contamination/GetPileupSummaries.java 81.633% <88.889%> (+0.782%) 19 <9> (+1) ⬆️
...walkers/genotyper/GenotypingGivenAllelesUtils.java 40.541% <0%> (-20.998%) 2% <0%> (ø)
...utils/variant/GATKVariantContextUtilsUnitTest.java 88.811% <0%> (+2.209%) 186% <0%> (+43%) ⬆️
...pecaller/readthreading/ReadThreadingAssembler.java 70.769% <0%> (+2.271%) 79% <0%> (+27%) ⬆️
...lbender/utils/variant/GATKVariantContextUtils.java 88.601% <0%> (+3.306%) 409% <0%> (+188%) ⬆️
...org/broadinstitute/hellbender/utils/BaseUtils.java 90.265% <0%> (+3.54%) 48% <0%> (+2%) ⬆️
...utils/smithwaterman/SmithWatermanIntelAligner.java 90% <0%> (+10%) 3% <0%> (ø) ⬇️
...ithwaterman/SmithWatermanIntelAlignerUnitTest.java 90% <0%> (+30%) 2% <0%> (ø) ⬇️
@LeeTL1220
Copy link
Contributor

left a comment

@davidbenjamin One minor comment.

@@ -55,14 +53,15 @@
* gatk GetPileupSummaries \
* -I tumor.bam \
* -V common_biallelic.vcf.gz \
* -V common_biallelic.vcf.gz \

This comment has been minimized.

Copy link
@LeeTL1220

LeeTL1220 Aug 1, 2018

Contributor

Why -V twice?

This comment has been minimized.

Copy link
@davidbenjamin

davidbenjamin Aug 1, 2018

Author Contributor

Whoops -- one is -V and one should be -L. I have to admit this is kind of clumsy but it's not utterly redundant, at least. For example, one might want to use some custom interval list (-L) of common sites but grab the allele frequencies from gnomAD (-V).

Fixed it and expanded the javadoc a bit.

@davidbenjamin

This comment has been minimized.

Copy link
Contributor Author

commented Aug 1, 2018

@LeeTL1220 back to you.

@LeeTL1220
Copy link
Contributor

left a comment

@davidbenjamin One more doc comment, then feel free to merge.

@@ -65,6 +65,17 @@
* -O pileups.table
* </pre>
*
* Although the sites (-L) and variants (-V) resources will often be identical, this need not be the case. For example,

This comment has been minimized.

Copy link
@LeeTL1220

LeeTL1220 Aug 1, 2018

Contributor

Typically these files would only be different if you were trying to use a subset of the variants? I.e. -L was a subset of -V? If so, can you add that to the doc?

This comment has been minimized.

Copy link
@davidbenjamin

davidbenjamin Aug 1, 2018

Author Contributor

Done.

@davidbenjamin davidbenjamin force-pushed the db_m2_optimizations branch from 5f82c7e to 351582c Aug 1, 2018

@davidbenjamin davidbenjamin merged commit 27716dc into master Aug 2, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@davidbenjamin davidbenjamin deleted the db_m2_optimizations branch Aug 2, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.