Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to emit (but still filter) all germline sites in Mutect2 #4522

Merged
merged 3 commits into from Mar 13, 2018

Conversation

Projects
None yet
4 participants
@davidbenjamin
Copy link
Contributor

commented Mar 12, 2018

@takutosato I made sure that when the new option is turned off, as it is by default, nothing changes. Also, the new option surprisingly only added 30% or so to runtime because M2 already calls so many
false active regions.

@chandrans @sooheelee This is the feature we have been discussing.

@takutosato
Copy link
Contributor

left a comment

One trivial comment but otherwise looks great!

@@ -222,6 +220,14 @@ private void addGenotypes(final LikelihoodMatrix<Allele> tumorLog10Matrix,
callVcb.genotypes(genotypes);
}

public static double[] getEffectiveCounts(final LikelihoodMatrix<Allele> log10LikelihoodMatrix) {
if (log10LikelihoodMatrix.numberOfReads() == 0) {
return new double[log10LikelihoodMatrix.numberOfAlleles()]; // zero couns for each allele

This comment has been minimized.

Copy link
@takutosato

takutosato Mar 12, 2018

Contributor

couns -> counts

@davidbenjamin davidbenjamin force-pushed the db_genotype_germline branch from fbd8850 to 3f0d5b2 Mar 12, 2018

@codecov-io

This comment has been minimized.

Copy link

commented Mar 12, 2018

Codecov Report

Merging #4522 into master will decrease coverage by 0.007%.
The diff coverage is 58.333%.

@@               Coverage Diff               @@
##              master     #4522       +/-   ##
===============================================
- Coverage     79.118%   79.111%   -0.007%     
  Complexity     16675     16675               
===============================================
  Files           1051      1051               
  Lines          60282     60285        +3     
  Branches        9874      9875        +1     
===============================================
- Hits           47694     47692        -2     
- Misses          8764      8765        +1     
- Partials        3824      3828        +4
Impacted Files Coverage Δ Complexity Δ
...ls/walkers/mutect/M2FiltersArgumentCollection.java 100% <ø> (ø) 1 <0> (ø) ⬇️
...tools/walkers/mutect/SomaticLikelihoodsEngine.java 86.111% <ø> (-0.731%) 12 <0> (-2)
...hellbender/tools/walkers/mutect/Mutect2Engine.java 87.582% <0%> (-1.307%) 45 <0> (ø)
...der/tools/walkers/mutect/M2ArgumentCollection.java 100% <100%> (ø) 1 <0> (ø) ⬇️
.../tools/walkers/mutect/SomaticGenotypingEngine.java 91.391% <62.5%> (-1.807%) 59 <7> (+2)

@davidbenjamin davidbenjamin merged commit 69c0c01 into master Mar 13, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@davidbenjamin davidbenjamin deleted the db_genotype_germline branch Mar 13, 2018

@sooheelee

This comment has been minimized.

Copy link
Contributor

commented Mar 14, 2018

Thanks for including us in the discussion @davidbenjamin.

the new option surprisingly only added 30% or so to runtime because M2 already calls so many
false active regions.

This is a great value-add.

@sooheelee

This comment has been minimized.

Copy link
Contributor

commented Mar 14, 2018

Just to clarify @davidbenjamin @takutosato, the normal sample calling uses:

  1. HaplotypeCaller logic for germline calling OR
  2. Mutect2 sensitive calling?
@davidbenjamin

This comment has been minimized.

Copy link
Contributor Author

commented Mar 15, 2018

@sooheelee This new option isn't doing germline calling per se. Rather it calls all variants, somatic and germline, in the tumor sample. To directly call germline events in the normal it wold be faster and better to use HaplotypeCaller.

@sooheelee

This comment has been minimized.

Copy link
Contributor

commented Mar 15, 2018

Thanks for the clarification. To paraphrase the following:

This new option isn't doing germline calling per se. Rather it calls all variants, somatic and germline, in the tumor sample.

We now have a parameter that allows inclusion of sites that are variant in the tumor that are also variant in the matched-normal and/or present in the germline resource. With the option enabled, these sites are emitted for the tumor and if the matched-normal is part of the analysis, then its variants for the sites are also noted in the callset. Finally, this parameter is set to true with --genotype-germline-sites.

@davidbenjamin

This comment has been minimized.

Copy link
Contributor Author

commented Mar 15, 2018

That's right, and just because more explicitness never hurts, here's another paraphrase.

M2 default: call somatic variation in the tumor sample. Some possible germline variants will be included, but not intentionally.

New option: call all variation in the tumor sample.

In both case FilterMutectCalls is the same. It uses the INFO fields from Mutect2 to put the germline_risk in the filter field. If a user really wants to use Mutect2 as a slower HaplotypeCaller she could always set the germline posterior probability threshold to 1, effectively turning off this filter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.