Docs for M2 filtering #3560

Merged
merged 1 commit into from Sep 11, 2017


davidbenjamin
Contributor

@takutosato can you review this?

@davidbenjamin davidbenjamin added this to the Popularize Mutect 2 at the Broad milestone Sep 8, 2017
Contributor

@takutosato takutosato left a comment

Minor comments, no need to pass it back

\item \code{maxEventsInHaplotype} is the maximum allowable number of called variants co-occurring in a single assembly region. If the number of called variants exceeds this they will all be filtered. Note that this filter is misnamed because it counts the total number of events over all haplotypes in an assembly region.
\item \code{uniqueAltReadCount} is the minimum number of unique (start position, fragment length) pairs required to make a call. This count is a proxy for the number of unique molecules (as opposed to PCR duplicates) supporting an allele. Normally PCR duplicates are marked and filtered by the GATK engine, but in UMI-aware calling this may not be the case, hence the need for this filter.
\item \code{maxAltAllelesThreshold} is the maximum allowable number of alt alleles at a site. By default only biallelic variants pass the filter.
\item \code{max\_germline\_posterior} is the maximum posterior probability, as determined by the above germline probability model, that a variant is a germline event.
Contributor

I think we should just use lowerCamelCase and not names_with_underscores. Could you also standardize them in M2FiltersArgumentCollection? For instance, STRAND_ARTIFACT_POSTERIOR_PROB_THRESHOLD should be strandArtifactPosteriorProbThreshold and TUMOR_LOD_THRESHOLD should be tumorLodThreshold.

Contributor Author

I agree, but there's discussion about standardizing all GATK arguments to, e.g., strand-artifact-posterior-prob-threshold, and I'm waiting to see how that turns out. Personally I think camel case is superior. If HaplotypeCaller doesn't adopt a standard within a few weeks, then let's do camel case.

Contributor

sounds good
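
To make the naming styles in this thread concrete, here is a minimal Python sketch that converts an UPPER_SNAKE_CASE constant into the lowerCamelCase and hyphenated forms mentioned above. It is not GATK code: the helper names `split_words`, `to_lower_camel`, and `to_kebab` are hypothetical and exist only for illustration.

```python
def split_words(name):
    """Split an UPPER_SNAKE_CASE or snake_case name into lowercase words."""
    return [word.lower() for word in name.split("_") if word]


def to_lower_camel(name):
    """Join the words in lowerCamelCase, e.g. TUMOR_LOD_THRESHOLD -> tumorLodThreshold."""
    words = split_words(name)
    if not words:
        return ""
    return words[0] + "".join(word.capitalize() for word in words[1:])


def to_kebab(name):
    """Join the words with hyphens, e.g. TUMOR_LOD_THRESHOLD -> tumor-lod-threshold."""
    return "-".join(split_words(name))


if __name__ == "__main__":
    # Constants named in the review comment above.
    for constant in ("STRAND_ARTIFACT_POSTERIOR_PROB_THRESHOLD", "TUMOR_LOD_THRESHOLD"):
        print(constant, to_lower_camel(constant), to_kebab(constant))
```

Whichever convention is eventually chosen, a mechanical mapping like this would keep the argument names and the documentation table consistent with each other.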


\begin{itemize}
\item \code{tumor\_lod} is the minimum likelihood of an allele as determined by the somatic likelihoods model required to pass.
\item \code{maxEventsInHaplotype} is the maximum allowable number of called variants co-occurring in a single assembly region. If the number of called variants exceeds this they will all be filtered. Note that this filter is misnamed because it counts the total number of events over all haplotypes in an assembly region.
Contributor

should we rename the variable then?

Contributor Author

Yeah, I think I'll do it after the comms team's upcoming tutorial.
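
As a rough illustration of the behaviour the docs describe for `maxEventsInHaplotype`, the Python sketch below counts called variants per assembly region and returns every call in a region whose count exceeds the threshold. It is not the GATK implementation: the `Call` type, its `region` and `position` fields, and the function name `calls_to_filter` are assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    region: str    # hypothetical identifier of the assembly region
    position: int  # hypothetical position of the called variant


def calls_to_filter(calls, max_events):
    """Return the calls located in a region with more than max_events calls.

    The count is taken over the whole region rather than per haplotype,
    which matches the caveat in the docs about the filter's name.
    """
    per_region = Counter(call.region for call in calls)
    return [call for call in calls if per_region[call.region] > max_events]


if __name__ == "__main__":
    calls = [Call("r1", 10), Call("r1", 20), Call("r1", 30), Call("r2", 5)]
    # With a threshold of 2, all three calls in region r1 are returned.
    print(calls_to_filter(calls, max_events=2))
```

Because the count is per region, a name that reflects the region rather than the haplotype would describe this behaviour more directly.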


Here for convenience is a table of \code{Mutect2} filters with their corresponding annotations specified by the \code{-A} argument\footnote{Most of these are default annotations and do not need to be invoked explicitly.}, vcf keys for these annotations, and command line arguments controlling filtering thresholds.

\begin{table}[h!]
Contributor

Once we update the variable names, we should update the argument column.

@codecov-io

Codecov Report

Merging #3560 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##              master     #3560   +/-   ##
===========================================
  Coverage     79.932%   79.932%           
  Complexity     17900     17900           
===========================================
  Files           1199      1199           
  Lines          65015     65015           
  Branches       10124     10124           
===========================================
  Hits           51968     51968           
  Misses          9014      9014           
  Partials        4033      4033

@davidbenjamin davidbenjamin merged commit 895d7ce into master Sep 11, 2017
@davidbenjamin davidbenjamin deleted the db_m2_filter_docs branch September 11, 2017 14:37