
2 pass variant walker #4744

Merged
merged 7 commits into from May 17, 2018

Conversation

takutosato
Contributor

No description provided.

@takutosato takutosato requested a review from droazen May 7, 2018 20:31
@takutosato
Contributor Author

Hey @droazen will you review this?

Collaborator

@cmnbroad cmnbroad left a comment

This will be useful to have. Requested javadoc and unit tests.


import java.util.stream.StreamSupport;

public abstract class TwoPassVariantWalker extends VariantWalker {
Collaborator

This is great - we can use this in the CNNScoreVariants tool (#4316). It will need class and method javadoc though, and unit tests (see the ones for TwoPassReadWalker).

// Second pass
logger.info("Starting second pass through the variants");
traverseVariants(variantFilter, readFilter, this::secondPassApply);
logger.info(readFilter.getSummaryLine());
Collaborator

This accumulates the filter count across both passes and displays the aggregate total. The TwoPassReadWalker does the same thing, though I wonder if that's a feature, given that the second pass is usually an implementation detail.

Contributor Author

I talked to Louis and James a bit about this. I don't think we need two different VariantFilters, so I'll just call makeVariantFilter() twice, in the hope that calling it the second time resets the counter
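
The counting behaviour discussed in this thread can be illustrated with a minimal sketch. `CountingFilterSketch` is a hypothetical stand-in, not GATK's `CountingReadFilter` or `VariantFilter`; it only shows why one counting filter shared by both passes reports the aggregate, while a fresh filter created before the second pass reports second-pass counts only. Whether a second call to `makeVariantFilter()` behaves this way in GATK is not confirmed here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a counting filter; this is not the GATK CountingReadFilter API.
final class CountingFilterSketch<T> implements Predicate<T> {
    private final Predicate<T> delegate;
    private long filteredCount = 0;

    CountingFilterSketch(final Predicate<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean test(final T item) {
        final boolean keep = delegate.test(item);
        if (!keep) {
            filteredCount++; // remember every record the filter rejects
        }
        return keep;
    }

    long getFilteredCount() {
        return filteredCount;
    }

    // Runs two passes over the same records and returns the count the filter would report at the end.
    static long reportedCountAfterTwoPasses(final List<Integer> records, final boolean freshFilterForSecondPass) {
        CountingFilterSketch<Integer> filter = new CountingFilterSketch<Integer>(r -> r >= 0);
        for (final Integer r : records) {
            filter.test(r);
        }
        if (freshFilterForSecondPass) {
            // A new instance starts counting again from zero.
            filter = new CountingFilterSketch<Integer>(r -> r >= 0);
        }
        for (final Integer r : records) {
            filter.test(r);
        }
        return filter.getFilteredCount();
    }

    public static void main(final String[] args) {
        final List<Integer> records = Arrays.asList(1, -1, 2, -2);
        System.out.println("shared filter: " + reportedCountAfterTwoPasses(records, false));
        System.out.println("fresh filter:  " + reportedCountAfterTwoPasses(records, true));
    }
}
```

With a shared filter, every rejected record is counted once per pass, so the summary line doubles the per-pass figure.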

@droazen droazen self-assigned this May 8, 2018
Collaborator

@droazen droazen left a comment

Review complete, back to @takutosato. This looks good, just needs some tests and javadoc.


import java.util.stream.StreamSupport;

public abstract class TwoPassVariantWalker extends VariantWalker {
Collaborator

For tests, I think we need 3 things here:

  • A TwoPassVariantWalkerUnitTest test class, modeled after TwoPassReadWalkerUnitTest
  • An ExampleTwoPassVariantWalker tool in tools.examples, modeled after the existing ExampleVariantWalker tool
  • A simple ExampleTwoPassVariantWalkerIntegrationTest, modeled after ExampleVariantWalkerIntegrationTest

Contributor Author

The unit test for TwoPassVariantWalker writes a private example walker class and runs it, which seems to be the same as the integration test. I think doing either is sufficient - what do you think?

Collaborator

It's good to have both the unit test and the integration test. The unit test uses a toy implementation to count calls to first/second pass apply, which is a good correctness test for the traversal. However, we also try to include a runnable example implementation that actually produces meaningful output for every walker type as a template for tool developers, and that's where ExampleTwoPassVariantWalker comes in.
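
As a rough illustration of the unit-test idea described above, here is a sketch of a toy implementation that counts calls to the first- and second-pass callbacks. `TwoPassTraversalSketch` and `CallCountingWalker` are hypothetical names; the traversal is deliberately simplified and omits the read, reference and feature contexts that the real walker passes to each apply method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical two-pass traversal; not the GATK TwoPassVariantWalker.
abstract class TwoPassTraversalSketch<T> {
    protected abstract void firstPassApply(T record);
    protected abstract void afterFirstPass();
    protected abstract void secondPassApply(T record);

    final void traverse(final List<T> records) {
        for (final T record : records) {
            firstPassApply(record);
        }
        afterFirstPass(); // runs once, between the two passes
        for (final T record : records) {
            secondPassApply(record);
        }
    }
}

// Toy implementation that only records which callbacks ran, as a unit test might.
final class CallCountingWalker extends TwoPassTraversalSketch<String> {
    int firstPassCalls = 0;
    int secondPassCalls = 0;
    final List<String> callOrder = new ArrayList<>();

    @Override
    protected void firstPassApply(final String record) {
        firstPassCalls++;
        callOrder.add("first");
    }

    @Override
    protected void afterFirstPass() {
        callOrder.add("after");
    }

    @Override
    protected void secondPassApply(final String record) {
        secondPassCalls++;
        callOrder.add("second");
    }

    public static void main(final String[] args) {
        final CallCountingWalker walker = new CallCountingWalker();
        walker.traverse(Arrays.asList("variantA", "variantB"));
        System.out.println(walker.firstPassCalls + " first-pass calls, "
                + walker.secondPassCalls + " second-pass calls, order " + walker.callOrder);
    }
}
```

A test built this way checks the traversal contract (each record seen once per pass, with the hook in between) without asserting anything about tool output, which is what the integration test of the example tool covers.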


@Override
public void traverse(){
final VariantFilter variantFilter = makeVariantFilter();
Collaborator

Can you file a GitHub ticket saying that VariantWalkerBase (and subclasses) need to migrate to CountingVariantFilter, so that summary statistics for variant filtration can be printed at the end of traversal, as they are for read filters currently?

@cmnbroad Do you know why CountingVariantFilter never got hooked up here?

Collaborator

I'm not sure why we didn't do that.

Contributor Author

done

}


protected abstract void firstPassApply(final VariantContext variant,
Collaborator

Add full javadoc for firstPassApply(), secondPassApply(), and afterFirstPass(), modeled after the javadoc for the analogous methods in TwoPassReadWalker

@droazen droazen assigned takutosato and unassigned droazen May 8, 2018
@codecov-io

codecov-io commented May 9, 2018

Codecov Report

Merging #4744 into master will increase coverage by 0.215%.
The diff coverage is 91.667%.

@@               Coverage Diff               @@
##              master     #4744       +/-   ##
===============================================
+ Coverage     79.985%   80.201%   +0.215%     
- Complexity     17419     18234      +815     
===============================================
  Files           1081      1086        +5     
  Lines          63118     66825     +3707     
  Branches       10195     11063      +868     
===============================================
+ Hits           50485     53594     +3109     
- Misses          8648      9073      +425     
- Partials        3985      4158      +173
Impacted Files                                         Coverage Δ               Complexity Δ
...er/tools/examples/ExampleTwoPassVariantWalker.java  89.286% <89.286%> (ø)    8 <8> (?)
...titute/hellbender/engine/TwoPassVariantWalker.java  95% <95%> (ø)            4 <4> (?)
...ery/inference/SimpleNovelAdjacencyInterpreter.java  80.435% <0%> (-19.565%)  15% <0%> (+8%)
.../DiscoverVariantsFromContigAlignmentsSAMSpark.java  88.889% <0%> (-4.659%)   49% <0%> (+27%)
...ellbender/utils/test/CommandLineProgramTester.java  91.667% <0%> (-3.571%)   11% <0%> (+2%)
...r/utils/read/markduplicates/sparkrecords/Pair.java  93.893% <0%> (-1.612%)   44% <0%> (+22%)
...iscoverFromLocalAssemblyContigAlignmentsSpark.java  76.048% <0%> (-0.474%)   5% <0%> (ø)
...lbender/tools/spark/sv/discovery/SimpleSVType.java  86% <0%> (-0.441%)       3% <0%> (+1%)
.../discovery/inference/ImpreciseVariantDetector.java  80.952% <0%> (-0.298%)   6% <0%> (ø)
...itute/hellbender/engine/spark/GATKRegistrator.java  100% <0%> (ø)            6% <0%> (+3%) ⬆️
... and 57 more

@takutosato
Contributor Author

@droazen (or @cmnbroad?) back to you

@droazen droazen assigned droazen and unassigned takutosato May 14, 2018
* @param variant A variant record in a vcf
* @param readsContext Reads overlapping the current variant. Will be empty if a read source (e.g. bam) isn't provided
* @param referenceContext Reference bases spanning the current variant
* @param featureContext A vcf record overlapping the current variant from an auxiliary data source (e.g. gnomad)
Collaborator

featureContext supports more than just VCF records as auxiliary inputs -- any tribble-supported format can be accepted. I'd just change vcf record to record

Contributor Author

done


/**
*
* First pass through the variants. The user may store data in instance variables of the walker as you go
Collaborator

as you go -> as they go

Contributor Author

reworded

final VariantFilter variantContextFilter = makeVariantFilter();
final CountingReadFilter readFilter = makeReadFilter();

// First pass through the variant
Collaborator

variant -> variants

Contributor Author

done

final FeatureContext featureContext);

/**
* Process the data collected during the first pass
Collaborator

Add explicit comment that this is called after firstPassApply() and before secondPassApply()

Contributor Author

done

// Second pass
logger.info("Starting second pass through the variants");
traverseVariants(variantContextFilter, readFilter, this::secondPassApply);
}
Collaborator

Add logger.info(readFilter.getSummaryLine()) after the second traversal pass.

Contributor Author

done

import java.util.ArrayList;
import java.util.List;

import static org.broadinstitute.hellbender.utils.variant.GATKVCFConstants.QUAL_BY_DEPTH_KEY;
Collaborator

Avoid static imports -- they tend to create confusion when reading the code.

Contributor Author

done


@CommandLineProgramProperties(
summary = "Example variant walker that makes two passes through a vcf",
oneLineSummary = "Example two variant walker",
Collaborator

"two variant" -> "two-pass variant"

Contributor Author

done


private double sampleVarianceOfQDs;

int counter = 0;
Collaborator

private

Contributor Author

done

}

@Override
protected void afterFirstPass() {
Collaborator

Example code would be more logical if you put afterFirstPass() physically between firstPassApply() and secondPassApply()

Contributor Author

done
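
To make the suggested method ordering concrete, here is a sketch of how a two-pass statistic might be laid out, with the between-pass hook placed physically between the two apply methods. It assumes the example tool collects QD values in the first pass, summarises them in `afterFirstPass()`, and uses the summary in the second pass; the class name, the use of plain doubles instead of `VariantContext`, and the choice of absolute distance from the mean are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a two-pass statistic; not the ExampleTwoPassVariantWalker from this PR.
final class QdTwoPassSketch {
    private final List<Double> qds = new ArrayList<>();
    private double meanOfQDs;
    private double sampleVarianceOfQDs;

    // First pass: only collect values.
    void firstPassApply(final double qd) {
        qds.add(qd);
    }

    // Between passes: summarise what the first pass collected.
    void afterFirstPass() {
        final int n = qds.size();
        double sum = 0.0;
        for (final double qd : qds) {
            sum += qd;
        }
        meanOfQDs = n == 0 ? 0.0 : sum / n;
        double sumOfSquares = 0.0;
        for (final double qd : qds) {
            sumOfSquares += (qd - meanOfQDs) * (qd - meanOfQDs);
        }
        // Sample variance divides by n - 1; with fewer than two values this sketch falls back to 0.
        sampleVarianceOfQDs = n > 1 ? sumOfSquares / (n - 1) : 0.0;
    }

    // Second pass: use the summary to annotate each record (assumed: absolute distance from the mean).
    double secondPassApply(final double qd) {
        return Math.abs(qd - meanOfQDs);
    }

    double getMeanOfQDs() {
        return meanOfQDs;
    }

    double getSampleVarianceOfQDs() {
        return sampleVarianceOfQDs;
    }

    public static void main(final String[] args) {
        final QdTwoPassSketch sketch = new QdTwoPassSketch();
        for (final double qd : new double[] {2.0, 4.0, 6.0}) {
            sketch.firstPassApply(qd);
        }
        sketch.afterFirstPass();
        System.out.println("mean=" + sketch.getMeanOfQDs()
                + " variance=" + sketch.getSampleVarianceOfQDs()
                + " distance(6.0)=" + sketch.secondPassApply(6.0));
    }
}
```

Reading the methods top to bottom then matches the order in which the traversal calls them.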

import java.util.List;

import static org.broadinstitute.hellbender.utils.variant.GATKVCFConstants.QUAL_BY_DEPTH_KEY;

Collaborator

Document in a 1-2 sentence comment what the example tool does.

Contributor Author

done


import static org.broadinstitute.hellbender.tools.examples.ExampleTwoPassVariantWalker.COPY_OF_QD_KEY_NAME;
import static org.broadinstitute.hellbender.tools.examples.ExampleTwoPassVariantWalker.QD_DISTANCE_FROM_MEAN;
import static org.broadinstitute.hellbender.utils.variant.GATKVCFConstants.QUAL_BY_DEPTH_KEY;
Collaborator

Avoid static imports

Contributor Author

done

public class ExampleTwoPassVariantWalkerIntegrationTest extends CommandLineProgramTest {
@Test
public void test() throws IOException {
final File outputVcf = File.createTempFile("output", "vcf");
Collaborator

Use BaseTest.createTempFile() instead, as it schedules the file (+ companion indices) for deletion on JVM exit.

Contributor Author

done
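
For readers unfamiliar with the pattern, the following sketch shows roughly what a helper that schedules a temp file and a companion index for deletion on JVM exit could look like. `TempFileSketch` and the `.idx` suffix are assumptions for illustration; this is not the `BaseTest.createTempFile()` implementation. Note that `File.createTempFile` uses the suffix exactly as given, so the sketch passes `.vcf` with the leading dot.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper illustrating the idea only; it is not the GATK BaseTest implementation.
final class TempFileSketch {
    static File createTempFileForTest(final String prefix, final String extension) throws IOException {
        final File file = File.createTempFile(prefix, extension);
        file.deleteOnExit();
        // A writer may create a companion index next to the output; ".idx" is an assumed suffix.
        // Registering a path that never gets created is harmless.
        new File(file.getAbsolutePath() + ".idx").deleteOnExit();
        return file;
    }

    // Convenience wrapper for callers that cannot propagate checked exceptions.
    static boolean demo() {
        try {
            final File file = createTempFileForTest("output", ".vcf");
            return file.exists() && file.getName().endsWith(".vcf");
        } catch (final IOException e) {
            return false;
        }
    }

    public static void main(final String[] args) {
        System.out.println("temp file created with .vcf extension: " + demo());
    }
}
```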

runCommandLine(args);

final FeatureDataSource<VariantContext> variantsBefore = new FeatureDataSource<>(inputVcf);
final FeatureDataSource<VariantContext> variantsAfter = new FeatureDataSource<>(outputVcf);
Collaborator

Create these data sources in a try-with-resources block, to ensure that they get closed.

Contributor Author

done
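
A minimal sketch of the try-with-resources suggestion, using a hypothetical `ClosableSourceSketch` in place of `FeatureDataSource` so that it runs on its own. Both resources are closed automatically when the block exits, in reverse order of declaration, even if the body throws.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a data source that must be closed; not the GATK FeatureDataSource.
final class ClosableSourceSketch implements AutoCloseable {
    private final String name;
    private final List<String> closeLog;

    ClosableSourceSketch(final String name, final List<String> closeLog) {
        this.name = name;
        this.closeLog = closeLog;
    }

    String getName() {
        return name;
    }

    @Override
    public void close() {
        closeLog.add(name); // record that this source was closed
    }

    // Opens two sources, uses them, and returns the order in which they were closed.
    static List<String> openCompareAndClose() {
        final List<String> closeLog = new ArrayList<>();
        try (ClosableSourceSketch variantsBefore = new ClosableSourceSketch("before", closeLog);
             ClosableSourceSketch variantsAfter = new ClosableSourceSketch("after", closeLog)) {
            // The comparison of input and output records would go here.
            if (variantsBefore.getName().equals(variantsAfter.getName())) {
                throw new IllegalStateException("sources should differ");
            }
        }
        return closeLog;
    }

    public static void main(final String[] args) {
        System.out.println("closed in order: " + openCompareAndClose());
    }
}
```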

@Test
public void test() throws IOException {
final File outputVcf = File.createTempFile("output", "vcf");
final String inputVcf = "src/test/resources/org/broadinstitute/hellbender/tools/walkers/variantutils/SelectVariants/haploid-multisample.vcf";
Collaborator

Is this really a good VCF to use for this test?

Contributor Author

That was the first vcf that had QD - most vcfs in the test suite don't

@CommandLineProgramProperties(
summary = "An example subclass of TwoPassVariantWalker",
oneLineSummary = "An example subclass of TwoPassVariantWalker",
programGroup = TestProgramGroup.class
Collaborator

omitFromCommandLine = true

Contributor Author

done

}

@Test
public void testNum() {
Collaborator

testNum -> testTwoPassTraversal()

Contributor Author

done

import org.testng.Assert;
import org.testng.annotations.Test;

public class TwoPassVariantWalkerUnitTest {
Collaborator

extends GATKBaseTest

Contributor Author

done

Collaborator

@droazen droazen left a comment

Second-pass review complete, back to @takutosato with more comments.

@takutosato
Contributor Author

@droazen thanks for the review. back to you

Collaborator

@droazen droazen left a comment

Final review complete, back to @takutosato. Two last quick comments to address, then go ahead and hit merge after tests pass.


private VariantContextWriter vcfWriter;

static String QD_DISTANCE_FROM_MEAN = "QD_DIST";
Collaborator

As a constant this should be static final, not just static


static String QD_DISTANCE_FROM_MEAN = "QD_DIST";

static String COPY_OF_QD_KEY_NAME = "QD_COPY";
Collaborator

This should also be static final
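
For illustration, a sketch of the two fields declared as true constants. `ExampleToolConstantsSketch` is a hypothetical holder class; only the field names and values come from the diff above. With `final`, the compiler rejects any later reassignment, and other classes can refer to the values by qualified name, for example `ExampleToolConstantsSketch.QD_DISTANCE_FROM_MEAN`, without a static import.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical holder mirroring the two constant names from the diff; not the actual tool class.
final class ExampleToolConstantsSketch {
    static final String QD_DISTANCE_FROM_MEAN = "QD_DIST";
    static final String COPY_OF_QD_KEY_NAME = "QD_COPY";

    private ExampleToolConstantsSketch() {
        // constants only; no instances
    }

    // Reports whether the named field is declared both static and final.
    static boolean isConstant(final String fieldName) {
        try {
            final Field field = ExampleToolConstantsSketch.class.getDeclaredField(fieldName);
            final int modifiers = field.getModifiers();
            return Modifier.isStatic(modifiers) && Modifier.isFinal(modifiers);
        } catch (final NoSuchFieldException e) {
            return false;
        }
    }

    public static void main(final String[] args) {
        System.out.println(QD_DISTANCE_FROM_MEAN + " is constant: " + isConstant("QD_DISTANCE_FROM_MEAN"));
        System.out.println(COPY_OF_QD_KEY_NAME + " is constant: " + isConstant("COPY_OF_QD_KEY_NAME"));
    }
}
```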

@droazen droazen assigned takutosato and unassigned droazen May 15, 2018
@droazen droazen assigned droazen and unassigned takutosato May 17, 2018
@droazen
Collaborator

droazen commented May 17, 2018

I pushed the final requested changes myself just now -- will merge once tests pass.

@droazen droazen dismissed cmnbroad’s stale review May 17, 2018 20:06

All comments addressed

@takutosato
Contributor Author

Great, thanks @droazen!

@droazen droazen merged commit 6e1cc8c into master May 17, 2018
@droazen droazen deleted the ts_2pass branch May 17, 2018 21:47

4 participants