Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BQSR add option to not compute indel recalibration tables #1745

Merged
merged 2 commits into from Apr 29, 2016

Conversation

akiezun
Copy link
Contributor

@akiezun akiezun commented Apr 20, 2016

saves 10%-20% of runtime of BaseRecalibator.

Intentionally done as a conservative and optional change so that we can evaluate it properly. Analyses done in #1056 show that resulting bams are identical (tested on bams up to 30GB of size so far).

Added a test for it - note that it uses a 27MB file that is not in LFS, which may be a problem. (we could switch to using the standard BAM and using -L to limit territory. Using full file takes 3-4 minutes which is too slow for testing)

Also added some utility methods to a few testing classes so that we can test this properly. Added full tests for added functionality.

@lbergelson please review

@coveralls
Copy link
Collaborator

Coverage Status

Coverage increased (+0.0005%) to 13.182% when pulling 1e699ba on ak_removeIndelsFromBQSR_take2 into 1bfbed4 on master.

import java.io.File;

//Note: has to be in this package to have access to Main.instanceMain
public final class BQSRIntegrationTest extends CommandLineProgramTest {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have a BaseRecalibratorIntegrationTest, and you've now created BQSRIntegrationTest??

Copy link
Contributor Author

@akiezun akiezun Apr 21, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this one tests both steps

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think you need a better name here to avoid confusion, then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it's confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed to BothStepsOfBQSRIntegrationTest

@droazen
Copy link
Collaborator

droazen commented Apr 21, 2016

Why didn't you put the large file into git lfs? That's what it's there for! :)

@akiezun
Copy link
Contributor Author

akiezun commented Apr 21, 2016

lfs: by mistake

@lbergelson
Copy link
Member

@droazen I don't think it's a great idea to include a 27GB file in lfs. That would mean that travis would download it every time it runs at a cost of ~$25

@droazen
Copy link
Collaborator

droazen commented Apr 21, 2016

@lbergelson I think @akiezun mis-spoke -- the file is actually 27 MB, isn't it?

@@ -93,6 +93,36 @@ public ArgumentsBuilder addFileArgument(String argumentName, File file){
}

/**
* add an argument with a file as its parameter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment is wrong

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@akiezun
Copy link
Contributor Author

akiezun commented Apr 21, 2016

yes, 27MB

args1.addFileArgument("knownSites", new File(dbsnp_138_b37_20_21_vcf));
args1.addReference(new File(b37_reference_20_21));
args1.addBooleanArgument("skipIndelBQSR", skipIndels);
new Main().instanceMain(makeCommandLineArgs(args1.getArgsList(), BaseRecalibrator.class.getSimpleName()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't these just call CommandLineProgramTest.runCommandline? It seems like it would make more sense to overload it to take a class name instead of making instanceMain publicly visible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can't use it to call more than 2 tool though

@lbergelson
Copy link
Member

If it's actually 27MB I would just check it into lfs or rewrite to use -L

@@ -60,7 +63,12 @@ public void singleReadDiffTest(File firstBam, File secondBam, File referenceFile
args.add(referenceFile.getAbsolutePath());
}

this.runCommandLine(args.getArgsArray());
final Object result = this.runCommandLine(args.getArgsArray());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can pass an ArgumentBuilder directly into runCommandLine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh nice

@lbergelson
Copy link
Member

@akiezun Back to you.

@lbergelson lbergelson assigned akiezun and unassigned lbergelson Apr 21, 2016
@droazen
Copy link
Collaborator

droazen commented Apr 21, 2016

@akiezun CEUTrio.HiSeq.WGS.b37.NA12878.20.bam should be renamed to include the interval it spans, so as not to imply that it contains all of chromosome 20. Perhaps it should also be shrunk further to cut down on unnecessary test suite runtime.

Since BQSR is never run with -L I think it's defensible to add this smaller slice of the existing NA12878 bam so that we can run the test over an entire file, but it should definitely be in lfs.

@akiezun
Copy link
Contributor Author

akiezun commented Apr 21, 2016

production runs BQSR with -L so it's not a bad idea to use -L

@@ -30,9 +31,72 @@ public void testOneBigString(){

@Test
public void testFromArray(){
ArgumentsBuilder args = new ArgumentsBuilder(new Object[]{"Option=" + new File("/path/to"), "OtherOption=" + -1});
Assert.assertEquals(args.getArgsArray(), new String[]{"--Option","/path/to", "--OtherOption", "-1"});
ArgumentsBuilder args = new ArgumentsBuilder(new Object[]{"Option=" + new File("/path/to"), "OtherOption=" + -1, new File("/somewhere")});

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[FindBugs] INFO: DMI: Hard coded reference to an absolute pathname in org.broadinstitute.hellbender.utils.test.ArgumentsBuilderTest.testFromArray()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[FindBugs] INFO: DMI: Hard coded reference to an absolute pathname in org.broadinstitute.hellbender.utils.test.ArgumentsBuilderTest.testFromArray()

@coveralls
Copy link
Collaborator

Coverage Status

Coverage increased (+70.5%) to 83.673% when pulling 2bb5701 on ak_removeIndelsFromBQSR_take2 into 1bfbed4 on master.

@akiezun akiezun force-pushed the ak_removeIndelsFromBQSR_take2 branch from 2bb5701 to 64329e1 Compare April 22, 2016 00:29
@coveralls
Copy link
Collaborator

Coverage Status

Coverage increased (+70.5%) to 83.673% when pulling 64329e1 on ak_removeIndelsFromBQSR_take2 into 5be05d4 on master.

@akiezun akiezun assigned lbergelson and unassigned akiezun Apr 22, 2016
@akiezun
Copy link
Contributor Author

akiezun commented Apr 22, 2016

back to @lbergelson

@akiezun
Copy link
Contributor Author

akiezun commented Apr 29, 2016

@lbergelson can you re review? Should be super easy

@lbergelson
Copy link
Member

👍

@lbergelson lbergelson merged commit a1c9820 into master Apr 29, 2016
@lbergelson lbergelson deleted the ak_removeIndelsFromBQSR_take2 branch April 29, 2016 20:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants