fix SB tag parsing #1203

Closed
wants to merge 2 commits into
from

4 participants
@jpdna
Member

jpdna commented Oct 9, 2016

The SB tag, defined as "Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias":

##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">

is neither parsed properly by the existing code nor covered by tests.
This came to light while trying to parse the following gVCF file:
http://bioinformaticstools.mayo.edu/research/wp-content/plugins/download.php?url=https://s3-us-west-2.amazonaws.com/mayo-bic-tools/variant_miner/gvcfs/NA12878.chr22.g.vcf.gz

This PR parses the SB tag correctly, adds a variant carrying this tag to small.vcf, and updates the tests.

@AmplabJenkins

AmplabJenkins Oct 9, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1530/
Test PASSed.

@jpdna

jpdna Oct 9, 2016

Member

Note: I realize it's less than ideal that so many count assertions in the tests had to change in this PR because small.vcf changed. For the multi-allele issue in #1202, I'll look at creating a new test VCF file in another directory so that it has no side effects on the counts. We may also want to find an alternative for this PR's test component.

@fnothaft

Mostly LGTM! I'm OK with the .counts changing throughout the test suite. That's not a big issue from my side. Thanks for taking this on, @jpdna!

@@ -17,6 +17,9 @@
*/
package org.bdgenomics.adam.converters
+import scala.collection.JavaConversions._

@fnothaft

fnothaft Oct 9, 2016

Member

Nit: this import should go at the end. Actually, we should lexicographically sort all the imports here, so that the order is htsjdk, org.apache, then scala.

@fnothaft

fnothaft Oct 9, 2016

Member

Also, no whitespace.

+ * @param attr Attribute to convert.
+ * @return Attribute as a java.util.List[Integer]
+ */
+ private def attrAsIntList(attr: Object): Object = attr match {

@fnothaft

fnothaft Oct 9, 2016

Member

Can you add a test that covers this function?
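
For illustration, a test along these lines might exercise the conversion. This is only a minimal sketch, not the PR's code: the body of `attrAsIntList` is not shown in this thread, so the version below is a hypothetical stand-in (wrapped in a made-up `SbAttributeSketch` object) that assumes the attribute arrives either as a comma-separated string or as a `java.util.List`.

```scala
import java.util.{ ArrayList => JArrayList, List => JList }

object SbAttributeSketch {

  // Hypothetical stand-in for attrAsIntList; the PR's real body is not shown in this thread.
  def attrAsIntList(attr: Object): Object = attr match {
    // Assumed input shape 1: a raw comma-separated string such as "10,12,3,4".
    case s: String =>
      val out = new JArrayList[Integer]()
      s.split(",").foreach(v => out.add(Integer.valueOf(v.trim)))
      out
    // Assumed input shape 2: a list whose elements are strings or boxed integers.
    case l: JList[_] =>
      val out = new JArrayList[Integer]()
      val it = l.iterator()
      while (it.hasNext) out.add(Integer.valueOf(it.next().toString.trim))
      out
    // Anything else is rejected rather than silently coerced.
    case other =>
      throw new IllegalArgumentException(s"Cannot convert $other to a list of integers")
  }
}
```

A test could then assert that `SbAttributeSketch.attrAsIntList("10,12,3,4")` yields a four-element list, and that a non-list, non-string input throws. Rejecting unknown shapes is a design choice of this sketch, not something the PR is known to do.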

@@ -187,7 +201,7 @@ private[converters] object VariantAnnotationConverter extends Serializable {
AttrKey("phaseQuality", attrAsInt _, new VCFFormatHeaderLine(VCFConstants.PHASE_QUALITY_KEY, 1, VCFHeaderLineType.Float, "Read-backed phasing quality")),
AttrKey("phaseSetId", attrAsInt _, new VCFFormatHeaderLine(VCFConstants.PHASE_SET_KEY, 1, VCFHeaderLineType.Integer, "Phase set")),
AttrKey("minReadDepth", attrAsInt _, new VCFFormatHeaderLine("MIN_DP", 1, VCFHeaderLineType.Integer, "Minimum DP observed within the GVCF block")),
- AttrKey("strandBiasComponents", attrAsInt _, new VCFFormatHeaderLine("SB", 4, VCFHeaderLineType.Integer, "Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias."))
+ AttrKey("strandBiasComponents", attrAsIntList _, new VCFFormatHeaderLine("SB", 4, VCFHeaderLineType.Integer, "Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias."))

@heuermh

heuermh Oct 13, 2016

Member

Do you have a doc link for the SB tag being used for genotypes (i.e. as a FORMAT key)? What are the 4 different values? Are there always 4 values?

@heuermh

heuermh Oct 13, 2016

Member

Great, thanks!

@fnothaft

fnothaft Oct 13, 2016

Member

SB components are a 2x2 matrix: forward/reverse strand on one axis, ref/alt on the other axis.
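
As a rough sketch of that layout, the four values can be mapped onto a small container. Everything here is hypothetical: the names `SbMatrixSketch`, `StrandBiasComponents`, and `fromValues` are made up for illustration, and the value order (ref forward, ref reverse, alt forward, alt reverse) is an assumption that is not confirmed anywhere in this thread.

```scala
object SbMatrixSketch {

  // Hypothetical container; the field order (ref fwd, ref rev, alt fwd, alt rev) is an assumption.
  final case class StrandBiasComponents(refFwd: Int, refRev: Int, altFwd: Int, altRev: Int) {
    // Rows are alleles (ref, alt); columns are strands (forward, reverse).
    def matrix: Vector[Vector[Int]] =
      Vector(Vector(refFwd, refRev), Vector(altFwd, altRev))

    // Marginal totals of the 2x2 table.
    def forwardTotal: Int = refFwd + altFwd
    def reverseTotal: Int = refRev + altRev
    def refTotal: Int = refFwd + refRev
    def altTotal: Int = altFwd + altRev
  }

  // Builds the container from the parsed SB values, insisting on exactly four of them.
  def fromValues(values: Seq[Int]): StrandBiasComponents = {
    require(values.size == 4, s"SB expects exactly 4 values, got ${values.size}")
    StrandBiasComponents(values(0), values(1), values(2), values(3))
  }
}
```

For example, `SbMatrixSketch.fromValues(Seq(10, 12, 3, 4)).matrix` gives one row for ref and one for alt, with forward and reverse counts as columns.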

@fnothaft

fnothaft Oct 13, 2016

Member

Ping @jpdna for adding tests, small import cleanup, then I am good to merge this.

@heuermh heuermh modified the milestone: 0.20.0 Oct 13, 2016

@fnothaft

Thanks @jpdna; additional commit looks good. I will squash down and merge when tests pass.

@AmplabJenkins

AmplabJenkins Oct 14, 2016

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1545/

Build result: FAILURE

[...truncated 3 lines...]
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
Wiping out workspace first.
Cloning the remote Git repository
Cloning repository https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git init /home/jenkins/workspace/ADAM-prb # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git --version # timeout=10
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/heads/:refs/remotes/origin/ # timeout=15
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
 > /home/jenkins/git2/bin/git config --add remote.origin.fetch +refs/heads/:refs/remotes/origin/ # timeout=10
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
 > /home/jenkins/git2/bin/git rev-parse origin/pr/1203/merge^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a --contains 213dda5 # timeout=10
 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1203/merge^{commit} # timeout=10
Checking out Revision 213dda5 (origin/pr/1203/merge)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f 213dda53041f413f1501f44eb4e88ee9deb2f339
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb ? 2.6.0,2.10,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@jpdna

jpdna Oct 14, 2016

Member

This test failure is weird: it occurs only on Spark 1.5.2, and I wouldn't expect the very last commit to have broken anything related to these failures. Thoughts?

@heuermh

heuermh Oct 14, 2016

Member

Note that the build matrix chart doesn't accurately show what happened here. To keep build times down, Jenkins first builds only 2.6.0,2.10,1.5.2,centos and 2.6.0,2.11,1.5.2,centos. If those builds succeed, it then builds out the rest of the matrix. So in this case, the two red dots represent new failed builds from this pull request, and the green dots are successful builds from an earlier build.

It appears you need to update the new pipe API related tests to match the VCF changes you made in this pull request.

@fnothaft

fnothaft Oct 14, 2016

Member

The test failures reproduce for me locally, but it's a simple fix. Will send it your way in a minute, @jpdna.

@fnothaft fnothaft referenced this pull request Oct 14, 2016

Closed

fix SB tag parsing #1209

@fnothaft

fnothaft Oct 14, 2016

Member

@jpdna I've opened PR #1209 with 3536385, which fixes the issue. We added two tests in #1114 that used the VCF that you added a record to, so I just updated the tests to account for the new record. I also rebased your PR on ToT. Let me know if #1209 looks good to you, and I will squash it down and merge it when tests pass.

@fnothaft fnothaft closed this Oct 14, 2016

@jpdna

jpdna Oct 14, 2016

Member

My mistake here is that I should have re-based before the PR myself first, and then I would have seen the failure locally that Jenkins saw - correct?

Yes - #1209 looks good to me. Thanks!

@fnothaft

fnothaft Oct 14, 2016

Member

> My mistake here is that I should have re-based before the PR myself first, and then I would have seen the failure locally that Jenkins saw - correct?

Yeah, exactly. No worries! I did the same thing yesterday with #1114, actually...

@fnothaft fnothaft referenced this pull request Oct 14, 2016

Closed

Release ADAM version 0.20.0 #1048

47 of 61 tasks complete