Increasing unit test coverage for VariantContextConverter #1276

Merged
merged 2 commits into bigdatagenomics:master from heuermh:vcc-coverage on Nov 18, 2016

Conversation

3 participants
@heuermh
Member

heuermh commented Nov 16, 2016

No description provided.

@AmplabJenkins


AmplabJenkins Nov 17, 2016

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1617/

Build result: ABORTED

[...truncated 3 lines...]
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
Wiping out workspace first.
Cloning the remote Git repository
Cloning repository https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git init /home/jenkins/workspace/ADAM-prb # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git --version # timeout=10
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/heads/:refs/remotes/origin/ # timeout=15
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
 > /home/jenkins/git2/bin/git config --add remote.origin.fetch +refs/heads/:refs/remotes/origin/ # timeout=10
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
 > /home/jenkins/git2/bin/git rev-parse origin/pr/1276/merge^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a --contains 603d840 # timeout=10
 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1276/merge^{commit} # timeout=10
Checking out Revision 603d840 (origin/pr/1276/merge)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f 603d8409297f5eeba7f30846362c4933efeacaf5
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb ? 2.6.0,2.10,1.5.2,centos
Touchstone configurations resulted in ABORTED, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.


@heuermh


heuermh Nov 17, 2016

Member

If I ignore the hanging unit test then I see VCF header-related exceptions

- don't lose any variants when piping as VCF !!! IGNORED !!!
2016-11-16 17:19:15 ERROR Utils:95 - Aborting task
java.lang.IllegalStateException: Key IndelQD found in VariantContext field FILTER at 1:14397 but this key isn't defined in the VCFHeader.  We require all VCFs to have complete VCF headers by default.
    at htsjdk.variant.vcf.VCFEncoder.fieldIsMissingFromHeaderError(VCFEncoder.java:173)
    at htsjdk.variant.vcf.VCFEncoder.getFilterString(VCFEncoder.java:154)
    at htsjdk.variant.vcf.VCFEncoder.encode(VCFEncoder.java:106)
    at htsjdk.variant.variantcontext.writer.VCFWriter.add(VCFWriter.java:222)
    at org.seqdoop.hadoop_bam.VCFRecordWriter.writeRecord(VCFRecordWriter.java:140)
    at org.seqdoop.hadoop_bam.KeyIgnoringVCFRecordWriter.write(KeyIgnoringVCFRecordWriter.java:60)
    at org.seqdoop.hadoop_bam.KeyIgnoringVCFRecordWriter.write(KeyIgnoringVCFRecordWriter.java:38)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply$mcV$sp(PairRDDFunctions.scala:1113)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1111)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1111)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1277)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1119)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1091)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
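
For reference, htsjdk raises this IllegalStateException whenever a record carries a FILTER id that the header never declares. A complete header for this file would need a `##FILTER` meta line for IndelQD, something like the following (the Description text here is an illustrative guess, not taken from the actual header):

```
##FILTER=<ID=IndelQD,Description="QD < 2.0 for indels">
```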
@fnothaft


fnothaft Nov 17, 2016

Member

We'll need #1260 + a bit more to fix the header lines issue...


@fnothaft

That hang is kind of odd, but I have a guess. I might change the `tee /dev/null` command in VariantContextRDDSuite to tee to a file and see what you're writing out. I'm thinking that what's happening is we're writing a VCF whose header is missing a FILTER line for the IndelQD filter. When we read that back from the pipe, we are probably getting an IllegalStateException from tribble/htsjdk re: the header line. I'm guessing this is causing the writer thread to exit while blocking the piping thread pool from shutting down. (Yeah, that's a bug. Sigh!) Can you test this hypothesis? If that looks right, open an issue and I'll fix the pipe problems.
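
The debugging step being suggested — tee the piped stream to a real file instead of discarding it — can be sanity-checked in isolation. A minimal sketch; the path /tmp/pipe-debug.vcf and the sample header line are illustrative, not the suite's actual code:

```shell
# tee forwards stdin to stdout AND writes a copy to the named file,
# so replacing `tee /dev/null` with `tee <file>` captures the stream
# that the pipe consumer sees, without changing what it receives.
printf '##fileformat=VCFv4.2\n' | tee /tmp/pipe-debug.vcf > /dev/null
cat /tmp/pipe-debug.vcf
```

If the captured file comes out empty (as reported below in this thread), the writer died before emitting anything, which is consistent with the missing-FILTER-line hypothesis.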

@@ -152,6 +152,22 @@ class ADAMContextSuite extends ADAMFunSuite {
assert(vcs.size === 6)
val vc = vcs.head
+
+ /*


@fnothaft

fnothaft Nov 17, 2016

Member

If all's the same to you, I'd nix this comment.


+ case (true, true) => vcb.passFilters
+ }
+
+ val somatic: java.lang.Boolean = Option(variant.getSomatic).getOrElse(false)


@fnothaft

fnothaft Nov 17, 2016

Member

I'd lose the : java.lang.Boolean. Is there a reason you need it?



@heuermh

heuermh Nov 17, 2016

Member

Yeah it wouldn't compile without it. Odd that the lines above were ok.


@heuermh


heuermh Nov 17, 2016

Member

Yes, I believe that is what is happening with the hang. Teeing to another file results in an empty file.


@AmplabJenkins


AmplabJenkins Nov 18, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1631/
Test PASSed.


+ test("Convert somatic htsjdk site-only SNV to ADAM") {
+ val converter = new VariantContextConverter
+
+ // not sure why this doesn't work


@fnothaft

fnothaft Nov 18, 2016

Member

This one too.


@AmplabJenkins


AmplabJenkins Nov 18, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1632/
Test PASSed.


@fnothaft fnothaft merged commit e0979a9 into bigdatagenomics:master Nov 18, 2016

1 check passed

default Merged build finished.

@heuermh heuermh deleted the heuermh:vcc-coverage branch Nov 18, 2016
