[ADAM-1371] Wrap ADAM->htsjdk VariantContext conversion with validation stringency. #1373

Merged
merged 1 commit into bigdatagenomics:master from fnothaft:issues/1371-vcc-validation on Mar 3, 2017

Conversation

fnothaft commented Jan 24, 2017

Resolves #1371. Could use some more unit tests; will make a pass back later.

@fnothaft fnothaft added this to the 0.21.1 milestone Jan 24, 2017

AmplabJenkins commented Jan 24, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1758/

heuermh commented Jan 25, 2017

I was thinking this might help with htsjdk exceptions like those in #1351, and added unit tests for it, but the exception is apparently thrown elsewhere:

  sparkTest("VCF file with invalid alleles silent stringency") {
    val path = testFile("ExAC.0.3.GRCh38.invalid-allele.vcf")
    val vcs = sc.loadVcf(path, stringency = ValidationStringency.SILENT)
    assert(vcs.rdd.count == 5)
  }
- VCF file with invalid alleles silent stringency *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 401: Duplicate allele added to VariantContext: A
	at htsjdk.variant.vcf.AbstractVCFCodec.generateException(AbstractVCFCodec.java:779)
	at htsjdk.variant.vcf.AbstractVCFCodec.parseVCFLine(AbstractVCFCodec.java:356)
	at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:279)
	at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:257)
	at org.seqdoop.hadoop_bam.VCFRecordReader.nextKeyValue(VCFRecordReader.java:144)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:179)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1631)
	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
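For context, the stringency-gated wrapping this PR adds around the ADAM->htsjdk conversion follows a common pattern: attempt the conversion, then rethrow, warn-and-drop, or silently drop on failure depending on the requested stringency. The trace above shows why the SILENT test still fails: the TribbleException is raised while hadoop-bam decodes the VCF line, before any conversion wrapper runs. The sketch below illustrates the pattern only; the names (Stringency, convertWithStringency) are hypothetical and not ADAM's actual API.

```scala
// Illustrative sketch of stringency-gated conversion.
// Stringency and convertWithStringency are hypothetical names, not ADAM's API.
object StringencySketch {

  sealed trait Stringency
  case object Strict extends Stringency
  case object Lenient extends Stringency
  case object Silent extends Stringency

  // Run a conversion function over one record; on failure, rethrow (Strict),
  // warn and drop (Lenient), or drop quietly (Silent).
  def convertWithStringency[A, B](record: A,
                                  stringency: Stringency)(fn: A => B): Option[B] = {
    try {
      Some(fn(record))
    } catch {
      case e: Exception => stringency match {
        case Strict => throw e
        case Lenient =>
          Console.err.println(s"Dropping record $record: ${e.getMessage}")
          None
        case Silent => None
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val parse = (s: String) => s.toInt
    println(convertWithStringency("42", Silent)(parse))   // Some(42)
    println(convertWithStringency("oops", Silent)(parse)) // None
  }
}
```

Because the decode failure above happens upstream of any such wrapper, handling it would require plumbing stringency into the VCF record reader itself rather than the conversion layer.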
fnothaft commented Jan 25, 2017

Yep; where is your god now, eh?

heuermh commented Jan 25, 2017

Want to merge this as is then? I'm no longer in the mood to write unit tests for it :)

fnothaft commented Jan 25, 2017

I have a unit test in mind, but I won't be able to get to it until this weekend. Are you OK with waiting until next week to merge? If not, let's merge as is.

heuermh commented Jan 25, 2017

This can wait; I pushed up the failing tests as heuermh@57f0895 in case they might come in handy later.

coveralls commented Mar 3, 2017

Coverage increased (+0.004%) to 76.328% when pulling 521dab4 on fnothaft:issues/1371-vcc-validation into 55dba3d on bigdatagenomics:master.

AmplabJenkins commented Mar 3, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1832/

@heuermh heuermh merged commit 07c1982 into bigdatagenomics:master Mar 3, 2017

2 checks passed

coverage/coveralls: Coverage increased (+0.004%) to 76.328%
default: Merged build finished.
heuermh commented Mar 3, 2017

Thank you, @fnothaft!
