[ADAM-1371] Wrap ADAM->htsjdk VariantContext conversion with validation stringency. #1373

Merged

@fnothaft
Member

fnothaft commented Jan 24, 2017

Resolves #1371. This could use some more unit tests; I'll make another pass later.
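For readers unfamiliar with the pattern, here is a minimal Scala sketch of what wrapping a conversion with a validation stringency typically looks like. It is not the code from this PR. To stay self-contained it defines a stand-in `Stringency` type mirroring the three values of htsjdk's `ValidationStringency` (STRICT, LENIENT, SILENT); the helper `convertWithStringency` and the string-to-int "conversion" are hypothetical.

```scala
import scala.util.control.NonFatal

object StringencySketch {

  // Hypothetical stand-in for htsjdk.samtools.ValidationStringency,
  // defined here so the sketch compiles without htsjdk on the classpath.
  sealed trait Stringency
  case object Strict extends Stringency
  case object Lenient extends Stringency
  case object Silent extends Stringency

  // Hypothetical helper (not ADAM's API): run `convert` on one record and
  // decide what to do with a non-fatal failure based on the stringency:
  //   Strict  -> rethrow, failing the job
  //   Lenient -> log a warning and drop the record
  //   Silent  -> drop the record without logging
  def convertWithStringency[A, B](
      input: A,
      stringency: Stringency,
      log: String => Unit = msg => Console.err.println(msg))(convert: A => B): Option[B] = {
    try {
      Some(convert(input))
    } catch {
      case NonFatal(e) =>
        stringency match {
          case Strict => throw e
          case Lenient =>
            log(s"Dropping record $input: ${e.getMessage}")
            None
          case Silent => None
        }
    }
  }

  def main(args: Array[String]): Unit = {
    // Parsing strings to Ints stands in for the ADAM -> htsjdk conversion.
    val records = Seq("1", "2", "not-a-number", "4")
    val converted = records.flatMap(r => convertWithStringency(r, Lenient)(_.toInt))
    println(converted) // List(1, 2, 4), plus one warning on stderr
  }
}
```

With `Lenient` or `Silent` the malformed record is dropped instead of failing the job; with `Strict` the `NumberFormatException` propagates. A wrapper like this only covers exceptions thrown inside the wrapped step, which is the limitation the discussion below runs into.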

@fnothaft fnothaft added this to the 0.21.1 milestone Jan 24, 2017
@AmplabJenkins

AmplabJenkins commented Jan 24, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1758/

@heuermh
Member

heuermh commented Jan 25, 2017

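I was thinking this might help with htsjdk exceptions like those in #1351, so I added unit tests for it, but the exception is apparently thrown elsewhere: in htsjdk's VCF codec while Hadoop-BAM's VCFRecordReader decodes the line, so the stringency setting never comes into play: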

  sparkTest("VCF file with invalid alleles silent stringency") {
    val path = testFile("ExAC.0.3.GRCh38.invalid-allele.vcf")
    val vcs = sc.loadVcf(path, stringency = ValidationStringency.SILENT)
    assert(vcs.rdd.count == 5)
  }
- VCF file with invalid alleles silent stringency *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 401: Duplicate allele added to VariantContext: A
	at htsjdk.variant.vcf.AbstractVCFCodec.generateException(AbstractVCFCodec.java:779)
	at htsjdk.variant.vcf.AbstractVCFCodec.parseVCFLine(AbstractVCFCodec.java:356)
	at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:279)
	at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:257)
	at org.seqdoop.hadoop_bam.VCFRecordReader.nextKeyValue(VCFRecordReader.java:144)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:179)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1631)
	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
@fnothaft
Member Author

fnothaft commented Jan 25, 2017

Yep; where is your god now, eh?

@heuermh
Member

heuermh commented Jan 25, 2017

Want to merge this as is then? I'm no longer in the mood to write unit tests for it :)

@fnothaft
Member Author

fnothaft commented Jan 25, 2017

I have a unit test in mind, but won't be able to get to it until this weekend. Are you OK with waiting until next week to merge? If not, let's merge as is.

@heuermh
Member

heuermh commented Jan 25, 2017

This can wait. I pushed up the failing tests as heuermh@57f0895 in case they come in handy later.

…on stringency.

Resolves #1371.
@fnothaft fnothaft force-pushed the fnothaft:issues/1371-vcc-validation branch from f32f5a3 to 521dab4 Mar 3, 2017
@coveralls

coveralls commented Mar 3, 2017

Coverage increased (+0.004%) to 76.328% when pulling 521dab4 on fnothaft:issues/1371-vcc-validation into 55dba3d on bigdatagenomics:master.

@AmplabJenkins

AmplabJenkins commented Mar 3, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1832/

@heuermh heuermh merged commit 07c1982 into bigdatagenomics:master Mar 3, 2017
2 checks passed
coverage/coveralls Coverage increased (+0.004%) to 76.328%
default Merged build finished.
@heuermh
Member

heuermh commented Mar 3, 2017

Thank you, @fnothaft!

4 participants