Investigate failures to load ExAC.0.3.GRCh38.vcf variants #1351

heuermh · 2017-01-16T23:21:31Z

On git head after merging #1346 and #1348, the next error is:

$ time ./bin/adam-submit \
  vcf2adam -only_variants \
  ExAC.0.3.GRCh38.vcf.gz ExAC.0.3.GRCh38.variants.adam

Using ADAM_MAIN=org.bdgenomics.adam.cli.ADAMMain
Using SPARK_SUBMIT=/usr/local/bin/spark-submit
...
2017-01-16 17:18:06 ERROR Utils:91 - Aborting task
htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately
line number 15364: Duplicate allele added to VariantContext: C
	at htsjdk.variant.vcf.AbstractVCFCodec.generateException(AbstractVCFCodec.java:779)
	at htsjdk.variant.vcf.AbstractVCFCodec.parseVCFLine(AbstractVCFCodec.java:356)
	at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:279)
	at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:257)
	at org.seqdoop.hadoop_bam.VCFRecordReader.nextKeyValue(VCFRecordReader.java:144)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:199)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply$mcV$sp(PairRDDFunctions.scala:1123)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1123)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1123)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1341)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1131)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1102)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)


fnothaft · 2017-01-17T18:34:23Z

This looks like an error in the liftover. Did you lift over yourself or is this an official release?

heuermh · 2017-01-17T19:01:57Z

From Ensembl, but not necessarily an official release: http://ftp.ensembl.org/pub/variation_genotype/homo_sapiens/ExAC.0.3.GRCh38.vcf.gz

I need to write something to go through the file and update the CSQ variant annotations to the ANN specification, so when I do that I'll filter out the invalid data as well.
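As a rough illustration of the filtering half of that plan, here is a minimal Python sketch that drops records listing the same allele twice. It assumes the htsjdk "Duplicate allele" error corresponds to a data line whose REF and ALT columns together repeat an allele (compared case-insensitively). The function names are hypothetical, and the CSQ to ANN conversion is not attempted here.

```python
import gzip


def has_duplicate_alleles(line):
    """Return True if a VCF data line repeats an allele across REF and ALT."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        # Not a complete data line; leave it for the real parser to report.
        return False
    ref = fields[3].upper()
    # "." in ALT means "no alternate allele", so it is not an allele itself.
    alts = [alt.upper() for alt in fields[4].split(",") if alt != "."]
    alleles = [ref] + alts
    return len(set(alleles)) != len(alleles)


def filter_vcf(lines):
    """Yield header lines unchanged and data lines without duplicate alleles."""
    for line in lines:
        if line.startswith("#") or not has_duplicate_alleles(line):
            yield line


def filter_vcf_file(in_path, out):
    """Stream a gzip-compressed VCF at in_path, writing kept lines to out."""
    with gzip.open(in_path, "rt") as vcf:
        out.writelines(filter_vcf(vcf))
```

Calling `filter_vcf_file("ExAC.0.3.GRCh38.vcf.gz", sys.stdout)` (after `import sys`) would stream the filtered records to standard output. The result is uncompressed, so it would need to be compressed again if a `.vcf.gz` file is wanted.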

heuermh · 2017-04-19T17:42:13Z

Closing as WontFix: that particular file now returns a 404, and a newer version of ExAC exists.

heuermh mentioned this issue Jan 25, 2017

[ADAM-1371] Wrap ADAM->htsjdk VariantContext conversion with validation stringency. #1373

Merged

heuermh closed this as completed Apr 19, 2017

heuermh modified the milestone: 0.23.0 Jul 22, 2017
