Investigate failures to load ExAC.0.3.GRCh38.vcf variants #1351

Closed
heuermh opened this Issue Jan 16, 2017 · 3 comments

heuermh commented Jan 16, 2017

On git head after merging #1346 and #1348, the next error is:

$ time ./bin/adam-submit \
  vcf2adam -only_variants \
  ExAC.0.3.GRCh38.vcf.gz ExAC.0.3.GRCh38.variants.adam

Using ADAM_MAIN=org.bdgenomics.adam.cli.ADAMMain
Using SPARK_SUBMIT=/usr/local/bin/spark-submit
...
2017-01-16 17:18:06 ERROR Utils:91 - Aborting task
htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately
line number 15364: Duplicate allele added to VariantContext: C
	at htsjdk.variant.vcf.AbstractVCFCodec.generateException(AbstractVCFCodec.java:779)
	at htsjdk.variant.vcf.AbstractVCFCodec.parseVCFLine(AbstractVCFCodec.java:356)
	at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:279)
	at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:257)
	at org.seqdoop.hadoop_bam.VCFRecordReader.nextKeyValue(VCFRecordReader.java:144)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:199)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply$mcV$sp(PairRDDFunctions.scala:1123)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1123)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1123)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1341)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1131)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1102)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
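
Because the exception only gives an approximate line number, it can help to look at the records around that position before deciding what is wrong. The following is a minimal sketch, not something from the original report: `lines_around` is a hypothetical helper, and it assumes the input is a gzip-compressed text VCF that Python's `gzip` module can read.

```python
import gzip
from itertools import islice


def lines_around(path, line_number, context=2):
    """Return (line_number, text) pairs for lines near a 1-based line number.

    `context` is how many lines to include on each side of `line_number`.
    """
    start = max(1, line_number - context)
    stop = line_number + context
    with gzip.open(path, "rt") as handle:
        # islice takes 0-based indices, so lines start..stop map to start-1..stop-1
        window = islice(handle, start - 1, stop)
        return [(n, text.rstrip("\n")) for n, text in enumerate(window, start)]
```

For the trace above, something like `lines_around("ExAC.0.3.GRCh38.vcf.gz", 15364)` would list the neighbouring records so that the REF and ALT columns can be checked by eye.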

fnothaft commented Jan 17, 2017

This looks like an error in the liftover. Did you lift over yourself or is this an official release?

heuermh commented Jan 17, 2017

From Ensembl, but not necessarily an official release: http://ftp.ensembl.org/pub/variation_genotype/homo_sapiens/ExAC.0.3.GRCh38.vcf.gz

I need to write something to go through the file and update the CSQ variant annotations to the ANN specification, so when I do that I'll filter out the invalid data as well.
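
A rough sketch of what the filtering half of such a pass might look like, assuming the invalid records are those where the same allele appears more than once across the REF and ALT columns (which is what the "Duplicate allele added to VariantContext" message suggests). The function names are hypothetical, the case-insensitive comparison is an assumption, and the CSQ-to-ANN conversion is not attempted here.

```python
import gzip


def has_duplicate_allele(vcf_line):
    """Return True if any allele occurs more than once across REF and ALT."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 5:
        return False  # not a complete data line; leave it for the parser to judge
    ref, alt = fields[3], fields[4]
    # Assumption: alleles are compared case-insensitively; "." means no ALT allele.
    alleles = [ref.upper()] + [a.upper() for a in alt.split(",") if a != "."]
    return len(alleles) != len(set(alleles))


def filter_duplicate_alleles(lines):
    """Yield header lines plus data lines whose alleles are all distinct."""
    for line in lines:
        if line.startswith("#") or not has_duplicate_allele(line):
            yield line


def filter_vcf(in_path, out_path):
    """Copy a gzip-compressed VCF, dropping records with duplicate alleles."""
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        dst.writelines(filter_duplicate_alleles(src))
```

Note that `filter_vcf` writes plain gzip rather than BGZF, so the output would need to be recompressed with `bgzip` before any tool that expects block-compressed input can index or split it.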

heuermh commented Apr 19, 2017

Closing as WontFix: that particular file now returns a 404, and a newer version of ExAC exists.

@heuermh heuermh closed this Apr 19, 2017

@heuermh heuermh modified the milestone: 0.23.0 Jul 22, 2017

@heuermh heuermh added this to Completed in Release 0.23.0 Jan 4, 2018
