gVCF - can't load multi-allelic sites #1202

Closed
jpdna opened this issue Oct 8, 2016 · 3 comments

@jpdna jpdna commented Oct 8, 2016

I'm trying to load this gVCF file:
http://bioinformaticstools.mayo.edu/research/wp-content/plugins/download.php?url=https://s3-us-west-2.amazonaws.com/mayo-bic-tools/variant_miner/gvcfs/NA12878.chr22.g.vcf.gz

I believe I have solved the SB tag problem from #1199.

However, I now get the following error on the VCF row below:

chr22   18027817        .       CTTTT   C,CT,CTT,CTTT,<NON_REF> 1364.73 .       DP=78;MLEAC=0,0,1,1,0;MLEAF=0.00,0.00,0.500,0.500,0.00;MQ=70.00;MQ0=0   GT:AD:DP:GQ:PL:SB       3/4:0,4,6,28,17,0:55:99:1402,1244,2007,972,1382,1297,371,355,301,301,792,617,486,0,653,1038,1016,883,315,597,935:0,0,14,3

Note that the row above is clearly not a reference block; it is a called multi-allelic site within the gVCF file.
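
The distinction can be checked from the ALT column alone. Below is a minimal sketch in plain Scala (no ADAM or htsjdk calls), assuming the usual GATK gVCF convention that a reference block lists `<NON_REF>` as its only ALT allele and that every row passed in carries `<NON_REF>` in ALT. The names `SiteKind`, `classifyGvcfRow` and the three case objects are hypothetical and only for illustration:

```scala
// Hypothetical helper, not part of ADAM or htsjdk: classifies one gVCF data
// row by looking only at its ALT column (the 5th whitespace-separated field).
sealed trait SiteKind
case object ReferenceBlock extends SiteKind   // ALT is exactly <NON_REF>
case object SingleAltCall extends SiteKind    // one concrete ALT plus <NON_REF>
case object MultiAllelicCall extends SiteKind // two or more concrete ALTs

val NonRef = "<NON_REF>"

def classifyGvcfRow(row: String): SiteKind = {
  val fields = row.trim.split("\\s+")
  require(fields.length >= 5, "a VCF data row needs at least 5 columns")
  // Concrete alleles are everything in ALT except the symbolic <NON_REF>.
  val concreteAlts = fields(4).split(",").filterNot(_ == NonRef)
  concreteAlts.length match {
    case 0 => ReferenceBlock
    case 1 => SingleAltCall
    case _ => MultiAllelicCall
  }
}

// The first columns of the row from this issue (INFO and sample shortened).
val row = "chr22 18027817 . CTTTT C,CT,CTT,CTTT,<NON_REF> 1364.73 . DP=78 GT:AD 3/4:0,4,6,28,17,0"
val kind = classifyGvcfRow(row) // MultiAllelicCall
```

The row has four concrete ALT alleles (C, CT, CTT, CTTT) next to `<NON_REF>`, so it falls into the multi-allelic case rather than the reference-block case.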

 x2.rdd.count
2016-10-08 13:04:26 ERROR Executor:95 - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.IllegalArgumentException: Multi-allelic site with non-ref symbolic allele[VC Unknown @ chr22:18027817-18027821 Q1364.73 of type=MIXED alleles=[CTTTT*, <NON_REF>, C, CT, CTT, CTTT] attr={DP=78, MLEAC=[0, 0, 1, 1, 0], MLEAF=[0.00, 0.00, 0.500, 0.500, 0.00], MQ=70.00, MQ0=0} GT=GT:AD:DP:GQ:PL:SB 3/4:0,4,6,28,17,0:55:99:1402,1244,2007,972,1382,1297,371,355,301,301,792,617,486,0,653,1038,1016,883,315,597,935:0,0,14,3
    at org.bdgenomics.adam.converters.VariantContextConverter.convert(VariantContextConverter.scala:214)
    at org.bdgenomics.adam.rdd.ADAMContext$$anonfun$loadVcf$1.apply(ADAMContext.scala:823)
    at org.bdgenomics.adam.rdd.ADAMContext$$anonfun$loadVcf$1.apply(ADAMContext.scala:823)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1595)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2016-10-08 13:04:26 WARN  TaskSetManager:70 - Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.IllegalArgumentException: Multi-allelic site with non-ref symbolic allele[VC Unknown @ chr22:18027817-18027821 Q1364.73 of type=MIXED alleles=[CTTTT*, <NON_REF>, C, CT, CTT, CTTT] attr={DP=78, MLEAC=[0, 0, 1, 1, 0], MLEAF=[0.00, 0.00, 0.500, 0.500, 0.00], MQ=70.00, MQ0=0} GT=GT:AD:DP:GQ:PL:SB    3/4:0,4,6,28,17,0:55:99:1402,1244,2007,972,1382,1297,371,355,301,301,792,617,486,0,653,1038,1016,883,315,597,935:0,0,14,3
    at org.bdgenomics.adam.converters.VariantContextConverter.convert(VariantContextConverter.scala:214)
    at org.bdgenomics.adam.rdd.ADAMContext$$anonfun$loadVcf$1.apply(ADAMContext.scala:823)
    at org.bdgenomics.adam.rdd.ADAMContext$$anonfun$loadVcf$1.apply(ADAMContext.scala:823)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.util.Utils$.getIterat

The exception is thrown from:
https://github.com/bigdatagenomics/adam/blob/master/adam-core/src/main/scala/org/bdgenomics/adam/converters/VariantContextConverter.scala#L214


@fnothaft fnothaft commented Oct 8, 2016

I'm working on a patch for all the gVCF handling that should hopefully land mid next week. Can you make a subset of that VCF containing the header plus the lines implicated here and in #1199, and open a PR with tests that reproduce the two failures? I can then pull them into my patch to test it.
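
One possible way to cut such a subset is sketched below in plain Scala. It keeps every header line plus the data rows at chosen (CHROM, POS) pairs. It assumes the file is readable with `java.util.zip.GZIPInputStream` (bgzip output is a series of concatenated gzip members, so this may need checking on the real file) and that columns are tab-separated. `keepLine`, `subsetVcf` and the output file name are hypothetical:

```scala
import java.io.{BufferedReader, FileInputStream, InputStreamReader, PrintWriter}
import java.util.zip.GZIPInputStream
import scala.util.Try

// Hypothetical helper: true for header lines (starting with '#') and for
// data rows whose (CHROM, POS) pair is in `wanted`.
def keepLine(line: String, wanted: Set[(String, Long)]): Boolean =
  line.startsWith("#") || {
    val cols = line.split("\t", 3)
    cols.length >= 2 &&
      Try(cols(1).toLong).toOption.exists(pos => wanted.contains((cols(0), pos)))
  }

// Hypothetical helper: streams a gzipped VCF and writes the kept lines
// to an uncompressed output file.
def subsetVcf(inGz: String, out: String, wanted: Set[(String, Long)]): Unit = {
  val reader = new BufferedReader(new InputStreamReader(
    new GZIPInputStream(new FileInputStream(inGz)), "UTF-8"))
  val writer = new PrintWriter(out, "UTF-8")
  try {
    Iterator.continually(reader.readLine())
      .takeWhile(_ != null)
      .filter(line => keepLine(line, wanted))
      .foreach(line => writer.println(line))
  } finally {
    writer.close()
    reader.close()
  }
}
```

For the row reported here the call would look like `subsetVcf("NA12878.chr22.g.vcf.gz", "subset.g.vcf", Set(("chr22", 18027817L)))`; the position of the row behind #1199 would need to be added to the set.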


@jpdna jpdna commented Oct 11, 2016

Created PR #1205 to demonstrate the multi-allelic site issue for gVCF.


@jpdna jpdna commented Oct 16, 2016

Just pinging on this to make sure the problem is described sufficiently above and by #1205.
