VCF format tag SB field parse error in loading #1199

Closed
jpdna opened this Issue Oct 7, 2016 · 2 comments

jpdna commented Oct 7, 2016

I'm trying to load the file:
http://bioinformaticstools.mayo.edu/research/wp-content/plugins/download.php?url=https://s3-us-west-2.amazonaws.com/mayo-bic-tools/variant_miner/gvcfs/NA12878.chr22.g.vcf.gz

from the variantDB challenge with
val x2 = sc.loadVcf("NA12878.chr22.g.vcf")

I'm getting this error:

scala> x2.rdd.count
2016-10-07 15:49:24 ERROR Executor:95 - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NumberFormatException: For input string: "0,0,0,1"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.valueOf(Integer.java:766)
    at org.bdgenomics.adam.converters.VariantAnnotationConverter$.org$bdgenomics$adam$converters$VariantAnnotationConverter$$attrAsInt(VariantAnnotationConverter.scala:76)
    at org.bdgenomics.adam.converters.VariantAnnotationConverter$$anonfun$22.apply(VariantAnnotationConverter.scala:190)
    at org.bdgenomics.adam.converters.VariantAnnotationConverter$$anonfun$convert$2.apply(VariantAnnotationConverter.scala:307)
    at org.bdgenomics.adam.converters.VariantAnnotationConverter$$anonfun$convert$2.apply(VariantAnnotationConverter.scala:303)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.immutable.Map$Map4.foreach(Map.scala:181)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.bdgenomics.adam.converters.VariantAnnotationConverter$.convert(VariantAnnotationConverter.scala:303)
    at org.bdgenomics.adam.converters.VariantContextConverter$$anonfun$6.apply(VariantContextConverter.scala:406)
    at org.bdgenomics.adam.converters.VariantContextConverter$$anonfun$6.apply(VariantContextConverter.scala:386)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.bdgenomics.adam.converters.VariantContextConverter.extractGenotypes(VariantContextConverter.scala:385)
    at org.bdgenomics.adam.converters.VariantContextConverter.extractReferenceModelGenotypes(VariantContextConverter.scala:471)
    at org.bdgenomics.adam.converters.VariantContextConverter.convert(VariantContextConverter.scala:210)
    at org.bdgenomics.adam.rdd.ADAMContext$$anonfun$loadVcf$1.apply(ADAMContext.scala:823)
    at org.bdgenomics.adam.rdd.ADAMContext$$anonfun$loadVcf$1.apply(ADAMContext.scala:823)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1595)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2016-10-07 15:49:24 WARN  TaskSetManager:70 - Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.NumberFormatException: For input string: "0,0,0,1"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.valueOf(Integer.java:766)

The first offending data row is:

chr22   16157603        .       G       C,<NON_REF>     16.07   .       DP=1;MLEAC=1,0;MLEAF=0.500,0.00;MQ=59.00;MQ0=0  GT:AD:DP:GQ:PL:SB       1/1:0,1,0:1:3:41,3,0,41,3,41:0,0,0,1
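The string `"0,0,0,1"` in the trace is the last sample value, which lines up with the `SB` key in the FORMAT column `GT:AD:DP:GQ:PL:SB`. A minimal Scala sketch of that key/value pairing follows. It assumes a tab-delimited line with a single sample column (the row above appears space-separated only because of rendering). `FormatFields` and `sampleFields` are hypothetical names used for illustration; they are not ADAM or htsjdk API.

```scala
// Hypothetical helper: maps each FORMAT key to its value in the first sample.
object FormatFields {
  def sampleFields(vcfLine: String): Map[String, String] = {
    val cols = vcfLine.split("\t")
    // VCF columns: index 8 is FORMAT, index 9 is the first sample.
    val keys = cols(8).split(":")
    val values = cols(9).split(":")
    keys.zip(values).toMap
  }
}

// The data row from above, rebuilt with the tab delimiters VCF requires.
val row = Seq(
  "chr22", "16157603", ".", "G", "C,<NON_REF>", "16.07", ".",
  "DP=1;MLEAC=1,0;MLEAF=0.500,0.00;MQ=59.00;MQ0=0",
  "GT:AD:DP:GQ:PL:SB",
  "1/1:0,1,0:1:3:41,3,0,41,3,41:0,0,0,1"
).mkString("\t")

println(FormatFields.sampleFields(row)("SB")) // prints 0,0,0,1
```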

The SB field in the header is defined as:

##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
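Since the header declares `Number=4`, the SB value is a list of four integers, so calling `Integer.valueOf` on the whole string fails exactly as in the trace. The hedged Scala sketch below contrasts that call with a comma-splitting parse. `StrandBias`, `parseAsSingleInt` and `parseSb` are hypothetical names, and the arity check is an assumption about how strict a loader might choose to be.

```scala
import scala.util.Try

// Hypothetical helpers illustrating the failure and a list-aware parse.
object StrandBias {
  // The ##FORMAT header declares Number=4 for SB.
  val ExpectedCount = 4

  // What the trace shows: Integer.valueOf rejects "0,0,0,1" with a
  // NumberFormatException, which Try captures as a Failure.
  def parseAsSingleInt(raw: String): Try[Int] =
    Try(Integer.valueOf(raw).intValue)

  // Split on commas, parse each component, then check the declared arity.
  def parseSb(raw: String): Try[Seq[Int]] = Try {
    val parts = raw.split(",").toSeq.map(_.trim.toInt)
    require(parts.length == ExpectedCount,
      s"SB expects $ExpectedCount values, got ${parts.length}")
    parts
  }
}

println(StrandBias.parseAsSingleInt("0,0,0,1").isFailure)     // prints true
println(StrandBias.parseSb("0,0,0,1").get == Seq(0, 0, 0, 1)) // prints true
```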

I'm going to look into a fix; I just wanted to create an issue here for tracking and to get any comments.

jpdna changed the title from "VCF tag field parse error in loading" to "VCF format tag field parse error in loading" Oct 7, 2016

jpdna commented Oct 7, 2016

Indeed, as the error trace suggests, passing the comma-separated list of 4 integers to `attrAsInt` doesn't work:

https://github.com/bigdatagenomics/adam/blob/master/adam-core/src/main/scala/org/bdgenomics/adam/converters/VariantAnnotationConverter.scala#L190
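One way a fix could look is a list-valued counterpart to `attrAsInt`. This is a sketch only: `attrAsIntList` is a hypothetical helper, not ADAM's actual fix. It assumes the attribute may arrive either as the raw comma-separated String seen in the trace or already decoded into a `java.util.List`.

```scala
// Hypothetical list-valued counterpart to attrAsInt (not ADAM's actual code).
object AttrConversions {
  def attrAsIntList(attr: Any): Seq[Int] = attr match {
    // Raw, undecoded value such as "0,0,0,1": split on commas.
    case s: String => s.split(",").toSeq.map(_.trim.toInt)
    // Already-decoded Java list: convert each element recursively.
    case l: java.util.List[_] => l.toArray.toSeq.flatMap(x => attrAsIntList(x))
    // A single (boxed) integer becomes a one-element list.
    case i: Int => Seq(i)
    case other =>
      throw new IllegalArgumentException(s"Cannot convert $other to a list of Ints")
  }
}

println(AttrConversions.attrAsIntList("0,0,0,1") == Seq(0, 0, 0, 1)) // prints true
println(AttrConversions.attrAsIntList(
  java.util.Arrays.asList("0", "0", "0", "1")) == Seq(0, 0, 0, 1))  // prints true
```

Keeping a separate helper leaves `attrAsInt` strict for genuinely single-valued tags such as DP and GQ.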

jpdna changed the title from "VCF format tag field parse error in loading" to "VCF format tag SB field parse error in loading" Oct 9, 2016

jpdna commented Oct 9, 2016

Hopefully fixed by #1203.

heuermh modified the milestone: 0.20.0 Oct 13, 2016

heuermh referenced this issue Oct 14, 2016 from Release ADAM version 0.20.0 #1048 (closed)

fnothaft closed this in e3c061b Oct 14, 2016
