
adam2vcf fails with -coalesce #735

Closed
antonstamov opened this issue Jul 22, 2015 · 3 comments

Comments

@antonstamov
Contributor

It fails with any coalesce value (1, 4, 10, etc.).
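The invocation was presumably along these lines (the file paths here are placeholders, not the actual ones used):

$ ./bin/adam-submit vcf2adam input.vcf input.adam
$ ./bin/adam-submit adam2vcf -coalesce 4 input.adam output.vcf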

Stacktrace:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 41.0 failed 1 times, most recent failure: Lost task 3.0 in stage 41.0 (TID 333, localhost): com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
Serialization trace:
alleles (htsjdk.variant.variantcontext.FastGenotype)
genotypes (htsjdk.variant.variantcontext.VariantContext)
vc (org.seqdoop.hadoop_bam.VariantContextWritable)
at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:33)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:142)
at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:987)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:965)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at scala.collection.convert.Wrappers$MutableBufferWrapper.add(Wrappers.scala:80)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
... 33 more

@arahuja
Contributor

arahuja commented Jul 22, 2015

I actually just saw this as well, and I'm not sure it's related to coalesce. I get an NPE when using a Projection with GenotypeField.

Maybe a separate issue or some bug in the serializer?

val adamGenotypes = adamContext.loadParquetGenotypes("/some/path.adam", projection = Some(Projection(GenotypeField.alternateReadDepth)))

adamGenotypes.take(1)

gives

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, demeter-csmau10-20.demeter.hpc.mssm.edu): java.lang.NullPointerException: null of org.bdgenomics.formats.avro.Variant in field variant of org.bdgenomics.formats.avro.Genotype
        at org.apache.avro.generic.GenericDatumWriter.npe(GenericDatumWriter.java:93)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:87)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
        at org.bdgenomics.adam.serialization.AvroSerializer.write(ADAMKryoRegistrator.scala:49)
        at org.bdgenomics.adam.serialization.AvroSerializer.write(ADAMKryoRegistrator.scala:38)
        at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
        at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318)
        at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293)
        at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
        at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:250)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:236)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
        at org.apache.avro.generic.GenericData.getField(GenericData.java:580)
        at org.apache.avro.generic.GenericData.getField(GenericData.java:595)
        at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:112)
        at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
        at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
        at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
        ... 12 more
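
The trace suggests the projection drops the Genotype's required variant field, so re-serializing the projected record fails with variant set to null. A projection that also keeps the variant field might avoid the NPE; this is an untested sketch that assumes GenotypeField.variant is defined alongside the other fields in org.bdgenomics.adam.projections:

import org.bdgenomics.adam.projections.{ GenotypeField, Projection }

// Hypothetical workaround (untested): keep the required variant field in the
// projection so re-serialization does not see a null Genotype.variant.
val projected = adamContext.loadParquetGenotypes(
  "/some/path.adam",
  projection = Some(Projection(GenotypeField.variant, GenotypeField.alternateReadDepth)))

projected.take(1)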

@heuermh
Member

heuermh commented Jul 6, 2016

I am not able to reproduce this with the current git HEAD:

$ ./bin/adam-submit --version
Using ADAM_MAIN=org.bdgenomics.adam.cli.ADAMMain
Using SPARK_SUBMIT=/usr/local/bin/spark-submit

       e         888~-_          e             e    e
      d8b        888   \        d8b           d8b  d8b
     /Y88b       888    |      /Y88b         d888bdY88b
    /  Y88b      888    |     /  Y88b       / Y88Y Y888b
   /____Y88b     888   /     /____Y88b     /   YY   Y888b
  /      Y88b    888_-~     /      Y88b   /          Y888b

ADAM version: 0.19.1-SNAPSHOT
Commit: d39b374f0d55a1a931bfd66218588ce3d675e902 Build: 2016-07-06
Built for: Scala 2.10 and Hadoop 2.6.0

$ ./bin/adam-submit vcf2adam adam-core/src/test/resources/small.vcf small.adam
Using ADAM_MAIN=org.bdgenomics.adam.cli.ADAMMain
Using SPARK_SUBMIT=/usr/local/bin/spark-submit

$ ./bin/adam-submit adam2vcf -coalesce 1 small.adam small.vcf
Using ADAM_MAIN=org.bdgenomics.adam.cli.ADAMMain
Using SPARK_SUBMIT=/usr/local/bin/spark-submit
Jul 6, 2016 2:40:28 PM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 1
Jul 6, 2016 2:40:29 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Jul 6, 2016 2:40:29 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 15 records.
Jul 6, 2016 2:40:29 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Jul 6, 2016 2:40:29 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: block read in memory in 35 ms. row count = 15

@fnothaft
Member

fnothaft commented Jul 6, 2016

Closing, since we are not able to reproduce this and the issue is old.

@fnothaft fnothaft closed this as completed Jul 6, 2016