adam2vcf fails with -coalesce #735

Closed
antonstamov opened this Issue Jul 22, 2015 · 3 comments

@antonstamov (Contributor) commented Jul 22, 2015

It fails with any coalesce value (1, 4, 10, etc.).
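
For example, a run along the following lines (the input and output paths here are just placeholders) fails the same way regardless of the coalesce value:

$ ./bin/adam-submit adam2vcf -coalesce 4 input.adam output.vcf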

Stacktrace:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 41.0 failed 1 times, most recent failure: Lost task 3.0 in stage 41.0 (TID 333, localhost): com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
Serialization trace:
alleles (htsjdk.variant.variantcontext.FastGenotype)
genotypes (htsjdk.variant.variantcontext.VariantContext)
vc (org.seqdoop.hadoop_bam.VariantContextWritable)
at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:33)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:142)
at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:987)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:965)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at scala.collection.convert.Wrappers$MutableBufferWrapper.add(Wrappers.scala:80)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
... 33 more

@arahuja (Contributor) commented Jul 22, 2015

I actually just saw this as well, and I'm not sure it's related to coalesce. I get an NPE when using a Projection with a GenotypeField.

Maybe a separate issue or some bug in the serializer?

val adamGenotypes = adamContext.loadParquetGenotypes("/some/path.adam", projection = Some(Projection(GenotypeField.alternateReadDepth)))

adamGenotypes.take(1)

gives

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, demeter-csmau10-20.demeter.hpc.mssm.edu): java.lang.NullPointerException: null of org.bdgenomics.formats.avro.Variant in field variant of org.bdgenomics.formats.avro.Genotype
        at org.apache.avro.generic.GenericDatumWriter.npe(GenericDatumWriter.java:93)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:87)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
        at org.bdgenomics.adam.serialization.AvroSerializer.write(ADAMKryoRegistrator.scala:49)
        at org.bdgenomics.adam.serialization.AvroSerializer.write(ADAMKryoRegistrator.scala:38)
        at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
        at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318)
        at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293)
        at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
        at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:250)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:236)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
        at org.apache.avro.generic.GenericData.getField(GenericData.java:580)
        at org.apache.avro.generic.GenericData.getField(GenericData.java:595)
        at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:112)
        at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
        at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
        at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
        ... 12 more
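
The NPE is thrown while serializing a Genotype whose nested variant field is null, which suggests the projection strips a field the Avro/Kryo serializer treats as required. A rough, untested sketch of a possible workaround would be to project the variant field as well; this assumes GenotypeField exposes a variant value mirroring the Avro field named in the trace:

// Untested sketch, assuming GenotypeField.variant exists and the usual
// org.bdgenomics.adam.projections.{GenotypeField, Projection} imports are in scope.
val projectedGenotypes = adamContext.loadParquetGenotypes(
  "/some/path.adam",
  projection = Some(Projection(GenotypeField.variant, GenotypeField.alternateReadDepth)))

projectedGenotypes.take(1)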
@heuermh (Member) commented Jul 6, 2016

I am not able to reproduce this with the current git HEAD:

$ ./bin/adam-submit --version
Using ADAM_MAIN=org.bdgenomics.adam.cli.ADAMMain
Using SPARK_SUBMIT=/usr/local/bin/spark-submit

       e         888~-_          e             e    e
      d8b        888   \        d8b           d8b  d8b
     /Y88b       888    |      /Y88b         d888bdY88b
    /  Y88b      888    |     /  Y88b       / Y88Y Y888b
   /____Y88b     888   /     /____Y88b     /   YY   Y888b
  /      Y88b    888_-~     /      Y88b   /          Y888b

ADAM version: 0.19.1-SNAPSHOT
Commit: d39b374f0d55a1a931bfd66218588ce3d675e902 Build: 2016-07-06
Built for: Scala 2.10 and Hadoop 2.6.0

$ ./bin/adam-submit vcf2adam adam-core/src/test/resources/small.vcf small.adam
Using ADAM_MAIN=org.bdgenomics.adam.cli.ADAMMain
Using SPARK_SUBMIT=/usr/local/bin/spark-submit

$ ./bin/adam-submit adam2vcf -coalesce 1 small.adam small.vcf
Using ADAM_MAIN=org.bdgenomics.adam.cli.ADAMMain
Using SPARK_SUBMIT=/usr/local/bin/spark-submit
Jul 6, 2016 2:40:28 PM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 1
Jul 6, 2016 2:40:29 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Jul 6, 2016 2:40:29 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 15 records.
Jul 6, 2016 2:40:29 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Jul 6, 2016 2:40:29 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: block read in memory in 35 ms. row count = 15
@fnothaft (Member) commented Jul 6, 2016

Closing, as this is not reproducible and the issue is old.


@fnothaft fnothaft closed this Jul 6, 2016
