adam2vcf Fails with Sample not serializable #1100

Closed
jpdna opened this Issue Aug 6, 2016 · 0 comments

jpdna commented Aug 6, 2016

Using ADAM as of commit e7e1adf, I was attempting a round trip from VCF to ADAM Parquet and back to VCF.

First, VCF to ADAM:
adam-submit vcf2adam HG00096.vcf HG00096.var.adam
(This worked fine.)

Then back to VCF:
adam-submit adam2vcf HG00096.var.adam outFromAdamHG00096.vcf

The adam2vcf command produced the following error:

adam-submit adam2vcf HG00096.var.adam outFromAdamHG00096.vcf
Using ADAM_MAIN=org.bdgenomics.adam.cli.ADAMMain
Using SPARK_SUBMIT=/jpr1/work/Hbase_July22/spark1.6.1/spark-1.6.1-bin-hadoop2.6/bin/spark-submit
Command body threw exception:
org.apache.spark.SparkException: Task not serializable
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:742)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:741)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.RDD.mapPartitionsWithIndex(RDD.scala:741)
    at org.bdgenomics.adam.rdd.variation.VariantContextRDD.saveAsVcf(VariantContextRDD.scala:117)
    at org.bdgenomics.adam.cli.ADAM2Vcf.run(ADAM2Vcf.scala:83)
    at org.bdgenomics.utils.cli.BDGSparkCommand$class.run(BDGCommand.scala:55)
    at org.bdgenomics.adam.cli.ADAM2Vcf.run(ADAM2Vcf.scala:59)
    at org.bdgenomics.adam.cli.ADAMMain.apply(ADAMMain.scala:131)
    at org.bdgenomics.adam.cli.ADAMMain$.main(ADAMMain.scala:71)
    at org.bdgenomics.adam.cli.ADAMMain.main(ADAMMain.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.NotSerializableException: org.bdgenomics.formats.avro.Sample
Serialization stack:
    - object not serializable (class: org.bdgenomics.formats.avro.Sample, value: {"sampleId": "HG00096", "name": null, "attributes": {}})
    - writeObject data (class: scala.collection.immutable.$colon$colon)
    - object (class scala.collection.immutable.$colon$colon, List({"sampleId": "HG00096", "name": null, "attributes": {}}))
    - field (class: org.bdgenomics.adam.rdd.variation.VariantContextRDD, name: samples, type: interface scala.collection.Seq)
    - object (class org.bdgenomics.adam.rdd.variation.VariantContextRDD, VariantContextRDD(MapPartitionsRDD[4] at map at GenotypeRDD.scala:62,SequenceDictionary{
1->249250621, 0
2->243199373, 1
3->198022430, 2
4->191154276, 3
5->180915260, 4
6->171115067, 5
7->159138663, 6
8->146364022, 7
9->141213431, 8
10->135534747, 9
11->135006516, 10
12->133851895, 11
13->115169878, 12
14->107349540, 13
15->102531392, 14
16->90354753, 15
17->81195210, 16
18->78077248, 17
19->59128983, 18
20->63025520, 19
21->48129895, 20
22->51304566, 21
GL000191.1->106433, 22
GL000192.1->547496, 23
GL000193.1->189789, 24
GL000194.1->191469, 25
GL000195.1->182896, 26
GL000196.1->38914, 27
GL000197.1->37175, 28
GL000198.1->90085, 29
GL000199.1->169874, 30
GL000200.1->187035, 31
GL000201.1->36148, 32
GL000202.1->40103, 33
GL000203.1->37498, 34
GL000204.1->81310, 35
GL000205.1->174588, 36
GL000206.1->41001, 37
GL000207.1->4262, 38
GL000208.1->92689, 39
GL000209.1->159169, 40
GL000210.1->27682, 41
GL000211.1->166566, 42
GL000212.1->186858, 43
GL000213.1->164239, 44
GL000214.1->137718, 45
GL000215.1->172545, 46
GL000216.1->172294, 47
GL000217.1->172149, 48
GL000218.1->161147, 49
GL000219.1->179198, 50
GL000220.1->161802, 51
GL000221.1->155397, 52
GL000222.1->186861, 53
GL000223.1->180455, 54
GL000224.1->179693, 55
GL000225.1->211173, 56
GL000226.1->15008, 57
GL000227.1->128374, 58
GL000228.1->129120, 59
GL000229.1->19913, 60
GL000230.1->43691, 61
GL000231.1->27386, 62
GL000232.1->40652, 63
GL000233.1->45941, 64
GL000234.1->40531, 65
GL000235.1->34474, 66
GL000236.1->41934, 67
GL000237.1->45867, 68
GL000238.1->39939, 69
GL000239.1->33824, 70
GL000240.1->41933, 71
GL000241.1->42152, 72
GL000242.1->43523, 73
GL000243.1->43341, 74
GL000244.1->39929, 75
GL000245.1->36651, 76
GL000246.1->38154, 77
GL000247.1->36422, 78
GL000248.1->39786, 79
GL000249.1->38502, 80
MT->16569, 81
NC_007605->171823, 82
X->155270560, 83
Y->59373566, 84
hs37d5->35477943, 85},List({"sampleId": "HG00096", "name": null, "attributes": {}})))
    - field (class: org.bdgenomics.adam.rdd.variation.VariantContextRDD$$anonfun$4, name: $outer, type: class org.bdgenomics.adam.rdd.variation.VariantContextRDD)
    - object (class org.bdgenomics.adam.rdd.variation.VariantContextRDD$$anonfun$4, <function2>)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
    ... 25 more
Aug 6, 2016 12:02:13 PM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 6
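
For readers unfamiliar with this failure mode: the serialization stack above shows the closure (`VariantContextRDD$$anonfun$4`) holding an `$outer` reference to the enclosing `VariantContextRDD`, whose `samples` field contains a `Sample` that Java serialization cannot write. The same chain can be reproduced in miniature with only the JDK. This is a rough sketch, not ADAM code: `Sample`, `Holder`, and `SerFn` below are hypothetical stand-ins, and the second method only illustrates that a closure which does not reach the outer object serializes fine. It is not the fix adopted for this issue.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

public class ClosureCaptureDemo {
    // Hypothetical stand-in for the Avro Sample record: NOT Serializable.
    static class Sample {
        final String sampleId;
        Sample(String sampleId) { this.sampleId = sampleId; }
    }

    // A function type that Java serialization is allowed to write.
    interface SerFn extends Function<Integer, String>, Serializable {}

    // Hypothetical stand-in for the enclosing RDD wrapper class.
    static class Holder implements Serializable {
        final List<Sample> samples;
        Holder(List<Sample> samples) { this.samples = samples; }

        // Reads a field through `this`, so the lambda captures the whole Holder.
        SerFn capturesOuter() {
            return i -> samples.get(i).sampleId;
        }

        // Copies a plain String into a local first, so only the String is captured.
        SerFn capturesLocalOnly() {
            final String id = samples.get(0).sampleId;
            return i -> id;
        }
    }

    // Attempts Java serialization and reports the outcome as a string.
    static String trySerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return "ok";
        } catch (NotSerializableException e) {
            return "NotSerializableException: " + e.getMessage();
        } catch (IOException e) {
            return "IOException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        Holder h = new Holder(Collections.singletonList(new Sample("HG00096")));
        // Fails: closure -> Holder -> samples list -> Sample (not serializable).
        System.out.println(trySerialize(h.capturesOuter()));
        // Succeeds: the closure holds only a String.
        System.out.println(trySerialize(h.capturesLocalOnly()));
    }
}
```

The first call should report a `NotSerializableException` naming the `Sample` stand-in, mirroring the `Caused by` line in the trace; the second should serialize cleanly.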

@jpdna jpdna added the bug label Aug 6, 2016

@fnothaft fnothaft self-assigned this Aug 6, 2016

fnothaft added a commit to fnothaft/adam that referenced this issue Aug 6, 2016

[ADAM-1100] Resolve Sample Not Serializable exception
Resolves #1100. Registered `Sample` class with the `AvroSerializer` in
`ADAMKryoRegistrator`.

fnothaft added a commit to fnothaft/adam that referenced this issue Aug 7, 2016

[ADAM-1100] Resolve Sample Not Serializable exception
Resolves #1100. Registered `Sample` class with the `AvroSerializer` in
`ADAMKryoRegistrator`.

@jpdna jpdna closed this in #1101 Aug 10, 2016
