Adam-0.18.2 can not load Adam-0.14.0 adamSave function data (sam) #1050

Closed
xubo245 opened this Issue Jun 9, 2016 · 4 comments

xubo245 commented Jun 9, 2016

There is no adamLoad function in ADAM 0.18.2, so I tried:
val rdd = sc.loadParquetAlignments(samFile)
or
val rdd = sc.loadBam(samFile)
Both fail with an error.

samFile is data in ADAM format that was saved with the adamSave function in ADAM 0.14.0.

How can I fix this?

xubo245 commented Jun 9, 2016

Part of the error:

2016-06-09 20:39:10 ERROR Executor:96 - Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block 0 in file hdfs://master:9000/xubo/alignment/output/g38L100c50Nhs20upload2.adam/0/part-r-00000.gz.parquet
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:228)
at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:201)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:163)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1553)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1125)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1125)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassCastException: org.bdgenomics.formats.avro.Contig cannot be cast to java.lang.Integer
at org.bdgenomics.formats.avro.AlignmentRecord.put(AlignmentRecord.java:257)
at org.apache.parquet.avro.AvroIndexedRecordConverter.set(AvroIndexedRecordConverter.java:157)
at org.apache.parquet.avro.AvroIndexedRecordConverter.access$000(AvroIndexedRecordConverter.java:42)
at org.apache.parquet.avro.AvroIndexedRecordConverter$1.add(AvroIndexedRecordConverter.java:92)
at org.apache.parquet.avro.AvroIndexedRecordConverter.end(AvroIndexedRecordConverter.java:177)
at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:413)
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:209)
... 15 more
2016-06-09 20:39:10 WARN TaskSetManager:71 - Lost task 0.0 in stage 0.0 (TID 0, localhost): org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block 0 in file hdfs://master:9000/xubo/alignment/output/g38L100c50Nhs20upload2.adam/0/part-r-00000.gz.parquet
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:228)
at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:201)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:163)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1553)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1125)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1125)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)

Member

heuermh commented Jun 13, 2016

Until we hit version 1.0 and start semantic versioning, our Avro formats and ADAM APIs are not guaranteed to be compatible between versions. You've run into both situations in this example.

If you want to go from BAM via ADAM to Avro and then load the Avro via ADAM for further processing, both ADAM steps need to use the same version. In this case, I would recommend the latest release, version 0.19.0.
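
For readers wondering why a format change surfaces as a ClassCastException inside AlignmentRecord.put: the stack trace shows values being assigned to the record by field index, so one plausible reading is that the field order or field types changed between the two schema versions. The following is only a minimal Scala sketch of that failure mode, not ADAM or parquet-avro code. The names Contig, NewRecord, mapq, oldFileOrder, newFileOrder and load are hypothetical stand-ins invented for illustration, and the field layout is an assumption.

```scala
import scala.util.Try

// Hypothetical stand-in for illustration only; this is NOT the real
// org.bdgenomics.formats.avro.Contig class.
final case class Contig(name: String)

// A record "compiled" against a newer schema (assumed layout): position 0
// must hold an Integer and position 1 a Contig. Like an Avro specific
// record, it assigns values purely by field index.
final class NewRecord {
  var mapq: Integer = null
  var contig: Contig = null

  def put(index: Int, value: Any): Unit = index match {
    case 0 => mapq = value.asInstanceOf[Integer]   // runtime-checked cast
    case 1 => contig = value.asInstanceOf[Contig]  // runtime-checked cast
    case _ => throw new IndexOutOfBoundsException(s"bad field index: $index")
  }
}

object SchemaMismatchDemo {
  // Values in the order a (hypothetical) older schema declared them: contig first.
  val oldFileOrder: Seq[Any] = Seq(Contig("chr1"), Integer.valueOf(60))
  // Values in the order the newer record class expects.
  val newFileOrder: Seq[Any] = Seq(Integer.valueOf(60), Contig("chr1"))

  // Feed values to the record strictly by position, as an index-based
  // converter would, and capture any failure instead of crashing.
  def load(values: Seq[Any]): Try[NewRecord] = Try {
    val record = new NewRecord
    values.zipWithIndex.foreach { case (v, i) => record.put(i, v) }
    record
  }

  def main(args: Array[String]): Unit = {
    // Matching layout: every value lands in a field of the right type.
    println(load(newFileOrder).isSuccess)
    // Mismatched layout: a Contig arrives at the Integer position and the cast fails.
    println(load(oldFileOrder).failed.map(_.getClass.getName).getOrElse("no error"))
  }
}
```

In this sketch the data itself is intact; only the reader's positional expectations are wrong, which is consistent with the advice above to write and read with the same ADAM version.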

xubo245 commented Jun 14, 2016

@heuermh Thanks. I am using two different pieces of software that are both built on ADAM, and I want to combine them into a pipeline, but they depend on different ADAM versions...
However, I was able to read the data with Spark SQL, so maybe I should try a different method.

Member

fnothaft commented Jul 6, 2016

Closing as won't fix until 1.0.0.

@fnothaft fnothaft closed this Jul 6, 2016
