java.io.NotSerializableException: org.bdgenomics.formats.avro.AlignmentRecord #1240

Closed
Fei-Guang opened this issue Nov 4, 2016 · 4 comments

@Fei-Guang Fei-Guang commented Nov 4, 2016

I ran the following code in IntelliJ IDEA:

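import org.apache.spark.sql.SparkSession
import org.bdgenomics.adam.rdd.ADAMContext._ // brings sc.loadAlignments into scope

val spark = SparkSession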
  .builder
  .master("local[*]")
  .appName("Anno BDG")
  .getOrCreate()

//set new runtime options
spark.conf.set("spark.sql.shuffle.partitions", 6)
spark.conf.set("spark.executor.memory", "2g")
spark.conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
spark.conf.set("spark.kryo.registrator", "org.bdgenomics.adam.serialization.ADAMKryoRegistrator")
val sc = spark.sparkContext
val reads = sc.loadAlignments("/data/sample.rmdup.bam")
val lines = sc.textFile("/data/win_100k.use_50mer")
// conceptually, we could just do reads.rdd.zip(lines), but!
// we aren't guaranteed that both RDDs have the same number
// of records in each partition, so zipWithIndex followed by join
// is (slower, but) safer
val zippedLinesAndReads = reads.rdd
  .zipWithIndex
  .map(_.swap)
  .join(lines.zipWithIndex.map(_.swap))

val countsByChromosome = zippedLinesAndReads.flatMap(kv => {
  val (_, (read, line)) = kv

  // get the range from the rdd2.kmer file
  val columns = line.split("\t") // assuming the file is tab-delimited
  val start = columns(4).toLong
  val end = columns(5).toLong

  // is the alignment start position between the start and end pos from the line?
  // if yes, emit the chromosome name and 1
  if (start <= read.getStart && read.getStart < end) {
    Some((read.getContigName, 1))
  } else {
    None
  }
}).reduceByKeyLocally(_ + _)

My environment:

spark-2.0.1-bin-hadoop2.6
adam-distribution-spark2_2.11-0.20.0
scala-2.11.8

It reports the following error:

java.io.NotSerializableException: org.bdgenomics.formats.avro.AlignmentRecord
Serialization stack:

  • object not serializable (class: org.bdgenomics.formats.avro.AlignmentRecord, value: {"readInFragment": 0, "contigName": "chr10", "start": 61758687, "oldPosition": null, "end": 61758727, "mapq": 25, "readName": "NB501244AR:119:HJY3WBGXY:2:11112:6137:19359", "sequence": "AAAATACTGAGACTTATCAGAATTTCAGGCTAAAGCAACC", "qual": "AAAAAAEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEE", "cigar": "40M", "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": false, "properPair": false, "readMapped": true, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": false, "mateNegativeStrand": false, "primaryAlignment": true, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": "40", "origQual": null, "attributes": "XT:A:U\tXO:i:0\tXM:i:0\tNM:i:0\tXG:i:0\tX1:i:0\tX0:i:1", "recordGroupName": null, "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContigName": null, "inferredInsertSize": null})
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
    at org.apache.spark.serializer.SerializationStream.writeValue(Serializer.scala:135)
    at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:185)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:150)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
    2016-11-04 10:30:56 ERROR TaskSetManager:70 - Task 0.0 in stage 2.0 (TID 9) had a not serializable result: org.bdgenomics.formats.avro.AlignmentRecord
    Serialization stack:
  • object not serializable (class: org.bdgenomics.formats.avro.AlignmentRecord, value: {"readInFragment": 0, "contigName": "chr1", "start": 10001, "oldPosition": null, "end": 10041, "mapq": 0, "readName": "NB501244AR:119:HJY3WBGXY:3:11508:7857:8792", "sequence": "AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC", "qual": "///E////6E////EEAEEE/EEEEEEEEEEEEAEAAA/A", "cigar": "40M", "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": false, "properPair": false, "readMapped": true, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": true, "mateNegativeStrand": false, "primaryAlignment": true, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": "40", "origQual": null, "attributes": "XT:A:R\tXO:i:0\tXM:i:0\tNM:i:0\tXG:i:0\tX0:i:594", "recordGroupName": null, "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContigName": null, "inferredInsertSize": null}); not retrying
    2016-11-04 10:30:56 ERROR TaskSetManager:70 - Task 4.0 in stage 2.0 (TID 13) had a not serializable result: org.bdgenomics.formats.avro.AlignmentRecord
    Serialization stack:
  • object not serializable (class: org.bdgenomics.formats.avro.AlignmentRecord, value: {"readInFragment": 0, "contigName": "chr10", "start": 61758687, "oldPosition": null, "end": 61758727, "mapq": 25, "readName": "NB501244AR:119:HJY3WBGXY:2:11112:6137:19359", "sequence": "AAAATACTGAGACTTATCAGAATTTCAGGCTAAAGCAACC", "qual": "AAAAAAEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEE", "cigar": "40M", "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": false, "properPair": false, "readMapped": true, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": false, "mateNegativeStrand": false, "primaryAlignment": true, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": "40", "origQual": null, "attributes": "XT:A:U\tXO:i:0\tXM:i:0\tNM:i:0\tXG:i:0\tX1:i:0\tX0:i:1", "recordGroupName": null, "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContigName": null, "inferredInsertSize": null}); not retrying
    2016-11-04 10:30:56 ERROR TaskSetManager:70 - Task 3.0 in stage 2.0 (TID 12) had a not serializable result: org.bdgenomics.formats.avro.AlignmentRecord
    Serialization stack:
  • object not serializable (class: org.bdgenomics.formats.avro.AlignmentRecord, value: {"readInFragment": 0, "contigName": "chr7", "start": 68163823, "oldPosition": null, "end": 68163863, "mapq": 0, "readName": "NB501244AR:119:HJY3WBGXY:4:21602:16293:18064", "sequence": "TGTGAGGGTGTTGCCCAAAAGAGATTAACATTTGAGTCAG", "qual": "AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE", "cigar": "40M", "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": false, "properPair": false, "readMapped": true, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": false, "mateNegativeStrand": false, "primaryAlignment": true, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": "40", "origQual": null, "attributes": "XT:A:R\tXO:i:0\tXM:i:0\tNM:i:0\tXG:i:0\tXA:Z:chr3,-84617448,40M,0;\tX1:i:0\tX0:i:2", "recordGroupName": null, "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContigName": null, "inferredInsertSize": null}); not retrying
    2016-11-04 10:30:56 ERROR TaskSetManager:70 - Task 2.0 in stage 2.0 (TID 11) had a not serializable result: org.bdgenomics.formats.avro.AlignmentRecord
    Serialization stack:
  • object not serializable (class: org.bdgenomics.formats.avro.AlignmentRecord, value: {"readInFragment": 0, "contigName": "chr4", "start": 181076278, "oldPosition": null, "end": 181076318, "mapq": 25, "readName": "NB501244AR:119:HJY3WBGXY:2:23302:26459:8305", "sequence": "CACTGTGTTTTACTTCTATTTTAAAAAACCTGAAGGCTAT", "qual": "EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA", "cigar": "40M", "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": false, "properPair": false, "readMapped": true, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": true, "mateNegativeStrand": false, "primaryAlignment": true, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": "40", "origQual": null, "attributes": "XT:A:U\tXO:i:0\tXM:i:0\tNM:i:0\tXG:i:0\tX1:i:0\tX0:i:1", "recordGroupName": null, "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContigName": null, "inferredInsertSize": null}); not retrying

Process finished with exit code 1
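
The `JavaSerializationStream` and `JavaSerializer.scala` frames in the stack trace above suggest that the shuffle was written with Spark's default Java serializer rather than Kryo, even though the code calls `spark.conf.set("spark.serializer", ...)`. One way to check which serializer a running application actually uses is to ask `SparkEnv`. The sketch below is only a diagnostic, not part of the original report; the object name `SerializerCheck` and the app name are made up for illustration.

```scala
import org.apache.spark.SparkEnv
import org.apache.spark.sql.SparkSession

// Hypothetical diagnostic object; not part of the original report.
object SerializerCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .master("local[*]")
      .appName("Serializer check")
      .getOrCreate()

    // The serializer instance that the running SparkContext was created with.
    println(SparkEnv.get.serializer.getClass.getName)

    // The value stored in the SparkContext's own configuration, if any.
    println(spark.sparkContext.getConf.get("spark.serializer", "<not set>"))

    spark.stop()
  }
}
```

If the first line printed names `JavaSerializer`, the Kryo settings did not reach the `SparkContext`.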

@Fei-Guang Fei-Guang commented Nov 4, 2016

> Hi @Fei-Guang! Where are you running that? Are you running that in spark-shell? ADAM relies on a custom Kryo serializer Registrator for serialization. If you use ./bin/adam-shell, this starts a Spark shell where the serialization config (and classpath) are set up.

Hello @fnothaft, I am running it in IntelliJ IDEA. How do I register a Kryo serializer there?

@Fei-Guang Fei-Guang commented Nov 4, 2016

"$SPARK_SHELL" \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryo.registrator=org.bdgenomics.adam.serialization.ADAMKryoRegistrator \

spark.conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
spark.conf.set("spark.kryo.registrator", "org.bdgenomics.adam.serialization.ADAMKryoRegistrator")
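
The two snippets above differ in when the settings are applied. The `--conf` flags are passed before the `SparkContext` exists, while `spark.conf.set` runs after `getOrCreate()` has already created it. The serializer is chosen when the context is created, so settings applied afterwards do not change it. A minimal sketch of the IDE equivalent of the `--conf` flags follows, assuming the ADAM jar is on the project classpath and no other `SparkContext` is already running in the JVM. The object name `AnnoBdg` is hypothetical, and the thread does not confirm whether this resolves the reported error.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical entry point; only the builder calls matter here.
object AnnoBdg {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .master("local[*]")
      .appName("Anno BDG")
      // Serializer settings go on the builder so they are part of the
      // configuration the SparkContext is created with.
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .config("spark.kryo.registrator", "org.bdgenomics.adam.serialization.ADAMKryoRegistrator")
      .getOrCreate()

    // Settings that can change at runtime may still be set afterwards.
    spark.conf.set("spark.sql.shuffle.partitions", 6)

    // Confirm the serializer setting reached the SparkContext configuration.
    println(spark.sparkContext.getConf.get("spark.serializer"))

    spark.stop()
  }
}
```

If `getOrCreate()` finds an existing context, it reuses it and these builder settings may not take effect, so the check at the end is worth keeping while debugging.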
@Fei-Guang Fei-Guang closed this Nov 4, 2016
@Fei-Guang Fei-Guang reopened this Nov 4, 2016

@Fei-Guang Fei-Guang commented Nov 4, 2016

It's a Spark bug.

@Fei-Guang Fei-Guang closed this Nov 4, 2016

@avkonst avkonst commented Mar 11, 2017

"It's a Spark bug": is there a link to it? In which version of Spark is it solved?
