
[ADAM-1334] Clean up serialization issues in Broadcast region join. #1336

Conversation

fnothaft
Member

@fnothaft fnothaft commented Jan 3, 2017

Resolves #1334. Eliminates type erasure problems by having a different concrete implementation for each broadcast type. Depends on bigdatagenomics/utils#97.
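The shape of the fix can be sketched as follows. This is a hypothetical, simplified sketch, not ADAM's actual code: `Region`, `IntervalArrayLike`, `StringArray`, and `LongArray` are invented stand-ins for the real types (such as `NucleotideContigFragmentArray`). The idea is a generic interface plus one concrete case class per broadcast payload, so each payload gets its own runtime class that a class-keyed serializer registry can target unambiguously.

```scala
// Hypothetical sketch: one concrete class per broadcast payload type.
case class Region(start: Long, end: Long)

abstract class IntervalArrayLike[T] extends Serializable {
  def elems: Array[(Region, T)]
  def length: Int = elems.length
}

// Concrete implementations, one per payload; each has its own distinct
// runtime class that can be registered with a serializer individually.
case class StringArray(elems: Array[(Region, String)])
  extends IntervalArrayLike[String]
case class LongArray(elems: Array[(Region, Long)])
  extends IntervalArrayLike[Long]

val sa = StringArray(Array((Region(0L, 10L), "chr1")))
val la = LongArray(Array((Region(0L, 10L), 42L)))
// Unlike two instantiations of a single generic class, these two concrete
// classes really are different runtime classes:
assert(sa.getClass != la.getClass)
```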

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1715/

Build result: FAILURE

[...truncated 3 lines...]
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
Wiping out workspace first.
Cloning the remote Git repository
Cloning repository https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git init /home/jenkins/workspace/ADAM-prb # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git --version # timeout=10
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/heads/:refs/remotes/origin/ # timeout=15
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
 > /home/jenkins/git2/bin/git config --add remote.origin.fetch +refs/heads/:refs/remotes/origin/ # timeout=10
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
 > /home/jenkins/git2/bin/git rev-parse origin/pr/1336/merge^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a --contains 9b8e3ed # timeout=10
 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1336/merge^{commit} # timeout=10
Checking out Revision 9b8e3ed (origin/pr/1336/merge)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f 9b8e3edf595ec4270648b72a986c9726ba1085af
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb ? 2.6.0,2.10,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@fnothaft
Member Author

fnothaft commented Jan 4, 2017

Jenkins, retest this please.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1718/

Build result: FAILURE

[...truncated 3 lines...]
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
Wiping out workspace first.
Cloning the remote Git repository
Cloning repository https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git init /home/jenkins/workspace/ADAM-prb # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git --version # timeout=10
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/heads/:refs/remotes/origin/ # timeout=15
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
 > /home/jenkins/git2/bin/git config --add remote.origin.fetch +refs/heads/:refs/remotes/origin/ # timeout=10
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
 > /home/jenkins/git2/bin/git rev-parse origin/pr/1336/merge^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a --contains 9b8e3ed # timeout=10
 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1336/merge^{commit} # timeout=10
Checking out Revision 9b8e3ed (origin/pr/1336/merge)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f 9b8e3edf595ec4270648b72a986c9726ba1085af
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb ? 2.6.0,2.10,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@heuermh
Member

heuermh commented Jan 4, 2017

I assume you would like this in 0.21.0? Feel free to set the milestone.

@fnothaft fnothaft added this to the 0.21.0 milestone Jan 4, 2017
@fnothaft
Member Author

fnothaft commented Jan 4, 2017

Ah, yes! We need it for the Variant DB challenge. I've just set the milestone to 0.21.0. I'm going to fix the build failure (it's a small issue where the move_to_xyz scripts need to be updated), and once the build passes, I'll cut a utils release so that we can remove the snapshot dependency.

@heuermh
Member

heuermh commented Jan 4, 2017

Sounds good, thanks!

@fnothaft
Member Author

fnothaft commented Jan 5, 2017

Just pushed a fix for the move_to_xyz scripts. This will not be ready to merge, though, until the utils release is cut (which I will do after this passes).

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1720/

Build result: FAILURE

[...truncated 3 lines...]
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
Wiping out workspace first.
Cloning the remote Git repository
Cloning repository https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git init /home/jenkins/workspace/ADAM-prb # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git --version # timeout=10
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/heads/:refs/remotes/origin/ # timeout=15
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
 > /home/jenkins/git2/bin/git config --add remote.origin.fetch +refs/heads/:refs/remotes/origin/ # timeout=10
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
 > /home/jenkins/git2/bin/git rev-parse origin/pr/1336/merge^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a --contains fa154df # timeout=10
 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1336/merge^{commit} # timeout=10
Checking out Revision fa154df (origin/pr/1336/merge)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f fa154dfaa75c1ab94a9583693abb1703a9803f21
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb ? 2.6.0,2.10,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

import scala.collection.JavaConversions._
import scala.math.max
import scala.reflect.ClassTag

private[adam] case class NucleotideContigFragmentArray(
Member


I think I know the answer already, but why do we need concrete classes for each of these?

Member Author


Type erasure at serialization time. Since we previously only had generics, we were getting errors that were "last registered IntervalArray class wins". It was great. Uplifting, really.
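The problem can be seen in miniature. In this sketch, `Box` is a hypothetical stand-in for a generic interval array: after erasure, every instantiation of a generic class shares one runtime class, so a serializer registry keyed on `Class[_]` has a single slot for all of them, and the last registration wins.

```scala
// Box is a stand-in for a generic IntervalArray-like container.
case class Box[T](elems: Seq[T])

val variants  = Box(Seq("variant-like")) // Box[String] at compile time
val genotypes = Box(Seq(42))             // Box[Int] at compile time

// Both instantiations erase to the same runtime class, so a class-keyed
// serializer registry cannot tell them apart:
assert(variants.getClass == genotypes.getClass)
```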

@@ -74,6 +82,15 @@ trait TreeRegionJoin[T, U] {
*/
case class InnerTreeRegionJoin[T: ClassTag, U: ClassTag]() extends RegionJoin[T, U, T, U] with TreeRegionJoin[T, U] {

def broadcastAndJoin(tree: IntervalArray[ReferenceRegion, T],
Member


Yeah I like this API design...

@fnothaft fnothaft force-pushed the issues/1334-tree-serialization-erasure branch from 949970b to 43e2519 Compare January 5, 2017 06:42
@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1721/
Test PASSed.

@fnothaft
Member Author

fnothaft commented Jan 5, 2017

I will cut a bdg-utils release tomorrow AM.

@heuermh
Member

heuermh commented Jan 5, 2017

Is this failure, seen on HEAD, related?

$ ./bin/adam-submit vcf2adam adam-core/src/test/resources/small.vcf small.adam
$ ./bin/adam-submit vcf2adam adam-core/src/test/resources/sorted.vcf sorted.adam
$ ./bin/adam-shell
...
scala> val variants = sc.loadVariants("/Users/heuermh/working/adam/*.adam/*")
variants: org.bdgenomics.adam.rdd.variant.VariantRDD = 
VariantRDD(MapPartitionsRDD[1] at map at ADAMContext.scala:388,SequenceDictionary{
1->249250621, 0
2->249250621, 1
13->249250621, 2},WrappedArray(FILTER=<ID=IndelFS,Description="FS > 200.0">, FILTER=<ID=IndelQD,Description="QD < 2.0">, FILTER=<ID=IndelReadPosRankSum,Description="ReadPosRankSum < -20.0">, FILTER=<ID=LowQual,Description="Low quality">, FILTER=<ID=VQSRTrancheSNP99.50to99.60,Description="Truth sensitivity tranche level for SNP model at VQS Lod: -0.5377 <= x < -0.1787">, FILTER=<ID=VQSRTrancheSNP99.60to99.70,Description="Truth sensitivity tranche level for SNP model at VQS Lod: -1.0634 <= x < -0.5377">, FILTER=<ID=VQSRTrancheSNP99.70to99.80,Description="Truth sensitivity tranche level for SNP model at VQS Lod: -1.7119 <...

scala> variants.rdd.collect.head
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
2017-01-05 10:35:30 ERROR Executor:95 - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.ArrayStoreException: org.bdgenomics.formats.avro.Genotype
	at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:88)
	at scala.Array$.slowcopy(Array.scala:81)
	at scala.Array$.copy(Array.scala:107)
	at scala.collection.mutable.ResizableArray$class.copyToArray(ResizableArray.scala:77)
	at scala.collection.mutable.ArrayBuffer.copyToArray(ArrayBuffer.scala:47)
	at scala.collection.TraversableOnce$class.copyToArray(TraversableOnce.scala:241)
	at scala.collection.AbstractTraversable.copyToArray(Traversable.scala:105)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:249)
	at scala.collection.AbstractTraversable.toArray(Traversable.scala:105)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
2017-01-05 10:35:30 ERROR Executor:95 - Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.ArrayStoreException: org.bdgenomics.formats.avro.Genotype
	at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:88)
	at scala.Array$.slowcopy(Array.scala:81)
	at scala.Array$.copy(Array.scala:107)
	at scala.collection.mutable.ResizableArray$class.copyToArray(ResizableArray.scala:77)
	at scala.collection.mutable.ArrayBuffer.copyToArray(ArrayBuffer.scala:47)
	at scala.collection.TraversableOnce$class.copyToArray(TraversableOnce.scala:241)
	at scala.collection.AbstractTraversable.copyToArray(Traversable.scala:105)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:249)
	at scala.collection.AbstractTraversable.toArray(Traversable.scala:105)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
2017-01-05 10:35:30 WARN  TaskSetManager:70 - Lost task 1.0 in stage 0.0 (TID 1, localhost): java.lang.ArrayStoreException: org.bdgenomics.formats.avro.Genotype
	at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:88)
	at scala.Array$.slowcopy(Array.scala:81)
	at scala.Array$.copy(Array.scala:107)
	at scala.collection.mutable.ResizableArray$class.copyToArray(ResizableArray.scala:77)
	at scala.collection.mutable.ArrayBuffer.copyToArray(ArrayBuffer.scala:47)
	at scala.collection.TraversableOnce$class.copyToArray(TraversableOnce.scala:241)
	at scala.collection.AbstractTraversable.copyToArray(Traversable.scala:105)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:249)
	at scala.collection.AbstractTraversable.toArray(Traversable.scala:105)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

2017-01-05 10:35:30 ERROR TaskSetManager:74 - Task 1 in stage 0.0 failed 1 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 1 times, most recent failure: Lost task 1.0 in stage 0.0 (TID 1, localhost): java.lang.ArrayStoreException: org.bdgenomics.formats.avro.Genotype
	at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:88)
	at scala.Array$.slowcopy(Array.scala:81)
	at scala.Array$.copy(Array.scala:107)
	at scala.collection.mutable.ResizableArray$class.copyToArray(ResizableArray.scala:77)
	at scala.collection.mutable.ArrayBuffer.copyToArray(ArrayBuffer.scala:47)
	at scala.collection.TraversableOnce$class.copyToArray(TraversableOnce.scala:241)
	at scala.collection.AbstractTraversable.copyToArray(Traversable.scala:105)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:249)
	at scala.collection.AbstractTraversable.toArray(Traversable.scala:105)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:44)
	at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
	at $iwC$$iwC$$iwC$$iwC.<init>(<console>:48)
	at $iwC$$iwC$$iwC.<init>(<console>:50)
	at $iwC$$iwC.<init>(<console>:52)
	at $iwC.<init>(<console>:54)
	at <init>(<console>:56)
	at .<init>(<console>:60)
	at .<clinit>(<console>)
	at .<init>(<console>:7)
	at .<clinit>(<console>)
	at $print(<console>)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
	at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
	at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
	at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
	at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
	at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
	at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
	at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
	at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
	at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
	at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
	at org.apache.spark.repl.Main$.main(Main.scala:31)
	at org.apache.spark.repl.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ArrayStoreException: org.bdgenomics.formats.avro.Genotype
	at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:88)
	at scala.Array$.slowcopy(Array.scala:81)
	at scala.Array$.copy(Array.scala:107)
	at scala.collection.mutable.ResizableArray$class.copyToArray(ResizableArray.scala:77)
	at scala.collection.mutable.ArrayBuffer.copyToArray(ArrayBuffer.scala:47)
	at scala.collection.TraversableOnce$class.copyToArray(TraversableOnce.scala:241)
	at scala.collection.AbstractTraversable.copyToArray(Traversable.scala:105)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:249)
	at scala.collection.AbstractTraversable.toArray(Traversable.scala:105)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

@fnothaft
Member Author

fnothaft commented Jan 5, 2017

Just cut the new bdg-utils release. Once that pushes to Maven, I will update this, and we should be able to merge it and cut the ADAM release.

@heuermh
Member

heuermh commented Jan 5, 2017

bdg-utils 0.2.11 is now available on Maven Central. Since posting the stack trace above, I've been looking into other things. Should I investigate it further this evening?

@fnothaft fnothaft force-pushed the issues/1334-tree-serialization-erasure branch from 43e2519 to 9c0668b Compare January 6, 2017 04:23
@fnothaft
Member Author

fnothaft commented Jan 6, 2017

> bdg-utils 0.2.11 is now available on Maven Central. Since posting the stack trace above, I've been looking into other things. Should I investigate it further this evening?

I've just updated to point at the 0.2.11 release. Let me see if I can repro that issue you ran into on my side.

@fnothaft
Member Author

fnothaft commented Jan 6, 2017

Oh, I see what's going on in your example. If you want to call sc.loadVariants, you need to provide the -only_variants flag when running vcf2adam. So, you should either change:

$ ./bin/adam-submit vcf2adam adam-core/src/test/resources/small.vcf small.adam
$ ./bin/adam-submit vcf2adam adam-core/src/test/resources/sorted.vcf sorted.adam

to

$ ./bin/adam-submit vcf2adam adam-core/src/test/resources/small.vcf small.adam -only_variants
$ ./bin/adam-submit vcf2adam adam-core/src/test/resources/sorted.vcf sorted.adam -only_variants

Or, change:

val variants = sc.loadVariants("/Users/heuermh/working/adam/*.adam/*")

to either

val variants = sc.loadGenotypes("/Users/heuermh/working/adam/*.adam/*")
  .toVariantContextRDD
  .toVariantRDD

or, more simply:

val genotypes = sc.loadGenotypes("/Users/heuermh/working/adam/*.adam/*")

;)
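The mechanism behind the ArrayStoreException in the trace above can be reproduced in isolation, with Strings and Integers standing in for Variant and Genotype: JVM arrays are covariant at runtime, so an array that really holds one element class can be viewed more generally, and a store of the wrong class fails only at write time. That is what happens when Genotype records flow into a collect typed for Variant.

```scala
// JVM arrays check element types at store time, not at cast time.
val strings: Array[String] = Array("a", "b")
// The cast succeeds at runtime because String[] is a subtype of Object[]:
val viewed: Array[AnyRef] = strings.asInstanceOf[Array[AnyRef]]

val threw =
  try {
    // Storing an element of the wrong class into the underlying String[]
    // trips the JVM's runtime store check:
    viewed(0) = java.lang.Integer.valueOf(1)
    false
  } catch {
    case _: ArrayStoreException => true
  }

assert(threw)
```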

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1722/
Test PASSed.

@heuermh
Member

heuermh commented Jan 6, 2017

Of course, that does it!

I look forward to wiping out the -only_variants flag in #1327. :)

@heuermh heuermh merged commit 5dcd70b into bigdatagenomics:master Jan 6, 2017
@heuermh
Member

heuermh commented Jan 6, 2017

Thank you, @fnothaft!
