[ADAM-1334] Clean up serialization issues in Broadcast region join. #1336

Conversation

@fnothaft (Member) commented Jan 3, 2017

Resolves #1334. Eliminates type erasure problems by using a separate concrete IntervalArray implementation for each broadcast record type. Depends on bigdatagenomics/utils#97.
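
For context, a minimal sketch of the pattern (the concrete class names follow the diff below; the IntervalArray stand-in and the field layout are assumptions, since the real trait lives in bdg-utils):

import org.bdgenomics.adam.models.ReferenceRegion
import org.bdgenomics.formats.avro.{ NucleotideContigFragment, Variant }

// Simplified stand-in for the IntervalArray trait from bdg-utils; the real
// trait also carries the interval-lookup logic.
trait IntervalArray[K, T] extends Serializable {
  def array: Array[(K, T)]
  def maxIntervalWidth: Long
}

// One concrete case class per broadcast record type. Each subclass has its
// own runtime Class, so a serialization registry keyed on Class (like Kryo's)
// can hold a dedicated serializer for each, instead of one erased generic.
case class NucleotideContigFragmentArray(
    array: Array[(ReferenceRegion, NucleotideContigFragment)],
    maxIntervalWidth: Long)
  extends IntervalArray[ReferenceRegion, NucleotideContigFragment]

case class VariantArray(
    array: Array[(ReferenceRegion, Variant)],
    maxIntervalWidth: Long)
  extends IntervalArray[ReferenceRegion, Variant]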

@AmplabJenkins commented Jan 3, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1715/

Build result: FAILURE

[...truncated 3 lines...]
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
Wiping out workspace first.
Cloning the remote Git repository
Cloning repository https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git init /home/jenkins/workspace/ADAM-prb # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git --version # timeout=10
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/heads/*:refs/remotes/origin/* # timeout=15
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
 > /home/jenkins/git2/bin/git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/*:refs/remotes/origin/pr/* # timeout=15
 > /home/jenkins/git2/bin/git rev-parse origin/pr/1336/merge^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a --contains 9b8e3ed # timeout=10
 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1336/merge^{commit} # timeout=10
Checking out Revision 9b8e3ed (origin/pr/1336/merge)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f 9b8e3edf595ec4270648b72a986c9726ba1085af
First time build. Skipping changelog.
Triggering ADAM-prb » 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb » 2.6.0,2.10,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'

@fnothaft (Member, Author) commented Jan 4, 2017

Jenkins, retest this please.

@AmplabJenkins commented Jan 4, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1718/

Build result: FAILURE

[...truncated 3 lines...]
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
Wiping out workspace first.
Cloning the remote Git repository
Cloning repository https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git init /home/jenkins/workspace/ADAM-prb # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git --version # timeout=10
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/heads/*:refs/remotes/origin/* # timeout=15
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
 > /home/jenkins/git2/bin/git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/*:refs/remotes/origin/pr/* # timeout=15
 > /home/jenkins/git2/bin/git rev-parse origin/pr/1336/merge^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a --contains 9b8e3ed # timeout=10
 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1336/merge^{commit} # timeout=10
Checking out Revision 9b8e3ed (origin/pr/1336/merge)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f 9b8e3edf595ec4270648b72a986c9726ba1085af
First time build. Skipping changelog.
Triggering ADAM-prb » 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb » 2.6.0,2.10,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'

@heuermh (Member) commented Jan 4, 2017

I assume you would like this in 0.21.0? Feel free to set the milestone.

@fnothaft fnothaft added this to the 0.21.0 milestone Jan 4, 2017
@fnothaft (Member, Author) commented Jan 4, 2017

Ah, yes! We need it for the Variant DB challenge. I've just set the milestone to 0.21.0. I'm going to fix the build failure (it's a small issue where the move_to_xyz scripts need to be updated), and once the build passes, I'll cut a utils release so that we can remove the snapshot dependency.

@heuermh (Member) commented Jan 4, 2017

Sounds good, thanks!

@fnothaft fnothaft force-pushed the fnothaft:issues/1334-tree-serialization-erasure branch from ef72554 to 949970b Jan 5, 2017
@fnothaft (Member, Author) commented Jan 5, 2017

Just pushed a fix for the move_to_xyz scripts. This will not be ready to merge, though, until the utils release is cut (which I will do after this build passes).

@AmplabJenkins commented Jan 5, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1720/

Build result: FAILURE

[...truncated 3 lines...]
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
Wiping out workspace first.
Cloning the remote Git repository
Cloning repository https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git init /home/jenkins/workspace/ADAM-prb # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git --version # timeout=10
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/heads/*:refs/remotes/origin/* # timeout=15
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
 > /home/jenkins/git2/bin/git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/*:refs/remotes/origin/pr/* # timeout=15
 > /home/jenkins/git2/bin/git rev-parse origin/pr/1336/merge^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a --contains fa154df # timeout=10
 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1336/merge^{commit} # timeout=10
Checking out Revision fa154df (origin/pr/1336/merge)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f fa154dfaa75c1ab94a9583693abb1703a9803f21
First time build. Skipping changelog.
Triggering ADAM-prb » 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb » 2.6.0,2.10,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'

@heuermh approved these changes Jan 5, 2017
import scala.collection.JavaConversions._
import scala.math.max
import scala.reflect.ClassTag

private[adam] case class NucleotideContigFragmentArray(

@heuermh (Member) Jan 5, 2017

I think I know the reason already, but why do we need concrete classes for each of these?

@fnothaft (Member, Author) Jan 5, 2017

Type erasure at serialization time. Since we previously only had generics, we were getting "last registered IntervalArray class wins" errors. It was great. Uplifting, really.
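
To spell out the failure mode (a sketch reusing the stand-in classes from the PR description above): generics are erased at runtime, so every parameterization of IntervalArray shares a single Class object, and Kryo keys registered serializers by Class.

// Both parameterizations erase to the same runtime Class, so registering a
// serializer for one IntervalArray[_, _] silently replaces the previous one:
// whichever registration runs last "wins".
assert(classOf[IntervalArray[ReferenceRegion, Variant]] ==
  classOf[IntervalArray[ReferenceRegion, NucleotideContigFragment]])

// Concrete subclasses have distinct runtime Classes, so per-type
// registrations no longer collide.
assert(classOf[VariantArray] != classOf[NucleotideContigFragmentArray])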

@@ -74,6 +82,15 @@ trait TreeRegionJoin[T, U] {
*/
case class InnerTreeRegionJoin[T: ClassTag, U: ClassTag]() extends RegionJoin[T, U, T, U] with TreeRegionJoin[T, U] {

def broadcastAndJoin(tree: IntervalArray[ReferenceRegion, T],

@heuermh (Member) Jan 5, 2017

Yeah, I like this API design...
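
A hypothetical two-liner showing the appeal (only the first parameter of broadcastAndJoin appears in this hunk; the second argument and both value names are assumptions): build and broadcast the interval tree once, then reuse it across joins.

// Hypothetical usage: `featureTree` is an IntervalArray[ReferenceRegion, Feature]
// built once from the smaller side of the join, and `variantsByRegion` is an
// RDD[(ReferenceRegion, Variant)]. Neither name appears in the diff.
val join = InnerTreeRegionJoin[Feature, Variant]()
val hits = join.broadcastAndJoin(featureTree, variantsByRegion)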

@fnothaft fnothaft force-pushed the fnothaft:issues/1334-tree-serialization-erasure branch from 949970b to 43e2519 Jan 5, 2017
@AmplabJenkins commented Jan 5, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1721/

@fnothaft (Member, Author) commented Jan 5, 2017

I will cut a bdg-utils release tomorrow AM.

@heuermh (Member) commented Jan 5, 2017

Is this, on HEAD, related?

$ ./bin/adam-submit vcf2adam adam-core/src/test/resources/small.vcf small.adam
$ ./bin/adam-submit vcf2adam adam-core/src/test/resources/sorted.vcf sorted.adam
$ ./bin/adam-shell
...
scala> val variants = sc.loadVariants("/Users/heuermh/working/adam/*.adam/*")
variants: org.bdgenomics.adam.rdd.variant.VariantRDD = 
VariantRDD(MapPartitionsRDD[1] at map at ADAMContext.scala:388,SequenceDictionary{
1->249250621, 0
2->249250621, 1
13->249250621, 2},WrappedArray(FILTER=<ID=IndelFS,Description="FS > 200.0">, FILTER=<ID=IndelQD,Description="QD < 2.0">, FILTER=<ID=IndelReadPosRankSum,Description="ReadPosRankSum < -20.0">, FILTER=<ID=LowQual,Description="Low quality">, FILTER=<ID=VQSRTrancheSNP99.50to99.60,Description="Truth sensitivity tranche level for SNP model at VQS Lod: -0.5377 <= x < -0.1787">, FILTER=<ID=VQSRTrancheSNP99.60to99.70,Description="Truth sensitivity tranche level for SNP model at VQS Lod: -1.0634 <= x < -0.5377">, FILTER=<ID=VQSRTrancheSNP99.70to99.80,Description="Truth sensitivity tranche level for SNP model at VQS Lod: -1.7119 <...

scala> variants.rdd.collect.head
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
2017-01-05 10:35:30 ERROR Executor:95 - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.ArrayStoreException: org.bdgenomics.formats.avro.Genotype
	at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:88)
	at scala.Array$.slowcopy(Array.scala:81)
	at scala.Array$.copy(Array.scala:107)
	at scala.collection.mutable.ResizableArray$class.copyToArray(ResizableArray.scala:77)
	at scala.collection.mutable.ArrayBuffer.copyToArray(ArrayBuffer.scala:47)
	at scala.collection.TraversableOnce$class.copyToArray(TraversableOnce.scala:241)
	at scala.collection.AbstractTraversable.copyToArray(Traversable.scala:105)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:249)
	at scala.collection.AbstractTraversable.toArray(Traversable.scala:105)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
2017-01-05 10:35:30 ERROR Executor:95 - Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.ArrayStoreException: org.bdgenomics.formats.avro.Genotype
	at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:88)
	at scala.Array$.slowcopy(Array.scala:81)
	at scala.Array$.copy(Array.scala:107)
	at scala.collection.mutable.ResizableArray$class.copyToArray(ResizableArray.scala:77)
	at scala.collection.mutable.ArrayBuffer.copyToArray(ArrayBuffer.scala:47)
	at scala.collection.TraversableOnce$class.copyToArray(TraversableOnce.scala:241)
	at scala.collection.AbstractTraversable.copyToArray(Traversable.scala:105)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:249)
	at scala.collection.AbstractTraversable.toArray(Traversable.scala:105)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
2017-01-05 10:35:30 WARN  TaskSetManager:70 - Lost task 1.0 in stage 0.0 (TID 1, localhost): java.lang.ArrayStoreException: org.bdgenomics.formats.avro.Genotype
	at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:88)
	at scala.Array$.slowcopy(Array.scala:81)
	at scala.Array$.copy(Array.scala:107)
	at scala.collection.mutable.ResizableArray$class.copyToArray(ResizableArray.scala:77)
	at scala.collection.mutable.ArrayBuffer.copyToArray(ArrayBuffer.scala:47)
	at scala.collection.TraversableOnce$class.copyToArray(TraversableOnce.scala:241)
	at scala.collection.AbstractTraversable.copyToArray(Traversable.scala:105)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:249)
	at scala.collection.AbstractTraversable.toArray(Traversable.scala:105)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

2017-01-05 10:35:30 ERROR TaskSetManager:74 - Task 1 in stage 0.0 failed 1 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 1 times, most recent failure: Lost task 1.0 in stage 0.0 (TID 1, localhost): java.lang.ArrayStoreException: org.bdgenomics.formats.avro.Genotype
	at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:88)
	at scala.Array$.slowcopy(Array.scala:81)
	at scala.Array$.copy(Array.scala:107)
	at scala.collection.mutable.ResizableArray$class.copyToArray(ResizableArray.scala:77)
	at scala.collection.mutable.ArrayBuffer.copyToArray(ArrayBuffer.scala:47)
	at scala.collection.TraversableOnce$class.copyToArray(TraversableOnce.scala:241)
	at scala.collection.AbstractTraversable.copyToArray(Traversable.scala:105)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:249)
	at scala.collection.AbstractTraversable.toArray(Traversable.scala:105)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:44)
	at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
	at $iwC$$iwC$$iwC$$iwC.<init>(<console>:48)
	at $iwC$$iwC$$iwC.<init>(<console>:50)
	at $iwC$$iwC.<init>(<console>:52)
	at $iwC.<init>(<console>:54)
	at <init>(<console>:56)
	at .<init>(<console>:60)
	at .<clinit>(<console>)
	at .<init>(<console>:7)
	at .<clinit>(<console>)
	at $print(<console>)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
	at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
	at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
	at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
	at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
	at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
	at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
	at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
	at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
	at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
	at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
	at org.apache.spark.repl.Main$.main(Main.scala:31)
	at org.apache.spark.repl.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ArrayStoreException: org.bdgenomics.formats.avro.Genotype
	at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:88)
	at scala.Array$.slowcopy(Array.scala:81)
	at scala.Array$.copy(Array.scala:107)
	at scala.collection.mutable.ResizableArray$class.copyToArray(ResizableArray.scala:77)
	at scala.collection.mutable.ArrayBuffer.copyToArray(ArrayBuffer.scala:47)
	at scala.collection.TraversableOnce$class.copyToArray(TraversableOnce.scala:241)
	at scala.collection.AbstractTraversable.copyToArray(Traversable.scala:105)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:249)
	at scala.collection.AbstractTraversable.toArray(Traversable.scala:105)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
@fnothaft (Member, Author) commented Jan 5, 2017

Just cut the new bdg-utils release. Once that pushes to Maven, I will update this, and we should be able to merge it and cut the ADAM release.

@heuermh (Member) commented Jan 5, 2017

bdg-utils 0.2.11 is now available on Maven Central. Since posting the stack trace above, I've been looking into other things. Should I investigate it further this evening?

fnothaft added 3 commits Jan 3, 2017
Resolves #1334. Eliminates type erasure problems by having different concrete implementations per each broadcast. Depends on bigdatagenomics/utils#97.
@fnothaft fnothaft force-pushed the fnothaft:issues/1334-tree-serialization-erasure branch from 43e2519 to 9c0668b Jan 6, 2017
@fnothaft (Member, Author) commented Jan 6, 2017

> bdg-utils 0.2.11 is now available on Maven Central. Since posting the stack trace above, I've been looking into other things. Should I investigate it further this evening?

I've just updated to point at the 0.2.11 release. Let me see if I can repro that issue you ran into on my side.

@fnothaft (Member, Author) commented Jan 6, 2017

Oh, I see what's going on in your example. By default, vcf2adam writes genotypes, so sc.loadVariants ends up materializing Genotype records into an Array[Variant], which is what throws the ArrayStoreException. If you want to call sc.loadVariants, you need to provide the -only_variants flag when running vcf2adam. So, you should either change:

$ ./bin/adam-submit vcf2adam adam-core/src/test/resources/small.vcf small.adam
$ ./bin/adam-submit vcf2adam adam-core/src/test/resources/sorted.vcf sorted.adam

to

$ ./bin/adam-submit vcf2adam adam-core/src/test/resources/small.vcf small.adam -only_variants
$ ./bin/adam-submit vcf2adam adam-core/src/test/resources/sorted.vcf sorted.adam -only_variants

Or, change:

val variants = sc.loadVariants("/Users/heuermh/working/adam/*.adam/*")

to either

val variants = sc.loadGenotypes("/Users/heuermh/working/adam/*.adam/*")
  .toVariantContextRDD
  .toVariantRDD

or, more simply:

val genotypes = sc.loadGenotypes("/Users/heuermh/working/adam/*.adam/*")

;)

@AmplabJenkins commented Jan 6, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1722/

@heuermh (Member) commented Jan 6, 2017

Of course, that does it!

I look forward to wiping out the -only_variants flag in #1327. :)

@heuermh heuermh merged commit 5dcd70b into bigdatagenomics:master Jan 6, 2017
1 check passed (default: Merged build finished)
@heuermh (Member) commented Jan 6, 2017

Thank you, @fnothaft!
