
[CANNOLI-95] Bump ADAM dependency version to 0.24.0-SNAPSHOT #102

Merged

merged 2 commits from heuermh:adam-snapshot into bigdatagenomics:master on Feb 8, 2018

Conversation

@heuermh
Member

heuermh commented Feb 5, 2018

Fixes #95

Work in progress; currently unit tests hang with:

- interleave two paired FASTQ files
Exception in thread "dag-scheduler-event-loop" java.lang.NoSuchMethodError: org.apache.hadoop.mapreduce.InputSplit.getLocationInfo()[Lorg/apache/hadoop/mapred/SplitLocationInfo;
	at org.apache.spark.rdd.NewHadoopRDD.getPreferredLocations(NewHadoopRDD.scala:282)
	at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:274)
	at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:274)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.preferredLocations(RDD.scala:273)
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1633)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$1.apply$mcVI$sp(DAGScheduler.scala:1644)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$1.apply(DAGScheduler.scala:1643)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$1.apply(DAGScheduler.scala:1643)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1643)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1641)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1641)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$1.apply$mcVI$sp(DAGScheduler.scala:1644)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$1.apply(DAGScheduler.scala:1643)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$1.apply(DAGScheduler.scala:1643)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1643)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1641)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1641)
	at org.apache.spark.scheduler.DAGScheduler.getPreferredLocs(DAGScheduler.scala:1607)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$16.apply(DAGScheduler.scala:973)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$16.apply(DAGScheduler.scala:971)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
	at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:971)
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:930)
	at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:874)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1695)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1687)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1676)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

@heuermh heuermh added this to the 0.2.0 milestone Feb 5, 2018

@heuermh heuermh requested a review from fnothaft Feb 5, 2018

@AmplabJenkins


AmplabJenkins commented Feb 5, 2018

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/cannoli-prb/98/

Build result: ABORTED

GitHub pull request #102 of commit b78fbf2 automatically merged.
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-03 (centos spark-test) in workspace /home/jenkins/workspace/cannoli-prb
Wiping out workspace first.
Cloning the remote Git repository
Cloning repository https://github.com/bigdatagenomics/cannoli.git
 > /home/jenkins/git2/bin/git init /home/jenkins/workspace/cannoli-prb # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/cannoli.git
 > /home/jenkins/git2/bin/git --version # timeout=10
 > /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/cannoli.git +refs/heads/:refs/remotes/origin/ # timeout=15
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/cannoli.git # timeout=10
 > /home/jenkins/git2/bin/git config --add remote.origin.fetch +refs/heads/:refs/remotes/origin/ # timeout=10
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/cannoli.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/cannoli.git
 > /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/cannoli.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
 > /home/jenkins/git2/bin/git rev-parse origin/pr/102/merge^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains e8260cb # timeout=10
Checking out Revision e8260cb (origin/pr/102/merge)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f e8260cb30f23c034c6fcdc7e55d24937d76f65ba
First time build. Skipping changelog.
Triggering cannoli-prb ? 2.7.3,2.11,2.2.1,centos
cannoli-prb ? 2.7.3,2.11,2.2.1,centos completed with result ABORTED
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'

@heuermh


Member

heuermh commented Feb 8, 2018

Pulling the Spark dependency version back to 2.1.0 passes tests.
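For reference, pinning the Spark compile-time dependency back would look something like the following pom.xml fragment (a sketch; the `<spark.version>` property name is an assumption and may not match the actual cannoli build):

```xml
<properties>
  <!-- hypothetical property name; check the project's pom.xml -->
  <spark.version>2.1.0</spark.version>
</properties>
```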

In both cases (Spark 2.2.1 vs 2.1.0 as compile-time dependency version), the InputSplit included in the cannoli assembly jar looks like

$ javap -cp assembly/target/cannoli-assembly-spark2_2.11-0.2.0-SNAPSHOT.jar org.apache.hadoop.mapreduce.InputSplit
Compiled from "InputSplit.java"
public abstract class org.apache.hadoop.mapreduce.InputSplit {
  public org.apache.hadoop.mapreduce.InputSplit();
  public abstract long getLength() throws java.io.IOException, java.lang.InterruptedException;
  public abstract java.lang.String[] getLocations() throws java.io.IOException, java.lang.InterruptedException;
}

which, as the stack trace indicates, is missing the getLocationInfo() method. That method was added long ago (in Hadoop 2.5.0, apparently).
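As a quick diagnostic, a method's presence can also be checked reflectively at runtime instead of shelling out to javap. This is a generic sketch (the `MethodProbe` class is illustrative and not part of cannoli; with Hadoop on the classpath one would probe `org.apache.hadoop.mapreduce.InputSplit` for `getLocationInfo`, here we demonstrate against a JDK class instead):

```java
public class MethodProbe {
    // Returns true if the named class has a public no-arg method
    // with the given name anywhere in its hierarchy.
    static boolean hasMethod(String className, String methodName) {
        try {
            Class.forName(className).getMethod(methodName);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // On a classpath with Hadoop >= 2.5.0, probing InputSplit for
        // getLocationInfo would print true. Demonstrated on java.lang.String:
        System.out.println(hasMethod("java.lang.String", "isEmpty"));          // true
        System.out.println(hasMethod("java.lang.String", "getLocationInfo")); // false
    }
}
```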

@AmplabJenkins


AmplabJenkins commented Feb 8, 2018

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/cannoli-prb/99/

@coveralls


coveralls commented Feb 8, 2018


Coverage decreased (-0.2%) to 31.592% when pulling 50fd613 on heuermh:adam-snapshot into 91fc32c on bigdatagenomics:master.

@heuermh heuermh merged commit 7c16def into bigdatagenomics:master Feb 8, 2018

1 check passed

default Merged build finished.

@heuermh heuermh deleted the heuermh:adam-snapshot branch Feb 8, 2018
