
[SPARK-13294] [PROJECT INFRA] Remove MiMa's dependency on spark-class / Spark assembly #11178

Closed
JoshRosen wants to merge 25 commits into apache:master from JoshRosen:remove-assembly-in-run-tests

Conversation

JoshRosen
Contributor

This patch removes the need to build a full Spark assembly before running the `dev/mima` script.

  • I modified the `tools` project to remove a direct dependency on Spark, so `sbt/sbt tools/fullClasspath` will now return the classpath for the `GenerateMIMAIgnore` class itself plus its own dependencies.
    • This required me to delete two classes full of dead code that we don't use anymore.
  • `GenerateMIMAIgnore` now uses [ClassUtil](http://software.clapper.org/classutil/) to find all of the Spark classes rather than our homemade JAR traversal code. The problem in our own code was that it didn't handle folders of classes properly, which is necessary in order to generate excludes with an assembly-free Spark build; see the sketch after this list.
  • `./dev/mima` no longer runs through `spark-class`, eliminating the need to reason about classpath ordering between `SPARK_CLASSPATH` and the assembly.
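To make the folder-handling point concrete, the discovery logic now amounts to something like the following — a minimal sketch rather than the exact PR code, assuming the `ClassFinder` API from `org.clapper.classutil` (invoked with no arguments it scans `java.class.path`, covering both JARs and directories of `.class` files):

```scala
import org.clapper.classutil.ClassFinder

// Minimal sketch: enumerate Spark classes visible on the current classpath.
// ClassFinder walks JARs *and* directories of .class files, which is exactly
// what an assembly-free Spark build produces.
object FindSparkClasses {
  def main(args: Array[String]): Unit = {
    val sparkClasses = ClassFinder()
      .getClasses()                           // ClassInfo for everything found
      .map(_.name)                            // fully qualified class names
      .filter(_.startsWith("org.apache.spark"))
      .toSet
    sparkClasses.toSeq.sorted.foreach(println)
  }
}
```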

@SparkQA

SparkQA commented Feb 12, 2016

Test build #51162 has finished for PR 11178 at commit b6f1ce8.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Feb 12, 2016

LGTM provided tests pass.

@SparkQA

SparkQA commented Feb 12, 2016

Test build #51168 has finished for PR 11178 at commit 5528c48.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 12, 2016

Test build #51176 has finished for PR 11178 at commit bef62eb.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 13, 2016

Test build #51214 has finished for PR 11178 at commit 76a365e.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 13, 2016

Test build #51224 has finished for PR 11178 at commit 9fc0f7a.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 13, 2016

Test build #51228 has finished for PR 11178 at commit 31854eb.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

This patch ended up changing substantially, so I'd like @ScrapCodes to take a quick look at it.

In a nutshell:

  • I modified the `tools` project to remove a direct dependency on Spark, so `sbt/sbt tools/fullClasspath` will now return the classpath for the `GenerateMIMAIgnore` class itself plus its own dependencies.
    • This required me to delete two classes full of dead code that we don't use anymore.
  • `GenerateMIMAIgnore` now uses [ClassUtil](http://software.clapper.org/classutil/) to find all of the Spark classes rather than our homemade JAR traversal code. The problem in our own code was that it didn't handle folders of classes properly, which is necessary in order to generate excludes with an assembly-free Spark build.
  • `./dev/mima` no longer runs through `spark-class`, eliminating the need to reason about classpath ordering between `SPARK_CLASSPATH` and the assembly.

@JoshRosen
Contributor Author

Jenkins, retest this please.

Any comments here?

@ScrapCodes
Member

@JoshRosen I'm sorry for the delay here; I'll try to get to it today.

@@ -336,7 +336,6 @@ def build_spark_sbt(hadoop_version):
     # Enable all of the profiles for the build:
     build_profiles = get_hadoop_profiles(hadoop_version) + modules.root.build_profile_flags
     sbt_goals = ["package",
-                 "assembly/assembly",
Member

Is a full assembly no longer needed? How do you configure the classpath?

Member

Understood; for tests, the assembly is no longer needed.

@ScrapCodes
Member

Looks good! I have taken a quick look but did not actually run it; hoping the tests will cover that.

@JoshRosen
Contributor Author

Jenkins, retest this please.

@ScrapCodes
Member

Not sure what is causing this:

git fetch --tags --progress https://github.com/apache/spark.git +refs/pull/11178/*:refs/remotes/origin/pr/11178/* # timeout=15
ERROR: Timeout after 15 minutes
ERROR: Error fetching remote repo 'origin'

On Wed, Feb 17, 2016 at 2:07 PM, UCB AMPLab (notifications@github.com) wrote:

    Test FAILed.
    Refer to this link for build results (access rights to CI server needed):
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51422/

@JoshRosen
Contributor Author

Jenkins, retest this please.

@JoshRosen
Contributor Author

@ScrapCodes, it's some transient Jenkins flakiness, not caused by this PR.

@SparkQA

SparkQA commented Feb 17, 2016

Test build #51435 has finished for PR 11178 at commit 31854eb.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

Failed to find Spark assembly in /home/jenkins/workspace/SparkPullRequestBuilder/assembly/target/scala-2.11.
You need to build Spark before running this program.

I guess that the PySpark test scripts also need to set SPARK_PREPEND_CLASSES and a couple of other environment variables. I'll see about fixing that now.

@JoshRosen
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Feb 18, 2016

Test build #51449 has finished for PR 11178 at commit 906d8c8.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -133,14 +140,14 @@ object GenerateMIMAIgnore {
     val (privateClasses, privateMembers) = privateWithin("org.apache.spark")
     val previousContents = Try(File(".generated-mima-class-excludes").lines()).
       getOrElse(Iterator.empty).mkString("\n")
-    File(".generated-mima-class-excludes")
-      .writeAll(previousContents + privateClasses.mkString("\n"))
+    File(".generated-mima-class-excludes").writeAll(
Contributor Author


Here, I decided to sort things just to make debugging a little nicer for myself.
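The tail of the diff is truncated above, but the idea is roughly the following — a hedged sketch rather than the exact PR code, reusing the `scala.tools.nsc.io.File` helper already visible in the diff and adding the sorting (`privateClasses` is a stand-in for the set computed by `privateWithin(...)`):

```scala
import scala.tools.nsc.io.File
import scala.util.Try

// Hedged sketch: append the newly found package-private classes to the
// exclude file, de-duplicated and sorted so that successive runs diff cleanly.
object WriteSortedExcludes {
  def main(args: Array[String]): Unit = {
    val privateClasses = Set("org.apache.spark.Foo", "org.apache.spark.Bar") // placeholder data
    val excludeFile = File(".generated-mima-class-excludes")
    val previousContents = Try(excludeFile.lines().toSeq).getOrElse(Seq.empty)
    excludeFile.writeAll(
      (previousContents ++ privateClasses).distinct.sorted.mkString("\n") + "\n")
  }
}
```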

@JoshRosen
Contributor Author

Actually, maybe I can roll back the changes to the exclude generation which were aimed at reducing log noise; those can go in separately, and splitting them off will ease review.

@JoshRosen
Contributor Author

Reverted improvements, so the diff should be tiny. I'll now be able to confirm that the generated excludes match exactly.

@SparkQA

SparkQA commented Mar 11, 2016

Test build #52887 has finished for PR 11178 at commit 97b5d78.

  • This patch fails MiMa tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 11, 2016

Test build #52880 has finished for PR 11178 at commit 373fd52.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

[error]  * deprecated method mapPartitionsWithContext(scala.Function2,Boolean,scala.reflect.ClassTag)org.apache.spark.rdd.RDD in class org.apache.spark.rdd.RDD does not have a correspondent in new version
[error]    filter with: ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.rdd.RDD.mapPartitionsWithContext")
[error]  * synthetic method mapPartitionsWithContext$default$2()Boolean in class org.apache.spark.rdd.RDD does not have a correspondent in new version
[error]    filter with: ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.rdd.RDD.mapPartitionsWithContext$default$2")
[error]  * synthetic method org$apache$spark$deploy$history$HistoryServer$$detachSparkUI(org.apache.spark.ui.SparkUI)Unit in class org.apache.spark.deploy.history.HistoryServer does not have a correspondent in new version
[error]    filter with: ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$detachSparkUI")
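For context, failures like these are normally silenced by adding the suggested filters to project/MimaExcludes.scala; below is a hedged illustration using MiMa's ProblemFilters API of that era — the filter strings are copied verbatim from the output above, while the surrounding object is hypothetical:

```scala
import com.typesafe.tools.mima.core._

// Hypothetical container object; in Spark these entries live in
// project/MimaExcludes.scala. The strings come from the MiMa output above.
object ExampleMimaExcludes {
  val filters = Seq(
    ProblemFilters.exclude[MissingMethodProblem](
      "org.apache.spark.rdd.RDD.mapPartitionsWithContext"),
    ProblemFilters.exclude[MissingMethodProblem](
      "org.apache.spark.rdd.RDD.mapPartitionsWithContext$default$2"),
    ProblemFilters.exclude[MissingMethodProblem](
      "org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$detachSparkUI")
  )
}
```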

@JoshRosen
Contributor Author

Modified the code to give more detail in the error message:

Error instrumenting class:org.apache.spark.deploy.history.HistoryServer$
java.lang.AssertionError: assertion failed: no symbol could be loaded from class org.apache.spark.deploy.history.HistoryServer$ in package history with name HistoryServer$ and classloader sun.misc.Launcher$AppClassLoader@1b6d3586
    at scala.reflect.runtime.JavaMirrors$JavaMirror.scala$reflect$runtime$JavaMirrors$JavaMirror$$classToScala1(JavaMirrors.scala:1020)
    at scala.reflect.runtime.JavaMirrors$JavaMirror$$anonfun$classToScala$1.apply(JavaMirrors.scala:979)
    at scala.reflect.runtime.JavaMirrors$JavaMirror$$anonfun$classToScala$1.apply(JavaMirrors.scala:979)
    at scala.reflect.runtime.JavaMirrors$JavaMirror$$anonfun$toScala$1.apply(JavaMirrors.scala:97)
    at scala.reflect.runtime.TwoWayCaches$TwoWayCache$$anonfun$toScala$1.apply(TwoWayCaches.scala:39)
    at scala.reflect.runtime.Gil$class.gilSynchronized(Gil.scala:19)
    at scala.reflect.runtime.JavaUniverse.gilSynchronized(JavaUniverse.scala:16)
    at scala.reflect.runtime.TwoWayCaches$TwoWayCache.toScala(TwoWayCaches.scala:34)
    at scala.reflect.runtime.JavaMirrors$JavaMirror.toScala(JavaMirrors.scala:95)
    at scala.reflect.runtime.JavaMirrors$JavaMirror.classToScala(JavaMirrors.scala:979)
    at scala.reflect.runtime.JavaMirrors$JavaMirror.classSymbol(JavaMirrors.scala:196)
    at scala.reflect.runtime.JavaMirrors$JavaMirror.classSymbol(JavaMirrors.scala:54)
    at org.apache.spark.tools.GenerateMIMAIgnore$$anonfun$privateWithin$1.apply(GenerateMIMAIgnore.scala:71)
    at org.apache.spark.tools.GenerateMIMAIgnore$$anonfun$privateWithin$1.apply(GenerateMIMAIgnore.scala:69)
    at scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:322)
    at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:978)
    at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:978)
    at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:978)
    at org.apache.spark.tools.GenerateMIMAIgnore$.privateWithin(GenerateMIMAIgnore.scala:69)
    at org.apache.spark.tools.GenerateMIMAIgnore$.main(GenerateMIMAIgnore.scala:135)
    at org.apache.spark.tools.GenerateMIMAIgnore.main(GenerateMIMAIgnore.scala)

@JoshRosen
Contributor Author

As it turns out, I think we do need to pull in the transitive dependencies of the old Spark version: now that we're no longer getting transitive dependencies from the new Spark version (via spark-class), reflection fails without them.
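The AssertionError pasted earlier is that reflection failure; a minimal, hypothetical reproduction (class name taken from the stack trace, everything else assumed for illustration) would look like:

```scala
import scala.reflect.runtime.{universe => ru}

// Hypothetical reproduction of the failure mode: asking the runtime mirror
// for the Scala symbol of a module class (HistoryServer$). If classes it
// references transitively are absent from the classpath, classSymbol can
// die with the "no symbol could be loaded" AssertionError seen above.
object ReflectModuleClass {
  def main(args: Array[String]): Unit = {
    val mirror = ru.runtimeMirror(getClass.getClassLoader)
    val clazz = Class.forName("org.apache.spark.deploy.history.HistoryServer$")
    val symbol = mirror.classSymbol(clazz) // throws when symbols can't load
    println(symbol.fullName)
  }
}
```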

@JoshRosen
Contributor Author

Alright, MiMa passes, so I've gone ahead and merged with master to pull in the change to temporarily disable MiMa during the DF -> DS[Row] type aliasing / migration. Provided this passes compilation, I'm going to merge this to try to unblock other patches.

@SparkQA

SparkQA commented Mar 11, 2016

Test build #52882 has finished for PR 11178 at commit 5d32e74.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 11, 2016

Test build #52896 has finished for PR 11178 at commit 86cd513.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 11, 2016

Test build #52893 has finished for PR 11178 at commit 4070c0d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 11, 2016

Test build #52891 has finished for PR 11178 at commit f9e2b42.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

I've rolled back a bunch of changes and minimized this patch to a point where I think it's safe and uncontroversial, so I'm going to merge this into master now so that it doesn't conflict. The changes here should come in handy when we work towards re-enabling the MiMa tests which were disabled in the DF-to-DS migration (/cc @marmbrus @liancheng).

@asfgit asfgit closed this in 6ca990f Mar 11, 2016
@JoshRosen JoshRosen deleted the remove-assembly-in-run-tests branch March 11, 2016 07:32
@liancheng
Contributor

@JoshRosen The MiMa check is re-enabled in PR #11656.

@nchammas
Contributor

For some reason, this PR breaks the following invocation:

./dev/make-distribution.sh -T 1C -Phadoop-2.6

The problem appears to be with this line:

SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
    | grep -v "INFO"\
    | tail -n 1)

which outputs this when run:

+ SCALA_VERSION='[ERROR] Re-run Maven using the -X switch to enable full debug logging.'

Removing the -T 1C fixes it, for some reason.

Any ideas why this PR is interfering with the additional flags passed to Maven?

@nchammas
Contributor

Looking at Maven's debug output, it looks like there are some project structure changes that interfere with Maven's ability to do parallel builds.

See: nchammas/flintrock#93 (comment)

Given that, I'm guessing there's nothing to do here since those project changes are probably not worth rejiggering just to get parallel builds working again.

roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
[SPARK-13294] [PROJECT INFRA] Remove MiMa's dependency on spark-class / Spark assembly

This patch removes the need to build a full Spark assembly before running the `dev/mima` script.

- I modified the `tools` project to remove a direct dependency on Spark, so `sbt/sbt tools/fullClasspath` will now return the classpath for the `GenerateMIMAIgnore` class itself plus its own dependencies.
   - This required me to delete two classes full of dead code that we don't use anymore.
- `GenerateMIMAIgnore` now uses [ClassUtil](http://software.clapper.org/classutil/) to find all of the Spark classes rather than our homemade JAR traversal code. The problem in our own code was that it didn't handle folders of classes properly, which is necessary in order to generate excludes with an assembly-free Spark build.
- `./dev/mima` no longer runs through `spark-class`, eliminating the need to reason about classpath ordering between `SPARK_CLASSPATH` and the assembly.

Author: Josh Rosen <joshrosen@databricks.com>

Closes apache#11178 from JoshRosen/remove-assembly-in-run-tests.