
[SPARK-13294] [PROJECT INFRA] Remove MiMa's dependency on spark-class / Spark assembly #11178

Closed
JoshRosen wants to merge 25 commits into apache:master from JoshRosen:remove-assembly-in-run-tests

Conversation

JoshRosen
Contributor

This patch removes the need to build a full Spark assembly before running the `dev/mima` script.

  • I modified the `tools` project to remove a direct dependency on Spark, so `sbt/sbt tools/fullClasspath` will now return the classpath for the `GenerateMIMAIgnore` class itself plus its own dependencies.
    • This required me to delete two classes full of dead code that we don't use anymore.
  • `GenerateMIMAIgnore` now uses [ClassUtil](http://software.clapper.org/classutil/) to find all of the Spark classes rather than our homemade JAR traversal code. The problem in our own code was that it didn't handle folders of classes properly, which is necessary in order to generate excludes with an assembly-free Spark build; see the sketch after this list.
  • `./dev/mima` no longer runs through `spark-class`, eliminating the need to reason about classpath ordering between `SPARK_CLASSPATH` and the assembly.
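To make the folder-handling point concrete, the discovery logic now amounts to something like the following — a minimal sketch rather than the exact PR code, assuming the `ClassFinder` API from `org.clapper.classutil` (invoked with no arguments it scans `java.class.path`, covering both JARs and directories of `.class` files):

```scala
import org.clapper.classutil.ClassFinder

// Minimal sketch: enumerate Spark classes visible on the current classpath.
// ClassFinder walks JARs *and* directories of .class files, which is exactly
// what an assembly-free Spark build produces.
object FindSparkClasses {
  def main(args: Array[String]): Unit = {
    val sparkClasses = ClassFinder()
      .getClasses()                           // ClassInfo for everything found
      .map(_.name)                            // fully qualified class names
      .filter(_.startsWith("org.apache.spark"))
      .toSet
    sparkClasses.toSeq.sorted.foreach(println)
  }
}
```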

@SparkQA

SparkQA commented Feb 12, 2016

Test build #51162 has finished for PR 11178 at commit b6f1ce8.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Feb 12, 2016

LGTM provided tests pass.

@SparkQA

SparkQA commented Feb 12, 2016

Test build #51168 has finished for PR 11178 at commit 5528c48.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 12, 2016

Test build #51176 has finished for PR 11178 at commit bef62eb.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 13, 2016

Test build #51214 has finished for PR 11178 at commit 76a365e.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 13, 2016

Test build #51224 has finished for PR 11178 at commit 9fc0f7a.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 13, 2016

Test build #51228 has finished for PR 11178 at commit 31854eb.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

This patch ended up changing substantially, so I'd like @ScrapCodes to take a quick look at it.

In a nutshell:

  • I modified the `tools` project to remove a direct dependency on Spark, so `sbt/sbt tools/fullClasspath` will now return the classpath for the `GenerateMIMAIgnore` class itself plus its own dependencies.
    • This required me to delete two classes full of dead code that we don't use anymore.
  • `GenerateMIMAIgnore` now uses [ClassUtil](http://software.clapper.org/classutil/) to find all of the Spark classes rather than our homemade JAR traversal code. The problem in our own code was that it didn't handle folders of classes properly, which is necessary in order to generate excludes with an assembly-free Spark build.
  • `./dev/mima` no longer runs through `spark-class`, eliminating the need to reason about classpath ordering between `SPARK_CLASSPATH` and the assembly.

@JoshRosen
Contributor Author

Jenkins, retest this please.

Any comments here?

@ScrapCodes
Member

@JoshRosen I'm sorry for the delay here; I'll try to get to it today.

@@ -336,7 +336,6 @@ def build_spark_sbt(hadoop_version):
     # Enable all of the profiles for the build:
     build_profiles = get_hadoop_profiles(hadoop_version) + modules.root.build_profile_flags
     sbt_goals = ["package",
-                 "assembly/assembly",
Member

Is a full assembly no longer needed? How do you configure the classpath?

Member

Understood; for tests, the assembly is no longer needed.

@ScrapCodes
Member

Looks good! I have taken a quick look but did not actually run it; hoping the tests will cover that.

@JoshRosen
Contributor Author

Jenkins, retest this please.

@ScrapCodes
Member

Not sure what is causing this:

git fetch --tags --progress https://github.com/apache/spark.git +refs/pull/11178/*:refs/remotes/origin/pr/11178/* # timeout=15
ERROR: Timeout after 15 minutes
ERROR: Error fetching remote repo 'origin'

On Wed, Feb 17, 2016 at 2:07 PM, UCB AMPLab (notifications@github.com) wrote:

    Test FAILed.
    Refer to this link for build results (access rights to CI server needed):
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51422/

@JoshRosen
Contributor Author

Jenkins, retest this please.

@JoshRosen
Contributor Author

@ScrapCodes, it's some transient Jenkins flakiness, not caused by this PR.

@SparkQA

SparkQA commented Feb 17, 2016

Test build #51435 has finished for PR 11178 at commit 31854eb.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

Failed to find Spark assembly in /home/jenkins/workspace/SparkPullRequestBuilder/assembly/target/scala-2.11.
You need to build Spark before running this program.

I guess that the PySpark test scripts also need to set SPARK_PREPEND_CLASSES and a couple of other environment variables. I'll see about fixing that now.

@JoshRosen
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Feb 18, 2016

Test build #51449 has finished for PR 11178 at commit 906d8c8.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -133,14 +140,14 @@ object GenerateMIMAIgnore {
     val (privateClasses, privateMembers) = privateWithin("org.apache.spark")
     val previousContents = Try(File(".generated-mima-class-excludes").lines()).
       getOrElse(Iterator.empty).mkString("\n")
-    File(".generated-mima-class-excludes")
-      .writeAll(previousContents + privateClasses.mkString("\n"))
+    File(".generated-mima-class-excludes").writeAll(
Contributor Author


Here, I decided to sort things just to make debugging a little nicer for myself.
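The tail of the diff is truncated above, but the idea is roughly the following — a hedged sketch rather than the exact PR code, reusing the `scala.tools.nsc.io.File` helper already visible in the diff and adding the sorting (`privateClasses` is a stand-in for the set computed by `privateWithin(...)`):

```scala
import scala.tools.nsc.io.File
import scala.util.Try

// Hedged sketch: append the newly found package-private classes to the
// exclude file, de-duplicated and sorted so that successive runs diff cleanly.
object WriteSortedExcludes {
  def main(args: Array[String]): Unit = {
    val privateClasses = Set("org.apache.spark.Foo", "org.apache.spark.Bar") // placeholder data
    val excludeFile = File(".generated-mima-class-excludes")
    val previousContents = Try(excludeFile.lines().toSeq).getOrElse(Seq.empty)
    excludeFile.writeAll(
      (previousContents ++ privateClasses).distinct.sorted.mkString("\n") + "\n")
  }
}
```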

@JoshRosen
Contributor Author

Actually, maybe I can roll back the changes to the exclude generation which were aimed at reducing log noise; those can go in separately, and splitting them off will ease review.

@JoshRosen
Contributor Author

Reverted improvements, so the diff should be tiny. I'll now be able to confirm that the generated excludes match exactly.

@SparkQA

SparkQA commented Mar 11, 2016

Test build #52887 has finished for PR 11178 at commit 97b5d78.

  • This patch fails MiMa tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 11, 2016

Test build #52880 has finished for PR 11178 at commit 373fd52.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

[error]  * deprecated method mapPartitionsWithContext(scala.Function2,Boolean,scala.reflect.ClassTag)org.apache.spark.rdd.RDD in class org.apache.spark.rdd.RDD does not have a correspondent in new version
[error]    filter with: ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.rdd.RDD.mapPartitionsWithContext")
[error]  * synthetic method mapPartitionsWithContext$default$2()Boolean in class org.apache.spark.rdd.RDD does not have a correspondent in new version
[error]    filter with: ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.rdd.RDD.mapPartitionsWithContext$default$2")
[error]  * synthetic method org$apache$spark$deploy$history$HistoryServer$$detachSparkUI(org.apache.spark.ui.SparkUI)Unit in class org.apache.spark.deploy.history.HistoryServer does not have a correspondent in new version
[error]    filter with: ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$detachSparkUI")
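For context, failures like these are normally silenced by adding the suggested filters to project/MimaExcludes.scala; below is a hedged illustration using MiMa's ProblemFilters API of that era — the filter strings are copied verbatim from the output above, while the surrounding object is hypothetical:

```scala
import com.typesafe.tools.mima.core._

// Hypothetical container object; in Spark these entries live in
// project/MimaExcludes.scala. The strings come from the MiMa output above.
object ExampleMimaExcludes {
  val filters = Seq(
    ProblemFilters.exclude[MissingMethodProblem](
      "org.apache.spark.rdd.RDD.mapPartitionsWithContext"),
    ProblemFilters.exclude[MissingMethodProblem](
      "org.apache.spark.rdd.RDD.mapPartitionsWithContext$default$2"),
    ProblemFilters.exclude[MissingMethodProblem](
      "org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$detachSparkUI")
  )
}
```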

@JoshRosen
Contributor Author

Modified the code to give more detail in the error message:

Error instrumenting class:org.apache.spark.deploy.history.HistoryServer$
java.lang.AssertionError: assertion failed: no symbol could be loaded from class org.apache.spark.deploy.history.HistoryServer$ in package history with name HistoryServer$ and classloader sun.misc.Launcher$AppClassLoader@1b6d3586
    at scala.reflect.runtime.JavaMirrors$JavaMirror.scala$reflect$runtime$JavaMirrors$JavaMirror$$classToScala1(JavaMirrors.scala:1020)
    at scala.reflect.runtime.JavaMirrors$JavaMirror$$anonfun$classToScala$1.apply(JavaMirrors.scala:979)
    at scala.reflect.runtime.JavaMirrors$JavaMirror$$anonfun$classToScala$1.apply(JavaMirrors.scala:979)
    at scala.reflect.runtime.JavaMirrors$JavaMirror$$anonfun$toScala$1.apply(JavaMirrors.scala:97)
    at scala.reflect.runtime.TwoWayCaches$TwoWayCache$$anonfun$toScala$1.apply(TwoWayCaches.scala:39)
    at scala.reflect.runtime.Gil$class.gilSynchronized(Gil.scala:19)
    at scala.reflect.runtime.JavaUniverse.gilSynchronized(JavaUniverse.scala:16)
    at scala.reflect.runtime.TwoWayCaches$TwoWayCache.toScala(TwoWayCaches.scala:34)
    at scala.reflect.runtime.JavaMirrors$JavaMirror.toScala(JavaMirrors.scala:95)
    at scala.reflect.runtime.JavaMirrors$JavaMirror.classToScala(JavaMirrors.scala:979)
    at scala.reflect.runtime.JavaMirrors$JavaMirror.classSymbol(JavaMirrors.scala:196)
    at scala.reflect.runtime.JavaMirrors$JavaMirror.classSymbol(JavaMirrors.scala:54)
    at org.apache.spark.tools.GenerateMIMAIgnore$$anonfun$privateWithin$1.apply(GenerateMIMAIgnore.scala:71)
    at org.apache.spark.tools.GenerateMIMAIgnore$$anonfun$privateWithin$1.apply(GenerateMIMAIgnore.scala:69)
    at scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:322)
    at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:978)
    at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:978)
    at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:978)
    at org.apache.spark.tools.GenerateMIMAIgnore$.privateWithin(GenerateMIMAIgnore.scala:69)
    at org.apache.spark.tools.GenerateMIMAIgnore$.main(GenerateMIMAIgnore.scala:135)
    at org.apache.spark.tools.GenerateMIMAIgnore.main(GenerateMIMAIgnore.scala)

@JoshRosen
Contributor Author

As it turns out, I think we do need to pull in the transitive dependencies of the old Spark version: now that we're no longer getting transitive dependencies from the new Spark version (via spark-class), reflection fails without them.
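The AssertionError pasted earlier is that reflection failure; a minimal, hypothetical reproduction (class name taken from the stack trace, everything else assumed for illustration) would look like:

```scala
import scala.reflect.runtime.{universe => ru}

// Hypothetical reproduction of the failure mode: asking the runtime mirror
// for the Scala symbol of a module class (HistoryServer$). If classes it
// references transitively are absent from the classpath, classSymbol can
// die with the "no symbol could be loaded" AssertionError seen above.
object ReflectModuleClass {
  def main(args: Array[String]): Unit = {
    val mirror = ru.runtimeMirror(getClass.getClassLoader)
    val clazz = Class.forName("org.apache.spark.deploy.history.HistoryServer$")
    val symbol = mirror.classSymbol(clazz) // throws when symbols can't load
    println(symbol.fullName)
  }
}
```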

@JoshRosen
Contributor Author

Alright, MiMa passes, so I've gone ahead and merged with master to pull in the change to temporarily disable MiMa during the DF -> DS[Row] type aliasing / migration. Provided this passes compilation, I'm going to merge this to try to unblock other patches.

@SparkQA

SparkQA commented Mar 11, 2016

Test build #52882 has finished for PR 11178 at commit 5d32e74.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 11, 2016

Test build #52896 has finished for PR 11178 at commit 86cd513.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 11, 2016

Test build #52893 has finished for PR 11178 at commit 4070c0d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 11, 2016

Test build #52891 has finished for PR 11178 at commit f9e2b42.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

I've rolled back a bunch of changes and minimized this patch to a point where I think it's safe and uncontroversial, so I'm going to merge this into master now so that it doesn't conflict. The changes here should come in handy when we work towards re-enabling the MiMa tests which were disabled in the DF-to-DS migration (/cc @marmbrus @liancheng).

@asfgit asfgit closed this in 6ca990f Mar 11, 2016
@JoshRosen JoshRosen deleted the remove-assembly-in-run-tests branch March 11, 2016 07:32
@liancheng
Contributor

@JoshRosen The MiMa check is re-enabled in PR #11656.

@nchammas
Contributor

For some reason, this PR breaks the following invocation:

./dev/make-distribution.sh -T 1C -Phadoop-2.6

The problem appears to be with this line:

SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
    | grep -v "INFO"\
    | tail -n 1)

which outputs this when run:

+ SCALA_VERSION='[ERROR] Re-run Maven using the -X switch to enable full debug logging.'

Removing the -T 1C fixes it, for some reason.

Any ideas why this PR is interfering with the additional flags passed to Maven?

@nchammas
Contributor

Looking at Maven's debug output, it looks like there are some project structure changes that interfere with Maven's ability to do parallel builds.

See: nchammas/flintrock#93 (comment)

Given that, I'm guessing there's nothing to do here since those project changes are probably not worth rejiggering just to get parallel builds working again.

roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
[SPARK-13294] [PROJECT INFRA] Remove MiMa's dependency on spark-class / Spark assembly

This patch removes the need to build a full Spark assembly before running the `dev/mima` script.

- I modified the `tools` project to remove a direct dependency on Spark, so `sbt/sbt tools/fullClasspath` will now return the classpath for the `GenerateMIMAIgnore` class itself plus its own dependencies.
   - This required me to delete two classes full of dead code that we don't use anymore.
- `GenerateMIMAIgnore` now uses [ClassUtil](http://software.clapper.org/classutil/) to find all of the Spark classes rather than our homemade JAR traversal code. The problem in our own code was that it didn't handle folders of classes properly, which is necessary in order to generate excludes with an assembly-free Spark build.
- `./dev/mima` no longer runs through `spark-class`, eliminating the need to reason about classpath ordering between `SPARK_CLASSPATH` and the assembly.

Author: Josh Rosen <joshrosen@databricks.com>

Closes apache#11178 from JoshRosen/remove-assembly-in-run-tests.