[SPARK-13294] [PROJECT INFRA] Remove MiMa's dependency on spark-class / Spark assembly #11178

Closed
JoshRosen wants to merge 25 commits
Changes from 7 commits
Commits (25)
73b1550  [BUILD] Remove assembly/assembly step from dev/run-tests. (JoshRosen, Feb 12, 2016)
b6f1ce8  Also remove from Maven build step. (JoshRosen, Feb 12, 2016)
5528c48  Set SPARK_PREPEND_CLASSES in dev/mima. (JoshRosen, Feb 12, 2016)
bef62eb  Use 1 instead of true (JoshRosen, Feb 12, 2016)
76a365e  More attempts towards fixing MiMa. (JoshRosen, Feb 12, 2016)
9fc0f7a  Fix heap sizing. (JoshRosen, Feb 13, 2016)
31854eb  Try using classutil to find Spark classes. (JoshRosen, Feb 13, 2016)
906d8c8  Fix PySpark tests by setting SPARK_DIST_CLASSPATH (JoshRosen, Feb 17, 2016)
1f995a4  Merge remote-tracking branch 'origin/master' into remove-assembly-in-… (JoshRosen, Mar 1, 2016)
902b1b7  Update to fix classpaths in MiMA. (JoshRosen, Mar 1, 2016)
a73118e  Update to reflect #11426 (JoshRosen, Mar 1, 2016)
64330d6  Second shot at fixing MiMa (JoshRosen, Mar 2, 2016)
9a122ad  Merge remote-tracking branch 'origin/master' into remove-assembly-in-… (JoshRosen, Mar 10, 2016)
db8e532  Update AbstractCommandBuilder to reflect recent relocations (JoshRosen, Mar 10, 2016)
cd7eb04  Just grab fullClasspath from assembly in MiMa, since we only run MiMA… (JoshRosen, Mar 10, 2016)
4756a1e  Remove now-unused o.a.s.tools code from SparkClassCommandBuilder (JoshRosen, Mar 10, 2016)
ae6b002  Fix help command of ./python/run-tests (JoshRosen, Mar 10, 2016)
dae4725  Bump to working version of sbt-dependency-graph to aid debugging (JoshRosen, Mar 10, 2016)
373fd52  Roll back non-MiMa-related changes (they'll go in later). (JoshRosen, Mar 11, 2016)
8ec1cc4  Revert "Bump to working version of sbt-dependency-graph to aid debugg… (JoshRosen, Mar 11, 2016)
5d32e74  Continue to build assembly for tests, but skip it before MiMa. (JoshRosen, Mar 11, 2016)
97b5d78  Revert attempts at improving MiMa code to minimize diff (will add bac… (JoshRosen, Mar 11, 2016)
f9e2b42  Turns out that we do need transitive deps. (JoshRosen, Mar 11, 2016)
4070c0d  Merge remote-tracking branch 'origin/master' into remove-assembly-in-… (JoshRosen, Mar 11, 2016)
86cd513  We can't use .tree.tpe due to Scala 2.10. (JoshRosen, Mar 11, 2016)
dev/mima (19 changes: 11 additions & 8 deletions)
@@ -27,21 +27,24 @@ cd "$FWDIR"
echo -e "q\n" | build/sbt oldDeps/update
rm -f .generated-mima*

generate_mima_ignore() {
SPARK_JAVA_OPTS="-XX:MaxPermSize=1g -Xmx2g" \
./bin/spark-class org.apache.spark.tools.GenerateMIMAIgnore
}

TOOLS_CLASSPATH="$(build/sbt "export tools/fullClasspath" | tail -n1)"

# Generate Mima Ignore is called twice, first with latest built jars
# on the classpath and then again with previous version jars on the classpath.
# Because of a bug in GenerateMIMAIgnore that when old jars are ahead on classpath
# it did not process the new classes (which are in assembly jar).
generate_mima_ignore

export SPARK_CLASSPATH="$(build/sbt "export oldDeps/fullClasspath" | tail -n1)"
echo "SPARK_CLASSPATH=$SPARK_CLASSPATH"
generate_mima_ignore() {
java \
-XX:MaxPermSize=1g \
-Xmx2g \
-cp "$TOOLS_CLASSPATH:$1" \
org.apache.spark.tools.GenerateMIMAIgnore
}

generate_mima_ignore
generate_mima_ignore "$(build/sbt "export assembly/fullClasspath" | tail -n1)"
generate_mima_ignore "$(build/sbt "export oldDeps/fullClasspath" | tail -n1)"

echo -e "q\n" | build/sbt mima-report-binary-issues | grep -v -e "info.*Resolving"
ret_val=$?
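On the classpath question raised in the review thread further down: dev/mima now gets its classpath from sbt's export command instead of from bin/spark-class. As a rough, hand-rolled approximation of what `build/sbt "export tools/fullClasspath"` prints, here is an illustrative sbt (Scala) task; the task name is made up and this is not code from the PR:

import java.io.File

// Illustrative task: print the project's resolved classpath as a single
// path-separator-joined line, which is roughly what `export .../fullClasspath`
// emits and what dev/mima captures with `tail -n1` and passes to `java -cp`.
lazy val printFullClasspath = taskKey[Unit]("Print this project's full classpath")

printFullClasspath := {
  val cp = (fullClasspath in Compile).value  // Seq[Attributed[File]]
  println(cp.map(_.data.getAbsolutePath).mkString(File.pathSeparator))
}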
dev/run-tests.py (3 changes: 1 addition & 2 deletions)
@@ -323,7 +323,7 @@ def get_hadoop_profiles(hadoop_version):
def build_spark_maven(hadoop_version):
# Enable all of the profiles for the build:
build_profiles = get_hadoop_profiles(hadoop_version) + modules.root.build_profile_flags
mvn_goals = ["clean", "package", "-DskipTests"]
mvn_goals = ["clean", "package", "-DskipTests", "-pl", "!assembly"]
profiles_and_goals = build_profiles + mvn_goals

print("[info] Building Spark (w/Hive 1.2.1) using Maven with these arguments: ",
@@ -336,7 +336,6 @@ def build_spark_sbt(hadoop_version):
# Enable all of the profiles for the build:
build_profiles = get_hadoop_profiles(hadoop_version) + modules.root.build_profile_flags
sbt_goals = ["package",
"assembly/assembly",
"streaming-kafka-assembly/assembly",
Member: A full assembly is no longer needed? How do you configure the classpath?

Member: Understood, for tests assembly is no longer needed.
"streaming-flume-assembly/assembly",
"streaming-mqtt-assembly/assembly",
project/SparkBuild.scala (19 changes: 10 additions & 9 deletions)
@@ -381,18 +381,19 @@ object OldDeps {

lazy val project = Project("oldDeps", file("dev"), settings = oldDepsSettings)

def versionArtifact(id: String): Option[sbt.ModuleID] = {
val fullId = id + "_2.11"
Some("org.apache.spark" % fullId % "1.2.0")
}

def oldDepsSettings() = Defaults.coreDefaultSettings ++ Seq(
name := "old-deps",
scalaVersion := "2.10.5",
libraryDependencies := Seq("spark-streaming-mqtt", "spark-streaming-zeromq",
"spark-streaming-flume", "spark-streaming-twitter",
"spark-streaming", "spark-mllib", "spark-graphx",
"spark-core").map(versionArtifact(_).get intransitive())
libraryDependencies := Seq(
"spark-streaming-mqtt",
"spark-streaming-zeromq",
"spark-streaming-flume",
"spark-streaming-twitter",
"spark-streaming",
"spark-mllib",
"spark-graphx",
"spark-core"
).map(id => "org.apache.spark" % (id + "_2.11") % "1.2.0")
Member: intransitive at the end is no longer needed?

Contributor (author): I'm going to post a larger followup in a few minutes with diffs to explain this fully, but in a nutshell GenerateMIMAIgnore was hitting many spurious errors because it couldn't load / reflect on classes since their dependencies were missing; pulling in these transitive deps fixes that.
)
}

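To illustrate the point made in the exchange above, a minimal sbt sketch (illustrative fragments, not code from this PR) contrasting the two resolution modes:

// Old style: fetch only the listed artifact's jar; its own dependencies are
// not resolved, so reflecting on its classes can hit missing-class errors.
libraryDependencies += ("org.apache.spark" % "spark-core_2.11" % "1.2.0").intransitive()

// New style: also resolve the artifact's transitive dependency graph, so
// GenerateMIMAIgnore can load and inspect every class it encounters.
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "1.2.0"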
tools/pom.xml (15 changes: 5 additions & 10 deletions)
@@ -34,16 +34,6 @@
<url>http://spark.apache.org/</url>

<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-reflect</artifactId>
@@ -52,6 +42,11 @@
<groupId>org.scala-lang</groupId>
<artifactId>scala-compiler</artifactId>
</dependency>
<dependency>
<groupId>org.clapper</groupId>
<artifactId>classutil_${scala.binary.version}</artifactId>
<version>1.0.6</version>
</dependency>
</dependencies>

<build>
tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala
@@ -18,15 +18,13 @@
// scalastyle:off classforname
package org.apache.spark.tools

import java.io.File
import java.util.jar.JarFile

import scala.collection.mutable
import scala.collection.JavaConverters._
import scala.reflect.runtime.{universe => unv}
import scala.reflect.runtime.universe.runtimeMirror
import scala.util.Try

import org.clapper.classutil.ClassFinder

/**
* A tool for generating classes to be excluded during binary checking with MIMA. It is expected
* that this tool is run with ./spark-class.
@@ -42,12 +40,13 @@ object GenerateMIMAIgnore {
private val classLoader = Thread.currentThread().getContextClassLoader
private val mirror = runtimeMirror(classLoader)

private def isDeveloperApi(sym: unv.Symbol) = sym.annotations.exists {
_.tpe =:= mirror.staticClass("org.apache.spark.annotation.DeveloperApi").toType
}

private def isDeveloperApi(sym: unv.Symbol) =
sym.annotations.exists(_.tpe =:= unv.typeOf[org.apache.spark.annotation.DeveloperApi])

private def isExperimental(sym: unv.Symbol) =
sym.annotations.exists(_.tpe =:= unv.typeOf[org.apache.spark.annotation.Experimental])
private def isExperimental(sym: unv.Symbol) = sym.annotations.exists {
_.tpe =:= mirror.staticClass("org.apache.spark.annotation.Experimental").toType
}


private def isPackagePrivate(sym: unv.Symbol) =
@@ -160,35 +159,13 @@ object GenerateMIMAIgnore {
* and subpackages both from directories and jars present on the classpath.
*/
private def getClasses(packageName: String): Set[String] = {
val path = packageName.replace('.', '/')
val resources = classLoader.getResources(path)

val jars = resources.asScala.filter(_.getProtocol == "jar")
.map(_.getFile.split(":")(1).split("!")(0)).toSeq

jars.flatMap(getClassesFromJar(_, path))
.map(_.getName)
.filterNot(shouldExclude).toSet
}

/**
* Get all classes in a package from a jar file.
*/
private def getClassesFromJar(jarPath: String, packageName: String) = {
import scala.collection.mutable
val jar = new JarFile(new File(jarPath))
val enums = jar.entries().asScala.map(_.getName).filter(_.startsWith(packageName))
val classes = mutable.HashSet[Class[_]]()
for (entry <- enums if entry.endsWith(".class")) {
try {
classes += Class.forName(entry.replace('/', '.').stripSuffix(".class"), false, classLoader)
} catch {
// scalastyle:off println
case _: Throwable => println("Unable to load:" + entry)
// scalastyle:on println
}
}
classes
val finder = ClassFinder()
finder
Contributor (author): These changes are just part of switching us to use ClassUtil for finding the Spark classes.
.getClasses
.map(_.name)
.filter(_.startsWith(packageName))
.filterNot(shouldExclude)
.toSet
}
}
// scalastyle:on classforname
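For reference, a small self-contained sketch of the ClassFinder usage adopted above (assuming classutil 1.0.6 as added in tools/pom.xml; the object name and package prefix are only examples, not part of the PR):

import org.clapper.classutil.ClassFinder

object FindSparkClasses {
  def main(args: Array[String]): Unit = {
    // With no arguments, ClassFinder scans the JVM's current classpath.
    val finder = ClassFinder()
    val sparkClasses = finder.getClasses
      .map(_.name)
      .filter(_.startsWith("org.apache.spark"))
    // Print a small sample of what GenerateMIMAIgnore would see.
    sparkClasses.take(10).foreach(println)
  }
}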