[SPARK-4048] Enhance and extend hadoop-provided profile. #2982

Closed
wants to merge 28 commits into from

Changes from 24 commits
284dda6 Rework the "hadoop-provided" profile, add new ones. (Oct 20, 2014)
1adf91c Re-enable maven-install-plugin for a few projects. (Oct 21, 2014)
2f95f0d Propagate classpath to child processes during testing. (Oct 23, 2014)
417d90e Introduce "SPARK_DIST_CLASSPATH". (Oct 23, 2014)
4d67469 Propagate SPARK_DIST_CLASSPATH on Yarn. (Oct 27, 2014)
d928d62 Redirect child stderr to parent's log. (Oct 28, 2014)
9e4e001 Remove duplicate hive profile. (Nov 4, 2014)
f7b3bbe Add snappy to hadoop-provided list. (Nov 4, 2014)
1fc4d0b Update dependencies for hive-thriftserver. (Nov 5, 2014)
5c54a25 Fix HiveThriftServer2Suite with *-provided profiles. (Nov 6, 2014)
82a54b9 Remove unused profile. (Nov 12, 2014)
d1399ed Restore jetty dependency. (Nov 12, 2014)
1be73d4 Restore flume-provided profile. (Nov 15, 2014)
7820d58 Fix CliSuite with provided profiles. (Nov 17, 2014)
e3ab2da Fix hive-thriftserver profile. (Nov 17, 2014)
115fde5 Simplify a comment (and make it consistent with another pom). (Nov 17, 2014)
9640503 Cleanup child process log message. (Nov 17, 2014)
8b00b6a Merge branch 'master' into SPARK-4048 (Nov 19, 2014)
f24e9e7 Merge branch 'master' into SPARK-4048 (Nov 19, 2014)
322f882 Fix merge fail. (Nov 20, 2014)
7377e7b Merge branch 'master' into SPARK-4048 (Dec 9, 2014)
83099fc Merge branch 'master' into SPARK-4048 (Dec 24, 2014)
52f366d Merge branch 'master' into SPARK-4048 (Jan 6, 2015)
371ebee Review feedback. (Jan 6, 2015)
9ef79a3 Alternative way to propagate test classpath to child processes. (Jan 8, 2015)
4e38f4e Merge branch 'master' into SPARK-4048 (Jan 8, 2015)
eb228c0 Fix borked merge. (Jan 8, 2015)
82eb688 Add a comment. (Jan 8, 2015)
20 changes: 20 additions & 0 deletions assembly/pom.xml
@@ -354,5 +354,25 @@
</dependency>
</dependencies>
</profile>

<!-- Profiles that disable inclusion of certain dependencies. -->
<profile>
<id>hadoop-provided</id>
<properties>
<hadoop.deps.scope>provided</hadoop.deps.scope>
</properties>
</profile>
<profile>
<id>hive-provided</id>
<properties>
<hive.deps.scope>provided</hive.deps.scope>
</properties>
</profile>
<profile>
<id>parquet-provided</id>
<properties>
<parquet.deps.scope>provided</parquet.deps.scope>
</properties>
</profile>
</profiles>
</project>
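
For context, these profiles are meant to be activated at build time so that the listed dependencies are excluded from the assembly and supplied by the host system instead. A minimal sketch of such a build invocation (the exact profile list and any Hadoop version flags depend on the target environment and are illustrative here):

  # Build an assembly that expects Hadoop, Hive and Parquet jars to be provided at runtime.
  # Illustrative command only; add -Pyarn, -Dhadoop.version=..., etc. as your deployment requires.
  mvn -Phadoop-provided -Phive-provided -Pparquet-provided -DskipTests clean package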
4 changes: 0 additions & 4 deletions bagel/pom.xml
@@ -40,10 +40,6 @@
<artifactId>spark-core_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-server</artifactId>
</dependency>
<dependency>
<groupId>org.scalacheck</groupId>
<artifactId>scalacheck_${scala.binary.version}</artifactId>
7 changes: 7 additions & 0 deletions bin/compute-classpath.cmd
@@ -109,6 +109,13 @@ if "x%YARN_CONF_DIR%"=="x" goto no_yarn_conf_dir
set CLASSPATH=%CLASSPATH%;%YARN_CONF_DIR%
:no_yarn_conf_dir

rem To allow for distributions to append needed libraries to the classpath (e.g. when
rem using the "hadoop-provided" profile to build Spark), check SPARK_DIST_CLASSPATH and
rem append it to the final classpath.
if not "x%SPARK_DIST_CLASSPATH%"=="x" (
Contributor:

Can distributions instead just set spark.executor.extraClassPath and spark.driver.extraClassPath? We've intentionally removed/deprecated this type of brute force env variable, and also, it would be good to have only one way of doing this.

set CLASSPATH=%CLASSPATH%;%SPARK_DIST_CLASSPATH%
)

rem A bit of a hack to allow calling this script within run2.cmd without seeing output
if "%DONT_PRINT_CLASSPATH%"=="1" goto exit

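For reference, the alternative suggested in the review comment above (reusing the existing extraClassPath settings instead of a new environment variable) would look roughly like the following. This is a hedged sketch: the application class and jar are hypothetical, and `hadoop classpath` is only one way of obtaining the external jars.

  HADOOP_CP="$(hadoop classpath)"
  ./bin/spark-submit \
    --conf spark.driver.extraClassPath="$HADOOP_CP" \
    --conf spark.executor.extraClassPath="$HADOOP_CP" \
    --class org.example.MyApp /path/to/myapp.jar   # hypothetical application and jar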
7 changes: 7 additions & 0 deletions bin/compute-classpath.sh
@@ -146,4 +146,11 @@ if [ -n "$YARN_CONF_DIR" ]; then
CLASSPATH="$CLASSPATH:$YARN_CONF_DIR"
fi

# To allow for distributions to append needed libraries to the classpath (e.g. when
# using the "hadoop-provided" profile to build Spark), check SPARK_DIST_CLASSPATH and
# append it to the final classpath.
if [ -n "$SPARK_DIST_CLASSPATH" ]; then
CLASSPATH="$CLASSPATH:$SPARK_DIST_CLASSPATH"
fi

echo "$CLASSPATH"
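
With this hook in place, a distribution built with the *-provided profiles can populate SPARK_DIST_CLASSPATH from the local Hadoop installation before launching anything; a minimal sketch, assuming the `hadoop` launcher is on the PATH:

  # compute-classpath.sh appends whatever is in SPARK_DIST_CLASSPATH to the final classpath.
  export SPARK_DIST_CLASSPATH="$(hadoop classpath)"
  ./sbin/start-master.sh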
@@ -55,19 +55,26 @@ private[spark] class SparkDeploySchedulerBackend(
"{{WORKER_URL}}")
val extraJavaOpts = sc.conf.getOption("spark.executor.extraJavaOptions")
.map(Utils.splitCommandString).getOrElse(Seq.empty)
val classPathEntries = sc.conf.getOption("spark.executor.extraClassPath").toSeq.flatMap { cp =>
cp.split(java.io.File.pathSeparator)
}
val libraryPathEntries =
sc.conf.getOption("spark.executor.extraLibraryPath").toSeq.flatMap { cp =>
cp.split(java.io.File.pathSeparator)
val classPathEntries = sc.conf.getOption("spark.executor.extraClassPath")
.map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil)
val libraryPathEntries = sc.conf.getOption("spark.executor.extraLibraryPath")
.map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil)

// When testing, expose the parent class path to the child. This is processed by
// compute-classpath.{cmd,sh} and makes all needed jars available to child processes
// when the assembly is built with the "*-provided" profiles enabled.
val testingClassPath =
if (sys.props.contains("spark.testing")) {
sys.props("java.class.path").split(java.io.File.pathSeparator).toSeq
} else {
Nil
}

// Start executors with a few necessary configs for registering with the scheduler
val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf)
val javaOpts = sparkJavaOpts ++ extraJavaOpts
val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
args, sc.executorEnvs, classPathEntries, libraryPathEntries, javaOpts)
args, sc.executorEnvs, classPathEntries ++ testingClassPath, libraryPathEntries, javaOpts)
val appUIAddress = sc.ui.map(_.appUIAddress).getOrElse("")
val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory, command,
appUIAddress, sc.eventLogDir)
12 changes: 10 additions & 2 deletions core/src/main/scala/org/apache/spark/util/Utils.scala
@@ -990,11 +990,19 @@
for ((key, value) <- extraEnvironment) {
environment.put(key, value)
}

// When testing, expose the parent class path to the child. This is processed by
// compute-classpath.{cmd,sh} and makes all needed jars available to child processes
// when the assembly is built with the "*-provided" profiles enabled.
if (sys.props.contains("spark.testing")) {
environment.put("SPARK_DIST_CLASSPATH", sys.props("java.class.path"))
Contributor:

Is this needed in all cases, or only when you are actually running tests with a "hadoop provided" build and you've supplied your own SPARK_DIST_CLASSPATH for the test JVM? If the latter is true, can we just propagate the value of SPARK_DIST_CLASSPATH to the child from the parent? It's a bit weird here because you're setting SPARK_DIST_CLASSPATH to also include all of the normal spark classes.

Then you would just check if SPARK_DIST_CLASSPATH is set and if so, you'd propagate its value to the child environment.

Contributor Author:

I used to run into problems with tests spawning child processes that referenced hadoop classes, and those not being available when the "hadoop-provided" profile was enabled. Let me try without this and see how it goes, since I noticed some changes in the configuration of the test plugins in the root pom.

Contributor:

Even if it is still a problem, my question was, can we just set SPARK_DIST_CLASSPATH in the child to the value of SPARK_DIST_CLASSPATH in the parent? This is instead of setting it to the value of java.class.path.

Contributor Author:

But SPARK_DIST_CLASSPATH is not set in the parent (the parent being the surefire/scalatest runner, in this case). The goal of this code is to add the classpath built by those (which includes all the dependencies from the pom) to the child processes without having to modify the test code to manually do it.

}

val process = builder.start()
new Thread("read stderr for " + command(0)) {
override def run() {
for (line <- Source.fromInputStream(process.getErrorStream).getLines()) {
System.err.println(line)
logInfo(line)
}
}
}.start()
@@ -1089,7 +1097,7 @@
var firstUserLine = 0
var insideSpark = true
var callStack = new ArrayBuffer[String]() :+ "<unknown>"

Thread.currentThread.getStackTrace().foreach { ste: StackTraceElement =>
// When running under some profilers, the current stack trace might contain some bogus
// frames. This is intended to ensure that we don't crash in these situations by
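Taken together with the SparkDeploySchedulerBackend change above, the effect during tests is that a child process launched through spark-class sees the parent's full test classpath via the same SPARK_DIST_CLASSPATH hook. Conceptually it amounts to something like the following (the jar paths and the master value are purely illustrative):

  # What the child launch amounts to when spark.testing is set in the parent JVM:
  SPARK_DIST_CLASSPATH="/illustrative/path/hadoop-client.jar:/illustrative/path/hive-exec.jar" \
    ./bin/spark-class org.apache.spark.DriverWithoutCleanup local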
2 changes: 1 addition & 1 deletion core/src/test/scala/org/apache/spark/DriverSuite.scala
@@ -35,7 +35,7 @@ class DriverSuite extends FunSuite with Timeouts {
forAll(masters) { (master: String) =>
failAfter(60 seconds) {
Utils.executeAndGetOutput(
Seq("./bin/spark-class", "org.apache.spark.DriverWithoutCleanup", master),
Seq(s"$sparkHome/bin/spark-class", "org.apache.spark.DriverWithoutCleanup", master),
new File(sparkHome),
Map("SPARK_TESTING" -> "1", "SPARK_HOME" -> sparkHome))
}