
[SPARK-2454] Do not assume drivers and executors share the same Spark home #1472

Closed
wants to merge 13 commits

Conversation

andrewor14
Contributor

Problem. When standalone Workers launch executors, they inherit the Spark home set by the driver. This means if the worker machines do not share the same directory structure as the driver node, the Workers will attempt to run scripts (e.g. bin/compute-classpath.sh) that do not exist locally and fail. This is a common scenario if the driver is launched from outside of the cluster.

Solution. Simply do not pass the driver's Spark home to the Workers. Note that we should still keep the functionality to optionally send some Spark home to the Workers, in case there are multiple installations of Spark on the worker machines and the application wants to pick among them.

Spark config changes.

  • spark.home - This is removed and deprecated. The motivation is that this is currently used for 3+ different things and is often confused with SPARK_HOME.
  • spark.executor.home - This is the Spark home that the executors will use. If this is not set, the Worker will use its own current working directory. This is not set by default.
  • spark.driver.home - Same as above, but for the driver. This is only relevant for standalone-cluster mode (not yet supported. See SPARK-2260).
  • spark.test.home - This is the Spark home used only for tests.

Note: #1392 proposes part of the solution described here.
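For illustration, the proposed keys could be supplied like any other Spark property, e.g. in `spark-defaults.conf`. The paths below are hypothetical, and since this PR was ultimately closed, these keys never shipped in this form:

```
spark.executor.home   /opt/spark-1.1
spark.driver.home     /opt/spark-1.1
```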

This allows the worker to launch a driver or an executor from a
different installation of Spark on the same machine. To do so, the
user needs to set "spark.executor.home" and/or "spark.driver.home".

Note that this was already possible for the executors even before
this commit. However, it used to rely on "spark.home", which was
also used for 20 other things. The next step is to remove all usages
of "spark.home", which was confusing to many users (myself included).
This involves replacing spark.home with spark.test.home in tests.
Looks like python still uses spark.home, however. The next commit
will fix this.
This is because we cannot deprecate these constructors easily...
... because the only mode that uses spark.driver.home right now is
standalone-cluster, which is broken (SPARK-2260). It makes little
sense to document that this feature exists on a mode that is broken.
@SparkQA

SparkQA commented Jul 17, 2014

QA tests have started for PR 1472. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16794/consoleFull

@SparkQA

SparkQA commented Jul 18, 2014

QA results for PR 1472:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16794/consoleFull

@andrewor14 andrewor14 changed the title [SPARK-2454] Do not assume drivers and executors share the same Spark home [WIP][SPARK-2454] Do not assume drivers and executors share the same Spark home Jul 18, 2014
@SparkQA

SparkQA commented Jul 21, 2014

QA tests have started for PR 1472. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16923/consoleFull

@SparkQA

SparkQA commented Jul 21, 2014

QA results for PR 1472:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16923/consoleFull

@andrewor14 andrewor14 changed the title [WIP][SPARK-2454] Do not assume drivers and executors share the same Spark home [SPARK-2454] Do not assume drivers and executors share the same Spark home Jul 22, 2014
@SparkQA

SparkQA commented Jul 22, 2014

QA tests have started for PR 1472. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16938/consoleFull

@andrewor14 andrewor14 closed this Jul 22, 2014
@andrewor14
Contributor Author

Oops, accidentally closed. Please disregard.

@andrewor14 andrewor14 reopened this Jul 22, 2014
@andrewor14
Contributor Author

I have tested this on a standalone cluster, purposefully changing the directory structure of the driver to be different from that of the executors. I confirmed that the workers now use their own local directory to launch the executors. I also tested setting spark.executor.home to both a valid path and a bogus path; as expected, the application with the former ran successfully, while the one with the latter failed.

@SparkQA

SparkQA commented Jul 22, 2014

QA results for PR 1472:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16938/consoleFull

@pwendell
Contributor

Jenkins, retest this please.

@@ -69,13 +69,16 @@ private class ClientActor(driverArgs: ClientArguments, conf: SparkConf) extends
val javaOpts = sys.props.get(javaOptionsConf)
val command = new Command(mainClass, Seq("{{WORKER_URL}}", driverArgs.mainClass) ++
driverArgs.driverOptions, env, classPathEntries, libraryPathEntries, javaOpts)
// TODO: document this once standalone-cluster mode is fixed (SPARK-2260)
Contributor

Does this get updated now?

@SparkQA

SparkQA commented Jul 30, 2014

QA tests have started for PR 1472. This patch DID NOT merge cleanly!
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17462/consoleFull

@SparkQA

SparkQA commented Jul 30, 2014

QA results for PR 1472:
- This patch PASSES unit tests.

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17462/consoleFull

@@ -121,7 +121,9 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging {
* Set the location where Spark is installed on worker nodes.
*/
def setSparkHome(home: String): SparkConf = {
Contributor

Maybe should mark this as deprecated too?

@andrewor14
Contributor Author

UPDATE: I had a conversation with @pwendell about this. We came to the conclusion that there is really no benefit from having a mechanism to specify an executor home, at least for standalone mode. Even if we have multiple installations of Spark on the worker machines, we can pick which one to connect to by simply specifying a different Master. In either case, we should just use the Worker's current working directory as the executor's (or driver's, in the case of standalone-cluster mode) Spark home.

I will make the relevant changes shortly. If I don't get to it by the 1.1 code freeze, we should just merge in #1392 instead.
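The resolution rule settled on above can be sketched in a few lines. This is a minimal illustration (written in Python rather than the actual Scala Worker code, and `resolve_executor_home` is a hypothetical name, not a Spark API): prefer `spark.executor.home` when the application sets it, otherwise fall back to the Worker's own working directory.

```python
def resolve_executor_home(conf: dict, worker_dir: str) -> str:
    # `conf` stands in for the application's SparkConf properties;
    # `worker_dir` for the Worker's current working directory.
    return conf.get("spark.executor.home", worker_dir)

# The application pinned a specific installation:
print(resolve_executor_home({"spark.executor.home": "/opt/spark-1.1"}, "/var/spark/work"))
# Nothing set: the Worker falls back to its own directory:
print(resolve_executor_home({}, "/var/spark/work"))
```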

Conflicts:
	core/src/main/scala/org/apache/spark/deploy/Client.scala
	core/src/main/scala/org/apache/spark/deploy/client/TestClient.scala
	core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala
	core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
	core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
	core/src/test/scala/org/apache/spark/deploy/JsonProtocolSuite.scala
	core/src/test/scala/org/apache/spark/deploy/worker/ExecutorRunnerTest.scala
	python/pyspark/conf.py
@andrewor14 andrewor14 closed this Aug 2, 2014
@SparkQA

SparkQA commented Aug 2, 2014

QA tests have started for PR 1472. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17741/consoleFull

@SparkQA

SparkQA commented Aug 2, 2014

QA results for PR 1472:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17741/consoleFull

@andrewor14
Contributor Author

Closing this in favor of #1734. Please disregard this PR.

@andrewor14 andrewor14 deleted the spark-home branch August 2, 2014 05:24
4 participants