
[SPARK-2454] Do not assume drivers and executors share the same Spark home #1472

Closed
wants to merge 13 commits

Conversation

andrewor14
Contributor

Problem. When standalone Workers launch executors, they inherit the Spark home set by the driver. This means if the worker machines do not share the same directory structure as the driver node, the Workers will attempt to run scripts (e.g. bin/compute-classpath.sh) that do not exist locally and fail. This is a common scenario if the driver is launched from outside of the cluster.

Solution. Simply do not pass the driver's Spark home to the Workers. Note that we should still keep the functionality to optionally send some Spark home to the Workers, in case there are multiple installations of Spark on the worker machines and the application wants to pick among them.

Spark config changes.

  • spark.home - This is removed and deprecated. The motivation is that this is currently used for 3+ different things and is often confused with SPARK_HOME.
  • spark.executor.home - This is the Spark home that the executors will use. If this is not set, the Worker will use its own current working directory. This is not set by default.
  • spark.driver.home - Same as above, but for the driver. This is only relevant for standalone-cluster mode (not yet supported. See SPARK-2260).
  • spark.test.home - This is the Spark home used only for tests.

Note: #1392 proposes part of the solution described here.
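For illustration, the proposed keys could be supplied like any other Spark property, e.g. in `spark-defaults.conf`. The paths below are hypothetical, and since this PR was ultimately closed, these keys never shipped in this form:

```
spark.executor.home   /opt/spark-1.1
spark.driver.home     /opt/spark-1.1
```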

This allows the worker to launch a driver or an executor from a
different installation of Spark on the same machine. To do so, the
user needs to set "spark.executor.home" and/or "spark.driver.home".

Note that this was already possible for the executors even before
this commit. However, it used to rely on "spark.home", which was
also used for 20 other things. The next step is to remove all usages
of "spark.home", which was confusing to many users (myself included).
This involves replacing spark.home with spark.test.home in tests.
Looks like python still uses spark.home, however. The next commit
will fix this.
This is because we cannot deprecate these constructors easily...
... because the only mode that uses spark.driver.home right now is
standalone-cluster, which is broken (SPARK-2260). It makes little
sense to document that this feature exists on a mode that is broken.
@SparkQA

SparkQA commented Jul 17, 2014

QA tests have started for PR 1472. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16794/consoleFull

@SparkQA

SparkQA commented Jul 18, 2014

QA results for PR 1472:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16794/consoleFull

@andrewor14 andrewor14 changed the title [SPARK-2454] Do not assume drivers and executors share the same Spark home [WIP][SPARK-2454] Do not assume drivers and executors share the same Spark home Jul 18, 2014
@SparkQA

SparkQA commented Jul 21, 2014

QA tests have started for PR 1472. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16923/consoleFull

@SparkQA

SparkQA commented Jul 21, 2014

QA results for PR 1472:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16923/consoleFull

@andrewor14 andrewor14 changed the title [WIP][SPARK-2454] Do not assume drivers and executors share the same Spark home [SPARK-2454] Do not assume drivers and executors share the same Spark home Jul 22, 2014
@SparkQA

SparkQA commented Jul 22, 2014

QA tests have started for PR 1472. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16938/consoleFull

@andrewor14 andrewor14 closed this Jul 22, 2014
@andrewor14
Contributor Author

Oops, accidentally closed. Please disregard.

@andrewor14 andrewor14 reopened this Jul 22, 2014
@andrewor14
Contributor Author

I have tested this on a standalone cluster, purposefully changing the directory structure of the driver to be different from that of the executors. I confirmed that the workers now use their own local directory to launch the executors. I also tested setting spark.executor.home to both a valid path and a bogus path; as expected, the application with the former ran successfully, while the one with the latter failed.

@SparkQA

SparkQA commented Jul 22, 2014

QA results for PR 1472:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16938/consoleFull

@pwendell
Contributor

Jenkins, retest this please.

@@ -69,13 +69,16 @@ private class ClientActor(driverArgs: ClientArguments, conf: SparkConf) extends
val javaOpts = sys.props.get(javaOptionsConf)
val command = new Command(mainClass, Seq("{{WORKER_URL}}", driverArgs.mainClass) ++
driverArgs.driverOptions, env, classPathEntries, libraryPathEntries, javaOpts)
// TODO: document this once standalone-cluster mode is fixed (SPARK-2260)
Contributor

Does this get updated now?

@SparkQA

SparkQA commented Jul 30, 2014

QA tests have started for PR 1472. This patch DID NOT merge cleanly!
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17462/consoleFull

@SparkQA

SparkQA commented Jul 30, 2014

QA results for PR 1472:
- This patch PASSES unit tests.

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17462/consoleFull

@@ -121,7 +121,9 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging {
* Set the location where Spark is installed on worker nodes.
*/
def setSparkHome(home: String): SparkConf = {
Contributor

Maybe should mark this as deprecated too?

@andrewor14
Contributor Author

UPDATE: I had a conversation with @pwendell about this. We came to the conclusion that there is really no benefit from having a mechanism to specify an executor home, at least for standalone mode. Even if we have multiple installations of Spark on the worker machines, we can pick which one to connect to by simply specifying a different Master. In either case, we should just use the Worker's current working directory as the executor's (or driver's, in the case of standalone-cluster mode) Spark home.

I will make the relevant changes shortly. If I don't get to it by the 1.1 code freeze, we should just merge in #1392 instead.
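The resolution rule settled on above can be sketched in a few lines. This is a minimal illustration (written in Python rather than the actual Scala Worker code, and `resolve_executor_home` is a hypothetical name, not a Spark API): prefer `spark.executor.home` when the application sets it, otherwise fall back to the Worker's own working directory.

```python
def resolve_executor_home(conf: dict, worker_dir: str) -> str:
    # `conf` stands in for the application's SparkConf properties;
    # `worker_dir` for the Worker's current working directory.
    return conf.get("spark.executor.home", worker_dir)

# The application pinned a specific installation:
print(resolve_executor_home({"spark.executor.home": "/opt/spark-1.1"}, "/var/spark/work"))
# Nothing set: the Worker falls back to its own directory:
print(resolve_executor_home({}, "/var/spark/work"))
```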

Conflicts:
	core/src/main/scala/org/apache/spark/deploy/Client.scala
	core/src/main/scala/org/apache/spark/deploy/client/TestClient.scala
	core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala
	core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
	core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
	core/src/test/scala/org/apache/spark/deploy/JsonProtocolSuite.scala
	core/src/test/scala/org/apache/spark/deploy/worker/ExecutorRunnerTest.scala
	python/pyspark/conf.py
@andrewor14 andrewor14 closed this Aug 2, 2014
@SparkQA

SparkQA commented Aug 2, 2014

QA tests have started for PR 1472. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17741/consoleFull

@SparkQA

SparkQA commented Aug 2, 2014

QA results for PR 1472:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17741/consoleFull

@andrewor14
Contributor Author

Closing this in favor of #1734. Please disregard this PR.

@andrewor14 andrewor14 deleted the spark-home branch August 2, 2014 05:24
4 participants