[SPARK-2454] Do not assume drivers and executors share the same Spark home #1472
Conversation
This allows the worker to launch a driver or an executor from a different installation of Spark on the same machine. To do so, the user needs to set "spark.executor.home" and/or "spark.driver.home". Note that this was already possible for the executors even before this commit. However, it used to rely on "spark.home", which was also used for 20 other things. The next step is to remove all usages of "spark.home", which was confusing to many users (myself included).
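The per-role lookup described above can be sketched in plain Scala. The key names (`spark.executor.home`, `spark.driver.home`) come from the PR; the object and method names here are illustrative, not Spark's actual API.

```scala
// Hypothetical helper: the worker chooses a per-role Spark home from its
// configuration, one key for drivers and one for executors.
object LaunchHome {
  def homeFor(conf: Map[String, String], isDriver: Boolean): Option[String] =
    conf.get(if (isDriver) "spark.driver.home" else "spark.executor.home")
}
```

A `None` result here would mean the worker has no override configured for that role.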
This involves replacing spark.home with spark.test.home in tests. It looks like Python still uses spark.home, however; the next commit will fix this.
This is because we cannot deprecate these constructors easily...
... because the only mode that uses spark.driver.home right now is standalone-cluster, which is broken (SPARK-2260). It makes little sense to document that this feature exists on a mode that is broken.
QA tests have started for PR 1472. This patch merges cleanly.
QA results for PR 1472:
QA tests have started for PR 1472. This patch merges cleanly.
QA results for PR 1472:
QA tests have started for PR 1472. This patch merges cleanly.
Oops, accidentally closed. Please disregard.
I have tested this on a standalone cluster, purposefully changing the directory structure of the driver to be different from that of the executors. I was able to confirm that the workers now use their own local directory to launch the executors. I also tested changing
QA results for PR 1472:
Jenkins, retest this please. |
@@ -69,13 +69,16 @@ private class ClientActor(driverArgs: ClientArguments, conf: SparkConf) extends
    val javaOpts = sys.props.get(javaOptionsConf)
    val command = new Command(mainClass, Seq("{{WORKER_URL}}", driverArgs.mainClass) ++
      driverArgs.driverOptions, env, classPathEntries, libraryPathEntries, javaOpts)
    // TODO: document this once standalone-cluster mode is fixed (SPARK-2260)
Does this get updated now?
QA tests have started for PR 1472. This patch DID NOT merge cleanly!
QA results for PR 1472:
@@ -121,7 +121,9 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging {
    * Set the location where Spark is installed on worker nodes.
    */
   def setSparkHome(home: String): SparkConf = {
Maybe should mark this as deprecated too?
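What the reviewer suggests would look roughly like the sketch below: annotating the old setter with Scala's `@deprecated`. `ConfLike` is a stand-in class for illustration, not Spark's actual `SparkConf`.

```scala
// Stand-in for a SparkConf-style settings holder (illustrative only).
class ConfLike {
  private val settings = scala.collection.mutable.Map.empty[String, String]

  // Marking the setter deprecated produces a compile-time warning for callers
  // while keeping the old behavior intact.
  @deprecated("Spark home is no longer read from the driver's configuration", "1.1.0")
  def setSparkHome(home: String): ConfLike = {
    settings("spark.home") = home
    this // return this for chaining, as SparkConf setters do
  }

  def get(key: String): Option[String] = settings.get(key)
}
```

Deprecation keeps source compatibility: existing callers still work, they just see a warning until the setter is removed.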
UPDATE: I had a conversation with @pwendell about this. We came to the conclusion that there is really no benefit from having a mechanism to specify an executor home, at least for standalone mode. Even if we have multiple installations of Spark on the worker machines, we can pick which one to connect to by simply specifying a different Master. In either case, we should just use the Worker's current working directory as the executor's (or driver's, in the case of standalone-cluster mode) Spark home. I will make the relevant changes shortly. If I don't get to it by the 1.1 code freeze, we should just merge in #1392 instead.
Conflicts:
	core/src/main/scala/org/apache/spark/deploy/Client.scala
	core/src/main/scala/org/apache/spark/deploy/client/TestClient.scala
	core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala
	core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
	core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
	core/src/test/scala/org/apache/spark/deploy/JsonProtocolSuite.scala
	core/src/test/scala/org/apache/spark/deploy/worker/ExecutorRunnerTest.scala
	python/pyspark/conf.py
QA tests have started for PR 1472. This patch merges cleanly.
QA results for PR 1472:
Closing this in favor of #1734. Please disregard this PR.
**Problem.** When standalone Workers launch executors, they inherit the Spark home set by the driver. This means that if the worker machines do not share the same directory structure as the driver node, the Workers will attempt to run scripts (e.g. `bin/compute-classpath.sh`) that do not exist locally and fail. This is a common scenario if the driver is launched from outside of the cluster.

**Solution.** Simply do not pass the driver's Spark home to the Workers. Note that we should still keep the functionality to optionally send some Spark home to the Workers, in case there are multiple installations of Spark on the worker machines and the application wants to pick among them.

**Spark config changes.**

- `spark.home` - This is removed and deprecated. The motivation is that this is currently used for 3+ different things and is often confused with `SPARK_HOME`.
- `spark.executor.home` - This is the Spark home that the executors will use. If this is not set, the Worker will use its own current working directory. This is not set by default.
- `spark.driver.home` - Same as above, but for the driver. This is only relevant for standalone-cluster mode (not yet supported; see SPARK-2260).
- `spark.test.home` - This is the Spark home used only for tests.

Note: #1392 proposes part of the solution described here.
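The fallback described for `spark.executor.home` can be sketched as follows. The key name comes from the PR description; the object and method names are illustrative, not Spark's implementation.

```scala
import java.io.File

// Hypothetical helper: resolve the executor's Spark home, falling back to
// the Worker's own directory when spark.executor.home is unset.
object ExecutorHome {
  def effective(conf: Map[String, String], workerDir: File): String =
    conf.getOrElse("spark.executor.home", workerDir.getAbsolutePath)
}
```

With no override configured, the Worker launches executors from its own directory, so the worker machines never need to mirror the driver's directory layout.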