[SPARK-1870] Make spark-submit --jars work in yarn-cluster mode. #848

mengxr · 2014-05-21T18:58:34Z

Send secondary jars to the distributed cache of all containers and add the cached jars to the classpath before executors start. Tested on a YARN cluster (CDH-5.0).

spark-submit --jars also works in standalone server and yarn-client. Thanks to @andrewor14 for testing!

I removed "Doesn't work for drivers in standalone mode with "cluster" deploy mode." from spark-submit's help message, though we haven't tested Mesos yet.

CC: @dbtsai @sryza

AmplabJenkins · 2014-05-21T19:02:59Z

Merged build triggered.

AmplabJenkins · 2014-05-21T19:03:07Z

Merged build started.

AmplabJenkins · 2014-05-21T19:43:12Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-05-21T19:43:12Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15125/

dbtsai · 2014-05-21T20:47:31Z

yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala

@@ -479,37 +485,24 @@ object ClientBase {

    extraClassPath.foreach(addClasspathEntry)

-    addClasspathEntry(Environment.PWD.$())
+    val cachedSecondaryJarLinks =
+      sparkConf.getOption(CONF_SPARK_YARN_SECONDARY_JARS).getOrElse("").split(",")
    // Normally the users app.jar is last in case conflicts with spark jars
    if (sparkConf.get("spark.yarn.user.classpath.first", "false").toBoolean) {


What's the difference between spark.yarn.user.classpath.first and spark.files.userClassPathFirst? To me, they seem to be the same thing under two different configuration names.

PS: line 47 says "* 1. In standalone mode, it will launch an [[org.apache.spark.deploy.yarn.ApplicationMaster]]". Should it be cluster mode now?

spark.files.userClassPathFirst is a global configuration that controls the ordering of dynamically added jars, while spark.yarn.user.classpath.first applies only to YARN. I agree it is a little confusing, but it is independent of this PR. We can create a new JIRA for it.

I will update the doc. Thanks!
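To make the ordering question concrete, here is a minimal Scala sketch of how a "user classpath first" flag could change the order in which entries are appended. It is a simplified model, not the actual ClientBase code: the object ClasspathOrderSketch, the function buildClasspath, and the dep.jar name are hypothetical and exist only for illustration.

```scala
// Hypothetical helper for illustration; this is not Spark's ClientBase logic.
object ClasspathOrderSketch {
  /**
   * Returns classpath entries in the order they would be appended.
   * Earlier entries win when the same class appears in several jars.
   */
  def buildClasspath(
      userFirst: Boolean,
      appJar: String,
      secondaryJars: Seq[String],
      sparkJar: String): Seq[String] = {
    // User-provided jars: the application jar followed by its secondary jars.
    val userEntries = appJar +: secondaryJars
    if (userFirst) userEntries :+ sparkJar // user classes shadow Spark's
    else sparkJar +: userEntries           // Spark's classes shadow the user's
  }
}
```

With userFirst = false, the user jars come last, which matches the "Normally the users app.jar is last" comment in the diff above.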

dbtsai · 2014-05-21T20:56:26Z

Thanks. It looks great to me, and better than my patch.

cachedSecondaryJarLinks.foreach(addPwdClasspathEntry) is not strictly needed since we already have addPwdClasspathEntry("*"). However, adding the jars explicitly lets us change their priority later.

This patch also works for me.

mengxr · 2014-05-21T21:08:37Z

The symbolic links may not be under the PWD. That is why it didn't work before.

dbtsai · 2014-05-21T21:11:27Z

It worked on the driver before, so the major issue is that those files were not in the executors' distributed cache. But I like the idea of adding them explicitly so we won't miss anything.

mengxr · 2014-05-21T21:16:29Z

Yes, we can also control the ordering in this way.
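The exchange above (explicit per-jar entries versus the "*" wildcard) can be sketched roughly as follows. This is only an illustration under stated assumptions: SecondaryJarsSketch, parseLinks, pwdEntries, and the literal "$PWD" placeholder are hypothetical names, not the identifiers used in the patch. The sketch also filters out empty entries, because splitting an empty string on "," yields a single empty element.

```scala
// Hypothetical sketch; names and the "$PWD" placeholder are assumptions.
object SecondaryJarsSketch {
  // Stand-in for the container working directory (plain literal, not interpolated).
  val Pwd = "$PWD"

  /** Splits a comma-separated config value, dropping blank entries. */
  def parseLinks(raw: Option[String]): Seq[String] =
    raw.getOrElse("").split(",").map(_.trim).filter(_.nonEmpty).toSeq

  /** Explicit per-jar entries first, then the wildcard as a catch-all. */
  def pwdEntries(links: Seq[String]): Seq[String] =
    links.map(link => s"$Pwd/$link") :+ s"$Pwd/*"
}
```

Listing the links explicitly before the wildcard fixes their relative order on the classpath, which a wildcard entry alone does not guarantee.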

mengxr · 2014-05-21T21:26:28Z

@dbtsai Could you backport the patch to branch-0.9 and test it on your cluster?

AmplabJenkins · 2014-05-21T21:27:59Z

Merged build triggered.

AmplabJenkins · 2014-05-21T21:28:08Z

Merged build started.

AmplabJenkins · 2014-05-21T22:29:40Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-05-21T22:29:41Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15128/

AmplabJenkins · 2014-05-21T23:02:58Z

Merged build triggered.

AmplabJenkins · 2014-05-21T23:03:04Z

Merged build started.

mateiz · 2014-05-21T23:24:51Z

In standalone mode and on Mesos, does this fix require the JARs to be accessible from the same URL on all nodes?

AmplabJenkins · 2014-05-22T00:06:55Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-05-22T00:06:55Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15133/

andrewor14 · 2014-05-22T01:01:23Z

This doesn't apply to standalone or Mesos. For these two modes (and all others except yarn-cluster), spark-submit translates --jars to spark.jars, then SparkContext uploads these jars to its HTTP server, and the executors pull them from that server.
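As a rough illustration of the two paths described here, the following Scala sketch routes a --jars list either to the spark.jars setting or to YARN's distributed cache. It is a toy model under stated assumptions, not spark-submit's real code: JarsRoutingSketch, JarRoute, ViaSparkJars, ViaYarnDistributedCache, and route are all hypothetical names; only the spark.jars key comes from the comment above.

```scala
// Hypothetical model of the routing described above; not spark-submit's code.
object JarsRoutingSketch {
  sealed trait JarRoute
  // Jars are recorded under spark.jars; executors later pull them over HTTP.
  final case class ViaSparkJars(conf: Map[String, String]) extends JarRoute
  // Jars are shipped through YARN's distributed cache (the path this PR fixes).
  final case class ViaYarnDistributedCache(jars: Seq[String]) extends JarRoute

  /** Picks the delivery path for the jars passed with --jars. */
  def route(isYarnCluster: Boolean, jars: Seq[String]): JarRoute =
    if (isYarnCluster) ViaYarnDistributedCache(jars)
    else ViaSparkJars(Map("spark.jars" -> jars.mkString(",")))
}
```

Modeling the two paths as separate cases makes it explicit that a fix to the distributed-cache path leaves the spark.jars path untouched.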

tdas · 2014-05-22T08:50:28Z

I independently tested this on YARN 2.4 running in a VM where I could reproduce the problem. This change does make jars loaded with --jars accessible in executors. I am going to merge this. Thanks @mengxr for fixing this, and @andrewor14, @sryza and @dbtsai for helping out along the way!

@andrewor14

Sent secondary jars to distributed cache of all containers and add the cached jars to classpath before executors start. Tested on a YARN cluster (CDH-5.0).

`spark-submit --jars` also works in standalone server and `yarn-client`. Thanks for @andrewor14 for testing!

I removed "Doesn't work for drivers in standalone mode with "cluster" deploy mode." from `spark-submit`'s help message, though we haven't tested mesos yet.

CC: @dbtsai @sryza

Author: Xiangrui Meng <meng@databricks.com>

Closes #848 from mengxr/yarn-classpath and squashes the following commits:

23e7df4 [Xiangrui Meng] rename spark.jar to __spark__.jar and app.jar to __app__.jar to avoid confliction apped $CWD/ and $CWD/* to the classpath remove unused methods
a40f6ed [Xiangrui Meng] standalone -> cluster
65e04ad [Xiangrui Meng] update spark-submit help message and add a comment for yarn-client
11e5354 [Xiangrui Meng] minor changes
3e7e1c4 [Xiangrui Meng] use sparkConf instead of hadoop conf
dc3c825 [Xiangrui Meng] add secondary jars to classpath in yarn

(cherry picked from commit dba3140)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>

sryza · 2014-05-22T21:36:39Z

core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala

@@ -326,8 +326,7 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) {
        |  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
        |  --name NAME                 A name of your application.
        |  --jars JARS                 Comma-separated list of local jars to include on the driver
-        |                              and executor classpaths. Doesn't work for drivers in


Was there a reason for taking this out? My impression is that this still won't work on standalone with cluster deploy mode.

Actually, this should not have been taken out; it can be put back in. But we just found out that the "cluster" deploy mode of a Spark standalone cluster is semi-broken with spark-submit.


mengxr added 4 commits May 21, 2014 10:51

add secondary jars to classpath in yarn

dc3c825

use sparkConf instead of hadoop conf

3e7e1c4

minor changes

11e5354

update spark-submit help message and add a comment for yarn-client

65e04ad

dbtsai reviewed May 21, 2014

standalone -> cluster

a40f6ed

rename spark.jar to __spark__.jar and app.jar to __app__.jar to avoid confliction; apped $CWD/ and $CWD/* to the classpath; remove unused methods

23e7df4

asfgit closed this in dba3140 May 22, 2014

sryza reviewed May 22, 2014

