[SPARK-6869][PySpark] Add pyspark archives path to PYTHONPATH #5580

Closed
wants to merge 20 commits

Conversation

lianhuiwang
Contributor

Based on #5478, which provides a PYSPARK_ARCHIVES_PATH env variable. With this PR, we only need to export PYSPARK_ARCHIVES_PATH=/user/spark/pyspark.zip,/user/spark/python/lib/py4j-0.8.2.1-src.zip in conf/spark-env.sh when PySpark is not installed on each YARN node. I ran a Python application successfully in both yarn-client and yarn-cluster mode with this PR.
@andrewor14 @sryza @Sephiroth-Lin Can you take a look at this? Thanks.

@SparkQA

SparkQA commented Apr 19, 2015

Test build #30559 has started for PR 5580 at commit 9f31dac.

@SparkQA

SparkQA commented Apr 19, 2015

Test build #30560 has started for PR 5580 at commit 31e8e06.

@SparkQA

SparkQA commented Apr 19, 2015

Test build #30559 has finished for PR 5580 at commit 9f31dac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30559/

@SparkQA

SparkQA commented Apr 19, 2015

Test build #30560 has finished for PR 5580 at commit 31e8e06.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30560/

// In yarn mode for a python app, if PYSPARK_ARCHIVES_PATH is in the user environment
// add pyspark archives to files that can be distributed with the job
if (args.isPython && clusterManager == YARN) {
  sys.env.get("PYSPARK_ARCHIVES_PATH").map { archives =>
Contributor

Does the user set this himself? What does he set it to?

Contributor Author

The Spark administrator can set PYSPARK_ARCHIVES_PATH in spark-env.sh to a local or HDFS path for the pyspark zip; users then run their Python applications as before.
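
The quoted hunk above is truncated at the closure. A hedged guess at how it might continue, folding the archives into the files spark-submit ships with the job; `args.files` is SparkSubmit's comma-separated file list, and the merging below is illustrative, not the merged code:

sys.env.get("PYSPARK_ARCHIVES_PATH").foreach { archives =>
  // args.files is a comma-separated String; append the archives so YARN
  // distributes them alongside the job's other files.
  args.files = Option(args.files).map(_ + "," + archives).getOrElse(archives)
}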

@andrewor14
Contributor

@lianhuiwang Two high-level comments. First, why not just put it on --py-files? If that works as expected, it will automatically be added to the executors' PYTHONPATH without you having to do it manually, as you have done in Client.scala. Second, this still seems to require some action on the user's part: they must manually zip up all the Python archives themselves and put them on PYSPARK_ARCHIVES_PATH. I would propose that we do this automatically for the user behind the scenes using Java's ZipEntry APIs.
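
A minimal sketch of the "zip it behind the scenes" idea with java.util.zip, assuming we zip SPARK_HOME/python/pyspark into python/lib/pyspark.zip; the helper below is illustrative, not the code that was eventually merged:

import java.io.{File, FileInputStream, FileOutputStream}
import java.util.zip.{ZipEntry, ZipOutputStream}

// Recursively add every file under srcDir to zipFile, keeping paths under a
// top-level pyspark/ prefix so Python can import the package from the zip.
def zipDir(srcDir: File, zipFile: File): Unit = {
  val out = new ZipOutputStream(new FileOutputStream(zipFile))
  def addFiles(dir: File, prefix: String): Unit = {
    for (f <- dir.listFiles()) {
      if (f.isDirectory) {
        addFiles(f, prefix + f.getName + "/")
      } else {
        out.putNextEntry(new ZipEntry(prefix + f.getName))
        val in = new FileInputStream(f)
        val buf = new Array[Byte](8192)
        var n = in.read(buf)
        while (n != -1) { out.write(buf, 0, n); n = in.read(buf) }
        in.close()
        out.closeEntry()
      }
    }
  }
  addFiles(srcDir, "pyspark/")
  out.close()
}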

@lianhuiwang
Contributor Author

@andrewor14 On the first question (why not just put it on --py-files?): if we do not set PYTHONPATH for the executor in Client, the executor throws "/usr/bin/python: No module named pyspark", because the executor cannot use --py-files. So we also need to set the PYTHONPATH environment variable in Client so that executors can find the pyspark module.
On the second question: right now it requires setting PYSPARK_ARCHIVES_PATH in spark-env.sh, much like the HADOOP_HOME environment variable. Why is this setting needed? If SPARK_HOME or the PySpark path is installed locally on every node, there is no need to set PYSPARK_ARCHIVES_PATH, but we cannot know whether that is the case; the administrator can.
I also agree that we should zip the archives when assembling Spark's jar.
@andrewor14 What do you think? Thanks.
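
A hedged sketch of what "set PYTHONPATH in Client" means here: the archives distributed with the job land in each YARN container's working directory ({{PWD}}), so the executor launch environment points PYTHONPATH at them. The map and entry names below are illustrative, not the merged Client.scala code:

import java.io.File
import scala.collection.mutable

val env = mutable.HashMap[String, String]()  // stands in for the container launch env
val pythonPath = Seq("{{PWD}}" + File.separator + "pyspark.zip",
    "{{PWD}}" + File.separator + "py4j-0.8.2.1-src.zip")
  .mkString(File.pathSeparator)
// Append to any PYTHONPATH already present rather than clobbering it.
env("PYTHONPATH") = env.get("PYTHONPATH")
  .map(existing => existing + File.pathSeparator + pythonPath)
  .getOrElse(pythonPath)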

@lianhuiwang
Contributor Author

@andrewor14 For the second question, I added two things. One: we zip the pyspark archives into pyspark/lib when building the Spark jar. Two: at submit time, if PYSPARK_ARCHIVES_PATH is not set and pyspark.zip does not exist, we zip the archives into pyspark/lib.
I also added a conf, 'spark.submit.pyArchives', to store the pyspark archives so that Client knows about them. We cannot use the PYSPARK_ARCHIVES_PATH env variable for this, because Client and spark-submit run in the same process, and when we set PYSPARK_ARCHIVES_PATH during submit, Client cannot see it.
One thing to note: if PySpark is installed on every node, we now need to point PYSPARK_ARCHIVES_PATH at the local pyspark archives, because spark-submit checks whether PYSPARK_ARCHIVES_PATH exists.
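
A minimal sketch of the conf handoff described above, assuming spark-submit and yarn.Client share one JVM, so a conf entry set during submit is visible to Client while a freshly exported env variable would not be. `spark.submit.pyArchives` is the key named in the comment; the maps stand in for SparkSubmit's real property plumbing:

import scala.collection.mutable

// In SparkSubmit, after resolving where the archives live (paths illustrative):
val sysProps = mutable.Map[String, String]()
val pyArchives = "/path/to/pyspark.zip,/path/to/py4j-0.8.2.1-src.zip"
sysProps("spark.submit.pyArchives") = pyArchives

// Later, in yarn.Client, read the same key back from the conf:
val archivesFromConf: Option[String] = sysProps.get("spark.submit.pyArchives")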

@lianhuiwang
Contributor Author

@tgravescs I think this PR will be useful for you; you can try it.

for (sparkHome <- sys.env.get("SPARK_HOME")) {
  val pyLibPath = Seq(sparkHome, "python", "lib").mkString(File.separator)
  val pyArchivesFile = new File(pyLibPath, "pyspark.zip")
  if (!pyArchivesFile.exists()) {
Contributor

I think if we just make sure the zip is built during the build, then we don't need to do the zipping in code; just require that it's already there.

Contributor Author

I think that's not enough. Sometimes we upgrade just by copying spark.jar, so zipping in code is good for that situation.

Contributor

I'm not sure I follow. If you just upgrade spark.jar, then there are no changes to the Python scripts, so you don't need to put a new pyspark.zip there. If there are changes, then you either need to copy over the new Python scripts or put a new pyspark.zip there, and putting a new pyspark.zip there seems easier. Although I guess you need the Python scripts there anyway for client mode, so you probably need both.

In many cases I wouldn't expect a user to have write permissions on the python/lib directory. I would expect that to be a privileged operation. In that case the zip would fail.

Contributor Author

Yes, I agree with you. Thanks.
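
A minimal sketch of the resolution reached above, assuming the build (or make-distribution.sh) is responsible for producing pyspark.zip: submit time then only verifies that the zip exists, rather than zipping into python/lib, which the user may not have permission to write. The error message is illustrative:

import java.io.File

for (sparkHome <- sys.env.get("SPARK_HOME")) {
  val pyLibPath = Seq(sparkHome, "python", "lib").mkString(File.separator)
  val pyArchivesFile = new File(pyLibPath, "pyspark.zip")
  // Fail fast instead of trying to create the zip in a possibly read-only dir.
  require(pyArchivesFile.exists(),
    s"pyspark.zip not found at $pyLibPath; build Spark or run make-distribution.sh first.")
}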

@lianhuiwang
Contributor Author

@tgravescs Yes, I agree with your comments and have updated the PR. Can you review it again? Thanks.

@SparkQA

SparkQA commented Apr 27, 2015

Test build #31000 has started for PR 5580 at commit 5192cca.

@Sephiroth-Lin
Contributor

If users don't use make-distribution.sh and just compile Spark with Maven or sbt, they won't have pyspark.zip. So do we really not need to do the zip in code?

@tgravescs
Contributor

@lianhuiwang As mentioned, can we have the zip happen during the package phase rather than in make-distribution.sh?

if (args.isPython && clusterManager == YARN) {
  var pyArchives: String = null
  if (sys.env.contains("PYSPARK_ARCHIVES_PATH")) {
    pyArchives = sys.env.get("PYSPARK_ARCHIVES_PATH").get
Contributor

sys.env("PYSPARK_ARCHIVES_PATH")? Or even:

sys.env.get("PYSPARK_ARCHIVES_PATH").getOrElse(
  // code to figure out where the archives are
)

Contributor Author

I have updated it using Option, because in getOrElse(default), default must be a value and cannot be an expression.

Contributor

Actually that's not true; getOrElse takes its default by name:

def getOrElse[B >: A](default: ⇒ B): B 

Which is why you can do getOrElse(throw SomeException()) (look for it in Spark's code base).
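
To illustrate the by-name default: the fallback block below only runs when the Option is empty, so arbitrary expressions (including a throw) are fine. The fallback path itself is illustrative:

import java.io.File

val pyArchives = sys.env.get("PYSPARK_ARCHIVES_PATH").getOrElse {
  // Only evaluated when PYSPARK_ARCHIVES_PATH is unset.
  val sparkHome = sys.env.getOrElse("SPARK_HOME",
    throw new IllegalStateException("SPARK_HOME is not set"))
  Seq(sparkHome, "python", "lib", "pyspark.zip").mkString(File.separator)
}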

@tgravescs
Contributor

I tested this out and it is working fine for me with JDK 7 in both cluster and client mode. The code looks good. My only comment is that I think we can now stop packaging the Python stuff in the assembly jar.

@lianhuiwang Have you looked at removing those files? If it's not much work, it would be best to do it here; if it looks like quite a bit, we can file another JIRA for it. I would like to get this into 1.4, given the discussions about ending support for JDK 6.

@tgravescs
Contributor

Having heard nothing from @lianhuiwang, I'm going to commit this and file a follow-up to remove the Python files from the assembly jar.

asfgit pushed a commit that referenced this pull request May 8, 2015
Based on #5478 that provide a PYSPARK_ARCHIVES_PATH env. within this PR, we just should export PYSPARK_ARCHIVES_PATH=/user/spark/pyspark.zip,/user/spark/python/lib/py4j-0.8.2.1-src.zip in conf/spark-env.sh when we don't install PySpark on each node of Yarn. i run python application successfully on yarn-client and yarn-cluster with this PR.
andrewor14 sryza Sephiroth-Lin Can you take a look at this?thanks.

Author: Lianhui Wang <lianhuiwang09@gmail.com>

Closes #5580 from lianhuiwang/SPARK-6869 and squashes the following commits:

66ffa43 [Lianhui Wang] Update Client.scala
c2ad0f9 [Lianhui Wang] Update Client.scala
1c8f664 [Lianhui Wang] Merge remote-tracking branch 'remotes/apache/master' into SPARK-6869
008850a [Lianhui Wang] Merge remote-tracking branch 'remotes/apache/master' into SPARK-6869
f0b4ed8 [Lianhui Wang] Merge remote-tracking branch 'remotes/apache/master' into SPARK-6869
150907b [Lianhui Wang] Merge remote-tracking branch 'remotes/apache/master' into SPARK-6869
20402cd [Lianhui Wang] use ZipEntry
9d87c3f [Lianhui Wang] update scala style
e7bd971 [Lianhui Wang] address vanzin's comments
4b8a3ed [Lianhui Wang] use pyArchivesEnvOpt
e6b573b [Lianhui Wang] address vanzin's comments
f11f84a [Lianhui Wang] zip pyspark archives
5192cca [Lianhui Wang] update import path
3b1e4c8 [Lianhui Wang] address tgravescs's comments
9396346 [Lianhui Wang] put zip to make-distribution.sh
0d2baf7 [Lianhui Wang] update import paths
e0179be [Lianhui Wang] add zip pyspark archives in build or sparksubmit
31e8e06 [Lianhui Wang] update code style
9f31dac [Lianhui Wang] update code and add comments
f72987c [Lianhui Wang] add archives path to PYTHONPATH

(cherry picked from commit ebff732)
Signed-off-by: Thomas Graves <tgraves@apache.org>
asfgit closed this in ebff732 May 8, 2015
asfgit pushed a commit that referenced this pull request May 8, 2015
Add `python/lib/pyspark.zip` to `.gitignore`. After merging #5580, `python/lib/pyspark.zip` will be generated when building Spark.

Author: zsxwing <zsxwing@gmail.com>

Closes #6017 from zsxwing/gitignore and squashes the following commits:

39b10c4 [zsxwing] Ignore python/lib/pyspark.zip
asfgit pushed a commit that referenced this pull request May 8, 2015

(cherry picked from commit dc71e47)
Signed-off-by: Andrew Or <andrew@databricks.com>
@lianhuiwang
Contributor Author

@tgravescs Sorry for my late reply. I think #6022 takes care of SPARK-7485.

asfgit pushed a commit that referenced this pull request May 12, 2015
…n python/pyspark

As PR #5580 we have created pyspark.zip on building and set PYTHONPATH to python/lib/pyspark.zip, so to keep consistence update this.

Author: linweizhong <linweizhong@huawei.com>

Closes #6047 from Sephiroth-Lin/pyspark_pythonpath and squashes the following commits:

8cc3d96 [linweizhong] Set PYTHONPATH to python/lib/pyspark.zip rather than python/pyspark as PR#5580 we have create pyspark.zip on build
asfgit pushed a commit that referenced this pull request May 12, 2015

(cherry picked from commit 9847875)
Signed-off-by: Andrew Or <andrew@databricks.com>
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
jongyoul added a commit to jongyoul/zeppelin that referenced this pull request Jun 24, 2015
jongyoul added a commit to jongyoul/zeppelin that referenced this pull request Jun 25, 2015
jongyoul added a commit to jongyoul/zeppelin that referenced this pull request Jul 3, 2015
jongyoul added a commit to jongyoul/zeppelin that referenced this pull request Jul 4, 2015
asfgit pushed a commit to apache/zeppelin that referenced this pull request Jul 5, 2015
…very yarn node

- Spark supports pyspark on yarn cluster without deploying python libraries from Spark 1.4
 - https://issues.apache.org/jira/browse/SPARK-6869
 - apache/spark#5580, apache/spark#5478

Author: Jongyoul Lee <jongyoul@gmail.com>

Closes #118 from jongyoul/ZEPPELIN-18 and squashes the following commits:

a47e27c [Jongyoul Lee] - Fixed test script for spark 1.4.0
72a65fd [Jongyoul Lee] - Fixed test script for spark 1.4.0
ee6d100 [Jongyoul Lee] - Cleanup codes
47fd9c9 [Jongyoul Lee] - Cleanup codes
248e330 [Jongyoul Lee] - Cleanup codes
4cd10b5 [Jongyoul Lee] - Removed meaningless codes comments
c9cda29 [Jongyoul Lee] - Removed setting SPARK_HOME - Changed the location of pyspark's directory into interpreter/spark
ef240f5 [Jongyoul Lee] - Fixed typo
06002fd [Jongyoul Lee] - Fixed typo
4b35c8d [Jongyoul Lee] [ZEPPELIN-18] Running pyspark without deploying python libraries to every yarn node - Dummy for trigger
682986e [Jongyoul Lee] rebased
8a7bf47 [Jongyoul Lee] [ZEPPELIN-18] Running pyspark without deploying python libraries to every yarn node - rebasing
ad610fb [Jongyoul Lee] rebased
94bdf30 [Jongyoul Lee] [ZEPPELIN-18] Running pyspark without deploying python libraries to every yarn node - Fixed checkstyle
929333d [Jongyoul Lee] rebased
64b8195 [Jongyoul Lee] [ZEPPELIN-18] Running pyspark without deploying python libraries to every yarn node - rebasing
0a2d90e [Jongyoul Lee] rebased
b05ae6e [Jongyoul Lee] [ZEPPELIN-18] Remove setting SPARK_HOME for PySpark - Excludes python/** from apache-rat
71e2a92 [Jongyoul Lee] [ZEPPELIN-18] Running pyspark without deploying python libraries to every yarn node - Removed verbose setting
0ddb436 [Jongyoul Lee] [ZEPPELIN-18] Running pyspark without deploying python libraries to every yarn node - Followed spark's way to support pyspark - https://issues.apache.org/jira/browse/SPARK-6869 - apache/spark#5580 - https://github.com/apache/spark/pull/5478/files
1b192f6 [Jongyoul Lee] [ZEPPELIN-18] Remove setting SPARK_HOME for PySpark - Removed redundant dependency setting
32fd9e1 [Jongyoul Lee] [ZEPPELIN-18] Running pyspark without deploying python libraries to every yarn node - rebasing
asfgit pushed a commit to apache/zeppelin that referenced this pull request Jul 5, 2015

(cherry picked from commit 3bd2b21)
Signed-off-by: Lee moon soo <moon@apache.org>