
[SPARK-6869][PySpark] Add pyspark archives path to PYTHONPATH #5478

Closed
wants to merge 11 commits into from

Conversation

Sephiroth-Lin
Contributor

From SPARK-1920 and SPARK-1520 we know that PySpark on YARN does not work when the assembly jar is packaged by JDK 1.7+, so this PR ships the pyspark archives to the executors via YARN.
Usage:

  1. Do nothing: pyspark is zipped automatically, and pyspark.zip and py4j.zip are shipped to the executors;
  2. Set PYSPARK_ARCHIVES_PATH:
     1. to ship the archives to the executors, set PYSPARK_ARCHIVES_PATH=/your/path/pyspark.zip,/your/path/py4j.zip
     2. to use archives already present on the executor nodes (no shipping), set PYSPARK_ARCHIVES_PATH=local:///your/path/pyspark.zip,local:///your/path/py4j.zip
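
A minimal sketch, assuming a hypothetical helper inside the YARN client code (not the PR's actual implementation), of how such a PYSPARK_ARCHIVES_PATH value could be interpreted: local: entries are used in place on each node, everything else is shipped and referenced by file name on the executor PYTHONPATH.

    import java.io.File

    // Hypothetical helper (not the PR's code): split PYSPARK_ARCHIVES_PATH and decide,
    // per entry, whether it must be shipped or is already available on every node.
    def resolvePysparkArchives(value: String): Seq[(String, Boolean)] =
      value.split(",").filter(_.nonEmpty).toSeq.map { entry =>
        if (entry.startsWith("local:")) {
          // local: URI -> the path is used as-is on each node, nothing is shipped
          (entry.stripPrefix("local://"), false)
        } else {
          // plain path -> ship the file; executors see it by its base name on PYTHONPATH
          (new File(entry).getName, true)
        }
      }

    // The two usage modes from the description above:
    resolvePysparkArchives("/your/path/pyspark.zip,/your/path/py4j.zip")
    resolvePysparkArchives("local:///your/path/pyspark.zip,local:///your/path/py4j.zip")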

@SparkQA

SparkQA commented Apr 12, 2015

Test build #30111 has finished for PR 5478 at commit 413fa25.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@andrewor14
Contributor

@Sephiroth-Lin the point of running PySpark on YARN is that the user does not have to install Spark on the slave machines. Instead, we package the python files in the assembly jar, which is automatically shipped by YARN to all containers.

This change assumes that the python files will already be present on the slave machines, since PYTHONPATH reads from the local file system. I don't believe this is a deployment requirement that we want to enforce, especially since the user must now ensure that the Spark python files are consistent across all machines (as they must in standalone mode).

I would recommend that we close this issue since this isn't a feature we want to support.

@sryza
Contributor

sryza commented Apr 14, 2015

IIUC, the motivation for this change is that the assembly jar distribution mechanism doesn't work for some Java versions.

I agree with Andrew that, if at all possible, we should avoid deployment models that expect PySpark or anything to be on every node. Even if we advise against it, it increases the number of places one needs to check when debugging why something does or does not appear on the executor PYTHONPATH.

Are there workarounds for the Java versions issue that don't require python to be installed on the NodeManagers?

@Sephiroth-Lin
Contributor Author

@andrewor14 @sryza Yes, assuming that the python files will already be present on the slave machines is not very reasonable. But if users want to use PySpark, they currently must build Spark with JDK 1.6, and I think most users are on JDK 1.7+ now. Maybe a good solution is to package PySpark into another jar that is automatically shipped by YARN to all containers, and to add this jar to the PYTHONPATH along with the assembly jar.

@sryza
Contributor

sryza commented Apr 15, 2015

That alternative solution makes sense to me. If it's not going to be added to the classpath, it might make more sense to use a zip than a jar.

@andrewor14
Contributor

That sounds good. We can ship it through --py-files. @Sephiroth-Lin Would you mind updating the PR and the JIRA to reflect this change in intention?

@Sephiroth-Lin Sephiroth-Lin changed the title [SPARK-6869][PySpark] Pass PYTHONPATH to executor, so that executor can read pyspark file from local file system on executor node [SPARK-6869][PySpark] Add pyspark archives path to PYTHONPATH Apr 16, 2015
@SparkQA

SparkQA commented Apr 16, 2015

Test build #30407 has finished for PR 5478 at commit 51ebb26.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@Sephiroth-Lin
Contributor Author

@andrewor14 @sryza Done, thanks.

@WangTaoTheTonic
Contributor

Hey guys, after discussing with @Sephiroth-Lin offline, we have some questions:

  • If we try to ship the zip file to the executor backend with --py-files:
    • the user must use --py-files to point to the xxx.zip file we provide;
    • we must change the directory layout of our release (use xxx.zip in place of the python dir);
    • even though the zip file is not very big, shipping it every time still costs time.
  • Inspired by the former solution, I thought we might add a config like spark.executor.extraClassPath on the PySpark side, which points to the files needed by the user's application so that they can be used without shipping.

@SparkQA

SparkQA commented Apr 16, 2015

Test build #30415 has finished for PR 5478 at commit 052e288.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 16, 2015

Test build #30414 has finished for PR 5478 at commit 309679a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@Sephiroth-Lin
Contributor Author

@andrewor14 @sryza @WangTaoTheTonic I have tested again: if we install Spark on each node, then we can set spark.executorEnv.PYTHONPATH=${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip to pass PYTHONPATH to the executors. So this PR is an alternative way to run PySpark on YARN when we don't install Spark on each node.
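
For reference, the same executor environment setting can also be expressed programmatically; a minimal sketch using SparkConf, with an illustrative install path standing in for ${SPARK_HOME}:

    import org.apache.spark.SparkConf

    // spark.executorEnv.* entries become environment variables of every executor.
    // This workaround only works when Spark is installed at the same path on every node.
    val conf = new SparkConf()
      .setAppName("pyspark-on-yarn-example")   // illustrative application name
      .setExecutorEnv("PYTHONPATH",
        "/opt/spark/python:/opt/spark/python/lib/py4j-0.8.2.1-src.zip")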

@WangTaoTheTonic
Contributor

Yeah, spark.executorEnv.PYTHONPATH is, in a way, PySpark's equivalent of spark.executor.extraClassPath.

@SparkQA

SparkQA commented Apr 16, 2015

Test build #30417 has finished for PR 5478 at commit 3a0ec77.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@lianhuiwang
Contributor

Yes, I think SparkSubmit can automatically add PYSPARK_ARCHIVES_PATH to the dist files, and then Client and ExecutorRunnable can set PYTHONPATH according to PYSPARK_ARCHIVES_PATH if it exists. If the user sets PYTHONPATH explicitly, PYSPARK_ARCHIVES_PATH is unused. I have run this successfully on YARN in both client and cluster mode.
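
A rough sketch of that precedence rule, using a made-up helper name rather than the code in this PR: the shipped archives are only used for the container PYTHONPATH when the user has not set one explicitly.

    // Hypothetical helper: build the PYTHONPATH for a YARN container.
    // pysparkArchives is PYSPARK_ARCHIVES_PATH split on commas;
    // userPythonPath is an explicit PYTHONPATH set by the user, if any.
    def containerPythonPath(
        pysparkArchives: Seq[String],
        userPythonPath: Option[String]): String = {
      userPythonPath match {
        case Some(p) => p                                 // explicit user setting wins
        case None    => pysparkArchives.mkString(":")     // otherwise use the shipped archives
      }
    }

    // e.g. containerPythonPath(Seq("pyspark.zip", "py4j-0.8.2.1-src.zip"), None)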

@sryza
Contributor

sryza commented Apr 16, 2015

Regarding the performance issue, this can be solved with the YARN distributed cache in the same way it works for the Spark assembly jar. If the file is placed on HDFS in a public location, it will be cached on the nodes as a public YARN local resource so it doesn't need to be downloaded each time an app is submitted.
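
For illustration, roughly how a file that already lives on HDFS is registered as a public YARN local resource so that NodeManagers cache it across applications (standard YARN API calls; the HDFS path here is just an assumed example):

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.hadoop.yarn.api.records.{LocalResource, LocalResourceType, LocalResourceVisibility}
    import org.apache.hadoop.yarn.conf.YarnConfiguration
    import org.apache.hadoop.yarn.util.{ConverterUtils, Records}

    val conf = new YarnConfiguration()
    val fs = FileSystem.get(conf)
    val archive = new Path("hdfs:///apps/spark/pyspark.zip")   // assumed world-readable HDFS location
    val status = fs.getFileStatus(archive)

    // Register the archive as a PUBLIC local resource: YARN downloads it once per node and
    // reuses the cached copy for later applications instead of fetching it again.
    val resource = Records.newRecord(classOf[LocalResource])
    resource.setResource(ConverterUtils.getYarnUrlFromPath(archive))
    resource.setSize(status.getLen)
    resource.setTimestamp(status.getModificationTime)
    resource.setType(LocalResourceType.ARCHIVE)
    resource.setVisibility(LocalResourceVisibility.PUBLIC)
    // The ("pyspark.zip" -> resource) mapping is then passed in the ContainerLaunchContext.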

@vanzin
Contributor

vanzin commented Apr 16, 2015

if we install Spark on each node, then we can set spark.executorEnv.PYTHONPATH

That's a workaround, but not really in the spirit of how Spark-on-YARN is expected to work. For example, all nodes would have to have the same $SPARK_HOME for that to work.

If this must be distributed as a separate file, then Sandy's solution is the way to go. The pyspark zip needs to be treated the same way the spark assembly is: if no configuration, find it locally and upload it to nodes using the distributed cache; add a config option so that users can store that file on HDFS or even use local: URIs in case they want to manually distribute the file.
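
A condensed sketch of that resolution order, assuming a single configured URI and a zip bundled under $SPARK_HOME/python/lib; both the shape of the config and the fallback location are assumptions for illustration, not necessarily what the final patch uses.

    import java.io.File

    // Returns the path to use and whether the client still has to upload it.
    def resolvePysparkZip(configured: Option[String], sparkHome: String): (String, Boolean) =
      configured match {
        case Some(uri) if uri.startsWith("local:") => (uri, false)  // manually distributed, use in place
        case Some(uri)                              => (uri, false)  // already on HDFS, cached by YARN
        case None =>
          // No configuration: find the zip bundled with the local installation and
          // upload it to the cluster through the distributed cache.
          (new File(sparkHome, "python/lib/pyspark.zip").getAbsolutePath, true)
      }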

@andrewor14
Contributor

Also, how big is the actual zip? I would imagine that it's at least one or two orders of magnitude smaller than the assembly jar, so it shouldn't be expensive especially if we cache it. As others have pointed out, the whole point of Spark on YARN is that the user doesn't need to install it on every node, and doing this through spark.executorEnv.PYTHONPATH defeats this purpose.

@lianhuiwang
Contributor

@sryza We can export PYSPARK_ARCHIVES_PATH=local://xx/pyspark.zip;local://xx/py4j.zip in spark-env.sh, and we can also export PYSPARK_ARCHIVES_PATH=hdfs://xx/pyspark.zip. Then in SparkSubmit we can automatically add PYSPARK_ARCHIVES_PATH to YARN's dist files, and Spark-on-YARN can put the dist files into the YARN distributed cache. This works the same way as for the Spark assembly jar.
@andrewor14 In my test, the pyspark.zip is 378 KB, which I think is much smaller than the assembly jar, so we can add it to the dist files automatically.

@SparkQA

SparkQA commented Apr 17, 2015

Test build #30462 has finished for PR 5478 at commit 547fd95.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 17, 2015

Test build #30478 has finished for PR 5478 at commit c63f31f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@andrewor14
Contributor

@Sephiroth-Lin I looked at the latest changes and they don't seem to reflect what I had in mind. There is still a manual step: the user needs to package the zip themselves and point to it through an environment variable or config.

My proposal was to automatically zip up the python files in SparkSubmit and add them to --py-files. If you do that, then I believe the PYTHONPATH will be set up automatically as well. You can take a look at Java's ZipEntry or this post as a reference on how to do this.
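
For reference, a minimal sketch of zipping a python directory with java.util.zip in Scala; this illustrates the suggested approach, not the code that was eventually merged:

    import java.io.{File, FileInputStream, FileOutputStream}
    import java.util.zip.{ZipEntry, ZipOutputStream}

    // Recursively add every file under `dir` to the zip, keeping entry names relative to `base`.
    def zipDir(dir: File, base: String, out: ZipOutputStream): Unit = {
      dir.listFiles().foreach { f =>
        val name = base + f.getName
        if (f.isDirectory) {
          zipDir(f, name + "/", out)
        } else {
          out.putNextEntry(new ZipEntry(name))
          val in = new FileInputStream(f)
          val buf = new Array[Byte](8192)
          Iterator.continually(in.read(buf)).takeWhile(_ != -1).foreach(n => out.write(buf, 0, n))
          in.close()
          out.closeEntry()
        }
      }
    }

    // e.g. zip $SPARK_HOME/python/pyspark into pyspark.zip before handing it to --py-files
    val zos = new ZipOutputStream(new FileOutputStream("pyspark.zip"))
    zipDir(new File(sys.env("SPARK_HOME"), "python/pyspark"), "pyspark/", zos)
    zos.close()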

@lianhuiwang
Contributor

@Sephiroth-Lin I think I will submit my own PR based on this one later; please help me review it then. Thanks.

@Sephiroth-Lin
Contributor Author

@lianhuiwang OK.

@andrewor14
Contributor

@Sephiroth-Lin would you mind closing this PR then?

@Sephiroth-Lin
Contributor Author

@andrewor14 Sorry, I have been busy these days; I have now updated the code. ^-^

@SparkQA

SparkQA commented Apr 22, 2015

Test build #30747 has finished for PR 5478 at commit d012cde.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.
  • This patch adds the following new dependencies:
    • commons-math3-3.1.1.jar
    • snappy-java-1.1.1.6.jar
  • This patch removes the following dependencies:
    • commons-math3-3.4.1.jar
    • snappy-java-1.1.1.7.jar

@SparkQA

SparkQA commented Apr 22, 2015

Test build #30748 has finished for PR 5478 at commit 5d9bcb6.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.
  • This patch adds the following new dependencies:
    • commons-math3-3.1.1.jar
    • snappy-java-1.1.1.6.jar
  • This patch removes the following dependencies:
    • commons-math3-3.4.1.jar
    • snappy-java-1.1.1.7.jar

@Sephiroth-Lin
Contributor Author

@andrewor14 @sryza what do you think? Thanks. @lianhuiwang please help me review this, thanks.

@tgravescs
Contributor

so is this competing directly with #5580?

@Sephiroth-Lin
Contributor Author

@tgravescs yes

@asfgit asfgit closed this in 8dee274 Apr 29, 2015
asfgit pushed a commit that referenced this pull request May 8, 2015
Based on #5478, which provides a PYSPARK_ARCHIVES_PATH env variable. With this PR, we just need to export PYSPARK_ARCHIVES_PATH=/user/spark/pyspark.zip,/user/spark/python/lib/py4j-0.8.2.1-src.zip in conf/spark-env.sh when we don't install PySpark on each node of YARN. I ran python applications successfully on yarn-client and yarn-cluster with this PR.
andrewor14 sryza Sephiroth-Lin Can you take a look at this? Thanks.

Author: Lianhui Wang <lianhuiwang09@gmail.com>

Closes #5580 from lianhuiwang/SPARK-6869 and squashes the following commits:

66ffa43 [Lianhui Wang] Update Client.scala
c2ad0f9 [Lianhui Wang] Update Client.scala
1c8f664 [Lianhui Wang] Merge remote-tracking branch 'remotes/apache/master' into SPARK-6869
008850a [Lianhui Wang] Merge remote-tracking branch 'remotes/apache/master' into SPARK-6869
f0b4ed8 [Lianhui Wang] Merge remote-tracking branch 'remotes/apache/master' into SPARK-6869
150907b [Lianhui Wang] Merge remote-tracking branch 'remotes/apache/master' into SPARK-6869
20402cd [Lianhui Wang] use ZipEntry
9d87c3f [Lianhui Wang] update scala style
e7bd971 [Lianhui Wang] address vanzin's comments
4b8a3ed [Lianhui Wang] use pyArchivesEnvOpt
e6b573b [Lianhui Wang] address vanzin's comments
f11f84a [Lianhui Wang] zip pyspark archives
5192cca [Lianhui Wang] update import path
3b1e4c8 [Lianhui Wang] address tgravescs's comments
9396346 [Lianhui Wang] put zip to make-distribution.sh
0d2baf7 [Lianhui Wang] update import paths
e0179be [Lianhui Wang] add zip pyspark archives in build or sparksubmit
31e8e06 [Lianhui Wang] update code style
9f31dac [Lianhui Wang] update code and add comments
f72987c [Lianhui Wang] add archives path to PYTHONPATH

(cherry picked from commit ebff732)
Signed-off-by: Thomas Graves <tgraves@apache.org>
asfgit pushed a commit that referenced this pull request May 8, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
asfgit pushed a commit to apache/zeppelin that referenced this pull request Jul 5, 2015
[ZEPPELIN-18] Running pyspark without deploying python libraries to every yarn node

- Spark supports pyspark on yarn cluster without deploying python libraries from Spark 1.4
 - https://issues.apache.org/jira/browse/SPARK-6869
 - apache/spark#5580, apache/spark#5478

Author: Jongyoul Lee <jongyoul@gmail.com>

Closes #118 from jongyoul/ZEPPELIN-18 and squashes the following commits:

a47e27c [Jongyoul Lee] - Fixed test script for spark 1.4.0
72a65fd [Jongyoul Lee] - Fixed test script for spark 1.4.0
ee6d100 [Jongyoul Lee] - Cleanup codes
47fd9c9 [Jongyoul Lee] - Cleanup codes
248e330 [Jongyoul Lee] - Cleanup codes
4cd10b5 [Jongyoul Lee] - Removed meaningless codes comments
c9cda29 [Jongyoul Lee] - Removed setting SPARK_HOME - Changed the location of pyspark's directory into interpreter/spark
ef240f5 [Jongyoul Lee] - Fixed typo
06002fd [Jongyoul Lee] - Fixed typo
4b35c8d [Jongyoul Lee] [ZEPPELIN-18] Running pyspark without deploying python libraries to every yarn node - Dummy for trigger
682986e [Jongyoul Lee] rebased
8a7bf47 [Jongyoul Lee] [ZEPPELIN-18] Running pyspark without deploying python libraries to every yarn node - rebasing
ad610fb [Jongyoul Lee] rebased
94bdf30 [Jongyoul Lee] [ZEPPELIN-18] Running pyspark without deploying python libraries to every yarn node - Fixed checkstyle
929333d [Jongyoul Lee] rebased
64b8195 [Jongyoul Lee] [ZEPPELIN-18] Running pyspark without deploying python libraries to every yarn node - rebasing
0a2d90e [Jongyoul Lee] rebased
b05ae6e [Jongyoul Lee] [ZEPPELIN-18] Remove setting SPARK_HOME for PySpark - Excludes python/** from apache-rat
71e2a92 [Jongyoul Lee] [ZEPPELIN-18] Running pyspark without deploying python libraries to every yarn node - Removed verbose setting
0ddb436 [Jongyoul Lee] [ZEPPELIN-18] Running pyspark without deploying python libraries to every yarn node - Followed spark's way to support pyspark - https://issues.apache.org/jira/browse/SPARK-6869 - apache/spark#5580 - https://github.com/apache/spark/pull/5478/files
1b192f6 [Jongyoul Lee] [ZEPPELIN-18] Remove setting SPARK_HOME for PySpark - Removed redundant dependency setting
32fd9e1 [Jongyoul Lee] [ZEPPELIN-18] Running pyspark without deploying python libraries to every yarn node - rebasing
asfgit pushed a commit to apache/zeppelin that referenced this pull request Jul 5, 2015
@Sephiroth-Lin Sephiroth-Lin deleted the SPARK-6869 branch May 15, 2016 10:15