[SPARK-6869][PySpark] Add pyspark archives path to PYTHONPATH #5580

Closed
wants to merge 20 commits

Conversation

lianhuiwang
Contributor

Based on #5478, which provides a PYSPARK_ARCHIVES_PATH env variable. With this PR, we only need to export PYSPARK_ARCHIVES_PATH=/user/spark/pyspark.zip,/user/spark/python/lib/py4j-0.8.2.1-src.zip in conf/spark-env.sh when PySpark is not installed on each YARN node. I ran a Python application successfully in both yarn-client and yarn-cluster mode with this PR.
@andrewor14 @sryza @Sephiroth-Lin Can you take a look at this? Thanks.

@SparkQA

SparkQA commented Apr 19, 2015

Test build #30559 has started for PR 5580 at commit 9f31dac.

@SparkQA

SparkQA commented Apr 19, 2015

Test build #30560 has started for PR 5580 at commit 31e8e06.

@SparkQA

SparkQA commented Apr 19, 2015

Test build #30559 has finished for PR 5580 at commit 9f31dac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30559/

@SparkQA

SparkQA commented Apr 19, 2015

Test build #30560 has finished for PR 5580 at commit 31e8e06.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30560/

// In yarn mode for a python app, if PYSPARK_ARCHIVES_PATH is in the user environment
// add pyspark archives to files that can be distributed with the job
if (args.isPython && clusterManager == YARN) {
  sys.env.get("PYSPARK_ARCHIVES_PATH").map { archives =>
Contributor

Does the user set this himself? What does he set it to?

Contributor Author

The Spark administrator can set PYSPARK_ARCHIVES_PATH in spark-env.sh to a local or HDFS path for the pyspark zip; users then run their Python applications as before.
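
The quoted hunk above is truncated at the closure. A hedged guess at how it might continue, folding the archives into the files spark-submit ships with the job; `args.files` is SparkSubmit's comma-separated file list, and the merging below is illustrative, not the merged code:

sys.env.get("PYSPARK_ARCHIVES_PATH").foreach { archives =>
  // args.files is a comma-separated String; append the archives so YARN
  // distributes them alongside the job's other files.
  args.files = Option(args.files).map(_ + "," + archives).getOrElse(archives)
}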

@andrewor14
Contributor

@lianhuiwang Two high-level comments. First, why not just put it on --py-files? If that works as expected, it will automatically be added to the executors' PYTHONPATH without you having to do it manually, as you have done in Client.scala. Second, this still seems to require some action on the user's part: they must manually zip up all the Python archives themselves and put them on PYSPARK_ARCHIVES_PATH. I would propose that we do this automatically for the user behind the scenes using Java's ZipEntry APIs.
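
A minimal sketch of the "zip it behind the scenes" idea with java.util.zip, assuming we zip SPARK_HOME/python/pyspark into python/lib/pyspark.zip; the helper below is illustrative, not the code that was eventually merged:

import java.io.{File, FileInputStream, FileOutputStream}
import java.util.zip.{ZipEntry, ZipOutputStream}

// Recursively add every file under srcDir to zipFile, keeping paths under a
// top-level pyspark/ prefix so Python can import the package from the zip.
def zipDir(srcDir: File, zipFile: File): Unit = {
  val out = new ZipOutputStream(new FileOutputStream(zipFile))
  def addFiles(dir: File, prefix: String): Unit = {
    for (f <- dir.listFiles()) {
      if (f.isDirectory) {
        addFiles(f, prefix + f.getName + "/")
      } else {
        out.putNextEntry(new ZipEntry(prefix + f.getName))
        val in = new FileInputStream(f)
        val buf = new Array[Byte](8192)
        var n = in.read(buf)
        while (n != -1) { out.write(buf, 0, n); n = in.read(buf) }
        in.close()
        out.closeEntry()
      }
    }
  }
  addFiles(srcDir, "pyspark/")
  out.close()
}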

@lianhuiwang
Contributor Author

@andrewor14 On the first question (why not just put it on --py-files?): if we do not set PYTHONPATH for the executor in Client, the executor throws "/usr/bin/python: No module named pyspark", because the executor cannot use --py-files. So we also need to set the PYTHONPATH environment variable in Client so that executors can find the pyspark module.
On the second question: right now it requires setting PYSPARK_ARCHIVES_PATH in spark-env.sh, much like the HADOOP_HOME environment variable. Why is this setting needed? If SPARK_HOME or the PySpark path is installed locally on every node, there is no need to set PYSPARK_ARCHIVES_PATH, but we cannot know whether that is the case; the administrator can.
I also agree that we should zip the archives when assembling Spark's jar.
@andrewor14 What do you think? Thanks.
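
A hedged sketch of what "set PYTHONPATH in Client" means here: the archives distributed with the job land in each YARN container's working directory ({{PWD}}), so the executor launch environment points PYTHONPATH at them. The map and entry names below are illustrative, not the merged Client.scala code:

import java.io.File
import scala.collection.mutable

val env = mutable.HashMap[String, String]()  // stands in for the container launch env
val pythonPath = Seq("{{PWD}}" + File.separator + "pyspark.zip",
    "{{PWD}}" + File.separator + "py4j-0.8.2.1-src.zip")
  .mkString(File.pathSeparator)
// Append to any PYTHONPATH already present rather than clobbering it.
env("PYTHONPATH") = env.get("PYTHONPATH")
  .map(existing => existing + File.pathSeparator + pythonPath)
  .getOrElse(pythonPath)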

@lianhuiwang
Contributor Author

@andrewor14 For the second question, I added two things. One: we zip the pyspark archives into pyspark/lib when building the Spark jar. Two: at submit time, if PYSPARK_ARCHIVES_PATH is not set and pyspark.zip does not exist, we zip the archives into pyspark/lib.
I also added a conf, 'spark.submit.pyArchives', to store the pyspark archives so that Client knows about them. We cannot use the PYSPARK_ARCHIVES_PATH env variable for this, because Client and spark-submit run in the same process, and when we set PYSPARK_ARCHIVES_PATH during submit, Client cannot see it.
One thing to note: if PySpark is installed on every node, we now need to point PYSPARK_ARCHIVES_PATH at the local pyspark archives, because spark-submit checks whether PYSPARK_ARCHIVES_PATH exists.
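
A minimal sketch of the conf handoff described above, assuming spark-submit and yarn.Client share one JVM, so a conf entry set during submit is visible to Client while a freshly exported env variable would not be. `spark.submit.pyArchives` is the key named in the comment; the maps stand in for SparkSubmit's real property plumbing:

import scala.collection.mutable

// In SparkSubmit, after resolving where the archives live (paths illustrative):
val sysProps = mutable.Map[String, String]()
val pyArchives = "/path/to/pyspark.zip,/path/to/py4j-0.8.2.1-src.zip"
sysProps("spark.submit.pyArchives") = pyArchives

// Later, in yarn.Client, read the same key back from the conf:
val archivesFromConf: Option[String] = sysProps.get("spark.submit.pyArchives")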

@lianhuiwang
Contributor Author

@tgravescs I think this PR will be useful for you; you can try it.

for (sparkHome <- sys.env.get("SPARK_HOME")) {
  val pyLibPath = Seq(sparkHome, "python", "lib").mkString(File.separator)
  val pyArchivesFile = new File(pyLibPath, "pyspark.zip")
  if (!pyArchivesFile.exists()) {
Contributor

I think if we just make sure the zip is built during the build, then we don't need to do the zipping in code; just require that it's already there.

Contributor Author

I think that's not enough. Sometimes we upgrade just by copying spark.jar, so zipping in code is good for that situation.

Contributor

I'm not sure I follow. If you just upgrade spark.jar, then there are no changes to the Python scripts, so you don't need to put a new pyspark.zip there. If there are changes, then you either need to copy over the new Python scripts or put a new pyspark.zip there, and putting a new pyspark.zip there seems easier. Although I guess you need the Python scripts there anyway for client mode, so you probably need both.

In many cases I wouldn't expect a user to have write permissions on the python/lib directory. I would expect that to be a privileged operation. In that case the zip would fail.

Contributor Author

Yes, I agree with you. Thanks.
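
A minimal sketch of the resolution reached above, assuming the build (or make-distribution.sh) is responsible for producing pyspark.zip: submit time then only verifies that the zip exists, rather than zipping into python/lib, which the user may not have permission to write. The error message is illustrative:

import java.io.File

for (sparkHome <- sys.env.get("SPARK_HOME")) {
  val pyLibPath = Seq(sparkHome, "python", "lib").mkString(File.separator)
  val pyArchivesFile = new File(pyLibPath, "pyspark.zip")
  // Fail fast instead of trying to create the zip in a possibly read-only dir.
  require(pyArchivesFile.exists(),
    s"pyspark.zip not found at $pyLibPath; build Spark or run make-distribution.sh first.")
}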

@lianhuiwang
Contributor Author

@tgravescs Yes, I agree with your comments and have updated the PR. Can you review it again? Thanks.

@SparkQA

SparkQA commented Apr 27, 2015

Test build #31000 has started for PR 5580 at commit 5192cca.

@Sephiroth-Lin
Contributor

If users don't use make-distribution.sh and just compile Spark with Maven or sbt, they won't have pyspark.zip. So do we really not need to do the zip in code?

@tgravescs
Contributor

@lianhuiwang As mentioned, can we have the zip happen during the package phase rather than in make-distribution.sh?

if (args.isPython && clusterManager == YARN) {
  var pyArchives: String = null
  if (sys.env.contains("PYSPARK_ARCHIVES_PATH")) {
    pyArchives = sys.env.get("PYSPARK_ARCHIVES_PATH").get
Contributor

sys.env("PYSPARK_ARCHIVES_PATH")? Or even:

sys.env.get("PYSPARK_ARCHIVES_PATH").getOrElse(
  // code to figure out where the archives are
)

Contributor Author

I have updated it using Option, because in getOrElse(default), default must be a value and cannot be an expression.

Contributor

Actually that's not true; getOrElse takes its default by name:

def getOrElse[B >: A](default: ⇒ B): B 

Which is why you can do getOrElse(throw SomeException()) (look for it in Spark's code base).
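
To illustrate the by-name default: the fallback block below only runs when the Option is empty, so arbitrary expressions (including a throw) are fine. The fallback path itself is illustrative:

import java.io.File

val pyArchives = sys.env.get("PYSPARK_ARCHIVES_PATH").getOrElse {
  // Only evaluated when PYSPARK_ARCHIVES_PATH is unset.
  val sparkHome = sys.env.getOrElse("SPARK_HOME",
    throw new IllegalStateException("SPARK_HOME is not set"))
  Seq(sparkHome, "python", "lib", "pyspark.zip").mkString(File.separator)
}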

@tgravescs
Contributor

I tested this out and it is working fine for me with JDK 7 in both cluster and client mode. The code looks good. My only comment is that I think we can now stop packaging the Python stuff in the assembly jar.

@lianhuiwang Have you looked at removing those files? If it's not much work, it would be best to do it here; if it looks like quite a bit, we can file another JIRA for it. I would like to get this into 1.4, given the discussions about ending support for JDK 6.

@tgravescs
Contributor

Having heard nothing from @lianhuiwang, I'm going to commit this and file a follow-up to remove the Python files from the assembly jar.

asfgit pushed a commit that referenced this pull request May 8, 2015
Based on #5478 that provide a PYSPARK_ARCHIVES_PATH env. within this PR, we just should export PYSPARK_ARCHIVES_PATH=/user/spark/pyspark.zip,/user/spark/python/lib/py4j-0.8.2.1-src.zip in conf/spark-env.sh when we don't install PySpark on each node of Yarn. i run python application successfully on yarn-client and yarn-cluster with this PR.
andrewor14 sryza Sephiroth-Lin Can you take a look at this?thanks.

Author: Lianhui Wang <lianhuiwang09@gmail.com>

Closes #5580 from lianhuiwang/SPARK-6869 and squashes the following commits:

66ffa43 [Lianhui Wang] Update Client.scala
c2ad0f9 [Lianhui Wang] Update Client.scala
1c8f664 [Lianhui Wang] Merge remote-tracking branch 'remotes/apache/master' into SPARK-6869
008850a [Lianhui Wang] Merge remote-tracking branch 'remotes/apache/master' into SPARK-6869
f0b4ed8 [Lianhui Wang] Merge remote-tracking branch 'remotes/apache/master' into SPARK-6869
150907b [Lianhui Wang] Merge remote-tracking branch 'remotes/apache/master' into SPARK-6869
20402cd [Lianhui Wang] use ZipEntry
9d87c3f [Lianhui Wang] update scala style
e7bd971 [Lianhui Wang] address vanzin's comments
4b8a3ed [Lianhui Wang] use pyArchivesEnvOpt
e6b573b [Lianhui Wang] address vanzin's comments
f11f84a [Lianhui Wang] zip pyspark archives
5192cca [Lianhui Wang] update import path
3b1e4c8 [Lianhui Wang] address tgravescs's comments
9396346 [Lianhui Wang] put zip to make-distribution.sh
0d2baf7 [Lianhui Wang] update import paths
e0179be [Lianhui Wang] add zip pyspark archives in build or sparksubmit
31e8e06 [Lianhui Wang] update code style
9f31dac [Lianhui Wang] update code and add comments
f72987c [Lianhui Wang] add archives path to PYTHONPATH

(cherry picked from commit ebff732)
Signed-off-by: Thomas Graves <tgraves@apache.org>
asfgit closed this in ebff732 May 8, 2015
asfgit pushed a commit that referenced this pull request May 8, 2015
Add `python/lib/pyspark.zip` to `.gitignore`. After merging #5580, `python/lib/pyspark.zip` will be generated when building Spark.

Author: zsxwing <zsxwing@gmail.com>

Closes #6017 from zsxwing/gitignore and squashes the following commits:

39b10c4 [zsxwing] Ignore python/lib/pyspark.zip
asfgit pushed a commit that referenced this pull request May 8, 2015

(cherry picked from commit dc71e47)
Signed-off-by: Andrew Or <andrew@databricks.com>
@lianhuiwang
Contributor Author

@tgravescs Sorry for my late reply. I think #6022 takes care of SPARK-7485.

asfgit pushed a commit that referenced this pull request May 12, 2015
…n python/pyspark

As PR #5580 we have created pyspark.zip on building and set PYTHONPATH to python/lib/pyspark.zip, so to keep consistence update this.

Author: linweizhong <linweizhong@huawei.com>

Closes #6047 from Sephiroth-Lin/pyspark_pythonpath and squashes the following commits:

8cc3d96 [linweizhong] Set PYTHONPATH to python/lib/pyspark.zip rather than python/pyspark as PR#5580 we have create pyspark.zip on build
asfgit pushed a commit that referenced this pull request May 12, 2015

(cherry picked from commit 9847875)
Signed-off-by: Andrew Or <andrew@databricks.com>
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
jongyoul added a commit to jongyoul/zeppelin that referenced this pull request Jun 24, 2015
jongyoul added a commit to jongyoul/zeppelin that referenced this pull request Jun 25, 2015
jongyoul added a commit to jongyoul/zeppelin that referenced this pull request Jul 3, 2015
jongyoul added a commit to jongyoul/zeppelin that referenced this pull request Jul 4, 2015
asfgit pushed a commit to apache/zeppelin that referenced this pull request Jul 5, 2015
…very yarn node

- Spark supports pyspark on yarn cluster without deploying python libraries from Spark 1.4
 - https://issues.apache.org/jira/browse/SPARK-6869
 - apache/spark#5580, apache/spark#5478

Author: Jongyoul Lee <jongyoul@gmail.com>

Closes #118 from jongyoul/ZEPPELIN-18 and squashes the following commits:

a47e27c [Jongyoul Lee] - Fixed test script for spark 1.4.0
72a65fd [Jongyoul Lee] - Fixed test script for spark 1.4.0
ee6d100 [Jongyoul Lee] - Cleanup codes
47fd9c9 [Jongyoul Lee] - Cleanup codes
248e330 [Jongyoul Lee] - Cleanup codes
4cd10b5 [Jongyoul Lee] - Removed meaningless codes comments
c9cda29 [Jongyoul Lee] - Removed setting SPARK_HOME - Changed the location of pyspark's directory into interpreter/spark
ef240f5 [Jongyoul Lee] - Fixed typo
06002fd [Jongyoul Lee] - Fixed typo
4b35c8d [Jongyoul Lee] [ZEPPELIN-18] Running pyspark without deploying python libraries to every yarn node - Dummy for trigger
682986e [Jongyoul Lee] rebased
8a7bf47 [Jongyoul Lee] [ZEPPELIN-18] Running pyspark without deploying python libraries to every yarn node - rebasing
ad610fb [Jongyoul Lee] rebased
94bdf30 [Jongyoul Lee] [ZEPPELIN-18] Running pyspark without deploying python libraries to every yarn node - Fixed checkstyle
929333d [Jongyoul Lee] rebased
64b8195 [Jongyoul Lee] [ZEPPELIN-18] Running pyspark without deploying python libraries to every yarn node - rebasing
0a2d90e [Jongyoul Lee] rebased
b05ae6e [Jongyoul Lee] [ZEPPELIN-18] Remove setting SPARK_HOME for PySpark - Excludes python/** from apache-rat
71e2a92 [Jongyoul Lee] [ZEPPELIN-18] Running pyspark without deploying python libraries to every yarn node - Removed verbose setting
0ddb436 [Jongyoul Lee] [ZEPPELIN-18] Running pyspark without deploying python libraries to every yarn node - Followed spark's way to support pyspark - https://issues.apache.org/jira/browse/SPARK-6869 - apache/spark#5580 - https://github.com/apache/spark/pull/5478/files
1b192f6 [Jongyoul Lee] [ZEPPELIN-18] Remove setting SPARK_HOME for PySpark - Removed redundant dependency setting
32fd9e1 [Jongyoul Lee] [ZEPPELIN-18] Running pyspark without deploying python libraries to every yarn node - rebasing
asfgit pushed a commit to apache/zeppelin that referenced this pull request Jul 5, 2015

(cherry picked from commit 3bd2b21)
Signed-off-by: Lee moon soo <moon@apache.org>