SPARK-2404: don't overwrite SPARK_HOME when it is set #1331
Conversation
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
All automated tests passed.
This change seems quite reasonable, though we should probably do this in other places as well, like run-example and pyspark.
Sure, I will append the modification this evening.
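For reference, a minimal sketch of what that same guard might look like when applied to another launcher script such as bin/pyspark; the actual diff is not shown in this thread, so treat the exact variable names and layout as assumptions:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: resolve this script's parent directory, the way
# the existing launcher scripts do with FWDIR.
FWDIR="$(cd "$(dirname "$0")"/..; pwd)"

# Keep a user-supplied SPARK_HOME; only fall back to FWDIR when unset.
export SPARK_HOME="${SPARK_HOME:-$FWDIR}"
```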
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16444/
....no luck with Jenkins recently
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16445/
@aarondav how about the current one?
Just to be sure, this will only change the exported SPARK_HOME variable, which may be used to actually start executors or some such. However, we will keep using …
@pwendell I know you've looked at this problem before (of having multiple spark directories that we multiplex between), could you take a look at this PR?
I think using $SPARK_HOME instead of $FWDIR may introduce errors: e.g., if the SPARK_HOME path on the remote cluster does not exist on the local machine (where the driver runs), then replacing $FWDIR with $SPARK_HOME means the command scripts will not be found.
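A sketch of the distinction being raised here, assuming the usual FWDIR resolution (the compute-classpath.sh call mirrors how the launcher scripts resolve the classpath; treat the exact lines as an assumption). SPARK_HOME may legitimately point at a cluster-side path, so local helpers must still be resolved via FWDIR:

```bash
FWDIR="$(cd "$(dirname "$0")"/..; pwd)"

# SPARK_HOME may point at a directory that only exists on the remote
# cluster, so it is exported but never used to locate local scripts.
export SPARK_HOME="${SPARK_HOME:-$FWDIR}"

# Local helper scripts are still resolved relative to this script,
# so they are found even when $SPARK_HOME does not exist locally.
CLASSPATH="$("$FWDIR"/bin/compute-classpath.sh)"
```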
ping....
ping
@CodingCat what if instead of this we just don't ship the local spark home to the cluster? There is really no reason to do that... I spoke with @andrewor14 about it today. A Spark Standalone cluster already knows where it is installed, so there is no reason for the spark location on the driver node to be identical to the location on the worker.
OK, that would be a more comprehensive solution; I will close this one, then.
https://issues.apache.org/jira/browse/SPARK-2404
In spark-class and spark-submit, SPARK_HOME is unconditionally set to the script's parent directory, so any value of SPARK_HOME that was already defined gets overwritten.
We should not overwrite SPARK_HOME if it has already been defined.
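A minimal sketch of the proposed guard in bin/spark-class, assuming the usual FWDIR resolution; the actual diff may differ in detail:

```bash
FWDIR="$(cd "$(dirname "$0")"/..; pwd)"

# Before: export SPARK_HOME="$FWDIR"  -- clobbers any existing value.
# After: only fall back to the script's parent directory when the
# user has not already exported SPARK_HOME.
if [ -z "$SPARK_HOME" ]; then
  export SPARK_HOME="$FWDIR"
fi
```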
Our scenario:
We have a login portal for all the team members to use the Spark cluster; everyone gets an account and a home directory. Spark 1.0 is copied to the root path "/", and every account gets a soft link to /spark-1.0 in its home directory.
Spark 1.0 is deployed in a cluster where the user name differs from every portal account except one, say "CodingCat". We set a global SPARK_HOME to /home/CodingCat/spark-1.0, which is consistent with the remote cluster setup. Unfortunately, this value is overwritten by spark-class and spark-submit, so when a user runs spark-shell from the login portal, it always tries to run /home/user_account/spark-1.0/bin/compute-classpath.sh, which does not exist.
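A reproduction sketch for this scenario, using the paths from the description above (user_account stands in for any portal account):

```bash
# Global setting on the login portal, matching the cluster-side install:
export SPARK_HOME=/home/CodingCat/spark-1.0

# Each account launches Spark through the symlink in its home directory:
/home/user_account/spark-1.0/bin/spark-shell

# Without the fix, the launcher scripts re-export SPARK_HOME as
# /home/user_account/spark-1.0; that path only exists on the portal
# (as a symlink), so the cluster side cannot find
# /home/user_account/spark-1.0/bin/compute-classpath.sh.
```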