
SPARK-2404: don't overwrite SPARK_HOME when it is set #1331

Closed · wants to merge 2 commits

Conversation

CodingCat
Contributor

https://issues.apache.org/jira/browse/SPARK-2404

In spark-class and spark-submit, SPARK_HOME is set to the present working directory, which overwrites any already-defined SPARK_HOME value.

We should not overwrite SPARK_HOME if it has already been defined.
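The fix can be sketched as a guarded assignment. Below is a minimal, hypothetical shell sketch (the helper function and paths are illustrative, not the actual spark-class code): the default is applied only when SPARK_HOME is unset or empty.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the guarded assignment: only default SPARK_HOME
# when the user has not already defined it.
set_spark_home() {
  if [ -z "${SPARK_HOME}" ]; then
    export SPARK_HOME="$1"
  fi
}

SPARK_HOME=""                            # simulate: not defined by the user
set_spark_home /opt/spark-default
echo "$SPARK_HOME"                       # /opt/spark-default

SPARK_HOME=/home/CodingCat/spark-1.0     # simulate: user-defined
set_spark_home /opt/spark-default
echo "$SPARK_HOME"                       # /home/CodingCat/spark-1.0
```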

Our scenario:

We have a login portal that all team members use to reach the Spark cluster; everyone gets an account and a home directory. Spark 1.0 is copied to the root path "/", and every account gets a soft link to /spark-1.0 in its home directory.

Spark 1.0 is deployed in a cluster where the user name differs from every portal account except one, say "CodingCat". We set a global SPARK_HOME to /home/CodingCat/spark-1.0, which is consistent with the remote cluster setup, but unfortunately this is overwritten by spark-class and spark-submit. So in the login portal, when a user runs spark-shell, it always tries to run /home/user_account/spark-1.0/bin/compute-classpath.sh, which does not exist.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16411/

@aarondav
Contributor

aarondav commented Jul 8, 2014

This change seems quite reasonable, though we should probably do this in other places as well, like run-example and pyspark.

@CodingCat
Contributor Author

Sure, I will append the modification this evening.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16444/

@CodingCat
Contributor Author

....no luck with Jenkins recently

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16445/

@CodingCat
Contributor Author

@aarondav how about the current one?

@aarondav
Contributor

aarondav commented Jul 9, 2014

Just to be sure, this will only change the exported SPARK_HOME variable, which may be used to actually start executors or some such. However, we will keep using $FWDIR for running other commands within the script. I wonder if we should use SPARK_HOME instead of FWDIR (i.e., move the usage of FWDIR inside of this if statement) when available.

@pwendell I know you've looked at this problem before (of having multiple spark directories that we multiplex between), could you take a look at this PR?
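A rough sketch of that suggestion (variable names and paths are hypothetical, not the actual script): compute FWDIR from the script's location as before, but prefer SPARK_HOME for locating other commands whenever the user has set it.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: prefer a user-provided SPARK_HOME over FWDIR when
# choosing the directory other commands are run from.
SPARK_HOME=/opt/spark-install            # simulate: set by the user
FWDIR="$(cd "$(dirname "$0")" && pwd)"   # directory containing this script
if [ -n "${SPARK_HOME}" ]; then
  RUNDIR="$SPARK_HOME"                   # user-provided install location
else
  RUNDIR="$FWDIR"                        # fall back to the script's own dir
  export SPARK_HOME="$FWDIR"
fi
echo "$RUNDIR"                           # /opt/spark-install
```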

@CodingCat
Contributor Author

I think using $SPARK_HOME instead of $FWDIR may introduce errors.

For example, if the SPARK_HOME path used on the remote cluster does not exist on the local machine (where the driver runs), then replacing $FWDIR with $SPARK_HOME means the command scripts will not be found.
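The failure mode described above can be sketched as follows (paths are illustrative): if SPARK_HOME points at the cluster's install path and the launcher resolves helper scripts from it, the lookup fails on the driver machine where that path does not exist.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the failure mode: SPARK_HOME is valid on the
# cluster, but the corresponding path does not exist on the local driver.
SPARK_HOME=/home/CodingCat/spark-1.0     # exists on the cluster only
HELPER="$SPARK_HOME/bin/compute-classpath.sh"
MISSING=0
if [ ! -x "$HELPER" ]; then
  MISSING=1                              # helper lookup fails locally
  echo "missing: $HELPER" >&2
fi
```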

@CodingCat
Contributor Author

ping....

@CodingCat
Contributor Author

ping

@pwendell
Contributor

@CodingCat what if instead of this we just don't ship the local spark home to the cluster? There is really no reason to do that... I spoke with @andrewor14 about it today. A Spark Standalone cluster already knows where it is installed, so there is no reason for the spark location on the driver node to be identical to the location on the worker.

@CodingCat
Contributor Author

OK, that would be a more comprehensive solution.

I will close this one.

@CodingCat CodingCat closed this Jul 31, 2014
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022