
[SPARK-3943] Some scripts bin\*.cmd pollutes environment variables in Windows #2797

Closed
wants to merge 1 commit

Conversation

tsudukim
Contributor

Modified not to pollute environment variables.
The main logic is simply moved from XXX.cmd into XXX2.cmd, and XXX.cmd now calls XXX2.cmd via the cmd command. pyspark.cmd and spark-class.cmd already use this approach, but spark-shell.cmd, spark-submit.cmd, and /python/docs/make.bat do not.
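As a sketch of the pattern described above (file names follow the XXX/XXX2 convention; the exact contents of the patch may differ), the outer script becomes a thin wrapper that runs the inner script in a child cmd.exe:

```bat
rem spark-shell.cmd -- outer wrapper (illustrative sketch, not the literal patch)
rem cmd /V /E /C spawns a child cmd.exe with delayed expansion and command
rem extensions enabled; any variables set by spark-shell2.cmd disappear when
rem the child process exits, so the caller's environment stays clean.
cmd /V /E /C %~dp0spark-shell2.cmd %*
```

The main logic, including every `set` statement, then lives entirely in the inner XXX2.cmd script.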

… Windows

Modified not to pollute environment variables.
@tsudukim
Contributor Author

Please merge this AFTER #2796 is merged, because /python/docs/make2.bat will be ignored by .gitignore in /python by mistake.

@AmplabJenkins

Can one of the admins verify this patch?

@andrewor14
Contributor

ok to test


```bat
set SPARK_HOME=%~dp0..

cmd /V /E /C %SPARK_HOME%\bin\spark-submit.cmd --class org.apache.spark.repl.Main %* spark-shell
```
Contributor

Is the purpose of this file such that exporting SPARK_HOME here doesn't keep it in the system environment even after the application exits?

Contributor Author

Yes, it is.

Although this script currently sets only one environment variable, SPARK_HOME, it may become more complicated in the future, just like bin/spark-shell (the Linux version), and might then export other environment variables as well.
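The isolation comes from ordinary process semantics: environment changes made in a child cmd.exe never propagate back to the parent. A minimal demonstration (not part of the patch; `DEMO_VAR` is a made-up name):

```bat
@echo off
rem DEMO_VAR is set only inside the child cmd.exe started with /C.
set DEMO_VAR=
cmd /V /C "set DEMO_VAR=polluted& echo inside child: !DEMO_VAR!"
rem Back in the parent shell, the variable is still undefined.
if not defined DEMO_VAR echo parent environment is clean
```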

@andrewor14
Contributor

I haven't done a close review to make sure this doesn't break anything yet, but the general idea looks good to me. Just so I understand correctly, is the purpose of this to ensure we don't accidentally read environment variables from previous applications that have now exited? Were you observing any strange behavior, or is this just a safeguard for the future?

@SparkQA

SparkQA commented Oct 14, 2014

QA tests have started for PR 2797 at commit b397a7d.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 14, 2014

QA tests have finished for PR 2797 at commit b397a7d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21737/

@tsudukim
Contributor Author

@andrewor14 thank you for following this PR.
Yes, that's what I mean.
I'm not observing any problems; this is just a safeguard. Polluting the environment might affect not only Spark scripts but also non-Spark batch scripts and applications.

@andrewor14
Contributor

I see. This LGTM, actually. I just took a closer look at the scripts, and it appears that we already do the same for older ones (pyspark.cmd, spark-class.cmd, etc.). I'm merging this into master.

@asfgit asfgit closed this in 66af8e2 Oct 15, 2014