[SPARK-1680] [Docs] Explain environment variables for running on YARN in cluster mode #10869

Closed
wants to merge 2 commits into from

weineran
Contributor

JIRA SPARK-1680 added a property called `spark.yarn.appMasterEnv.[EnvironmentVariableName]`. This PR draws users' attention to this special case by adding an explanation to configuration.html#environment-variables.

…ark on YARN in cluster mode, which is a special case.
@AmplabJenkins

Can one of the admins verify this patch?

@srowen
Member

srowen commented Jan 21, 2016

CC @sryza @vanzin @tgravescs for a check

@tgravescs
Contributor

You should file a separate JIRA for this. The original went in and is closed.

I'm fine with the text. It might be nice to add something about preferring `spark.executorEnv.[EnvironmentVariableName]` over the `spark-env.sh` file.

@weineran
Contributor Author

It looks like the original JIRA status is "RESOLVED" but it was never "CLOSED" and so it is currently showing this pull request, which is probably a good thing. Since these documentation changes relate to the earlier code changes, I think it makes sense to put them on the same JIRA if possible. Otherwise, I'd be tempted to call this doc change "trivial" and skip creating a new JIRA.

Can you assist with the text relating to spark.executorEnv.[EnvironmentVariableName]? I don't have a deep understanding of when to use the property vs. when to use the .sh file.

@srowen
Member

srowen commented Jan 22, 2016

@weineran I agree with leaving it as is. My full reasoning: since that JIRA is so old and this change is still logically separable (i.e. one makes sense without the other), it's reasonable to make a new JIRA. However, it's also trivial (i.e. the diff is roughly the same as the description of the change), so in that sense it doesn't really matter.

@@ -1700,6 +1700,8 @@ to use on each machine and maximum memory.
Since `spark-env.sh` is a shell script, some of these can be set programmatically -- for example, you might
compute `SPARK_LOCAL_IP` by looking up the IP of a specific network interface.

Note: When running Spark on YARN in cluster mode, environment variables need to be set using the <code>spark.yarn.appMasterEnv.[EnvironmentVariableName]</code> property in your `conf/spark-defaults.conf` file. Environment variables that are set in `spark-env.sh` will not be reflected in the YARN Application Master process in cluster mode. See the [YARN-related Spark Properties](running-on-yarn.html#spark-properties) for more information.
Member

Nit: this needs back-ticks rather than <code>, right? I think it's OK.

Contributor Author

Ah, that could be. Does the <code> tag only get used in tables? Let me know if I should switch to back-ticks.

I just realized I should put back-ticks around `cluster` too.

Member

Yeah, as I recall, <code> is required in tables because the content sits inside the <table> tag, where back-ticks aren't parsed. Otherwise, use back-ticks.

@srowen
Member

srowen commented Jan 27, 2016

Merged to master

@asfgit asfgit closed this in 093291c Jan 27, 2016
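For illustration, here is a minimal sketch of how the note merged in this PR might be applied in `conf/spark-defaults.conf`, together with the `spark.executorEnv` property suggested in the review. The variable name `MY_APP_HOME` and the path `/opt/myapp` are hypothetical placeholders; only the two property prefixes come from the discussion above.

```properties
# conf/spark-defaults.conf (sketch; MY_APP_HOME and /opt/myapp are hypothetical)
# Each line is a property key and its value, separated by whitespace.

# Sets MY_APP_HOME in the YARN Application Master process.
# In cluster mode the driver runs inside the Application Master, so it sees this value.
spark.yarn.appMasterEnv.MY_APP_HOME   /opt/myapp

# Sets MY_APP_HOME in each executor process, instead of exporting it from spark-env.sh.
spark.executorEnv.MY_APP_HOME         /opt/myapp
```

The same settings can also be supplied for a single job by passing `--conf spark.yarn.appMasterEnv.MY_APP_HOME=/opt/myapp` to `spark-submit` when running with `--master yarn --deploy-mode cluster`.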
4 participants