[SPARK-4616][Core] - SPARK_CONF_DIR is not effective in spark-submit #3559
Conversation
…through the command line options
Can one of the admins verify this patch?
Jenkins, this is ok to test.
Test build #24152 has started for PR 3559 at commit
Test build #24152 has finished for PR 3559 at commit
Test PASSed.
@JoshRosen Is there anything else needed for this patch to be pushed in? Any feedback / review would be great as well!
Hey @brennonyork I'm actually not sure if we should introduce an additional command line option to specify the environment file. I think it makes the semantics very confusing because we can't be sure which environment file is being used: is it the one specified through `--environment-file`, or the one found under `SPARK_CONF_DIR`? I think because of this circular dependency relationship between `SPARK_CONF_DIR` and `spark-env.sh` […]
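To make that ambiguity concrete, here is a hypothetical invocation (the class name, paths, and env file below are placeholders; `--environment-file` is the flag proposed by this PR). With both sources present, it is not obvious which `spark-env.sh` settings win:

```bash
# Two competing sources of environment configuration:
export SPARK_CONF_DIR=/etc/spark/conf   # this directory holds its own spark-env.sh

# ...and an explicit environment file passed on the command line.
./bin/spark-submit \
  --environment-file ./my-env.sh \
  --class com.example.MyApp \
  path/to/app.jar
```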
@andrewor14 definitely understand that reasoning. I guess my only question would be how we would a) answer the bug and still b) support the […]. What about this solution... we completely remove the […]
@andrewor14 @JoshRosen wondering what should be done with this issue; any thoughts on my comments above?
@brennonyork we can't remove […]. I think the better approach is to simply warn the user in […]
Roger that. What do you think about, for this PR, merely putting a blurb into the docs […]?
That sounds good. Feel free to make a brief mention in the docs where appropriate. |
By default the `SPARK_CONF_DIR` is not capable of being set from the `spark-submit` script, but a spark properties file is. After diving through the code it turned out that the `SPARK_CONF_DIR` is actually a cyclic reference as it is referenced within the `load-spark-env.sh` script and can also then be reset within the loaded `spark-env.sh` file. What's worse is that, if the `spark-env.sh` defined a `SPARK_CONF_DIR`, it wouldn't be picked up and used to grab the default `spark-defaults.conf` if no `--properties-file` flag was present. As such, it seemed best to provide a `--environment-file` flag that can be used to present an arbitrary bash file to the system which would preload any necessary environment configuration options (including `SPARK_CONF_DIR`). This then solves the original problem that the `SPARK_CONF_DIR` wasn't effective within `spark-submit`, but also removes the cyclic dependency on where and when the `SPARK_CONF_DIR` is loaded.
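For illustration, a rough sketch of the cycle described above (simplified, not the verbatim contents of Spark's `load-spark-env.sh`): the conf directory both decides where `spark-env.sh` is found and can be overwritten by it.

```bash
# load-spark-env.sh (simplified sketch):
# SPARK_CONF_DIR decides where spark-env.sh is looked up...
SPARK_CONF_DIR="${SPARK_CONF_DIR:-$SPARK_HOME/conf}"

if [ -f "${SPARK_CONF_DIR}/spark-env.sh" ]; then
  # ...but spark-env.sh may itself export a different SPARK_CONF_DIR here,
  # which is too late for spark-submit to resolve spark-defaults.conf from it
  # when no --properties-file flag was given.
  set -a
  . "${SPARK_CONF_DIR}/spark-env.sh"
  set +a
fi
```

Under the proposal, an environment file passed via `--environment-file` would be sourced before this resolution happens, so a `SPARK_CONF_DIR` exported there could still steer where `spark-defaults.conf` is read from.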