[SPARK-48392][CORE] Also load spark-defaults.conf when provided --properties-file #46709
Conversation
```scala
    }

    if (propertiesFile == null) {
      propertiesFile = defaultSparkConf
    }
```
This is mostly to make the test happy
I think this follows the original behavior of overwriting the `propertiesFile` value, although it looks like strange behavior.
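For clarity, here is a minimal standalone sketch of the fallback being discussed, with hypothetical stand-ins for the real `SparkSubmitArguments` fields (this is not the actual Spark code):

```scala
object PropertiesFileFallback {
  def main(args: Array[String]): Unit = {
    // Hypothetical stand-ins for the real fields.
    var propertiesFile: String = null // would be set by --properties-file
    val defaultSparkConf: String =
      sys.env.get("SPARK_CONF_DIR").map(d => s"$d/spark-defaults.conf").orNull

    // Preserve the original behavior: when the user gave no file,
    // report the defaults file as the effective properties file.
    if (propertiesFile == null) {
      propertiesFile = defaultSparkConf
    }
    println(s"Using properties file: $propertiesFile")
  }
}
```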
```diff
@@ -1623,6 +1640,22 @@ class SparkSubmitSuite
     }
   }

+  private def withPropertyFile(fileName: String, conf: Map[String, String])(f: String => Unit) = {
```
Extract the common logic for creating a property file into a util function.
cc @viirya @dongjoon-hyun @cloud-fan, let me know what you think.
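For reference, a rough sketch of what such a test helper could look like; the names and temp-file handling here are assumptions, not the exact `SparkSubmitSuite` code:

```scala
import java.io.{File, FileOutputStream}
import java.util.Properties

object PropertyFileTestUtil {
  // Write `conf` to a temp .properties file, pass its path to `f`, then clean up.
  def withPropertyFile(fileName: String, conf: Map[String, String])(f: String => Unit): Unit = {
    val file = new File(System.getProperty("java.io.tmpdir"), fileName)
    val props = new Properties()
    conf.foreach { case (k, v) => props.setProperty(k, v) }
    val out = new FileOutputStream(file)
    try props.store(out, "test properties") finally out.close()
    try f(file.getAbsolutePath) finally file.delete()
  }
}
```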
This proposed behavior looks reasonable to me. If there is any documentation on this, we probably need to update it too.
Do you also think we should add an item to the migration guide for this?
Thanks @viirya. There is related documentation here: https://spark.apache.org/docs/latest/configuration.html#dynamically-loading-spark-properties. It looks like the behavior change in this PR is consistent with the description there.
Yes sure. I can add this to the migration guide as a separate PR afterwards.
Yea, but it seems it doesn't mention how `--properties-file` interacts with `spark-defaults.conf`.
```diff
@@ -125,14 +125,27 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
    * When this is called, `sparkProperties` is already filled with configs from the latter.
    */
   private def mergeDefaultSparkProperties(): Unit = {
-    // Use common defaults file, if not specified by user
-    propertiesFile = Option(propertiesFile).getOrElse(Utils.getDefaultPropertiesFile(env))
+    // Honor --conf before the defaults file
```
Suggested change:
```diff
-    // Honor --conf before the defaults file
+    // Honor --conf before the specified properties file and defaults file
```
Done
Yea makes sense. Just updated the doc too.
Thanks! Merged to master/4.0.
You need to update all the usage information for commands with …
@yaooqinn do you mean the … which I think is also consistent with the behavior here, although not complete? I can add more info there.
Do the commands with the above message also get affected?
No, they are not affected. This PR only changes `spark-submit`.
I'm not sure if such an implicit discrepancy is user-friendly or not.
This is a behavior change and it seems there's no easy way for users to restore the original behavior. Do you think it would be safer to add a new CLI argument, such as `--extra-properties-file`, instead? By adding `--extra-properties-file`, I think the affected scope could be kept to a minimum.

A potential use case for the original `--properties-file`: there might be multiple Spark clusters (or YARN clusters) to submit Spark jobs to, and multiple Spark config files under $SPARK_CONF_DIR, such as:

- spark-defaults.conf --> stores the default cluster settings
- spark-cluster-a.conf --> stores the cluster settings for cluster a
- spark-cluster-b.conf --> stores the cluster settings for cluster b

In most cases, the default cluster is used. For some specific jobs, cluster a might be used. After this change, both `spark-defaults.conf` and `spark-cluster-a.conf` are loaded, which might be undesired, as users are already assuming that only one property file will be loaded (see the sketch after this comment).
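To make the concern concrete, here is a small self-contained sketch of how `spark-defaults.conf` would now backfill a cluster-specific file; the keys and values are made up for illustration:

```scala
object MultiClusterConfSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical contents of the two files under $SPARK_CONF_DIR.
    val sparkDefaults = Map("spark.master" -> "yarn", "spark.eventLog.enabled" -> "true")
    val clusterA      = Map("spark.master" -> "k8s://cluster-a")

    // Before this PR: only the file passed via --properties-file is read.
    val before = clusterA

    // After this PR: spark-defaults.conf backfills keys missing from spark-cluster-a.conf.
    val after = clusterA ++ sparkDefaults.filter { case (k, _) => !clusterA.contains(k) }

    println(s"before: $before") // Map(spark.master -> k8s://cluster-a)
    println(s"after : $after")  // also picks up spark.eventLog.enabled=true, possibly unwanted
  }
}
```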
```diff
-    // Use common defaults file, if not specified by user
-    propertiesFile = Option(propertiesFile).getOrElse(Utils.getDefaultPropertiesFile(env))
-    // Honor --conf before the defaults file
+    // Honor --conf before the specified properties file and defaults file
     defaultSparkProperties.foreach { case (k, v) =>
```
When running with `--verbose`, the `defaultSparkProperties` evaluation will log "Using properties file: null" if `--properties-file` is not provided, since `propertiesFile` is not set yet, which I think is not expected.
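A toy reproduction of the ordering issue (assumed names, not the real `SparkSubmitArguments`): a lazy field that logs its input observes `null` when it is forced before the fallback assignment runs.

```scala
object LazyLoggingOrder {
  def main(args: Array[String]): Unit = {
    var propertiesFile: String = null // --properties-file was not given

    // Mirrors defaultSparkProperties: a lazy val that logs the file it reads from.
    lazy val defaultSparkProperties: Map[String, String] = {
      println(s"Using properties file: $propertiesFile")
      Map.empty
    }

    defaultSparkProperties // forced first: prints "Using properties file: null"
    propertiesFile = "/opt/spark/conf/spark-defaults.conf" // fallback runs too late for the log
  }
}
```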
Oops that's right. I'll try to address it in the follow-up PR.
For the properties defined in `spark-defaults.conf`: for what is described, it seems to give this change more support, as users can put common default ones in `spark-defaults.conf` and cluster-specific ones in separate property files. Actually I feel this is more aligned with the meaning of "default" configs.
+1 on what @viirya said. @advancedxy, to be clear, the configurations from `--properties-file` take precedence over those from `spark-defaults.conf`. Since we are preparing a new major release, I'm just hoping to take the opportunity to improve this. This is also important for kubeflow Spark k8s operator users, since currently there is no way to separate system-wide default configurations from user- and job-specific configurations.
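For concreteness, the effective precedence after this PR (`--conf`, then `--properties-file`, then `spark-defaults.conf`) can be modeled with plain maps; the keys and values below are made up:

```scala
object MergePrecedenceSketch {
  def main(args: Array[String]): Unit = {
    val confArgs      = Map("spark.executor.memory" -> "4g")            // from --conf
    val propsFile     = Map("spark.executor.memory" -> "2g",
                            "spark.eventLog.enabled" -> "true")         // from --properties-file
    val sparkDefaults = Map("spark.eventLog.enabled" -> "false",
                            "spark.ui.port" -> "4040")                  // spark-defaults.conf

    // Earlier sources win: a later source only fills keys that are still missing.
    val merged = scala.collection.mutable.Map.empty[String, String]
    Seq(confArgs, propsFile, sparkDefaults).foreach { source =>
      source.foreach { case (k, v) => if (!merged.contains(k)) merged(k) = v }
    }

    merged.toSeq.sortBy(_._1).foreach { case (k, v) => println(s"$k=$v") }
    // spark.eventLog.enabled=true (properties file beats defaults)
    // spark.executor.memory=4g   (--conf beats properties file)
    // spark.ui.port=4040         (only in defaults, still loaded)
  }
}
```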
@sunchao and @viirya thanks for your replies and clarifications. Yeah, I'm aware of the effectiveness and precedence between the two files. The biggest problem is that potentially unwanted configurations defined in `spark-defaults.conf` are now always loaded.
This sounds legit and it might be worth taking the risk.
This is a legit use case and we should definitely support it. It could be addressed by another CLI argument, though. Anyway, I would prefer an extra CLI argument such as `--extra-properties-file`.
IIUC, we want to have a way to load the default config file, and it's cleaner if …
BTW, I agree that it's a better behavior to just let …
+1 for … As a maintainer of Apache Kyuubi, which supports multi-tenant and multi-runtime environments, I'm concerned this change might pollute an isolated custom Spark conf.
Thanks for all the feedback! Let me re-purpose the follow-up PR #46782 to add this extra flag then (and we no longer need to update the migration guide). One minor inconvenience is that we'd also need to update the kubeflow k8s operator to consume this flag, and users of the operator would then need to upgrade first in order to use this. I'll make a PR there too.
…ide whether to load `spark-defaults.conf`

### What changes were proposed in this pull request?
Following the discussions in #46709, this PR adds a flag `--load-spark-defaults` to control whether `spark-defaults.conf` should be loaded when `--properties-file` is specified. By default, the flag is turned off.

### Why are the changes needed?
Ideally we should avoid behavior change and reduce user disruptions. For this reason, a flag is preferred.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
N/A

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #46782 from sunchao/SPARK-48392-followup.

Authored-by: Chao Sun <chao@openai.com>
Signed-off-by: Chao Sun <chao@openai.com>
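A minimal sketch, assuming simplified field names rather than the real parser, of how such a flag can gate the new behavior while keeping the old default:

```scala
object LoadSparkDefaultsSketch {
  // Hypothetical stand-in for the parsed CLI state.
  final case class Args(propertiesFile: Option[String], loadSparkDefaults: Boolean)

  // spark-defaults.conf is read when no properties file was given (old behavior),
  // or when --load-spark-defaults is passed explicitly (new opt-in behavior).
  def shouldLoadDefaults(args: Args): Boolean =
    args.propertiesFile.isEmpty || args.loadSparkDefaults

  def main(argv: Array[String]): Unit = {
    println(shouldLoadDefaults(Args(None, loadSparkDefaults = false)))            // true
    println(shouldLoadDefaults(Args(Some("my.conf"), loadSparkDefaults = false))) // false
    println(shouldLoadDefaults(Args(Some("my.conf"), loadSparkDefaults = true)))  // true
  }
}
```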
What changes were proposed in this pull request?

Currently if a property file is provided as an argument to Spark submit, `spark-defaults.conf` is ignored. This PR changes the behavior such that `spark-defaults.conf` is still loaded in this scenario, and any Spark configurations that are in the file but not in the input property file will be loaded.

Why are the changes needed?

Currently if a property file is provided as an argument to Spark submit, `spark-defaults.conf` is ignored. This causes inconvenience for users who want to split Spark configurations between the two. For example, in Spark on K8S, users may want to store system-wide default settings in `spark-defaults.conf`, while keeping user-specified configurations that are more dynamic in a property file passed to Spark applications via the `--properties-file` parameter. Currently this is not possible. See also kubeflow/spark-operator#1321.

Does this PR introduce any user-facing change?

Yes, now when a property file is provided via `--properties-file`, `spark-defaults.conf` will also be loaded. However, configurations specified in the former take precedence over the same configurations in the latter.

How was this patch tested?

Existing tests and a new test.

Was this patch authored or co-authored using generative AI tooling?

No