[SPARK-54525][SQL] Disable nested struct coercion in MERGE INTO under a config #53229
def coerceMergeNestedTypes: Boolean =
  getConf(SQLConf.MERGE_INTO_NESTED_TYPE_COERCION_ENABLED)
// Disable until we define the semantics of UPDATE SET * with nested types
def coerceMergeNestedTypes: Boolean = false
shall we just turn off the config by default without removing tests?
sure, that'd be great
Is it possible to make this a behavior that folks can opt into while the semantics are being sorted out? I would find the new behavior very useful, and it's a bit sad to leave it disabled when it's already implemented. The gymnastics I do to handle this today aren't fun.
Sure, as per @cloud-fan's comment, we will disable it by config. I will mark the config as experimental and note that the semantics of nested field assignment may change in a future release for cases where the source and target schemas do not match.
Thank you, @szehon-ho . Looking forward to seeing the final status of this PR.
BTW, when you make a PR for the master branch, it affects Apache Spark 4.2.0 too. So, please remove this wording from this PR:
"This change disable it for Spark 4.1."
This reverts commit 96fca0e.
Updated the PR description and the PR. Because there is now a config to enable the feature, I manually reverted the more controversial PR #53149. UPDATE SET * should go back to the original, simpler behavior of replacing structs at the column level. Thanks!
|s STRUCT<c1: INT, c2: STRUCT<a: ARRAY<INT>, m: MAP<STRING, STRING>>>,
|dep STRING""".stripMargin,
"""{ "pk": 1, "s": { "c1": 2, "c2": { "a": [1,2], "m": { "a": "b" } } }, "dep": "hr" }""")
Seq(true, false).foreach { coerceNestedTypes =>
FYI: most of these changes are because coerceNestedTypes was true by default and is now false, so I added another dimension to these tests.
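For readers unfamiliar with the pattern, here is a rough, self-contained sketch of running one test body under both values of a flag. It is not the Spark test code: `conf`, `withConf`, `FlagKey`, and `mergeOutcome` are hypothetical stand-ins, and the key string is made up for illustration.

```scala
import scala.collection.mutable

// Hypothetical stand-ins, not Spark APIs: `conf` mimics a session config map
// and `withConf` mimics a test helper that sets a key for one block only.
object BothFlagValuesSketch {
  val FlagKey = "example.merge.coerceNestedTypes" // made-up key for illustration
  val conf: mutable.Map[String, String] = mutable.Map(FlagKey -> "false")

  // Set `key` to `value`, run `body`, then restore whatever was there before.
  def withConf[T](key: String, value: String)(body: => T): T = {
    val previous = conf.get(key)
    conf(key) = value
    try body
    finally {
      previous match {
        case Some(old) => conf(key) = old
        case None      => conf.remove(key)
      }
    }
  }

  // Toy behavior under test: it only reports which branch the flag selects.
  def mergeOutcome(): String =
    if (conf(FlagKey).toBoolean) "nested coercion on" else "nested coercion off"

  // The extra test dimension: the same body runs once per flag value.
  def runBothCases(): Seq[(Boolean, String)] =
    Seq(true, false).map { coerceNestedTypes =>
      withConf(FlagKey, coerceNestedTypes.toString) {
        coerceNestedTypes -> mergeOutcome()
      }
    }
}
```

Restoring the previous value in `finally` keeps one case from leaking its setting into the next, which matters once the default is no longer the value most tests were written against.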
dongjoon-hyun
left a comment
+1 for the direction. Let's see the CI result first. Thanks.
What changes were proposed in this pull request?
#52225 allows MERGE INTO to support the case where the assignment value is a struct with fewer fields than the assignment key, i.e. UPDATE SET big_struct = source.small_struct.
This PR turns that feature off by default; it can be turned on via a config.
Why are the changes needed?
The change raised some interesting questions. For example, there is some ambiguity in user intent: does UPDATE SET * mean "set all nested fields" or "set all top-level columns"? In the first case, target fields missing from the source are kept. In the second case, they are nullified.
I tried to make a choice in #53149, but after some feedback, choosing one interpretation over the other looks a bit controversial. A SQLConf may not be the right mechanism, and we may instead need to introduce new syntax, which requires more discussion.
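To make the two readings concrete, here is a minimal sketch in plain Scala. It is not Spark code and does not reflect how the analyzer resolves assignments: a struct is modeled as a hypothetical `Map` from field name to `Option[Any]`, with `None` standing in for SQL NULL, and the source is assumed to contain only a subset of the target's fields.

```scala
// Hypothetical model for illustration only: a struct value is a Map from
// field name to an optional value, where None stands for SQL NULL.
object UpdateSetStarSketch {
  type Struct = Map[String, Option[Any]]

  // Reading 1: "*" expands to nested fields. Only fields present in the
  // source are assigned, so the remaining target fields keep their values.
  def assignNestedFields(target: Struct, source: Struct): Struct =
    target ++ source

  // Reading 2: "*" expands to top-level columns. The whole struct is replaced,
  // so target fields that the source lacks become NULL.
  def replaceColumn(target: Struct, source: Struct): Struct =
    target.map { case (field, _) => field -> source.getOrElse(field, None) }

  def main(args: Array[String]): Unit = {
    val target: Struct = Map("c1" -> Some(1), "c2" -> Some("existing"))
    val source: Struct = Map("c1" -> Some(10)) // the source struct has no c2

    println(assignNestedFields(target, source)) // c2 keeps its old value
    println(replaceColumn(target, source))      // c2 becomes None (NULL)
  }
}
```

Both functions agree whenever the source carries every target field; they only diverge on the missing ones, which is exactly the case #52225 newly allowed.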
Does this PR introduce any user-facing change?
No, this feature is unreleased.
How was this patch tested?
Existing unit tests.
Was this patch authored or co-authored using generative AI tooling?
No