@szehon-ho (Member) commented Nov 26, 2025

What changes were proposed in this pull request?

#52225 allowed MERGE INTO to support the case where the assignment value is a struct with fewer fields than the assignment key, i.e. UPDATE SET big_struct = source.small_struct.

This PR turns the feature off by default; it can be turned on via a config.
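
To make "off by default, opt-in via a config" concrete, here is a minimal Scala sketch. It is a hypothetical model, not Spark's SQLConf: the `MergeConfSketch` object and the key string `example.merge.nestedTypeCoercion.enabled` are invented for illustration, and only the accessor name `coerceMergeNestedTypes` comes from this PR.

```scala
object MergeConfSketch {
  // Hypothetical key for illustration only; not the real Spark config key.
  val NestedTypeCoercionKey: String = "example.merge.nestedTypeCoercion.enabled"

  // User-supplied settings; a missing key falls back to the default (false).
  final case class Conf(settings: Map[String, String] = Map.empty) {
    // Off by default: coercion is enabled only when the key is explicitly "true".
    def coerceMergeNestedTypes: Boolean =
      settings.get(NestedTypeCoercionKey).exists(_.trim.equalsIgnoreCase("true"))

    // Returns a new Conf with one setting added or overridden.
    def set(key: String, value: String): Conf = copy(settings = settings + (key -> value))
  }
}
```

With no settings, `Conf().coerceMergeNestedTypes` is false; a user who wants the new behavior opts in with `Conf().set(MergeConfSketch.NestedTypeCoercionKey, "true")`.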

Why are the changes needed?

The change raised some interesting questions; for example, there is ambiguity in user intent. Does UPDATE SET * mean setting all nested fields, or only the top-level columns? In the first case, missing fields are kept; in the second, missing fields are nullified.
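
The two readings can be compared with a small Scala model. This is a sketch under simplifying assumptions, not Spark's implementation: struct values are plain maps, the target value's shape stands in for the target schema, and `fieldLevel` / `columnLevel` are hypothetical names for the two interpretations.

```scala
object SetStarSketch {
  // Simplified SQL values: NULL, an INT, or a struct of named fields.
  sealed trait Value
  case object NullValue extends Value
  final case class IntValue(v: Int) extends Value
  final case class StructValue(fields: Map[String, Value]) extends Value

  // Interpretation 1: assign nested fields; fields missing from the source keep the target value.
  def fieldLevel(target: Value, source: Value): Value = (target, source) match {
    case (StructValue(t), StructValue(s)) =>
      StructValue(t.map { case (k, tv) => k -> s.get(k).fold(tv)(sv => fieldLevel(tv, sv)) })
    case (_, sv) => sv // leaves (or mismatched shapes) are simply overwritten
  }

  // Interpretation 2: replace the whole top-level column; fields missing from the source become NULL.
  def columnLevel(target: Value, source: Value): Value = (target, source) match {
    case (StructValue(t), StructValue(s)) =>
      StructValue(t.map { case (k, tv) => k -> s.get(k).fold[Value](NullValue)(sv => columnLevel(tv, sv)) })
    case (_, sv) => sv
  }
}
```

For a target `{c1: 2, c2: 5}` and a source `{c1: 10}`, `fieldLevel` keeps `c2 = 5`, while `columnLevel` sets `c2` to NULL.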

I tried to make a choice in #53149, but after some feedback it seems that choosing one interpretation over the other may be controversial. A SQLConf may not be the right mechanism; instead we may need to introduce new syntax, which requires more discussion.

Does this PR introduce any user-facing change?

No, this feature is unreleased.

How was this patch tested?

Existing unit tests.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions bot added the SQL label Nov 26, 2025
```diff
-def coerceMergeNestedTypes: Boolean =
-  getConf(SQLConf.MERGE_INTO_NESTED_TYPE_COERCION_ENABLED)
+// Disable until we define the semantics of UPDATE SET * with nested types
+def coerceMergeNestedTypes: Boolean = false
```
Contributor

shall we just turn off the config by default without removing tests?

Member Author

sure, that'd be great

@corleyma

Is it possible to make this a behavior that folks can opt into while the semantics are being sorted out? I would find the new behavior very useful, and it's a bit sad to leave it disabled when it's already implemented. The gymnastics I do to handle this today aren't fun.

@szehon-ho (Member Author) commented Nov 26, 2025

Sure. As per @cloud-fan's comment, we will disable it by config. I will mark the config as experimental and note that the semantics of nested field assignment may change in a future release when the source and target schemas do not match.

@dongjoon-hyun (Member) left a comment

Thank you, @szehon-ho. Looking forward to seeing the final status of this PR.

BTW, a PR against the master branch affects Apache Spark 4.2.0 too, so please remove this wording from the PR:

"This change disable it for Spark 4.1."

@szehon-ho (Member Author)

Updated the PR description and the PR.

Because there is now a config to enable the feature, I manually reverted the more controversial PR #53149. This restores the original, simpler behavior of replacing structs at the column level for UPDATE SET *.

Thanks!

```scala
|s STRUCT<c1: INT, c2: STRUCT<a: ARRAY<INT>, m: MAP<STRING, STRING>>>,
|dep STRING""".stripMargin,
"""{ "pk": 1, "s": { "c1": 2, "c2": { "a": [1,2], "m": { "a": "b" } } }, "dep": "hr" }""")
Seq(true, false).foreach { coerceNestedTypes =>
```
@szehon-ho (Member Author), Nov 26, 2025

FYI: most of these changes are because coerceNestedTypes was true by default and is now false, so I added another dimension to these tests.
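
The extra test dimension described above follows the `Seq(true, false).foreach { coerceNestedTypes => ... }` pattern. Below is a self-contained Scala sketch of that pattern, not the actual suite: `canAssign` is a hypothetical stand-in for the analyzer, and the rule it encodes (with the flag off, a struct assignment must supply exactly the target's fields; with it on, fields may be omitted but not added) is an assumption for illustration.

```scala
object CoercionDimensionSketch {
  // Hypothetical stand-in for the analyzer check; compares field names only, not types.
  // Assumption: coercion on => source may omit target fields; coercion off => exact match.
  def canAssign(target: Set[String], source: Set[String], coerceNestedTypes: Boolean): Boolean =
    if (coerceNestedTypes) source.subsetOf(target) else source == target

  // Runs the same check once per flag value, mirroring Seq(true, false).foreach in the suite.
  def checkMissingFieldAssignment(): Unit =
    Seq(true, false).foreach { coerceNestedTypes =>
      val allowed = canAssign(Set("c1", "c2"), Set("c1"), coerceNestedTypes)
      // The expected outcome now depends on the new test dimension.
      assert(allowed == coerceNestedTypes, s"coerceNestedTypes=$coerceNestedTypes")
    }
}
```

Each existing test body thus runs twice, and its expectation branches on the flag instead of assuming the old default.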

@dongjoon-hyun self-assigned this Nov 26, 2025
@szehon-ho changed the title from [SPARK-54525][SQL] Disable nested struct coercion in MERGE INTO to [SPARK-54525][SQL] Disable nested struct coercion in MERGE INTO under a config Nov 26, 2025
@dongjoon-hyun (Member) left a comment

+1 for the direction. Let's see the CI result first. Thanks.
