[SPARK-54289][SQL] Allow MERGE INTO to preserve existing struct fields for UPDATE SET * when source struct has less nested fields than target struct #53149

szehon-ho · 2025-11-21T02:47:56Z

What changes were proposed in this pull request?

Introduce a new flag, spark.sql.merge.nested.type.assign.by.field, that makes the UPDATE SET * action in MERGE INTO shorthand for assigning every nested struct field from its source counterpart (i.e., UPDATE SET a.b.c = source.a.b.c). As a result, struct fields in the target table that have no source equivalent are preserved when the corresponding source struct has fewer fields than the target.

Additional code prevents null expansion in this case (i.e., a null source struct expanding to a struct of nulls).
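To make the null-expansion problem concrete, here is a minimal sketch in plain Scala. It is not the code from this PR: `Value`, `rebuildByField`, and `rebuildByFieldNullSafe` are hypothetical names used only for illustration, and the sketch assumes the guarded result for a null source struct is simply null.

```scala
object NullExpansionModel {
  // Hypothetical toy model of SQL values; not Spark's internal representation.
  sealed trait Value
  case object NullValue extends Value
  final case class Leaf(value: Any) extends Value
  final case class Struct(fields: Map[String, Value]) extends Value

  // Reading a field out of a null (or non-struct) value yields null.
  def getField(value: Value, name: String): Value = value match {
    case Struct(fields) => fields.getOrElse(name, NullValue)
    case _              => NullValue
  }

  // Naive field-by-field rebuild: a null source turns into a struct of nulls.
  def rebuildByField(fieldNames: Seq[String], source: Value): Value =
    Struct(fieldNames.map(name => name -> getField(source, name)).toMap)

  // Guarded rebuild (assumed behavior): a null source stays null.
  def rebuildByFieldNullSafe(fieldNames: Seq[String], source: Value): Value =
    source match {
      case NullValue => NullValue
      case other     => rebuildByField(fieldNames, other)
    }
}
```

With field names `a` and `b` and a null source, `rebuildByField` returns a struct whose two fields are both null, while `rebuildByFieldNullSafe` returns null itself.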

Why are the changes needed?

Following #52347, MERGE INTO now allows a source table struct to have fewer nested fields than the target table struct. In this scenario, a user writing UPDATE SET * may intend one of two interpretations.

The user may interpret UPDATE SET * as shorthand for assigning every top-level column, i.e., UPDATE SET struct = source.struct. The target struct is then set to the source struct as is, with missing fields set to NULL. This is the current behavior.

The user may also mean UPDATE SET * as shorthand for assigning every nested struct field (i.e., UPDATE SET struct.a.b = source.struct.a.b), in which case target struct fields missing from the source are retained. This is similar to how UPDATE SET * does not override existing target columns that are missing from the source. This flag is added for that case.
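The difference between the two interpretations can be sketched with a small, self-contained model in plain Scala. This is an illustration only, not the logic in AssignmentUtils.scala: the `Value` types and the functions `replaceWhole` and `assignByField` are hypothetical names, and the model ignores type coercion and null source structs.

```scala
object UpdateSetStarModel {
  // Hypothetical toy model of SQL values; not Spark's internal representation.
  sealed trait Value
  case object NullValue extends Value
  final case class Leaf(value: Any) extends Value
  final case class Struct(fields: Map[String, Value]) extends Value

  // Interpretation 1 (struct = source.struct): the source struct replaces the
  // target struct, and target fields absent from the source become NULL.
  def replaceWhole(target: Struct, source: Struct): Struct =
    Struct(target.fields.map { case (name, targetValue) =>
      val newValue: Value = (targetValue, source.fields.get(name)) match {
        case (t: Struct, Some(s: Struct)) => replaceWhole(t, s)
        case (_, Some(s))                 => s
        case (_, None)                    => NullValue
      }
      name -> newValue
    })

  // Interpretation 2 (struct.a.b = source.struct.a.b): only fields present in
  // the source are assigned, and target-only fields keep their old values.
  def assignByField(target: Struct, source: Struct): Struct =
    Struct(target.fields.map { case (name, targetValue) =>
      val newValue: Value = (targetValue, source.fields.get(name)) match {
        case (t: Struct, Some(s: Struct)) => assignByField(t, s)
        case (_, Some(s))                 => s
        case (_, None)                    => targetValue
      }
      name -> newValue
    })
}
```

For a target struct `{a: 1, b: 2}` and a source struct `{a: 10}`, `replaceWhole` gives `{a: 10, b: NULL}`, while `assignByField` gives `{a: 10, b: 2}`.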

Does this PR introduce any user-facing change?

No. Support for source structs with fewer fields than target structs in MERGE INTO (#52347) has not been released yet, and in any case a flag toggles this functionality.

How was this patch tested?

Unit tests, especially around cases where the source struct is null.

Was this patch authored or co-authored using generative AI tooling?

No

dongjoon-hyun

~~Is this an improvement, @szehon-ho ?~~
~~It looks like a massive change if it's a bug fix.~~

Never mind. I checked the JIRA discussion we had before.

szehon-ho · 2025-11-21T09:41:15Z

@cloud-fan can you help review? Thanks

szehon-ho · 2025-11-21T09:43:04Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AssignmentUtils.scala

    }
  }

+  private def applyNestedFieldAssignments(


Note: this is like applyFieldAssignments above, but recurses into nested structs

dongjoon-hyun · 2025-11-21T15:53:09Z

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

      .createWithDefault(true)

+  val MERGE_INTO_SOURCE_NESTED_TYPE_UPDATE_BY_FIELD =
+    buildConf("spark.sql.merge.nested.type.assign.by.fieldv2")


Hi, @szehon-ho. The namespace design looks weird. Why is nested at a different level in the following two?

spark.sql.merge.nested.type.assign.by.fieldv2
spark.sql.merge.source.nested.type.coercion.enabled

szehon-ho

Ah yes, I fixed the config string in the latest version, thanks!

dongjoon-hyun · 2025-11-21T16:01:42Z

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

    getConf(SQLConf.LEGACY_XML_PARSER_ENABLED)

-  def coerceMergeNestedTypes: Boolean =
+  def mergeCoerceNestedTypes: Boolean =


Oh, you also knew that the naming was weird, didn't you, @szehon-ho? That's why you are renaming this in this PR.

Ah yes, it's not strictly related, but I realized it's better that they align.

dongjoon-hyun

+1, LGTM.

Closes #53149 from szehon-ho/merge_schema_evolution_update_nested.

Authored-by: Szehon Ho <szehon.apache@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 966e053)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

dongjoon-hyun · 2025-11-22T04:05:54Z

Merged to master/4.1 for Apache Spark 4.1.0.

Happy Thanksgiving, @szehon-ho .

github-actions bot added the SQL label Nov 21, 2025

szehon-ho force-pushed the merge_schema_evolution_update_nested branch 3 times, most recently from db416a9 to 142d795 Compare November 21, 2025 03:08

[SPARK-54289][SQL] Allow MERGE INTO to preserve existing struct fields for UPDATE SET * when source has less fields

fdddef1

szehon-ho force-pushed the merge_schema_evolution_update_nested branch from 142d795 to fdddef1 Compare November 21, 2025 03:13

dongjoon-hyun reviewed Nov 21, 2025

szehon-ho commented Nov 21, 2025

dongjoon-hyun reviewed Nov 21, 2025

Review comments

0011c07

dongjoon-hyun approved these changes Nov 22, 2025

dongjoon-hyun closed this in 966e053 Nov 22, 2025
