[spark] Unify MERGE INTO assignment alignment by Zouxxyy · Pull Request #7976 · apache/paimon

Zouxxyy · 2026-05-26T08:09:16Z

Purpose

Route MERGE INTO assignment alignment through PaimonOutputResolver so MERGE / UPDATE / INSERT share one name-based, depth-aware alignment path with consistent merge-schema semantics.

Behavior is controlled by spark.paimon.write.merge-schema:

- Top-level alignment is by name. Unmentioned target columns are NULL-filled under explicit clauses (matches Spark INSERT FILL).
- INSERT * / UPDATE * with a source column missing from the target throws when merge-schema=false; when true, source-extras are kept so SchemaHelper evolves the table at write time.
- Nested struct alignment follows MissingFieldBehavior:
  - FailMissing (merge-schema=false): nested missing target / source-extra throws.
  - NullForMissing (merge-schema=true, INSERT and explicit UPDATE): missing nested fields NULL-fill, source-extras kept at any depth.
  - PreserveTarget (merge-schema=true, UPDATE * on a struct target): missing source subfields are substituted with GetStructField(target, ordinal) so unmentioned subfields keep their current value instead of being nulled.
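
As a rough illustration of the three modes above, the following plain-Scala sketch aligns one struct level by name. It is not the PaimonOutputResolver code: `StructAlignSketch`, `align`, and the `Map`-based struct model are hypothetical names invented for this example, NULL is modeled as `None`, and keeping source-extras under PreserveTarget is an assumption based on merge-schema=true rather than something the description states.

```scala
// Hypothetical, simplified model of MissingFieldBehavior. Not Paimon code.
sealed trait MissingFieldBehavior
case object FailMissing extends MissingFieldBehavior
case object NullForMissing extends MissingFieldBehavior
case object PreserveTarget extends MissingFieldBehavior

object StructAlignSketch {
  // One struct level: field name -> value, where None stands in for SQL NULL.
  type Struct = Map[String, Option[Any]]

  def align(
      targetFields: Seq[String],
      source: Struct,
      currentTarget: Struct,
      behavior: MissingFieldBehavior): Seq[(String, Option[Any])] = {
    // Fields present in the source but unknown to the target ("source-extras").
    val extras = source.keys.filterNot(k => targetFields.contains(k)).toSeq.sorted
    if (behavior == FailMissing && extras.nonEmpty) {
      throw new IllegalArgumentException(s"source-only fields: ${extras.mkString(", ")}")
    }

    val aligned: Seq[(String, Option[Any])] = targetFields.map { name =>
      source.get(name) match {
        case Some(value) => name -> value
        case None =>
          behavior match {
            case FailMissing =>
              throw new IllegalArgumentException(s"missing field: $name")
            case NullForMissing =>
              name -> None
            case PreserveTarget =>
              // Stands in for GetStructField(target, ordinal): keep the current value.
              name -> currentTarget.getOrElse(name, None)
          }
      }
    }

    // Assumption: under merge-schema, extras are appended so the schema can evolve.
    aligned ++ extras.map(n => n -> source(n))
  }
}
```

With target fields `a, b`, a source providing `a` and an extra `c`, and a current target row holding `b = 2`, NullForMissing yields `b = NULL` while PreserveTarget yields `b = 2`; FailMissing rejects the input.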

Tests

New MergeIntoAlignmentTest (24 cases) covers basic UPDATE * / INSERT *, source-extra drop under strict star, top-level and nested merge-schema evolution, PreserveTarget semantics on nested struct UPDATE *, explicit assignments to nested fields, and null-fill for omitted columns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three focused tests under default spark.sql.caseSensitive=false:

- top-level column UPDATE * / INSERT * via expandStarAssignments
- explicit SET LHS resolution via resolveAssignments
- nested struct field matching via resolveStructType recursion

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
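
To make the case-insensitive matching concrete, here is a minimal sketch of name resolution with a pluggable comparison function. `CaseInsensitiveMatchSketch`, `Resolver`, and `resolve` are hypothetical names for this example, and treating ambiguous matches as unresolved is an assumption; the sketch does not reproduce expandStarAssignments, resolveAssignments, or resolveStructType.

```scala
// Hypothetical sketch of name matching under spark.sql.caseSensitive=false.
object CaseInsensitiveMatchSketch {
  // A resolver decides whether two column names refer to the same column.
  type Resolver = (String, String) => Boolean

  val caseSensitive: Resolver = (a, b) => a == b
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Returns the matching target column when exactly one target column matches.
  def resolve(
      targetColumns: Seq[String],
      sourceName: String,
      resolver: Resolver): Option[String] =
    targetColumns.filter(resolver(_, sourceName)) match {
      case Seq(single) => Some(single)
      case _ => None // no match, or an ambiguous match (assumed to be unresolved)
    }
}
```

Under the case-insensitive resolver, a source column `NAME` resolves to a target column `Name`; under the case-sensitive one it does not.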

leaves12138

I reviewed the MERGE INTO alignment changes. The common alignment/evolution path and the Spark 3/4 shims look reasonable to me, and CI is green. I found one documentation mismatch that should be fixed before merge.

leaves12138 · 2026-05-27T01:37:01Z

+
+`INSERT *` / `UPDATE SET *` expand against the target columns:
+
+- Source columns missing from the target are rejected by default. Enable `spark.paimon.write.merge-schema` to keep them and evolve the table schema at write time (see [Write Merge Schema](#write-merge-schema)).


This section does not seem to match the implementation and the new tests. For top-level source-only columns under UPDATE SET * / INSERT *, strict mode currently drops them (see, for example, "source extra columns silently dropped under star expansion (mergeSchema=false)"), while merge-schema=true evolves the target schema. Conversely, for target-only columns missing from the source, strict UPDATE SET * / INSERT * throws, and only merge-schema mode preserves the target value for UPDATE SET * or fills default/NULL for INSERT *. Could you adjust these bullets to describe those four cases explicitly?
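
The four cases requested in this comment can be written down as a small decision table. The sketch below only encodes the behavior as described in the comment; it has not been checked against the implementation, and `StarExpansionCases`, `outcome`, and the outcome strings are hypothetical names chosen for illustration.

```scala
// Hypothetical decision table for top-level columns under UPDATE SET * / INSERT *.
object StarExpansionCases {
  sealed trait Clause
  case object UpdateStar extends Clause
  case object InsertStar extends Clause

  sealed trait Column
  case object SourceOnly extends Column // present in source, missing from target
  case object TargetOnly extends Column // present in target, missing from source

  def outcome(clause: Clause, column: Column, mergeSchema: Boolean): String =
    (column, mergeSchema, clause) match {
      case (SourceOnly, false, _) => "drop source column"
      case (SourceOnly, true, _) => "evolve target schema"
      case (TargetOnly, false, _) => "throw"
      case (TargetOnly, true, UpdateStar) => "preserve target value"
      case (TargetOnly, true, InsertStar) => "fill default or NULL"
    }
}
```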

JingsongLi

This is a significant refactoring with good motivations — unifying the assignment alignment logic removes a lot of duplicated version-specific code. The +1777/-1481 diff shows meaningful consolidation.

A few observations:

- MissingFieldBehavior semantics are well thought out: The three modes (FailMissing, NullForMissing, PreserveTarget) cover the right matrix of explicit vs star clauses under strict/merge-schema modes. The PreserveTarget mode for UPDATE * on nested structs (substituting GetStructField(target, ordinal) for unmentioned subfields) is the correct behavior.
- Test coverage looks thorough: 24 cases in MergeIntoAlignmentTest covering the key scenarios. This gives me confidence in the refactoring.
- Question about the Spark 4.0 path: The diff removes PaimonMergeIntoResolverBase.scala from paimon-spark-4.0 and significantly simplifies Spark41MergeIntoRewrite.scala. Can you confirm the Spark 4.0 tests pass? The Spark 4 MERGE INTO semantics differ from 3.x in how they handle star clause expansion.
- Documentation update: Good that the docs update mentions the spark.paimon.write.merge-schema control. This is a user-facing behavior change worth highlighting in release notes.

Overall this looks like a good consolidation. @Zouxxyy please confirm CI passes for all Spark versions (3.2, 3.3, 3.4, 3.5, 4.0).

Zouxxyy and others added 2 commits May 26, 2026 16:05

[spark] Unify MERGE INTO assignment alignment

2931488

[docs] Document MERGE INTO column alignment

05eaf3e

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Zouxxyy closed this May 26, 2026

Zouxxyy reopened this May 26, 2026

Zouxxyy and others added 2 commits May 26, 2026 17:20

Update

0b591d4

leaves12138 reviewed May 27, 2026

JingsongLi reviewed May 27, 2026

Zouxxyy marked this pull request as draft May 27, 2026 02:41

Update doc

e47d056

Zouxxyy marked this pull request as ready for review May 27, 2026 04:36

Zouxxyy wants to merge 5 commits into apache:master from Zouxxyy:dev/merge-into-update

3 participants

