[SPARK-56669][SQL] Implement group filtering for WriteDelta row level operations #55635

Closed
gengliangwang wants to merge 2 commits into apache:master from gengliangwang:spark-56669-redo

@gengliangwang
Member

What changes were proposed in this pull request?

This PR implements group filtering for WriteDelta row level operations.

It re-applies #55612 (commit 5ef2e1ba174, reverted in 8e8fee2692f) and resolves the test failures reported in #55612 (comment) by updating the scan-count assertions in the transactional check tests in MergeIntoTableSuiteBase and UpdateTableSuiteBase. With group filtering, matchingRowsPlan re-scans the target, and for MERGE, RewritePredicateSubquery also re-scans the source. For MERGE, the delta scan counts now match the non-delta values, so the deltaMerge conditionals collapse. For UPDATE, the delta counts double but remain under the non-delta values because ReplaceData still adds further scans.
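
To illustrate why group filtering adds a target scan, here is a toy Python model. It is not Spark code and does not reproduce the scan counts asserted in the suites. `CountingTable`, `matching_groups`, and `update_delta` are hypothetical names invented for this sketch. It assumes a "group" is something like a file, and that filtering first scans the target to find groups with matching rows, then scans only those groups to build the delta.

```python
from typing import Callable, Dict, Iterator, List, Optional, Set, Tuple

Row = Dict[str, int]


class CountingTable:
    """Toy target table: rows are stored per group and every scan is counted."""

    def __init__(self, groups: Dict[str, List[Row]]) -> None:
        self.groups = groups
        self.scan_count = 0

    def scan(self, only_groups: Optional[Set[str]] = None) -> Iterator[Tuple[str, Row]]:
        # Each call models one scan of the target, optionally limited to some groups.
        self.scan_count += 1
        for group_id, rows in self.groups.items():
            if only_groups is None or group_id in only_groups:
                for row in rows:
                    yield group_id, row


def matching_groups(table: CountingTable, cond: Callable[[Row], bool]) -> Set[str]:
    # Hypothetical stand-in for a "matching rows" query: one extra target scan.
    return {group_id for group_id, row in table.scan() if cond(row)}


def update_delta(
    table: CountingTable, cond: Callable[[Row], bool], filter_groups: bool
) -> List[Tuple[str, Row]]:
    # With filtering, the delta scan is restricted to groups that have matches.
    groups = matching_groups(table, cond) if filter_groups else None
    return [(group_id, row) for group_id, row in table.scan(groups) if cond(row)]
```

In this toy model an update without filtering scans the target once, while the filtered variant scans it twice and produces the same delta. That is the shape of the change in the scan-count assertions, although the actual counts in the suites depend on the full plan.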

Why are the changes needed?

These changes are needed to close the gap in WriteDelta plans.

Does this PR introduce any user-facing change?

Changes are backward compatible.

How was this patch tested?

This PR comes with tests. Locally verified all 9 affected suites are green (517 tests):

build/sbt 'sql/testOnly \
  org.apache.spark.sql.connector.DeltaBasedMergeIntoTableSuite \
  org.apache.spark.sql.connector.DeltaBasedMergeIntoTableWithDeletionVectorsSuite \
  org.apache.spark.sql.connector.DeltaBasedMergeIntoTableUpdateAsDeleteAndInsertSuite \
  org.apache.spark.sql.connector.DeltaBasedUpdateTableSuite \
  org.apache.spark.sql.connector.DeltaBasedUpdateTableWithDeletionVectorsSuite \
  org.apache.spark.sql.connector.DeltaBasedUpdateAsDeleteAndInsertTableSuite \
  org.apache.spark.sql.connector.DeltaBasedNoMetadataDeleteFromTableSuite \
  org.apache.spark.sql.connector.GroupBasedMergeIntoTableSuite \
  org.apache.spark.sql.connector.GroupBasedUpdateTableSuite'

Was this patch authored or co-authored using generative AI tooling?

Claude Code v2.1.123.

aokolnychyi and others added 2 commits April 30, 2026 12:50
… operations

### What changes were proposed in this pull request?

This PR implements group filtering for WriteDelta row level operations.

### Why are the changes needed?

These changes are needed to close the gap in WriteDelta plans.

### Does this PR introduce _any_ user-facing change?

Changes are backward compatible.

### How was this patch tested?

This PR comes with tests.

### Was this patch authored or co-authored using generative AI tooling?

Claude Code v2.1.123.

Closes apache#55612 from aokolnychyi/spark-56669.

Authored-by: Anton Okolnychyi <aokolnychyi@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
…r WriteDelta group filtering

The previous commit added group filtering for WriteDelta row level operations
but missed updating scan-count assertions in the transactional check tests in
MergeIntoTableSuiteBase and UpdateTableSuiteBase, which still assumed the old
delta behavior. With group filtering, matchingRowsPlan re-scans the target,
and for MERGE RewritePredicateSubquery also re-scans the source.

For MERGE the delta scan counts now match the non-delta values, so the
deltaMerge conditionals collapse. For UPDATE the delta counts double but
remain under the non-delta values because ReplaceData adds further scans.
@gengliangwang
Member Author

I will just merge this re-apply commit to catch the branch cut.
