Description
Is your feature request related to a problem or challenge?
The goal is to support MERGE INTO SQL statements in DataFusion so that downstream table providers (specifically Iceberg via iceberg-rust) can implement merge logic. The iceberg-rust repo (feature/merge-into branch) already has a merge_into function on its DataFusion table provider that expects DataFusion to parse MERGE INTO SQL and invoke a merge_into hook on TableProvider.
Example SQL:
```sql
MERGE INTO target_table t
USING source_table s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.value = s.value
WHEN NOT MATCHED THEN INSERT (id, value) VALUES (s.id, s.value)
```
Describe the solution you'd like
The implementation follows the same pattern as the UPDATE/DELETE DML hooks (PR #19142): reuse the existing DmlStatement logical plan node with a new WriteOp::MergeInto(MergeIntoOp) variant. MergeIntoOp carries the merge-specific data (the ON condition and the WHEN clauses), DmlStatement.input holds the source plan (the USING clause), and DmlStatement.target holds the target table.
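To make the proposed shape concrete, here is a minimal, self-contained sketch of how the example statement could map onto these types. It is only a simplified model, not the actual datafusion-expr definitions: expressions, the source plan, and the target table are plain strings here, and `MergeClause`, `MergeAction`, and `example_merge` are hypothetical names used for illustration.

```rust
/// Action taken by a single WHEN clause (hypothetical, simplified).
#[derive(Debug, Clone, PartialEq)]
pub enum MergeAction {
    /// `UPDATE SET col = expr, ...` as (column, expression) pairs.
    Update { assignments: Vec<(String, String)> },
    /// `INSERT (cols) VALUES (exprs)`.
    Insert { columns: Vec<String>, values: Vec<String> },
}

/// One `WHEN [NOT] MATCHED THEN ...` clause (hypothetical, simplified).
#[derive(Debug, Clone, PartialEq)]
pub struct MergeClause {
    /// true for WHEN MATCHED, false for WHEN NOT MATCHED.
    pub matched: bool,
    pub action: MergeAction,
}

/// Merge-specific data carried by the new WriteOp variant.
#[derive(Debug, Clone, PartialEq)]
pub struct MergeIntoOp {
    /// The ON condition; a real implementation would hold an expression tree.
    pub on: String,
    pub clauses: Vec<MergeClause>,
}

/// Reduced stand-in for WriteOp; other variants are omitted or simplified.
#[derive(Debug, Clone, PartialEq)]
pub enum WriteOp {
    Update,
    Delete,
    MergeInto(MergeIntoOp),
}

/// Reduced stand-in for DmlStatement.
#[derive(Debug, Clone, PartialEq)]
pub struct DmlStatement {
    /// Target table (the MERGE INTO table).
    pub target: String,
    pub op: WriteOp,
    /// Source plan (the USING clause), modelled as a table name here.
    pub input: String,
}

/// Builds the model for the example MERGE INTO statement in this issue.
pub fn example_merge() -> DmlStatement {
    let update = MergeClause {
        matched: true,
        action: MergeAction::Update {
            assignments: vec![("value".to_string(), "s.value".to_string())],
        },
    };
    let insert = MergeClause {
        matched: false,
        action: MergeAction::Insert {
            columns: vec!["id".to_string(), "value".to_string()],
            values: vec!["s.id".to_string(), "s.value".to_string()],
        },
    };
    DmlStatement {
        target: "target_table".to_string(),
        op: WriteOp::MergeInto(MergeIntoOp {
            on: "t.id = s.id".to_string(),
            clauses: vec![update, insert],
        }),
        input: "source_table".to_string(),
    }
}
```

Keeping the merge data inside the `WriteOp` variant means the statement node itself needs no new fields, which is the point of reusing DmlStatement.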
sqlparser v0.61.0 already parses Statement::Merge, so no parser work is needed.
The following tasks are already implemented in the PoC branch as three commits. I plan to raise the PRs one after another, since the fork repo doesn't support stacked PRs.
- Add MERGE INTO types to datafusion-expr #20763
- Add merge_into hook to TableProvider trait
- Add SQL planner (merge_to_plan) + physical planner dispatch
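The hook and dispatch steps in the list above could look roughly like the following sketch. It assumes a heavily simplified, synchronous trait: `TableProviderSketch`, `PlanError`, `PlainTable`, `MergeableTable`, and `dispatch` are hypothetical names, and the real TableProvider hook would presumably be async and return an execution plan rather than a string. The idea shown is a default implementation that rejects MERGE INTO, so only providers that opt in need to override it.

```rust
/// Merge-specific data; reduced to the ON condition for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct MergeIntoOp {
    pub on: String,
}

/// Reduced stand-in for WriteOp.
#[derive(Debug, Clone, PartialEq)]
pub enum WriteOp {
    Update,
    Delete,
    MergeInto(MergeIntoOp),
}

/// Hypothetical error type standing in for the planner's error.
#[derive(Debug, PartialEq)]
pub enum PlanError {
    NotImplemented(String),
}

/// Hypothetical, simplified table provider trait with a merge hook.
pub trait TableProviderSketch {
    fn name(&self) -> &str;

    /// Default: providers that do not override this reject MERGE INTO.
    fn merge_into(&self, _source: &str, _op: &MergeIntoOp) -> Result<String, PlanError> {
        Err(PlanError::NotImplemented(format!(
            "MERGE INTO is not supported for table {}",
            self.name()
        )))
    }
}

/// A provider that keeps the default (unsupported) behaviour.
pub struct PlainTable;

impl TableProviderSketch for PlainTable {
    fn name(&self) -> &str {
        "plain"
    }
}

/// A provider that opts in by overriding the hook.
pub struct MergeableTable;

impl TableProviderSketch for MergeableTable {
    fn name(&self) -> &str {
        "mergeable"
    }

    fn merge_into(&self, source: &str, op: &MergeIntoOp) -> Result<String, PlanError> {
        // A real provider would build its merge execution plan here.
        Ok(format!("merge {} into {} on {}", source, self.name(), op.on))
    }
}

/// Physical-planner-style dispatch: route MergeInto to the provider hook.
pub fn dispatch(
    table: &dyn TableProviderSketch,
    source: &str,
    op: &WriteOp,
) -> Result<String, PlanError> {
    match op {
        WriteOp::MergeInto(merge) => table.merge_into(source, merge),
        WriteOp::Update | WriteOp::Delete => Err(PlanError::NotImplemented(
            "not modelled in this sketch".to_string(),
        )),
    }
}
```

With this shape, existing providers keep compiling unchanged, and a MERGE INTO against them fails at planning time with a clear error.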