[flink] introduce a simplified MERGE INTO procedure on data-evolution-table for flink #7128

steFaiz · 2026-01-27T03:06:57Z

Purpose

Linked issue: #7019

This PR introduces a simplified MERGE INTO action/procedure on DataEvolutionTable for Flink.
The motivation is that, for data-evolution tables, we can efficiently update or insert columns without rewriting existing data files. The paimon-spark module implements this through MERGE INTO syntax, which Flink does not support, so we introduce this action to simulate MERGE INTO behavior.

NOTE: Due to limitations in Flink’s implementation, we currently support only the MERGE UPDATE SET branch, unlike paimon-spark; inserting new records is not supported yet.

The process is as follows:

1. Join the source table with the target table (INNER JOIN) on the merge condition. This step assigns a _row_id to each row of the source table.
2. Shuffle the joined rows by the FirstRowId corresponding to each newly assigned _row_id. This ensures that rows belonging to the same file are processed by the same writer operator.
3. Write the updated/inserted columns to new files:
   a. Sort rows by _row_id.
   b. Read the original data for each row range and merge it with the new rows.
   c. Write out the merged data.
4. Commit the new files.
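To make steps 2 and 3a more concrete, here is a minimal Java sketch of how an updated row's _row_id could be mapped to the FirstRowId of the file that owns it, and how rows could then be grouped and sorted per file. It assumes each existing data file is described only by its first_row_id and row count, with non-overlapping ranges. The class FirstRowIdRouter and its methods are hypothetical illustrations, not Paimon APIs, and the actual Flink shuffle (for example a keyBy on this key) is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical sketch: route updated rows to the data file that owns their _row_id. */
public final class FirstRowIdRouter {

    // firstRowId -> rowCount; a file covers [firstRowId, firstRowId + rowCount - 1].
    private final NavigableMap<Long, Long> files = new TreeMap<>();

    public void addFile(long firstRowId, long rowCount) {
        files.put(firstRowId, rowCount);
    }

    /** Returns the first_row_id of the file containing rowId, i.e. the shuffle key. */
    public long firstRowIdOf(long rowId) {
        // The owning file is the one with the largest firstRowId <= rowId.
        Map.Entry<Long, Long> e = files.floorEntry(rowId);
        if (e == null || rowId >= e.getKey() + e.getValue()) {
            throw new IllegalArgumentException("row id " + rowId + " is not covered by any file");
        }
        return e.getKey();
    }

    /** Groups row ids by owning file (one group per writer key), each group sorted by _row_id. */
    public Map<Long, List<Long>> groupByFirstRowId(List<Long> rowIds) {
        Map<Long, List<Long>> groups = new TreeMap<>();
        for (long rowId : rowIds) {
            groups.computeIfAbsent(firstRowIdOf(rowId), k -> new ArrayList<>()).add(rowId);
        }
        // Step 3a: sort each group by _row_id before merging with the original file.
        groups.values().forEach(list -> list.sort(null));
        return groups;
    }

    public static void main(String[] args) {
        FirstRowIdRouter router = new FirstRowIdRouter();
        router.addFile(1, 5); // rows [1, 5], as in the Merge Detail example below
        router.addFile(6, 3); // rows [6, 8], a hypothetical second file
        System.out.println(router.groupByFirstRowId(List.of(7L, 3L, 6L, 1L)));
    }
}
```

A sorted map turns the lookup into a floor search that is logarithmic in the number of files; the real implementation may index files differently.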

This implementation is specifically designed for cases where the source table may be much smaller than the target table. Each writer is responsible for reading the original file data. Another possible approach is to perform a LEFT OUTER JOIN of the target table with the source table instead of an INNER JOIN.

Merge Detail

New rows are merged with existing rows so that the new files are aligned with the existing files. For example, consider the existing rows:

_row_id	value (double)	first_row_id
1	12.34	1
2	0.00	1
3	-7.50	1
4	100.01	1
5	3.14	1

They belong to the same file, whose row range is [1, 5].
Then an updated row arrives:

_row_id	value (double)	first_row_id
3	10000.00	1

We merge the existing file with the new row and write out:

_row_id	value (double)	first_row_id
1	12.34	1
2	0.00	1
3	10000.00	1
4	100.01	1
5	3.14	1
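Because both the original rows of a file and the incoming updates are sorted by _row_id, the merge in the example above amounts to a single pass over the file's row range. The following Java sketch (Java 16+ for records) models one updated column as a double, as in the example. RowRangeMerger and Row are hypothetical names, not Paimon classes; in the actual data-evolution write only the updated columns would go into the new file.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of merging sorted updates into one file's row range. */
public final class RowRangeMerger {

    /** One row: its _row_id and the value of the (single) updated column. */
    public record Row(long rowId, double value) {}

    /**
     * original: all rows of one file, sorted by rowId, covering its whole row range.
     * updates: rows routed to this file, sorted by rowId, with unique row ids.
     */
    public static List<Row> merge(List<Row> original, List<Row> updates) {
        List<Row> out = new ArrayList<>(original.size());
        int u = 0;
        for (Row row : original) {
            if (u < updates.size() && updates.get(u).rowId() == row.rowId()) {
                out.add(updates.get(u)); // the updated value replaces the original one
                u++;
            } else {
                out.add(row); // untouched rows are copied through, keeping alignment
            }
        }
        // Any update left unconsumed was outside the range or out of order.
        if (u != updates.size()) {
            throw new IllegalArgumentException("update row id outside the file's row range");
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> original = List.of(
                new Row(1, 12.34), new Row(2, 0.00), new Row(3, -7.50),
                new Row(4, 100.01), new Row(5, 3.14));
        List<Row> updates = List.of(new Row(3, 10000.00));
        merge(original, updates).forEach(r -> System.out.println(r.rowId() + "\t" + r.value()));
    }
}
```

The output always has exactly one row per _row_id in the file's range, which is what keeps the new file aligned with the existing one.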

Tests

Please see org.apache.paimon.flink.action.DataEvolutionMergeIntoActionITCase

API and Format

This PR does not modify any existing API.

Documentation

Will be added ASAP


JingsongLi

Please add documentation in /append-table/data-evolution too.

JingsongLi · 2026-01-29T02:45:48Z

+1

JingsongLi · 2026-01-29T02:45:54Z

Thanks @steFaiz !

[flink] introduce a simplified MERGE INTO action on data-evolution-table for flink

d78934a

steFaiz marked this pull request as draft January 27, 2026 03:07

steFaiz added 4 commits January 27, 2026 17:58

fix compilation

2adc6ff

minor fix

064a090

introduce related procedure to fix test

81fbe84

fix test

f975a52

steFaiz changed the title ~~[wip][flink] introduce a simplified MERGE INTO action on data-evolution-table for flink~~ [flink] introduce a simplified MERGE INTO action on data-evolution-table for flink Jan 28, 2026

steFaiz changed the title ~~[flink] introduce a simplified MERGE INTO action on data-evolution-table for flink~~ [flink] introduce a simplified MERGE INTO procedure on data-evolution-table for flink Jan 28, 2026

steFaiz marked this pull request as ready for review January 28, 2026 07:15

JingsongLi reviewed Jan 28, 2026


add docs

4ef3ee2

JingsongLi merged commit bc6d341 into apache:master Jan 29, 2026
14 checks passed

2 participants
