[flink][spark] fix incorrect first_row_id range in DataEvolution MergeInto#7790

Merged
JingsongLi merged 1 commit into apache:master from steFaiz:fix_de_merge_into_row_id
May 9, 2026
@steFaiz steFaiz commented May 9, 2026

Purpose

The current first row ID check in DataEvolutionPartialWriter may be incorrect when special files (i.e. blob files and vector files) are present, which can cause:

java.lang.AssertionError: assertion failed: Number of written records 2419 does not match expected number 244 for first row ID 19352.

This happens because the blob file's record count overrides the normal file's record count:
[image]

We should filter out special files when calculating the first_row_id to record_count mapping.
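
To illustrate the idea, here is a minimal, self-contained sketch of how such an override could arise and how filtering avoids it. It is not the code from this PR: `FileEntry`, its `special` flag, `naiveCounts`, and `filteredCounts` are hypothetical names invented for the example, and pairing the numbers from the error message with a normal file (2419 rows) and a blob file (244 rows) that share a first row ID is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FirstRowIdCounts {

    /** Hypothetical stand-in for file metadata; not Paimon's actual class. */
    static final class FileEntry {
        final long firstRowId;
        final long rowCount;
        final boolean special; // assumed flag: true for blob or vector files

        FileEntry(long firstRowId, long rowCount, boolean special) {
            this.firstRowId = firstRowId;
            this.rowCount = rowCount;
            this.special = special;
        }
    }

    /** Naive mapping: a later file with the same first row ID overwrites an earlier one. */
    static Map<Long, Long> naiveCounts(List<FileEntry> files) {
        Map<Long, Long> counts = new HashMap<>();
        for (FileEntry f : files) {
            counts.put(f.firstRowId, f.rowCount);
        }
        return counts;
    }

    /** Filtered mapping: special files are skipped, so only normal files contribute. */
    static Map<Long, Long> filteredCounts(List<FileEntry> files) {
        Map<Long, Long> counts = new HashMap<>();
        for (FileEntry f : files) {
            if (f.special) {
                continue; // ignore blob/vector files when building the mapping
            }
            counts.put(f.firstRowId, f.rowCount);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<FileEntry> files = List.of(
                new FileEntry(19352L, 2419L, false), // normal data file
                new FileEntry(19352L, 244L, true));  // blob file, same first row ID
        System.out.println(naiveCounts(files).get(19352L));    // prints 244
        System.out.println(filteredCounts(files).get(19352L)); // prints 2419
    }
}
```

With the naive mapping, the expected count for first row ID 19352 becomes 244 while 2419 records are actually written, which matches the shape of the assertion failure above. With the filter in place, the expected count is taken from the normal file only.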

Tests

See:
org.apache.paimon.flink.action.DataEvolutionMergeIntoActionITCase for the Flink test
org.apache.paimon.spark.sql.BlobTestBase for the Spark test

@steFaiz steFaiz force-pushed the fix_de_merge_into_row_id branch from 3641cbd to e0f3ba7 Compare May 9, 2026 04:36
@JingsongLi
Thanks @steFaiz +1

@JingsongLi JingsongLi merged commit 1c5f2de into apache:master May 9, 2026
12 checks passed