[flink][spark] fix incorrect first_row_id range in DataEvolution MergeInto by steFaiz · Pull Request #7790 · apache/paimon

steFaiz · 2026-05-09T04:14:43Z

The current first row ID check in DataEvolutionPartialWriter may be incorrect when special files are present, i.e. blob files and vector files, which can cause:

java.lang.AssertionError: assertion failed: Number of written records 2419 does not match expected number 244 for first row ID 19352.

This is because the blob file's record count overrides the normal file's record count.

We should filter out special files when calculating the first_row_id to record_count mapping.
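To illustrate the idea, here is a minimal, self-contained sketch of how an unfiltered mapping can pick up the wrong count and how filtering avoids it. It is not the code in this PR: `FileInfo`, its `special` flag, and both helper methods are hypothetical stand-ins, and the row counts are made-up values chosen only to show the overwrite.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FirstRowIdRanges {

    // Hypothetical stand-in for a data file's metadata; not Paimon's actual class.
    static final class FileInfo {
        final String name;
        final long firstRowId;
        final long rowCount;
        final boolean special; // assumption: true for blob/vector files

        FileInfo(String name, long firstRowId, long rowCount, boolean special) {
            this.name = name;
            this.firstRowId = firstRowId;
            this.rowCount = rowCount;
            this.special = special;
        }
    }

    // Unfiltered mapping: a later file sharing the same first row ID
    // silently overwrites the count recorded for the earlier file.
    static Map<Long, Long> unfiltered(List<FileInfo> files) {
        Map<Long, Long> counts = new HashMap<>();
        for (FileInfo f : files) {
            counts.put(f.firstRowId, f.rowCount);
        }
        return counts;
    }

    // Filtered mapping: special files are skipped, so only normal files
    // contribute to the expected record count per first row ID.
    static Map<Long, Long> filtered(List<FileInfo> files) {
        Map<Long, Long> counts = new HashMap<>();
        for (FileInfo f : files) {
            if (f.special) {
                continue;
            }
            counts.put(f.firstRowId, f.rowCount);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up example: a normal file and a blob file share first row ID 0.
        List<FileInfo> files = List.of(
                new FileInfo("data-0", 0L, 100L, false),
                new FileInfo("blob-0", 0L, 40L, true));

        System.out.println("unfiltered: " + unfiltered(files)); // unfiltered: {0=40}
        System.out.println("filtered:   " + filtered(files));   // filtered:   {0=100}
    }
}
```

With the unfiltered mapping, a writer that produces 100 records for first row ID 0 would be compared against 40 and fail the same kind of assertion as above; with the filtered mapping the expected count is 100.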

See:
org.apache.paimon.flink.action.DataEvolutionMergeIntoActionITCase for the Flink test
org.apache.paimon.spark.sql.BlobTestBase for the Spark test

JingsongLi · 2026-05-09T06:13:58Z

Thanks @steFaiz +1

[flink][spark] fix incorrect first_row_id range in DataEvolution MergeInto

e0f3ba7

steFaiz force-pushed the fix_de_merge_into_row_id branch from 3641cbd to e0f3ba7 on May 9, 2026 04:36

JingsongLi merged commit 1c5f2de into apache:master May 9, 2026
12 checks passed