[core][python] Add row ID existence check and column-level conflict detection by JingsongLi · Pull Request #7971 · apache/paimon

JingsongLi · 2026-05-26T04:51:49Z

Add a Rust-style row ID existence check to both Java and Python commit validation, ensuring pre-assigned firstRowId files reference base files that still exist in the current snapshot. This covers a gap where checkRowIdRangeConflicts misses conflicts when base files are fully deleted (e.g., DROP PARTITION / OVERWRITE) with no replacement.

Also port Java's RowIdColumnConflictChecker to Python, enabling row×column two-dimensional conflict detection in check_row_id_from_snapshot. This allows concurrent MERGE INTO on different columns of the same rows to proceed without false conflicts.

…etection Add a Rust-style row ID existence check to both Java and Python commit validation, ensuring pre-assigned firstRowId files reference base files that still exist in the current snapshot. This covers a gap where checkRowIdRangeConflicts misses conflicts when base files are fully deleted (e.g., DROP PARTITION / OVERWRITE) with no replacement. Also port Java's RowIdColumnConflictChecker to Python, enabling row×column two-dimensional conflict detection in check_row_id_from_snapshot. This allows concurrent MERGE INTO on different columns of the same rows to proceed without false conflicts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

leaves12138

Thanks for the update. I found a blocking regression in the new row-id existence check.

checkRowIdExistence / check_row_id_existence currently checks every ADD delta entry with a non-null firstRowId. However, row tracking metadata is assigned before conflict detection, so newly appended files in the same commit also have firstRowId. Those files are expected to be absent from the latest snapshot, but this check treats them as stale references and rejects the commit.

This breaks an existing mixed upsert flow where one row is updated and another row is inserted in the same commit. I can reproduce it on this PR head with:

cd paimon-python
PYTHONPATH=. python -m pytest \
  pypaimon/tests/table_upsert_by_key_test.py::TableUpsertByKeyBatchTest::test_mixed_update_and_append \
  pypaimon/tests/table_upsert_by_key_test.py::TableUpsertByKeyStreamTest::test_mixed_update_and_append \
  -q

Both tests fail with:

RuntimeError: Row ID existence conflict: file 'data-...parquet' references firstRowId=2, rowCount=1 in bucket 0, but no matching file exists in the current snapshot.

The file with firstRowId=2 is the newly inserted row, not a pre-assigned rewrite of an existing row-id range. The existence check needs to distinguish newly allocated append row-id ranges from files that intentionally reference existing row ids (for example by only checking ranges below the snapshot's nextRowId, or by carrying explicit source/rewrite semantics), and this mixed update+insert case should be covered by an end-to-end test.

Row tracking assigns firstRowId before conflict detection, so newly appended files also have firstRowId set and were falsely rejected. Add nextRowId parameter to only check files where firstRowId < nextRowId. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

leaves12138

Thanks for the update. The previous false positive for mixed update+append upsert is fixed now.

I verified the PR head with:

cd paimon-python
PYTHONPATH=. python -m pytest \
  pypaimon/tests/table_upsert_by_key_test.py \
  pypaimon/tests/table_update_test.py \
  pypaimon/tests/write/conflict_detection_test.py \
  -q

and:

mvn -pl paimon-core -DskipITs -Dcheckstyle.skip -Drat.skip=true \
  -Dspotless.check.skip=true -Dtest=ConflictDetectionTest test

Both passed. Looks good to me.

…h Java In Java, conflict detection runs before row ID assignment. Python had the opposite order, which meant delta entries already had firstRowId set during conflict checks. Reorder to match Java's flow. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

leaves12138

I checked the Python ordering change in the latest head as well.

Moving conflict detection before row-id assignment matches the Java commit ordering and avoids treating newly assigned append row ids as pre-existing row-id references. I also verified these paths locally:

cd paimon-python
PYTHONPATH=. python -m pytest \
  pypaimon/tests/table_upsert_by_key_test.py \
  pypaimon/tests/table_update_test.py \
  pypaimon/tests/write/conflict_detection_test.py \
  -q

Result: 100 passed, 20 subtests passed.

I additionally checked concurrent mixed upsert/update scenarios manually: same row+same column still conflicts, same row+different columns still succeeds, and concurrent pure appends still succeed. Looks good to me.

…essage The check_row_id_existence error message uses singular 'conflict', while the test asserted on plural 'conflicts'. Use 'conflict' to match both existence check and range conflict messages. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The concurrent merge+compact test retries on conflict exceptions, but only caught the range conflict message. Add the new existence conflict message so the retry works for both conflict types. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

leaves12138

+1

leaves12138 requested changes May 26, 2026

View reviewed changes

leaves12138 approved these changes May 26, 2026

View reviewed changes

JingsongLi and others added 2 commits May 26, 2026 13:46

[python] Remove unused Optional import in conflict detection test

88d228c

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

leaves12138 approved these changes May 26, 2026

View reviewed changes

JingsongLi and others added 3 commits May 26, 2026 14:18

[spark] Fix spotless formatting in RowTrackingTestBase

a9631f3

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

leaves12138 approved these changes May 26, 2026

View reviewed changes

JingsongLi merged commit 73c962a into apache:master May 26, 2026
17 of 18 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[core][python] Add row ID existence check and column-level conflict detection#7971

[core][python] Add row ID existence check and column-level conflict detection#7971
JingsongLi merged 7 commits into
apache:masterfrom
JingsongLi:commit_check

JingsongLi commented May 26, 2026

Uh oh!

leaves12138 left a comment

Uh oh!

leaves12138 left a comment

Uh oh!

leaves12138 left a comment

Uh oh!

leaves12138 left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

JingsongLi commented May 26, 2026

Uh oh!

leaves12138 left a comment

Choose a reason for hiding this comment

Uh oh!

leaves12138 left a comment

Choose a reason for hiding this comment

Uh oh!

leaves12138 left a comment

Choose a reason for hiding this comment

Uh oh!

leaves12138 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants