Skip to content

[core][python] Add row ID existence check and column-level conflict detection#7971

Merged
JingsongLi merged 7 commits into
apache:masterfrom
JingsongLi:commit_check
May 26, 2026
Merged

[core][python] Add row ID existence check and column-level conflict detection#7971
JingsongLi merged 7 commits into
apache:masterfrom
JingsongLi:commit_check

Conversation

@JingsongLi
Copy link
Copy Markdown
Contributor

Add a Rust-style row ID existence check to both Java and Python commit validation, ensuring pre-assigned firstRowId files reference base files that still exist in the current snapshot. This covers a gap where checkRowIdRangeConflicts misses conflicts when base files are fully deleted (e.g., DROP PARTITION / OVERWRITE) with no replacement.

Also port Java's RowIdColumnConflictChecker to Python, enabling row×column two-dimensional conflict detection in check_row_id_from_snapshot. This allows concurrent MERGE INTO on different columns of the same rows to proceed without false conflicts.

…etection

Add a Rust-style row ID existence check to both Java and Python commit
validation, ensuring pre-assigned firstRowId files reference base files
that still exist in the current snapshot. This covers a gap where
checkRowIdRangeConflicts misses conflicts when base files are fully
deleted (e.g., DROP PARTITION / OVERWRITE) with no replacement.

Also port Java's RowIdColumnConflictChecker to Python, enabling row×column
two-dimensional conflict detection in check_row_id_from_snapshot. This
allows concurrent MERGE INTO on different columns of the same rows to
proceed without false conflicts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@leaves12138 leaves12138 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. I found a blocking regression in the new row-id existence check.

checkRowIdExistence / check_row_id_existence currently checks every ADD delta entry with a non-null firstRowId. However, row tracking metadata is assigned before conflict detection, so newly appended files in the same commit also have firstRowId. Those files are expected to be absent from the latest snapshot, but this check treats them as stale references and rejects the commit.

This breaks an existing mixed upsert flow where one row is updated and another row is inserted in the same commit. I can reproduce it on this PR head with:

cd paimon-python
PYTHONPATH=. python -m pytest \
  pypaimon/tests/table_upsert_by_key_test.py::TableUpsertByKeyBatchTest::test_mixed_update_and_append \
  pypaimon/tests/table_upsert_by_key_test.py::TableUpsertByKeyStreamTest::test_mixed_update_and_append \
  -q

Both tests fail with:

RuntimeError: Row ID existence conflict: file 'data-...parquet' references firstRowId=2, rowCount=1 in bucket 0, but no matching file exists in the current snapshot.

The file with firstRowId=2 is the newly inserted row, not a pre-assigned rewrite of an existing row-id range. The existence check needs to distinguish newly allocated append row-id ranges from files that intentionally reference existing row ids (for example by only checking ranges below the snapshot's nextRowId, or by carrying explicit source/rewrite semantics), and this mixed update+insert case should be covered by an end-to-end test.

Row tracking assigns firstRowId before conflict detection, so newly
appended files also have firstRowId set and were falsely rejected.
Add nextRowId parameter to only check files where firstRowId < nextRowId.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@leaves12138 leaves12138 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. The previous false positive for mixed update+append upsert is fixed now.

I verified the PR head with:

cd paimon-python
PYTHONPATH=. python -m pytest \
  pypaimon/tests/table_upsert_by_key_test.py \
  pypaimon/tests/table_update_test.py \
  pypaimon/tests/write/conflict_detection_test.py \
  -q

and:

mvn -pl paimon-core -DskipITs -Dcheckstyle.skip -Drat.skip=true \
  -Dspotless.check.skip=true -Dtest=ConflictDetectionTest test

Both passed. Looks good to me.

JingsongLi and others added 2 commits May 26, 2026 13:46
…h Java

In Java, conflict detection runs before row ID assignment. Python had
the opposite order, which meant delta entries already had firstRowId
set during conflict checks. Reorder to match Java's flow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@leaves12138 leaves12138 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the Python ordering change in the latest head as well.

Moving conflict detection before row-id assignment matches the Java commit ordering and avoids treating newly assigned append row ids as pre-existing row-id references. I also verified these paths locally:

cd paimon-python
PYTHONPATH=. python -m pytest \
  pypaimon/tests/table_upsert_by_key_test.py \
  pypaimon/tests/table_update_test.py \
  pypaimon/tests/write/conflict_detection_test.py \
  -q

Result: 100 passed, 20 subtests passed.

I additionally checked concurrent mixed upsert/update scenarios manually: same row+same column still conflicts, same row+different columns still succeeds, and concurrent pure appends still succeed. Looks good to me.

JingsongLi and others added 3 commits May 26, 2026 14:18
…essage

The check_row_id_existence error message uses singular 'conflict',
while the test asserted on plural 'conflicts'. Use 'conflict' to
match both existence check and range conflict messages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The concurrent merge+compact test retries on conflict exceptions,
but only caught the range conflict message. Add the new existence
conflict message so the retry works for both conflict types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@leaves12138 leaves12138 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@JingsongLi JingsongLi merged commit 73c962a into apache:master May 26, 2026
17 of 18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants