[fix](insert overwrite) delay overwrite partition routing until incremental open #63209
Merged
run buildall
TPC-H: Total hot run time: 29460 ms
TPC-DS: Total hot run time: 172487 ms
BE Regression && UT Coverage Report
/review
Code review completed for PR 63209.
Summary opinion: no blocking issues found. The change is small and targeted: partition replacement now finishes the create-partition callback, including incremental open/open_wait, before publishing the new partition/tablet mapping into local routing. This matches the stated race and avoids exposing new tablet ids before the sender-side channels/streams are ready.
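The ordering described in the summary can be sketched as follows. This is a minimal, single-threaded illustration under stated assumptions, not the Doris implementation: `Sender`, `Status`, `replace_partition`, `route`, and the integer ids are hypothetical stand-ins, and the callback parameter plays the role of incremental open/open_wait. It only shows that the routing table keeps the old tablet until the callback has returned successfully.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for a status type; not the Doris Status class.
struct Status {
    bool is_ok = true;
    std::string message;
    static Status OK() { return Status(); }
    static Status Error(const std::string& m) {
        Status s;
        s.is_ok = false;
        s.message = m;
        return s;
    }
};

using PartitionId = std::int64_t;
using TabletId = std::int64_t;

// Hypothetical sender holding a simplified partition -> tablet routing table.
class Sender {
public:
    // Stands in for incremental open/open_wait on the target BEs.
    using OpenCallback = std::function<Status(TabletId)>;

    Sender(PartitionId partition, TabletId tablet) { _routing[partition] = tablet; }

    // Runs the callback first and publishes the new tablet only if it succeeds.
    Status replace_partition(PartitionId partition, TabletId new_tablet,
                             const OpenCallback& open) {
        Status st = open(new_tablet);      // 1. open the new tablet first
        if (!st.is_ok) {
            return st;                     // propagate; old routing stays visible
        }
        _routing[partition] = new_tablet;  // 2. only now publish to local routing
        return Status::OK();
    }

    TabletId route(PartitionId partition) const { return _routing.at(partition); }

private:
    std::map<PartitionId, TabletId> _routing;
};

// Returns the tablet id that a lookup observes while the open callback is running.
TabletId tablet_seen_during_open() {
    Sender sender(1, 100);
    TabletId seen = -1;
    Status st = sender.replace_partition(1, 200, [&](TabletId) -> Status {
        seen = sender.route(1);  // what a concurrent batch would be routed to
        return Status::OK();
    });
    assert(st.is_ok);
    return seen;
}
```

In this sketch, `tablet_seen_during_open()` returns the old tablet id (100), because the lookup runs inside the callback, before the new mapping is published. A failing callback leaves the routing table unchanged.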
Critical checkpoint conclusions:
- Goal and proof: the code addresses the described auto-detect insert overwrite race, and the updated unit test verifies the callback runs before local replacement is visible.
- Scope: the modification is focused on callback/routing order plus a targeted test harness extension.
- Concurrency: the affected behavior is a routing publication race; this PR reduces the unsafe window by keeping old routing visible until incremental open completes. I did not find a new lock-order or dependency issue in the changed code.
- Lifecycle/static state: no new static/global lifecycle concerns were introduced.
- Configuration/compatibility: no new config, wire format, or storage-format compatibility changes.
- Parallel paths: both create-partition and replace-partition paths now publish routing after callback completion; V1/V2 writer callbacks consume the same result shape.
- Error handling: callback and add/replace failures are propagated with `RETURN_IF_ERROR`; no ignored Status in the changed lines.
- Tests: unit coverage was updated for the replace ordering. I did not run the BE unit test in this review environment.
- Observability/performance: no additional observability appears necessary for this small ordering fix; the extra copy in `cast_as_create_result` is not on a hot path and avoids moving from the original result before later use.
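The copy-versus-move trade-off mentioned in the last checkpoint can be illustrated with a small sketch. The types and function names below (`ReplaceResult`, `CreateResult`, `as_create_result_copy`, `as_create_result_move`) are hypothetical and are not the signatures used in the PR; they only show why a copying conversion keeps the original result usable afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical result shapes; the real Doris structures are not shown in this review.
struct ReplaceResult {
    std::vector<std::int64_t> tablet_ids;
};
struct CreateResult {
    std::vector<std::int64_t> tablet_ids;
};

// Copying conversion: the source stays intact for later use.
CreateResult as_create_result_copy(const ReplaceResult& src) {
    CreateResult out;
    out.tablet_ids = src.tablet_ids;
    return out;
}

// Moving conversion: cheaper, but the source is left in a valid but
// unspecified state and should not be read afterwards.
CreateResult as_create_result_move(ReplaceResult&& src) {
    CreateResult out;
    out.tablet_ids = std::move(src.tablet_ids);
    return out;
}

// Converts by copy for a callback-shaped consumer, then reads the original again.
std::size_t tablets_left_after_copy() {
    ReplaceResult replaced;
    replaced.tablet_ids = {200, 201};
    CreateResult for_callback = as_create_result_copy(replaced);
    assert(for_callback.tablet_ids.size() == 2);
    return replaced.tablet_ids.size();  // still 2: the copy did not disturb it
}
```

With the copying variant, the original result can still be used to update routing after the callback has consumed its own copy, at the cost of one extra vector copy per replacement.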
User focus: no additional user-provided review focus was specified.
github-actions Bot pushed a commit that referenced this pull request May 14, 2026
…mental open (#63209)
### What problem does this PR solve?
Problem Summary: In auto-detect insert overwrite, BE sender could publish newly replaced temporary partitions to local row routing before incremental open finished on target BEs.
The race was:
1. One sender calls FE `replacePartition` and receives new temporary partition/tablet metadata.
2. The sender records the new partition id and replaces local `_vpartition` routing first.
3. Another concurrent batch can then route rows to the new tablet.
4. The first sender has not finished incremental open yet, so the target BE may not have created the delta writer for that tablet.
5. The target BE returns `unknown tablet to append data`.
This PR makes the sender finish `_create_partition_callback`, including incremental open/open_wait, before publishing the new partition/tablet to local routing and marking the new partition as handled.
What problem does this PR solve?
Problem Summary:
In auto-detect insert overwrite, BE sender could publish newly replaced temporary partitions to local row routing before incremental open finished on target BEs.
The race was:
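1. One sender calls FE `replacePartition` and receives new temporary partition/tablet metadata.
2. The sender records the new partition id and replaces local `_vpartition` routing first.
3. Another concurrent batch can then route rows to the new tablet.
4. The first sender has not finished incremental open yet, so the target BE may not have created the delta writer for that tablet.
5. The target BE returns `unknown tablet to append data`.
This PR makes the sender finish `_create_partition_callback`, including incremental open/open_wait, before publishing the new partition/tablet to local routing and marking the new partition as handled.
Release note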
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)