[Fix](cloud-mow) avoid calc delete bitmap tasks on same (txn_id, tablet_id) being executed concurrently #50847
9d515f4
to
b174c1b
Compare
run buildall |
TPC-H: Total hot run time: 33931 ms
|
TPC-DS: Total hot run time: 193988 ms
|
ClickBench: Total hot run time: 29.69 s
|
32959f9
to
5f80ddc
Compare
BE UT Coverage Report: increment line coverage, increment coverage report
run p0 |
run cloud_p0 |
run p0 |
LGTM
PR approved by at least one committer and no changes requested. |
PR approved by anyone and no changes requested. |
run be ut |
run buildall |
TPC-H: Total hot run time: 34087 ms
|
TPC-DS: Total hot run time: 186425 ms
|
ClickBench: Total hot run time: 29.38 s
|
BE UT Coverage Report: increment line coverage, increment coverage report
BE Regression && UT Coverage Report: increment line coverage, increment coverage report
run feut |
LGTM
…et_id) being executed concurrently (apache#50847)

### What problem does this PR solve?

After apache#50417, there may be multiple calc delete bitmap tasks with different signatures for the same (txn_id, tablet_id) load on the same BE. We use _rowset_update_lock to prevent them from being executed concurrently, which avoids correctness problems, e.g. a mismatch between rowset meta and segment data objects caused by concurrent writes to the same rowset with the transient rowset writer in the partial update publish phase:

```
W20250513 15:50:55.371588 1049 file_reader.cpp:36] [NOT_FOUND]failed to read from : code=NOT_FOUND, type=16, request_id=failed to read
W20250513 15:50:55.371667 1049 beta_rowset.cpp:202] failed to open segment. data/1747122561886/020000000000000125473fbacc484a4f8c46478ab6f64b90_2.dat under rowset 020000000000000125473fbacc484a4f8c46478ab6f64b90 : [NOT_FOUND]failed to read from : code=NOT_FOUND, type=16, request_id=failed to read
```
What problem does this PR solve?
After #50417, there may be multiple calc delete bitmap tasks with different signatures for the same (txn_id, tablet_id) load on the same BE. This PR uses _rowset_update_lock to prevent them from being executed concurrently, which avoids correctness problems.
For example, rowset meta and segment data objects can mismatch when several tasks write to the same rowset concurrently with the transient rowset writer in the partial update publish phase.
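
The idea described above can be illustrated with a minimal sketch. It is not the code from this PR: the `Tablet` struct, the `calc_delete_bitmap_task` function, and the `writers`/`overlaps` counters are hypothetical stand-ins, and the sketch simply assumes that the lock is a per-tablet `std::mutex`. Each task holds the lock for its whole rewrite, so tasks with different signatures on the same (txn_id, tablet_id) run one after another.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a tablet; not the actual Doris class.
struct Tablet {
    int64_t tablet_id = 0;
    std::mutex rowset_update_lock;  // assumed per-tablet lock
    std::atomic<int> writers{0};    // tasks currently rewriting the rowset
    std::atomic<int> overlaps{0};   // times a task found another task already inside
};

// Hypothetical task body. The lock is held for the whole rewrite, so two
// tasks with different signatures cannot touch the same rowset together.
void calc_delete_bitmap_task(Tablet& tablet, int64_t txn_id, int64_t signature) {
    (void)txn_id;     // unused in this sketch
    (void)signature;  // unused in this sketch
    std::lock_guard<std::mutex> guard(tablet.rowset_update_lock);
    if (tablet.writers.fetch_add(1) != 0) {
        tablet.overlaps.fetch_add(1);  // would indicate a concurrent writer
    }
    std::this_thread::yield();  // stands in for writing segments and rowset meta
    tablet.writers.fetch_sub(1);
}

// Launches `task_count` tasks for one (txn_id, tablet_id) pair and returns
// how many times two tasks were observed inside the critical section at once.
int run_tasks_on_same_tablet(int task_count) {
    Tablet tablet;
    tablet.tablet_id = 1;
    std::vector<std::thread> threads;
    for (int sig = 0; sig < task_count; ++sig) {
        threads.emplace_back([&tablet, sig] {
            calc_delete_bitmap_task(tablet, /*txn_id=*/100, /*signature=*/sig);
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return tablet.overlaps.load();
}
```

Calling `run_tasks_on_same_tablet(8)` returns 0 because the mutex serializes the tasks. Without the `lock_guard`, the counter could become non-zero, which corresponds to the kind of interleaved writes the description warns about.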
Release note
None