[Fix](cloud-mow) Fix race between CloudMetaMgr::sync_tablet_rowsets
and CloudSchemaChangeJob::_convert_historical_rowsets
#50051
TPC-H: Total hot run time: 34425 ms
TPC-DS: Total hot run time: 187353 ms
ClickBench: Total hot run time: 31.39 s
TPC-H: Total hot run time: 34354 ms
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
TPC-DS: Total hot run time: 192941 ms
ClickBench: Total hot run time: 31.13 s
LGTM
What problem does this PR solve?

Consider the following race:

1. Thread 1 finishes converting the historical rowsets before version Y on tablet A and writes the tmp converted rowset metas to the MS.
2. Thread 2 begins `sync_rowsets` on tablet A with version Y and gets the visible rowsets before version Y.
3. Thread 1 commits the heavy schema change job on the MS, which turns the tmp converted historical rowsets into visible rowsets on tablet A.
4. Thread 1 adds the converted historical rowset metas to tablet A's BE local tablet meta and removes all delete bitmaps of the new tablet before version Y.
5. Thread 2 **adds its rowsets to tablet A's BE local tablet meta, which overwrites the schema change's converted rowsets**.

This causes a correctness problem. This PR adds a lock to avoid this situation.
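
The locking idea can be illustrated with a small, self-contained sketch. It is not the code from this PR: `ToyMetaService`, `ToyTablet`, `sync_lock`, `sync_rowsets`, `finish_schema_change`, and `run_serial` are all hypothetical names, rowsets are reduced to strings keyed by version, and delete bitmaps are left out. The assumption is that both the sync path (fetch visible rowsets from the MS, then apply them locally) and the schema change path (commit on the MS, then apply converted rowsets locally) hold the same per-tablet mutex for their whole fetch/commit-and-apply sequence, so the two sequences can no longer interleave.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for the meta service (MS): version -> rowset id.
struct ToyMetaService {
    std::map<int, std::string> visible;        // rowsets visible to readers
    std::map<int, std::string> tmp_converted;  // tmp rowsets written by schema change
};

// Hypothetical stand-in for a tablet's BE-local tablet meta.
struct ToyTablet {
    std::mutex sync_lock;  // assumed per-tablet lock shared by both paths
    std::map<int, std::string> rowsets;
};

// Sync path: the fetch (step 2) and the local apply (step 5) happen under one
// lock, so a schema change commit cannot slip in between them.
void sync_rowsets(ToyMetaService& ms, ToyTablet& tablet) {
    std::lock_guard<std::mutex> guard(tablet.sync_lock);
    std::map<int, std::string> fetched = ms.visible;
    for (const auto& entry : fetched) {
        tablet.rowsets[entry.first] = entry.second;
    }
}

// Schema change path: the MS commit (step 3) and the local apply (step 4)
// happen under the same lock.
void finish_schema_change(ToyMetaService& ms, ToyTablet& tablet) {
    std::lock_guard<std::mutex> guard(tablet.sync_lock);
    std::map<int, std::string> converted = ms.tmp_converted;
    ms.tmp_converted.clear();
    for (const auto& entry : converted) {
        ms.visible[entry.first] = entry.second;      // tmp -> visible on the MS
        tablet.rowsets[entry.first] = entry.second;  // update BE-local meta
    }
}

// With the lock held across each whole sequence, any concurrent execution is
// equivalent to one of two serial orders; this helper runs either order.
std::map<int, std::string> run_serial(bool sync_first) {
    ToyMetaService ms;
    ms.visible = {{1, "old-1"}, {2, "old-2"}};
    ms.tmp_converted = {{1, "converted-1"}, {2, "converted-2"}};
    ToyTablet tablet;
    if (sync_first) {
        sync_rowsets(ms, tablet);
        finish_schema_change(ms, tablet);
    } else {
        finish_schema_change(ms, tablet);
        sync_rowsets(ms, tablet);
    }
    return tablet.rowsets;
}
```

In this toy model both serial orders end with the converted rowsets in the local meta. The problematic interleaving from the list above needs the sync path's fetch to happen before the commit and its apply to happen after it, which the shared lock rules out.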
Release note
None