-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BugFix] Fix race condition when access delete predicate list in tablet meta (#42099) #42100
Conversation
return version_for_delete_predicate_unlocked(version); | ||
} | ||
|
||
bool Tablet::version_for_delete_predicate_unlocked(const Version& version) { | ||
return _tablet_meta->version_for_delete_predicate(version); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most risky bug in this code is:
Using an empty catch block swallows exceptions silently without handling errors appropriately.
You can modify the code like this:
bool Tablet::version_for_delete_predicate(const Version& version) {
// we use try/catch-block to lock _meta_lock in share mode
// It is because in some case, _meta_lock may has been locked
// by the current thread in shared/exclusive mode. It may
// lead to undefined behavior or deadlock if we try to acquire a shared lock here again.
// Proper error handling should be employed instead of doing nothing.
try {
std::shared_lock<std::mutex> rlock(get_header_lock());
return version_for_delete_predicate_unlocked(version);
} catch (const std::exception& e) {
// Log the exception or handle it accordingly instead of doing nothing.
// For instance, you might want to log the error and return false or rethrow the exception.
std::cerr << "Failed to acquire lock: " << e.what() << std::endl;
throw; // Rethrow the caught exception for caller to handle, or decide on a default action.
}
}
This modification adds error logging and rethrows the caught exception, encouraging proper error handling in calling code. Using std::cerr
for logging is a simple example; in a real application, you would likely use a dedicated logging facility.
be/src/storage/tablet.cpp
Outdated
try { | ||
std::shared_lock rlock(get_header_lock()); | ||
} catch (const std::exception& e) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This lock will be released after this try block, are you sure that this will fix the bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is a problem, maybe we can use get_header_lock().lock_shared() instead. But i still reviewing implementation is reasonable or not. I will fix it later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
…et meta (StarRocks#42099) Why I'm doing: tablet::version_for_delete_predicate do not hold lock to access delete predicate list which may cause crash. What I'm doing: separate tablet::version_for_delete_predicate into to function: tablet::version_for_delete_predicate tablet::version_for_delete_predicate_unlocked Use this two functions base on the caller has acquire the _meta_lock or not Signed-off-by: srlch <linzichao@starrocks.com>
cf26ed9
to
ee83a90
Compare
@@ -836,9 +838,12 @@ Status Tablet::_capture_consistent_rowsets_unlocked(const std::vector<Version>& | |||
return Status::OK(); | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tablet::get_compaction_status() also need use version_for_delete_predicate_unlocked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
[FE Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[BE Incremental Coverage Report]❌ fail : 3 / 5 (60.00%) file detail
|
@Mergifyio backport branch-3.2 |
@Mergifyio backport branch-3.1 |
@Mergifyio backport branch-3.0 |
@Mergifyio backport branch-2.5 |
✅ Backports have been created
|
✅ Backports have been created
|
✅ Backports have been created
|
✅ Backports have been created
|
Why I'm doing:
tablet::version_for_delete_predicate do not hold lock to access delete predicate list which may cause crash.
What I'm doing:
Try to acquire lock in share mode when access delete predicate list
Fixes #42099
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: