
[BugFix] Fix race condition when access delete predicate list in tablet meta (#42099) #42100

Merged
merged 4 commits into from
Mar 6, 2024

Conversation

srlch
Contributor

@srlch srlch commented Mar 5, 2024

Why I'm doing:

tablet::version_for_delete_predicate does not hold a lock while accessing the delete predicate list, which may cause a crash.

What I'm doing:

Acquire the lock in shared mode when accessing the delete predicate list.

Fixes #42099
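
As a rough illustration of the idea in this description, the sketch below guards a list with a std::shared_mutex: readers take the lock in shared mode and writers take it in exclusive mode. The class DeletePredicateList and its members are hypothetical stand-ins for illustration, not StarRocks code.

```cpp
#include <algorithm>
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical stand-in for the delete predicate list kept in tablet meta.
class DeletePredicateList {
public:
    // Writer: exclusive lock, because push_back may reallocate the vector.
    void add(int64_t version) {
        std::unique_lock<std::shared_mutex> wlock(_lock);
        _versions.push_back(version);
    }

    // Reader: shared lock, so concurrent readers do not block each other
    // but never observe the vector while a writer is modifying it.
    bool contains(int64_t version) const {
        std::shared_lock<std::shared_mutex> rlock(_lock);
        return std::find(_versions.begin(), _versions.end(), version) != _versions.end();
    }

private:
    mutable std::shared_mutex _lock;
    std::vector<int64_t> _versions;
};
```

Without the shared lock in contains(), a reader could iterate the vector while add() reallocates it, which is the kind of race this PR describes.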

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.2
    • 3.1
    • 3.0
    • 2.5

@srlch srlch requested a review from a team as a code owner March 5, 2024 05:16
    return version_for_delete_predicate_unlocked(version);
}

bool Tablet::version_for_delete_predicate_unlocked(const Version& version) {
    return _tablet_meta->version_for_delete_predicate(version);
}


The riskiest problem in this code is the empty catch block, which silently swallows exceptions instead of handling them.

You can modify the code like this:

bool Tablet::version_for_delete_predicate(const Version& version) {
    // Acquire _meta_lock in shared mode inside a try/catch block because, in
    // some cases, the current thread may already hold _meta_lock in shared or
    // exclusive mode. Acquiring a shared lock again here may then lead to
    // undefined behavior or deadlock.
    // Handle the error properly instead of doing nothing.
    try {
        // std::shared_lock needs a shared mutex type; std::mutex has no lock_shared().
        std::shared_lock rlock(get_header_lock());
        return version_for_delete_predicate_unlocked(version);
    } catch (const std::exception& e) {
        // Log the error, then rethrow it or return a default value.
        std::cerr << "Failed to acquire lock: " << e.what() << std::endl;
        throw; // Rethrow so the caller can handle it.
    }
}

This modification logs the error and rethrows the caught exception, which leaves the final handling to the calling code. std::cerr is used only to keep the example simple; a real application would likely use a dedicated logging facility.

Comment on lines 845 to 847
try {
    std::shared_lock rlock(get_header_lock());
} catch (const std::exception& e) {}
Contributor

This lock will be released at the end of the try block. Are you sure this will fix the bug?
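
The scoping concern raised here can be sketched as follows. CountingSharedMutex, Observation, and both functions are hypothetical helpers written for this illustration; the wrapper only counts shared holders so the lifetime of the guard can be observed from a single thread.

```cpp
#include <exception>
#include <shared_mutex>

// Hypothetical wrapper that counts shared holders. Not part of StarRocks.
class CountingSharedMutex {
public:
    void lock_shared() {
        _mu.lock_shared();
        ++_readers;
    }
    bool try_lock_shared() {
        if (!_mu.try_lock_shared()) return false;
        ++_readers;
        return true;
    }
    void unlock_shared() {
        --_readers;
        _mu.unlock_shared();
    }
    int readers() const { return _readers; }

private:
    std::shared_mutex _mu;
    int _readers = 0; // plain int: this demo is single-threaded
};

struct Observation {
    int readers_inside_try;
    int readers_after_try;
};

// Mirrors the shape of the reviewed code: the guard is declared inside the try block.
Observation lock_inside_try_block(CountingSharedMutex& mu) {
    Observation obs{0, 0};
    try {
        std::shared_lock<CountingSharedMutex> rlock(mu);
        obs.readers_inside_try = mu.readers();
    } catch (const std::exception&) {
    }
    // rlock was destroyed at the closing brace of the try block, so any read
    // placed here would run without the lock.
    obs.readers_after_try = mu.readers();
    return obs;
}

// Declaring the guard at function scope keeps the lock held until return.
int readers_during_guarded_read(CountingSharedMutex& mu) {
    std::shared_lock<CountingSharedMutex> rlock(mu);
    return mu.readers();
}
```

In lock_inside_try_block the holder count drops back to zero before the function reaches its last statements, whereas readers_during_guarded_read still holds the lock while it computes its return value.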

Contributor Author

Yes, that is a problem. Maybe we can use get_header_lock().lock_shared() instead, but I am still reviewing whether that implementation is reasonable. I will fix it later.

Contributor Author

updated

…et meta (StarRocks#42099)

Why I'm doing:
tablet::version_for_delete_predicate does not hold a lock while accessing the delete predicate list,
which may cause a crash.

What I'm doing:
Split tablet::version_for_delete_predicate into two functions:
tablet::version_for_delete_predicate
tablet::version_for_delete_predicate_unlocked
Callers use one or the other depending on whether they have already acquired _meta_lock.

Signed-off-by: srlch <linzichao@starrocks.com>
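
The locked/unlocked split described in this commit message can be sketched roughly as below. TabletSketch, its vector of versions, and count_delete_versions are hypothetical; only the two function names and _meta_lock come from the PR, and in the real code the unlocked variant delegates to _tablet_meta rather than searching a vector.

```cpp
#include <algorithm>
#include <cstdint>
#include <shared_mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for Tablet; the bodies are illustrative only.
class TabletSketch {
public:
    explicit TabletSketch(std::vector<int64_t> delete_versions)
            : _delete_versions(std::move(delete_versions)) {}

    // For callers that do NOT hold _meta_lock: take it in shared mode, then delegate.
    bool version_for_delete_predicate(int64_t version) {
        std::shared_lock<std::shared_mutex> rlock(_meta_lock);
        return version_for_delete_predicate_unlocked(version);
    }

    // For callers that already hold _meta_lock in shared or exclusive mode.
    bool version_for_delete_predicate_unlocked(int64_t version) const {
        return std::find(_delete_versions.begin(), _delete_versions.end(), version) !=
               _delete_versions.end();
    }

    // Example of a caller that takes the lock once and then uses the _unlocked
    // variant, so the same thread never tries to acquire _meta_lock twice.
    int count_delete_versions(const std::vector<int64_t>& candidates) {
        std::shared_lock<std::shared_mutex> rlock(_meta_lock);
        int n = 0;
        for (int64_t v : candidates) {
            if (version_for_delete_predicate_unlocked(v)) ++n;
        }
        return n;
    }

private:
    std::shared_mutex _meta_lock;
    std::vector<int64_t> _delete_versions;
};
```

The design choice is that the lock is acquired exactly once per call path: entry points that start without the lock use the locking variant, and code that runs under an existing lock uses the _unlocked one.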
@@ -836,9 +838,12 @@ Status Tablet::_capture_consistent_rowsets_unlocked(const std::vector<Version>&
return Status::OK();
}
Contributor

Does Tablet::get_compaction_status() also need to use version_for_delete_predicate_unlocked?

Contributor Author

updated

Signed-off-by: srlch <linzichao@starrocks.com>
trueeyu
trueeyu previously approved these changes Mar 5, 2024
Signed-off-by: srlch <linzichao@starrocks.com>
Signed-off-by: srlch <linzichao@starrocks.com>

github-actions bot commented Mar 5, 2024

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)


github-actions bot commented Mar 5, 2024

[BE Incremental Coverage Report]

fail : 3 / 5 (60.00%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|
| 🔵 be/src/storage/tablet.cpp | 3 | 5 | 60.00% | [846, 1191] |

@andyziye andyziye merged commit 84c884d into StarRocks:main Mar 6, 2024
40 of 41 checks passed

github-actions bot commented Mar 6, 2024

@Mergifyio backport branch-3.2

@github-actions github-actions bot removed the 3.2 label Mar 6, 2024

github-actions bot commented Mar 6, 2024

@Mergifyio backport branch-3.1

@github-actions github-actions bot removed the 3.1 label Mar 6, 2024

github-actions bot commented Mar 6, 2024

@Mergifyio backport branch-3.0

@github-actions github-actions bot removed the 3.0 label Mar 6, 2024

github-actions bot commented Mar 6, 2024

@Mergifyio backport branch-2.5

@github-actions github-actions bot removed the 2.5 label Mar 6, 2024
Contributor

mergify bot commented Mar 6, 2024

backport branch-3.2

✅ Backports have been created

Contributor

mergify bot commented Mar 6, 2024

backport branch-3.1

✅ Backports have been created

Contributor

mergify bot commented Mar 6, 2024

backport branch-3.0

✅ Backports have been created

Contributor

mergify bot commented Mar 6, 2024

backport branch-2.5

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Mar 6, 2024
…et meta (#42099) (#42100)

Signed-off-by: srlch <linzichao@starrocks.com>
(cherry picked from commit 84c884d)
mergify bot pushed a commit that referenced this pull request Mar 6, 2024
…et meta (#42099) (#42100)

Signed-off-by: srlch <linzichao@starrocks.com>
(cherry picked from commit 84c884d)
mergify bot pushed a commit that referenced this pull request Mar 6, 2024
…et meta (#42099) (#42100)

Signed-off-by: srlch <linzichao@starrocks.com>
(cherry picked from commit 84c884d)

# Conflicts:
#	be/src/storage/tablet.h
mergify bot pushed a commit that referenced this pull request Mar 6, 2024
…et meta (#42099) (#42100)

Signed-off-by: srlch <linzichao@starrocks.com>
(cherry picked from commit 84c884d)

# Conflicts:
#	be/src/storage/tablet.h
wanpengfei-git pushed a commit that referenced this pull request Mar 6, 2024
…et meta (#42099) (backport #42100) (#42167)

Co-authored-by: srlch <111035020+srlch@users.noreply.github.com>
wanpengfei-git pushed a commit that referenced this pull request Mar 6, 2024
…et meta (#42099) (backport #42100) (#42168)

Co-authored-by: srlch <111035020+srlch@users.noreply.github.com>
andyziye pushed a commit that referenced this pull request Mar 6, 2024
…et meta (#42099)(backport #42100) (#42172)

Signed-off-by: srlch <linzichao@starrocks.com>
andyziye pushed a commit that referenced this pull request Mar 6, 2024
…et meta (#42099)(backport #42100) (#42174)

Signed-off-by: srlch <linzichao@starrocks.com>
Development

Successfully merging this pull request may close these issues.

[BugFix] Fix race condition when access delete predicate list in tablet meta
5 participants