branch-4.1: [fix](cloud) Fix tablets permanently invisible to compaction scheduler due to race condition in CloudTabletMgr::get_tablet (#60832)#61612

Open
bobhan1 wants to merge 1 commit into apache:branch-4.1 from bobhan1:brnach-4.1-pick-60832
Conversation

@bobhan1
Contributor

@bobhan1 bobhan1 commented Mar 23, 2026

pick #60832

@bobhan1 bobhan1 requested a review from yiguolei as a code owner March 23, 2026 04:20
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@bobhan1
Contributor Author

bobhan1 commented Mar 23, 2026

run buildall

@doris-robot

BE UT Coverage Report

Increment line coverage 84.21% (16/19) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.68% (19512/37037)
Line Coverage 36.15% (182407/504594)
Region Coverage 32.51% (140845/433206)
Branch Coverage 33.48% (61379/183313)

[fix](cloud) Fix tablets permanently invisible to compaction scheduler due to race condition in `CloudTabletMgr::get_tablet` (apache#60832)

Issue Number: close #xxx

Related PR: apache#57922

Fix a race condition in `CloudTabletMgr::get_tablet()`, introduced by
apache#57922, that causes tablets to permanently disappear from
`_tablet_map`, making them invisible to the compaction scheduler. This
leads to tables accumulating hundreds of rowsets without any compaction
under high-frequency imports.

 ## Root Cause

Commit `0918952c70` refactored `get_tablet()` by moving
`_cache->insert()` and `_tablet_map->put()` from inside the
`SingleFlight` lambda to outside it. This introduced a race condition:

1. When N concurrent `get_tablet()` calls arrive for the same
`tablet_id`, `SingleFlight` executes the load lambda only once (by the
"leader"), but all N callers receive a `shared_ptr` pointing to the same
`CloudTablet` object.

2. After `SingleFlight::load()` returns, all N callers independently
execute:

 ```cpp
 _cache->insert(key, value, ...);  // each creates a competing LRU cache entry
 _tablet_map->put(tablet);         // each inserts into tablet_map
 ```

3. **Each `_cache->insert()` evicts the previous entry for the same key.
The evicted `Value`'s destructor calls
`_tablet_map.erase(tablet.get())`**. The safety check `it->second.get()
== tablet` was designed to prevent erasing a newer tablet object — but
here **all callers share the same raw pointer from `SingleFlight`**, so
the check always passes, and the erase succeeds.

4. After the last caller's old cache handle is released (when its
returned `shared_ptr` goes out of scope), the destructor erases the
entry from `_tablet_map`.

5. Crucially, subsequent `get_tablet()` calls find the tablet in the LRU
cache (cache hit path), which never touches `_tablet_map`. So the tablet
is permanently invisible to `_tablet_map` and can never re-enter it.

6. **The compaction scheduler uses `get_weak_tablets()` which iterates
`_tablet_map`, so it never sees these tablets and never schedules
compaction for them.**

 ## Before (original correct code, prior to apache#57922):

 ```cpp
 auto load_tablet = [this, &key, ...](int64_t tablet_id) {
     // load from meta service...
// Cache insert + tablet_map put INSIDE lambda — only leader executes
     auto* handle = _cache->insert(key, value.release(), ...);
     _tablet_map->put(std::move(tablet));
     return ret;
 };
 s_singleflight_load_tablet.load(tablet_id, std::move(load_tablet));
 ```

 ## After apache#57922 (buggy code):

 ```cpp
 auto load_tablet = [this, ...](int64_t tablet_id) {
     // load from meta service...
     return tablet;  // just return raw tablet
 };
 auto result = s_singleflight_load_tablet.load(tablet_id, std::move(load_tablet));
 // Cache insert + tablet_map put OUTSIDE lambda — ALL concurrent callers execute
 _cache->insert(key, value.release(), ...);
 _tablet_map->put(std::move(tablet));
 ```

 ## Fix

Move `_cache->insert()` and `_tablet_map->put()` back inside the
`SingleFlight` lambda, ensuring only the leader caller performs cache
insertion and `_tablet_map` registration. This restores the invariant
that a single `get_tablet()` cache miss produces exactly one LRU cache
entry and one `_tablet_map` entry, eliminating the race condition.

None

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg:
apache/doris-website#1214 -->

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
@bobhan1 bobhan1 force-pushed the brnach-4.1-pick-60832 branch from 33d33ce to 57ee598 Compare March 23, 2026 11:42
@bobhan1
Contributor Author

bobhan1 commented Mar 23, 2026

run buildall
