
branch-4.1: [improvement](be) Add stampede protection for AnnIndexIVFListCache #62442 #62567

Merged
yiguolei merged 1 commit into branch-4.1 from auto-pick-62442-branch-4.1
Apr 16, 2026

@github-actions
Contributor

Cherry-picked from #62442

…62442)

## Summary

- Add a double-checked locking pattern to
`CachedRandomAccessReader::borrow()` to prevent concurrent threads from
redundantly reading the same IVF list data from disk (cache stampede)
- Eliminate ghost memory caused by LRU cache entry replacement under
high concurrency, reducing usage ratio fluctuations
- Zero overhead on the fast path (cache hit); reuses the existing
`_io_mutex` without introducing new synchronization primitives

## Proposed Changes

When multiple threads concurrently miss the `AnnIndexIVFListCache` for
the same key, they all independently read the same data from disk and
insert it into the cache. The LRU cache silently replaces duplicate
entries (counted as "stampede"), causing:

1. **Redundant disk I/O**: N concurrent misses produce N identical reads
instead of 1
2. **Ghost memory**: replaced entries whose handles are still pinned
consume memory not tracked by cache `_usage`, leading to usage ratio
fluctuations (observed 87%→50% under high concurrency)
3. **Lower cache hit ratio**: unnecessary evictions caused by inflated
insertion volume
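The ghost-memory effect in item 2 can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions: `ToyCache`, `ToyEntry`, and `simulate_ghost_bytes` are hypothetical names, and `std::shared_ptr` stands in for a pinned cache handle. It is not the Doris LRU cache, and it models only the accounting gap, not eviction.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy entry: owns the bytes read for one key.
struct ToyEntry {
    std::vector<char> data;
};

// Hypothetical toy cache. A shared_ptr plays the role of a pinned handle.
class ToyCache {
public:
    using Handle = std::shared_ptr<ToyEntry>;

    // Inserts or replaces the entry for `key` and returns a pinned handle.
    Handle insert(const std::string& key, std::size_t bytes) {
        auto it = _map.find(key);
        if (it != _map.end()) {
            // The replaced entry leaves the accounting immediately,
            // even if some reader still holds a handle to it.
            _usage -= it->second->data.size();
            ++_replaced;
        }
        auto entry = std::make_shared<ToyEntry>();
        entry->data.resize(bytes);
        _map[key] = entry;
        _usage += bytes;
        return entry;
    }

    std::size_t usage() const { return _usage; }
    std::size_t replaced() const { return _replaced; }

private:
    std::unordered_map<std::string, Handle> _map;
    std::size_t _usage = 0;
    std::size_t _replaced = 0;
};

// Models n readers that all missed the same key: each inserts its own copy
// and keeps the handle pinned. Returns the bytes still alive through handles.
std::size_t simulate_ghost_bytes(ToyCache& cache, int n, std::size_t bytes) {
    assert(n > 0);
    std::vector<ToyCache::Handle> pinned;
    for (int i = 0; i < n; ++i) {
        pinned.push_back(cache.insert("list-7", bytes));
    }
    std::size_t live = 0;
    for (const auto& handle : pinned) {
        live += handle->data.size();
    }
    return live;
}
```

With four readers and 1024-byte entries, the model keeps 4096 bytes alive while the cache accounts for only 1024, so three entries' worth of memory is invisible to the usage counter until the handles are released.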

The fix leverages the existing `_io_mutex` (required because CLucene
`IndexInput` is stateful) by moving its acquisition before the disk read
and adding a cache re-check after acquiring the lock. If a preceding
thread already loaded the same key, subsequent threads skip the disk I/O
entirely.
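The control flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SketchReader`, `_cache_mutex`, the `Loader` callback, and `count_loads` are assumptions made for the example, and a plain map guarded by a mutex stands in for `AnnIndexIVFListCache`. Only the `_io_mutex` name and the lookup, lock, re-check, read ordering come from the description.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the reader; only the locking order matters here.
class SketchReader {
public:
    using Block = std::shared_ptr<const std::string>;
    using Loader = std::function<std::string(const std::string&)>;

    explicit SketchReader(Loader loader) : _load(std::move(loader)) {}

    Block borrow(const std::string& key) {
        // First check: a cache hit never touches the I/O mutex.
        if (Block hit = lookup(key)) return hit;

        // Acquire the I/O mutex before the read instead of around it.
        std::lock_guard<std::mutex> io_guard(_io_mutex);

        // Second check: a preceding thread may have loaded the key
        // while this thread was waiting for the mutex.
        if (Block hit = lookup(key)) return hit;

        // Still missing: perform the single read and publish the result
        // before the I/O mutex is released.
        auto block = std::make_shared<const std::string>(_load(key));
        std::lock_guard<std::mutex> cache_guard(_cache_mutex);
        _cache[key] = block;
        return block;
    }

private:
    Block lookup(const std::string& key) {
        std::lock_guard<std::mutex> guard(_cache_mutex);
        auto it = _cache.find(key);
        if (it == _cache.end()) return nullptr;
        return it->second;
    }

    Loader _load;
    std::mutex _io_mutex;     // serializes the stateful read
    std::mutex _cache_mutex;  // assumption: protects the toy map only
    std::unordered_map<std::string, Block> _cache;
};

// Runs n_threads concurrent borrows of one key and returns how many
// times the (simulated) disk read was executed.
int count_loads(int n_threads) {
    assert(n_threads > 0);
    std::atomic<int> loads{0};
    SketchReader reader([&loads](const std::string& key) {
        ++loads;
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
        return "data:" + key;
    });
    std::vector<std::thread> threads;
    for (int i = 0; i < n_threads; ++i) {
        threads.emplace_back([&reader] { reader.borrow("list-7"); });
    }
    for (auto& t : threads) t.join();
    return loads.load();
}
```

In this sketch, any thread that acquires `_io_mutex` after the first loader has finished finds the entry on the second check, so concurrent misses on one key trigger a single read. The trade-off is that misses on different keys also queue behind the same mutex, which matches a reader whose underlying input is stateful and must be serialized anyway.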

### What problem does this PR solve?

Problem Summary: Under high concurrency (e.g., vectordb_bench with 10-90
concurrent threads), the AnnIndexIVFListCache exhibits excessive
stampede events, redundant disk I/O, and usage ratio fluctuations due to
the TOCTOU gap between cache lookup and insert.

### Release note

Reduced redundant disk I/O and memory waste for the IVF on-disk vector index
cache under concurrent queries by adding stampede protection
(double-checked locking) to `CachedRandomAccessReader::borrow()`.

### Check List (For Author)

- Test: Manual test with vectordb_bench concurrency benchmark
- Behavior changed: No
- Does this need documentation: No

@github-actions (bot) requested a review from yiguolei as a code owner on April 16, 2026 15:42

@Thearas
Contributor

Thearas commented Apr 16, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring reopened this on Apr 16, 2026

@Thearas
Contributor

Thearas commented Apr 16, 2026

run buildall

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (7/7) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 53.13% (19933/37516) |
| Line Coverage | 36.63% (187719/512446) |
| Region Coverage | 33.00% (146012/442504) |
| Branch Coverage | 34.08% (63858/187384) |

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (7/7) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 71.49% (26252/36720) |
| Line Coverage | 54.38% (277737/510712) |
| Region Coverage | 51.65% (230586/446444) |
| Branch Coverage | 53.03% (99651/187910) |

@yiguolei merged commit e77064f into branch-4.1 on Apr 16, 2026
29 of 32 checks passed