[Enhancement](ms) Add sharded LRU cache for tablet index metadata to reduce FDB IO by wyxxxcat · Pull Request #61666 · apache/doris

wyxxxcat · 2026-03-24T09:10:57Z

Summary

This PR implements a sharded LRU cache for meta_tablet_idx_key lookups in both MetaService and Recycler, to avoid repeated FDB reads of metadata that rarely changes after creation.

Background

MS-side operations like commit_rowset, finish_tablet_job, and commit_txn frequently read the same tablet index metadata (TabletIndexPB), which is nearly immutable after creation. This causes unnecessary FDB IO overhead.

Implementation

Core Components

KvCache Template (cloud/src/common/kv_cache.h)
- Generic sharded LRU cache with 16 shards by default
- Reduces lock contention in high-concurrency scenarios
- Supports any KeyTuple and ValuePB types
- TTL support: entries expire after configurable time

KvCacheManager (cloud/src/common/kv_cache_manager.h)
- Manages cache instances with configurable capacity and TTL
- Extensible for future cache types (e.g., SchemaCache)

Configuration (cloud/src/common/config.h)
- ms_tablet_index_cache_capacity: MS cache capacity (default: 500000)
- recycler_tablet_index_cache_capacity: Recycler cache capacity (default: 500000)
- tablet_index_cache_ttl_seconds: TTL in seconds (default: 0, no TTL)
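
The KvCache design listed above (sharding, per-shard LRU eviction, optional TTL) could look roughly like the sketch below. This is not the code from kv_cache.h: the class name `ShardedLruCache`, its method names, and the even split of capacity across shards are assumptions made for illustration. The only details taken from the description are the default of 16 shards and the convention that a TTL of 0 disables expiry.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <list>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sketch of a sharded LRU cache with optional TTL.
template <typename K, typename V, std::size_t NumShards = 16>
class ShardedLruCache {
public:
    using Clock = std::chrono::steady_clock;

    // capacity is the total entry budget, split evenly across shards
    // (an assumption). ttl == 0 disables expiry.
    ShardedLruCache(std::size_t capacity, std::chrono::seconds ttl)
            : per_shard_cap_(capacity / NumShards > 0 ? capacity / NumShards : 1),
              ttl_(ttl) {}

    std::optional<V> get(const K& key) {
        Shard& s = shard_for(key);
        std::lock_guard<std::mutex> guard(s.mu);  // locks one shard only
        auto it = s.index.find(key);
        if (it == s.index.end()) return std::nullopt;
        if (ttl_.count() > 0 && Clock::now() - it->second->stamp >= ttl_) {
            s.lru.erase(it->second);  // expired: drop the entry, report a miss
            s.index.erase(it);
            return std::nullopt;
        }
        s.lru.splice(s.lru.begin(), s.lru, it->second);  // mark most recently used
        return it->second->value;
    }

    void put(const K& key, V value) {
        Shard& s = shard_for(key);
        std::lock_guard<std::mutex> guard(s.mu);
        auto it = s.index.find(key);
        if (it != s.index.end()) {  // replace an existing entry
            s.lru.erase(it->second);
            s.index.erase(it);
        }
        s.lru.push_front(Entry{key, std::move(value), Clock::now()});
        s.index[key] = s.lru.begin();
        if (s.lru.size() > per_shard_cap_) {  // evict the least recently used
            s.index.erase(s.lru.back().key);
            s.lru.pop_back();
        }
    }

    bool invalidate(const K& key) {
        Shard& s = shard_for(key);
        std::lock_guard<std::mutex> guard(s.mu);
        auto it = s.index.find(key);
        if (it == s.index.end()) return false;
        s.lru.erase(it->second);
        s.index.erase(it);
        return true;
    }

private:
    struct Entry {
        K key;
        V value;
        typename Clock::time_point stamp;  // insertion time, used for TTL
    };
    struct Shard {
        std::mutex mu;
        std::list<Entry> lru;  // front = most recently used
        std::unordered_map<K, typename std::list<Entry>::iterator> index;
    };
    Shard& shard_for(const K& key) {
        return shards_[std::hash<K>{}(key) % NumShards];
    }

    std::array<Shard, NumShards> shards_;
    std::size_t per_shard_cap_;
    std::chrono::seconds ttl_;
};

// Usage with a single shard so the eviction order is easy to follow.
inline void lru_example() {
    ShardedLruCache<int, std::string, 1> cache(2, std::chrono::seconds(0));
    cache.put(1, "a");
    cache.put(2, "b");
    cache.get(1);       // key 1 becomes most recently used
    cache.put(3, "c");  // evicts key 2
    assert(!cache.get(2).has_value());
}
```

Because each key maps to exactly one shard, two threads contend only when their keys hash to the same shard. One consequence of splitting the capacity per shard is that a skewed key distribution can evict entries before the total budget is reached.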

Integration Points

MetaService (cloud/src/meta-service/meta_service.cpp):

- Initialize global g_ms_cache_manager in constructor
- Add cache lookup/put in get_tablet_idx() function
- Transparent to callers - no API changes required

Recycler (cloud/src/recycler/util.cpp, recycler.cpp):

- Initialize global g_recycler_cache_manager in Recycler::start()
- Add cache lookup/put in recycler's get_tablet_idx() function
- Invalidate cache when deleting tablet_idx_key in recycle_tablets()
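
The "cache lookup/put in get_tablet_idx()" step in both integration points is a read-through pattern: consult the cache, fall back to the KV store on a miss, then populate the cache. A minimal sketch follows, under several assumptions: `TabletIndex` is a plain-struct stand-in for TabletIndexPB with an assumed field set, `FakeKvStore` replaces FDB with an in-memory map that counts reads, `TabletIndexCache` is an unbounded placeholder for the real cache, and the key layout and function signature are hypothetical rather than taken from the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>

// Illustrative stand-in for TabletIndexPB; the field set is an assumption.
struct TabletIndex {
    int64_t table_id = 0;
    int64_t index_id = 0;
    int64_t partition_id = 0;
    int64_t tablet_id = 0;
};

// Hypothetical in-memory replacement for FDB that counts reads.
struct FakeKvStore {
    std::unordered_map<std::string, TabletIndex> rows;
    int reads = 0;

    std::optional<TabletIndex> read(const std::string& key) {
        ++reads;
        auto it = rows.find(key);
        if (it == rows.end()) return std::nullopt;
        return it->second;
    }
};

// Hypothetical unbounded cache; a placeholder for the sharded LRU cache.
class TabletIndexCache {
public:
    std::optional<TabletIndex> get(const std::string& key) {
        std::lock_guard<std::mutex> guard(mu_);
        auto it = map_.find(key);
        if (it == map_.end()) return std::nullopt;
        return it->second;
    }
    void put(const std::string& key, const TabletIndex& value) {
        std::lock_guard<std::mutex> guard(mu_);
        map_[key] = value;
    }

private:
    std::mutex mu_;
    std::unordered_map<std::string, TabletIndex> map_;
};

// Assumed key layout: instance id plus tablet id.
inline std::string make_key(const std::string& instance_id, int64_t tablet_id) {
    return instance_id + "/" + std::to_string(tablet_id);
}

// Read-through lookup: callers see the same result with or without the cache.
inline std::optional<TabletIndex> get_tablet_idx(TabletIndexCache& cache,
                                                 FakeKvStore& store,
                                                 const std::string& instance_id,
                                                 int64_t tablet_id) {
    const std::string key = make_key(instance_id, tablet_id);
    if (auto hit = cache.get(key)) return hit;  // cache hit: no store read
    auto row = store.read(key);                 // cache miss: read the store
    if (row) cache.put(key, *row);              // only cache rows that exist
    return row;
}

// Usage: the second lookup of the same tablet does not touch the store.
inline void read_through_example() {
    FakeKvStore store;
    store.rows[make_key("inst", 42)] = TabletIndex{1, 2, 3, 42};
    TabletIndexCache cache;
    get_tablet_idx(cache, store, "inst", 42);
    get_tablet_idx(cache, store, "inst", 42);
    assert(store.reads == 1);
}
```

The sketch deliberately does not cache "not found" results, so a tablet created after a failed lookup becomes visible immediately. Whether the PR makes the same choice is not stated in the description.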

Cache Invalidation Strategy

- MS: Can invalidate on drop_tablet/drop_index/drop_partition if needed
- Recycler: Actively invalidates cache when deleting tablet_idx_key in recycle_tablets()
- TTL: Entries automatically expire after configured TTL (if enabled)
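
The two invalidation mechanisms above, explicit invalidation on delete and TTL expiry, can be illustrated with the single-threaded sketch below. All names (`ExpiringCache`, `remove_and_invalidate`) are hypothetical, the store is a plain map standing in for the KV delete, and the current time is passed in explicitly so that the expiry behavior is easy to follow. The real cache would also need locking, which is omitted here.

```cpp
#include <cassert>
#include <chrono>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

using Clock = std::chrono::steady_clock;

// Hypothetical cache whose entries expire ttl seconds after insertion.
// ttl == 0 means entries never expire, matching the documented default.
class ExpiringCache {
public:
    explicit ExpiringCache(std::chrono::seconds ttl) : ttl_(ttl) {}

    void put(const std::string& key, std::string value, Clock::time_point now) {
        map_[key] = Slot{std::move(value), now};
    }

    std::optional<std::string> get(const std::string& key, Clock::time_point now) {
        auto it = map_.find(key);
        if (it == map_.end()) return std::nullopt;
        if (ttl_.count() > 0 && now - it->second.inserted_at >= ttl_) {
            map_.erase(it);  // lazily drop the expired entry
            return std::nullopt;
        }
        return it->second.value;
    }

    bool invalidate(const std::string& key) { return map_.erase(key) > 0; }

private:
    struct Slot {
        std::string value;
        Clock::time_point inserted_at;
    };
    std::chrono::seconds ttl_;
    std::unordered_map<std::string, Slot> map_;
};

// Delete from the store first, then drop the cached copy, so a concurrent
// reader cannot re-populate the cache from a row that is about to vanish.
inline bool remove_and_invalidate(std::unordered_map<std::string, std::string>& store,
                                  ExpiringCache& cache, const std::string& key) {
    const bool removed = store.erase(key) > 0;  // stands in for the committed KV delete
    cache.invalidate(key);
    return removed;
}

// Usage: an entry is served before its TTL elapses and dropped afterwards.
inline void ttl_example() {
    const Clock::time_point t0{};
    ExpiringCache cache(std::chrono::seconds(10));
    cache.put("k", "v", t0);
    assert(cache.get("k", t0 + std::chrono::seconds(5)).has_value());
    assert(!cache.get("k", t0 + std::chrono::seconds(10)).has_value());
}
```

Since the description mentions separate cache managers for MS and Recycler, an explicit invalidation in one presumably does not reach the other. In that case a nonzero TTL would be the only bound on how long the other side can serve a deleted entry.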

Testing

Added comprehensive unit tests in cloud/test/kv_cache_test.cpp:

- Basic get/put operations
- LRU eviction behavior
- Cache invalidation
- Concurrent access (8 threads)

Performance Benefits

- Reduces FDB read operations for frequently accessed tablet metadata
- 16-way sharding reduces lock contention under high concurrency
- Transparent integration - no API changes for existing callers of get_tablet_idx()
- Dual eviction: LRU + TTL for flexible cache management

Release note

None

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test (add detailed scripts or steps below)
- No need to test or manual test. Explain why:
  - This is a refactor/code format and no logic has been changed.
  - Previous test can cover this change.
  - No code files have been changed.
  - Other reason
Behavior changed:
- No.
- Yes.
Does this need documentation?
- No.
- Yes.

Check List (For Reviewer who merge this PR)

Confirm the release note
Confirm test cases
Confirm document
Add branch pick label

hello-stephen · 2026-03-24T09:11:04Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

wyxxxcat · 2026-03-24T09:27:04Z

run buildall

dataroaring · 2026-03-25T12:13:44Z

cloud/src/common/kv_cache.h

There is already an LRU cache in BE; we should use it.

wyxxxcat force-pushed the ms_lru_cache branch from 8589aa1 to 4311778 Compare March 24, 2026 09:24

wyxxxcat force-pushed the ms_lru_cache branch from 4311778 to 6b47a05 Compare March 24, 2026 09:26

wyxxxcat wants to merge 1 commit into apache:master from wyxxxcat:ms_lru_cache

3 participants
