Summary
The tiered storage read-ahead cache currently uses a single access-based TTL. In cold-read scenarios, prefetched entries may expire before consumers have enough time to consume the whole read-ahead batch, while entries that have already been read should usually be released sooner to reduce heap pressure.
This issue proposes adding a native dual-TTL policy for the tiered storage read-ahead cache:
- a longer TTL after cache entry creation, so unread prefetched entries have enough time to be consumed;
- a shorter TTL after cache entry read, so consumed entries can be released sooner.
Motivation
In tiered storage cold-read workloads, read-ahead can fetch batches ahead of consumer demand. A single short expireAfterAccess duration cannot express the two distinct phases of a cache entry's lifecycle:
- before first read: keep the prefetched data long enough for the consumer to catch up;
- after read: keep only a short grace period for concurrent/repeated reads, then release memory.
A dual-TTL policy helps balance read-ahead effectiveness and heap usage.
Describe the Solution You Would Like
Add two read-ahead cache TTL configurations in tieredstore:
- readAheadCacheCreateExpireDuration, default 180000 ms;
- readAheadCacheAfterReadExpireDuration, default 10000 ms.
Keep readAheadCacheExpireDuration as the legacy single-TTL fallback. If both new TTLs are disabled, use the legacy expireAfterAccess behavior. If only one new TTL is disabled, resolve it from the legacy duration.
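The fallback rules above can be sketched roughly as follows. This is an illustration, not the proposed patch: the class `ReadAheadTtlResolver`, its nested `ResolvedTtl` type, and the convention that a non-positive value means "disabled" are all assumptions, since this issue does not define how a TTL is disabled.

```java
import java.time.Duration;

/** Hypothetical helper: resolves the effective read-ahead TTLs from the three settings. */
final class ReadAheadTtlResolver {

    /** Result of resolution; dualTtl == false means "use plain expireAfterAccess". */
    static final class ResolvedTtl {
        final boolean dualTtl;
        final Duration afterCreate;
        final Duration afterRead;

        ResolvedTtl(boolean dualTtl, Duration afterCreate, Duration afterRead) {
            this.dualTtl = dualTtl;
            this.afterCreate = afterCreate;
            this.afterRead = afterRead;
        }
    }

    /**
     * Assumption: a value <= 0 marks a new TTL as disabled.
     * legacyMs stands for readAheadCacheExpireDuration.
     */
    static ResolvedTtl resolve(long createMs, long afterReadMs, long legacyMs) {
        boolean createEnabled = createMs > 0;
        boolean afterReadEnabled = afterReadMs > 0;

        // Both new TTLs disabled: keep the legacy single access-based TTL.
        if (!createEnabled && !afterReadEnabled) {
            Duration legacy = Duration.ofMillis(legacyMs);
            return new ResolvedTtl(false, legacy, legacy);
        }

        // Only one disabled: fill the missing side from the legacy duration.
        long effectiveCreateMs = createEnabled ? createMs : legacyMs;
        long effectiveAfterReadMs = afterReadEnabled ? afterReadMs : legacyMs;
        return new ResolvedTtl(
            true,
            Duration.ofMillis(effectiveCreateMs),
            Duration.ofMillis(effectiveAfterReadMs));
    }
}
```

For example, `ReadAheadTtlResolver.resolve(180000L, 0L, 15000L)` would yield a dual-TTL policy with a 180000 ms create TTL and a 15000 ms after-read TTL taken from the legacy value (the 15000 ms legacy figure here is only a placeholder).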
Implement the policy in MessageStoreFetcherImpl using Caffeine Expiry:
- expireAfterCreate: create TTL;
- expireAfterRead: after-read TTL;
- expireAfterUpdate: create TTL.
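The mapping above could look roughly like the sketch below. To stay compilable without the Caffeine dependency, the hypothetical class `DualTtlExpiry` only mirrors the three method signatures of Caffeine's `Expiry<K, V>` interface (each returns the remaining lifetime in nanoseconds) rather than implementing it; the class name and constructor parameters are assumptions, not the actual MessageStoreFetcherImpl code.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical dual-TTL policy. Method signatures match Caffeine's Expiry<K, V>,
 * so with Caffeine on the classpath this class could declare "implements Expiry<K, V>".
 */
final class DualTtlExpiry<K, V> {
    private final long createNanos;
    private final long afterReadNanos;

    DualTtlExpiry(long createMs, long afterReadMs) {
        // Caffeine expects durations in nanoseconds.
        this.createNanos = TimeUnit.MILLISECONDS.toNanos(createMs);
        this.afterReadNanos = TimeUnit.MILLISECONDS.toNanos(afterReadMs);
    }

    /** A freshly prefetched entry gets the long create TTL. */
    public long expireAfterCreate(K key, V value, long currentTime) {
        return createNanos;
    }

    /** Replacing an entry's value restarts the long create TTL. */
    public long expireAfterUpdate(K key, V value, long currentTime, long currentDuration) {
        return createNanos;
    }

    /** Once an entry has been read, it only keeps the short after-read TTL. */
    public long expireAfterRead(K key, V value, long currentTime, long currentDuration) {
        return afterReadNanos;
    }
}
```

With the real interface in place, an instance would be passed to `Caffeine.newBuilder().expireAfter(...)` instead of calling `expireAfterAccess(...)`. Note that in this sketch every read resets the remaining lifetime to the after-read TTL, so repeated reads within the grace period keep extending it.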
Describe Alternatives You Have Considered
One alternative is to keep the dual-TTL policy in downstream storage integrations. However, the read-ahead cache is implemented in open-source tieredstore, and both normal broker cold reads and mount-style tiered reads can benefit from the same lifecycle-aware cache policy.
Another alternative is to only increase the existing single TTL. That improves unread prefetch retention but also keeps already-read entries longer, increasing memory pressure.
Additional Context
The default values are intended to preserve a longer window for unread prefetched batches while shortening retention after a successful read.