[pip] PIP-430: Pulsar Broker cache improvements: refactoring eviction and adding a new cache strategy based on expected read count #24444
Motivation
I'd like to propose PIP-430, which addresses performance and
efficiency issues in Pulsar broker's entry cache eviction mechanisms
and introduces a more efficient caching strategy.
The current broker entry cache implementation has several
production-impacting issues. The size-based eviction doesn't guarantee
removal of globally oldest entries, leading to suboptimal cache
utilization. More critically, the timestamp-based eviction iterates
through all ManagedLedgers every 10ms by default, causing high CPU
utilization in brokers with many topics. Mixed read patterns like
tailing, catch-up, and Key_Shared replays break eviction assumptions,
resulting in unnecessary BookKeeper and S3 reads that increase
operational costs.
PIP-430 introduces two main improvements.
First, it adds a centralized eviction mechanism built on a global
RangeCacheRemovalQueue that tracks all cached entries in insertion
order. This replaces the expensive per-ledger iteration with a single
periodic task and ensures true oldest-first eviction globally. The
implementation PR for this part is
#24363.
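The idea of a single insertion-ordered queue can be sketched as follows. This is a minimal illustration, not the code from #24363: the class name `RemovalQueueSketch`, its fields, and its method names are hypothetical, and the real RangeCacheRemovalQueue may differ in structure and concurrency handling.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch: one global FIFO queue shared by all ledgers.
// Because entries are appended in insertion order, the head of the queue
// is always the globally oldest entry, so eviction never has to iterate
// over individual ManagedLedgers.
class RemovalQueueSketch {
    // Illustrative stand-in for a cached entry; not a Pulsar class.
    static final class CachedEntry {
        final String ledger;
        final long entryId;
        final long sizeBytes;
        final long insertedAtMillis;

        CachedEntry(String ledger, long entryId, long sizeBytes, long insertedAtMillis) {
            this.ledger = ledger;
            this.entryId = entryId;
            this.sizeBytes = sizeBytes;
            this.insertedAtMillis = insertedAtMillis;
        }
    }

    private final Queue<CachedEntry> queue = new ArrayDeque<>();
    private long totalBytes;

    synchronized void add(String ledger, long entryId, long sizeBytes, long nowMillis) {
        queue.add(new CachedEntry(ledger, entryId, sizeBytes, nowMillis));
        totalBytes += sizeBytes;
    }

    // Timestamp-based eviction: pop from the head until the head is new enough.
    // Returns the number of entries removed.
    synchronized int evictOlderThan(long cutoffMillis) {
        int removed = 0;
        while (!queue.isEmpty() && queue.peek().insertedAtMillis < cutoffMillis) {
            totalBytes -= queue.poll().sizeBytes;
            removed++;
        }
        return removed;
    }

    // Size-based eviction: pop the oldest entries until the cache fits the target.
    // Returns the number of bytes freed.
    synchronized long evictToSize(long targetBytes) {
        long freed = 0;
        while (totalBytes > targetBytes && !queue.isEmpty()) {
            long size = queue.poll().sizeBytes;
            totalBytes -= size;
            freed += size;
        }
        return freed;
    }

    synchronized long totalBytes() {
        return totalBytes;
    }

    synchronized int size() {
        return queue.size();
    }
}
```

In this sketch both eviction paths only touch the head of the queue, so the cost of a pass is proportional to the number of entries actually evicted rather than to the number of topics.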
Second, it adds a new "expected read count" cache strategy in which
each entry tracks how many active cursors are anticipated to read it.
This lets the cache retain the entries that are most likely to be read
again, especially in high fan-out catch-up read scenarios and with
Key_Shared subscriptions.
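A rough sketch of how an expected read count could influence eviction is shown below. Everything here is an assumption made for illustration: the class `ExpectedReadCountCacheSketch`, its methods, and the two-pass eviction policy (fully read entries first, then oldest-first) are hypothetical and are not taken from the PIP or its implementation.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: each entry is inserted with the number of cursors
// expected to read it, and every cache hit decrements that count.
class ExpectedReadCountCacheSketch {
    static final class Entry {
        final long sizeBytes;
        int expectedReads;

        Entry(long sizeBytes, int expectedReads) {
            this.sizeBytes = sizeBytes;
            this.expectedReads = expectedReads;
        }
    }

    // LinkedHashMap iterates in insertion order, oldest entry first.
    private final Map<Long, Entry> entries = new LinkedHashMap<>();
    private long totalBytes;

    void insert(long entryId, long sizeBytes, int expectedReads) {
        Entry previous = entries.put(entryId, new Entry(sizeBytes, Math.max(0, expectedReads)));
        if (previous != null) {
            totalBytes -= previous.sizeBytes;
        }
        totalBytes += sizeBytes;
    }

    // Returns true on a cache hit and consumes one expected read.
    boolean read(long entryId) {
        Entry entry = entries.get(entryId);
        if (entry == null) {
            return false;
        }
        if (entry.expectedReads > 0) {
            entry.expectedReads--;
        }
        return true;
    }

    // Assumed policy: first drop entries nobody is expected to read again,
    // then fall back to plain oldest-first if the cache is still too large.
    int evictToSize(long targetBytes) {
        int removed = removeOldestFirst(targetBytes, true);
        removed += removeOldestFirst(targetBytes, false);
        return removed;
    }

    private int removeOldestFirst(long targetBytes, boolean onlyFullyRead) {
        int removed = 0;
        Iterator<Entry> it = entries.values().iterator();
        while (totalBytes > targetBytes && it.hasNext()) {
            Entry entry = it.next();
            if (onlyFullyRead && entry.expectedReads > 0) {
                continue; // still useful to at least one cursor, keep for now
            }
            it.remove();
            totalBytes -= entry.sizeBytes;
            removed++;
        }
        return removed;
    }

    boolean contains(long entryId) {
        return entries.containsKey(entryId);
    }

    long totalBytes() {
        return totalBytes;
    }
}
```

Under this assumed policy, an entry that several catch-up cursors still need outlives a newer entry that has already been read by everyone expected to read it, which is the kind of retention decision a purely age-based queue cannot make.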
The benefits include reduced CPU overhead, improved cache hit rates
through better eviction decisions, and proper handling of diverse read
patterns. The new strategy is configurable via
cacheEvictionByExpectedReadCount (default: true) and maintains full
backward compatibility with no client-facing API changes.
This addresses long-standing performance issues that particularly
affect production deployments with high topic counts or diverse
consumption patterns. The refactored architecture also provides a
solid foundation for future cache optimizations.
The full proposal can be found at: #24444
Rendered PIP document:
https://github.com/lhotari/pulsar/blob/lh-pip-430/pip/pip-430.md
I welcome your feedback and discussion on this proposal. Please share
your thoughts, concerns, or suggestions.
Mailing list discussion: https://lists.apache.org/thread/o1ozbg468kxfd38pxk2ppzsstdnxnok2
Documentation
doc
doc-required
doc-not-needed
doc-complete