[pip] PIP-430: Pulsar Broker cache improvements: refactoring eviction and adding a new cache strategy based on expected read count #24444

Open: wants to merge 2 commits into base: master

@lhotari (Member) commented Jun 23, 2025

Motivation

I'd like to propose PIP-430, which addresses performance and
efficiency issues in the Pulsar broker's entry cache eviction
mechanisms and introduces a more efficient caching strategy.

The current broker entry cache implementation has several
production-impacting issues. The size-based eviction doesn't guarantee
that the globally oldest entries are removed first, which leads to
suboptimal cache utilization. More critically, the timestamp-based
eviction iterates through all ManagedLedgers every 10ms by default,
causing high CPU utilization in brokers with many topics. Mixed read
patterns such as tailing reads, catch-up reads, and Key_Shared replays
break the assumptions the eviction logic relies on, resulting in
unnecessary BookKeeper and S3 reads that increase operational costs.

PIP-430 introduces two main improvements.
First, a centralized eviction mechanism using a global
RangeCacheRemovalQueue that tracks all cached entries in insertion
order. This replaces the expensive per-ledger iteration with a single
periodic task and ensures true oldest-first eviction globally. The
implementation PR for this part is
#24363.
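
A minimal sketch of the idea, assuming entries are appended with a non-decreasing insertion timestamp: a single broker-wide queue ordered by insertion lets both size-based and timestamp-based eviction pop from the head, so one periodic task never has to visit each ManagedLedger. The names `CachedEntry`, `RemovalQueueSketch`, `evictToSize` and `evictOlderThan` are hypothetical and are not the API of RangeCacheRemovalQueue or of #24363; a real broker cache would also need thread safety and would have to drop evicted entries from the per-ledger caches.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical cached entry; not a Pulsar class.
final class CachedEntry {
    final String ledger;
    final long entryId;
    final long sizeBytes;
    final long insertedAtMillis;

    CachedEntry(String ledger, long entryId, long sizeBytes, long insertedAtMillis) {
        this.ledger = ledger;
        this.entryId = entryId;
        this.sizeBytes = sizeBytes;
        this.insertedAtMillis = insertedAtMillis;
    }
}

// Hypothetical sketch of a global, insertion-ordered removal queue.
// Single-threaded for clarity; assumes insertedAtMillis never decreases.
final class RemovalQueueSketch {
    // Oldest entry at the head, newest at the tail, across all ledgers.
    private final Deque<CachedEntry> queue = new ArrayDeque<>();
    private long totalBytes;

    void add(CachedEntry entry) {
        queue.addLast(entry);
        totalBytes += entry.sizeBytes;
    }

    long totalBytes() {
        return totalBytes;
    }

    int size() {
        return queue.size();
    }

    // Size-based eviction: remove the globally oldest entries until the
    // cache is at or below maxBytes. Returns the number of evicted entries.
    int evictToSize(long maxBytes) {
        int evicted = 0;
        while (totalBytes > maxBytes && !queue.isEmpty()) {
            totalBytes -= queue.pollFirst().sizeBytes;
            evicted++;
        }
        return evicted;
    }

    // Timestamp-based eviction: because the queue is in insertion order,
    // stop at the first entry that is not older than the cutoff.
    int evictOlderThan(long cutoffMillis) {
        int evicted = 0;
        while (!queue.isEmpty() && queue.peekFirst().insertedAtMillis < cutoffMillis) {
            totalBytes -= queue.pollFirst().sizeBytes;
            evicted++;
        }
        return evicted;
    }
}
```

Because the queue is ordered globally, size-based eviction removes the oldest entries across all topics rather than the oldest within whichever ledger happens to be visited, and timestamp-based eviction stops as soon as it reaches an entry that is still young enough.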
Second, a new "expected read count" cache strategy in which each cached
entry tracks how many active cursors are expected to read it. This lets
the cache retain the entries that are most likely to be read again,
especially in high fan-out catch-up read scenarios and Key_Shared
subscriptions.
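
The following is a hedged sketch of how an expected read count could influence eviction. It assumes (this is not taken from the proposal text) that each entry starts with the number of active cursors expected to read it, that each cache read decrements that count, and that eviction first removes the oldest entries with no pending reads before falling back to plain oldest-first order. `CountedEntry`, `ExpectedReadCountCacheSketch`, `markRead` and the two-pass policy are hypothetical names and choices for illustration only.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Hypothetical entry that counts the reads still expected from active cursors.
final class CountedEntry {
    final long entryId;
    final long sizeBytes;
    private int expectedReads;

    CountedEntry(long entryId, long sizeBytes, int expectedReads) {
        this.entryId = entryId;
        this.sizeBytes = sizeBytes;
        this.expectedReads = expectedReads;
    }

    // Called when a cursor reads this entry from the cache.
    void markRead() {
        if (expectedReads > 0) {
            expectedReads--;
        }
    }

    boolean hasPendingReads() {
        return expectedReads > 0;
    }
}

// Hypothetical single-threaded cache illustrating read-count-aware eviction.
final class ExpectedReadCountCacheSketch {
    private final Deque<CountedEntry> insertionOrder = new ArrayDeque<>();
    private long totalBytes;

    void add(CountedEntry entry) {
        insertionOrder.addLast(entry);
        totalBytes += entry.sizeBytes;
    }

    long totalBytes() {
        return totalBytes;
    }

    boolean contains(long entryId) {
        for (CountedEntry e : insertionOrder) {
            if (e.entryId == entryId) {
                return true;
            }
        }
        return false;
    }

    // Pass 1: evict the oldest entries that no cursor is still expected to read.
    // Pass 2: if still over the limit, fall back to plain oldest-first eviction.
    void evictToSize(long maxBytes) {
        Iterator<CountedEntry> it = insertionOrder.iterator();
        while (totalBytes > maxBytes && it.hasNext()) {
            CountedEntry e = it.next();
            if (!e.hasPendingReads()) {
                it.remove();
                totalBytes -= e.sizeBytes;
            }
        }
        while (totalBytes > maxBytes && !insertionOrder.isEmpty()) {
            totalBytes -= insertionOrder.pollFirst().sizeBytes;
        }
    }
}
```

The fallback pass keeps the cache size bounded even when every cached entry still has pending reads; without it, a slow or stalled cursor could pin entries in the cache indefinitely.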

The benefits include reduced CPU overhead, improved cache hit rates
through better eviction decisions, and proper handling of diverse read
patterns. The new strategy is configurable via
cacheEvictionByExpectedReadCount (default: true) and maintains full
backward compatibility with no client-facing API changes.

This addresses long-standing performance issues that particularly
affect production deployments with high topic counts or diverse
consumption patterns. The refactored architecture also provides a
solid foundation for future cache optimizations.

The full proposal can be found at: #24444
Rendered PIP document:
https://github.com/lhotari/pulsar/blob/lh-pip-430/pip/pip-430.md

I welcome your feedback and discussion on this proposal. Please share
your thoughts, concerns, or suggestions.

Mailing list discussion: https://lists.apache.org/thread/o1ozbg468kxfd38pxk2ppzsstdnxnok2

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@github-actions bot added the PIP and doc labels Jun 23, 2025 (doc: Your PR contains doc changes, no matter whether the changes are in markdown or code files.)
@lhotari (Member, Author) commented Jun 23, 2025

Implementation related:

Labels: doc, PIP, ready-to-test