[improve][ml] Warn and emit metric when cursor ack state exceeds persist limits by ng-galien · Pull Request #25548 · apache/pulsar

ng-galien · 2026-04-17T11:53:42Z

Motivation

When a cursor has more ack ranges to persist than managedLedgerMaxUnackedRangesToPersist, or more batch deleted indexes than managedLedgerMaxBatchDeletedIndexToPersist, the excess is silently truncated. After a broker restart those acks are lost and the affected messages are redelivered. Today nothing signals when this happens, so operators have to monitor totalNonContiguousDeletedMessagesRange manually. The issue discussion asks for a WARN log (with tuning advice) and a cursor-level metric.

Modifications

Commit 1 — warn and emit metric on truncation:

WARN log emitted once per crossing (edge-detected) in both buildIndividualDeletedMessageRanges() and buildBatchEntryDeletionIndexInfoList(), with tuning advice covering the two limits, managedLedgerPersistIndividualAckAsLongArray, and managedCursorInfoCompressionType.
Two OTel counters, tagged with the cursor's standard attributes (pulsar.managed_ledger.name, pulsar.managed_ledger.cursor.name, pulsar.namespace).
Signals documented next to the settings in broker.conf.
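The once-per-crossing WARN described above can be sketched as a small edge detector. This is a minimal illustration under assumptions, not the PR's code: the class `TruncationEdgeDetector`, its `observe(total, limit)` method and the callback are hypothetical names, and the sketch models only the edge detection of the log, not how the counters are incremented.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.IntConsumer;

// Hypothetical helper: invokes the callback only on the transition from
// "within the persist limit" to "over the persist limit".
final class TruncationEdgeDetector {
    private final AtomicBoolean overLimit = new AtomicBoolean(false);
    private final IntConsumer onCrossing;

    TruncationEdgeDetector(IntConsumer onCrossing) {
        this.onCrossing = onCrossing;
    }

    // Call once per persist with the total number of entries and the configured limit.
    void observe(int total, int limit) {
        boolean truncated = total > limit;
        // getAndSet returns the previous state, so we act only on a false -> true edge.
        boolean wasTruncated = overLimit.getAndSet(truncated);
        if (truncated && !wasTruncated) {
            onCrossing.accept(total);
        }
    }

    public static void main(String[] args) {
        int[] warnings = {0};
        TruncationEdgeDetector detector = new TruncationEdgeDetector(count -> {
            warnings[0]++;
            System.out.println("WARN: " + count + " entries exceed the persist limit");
        });
        // Limit 3: crossings happen at 2 -> 4 and at 2 -> 7; 5 and 6 stay over the limit.
        for (int total : new int[] {2, 4, 5, 6, 2, 7}) {
            detector.observe(total, 3);
        }
        System.out.println("warnings=" + warnings[0]); // warnings=2
    }
}
```

Staying over the limit on consecutive persists produces no extra log lines in this sketch; dropping back under the limit re-arms the detector.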

Commit 2 — fix pre-existing off-by-one in the ranges cap:

buildIndividualDeletedMessageRanges used to persist maxRanges + 1 entries because the forEach callback added each range before testing rangeList.size() <= maxRanges. This regression was introduced in #3819, when stream().limit(N) was dropped. Without this fix, the new WARN log and counter would fire spuriously when totalRanges == maxRanges + 1. The fix switches to a check-before-add pattern (symmetric with buildBatchEntryDeletionIndexInfoList) and records truncation in a MutableBoolean truncated flag.
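The off-by-one and its fix can be reproduced in isolation. In this sketch, `RangeVisitor`, `forEachRange`, `addThenCheck` and `checkThenAdd` are hypothetical stand-ins for the range-set iteration inside buildIndividualDeletedMessageRanges, and a JDK `AtomicBoolean` replaces the commons-lang `MutableBoolean` used in the PR so the example has no dependencies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

final class RangeCapSketch {

    // Hypothetical visitor: return false to stop the iteration.
    interface RangeVisitor {
        boolean visit(long lower, long upper);
    }

    // Visits each [lower, upper] pair in order until the visitor returns false.
    static void forEachRange(List<long[]> ranges, RangeVisitor visitor) {
        for (long[] r : ranges) {
            if (!visitor.visit(r[0], r[1])) {
                return;
            }
        }
    }

    // Buggy shape: adds first, then tests size <= max, so it keeps max + 1 ranges.
    static List<long[]> addThenCheck(List<long[]> ranges, int maxRanges) {
        List<long[]> out = new ArrayList<>();
        forEachRange(ranges, (lo, hi) -> {
            out.add(new long[] {lo, hi});
            return out.size() <= maxRanges;
        });
        return out;
    }

    // Fixed shape: tests before adding and records whether any range was dropped.
    static List<long[]> checkThenAdd(List<long[]> ranges, int maxRanges, AtomicBoolean truncated) {
        List<long[]> out = new ArrayList<>();
        forEachRange(ranges, (lo, hi) -> {
            if (out.size() >= maxRanges) {
                truncated.set(true); // at least one range will not be persisted
                return false;
            }
            out.add(new long[] {lo, hi});
            return true;
        });
        return out;
    }

    public static void main(String[] args) {
        List<long[]> ranges = new ArrayList<>();
        for (long i = 0; i < 5; i++) {
            ranges.add(new long[] {i * 10, i * 10 + 5});
        }
        AtomicBoolean truncated = new AtomicBoolean(false);
        System.out.println("add-then-check kept " + addThenCheck(ranges, 3).size()); // 4
        System.out.println("check-then-add kept " + checkThenAdd(ranges, 3, truncated).size()
                + ", truncated=" + truncated.get()); // 3, truncated=true
    }
}
```

With five ranges and a limit of 3, the add-then-check shape keeps 4 ranges while check-then-add keeps exactly 3 and reports truncation. With exactly 3 ranges, check-then-add never sets the flag, which in this sketch is the property that keeps a truncation signal from firing at the boundary.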

Verifying this change

ManagedCursorTest.testPersistUnackedRangesTruncatedCounter
ManagedCursorTest.testPersistBatchDeletedIndexesTruncatedCounter

Does this pull request potentially affect one of the following parts:

The metrics — adds two cursor-level OTel counters (also documented in broker.conf).



lhotari

Good work @ng-galien! Some comments on naming. In the metric names, I think it's clearer to describe the effect, "persisted unacked ranges being truncated" or "persisted batch deleted indexes being truncated" — rather than using "overflow". The former tells operators directly what happened (state was dropped on persist), whereas "overflow" is a more abstract term that requires them to infer the consequence.

lhotari

LGTM, good work @ng-galien

ng-galien · 2026-04-18T07:59:08Z

Hi @lhotari, thanks for your review. The naming now uses "truncated" instead of "overflow", and the telemetry is more precise, as you suggested.
I've also just noticed that the logger already has the ledger and cursor names set at constructor level.


ng-galien · 2026-04-18T08:22:11Z

@lhotari managed-ledger:checkstyleTest and managed-ledger:test are now green on my side, my bad.

See apache/pulsar#25548.

lhotari

A few comments about details that Claude Code spotted.

In addition, two comments about the description:

reconcile the PR description (it claims broker‑level / no cardinality growth, but the code emits per‑cursor labels),
update the Modifications section to use the final metric names.

Since the counter is only emitted in OTel once the threshold has been crossed, it's fine to increase cardinality. In Prometheus this would be different.
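The cardinality argument above can be illustrated with a stdlib stand-in for an attribute-keyed counter. `LazyAttributedCounter` is hypothetical and is not the OpenTelemetry API; it only mimics the property that a series exists once something has been recorded for a given attribute set. The attribute keys are the cursor attributes named in the PR description; the namespace, ledger and cursor values are made up.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for an attribute-keyed counter (not the OTel API).
final class LazyAttributedCounter {
    // One series per distinct attribute set; nothing exists until the first add().
    private final Map<Map<String, String>, LongAdder> series = new ConcurrentHashMap<>();

    void add(long value, Map<String, String> attributes) {
        series.computeIfAbsent(Map.copyOf(attributes), k -> new LongAdder()).add(value);
    }

    long get(Map<String, String> attributes) {
        LongAdder adder = series.get(attributes);
        return adder == null ? 0L : adder.sum();
    }

    int seriesCount() {
        return series.size();
    }

    // Attribute keys as listed in the PR description; values are supplied by the caller.
    static Map<String, String> cursorAttributes(String namespace, String ledger, String cursor) {
        return Map.of(
                "pulsar.namespace", namespace,
                "pulsar.managed_ledger.name", ledger,
                "pulsar.managed_ledger.cursor.name", cursor);
    }

    public static void main(String[] args) {
        LazyAttributedCounter truncations = new LazyAttributedCounter();
        for (int i = 0; i < 1000; i++) {
            boolean crossedLimit = (i == 7 || i == 42); // simulate two cursors over the limit
            if (crossedLimit) {
                truncations.add(1, cursorAttributes("tenant/ns", "ledger-" + i, "sub-" + i));
            }
        }
        System.out.println("series=" + truncations.seriesCount()); // series=2
    }
}
```

Of 1,000 simulated cursors, only the two that crossed the limit produce a series, so in this model cardinality grows with the number of affected cursors rather than with the total number of cursors.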

After review on apache/pulsar#25548, the two new counters emit the cursor's standard attribute set (pulsar.namespace, pulsar.managed_ledger.name, pulsar.managed_ledger.cursor.name) instead of custom managedLedger/cursor keys. Update the reference doc to match. See apache/pulsar#25548.


ng-galien and others added 2 commits April 17, 2026 13:41

[improve][ml] Warn and emit metric when cursor ack state exceeds persist limits

9c9be22

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

[fix][ml] Cap persisted individual deleted message ranges at exactly maxRanges

4dab174

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ng-galien marked this pull request as ready for review April 17, 2026 12:03

lhotari reviewed Apr 17, 2026


[fix][ml] after review: align semantic from overflow to truncated

c3a3baa

lhotari approved these changes Apr 18, 2026


[fix][ml] fix checkstyle line length in ManagedCursorTest

8a64561

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ng-galien added a commit to ng-galien/pulsar-site that referenced this pull request Apr 18, 2026

Docs: document cursor persist truncation OpenTelemetry metrics

e1ed1f1

See apache/pulsar#25548.

This was referenced Apr 18, 2026

Docs: document cursor persist truncation OpenTelemetry metrics apache/pulsar-site#1118

Merged

Docs: reminder — cursor ack state persistence tuning guide apache/pulsar-site#1119

Open

lhotari reviewed Apr 20, 2026


[fix][ml] after review: use ManagedCursor attributes

08dc881

lhotari approved these changes Apr 20, 2026


lhotari merged commit d553cec into apache:master Apr 21, 2026
44 checks passed

lhotari added this to the 4.3.0 milestone Apr 21, 2026

lhotari pushed a commit to apache/pulsar-site that referenced this pull request Apr 21, 2026

Docs: document cursor persist truncation OpenTelemetry metrics (#1118)

54df315

See apache/pulsar#25548.

lhotari merged 5 commits into apache:master from ng-galien:fix/warn-ack-holes-exceed-persist-limit
