KAFKA-19888: Clamp negative values in coordinator histograms #20986

squah-confluent · 2025-11-25T05:43:02Z

The coordinator runtime and group coordinator currently use wall clock
time to measure durations for metrics. When the system clock goes
backwards due to time adjustments, we attempt to record negative
durations for metrics, which throws an ArrayIndexOutOfBoundsException.
This causes request processing and partition loading to fail
while the clock is being adjusted. If partition loading fails, the group
coordinator for that partition becomes unavailable until the broker
is restarted or leadership changes again.
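
As a rough illustration of the failure mode described above, the sketch below computes an elapsed time from two readings of a clock that is stepped back between them. `DurationSketch`, `elapsedMs`, and the fixed clock values are hypothetical names and numbers chosen for this example; they are not taken from the Kafka code base.

```java
import java.util.function.LongSupplier;

final class DurationSketch {
    // Elapsed time as "current reading minus start reading" of the same clock.
    // Nothing here prevents the result from being negative.
    static long elapsedMs(long startMs, LongSupplier clockMs) {
        return clockMs.getAsLong() - startMs;
    }

    public static void main(String[] args) {
        // Hypothetical wall clock: the start was read at 1000 ms, then the
        // system clock was adjusted backwards so the next reading is 950 ms.
        long startMs = 1_000L;
        LongSupplier steppedBackClock = () -> 950L;
        System.out.println(elapsedMs(startMs, steppedBackClock)); // prints -50

        // System.nanoTime() is intended for measuring elapsed time and is not
        // tied to wall-clock time, so this difference is not affected by
        // system clock adjustments.
        long t0 = System.nanoTime();
        long elapsedNs = System.nanoTime() - t0;
        System.out.println(elapsedNs);
    }
}
```

A value such as `-50` is what ends up being passed to the histogram when the duration comes from wall clock readings.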

To address this, we clamp negative durations to zero in histograms
instead of throwing ArrayIndexOutOfBoundsExceptions. We will move
towards using a monotonic clock for metrics in future work.
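
A minimal sketch of the clamping idea, assuming a toy array-backed recorder rather than the actual histogram wrapper in coordinator-common. `ClampingRecorder`, its bucket layout, and the upper-bound clamp are assumptions made for illustration only.

```java
final class ClampingRecorder {
    // One counter per trackable value, from 0 to highestTrackableValue inclusive.
    private final long[] buckets;

    ClampingRecorder(int highestTrackableValue) {
        this.buckets = new long[highestTrackableValue + 1];
    }

    // Records a value after clamping it into [0, highestTrackableValue].
    // Without the lower clamp, a negative value would index outside the array
    // and throw ArrayIndexOutOfBoundsException.
    void record(long value) {
        long clamped = Math.max(0L, Math.min(value, buckets.length - 1));
        buckets[(int) clamped]++;
    }

    // Number of recordings that landed in the given bucket.
    long count(int value) {
        return buckets[value];
    }

    // Highest value with at least one recording, or 0 if nothing was recorded.
    long max() {
        for (int i = buckets.length - 1; i >= 0; i--) {
            if (buckets[i] > 0) {
                return i;
            }
        }
        return 0L;
    }
}
```

With this sketch, `record(-50L)` increments the zero bucket instead of throwing, so a clock adjustment only skews the metric rather than failing the caller.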

Reviewers: David Jacot <djacot@confluent.io>

Clamp negative values in coordinator histograms, instead of throwing an exception.

squah-confluent · 2025-11-25T05:44:13Z

...nator-common/src/test/java/org/apache/kafka/coordinator/common/runtime/HdrHistogramTest.java

+        assertEquals(highestTrackableValue, hdrHistogram.max(now));
+
+        hdrHistogram.record(-50L);
+        assertEquals(0, hdrHistogram.max(now + 1000L));


There's no histogram min method. We can add one, but it'd be only used in this one test.

dajac · 2025-11-25T19:05:51Z

@squah-confluent Could you please explain the issue and the fix in the description?

This must be backported to 4.2, 4.1 and 4.0.

squah-confluent · 2025-11-26T13:23:57Z

@dajac Thanks for taking a look. I rewrote the description. Let me know your thoughts.

dajac

lgtm, thanks

dajac · 2025-11-26T14:30:17Z

Merged to trunk, 4.2, 4.1 and 4.0.

KAFKA-19888: Clamp negative values in coordinator histograms

572e5d6

github-actions bot added the triage, group-coordinator, and small labels Nov 25, 2025

squah-confluent commented Nov 25, 2025

sjhajharia added the ci-approved label Nov 25, 2025

dajac self-requested a review November 25, 2025 19:07

github-actions bot removed the triage label Nov 26, 2025

dajac approved these changes Nov 26, 2025

dajac merged commit 889c3d4 into apache:trunk Nov 26, 2025
28 checks passed

dajac deleted the squah-clamp-coordinator-histogram-negative-values branch November 26, 2025 14:25
