KAFKA-5203: Metrics: fix resetting of histogram sample #3002

iv-m · 2017-05-09T12:27:43Z

Without the histogram cleanup, percentiles are calculated
incorrectly after one or more samples are purged: the event counts
go out of sync with the counts in the histogram buckets, and a bucket
with a lower value gets chosen for the given quantile.

This change adds the necessary histogram cleanup.

iv-m · 2017-05-09T12:38:46Z

@jkreps could you please review this?

ijuma · 2017-05-09T12:52:28Z

Thanks for the PR. Fix is trivial and seems to make sense, cc @junrao in case there is some reason for the current behaviour.

asfbot · 2017-05-09T13:46:20Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3672/
Test PASSed (JDK 8 and Scala 2.12).

asfbot · 2017-05-09T13:51:00Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/3676/
Test PASSed (JDK 7 and Scala 2.10).

asfbot · 2017-05-09T13:59:19Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/3682/
Test FAILed (JDK 8 and Scala 2.11).

iv-m · 2017-05-09T14:38:40Z

retest this please

asfbot · 2017-05-09T15:45:18Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3675/
Test PASSed (JDK 8 and Scala 2.12).

asfbot · 2017-05-09T15:50:45Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/3679/
Test PASSed (JDK 7 and Scala 2.10).

asfbot · 2017-05-09T15:50:48Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/3685/
Test PASSed (JDK 8 and Scala 2.11).

junrao

@iv-m : Thanks for the patch. Looks good. Just a comment on the test.

junrao · 2017-05-12T00:20:32Z

clients/src/test/java/org/apache/kafka/common/metrics/MetricsTest.java

@@ -424,6 +424,14 @@ public void testPercentiles() {
        assertEquals(0.0, p25.value(), 1.0);
        assertEquals(0.0, p50.value(), 1.0);
        assertEquals(0.0, p75.value(), 1.0);
+
+        // record two more windows worth of sequential values


Hmm, it seems that we need to advance the mocked time to force the rolling of the old window?

This test sets eventWindow to 50 (line 406 above), so we don't need to adjust the time -- it's enough to record 50 events to roll one sample, and 100 events to roll both of them. That's how the test works in trunk now, by the way -- I'm just adding one more "full roll" that reproduces the issue and confirms the fix.
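To illustrate the event-count rolling described above, here is a minimal sketch. It is a simplified model, not the actual Kafka implementation: the class `EventWindowSketch`, its fields, and the rule "a sample is complete once it holds `eventWindow` events" are assumptions made for illustration, and the time-based window is ignored entirely.

```java
// Hypothetical, simplified model of event-count based sample rolling.
// It is NOT the Kafka implementation; names and structure are assumptions.
final class EventWindowSketch {
    private final int eventWindow;     // events a sample may hold before it is complete
    private final long[] eventCounts;  // events currently held by each sample slot
    private int current = 0;           // index of the sample being filled
    private int resets = 0;            // how many times a used sample was recycled

    EventWindowSketch(int eventWindow, int samples) {
        this.eventWindow = eventWindow;
        this.eventCounts = new long[samples];
    }

    // Record one event; no clock is involved, only the event count.
    void record() {
        if (eventCounts[current] >= eventWindow) {
            // The current sample is full: move on to the next slot, recycling it if needed.
            current = (current + 1) % eventCounts.length;
            if (eventCounts[current] > 0) {
                resets++;
            }
            eventCounts[current] = 0;
        }
        eventCounts[current]++;
    }

    // Number of recycled samples after recording the given number of events
    // with eventWindow = 50 and two samples, as in the test discussed here.
    public static int resetsAfter(int events) {
        EventWindowSketch stat = new EventWindowSketch(50, 2);
        for (int i = 0; i < events; i++) {
            stat.record();
        }
        return stat.resets;
    }

    public static void main(String[] args) {
        System.out.println("resets after 100 events: " + resetsAfter(100));
        System.out.println("resets after 150 events: " + resetsAfter(150));
        System.out.println("resets after 200 events: " + resetsAfter(200));
    }
}
```

In this model, once both samples are full (100 events), 50 further events recycle one sample and 100 further events recycle both, without the mocked clock ever advancing.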

@iv-m : Thanks for the explanation. Do you know why the existing test didn't trigger this issue when we added 2 more windows of 0?

> Do you know why the existing test didn't trigger this issue when we added 2 more windows of 0?

When the histogram is not reset, an incorrect histogram bucket is selected for a given quantile: simply put, as the samples are purged, instead of p50 you get an (approximate) value for p25, then p12.5, and so on. But when two windows of zeros are recorded, this does not matter, since all percentiles have the same value: zero.

The test with zeroes looks useful though, as it clearly shows that the percentiles depend on the recent data only.
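The effect described above can be reproduced with a small stand-alone model. This is only a sketch under stated assumptions: `HistogramResetSketch`, its single `Sample` with 100 unit-wide bins, and the `clearHistogram` flag are hypothetical names chosen for illustration and do not mirror the real Kafka classes. The quantile lookup walks the bins until the cumulative bucket count divided by the event count exceeds the quantile.

```java
import java.util.Arrays;

// Hypothetical model of a histogram-backed sample; not the Kafka classes.
final class HistogramResetSketch {
    static final int BINS = 100; // one bin per integer value 0..99 (an assumption)

    static final class Sample {
        long eventCount = 0;
        final long[] histogram = new long[BINS];

        void record(int value) {
            histogram[value]++;
            eventCount++;
        }

        // Purge the sample. With clearHistogram == false only the event count
        // is reset, which models the behaviour before the fix.
        void reset(boolean clearHistogram) {
            eventCount = 0;
            if (clearHistogram) {
                Arrays.fill(histogram, 0L);
            }
        }

        // First bin whose cumulative share of the event count exceeds q.
        int quantileBin(double q) {
            long sum = 0;
            for (int b = 0; b < BINS; b++) {
                sum += histogram[b];
                if ((double) sum / eventCount > q) {
                    return b;
                }
            }
            return BINS - 1;
        }
    }

    // Record 0..99, purge the sample, record 0..99 again, then ask for p50.
    public static int medianAfterReuse(boolean clearHistogram) {
        Sample sample = new Sample();
        for (int v = 0; v < BINS; v++) sample.record(v);
        sample.reset(clearHistogram);
        for (int v = 0; v < BINS; v++) sample.record(v);
        return sample.quantileBin(0.5);
    }

    public static void main(String[] args) {
        System.out.println("p50 bin, stale histogram:   " + medianAfterReuse(false)); // 25
        System.out.println("p50 bin, cleared histogram: " + medianAfterReuse(true));  // 50
    }
}
```

With the stale histogram, every bin holds two counts while the event count is only 100, so the cumulative share crosses 0.5 at bin 25 rather than bin 50. Each further purge without cleanup halves the reported quantile again, and recording only zeros hides the problem because every quantile falls in bin 0 either way.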

junrao · 2017-05-15T17:48:41Z

@iv-m : Thanks for the explanation. LGTM.

iv-m · 2017-05-15T18:08:08Z

@junrao, thank you!

junrao reviewed May 12, 2017

asfgit closed this in a511a47 May 15, 2017

SlevinBE mentioned this pull request Jan 25, 2018

upgrade to Kafka 0.11.0.1 to resolve percentile metrics issue (KAFKA-5203) linkedin/kafka-monitor#111

Closed
