Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-5203: Metrics: fix resetting of histogram sample #3002

Closed
wants to merge 1 commit into from

Conversation

iv-m
Copy link

@iv-m iv-m commented May 9, 2017

Without the histogram cleanup, the percentiles are calculated
incorrectly after purging of one or more samples: event counts
go out of sync with counts in histogram buckets, and bucket
with lower value gets chosen for the given quantile.

This change adds the necessary histogram cleanup.

Without the histogram cleanup, the percentiles are calculated
incorrectly after purging of one or more samples: event counts
go out of sync with counts in histogram buckets, and bucket
with lower value gets chosen for the given quantile.

This change adds the necessary histogram cleanup.
@iv-m
Copy link
Author

iv-m commented May 9, 2017

@jkreps could you please review this?

@ijuma
Copy link
Contributor

ijuma commented May 9, 2017

Thanks for the PR. Fix is trivial and seems to make sense, cc @junrao in case there is some reason for the current behaviour.

@asfbot
Copy link

asfbot commented May 9, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3672/
Test PASSed (JDK 8 and Scala 2.12).

@asfbot
Copy link

asfbot commented May 9, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/3676/
Test PASSed (JDK 7 and Scala 2.10).

@asfbot
Copy link

asfbot commented May 9, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/3682/
Test FAILed (JDK 8 and Scala 2.11).

@iv-m
Copy link
Author

iv-m commented May 9, 2017

retest this please

@asfbot
Copy link

asfbot commented May 9, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3675/
Test PASSed (JDK 8 and Scala 2.12).

@asfbot
Copy link

asfbot commented May 9, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/3679/
Test PASSed (JDK 7 and Scala 2.10).

@asfbot
Copy link

asfbot commented May 9, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/3685/
Test PASSed (JDK 8 and Scala 2.11).

Copy link
Contributor

@junrao junrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iv-m : Thanks for the patch. Looks good. Just a comment on the test.

@@ -424,6 +424,14 @@ public void testPercentiles() {
assertEquals(0.0, p25.value(), 1.0);
assertEquals(0.0, p50.value(), 1.0);
assertEquals(0.0, p75.value(), 1.0);

// record two more windows worth of sequential values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it seems that we need to advance the mocked time to force the rolling of the old window?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test sets eventWindow to 50 (line 406 above), so we don't need to adjust the time -- it's enough to record 50 events to roll one sample, and 100 events to roll all the two of them. That's how the test works in trunk now btw -- I'm just adding one more "full roll" that reproduces the issue/confirms the fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iv-m : Thanks for the explanation. Do you know why the existing test didn't trigger this issue when we added 2 more windows of 0?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know why the existing test didn't trigger this issue when we added 2 more windows of 0?

When histogram is not reset, incorrect histogram bucket is selected for a given quantile: simply put, as the samples are purged, instead of p50 you get (approximate) value for p25, then p12.5 and so on. But when two windows of zeros are recorded, it does not matter, since all percentiles have the same value: zero.

The test with zeroes looks useful though, as it clearly shows that the percentiles depend on the recent data only.

@junrao
Copy link
Contributor

junrao commented May 15, 2017

@iv-m : Thanks for the explanation. LGTM.

@asfgit asfgit closed this in a511a47 May 15, 2017
@iv-m
Copy link
Author

iv-m commented May 15, 2017

@junrao, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants