Skip to content

KAFKA-9939; Fix overcounting delayed fetches in request rate metrics#8586

Merged
hachikuji merged 5 commits into
apache:trunkfrom
hachikuji:KAFKA-9939
May 1, 2020
Merged

KAFKA-9939; Fix overcounting delayed fetches in request rate metrics#8586
hachikuji merged 5 commits into
apache:trunkfrom
hachikuji:KAFKA-9939

Conversation

@hachikuji
Copy link
Copy Markdown
Contributor

Fetches which hit purgatory are currently counted twice in fetch request rate metrics. This patch moves the metric update into fetchMessages so that they are only counted once.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@hachikuji
Copy link
Copy Markdown
Contributor Author

hachikuji commented Apr 29, 2020

The failure in the test case is a problem with the isolation of metrics between test cases. I will submit a fix shortly.

* This test also verifies counts of fetch requests recorded by the ReplicaManager
*/
@Test
def testReadFromLog(): Unit = {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to get rid of this. I believe it is already covered by test cases in ReplicaManagerTest. See for example, testFetchBeyondHighWatermarkReturnEmptyResponse.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That test name is a bit misleading. Should we name it testFetchBeyondHighwatermarkForConsumerAndFollower or something?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's fair. How about just testFetchBeyondHighwatermark?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine. Is it worth adding a brief comment stating what it's testing?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Borderline overkill I guess, but I added a few comments explaining the test behavior.

@edoardocomar
Copy link
Copy Markdown
Contributor

Hi @hachikuji as you're working on this ... do you mind taking a look at #4204 ?

the test we have added there still fails with this change of yours, without the workarounds we had suggested.
Thanks

@edoardocomar
Copy link
Copy Markdown
Contributor

cc @mimaison

Copy link
Copy Markdown
Member

@ijuma ijuma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, LGTM. A question and a minor comment below.

* This test also verifies counts of fetch requests recorded by the ReplicaManager
*/
@Test
def testReadFromLog(): Unit = {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That test name is a bit misleading. Should we name it testFetchBeyondHighwatermarkForConsumerAndFollower or something?

mockTimer.advanceClock(11)
assertNotNull(purgatoryFetchResult.get)
assertEquals(Errors.NONE, purgatoryFetchResult.get.error)
assertMetricCount(2)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would return 3 without the fix?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right.

@hachikuji
Copy link
Copy Markdown
Contributor Author

@edoardocomar Thanks for the comment. I agree it is related. Left a comment on #4204.

Copy link
Copy Markdown
Member

@ijuma ijuma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hachikuji
Copy link
Copy Markdown
Contributor Author

The 32 test failures on jdk11 were due to threads being left behind after a failure in TransactionsBounceTest. I will submit a separate fix for this. The other failure seems to be one of the recently flaky streams tests.

@hachikuji hachikuji merged commit 794648a into apache:trunk May 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants