KAFKA-5565. Add a broker metric specifying the number of consumer gro… #3506

Closed
wants to merge 3 commits into from

cmccabe
Contributor

@cmccabe cmccabe commented Jul 7, 2017

…up rebalances in progress

@asfgit

asfgit commented Jul 7, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/5990/
Test FAILed (JDK 7 and Scala 2.11).

@asfgit

asfgit commented Jul 7, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/5975/
Test FAILed (JDK 8 and Scala 2.12).

@asfgit

asfgit commented Jul 8, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/5995/
Test PASSed (JDK 7 and Scala 2.11).

@asfgit

asfgit commented Jul 8, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/5980/
Test PASSed (JDK 8 and Scala 2.12).

Contributor

@ijuma ijuma left a comment

Thanks for the PR. We need a KIP because metrics are a public API. Left a couple of minor comments as well.

def value(): Int = {
  groupMetadataCache.values.map(group => {
    group synchronized { group.currentState == AwaitingSync }
  }).count(_ == true)
Contributor

How about:

groupMetadataCache.values.count { group =>
  group synchronized { group.currentState == AwaitingSync }
}
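The two shapes in this suggestion can be compared in a small, self-contained sketch. `Group`, `GroupState`, and the state objects below are hypothetical stand-ins for Kafka's `GroupMetadata` and group states, not the real classes; only the counting logic mirrors the reviewed code.

```scala
// Hypothetical stand-ins for Kafka's group states and GroupMetadata; the real
// classes carry much more state and are not reproduced here.
sealed trait GroupState
case object PreparingRebalance extends GroupState
case object AwaitingSync extends GroupState
case object Stable extends GroupState

final class Group(initial: GroupState) {
  // Read and written under the group's own monitor, as in the reviewed code.
  var currentState: GroupState = initial
}

object AwaitingSyncCount {
  // Original shape: map every group to a Boolean, then count the `true` values.
  def viaMap(groups: Iterable[Group]): Int =
    groups.map { group =>
      group.synchronized { group.currentState == AwaitingSync }
    }.count(_ == true)

  // Suggested shape: count with the predicate directly, with no intermediate collection.
  def viaCount(groups: Iterable[Group]): Int =
    groups.count { group =>
      group.synchronized { group.currentState == AwaitingSync }
    }
}
```

Besides avoiding the intermediate collection, `count` is also safer in general: if the input happened to be a `Set`, `map` would collapse the Booleans into at most two elements and undercount, whereas `count` never builds that collection.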

@@ -82,19 +82,38 @@ class GroupMetadataManager(brokerId: Int,

this.logIdent = "[Group Metadata Manager on Broker " + brokerId + "]: "

-newGauge("NumOffsets",
+private def recreateGauge[T](name: String, gauge: Gauge[T]): Gauge[T] = {
Contributor

Do we really need this? Can we not do the cleanup in the test? If we think this is important, we should probably do it for all the gauges in KafkaMetricsGroup, but I am not sure we need it.

Contributor Author

The issue is basically:

  1. GroupMetadataManager1 is created. It calls newGauge("NumOffsets"), which registers a metric with the global registry and returns it.
  2. GroupMetadataManager2 is created. It calls newGauge("NumOffsets"). However, this function looks at the global registry and finds that there is already a metric there with that name, so it returns the gauge created in step #1.
  3. The step #1 gauge is bound to the GroupMetadataManager object from step #1, not the one from step #2. So the tests fail because their modifications to GroupMetadataManager have no effect on the metrics.

The best way to solve this would probably be to explicitly pass the metric registry to the constructor of GroupMetadataManager, rather than relying on magical global variables. We could perhaps have it default to the global registry if no value was passed.
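The failure mode in the steps above, and the explicit-registry fix, can be reproduced with a toy model. `MiniRegistry`, its `global` instance, and `Manager` are hypothetical: they are not the Yammer `MetricsRegistry` API that `KafkaMetricsGroup` uses, and they model only the behavior described here, namely that registering an existing name returns the gauge already registered.

```scala
import scala.collection.mutable

// Hypothetical name-keyed registry (NOT the Yammer API): registering a name that
// already exists returns the existing gauge, as described in the steps above.
final class MiniRegistry {
  private val gauges = mutable.Map.empty[String, () => Int]
  def newGauge(name: String, gauge: () => Int): () => Int =
    synchronized { gauges.getOrElseUpdate(name, gauge) }
  def read(name: String): Int = synchronized { gauges(name)() }
}

object MiniRegistry {
  // Stands in for the process-wide default registry.
  val global: MiniRegistry = new MiniRegistry
}

// Hypothetical manager: the registry is a constructor parameter that defaults
// to the global one, as proposed above.
final class Manager(registry: MiniRegistry = MiniRegistry.global) {
  @volatile var numOffsets: Int = 0
  registry.newGauge("NumOffsets", () => numOffsets)
}
```

With the shared default, a second `Manager` silently reuses the first one's gauge; giving each instance under test its own registry makes the gauge reflect that instance.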

@asfgit

asfgit commented Jul 20, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/6221/
Test PASSed (JDK 7 and Scala 2.11).

@asfgit

asfgit commented Jul 20, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/6205/
Test PASSed (JDK 8 and Scala 2.12).

@ijuma
Contributor

ijuma commented Sep 29, 2017

@hachikuji, maybe you can review this simple PR.

@hachikuji

Discussed with @cmccabe offline. Seems like we may have lost some of the updates during rebase since this seems to reflect an old version of the KIP.

@cmccabe
Contributor Author

cmccabe commented Oct 4, 2017

Sorry about the mixup. This should be the latest version, which reflects all KIP discussion.

@hachikuji hachikuji left a comment

Thanks for the patch. Just a couple comments.

val numGroupsPreparingRebalanceGauge = recreateGauge("NumGroupsPreparingRebalance",
  new Gauge[Int] {
    def value(): Int = groupMetadataCache.values.count(group => {
      group synchronized { group.currentState == PreparingRebalance }

You can replace this with group.is(PreparingRebalance). Similarly for the others.
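Routing every per-state gauge through a single state check could look roughly like this. `Group`, its `is` and `transitionTo` methods, and `numGroupsIn` are hypothetical; in particular, this sketch takes the group's lock inside `is`, which is an assumption rather than something the review states about the real `GroupMetadata.is`.

```scala
sealed trait GroupState
case object PreparingRebalance extends GroupState
case object AwaitingSync extends GroupState
case object Stable extends GroupState

// Hypothetical stand-in for GroupMetadata. Assumption: `is` compares the current
// state while holding the group's monitor.
final class Group(initial: GroupState) {
  private var state: GroupState = initial
  def transitionTo(next: GroupState): Unit = synchronized { state = next }
  def is(s: GroupState): Boolean = synchronized { state == s }
}

object GroupGauges {
  // One counting function shared by all per-state gauges
  // (PreparingRebalance, AwaitingSync, Stable, ...).
  def numGroupsIn(groups: Iterable[Group], s: GroupState): Int =
    groups.count(_.is(s))
}
```

Each gauge then reduces to a one-liner such as `numGroupsIn(groups, PreparingRebalance)`, so the locking and comparison live in one place.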


-newGauge("NumGroups",
+val numGroupsGauge = recreateGauge("NumGroups",

I'm not sure these need to be fields. It's a little more work, but we can pull the metric instances out of the metric registry in the test case.

Also, should we update the test case to cover all the metrics?

Contributor

Yeah, other tests usually just retrieve the data from the metrics registry as you said.

Contributor Author

OK...
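Pulling gauges out of the registry by name, instead of exposing them as fields, might look like the sketch below. `GaugeRegistry`, `Manager`, and `gaugeValue` are hypothetical; a real Kafka test would query the registry that `KafkaMetricsGroup` registers into, whose lookup API is not modeled here.

```scala
import scala.collection.mutable

// Hypothetical registry keyed by metric name; not the Yammer API.
final class GaugeRegistry {
  private val gauges = mutable.Map.empty[String, () => Int]
  def register(name: String, gauge: () => Int): Unit = synchronized { gauges(name) = gauge }
  def gauge(name: String): Option[() => Int] = synchronized { gauges.get(name) }
}

// Hypothetical manager: gauges are registered, but not kept as public fields.
final class Manager(registry: GaugeRegistry) {
  private val groups = mutable.Map.empty[String, String] // group id -> state name
  def put(groupId: String, state: String): Unit = synchronized { groups(groupId) = state }
  private def countIn(state: String): Int = synchronized { groups.values.count(_ == state) }

  registry.register("NumGroups", () => synchronized { groups.size })
  registry.register("NumGroupsPreparingRebalance", () => countIn("PreparingRebalance"))
}

object MetricTestSupport {
  // What a test does instead of reading a field: look the gauge up by name and read it.
  def gaugeValue(registry: GaugeRegistry, name: String): Int =
    registry.gauge(name).map(_.apply()).getOrElse(sys.error(s"metric $name not registered"))
}
```

Looking metrics up by name also makes it cheap to cover every metric in one test, since the test only needs the list of names.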

import org.apache.kafka.common.internals.Topic

import scala.collection.JavaConverters._
import scala.collection._

-class GroupMetadataManagerTest {
+class GroupMetadataManagerTest extends Logging {
Contributor

Seems like this is not needed.

@cmccabe
Contributor Author

cmccabe commented Oct 6, 2017

retest this please

@cmccabe
Contributor Author

cmccabe commented Oct 6, 2017

Looks like Jenkins is having some issues:

Caused: java.io.IOException: remote file operation failed: /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk7-scala2.11 at hudson.remoting.Channel@1117e8b9:ubuntu-1
   at hudson.FilePath.act(FilePath.java:994)
   at hudson.FilePath.act(FilePath.java:976)
   at hudson.tasks.junit.JUnitParser.parseResult(JUnitParser.java:103)
   at hudson.tasks.junit.JUnitResultArchiver.parse(JUnitResultArchiver.java:128)
   at hudson.tasks.junit.JUnitResultArchiver.perform(JUnitResultArchiver.java:149)
   at hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:81)
   at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
   at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:736)
   at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:682)
   at hudson.model.Build$BuildExecution.post2(Build.java:186)
   at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:627)
   at hudson.model.Run.execute(Run.java:1762)
   at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
   at hudson.model.ResourceController.execute(ResourceController.java:97)
   at hudson.model.Executor.run(Executor.java:419)
Adding one-line test results to commit status...
Setting status of 882c039b4ce2e5ff4c57ad96f3d136b9fa716eeb to FAILURE with url https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/8731/ and message: 'FAILURE

@cmccabe
Contributor Author

cmccabe commented Oct 6, 2017

Retest this please

@hachikuji

@cmccabe The one test failure seems legitimate.

@guozhangwang
Contributor

retest this please

@@ -82,19 +82,57 @@ class GroupMetadataManager(brokerId: Int,

this.logIdent = s"[GroupMetadataManager brokerId=$brokerId] "

-newGauge("NumOffsets",
+private def recreateGauge[T](name: String, gauge: Gauge[T]): Gauge[T] = {
Contributor

Not sure why we want to remove and then recreate here. Isn't this a one-time call over the lifetime of the object?
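The remove-then-recreate pattern being questioned can be sketched against a hypothetical name-keyed registry that, like the behavior cmccabe described earlier in the thread, keeps the first gauge registered under a name. `MiniRegistry` and `recreateGauge` below are illustrative only, not Kafka's or Yammer's code. The sketch shows why the call matters even though it runs once per instance: when several managers are built in one JVM, as in the tests, a plain registration keeps the first instance's gauge.

```scala
import scala.collection.mutable

// Hypothetical name-keyed registry (not the Yammer API): newGauge keeps the
// first registration under a given name.
final class MiniRegistry {
  private val gauges = mutable.Map.empty[String, () => Int]
  def newGauge(name: String, g: () => Int): () => Int =
    synchronized { gauges.getOrElseUpdate(name, g) }
  def remove(name: String): Unit = synchronized { gauges -= name }
  def read(name: String): Int = synchronized { gauges(name)() }
}

object Gauges {
  // Remove-then-recreate: drop any gauge left behind by an earlier instance so
  // that the new registration is the one that takes effect.
  def recreateGauge(registry: MiniRegistry, name: String, g: () => Int): () => Int = {
    registry.remove(name)
    registry.newGauge(name, g)
  }
}
```

The trade-off is that in a long-lived process the most recently constructed instance silently takes over the metric name, which is one reason passing the registry in explicitly was floated as the cleaner fix.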

@guozhangwang
Contributor

LGTM. Merged to trunk.

@asfgit asfgit closed this in 6d6080f Oct 9, 2017
jeqo pushed a commit to jeqo/kafka that referenced this pull request Nov 16, 2017
…up rebalances in progress

Author: Colin P. Mccabe <cmccabe@confluent.io>

Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>

Closes apache#3506 from cmccabe/KAFKA-5565
@cmccabe cmccabe deleted the KAFKA-5565 branch May 20, 2019 18:45
Labels
None yet
Projects
None yet
5 participants