Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-15129;[3/N] Remove metrics in AbstractFetcherManager when fetcher manager instance shutdown #13929

Closed
wants to merge 9 commits into from

Conversation

hudeqi
Copy link
Collaborator

@hudeqi hudeqi commented Jun 29, 2023

This pr is used to remove the metrics in AbstractFetcherManager when ReplicaFetcherManager/ReplicaAlterLogDirsManager instance shutdown.
This pr has passed the corresponding unit test, and it is part of KAFKA-15129.

@hudeqi hudeqi marked this pull request as draft June 29, 2023 11:04
@clolov clolov self-requested a review June 29, 2023 12:37
@clolov clolov added the core label Jun 29, 2023
@hudeqi hudeqi marked this pull request as ready for review June 29, 2023 16:46
@hudeqi hudeqi marked this pull request as draft June 30, 2023 04:28
@hudeqi hudeqi marked this pull request as ready for review June 30, 2023 08:16
Copy link
Collaborator

@clolov clolov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with the other two PRs I have a preference for variable names to start with a lowercase unless there is a reason (i.e. a convention) for them to be with uppercase, but beyond that, this seems reasonable to me.

@hudeqi
Copy link
Collaborator Author

hudeqi commented Jun 30, 2023

And please help to review this @divijvaidya

Copy link
Contributor

@divijvaidya divijvaidya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for these code changes @hudeqi. I have seen cases of OOM caused by metric leaks in production and the changes you are making go a long way towards ensure safe code.

@hudeqi hudeqi closed this Jul 5, 2023
@hudeqi hudeqi deleted the clean_metric_replicaManager branch July 5, 2023 03:28
@hudeqi hudeqi reopened this Jul 5, 2023
@hudeqi hudeqi requested a review from divijvaidya July 5, 2023 03:33
@hudeqi hudeqi requested a review from divijvaidya July 6, 2023 15:12
@hudeqi
Copy link
Collaborator Author

hudeqi commented Jul 10, 2023

Are there any hidden problems in this PR? @divijvaidya

@divijvaidya
Copy link
Contributor

I will get to all your PRs on metrics close by end of this week @hudeqi. Right now I am focusing on releasing 3.5.1.

@hudeqi
Copy link
Collaborator Author

hudeqi commented Jul 17, 2023

I will get to all your PRs on metrics close by end of this week @hudeqi. Right now I am focusing on releasing 3.5.1.

@divijvaidya hi, reminding~

@divijvaidya
Copy link
Contributor

I still need to get 3.5.1 out and focus on KIP-405 PRs (targeting 3.6 release) before I get some more time @hudeqi!

@hudeqi
Copy link
Collaborator Author

hudeqi commented Aug 4, 2023

I still need to get 3.5.1 out and focus on KIP-405 PRs (targeting 3.6 release) before I get some more time @hudeqi!

Hi, do you have time to review this PR now? I want to combine all the currently opened PRs about metric leaks into one PR. I don’t know if this will help or burden your review? @divijvaidya

@hudeqi
Copy link
Collaborator Author

hudeqi commented Sep 19, 2023

@nizhikov Hi, can you help review the PR of this series? Looks like @divijvaidya doesn't have time.

@nizhikov
Copy link
Contributor

Hello. I will review this PR in the nearest time.

@@ -31,6 +32,9 @@ abstract class AbstractFetcherManager[T <: AbstractFetcherThread](val name: Stri
extends Logging {
private val metricsGroup = new KafkaMetricsGroup(this.getClass)

// Visible for test
private[server] var metricNamesToTags: Map[String, java.util.Map[String, String]] = Map.empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use mutable map like it's done for fetcherThreadMap?
Or even Set<MetricName>, see comment below

}, tags)
private[server] def newMetrics(): Unit = {
val tags = Map("clientId" -> clientId).asJava
metricsGroup.newGauge(MaxLagMetricName, () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use

Set<MetricName> metrics = ...;

MetricName name = metricGroup.metricName(MaxLagMetricName, tags);
KafkaYammerMetrics.defaultRegistry().newGauge(name, () => { ... });
metrics.add(name);

And store explicit MetricName instead of implicit name and tags.

metrics.forEach(KafkaYammerMetrics.defaultRegistry()::removeMetric);
`` `

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have metrigGroup#meter and other methods that takes MetricName as parameter.
It's look convinient to create one for Gauge.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, do we really need metricNamesToTags set. It's just 4 metric so may be it will be more clear to create explicit MetricName for each and use them while closing fetcher manager.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @nizhikov , thank you for taking the time to review. The significance of metricNamesToTags is that it can be used in the unit test to check whether all the metrics of the corresponding module are removed when closed, to prevent someone from adding new metrics but forgetting remove, with this collection, you only need to add the new metric to this collection.
Removing the leaking metrics is a long and potentially fixed process, as can be seen from the original reviews reached with @divijvaidya.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems you can use MetricName to check metrics registered and removed during test.

@nizhikov
Copy link
Contributor

Hello. @hudeqi , thanks for the PR.
Left couple of comments.

@hudeqi hudeqi closed this Nov 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants