Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature:Configurable summarySetBlackList to set monitoring blacklist. #2165

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

zichen-gan
Copy link

@zichen-gan zichen-gan commented May 16, 2024

Issue:

  • The current ZooKeeper (ZK) cluster frequently experiences excessively long Full Garbage Collection (FGC) events (averaging once a week) that last over 10 seconds, leading to occasional, unexpected leader elections in the ZK cluster.
  • The cluster currently fails to collect mntr monitoring data.

Root Cause:

  • The ZK cluster uses the direct child nodes under the root directory (/) as Namespaces, At the Namespace level, the following have been implemented:
  • Monitoring of MIN, CNT, AVG, MAX, and SUM request counts, which can be obtained via mntr monitoring.
  • monitoring for Namespaces only accounts for additions and not removals, tasks are created directly under the root directory (/), with tasks older than seven days being cleaned daily.
  • This leads to the following phenomenon: The number of Namespaces stored in ZK nodes (157,376) is significantly greater than the actual number of Namespaces in the ZK cluster (5,856).
image image

Solution:
Provide a new version of the ZK server that supports configurable Namespace monitoring.

/**
* Base implementation of {@link MetricsProvider}.<br>
* 提供默认的实现,当前包含summarySet黑名单功能。
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use English, instead of Chinese

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌

@maoling
Copy link
Member

maoling commented May 21, 2024

I cannot understand the intention of this PR clearly. From the description, I guess

  • it wants to solve the metric issues which is inaccurate or too expensive? which metrics(write_per_namespace,read_per_namespace)?
  • Add a blacklist can completely solve this problem?

@zichen-gan
Copy link
Author

it wants to solve metrics(write_per_namespace,read_per_namespace) which is too expensive.

  • / stores task IDs, which will be frequently added and deleted.metrics(write_per_namespace,read_per_namespace) only increase and do not decrease.
  • we do not require monitoring for write_per_namespace and read_per_namespace
    Hope to support write_per_namespace, read_per_namespace blacklist。
    @maoling

@zichen-gan
Copy link
Author

Perhaps it's a more universal and maintainable solution to remove the metric monitoring when the / path node is deleted. Regarding this PR, I will make some further modifications.

@zichen-gan zichen-gan marked this pull request as draft May 24, 2024 01:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants