-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Blocklist to disable specific metric tags or metric names #29881
Conversation
@hussein-awala @potiuk may I ask for your review on this PR? Thank you both very much for your input! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is very confusing as defined now. I think we should change "allow_list_validator" field into "list_validator" and have two implementations:
- AllowListValidator
- BlockListValidator
And let the user choose which validator to use (by name for example). The way where you invert meaning of a list is super confusing. IMHO
I see what you are saying @potiuk . I've made your suggested changes in stats.py. I've also introduced a separate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much more straightforward now :). I'd love another committer to review it too.
# If you want to avoid sending all the available metrics to StatsD, | ||
# you can configure a block list of prefixes (comma separated) to filter out metrics that | ||
# start with the elements of the list (e.g: "scheduler,executor,dagrun"). | ||
# If statsd_allow_list and statsd_block_list are both configured, statsd_block_list is ignored |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small NIT. Probably we should have a warning in the logs if both are configured.
Static checks need fixing. |
stat += f",{k}={v}" | ||
else: | ||
log.error("Dropping invalid tag: %s=%s.", k, v) | ||
if self.metric_tags_validator.test(k): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to check if the tag is filtered here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't we only concatenating metric tags that are allowed? (i.e. not disabled / not blocked)
If it's test(k) returns True, we will want to publish that metric tag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, I thought that this filter:
tags_list = [
f"{key}:{value}" for key, value in tags.items() if self.metric_tags_validator.test(key)
]
is used for the two clients datadog and influxdb, but just found that is just used for datadog, and if self.metric_tags_validator.test(k):
is used for influxdb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - influxdb integration just adds the tag key:value pairs to the metric name with a comma-delimited standard, instead of having a separate Statsd client implementation that takes tags as a separate parameter.
Fixed the checks and added the log based on your feedback! @potiuk |
closes: #29663
This PR adds two additional configuration parameters:
statsd_disabled_tags allows users to define a blocklist of tag name prefixes to disable.
statsd_invert_allow_list is a boolean parameter that converts the functionality of statsd_invert_allow_list from an allowlist to a blocklist.
The reason we need the metric name blocklist on top of a blocklist just for metric tags is because there are legacy metric names that have high cardinality under the definition explored in #29663 like:
local_task_job.task_exit.<job_id>.<dag_id>.<task_id>.<return_code>
that users might want to disable to reduce their metric storage costs.