Blocklist to disable specific metric tags or metric names #29881

sungwy · 2023-03-02T16:56:12Z

closes: #29663

This PR adds two additional configuration parameters:

statsd_disabled_tags:
      description: |
        If you want to avoid sending all the available metrics tags to StatsD,
        you can configure a blocklist of prefixes (comma separated) to filter out metric tags 
        that start with the elements of the list (e.g: "job_id,run_id")
      version_added: 2.6.0
      type: string
      example: job_id,run_id,dag_id,task_id
      default: job_id,run_id

statsd_block_list:
      description: |
        If you want to avoid sending all the available metrics to StatsD,
        you can configure a block list of prefixes (comma separated) to filter out metrics that
        start with the elements of the list (e.g: "scheduler,executor,dagrun").
        If statsd_allow_list and statsd_block_list are both configured, statsd_block_list is ignored
      version_added: 2.6.0
      type: string
      example: ~
      default: ""

statsd_disabled_tags allows users to define a blocklist of tag name prefixes to disable.
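A minimal sketch of what prefix-based tag filtering could look like, assuming tags arrive as a dict of tag name to value and the option is the raw comma-separated string. `parse_prefixes` and `filter_tags` are hypothetical helpers for illustration, not Airflow functions.

```python
from __future__ import annotations


def parse_prefixes(raw: str) -> tuple[str, ...]:
    """Split a comma-separated option such as "job_id,run_id" into a tuple of prefixes."""
    # Skip blank entries so "" or "job_id," never yields an empty prefix,
    # which would match (and therefore drop) every tag.
    return tuple(item.strip() for item in raw.split(",") if item.strip())


def filter_tags(tags: dict[str, str], disabled_tags: str) -> dict[str, str]:
    """Drop every tag whose name starts with one of the disabled prefixes."""
    prefixes = parse_prefixes(disabled_tags)
    # str.startswith accepts a tuple; an empty tuple matches nothing, so nothing is dropped.
    return {name: value for name, value in tags.items() if not name.startswith(prefixes)}


if __name__ == "__main__":
    tags = {"job_id": "42", "run_id": "manual__2023", "dag_id": "etl", "task_id": "load"}
    print(filter_tags(tags, "job_id,run_id"))  # prints {'dag_id': 'etl', 'task_id': 'load'}
```

With the proposed default of `job_id,run_id`, the per-run identifiers are removed while lower-cardinality tags such as the DAG and task IDs are still sent.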

statsd_block_list allows users to define a blocklist of metric name prefixes to disable. It replaces statsd_invert_allow_list from an earlier revision of this PR, a boolean that turned statsd_allow_list from an allowlist into a blocklist.

We need the metric name blocklist in addition to the tag blocklist because some legacy metric names have high cardinality under the definition explored in #29663, such as local_task_job.task_exit.<job_id>.<dag_id>.<task_id>.<return_code>. These names embed the identifiers in the name itself rather than in tags, so users might want to disable them to reduce their metric storage costs.
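Because such names carry the identifiers in the name itself, a tag blocklist cannot reach them; only a check on the metric name can. A minimal sketch, assuming the block list is the raw comma-separated option value. `is_blocked` is a hypothetical helper, and the concrete name below fills the placeholders of the example with made-up values.

```python
def is_blocked(metric_name: str, block_list: str) -> bool:
    """Return True if metric_name starts with any prefix in the comma-separated block_list."""
    prefixes = tuple(p.strip() for p in block_list.split(",") if p.strip())
    # An empty block list gives an empty tuple, and startswith(()) is always False.
    return metric_name.startswith(prefixes)


if __name__ == "__main__":
    block_list = "local_task_job"
    # Made-up instance of local_task_job.task_exit.<job_id>.<dag_id>.<task_id>.<return_code>
    legacy = "local_task_job.task_exit.1234.example_dag.extract.0"
    print(is_blocked(legacy, block_list))                 # True: never sent to StatsD
    print(is_blocked("scheduler_heartbeat", block_list))  # False: still published
```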

sungwy · 2023-03-02T16:57:14Z

@hussein-awala @potiuk may I ask for your review on this PR? Thank you both very much for your input!

potiuk

I think it is very confusing as defined now. I think we should change "allow_list_validator" field into "list_validator" and have two implementations:

AllowListValidator
BlockListValidator

And let the user choose which validator to use (by name, for example). The way you invert the meaning of a list is super confusing, IMHO.
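One way to read this proposal: a single `list_validator` slot holds either an `AllowListValidator` or a `BlockListValidator`, and both expose the same `test()` method so callers never need to know which kind they hold. The two class names and `test` come from this thread; the `ListValidator` base class, the constructor signature, and the `make_validator` factory are assumptions for illustration, not the merged Airflow code.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class ListValidator(ABC):
    """Shared interface: test(name) returns True when a metric should be published."""

    def __init__(self, raw_list: str | None = None) -> None:
        # Parse a comma-separated option into prefixes; None or "" yields an empty tuple.
        self.prefixes = tuple(p.strip() for p in (raw_list or "").split(",") if p.strip())

    @abstractmethod
    def test(self, name: str) -> bool:
        """Return True if `name` should be published."""


class AllowListValidator(ListValidator):
    def test(self, name: str) -> bool:
        # Assumption: an empty allow list lets everything through.
        return not self.prefixes or name.startswith(self.prefixes)


class BlockListValidator(ListValidator):
    def test(self, name: str) -> bool:
        # Publish unless the name starts with a blocked prefix; an empty list blocks nothing.
        return not name.startswith(self.prefixes)


def make_validator(kind: str, raw_list: str | None) -> ListValidator:
    """Hypothetical factory that picks a validator by name ("allow" or "block")."""
    validators: dict[str, type[ListValidator]] = {
        "allow": AllowListValidator,
        "block": BlockListValidator,
    }
    return validators[kind](raw_list)
```

For example, `make_validator("block", "scheduler,executor").test("scheduler.tasks.running")` returns False. With this shape, choosing block semantics is an explicit selection of a different validator rather than a flag that silently reverses the meaning of an existing list.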

sungwy · 2023-03-05T00:50:56Z

I see what you are saying, @potiuk. I've made your suggested changes in stats.py. I've also replaced statsd_invert_allow_list with a separate statsd_block_list config parameter that expects a comma-delimited string, and added a note to the config description stating that if both statsd_allow_list and statsd_block_list are set, statsd_block_list is ignored in favor of statsd_allow_list.

potiuk

Much more straightforward now :). I'd love another committer to review it too.

potiuk · 2023-03-05T10:35:13Z

airflow/config_templates/default_airflow.cfg

+# If you want to avoid sending all the available metrics to StatsD,
+# you can configure a block list of prefixes (comma separated) to filter out metrics that
+# start with the elements of the list (e.g: "scheduler,executor,dagrun").
+# If statsd_allow_list and statsd_block_list are both configured, statsd_block_list is ignored


One small nit: we should probably log a warning if both are configured.
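A minimal sketch of the precedence rule from the config description combined with the suggested warning, assuming both options are read as raw strings. `select_metric_filter` is a hypothetical helper that returns a predicate, and the log message text is illustrative, not Airflow's actual message.

```python
from __future__ import annotations

import logging
from typing import Callable

log = logging.getLogger(__name__)


def _prefixes(raw: str | None) -> tuple[str, ...]:
    # Comma-separated option -> tuple of non-empty, whitespace-trimmed prefixes.
    return tuple(p.strip() for p in (raw or "").split(",") if p.strip())


def select_metric_filter(allow_list: str | None, block_list: str | None) -> Callable[[str], bool]:
    """Return a predicate that is True for metric names that should be published."""
    allowed, blocked = _prefixes(allow_list), _prefixes(block_list)
    if allowed and blocked:
        # Both configured: the allow list wins, so tell the user the block list is ignored.
        log.warning("Both statsd_allow_list and statsd_block_list are set; ignoring statsd_block_list.")
    if allowed:
        return lambda name: name.startswith(allowed)
    # No allow list: fall back to the block list (an empty block list blocks nothing).
    return lambda name: not name.startswith(blocked)
```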

potiuk · 2023-03-05T11:07:12Z

Static checks need fixing.

hussein-awala · 2023-03-05T23:02:38Z

airflow/stats.py

-                        stat += f",{k}={v}"
-                    else:
-                        log.error("Dropping invalid tag: %s=%s.", k, v)
+                    if self.metric_tags_validator.test(k):


Why do we need to check if the tag is filtered here?

@hussein-awala

Aren't we only concatenating metric tags that are allowed? (i.e. not disabled / not blocked)

If test(k) returns True, we want to publish that metric tag.

My bad, I thought that this filter:

tags_list = [ f"{key}:{value}" for key, value in tags.items() if self.metric_tags_validator.test(key) ]

was used for both clients, Datadog and InfluxDB, but I just found that it is only used for Datadog, while if self.metric_tags_validator.test(k): is used for InfluxDB.

Yes: the InfluxDB integration appends the tags to the metric name as comma-delimited key=value pairs, instead of using a separate StatsD client implementation that takes tags as a separate parameter.
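A sketch of the two code paths discussed above, assuming a `test(tag_name)` predicate like the one this PR's validator exposes. The InfluxDB-style path folds allowed tags into the metric name as `,key=value` (mirroring `stat += f",{k}={v}"` in the diff), while the Datadog-style path builds a separate `key:value` list (mirroring the `tags_list` comprehension). The function names are hypothetical, and the sketch omits the invalid-tag check shown in the diff.

```python
from __future__ import annotations

from typing import Callable


def influx_style_name(stat: str, tags: dict[str, str], test: Callable[[str], bool]) -> str:
    """InfluxDB style: append each allowed tag to the metric name as ",key=value"."""
    for k, v in tags.items():
        if test(k):
            stat += f",{k}={v}"
    return stat


def datadog_style_tags(tags: dict[str, str], test: Callable[[str], bool]) -> list[str]:
    """Datadog style: keep allowed tags as a separate "key:value" list for the client."""
    return [f"{key}:{value}" for key, value in tags.items() if test(key)]


if __name__ == "__main__":
    tags = {"dag_id": "etl", "run_id": "manual__1"}

    def allowed(name: str) -> bool:
        # Block tags whose names start with job_id or run_id.
        return not name.startswith(("job_id", "run_id"))

    print(influx_style_name("dagrun.duration", tags, allowed))  # dagrun.duration,dag_id=etl
    print(datadog_style_tags(tags, allowed))                    # ['dag_id:etl']
```

Both paths apply the same predicate, which is why the tag check appears in two places in stats.py.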

sungwy · 2023-03-06T01:25:00Z

Fixed the checks and added the warning log based on your feedback, @potiuk!

sungwy force-pushed the disable-tags branch from 29f2471 to e3e196e March 2, 2023 16:56

sungwy added 2 commits March 3, 2023 09:17

option to disable metric tags

4e7d1c8

option to disable metric tags

00de48c

sungwy force-pushed the disable-tags branch from e3e196e to 00de48c March 3, 2023 14:17

doc spelling

b3e3d88

potiuk added this to the Airflow 2.6.0 milestone Mar 3, 2023

potiuk requested changes Mar 4, 2023

refactor for readability

4009f5b

sungwy added 6 commits March 4, 2023 19:52

refactor for readability

f13ac32

option to disable metric tags

dff1629

option to disable metric tags

1226530

doc spelling

4841b63

refactor for readability

1d2cc94

refactor for readability

caf202d

potiuk force-pushed the disable-tags branch from f13ac32 to caf202d March 5, 2023 10:32

potiuk approved these changes Mar 5, 2023

potiuk reviewed Mar 5, 2023

sungwy added 3 commits March 5, 2023 16:36

logging

4bd1138

Merge branches 'disable-tags' and 'disable-tags' of https://github.co…

2784099

…m/syun64/airflow into disable-tags

docstring

5aa099a

hussein-awala reviewed Mar 5, 2023

potiuk approved these changes Mar 6, 2023

potiuk merged commit 86cd79f into apache:main Mar 6, 2023

sungwy deleted the disable-tags branch March 6, 2023 15:02

pierrejeambrun added the type:new-feature Changelog: New Features label Mar 22, 2023
