Conversation

@mik-laj (Member) commented Nov 7, 2020

While I was working on generating automatic reference documentation, I introduced the labels parameter so that text could be generated lazily. When I took a closer look at it, I realized that it can also be used to generate valid tags for statsd metrics.
Closes: #11463

The name of the metric is not perfect, as it contains placeholders, but it is good enough to be found in the documentation, and we avoid maintaining two different sets of keys.



@boring-cyborg boring-cyborg bot added area:dev-tools area:docs area:Scheduler including HA (high availability) scheduler labels Nov 7, 2020
@mik-laj mik-laj marked this pull request as ready for review November 7, 2020 12:13
@mik-laj mik-laj requested a review from potiuk November 7, 2020 12:17
@mik-laj mik-laj requested a review from kaxil November 7, 2020 12:33
    # TODO: Remove for Airflow 2.0
    filename = file_stat.file.split('/')[-1].replace('.py', '')
    Stats.timing(f'dag.loading-duration.{filename}', file_stat.duration)
    Stats.timing('dag.loading-duration.{filename}', file_stat.duration, labels={"filename": filename})
Member
Should we also add (on top of the existing ones) new metrics that have just a "short" name, with the variables passed only as labels?

For example in this case:

    Stats.timing('dag.loading-duration', file_stat.duration, labels={"filename": filename})

I think those will be much easier to aggregate in some tools.

@potiuk (Member) Nov 7, 2020
Or maybe we should leave the old stats as they are and only add labels to the new ones? WDYT? We could even add some prefix or indication that those stats are the "labeled" ones, and distinguish them in the stats documentation.

Member Author
The only problem is that not every client supports tags, and some clients support tags differently from the official Datadog client. For this reason, I preferred not to add more keys, but to leave these decisions to the statsd client.
If the client supports tags, it will keep metrics in the best form for itself.

We currently have two clients:

  • Classic statsd client (without tags support), which in this case will get the metric dag.loading-duration.my-awesome-file.py without tags
  • Datadog statsd (with tags support), which in this case will get the metric dag.loading-duration.filename with the tag filename:my-awesome-file.py

So you can aggregate this data if you need it.

This key name is not ideal when the client supports tags, but we don't need to generate the metrics key twice. On the other hand, writing the available parameters into the metric name is not a bad idea; it can help us find the tags more easily.

It is also worth adding that Stackdriver stores labels differently: instead of tags, it uses a dictionary of plain key-value pairs.
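To make the split concrete, here is a rough, hypothetical sketch of the two behaviors described above (the class names and return values are illustrative only; the real dispatch lives in Airflow's statsd integration):

```python
class ClassicStatsdClient:
    """No tag support: labels get baked into the dotted metric name."""

    def timing(self, stat, value, labels=None):
        if labels:
            # dag.loading-duration.{filename} -> dag.loading-duration.my-awesome-file
            stat = stat.format(**labels)
        return stat, []


class DatadogStatsdClient:
    """Tag support: keep a stable name and pass the labels as tags."""

    def timing(self, stat, value, labels=None):
        tags = [f"{key}:{val}" for key, val in (labels or {}).items()]
        # Strip the placeholder braces so the name itself never varies
        return stat.replace('{', '').replace('}', ''), tags


labels = {"filename": "my-awesome-file"}
print(ClassicStatsdClient().timing('dag.loading-duration.{filename}', 1.5, labels=labels))
# ('dag.loading-duration.my-awesome-file', [])
print(DatadogStatsdClient().timing('dag.loading-duration.{filename}', 1.5, labels=labels))
# ('dag.loading-duration.filename', ['filename:my-awesome-file'])
```

Either way the caller emits one call with one labels dict; the client decides how to encode it.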

Member Author
I have added documentation that describes this behavior.

Contributor

Hi guys.
Usually, best practice is to have generic metric names and use tags for any variable parameters, since that makes it possible to do all the breakdowns, slicing and dicing, etc. in the monitoring tool (if it supports them, of course).
With that said, even if it is a weird name, I actually do not see a problem with having a metric name like dag.loading-duration.filename with tags=['filename=(unknown)'], if this is going to simplify implementation/support/usage.

However, there is one caveat. What if, some time later, we want to extend a metric with additional tags? E.g. for the metric operator_failures_{task_type} we might want to add one more tag like dag_id. Maybe not a very good example, but I am pretty sure that for some metrics we could include more information that would be helpful for building breakdowns.
Then we would actually have to change the name of the metric, which would be a breaking change, because all existing alerts/dashboards would stop working (or we would have to support two metrics from then on, operator_failures_{task_type} + operator_failures_{task_type}_{dag_id}, which is not ideal either).

So, as for me, omitting placeholders in names like operator_failures/dag.loading-duration/... and using tags would be best for clients which support tags and want to leverage them, unless we are sure that we will never add more tags to existing metrics, which may not be true.
In that case, having maybe two types of metrics, one without tags and the other with tags and without placeholders in the names, might be a good solution.

But maybe this is not an issue. What do you think about the fact that metrics can be extended with additional tags in the future?
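To illustrate the caveat above with a hypothetical sketch (not code from this PR): under the placeholder-in-name scheme, adding a tag forces the metric name itself to change:

```python
# Today: one placeholder baked into the metric name
old = 'operator_failures_{task_type}'.format(task_type='BashOperator')
print(old)  # operator_failures_BashOperator

# After adding a dag_id tag, the name itself changes, which breaks
# any dashboard or alert keyed on the old name:
new = 'operator_failures_{task_type}_{dag_id}'.format(
    task_type='BashOperator', dag_id='example_dag'
)
print(new)  # operator_failures_BashOperator_example_dag
```

A placeholder-free name like operator_failures would stay stable no matter how many tags are added.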

@mik-laj (Member Author) commented Nov 12, 2020

@Acehaidrey do you have any thoughts on this change?

@turbaszek turbaszek requested a review from potiuk November 18, 2020 18:13
@potiuk (Member) commented Nov 22, 2020

Some conflicts to resolve, but it's good for me.

    @_format_safe_stats_logger_args
    def incr(self, stat, count=1, rate=1, labels=None):
        """Increment stat"""
        del labels
Member

What's this for?

Member Author

This prevents an unused-variable lint error, and at the same time we avoid a poorly descriptive variable name like _.

Member

Is this actually needed? Because of the _format_safe_stats_logger_args decorator, this is never called with any value:

    def func_wrap(_self, stat, *args, labels=None, **kwargs):
        if labels:
            # Remove {} from stat key e.g. {class_name} => class_name
            stat = stat.format(**labels)
        return validate_stat(func)(_self, stat, *args, **kwargs)

labels is an explicit parameter, so it won't be in the kwargs passed to this function (func in that snippet).
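A minimal, self-contained sketch of that behavior (validate_stat is dropped here, and incr is simplified to return the stat name so the effect is visible; only the decorator logic comes from the snippet above):

```python
import functools


def _format_safe_stats_logger_args(func):
    """Format any labels into the stat name, then drop them before calling func."""

    @functools.wraps(func)
    def func_wrap(_self, stat, *args, labels=None, **kwargs):
        if labels:
            # Remove {} from stat key e.g. {task_type} => BashOperator
            stat = stat.format(**labels)
        return func(_self, stat, *args, **kwargs)

    return func_wrap


class DummyStatsLogger:
    @_format_safe_stats_logger_args
    def incr(self, stat, count=1, rate=1):
        # No labels parameter needed: the decorator has already consumed it
        return stat


logger = DummyStatsLogger()
print(logger.incr('operator_failures_{task_type}', labels={"task_type": "BashOperator"}))
# operator_failures_BashOperator
```

Because labels is declared explicitly on func_wrap, it never lands in **kwargs, so the wrapped incr compiles and runs fine without a labels parameter of its own.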

    @functools.wraps(func)
    def func_wrap(_self, stat, *args, labels=None, **kwargs):
        if labels:
            # Remove {} from stat key e.g. {class_name} => class_name
Member

Suggested change:

    -    # Remove {} from stat key e.g. {class_name} => class_name
    +    # Remove {} from stat key e.g. {class_name} => SchedulerJob

    description: str


    METRICS_LIST: List[Metric] = [
Member
I'm not sure of the purpose of this list, nor of the Metric class -- it appears it is never used for emitting metrics at runtime, so I'm not sure why we need it.

What was your thinking here, please?

@Acehaidrey (Contributor)

I honestly think this is great. I'm not sure how to capture this, as we use OpenTSDB and have an internal client for it, so I had to change all of these stat calls to match the same signature and use tags instead of the dotted a.b.c naming. This will make it easier to manage. It seems Graphite uses that specific notation to do aggregation, whereas other clients use tags for the same result.

@github-actions bot commented Mar 1, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Mar 1, 2021
    Stats.incr(f'operator_successes_{self.task.task_type}', 1, 1)
    task_type = self.task.task_type
    Stats.incr('operator_successes_{task_type}', 1, 1, labels={"task_type": task_type})
    Stats.incr('ti_successes')

Would it be possible to add the dag/task tags to ti_successes and ti_failures as well? I think this would be useful for aggregation, so we can ignore or watch specific DAGs for failures.

@github-actions github-actions bot closed this Mar 9, 2021
@williamBartos
Are there plans to still merge this?


Labels

area:dev-tools area:monitoring area:Scheduler including HA (high availability) scheduler stale Stale PRs per the .github/workflows/stale.yml policy file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Tags rather than names in variable parts of the metrics

7 participants