Export Common Duration Metrics #34168
Comments
It's not easy to provide a feature like this in our core stats system. Dynamic and flexible stats mean additional memory and additional complexity (and I think it's complex enough already). But the good news is that our stats system is extensible. I am okay with doing this in an optional filter (or logger? @kyessenov).
I filed #30619, which replicates the Istio design with high-cardinality metrics so you can easily do breakdowns by upstream/downstream paths. The general problem is that doing all of this in Envoy would push its stats subsystem beyond its design capabilities, so you still need to run a collector or some stats engine to hold the aggregate data. I'd also recommend using delta aggregation temporality when flushing metrics, which Envoy doesn't directly support.
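To illustrate what delta temporality means here, a minimal sketch in Python. It is a generic illustration of the concept (each flush reports only the change since the previous flush, and the collector holds the running totals), not Envoy's or any collector's API. The `DeltaCounter` class and its label tuples are hypothetical.

```python
from collections import Counter


class DeltaCounter:
    """Hypothetical counter that exports deltas instead of cumulative totals."""

    def __init__(self):
        # Increments accumulated since the last flush, keyed by label tuple.
        self._pending = Counter()

    def add(self, labels, n=1):
        self._pending[labels] += n

    def flush(self):
        # Report only what changed since the previous flush, then reset.
        # Label sets that saw no traffic disappear from the next export,
        # so the proxy does not have to keep them in memory.
        delta = dict(self._pending)
        self._pending.clear()
        return delta


counter = DeltaCounter()
counter.add(("backend", "GET", 200))
counter.add(("backend", "GET", 200))
first = counter.flush()
second = counter.flush()
```

With this scheme the downstream collector, not the proxy, is responsible for summing deltas into long-lived totals.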
To be clear, what I am mostly looking for is to have specific metrics available for the kind of deltas that #33240 enables, for example:
We currently expose these in access logs, but aggregating them into metrics is quite an expensive process if all we are after is some aggregates per cluster / method / status. These metrics do not need the same granularity (read: cardinality) as the access logs; an aggregation by upstream cluster, HTTP method and HTTP status would already be a very useful start. I do like the idea of this being added as an optional filter, too. The metrics could be created dynamically, and if the set of potential attributes is limited, cardinality should not be a big problem.
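As a rough sketch of the kind of pre-aggregation I mean, here is a small Python example that folds individual durations into a Prometheus-style histogram keyed by (upstream cluster, HTTP method, HTTP status). Everything in it is an assumption for illustration: the `DurationAggregator` class, the bucket bounds and the label names are hypothetical and not an existing Envoy interface.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical bucket upper bounds in milliseconds (not Envoy defaults).
BUCKETS_MS = (5, 10, 25, 50, 100, 250, 500, 1000, float("inf"))


class DurationAggregator:
    """Aggregates durations per (cluster, method, status) label set."""

    def __init__(self, buckets=BUCKETS_MS):
        self.buckets = buckets
        # One per-bucket count list and one running sum per label set.
        self.counts = defaultdict(lambda: [0] * len(buckets))
        self.sums = defaultdict(float)

    def observe(self, cluster, method, status, duration_ms):
        key = (cluster, method, status)
        # Index of the first bucket whose upper bound is >= the duration.
        idx = bisect_left(self.buckets, duration_ms)
        self.counts[key][idx] += 1
        self.sums[key] += duration_ms

    def cumulative(self, key):
        # Prometheus histograms expose cumulative ("le") bucket counts.
        total, out = 0, []
        for count in self.counts[key]:
            total += count
            out.append(total)
        return out


agg = DurationAggregator()
for duration in (5, 7, 120):
    agg.observe("backend", "GET", 200, duration)
```

Memory grows with the number of distinct label tuples rather than with the number of requests, which is why limiting the attribute set keeps cardinality manageable.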
Title: Export Common Duration Metrics
Description:
With #33240 we got the ability to export various commonly used durations via access logs (thank you!!!). However, it would be great if there was a way to also export these as metrics so they can be ingested by Prometheus.
I don't have a specific design in mind right now, but anything that would pre-aggregate these durations would help immensely in ensuring we can easily alert on the performance of downstream, upstream and Envoy itself.