Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prometheus metrics? #2

Closed
emmanuel opened this issue Oct 24, 2017 · 11 comments
Closed

Prometheus metrics? #2

emmanuel opened this issue Oct 24, 2017 · 11 comments

Comments

@emmanuel
Copy link

On the blog articles that correspond to this repository the table of contents indicates that Prometheus metrics exposition was intended as the fourth (next) article in the series.

I'm very interested in Prometheus exposition from Envoy, and I'm wondering if you could give me a hint about how you were planning to go about it. I have a couple of thoughts so far. First, I see that Envoy's native /stats endpoint on the admin port is a bit like statsd output.

  1. I was thinking a bit about using the Prometheus statsd-exporter in order to provide a sink for Envoy to flush to, but I'm not yet familiar enough with the Envoy stats format to have a sense of how many rules would be called for to extract the varying portions of the stat names.
  2. Envoy itself could be extended to natively provide metrics in Prometheus's exposition format. I'm not yet familiar enough with the codebase to have a sense of how much effort would be called for to take this approach, though the docs signal that it shouldn't be too hard.

What do you think? Were you imagining going for one of the above approaches, or do you have something else in mind? I'm interested to implement this (depending on effort required), so if you have a thought of a good route, I may come back shortly with a pull request.

@emmanuel
Copy link
Author

emmanuel commented Oct 24, 2017

With regard to option 2 above, my comment about expected effort is based on the comment here: https://www.envoyproxy.io/envoy/intro/arch_overview/statistics.html

Envoy uses statsd as the statistics output format, though plugging in a different statistics sink would not be difficult.

That said, the comment is with regard to a push-based stats approach, rather than a pull-based one like Prometheus.

@christian-posta
Copy link
Owner

christian-posta commented Oct 26, 2017 via email

@christian-posta
Copy link
Owner

christian-posta commented Oct 26, 2017 via email

@emmanuel
Copy link
Author

emmanuel commented Oct 26, 2017

I’m going to try #1 for now.

There’s been at least some discussion in the Envoy project about #2: naively natively supporting a tag-based (dimensional/metrics 2.0) approach, here: envoyproxy/envoy#1536. That said, it doesn’t look like anyone is working on it at the moment.

I would like to contribute native support (#2) but I will start by parsing the current output (#1) and then circle back around if it warrants the effort.

@mrice32
Copy link

mrice32 commented Oct 26, 2017

It would be great to see new stats sinks/exporters added to Envoy. There seems to be a lot of interest around this.

For 2, we just closed 1536 with 1852 just now. This means that tags/dimensions have first class support in Envoy. Please see here and here for documentation on how to configure these tags as a part of the bootstrap config. Please see here for Stats::Metric, which is the base class that encapsulates all of this identifying information for all types of metrics that are flushed to sinks.

However, to take advantage of these tags, there will need to be stats sinks that use them (this currently does not exist in Envoy as the only native stats sink is statsd), which goes back to 1.

Feel free to open an issue in Envoy to discuss creating a sink for Prometheus or if there are any issues you see with the existing tags/dimensions built into Envoy now.

@mrice32
Copy link

mrice32 commented Oct 26, 2017

I'd also like to note that stats sinks have recently become statically pluggable (like filters). This means that you can link in a factory implementation, provide the corresponding string in the config, and Envoy will use your provided sink without any modification to the Envoy codebase (although it is definitely encouraged to add these sinks to the upstream Envoy repo so other users can look it up by default). See the statsd factory and the tcp statsd sink it creates for an example of how to do this.

@emmanuel
Copy link
Author

@mrice32 thanks for adding your comments. I wrote my comments above before realizing that work was actually underway to implement tagging/labels/dimensions for stats in Envoy. I'm curious why tagging wasn't treated as the canonical representation and converted to strings as needed (based on my presumption that labels provide a complete superset of string-based naming), but I'm nowhere near deep enough in the codebase to make a claim about that being a better approach overall.

As far as taking the tagging support in its current state all the way through to Prometheus support, the first thing I would bring up is that Prometheus uses a pull-based collection model, and as such is not a 'sink' in the same way that statsd, etc are. I suspect that it would therefore make more sense to provide Prometheus formatted metrics as another format on the /stats path of the admin endpoint (in addition to statsd-like text and JSON, as currently available). A Prometheus server would then discover the Envoy instances as 'scrape targets' using one of the available service-discovery mechanisms that Prometheus supports, and would periodically issue requests to the appropriate endpoint on each Envoy instance to retrieve stats from it.

In fact, 'Prometheus format' is actually one or both of two encodings: a Protobuf representation and a text representation. Given that there is already Protobuf machinery in play in Envoy's build system, I'm hopeful that leveraging the existing tooling to support Prometheus's Protobuf format is not unreasonable. In any case, the text format is relatively compact (moreso than JSON, certainly), and I would hope that it be supported regardless of Protobuf support, for the purpose of adhoc debugging and inspection.

Does that make sense? Do you see any major roadblocks to adding another format to the /stats endpoint?

@mrice32
Copy link

mrice32 commented Oct 26, 2017

To respond to your first question, there are two reasons:

  1. For the tags to be the canonical representation, Envoy's codebase would have to be opinionated about what is a tag vs a static portion of the stat name. It was seen as beneficial to allow this to be configurable (with defaults) using regexes rather than static and requiring modification of Envoy internals for a user to change.
  2. Historical reasons. Envoy used to only support statsd, which (as far as I'm aware) accepts static strings from which tags can be extracted downstream. This meant that the existing canonical representation was originally static strings. There was a decent amount of infrastructure in Envoy built around this assumption (for instance, this is embedded in hot restart). Much of this would need to be redesigned to handle a different canonical representation.

As for adding new formats for scraping stats using a pull model, I think that makes sense. I agree that protobuf is probably the right way to go. As usual, it will probably need to be configurable in the Envoy config, but I don't see any reason this would not work or people would be opposed to having new formats available via /stats. Here is where those stats are dispatched in response to /stats. However, as it is noted in the code, timers (we actually changed that to histograms recently, so that should be updated) and histograms are sent to statsd when recorded and not stored. That means that a pull-based approach will not be able to capture those stats unless we implement some storage system for them (non-trivial since this storage would be unbounded without some way to aggregate or delete old samples). Another thing to note is that currently the tags are not included in the /stats response. The counters and gauges are readily available in that method, though, so that shouldn't be difficult to add.

@mrice32
Copy link

mrice32 commented Oct 26, 2017

cc @mattklein123 @htuch

@mattklein123
Copy link

There is a lot of general interest in native Prometheus support. I opened an issue to track. Can we please discuss there: envoyproxy/envoy#1947

@emmanuel
Copy link
Author

Yeah, this conversation definitely belongs there. Thanks @mattklein123.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants