dnsdist: expose all metrics in API (including servers, pools and frontends) #6002
One item that would be useful to convert into a Prometheus histogram (https://prometheus.io/docs/concepts/metric_types/#histogram), if possible, is latency:
This would enable the use of histogram_quantile (https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_quantile). Note that Prometheus bucket values are cumulative, in contrast to the current buckets. This makes it easy to add buckets later without breaking dashboards.
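The cumulative semantics can be illustrated with a short Python sketch. It assumes, for illustration only, that the current non-cumulative latency counters are named latency0-1, latency1-10, latency10-50, latency50-100, latency100-1000 and latency-slow, with boundaries in milliseconds. The bounds are converted to seconds, following the Prometheus base-unit convention, and the metric name dnsdist_latency is hypothetical. Each le bucket is the running total of all ranges up to its upper bound, so the +Inf bucket equals the overall count.

```python
# Hypothetical non-cumulative dnsdist latency counters, in ascending order,
# with each range's upper bound converted from milliseconds to seconds.
RANGES = [
    ("latency0-1", 0.001),
    ("latency1-10", 0.01),
    ("latency10-50", 0.05),
    ("latency50-100", 0.1),
    ("latency100-1000", 1.0),
    ("latency-slow", float("inf")),
]


def to_cumulative_buckets(stats):
    """Turn per-range counts into cumulative (upper_bound, count) pairs."""
    buckets = []
    running = 0
    for name, upper in RANGES:
        running += stats.get(name, 0)  # each bucket includes all lower ranges
        buckets.append((upper, running))
    return buckets


def render(metric, buckets):
    """Render the buckets as Prometheus text exposition lines."""
    lines = []
    for upper, count in buckets:
        le = "+Inf" if upper == float("inf") else repr(upper)
        lines.append(f'{metric}_bucket{{le="{le}"}} {count}')
    # The +Inf bucket always equals the total number of observations.
    lines.append(f"{metric}_count {buckets[-1][1]}")
    return "\n".join(lines)


counts = {"latency0-1": 5, "latency1-10": 3, "latency10-50": 2, "latency-slow": 1}
print(render("dnsdist_latency", to_cumulative_buckets(counts)))
```

With cumulative buckets, adding a new boundary only adds one series; the existing le series keep their meaning. A real Prometheus histogram also exposes a _sum series, which cannot be derived from bucket counts alone, so it is omitted from this sketch.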
I noticed that your mapping did not create labels for metrics with only two possible values, even when they are mutually exclusive measurements of the same metric type. (Example: cache-hits and cache-misses remain separate metrics, instead of becoming pdns_dnsdist_cache{result="hit"} and pdns_dnsdist_cache{result="miss"}.) Is this a best practice when there are just two possible results? I know from experience that using labels within the same metric type makes life much, much easier, so I'd always lean towards using labels where possible whenever a metric has to be aggregated across types.
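A small Python sketch of the labeled alternative. The mapping table is hypothetical and only covers the cache-hits/cache-misses example; names that are not in the table pass through unchanged. Because both counters end up in one family, summing over the result label (sum(pdns_dnsdist_cache) in PromQL) gives hits plus misses without listing each metric name.

```python
# Hypothetical mapping from flat dnsdist names to (family, labels), following
# the pdns_dnsdist_cache{result=...} naming suggested in the comment.
LABELED = {
    "cache-hits": ("pdns_dnsdist_cache", {"result": "hit"}),
    "cache-misses": ("pdns_dnsdist_cache", {"result": "miss"}),
}


def to_samples(flat):
    """Convert flat {name: value} stats into (family, labels, value) samples."""
    samples = []
    for name, value in flat.items():
        # Names without a mapping keep their flat name and get no labels.
        family, labels = LABELED.get(name, (name, {}))
        samples.append((family, labels, value))
    return samples


def sum_family(samples, family):
    """Sum a family across all label values, like sum(family) in PromQL."""
    return sum(value for fam, _, value in samples if fam == family)


def format_sample(family, labels, value):
    """Render one sample in Prometheus text exposition syntax."""
    body = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{family}{{{body}}} {value}" if body else f"{family} {value}"


samples = to_samples({"cache-hits": 30, "cache-misses": 10})
print("\n".join(format_sample(*s) for s in samples))
print("cache lookups:", sum_family(samples, "pdns_dnsdist_cache"))
```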
@johnhtodd I initially tried to map recursor metric names into 'proper' labeled names, including the hit/miss metrics. I found that metrics that appear to belong together at first glance (based on their names and the kind of information they carry) are often not combined when you build graphs that are actually insightful (see the Metronome graphs). I ended up with various graphs where some metrics were accessed through a label and others were not. Some of the others maybe should have had a label, or maybe not. Some metrics would also clash in name with others if you moved part of the name into a label, so you would have to rename one of them. It is pretty hard to map a set of metrics to properly labeled entries if they were not designed that way from the start and thought through carefully. At least the current unlabeled metric names are known, used by many users, and documented (though there is room for improvement there). Keeping relabeling to a minimum makes it easier to build dashboards for arbitrary metric storage systems from the documentation, because you do not need to guess whether a specific metric was renamed.
@wojas Thanks for the run-through. We just did a one-for-one mapping to import into Prometheus, and it has been a real challenge to create reasonable queries, because we should have used labels where we simply mapped to metric names.
Re: full Prometheus histogram support, I've opened #6088 to expose the
Unless I'm mistaken, we now expose all metrics via the API, and via the Prometheus endpoint as well.
Short description
Currently, dnsdist does not expose through the API all of the metrics that it sends to Carbon.
The ones I am aware of are:
dnsdist.$instance.main.servers.*.$metric
dnsdist.$instance.main.pools.*.$metric
dnsdist.$instance.main.frontends.*.$metric
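One way to turn the Carbon paths above into a metric name plus labels, sketched in Python. It assumes that every path has exactly six dot-separated components (so the server, pool or frontend name itself contains no dots) and that server, pool and frontend are acceptable label names; both are assumptions for illustration, not dnsdist behavior. The scope is kept in the metric name so that, for example, a backend's queries counter does not collide with the global queries counter once the backend name moves into a label.

```python
# Hypothetical mapping from the Carbon scope component to a label name.
SCOPE_LABELS = {"servers": "server", "pools": "pool", "frontends": "frontend"}


def parse_carbon_path(path):
    """Split dnsdist.<instance>.main.<scope>.<name>.<metric> into (metric, labels).

    Assumes exactly six dot-separated components, i.e. <name> contains no dots.
    """
    parts = path.split(".")
    if len(parts) != 6 or parts[0] != "dnsdist" or parts[2] != "main":
        raise ValueError(f"unexpected Carbon path: {path!r}")
    _, instance, _, scope, name, metric = parts
    if scope not in SCOPE_LABELS:
        raise ValueError(f"unknown scope {scope!r} in {path!r}")
    label = SCOPE_LABELS[scope]
    # Keep the scope in the metric name so a per-backend "queries" counter does
    # not share a name with the global "queries" counter.
    return f"{label}-{metric}", {"instance": instance, label: name}


print(parse_carbon_path("dnsdist.ns1.main.servers.backend1.queries"))
# → ('server-queries', {'instance': 'ns1', 'server': 'backend1'})
```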
Use case
Gathering all statistics without using Carbon.
Description
It would be hard to expose those cleanly in the current /api/v1/servers/localhost/statistics API. I propose adding a new /api/v1/servers/localhost/metrics API that would include these missing statistics and structure them in a way that makes conversion to Prometheus metrics easy (see also #4947). The result data for this API would be in JSON, so that users who do not use Prometheus can still access them easily.
Example JSON:
Note that these names do not fully comply with the Prometheus metric naming recommendations. Since the current statistic names are well documented, I think it's better to stay as close as possible to the current names instead of renaming all of them. The few exceptions I propose are those where a label would be clearer and more flexible.
If we decide to actually add a Prometheus endpoint, we would need to map '-' to '_' and add a prefix to all of the names. Example conversion:
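A minimal Python sketch of that renaming step. The dnsdist_ prefix is an assumption (the text only says a prefix is needed). The check uses the metric-name pattern from the Prometheus data model, [a-zA-Z_:][a-zA-Z0-9_:]*, so a name that is still invalid after replacing '-' fails loudly instead of being exported.

```python
import re

# Hypothetical prefix; any common prefix would work the same way.
PREFIX = "dnsdist_"

# Metric-name pattern from the Prometheus data model.
VALID_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def to_prometheus_name(name: str, prefix: str = PREFIX) -> str:
    """Map '-' to '_' and add the prefix, rejecting names that are still invalid."""
    converted = prefix + name.replace("-", "_")
    if not VALID_NAME.fullmatch(converted):
        raise ValueError(f"cannot convert {name!r} to a valid Prometheus name")
    return converted


for stat in ["queries", "cache-hits", "latency0-1", "latency-slow"]:
    print(stat, "->", to_prometheus_name(stat))
```

Rejecting invalid names, rather than silently replacing every other character, keeps the mapping one-to-one and predictable from the documented statistic names.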