Description
- Program: dnsdist
- Issue type: Feature request
Short description
dnsdist currently does not expose through its API all of the metrics that it exports to Carbon.
The missing ones I am aware of are:
dnsdist.$instance.main.servers.*.$metric
dnsdist.$instance.main.pools.*.$metric
dnsdist.$instance.main.frontends.*.$metric
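For illustration, here is a minimal Python sketch of how a consumer might split one of these Carbon paths into its parts. The `CarbonMetric` type and `parse_carbon_path` function are hypothetical names, and the sketch assumes that no path component (instance, object name, metric) contains a dot.

```python
from typing import NamedTuple


class CarbonMetric(NamedTuple):
    instance: str  # the $instance component
    kind: str      # "servers", "pools" or "frontends"
    obj: str       # the component matched by "*" in the paths above
    metric: str    # the $metric component


KINDS = {"servers", "pools", "frontends"}


def parse_carbon_path(path: str) -> CarbonMetric:
    """Split dnsdist.$instance.main.<kind>.<object>.$metric into its parts."""
    parts = path.split(".")
    # Assumption: exactly six dot-separated components, none containing a dot.
    if (
        len(parts) != 6
        or parts[0] != "dnsdist"
        or parts[2] != "main"
        or parts[3] not in KINDS
    ):
        raise ValueError(f"not a per-object dnsdist Carbon path: {path!r}")
    return CarbonMetric(parts[1], parts[3], parts[4], parts[5])
```

For example, `parse_carbon_path("dnsdist.host1.main.servers.foo.queries")` yields the instance `host1`, the kind `servers`, the object `foo` and the metric `queries`.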
Usecase
Gathering all statistics without using Carbon.
Description
It would be hard to expose those in a clean way in the current /api/v1/servers/localhost/statistics API.
I propose adding a new /api/v1/servers/localhost/metrics API that would include these missing statistics and structure them in a way that makes conversion to Prometheus metrics easy (see also #4947). The response would be JSON, so that users who do not use Prometheus can still access the data easily.
Example JSON:
[
{"name": "queries", "value": 123},
{"name": "cache-hits", "value": 12},
{"name": "server-queries", "labels": {"server": "foo"}, "value": 123},
{"name": "server-drops", "labels": {"server": "foo"}, "value": 2},
{"name": "pool-cache-hits", "labels": {"pool": "bar"}, "value": 2},
{"name": "latency-avg", "labels": {"window": "100"}, "value": 2.1},
{"name": "latency-avg", "labels": {"window": "1000"}, "value": 1.7},
{"name": "latency", "labels": {"bucket": "0-1"}, "value": 114},
{"name": "latency", "labels": {"bucket": "1-10"}, "value": 123},
{"name": "latency", "labels": {"bucket": "slow"}, "value": 13},
{"name": "rule", "labels": {"action": "nxdomain"}, "value": 7}
]
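To show that users who do not use Prometheus could still consume this format easily, here is a rough Python sketch that indexes such a response by name and labels. The `index_metrics` and `lookup` helpers are hypothetical, and `SAMPLE` is a shortened copy of the example rather than real dnsdist output.

```python
import json

# Shortened copy of the example response from this proposal (not real output).
SAMPLE = """[
  {"name": "queries", "value": 123},
  {"name": "server-queries", "labels": {"server": "foo"}, "value": 123},
  {"name": "latency-avg", "labels": {"window": "100"}, "value": 2.1},
  {"name": "latency-avg", "labels": {"window": "1000"}, "value": 1.7},
  {"name": "latency", "labels": {"bucket": "0-1"}, "value": 114},
  {"name": "rule", "labels": {"action": "nxdomain"}, "value": 7}
]"""


def index_metrics(entries):
    """Map (name, sorted label pairs) to the value of each entry."""
    index = {}
    for entry in entries:
        # "labels" is optional in the proposed format; treat it as empty.
        labels = entry.get("labels", {})
        index[(entry["name"], tuple(sorted(labels.items())))] = entry["value"]
    return index


def lookup(index, name, **labels):
    """Fetch one value by metric name and exact label set."""
    return index[(name, tuple(sorted(labels.items())))]


metrics = index_metrics(json.loads(SAMPLE))
```

With this index, `lookup(metrics, "queries")` returns the global counter, and `lookup(metrics, "latency", bucket="0-1")` returns a single latency bucket.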
Note that these names do not fully comply with the Prometheus metric naming recommendations. Since the current statistic names are well documented, I think it's better to stick as close as possible to the current names instead of renaming all of them. The few exceptions that I propose are those where a label would be clearer and more flexible.
If we decide to actually add a Prometheus endpoint, we would need to map '-' to '_' and prefix all of them. Example conversion:
acl-drops pdns_dnsdist_acl_drops
cache-hits pdns_dnsdist_cache_hits
cache-misses pdns_dnsdist_cache_misses
cpu-sys-msec pdns_dnsdist_cpu_sys_msec
cpu-user-msec pdns_dnsdist_cpu_user_msec
downstream-send-errors pdns_dnsdist_downstream_send_errors
downstream-timeouts pdns_dnsdist_downstream_timeouts
dyn-block-nmg-size pdns_dnsdist_dyn_block_nmg_size
dyn-blocked pdns_dnsdist_dyn_blocked
empty-queries pdns_dnsdist_empty_queries
fd-usage pdns_dnsdist_fd_usage
latency-avg100 pdns_dnsdist_latency_avg{window="100"}
latency-avg1000 pdns_dnsdist_latency_avg{window="1000"}
latency-avg10000 pdns_dnsdist_latency_avg{window="10000"}
latency-avg1000000 pdns_dnsdist_latency_avg{window="1000000"}
latency0-1 pdns_dnsdist_latency{bucket="0-1"}
latency1-10 pdns_dnsdist_latency{bucket="1-10"}
latency10-50 pdns_dnsdist_latency{bucket="10-50"}
latency100-1000 pdns_dnsdist_latency{bucket="100-1000"}
latency50-100 pdns_dnsdist_latency{bucket="50-100"}
latency-slow pdns_dnsdist_latency{bucket="slow"}
no-policy pdns_dnsdist_no_policy
noncompliant-queries pdns_dnsdist_noncompliant_queries
noncompliant-responses pdns_dnsdist_noncompliant_responses
queries pdns_dnsdist_queries
rdqueries pdns_dnsdist_rdqueries
real-memory-usage pdns_dnsdist_real_memory_usage
responses pdns_dnsdist_responses
rule-drop pdns_dnsdist_rule{action="drop"}
rule-nxdomain pdns_dnsdist_rule{action="nxdomain"}
rule-refused pdns_dnsdist_rule{action="refused"}
self-answered pdns_dnsdist_self_answered
servfail-responses pdns_dnsdist_servfail_responses
trunc-failures pdns_dnsdist_trunc_failures
uptime pdns_dnsdist_uptime
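The conversion above could be sketched in Python roughly as follows. This is only an illustration of the mapping in the table, not an existing dnsdist function: `to_prometheus_name` and `LABEL_RULES` are hypothetical names, and the rule pattern covers only the three actions listed here.

```python
import re

PREFIX = "pdns_dnsdist_"

# (pattern on the current statistic name, new base name, label key).
# These cover the exceptions from the table where a label replaces a suffix.
LABEL_RULES = [
    (re.compile(r"^latency-avg(\d+)$"), "latency_avg", "window"),
    (re.compile(r"^latency(\d+-\d+|-slow)$"), "latency", "bucket"),
    (re.compile(r"^rule-(drop|nxdomain|refused)$"), "rule", "action"),
]


def to_prometheus_name(stat: str) -> str:
    """Convert a current dnsdist statistic name to the proposed series name."""
    for pattern, base, label in LABEL_RULES:
        match = pattern.match(stat)
        if match:
            # "latency-slow" captures "-slow"; drop the leading dash.
            value = match.group(1).lstrip("-")
            return f'{PREFIX}{base}{{{label}="{value}"}}'
    # Default rule: map '-' to '_' and add the prefix.
    return PREFIX + stat.replace("-", "_")
```

For example, `to_prometheus_name("cache-hits")` gives `pdns_dnsdist_cache_hits`, while `to_prometheus_name("latency-avg100")` gives `pdns_dnsdist_latency_avg{window="100"}`.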