Short description
New Prometheus metric: a counter showing how often the status of a resolver changed.
Usecase
For some reason we have a flapping resolver. The logs show:
Oct 21 10:48:09 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'down'
Oct 21 10:48:10 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'up'
Oct 21 10:48:17 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'down'
Oct 21 10:48:19 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'up'
Oct 21 10:48:34 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'down'
Oct 21 10:48:36 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'up'
Oct 21 10:48:57 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'down'
Oct 21 10:48:58 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'up'
Since each outage usually lasts just 1-2 seconds, it remains largely invisible when monitoring dnsdist_server_status.
We therefore propose adding two new counters to dnsdist's Prometheus metrics to make these issues visible to monitoring.
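To illustrate the idea, here is a minimal sketch of how such counters could be derived from the log lines quoted above. It is not dnsdist code: the metric names `dnsdist_server_marked_down_total` and `dnsdist_server_marked_up_total`, the `address` label, and both helper functions are hypothetical and chosen only for this example.

```python
import re
from collections import Counter

# Matches the "Marking downstream ... as 'up'/'down'" log lines quoted above.
MARK_RE = re.compile(r"Marking downstream (\S+) as '(up|down)'")


def count_transitions(log_lines):
    """Count status changes per (server, new_state) pair."""
    counts = Counter()
    for line in log_lines:
        match = MARK_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def render_prometheus(counts):
    """Render the counts as two counters in Prometheus text format.

    The metric names and the "address" label are assumptions made for
    this sketch, not names defined by dnsdist.
    """
    lines = []
    for state in ("down", "up"):
        metric = f"dnsdist_server_marked_{state}_total"  # hypothetical name
        lines.append(f"# TYPE {metric} counter")
        for (server, new_state), n in sorted(counts.items()):
            if new_state == state:
                lines.append(f'{metric}{{address="{server}"}} {n}')
    return "\n".join(lines)
```

Because counters only ever increase, a monitoring system could alert on their rate of change and catch a flap even when every scrape of dnsdist_server_status happens to see the server as up.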
Description
Given these events:
Oct 21 10:48:09 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'down'
Oct 21 10:48:10 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'up'
Oct 21 10:48:17 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'down'
Oct 21 10:48:18 bender-dpriv1 dnsdist[24782]: Marking downstream 109.70.100.136:53 as 'up'
the new metrics would contain:

That sounds like a very good idea, thanks! I have put this in the 1.9 milestone, as we are (hopefully) near the first alpha release of 1.8 and I'm afraid I will not have time to actually implement that change before the first beta (after which we are in "bug fixes only" mode until the final release). But I will gladly merge a pull request before the beta if someone else feels up to it :)