Expose server reachability #75
Comments
From the chrony docs
The facebook/time library returns this value as a uint16. While there are some interesting things we could do with the bits, the most useful thing that comes to mind would be to compute the fraction of bits in the value that are 1. So if all probes fail, the metric is 0.0; if all probes pass, it is 1.0.

Another option would be to expose only the current bit. Since Prometheus typically polls faster than NTP packets are sent, we could represent the "last reach success" as a simple bool. The question is how that register is updated: shift left or shift right?

The next useful option would be to expose the bits directly as a state set. While this would provide the full bit detail, it's a bit high cardinality.

As for the easy option, exposing the raw byte directly as a value, this seems less useful for monitoring, as you would have to interpret the bits in PromQL for the alert to be useful. I would say this is better mapped in the exporter's code.
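A minimal sketch of the first two options, to make the mapping concrete. It assumes the register is 8 bits wide (as documented for the Reach column of `chronyc sources`) and that the newest poll is in the rightmost bit. The function names are hypothetical and not taken from the exporter.

```go
package main

import (
	"fmt"
	"math/bits"
)

// reachBits is the assumed width of the reachability register (last 8 polls).
const reachBits = 8

// reachabilityRatio returns the fraction of the last reachBits polls that
// got a valid reply: 0.0 if all failed, 1.0 if all succeeded.
// Only the low 8 bits of the uint16 are considered (assumption).
func reachabilityRatio(reach uint16) float64 {
	return float64(bits.OnesCount8(uint8(reach))) / reachBits
}

// lastReachSuccess reports whether the most recent poll succeeded,
// assuming the newest result is stored in the rightmost bit.
func lastReachSuccess(reach uint16) bool {
	return reach&1 == 1
}

func main() {
	// Sample register values written in octal, as chronyc prints them.
	for _, r := range []uint16{0o377, 0o376, 0o017, 0} {
		fmt.Printf("reach=%03o ratio=%.3f last=%t\n",
			r, reachabilityRatio(r), lastReachSuccess(r))
	}
}
```

Doing this in the exporter keeps alert expressions simple, since PromQL then only has to compare a float or a 0/1 value rather than decode bits.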
The reachability is available in the SourceData. Edit: a simple
Flattening the value to a binary Hmm.. maybe this wasn't such a useful idea at all 😅
Chrony's default NTP is a very low packet count protocol.
Compute two reachability metrics from the "Reachability" bitmask.
* Count the number of 1s in the bitmask as the polling success ratio.
* Expose the rightmost bit as the "last reach success".

Fixes: #75
Signed-off-by: SuperQ <superq@gmail.com>
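Reading the rightmost bit as the latest result matches the usual NTP convention, where the reach register is shifted left on every poll and the low bit is set when a valid reply arrives. The sketch below simulates that update rule under that assumption; `updateReach` is a hypothetical helper for illustration, not code from chrony or the exporter.

```go
package main

import "fmt"

// updateReach models one poll of an 8-bit reach register, assuming the
// standard NTP rule: shift left, then set the low bit on a valid reply.
// The oldest result falls off the top because the type is 8 bits wide.
func updateReach(reach uint8, replied bool) uint8 {
	reach <<= 1
	if replied {
		reach |= 1
	}
	return reach
}

func main() {
	var reach uint8
	// Two successful polls followed by one missed poll.
	for _, replied := range []bool{true, true, false} {
		reach = updateReach(reach, replied)
		fmt.Printf("replied=%t reach=%03o (%08b)\n", replied, reach, reach)
	}
}
```

Under this rule a single missed poll clears the rightmost bit immediately, while the ones-count ratio only drops by one eighth, which is why the two metrics complement each other.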
Did some local testing.
Based on a recent mailing list thread I'd like to propose the exposure of an additional metric.
The original chrony client shows the reachability of upstream servers and this can be used to detect changes in network topology without having to rely on time/clock drift for failure detection.
Having this field exposed would make it easy to create e.g. an alertmanager rule.