Unbound occasionally reports broken stats #485

Closed
jaseemabid opened this issue May 6, 2021 · 2 comments

@jaseemabid

Describe the bug

Some of the time stats that Unbound reports through the control socket are formatted incorrectly as floats, and this broke unbound_exporter for us. The values are unexpectedly negative and in the wrong format.

Example stats:

$ sudo unbound-control -c /etc/unbound.conf stats_noreset | grep -F '0.-'

thread1.recursion.time.avg=0.-402022313
thread2.recursion.time.avg=0.-1726117941
thread3.recursion.time.avg=0.-1813043235
total.recursion.time.avg=0.-701203423
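
The shape of these values is consistent with the seconds and microseconds being printed as two separate integers joined by a dot. That is an assumption about the formatting, not something confirmed in this report. A minimal Python sketch of that assumption, using a hypothetical `format_timeval` helper:

```python
def format_timeval(sec: int, usec: int) -> str:
    """Join seconds and zero-padded microseconds with a dot.

    Hypothetical helper: it mimics a printf-style "%d.%06d" layout, which is
    an assumption about how the stat is rendered, not Unbound's actual code.
    """
    return "%d.%06d" % (sec, usec)


# A sane value: 276 microseconds, zero-padded to six digits.
good = format_timeval(0, 276)

# A negative microsecond field reproduces the malformed shape from the report:
# the minus sign ends up after the decimal point.
bad = format_timeval(0, -402022313)

print(good)
print(bad)
```

If the formatting does work this way, the broken output points at the microsecond field itself being negative before it is printed, rather than at the printing step.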

Prometheus exporter failed to parse these lines:

2021/05/06 16:47:22 Failed to scrape socket: strconv.ParseFloat: parsing "0.-1176731538": invalid syntax

The exporter gives up on the first parse error and reports Unbound as down.

$ curl -s 127.0.0.1:10062/metrics  | grep unbound_up

# HELP unbound_up Whether scraping Unbound's metrics was successful.
# TYPE unbound_up gauge
unbound_up 0
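
To see which lines would trip a strict float parser, one option is to parse each `key=value` line and collect the failures instead of stopping at the first one. This is a Python sketch for illustration only, not the exporter's Go code; `parse_stats` is a hypothetical name.

```python
def parse_stats(text: str):
    """Split `key=value` lines into parsed floats and rejected lines."""
    parsed = {}
    rejected = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, raw = line.partition("=")
        if not sep:
            # No "=" at all: not a stat line.
            rejected.append(line)
            continue
        try:
            parsed[key] = float(raw)
        except ValueError:
            # "0.-402022313" is not a valid float literal, so it lands here.
            rejected.append(line)
    return parsed, rejected


sample = """\
thread1.recursion.time.avg=0.-402022313
thread1.recursion.time.median=0.000276483
"""

parsed, rejected = parse_stats(sample)
print(parsed)
print(rejected)
```

Collecting rejected lines this way keeps the well-formed stats usable while still making the malformed ones visible.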

It's interesting that avg is negative but median is not. That may help us work out what happened.

thread1.recursion.time.avg=0.-402022313
thread1.recursion.time.median=0.000276483

I can try to reproduce and find more information if required. Unfortunately the running binary didn't have debug symbols so I couldn't extract all local variables with gdb.

To reproduce

  1. Not sure yet, will try to isolate the issue and report more stats later.

Expected behavior

  1. Report time stats in the correct format.
  2. Make sure time stats are never negative.

System:

  • Unbound version: 1.7.3
  • OS: CentOS

I know it's an old version, but it doesn't look like the metrics code has changed much since then. I couldn't find anything related in the changelog either.

Thanks!

@wcawijngaards
Member

The commit fixes the issue by making sure that the values are not negative. A value that would be negative is set to 0 instead.

It also casts the size_t to long long, to stop what may be an integer overflow producing the negative numbers in the division function. This likely stops the wrong statistics from being reported, but I do not know for sure; I guess it was related to a 32-bit compile. Thanks for the report!
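
To illustrate the kind of failure guessed at above, here is a Python sketch that simulates 32-bit wraparound when leftover seconds are scaled to microseconds before dividing. It then applies the two measures from this comment: doing the arithmetic in a wider type and clamping negative results to 0. The function names, the layout of the division, and the sample numbers are all assumptions for illustration; this is not the code from the commit.

```python
def wrap_int32(value: int) -> int:
    """Reinterpret an integer as a signed 32-bit value (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def divide_narrow(sum_sec: int, sum_usec: int, count: int):
    """Average a (sec, usec) total with the scaling step done in 32 bits."""
    avg_sec = sum_sec // count
    leftover = sum_sec - avg_sec * count
    # Simulated 32-bit multiply: large leftovers wrap around to a negative.
    scaled = wrap_int32(leftover * 1_000_000)
    avg_usec = sum_usec // count + scaled // count
    return avg_sec, avg_usec


def divide_wide(sum_sec: int, sum_usec: int, count: int):
    """Same average with wide arithmetic and negative results clamped to 0."""
    avg_sec = sum_sec // count
    leftover = sum_sec - avg_sec * count
    # Python integers do not overflow, standing in for a 64-bit type here.
    avg_usec = sum_usec // count + (leftover * 1_000_000) // count
    return max(avg_sec, 0), max(avg_usec, 0)


# Hypothetical totals: 3000 seconds of recursion time over 4000 queries.
narrow = divide_narrow(3000, 0, 4000)
wide = divide_wide(3000, 0, 4000)

print(narrow)
print(wide)
```

With these sample numbers the narrow version yields a negative microsecond field, while the wide version yields 750000 microseconds, which matches 3000 s / 4000 = 0.75 s.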

@jaseemabid
Author

Whoa! That's a pretty quick fix @wcawijngaards. Thank you so much 🍻

jedisct1 added a commit to jedisct1/unbound that referenced this issue May 9, 2021
* nlnet/master:
  - Remove case fallthrough from deprecate-rsa-1024 code.
  - Add ./configure --with-deprecate-rsa-1024 that turns off RSA 1024.
  - Fix NLnetLabs#485: Unbound occasionally reports broken stats.
  - Rerun flex and bison.
  - Fix to squelch tcp socket bind failures when the interface is gone.
  - Add more logging for out-of-memory cases.
  - Fix for NLnetLabs#367: only attempt to get the interface for queries   that are no longer on the tcp_waiting_list.
  Clearer template text since not everyone can reopen GitHub issues.
  Changelog note for NLnetLabs#478 - Merge NLnetLabs#478: Allow configuration of TCP timeout while waiting for   response.
  Changelog note and improved comment. - Fix NLnetLabs#481: Fix comment in configuration file.
  doc/example.conf.in: Clarify comment for `auto-trust-anchor-file`
  - Add that log-servfail prints an IP address and more information   about one of the last failures for that query.
  Allow configuration of TCP timeout while waiting for response
  Create issue templates
  - Fix compiler warning for signed/unsigned comparison for   max_reuse_tcp_queries.
  - Fix NLnetLabs#474: always_null and others inside view.