dnsdist: carbon - inaccurate timestamps when using a short reporting interval #11216

Closed
rygl opened this issue Jan 19, 2022 · 2 comments

Comments

@rygl

rygl commented Jan 19, 2022

  • Program: dnsdist
  • Issue type: Bug report

Short description

When carbon reporting is configured with a short reporting interval (about 5s), the timestamps sent in the carbon protocol are not accurate. This causes a problem on the carbon server: with a storage resolution of 5s, data points whose timestamps do not match the 5s pattern are not accepted, which leaves empty datapoints and spoils the graphs.
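
To illustrate why drifting timestamps leave holes, here is a minimal sketch of a fixed-resolution store. It assumes the receiver files each point into the 5s slot obtained by rounding its timestamp down, which is a simplified model and not something taken from this report; `slot_start` and `empty_slots` are hypothetical helper names.

```python
def slot_start(timestamp: int, step: int = 5) -> int:
    """Round a Unix timestamp down to the start of its storage slot."""
    return timestamp - (timestamp % step)


def empty_slots(timestamps: list[int], step: int = 5) -> list[int]:
    """Return the slot starts between the first and last point that got no data."""
    filled = {slot_start(ts, step) for ts in timestamps}
    first, last = min(filled), max(filled)
    return [s for s in range(first, last + step, step) if s not in filled]


# Points exactly 5s apart fill every slot.
regular = [100, 105, 110, 115]
# Same series, but the last gap is 6s instead of 5s (assumed drift).
drifted = [104, 109, 114, 120]

print(empty_slots(regular))  # []
print(empty_slots(drifted))  # [115]
```

In this model a single 6s gap is enough to skip one slot entirely, which would show up as an empty datapoint in the graph.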

Environment

  • Operating system: Debian 11.2
  • Software version: 1.7.0, 1.6.1
  • Software source: PowerDNS repository

Steps to reproduce

  1. Configure a dnsdist instance with a rather complex configuration
  2. Configure a carbon server with a reporting interval of 5s
  3. Watch the data stored in whisper db files or capture the carbon traffic

Expected behaviour

The timestamps are strictly 5s apart.

Actual behaviour

A sample decoded from the TCP stream:
dnsdist.rzt-dns-lb1.main.responses 54052979 1642623170 ~ Wed 19 Jan 2022 09:12:50 PM CET
dnsdist.rzt-dns-lb1.main.responses 54030274 1642623159 ~ Wed 19 Jan 2022 09:12:39 PM CET
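
The spacing can be checked directly from a captured stream. This sketch parses the two carbon plaintext lines from the sample above (`path value timestamp`) and reports the gap between their timestamps; `parse_line` is a hypothetical helper, and only the first three whitespace-separated fields are used.

```python
SAMPLE = """\
dnsdist.rzt-dns-lb1.main.responses 54052979 1642623170
dnsdist.rzt-dns-lb1.main.responses 54030274 1642623159
"""


def parse_line(line: str) -> tuple[str, int, int]:
    """Split a carbon plaintext line into (path, value, timestamp)."""
    path, value, timestamp = line.split()[:3]
    return path, int(value), int(timestamp)


# Sort by timestamp so the gap is computed in chronological order.
points = sorted(
    (parse_line(line) for line in SAMPLE.splitlines() if line.strip()),
    key=lambda point: point[2],
)
gap = points[1][2] - points[0][2]

print(gap)      # 11
print(gap % 5)  # 1
```

A gap of 11s is not a multiple of 5, so at least one of the two points is off the 5s pattern.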

A restart does not help here.

Other information

Interestingly, this happens at an interval of about 6 minutes: every 6 minutes there is a shifted timestamp. It does not seem to occur with a reporting interval of 10s, and not all of the dnsdist instances we use are affected. It is more likely related to the complexity of the configuration (number of listeners etc.) than to the load: instances with nearly no traffic are affected the same way as instances with thousands of qps.

Both dnsdist and the carbon receiver are synchronized to a GPS NTP source.
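
The roughly 6 minute cadence would be consistent with a small fixed delay accumulating on every report. As a back-of-the-envelope sketch, assuming each cycle lasts the 5s interval plus a constant overhead and that timestamps are whole seconds (both assumptions, not measurements from this report), the overhead that would add up to one extra second every 6 minutes can be estimated as follows; `overhead_for_shift_period` is a hypothetical name.

```python
def overhead_for_shift_period(interval_s: float, shift_period_s: float) -> float:
    """Per-cycle overhead d that accumulates to 1s every shift_period_s seconds.

    Each cycle lasts interval_s + d, so the 1/d cycles needed to accumulate
    one second take (interval_s + d) / d seconds in total. Setting that equal
    to shift_period_s and solving gives d = interval_s / (shift_period_s - 1).
    """
    return interval_s / (shift_period_s - 1.0)


overhead = overhead_for_shift_period(interval_s=5.0, shift_period_s=360.0)
print(f"{overhead * 1000:.1f} ms")  # 13.9 ms
```

Under these assumptions, an overhead of only about 14 ms per cycle would be enough to produce the observed pattern.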

@rgacogne
Member

I think the root cause is that we sleep for the configured interval between two runs, regardless of how long a run took. We already know that this causes issues when a server is unreachable, but it might also become an issue if a single run takes too long.
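
The difference between sleeping a fixed interval after each run and scheduling against absolute deadlines can be sketched with a simulated clock. This is an illustration of the behaviour described above, not dnsdist's code and not necessarily what any fix does; the function names and the 0.25s run duration are assumptions chosen to make the drift visible quickly.

```python
def fixed_sleep_timestamps(interval: float, run_time: float, runs: int) -> list[int]:
    """Timestamps when every run is followed by a full-interval sleep."""
    now = 0.0
    stamps = []
    for _ in range(runs):
        stamps.append(int(now))  # whole-second timestamp taken when the run starts
        now += run_time          # the run itself takes some time
        now += interval          # then sleep the whole interval regardless
    return stamps


def deadline_timestamps(interval: float, run_time: float, runs: int) -> list[int]:
    """Timestamps when each run sleeps only until the next absolute deadline."""
    now = 0.0
    next_deadline = 0.0
    stamps = []
    for _ in range(runs):
        stamps.append(int(now))
        now += run_time
        next_deadline += interval
        now = max(now, next_deadline)  # sleep the remainder, never a negative time
    return stamps


print(fixed_sleep_timestamps(5.0, 0.25, 6))  # [0, 5, 10, 15, 21, 26]
print(deadline_timestamps(5.0, 0.25, 6))     # [0, 5, 10, 15, 20, 25]
```

With the fixed sleep, the run time is added to every cycle, so the whole-second timestamp eventually jumps by 6s instead of 5s. With deadline scheduling the run time is absorbed as long as a run finishes within the interval.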

@rgacogne
Member

Fixed by #12424.
