Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dnsdist: Add option to set interval between health checks #7142

Merged
merged 1 commit into from Feb 11, 2019
Merged

dnsdist: Add option to set interval between health checks #7142

merged 1 commit into from Feb 11, 2019

Conversation

@1848
Copy link

@1848 1848 commented Nov 4, 2018

Short description

Add option 'checkInterval' for newServer() to set the interval between
health checks.

Checklist

I have:

  • [yes] read the CONTRIBUTING.md document
  • [yes] compiled this code
  • [yes] tested this code
  • [yes] included documentation (including possible behaviour changes)
  • [yes] documented the code
  • [no] added or modified regression test(s)
  • [no] added or modified unit test(s)

Hi,
this is a small patch to allow users to set the interval between health checks for each server.
Did a quick test on my setup which worked as expected.
I hope I did not forget any documentation?

Best regards

@@ -1947,6 +1947,9 @@ static void* healthChecksThread()

auto states = g_dstates.getLocal(); // this points to the actual shared_ptrs!
for(auto& dss : *states) {
if(++dss->lastCheck < dss->checkInterval)

This comment has been minimized.

@RobinGeuze

RobinGeuze Nov 5, 2018
Contributor

A little above this is a sleep which takes an interval as a parameter. Would be much nicer to just change the sleep duration instead of this awkward method. Also maybe replace the sleep with a usleep so you can also go for more fine-grained checks if you want to.

This comment has been minimized.

@rgacogne

rgacogne Nov 5, 2018
Member

I'm not sure I agree, the current code is easy to understand, while the one you suggest would require a bit more logic to keep track of the next health check regardless of the backend (to know for how long we need to sleep) AND the next health check per backend. It might also lead to unexpected surprises when a backend has just been added, for example if the new backend has to be checked every second but the existing ones were only checked every 60s, we might be in the middle of a 60s sleep..

This comment has been minimized.

@RobinGeuze

RobinGeuze Nov 5, 2018
Contributor

Oh right, there is one loop for multiple backends, in that case this is easier/better indeed.

This comment has been minimized.

@1848

1848 Nov 5, 2018
Author

Thought about changing that sleep in two ways.

  1. make checkInterval global for all servers so that sleep(1) would become sleep(checkInterval). But I assume most people (including me) want different intervals for the servers.
  2. remove the sleep(1) and create logic to always wait for the next coming check. But that would add much more code as rgacogne said correctly. Always waking up after 1 second is not a pretty solution but I guess the code/value ratio makes this "ok" :)
    I think there is room to improve server checking in the future. So maybe a overhaul might be the right choice later on.

This comment has been minimized.

@RobinGeuze

RobinGeuze Nov 5, 2018
Contributor

Ye I agree, this is the best solution without completely overhauling this.

Copy link
Member

@rgacogne rgacogne left a comment

Thanks for this PR, one nit but looks good to me otherwise!

@@ -698,6 +698,8 @@ struct DownstreamState
int tcpConnectTimeout{5};
int tcpRecvTimeout{30};
int tcpSendTimeout{30};
int checkInterval{1};
int lastCheck{0};

This comment has been minimized.

@rgacogne

rgacogne Nov 5, 2018
Member

Using unsigned int instead of int for these two would be nicer.

This comment has been minimized.

@1848

1848 Nov 5, 2018
Author

I went for int because I am using std::stoi and there is no std::stoU... maybe I was a bit lazy.
But you are right, I will update this commit.

This comment has been minimized.

@rgacogne

rgacogne Nov 5, 2018
Member

We do have pdns_stou() in misc.hh, which throws std::exception on invalid content.

@pavel-odintsov
Copy link

@pavel-odintsov pavel-odintsov commented Nov 5, 2018

Oh, that's very useful! Thanks!

Add option 'checkInterval' for newServer() to set the interval between
health checks.
@1848
Copy link
Author

@1848 1848 commented Nov 5, 2018

Updated.

First commit is the initial commit with the fix requested by @rgacogne (sorry I read your comment about pdns_stou() too late, will fix later).
Second commit is more like a RFC...

What do you think? Anything else you would like to change besides pdns_stou()?

@rgacogne
Copy link
Member

@rgacogne rgacogne commented Nov 6, 2018

So I would be fine with merging a branch with only the first commit.

The second one is a nice attempt at introducing a better granularity but IMHO this would need much more work. Waking up every second is not an issue, so the only benefit I could see would be sub-second granularity but because the health check thread is currently blocking while checking a backend, which might take up to 1s (the duration of the currently hardcoded timeout), it seems a bit useless to try implementing a sub-second granularity without fixing that at the same time. We should probably make that timeout configurable, and perhaps consider spawning a new thread to do the actual blocking check or use a multiplexer. This is 1.4.0 material, though.

@1848
Copy link
Author

@1848 1848 commented Nov 6, 2018

Yes I think multiplexing would be the right approach, I am thinking about some timers+epoll stuff. But as you said, thats out of this scope.
Last open question: would you prefer pdns_stou() over the current solution (not sure if this little thing is worth including another file)? If so I would do a final update to the commit.

@rgacogne
Copy link
Member

@rgacogne rgacogne commented Nov 6, 2018

Yes I think multiplexing would be the right approach, I am thinking about some timers+epoll stuff.

If you ever want to start working on that, please use our multiplexer code in mplexer.hh, although I think multiplexing might be a bit overkill for this, but let's see.

Last open question: would you prefer pdns_stou() over the current solution (not sure if this little thing is worth including another file)? If so I would do a final update to the commit.

I'm fine with the current code.

@Habbie
Copy link
Member

@Habbie Habbie commented Feb 7, 2019

Ping! If I understand the discussion correctly - can you limit this PR to the first commit so we can merge it?

@1848
Copy link
Author

@1848 1848 commented Feb 10, 2019

Sorry, I was sure I reverted the second commit already.

Copy link
Member

@rgacogne rgacogne left a comment

Looks good to me, thanks!

@rgacogne rgacogne merged commit 9704e9a into PowerDNS:master Feb 11, 2019
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants