Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dnsdist: Add option to set interval between health checks #7142

Merged
merged 1 commit into from Feb 11, 2019

Conversation

Projects
None yet
5 participants
@1848
Copy link

commented Nov 4, 2018

Short description

Add option 'checkInterval' for newServer() to set the interval between
health checks.

Checklist

I have:

  • [yes] read the CONTRIBUTING.md document
  • [yes] compiled this code
  • [yes] tested this code
  • [yes] included documentation (including possible behaviour changes)
  • [yes] documented the code
  • [no] added or modified regression test(s)
  • [no] added or modified unit test(s)

Hi,
this is a small patch to allow users to set the interval between health checks for each server.
Did a quick test on my setup which worked as expected.
I hope I did not forget any documentation?

Best regards

@@ -1947,6 +1947,9 @@ static void* healthChecksThread()

auto states = g_dstates.getLocal(); // this points to the actual shared_ptrs!
for(auto& dss : *states) {
if(++dss->lastCheck < dss->checkInterval)

This comment has been minimized.

Copy link
@RobinGeuze

RobinGeuze Nov 5, 2018

Contributor

A little above this is a sleep which takes an interval as a parameter. Would be much nicer to just change the sleep duration instead of this awkward method. Also maybe replace the sleep with a usleep so you can also go for more fine-grained checks if you want to.

This comment has been minimized.

Copy link
@rgacogne

rgacogne Nov 5, 2018

Member

I'm not sure I agree, the current code is easy to understand, while the one you suggest would require a bit more logic to keep track of the next health check regardless of the backend (to know for how long we need to sleep) AND the next health check per backend. It might also lead to unexpected surprises when a backend has just been added, for example if the new backend has to be checked every second but the existing ones were only checked every 60s, we might be in the middle of a 60s sleep..

This comment has been minimized.

Copy link
@RobinGeuze

RobinGeuze Nov 5, 2018

Contributor

Oh right, there is one loop for multiple backends, in that case this is easier/better indeed.

This comment has been minimized.

Copy link
@1848

1848 Nov 5, 2018

Author

Thought about changing that sleep in two ways.

  1. make checkInterval global for all servers so that sleep(1) would become sleep(checkInterval). But I assume most people (including me) want different intervals for the servers.
  2. remove the sleep(1) and create logic to always wait for the next coming check. But that would add much more code as rgacogne said correctly. Always waking up after 1 second is not a pretty solution but I guess the code/value ratio makes this "ok" :)
    I think there is room to improve server checking in the future. So maybe a overhaul might be the right choice later on.

This comment has been minimized.

Copy link
@RobinGeuze

RobinGeuze Nov 5, 2018

Contributor

Ye I agree, this is the best solution without completely overhauling this.

@rgacogne
Copy link
Member

left a comment

Thanks for this PR, one nit but looks good to me otherwise!

@@ -698,6 +698,8 @@ struct DownstreamState
int tcpConnectTimeout{5};
int tcpRecvTimeout{30};
int tcpSendTimeout{30};
int checkInterval{1};
int lastCheck{0};

This comment has been minimized.

Copy link
@rgacogne

rgacogne Nov 5, 2018

Member

Using unsigned int instead of int for these two would be nicer.

This comment has been minimized.

Copy link
@1848

1848 Nov 5, 2018

Author

I went for int because I am using std::stoi and there is no std::stoU... maybe I was a bit lazy.
But you are right, I will update this commit.

This comment has been minimized.

Copy link
@rgacogne

rgacogne Nov 5, 2018

Member

We do have pdns_stou() in misc.hh, which throws std::exception on invalid content.

@pavel-odintsov

This comment has been minimized.

Copy link

commented Nov 5, 2018

Oh, that's very useful! Thanks!

1848
Add option to set interval between health checks
Add option 'checkInterval' for newServer() to set the interval between
health checks.

@1848 1848 force-pushed the 1848:checkinterval branch from 05e22ae to bed7696 Nov 5, 2018

@1848

This comment has been minimized.

Copy link
Author

commented Nov 5, 2018

Updated.

First commit is the initial commit with the fix requested by @rgacogne (sorry I read your comment about pdns_stou() too late, will fix later).
Second commit is more like a RFC...

What do you think? Anything else you would like to change besides pdns_stou()?

@rgacogne

This comment has been minimized.

Copy link
Member

commented Nov 6, 2018

So I would be fine with merging a branch with only the first commit.

The second one is a nice attempt at introducing a better granularity but IMHO this would need much more work. Waking up every second is not an issue, so the only benefit I could see would be sub-second granularity but because the health check thread is currently blocking while checking a backend, which might take up to 1s (the duration of the currently hardcoded timeout), it seems a bit useless to try implementing a sub-second granularity without fixing that at the same time. We should probably make that timeout configurable, and perhaps consider spawning a new thread to do the actual blocking check or use a multiplexer. This is 1.4.0 material, though.

@1848

This comment has been minimized.

Copy link
Author

commented Nov 6, 2018

Yes I think multiplexing would be the right approach, I am thinking about some timers+epoll stuff. But as you said, thats out of this scope.
Last open question: would you prefer pdns_stou() over the current solution (not sure if this little thing is worth including another file)? If so I would do a final update to the commit.

@rgacogne

This comment has been minimized.

Copy link
Member

commented Nov 6, 2018

Yes I think multiplexing would be the right approach, I am thinking about some timers+epoll stuff.

If you ever want to start working on that, please use our multiplexer code in mplexer.hh, although I think multiplexing might be a bit overkill for this, but let's see.

Last open question: would you prefer pdns_stou() over the current solution (not sure if this little thing is worth including another file)? If so I would do a final update to the commit.

I'm fine with the current code.

@Habbie

This comment has been minimized.

Copy link
Member

commented Feb 7, 2019

Ping! If I understand the discussion correctly - can you limit this PR to the first commit so we can merge it?

@1848 1848 force-pushed the 1848:checkinterval branch from bed7696 to 7c9bf18 Feb 10, 2019

@1848

This comment has been minimized.

Copy link
Author

commented Feb 10, 2019

Sorry, I was sure I reverted the second commit already.

@rgacogne
Copy link
Member

left a comment

Looks good to me, thanks!

@rgacogne rgacogne merged commit 9704e9a into PowerDNS:master Feb 11, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.