Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dnsdist: Force a reconnection when a downstream transitions to the UP state #9275

Merged
merged 3 commits into from
Aug 10, 2020
Merged

dnsdist: Force a reconnection when a downstream transitions to the UP state #9275

merged 3 commits into from
Aug 10, 2020

Conversation

Nuitari
Copy link
Contributor

@Nuitari Nuitari commented Jun 26, 2020

Force a reconnection when a downstream transitions to the UP state

Short description

Force the reconnection of a downstream when a server comes back up.

Checklist

I have:

  • read the CONTRIBUTING.md document
  • compiled this code
  • tested this code
  • included documentation (including possible behaviour changes)
  • documented the code
  • added or modified regression test(s)
  • added or modified unit test(s)

Force a reconnection when a downstream transitions to the UP state
@Habbie Habbie changed the title Closes #5220 dnsdist: Force a reconnection when a downstream transitions to the UP state Jun 26, 2020
Copy link
Member

@rgacogne rgacogne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for submitting this pull request! I'm a bit cautious about reconnecting the socket(s) every time we transition from down to up, however, because it might have a negative impact (responder thread failing to send at least one response, queued TCP connections being lost). If I read the IRC backlog correctly, it looks like in your case the backend was never considered up? It would probably make sense to reconnect in that case indeed.
Out of curiosity, do you know what sendmsg() returned (in udpClientSendRequestToBackend()) before reconnecting the socket? We try to detect such a network configuration change there already, and perhaps we could enhance our logic.

@rgacogne rgacogne added this to the dnsdist-1.6.0 milestone Jun 26, 2020
@Nuitari
Copy link
Contributor Author

Nuitari commented Jun 26, 2020

Thank you for submitting this pull request! I'm a bit cautious about reconnecting the socket(s) every time we transition from down to up, however, because it might have a negative impact (responder thread failing to send at least one response, queued TCP connections being lost). If I read the IRC backlog correctly, it looks like in your case the backend was never considered up? It would probably make sense to reconnect in that case indeed.

In that case the backend was never up, and would come up when a vpn connected. tcpdump would show the packet coming out of the correct interface but with the wrong source IP.
We also saw similar issues where the interface for a server changed (think VM migrations).

I have no way to test those cases with TCP, but I would assume that a down backend would not have a response waiting to be sent upstream. I don't know if dnsdist would redispatch things to another server in a pool if a healthcheck fails.

Does dnsdist open multiple tcp connections to a backend outside of healthchecks?

Out of curiosity, do you know what sendmsg() returned (in udpClientSendRequestToBackend()) before reconnecting the socket? We try to detect such a network configuration change there already, and perhaps we could enhance our logic.

I did not check, however I can infer that it was not -1.
The reason is that the error message in the logs matched the one at line 1698 in dnsdist.cc which is at the info log level.
If the return was -1, there should be an info log entry with "Error sending request to backend", which we never saw.

@rgacogne
Copy link
Member

In that case the backend was never up, and would come up when a vpn connected. tcpdump would show the packet coming out of the correct interface but with the wrong source IP.

OK, I completely agree it would make sense to reconnect when the backend was never up.
Out of curiosity, did you try specifying the source IP via the source parameter? It might not work if the IP doesn't exist yet in the system, though.

We also saw similar issues where the interface for a server changed (think VM migrations).

Right, I can see how this is an issue.

I have no way to test those cases with TCP, but I would assume that a down backend would not have a response waiting to be sent upstream. I don't know if dnsdist would redispatch things to another server in a pool if a healthcheck fails.
Does dnsdist open multiple tcp connections to a backend outside of healthchecks?

I did not think this through last time but we don't prepare sockets for TCP in that case, of course, and already established connections will not be impacted by that change. Only UDP sockets are set up in advance for performance reasons.

So I think there are two different cases:

  • if the backend was never up before, it should not hurt to reconnect the sockets right away when it does get up ;
  • in the general case, I don't think we should reconnect the sockets on a backend going down and up, since we have seen setups with flaky backends going up and down all the time, and recreating the sockets is not cheap (we need to remove them from the IO multiplexer, close them, create new ones and register them to the IO multiplexer, then wait for the receiving thread to realize the sockets have been closed and switch to the new ones). On the other hand I understand this might be useful for some setups where interfaces come and go (VPN, VM migrations, ...) so in my opinion it would make sense to make that automatic re-connection optional, by adding an option (reconnectOnUp?) to newServer and a boolean to DownstreamState.

@Nuitari
Copy link
Contributor Author

Nuitari commented Jul 23, 2020

Out of curiosity, did you try specifying the source IP via the source parameter? It might not work if the IP doesn't exist yet in the system, though.

The IP is not always known either in some of these cases.

* in the general case, I don't think we should reconnect the sockets on a backend going down and up, since we have seen setups with flaky backends going up and down all the time, and recreating the sockets is not cheap (we need to remove them from the IO multiplexer, close them, create new ones and register them to the IO multiplexer, then wait for the receiving thread to realize the sockets have been closed and switch to the new ones). On the other hand I understand this might be useful for some setups where interfaces come and go (VPN, VM migrations, ...) so in my opinion it would make sense to make that automatic re-connection optional, by adding an option (`reconnectOnUp`?) to `newServer` and a boolean to `DownstreamState`.

I've added the option to this PR.

Copy link
Member

@rgacogne rgacogne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this pull request! I'm requesting just one change since the proposed code would not properly connect a socket on the first transition to Up ever if reconnectOnUp is set to false.

pdns/dnsdistdist/dnsdist-healthchecks.cc Outdated Show resolved Hide resolved
Co-authored-by: Remi Gacogne <rgacogne+github@valombre.net>
Copy link
Member

@rgacogne rgacogne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@rgacogne rgacogne merged commit 573dc48 into PowerDNS:master Aug 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants