dnsdist: unreachable DoH downstream causes dnsdist to hang on startup until a connection timeout is reached #11250

Closed
sbiberhofer opened this issue Jan 29, 2022 · 3 comments · Fixed by #11253

Comments

@sbiberhofer

  • Program: dnsdist
  • Issue type: Bug report

Short description

If a DoH downstream is configured using newServer({address="ip:443", tls="openssl", subjectName="dns.domain.tld", dohPath="/query"}) and the IP is unreachable, dnsdist hangs on startup until a connection timeout is reached. During this time, dnsdist is completely unresponsive and doesn't process client requests, even though other downstream backends are available.

Environment

  • Operating system: Voidlinux
  • Software version: 1.7.0
  • Software source: Operating system repository

Steps to reproduce

  1. Add a DoH backend via newServer({address="IP:443", tls="openssl", subjectName="dns.domain.tld", dohPath="/query"})
  2. Ensure that the IP is unreachable and connections to that IP run into a timeout
  3. Start dnsdist
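
For reference, the steps above could be written as a minimal configuration along these lines. This is only a sketch: the listen address and both backend addresses are placeholders drawn from documentation ranges (they are not from the original report) and need to be replaced with a backend that answers and an IP that silently drops packets so that connections time out. Only the newServer parameters shown in the report are used.

```lua
-- Hypothetical reproduction config; all addresses below are placeholders.

-- Assumed local listen address, chosen only for testing.
setLocal("127.0.0.1:5300")

-- A plain DNS backend that is expected to be reachable (placeholder IP).
newServer({address="192.0.2.53:53"})

-- The DoH backend from the report, pointed at an IP that does not answer
-- (placeholder IP; substitute one where connections run into a timeout).
newServer({
  address="198.51.100.1:443",
  tls="openssl",
  subjectName="dns.domain.tld",
  dohPath="/query"
})
```

With a configuration like this, the expectation would be that queries are served by the first backend while the second is marked as down.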

Expected behaviour

dnsdist should continue starting and mark the downstream as unavailable until a response has been obtained.

Actual behaviour

dnsdist stalls while the backend is being checked, until a connection timeout is reached:

Exception while trying to write (ready) to HTTP backend connection: Syscall error while processing TLS connection: Connection timed out

Afterwards, startup resumes as normal. During the wait, dnsdist is unresponsive and doesn't accept client connections. DoT downstream backends do not suffer from this problem.

@Habbie
Member

Habbie commented Jan 29, 2022

I tested this, and even the > console does not show up until that first timeout is reached.

@rgacogne
Member

I doubt this is related to DoH. I have not looked into it, but I would not be surprised if this is a direct consequence of the fact that we want to get the initial status of backends before starting to accept queries, so we do a first health-check pass early when the program starts, and I'm not sure we want to change that. It might be an indication that our default timeouts are not great, though.

@rgacogne
Member

I was wrong! The DoH health-check was not correctly using the timeout value (milliseconds vs seconds) so our "initial health-check at startup" behaviour was indeed much more painful for outgoing DoH. #11253 fixes that.
