Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dnsdist: Use non-blocking pipes to pass DoH queries/responses around #9211

Merged
merged 4 commits into from Jun 10, 2020

Conversation

rgacogne
Copy link
Member

@rgacogne rgacogne commented Jun 8, 2020

Short description

This commit makes the internal sockets non-blocking so we don't freeze if they ever fill up, and log errors/increment metrics instead.

It also replaces the socket pairs by pipes, since the default buffer size for sockets seems to allow only ~278 pending queries which might be reached given how libh2o batches events. On Linux, a pipe gives us 8192 pending queries by default due to the lower overhead, and it can easily be incremented to 131072 pending queries by setting the pipe size to 1048576. This commits adds a new setting to do just that.

Might help with the issue reported in #9206.

Checklist

I have:

  • read the CONTRIBUTING.md document
  • compiled this code
  • tested this code
  • included documentation (including possible behaviour changes)
  • documented the code
  • added or modified regression test(s)
  • added or modified unit test(s)

This commit makes the internal sockets non-blocking so we don't freeze if
they ever fill up, and log errors/increment metrics instead.

It also replaces the socket pairs by pipes, since the default buffer
size for sockets seems to allow only ~278 pending queries which might
be reached given how libh2o batches events. On Linux, a pipe gives us
8192 pending queries by default due to the lower overhead, and it
can easily be incremented to 131072 pending queries by setting the
pipe size to 1048576. This commits adds a new setting to do just
that.
@rgacogne rgacogne requested a review from omoerbeek Jun 8, 2020
@omoerbeek
Copy link
Member

omoerbeek commented Jun 9, 2020

This looks good, although on OpenBSD en FreeBSD there is not way to set the pipe buffer size, while a socketpair has standardized socket options to set both receive and send buffer size. So I'm not 100% convinced moving to a pipe is the best way.

In the meantime I'll do some tests on OpenBSD.

@rgacogne
Copy link
Member Author

rgacogne commented Jun 9, 2020

This looks good, although on OpenBSD en FreeBSD there is not way to set the pipe buffer size, while a socketpair has standardized socket options to set both receive and send buffer size. So I'm not 100% convinced moving to a pipe is the best way.

The thing is that, on Linux at least, the default value for a socket pair allows up to ~277 queries. Increasing it to the maximum size without changing system-wide settings via sysctl yields a maximum of ~554 queries in my tests, ~87382 after a lot of system tuning.
The default value for a pipe is 8192 queries on Linux, 131072 by setting the pipe buffer size to 1048576 which requires no system tuning. I would also not be surprised if there was more overhead than just the amount of memory when sockets are used instead of pipes. And for what it's worth we have been using a pipe for that task in the recursor for quite some time now :-)

@omoerbeek
Copy link
Member

omoerbeek commented Jun 9, 2020

Could it be that the limits to buffer sizes are because the socketpair created specifies SOCK_DGRAM? I think a SOCK_STREAM socketpair would not have those strict limits.

Come to think of it, it even might explain the root-cause of the issue: datagram socketpairs are not reliable, I suspect even for local sockets.

@rgacogne
Copy link
Member Author

rgacogne commented Jun 9, 2020

Could it be that the limits to buffer sizes are because the socketpair created specifies SOCK_DGRAM? I think a SOCK_STREAM socketpair would not have those strict limits.

Writing and reading to a SOCK_STREAM socket, even a local one, is not atomic. It is for a SOCK_DGRAM one.

@rgacogne
Copy link
Member Author

rgacogne commented Jun 9, 2020

Come to think of it, it even might explain the root-cause of the issue: datagram socketpairs are not reliable, I suspect even for local sockets.

UNIX domain sockets are usually reliable even in datagram mode on most UNIX implementations, although I don't think this is actually required by POSIX.

@omoerbeek
Copy link
Member

omoerbeek commented Jun 9, 2020

There is also SOCK_SEQPACKET, but I agree, it is better to move to a pipe. Noting that atomicity is only guaranteed for writes up to PIPE_BUF, but sizeof(uw) is likely much smaller. A static_assert might be in place though.

@omoerbeek
Copy link
Member

omoerbeek commented Jun 9, 2020

As for reliability of AF_UNIX datagram socket pairs, Posix does not require that indeed.

@@ -1936,6 +1936,10 @@ static void setupLuaConfig(bool client, bool configCheck)
frontend->d_trustForwardedForHeader = boost::get<bool>((*vars)["trustForwardedForHeader"]);
}

if (vars->count("internalPipeBufferSize")) {
frontend->d_internalPipeBufferSize = boost::get<int>((*vars)["internalPipeBufferSize"]);
Copy link
Member

@omoerbeek omoerbeek Jun 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
frontend->d_internalPipeBufferSize = boost::get<int>((*vars)["internalPipeBufferSize"]);
frontend->d_internalPipeBufferSize = boost::get<uint32_t>((*vars)["internalPipeBufferSize"]);

Copy link
Member Author

@rgacogne rgacogne Jun 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not compile because the uint32_t type is not present in that variant. I could add it but I would prefer keeping the number of types limited because it quickly becomes confusing :)

if (errno == EAGAIN || errno == EWOULDBLOCK) {
++g_stats.dohResponsePipeFull;
}

Copy link
Member

@omoerbeek omoerbeek Jun 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe internal erorr on other errors?

Copy link
Member Author

@rgacogne rgacogne Jun 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really know what to do in that case besides releasing the memory.. We can't trigger a response from libh2o because we are not in the right thread for that, and we can't talk to that thread without writing to this pipe. So apart from releasing the memory.. Perhaps I could log an error in verbose mode?

if (sent != sizeof(oldDU)) {
if (errno == EAGAIN || errno == EWOULDBLOCK) {
++g_stats.dohResponsePipeFull;
}
Copy link
Member

@omoerbeek omoerbeek Jun 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe internal error on other errors?

if (sent != sizeof(du)) {
if (errno == EAGAIN || errno == EWOULDBLOCK) {
++g_stats.dohResponsePipeFull;
}
Copy link
Member

@omoerbeek omoerbeek Jun 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here too...

@rgacogne
Copy link
Member Author

rgacogne commented Jun 9, 2020

I added a few static assertions to make sure we don't write more than PIPE_BUF bytes at a time in our pipes.

@rgacogne rgacogne merged commit d7cacb9 into PowerDNS:master Jun 10, 2020
29 checks passed
@rgacogne rgacogne deleted the ddist-doh-non-blocking branch Jun 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants