Reduce number of TLS connections to forwarded (DoT) when using "forward-tls-upstream" #47

Closed
iz8mbw opened this issue Jul 3, 2019 · 35 comments · Fixed by #283

Comments

@iz8mbw

iz8mbw commented Jul 3, 2019

Hi.
First of all, thank you very much for your work on Unbound!
I mainly use Unbound as a forwarder to Cloudflare (1.1.1.1 and 1.0.0.1): Unbound listens on port 53 on my Linux machine and forwards DNS queries to 1.1.1.1:853 over TLS (DNS over TLS), using the "forward-tls-upstream" option.
I have noticed that, for each new lookup, Unbound creates a new connection to 1.1.1.1 on port 853 to process it. I confirmed this with the netstat command.
From a performance point of view, I expect that if Unbound established only one TLS connection (or at least far fewer) to 1.1.1.1:853 and kept it open, DNS queries would be processed faster, because Unbound would not need to set up a new TLS connection to 1.1.1.1/1.0.0.1 each time.
Is it possible to configure Unbound to use one and the same connection to 1.1.1.1:853 for all DNS-over-TLS queries?
If not, is there a plan to add such an option to Unbound?
Many thanks for your time.

This is the output of the "netstat" command, showing how many connections Unbound makes to the DoT forwarder (Cloudflare in this case). In this example only one client is connected to the Unbound DNS server (port 53):

root@server:~# netstat -anp | grep -i 53
tcp        0      0 0.0.0.0:53              0.0.0.0:*               LISTEN     620/unbound
tcp        0      0 192.168.1.10:40436     1.0.0.1:853             TIME_WAIT   -
tcp        0      0 192.168.1.10:48402     1.1.1.1:853             TIME_WAIT   -
tcp        0      0 192.168.1.10:48422     1.1.1.1:853             TIME_WAIT   -
tcp        0      0 192.168.1.10:40402     1.0.0.1:853             TIME_WAIT   -
tcp        0      0 192.168.1.10:48432     1.1.1.1:853             TIME_WAIT   -
tcp        0      0 192.168.1.10:48428     1.1.1.1:853             TIME_WAIT   -
tcp        0      0 192.168.1.10:40408     1.0.0.1:853             TIME_WAIT   -
tcp      225    137 192.168.1.10:48444     1.1.1.1:853             ESTABLISHED 620/unbound
tcp      152      0 192.168.1.10:40456     1.0.0.1:853             ESTABLISHED 620/unbound
tcp        0      0 192.168.1.10:48426     1.1.1.1:853             TIME_WAIT   -
tcp        0      0 192.168.1.10:40410     1.0.0.1:853             TIME_WAIT   -
tcp        0      0 192.168.1.10:48400     1.1.1.1:853             TIME_WAIT   -
tcp        0      0 192.168.1.10:40434     1.0.0.1:853             TIME_WAIT   -
tcp        0      0 192.168.1.10:40404     1.0.0.1:853             TIME_WAIT   -
tcp        0      0 192.168.1.10:40412     1.0.0.1:853             TIME_WAIT   -
tcp        0      0 192.168.1.10:48396     1.1.1.1:853             TIME_WAIT   -
tcp        0      0 192.168.1.10:48430     1.1.1.1:853             TIME_WAIT   -
tcp        0      0 192.168.1.10:48410     1.1.1.1:853             TIME_WAIT   -
tcp        0      0 192.168.1.10:40470     1.0.0.1:853             ESTABLISHED 620/unbound
tcp      377      0 192.168.1.10:48442     1.1.1.1:853             ESTABLISHED 620/unbound
tcp        0      0 192.168.1.10:48420     1.1.1.1:853             TIME_WAIT   -
tcp        0      0 192.168.1.10:40414     1.0.0.1:853             TIME_WAIT   -
tcp        0      0 192.168.1.10:40472     1.0.0.1:853             ESTABLISHED 620/unbound
tcp        0      0 192.168.1.10:40446     1.0.0.1:853             TIME_WAIT   -
tcp        0      0 192.168.1.10:48418     1.1.1.1:853             TIME_WAIT   -
tcp        0      0 192.168.1.10:48416     1.1.1.1:853             TIME_WAIT   -
tcp      385      0 192.168.1.10:48446     1.1.1.1:853             ESTABLISHED 620/unbound
tcp      152      0 192.168.1.10:40460     1.0.0.1:853             ESTABLISHED 620/unbound
tcp        0      0 192.168.1.10:40426     1.0.0.1:853             TIME_WAIT   -
tcp        0      0 192.168.1.10:40420     1.0.0.1:853             TIME_WAIT   -
tcp        0      0 192.168.1.10:48436     1.1.1.1:853             TIME_WAIT   -
tcp      377      0 192.168.1.10:48440     1.1.1.1:853             ESTABLISHED 620/unbound
tcp        0      0 192.168.1.10:48408     1.1.1.1:853             TIME_WAIT   -
tcp        0      0 192.168.1.10:40428     1.0.0.1:853             TIME_WAIT   -
tcp        0      0 192.168.1.10:40416     1.0.0.1:853             TIME_WAIT   -
tcp        0      0 192.168.1.10:40406     1.0.0.1:853             TIME_WAIT   -
udp     6016      0 0.0.0.0:53              0.0.0.0:*                          620/unbound
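
A listing like the one above is easier to compare across runs when it is tallied per upstream and per TCP state. The following is a minimal sketch, assuming `netstat -anp` output in the column layout shown above; the function name `tally_upstream` is made up for illustration and is not part of Unbound or netstat.

```python
from collections import Counter


def tally_upstream(netstat_text: str, port: int = 853) -> Counter:
    """Count TCP sockets per (remote endpoint, state) for one remote port.

    Assumes the column layout of `netstat -anp` shown in this thread:
    proto, recv-q, send-q, local address, foreign address, state, pid/program.
    """
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Only TCP rows carry a state column; skip udp/unix rows and headers.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        remote, state = fields[4], fields[5]
        # Keep only sockets whose foreign address ends in the requested port.
        if remote.rsplit(":", 1)[-1] == str(port):
            counts[(remote, state)] += 1
    return counts


# Three rows copied from the listing above, as a small usage example.
sample = """\
tcp        0      0 0.0.0.0:53              0.0.0.0:*               LISTEN     620/unbound
tcp        0      0 192.168.1.10:40436     1.0.0.1:853             TIME_WAIT   -
tcp      225    137 192.168.1.10:48444     1.1.1.1:853             ESTABLISHED 620/unbound
"""
for (remote, state), n in sorted(tally_upstream(sample).items()):
    print(remote, state, n)
```

Feeding it the full output of `netstat -anp` before and after a change makes it easy to see whether the number of ESTABLISHED and TIME_WAIT sockets per forwarder actually drops.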
@ralphdolmans
Contributor

Hi Fabio,
Upstream connection reuse is not yet supported in Unbound but is something that is on our roadmap for the near future.

@iz8mbw
Author

iz8mbw commented Jul 11, 2019

Hi Ralph.
Many, many thanks. I hope this improvement will be implemented soon.
Thank you!!

@iz8mbw
Author

iz8mbw commented Nov 19, 2019

Hi @ralphdolmans @wcawijngaards
Any news about this improvement?
Thank you!!

@yegle

yegle commented Nov 19, 2019

Just want to chime in on this issue. I've been using Stubby instead of Unbound because of this exact issue. Stubby tries to reuse existing connections as much as possible, which performs better and also saves a lot of duplicate entries in my router's NAT table.
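
What connection reuse means on the wire can be sketched as follows: DNS over TLS carries each message with a two-byte length prefix, so several queries can be sent one after another on a single TLS connection instead of paying for a new handshake per lookup. This is only an illustrative sketch, not how Stubby or Unbound implement it. The helper names (`build_query`, `frame`, `read_exact`, `query_many`) are made up here, and using `one.one.one.one` as the certificate name for 1.1.1.1 is an assumption taken from the names that appear later in this thread.

```python
import socket
import ssl
import struct


def build_query(name: str, qid: int, qtype: int = 1) -> bytes:
    """Build a minimal DNS query (recursion desired, class IN, default type A)."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def frame(msg: bytes) -> bytes:
    """Prefix a DNS message with its two-byte length, as used over TCP and TLS."""
    return struct.pack("!H", len(msg)) + msg


def read_exact(sock, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer closes the connection."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("upstream closed the connection")
        buf += chunk
    return buf


def query_many(names, server="1.1.1.1", hostname="one.one.one.one",
               port=853, timeout=5.0):
    """Send every query over ONE TLS connection; return name -> DNS rcode."""
    ctx = ssl.create_default_context()
    rcodes = {}
    with socket.create_connection((server, port), timeout=timeout) as raw:
        # hostname is an assumption: it must match the upstream's certificate.
        with ctx.wrap_socket(raw, server_hostname=hostname) as tls:
            for qid, name in enumerate(names, start=1):
                tls.sendall(frame(build_query(name, qid)))
                (length,) = struct.unpack("!H", read_exact(tls, 2))
                reply = read_exact(tls, length)
                rcodes[name] = reply[3] & 0x0F  # low 4 bits of the flags
    return rcodes
```

Calling `query_many(["example.com", "example.org"])` needs network access to the forwarder; while it runs, `netstat` should show a single socket to port 853 regardless of how many names are passed.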

@iz8mbw
Author

iz8mbw commented Nov 19, 2019

I know Stubby reuses connections.
I hope Unbound will soon reuse connections too.

Regards

@triatic

triatic commented Nov 19, 2019

I use Stubby too for reusable connections. Unbound runs as a Windows service though, which is not yet supported in Stubby, so it would be nice if Unbound added reusable connections.

@matkeith

matkeith commented Jan 5, 2020

pfsense makes use of unbound for DNS over TLS as well. TCP session reuse would be fantastic here for applying this service internally and externally to an entire network. I am eagerly waiting for an update here as well.

@bjovereinder
Member

We plan to include connection reuse in Unbound in the coming months. Most likely not in the next new release, but in the one after that.

@vbence

vbence commented May 1, 2020

Let me also chime in: please put it behind a yes/no flag. Using Tor with standalone connections provides a level of privacy (the server can no longer correlate queries based on the originating IP).
Using a single connection puts them into the same bucket, reintroducing the relation between them that we had successfully hidden using Tor.

(Again, talking about the server's viewpoint).

@maciejsszmigiero

The issue has been reported earlier in NLnet Labs Bugzilla:
https://www.nlnetlabs.nl/bugs-script/show_bug.cgi?id=4089

@ialex87

ialex87 commented Jun 1, 2020

We plan to include connection reuse in Unbound in the coming months. Most likely not in the next new release, but in the one after that.

Hey sorry to bother but was this done?

@triatic

triatic commented Jun 1, 2020

Hey sorry to bother but was this done?

No. It should be in Unbound 1.11 going off what was said.

@ialex87

ialex87 commented Jun 1, 2020

Alright, thanks for the quick answer.

@iz8mbw iz8mbw closed this as completed Jun 1, 2020
@iz8mbw iz8mbw reopened this Jun 1, 2020
@bjovereinder
Member

Hi all,

Here is an update on TLS session resumption in Unbound. We are currently developing this feature in Unbound, but it will most likely not be included in Unbound version 1.11 (or an initial 1.11.0 version) yet. However, we expect to complete TLS session resumption soon. We will keep you posted on progress this month.

Thank you,

-- Benno

@iz8mbw
Author

iz8mbw commented Jun 2, 2020

Great, thanks

@iz8mbw
Author

iz8mbw commented Jul 30, 2020

Hi @ralphdolmans @bjovereinder
Any progress on this?
Thank you again!!

@wcawijngaards
Member

There is code in progress on the branch https://github.com/NLnetLabs/unbound/tree/stream-reuse
The code appears to work but needs some further testing to make sure.

@iz8mbw
Author

iz8mbw commented Jul 30, 2020

@wcawijngaards if I would like to test this, what do I need to do after compiling https://github.com/NLnetLabs/unbound/tree/stream-reuse?
Is there a parameter to enable stream reuse?

@wcawijngaards
Member

There is no parameter, it should just work by itself.

@iz8mbw
Author

iz8mbw commented Jul 30, 2020

I'm testing but I see multiple connections (ESTABLISHED) to upstream (1.1.1.1 and 1.0.0.1 in my case):

root@odroidc2:~/script/unbound-stream-reuse# netstat -anp | grep -i 53
tcp        0      0 192.168.147.4:53        0.0.0.0:*               LISTEN      28814/unbound
tcp        0      0 192.168.147.4:53        0.0.0.0:*               LISTEN      28814/unbound
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN      1670/systemd-resolv
tcp        0      0 192.168.147.4:41527     1.0.0.1:853             ESTABLISHED 28814/unbound
tcp        0      0 192.168.147.4:33239     1.1.1.1:853             ESTABLISHED 28814/unbound
tcp        0      0 192.168.147.4:39885     1.1.1.1:853             TIME_WAIT   -
tcp        0      0 192.168.147.4:35971     1.0.0.1:853             ESTABLISHED 28814/unbound
tcp        0      0 192.168.147.4:44217     1.1.1.1:853             TIME_WAIT   -
tcp        0      0 192.168.147.4:38869     1.0.0.1:853             TIME_WAIT   -
tcp        0      0 192.168.147.4:36095     1.1.1.1:853             ESTABLISHED 28814/unbound
udp        0      0 192.168.147.4:53        0.0.0.0:*                           28814/unbound
udp        0      0 192.168.147.4:53        0.0.0.0:*                           28814/unbound
udp        0      0 127.0.0.53:53           0.0.0.0:*                           1670/systemd-resolv
unix  3      [ ]         STREAM     CONNECTED     127253   28814/unbound

@iz8mbw
Author

iz8mbw commented Jul 30, 2020

and, after various lookups, I receive:

root@odroidc2:~/script/unbound-stream-reuse# nslookup github.com 192.168.147.4
Server:         192.168.147.4
Address:        192.168.147.4#53

** server can't find github.com: SERVFAIL

@wcawijngaards
Member

It makes one connection per thread and, if you have a lot of traffic, even multiple connections to service thousands of queries (because that many cannot be put on one connection). Perhaps you have increased num-threads? Otherwise, that would be a bug of some sort.

The SERVFAIL looks like a bug as well, and I would like to debug it. Do you know how I can reproduce it?

@iz8mbw
Author

iz8mbw commented Jul 30, 2020

I made only 5 or 6 lookups for different domains, so not a lot of traffic.
This is my simple config file:

server:
  num-threads: 2
  so-reuseport: yes
  rrset-roundrobin: yes
  local-data: "one.one.one.one. A 1.1.1.1"
  local-data: "one.one.one.one. A 1.0.0.1"
  local-data: "1dot1dot1dot1.cloudflare-dns.com. A 1.1.1.1"
  local-data: "1dot1dot1dot1.cloudflare-dns.com. A 1.0.0.1"
  qname-minimisation: yes
  tls-cert-bundle: "/etc/ssl/certs/ca-certificates.crt"
  hide-identity: yes
  hide-version: yes
  interface: 192.168.147.4@53
  outgoing-interface: 192.168.147.4
  username: ""
  do-ip4: yes
  do-ip6: no
  do-udp: yes
  do-tcp: yes
  prefer-ip6: no
  prefer-ip4: yes

  access-control: 192.168.147.0/24 allow

forward-zone:
  name: "."
  forward-tls-upstream: yes
  # Cloudflare DNS
  forward-addr: 1.1.1.1@853
  forward-addr: 1.0.0.1@853

To reproduce, make lots of lookups for different domains/FQDNs.
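
Such a series of lookups could be scripted roughly like this. It is a minimal sketch, assuming the resolver answers plain UDP on port 53 as in the config above; `build_query` and `run_lookups` are hypothetical helper names, and the list of domains to query is left to the caller.

```python
import socket
import struct


def build_query(name: str, qid: int, qtype: int = 1) -> bytes:
    """Build a minimal DNS query (recursion desired, class IN, default type A)."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def run_lookups(server: str, names, timeout: float = 3.0) -> dict:
    """Query each name once over UDP; return name -> rcode (None on timeout).

    rcode 0 is NOERROR and rcode 2 is SERVFAIL, the failure seen in this thread.
    """
    results = {}
    for qid, name in enumerate(names, start=1):
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(build_query(name, qid), (server, 53))
            try:
                data, _ = sock.recvfrom(4096)
            except socket.timeout:
                results[name] = None
                continue
        # The rcode is the low 4 bits of the second flags byte.
        results[name] = data[3] & 0x0F
    return results
```

For example, `run_lookups("192.168.147.4", ["example.com", "example.org", "example.net"])` would query the resolver from the config above; the three domains are placeholders, and any entry that comes back as `2` corresponds to the SERVFAIL reported earlier.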

@iz8mbw
Author

iz8mbw commented Jul 30, 2020

FYI, unbound was compiled with --sysconfdir=/etc --enable-tfo-client --enable-tfo-server --with-ssl=/opt/ssl on Ubuntu 20.04.1, kernel 5.6.15.
/opt/ssl contains OpenSSL 1.1.1g.

If I compile and use the master branch, I have no issues on the same Linux machine with the same config file and the same "configure" options.

@pyropeter

@wcawijngaards I tested the branch, too. It works fine until the upstream servers close the connection due to a timeout (quad9, cloudflare and google all do this). After the FIN/RST packet, I never see another packet towards the upstream servers until I restart unbound.

@wcawijngaards
Member

Fixed what I think is the issue, in commit 7a211e5.

Thanks for the test details! That was very helpful in finding the issue. The write event was not properly re-enabled upon struct reuse after the other side had closed the reused SSL stream: the stream sat idle, with no queries being sent, until the other side's timeout expired. This error did not affect TCP connections (the branch also implements reuse for TCP connections).

@iz8mbw
Author

iz8mbw commented Jul 30, 2020

going to test

@iz8mbw
Author

iz8mbw commented Jul 30, 2020

It works now! Thank you.

@iz8mbw
Author

iz8mbw commented Jul 30, 2020

Just for info, how long is the timeout to upstream?
To be clearer: if I send some DNS queries to unbound and then stop using it for a while (so no more DNS queries reach unbound), when will unbound close the socket to upstream that was opened for the last DNS query? In other words, when will it close the ESTABLISHED connection?

@pyropeter

@wcawijngaards I tested d973b75 and it works like a charm with google/cloudflare/quad9. Well done!

@wcawijngaards
Member

Good to know it works!

That timeout for no traffic is currently hard-coded at 30 seconds. The server that Unbound talks to also has a timeout for idle connections, and that is frequently shorter, or even very short under heavy traffic. The upstream timeout for an idle connection to the server is currently set in code (added in the new branch) and is not configurable. The downstream timeout for clients can be configured with tcp-idle-timeout: 30000 (msec), which is also 30 seconds.
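
The interplay of the two idle timeouts described above can be sketched in a few lines. This is a toy model, not Unbound code: the names `UPSTREAM_IDLE_MS`, `idle_expired` and `first_to_close` are invented for illustration, and treating the timeout as reached at exactly 30000 ms is an assumption.

```python
# Idle timeout towards upstream as quoted in the thread (30 seconds, in msec).
UPSTREAM_IDLE_MS = 30_000


def idle_expired(last_activity_ms: int, now_ms: int,
                 timeout_ms: int = UPSTREAM_IDLE_MS) -> bool:
    """True once a connection has seen no traffic for timeout_ms.

    Whether the boundary itself counts as expired is an assumption here.
    """
    return now_ms - last_activity_ms >= timeout_ms


def first_to_close(local_timeout_ms: int, remote_timeout_ms: int) -> str:
    """Which side closes an idle connection first; ties are reported as local."""
    if remote_timeout_ms < local_timeout_ms:
        return "remote"
    return "local"


# 10_000 ms is a made-up example of a shorter timeout on the upstream server.
print(first_to_close(UPSTREAM_IDLE_MS, 10_000))
```

With a shorter timeout on the upstream side, the upstream server closes the idle stream first, which is the situation that triggered the bug fixed earlier in this thread.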

@iz8mbw
Author

iz8mbw commented Jul 31, 2020

I'm still testing "stream-reuse" and I have a comment.
Look at this example:

root@odroidc2:~# netstat -anp | grep -i ":853"
tcp        0      0 192.168.147.4:35603     1.1.1.1:853             TIME_WAIT   -
tcp        0      0 192.168.147.4:34375     1.0.0.1:853             TIME_WAIT   -
tcp        0      0 192.168.147.4:37547     1.1.1.1:853             ESTABLISHED 2377/unbound
tcp        0      0 192.168.147.4:35355     1.0.0.1:853             TIME_WAIT   -
tcp        0      0 192.168.147.4:43449     1.0.0.1:853             TIME_WAIT   -
tcp        0      0 192.168.147.4:41225     1.1.1.1:853             TIME_WAIT   -
root@odroidc2:~# nslookup www.libero.it 192.168.147.4
Server:         192.168.147.4
Address:        192.168.147.4#53

Non-authoritative answer:
www.libero.it   canonical name = d31d9gezsyt1z8.cloudfront.net.
Name:   d31d9gezsyt1z8.cloudfront.net
Address: 13.226.170.111
Name:   d31d9gezsyt1z8.cloudfront.net
Address: 13.226.170.100
Name:   d31d9gezsyt1z8.cloudfront.net
Address: 13.226.170.77
Name:   d31d9gezsyt1z8.cloudfront.net
Address: 13.226.170.61

root@odroidc2:~# netstat -anp | grep -i ":853"
tcp        0      0 192.168.147.4:35603     1.1.1.1:853             TIME_WAIT   -
tcp        0      0 192.168.147.4:37547     1.1.1.1:853             ESTABLISHED 2377/unbound
tcp        0      0 192.168.147.4:35355     1.0.0.1:853             TIME_WAIT   -
tcp        0      0 192.168.147.4:43449     1.0.0.1:853             TIME_WAIT   -
tcp        0      0 192.168.147.4:41547     1.0.0.1:853             ESTABLISHED 2377/unbound

As you can see, there is already an "ESTABLISHED" connection to upstream (which could be reused), but right after I make another lookup, unbound creates another connection to upstream.
Before the lookup there was only one ESTABLISHED connection to upstream, but after the lookup there are two.

So, since the goal is to reuse connections and avoid multiple connections to upstream (or reduce them as much as possible), why does unbound not reuse the existing connection in this case?

Thank you.

@wcawijngaards
Member

This happens because you have num-threads: 2 configured (I saw this in the config you pasted above). Unbound reuses the connections per thread. If the query is serviced by the other thread, it has to make a new one because it does not have a connection for reuse.

So Unbound reuses connections, but per worker thread. It keeps a list of reusable connections per thread and uses one if possible. This applies to IPv4, IPv6, TCP and TLS connections. There is also a maximum number of open requests on a channel; if this is exceeded, Unbound opens more connections too. It takes thousands of waiting queries before that adds up to multiple connections (and this is not happening to you, it seems).
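
The per-thread behaviour described above can be illustrated with a toy model. It is not Unbound's actual data structure: `Conn`, `WorkerPool` and the `max_in_flight` cap are hypothetical names, and the cap of 2 is chosen only to keep the example small.

```python
from dataclasses import dataclass, field


@dataclass
class Conn:
    upstream: str
    in_flight: int = 0  # queries sent on this connection and not yet answered


@dataclass
class WorkerPool:
    """Reusable upstream connections owned by ONE worker thread."""
    max_in_flight: int                         # hypothetical per-connection cap
    conns: dict = field(default_factory=dict)  # upstream -> list of Conn
    opened: int = 0                            # connections this thread opened

    def acquire(self, upstream: str) -> Conn:
        # Reuse an existing connection of this thread if it has room.
        for conn in self.conns.get(upstream, []):
            if conn.in_flight < self.max_in_flight:
                conn.in_flight += 1
                return conn
        # Otherwise open a new one: no connection yet, or all are full.
        conn = Conn(upstream, in_flight=1)
        self.conns.setdefault(upstream, []).append(conn)
        self.opened += 1
        return conn

    def release(self, conn: Conn) -> None:
        conn.in_flight -= 1


# Two worker threads (num-threads: 2), each with its own pool.
thread0, thread1 = WorkerPool(max_in_flight=2), WorkerPool(max_in_flight=2)
first = thread0.acquire("1.1.1.1@853")
thread0.release(first)
thread0.acquire("1.1.1.1@853")   # same thread: the connection is reused
thread1.acquire("1.1.1.1@853")   # other thread: has to open its own
print(thread0.opened + thread1.opened)
```

In this model the second lookup only reuses the connection when the same thread handles it, which matches the two ESTABLISHED sockets seen with num-threads: 2.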

@iz8mbw
Author

iz8mbw commented Jul 31, 2020

OK.

@wcawijngaards wcawijngaards linked a pull request Aug 4, 2020 that will close this issue
@iz8mbw
Author

iz8mbw commented Nov 24, 2020

@wcawijngaards many, many thanks for this!!! This is a great improvement for unbound.
Ciao!!
