DNS timeout issues with v0.14.1 #54

Closed
ciapecki opened this issue Jun 9, 2020 · 27 comments · Fixed by #55

@ciapecki

ciapecki commented Jun 9, 2020

$ vpn-slice --version
vpn-slice 0.14.1
Connected as XXXX + YYYYYY/64, using SSL, with DTLS disabled
WARNING: IPv6 address or netmask set. Support for IPv6 in vpn-slice should be considered BETA-QUALITY.
WARNING: IPv6 address or netmask set. Support for IPv6 in vpn-slice should be considered BETA-QUALITY.
Blocked incoming traffic from VPN interface with iptables.
Added routes for 3 nameservers, 1 subnets, 0 aliases.
Restored routes for 0 excluded subnets. []
Adding /etc/hosts entries for 3 nameservers...
  XXX = dns0.tun0
  YYY = dns1.tun0
  ZZZZZZZZ = dns2.tun0
Looking up 55 hosts using VPN DNS servers...
WARNING: Lookup for xxx on VPN DNS servers failed:
	The DNS operation timed out after 30.00084924697876 seconds
WARNING: Lookup for yyy on VPN DNS servers failed:
	The DNS operation timed out after 30.000951528549194 seconds
WARNING: Lookup for zzz on VPN DNS servers failed:
	The DNS operation timed out after 30.000957012176514 seconds
WARNING: Lookup for sss on VPN DNS servers failed:
	The DNS operation timed out after 30.00096583366394 seconds
^CSend BYE packet: Aborted by caller

Reverting to v0.14 resolves these issues for me:

$ vpn-slice --version
vpn-slice 0.14
Connected as XXXX + YYYYYYYYY/64, using SSL, with DTLS disabled
WARNING: IPv6 address or netmask set, but this version of vpn-slice has only rudimentary support for them.
WARNING: IPv6 address or netmask set, but this version of vpn-slice has only rudimentary support for them.
Blocked incoming traffic from VPN interface with iptables.
Added routes for 3 nameservers, 1 subnets, 0 aliases.
Restored routes for 0 excluded subnets.
Adding /etc/hosts entries for 3 nameservers...
  XXX = dns0.tun0
  YYY = dns1.tun0
  ZZZZZZZZ = dns2.tun0
Looking up 55 hosts using VPN DNS servers...
[...]
Added hostnames and aliases for 98 addresses to /etc/hosts.
Added 92 routes for named hosts.
Connection setup done, child process 2644287 exiting.

There must have been changes between v0.14 and v0.14.1 causing the above timeouts.
Is there anything I can test on my side?

thanks,
Chris

@ciapecki
Author

ciapecki commented Jun 9, 2020

I noticed this in the --dump -vv output.
In master (v0.14.1):

INTERNAL_IP6_ADDRESS    => myaddr6=IPv6Interface('2606:b400:2b5b:88:8000::13b4/64')
INTERNAL_IP6_NETMASK    => netmask6=IPv6Interface('2606:b400:2b5b:88:8000::13b4/64')

whereas in (v0.14):

INTERNAL_IP6_ADDRESS    => myaddr6=IPv6Address('2606:b400:2b5b:89:8000::196')
INTERNAL_IP6_NETMASK    => netmask6=IPv6Interface('2606:b400:2b5b:89:8000::196/64')

@dlenski
Owner

dlenski commented Jun 9, 2020

Hmmm… here are all the changes between v0.14 and v0.14.1.

Adding /etc/hosts entries for 3 nameservers...
  XXX = dns0.tun0
  YYY = dns1.tun0
  ZZZZZZZZ = dns2.tun0

Am I to assume that XXX and YYY are IPv4 addresses, while ZZZZZZZZ is an IPv6 address?

Once connected, can you ping/ping6 all of these addresses?

What happens if you try to do the DNS lookups "manually" via dig @DNS_server_IP host.to.look.up, after connecting?

What happens if you add --disable-ipv6 to the openconnect command line? (I realize this might limit your functionality, but I'm curious about whether DNS lookups can complete successfully with IPv6 disabled.)

As for the netmask changes between v0.14 and v0.14.1… is this a consistent and persistent difference? These values are sent by the server itself… so it's quite likely that if the prefix changed from 2606:b400:2b5b:88/64 to 2606:b400:2b5b:89/64, it's because the server "wanted it" to change.

@ciapecki
Author

ciapecki commented Jun 9, 2020

As for the netmask changes between v0.14 and v0.14.1… is this a consistent and persistent difference? These values are sent by the server itself… so it's quite likely that if the prefix changed from 2606:b400:2b5b:88/64 to 2606:b400:2b5b:89/64, it's because the server "wanted it" to change.

I didn't mean the exact values, but the following:

v0.14:   INTERNAL_IP6_ADDRESS    => myaddr6=IPv6Address(...
v0.14.1: INTERNAL_IP6_ADDRESS    => myaddr6=IPv6Interface(...

IPv6Interface vs IPv6Address
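
For reference, here is what that type difference means in Python's ipaddress module (a minimal sketch with a made-up prefix, not the actual VPN values):

from ipaddress import ip_address, ip_interface

addr  = ip_address('2606:b400::13b4')        # IPv6Address: bare host address, no prefix
iface = ip_interface('2606:b400::13b4/64')   # IPv6Interface: host address plus prefix length

print(iface.ip)        # 2606:b400::13b4  -> the bare IPv6Address
print(iface.network)   # 2606:b400::/64   -> the enclosing IPv6Network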

I will try to check your other thoughts later.

dlenski added a commit that referenced this issue Jun 9, 2020
@dlenski
Owner

dlenski commented Jun 9, 2020

I've attempted to reproduce this by adding bogus/black-holed DNS servers to my list…

from vpn_slice.dnspython import DNSPythonProvider
import ipaddress

p=DNSPythonProvider()

# One bogus IPv4 DNS server, two real IPv6 DNS servers
p.configure(list(map(ipaddress.ip_address, (
    '123.45.67.89', # bogus IPv4 DNS server
    '2001:4860:4860::8888','2001:4860:4860::8844', # real Google Public DNS servers (IPv6)
))), bind_addresses=list(map(ipaddress.ip_address, ('192.168.12.233','2601:…'))) )

# This succeeds for me
p.lookup_host('www.google.com')

Can you test fdf9ef2 and see if it fixes the problem for you?

IPv6Interface vs IPv6Address

Ah, okay. This was an intended change, to correctly replicate the behavior of the "standard" vpnc-script in terms of how it handles the parameters sent by openconnect. If it had an incorrect effect on your (IPv6) routing configuration, I'm interested to hear about it.

@dlenski dlenski added the ipv6 label Jun 9, 2020
@gmacon
Collaborator

gmacon commented Jun 9, 2020

I've started seeing this, too. I looked at the traffic with Wireshark, and I now suspect a bug in dnspython. I was seeing a request go out from dnspython, and an empty response come back from the name server immediately. If dnspython were correctly receiving the response, I'd expect NoAnswer to be raised right away.

There were two differences between a successful request from dig and the request from dnspython: dig sets the AD flag, and dig uses EDNS0. I reconfigured dnspython to use EDNS0, but that didn't fix the issue. I didn't see an obvious way to get dnspython to set the AD flag, but I wouldn't really expect that to change anything.
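
(Concretely, "reconfigured dnspython to use EDNS0" means something along these lines; a sketch, not the exact code I ran, with a hypothetical hostname:)

import dns.message

# use_edns=0 requests EDNS version 0 in the outgoing query
query = dns.message.make_query('host.example.internal', 'A', use_edns=0)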

@dlenski
Owner

dlenski commented Jun 9, 2020

I've started seeing this, too. I looked at the traffic with Wireshark, and I now suspect a bug in dnspython. I was seeing a request go out from dnspython, and an empty response come back from the name server immediately.

Huh, that's interesting.

So your idea is that dnspython isn't raising NoAnswer where it should be… but is instead waiting and eventually timing out?

There were two differences between a successful request from dig and the request from dnspython: dig sets the AD flag, …

Yeah, I don't see how that would change anything either.

and dig uses EDNS0.

🤔
It's not supposed to use EDNS… we disable that explicitly.

self.base_cl = [self.dig, '+short', '+noedns']

@gmacon
Collaborator

gmacon commented Jun 9, 2020

So your idea is that dnspython isn't raising NoAnswer where it should be… but is instead waiting and eventually timing out?

That's right.

It's not supposed to use EDNS… we disable that explicitly.

This one's my fault. I compared vpn-slice's traffic with running dig by hand, so just dig @server_ip some.internal.name. Sorry about that.

@dlenski
Owner

dlenski commented Jun 9, 2020

So your idea is that dnspython isn't raising NoAnswer where it should be… but is instead waiting and eventually timing out?

That's right.

I wish I could reproduce this, but cannot.

When I try fdf9ef2 (with the logging statement uncommented) on any of the real VPNs and DNS servers that I have access to (running on Ubuntu 18.04 here)… I get an immediate reply, rather than a timeout, and then the remaining DNS lookups complete successfully:

Issuing query for hostname 'fnord.nothing.net', rectype 'A', source IPv4Address('10.4.254.248'), search_domains [<DNS name foo.com.>, <DNS name foo.local.>], nameservers ['10.4.16.122', '10.4.16.166']
WARNING: Lookup for fnord.nothing.net on VPN DNS servers returned nothing.

Issuing query for hostname 'www.slashdot.org', rectype 'A', source IPv4Address('10.4.254.250'), search_domains [<DNS name foo.com.>, <DNS name foo.local.>], nameservers ['10.4.16.122', '10.4.16.166']
Got results: [<DNS IN A rdata: 216.105.38.15>]
  www.slashdot.org = 216.105.38.15

@gmacon Could you try to use something like the short standalone test script I posted above to repeatedly generate the DNS queries that are causing the timeout with dnspython?

@dlenski dlenski pinned this issue Jun 9, 2020
@dlenski dlenski changed the title v0.14.1 issues with time out DNS timeout issues with v0.14.1 Jun 9, 2020
@ciapecki
Author

ciapecki commented Jun 9, 2020

Can you test fdf9ef2 and see if it fixes the problem for you?

I tested with your fix, and it works. It's not timing out; it's just that resolving the servers takes a long time before the connection is established.

@dlenski
Owner

dlenski commented Jun 9, 2020

I tested with your fix, and it works. It's not timing out; it's just that resolving the servers takes a long time before the connection is established.

Ahhhh… that's good to know.

Can you uncomment this additional logging statement and run again? fdf9ef2#diff-6213469e2b2c96b8c072c5abf4d783b7R35-R36

I'm guessing that what's happening here is that it's timing out on the IPv6 DNS server(s) or the IPv4 DNS server(s) — whichever ones it tries first — and then after 30s tries on the other IP-version servers, and succeeds.

If you uncomment the logging statement and try again, it will be 100% clear if this is what's happening.

@gmacon
Collaborator

gmacon commented Jun 9, 2020

I'm running the version with extra logging, and it looks like I'm just impatient. It's successfully looking up the DNS on the IPv4 DNS server and then failing to do so against the IPv6 server (and there are, in fact, no IPv6 packets in Wireshark). Because I have a bunch of names, it'll take ten minutes to wait out all the IPv6 timeouts, and I never did.

I think it's surprising that, after successfully resolving a name against one name server, it tries again against the other. It strikes me that a configuration where AAAA records are only returned over IPv6 would be a very unusual one.

I'll try to debug my IPv6 connectivity.

@ciapecki
Author

ciapecki commented Jun 9, 2020

Can you uncomment this additional logging statement and run again? fdf9ef2#diff-6213469e2b2c96b8c072c5abf4d783b7R35-R36

It works, but it still takes way too long. I tested with 5 servers and it took 2.5 minutes to resolve.
On v0.14 it works almost immediately (a couple of seconds).

@gmacon
Collaborator

gmacon commented Jun 9, 2020

I just pushed a branch to my fork, dns-by-route, that contains a modification so that dnspython can figure out what servers it wants to talk to on its own. This does remove the explicit source addresses, but they don't seem to be required for me. Thoughts?

@ciapecki
Author

ciapecki commented Jun 9, 2020

@gmacon your fork works fine for me

@dlenski
Owner

dlenski commented Jun 9, 2020

It works, but it still takes way too long. I tested with 5 servers and it took 2.5 minutes to resolve.
On v0.14 it works almost immediately (a couple of seconds).

Hmmm… 2.5 minutes is impossibly long with a 30 second timeout. There are at most 4 combinations of (DNS server address: IPv4 vs. IPv6) × (DNS rectype: A vs. AAAA).

@ciapecki Can you share the logging output? It should not take more than 2 minutes given 4 combinations and a 30 second timeout.

@dlenski
Owner

dlenski commented Jun 9, 2020

… that contains a modification so that dnspython can figure out what servers it wants to talk to on its own. This does remove the explicit source addresses, but they don't seem to be required for me. Thoughts?

@gmacon thanks.

I can understand why this works better, but I worry that it may go against important design goals of vpn-slice, namely:

  1. not to leak anything about what you're doing "inside" the VPN to the rest of the Internet, and
  2. not to leak anything about what you're doing on the rest of the Internet "inside" the VPN.

Not specifying the source address means that dnspython may, for example, send some of your DNS queries out unencrypted on the public Internet. This shouldn't happen as long as the routing tables only have routes to the DNS servers via the VPN tunnel interface… but I'm not at all confident that all OSes handle it this way.
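
(For context, the "pin the source address" approach looks roughly like this with dnspython; a minimal sketch with hypothetical hostname and addresses, not vpn-slice's actual code:)

import dns.message
import dns.query

query = dns.message.make_query('host.inside.example', 'A')
# Binding source= to the tunnel-side address means the query can only leave via
# the interface that owns that address, i.e. the VPN tunnel:
reply = dns.query.udp(query, '10.4.16.122', timeout=5, source='10.4.254.248')
print(reply.answer)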

@dlenski
Owner

dlenski commented Jun 9, 2020

I'm running the version with extra logging, and it looks like I'm just impatient. It's successfully looking up the DNS on the IPv4 DNS server and then failing to do so against the IPv6 server (and there are, in fact, no IPv6 packets in Wireshark). Because I have a bunch of names, it'll take ten minutes to wait out all the IPv6 timeouts, and I never did.

I'll try to debug my IPv6 connectivity.

Yeah. Would be good to know if it's the VPN admins' fault (DNS-over-IPv6 is configured wrong on the server side) or vpn-slice's fault (traffic to DNS servers' IPv6 addresses is being routed incorrectly).

I think it's surprising that, after successfully resolving a name against one name server, it tries again against the other. It strikes me that a configuration where AAAA records are only returned over IPv6 would be a very unusual one.

Unfortunately there are DNS servers and VPNs which are known to misbehave in ways like this, probably as a crutch for older client OSes that themselves mishandled IPv6.

It's not exactly the same thing, but Pulse VPNs won't let you tunnel IPv6 packets to the VPN when connected to the gateway via IPv4, and vice versa… a completely idiotic arrangement that defeats the entire purpose of a tunneling protocol, but nevertheless true.

We could change the keep_going parameter to default to False, and ensure that IPv4 servers are tried first… but then you would fail to get IPv6 addresses for VPNs that only send A records via DNS-over-IPv4.


tl;dr It seems to me that the bottom line here is that we need to figure out why IPv6 connectivity to your DNS servers isn't working, and fix or work around that, rather than simply ignore or skip DNS-over-IPv6 silently. (We could also add an --ignore-ipv6-dns option to vpn-slice itself for cases where the IPv6 DNS server addresses are intractably broken and the VPN admins won't fix them.)
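
(If we did add such an option, the gist would just be filtering the nameserver list before configuring the DNS provider; a rough sketch with a hypothetical flag and made-up server addresses:)

from ipaddress import ip_address, IPv6Address

dns_servers = [ip_address(s) for s in ('10.4.16.122', '2606:b400::53')]  # as sent by the VPN
ignore_ipv6_dns = True  # hypothetical new CLI flag

if ignore_ipv6_dns:
    dns_servers = [s for s in dns_servers if not isinstance(s, IPv6Address)]
# dns_servers is now IPv4-only; hand it to the DNS provider as usual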

@ciapecki
Author

ciapecki commented Jun 9, 2020

It works, but it still takes way too long. I tested with 5 servers and it took 2.5 minutes to resolve.
On v0.14 it works almost immediately (a couple of seconds).

Hmmm… 2.5 minutes is impossibly long with a 30 second timeout. There are at most 4 combinations of (DNS server address: IPv4 vs. IPv6) × (DNS rectype: A vs. AAAA).

@ciapecki Can you share the logging output? It should not take more than 2 minutes given 4 combinations and a 30 second timeout.

I have 5 hosts passed to vpn-slice. For every one it takes 30 seconds => together ~2.5 min.

for every host:

Got results: [<DNS IN A rdata: AAA>]
[30 seconds wait time]
  login.myserver.com = AAA

@dlenski
Owner

dlenski commented Jun 9, 2020

I have 5 hosts passed to vpn-slice. For every one it takes 30 seconds => together ~2.5 min.

Ah, thanks for clarifying. 👌

for every host:

Got results: [<DNS IN A rdata: AAA>]
[30 seconds wait time]
  login.myserver.com = AAA

Yeah, this has to be because of a timeout on whichever type of DNS server it is trying in the first 30 seconds.

Can you please try 505e136 (= fdf9ef2 with the logging statements uncommented)? This will show whether it's the DNS-over-IPv6 or the DNS-over-IPv4 queries which are timing out; that's an important piece of information to help us know whether this is really an IPv6-specific problem.

@ciapecki
Author

I have 5 hosts passed to vpn-slice. For every one it takes 30 seconds => together ~2.5 min.

Ah, thanks for clarifying.

for every host:

Got results: [<DNS IN A rdata: AAA>]
[30 seconds wait time]
  login.myserver.com = AAA

Yeah, this has to be because of a timeout on whichever type of DNS server it is trying in the first 30 seconds.

Can you please try 505e136 (= fdf9ef2 with the logging statements uncommented)? This will show whether it's the DNS-over-IPv6 or the DNS-over-IPv4 queries which are timing out; that's an important piece of information to help us know whether this is really an IPv6-specific problem.

WARNING: IPv6 address or netmask set. Support for IPv6 in vpn-slice should be considered BETA-QUALITY.
WARNING: IPv6 address or netmask set. Support for IPv6 in vpn-slice should be considered BETA-QUALITY.
Blocked incoming traffic from VPN interface with iptables.
Added routes for 3 nameservers, 0 subnets, 0 aliases.
Restored routes for 0 excluded subnets. []
Adding /etc/hosts entries for 3 nameservers...
  XXX = dns0.tun0
  YYY = dns1.tun0
  ZZZZZZ = dns2.tun0
Looking up 5 hosts using VPN DNS servers...
Issuing query for hostname 'my.mycompany.com', rectype 'A', source IPv4Address('AAA'), search_domains [<DNS name nl.mycompany.com.>], nameservers ['XXX', 'YYY']
Got results: [<DNS IN A rdata: BBB>]
Issuing query for hostname 'my.mycompany.com', rectype 'AAAA', source IPv4Address('AAA'), search_domains [<DNS name nl.mycompany.com.>], nameservers ['XXX', 'YYY']
Issuing query for hostname 'my.mycompany.com', rectype 'A', source IPv6Interface('SSSSSS/64'), search_domains [<DNS name nl.mycompany.com.>], nameservers ['ZZZZZZ']


30secs....

  my.mycompany.com = BBB
Issuing query for hostname 'blog.us.mycompany.com', rectype 'A', source IPv4Address('AAA'), search_domains [<DNS name nl.mycompany.com.>], nameservers ['XXX', 'YYY']
Got results: [<DNS IN A rdata: CCC>]
Issuing query for hostname 'blog.us.mycompany.com', rectype 'AAAA', source IPv4Address('AAA'), search_domains [<DNS name nl.mycompany.com.>], nameservers ['XXX', 'YYY']
Issuing query for hostname 'blog.us.mycompany.com', rectype 'A', source IPv6Interface('SSSSSS/64'), search_domains [<DNS name nl.mycompany.com.>], nameservers ['ZZZZZZ']

30secs....

  blog.us.mycompany.com = CCC

Issuing..... for other 3 hosts

Added hostnames and aliases for 8 addresses to /etc/hosts.
Added 5 routes for named hosts.
Connection setup done, child process 221281 exiting.

@dlenski
Owner

dlenski commented Jun 10, 2020

Thanks @ciapecki. Okay, we can definitely confirm from this that it is indeed the DNS-over-IPv6 servers which are timing out for you.

@fahasch

fahasch commented Jun 10, 2020

I have an issue that I think is related. For me, vpn-slice stopped working with version 0.14.1. In particular, I get the error:

Traceback (most recent call last):
  File "/usr/local/bin/vpn-slice", line 8, in <module>
    sys.exit(main())
  File "/usr/local/Cellar/vpn-slice/0.14.1/libexec/lib/python3.8/site-packages/vpn_slice/__main__.py", line 509, in main
    do_connect(env, args)
  File "/usr/local/Cellar/vpn-slice/0.14.1/libexec/lib/python3.8/site-packages/vpn_slice/__main__.py", line 168, in do_connect
    providers.route.add_address(env.tundev, env.myaddr6)
  File "/usr/local/Cellar/vpn-slice/0.14.1/libexec/lib/python3.8/site-packages/vpn_slice/mac.py", line 109, in add_address
    self._ifconfig(device, family, ip_network(address), address)
  File "/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ipaddress.py", line 79, in ip_network
    return IPv6Network(address, strict)
  File "/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ipaddress.py", line 2106, in __init__
    raise ValueError('%s has host bits set' % self)
ValueError: ZZZZZZZZ/64 has host bits set

If I look at the dump and compare version 0.14.0 to version 0.14.1, I notice that INTERNAL_IP6_ADDRESS has changed. In version 0.14.0 it was simply "ZZZZZZZZ" (without any subnet); in version 0.14.1 it is "ZZZZZZZZ/64". This problem seems to be related to the one reported by @ciapecki. The variable INTERNAL_IP6_NETMASK has a subnet in both versions.

Note, however, that I do not get the timeout issues, but rather the "... has host bits set" error.
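
(For reference, that error comes straight from Python's ipaddress module: ip_network() refuses an address with host bits set unless told otherwise. A quick illustration with a made-up prefix:)

>>> from ipaddress import ip_network, ip_interface
>>> ip_network('2606:b400::13b4/64')
ValueError: 2606:b400::13b4/64 has host bits set
>>> ip_interface('2606:b400::13b4/64').network   # keeps the host address and prefix apart, no error
IPv6Network('2606:b400::/64')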

@gmacon
Collaborator

gmacon commented Jun 10, 2020

@fahasch, that was the first error with IPv6 that I got. The proximate cause there is that an IPv6Interface object can't (usually) be passed to ip_network. I changed that line to

        address = ip_interface(address)
        self._ifconfig(device, family, address, address.ip)

which ran without error (if I'm remembering exactly what I did; I've changed it again since then while trying to figure out what's wrong) but still resulted in broken IPv6 connectivity.

I connected with the real AnyConnect client to see how it configures the interface:

utun1: flags=80d1<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST> mtu 1390
	inet 10.x.x.x --> 10.x.x.x netmask 0xfffffe00 
	inet6 fe80::yyyy%utun1 prefixlen 64 scopeid 0x13 
	inet6 zzzz prefixlen 128 
	nd6 options=201<PERFORMNUD,DAD>

It's not clear exactly what Cisco is thinking here. The IPv4 side is point-to-point but has zero bits in its netmask (0xfffffe00), while the IPv6 side is not point-to-point but has the all-ones /128 netmask.

@fahasch

fahasch commented Jun 10, 2020

@gmacon Thanks for the tip. It does not solve the problem for me though. If I change the relevant line 109 in mac.py (making sure that ip_interface is imported), I do get another error:

    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/sbin/ifconfig', 'utun3', 'inet6', 'ZZZZZZ/64', 'ZZZZZZ']' returned non-zero exit status 1.

If I run sudo /sbin/ifconfig utun3 inet6 ZZZZZZ/64 ZZZZZZ directly from the command line, I get ifconfig: ioctl (SIOCAIFADDR): Invalid argument. This might be an incompatibility with the BSD layer that OS X runs on? (Sorry I cannot be of more help, as I do not really know what I am doing.)

@gmacon
Collaborator

gmacon commented Jun 10, 2020

I guess I should have said this, too. My current state is:

        address = ip_interface(address)
        self._ifconfig(device, family, ip_network(address.ip), address.ip)

which just throws away any available netmask information.

@fahasch

fahasch commented Jun 10, 2020

That works like a charm (except for the DNS operation timeouts mentioned above). I hope the issues with IPv6 (wherever they are located) will be settled at some point in the near future.

dlenski added a commit that referenced this issue Jun 11, 2020
handle IPv6Interface objects consistently and gracefully (fix for #54)
@dlenski
Owner

dlenski commented Jun 11, 2020

Merged #55, which should resolve the issue on both macOS and Linux.

Brief summary is that dnspython does not tolerate the "interface" form of an IPv4/6 source address which combines the host address and netmask (e.g. 1.2.3.4/24 or 2601:f00f::f00b/64).

This affected IPv6 because of the change between v0.14 and v0.14.1 to store the IPv6 address in this form (needed for correct behavior with IPv6 routing). I didn't test the DNS-over-IPv6 interaction properly with this change because I haven't had access to a VPN with IPv6 DNS servers in the last few weeks (bad me 🤦‍♂️ 👎).
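
(In other words, roughly the following; a sketch of the failure mode, not the exact patch in #55:)

from ipaddress import ip_interface

src = ip_interface('2601:f00f::f00b/64')   # "interface" form: host address + prefix length

# dnspython wants a bare address as the query source, so (schematically):
#   dns.query.udp(query, nameserver, source=str(src))     # '2601:f00f::f00b/64' -> not tolerated, times out
#   dns.query.udp(query, nameserver, source=str(src.ip))  # '2601:f00f::f00b'    -> works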

Three cheers to @gmacon for figuring out a solution and writing a clean fix. 🍻

@dlenski dlenski unpinned this issue Jun 11, 2020
Repository owner locked and limited conversation to collaborators Oct 11, 2020