DNS resolution does not work (udp port 59693 unreachable) #5226

Closed
tgraf opened this issue Aug 16, 2018 · 5 comments

Comments

@tgraf
Member

tgraf commented Aug 16, 2018

10:20:39.442435 IP (tos 0x0, ttl 64, id 48046, offset 0, flags [DF], proto UDP (17), length 92)
    10.2.5.93.33429 > 172.18.0.10.53: 31963+ A? www.google.com.chaos-testing.svc.cluster.local. (64)
10:20:39.443089 IP (tos 0x0, ttl 63, id 57924, offset 0, flags [DF], proto UDP (17), length 185)
    172.18.0.10.53 > 10.2.5.93.33429: 31963 NXDomain 0/1/0 (157)
10:20:39.443155 IP (tos 0x0, ttl 64, id 48047, offset 0, flags [DF], proto UDP (17), length 78)
    10.2.5.93.50220 > 172.18.0.10.53: 22090+ A? www.google.com.svc.cluster.local. (50)
10:20:39.443782 IP (tos 0x0, ttl 63, id 57925, offset 0, flags [DF], proto UDP (17), length 171)
    172.18.0.10.53 > 10.2.5.93.50220: 22090 NXDomain 0/1/0 (143)
10:20:39.443832 IP (tos 0x0, ttl 64, id 48048, offset 0, flags [DF], proto UDP (17), length 74)
    10.2.5.93.33998 > 172.18.0.10.53: 22814+ A? www.google.com.cluster.local. (46)
10:20:39.445218 IP (tos 0x0, ttl 63, id 40585, offset 0, flags [DF], proto UDP (17), length 167)
    172.18.0.10.53 > 10.2.5.93.33998: 22814 NXDomain 0/1/0 (139)
10:20:39.445270 IP (tos 0x0, ttl 64, id 48050, offset 0, flags [DF], proto UDP (17), length 87)
    10.2.5.93.44737 > 172.18.0.10.53: 44453+ A? www.google.com.us-west-2.compute.internal. (59)
10:20:39.448827 IP (tos 0x0, ttl 63, id 57927, offset 0, flags [DF], proto UDP (17), length 87)
    172.18.0.10.53 > 10.2.5.93.44737: 44453 NXDomain 0/0/0 (59)
10:20:39.448871 IP (tos 0x0, ttl 64, id 48052, offset 0, flags [DF], proto UDP (17), length 60)
    10.2.5.93.59693 > 172.18.0.10.53: 27+ A? www.google.com. (32)
10:20:39.449278 IP (tos 0x0, ttl 63, id 57928, offset 0, flags [DF], proto UDP (17), length 76)
    10.2.5.15.53 > 10.2.5.93.59693: 27 1/0/0 www.google.com. A 216.58.217.36 (48)
10:20:39.449300 IP (tos 0xc0, ttl 64, id 940, offset 0, flags [none], proto ICMP (1), length 104)
    10.2.5.93 > 10.2.5.15: ICMP 10.2.5.93 udp port 59693 unreachable, length 84
        IP (tos 0x0, ttl 63, id 57928, offset 0, flags [DF], proto UDP (17), length 76)
    10.2.5.15.53 > 10.2.5.93.59693: 27 1/0/0 www.google.com. A 216.58.217.36 (48)
10:20:44.450777 IP (tos 0x0, ttl 64, id 50872, offset 0, flags [DF], proto UDP (17), length 60)
    10.2.5.93.59693 > 172.18.0.10.53: 27+ A? www.google.com. (32)
10:20:44.451108 IP (tos 0x0, ttl 63, id 61459, offset 0, flags [DF], proto UDP (17), length 76)
    10.2.5.15.53 > 10.2.5.93.59693: 27 1/0/0 www.google.com. A 216.58.217.36 (48)
10:20:44.451121 IP (tos 0xc0, ttl 64, id 4160, offset 0, flags [none], proto ICMP (1), length 104)
    10.2.5.93 > 10.2.5.15: ICMP 10.2.5.93 udp port 59693 unreachable, length 84
        IP (tos 0x0, ttl 63, id 61459, offset 0, flags [DF], proto UDP (17), length 76)
    10.2.5.15.53 > 10.2.5.93.59693: 27 1/0/0 www.google.com. A 216.58.217.36 (48)
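
One detail worth reading out of the capture above: every query goes to 172.18.0.10:53, but the answer for www.google.com arrives from 10.2.5.15:53. A reply from an address the client never queried would be consistent with the kernel answering with ICMP port unreachable, although the capture alone does not prove that. The following is a minimal Python sketch, not part of the original report, that pairs queries and replies by client port and DNS ID and flags replies whose source differs from the queried server. The function name, the regular expression, and the sample constant are illustrative assumptions, and the parser only handles flow lines shaped like the ones shown here.

```python
import re

# Matches flow lines such as
#   "10.2.5.93.59693 > 172.18.0.10.53: 27+ A? www.google.com. (32)"
# Groups: source IP, source port, destination IP, destination port, DNS ID.
FLOW = re.compile(
    r"^\s*(\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+): (\d+)"
)

# Excerpt of flow lines copied from the capture in this issue.
SAMPLE_CAPTURE = """
    10.2.5.93.44737 > 172.18.0.10.53: 44453+ A? www.google.com.us-west-2.compute.internal. (59)
    172.18.0.10.53 > 10.2.5.93.44737: 44453 NXDomain 0/0/0 (59)
    10.2.5.93.59693 > 172.18.0.10.53: 27+ A? www.google.com. (32)
    10.2.5.15.53 > 10.2.5.93.59693: 27 1/0/0 www.google.com. A 216.58.217.36 (48)
"""


def find_unexpected_reply_sources(capture):
    """Return sorted (client_port, dns_id, queried_server, reply_source)
    tuples for DNS replies whose source IP differs from the queried server."""
    queried = {}        # (client_ip, client_port, dns_id) -> server IP queried
    mismatches = set()  # a set, so replies quoted inside ICMP errors are not counted twice
    for line in capture.splitlines():
        m = FLOW.match(line)
        if not m:
            continue  # timestamp/header lines and ICMP lines do not match
        src, sport, dst, dport, dns_id = m.groups()
        if dport == "53":
            # Packet towards port 53: treat it as a query.
            queried[(src, sport, dns_id)] = dst
        elif sport == "53":
            # Packet from port 53: treat it as a reply and look up its query.
            expected = queried.get((dst, dport, dns_id))
            if expected is not None and expected != src:
                mismatches.add((int(dport), int(dns_id), expected, src))
    return sorted(mismatches)


if __name__ == "__main__":
    for client_port, dns_id, queried_server, reply_source in find_unexpected_reply_sources(SAMPLE_CAPTURE):
        print(f"port {client_port} id {dns_id}: asked {queried_server}, answered by {reply_source}")
```

Run against the excerpt, the sketch reports only the exchange on client port 59693; the NXDomain replies come back from the address that was queried and are not flagged.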
@joestringer
Member

joestringer commented Aug 16, 2018

At a glance, that looks like the endpoint 10.2.5.93 made a DNS request to 10.2.5.15 and the DNS server responded, but by that time the endpoint had gone away. The kernel then didn't know how to process the DNS response and sent back an ICMP destination unreachable message.

That said, it happens multiple times, and with the same port, so presumably the app doesn't go away.

@tgraf
Member Author

tgraf commented Aug 16, 2018

The curl timeout is 15 seconds and based on the timestamps it is nowhere close to that unless DNS resolution is postponed by 10 seconds.

@borkmann
Member

Does it remain unreachable from that point onwards? Could it be hitting the limit, with GC for the hashtab not cleaning up fast enough?

@tgraf
Member Author

tgraf commented Aug 16, 2018

Yes, confirmed. Failure always happens on the same node:

root@ip-172-0-163-219:~# cilium bpf ct list global | wc -l
999532
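
To put that number in context, here is a minimal Python sketch, not from the original thread, that turns an entry count into a fill ratio. The capacity of the global CT map is not stated anywhere in this issue, so `ASSUMED_CT_MAX_ENTRIES` is a placeholder assumption, as are the function names and the threshold. Note also that `wc -l` counts output lines, so the figure is only an approximation of the number of entries.

```python
# Assumption: the real capacity of the global CT map is not given in this
# thread. 1,000,000 is a placeholder picked only because the observed count
# sits just below it; substitute the value from your own build or config.
ASSUMED_CT_MAX_ENTRIES = 1_000_000


def ct_fill_ratio(entry_count, max_entries=ASSUMED_CT_MAX_ENTRIES):
    """Return the fraction of the (assumed) map capacity that is in use."""
    if max_entries <= 0:
        raise ValueError("max_entries must be positive")
    return entry_count / max_entries


def is_near_limit(entry_count, max_entries=ASSUMED_CT_MAX_ENTRIES, threshold=0.95):
    """Return True when usage is at or above the threshold fraction (hypothetical cutoff)."""
    return ct_fill_ratio(entry_count, max_entries) >= threshold


if __name__ == "__main__":
    observed = 999532  # line count reported by `cilium bpf ct list global | wc -l`
    print(f"fill ratio: {ct_fill_ratio(observed):.4%}, near limit: {is_near_limit(observed)}")
```

Under that assumed capacity the table would be essentially full, which is in line with the question about hitting the limit before GC can clean up.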

@joestringer
Member

Should be fixed by #5230.
