DNS - wrong "No such name" answer #202

Closed
simonferquel opened this issue Mar 27, 2017 · 3 comments

@simonferquel
Contributor

dns.zip
Attached is a vpnkit capture/dns.pcap file recorded while a DNS request failed.
Analyzing the capture, you'll see that the query with ID 0x8f82 (message 878) is answered with "No such name" in messages 888 and 889, before the upstream DNS server has had any chance to answer (it does so in message 913, with correct records).

@djs55
Collaborator

djs55 commented Mar 27, 2017

Thanks for the report!

I think the high-level sequence is:

  • An IPv4 (A) query is sent in message 878 (and then repeated every 1s for 5s)
  • No response is received for any of these IPv4 requests, so the server is marked offline and we return an error
  • 6s after the original query we start receiving responses, so the server is marked online and starts working again in messages 913, 914, 915 and 916

The detailed trace is:

VM sends query

  • 878 378.78s: VM -> Host: Query A registry-1.docker.io

Host forwards query

  • 879: 378.78: Host -> Upstream: Query A registry-1.docker.io

Host receives response for earlier AAAA IPv6 query (packet 876)

  • 880: 378.78: Upstream -> Host: Response AAAA registry-1.docker.io (with no answers)
  • 881, 882: 378.78: Host -> VM: Response AAAA registry-1.docker.io (No such name)

The Host re-sends the query once every second. It looks like the upstream server isn't responding.

  • 883: 379.79: Host -> Upstream: Query A registry-1.docker.io
  • 884: 380.79: Host -> Upstream: Query A registry-1.docker.io
  • 885: 381.79: Host -> Upstream: Query A registry-1.docker.io
  • 886: 382.79: Host -> Upstream: Query A registry-1.docker.io
  • 887: 383.78: Host -> Upstream: Query A registry-1.docker.io

The "No such name" response is sent after 5s of failure. At this point the VM's resolver would probably have given up.

  • 888,889: 383.74: Host -> VM: Response No such name

Since the upstream resolver has failed to respond, it has now been marked offline. This avoids the scenario where the priority 1 server is offline (e.g. the VPN is down) but the priority 2 server is online, and every lookup would otherwise take 5s.
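
The offline-marking behaviour described above can be sketched roughly as follows. This is a Python illustration only, not vpnkit's actual code: the `UpstreamServer` class, the `pick_server` helper and the `failure_window` parameter are hypothetical names, and the 5s window is taken from the trace.

```python
import time


class UpstreamServer:
    """Hypothetical health tracker for one upstream DNS server."""

    def __init__(self, name, failure_window=5.0, clock=time.monotonic):
        self.name = name
        # Seconds of silence before the server is marked offline (5s in the trace).
        self.failure_window = failure_window
        self.clock = clock
        self.online = True
        # Time of the oldest query that is still waiting for a reply.
        self.first_unanswered = None

    def query_sent(self):
        if self.first_unanswered is None:
            self.first_unanswered = self.clock()

    def response_received(self):
        # Any reply, however late, marks the server online again.
        self.first_unanswered = None
        self.online = True

    def check(self):
        """Mark the server offline once it has been silent for the whole window."""
        if (self.first_unanswered is not None
                and self.clock() - self.first_unanswered >= self.failure_window):
            self.online = False
        return self.online


def pick_server(servers):
    """Return the first online server in priority order, or None."""
    for server in servers:
        if server.check():
            return server
    return None
```

With a priority 1 server that stays silent for the full window, `pick_server` falls through to the priority 2 server immediately instead of waiting 5s on every lookup. A late reply from the priority 1 server restores it via `response_received`.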

The next query arrives and fails immediately. Note however we do send a request to the upstream server, and if it ever responds we will mark the server as online again.

  • 890: 383.78: VM -> Host: Query A registry-1.docker.io
  • 891,892: 383.74: Host -> VM: Response No such name
  • 893: 383.74: Host -> Upstream: Query A registry-1.docker.io

The VM's resolver has now given up and is trying to use other search domains:

  • 894: 383.78: VM -> Host: Query AAAA registry-1.docker.io.docker-paris.local

We start receiving responses from the upstream 1s later:

  • 915: 384.79: Upstream -> Host: Response A registry-1.docker.io
  • 916: 384.79: Host -> VM: Response A registry-1.docker.io

So I think in this case there was a period of time from 378.78s to 384.79s (6s) where the upstream server didn't respond. A Linux resolver will often time out after about 5s anyway, which is why we see the queries for the docker-paris.local search domain.

I think this glitch was genuinely caused by the upstream DNS server failing. We normally try to suppress these errors by caching, but the TTL on the records is only 45s, so the caching isn't very effective.
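
To illustrate why a 45s TTL limits how much caching can hide an upstream outage, here is a minimal TTL cache sketch in Python. It is not vpnkit's cache: `TtlCache` and its methods are hypothetical, and `192.0.2.1` is a documentation-only placeholder address rather than a real record.

```python
import time


class TtlCache:
    """Hypothetical DNS answer cache keyed by (name, record type)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.entries = {}  # (name, rtype) -> (expires_at, records)

    def put(self, name, rtype, records, ttl):
        self.entries[(name, rtype)] = (self.clock() + ttl, records)

    def get(self, name, rtype):
        entry = self.entries.get((name, rtype))
        if entry is None:
            return None
        expires_at, records = entry
        if self.clock() >= expires_at:
            # Expired: the next lookup has to go to the upstream server.
            del self.entries[(name, rtype)]
            return None
        return records


cache = TtlCache()
cache.put("registry-1.docker.io", "A", ["192.0.2.1"], ttl=45)
```

Under these assumptions, the cache can only cover an upstream failure that happens within 45s of the last successful lookup. Any query that arrives after the entry expires goes to the upstream server and is exposed to the outage.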

@simonferquel
Contributor Author

Shouldn't we behave like a DNS server that is timing out in this case, instead of returning a resolution failure?

@djs55
Collaborator

djs55 commented Mar 27, 2017

Yeah I agree - it would probably be better to send no reply in this case. I'll take a look at the DNS forwarding code.
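
The difference between the two behaviours discussed here can be sketched as below. This is a hypothetical Python illustration, not the vpnkit forwarding code: `OnTimeout`, `reply_for` and the dictionary-shaped responses are assumptions made for the example.

```python
from enum import Enum


class OnTimeout(Enum):
    NO_REPLY = "no-reply"    # stay silent, like a server that is timing out
    NXDOMAIN = "nxdomain"    # answer "No such name"


def reply_for(upstream_answer, on_timeout=OnTimeout.NO_REPLY):
    """Decide what to send back to the VM for one query.

    upstream_answer is the upstream server's response, or None if the
    upstream did not answer in time.
    """
    if upstream_answer is not None:
        return upstream_answer
    if on_timeout is OnTimeout.NXDOMAIN:
        return {"rcode": "NXDOMAIN", "answers": []}
    # Send nothing: the VM's resolver will retry or time out on its own,
    # rather than treating the name as nonexistent.
    return None
```

With `NO_REPLY` as the default, a slow upstream looks like a timeout to the VM, so the resolver can retry the same name rather than moving on to its search domains.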

djs55 added a commit to djs55/vpnkit that referenced this issue Mar 28, 2017
This restores the previous behaviour of sending no response rather
than NXDomain by default.

Fixes moby#202

Signed-off-by: David Scott <dave.scott@docker.com>
avsm pushed a commit to avsm/vpnkit that referenced this issue May 5, 2017