DNS proxy does not truncate UDP responses correctly #2160

Open
matthiasr opened this issue Oct 19, 2017 · 15 comments

Comments

@matthiasr
commented Oct 19, 2017

Expected behavior

Any DNS record that can be resolved on the host can be resolved inside containers, reliably.

Large responses over UDP should be truncated, with the truncation flag set, so that clients know they should retry over TCP.

Actual behavior

Large DNS responses are very unreliable. At ~300 A records, roughly every other response packet never arrives. With a CNAME pointing to the same 300 A records, several attempts are necessary until one succeeds; about half the time the retries are exhausted and the lookup fails. Looking up a CNAME to a CNAME to 300 A records basically never works. 400 A records in a single response basically never work.
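To put a number on "every other response", the lookups can be repeated and the successes counted. This is only a sketch: `success_rate` is a hypothetical helper that is not part of the repro script further down, and it relies on dig exiting non-zero when no reply arrives.

```shell
# Runs a command N times and prints "successes/attempts".
# The body runs in a subshell so its variables do not leak into the caller.
success_rate() (
  attempts=$1
  shift
  ok=0
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Count the attempt as a success only if the command exits with status 0.
    if "$@" >/dev/null 2>&1; then
      ok=$(( ok + 1 ))
    fi
    i=$(( i + 1 ))
  done
  echo "${ok}/${attempts}"
)

# Example, run inside a container (the hostname is the placeholder used in this report):
#   success_rate 20 dig +notcp +ignore +tries=1 +time=2 A 300-a-records.example.com
```

`+tries=1` and `+time=2` keep dig from hiding lost packets behind its own retries, so each attempt maps to exactly one UDP query.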

We use DNS extensively, so large DNS records are not uncommon for us. This was even worse (none of the above cases worked at all) in 17.06, with 17.09 some of them work some of the time. The DNS server is behind a VPN.

I packet dumped this; I can share PCAPs privately if needed. I don't know of any public DNS servers that produce such large records – if you know any I'm happy to produce a clean PCAP that I can share.

What I observed is that for these large records, the DNS proxy falls back to TCP every time (as it should), while the container receives a UDP packet. In the stage where things get wonky, these packets are just under 1500 bytes, varying slightly in each response (presumably due to reordering of the records & compression).

For the records that only work from time to time, the ones that come through are at 1508 bytes including the Ethernet header. That means they're edging up on the 1500 byte MTU of the ethernet devices between the host and the VM, as well as the VM and the container. Based on the smaller records I assume that there is some spread of the response length, but only those responses that actually fall under the magic 1514 bytes make it through.
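The frame sizes above can be checked with a little arithmetic. This is a sketch that assumes IPv4 without options (20-byte header), an 8-byte UDP header, and untagged Ethernet (14-byte header); none of these were confirmed from the PCAPs.

```shell
# All sizes in bytes. Header sizes are assumptions: untagged Ethernet, IPv4 without options.
ETH_HEADER=14
IP_HEADER=20
UDP_HEADER=8
MTU=1500

# Largest Ethernet frame that still fits the MTU without IP fragmentation.
max_frame=$(( MTU + ETH_HEADER ))

# Largest DNS message that fits into a single unfragmented UDP datagram.
max_dns_payload=$(( MTU - IP_HEADER - UDP_HEADER ))

# DNS payload carried by a frame of the given on-wire size.
dns_payload_for_frame() {
  echo $(( $1 - ETH_HEADER - IP_HEADER - UDP_HEADER ))
}

echo "max frame:                    ${max_frame}"
echo "max DNS payload:              ${max_dns_payload}"
echo "payload of a 1508-byte frame: $(dns_payload_for_frame 1508)"
```

Under these assumptions the 1514-byte limit is simply the 1500-byte MTU plus the Ethernet header, and the 1508-byte frames that do get through sit only a few bytes below it.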

As far as I can see, the problem is that Docker for Mac sends too-large packets instead of truncating the DNS response and offering TCP fallback.

Information

  • Full output of the diagnostics from "Diagnose & Feedback" in the menu
Docker for Mac: version: 17.09.0-ce-mac35 (69202b202f497d4b6e627c3370781b9e4b51ec78)
macOS: version 10.11.6 (build: 15G1611)
logs: /tmp/EE9D4130-CC1E-48FA-AC53-DD8616C1F700/20171019-161738.tar.gz
[OK]     db.git
[OK]     vmnetd
[OK]     dns
[OK]     driver.amd64-linux
[OK]     virtualization VT-X
[OK]     app
[OK]     moby
[OK]     system
[OK]     moby-syslog
[OK]     db
[OK]     env
[OK]     virtualization kern.hv_support
[OK]     slirp
[OK]     osxfs
[OK]     moby-console
[OK]     logs
[OK]     docker-cli
[OK]     menubar
[OK]     disk

Diagnostic ID: EE9D4130-CC1E-48FA-AC53-DD8616C1F700

  • A reproducible case if this is a bug, Dockerfiles FTW

See below the fold.

Steps to reproduce the behavior

reproduction/packet dump script
#!/usr/bin/env bash

set -x

d="$(date +%Y-%m-%d_%H:%M)"

exec &> >(tee "log.txt")

sudo true
docker version

cat > Dockerfile << 'EOF'
FROM ubuntu

RUN apt-get update
RUN apt-get install -y dnsutils iproute2 iputils-ping tcpdump curl
RUN apt-get clean
EOF

docker build -t dnstest .

docker ps -a --filter label=dnstest -q | xargs docker rm -f

docker run --rm --name dnstest-0 -l dnstest --net host -v "$(pwd):/mnt" dnstest tcpdump -i eth0 -w "/mnt/${d}.docker-vm.pcap" port 53 &
docker run --rm --name dnstest-1 -l dnstest -v "$(pwd):/mnt" dnstest tcpdump -w "/mnt/${d}.container.pcap" port 53 &
sudo tcpdump -i utun0 -w "${d}.mac-host.pcap" port 53 &

sleep 10

host=300-a-records.example.com
docker run --rm --name "dnstest-$host-nslookup" -l dnstest --network container:dnstest-1 dnstest bash -c "for i in `seq 1 10 | xargs`; do nslookup '$host'; done"
host=cname-to-300-a-records.example.com
docker run --rm --name "dnstest-$host-nslookup" -l dnstest --network container:dnstest-1 dnstest bash -c "for i in `seq 1 10 | xargs`; do nslookup '$host'; done"
host=cname-to-cname-to-300-a-records.example.com
docker run --rm --name "dnstest-$host-nslookup" -l dnstest --network container:dnstest-1 dnstest bash -c "for i in `seq 1 10 | xargs`; do nslookup '$host'; done"

sleep 10

kill %sudo
docker ps -a --filter label=dnstest -q | xargs docker stop
@matthiasr

Author

commented Oct 19, 2017

I didn't think to test this earlier, but it is easy to confirm that TCP lookups work (dig +tcp huge-record.example.com reliably works); they just never get triggered because no truncated UDP responses arrive.
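Whether a UDP response arrives with the truncation flag set can be checked from dig's own output. A minimal sketch: `has_tc_flag` is a hypothetical helper that assumes dig's usual ";; flags: ..." header line.

```shell
# Reads dig output on stdin; exits 0 if the header flags include "tc", 1 otherwise.
has_tc_flag() {
  awk '
    /^;; flags:/ {
      sub(/^;; flags:/, "")   # drop the line prefix
      sub(/;.*/, "")          # drop the section counters after the flag list
      n = split($0, flags, " ")
      for (i = 1; i <= n; i++) if (flags[i] == "tc") found = 1
    }
    END { exit(found ? 0 : 1) }
  '
}

# Example (the hostname is the placeholder used in this report):
#   dig +notcp +ignore huge-record.example.com | has_tc_flag && echo "truncated"
```

`+ignore` stops dig from retrying over TCP on its own, so the flags shown are those of the UDP response itself.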

@matthiasr

Author

commented Oct 19, 2017

PPS: It is possible that nothing related to this actually changed between 17.06 and 17.09: the record this hurts us most on sits right at the size limit and may have been slightly smaller in the past.

@alexandruionica

commented Jan 15, 2018

I encountered this bug last week with docker-ce=17.09.0ce-0ubuntu.
It showed up with a DNS record that has 32 entries. When a client outside of Docker tries to resolve that name, it gets a reply marked as truncated, so it switches to a TCP connection and queries the server again.
When using Docker with user-mode networking, the initial request from a client (in a container) is answered over UDP with 30 answers, so 2 random entries are cut off.

@docker-desktop-robot

Collaborator

commented Apr 16, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale comment.
Stale issues will be closed after an additional 30d of inactivity.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle stale

@alexandruionica

commented Apr 16, 2018

/remove-lifecycle stale

@jijojv

commented Jul 5, 2018

Ran into this bug while testing git-lfs on CentOS 7.

@docker-desktop-robot

Collaborator

commented Oct 3, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale comment.
Stale issues will be closed after an additional 30d of inactivity.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle stale

@dziemba

commented Oct 20, 2018

/remove-lifecycle stale

@dziemba

commented Oct 20, 2018

This is still a huge issue for us; it prevents any engineer in our company from using docker-for-mac for development.

I created a testing DNS server to make it easier for everybody to recreate the scenario:

dig SRV smalldns.test.dziemba.net
dig SRV hugedns.test.dziemba.net
dig +tcp SRV hugedns.test.dziemba.net

All these commands should run fine. The smalldns set contains 3 records, hugedns contains 420.

These commands work fine directly on Linux/macOS and with a docker-machine/virtualbox Docker setup under macOS. Only on docker-for-mac (tested on 18.06.1-ce-mac73) do the issues described above appear:

  • smalldns works fine
  • hugedns times out
  • +tcp hugedns returns incomplete data; in my tests it only returned 57 out of 420 records

Let me know if you need any further information!
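The record counts can be compared by pulling the ANSWER counter out of dig's header line. A small sketch: `answer_count` is a hypothetical helper, and the test server named above may no longer be reachable.

```shell
# Reads dig output on stdin and prints the ANSWER count from the ";; flags:" header line.
answer_count() {
  sed -n 's/^;; flags:.*ANSWER: \([0-9][0-9]*\),.*/\1/p' | head -n 1
}

# Example: compare what arrives over UDP with what arrives over TCP.
#   udp=$(dig +notcp +ignore SRV hugedns.test.dziemba.net | answer_count)
#   tcp=$(dig +tcp SRV hugedns.test.dziemba.net | answer_count)
#   echo "UDP answers: ${udp}, TCP answers: ${tcp}"
```

On a setup that behaves correctly, the TCP count should equal the full size of the record set.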

@Habbie

commented Nov 27, 2018

If somebody is going to look at the DNS proxy, here's another bug:

Outside Docker:

$ dig a smalldns.test.dziemba.net

; <<>> DiG 9.12.2-P1 <<>> a smalldns.test.dziemba.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33506
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;smalldns.test.dziemba.net.	IN	A

;; AUTHORITY SECTION:
test.dziemba.net.	5	IN	SOA	test.dziemba.net. admin.example.com. 5 30 30 30 30

;; Query time: 34 msec
;; SERVER: 62.179.104.196#53(62.179.104.196)
;; WHEN: Tue Nov 27 20:47:07 CET 2018
;; MSG SIZE  rcvd: 107

Note the NOERROR status, which is correct - the name exists, just the type does not.

Inside a docker container (debian:9 with apt-get install dnsutils):

# dig A smalldns.test.dziemba.net +tcp

; <<>> DiG 9.10.3-P4-Debian <<>> A smalldns.test.dziemba.net +tcp
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 431
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;smalldns.test.dziemba.net.	IN	A

;; Query time: 32 msec
;; SERVER: 192.168.65.1#53(192.168.65.1)
;; WHEN: Tue Nov 27 19:46:37 UTC 2018
;; MSG SIZE  rcvd: 43

Note the NXDOMAIN status which is wrong - the name does exist!
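The difference between the two replies can be spelled out by looking at the status together with the answer count. A rough sketch: `dns_status` and `classify` are hypothetical helpers, and the classification assumes the reply comes from a recursive resolver, so NOERROR with zero answers is read as "name exists, no data of this type".

```shell
# Reads dig output on stdin and prints the status field (e.g. NOERROR, NXDOMAIN).
dns_status() {
  sed -n 's/^;; ->>HEADER<<-.*status: \([A-Z0-9]*\),.*/\1/p' | head -n 1
}

# Classifies a reply from its status ($1) and its ANSWER count ($2).
classify() {
  case "$1" in
    NXDOMAIN)
      echo "name does not exist" ;;
    NOERROR)
      if [ "$2" -eq 0 ]; then
        echo "name exists, no records of this type"
      else
        echo "answer"
      fi ;;
    *)
      echo "other: $1" ;;
  esac
}

# Example with the two headers shown above:
#   classify NOERROR 0    (reply outside Docker)
#   classify NXDOMAIN 0   (reply inside the container)
```

A forwarder that turns the first kind of reply into the second tells the client that the whole name is missing, not just the A records.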

@docker-desktop-robot

Collaborator

commented Feb 25, 2019

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale comment.
Stale issues will be closed after an additional 30d of inactivity.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle stale

@Habbie

commented Feb 26, 2019

Docker Desktop (Mac), Community, Version 2.0.0.3 (31259), Channel: stable, 8858db33c8, Engine: 18.09.2.

Issue still present.

@dziemba

commented Feb 26, 2019

/remove-lifecycle stale

@djs55

Contributor

commented Mar 19, 2019

@matthiasr thanks very much for the clear bug report and repro scripts. I believe the error is in the DNS forwarder inside https://github.com/moby/vpnkit so I've created a candidate fix and a unit test for it.

I'll keep you all informed of progress.

@djs55 djs55 self-assigned this Mar 25, 2019

@djs55

Contributor

commented Mar 25, 2019

There's a development build with the candidate fix in, if you'd like to try it: https://download-stage.docker.com/mac/edge/32461/Docker.dmg

When I try @dziemba's example queries, the UDP -> TCP fallback seems to work:

/ # dig SRV hugedns.test.dziemba.net
;; Truncated, retrying in TCP mode.
...
;; ANSWER SECTION:
hugedns.test.dziemba.net. 7	IN	SRV	0 5 8080 www404.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.example.com.
...

However, I notice that only 55 records are returned rather than 420. So I think the TCP fallback issue is fixed, but there appears to be a separate issue with the total number of records returned.

Edited to add: The separate issue about the total number of records returned is Mac-specific. On Windows I see the full 420 records.
