Huge primary DNS server timeout in overlay networks #1361

tomashejatko · 2016-07-27T11:25:47Z

Output of docker version:

Client:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   b9f10c9
 Built:        Wed Jun  1 21:47:50 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   b9f10c9
 Built:        Wed Jun  1 21:47:50 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 14
 Running: 12
 Paused: 0
 Stopped: 2
Images: 73
Server Version: 1.11.2
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 1180
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: overlay bridge null host
Kernel Version: 4.4.0-24-generic
Operating System: Ubuntu 14.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.67 GiB
Name: ctrl1
ID: IOLI:2MX6:TLFT:BGXW:O35O:4W73:V5UZ:QZ7Q:NTLW:IFFG:JYK7:PTDH
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): true
 File Descriptors: 119
 Goroutines: 209
 System Time: 2016-07-27T10:50:59.486237445Z
 EventsListeners: 3
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Cluster store: etcd://ctrl1.dev:2379,ctrl2.dev:2379,ctrl3.dev:2379/_pa
Cluster advertise: 10.248.0.2:4243

Additional environment details (AWS, VirtualBox, physical, etc.):
Containers are running on QEMU virtual machines, but this is not relevant here.
Results are the same whether I pass the --dns option to the docker daemon or use the default (letting docker read /etc/resolv.conf on the docker host).

Steps to reproduce the issue:

The primary DNS server is down.
Resolving on the docker host itself is slower, but still fine:
[root@ctrl1.dev ~]# time host www.google.com
www.google.com has address 172.217.16.100
www.google.com has IPv6 address 2a00:1450:4014:80b::2004
real 0m3.087s
user 0m0.012s
sys 0m0.008s
Resolving inside a container on an overlay network is much slower:
[root@ctrl1.dev ~]# docker exec -t -i redis1 bash -c 'time host www.google.com'
www.google.com has address 172.217.16.100
www.google.com has IPv6 address 2a00:1450:4014:80b::2004
real 0m12.055s
user 0m0.008s
sys 0m0.004s

A container on the default (bridge) network works fine:
root@848f9f8e5478:/# time host www.google.com
www.google.com has address 172.217.16.100
www.google.com has IPv6 address 2a00:1450:4014:80b::2004

real 0m3.019s
user 0m0.004s
sys 0m0.008s
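
The timings above come from `host`, which issues several queries per invocation. As a rough cross-check, a minimal sketch like the following could time a single lookup through the system resolver instead. The helper name `time_lookup` and its `resolve` parameter are hypothetical, introduced only for illustration; `socket.getaddrinfo` typically asks for A and AAAA records only, so its numbers are not directly comparable to the `host` output.

```python
import socket
import time


def time_lookup(hostname, resolve=socket.getaddrinfo):
    """Return (elapsed_seconds, addresses) for one name lookup.

    `resolve` defaults to the system resolver; it is a parameter only so
    that a stand-in can be passed for offline experiments.
    """
    start = time.monotonic()
    results = resolve(hostname, None)
    elapsed = time.monotonic() - start
    # Each getaddrinfo entry ends with a socket address tuple whose first
    # field is the IP address string.
    addresses = sorted({entry[4][0] for entry in results})
    return elapsed, addresses
```

Running `time_lookup("www.google.com")` on the docker host, in a bridge-network container, and in an overlay-network container would give one elapsed time per environment to compare.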

Describe the results you received:
Fallback to the secondary server takes too long, and I can't use NSCD here because hostnames inside the overlay network would get cached too (or would they not?)

Describe the results you expected:
Fallback to the secondary server takes the same time as with the native Linux resolver


sanimej · 2016-07-27T17:35:55Z

@arteal For overlay and custom bridge networks, the container's resolv.conf contains only the Docker DNS server's IP, 127.0.0.11. It's part of the Docker daemon and always available. When the client sends an external query, the embedded server forwards it to the configured nameservers.

By default, host queries for A, AAAA and MX records and tries them sequentially. So each of those queries is first sent to the primary DNS server and, when that fails, to the secondary DNS server. Currently there is a timeout of 4 seconds, resulting in a total time of 12 seconds (3 queries of 4 seconds each). So the behavior you are seeing is expected for the host command. We can consider an enhancement where the Docker DNS server monitors the 'liveness' of the DNS servers and picks one that is available.
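
To make the 'liveness' idea concrete, here is a minimal sketch of one possible approach. It is not how the Docker DNS server is implemented: the class `UpstreamSelector`, its methods, and the `retry_after_s` value are all hypothetical, and it tracks liveness passively (from observed timeouts) rather than by actively probing the servers.

```python
import time


class UpstreamSelector:
    """Prefer upstream DNS servers that have not timed out recently.

    Hypothetical illustration only; names and defaults are assumptions.
    """

    def __init__(self, servers, retry_after_s=30.0, clock=time.monotonic):
        self.servers = list(servers)        # configured order, primary first
        self.retry_after_s = retry_after_s  # how long a failure is remembered
        self.clock = clock
        self.failed_at = {}                 # server -> time of last timeout

    def mark_failed(self, server):
        """Record that a query to `server` just timed out."""
        self.failed_at[server] = self.clock()

    def mark_ok(self, server):
        """Record that `server` answered, clearing any remembered failure."""
        self.failed_at.pop(server, None)

    def ordered(self):
        """Return servers to try: live ones first, configured order kept."""
        now = self.clock()

        def is_down(server):
            failed = self.failed_at.get(server)
            return failed is not None and now - failed < self.retry_after_s

        # sorted() is stable and False sorts before True, so servers that
        # are considered live keep their configured order at the front.
        return sorted(self.servers, key=is_down)
```

With something along these lines, only the first query after the primary goes down would pay the timeout; later queries would go to the secondary first until `retry_after_s` has elapsed.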

Note that real applications might not experience such a large delay, because typically not all 3 records are queried, and some apps also query A & AAAA in parallel.
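
The difference between sequential and parallel queries can be sketched with a simple model. The function names are hypothetical, and the model assumes every query waits out the full primary timeout before the secondary answers (with the secondary's own latency defaulting to zero).

```python
def sequential_delay(record_types, primary_timeout_s, secondary_latency_s=0.0):
    """Total wait when each record type is queried one after another."""
    return len(record_types) * (primary_timeout_s + secondary_latency_s)


def parallel_delay(record_types, primary_timeout_s, secondary_latency_s=0.0):
    """Total wait when all record types are queried at the same time."""
    if not record_types:
        return 0.0
    # All queries wait out the primary timeout concurrently.
    return primary_timeout_s + secondary_latency_s


# `host` asks for A, AAAA and MX one after another with a 4 second timeout.
host_wait = sequential_delay(["A", "AAAA", "MX"], primary_timeout_s=4.0)
# An application asking for A and AAAA in parallel waits only once.
app_wait = parallel_delay(["A", "AAAA"], primary_timeout_s=4.0)
print(host_wait, app_wait)  # prints "12.0 4.0"
```

Under these assumptions the model gives the 12 seconds seen with `host`, and about 4 seconds for an application that queries A and AAAA in parallel.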

tomashejatko · 2016-07-28T09:10:25Z

This "liveness" monitoring would be cool.
This ticket is based on my experience where etcd becomes unusable because it simply refuses to wait that long for DNS resolution.

mimousewu · 2016-08-18T00:20:46Z

I ran into this problem too. Also, about a day after the container starts up, it seems it can no longer resolve outside domains using only the DNS at 127.0.0.11; I have to add a name server manually to /etc/resolv.conf. --dns is useless in a swarm environment. My docker version is the same as above; the swarm version is v1.2.4-rc2. BTW, I use zookeeper. It loses about 25% of packets when pinging an overlay node IP address. For some reason it goes back to normal, and then sometimes it crashes.

GordonTheTurtle · 2017-08-30T00:27:28Z

@arteal It has been detected that this issue has not received any activity in over 6 months. Can you please let us know if it is still relevant:

For a bug: do you still experience the issue with the latest version?
For a feature request: was your request appropriately answered in a later version?

Thank you!
This issue will be automatically closed in 1 week unless it is commented on.
For more information please refer to #1926

GordonTheTurtle closed this as completed Sep 26, 2017
