Huge primary DNS server timeout in overlay networks #1361

Closed
tomashejatko opened this issue Jul 27, 2016 · 4 comments

@tomashejatko

Output of docker version:

Client:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   b9f10c9
 Built:        Wed Jun  1 21:47:50 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   b9f10c9
 Built:        Wed Jun  1 21:47:50 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 14
 Running: 12
 Paused: 0
 Stopped: 2
Images: 73
Server Version: 1.11.2
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 1180
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: overlay bridge null host
Kernel Version: 4.4.0-24-generic
Operating System: Ubuntu 14.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.67 GiB
Name: ctrl1
ID: IOLI:2MX6:TLFT:BGXW:O35O:4W73:V5UZ:QZ7Q:NTLW:IFFG:JYK7:PTDH
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): true
 File Descriptors: 119
 Goroutines: 209
 System Time: 2016-07-27T10:50:59.486237445Z
 EventsListeners: 3
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Cluster store: etcd://ctrl1.dev:2379,ctrl2.dev:2379,ctrl3.dev:2379/_pa
Cluster advertise: 10.248.0.2:4243

Additional environment details (AWS, VirtualBox, physical, etc.):
The containers run on QEMU virtual machines, but that is not relevant here.
The results are the same whether I pass the --dns option to the Docker daemon or use the default (letting Docker read /etc/resolv.conf on the Docker host).

Steps to reproduce the issue:

  1. The primary DNS server is down.
  2. Resolving on the Docker host is slower, but works fine:
    [root@ctrl1.dev ~]# time host www.google.com
    www.google.com has address 172.217.16.100
    www.google.com has IPv6 address 2a00:1450:4014:80b::2004
    real 0m3.087s
    user 0m0.012s
    sys 0m0.008s
  3. Resolving inside a container on an overlay network is much slower:
    [root@ctrl1.dev ~]# docker exec -t -i redis1 bash -c 'time host www.google.com'
    www.google.com has address 172.217.16.100
    www.google.com has IPv6 address 2a00:1450:4014:80b::2004
    real 0m12.055s
    user 0m0.008s
    sys 0m0.004s

A container on the default (bridge) network works fine:
root@848f9f8e5478:/# time host www.google.com
www.google.com has address 172.217.16.100
www.google.com has IPv6 address 2a00:1450:4014:80b::2004

real 0m3.019s
user 0m0.004s
sys 0m0.008s

Describe the results you received:
Fallback to the secondary server takes too long, and I can't use NSCD here because hostnames inside the overlay network would get cached too (or would they not?).

Describe the results you expected:
Fallback to the secondary server takes the same time as with the native Linux resolver.

@sanimej

sanimej commented Jul 27, 2016

@arteal For overlay and custom bridge networks, the container's resolv.conf contains only the Docker DNS server's IP, 127.0.0.11. It is part of the Docker daemon and is always available. When the client sends an external query, the embedded server forwards it to the configured nameservers.

By default, host queries for A, AAAA and MX records and tries them sequentially. Each of those queries is sent first to the primary DNS server and, when that fails, to the secondary DNS server. Currently the timeout is 4 seconds per query, so the three queries take 12 seconds in total. The behavior you are seeing is therefore expected for the host command. We can consider an enhancement where the Docker DNS server monitors the 'liveness' of the DNS servers and picks one that is available.
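
The arithmetic behind the 12 seconds can be sketched as below. This is a simplified model, not Docker's actual forwarding code: the function `lookup_time` and its parameters are hypothetical, and it assumes a query to a dead server costs the full 4-second timeout while a live server answers in negligible time.

```python
def lookup_time(record_types, servers_up, timeout_s=4.0, answer_s=0.0):
    """Total time when record types are queried one after another and
    each query tries the nameservers in order until one answers.

    servers_up is a list of booleans, one per configured nameserver
    (hypothetical representation, in resolv.conf order).
    """
    total = 0.0
    for _ in record_types:
        for up in servers_up:
            if up:
                total += answer_s   # a live server answers right away
                break
            total += timeout_s      # a dead server costs a full timeout
    return total

# Primary down, secondary up, `host` asking for A, AAAA and MX in sequence.
print(lookup_time(["A", "AAAA", "MX"], [False, True]))  # 12.0
```

With the primary up, the same call returns 0.0 in this model, which is why the delay only shows up once the first nameserver stops responding.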

Note that real applications might not experience such a large delay, because typically not all three records are queried, and some apps query A and AAAA in parallel.
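
To illustrate why the delay depends on how the client issues its queries, here is a rough sketch. It is a model under stated assumptions only: `app_lookup_time` is a hypothetical helper, each query is assumed to wait one 4-second timeout per dead server ahead of the first live one, and parallel queries are assumed to overlap completely.

```python
def app_lookup_time(record_types, parallel, dead_servers_first=1, timeout_s=4.0):
    """Estimated delay before a client has all its answers.

    dead_servers_first: number of unreachable nameservers tried before
    a live one (hypothetical parameter; 1 means only the primary is down).
    """
    if not record_types:
        return 0.0
    per_query = dead_servers_first * timeout_s
    if parallel:
        # All queries wait at the same time, so the delay is paid once.
        return per_query
    # Sequential queries each pay the full delay.
    return per_query * len(record_types)

print(app_lookup_time(["A", "AAAA"], parallel=True))         # 4.0
print(app_lookup_time(["A"], parallel=False))                # 4.0
print(app_lookup_time(["A", "AAAA", "MX"], parallel=False))  # 12.0
```

In this model only a client that walks through several record types one by one, as host does, reaches the full 12 seconds.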

@tomashejatko
Author

This "liveness" monitoring would be cool.
This ticket is based on my experience where etcd becomes unusable because it simply refuses to wait that long for DNS resolution.
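
One way such liveness tracking could look is sketched below. This is purely illustrative and not how Docker implements anything: the class `LivenessTracker`, its methods, the 30-second retry window and the example addresses (taken from the 192.0.2.0/24 documentation range) are all assumptions.

```python
import time

class LivenessTracker:
    """Remembers which nameservers recently timed out and tries them last."""

    def __init__(self, servers, retry_after_s=30.0, clock=time.monotonic):
        self.servers = list(servers)        # configured order, primary first
        self.retry_after_s = retry_after_s  # how long a failure is remembered
        self.clock = clock
        self.failed_at = {}                 # server -> time of last timeout

    def mark_failed(self, server):
        self.failed_at[server] = self.clock()

    def mark_ok(self, server):
        self.failed_at.pop(server, None)

    def ordered(self):
        """Servers in the order they should be tried for the next query."""
        now = self.clock()

        def recently_failed(server):
            t = self.failed_at.get(server)
            return t is not None and now - t < self.retry_after_s

        # sorted() is stable and False sorts before True, so healthy
        # servers keep their configured order and failed ones move back.
        return sorted(self.servers, key=recently_failed)

tracker = LivenessTracker(["192.0.2.1", "192.0.2.2"])
tracker.mark_failed("192.0.2.1")
print(tracker.ordered())  # ['192.0.2.2', '192.0.2.1']
```

A forwarder using something like this would pay the timeout once, then go straight to the secondary until the retry window expires and the primary is tried again.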

@mimousewu

I hit this problem too. About a day after the container starts, it seems it can no longer resolve outside domains using only the DNS server at 127.0.0.11, so I have to add a nameserver manually to /etc/resolv.conf. --dns is useless in a swarm environment. My Docker version is the same as above; the swarm version is v1.2.4-rc2. BTW, I use ZooKeeper. It lost about 25% of packets when pinging overlay node IP addresses. For some reason it goes back to normal, and then sometimes it crashes.

@GordonTheTurtle

@arteal It has been detected that this issue has not received any activity in over 6 months. Can you please let us know if it is still relevant:

  • For a bug: do you still experience the issue with the latest version?
  • For a feature request: was your request appropriately answered in a later version?

Thank you!
This issue will be automatically closed in 1 week unless it is commented on.
For more information please refer to #1926
