Embedded DNS not resolving hostname #179

Closed
bluefurrymonster opened this issue Dec 4, 2017 · 16 comments


bluefurrymonster commented Dec 4, 2017

  • This is a bug report
  • This is a feature request
  • I searched existing issues before opening this one

Expected behavior

# docker exec -it busybox01 ping -c 2 busybox02

PING busybox02 (172.18.0.3): 56 data bytes
64 bytes from 172.18.0.3: seq=0 ttl=64 time=0.037 ms
64 bytes from 172.18.0.3: seq=1 ttl=64 time=0.060 ms

Actual behavior

# docker exec -it busybox01 ping -c 2 busybox02
ping: bad address 'busybox02'

Steps to reproduce the behavior

# docker network create -d bridge mybridge01
# docker run -itd --network=mybridge01 --name busybox01 -h busybox01 busybox
# docker run -itd --network=mybridge01 --name busybox02 -h busybox02 busybox
# docker exec -it busybox01 ping -c 2 busybox02

Output of docker version:

Client:
 Version:      17.09.0-ce
 API version:  1.32
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:41:23 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.09.0-ce
 API version:  1.32 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:42:49 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

Containers: 10
 Running: 10
 Paused: 0
 Stopped: 0
Images: 21
Server Version: 17.09.0-ce
Storage Driver: devicemapper
 Pool Name: docker-252:17-8126473-pool
 Pool Blocksize: 65.54kB
 Base Device Size: 10.74GB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 13.65GB
 Data Space Total: 107.4GB
 Data Space Available: 93.72GB
 Metadata Space Used: 14.74MB
 Metadata Space Total: 2.147GB
 Metadata Space Available: 2.133GB
 Thin Pool Minimum Free Space: 10.74GB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Data loop file: /data/docker_root_dir/devicemapper/devicemapper/data
 Metadata loop file: /data/docker_root_dir/devicemapper/devicemapper/metadata
 Library Version: 1.02.140-RHEL7 (2017-05-03)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
runc version: 3f2f8b84a77f73d38244dd690525642a72156c64
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-327.36.3.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.51GiB
Name: XXXXXX
ID: UCJI:4WP5:AXWX:FMRR:W7VV:HS53:NXUF:QJS4:RVPY:METQ:KIJN:ZUJS
Docker Root Dir: /data/docker_root_dir
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: devicemapper: usage of loopback devices is strongly discouraged for production use.
         Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

Additional environment details (AWS, VirtualBox, physical, etc.)
Works perfectly fine in another VM in Azure with the same OS.
(screenshot attached)


trapier commented Dec 7, 2017

Put the docker daemon on the affected host into debug mode, then watch the daemon logs while running the failing ping. Healthy output looks something like this:

sudo journalctl -fu docker -n0 |grep -i resolve

Dec 06 23:52:41 revnd dockerd[32205]: time="2017-12-06T23:52:41.983800192-05:00" level=debug msg="Name To resolve: busybox02."
Dec 06 23:52:41 revnd dockerd[32205]: time="2017-12-06T23:52:41.983976480-05:00" level=debug msg="[resolver] lookup name busybox02. present without IPv6 address"
Dec 06 23:52:41 revnd dockerd[32205]: time="2017-12-06T23:52:41.984087899-05:00" level=debug msg="Name To resolve: busybox02."
Dec 06 23:52:41 revnd dockerd[32205]: time="2017-12-06T23:52:41.984118877-05:00" level=debug msg="[resolver] lookup for busybox02.: IP [172.20.0.3]"
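
(One way to turn on debug mode without restarting containers, as a minimal sketch assuming the daemon runs under systemd and reads /etc/docker/daemon.json; dockerd re-reads the "debug" setting on SIGHUP:)

echo '{ "debug": true }' | sudo tee /etc/docker/daemon.json   # if the file already exists, add "debug": true to it by hand instead of overwriting it
sudo kill -HUP $(pidof dockerd)                               # dockerd re-reads daemon.json on SIGHUP; containers keep running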

If there's no activity in the daemon logs while name resolution fails, then check the plumbing for service discovery. The container's /etc/resolv.conf should point to nameserver 127.0.0.11:

docker exec busybox01 cat /etc/resolv.conf

nameserver 127.0.0.11
options ndots:0

The container's iptables should have DNAT rules to destination port 53 on udp and tcp. Output should look something like this (the natted-to port will be different):

sudo nsenter -n -t $(docker inspect --format {{.State.Pid}} busybox01) iptables -t nat -nvL DOCKER_OUTPUT

Chain DOCKER_OUTPUT (1 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 DNAT       tcp  --  *      *       0.0.0.0/0            127.0.0.11           tcp dpt:53 to:127.0.0.11:45005
    0     0 DNAT       udp  --  *      *       0.0.0.0/0            127.0.0.11           udp dpt:53 to:127.0.0.11:57017

Then check if dockerd is listening on those ports:

sudo nsenter -n -p -t $(docker inspect --format {{.State.Pid}} busybox01) ss -utnlp |grep dockerd

udp    UNCONN     0      0      127.0.0.11:57017                 *:*                   users:(("dockerd",pid=32205,fd=29))
tcp    LISTEN     0      128    127.0.0.11:45005                 *:*                   users:(("dockerd",pid=32205,fd=31))

Diagram of DNS programming here: https://www.slideshare.net/MadhuVenugopal2/dcus17-docker-networking-deep-dive/18?src=clipshare

If the "plumbing" for DNS is incorrect or missing, then please reply with attachments:

@thaJeztah
Member

Closing because this issue went stale; also docker 17.09.1 and 17.12 have been released, and may have fixes in this area

@jamesdboone

I'm seeing the same issue; this is the second time. I tried to do everything you asked, but the stack dump isn't working...

docker version
Client:
Version: 1.12.5
API version: 1.24
Package version: docker-common-1.12.5-14.el7.x86_64
Go version: go1.7.4
Git commit: 047e51b/1.12.5
Built: Wed Jan 11 17:53:20 2017
OS/Arch: linux/amd64

Server:
Version: 1.12.5
API version: 1.24
Package version: docker-common-1.12.5-14.el7.x86_64
Go version: go1.7.4
Git commit: 047e51b/1.12.5
Built: Wed Jan 11 17:53:20 2017
OS/Arch: linux/amd64

Issuing docker exec ping resulted in no journald entries for docker.
docker exec -it rfrla0212454dc ping -c 2
ping: unknown host

docker exec rfrla0212454dc cat /etc/resolv.conf
search ...
nameserver 127.0.0.11
options ndots:0

sudo nsenter -n -p -t $(docker inspect --format {{.State.Pid}} rfrla0212454dc) ss -utnlp
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port
tcp LISTEN 0 128 *:22 *:* users:(("sshd",pid=50003,fd=3))
tcp LISTEN 0 100 127.0.0.1:25 *:* users:(("master",pid=50117,fd=13))
tcp LISTEN 0 128 :::22 :::* users:(("sshd",pid=50003,fd=4))

sudo nsenter -n -t $(docker inspect --format {{.State.Pid}} rfrla0212454dc) iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 1474 packets, 299K bytes)
pkts bytes target prot opt in out source destination

Chain INPUT (policy ACCEPT 1474 packets, 299K bytes)
pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 467 packets, 33554 bytes)
pkts bytes target prot opt in out source destination

Chain POSTROUTING (policy ACCEPT 467 packets, 33554 bytes)
pkts bytes target prot opt in out source destination

The stack trace dump doesn't seem to work. No log output results from 'sudo kill -SIGUSR1 '.

@thaJeztah
Member

@jamesdboone you’re running the Red Hat fork of Docker, which is an unofficial package that has a large number of modifications not in the official packages and is not maintained here.

Also note that the 1.12 version reached end of life in March last year. I’d recommend upgrading to a current version of Docker.

If you cannot upgrade, at least make sure to install the latest patch release of docker 1.12 (version 1.12.6), because docker 1.12.5 uses a version of the runc runtime that has a critical vulnerability allowing container processes to escape the container and get root access on the host.


jamesdboone commented Feb 2, 2018 via email

@alin-amana

FYI, we just ran into an issue with the embedded DNS server behaving differently from the Linux resolver. We had 2 DNS servers listed in /etc/resolv.conf, with the first consistently returning SERVFAIL due to being misconfigured and the second working correctly.

SERVFAIL (or RCODE 2) is defined in RFC 1035 as "Server failure - The name server was unable to process this query due to a problem with the name server." This is distinct from NXDOMAIN (RCODE 3), which stands for Non-Existent Domain or Name error. The former is a server error, the latter is a categorical "domain does not exist" answer.

In our case, the Linux resolver correctly (in my view) moved on to the second configured DNS server and resolved the host name. The Docker embedded DNS server appears to interpret SERVFAIL the same as NXDOMAIN (unfortunately I wasn't fast enough to see which error it actually returned) and fails the name resolution.

What this meant in practice was that any container not started on a bridge network was able to do DNS resolution without problems (maybe with some added latency) when faced with a misconfigured first DNS server, whereas any container using a bridge network would consistently fail to resolve any host names. This was especially confusing as (without direct access to the Docker server) all I could see was different /etc/resolv.conf contents between the 2 containers, with one of them succeeding and the other one failing DNS resolution.
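
(For anyone hitting this, a quick way to see which RCODE each configured upstream actually returns, assuming dig is available; 198.51.100.1 and 198.51.100.2 are hypothetical stand-ins for the two nameservers from /etc/resolv.conf:)

dig +time=2 +tries=1 example.com @198.51.100.1 | grep status    # a misconfigured server should show status: SERVFAIL
dig +time=2 +tries=1 example.com @198.51.100.2 | grep status    # a healthy server should show status: NOERROR (or NXDOMAIN for a missing name)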

@johnmaguire

Just wanted to chime in to say thanks for saving me time tracking down the same issue @alin-amana. Saw that behavior on Docker 18.09.5 on CentOS 7. Container was using bridged networking, and could not resolve hostnames. Checked /etc/resolv.conf on the host, and the first two nameservers were not resolving the hostnames, while the third did. Adding a dns: line to docker-compose.yml with the working nameserver resolved the issue.
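
(For reference, a minimal sketch of that workaround; the compose file layout is generic and 203.0.113.53 is a hypothetical placeholder for the nameserver that actually works in your environment:)

cat > docker-compose.yml <<'EOF'
version: "3.7"
services:
  app:
    image: alpine
    command: sleep 3600
    dns:
      - 203.0.113.53   # hypothetical working nameserver; replace with yours
EOF
docker-compose up -d
docker-compose exec app nslookup example.com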

@jurajseffer

@thaJeztah I'm running the official 19.03.5 version on Ubuntu 19.10 and, as described in #179 (comment), this is still broken. The first DNS server being unavailable or returning an error should not break DNS resolution completely.

@thaJeztah
Member

What error is returned by the first DNS?

@thaJeztah
Member

moby/libnetwork#2171 should've changed the behavior to continue with the next DNS (depending on the error)


jurajseffer commented Feb 3, 2020

@thaJeztah In my case the first server wasn't available at all (networking outage - no route to host or connection timeout).

@thaJeztah
Member

In that case it should continue with the next DNS; https://github.com/docker/libnetwork/blob/a86d2765b829fb122c70eea7a914d59a8fb1df4a/resolver.go#L452 (did you see that message in the daemon logs?)

Could you open a new ticket with details, and if possible with steps to reproduce the situation?
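
(If it helps, one way to pull the resolver activity out of the daemon logs for the failure window, assuming a systemd journal; the exact message text depends on the Docker version:)

sudo journalctl -u docker | grep -i "\[resolver\]"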

@jurajseffer

Unfortunately I didn't have debug mode on for the daemon at the time, but I've found these entries in the logs from around the time it was failing. I didn't find the one you mentioned above.

level=info msg="detected 127.0.0.53 nameserver, assuming systemd-resolved, so using resolv.conf: /run/systemd/resolve/resolv.conf"
level=error msg="[resolver] invalid concurrent query count"

Worth noting that it was only failing when using docker-compose (and thus presumably the Docker DNS resolver); containers started with docker run -it alpine were able to resolve DNS. The problem went away when I manually configured DNS servers (instead of using DHCP) in Ubuntu's network manager and removed the first entry, which was timing out (dig google.com @ took a long time and never finished for that address at the time).

I've tried to reproduce the problem now by setting an unreachable IP as the first DNS server, but was unable to reproduce it; Docker reported a slightly different error from what you expected:

level=debug msg="[resolver] read from DNS server failed, read udp 172.18.0.2:45270->192.68.255.255:53: i/o timeout"

It properly went on to use the second DNS server which resolved the domain.

If I manage to reproduce it I'll create a new ticket.
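
(For anyone else trying to reproduce, a minimal sketch of the setup described above; 192.0.2.1 is a hypothetical unreachable first server and 8.8.8.8 the working fallback:)

docker network create dnstest
docker run --rm --network dnstest --dns 192.0.2.1 --dns 8.8.8.8 alpine nslookup docker.com
# the embedded resolver (127.0.0.11 inside the container) should time out on the first
# upstream and fall back to the second; watch the daemon debug logs while this runs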

@thaJeztah
Member

Thanks!

This one is interesting as well; I haven't looked at the code yet to see in what situation that error would be produced, though:

invalid concurrent query count

@tvajjala

I am facing a similar issue on RHEL:
msg="[resolver] read from DNS server failed, read udp 172.19.0.4:42216->100.108.191.250:53: i/o timeout"

docker info
Server Version: 19.03.12

Security Options:
Operating System: Oracle Linux Server 7.6


bonswouar commented Aug 19, 2022

Just in case anybody has the same problem as me, this could save you some time:
In my case it was because some container IPs were being blocked by fail2ban. They weren't ignored by the configuration because they were using a custom subnet that I had forgotten to add to the ignoreip whitelist...
You can check this by doing:

iptables-save | grep -e YOUR_CONTAINER_IP | grep DROP
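
(A minimal sketch of the fix described above, assuming fail2ban picks up drop-ins from /etc/fail2ban/jail.d/; 172.28.0.0/16 is a hypothetical subnet, use the one shown by docker network inspect for your network:)

sudo tee /etc/fail2ban/jail.d/docker-subnet.local <<'EOF'
[DEFAULT]
ignoreip = 127.0.0.1/8 172.28.0.0/16
EOF
sudo systemctl restart fail2ban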
