1.10 breaks containerized DNS #20019

Closed
huguesb opened this issue Feb 5, 2016 · 11 comments · Fixed by #20181
Labels: area/networking, priority/P3 (Best effort: those are nice to have / minor issues.)

Comments

huguesb commented Feb 5, 2016

server: docker 1.10 (boot2docker 1.10 / docker-machine 0.6)
client: docker 1.9.1 (OSX El Capitan)

In our dev environment we're using a containerized instance of rawdns to transparently proxy apt and apk downloads through appropriate caching proxies, themselves in containers on the same host. For python packages, unfortunately we cannot do transparent proxying but we can rely on rawdns to resolve the name of the caching container to its ip.

Because docker build doesn't accept a --dns argument (which blows my mind but I digress) we have to change the docker daemon configuration to point all containers to that DNS server. The most reliable way to do this we've found is to have the rawdns container bind to the bridge ip directly and use that ip for the daemon --dns argument (tangent: it'd be nice if DNS settings could be changed at runtime with the new SIGHUP config reload).

The upgrade to 1.10 completely breaks our setup: for some reason the rawdns container doesn't get any requests, which means that resolving the PyPI caching containers fails (with an annoyingly long delay as well), transparent apt caching doesn't work and, more importantly, our sanity checks fail so the build doesn't even start.

Is this a known limitation of the new networking model? If not, how can I debug this? The daemon logs don't show anything suspicious afaict. If yes, is there an alternate approach to achieve the desired behavior?

Author

huguesb commented Feb 5, 2016

For completeness, here's a gist that should allow the issue to be reproduced:

https://gist.github.com/huguesb/ce2c88f144dce518e4ce

Create a fresh boot2docker VM and run rawdns.sh.
It should end with "dns successfully configured", but with 1.10 it ends with "invalid dns configuration".

tiborvass added this to the 1.10.1 milestone Feb 5, 2016

Contributor

mavenugo commented Feb 5, 2016

@huguesb the new DNS model in 1.10 should not impact your use case, since all your containers are connected to the default bridge network (docker0). The new embedded DNS in 1.10 applies only to user-defined networks.
Hence the failure you are observing could be unrelated to the new 1.10 features.
I will give your script a try once I have replicated your setup (your script isn't working as-is in a Linux dev env). I hope you are not statically using 172.17.42.1 as your --dns configuration (that changed in the 1.9 release).

Author

huguesb commented Feb 5, 2016

@mavenugo the script expects a docker-machine environment. raw docker on linux is not fully supported because updating the daemon dns config requires a manual step (cf tangents above about --dns arg to docker build and support for dns change in the new SIGHUP config reload)

NB: as it is, the script requires jq 1.5 and on linux it needs a recent version of curl to use unix sockets but that could easily be removed to make a minimal reproducer.

Contributor

mavenugo commented Feb 5, 2016

@huguesb okay. I think I got the setup in place and reproduced your use case. I don't know much about the rawdns binary... I assume that rawdns.json is the basic configuration for the binary and will help with the lookup for rawdns.docker when you try to ping that name. Which DNS server will actually resolve that name? And what is 172.16.0.83 in that json file?

Contributor

mavenugo commented Feb 5, 2016

@huguesb also, I tried a simple nc 172.17.0.1 53 from another container (yes, TCP) in that network and the connection seems to be working fine, which indicates that the DNS queries should be correctly forwarded to the rawdns container.
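
Because DNS lookups normally go over UDP, a TCP connect with nc may not exercise the same path as a real query. A minimal sketch of a UDP probe follows, assuming Python 3 is available where the test runs. `build_query` and `probe` are illustrative names, not part of Docker or rawdns, and port 53 is the standard DNS port.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: id, flags (recursion desired), QDCOUNT=1, other counts zero
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels terminated by a zero byte
    labels = [part.encode("ascii") for part in name.split(".") if part]
    qname = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    # QTYPE=A (1), QCLASS=IN (1)
    return header + qname + struct.pack("!HH", 1, 1)


def probe(server, name, timeout=2.0):
    """Send the query over UDP; True if a reply with the same id comes back."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts as well as unreachable-network errors
            return False
    return reply[:2] == query[:2]
```

Calling `probe("172.17.0.1", "rawdns.docker")` from a container on docker0 would report whether a UDP reply comes back from the bridge address used in this thread.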

Contributor

mavenugo commented Feb 5, 2016

@huguesb btw, I tried the scenario on an Ubuntu 14.04 system, with the following result:

$ cat /etc/rawdns.json
{
    "docker.": {
        "type": "containers",
        "socket": "unix:///var/run/docker.sock"
    },
    "local.": {
        "type": "forwarding",
        "nameservers": [ "192.168.1.1" ]
    },
    ".": {
        "type": "forwarding",
        "nameservers": [ "8.8.8.8" ]
    }
}

$ sudo ./docker-1.10.0 run --name=rawdns --rm -p 172.17.0.1:53:53/udp -v /var/run/docker.sock:/var/run/docker.sock -v /etc/rawdns.json:/etc/rawdns.json:ro tianon/rawdns  rawdns /etc/rawdns.json
2016/02/05 08:10:29 rawdns v1.3 (go1.5.3 on linux/amd64; gc)
2016/02/05 08:10:29 listening on domain: docker.
2016/02/05 08:10:29 listening on domain: local.
2016/02/05 08:10:29 listening on domain: .

$ sudo ./docker-1.10.0 run -it --dns 172.17.0.1 busybox ping -c1 rawdns.docker
PING rawdns.docker (172.17.0.2): 56 data bytes
64 bytes from 172.17.0.2: seq=0 ttl=64 time=0.098 ms

--- rawdns.docker ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.098/0.098/0.098 ms

It does seem to work as expected. Can you please find out what is different in your setup?

sanimej commented Feb 5, 2016

@huguesb The problem seems to happen when using the bridge IP for --dns with userland_proxy set to false. I tried two things to narrow down the issue:

  1. Instead of the bridge IP, use the rawdns container IP through --dns - this works.
  2. Remove userland_proxy = false and use the bridge IP through --dns - this works as well.

This is with boot2docker. We have to look into this some more to identify the change in 1.10 that is causing this. It looks like it's specific to the boot2docker environment.

tiborvass added the priority/P0 Urgent: Security, critical bugs, blocking issues. drop everything until this issue is addressed. label Feb 5, 2016

Author

huguesb commented Feb 5, 2016

@mavenugo 172.16.0.83 is the address of our vpn-internal dns server. Sorry, I should have scrubbed that config file better.

@sanimej I tried passing the rawdns container ip instead of the bridge ip. It works but it's not acceptable for our use case. Remember: this is about passing a dns option to docker build, so we need to change the daemon settings and the container ip is not stable across restarts (or did that change since I last checked?)

Unfortunately, removing userland_proxy=false is also not an option. The memory and cpu overhead is just too high.
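
To illustrate the point about unstable container addresses: a setup that passes the container IP to --dns has to look that IP up again after each restart. A rough sketch of such a lookup, assuming the docker CLI is on PATH and the container is attached to the default bridge. `container_ip` and `lookup` are hypothetical helper names.

```python
import json
import subprocess


def container_ip(inspect_output):
    """Extract the default-bridge IP from `docker inspect <container>` output."""
    # `docker inspect` prints a JSON array with one object per inspected item
    containers = json.loads(inspect_output)
    if not containers:
        raise ValueError("no container in inspect output")
    # NetworkSettings.IPAddress holds the address on the default bridge
    ip = containers[0]["NetworkSettings"]["IPAddress"]
    if not ip:
        raise ValueError("container has no IP address (is it running?)")
    return ip


def lookup(name):
    """Run `docker inspect` for the named container and return its current IP."""
    output = subprocess.check_output(["docker", "inspect", name])
    return container_ip(output.decode("utf-8"))
```

Since the value returned by `lookup("rawdns")` can differ from one start to the next, it cannot be written once into the daemon configuration, which is why the bridge IP is used here instead.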

Contributor

mavenugo commented Feb 5, 2016

@huguesb userland-proxy=false is something that we are not recommending atm. We have a serious kernel issue that we have to resolve before we can turn off userland-proxy by default.
We will continue to debug this issue, but our recommendation is to be aware of the kernel issue (PTAL: #14856).

tiborvass added priority/P2 Normal priority: default priority applied. and removed priority/P0 Urgent: Security, critical bugs, blocking issues. drop everything until this issue is addressed. labels Feb 5, 2016
@tiborvass
Contributor

Sorry, I mixed this up with another issue; this one is not a P0.

thaJeztah added priority/P3 Best effort: those are nice to have / minor issues. and removed priority/P2 Normal priority: default priority applied. labels Feb 5, 2016
@thaJeztah
Member

I made this a P3; we have some urgent issues that we want solved ASAP for the 1.10.1 release. I think it's OK to let this slip to 1.10.2 if we're unable to resolve it before 1.10.1. Enabling userland-proxy (the default setting) can serve as a workaround in the meantime.

mavenugo added a commit to mavenugo/docker that referenced this issue Feb 10, 2016
- Fixes moby#20132 moby#20140 moby#20019

Signed-off-by: Madhu Venugopal <madhu@docker.com>
tiborvass pushed a commit to tiborvass/docker that referenced this issue Feb 10, 2016
- Fixes moby#20132 moby#20140 moby#20019

Signed-off-by: Madhu Venugopal <madhu@docker.com>
(cherry picked from commit 84705f1)

From PR moby#20181
6 participants