
Error: getaddrinfo EAI_AGAIN host.docker.internal after upgrade to 4.17.0 #6747

Open

nanek opened this issue Feb 28, 2023 · 21 comments

nanek commented Feb 28, 2023

Expected behavior

host.docker.internal should resolve

Actual behavior

host.docker.internal does not resolve

Information

This problem is new. After upgrading to 4.17.0, I'm having issues sending UDP requests to host.docker.internal. After downgrading back to 4.16.2, it works as expected again.

  • macOS Version: 13.2.1
  • Intel chip or Apple chip: intel
  • Docker Desktop Version: 4.17.0
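
For anyone trying to reproduce, a minimal check (a sketch, assuming the stock busybox image; the hostnames and the failure mode are the ones from this report):

# Resolve host.docker.internal and a public name from inside a container.
# On 4.17.0 both lookups fail for me; on 4.16.2 they succeed.
docker run --rm busybox nslookup host.docker.internal
docker run --rm busybox nslookup www.google.com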

Output of /Applications/Docker.app/Contents/MacOS/com.docker.diagnose check

[2023-03-02T20:37:30.078466000Z][com.docker.diagnose][I] set path configuration to OnHost
Starting diagnostics

[PASS] DD0027: is there available disk space on the host?
[PASS] DD0028: is there available VM disk space?
[PASS] DD0018: does the host support virtualization?
[PASS] DD0001: is the application running?
[PASS] DD0017: can a VM be started?
[PASS] DD0016: is the LinuxKit VM running?
[PASS] DD0011: are the LinuxKit services running?
[PASS] DD0004: is the Docker engine running?
[PASS] DD0015: are the binary symlinks installed?
[PASS] DD0031: does the Docker API work?
[PASS] DD0013: is the $PATH ok?
[PASS] DD0003: is the Docker CLI working?
[PASS] DD0038: is the connection to Docker working?
[PASS] DD0014: are the backend processes running?
[PASS] DD0007: is the backend responding?
[PASS] DD0008: is the native API responding?
[PASS] DD0009: is the vpnkit API responding?
[PASS] DD0010: is the Docker API proxy responding?
[SKIP] DD0030: is the image access management authorized?
[PASS] DD0033: does the host have Internet access?
[PASS] DD0018: does the host support virtualization?
[PASS] DD0001: is the application running?
[PASS] DD0017: can a VM be started?
[PASS] DD0016: is the LinuxKit VM running?
[PASS] DD0011: are the LinuxKit services running?
[PASS] DD0004: is the Docker engine running?
[PASS] DD0015: are the binary symlinks installed?
[PASS] DD0031: does the Docker API work?
[PASS] DD0032: do Docker networks overlap with host IPs?
No fatal errors detected.

raylu commented Mar 1, 2023

possibly #6699?

djs55 (Contributor) commented Mar 1, 2023

It sounds related to the changes in #6699 but probably slightly different. Is the problem only with host.docker.internal? There is some special-casing there which might have broken.

nanek (Author) commented Mar 2, 2023

I'm seeing the same issue with any host, so it's not specific to host.docker.internal.

nanek (Author) commented Mar 2, 2023

From within the container, if I run curl --url www.google.com on 4.17, it returns curl: (6) Could not resolve host: www.google.com; on previous versions this worked fine. I guess it is more of a DNS issue.
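
A quick way to separate a DNS failure from a general loss of connectivity inside a container (a sketch, assuming the stock busybox image and that raw-IP traffic is unaffected):

# If the nslookup fails but the ping to a raw IP succeeds, only name resolution is broken.
docker run --rm busybox sh -c 'nslookup www.google.com; ping -c 1 1.1.1.1'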

wpline commented Mar 4, 2023

We have multiple users (macOS 13.2.1) running into a similar issue after upgrading to 4.17; 4.16.2 doesn't have this problem.

docker run -it --rm curlimages/curl curl https://www.google.com/

curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

jkroepke commented Mar 5, 2023

I have had the problem since I started using the dev builds from here:

#6699 (comment) (switching to stable 4.17 doesn't resolve the issue)

The problem appears after some amount of time. Restarting Docker Desktop resolves it again for a while.

jspuij commented Mar 14, 2023

I have the same issue on a Mac. Restarting does not solve it; I had to downgrade to 4.16.

mgh520 commented Mar 15, 2023

> From within the container, if I run curl --url www.google.com on 4.17, it returns curl: (6) Could not resolve host: www.google.com; on previous versions this worked fine. I guess it is more of a DNS issue.

And as @wpline says, for me it is an SSL certificate error. But it's not just curl; it's any HTTP call from within a container.

Given that this works in 4.16.2 and stopped working in 4.17.0, is it a bug? Or was it an intentional breaking change in 4.17? I looked through the change log but didn't see what would have caused this.

I can downgrade to 4.16.2 to resolve it, but in a few minutes Docker Desktop will force me to upgrade again :(

Ryux commented Mar 19, 2023

Same problem for me... I downgraded to 4.16.2 to solve this issue.

@axiopisty

I'm experiencing the same issue on my Mac, but mine is on Apple Silicon, not Intel.

@CodingWithTashi

@axiopisty This is an open issue; try downgrading. Works like a charm.
Download link here for Apple Silicon

@jkroepke

Downgrading doesn't work for me, because I'm hitting #6699.

XA21X commented Apr 12, 2023

I'm running Docker Desktop 4.17.0 on Apple M1.

docker-mac-net-connect's Brew service was failing due to a DNS resolution error:

Interface chip0 already exists. Removing.
Creating WireGuard interface chip0
Assigning IP to WireGuard interface
Failed to lookup IP: lookup host.docker.internal on 192.168.65.7:53: read udp 192.168.65.4:33092->192.168.65.7:53: i/o timeout

and I saw the same error as @nanek when curling Google. DNS resolution inside containers was broken.

chipmk/docker-mac-net-connect#19 (comment) gave me the idea to try changing the Docker subnet in case there's something that only gets initialised once. I set it to 192.168.66.0/24 (+1 from the default), and... it works. 😮

That fixed DNS resolution inside containers, and docker-mac-net-connect. Everything works now. 😃
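
If anyone wants to try the same workaround, a quick way to confirm the new subnet took effect and that container DNS works again (a sketch, assuming the stock busybox image):

# Show which resolver the container was given and whether lookups succeed.
docker run --rm busybox sh -c 'cat /etc/resolv.conf; nslookup www.google.com'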

@jkroepke

> Everything works now. 😃

How long have you tested this? Restarting Docker Desktop fixes the issue temporarily, and changing the CIDR restarts Docker anyway.

XA21X commented Apr 12, 2023

> How long have you tested this?

Not very long - hours, including one reboot.

> Restarting Docker Desktop fixes the issue temporarily.

Restarting Docker Desktop hasn't worked for me, not even temporarily. 🤷

I hope it keeps working, but a repeatable workaround would already be a great improvement.


EDIT: Wow, it randomly died right after I sent this^. 😢

The WireGuard tunnel stopped responding, and DNS also started timing out.

DEBUG: (utun3) 2023/04/13 01:33:11 peer(AVrP…jHWI) - Receiving keepalive packet
DEBUG: (utun3) 2023/04/13 01:33:25 peer(AVrP…jHWI) - Sending keepalive packet
DEBUG: (utun3) 2023/04/13 01:33:40 peer(AVrP…jHWI) - Sending keepalive packet
DEBUG: (utun3) 2023/04/13 01:34:01 peer(AVrP…jHWI) - Retrying handshake because we stopped hearing back after 15 seconds
DEBUG: (utun3) 2023/04/13 01:34:01 peer(AVrP…jHWI) - Sending handshake initiation
DEBUG: (utun3) 2023/04/13 01:34:06 peer(AVrP…jHWI) - Handshake did not complete after 5 seconds, retrying (try 2)

Restarting Docker Desktop (followed by docker-mac-net-connect) did work this time. 🤔

@jkroepke

To dig deeper into this, I debugged the Linux VM while DNS resolution was broken.

I could figure out that the issue has to be in the libc resolver: direct DNS lookups work fine, while programs like wget do not.

I also downloaded a static curl binary to rule out issues with the busybox resolver, but the issue still exists.

Then I started tcpdump in the background to capture the DNS traffic. Here are the results:

# /var/lib/curl -v google.de
* Could not resolve host: google.de
* Closing connection 0
curl: (6) Could not resolve host: google.de


17:02:36.708544 IP (tos 0x0, ttl 64, id 9114, offset 0, flags [DF], proto UDP (17), length 55)
    192.168.65.6.61499 > 192.168.65.7.53: [bad udp cksum 0x0393 -> 0x6c8f!] 62877+ A? google.de. (27)
17:02:36.708650 IP (tos 0x0, ttl 64, id 9115, offset 0, flags [DF], proto UDP (17), length 55)
    192.168.65.6.61499 > 192.168.65.7.53: [bad udp cksum 0x0393 -> 0x5054!] 63192+ AAAA? google.de. (27)
17:02:36.711749 IP (tos 0x0, ttl 64, id 38807, offset 0, flags [DF], proto UDP (17), length 80)
    192.168.65.7.53 > 192.168.65.6.61499: [bad udp cksum 0x03ac -> 0x80f1!] 62877 q: A? google.de. 1/0/0 google.de. A 172.217.18.3 (52)
17:02:36.738170 IP (tos 0x0, ttl 64, id 38822, offset 0, flags [DF], proto UDP (17), length 92)
    192.168.65.7.53 > 192.168.65.6.61499: [bad udp cksum 0x03b8 -> 0x7d35!] 63192 q: AAAA? google.de. 1/0/0 google.de. AAAA 2a00:1450:4001:80b::2003 (64)

17:02:39.209075 IP (tos 0x0, ttl 64, id 9498, offset 0, flags [DF], proto UDP (17), length 55)
    192.168.65.6.61499 > 192.168.65.7.53: [bad udp cksum 0x0393 -> 0x6c8f!] 62877+ A? google.de. (27)
17:02:39.209191 IP (tos 0x0, ttl 64, id 9499, offset 0, flags [DF], proto UDP (17), length 55)
    192.168.65.6.61499 > 192.168.65.7.53: [bad udp cksum 0x0393 -> 0x5054!] 63192+ AAAA? google.de. (27)
17:02:39.212419 IP (tos 0x0, ttl 64, id 40377, offset 0, flags [DF], proto UDP (17), length 80)
    192.168.65.7.53 > 192.168.65.6.61499: [bad udp cksum 0x03ac -> 0x80f1!] 62877 q: A? google.de. 1/0/0 google.de. A 172.217.18.3 (52)
17:02:39.212636 IP (tos 0x0, ttl 64, id 40378, offset 0, flags [DF], proto UDP (17), length 92)
    192.168.65.7.53 > 192.168.65.6.61499: [bad udp cksum 0x03b8 -> 0x7d35!] 63192 q: AAAA? google.de. 1/0/0 google.de. AAAA 2a00:1450:4001:80b::2003 (64)
# nslookup google.de
Server:   192.168.65.7
Address:  192.168.65.7:53

Non-authoritative answer:
Name: google.de
Address: 172.217.18.3

Non-authoritative answer:
Name: google.de
Address: 2a00:1450:4001:80b::2003


17:02:46.430386 IP (tos 0x0, ttl 64, id 26349, offset 0, flags [DF], proto UDP (17), length 55)
    192.168.65.6.59364 > 192.168.65.7.53: [bad udp cksum 0x0393 -> 0x00f5!] 26991+ A? google.de. (27)
17:02:46.430465 IP (tos 0x0, ttl 64, id 26350, offset 0, flags [DF], proto UDP (17), length 55)
    192.168.65.6.59364 > 192.168.65.7.53: [bad udp cksum 0x0393 -> 0xe465!] 27390+ AAAA? google.de. (27)
17:02:46.433646 IP (tos 0x0, ttl 64, id 44636, offset 0, flags [DF], proto UDP (17), length 80)
    192.168.65.7.53 > 192.168.65.6.59364: [bad udp cksum 0x03ac -> 0x1577!] 26991 q: A? google.de. 1/0/0 google.de. A 172.217.18.3 (52)
17:02:46.434193 IP (tos 0x0, ttl 64, id 44637, offset 0, flags [DF], proto UDP (17), length 92)
    192.168.65.7.53 > 192.168.65.6.59364: [bad udp cksum 0x03b8 -> 0x1167!] 27390 q: AAAA? google.de. 1/0/0 google.de. AAAA 2a00:1450:4001:80b::2003 (64)

It seems that when curl (or the preinstalled wget) tries to resolve the hostname through libc, the DNS queries are visible and successful, but something between the DNS answer and the executing program isn't working here.
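
For comparison, a way to exercise the libc resolver path versus a direct query to the embedded DNS server (a sketch; the debian image is an assumption, and 192.168.65.7 is the resolver address from the capture above):

# getent resolves through libc (the path that fails above), while nslookup
# with an explicit server argument queries 192.168.65.7 directly.
docker run --rm debian:bookworm-slim getent hosts google.de
docker run --rm busybox nslookup google.de 192.168.65.7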

nanek (Author) commented Apr 12, 2023

Restarting Docker Desktop hasn't been working for me either to fix this issue. However, updating the subnet did work for a few hours, and it just now started failing again.

@taiidani

Confirmed that the issue still appears to be manifesting on the newer 4.18.0 release. I'm receiving this sporadically after cleaning/purging the VM's data and executing a docker build:

Error response from daemon: Head "https://registry.url": dialing registry.url:443 with direct connection: resolving host registry.url: lookup registry.url: no such host

Executing the build repeatedly until it succeeds is a working but inefficient workaround.
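
As a stopgap, the retries can at least be automated (a sketch; the image tag and build context are placeholders, not from this issue):

# Re-run the build until the registry lookup stops failing with "no such host".
until docker build -t placeholder-image .; do
  echo "build failed, likely DNS; retrying in 10s"
  sleep 10
done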

charginghawk commented Jul 6, 2023

Confirming this is still an issue with 4.21: I'm getting cert errors while installing libraries that can only be resolved by downgrading to 4.16 (I wouldn't be surprised if Zscaler's SSL proxying is tied into it). Is there a plan to fix this, or if not, any guidance on what changes to make to accommodate newer versions?

@danepowell

To be clear, since I've only seen this mentioned once (in the most recent comment by @charginghawk): this seems to especially impact organizations using Docker behind Zscaler.

@danepowell

A coworker smartly suggested this as the root cause from the Docker 4.17 release notes:

> Fixed a bug where the “system” proxy would not handle “autoproxy” / “pac file” configurations.

Zscaler uses a locally hosted .pac file to configure the proxy, like so: 127.0.0.1:9000/localproxy-723ce0fc.pac
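
A quick way to see what that autoproxy configuration actually serves (a sketch; the URL is the one quoted above, and the exact PAC contents will differ per Zscaler install):

# Fetch the locally hosted PAC file that the "system"/autoproxy setting would pick up.
curl -s http://127.0.0.1:9000/localproxy-723ce0fc.pac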
