Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ERROR: Get https://registry-1.docker.io/v2/: dial tcp: lookup registry-1.docker.io on [::1]:53: read udp [::1]:35711->[::1]:53: read: connection refused #2618

Closed
hessam94 opened this issue Jul 2, 2020 · 32 comments

Comments

@hessam94
Copy link

hessam94 commented Jul 2, 2020

hi,
I installed docker on Android and sudo docker version works fine and shows Client and Server available. but any kind of command , such as hello-world, pull, push, build, etc returns me this error.
this is my command
sudo docker build -t img1 .
ERROR: Get https://registry-1.docker.io/v2/: dial tcp: lookup registry-1.docker.io on [::1]:53: read udp [::1]:35711->[::1]:53: read: connection refused

i am working on Android and do not have systemd, systemctl to start restart service, i used dockerd to start daemon and it started with no error
i am not sure if there is any firewall proxy on android, how can i evaluate it?
please help me i really need it

@thaJeztah
Copy link
Member

looks like it's trying to connect to IPv6 localhost for DNS resolution.

note that Android is currently not a supported platform for Docker (at least very much untested).

Does it work if you configure the host to use (e.g.) 8.8.8.8 as primary DNS server (not sure how it's configured in your environment, on linux it would usually be done through /etc/resolv.conf

@hessam94
Copy link
Author

hessam94 commented Jul 3, 2020

Hello again,
tried to set DNS on android, by sudo setprop dns1 8.8.8.8 and sudo setprop dns1 8.8.4.4 . and getprop also returns correctly. but still have connection refuse problem.
there is not /etc/resolv.conf on android9 !!! no idea how to solve this problem , i have limited time, please help me. thanks

@hessam94
Copy link
Author

hessam94 commented Jul 3, 2020

I took this screen shot after setting my DNS, any Idea guys??
Screenshot_20200703-174541

@lemmy04
Copy link

lemmy04 commented Oct 1, 2020

I'm having the same issue on openSUSE Leap 15.2, but only on the first attempt to pull an image after boot.
my /etc/resolv.conf is fine, configured by DHCP.

What happens here is the docker daemon trying to contact ::1 as nameserver, via IPv6, and totally ignores what is actually configured in /etc/resolv.conf... why?

@lemmy04
Copy link

lemmy04 commented Oct 1, 2020

Interesting detail:

after the first fail the issue "repairs itself" - after waiting for ...maybe a minute or so the next attempt to pull an image works. But if I try again right after the first fail, it fails again.
On the other hand, it does not matter how long the system has been up - the first pull fails, the next one fails if tried too soon afte the fail, then it works until reboot.

Someone please explain o.0

@hessam94
Copy link
Author

hessam94 commented Oct 1, 2020

have you set
nameserver 8.8.8.8
nameserver 8.8.4.4
in your /etc/resolv.conf?

@lemmy04
Copy link

lemmy04 commented Oct 1, 2020

i already said that my /etc/resolv.conf is fine - everyting else and their dog is perfectly able to resolve hostnames.
You do realize that with the settings that you suggest the system will be unable to resolve hostnames for local adresses?

Anyway, I found something:
the systems here that show this problem all run NetworkManager for the network, the ones that dont have static (manually) configured network.
So .. I changed the start of the docker unit file to include "NetworkManager-wait-online.service" in the "After=" line, and nof the problem doesn't happen anymore. Seems that when dockerd is "normally" started, networkmanager isn't up yet so docker can'T find the DNS in /etc/resolv.conf, and just assumes ::1 instead. But on the second try it uses what's in resolv.conf by then, which is the properly configured dns.

@mkozjak
Copy link

mkozjak commented Oct 5, 2020

Just saw this on aarch64, Debian Buster. This is a wreck.

@davewood
Copy link

davewood commented Nov 6, 2020

had the same error and kept re-running the command docker-compose up db which attempts to pull the mariadb container. I didnt change a thing and the 5th try succeeded.

@pie-r
Copy link

pie-r commented Dec 15, 2020

re-trying N times, each time it works.

Now I tried to set the nameserver 8.8.8.8

@cdauth
Copy link

cdauth commented Mar 25, 2021

Same problem on Arch Linux. Rerunning the command helps, usually it succeeds with the 2nd or 3rd try.

@andreaagongora
Copy link

Same problem on Arch Linux. Rerunning the command helps, usually it succeeds with the 2nd or 3rd try.

Hi, what command helps did you use?, I have the same error in Arch linux

@cdauth
Copy link

cdauth commented Apr 15, 2021

It usually happens to me when I'm pulling an image using docker-compose pull. I just rerun the command again and again until it works by itself.

@GavinFF-SS
Copy link

having same issue just uninstalled pihole and set my pi to use google dns and Get https://registry-1.docker.io/v2/: dial tcp: lookup registry-1.docker.io on 127.0.0.1:53: read udp 127.0.0.1:40185->127.0.0.1:53: read: connection refused error

@muuvmuuv
Copy link

Having some kind of the same, and I am able to resolve the address, just docker cannot:

error: Error response from daemon: Get https://registry-1.docker.io/v2/: proxyconnect tcp: dial tcp 192.168.65.1:3128: connect: network is unreachable

I am on a Mac mini M1 with the latest OS and Docker installed. Resetting Docker to factory defaults resolve the problem but enabling the new virtualization framework produces the error again.

@joeky888
Copy link

joeky888 commented Jun 18, 2021

I have found a solution for snapcraft installation of docker (sudo snap install docker), tested on Archlinux.

sudo snap disable docker
sudo snap enable docker

If it still doesn't work for you, see if upgrading to the edge branch can help.

sudo snap refresh docker --edge # Don't do this in the production environment

@paulcalabro
Copy link

@muuvmuuv 's fix worked for me as well.

@darrentmorgan
Copy link

Weird.. MX Linux, same issue. Retry 3 minutes later with no changes, works fine.

@Uzer1
Copy link

Uzer1 commented Aug 18, 2021

It was the same for me on Windows 8.1 with 'timeout' at the end. I changed DNS to Google's and it worked.
8.8.8.8
8.8.4.4

@stuntpig
Copy link

stuntpig commented Oct 22, 2021

I am getting this same error. I have tried pretty much everything in this post. Any ideas would be super helpful.

I'm having an issue when I run our make local command that none of the other developers on my team are having. I'm on a Mac M1 machine. One other dev is also on an M1 and our setup works fine for him. This is happening on multiple repositories. So I'm assuming I have a global setting messed up somehow.

Here is the error I am getting that I don't fully understand...

=> ERROR [dealers_app internal] load metadata for docker.io/library/golang:alpine                                                      0.1s
 => CANCELED [dealers_nginx internal] load metadata for docker.io/library/nginx:1.15.2-alpine                                           0.0s
------
 > [dealers_app internal] load metadata for docker.io/library/golang:alpine:
failed to solve: rpc error: code = Unknown desc = failed to solve with frontend dockerfile.v0: failed to create LLB definition: failed to authorize: rpc error: code = Unknown desc = failed to fetch anonymous token: Get "https://auth.docker.io/token?scope=repository%3Alibrary%2Fgolang%3Apull&service=registry.docker.io": dial tcp: lookup auth.docker.io on [::1]:53: read udp [::1]:61456->[::1]:53: read: connection refused
make: *** [app] Error 17
Stuff that I have tried.
    1. docker logout then docker login
    1. Changed dns to 8.8.8.8 in the docker dameon
    1. Changed my mac's dns entries to 8.8.8.8
    1. Uninstalled and reinstalled Docker
    1. Reverted docker to 4.0.1 which is what the other M1 dev is using
    1. Turned off buildkit in the docker dameon "features": { buildkit: false}
    1. I had dnsmasq running in brew services. Have completely removed dnsmasq now.
    1. Have confirmed I'm not behind a proxy
    1. There is no firewall
    1. Running docker compose up -d instead of make local produces the same error
    1. docker pull golang:alpine works as expected
    1. export DOCKER_BUILDKIT=0 and export COMPOSE_DOCKER_CLI_BUILD=0 did not help
    1. If I navigate directly to this url in a browser, I get a token... https://auth.docker.io/token?scope=repository%3Alibrary%2Fgolang%3Apull&service=registry.docker.io

Any thoughts would be great!

docker-compose.yml

version: "3.3"

services:
  dealers-app:
    container_name: ${APP_NAME}_app
    image: ${APP_NAME}_app
    build:
      context: .
      dockerfile: docker/app.local.Dockerfile
    environment:
      VIRTUAL_HOST: ${APP_NAME}-app
      HTTPS_METHOD: noredirect
    env_file:
      - .env
    ports:
      - 80
    expose:
      - ${APP_PORT}
    volumes:
      - .:/go/src/app
      - ${GOPATH}/pkg/mod:/go/pkg/mod
  dealers-nginx:
    container_name: ${APP_NAME}_nginx
    image: ${APP_NAME}_nginx
    build:
      context: .
      dockerfile: docker/nginx/nginx.Dockerfile
    environment:
      VIRTUAL_HOST: ${APP_NAME}.test
      HSTS: "off"
      HTTPS_METHOD: noredirect
      APP_DOMAIN: ${APP_NAME} ${APP_NAME}.test
      APP_HOST: ${APP_NAME}-app
      APP_PORT: ${APP_PORT}
networks:
  default:
    external:
      name: development

Makefile

PROJECT_NAME := dealers

.DEFAULT := help
.PHONY: local down reset app all-reset nginx nginx-reset

help:
    @echo '# Dev Targets'
    @echo
    @echo '    local            build the application containers/services locally'
    @echo '    down             tear down the application containers/services locally'
    @echo '    reset            rebuild the application container and restart the service'
    @echo
    @echo '    all-reset        reset the application containers/services'
    @echo '    app              build the application container'
    @echo '    nginx            build the nginx container'
    @echo '    nginx-reset      rebuild the nginx container and restart the service'
    @echo
    @echo '    help             show this help message'
    @echo

local: app nginx
    docker compose up -d --force-recreate \
    && docker compose logs -f

down:
    docker compose down -v

all-reset: down local

reset: app
    docker compose up -d --force-recreate ${PROJECT_NAME}-app \
    && docker compose logs -f

app:
    docker compose build ${PROJECT_NAME}-app

nginx:
    docker compose build ${PROJECT_NAME}-nginx

nginx-reset: nginx
    docker compose up -d --force-recreate ${PROJECT_NAME}-nginx

Dockerfile

FROM golang:alpine as build

RUN apk add --no-cache git

ENV GOPRIVATE gitlab.com/companynameishere-software-development

RUN go install github.com/githubnemo/CompileDaemon@latest \
    && chmod +x /go/bin/CompileDaemon

WORKDIR /go/src/app

ADD ../.netrc /root/.netrc

ENTRYPOINT CompileDaemon --build="go build -o main"

@ca-d
Copy link

ca-d commented Nov 20, 2021

Am currently experiencing the same issue on 64 bit x86 Void Linux

@lanzer
Copy link

lanzer commented Feb 17, 2022

i already said that my /etc/resolv.conf is fine - everyting else and their dog is perfectly able to resolve hostnames. You do realize that with the settings that you suggest the system will be unable to resolve hostnames for local adresses?

Anyway, I found something: the systems here that show this problem all run NetworkManager for the network, the ones that dont have static (manually) configured network. So .. I changed the start of the docker unit file to include "NetworkManager-wait-online.service" in the "After=" line, and nof the problem doesn't happen anymore. Seems that when dockerd is "normally" started, networkmanager isn't up yet so docker can'T find the DNS in /etc/resolv.conf, and just assumes ::1 instead. But on the second try it uses what's in resolv.conf by then, which is the properly configured dns.

This happened to me while trying to install Home Assistant on Debian 11.2

Docker tried to make network calls before network interface is ready, causing the install script to fail.

Solution:
run "systemctl cat docker.service" , and it will tell you where the docker service file is located.
edit /lib/systemd/system/docker.service with your favorite text editor
append "NetworkManager-wait-online.service" to line 4, that line should start with the word "After="
run "systemctl daemon-reload" to update your changes to the service file.

@Allteran
Copy link

I have found a solution for snapcraft installation of docker (sudo snap install docker), tested on Archlinux.

sudo snap disable docker
sudo snap enable docker

Manjaro Linux here. Thanks a lot! Only your solution helped me.

@chriswyatt
Copy link

chriswyatt commented Mar 10, 2022

For me, it was because I forgot to symlink /etc/resolv.conf; however the problem was specific to my particular set-up: https://wiki.archlinux.org/title/Systemd-resolved . I hadn't had any DNS issues up to this point, so it took me a while to figure it out.

@fromage9747
Copy link

I am using an LXC Container in Proxmox with Docker running on it. Confirmed that when adding:
nameserver 8.8.8.8
nameserver 8.8.4.4
to /etc/resolv.conf the issue is resolved.

Cheers.

@PitchAbyss
Copy link

PitchAbyss commented Jul 9, 2022

It was the same for me on Windows 8.1 with 'timeout' at the end. I changed DNS to Google's and it worked. 8.8.8.8 8.8.4.4

i am on windows 11 with this error , any one please help

image

@brunom63
Copy link

brunom63 commented Sep 4, 2022

i already said that my /etc/resolv.conf is fine - everyting else and their dog is perfectly able to resolve hostnames. You do realize that with the settings that you suggest the system will be unable to resolve hostnames for local adresses?
Anyway, I found something: the systems here that show this problem all run NetworkManager for the network, the ones that dont have static (manually) configured network. So .. I changed the start of the docker unit file to include "NetworkManager-wait-online.service" in the "After=" line, and nof the problem doesn't happen anymore. Seems that when dockerd is "normally" started, networkmanager isn't up yet so docker can'T find the DNS in /etc/resolv.conf, and just assumes ::1 instead. But on the second try it uses what's in resolv.conf by then, which is the properly configured dns.

This happened to me while trying to install Home Assistant on Debian 11.2

Docker tried to make network calls before network interface is ready, causing the install script to fail.

Solution: run "systemctl cat docker.service" , and it will tell you where the docker service file is located. edit /lib/systemd/system/docker.service with your favorite text editor append "NetworkManager-wait-online.service" to line 4, that line should start with the word "After=" run "systemctl daemon-reload" to update your changes to the service file.

You're a genius! This is the definite solution for Debian. Worked on Debian 10 and 11.

stepanblyschak added a commit to stepanblyschak/sonic-buildimage that referenced this issue Feb 3, 2023
Go's runtime (and dockerd inherits this) uses own DNS resolver implementation by default on Linux.
It has been observed that there are some DNS resolution issues when executing ```docker pull``` after first boot.

Consider the following script:

```
admin@r-boxer-sw01:~$ while :; do date; cat /etc/resolv.conf; ping -c 1 harbor.mellanox.com; docker pull harbor.mellanox.com/sonic/cpu-report:1.0.0 ; sleep 1; done
Fri 03 Feb 2023 10:06:22 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.99 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.989/5.989/5.989/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:57245->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:23 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.56 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.561/5.561/5.561/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:53299->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:24 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.78 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.783/5.783/5.783/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:55765->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:25 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=7.17 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 7.171/7.171/7.171/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:44877->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:26 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.66 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.656/5.656/5.656/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:54604->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:27 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=8.22 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 8.223/8.223/8.223/0.000 ms
1.0.0: Pulling from sonic/cpu-report
004f1eed87df: Downloading [===================>                               ]   19.3MB/50.43MB
5d6f1e8117db: Download complete
48c2faf66abe: Download complete
234b70d0479d: Downloading [=========>                                         ]  9.363MB/51.84MB
6fa07a00e2f0: Downloading [==>                                                ]   9.51MB/192.4MB
04a31b4508b8: Waiting
e11ae5168189: Waiting
8861a99744cb: Waiting
d59580d95305: Waiting
12b1523494c1: Waiting
d1a4b09e9dbc: Waiting
99f41c3f014f: Waiting
```

While /etc/resolv.conf has the correct content and ping (and any other utility that uses libc's DNS resolution implementation) works correctly
docker is unable to resolve the hostname and falls back to default [::1]:53. This started to happen after PR sonic-net#13516 has been merged.
As you can see from the log, dockerd is able to pick up the correct /etc/resolv.conf only after 5 sec since first try. This seems to be somehow related to the logic in Go's DNS resolver
https://github.com/golang/go/blob/master/src/net/dnsclient_unix.go#L385.

There have been issues like that reported in docker like:
  - docker/cli#2299
  - docker/cli#2618
  - moby/moby#22398

Since this starts to happen after inclusion of resolvconf package by
above mentioned PR and the fact I can't see any problem with that (ping,
nslookup, etc. works) the choice is made to force dockerd to use cgo
(libc) resolver.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
liat-grozovik pushed a commit to sonic-net/sonic-buildimage that referenced this issue Feb 14, 2023
Go's runtime (and dockerd inherits this) uses own DNS resolver implementation by default on Linux.
It has been observed that there are some DNS resolution issues when executing ```docker pull``` after first boot.

Consider the following script:

```
admin@r-boxer-sw01:~$ while :; do date; cat /etc/resolv.conf; ping -c 1 harbor.mellanox.com; docker pull harbor.mellanox.com/sonic/cpu-report:1.0.0 ; sleep 1; done
Fri 03 Feb 2023 10:06:22 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.99 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.989/5.989/5.989/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:57245->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:23 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.56 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.561/5.561/5.561/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:53299->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:24 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.78 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.783/5.783/5.783/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:55765->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:25 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=7.17 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 7.171/7.171/7.171/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:44877->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:26 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.66 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.656/5.656/5.656/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:54604->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:27 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=8.22 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 8.223/8.223/8.223/0.000 ms
1.0.0: Pulling from sonic/cpu-report
004f1eed87df: Downloading [===================>                               ]   19.3MB/50.43MB
5d6f1e8117db: Download complete
48c2faf66abe: Download complete
234b70d0479d: Downloading [=========>                                         ]  9.363MB/51.84MB
6fa07a00e2f0: Downloading [==>                                                ]   9.51MB/192.4MB
04a31b4508b8: Waiting
e11ae5168189: Waiting
8861a99744cb: Waiting
d59580d95305: Waiting
12b1523494c1: Waiting
d1a4b09e9dbc: Waiting
99f41c3f014f: Waiting
```

While /etc/resolv.conf has the correct content and ping (and any other utility that uses libc's DNS resolution implementation) works correctly
docker is unable to resolve the hostname and falls back to default [::1]:53. This started to happen after PR #13516 has been merged.
As you can see from the log, dockerd is able to pick up the correct /etc/resolv.conf only after 5 sec since first try. This seems to be somehow related to the logic in Go's DNS resolver
https://github.com/golang/go/blob/master/src/net/dnsclient_unix.go#L385.

There have been issues like that reported in docker like:
  - docker/cli#2299
  - docker/cli#2618
  - moby/moby#22398

Since this starts to happen after inclusion of resolvconf package by
above mentioned PR and the fact I can't see any problem with that (ping,
nslookup, etc. works) the choice is made to force dockerd to use cgo
(libc) resolver.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this issue Feb 17, 2023
Go's runtime (and dockerd inherits this) uses own DNS resolver implementation by default on Linux.
It has been observed that there are some DNS resolution issues when executing ```docker pull``` after first boot.

Consider the following script:

```
admin@r-boxer-sw01:~$ while :; do date; cat /etc/resolv.conf; ping -c 1 harbor.mellanox.com; docker pull harbor.mellanox.com/sonic/cpu-report:1.0.0 ; sleep 1; done
Fri 03 Feb 2023 10:06:22 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.99 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.989/5.989/5.989/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:57245->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:23 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.56 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.561/5.561/5.561/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:53299->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:24 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.78 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.783/5.783/5.783/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:55765->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:25 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=7.17 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 7.171/7.171/7.171/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:44877->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:26 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.66 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.656/5.656/5.656/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:54604->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:27 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=8.22 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 8.223/8.223/8.223/0.000 ms
1.0.0: Pulling from sonic/cpu-report
004f1eed87df: Downloading [===================>                               ]   19.3MB/50.43MB
5d6f1e8117db: Download complete
48c2faf66abe: Download complete
234b70d0479d: Downloading [=========>                                         ]  9.363MB/51.84MB
6fa07a00e2f0: Downloading [==>                                                ]   9.51MB/192.4MB
04a31b4508b8: Waiting
e11ae5168189: Waiting
8861a99744cb: Waiting
d59580d95305: Waiting
12b1523494c1: Waiting
d1a4b09e9dbc: Waiting
99f41c3f014f: Waiting
```

While /etc/resolv.conf has the correct content and ping (and any other utility that uses libc's DNS resolution implementation) works correctly
docker is unable to resolve the hostname and falls back to default [::1]:53. This started to happen after PR sonic-net#13516 has been merged.
As you can see from the log, dockerd is able to pick up the correct /etc/resolv.conf only after 5 sec since first try. This seems to be somehow related to the logic in Go's DNS resolver
https://github.com/golang/go/blob/master/src/net/dnsclient_unix.go#L385.

There have been issues like that reported in docker like:
  - docker/cli#2299
  - docker/cli#2618
  - moby/moby#22398

Since this starts to happen after inclusion of resolvconf package by
above mentioned PR and the fact I can't see any problem with that (ping,
nslookup, etc. works) the choice is made to force dockerd to use cgo
(libc) resolver.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this issue Feb 21, 2023
Go's runtime (and dockerd inherits this) uses own DNS resolver implementation by default on Linux.
It has been observed that there are some DNS resolution issues when executing ```docker pull``` after first boot.

Consider the following script:

```
admin@r-boxer-sw01:~$ while :; do date; cat /etc/resolv.conf; ping -c 1 harbor.mellanox.com; docker pull harbor.mellanox.com/sonic/cpu-report:1.0.0 ; sleep 1; done
Fri 03 Feb 2023 10:06:22 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.99 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.989/5.989/5.989/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:57245->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:23 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.56 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.561/5.561/5.561/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:53299->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:24 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.78 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.783/5.783/5.783/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:55765->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:25 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=7.17 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 7.171/7.171/7.171/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:44877->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:26 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.66 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.656/5.656/5.656/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:54604->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:27 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=8.22 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 8.223/8.223/8.223/0.000 ms
1.0.0: Pulling from sonic/cpu-report
004f1eed87df: Downloading [===================>                               ]   19.3MB/50.43MB
5d6f1e8117db: Download complete
48c2faf66abe: Download complete
234b70d0479d: Downloading [=========>                                         ]  9.363MB/51.84MB
6fa07a00e2f0: Downloading [==>                                                ]   9.51MB/192.4MB
04a31b4508b8: Waiting
e11ae5168189: Waiting
8861a99744cb: Waiting
d59580d95305: Waiting
12b1523494c1: Waiting
d1a4b09e9dbc: Waiting
99f41c3f014f: Waiting
```

While /etc/resolv.conf has the correct content and ping (and any other utility that uses libc's DNS resolution implementation) works correctly
docker is unable to resolve the hostname and falls back to default [::1]:53. This started to happen after PR sonic-net#13516 has been merged.
As you can see from the log, dockerd is able to pick up the correct /etc/resolv.conf only after 5 sec since first try. This seems to be somehow related to the logic in Go's DNS resolver
https://github.com/golang/go/blob/master/src/net/dnsclient_unix.go#L385.

There have been issues like that reported in docker like:
  - docker/cli#2299
  - docker/cli#2618
  - moby/moby#22398

Since this starts to happen after inclusion of resolvconf package by
above mentioned PR and the fact I can't see any problem with that (ping,
nslookup, etc. works) the choice is made to force dockerd to use cgo
(libc) resolver.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
mssonicbld pushed a commit to sonic-net/sonic-buildimage that referenced this issue Feb 21, 2023
Go's runtime (and dockerd inherits this) uses own DNS resolver implementation by default on Linux.
It has been observed that there are some DNS resolution issues when executing ```docker pull``` after first boot.

Consider the following script:

```
admin@r-boxer-sw01:~$ while :; do date; cat /etc/resolv.conf; ping -c 1 harbor.mellanox.com; docker pull harbor.mellanox.com/sonic/cpu-report:1.0.0 ; sleep 1; done
Fri 03 Feb 2023 10:06:22 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.99 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.989/5.989/5.989/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:57245->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:23 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.56 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.561/5.561/5.561/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:53299->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:24 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.78 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.783/5.783/5.783/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:55765->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:25 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=7.17 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 7.171/7.171/7.171/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:44877->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:26 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.66 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.656/5.656/5.656/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:54604->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:27 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=8.22 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 8.223/8.223/8.223/0.000 ms
1.0.0: Pulling from sonic/cpu-report
004f1eed87df: Downloading [===================>                               ]   19.3MB/50.43MB
5d6f1e8117db: Download complete
48c2faf66abe: Download complete
234b70d0479d: Downloading [=========>                                         ]  9.363MB/51.84MB
6fa07a00e2f0: Downloading [==>                                                ]   9.51MB/192.4MB
04a31b4508b8: Waiting
e11ae5168189: Waiting
8861a99744cb: Waiting
d59580d95305: Waiting
12b1523494c1: Waiting
d1a4b09e9dbc: Waiting
99f41c3f014f: Waiting
```

While /etc/resolv.conf has the correct content and ping (and any other utility that uses libc's DNS resolution implementation) works correctly
docker is unable to resolve the hostname and falls back to default [::1]:53. This started to happen after PR #13516 has been merged.
As you can see from the log, dockerd is able to pick up the correct /etc/resolv.conf only after 5 sec since first try. This seems to be somehow related to the logic in Go's DNS resolver
https://github.com/golang/go/blob/master/src/net/dnsclient_unix.go#L385.

There have been issues like that reported in docker like:
  - docker/cli#2299
  - docker/cli#2618
  - moby/moby#22398

Since this starts to happen after inclusion of resolvconf package by
above mentioned PR and the fact I can't see any problem with that (ping,
nslookup, etc. works) the choice is made to force dockerd to use cgo
(libc) resolver.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
mssonicbld pushed a commit to sonic-net/sonic-buildimage that referenced this issue Feb 22, 2023
Go's runtime (and dockerd inherits this) uses own DNS resolver implementation by default on Linux.
It has been observed that there are some DNS resolution issues when executing ```docker pull``` after first boot.

Consider the following script:

```
admin@r-boxer-sw01:~$ while :; do date; cat /etc/resolv.conf; ping -c 1 harbor.mellanox.com; docker pull harbor.mellanox.com/sonic/cpu-report:1.0.0 ; sleep 1; done
Fri 03 Feb 2023 10:06:22 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.99 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.989/5.989/5.989/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:57245->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:23 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.56 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.561/5.561/5.561/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:53299->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:24 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.78 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.783/5.783/5.783/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:55765->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:25 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=7.17 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 7.171/7.171/7.171/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:44877->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:26 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.66 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.656/5.656/5.656/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:54604->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:27 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=8.22 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 8.223/8.223/8.223/0.000 ms
1.0.0: Pulling from sonic/cpu-report
004f1eed87df: Downloading [===================>                               ]   19.3MB/50.43MB
5d6f1e8117db: Download complete
48c2faf66abe: Download complete
234b70d0479d: Downloading [=========>                                         ]  9.363MB/51.84MB
6fa07a00e2f0: Downloading [==>                                                ]   9.51MB/192.4MB
04a31b4508b8: Waiting
e11ae5168189: Waiting
8861a99744cb: Waiting
d59580d95305: Waiting
12b1523494c1: Waiting
d1a4b09e9dbc: Waiting
99f41c3f014f: Waiting
```

While /etc/resolv.conf has the correct content and ping (and any other utility that uses libc's DNS resolution implementation) works correctly
docker is unable to resolve the hostname and falls back to default [::1]:53. This started to happen after PR #13516 has been merged.
As you can see from the log, dockerd is able to pick up the correct /etc/resolv.conf only after 5 sec since first try. This seems to be somehow related to the logic in Go's DNS resolver
https://github.com/golang/go/blob/master/src/net/dnsclient_unix.go#L385.

There have been issues like that reported in docker like:
  - docker/cli#2299
  - docker/cli#2618
  - moby/moby#22398

Since this starts to happen after inclusion of resolvconf package by
above mentioned PR and the fact I can't see any problem with that (ping,
nslookup, etc. works) the choice is made to force dockerd to use cgo
(libc) resolver.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
@hoftherose
Copy link

Solution:
run "systemctl cat docker.service" , and it will tell you where the docker service file is located.
edit /lib/systemd/system/docker.service with your favorite text editor
append "NetworkManager-wait-online.service" to line 4, that line should start with the word "After="
run "systemctl daemon-reload" to update your changes to the service file.

This solved it for me, was getting this error in KinD and simply had to recreate to get it to reflect the changes.

@sam-thibault
Copy link
Contributor

This is an old issue. I will resolve it since a solution is described above.

@sam-thibault sam-thibault closed this as not planned Won't fix, can't repro, duplicate, stale Nov 10, 2023
@cdauth
Copy link

cdauth commented Nov 10, 2023

The solution described above sounds to me like a workaround and not like a solution.

The issue seems like docker is somehow caching the DNS resolution information when it starts and does not refresh it later when a command is run. The workaround described above will solve the problem that docker apparently caches the DNS resolution info when it is not even there yet because no network is connected yet. But I strongly suspect that the underlying issue will still cause errors when for example the DNS resolution information changes while docker is running, for example when connecting to a new network.

@sam-thibault sam-thibault reopened this Nov 13, 2023
@thaJeztah
Copy link
Member

It looks like this ticket has collected assorted issues with DNS, not all of those identical (but may result in similar outcome), and I'm not sure if there's a "single solution" for all of these, if at all.

This ticket looks to be about DNS resolution failures during docker push and docker pull; these errors originate from the daemon, which is not maintained in this repository (this repository contains the source-code for the docker CLI client).

w.r.t. the fix mentions (adding NetworkManager-wait-online.service service to the After rule); I don't think we can ship that as a general solution; NetworkManager may not be installed on the system, and I don't think we can add all possible combinations of other software in the list. Starting with moby/moby@dfa4e77 (moby/moby#30210), the systemd unit has network.online.target both as After and Wants, which should indicate that networking is available, but it's possible that other software modifies /etc/resolv.conf (and related files), which could interfere. Depending on your setup, it may be needed to make changes in systemd configuration to account for these, as it may not be possible to ship these configurations as a default.

I'll close this ticket, as this issue looks to be related to the daemon, which is not maintained in this repository, and because there's potentially multiple issues being conflated here, which makes it difficult to track.

If you have specific details about your situation, and possible solutions that can be included as a default, feel free to open a new ticket with those details included in https://githuib.com/moby/moby/issues.

@chaitu236
Copy link

chaitu236 commented Nov 22, 2023

Documenting one more instance of this issue and our findings in case it helps anyone who stumbles into it

docker is written in go and uses golang's APIs for network calls. golang has its own implementation of DNS resolver but can also use operating system's DNS resolver. On Linux machines, golang's DNS resolver is used as default.
There is a bug in older versions of golang's DNS resolver here.

	conf.dnsConfig = systemConf().resolv
	if conf.dnsConfig == nil {
		conf.dnsConfig = dnsReadConfig("/etc/resolv.conf")
	}
	conf.lastChecked = time.Now()

The if condition can never be true and dnsConfiguration is not re-read from /etc/resolv.conf. So on the first network call, the same DNS configuration that existed during program startup is used. Only network calls made 5 seconds after this first call trigger a re-read of /etc/resolv.conf.

In our case, when docker daemon is started networking is still not up completely. IP address is not yet assigned and /etc/resolv.conf is empty. So DNS configuration is initialized to some defaults and the first docker pull ends up using these even if by that point /etc/resolv.conf is updated.

There are a few ways to workaround this

  • Move docker initscript down in boot order so that there is more time for networking to be up by the time it starts. Other people posted similar workarounds here for systemd.
  • Add the line export GODEBUG=netdns=cgo to docker's initscript so that operating system's native DNS resolver is used instead of golang's resolver. Ref.
  • Unconditionally read dnsReadConfig in the snippet above like newer versions of go are doing since this commit. This is what we ended up doing in our OS by porting relevant fix.

I see that newer versions of docker use golang versions that contain the fix but I don't know if most distros picked up the versions that contain the fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests