Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker commands take 1 minute to timeout if context endpoint is unreachable #1739

Closed
kinghuang opened this issue Mar 15, 2019 · 2 comments · Fixed by #2424
Closed

Docker commands take 1 minute to timeout if context endpoint is unreachable #1739

kinghuang opened this issue Mar 15, 2019 · 2 comments · Fixed by #2424

Comments

@kinghuang
Copy link

Description

The upcoming context feature seem to cause CLI commands to pause for a 1 minute timeout if the current context's endpoint is not reachable.

Some of my Docker endpoints are only reachable via a VPN connection. If the current context is for an endpoint that I cannot currently reach, every command takes an extra minute (60 seconds) to complete.

Steps to reproduce the issue:

  1. Create a context for an unreachable endpoint.
  2. Use that context.
  3. Run CLI commands like docker context ls or docker info.

Describe the results you received:

When the endpoint is unreachable, commands take an extra minute to complete.

In this example, when I switch to psc-prod-gen without the requisite network access, Docker CLI commands go through a 1 minute timeout. Even switching back to the default context for my local Docker config takes a minute.

▸ ~ time docker context ls
NAME                DESCRIPTION                               DOCKER ENDPOINT               KUBERNETES ENDPOINT                ORCHESTRATOR
default *           Current DOCKER_HOST based configuration   unix:///var/run/docker.sock   https://localhost:6443 (default)   swarm
psc-dev-gen         PSC Development Cluster                   tcp://hidden:2376                                           
psc-onboard-gen     PSC Onboard Cluster                       tcp://hidden:2376                                          
psc-prod-gen        PSC Production Cluster                    tcp://hidden:2376                                          
psc-staging-gen     PSC Staging Cluster                       tcp://hidden:2376                                           
psc-uat-gen         PSC UAT Cluster                           tcp://hidden:2376                                          
/Users/king.huang/go/src/github.com/docker/cli/build/docker-darwin-amd64  ls  0.03s user 0.01s system 101% cpu 0.047 total

▸ ~ time docker context use psc-prod-gen
psc-prod-gen
Current context is now "psc-prod-gen"
/Users/king.huang/go/src/github.com/docker/cli/build/docker-darwin-amd64  use  0.04s user 0.02s system 94% cpu 0.060 total

▸ ~ time docker context ls
NAME                DESCRIPTION                               DOCKER ENDPOINT           KUBERNETES ENDPOINT   ORCHESTRATOR
default             Current DOCKER_HOST based configuration                                                   
psc-dev-gen         PSC Development Cluster                   tcp://hidden:2376                          
psc-onboard-gen     PSC Onboard Cluster                       tcp://hidden:2376                         
psc-prod-gen *      PSC Production Cluster                    tcp://hidden:2376                         
psc-staging-gen     PSC Staging Cluster                       tcp://hidden:2376                          
psc-uat-gen         PSC UAT Cluster                           tcp://hidden:2376                         
/Users/king.huang/go/src/github.com/docker/cli/build/docker-darwin-amd64  ls  0.03s user 0.02s system 0% cpu 1:00.05 total

▸ ~ time docker context use default
default
Current context is now "default"
/Users/king.huang/go/src/github.com/docker/cli/build/docker-darwin-amd64  use  0.03s user 0.02s system 0% cpu 1:00.05 total

▸ ~ time docker context ls
NAME                DESCRIPTION                               DOCKER ENDPOINT               KUBERNETES ENDPOINT                ORCHESTRATOR
default *           Current DOCKER_HOST based configuration   unix:///var/run/docker.sock   https://localhost:6443 (default)   swarm
psc-dev-gen         PSC Development Cluster                   tcp://hidden:2376                                           
psc-onboard-gen     PSC Onboard Cluster                       tcp://hidden:2376                                          
psc-prod-gen        PSC Production Cluster                    tcp://hidden:2376                                          
psc-staging-gen     PSC Staging Cluster                       tcp://hidden:2376                                           
psc-uat-gen         PSC UAT Cluster                           tcp://hidden:2376                                          
/Users/king.huang/go/src/github.com/docker/cli/build/docker-darwin-amd64  ls  0.03s user 0.01s system 94% cpu 0.048 total

Describe the results you expected:

Commands should fail immediately if the endpoint is unreachable.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

Client:
 Version:           19.03.0-dev
 API version:       1.39 (downgraded from 1.40)
 Go version:        go1.11.5
 Git commit:        81ac432c
 Built:             Tue Mar 12 22:57:30 2019
 OS/Arch:           darwin/amd64
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          18.09.3
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       774a1f4
  Built:            Thu Feb 28 06:40:58 2019
  OS/Arch:          linux/amd64
  Experimental:     true

Output of docker info:

Client:
 Debug Mode: false

Server:
 Containers: 16
  Running: 1
  Paused: 0
  Stopped: 15
 Images: 315
 Server Version: 18.09.3
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: ilntgb4sbycucyl3iiva1wk3p
  Is Manager: true
  ClusterID: wsz99afbhy169obu69h1eq7z0
  Managers: 1
  Nodes: 1
  Default Address Pool: 10.0.0.0/8  
  SubnetSize: 24
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 192.168.65.3
  Manager Addresses:
   192.168.65.3:2377
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: e6b3f5632f50dbc4e9cb6288d911bf4f5e95b18e
 runc version: 6635b4f0c6af3810594d2770f662f34ddc15b40d
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.9.125-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 6
 Total Memory: 7.786GiB
 Name: docker-desktop
 ID: Z6TC:PPXT:NO7F:M5SJ:E6D7:R4PM:3A6I:BZLQ:UTDJ:KPCE:6YKD:XI6X
 Docker Root Dir: /var/lib/docker
 Debug Mode: true
  File Descriptors: 56
  Goroutines: 200
  System Time: 2019-03-15T17:29:19.1958634Z
  EventsListeners: 3
 HTTP Proxy: gateway.docker.internal:3128
 HTTPS Proxy: gateway.docker.internal:3129
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

Additional environment details (AWS, VirtualBox, physical, etc.):

@simonferquel
Copy link
Contributor

I suppose the docker info case makes sense to wait for the timeout (it is doing an API call), but context ls/context use etc, should not.

@thaJeztah I will have a look at it

@simonferquel
Copy link
Contributor

I just looked at it, and it comes from the early call to initializeFromClient in the CLI Initialize function. Actually the problem seems to be present in 18.09 as well: https://github.com/docker/cli/blob/18.09/cli/command/cli.go#L197.

I suppose I can work on a fix for the upcoming version, but it might be difficult to backport.

ijc pushed a commit to ijc/docker-cli that referenced this issue May 13, 2019
This partially mitigates docker#1739 ("Docker commands take 1 minute to timeout if
context endpoint is unreachable") and is a simpler alternative to docker#1747 (which
completely defers the client connection until an actual call is attempted).

Note that the previous 60s delay was the culmination of two separate 30s
timeouts since the ping is tried twice. This with this patch the overall
timeout is 20s. moby/moby#39206 will remove the second
ping and once that propagates to this tree the timeout will be 10s.

Signed-off-by: Ian Campbell <ijc@docker.com>
thaJeztah pushed a commit to thaJeztah/docker that referenced this issue May 13, 2019
When we see an `ECONNREFUSED` (or equivalent) from an attempted `HEAD` on the
`/_ping` endpoint there is no point in trying again with `GET` since the server
is not responding/available at all.

Once vendored into the cli this will partially mitigate docker/cli#1739
("Docker commands take 1 minute to timeout if context endpoint is unreachable")
by cutting the effective timeout in half.

Signed-off-by: Ian Campbell <ijc@docker.com>
(cherry picked from commit 8c8457b)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
cpuguy83 pushed a commit to moby/moby that referenced this issue May 15, 2019
When we see an `ECONNREFUSED` (or equivalent) from an attempted `HEAD` on the
`/_ping` endpoint there is no point in trying again with `GET` since the server
is not responding/available at all.

Once vendored into the cli this will partially mitigate docker/cli#1739
("Docker commands take 1 minute to timeout if context endpoint is unreachable")
by cutting the effective timeout in half.

Signed-off-by: Ian Campbell <ijc@docker.com>
docker-jenkins pushed a commit to docker-archive/docker-ce that referenced this issue May 15, 2019
When we see an `ECONNREFUSED` (or equivalent) from an attempted `HEAD` on the
`/_ping` endpoint there is no point in trying again with `GET` since the server
is not responding/available at all.

Once vendored into the cli this will partially mitigate docker/cli#1739
("Docker commands take 1 minute to timeout if context endpoint is unreachable")
by cutting the effective timeout in half.

Signed-off-by: Ian Campbell <ijc@docker.com>
Upstream-commit: 8c8457b0f2f85d58c0e1a5b4a0e174d7808cf886
Component: engine
kiku-jw pushed a commit to kiku-jw/moby that referenced this issue May 16, 2019
When we see an `ECONNREFUSED` (or equivalent) from an attempted `HEAD` on the
`/_ping` endpoint there is no point in trying again with `GET` since the server
is not responding/available at all.

Once vendored into the cli this will partially mitigate docker/cli#1739
("Docker commands take 1 minute to timeout if context endpoint is unreachable")
by cutting the effective timeout in half.

Signed-off-by: Ian Campbell <ijc@docker.com>
ijc pushed a commit to ijc/go-connections that referenced this issue May 17, 2019
This helps to address docker/cli#1739, where an
invalid `DOCKER_HOST` setting could result in a 64s delay (that's twice the
delay here because the client was trying to hit the `/_ping` endpoint twice,
which was addressed in moby/moby#39206)

I made a previous attempt to fix this purely on the Docker cli side
(docker/cli#1872) however that had the side effect of
adding the timeout across the board and not just for the dial phase, which
caused a regression for `docker logs -f` (docker/cli#1892)
and so was reverted (docker/cli#1893).

The new value of 10s is just based on a gut feeling, no initial connection
should be taking that long in the real world unless something about the network
link is pretty broken (e.g. bad dns perhaps), in which case affected users are
surely pretty used to retrying things, better to fail faster in the normal case.

Also drop the comment since the linked issue just shows that the original
number, just like the new number, was arrived at fairly arbitrarily based on
gut feelings (rather than anything empirical) so the reference is not really
terribly useful.

Signed-off-by: Ian Campbell <ijc@docker.com>
docker-jenkins pushed a commit to docker-archive/docker-ce that referenced this issue May 24, 2019
When we see an `ECONNREFUSED` (or equivalent) from an attempted `HEAD` on the
`/_ping` endpoint there is no point in trying again with `GET` since the server
is not responding/available at all.

Once vendored into the cli this will partially mitigate docker/cli#1739
("Docker commands take 1 minute to timeout if context endpoint is unreachable")
by cutting the effective timeout in half.

Signed-off-by: Ian Campbell <ijc@docker.com>
(cherry picked from commit 8c8457b0f2f85d58c0e1a5b4a0e174d7808cf886)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 61a234d5621cf130db4beb163b8f0b357a27764f
Component: engine
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants