Docker commands take 1 minute to timeout if context endpoint is unreachable #1739

kinghuang · 2019-03-15T17:30:03Z

Description

The upcoming context feature seem to cause CLI commands to pause for a 1 minute timeout if the current context's endpoint is not reachable.

Some of my Docker endpoints are only reachable via a VPN connection. If the current context is for an endpoint that I cannot currently reach, every command takes an extra minute (60 seconds) to complete.

Steps to reproduce the issue:

Create a context for an unreachable endpoint.
Use that context.
Run CLI commands like docker context ls or docker info.

Describe the results you received:

When the endpoint is unreachable, commands take an extra minute to complete.

In this example, when I switch to psc-prod-gen without the requisite network access, Docker CLI commands go through a 1 minute timeout. Even switching back to the default context for my local Docker config takes a minute.

▸ ~ time docker context ls
NAME                DESCRIPTION                               DOCKER ENDPOINT               KUBERNETES ENDPOINT                ORCHESTRATOR
default *           Current DOCKER_HOST based configuration   unix:///var/run/docker.sock   https://localhost:6443 (default)   swarm
psc-dev-gen         PSC Development Cluster                   tcp://hidden:2376                                           
psc-onboard-gen     PSC Onboard Cluster                       tcp://hidden:2376                                          
psc-prod-gen        PSC Production Cluster                    tcp://hidden:2376                                          
psc-staging-gen     PSC Staging Cluster                       tcp://hidden:2376                                           
psc-uat-gen         PSC UAT Cluster                           tcp://hidden:2376                                          
/Users/king.huang/go/src/github.com/docker/cli/build/docker-darwin-amd64  ls  0.03s user 0.01s system 101% cpu 0.047 total

▸ ~ time docker context use psc-prod-gen
psc-prod-gen
Current context is now "psc-prod-gen"
/Users/king.huang/go/src/github.com/docker/cli/build/docker-darwin-amd64  use  0.04s user 0.02s system 94% cpu 0.060 total

▸ ~ time docker context ls
NAME                DESCRIPTION                               DOCKER ENDPOINT           KUBERNETES ENDPOINT   ORCHESTRATOR
default             Current DOCKER_HOST based configuration                                                   
psc-dev-gen         PSC Development Cluster                   tcp://hidden:2376                          
psc-onboard-gen     PSC Onboard Cluster                       tcp://hidden:2376                         
psc-prod-gen *      PSC Production Cluster                    tcp://hidden:2376                         
psc-staging-gen     PSC Staging Cluster                       tcp://hidden:2376                          
psc-uat-gen         PSC UAT Cluster                           tcp://hidden:2376                         
/Users/king.huang/go/src/github.com/docker/cli/build/docker-darwin-amd64  ls  0.03s user 0.02s system 0% cpu 1:00.05 total

▸ ~ time docker context use default
default
Current context is now "default"
/Users/king.huang/go/src/github.com/docker/cli/build/docker-darwin-amd64  use  0.03s user 0.02s system 0% cpu 1:00.05 total

▸ ~ time docker context ls
NAME                DESCRIPTION                               DOCKER ENDPOINT               KUBERNETES ENDPOINT                ORCHESTRATOR
default *           Current DOCKER_HOST based configuration   unix:///var/run/docker.sock   https://localhost:6443 (default)   swarm
psc-dev-gen         PSC Development Cluster                   tcp://hidden:2376                                           
psc-onboard-gen     PSC Onboard Cluster                       tcp://hidden:2376                                          
psc-prod-gen        PSC Production Cluster                    tcp://hidden:2376                                          
psc-staging-gen     PSC Staging Cluster                       tcp://hidden:2376                                           
psc-uat-gen         PSC UAT Cluster                           tcp://hidden:2376                                          
/Users/king.huang/go/src/github.com/docker/cli/build/docker-darwin-amd64  ls  0.03s user 0.01s system 94% cpu 0.048 total

Describe the results you expected:

Commands should fail immediately if the endpoint is unreachable.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

Client:
 Version:           19.03.0-dev
 API version:       1.39 (downgraded from 1.40)
 Go version:        go1.11.5
 Git commit:        81ac432c
 Built:             Tue Mar 12 22:57:30 2019
 OS/Arch:           darwin/amd64
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          18.09.3
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       774a1f4
  Built:            Thu Feb 28 06:40:58 2019
  OS/Arch:          linux/amd64
  Experimental:     true

Output of docker info:

Client:
 Debug Mode: false

Server:
 Containers: 16
  Running: 1
  Paused: 0
  Stopped: 15
 Images: 315
 Server Version: 18.09.3
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: ilntgb4sbycucyl3iiva1wk3p
  Is Manager: true
  ClusterID: wsz99afbhy169obu69h1eq7z0
  Managers: 1
  Nodes: 1
  Default Address Pool: 10.0.0.0/8  
  SubnetSize: 24
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 192.168.65.3
  Manager Addresses:
   192.168.65.3:2377
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: e6b3f5632f50dbc4e9cb6288d911bf4f5e95b18e
 runc version: 6635b4f0c6af3810594d2770f662f34ddc15b40d
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.9.125-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 6
 Total Memory: 7.786GiB
 Name: docker-desktop
 ID: Z6TC:PPXT:NO7F:M5SJ:E6D7:R4PM:3A6I:BZLQ:UTDJ:KPCE:6YKD:XI6X
 Docker Root Dir: /var/lib/docker
 Debug Mode: true
  File Descriptors: 56
  Goroutines: 200
  System Time: 2019-03-15T17:29:19.1958634Z
  EventsListeners: 3
 HTTP Proxy: gateway.docker.internal:3128
 HTTPS Proxy: gateway.docker.internal:3129
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

Additional environment details (AWS, VirtualBox, physical, etc.):

The text was updated successfully, but these errors were encountered:

simonferquel · 2019-03-18T10:08:21Z

I suppose the docker info case makes sense to wait for the timeout (it is doing an API call), but context ls/context use etc, should not.

@thaJeztah I will have a look at it

simonferquel · 2019-03-18T10:26:27Z

I just looked at it, and it comes from the early call to initializeFromClient in the CLI Initialize function. Actually the problem seems to be present in 18.09 as well: https://github.com/docker/cli/blob/18.09/cli/command/cli.go#L197.

I suppose I can work on a fix for the upcoming version, but it might be difficult to backport.

This partially mitigates docker#1739 ("Docker commands take 1 minute to timeout if context endpoint is unreachable") and is a simpler alternative to docker#1747 (which completely defers the client connection until an actual call is attempted). Note that the previous 60s delay was the culmination of two separate 30s timeouts since the ping is tried twice. This with this patch the overall timeout is 20s. moby/moby#39206 will remove the second ping and once that propagates to this tree the timeout will be 10s. Signed-off-by: Ian Campbell <ijc@docker.com>

When we see an `ECONNREFUSED` (or equivalent) from an attempted `HEAD` on the `/_ping` endpoint there is no point in trying again with `GET` since the server is not responding/available at all. Once vendored into the cli this will partially mitigate docker/cli#1739 ("Docker commands take 1 minute to timeout if context endpoint is unreachable") by cutting the effective timeout in half. Signed-off-by: Ian Campbell <ijc@docker.com> (cherry picked from commit 8c8457b) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

When we see an `ECONNREFUSED` (or equivalent) from an attempted `HEAD` on the `/_ping` endpoint there is no point in trying again with `GET` since the server is not responding/available at all. Once vendored into the cli this will partially mitigate docker/cli#1739 ("Docker commands take 1 minute to timeout if context endpoint is unreachable") by cutting the effective timeout in half. Signed-off-by: Ian Campbell <ijc@docker.com>

When we see an `ECONNREFUSED` (or equivalent) from an attempted `HEAD` on the `/_ping` endpoint there is no point in trying again with `GET` since the server is not responding/available at all. Once vendored into the cli this will partially mitigate docker/cli#1739 ("Docker commands take 1 minute to timeout if context endpoint is unreachable") by cutting the effective timeout in half. Signed-off-by: Ian Campbell <ijc@docker.com> Upstream-commit: 8c8457b0f2f85d58c0e1a5b4a0e174d7808cf886 Component: engine

When we see an `ECONNREFUSED` (or equivalent) from an attempted `HEAD` on the `/_ping` endpoint there is no point in trying again with `GET` since the server is not responding/available at all. Once vendored into the cli this will partially mitigate docker/cli#1739 ("Docker commands take 1 minute to timeout if context endpoint is unreachable") by cutting the effective timeout in half. Signed-off-by: Ian Campbell <ijc@docker.com>

This helps to address docker/cli#1739, where an invalid `DOCKER_HOST` setting could result in a 64s delay (that's twice the delay here because the client was trying to hit the `/_ping` endpoint twice, which was addressed in moby/moby#39206) I made a previous attempt to fix this purely on the Docker cli side (docker/cli#1872) however that had the side effect of adding the timeout across the board and not just for the dial phase, which caused a regression for `docker logs -f` (docker/cli#1892) and so was reverted (docker/cli#1893). The new value of 10s is just based on a gut feeling, no initial connection should be taking that long in the real world unless something about the network link is pretty broken (e.g. bad dns perhaps), in which case affected users are surely pretty used to retrying things, better to fail faster in the normal case. Also drop the comment since the linked issue just shows that the original number, just like the new number, was arrived at fairly arbitrarily based on gut feelings (rather than anything empirical) so the reference is not really terribly useful. Signed-off-by: Ian Campbell <ijc@docker.com>

When we see an `ECONNREFUSED` (or equivalent) from an attempted `HEAD` on the `/_ping` endpoint there is no point in trying again with `GET` since the server is not responding/available at all. Once vendored into the cli this will partially mitigate docker/cli#1739 ("Docker commands take 1 minute to timeout if context endpoint is unreachable") by cutting the effective timeout in half. Signed-off-by: Ian Campbell <ijc@docker.com> (cherry picked from commit 8c8457b0f2f85d58c0e1a5b4a0e174d7808cf886) Signed-off-by: Sebastiaan van Stijn <github@gone.nl> Upstream-commit: 61a234d5621cf130db4beb163b8f0b357a27764f Component: engine

simonferquel mentioned this issue Mar 18, 2019

Defer API version negotiation #1747

Closed

silvin-lubecki mentioned this issue Apr 15, 2019

19.03.0-beta1: docker context commands hang if current engine is not reachable #1831

Closed

ijc mentioned this issue May 13, 2019

client: do not fallback to GET if HEAD on _ping fail to connect moby/moby#39206

Merged

ijc mentioned this issue May 13, 2019

add a 10s timeout to the client object. #1872

Merged

thaJeztah mentioned this issue May 13, 2019

[19.03 backport] client: do not fallback to GET if HEAD on _ping fail to connect docker-archive/engine#206

Merged

ijc mentioned this issue May 17, 2019

Reduce default connection time to 10s (from 32s) docker/go-connections#63

Merged

thaJeztah added the area/context label Jun 21, 2019

karolz-ms mentioned this issue Apr 6, 2020

Docker Desktop on Windows: 'docker context inspect' is very slow #2420

Closed

thaJeztah mentioned this issue Apr 7, 2020

cli: perform feature detection lazily #2424

Merged

silvin-lubecki closed this as completed in #2424 Apr 15, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Docker commands take 1 minute to timeout if context endpoint is unreachable #1739

Docker commands take 1 minute to timeout if context endpoint is unreachable #1739

kinghuang commented Mar 15, 2019

simonferquel commented Mar 18, 2019

simonferquel commented Mar 18, 2019

Docker commands take 1 minute to timeout if context endpoint is unreachable #1739

Docker commands take 1 minute to timeout if context endpoint is unreachable #1739

Comments

kinghuang commented Mar 15, 2019

simonferquel commented Mar 18, 2019

simonferquel commented Mar 18, 2019