Docker daemon stops accepting connections when vpnkit is under load #2841

Closed
imavroukakis opened this issue Apr 24, 2018 · 14 comments

Comments

@imavroukakis
commented Apr 24, 2018

Please, check https://docs.docker.com/docker-for-mac/troubleshoot/.
Issues without logs and details cannot be debugged, and will be closed.

  • I have tried with the latest version of my channel (Stable or Edge)
  • I have included the Diagnostics ID in this Issue
  • I have submitted Diagnostics

Expected behavior

Docker CLI remains responsive under load

Actual behavior

Cannot perform any Docker CLI related commands

Information

Docker for Mac: version: 18.04.0-ce-mac62 (7e2bb492d35dc35ec0037d8024b7e7be57fc72a9)
macOS: version 10.12.6 (build: 16G1314)
logs: /tmp/7B0E50A9-8EE3-48E7-946D-550D9BAFA7E9/20180424-130751.tar.gz
failure: docker ps failed: (Failure "docker ps: timeout after 10.00s")
[OK]     vpnkit
[OK]     virtualization hypervisor
[OK]     vmnetd
[OK]     dns
[OK]     driver.amd64-linux
[OK]     virtualization VT-X
[OK]     app
[OK]     moby
[OK]     system
[OK]     moby-syslog
[OK]     kubernetes
[OK]     files
[OK]     env
[OK]     virtualization kern.hv_support
[OK]     osxfs
[OK]     moby-console
[OK]     logs
[ERROR]  docker-cli
         docker ps failed
[OK]     disk

Diagnostics id: 7B0E50A9-8EE3-48E7-946D-550D9BAFA7E9

Steps to reproduce the behaviour

Launch two containers as part of a Maven build (using the fabric8 plugin). The containers are the latest MSSQL and Elasticsearch 5.5.2. Then start the integration tests. At a specific point in the test suite, several hundred threads are running against the two services in the containers, which eventually makes Docker unresponsive, with the following errors:

➜  docker ps
Error response from daemon: Bad response from Docker engine

➜ docker stats
Error response from daemon: Bad response from Docker engine

When the test is stopped, Docker becomes responsive again

Each container is allocated 3 GB of RAM; Docker itself is set to use 6 GB.

@imavroukakis

Author

commented Apr 24, 2018

This seems to happen when vpnkit reaches around 2001 open files

lsof -p $(pgrep vpnkit) | wc -l
    2001

Limits on my machine are as follows:

➜  group.com.docker launchctl limit
	cpu         unlimited      unlimited
	filesize    unlimited      unlimited
	data        unlimited      unlimited
	stack       8388608        67104768
	core        0              unlimited
	rss         unlimited      unlimited
	memlock     unlimited      unlimited
	maxproc     2500           2500
	maxfiles    524288         524288

I have added "vpnKitMaxConnections" : 2500 to ~/Library/Group\ Containers/group.com.docker/settings.json, and I can see the value passed to vpnkit after a Docker restart:

 com.docker.vpnkit [..elided for brevity] --max-connections 2500 [...]

@imavroukakis imavroukakis changed the title Docker daemon stops accepting connections when under load Docker daemon stops accepting connections when vpnkit is under load Apr 24, 2018

@imavroukakis

Author

commented Apr 27, 2018

This also happens on the stable version

@akimd akimd added the area/docker label May 2, 2018

@djs55

Contributor

commented May 2, 2018

The error is coming from https://github.com/moby/hyperkit/blob/3ace9850121a2ef270e0309a3ff6c2f991357842/src/lib/pci_virtio_sock.c#L1365

2018-04-23 13:16:05.283372+0100  localhost com.docker.driver.amd64-linux[50052]: hyperkit: stderr: TX: Unable to accept incoming connection: fd 1024 > FD_SETSIZE 1024

When there are a large number of open AF_VSOCK connections (in this case opened by vpnkit for port forwarding), hyperkit is unable to accept any more, including the ones needed for things like forwarding the docker socket.
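
To illustrate the limit behind the log line above: an fd_set used with select() can only hold descriptors numbered below FD_SETSIZE, so once a process has that many descriptors open, any newly accepted one cannot be watched. The following is a minimal sketch of such a guard, not hyperkit's actual code; the helper names fd_fits_in_fd_set and checked_fd_set are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/select.h>

/* Hypothetical helper: true if fd can legally be stored in an fd_set. */
static bool fd_fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < (int)FD_SETSIZE;
}

/*
 * Hypothetical helper: add fd to set only when it is in range.
 * Returns 0 on success, -1 if the descriptor is too large for select().
 * Calling FD_SET with an out-of-range fd is undefined behaviour, so the
 * check has to happen before the macro is used.
 */
static int checked_fd_set(int fd, fd_set *set)
{
    assert(set != NULL);
    if (!fd_fits_in_fd_set(fd)) {
        fprintf(stderr, "fd %d is outside [0, %d); select() cannot watch it\n",
                fd, (int)FD_SETSIZE);
        return -1;
    }
    FD_SET(fd, set);
    return 0;
}
```

Note that this bound is a compile-time constant of the fd_set type, so raising the per-process open file limit does not lift it.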

@imavroukakis

Author

commented May 2, 2018

Thanks @djs55. Is there any way we can tweak that setting?

@djs55

Contributor

commented May 2, 2018

I don't think it's easily tweakable -- I think we would need to switch over from select to kqueue (or poll) in hyperkit's hypervisor socket implementation to remove the limit. I've filed an issue in [moby/hyperkit] and will gather feedback / second opinions from other maintainers there.
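
As a rough sketch of what moving from select to poll looks like in general (this is not the hyperkit change itself): poll() takes an array of struct pollfd sized by the caller, so descriptor values are not bounded by FD_SETSIZE. The helper names wait_any_readable and poll_demo below are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until one of the nfds descriptors in fds[]
 * becomes readable or timeout_ms elapses. Returns the index of the first
 * readable descriptor, or -1 on timeout or error.
 */
static int wait_any_readable(const int *fds, size_t nfds, int timeout_ms)
{
    assert(fds != NULL && nfds > 0);
    struct pollfd *pfds = calloc(nfds, sizeof *pfds);
    if (pfds == NULL)
        return -1;
    for (size_t i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];      /* any valid descriptor number is fine */
        pfds[i].events = POLLIN;
    }
    int found = -1;
    if (poll(pfds, (nfds_t)nfds, timeout_ms) > 0) {
        for (size_t i = 0; i < nfds; i++) {
            if (pfds[i].revents & POLLIN) {
                found = (int)i;
                break;
            }
        }
    }
    free(pfds);
    return found;
}

/* Usage sketch: two pipes, with data written only to the second one. */
static int poll_demo(void)
{
    int a[2], b[2];
    if (pipe(a) != 0)
        return -2;
    if (pipe(b) != 0) {
        close(a[0]);
        close(a[1]);
        return -2;
    }
    int fds[2] = { a[0], b[0] };
    int idx = -2;
    if (write(b[1], "x", 1) == 1)
        idx = wait_any_readable(fds, 2, 1000);
    close(a[0]);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return idx;
}
```

Here poll_demo() reports the index of the second pipe, since that is the only one with data pending. kqueue would remove the same limit on macOS, but it is platform-specific, which is why the sketch sticks to poll.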

@docker-desktop-robot

Collaborator

commented Jul 31, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale comment.
Stale issues will be closed after an additional 30d of inactivity.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle stale

@imavroukakis

Author

commented Jul 31, 2018

/remove-lifecycle stale

@docker-desktop-robot

Collaborator

commented Oct 29, 2018

/lifecycle stale

@imavroukakis

Author

commented Oct 29, 2018

/remove-lifecycle stale

@djs55

Contributor

commented Oct 29, 2018

@imavroukakis I have an experimental build which should bypass this particular scalability bottleneck. The build is available from
https://download-stage.docker.com/mac/master/Docker.dmg

Let me know if it behaves better or worse. If worse, could you upload a fresh set of diagnostics and quote the diagnostics ID? Thanks!

@IhorHryshkov


commented Oct 30, 2018

@djs55 I have the same problem. I tried your new version of Docker (18.09.0-beta5) and no longer see it. Waiting for this fix in stable 18.09.0.

@docker-desktop-robot

Collaborator

commented Jan 28, 2019

/lifecycle stale

@imavroukakis

Author

commented Feb 27, 2019

Apologies for the late response; I will check the build, @djs55.

@imavroukakis

Author

commented Feb 27, 2019

/remove-lifecycle stale
