
[1.10] docker ps slow down #19328

Closed
ibuildthecloud opened this issue Jan 14, 2016 · 13 comments

Labels
area/daemon · kind/bug · priority/P3

Comments

@ibuildthecloud
Contributor

This applies to 1.9 also, but I found this on master. TL;DR: obtaining the container.Lock() at https://github.com/docker/docker/blob/master/daemon/list.go#L115 is very problematic.

Even after all the fixes in Docker 1.9 for docker ps performance, I sometimes see it go to crap. I've largely tracked this down to that one container lock acquire. The problem is that the container lock is also acquired during container start, and starting a container can involve operations that are very slow or may hang for a while. The one I'm seeing the most right now is calling volume drivers, as in the stack trace below.

net/http.(*persistConn).roundTrip(0xc208090580, 0xc2088155b0, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/http/transport.go:1082 +0x7ad
net/http.(*Transport).RoundTrip(0xc20908f680, 0xc208c77520, 0xc208f9f140, 0x0, 0x0)
    /usr/local/go/src/net/http/transport.go:235 +0x558
net/http.send(0xc208c77520, 0x7f70dde256a8, 0xc20908f680, 0x35, 0x0, 0x0)
    /usr/local/go/src/net/http/client.go:219 +0x4fc
net/http.(*Client).send(0xc208a348a0, 0xc208c77520, 0x35, 0x0, 0x0)
    /usr/local/go/src/net/http/client.go:142 +0x15b
net/http.(*Client).doFollowingRedirects(0xc208a348a0, 0xc208c77520, 0x1734a60, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/http/client.go:367 +0xb25
net/http.(*Client).Do(0xc208a348a0, 0xc208c77520, 0x1da8ea0, 0x0, 0x0)
    /usr/local/go/src/net/http/client.go:177 +0x192
github.com/docker/docker/pkg/plugins.(*Client).callWithRetry(0xc208a34870, 0x14c4ab0, 0x12, 0x7f70dc43c7e0, 0xc20877fc00, 0x1, 0x0, 0x0, 0x0, 0x0)
    /go/src/github.com/docker/docker/pkg/plugins/client.go:113 +0x22e
github.com/docker/docker/pkg/plugins.(*Client).Call(0xc208a34870, 0x14c4ab0, 0x12, 0x12162e0, 0xc208815540, 0xfaa120, 0xc208a23b00, 0x0, 0x0)
    /go/src/github.com/docker/docker/pkg/plugins/client.go:66 +0x1e3
github.com/docker/docker/volume/drivers.(*volumeDriverProxy).Mount(0xc20873ea90, 0xc20873e618, 0x4, 0x0, 0x0, 0x0, 0x0)
    /go/src/github.com/docker/docker/volume/drivers/proxy.go:114 +0x17f
github.com/docker/docker/volume/drivers.(*volumeAdapter).Mount(0xc2089f22c0, 0x0, 0x0, 0x0, 0x0)
    /go/src/github.com/docker/docker/volume/drivers/adapter.go:59 +0x74
github.com/docker/docker/daemon.(*mountPoint).Setup(0xc2086a97a0, 0x0, 0x0, 0x0, 0x0)
    /go/src/github.com/docker/docker/daemon/volumes.go:42 +0x7d
github.com/docker/docker/daemon.(*Container).setupMounts(0xc208bd0a00, 0x0, 0x0, 0x0, 0x0, 0x0)
    /go/src/github.com/docker/docker/daemon/volumes_unix.go:44 +0x112
github.com/docker/docker/daemon.(*Container).Start(0xc208bd0a00, 0x0, 0x0)
    /go/src/github.com/docker/docker/daemon/container.go:318 +0x4f7
github.com/docker/docker/daemon.(*Daemon).ContainerStart(0xc20805f040, 0xc20869b587, 0x40, 0x0, 0x0, 0x0)
    /go/src/github.com/docker/docker/daemon/start.go:47 +0x24f
github.com/docker/docker/api/server/router/local.(*router).postContainersStart(0xc2085dad00, 0x7f70dc446628, 0xc208cecf90, 0x7f70dc4465f0, 0xc2090bf400, 0xc208c76820, 0xc208cece10, 0x0, 0x0)
    /go/src/github.com/docker/docker/api/server/router/local/container.go:192 +0x162
github.com/docker/docker/api/server/router/local.*router.(github.com/docker/docker/api/server/router/local.postContainersStart)·fm(0x7f70dc446628, 0xc208cecf90, 0x7f70dc4465f0, 0xc2090bf400, 0xc208c76820, 0xc208cece10, 0x0, 0x0)
    /go/src/github.com/docker/docker/api/server/router/local/local.go:134 +0x7b
github.com/docker/docker/api/server.func·004(0x7f70dc446628, 0xc208cecf90, 0x7f70dc4465f0, 0xc2090bf400, 0xc208c76820, 0xc208cece10, 0x0, 0x0)
    /go/src/github.com/docker/docker/api/server/middleware.go:87 +0x7c7
github.com/docker/docker/api/server.func·003(0x7f70dde25b38, 0xc20802af28, 0x7f70dc4465f0, 0xc2090bf400, 0xc208c76820, 0xc208cece10, 0x0, 0x0)
    /go/src/github.com/docker/docker/api/server/middleware.go:66 +0x10e
github.com/docker/docker/api/server.func·002(0x7f70dde25b38, 0xc20802af28, 0x7f70dc4465f0, 0xc2090bf400, 0xc208c76820, 0xc208cece10, 0x0, 0x0)
    /go/src/github.com/docker/docker/api/server/middleware.go:49 +0x47c
github.com/docker/docker/api/server.func·001(0x7f70dde25b38, 0xc20802af28, 0x7f70dc4465f0, 0xc2090bf400, 0xc208c76820, 0xc208cece10, 0x0, 0x0)
    /go/src/github.com/docker/docker/api/server/middleware.go:27 +0x1dc
github.com/docker/docker/api/server.func·007(0x7f70dc4465f0, 0xc2090bf400, 0xc208c76820)
    /go/src/github.com/docker/docker/api/server/server.go:156 +0x265
net/http.HandlerFunc.ServeHTTP(0xc2086c1260, 0x7f70dc4465f0, 0xc2090bf400, 0xc208c76820)
    /usr/local/go/src/net/http/server.go:1265 +0x41

So the problem is that you just need one container to hang on start and then all docker ps calls will hang. I've seen this get so bad that enough goroutines backed up that the daemon was using 700 MB of memory just for blocked goroutines.

It seems like the best fix is to just never acquire this lock. Let's have a lock-free docker ps!
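
To make the failure mode concrete, here is a minimal runnable Go sketch of the pattern (the container type and names here are hypothetical stand-ins, not the daemon's actual code): one goroutine holds a per-container mutex across a slow external call, and a listing loop that takes the same mutex per container queues up behind it.

```go
// Hypothetical reduction of the problem; not Docker's real types.
package main

import (
	"fmt"
	"sync"
	"time"
)

type container struct {
	mu    sync.Mutex
	name  string
	state string
}

// start holds the container lock for the whole duration of a slow
// external call (standing in for a volume-driver Mount that hangs).
func (c *container) start() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.state = "starting"
	time.Sleep(5 * time.Second) // slow plugin call, lock still held
	c.state = "running"
}

// list mirrors the reducePsContainer pattern: it takes each container's
// lock just to read a few fields, so it queues up behind any in-flight start.
func list(cs []*container) {
	for _, c := range cs {
		c.mu.Lock()
		fmt.Println(c.name, c.state)
		c.mu.Unlock()
	}
}

func main() {
	cs := []*container{{name: "a"}, {name: "b"}}
	go cs[0].start()
	time.Sleep(100 * time.Millisecond) // let start grab the lock first

	begin := time.Now()
	list(cs) // blocks for ~5s on container "a" even though "b" is idle
	fmt.Println("list took", time.Since(begin))
}
```

Every additional docker ps issued during that window parks another goroutine on the same mutex, which is where the memory growth from blocked goroutines comes from.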

@vdemeester vdemeester added the kind/bug label Jan 14, 2016
@WeiZhang555
Contributor

> So the problem is that you just need one container to hang on start and then all docker ps calls will hang. I've seen this get so bad that enough goroutines backed up that the daemon was using 700 MB of memory just for blocked goroutines.
>
> It seems like the best fix is to just never acquire this lock. Let's have a lock-free docker ps!

👍 It's true, I also noticed that, and I'm definitely +1 on a lock-free docker ps.

@LK4D4
Contributor

LK4D4 commented Jan 14, 2016

But how can you be sure the ps output is valid?

@cpuguy83
Member

Yeah, we'd at least need a read lock on ps.
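
A minimal sketch of what that could look like, assuming the per-container mutex were swapped for a sync.RWMutex (hypothetical names, not the daemon's actual types). Note it only stops concurrent ps calls from serializing against each other; a writer that holds the lock across a slow start would still stall readers.

```go
// Hypothetical per-container read/write lock.
package main

import (
	"fmt"
	"sync"
)

type container struct {
	mu    sync.RWMutex
	state string
}

// ps-style read: any number of readers can hold RLock at once.
func (c *container) snapshotState() string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.state
}

// start-style write: still exclusive, so a hung start would still
// block readers until it returns.
func (c *container) setState(s string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.state = s
}

func main() {
	c := &container{state: "created"}
	c.setState("running")
	fmt.Println(c.snapshotState())
}
```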

@calavera calavera self-assigned this Jan 14, 2016
@calavera
Contributor

> But how can you be sure the ps output is valid?

We cannot be sure about the output's validity anyway with the current lock. It only guarantees that a container doesn't change while we're reading its information; there are no further guarantees.

Imagine we have 1000 containers. There is no guarantee that container 1 is not deleted while we read the information for container 1000 and return the results for docker ps.

I'm assigning this to myself because I already have other changes to docker ps that make this operation faster, and I can add this to them.
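
A small self-contained sketch of that scenario (hypothetical types again): each container is locked only while its own row is read, so the finished listing can describe a set of containers that never coexisted.

```go
// Hypothetical: per-container locking is not a snapshot of the daemon.
package main

import (
	"fmt"
	"sync"
)

type container struct {
	mu      sync.Mutex
	name    string
	removed bool
}

func main() {
	cs := []*container{{name: "container-1"}, {name: "container-1000"}}

	var rows []string
	for i, c := range cs {
		c.mu.Lock()
		if !c.removed {
			rows = append(rows, c.name)
		}
		c.mu.Unlock()

		if i == 0 {
			// Simulate a `docker rm container-1` landing between the reads.
			cs[0].mu.Lock()
			cs[0].removed = true
			cs[0].mu.Unlock()
		}
	}

	// Prints both names even though container-1 was removed before the
	// listing finished; the per-container lock never promised more.
	fmt.Println(rows)
}
```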

@ibuildthecloud
Contributor Author

@LK4D4 you're right that we probably can't have a 100% lock-free docker ps if we want consistent data. The problem I see at the moment is that the container lock is used both to control concurrency of operations and to guarantee consistency of data. I think the two should be split, roughly along the lines of the sketch below.
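
One possible shape of that split, as a rough sketch under the assumption that a container keeps a heavyweight lock for lifecycle operations and a separate, briefly-held lock around a small published view (hypothetical names, not a concrete proposal from this thread):

```go
// Hypothetical split of the operation lock from the data/view lock.
package main

import (
	"fmt"
	"sync"
	"time"
)

type view struct {
	Name  string
	State string
}

type container struct {
	opMu sync.Mutex // serializes start/stop/etc.; may be held a long time

	viewMu sync.Mutex // guards the published view; held only for copies
	v      view
}

// start takes only the operation lock and updates the view in short,
// separate critical sections at each state transition.
func (c *container) start(slowMount func()) {
	c.opMu.Lock()
	defer c.opMu.Unlock()

	c.publish("starting")
	slowMount() // slow volume-driver call: no view lock held here
	c.publish("running")
}

func (c *container) publish(state string) {
	c.viewMu.Lock()
	c.v.State = state
	c.viewMu.Unlock()
}

// snapshot is what a ps-style listing would call; it never touches opMu,
// so it cannot block behind a hung start.
func (c *container) snapshot() view {
	c.viewMu.Lock()
	defer c.viewMu.Unlock()
	return c.v
}

func main() {
	c := &container{v: view{Name: "web"}}
	go c.start(func() { time.Sleep(time.Hour) }) // a mount that hangs
	time.Sleep(100 * time.Millisecond)
	fmt.Printf("%+v\n", c.snapshot()) // returns immediately: {Name:web State:starting}
}
```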

@calavera Will this be addressed in 1.10? All I can say is that the call to container.Lock() in reducePsContainer() has been the bane of my existence lately. I recently started doing some very resource-intensive operations in volume drivers and everything goes to crap.

@ibuildthecloud
Contributor Author

Just wanted to point out that this issue applies to inspect too, as that also acquires the container lock.

@thaJeztah
Member

Adding this to the milestone so that we don't lose sight of it.

@thaJeztah thaJeztah added this to the 1.10.0 milestone Jan 15, 2016
@calavera
Contributor

> Will this be addressed in 1.10?

I'm not sure if we should patch this for 1.10. I understand your frustration, but we should follow the rules 😄 Fixing this might involve something more than removing those 4 lines. I'll open a PR shortly.

@thaJeztah
Member

@calavera is this PR meant to improve this? #19383

@calavera
Contributor

That PR is not strictly necessary, but we'll need to do something at the store level, like locking the IDs in the store rather than locking the whole container.
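
A rough sketch of that store-level idea, assuming a store type with its own lock over an ID-to-container map (hypothetical; not the actual memory-store code): List copies the current entries under the store's read lock and never takes a per-container lock.

```go
// Hypothetical container store that only locks its own index for listing.
package main

import (
	"fmt"
	"sync"
)

type container struct {
	ID    string
	State string
}

type store struct {
	mu         sync.RWMutex
	containers map[string]*container
}

// List copies the current set of entries under the store's read lock.
// A container stuck in a slow start cannot stall it, but the result is
// only a point-in-time snapshot of the index, not of each container.
func (s *store) List() []*container {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make([]*container, 0, len(s.containers))
	for _, c := range s.containers {
		out = append(out, c)
	}
	return out
}

// Add and Delete lock only the ID index, and only briefly.
func (s *store) Add(c *container) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.containers[c.ID] = c
}

func (s *store) Delete(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.containers, id)
}

func main() {
	s := &store{containers: map[string]*container{}}
	s.Add(&container{ID: "abc123", State: "running"})
	for _, c := range s.List() {
		fmt.Println(c.ID, c.State)
	}
}
```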

@tiborvass tiborvass added the priority/P3 label Jan 20, 2016
@calavera calavera removed this from the 1.10.0 milestone Jan 26, 2016
@calavera
Contributor

I'm removing this from the milestone. As I said, fixing this will have consequences that we should not take on at this point in the release cycle.

@LK4D4
Contributor

LK4D4 commented Nov 29, 2016

@cpuguy83 is working on a lock-free container structure.

@AkihiroSuda
Member

Closing as outdated
