Memory usage of containerd-shim #21737
It should be noted that the same issue is true for other Docker versions, so this isn't an issue that's unique to Docker 1.11.
@cyphar Yes, I only care about overcommit because I often see issues with it on <= 1.10. I'm not sure whether containerd-shim makes it worse or better; that was one of my questions. My guess is that it ultimately makes it worse. The problem with overcommit usually manifests itself when forking iptables, not a container, so containerd doesn't help there, and containerd-shim now consumes even more memory than before. The reason the Docker daemon goes up to >1 GB of memory usage does not seem to be related to the number of containers, but to some other kind of bug.
I'm of the opinion this is a Go runtime issue (there's no resource leak in pprof, and the "leak" is of the "unused heap space not yet returned to the OS" variety).
That's very interesting, I didn't know that. To be clear, you're referring to forking the iptables binary?
AFAIK not calling iptables isn't feasible because the netlink protocol is not stable and the CLI is considered the stable interface. I don't think there is anything special about iptables; it is just that once memory gets high enough you can't fork any process, and iptables gets forked before you can launch a container. I don't really have a reproducible test case for memory usage. If I had to guess, I would say a lot of container logging has a tendency to grow the virtual size.
I have some optimizations that I'm working on, this being one: https://github.com/docker/containerd/pull/184 The end goal is to have the shim written in C. |
@crosbymichael What is the purpose of the shim? Why can't containerd be the parent? |
@crosbymichael you should write it in OCaml or Rust. C would be far too practical. |
@ibuildthecloud So the main purpose is reattach. It's not enabled in this release, but with the shim being the parent and keeping hold of the FIFOs and the pty master, it allows both docker and containerd to die while your containers keep running. When docker/containerd come back up, they can reattach to the container, get exit events, and attach to stdio.
There are ways we can improve this. As for @crosbymichael's optimisations, I'm not sure whether that actually decreases the overcommit footprint. It looks like it just decreases the binary size. Or am I misunderstanding something?
@crosbymichael As for your second point:
That rather raises the question of why containerd was necessary at all. Surely that could've been implemented purely within Docker.
@crosbymichael The two levels seem like overkill then. Why not just embed containerd in Docker?
@crosbymichael I do like the idea of having mini Olaf rain clouds floating around with containers. That is cool. If the shim were in C, then the overhead really should be something like 2k.
Yes, FWIW, iptables doesn't use netlink until 3.14 (when the nftables API became the core of how iptables is implemented), so programmatic access is not available in all kernels. In older kernels the kernel API is fairly undocumented.
We have been thinking about it for some time (to use iptables-restore).
@ibuildthecloud I did some testing with Docker 1.10.3 (before containerd, because the architecture is simpler).
+1. The shim is really not that big.
I think that separating it also facilitates more independent updates (Docker is a lot more than just running bundles, after all :p).
This is interesting. Can you explain how such an iptables-restore API would work and/or interact with existing iptables rules and other iptables manipulators on the host?
Right, but if the end goal is to have containers be completely separated from either daemon (meaning if you update the daemon the containers won't die) -- what is the point of having "independent" updates if updating either daemon will not kill containers (and there isn't any other overwhelming "independence" issue AFAICS)? Surely if this functionality were embedded inside Docker, it would also make updates of the Docker daemon independent of the container lifetimes? And it wouldn't require packaging an extra daemon that is very intertwined with the inner workings of Docker (mountpoints are very fiddly between the two daemons).

I'm going to be honest: I never understood the benefits of splitting the daemon into two separate daemons. So I'm probably biased in this discussion, but that's because I don't see any overwhelming benefit and do see the potential for a lot of pain. Also, I really want to know how many bugs are going to come up because of all the mountpoint cleanup that we must not do if containerd can stand on its own.
We did the best we could in 1.11 with the Go shim. I'm keeping the issue open and rolling it to 1.12 for when we port it to C.
@jetheurer Why do you think the shim is the cause of the machine crashing? |
@cpuguy83 I thought the shim might be the issue because it's the root process. Is there a way to diagnose the issue?
@jetheurer This is probably not the best place since it is not related to the topic of this issue. |
This issue is a very big problem for us, and as far as I can tell it's a bug in the Go runtime (it doesn't fully deallocate the memory it frees).

Unfortunately, some of my interactions with the Go community have been met with "this isn't an issue because there's no limit on address space" -- missing the whole point of the issue. So, is there a way we can make this a big enough issue for the Go community that a fix gets prioritised? WDYT?
@cyphar Could you point to issues you have created in Go so we can help push for fixes there? |
I haven't opened an issue yet; I just had some discussion with upstream here: https://groups.google.com/forum/#!topic/golang-dev/zqFt5oVcTCY.
FWIW I'm working on a shim-less version of containerd: https://github.com/docker/containerd/pull/227
I beg your pardon, but has there been any progress on this issue? I have a project which needs to run thousands of tiny processes in hundreds of containers per minute, and I ran into this overhead.
Shim memory usage is much better now (than the previous 1.0-based shims). This is still with containerd 1.0.1; there are more patches in 1.0.2 to further improve shim memory usage. @frol If you are looking for raw runtime performance, using containerd directly may be better, although containerd doesn't set up networking for you, which accounts for a large amount of the time that moby takes (moby has other inefficiencies that need to be addressed as well).
@cpuguy83 Thank you for the suggestion. I will consider using containerd directly. |
FYI, in containerd create+start takes ~250ms, and stop+delete adds another ~250ms.
@frol Another option would be to use
So, is there any official guidance on how many containers are recommended per 1 GB of memory, given that each shim costs ~5 MB?
This is a question and concern. With every container a containerd-shim process is started. It seems that each such process consumes between 3-5 MB (excluding shared memory). Doing some quick, not-so-scientific tests, I have seen the memory usage of my system grow ~400 MB more than on 1.10.3 when launching 100 nginx containers. This was a quick cursory test, so I wanted to ask the maintainers how they envision this working going forward before I dig in much deeper with performance tests on 1.11. I also have a concern about VM overcommit, because each containerd-shim process consumes over 200 MB of virtual memory.

I honestly don't see how this approach will scale with this intermediate process. With the current usage pattern, if I run 500 containers (which is a very common use case), I will be using 2 GB of memory just for Docker. Am I missing something obvious here? This doesn't seem good.