Revert "Mount cgroups in the container" #30
Revert "Mount cgroups in the container" #30
Conversation
This reverts commit b441dfa. Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@docker.com> (github: crosbymichael)
ping @vmarmol |
LGTM |
Yeah, for that usecase Docker will want these to be mounted RW. I think it makes sense to do that. Do we know if that will fix the issues? |
Not just docker but there are existing users that are mounting the full cgroup fs in containers at that same location. I think we will have to rethink how we want to set this up and make it so the user can configure the options to mount this in as RO or RW and not force a default. I think the goal is nested containers right? |
Revert "Mount cgroups in the container"
Yes, the goal is nested containers. We can probably do a split where we mount your container RW and everything else RO. Also provide a flag to mount "everything else" RW (in the case of some supervisor container). |
Well, one goal is nested containers. Another goal is running e.g. systemd or other cgroups using apps. For instance, reading back cgroups performance stats for your own processes. |
For possible options to do this, see what lxc 1.0 exposes for this: https://qa.linuxcontainers.org/master/current/doc/man/lxc.container.conf.5.html, the lxc.mount.auto option. |
What about we make this a libcontainer option, with possible values similar to the ones in lxc, and then we make the default for a privileged container to get the full host cgroup rw, and the default for non-privileged systems something like the reverted commit (which is almost the same as cgroup-full:mixed in lxc). |
Probably its easier to just let users specify the mounts they want in On Wed, Jun 18, 2014 at 2:19 AM, Alexander Larsson <notifications@github.com
|
@rjnagal Well, its certainly possible to just mount the host cgroups into the container via volumes. However, that is a very unsafe thing to do, as it gives you full cgroup rights. For instance, you can then move your processes to any other system cgroup (i.e. out of the container cgroup). This is clearly something only privileged containers should be allowed to do. Also, its not always very simple to do this, as the cgroups may be mounted in various ways on the host, so basically you need some kind of script to generate all the volumes needed. If you want to do something more safe/complicated like mounting only your own cgroups as read-write, then that is not generally doable from the outside of the container, as the paths in the cgroup depends on what cgroup path docker assigns the container, which depends on the container id. Also, fundamentally cgroup setup is a complicated piece of the runtime machinery that we don't want users to have to understand and set up themselves. If some container (be it systemd or something else) needs some level of cgroups mounted, then they should be able to do that without extremely deep knowledge in how containers work. I.e. if its not by default you want something like |
@alexlarsson I agree with that it isn't safe to mount cgroups rw for any Complicated mounting might not be possible from docker API, but On Thu, Jun 19, 2014 at 6:16 AM, Alexander Larsson <notifications@github.com
|
@rjnagal Yes, first class option in the container spec for how and if cgroups are visible inside the container is what i would like to have. Then docker could set it to full r/w in the case of docker run --privileged, and use some other default otherwise. Plus other alternatives could be exposed as options in docker. |
@alexlarsson the only problem with making cgroup visibility a first-class @crosbymichael WDYT? On Thu, Jun 19, 2014 at 9:49 AM, Alexander Larsson <notifications@github.com
|
@rjnagal I share the concern with the duplication, but if we want containers to run inside containers we'll need to expose them as a first-class knob I think. We shouldn't need to rely on the user setting up the mounts right (especially since we want a standard around that). |
So our distro used to work fine in docker with systemd, but with this commit reveted it does not, can you provide workaround wihtout running in privileged mode. I think reverting this breaks more stuff, than it fixes. |
@crosbymichael I think another use case for this was reading back things like memory constraints of the current container, right? This would be handy to be able to do at some point. :) |
@tianon yes and for nested containers in user namespaces. We just have to do it right and not have all the systemd specific stuff |
We need to think about this more before enabling this change. It is blowing up with read only filesystem errors when running nested containers ( docker's privileged usecase ) and we need to make sure this is something we want people depending on always being there in the future.
This reverts commit b441dfa.
Docker-DCO-1.1-Signed-off-by: Michael Crosby michael@docker.com (github: crosbymichael)