Revert "Mount cgroups in the container" #30

crosbymichael · 2014-06-17T18:44:37Z

We need to think about this more before enabling this change. It is blowing up with read only filesystem errors when running nested containers ( docker's privileged usecase ) and we need to make sure this is something we want people depending on always being there in the future.

This reverts commit b441dfa.

Docker-DCO-1.1-Signed-off-by: Michael Crosby michael@docker.com (github: crosbymichael)

This reverts commit b441dfa. Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@docker.com> (github: crosbymichael)

crosbymichael · 2014-06-17T18:47:09Z

ping @vmarmol

vmarmol · 2014-06-17T18:54:07Z

LGTM

vmarmol · 2014-06-17T18:56:04Z

Yeah, for that usecase Docker will want these to be mounted RW. I think it makes sense to do that. Do we know if that will fix the issues?

crosbymichael · 2014-06-17T18:59:49Z

Not just docker but there are existing users that are mounting the full cgroup fs in containers at that same location.

I think we will have to rethink how we want to set this up and make it so the user can configure the options to mount this in as RO or RW and not force a default.

I think the goal is nested containers right?

Revert "Mount cgroups in the container"

vmarmol · 2014-06-17T19:04:28Z

Yes, the goal is nested containers.

We can probably do a split where we mount your container RW and everything else RO. Also provide a flag to mount "everything else" RW (in the case of some supervisor container).

alexlarsson · 2014-06-17T19:55:24Z

Well, one goal is nested containers. Another goal is running e.g. systemd or other cgroups using apps. For instance, reading back cgroups performance stats for your own processes.

alexlarsson · 2014-06-17T19:56:43Z

For possible options to do this, see what lxc 1.0 exposes for this: https://qa.linuxcontainers.org/master/current/doc/man/lxc.container.conf.5.html, the lxc.mount.auto option.

alexlarsson · 2014-06-18T09:19:01Z

What about we make this a libcontainer option, with possible values similar to the ones in lxc, and then we make the default for a privileged container to get the full host cgroup rw, and the default for non-privileged systems something like the reverted commit (which is almost the same as cgroup-full:mixed in lxc).

rjnagal · 2014-06-18T16:55:27Z

Probably its easier to just let users specify the mounts they want in
regular mount options.If we want to treat cgroup mount as special, say to
mount the task's own cgroup differently, it would make sense to make it a
special option. It feels like having two ways to do the same thing (special
cgroup option vs another mount call) would allow surprising behavior for
some users.

On Wed, Jun 18, 2014 at 2:19 AM, Alexander Larsson <notifications@github.com

wrote:

What about we make this a libcontainer option, with possible values
similar to the ones in lxc, and then we make the default for a privileged
container to get the full host cgroup rw, and the default for
non-privileged systems something like the reverted commit (which is almost
the same as cgroup-full:mixed in lxc).

—
Reply to this email directly or view it on GitHub
#30 (comment).

alexlarsson · 2014-06-19T13:16:54Z

@rjnagal Well, its certainly possible to just mount the host cgroups into the container via volumes. However, that is a very unsafe thing to do, as it gives you full cgroup rights. For instance, you can then move your processes to any other system cgroup (i.e. out of the container cgroup). This is clearly something only privileged containers should be allowed to do. Also, its not always very simple to do this, as the cgroups may be mounted in various ways on the host, so basically you need some kind of script to generate all the volumes needed.

If you want to do something more safe/complicated like mounting only your own cgroups as read-write, then that is not generally doable from the outside of the container, as the paths in the cgroup depends on what cgroup path docker assigns the container, which depends on the container id.

Also, fundamentally cgroup setup is a complicated piece of the runtime machinery that we don't want users to have to understand and set up themselves. If some container (be it systemd or something else) needs some level of cgroups mounted, then they should be able to do that without extremely deep knowledge in how containers work. I.e. if its not by default you want something like docker run --cgroup=mixed image.

rjnagal · 2014-06-19T16:13:54Z

@alexlarsson I agree with that it isn't safe to mount cgroups rw for any
task. But libcontainer doesn't have a concept of a privileged container. I
am fine with whatever policies docker uses to set up containers. Do we need
a first-class concept of cgroup visibility in libcontainer?

Complicated mounting might not be possible from docker API, but
libcontainer can definitely do that as long as the cgroups are created
early enough.

On Thu, Jun 19, 2014 at 6:16 AM, Alexander Larsson <notifications@github.com

wrote:

@rjnagal https://github.com/rjnagal Well, its certainly possible to
just mount the host cgroups into the container via volumes. However, that
is a very unsafe thing to do, as it gives you full cgroup rights. For
instance, you can then move your processes to any other system cgroup (i.e.
out of the container cgroup). This is clearly something only privileged
containers should be allowed to do. Also, its not always very simple to do
this, as the cgroups may be mounted in various ways on the host, so
basically you need some kind of script to generate all the volumes needed.

If you want to do something more safe/complicated like mounting only your
own cgroups as read-write, then that is not generally doable from the
outside of the container, as the paths in the cgroup depends on what cgroup
path docker assigns the container, which depends on the container id.

Also, fundamentally cgroup setup is a complicated piece of the runtime
machinery that we don't want users to have to understand and set up
themselves. If some container (be it systemd or something else) needs some
level of cgroups mounted, then they should be able to do that without
extremely deep knowledge in how containers work. I.e. if its not by default
you want something like docker run --cgroup=mixed image.

—
Reply to this email directly or view it on GitHub
#30 (comment).

alexlarsson · 2014-06-19T16:49:20Z

@rjnagal Yes, first class option in the container spec for how and if cgroups are visible inside the container is what i would like to have. Then docker could set it to full r/w in the case of docker run --privileged, and use some other default otherwise. Plus other alternatives could be exposed as options in docker.

rjnagal · 2014-06-20T15:48:13Z

@alexlarsson the only problem with making cgroup visibility a first-class
knob is that it allows for two ways to configure cgroup visibility (the new
knob and mount options). We'll have to maintain some code to reconcile both.
But I don't feel that strongly about it.

@crosbymichael WDYT?

On Thu, Jun 19, 2014 at 9:49 AM, Alexander Larsson <notifications@github.com

wrote:

@rjnagal https://github.com/rjnagal Yes, first class option in the
container spec for how and if cgroups are visible inside the container is
what i would like to have. Then docker could set it to full r/w in the case
of docker run --privileged, and use some other default otherwise. Plus
other alternatives could be exposed as options in docker.

—
Reply to this email directly or view it on GitHub
#30 (comment).

vmarmol · 2014-06-20T15:50:48Z

@rjnagal I share the concern with the duplication, but if we want containers to run inside containers we'll need to expose them as a first-class knob I think. We shouldn't need to rely on the user setting up the mounts right (especially since we want a standard around that).

offlinehacker · 2014-08-23T22:17:34Z

So our distro used to work fine in docker with systemd, but with this commit reveted it does not, can you provide workaround wihtout running in privileged mode. I think reverting this breaks more stuff, than it fixes.

tianon · 2014-09-12T03:16:09Z

@crosbymichael I think another use case for this was reading back things like memory constraints of the current container, right? This would be handy to be able to do at some point. :)

crosbymichael · 2014-09-12T17:19:12Z

@tianon yes and for nested containers in user namespaces. We just have to do it right and not have all the systemd specific stuff

Revert "Mount cgroups in the container"

c7d1cb2

This reverts commit b441dfa. Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@docker.com> (github: crosbymichael)

crosbymichael pushed a commit that referenced this pull request Jun 17, 2014

Merge pull request #30 from crosbymichael/revert-mount-cgroup

77ffd49

Revert "Mount cgroups in the container"

crosbymichael merged commit 77ffd49 into docker-archive:master Jun 17, 2014

crosbymichael deleted the revert-mount-cgroup branch June 17, 2014 19:00

This was referenced Aug 23, 2014

running systemd inside docker arch container hangs or segfaults moby/moby#3629

Closed

full nixos inside docker NixOS/nixpkgs#3779

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revert "Mount cgroups in the container" #30

Revert "Mount cgroups in the container" #30

crosbymichael commented Jun 17, 2014

crosbymichael commented Jun 17, 2014

vmarmol commented Jun 17, 2014

vmarmol commented Jun 17, 2014

crosbymichael commented Jun 17, 2014

vmarmol commented Jun 17, 2014

alexlarsson commented Jun 17, 2014

alexlarsson commented Jun 17, 2014

alexlarsson commented Jun 18, 2014

rjnagal commented Jun 18, 2014

alexlarsson commented Jun 19, 2014

rjnagal commented Jun 19, 2014

alexlarsson commented Jun 19, 2014

rjnagal commented Jun 20, 2014

vmarmol commented Jun 20, 2014

offlinehacker commented Aug 23, 2014

tianon commented Sep 12, 2014

crosbymichael commented Sep 12, 2014

Revert "Mount cgroups in the container" #30

Revert "Mount cgroups in the container" #30

Conversation

crosbymichael commented Jun 17, 2014

crosbymichael commented Jun 17, 2014

vmarmol commented Jun 17, 2014

vmarmol commented Jun 17, 2014

crosbymichael commented Jun 17, 2014

vmarmol commented Jun 17, 2014

alexlarsson commented Jun 17, 2014

alexlarsson commented Jun 17, 2014

alexlarsson commented Jun 18, 2014

rjnagal commented Jun 18, 2014

alexlarsson commented Jun 19, 2014

rjnagal commented Jun 19, 2014

alexlarsson commented Jun 19, 2014

rjnagal commented Jun 20, 2014

vmarmol commented Jun 20, 2014

offlinehacker commented Aug 23, 2014

tianon commented Sep 12, 2014

crosbymichael commented Sep 12, 2014