Skip to content
This repository has been archived by the owner on Dec 13, 2018. It is now read-only.

Revert "Mount cgroups in the container" #30

Merged

Conversation

crosbymichael
Copy link
Contributor

We need to think about this more before enabling this change. It is blowing up with read only filesystem errors when running nested containers ( docker's privileged usecase ) and we need to make sure this is something we want people depending on always being there in the future.

This reverts commit b441dfa.

Docker-DCO-1.1-Signed-off-by: Michael Crosby michael@docker.com (github: crosbymichael)

This reverts commit b441dfa.

Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@docker.com> (github: crosbymichael)
@crosbymichael
Copy link
Contributor Author

ping @vmarmol

@vmarmol
Copy link
Contributor

vmarmol commented Jun 17, 2014

LGTM

@vmarmol
Copy link
Contributor

vmarmol commented Jun 17, 2014

Yeah, for that usecase Docker will want these to be mounted RW. I think it makes sense to do that. Do we know if that will fix the issues?

@crosbymichael
Copy link
Contributor Author

Not just docker but there are existing users that are mounting the full cgroup fs in containers at that same location.

I think we will have to rethink how we want to set this up and make it so the user can configure the options to mount this in as RO or RW and not force a default.

I think the goal is nested containers right?

crosbymichael pushed a commit that referenced this pull request Jun 17, 2014
Revert "Mount cgroups in the container"
@crosbymichael crosbymichael merged commit 77ffd49 into docker-archive:master Jun 17, 2014
@crosbymichael crosbymichael deleted the revert-mount-cgroup branch June 17, 2014 19:00
@vmarmol
Copy link
Contributor

vmarmol commented Jun 17, 2014

Yes, the goal is nested containers.

We can probably do a split where we mount your container RW and everything else RO. Also provide a flag to mount "everything else" RW (in the case of some supervisor container).

@alexlarsson
Copy link
Contributor

Well, one goal is nested containers. Another goal is running e.g. systemd or other cgroups using apps. For instance, reading back cgroups performance stats for your own processes.

@alexlarsson
Copy link
Contributor

For possible options to do this, see what lxc 1.0 exposes for this: https://qa.linuxcontainers.org/master/current/doc/man/lxc.container.conf.5.html, the lxc.mount.auto option.

@alexlarsson
Copy link
Contributor

What about we make this a libcontainer option, with possible values similar to the ones in lxc, and then we make the default for a privileged container to get the full host cgroup rw, and the default for non-privileged systems something like the reverted commit (which is almost the same as cgroup-full:mixed in lxc).

@rjnagal
Copy link
Contributor

rjnagal commented Jun 18, 2014

Probably its easier to just let users specify the mounts they want in
regular mount options.If we want to treat cgroup mount as special, say to
mount the task's own cgroup differently, it would make sense to make it a
special option. It feels like having two ways to do the same thing (special
cgroup option vs another mount call) would allow surprising behavior for
some users.

On Wed, Jun 18, 2014 at 2:19 AM, Alexander Larsson <notifications@github.com

wrote:

What about we make this a libcontainer option, with possible values
similar to the ones in lxc, and then we make the default for a privileged
container to get the full host cgroup rw, and the default for
non-privileged systems something like the reverted commit (which is almost
the same as cgroup-full:mixed in lxc).


Reply to this email directly or view it on GitHub
#30 (comment).

@alexlarsson
Copy link
Contributor

@rjnagal Well, its certainly possible to just mount the host cgroups into the container via volumes. However, that is a very unsafe thing to do, as it gives you full cgroup rights. For instance, you can then move your processes to any other system cgroup (i.e. out of the container cgroup). This is clearly something only privileged containers should be allowed to do. Also, its not always very simple to do this, as the cgroups may be mounted in various ways on the host, so basically you need some kind of script to generate all the volumes needed.

If you want to do something more safe/complicated like mounting only your own cgroups as read-write, then that is not generally doable from the outside of the container, as the paths in the cgroup depends on what cgroup path docker assigns the container, which depends on the container id.

Also, fundamentally cgroup setup is a complicated piece of the runtime machinery that we don't want users to have to understand and set up themselves. If some container (be it systemd or something else) needs some level of cgroups mounted, then they should be able to do that without extremely deep knowledge in how containers work. I.e. if its not by default you want something like docker run --cgroup=mixed image.

@rjnagal
Copy link
Contributor

rjnagal commented Jun 19, 2014

@alexlarsson I agree with that it isn't safe to mount cgroups rw for any
task. But libcontainer doesn't have a concept of a privileged container. I
am fine with whatever policies docker uses to set up containers. Do we need
a first-class concept of cgroup visibility in libcontainer?

Complicated mounting might not be possible from docker API, but
libcontainer can definitely do that as long as the cgroups are created
early enough.

On Thu, Jun 19, 2014 at 6:16 AM, Alexander Larsson <notifications@github.com

wrote:

@rjnagal https://github.com/rjnagal Well, its certainly possible to
just mount the host cgroups into the container via volumes. However, that
is a very unsafe thing to do, as it gives you full cgroup rights. For
instance, you can then move your processes to any other system cgroup (i.e.
out of the container cgroup). This is clearly something only privileged
containers should be allowed to do. Also, its not always very simple to do
this, as the cgroups may be mounted in various ways on the host, so
basically you need some kind of script to generate all the volumes needed.

If you want to do something more safe/complicated like mounting only your
own cgroups as read-write, then that is not generally doable from the
outside of the container, as the paths in the cgroup depends on what cgroup
path docker assigns the container, which depends on the container id.

Also, fundamentally cgroup setup is a complicated piece of the runtime
machinery that we don't want users to have to understand and set up
themselves. If some container (be it systemd or something else) needs some
level of cgroups mounted, then they should be able to do that without
extremely deep knowledge in how containers work. I.e. if its not by default
you want something like docker run --cgroup=mixed image.


Reply to this email directly or view it on GitHub
#30 (comment).

@alexlarsson
Copy link
Contributor

@rjnagal Yes, first class option in the container spec for how and if cgroups are visible inside the container is what i would like to have. Then docker could set it to full r/w in the case of docker run --privileged, and use some other default otherwise. Plus other alternatives could be exposed as options in docker.

@rjnagal
Copy link
Contributor

rjnagal commented Jun 20, 2014

@alexlarsson the only problem with making cgroup visibility a first-class
knob is that it allows for two ways to configure cgroup visibility (the new
knob and mount options). We'll have to maintain some code to reconcile both.
But I don't feel that strongly about it.

@crosbymichael WDYT?

On Thu, Jun 19, 2014 at 9:49 AM, Alexander Larsson <notifications@github.com

wrote:

@rjnagal https://github.com/rjnagal Yes, first class option in the
container spec for how and if cgroups are visible inside the container is
what i would like to have. Then docker could set it to full r/w in the case
of docker run --privileged, and use some other default otherwise. Plus
other alternatives could be exposed as options in docker.


Reply to this email directly or view it on GitHub
#30 (comment).

@vmarmol
Copy link
Contributor

vmarmol commented Jun 20, 2014

@rjnagal I share the concern with the duplication, but if we want containers to run inside containers we'll need to expose them as a first-class knob I think. We shouldn't need to rely on the user setting up the mounts right (especially since we want a standard around that).

@offlinehacker
Copy link

So our distro used to work fine in docker with systemd, but with this commit reveted it does not, can you provide workaround wihtout running in privileged mode. I think reverting this breaks more stuff, than it fixes.

@tianon
Copy link
Contributor

tianon commented Sep 12, 2014

@crosbymichael I think another use case for this was reading back things like memory constraints of the current container, right? This would be handy to be able to do at some point. :)

@crosbymichael
Copy link
Contributor Author

@tianon yes and for nested containers in user namespaces. We just have to do it right and not have all the systemd specific stuff

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
6 participants