Implement --cap and --namespace flags for better control #6687

crosbymichael · 2014-06-26T00:06:55Z

We discussed this and think it would be best to not use something like --opts to be able to add and remove individual capabilities and namespaces for containers. We would rather use a dedicated flag and make sure that these features work across libcontainer and lxc. So our proposal is to add these two flags and provide a warning when elevating privileges for a container so that a user is aware of that they are doing.

Caps

http://linux.die.net/man/7/capabilities

In addition to the default caps that are keep for a container add the NET_ADMIN cap to a container.

docker run --cap +NET_ADMIN crosbymichael/redis

Remove the MKNOD capability from the default cap set that is kept

docker run --cap -MKNOD crosbymichael/redis

Drop MKNOD and add NET_ADMIN and SYS_ADMIN

docker run --cap -MKNOD --cap +NET_ADMIN --cap +SYS_ADMIN crosbymichael/redis

Any additional caps that are retained for a container should produce an error on stderr when running the container to let the user know they have elevated the privileges for a container.

Namespaces

There are a few reasons why you want to modify the namespaces applied to a container. If you are running a monitoring application in a container you may want to not drop into your own PID namespace so that you can see all your parents PID.

Do not create a PID namespace

docker run --namespace +PID -ti crosbymichael/htop

Changing the default namespace profile should result in a warning to the user.

Privileged

Using the privileged flag should result in a warning on stderr just like the two new flags above.

Development

We can discuss some of the details in this issue but if you would like to work on any of the changes please let me know.

I think all three of these changes, caps, namespaces, and privileged warning can be done separately. Ideally the privileged warning code can be merged in first so we have the warning pipeline to work with on the other prs. Caps should be easy to implement. Namespaces will be a little more challenging because it may consist of some runtime/libcontainer changes to support the different configuration.

Suggestions?

The text was updated successfully, but these errors were encountered:

crosbymichael · 2014-06-26T00:12:14Z

ping @philips @unclejack @rhatdan

mrunalp · 2014-06-26T00:12:24Z

@crosbymichael +1 to adding this feature. I can help with changes to libcontainer to make this happen.

vishh · 2014-06-26T00:24:03Z

+1

On Wed, Jun 25, 2014 at 5:12 PM, Mrunal Patel notifications@github.com
wrote:

@crosbymichael https://github.com/crosbymichael +1 to adding this
feature. I can help with changes to libcontainer to make this happen.

—
Reply to this email directly or view it on GitHub
#6687 (comment).

rhatdan · 2014-06-26T13:18:22Z

Does this mean we are going away from the --opts patch altogether, and I should work on the --selinuxopts?

rhatdan · 2014-06-26T13:23:06Z

I actually like the feature but the CLI is a little ugly.

--namespace -PID

PID looks like an Option but it is really not, and does this eliminate the use of -P as an option.

systemd-nspawn does this using --capabiltiy and --drop-capabilty

man systemd-nspawn
...
--capability=
List one or more additional capabilities to grant the container. Takes a comma-separated list of capability names, see capabilities(7) for more information. Note that the following capabilities will be granted in any way: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_DAC_READ_SEARCH, CAP_FOWNER, CAP_FSETID, CAP_IPC_OWNER, CAP_KILL, CAP_LEASE, CAP_LINUX_IMMUTABLE, CAP_NET_BIND_SERVICE, CAP_NET_BROADCAST, CAP_NET_RAW, CAP_SETGID, CAP_SETFCAP, CAP_SETPCAP, CAP_SETUID, CAP_SYS_ADMIN, CAP_SYS_CHROOT, CAP_SYS_NICE,CAP_SYS_PTRACE, CAP_SYS_TTY_CONFIG, CAP_SYS_RESOURCE, CAP_SYS_BOOT, CAP_AUDIT_WRITE, CAP_AUDIT_CONTROL. Also CAP_NET_ADMIN is retained if --private-network is specified.
If the special value "all" is passed, all capabilities are retained.

   --drop-capability=
       Specify one or more additional capabilities to drop for the container. This allows running the container with fewer capabilities than the default (see above).

cpuguy83 · 2014-06-26T14:26:05Z

I agree with @rhatdan, but more concisely just --drop-cap, --add-cap

Also, for namespaces it would be really nice to have the same functionality that we currently have with net for all namespaces... so host/new/container:<id/name>

jeremyeder · 2014-06-26T14:29:32Z

Any thoughts about adding this support to Dockerfiles, such that these things can be baked into an image? Dockerfile could also allow/deny override of the baked-in config via docker run ...

danbeaulieu · 2014-06-26T14:49:03Z

I like --add-cap, other options are --keep-cap or --allow-cap.

namespace is sort of overloaded, what about something like --pid-namespace=true|false?

I think as separate issues it'd be great to list capabilities via docker inspect, or possibly a new API method.

crosbymichael · 2014-06-26T16:34:49Z

@rhatdan yes, I would just work on a --selinux flag for what you need. We are not ready for --opts right now

crosbymichael · 2014-06-26T16:37:19Z

@jeremyeder No, we are not ready to add anything to the image until we have some type of control that says do not run any untrusted image with elevated permissions on my docker host. For now we will keep these as runtime only options

crosbymichael · 2014-06-26T17:35:52Z

ping @vieux @shykes what do you think about these alternate flag suggestions?

philips · 2014-06-26T17:48:08Z

FWIW, I like the alternative ideas over the +/- syntax.

vieux · 2014-06-26T23:16:13Z

I agree with @rhatdan -P and -PID could conflict

I like to see the flags grouped together. What about --cap-add and --cap-drop

michaelneale · 2014-06-27T00:00:28Z

I see a dichotomy: docker default - you can only add capabilities, not drop**.
docker --privileged - you can only drop, not add.
The command line can enforce.

docker run --add-cap ....
docker run --privileged --drop-cap ...

** is there a case for taking away from the default whitelist capabilities of docker? It seems that would only break things?

bgrant0607 · 2014-06-27T03:01:42Z

@crosbymichael Regarding Dockerfile support. As mentioned in #6616, often applications will not run correctly if not granted the capabilities they require. I recommend that Dockerfile capability specification should not actually provide elevated permissions, but should indicate required permissions, so a clear error can be issued in the case that they are not granted, higher-level systems can interrogate the image to determine required capabilities, etc.

rhatdan · 2014-06-27T10:16:25Z

@michaelneale The problem with this is --privileged means more then just --drop-cap. It also turns off MAC support (selinux or AppArmor) and maybe a couple of other things.

Again I look for systemd-nspawn which has the all flag

--drop-cap=all --add-cap=SYS_ADMIN
--add-cap=all

--drop-namespace=all --add-namespace=MNT

We probably should separate out the discussion of DockerFiles from the discussion of CLI for dropping/adding namespaces and capabilities.

rhatdan · 2014-06-27T10:22:41Z

@bgrant0607 I do agree. One of the difficult things for users of "Contained" Docker Containers will be figuring out why a container is getting "permission denied". There are several potential reasons for this.

Read Only File Systems
Missing Capabilitiies
DAC Permissions (Wrong Ownership, Permission Flags)
MAC (SELinux, AppArmor)
NODEV/NOEXEC/NOSUID on file systems
Device CGroup
Namespaces?

And the kernel does not help you out. We did have an effort to try to help, but Linus did not like it. https://fedoraproject.org/wiki/Features/FriendlyEPERM

Currently the only thing a user can do is turn everything (all security) off (--privilege) and if it works he is happy but insecure.

danbeaulieu · 2014-06-27T15:04:28Z

I really like the all flag @rhatdan is suggesting. It makes things very explicit, which is extremely important imo, and you don't have to wonder which caps Docker is adding or not adding by default or if the defaults ever change, etc.

@michaelneale I'm actually not seeing a case for Docker to have a default whitelist at all. That it does isn't very well documented, and I think it leads to a false sense of security as well. I get the impression the Docker whitelist was created when Docker was just a project to be used by dotcloud, and best fit their use cases.

ibuildthecloud · 2014-06-27T15:47:57Z

Regarding namespaces, instead of just on/off, can we have the ability to specify another container id/name, similar to the -net flag. So --namespace pid=container:otherguy. Just as its nice to be able to have namespaces turned off so that you can augment the host system with things like monitoring tools, it is also just as nice to be able to augment other containers.

crosbymichael · 2014-06-27T18:36:23Z

@danbeaulieu Docker's whitelist has been created by docker and the community ( anyone remember the great MKNOD battle? ). It is not created to best fit the dotcloud usecase and the list evolves as we learn more about how people are using docker.

Yes, i think we should have an all or * value to denote all caps

rhatdan · 2014-06-27T19:28:11Z

Sadly I missed the MKNOD battle, I would have been on the side of don't grant it. And get rid of cgroup device node stuff, which I still have no clue why this is not a namespace.

Having a keyword like "all" and "none" would make building up or taking away capabilities a lot easier.

I think having a group of predefined CAPABILITIES is good so people will run in a "semi"-secure environment. I just wish it was easy to see what the default list is. Perhaps we should list it in the

docker run --help
message and in docker-run man page.

wyaeld · 2014-07-01T09:50:18Z

If add/remove capabilities/namespaces goes in, can inspect be enhanced to show the current state?

I do share @rhatdan concern that if something is getting denied, people might just run privileged and therefore disable MAC entirely.

vieux · 2014-07-10T22:37:11Z

@rhatdan I started a basic implementation for caps here: https://github.com/vieux/docker/compare/dotcloud:master...vieux:cap_add_drop?expand=1

it supports --cap-add, --cap-drop and all I don't see the point of none, can you give me an example ?

Any idea ? /cc @crosbymichael

crosbymichael · 2014-07-10T22:47:04Z

all is fine with me

vieux · 2014-07-10T23:05:17Z

@rhatdan nevermind I edited my comment, drop=all and add=all work in a order now.

vieux · 2014-07-11T00:02:47Z

See #6968

ewindisch · 2014-07-16T21:23:03Z

One thing we discussed part of #2452 was support for FreeBSD and how this feature looks when we consider supporting other operating systems. Do we simply assume that all arguments to --cap are OS-dependent, or do we attempt a mapping to our own verbs to the OS-specific ones? I'm inclined to say it should be OS-dependent.

rhatdan · 2014-07-17T01:24:02Z

It should be OS Specific. We should not be inventing a new language.

How close is the --namespace patch?

phemmer · 2015-01-05T20:59:40Z

Has there been any progress on this? I am highly interested in the ability to drop specific namespaces.

ewindisch · 2015-01-05T21:10:58Z

--cap-add and --cap-drop was included with Docker 1.2. No --namespace flag has been added.

We should probably close this issue, possibly creating one for more granular namespace configuration.

LK4D4 · 2015-01-05T21:16:04Z

@phemmer We have support for dropping and specifying paths for namespaces in libcontainer. This can be added with dozen lines of code in docker. But having way to override namespaces in docker is sort of UX decision, so this is probably depending on @shykes.

phemmer · 2015-01-05T21:22:53Z

Perhaps my use case would help here. What I'm trying to do is mount something within a volume in one container, and then share the volume (containing the mount) with another container. Because of the mount namespace, when the volume is shared with another container, it appears empty. If I could disable the mount namespace, I could work around this.

rhatdan · 2015-01-05T21:53:18Z

Currently you can disable network,UTC and IPC namespace.

--net=host, --ipc=host

I have submitted a patch to disable the PID namespace --pid=host

Usernamespace is not enabled yet. That leaves mount, but kind of the definition of docker is using the MNT Namespace for handing of the image content. Not sure how you disable that and still have docker. libcontainer allows you to disable/enable all combinations.

phemmer · 2015-01-05T21:58:52Z

Well I imagine images would break left and right if the root (/) of the container isn't the root of the image, but we can fall back to a simple chroot to solve that issue.

SvenDowideit · 2015-01-06T06:25:01Z

imo it would be cool to have it tho - perhaps its a way for a container to union on top of another host dir / container, and in the process, we make even lighter weight containers. :)

rhatdan · 2015-01-06T13:32:00Z

Even if you shared / or /usr you would probably still want to have applications write to locations that are different per container /var and /tmp for example.

But if you want just to have namespaces, why not just use the unshare command.

crosbymichael · 2015-01-14T01:01:38Z

This is complete with --pid

crosbymichael added /system labels Jun 26, 2014

crosbymichael changed the title ~~Implement --caps and --namespaces flags for better control~~ Implement --cap and --namespace flags for better control Jun 26, 2014

vieux mentioned this issue Jun 26, 2014

Add warning when using privileged mode. #6689

Closed

vieux added this to the 1.0.2 milestone Jun 26, 2014

cpuguy83 mentioned this issue Jun 26, 2014

Docker security approach encourages privileged containers #6616

Closed

ibuildthecloud mentioned this issue Jun 27, 2014

Create idiomatic and technically feasible way to mount / into a container #6733

Closed

erikh added Orchestration labels Jun 30, 2014

vieux modified the milestones: 1.0.3, 1.0.2 Jul 1, 2014

vaceletm mentioned this issue Jul 9, 2014

Cannot use setuid/setgid capabilities in Dockerfile (fail to install postfix) #6925

Closed

vieux mentioned this issue Jul 11, 2014

Add support for --cap-add and --cap-drop #6968

Merged

vieux modified the milestones: 1.1.2, 1.1.1 Jul 12, 2014

ewindisch mentioned this issue Jul 16, 2014

Support granular capabilities #2452

Closed

tianon mentioned this issue Jul 17, 2014

libcontainer: How to change namespaces? #5393

Closed

crosbymichael removed this from the 1.1.3 milestone Aug 15, 2014

wlan0 mentioned this issue Dec 10, 2014

add support for capAdd rancher/cattle#34

Closed

crosbymichael closed this as completed Jan 14, 2015

phemmer mentioned this issue Jan 14, 2015

Allow controlling the mount namespace #10088

Closed

philwo mentioned this issue Sep 3, 2015

Can't run Bazel as part of a Docker build due to sandboxing issues bazelbuild/bazel#418

Closed

gdm85 mentioned this issue Sep 27, 2015

Add capabilities support ghedo/pflask#6

Closed

Implement --cap and --namespace flags for better control #6687

Implement --cap and --namespace flags for better control #6687

Comments

crosbymichael commented Jun 26, 2014

Caps

Namespaces

Privileged

Development

crosbymichael commented Jun 26, 2014

mrunalp commented Jun 26, 2014

vishh commented Jun 26, 2014

rhatdan commented Jun 26, 2014

rhatdan commented Jun 26, 2014

cpuguy83 commented Jun 26, 2014

jeremyeder commented Jun 26, 2014

danbeaulieu commented Jun 26, 2014

crosbymichael commented Jun 26, 2014

crosbymichael commented Jun 26, 2014

crosbymichael commented Jun 26, 2014

philips commented Jun 26, 2014

vieux commented Jun 26, 2014

michaelneale commented Jun 27, 2014

bgrant0607 commented Jun 27, 2014

rhatdan commented Jun 27, 2014

rhatdan commented Jun 27, 2014

danbeaulieu commented Jun 27, 2014

ibuildthecloud commented Jun 27, 2014

crosbymichael commented Jun 27, 2014

rhatdan commented Jun 27, 2014

wyaeld commented Jul 1, 2014

vieux commented Jul 10, 2014

crosbymichael commented Jul 10, 2014

vieux commented Jul 10, 2014

vieux commented Jul 11, 2014

ewindisch commented Jul 16, 2014

rhatdan commented Jul 17, 2014

phemmer commented Jan 5, 2015

ewindisch commented Jan 5, 2015

LK4D4 commented Jan 5, 2015

phemmer commented Jan 5, 2015

rhatdan commented Jan 5, 2015

phemmer commented Jan 5, 2015

SvenDowideit commented Jan 6, 2015

rhatdan commented Jan 6, 2015

crosbymichael commented Jan 14, 2015