Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement --cap and --namespace flags for better control #6687

Closed
crosbymichael opened this issue Jun 26, 2014 · 37 comments
Closed

Implement --cap and --namespace flags for better control #6687

crosbymichael opened this issue Jun 26, 2014 · 37 comments

Comments

@crosbymichael
Copy link
Contributor

We discussed this and think it would be best to not use something like --opts to be able to add and remove individual capabilities and namespaces for containers. We would rather use a dedicated flag and make sure that these features work across libcontainer and lxc. So our proposal is to add these two flags and provide a warning when elevating privileges for a container so that a user is aware of that they are doing.

Caps

http://linux.die.net/man/7/capabilities

  1. In addition to the default caps that are keep for a container add the NET_ADMIN cap to a container.
docker run --cap +NET_ADMIN crosbymichael/redis
  1. Remove the MKNOD capability from the default cap set that is kept
docker run --cap -MKNOD crosbymichael/redis
  1. Drop MKNOD and add NET_ADMIN and SYS_ADMIN
docker run --cap -MKNOD --cap +NET_ADMIN --cap +SYS_ADMIN crosbymichael/redis

Any additional caps that are retained for a container should produce an error on stderr when running the container to let the user know they have elevated the privileges for a container.

Namespaces

There are a few reasons why you want to modify the namespaces applied to a container. If you are running a monitoring application in a container you may want to not drop into your own PID namespace so that you can see all your parents PID.

  1. Do not create a PID namespace
docker run --namespace +PID -ti crosbymichael/htop

Changing the default namespace profile should result in a warning to the user.

Privileged

Using the privileged flag should result in a warning on stderr just like the two new flags above.

Development

We can discuss some of the details in this issue but if you would like to work on any of the changes please let me know.

I think all three of these changes, caps, namespaces, and privileged warning can be done separately. Ideally the privileged warning code can be merged in first so we have the warning pipeline to work with on the other prs. Caps should be easy to implement. Namespaces will be a little more challenging because it may consist of some runtime/libcontainer changes to support the different configuration.

Suggestions?

@crosbymichael
Copy link
Contributor Author

ping @philips @unclejack @rhatdan

@mrunalp
Copy link
Contributor

mrunalp commented Jun 26, 2014

@crosbymichael +1 to adding this feature. I can help with changes to libcontainer to make this happen.

@vishh
Copy link
Contributor

vishh commented Jun 26, 2014

+1

On Wed, Jun 25, 2014 at 5:12 PM, Mrunal Patel notifications@github.com
wrote:

@crosbymichael https://github.com/crosbymichael +1 to adding this
feature. I can help with changes to libcontainer to make this happen.


Reply to this email directly or view it on GitHub
#6687 (comment).

@crosbymichael crosbymichael changed the title Implement --caps and --namespaces flags for better control Implement --cap and --namespace flags for better control Jun 26, 2014
@vieux vieux added this to the 1.0.2 milestone Jun 26, 2014
@rhatdan
Copy link
Contributor

rhatdan commented Jun 26, 2014

Does this mean we are going away from the --opts patch altogether, and I should work on the --selinuxopts?

@rhatdan
Copy link
Contributor

rhatdan commented Jun 26, 2014

I actually like the feature but the CLI is a little ugly.

--namespace -PID

PID looks like an Option but it is really not, and does this eliminate the use of -P as an option.

systemd-nspawn does this using --capabiltiy and --drop-capabilty

man systemd-nspawn
...
--capability=
List one or more additional capabilities to grant the container. Takes a comma-separated list of capability names, see capabilities(7) for more information. Note that the following capabilities will be granted in any way: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_DAC_READ_SEARCH, CAP_FOWNER, CAP_FSETID, CAP_IPC_OWNER, CAP_KILL, CAP_LEASE, CAP_LINUX_IMMUTABLE, CAP_NET_BIND_SERVICE, CAP_NET_BROADCAST, CAP_NET_RAW, CAP_SETGID, CAP_SETFCAP, CAP_SETPCAP, CAP_SETUID, CAP_SYS_ADMIN, CAP_SYS_CHROOT, CAP_SYS_NICE,CAP_SYS_PTRACE, CAP_SYS_TTY_CONFIG, CAP_SYS_RESOURCE, CAP_SYS_BOOT, CAP_AUDIT_WRITE, CAP_AUDIT_CONTROL. Also CAP_NET_ADMIN is retained if --private-network is specified.
If the special value "all" is passed, all capabilities are retained.

   --drop-capability=
       Specify one or more additional capabilities to drop for the container. This allows running the container with fewer capabilities than the default (see above).

@cpuguy83
Copy link
Member

I agree with @rhatdan, but more concisely just --drop-cap, --add-cap

Also, for namespaces it would be really nice to have the same functionality that we currently have with net for all namespaces... so host/new/container:<id/name>

@jeremyeder
Copy link

Any thoughts about adding this support to Dockerfiles, such that these things can be baked into an image? Dockerfile could also allow/deny override of the baked-in config via docker run ...

@danbeaulieu
Copy link

I like --add-cap, other options are --keep-cap or --allow-cap.

namespace is sort of overloaded, what about something like --pid-namespace=true|false?

I think as separate issues it'd be great to list capabilities via docker inspect, or possibly a new API method.

@crosbymichael
Copy link
Contributor Author

@rhatdan yes, I would just work on a --selinux flag for what you need. We are not ready for --opts right now

@crosbymichael
Copy link
Contributor Author

@jeremyeder No, we are not ready to add anything to the image until we have some type of control that says do not run any untrusted image with elevated permissions on my docker host. For now we will keep these as runtime only options

@crosbymichael
Copy link
Contributor Author

ping @vieux @shykes what do you think about these alternate flag suggestions?

@philips
Copy link
Contributor

philips commented Jun 26, 2014

FWIW, I like the alternative ideas over the +/- syntax.

@vieux
Copy link
Contributor

vieux commented Jun 26, 2014

I agree with @rhatdan -P and -PID could conflict

I like to see the flags grouped together. What about --cap-add and --cap-drop

@michaelneale
Copy link
Contributor

I see a dichotomy: docker default - you can only add capabilities, not drop**.
docker --privileged - you can only drop, not add.
The command line can enforce.

docker run --add-cap ....
docker run --privileged --drop-cap ...

** is there a case for taking away from the default whitelist capabilities of docker? It seems that would only break things?

@bgrant0607
Copy link

@crosbymichael Regarding Dockerfile support. As mentioned in #6616, often applications will not run correctly if not granted the capabilities they require. I recommend that Dockerfile capability specification should not actually provide elevated permissions, but should indicate required permissions, so a clear error can be issued in the case that they are not granted, higher-level systems can interrogate the image to determine required capabilities, etc.

@rhatdan
Copy link
Contributor

rhatdan commented Jun 27, 2014

@michaelneale The problem with this is --privileged means more then just --drop-cap. It also turns off MAC support (selinux or AppArmor) and maybe a couple of other things.

Again I look for systemd-nspawn which has the all flag

--drop-cap=all --add-cap=SYS_ADMIN
--add-cap=all

--drop-namespace=all --add-namespace=MNT

We probably should separate out the discussion of DockerFiles from the discussion of CLI for dropping/adding namespaces and capabilities.

@rhatdan
Copy link
Contributor

rhatdan commented Jun 27, 2014

@bgrant0607 I do agree. One of the difficult things for users of "Contained" Docker Containers will be figuring out why a container is getting "permission denied". There are several potential reasons for this.

Read Only File Systems
Missing Capabilitiies
DAC Permissions (Wrong Ownership, Permission Flags)
MAC (SELinux, AppArmor)
NODEV/NOEXEC/NOSUID on file systems
Device CGroup
Namespaces?

And the kernel does not help you out. We did have an effort to try to help, but Linus did not like it. https://fedoraproject.org/wiki/Features/FriendlyEPERM

Currently the only thing a user can do is turn everything (all security) off (--privilege) and if it works he is happy but insecure.

@danbeaulieu
Copy link

I really like the all flag @rhatdan is suggesting. It makes things very explicit, which is extremely important imo, and you don't have to wonder which caps Docker is adding or not adding by default or if the defaults ever change, etc.

@michaelneale I'm actually not seeing a case for Docker to have a default whitelist at all. That it does isn't very well documented, and I think it leads to a false sense of security as well. I get the impression the Docker whitelist was created when Docker was just a project to be used by dotcloud, and best fit their use cases.

@ibuildthecloud
Copy link
Contributor

Regarding namespaces, instead of just on/off, can we have the ability to specify another container id/name, similar to the -net flag. So --namespace pid=container:otherguy. Just as its nice to be able to have namespaces turned off so that you can augment the host system with things like monitoring tools, it is also just as nice to be able to augment other containers.

@crosbymichael
Copy link
Contributor Author

@danbeaulieu Docker's whitelist has been created by docker and the community ( anyone remember the great MKNOD battle? ). It is not created to best fit the dotcloud usecase and the list evolves as we learn more about how people are using docker.

Yes, i think we should have an all or * value to denote all caps

@rhatdan
Copy link
Contributor

rhatdan commented Jun 27, 2014

Sadly I missed the MKNOD battle, I would have been on the side of don't grant it. And get rid of cgroup device node stuff, which I still have no clue why this is not a namespace.

Having a keyword like "all" and "none" would make building up or taking away capabilities a lot easier.

I think having a group of predefined CAPABILITIES is good so people will run in a "semi"-secure environment. I just wish it was easy to see what the default list is. Perhaps we should list it in the

docker run --help
message and in docker-run man page.

@wyaeld
Copy link

wyaeld commented Jul 1, 2014

If add/remove capabilities/namespaces goes in, can inspect be enhanced to show the current state?

I do share @rhatdan concern that if something is getting denied, people might just run privileged and therefore disable MAC entirely.

@vieux
Copy link
Contributor

vieux commented Jul 10, 2014

@rhatdan I started a basic implementation for caps here: https://github.com/vieux/docker/compare/dotcloud:master...vieux:cap_add_drop?expand=1

it supports --cap-add, --cap-drop and all I don't see the point of none, can you give me an example ?

Any idea ? /cc @crosbymichael

@crosbymichael
Copy link
Contributor Author

all is fine with me

@vieux
Copy link
Contributor

vieux commented Jul 10, 2014

@rhatdan nevermind I edited my comment, drop=all and add=all work in a order now.

@vieux
Copy link
Contributor

vieux commented Jul 11, 2014

See #6968

@vieux vieux modified the milestones: 1.1.2, 1.1.1 Jul 12, 2014
@ewindisch
Copy link
Contributor

One thing we discussed part of #2452 was support for FreeBSD and how this feature looks when we consider supporting other operating systems. Do we simply assume that all arguments to --cap are OS-dependent, or do we attempt a mapping to our own verbs to the OS-specific ones? I'm inclined to say it should be OS-dependent.

@rhatdan
Copy link
Contributor

rhatdan commented Jul 17, 2014

It should be OS Specific. We should not be inventing a new language.

How close is the --namespace patch?

@phemmer
Copy link
Contributor

phemmer commented Jan 5, 2015

Has there been any progress on this? I am highly interested in the ability to drop specific namespaces.

@ewindisch
Copy link
Contributor

--cap-add and --cap-drop was included with Docker 1.2. No --namespace flag has been added.

We should probably close this issue, possibly creating one for more granular namespace configuration.

@LK4D4
Copy link
Contributor

LK4D4 commented Jan 5, 2015

@phemmer We have support for dropping and specifying paths for namespaces in libcontainer. This can be added with dozen lines of code in docker. But having way to override namespaces in docker is sort of UX decision, so this is probably depending on @shykes.

@phemmer
Copy link
Contributor

phemmer commented Jan 5, 2015

Perhaps my use case would help here. What I'm trying to do is mount something within a volume in one container, and then share the volume (containing the mount) with another container. Because of the mount namespace, when the volume is shared with another container, it appears empty. If I could disable the mount namespace, I could work around this.

@rhatdan
Copy link
Contributor

rhatdan commented Jan 5, 2015

Currently you can disable network,UTC and IPC namespace.

--net=host, --ipc=host

I have submitted a patch to disable the PID namespace --pid=host

Usernamespace is not enabled yet. That leaves mount, but kind of the definition of docker is using the MNT Namespace for handing of the image content. Not sure how you disable that and still have docker. libcontainer allows you to disable/enable all combinations.

@phemmer
Copy link
Contributor

phemmer commented Jan 5, 2015

Well I imagine images would break left and right if the root (/) of the container isn't the root of the image, but we can fall back to a simple chroot to solve that issue.

@SvenDowideit
Copy link
Contributor

imo it would be cool to have it tho - perhaps its a way for a container to union on top of another host dir / container, and in the process, we make even lighter weight containers. :)

@rhatdan
Copy link
Contributor

rhatdan commented Jan 6, 2015

Even if you shared / or /usr you would probably still want to have applications write to locations that are different per container /var and /tmp for example.

But if you want just to have namespaces, why not just use the unshare command.

@crosbymichael
Copy link
Contributor Author

This is complete with --pid

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests