Skip to content
This repository has been archived by the owner on Dec 13, 2018. It is now read-only.

Add User Namespace support to libcontainer. #23

Closed
wants to merge 1 commit into from

Conversation

mrunalp
Copy link
Contributor

@mrunalp mrunalp commented Jun 16, 2014

I have split the libcontainer changes from this PR moby/moby#6111 into this PR.
This PR depends upon this change -- moby/moby#6417

I want to get feedback on the implementation and the UI. For using this see moby/moby#6111 changes to the docker side.

The changes allow one to specify UID/GID Mappings and also the user on host that will be mapped to root within the container. The UI can be tweaked on feedback.

Docker-DCO-1.1-Signed-off-by: Mrunal Patel mrunalp@gmail.com (github: mrunalp)

@tianon
Copy link
Contributor

tianon commented Jun 17, 2014

I'm not the maintainer (IANTM), but this seems too docker-specific IMO. Isn't libcontainer supposed to just expose the primitives, essentially?

@mrunalp
Copy link
Contributor Author

mrunalp commented Jun 17, 2014

@tianon Fair enough. I will make some changes to make it more generic.

@tianon
Copy link
Contributor

tianon commented Jun 17, 2014

I think it's probably still worth waiting for a maintainer to confirm
before doing too much more work, isn't it? :)

@glyn
Copy link
Contributor

glyn commented Jun 17, 2014

Would it be possible to include unit tests? I know libcontainer is light on unit tests, but...

@mrunalp
Copy link
Contributor Author

mrunalp commented Jun 17, 2014

@vmarmol @crosbymichael review please :) I think I can take out docker specific terminology and use more generic terms.

@crosbymichael
Copy link
Contributor

I think we should just expose this via generic uid and gid mappings don't you?

@mrunalp
Copy link
Contributor Author

mrunalp commented Jun 17, 2014

@crosbymichael Agree on the general case.
There are two features here:

  1. Generic UID/GID mappings that can just be set and will be applied by libcontainer.
  2. Docker use case of mapping a host user to root. Here we need to know the uid/gid since we change to that user before calling clone and perhaps a flag to indicate that user wants to do that.. Also, there are two choices about the mapping for this feature, either we set it in libcontainer or have the user set it. Any thoughts?

@crosbymichael
Copy link
Contributor

@mrunalp why would the root -> "docker" root usecase be any different from the generic uid/gid mappings? Wouldn't it just be one entry for the mapping?

@mrunalp
Copy link
Contributor Author

mrunalp commented Jun 17, 2014

@crosbymichael There is the mapping and then there is the user that you are going to run as and that's what the DockerRootUid DockerRootGid (which will be renamed) are for. Do we want to use those variables or should be use the User field?

From your response, I take it that we want to make the user supply the mappings for this case as well and I think it makes sense.

@crosbymichael
Copy link
Contributor

Well since libcontainer is the low level part we can have the user specify the exact mapping that they want. Docker can provide the root -> docker-root abstraction

@vmarmol
Copy link
Contributor

vmarmol commented Jun 17, 2014

I agree with @crosbymichael and @tianon. We should really be exposing a mapping:

type UidRange struct {
  Start  int
  Length int
}

type UidMapping struct {
  Internal UidRange
  External UidRange
}

UserMapping []UidMapping

@dineshs-altiscale
Copy link

Yup. Kernel represents it somewhat more concisely as a triplet (hostUid, containerUid, size):

https://github.com/dotcloud/docker/pull/4572/files#diff-679e6137274897a7bfcc7688f0c6f089R361

Using same format saves a few lines of code here and there.

@vmarmol
Copy link
Contributor

vmarmol commented Jun 17, 2014

@dineshs-altiscale lets go with that :)

@mrunalp
Copy link
Contributor Author

mrunalp commented Jun 17, 2014

@dineshs-altiscale @vmarmol @crosbymichael Thanks for the feedback. I will incorporate the changes and update the PR.

@mrunalp
Copy link
Contributor Author

mrunalp commented Jun 19, 2014

@crosbymichael @vmarmol I have updated the PR.

@mrunalp
Copy link
Contributor Author

mrunalp commented Jun 19, 2014

Here is how this could be used from docker -- mrunalp/docker@6218272

@@ -66,6 +66,18 @@ type Container struct {

// The device nodes that should be automatically created within the container upon container start. Note, make sure that the node is marked as allowed in the cgroup as well!
DeviceNodes []*devices.Device `json:"device_nodes,omitempty"`

// UserNsUid specifies the uid to run as when user namespace mappings are specified
UserNsUid int `json:"user_ns_uid,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do want us to start making a distinction between runtime config and container config. Runtime config are the things that we can change if we were to run another process in the container. This is one example of that. I don't think it affects your PR, but it is something for @rjnagal, @crosbymichael, and me to think about :)

@mrunalp
Copy link
Contributor Author

mrunalp commented Jun 23, 2014

@crosbymichael @vmarmol You guys had a chance to review?

HostUid int `json:"host_uid,omitempty"`
ContainerUid int `json:"container_uid,omitempty"`
Size int `json:"size,omitempty"`
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making that change.

@dineshs-altiscale
Copy link

This is missing key things like dropping capabilities and clearing KEEP_CAPS.

@mrunalp
Copy link
Contributor Author

mrunalp commented Jun 23, 2014

@dineshs-altiscale Good point. What capabilities do we want to retain in the user namespace? We should be able to retain more than what we do in the non-user namespace case.

@mrunalp
Copy link
Contributor Author

mrunalp commented Jun 23, 2014

@dineshs-altiscale capabilities could be dropped in the parent to match non-user namespace case, but I think it makes sense to allow full capabilities in the child which is the point of user ns anyway, right? Please let me know if that is a wrong assumption. From whatever I have read the capabilities are only appicable to the new child user ns. If there are any capabilities that could allow the child process to escalate onto host then we could drop those.

@dineshs-altiscale
Copy link

We should begin with the same list used for non user namespace case as a conservative starting point and defer a closer audit of capabilities for a later PR.

Also, it's good to keep the same model (-u <username>) currently supported for non user namespace case. It also handles supplementary groups.

In general, reusing FinalizeNamespace avoids code duplication and handles these cases correctly.

@vmarmol
Copy link
Contributor

vmarmol commented Jun 23, 2014

+1 to using the same caps for now

@@ -95,17 +102,155 @@ func Init(container *libcontainer.Container, uncleanRootfs, consolePath string,
return fmt.Errorf("get parent death signal %s", err)
}

if err := FinalizeNamespace(container); err != nil {
return fmt.Errorf("finalize namespace %s", err)
userNsMode := len(container.UidMappings) > 0 || len(container.GidMappings) > 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More than this, we should check if a new NS_USER was specifed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add that.

@mrunalp
Copy link
Contributor Author

mrunalp commented Jun 23, 2014

@dineshs-altiscale -u user refers to a user inside the container whereas the options added here are for the host uid/host gid. Also, we cannot really use FinalizeNamespace or any other golang function in the child after making a raw syscall to CLONE. There was a discussion around this on #docker-dev.

@mrunalp
Copy link
Contributor Author

mrunalp commented Jun 23, 2014

@vmarmol @dineshs-altiscale @crosbymichael As far as capabilities in user ns go -- first paragraph here -- http://lwn.net/Articles/528117/ :) Of course, there are/will be vulnerabilities, but they will be fixed sooner the more people try out full caps in a container and report them. Also, this could be started off as an experimental feature in docker.

}
}

syscall.ForkLock.Lock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering whether we should do all this in C. We're gonna have to migrate it there anyways. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vmarmol I agree that this is much easier to do in C as we are working around go here. Question is where are the bits so I can contribute :) and also when is it expected to be released?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is starting slowly. The first thing is to nail down the API and then start the merge. If we use cgo for now, it'll be a drop-in replacement when the time comes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah if you're not too against it (and sorry for the extra work!) but can we write this in C? @crosbymichael let me know if you disagree :)

@mrunalp
Copy link
Contributor Author

mrunalp commented Jun 23, 2014

Pushed change to use uint32 for UIDs/GIDs and use NEWUSER as a flag in init.go

@dineshs-altiscale
Copy link

Specifying container user is intuitive and aligns with non user namespace usage model. Implementation difficulty with raw syscalls cannot really be a reason for artificially adding new config to container.go and making the usage harder.

+1 that these things are much easier to do in C when Go is not changing stack under you. But for now, since you know the mappings, you could convert container.User to its host equivalent and use it.

@@ -95,17 +102,155 @@ func Init(container *libcontainer.Container, uncleanRootfs, consolePath string,
return fmt.Errorf("get parent death signal %s", err)
}

if err := FinalizeNamespace(container); err != nil {
return fmt.Errorf("finalize namespace %s", err)
userNsMode := container.Namespaces["NEWUSER"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: userNsEnabled

@dineshs-altiscale
Copy link

Need to close FDs in the child as well -- there could be some FDs not marked close-on-exec that should not get through to the child. Forklock only takes care of the ones that are marked close-on-exec. With fchdir and other VFS calls that take FD rather than filename, an open FD to a host file can be as bad as having access to host file system.

@mrunalp
Copy link
Contributor Author

mrunalp commented Jun 23, 2014

@dineshs-altiscale I would argue that using uid/gid is more apt in libcontainer since there need not even be a user created on the host to be able to use this feature :) (and that was the driving reason behind using that as I went back and forth on it).

@dineshs-altiscale
Copy link

Not sure I follow that, but I don't think -u and UID maps should be mixed up.

@mrunalp
Copy link
Contributor Author

mrunalp commented Jun 24, 2014

@vmarmol @crosbymichael Rebased the PR and made modifications to address feedback except the -u flag. I think overloading the meaning of -u to lookup users on host or within a container namespace depending on userns/non-userns case isn't a good idea.

@crosbymichael
Copy link
Contributor

@mrunalp do you have a sample_config for us to test this out with nsinit?

If not you can just copy the minimal.json file and add the mappings

@mrunalp
Copy link
Contributor Author

mrunalp commented Jun 24, 2014

@crosbymichael I have been maintaining changes to docker to test this. Need to modify create.go and driver.go -- mrunalp/docker@37dcb1a

Here is a sample container.json -- https://gist.github.com/mrunalp/833decb50fc9ae443106

return fmt.Errorf("finalize namespace %s", err)
userNsEnabled := container.Namespaces["NEWUSER"]

if userNsEnabled {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you break this into two separate functions just for readability?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

@mrunalp
Copy link
Contributor Author

mrunalp commented Jun 25, 2014

@crosbymichael Pushed changes to address all of your comments.

Docker-DCO-1.1-Signed-off-by: Mrunal Patel <mrunalp@gmail.com> (github: mrunalp)
@crosbymichael
Copy link
Contributor

I think we can close this one because #53 looks much better and I don't want to split the conversation.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants