Support user namespaces phase I in Kubernetes 1.25+ #7063

Closed
rata opened this issue Jun 15, 2022 · 2 comments · Fixed by #7679
Labels
area/cri Container Runtime Interface (CRI) kind/feature

Comments

@rata
Contributor

rata commented Jun 15, 2022

What is the problem you're trying to solve

Kubernetes 1.25 will add support for user namespaces: phase I of the KEP will be implemented, and I'm planning to write the containerd patches for it too (I already have an early prototype).

Please note that phase I, as described in the KEP, doesn't support volumes except for ephemeral ones that share the pod's lifecycle, such as secrets or configmaps mounted as volumes. The kubelet will create the files for those volumes with the proper permissions so the user in the userns can read them. In other words, there is nothing to do in the container runtime for volumes yet :)

Describe the solution you'd like

The containerd implementation will have to solve the following items.

Bear in mind that in the coming days I will create an issue for each item (if one doesn't already exist) to explain it in more detail:

  • netns ownership issue: currently, the containerd CRI package creates the netns and tells the OCI runtime to join it. Then we tell the OCI runtime to create the sandbox, and in doing so the OCI runtime creates the rest of the namespaces. This won't work once we add a userns: the netns must be owned by the userns, so it can't be created first. More details about the different ideas for solving this will be in the issue (I will create it soon).
  • Ownership of the rootfs: the user in the userns needs access to the rootfs. We can achieve this in several ways: (a) rely on idmap mounts (RFC: Initial support of idmapped mount points #5890), although overlayfs requires a 5.19+ kernel; (b) chown the image and add support for the metacopy overlayfs param (Metadata only copy up support for overlayfs snapshotter #6310); (c) use fuse-overlayfs with its own usermode idmap. I would like to support idmap mounts plus one other approach, so users can start using the feature soon rather than only after upgrading to a very recent kernel (and once they do have a recent kernel, we use idmap mounts). More details in the issue I will create soon.
  • Update to the new CRI API (once the changes are merged upstream in k/k) and implement the feature. This depends on what we do for the previous items, but it looks simple judging from my PoC patches (especially if we manage to let the OCI runtime create both the netns and the userns).

Additional context

It would be great if anyone else wants to help with any of these issues. You can write to me on the CNCF Slack or the Kubernetes Slack, where I'm Rodrigo Campos/rata, so we can coordinate and avoid duplicating efforts :)

I'll focus on the Kubernetes bits first, as we have a tight deadline for 1.25, so don't worry if it takes me some time to start opening PRs for this in containerd :)

rata changed the title Support user namespaces in Kubernetes 1.25+ Support user namespaces phase I in Kubernetes 1.25+ Jun 15, 2022
AkihiroSuda added the area/cri Container Runtime Interface (CRI) label Jun 15, 2022
@AkihiroSuda
Member

AkihiroSuda commented Jun 15, 2022

> We can achieve it in two ways: (a) rely on idmap mounts (#5890), although for overlayfs we need a 5.19+ kernel, (b) chown the image and add support for metacopy overlayfs param (#6310).

(c) fuse-overlayfs with its own usermode idmap

```go
package containerd

import (
	"fmt"

	"github.com/containerd/containerd/snapshots"
)

// WithRemapperLabels creates the labels used by any supporting snapshotter
// to shift the filesystem ownership (user namespace mapping) automatically; currently
// supported by the fuse-overlayfs snapshotter
func WithRemapperLabels(ctrUID, hostUID, ctrGID, hostGID, length uint32) snapshots.Opt {
	return snapshots.WithLabels(map[string]string{
		"containerd.io/snapshot/uidmapping": fmt.Sprintf("%d:%d:%d", ctrUID, hostUID, length),
		"containerd.io/snapshot/gidmapping": fmt.Sprintf("%d:%d:%d", ctrGID, hostGID, length),
	})
}
```

But in any case, we should prioritize (a).

@rata
Contributor Author

rata commented Jun 16, 2022

@AkihiroSuda I agree, (a) is very important. I've added (c) to the list now, thanks!

Do you know if fuse-overlayfs can expose the image with remapped ownership without the storage overhead of a chown (neither the file data nor the inode usage: metacopy reduces the storage, but we still use a lot of inodes) and also without the startup latency of a chown?

rata added a commit to kinvolk/containerd that referenced this issue Jun 29, 2022
This version contains the CRI changes for user namespaces support.
Future patches will use the new fields in the CRI.

Updating the module without using the new fields doesn't cause any
behaviour change.

Updates: containerd#7063

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
rata added a commit to kinvolk/containerd that referenced this issue Jun 30, 2022
This version contains the CRI changes for user namespaces support.
Future patches will use the new fields in the CRI.

Updating the module without using the new fields doesn't cause any
behaviour change.

Updates: containerd#7063

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2 participants