Allow a CRI-O container's cgroup to be delegated to a user #5228
And, to anyone who might know: since we actually have to change some filesystem ownership and permissions, does this require changes in the OCI runtime, making these changes from a CRI-O process, or simply passing the right config to the OCI runtime? Specifically, we need to make the changes indicated here to the cgroup that contains the container processes. |
Documentation note, quoted in this lkml link:
A 'proper' way to delegate cgroups would be to use the /sys/kernel/cgroup/delegate file to know which files to chown. These are the files that belong to the cgroup itself and not the parent, and thus should be controlled by the delegatee. |
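A minimal sketch of the approach described in that note, not CRI-O or runc code: read the delegate list, then chown the cgroup directory and whichever listed files exist in it. The function names and the fallback list used when the delegate file cannot be read are assumptions made for illustration.

```python
import os
from pathlib import Path

# Fallback used when the delegate list cannot be read (an assumption for this sketch).
FALLBACK_FILES = ["cgroup.procs", "cgroup.subtree_control", "cgroup.threads"]


def delegatable_files(delegate_list="/sys/kernel/cgroup/delegate"):
    """Return the interface file names that belong to the cgroup itself."""
    try:
        return Path(delegate_list).read_text().split()
    except OSError:
        return list(FALLBACK_FILES)


def delegate_cgroup(cgroup_dir, uid, gid, delegate_list="/sys/kernel/cgroup/delegate"):
    """Chown the cgroup directory and its delegatable files to uid:gid.

    Returns the names of the files whose ownership was changed.
    """
    cgroup_dir = Path(cgroup_dir)
    os.chown(cgroup_dir, uid, gid)
    changed = []
    for name in delegatable_files(delegate_list):
        target = cgroup_dir / name
        if target.exists():  # not every listed file is present in every cgroup
            os.chown(target, uid, gid)
            changed.append(name)
    return changed
```

On a real host this would be run by a privileged process against the container's cgroup directory, for example `delegate_cgroup("/sys/fs/cgroup/<container cgroup>", uid, gid)`, where the path is a placeholder.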
@giuseppe @kolyshkin ptal |
Since this is cgroup v2 and we do have cgroupns, this can probably be achieved by merely passing "nsdelegate" option to cgroupfs mount (and making it rw). I tried it briefly and it didn't work as expected, will continue tomorrow. |
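To check whether a given mount table shows a cgroup2 mount with both options mentioned here, something like the following could be used. It is only a sketch that parses text in the `/proc/self/mounts` format; the helper names are hypothetical, and it says nothing about whether the options have the desired effect.

```python
def cgroup2_mount_options(mounts_text):
    """Return the option set of the first cgroup2 mount found, or None."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Format: device mountpoint fstype options dump pass
        if len(fields) >= 4 and fields[2] == "cgroup2":
            return set(fields[3].split(","))
    return None


def is_rw_nsdelegate(mounts_text):
    """True if a cgroup2 mount is listed with both rw and nsdelegate."""
    opts = cgroup2_mount_options(mounts_text)
    return opts is not None and {"rw", "nsdelegate"} <= opts
```

Typical use would be `is_rw_nsdelegate(open("/proc/self/mounts").read())`, run inside the container.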
I thought it might work like that when I read this here:
But when I try to parse that second sentence, it doesn't really make sense. The first sentence says that granting write access to those files delegates the cgroup to a user. The second sentence says it delegates to a namespace, not a user, and as far as I understand cgroups, that doesn't make sense. I haven't found any source other than this documentation that backs this up. What does delegating to a namespace mean? From what I can gather on this lkml thread, nsdelegate only makes namespaces a stronger security boundary; it doesn't change the permissions on the files, and nothing short of actually changing those permissions has worked for me. |
I wouldn't guess that doing this would be possible without changing the ownership of the delegation files over to the delegatee user. |
Is it fine for your use case to use user namespaces as well? In that case the cgroup could be owned by root in the user namespace (crun always chowns the cgroup to root in the container) while still mapping to a different user on the host. |
I think that delegating to a userns root would be the ideal solution, though I would have to wait for kubernetes user namespace support. A non-userns delegation would be useful as a stop-gap or an alternative design option. Edit:
|
Yes :) There is some transparent support for user namespaces in CRI-O using annotations, added with #3944. The downside is that user namespaces are still expensive, because we need to create a clone of the image to chown the files. The kernel now supports doing this at runtime, but we do not support it yet. |
I'll see if I can figure out how to make that work and get back. Sounds promising! |
cri-o/server/sandbox_run_linux.go Line 98 in ee8e721
@giuseppe Will it be possible to do this without being root in the securitycontext in the future? Or, when you use this annotation and runAsRoot in security context, are you still only the container's root and not the host's? |
So, I'm not that well versed in uidmapping. When I use io.kubernetes.cri-o.userns-mode: "private:uidmapping=0:57500:5000", where 57500 is my host uid, together with runAsUser: 0, I show up in the container as 5000. How would I map root in the container to my host uid? When I tried to make the last number '1', I got this:
|
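For readers following the mapping question, here is a small sketch of the arithmetic, assuming the triple in the annotation reads containerID:hostID:size as in the usual ID-mapping convention. It does not reproduce how CRI-O parses the annotation, and the helper names are hypothetical.

```python
def parse_mapping(spec):
    """Split 'containerID:hostID:size' into three integers."""
    container_start, host_start, size = (int(part) for part in spec.split(":"))
    return container_start, host_start, size


def to_host_id(container_id, spec):
    """Translate an ID inside the container to the host ID, or None if unmapped."""
    container_start, host_start, size = parse_mapping(spec)
    offset = container_id - container_start
    if 0 <= offset < size:
        return host_start + offset
    return None
```

Under that assumption, `to_host_id(0, "0:57500:5000")` gives 57500, and a size of 1 maps only container ID 0.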
have you tried just using |
Tried it just now. Here's my annotation; do I need more?
|
what's |
Empty. I'm guessing it's not supposed to be? |
yeah I would have |
I had to switch to a different Kubernetes cluster because I was previously using a heavily modified kind cluster. But thanks for the help. I can confirm that as pod root within a namespace, I can create and configure cgroups. I'll need to do a little more research before I'm sure this works with other components of the program. Okay, I had some spare cycles today to try all of this out further. I've managed to set up a system where this works. I will do more testing to see whether it works for my use case. |
Actually there is an ongoing effort to make this happen in OCI: opencontainers/runtime-spec#1123 |
Exactly, we tested userns and it heavily impacts pod start time (increased from a few seconds to 30-60 seconds) |
Yes, based on my development nothing special needs to be done in CRI-O at this point. It is possible (though I'm hoping not) that the runc behaviour might land behind an OCI annotation to enable it, in which case either CRI-O needs to add the annotation, or the user must add the annotation to the Pod spec (CRI-O propagates these annotations to the OCI config). |
runc PR opencontainers/runc#3057 was merged. It will automatically chown the cgroupfs to the container process UID when particular conditions are met, as now specified in the OCI runtime-spec: https://github.com/opencontainers/runtime-spec/blob/main/config-linux.md#cgroup-ownership. Accordingly, there is nothing more to be done here. I leave it to a CRI-O maintainer to close this ticket. |
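As a rough illustration of the "particular conditions", here is a sketch based on my reading of the linked runtime-spec section: cgroup v2 is in use, the container has its own cgroup namespace, and the cgroupfs is mounted read-write. The linked section is authoritative; the class and function names are hypothetical and this is not runc's code.

```python
from dataclasses import dataclass


@dataclass
class ContainerCgroupSetup:
    cgroup_v2: bool             # host uses the unified (v2) hierarchy
    own_cgroup_namespace: bool  # container gets a private cgroup namespace
    cgroupfs_read_write: bool   # cgroupfs is mounted rw inside the container


def should_chown_cgroup(setup):
    """All three conditions must hold before ownership is changed."""
    return (
        setup.cgroup_v2
        and setup.own_cgroup_namespace
        and setup.cgroupfs_read_write
    )
```

For example, a setup with a read-only cgroupfs mount would leave ownership untouched, which matches the need for a read-write mount discussed in this thread.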
I wanted to come back and say that I've had a chance to test this stack, and it works perfectly! I'm able to get the cgroup hierarchy delegated to a user on the host namespace with the cgroup2-mount-hierarchy-rw annotation in CRI-O set to "true" and with default settings on the newest runc (built from the GitHub source). Perhaps in the future we can get a setting for cgroup.max.descendants at the base of the hierarchy and remove the need for an annotation. |
Reference to discussion in 8/19/2021 weekly meeting.
Acceptance: When crio is using cgroups v2, there should be a way to specify a user that will own the root cgroup within the container. Low level details not yet determined.
My first step is to try to make a POC.