SELinux Module for NVIDIA containers #42
Besides restoring the context of the NVIDIA files before mounting, one crucial part of the story is getting the labels right. With the above-mentioned module it is possible to run the device-plugin with a restricted SCC, and there is no need to run the device-plugin or the GPU workload privileged in the SELinux context.
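One quick way to check that a container started this way is really confined, and not running as spc_t the way a privileged container would, is to look at the process context from the host. The process names below are only examples:

```bash
# With the module loaded and the container started without --privileged,
# the workload should run in a confined type rather than spc_t.
ps -efZ | grep -E 'nvidia-device-plugin|nvidia-smi'
# Expect a context ending in nvidia_container_t:s0:c...,c... (or container_t),
# not the spc_t a privileged container would show.
```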
I don't understand why you need more than
I am basing
The hook mounts
The bin files mounted by the hook have
This should be clear, the devices have
There is currently no problem with the libraries. The hook creates symlinks for each library, and the symlinks get the correct label (container_file_t), inherited from the parent folder. A symlink has its own inode and hence gets its own SELinux label.
This does not mean that you can create correctly labeled symlinks for a file that you're not able to read; the type reading the symlinks must have permission to read both the symlink source and destination.
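To make the label situation concrete, this is how the different objects can be inspected; the paths are only examples and depend on the driver installation and on where the hook mounts things:

```bash
# On the host: what do the driver files and device nodes actually carry?
ls -Z /usr/bin/nvidia-smi /usr/lib64/libnvidia-ml.so.1
ls -Z /dev/nvidia*

# Inside a running container: a symlink created by the hook has its own
# label (inherited from the parent directory, e.g. container_file_t),
# while the bind-mounted target keeps the host label.
ls -Z  /usr/lib64/libcuda.so.1   # the symlink's own context
ls -LZ /usr/lib64/libcuda.so.1   # -L follows the link: the target's context
```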
I believe @3XX0 expected the hook to run with context
Beware if you're running with podman, you will need at least. This will mount e.g.
No, what I would like to know is why we need the container_runtime* (i.e. all the non-xserver_*) rules in the first place. But really, the problem I have with this patch is that you are assuming that the host driver files carry xserver contexts, and we can't really be opinionated about that.
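One way to see what the host policy actually expects for the driver files, instead of assuming xserver contexts, is to ask the policy directly; the paths below are examples:

```bash
# What context does the installed policy say these paths should have?
matchpathcon /usr/bin/nvidia-smi /usr/lib64/libnvidia-ml.so.1 /dev/nvidia0

# Are there any file-context rules mentioning nvidia at all?
# (There may be none, in which case generic defaults apply.)
semanage fcontext -l | grep -i nvidia
```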
So I just looked into it and I think this is due to the fact that your policy doesn't have the
You're right. I've updated the policy for the missing attributes, see the updated nvidia-container.te. But the problem with tmpfs remains, because container_domain is only allowed dir read.
But the container is doing more than just dir read, that's why I have added the other rules. Does /proc/driver/nvidia/gpus/... need to be mounted as a tmpfs? The host labels are another point of discussion. The default selinux-policy is labelling bin files
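For reference, the rules that a plain container domain gets on these types can be listed with sesearch (from setools); container_t is used here as a representative domain carrying the container_domain attribute:

```bash
# What may an ordinary container domain do with tmpfs-backed files?
sesearch --allow -s container_t -t tmpfs_t

# And with proc entries such as /proc/driver/nvidia/gpus/...?
sesearch --allow -s container_t -t proc_t
```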
Oh right, I assumed it was already the case but thinking more about it, runc always bind mounts on top of its
So we might just be missing something here. I understand why you do it, but coming back to our options, I would prefer we provide better file contexts than these ones in the first place (e.g.
If we were to rely on the default
Totally on your side, I do not like it either. We're currently in a position to create an example workflow here for how to enable hardware accelerators in general on a system with SELinux. If other hardware vendors follow this path with a similar method of providing the needed libraries (prestart hook, bind mounts), we could create "generic" rules for labeling and a policy for accelerators.
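If the labeling route is taken, a sketch of such generic rules could look like the following; the regular expressions, paths, and target type are assumptions for illustration, and whether container_file_t (or a dedicated accelerator type) is the right target is exactly the open question here:

```bash
# Register file contexts for the driver user-space files so that
# restorecon keeps them in a container-readable type (example only).
semanage fcontext -a -t container_file_t '/usr/lib64/libnvidia-.*\.so(\..*)?'
semanage fcontext -a -t container_file_t '/usr/bin/nvidia-(smi|persistenced|debugdump)'

# Apply the contexts to the existing files
restorecon -Rv /usr/lib64 /usr/bin
```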
@3XX0 and @zvonkok, I tried to set up the OpenShift + NVIDIA docker hook + SELinux environment for an AI training job. I found that some AI training frameworks (PyTorch) want to write to /dev/shm in the GPU container, but after I run the container with "container_t" or @zvonkok's "nvidia_container_t", /dev/shm in the container is not accessible by the training code.
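Not an answer on the policy side, but a few standard checks can narrow this down; the flags are regular docker/podman options and the container name and image are placeholders. PyTorch DataLoader workers use /dev/shm for shared memory, so both its label and its size matter:

```bash
# How is /dev/shm labeled and how big is it inside the failing container?
podman exec <container> ls -dZ /dev/shm
podman exec <container> df -h /dev/shm

# If SELinux is denying the writes, there should be matching AVC records
ausearch -m avc -ts recent | grep -i shm

# Rule out the (unrelated) 64 MB default shm size while testing
podman run --shm-size=1g --security-opt label=type:nvidia_container_t <image> <cmd>
```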
I think this is done; there is an example SELinux policy for DGX available here: https://github.com/NVIDIA/dgx-selinux
That should likely be generalized to non-DGX EL7 / EL8 environments and made part of this project's packages.
When we run NVIDIA containers on an SELinux-enabled distribution we need a separate SELinux module to run the container confined. Without an SELinux module we have to run the container `privileged`, as this is the only way to allow specific SELinux contexts to interact (read, write, chattr, ...) with the files mounted into the container. A container running `privileged` gets the `spc_t` label, which is allowed to read, write, and chattr the base types. The base types (`device_t`, `bin_t`, `proc_t`, ...) are introduced by the bind mounts of the hook. A bind mount cannot have two different SELinux contexts, as SELinux operates on the inode level. I have created the following SELinux module, nvidia-container.te, which works with podman/cri-o/docker.
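For anyone reproducing this, a .te module of this kind is typically compiled and loaded along these lines; the file names follow the nvidia-container.te mentioned above:

```bash
# Build the policy package from the type-enforcement source and load it
checkmodule -M -m -o nvidia-container.mod nvidia-container.te
semodule_package -o nvidia-container.pp -m nvidia-container.mod
semodule -i nvidia-container.pp

# Confirm the module is installed
semodule -l | grep nvidia-container
```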
A prerequisite for the SELinux module to work correctly is to ensure that the labels are correct for the mounted files. Therefore I have added an additional line to the oci-nvidia-hook where I am running a relabeling command.
With this, every time a container is started the files to be mounted will have the correct SELinux label and the SELinux module will work.
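The exact line in the hook is not quoted here, but one plausible form of such a relabeling step (an assumption, not necessarily the author's exact command) is to feed the driver file list from nvidia-container-cli into restorecon:

```bash
# Relabel the driver files the hook is about to bind-mount; the real line
# in the oci-nvidia-hook may differ from this sketch.
nvidia-container-cli -k list | restorecon -v -f -
```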
Now I can run NVIDIA containers without `privileged`, can drop all capabilities with `cap-drop=ALL`, and can set `security-opt=no-new-privileges`.
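Put together, such a confined run would look roughly like this; the image name and the nvidia_container_t type are examples based on this thread, and GPU access itself still comes from the prestart hook configured on the host:

```bash
# No --privileged: confined type, all capabilities dropped, no-new-privileges
docker run --rm \
  --security-opt label=type:nvidia_container_t \
  --cap-drop=ALL \
  --security-opt no-new-privileges \
  nvidia/cuda nvidia-smi
```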