
Automatically provisioning X11 and Wayland devices of GPU inside container? #118

ehfd opened this issue Nov 28, 2020 · 28 comments

@ehfd

ehfd commented Nov 28, 2020

Redirected from NVIDIA/k8s-device-plugin#206 to a more suitable repository.

1. Issue or feature description

In Docker and Kubernetes, people have had to set up the host manually to provision the X server, using a host path mount of /tmp/.X11-unix. This is quite tedious for sysadmins and at the same time a security threat, since users can spoof the host.
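For context, the manual approach looks roughly like this (an illustrative sketch only; the DISPLAY value depends on the host session, and xhost/.Xauthority handling is usually also needed):

docker run -it --gpus all \
  -e DISPLAY=$DISPLAY \
  -v /tmp/.X11-unix:/tmp/.X11-unix \
  nvidia/cudagl:11.0-devel-ubuntu20.04 bash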

To mitigate this, there have been attempts (https://github.com/ehfd/docker-nvidia-glx-desktop, which is based on https://github.com/ryought/glx-docker-headless-gpu) to run an X server and use GLX inside the container after the GPU has been provisioned by libnvidia-container.

An alternative was created by the developers of VirtualGL (widely used in HPC to enable GPU-based rendering in VNC virtual display environments): a feature that uses the EGL API to enable 3D OpenGL rendering for applications such as Blender, MATLAB, and Unity, previously only possible with GLX and thus an X server. As you know well, nvidia-docker does not support GLX, but introduced the EGL API just under two years ago.
See EGL config section of VirtualGL/virtualgl#113 (comment)

EGL is also required to start a Wayland compositor inside a container using the EGLStreams specification on NVIDIA GPUs, which is the way forward now that X11 development has stopped.

These use cases require access to the devices /dev/dri/cardX corresponding to each GPU provisioned by libnvidia-container. However, libnvidia-container does not seem to provision these automatically. I would like to ask whether this is possible, and how it can be configured.

2. Steps to reproduce the issue

Provision one GPU inside container nvidia/cudagl:11.0-devel-ubuntu20.04 or nvidia/opengl:1.2-glvnd-devel-ubuntu20.04 in Docker CE 19.03 (or using one nvidia.com/gpu: 1 with k8s-device-plugin v0.7.0 with default configurations in Kubernetes v1.18.6).

Do: ls /dev

Result: Inside the container you see /dev/nvidiaX, /dev/nvidia-modeset, /dev/nvidia-uvm, and /dev/nvidia-uvm-tools; however, the directory /dev/dri does not exist.
Wayland compositors are unlikely to start inside a container without DRM devices, and VirtualGL likewise does not work through any devices other than /dev/dri/cardX.
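For reference, a quick comparison (typical device names; numbering varies per system):

# On the host, DRM nodes are normally present:
ls /dev/dri    # card0  renderD128
# Inside the container provisioned as above, the directory is missing:
ls /dev/dri    # ls: cannot access '/dev/dri': No such file or directory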

3. Information to attach (optional if deemed irrelevant)

Other issues and repositories:
Example of VirtualGL EGL configuration that requires /dev/dri/cardX: https://github.com/ehfd/docker-nvidia-egl-desktop

Implementation of an unprivileged remote desktop bundling an X server with many hacks: https://github.com/ehfd/docker-nvidia-glx-desktop

@klueska
Contributor

klueska commented Dec 1, 2020

I have added this feature request to our backlog. At present we have a big backlog, so it's unclear exactly when we will be able to look at this in detail.

That said, it feels like this could be added as a new NVIDIA_DRIVER_CAPABILITY that looks for these devices and injects them if they exist. You would set the capability either in the container image or on the command line via an environment variable (which would work in the k8s context as well).
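For illustration, the mechanism would be the same as for the existing capabilities (the new capability name itself is still to be decided):

# In the image:
#   ENV NVIDIA_DRIVER_CAPABILITIES compute,utility,graphics
# Or on the command line (the same variable works when set in a k8s pod spec):
docker run --rm --gpus all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics ubuntu nvidia-smi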

@ehfd
Author

ehfd commented Dec 2, 2020

As the thumbs-up reactions show, this feature is in quite high demand, so it would be great to have it implemented quickly. Thank you.

@xkszltl

xkszltl commented Dec 15, 2020

If you get a chance to do that, maybe add /dev/gdrdrv for nvidia gdrcopy as well.

@ehfd
Author

ehfd commented Jan 23, 2021

https://github.com/mviereck/x11docker/wiki/Hardware-acceleration#share-nvidia-device-files-with-container

To use a custom base image, share all files matching /dev/nvidia*, /dev/nvhost* and /dev/nvmap with docker run option --device. Share /dev/dri and /dev/vga_arbiter, too. Add container user to groups video and render with --group-add video --group-add render.
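In docker run terms that amounts to something like this (a sketch only; device numbering varies, /dev/nvhost* and /dev/nvmap only exist on Tegra, and the /dev/nvidia* nodes are already handled by --gpus):

docker run -it --gpus all \
  --device /dev/dri/card0 --device /dev/dri/renderD128 \
  --device /dev/vga_arbiter \
  --group-add video --group-add render \
  ubuntu bash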

In addition to the initial feature request, these are all the devices that need to be provisioned automatically for NVIDIA to officially support displays (e.g. X11, Wayland) in Docker. If these devices can be provisioned automatically by the Container Toolkit, the nvidia/opengl container (nvidia-docker) can properly support the NVIDIA version of XWayland (which NVIDIA developers are currently working to support upstream) and thus support displays.

A lot of people are waiting for display support in Docker and Kubernetes, especially because NVIDIA will support XWayland in the near future. Please implement this feature to streamline this.

@ehfd
Author

ehfd commented Jul 21, 2021

Any updates? @klueska
I was able to start an unprivileged X server inside an OCI Docker container with nvidia-docker in https://github.com/ehfd/docker-nvidia-glx-desktop, but thinking ahead to Wayland support (since the 470 driver is out), we likely need this.

@ehfd
Author

ehfd commented Jun 15, 2022

Please use https://gitlab.com/arm-research/smarter/smarter-device-manager for /dev/dri/card* and /dev/dri/render* if you stumble upon this issue.

@ehfd
Author

ehfd commented Sep 12, 2022

EGL does not require /dev/dri for NVIDIA devices. VirtualGL has merged support for GLX over EGL without such devices.

@ehfd ehfd closed this as completed Sep 12, 2022
@ehfd
Author

ehfd commented Sep 28, 2022

Still likely needed for Wayland with GBM.

@ehfd ehfd reopened this Sep 28, 2022
@elezar
Member

elezar commented Sep 28, 2022

Thanks @ehfd. We are working on improving the injection of these devices in an upcoming release. Note that the current plan is to do so using the nvidia-container-runtime at an OCI runtime specification level instead of relying on the NVIDIA Container CLI.

Do you have sample containers / test cases that you could provide to ensure that we meet the requirements?

@ehfd
Author

ehfd commented Sep 28, 2022

@elezar
https://github.com/ehfd/docker-nvidia-glx-desktop/blob/main/entrypoint.sh
https://github.com/ehfd/docker-nvidia-egl-desktop/blob/main/entrypoint.sh

These two repositories involve a series of hacks to make NVIDIA GPUs work reliably, unprivileged, inside a container with a properly accelerated GUI.

docker-nvidia-glx-desktop must install the userspace driver components at startup, mostly following your examples, but only after reading the driver version from /proc/driver/nvidia/version, because the libraries aren't injected into the container.

In the current state, the same userspace driver installation must be done for Wayland, also by reading /proc/driver/nvidia/version. This is undesirable.

Also, in docker-nvidia-egl-desktop, where the userspace drivers aren't installed at startup, an annoying situation arises: the display capability must be included in NVIDIA_DRIVER_CAPABILITIES for Vulkan, because nvidia_icd.json requires libGLX_nvidia.so.0 (and probably other libraries), even when Xorg is not used with the NVIDIA driver.

Vulkan should work with only the graphics capability as intended, but it currently requires display as well (NVIDIA/nvidia-container-toolkit#140). Thankfully it does work without major modifications to libnvidia-container.
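For reference, the cause is visible in the Vulkan ICD manifest itself (the path differs between distributions and driver packagings):

cat /etc/vulkan/icd.d/nvidia_icd.json || cat /usr/share/vulkan/icd.d/nvidia_icd.json
# "library_path": "libGLX_nvidia.so.0"  <- this is what pulls GLX in even for headless Vulkan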

The display capability also does not currently enable starting an Xorg server with the NVIDIA driver, because the required libraries are not injected into the container. Hence the hacks applied by these two containers are required.

Please also consider injecting the libraries necessary for NVFBC with the video capability, even if the SDK must be installed inside the container.

We really hope that NVIDIA_DRIVER_CAPABILITIES starts working properly and that the hacks my containers apply won't be needed anymore. These can all likely be done at the OCI runtime specification level.

Note that we currently use https://gitlab.com/arm-research/smarter/smarter-device-manager for provisioning /dev/dri devices, but there is no way to expose only the devices for the GPU allocated to the container.

Thanks a lot!

@elezar
Member

elezar commented Sep 28, 2022

Thanks for all the information. I will comb through it while working on the feature. Hopefully we can improve things significantly!

@Zubnix

Zubnix commented Oct 18, 2022

Hi @elezar @ehfd,

I'm writing a remote Wayland compositor and am currently busy integrating it with k8s, and I can independently confirm everything @ehfd has stated so far, as I've hit all of these issues in the last couple of weeks. Being able to access /dev/dri/renderD12x and /dev/dri/cardX, while limiting and preferably eliminating startup actions and driver dependencies of a container, is an absolute must.

@elezar I'm happy to assist and answer any questions you might have to help move this forward!

@elezar
Member

elezar commented Oct 18, 2022

Thanks @Zubnix. We have started work on injecting the /dev/dri/cardx devices as part of https://gitlab.com/nvidia/container-toolkit/container-toolkit/-/merge_requests/219

I think that in all cases, having a list of the specific devices, libraries, and environment variables that are required in a container for things to work as expected would be quite useful. We will be sure to update this issue as soon as there is something out for testing and early feedback.

@ehfd
Author

ehfd commented Oct 18, 2022

@Zubnix Hi! I've been interested in Greenfield for a long time. Nice to meet you here! I also agree that eliminating the driver dependencies of a container is very important. Thanks for your feedback!
Btw, do you have any interest in using WebTransport over WebSockets in your project?

@Zubnix

Zubnix commented Oct 18, 2022

Hi @ehfd I've written my answer here as not to hijack this thread :)

@ehfd
Author

ehfd commented Jan 16, 2023

@elezar Hi! I saw that the /dev/dri component got merged.
https://gitlab.com/nvidia/container-toolkit/container-toolkit/-/commit/f7021d84b555b00857640681136b9b9b08fd067f

I believe that should make Wayland fundamentally work in Kubernetes.

Would it be possible to pass the below library components for enhanced X11/Wayland support?
https://download.nvidia.com/XFree86/Linux-x86_64/525.78.01/README/installedcomponents.html

@elezar
Member

elezar commented Jan 18, 2023

Thanks @ehfd I will have a look at the link you suggested.

@ehfd
Author

ehfd commented Jan 18, 2023

@elezar
Specifically, I feel the components below are necessary for a full X11/Wayland + OpenGL EGL/GLX + Vulkan stack without downloading the driver inside the container.

Anything marked with AND should be injected when either of the listed capabilities is present. And as you know well, the generic symlinks to the .so.525.78.01 files should be created as well.

I also believe that, for practical use, everything in graphics should be injected anyway if display is specified without graphics; otherwise I feel it won't work.

Configuration .json files should be added to the container, as the base images do now. (A bind-mount stopgap sketch follows at the end of this list.)
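For example, the glvnd EGL vendor ICD that the base images already ship (standard glvnd layout; the path may vary):

cat /usr/share/glvnd/egl_vendor.d/10_nvidia.json
# { "file_format_version" : "1.0.0", "ICD" : { "library_path" : "libEGL_nvidia.so.0" } }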

(should be injected to display)
'/usr/lib/xorg/modules/drivers/nvidia_drv.so'
'/usr/lib/xorg/modules/extensions/libglxserver_nvidia.so.525.78.01'
'/usr/bin/nvidia-xconfig'
'/usr/bin/nvidia-settings' + /usr/lib/libnvidia-gtk2.so.525.78.01 and on some platforms /usr/lib/libnvidia-gtk3.so.525.78.01

(should be injected to graphics AND display, probably already injected)
'/usr/lib/libGL.so.1', '/usr/lib/libEGL.so.1', '/usr/lib/libGLESv1_CM.so.525.78.01', '/usr/lib/libGLESv2.so.525.78.01', '/usr/lib/libEGL_nvidia.so.0'

(should be injected to graphics AND display)
'/usr/lib/libOpenGL.so.0', '/usr/lib/libGLX.so.0', and '/usr/lib/libGLdispatch.so.0', '/usr/lib/libnvidia-tls.so.525.78.01'

(currently injected to display only, must be injected for graphics too in order to use Vulkan)
'/usr/lib/libGLX_nvidia.so.0' and the configuration /etc/vulkan/icd.d/nvidia_icd.json

(should be injected to display AND egl, else eglinfo segfaults)
'/usr/lib/libnvidia-egl-wayland.so.1' and the config '/usr/share/egl/egl_external_platform.d/10_nvidia_wayland.json'
'/usr/lib/libnvidia-egl-gbm.so.1' and the config '/usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json'

(should be injected to video AND display)
/usr/lib/libnvidia-fbc.so.525.78.01

(should be injected to graphics AND video)
/usr/lib/libnvoptix.so.1

(should be injected to compute as there is a CUDA and CUVID dependency)
/usr/lib/libnvidia-opticalflow.so.525.78.01

(should be injected to video, not currently injected)
/usr/lib/vdpau/libvdpau_nvidia.so.525.78.01

(should be injected to video)
/usr/lib/libnvidia-encode.so.525.78.01

(should be injected to both compute AND video)
/usr/lib/libnvcuvid.so.525.78.01

(should be injected to compute if not already there)
Two OpenCL libraries (/usr/lib/libOpenCL.so.1.0.0, /usr/lib/libnvidia-opencl.so.525.78.01); the former is a vendor-independent Installable Client Driver (ICD) loader, and the latter is the NVIDIA Vendor ICD. A config file /etc/OpenCL/vendors/nvidia.icd is also installed, to advertise the NVIDIA Vendor ICD to the ICD Loader.

(should be injected to utility)
/usr/lib/libnvidia-ml.so.525.78.01

(should be injected to ngx)
/usr/lib/libnvidia-ngx.so.525.78.01
/usr/bin/nvidia-ngx-updater
/usr/lib/nvidia/wine/nvngx.dll
/usr/lib/nvidia/wine/_nvngx.dll

Various libraries that are used internally by other driver components. These include /usr/lib/libnvidia-cfg.so.525.78.01, /usr/lib/libnvidia-compiler.so.525.78.01, /usr/lib/libnvidia-eglcore.so.525.78.01, /usr/lib/libnvidia-glcore.so.525.78.01, /usr/lib/libnvidia-glsi.so.525.78.01, /usr/lib/libnvidia-glvkspirv.so.525.78.01, /usr/lib/libnvidia-rtcore.so.525.78.01, and /usr/lib/libnvidia-allocator.so.525.78.01.
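Until these are handled by the toolkit, a stopgap is bind-mounting the missing files from the host, roughly like this (a sketch only; host paths assume an x86_64 Ubuntu library layout and the driver version listed above):

docker run --rm --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all \
  -v /usr/lib/x86_64-linux-gnu/libnvidia-egl-wayland.so.1:/usr/lib/x86_64-linux-gnu/libnvidia-egl-wayland.so.1:ro \
  -v /usr/share/egl/egl_external_platform.d/10_nvidia_wayland.json:/usr/share/egl/egl_external_platform.d/10_nvidia_wayland.json:ro \
  ubuntu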

@ehfd
Author

ehfd commented Nov 25, 2023

As of libnvidia-container 1.14.3-1:

/usr/lib/xorg/modules/drivers/nvidia_drv.so
/usr/lib/xorg/modules/extensions/libglxserver_nvidia.so.525.78.01

libnvidia-egl-gbm.so.1
libnvidia-egl-wayland.so.1

libnvidia-vulkan-producer.so

gbm/nvidia-drm_gbm.so

These important libraries are still not provisioned.

@elezar

@ehfd ehfd changed the title Automatically provisioning /dev/dri devices of GPU inside container? Automatically provisioning X11 and Wayland devices of GPU inside container? Dec 21, 2023
@ehfd
Author

ehfd commented Dec 21, 2023

@klueska @elezar A reminder for you guys... The below are the only libraries left before I can finally close this three-year-old issue, with both X11 and Wayland working inside a container.

This is likely 30 minutes of work for you guys.

Things mostly work now, but only after downloading the .run userspace driver library files inside the container.

/usr/lib/xorg/modules/drivers/nvidia_drv.so
/usr/lib/xorg/modules/extensions/libglxserver_nvidia.so.525.78.01

libnvidia-egl-gbm.so.1
libnvidia-egl-wayland.so.1

libnvidia-vulkan-producer.so

gbm/nvidia-drm_gbm.so

If you can't include some of these into the container toolkit, please tell us why.

@elezar
Member

elezar commented Jan 8, 2024

@ehfd thanks for the reminder here.

Some of the libraries are already handled by the NVIDIA Container Toolkit -- with the caveat that their detection may be distribution-dependent at the moment. The main thing to change here is where we search for the libraries. There is no technical reason why we haven't done this; the delay is largely caused by resource constraints.

Note that, in theory, if you mount these missing libraries from the host, it should not be necessary to use the .run file to install the userspace libraries in the container.

If you have capacity to contribute the changes, I would be happy to review them. Note that I would recommend making these against the NVIDIA Container Toolkit, where we already inject some of the libraries that you mentioned.

@ehfd
Author

ehfd commented Jan 8, 2024

Thank you @elezar
I will assess this within the NVIDIA GitLab repositories and possibly contribute code to inject these packages.
Thanks!

@ehfd
Author

ehfd commented Jan 10, 2024

@elezar

The core issue seems to be that https://gitlab.com/nvidia/container-toolkit/container-toolkit/-/blob/main/internal/discover/graphics.go is somehow not invoked with Docker. Perhaps this is because the Docker runner is not based on CDI?

@elezar
Member

elezar commented Jan 10, 2024

To trigger the logic as linked you need to:

  1. Use the nvidia runtime
  2. Ensure that NVIDIA_DRIVER_CAPABILITIES includes graphics or display.

To configure the nvidia runtime for Docker, follow the steps described here.
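(On a typical installation this amounts to something like the following; see the install guide for specifics.)

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker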

Then we can run a container:

docker run --rm -ti --runtime=nvidia --gpus=all -e NVIDIA_DRIVER_CAPABILITIES=all ubuntu 

This does not require CDI support explicitly.

@ehfd
Author

ehfd commented Mar 28, 2024

Most of the above issues were probably because the PPA for graphics drivers did not install:

libnvidia-egl-gbm1
libnvidia-egl-wayland1
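Installing them explicitly resolves it (package names as shipped for Ubuntu):

sudo apt-get update && sudo apt-get install -y libnvidia-egl-gbm1 libnvidia-egl-wayland1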

@ehfd
Author

ehfd commented May 10, 2024

@elezar I have a contribution.

NVIDIA/nvidia-container-toolkit#490

@ehfd
Author

ehfd commented May 10, 2024

NVIDIA/nvidia-container-toolkit#490 (comment)

A more detailed description of the situation and of the requirements to close this issue conclusively.
