
@mihalicyn
Contributor

Without this fix, it is not possible to use a custom config search path option with the nvcdi interface.

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
mihalicyn added a commit to mihalicyn/lxd-pkg-snap that referenced this pull request Jan 7, 2025
…variable

We have to adjust the XDG_DATA_DIRS variable to make nvidia-container-toolkit
happy and let it find the necessary configuration files. Actually, the XDG_DATA_DIRS
value we are provided with is already correct (it is prepared for us in
https://github.com/canonical/mesa-2404/blob/main/scripts/bin/gpu-2404-provider-wrapper.in
and
https://git.launchpad.net/~canonical-kernel-snaps/canonical-kernel-snaps/+git/kernel-snaps-u24.04/tree/hooks/kernel-gpu-2404-provider-mangler?h=pc-components)
but for some reason the nvidia-container-toolkit library expects these paths to be relative to the driver root,
which leads to the driver root being prepended twice to a path, so the files
cannot be found. One possible (and clean) solution would be to explicitly
specify the config file search paths with nvcdi.WithConfigSearchPaths(), but that doesn't
work because of a bug; see NVIDIA/nvidia-container-toolkit#847.

So the easiest way I found is to adjust the XDG_DATA_DIRS variable in that weird way,
so that when nvidia-container-toolkit prepends driverRoot to it, the final path becomes
valid and everything works.

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
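The double-prefix problem and the XDG_DATA_DIRS workaround described above can be modelled with a small, self-contained Go sketch. It is a simplified illustration, not the toolkit's code: `searchPathsFromXDG` and `stripDriverRoot` are hypothetical helpers, and `/driver-root` is a made-up driver root. The sketch assumes, as the commit message describes, that every XDG_DATA_DIRS entry is treated as relative to the driver root.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// searchPathsFromXDG models the behavior described in the commit message:
// every XDG_DATA_DIRS entry is treated as relative to driverRoot, so
// driverRoot is prepended to it. Hypothetical helper, not toolkit code.
func searchPathsFromXDG(driverRoot, xdgDataDirs string) []string {
	var paths []string
	for _, dir := range filepath.SplitList(xdgDataDirs) {
		if dir == "" {
			continue // skip empty entries such as "a::b"
		}
		paths = append(paths, filepath.Join(driverRoot, dir))
	}
	return paths
}

// stripDriverRoot models the workaround: entries that live under driverRoot
// are rewritten relative to it, so that prepending driverRoot again yields
// the original directory. Hypothetical helper.
func stripDriverRoot(driverRoot, xdgDataDirs string) string {
	var out []string
	for _, dir := range filepath.SplitList(xdgDataDirs) {
		rel, err := filepath.Rel(driverRoot, dir)
		outside := err != nil || rel == ".." ||
			strings.HasPrefix(rel, ".."+string(filepath.Separator))
		if outside {
			// Not under driverRoot: leave the entry unchanged.
			out = append(out, dir)
			continue
		}
		out = append(out, string(filepath.Separator)+rel)
	}
	return strings.Join(out, string(filepath.ListSeparator))
}

func main() {
	driverRoot := "/driver-root" // hypothetical driver root
	xdg := "/driver-root/usr/share:/driver-root/usr/local/share"

	// Without the workaround the driver root appears twice in each path.
	fmt.Println(searchPathsFromXDG(driverRoot, xdg))

	// With the workaround the prefixed paths point at the real directories.
	fixed := stripDriverRoot(driverRoot, xdg)
	fmt.Println(fixed)
	fmt.Println(searchPathsFromXDG(driverRoot, fixed))
}
```

Entries outside the driver root are left untouched by `stripDriverRoot`; under the behavior modelled here they would still be prefixed, so this kind of workaround only helps for directories that really live under the driver root.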
mihalicyn added a commit to mihalicyn/lxd-pkg-snap that referenced this pull request Jan 8, 2025
tomponline added a commit to canonical/lxd-pkg-snap that referenced this pull request Jan 8, 2025
…variable (#679)

We have to adjust the XDG_DATA_DIRS variable to make
nvidia-container-toolkit happy and let it find the necessary configuration files.
Actually, the XDG_DATA_DIRS value we are provided with is
already correct (it is prepared for us in
https://github.com/canonical/mesa-2404/blob/main/scripts/bin/gpu-2404-provider-wrapper.in
and
https://git.launchpad.net/~canonical-kernel-snaps/canonical-kernel-snaps/+git/kernel-snaps-u24.04/tree/hooks/kernel-gpu-2404-provider-mangler?h=pc-components)
but for some reason the nvidia-container-toolkit library expects these paths
to be relative to the driver root, which leads to the driver
root being prepended twice to a path, so the files cannot be found. One
possible (and clean) solution would be to explicitly specify the config
file search paths with nvcdi.WithConfigSearchPaths(), but that doesn't work
because of a bug; see
NVIDIA/nvidia-container-toolkit#847.

So the easiest way I found is to adjust the XDG_DATA_DIRS variable in that
weird way, so that when nvidia-container-toolkit prepends driverRoot to it, the
final path becomes valid and everything works.

This fixes an issue with delivering some of the NVIDIA files that
CDI provides by default, namely the config files for things like Vulkan
and EGL. The files are included in the kernel's user components, so it
seems that LXD is simply not passing them through via CDI.

This is not necessarily an exhaustive list of files, but
examples include:

```
/usr/share/glvnd/egl_vendor.d/10_nvidia.json
/usr/share/glvnd/egl_vendor.d/50_mesa.json
/usr/share/vulkan/icd.d/nvidia_icd.json
/usr/share/vulkan/implicit_layer.d/nvidia_layers.json
/usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json
/usr/share/egl/egl_external_platform.d/10_nvidia_wayland.json
```
Member

@elezar left a comment


Thanks @mihalicyn.

Would you be able to backport this commit to the release-1.17 branch once we've merged it too?

@elezar self-assigned this Jan 9, 2025
@elezar added the must-backport label (the changes in this PR need to be backported to at least one stable release branch) Jan 9, 2025
@mihalicyn
Contributor Author

> Would you be able to backport this commit to the release-1.17 branch once we've merged it too?

Sure!

@elezar merged commit b6d360f into NVIDIA:main Jan 14, 2025
10 checks passed
@elezar
Member

elezar commented Jan 14, 2025

Thanks. Please ping me on the backport PR.

@elezar
Member

elezar commented Jan 15, 2025

I created #862 as a backport.

tomponline pushed a commit to tomponline/lxd-pkg-snap that referenced this pull request Jan 27, 2025
(cherry picked from commit 5e9629b)
mihalicyn added a commit to mihalicyn/lxd that referenced this pull request Feb 19, 2025
Now that NVIDIA/nvidia-container-toolkit#847
has been merged, we can use the WithConfigSearchPaths() nvcdi option to specify
where NVIDIA configuration files are located and revert our previous
hacky fix from canonical/lxd-pkg-snap#679.

No behavior change is intended for non-Ubuntu Core environments.

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
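With `nvcdi.WithConfigSearchPaths()` working, config lookup can use an explicit list of directories instead of a value derived from XDG_DATA_DIRS. The Go sketch below only models such a lookup for relative config names like `vulkan/icd.d/nvidia_icd.json` from the file list earlier in this thread; it does not call nvcdi, and `findConfig`, `fileExists` and the example directories are hypothetical. How nvcdi combines the configured paths with the driver root is not shown in this thread, so the helper simply takes fully resolved directories.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// findConfig returns the first candidate dir/relPath, in search order, for
// which exists reports true. exists is injected so the lookup can be tested
// without touching the real filesystem. Hypothetical helper, not nvcdi code.
func findConfig(searchDirs []string, relPath string, exists func(string) bool) (string, bool) {
	for _, dir := range searchDirs {
		candidate := filepath.Join(dir, relPath)
		if exists(candidate) {
			return candidate, true
		}
	}
	return "", false
}

// fileExists reports whether path exists and is a regular file.
func fileExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && info.Mode().IsRegular()
}

func main() {
	// Hypothetical search directories, for illustration only.
	dirs := []string{"/etc", "/usr/share"}
	for _, rel := range []string{
		"vulkan/icd.d/nvidia_icd.json",
		"glvnd/egl_vendor.d/10_nvidia.json",
	} {
		if p, ok := findConfig(dirs, rel, fileExists); ok {
			fmt.Println("found", p)
		} else {
			fmt.Println("missing", rel)
		}
	}
}
```

Injecting `exists` keeps the search order logic separate from filesystem access, which makes it easy to check which directory wins when a file is present in more than one place.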
tomponline added a commit to canonical/lxd that referenced this pull request Feb 19, 2025
…15027)

Now that
NVIDIA/nvidia-container-toolkit#847 has been merged, we
can use the WithConfigSearchPaths() nvcdi option to specify where NVIDIA
configuration files are located and revert our previous hacky fix from
canonical/lxd-pkg-snap#679.

No behavior change is intended for non-Ubuntu Core environments.

Please merge this together with canonical/lxd-pkg-snap#739.
tomponline pushed a commit to tomponline/lxd that referenced this pull request Mar 7, 2025
(cherry picked from commit 16777c1)

Labels

must-backport The changes in PR need to be backported to at least one stable release branch.


2 participants