-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inject additional libraries required for full display functionality #490
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ehfd.
Some questions to start with.
internal/discover/graphics.go
Outdated
libRoot + "/nvidia/xorg", | ||
libRoot + "/xorg/modules/extensions", | ||
"/usr/lib/xorg/modules/extensions", | ||
"/usr/X11R6/modules/extensions", | ||
libRoot + "/xorg/modules/drivers", | ||
"/usr/lib/xorg/modules/drivers", | ||
"/usr/X11R6/modules/drivers", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First question: Is the ordering of these important? Does it make sense to group the libRoot
search paths?
Second: Are any of these paths controllable by envvars (such as XDG_DATA_DIRS
for icd loaders)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Ordering is not important. I moved the libroot paths together.
-
Envvars: ... I don't really think there's an envvar specifically for where the X.Org modules are.
The additional paths are because these libraries are not guaranteed to be installed in libRoot
for some types of installations including .run
despite the distro's preference to place libraries at a certain different location based on pkg-config
, so they have to be hard-coded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check the below comment for an answer to this: #490 (comment)
internal/discover/graphics.go
Outdated
@@ -271,13 +274,21 @@ func newXorgDiscoverer(logger logger.Interface, driver *root.Driver, nvidiaCTKPa | |||
lookup.NewFileLocator( | |||
lookup.WithLogger(logger), | |||
lookup.WithRoot(driver.Root), | |||
lookup.WithSearchPaths(libRoot, "/usr/lib/x86_64-linux-gnu"), | |||
lookup.WithSearchPaths([]string{ | |||
libRoot + "/nvidia/xorg", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If libcuda.so.<RM_VERSION>
is at /usr/lib/x86_64-linux-gnu
, for example, libRoot will be /usr/lib/x86_64-linux-gnu
is this the intent? Or should this always be /usr/lib/...
or /usr/X11R/...
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This is the intent. I removed "/usr/lib/x86_64-linux-gnu"
because it's not arch-neutral and redundant with libRoot.
In Ubuntu's APT installation for NVIDIA drivers, the paths where these packages are installed resolve to /usr/lib/x86_64-linux-gnu/nvidia/xorg
for both X11-related .so files, as intended.
Also, /usr/lib/x86_64-linux-gnu/xorg/modules/extensions
and /usr/lib/x86_64-linux-gnu/xorg/modules/drivers
are also existing paths, so it's also possible for these .so files to round up there.
internal/discover/graphics.go
Outdated
@@ -56,6 +56,9 @@ func NewGraphicsMountsDiscoverer(logger logger.Interface, driver *root.Driver, n | |||
driver.Root, | |||
[]string{ | |||
"libnvidia-egl-gbm.so.*", | |||
"gbm/nvidia-drm_gbm.so", | |||
"libnvidia-egl-wayland.so.*", | |||
"libnvidia-vulkan-producer.so", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
libnvidia-vulkan-producer.so
is a symlink which should resolve to libnvidia-vulkan-producer.so.RM_VERSION
. The latter should be detected by the "standard" driver libraries unless it's at a different location to libcuda.so.RM_VERSION
.
Is the issue that the .RM_VERSION
library is not injected, or that the symlink is not created?
Looking at the driver manifest we have:
libnvidia-vulkan-producer.so.535.86.10 0755 OPENGL_LIB NATIVE MODULE:egl
libnvidia-vulkan-producer.so 0000 UTILITY_LIB_SYMLINK NATIVE libnvidia-vulkan-producer.so.535.86.10 MODULE:egl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neither are injected. Is it ok if I simply put "libnvidia-vulkan-producer.so*",
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the driver manifest we have:
Where can I see this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The manifest is available when extracting a driver .run
file.
Neither are injected. Is it ok if I simply put "libnvidia-vulkan-producer.so*",?
No, since we need to treat symlinks correctly. I assume that the issue is then that libnvidia-vulkan-producer.so.<RM_VERSION>
doesn't exist at the same location as libcuda.so.<RM_VERSION>
on your system. Can you confirm this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Symlinks should be created via hook. A bind mount may fail.
I can rewrite this for the vulkan producer and gbm/nvidia-drm_gbm.so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Symlinks should be created via hook. A bind mount may fail.
After looking at the code, this is not true here. The driver locator will take care of symlinks and resolve them on the host side.
If you could provide some steps on how you would go about validating correct functionality, that would also be great -- we are looking to improve our automated test coverage and having a starting point would be welcome. |
…ix paths Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com>
This is a pretty hard question... The first priority regardless is to consult the NVIDIA driver team on structuring libraries and these things. Like a communication channel which you can get informed of library structures and such related aspects of the driver in various environments. You have cooperative colleagues at work; so you should consider working with them. --x-prefix=X-PREFIX
The prefix under which the X components of the NVIDIA driver will be installed; the default is '/usr/X11R6'
unless nvidia-installer detects that X.Org >= 7.0 is installed, in which case the default is '/usr'. Only under
rare circumstances should this option be used.
--xfree86-prefix=XFREE86-PREFIX
This is a deprecated synonym for --x-prefix.
--x-module-path=X-MODULE-PATH
The path under which the NVIDIA X server modules will be installed. If this option is not specified,
nvidia-installer uses the following search order and selects the first valid directory it finds: 1) `X
-showDefaultModulePath`, 2) `pkg-config --variable=moduledir xorg-server`, or 3) the X library path (see the
'--x-library-path' option) plus either 'modules' (for X servers older than X.Org 7.0) or 'xorg/modules' (for
X.Org 7.0 or later).
--x-library-path=X-LIBRARY-PATH
The path under which the NVIDIA X libraries will be installed. If this option is not specified, nvidia-installer
uses the following search order and selects the first valid directory it finds: 1) `X -showDefaultLibPath`, 2)
`pkg-config --variable=libdir xorg-server`, or 3) the X prefix (see the '--x-prefix' option) plus 'lib' on 32bit
systems, and either 'lib64' or 'lib' on 64bit systems, depending on the installed Linux distribution.
--x-sysconfig-path=X-SYSCONFIG-PATH
The path under which X system configuration files will be installed. If this option is not specified,
nvidia-installer uses the following search order and selects the first valid directory it finds: 1) `pkg-config
--variable=sysconfigdir xorg-server`, or 2) /usr/share/X11/xorg.conf.d. For example, the above (visible through WARNING: nvidia-installer was forced to guess the X library path '/usr/lib64' and X module path
'/usr/lib64/xorg/modules'; these paths were not queryable from the system. If X fails to find the NVIDIA X
driver module, please install the `pkg-config` utility and the X.Org SDK/development package for your
distribution and reinstall the driver. Leading to: /usr/lib64/xorg/modules/extensions/libglxserver_nvidia.so
/usr/lib64/xorg/modules/extensions/libglxserver_nvidia.so.550.78
/usr/lib64/xorg/modules/drivers/nvidia_drv.so For container hosts, neither If they exist in Ubuntu: Similarly for libglvnd: These are the candidate "environment variables" you are looking for, at least for The second priority is to test different types of driver installation methods (because the above path method for Although the above should prevent most issues, checking whether X11 is started correctly is a different story, and probably outside the scope of this project unless NVIDIA decides to consult my container: https://github.com/selkies-project/docker-nvidia-glx-desktop and provides an official one. In conclusion, this is a really complex issue, where this PR did only partially solve the whole issue. For example, on top of this PR, it would be ideal if it's possible to push Section "OutputClass"
Identifier "nvidia"
MatchDriver "nvidia-drm"
Driver "nvidia"
Option "AllowEmptyInitialConfiguration"
ModulePath "/usr/lib/x86_64-linux-gnu/nvidia/xorg"
EndSection The above is the default behavior for the
Then, the X11 aspect will be solved and we developers can call it a day. Also, injecting 32-bit libraries is definitely desirable for usage with Wine/Proton/etc. |
Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com>
internal/discover/graphics.go
Outdated
@@ -56,6 +56,9 @@ func NewGraphicsMountsDiscoverer(logger logger.Interface, driver *root.Driver, n | |||
driver.Root, | |||
[]string{ | |||
"libnvidia-egl-gbm.so.*", | |||
"gbm/nvidia-drm_gbm.so", | |||
"libnvidia-egl-wayland.so.*", | |||
"libnvidia-vulkan-producer.so*", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want:
"libnvidia-vulkan-producer.so*", | |
"libnvidia-vulkan-producer.so.*", |
and need to create the .so
symlink if required as a hook.
internal/discover/graphics.go
Outdated
libRoot + "/xorg/modules/extensions", | ||
libRoot + "/xorg/modules/drivers", | ||
"/usr/lib/xorg/modules/extensions", | ||
"/usr/lib/xorg/modules/drivers", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like we're searching the following 2 paths: []string{"extensions", "drivers"}
in []string{"modules", "modules/updates"}
across the following "roots": []string{libRoot + "/xorg", "/usr/lib", "/usr/lib64", "/usr/X11R6"}
.
Does it make sense to construct this dynamically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it would.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to push to my branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elezar How's the progress?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed this one
Co-authored-by: Evan Lezar <evanlezar@gmail.com> Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com>
Another update: |
Thanks for the update @ehfd. I will have a look at this again this week. |
linkDir := filepath.Dir(mount.Path) | ||
linkPath := filepath.Join(linkDir, "gbm", "nvidia-drm_gbm.so") | ||
target := filepath.Join("..", filename) | ||
links = append(links, fmt.Sprintf("%s::%s", target, linkPath)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elezar I changed this to a symlink hook, but I'm not sure if this should be done this way.
When doing it via NewMounts
as before, the locator will attempt to resolve the symlink on the host side. That means the path gets only injected when the library actually exists on the host and future changes to this path (link target changes, becoming a regular file, etc...) will also be covered.
So I'd advocate to put this back to NewMounts
as originally suggested by @ehfd
Your Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This depends on @elezar 's policies on this matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that adding gbm/nvidia-drm_gbm.so
to the NewMounts
will have the desired effect. Since this resolves to libnvidia-allocator.so
we will only include libnvidia-allocator.so
in the list of mounts and NOT create the symlink.
Note that you comment on handling changes in this path is not 100% accurate. While this is true for gbm/nvidia-drm_gbm.so
, changing this to a regular file would have the downside that we may lose the resolution of the target library in this case.
Your comment does raise an interesting question though. If gbm/nvidia-drm_gbm.so
exists on the host, then we could follow a similar strategy to what we do for Tegra-based systems here.
Where we distinguish between regular libraries and symlinks. In both cases any symlinks are followed to ensure that we only mount regular files into a container, but in the case of symlinks we recreate the entire symlink chain in the container. If we were to factor out this logic to be reused here, we could treat the symlinks here explicitly as separate entities to discover.
I do think that thsi can be done as a follow-up to this PR though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that merging first is better since this is taking such a long time.
@elezar So are you satisfied with his changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ehfd I have just tried to push to your branch, but I'm getting permission errors.
I have rebased your branch off main
and created https://github.com/ehfd/nvidia-container-toolkit/pull/1/files with my proposed changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elezar I added you to the repository for write access. Merge on your own when you're ready.
Although there is no reason why you should not be able to push.
Thank you for the contributions on the Go code! @tux-rampage |
@tux-rampage Note that something like |
Some thoughts or contributions on this part of the issue can also help a lot. @tux-rampage |
Sure. I'll take a look when I have some time available, but that might take a day or two. |
@@ -251,6 +312,26 @@ func optionalXorgDiscoverer(logger logger.Interface, driver *root.Driver, nvidia | |||
return xorg | |||
} | |||
|
|||
func buildXOrgSearchPaths(libRoot string) []string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I see this as the nested for loops, it seems more difficult to actually understand what's being searched.
Does it make sense to flatten this instead and just do something like:
return []string{
filepath.Join(libRoot, "nvidia/xorg"),
filepath.Join(libRoot, "xorg", "modules", "drivers"),
filepath.Join(libRoot, "xorg", "modules", "extensions"),
filepath.Join(libRoot, "xorg", "modules/updates", "drivers"),
filepath.Join(libRoot, "xorg", "modules/updates", "extensions"),
filepath.Join("/usr/lib/xorg", "modules", "drivers"),
filepath.Join("/usr/lib/xorg", "modules", "extensions"),
filepath.Join("/usr/lib/xorg", "modules/updates", "drivers"),
filepath.Join("/usr/lib/xorg", "modules/updates", "extensions"),
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess so. Your call.
@tux-rampage @ehfd I have added some commits on top of this branch to reorganise things a bit. I don't really have a great way to verify the changes on my end -- most of my development is headless and compute focussed -- so please check the latest iteration so that we can move forward. @ehfd there are a number of steps in #490 (comment) tha I think should be follow-ups for this. Please create an issue where these are tracked as tasks or equivalent so that we can also look at them. |
I will. I also prefer merging quickly. |
I did my testing after compiling, and after initial tests, the libraries are pushed and most of the things (outside the ones that should be follow-ups) work. @elezar |
Requested reviewer: @elezar
This PR tries the fix the remainder of issues which prevent proper X11 deployment in containers.
Largely, two things:
$LD_LIBRARY_PATH/gbm/nvidia-drm_gbm.so
and other related libraries includinglibnvidia-egl-wayland.so.1
not already included by the container toolkit.$LD_LIBRARY_PATH/nvidia/xorg/libglxserver_nvidia.so
and$LD_LIBRARY_PATH/nvidia/xorg/nvidia_drv.so
for the critical X11 libraries. This is because NVIDIA drivers installed with.run
installs the libraries into/usr/lib/xorg/modules/extensions/libglxserver_nvidia.so.530.41.03
and/usr/lib/xorg/modules/drivers/nvidia_drv.so
, as specified in https://download.nvidia.com/XFree86/Linux-x86_64/550.78/README/installedcomponents.html.Please edit the PR as necessary.
Few things of note:
libnvidia-vulkan-producer.so
is required before r550 but is removed with r550.