Support for openSUSE Tumbleweed #110

Closed
kilian-hu opened this issue Sep 14, 2023 · 15 comments · Fixed by NVIDIA/libnvidia-container#238

Comments

@kilian-hu

Hi,

I'm using openSUSE Tumbleweed and tried to follow the installation guide with zypper but encountered this error: Problem: nothing provides 'libseccomp' needed by the to be installed nvidia-container-toolkit-1.14.1-1.x86_64.

I found that in openSUSE Tumbleweed the package is actually called libseccomp2. Would it be possible to adjust the repo such that the installation on openSUSE Tumbleweed works? Any help is highly appreciated :)

@ghost

ghost commented Sep 17, 2023

I wish I could provide some help, but I am suffering from the same problem. I wish I could tell zypper to satisfy the libseccomp dependency with libseccomp2, since that is the package that actually provides it.

Have a great day! And thank you!

@elezar
Member

elezar commented Sep 18, 2023

@kilian-hu @pmrj33 would creating a dummy libseccomp package with something like fpm be an option? We use this to test our packages that have a dependency on docker without actually installing docker explicitly.

For example:

RUN gem install --no-document fpm
# We create and install a dummy docker package since these dependencies are out of
# scope for the tests performed here.
RUN fpm -s empty \
    -t rpm \
    --description "A dummy package for docker-ce_18.06.3.ce-3.el7" \
    -n docker-ce --version 18.06.3.ce-3.el7 \
    -p /tmp/docker.rpm \
    && \
    yum localinstall -y /tmp/docker.rpm \
    && \
    rm -f /tmp/docker.rpm

One could also use fpm to rewrite the dependency to libseccomp2 in your case.
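
Adapted to this case, a dummy libseccomp package could look something like this (a sketch only, not a tested recipe; the version string is arbitrary and merely has to satisfy zypper's dependency check):

gem install --no-document fpm
# Build an empty RPM that merely claims to provide libseccomp.
fpm -s empty \
    -t rpm \
    --description "Dummy libseccomp package for openSUSE Tumbleweed" \
    -n libseccomp --version 2.5.0 \
    -p /tmp/libseccomp-dummy.rpm
# Install the dummy package, then retry the toolkit installation.
sudo zypper --no-gpg-checks install /tmp/libseccomp-dummy.rpm
sudo zypper install nvidia-container-toolkit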

@ghost

ghost commented Sep 18, 2023

I ended up building the nvidia-container-toolkit package (repo) from GitHub with the target opensuse-leap15.1-x86_64 and creating a local repo for nvidia-container-toolkit, so the install on the system could be handled by zypper.

I only just installed it and didn't hit any errors. I also ran this simple example on podman: podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi, and it executed perfectly.

@elezar
Member

elezar commented Sep 18, 2023

@pmrj33 note that only the libnvidia-container* packages have a libseccomp dependency, so it should only be required to build these from source.

Another note: If you're only looking to use CDI (which your example suggests), then only the nvidia-container-toolkit-base package is required. This should have no dependency on libnvidia-container-tools and would remove the need to build things from source.
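
A CDI-only setup along those lines might look like this (a sketch; nvidia-ctk cdi generate ships with the base package, and the podman invocation matches the example above):

sudo zypper install nvidia-container-toolkit-base
# Generate a CDI specification describing the GPUs on this host.
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
# Devices can then be requested by their CDI name:
podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi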

@ghost

ghost commented Sep 18, 2023

Alright, very good point. I am still learning how to use podman/docker with NVIDIA and the exact definitions of the terms. Indeed, I see I bypassed the problem with an extreme solution. But while it was building the packages I noticed it fetched libseccomp2 instead of libseccomp; that's why I got away with it.

Again, thank you for your help. I had done the build before reading your post, @elezar, and thank you for the solution.

@kilian-hu
Author

kilian-hu commented Sep 20, 2023

Thank you very much for the help @elezar :)
Installing a dummy libseccomp package built with fpm did the trick.
It works now, so feel free to close the issue.

@benjaminsabatini

This seems unfixed. Is there a solution other than installing a dummy package?

@elezar
Member

elezar commented Dec 15, 2023

Hi all. I am working on a solution that involves adding a virtual package that provides libseccomp.so while depending on libseccomp2. This would mean that an additional package would need to be installed on SUSE systems, but would not require a dummy package. See https://gitlab.com/nvidia/container-toolkit/libnvidia-container/-/merge_requests/238

Would this be workable?
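
In RPM spec terms, the mechanism would be roughly the following (a hypothetical fragment for illustration only; the package name here is made up, and the actual implementation is in the merge request linked above):

# A bridging package: it provides the "libseccomp" capability that the
# toolkit packages require, while pulling in openSUSE's real libseccomp2.
# The name below is illustrative, not the real one.
Name:     libseccomp-bridge
Summary:  Maps the libseccomp dependency to libseccomp2 on SUSE systems
Provides: libseccomp
Requires: libseccomp2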

@rajinder-yadav

@elezar any idea when your solution will be available?

@bubbleguuum

bubbleguuum commented Jan 2, 2024

I found out that when installing the nvidia-container-toolkit package you can ignore the missing libseccomp dependency error and force the install (see the sketch after the ldd output below).

Only /usr/lib64/libnvidia-container.so.1 (in package libnvidia-container1) and /usr/bin/nvidia-container-cli (in package libnvidia-container-tools) depend on libseccomp, and they find it just fine:

ldd /usr/lib64/libnvidia-container.so.1
        ...
        libseccomp.so.2 => /lib64/libseccomp.so.2 (0x00007f5f4b998000)
        ...
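
For reference, the force install itself can be done by skipping the dependency check, for example (a sketch; zypper also offers an interactive "break dependencies" solution when it reports the conflict):

# Install the downloaded package while ignoring the unresolvable
# libseccomp dependency (the library is found at runtime, as shown above).
sudo rpm -ivh --nodeps nvidia-container-toolkit-1.14.1-1.x86_64.rpm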

Speaking of which, one has to wonder why that libseccomp dependency is on the nvidia-container-toolkit package rather than on the two other packages mentioned, which contain the binaries that actually depend on libseccomp.

However, once I got the nvidia-container-toolkit package force-installed, I had to modify /etc/nvidia-container-runtime/config.toml and add a user = "root:video" line for docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi not to fail with the following error (note: using sudo does not help):

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: insufficient permissions: unknown.
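
For reference, the change in question as an excerpt of /etc/nvidia-container-runtime/config.toml (in the default file the key is present but commented out, as noted further below):

[nvidia-container-cli]
# Uncommented and set so the CLI runs with these credentials; root:root works too.
user = "root:video"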

@elezar
Member

elezar commented Jan 8, 2024

@bubbleguuum thanks for your latest reply. With regards to:

Speaking of which, one has to wonder why that libseccomp dependency is on the nvidia-container-toolkit package rather than on the two other packages mentioned, which contain the binaries that actually depend on libseccomp.

This is unexpected. I assume that this is a legacy change that has made it into this package and was never an issue until we started rolling out our "unified" packages. I think we can definitely drop the libseccomp dependency entirely for the nvidia-container-toolkit package. I will be sure to include that in the upcoming release.

With regards to:

I had to modify /etc/nvidia-container-runtime/config.toml and add a user = "root:video" line for docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi not to fail (note: using sudo does not help)

What does:

nvidia-ctk config

output on your system?

@elezar
Member

elezar commented Jan 8, 2024

@rajinder-yadav if we can get consensus that this is a reasonable workaround (as opposed to force-installing the libnvidia-container* packages), then it would be included in the upcoming v1.15.0 release.

@bubbleguuum

bubbleguuum commented Jan 8, 2024

@elezar

Default output of nvidia-ctk config (after package installation):

#accept-nvidia-visible-devices-as-volume-mounts = false
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"
#swarm-resource = "DOCKER_RESOURCE_GPU"

[nvidia-container-cli]
#debug = "/var/log/nvidia-container-toolkit.log"
environment = []
#ldcache = "/etc/ld.so.cache"
ldconfig = "@/sbin/ldconfig"
load-kmods = true
#no-cgroups = false
#path = "/usr/bin/nvidia-container-cli"
#root = "/run/nvidia/driver"

[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
log-level = "info"
mode = "auto"
runtimes = ["docker-runc", "runc"]

[nvidia-container-runtime.modes]

[nvidia-container-runtime.modes.cdi]
annotation-prefixes = ["cdi.k8s.io/"]
default-kind = "nvidia.com/gpu"
spec-dirs = ["/etc/cdi", "/var/run/cdi"]

[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

[nvidia-container-runtime-hook]
path = "nvidia-container-runtime-hook"
skip-mode-detection = false

[nvidia-ctk]
path = "nvidia-ctk"

Notice that there is no user line. If I open /etc/nvidia-container-runtime/config.toml I see a commented-out empty user line: #user = "". As I mentioned, I have to add the line user = "root:video" to fix that permission error (root:root works as well), after which that line of course appears in the listing above.

@elezar
Member

elezar commented Jan 9, 2024

Thanks for the feedback @bubbleguuum. The logic added to detect a SUSE-based system for the generation of the default config was not working correctly. I have created https://gitlab.com/nvidia/container-toolkit/container-toolkit/-/merge_requests/532 to address this.

@rajinder-yadav

Hello, I can confirm the fix made by @elezar is working, thank you. 🙏
I can now run Ollama locally using a Docker container and it's blazing fast! ⚡
