Closed
Hello,
I'm trying to install the NVIDIA Helm chart in a k0s cluster:
[root@miriam ~]# k0s version
v1.31.2+k0s.0
[root@miriam ~]# /var/lib/k0s/bin/containerd -v
containerd github.com/containerd/containerd 1.7.22 7f7fdf5fed64eb6a7caf99b3e12efcf9d60e311c
[root@miriam ~]#
As described at https://docs.k0sproject.io/v1.31.2+k0s.0/runtime/#using-nvidia-container-runtime, I've set the following Helm chart options for nvidia-container-toolkit:
toolkit:
  enabled: true
  repository: nvcr.io/nvidia/k8s
  image: container-toolkit
  version: v1.17.2-ubuntu20.04
  imagePullPolicy: IfNotPresent
  imagePullSecrets: []
  env:
    - name: CONTAINERD_CONFIG
      value: "/etc/k0s/containerd.d/nvidia.toml"
    - name: CONTAINERD_SOCKET
      value: "/run/k0s/containerd.sock"
    - name: CONTAINERD_RUNTIME_CLASS
      value: "nvidia"
    - name: CONTAINERD_SET_AS_DEFAULT
      value: "false"
    - name: CONTAINERD_USE_LEGACY_CONFIG
      value: "true"
  resources: {}
  installDir: "/usr/local/nvidia"
I added CONTAINERD_USE_LEGACY_CONFIG after reading issue #777, because the approach recommended by k0s did not work.
Either way, this is what I get when I run it:
IS_HOST_DRIVER=true
NVIDIA_DRIVER_ROOT=/
DRIVER_ROOT_CTR_PATH=/host
NVIDIA_DEV_ROOT=/
DEV_ROOT_CTR_PATH=/host
time="2024-11-16T02:28:31Z" level=info msg="Parsing arguments"
time="2024-11-16T02:28:31Z" level=info msg="Starting nvidia-toolkit"
time="2024-11-16T02:28:31Z" level=info msg="disabling device node creation since --cdi-enabled=false"
time="2024-11-16T02:28:31Z" level=info msg="Verifying Flags"
time="2024-11-16T02:28:31Z" level=info msg=Initializing
time="2024-11-16T02:28:31Z" level=info msg="Shutting Down"
time="2024-11-16T02:28:31Z" level=error msg="error running nvidia-toolkit: unable to determine runtime options: unable to load containerd config: unsupported config version: 3"
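For anyone hitting the same "unsupported config version" error, a quick way to narrow it down is to check which containerd config file actually declares `version = 3`. The following is only a minimal sketch: the helper name `config_version` is hypothetical, and the paths in the loop are assumptions based on the k0s locations mentioned above (`/etc/k0s/containerd.toml` is assumed to be the main k0s containerd config).

```shell
#!/bin/sh
# Hypothetical helper: print the top-level `version` value of a containerd
# TOML config file. Parsing stops at the first table header, because in TOML
# any key that follows a [table] line belongs to that table, not the top level.
config_version() {
  awk '
    /^[ \t]*\[/ { exit }
    /^[ \t]*version[ \t]*=/ {
      sub(/^[^=]*=[ \t]*/, "")     # drop the key and the "="
      sub(/[ \t]*(#.*)?$/, "")     # drop trailing whitespace or a comment
      print
      exit
    }
  ' "$1"
}

# Assumed locations; adjust to wherever your k0s install keeps its configs.
for f in /etc/k0s/containerd.toml /etc/k0s/containerd.d/*.toml; do
  if [ -f "$f" ]; then
    printf '%s: version=%s\n' "$f" "$(config_version "$f")"
  fi
done
```

An empty value after `version=` means the file declares no top-level version. Comparing the reported numbers against the one in the error message should show which file the toolkit is failing to load.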
After checking some of the source code, my guess is that legacyConfig does not get the proper version when creating the file, but I haven't read the source closely enough to be sure (I'm also not a strong Go coder).
That said, what could be preventing nvidia-toolkit from creating the legacy config properly in the specified folder?
Thanks!