container-toolkit on k0s leads to unsupported config version: 3 #803

@leleobhz

Hello,

I'm trying to install the NVIDIA Helm chart in a k0s cluster:

[root@miriam ~]# k0s version
v1.31.2+k0s.0
[root@miriam ~]# /var/lib/k0s/bin/containerd -v
containerd github.com/containerd/containerd 1.7.22 7f7fdf5fed64eb6a7caf99b3e12efcf9d60e311c
[root@miriam ~]#

Following https://docs.k0sproject.io/v1.31.2+k0s.0/runtime/#using-nvidia-container-runtime, I've set the following Helm chart options for nvidia-container-toolkit:

toolkit:
  enabled: true
  repository: nvcr.io/nvidia/k8s
  image: container-toolkit
  version: v1.17.2-ubuntu20.04
  imagePullPolicy: IfNotPresent
  imagePullSecrets: []
  env:
    - name: CONTAINERD_CONFIG
      value: "/etc/k0s/containerd.d/nvidia.toml"
    - name: CONTAINERD_SOCKET
      value: "/run/k0s/containerd.sock"
    - name: CONTAINERD_RUNTIME_CLASS
      value: "nvidia"
    - name: CONTAINERD_SET_AS_DEFAULT
      value: "false"
    - name: CONTAINERD_USE_LEGACY_CONFIG
      value: "true"
  resources: {}
  installDir: "/usr/local/nvidia"

I added CONTAINERD_USE_LEGACY_CONFIG after reading issue #777, as an attempt to work around the problem once the approach recommended by k0s did not work.

Either way, this is what I get when it runs:

IS_HOST_DRIVER=true
NVIDIA_DRIVER_ROOT=/
DRIVER_ROOT_CTR_PATH=/host
NVIDIA_DEV_ROOT=/
DEV_ROOT_CTR_PATH=/host
time="2024-11-16T02:28:31Z" level=info msg="Parsing arguments"
time="2024-11-16T02:28:31Z" level=info msg="Starting nvidia-toolkit"
time="2024-11-16T02:28:31Z" level=info msg="disabling device node creation since --cdi-enabled=false"
time="2024-11-16T02:28:31Z" level=info msg="Verifying Flags"
time="2024-11-16T02:28:31Z" level=info msg=Initializing
time="2024-11-16T02:28:31Z" level=info msg="Shutting Down"
time="2024-11-16T02:28:31Z" level=error msg="error running nvidia-toolkit: unable to determine runtime options: unable to load containerd config: unsupported config version: 3"
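
One way to narrow down which file declares version 3 is to print the top-level `version` key of each containerd config file involved. The following is a minimal Python sketch, not code from the toolkit: the helper name `top_level_config_version` is hypothetical, `/etc/k0s/containerd.toml` is assumed to be the main k0s containerd config path, and the scan is deliberately simplistic (it only looks at keys that appear before the first table header).

```python
from pathlib import Path
from typing import Optional


def top_level_config_version(toml_text: str) -> Optional[int]:
    """Return the top-level `version` value of a containerd TOML config, or None if absent."""
    for raw in toml_text.splitlines():
        # Drop trailing comments and surrounding whitespace (naive, but enough for `version = N`).
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("["):
            # The first table header ends the top-level keys.
            return None
        key, sep, value = line.partition("=")
        if sep and key.strip() == "version":
            return int(value.strip())
    return None


if __name__ == "__main__":
    # Assumed locations: the first path is a guess at the k0s default,
    # the second is the CONTAINERD_CONFIG value from the Helm values above.
    for candidate in ("/etc/k0s/containerd.toml", "/etc/k0s/containerd.d/nvidia.toml"):
        path = Path(candidate)
        if path.is_file():
            print(candidate, "->", top_level_config_version(path.read_text()))
        else:
            print(candidate, "-> not found")
```

Run on the affected node, this would show whether either file carries `version = 3`; if neither does, the version is presumably coming from somewhere other than these two files.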

After looking at some of the source code, my guess is that legacyConfig does not get a proper version when the file is created, but I haven't read the source closely enough to be sure (I'm also not much of a Go programmer).

That said, what could be preventing nvidia-toolkit from creating the legacyConfig properly in the specified folder?

Thanks!
