
Failed to get sandbox runtime: no runtime for "nvidia" is configured #349

@k8s-gpubuilder-markus

Description


```
kubelet[1311]: E0505 18:42:19.812688 1311 pod_workers.go:949] "Error syncing pod, skipping" err="failed to "CreatePodSandbox" for "nvidia-device-plugin-daemonset-h9kxw_gpu-operator(8e491f32-4d92-4657-bbef-abf9d3dcb6ab)" with CreatePodSandboxError: "Failed to create sandbox for pod \"nvidia-device-plugin-daemonset-h9kxw_gpu-operator(8e491f32-4d92-4657-bbef-abf9d3dcb6ab)\": rpc error: code = Unknown desc = failed to get sandbox runtime: no runtime for \"nvidia\" is configured"" pod="gpu-operator/nvidia-device-plugin-daemonset-h9kxw" podUID=8e491f32-4d92-4657-bbef-abf9d3dcb6ab
May 05 18:42:19 hawkeye kubelet[1311]: E0505 18:42:19.812950 1311 remote_runtime.go:209] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to get sandbox
```

This error repeats for all of the GPU Operator's pods/daemonsets.
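For reference, a quick check I can run on the GPU worker node, assuming the default containerd config path (`/etc/containerd/config.toml`), to see whether an `nvidia` runtime was ever registered there:

```bash
# Does containerd know about an "nvidia" runtime at all?
# (default config path assumed)
grep -n -A 3 'runtimes.nvidia' /etc/containerd/config.toml

# If the section is missing, the operator's container-toolkit daemonset has not
# (yet) rewritten the config; containerd needs a restart after it does:
sudo systemctl restart containerd
```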

I'm installing the gpu-operator via Helm:

```bash
helm install --wait gpu-operator -n gpu-operator --create-namespace nvidia/gpu-operator --set driver.enabled=true --set toolkit.enabled=true
```

The GPU Operator version is v1.10.1, and all nodes run Ubuntu 20.04.

There is no driver or container toolkit preinstalled on the GPU worker node; the GPU Operator is expected to handle all of that.
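In case it helps, these are the commands I use to check the operator's pods and the events of the failing pod (pod name taken from the kubelet log above):

```bash
# List the operator's pods; the container-toolkit pod has to be Running on the
# GPU node before any daemonset that requests the "nvidia" runtime can start:
kubectl get pods -n gpu-operator -o wide

# Events for the failing device-plugin pod from the log above:
kubectl describe pod -n gpu-operator nvidia-device-plugin-daemonset-h9kxw
```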
