Description
kubelet[1311]: E0505 18:42:19.812688 1311 pod_workers.go:949] "Error syncing pod, skipping" err="failed to "CreatePodSandbox" for "nvidia-device-plugin-daemonset-h9kxw_gpu-operator(8e491f32-4d92-4657-bbef-abf9d3dcb6ab)" with CreatePodSandboxError: "Failed to create sandbox for pod \"nvidia-device-plugin-daemonset-h9kxw_gpu-operator(8e491f32-4d92-4657-bbef-abf9d3dcb6ab)\": rpc error: code = Unknown desc = failed to get sandbox runtime: no runtime for \"nvidia\" is configured"" pod="gpu-operator/nvidia-device-plugin-daemonset-h9kxw" podUID=8e491f32-4d92-4657-bbef-abf9d3dcb6ab
May 05 18:42:19 hawkeye kubelet[1311]: E0505 18:42:19.812950 1311 remote_runtime.go:209] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to get sandbox
This error repeats for all of the pods/daemonsets created by the gpu-operator.
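From what I can tell, this error means containerd has no "nvidia" runtime handler registered at the time the pod sandbox is created, i.e. the container-toolkit has not (yet) updated the containerd config on that node. A quick check on the worker node (assuming containerd uses the default config path /etc/containerd/config.toml; the exact section names are whatever the toolkit writes):

# Look for an nvidia runtime entry in the containerd config (default path assumed)
sudo grep -B2 -A4 'nvidia' /etc/containerd/config.toml

# Verify containerd was restarted/reloaded after any config change
sudo systemctl status containerd --no-pager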
I'm installing the gpu-operator via Helm:
helm install --wait gpu-operator -n gpu-operator --create-namespace nvidia/gpu-operator --set driver.enabled=true --set toolkit.enabled=true
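The state of the operand pods can be inspected with standard kubectl commands, for example:

kubectl -n gpu-operator get pods -o wide
kubectl -n gpu-operator get daemonsets
# Events on a failing pod show the CreatePodSandboxError above
kubectl -n gpu-operator describe pod nvidia-device-plugin-daemonset-h9kxw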
The gpu-operator version is v1.10.1, and all nodes are running Ubuntu 20.04.
There is no driver or container toolkit installed on the worker node with the GPUs; the gpu-operator is expected to handle all of that.
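Since the chart is installed with driver.enabled=true and toolkit.enabled=true, my expectation is that the driver and toolkit daemonsets set everything up on the node themselves. To confirm nothing is pre-installed on the host, I would check (plain shell, nothing operator-specific assumed):

command -v nvidia-smi || echo "no host NVIDIA driver found"
command -v nvidia-container-runtime || echo "no host NVIDIA container toolkit found"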