🧩 Summary
I'm running a single-node k3s cluster on a workstation with an NVIDIA RTX 5080 GPU using the open-source drivers (nvidia-open) on Arch Linux. I want to run vLLM, Whisper, and other CUDA-dependent workloads through Kubernetes with full GPU acceleration.
The key challenge:
📛 nvidia-container-runtime does not support nvidia-open, but nvidia-container-toolkit does detect the GPU and CUDA correctly.
🛠 System Setup
OS: Arch Linux, kernel 6.x
GPU: NVIDIA GeForce RTX 5080
Driver: nvidia-open (NOT proprietary)
CUDA: 12.8 (confirmed via nvidia-container-cli info)
k3s: latest (uses containerd internally)
containerd: working, default config generated and customized
NVIDIA container toolkit: installed and working
No nvidia runtime detected via ctr plugins ls (verification commands below)
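For reproducibility, here are roughly the checks behind the claims above (paths assume a default k3s install; k3s runs its own embedded containerd, so the system socket won't show its state):

```bash
# Toolkit-level check: confirms the GPU and CUDA 12.8 are visible
# to libnvidia-container, independent of any container runtime.
nvidia-container-cli info

# k3s ships an embedded containerd; query it via the k3s socket,
# not /run/containerd/containerd.sock.
sudo ctr --address /run/k3s/containerd/containerd.sock plugins ls | grep -i nvidia

# Note: CRI runtime handlers defined in config.toml may not appear in
# `ctr plugins ls` at all; `crictl info` dumps the rendered CRI config,
# including the registered runtimes.
sudo crictl --runtime-endpoint unix:///run/k3s/containerd/containerd.sock info
```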
✅ What Works
nvidia-smi works inside privileged containers (via Docker, or a pod with hostPath mounts + privileged: true)
Running LLMs via Docker with --gpus all (the Compose equivalent is sketched after this list)
Custom edits to config.toml.tmpl in k3s, confirmed preserved across restarts
/dev/nvidia* and /proc/driver/nvidia are visible inside privileged pods
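A minimal sketch of the working Docker path (image tags are illustrative, not pinned; any CUDA-enabled image behaves the same):

```bash
# Sanity check with a plain CUDA image (tag is an example):
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu24.04 nvidia-smi

# The Compose-file equivalent of `docker run --gpus all` is a
# device reservation:
cat <<'EOF' > docker-compose.yml
services:
  vllm:
    image: vllm/vllm-openai:latest   # illustrative image
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
EOF
docker compose up -d
```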
❌ What Fails
nvidia-container-runtime does not register with containerd
nvidia-device-plugin DaemonSet fails with "network plugin not initialized" (after the config changes)
No nvidia.com/gpu resource exposed to Kubernetes
ctr plugins ls | grep nvidia → empty (runtime registration sketch below)
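For concreteness, this is the shape of the registration I've been attempting — a sketch assuming containerd's v2 config syntax and the default k3s paths, not a confirmed-working recipe (nvidia-ctk runtime configure --runtime=containerd can also generate the runtime block against a given --config path):

```bash
# k3s renders its containerd config from this template on startup.
# Append an nvidia runtime handler (containerd config v2 syntax):
sudo tee -a /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl <<'EOF'

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
  runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
  BinaryName = "/usr/bin/nvidia-container-runtime"
EOF
sudo systemctl restart k3s

# Expose the handler to Kubernetes via a RuntimeClass:
kubectl apply -f - <<'EOF'
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
EOF
```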
🤔 Questions to the Community
Is it currently possible to use nvidia-open drivers with k3s + containerd + Kubernetes GPU support in any official or semi-official way?
Has anyone successfully registered a GPU runtime with containerd using nvidia-open?
Would you recommend sticking with Docker Compose + --gpus all until nvidia-container-runtime gains support for nvidia-open?
Is there an official roadmap for nvidia-container-runtime to support nvidia-open?
Any known working workarounds other than privileged pods and hostPath volumes? (One candidate is sketched below.)
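On question 5: the workaround I'd expect to try next, once a runtime handler registers, is pinning the device plugin to the nvidia RuntimeClass above. A sketch based on the k8s-device-plugin Helm chart (repo URL and chart values as documented upstream; treat them as assumptions):

```bash
# Deploy the NVIDIA device plugin bound to the nvidia RuntimeClass:
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm install nvidia-device-plugin nvdp/nvidia-device-plugin \
  --namespace kube-system \
  --set runtimeClassName=nvidia

# If it comes up, the node should advertise nvidia.com/gpu:
kubectl describe node | grep -A2 'nvidia.com/gpu'
```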
🔧 Additional Context
We're developing a local GPU inference platform (LLMs, Whisper, image models) using Portainer and Rancher, and exploring Kubernetes orchestration for the long term. For now, we're considering falling back to Docker Compose to proceed, unless there's a viable way to unlock GPU scheduling in k3s with the open drivers.
💬 Would Appreciate
Community confirmations / working examples
Links to related PRs or tracking issues
Recommendations for better architecture
Thanks!