Gap
Docker Model Runner on Windows Docker Desktop defaults to llama.cpp latest-cpu even with an RTX 5090 and InferenceCanUseGPUVariant=True in settings.
To switch to CUDA backend: Docker Desktop → Settings → AI → Enable two toggles:
- Enable GPU-backed inference
- Enable host-side TCP
Backend auto-swaps to llama.cpp latest-cuda after toggling. But these toggles exist only in the GUI — docker desktop enable model-runner --gpu cuda doesn't exist in the Windows CLI (v0.3.0), and docker model install-runner --gpu cuda is rejected with: "Standalone installation not supported with Docker Desktop."
Impact
- Carl on Windows can't get CUDA inference without manual GUI interaction
- install.sh can't automate the toggle
- install.sh can detect the gap (check docker model status for "cpu" vs "cuda" backend) and warn the user with instructions
Verified (BigMama, 2026-04-17)
- Before toggle: 32 tok/s (CPU)
- After toggle: 237 tok/s (CUDA) — 7.4x improvement
Workaround for install.sh
Detect CPU backend + NVIDIA GPU → print: "GPU detected but Docker Model Runner using CPU backend. Open Docker Desktop → Settings → AI → enable GPU-backed inference."
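The detection logic above can be sketched for install.sh roughly as follows. This is a minimal POSIX sh sketch, not a tested implementation: it assumes the output of docker model status contains the backend name ("cpu" vs "cuda"), which should be verified against the Model Runner version actually shipped; the helper function name is hypothetical.

```shell
#!/usr/bin/env sh
# Sketch for install.sh: warn when an NVIDIA GPU is present but Docker Model
# Runner reports a CPU backend. ASSUMPTION: `docker model status` output
# mentions the backend name ("cpu" vs "cuda") — verify on your version.

# Pure helper: decide whether to warn, given the status text and GPU presence.
needs_gpu_warning() {
  status_text="$1"   # output of `docker model status`
  has_nvidia="$2"    # "yes" if nvidia-smi ran successfully
  case "$status_text" in
    *cuda*) return 1 ;;          # already on the CUDA backend, no warning
  esac
  [ "$has_nvidia" = "yes" ]      # warn only if an NVIDIA GPU is present
}

# Detection (only meaningful on a machine with Docker Desktop installed):
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
  gpu="yes"
else
  gpu="no"
fi
status="$(docker model status 2>/dev/null || true)"

if needs_gpu_warning "$status" "$gpu"; then
  echo "GPU detected but Docker Model Runner using CPU backend."
  echo "Open Docker Desktop → Settings → AI → enable GPU-backed inference."
fi
```

Since the toggle itself can't be automated (GUI-only, per the gap above), printing the instruction and continuing is the most install.sh can do.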