Bug Report: Docker Model Runner Does Not Use Intel Arc GPU via Vulkan on Windows
Summary
Docker Model Runner (DMR) falls back to CPU-only inference on Windows 11 when an Intel Arc GPU is
present, despite:
ggml-vulkan.dll being present in DMR's inference binary directory
- Vulkan being fully functional at the system level (
vulkaninfo detects the GPU)
- llama.cpp's own
--list-devices correctly enumerating the Arc GPU as Vulkan0
Environment
| Property |
Value |
| OS |
Windows 11 |
| Docker Desktop |
4.74.0 (227015) |
| Docker Engine |
29.4.3 |
| GPU |
Intel Arc Graphics (integrated, 37032 MiB shared memory) |
| Intel GPU Driver |
32.0.101.8801 (WHQL, released 2026-05-15) |
| Vulkan Loader |
vulkan-1.dll present in C:\Windows\System32 (dated 2026-05-15) |
Steps to Reproduce
- Install Docker Desktop 4.74.0 on Windows 11 with an Intel Arc GPU
- Enable Docker Model Runner in Docker Desktop settings
- Pull and run a model:
docker model pull ai/smollm2
docker model run --debug ai/smollm2 "Test"
- Check logs:
docker model logs ai/smollm2
Expected Behavior
Docker Model Runner should detect the Intel Arc GPU via the Vulkan backend and use it for inference,
printing something like:
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) Graphics (Intel Corporation) | uma: 1 | fp16: 1 ...
load_backend: loaded Vulkan backend from C:\Users\<user>\.docker\bin\inference\ggml-vulkan.dll
Actual Behavior
The inference server loads only the CPU backend. The --debug flag and docker model logs produce
no output referencing Vulkan, GPU, or backend loading. Inference runs entirely on CPU.
Diagnostic Evidence
1. ggml-vulkan.dll is present in DMR's inference directory
C:\Users\username\.docker\bin\inference\ggml-vulkan.dll (62,117,416 bytes, dated 2026-04-23)
Full contents of C:\Users\username\.docker\bin\inference\:
.llamacpp_version
com.docker.llama-server.exe
com.docker.nv-gpu-info.exe
ggml-base.dll
ggml-cpu-alderlake.dll
ggml-cpu-cannonlake.dll
ggml-cpu-cascadelake.dll
ggml-cpu-cooperlake.dll
ggml-cpu-haswell.dll
ggml-cpu-icelake.dll
ggml-cpu-ivybridge.dll
ggml-cpu-piledriver.dll
ggml-cpu-sandybridge.dll
ggml-cpu-sapphirerapids.dll
ggml-cpu-skylakex.dll
ggml-cpu-sse42.dll
ggml-cpu-x64.dll
ggml-cpu-zen4.dll
ggml-vulkan.dll
ggml.dll
llama-common.dll
llama-server.exe
llama.dll
mtmd.dll
2. Vulkan is functional at the system level
vulkaninfo.exe (from C:\Windows\System32) correctly detects the Arc GPU:
GPU id : 0 (Intel(R) Arc(TM) Graphics) [VK_KHR_win32_surface]
deviceID = 0x7d55
deviceType = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
deviceName = Intel(R) Arc(TM) Graphics
Intel Vulkan ICD files are present in the Windows Driver Store:
C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_407cbfa5003213b4\igvk64.dll
C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_407cbfa5003213b4\igvk64.json
C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_4d11ccb5a32fb9d9\igvk64.dll
C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_4d11ccb5a32fb9d9\igvk64.json
C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_ba8450f1e628107e\igvk64.dll
C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_ba8450f1e628107e\igvk64.json
Note: The registry key HKLM:\SOFTWARE\Khronos\Vulkan\Drivers is absent, as Windows DCH drivers use
DriverStore-based ICD discovery instead of registry-based registration. The Vulkan loader finds the
ICD correctly via this mechanism, as confirmed by vulkaninfo.
3. Standalone llama.cpp correctly enumerates the GPU
Using the standalone llama.cpp Vulkan build (llama-b9247-bin-win-vulkan-x64):
.\llama-cli.exe --list-devices
Available devices:
Vulkan0: Intel(R) Arc(TM) Graphics (37032 MiB, 36264 MiB free)
This confirms that llama.cpp itself has no issue detecting the GPU — the failure is specific to how
Docker Model Runner invokes or initializes the inference backend on Windows.
Hypothesis
Docker Model Runner's inference server on Windows may be loading ggml-vulkan.dll in a context
(working directory, DLL search path, or process environment) that prevents it from initializing the
Vulkan loader correctly. The Vulkan loader (vulkan-1.dll) relies on specific search paths or
environment variables to find ICD manifests, and a mismatch between the process environment used by
DMR's server and the one used by a standalone llama.cpp process could cause silent Vulkan
initialization failure, resulting in CPU-only fallback.
A related issue was previously reported in #305, where Vulkan support intermittently dropped after a
Docker Desktop update.
Workaround
Use the standalone llama.cpp server with the Vulkan build directly. It exposes an OpenAI-compatible
API and correctly uses the Intel Arc GPU:
.\llama-server.exe -m <model.gguf> --port 8080 -ngl 99
Request
Please investigate why com.docker.llama-server.exe (or llama-server.exe) in
~\.docker\bin\inference\ fails to initialize the Vulkan backend on Windows when the same llama.cpp
binary distributed separately works correctly. A possible fix may involve ensuring the inference
server process inherits the correct DLL search path or Vulkan layer environment variables when
launched by Docker Desktop on Windows.
Bug Report: Docker Model Runner Does Not Use Intel Arc GPU via Vulkan on Windows
Summary
Docker Model Runner (DMR) falls back to CPU-only inference on Windows 11 when an Intel Arc GPU is
present, despite:
ggml-vulkan.dllbeing present in DMR's inference binary directoryvulkaninfodetects the GPU)--list-devicescorrectly enumerating the Arc GPU asVulkan0Environment
vulkan-1.dllpresent inC:\Windows\System32(dated 2026-05-15)Steps to Reproduce
Expected Behavior
Docker Model Runner should detect the Intel Arc GPU via the Vulkan backend and use it for inference,
printing something like:
Actual Behavior
The inference server loads only the CPU backend. The
--debugflag anddocker model logsproduceno output referencing Vulkan, GPU, or backend loading. Inference runs entirely on CPU.
Diagnostic Evidence
1.
ggml-vulkan.dllis present in DMR's inference directoryFull contents of
C:\Users\username\.docker\bin\inference\:2. Vulkan is functional at the system level
vulkaninfo.exe(fromC:\Windows\System32) correctly detects the Arc GPU:Intel Vulkan ICD files are present in the Windows Driver Store:
Note: The registry key
HKLM:\SOFTWARE\Khronos\Vulkan\Driversis absent, as Windows DCH drivers useDriverStore-based ICD discovery instead of registry-based registration. The Vulkan loader finds the
ICD correctly via this mechanism, as confirmed by
vulkaninfo.3. Standalone llama.cpp correctly enumerates the GPU
Using the standalone llama.cpp Vulkan build (
llama-b9247-bin-win-vulkan-x64):This confirms that llama.cpp itself has no issue detecting the GPU — the failure is specific to how
Docker Model Runner invokes or initializes the inference backend on Windows.
Hypothesis
Docker Model Runner's inference server on Windows may be loading
ggml-vulkan.dllin a context(working directory, DLL search path, or process environment) that prevents it from initializing the
Vulkan loader correctly. The Vulkan loader (
vulkan-1.dll) relies on specific search paths orenvironment variables to find ICD manifests, and a mismatch between the process environment used by
DMR's server and the one used by a standalone llama.cpp process could cause silent Vulkan
initialization failure, resulting in CPU-only fallback.
A related issue was previously reported in #305, where Vulkan support intermittently dropped after a
Docker Desktop update.
Workaround
Use the standalone llama.cpp server with the Vulkan build directly. It exposes an OpenAI-compatible
API and correctly uses the Intel Arc GPU:
Request
Please investigate why
com.docker.llama-server.exe(orllama-server.exe) in~\.docker\bin\inference\fails to initialize the Vulkan backend on Windows when the same llama.cppbinary distributed separately works correctly. A possible fix may involve ensuring the inference
server process inherits the correct DLL search path or Vulkan layer environment variables when
launched by Docker Desktop on Windows.