Skip to content

Bug Report: Docker Model Runner Does Not Use Intel Arc GPU via Vulkan on Windows #925

@darthcav

Description

@darthcav

Bug Report: Docker Model Runner Does Not Use Intel Arc GPU via Vulkan on Windows

Summary

Docker Model Runner (DMR) falls back to CPU-only inference on Windows 11 when an Intel Arc GPU is
present, despite:

  • ggml-vulkan.dll being present in DMR's inference binary directory
  • Vulkan being fully functional at the system level (vulkaninfo detects the GPU)
  • llama.cpp's own --list-devices correctly enumerating the Arc GPU as Vulkan0

Environment

Property Value
OS Windows 11
Docker Desktop 4.74.0 (227015)
Docker Engine 29.4.3
GPU Intel Arc Graphics (integrated, 37032 MiB shared memory)
Intel GPU Driver 32.0.101.8801 (WHQL, released 2026-05-15)
Vulkan Loader vulkan-1.dll present in C:\Windows\System32 (dated 2026-05-15)

Steps to Reproduce

  1. Install Docker Desktop 4.74.0 on Windows 11 with an Intel Arc GPU
  2. Enable Docker Model Runner in Docker Desktop settings
  3. Pull and run a model:
    docker model pull ai/smollm2
    docker model run --debug ai/smollm2 "Test"
    
  4. Check logs:
    docker model logs ai/smollm2
    

Expected Behavior

Docker Model Runner should detect the Intel Arc GPU via the Vulkan backend and use it for inference,
printing something like:

ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) Graphics (Intel Corporation) | uma: 1 | fp16: 1 ...
load_backend: loaded Vulkan backend from C:\Users\<user>\.docker\bin\inference\ggml-vulkan.dll

Actual Behavior

The inference server loads only the CPU backend. The --debug flag and docker model logs produce
no output referencing Vulkan, GPU, or backend loading. Inference runs entirely on CPU.

Diagnostic Evidence

1. ggml-vulkan.dll is present in DMR's inference directory

C:\Users\username\.docker\bin\inference\ggml-vulkan.dll   (62,117,416 bytes, dated 2026-04-23)

Full contents of C:\Users\username\.docker\bin\inference\:

.llamacpp_version
com.docker.llama-server.exe
com.docker.nv-gpu-info.exe
ggml-base.dll
ggml-cpu-alderlake.dll
ggml-cpu-cannonlake.dll
ggml-cpu-cascadelake.dll
ggml-cpu-cooperlake.dll
ggml-cpu-haswell.dll
ggml-cpu-icelake.dll
ggml-cpu-ivybridge.dll
ggml-cpu-piledriver.dll
ggml-cpu-sandybridge.dll
ggml-cpu-sapphirerapids.dll
ggml-cpu-skylakex.dll
ggml-cpu-sse42.dll
ggml-cpu-x64.dll
ggml-cpu-zen4.dll
ggml-vulkan.dll
ggml.dll
llama-common.dll
llama-server.exe
llama.dll
mtmd.dll

2. Vulkan is functional at the system level

vulkaninfo.exe (from C:\Windows\System32) correctly detects the Arc GPU:

GPU id : 0 (Intel(R) Arc(TM) Graphics) [VK_KHR_win32_surface]
deviceID   = 0x7d55
deviceType = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
deviceName = Intel(R) Arc(TM) Graphics

Intel Vulkan ICD files are present in the Windows Driver Store:

C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_407cbfa5003213b4\igvk64.dll
C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_407cbfa5003213b4\igvk64.json
C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_4d11ccb5a32fb9d9\igvk64.dll
C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_4d11ccb5a32fb9d9\igvk64.json
C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_ba8450f1e628107e\igvk64.dll
C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_ba8450f1e628107e\igvk64.json

Note: The registry key HKLM:\SOFTWARE\Khronos\Vulkan\Drivers is absent, as Windows DCH drivers use
DriverStore-based ICD discovery instead of registry-based registration. The Vulkan loader finds the
ICD correctly via this mechanism, as confirmed by vulkaninfo.

3. Standalone llama.cpp correctly enumerates the GPU

Using the standalone llama.cpp Vulkan build (llama-b9247-bin-win-vulkan-x64):

.\llama-cli.exe --list-devices

Available devices:
  Vulkan0: Intel(R) Arc(TM) Graphics (37032 MiB, 36264 MiB free)

This confirms that llama.cpp itself has no issue detecting the GPU — the failure is specific to how
Docker Model Runner invokes or initializes the inference backend on Windows.

Hypothesis

Docker Model Runner's inference server on Windows may be loading ggml-vulkan.dll in a context
(working directory, DLL search path, or process environment) that prevents it from initializing the
Vulkan loader correctly. The Vulkan loader (vulkan-1.dll) relies on specific search paths or
environment variables to find ICD manifests, and a mismatch between the process environment used by
DMR's server and the one used by a standalone llama.cpp process could cause silent Vulkan
initialization failure, resulting in CPU-only fallback.

A related issue was previously reported in #305, where Vulkan support intermittently dropped after a
Docker Desktop update.

Workaround

Use the standalone llama.cpp server with the Vulkan build directly. It exposes an OpenAI-compatible
API and correctly uses the Intel Arc GPU:

.\llama-server.exe -m <model.gguf> --port 8080 -ngl 99

Request

Please investigate why com.docker.llama-server.exe (or llama-server.exe) in
~\.docker\bin\inference\ fails to initialize the Vulkan backend on Windows when the same llama.cpp
binary distributed separately works correctly. A possible fix may involve ensuring the inference
server process inherits the correct DLL search path or Vulkan layer environment variables when
launched by Docker Desktop on Windows.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions