Bug Report: Docker Model Runner Does Not Use Intel Arc GPU via Vulkan on Windows

# Bug Report: Docker Model Runner Does Not Use Intel Arc GPU via Vulkan on Windows

## Summary

Docker Model Runner (DMR) falls back to CPU-only inference on Windows 11 when an Intel Arc GPU is
present, despite:

- `ggml-vulkan.dll` being present in DMR's inference binary directory
- Vulkan being fully functional at the system level (`vulkaninfo` detects the GPU)
- llama.cpp's own `--list-devices` correctly enumerating the Arc GPU as `Vulkan0`

## Environment

| Property             | Value                                                              |
| -------------------- | ------------------------------------------------------------------ |
| **OS**               | Windows 11                                                         |
| **Docker Desktop**   | 4.74.0 (227015)                                                    |
| **Docker Engine**    | 29.4.3                                                             |
| **GPU**              | Intel Arc Graphics (integrated, 37032 MiB shared memory)           |
| **Intel GPU Driver** | 32.0.101.8801 (WHQL, released 2026-05-15)                          |
| **Vulkan Loader**    | `vulkan-1.dll` present in `C:\Windows\System32` (dated 2026-05-15) |

## Steps to Reproduce

1. Install Docker Desktop 4.74.0 on Windows 11 with an Intel Arc GPU
2. Enable Docker Model Runner in Docker Desktop settings
3. Pull and run a model:
    ```
    docker model pull ai/smollm2
    docker model run --debug ai/smollm2 "Test"
    ```
4. Check logs:
    ```
    docker model logs ai/smollm2
    ```

## Expected Behavior

Docker Model Runner should detect the Intel Arc GPU via the Vulkan backend and use it for inference,
printing something like:

```
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) Graphics (Intel Corporation) | uma: 1 | fp16: 1 ...
load_backend: loaded Vulkan backend from C:\Users\<user>\.docker\bin\inference\ggml-vulkan.dll
```

## Actual Behavior

The inference server loads only the CPU backend. The `--debug` flag and `docker model logs` produce
no output referencing Vulkan, GPU, or backend loading. Inference runs entirely on CPU.

## Diagnostic Evidence

### 1. `ggml-vulkan.dll` is present in DMR's inference directory

```
C:\Users\username\.docker\bin\inference\ggml-vulkan.dll   (62,117,416 bytes, dated 2026-04-23)
```

Full contents of `C:\Users\username\.docker\bin\inference\`:

```
.llamacpp_version
com.docker.llama-server.exe
com.docker.nv-gpu-info.exe
ggml-base.dll
ggml-cpu-alderlake.dll
ggml-cpu-cannonlake.dll
ggml-cpu-cascadelake.dll
ggml-cpu-cooperlake.dll
ggml-cpu-haswell.dll
ggml-cpu-icelake.dll
ggml-cpu-ivybridge.dll
ggml-cpu-piledriver.dll
ggml-cpu-sandybridge.dll
ggml-cpu-sapphirerapids.dll
ggml-cpu-skylakex.dll
ggml-cpu-sse42.dll
ggml-cpu-x64.dll
ggml-cpu-zen4.dll
ggml-vulkan.dll
ggml.dll
llama-common.dll
llama-server.exe
llama.dll
mtmd.dll
```

### 2. Vulkan is functional at the system level

`vulkaninfo.exe` (from `C:\Windows\System32`) correctly detects the Arc GPU:

```
GPU id : 0 (Intel(R) Arc(TM) Graphics) [VK_KHR_win32_surface]
deviceID   = 0x7d55
deviceType = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
deviceName = Intel(R) Arc(TM) Graphics
```

Intel Vulkan ICD files are present in the Windows Driver Store:

```
C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_407cbfa5003213b4\igvk64.dll
C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_407cbfa5003213b4\igvk64.json
C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_4d11ccb5a32fb9d9\igvk64.dll
C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_4d11ccb5a32fb9d9\igvk64.json
C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_ba8450f1e628107e\igvk64.dll
C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_ba8450f1e628107e\igvk64.json
```

Note: The registry key `HKLM:\SOFTWARE\Khronos\Vulkan\Drivers` is absent, as Windows DCH drivers use
DriverStore-based ICD discovery instead of registry-based registration. The Vulkan loader finds the
ICD correctly via this mechanism, as confirmed by `vulkaninfo`.

### 3. Standalone llama.cpp correctly enumerates the GPU

Using the standalone llama.cpp Vulkan build (`llama-b9247-bin-win-vulkan-x64`):

```
.\llama-cli.exe --list-devices

Available devices:
  Vulkan0: Intel(R) Arc(TM) Graphics (37032 MiB, 36264 MiB free)
```

This confirms that llama.cpp itself has no issue detecting the GPU — the failure is specific to how
Docker Model Runner invokes or initializes the inference backend on Windows.

## Hypothesis

Docker Model Runner's inference server on Windows may be loading `ggml-vulkan.dll` in a context
(working directory, DLL search path, or process environment) that prevents it from initializing the
Vulkan loader correctly. The Vulkan loader (`vulkan-1.dll`) relies on specific search paths or
environment variables to find ICD manifests, and a mismatch between the process environment used by
DMR's server and the one used by a standalone llama.cpp process could cause silent Vulkan
initialization failure, resulting in CPU-only fallback.

A related issue was previously reported in #305, where Vulkan support intermittently dropped after a
Docker Desktop update.

## Workaround

Use the standalone llama.cpp server with the Vulkan build directly. It exposes an OpenAI-compatible
API and correctly uses the Intel Arc GPU:

```
.\llama-server.exe -m <model.gguf> --port 8080 -ngl 99
```

## Request

Please investigate why `com.docker.llama-server.exe` (or `llama-server.exe`) in
`~\.docker\bin\inference\` fails to initialize the Vulkan backend on Windows when the same llama.cpp
binary distributed separately works correctly. A possible fix may involve ensuring the inference
server process inherits the correct DLL search path or Vulkan layer environment variables when
launched by Docker Desktop on Windows.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug Report: Docker Model Runner Does Not Use Intel Arc GPU via Vulkan on Windows #925

Bug Report: Docker Model Runner Does Not Use Intel Arc GPU via Vulkan on Windows

Summary

Environment

Steps to Reproduce

Expected Behavior

Actual Behavior

Diagnostic Evidence

1. `ggml-vulkan.dll` is present in DMR's inference directory

2. Vulkan is functional at the system level

3. Standalone llama.cpp correctly enumerates the GPU

Hypothesis

Workaround

Request

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Property	Value
OS	Windows 11
Docker Desktop	4.74.0 (227015)
Docker Engine	29.4.3
GPU	Intel Arc Graphics (integrated, 37032 MiB shared memory)
Intel GPU Driver	32.0.101.8801 (WHQL, released 2026-05-15)
Vulkan Loader	`vulkan-1.dll` present in `C:\Windows\System32` (dated 2026-05-15)

Bug Report: Docker Model Runner Does Not Use Intel Arc GPU via Vulkan on Windows #925

Description

Bug Report: Docker Model Runner Does Not Use Intel Arc GPU via Vulkan on Windows

Summary

Environment

Steps to Reproduce

Expected Behavior

Actual Behavior

Diagnostic Evidence

1. ggml-vulkan.dll is present in DMR's inference directory

2. Vulkan is functional at the system level

3. Standalone llama.cpp correctly enumerates the GPU

Hypothesis

Workaround

Request

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

1. `ggml-vulkan.dll` is present in DMR's inference directory