Skip to content

8.9 Docker Model Runner with Harbor

av edited this page Jun 14, 2026 · 2 revisions

Docker Model Runner Local LLMs with Harbor on Mac, Linux, and Windows

Docker Model Runner (DMR) brings official local model execution directly into the Docker experience. Harbor integrates it as a first-class backend so you can run, discover, and serve models through the same CLI and Compose stack you already use for Ollama, Open WebUI, SearXNG, and dozens of other services.

This guide explains how to set up DMR with Harbor on every supported platform, how to find and pull models (especially from HuggingFace), and how to use DMR inside real local LLM workflows.

What Docker Model Runner Gives You

Docker Model Runner is Docker's built-in runtime for running AI models locally:

  • One docker model command surface for search, pull, run, and status
  • Strong support for GGUF models via llama.cpp (excellent CPU and Metal/CUDA performance)
  • Native OpenAI-compatible and Ollama-compatible APIs once a model is active
  • Works on Apple Silicon (Metal), Linux (CPU or GPU via the plugin), and Windows (via Docker Desktop)

Harbor's dmr service adds the missing piece for containerized stacks: a small Caddy proxy that translates standard /v1 paths into the DMR engine paths and makes the backend reachable at http://dmr:8080/v1 from other services. You keep the full Harbor orchestration, launch adapters, and cross-service wiring while Docker owns the actual model execution on the host.

Prerequisites

  • Docker Desktop (Mac/Windows) or Docker Engine + model plugin (Linux)
  • Harbor installed
  • For best Apple Silicon results: recent Docker Desktop with Model Runner enabled in settings

On most Docker Desktop installations, Model Runner is only a toggle or one command away. Harbor can drive that toggle for you.

One-Command Setup on Any Platform

harbor up dmr

What Harbor does when host management is enabled (default):

  • Checks whether the docker model subcommand exists
  • On Linux: installs the official docker-model-plugin package via apt or dnf when missing
  • On Docker Desktop systems: runs docker desktop enable model-runner when the command is available
  • Enables the TCP endpoint so the proxy (and other containers) can reach it
  • Pulls the configured default model (ai/smollm2 by default) if HARBOR_DMR_AUTO_PULL is true

After the command succeeds you have both the host runner and the Harbor proxy ready.

Verify:

harbor dmr status
docker model ls
harbor models ls --source dmr

Finding Models for Docker Model Runner

DMR has excellent built-in discovery that surfaces models from both Docker Hub and HuggingFace.

Using the Docker CLI (works everywhere)

# Search HuggingFace catalog for GGUF models
docker model search --source=huggingface qwen
docker model search --source=huggingface llama
docker model search --source=huggingface phi --limit 30

# JSON output for scripting
docker model search --source=huggingface --json mistral

Docker Hub curated models

Short names in the ai/ namespace are convenient defaults:

  • ai/smollm2
  • ai/llama3.2
  • ai/phi4

These are maintained by Docker and usually offer a few quantization options.

From Docker Desktop UI

Open Docker Desktop → Models tab. It shows both Docker Hub and HuggingFace results with one-click pull and run buttons. Any model you pull here immediately becomes visible to docker model ls and to Harbor.

Pull Models Through Harbor

Harbor gives you a consistent interface regardless of backend:

# Explicit DMR source
harbor models pull --source dmr hf.co/bartowski/Qwen2.5-7B-Instruct-GGUF:Q4_K_M

# Short curated name
harbor models pull --source dmr ai/smollm2

# Using the dmr subcommand directly
harbor dmr pull hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF
harbor models dmr pull ai/phi4

List everything DMR currently has locally:

harbor models ls --source dmr

Remove models you no longer need:

harbor models rm --source dmr ai/smollm2
harbor dmr rm hf.co/bartowski/Qwen2.5-7B-Instruct-GGUF:Q4_K_M

Start a Complete Stack with DMR

Chat interface + local model:

harbor up webui dmr
harbor open

Add web search for RAG:

harbor up webui dmr searxng

Launch host coding tools against the DMR backend:

harbor launch --backend dmr --model ai/smollm2 codex
harbor launch --backend dmr --model hf.co/bartowski/Qwen2.5-7B-Instruct-GGUF:Q4_K_M claude

The launch system automatically discovers available models from the running DMR instance when you do not pass --model.

API Endpoints

  • Host / external clients: http://localhost:34920/v1
  • Harbor containers: http://dmr:8080/v1

Harbor's proxy handles the path translation (/v1/chat/completions → DMR's /engines/v1/chat/completions) and injects the API key when configured, so clients see a normal OpenAI-compatible surface.

Platform-Specific Notes

Apple Silicon (macOS)

DMR uses Metal under the hood and delivers very strong performance. Many users treat harbor up dmr as the simplest way to get a production-grade Metal backend without installing separate Python runtimes. Both DMR and MLX are excellent choices here; DMR wins when you value the unified docker model workflow and easy GGUF discovery.

Linux

Install the docker-model-plugin package (Harbor does this automatically on apt/dnf systems). GPU support depends on your drivers and the runner version. CPU inference works out of the box for GGUF models.

Windows

Docker Desktop on Windows with WSL2 backend gives you the same Model Runner experience as macOS. GPU passthrough (NVIDIA) is supported when the WSL2 CUDA stack is configured.

Configuration You Will Actually Use

Control the experience with harbor config:

  • dmr.model — default model pulled on harbor up dmr
  • dmr.auto.pull — whether Harbor should pull the default model automatically
  • dmr.enable.tcp — expose the TCP port for the proxy (usually left on)
  • dmr.manage.host — set false only if you want to drive DMR entirely yourself

Full reference lives in the Docker Model Runner backend page.

DMR vs Other Backends

  • Choose DMR when you want Docker-native model management, great GGUF support, and the simplest cross-platform story.
  • Choose Ollama when you want the broadest ecosystem and easiest tagging.
  • Choose llama.cpp (llamacpp service) when you want direct GGUF serving without the DMR layer.
  • Choose MLX (Apple only) when you specifically want Apple's MLX kernels and tooling.
  • Choose vLLM when you need maximum throughput on NVIDIA hardware with continuous batching.

Many Harbor users keep DMR or MLX as their Apple Silicon daily driver and switch to vLLM or llama.cpp on Linux workstations.

Troubleshooting

"docker model: command not found"

Run harbor up dmr with host management enabled. On Linux make sure you have permission to install packages (sudo). On Docker Desktop, open Settings → AI and enable Model Runner manually, then retry.

Proxy unhealthy or connection refused

  • Confirm DMR is reachable on the host: curl http://model-runner.docker.internal:12434/engines/v1/models (or your configured upstream)
  • Check that TCP exposure succeeded: harbor config get dmr.enable.tcp
  • Restart the proxy: harbor down dmr && harbor up dmr

Slow first pull

Large GGUF files (especially 70B) take time and bandwidth. Start with 1B–8B models while you explore.

Next Steps

  • Combine DMR with the rest of the stack: harbor up webui dmr searxng speaches comfyui
  • Compare performance with the built-in benchmark tool: harbor bench run --name dmr-local
  • Read the dedicated service documentation: Docker Model Runner backend
  • Explore the other platform-specific guides in the Harbor Guides collection

Docker Model Runner plus Harbor gives you a clean, Docker-centric path to local LLMs that works consistently whether you are on a MacBook, a Linux workstation, or a Windows development machine. Start with harbor up dmr, pull a couple of HuggingFace GGUF models, and you have a production-ready local inference backend in minutes.

Clone this wiki locally