8.9 Docker Model Runner with Harbor
Docker Model Runner (DMR) brings official local model execution directly into the Docker experience. Harbor integrates it as a first-class backend so you can run, discover, and serve models through the same CLI and Compose stack you already use for Ollama, Open WebUI, SearXNG, and dozens of other services.
This guide explains how to set up DMR with Harbor on every supported platform, how to find and pull models (especially from HuggingFace), and how to use DMR inside real local LLM workflows.
Docker Model Runner is Docker's built-in runtime for running AI models locally:
- One docker model command surface for search, pull, run, and status
- Strong support for GGUF models via llama.cpp (excellent CPU and Metal/CUDA performance)
- Native OpenAI-compatible and Ollama-compatible APIs once a model is active
- Works on Apple Silicon (Metal), Linux (CPU or GPU via the plugin), and Windows (via Docker Desktop)
Harbor's dmr service adds the missing piece for containerized stacks: a small Caddy proxy that translates standard /v1 paths into the DMR engine paths and makes the backend reachable at http://dmr:8080/v1 from other services. You keep the full Harbor orchestration, launch adapters, and cross-service wiring while Docker owns the actual model execution on the host.
- Docker Desktop (Mac/Windows) or Docker Engine + model plugin (Linux)
- Harbor installed
- For best Apple Silicon results: recent Docker Desktop with Model Runner enabled in settings
On most Docker Desktop installations, Model Runner is only a toggle or one command away. Harbor can drive that toggle for you.
harbor up dmr
What Harbor does when host management is enabled (default):
- Checks whether the docker model subcommand exists
- On Linux: installs the official docker-model-plugin package via apt or dnf when missing
- On Docker Desktop systems: runs docker desktop enable model-runner when the command is available
- Enables the TCP endpoint so the proxy (and other containers) can reach it
- Pulls the configured default model (ai/smollm2 by default) if HARBOR_DMR_AUTO_PULL is true
After the command succeeds you have both the host runner and the Harbor proxy ready.
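If you prefer to check the host yourself, the first steps above can be approximated by hand. The following is a minimal sketch, not Harbor's actual implementation: the function name ensure_dmr is hypothetical, and it only uses commands already mentioned in this guide.

```shell
# Hypothetical helper: approximates the host checks Harbor performs.
# This is an illustration only, not Harbor's own code.
ensure_dmr() {
  # 1. Does the `docker model` subcommand respond?
  if docker model ls >/dev/null 2>&1; then
    echo "docker model is available"
    return 0
  fi
  # 2. On Docker Desktop, try to switch Model Runner on.
  if docker desktop enable model-runner >/dev/null 2>&1; then
    echo "enabled Model Runner via Docker Desktop"
    return 0
  fi
  # 3. Neither worked: the plugin has to be installed manually.
  echo "docker model not found; install docker-model-plugin" >&2
  return 1
}
```

Calling ensure_dmr before harbor up dmr gives a quick yes/no answer about the host without changing any Harbor configuration.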
Verify:
harbor dmr status
docker model ls
harbor models ls --source dmr
DMR has excellent built-in discovery that surfaces models from both Docker Hub and HuggingFace.
# Search HuggingFace catalog for GGUF models
docker model search --source=huggingface qwen
docker model search --source=huggingface llama
docker model search --source=huggingface phi --limit 30
# JSON output for scripting
docker model search --source=huggingface --json mistral
Short names in the ai/ namespace are convenient defaults:
- ai/smollm2
- ai/llama3.2
- ai/phi4
These are maintained by Docker and usually offer a few quantization options.
Open Docker Desktop → Models tab. It shows both Docker Hub and HuggingFace results with one-click pull and run buttons. Any model you pull here immediately becomes visible to docker model ls and to Harbor.
Harbor gives you a consistent interface regardless of backend:
# Explicit DMR source
harbor models pull --source dmr hf.co/bartowski/Qwen2.5-7B-Instruct-GGUF:Q4_K_M
# Short curated name
harbor models pull --source dmr ai/smollm2
# Using the dmr subcommand directly
harbor dmr pull hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF
harbor models dmr pull ai/phi4
List everything DMR currently has locally:
harbor models ls --source dmr
Remove models you no longer need:
harbor models rm --source dmr ai/smollm2
harbor dmr rm hf.co/bartowski/Qwen2.5-7B-Instruct-GGUF:Q4_K_M
Chat interface + local model:
harbor up webui dmr
harbor open
Add web search for RAG:
harbor up webui dmr searxng
Launch host coding tools against the DMR backend:
harbor launch --backend dmr --model ai/smollm2 codex
harbor launch --backend dmr --model hf.co/bartowski/Qwen2.5-7B-Instruct-GGUF:Q4_K_M claude
The launch system automatically discovers available models from the running DMR instance when you do not pass --model.
- Host / external clients: http://localhost:34920/v1
- Harbor containers: http://dmr:8080/v1
Harbor's proxy handles the path translation (/v1/chat/completions → DMR's /engines/v1/chat/completions) and injects the API key when configured, so clients see a normal OpenAI-compatible surface.
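Because the proxy exposes the usual OpenAI-style paths, a plain curl request is enough to try it from the host. The sketch below is an illustration under assumptions: the function name dmr_chat and the DMR_BASE_URL and DMR_MODEL variables are hypothetical, the default URL is the host endpoint listed above, and the default model assumes ai/smollm2 has been pulled.

```shell
# Hypothetical helper: send one chat message to an OpenAI-compatible endpoint.
# DMR_BASE_URL and DMR_MODEL are illustrative names, not Harbor settings.
# Note: the prompt is inserted as-is, so it must not contain quotes or backslashes.
dmr_chat() {
  prompt=$1
  base_url=${DMR_BASE_URL:-http://localhost:34920/v1}
  model=${DMR_MODEL:-ai/smollm2}
  # The proxy maps /v1/chat/completions to DMR's /engines/v1/chat/completions.
  curl -sS "$base_url/chat/completions" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$model\", \"messages\": [{\"role\": \"user\", \"content\": \"$prompt\"}]}"
}
```

For example, dmr_chat "Say hello in one sentence" should return a JSON chat completion if the proxy and the model are up. From another Harbor container, setting DMR_BASE_URL to http://dmr:8080/v1 targets the internal address instead.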
Apple Silicon (macOS)
DMR uses Metal under the hood and delivers very strong performance. Many users treat harbor up dmr as the simplest way to get a production-grade Metal backend without installing separate Python runtimes. Both DMR and MLX are excellent choices here; DMR wins when you value the unified docker model workflow and easy GGUF discovery.
Linux
Install the docker-model-plugin package (Harbor does this automatically on apt/dnf systems). GPU support depends on your drivers and the runner version. CPU inference works out of the box for GGUF models.
Windows
Docker Desktop on Windows with WSL2 backend gives you the same Model Runner experience as macOS. GPU passthrough (NVIDIA) is supported when the WSL2 CUDA stack is configured.
Control the experience with harbor config:
- dmr.model: default model pulled on harbor up dmr
- dmr.auto.pull: whether Harbor should pull the default model automatically
- dmr.enable.tcp: expose the TCP port for the proxy (usually left on)
- dmr.manage.host: set false only if you want to drive DMR entirely yourself
Full reference lives in the Docker Model Runner backend page.
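To see how these options are currently set, you can read them back one by one. This is a small sketch that relies only on harbor config get, which appears elsewhere in this guide; the function name show_dmr_config is hypothetical.

```shell
# Hypothetical helper: print the current value of each DMR option listed above.
show_dmr_config() {
  for key in dmr.model dmr.auto.pull dmr.enable.tcp dmr.manage.host; do
    # `harbor config get KEY` prints the stored value for KEY.
    printf '%s=%s\n' "$key" "$(harbor config get "$key")"
  done
}
```

Changing a value is assumed to work through the matching harbor config set command (for example, harbor config set dmr.auto.pull false); check harbor config --help on your installation to confirm.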
- Choose DMR when you want Docker-native model management, great GGUF support, and the simplest cross-platform story.
- Choose Ollama when you want the broadest ecosystem and easiest tagging.
- Choose llama.cpp (llamacpp service) when you want direct GGUF serving without the DMR layer.
- Choose MLX (Apple only) when you specifically want Apple's MLX kernels and tooling.
- Choose vLLM when you need maximum throughput on NVIDIA hardware with continuous batching.
Many Harbor users keep DMR or MLX as their Apple Silicon daily driver and switch to vLLM or llama.cpp on Linux workstations.
"docker model: command not found"
Run harbor up dmr with host management enabled. On Linux make sure you have permission to install packages (sudo). On Docker Desktop, open Settings → AI and enable Model Runner manually, then retry.
Proxy unhealthy or connection refused
- Confirm DMR is reachable on the host: curl http://model-runner.docker.internal:12434/engines/v1/models (or your configured upstream)
- Check that TCP exposure succeeded: harbor config get dmr.enable.tcp
- Restart the proxy: harbor down dmr && harbor up dmr
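After a restart the endpoint can take a moment to come back, so a short polling loop is often more useful than a single curl. The following is a minimal sketch; the function name wait_for_dmr and its default of 10 attempts with a 2-second delay are arbitrary choices for illustration.

```shell
# Hypothetical helper: poll a URL until it responds or the attempts run out.
# Usage: wait_for_dmr URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_dmr() {
  url=$1
  attempts=${2:-10}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP error responses (4xx/5xx).
    if curl -fsS "$url" >/dev/null 2>&1; then
      echo "reachable after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "still unreachable after $attempts attempt(s)" >&2
  return 1
}
```

Pass it the same models URL you used in the first check, or your configured upstream, right after restarting the proxy.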
Slow first pull
Large GGUF files (especially 70B) take time and bandwidth. Start with 1B–8B models while you explore.
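If you want to queue a few small models and walk away, a simple loop around the pull command from this guide is enough. This is a sketch only; the function name pull_models is hypothetical, and the model names in the usage note are the examples already used above.

```shell
# Hypothetical helper: pull each model in turn and keep going after a failure.
# Returns non-zero if at least one pull failed.
pull_models() {
  failed=0
  for model in "$@"; do
    echo "pulling $model"
    if ! harbor models pull --source dmr "$model"; then
      echo "failed: $model" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

For example: pull_models ai/smollm2 hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF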
- Combine DMR with the rest of the stack: harbor up webui dmr searxng speaches comfyui
- Compare performance with the built-in benchmark tool: harbor bench run --name dmr-local
- Read the dedicated service documentation: Docker Model Runner backend
- Explore the other platform-specific guides in the Harbor Guides collection
Docker Model Runner plus Harbor gives you a clean, Docker-centric path to local LLMs that works consistently whether you are on a MacBook, a Linux workstation, or a Windows development machine. Start with harbor up dmr, pull a couple of HuggingFace GGUF models, and you have a production-ready local inference backend in minutes.