-
-
Notifications
You must be signed in to change notification settings - Fork 207
8.8 Run MLX on Apple Silicon with Harbor
Harbor makes it simple to run Apple's MLX framework for high-performance local LLM inference on M-series Macs. MLX delivers excellent Metal-accelerated performance and memory efficiency on Apple Silicon, while Harbor keeps the rest of your local AI stack (chat UI, web search, coding agents, image tools) wired up through the same CLI and Docker Compose workflow.
This guide covers installation, starting the backend, finding and pulling models from HuggingFace, configuration, and using MLX inside a full Harbor local LLM stack.
Apple Silicon Macs have unified memory and a powerful GPU architecture. MLX is purpose-built by Apple to take full advantage of that hardware:
- Optimized Metal kernels for transformer workloads
- Efficient handling of large context and quantized models
- Direct integration with the host's memory and accelerators
Harbor's mlx backend runs the official mlx-lm server natively on macOS (for full Metal access) and exposes an OpenAI-compatible API to containers and host tools through a lightweight proxy. You get the performance of a host-native runtime with the convenience of Harbor service management, cross-service integrations, and one-command workflows.
- Apple Silicon Mac (M1 or newer recommended)
- macOS 11 (Big Sur) or later
- Docker Desktop for Mac (Apple Silicon version)
-
uvpackage manager (required for Harbor's automatic host lifecycle management of mlx-lm) - 16GB+ unified memory recommended for comfortable 7B-14B model usage
Install uv if you do not have it:
# Homebrew
brew install uv
# Or official installer
curl -LsSf https://astral.sh/uv/install.sh | shInstall or update Harbor following the installation guide.
harbor up mlxHarbor performs the following when HARBOR_MLX_MANAGE_HOST=true (the default):
- Ensures the workspace at
services/mlx/has a Python virtual environment withmlx-lm[server] - Starts
mlx-lm.serveron the host (default port 8095) pointing at the configured model - Brings up a Caddy proxy container that forwards traffic from the Compose network and host port 34930
The first start downloads the default model from HuggingFace if it is not already cached.
Open the proxy health endpoint to confirm it is ready:
curl http://localhost:34930/healthMLX works best with models that have been converted to the MLX format (weights + tokenizer in a layout optimized for the framework). The easiest place to find them is the mlx-community organization on HuggingFace:
- Browse: https://huggingface.co/mlx-community
- Search HuggingFace for "mlx 4bit" or "mlx-community Qwen" / "mlx-community Llama"
Recommended starting points for Apple Silicon (good speed/quality balance in 2025):
- Small and fast:
mlx-community/Qwen3.5-4B-4bit,mlx-community/Qwen3-4B-4bit - Strong general performance:
mlx-community/Meta-Llama-3.1-8B-Instruct-4bit,mlx-community/Qwen2.5-7B-Instruct-4bit - Larger: 14B and 32B 4bit/8bit variants when you have 24GB+ unified memory
Many community converters publish both 4-bit and 8-bit quantized versions. 4-bit variants usually offer the best tokens-per-second on Apple hardware while staying within reasonable memory limits.
You can also use the HuggingFace model search with the mlx tag filter or look for repositories that contain mlx in the name or config.json with "model_type": "mlx" signals.
Use the unified harbor models command or the mlx-specific subcommand:
# Unified interface (recommended)
harbor models pull --source mlx mlx-community/Qwen3.5-4B-4bit
# Direct mlx command (also works)
harbor mlx pull mlx-community/Qwen3.5-4B-4bit
# Alias form
harbor models mlx pull mlx-community/Qwen3.5-4B-4bitHarbor delegates the download to hf download inside the managed uv environment in services/mlx/. Weights land in the standard HuggingFace cache (~/.cache/huggingface by default).
Model removal is not supported through Harbor (harbor mlx rm and harbor models rm --source mlx are unavailable). Delete cached weights from ~/.cache/huggingface manually when you need to reclaim disk space.
List what your mlx-lm instance currently sees:
harbor models ls --source mlx
# or
harbor mlx lsUpdate both the short name Harbor uses internally and the HuggingFace repository path:
harbor config set mlx.model mlx-community/Qwen3.5-4B-4bit
harbor config set mlx.hf.path mlx-community/Qwen3.5-4B-4bitRestart the host runner (the proxy container can keep running):
harbor mlx stop
harbor mlx startThe next harbor up mlx (or any stack that includes mlx) will also launch the new model after a full stack restart.
Start Open WebUI pointed at MLX:
harbor up webui mlx
harbor openOpen WebUI will automatically discover the models served by the MLX backend and let you chat, use tools, RAG, and voice features while the actual inference stays accelerated by MLX on the host.
Use MLX from host-side coding agents and tools:
harbor launch --backend mlx --model mlx-community/Qwen3.5-4B-4bit codex
harbor launch --backend mlx --model mlx-community/Qwen3.5-4B-4bit claudeThe --web flag also works to add SearXNG-powered web search to the launched agent:
harbor launch --backend mlx --web codex- From the Mac host:
http://localhost:34930/v1 - From other Harbor containers:
http://mlx:8080/v1
The endpoint implements the standard OpenAI chat completions, completions, and models routes, so any client that speaks OpenAI can target it.
Key variables (set via harbor config set):
-
mlx.model— short name shown in UIs and launchers -
mlx.hf.path— HuggingFace repository to load on start -
mlx.host.port/mlx.runner.port— ports for proxy and host process -
mlx.manage.host— set to false if you want to run mlx-lm yourself and only use Harbor for proxying
See the full MLX backend documentation for the complete list and volume notes.
mlx-lm fails to start
- Confirm
uvis on your PATH:uv --version - Check the log:
harbor mlx logs - The first run can take time while
uvcreates the venv and downloads mlx-lm plus the model weights.
Out of memory or slow performance
- Switch to a smaller 4-bit model (3B-4B range is very responsive on base M1/M2 machines)
- Close memory-heavy applications; Apple Silicon shares memory between CPU and GPU
Proxy reports unhealthy
- Verify the host process is listening:
curl http://localhost:8095/v1/models - If you manage mlx-lm manually, disable host management and point the proxy at your endpoint:
harbor config set mlx.manage.host false
harbor config set mlx.upstream.url http://host.docker.internal:8095- Add web search with SearXNG for local RAG:
harbor up webui mlx searxng - Explore other Apple-friendly backends such as Docker Model Runner (also excellent on Apple Silicon via Metal)
- Return to the full Harbor Guides or the MLX service page
With Harbor and MLX you get a complete, reproducible local LLM environment that feels native to your Mac while still giving you the full power of the broader Harbor ecosystem.