mlx-server

OpenAI-compatible HTTP server for mlx-swift-lm on Apple Silicon.

Status

Phase 1 + tool calling: working. Loads an MLX model and serves /v1/chat/completions (streaming and non-streaming), /v1/models, and /health, with OpenAI-compatible tool calling. Validated end-to-end against Qwen3-4B and Qwen3.6-35B-A3B (MoE). See the roadmap for what is next.

Why this exists

mlx-swift-lm is a high-performance LLM inference library for Apple Silicon, with state-of-the-art speculative decoding, prefix KV-cache, and hybrid-architecture state replay landing in its Tier 1 roadmap. It is, by design, a Swift library, not a server. mlx-server is the missing HTTP layer.

The end goal is to be a drop-in replacement for llama-server in LLMKube's Apple Silicon path, with the perf characteristics of mlx-swift-lm.

Goals

OpenAI-compatible API surface: /v1/chat/completions, /v1/completions, /v1/models, /v1/embeddings
Streaming via Server-Sent Events
Tool calling and structured outputs
Vision-language model support (Qwen-VL, Gemma 4 VL, etc.)
Multi-slot concurrency with longest-prefix KV cache reuse
One-command install via Homebrew, plus prebuilt GitHub releases

Non-goals

Cross-platform support. Apple Silicon (arm64 macOS) only. If you need x86_64 or Linux, look at llama.cpp.
vLLM compatibility. For that path see vllm-swift (Python API + Swift compute) or vllm-metal.
Heavy abstractions over mlx-swift-lm. This is a thin HTTP wrapper, not a framework.

Prior art

TheTom's MLXServer (abandoned in favor of vllm-swift) was the proof-of-concept that an MLX-swift HTTP server is feasible. Several design decisions here, particularly around the slot manager and longest-prefix KV cache, are informed by his approach. The decision to rebuild rather than fork is mainly because his original used hand-rolled socket code; this repo uses Hummingbird for the HTTP layer.

Install

brew install defilantech/tap/mlx-server

Apple Silicon, macOS 14 (Sonoma) or later. This installs the latest prebuilt release; older versions and the raw tarballs are on the releases page. Then:

mlx-server --model /path/to/mlx-model-dir --port 8080

Build from source

Requires:

macOS 14 (Sonoma) or later, Apple Silicon
Swift 6.0 or later (Xcode 16+)

swift build compiles the project (and is what CI runs), but SwiftPM cannot compile mlx-swift's Metal shaders — a binary built that way fails at runtime with Failed to load the default metallib. To run the server, build with xcodebuild, which compiles and bundles the Metal library next to the binary:

xcodebuild -scheme mlx-server -destination 'platform=macOS,arch=arm64' \
  -configuration Debug -derivedDataPath .build/xcode -skipMacroValidation build

.build/xcode/Build/Products/Debug/mlx-server \
  --model /path/to/mlx-model-dir --port 8080

--model takes a local MLX model directory or a HuggingFace id. Other flags:

--host, --port, --max-slots
--tool-call-format — e.g. xml_function for Qwen3.5 / Qwen3-Coder; auto-inferred when unset
--reasoning — how thinking output is split into reasoning_content vs content: auto (default; splits on a literal <think>/</think>), prefilled (output starts mid-thought — use for Qwen3.5 / Qwen3.6), or off

Roadmap

Phase	Scope	Status
0	Scaffolding, CI, `/health` endpoint, dependency wiring	Done
1	`/v1/chat/completions` (streaming + non-streaming), `/v1/models`, single-slot model loading	Done
2	Multi-slot `SlotManager`, longest-prefix prompt cache, Prometheus `/metrics`, structured logging, graceful shutdown
3	Tool calling, thinking-model support, vision-language models, speculative decoding knobs, `/v1/embeddings`	Tool calling done
4	LLMKube `runtime: mlx-server` integration

License

Apache 2.0. See LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
.github		.github
Sources		Sources
Tests/MLXServerTests		Tests/MLXServerTests
.gitignore		.gitignore
LICENSE		LICENSE
Package.swift		Package.swift
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

mlx-server

Status

Why this exists

Goals

Non-goals

Prior art

Install

Build from source

Roadmap

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

mlx-server

Status

Why this exists

Goals

Non-goals

Prior art

Install

Build from source

Roadmap

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages