Implement Phase 1 + tool calling by Defilan · Pull Request #1 · defilantech/mlx-server

Defilan · 2026-05-15T17:26:25Z

What

Builds mlx-server out from the Phase 0 scaffold into a working OpenAI-compatible inference server on mlx-swift-lm.

- mlx-swift-lm v3.32.1-alpha wired in; package split into a testable MLXServerKit library + a thin MLXServer executable
- Model loading from a local MLX directory or a HuggingFace id
- GET /v1/models
- POST /v1/chat/completions: streaming (SSE) and non-streaming
- OpenAI tool calling via mlx-swift-lm's chat-template parsers (tools / tool_choice in, tool_calls out)
- Inferencing protocol so the HTTP layer is testable without a model
- 20 tests: OpenAI type encode/decode, mapping, SSE framing, HTTP routes (stub engine)
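
To illustrate why a protocol boundary makes the HTTP layer testable without a model, here is a minimal sketch. The protocol members, `StubEngine`, and `listModelsJSON` are hypothetical names chosen for illustration; the actual `Inferencing` protocol in MLXServerKit may look different.

```swift
import Foundation

// Hypothetical shape of the protocol. The real `Inferencing` protocol in
// MLXServerKit may use different names and signatures.
protocol Inferencing {
    /// Identifiers of the models this engine can serve.
    var modelIDs: [String] { get }
    /// Produce a full completion for a list of (role, content) messages.
    func complete(messages: [(role: String, content: String)]) throws -> String
}

// Stub engine for tests: loads no weights and answers deterministically.
struct StubEngine: Inferencing {
    let modelIDs = ["stub-model"]

    func complete(messages: [(role: String, content: String)]) throws -> String {
        let lastUser = messages.last(where: { $0.role == "user" })?.content ?? ""
        return "echo: \(lastUser)"
    }
}

// A handler body that depends only on the protocol, so a test can pass the stub.
// It renders an OpenAI-style model list as JSON with sorted keys.
func listModelsJSON(engine: any Inferencing) throws -> String {
    struct Model: Encodable {
        let id: String
        let object = "model"
    }
    struct ModelList: Encodable {
        let object = "list"
        let data: [Model]
    }

    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    let body = ModelList(data: engine.modelIDs.map { Model(id: $0) })
    return String(decoding: try encoder.encode(body), as: UTF8.self)
}
```

A route test can then construct `StubEngine()` and assert on the JSON, with no model download or Metal runtime involved.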

Why

Phase 0 was a scaffold serving only /health. This PR picks up where that left off: the README's Phase 1, plus tool calling pulled forward from Phase 3 so the server is usable by agentic clients such as opencode.

How it was verified

End-to-end against two local MLX models — Qwen3-4B-4bit and Qwen3.6-35B-A3B-8bit (Qwen3.5-MoE): model load, non-streaming completion with token usage, streaming SSE, and tool calls.
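
For readers unfamiliar with the streaming format checked here, the sketch below shows OpenAI-style SSE framing: each chunk is sent as a `data: <json>` line followed by a blank line, and the stream ends with `data: [DONE]`. The helper names `sseFrame`, `sseDone`, and `ssePayloads` are assumptions for illustration and are not taken from the PR.

```swift
import Foundation

/// Frame one JSON payload as a single Server-Sent Events message.
func sseFrame(_ json: String) -> String {
    "data: \(json)\n\n"
}

/// Terminator that ends an OpenAI-style stream.
let sseDone = "data: [DONE]\n\n"

/// Client-side view: collect the JSON payloads that arrive before [DONE].
func ssePayloads(_ stream: String) -> [String] {
    let payloads = stream
        .components(separatedBy: "\n\n")          // one element per SSE message
        .filter { $0.hasPrefix("data: ") }        // ignore the empty tail
        .map { String($0.dropFirst("data: ".count)) }
    return Array(payloads.prefix(while: { $0 != "[DONE]" }))
}
```

A framing test can build a stream from two chunks plus `sseDone` and check both the raw bytes and the payloads recovered by `ssePayloads`.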

Notes

- Running requires an xcodebuild build, not swift build. SwiftPM cannot compile mlx-swift's Metal shaders, so a binary produced by swift build fails at runtime with "Failed to load the default metallib". CI now builds with xcodebuild to verify the real artifact; swift test runs the model-free suite.
- Reasoning models currently emit <think> content into the response content. Routing it to reasoning_content is a planned follow-up.
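
As a rough idea of what the reasoning/content split could involve, the sketch below separates a single `<think>…</think>` block from the visible answer in a complete (non-streaming) response. `splitReasoning` is a hypothetical helper, not code from this PR, and a streaming implementation would need to handle tags that span chunk boundaries.

```swift
import Foundation

/// Hypothetical helper: separate one <think>…</think> block from the answer.
/// Returns nil reasoning and the text unchanged when no complete block is found.
func splitReasoning(_ text: String) -> (reasoning: String?, content: String) {
    guard let open = text.range(of: "<think>"),
          let close = text.range(of: "</think>",
                                 range: open.upperBound..<text.endIndex)
    else {
        return (nil, text)
    }

    // Text between the tags becomes reasoning_content.
    let reasoning = String(text[open.upperBound..<close.lowerBound])
        .trimmingCharacters(in: .whitespacesAndNewlines)

    // Everything outside the tags stays in content.
    let content = (String(text[..<open.lowerBound]) + String(text[close.upperBound...]))
        .trimmingCharacters(in: .whitespacesAndNewlines)

    return (reasoning, content)
}
```

With this shape, the response mapper could place `reasoning` in `reasoning_content` and `content` in the regular message content field.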

Roadmap

Phase 0 and Phase 1 (+ tool calling) are now done. Next: reasoning/content split, then LLMKube runtime: mlx-server integration (Phase 4).

Build mlx-server out from the Phase 0 scaffold into a working OpenAI-compatible inference server on mlx-swift-lm.

- Wire mlx-swift-lm v3.32.1-alpha; split into a testable MLXServerKit library plus a thin MLXServer executable
- Load an MLX model from a local directory or a HuggingFace id
- GET /v1/models
- POST /v1/chat/completions, both streaming (SSE) and non-streaming
- OpenAI tool calling via mlx-swift-lm's chat-template parsers
- Inferencing protocol so the HTTP layer is testable without a model
- 20 tests: OpenAI types, mapping, SSE framing, HTTP routes
- CI builds with xcodebuild so the Metal shaders are verified, not just compilation

Validated end-to-end against Qwen3-4B-4bit and Qwen3.6-35B-A3B-8bit. Running requires an xcodebuild build: SwiftPM cannot compile mlx-swift's Metal shaders.

Defilan merged commit 4a0f547 into main May 15, 2026
1 check passed

Defilan deleted the feat/phase1-tool-calling branch May 15, 2026 17:44
