feat: add mlx-server runtime to the metal-agent #471

Merged: Defilan merged 1 commit into defilantech:main from Defilan:feat/metal-agent-mlx-server-runtime on May 17, 2026

Defilan commented May 15, 2026

What

Add mlx-server as a metal-agent inference runtime, alongside llama-server,
omlx, ollama, and vllm-swift.

Why

mlx-server (github.com/Defilan/mlx-server) is a native
OpenAI-compatible MLX inference server for Apple Silicon. This lets the
metal-agent supervise it as a first-class InferenceService runtime, with
the same lifecycle the agent already gives the other runtimes.

No linked issue.

How

  • MLXServerExecutor (pkg/agent/executor_mlx_server.go) implements
    ProcessExecutor: it resolves the model directory, spawns mlx-server,
    polls /health, and stops it with SIGTERM plus a grace period. Unlike the
    vllm-swift executor, which uses an ephemeral port per spawn, it binds a
    fixed port (default 8080) so clients keep a stable base URL across
    respawns.
  • runtimeMLXServer wired into the executor-selection switch and
    validateRuntimeFormat (MLX directories and HF safetensors accepted, gguf
    rejected — same as vllm-swift).
  • --mlx-server-bin, --mlx-server-port, and --mlx-server-startup-timeout
    flags on the metal-agent, with binary auto-detection.
  • mlx-server-specific flags such as --tool-call-format and --reasoning
    ride through InferenceService.spec.extraArgs — no new CRD fields.

Checklist

  • Tests added/updated — executor_mlx_server_test.go covers arg construction, the fixed-port executor, health polling, and process teardown
  • make test passes locally
  • make lint passes locally
  • Commit messages follow conventional commits
  • All commits are signed off (git commit -s) per DCO
  • Documentation updated (if user-facing change) — the macOS Metal guide's runtime list is a small follow-up

Add mlx-server as a metal-agent inference runtime alongside
llama-server, omlx, ollama, and vllm-swift. mlx-server
(github.com/Defilan/mlx-server) is a native OpenAI-compatible MLX
inference server; this lets the agent supervise it as a first-class
InferenceService runtime.

- MLXServerExecutor (executor_mlx_server.go) implements ProcessExecutor:
  resolves the model directory, spawns mlx-server, polls /health, and
  stops it with SIGTERM plus a grace period. Unlike the vllm-swift
  executor's per-spawn ephemeral port it binds a fixed port (default
  8080) so clients keep a stable base URL across respawns.
- runtimeMLXServer wired into the executor-selection switch and into
  validateRuntimeFormat (MLX directories and HF safetensors accepted,
  gguf rejected — same as vllm-swift).
- --mlx-server-bin, --mlx-server-port, and --mlx-server-startup-timeout
  flags on the metal-agent, with binary auto-detection.
- mlx-server-specific flags such as --tool-call-format and --reasoning
  ride through InferenceService.spec.extraArgs; no new CRD fields.

Unit tests cover argument construction, the fixed-port executor,
health polling, and process teardown.

Signed-off-by: Christopher Maher <chris@mahercode.io>
codecov Bot commented May 15, 2026

Codecov Report

❌ Patch coverage is 28.02548% with 113 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/agent/executor_mlx_server.go | 40.74% | 63 Missing and 1 partial ⚠️ |
| cmd/metal-agent/main.go | 0.00% | 33 Missing ⚠️ |
| pkg/agent/agent.go | 0.00% | 16 Missing ⚠️ |

Defilan merged commit 8bf9808 into defilantech:main on May 17, 2026
21 checks passed
github-actions Bot mentioned this pull request on May 17, 2026