Skip to content

v0.2.0

Latest

Choose a tag to compare

@github-actions github-actions released this 11 Jun 03:50
· 37 commits to master since this release

mlx-local-server v0.2.0

Pre-built binaries for Apple Silicon (M1/M2/M3/M4/M5).

Install

curl -L https://github.com/Ar9av/mlx-lm-server/releases/download/v0.2.0/mlx-local-server-v0.2.0-apple-silicon.tar.gz | tar -xz
brew install python@3.13   # if not already installed
./run.sh lm                # start LLM server on :8080
./run.sh image             # start image server on :8002
./run.sh audio             # start audio server on :8001

No Rust required. Python deps are installed automatically by run.sh.

What's included

  • mlx-lm-server — OpenAI-compatible LLM server (port 8080)
  • mlx-audio-server — TTS/STT/audio server (port 8001)
  • mlx-image-server — FLUX.2 image generation (port 8002)
  • run.sh — launcher (handles Python deps + venv)

What's Changed

  • feat: KV-cache quantization passthrough by @Ar9av in #1
  • feat: speculative decoding with draft model by @Ar9av in #2
  • feat: RAM precheck before model load by @Ar9av in #3
  • feat: multi-adapter hot-swap with named registry by @Ar9av in #4
  • feat: vision/multimodal support via image_url by @Ar9av in #5
  • feat: /llms.txt machine-readable API reference by @Ar9av in #6
  • feat: GET /v1/models/:id/info — rich model metadata by @Ar9av in #7
  • feat: POST /v1/benchmark — built-in latency measurement by @Ar9av in #8

New Contributors

  • @Ar9av made their first contribution in #1

Full Changelog: https://github.com/Ar9av/mlx-lm-server/commits/v0.2.0