Releases: Ar9av/mlx-lm-server
Releases · Ar9av/mlx-lm-server
v0.2.0
mlx-local-server v0.2.0
Pre-built binaries for Apple Silicon (M1/M2/M3/M4/M5).
Install
curl -L https://github.com/Ar9av/mlx-lm-server/releases/download/v0.2.0/mlx-local-server-v0.2.0-apple-silicon.tar.gz | tar -xz
brew install python@3.13 # if not already installed
./run.sh lm # start LLM server on :8080
./run.sh image # start image server on :8002
./run.sh audio # start audio server on :8001No Rust required. Python deps are installed automatically by run.sh.
What's included
mlx-lm-server— OpenAI-compatible LLM server (port 8080)mlx-audio-server— TTS/STT/audio server (port 8001)mlx-image-server— FLUX.2 image generation (port 8002)run.sh— launcher (handles Python deps + venv)
What's Changed
- feat: KV-cache quantization passthrough by @Ar9av in #1
- feat: speculative decoding with draft model by @Ar9av in #2
- feat: RAM precheck before model load by @Ar9av in #3
- feat: multi-adapter hot-swap with named registry by @Ar9av in #4
- feat: vision/multimodal support via image_url by @Ar9av in #5
- feat: /llms.txt machine-readable API reference by @Ar9av in #6
- feat: GET /v1/models/:id/info — rich model metadata by @Ar9av in #7
- feat: POST /v1/benchmark — built-in latency measurement by @Ar9av in #8
New Contributors
Full Changelog: https://github.com/Ar9av/mlx-lm-server/commits/v0.2.0