Feature/api parity roadmap by solderzzc · Pull Request #3 · SharpAI/mlx-server

solderzzc · 2026-03-24T06:27:52Z

No description provided.

- Add stop sequences (stop parameter, text trimming)
- Add /v1/completions text completion endpoint (streaming + non-streaming)
- Accurate token counting via lmInput.text.tokens.size (replaces chars÷4)
- Add seed parameter for deterministic generation (MLXRandom.seed)
- Add stream_options.include_usage for streaming token stats
- Add CORS support via --cors CLI flag with CORSMiddleware
- Extract handler closures into standalone functions (Swift type-checker fix)
- Add ServerConfig struct for CLI defaults bundling
- Expand test suite: 6 → 13 test sections (32 assertions total)

All 32 tests pass.
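To illustrate the stop-sequence trimming listed above, here is a minimal sketch of how generated text might be cut at the earliest stop sequence. The helper name `trimAtStopSequence` and its return shape are assumptions made for illustration; the PR's actual implementation is not shown in this conversation and may differ.

```swift
import Foundation

/// Hypothetical helper (not the PR's actual function): cuts `text` at the
/// earliest occurrence of any stop sequence, excluding the stop text itself.
func trimAtStopSequence(_ text: String, stops: [String]) -> (text: String, stopped: Bool) {
    // Find the earliest match position across all non-empty stop sequences.
    let earliest = stops
        .filter { !$0.isEmpty }
        .compactMap { text.range(of: $0)?.lowerBound }
        .min()
    guard let cut = earliest else { return (text, false) }
    return (String(text[..<cut]), true)
}

// Usage: the caller could report a "stop" finish reason when `stopped` is true.
let trimmed = trimAtStopSequence("Paris.\nUser: next question", stops: ["\nUser:", "###"])
print(trimmed.text, trimmed.stopped)
```

When streaming, a stop sequence can straddle two chunks, so a server would likely need to hold back a short tail of the output before emitting it; that buffering is omitted here.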

…ra sampling params

- Add response_format: { type: 'json_object' } with prompt injection + fence stripping
- Add --vision CLI flag for VLM model loading via VLMModelFactory
- Parse OpenAI multipart content (string or [{type:'text',...},{type:'image_url',...}])
- Decode base64 data URIs and HTTP URLs into UserInput.Image for VLM inference
- Accept top_k, frequency_penalty, presence_penalty (API compat)
- Add MLXVLM package dependency
- Add 4 new regression tests (Tests 14-17), total: 38 assertions

All 38 tests pass.
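The multipart content parsing and data URI decoding described above could look roughly like the following sketch. It uses only Foundation; the type names `Message`, `MessageContent`, `ContentPart` and the function `decodeDataURI` are hypothetical, and the conversion into `UserInput.Image` is left out because it depends on the MLX libraries.

```swift
import Foundation

/// OpenAI-style message content: either a plain string or an array of typed parts.
enum MessageContent: Decodable {
    case text(String)
    case parts([ContentPart])

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        // Try the plain-string form first, then fall back to the array form.
        if let plain = try? container.decode(String.self) {
            self = .text(plain)
        } else {
            self = .parts(try container.decode([ContentPart].self))
        }
    }
}

struct ContentPart: Decodable {
    struct ImageURL: Decodable { let url: String }
    let type: String          // "text" or "image_url"
    let text: String?
    let imageURL: ImageURL?

    enum CodingKeys: String, CodingKey {
        case type, text
        case imageURL = "image_url"
    }
}

struct Message: Decodable {
    let role: String
    let content: MessageContent
}

/// Decodes a `data:<mime>;base64,<payload>` URI into raw bytes.
/// Returns nil for anything else (for example an HTTP URL, which would be fetched instead).
func decodeDataURI(_ uri: String) -> Data? {
    guard uri.hasPrefix("data:"), let comma = uri.firstIndex(of: ",") else { return nil }
    let header = uri[..<comma]
    guard header.hasSuffix(";base64") else { return nil }
    return Data(base64Encoded: String(uri[uri.index(after: comma)...]))
}
```

Trying the string form first keeps text-only clients working unchanged, while vision clients can send the array form in the same field.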

…utdown, stats

- Add --mem-limit CLI flag (sets Memory.memoryLimit + Memory.cacheLimit)
- Add ServerStats actor tracking requests, tokens, generation timing
- Enhanced /health endpoint with GPU memory (active/peak/cache/total), architecture, request/token stats
- Add /metrics Prometheus-compatible endpoint (8 metrics with TYPE/HELP)
- Add SIGTERM/SIGINT graceful shutdown handlers
- Wire stats tracking into all 6 handler functions
- Add 3 new regression tests (Tests 18-20), total: 49 assertions

All 49 tests pass.
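As a rough illustration of the stats tracking and the Prometheus-compatible output described above, the sketch below pairs a small actor with a pure formatting function. The metric names (`mlx_requests_total` and so on), the `StatsSnapshot` type, and the method names are assumptions; the PR reports 8 metrics, of which only three illustrative ones appear here.

```swift
import Foundation

/// Plain value snapshot, so formatting can be exercised without the actor.
struct StatsSnapshot {
    var requestsTotal = 0
    var tokensGenerated = 0
    var generationSeconds = 0.0
}

/// Hypothetical stand-in for the PR's ServerStats actor.
actor ServerStats {
    private var snapshot = StatsSnapshot()

    /// Called once per completed request by a handler.
    func record(tokens: Int, seconds: Double) {
        snapshot.requestsTotal += 1
        snapshot.tokensGenerated += tokens
        snapshot.generationSeconds += seconds
    }

    func current() -> StatsSnapshot { snapshot }
}

/// Renders counters in the Prometheus text exposition format (HELP, TYPE, sample).
func renderPrometheus(_ s: StatsSnapshot) -> String {
    let lines = [
        "# HELP mlx_requests_total Total requests handled.",
        "# TYPE mlx_requests_total counter",
        "mlx_requests_total \(s.requestsTotal)",
        "# HELP mlx_tokens_generated_total Total tokens generated.",
        "# TYPE mlx_tokens_generated_total counter",
        "mlx_tokens_generated_total \(s.tokensGenerated)",
        "# HELP mlx_generation_seconds_total Total time spent generating.",
        "# TYPE mlx_generation_seconds_total counter",
        "mlx_generation_seconds_total \(s.generationSeconds)",
    ]
    return lines.joined(separator: "\n") + "\n"
}
```

A /metrics handler could then return `renderPrometheus(await stats.current())` with a `text/plain` content type. An actor serializes updates from concurrent handlers without explicit locks.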

- Add --api-key CLI option for bearer token authentication
- ApiKeyMiddleware validates Authorization: Bearer <key> header
- Health and metrics endpoints exempt from auth (monitoring tools)
- Returns 401 with OpenAI-style error JSON for invalid/missing keys
- Config line shows auth=enabled/disabled
- Add Test 21: 5 auth assertions (unauthenticated, wrong key, valid key, health exempt, metrics exempt)

All 54 tests pass.
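The bearer-token check described above can be sketched as a pure function, independent of any web framework. The function name `authRejection`, the `AuthRejection` type, and the exact error fields are assumptions modelled on the general shape of OpenAI error responses, not code taken from the PR's ApiKeyMiddleware.

```swift
import Foundation

/// Describes a rejected request; `nil` from `authRejection` means the request may proceed.
struct AuthRejection {
    let status: Int
    let body: String
}

/// Hypothetical check mirroring the behaviour listed above.
/// - `apiKey` is nil when the server was started without --api-key (auth disabled).
func authRejection(path: String, authorizationHeader: String?, apiKey: String?) -> AuthRejection? {
    // Monitoring endpoints stay reachable without credentials.
    let exemptPaths: Set<String> = ["/health", "/metrics"]
    guard let apiKey = apiKey, !exemptPaths.contains(path) else { return nil }

    let prefix = "Bearer "
    if let header = authorizationHeader,
       header.hasPrefix(prefix),
       String(header.dropFirst(prefix.count)) == apiKey {
        return nil
    }
    // Error field values here are illustrative assumptions.
    let body = #"{"error":{"message":"Invalid or missing API key.","type":"invalid_request_error","code":"invalid_api_key"}}"#
    return AuthRejection(status: 401, body: body)
}
```

The five cases covered by Test 21 (unauthenticated, wrong key, valid key, health exempt, metrics exempt) map directly onto calls to this function.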

- Add PromptCache actor (saves/restores KV cache state per-layer)
- Cache keyed by system prompt text hash
- On cache hit: restore KV state, skip cached prefix tokens, process only new tokens
- On cache miss: generate normally, save system prompt KV state asynchronously
- Health/metrics endpoints exempt from cache
- Uses container.perform() for direct model access with cache-aware generation

All 54 tests pass.
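The hit/miss flow above can be sketched without MLX by treating the per-layer KV state as an opaque placeholder. Everything named here (`PromptCacheStore`, `CachedPrefix`, `tokensToProcess`, and the FNV-1a hash) is a hypothetical stand-in; the PR uses an actor and real KV tensors, and its hash function is not stated in this conversation.

```swift
import Foundation

/// Placeholder for saved state; real per-layer KV tensors are replaced by Float arrays.
struct CachedPrefix {
    let tokenCount: Int          // number of system-prompt tokens covered by the cache
    let layerStates: [[Float]]   // one entry per layer (illustrative only)
}

/// Synchronous sketch of a cache keyed by a hash of the system prompt text.
struct PromptCacheStore {
    private var entries: [UInt64: CachedPrefix] = [:]

    /// FNV-1a 64-bit hash, used here as an assumed stand-in for the PR's text hash.
    static func key(for systemPrompt: String) -> UInt64 {
        var hash: UInt64 = 0xcbf29ce484222325
        for byte in systemPrompt.utf8 {
            hash ^= UInt64(byte)
            hash = hash &* 0x100000001b3
        }
        return hash
    }

    func lookup(systemPrompt: String) -> CachedPrefix? {
        entries[Self.key(for: systemPrompt)]
    }

    mutating func save(systemPrompt: String, prefix: CachedPrefix) {
        entries[Self.key(for: systemPrompt)] = prefix
    }
}

/// On a hit, only the tokens after the cached prefix need a forward pass;
/// on a miss, the whole prompt is processed.
func tokensToProcess(_ promptTokens: [Int], systemPrompt: String, cache: PromptCacheStore) -> [Int] {
    guard let hit = cache.lookup(systemPrompt: systemPrompt),
          hit.tokenCount <= promptTokens.count else {
        return promptTokens
    }
    return Array(promptTokens.dropFirst(hit.tokenCount))
}
```

Keying by hash alone means two different prompts with the same hash would restore the wrong state, so a production cache might also store and compare the prompt text.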

The Metal shader library is required at runtime by MLX Swift. Install via: python3 -m venv + pip install mlx + copy metallib. Also trigger CI on feature/* branches.

simba added 6 commits March 22, 2026 23:10

fix: CI — install mlx.metallib from Python mlx package

4086ce9

solderzzc merged commit 2d382ef into main Mar 24, 2026
3 checks passed

solderzzc deleted the feature/api-parity-roadmap branch March 24, 2026 16:50

solderzzc merged 6 commits into main from feature/api-parity-roadmap