Tool Submission: Rapid-MLX
Name: Rapid-MLX
URL: https://github.com/raullenchai/Rapid-MLX
Category: Developer Tools / AI / Local LLM Inference
License: Apache-2.0
Language: Python
What is Rapid-MLX?
An OpenAI-compatible local LLM inference server built specifically for Apple Silicon. It delivers 2–4x faster token generation than Ollama by running directly on MLX (Apple's ML framework) with a highly optimized streaming pipeline.
Key Features
- OpenAI-compatible API — drop-in replacement, works with any OpenAI SDK client
- 2–4x faster than Ollama on Apple Silicon (M1/M2/M3/M4)
- Tool calling — full function/tool calling support for agentic workflows
- Reasoning models — streaming `<think>` token support (Qwen3, DeepSeek-R1, etc.)
- Vision & Audio — multimodal model support
- Structured output — JSON schema enforcement
- Prompt caching — persistent KV cache across requests for faster multi-turn chats
- Speculative decoding (MTP) — 1.4x additional decode speedup on supported models
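Because the server speaks the OpenAI chat-completions protocol, any OpenAI-style client can talk to it. The sketch below builds a standard `/v1/chat/completions` request body; the port (`8080`) and the model id (`qwen3-8b`) are assumptions for illustration — check the project's README for the actual defaults.

```python
import json

def build_chat_request(model: str, messages: list, stream: bool = True) -> bytes:
    """Serialize an OpenAI-style /v1/chat/completions request body."""
    return json.dumps(
        {"model": model, "messages": messages, "stream": stream}
    ).encode("utf-8")

body = build_chat_request(
    "qwen3-8b",  # hypothetical model id; substitute whatever you have loaded
    [{"role": "user", "content": "Hello from a local MLX server!"}],
)

# With Rapid-MLX running, this body would be POSTed to
# http://localhost:8080/v1/chat/completions — or, more simply, any
# OpenAI SDK client can be pointed at that base URL with a dummy API key.
print(json.loads(body)["model"])
```

The same drop-in pattern is what makes the tool-calling and structured-output features usable from existing agent frameworks without code changes.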
Install
# Homebrew (macOS)
brew install raullenchai/rapid-mlx/rapid-mlx
# pip
pip install rapid-mlx
Why it's relevant to developers
Local LLM inference on Mac has historically been bottlenecked by Ollama's overhead. Rapid-MLX bypasses that by integrating directly with Apple's MLX framework, giving developers a fully OpenAI-compatible server that runs substantially faster — making local AI development and testing much more practical on MacBooks and Mac Studios.
Benchmark (Qwen3.5-9B, M3 Ultra)
| Engine | Tokens/sec |
|---|---|
| Rapid-MLX | ~95 tok/s |
| mlx-lm | ~90 tok/s |
| Ollama | ~23 tok/s |
Rapid-MLX is roughly 4.1x faster than Ollama on this configuration (~95 vs ~23 tok/s).
Happy to provide any additional info or assets needed.