High-performance OpenAI API-compatible reverse proxy router for LLM backends
Part of the DataStudio ecosystem — intelligent request routing for LLM inference services
LLM Router is an OpenAI API-compatible reverse proxy router built with Go, providing LLM backend management for DataStudio. It supports intelligent load balancing, sliding-window RPM rate limiting, async health checking, automatic failover, and hot-reloading of configurations.
LLM Router is a companion tool to DataStudio (located in `tools/LLMRouter/`). By pointing `model.api_base` in DataStudio's config to the Router address, all requests are automatically routed across multiple healthy backend instances.
| Feature | Description |
|---|---|
| Zero-copy forwarding | Only extracts the model field with gjson — no full JSON parsing |
| COW backend pool | Lock-free reads (atomic.Value), Copy-on-Write for writes |
| Three routing strategies | Weighted random / Least connections (P2C) / Least waiting (P2C + Prometheus) |
| RPM rate limiting | Sliding window counter, atomic pre-deduction, per-backend granularity |
| Async health checks | Concurrent probing of /v1/models, marks unhealthy after N consecutive failures |
| Hot config reload | Watches YAML file changes, incremental backend updates without restart |
| Smart retry | Error classification (timeout / 5xx / connection refused), exponential backoff + jitter |
| Web monitoring dashboard | Real-time backend status, RPM, load metrics, trend charts |
| Prometheus metrics collection | Auto-collects running/waiting/GPU cache metrics from vLLM/SGLang |
| Webhook notifications | Failure/recovery/periodic reports pushed to WeCom (Enterprise WeChat) |
| Multi-node deployment tool | One-click vLLM/SGLang deployment to multiple nodes with auto-generated router configs |
| Benchmarking tool | Multi-process + async coroutines, supports 10K+ concurrency with live visualization |
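The "COW backend pool" entry above refers to the copy-on-write pattern: every request reads a snapshot through `atomic.Value` without taking a lock, while the rare writers (health checks, hot reloads) clone the slice, mutate the copy, and swap it in. Here is a minimal, illustrative sketch of that pattern; the type and method names are hypothetical, not the router's actual API:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Backend is a simplified stand-in for the router's backend record.
type Backend struct {
	Name   string
	Weight float64
}

// Pool demonstrates copy-on-write: readers load a snapshot via
// atomic.Value (lock-free); writers clone, mutate, and swap.
type Pool struct {
	v  atomic.Value // always holds a []Backend
	mu sync.Mutex   // serializes writers only
}

func NewPool() *Pool {
	p := &Pool{}
	p.v.Store([]Backend{})
	return p
}

// Snapshot is the lock-free read path taken on every request.
func (p *Pool) Snapshot() []Backend {
	return p.v.Load().([]Backend)
}

// Add clones the current slice, appends, and atomically publishes the copy.
// Readers holding the old snapshot are never disturbed.
func (p *Pool) Add(b Backend) {
	p.mu.Lock()
	defer p.mu.Unlock()
	old := p.v.Load().([]Backend)
	next := make([]Backend, len(old), len(old)+1)
	copy(next, old)
	next = append(next, b)
	p.v.Store(next)
}

func main() {
	p := NewPool()
	p.Add(Backend{Name: "vllm-0", Weight: 1.0})
	p.Add(Backend{Name: "vllm-1", Weight: 2.0})
	fmt.Println(len(p.Snapshot())) // 2
}
```

The trade-off is classic read-heavy optimization: reads are a single atomic load, while each write pays an O(n) copy, which is fine for a pool that changes only on health transitions or config reloads.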
```
go_router/
├── cmd/router/                  # Entry point
├── internal/
│   ├── config/                  # Config loading & defaults
│   ├── handler/                 # HTTP request handlers
│   ├── health/                  # Health checks & config hot-reload
│   ├── model/                   # Data models (Backend, Metrics, etc.)
│   ├── monitor/                 # Monitoring (Prometheus collector, Web dashboard, TUI dashboard)
│   ├── notify/                  # Webhook notifications (WeCom)
│   ├── pool/                    # COW backend pool management
│   ├── ratelimit/               # Sliding window RPM rate limiter
│   ├── router/                  # Core router (forwarding, retry, error classification)
│   ├── routing/                 # Routing strategies (shuffle / least-connections / least-waiting)
│   └── util/                    # Utilities (URL, stats, Prometheus parsing)
├── configs/                     # Backend config files
│   ├── openai_config.yaml
│   └── self_deployed_config.yaml
├── scripts/
│   ├── self_deploy.py           # Multi-node vLLM/SGLang deployment tool
│   ├── benchmark.py             # High-performance benchmarking tool
│   └── install_go.sh            # Go environment installation script
├── config.yaml                  # Main configuration file
├── run.sh                       # One-click launch script
└── docs/
    ├── quick_start.md           # Quick start guide (English)
    ├── quick_start_zh.md        # Quick start guide (Chinese)
    └── readme_zh.md             # Chinese README
```
- Go 1.23+ (use `scripts/install_go.sh` to install)
- Python 3.8+ (for deployment and benchmarking scripts)
```bash
# Build
go build -o llm-router ./cmd/router/

# Run
./llm-router -config config.yaml
```

Or use the one-click launch script:

```bash
bash run.sh
```

After startup:

- API service: `http://0.0.0.0:8000`
- Web dashboard: `http://0.0.0.0:80`
Point `model.api_base` in your DataStudio config to the Router:

```python
model = dict(
    model="Qwen3-VL-30B-A3B-Instruct",
    api_base="http://<router-host>",
    port=8000,
    thread_num=1024,
    return_dict=True,
)
```

The Router automatically distributes requests across all healthy backend instances.
For detailed usage, see the Quick Start Guide (`docs/quick_start.md`, with a Chinese version at `docs/quick_start_zh.md`).
| Method | Path | Description |
|---|---|---|
| POST | `/v1/chat/completions` | Route and forward chat completion requests |
| GET | `/v1/models` | Return the registered model list (OpenAI-compatible format) |
| GET | `/health` | Router health check |
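Requests to `/v1/chat/completions` are routed by the `model` field alone; per the feature list, the router extracts it with gjson rather than parsing the full body. The sketch below shows the same extraction using only the standard library (which does decode the whole body, unlike gjson); `extractModel` is an illustrative helper, not the router's actual code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// modelOnly captures just the "model" field of a chat-completion body;
// messages and other fields are ignored by the decoder.
type modelOnly struct {
	Model string `json:"model"`
}

// extractModel returns the model name a request should be routed by.
func extractModel(body []byte) (string, error) {
	var m modelOnly
	if err := json.Unmarshal(body, &m); err != nil {
		return "", err
	}
	return m.Model, nil
}

func main() {
	body := []byte(`{"model":"Qwen3-VL-30B-A3B-Instruct","messages":[{"role":"user","content":"hi"}]}`)
	name, _ := extractModel(body)
	fmt.Println(name) // Qwen3-VL-30B-A3B-Instruct
}
```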
```yaml
server:
  host: "0.0.0.0"
  port: 8000

routing:
  strategy: "simple-shuffle"   # simple-shuffle | least-connections | least-waiting
  num_retries: 3

health_check:
  interval: 30
  timeout: 10
  failure_threshold: 3

dashboard:
  enabled: true
  web_port: 80

backends:
  config_dir: "configs"
  enabled_sources:
    - "openai"
    - "self_deployed"
```

Each `source_type` corresponds to a YAML file in `configs/` (the filename prefix determines the type):
```yaml
# configs/self_deployed_config.yaml
model_list:
  - model_name: "Qwen3-VL-30B-A3B-Instruct"
    litellm_params:
      model: "openai/Qwen3-VL-30B-A3B-Instruct"
      api_base: "http://10.0.0.1:8000/v1"
      api_key: "dummy"
    supports_vision: true
    weight: 1.0
    source_type: "self_deployed"
    rpm_limit: null   # null = unlimited
```

| Argument | Default | Description |
|---|---|---|
| `-config` | `config.yaml` | Path to main config file |
| `-log-level` | from config | Override log level: debug / info / warn / error |
| `-enabled-sources` | from config | Comma-separated list of enabled source types |
| Project | Description | Link |
|---|---|---|
| DataStudio | Config-driven multimodal data processing pipeline | GitHub |
| DataVis | Web-based multimodal data visualization & analysis | GitHub |
| Honey-Data-15M | 15M high-quality QA pairs produced by DataStudio | HuggingFace |
| Bee | Fully open-source MLLM project | Project Page |