A Rust-based load balancer and proxy for AI API providers. Aggregates multiple API providers (OpenAI, Azure, DeepSeek, OpenRouter, etc.) for the same model with automatic failover and health checking.
- Multi-Provider Load Balancing: Route requests across multiple providers for the same model (e.g., DeepSeek-R1 from OpenRouter, Azure, or DeepSeek)
- Automatic Failover: Skips providers that are unavailable or that return quota-exceeded errors
- Health Checking: Background health checks monitor provider availability every 30 seconds
- OpenAI-Compatible API: Drop-in replacement for OpenAI API clients
- YAML Configuration: Simple configuration file format
- Model Isolation: Each model is independently configured
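The load-balancing and failover behavior listed above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: `Provider`, `ModelPool`, `mark_unavailable`, and `pick` are hypothetical names, and the sketch assumes plain round-robin selection (as named in the architecture diagram) that skips any provider currently flagged unavailable.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// One upstream provider for a model (hypothetical type for this sketch).
pub struct Provider {
    pub name: String,
    /// Cleared by a failed health check or a quota-exceeded response.
    pub available: AtomicBool,
}

impl Provider {
    pub fn new(name: &str) -> Self {
        Provider {
            name: name.to_string(),
            available: AtomicBool::new(true),
        }
    }
}

/// Round-robin selector over the providers configured for one model.
pub struct ModelPool {
    providers: Vec<Provider>,
    next: AtomicUsize,
}

impl ModelPool {
    pub fn new(names: &[&str]) -> Self {
        ModelPool {
            providers: names.iter().map(|n| Provider::new(n)).collect(),
            next: AtomicUsize::new(0),
        }
    }

    /// Flag a provider as unavailable so later picks skip it.
    pub fn mark_unavailable(&self, name: &str) {
        if let Some(p) = self.providers.iter().find(|p| p.name == name) {
            p.available.store(false, Ordering::Relaxed);
        }
    }

    /// Return the next available provider in round-robin order,
    /// or None when every provider is currently unavailable.
    pub fn pick(&self) -> Option<&Provider> {
        let n = self.providers.len();
        if n == 0 {
            return None;
        }
        let start = self.next.fetch_add(1, Ordering::Relaxed);
        (0..n)
            .map(|offset| &self.providers[start.wrapping_add(offset) % n])
            .find(|p| p.available.load(Ordering::Relaxed))
    }
}
```

Because each model would own its own pool in this sketch, marking a provider unavailable for one model does not affect the others, which is one way to read "Model Isolation".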
```bash
# Build the project
cargo build --release

# Run with default config
cargo run

# Run with custom config
cargo run -- config.yaml
```

Create a config.yaml file:
```yaml
server:
  host: "0.0.0.0"
  port: 8080
  config_file: "config.yaml"

models:
  deepseek-r1:
    model_name: "deepseek-r1"
    providers:
      - name: "openrouter"
        api_base: "https://openrouter.ai/api/v1"
        api_key: "${OPENROUTER_API_KEY}"
        enabled: true
      - name: "azure"
        api_base: "https://your-resource.openai.azure.com"
        api_key: "${AZURE_API_KEY}"
        enabled: true
      - name: "deepseek"
        api_base: "https://api.deepseek.com/v1"
        api_key: "${DEEPSEEK_API_KEY}"
        enabled: true
  gpt-4o:
    model_name: "gpt-4o"
    providers:
      - name: "openai"
        api_base: "https://api.openai.com/v1"
        api_key: "${OPENAI_API_KEY}"
        enabled: true
      - name: "azure"
        api_base: "https://your-resource.openai.azure.com"
        api_key: "${AZURE_API_KEY}"
        enabled: true
```

API keys support environment variable substitution:
```yaml
api_key: "${OPENROUTER_API_KEY}"
```

Set them in your shell:
```bash
export OPENROUTER_API_KEY="your-key-here"
export AZURE_API_KEY="your-key-here"
export DEEPSEEK_API_KEY="your-key-here"
export OPENAI_API_KEY="your-key-here"
```

| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Health check |
| `/health` | GET | Health check |
| `/v1/models` | GET | List available models and providers |
| `/v1/chat/completions` | POST | OpenAI-compatible chat endpoint |
| `/v1/models/:model_name/chat/completions` | POST | Chat completions for specific model |
| `/v1/:model_name/*tail` | POST | Generic proxy for any endpoint |
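To illustrate what the generic proxy pattern `/v1/:model_name/*tail` captures, here is a small sketch using only the standard library. It is an assumption about the intended behavior, not the server's routing code: `split_model_route` and `upstream_url` are hypothetical helpers, and the sketch assumes the tail is appended unchanged to the selected provider's `api_base`.

```rust
/// Split a path of the form `/v1/<model_name>/<tail>` into
/// `(model_name, tail)`. Returns None if the path does not match.
pub fn split_model_route(path: &str) -> Option<(&str, &str)> {
    let rest = path.strip_prefix("/v1/")?;
    // The first segment is the model name; everything after it is the tail.
    let (model, tail) = rest.split_once('/')?;
    if model.is_empty() || tail.is_empty() {
        return None;
    }
    Some((model, tail))
}

/// Build an upstream URL by appending the tail to a provider's `api_base`
/// (assumed forwarding rule for this sketch).
pub fn upstream_url(api_base: &str, tail: &str) -> String {
    format!("{}/{}", api_base.trim_end_matches('/'), tail)
}
```

For example, `/v1/deepseek-r1/embeddings` would split into `deepseek-r1` and `embeddings` (the `embeddings` tail is only an illustrative endpoint). Note that this helper on its own would also match `/v1/chat/completions` and `/v1/models/...`, so a real router has to try the more specific routes in the table first.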
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="dummy"  # Required but unused
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

```
   ┌─────────────┐
   │   Client    │
   └──────┬──────┘
          │
          ▼
┌─────────────────────────────────────────┐
│           AI API Pool Server            │
│             (Axum + Tokio)              │
│                                         │
│  ┌─────────────┐  ┌─────────────────┐   │
│  │   Router    │  │ Health Checker  │   │
│  │  (Routes)   │  │  (Background)   │   │
│  └──────┬──────┘  └────────┬────────┘   │
│         │                  │            │
│         ▼                  │            │
│  ┌─────────────┐           │            │
│  │LoadBalancer │◄──────────┘            │
│  │(Round-Robin)│                        │
│  └──────┬──────┘                        │
│         │                               │
└─────────┼───────────────────────────────┘
          │
     ┌────┴────────┬─────────────┐
     ▼             ▼             ▼
┌──────────┐  ┌──────────┐  ┌──────────┐
│ Provider │  │ Provider │  │ Provider │
│ (OpenAI) │  │ (Azure)  │  │(DeepSeek)│
└──────────┘  └──────────┘  └──────────┘
```
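The arrow from the Health Checker to the LoadBalancer in the diagram suggests shared state that the checker writes and the balancer reads. The sketch below shows one way that could look. It deliberately uses `std::thread` and `RwLock` rather than Tokio so that it has no dependencies; `HealthTable`, `check_once`, `is_healthy`, and `spawn_checker` are hypothetical names, and the `probe` closure stands in for whatever request the real health check sends.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;
use std::time::Duration;

/// Shared availability table: provider name -> last known health.
#[derive(Clone, Default)]
pub struct HealthTable {
    inner: Arc<RwLock<HashMap<String, bool>>>,
}

impl HealthTable {
    /// Probe every provider once and record the result.
    pub fn check_once<F>(&self, providers: &[String], probe: F)
    where
        F: Fn(&str) -> bool,
    {
        for name in providers {
            let ok = probe(name.as_str());
            self.inner.write().unwrap().insert(name.clone(), ok);
        }
    }

    /// Read side, as a load balancer would use it. Providers that have
    /// not been checked yet count as unhealthy in this sketch.
    pub fn is_healthy(&self, name: &str) -> bool {
        self.inner.read().unwrap().get(name).copied().unwrap_or(false)
    }

    /// Spawn a background thread that re-checks all providers forever,
    /// sleeping for `interval` between rounds.
    pub fn spawn_checker<F>(
        &self,
        providers: Vec<String>,
        interval: Duration,
        probe: F,
    ) -> thread::JoinHandle<()>
    where
        F: Fn(&str) -> bool + Send + 'static,
    {
        let table = self.clone();
        thread::spawn(move || loop {
            table.check_once(&providers, &probe);
            thread::sleep(interval);
        })
    }
}
```

With the 30-second interval mentioned in the feature list, the checker would be started as `table.spawn_checker(providers, Duration::from_secs(30), probe)`. In an async server the same shape would more likely be a spawned Tokio task with an interval timer.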
```
src/
├── main.rs            # Entry point
├── lib.rs             # Library exports
├── config.rs          # Configuration loading
├── server.rs          # HTTP server and routes
├── load_balancer.rs   # Provider selection logic
├── providers/         # Provider implementations
│   └── mod.rs
└── health_check.rs    # Provider health monitoring
```

- Runtime: Tokio (async)
- Web Framework: Axum
- HTTP Client: Reqwest
- Serialization: Serde (YAML/JSON)
- Logging: Tracing

License: MIT