Unified LLM usage management — API proxy, session diagnostics, multi-CLI orchestration.
This project started from a need to escape deep vendor lock-in with a single AI coding tool. After investigating hidden behaviors in Claude Code — silent token inflation, false rate limits, context stripping, and opaque feature flags — it became clear that relying on one vendor's black box was a risk. llm-relay was built to take back visibility and control: monitor what's actually happening, diagnose problems independently, and orchestrate across multiple CLI tools (Claude Code, Codex, Gemini) so no single provider becomes a single point of failure.
- Proxy: Transparent API proxy with cache/token monitoring and 12-strategy pruning
- Detect: 7 detectors (orphan, stuck, synthetic, bloat, cache, resume, microcompact)
- Recover: Session recovery and doctor (7 health checks)
- Guard: 4-tier threshold daemon with dual-zone classification
- Cost: Per-1% cost calculation and rate-limit header analysis
- Orch: Multi-CLI orchestration (Claude Code, Codex CLI, Gemini CLI)
- Display: Multi-CLI session monitor with provider badges and liveness detection
- MCP: 7 tools via stdio transport (cli_delegate, cli_status, cli_probe, orch_delegate, orch_history, relay_stats, session_turns)
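
The Guard idea (a 4-tier threshold with a dual-zone split) can be sketched in a few lines. The tier names, cutoffs, and the `classify` helper below are illustrative assumptions for this sketch, not llm-relay's actual API or configuration:

```python
# Illustrative sketch of 4-tier threshold classification with a
# dual-zone (soft/hard) split. Tier names and cutoffs are invented
# for illustration; llm-relay's real values may differ.

TIERS = [  # (tier name, usage cutoff as fraction of context window)
    ("ok",       0.50),
    ("notice",   0.70),
    ("warn",     0.85),
    ("critical", 1.00),
]

def classify(usage: float) -> tuple[str, str]:
    """Map a context-usage fraction to (tier, zone).

    Zone is 'soft' below the warn cutoff (advisory only) and
    'hard' at or above it (the daemon may intervene).
    """
    for name, cutoff in TIERS:
        if usage <= cutoff:
            zone = "soft" if name in ("ok", "notice") else "hard"
            return name, zone
    return "critical", "hard"

print(classify(0.40))  # ('ok', 'soft')
print(classify(0.90))  # ('critical', 'hard')
```

A daemon built this way only needs the current usage fraction per session; the zone decides whether it merely annotates the display or actively triggers pruning.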
```bash
# CLI only (diagnostics, recovery, orchestration)
pip install llm-relay

# With proxy + web dashboard
pip install "llm-relay[proxy]"

# With MCP server (Python 3.10+)
pip install "llm-relay[mcp]"

# Everything
pip install "llm-relay[all]"
```

```bash
llm-relay scan     # Session health check (7 detectors)
llm-relay doctor   # Configuration health check (7 checks)
llm-relay recover  # Extract session context for resumption
```

```bash
# Option 1: Direct
pip install "llm-relay[proxy]"
uvicorn llm_relay.proxy.proxy:app --host 0.0.0.0 --port 8083

# Option 2: Docker
cp .env.example .env   # Edit as needed
docker compose up -d
```

Then open:
- `/dashboard/` — CLI status, cost, delegation history
- `/display/` — Turn counter with CC/Codex/Gemini session cards
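
The per-1% cost figure shown on the dashboard follows from simple arithmetic. A minimal sketch, assuming an example 200k-token window and example per-token pricing (neither value is taken from llm-relay):

```python
# Illustrative per-1% cost estimate: what one percent of the context
# window costs if filled entirely with input tokens. The window size
# and price below are example values, not llm-relay's configuration.

CONTEXT_WINDOW = 200_000        # tokens (example)
INPUT_PRICE_PER_MTOK = 3.00     # USD per million input tokens (example)

def cost_per_percent(window: int = CONTEXT_WINDOW,
                     price_per_mtok: float = INPUT_PRICE_PER_MTOK) -> float:
    """USD cost of filling 1% of the context window with input tokens."""
    tokens_per_percent = window / 100
    return tokens_per_percent * price_per_mtok / 1_000_000

print(f"${cost_per_percent():.4f} per 1% of context")  # $0.0060 per 1% of context
```

Expressing cost per percentage point (rather than per token) makes the number directly comparable to the usage gauge a session monitor already shows.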
```bash
llm-relay-mcp  # stdio transport, 7 tools
```

```bash
# Set in Claude Code
export ANTHROPIC_BASE_URL=http://localhost:8080
```

| CLI | Status |
|---|---|
| Claude Code | Fully supported |
| OpenAI Codex | Fully supported |
| Gemini CLI | Display supported, oauth-personal has known 403 server-side bug (#25425) |
- Python >= 3.9
- MCP tools require Python >= 3.10
MIT
Part of the QuartzUnit open-source ecosystem.