Slash command + FastAPI proxy. Session-pinned multi-key rotation, Anthropic↔OpenAI protocol translation, rate-limit failover, budget circuit breaker. Cut Claude API spend by 60-90% on read-heavy workloads by dispatching the heavy reading to a swarm of cheap worker models.
Quick Start · How It Works · Docs · 中文文档 · FAQ
If you use Claude Code, Codex CLI, Aider, or Cursor for serious work, you have hit at least one of these:
- 💸 The bill. One Opus loop on a 50k-token repo costs
$7.50. One bad afternoon:$150. - 🚦 HTTP 429. "Too Many Requests" — the moment you fire two agents in parallel, your single API key is rate-limited.
- 🔒 Vendor lock-in. You pay full price even when 80% of the work is "scan 50 files, find unused imports" — a task a
gpt-4o-miniwould handle for 1/100th the cost. - 🧠 Context bloat. Your expensive model wastes 80k input tokens re-reading code it could have summarized in 200.
/subclaw is a slash command + a FastAPI gateway that fixes all four.
| What you get | How it works |
|---|---|
| 60-90% lower spend (read-heavy) | Heavy work (scan, draft, find) goes to models ~15-20x cheaper; Opus only audits the final summary. |
| Bypass 429 rate limits | N API keys, round-robin across worker sessions, transparent failover on throttling. |
| Prompt cache locality | Each session is pinned to one key so Anthropic's prompt cache stays warm. Up to 90% cache hit rate. |
| Drop-in protocol translation | Workers using Claude Code (Anthropic protocol) can hit OpenAI endpoints transparently. No code changes. |
| Budget circuit breaker | Per-session and per-day USD cap. Stops spending before the bill arrives. |
| Self-hosted, no SaaS | Single python app.py on localhost:4748. Your keys never leave your box. |
| Generic over models | Works with Anthropic, OpenAI, OpenRouter, any Anthropic-compatible endpoint. Tiers: cheap / balanced / smart. |
You (Claude Code / Codex / Aider) = Team Lead / Supervisor
|
| /subclaw "audit this repo"
v
run-claw-pool.sh (fans N briefs to N workers, parallel)
|
v
claw-proxy :4748 (owns key pool, pins one key/session,
| translates protocol, fails over on 429)
v
Worker models (cheap / balanced / smart) — isolated context
return CONCISE file:line evidence back to you
|
v
You synthesize + audit the N reports → final answer
The killer detail: x-session-id session affinity. A worker that finishes a multi-turn task will keep hitting the same API key, so Anthropic's prompt cache stays warm. Other gateways do not do this.
git clone https://github.com/Akichoooo/subclaw.git
cd subclaw/proxy
cp keys.example.json keys.json
# Edit keys.json: drop in your API keys, model aliases, and budget caps.
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
python app.py
# claw-proxy listening on http://localhost:4748For Claude Code:
cp ./cli-skills/claude/subclaw.md ~/.claude/commands/
cp ./cli-skills/run-claw-pool.sh ~/.claude/scripts/
cp ./cli-skills/live_tree_ui.py ~/.claude/scripts/
chmod +x ~/.claude/scripts/run-claw-pool.sh
chmod +x ~/.claude/scripts/live_tree_ui.pyFor Codex CLI / Aider / Cursor: see docs/integrations.md.
/subclaw find all unused imports in the backend and provide file:line evidence
/subclaw draft unit tests for src/payments/
/subclaw audit this whole repo for security smells
Your main model breaks the task into N briefs, the proxy fans them to cheap models in parallel, and you read N concise reports.
- 🧹 Mass code auditing — 50 files, Opus only reads 50 summaries.
- 🔍 Repo-wide search — "find all callers of
deprecated_api()". - 🧪 Test generation — draft 30 test files with a cheap model, Opus reviews.
- 📝 Documentation sweep — generate docstrings for 200 functions, batch.
- 🛡️ Security smell scan — "audit for hardcoded secrets" across the monorepo.
- 🔁 Refactor planning — cheap model proposes diffs, smart model critiques, Opus synthesizes.
See docs/use-cases.md for the full playbook.
Illustrative projections from the assumptions shown below — not independently measured. Real savings vary heavily by workload.
| Scenario | Single Opus (no proxy) | subclaw (Opus + cheap swarm) | Saving |
|---|---|---|---|
| Audit 50 files (50k tokens) | $7.50 input + 10× loop ≈ $75.00 |
Opus: $0.10 + 50× Haiku parallel $0.0075 ≈ $0.11 |
~99% (est.) |
| Repo-wide grep (200k tokens) | $30 input | $0.30 (200k input × $1.50/1M cached) | ~99% (est.) |
| Daily budget: 20 audits/day | ~$1,500 | ~$15 | ~99% (est.) |
See docs/benchmarks.md for methodology and full data.
subclaw is one of several LLM gateways. Why this one?
| Feature | subclaw | LiteLLM | OpenRouter | Portkey | claude-code-router |
|---|---|---|---|---|---|
| Self-hosted, no SaaS | ✅ | ✅ | ❌ | ✅ | |
| Session affinity for prompt cache | ✅ core | ❌ | n/a | ❌ | ❌ |
| Multi-key rotation w/ budget breaker | ✅ | ❌ | n/a | ❌ | |
| Anthropic↔OpenAI streaming translation | ✅ | ✅ | n/a | ✅ | ✅ |
| Slash command UX (Claude Code / Codex) | ✅ | ❌ | ❌ | ❌ | ✅ |
| 429 failover with key rebinding | ✅ | n/a | ✅ | ❌ | |
| Generic over any model | ✅ | ✅ | ✅ | ✅ | |
| Designed for cost-optimized swarms | ✅ | ❌ | ❌ | ❌ | ❌ |
Full comparison: docs/comparisons.md.
Q: Does subclaw require me to use Claude Code? A: No. The slash command is one frontend. You can drive the gateway from any HTTP client that speaks Anthropic or OpenAI protocol.
Q: How is this different from LiteLLM? A: LiteLLM is a 100-provider protocol router — a great library, not a cost optimizer. subclaw is built for one job: keep your expensive model's context window small by fanning cheap work to cheap keys, with session-pinned prompt cache locality and a budget circuit breaker. Use LiteLLM if you need 30 providers. Use subclaw if you need to slash your Claude bill.
Q: How is this different from OpenRouter? A: OpenRouter is a SaaS. You give them your keys (or pay them), they route. subclaw is self-hosted on your box; your keys never leave. Plus, subclaw's session pinning is a unique prompt-cache optimization that OpenRouter doesn't do.
Q: Will this work with non-Anthropic models?
A: Yes — any OpenAI-protocol endpoint, any Anthropic-protocol endpoint, any Anthropic-compatible vendor. Configure the model in keys.json and assign a tier (cheap / balanced / smart).
Q: Is it safe to give workers my real API key? A: No — that's the whole point. The gateway owns the keys. Workers carry no real credentials, only the proxy URL.
Q: What's the budget circuit breaker?
A: Configured in keys.json under global_proxy_settings.circuit_breaker. max_spend_per_session_usd and max_spend_per_day_usd halt further requests once hit. No surprise bills.
Q: How do I add a new model?
A: Add an entry to keys.json with the url, key, model_id, alias, and tier. The gateway hot-loads on the next request.
See docs/faq.md for the full list.
- 📐 Architecture deep-dive — how session pinning, prompt cache locality, and failover work.
- ⚖️ Comparisons — vs LiteLLM, OpenRouter, Portkey, claude-code-router, one-api.
- ❓ FAQ — 30+ questions about setup, costs, security, scaling.
- 📊 Benchmarks — full cost / cache hit / latency data.
- 🎯 Use cases — 6 real-world scenarios with command examples.
- 🔌 Integrations — Codex CLI, Aider, Cursor, custom clients.
- 🚀 Show HN post draft — copy-paste text for HN submission.
- 📣 Awesome list submissions — PR templates for 6 awesome-* lists.
- Prompt-cache hit rate auto-tuning (auto-detect cache miss / rekey)
- OpenAI function-calling → Anthropic tool_use full bidirectional translation (currently best-effort)
- Web dashboard with per-model cost / cache / latency charts
- Multi-user auth + per-user budget isolation
- PyPI package:
pip install subclaw - Helm chart for Kubernetes deployment
Have a feature request? Open a discussion.
We welcome PRs. Please read CONTRIBUTING.md first.
MIT © Akichoooo
- The Anthropic team for prompt caching — the entire architecture hinges on
cache_read_input_tokens. - The
claude-code-routerproject for pioneering the multi-model idea for Claude Code. - The LiteLLM project for showing the community what's possible with protocol translation.
- Everyone who files an issue, opens a PR, or kicks the tires. 🙏
If subclaw saved your Claude bill, leave a star ⭐ — it directly fuels more contributors and lower prices for everyone.