What if you could use your Claude subscription just like it was the API?
Often what we need from the API is simple text in and text out: summarizing, processing, categorizing, etc. Sublet sacrifices speed but can save significant cost in this kind of scenario.
Point any Anthropic- or OpenAI-compatible client (OpenClaw, LiteLLM, Aider, your own scripts) at a local endpoint, and Sublet answers the request by driving the Claude CLI on your behalf using your OAuth login.
- Quickstart
- Value: Use the Subscription Tokens You've Got
- Theory: Wrap Claude Code to Operate Like the API
- Strategies
- Getting Your Tokens
- Configuration
- Endpoints
- Using With LiteLLM
- Contributing
- Warnings
You need Docker, the claude CLI, and a Claude subscription. One line from there:
```bash
curl -fsSL https://raw.githubusercontent.com/JumpstartLab/sublet/master/bin/install | bash
```

That script clones the repo, runs `claude setup-token` to mint a long-lived OAuth token, writes `.env`, and starts the container on `:4001`. When it finishes, you'll have a running proxy.
Send it something to chew on — classifying a support message into one of three buckets is a decent stand-in for the "text in, text out" work Sublet is built for:
```bash
curl -s http://localhost:4001/v1/messages \
  -H 'content-type: application/json' \
  -d '{
    "model": "claude-haiku-4-5-20251001",
    "max_tokens": 20,
    "messages": [{
      "role": "user",
      "content": "Classify this support message as one of: billing, bug, feature_request. Respond with only the label.\n\nMessage: \"The export button throws a 500 every time I hit it on Safari.\""
    }]
  }' | jq -r '.content[0].text'
```

After three to five seconds (that's the CLI spin-up tax — see Strategies) you should see:

```
bug
```
That's a live round trip: your request went through the proxy, spawned a claude --print subprocess against your subscription, and came back as a normal Messages API response. Point any Anthropic- or OpenAI-compatible client at http://localhost:4001 and it'll work the same way — see Endpoints for the full surface.
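If you're driving the proxy from one of your own scripts rather than curl, the same classification call is a few lines of plain Ruby. This is a minimal sketch using only the standard library; the host, port, and model name are the Quickstart defaults, not requirements:

```ruby
require "json"
require "net/http"

# Build the same Messages API payload the curl example sends.
def classification_payload(message)
  {
    model: "claude-haiku-4-5-20251001",
    max_tokens: 20,
    messages: [{
      role: "user",
      content: "Classify this support message as one of: billing, bug, " \
               "feature_request. Respond with only the label.\n\n" \
               "Message: #{message.inspect}"
    }]
  }
end

# POST it through Sublet and dig the text out of the Messages response.
def classify(message, host: "localhost", port: 4001)
  res = Net::HTTP.new(host, port).post(
    "/v1/messages",
    JSON.generate(classification_payload(message)),
    "content-type" => "application/json"
  )
  JSON.parse(res.body).dig("content", 0, "text")
end

# classify('The export button throws a 500 every time I hit it on Safari.')
```

Expect the usual three-to-five-second round trip per call; nothing about the client changes, only your patience.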
If you're building AI systems, you're paying two ways at once: a subscription plan with a generous pool of included tokens that often sits idle, and an API that bills per token and adds up fast. I've burned $25 or $50 in a day on a small prototype while my subscription did nothing.
Harnesses like OpenClaw and OpenCode used to bridge the gap by routing through your subscription; Anthropic shut that off in early April 2026.
For scrapers, background jobs, and overnight experiments where "slow" is fine, Sublet lets you spend the subscription tokens you're already paying for instead of doubling up on the API.
```
┌────────────┐    HTTP     ┌───────────────┐  stdin/stdout    ┌───────────┐
│   client   │ ──────────▶ │  Sinatra app  │ ───────────────▶ │  claude   │
│ (LiteLLM,  │             │    (app.rb)   │                  │  --print  │
│  curl, …)  │ ◀────────── │               │ ◀─────────────── │           │
└────────────┘             └───────────────┘                  └───────────┘
                                   │
                                   │ refresh when token
                                   ▼ near expiry
                         ┌──────────────────┐
                         │  Anthropic OAuth │
                         │ /v1/oauth/token  │
                         └──────────────────┘
```
- Each request spawns a fresh `claude --print` process with the access token in `CLAUDE_CODE_OAUTH_TOKEN`, runs in an empty working directory, and reads the prompt from stdin. `--strict-mcp-config` and the empty workdir ensure no project-level `CLAUDE.md`, MCP servers, or plugins are loaded.
- JSON output is parsed and mapped back to the requested API shape. Usage numbers come from the CLI's `usage` block.
- Token refresh happens inside the process, protected by a mutex, five minutes before expiry. Refreshed tokens are written atomically to `TOKEN_STATE_FILE` (tmp-write + rename) so a kill mid-write cannot corrupt the file.
- If the env-var token prefix differs from the on-disk token prefix on startup, the proxy prefers the env var — so rotating credentials by editing `.env` and restarting always wins over the saved state.
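The refresh-and-persist pattern in the last two bullets is easy to get wrong. As a hypothetical sketch (the real TokenManager lives in app.rb and may differ), the mutex-guarded tmp-write + rename dance looks like this:

```ruby
require "json"

REFRESH_MUTEX = Mutex.new  # one refresh at a time, as the proxy does

# Persist token state so a kill mid-write can never leave a half-written
# file: write the whole payload to a sibling tmp file, then rename it into
# place. rename(2) is atomic within a single filesystem.
def save_token_state(path, state)
  tmp = "#{path}.tmp"
  File.write(tmp, JSON.generate(state))
  File.rename(tmp, path)
end

# Serialize refreshes so two near-simultaneous requests can't both hit the
# OAuth endpoint and race each other to disk.
def persist_refreshed_token!(path, new_state)
  REFRESH_MUTEX.synchronize { save_token_state(path, new_state) }
end
```

A reader at any instant sees either the old file or the new one, never a mix — which is exactly the guarantee the bullet above is claiming.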
🐢 Slow-and-safe.
This strategy appears to be terms-of-service compliant. It drives the real Claude CLI (which does its own per-request cryptographic signing) through your OAuth login. It's our version of following the rules.
What you can expect: each request takes about three to five seconds to come back. If you're scraping data or processing things in the background, that's great. For live chat it's a little slow. If you're firing a large burst of requests in a short window, they'll queue behind the subprocess pool and it's going to take a while. But it works, and it works pretty well.
This is the default. Every route under /v1/* uses it. Nothing extra to enable.
Claude Code signs every outgoing request with a short hash called cch on an x-anthropic-billing-header block. It's computed at send time by native Zig code inside Bun's HTTP stack — not something a non-CLI client can replicate. The mechanism was exposed by the March 2026 Claude Code source leak (coverage: Alex Kim, Zscaler, NodeSource, Engineers Codex, Cybernews).
Right now Anthropic isn't enforcing the signature, so an OAuth token by itself still works — fast, and cheap relative to the API. But the mechanism wasn't built for nothing; enforcement is coming, and sampled request logs make it trivial to identify accounts that sent unsigned traffic. Only use this with an account you're willing to lose.
To opt in, set ENABLE_DIRECT_API=true and hit /direct/* — same request and response shapes as the CLI routes, different path prefix.
Sublet authenticates to Anthropic using an OAuth token tied to your Claude subscription. There are two ways to get one.
This is what the Quickstart uses. Running claude setup-token opens a browser for authentication and then prints a long-lived OAuth token (valid for about a year) that's specifically designed for non-interactive use. No refresh token, no auto-refresh state to persist, no 8-hour expiry to worry about.
The installer handles the paste for you. Doing it by hand, open .env in your editor and add a single line CLAUDE_OAUTH_TOKEN=<the printed value> — editors let you see and fix any stray line breaks the terminal may have introduced when it wrapped the token on screen. Don't pipe the token through echo at a shell prompt; long tokens wrap in most terminals and can pick up embedded newlines on paste.
This is the right choice for servers, CI, or any deployment where you don't want to babysit the credential lifecycle.
If you've already logged in with claude on this machine, the CLI has an access token (~8 hour lifespan) and a matching refresh token stashed in OS-specific credential storage. Sublet can use those directly — it ships with a TokenManager that refreshes the access token automatically before it expires and persists the refreshed pair to TOKEN_STATE_FILE so the next restart picks up where you left off.
macOS — tokens live in the Keychain under the service name Claude Code-credentials:
```bash
security find-generic-password -s "Claude Code-credentials" -w \
  | jq -r '.claudeAiOauth | "CLAUDE_OAUTH_TOKEN=\(.accessToken)\nCLAUDE_OAUTH_REFRESH_TOKEN=\(.refreshToken)"' \
  > .env
```

First run will prompt once for your login password so `security` can unlock the Keychain.
Linux — Claude Code writes credentials to ~/.claude/.credentials.json with the same JSON shape:
```bash
jq -r '.claudeAiOauth | "CLAUDE_OAUTH_TOKEN=\(.accessToken)\nCLAUDE_OAUTH_REFRESH_TOKEN=\(.refreshToken)"' \
  ~/.claude/.credentials.json > .env
```

Windows — Claude Code uses the Windows Credential Manager. Open it from the Start menu, find the "Claude Code" entry, and copy `accessToken` and `refreshToken` into `.env` manually. (PR welcome for a PowerShell one-liner.)
Once .env has your token, bring Sublet up with Docker Compose:
```bash
docker compose up -d --build
```

(The Quickstart installer does this for you; this step is only needed if you wrote `.env` yourself.)
Once the container is up, confirm the tokens are live by hitting the health endpoint:
```bash
curl -s http://localhost:4001/health | jq .
```

You should see something like:
```json
{
  "status": "ok",
  "mode": "cli-subprocess",
  "cli_version": "2.1.109",
  "max_concurrent": 5,
  "active_requests": 0,
  "subprocess_timeout": 120,
  "token_prefix": "sk-ant-oat01-...",
  "has_refresh": true,
  "expires_in": 28245,
  "expires_at": "2026-04-16T20:00:00Z",
  "auto_refresh": true
}
```

`has_refresh: false` is normal and fine if you minted a long-lived token with `claude setup-token` — that token doesn't need one.
Only one environment variable is actually required: CLAUDE_OAUTH_TOKEN. Everything else — concurrency limits, subprocess timeout, token storage path, direct-mode opt-in, and a couple of advanced knobs — is optional tuning with sensible defaults.
See CONFIGURATION.md for the full variable reference.
Sublet exposes Anthropic-compatible and OpenAI-compatible inference endpoints, plus operational routes for health checks and token management:
- `POST /v1/messages` — Anthropic Messages API (CLI subprocess mode)
- `POST /v1/chat/completions` — OpenAI-compatible
- `POST /direct/v1/messages`, `POST /direct/v1/chat/completions` — same request/response shapes, direct-to-Anthropic path (requires `ENABLE_DIRECT_API=true`; see Danger Zone)
- `GET /health`, `GET /v1/models`, `POST /refresh` — liveness and token management
See ENDPOINTS.md for full request/response examples and limitations.
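The OpenAI-compatible route differs from the Anthropic one only in path and response shape. A minimal standard-library sketch (host, port, and model name are the Quickstart defaults, not requirements):

```ruby
require "json"
require "net/http"

# OpenAI chat shape: "messages" again, but the reply comes back under
# choices[0].message.content instead of content[0].text.
def chat_payload(prompt)
  {
    model: "claude-haiku-4-5-20251001",
    messages: [{ role: "user", content: prompt }]
  }
end

def chat(prompt, host: "localhost", port: 4001)
  res = Net::HTTP.new(host, port).post(
    "/v1/chat/completions",
    JSON.generate(chat_payload(prompt)),
    "content-type" => "application/json"
  )
  JSON.parse(res.body).dig("choices", 0, "message", "content")
end

# chat("Summarize this ticket in one sentence: ...")
```

Swapping a script between the two surfaces is just the path and that final `dig` — the payloads are otherwise interchangeable for plain text-in, text-out work.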
Sublet speaks both the Anthropic and OpenAI request shapes, so any LiteLLM model entry just needs its api_base pointed at http://localhost:4001 (or /direct if you've enabled direct mode).
See LITELLM.md for a minimal config.yaml and a dual-strategy example.
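As a rough sketch of what such an entry can look like — the model alias and placeholder `api_key` here are illustrative, not taken from the repo; LITELLM.md has the maintained version:

```yaml
model_list:
  - model_name: sublet-haiku
    litellm_params:
      model: anthropic/claude-haiku-4-5-20251001
      api_base: http://localhost:4001
      api_key: "unused"   # Sublet authenticates with your OAuth token, not this key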
PRs welcome. See CONTRIBUTING.md for local dev setup, the repo layout, and how to run the test suite — including running tests without a paid Claude subscription.
A few things that apply no matter which strategy you pick:
- This is an unofficial tool and is not affiliated with or endorsed by Anthropic. Anthropic's Consumer Terms, Commercial Terms, and Usage Policy govern how you may use Claude. You are solely responsible for determining whether your use is permitted.
- Don't expose this proxy to the public internet. Anyone who can reach it can spend your subscription. Bind to `127.0.0.1`, put it on a private network (Tailscale, WireGuard, VPN), or front it with auth. The default bind is `0.0.0.0:4001` for container convenience — not a safe default for a public host.
- Your OAuth token is a credential. Treat the `.env` file, the token state file, and the logs with the same care you would an API key.
- Responses are not streamed. Tool use / function calling is not supported. Only text in, text out.