Fix Claude Code's prompt cache inefficiency that causes up to 20x cost increase and 2.6x faster quota burn on resumed sessions.
Includes an optional reverse proxy + live dashboard for monitoring cache behavior, quota burn, and token costs in real time.
Fixes are safe — they normalize data you're already sending. No server-side features are overridden, no system prompts are modified.
cnighswonger/claude-code-cache-fix for the original fix and ANTHROPIC_BASE_URL for enabling the proxy routing. I modified the cache fix to remove features that override server-controlled behavior or modify system prompts.
Claude Code sends API requests to Anthropic with a prompt cache that relies on byte-stable prefixes. Three client-side bugs break this stability when you resume sessions:
- Tool ordering jitter — tool definitions arrive in non-deterministic order
- Fingerprint instability — the
cc_versionfingerprint is computed from meta blocks that shift on resume - Block scatter — attachment blocks (skills, MCP, hooks) get scattered across messages instead of staying in
messages[0]
Each mismatch causes a full cache bust: instead of paying $0.50/M for a cache read, you pay $10/M for a cache write.
| Metric | Without Fix | With Fix | Improvement |
|---|---|---|---|
| Cache hit rate | 96.8% | 98.7% | +1.9pp |
| Cache create tokens/call | 4,165 | 1,816 | -56% |
| Tokens per 1% quota | 2.3M | 4.5M | +94% |
| Quota burn rate | 19.1%/hr | 7.2%/hr | -62% |
| Projected session length | 4.8 hours | 11.4 hours | +138% |
96.8% → 98.7% sounds like a small change. But it translates to a 56% reduction in cache_create tokens because cache creation costs 20x more than cache reads ($10/M vs $0.50/M).
- Node.js 18+ — required for the cache fix (intercepts Node's global fetch)
- Claude Code installed via npm —
npm install -g @anthropic-ai/claude-code- Must be the npm package, not a standalone binary — the fix patches Node's global fetch before the process starts
- Python 3.10+ — only needed for the reverse proxy and analysis tools
git clone https://github.com/RobbySingh/ccproxycache.git
cd claudefix
make installThen add the aliases to your shell config so you can use claude-cache-fix from any directory. Run:
make install-aliasThis prints the exact lines to paste. Copy the block for your shell:
Linux / macOS — bash (~/.bashrc) or zsh (~/.zshrc):
# Cache fix only (no proxy logging)
alias claude-cache-fix='NODE_OPTIONS="--import /absolute/path/to/claudefix/custom-cache-fix/preload.mjs" claude'
# Cache fix + route through reverse proxy (enables dashboard)
alias claude-proxy='ANTHROPIC_BASE_URL=http://localhost:3001 NODE_OPTIONS="--import /absolute/path/to/claudefix/custom-cache-fix/preload.mjs" claude'Replace /absolute/path/to/claudefix with the actual path shown by make install-alias.
After pasting, reload your shell:
source ~/.bashrc # or: source ~/.zshrcWindows — PowerShell profile ($PROFILE):
# Cache fix only
function claude-cache-fix {
$env:NODE_OPTIONS='--import C:\path\to\claudefix\custom-cache-fix\preload.mjs'
claude @args
}
# Cache fix + proxy
function claude-proxy {
$env:NODE_OPTIONS='--import C:\path\to\claudefix\custom-cache-fix\preload.mjs'
$env:ANTHROPIC_BASE_URL='http://localhost:3001'
claude @args
}Fish shell (~/.config/fish/config.fish):
alias claude-cache-fix='env NODE_OPTIONS="--import /absolute/path/to/claudefix/custom-cache-fix/preload.mjs" claude'
alias claude-proxy='env ANTHROPIC_BASE_URL=http://localhost:3001 NODE_OPTIONS="--import /absolute/path/to/claudefix/custom-cache-fix/preload.mjs" claude'The reverse proxy sits between Claude Code and the Anthropic API. It logs every request and response and streams them to a live dashboard where you can watch cache behavior, quota burn, and cost in real time.
Open two terminals:
Terminal 1 — start the proxy (port 3001):
make start-proxyTerminal 2 — start the dashboard (port 2999):
make start-dashboardOpen http://localhost:2999.
Without the proxy (cache fix only):
claude-cache-fixThis is the minimal setup. The cache fix is active, nothing is logged to the proxy.
claude-cache-fix --resume <session-id> # resume a session
CACHE_FIX_DEBUG=0 claude-cache-fix # disable debug loggingWith the proxy (cache fix + dashboard monitoring):
Requires Steps 1 and 2 above (proxy and dashboard running).
claude-proxyYou'll see live stats in the dashboard at http://localhost:2999.
Without any fix (baseline / comparison):
claudeUse this to compare quota burn before vs after the fix.
Normally, Claude Code talks directly to Anthropic's API:
Claude Code ──────────────────────────────→ api.anthropic.com
←──────────────────────────────
(response with usage data)
Anthropic's API returns rate-limit and quota information in response headers — fields like anthropic-ratelimit-unified-5h-utilization that tell you how much of your 5-hour quota window you've consumed. By default these headers are invisible to you: Claude Code reads them internally and you never see them.
A reverse proxy inserts itself in the middle:
Claude Code ──→ localhost:3001 ──────────→ api.anthropic.com
←── (proxy) ←──────────
│
▼
logs/*.jsonl
│
▼
dashboard :2999
The proxy forwards every request upstream unmodified, collects every response (including all headers), and writes the full request + response pair to a daily log file. It then streams log entries to the dashboard via SSE (Server-Sent Events) so you can see stats live.
Without the proxy, all you know about cache performance is what Claude Code's status line shows. With the proxy you get:
- Cache hit rate per call — see exactly when busts happen and how bad they are
- Quota burn rate — how fast you're consuming your 5-hour and 7-day windows
- Cache create vs read breakdown — the expensive writes vs cheap reads, charted over time
- Context size per call — watch context grow across a long session
- Per-session stats — cost, cache %, tool call breakdown
- Full request/response inspection — the complete system prompt, messages, tool definitions, and usage object for any call
The proxy does not modify requests or responses. It is pure passthrough + logging.
| Fix | What it does |
|---|---|
| Block relocation | Moves scattered attachment blocks (skills, MCP, deferred tools, hooks) back to messages[0] |
| Block sorting | Sorts skill entries and deferred tool names alphabetically within their blocks |
| Content pinning | SHA-256 pins block content so async MCP registration jitter doesn't produce different bytes |
| Session knowledge strip | Removes <session_knowledge> tags from hooks blocks (ephemeral data that busts cache) |
| Tool order stabilization | Sorts tool definitions alphabetically by name |
| Fingerprint stabilization | Recomputes cc_version from real user text instead of meta blocks |
| Image stripping | Replaces base64 images in tool results older than 3 user turns with a text placeholder |
| Debug logging | Logs all fix actions to ~/.claude/cache-fix-debug.log |
- Transparent passthrough — no modification, pure logging
- Daily JSONL log files with full request/response data
- Live SSE streaming to the dashboard
- 8 KPI cards: cost, burn rate, cache hit rate, input/output tokens, cache writes, sessions, quota
- 4 charts: cache rate over time, quota utilization, context size per call, cache create vs read
- Session view with per-session stats and tool call breakdown
- Request detail panel with headers, full request/response, and tool calls
python devops/analyze_quota.py # today's logs
python devops/analyze_quota.py --date 2026-04-12 # specific date
python devops/analyze_quota.py --json # machine-readable outputOutputs per-phase quota burn rates, cache hit rates, TTL tier breakdown, and implied quota pool size.
| Variable | Default | Description |
|---|---|---|
CACHE_FIX_DEBUG |
1 (on) |
Log fix actions to ~/.claude/cache-fix-debug.log. Set 0 to disable |
CACHE_FIX_IMAGE_KEEP_LAST |
3 |
Keep images in last N user messages. Set 0 to disable stripping |
| Variable | Default | Description |
|---|---|---|
PROXY_PORT |
3001 |
Port the proxy listens on |
ANTHROPIC_FORWARD_URL |
https://api.anthropic.com |
Upstream API URL |
The fastest way to convince yourself the fix is real is to run the same kind of work twice — once without the fix, once with — and compare the quota burn numbers. Here's the exact procedure.
- Reverse proxy running (
make start-proxy) — this captures the usage headers - Dashboard open (
make start-dashboard) — for a visual read during the session - About 60 minutes
Phase 1 — 30 minutes without the fix (baseline)
- Note your current 5-hour quota utilization. The dashboard shows this, or check the top-right of Claude Code's status line.
- Run a normal working session for 30 minutes using
claude(no cache fix, no proxy alias):ANTHROPIC_BASE_URL=http://localhost:3001 claude
- After 30 minutes, note the quota utilization again. Calculate how many percentage points you burned.
Phase 2 — 30 minutes with the fix
- Stop and restart your Claude Code session using
claude-proxy(cache fix + proxy):claude-proxy
- Do the same kind of work for another 30 minutes.
- After 30 minutes, note the quota utilization again.
Keep the type of work roughly consistent between phases — both coding sessions, both research sessions, etc. You're controlling for work type to isolate the fix's effect.
After both phases, run the quota analysis script against today's logs:
python devops/analyze_quota.pyThis outputs per-phase burn rates, cache hit rates, and cache create vs read breakdowns. The script automatically splits phases based on gaps in API call timing.
Sample output structure:
Phase 1 (14 calls, 18.3 min)
Cache hit rate: 96.2%
Cache create/call: 4,312 tokens
Quota burn: 0.041% per 1M tokens
Phase 2 (22 calls, 26.1 min)
Cache hit rate: 98.9%
Cache create/call: 1,740 tokens
Quota burn: 0.022% per 1M tokens
For JSON output (easier to paste into Claude):
python devops/analyze_quota.py --jsonCopy the JSON output and ask Claude Code to analyze it. Here's a prompt that works well:
Here is quota and cache performance data from two phases of my Claude Code session today.
Phase 1 ran without the cache fix. Phase 2 ran with the cache fix enabled.
<paste JSON output here>
Given this data:
1. What is the reduction in cache_create tokens per call between phases?
2. What does this imply about quota burn rate improvement?
3. Are there any anomalies or signs that the fix didn't fully apply?
4. At this burn rate difference, how much longer would a 5-hour quota window last with the fix vs without?
Use the cache_creation vs cache_read token ratio as the primary signal — not wall-clock time,
since phases may have different call counts.
Claude has full context on how the proxy logs are structured and what the numbers mean (the context is right there in the session), so it can do the ratio math and flag anything unexpected.
The key ratio is cache_create tokens per API call. Not hit rate percentage — that number compresses the signal. A drop from 4,000 to 1,800 cache_create tokens/call is the real number: it's a 55% reduction in the most expensive token type.
If Phase 2 shows meaningfully lower cache_create/call and higher tokens-per-1%-quota, the fix is working. If the numbers are similar, check the debug log:
tail -f ~/.claude/cache-fix-debug.logYou should see APPLIED: lines on every call. If you don't, the preload isn't loading — verify the NODE_OPTIONS env var is set in your shell.
When CACHE_FIX_DEBUG=1 (default), all fix actions are logged to ~/.claude/cache-fix-debug.log:
[2026-04-12T04:37:02Z] --- API call, messages: 81
[2026-04-12T04:37:02Z] APPLIED: tool order stabilized
[2026-04-12T04:37:02Z] APPLIED: fingerprint c05 → 0d2
[2026-04-12T04:37:02Z] APPLIED: stripped 7 images from old tool results
[2026-04-12T04:37:02Z] Request body rewritten
If you see APPLIED: lines, the fix is working. If you see nothing, either the fix isn't loaded or the request was already optimal.
claudefix/
Makefile make install / start-proxy / start-dashboard / install-alias
custom-cache-fix/
preload.mjs Fetch interceptor (~400 lines)
CLAUDE.md Detailed docs for the cache fix
reverse-proxy/
main.py Python proxy server (~390 lines)
requirements.txt Python deps (just requests)
pyproject.toml uv config
logs/ Daily JSONL log files (gitignored — contains API keys)
dashboard/ React + Vite + Tailwind + Recharts
devops/
analyze_quota.py Quota analysis CLI
experiment/
run-experiment.sh A/B testing framework (baseline vs fixed)
The reverse proxy logs contain your full Anthropic API key in request headers. The logs/ directory is gitignored and should be treated as sensitive.
Never commit log files. Never share these logs.
- Claude Code docs
- Anthropic API reference
- claude-code-cache-fix — the original package this is based on
MIT