ccproxycache

Fix Claude Code's prompt cache inefficiency, which causes up to a 20x cost increase and 2.6x faster quota burn on resumed sessions.

Includes an optional reverse proxy + live dashboard for monitoring cache behavior, quota burn, and token costs in real time.

Fixes are safe — they normalize data you're already sending. No server-side features are overridden, no system prompts are modified.

Credits

cnighswonger/claude-code-cache-fix for the original fix, and the ANTHROPIC_BASE_URL environment variable for enabling proxy routing. I modified the cache fix to remove features that override server-controlled behavior or modify system prompts.

The problem

Claude Code sends API requests to Anthropic with a prompt cache that relies on byte-stable prefixes. Three client-side bugs break this stability when you resume sessions:

  1. Tool ordering jitter — tool definitions arrive in non-deterministic order
  2. Fingerprint instability — the cc_version fingerprint is computed from meta blocks that shift on resume
  3. Block scatter — attachment blocks (skills, MCP, hooks) get scattered across messages instead of staying in messages[0]

Each mismatch causes a full cache bust: instead of paying $0.50/M for a cache read, you pay $10/M for a cache write.
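The asymmetry is easy to put in numbers. A quick sketch using the rates above (a 100k-token cached prefix is assumed for illustration):

```python
# Pricing from the section above, in dollars per million tokens.
CACHE_WRITE_PER_M = 10.00   # cache_creation tokens
CACHE_READ_PER_M = 0.50     # cache_read tokens

def call_cost(cache_read_tokens: int, cache_create_tokens: int) -> float:
    """Cost of the cached portion of one API call, in dollars."""
    return (cache_read_tokens * CACHE_READ_PER_M
            + cache_create_tokens * CACHE_WRITE_PER_M) / 1_000_000

# The same 100k-token prefix, served from cache vs. rewritten after a bust:
hit = call_cost(cache_read_tokens=100_000, cache_create_tokens=0)
bust = call_cost(cache_read_tokens=0, cache_create_tokens=100_000)
print(f"hit: ${hit:.2f}  bust: ${bust:.2f}")  # the bust costs 20x more
```

On a long session every resumed turn resends the full prefix, so a single unstable byte near the front repeats this penalty on every call.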

Measured results (74-minute session, 345 API calls, Opus 4.6)

| Metric | Without fix | With fix | Improvement |
|---|---|---|---|
| Cache hit rate | 96.8% | 98.7% | +1.9pp |
| Cache create tokens/call | 4,165 | 1,816 | -56% |
| Tokens per 1% quota | 2.3M | 4.5M | +94% |
| Quota burn rate | 19.1%/hr | 7.2%/hr | -62% |
| Projected session length | 4.8 hours | 11.4 hours | +138% |

96.8% → 98.7% sounds like a small change. But it translates to a 56% reduction in cache_create tokens because cache creation costs 20x more than cache reads ($10/M vs $0.50/M).
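Framed as miss rates rather than hit rates, the same numbers look less subtle. A quick check:

```python
# Hit rates from the table above.
before, after = 0.968, 0.987

miss_before = 1 - before   # fraction of prefix tokens rewritten per call
miss_after = 1 - after
reduction = 1 - miss_after / miss_before
# ~59% fewer misses, roughly in line with the measured -56% in
# cache_create tokens per call.
print(f"miss rate: {miss_before:.1%} -> {miss_after:.1%} "
      f"({reduction:.0%} fewer misses)")
```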


Quick Start

Prerequisites

  • Node.js 18+ — required for the cache fix (intercepts Node's global fetch)
  • Claude Code installed via npm: npm install -g @anthropic-ai/claude-code
    • Must be the npm package, not a standalone binary — the fix patches Node's global fetch before the process starts
  • Python 3.10+ — only needed for the reverse proxy and analysis tools

Step 1 — Clone and install

git clone https://github.com/RobbySingh/ccproxycache.git
cd ccproxycache
make install

Then add the aliases to your shell config so you can use claude-cache-fix from any directory. Run:

make install-alias

This prints the exact lines to paste. Copy the block for your shell:

Linux / macOS — bash (~/.bashrc) or zsh (~/.zshrc):

# Cache fix only (no proxy logging)
alias claude-cache-fix='NODE_OPTIONS="--import /absolute/path/to/claudefix/custom-cache-fix/preload.mjs" claude'

# Cache fix + route through reverse proxy (enables dashboard)
alias claude-proxy='ANTHROPIC_BASE_URL=http://localhost:3001 NODE_OPTIONS="--import /absolute/path/to/claudefix/custom-cache-fix/preload.mjs" claude'

Replace /absolute/path/to/claudefix with the actual path shown by make install-alias.

After pasting, reload your shell:

source ~/.bashrc   # or: source ~/.zshrc

Windows — PowerShell profile ($PROFILE):

# Cache fix only
function claude-cache-fix {
  $env:NODE_OPTIONS='--import C:\path\to\claudefix\custom-cache-fix\preload.mjs'
  claude @args
}

# Cache fix + proxy
function claude-proxy {
  $env:NODE_OPTIONS='--import C:\path\to\claudefix\custom-cache-fix\preload.mjs'
  $env:ANTHROPIC_BASE_URL='http://localhost:3001'
  claude @args
}

Fish shell (~/.config/fish/config.fish):

alias claude-cache-fix='env NODE_OPTIONS="--import /absolute/path/to/claudefix/custom-cache-fix/preload.mjs" claude'
alias claude-proxy='env ANTHROPIC_BASE_URL=http://localhost:3001 NODE_OPTIONS="--import /absolute/path/to/claudefix/custom-cache-fix/preload.mjs" claude'

Step 2 — (Optional) Start the reverse proxy and dashboard

The reverse proxy sits between Claude Code and the Anthropic API. It logs every request and response and streams them to a live dashboard where you can watch cache behavior, quota burn, and cost in real time.

Open two terminals:

Terminal 1 — start the proxy (port 3001):

make start-proxy

Terminal 2 — start the dashboard (port 2999):

make start-dashboard

Open http://localhost:2999.


Step 3 — Run Claude Code

Without the proxy (cache fix only):

claude-cache-fix

This is the minimal setup: the cache fix is active, and nothing is logged to the proxy.

claude-cache-fix --resume <session-id>   # resume a session
CACHE_FIX_DEBUG=0 claude-cache-fix       # disable debug logging

With the proxy (cache fix + dashboard monitoring):

Requires Steps 1 and 2 above (proxy and dashboard running).

claude-proxy

You'll see live stats in the dashboard at http://localhost:2999.

Without any fix (baseline / comparison):

claude

Use this to compare quota burn before vs after the fix.


What is a reverse proxy?

Normally, Claude Code talks directly to Anthropic's API:

Claude Code  ──────────────────────────────→  api.anthropic.com
             ←──────────────────────────────
                 (response with usage data)

Anthropic's API returns rate-limit and quota information in response headers — fields like anthropic-ratelimit-unified-5h-utilization that tell you how much of your 5-hour quota window you've consumed. By default these headers are invisible to you: Claude Code reads them internally and you never see them.

A reverse proxy inserts itself in the middle:

Claude Code  ──→  localhost:3001  ──────────→  api.anthropic.com
             ←──  (proxy)         ←──────────
                      │
                      ▼
                  logs/*.jsonl
                      │
                      ▼
                  dashboard :2999

The proxy forwards every request upstream unmodified, collects every response (including all headers), and writes the full request + response pair to a daily log file. It then streams log entries to the dashboard via SSE (Server-Sent Events) so you can see stats live.
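The logging half can be sketched roughly like this (the field names in make_log_entry are illustrative, not the proxy's actual JSONL schema):

```python
import json
import time

def make_log_entry(request_body: dict, response_headers: dict,
                   response_body: dict) -> dict:
    """One log record: the full request + response pair plus a timestamp."""
    return {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "request": request_body,
        "response_headers": response_headers,
        "response": response_body,
    }

def append_jsonl(path: str, entry: dict) -> None:
    """Append one record to a daily log file, one JSON object per line."""
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

entry = make_log_entry({"model": "claude-opus"}, {"x": "1"}, {"usage": {}})
```

Because each line is a complete JSON object, the dashboard (or any script) can tail the file and parse records incrementally.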

Why it's valuable

Without the proxy, all you know about cache performance is what Claude Code's status line shows. With the proxy you get:

  • Cache hit rate per call — see exactly when busts happen and how bad they are
  • Quota burn rate — how fast you're consuming your 5-hour and 7-day windows
  • Cache create vs read breakdown — the expensive writes vs cheap reads, charted over time
  • Context size per call — watch context grow across a long session
  • Per-session stats — cost, cache %, tool call breakdown
  • Full request/response inspection — the complete system prompt, messages, tool definitions, and usage object for any call

The proxy does not modify requests or responses. It is pure passthrough + logging.


Features

Cache Fix (custom-cache-fix/)

| Fix | What it does |
|---|---|
| Block relocation | Moves scattered attachment blocks (skills, MCP, deferred tools, hooks) back to messages[0] |
| Block sorting | Sorts skill entries and deferred tool names alphabetically within their blocks |
| Content pinning | SHA-256 pins block content so async MCP registration jitter doesn't produce different bytes |
| Session knowledge strip | Removes <session_knowledge> tags from hooks blocks (ephemeral data that busts cache) |
| Tool order stabilization | Sorts tool definitions alphabetically by name |
| Fingerprint stabilization | Recomputes cc_version from real user text instead of meta blocks |
| Image stripping | Replaces base64 images in tool results older than 3 user turns with a text placeholder |
| Debug logging | Logs all fix actions to ~/.claude/cache-fix-debug.log |
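The common thread in these fixes is byte stability: the same logical request must serialize to the same bytes on every call. A simplified illustration of the tool-sorting idea (the real interceptor is the JavaScript in preload.mjs; this Python sketch only shows the principle):

```python
import hashlib
import json

def normalize_tools(tools: list[dict]) -> list[dict]:
    """Sort tool definitions by name so ordering jitter can't change bytes."""
    return sorted(tools, key=lambda t: t["name"])

def prefix_fingerprint(tools: list[dict]) -> str:
    """Hash the serialized prefix; equal hashes mean a cache-stable prefix."""
    blob = json.dumps(normalize_tools(tools), sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()

# Two calls whose tools arrive in different orders now hash identically,
# so the server-side prompt cache sees the same prefix both times:
a = [{"name": "Read"}, {"name": "Bash"}]
b = [{"name": "Bash"}, {"name": "Read"}]
assert prefix_fingerprint(a) == prefix_fingerprint(b)
```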

Reverse Proxy + Dashboard (reverse-proxy/)

  • Transparent passthrough — no modification, pure logging
  • Daily JSONL log files with full request/response data
  • Live SSE streaming to the dashboard
  • 8 KPI cards: cost, burn rate, cache hit rate, input/output tokens, cache writes, sessions, quota
  • 4 charts: cache rate over time, quota utilization, context size per call, cache create vs read
  • Session view with per-session stats and tool call breakdown
  • Request detail panel with headers, full request/response, and tool calls

Analysis Tools (devops/)

python devops/analyze_quota.py                    # today's logs
python devops/analyze_quota.py --date 2026-04-12  # specific date
python devops/analyze_quota.py --json             # machine-readable output

Outputs per-phase quota burn rates, cache hit rates, TTL tier breakdown, and implied quota pool size.
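The underlying math is simple. A sketch of the per-call hit-rate computation, assuming the standard cache_read_input_tokens / cache_creation_input_tokens fields in each call's usage object:

```python
def cache_hit_rate(usage: dict) -> float:
    """Fraction of the cached prefix served from cache on one call."""
    read = usage.get("cache_read_input_tokens", 0)
    created = usage.get("cache_creation_input_tokens", 0)
    total = read + created
    return read / total if total else 0.0

# Illustrative usage object from one proxy log entry:
usage = {"cache_read_input_tokens": 98_000,
         "cache_creation_input_tokens": 2_000}
print(f"{cache_hit_rate(usage):.1%}")  # -> 98.0%
```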


Configuration

Cache fix

| Variable | Default | Description |
|---|---|---|
| CACHE_FIX_DEBUG | 1 (on) | Log fix actions to ~/.claude/cache-fix-debug.log. Set 0 to disable |
| CACHE_FIX_IMAGE_KEEP_LAST | 3 | Keep images in the last N user messages. Set 0 to disable stripping |

Reverse proxy

| Variable | Default | Description |
|---|---|---|
| PROXY_PORT | 3001 | Port the proxy listens on |
| ANTHROPIC_FORWARD_URL | https://api.anthropic.com | Upstream API URL |

How to prove it works: the A/B experiment

The fastest way to convince yourself the fix is real is to run the same kind of work twice — once without the fix, once with — and compare the quota burn numbers. Here's the exact procedure.

What you need

  • Reverse proxy running (make start-proxy) — this captures the usage headers
  • Dashboard open (make start-dashboard) — for a visual read during the session
  • About 60 minutes

The experiment

Phase 1 — 30 minutes without the fix (baseline)

  1. Note your current 5-hour quota utilization. The dashboard shows this, or check the top-right of Claude Code's status line.
  2. Run a normal working session for 30 minutes with no cache fix, routing plain claude through the proxy so the usage headers are still captured:
    ANTHROPIC_BASE_URL=http://localhost:3001 claude
  3. After 30 minutes, note the quota utilization again. Calculate how many percentage points you burned.

Phase 2 — 30 minutes with the fix

  1. Stop and restart your Claude Code session using claude-proxy (cache fix + proxy):
    claude-proxy
  2. Do the same kind of work for another 30 minutes.
  3. After 30 minutes, note the quota utilization again.

Keep the type of work roughly consistent between phases — both coding sessions, both research sessions, etc. You're controlling for work type to isolate the fix's effect.

Analyze the logs

After both phases, run the quota analysis script against today's logs:

python devops/analyze_quota.py

This outputs per-phase burn rates, cache hit rates, and cache create vs read breakdowns. The script automatically splits phases based on gaps in API call timing.
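The gap-based splitting can be sketched as follows (the 300-second threshold is illustrative, not necessarily what analyze_quota.py uses):

```python
def split_phases(timestamps: list[float],
                 gap_s: float = 300.0) -> list[list[float]]:
    """Group call timestamps into phases separated by idle gaps."""
    phases: list[list[float]] = []
    for ts in sorted(timestamps):
        if phases and ts - phases[-1][-1] <= gap_s:
            phases[-1].append(ts)   # same phase: gap is small enough
        else:
            phases.append([ts])     # idle gap exceeded: start a new phase
    return phases

# Three calls, a long idle gap, then two more calls:
calls = [0, 30, 65, 2000, 2040]   # seconds since session start
assert [len(p) for p in split_phases(calls)] == [3, 2]
```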

Sample output structure:

Phase 1 (14 calls, 18.3 min)
  Cache hit rate:    96.2%
  Cache create/call: 4,312 tokens
  Quota burn:        0.041% per 1M tokens

Phase 2 (22 calls, 26.1 min)
  Cache hit rate:    98.9%
  Cache create/call: 1,740 tokens
  Quota burn:        0.022% per 1M tokens

For JSON output (easier to paste into Claude):

python devops/analyze_quota.py --json

Ask Claude to interpret the results

Copy the JSON output and ask Claude Code to analyze it. Here's a prompt that works well:

Here is quota and cache performance data from two phases of my Claude Code session today.
Phase 1 ran without the cache fix. Phase 2 ran with the cache fix enabled.

<paste JSON output here>

Given this data:
1. What is the reduction in cache_create tokens per call between phases?
2. What does this imply about quota burn rate improvement?
3. Are there any anomalies or signs that the fix didn't fully apply?
4. At this burn rate difference, how much longer would a 5-hour quota window last with the fix vs without?

Use the cache_creation vs cache_read token ratio as the primary signal — not wall-clock time,
since phases may have different call counts.

Claude has full context on how the proxy logs are structured and what the numbers mean (the context is right there in the session), so it can do the ratio math and flag anything unexpected.

What to look for

The key ratio is cache_create tokens per API call. Not hit rate percentage — that number compresses the signal. A drop from 4,000 to 1,800 cache_create tokens/call is the real number: it's a 55% reduction in the most expensive token type.

If Phase 2 shows meaningfully lower cache_create/call and higher tokens-per-1%-quota, the fix is working. If the numbers are similar, check the debug log:

tail -f ~/.claude/cache-fix-debug.log

You should see APPLIED: lines on every call. If you don't, the preload isn't loading — verify the NODE_OPTIONS env var is set in your shell.


Debug log

When CACHE_FIX_DEBUG=1 (default), all fix actions are logged to ~/.claude/cache-fix-debug.log:

[2026-04-12T04:37:02Z] --- API call, messages: 81
[2026-04-12T04:37:02Z] APPLIED: tool order stabilized
[2026-04-12T04:37:02Z] APPLIED: fingerprint c05 → 0d2
[2026-04-12T04:37:02Z] APPLIED: stripped 7 images from old tool results
[2026-04-12T04:37:02Z] Request body rewritten

If you see APPLIED: lines, the fix is working. If you see nothing, either the fix isn't loaded or the request was already optimal.
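For a programmatic check, a small sketch that counts APPLIED: actions per API call from the log format shown above:

```python
def applied_per_call(lines: list[str]) -> list[int]:
    """Count APPLIED: actions between successive '--- API call' markers."""
    counts: list[int] = []
    for line in lines:
        if "--- API call" in line:
            counts.append(0)            # new call starts a new counter
        elif "APPLIED:" in line and counts:
            counts[-1] += 1
    return counts

# Illustrative excerpt in the format shown above:
log = [
    "[...] --- API call, messages: 81",
    "[...] APPLIED: tool order stabilized",
    "[...] APPLIED: fingerprint c05 -> 0d2",
    "[...] Request body rewritten",
]
assert applied_per_call(log) == [2]
```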


Project layout

claudefix/
  Makefile                  make install / start-proxy / start-dashboard / install-alias
  custom-cache-fix/
    preload.mjs             Fetch interceptor (~400 lines)
    CLAUDE.md               Detailed docs for the cache fix
  reverse-proxy/
    main.py                 Python proxy server (~390 lines)
    requirements.txt        Python deps (just requests)
    pyproject.toml          uv config
    logs/                   Daily JSONL log files (gitignored — contains API keys)
    dashboard/              React + Vite + Tailwind + Recharts
  devops/
    analyze_quota.py        Quota analysis CLI
  experiment/
    run-experiment.sh       A/B testing framework (baseline vs fixed)

Security note

The reverse proxy logs contain your full Anthropic API key in request headers. The logs/ directory is gitignored and should be treated as sensitive.

Never commit log files. Never share these logs.



License

MIT
