ccproxycache

Fix Claude Code's prompt cache inefficiency, which causes up to a 20x cost increase and 2.6x faster quota burn on resumed sessions.

Includes an optional reverse proxy + live dashboard for monitoring cache behavior, quota burn, and token costs in real time.

Fixes are safe — they normalize data you're already sending. No server-side features are overridden, no system prompts are modified.

Credits

cnighswonger/claude-code-cache-fix for the original fix, and the ANTHROPIC_BASE_URL environment variable for enabling proxy routing. I modified the cache fix to remove features that override server-controlled behavior or modify system prompts.

The problem

Claude Code sends API requests to Anthropic with a prompt cache that relies on byte-stable prefixes. Three client-side bugs break this stability when you resume sessions:

  1. Tool ordering jitter — tool definitions arrive in non-deterministic order
  2. Fingerprint instability — the cc_version fingerprint is computed from meta blocks that shift on resume
  3. Block scatter — attachment blocks (skills, MCP, hooks) get scattered across messages instead of staying in messages[0]

Each mismatch causes a full cache bust: instead of paying $0.50/M for a cache read, you pay $10/M for a cache write.
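The asymmetry is easy to put in numbers. A quick sketch using the rates above (a 100k-token cached prefix is assumed for illustration):

```python
# Pricing from the section above, in dollars per million tokens.
CACHE_WRITE_PER_M = 10.00   # cache_creation tokens
CACHE_READ_PER_M = 0.50     # cache_read tokens

def call_cost(cache_read_tokens: int, cache_create_tokens: int) -> float:
    """Cost of the cached portion of one API call, in dollars."""
    return (cache_read_tokens * CACHE_READ_PER_M
            + cache_create_tokens * CACHE_WRITE_PER_M) / 1_000_000

# The same 100k-token prefix, served from cache vs. rewritten after a bust:
hit = call_cost(cache_read_tokens=100_000, cache_create_tokens=0)
bust = call_cost(cache_read_tokens=0, cache_create_tokens=100_000)
print(f"hit: ${hit:.2f}  bust: ${bust:.2f}")  # the bust costs 20x more
```

On a long session every resumed turn resends the full prefix, so a single unstable byte near the front repeats this penalty on every call.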

Measured results (74-minute session, 345 API calls, Opus 4.6)

| Metric | Without fix | With fix | Improvement |
|---|---|---|---|
| Cache hit rate | 96.8% | 98.7% | +1.9pp |
| Cache create tokens/call | 4,165 | 1,816 | -56% |
| Tokens per 1% quota | 2.3M | 4.5M | +94% |
| Quota burn rate | 19.1%/hr | 7.2%/hr | -62% |
| Projected session length | 4.8 hours | 11.4 hours | +138% |

96.8% → 98.7% sounds like a small change. But it translates to a 56% reduction in cache_create tokens because cache creation costs 20x more than cache reads ($10/M vs $0.50/M).
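Framed as miss rates rather than hit rates, the same numbers look less subtle. A quick check:

```python
# Hit rates from the table above.
before, after = 0.968, 0.987

miss_before = 1 - before   # fraction of prefix tokens rewritten per call
miss_after = 1 - after
reduction = 1 - miss_after / miss_before
# ~59% fewer misses, roughly in line with the measured -56% in
# cache_create tokens per call.
print(f"miss rate: {miss_before:.1%} -> {miss_after:.1%} "
      f"({reduction:.0%} fewer misses)")
```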


Quick Start

Prerequisites

  • Node.js 18+ — required for the cache fix (intercepts Node's global fetch)
  • Claude Code installed via npm: npm install -g @anthropic-ai/claude-code
    • Must be the npm package, not a standalone binary — the fix patches Node's global fetch before the process starts
  • Python 3.10+ — only needed for the reverse proxy and analysis tools

Step 1 — Clone and install

git clone https://github.com/RobbySingh/ccproxycache.git
cd ccproxycache
make install

Then add the aliases to your shell config so you can use claude-cache-fix from any directory. Run:

make install-alias

This prints the exact lines to paste. Copy the block for your shell:

Linux / macOS — bash (~/.bashrc) or zsh (~/.zshrc):

# Cache fix only (no proxy logging)
alias claude-cache-fix='NODE_OPTIONS="--import /absolute/path/to/claudefix/custom-cache-fix/preload.mjs" claude'

# Cache fix + route through reverse proxy (enables dashboard)
alias claude-proxy='ANTHROPIC_BASE_URL=http://localhost:3001 NODE_OPTIONS="--import /absolute/path/to/claudefix/custom-cache-fix/preload.mjs" claude'

Replace /absolute/path/to/claudefix with the actual path shown by make install-alias.

After pasting, reload your shell:

source ~/.bashrc   # or: source ~/.zshrc

Windows — PowerShell profile ($PROFILE):

# Cache fix only
function claude-cache-fix {
  $env:NODE_OPTIONS='--import C:\path\to\claudefix\custom-cache-fix\preload.mjs'
  claude @args
}

# Cache fix + proxy
function claude-proxy {
  $env:NODE_OPTIONS='--import C:\path\to\claudefix\custom-cache-fix\preload.mjs'
  $env:ANTHROPIC_BASE_URL='http://localhost:3001'
  claude @args
}

Fish shell (~/.config/fish/config.fish):

alias claude-cache-fix='env NODE_OPTIONS="--import /absolute/path/to/claudefix/custom-cache-fix/preload.mjs" claude'
alias claude-proxy='env ANTHROPIC_BASE_URL=http://localhost:3001 NODE_OPTIONS="--import /absolute/path/to/claudefix/custom-cache-fix/preload.mjs" claude'

Step 2 — (Optional) Start the reverse proxy and dashboard

The reverse proxy sits between Claude Code and the Anthropic API. It logs every request and response and streams them to a live dashboard where you can watch cache behavior, quota burn, and cost in real time.

Open two terminals:

Terminal 1 — start the proxy (port 3001):

make start-proxy

Terminal 2 — start the dashboard (port 2999):

make start-dashboard

Open http://localhost:2999.


Step 3 — Run Claude Code

Without the proxy (cache fix only):

claude-cache-fix

This is the minimal setup: the cache fix is active, and nothing is logged to the proxy.

claude-cache-fix --resume <session-id>   # resume a session
CACHE_FIX_DEBUG=0 claude-cache-fix       # disable debug logging

With the proxy (cache fix + dashboard monitoring):

Requires Steps 1 and 2 above (proxy and dashboard running).

claude-proxy

You'll see live stats in the dashboard at http://localhost:2999.

Without any fix (baseline / comparison):

claude

Use this to compare quota burn before vs after the fix.


What is a reverse proxy?

Normally, Claude Code talks directly to Anthropic's API:

Claude Code  ──────────────────────────────→  api.anthropic.com
             ←──────────────────────────────
                 (response with usage data)

Anthropic's API returns rate-limit and quota information in response headers — fields like anthropic-ratelimit-unified-5h-utilization that tell you how much of your 5-hour quota window you've consumed. By default these headers are invisible to you: Claude Code reads them internally and you never see them.

A reverse proxy inserts itself in the middle:

Claude Code  ──→  localhost:3001  ──────────→  api.anthropic.com
             ←──  (proxy)         ←──────────
                      │
                      ▼
                  logs/*.jsonl
                      │
                      ▼
                  dashboard :2999

The proxy forwards every request upstream unmodified, collects every response (including all headers), and writes the full request + response pair to a daily log file. It then streams log entries to the dashboard via SSE (Server-Sent Events) so you can see stats live.
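The logging half can be sketched roughly like this (the field names in make_log_entry are illustrative, not the proxy's actual JSONL schema):

```python
import json
import time

def make_log_entry(request_body: dict, response_headers: dict,
                   response_body: dict) -> dict:
    """One log record: the full request + response pair plus a timestamp."""
    return {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "request": request_body,
        "response_headers": response_headers,
        "response": response_body,
    }

def append_jsonl(path: str, entry: dict) -> None:
    """Append one record to a daily log file, one JSON object per line."""
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

entry = make_log_entry({"model": "claude-opus"}, {"x": "1"}, {"usage": {}})
```

Because each line is a complete JSON object, the dashboard (or any script) can tail the file and parse records incrementally.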

Why it's valuable

Without the proxy, all you know about cache performance is what Claude Code's status line shows. With the proxy you get:

  • Cache hit rate per call — see exactly when busts happen and how bad they are
  • Quota burn rate — how fast you're consuming your 5-hour and 7-day windows
  • Cache create vs read breakdown — the expensive writes vs cheap reads, charted over time
  • Context size per call — watch context grow across a long session
  • Per-session stats — cost, cache %, tool call breakdown
  • Full request/response inspection — the complete system prompt, messages, tool definitions, and usage object for any call

The proxy does not modify requests or responses. It is pure passthrough + logging.


Features

Cache Fix (custom-cache-fix/)

| Fix | What it does |
|---|---|
| Block relocation | Moves scattered attachment blocks (skills, MCP, deferred tools, hooks) back to messages[0] |
| Block sorting | Sorts skill entries and deferred tool names alphabetically within their blocks |
| Content pinning | SHA-256 pins block content so async MCP registration jitter doesn't produce different bytes |
| Session knowledge strip | Removes <session_knowledge> tags from hooks blocks (ephemeral data that busts cache) |
| Tool order stabilization | Sorts tool definitions alphabetically by name |
| Fingerprint stabilization | Recomputes cc_version from real user text instead of meta blocks |
| Image stripping | Replaces base64 images in tool results older than 3 user turns with a text placeholder |
| Debug logging | Logs all fix actions to ~/.claude/cache-fix-debug.log |
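The common thread in these fixes is byte stability: the same logical request must serialize to the same bytes on every call. A simplified illustration of the tool-sorting idea (the real interceptor is the JavaScript in preload.mjs; this Python sketch only shows the principle):

```python
import hashlib
import json

def normalize_tools(tools: list[dict]) -> list[dict]:
    """Sort tool definitions by name so ordering jitter can't change bytes."""
    return sorted(tools, key=lambda t: t["name"])

def prefix_fingerprint(tools: list[dict]) -> str:
    """Hash the serialized prefix; equal hashes mean a cache-stable prefix."""
    blob = json.dumps(normalize_tools(tools), sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()

# Two calls whose tools arrive in different orders now hash identically,
# so the server-side prompt cache sees the same prefix both times:
a = [{"name": "Read"}, {"name": "Bash"}]
b = [{"name": "Bash"}, {"name": "Read"}]
assert prefix_fingerprint(a) == prefix_fingerprint(b)
```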

Reverse Proxy + Dashboard (reverse-proxy/)

  • Transparent passthrough — no modification, pure logging
  • Daily JSONL log files with full request/response data
  • Live SSE streaming to the dashboard
  • 8 KPI cards: cost, burn rate, cache hit rate, input/output tokens, cache writes, sessions, quota
  • 4 charts: cache rate over time, quota utilization, context size per call, cache create vs read
  • Session view with per-session stats and tool call breakdown
  • Request detail panel with headers, full request/response, and tool calls

Analysis Tools (devops/)

python devops/analyze_quota.py                    # today's logs
python devops/analyze_quota.py --date 2026-04-12  # specific date
python devops/analyze_quota.py --json             # machine-readable output

Outputs per-phase quota burn rates, cache hit rates, TTL tier breakdown, and implied quota pool size.
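The underlying math is simple. A sketch of the per-call hit-rate computation, assuming the standard cache_read_input_tokens / cache_creation_input_tokens fields in each call's usage object:

```python
def cache_hit_rate(usage: dict) -> float:
    """Fraction of the cached prefix served from cache on one call."""
    read = usage.get("cache_read_input_tokens", 0)
    created = usage.get("cache_creation_input_tokens", 0)
    total = read + created
    return read / total if total else 0.0

# Illustrative usage object from one proxy log entry:
usage = {"cache_read_input_tokens": 98_000,
         "cache_creation_input_tokens": 2_000}
print(f"{cache_hit_rate(usage):.1%}")  # -> 98.0%
```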


Configuration

Cache fix

| Variable | Default | Description |
|---|---|---|
| CACHE_FIX_DEBUG | 1 (on) | Log fix actions to ~/.claude/cache-fix-debug.log. Set 0 to disable |
| CACHE_FIX_IMAGE_KEEP_LAST | 3 | Keep images in the last N user messages. Set 0 to disable stripping |

Reverse proxy

| Variable | Default | Description |
|---|---|---|
| PROXY_PORT | 3001 | Port the proxy listens on |
| ANTHROPIC_FORWARD_URL | https://api.anthropic.com | Upstream API URL |

How to prove it works: the A/B experiment

The fastest way to convince yourself the fix is real is to run the same kind of work twice — once without the fix, once with — and compare the quota burn numbers. Here's the exact procedure.

What you need

  • Reverse proxy running (make start-proxy) — this captures the usage headers
  • Dashboard open (make start-dashboard) — for a visual read during the session
  • About 60 minutes

The experiment

Phase 1 — 30 minutes without the fix (baseline)

  1. Note your current 5-hour quota utilization. The dashboard shows this, or check the top-right of Claude Code's status line.
  2. Run a normal working session for 30 minutes with no cache fix, routing plain claude through the proxy so the usage headers are still captured:
    ANTHROPIC_BASE_URL=http://localhost:3001 claude
  3. After 30 minutes, note the quota utilization again. Calculate how many percentage points you burned.

Phase 2 — 30 minutes with the fix

  1. Stop and restart your Claude Code session using claude-proxy (cache fix + proxy):
    claude-proxy
  2. Do the same kind of work for another 30 minutes.
  3. After 30 minutes, note the quota utilization again.

Keep the type of work roughly consistent between phases — both coding sessions, both research sessions, etc. You're controlling for work type to isolate the fix's effect.

Analyze the logs

After both phases, run the quota analysis script against today's logs:

python devops/analyze_quota.py

This outputs per-phase burn rates, cache hit rates, and cache create vs read breakdowns. The script automatically splits phases based on gaps in API call timing.
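The gap-based splitting can be sketched as follows (the 300-second threshold is illustrative, not necessarily what analyze_quota.py uses):

```python
def split_phases(timestamps: list[float],
                 gap_s: float = 300.0) -> list[list[float]]:
    """Group call timestamps into phases separated by idle gaps."""
    phases: list[list[float]] = []
    for ts in sorted(timestamps):
        if phases and ts - phases[-1][-1] <= gap_s:
            phases[-1].append(ts)   # same phase: gap is small enough
        else:
            phases.append([ts])     # idle gap exceeded: start a new phase
    return phases

# Three calls, a long idle gap, then two more calls:
calls = [0, 30, 65, 2000, 2040]   # seconds since session start
assert [len(p) for p in split_phases(calls)] == [3, 2]
```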

Sample output structure:

Phase 1 (14 calls, 18.3 min)
  Cache hit rate:    96.2%
  Cache create/call: 4,312 tokens
  Quota burn:        0.041% per 1M tokens

Phase 2 (22 calls, 26.1 min)
  Cache hit rate:    98.9%
  Cache create/call: 1,740 tokens
  Quota burn:        0.022% per 1M tokens

For JSON output (easier to paste into Claude):

python devops/analyze_quota.py --json

Ask Claude to interpret the results

Copy the JSON output and ask Claude Code to analyze it. Here's a prompt that works well:

Here is quota and cache performance data from two phases of my Claude Code session today.
Phase 1 ran without the cache fix. Phase 2 ran with the cache fix enabled.

<paste JSON output here>

Given this data:
1. What is the reduction in cache_create tokens per call between phases?
2. What does this imply about quota burn rate improvement?
3. Are there any anomalies or signs that the fix didn't fully apply?
4. At this burn rate difference, how much longer would a 5-hour quota window last with the fix vs without?

Use the cache_creation vs cache_read token ratio as the primary signal — not wall-clock time,
since phases may have different call counts.

Claude has full context on how the proxy logs are structured and what the numbers mean (the context is right there in the session), so it can do the ratio math and flag anything unexpected.

What to look for

The key ratio is cache_create tokens per API call. Not hit rate percentage — that number compresses the signal. A drop from 4,000 to 1,800 cache_create tokens/call is the real number: it's a 55% reduction in the most expensive token type.

If Phase 2 shows meaningfully lower cache_create/call and higher tokens-per-1%-quota, the fix is working. If the numbers are similar, check the debug log:

tail -f ~/.claude/cache-fix-debug.log

You should see APPLIED: lines on every call. If you don't, the preload isn't loading — verify the NODE_OPTIONS env var is set in your shell.


Debug log

When CACHE_FIX_DEBUG=1 (default), all fix actions are logged to ~/.claude/cache-fix-debug.log:

[2026-04-12T04:37:02Z] --- API call, messages: 81
[2026-04-12T04:37:02Z] APPLIED: tool order stabilized
[2026-04-12T04:37:02Z] APPLIED: fingerprint c05 → 0d2
[2026-04-12T04:37:02Z] APPLIED: stripped 7 images from old tool results
[2026-04-12T04:37:02Z] Request body rewritten

If you see APPLIED: lines, the fix is working. If you see nothing, either the fix isn't loaded or the request was already optimal.
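For a programmatic check, a small sketch that counts APPLIED: actions per API call from the log format shown above:

```python
def applied_per_call(lines: list[str]) -> list[int]:
    """Count APPLIED: actions between successive '--- API call' markers."""
    counts: list[int] = []
    for line in lines:
        if "--- API call" in line:
            counts.append(0)            # new call starts a new counter
        elif "APPLIED:" in line and counts:
            counts[-1] += 1
    return counts

# Illustrative excerpt in the format shown above:
log = [
    "[...] --- API call, messages: 81",
    "[...] APPLIED: tool order stabilized",
    "[...] APPLIED: fingerprint c05 -> 0d2",
    "[...] Request body rewritten",
]
assert applied_per_call(log) == [2]
```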


Project layout

claudefix/
  Makefile                  make install / start-proxy / start-dashboard / install-alias
  custom-cache-fix/
    preload.mjs             Fetch interceptor (~400 lines)
    CLAUDE.md               Detailed docs for the cache fix
  reverse-proxy/
    main.py                 Python proxy server (~390 lines)
    requirements.txt        Python deps (just requests)
    pyproject.toml          uv config
    logs/                   Daily JSONL log files (gitignored — contains API keys)
    dashboard/              React + Vite + Tailwind + Recharts
  devops/
    analyze_quota.py        Quota analysis CLI
  experiment/
    run-experiment.sh       A/B testing framework (baseline vs fixed)

Security note

The reverse proxy logs contain your full Anthropic API key in request headers. The logs/ directory is gitignored and should be treated as sensitive.

Never commit log files. Never share these logs.



License

MIT
