Summary
When converting an Anthropic /v1/messages request to an OpenAI Responses API payload, the proxy has two independent issues that defeat the upstream prompt cache:
- The inbound system array is concatenated into instructions with no filter for non-semantic Anthropic header lines (whose hashes vary per request).
- The Codex adapter overwrites any inbound prompt_cache_key with a fresh uuid.uuid4() per request.
Either alone breaks the cache. Together, every turn presents Codex with a unique prefix AND a unique cache key, so prompt caching is effectively disabled.
Background
Claude Code prefixes its system prompt with a line of the form:
```
x-anthropic-billing-header: cc_version=2.1.117.48f; cc_entrypoint=cli; cch=71fea;
```
The cch=<hash> portion regenerates on every request. The line carries no semantic value to the model — it's a billing/telemetry header. When forwarded into the upstream instructions field unchanged, every turn presents a brand-new prefix to the backend.
The OpenAI Responses API exposes a prompt_cache_key field that lets a client pin cache routing across requests with otherwise-equivalent prefixes. Setting it to a stable value (e.g. a session/conversation identifier) is part of the cache contract.
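For orientation, a cache-friendly outbound payload looks roughly like the sketch below. The field names (instructions, input, prompt_cache_key) are Responses API fields; the model name and the session identifier are placeholders, not values taken from the proxy.

```python
# Minimal sketch of the cache-relevant fields; values are illustrative only.
session_id = "conv-1234"  # hypothetical stable conversation identifier
payload = {
    "model": "gpt-5-codex",  # assumed target model; substitute whatever the proxy routes to
    "instructions": "You are Claude Code, Anthropic's official CLI for Claude.",  # byte-identical across turns
    "input": [{"role": "user", "content": "..."}],  # grows turn over turn
    "prompt_cache_key": session_id,  # stable per conversation, not a fresh UUID
}
```

Both pieces matter: a stable prompt_cache_key routes otherwise-equivalent requests to the same cache, and byte-identical instructions let the shared prefix actually match.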
Root cause #1 — billing header forwarded raw
ccproxy/llms/formatters/anthropic_to_openai/requests.py:
```python
if request.system:
    if isinstance(request.system, str):
        instructions_text = request.system
        payload_data["instructions"] = request.system
    else:
        joined = "".join(block.text for block in request.system if block.text)
        instructions_text = joined or None
        if joined:
            payload_data["instructions"] = joined
```
Every text block is concatenated as-is. There's no filter for the x-anthropic-billing-header: line (or any other non-semantic Anthropic headers), so it propagates verbatim into the upstream instructions.
Root cause #2 — prompt_cache_key randomized per request
ccproxy/plugins/codex/adapter.py, around line 644:
if "prompt_cache_key" not in merged:
prompt_cache_key = template.get("prompt_cache_key")
if isinstance(prompt_cache_key, str) and prompt_cache_key:
merged["prompt_cache_key"] = str(uuid.uuid4())
When the merged request lacks a prompt_cache_key AND the template has one configured, the adapter sets it to a fresh uuid.uuid4(). Each request therefore presents a brand-new cache key. (As a separate concern, the present logic only fires when the template has a non-empty prompt_cache_key, and silently does nothing when it doesn't — so the field may be missing entirely on other code paths.)
Impact
Each issue alone breaks the cache. Together:
- Cost. Long multi-turn sessions re-bill the full input context (often a large CLAUDE.md plus tool catalog and accumulated history) on every turn. Typical multiplier vs. correct cache reuse: 5–10×.
- Latency. Cache-miss prefills are slower than cache hits, so every turn after the first feels sluggish.
- Subscription throughput. Per-account or per-plan rate limits exhaust faster than they should because effective input-token throughput is lower.
Suggested fix
Fix #1 — strip non-semantic Anthropic header lines
In ccproxy/llms/formatters/anthropic_to_openai/requests.py, run system text through a stripper before joining:
```python
def _strip_nonsemantic_system_lines(text: str) -> str:
    return "\n".join(
        line for line in text.splitlines()
        if not line.strip().lower().startswith("x-anthropic-billing-header:")
    ).strip()


# in the request builder:
if request.system:
    if isinstance(request.system, str):
        cleaned = _strip_nonsemantic_system_lines(request.system)
        if cleaned:
            payload_data["instructions"] = cleaned
        instructions_text = cleaned
    else:
        cleaned_blocks = (
            _strip_nonsemantic_system_lines(block.text or "")
            for block in request.system
            if block.text
        )
        joined = "\n\n".join(b for b in cleaned_blocks if b)
        if joined:
            payload_data["instructions"] = joined
        instructions_text = joined
```
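A quick sanity check of the stripper (the sample text below is invented for illustration; only the first line reuses the header format shown above):

```python
# Hypothetical sample: a system prompt whose first line is the billing header.
sample = (
    "x-anthropic-billing-header: cc_version=2.1.117.48f; cc_entrypoint=cli; cch=71fea;\n"
    "You are Claude Code, Anthropic's official CLI for Claude."
)
cleaned = _strip_nonsemantic_system_lines(sample)
assert cleaned == "You are Claude Code, Anthropic's official CLI for Claude."

# A second turn whose cch hash differs now strips down to the same bytes.
sample_turn_2 = sample.replace("cch=71fea;", "cch=0abcd;")
assert _strip_nonsemantic_system_lines(sample_turn_2) == cleaned
```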
Fix #2 — derive prompt_cache_key deterministically
Instead of str(uuid.uuid4()), derive the key from a stable identifier that survives across requests in the same conversation. Two reasonable approaches:
```python
# (a) keep the template-configured key as-is when present:
if "prompt_cache_key" not in merged:
    template_key = template.get("prompt_cache_key")
    if isinstance(template_key, str) and template_key:
        merged["prompt_cache_key"] = template_key  # stable; not randomized

# (b) derive from a stable session identifier (preferred when no template key exists):
if "prompt_cache_key" not in merged:
    session_id = ...  # e.g. the conversation/session id available in the request context
    if session_id:
        merged["prompt_cache_key"] = session_id
```
Generating a fresh UUID per request is the pessimal case — it should only happen when the caller explicitly opts into a cold cache; otherwise the proxy should pass through or compute a deterministic key.
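When neither a template key nor a session identifier is available, one fallback (a sketch, not existing ccproxy code; the helper name and key format are made up) is to hash the stable portion of the outbound payload so that identical prefixes always map to identical keys:

```python
import hashlib

def derive_prompt_cache_key(instructions: str | None, model: str) -> str:
    """Hypothetical helper: deterministic cache key from the request's stable prefix."""
    digest = hashlib.sha256(f"{model}\x00{instructions or ''}".encode("utf-8")).hexdigest()
    return f"ccproxy-{digest[:32]}"

# Wiring it into the adapter's merge step (sketch):
if "prompt_cache_key" not in merged:
    merged["prompt_cache_key"] = derive_prompt_cache_key(
        merged.get("instructions"), merged.get("model", "")
    )
```

This keeps repeated turns of the same conversation on the same cache entry even when the client never sends a key, while genuinely different prefixes still receive distinct keys.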
Reproduction
- Send two consecutive Anthropic /v1/messages requests, each with a system array beginning with a text block containing x-anthropic-billing-header: cc_version=...; cch=<hash>; (real Claude Code traffic does this automatically; the two cch values will differ).
- Capture the outbound Codex Responses payload from each.
- Observe (a comparison sketch follows this list):
  - instructions differs in the first line (different cch=<hash>)
  - prompt_cache_key differs entirely (fresh uuid.uuid4() each time)
- Compare upstream-reported cache hit metrics — both will be cold prefills.
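To make the last two steps concrete, a small comparison over the two captured payloads (the file names are assumptions; load the payloads however your capture tooling stores them):

```python
import json

# Hypothetical capture files holding the outbound Codex Responses payloads.
with open("codex_payload_turn1.json") as f1, open("codex_payload_turn2.json") as f2:
    p1, p2 = json.load(f1), json.load(f2)

print("instructions identical:    ", p1.get("instructions") == p2.get("instructions"))
print("prompt_cache_key identical:", p1.get("prompt_cache_key") == p2.get("prompt_cache_key"))
# Before the fixes both comparisons print False; after them both should print True.
```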
After applying both fixes, identical-prefix conversations will share a prompt_cache_key and the instructions text will be byte-identical across calls, allowing the upstream cache to hit on the shared prefix.