System prompt forwarded with per-request x-anthropic-billing-header line — defeats upstream prompt cache #61

@ajcasagrande

Summary

When converting an Anthropic /v1/messages request to an OpenAI Responses API payload, the proxy has two independent issues that defeat the upstream prompt cache:

  1. The inbound system array is concatenated into instructions with no filter for non-semantic Anthropic header lines (whose hashes vary per request).
  2. The Codex adapter overwrites any inbound prompt_cache_key with a fresh uuid.uuid4() per request.

Either alone breaks the cache. Together, every turn presents Codex with a unique prefix AND a unique cache key, so prompt caching is effectively disabled.

Background

Claude Code prefixes its system prompt with a line of the form:

x-anthropic-billing-header: cc_version=2.1.117.48f; cc_entrypoint=cli; cch=71fea;

The cch=<hash> portion regenerates on every request. The line carries no semantic value to the model — it's a billing/telemetry header. When forwarded into the upstream instructions field unchanged, every turn presents a brand-new prefix to the backend.

The OpenAI Responses API exposes a prompt_cache_key field that lets a client pin cache routing across requests with otherwise-equivalent prefixes. Setting it to a stable value (e.g. a session/conversation identifier) is part of the cache contract.
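
For illustration, the outbound payload the proxy builds would carry a stable key alongside the instructions; the variable names and the session identifier below are hypothetical stand-ins, not the proxy's actual code:

# sketch of an outbound Responses payload with a stable cache key (illustrative)
payload_data = {
    "model": template_model,            # whatever model the Codex template selects
    "instructions": cleaned_system,     # byte-identical across turns once Fix #1 lands
    "input": responses_input,           # the converted message list
    "prompt_cache_key": "session-7f3c2a",  # stable per conversation, not per request
}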

Root cause #1 — billing header forwarded raw

ccproxy/llms/formatters/anthropic_to_openai/requests.py:

if request.system:
    if isinstance(request.system, str):
        instructions_text = request.system
        payload_data["instructions"] = request.system
    else:
        joined = "".join(block.text for block in request.system if block.text)
        instructions_text = joined or None
        if joined:
            payload_data["instructions"] = joined

Every text block is concatenated as-is. There's no filter for the x-anthropic-billing-header: line (or any other non-semantic Anthropic headers), so it propagates verbatim into the upstream instructions.
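
To make the breakage concrete, here is a minimal sketch (the second cch value and the trailing prompt text are made up) showing that two otherwise-identical system prompts join into byte-different instructions, so any prefix-based cache treats them as distinct:

import hashlib

turn_1 = (
    "x-anthropic-billing-header: cc_version=2.1.117.48f; cc_entrypoint=cli; cch=71fea;\n"
    "You are Claude Code, an agentic coding assistant."
)
turn_2 = turn_1.replace("cch=71fea;", "cch=9b0d1;")  # only the per-request hash differs

assert turn_1 != turn_2  # different bytes -> different prefix for the upstream cache
print(hashlib.sha256(turn_1.encode()).hexdigest()[:12])
print(hashlib.sha256(turn_2.encode()).hexdigest()[:12])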

Root cause #2 — prompt_cache_key randomized per request

ccproxy/plugins/codex/adapter.py, around line 644:

if "prompt_cache_key" not in merged:
    prompt_cache_key = template.get("prompt_cache_key")
    if isinstance(prompt_cache_key, str) and prompt_cache_key:
        merged["prompt_cache_key"] = str(uuid.uuid4())

When the merged request lacks a prompt_cache_key and the template has one configured, the adapter discards the template value and assigns a fresh uuid.uuid4(), so each request presents a brand-new cache key. (As a separate concern, this branch only fires when the template carries a non-empty prompt_cache_key; when it doesn't, nothing runs and the field may be missing from the outbound payload entirely.)

Impact

Each issue alone breaks the cache. Together:

  • Cost. Long multi-turn sessions re-bill the full input context (often a large CLAUDE.md plus tool catalog and accumulated history) on every turn. Typical multiplier vs. correct cache reuse: 5–10×.
  • Latency. Cache-miss prefills are slower than cache hits, so every turn after the first feels sluggish.
  • Subscription throughput. Per-account or per-plan rate limits exhaust faster than they should because effective input-token throughput is lower.

Suggested fix

Fix #1 — strip non-semantic Anthropic header lines

In ccproxy/llms/formatters/anthropic_to_openai/requests.py, run system text through a stripper before joining:

def _strip_nonsemantic_system_lines(text: str) -> str:
    return "\n".join(
        line for line in text.splitlines()
        if not line.strip().lower().startswith("x-anthropic-billing-header:")
    ).strip()


# in the request builder:
if request.system:
    if isinstance(request.system, str):
        cleaned = _strip_nonsemantic_system_lines(request.system)
        if cleaned:
            payload_data["instructions"] = cleaned
            instructions_text = cleaned
    else:
        cleaned_blocks = (
            _strip_nonsemantic_system_lines(block.text or "")
            for block in request.system
            if block.text
        )
        joined = "\n\n".join(b for b in cleaned_blocks if b)
        if joined:
            payload_data["instructions"] = joined
            instructions_text = joined
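
For reference, a quick check of the stripper against a representative system block (the prompt body here is a stand-in for the real Claude Code system prompt):

system_text = (
    "x-anthropic-billing-header: cc_version=2.1.117.48f; cc_entrypoint=cli; cch=71fea;\n"
    "You are Claude Code, an agentic coding assistant."
)
print(_strip_nonsemantic_system_lines(system_text))
# -> You are Claude Code, an agentic coding assistant.
# The result no longer depends on the per-request cch hash, so the joined
# instructions stay byte-identical across turns with the same prompt.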

Fix #2 — derive prompt_cache_key deterministically

Instead of str(uuid.uuid4()), derive the key from a stable identifier that survives across requests in the same conversation. Two reasonable approaches:

# (a) keep the template-configured key as-is when present:
if "prompt_cache_key" not in merged:
    template_key = template.get("prompt_cache_key")
    if isinstance(template_key, str) and template_key:
        merged["prompt_cache_key"] = template_key   # stable; not randomized

# (b) derive from a stable session identifier (preferred when no template key exists):
if "prompt_cache_key" not in merged:
    session_id = ...  # e.g. the conversation/session id available in the request context
    if session_id:
        merged["prompt_cache_key"] = session_id

Generating a fresh UUID per request is the pessimal case — it should only happen when the caller explicitly opts into a cold cache; otherwise the proxy should pass through or compute a deterministic key.
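
One possible shape for the combined fallback, as a sketch: it assumes the adapter can see a per-conversation identifier (session_id here is hypothetical), and only hashes the already-cleaned instructions as a last resort so identical prefixes still map to the same key.

import hashlib


def derive_prompt_cache_key(merged: dict, template: dict, session_id: str | None) -> str | None:
    # 1. respect an explicit inbound key from the caller
    existing = merged.get("prompt_cache_key")
    if isinstance(existing, str) and existing:
        return existing
    # 2. reuse the template-configured key verbatim (approach (a) above)
    template_key = template.get("prompt_cache_key")
    if isinstance(template_key, str) and template_key:
        return template_key
    # 3. fall back to a stable conversation identifier (approach (b) above)
    if session_id:
        return session_id
    # 4. last resort: key the cache on the stable instruction prefix itself
    instructions = merged.get("instructions")
    if isinstance(instructions, str) and instructions:
        return hashlib.sha256(instructions.encode()).hexdigest()
    return None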

Reproduction

  1. Send two consecutive Anthropic /v1/messages requests, each with a system array beginning with a text block containing x-anthropic-billing-header: cc_version=...; cch=<hash>; (real Claude Code traffic does this automatically; the two cch values will differ).
  2. Capture the outbound Codex Responses payload from each.
  3. Observe:
    • instructions differs in the first line (different cch=<hash>)
    • prompt_cache_key differs entirely (fresh uuid.uuid4() each time)
  4. Compare upstream-reported cache hit metrics — both will be cold prefills.

After applying both fixes, identical-prefix conversations will share a prompt_cache_key and the instructions text will be byte-identical across calls, allowing the upstream cache to hit on the shared prefix.
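
For steps 3 and 4, a small check over the two captured payloads (saved to JSON files; the file names are hypothetical) makes the before/after difference obvious:

import json

with open("turn1_payload.json") as f1, open("turn2_payload.json") as f2:
    p1, p2 = json.load(f1), json.load(f2)

# before the fixes both comparisons are False; afterwards both should be True
print("instructions identical: ", p1.get("instructions") == p2.get("instructions"))
print("prompt_cache_key stable:", p1.get("prompt_cache_key") == p2.get("prompt_cache_key"))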
