feat: opt-in OpenRouter Response Caching for compiler retry path #39

@pitimon

Description

Problem

`openkb add` retries `compile_*_doc` once when it fails (the document hash is registered only after compilation succeeds — see `cli.add_single_file`). On retry, every LLM call (summary, plan, N+M concept pages) runs again with identical prompts. Without OpenRouter's Response Caching, the retry re-bills every token.

The same applies to repeated `openkb lint` runs and to dev iteration, where the same compile is re-run against the same document.

Proposal

Add an opt-in config flag that enables OpenRouter Response Caching by sending `X-OpenRouter-Cache: true` (and an optional `X-OpenRouter-Cache-TTL`) on every LiteLLM call from the compiler. When the request is identical (same model, messages, params), OpenRouter returns the cached response in 80–300 ms with zero token billing (docs).

Config

`.openkb/config.yaml`:

```yaml
response_cache: true       # default: false
response_cache_ttl: 600    # optional, seconds (1-86400; OpenRouter default: 300)
```

Behaviour

  • Default OFF — opt-in, to avoid surprises on KBs holding sensitive content (response caching stores responses on OpenRouter's side, which conflicts with a strict ZDR posture).
  • Headers are only emitted when the model starts with `openrouter/`. For direct Anthropic/OpenAI/etc. calls they would have no effect; skipping them avoids confusing reviewers and stray bytes on the wire.
  • Headers are passed via LiteLLM's standard `extra_headers` kwarg.
  • This complements #38 (feat(compiler): add `cache_control` breakpoints for Anthropic prompt caching). They are orthogonal: prompt caching reduces cost on the cached prefix of each call; response caching skips the model call entirely on identical-payload re-runs.

Scope

  • `compile_short_doc` and `compile_long_doc` only (the only direct LiteLLM callers).
  • Out of scope: query, chat, linter — those go through the OpenAI Agents SDK; threading custom headers through it is a separate, larger change.

Privacy guard

Default OFF. Document in the PR body, plus a brief note in CLAUDE.md / docs, that enabling it stores responses on OpenRouter. KBs with classified content (e.g. ISMS data) should leave it disabled or use `X-OpenRouter-Cache-Clear` per call.

Test plan

  • Unit: `_response_cache_headers` returns `{}` when disabled and when the model is non-OpenRouter, and the right dict when enabled (with and without TTL).
  • Integration: with `response_cache: true` in config and `model="openrouter/..."`, `litellm.completion` is called with `extra_headers={"X-OpenRouter-Cache": "true"}`.
  • Regression: with the flag off (default), no `extra_headers` is passed (existing behaviour).
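The integration and regression cases can be checked without the network by stubbing `litellm.completion` with a `Mock` — a sketch, where `call_llm` stands in for the compiler's call site and is not a real openkb function:

```python
from unittest import mock

def call_llm(completion_fn, model, messages, extra_headers=None):
    """Pass extra_headers only when non-empty, so the flag-off path keeps
    the existing call shape (no stray kwarg)."""
    kwargs = {"model": model, "messages": messages}
    if extra_headers:
        kwargs["extra_headers"] = extra_headers
    return completion_fn(**kwargs)

# Integration-style check: the stub records the kwargs LiteLLM would receive.
fake = mock.Mock()
call_llm(fake, "openrouter/openai/gpt-4o", [{"role": "user", "content": "hi"}],
         {"X-OpenRouter-Cache": "true"})
fake.assert_called_once_with(
    model="openrouter/openai/gpt-4o",
    messages=[{"role": "user", "content": "hi"}],
    extra_headers={"X-OpenRouter-Cache": "true"},
)

# Regression-style check: flag off -> no extra_headers kwarg at all.
fake.reset_mock()
call_llm(fake, "openrouter/openai/gpt-4o", [])
assert "extra_headers" not in fake.call_args.kwargs
```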

Depends on

#38 (uses the `**kwargs` symmetry fix on `_llm_call_async`).
