Problem
The OpenKB compiler reuses a "base context A" (system + document) across N+M+2 LLM calls per document (summary → concepts-plan → N create + M update concept pages). Without `cache_control` markers, every call re-bills the full document content as input tokens.
For Anthropic Sonnet 4.5 (via OpenRouter or direct), prompt caching can cut input cost by ~90% on the cached prefix and reduce TTFT. The minimum cacheable prefix is 1,024 tokens, easily exceeded by typical document content.
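As a rough illustration (hypothetical document size; the 1.25× cache-write and 0.1× cache-read multipliers are Anthropic's published pricing for the default 5-minute ephemeral cache): with a 20k-token document and N+M+2 = 12 calls, uncached input is ~240k billed tokens. With caching, the first call writes the prefix (~25k token-equivalents) and the remaining 11 calls read it (~22k), roughly an 80% reduction before counting per-call suffixes.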
Proposal
Add `cache_control: {"type": "ephemeral"}` markers at two breakpoints in `openkb/agent/compiler.py`:
- End of `doc_msg` in `compile_short_doc` + `compile_long_doc` — caches system + doc for all downstream calls (summary, plan, every concept).
- End of assistant summary message in `_compile_concepts` (3 call sites: plan, create, update) — caches system + doc + summary for all concept generation calls.
Two breakpoints, well within Anthropic's max-4 limit.
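A minimal sketch of the first breakpoint, assuming `doc_msg` is currently a plain user message dict (the helper name `build_doc_msg` and the exact message shape are assumptions; the block-list format is what Anthropic's prompt caching expects):

```python
# Sketch: mark the end of the document message as a cache breakpoint, so the
# system prompt + document are served from cache on every downstream call.
def build_doc_msg(document_text: str) -> dict:
    return {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": document_text,
                # Caches everything up to and including this block (~5 min TTL).
                "cache_control": {"type": "ephemeral"},
            }
        ],
    }
```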
Compatibility
- Anthropic / OpenRouter→Anthropic: `cache_control` honored.
- OpenAI: list-of-blocks content format is valid (Vision API uses it); `cache_control` silently ignored.
- Other providers: LiteLLM normalizes/strips unknown fields.
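If stripping ever proves unreliable for some provider, a belt-and-braces variant (hypothetical helper, not part of this patch) would gate the marker on the model name:

```python
def maybe_cache(block: dict, model: str) -> dict:
    # Attach the marker only for models known to honor it; all other
    # providers get the plain text block. The match below is a heuristic.
    if "claude" in model or model.startswith("anthropic/"):
        return {**block, "cache_control": {"type": "ephemeral"}}
    return block
```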
Side fix
`_llm_call_async` currently does not forward `**kwargs` while `_llm_call` does (asymmetry noted in memory #82886). Add `**kwargs` for parity.
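A sketch of the parity fix, assuming `_llm_call_async` wraps `litellm.acompletion` the same way `_llm_call` wraps `litellm.completion` (the actual signatures in `compiler.py` may carry more parameters):

```python
import litellm

async def _llm_call_async(model: str, messages: list, **kwargs):
    # Forward **kwargs exactly as the sync _llm_call does, so both paths
    # accept the same per-call options (temperature, metadata, etc.).
    return await litellm.acompletion(model=model, messages=messages, **kwargs)
```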
Out of scope
- OpenRouter Response Caching (`X-OpenRouter-Cache: true`) — different mechanism, evaluated separately.
- Refactoring messages into a dedicated builder module — keep patch surgical.
Test plan
- Existing pytest suite passes (mocks accept `*args, **kwargs`).
- New assertion: the completion payload contains a `cache_control` block on `doc_msg` (see the sketch after this list).
- Manual smoke test against a real Anthropic key: observe `cached_tokens` in `prompt_tokens_details` on calls 2..N.
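A sketch of the new assertion, assuming the suite patches `litellm` inside the compiler module via `unittest.mock` (the fixture name, the message index, and the entry-point call shape are assumptions):

```python
from unittest.mock import patch

from openkb.agent.compiler import compile_short_doc

def test_doc_msg_carries_cache_control(fake_doc):
    with patch("openkb.agent.compiler.litellm") as mock_litellm:
        compile_short_doc(fake_doc)
        _, call_kwargs = mock_litellm.completion.call_args
        # doc_msg is assumed to sit at messages[1], after the system message.
        doc_blocks = call_kwargs["messages"][1]["content"]
        assert any(
            block.get("cache_control") == {"type": "ephemeral"}
            for block in doc_blocks
        )
```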
References
- Memory observation S11144 (3-5 line patch feasibility, OpenKB compiler audit).
- CLAUDE.md compiler architecture: "Designed around prompt-cache reuse: a single base context A reused across summary → concept-plan → concept-page calls."