
feat(llm): add configurable prompt cache TTL with 1-hour Claude variant#3123

Merged
bug-ops merged 1 commit into main from prompt-cache-ttl-1h on Apr 17, 2026

Conversation

@bug-ops (Owner) commented Apr 17, 2026

Summary

  • Adds CacheTtl enum (Ephemeral | OneHour) to zeph-llm with serde support and exhaustive match arms
  • Extends CacheControl with an optional ttl: Option<CacheTtl> field, omitted from the wire when None or Ephemeral; default output is byte-identical to the pre-feature format
  • Introduces build_cache_control(ttl) helper used at all cache construction sites in cache.rs and replaces the JSON-literal tool hint in mod.rs with a typed path
  • beta_header() appends extended-cache-ttl-2025-04-25 when TTL is OneHour
  • All 7 split_system_into_blocks call sites and both apply_cache_breakpoint call sites propagate TTL
  • New ProviderEntry.prompt_cache_ttl: Option<CacheTtl> field wired through provider_factory.rs; TOML: prompt_cache_ttl = "ephemeral" | "1h"
  • Migration step in migrate_claude_provider() — copies "1h", suppresses "ephemeral", idempotent on re-run
  • 16 new unit tests covering enum serde, byte-identity of the default output, TTL propagation through system blocks and breakpoints, negative deserialization cases, and migration (survive / suppress / idempotent)
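The TTL-to-wire mapping described above can be sketched as follows. The names mirror the summary, but this is a minimal illustration, not the crate's code: the real implementation serializes CacheControl via serde, whereas the JSON here is hand-rolled so the example is self-contained.

```rust
/// Hypothetical sketch of the CacheTtl wire mapping from the PR summary.
#[derive(Clone, Copy, PartialEq, Debug)]
enum CacheTtl {
    Ephemeral,
    OneHour,
}

impl CacheTtl {
    /// Wire value for the `ttl` field; `None` means the field is omitted,
    /// keeping Ephemeral output byte-identical to the pre-feature format.
    fn wire_value(self) -> Option<&'static str> {
        match self {
            CacheTtl::Ephemeral => None,
            CacheTtl::OneHour => Some("1h"),
        }
    }

    /// Extra beta header value required for the 1-hour cache variant.
    fn beta_header(self) -> Option<&'static str> {
        match self {
            CacheTtl::Ephemeral => None,
            CacheTtl::OneHour => Some("extended-cache-ttl-2025-04-25"),
        }
    }
}

/// Builds the cache_control JSON; the `ttl` key only appears for OneHour.
fn build_cache_control(ttl: Option<CacheTtl>) -> String {
    match ttl.and_then(CacheTtl::wire_value) {
        Some(v) => format!(r#"{{"type":"ephemeral","ttl":"{v}"}}"#),
        None => r#"{"type":"ephemeral"}"#.to_string(),
    }
}

fn main() {
    // Default path: no ttl key, identical to the pre-feature shape.
    assert_eq!(build_cache_control(None), r#"{"type":"ephemeral"}"#);
    // 1-hour path: ttl key emitted, beta header required.
    assert_eq!(
        build_cache_control(Some(CacheTtl::OneHour)),
        r#"{"type":"ephemeral","ttl":"1h"}"#
    );
    assert_eq!(
        CacheTtl::OneHour.beta_header(),
        Some("extended-cache-ttl-2025-04-25")
    );
    println!("ok");
}
```

Keeping Ephemeral's output byte-identical (rather than emitting an explicit "ephemeral" ttl) is what makes the change safe to ship as a default-off feature.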

LLM Serialization Gate

This PR touches claude/mod.rs, cache.rs, types.rs, and CacheControl serialization. A live API session test with a stable system prompt is required before merge per project rules.

Test plan

  • cargo nextest run --workspace --lib --bins — 8207 passed, 20 skipped
  • cargo clippy --workspace -- -D warnings — clean
  • cargo +nightly fmt --check — clean
  • Live API session test with prompt_cache_ttl = "1h" in config (required before merge)
  • Verify anthropic-beta: extended-cache-ttl-2025-04-25 header appears in debug dump
  • Verify no 400/422 errors on a multi-turn session
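For reference, a hypothetical TOML fragment exercising the new field. The table name and surrounding shape are assumptions; only the prompt_cache_ttl key and its two accepted values come from this PR.

```toml
# Hypothetical provider entry — table layout assumed, key from the PR.
[[providers]]
name = "claude"
prompt_cache_ttl = "1h"   # or "ephemeral" (the default; may be omitted)
```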

Closes #3096

@github-actions bot added the enhancement, size/L, documentation, llm, rust, core, dependencies, and config labels, and removed the size/L label, on Apr 17, 2026
Extends the Claude provider with an optional `prompt_cache_ttl` config
field ("ephemeral" | "1h") that controls Anthropic prompt cache duration.

When set to "1h", the provider injects the
`anthropic-beta: extended-cache-ttl-2025-04-25` header and emits
`CacheTtl::OneHour` in all CacheControl blocks, reducing re-encode
cost for long-running sessions with stable system prompts.

Default behaviour (Ephemeral) is byte-identical to pre-feature output.

Closes #3096
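The migration rule from the summary (copy "1h", suppress "ephemeral", stay idempotent on re-run) can be sketched as below. The function name and the string-map representation are hypothetical stand-ins for the real config model inside migrate_claude_provider().

```rust
use std::collections::BTreeMap;

/// Hypothetical sketch of the migration rule: drop "ephemeral" (the
/// default needs no explicit key), keep "1h", and leave absent keys
/// alone — so running it twice is a no-op.
fn migrate_prompt_cache_ttl(entry: &mut BTreeMap<String, String>) {
    let is_default = entry
        .get("prompt_cache_ttl")
        .map_or(false, |v| v == "ephemeral");
    if is_default {
        entry.remove("prompt_cache_ttl");
    }
    // "1h" (or an absent key) passes through unchanged.
}

fn main() {
    let mut entry = BTreeMap::new();

    // "ephemeral" is suppressed, and re-running changes nothing.
    entry.insert("prompt_cache_ttl".to_string(), "ephemeral".to_string());
    migrate_prompt_cache_ttl(&mut entry);
    assert!(!entry.contains_key("prompt_cache_ttl"));
    migrate_prompt_cache_ttl(&mut entry);
    assert!(!entry.contains_key("prompt_cache_ttl"));

    // "1h" survives the migration.
    entry.insert("prompt_cache_ttl".to_string(), "1h".to_string());
    migrate_prompt_cache_ttl(&mut entry);
    assert_eq!(
        entry.get("prompt_cache_ttl").map(String::as_str),
        Some("1h")
    );
    println!("ok");
}
```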
@bug-ops bug-ops force-pushed the prompt-cache-ttl-1h branch from ea1fd76 to 643a0cb on April 17, 2026 17:02
@bug-ops bug-ops enabled auto-merge (squash) April 17, 2026 17:02
@github-actions bot added the size/L (Large PR, 201-500 lines) label on Apr 17, 2026
@bug-ops bug-ops merged commit 54640e1 into main Apr 17, 2026
32 checks passed
@bug-ops bug-ops deleted the prompt-cache-ttl-1h branch April 17, 2026 17:08

Labels

  • config: Configuration file changes
  • core: zeph-core crate
  • dependencies: Dependency updates
  • documentation: Improvements or additions to documentation
  • enhancement: New feature or request
  • llm: zeph-llm crate (Ollama, Claude)
  • rust: Rust code changes
  • size/L: Large PR (201-500 lines)


Development

Successfully merging this pull request may close these issues.

research(llm): configurable prompt cache TTL — add 1-hour Claude cache variant (Anthropic extended-cache-ttl beta)
