feat: cache tool definitions and add `prompt_caching` config toggle #78
Merged
emal-avala merged 1 commit into main on Apr 6, 2026
Conversation
Prompt caching for system prompt and conversation history was already
implemented. This adds the missing piece: cache_control on the last
tool definition in the tools array, which lets the API cache the
entire prefix (system prompt + tools) as a single block.
Changes:
- anthropic.rs: Add cache_control: {type: "ephemeral"} to the last
tool definition when enable_caching is true
- client.rs: Same change in the legacy client path
- schema.rs: Add `prompt_caching` feature flag (default: true) so
users can disable caching for providers that don't support it
- query/mod.rs: Wire feature flag into ProviderRequest instead of
hardcoding enable_caching: true
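The change in `anthropic.rs` can be sketched as follows. This is a simplified illustration, not the crate's actual code: `Tool` is a stand-in for the real tool-definition type, and `mark_last_tool_cached` is a hypothetical helper name.

```rust
/// Simplified stand-in for a serialized tool definition.
#[derive(Debug)]
struct Tool {
    name: String,
    /// Set to Some("ephemeral") to place a cache breakpoint here.
    cache_control: Option<&'static str>,
}

/// Mark only the LAST tool definition, so the API can cache the entire
/// prefix (system prompt + all tool definitions) as a single block.
fn mark_last_tool_cached(tools: &mut [Tool], enable_caching: bool) {
    if !enable_caching {
        return; // respect the prompt_caching feature flag
    }
    if let Some(last) = tools.last_mut() {
        last.cache_control = Some("ephemeral");
    }
}
```

Marking only the final tool is sufficient because a cache breakpoint covers everything before it in the request prefix.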
With 32 tool definitions (~15K tokens), this saves ~$0.003/turn on
cache hits. Over a 50-turn session, that's ~$0.15 saved on tools
alone, on top of the existing system prompt and history caching.
emal-avala added a commit that referenced this pull request on Apr 6, 2026:
PR #78 squash merge accidentally removed the 15 feature specs added by PR #76. This restores them and applies accuracy corrections from an audit against the actual codebase:
- 7.13 Provider Prompt Caching: marked Done — system prompt caching, message breakpoints, cache tracking, cost display, tool caching, and config toggle all implemented
- 7.14 Local LLM Auto-Discovery: marked Partially Done — Ollama detection already exists in setup.rs, only LM Studio/llama.cpp remaining
- 7.15 Conversation Branching: marked Partially Done — /fork command and /resume exist, advanced branching (named branches, checkout, merge) still needed
- Updated Contributing section to reflect completed items
Summary
Completes prompt caching support by adding `cache_control` to tool definitions and exposing a user-facing config toggle.

What was already implemented (this PR does NOT change):
- System prompt and message caching: `cache_control: { type: "ephemeral" }` set via `messages_to_api_params_cached()`
- `Usage` struct tracks `cache_read_input_tokens` and `cache_creation_input_tokens`
- `/cost` command shows cache hit percentage per model
- `CacheTracker` detects cache breaks via fingerprinting
- `anthropic-beta: prompt-caching-2024-07-31` header

What this PR adds:
- `cache_control: { type: "ephemeral" }` on the last tool in the tools array, so the API caches the entire prefix (system + tools) as one block. With 32 tools (~15K tokens), this saves ~$0.003/turn on hits.
- `features.prompt_caching` (default: `true`) in `config.toml`. Users on providers without caching support can disable it to avoid unknown fields in requests.
- `query/mod.rs` now reads the feature flag instead of hardcoding `enable_caching: true`.

Files changed
- `crates/lib/src/llm/anthropic.rs`: add `cache_control` to last tool definition
- `crates/lib/src/llm/client.rs`: same change in the legacy client path
- `crates/lib/src/config/schema.rs`: `prompt_caching` feature flag
- `crates/lib/src/query/mod.rs`: wire the feature flag into `ProviderRequest`

Implements roadmap item 7.13.
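The toggle would be set like this in `config.toml`. A sketch: the `[features]` table name is inferred from the `features.prompt_caching` key described above.

```toml
# Disable prompt caching for providers that reject unknown fields
# such as cache_control in tool definitions.
[features]
prompt_caching = false
```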
Test plan
- `cargo fmt --all -- --check` — clean
- `cargo clippy -- -D warnings` — zero warnings
- `cargo test` — all tests pass
- Verified `cache_control` appears in the API request (debug log)
- Verified `prompt_caching = false` in config.toml disables all `cache_control` markers
- Verified `/cost` shows cache hit rate improvement after tools caching

🤖 Generated with Claude Code
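For reference, the cache hit percentage checked in the test plan could be derived from the `Usage` fields named in the summary. A minimal sketch: the field names come from this PR, but the struct shape and the hit-rate formula are assumptions, not the crate's actual code.

```rust
/// Simplified token-usage record; field names follow the PR summary.
#[derive(Debug, Default)]
struct Usage {
    input_tokens: u64,
    cache_read_input_tokens: u64,
    cache_creation_input_tokens: u64,
}

impl Usage {
    /// Percentage of prompt tokens that were served from cache.
    /// (Assumed formula: cache reads over all prompt tokens.)
    fn cache_hit_pct(&self) -> f64 {
        let total = self.input_tokens
            + self.cache_read_input_tokens
            + self.cache_creation_input_tokens;
        if total == 0 {
            0.0
        } else {
            100.0 * self.cache_read_input_tokens as f64 / total as f64
        }
    }
}
```

With the ~15K cached tool tokens landing in `cache_read_input_tokens` on every turn after the first, this percentage should rise once tool caching is enabled.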