feat(usage): track 5m/1h cache TTL split + alwaysLoad for nerve MCP #76
Merged
Bump claude-agent-sdk to 0.2.82 (from 0.1.60). The Anthropic API splits cache_creation by TTL under `usage.cache_creation.ephemeral_*`; the two TTLs are billed at different rates (5m = 1.25x base input, 1h = 2.00x base input), so accurate cost attribution needs the split.

- Migration v027: adds cache_creation_5m_input_tokens and cache_creation_1h_input_tokens to session_usage (default 0).
- record_turn_usage: accepts the split as keyword args; engine extracts it from ResultMessage.usage via extract_cache_ttl_split.
- MODEL_PRICING: separate cache_write_5m and cache_write_1h rates; estimate_turn_cost prefers the split and falls back to the 5m rate when the API response omits it (legacy responses + pre-v027 rows).
- ContextBar surfaces the per-TTL breakdown when present.
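The split-then-fallback logic described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: the function names `extract_split_sketch` and `cache_write_cost_sketch` and the `base_input_per_mtok` parameter are hypothetical. The field names and the 1.25x / 2.00x multipliers come from the description.

```python
from typing import Any, Mapping, Optional, Tuple

# Multipliers quoted in the PR description (relative to the base input rate).
CACHE_WRITE_5M_MULT = 1.25
CACHE_WRITE_1H_MULT = 2.00


def extract_split_sketch(usage: Optional[Mapping[str, Any]]) -> Tuple[int, int]:
    """Return (5m_tokens, 1h_tokens) from a raw usage dict, or (0, 0) if absent."""
    cache_creation = (usage or {}).get("cache_creation")
    if not isinstance(cache_creation, Mapping):
        return 0, 0
    tokens_5m = int(cache_creation.get("ephemeral_5m_input_tokens") or 0)
    tokens_1h = int(cache_creation.get("ephemeral_1h_input_tokens") or 0)
    return tokens_5m, tokens_1h


def cache_write_cost_sketch(
    usage: Optional[Mapping[str, Any]], base_input_per_mtok: float
) -> float:
    """Estimate cache-write cost in dollars, preferring the per-TTL split."""
    tokens_5m, tokens_1h = extract_split_sketch(usage)
    if tokens_5m == 0 and tokens_1h == 0:
        # No split reported: bill the legacy aggregate at the 5-minute rate.
        tokens_5m = int((usage or {}).get("cache_creation_input_tokens") or 0)
    weighted = tokens_5m * CACHE_WRITE_5M_MULT + tokens_1h * CACHE_WRITE_1H_MULT
    return weighted * base_input_per_mtok / 1_000_000
```

With a base input rate of $5.00 per MTok (the rate implied by the $6.25 / $10.00 figures quoted for Opus 4.7), one million tokens in each TTL bucket would come to $16.25, while a legacy response carrying only a two-million-token aggregate would be costed at $12.50.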
Set alwaysLoad: true on the in-process nerve server so its core tools (memory_recall, task_*, plan_*, notify, ask_user, react, send_file, skill_*, etc.) skip Claude Code's tool-search deferral and are available on the first turn. These tools are used in nearly every session — deferring them only adds a ToolSearch round-trip on startup. Requires Claude Code CLI >= 2.1.121; silently ignored on older versions. The SDK transport passes through unknown keys verbatim under --mcp-config.
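As a rough illustration of the paragraph above, marking a server config as always-loaded amounts to adding one extra key. The helper name `mark_always_load` and the example dict shape are assumptions made for this sketch; the real config objects returned by the SDK are not shown in this PR.

```python
import json
from typing import Any, Dict


def mark_always_load(server_config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an MCP server config with alwaysLoad enabled."""
    marked = dict(server_config)  # shallow copy; leave the caller's dict untouched
    marked["alwaysLoad"] = True
    return marked


# Hypothetical config shape, for illustration only.
nerve_config = mark_always_load({"type": "sdk", "name": "nerve"})

# The extra key survives plain JSON serialization, which is what a
# pass-through transport would rely on.
serialized = json.dumps(nerve_config)
```

Because the key is only an extra entry in the config, a CLI that does not recognize it can ignore it, which matches the "silently ignored on older versions" behavior described above.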
Summary
Two related improvements unlocked by upgrading the Claude Agent SDK from 0.1.60 → 0.2.82 (and the bundled Claude Code CLI from 2.1.111 → 2.1.142):

- Cache TTL split: the API reports `usage.cache_creation.ephemeral_{5m,1h}_input_tokens` alongside the legacy aggregate. The two TTLs are billed at different rates (5m = 1.25x base, 1h = 2.00x base), so accurate per-turn cost attribution requires the split.
- `alwaysLoad: true` on the in-process nerve MCP server. Skips Claude Code's tool-search deferral for the nerve server's tools (`memory_recall`, `task_*`, `plan_*`, `notify`, `ask_user`, `react`, etc.). These tools are used in nearly every session, so deferring them only added a `ToolSearch` round-trip on startup.

Changes
Cache TTL split
- Migration v027 adds `cache_creation_5m_input_tokens` and `cache_creation_1h_input_tokens` columns to `session_usage` (default 0).
- `extract_cache_ttl_split()` parses `usage.cache_creation.ephemeral_*` from the raw API dict.
- `record_turn_usage` accepts the split as keyword args; the engine pulls it from `ResultMessage.usage` and persists it per turn.
- `MODEL_PRICING` now carries separate `cache_write_5m` and `cache_write_1h` rates per model (Opus 4.7: $6.25 vs $10.00 per MTok).
- `estimate_turn_cost` / `estimate_cost_from_totals` prefer the split when present; they fall back to the legacy aggregate at the 5-minute rate for pre-v027 rows and older API responses.

alwaysLoad
`create_session_mcp_server` / `create_nerve_mcp_server` set `alwaysLoad: True` on the returned SDK config. Requires Claude Code CLI >= 2.1.121; silently ignored on older versions. The SDK transport already passes through unknown keys verbatim under `--mcp-config`.

Compatibility
Existing `session_usage` rows keep their aggregate in `cache_creation_input_tokens`; the new 5m/1h columns default to 0 and are only populated from the next turn forward.

Test plan
- `tests/test_usage_cache_ttl.py` (schema, parsing, persistence, pricing arity/monotonicity, 4 cost-estimation scenarios)
- `tests/test_session_mcp.py::test_nerve_server_marked_always_load`
- `npm run build` clean (strict TS)
- `<system-reminder>` block