feat(usage): track 5m/1h cache TTL split + alwaysLoad for nerve MCP by pufit · Pull Request #76 · ClickHouse/nerve

pufit · 2026-05-16T18:47:56Z

Summary

Two related improvements unlocked by upgrading the Claude Agent SDK from 0.1.60 → 0.2.82 (and the bundled Claude Code CLI from 2.1.111 → 2.1.142):

Track the 5-minute vs 1-hour ephemeral cache TTL split. The Anthropic API returns usage.cache_creation.ephemeral_{5m,1h}_input_tokens alongside the legacy aggregate. The two TTLs are billed at different rates (5m = 1.25x base, 1h = 2.00x base), so accurate per-turn cost attribution requires the split.
alwaysLoad: true on the in-process nerve MCP server. Skips Claude Code's tool-search deferral for the nerve server's tools (memory_recall, task_*, plan_*, notify, ask_user, react, etc.). These tools are used in nearly every session — deferring them only added a ToolSearch round-trip on startup.
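To make the billing difference concrete, here is a small worked example using the Opus 4.7 cache-write prices quoted in this PR ($6.25 and $10.00 per MTok). The token counts and function names are made up for illustration and do not come from the nerve codebase.

```python
# Per-MTok cache-write prices quoted in this PR for Opus 4.7.
CACHE_WRITE_5M_PER_MTOK = 6.25   # 1.25x base input
CACHE_WRITE_1H_PER_MTOK = 10.00  # 2.00x base input


def cache_write_cost(tokens_5m: int, tokens_1h: int) -> float:
    """Cost in USD of cache writes when the per-TTL split is known."""
    return (tokens_5m * CACHE_WRITE_5M_PER_MTOK
            + tokens_1h * CACHE_WRITE_1H_PER_MTOK) / 1_000_000


def aggregate_cost_at_5m(total_tokens: int) -> float:
    """Cost in USD if the whole aggregate is billed at the 5-minute rate."""
    return total_tokens * CACHE_WRITE_5M_PER_MTOK / 1_000_000


# Hypothetical turn: 100k tokens written with the 5m TTL, 50k with the 1h TTL.
split = cache_write_cost(100_000, 50_000)
flat = aggregate_cost_at_5m(150_000)
print(f"split-aware: ${split:.4f}, aggregate at 5m rate: ${flat:.4f}")
# prints: split-aware: $1.1250, aggregate at 5m rate: $0.9375
```

In this hypothetical turn, pricing the aggregate at the 5-minute rate undercounts the cache-write cost by about 17%.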

Changes

Cache TTL split

Migration v027 adds cache_creation_5m_input_tokens and cache_creation_1h_input_tokens columns to session_usage (default 0).
extract_cache_ttl_split() parses usage.cache_creation.ephemeral_* from the raw API dict.
record_turn_usage accepts the split as keyword args; the engine pulls it from ResultMessage.usage and persists it per turn.
MODEL_PRICING now carries separate cache_write_5m and cache_write_1h rates per model (Opus 4.7: $6.25 vs $10.00 per MTok).
estimate_turn_cost / estimate_cost_from_totals prefer the split when present; fall back to the legacy aggregate at the 5-minute rate for pre-v027 rows and older API responses.
ContextBar surfaces nested ↳5m / ↳1h rows under "Cache created" when the split arrives.
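The parsing and fallback behaviour listed above could look roughly like the sketch below. It is not the code from this PR: the `_sketch` function names, the pricing dict shape, and the "split is present if either count is non-zero" rule are assumptions. Only the field names (`cache_creation`, `ephemeral_5m_input_tokens`, `ephemeral_1h_input_tokens`, `cache_creation_input_tokens`) and the Opus 4.7 rates come from the description.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any

# Assumed shape for one MODEL_PRICING entry (USD per MTok, Opus 4.7 rates from the PR).
PRICING_SKETCH = {"cache_write_5m": 6.25, "cache_write_1h": 10.00}


def extract_cache_ttl_split_sketch(usage: Mapping[str, Any] | None) -> tuple[int, int]:
    """Return (tokens_5m, tokens_1h) from a raw usage dict, or (0, 0) if absent."""
    if not usage:
        return 0, 0
    creation = usage.get("cache_creation")
    if not isinstance(creation, Mapping):
        return 0, 0
    return (
        int(creation.get("ephemeral_5m_input_tokens") or 0),
        int(creation.get("ephemeral_1h_input_tokens") or 0),
    )


def cache_write_cost_sketch(usage: Mapping[str, Any], pricing: Mapping[str, float]) -> float:
    """Prefer the per-TTL split; otherwise bill the legacy aggregate at the 5m rate."""
    tokens_5m, tokens_1h = extract_cache_ttl_split_sketch(usage)
    if tokens_5m or tokens_1h:
        total = tokens_5m * pricing["cache_write_5m"] + tokens_1h * pricing["cache_write_1h"]
    else:
        # Legacy path: older API responses and pre-v027 rows carry only the aggregate.
        total = int(usage.get("cache_creation_input_tokens") or 0) * pricing["cache_write_5m"]
    return total / 1_000_000


new_usage = {
    "cache_creation_input_tokens": 150_000,
    "cache_creation": {"ephemeral_5m_input_tokens": 100_000, "ephemeral_1h_input_tokens": 50_000},
}
legacy_usage = {"cache_creation_input_tokens": 150_000}

new_cost = cache_write_cost_sketch(new_usage, PRICING_SKETCH)        # 1.125
legacy_cost = cache_write_cost_sketch(legacy_usage, PRICING_SKETCH)  # 0.9375
```

Treating a missing or malformed `cache_creation` object as `(0, 0)` lets the same code path serve both new and legacy responses without raising.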

alwaysLoad

create_session_mcp_server / create_nerve_mcp_server set alwaysLoad: True on the returned SDK config. Requires Claude Code CLI >= 2.1.121; silently ignored on older versions. The SDK transport already passes through unknown keys verbatim under --mcp-config.
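As a rough illustration of the flag and its version requirement, the sketch below adds `alwaysLoad` to a server config dict and checks whether a given CLI version would honor it. The helper names and the placeholder config are hypothetical. Only the `alwaysLoad` key and the 2.1.121 minimum come from this PR.

```python
# Minimum Claude Code CLI version that honors alwaysLoad, per this PR.
MIN_CLI_FOR_ALWAYS_LOAD = (2, 1, 121)


def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '2.1.142' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def cli_honors_always_load(cli_version: str) -> bool:
    """True if the CLI is new enough to act on alwaysLoad (hypothetical helper)."""
    return parse_version(cli_version) >= MIN_CLI_FOR_ALWAYS_LOAD


def with_always_load(server_config: dict) -> dict:
    """Return a copy of the config with alwaysLoad set, leaving the input untouched."""
    return {**server_config, "alwaysLoad": True}


# Placeholder config; the real object returned by the SDK may carry other keys.
base_config = {"type": "sdk", "name": "nerve"}
nerve_config = with_always_load(base_config)

print(cli_honors_always_load("2.1.142"))  # True: the version bundled after this upgrade
print(cli_honors_always_load("2.1.111"))  # False: the flag would be silently ignored
```

Comparing integer tuples rather than raw strings matters here: as strings, "2.1.99" sorts after "2.1.121".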

Compatibility

The SDK upgrade is internal cleanup: the public API is backward-compatible, and all 564 pre-existing tests pass unchanged.
Historical session_usage rows keep their existing aggregate in cache_creation_input_tokens; the new 5m/1h columns default to 0 and only get populated from the next turn forward.
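The cost estimator's fallback bills legacy rows at the 5-minute rate, the lower of the two. This is the conservative default in the sense that it never overstates cost, though any legacy writes that actually used the 1-hour TTL will be undercounted.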
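The compatibility behaviour for historical rows can be demonstrated with an in-memory database. This sketch assumes SQLite and a stripped-down `session_usage` table. The PR does not show the migration SQL, so the `ALTER TABLE` statements are a guess at what v027 might do; only the column names and the default of 0 come from the description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified pre-v027 table: only the legacy aggregate column.
conn.execute(
    "CREATE TABLE session_usage ("
    " id INTEGER PRIMARY KEY,"
    " cache_creation_input_tokens INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO session_usage (cache_creation_input_tokens) VALUES (150000)")

# Assumed form of migration v027: two new columns, both defaulting to 0.
for column in ("cache_creation_5m_input_tokens", "cache_creation_1h_input_tokens"):
    conn.execute(f"ALTER TABLE session_usage ADD COLUMN {column} INTEGER NOT NULL DEFAULT 0")

# A turn recorded after the migration carries the split alongside the aggregate.
conn.execute(
    "INSERT INTO session_usage (cache_creation_input_tokens,"
    " cache_creation_5m_input_tokens, cache_creation_1h_input_tokens)"
    " VALUES (150000, 100000, 50000)"
)

rows = conn.execute(
    "SELECT cache_creation_input_tokens, cache_creation_5m_input_tokens,"
    " cache_creation_1h_input_tokens FROM session_usage ORDER BY id"
).fetchall()
print(rows)  # [(150000, 0, 0), (150000, 100000, 50000)]
```

The legacy row keeps its aggregate and reads back zeros for both new columns, which is the condition under which the estimator falls back to the 5-minute rate.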

Test plan

17 new tests in tests/test_usage_cache_ttl.py (schema, parsing, persistence, pricing arity/monotonicity, 4 cost-estimation scenarios)
1 new assertion in tests/test_session_mcp.py::test_nerve_server_marked_always_load
Full suite: 582 passing (was 564)
Frontend npm run build clean (strict TS)
Verified live on Pi after restart — migration applied, new sessions surface the 5m/1h split in the ContextBar tooltip, and nerve MCP tools no longer appear in the deferred-tools <system-reminder> block

Generated by Nerve

Bump claude-agent-sdk to 0.2.82 (from 0.1.60). The Anthropic API splits cache_creation by TTL under `usage.cache_creation.ephemeral_*`; the two TTLs are billed at different rates (5m = 1.25x base input, 1h = 2.00x base input), so accurate cost attribution needs the split.

- Migration v027: adds cache_creation_5m_input_tokens and cache_creation_1h_input_tokens to session_usage (default 0).
- record_turn_usage: accepts the split as keyword args; engine extracts it from ResultMessage.usage via extract_cache_ttl_split.
- MODEL_PRICING: separate cache_write_5m and cache_write_1h rates; estimate_turn_cost prefers the split, falls back to 5m rate when the API response omits it (legacy responses + pre-v027 rows).
- ContextBar surfaces the per-TTL breakdown when present.

Set alwaysLoad: true on the in-process nerve server so its core tools (memory_recall, task_*, plan_*, notify, ask_user, react, send_file, skill_*, etc.) skip Claude Code's tool-search deferral and are available on the first turn. These tools are used in nearly every session — deferring them only adds a ToolSearch round-trip on startup. Requires Claude Code CLI >= 2.1.121; silently ignored on older versions. The SDK transport passes through unknown keys verbatim under --mcp-config.

pufit added 2 commits May 16, 2026 14:33

pufit merged commit fb745f4 into main May 16, 2026

pufit deleted the pufit/sdk-upgrade-cache-ttl-split branch May 16, 2026 18:48
