
feat(usage): track 5m/1h cache TTL split + alwaysLoad for nerve MCP #76

pufit merged 2 commits into main from pufit/sdk-upgrade-cache-ttl-split on May 16, 2026.

pufit commented May 16, 2026

Summary

Two related improvements, unlocked by upgrading the Claude Agent SDK from 0.1.60 to 0.2.82 (and the bundled Claude Code CLI from 2.1.111 to 2.1.142):

  1. Track the 5-minute vs 1-hour ephemeral cache TTL split. The Anthropic API returns usage.cache_creation.ephemeral_{5m,1h}_input_tokens alongside the legacy aggregate. The two TTLs are billed at different rates (5m = 1.25x base, 1h = 2.00x base), so accurate per-turn cost attribution requires the split.
  2. alwaysLoad: true on the in-process nerve MCP server. Skips Claude Code's tool-search deferral for the nerve server's tools (memory_recall, task_*, plan_*, notify, ask_user, react, etc.). These tools are used in nearly every session — deferring them only added a ToolSearch round-trip on startup.
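
To make the cost impact of item 1 concrete, the cache-write cost of a turn can be written as below. Here T is the number of tokens written to each cache tier and r is the rate in USD per million tokens; the symbols are notation introduced for illustration, while the 1.25x and 2.00x multipliers are the ones quoted above. The second line is the amount by which billing 1-hour writes at the 5-minute rate understates the cost.

```math
C_{\text{write}} = \frac{T_{5\text{m}} \, r_{5\text{m}} + T_{1\text{h}} \, r_{1\text{h}}}{10^{6}},
\qquad r_{5\text{m}} = 1.25 \, r_{\text{base}}, \quad r_{1\text{h}} = 2.00 \, r_{\text{base}}
\\[6pt]
\Delta C = \frac{T_{1\text{h}} \left( r_{1\text{h}} - r_{5\text{m}} \right)}{10^{6}} = \frac{0.75 \, r_{\text{base}} \, T_{1\text{h}}}{10^{6}}
```

With the Opus 4.7 cache-write rates listed under Changes (6.25 and 10.00 USD per MTok), a turn that writes 100,000 tokens to the 1-hour cache costs 100,000 × 10.00 / 10^6 = 1.00 USD. Billing the same aggregate at the 5-minute rate gives 0.625 USD, understating that turn by 0.375 USD.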

Changes

Cache TTL split

  • Migration v027 adds cache_creation_5m_input_tokens and cache_creation_1h_input_tokens columns to session_usage (default 0).
  • extract_cache_ttl_split() parses usage.cache_creation.ephemeral_* from the raw API dict.
  • record_turn_usage accepts the split as keyword args; the engine pulls it from ResultMessage.usage and persists it per turn.
  • MODEL_PRICING now carries separate cache_write_5m and cache_write_1h rates per model (Opus 4.7: $6.25 vs $10.00 per MTok).
  • estimate_turn_cost / estimate_cost_from_totals prefer the split when present and fall back to billing the legacy aggregate at the 5-minute rate for pre-v027 rows and older API responses.
  • ContextBar surfaces nested ↳5m / ↳1h rows under "Cache created" when the split arrives.
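
A minimal Python sketch of the parsing and fallback logic described above follows. It assumes the raw API usage dict has the documented shape: a nested cache_creation object carrying ephemeral_5m_input_tokens and ephemeral_1h_input_tokens next to the legacy cache_creation_input_tokens aggregate. The function body and return type, the cache_write_cost helper, the CACHE_WRITE_RATES table, and its "opus-4.7" key are illustrative assumptions, not the code merged in this PR; only the two Opus 4.7 cache-write rates come from the text.

```python
from collections.abc import Mapping
from typing import Any, Optional

# Hypothetical rate table (USD per million tokens). The Opus 4.7 numbers come
# from this PR; the "opus-4.7" key and the table's shape are assumptions.
CACHE_WRITE_RATES: dict[str, dict[str, float]] = {
    "opus-4.7": {"cache_write_5m": 6.25, "cache_write_1h": 10.00},
}


def extract_cache_ttl_split(usage: Mapping[str, Any]) -> Optional[tuple[int, int]]:
    """Return (5m, 1h) cache-creation tokens, or None when the API sent no split."""
    cache_creation = usage.get("cache_creation")
    if not isinstance(cache_creation, Mapping):
        return None  # legacy response: only the aggregate is available
    return (
        int(cache_creation.get("ephemeral_5m_input_tokens") or 0),
        int(cache_creation.get("ephemeral_1h_input_tokens") or 0),
    )


def cache_write_cost(model: str, usage: Mapping[str, Any]) -> float:
    """Cache-write cost in USD for one turn, preferring the TTL split."""
    rates = CACHE_WRITE_RATES[model]
    split = extract_cache_ttl_split(usage)
    if split is None:
        # Fallback: bill the legacy aggregate entirely at the 5-minute rate.
        aggregate = int(usage.get("cache_creation_input_tokens") or 0)
        return aggregate * rates["cache_write_5m"] / 1_000_000
    tokens_5m, tokens_1h = split
    return (
        tokens_5m * rates["cache_write_5m"] + tokens_1h * rates["cache_write_1h"]
    ) / 1_000_000
```

Returning None rather than (0, 0) when the nested object is absent keeps "the API sent no split" distinguishable from "this turn wrote nothing to either cache", which is what lets the 5-minute fallback fire only for legacy responses.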

alwaysLoad

  • create_session_mcp_server / create_nerve_mcp_server set alwaysLoad: True on the returned SDK config. Requires Claude Code CLI >= 2.1.121; silently ignored on older versions. The SDK transport already passes through unknown keys verbatim under --mcp-config.
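
The alwaysLoad change amounts to adding one key to the server config before it is serialized for the CLI. The Python sketch below assumes the CLI receives a JSON object with an mcpServers map, the shape Claude Code's MCP config files use; the helper names, the "nerve" entry, and its other keys are hypothetical placeholders rather than the SDK's real server config type. Only the 2.1.121 minimum version comes from this PR.

```python
import json
from typing import Any

# Minimum Claude Code CLI version that honors alwaysLoad (per this PR).
MIN_ALWAYS_LOAD_CLI = (2, 1, 121)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn "2.1.142" into (2, 1, 142) so versions compare as tuples."""
    return tuple(int(part) for part in version.split("."))


def with_always_load(server_config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of an MCP server config with alwaysLoad enabled.

    The key is set unconditionally: older CLIs ignore it, so no gate is needed.
    """
    return {**server_config, "alwaysLoad": True}


def always_load_honored(cli_version: str) -> bool:
    """Report whether a CLI version will actually skip tool-search deferral."""
    return parse_version(cli_version) >= MIN_ALWAYS_LOAD_CLI


def to_mcp_config_json(servers: dict[str, dict[str, Any]]) -> str:
    """Serialize servers for --mcp-config; unknown keys pass through verbatim."""
    return json.dumps({"mcpServers": servers})
```

Keeping the version check out of with_always_load mirrors the reasoning above: since older CLIs silently ignore the key, gating it would add complexity without changing behavior.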

Compatibility

  • The SDK upgrade is internal cleanup: the public API is backward-compatible, and all 564 pre-existing tests pass unchanged.
  • Historical session_usage rows keep their existing aggregate in cache_creation_input_tokens; the new 5m/1h columns default to 0 and only get populated from the next turn forward.
  • The cost estimator's fallback bills legacy rows at the 5-minute rate — the conservative default.
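
The compatibility behavior can be sketched end to end with an in-memory database. This Python sketch assumes session_usage lives in SQLite and that v027 is a pair of ALTER TABLE statements; the table's other columns and the apply_v027, row_cache_write_cost, and demo names are hypothetical. It shows a pre-migration row picking up 0 in both new columns and being billed through the 5-minute fallback, while a post-migration row with a populated split is billed per TTL.

```python
import sqlite3


def apply_v027(conn: sqlite3.Connection) -> None:
    """Hypothetical v027: add the per-TTL columns, defaulting existing rows to 0."""
    for column in ("cache_creation_5m_input_tokens", "cache_creation_1h_input_tokens"):
        conn.execute(
            f"ALTER TABLE session_usage ADD COLUMN {column} INTEGER NOT NULL DEFAULT 0"
        )


def row_cache_write_cost(
    aggregate: int, tokens_5m: int, tokens_1h: int, rate_5m: float, rate_1h: float
) -> float:
    """Cache-write cost in USD for one stored turn (rates are USD per MTok)."""
    if tokens_5m == 0 and tokens_1h == 0:
        # No split recorded (pre-v027 row or legacy response): use the 5m rate.
        return aggregate * rate_5m / 1_000_000
    return (tokens_5m * rate_5m + tokens_1h * rate_1h) / 1_000_000


def demo() -> list[float]:
    """Migrate a DB holding one legacy row, add one post-v027 row, cost both."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE session_usage (id INTEGER PRIMARY KEY, "
        "cache_creation_input_tokens INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO session_usage (cache_creation_input_tokens) VALUES (80000)")
    apply_v027(conn)
    conn.execute(
        "INSERT INTO session_usage (cache_creation_input_tokens, "
        "cache_creation_5m_input_tokens, cache_creation_1h_input_tokens) "
        "VALUES (80000, 20000, 60000)"
    )
    rows = conn.execute(
        "SELECT cache_creation_input_tokens, cache_creation_5m_input_tokens, "
        "cache_creation_1h_input_tokens FROM session_usage ORDER BY id"
    ).fetchall()
    conn.close()
    # Opus 4.7 cache-write rates from this PR: 6.25 (5m) and 10.00 (1h) USD/MTok.
    return [row_cache_write_cost(*row, rate_5m=6.25, rate_1h=10.00) for row in rows]
```

The sketch treats "both new columns are 0" as "no split recorded", matching the rule that legacy rows bill their aggregate at the 5-minute rate.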

Test plan

  • 17 new tests in tests/test_usage_cache_ttl.py (schema, parsing, persistence, pricing arity/monotonicity, 4 cost-estimation scenarios)
  • 1 new assertion in tests/test_session_mcp.py::test_nerve_server_marked_always_load
  • Full suite: 582 passing (was 564)
  • Frontend npm run build clean (strict TS)
  • Verified live on Pi after restart — migration applied, new sessions surface the 5m/1h split in the ContextBar tooltip, and nerve MCP tools no longer appear in the deferred-tools <system-reminder> block
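
The pricing arity/monotonicity tests mentioned above might check something like the following Python sketch. The required key set, the "opus-4.7" key, the pricing_problems helper, and the base input rate of 5.00 (back-derived from 6.25 = 1.25 × base) are assumptions for illustration; the 1.25x and 2.00x multipliers are the ones stated in the Summary.

```python
import math

REQUIRED_KEYS = {"input", "cache_write_5m", "cache_write_1h"}

# Illustrative entry (USD per MTok); the input rate is back-derived from 6.25 / 1.25.
MODEL_PRICING = {
    "opus-4.7": {"input": 5.00, "cache_write_5m": 6.25, "cache_write_1h": 10.00},
}


def pricing_problems(pricing: dict[str, dict[str, float]]) -> list[str]:
    """Return human-readable problems; an empty list means the table looks sane."""
    problems: list[str] = []
    for model, rates in pricing.items():
        # Arity: every model must carry all three rates.
        missing = REQUIRED_KEYS - set(rates)
        if missing:
            problems.append(f"{model}: missing {sorted(missing)}")
            continue
        # Monotonicity: longer-lived cache writes must never be cheaper.
        if not rates["input"] < rates["cache_write_5m"] < rates["cache_write_1h"]:
            problems.append(f"{model}: expected input < cache_write_5m < cache_write_1h")
        # Multipliers: 5m = 1.25x base input, 1h = 2.00x base input.
        elif not (
            math.isclose(rates["cache_write_5m"], 1.25 * rates["input"])
            and math.isclose(rates["cache_write_1h"], 2.00 * rates["input"])
        ):
            problems.append(f"{model}: cache-write rates are not 1.25x / 2.00x of input")
    return problems
```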

Generated by Nerve

pufit added 2 commits May 16, 2026 14:33
Bump claude-agent-sdk to 0.2.82 (from 0.1.60). The Anthropic API
splits cache_creation by TTL under `usage.cache_creation.ephemeral_*`;
the two TTLs are billed at different rates (5m = 1.25x base input,
1h = 2.00x base input), so accurate cost attribution needs the split.

- Migration v027: adds cache_creation_5m_input_tokens and
  cache_creation_1h_input_tokens to session_usage (default 0).
- record_turn_usage: accepts the split as keyword args; engine
  extracts it from ResultMessage.usage via extract_cache_ttl_split.
- MODEL_PRICING: separate cache_write_5m and cache_write_1h rates;
  estimate_turn_cost prefers the split, falls back to 5m rate when
  the API response omits it (legacy responses + pre-v027 rows).
- ContextBar surfaces the per-TTL breakdown when present.

Set alwaysLoad: true on the in-process nerve server so its core tools
(memory_recall, task_*, plan_*, notify, ask_user, react, send_file,
skill_*, etc.) skip Claude Code's tool-search deferral and are
available on the first turn. These tools are used in nearly every
session — deferring them only adds a ToolSearch round-trip on startup.

Requires Claude Code CLI >= 2.1.121; silently ignored on older
versions. The SDK transport passes through unknown keys verbatim
under --mcp-config.
pufit merged commit fb745f4 into main on May 16, 2026.
pufit deleted the pufit/sdk-upgrade-cache-ttl-split branch on May 16, 2026 at 18:48.