Goal
Keep prompt-cache contexts alive across heartbeat cycles: heartbeats shorter than the cache TTL keep the cache warm on their own, heartbeats longer than the TTL get an optional scheduled keepalive, and a configured heartbeat that exceeds the provider's TTL with no keepalive triggers a warning.
Context
Anthropic's 1h sliding TTL (enabled via `ENABLE_PROMPT_CACHING_1H`) refreshes on every cache hit. For agents heartbeating at sub-TTL intervals, this means the cache stays warm for free across all wakes, dropping per-wake cost ~10×. Above the TTL, the cache cold-starts every wake.
OpenAI/Codex caches have a 5–10 min TTL with no extension flag — break-even is shorter.
Gemini context cache is request-counted on the API side; for the AI Pro CLI bridge, behavior is closer to per-session.
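The TTL comparison above can be sketched as a small helper. All names here (`PROVIDER_CACHE_TTL_S`, `needs_keepalive`) are illustrative, not existing project APIs, and the TTL values only restate the figures from this section:

```python
# Illustrative TTLs only; values restate the figures above
# (Anthropic 1h sliding TTL, OpenAI/Codex 5-10 min, no flag to extend).
PROVIDER_CACHE_TTL_S = {
    "anthropic": 3600,  # 1h sliding TTL via ENABLE_PROMPT_CACHING_1H
    "openai": 300,      # conservative end of the 5-10 min window
}

def needs_keepalive(heartbeat_interval_s: float, provider: str) -> bool:
    """A heartbeat below the TTL refreshes the cache for free on every wake;
    above it, the cache cold-starts unless a keepalive is scheduled."""
    ttl = PROVIDER_CACHE_TTL_S[provider]
    return heartbeat_interval_s > ttl
```

Note that the same heartbeat interval can be fine for one provider and past break-even for another (e.g. a 10-minute heartbeat is sub-TTL for Anthropic but above the OpenAI window).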
Scope
- Add `MessageType.CACHE_PING` (or equivalent) in the scheduler — a 1-token no-op turn whose only purpose is to refresh the cache.
- For agents whose heartbeat interval is below the provider's TTL, no keepalive is needed. For agents above the TTL where cost optimization is still worthwhile (configurable), schedule a keepalive at `TTL - margin`.
- Validation: at agent registration, compare `heartbeat.interval` against the provider TTL and emit a warning into the snapshot when the configuration is suboptimal.
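The three scope items could look roughly like the sketch below. Everything here is an assumption about shape, not existing code: the enum members, `next_cache_ping_delay`, `validate_heartbeat`, and the warning text are all hypothetical:

```python
import enum

class MessageType(enum.Enum):
    HEARTBEAT = "heartbeat"
    CACHE_PING = "cache_ping"  # 1-token no-op turn; its only effect is a cache hit

def next_cache_ping_delay(heartbeat_interval_s: float, ttl_s: float,
                          margin_s: float = 60.0):
    """Seconds until a CACHE_PING should fire, or None if the regular
    heartbeat already refreshes the cache (sub-TTL interval)."""
    if heartbeat_interval_s <= ttl_s:
        return None  # sub-TTL heartbeat keeps the cache warm for free
    return max(ttl_s - margin_s, 0.0)

def validate_heartbeat(agent_name: str, heartbeat_interval_s: float,
                       ttl_s: float, keepalive_enabled: bool,
                       snapshot_warnings: list):
    """Registration-time check: warn into the snapshot when the heartbeat
    exceeds the provider TTL and no keepalive is configured."""
    if heartbeat_interval_s > ttl_s and not keepalive_enabled:
        snapshot_warnings.append(
            f"{agent_name}: heartbeat {heartbeat_interval_s:.0f}s exceeds provider "
            f"cache TTL {ttl_s:.0f}s; cache will cold-start every wake"
        )
```

The `margin` exists because a ping scheduled exactly at the TTL boundary can land after expiry due to scheduler jitter or request latency.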
Acceptance
- A unit test demonstrates that a cache ping reduces `cache_creation` tokens on the next real turn.
- Snapshot shows a clear warning when an agent's heartbeat exceeds the provider's TTL and no keepalive is configured.
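The first acceptance criterion could be exercised against a toy model of a sliding-TTL cache, without hitting a real provider. `SlidingTtlCache` and the token counts below are invented for illustration:

```python
class SlidingTtlCache:
    """Toy model of a sliding-TTL prompt cache: any hit within the TTL
    slides the expiry forward; a miss re-creates the cached prefix."""

    def __init__(self, ttl_s: float, prefix_tokens: int):
        self.ttl_s = ttl_s
        self.prefix_tokens = prefix_tokens
        self.expires_at = None  # no cached prefix yet

    def turn(self, now_s: float) -> int:
        """Run a turn at time now_s; return cache_creation tokens charged."""
        if self.expires_at is not None and now_s < self.expires_at:
            self.expires_at = now_s + self.ttl_s  # hit: slide the window
            return 0
        self.expires_at = now_s + self.ttl_s      # miss: rebuild the prefix
        return self.prefix_tokens

# With a ping at TTL - margin, a heartbeat past the 1h TTL still hits:
warm = SlidingTtlCache(ttl_s=3600, prefix_tokens=50_000)
warm.turn(0)       # first turn creates the cache (50_000 tokens)
warm.turn(3540)    # CACHE_PING at TTL - 60s: hit, window slides to 7140
warm.turn(7000)    # real turn: hit, 0 cache_creation tokens

# Without the ping, the same real turn cold-starts:
cold = SlidingTtlCache(ttl_s=3600, prefix_tokens=50_000)
cold.turn(0)
cold.turn(7000)    # miss: full 50_000 cache_creation tokens again
```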