Add cache keepalive primitive and heartbeat-vs-TTL warning #14

@EightRice

Description

Goal

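Keep prompt-cache contexts warm across heartbeat cycles. Heartbeats shorter than the cache TTL already refresh the cache on their own, so add an optional keepalive for agents whose heartbeat interval exceeds the TTL, and warn when a configured heartbeat exceeds the provider's TTL without one.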

Context

Anthropic's 1h sliding TTL (enabled via ENABLE_PROMPT_CACHING_1H) refreshes on every cache hit. For agents heartbeating at sub-TTL intervals, each wake is itself a cache hit, so the cache stays warm across all wakes with no extra keepalive traffic, dropping per-wake cost ~10×. When the heartbeat interval exceeds the TTL, the cache expires between wakes and every wake cold-starts it.

OpenAI/Codex caches have a 5–10 min TTL with no extension flag, so the longest heartbeat interval that still lands on a warm cache is much shorter.

Gemini context cache is request-counted on the API side; for the AI Pro CLI bridge, behavior is closer to per-session.
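The warm-versus-cold distinction above can be sketched as a small cost model. This is only an illustration: the multipliers are assumptions derived from the "~10×" figure in this issue rather than from any price list, and `PROVIDER_TTL_S`, `wake_is_warm`, and `prefix_cost` are hypothetical names. Gemini is left out because its behavior is not a simple time-based TTL.

```python
# Assumption taken from the "~10x" figure above: a warm wake pays about a
# tenth of what a cold wake pays for the cached prefix. Real pricing differs
# per provider and should be checked against the current price list.
WARM_MULTIPLIER = 0.1
COLD_MULTIPLIER = 1.0

# Hypothetical TTL table in seconds. The OpenAI entry uses the low end of
# the 5-10 min range mentioned above.
PROVIDER_TTL_S = {"anthropic_1h": 3600, "openai": 300}


def wake_is_warm(heartbeat_s: float, ttl_s: float) -> bool:
    """With a sliding TTL, a wake hits the cache only if it arrives before expiry."""
    return heartbeat_s < ttl_s


def prefix_cost(prefix_tokens: int, heartbeat_s: float, ttl_s: float) -> float:
    """Cost of the cached prefix for one wake, in base-price token units."""
    warm = wake_is_warm(heartbeat_s, ttl_s)
    multiplier = WARM_MULTIPLIER if warm else COLD_MULTIPLIER
    return prefix_tokens * multiplier


if __name__ == "__main__":
    # A 30 min heartbeat with a 50k-token prefix, compared across providers.
    for name, ttl in PROVIDER_TTL_S.items():
        print(name, prefix_cost(50_000, 1800, ttl))
```

Under these assumptions, the same 30 min heartbeat is warm against the 1h TTL and cold against a 5 min TTL, which is the gap the keepalive is meant to close.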

Scope

  • Add MessageType.CACHE_PING (or equivalent) in the scheduler — a 1-token no-op turn whose purpose is to refresh the cache.
  • For agents whose heartbeat interval is below the provider's TTL, no keepalive is needed. For agents above the TTL where cost optimization is still worthwhile (configurable), schedule a keepalive every TTL - margin after the last cache hit, so the cache never expires between wakes.
  • Validation: at agent registration, compare heartbeat.interval against provider TTL and emit a warning into the snapshot when the configuration is suboptimal.
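The three scope items above could fit together roughly as follows. This is a minimal sketch, not the scheduler's actual API: `AgentCacheConfig`, `keepalive_offsets`, and `registration_warning` are hypothetical names, the 60 s default margin is arbitrary, and only `MessageType.CACHE_PING` comes from this issue.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional


class MessageType(Enum):
    HEARTBEAT = auto()
    CACHE_PING = auto()  # proposed here: 1-token no-op turn that refreshes the cache


@dataclass(frozen=True)
class AgentCacheConfig:  # hypothetical config shape
    heartbeat_interval_s: float
    provider_ttl_s: float
    keepalive_enabled: bool = False
    margin_s: float = 60.0  # arbitrary safety margin before expiry


def keepalive_offsets(cfg: AgentCacheConfig) -> List[float]:
    """Seconds after a wake at which to send CACHE_PING before the next wake."""
    if cfg.heartbeat_interval_s < cfg.provider_ttl_s or not cfg.keepalive_enabled:
        return []  # sub-TTL heartbeats refresh the cache on their own
    step = cfg.provider_ttl_s - cfg.margin_s
    if step <= 0:
        raise ValueError("margin must be smaller than the provider TTL")
    offsets = []
    t = step
    while t < cfg.heartbeat_interval_s:
        offsets.append(t)
        t += step
    return offsets


def registration_warning(cfg: AgentCacheConfig) -> Optional[str]:
    """Warning text for the snapshot, or None if the configuration is fine."""
    if cfg.heartbeat_interval_s >= cfg.provider_ttl_s and not cfg.keepalive_enabled:
        return (
            f"heartbeat interval {cfg.heartbeat_interval_s:.0f}s is not below the "
            f"cache TTL {cfg.provider_ttl_s:.0f}s and no keepalive is configured; "
            "every wake will cold-start the cache"
        )
    return None
```

With a 2h heartbeat, a 1h TTL, and a 5 min margin, this places pings at 3300 s and 6600 s after each wake, so no gap between cache touches reaches the TTL. Whether the pings are worth their own cost is the configurable decision the scope leaves open.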

Acceptance

  • A unit test demonstrates that a cache ping reduces cache_creation tokens on the next real turn.
  • Snapshot shows a clear warning when an agent's heartbeat exceeds the provider's TTL and no keepalive is configured.
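The first acceptance item might be tested along these lines. The sketch uses a hypothetical in-memory `FakeSlidingCache` in place of a real provider, and its `cache_creation` / `cache_read` counters are assumed names modeled on the wording above, not a real usage schema.

```python
import unittest


class FakeSlidingCache:
    """Hypothetical stand-in for a provider cache with a sliding TTL."""

    def __init__(self, ttl_s):
        self.ttl_s = ttl_s
        self.expires_at = None  # None means nothing has been cached yet

    def turn(self, now_s, prefix_tokens):
        """Simulate one turn at time now_s and return assumed usage counters."""
        warm = self.expires_at is not None and now_s < self.expires_at
        # Both a cache write and a cache hit restart the TTL window.
        self.expires_at = now_s + self.ttl_s
        if warm:
            return {"cache_creation": 0, "cache_read": prefix_tokens}
        return {"cache_creation": prefix_tokens, "cache_read": 0}


class CachePingTest(unittest.TestCase):
    def test_ping_reduces_cache_creation(self):
        prefix = 10_000

        # Without pings: a 2h gap against a 1h TTL cold-starts the cache.
        no_ping = FakeSlidingCache(ttl_s=3600)
        no_ping.turn(0, prefix)
        cold_turn = no_ping.turn(7200, prefix)

        # With pings at TTL - margin (margin = 300 s), the cache stays warm.
        pinged = FakeSlidingCache(ttl_s=3600)
        pinged.turn(0, prefix)
        pinged.turn(3300, prefix)  # cache ping
        pinged.turn(6600, prefix)  # cache ping
        warm_turn = pinged.turn(7200, prefix)

        self.assertEqual(cold_turn["cache_creation"], prefix)
        self.assertEqual(warm_turn["cache_creation"], 0)
        self.assertLess(warm_turn["cache_creation"], cold_turn["cache_creation"])
```

Saved as a module, this can be run with `python -m unittest`. A real test would replace the fake with the scheduler's provider client and read the counters from the actual usage record.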

Metadata

Assignees

No one assigned

    Labels

    track:ops (Caching, telemetry, cost tracking, observability), type:feature (New capability)

    Type

    No type

    Projects

    Status

    Backlog
