Problem
Auto-review has been spending substantial tokens on repeated detached/stale review snapshots. In cbusillo/codex-skills, the auto-review ledger reported:
- tokens=2036776t across recent runs
- high_burn=5
- run 168d4bd0-e694-4023-b7f6-1188095bae06: tokens=1467380t, findings later verified stale against current main
- run d494a552-6cf2-40b0-963c-dc1effbd4a24: tokens=190756t, useful but reviewing a detached snapshot rather than a clearly scoped active target
This makes stale review prompts costly and increases noise around readiness decisions.
Desired Behavior
Reduce token burn for repeated or stale auto-review contexts.
Possible approaches:
- Cache review context and reuse summaries when the target snapshot has not changed.
- Detect when a finding's snapshot is no longer active and avoid launching another full review unless requested.
- Add a low-cost freshness/applicability preflight before high-context review.
- Cap repeated stale review prompts for the same finding/run family after current HEAD verification.
- Prefer compact diff/snapshot comparison before loading broad repo context.
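One way the approaches above could fit together is a small gate that runs before any high-context review. The sketch below is illustrative only: `ReviewGate`, `Decision`, `max_stale_reviews`, and the idea of keying on a snapshot identifier and a "run family" string are all hypothetical names, not part of the existing auto-review implementation. It assumes the caller can cheaply obtain the snapshot identifier under review and the identifier of the current active HEAD.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    REUSE_CACHED = "reuse_cached"  # snapshot already reviewed; reuse stored summary
    SKIP_STALE = "skip_stale"      # snapshot is not the active HEAD and the cap is reached
    FULL_REVIEW = "full_review"    # new target, or a fresh review was explicitly requested


@dataclass
class ReviewGate:
    # Hypothetical cap: how many full reviews a stale snapshot may get per run family.
    max_stale_reviews: int = 1
    summaries: dict = field(default_factory=dict)     # snapshot id -> cached summary
    stale_counts: dict = field(default_factory=dict)  # run family -> stale reviews allowed so far

    def decide(self, snapshot_id, active_head, run_family, force=False):
        """Cheap preflight: uses only identifiers, never repo context."""
        if force:
            # Explicit request for a fresh full review always wins.
            return Decision.FULL_REVIEW
        if snapshot_id in self.summaries:
            # Unchanged snapshot: reuse the summary instead of spending tokens.
            return Decision.REUSE_CACHED
        if snapshot_id != active_head:
            # Stale or detached snapshot: allow up to the cap, then suppress.
            seen = self.stale_counts.get(run_family, 0)
            if seen >= self.max_stale_reviews:
                return Decision.SKIP_STALE
            self.stale_counts[run_family] = seen + 1
        return Decision.FULL_REVIEW

    def record(self, snapshot_id, summary):
        """Store the summary produced by a completed full review."""
        self.summaries[snapshot_id] = summary
```

With these assumptions, a repeated run against an unchanged snapshot returns `REUSE_CACHED`, a second stale prompt in the same run family returns `SKIP_STALE`, and `force=True` still yields `FULL_REVIEW` for suspicious or safety-critical findings.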
Acceptance Criteria
- Repeated auto-review runs against unchanged detached snapshots consume little or no additional model budget.
- Stale/superseded findings do not trigger high-token review loops by default.
- The ledger or UI exposes enough token/cache diagnostics to confirm whether caching is working.
- Agents can still request a fresh full review explicitly when a stale finding appears suspicious or safety-critical.
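As a rough illustration of the diagnostics criterion, the sketch below aggregates per-run records into the kind of summary the ledger already prints (`tokens`, `high_burn`) plus cache counters. The `RunRecord` shape, the `cache_hit` flag, and the 100,000-token `high_burn_threshold` are assumptions made for this example; the actual ledger schema and the real definition of `high_burn` are not specified here.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    run_id: str      # hypothetical field names; the real ledger schema may differ
    tokens: int      # model tokens spent by this run
    cache_hit: bool  # True if the run reused a cached review summary


def summarize(records, high_burn_threshold=100_000):
    """Aggregate token and cache diagnostics over a list of RunRecord."""
    total = sum(r.tokens for r in records)
    hits = sum(1 for r in records if r.cache_hit)
    # Assumed definition: a run is "high burn" at or above the threshold.
    high_burn = sum(1 for r in records if r.tokens >= high_burn_threshold)
    hit_rate = hits / len(records) if records else 0.0
    return {
        "tokens": total,
        "high_burn": high_burn,
        "cache_hits": hits,
        "cache_hit_rate": hit_rate,
    }
```

If caching works as intended, repeated runs against the same snapshot should raise `cache_hits` while leaving `tokens` nearly flat.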
Why This Matters
The auto-review system caught a real issue in the later run, so the feature is valuable. But repeated stale high-token runs make review expensive and noisy. A cheap applicability/cache layer would preserve signal while keeping the system practical for long-running coding sessions.