Objective
Build a read-only scanner that turns rollout/session files and usage records into actionable token-efficiency evidence. The scanner should make expensive patterns visible before we optimize prompts, memory, agents, or history handling. It should run deterministically by default and support an optional second-stage local/cheap LLM classifier only on narrowed suspect spans, not whole raw rollout files.
Finish Line
A developer can point the scanner at recent Every Code sessions and see the largest cost drivers: embedded images, oversized tool outputs, duplicated base instructions, memory prompt size, agent fanout, auto-review churn, and per-session token growth.
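One way the compact report could surface "largest cost drivers" is by summing bytes per finding category and ranking them; the finding shape here is a hypothetical sketch, not a committed output format.

```python
from collections import Counter

def rank_cost_drivers(findings: list[dict]) -> list[tuple[str, int]]:
    """Sum bytes per finding kind and rank the largest drivers first."""
    totals: Counter[str] = Counter()
    for finding in findings:
        totals[finding["kind"]] += finding["bytes"]
    return totals.most_common()  # largest total first
```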
Current Status
State: Active
Next action: Prototype a read-only analysis command against recent ~/.code/sessions and ~/.code/usage examples.
Blocked by: None
Last verified: 2026-05-11, parent #43 evidence
Scope
- In: Local read-only scanning, compact reports, thresholds, examples from rollout/session files, before/after measurements, and optional local/cheap LLM classification of narrowed suspect spans.
- Out: Mutating session history, changing model routing, enforcing budgets directly, or sending whole raw rollout/session files to LLMs by default.
Acceptance Criteria
- Detects data:image input_image payloads and reports byte/token scale.
- Correlates findings with ~/.code/usage buckets where possible.
Relationships
Parent: #43
Blocks: image/tool-output compaction, prompt de-duplication, memory budgeting, auto-drive budget thresholds.
Validation
Run against known 2026-05-11 sessions and confirm it identifies the image-heavy context-panel session and the non-image auto-review churn separately.
Decisions
Use deterministic evidence from real rollout files before changing behavior; use local/cheap LLMs for classification only after the scanner has narrowed the evidence.
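The "narrowed spans only" rule can be enforced structurally rather than by convention: the second stage accepts individual spans and refuses anything large enough to be a raw file. This is a sketch under stated assumptions; the size cap and the `classify` callback are placeholders for whatever local/cheap model gets wired in.

```python
from typing import Callable

MAX_SPAN_CHARS = 2_000  # placeholder cap: narrowed evidence only, never whole rollout files

def classify_suspects(spans: list[str],
                      classify: Callable[[str], str]) -> list[tuple[str, str]]:
    """Label narrowed suspect spans; refuse anything that looks like a raw file."""
    results = []
    for span in spans:
        if len(span) > MAX_SPAN_CHARS:
            results.append((span[:40], "skipped: span too large for second stage"))
        else:
            results.append((span[:40], classify(span)))
    return results
```

Keeping the gate in code means the default path stays deterministic even when a classifier is configured.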
Open Questions
Should this live as a developer script, core diagnostic command, or both?