Build rollout/session token-efficiency scanner #44

@cbusillo

Objective

Build a read-only scanner that turns rollout/session files and usage records into actionable token-efficiency evidence. The scanner should make expensive patterns visible before we optimize prompts, memory, agents, or history handling. It should run deterministically by default, with an optional second stage in which a local or cheap LLM classifies only the narrowed suspect spans, never whole raw rollout files.
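A minimal sketch of the two-stage shape described above, under stated assumptions: the input is treated as plain text lines, a "suspect span" is simply any line over a byte threshold, and `SuspectSpan`, `find_suspect_spans`, `scan`, and both thresholds are hypothetical names and values chosen for illustration. The point is only that the optional classifier receives a truncated excerpt, not the raw file.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class SuspectSpan:
    line_no: int     # 1-based line number in the scanned input
    size_bytes: int  # full UTF-8 size of the offending line
    excerpt: str     # truncated text; the only content a classifier sees


def find_suspect_spans(
    lines: Iterable[str],
    min_bytes: int = 4096,     # hypothetical threshold
    excerpt_chars: int = 200,  # hypothetical excerpt cap
) -> list[SuspectSpan]:
    """Deterministic stage: flag oversized lines and keep a short excerpt."""
    spans = []
    for line_no, line in enumerate(lines, start=1):
        size = len(line.encode("utf-8"))
        if size >= min_bytes:
            spans.append(SuspectSpan(line_no, size, line[:excerpt_chars]))
    return spans


def scan(
    lines: Iterable[str],
    classify: Optional[Callable[[str], str]] = None,
) -> list[dict]:
    """Run the deterministic stage; call `classify` only on narrowed excerpts."""
    report = []
    for span in find_suspect_spans(lines):
        entry = {"line": span.line_no, "bytes": span.size_bytes, "label": None}
        if classify is not None:
            # Second stage is opt-in and never receives the whole line or file.
            entry["label"] = classify(span.excerpt)
        report.append(entry)
    return report
```

With `classify=None` the output depends only on the input, which keeps the default run deterministic. A model-backed `classify` could be passed in later without changing the first stage.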

Finish Line

A developer can point the scanner at recent Every Code sessions and see the largest cost drivers: embedded images, oversized tool outputs, duplicated base instructions, memory prompt size, agent fanout, auto-review churn, and per-session token growth.
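One way the "largest cost drivers" view could be assembled, as a sketch only: it assumes earlier scanner stages emit `(category, size_bytes)` findings and that per-turn cumulative token totals are available for a session. The category names, `rank_cost_drivers`, and `token_growth` are hypothetical and not part of any existing command.

```python
from collections import Counter
from typing import Iterable, Sequence


def rank_cost_drivers(
    findings: Iterable[tuple[str, int]], top: int = 5
) -> list[tuple[str, int]]:
    """Sum bytes per category and return the largest categories first."""
    totals: Counter[str] = Counter()
    for category, size_bytes in findings:
        totals[category] += size_bytes
    return totals.most_common(top)


def token_growth(turn_totals: Sequence[int]) -> list[int]:
    """Given cumulative token totals per turn, return the per-turn increase."""
    return [later - earlier for earlier, later in zip(turn_totals, turn_totals[1:])]


# Made-up numbers, purely to show the shape of the output.
example_findings = [
    ("embedded_image", 900_000),
    ("tool_output", 120_000),
    ("embedded_image", 300_000),
    ("base_instructions", 40_000),
]
ranked = rank_cost_drivers(example_findings)
growth = token_growth([1000, 1800, 4200])
```

Ranking by summed bytes keeps the report compact. Per-turn deltas make a sudden jump in session size easier to spot than the cumulative totals alone.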

Current Status

State: Active
Next action: Prototype a read-only analysis command against recent ~/.code/sessions and ~/.code/usage examples.
Blocked by: None
Last verified: 2026-05-11, against evidence from parent #43

Scope

  • In: Local read-only scanning, compact reports, thresholds, examples from rollout/session files, before/after measurements, and optional local/cheap LLM classification of narrowed suspect spans.
  • Out: Mutating session history, changing model routing, enforcing budgets directly, or sending whole raw rollout/session files to LLMs by default.

Acceptance Criteria

  • Detects raw data:image / input_image payloads and reports byte/token scale.
  • Reports the largest tool outputs and flags those likely to linger in durable history as ballast.
  • Detects duplicated base-instruction/project-doc/skill blocks.
  • Correlates session timestamps with ~/.code/usage buckets where possible.
  • Produces output suitable for attaching to planning issues or PR validation.
  • Optional model-assisted analysis consumes only compact scanner output and suspect spans, and records the provider and model used.
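The first and third criteria above could be prototyped roughly as follows. This is a sketch under assumptions the issue does not confirm: that each rollout/session file is JSON Lines (one JSON record per line), that embedded images appear as string values starting with `data:image/`, and that exact-match hashing is enough to catch duplicated instruction blocks. `scan_jsonl`, `iter_strings`, the `min_dup_chars` threshold, and the bytes-divided-by-four token estimate are all hypothetical; the estimate is a crude text heuristic, not any provider's real accounting.

```python
import hashlib
import json
from collections import defaultdict
from typing import Any, Iterable, Iterator

IMAGE_PREFIX = "data:image/"


def iter_strings(node: Any) -> Iterator[str]:
    """Yield every string value nested anywhere inside parsed JSON."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def scan_jsonl(lines: Iterable[str], min_dup_chars: int = 200):
    """Return (image findings, duplicated text blocks keyed by SHA-256 digest)."""
    images = []
    seen: dict[str, list[int]] = defaultdict(list)
    for line_no, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # read-only scan: skip malformed lines, never repair them
        for text in iter_strings(record):
            if text.startswith(IMAGE_PREFIX):
                size = len(text.encode("utf-8"))
                images.append(
                    {"line": line_no, "bytes": size, "approx_tokens": size // 4}
                )
            elif len(text) >= min_dup_chars:
                digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
                seen[digest].append(line_no)
    # Keep only blocks that occur more than once.
    duplicates = {d: where for d, where in seen.items() if len(where) > 1}
    return images, duplicates
```

Hashing whole string values only catches byte-identical repeats. Near-duplicates, such as instruction blocks that differ by a timestamp, would need normalization or chunked hashing, which is left out of this sketch.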

Relationships

Parent: #43
Blocks: image/tool-output compaction, prompt de-duplication, memory budgeting, auto-drive budget thresholds.

Validation

Run against known 2026-05-11 sessions and confirm it reports the image-heavy context-panel session and the non-image auto-review churn as separate findings.

Decisions

Use deterministic evidence from real rollout files before changing behavior; use local/cheap LLMs for classification only after the scanner has narrowed the evidence.

Open Questions

Should this live as a developer script, core diagnostic command, or both?

Metadata

Labels: plan (Durable planning issue)
