Skip to content

Add context source ledger and prompt observability #92

@cbusillo

Description

@cbusillo

Finish Line

Every Code has a visible, testable prompt/context assembly contract: each context source has an identity, prompt-size contribution is observable before requests are sent, and duplicate/bloated context producers are caught by tests or diagnostics before they silently burn tokens.

Current Status

State: Core request ledger and first /context TUI view are merged; #93 is complete.
Next action: Use /context on real sessions to choose the next observability/budgeting increment: duplicate grouping, richer visual bars, harness request-shape assertions, or budgeting thresholds.
Blocked by: None.
Waiting for: Next implementation slice selection.
Last verified: 2026-05-22; PR #96 merged as f09fa68, local overlay was fast-forwarded, just local-code-rebuild completed, and PATH code reports 0.6.98.

Acceptance Criteria

  • Prompt assembly has named source/category buckets such as base instructions, developer messages, user/project instructions, skills catalog, explicit skill bodies, environment context, memories, history, pending input, tool outputs, and status/browser items.
  • Before API submission, Every Code logs or emits a compact prompt composition summary with item counts and byte/token estimates by bucket.
  • The same ledger powers a user-facing /context command or view in the TUI, similar in spirit to Claude Code / Antigravity context views but tailored to Every Code's sources.
  • /context shows at least total estimated context, per-source budget bars/counts, whether each source is persisted or request-only, and the largest contributors.
  • /context can highlight likely duplicate contextual fragments or repeated source identities before tokens are spent.
  • Contextual fragments have stable identity or markers so duplicate user instructions, skills catalog, environment context, explicit skills, and status fragments can be detected intentionally rather than by ad hoc string checks.
  • Regression tests assemble prompts with multiple producers active and assert each one-shot contextual source appears once.
  • The code-exec harness has fake Responses coverage proving the ledger/view matches the real executable request shape, not only lower-level helper internals.
  • The design supports future budget enforcement for Preserve and budget Every Code memory mode #47 and Add auto-review and auto-drive budget guardrails #50 without requiring every context producer to reinvent its own limit logic.
  • The design supports Support manual-only skills outside default context #91 manual-only skills by separating "discoverable/installed", "included in default prompt context", and "explicitly injected for this turn".

Evidence / Context

This is the systemic follow-up to two concrete duplication bugs:

  • Duplicate tool outputs were found in rollout/session files and fixed in commit 75710e00fd by filtering already-recorded pending tool outputs.
  • De-duplicate prompt and skill injection overhead #46 is now tracking a likely composed-instructions duplication path where project docs and skills are folded into user_instructions, then recomposed.

Read-only agent review found the common pattern:

  • Several context producers append directly into request/history without a shared ledger.
  • Some producers dedupe locally, but there is no final assembled-prompt invariant.
  • Upstream codex-rs has useful reference patterns around contextual fragments and skill config/policy that can guide Every Code without wholesale porting.

Proposed Phases

  1. De-duplicate prompt and skill injection overhead #46/Support manual-only skills outside default context #91 immediate fixes: keep raw/effective instructions separate, avoid duplicated project-doc/skills context, and support manual-only skills plus explicit request-only $skill injection.
  2. Ledger data model: add a small internal prompt/context source record with source id, category, persistence class, byte/token estimate, and duplicate key.
  3. Request assembly instrumentation: populate the ledger from the same places that assemble the real request, then log or emit it immediately before API submission.
  4. /context first view: expose a compact TUI command that shows total estimate, per-source bars/counts, persisted vs request-only context, top contributors, and duplicate warnings.
  5. Harness truth test: add fake Responses scenarios that assert /context or emitted ledger data matches the captured real request body shape.
  6. Budget guardrails: use the same source categories to warn or enforce limits for memories, auto-review/auto-drive, history, status/browser items, and explicit skills.
  7. UX polish: add drilldowns, before/after compaction views, and selected-source inspection once the ledger is proven reliable.

Open Questions

  • Should the first implementation slice be log/event-only, or should it include a minimal /context command from the start?
  • Should /context inspect the last submitted request, the next request estimate, or both?
  • Which persistence labels should be canonical: persisted history, contextual/session, request-only, generated-per-attempt, and tool-result?
  • Should token counts start with approximate byte/token estimates and graduate to provider/model-specific tokenizer estimates later?
  • Should duplicate detection be fail-fast in tests only, warn at runtime, or surface directly in /context?
  • Should the first TUI view be text/table only, or include proportional bars immediately?
  • How should compacted history and previous-response-id/ZDR modes be represented so the view stays truthful across providers?

Relationships

Parent context: #43
Related: #46, #47, #50, #76, #90, #91

Metadata

Metadata

Assignees

No one assigned

    Labels

    planDurable planning issueplan:activeCurrent active plan

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions