Add context source ledger and prompt observability #92

## Finish Line

Every Code has a visible, testable prompt/context assembly contract: each context source has an identity, prompt-size contribution is observable before requests are sent, and duplicate/bloated context producers are caught by tests or diagnostics before they silently burn tokens.

## Current Status

State: Core request ledger and first `/context` TUI view are merged; #93 is complete.
Next action: Use `/context` on real sessions to choose the next observability/budgeting increment: duplicate grouping, richer visual bars, harness request-shape assertions, or budgeting thresholds.
Blocked by: None.
Waiting for: Next implementation slice selection.
Last verified: 2026-05-22; PR #96 merged as f09fa6888337529621531a3ce3b3197ee6a0c2d0, local overlay was fast-forwarded, `just local-code-rebuild` completed, and PATH `code` reports `0.6.98`.

## Acceptance Criteria

- [ ] Prompt assembly has named source/category buckets such as base instructions, developer messages, user/project instructions, skills catalog, explicit skill bodies, environment context, memories, history, pending input, tool outputs, and status/browser items.
- [ ] Before API submission, Every Code logs or emits a compact prompt composition summary with item counts and byte/token estimates by bucket.
- [ ] The same ledger powers a user-facing `/context` command or view in the TUI, similar in spirit to Claude Code / Antigravity context views but tailored to Every Code's sources.
- [ ] `/context` shows at least total estimated context, per-source budget bars/counts, whether each source is persisted or request-only, and the largest contributors.
- [ ] `/context` can highlight likely duplicate contextual fragments or repeated source identities before tokens are spent.
- [ ] Contextual fragments have stable identity or markers so duplicate user instructions, skills catalog, environment context, explicit skills, and status fragments can be detected intentionally rather than by ad hoc string checks.
- [ ] Regression tests assemble prompts with multiple producers active and assert each one-shot contextual source appears once.
- [ ] The code-exec harness has fake Responses coverage proving the ledger/view matches the real executable request shape, not only lower-level helper internals.
- [ ] The design supports future budget enforcement for #47 and #50 without requiring every context producer to reinvent its own limit logic.
- [ ] The design supports #91 manual-only skills by separating "discoverable/installed", "included in default prompt context", and "explicitly injected for this turn".
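
The compact composition summary described in the criteria above could look roughly like the sketch below. It is illustrative only: `Bucket`, `PromptItem`, `BucketTotals`, `summarize`, and `format_summary` are hypothetical names rather than existing Every Code types, the bucket list is abbreviated, and the 4-bytes-per-token ratio is an assumed placeholder until a tokenizer-backed estimate is chosen.

```rust
use std::collections::BTreeMap;

/// Hypothetical category buckets (abbreviated); not a real Every Code type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Bucket {
    BaseInstructions,
    UserInstructions,
    SkillsCatalog,
    EnvironmentContext,
    History,
    ToolOutputs,
}

/// One assembled prompt item, reduced to what the summary needs.
#[derive(Debug, Clone)]
pub struct PromptItem {
    pub bucket: Bucket,
    pub text: String,
}

/// Per-bucket totals for the compact summary.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct BucketTotals {
    pub items: usize,
    pub bytes: usize,
    pub est_tokens: usize,
}

/// Rough estimate: ASSUMES about 4 bytes per token, rounded up.
pub fn estimate_tokens(bytes: usize) -> usize {
    (bytes + 3) / 4
}

/// Groups items by bucket and accumulates counts, bytes, and token estimates.
pub fn summarize(items: &[PromptItem]) -> BTreeMap<Bucket, BucketTotals> {
    let mut out: BTreeMap<Bucket, BucketTotals> = BTreeMap::new();
    for item in items {
        let totals = out.entry(item.bucket).or_default();
        totals.items += 1;
        totals.bytes += item.text.len();
    }
    for totals in out.values_mut() {
        totals.est_tokens = estimate_tokens(totals.bytes);
    }
    out
}

/// Renders one line per bucket, in enum declaration order.
pub fn format_summary(summary: &BTreeMap<Bucket, BucketTotals>) -> String {
    summary
        .iter()
        .map(|(bucket, t)| {
            format!(
                "{:?}: items={} bytes={} est_tokens={}",
                bucket, t.items, t.bytes, t.est_tokens
            )
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Because the map is keyed by bucket, the same structure could feed both a log line before submission and the `/context` view, so the two cannot drift apart.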

## Evidence / Context

This is the systemic follow-up to two concrete duplication bugs:

- Duplicate tool outputs were found in rollout/session files and fixed in commit `75710e00fd` by filtering already-recorded pending tool outputs.
- #46 tracks a likely duplication path in composed instructions: project docs and skills are folded into `user_instructions`, then recomposed.

Read-only agent review found the common pattern:

- Several context producers append directly into request/history without a shared ledger.
- Some producers dedupe locally, but there is no final assembled-prompt invariant.
- Upstream `codex-rs` has useful reference patterns around contextual fragments and skill config/policy that can guide Every Code without wholesale porting.
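
A final assembled-prompt invariant, which the review found to be missing, might be sketched as follows. This is a minimal illustration under stated assumptions: `Fragment`, its `source_id` and `one_shot` fields, and `duplicate_one_shot_sources` are hypothetical, and it assumes each producer tags its output with a stable identity instead of relying on string comparison of the content.

```rust
use std::collections::HashMap;

/// Hypothetical record for one contextual fragment in the assembled request.
#[derive(Debug, Clone)]
pub struct Fragment {
    /// Stable identity assigned by the producer, e.g. "skills_catalog".
    pub source_id: String,
    /// True when the source should appear at most once per request.
    pub one_shot: bool,
}

/// Returns the ids of one-shot sources that appear more than once,
/// sorted so that test failures and diagnostics are deterministic.
pub fn duplicate_one_shot_sources(fragments: &[Fragment]) -> Vec<String> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for f in fragments.iter().filter(|f| f.one_shot) {
        *counts.entry(f.source_id.as_str()).or_insert(0) += 1;
    }
    let mut dupes: Vec<String> = counts
        .into_iter()
        .filter(|(_, n)| *n > 1)
        .map(|(id, _)| id.to_string())
        .collect();
    dupes.sort();
    dupes
}
```

A regression test could assemble a request with several producers active and assert that this function returns an empty list. Repeatable sources such as tool outputs are ignored by design.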

## Proposed Phases

1. #46/#91 immediate fixes: keep raw/effective instructions separate, avoid duplicated project-doc/skills context, and support manual-only skills plus explicit request-only `$skill` injection.
2. Ledger data model: add a small internal prompt/context source record with source id, category, persistence class, byte/token estimate, and duplicate key.
3. Request assembly instrumentation: populate the ledger from the same places that assemble the real request, then log or emit it immediately before API submission.
4. `/context` first view: expose a compact TUI command that shows total estimate, per-source bars/counts, persisted vs request-only context, top contributors, and duplicate warnings.
5. Harness truth test: add fake Responses scenarios that assert `/context` or emitted ledger data matches the captured real request body shape.
6. Budget guardrails: use the same source categories to warn or enforce limits for memories, auto-review/auto-drive, history, status/browser items, and explicit skills.
7. UX polish: add drilldowns, before/after compaction views, and selected-source inspection once the ledger is proven reliable.
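
The phase 2 record and a "top contributors" query for the phase 4 view could be sketched like this. All names here (`Persistence`, `SourceRecord`, `Ledger`, and their methods) are hypothetical, the two persistence variants are a deliberately reduced set, and `category` is a plain string only to keep the example short.

```rust
/// Hypothetical persistence classes; the canonical label set is still undecided.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Persistence {
    PersistedHistory,
    RequestOnly,
}

/// Hypothetical ledger record mirroring the fields listed in phase 2.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SourceRecord {
    pub source_id: String,
    pub category: String,
    pub persistence: Persistence,
    pub bytes: usize,
    pub duplicate_key: String,
}

/// Collects records while the real request is being assembled.
#[derive(Debug, Default)]
pub struct Ledger {
    records: Vec<SourceRecord>,
}

impl Ledger {
    pub fn record(&mut self, rec: SourceRecord) {
        self.records.push(rec);
    }

    /// Total bytes across every recorded source.
    pub fn total_bytes(&self) -> usize {
        self.records.iter().map(|r| r.bytes).sum()
    }

    /// Bytes that are sent with the request but not persisted.
    pub fn request_only_bytes(&self) -> usize {
        self.records
            .iter()
            .filter(|r| r.persistence == Persistence::RequestOnly)
            .map(|r| r.bytes)
            .sum()
    }

    /// The `n` largest sources by byte size, largest first.
    pub fn top_contributors(&self, n: usize) -> Vec<&SourceRecord> {
        let mut sorted: Vec<&SourceRecord> = self.records.iter().collect();
        sorted.sort_by(|a, b| b.bytes.cmp(&a.bytes));
        sorted.truncate(n);
        sorted
    }
}
```

Keeping the ledger as a flat list of records leaves budgeting (phase 6) free to aggregate by category or persistence class without each producer carrying its own limit logic.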

## Open Questions

- Should the first implementation slice be log/event-only, or should it include a minimal `/context` command from the start?
- Should `/context` inspect the last submitted request, the next request estimate, or both?
- Which persistence labels should be canonical: persisted history, contextual/session, request-only, generated-per-attempt, and tool-result?
- Should token counts start with approximate byte/token estimates and graduate to provider/model-specific tokenizer estimates later?
- Should duplicate detection be fail-fast in tests only, warn at runtime, or surface directly in `/context`?
- Should the first TUI view be text/table only, or include proportional bars immediately?
- How should compacted history and previous-response-id/ZDR modes be represented so the view stays truthful across providers?
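
On the text-versus-bars question, proportional bars can be produced in plain text with little extra machinery, as the sketch below suggests. `render_bar` and `render_row` are hypothetical helpers, and the glyphs and column widths are arbitrary choices, not an existing TUI convention in Every Code.

```rust
/// Renders a fixed-width text bar for one source's share of the total.
/// Hypothetical helper: `#` marks filled cells, `.` marks empty ones.
pub fn render_bar(part: usize, total: usize, width: usize) -> String {
    let filled = if total == 0 {
        0
    } else {
        // Round to the nearest cell and clamp in case `part` exceeds `total`.
        ((part * width + total / 2) / total).min(width)
    };
    format!("[{}{}]", "#".repeat(filled), ".".repeat(width - filled))
}

/// Renders one table row: padded label, 10-cell bar, right-aligned byte count.
pub fn render_row(label: &str, part: usize, total: usize) -> String {
    format!("{:<16} {} {:>7} B", label, render_bar(part, total, 10), part)
}
```

With rounding to the nearest cell, very small sources render as an empty bar, so a real view would probably still want the numeric column alongside it.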

## Relationships

Parent context: #43
Related: #46, #47, #50, #76, #90, #91
- related: cbusillo/code#43 - https://github.com/cbusillo/code/issues/43
- related: cbusillo/code#46 - https://github.com/cbusillo/code/issues/46
- related: cbusillo/code#47 - https://github.com/cbusillo/code/issues/47
- related: cbusillo/code#50 - https://github.com/cbusillo/code/issues/50
- related: cbusillo/code#76 - https://github.com/cbusillo/code/issues/76
- related: cbusillo/code#90 - https://github.com/cbusillo/code/issues/90
- related: cbusillo/code#91 - https://github.com/cbusillo/code/issues/91

