Triage rollout/session friction audit findings #90

## Objective

Turn the 2026-05-22 read-only rollout/session friction audit into a durable recovery point. The audit looked for repeated agent mistakes, late Auto Review confusion, wasted tokens, duplicated instructions, skill misfires, memory/compaction issues, and places where the LLM lacked state it should have had.

This issue is an index and triage queue. It exists so the user does not have to hold the whole audit in working memory. Work each candidate one at a time: follow the linked issue, or create a focused implementation issue when no existing destination covers it.

## Finish Line

Each evidence-backed friction candidate from the audit is either linked to an existing implementation issue, split into a focused new issue, or explicitly parked as noise. No raw private rollout/session contents are copied into public issues.

## Current Status

State: Active
Next action: Choose the next one-at-a-time follow-up: #46 prompt/skill injection duplication, #76 Auto Review stale/superseded result handling, or repo-management work under #85/#87/#51.
Blocked by: None
Waiting for: User choice of next fix/workstream.
Last verified: 2026-05-22 after implementing and validating the duplicate tool-output fix on local/cbusillo-overlay.

## Scope

- **In**: Redacted measurements from local rollout/session files, issue routing, one-at-a-time follow-up decisions, and durable links to existing work.
- **Out**: Raw transcript dumps, secrets, private paths beyond broad source classes, immediate code changes, or changing skills/config without explicit approval.

## Acceptance Criteria

- [ ] Duplicate persisted `function_call_output` entries are tracked in a focused destination.
- [ ] Base-instruction/project-doc/skill duplication is tracked under #46 with current evidence.
- [ ] Auto Review stale/superseded result handling is tracked under #76 with current evidence.
- [ ] Auto Review and Auto Drive budget/loop guardrail evidence is tracked under #50.
- [ ] Stale IDE inspection evidence routing is tracked or intentionally split into a new issue.
- [ ] Raw image/tool-output ballast evidence is reconciled with completed #45 and any remaining regression risk.
- [ ] Session recall/search friction from oversized histories is routed to #47/#45/#27 or split if needed.
- [ ] Branch-guard preflight friction is either documented as a small skill/workflow issue or parked.
- [ ] Scanner false positives are routed back to #44 or a follow-up scanner-quality issue.

## Relationships

Parent context: #43

Existing destinations:

- #45 - Compact persisted image and large tool-output history
- #46 - De-duplicate prompt and skill injection overhead
- #47 - Preserve and budget Every Code memory mode
- #50 - Add auto-review and auto-drive budget guardrails
- #76 - Make Auto Review snapshot-aware and preserve superseded results
- #27 - Expose Every Code automation origin on sessions
- #44 - Build rollout/session token-efficiency scanner
- related: cbusillo/code#43 - https://github.com/cbusillo/code/issues/43
- related: cbusillo/code#45 - https://github.com/cbusillo/code/issues/45
- related: cbusillo/code#46 - https://github.com/cbusillo/code/issues/46
- related: cbusillo/code#47 - https://github.com/cbusillo/code/issues/47
- related: cbusillo/code#50 - https://github.com/cbusillo/code/issues/50
- related: cbusillo/code#76 - https://github.com/cbusillo/code/issues/76
- related: cbusillo/code#27 - https://github.com/cbusillo/code/issues/27
- related: cbusillo/code#44 - https://github.com/cbusillo/code/issues/44
- related: cbusillo/code#91 - https://github.com/cbusillo/code/issues/91

## Validation

Read-only audit commands used local session/rollout data under:

- Recent sessions: `~/.code/sessions/2026/05/13`, `~/.code/sessions/2026/05/14`, `~/.code/sessions/2026/05/22`
- Archived sessions: selected `~/.code/archived_sessions` files with unusually large size or image payloads
- Catalog: `~/.code/sessions/index/catalog.jsonl`

The audit used the rollout-friction scanner where helpful, then manually discounted scanner hits that merely matched instruction or prose text rather than reflecting real runtime friction.

## Evidence Summary

### 1. Duplicate persisted function-call outputs

Candidate destination: reopen/extend #45, or create a new focused rollout persistence bug.

Evidence:

- In three sampled sessions, identical `function_call_output` entries with the same `call_id` were persisted twice.
- Sample 1: 14,850 outputs, 7,670 unique, 7,180 duplicate entries; duplicate output bytes were about 47.2% of recorded output bytes.
- Sample 2: 922 outputs, 461 unique; duplicate output bytes were about 50.0%.
- Sample 3: 118 outputs, 59 unique; duplicate output bytes were about 50.0%.
- One inspected span showed the same tool result written twice about 200 ms apart with the same `call_id`.

Likely cause to investigate: rollout/event recording may persist both live event output and a later state snapshot item.
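
A minimal Python sketch of how the duplicate counts above could be reproduced. It assumes each rollout line is a JSON object and that tool results appear as records with hypothetical flat `type`, `call_id`, and `output` fields; the real rollout schema may nest these differently. A record counts as a duplicate when the same `(call_id, output)` pair was already seen.

```python
import json
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DuplicateStats:
    outputs: int
    unique: int
    duplicates: int
    duplicate_bytes: int
    total_bytes: int

    @property
    def duplicate_byte_share(self) -> float:
        # Fraction of recorded output bytes that are exact repeats.
        return self.duplicate_bytes / self.total_bytes if self.total_bytes else 0.0


def duplicate_output_stats(lines: Iterable[str]) -> DuplicateStats:
    """Count repeated function_call_output records in JSONL rollout lines.

    Assumes a hypothetical flat schema: {"type": ..., "call_id": ..., "output": ...}.
    """
    seen: set[tuple[str, str]] = set()
    outputs = duplicates = duplicate_bytes = total_bytes = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("type") != "function_call_output":
            continue
        output = record.get("output", "")
        if not isinstance(output, str):
            output = json.dumps(output, sort_keys=True)  # normalize structured output
        size = len(output.encode("utf-8"))
        outputs += 1
        total_bytes += size
        key = (record.get("call_id", ""), output)
        if key in seen:
            duplicates += 1
            duplicate_bytes += size
        else:
            seen.add(key)
    return DuplicateStats(outputs, len(seen), duplicates, duplicate_bytes, total_bytes)
```

On a rollout where every result was written exactly twice, `duplicate_byte_share` comes out at 0.5, matching samples 2 and 3. Sample 1 had 7,670 unique outputs but 7,180 duplicates, so about 490 outputs there appear to have been written only once, consistent with its lower 47.2% share.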

### 2. Duplicated project-doc and skill blocks

Candidate destination: #46.

Evidence:

- Recent `session_meta.base_instructions` repeatedly contained two `--- project-doc ---` sections and two `### Available skills` blocks.
- Base instruction payloads in the sampled sessions commonly ranged from about 20 KB to 52 KB before task-specific context.

Likely fix shape: de-duplicate prompt assembly sources and add a regression check that each project-doc and skill index block is injected once.
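
A regression check along these lines could run against `session_meta.base_instructions` from a test fixture. This is a Python sketch: the two marker strings come from the audit, but the function names and the at-most-once rule are assumptions. If several distinct project docs are legitimate, the check would need to compare block bodies instead of counting markers.

```python
from collections import Counter

# Section markers observed in session_meta.base_instructions during the audit.
MARKERS = ("--- project-doc ---", "### Available skills")


def marker_counts(base_instructions: str) -> Counter:
    """Count how often each known block marker appears in the assembled prompt."""
    return Counter({m: base_instructions.count(m) for m in MARKERS})


def check_injected_once(base_instructions: str) -> list[str]:
    """Describe every marker injected more than once (an empty list means pass).

    Assumes a correctly assembled prompt contains each marker at most once.
    """
    return [
        f"{marker!r} appears {count} times"
        for marker, count in marker_counts(base_instructions).items()
        if count > 1
    ]
```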

### 3. Auto Review loop volume and stale-current confusion

Candidate destinations: #76 and #50.

Evidence:

- Local session catalog sample had 77 auto-review-like sessions totaling 7,707 messages.
- 12 sessions exceeded 100 messages, 5 exceeded 500 messages, and 1 reached 1,618 messages.
- Several large sessions had only one user message, which points to automated follow-up/resolve loops rather than interactive conversation.

Likely fix shape: snapshot-aware completion gating (#76), plus token/turn/progress guardrails (#50).
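
Both halves of that fix shape can be sketched in a few lines of Python. The `LoopBudget` class, its limit values, and the `made_progress` signal are hypothetical and illustrative, not taken from #50; `classify_review` shows snapshot-aware gating in the spirit of #76, where a result for an older snapshot is labeled superseded (and kept) instead of being allowed to complete the loop.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoopBudget:
    """Hypothetical per-session budget for automated review/resolve loops."""
    max_turns: int = 100
    max_tokens: int = 500_000
    max_stalled_turns: int = 5
    turns: int = 0
    tokens: int = 0
    stalled: int = 0

    def record_turn(self, tokens_used: int, made_progress: bool) -> Optional[str]:
        """Record one automated turn; return a stop reason once a limit is hit."""
        self.turns += 1
        self.tokens += tokens_used
        # Consecutive turns without progress; any progress resets the streak.
        self.stalled = 0 if made_progress else self.stalled + 1
        if self.turns >= self.max_turns:
            return "turn budget exhausted"
        if self.tokens >= self.max_tokens:
            return "token budget exhausted"
        if self.stalled >= self.max_stalled_turns:
            return "no progress"
        return None


def classify_review(result_snapshot: str, current_snapshot: str) -> str:
    """Only a result for the current snapshot may complete the loop."""
    return "current" if result_snapshot == current_snapshot else "superseded"
```

A stall counter matters here because the largest sessions had a single user message: without some notion of progress, a turn budget alone would let a loop burn its whole allowance repeating the same follow-up.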

### 4. Stale IDE inspection results repeated into context

Candidate destination: #50 if treated as automation guardrail friction, or a new focused validation-evidence routing issue.

Evidence:

- A large sampled session contained stale inspection results with `completion_reason: stale_results` after waits of roughly 9 s to 36 s.
- The stale state included a project-changed-since-inspection reason and then appeared multiple times in later recorded context.

Likely fix shape: treat stale inspection output as compact invalid evidence and require rerun, rather than preserving bulky stale findings as reusable context.
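
A Python sketch of that compaction step. Only the `completion_reason` value `stale_results` comes from the audit; the `findings` and `stale_reason` field names and the shape of the compact marker are hypothetical placeholders for whatever the inspection tool actually returns.

```python
from typing import Any

STALE_REASON = "stale_results"  # completion_reason value seen in the audited session


def compact_inspection(result: dict[str, Any]) -> dict[str, Any]:
    """Replace a stale IDE inspection result with a small invalid-evidence marker.

    Field names other than completion_reason are hypothetical.
    """
    if result.get("completion_reason") != STALE_REASON:
        return result  # fresh results pass through unchanged
    return {
        "completion_reason": STALE_REASON,
        "valid": False,
        "reason": result.get("stale_reason", "project changed since inspection"),
        "action": "rerun inspection before using findings",
        # Keep only a count so the bulky stale findings never re-enter context.
        "dropped_findings": len(result.get("findings", [])),
    }
```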

### 5. Raw image payload ballast

Candidate destination: #45 / #43 validation and regression evidence.

Evidence:

- One archived session was about 234.7 MB and contained about 187.2 MB of base64 image payload across 208 image references.
- Another archived session was about 157.5 MB and contained about 116.8 MB of base64 image payload across 137 image references.
- A later, very large May 13 session still contained image references but only about 1.6 MB of base64, suggesting image compaction may have improved since the earlier sessions; those sessions should remain regression fixtures.

Likely fix shape: keep raw images out of reusable model history and use these sessions as validation cases.
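
A compaction pass of this kind can both keep raw images out of reusable history and measure ballast in regression fixtures. This Python sketch assumes images are persisted as inline `data:image/...;base64,` URLs; the real rollout may store them in a structured field instead, and the placeholder text is arbitrary.

```python
import re

# Hypothetical: assumes images are persisted as inline data URLs.
DATA_URL = re.compile(r"data:image/[A-Za-z0-9.+-]+;base64,[A-Za-z0-9+/=]+")


def strip_image_payloads(text: str) -> tuple[str, int, int]:
    """Replace inline base64 images with a short placeholder.

    Returns (compacted_text, image_count, removed_chars). Data URLs are ASCII,
    so removed_chars equals the number of bytes removed.
    """
    count = 0
    removed = 0

    def _replace(match: re.Match) -> str:
        nonlocal count, removed
        count += 1
        removed += len(match.group(0))
        return "[image omitted]"

    return DATA_URL.sub(_replace, text), count, removed
```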

### 6. Session recall/search friction from oversized histories

Candidate destinations: #47, #45, and possibly #27.

Evidence:

- A follow-up session investigating an earlier destructive command had to search prior rollouts to reconstruct what happened.
- Broad raw search over `~/.code` was killed or abandoned because large encoded payloads made direct history search impractical.
- The agent switched to extracting structured JSON message/tool fields to recover the relevant state.

Likely fix shape: better session indexing/summaries/citations so the LLM can retrieve prior operational facts without scanning massive raw rollout files.
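
Until better indexing exists, the manual workaround can at least be scripted: stream the rollout line by line and pull out only structured tool-call fields. This Python sketch is a stopgap, not the indexing/summaries the fix calls for, and the record shape (`timestamp`, `type`, `name`, `arguments`) is a hypothetical stand-in for the real rollout schema.

```python
import json
from typing import Iterable, Iterator


def find_tool_calls(lines: Iterable[str], needle: str, max_len: int = 200) -> Iterator[dict]:
    """Yield compact summaries of tool calls whose arguments mention `needle`.

    Assumes hypothetical JSONL records shaped like
    {"timestamp": ..., "type": "function_call", "name": ..., "arguments": ...}.
    """
    for line in lines:
        # Cheap substring pre-filter: most non-call lines (including large
        # output or image records) are skipped without JSON parsing.
        if '"function_call"' not in line:
            continue
        record = json.loads(line)
        if record.get("type") != "function_call":
            continue
        args = record.get("arguments", "")
        if not isinstance(args, str):
            args = json.dumps(args)
        if needle in args:
            yield {
                "timestamp": record.get("timestamp"),
                "name": record.get("name"),
                "arguments": args[:max_len],  # truncate to keep summaries small
            }
```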

### 7. Branch-guard preflight friction

Candidate destination: new small skill/workflow issue, or park as acceptable guard friction.

Evidence:

- `Blocked git switch` appeared in 21 recent sessions.
- The guard is useful, but agents repeatedly hit it only after issuing a blocked command, rather than planning branch setup around it up front.

Likely fix shape: teach branch setup/preflight in GitHub workflow guidance or improve the guard message/action path.

### 8. Rollout-friction scanner false positives

Candidate destination: reopen #44 or create a scanner-quality follow-up.

Evidence:

- The scanner flagged GraphQL/REST/rate-limit and Auto Review signals from skill docs, issue bodies, and instructions that merely mentioned those terms.
- Manual review was needed to separate real runtime friction from self-referential prose.

Likely fix shape: classify structured runtime/tool outputs separately from instructions, issue bodies, and scanner-generated reports.
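
One way to express that separation, sketched in Python: count a hit as runtime friction only when it comes from a failed tool output. The record shape, the `exit_code` field, and the failed-command heuristic are all assumptions. Issue bodies and scanner reports fetched by a successful command land in the prose bucket, but the heuristic would also miss friction reported with a zero exit status.

```python
import re
from typing import Iterable

# Hypothetical record shape: {"type": ..., "text": ..., "exit_code": ...}.
RUNTIME_TYPES = frozenset({"function_call_output"})


def split_hits(records: Iterable[dict], pattern: str) -> dict[str, int]:
    """Split regex hits into likely runtime friction vs. prose mentions.

    A hit counts as runtime friction only when it comes from a tool output whose
    command failed; matches in instructions, or in successful outputs such as a
    fetched issue body, are counted as prose.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    counts = {"runtime": 0, "prose": 0}
    for record in records:
        if not regex.search(record.get("text", "")):
            continue
        is_runtime = (
            record.get("type") in RUNTIME_TYPES
            and record.get("exit_code", 0) != 0
        )
        counts["runtime" if is_runtime else "prose"] += 1
    return counts
```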

## Decisions

- Keep raw rollout/session files private and local.
- Use GitHub issues as the durable index; do not expect chat history to carry this audit.
- Review and implement one candidate at a time.
- Prefer existing issue destinations when the intent already overlaps; create new issues only for truly orphaned fixes.

## Open Questions

- Should duplicate persisted `function_call_output` be folded into #45 or split into a new focused bug?
- Should stale IDE inspection result routing live under #50 or a dedicated validation-evidence issue?
- Is branch-guard friction worth a new issue, or should it be handled as a small skill edit later?
- Should #44 be reopened for scanner false-positive reduction, or should scanner quality get a follow-up issue?

