Skip to content

Surface silent parser content-capture gaps at ingest time (#59)#75

Merged
willwashburn merged 2 commits intomainfrom
feat/content-gap-surfacing-59
Apr 25, 2026
Merged

Surface silent parser content-capture gaps at ingest time (#59)#75
willwashburn merged 2 commits intomainfrom
feat/content-gap-surfacing-59

Conversation

@willwashburn
Copy link
Copy Markdown
Member

@willwashburn willwashburn commented Apr 25, 2026

Fixes #59.

Summary

  • New helper countToolCallGaps(turns, content) flags a session where the parser committed turns containing tool calls but emitted zero tool_result ContentRecords. That's the silent-degradation shape that hid 99.7% of sessions in even-split attribution before Add content capture for Codex and OpenCode parsers (#33 follow-up) #58.
  • Each per-adapter ingest path (ingestClaudeInto / ingestCodexInto / ingestOpencodeInto) accumulates affectedSessions + orphanToolCalls during the walk and emits one formatted warning at the end of the loop, naming the adapter, the counts, the downstream impact (burn waste even-split fallback), and pointing at Design: content sidecar store with retention and opt-out #33.
  • Suppression: per-process state (Map<adapter, lastEmittedSessionCount>). After the first warn, later ingestAll() calls in the same burn invocation stay silent unless additional affected sessions show up — so --watch loops don't flood stderr. Exported resetIngestGapWarnings() and setIngestGapWriter() for tests; the latter also lets future callers route warnings somewhere other than stderr if needed.
  • The warning never fires when contentMode === 'hash-only' or 'off', nor when an adapter's sessions have zero tool calls (chat-only). Exit code unchanged (0) — this is visibility, not gating.
  • Refactored CLAUDE_PROJECTS / CODEX_SESSIONS / OPENCODE_* constants into per-call accessors so tests can swap HOME and exercise the real ingest paths against a temp dir, matching the style already used in commands/claude.ts.

Test plan

  • pnpm run test:ts (347 tests, all pass) including the new countToolCallGaps unit tests and the ingest gap warning (codex parser-gap scenario) suite.
  • New tests cover: warning fires once with correct counts/text on a synthetic codex session whose parser produces no tool_result records; no warning under hash-only; no warning under off; no warning for chat-only sessions; repeat ingest calls in the same process stay silent.
  • Manual: run pnpm dev:cli summary (or waste) on a fresh ledger that contains codex sessions with tool calls — the one-line warning should appear once.

Out of scope (deferred)

  • Aggregate count surfaced in burn diagnose as a permanent state report — issue calls this out as a separate follow-up.
  • Pluggable capability registry — explicitly noted as overkill in the issue.

🤖 Generated with Claude Code


Open in Devin Review

When `contentMode === 'full'` and an adapter's parser commits turns with tool
calls but emits zero `tool_result` ContentRecords for the session, ingest now
prints one warning per adapter naming the affected session/tool-call counts
and pointing at #33. Suppressed for `hash-only`/`off`, chat-only sessions,
and after the first emit per `burn` invocation. Exit code unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.

Open in Devin Review

…acing-59

# Conflicts:
#	packages/cli/CHANGELOG.md
@willwashburn willwashburn merged commit de207cf into main Apr 25, 2026
@willwashburn willwashburn deleted the feat/content-gap-surfacing-59 branch April 25, 2026 22:52
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 new potential issues.

View 5 additional findings in Devin Review.

Open in Devin Review

Comment on lines +187 to +189
const priorEmitted = state.warnedAffectedSessions.get(adapter);
if (priorEmitted !== undefined && stats.affectedSessions <= priorEmitted) return;
state.warnedAffectedSessions.set(adapter, stats.affectedSessions);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Gap warning suppression compares per-pass count against stored historical count, incorrectly suppressing warnings for new affected sessions

The emitGapWarning function stores the per-pass stats.affectedSessions count at packages/cli/src/ingest.ts:189 and on subsequent calls compares the new per-pass count against the stored value (packages/cli/src/ingest.ts:188). However, GapStats is freshly initialized to {affectedSessions: 0, orphanToolCalls: 0} on every call to ingestCodexSessions() / ingestAll() etc. (packages/cli/src/ingest.ts:118), so stats.affectedSessions only reflects sessions from the current pass, not a cumulative total.

In a --watch loop: pass 1 finds 5 affected sessions → warns, stores 5. Pass 2 (cursor at EOF) → 0 affected, early return at line 183 (correct). Pass 3 finds 2 new affected sessions → 2 <= 5 → suppressed (wrong!). The comment at line 184-186 says "no additional affected sessions showed up since then", but the comparison doesn't detect additional sessions — it only fires when the current batch is larger than the previous emission's batch.

Prompt for agents
The emitGapWarning function at packages/cli/src/ingest.ts:176-195 has a flawed suppression mechanism. The GapStats object is freshly initialized per ingest call, so stats.affectedSessions is a per-pass count. But the suppression check at line 188 compares this per-pass count against the previously stored per-pass count from the last emission. This means if pass 1 had 5 affected sessions and a later pass has 2 new affected sessions, the warning is incorrectly suppressed (2 <= 5).

To fix this, either:
1. Change warnedAffectedSessions to a simple Set<AdapterName> (boolean flag) and suppress all warnings after the first emission per adapter, or
2. Make the stored count cumulative by adding stats.affectedSessions to the prior value when storing (and compare current cumulative total against stored cumulative total), or
3. Simply suppress only when stats.affectedSessions === 0 (which line 183 already does), and remove the priorEmitted comparison entirely — this would re-warn on every pass that finds any affected sessions, which matches the stated intent of surfacing fresh affected sessions.

Option 3 is simplest and matches the documented intent. The warnedAffectedSessions map and lines 187-189 can be replaced with a simple boolean per-adapter check.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment thread packages/cli/CHANGELOG.md
### Added

- **`burn mcp-server`** — stdio MCP (Model Context Protocol) server that lets a running agent self-query its own cost and quota state mid-session. Registers `burn__sessionCost` and `burn__currentBlock`. Read-only. Pair with `buildMcpConfig({sessionId})` from `@relayburn/mcp` to inject the server into a spawned `claude --mcp-config <…>` session. (#26)
- **Surface silent parser content-capture gaps at ingest time.** When `contentMode === 'full'` and an adapter's parser commits turns containing tool calls but emits zero `tool_result` ContentRecords for the session, ingest now prints a one-line warning naming the adapter, the affected session/tool-call counts, and pointing at #33. The warning is suppressed for `hash-only`/`off` content modes, for chat-only sessions (no tool calls), and after the first emit per adapter per `burn` invocation (so `--watch` loops stay quiet). Exit code unchanged. (#59)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Changelog entry placed under released version [0.13.1] instead of [Unreleased]

The new changelog entry for "Surface silent parser content-capture gaps at ingest time" is added to the ## [0.13.1] - 2026-04-25 section (an already-stamped version) rather than under ## [Unreleased]. AGENTS.md mandates: "Curate [Unreleased] in the relevant per-package packages/*/CHANGELOG.md as you land PRs... At publish time, the workflow promotes your [Unreleased] block verbatim into ## [x.y.z] - DATE." Adding to an already-versioned section breaks this workflow — the publish workflow will see an empty [Unreleased] and either skip changelog generation or fall back to git-log inference, while the manually placed entry sits in an already-published version.

Prompt for agents
The new changelog entry at packages/cli/CHANGELOG.md line 39 (Surface silent parser content-capture gaps at ingest time) was added to the already-stamped ## [0.13.1] section instead of under ## [Unreleased]. Per AGENTS.md, new entries should go under [Unreleased] and will be promoted to a versioned section by the publish workflow. Move this entry to the [Unreleased] section at the top of the file (after line 8).
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Surface silent parser content-capture gaps at ingest time

1 participant