Implement waste-pattern detectors (closes #11) #52
Conversation
…ion loss, edit-revert)

Introduces four orthogonal detectors for the "why was this work unproductive" axis of `burn waste` (issue #11). They complement per-tool cost attribution by surfacing the *shape* of unproductive work, not just where the tokens went.

- Retry loops: ≥3 strictly-consecutive tool calls sharing (name, argsHash) that all carry `is_error=true`.
- Failure runs: ≥3 consecutive errored tool calls with distinct keys — the "agent is stuck trying different things" pattern. Orthogonal to retry loops (same-key streaks are suppressed).
- Compaction losses: anchored to `type:system subtype:compact_boundary` records in Claude Code sessions. Priced against the cacheRead of the immediately-preceding turn.
- Edit-revert cycles: content-hash matching across the full session window, detected when a post-state hash matches any earlier pre-state hash on the same file. Reset after close so A→B→A→B→A reports two cycles.

Surface area:

- Reader: `ToolCall.isError`, `editPreHash`, `editPostHash`; a new `CompactionEvent` type emitted in `ParseResult.events`.
- Ledger: new `CompactionLine` kind with id-hash dedup; `appendCompactions`, `queryCompactions`.
- Analyze: `detectPatterns(turns, opts)` plus per-session rollup summary.
- CLI: `burn waste --patterns[=retries,failures,compaction,reverts]` and new `burn diagnose <session-id>` verbose single-session view.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
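The retry-loop rule above (≥3 strictly-consecutive errored calls sharing a (name, argsHash) key) can be sketched in a few lines. This is a hedged simplification, not the shipped detector: the `ToolCallRef` shape and `detectRetryLoops` name here are hypothetical, and the real implementation also attributes cost.

```typescript
// Hypothetical simplified shape; the real package uses richer types.
interface ToolCallRef {
  name: string;
  argsHash: string;
  isError: boolean;
}

const MIN_RETRY_LOOP_LEN = 3;

// Count strictly-consecutive errored calls sharing (name, argsHash).
function detectRetryLoops(
  calls: ToolCallRef[],
): Array<{ key: string; length: number }> {
  const loops: Array<{ key: string; length: number }> = [];
  let streakKey: string | undefined;
  let streakLen = 0;
  const commit = (): void => {
    if (streakKey !== undefined && streakLen >= MIN_RETRY_LOOP_LEN) {
      loops.push({ key: streakKey, length: streakLen });
    }
  };
  for (const call of calls) {
    const key = `${call.name}|${call.argsHash}`;
    if (call.isError && key === streakKey) {
      streakLen += 1;
    } else {
      commit();
      streakKey = call.isError ? key : undefined;
      streakLen = call.isError ? 1 : 0;
    }
  }
  commit();
  return loops;
}
```

A non-error call or a key change breaks the streak, which is what keeps this detector orthogonal to the distinct-key failure-run detector.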
Resolves the design question in #6 by adopting the two signals that converged in the issue discussion: outcome inference (from agentsview) and one-shot rate (from codeburn). These are orthogonal — each catches a different failure mode — and together form the MVP quality axis that distinguishes burn's "same output, less spend" question from every other usage tracker.

- Outcome inference: classifies each session as completed | abandoned | errored | unknown with high/medium/low confidence, using turn count, ending role, trailing failure streak, recency, and (optional) last assistant text for give-up phrase detection.
- One-shot rate: per-session `oneShotTurns / editTurns` — robust in hash-only content mode since it needs only tool-call patterns.
- Both computed lazily in `@relayburn/analyze`. Nothing persists to the ledger — upgrading rules later doesn't require a rebuild. No prompt storage required (the design's hard constraint).
- Wired into `burn summary --quality` as the first consumer.

Closes #6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
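The one-shot rate reduces to a ratio over tool-call shapes. A minimal sketch, under the assumption that an "edit turn" is one containing an Edit/Write call and "one-shot" means none of those calls errored — the `TurnShape` type and this exact definition are illustrative, not the package's:

```typescript
// Hypothetical simplified shape; real turns carry usage, model, timestamps.
interface TurnShape {
  toolCalls: Array<{ name: string; isError: boolean }>;
}

function oneShotRate(turns: TurnShape[]): number | undefined {
  const isEdit = (c: { name: string }): boolean =>
    c.name === "Edit" || c.name === "Write";
  const editTurns = turns.filter((t) => t.toolCalls.some(isEdit));
  // No edit turns means no signal for this session, not a rate of 0.
  if (editTurns.length === 0) return undefined;
  const oneShot = editTurns.filter((t) =>
    t.toolCalls.filter(isEdit).every((c) => !c.isError),
  );
  return oneShot.length / editTurns.length;
}
```

Because this needs only tool names and error flags, it works in hash-only content mode, as the commit message notes.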
```ts
const userTextByMessageId = new Map<string, string>();
const erroredToolUseIds = new Set<string>();
const events: Array<{ offset: number; event: CompactionEvent }> = [];
let lastAssistantMessageId: string | undefined;
```
🟡 Incremental parser does not carry forward lastAssistantMessageId, causing compaction events to lose preceding-turn annotation
In parseClaudeSessionIncremental, lastAssistantMessageId is initialized to undefined and is never seeded from a prior call — unlike lastUserText which IS carried forward via ParseIncrementalOptions.lastUserText and the ClaudeCursor.
When a compact_boundary system line falls in a new incremental chunk (because the preceding assistant message was fully processed in a prior chunk and the cursor advanced past it), lastAssistantMessageId is undefined, so the CompactionEvent is emitted without precedingMessageId. Consequently, annotateCompactionEvents at packages/reader/src/claude.ts:796 cannot look up the preceding turn's cacheRead, leaving tokensBeforeCompact unset. Downstream, detectCompactionLosses (packages/analyze/src/patterns.ts:316) defaults it to 0, producing cacheLostCost: 0 — silently dropping the cost data for that compaction event in the ledger permanently.
The analogous carry-forward pattern exists for lastUserText (packages/reader/src/claude.ts:648, packages/cli/src/ingest.ts:113-114, packages/ledger/src/cursors.ts:15) but was not implemented for lastAssistantMessageId.
Prompt for agents
The incremental Claude session parser does not carry forward lastAssistantMessageId across incremental parsing calls. This means when a compact_boundary system line is the first thing seen in a new incremental chunk (after the preceding assistant message was processed in a prior chunk), the CompactionEvent will lack precedingMessageId and tokensBeforeCompact, causing the compaction cost to be recorded as 0.
To fix this, mirror the existing lastUserText carry-forward pattern:
1. Add lastAssistantMessageId?: string to ParseIncrementalOptions (packages/reader/src/claude.ts, around line 588-595)
2. Add lastAssistantMessageId: string to ParseIncrementalResult (packages/reader/src/claude.ts, around line 597-604)
3. Seed the local variable from options: let lastAssistantMessageId = options.lastAssistantMessageId (line 645)
4. Return lastAssistantMessageId in the result object (around line 798-804)
5. Add lastAssistantMessageId?: string to ClaudeCursor in packages/ledger/src/cursors.ts
6. Wire it through in packages/cli/src/ingest.ts ingestClaudeInto, similar to how lastUserText is handled at lines 113-114 and 134
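The steps above mirror the existing lastUserText mechanism; a minimal sketch of what that carry-forward looks like, with illustrative types and a toy `parseChunk` — field names follow the review comment and are not verified against the real API:

```typescript
// Illustrative shapes mirroring the described lastUserText carry-forward.
interface ParseIncrementalOptions {
  lastUserText?: string;
  lastAssistantMessageId?: string; // new: seeded from the prior chunk
}

interface ChunkResult {
  lastUserText: string;
  lastAssistantMessageId?: string; // new: persisted into the cursor
  compactions: Array<{ precedingMessageId?: string }>;
}

function parseChunk(lines: string[], options: ParseIncrementalOptions): ChunkResult {
  // Seed from the prior call instead of starting at undefined.
  let lastAssistantMessageId = options.lastAssistantMessageId;
  let lastUserText = options.lastUserText ?? "";
  const compactions: Array<{ precedingMessageId?: string }> = [];
  for (const line of lines) {
    const rec = JSON.parse(line);
    if (rec.type === "assistant") lastAssistantMessageId = rec.message?.id;
    if (rec.type === "user") lastUserText = rec.text ?? lastUserText;
    if (rec.type === "system" && rec.subtype === "compact_boundary") {
      // Without the seed, a boundary at the top of a chunk loses this id.
      compactions.push({ precedingMessageId: lastAssistantMessageId });
    }
  }
  return { lastUserText, lastAssistantMessageId, compactions };
}
```

The second call below simulates an incremental resume where the assistant line was consumed in an earlier chunk: the event still resolves its preceding message id because the cursor value is threaded through.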
barryollama
left a comment
Code Review: Waste-pattern detectors
Summary
This PR delivers on the ambitious goal from #11 by implementing four distinct waste-pattern detectors. The implementation is solid, well-tested, and follows TypeScript best practices.
What's Strong
1. Clean separation of concerns
The detectors are orthogonally designed with clear boundaries:
- RetryLoop detector catches same-tool, same-args failures (≥3 consecutive)
- FailureRun catches multi-tool failure streaks
- CompactionLoss prices context window resets
- EditRevertCycle tracks file-state oscillations
2. Thoughtful deduplication
Excellent handling of the overlap between retry loops and failure runs. The `keys.size < 2` guard in `detectFailureRunsForSession()` prevents double-reporting.
3. Ledger integration
The CompactionLine addition to the ledger schema with id-hash dedup is clean. Reusing the turn-id index namespace is pragmatic—we're not expecting collisions between turn UUIDs and compaction event hashes.
4. Test coverage
Comprehensive fixtures for all four detectors. The JSONL fixtures are readable and capture real-world session patterns.
Suggestions
1. Naming clarity in `editPreHash`/`editPostHash`
These field names could be more descriptive for Write tool calls (which don't technically have a "pre" state). Consider `contentBeforeHash`/`contentAfterHash`, or document that Write lacks `editPreHash`.
2. Session summary totalPatternCost
Nit: The comment says "Sum of detector-reported costs"—worth clarifying whether this represents cumulative cost or unique cost (could a turn contribute to multiple patterns?).
3. Incremental parsing edge case
In parseClaudeSessionIncremental, compaction events past endOffset are filtered. This is correct for dedup, but consider what happens if a compaction marker appears mid-turn (rare but possible edge case in streaming).
4. Pricing assumptions
The compaction cost uses the preceding turn's model rate. This is a reasonable proxy, but consider documenting this assumption explicitly—it assumes cache pricing hasn't changed between turns.
Questions
- Detector extensibility: The PR mentions OpenCode-specific detectors as a follow-up. Should the `DetectPatternsOptions` interface include a source filter to make this cleaner, or will that be handled at the caller level?
- Health grading: The PR description mentions health grading as a downstream consumer. Should the `SessionPatternSummary` include severity levels (e.g., each pattern has a severity score), or will the grading layer compute that?
Approval Recommendation
Approve with minor notes. The implementation is solid, well-tested, and the deferred follow-ups are well-justified in the PR description.
The edit-revert detection in particular is elegant—tracking hash transitions per-file and only flagging when postHash == previous preHash neatly captures the "back to square one" semantics without false positives on A→B→C→B (which is a partial revert, not a full cycle).
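One way to read those semantics — track the origin content hash of each file's current edit chain and close a cycle only when a post-state returns to it — can be sketched as follows. The `EditEvent` shape and `countRevertCycles` name are hypothetical; the real detector may implement the matching differently:

```typescript
// Hypothetical simplified edit event; the real ToolCall carries more fields.
interface EditEvent {
  file: string;
  preHash: string;
  postHash: string;
}

function countRevertCycles(edits: EditEvent[]): number {
  // Per file, remember the pre-state of the first edit since the last
  // completed cycle. A cycle closes when a post-state returns to that
  // origin, so A→B→C→B (partial revert) is not flagged, while resetting
  // after close lets A→B→A→B→A report two cycles.
  const origin = new Map<string, string>();
  let cycles = 0;
  for (const e of edits) {
    if (!origin.has(e.file)) origin.set(e.file, e.preHash);
    if (e.postHash === origin.get(e.file)) {
      cycles += 1;
      origin.delete(e.file); // reset: the next edit starts a new chain
    }
  }
  return cycles;
}
```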
Pull request overview
Adds a new “waste-pattern detection” layer (retry loops, consecutive failures, compaction-loss, edit-revert cycles) to complement per-tool cost attribution, surfaced via burn waste --patterns[...] and a new burn diagnose <session-id> command. This extends the Claude reader + ledger to persist additional signals (tool error flags, edit/write content hashes, compaction boundary events) so analysis can be done from the ledger without re-parsing raw session logs.
Changes:

- Extend Claude session parsing to surface `tool_result.is_error`, capture Edit/Write content hashes, and emit compaction-boundary events.
- Persist compaction events in the ledger and expose them via `queryCompactions`.
- Add analyzer-side pattern detectors (+ tests) and CLI surfaces (`burn waste --patterns…`, `burn diagnose`).
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/fixtures/claude/retry-loop.jsonl | New fixture for retry-loop detection using tool_result.is_error. |
| tests/fixtures/claude/edit-revert.jsonl | New fixture for edit-revert cycle detection based on old/new content. |
| tests/fixtures/claude/consecutive-failures.jsonl | New fixture for consecutive failure-run detection. |
| tests/fixtures/claude/compact-boundary.jsonl | New fixture for compaction-boundary system marker. |
| packages/reader/src/types.ts | Add ToolCall.isError + edit hashes, and introduce CompactionEvent. |
| packages/reader/src/index.ts | Export CompactionEvent. |
| packages/reader/src/hash.ts | Add contentHash() for Edit/Write hashing. |
| packages/reader/src/claude.ts | Emit compaction events; propagate tool errors and edit hashes into ToolCall. |
| packages/reader/src/claude.test.ts | Tests for new reader fields + compaction-event emission. |
| packages/ledger/src/schema.ts | Add CompactionLine and isCompactionLine. |
| packages/ledger/src/index-sidecar.ts | Add compaction id hashing + index rebuild support. |
| packages/ledger/src/writer.ts | Add appendCompactions() to persist compaction events with id-dedup. |
| packages/ledger/src/reader.ts | Add queryCompactions() for reading compaction records. |
| packages/ledger/src/index.ts | Export compaction APIs/types. |
| packages/cli/src/ingest.ts | Ingest compaction events alongside turns/content. |
| packages/cli/src/commands/waste.ts | Add --patterns[...] path that runs detectors (and prints tables/JSON). |
| packages/cli/src/commands/diagnose.ts | New single-session command combining attribution + pattern reports. |
| packages/cli/src/cli.ts | Register diagnose command and document waste --patterns. |
| packages/cli/src/args.ts | Support --flag=value inline flag syntax. |
| packages/cli/src/args.test.ts | Tests for inline flag parsing behavior. |
| packages/analyze/src/patterns.ts | Implement the four detectors + per-session summary rollups. |
| packages/analyze/src/patterns.test.ts | Unit + fixture-driven tests for all detectors and rollups. |
| packages/analyze/src/index.ts | Export new detector API/types. |
```ts
const patternsFlag = args.flags['patterns'];
if (patternsFlag !== undefined) {
  const selected = resolvePatternSelection(patternsFlag);
  const compactions = selected.has('compaction')
    ? await queryCompactions(q)
    : [];
  const patterns = detectPatterns(turns, { pricing, compactions });
  return renderPatterns(args, patterns, selected, turns.length);
}

const sessionIds = new Set(turns.map((t) => t.sessionId));
```
When q.project is set, queryAll(q) returns only turns for that project, but queryCompactions(q) ignores project (by design in ledger.reader) and will return compaction events from unrelated sessions/projects. That can produce pattern output + session summaries for sessions that weren’t part of the analyzed turn set. Filter compactions to the sessionIds present in turns (e.g., query by since/until/source then sessionId-filter in-memory, or query per sessionId) before calling detectPatterns.
Suggested change:

```diff
-const patternsFlag = args.flags['patterns'];
-if (patternsFlag !== undefined) {
-  const selected = resolvePatternSelection(patternsFlag);
-  const compactions = selected.has('compaction')
-    ? await queryCompactions(q)
-    : [];
-  const patterns = detectPatterns(turns, { pricing, compactions });
-  return renderPatterns(args, patterns, selected, turns.length);
-}
-const sessionIds = new Set(turns.map((t) => t.sessionId));
+const sessionIds = new Set(turns.map((t) => t.sessionId));
+const patternsFlag = args.flags['patterns'];
+if (patternsFlag !== undefined) {
+  const selected = resolvePatternSelection(patternsFlag);
+  const compactions = selected.has('compaction')
+    ? (await queryCompactions(q)).filter((compaction) => sessionIds.has(compaction.sessionId))
+    : [];
+  const patterns = detectPatterns(turns, { pricing, compactions });
+  return renderPatterns(args, patterns, selected, turns.length);
+}
```
```ts
const commit = (): void => {
  if (streak.length < MIN_FAILURE_RUN_LEN) return;
  const keys = new Set(streak.map((r) => `${r.call.name}|${r.call.argsHash}`));
  // A same-(tool,args) run is a retry loop, not a distinct-failure run.
  // Keep them orthogonal so the two detectors never double-report the same
  // sequence.
  if (keys.size < 2) return;
  const first = streak[0]!;
  const last = streak[streak.length - 1]!;
  const tools = Array.from(new Set(streak.map((r) => r.call.name)));
  const contributingTurns = dedupTurns(streak.map((r) => r.turn));
  runs.push({
    sessionId,
    length: streak.length,
    startTurnIndex: first.turn.turnIndex,
    endTurnIndex: last.turn.turnIndex,
    toolsInvolved: tools,
    cost: sumCostForTurns(contributingTurns, pricing),
  });
};

for (const ref of flat) {
  if (ref.call.isError === true) {
    streak.push(ref);
  } else {
    commit();
    streak = [];
  }
}
commit();
```
detectFailureRunsForSession currently treats any ≥3 consecutive errored tool calls as a failure run as long as there are at least 2 distinct (tool,argsHash) keys. This can (a) report runs where the same tool fails repeatedly with different args (contradicting the “distinct tools failing in sequence” requirement) and (b) double-report sequences that contain a retry-loop segment (e.g., A, B, B, B, C would emit both a retry loop and a failure run). Consider building the failure-run streak from distinct consecutive keys/tools by collapsing consecutive identical keys (or resetting on them) and enforcing ≥3 distinct tools before emitting.
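The collapse-then-count approach suggested here can be sketched as a predicate over an already-collected errored streak. This is illustrative only — types are simplified, and whether the threshold should be ≥3 distinct tools or ≥3 distinct keys is the reviewer's proposal, not settled behavior:

```typescript
// Hypothetical simplified shape for an errored tool call in a streak.
interface ErroredCall {
  name: string;
  argsHash: string;
}

function isFailureRun(streak: ErroredCall[], minDistinctTools = 3): boolean {
  // Collapse consecutive identical (tool, argsHash) keys so a retry-loop
  // segment (e.g. B, B, B inside A, B, B, B, C) contributes only once.
  const collapsed: string[] = [];
  for (const call of streak) {
    const key = `${call.name}|${call.argsHash}`;
    if (collapsed[collapsed.length - 1] !== key) collapsed.push(key);
  }
  if (collapsed.length < 3) return false;
  // Enforce distinct tools so "same tool, different args" alone can't emit.
  const tools = new Set(streak.map((c) => c.name));
  return tools.size >= minDistinctTools;
}
```

Under this rule, A, B, B, B, C still qualifies as a failure run (three distinct tools after collapsing), while the pure retry loop B, B, B does not, addressing the double-reporting concern.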
```ts
const tokens = e.tokensBeforeCompact ?? 0;
let cacheLostCost = 0;
if (tokens > 0 && e.precedingMessageId) {
  const preceding = turnByMessageId.get(e.precedingMessageId);
  if (preceding) {
    const priced = costForUsage(
      {
        input: 0,
        output: 0,
        reasoning: 0,
        cacheRead: tokens,
        cacheCreate5m: 0,
        cacheCreate1h: 0,
      },
      preceding.model,
      pricing,
    );
    if (priced) cacheLostCost = priced.total;
  }
```
detectCompactionLosses treats missing tokensBeforeCompact as 0, which makes cacheLostCost silently drop to 0 even when precedingMessageId is present and the preceding turn is available in turns (common if incremental ingestion didn’t annotate the event). Prefer deriving tokensBeforeCompact from the preceding turn (preceding.usage.cacheRead) when the event field is undefined, and only defaulting to 0 when neither is available.
Suggested change:

```diff
-const tokens = e.tokensBeforeCompact ?? 0;
-let cacheLostCost = 0;
-if (tokens > 0 && e.precedingMessageId) {
-  const preceding = turnByMessageId.get(e.precedingMessageId);
-  if (preceding) {
-    const priced = costForUsage(
-      {
-        input: 0,
-        output: 0,
-        reasoning: 0,
-        cacheRead: tokens,
-        cacheCreate5m: 0,
-        cacheCreate1h: 0,
-      },
-      preceding.model,
-      pricing,
-    );
-    if (priced) cacheLostCost = priced.total;
-  }
+const preceding = e.precedingMessageId
+  ? turnByMessageId.get(e.precedingMessageId)
+  : undefined;
+const tokens = e.tokensBeforeCompact ?? preceding?.usage.cacheRead ?? 0;
+let cacheLostCost = 0;
+if (tokens > 0 && preceding) {
+  const priced = costForUsage(
+    {
+      input: 0,
+      output: 0,
+      reasoning: 0,
+      cacheRead: tokens,
+      cacheCreate5m: 0,
+      cacheCreate1h: 0,
+    },
+    preceding.model,
+    pricing,
+  );
+  if (priced) cacheLostCost = priced.total;
```
```ts
const events: Array<{ offset: number; event: CompactionEvent }> = [];
let lastAssistantMessageId: string | undefined;
// Seed from the prior call so an in-progress turn whose user prompt lives
// before `startOffset` still classifies against that prompt on resume.
let currentUserText = options.lastUserText ?? '';
```
In incremental parsing, lastAssistantMessageId starts as undefined each call. If a compact_boundary record is appended after the previous ingestion (so the preceding assistant line is before startOffset), the emitted CompactionEvent won’t get precedingMessageId, and later pricing will be impossible/zeroed. This likely needs cursor carry-forward similar to lastUserText (e.g., persist last seen assistant message id in ClaudeCursor / parseClaudeSessionIncremental options), or an alternate strategy to resolve the preceding turn when the marker is in a later chunk.
Treat sessions whose turns lack a stopReason (e.g. Codex) as completed/low-confidence rather than misclassifying them as user-abandoned. Add 'unknown-ending' outcome, return early from inferOutcome for unknown endings, and make endingRole return 'unknown' when stopReason is undefined so trailing failure-streak detection still works. Add tests to cover the new classification and failure-streak detection for sources without stopReason.
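A hedged sketch of that classification change, using the names from the commit message. The shapes are illustrative and the real `inferOutcome` weighs more signals (turn count, trailing failure streak, recency, give-up phrases); only the stopReason-handling behavior described above is modeled:

```typescript
type Ending = "assistant" | "user" | "unknown";

function endingRole(lastStopReason: string | undefined): Ending {
  // Sources without stopReason (e.g. Codex) get 'unknown' rather than
  // defaulting to 'user', so failure-streak detection still runs.
  if (lastStopReason === undefined) return "unknown";
  return lastStopReason === "end_turn" ? "assistant" : "user";
}

function inferOutcome(lastStopReason: string | undefined): {
  outcome: "completed" | "abandoned";
  confidence: "high" | "medium" | "low";
} {
  const ending = endingRole(lastStopReason);
  if (ending === "unknown") {
    // Return early: stopReason-less sessions are completed/low-confidence,
    // not misclassified as user-abandoned.
    return { outcome: "completed", confidence: "low" };
  }
  return ending === "assistant"
    ? { outcome: "completed", confidence: "medium" }
    : { outcome: "abandoned", confidence: "low" };
}
```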
- Extend single-exchange to messageCount <= 2 so a true one-turn "user asks, assistant answers" session is classified completed/medium instead of too-short/unknown. TurnRecord counts assistant turns only; the prior `=== 2` gate only caught tool-mediated round trips.
- Add three give-up phrases observed in real Claude/Codex sessions.
- Parallelize content sidecar reads in `burn summary --quality` with a concurrency cap of 8 so large ledgers (many sessions, many ENOENT paths) don't serialize I/O.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
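Concurrency-capped I/O like the sidecar reads above is typically a small worker pool. A generic sketch — the cap of 8 comes from the commit message, everything else (names, shape) is illustrative:

```typescript
// Run an async mapper over items with at most `limit` in flight at once.
// Results keep the input order regardless of completion order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  const worker = async (): Promise<void> => {
    // `next++` is safe: JS workers interleave only at await points.
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  };
  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    () => worker(),
  );
  await Promise.all(workers);
  return results;
}

// Usage shape for the sidecar case (paths/reader are hypothetical):
// const contents = await mapWithConcurrency(sessionPaths, 8, readSidecar);
```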
Adds [Unreleased] entries for PR #53 to the root, analyze, and cli changelogs covering the outcome inference + one-shot rate module and the new `burn summary --quality` flag. Notes the Codex unknown-stopReason handling and the concurrency cap on sidecar reads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add quality signals (outcome inference + one-shot rate) (closes #6)
Devin Review found 1 new potential issue.
🐛 1 issue in files not directly in the diff
🐛 burn claude wrapper silently drops compaction events, causing permanent data loss (packages/cli/src/commands/claude.ts:64-70)
The ingestSession function destructures only { turns, content } from parseClaudeSession, ignoring the new events field added in this PR. Compaction events (compact_boundary markers) are never persisted via appendCompactions(). Critically, the function then sets the cursor to EOF at line 80 (offsetBytes: st.size), so when ingestAll() runs later (e.g. via burn summary), it sees startOffset >= st.size and skips the file entirely (packages/cli/src/ingest.ts:102). This permanently loses compaction data for sessions ingested through the burn claude wrapper. The ingestAll() path in packages/cli/src/ingest.ts:115-127 correctly destructures and persists events, so this is an inconsistency specific to the spawn-wrapper codepath.
View 10 additional findings in Devin Review.
Summary
- Extends `burn waste` with `burn waste --patterns[=retries,failures,compaction,reverts]` and a new `burn diagnose <session-id>` command that renders per-tool attribution + all four detectors for a single session.
- Propagates `tool_result.is_error` onto `ToolCall`, captures Edit/Write pre/post content hashes, and emits `CompactionEvent` records from Claude Code's `type:system subtype:compact_boundary` markers. Persists compactions in the ledger via a new `CompactionLine` kind with id-hash dedup.
- Adds per-session rollups (`retryLoopCount`, `consecutiveFailureMax`, `editRevertCount`, …) so downstream commands (`burn compare`, health grading) can read the summary without re-running the detector suite — per the discussion on the issue.

Closes #11.
What's not in this PR (and why)
Follow-ups explicitly deferred:
- OpenCode-specific detectors keyed on `source === 'opencode'`; will land as a follow-up since they depend on OpenCode-specific fixture work.
- The `WasteFinding`/`WasteAction` structured schema upgrade — intentional: these four detectors ship with plain result types first so downstream consumers stabilize before we wrap them in a common envelope.

Acceptance (from the issue)
- A session of repeated failing `Bash` calls reports one retry-loop of length 4 with nonzero cumulative cost (tests/fixtures/claude/retry-loop.jsonl).
- A session with a `compact_boundary` marker reports `tokensBeforeCompact = 9000` from the preceding turn's `cacheRead`, priced against that turn's model.

Test plan
- `pnpm run test:ts` — 225 pass (15 new tests)
- `pnpm dev:cli waste --patterns --json` against real Claude Code sessions — finds retry-free sessions with real failure-runs and edit-reverts
- `pnpm dev:cli diagnose <session-id>` against a real session with a known edit-revert — renders expected table

🤖 Generated with Claude Code