Context
Every Anthropic completion carries a stop_reason: end_turn, max_tokens, pause_turn, stop_sequence, tool_use, or refusal. Burn ingests this field but doesn't surface it, so common failure modes are invisible:
- max_tokens truncations: the model ran out of output budget. Often the cause of "agent stopped mid-edit" symptoms.
- refusal: the response was stopped for content-policy reasons. Worth counting for prompt-quality audits.
- pause_turn: the API paused a long-running turn (e.g. during server-tool use), and the client is expected to send it back to continue. Useful for distinguishing real completions from in-flight ones.
Prior art: agent-profiler's deriveTurnOutcome (ui/src/components/conversation/transforms.ts) reads assistant stop_reason events and yields one of end_turn | max_tokens | pause_turn | refusal | silent.
Proposal
- Reader: extract stop_reason from the assistant row of each inference and store it on the inference (or turn) record.
- Summary: add a one-line outcome breakdown to burn summary, e.g.:
  Turn outcomes: 142 end_turn, 3 max_tokens, 1 refusal, 0 pause_turn
- Per-turn: surface it in burn hotspots --explain and in any future per-turn detail verb.
- SDK: add stop_reason: Option<StopReason> to the inference / turn struct so it's queryable.
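A minimal sketch of the summary line, assuming per-turn outcomes have already been rolled up into a slice. The outcome_line helper and the label() method are hypothetical names, not existing Burn APIs; the enum mirrors the one in the implementation sketch. The four outcomes from the example line always print, even at zero; any other outcome prints only when it occurs.

```rust
use std::collections::BTreeMap;

/// Mirrors the StopReason enum from the implementation sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum StopReason { EndTurn, MaxTokens, PauseTurn, StopSequence, ToolUse, Refusal, Silent }

impl StopReason {
    /// Label used in summary output (the Anthropic wire value where one exists).
    pub fn label(self) -> &'static str {
        match self {
            StopReason::EndTurn => "end_turn",
            StopReason::MaxTokens => "max_tokens",
            StopReason::PauseTurn => "pause_turn",
            StopReason::StopSequence => "stop_sequence",
            StopReason::ToolUse => "tool_use",
            StopReason::Refusal => "refusal",
            StopReason::Silent => "silent",
        }
    }
}

/// Hypothetical helper: renders the one-line outcome breakdown for `burn summary`.
pub fn outcome_line(outcomes: &[StopReason]) -> String {
    // Count each outcome; BTreeMap keeps a stable (declaration) order.
    let mut counts: BTreeMap<StopReason, usize> = BTreeMap::new();
    for &o in outcomes {
        *counts.entry(o).or_insert(0) += 1;
    }
    // Headline outcomes always print, including zero counts.
    let headline = [StopReason::EndTurn, StopReason::MaxTokens, StopReason::Refusal, StopReason::PauseTurn];
    let mut parts: Vec<String> = headline
        .iter()
        .map(|r| format!("{} {}", counts.get(r).copied().unwrap_or(0), r.label()))
        .collect();
    // Any other outcome is appended only when it actually occurred.
    for (r, n) in &counts {
        if !headline.contains(r) {
            parts.push(format!("{} {}", n, r.label()));
        }
    }
    format!("Turn outcomes: {}", parts.join(", "))
}
```

For example, two end_turn turns and one max_tokens turn render as "Turn outcomes: 2 end_turn, 1 max_tokens, 0 refusal, 0 pause_turn".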
Implementation sketch
```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum StopReason { EndTurn, MaxTokens, PauseTurn, StopSequence, ToolUse, Refusal, Silent }
```
Silent is not an API value: it covers the case where an inference's row was written but carries no stop_reason (mid-write, sidechain, etc.). The reader already ingests these rows; the change is just to preserve the field.
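One way to preserve the field at ingest, sketched under the assumption that the reader hands over the row's stop_reason as an optional string (an absent field and a JSON null both arriving as None). from_raw is a hypothetical name. Unrecognized values return None rather than being forced into a variant, which leaves room for the raw_stop_reason fallback discussed under open questions.

```rust
/// StopReason from the implementation sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StopReason { EndTurn, MaxTokens, PauseTurn, StopSequence, ToolUse, Refusal, Silent }

impl StopReason {
    /// Hypothetical mapping from an assistant row's raw `stop_reason`.
    /// - `None` (field missing: mid-write row, sidechain, ...) becomes `Silent`.
    /// - An unknown string yields `None`, so the caller can keep the raw value
    ///   instead of guessing a variant.
    pub fn from_raw(raw: Option<&str>) -> Option<StopReason> {
        match raw {
            None => Some(StopReason::Silent),
            Some("end_turn") => Some(StopReason::EndTurn),
            Some("max_tokens") => Some(StopReason::MaxTokens),
            Some("pause_turn") => Some(StopReason::PauseTurn),
            Some("stop_sequence") => Some(StopReason::StopSequence),
            Some("tool_use") => Some(StopReason::ToolUse),
            Some("refusal") => Some(StopReason::Refusal),
            Some(_) => None,
        }
    }
}
```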
Open questions
- Codex equivalent. Codex rollouts have their own outcome semantics: map them to the same enum, or keep a parallel CodexStopReason? First cut: extend the enum, or use a String-typed raw_stop_reason for non-Anthropic harnesses.
- Aggregation level. Per-inference or per-turn? Per-inference is more correct, since the inferences in a multi-inference turn can have different outcomes. For the summary, roll up to one outcome per turn by taking the most severe.
- Refusal context. Just count, or also surface the user prompt that triggered it? Probably out of scope here — counting is the cheap win.
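A sketch of the max-severity rollup. The severity ranking below is an assumption, not something the proposal fixes: tool_use ranks lowest because intermediate inferences in a tool loop normally end with tool_use before a final end_turn, and refusal ranks highest. turn_outcome and severity are hypothetical names.

```rust
/// StopReason from the implementation sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StopReason { EndTurn, MaxTokens, PauseTurn, StopSequence, ToolUse, Refusal, Silent }

/// Assumed severity ranking (not part of the proposal); higher is worse.
fn severity(r: StopReason) -> u8 {
    match r {
        StopReason::ToolUse => 0, // normal mid-loop outcome
        StopReason::EndTurn => 1,
        StopReason::StopSequence => 2,
        StopReason::PauseTurn => 3, // in-flight
        StopReason::Silent => 4,    // no stop_reason recorded
        StopReason::MaxTokens => 5,
        StopReason::Refusal => 6,
    }
}

/// Hypothetical rollup: one outcome per turn from its per-inference outcomes.
/// Returns None for a turn with no inferences.
pub fn turn_outcome(inferences: &[StopReason]) -> Option<StopReason> {
    inferences.iter().copied().max_by_key(|&r| severity(r))
}
```

With this ordering, a turn whose inferences end tool_use, tool_use, end_turn rolls up to end_turn, while a single max_tokens anywhere in the turn wins.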
Acceptance
- stop_reason parsed and stored at ingest for Claude rows.
- burn summary shows outcome counts.
- stop_reason exposed on inference/turn records.
- A max_tokens turn surfaces in summary output.
References
- agent-profiler: ui/src/components/conversation/transforms.ts, deriveTurnOutcome.
- Anthropic docs: stop_reason values.
- Related: span-tree foundation (stop_reason is a natural attribute on the Turn root span).