summary: expose turn outcome (stop_reason) — end_turn / max_tokens / refusal / pause

## Context

Every Anthropic completion carries a `stop_reason`: `end_turn`, `max_tokens`, `pause_turn`, `stop_sequence`, `tool_use`, `refusal`. Burn ingests this data but doesn't surface it. That means common failure modes are invisible:

- **`max_tokens` truncations** — the model ran out of output budget. Often the cause of "agent stopped mid-edit" symptoms.
- **`refusal`** — content policy stopped the response. Useful to count for prompt-quality audits.
- **`pause_turn`** — interleaved-thinking pause. Useful for distinguishing real completions from in-flight ones.

Prior art: [agent-profiler](https://github.com/DevonPeroutky/agent-profiler). Source: `ui/src/components/conversation/transforms.ts`, `deriveTurnOutcome`. Reads assistant `stop_reason` events and yields one of `end_turn | max_tokens | pause_turn | refusal | silent`.

## Proposal

1. **Reader**: extract `stop_reason` from the assistant row of each inference and store it on the inference (or turn) record.
2. **Summary**: add a one-line outcome breakdown to `burn summary`:
   ```
   Turn outcomes: 142 end_turn, 3 max_tokens, 1 refusal, 0 pause
   ```
3. **Per-turn**: surface in `burn hotspots --explain` and in any future per-turn detail verb.
4. **SDK**: add `stop_reason: Option<StopReason>` to the inference / turn struct so it's queryable.

## Implementation sketch

```rust
pub enum StopReason { EndTurn, MaxTokens, PauseTurn, StopSequence, ToolUse, Refusal, Silent }
```

`Silent` covers the case where an inference's row was written but no `stop_reason` is present (mid-write, sidechain, etc.). The current row already lands somewhere via the reader — just preserve the field.

## Open questions

1. **Codex equivalent.** Codex rollouts have their own outcome semantics; map to the same enum or keep a `CodexStopReason` parallel? First-cut: extend the enum or use `String`-typed `raw_stop_reason` for non-Anthropic harnesses.
2. **Aggregation level.** Per-inference or per-turn? Per-inference is more correct (multi-inference turns can have different outcomes). Roll up to per-turn as max-severity for summary.
3. **Refusal context.** Just count, or also surface the user prompt that triggered it? Probably out of scope here — counting is the cheap win.

## Acceptance

- [ ] `stop_reason` parsed and stored at ingest for Claude rows.
- [ ] `burn summary` shows outcome counts.
- [ ] SDK exposes `stop_reason` on inference/turn records.
- [ ] Fixture with a `max_tokens` turn surfaces it in summary output.
- [ ] Codex behavior documented (mapped or raw-passthrough).

## References

- agent-profiler: `ui/src/components/conversation/transforms.ts` `deriveTurnOutcome`.
- Anthropic docs: stop_reason values.
- Related: span-tree foundation (`stop_reason` is a natural attribute on the Turn root span).


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

summary: expose turn outcome (stop_reason) — end_turn / max_tokens / refusal / pause #437

Context

Proposal

Implementation sketch

Open questions

Acceptance

References

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

summary: expose turn outcome (stop_reason) — end_turn / max_tokens / refusal / pause #437

Description

Context

Proposal

Implementation sketch

Open questions

Acceptance

References

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions