sdk: introduce per-turn span tree as analytical primitive

## Context

Burn's analytical units today are flat rows: `TurnRecord`, `tool_result_event`, content sidecars. That shape is great for tabular aggregation (summary, hotspots, overhead) but loses the hierarchical structure of a turn: inferences within a turn, tool_uses nested inside an inference, subagent fanouts nested inside a tool_use. Anything that wants to answer "where in this turn did the cost/time/context blow up?" has to re-derive that hierarchy ad hoc at each call site.

Prior art worth stealing: [DevonPeroutky/agent-profiler](https://github.com/DevonPeroutky/agent-profiler) models every turn as an OTel-style **span tree**, and projects scalars off the tree rather than the other way around:

```ts
// agent-profiler: lib/traces/types.d.ts
interface SpanNode {
  startMs: number;
  endMs: number;
  durationMs: number;
  name: string;
  kind: 'turn' | 'inference' | 'tool_use' | 'subagent' | 'skill' | ...;
  status: { code: 0 | 1 | 2; message?: string };
  attributes: Record<string, unknown>;
  events: Array<{ ts: number; name: string; attributes: Record<string, unknown> }>;
  children: SpanNode[];
}

interface Turn {
  traceId: string;
  sessionId: string;
  turnNumber: number;
  startMs: number; endMs: number; durationMs: number;
  userPrompt: string;
  // ...scalar tokens, toolCount, errorCount, finalMode, cwd, model...
  // "top-level scalar fields are a pure projection of values already on
  //  root.attributes / root.events" — i.e. the tree is canonical.
  root: SpanNode;
}

type TraceSummary = Turn | UnattachedGroup;
```

Construction lives in `lib/claude-code/traces.js` (Claude side) and `lib/codex/traces.js` (Codex side); the shape is harness-neutral by contract.

## Proposal

Add a span-tree projection to `relayburn-sdk` as a **derived** view over existing ledger data. No schema migration; the tree is built on demand from `TurnRecord` + `tool_result_event` + content sidecars.

```rust
// crates/relayburn-sdk/src/analyze/span_tree.rs (new)

use std::collections::BTreeMap;

pub enum SpanKind {
    Turn, Inference, ToolUse, Subagent, Skill, UserPrompt, ToolResult,
}

pub struct SpanNode {
    pub kind: SpanKind,
    pub name: String,
    pub start_ms: i64,
    pub end_ms: i64,
    pub status: SpanStatus,                     // Ok | Error { msg }
    pub attributes: BTreeMap<String, AttrValue>, // tokens, model, request_id, agent_id, ...
    pub events: Vec<SpanEvent>,
    pub children: Vec<SpanNode>,
}

pub struct TurnSpanTree {
    pub session_id: String,
    pub turn_id: String,
    pub turn_number: u32,
    pub root: SpanNode,
}

impl LedgerHandle {
    // Bodies elided in this sketch; only the signatures are proposed here.
    pub fn turn_span_tree(&self, session_id: &str, turn_id: &str) -> Result<TurnSpanTree> {
        todo!()
    }
    pub fn session_span_trees(&self, session_id: &str) -> Result<Vec<TurnSpanTree>> {
        todo!()
    }
}
```

### Span-kind hierarchy (Claude Code)

```
Turn (root)
├── UserPrompt
├── Inference                       <- one per requestId (#requestId-dedup)
│   ├── ToolUse (Bash)
│   │   └── ToolResult (status, byte size, attached content)
│   ├── ToolUse (Task)              <- subagent dispatch
│   │   └── Subagent                <- nested span tree from agent-*.jsonl
│   │       └── ...
│   └── ToolUse (Skill)             <- synthesized from slash-command triad
└── Inference                       <- if turn produced multiple API calls
    └── ...
```
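
To make the nesting concrete, here is a minimal sketch of the hierarchy above as data, plus a pre-order walk over it. It assumes a trimmed `SpanNode` carrying only `kind`, `name`, and `children`; `walk` and `sample_turn` are hypothetical helpers for illustration and not part of the proposed API.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SpanKind {
    Turn, Inference, ToolUse, Subagent, Skill, UserPrompt, ToolResult,
}

/// Trimmed stand-in for the proposed SpanNode: only what nesting needs.
#[derive(Debug, Clone)]
pub struct SpanNode {
    pub kind: SpanKind,
    pub name: String,
    pub children: Vec<SpanNode>,
}

impl SpanNode {
    pub fn new(kind: SpanKind, name: &str, children: Vec<SpanNode>) -> Self {
        SpanNode { kind, name: name.to_string(), children }
    }
}

/// Pre-order depth-first walk, returning (depth, kind) for every span.
pub fn walk(root: &SpanNode) -> Vec<(usize, SpanKind)> {
    let mut out = Vec::new();
    let mut stack = vec![(0usize, root)];
    while let Some((depth, node)) = stack.pop() {
        out.push((depth, node.kind));
        // Push children in reverse so the first child is visited first.
        for child in node.children.iter().rev() {
            stack.push((depth + 1, child));
        }
    }
    out
}

/// Hypothetical turn: one prompt, one inference with two tool uses,
/// then a second inference with no tool use.
pub fn sample_turn() -> SpanNode {
    use SpanKind::*;
    SpanNode::new(Turn, "turn", vec![
        SpanNode::new(UserPrompt, "user_prompt", vec![]),
        SpanNode::new(Inference, "inference", vec![
            SpanNode::new(ToolUse, "Bash", vec![
                SpanNode::new(ToolResult, "tool_result", vec![]),
            ]),
            SpanNode::new(ToolUse, "Task", vec![
                SpanNode::new(Subagent, "subagent", vec![]),
            ]),
        ]),
        SpanNode::new(Inference, "inference", vec![]),
    ])
}
```

The walk is iterative rather than recursive so that deeply nested subagent fanouts cannot overflow the stack; whether that matters in practice depends on how deep real transcripts get.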

### Why this is the foundation, not the feature

Multiple in-flight ideas collapse into "project off the span tree":
- **Inference-flow DAG** (#flow-dag) is a layout pass over `Vec<TurnSpanTree>`.
- **Context-delta attribution** (#context-delta) walks consecutive `Inference` nodes and credits the intervening `ToolResult` / `UserPrompt` children to the delta.
- **Hotspots** can attribute against any span level (per-tool, per-subagent, per-inference), not just `tool_use_id`.
- **Overhead trim** can show which `Inference` first paid the cached-prefix cost.
- **`stop_reason` outcome** (#turn-outcome) is an event on the `Turn` root.
- **`UnattachedGroup`** (#subagent-pairing) is a top-level `Subagent` span without a parent `ToolUse`.
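
As one illustration of "project off the span tree", here is a rough sketch of the context-delta walk. It is not a design for #context-delta: `context_tokens` is a hypothetical per-inference field standing in for whatever attribute ends up carrying prompt-side size, and `ContextDelta`, `context_deltas`, and `sample_turn` are invented names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SpanKind { Turn, Inference, ToolUse, ToolResult, UserPrompt }

#[derive(Debug, Clone)]
pub struct SpanNode {
    pub kind: SpanKind,
    pub name: String,
    /// Hypothetical: prompt-side tokens seen by an Inference (0 elsewhere).
    pub context_tokens: i64,
    pub children: Vec<SpanNode>,
}

#[derive(Debug, PartialEq)]
pub struct ContextDelta {
    pub from_inference: usize, // index of the earlier Inference in the pair
    pub delta_tokens: i64,
    pub credited: Vec<String>, // names of spans credited with the growth
}

/// Collect the names of all ToolResult descendants of `node`.
fn collect_results(node: &SpanNode, out: &mut Vec<String>) {
    for child in &node.children {
        if child.kind == SpanKind::ToolResult {
            out.push(child.name.clone());
        }
        collect_results(child, out);
    }
}

/// For each consecutive pair of Inference children of the turn root, credit
/// the earlier inference's ToolResults and any UserPrompt in between.
pub fn context_deltas(turn: &SpanNode) -> Vec<ContextDelta> {
    let mut deltas = Vec::new();
    let mut prev: Option<(usize, &SpanNode)> = None;
    let mut pending_prompts: Vec<String> = Vec::new();
    let mut idx = 0;
    for child in &turn.children {
        match child.kind {
            SpanKind::Inference => {
                if let Some((i, p)) = prev {
                    let mut credited = Vec::new();
                    collect_results(p, &mut credited);
                    credited.append(&mut pending_prompts);
                    deltas.push(ContextDelta {
                        from_inference: i,
                        delta_tokens: child.context_tokens - p.context_tokens,
                        credited,
                    });
                }
                // Prompts seen before the first inference are not "intervening".
                pending_prompts.clear();
                prev = Some((idx, child));
                idx += 1;
            }
            SpanKind::UserPrompt => pending_prompts.push(child.name.clone()),
            _ => {}
        }
    }
    deltas
}

fn span(kind: SpanKind, name: &str, context_tokens: i64, children: Vec<SpanNode>) -> SpanNode {
    SpanNode { kind, name: name.to_string(), context_tokens, children }
}

/// Hypothetical turn: the Bash result grows context from 1000 to 1800 tokens.
pub fn sample_turn() -> SpanNode {
    use SpanKind::*;
    span(Turn, "turn", 0, vec![
        span(UserPrompt, "user_prompt", 0, vec![]),
        span(Inference, "inference#0", 1000, vec![
            span(ToolUse, "Bash", 0, vec![span(ToolResult, "Bash result", 0, vec![])]),
        ]),
        span(Inference, "inference#1", 1800, vec![]),
    ])
}
```

The point of the sketch is only that the consumer needs no ledger access: everything it uses is already on the tree.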

## Implementation sketch

1. **Pure projection module** `crates/relayburn-sdk/src/analyze/span_tree.rs`.
   - Inputs: `TurnRecord`, child `tool_result_event` rows for the turn, optional content sidecar reads, optional subagent transcripts.
   - Output: `TurnSpanTree`. Pure function — no DB writes.
2. **Harness-specific builders** under `crates/relayburn-sdk/src/reader/`.
   - `reader/claude/span_tree.rs`: assembles Inference nodes per `requestId`, nests `ToolUse` from assistant `tool_use` blocks, pairs `ToolResult` by `tool_use_id`, stitches subagent subtrees via `toolUseResult.agentId` (cf. agent-profiler `lib/claude-code/traces.js` `buildInferenceSpansForSlice` and subagent pairing).
   - `reader/codex/span_tree.rs`: analogous, using Codex's rollout structure.
3. **SDK verbs** `LedgerHandle::turn_span_tree`, `LedgerHandle::session_span_trees`.
4. **Node SDK facade** in `packages/sdk-node` — wire the napi binding with a TS type that mirrors `SpanNode` exactly so the MCP server and any future web UI can consume the same shape.
5. **MCP** new tool `burn__turnSpanTree` once the SDK verb stabilizes (separate follow-up).
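
The `ToolResult` pairing in step 2 might look roughly like the following. This is a sketch under assumptions: `ToolUseBlock`, `ToolResultRow`, `PairedToolUse`, and `pair_results` are hypothetical names, and only `tool_use_id` and `is_error` come from the existing data; the real row types live in the reader and ledger modules.

```rust
use std::collections::HashMap;

/// Hypothetical view of an assistant `tool_use` block.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolUseBlock {
    pub tool_use_id: String,
    pub name: String,
}

/// Hypothetical view of a `tool_result_event` row.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolResultRow {
    pub tool_use_id: String,
    pub is_error: bool,
    pub bytes: usize,
}

#[derive(Debug, PartialEq)]
pub struct PairedToolUse {
    pub name: String,
    pub result: Option<ToolResultRow>, // None when no result row was found
}

/// Pair each tool_use with its result by `tool_use_id`.
/// Returns the paired list (in tool_use order) and any orphaned results.
pub fn pair_results(
    uses: &[ToolUseBlock],
    results: Vec<ToolResultRow>,
) -> (Vec<PairedToolUse>, Vec<ToolResultRow>) {
    // If two rows share an id, the later one wins in this sketch.
    let mut by_id: HashMap<String, ToolResultRow> = results
        .into_iter()
        .map(|r| (r.tool_use_id.clone(), r))
        .collect();

    let paired = uses
        .iter()
        .map(|u| PairedToolUse {
            name: u.name.clone(),
            result: by_id.remove(&u.tool_use_id),
        })
        .collect();

    // Anything left never matched a tool_use; sort for deterministic output.
    let mut orphans: Vec<ToolResultRow> = by_id.into_values().collect();
    orphans.sort_by(|a, b| a.tool_use_id.cmp(&b.tool_use_id));
    (paired, orphans)
}
```

Returning orphans explicitly, rather than dropping them, keeps the builder a pure function while still letting callers surface malformed transcripts.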

## Open questions

1. **Caching.** Should the tree be memoized in `burn.sqlite` (new `span_trees` table) or always re-derived from `TurnRecord`? Re-derivation cost on a 10K-turn session needs measurement before deciding. Default position: always derive; cache only if profiling demands it.
2. **Time source.** Some Claude rows are missing wall-clock timestamps. `start_ms` / `end_ms` should fall back to row order from `_rowIndex` so layout still works.
3. **Attribute schema.** Lock the attribute keys (`tokens.input`, `tokens.output`, `tokens.cache_read`, `tokens.cache_write`, `model`, `request_id`, `agent_id`, `cwd`, `mode`, `stop_reason`) in one place so consumers can rely on them.
4. **Status semantics.** OTel uses `0 unset / 1 ok / 2 error`. Map our existing error signals (tool_use `is_error`, refusals, max_tokens) cleanly into this.
5. **Inference grouping** depends on #requestId-dedup — that should land first or be done in the same PR.
6. **Subagent stitching** depends on #subagent-pairing.
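
To ground questions 2 and 4, here is one possible shape for the status mapping and the timestamp fallback. Everything in it is an assumption to be argued over: the `Unset` variant, treating `max_tokens` and `refusal` stop reasons as errors, the literal stop-reason strings, and the carry-forward rule in `resolve_times` are illustrative choices, not decisions.

```rust
/// Status with OTel-style numeric codes (Unset = 0, Ok = 1, Error = 2).
/// `Unset` is an assumption; the proposal's comment lists only Ok and Error.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SpanStatus {
    Unset,
    Ok,
    Error { msg: String },
}

impl SpanStatus {
    pub fn code(&self) -> u8 {
        match self {
            SpanStatus::Unset => 0,
            SpanStatus::Ok => 1,
            SpanStatus::Error { .. } => 2,
        }
    }
}

/// Hypothetical mapping for a ToolResult span from its `is_error` flag.
pub fn tool_status(is_error: Option<bool>) -> SpanStatus {
    match is_error {
        None => SpanStatus::Unset,
        Some(false) => SpanStatus::Ok,
        Some(true) => SpanStatus::Error { msg: "tool_use is_error".to_string() },
    }
}

/// Hypothetical mapping for a Turn root from its stop reason.
/// The strings "max_tokens" and "refusal" are assumed spellings.
pub fn turn_status(stop_reason: Option<&str>) -> SpanStatus {
    match stop_reason {
        None => SpanStatus::Unset,
        Some(r) if r == "max_tokens" || r == "refusal" => {
            SpanStatus::Error { msg: r.to_string() }
        }
        Some(_) => SpanStatus::Ok,
    }
}

/// Resolve a time for each (row_index, timestamp_ms) pair, in row order.
/// A missing timestamp inherits the last known one; rows before any known
/// timestamp fall back to the row index itself, which still sorts first.
pub fn resolve_times(rows: &[(u64, Option<i64>)]) -> Vec<i64> {
    let mut last: Option<i64> = None;
    rows.iter()
        .map(|&(row_index, ts)| match ts.or(last) {
            Some(t) => {
                last = Some(t);
                t
            }
            None => row_index as i64,
        })
        .collect()
}
```

The carry-forward rule keeps resolved times non-decreasing as long as the known timestamps are, so a layout pass can sort on the resolved value and break ties by row order.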

## Acceptance

- [ ] `crates/relayburn-sdk/src/analyze/span_tree.rs` exists with `SpanNode`, `TurnSpanTree`, `SpanKind`, `SpanStatus`, `SpanEvent` types.
- [ ] `LedgerHandle::turn_span_tree` returns a well-formed tree for the cli-golden fixture; depth-first traversal sums match `TurnRecord` scalars within rounding tolerance.
- [ ] Builder for Claude Code groups inferences by `requestId`, pairs `tool_result` by `tool_use_id`, and nests subagent subtrees by `agentId`.
- [ ] Builder for Codex produces a tree of equivalent shape from `rollout-*.jsonl`.
- [ ] Attribute keys documented in module doc comment.
- [ ] Unit tests cover: single-inference turn, multi-inference turn, tool_use with paired result, subagent with no parent tool_use (UnattachedGroup-equivalent), max_tokens turn, error turn.
- [ ] `packages/sdk-node/src/index.d.ts` exports the mirrored TS type.
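
The "traversal sums match `TurnRecord` scalars" criterion could be checked along these lines. This is a sketch only: `TurnScalars` is a hypothetical stand-in for the relevant `TurnRecord` fields, attributes are simplified to integers, and it sums every node carrying the key, which leaves open whether subagent tokens should count toward the parent turn.

```rust
use std::collections::BTreeMap;

/// Trimmed SpanNode with integer-only attributes (the proposal uses AttrValue).
#[derive(Debug, Clone, Default)]
pub struct SpanNode {
    pub name: String,
    pub attributes: BTreeMap<String, i64>,
    pub children: Vec<SpanNode>,
}

/// Depth-first sum of one attribute key over the whole subtree.
pub fn sum_attr(node: &SpanNode, key: &str) -> i64 {
    let own = node.attributes.get(key).copied().unwrap_or(0);
    own + node.children.iter().map(|c| sum_attr(c, key)).sum::<i64>()
}

/// Hypothetical stand-in for the token scalars on `TurnRecord`.
pub struct TurnScalars {
    pub input_tokens: i64,
    pub output_tokens: i64,
}

/// Return one message per attribute whose tree sum disagrees with the record.
pub fn mismatches(root: &SpanNode, expected: &TurnScalars) -> Vec<String> {
    let checks = [
        ("tokens.input", expected.input_tokens),
        ("tokens.output", expected.output_tokens),
    ];
    checks
        .iter()
        .filter_map(|&(key, want)| {
            let got = sum_attr(root, key);
            (got != want).then(|| format!("{key}: tree={got} record={want}"))
        })
        .collect()
}

fn inference(input: i64, output: i64) -> SpanNode {
    let mut attributes = BTreeMap::new();
    attributes.insert("tokens.input".to_string(), input);
    attributes.insert("tokens.output".to_string(), output);
    SpanNode { name: "inference".to_string(), attributes, children: vec![] }
}

/// Hypothetical two-inference turn: 2700 input tokens, 120 output tokens.
pub fn sample_turn() -> SpanNode {
    SpanNode {
        name: "turn".to_string(),
        attributes: BTreeMap::new(),
        children: vec![inference(1200, 80), inference(1500, 40)],
    }
}
```

Token counts are integers, so this version compares exactly; a tolerance would only be needed for derived float scalars such as cost.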

## References

- agent-profiler: `lib/traces/types.d.ts`, `lib/claude-code/traces.js` (`findTurnRoot`, `sliceTurns`, `buildInferenceSpansForSlice`), `lib/adapters/registry.js` (harness prefix on traceId).
- Current burn data shapes: `crates/relayburn-sdk/src/reader/types.rs` (`TurnRecord`), `crates/relayburn-sdk/src/ledger/schema.rs`.
- Downstream consumers that would migrate to span trees: `crates/relayburn-sdk/src/analyze/hotspots.rs`, `crates/relayburn-sdk/src/analyze/overhead.rs`, `crates/relayburn-sdk/src/analyze/tool_output_bloat.rs`.

## Related

- Inference-flow DAG (consumer of this)
- Context-delta attribution (consumer of this)
- `requestId` dedup (prerequisite for the Inference layer)
- Subagent pairing via `agentId` (prerequisite for Subagent subtrees)

