sdk: introduce per-turn span tree as analytical primitive #430

@willwashburn

Context

Burn's analytical units today are flat rows: TurnRecord, tool_result_event, content sidecars. That shape is great for tabular aggregation (summary, hotspots, overhead) but loses the hierarchical structure of a turn — inferences within a turn, tool_uses nested inside an inference, subagent fanouts nested inside a tool_use. Anything that wants to answer "where in this turn did the cost/time/context blow up?" has to re-derive that hierarchy ad-hoc per call site.

Prior art worth stealing: DevonPeroutky/agent-profiler models every turn as an OTel-style span tree, and projects scalars off the tree rather than the other way around:

// agent-profiler: lib/traces/types.d.ts
interface SpanNode {
  startMs: number;
  endMs: number;
  durationMs: number;
  name: string;
  kind: 'turn' | 'inference' | 'tool_use' | 'subagent' | 'skill' | ...;
  status: { code: 0 | 1 | 2; message?: string };
  attributes: Record<string, unknown>;
  events: Array<{ ts: number; name: string; attributes: Record<string, unknown> }>;
  children: SpanNode[];
}

interface Turn {
  traceId: string;
  sessionId: string;
  turnNumber: number;
  startMs: number; endMs: number; durationMs: number;
  userPrompt: string;
  // ...scalar tokens, toolCount, errorCount, finalMode, cwd, model...
  // "top-level scalar fields are a pure projection of values already on
  //  root.attributes / root.events" — i.e. the tree is canonical.
  root: SpanNode;
}

type TraceSummary = Turn | UnattachedGroup;

Construction lives in lib/claude-code/traces.js (Claude side) and lib/codex/traces.js (Codex side); the shape is harness-neutral by contract.

Proposal

Add a span-tree projection to relayburn-sdk as a derived view over existing ledger data. No schema migration; built on-demand from TurnRecord + tool_result_event + content sidecars.

// crates/relayburn-sdk/src/analyze/span_tree.rs (new)

pub enum SpanKind {
    Turn, Inference, ToolUse, Subagent, Skill, UserPrompt, ToolResult,
}

pub struct SpanNode {
    pub kind: SpanKind,
    pub name: String,
    pub start_ms: i64,
    pub end_ms: i64,
    pub status: SpanStatus,                     // Ok | Error { msg }
    pub attributes: BTreeMap<String, AttrValue>, // tokens, model, request_id, agent_id, ...
    pub events: Vec<SpanEvent>,
    pub children: Vec<SpanNode>,
}

pub struct TurnSpanTree {
    pub session_id: String,
    pub turn_id: String,
    pub turn_number: u32,
    pub root: SpanNode,
}

impl LedgerHandle {
    // Signatures only; bodies are left as todo!() in this sketch.
    pub fn turn_span_tree(&self, session_id: &str, turn_id: &str) -> Result<TurnSpanTree> { todo!() }
    pub fn session_span_trees(&self, session_id: &str) -> Result<Vec<TurnSpanTree>> { todo!() }
}
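
To make "the tree is canonical, scalars are projections" concrete, here is a minimal sketch of one projection: a depth-first sum of a numeric attribute. It uses a trimmed-down SpanNode, and several things are assumptions for illustration only: attribute values are plain i64 rather than AttrValue, the builder helpers (new, attr, child) and example_turn are hypothetical, and the token numbers are made up.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SpanKind {
    Turn,
    Inference,
    ToolUse,
}

/// Trimmed-down SpanNode: timing, status, and events are omitted, and
/// attribute values are plain i64 (an assumption standing in for AttrValue).
#[derive(Debug, Clone)]
pub struct SpanNode {
    pub kind: SpanKind,
    pub name: String,
    pub attributes: BTreeMap<String, i64>,
    pub children: Vec<SpanNode>,
}

impl SpanNode {
    /// Hypothetical builder helpers, only here to keep the example short.
    pub fn new(kind: SpanKind, name: &str) -> Self {
        SpanNode {
            kind,
            name: name.to_string(),
            attributes: BTreeMap::new(),
            children: Vec::new(),
        }
    }

    pub fn attr(mut self, key: &str, value: i64) -> Self {
        self.attributes.insert(key.to_string(), value);
        self
    }

    pub fn child(mut self, node: SpanNode) -> Self {
        self.children.push(node);
        self
    }
}

/// Depth-first sum of one numeric attribute over a subtree.
/// Nodes that do not carry the key contribute zero.
pub fn sum_attr(node: &SpanNode, key: &str) -> i64 {
    let own = node.attributes.get(key).copied().unwrap_or(0);
    own + node.children.iter().map(|c| sum_attr(c, key)).sum::<i64>()
}

/// Made-up two-inference turn used to exercise the projection.
pub fn example_turn() -> SpanNode {
    SpanNode::new(SpanKind::Turn, "turn-1")
        .child(
            SpanNode::new(SpanKind::Inference, "req-a")
                .attr("tokens.input", 1200)
                .attr("tokens.output", 80)
                .child(SpanNode::new(SpanKind::ToolUse, "Bash")),
        )
        .child(
            SpanNode::new(SpanKind::Inference, "req-b")
                .attr("tokens.input", 1500)
                .attr("tokens.output", 40),
        )
}
```

Calling sum_attr(&example_turn(), "tokens.input") adds the two Inference values; the same function applied to a single Inference or ToolUse subtree gives the per-level number, which is the property hotspots and overhead would lean on.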

Span-kind hierarchy (Claude Code)

Turn (root)
├── UserPrompt
├── Inference                       <- one per requestId (#requestId-dedup)
│   ├── ToolUse (Bash)
│   │   └── ToolResult (status, byte size, attached content)
│   ├── ToolUse (Task)              <- subagent dispatch
│   │   └── Subagent                <- nested span tree from agent-*.jsonl
│   │       └── ...
│   └── ToolUse (Skill)             <- synthesized from slash-command triad
└── Inference                       <- if turn produced multiple API calls
    └── ...
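
For fixtures and debugging it may help to render a tree back into an outline like the one above. The following is a rough sketch under assumptions: the SpanNode here keeps only kind, name, and children, and the span and example_turn helpers are hypothetical. It prints one node per line with two spaces of indentation per depth level rather than box-drawing characters.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SpanKind {
    Turn,
    UserPrompt,
    Inference,
    ToolUse,
    ToolResult,
}

/// Reduced SpanNode: only the fields the outline needs.
#[derive(Debug, Clone)]
pub struct SpanNode {
    pub kind: SpanKind,
    pub name: String,
    pub children: Vec<SpanNode>,
}

/// Hypothetical constructor helper for the example.
pub fn span(kind: SpanKind, name: &str, children: Vec<SpanNode>) -> SpanNode {
    SpanNode { kind, name: name.to_string(), children }
}

/// Render the tree depth-first, one line per node.
pub fn outline(root: &SpanNode) -> String {
    let mut out = String::new();
    write_outline(root, 0, &mut out);
    out
}

fn write_outline(node: &SpanNode, depth: usize, out: &mut String) {
    out.push_str(&"  ".repeat(depth));
    out.push_str(&format!("{:?}", node.kind));
    // Unnamed spans print the kind only.
    if !node.name.is_empty() {
        out.push_str(&format!(" ({})", node.name));
    }
    out.push('\n');
    for child in &node.children {
        write_outline(child, depth + 1, out);
    }
}

/// Made-up turn: one prompt, one inference, one Bash call with its result.
pub fn example_turn() -> SpanNode {
    span(SpanKind::Turn, "", vec![
        span(SpanKind::UserPrompt, "", vec![]),
        span(SpanKind::Inference, "req-a", vec![
            span(SpanKind::ToolUse, "Bash", vec![
                span(SpanKind::ToolResult, "", vec![]),
            ]),
        ]),
    ])
}
```

A golden-file test could compare outline(&tree) against a checked-in text file, which keeps hierarchy regressions readable in diffs.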

Why this is the foundation, not the feature

Multiple in-flight ideas collapse into "project off the span tree":

  • Inference-flow DAG (#flow-dag) is a layout pass over Vec<TurnSpanTree>.
  • Context-delta attribution (#context-delta) walks consecutive Inference nodes and credits the intervening ToolResult / UserPrompt children to the delta.
  • Hotspots can attribute against any span level (per-tool, per-subagent, per-inference), not just tool_use_id.
  • Overhead trim can show which Inference first paid the cached-prefix cost.
  • stop_reason outcome (#turn-outcome) is an event on the Turn root.
  • UnattachedGroup (#subagent-pairing) is a top-level Subagent span without a parent ToolUse.
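
As an illustration of "project off the span tree", here is a hedged sketch of the context-delta walk. It is not the #context-delta design. It assumes tokens.input is a usable proxy for context size, it only credits ToolResult spans nested under the earlier Inference (UserPrompt crediting across turns is left out), and the ContextDelta struct, the span helper, and example_turn are all hypothetical.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SpanKind {
    Turn,
    Inference,
    ToolUse,
    ToolResult,
}

#[derive(Debug, Clone)]
pub struct SpanNode {
    pub kind: SpanKind,
    pub name: String,
    pub attributes: BTreeMap<String, i64>, // assumption: numeric attributes only
    pub children: Vec<SpanNode>,
}

/// Hypothetical helper: build a node, optionally tagging tokens.input.
pub fn span(kind: SpanKind, name: &str, tokens: Option<i64>, children: Vec<SpanNode>) -> SpanNode {
    let mut attributes = BTreeMap::new();
    if let Some(t) = tokens {
        attributes.insert("tokens.input".to_string(), t);
    }
    SpanNode { kind, name: name.to_string(), attributes, children }
}

/// Hypothetical output row: growth between two consecutive inferences.
#[derive(Debug, PartialEq)]
pub struct ContextDelta {
    pub from: String,
    pub to: String,
    pub delta_tokens: i64,
    pub credited: Vec<String>, // names of ToolResult spans under `from`
}

fn context_size(node: &SpanNode) -> i64 {
    node.attributes.get("tokens.input").copied().unwrap_or(0)
}

fn collect_tool_results(node: &SpanNode, out: &mut Vec<String>) {
    for child in &node.children {
        if child.kind == SpanKind::ToolResult {
            out.push(child.name.clone());
        }
        collect_tool_results(child, out);
    }
}

/// Walk consecutive Inference children of a turn and credit the earlier
/// inference's tool results with the growth seen by the later one.
pub fn context_deltas(turn: &SpanNode) -> Vec<ContextDelta> {
    let inferences: Vec<&SpanNode> = turn
        .children
        .iter()
        .filter(|c| c.kind == SpanKind::Inference)
        .collect();
    inferences
        .windows(2)
        .map(|pair| {
            let (prev, next) = (pair[0], pair[1]);
            let mut credited = Vec::new();
            collect_tool_results(prev, &mut credited);
            ContextDelta {
                from: prev.name.clone(),
                to: next.name.clone(),
                delta_tokens: context_size(next) - context_size(prev),
                credited,
            }
        })
        .collect()
}

/// Made-up turn with three inferences and two tool calls.
pub fn example_turn() -> SpanNode {
    let call = |tool: &str, result: &str| {
        span(SpanKind::ToolUse, tool, None, vec![span(SpanKind::ToolResult, result, None, vec![])])
    };
    span(SpanKind::Turn, "turn-1", None, vec![
        span(SpanKind::Inference, "req-a", Some(1000), vec![call("Bash", "bash-output")]),
        span(SpanKind::Inference, "req-b", Some(1800), vec![call("Read", "read-output")]),
        span(SpanKind::Inference, "req-c", Some(2100), vec![]),
    ])
}
```

The point is less the exact attribution rule than that the consumer needs no joins: everything it touches is already a child or attribute of the turn root.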

Implementation sketch

  1. Pure projection module crates/relayburn-sdk/src/analyze/span_tree.rs.
    • Inputs: TurnRecord, child tool_result_event rows for the turn, optional content sidecar reads, optional subagent transcripts.
    • Output: TurnSpanTree. Pure function — no DB writes.
  2. Harness-specific builders under crates/relayburn-sdk/src/reader/.
    • reader/claude/span_tree.rs: assembles Inference nodes per requestId, nests ToolUse from assistant tool_use blocks, pairs ToolResult by tool_use_id, stitches subagent subtrees via toolUseResult.agentId (cf. agent-profiler lib/claude-code/traces.js buildInferenceSpansForSlice and subagent pairing).
    • reader/codex/span_tree.rs: analogous, using Codex's rollout structure.
  3. SDK verbs LedgerHandle::turn_span_tree, LedgerHandle::session_span_trees.
  4. Node SDK facade in packages/sdk-node — wire the napi binding with a TS type that mirrors SpanNode exactly so the MCP server and any future web UI can consume the same shape.
  5. New MCP tool burn__turnSpanTree once the SDK verb stabilizes (separate follow-up).
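
The core of step 2 for Claude (group by requestId, pair results by tool_use_id) could look roughly like the sketch below. The row and span types are hypothetical stand-ins, not the real TurnRecord or tool_result_event shapes, and as a simplification only rows that carry a tool_use create an inference, so inferences without tool calls are not represented here.

```rust
use std::collections::HashMap;

/// Hypothetical flattened inputs for the example.
#[derive(Debug, Clone)]
pub struct ToolUseRow {
    pub request_id: String,
    pub tool_use_id: String,
    pub tool_name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ToolResultRow {
    pub tool_use_id: String,
    pub is_error: bool,
}

/// Hypothetical assembled output.
#[derive(Debug, PartialEq)]
pub struct ToolUseSpan {
    pub tool_use_id: String,
    pub tool_name: String,
    pub result: Option<ToolResultRow>, // None when no result row was found
}

#[derive(Debug, PartialEq)]
pub struct InferenceSpan {
    pub request_id: String,
    pub tool_uses: Vec<ToolUseSpan>,
}

pub fn tool_use(request_id: &str, tool_use_id: &str, tool_name: &str) -> ToolUseRow {
    ToolUseRow {
        request_id: request_id.to_string(),
        tool_use_id: tool_use_id.to_string(),
        tool_name: tool_name.to_string(),
    }
}

/// Group tool uses into one inference per request id (first-seen order)
/// and attach each result to its tool use by tool_use_id.
pub fn build_inferences(tool_uses: &[ToolUseRow], results: &[ToolResultRow]) -> Vec<InferenceSpan> {
    // Index results once so pairing is a map lookup per tool use.
    let by_id: HashMap<&str, &ToolResultRow> =
        results.iter().map(|r| (r.tool_use_id.as_str(), r)).collect();

    let mut inferences: Vec<InferenceSpan> = Vec::new();
    let mut index_of: HashMap<&str, usize> = HashMap::new();

    for row in tool_uses {
        // Reuse the inference for a known request id, otherwise append one.
        let idx = *index_of.entry(row.request_id.as_str()).or_insert_with(|| {
            inferences.push(InferenceSpan {
                request_id: row.request_id.clone(),
                tool_uses: Vec::new(),
            });
            inferences.len() - 1
        });
        inferences[idx].tool_uses.push(ToolUseSpan {
            tool_use_id: row.tool_use_id.clone(),
            tool_name: row.tool_name.clone(),
            result: by_id.get(row.tool_use_id.as_str()).copied().cloned(),
        });
    }
    inferences
}
```

Keeping unpaired tool uses as result: None, rather than dropping them, leaves room for the subagent-stitching and UnattachedGroup cases to be handled by a later pass.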

Open questions

  1. Caching. Should the tree be memoized in burn.sqlite (new span_trees table) or always re-derived from TurnRecord? Re-derivation cost on a 10K-turn session needs measurement before deciding. Default position: always derive; cache only if profiling demands it.
  2. Time source. Some Claude rows are missing wall-clock timestamps. start_ms / end_ms should fall back to row order from _rowIndex so layout still works.
  3. Attribute schema. Lock the attribute keys (tokens.input, tokens.output, tokens.cache_read, tokens.cache_write, model, request_id, agent_id, cwd, mode, stop_reason) in one place so consumers can rely on them.
  4. Status semantics. OTel uses 0 unset / 1 ok / 2 error. Map our existing error signals (tool_use is_error, refusals, max_tokens) cleanly into this.
  5. Inference grouping depends on #requestId-dedup — that should land first or be done in the same PR.
  6. Subagent stitching depends on #subagent-pairing.
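
To give question 4 something concrete to argue about, here is one possible mapping, offered as a sketch rather than a decision. The numeric codes follow OTel (Unset = 0, Ok = 1, Error = 2). Everything else is an assumption: the map_status signature, the stop-reason strings "refusal" and "max_tokens", and in particular the choice to treat max_tokens as an error, which is exactly the open point.

```rust
/// OTel-compatible status codes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum StatusCode {
    Unset = 0,
    Ok = 1,
    Error = 2,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SpanStatus {
    pub code: StatusCode,
    pub message: Option<String>,
}

/// Hypothetical mapping from burn's existing signals to a span status.
/// `is_error` is the tool_use is_error flag; `stop_reason` is the raw
/// stop reason string when the span has one.
pub fn map_status(is_error: bool, stop_reason: Option<&str>) -> SpanStatus {
    // An explicit tool error wins over whatever the stop reason says.
    if is_error {
        return SpanStatus {
            code: StatusCode::Error,
            message: Some("tool_use is_error".to_string()),
        };
    }
    match stop_reason {
        // Assumption: refusals and truncation both count as errors.
        Some(reason @ ("refusal" | "max_tokens")) => SpanStatus {
            code: StatusCode::Error,
            message: Some(reason.to_string()),
        },
        // Any other recorded stop reason is treated as a clean finish.
        Some(_) => SpanStatus { code: StatusCode::Ok, message: None },
        // No signal at all: leave the status unset rather than claim success.
        None => SpanStatus { code: StatusCode::Unset, message: None },
    }
}
```

If max_tokens should instead be Ok with a stop_reason attribute, only the first match arm changes, so the choice stays local to this one function.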

Acceptance

  • crates/relayburn-sdk/src/analyze/span_tree.rs exists with SpanNode, TurnSpanTree, SpanKind, SpanStatus, SpanEvent types.
  • LedgerHandle::turn_span_tree returns a well-formed tree for the cli-golden fixture; depth-first traversal sums match TurnRecord scalars within rounding tolerance.
  • Builder for Claude Code groups inferences by requestId, pairs tool_result by tool_use_id, and nests subagent subtrees by agentId.
  • Builder for Codex produces a tree of equivalent shape from rollout-*.jsonl.
  • Attribute keys documented in module doc comment.
  • Unit tests cover: single-inference turn, multi-inference turn, tool_use with paired result, subagent with no paired tool_use (UnattachedGroup-equivalent), max_tokens turn, error turn.
  • packages/sdk-node/src/index.d.ts exports the mirrored TS type.

References

  • agent-profiler: lib/traces/types.d.ts, lib/claude-code/traces.js (findTurnRoot, sliceTurns, buildInferenceSpansForSlice), lib/adapters/registry.js (harness prefix on traceId).
  • Current burn data shapes: crates/relayburn-sdk/src/reader/types.rs (TurnRecord), crates/relayburn-sdk/src/ledger/schema.rs.
  • Downstream consumers that would migrate to span trees: crates/relayburn-sdk/src/analyze/hotspots.rs, crates/relayburn-sdk/src/analyze/overhead.rs, crates/relayburn-sdk/src/analyze/tool_output_bloat.rs.

Related

  • Inference-flow DAG (consumer of this)
  • Context-delta attribution (consumer of this)
  • requestId dedup (prerequisite for the Inference layer)
  • Subagent pairing via agentId (prerequisite for Subagent subtrees)

Labels: enhancement (New feature or request)
