Context
Burn's analytical units today are flat rows: TurnRecord, tool_result_event, content sidecars. That shape is great for tabular aggregation (summary, hotspots, overhead) but loses the hierarchical structure of a turn: inferences within a turn, tool_uses nested inside an inference, subagent fanouts nested inside a tool_use. Anything that wants to answer "where in this turn did the cost/time/context blow up?" has to re-derive that hierarchy ad hoc at each call site.
Prior art worth stealing: DevonPeroutky/agent-profiler models every turn as an OTel-style span tree, and projects scalars off the tree rather than the other way around:
// agent-profiler: lib/traces/types.d.ts
interface SpanNode {
  startMs: number;
  endMs: number;
  durationMs: number;
  name: string;
  kind: 'turn' | 'inference' | 'tool_use' | 'subagent' | 'skill' | ...;
  status: { code: 0 | 1 | 2; message?: string };
  attributes: Record<string, unknown>;
  events: Array<{ ts: number; name: string; attributes: Record<string, unknown> }>;
  children: SpanNode[];
}

interface Turn {
  traceId: string;
  sessionId: string;
  turnNumber: number;
  startMs: number;
  endMs: number;
  durationMs: number;
  userPrompt: string;
  // ...scalar tokens, toolCount, errorCount, finalMode, cwd, model...
  // "top-level scalar fields are a pure projection of values already on
  // root.attributes / root.events", i.e. the tree is canonical.
  root: SpanNode;
}

type TraceSummary = Turn | UnattachedGroup;
Construction lives in lib/claude-code/traces.js (Claude side) and lib/codex/traces.js (Codex side); the shape is harness-neutral by contract.
Proposal
Add a span-tree projection to relayburn-sdk as a derived view over existing ledger data. No schema migration; built on-demand from TurnRecord + tool_result_event + content sidecars.
Span-kind hierarchy (Claude Code)
Turn (root)
├── UserPrompt
├── Inference                 <- one per requestId (#issue requestId-dedup)
│   ├── ToolUse (Bash)
│   │   └── ToolResult        (status, byte size, attached content)
│   ├── ToolUse (Task)        <- subagent dispatch
│   │   └── Subagent          <- nested span tree from agent-*.jsonl
│   │       └── ...
│   └── ToolUse (Skill)       <- synthesized from slash-command triad
└── Inference                 <- if turn produced multiple API calls
    └── ...
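To make the hierarchy above concrete, here is a minimal Rust sketch of what a node type could look like and how the example tree might be assembled. All names here (SpanKind variants, field names, sample_turn, the timings) are assumptions for illustration, not the final span_tree.rs API; status and events are left out to keep it short.

```rust
use std::collections::BTreeMap;

/// Hypothetical span kinds, following the hierarchy diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SpanKind {
    Turn,
    UserPrompt,
    Inference,
    ToolUse,
    ToolResult,
    Subagent,
}

/// Hypothetical node shape; `attributes` is a plain string map in this sketch.
#[derive(Debug, Clone)]
pub struct SpanNode {
    pub name: String,
    pub kind: SpanKind,
    pub start_ms: u64,
    pub end_ms: u64,
    pub attributes: BTreeMap<String, String>,
    pub children: Vec<SpanNode>,
}

impl SpanNode {
    pub fn new(name: &str, kind: SpanKind, start_ms: u64, end_ms: u64) -> Self {
        SpanNode {
            name: name.to_string(),
            kind,
            start_ms,
            end_ms,
            attributes: BTreeMap::new(),
            children: Vec::new(),
        }
    }

    /// Builder-style helper: attach a child and return the parent.
    pub fn with_child(mut self, child: SpanNode) -> Self {
        self.children.push(child);
        self
    }

    /// Duration is derived from the endpoints rather than stored.
    pub fn duration_ms(&self) -> u64 {
        self.end_ms.saturating_sub(self.start_ms)
    }

    /// Depth-first count of spans of a given kind in this subtree.
    pub fn count_kind(&self, kind: SpanKind) -> usize {
        let own = usize::from(self.kind == kind);
        own + self.children.iter().map(|c| c.count_kind(kind)).sum::<usize>()
    }

    /// Number of levels in this subtree (a leaf has depth 1).
    pub fn depth(&self) -> usize {
        1 + self.children.iter().map(|c| c.depth()).max().unwrap_or(0)
    }
}

/// Builds a tree shaped like the diagram, with made-up timings.
pub fn sample_turn() -> SpanNode {
    let bash = SpanNode::new("Bash", SpanKind::ToolUse, 10, 40)
        .with_child(SpanNode::new("Bash result", SpanKind::ToolResult, 40, 40));
    let task = SpanNode::new("Task", SpanKind::ToolUse, 40, 90)
        .with_child(SpanNode::new("agent-1", SpanKind::Subagent, 45, 88));
    let first = SpanNode::new("req_a", SpanKind::Inference, 5, 90)
        .with_child(bash)
        .with_child(task);
    SpanNode::new("turn 1", SpanKind::Turn, 0, 120)
        .with_child(SpanNode::new("prompt", SpanKind::UserPrompt, 0, 0))
        .with_child(first)
        .with_child(SpanNode::new("req_b", SpanKind::Inference, 95, 120))
}
```

Storing only the endpoints and deriving the duration keeps the two from drifting apart, at the cost of diverging slightly from the agent-profiler shape, which carries durationMs as a field.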
Why this is the foundation, not the feature
Multiple in-flight ideas collapse into "project off the span tree":
Inference-flow DAG (#flow-dag) is a layout pass over Vec<TurnSpanTree>.
Context-delta attribution (#context-delta) walks consecutive Inference nodes and credits the intervening ToolResult / UserPrompt children to the delta.
Hotspots can attribute against any span level (per-tool, per-subagent, per-inference), not just tool_use_id.
Overhead trim can show which Inference first paid the cached-prefix cost.
stop_reason outcome (#turn-outcome) is an event on the Turn root.
UnattachedGroup (#subagent-pairing) is a top-level Subagent span without a parent ToolUse.
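As one illustration of "project off the span tree", the sketch below rolls up cost at a chosen span level. It is not the existing hotspots implementation: the own_tokens field, the span helper, and hotspots_at are hypothetical names, and the choice to stop descending at the first matching span (so rows never overlap) is an assumption that a real implementation might reverse.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SpanKind {
    Inference,
    ToolUse,
    Subagent,
}

pub struct SpanNode {
    pub name: String,
    pub kind: SpanKind,
    /// Hypothetical: tokens attributed to this span itself, excluding children.
    pub own_tokens: u64,
    pub children: Vec<SpanNode>,
}

pub fn span(name: &str, kind: SpanKind, own_tokens: u64, children: Vec<SpanNode>) -> SpanNode {
    SpanNode { name: name.to_string(), kind, own_tokens, children }
}

/// Total tokens for a span plus everything nested under it.
pub fn subtree_tokens(node: &SpanNode) -> u64 {
    node.own_tokens + node.children.iter().map(subtree_tokens).sum::<u64>()
}

/// Attributes subtree cost to every outermost span of the requested kind,
/// keyed by span name.
pub fn hotspots_at(root: &SpanNode, level: SpanKind) -> BTreeMap<String, u64> {
    let mut out = BTreeMap::new();
    collect(root, level, &mut out);
    out
}

fn collect(node: &SpanNode, level: SpanKind, out: &mut BTreeMap<String, u64>) {
    if node.kind == level {
        *out.entry(node.name.clone()).or_insert(0) += subtree_tokens(node);
        // Assumption: do not descend further, so nested spans of the same
        // kind are folded into their outermost ancestor.
        return;
    }
    for child in &node.children {
        collect(child, level, out);
    }
}

/// One inference with a Bash call and a Task call that fans out to a subagent.
pub fn sample_inference() -> SpanNode {
    let read = span("Read", SpanKind::ToolUse, 50, vec![]);
    let agent = span("agent-1", SpanKind::Subagent, 400, vec![read]);
    let task = span("Task", SpanKind::ToolUse, 5, vec![agent]);
    let bash = span("Bash", SpanKind::ToolUse, 30, vec![]);
    span("req_a", SpanKind::Inference, 100, vec![bash, task])
}
```

With the sample tree, asking for SpanKind::ToolUse charges the subagent's work to the Task call that dispatched it, while asking for SpanKind::Subagent isolates agent-1 on its own; the same walk serves both views.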
Implementation sketch
Pure projection module crates/relayburn-sdk/src/analyze/span_tree.rs.
Inputs: TurnRecord, child tool_result_event rows for the turn, optional content sidecar reads, optional subagent transcripts.
Output: TurnSpanTree. Pure function, no DB writes.
Harness-specific builders under crates/relayburn-sdk/src/reader/.
reader/claude/span_tree.rs: assembles Inference nodes per requestId, nests ToolUse from assistant tool_use blocks, pairs ToolResult by tool_use_id, stitches subagent subtrees via toolUseResult.agentId (cf. agent-profiler lib/claude-code/traces.js buildInferenceSpansForSlice and subagent pairing).
reader/codex/span_tree.rs: analogous, using Codex's rollout structure.
SDK verbs LedgerHandle::turn_span_tree, LedgerHandle::session_span_trees.
Node SDK facade in packages/sdk-node: wire the napi binding with a TS type that mirrors SpanNode exactly so the MCP server and any future web UI can consume the same shape.
MCP: new tool burn__turnSpanTree once the SDK verb stabilizes (separate follow-up).
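The Claude-side builder steps (group by requestId, pair results by tool_use_id) can be sketched roughly as follows. The row and span structs are hypothetical stand-ins for whatever the reader actually yields, and subagent stitching via agentId is deliberately left out.

```rust
use std::collections::HashMap;

/// Hypothetical flattened inputs; the real reader rows will differ.
pub struct ToolUseRow {
    pub request_id: String,
    pub tool_use_id: String,
    pub tool_name: String,
}

pub struct ToolResultRow {
    pub tool_use_id: String,
    pub is_error: bool,
    pub bytes: usize,
}

#[derive(Debug, PartialEq)]
pub struct ToolResultSpan {
    pub is_error: bool,
    pub bytes: usize,
}

#[derive(Debug, PartialEq)]
pub struct ToolUseSpan {
    pub tool_use_id: String,
    pub tool_name: String,
    /// None when no result row was found for this tool_use_id.
    pub result: Option<ToolResultSpan>,
}

#[derive(Debug, PartialEq)]
pub struct InferenceSpan {
    pub request_id: String,
    pub tool_uses: Vec<ToolUseSpan>,
}

/// Groups tool uses into one inference per request_id (first-seen order)
/// and pairs each tool use with its result by tool_use_id.
pub fn build_inferences(uses: Vec<ToolUseRow>, results: Vec<ToolResultRow>) -> Vec<InferenceSpan> {
    // Index results once so pairing is a map lookup per tool use.
    let mut by_id: HashMap<String, ToolResultRow> = results
        .into_iter()
        .map(|r| (r.tool_use_id.clone(), r))
        .collect();

    let mut out: Vec<InferenceSpan> = Vec::new();
    for row in uses {
        let ToolUseRow { request_id, tool_use_id, tool_name } = row;
        let result = by_id
            .remove(&tool_use_id)
            .map(|r| ToolResultSpan { is_error: r.is_error, bytes: r.bytes });
        let span = ToolUseSpan { tool_use_id, tool_name, result };

        // Linear scan keeps first-seen ordering; fine for per-turn sizes.
        let existing = out.iter().position(|inf| inf.request_id == request_id);
        match existing {
            Some(i) => out[i].tool_uses.push(span),
            None => out.push(InferenceSpan { request_id, tool_uses: vec![span] }),
        }
    }
    out
}
```

A tool use whose result never arrives keeps result set to None rather than being dropped, so unpaired calls stay visible in the tree.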
Open questions
Caching. Should the tree be memoized in burn.sqlite (new span_trees table) or always re-derived from TurnRecord? Re-derivation cost on a 10K-turn session needs measurement before deciding. Default position: always derive; cache only if profiling demands it.
Time source. Some Claude rows are missing wall-clock timestamps. start_ms / end_ms should fall back to row order from _rowIndex so layout still works.
Attribute schema. Lock the attribute keys (tokens.input, tokens.output, tokens.cache_read, tokens.cache_write, model, request_id, agent_id, cwd, mode, stop_reason) in one place so consumers can rely on them.
Status semantics. OTel (OTLP) uses 0 unset / 1 ok / 2 error. Map our existing error signals (tool_use is_error, refusals, max_tokens) cleanly into this.
Inference grouping depends on #requestId-dedup — that should land first or be done in the same PR.
Subagent stitching depends on #subagent-pairing.
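Two of the open questions above, the time-source fallback and the status mapping, are small enough to sketch. Everything here is an assumption rather than a decision: the Row and Signals structs are hypothetical, synthetic timestamps advance by 1 ms per untimed row, and treating max_tokens as an error is only one possible reading.

```rust
/// OTLP status codes: 0 = unset, 1 = ok, 2 = error.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SpanStatus {
    Unset = 0,
    Ok = 1,
    Error = 2,
}

/// Hypothetical bundle of the error signals named in the open question.
pub struct Signals {
    pub tool_is_error: bool,
    pub refused: bool,
    pub stop_reason: Option<String>,
}

pub fn map_status(s: &Signals) -> SpanStatus {
    if s.tool_is_error || s.refused {
        return SpanStatus::Error;
    }
    match s.stop_reason.as_deref() {
        // Assumption: a truncated response is reported as an error.
        Some("max_tokens") => SpanStatus::Error,
        Some(_) => SpanStatus::Ok,
        // No stop reason recorded: nothing to assert either way.
        None => SpanStatus::Unset,
    }
}

/// Hypothetical row: the wall-clock timestamp may be missing.
pub struct Row {
    pub row_index: u64,
    pub timestamp_ms: Option<u64>,
}

/// Returns (row_index, start_ms) pairs in row order. Rows without a
/// timestamp get the previous start plus 1 ms, and real timestamps are
/// clamped so the sequence never goes backwards.
pub fn resolve_start_ms(rows: &[Row]) -> Vec<(u64, u64)> {
    let mut ordered: Vec<&Row> = rows.iter().collect();
    ordered.sort_by_key(|r| r.row_index);

    let mut prev: u64 = 0;
    let mut out = Vec::with_capacity(ordered.len());
    for r in ordered {
        let start = match r.timestamp_ms {
            Some(ts) => ts.max(prev),
            None => prev + 1,
        };
        out.push((r.row_index, start));
        prev = start;
    }
    out
}
```

The synthetic timestamps only preserve ordering for layout; durations computed across untimed rows are not meaningful and would probably need to be flagged as such on the span.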
Acceptance
crates/relayburn-sdk/src/analyze/span_tree.rs exists with SpanNode, TurnSpanTree, SpanKind, SpanStatus, SpanEvent types.
LedgerHandle::turn_span_tree returns a well-formed tree for the cli-golden fixture; depth-first traversal sums match TurnRecord scalars within rounding tolerance.
Builder for Claude Code groups inferences by request_id, pairs tool_result by tool_use_id, and nests subagent subtrees by agentId.
Builder for Codex produces a tree of equivalent shape from rollout-*.jsonl.
Attribute keys documented in module doc comment.
Unit tests cover: single-inference turn, multi-inference turn, tool_use with paired result, tool_use with unpaired subagent (UnattachedGroup-equivalent), max_tokens turn, error turn.
packages/sdk-node/src/index.d.ts exports the mirrored TS type.
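The "depth-first traversal sums match TurnRecord scalars" criterion could be checked along these lines. This is a sketch under assumptions: the TurnRecord fields shown are hypothetical, only input and output tokens are compared, and it assumes token counts live on Inference nodes only so nothing is counted twice.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct TokenCounts {
    pub input: u64,
    pub output: u64,
}

/// Minimal node for this check: optional token counts plus children.
pub struct SpanNode {
    pub tokens: Option<TokenCounts>,
    pub children: Vec<SpanNode>,
}

/// Hypothetical subset of TurnRecord scalars.
pub struct TurnRecord {
    pub input_tokens: u64,
    pub output_tokens: u64,
}

/// Depth-first sum of token counts over the whole subtree.
pub fn sum_tokens(node: &SpanNode) -> TokenCounts {
    let mut total = node.tokens.unwrap_or_default();
    for child in &node.children {
        let sub = sum_tokens(child);
        total.input += sub.input;
        total.output += sub.output;
    }
    total
}

fn within(a: u64, b: u64, tolerance: u64) -> bool {
    let diff = if a > b { a - b } else { b - a };
    diff <= tolerance
}

/// True when the tree's sums agree with the flat record within `tolerance`.
pub fn matches_record(root: &SpanNode, record: &TurnRecord, tolerance: u64) -> bool {
    let t = sum_tokens(root);
    within(t.input, record.input_tokens, tolerance)
        && within(t.output, record.output_tokens, tolerance)
}

pub fn inference(input: u64, output: u64) -> SpanNode {
    SpanNode { tokens: Some(TokenCounts { input, output }), children: Vec::new() }
}

/// A turn root with two inferences and no tokens of its own.
pub fn sample_turn() -> SpanNode {
    SpanNode { tokens: None, children: vec![inference(100, 20), inference(250, 40)] }
}
```

Whether subagent tokens are included in the TurnRecord scalars is not stated here, so a real test would need to decide whether Subagent subtrees participate in the sum.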
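References
agent-profiler: lib/traces/types.d.ts, lib/claude-code/traces.js (findTurnRoot, sliceTurns, buildInferenceSpansForSlice), lib/adapters/registry.js (harness prefix on traceId).
Current burn data shapes: crates/relayburn-sdk/src/reader/types.rs (TurnRecord), crates/relayburn-sdk/src/ledger/schema.rs.
Downstream consumers that would migrate to span trees: crates/relayburn-sdk/src/analyze/hotspots.rs, crates/relayburn-sdk/src/analyze/overhead.rs, crates/relayburn-sdk/src/analyze/tool_output_bloat.rs.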
Related
Inference-flow DAG (consumer of this)
Context-delta attribution (consumer of this)
requestId dedup (prerequisite for the Inference layer)
Subagent pairing via agentId (prerequisite for Subagent subtrees)