sdk: per-turn span tree as analytical primitive (#430) #451
Project the per-turn hierarchy that flat row records can't express:
Turn -> { UserPrompt, Inference -> ToolUse -> { ToolResult, Subagent } }.
Pure projection over TurnRecord + tool_result_event rows + Claude
subagent sidecars - no schema change, no caching, derive on every call.
- analyze/span_tree.rs: SpanKind / SpanStatus / AttrValue / SpanEvent /
SpanNode / TurnSpanTree types with locked-in attribute schema
(tokens.*, model, request_id, agent_id, tool_use_id, stop_reason)
documented in the module preamble. Kebab-case wire form matches the
existing repo convention.
- reader/claude/span_tree.rs: harness builder that consumes the #448
Inference aggregates (falls back to a synthetic single-inference for
pre-v5 ledgers), pairs tool_result events by tool_use_id, and nests
paired subagents under their Task ToolUse. Unpaired subagents
surface as sibling Subagent spans under the Turn root with
attributes["unattached"] = true.
- reader/codex/span_tree.rs: equivalent builder for Codex rollouts.
Codex carries strictly less hierarchy (no requestId, no sidecar
transcripts, no stop_reason), so the builder is documented as
limited and produces Turn -> { UserPrompt, Inference -> ToolUse }
without fabricating data.
- query_verbs.rs: LedgerHandle::turn_span_tree(session_id, turn_id) and
session_span_trees(session_id) verbs + free-function forms; source
dispatch picks the right per-harness builder.
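The tree shape and the unattached-subagent convention listed above can be illustrated with a small sketch. The types below are simplified stand-ins, not the definitions in analyze/span_tree.rs: the reduced field set, the two-variant AttrValue, and the `node` and `unattached_subagents` helpers are all assumptions made for illustration.

```rust
use std::collections::BTreeMap;

// Simplified stand-ins for the PR's types (assumed shapes); the real
// definitions carry more fields and variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SpanKind {
    Turn,
    UserPrompt,
    Inference,
    ToolUse,
    ToolResult,
    Subagent,
}

#[derive(Debug, Clone, PartialEq)]
enum AttrValue {
    Bool(bool),
    Str(String),
}

#[derive(Debug, Clone)]
struct SpanNode {
    kind: SpanKind,
    attributes: BTreeMap<String, AttrValue>,
    children: Vec<SpanNode>,
}

/// Hypothetical convenience constructor used only by this sketch.
fn node(kind: SpanKind, attrs: &[(&str, AttrValue)], children: Vec<SpanNode>) -> SpanNode {
    SpanNode {
        kind,
        attributes: attrs
            .iter()
            .map(|(k, v)| (k.to_string(), v.clone()))
            .collect(),
        children,
    }
}

/// Direct children of the turn root that are Subagent spans flagged as
/// unattached. `get` is used instead of indexing so that a node without
/// the key is skipped rather than causing a panic.
fn unattached_subagents(root: &SpanNode) -> Vec<&SpanNode> {
    root.children
        .iter()
        .filter(|c| {
            c.kind == SpanKind::Subagent
                && c.attributes.get("unattached") == Some(&AttrValue::Bool(true))
        })
        .collect()
}
```

Paired subagents sit under their ToolUse and are therefore not direct children of the root, so a filter of this kind only returns the orphans.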
Status mapping: tool_use.is_error -> "tool_error" on the ToolUse span,
bubbles to parent Inference and root as "child_error"; stop_reason ==
Refusal -> "refusal" on root; stop_reason == MaxTokens ->
"max_tokens" on root.
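The status mapping above can be sketched as three small functions. This is an illustration under stated assumptions, not the PR's code: `SpanStatus::Error(String)`, the `EndTurn` variant, the reduced `ToolUse` / `Inference` structs, and the choice to let a stop_reason status take precedence over "child_error" on the root are all hypothetical.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum SpanStatus {
    Ok,
    Error(String),
}

// `EndTurn` is an assumed placeholder for "any other stop reason".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StopReason {
    EndTurn,
    Refusal,
    MaxTokens,
}

struct ToolUse {
    is_error: bool,
}

struct Inference {
    tool_uses: Vec<ToolUse>,
}

/// A failing tool use is marked "tool_error" on its own span.
fn tool_use_status(t: &ToolUse) -> SpanStatus {
    if t.is_error {
        SpanStatus::Error("tool_error".to_string())
    } else {
        SpanStatus::Ok
    }
}

/// An inference with any failing tool use reports "child_error".
fn inference_status(inf: &Inference) -> SpanStatus {
    if inf.tool_uses.iter().any(|t| t.is_error) {
        SpanStatus::Error("child_error".to_string())
    } else {
        SpanStatus::Ok
    }
}

/// Root status: stop_reason wins (assumed precedence), then bubbled child errors.
fn root_status(stop_reason: Option<StopReason>, inferences: &[Inference]) -> SpanStatus {
    match stop_reason {
        Some(StopReason::Refusal) => SpanStatus::Error("refusal".to_string()),
        Some(StopReason::MaxTokens) => SpanStatus::Error("max_tokens".to_string()),
        _ => {
            if inferences.iter().any(|i| inference_status(i) != SpanStatus::Ok) {
                SpanStatus::Error("child_error".to_string())
            } else {
                SpanStatus::Ok
            }
        }
    }
}
```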
Tests: 29 new (12 type / 10 Claude builder / 3 Codex builder /
4 LedgerHandle integration), covering every acceptance case in the
issue. cargo test --workspace passes 871 tests; BURN_GOLDEN=1
cargo test --test golden passes 5.
https://claude.ai/code/session_01QEpNZbWEYNwxzqQjTN5LCY
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0a93c6e6cc
| turn, | ||
| tool_result_events: &events_for_turn, | ||
| inferences: &infs_for_turn, | ||
| subagents: &subagents, |
Restrict subagent sidecars to the active turn
session_span_trees passes the full session-level subagents slice into every Claude turn build, but build_claude_span_tree treats paired_tool_use_id == None as an unattached root child. That means any orphan sidecar is duplicated into every turn in the session, inflating per-turn trees and any downstream counts/cost rollups derived from those nodes. Filter subagents per turn (or otherwise assign unattached sidecars once) before calling the builder.
Fixed in a3086ef. session_span_trees now buckets the session-wide subagents slice per turn before calling build_claude_span_tree, so each sidecar lands in exactly one turn. Paired sidecars route by tool_use_id; orphans go to the latest turn whose start_ms <= subagent_start_ms (falling back to the first turn). The orphan-semantics choice (sibling under turn root with attributes["unattached"] = true) stands — only the duplication is fixed. New bucket_subagents_per_turn unit test + a session_span_trees regression test cover paired + orphan placement with an explicit no-duplication assertion.
Generated by Claude Code
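The bucketing rule described in this reply can be sketched as follows. This is a minimal illustration, not the code from a3086ef: the `Turn` and `Subagent` structs, the `tool_use_ids` field, and the `bucket_subagents` / `orphan_turn` names are hypothetical stand-ins. Treating a paired id that matches no turn like an orphan, and sending a sidecar without a timestamp to the first turn, are assumptions.

```rust
use std::collections::HashMap;

// Hypothetical, reduced shapes for the sketch.
struct Turn {
    start_ms: u64,
    tool_use_ids: Vec<String>,
}

struct Subagent {
    paired_tool_use_id: Option<String>,
    start_ms: Option<u64>,
}

/// For each subagent, the index of the single turn that owns it.
/// Assumes `turns` is non-empty; index 0 is the fallback.
fn bucket_subagents(turns: &[Turn], subagents: &[Subagent]) -> Vec<usize> {
    // Map every tool_use_id to the turn that issued it.
    let mut owner: HashMap<&str, usize> = HashMap::new();
    for (i, turn) in turns.iter().enumerate() {
        for id in &turn.tool_use_ids {
            owner.insert(id.as_str(), i);
        }
    }
    subagents
        .iter()
        .map(|sa| {
            sa.paired_tool_use_id
                .as_deref()
                .and_then(|id| owner.get(id).copied())
                // Unpaired (or unmatched) sidecars are placed by timestamp.
                .unwrap_or_else(|| orphan_turn(turns, sa.start_ms))
        })
        .collect()
}

/// Latest turn whose start is at or before the subagent start; first turn otherwise.
fn orphan_turn(turns: &[Turn], start_ms: Option<u64>) -> usize {
    match start_ms {
        Some(sa_ms) => turns
            .iter()
            .enumerate()
            .filter(|(_, t)| t.start_ms <= sa_ms)
            .max_by_key(|(_, t)| t.start_ms)
            .map(|(i, _)| i)
            .unwrap_or(0),
        None => 0,
    }
}
```

Because the function returns exactly one index per subagent, a sidecar cannot appear in more than one turn's bucket.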
| if tool_node.end_ms < result_node.end_ms { | ||
| tool_node.end_ms = result_node.end_ms; | ||
| } |
Propagate tool-result end time to Claude inference span
This branch extends tool_node.end_ms from the paired ToolResult timestamp, but the parent Inference span (node.end_ms) is never updated to that later value. Because root end time is computed from inference end times, turns with tool results after the assistant row will report truncated durations. Update the inference end to max(node.end_ms, tool_node.end_ms) before pushing the child.
Fixed in a3086ef. After widening tool_node.end_ms from the paired ToolResult and pushing the tool_use child, the Claude builder now propagates node.end_ms = node.end_ms.max(tool_node.end_ms) so the parent Inference (and the turn root, which rolls up inference ends) reflects the trailing ToolResult timestamp. New regression test tool_result_after_assistant_row_widens_inference_and_root_end_ms covers it.
Generated by Claude Code
| if tool_node.end_ms < result_node.end_ms { | ||
| tool_node.end_ms = result_node.end_ms; | ||
| } |
Propagate tool-result end time to Codex inference span
The Codex builder has the same timing bug: tool_node.end_ms is widened to the ToolResult timestamp, but the parent Inference end is left unchanged. As a result, the inferred turn end can be earlier than its tool-result child, which skews duration-based analysis for Codex sessions. Mirror the child end back into node.end_ms while iterating tool uses.
Fixed in a3086ef. The Codex builder now applies the same fix as Claude — node.end_ms = node.end_ms.max(tool_node.end_ms) after widening the tool_use end from the ToolResult and before pushing the child. New regression test codex_tool_result_after_assistant_row_widens_inference_and_root_end_ms covers Codex sessions.
Generated by Claude Code
4 issues found across 10 files
…, drop unsound Eq
Address PR #451 review findings:
- session_span_trees passed the session-wide subagents slice into every turn, so the Claude builder duplicated each orphan sidecar into every turn tree. Pre-bucket subagents so each lands in exactly one turn: paired sidecars route by tool_use_id; orphans go to the latest turn whose start <= subagent_start (first turn if none precede). The orphan-semantics decision (sibling under the turn root, attributes unattached=true) stands.
- Claude and Codex builders widened tool_node.end_ms from a later ToolResult timestamp but left the parent Inference end_ms unchanged, so turns reported truncated durations once the root rolled up its inference children. Propagate the widened end up to the inference span before pushing the tool_use child.
- impl Eq for AttrValue violated reflexivity (AttrValue::Float(f64), NaN != NaN). Drop the impl. BTreeMap<String, AttrValue> only needs Ord on its keys, so no consumer required Eq.
Tests:
- bucket_subagents_per_turn unit test covers paired + three orphan placements (mid, early, late) with a no-duplication assertion.
- session_span_trees regression test pins the no-duplication contract end-to-end.
- Claude + Codex span-tree builder tests assert Inference and root end_ms widen to a trailing ToolResult timestamp.
https://claude.ai/code/session_01QEpNZbWEYNwxzqQjTN5LCY
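The Eq point in this commit message can be demonstrated with a short sketch. Only `AttrValue::Float(f64)` is confirmed by the text; the other variants and the `score` key are assumptions for illustration.

```rust
use std::collections::BTreeMap;

// PartialEq can be derived for a type holding f64; Eq cannot be derived
// because f64 itself is not Eq (NaN is not equal to itself).
#[derive(Debug, Clone, PartialEq)]
enum AttrValue {
    Int(i64),
    Float(f64),
    Str(String),
    Bool(bool),
}

/// A BTreeMap only requires Ord on its keys, so values need no Eq impl.
fn sample_attrs() -> BTreeMap<String, AttrValue> {
    let mut attrs = BTreeMap::new();
    attrs.insert("tokens.input".to_string(), AttrValue::Int(1200));
    attrs.insert("model".to_string(), AttrValue::Str("example-model".to_string()));
    attrs.insert("unattached".to_string(), AttrValue::Bool(true));
    // Hypothetical key, used only to put a NaN into the map.
    attrs.insert("score".to_string(), AttrValue::Float(f64::NAN));
    attrs
}
```

With a NaN-valued entry, two maps built the same way do not compare equal, which is the reflexivity failure that makes an `Eq` impl unsound here.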
2 issues found across 4 files (changes from recent commits).
| // turns whose tool_result timestamps trail the assistant row | ||
| // don't underreport duration (the turn root rolls up its | ||
| // inference children's end_ms). | ||
| if node.end_ms < tool_node.end_ms { |
P2: Inference end update inside loop leaks into later ToolUse defaults. Per-tool end_ms becomes order-dependent. Track inference max separately and apply after building children.
Location: crates/relayburn-sdk/src/reader/codex/span_tree.rs, line 245.
<file context>
@@ -238,6 +238,13 @@ fn build_inference_node(
+ // turns whose tool_result timestamps trail the assistant row
+ // don't underreport duration (the turn root rolls up its
+ // inference children's end_ms).
+ if node.end_ms < tool_node.end_ms {
+ node.end_ms = tool_node.end_ms;
+ }
</file context>
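One way to avoid the order dependence raised in this comment is to keep the running maximum in a separate variable and default every tool to the original inference end. The sketch below assumes that each ToolUse defaults its end_ms to the inference end; the `ToolUse` struct and the `widen_ends` function are hypothetical and not taken from the builder.

```rust
/// A tool use with the timestamp of its paired ToolResult, if any.
struct ToolUse {
    result_end_ms: Option<u64>,
}

/// Returns (widened inference end, per-tool end_ms in input order).
fn widen_ends(inference_end_ms: u64, tools: &[ToolUse]) -> (u64, Vec<u64>) {
    // Running maximum, kept apart from the value used as the tool default.
    let mut max_end = inference_end_ms;
    let tool_ends: Vec<u64> = tools
        .iter()
        .map(|t| {
            // Default to the ORIGINAL inference end, never the running max,
            // so a tool without a result is unaffected by its siblings.
            let end = match t.result_end_ms {
                Some(r) => r.max(inference_end_ms),
                None => inference_end_ms,
            };
            max_end = max_end.max(end);
            end
        })
        .collect();
    // The inference end is widened once, after all children are built.
    (max_end, tool_ends)
}
```

With this shape, reordering the tool uses permutes the per-tool ends but never changes any individual tool's value.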
| Some(sa_ms) => turn_starts | ||
| .iter() | ||
| .enumerate() | ||
| .rev() | ||
| .find(|(_, ts)| **ts <= sa_ms) | ||
| .map(|(i, _)| i) | ||
| .unwrap_or(0), |
P2: Orphan-to-turn match uses row order, not max timestamp. Wrong turn can get the subagent when rows are out of time order. Pick the greatest turn_start <= subagent_start instead of reverse-find.
Location: crates/relayburn-sdk/src/query_verbs.rs, line 4363.
<file context>
@@ -4292,6 +4313,146 @@ impl LedgerHandle {
+ // when the sidecar carries no parseable timestamp.
+ let sa_start_ms = first_record_ts_ms(&sa.records);
+ assigned = Some(match sa_start_ms {
+ Some(sa_ms) => turn_starts
+ .iter()
+ .enumerate()
</file context>
Suggested change:
| Some(sa_ms) => turn_starts | |
| .iter() | |
| .enumerate() | |
| .filter(|(_, ts)| **ts <= sa_ms) | |
| .max_by_key(|(_, ts)| **ts) | |
| .map(|(i, _)| i) | |
| .unwrap_or(0), |
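The difference between the two lookups only shows up when turn rows are not in time order. The following sketch isolates both expressions as free functions; the function names and the bare `&[u64]` input are assumptions for illustration, while the iterator chains mirror the snippets above.

```rust
/// Reverse scan: picks the LAST ROW whose start is <= the subagent start.
fn by_row_order(turn_starts: &[u64], sa_ms: u64) -> usize {
    turn_starts
        .iter()
        .enumerate()
        .rev()
        .find(|(_, ts)| **ts <= sa_ms)
        .map(|(i, _)| i)
        .unwrap_or(0)
}

/// Timestamp scan: picks the row with the GREATEST start <= the subagent start.
fn by_timestamp(turn_starts: &[u64], sa_ms: u64) -> usize {
    turn_starts
        .iter()
        .enumerate()
        .filter(|(_, ts)| **ts <= sa_ms)
        .max_by_key(|(_, ts)| **ts)
        .map(|(i, _)| i)
        .unwrap_or(0)
}
```

For sorted starts such as `[100, 200, 300]` both functions agree. For `[100, 200, 150]` and a subagent starting at 250, the reverse scan returns index 2 (start 150) while the timestamp scan returns index 1 (start 200).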
* sdk/cli: per-inference context-delta attribution (#432)
New `burn overhead deltas` verb answers "what blew up my context between inference N and inference N+1?" by walking each session's TurnSpanTree timeline, pairing same-rail Inference spans, and attributing the delta in `tokens.input + cache_read + cache_write` to the intervening ToolResult / UserPrompt / SystemReminder leaves.
SDK surface: `LedgerHandle::context_delta(ContextDeltaOpts)` returns `Vec<ContextDelta>` with per-step intervening breakdown, attributed cost (charged at cache_read rate, i.e. what the *future* will pay for the persisted prefix), and compaction events surfaced as their own row rather than negative deltas. Main-rail deltas never see subagent tool_results and vice versa. Tool-result token estimates use `output_bytes / 4` as a first-cut fallback; documented as approximate in the output.
CLI: `burn overhead deltas [--session ID] [--top N] [--min-delta TOK] [--owner main|subagent|all] [--explain] [--json]`. Default top is 20, default min_delta is 1000 tokens. Compaction rows ignore min_delta so they always surface.
Tests: 9 unit tests covering the Bash blow-up driver path, compaction-replaces-negative-delta, subagent isolation, owner filter, top cap, min_delta filter, single-inference no-op, and JSON wire format. Two golden snapshots (`overhead-deltas`, `overhead-deltas-json`) anchor the CLI output against the fixture ledger.
* fix(context-delta): address review feedback on PR #452
In-scope #452 fixes (A-G):
- `burn overhead deltas` now honors `--since` (A): thread the parent overhead args' `since` into `ContextDeltaOpts`, parse relative ranges (`24h`/`7d`/`4w`/`2m`) into `Duration`, and use that to scope the session-enumeration `query_turns` seed inside the SDK so the `None`-session path no longer walks every historical session.
- Make the per-rail and cross-session `ContextDelta` sort fully deterministic across HashMap iteration order (B): chain `owner_rail` then `session_id` as final tie-breakers.
- UTF-8-safe `short_turn_label` / `short_agent_label` (C): switch from byte slicing to `chars().take(8)` so multi-byte ids never panic.
- `format_signed_tokens` preserves the negative sign (D): emit `-` for negative deltas instead of dropping it.
- Sort compactions with `sort_by_cached_key` (E) so `parse_iso_ms` runs once per element rather than once per comparison.
- Dedup four copies of `parse_iso_ms` (F) into `crates/relayburn-sdk/src/util/time.rs`; the analyze/context_delta, query_verbs, reader/claude, and reader/codex copies now share one implementation.
- `read_jsonl_values` streams via `BufReader::lines()` (G) rather than reading the entire file into memory.
Foundation fixes (carried in #452's diff; will also land on #451):
- Propagate `output_truncated` on `ToolResult` span nodes (H) in both the Claude and Codex builders so downstream consumers can flag truncated tool outputs.
- Propagate `ToolResult` error status to the parent `ToolUse` (I, J) in both builders: the runtime tool_result is the ground truth, not the assistant row's `is_error` hint.
- Don't drop subagents whose `paired_tool_use_id` doesn't match any ToolUse in the turn (K): drain leftover `paired_subagents` after the inference walk and surface them as `unattached` siblings under the turn root.
- Stop swallowing real ledger-read failures (M): replace the blanket `unwrap_or_default()` on `query_inferences` / `query_tool_result_events` with a `match` that tolerates only the pre-schema "no such table/column" class of error and propagates every other failure.
Tests: `cargo test --workspace` (zero failures, +2 new tests in the `query_verbs` mod for the since filter and helper). Zero warnings on `cargo build --workspace`.
---------
Co-authored-by: Claude <noreply@anthropic.com>
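Fix (C) in the commit message above replaces byte slicing with `chars().take(8)`. A minimal sketch of why that matters follows; the function names and the `max_chars` / `max_bytes` parameters are hypothetical and do not reproduce `short_turn_label` or `short_agent_label`.

```rust
/// First `max_chars` Unicode scalar values of `id`; never panics.
fn short_label(id: &str, max_chars: usize) -> String {
    id.chars().take(max_chars).collect()
}

/// Byte-based variant shown for contrast. `str::get` returns None when the
/// range is out of bounds or ends inside a multi-byte character, which is
/// exactly where the indexing form `&id[..max_bytes]` would panic.
fn short_label_bytes(id: &str, max_bytes: usize) -> Option<&str> {
    id.get(..max_bytes)
}
```

Each character in an id such as `日本語のセッション` occupies three bytes, so byte offset 8 falls inside a character; the char-based version still returns the first eight characters.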
Closes #430.
Summary
Introduces a per-turn span tree as a derived analytical primitive. Pure projection from TurnRecord + tool_result_event rows + (optional) subagent transcripts. No schema changes, no caching; always re-derived per call.
Foundation for inference-flow DAG (#431), context-delta attribution (#432), and several other downstream features.
New types (crates/relayburn-sdk/src/analyze/span_tree.rs)
Serde with kebab-case wire form, matching ActivityCategory / StopReason convention.
Attribute schema (locked in module doc)
tokens.input, tokens.output, tokens.cache_read, tokens.cache_write, tokens.reasoning, model, request_id, agent_id, tool_use_id, cwd, mode, stop_reason. Downstream consumers can rely on these.
Status mapping
- tool_use.is_error == true → Error { msg: <tool error message> }
- stop_reason == Refusal → Error { msg: "refusal" }
- stop_reason == MaxTokens → Error { msg: "max_tokens" }
- otherwise → Ok
Errors propagate from leaf to root.
Claude builder (reader/claude/span_tree.rs)
Builds the full hierarchy: Turn (root) → UserPrompt + N Inferences; each Inference → ToolUses; each ToolUse → ToolResult + optional Subagent subtree.
- requestId (via the existing WorkingRecord capture; if the Inference type from #448, "reader: dedupe assistant rows by requestId into Inference unit (#434)", is in main first, the next iteration can switch to that source; see follow-up note below).
- pair_to_main from #449, "reader: pair Task subagents via toolUseResult.agentId, surface orphans as UnattachedGroup (#435)".
- attributes["unattached"] = true. Tree shape stays uniform (one root); UnattachedGroup equivalent recoverable by children.iter().filter(|c| c.kind == Subagent && c.attributes["unattached"] == true). Documented in module doc.
Codex builder (reader/codex/span_tree.rs)
Limited but honest about it. Codex rollouts expose strictly less hierarchy than Claude:
- No requestId → one Inference per turn keyed by message_id (matches InferenceKeySource::MessageId fallback)
- No sidecar transcripts → no Subagent spans
- No stop_reason on assistant rows → root status defaults to Ok unless a child tool errors
Produces: Turn → {UserPrompt, Inference → ToolUse → ToolResult}. Same attribute keys, same status mapping. Documented in module preamble.
SDK verbs
LedgerHandle::turn_span_tree(session_id, turn_id) and LedgerHandle::session_span_trees(session_id), plus free-function forms for embedders.
Test plan
29 new tests covering all 6 acceptance cases from the issue:
- TurnRecord within rounding
- requestIds) → multiple Inference children
- ToolResult nested under ToolUse
- MaxTokens turn → root status Error { msg: "max_tokens" }
- tool_use.is_error) → propagates to root
Breakdown:
- 12 in analyze/span_tree.rs
- 10 in reader/claude/span_tree.rs
- 3 in reader/codex/span_tree.rs
- 4 LedgerHandle integration tests in query_verbs.rs
Workspace:
- cargo build --workspace clean (zero warnings).
- cargo test --workspace: 871 passed, 0 failed.
- BURN_GOLDEN=1 cargo test --test golden: 5/5.
Deferred to follow-ups
- packages/sdk-node/src/index.d.ts mirror + napi bridge). Issue lists it as out of scope. Rust surface lands first.
- burn__turnSpanTree tool: explicitly out of scope per issue.
- UserPrompt text body and ToolResult attached file content. The spans are emitted as structural placeholders; future PRs can populate the content without changing tree shape.
- burn.sqlite: issue's default position is "always derive; cache only if profiling demands it." No caching this round.
Out of scope
- #[non_exhaustive] on new types.
Files
New (3): analyze/span_tree.rs (583L), reader/claude/span_tree.rs (1036L), reader/codex/span_tree.rs (498L). 29 tests.
Modified (7): analyze.rs, lib.rs, query_verbs.rs, reader.rs, reader/claude.rs, reader/codex.rs, CHANGELOG.md.
Generated by Claude Code