
overhead: per-inference context-delta attribution (#432) #452

Merged: willwashburn merged 2 commits into main from claude/burn-432-context-delta on May 26, 2026

@willwashburn
Member

Closes #432.

Depends on #451 (span tree foundation, #430) being in main first. Branch is currently based on claude/burn-430-span-tree-foundation; will rebase mechanically onto main once #451 lands.

Summary

New burn overhead deltas --session <id> verb. For each inference in a session, it computes the context_tokens delta from the prior same-owner inference and attributes that delta to the intervening tool_results, user_prompts, system_reminders, and compaction events. It answers "what blew up my context between inference 5 and inference 6?"

Pure derivation from span trees + content sidecars. No DB writes, no schema changes.

SDK

pub fn LedgerHandle::context_delta(&self, opts: ContextDeltaOpts) -> Result<Vec<ContextDelta>>;

ContextDelta carries session_id, turn_id, inference_idx, owner_rail (Main | Subagent(agent_id)), prior_context_tokens, current_context_tokens, delta_tokens, intervening: Vec<InterveningStep>, and attributed_cost_usd.

InterveningStep variants: ToolResult { tool_use_id, tool_name, approx_tokens, approx_bytes, truncated }, UserPrompt, SystemReminder (reserved — see approximations below), Compaction { tokens_freed }, Other.
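
For readers skimming the API, the shapes described above might look roughly like the sketch below. Only the field and variant names come from this description; every field type (u64, i64, usize, String) is an assumption, and any payload carried by UserPrompt, SystemReminder, or Other is omitted.

```rust
/// Which timeline an inference belongs to (names from the PR text).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum OwnerRail {
    Main,
    Subagent { agent_id: String },
}

/// One event sitting between two consecutive same-owner inferences.
/// Field types are assumed for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum InterveningStep {
    ToolResult {
        tool_use_id: String,
        tool_name: String,
        approx_tokens: u64,
        approx_bytes: u64,
        truncated: bool,
    },
    UserPrompt,
    SystemReminder, // reserved in this PR
    Compaction { tokens_freed: u64 },
    Other,
}

/// One row of the result: the context change between two inferences.
#[derive(Debug, Clone, PartialEq)]
pub struct ContextDelta {
    pub session_id: String,
    pub turn_id: String,
    pub inference_idx: usize,
    pub owner_rail: OwnerRail,
    pub prior_context_tokens: u64,
    pub current_context_tokens: u64,
    pub delta_tokens: i64, // signed: context can shrink
    pub intervening: Vec<InterveningStep>,
    pub attributed_cost_usd: f64,
}
```

The signed delta_tokens is the one deliberate choice in this sketch: the compaction handling described further down implies that a raw delta can be negative.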

CLI

burn overhead deltas
    [--session <id>]
    [--since <duration>]         # default 24h
    [--top N]                    # default 20
    [--min-delta TOKENS]         # default 1000
    [--owner main|subagent|all]  # default all
    [--json]
    [--explain]                  # expand to all intervening steps per delta

Human render: Inference | Owner | Δtokens | Δcost | Driver table where Driver = largest intervening step (or "N steps" with --explain). JSON: full Vec<ContextDelta>.

Key decisions

Cost rate: max(delta_tokens, 0) * curr.model.cache_read per million. Rationale (documented inline): cache_read is what every future inference pays for the persisted prefix the delta added. cache_write is paid once; cache_read is the steady-state cost the user keeps paying. Matches the issue's open-question #3 recommendation. Cost is 0.0 for models the pricing table doesn't recognize, matching the rest of analyze/.
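
As a minimal sketch of the cost rule above: the function name and the Option-based pricing lookup are hypothetical, and only the formula (clamp to zero, multiply by the cache_read rate per million tokens, 0.0 for unknown models) comes from the description.

```rust
/// Hypothetical helper: USD cost attributed to one context delta.
/// `cache_read_per_million` is the current model's cache-read price in
/// USD per million tokens, or `None` when the pricing table does not
/// recognize the model.
pub fn attributed_cost_usd(delta_tokens: i64, cache_read_per_million: Option<f64>) -> f64 {
    match cache_read_per_million {
        // Negative deltas (context shrank) are clamped to zero cost.
        Some(rate) => delta_tokens.max(0) as f64 * rate / 1_000_000.0,
        // Unknown model: report 0.0 rather than guessing a rate.
        None => 0.0,
    }
}
```

For example, a 2,000,000-token delta at an assumed rate of 0.50 USD per million comes to 1.00 USD, while a negative delta or an unpriced model yields 0.0.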

Subagent rail isolation: each Subagent span flips the active owner rail to OwnerRail::Subagent { agent_id } for its entire subtree during DFS. Leaves under a Task fanout never enter the main-rail timeline. Tested directly by subagent_isolation_main_rail_excludes_subagent_results.
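
The rail flip can be sketched as a depth-first walk that carries the active owner down the tree. The Span and SpanKind types and the collect_timeline function here are simplified stand-ins invented for illustration, not the SDK's span-tree types.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum OwnerRail {
    Main,
    Subagent { agent_id: String },
}

/// Simplified, hypothetical span kinds.
pub enum SpanKind {
    Turn,
    Inference,
    ToolResult,
    Subagent { agent_id: String },
}

pub struct Span {
    pub name: String,
    pub kind: SpanKind,
    pub children: Vec<Span>,
}

/// Walks the tree depth-first and records (rail, span name) for every
/// inference and tool result, in visit order.
pub fn collect_timeline(span: &Span, rail: &OwnerRail, out: &mut Vec<(OwnerRail, String)>) {
    // A Subagent span switches the rail for its entire subtree.
    let active = match &span.kind {
        SpanKind::Subagent { agent_id } => OwnerRail::Subagent {
            agent_id: agent_id.clone(),
        },
        _ => rail.clone(),
    };
    if matches!(span.kind, SpanKind::Inference | SpanKind::ToolResult) {
        out.push((active.clone(), span.name.clone()));
    }
    for child in &span.children {
        collect_timeline(child, &active, out);
    }
}
```

Because the rail is passed by value down each recursive call, it reverts on its own once the walk leaves the Subagent subtree, so later main-rail events are not mislabeled.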

Compaction handling: when delta < 0 and a CompactionEvent sits between prev and curr, surface as Compaction { tokens_freed } instead of a negative delta. Compaction rows bypass min_delta filtering so a small compaction never silently vanishes. Tested by compaction_replaces_negative_delta.
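
A rough sketch of that decision follows. The Row type and classify function are hypothetical. Clamping a negative delta with no compaction to zero is an assumption, suggested only by the "clamping" wording in the CodeRabbit walkthrough.

```rust
/// Hypothetical output row for one inference pair.
#[derive(Debug, PartialEq)]
pub enum Row {
    Growth { delta_tokens: i64 },
    Compaction { tokens_freed: u64 },
}

/// Classifies one (prior, current) pair. Returns `None` when the row is
/// dropped by the `min_delta` filter.
pub fn classify(prior: u64, current: u64, compaction_between: bool, min_delta: u64) -> Option<Row> {
    let delta = current as i64 - prior as i64;
    if delta < 0 && compaction_between {
        // Compaction rows bypass min_delta so small ones stay visible.
        return Some(Row::Compaction {
            tokens_freed: delta.unsigned_abs(),
        });
    }
    // Assumption: a negative delta with no compaction is clamped to zero.
    let clamped = delta.max(0);
    if (clamped as u64) < min_delta {
        None
    } else {
        Some(Row::Growth { delta_tokens: clamped })
    }
}
```

With min_delta set to 1000, a 200-token compaction still produces a row, while a 500-token growth is filtered out.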

Known approximations (documented in source + output)

Test plan

  • cargo test --workspace — 891 passed, 0 failed, zero warnings on build
  • BURN_GOLDEN=1 cargo test --test golden — passing including 2 new snapshots (overhead-deltas, overhead-deltas-json)
  • 9 SDK unit tests in context_delta.rs covering: known >10k jump via Bash result, compaction replaces negative delta, subagent rail isolation, single-inference graceful handling, --min-delta filter, --top cap
  • 4 CLI unit tests in overhead.rs covering renderer output shape

Out of scope

Files

New (1): crates/relayburn-sdk/src/analyze/context_delta.rs
Modified (6): analyze.rs, lib.rs, query_verbs.rs, cli.rs (deltas args), commands/overhead.rs (runner + renderers), CHANGELOG.md, plus 2 new golden snapshots + invocations entry.


Generated by Claude Code

coderabbitai Bot commented May 26, 2026

Caution

Review failed

Failed to post review comments

📝 Walkthrough

This PR implements per-inference context-window delta attribution by adding a shared ISO-8601 timestamp parser, improving span-tree builders with error handling and truncation tracking, introducing a context-delta analysis algorithm that pairs consecutive inferences and attributes intervening steps (tool results, user prompts, compactions), exposing SDK query verbs, and delivering a new burn overhead deltas CLI command with human-readable and JSON output formats.

Changes

Per-inference context-delta attribution pipeline

| Layer / File(s) | Summary |
| --- | --- |
| Shared ISO-8601 timestamp parser (crates/relayburn-sdk/src/util.rs, crates/relayburn-sdk/src/util/time.rs) | New parse_iso_ms utility converts ISO-8601 timestamps to Unix milliseconds, replacing duplicate local implementations across span-tree readers with a centralized, tested parser. |
| Span-tree builder improvements (crates/relayburn-sdk/src/reader/claude/span_tree.rs, crates/relayburn-sdk/src/reader/codex/span_tree.rs, crates/relayburn-sdk/src/query_verbs.rs) | Builders switch to shared timestamp parser, emit orphan subagents with unattached flag (Claude), propagate tool-result errors to parent ToolUse spans, record output_truncated attributes. Query verbs add schema-missing fail-soft logic and refactor JSONL streaming. |
| Context-delta attribution algorithm (crates/relayburn-sdk/src/analyze/context_delta.rs) | New module implements deltas_for_session that builds DFS timelines, pairs consecutive inferences per rail, collects intervening steps (ToolResult, UserPrompt, SystemReminder, Compaction), derives token deltas, handles negative deltas via compaction replacement/clamping, computes attributed USD cost, and filters/sorts/truncates output. Includes extensive unit tests. |
| SDK API surface (crates/relayburn-sdk/src/analyze.rs, crates/relayburn-sdk/src/lib.rs) | Declares pub mod context_delta and re-exports core types and functions. Root lib.rs adds util module and context-delta re-exports with type aliases for ergonomic imports. |
| Context-delta query verb (crates/relayburn-sdk/src/query_verbs.rs) | Adds LedgerHandle::context_delta method and free-function, selecting session scopes, loading span-trees and compactions, computing deltas, and cross-session sorting. Includes filtering and since-window tests. |
| CLI overhead deltas arguments and routing (crates/relayburn-cli/src/cli.rs) | Extends OverheadAction with Deltas(OverheadDeltasArgs) variant. Adds OverheadDeltasArgs struct with --session, --top, --min-delta, --owner, --explain, --json flags and OverheadDeltasOwner enum with SDK filter mapping. |
| Overhead deltas command and rendering (crates/relayburn-cli/src/commands/overhead.rs) | Implements run_deltas handler building ContextDeltaOpts, invoking SDK verb, and dispatching to output renderers. Provides render_human_deltas table with Inference/Owner/Δ/Cost/Driver columns, optional explain sections, and helper functions for label formatting, driver selection, and step stringification. Unit tests cover label trimming, compaction prioritization, and token formatting. |
| Documentation and integration tests (CHANGELOG.md, tests/fixtures/cli-golden/invocations.json, tests/fixtures/cli-golden/snapshots/overhead-deltas*.stdout.txt) | Documents new CLI flags and SDK API. Adds golden-fixture invocations for human and JSON output variants. Provides snapshot outputs showing expected table format and JSON structure with realistic token/cost values and approximation notes. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

  • AgentWorkforce/burn#432: This PR implements the complete per-inference context-delta attribution feature, including the burn overhead deltas CLI, SDK algorithm, intervening-step attribution, and compaction handling specified in the issue.

Possibly related PRs

  • AgentWorkforce/burn#309: The main PR implements the previously stubbed burn overhead command, adding the OverheadAction::Deltas path and run_deltas handler that builds on the CLI scaffold.
  • AgentWorkforce/burn#312: Both PRs extend the shared burn overhead CLI wiring in crates/relayburn-cli/src/cli.rs and command module, with this PR introducing the new OverheadAction::Deltas variant.

Poem

🐰 With careful hop through inference trees,
We trace the tokens, find the keys—
Which tool result made context bloom?
Context deltas light the room,
Per-step accounting, clear and bright!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title 'overhead: per-inference context-delta attribution (#432)' clearly and concisely summarizes the main change: adding per-inference context-delta attribution functionality to the overhead command. |
| Description check | ✅ Passed | The PR description is comprehensive and directly related to the changeset, providing implementation details, architecture decisions, test coverage, and approximations used throughout the changes. |
| Linked Issues check | ✅ Passed | The PR successfully implements all major objectives from issue #432: per-inference context delta computation via LedgerHandle::context_delta(), intervening step attribution, cost calculation using cache_read rate, compaction handling with min_delta bypass, subagent rail isolation, and both human and JSON CLI outputs with comprehensive test coverage. |
| Out of Scope Changes check | ✅ Passed | The PR introduces comprehensive changes to support context-delta functionality including new SDK APIs, CLI commands, span-tree improvements for error tracking, and refactored shared timestamp parsing, all directly supporting the core feature. The refactoring of duplicate parse_iso_ms implementations into shared utilities is a necessary supporting change. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |



@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces per-inference context-window delta attribution and per-turn span trees as derived analytical primitives. It adds the burn overhead deltas CLI command (and corresponding SDK entry point LedgerHandle::context_delta) to attribute context-window growth to intervening steps like tool results, user prompts, system reminders, or compactions. It also implements per-turn span tree builders for Claude Code and Codex rollouts to project hierarchical turn structures. The reviewer feedback highlights several important improvement opportunities: preventing potential panics from byte-slicing UTF-8 strings in label helpers, fixing a formatting bug for negative tokens in format_signed_tokens, optimizing sorting performance using sort_by_cached_key, centralizing the duplicated parse_iso_ms utility, and improving memory efficiency when reading JSONL files by utilizing a BufReader.

Comment on lines +471 to +481
fn short_turn_label(turn_id: &str) -> String {
    // Turn ids on Claude are `msg-...` UUIDs; trim to a short prefix
    // for the table. Keep the original for JSON output.
    let trimmed = turn_id.trim_start_matches("msg_");
    let trimmed = trimmed.trim_start_matches("msg-");
    if trimmed.len() > 8 {
        format!("T{}", &trimmed[..8])
    } else {
        format!("T{trimmed}")
    }
}

medium

The short_turn_label function uses string slicing &trimmed[..8] which can panic if the 8th byte is not a character boundary (e.g., if the string contains multi-byte UTF-8 characters). Using .chars().take(8).collect() is safer and avoids potential panics.

fn short_turn_label(turn_id: &str) -> String {
    // Turn ids on Claude are `msg-...` UUIDs; trim to a short prefix
    // for the table. Keep the original for JSON output.
    let trimmed = turn_id.trim_start_matches("msg_").trim_start_matches("msg-");
    let short: String = trimmed.chars().take(8).collect();
    format!("T{short}")
}
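
To make the failure mode concrete, here is a small sketch contrasting a byte-index prefix with a character-based one. The helper names byte_prefix and char_prefix are made up for this illustration. `str::get` is used so the byte-based variant returns None where `&s[..n]` would panic.

```rust
/// Byte-index prefix. Returns `None` (where `&s[..n]` would panic) when
/// `n` is past the end or falls inside a multi-byte character.
pub fn byte_prefix(s: &str, n: usize) -> Option<&str> {
    s.get(..n)
}

/// Character-based prefix: takes at most `n` chars and never splits one.
pub fn char_prefix(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}
```

For the id "abcdefgéxyz", byte index 8 lands in the middle of the two-byte "é", so byte_prefix(id, 8) is None while char_prefix(id, 8) yields "abcdefgé".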

Member Author

Fixed in cf3c61e: short_turn_label now uses chars().take(8).collect() so multi-byte UTF-8 ids never panic on a mid-byte cut.


Generated by Claude Code

Comment on lines +483 to +490
fn short_agent_label(agent_id: &str) -> String {
    let trimmed = agent_id.trim_start_matches("agent-");
    if trimmed.len() > 8 {
        trimmed[..8].to_string()
    } else {
        trimmed.to_string()
    }
}

medium

The short_agent_label function uses string slicing trimmed[..8] which can panic if the 8th byte is not a character boundary. Using .chars().take(8).collect() is safer and avoids potential panics.

fn short_agent_label(agent_id: &str) -> String {
    let trimmed = agent_id.trim_start_matches("agent-");
    trimmed.chars().take(8).collect()
}

Member Author

Fixed in cf3c61e: short_agent_label now uses chars().take(8).collect() (same fix as short_turn_label).


Generated by Claude Code

Comment on lines +492 to +495
fn format_signed_tokens(n: i64) -> String {
    let sign = if n > 0 { "+" } else { "" };
    format!("{sign}{}", format_tokens(n.unsigned_abs()))
}

medium

The format_signed_tokens function does not handle negative numbers correctly. If n is negative, sign is set to "" and n.unsigned_abs() is formatted, resulting in a positive string representation (e.g., -500 becomes "500"). It should format with a "-" sign for negative numbers.

fn format_signed_tokens(n: i64) -> String {
    let sign = if n > 0 {
        "+"
    } else if n < 0 {
        "-"
    } else {
        ""
    };
    format!("{sign}{}", format_tokens(n.unsigned_abs()))
}

Member Author

Fixed in cf3c61e: format_signed_tokens now emits - for negative deltas instead of dropping the sign.


Generated by Claude Code

Comment on lines +279 to +280
let mut compactions_sorted: Vec<&CompactionEvent> = compactions.iter().collect();
compactions_sorted.sort_by_key(|c| parse_iso_ms(&c.ts).unwrap_or(0));

medium

The sort_by_key method executes the key extraction function parse_iso_ms for every comparison, which is $O(N \log N)$ times. Since parse_iso_ms involves relatively expensive string parsing and calendar calculations, using sort_by_cached_key is much more efficient as it evaluates the key exactly once per element ($O(N)$ times).

Suggested change
- let mut compactions_sorted: Vec<&CompactionEvent> = compactions.iter().collect();
- compactions_sorted.sort_by_key(|c| parse_iso_ms(&c.ts).unwrap_or(0));
+ let mut compactions_sorted: Vec<&CompactionEvent> = compactions.iter().collect();
+ compactions_sorted.sort_by_cached_key(|c| parse_iso_ms(&c.ts).unwrap_or(0));

Member Author

Fixed in cf3c61e — switched to sort_by_cached_key so parse_iso_ms runs once per element instead of once per comparison.
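
The difference is easy to observe by counting key-function calls. The sketch below is a standalone illustration with a made-up count_key_calls helper and plain integers standing in for parsed timestamps. The standard library documents that sort_by_cached_key calls the key function at most once per element.

```rust
use std::cell::Cell;

/// Sorts a copy of `items` with each method and returns how many times
/// the key closure ran: (sort_by_key calls, sort_by_cached_key calls).
pub fn count_key_calls(items: &[u32]) -> (usize, usize) {
    let plain = Cell::new(0usize);
    let mut a = items.to_vec();
    a.sort_by_key(|x| {
        plain.set(plain.get() + 1); // runs for both sides of each comparison
        *x
    });

    let cached = Cell::new(0usize);
    let mut b = items.to_vec();
    b.sort_by_cached_key(|x| {
        cached.set(cached.get() + 1); // keys are computed up front
        *x
    });

    (plain.get(), cached.get())
}
```

The cached variant allocates a temporary vector of keys, so it pays off mainly when the key is costly to compute, as string timestamp parsing is here.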


Generated by Claude Code

}
}

/// ISO-8601 -> Unix-ms. Mirror of the parsers elsewhere in the SDK; kept

medium

The parse_iso_ms function is duplicated 4 times across different modules in this PR alone (e.g., in context_delta.rs, query_verbs.rs, claude/span_tree.rs, and codex/span_tree.rs). Since all these modules are within the same relayburn-sdk crate, they should share a single internal utility function (e.g., in a shared crate::utils or crate::reader::utils module) to improve maintainability and reduce code duplication.

Member Author

Fixed in cf3c61e — moved the parser to a single pub(crate) helper at crates/relayburn-sdk/src/util/time.rs. The four ex-copies (analyze/context_delta.rs, query_verbs.rs, reader/claude/span_tree.rs, reader/codex/span_tree.rs) now all use crate::util::time::parse_iso_ms.


Generated by Claude Code

Comment on lines +4511 to +4530
fn read_jsonl_values(path: &Path) -> Vec<serde_json::Value> {
    let bytes = match std::fs::read(path) {
        Ok(b) => b,
        Err(_) => return Vec::new(),
    };
    let text = match std::str::from_utf8(&bytes) {
        Ok(s) => s,
        Err(_) => return Vec::new(),
    };
    text.lines()
        .filter_map(|line| {
            let t = line.trim();
            if t.is_empty() {
                None
            } else {
                serde_json::from_str::<serde_json::Value>(t).ok()
            }
        })
        .collect()
}

medium

The read_jsonl_values function reads the entire file into memory at once using std::fs::read and then converts it to a string. For large JSONL files, this can lead to high memory usage or OOM. Reading the file line-by-line using BufReader is much more memory-efficient and robust.

fn read_jsonl_values(path: &Path) -> Vec<serde_json::Value> {
    let file = match std::fs::File::open(path) {
        Ok(f) => f,
        Err(_) => return Vec::new(),
    };
    let reader = std::io::BufReader::new(file);
    use std::io::BufRead;
    reader
        .lines()
        .filter_map(|line| {
            let l = line.ok()?;
            let t = l.trim();
            if t.is_empty() {
                None
            } else {
                serde_json::from_str::<serde_json::Value>(t).ok()
            }
        })
        .collect()
}

Member Author

Fixed in cf3c61e: read_jsonl_values now opens the file and streams via BufReader::lines() instead of slurping the entire payload into memory.


Generated by Claude Code

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 05f9aa3e6c


Comment thread crates/relayburn-sdk/src/query_verbs.rs Outdated
// of `since` — that flag caps output rows, not the
// input scan).
let mut ids: BTreeSet<String> = BTreeSet::new();
let all = self.inner.query_turns(&Query::default())?;

P1: Apply since filtering when collecting context deltas

ContextDeltaOpts includes a time window, but this path always enumerates every session with query_turns(&Query::default()) and never consults opts.since/effective_since, so old sessions can still dominate overhead deltas results even when callers expect a recent window. This makes the top-N output incorrect for bounded-time analysis and breaks the documented since semantics.

Member Author

Fixed in cf3c61e: --since now flows from the parent overhead args through run_deltas into ContextDeltaOpts.since, and the SDK seeds the session-enumeration query_turns with a matching since-scoped Query. Sessions whose latest activity falls outside the window are skipped before their span trees get loaded. New regression test: query_verbs::tests::context_delta_since_filter_excludes_old_sessions.
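
The session-enumeration step described in this reply can be sketched as below. TurnRow and sessions_in_window are hypothetical stand-ins. The real code goes through query_turns with a since-scoped Query, whose exact shape is not shown in this thread.

```rust
use std::collections::BTreeSet;

/// Hypothetical, minimal view of a turn row.
pub struct TurnRow {
    pub session_id: String,
    pub ts_ms: i64, // Unix milliseconds
}

/// Returns the ids of sessions that have at least one turn at or after
/// `since_ms`. With `None`, every session qualifies.
pub fn sessions_in_window(turns: &[TurnRow], since_ms: Option<i64>) -> BTreeSet<String> {
    turns
        .iter()
        .filter(|t| since_ms.map_or(true, |cutoff| t.ts_ms >= cutoff))
        .map(|t| t.session_id.clone())
        .collect()
}
```

Filtering at enumeration time means span trees are only loaded for sessions that can contribute rows, which is the behavior the regression test name suggests.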


Generated by Claude Code

Comment on lines +423 to +425
if let Some(bytes) = final_event.output_bytes {
    node.set_attr("output_bytes", AttrValue::Int(bytes as i64));
}

P2: Preserve tool-result truncation metadata on span nodes

This mapper copies output_bytes from ToolResultEventRecord but drops output_truncated, so downstream delta attribution never marks intervening tool results as truncated. In practice, overhead deltas --explain can present large tool outputs as fully representative even when ingest detected truncation, which degrades attribution accuracy and operator trust; propagate output_truncated onto the ToolResult span (and mirror the same fix in the Codex builder).

Member Author

Fixed in cf3c61e — both the Claude and Codex build_tool_result_node now propagate output_truncated onto the ToolResult span as an AttrValue::Bool attribute. The context-delta InterveningStep::ToolResult.truncated and --explain render already consume it.


Generated by Claude Code

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (2)
CHANGELOG.md (1)

9-20: ⚡ Quick win

Shorten the new [Unreleased] bullets to impact-first release notes.

These entries are currently too implementation-heavy for the changelog style in this repo. Please condense to one short user-impact bullet per change and remove issue-link/backstory density.

As per coding guidelines: “Changelog entries should be concise and impact-first… Drop issue/PR links, internal review notes, implementation backstory, and ‘foundation for…’ phrasing…”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CHANGELOG.md` around lines 9 - 20, The changelog bullets are too
implementation-heavy; rewrite each entry as a single concise, impact-first
sentence and remove PR/issue links and implementation/backstory. For the `burn
overhead deltas` item, replace the paragraph with a short user-facing line like:
"Add context-delta attribution via LedgerHandle::context_delta to show what
changed between consecutive inferences." For the `relayburn-sdk` item, replace
the paragraph with a short user-facing line like: "Add TurnSpanTree projection
APIs (LedgerHandle::turn_span_tree and LedgerHandle::session_span_trees) to
expose per-turn span trees for tooling and analysis." Ensure you drop internal
details (e.g., TurnSpanTree structure, locked attribute list, compaction
behavior, 'orphan' wording) and keep only the user impact and the API symbols.
crates/relayburn-sdk/src/reader/codex/span_tree.rs (1)

276-279: 💤 Low value

Consider adding a wire_str() method to ToolResultStatus for consistency.

Line 75 uses reason.wire_str() for StopReason, but here ToolResultStatus relies on {:?} Debug formatting + lowercase. If the Debug representation changes (e.g., enum variant rename), the wire string silently changes. A dedicated wire_str() method would make the contract explicit.
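
A minimal sketch of the suggested method is shown here. The variant names are hypothetical, since the actual ToolResultStatus definition does not appear in this thread. The idea is simply that each wire string is spelled out once instead of derived from Debug.

```rust
/// Stand-in for the SDK enum; these variants are assumed for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolResultStatus {
    Ok,
    Error,
    Cancelled,
}

impl ToolResultStatus {
    /// Canonical lowercase wire string, independent of the Debug output.
    pub fn wire_str(self) -> &'static str {
        match self {
            ToolResultStatus::Ok => "ok",
            ToolResultStatus::Error => "error",
            ToolResultStatus::Cancelled => "cancelled",
        }
    }
}
```

Because the match is exhaustive, adding a variant forces a decision about its wire string, and renaming a variant leaves the serialized value unchanged.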

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/relayburn-sdk/src/reader/codex/span_tree.rs` around lines 276 - 279,
final_event.status is being serialized via format!("{:?}").to_ascii_lowercase(),
which couples the wire string to the Debug representation; add a wire_str()
method to the ToolResultStatus enum (similar to StopReason::wire_str()) that
returns the canonical lowercase wire string for each variant, update the code
that sets the node attribute to call final_event.status.wire_str() instead of
using Debug-based formatting, and ensure all other places that serialize
ToolResultStatus use the new wire_str() to make the contract explicit and
stable.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/relayburn-cli/src/commands/overhead.rs`:
- Around line 32-33: The deltas branch currently drops the parent `since` and
constructs query options with `since: None`, so `burn overhead deltas --since
...` is ignored; update the deltas path to accept and propagate the parent
`since` into the deltas query options: modify the call/site around
OverheadAction::Deltas(deltas) / run_deltas(globals, deltas) so `run_deltas` (or
the code that builds the deltas query) receives the `args.since` value and uses
it when building the query options (replace `since: None` with the passed
`since`), and ensure any helper that constructs the query options (the code
around the current `since: None` usage) uses that propagated value.

In `@crates/relayburn-sdk/src/analyze/context_delta.rs`:
- Around line 379-386: The current sort closure used in out.sort_by compares
delta_tokens, turn_id, and inference_idx but still leaves ties dependent on
per_rail insertion order; update the comparator used where out.sort_by(...) is
defined to add additional deterministic tie-breakers by chaining .then_with(||
a.owner_rail.cmp(&b.owner_rail)) and finally .then_with(||
a.session_id.cmp(&b.session_id)) (or vice‑versa if session_id is preferred
first) so the ordering is fully deterministic across HashMap iteration; locate
the sort_by closure that references delta_tokens, turn_id, and inference_idx and
append these comparisons to it.

In `@crates/relayburn-sdk/src/query_verbs.rs`:
- Around line 4581-4593: When opts.session is None, the current session_ids
collection uses Query::default() which ignores opts.since; change it to
construct a Query that sets the since field from opts.since (or equivalent
time-scope) and pass that query to self.inner.query_turns to seed the BTreeSet
of session IDs, then continue to call deltas_for_session (or other full-tree
loaders) only for those session IDs; update the code paths that build the Query
and call query_turns (refer to opts.since, session_ids, self.inner.query_turns,
and deltas_for_session) so the initial session enumeration honors the --since
filter.
- Around line 4238-4240: Replace the unconditional unwrap_or_default() on the
query results so real ledger-read errors are not silently swallowed: for both
self.inner.query_inferences(&session_q) and
self.inner.query_tool_result_events(&session_q) match the Result and only fall
back to the pre-schema default when the error explicitly indicates a missing
table/column/schema; for any other Err return propagate the error (e.g., using
the ? operator or returning Err) so read failures surface. Reference the
query_inferences/query_tool_result_events calls and the
inferences/tool_result_events variables when making this change, or extract a
helper like is_schema_missing(err) to centralize the schema-missing check.

In `@crates/relayburn-sdk/src/reader/claude/span_tree.rs`:
- Around line 462-468: The subagent transcript (sa.records) is being ignored so
Subagent nodes are created as leaves; update the span-tree builder (the code
around the `node` return in build_claude_span_tree / the span-tree construction
for Subagent) to materialize child spans from `sa.records` rather than
discarding them: parse/convert each entry in `sa.records` into proper child span
nodes (including timestamps and status) and attach them to the parent Subagent
node so the ToolUse -> Subagent -> ... hierarchy is preserved; reuse the same
span-construction helpers used for top-level turns to create those child spans
so downstream consumers can still call `build_claude_span_tree` on materialized
child turns if needed.
- Around line 176-180: The loop currently only attaches nodes from
unpaired_subagents and drops any remaining entries in paired_subagents; instead,
after the existing for sa in unpaired_subagents loop, iterate over any leftover
items in paired_subagents and push build_subagent_node(sa, true) onto
root.children (e.g., via paired_subagents.drain(..) or into_iter() depending on
ownership) so unmatched subagents in paired_subagents are surfaced as unattached
root children; update the code that mutates root.children, paired_subagents, and
unpaired_subagents accordingly.

---

Nitpick comments:
In `@CHANGELOG.md`:
- Around line 9-20: The changelog bullets are too implementation-heavy; rewrite
each entry as a single concise, impact-first sentence and remove PR/issue links
and implementation/backstory. For the `burn overhead deltas` item, replace the
paragraph with a short user-facing line like: "Add context-delta attribution via
LedgerHandle::context_delta to show what changed between consecutive
inferences." For the `relayburn-sdk` item, replace the paragraph with a short
user-facing line like: "Add TurnSpanTree projection APIs
(LedgerHandle::turn_span_tree and LedgerHandle::session_span_trees) to expose
per-turn span trees for tooling and analysis." Ensure you drop internal details
(e.g., TurnSpanTree structure, locked attribute list, compaction behavior,
'orphan' wording) and keep only the user impact and the API symbols.

In `@crates/relayburn-sdk/src/reader/codex/span_tree.rs`:
- Around line 276-279: final_event.status is being serialized via
format!("{:?}").to_ascii_lowercase(), which couples the wire string to the Debug
representation; add a wire_str() method to the ToolResultStatus enum (similar to
StopReason::wire_str()) that returns the canonical lowercase wire string for
each variant, update the code that sets the node attribute to call
final_event.status.wire_str() instead of using Debug-based formatting, and
ensure all other places that serialize ToolResultStatus use the new wire_str()
to make the contract explicit and stable.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d8fbbd76-7ba9-4d50-970e-78c69c141584

📥 Commits

Reviewing files that changed from the base of the PR and between 131e31e and 05f9aa3.

📒 Files selected for processing (16)
  • CHANGELOG.md
  • crates/relayburn-cli/src/cli.rs
  • crates/relayburn-cli/src/commands/overhead.rs
  • crates/relayburn-sdk/src/analyze.rs
  • crates/relayburn-sdk/src/analyze/context_delta.rs
  • crates/relayburn-sdk/src/analyze/span_tree.rs
  • crates/relayburn-sdk/src/lib.rs
  • crates/relayburn-sdk/src/query_verbs.rs
  • crates/relayburn-sdk/src/reader.rs
  • crates/relayburn-sdk/src/reader/claude.rs
  • crates/relayburn-sdk/src/reader/claude/span_tree.rs
  • crates/relayburn-sdk/src/reader/codex.rs
  • crates/relayburn-sdk/src/reader/codex/span_tree.rs
  • tests/fixtures/cli-golden/invocations.json
  • tests/fixtures/cli-golden/snapshots/overhead-deltas-json.stdout.txt
  • tests/fixtures/cli-golden/snapshots/overhead-deltas.stdout.txt

Comment thread crates/relayburn-cli/src/commands/overhead.rs Outdated
Comment thread crates/relayburn-sdk/src/analyze/context_delta.rs Outdated
Comment thread crates/relayburn-sdk/src/query_verbs.rs Outdated
Comment thread crates/relayburn-sdk/src/query_verbs.rs
Comment thread crates/relayburn-sdk/src/reader/claude/span_tree.rs
Comment on lines +462 to +468
// We do NOT recursively build a span tree from `sa.records` here:
// the parser hands us raw `Value` rows, and re-running the parse
// pipeline against an in-memory sidecar is the ingest path's job,
// not the span-tree builder's. Downstream consumers can call
// `build_claude_span_tree` against the materialized child turn(s)
// once they're in the ledger and stitch the subtree client-side.
node

@coderabbitai coderabbitai Bot May 26, 2026

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

The subagent transcript contents are discarded here.

sa.records never become child spans, timestamps, or status on this node, so every Subagent is a leaf. That means the resulting tree cannot actually represent the advertised ToolUse -> Subagent -> ... hierarchy, and any delegated inference/tool activity disappears from the parent turn’s span tree.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/relayburn-sdk/src/reader/claude/span_tree.rs` around lines 462 - 468,
The subagent transcript (sa.records) is being ignored so Subagent nodes are
created as leaves; update the span-tree builder (the code around the `node`
return in build_claude_span_tree / the span-tree construction for Subagent) to
materialize child spans from `sa.records` rather than discarding them:
parse/convert each entry in `sa.records` into proper child span nodes (including
timestamps and status) and attach them to the parent Subagent node so the
ToolUse -> Subagent -> ... hierarchy is preserved; reuse the same
span-construction helpers used for top-level turns to create those child spans
so downstream consumers can still call `build_claude_span_tree` on materialized
child turns if needed.

Member Author

Acknowledged — this is the documented limitation from PR #451's "Deferred to follow-ups" list ("Content sidecar reads for UserPrompt text body and ToolResult attached file content. The spans are emitted as structural placeholders; future PRs can populate the content without changing tree shape."). The current build_subagent_node deliberately treats Subagent as a leaf with the agent metadata attributes but no child spans derived from sa.records.

Materializing a full child span tree from sa.records is a non-trivial enhancement (requires re-running the parse pipeline against the in-memory sidecar transcript), so it belongs in its own follow-up issue rather than this fix-pass. I'll open a separate issue to track it.


Generated by Claude Code

@willwashburn Thanks for the context — the inline comment at that exact location makes the intent clear, and I should have caught it before raising this. The deferred-to-follow-up semantics are well-documented both in code and in the PR description.

Would you like me to open a follow-up GitHub issue to track "Materialize child span trees from sa.records in build_subagent_node"? I can include backlinks to this PR and the relevant comment so it's easy to pick up later.


✏️ Learnings added
Learnt from: willwashburn
Repo: AgentWorkforce/burn PR: 452
File: crates/relayburn-sdk/src/reader/claude/span_tree.rs:494-500
Timestamp: 2026-05-26T12:54:28.449Z
Learning: In `crates/relayburn-sdk/src/reader/claude/span_tree.rs`, `build_subagent_node` intentionally treats `Subagent` spans as leaves — `sa.records` (the raw subagent transcript entries) are **not** materialized into child spans. This is a documented design decision deferred from PR `#451` ("Content sidecar reads" follow-up): re-running the parse pipeline against an in-memory sidecar is considered the ingest path's job, not the span-tree builder's. Downstream consumers are expected to call `build_claude_span_tree` on materialized child turns in the ledger and stitch the subtree client-side. Do not flag the absence of recursive child-span construction here as a bug.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


@cubic-dev-ai cubic-dev-ai Bot left a comment

4 issues found

Comment thread crates/relayburn-sdk/src/reader/claude/span_tree.rs
Comment thread crates/relayburn-cli/src/commands/overhead.rs Outdated
Comment thread crates/relayburn-sdk/src/reader/codex/span_tree.rs
Comment thread crates/relayburn-sdk/src/reader/claude/span_tree.rs Outdated
claude added 2 commits May 26, 2026 12:51
New `burn overhead deltas` verb answers "what blew up my context
between inference N and inference N+1?" by walking each session's
TurnSpanTree timeline, pairing same-rail Inference spans, and
attributing the delta in `tokens.input + cache_read + cache_write`
to the intervening ToolResult / UserPrompt / SystemReminder leaves.

SDK surface: `LedgerHandle::context_delta(ContextDeltaOpts)` returns
`Vec<ContextDelta>` with per-step intervening breakdown, attributed
cost (charged at cache_read rate — what the *future* will pay for
the persisted prefix), and compaction events surfaced as their own
row rather than negative deltas. Main-rail deltas never see subagent
tool_results and vice versa.

Tool-result token estimates use `output_bytes / 4` as a first-cut
fallback; documented as approximate in the output.

CLI: `burn overhead deltas [--session ID] [--top N] [--min-delta TOK]
[--owner main|subagent|all] [--explain] [--json]`. Default top is 20,
default min_delta is 1000 tokens. Compaction rows ignore min_delta so
they always surface.

Tests: 9 unit tests covering the Bash blow-up driver path, compaction-
replaces-negative-delta, subagent isolation, owner filter, top cap,
min_delta filter, single-inference no-op, and JSON wire format. Two
golden snapshots (`overhead-deltas`, `overhead-deltas-json`) anchor
the CLI output against the fixture ledger.
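
The arithmetic described in this commit message can be sketched roughly as below. All type and function names are hypothetical stand-ins, not the SDK's actual definitions, and treating every shrink as a compaction is a simplification made for the sketch; the price passed to `attributed_cost_usd` is whatever cache-read rate the caller supplies.

```rust
/// Hypothetical, simplified stand-in for an Inference span's token usage.
#[derive(Debug, Clone, Copy)]
pub struct InferenceTokens {
    pub input: u64,
    pub cache_read: u64,
    pub cache_write: u64,
}

impl InferenceTokens {
    /// Context size as described above: input + cache_read + cache_write.
    pub fn context_tokens(&self) -> u64 {
        self.input + self.cache_read + self.cache_write
    }
}

/// First-cut tool-result estimate from the commit message: bytes / 4.
pub fn approx_tokens_from_bytes(output_bytes: u64) -> u64 {
    output_bytes / 4
}

#[derive(Debug, PartialEq)]
pub enum DeltaRow {
    Growth { delta_tokens: u64 },
    /// A shrink is reported as its own row instead of a negative delta.
    Compaction { tokens_freed: u64 },
}

/// Compare two consecutive same-rail inferences (assumption: any shrink is a compaction).
pub fn classify_delta(prior: &InferenceTokens, current: &InferenceTokens) -> DeltaRow {
    let (p, c) = (prior.context_tokens(), current.context_tokens());
    if c >= p {
        DeltaRow::Growth { delta_tokens: c - p }
    } else {
        DeltaRow::Compaction { tokens_freed: p - c }
    }
}

/// Charge the delta at a cache-read price given in USD per million tokens.
pub fn attributed_cost_usd(delta_tokens: u64, cache_read_usd_per_mtok: f64) -> f64 {
    delta_tokens as f64 * cache_read_usd_per_mtok / 1_000_000.0
}
```

For instance, a 160,000-byte tool result would be estimated at 40,000 tokens under the bytes / 4 fallback, which is why the output flags these figures as approximate.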

In-scope #452 fixes (A-G):

- `burn overhead deltas` now honors `--since` (A): thread the parent
  overhead args' `since` into `ContextDeltaOpts`, parse relative ranges
  (`24h`/`7d`/`4w`/`2m`) into `Duration`, and use that to scope the
  session-enumeration `query_turns` seed inside the SDK so the
  `None`-session path no longer walks every historical session.
- Make the per-rail and cross-session `ContextDelta` sort fully
  deterministic across HashMap iteration order (B): chain
  `owner_rail` then `session_id` as final tie-breakers.
- UTF-8-safe `short_turn_label` / `short_agent_label` (C): switch from
  byte slicing to `chars().take(8)` so multi-byte ids never panic.
- `format_signed_tokens` preserves the negative sign (D): emit `-`
  for negative deltas instead of dropping it.
- Sort compactions with `sort_by_cached_key` (E) so `parse_iso_ms`
  runs once per element rather than once per comparison.
- Dedup four copies of `parse_iso_ms` (F) into
  `crates/relayburn-sdk/src/util/time.rs`; the analyze/context_delta,
  query_verbs, reader/claude, and reader/codex copies now share one
  implementation.
- `read_jsonl_values` streams via `BufReader::lines()` (G) rather
  than reading the entire file into memory.

Foundation fixes (carried in #452's diff; will also land on #451):

- Propagate `output_truncated` on `ToolResult` span nodes (H) in
  both the Claude and Codex builders so downstream consumers can
  flag truncated tool outputs.
- Propagate `ToolResult` error status to the parent `ToolUse` (I, J)
  in both builders — the runtime tool_result is the ground truth,
  not the assistant row's `is_error` hint.
- Don't drop subagents whose `paired_tool_use_id` doesn't match any
  ToolUse in the turn (K): drain leftover `paired_subagents` after
  the inference walk and surface them as `unattached` siblings under
  the turn root.
- Stop swallowing real ledger-read failures (M): replace the blanket
  `unwrap_or_default()` on `query_inferences` /
  `query_tool_result_events` with a `match` that tolerates only the
  pre-schema "no such table/column" class of error and propagates
  every other failure.
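
The shape of fix M might look something like the sketch below. `LedgerError`, `is_pre_schema_error`, and `rows_or_empty` are hypothetical names invented for illustration, and matching on the message text is an assumption about how the "no such table/column" class is detected.

```rust
/// Hypothetical error type; the SDK's real error enum is not shown here.
#[derive(Debug, PartialEq)]
pub enum LedgerError {
    Sql(String),
    Io(String),
}

/// True only for the pre-schema class of error (table or column not yet created).
fn is_pre_schema_error(err: &LedgerError) -> bool {
    match err {
        LedgerError::Sql(msg) => {
            msg.contains("no such table") || msg.contains("no such column")
        }
        _ => false,
    }
}

/// Replacement for a blanket `unwrap_or_default()`: an old ledger without the
/// table yields an empty result, while every other failure is propagated.
pub fn rows_or_empty<T>(result: Result<Vec<T>, LedgerError>) -> Result<Vec<T>, LedgerError> {
    match result {
        Ok(rows) => Ok(rows),
        Err(err) if is_pre_schema_error(&err) => Ok(Vec::new()),
        Err(err) => Err(err),
    }
}
```

The guard arm keeps the tolerated case narrow, so an I/O failure or a malformed query surfaces to the caller instead of silently producing an empty delta list.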

Tests: `cargo test --workspace` (zero failures, +2 new tests in the
`query_verbs` mod for the since filter and helper). Zero warnings on
`cargo build --workspace`.
@willwashburn willwashburn force-pushed the claude/burn-432-context-delta branch from 05f9aa3 to cf3c61e Compare May 26, 2026 12:52
@willwashburn willwashburn merged commit e7f1395 into main May 26, 2026
11 checks passed
@willwashburn willwashburn deleted the claude/burn-432-context-delta branch May 26, 2026 13:04

Successfully merging this pull request may close these issues.

overhead: per-inference context-delta attribution ("what grew the window between inferences")

2 participants