overhead: per-inference context-delta attribution (#432) #452
Caution: Review failed. Failed to post review comments.

📝 Walkthrough

This PR implements per-inference context-window delta attribution by adding a shared ISO-8601 timestamp parser, improving span-tree builders with error handling and truncation tracking, introducing a context-delta analysis algorithm that pairs consecutive inferences and attributes intervening steps (tool results, user prompts, compactions), exposing SDK query verbs, and delivering a new …

Changes: Per-inference context-delta attribution pipeline
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 5 passed
Code Review
This pull request introduces per-inference context-window delta attribution and per-turn span trees as derived analytical primitives. It adds the burn overhead deltas CLI command (and corresponding SDK entry point LedgerHandle::context_delta) to attribute context-window growth to intervening steps like tool results, user prompts, system reminders, or compactions. It also implements per-turn span tree builders for Claude Code and Codex rollouts to project hierarchical turn structures. The reviewer feedback highlights several important improvement opportunities: preventing potential panics from byte-slicing UTF-8 strings in label helpers, fixing a formatting bug for negative tokens in format_signed_tokens, optimizing sorting performance using sort_by_cached_key, centralizing the duplicated parse_iso_ms utility, and improving memory efficiency when reading JSONL files by utilizing a BufReader.
fn short_turn_label(turn_id: &str) -> String {
    // Turn ids on Claude are `msg-...` UUIDs; trim to a short prefix
    // for the table. Keep the original for JSON output.
    let trimmed = turn_id.trim_start_matches("msg_");
    let trimmed = trimmed.trim_start_matches("msg-");
    if trimmed.len() > 8 {
        format!("T{}", &trimmed[..8])
    } else {
        format!("T{trimmed}")
    }
}
The short_turn_label function uses string slicing &trimmed[..8], which panics if byte index 8 does not fall on a character boundary (e.g., when the id contains multi-byte UTF-8 characters). Using .chars().take(8).collect() is safer and avoids the panic.
fn short_turn_label(turn_id: &str) -> String {
    // Turn ids on Claude are `msg-...` UUIDs; trim to a short prefix
    // for the table. Keep the original for JSON output.
    let trimmed = turn_id.trim_start_matches("msg_").trim_start_matches("msg-");
    let short: String = trimmed.chars().take(8).collect();
    format!("T{short}")
}
Fixed in cf3c61e — short_turn_label now uses chars().take(8).collect() so multi-byte UTF-8 ids never panic on a mid-byte cut.
Generated by Claude Code
fn short_agent_label(agent_id: &str) -> String {
    let trimmed = agent_id.trim_start_matches("agent-");
    if trimmed.len() > 8 {
        trimmed[..8].to_string()
    } else {
        trimmed.to_string()
    }
}
The short_agent_label function uses string slicing trimmed[..8], which panics if byte index 8 does not fall on a character boundary. Using .chars().take(8).collect() is safer and avoids the panic.
fn short_agent_label(agent_id: &str) -> String {
    let trimmed = agent_id.trim_start_matches("agent-");
    trimmed.chars().take(8).collect()
}
Fixed in cf3c61e — short_agent_label now uses chars().take(8).collect() (same fix as short_turn_label).
Generated by Claude Code
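Both fixes allocate through `collect()`. As an alternative sketch (not what cf3c61e does), a hypothetical `prefix_chars` helper can use `char_indices()` to find the byte offset of the ninth character and return a borrowed prefix, so callers that can work with `&str` skip the intermediate `String`. The `msg_`/`msg-`/`agent-` stripping mirrors the reviewed code; the rest is illustrative.

```rust
/// Returns the first `n` chars of `s` as a borrowed slice. `char_indices`
/// only yields offsets where a character starts, so the cut can never land
/// inside a multi-byte UTF-8 sequence.
fn prefix_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s, // `s` has at most `n` chars: keep all of it
    }
}

fn short_turn_label(turn_id: &str) -> String {
    // Same prefix stripping as the reviewed code.
    let trimmed = turn_id.trim_start_matches("msg_").trim_start_matches("msg-");
    format!("T{}", prefix_chars(trimmed, 8))
}

fn short_agent_label(agent_id: &str) -> String {
    prefix_chars(agent_id.trim_start_matches("agent-"), 8).to_string()
}
```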
fn format_signed_tokens(n: i64) -> String {
    let sign = if n > 0 { "+" } else { "" };
    format!("{sign}{}", format_tokens(n.unsigned_abs()))
}
The format_signed_tokens function does not handle negative numbers correctly. If n is negative, sign is set to "" and n.unsigned_abs() is formatted, resulting in a positive string representation (e.g., -500 becomes "500"). It should format with a "-" sign for negative numbers.
fn format_signed_tokens(n: i64) -> String {
    let sign = if n > 0 {
        "+"
    } else if n < 0 {
        "-"
    } else {
        ""
    };
    format!("{sign}{}", format_tokens(n.unsigned_abs()))
}
Fixed in cf3c61e — format_signed_tokens now emits - for negative deltas instead of dropping the sign.
Generated by Claude Code
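A self-contained version of the corrected function, useful for checking the sign handling in isolation. `format_tokens` here is a hypothetical stand-in (plain comma grouping); the real CLI helper's output format may differ. Matching on `n.signum()` is equivalent to the suggested if/else chain, and `unsigned_abs()` keeps `i64::MIN` from overflowing, which a plain `-n` would not.

```rust
/// Hypothetical stand-in for the CLI's `format_tokens`: groups digits with commas.
fn format_tokens(n: u64) -> String {
    let digits = n.to_string();
    let mut out = String::new();
    for (i, ch) in digits.chars().enumerate() {
        // Insert a comma whenever the remaining digit count is a multiple of 3.
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(',');
        }
        out.push(ch);
    }
    out
}

fn format_signed_tokens(n: i64) -> String {
    let sign = match n.signum() {
        1 => "+",
        -1 => "-",
        _ => "",
    };
    // unsigned_abs() is defined for every i64, including i64::MIN.
    format!("{sign}{}", format_tokens(n.unsigned_abs()))
}
```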
let mut compactions_sorted: Vec<&CompactionEvent> = compactions.iter().collect();
compactions_sorted.sort_by_key(|c| parse_iso_ms(&c.ts).unwrap_or(0));
The sort_by_key method executes the key extraction function parse_iso_ms for every comparison, so the key is computed O(n log n) times. Because parse_iso_ms involves relatively expensive string parsing and calendar calculations, sort_by_cached_key is much more efficient: it evaluates the key exactly once per element.
Suggested change:
let mut compactions_sorted: Vec<&CompactionEvent> = compactions.iter().collect();
compactions_sorted.sort_by_cached_key(|c| parse_iso_ms(&c.ts).unwrap_or(0));
Fixed in cf3c61e — switched to sort_by_cached_key so parse_iso_ms runs once per element instead of once per comparison.
Generated by Claude Code
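To make the difference concrete, the sketch below counts key-function calls under both strategies. `parse_key` is a hypothetical stand-in for `parse_iso_ms` that parses plain integers; the counts, not the parsing, are the point. `sort_by_cached_key` runs the key once per element, while `sort_by_key` runs it twice per comparison.

```rust
use std::cell::Cell;

/// Sorts the same input with `sort_by_key` and `sort_by_cached_key` and
/// returns how many times the key closure ran under each strategy.
fn count_key_calls(input: &[&str]) -> (usize, usize) {
    let calls = Cell::new(0usize);
    let parse_key = |s: &str| -> i64 {
        calls.set(calls.get() + 1);
        s.parse::<i64>().unwrap_or(0)
    };

    let mut by_key = input.to_vec();
    by_key.sort_by_key(|s| parse_key(*s));
    let key_calls = calls.replace(0);

    let mut cached = input.to_vec();
    cached.sort_by_cached_key(|s| parse_key(*s));
    let cached_calls = calls.get();

    assert_eq!(by_key, cached); // both strategies agree on the final order
    (key_calls, cached_calls)
}
```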
| } | ||
| } | ||
|
|
||
| /// ISO-8601 -> Unix-ms. Mirror of the parsers elsewhere in the SDK; kept |
The parse_iso_ms function is duplicated 4 times across different modules in this PR alone (e.g., in context_delta.rs, query_verbs.rs, claude/span_tree.rs, and codex/span_tree.rs). Since all these modules are within the same relayburn-sdk crate, they should share a single internal utility function (e.g., in a shared crate::utils or crate::reader::utils module) to improve maintainability and reduce code duplication.
Fixed in cf3c61e — moved the parser to a single pub(crate) helper at crates/relayburn-sdk/src/util/time.rs. The four ex-copies (analyze/context_delta.rs, query_verbs.rs, reader/claude/span_tree.rs, reader/codex/span_tree.rs) now all use crate::util::time::parse_iso_ms.
Generated by Claude Code
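For reference, here is one way a shared `pub(crate)` helper in `util/time.rs` could look without pulling in a date crate. This is a sketch under assumptions: the signature `fn parse_iso_ms(&str) -> Option<i64>` is inferred from the `parse_iso_ms(&c.ts).unwrap_or(0)` call above, only the `YYYY-MM-DDTHH:MM:SS[.fff][Z|±HH:MM]` shape is accepted, and a missing offset is read as UTC. The date arithmetic is Howard Hinnant's `days_from_civil` algorithm; the real helper may be implemented differently.

```rust
/// Days since 1970-01-01 for a proleptic-Gregorian date (Hinnant's days_from_civil).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400; // year of era, [0, 399]
    let doy = (153 * ((m + 9) % 12) + 2) / 5 + d - 1; // day of year, March-based
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Parses a non-empty run of ASCII digits; rejects signs and spaces.
fn digits(part: &str) -> Option<i64> {
    if !part.is_empty() && part.bytes().all(|c| c.is_ascii_digit()) {
        part.parse().ok()
    } else {
        None
    }
}

/// `YYYY-MM-DDTHH:MM:SS[.fff...][Z|±HH:MM]` -> Unix milliseconds (UTC if no offset).
pub(crate) fn parse_iso_ms(s: &str) -> Option<i64> {
    let s = s.trim();
    let b = s.as_bytes();
    if b.len() < 19 || b[4] != b'-' || b[7] != b'-' || !matches!(b[10], b'T' | b't' | b' ')
        || b[13] != b':' || b[16] != b':'
    {
        return None;
    }
    let year = digits(s.get(0..4)?)?;
    let month = digits(s.get(5..7)?)?;
    let day = digits(s.get(8..10)?)?;
    let hour = digits(s.get(11..13)?)?;
    let minute = digits(s.get(14..16)?)?;
    let second = digits(s.get(17..19)?)?;
    if !(1..=12).contains(&month) || !(1..=31).contains(&day) || hour > 23 || minute > 59 || second > 60 {
        return None;
    }

    // Optional fractional seconds, truncated to millisecond precision.
    let mut i = 19;
    let mut millis = 0;
    if b.get(i) == Some(&b'.') {
        let start = i + 1;
        i = start;
        while i < b.len() && b[i].is_ascii_digit() {
            i += 1;
        }
        let frac = s.get(start..i.min(start + 3))?;
        millis = digits(frac)? * 10_i64.pow(3 - frac.len() as u32);
    }

    // Offset: end of string, `Z`, or `±HH:MM`.
    let tz = s.get(i..)?;
    let offset_min = match tz {
        "" | "Z" | "z" => 0,
        _ if tz.len() == 6 && tz.as_bytes()[3] == b':' => {
            let sign = match tz.as_bytes()[0] {
                b'+' => 1,
                b'-' => -1,
                _ => return None,
            };
            sign * (digits(tz.get(1..3)?)? * 60 + digits(tz.get(4..6)?)?)
        }
        _ => return None,
    };

    let secs = days_from_civil(year, month, day) * 86_400 + hour * 3_600 + minute * 60 + second;
    Some(secs * 1_000 + millis - offset_min * 60_000)
}
```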
fn read_jsonl_values(path: &Path) -> Vec<serde_json::Value> {
    let bytes = match std::fs::read(path) {
        Ok(b) => b,
        Err(_) => return Vec::new(),
    };
    let text = match std::str::from_utf8(&bytes) {
        Ok(s) => s,
        Err(_) => return Vec::new(),
    };
    text.lines()
        .filter_map(|line| {
            let t = line.trim();
            if t.is_empty() {
                None
            } else {
                serde_json::from_str::<serde_json::Value>(t).ok()
            }
        })
        .collect()
}
The read_jsonl_values function reads the entire file into memory at once using std::fs::read and then converts it to a string. For large JSONL files, this can lead to high memory usage or OOM. Reading the file line-by-line using BufReader is much more memory-efficient and robust.
fn read_jsonl_values(path: &Path) -> Vec<serde_json::Value> {
    use std::io::BufRead;
    let file = match std::fs::File::open(path) {
        Ok(f) => f,
        Err(_) => return Vec::new(),
    };
    let reader = std::io::BufReader::new(file);
    reader
        .lines()
        .filter_map(|line| {
            let l = line.ok()?;
            let t = l.trim();
            if t.is_empty() {
                None
            } else {
                serde_json::from_str::<serde_json::Value>(t).ok()
            }
        })
        .collect()
}
Fixed in cf3c61e — read_jsonl_values now opens the file and streams via BufReader::lines() instead of slurping the entire payload into memory.
Generated by Claude Code
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 05f9aa3e6c
    // of `since` — that flag caps output rows, not the
    // input scan).
    let mut ids: BTreeSet<String> = BTreeSet::new();
    let all = self.inner.query_turns(&Query::default())?;
Apply `since` filtering when collecting context deltas
ContextDeltaOpts includes a time window, but this path always enumerates every session with query_turns(&Query::default()) and never consults opts.since/effective_since, so old sessions can still dominate overhead deltas results even when callers expect a recent window. This makes the top-N output incorrect for bounded-time analysis and breaks the documented since semantics.
Fixed in cf3c61e — --since now flows from the parent overhead args through run_deltas into ContextDeltaOpts.since, and the SDK seeds the session-enumeration query_turns with a matching since-scoped Query. Sessions whose latest activity falls outside the window are skipped before their span trees get loaded. New regression test: query_verbs::tests::context_delta_since_filter_excludes_old_sessions.
Generated by Claude Code
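The fix-pass commit says relative ranges (`24h`/`7d`/`4w`/`2m`) are parsed into a `Duration` and used to scope session enumeration. A minimal sketch of both halves follows; `parse_since`, `TurnRow`, and `sessions_in_window` are hypothetical names, reading `m` as a 30-day month is an assumption, and the real code filters through a `since`-scoped `Query` rather than an in-memory slice.

```rust
use std::collections::BTreeSet;
use std::time::Duration;

/// Parses a relative window such as `24h`, `7d`, `4w` or `2m`.
/// Assumption: `m` means a 30-day month, not minutes.
fn parse_since(s: &str) -> Option<Duration> {
    let s = s.trim();
    let unit = s.chars().last()?;
    let count: u64 = s[..s.len() - unit.len_utf8()].parse().ok()?;
    let secs_per_unit: u64 = match unit {
        'h' => 3_600,
        'd' => 86_400,
        'w' => 7 * 86_400,
        'm' => 30 * 86_400,
        _ => return None,
    };
    count.checked_mul(secs_per_unit).map(Duration::from_secs)
}

/// Hypothetical, simplified ledger row: one turn and when it happened.
struct TurnRow {
    session_id: String,
    ts_ms: i64,
}

/// Session ids with at least one turn at or after `now_ms - since`.
/// `since == None` keeps every session (the pre-fix behavior).
fn sessions_in_window(rows: &[TurnRow], now_ms: i64, since: Option<Duration>) -> BTreeSet<String> {
    let cutoff = since.map(|d| now_ms.saturating_sub(d.as_millis() as i64));
    rows.iter()
        .filter(|r| cutoff.map_or(true, |c| r.ts_ms >= c))
        .map(|r| r.session_id.clone())
        .collect()
}
```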
if let Some(bytes) = final_event.output_bytes {
    node.set_attr("output_bytes", AttrValue::Int(bytes as i64));
}
Preserve tool-result truncation metadata on span nodes
This mapper copies output_bytes from ToolResultEventRecord but drops output_truncated, so downstream delta attribution never marks intervening tool results as truncated. In practice, overhead deltas --explain can present large tool outputs as fully representative even when ingest detected truncation, which degrades attribution accuracy and operator trust; propagate output_truncated onto the ToolResult span (and mirror the same fix in the Codex builder).
Fixed in cf3c61e — both the Claude and Codex build_tool_result_node now propagate output_truncated onto the ToolResult span as an AttrValue::Bool attribute. The context-delta InterveningStep::ToolResult.truncated and --explain render already consume it.
Generated by Claude Code
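A sketch of the propagation on simplified, hypothetical types (`AttrValue`, `SpanNode`, and a `ToolResultEvent` with an assumed `output_truncated: bool` field). The writer sets an `output_truncated` Bool attribute next to `output_bytes`, and a reader such as delta attribution checks for it.

```rust
use std::collections::BTreeMap;

/// Hypothetical attribute value type; only the two variants used here.
#[derive(Debug, Clone, PartialEq)]
enum AttrValue {
    Int(i64),
    Bool(bool),
}

/// Hypothetical span node carrying string-keyed attributes.
#[derive(Debug, Default)]
struct SpanNode {
    attrs: BTreeMap<String, AttrValue>,
}

impl SpanNode {
    fn set_attr(&mut self, key: &str, value: AttrValue) {
        self.attrs.insert(key.to_string(), value);
    }
    fn attr(&self, key: &str) -> Option<&AttrValue> {
        self.attrs.get(key)
    }
}

/// Hypothetical subset of the tool-result event; field types are assumed.
struct ToolResultEvent {
    output_bytes: Option<u64>,
    output_truncated: bool,
}

/// Copies both size and truncation metadata onto the ToolResult span.
fn apply_tool_result_attrs(node: &mut SpanNode, ev: &ToolResultEvent) {
    if let Some(bytes) = ev.output_bytes {
        node.set_attr("output_bytes", AttrValue::Int(bytes as i64));
    }
    node.set_attr("output_truncated", AttrValue::Bool(ev.output_truncated));
}

/// What a downstream reader (e.g. delta attribution) would check.
fn is_truncated(node: &SpanNode) -> bool {
    matches!(node.attr("output_truncated"), Some(AttrValue::Bool(true)))
}
```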
Actionable comments posted: 6
🧹 Nitpick comments (2)
CHANGELOG.md (1)
9-20: ⚡ Quick win

Shorten the new [Unreleased] bullets to impact-first release notes.

These entries are currently too implementation-heavy for the changelog style in this repo. Please condense to one short user-impact bullet per change and remove issue-link/backstory density.
As per coding guidelines: “Changelog entries should be concise and impact-first… Drop issue/PR links, internal review notes, implementation backstory, and ‘foundation for…’ phrasing…”.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@CHANGELOG.md` around lines 9 - 20, The changelog bullets are too implementation-heavy; rewrite each entry as a single concise, impact-first sentence and remove PR/issue links and implementation/backstory. For the `burn overhead deltas` item, replace the paragraph with a short user-facing line like: "Add context-delta attribution via LedgerHandle::context_delta to show what changed between consecutive inferences." For the `relayburn-sdk` item, replace the paragraph with a short user-facing line like: "Add TurnSpanTree projection APIs (LedgerHandle::turn_span_tree and LedgerHandle::session_span_trees) to expose per-turn span trees for tooling and analysis." Ensure you drop internal details (e.g., TurnSpanTree structure, locked attribute list, compaction behavior, 'orphan' wording) and keep only the user impact and the API symbols.

crates/relayburn-sdk/src/reader/codex/span_tree.rs (1)
276-279: 💤 Low value

Consider adding a `wire_str()` method to `ToolResultStatus` for consistency.

Line 75 uses `reason.wire_str()` for `StopReason`, but here `ToolResultStatus` relies on `{:?}` Debug formatting + lowercase. If the Debug representation changes (e.g., enum variant rename), the wire string silently changes. A dedicated `wire_str()` method would make the contract explicit.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/relayburn-sdk/src/reader/codex/span_tree.rs` around lines 276 - 279, final_event.status is being serialized via format!("{:?}").to_ascii_lowercase(), which couples the wire string to the Debug representation; add a wire_str() method to the ToolResultStatus enum (similar to StopReason::wire_str()) that returns the canonical lowercase wire string for each variant, update the code that sets the node attribute to call final_event.status.wire_str() instead of using Debug-based formatting, and ensure all other places that serialize ToolResultStatus use the new wire_str() to make the contract explicit and stable.
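One possible shape for the suggested method, following the `StopReason::wire_str()` pattern the comment cites. The variant set (`Success`, `Error`) is hypothetical; the real enum's variants and wire spellings should be copied from its current Debug-lowercased output so the wire format does not change on the switch.

```rust
/// Hypothetical variant set; the real `ToolResultStatus` may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToolResultStatus {
    Success,
    Error,
}

impl ToolResultStatus {
    /// Stable wire spelling, independent of the `Debug` derive.
    const fn wire_str(self) -> &'static str {
        match self {
            ToolResultStatus::Success => "success",
            ToolResultStatus::Error => "error",
        }
    }
}
```

With the explicit match, renaming a variant no longer changes the serialized value, and a test that pins each variant's `wire_str()` keeps the contract visible.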
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/relayburn-cli/src/commands/overhead.rs`:
- Around line 32-33: The deltas branch currently drops the parent `since` and
constructs query options with `since: None`, so `burn overhead deltas --since
...` is ignored; update the deltas path to accept and propagate the parent
`since` into the deltas query options: modify the call/site around
OverheadAction::Deltas(deltas) / run_deltas(globals, deltas) so `run_deltas` (or
the code that builds the deltas query) receives the `args.since` value and uses
it when building the query options (replace `since: None` with the passed
`since`), and ensure any helper that constructs the query options (the code
around the current `since: None` usage) uses that propagated value.
In `@crates/relayburn-sdk/src/analyze/context_delta.rs`:
- Around line 379-386: The current sort closure used in out.sort_by compares
delta_tokens, turn_id, and inference_idx but still leaves ties dependent on
per_rail insertion order; update the comparator used where out.sort_by(...) is
defined to add additional deterministic tie-breakers by chaining .then_with(||
a.owner_rail.cmp(&b.owner_rail)) and finally .then_with(||
a.session_id.cmp(&b.session_id)) (or vice‑versa if session_id is preferred
first) so the ordering is fully deterministic across HashMap iteration; locate
the sort_by closure that references delta_tokens, turn_id, and inference_idx and
append these comparisons to it.
In `@crates/relayburn-sdk/src/query_verbs.rs`:
- Around line 4581-4593: When opts.session is None, the current session_ids
collection uses Query::default() which ignores opts.since; change it to
construct a Query that sets the since field from opts.since (or equivalent
time-scope) and pass that query to self.inner.query_turns to seed the BTreeSet
of session IDs, then continue to call deltas_for_session (or other full-tree
loaders) only for those session IDs; update the code paths that build the Query
and call query_turns (refer to opts.since, session_ids, self.inner.query_turns,
and deltas_for_session) so the initial session enumeration honors the --since
filter.
- Around line 4238-4240: Replace the unconditional unwrap_or_default() on the
query results so real ledger-read errors are not silently swallowed: for both
self.inner.query_inferences(&session_q) and
self.inner.query_tool_result_events(&session_q) match the Result and only fall
back to the pre-schema default when the error explicitly indicates a missing
table/column/schema; for any other Err return propagate the error (e.g., using
the ? operator or returning Err) so read failures surface. Reference the
query_inferences/query_tool_result_events calls and the
inferences/tool_result_events variables when making this change, or extract a
helper like is_schema_missing(err) to centralize the schema-missing check.
In `@crates/relayburn-sdk/src/reader/claude/span_tree.rs`:
- Around line 462-468: The subagent transcript (sa.records) is being ignored so
Subagent nodes are created as leaves; update the span-tree builder (the code
around the `node` return in build_claude_span_tree / the span-tree construction
for Subagent) to materialize child spans from `sa.records` rather than
discarding them: parse/convert each entry in `sa.records` into proper child span
nodes (including timestamps and status) and attach them to the parent Subagent
node so the ToolUse -> Subagent -> ... hierarchy is preserved; reuse the same
span-construction helpers used for top-level turns to create those child spans
so downstream consumers can still call `build_claude_span_tree` on materialized
child turns if needed.
- Around line 176-180: The loop currently only attaches nodes from
unpaired_subagents and drops any remaining entries in paired_subagents; instead,
after the existing for sa in unpaired_subagents loop, iterate over any leftover
items in paired_subagents and push build_subagent_node(sa, true) onto
root.children (e.g., via paired_subagents.drain(..) or into_iter() depending on
ownership) so unmatched subagents in paired_subagents are surfaced as unattached
root children; update the code that mutates root.children, paired_subagents, and
unpaired_subagents accordingly.
---
Nitpick comments:
In `@CHANGELOG.md`:
- Around line 9-20: The changelog bullets are too implementation-heavy; rewrite
each entry as a single concise, impact-first sentence and remove PR/issue links
and implementation/backstory. For the `burn overhead deltas` item, replace the
paragraph with a short user-facing line like: "Add context-delta attribution via
LedgerHandle::context_delta to show what changed between consecutive
inferences." For the `relayburn-sdk` item, replace the paragraph with a short
user-facing line like: "Add TurnSpanTree projection APIs
(LedgerHandle::turn_span_tree and LedgerHandle::session_span_trees) to expose
per-turn span trees for tooling and analysis." Ensure you drop internal details
(e.g., TurnSpanTree structure, locked attribute list, compaction behavior,
'orphan' wording) and keep only the user impact and the API symbols.
In `@crates/relayburn-sdk/src/reader/codex/span_tree.rs`:
- Around line 276-279: final_event.status is being serialized via
format!("{:?}").to_ascii_lowercase(), which couples the wire string to the Debug
representation; add a wire_str() method to the ToolResultStatus enum (similar to
StopReason::wire_str()) that returns the canonical lowercase wire string for
each variant, update the code that sets the node attribute to call
final_event.status.wire_str() instead of using Debug-based formatting, and
ensure all other places that serialize ToolResultStatus use the new wire_str()
to make the contract explicit and stable.
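For the `context_delta.rs` sort comment in the inline list above, here is a sketch of the fully chained comparator on a hypothetical `DeltaRow` subset of `ContextDelta`. Descending `delta_tokens` as the primary key is an assumption (top-N by growth); the final tie-breakers (`owner_rail`, then `session_id`) follow the order given in the fix-pass commit message.

```rust
/// Hypothetical owner rail; deriving `Ord` orders Main before Subagent, then by agent_id.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum OwnerRail {
    Main,
    Subagent { agent_id: String },
}

/// Hypothetical subset of `ContextDelta` used for ordering.
#[derive(Debug, Clone, PartialEq)]
struct DeltaRow {
    session_id: String,
    turn_id: String,
    inference_idx: usize,
    owner_rail: OwnerRail,
    delta_tokens: i64,
}

fn sort_deltas(out: &mut [DeltaRow]) {
    out.sort_by(|a, b| {
        b.delta_tokens
            .cmp(&a.delta_tokens) // assumed: largest growth first
            .then_with(|| a.turn_id.cmp(&b.turn_id))
            .then_with(|| a.inference_idx.cmp(&b.inference_idx))
            // Rows that tie on every key below are indistinguishable, so the
            // result no longer depends on HashMap insertion order.
            .then_with(|| a.owner_rail.cmp(&b.owner_rail))
            .then_with(|| a.session_id.cmp(&b.session_id))
    });
}
```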
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: d8fbbd76-7ba9-4d50-970e-78c69c141584
📒 Files selected for processing (16)
- CHANGELOG.md
- crates/relayburn-cli/src/cli.rs
- crates/relayburn-cli/src/commands/overhead.rs
- crates/relayburn-sdk/src/analyze.rs
- crates/relayburn-sdk/src/analyze/context_delta.rs
- crates/relayburn-sdk/src/analyze/span_tree.rs
- crates/relayburn-sdk/src/lib.rs
- crates/relayburn-sdk/src/query_verbs.rs
- crates/relayburn-sdk/src/reader.rs
- crates/relayburn-sdk/src/reader/claude.rs
- crates/relayburn-sdk/src/reader/claude/span_tree.rs
- crates/relayburn-sdk/src/reader/codex.rs
- crates/relayburn-sdk/src/reader/codex/span_tree.rs
- tests/fixtures/cli-golden/invocations.json
- tests/fixtures/cli-golden/snapshots/overhead-deltas-json.stdout.txt
- tests/fixtures/cli-golden/snapshots/overhead-deltas.stdout.txt
// We do NOT recursively build a span tree from `sa.records` here:
// the parser hands us raw `Value` rows, and re-running the parse
// pipeline against an in-memory sidecar is the ingest path's job,
// not the span-tree builder's. Downstream consumers can call
// `build_claude_span_tree` against the materialized child turn(s)
// once they're in the ledger and stitch the subtree client-side.
node
The subagent transcript contents are discarded here.
sa.records never become child spans, timestamps, or status on this node, so every Subagent is a leaf. That means the resulting tree cannot actually represent the advertised ToolUse -> Subagent -> ... hierarchy, and any delegated inference/tool activity disappears from the parent turn’s span tree.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/relayburn-sdk/src/reader/claude/span_tree.rs` around lines 462 - 468,
The subagent transcript (sa.records) is being ignored so Subagent nodes are
created as leaves; update the span-tree builder (the code around the `node`
return in build_claude_span_tree / the span-tree construction for Subagent) to
materialize child spans from `sa.records` rather than discarding them:
parse/convert each entry in `sa.records` into proper child span nodes (including
timestamps and status) and attach them to the parent Subagent node so the
ToolUse -> Subagent -> ... hierarchy is preserved; reuse the same
span-construction helpers used for top-level turns to create those child spans
so downstream consumers can still call `build_claude_span_tree` on materialized
child turns if needed.
Acknowledged — this is the documented limitation from PR #451's "Deferred to follow-ups" list ("Content sidecar reads for UserPrompt text body and ToolResult attached file content. The spans are emitted as structural placeholders; future PRs can populate the content without changing tree shape."). The current build_subagent_node deliberately treats Subagent as a leaf with the agent metadata attributes but no child spans derived from sa.records.
Materializing a full child span tree from sa.records is a non-trivial enhancement (requires re-running the parse pipeline against the in-memory sidecar transcript), so it belongs in its own follow-up issue rather than this fix-pass. I'll open a separate issue to track it.
Generated by Claude Code
@willwashburn Thanks for the context — the inline comment at that exact location makes the intent clear, and I should have caught it before raising this. The deferred-to-follow-up semantics are well-documented both in code and in the PR description.
Would you like me to open a follow-up GitHub issue to track "Materialize child span trees from sa.records in build_subagent_node"? I can include backlinks to this PR and the relevant comment so it's easy to pick up later.
✏️ Learnings added
Learnt from: willwashburn
Repo: AgentWorkforce/burn PR: 452
File: crates/relayburn-sdk/src/reader/claude/span_tree.rs:494-500
Timestamp: 2026-05-26T12:54:28.449Z
Learning: In `crates/relayburn-sdk/src/reader/claude/span_tree.rs`, `build_subagent_node` intentionally treats `Subagent` spans as leaves — `sa.records` (the raw subagent transcript entries) are **not** materialized into child spans. This is a documented design decision deferred from PR `#451` ("Content sidecar reads" follow-up): re-running the parse pipeline against an in-memory sidecar is considered the ingest path's job, not the span-tree builder's. Downstream consumers are expected to call `build_claude_span_tree` on materialized child turns in the ledger and stitch the subtree client-side. Do not flag the absence of recursive child-span construction here as a bug.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
4 issues found
New `burn overhead deltas` verb answers "what blew up my context between inference N and inference N+1?" by walking each session's TurnSpanTree timeline, pairing same-rail Inference spans, and attributing the delta in `tokens.input + cache_read + cache_write` to the intervening ToolResult / UserPrompt / SystemReminder leaves.

SDK surface: `LedgerHandle::context_delta(ContextDeltaOpts)` returns `Vec<ContextDelta>` with per-step intervening breakdown, attributed cost (charged at the cache_read rate, i.e. what the *future* will pay for the persisted prefix), and compaction events surfaced as their own row rather than negative deltas. Main-rail deltas never see subagent tool_results and vice versa. Tool-result token estimates use `output_bytes / 4` as a first-cut fallback; documented as approximate in the output.

CLI: `burn overhead deltas [--session ID] [--top N] [--min-delta TOK] [--owner main|subagent|all] [--explain] [--json]`. Default top is 20, default min_delta is 1000 tokens. Compaction rows ignore min_delta so they always surface.

Tests: 9 unit tests covering the Bash blow-up driver path, compaction-replaces-negative-delta, subagent isolation, owner filter, top cap, min_delta filter, single-inference no-op, and JSON wire format. Two golden snapshots (`overhead-deltas`, `overhead-deltas-json`) anchor the CLI output against the fixture ledger.
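The attribution rules in this commit message (context size as `input + cache_read + cache_write`, `output_bytes / 4` tool-result estimates, cost at the cache-read rate on positive deltas only, a compaction row in place of a negative delta, and compaction rows bypassing `min_delta`) can be sketched as below. All names are hypothetical, the `>= min_delta` comparison is an assumption, and the rate used in the checks is illustrative rather than a real model price.

```rust
/// Hypothetical per-inference usage numbers.
struct Usage {
    input: u64,
    cache_read: u64,
    cache_write: u64,
}

/// Context size as described: input + cache_read + cache_write.
fn context_tokens(u: &Usage) -> i64 {
    (u.input + u.cache_read + u.cache_write) as i64
}

/// First-cut tool-result estimate: bytes / 4 (no tokenizer pass).
fn approx_tokens(output_bytes: u64) -> u64 {
    output_bytes / 4
}

#[derive(Debug, PartialEq)]
enum Row {
    Delta { delta_tokens: i64, cost_usd: f64 },
    Compaction { tokens_freed: i64 },
}

/// Classifies one prev -> curr pair. `cache_read_per_mtok` is USD per
/// million cache-read tokens.
fn classify(prev: &Usage, curr: &Usage, compaction_between: bool, cache_read_per_mtok: f64) -> Row {
    let delta = context_tokens(curr) - context_tokens(prev);
    if delta < 0 && compaction_between {
        Row::Compaction { tokens_freed: -delta }
    } else {
        // Only growth is charged; shrinkage without a compaction costs nothing.
        let cost_usd = delta.max(0) as f64 * cache_read_per_mtok / 1_000_000.0;
        Row::Delta { delta_tokens: delta, cost_usd }
    }
}

/// Keeps rows at or above `min_delta`; compaction rows always survive.
fn passes_filter(row: &Row, min_delta: i64) -> bool {
    match row {
        Row::Compaction { .. } => true,
        Row::Delta { delta_tokens, .. } => *delta_tokens >= min_delta,
    }
}
```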
In-scope #452 fixes (A-G):
- `burn overhead deltas` now honors `--since` (A): thread the parent overhead args' `since` into `ContextDeltaOpts`, parse relative ranges (`24h`/`7d`/`4w`/`2m`) into `Duration`, and use that to scope the session-enumeration `query_turns` seed inside the SDK so the `None`-session path no longer walks every historical session.
- Make the per-rail and cross-session `ContextDelta` sort fully deterministic across HashMap iteration order (B): chain `owner_rail` then `session_id` as final tie-breakers.
- UTF-8-safe `short_turn_label` / `short_agent_label` (C): switch from byte slicing to `chars().take(8)` so multi-byte ids never panic.
- `format_signed_tokens` preserves the negative sign (D): emit `-` for negative deltas instead of dropping it.
- Sort compactions with `sort_by_cached_key` (E) so `parse_iso_ms` runs once per element rather than once per comparison.
- Dedup four copies of `parse_iso_ms` (F) into `crates/relayburn-sdk/src/util/time.rs`; the analyze/context_delta, query_verbs, reader/claude, and reader/codex copies now share one implementation.
- `read_jsonl_values` streams via `BufReader::lines()` (G) rather than reading the entire file into memory.

Foundation fixes (carried in #452's diff; will also land on #451):
- Propagate `output_truncated` on `ToolResult` span nodes (H) in both the Claude and Codex builders so downstream consumers can flag truncated tool outputs.
- Propagate `ToolResult` error status to the parent `ToolUse` (I, J) in both builders; the runtime tool_result is the ground truth, not the assistant row's `is_error` hint.
- Don't drop subagents whose `paired_tool_use_id` doesn't match any ToolUse in the turn (K): drain leftover `paired_subagents` after the inference walk and surface them as `unattached` siblings under the turn root.
- Stop swallowing real ledger-read failures (M): replace the blanket `unwrap_or_default()` on `query_inferences` / `query_tool_result_events` with a `match` that tolerates only the pre-schema "no such table/column" class of error and propagates every other failure.

Tests: `cargo test --workspace` (zero failures, +2 new tests in the `query_verbs` mod for the since filter and helper). Zero warnings on `cargo build --workspace`.
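A minimal sketch of fix (M), assuming a SQLite-backed ledger whose pre-schema failures read "no such table" / "no such column". `LedgerError` and `tolerate_pre_schema` are hypothetical; the point is the guarded `match` arm that turns only that error class into an empty result and hands everything else back to the caller.

```rust
/// Hypothetical error type; the real ledger error is likely richer.
#[derive(Debug, Clone, PartialEq)]
struct LedgerError(String);

/// True only for the "schema not created yet" class of failure
/// (SQLite phrases these as "no such table: ..." / "no such column: ...").
fn is_schema_missing(err: &LedgerError) -> bool {
    let msg = err.0.to_ascii_lowercase();
    msg.contains("no such table") || msg.contains("no such column")
}

/// Replaces a blanket `unwrap_or_default()`: pre-schema ledgers read as
/// empty, every other failure propagates to the caller.
fn tolerate_pre_schema<T>(res: Result<Vec<T>, LedgerError>) -> Result<Vec<T>, LedgerError> {
    match res {
        Ok(rows) => Ok(rows),
        Err(e) if is_schema_missing(&e) => Ok(Vec::new()),
        Err(e) => Err(e),
    }
}
```

At a call site the wrapper would sit between the query and `?`, so a disk or lock error surfaces instead of silently producing zero deltas.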
05f9aa3 to cf3c61e
Closes #432.
Depends on #451 (span tree foundation, #430) being in `main` first. Branch is currently based on `claude/burn-430-span-tree-foundation`; will rebase mechanically onto `main` once #451 lands.

Summary
New `burn overhead deltas --session <id>` verb. For each inference in a session, it computes the `context_tokens` delta from the prior same-owner inference and attributes the delta to intervening tool_results / user_prompts / system_reminders / compaction events. Answers "what blew up my context between inference 5 and inference 6".

Pure derivation from span trees + content sidecars. No DB writes, no schema changes.
SDK
`ContextDelta` carries `session_id`, `turn_id`, `inference_idx`, `owner_rail` (`Main | Subagent(agent_id)`), `prior_context_tokens`, `current_context_tokens`, `delta_tokens`, `intervening: Vec<InterveningStep>`, and `attributed_cost_usd`.

`InterveningStep` variants: `ToolResult { tool_use_id, tool_name, approx_tokens, approx_bytes, truncated }`, `UserPrompt`, `SystemReminder` (reserved; see approximations below), `Compaction { tokens_freed }`, `Other`.

CLI
Human render: an `Inference | Owner | Δtokens | Δcost | Driver` table where `Driver` = largest intervening step (or "N steps" with `--explain`). JSON: full `Vec<ContextDelta>`.

Key decisions
Cost rate: `max(delta_tokens, 0) * curr.model.cache_read` per million. Rationale (documented inline): cache_read is what every future inference pays for the persisted prefix the delta added. cache_write is paid once; cache_read is the steady-state cost the user keeps paying. Matches the issue's open-question #3 recommendation. Cost is `0.0` for models the pricing table doesn't recognize, matching the rest of `analyze/`.

Subagent rail isolation: each `Subagent` span flips the active owner rail to `OwnerRail::Subagent { agent_id }` for its entire subtree during DFS. Leaves under a Task fanout never enter the main-rail timeline. Tested directly by `subagent_isolation_main_rail_excludes_subagent_results`.

Compaction handling: when `delta < 0` and a `CompactionEvent` sits between `prev` and `curr`, surface it as `Compaction { tokens_freed }` instead of a negative delta. Compaction rows bypass `min_delta` filtering so a small compaction never silently vanishes. Tested by `compaction_replaces_negative_delta`.

Known approximations (documented in source + output)
- `output_bytes / 4`: no real tokenizer pass yet
- `<system-reminder>` content classified as `ReminderSource::Other` for first-cut; relaycast/harness classification deferred to "overhead: account for <system-reminder> injection cost (relaycast, harness, …)" #425
- `SystemReminder` `InterveningStep` variant is reserved (`#[allow(dead_code)]`) until the span-tree builder synthesizes those leaves

Test plan
- `cargo test --workspace`: 891 passed, 0 failed, zero warnings on build
- `BURN_GOLDEN=1 cargo test --test golden`: passing, including 2 new snapshots (`overhead-deltas`, `overhead-deltas-json`)
- `context_delta.rs` covering: known >10k jump via Bash result, compaction replaces negative delta, subagent rail isolation, single-inference graceful handling, `--min-delta` filter, `--top` cap
- `overhead.rs` covering renderer output shape

Out of scope
- `burn summary`: separate follow-up
- `#[non_exhaustive]`

Files
New (1): `crates/relayburn-sdk/src/analyze/context_delta.rs`

Modified (5): `analyze.rs`, `lib.rs`, `query_verbs.rs`, `cli.rs` (deltas args), `commands/overhead.rs` (runner + renderers), `CHANGELOG.md`, plus 2 new golden snapshots + invocations entry.

Generated by Claude Code