cli-golden: fixture corpus + TS-CLI snapshots + diff runner (#248 part c) by willwashburn · Pull Request #310 · AgentWorkforce/burn

willwashburn · 2026-05-06T12:24:30Z

Summary

Wave 1 D5 of the Rust CLI port (#248). Captures the current TS CLI's output across a synthetic fixture ledger so the eight Wave 2 fan-out PRs have a deterministic golden target to assert against.

The #248 work splits across three parallel Wave 1 PRs:

[Rust port] relayburn-cli: clap CLI + mcp-server subcommand (closes #210) #248-a — clap CLI scaffold + render helpers
[Rust port] relayburn-cli: clap CLI + mcp-server subcommand (closes #210) #248-b — HarnessAdapter trait + lazy phf registry + watch-loop integration
[Rust port] relayburn-cli: clap CLI + mcp-server subcommand (closes #210) #248-c — this PR — golden-output fixture corpus + diff runner

The Rust CLI is still a eprintln!(\"not yet implemented\") + exit(1) stub today; that's intentional. This PR ships the target — Wave 2's eight fan-out PRs flip per-command `enabled` flags in `invocations.json` as each Rust command implementation lands.

What ships

`tests/fixtures/cli-golden/` — hand-built synthetic ledger covering claude / codex / opencode sources, with activity-classifier categories (coding, testing, review, delegation) populated so `compare` and `hotspots` produce non-empty buckets.
`tests/fixtures/cli-golden/scripts/{build-ledger,capture-snapshots}.mjs` — node scripts that regenerate the ledger from scratch and re-run every TS-CLI invocation against it under a sealed env (`RELAYBURN_HOME` at the fixture, `HOME` at an empty tmp dir so `ingestAll` finds no real sessions, `RELAYBURN_ARCHIVE=0` to force the streaming-ledger path).
`tests/fixtures/cli-golden/snapshots/` — 16 captured stdout files covering `summary`, `hotspots`, `overhead`, `overhead trim`, `compare`, `state status` (TTY + `--json` each) plus help text for `ingest`, `run`, `mcp-server`, and the top-level command.
`tests/fixtures/cli-golden/invocations.json` — args + per-invocation env, with `enabled: false` toggles Wave 2 PRs flip on as each Rust command lands.
`crates/relayburn-cli/tests/golden.rs` — Rust integration test that iterates invocations.json, spawns the Rust `burn` binary against the fixture, and prints a unified diff on mismatch. Gated on `BURN_GOLDEN=1` so plain `cargo test --workspace` stays green while the Rust CLI is a stub; per-invocation `enabled: false` keeps even `BURN_GOLDEN=1` runs silent until Wave 2 wires up each command.
`tests/fixtures/cli-golden/README.md` documents what's snapshotted, how to refresh, the per-command Wave 2 ownership table, and how each Wave 2 PR flips its `enabled` flags.
`pnpm run golden:capture` script in `package.json` for one-shot regeneration.

How Wave 2 un-ignores per command

Each Wave 2 PR (D1–D8 in `RUST_PORT_WAVE_PLAN.md`) flips the matching `enabled: true` in `invocations.json` once its Rust command implementation prints byte-identical output:

Wave 2 dev	Scope	Flip these enabled flags
D1	summary + hotspots	`summary`, `summary-json`, `hotspots`, `hotspots-json`
D2	overhead + overhead trim	`overhead`, `overhead-json`, `overhead-trim`, `overhead-trim-json`
D3	compare	`compare`, `compare-json`
D4	state	`state-status`, `state-status-json`
D5	run + Claude adapter	`run-help`, `top-level-help`
D8	ingest + mcp-server	`ingest-help`, `mcp-server-help`, `top-level-help`

The last Wave 2 PR also drops the `BURN_GOLDEN=1` env-var guard from `tests/golden.rs` so the diff runner runs by default in CI from then on.

Determinism notes

Pricing comes from the vendored `packages/analyze/pricing/models.dev.json` snapshot — no network calls during capture.
The `stamp()` writer uses `new Date().toISOString()`; the build-ledger script post-processes the ledger to substitute the live timestamp with a fixed one so the JSONL hashes the same on every run. Stamps aren't index-hashed, so this is safe.
Absolute paths in stdout (e.g. `state status --json`'s `archivePath`) are normalized to `${RELAYBURN_HOME}` and `${PROJECT}` placeholders. The Rust diff runner applies the same substitutions before comparing.
Wall-clock millisecond fields (`ledgerMtimeMsCurrent`, `lastBuiltAt`, `lastRebuildAt`) are squashed to `${MTIME}` / `${TS}` placeholders.
Re-running the capture script twice produces zero diff against committed snapshots.

Test plan

`cargo build --workspace` clean
`cargo test --workspace` green (golden test passes as a no-op when `BURN_GOLDEN!=1`)
`BURN_GOLDEN=1 cargo test --test golden -- --nocapture` green (all invocations skipped via `enabled: false`)
Manual smoke: temporarily flip one invocation to `enabled: true`, re-run with `BURN_GOLDEN=1` — runner correctly fails with a unified diff between the snapshot and the stub binary's "not yet implemented" output, then reverts cleanly
`pnpm run golden:capture` is reproducible — `git diff` after a fresh capture is empty
All 16 snapshots are committed; `tests/fixtures/cli-golden/ledger/.gitignore` keeps the regenerable archive / cursor / hwm files out of git

…t c) Wave 1 D5 of the Rust CLI port (#248). Captures the current TS CLI's output across a synthetic fixture ledger so the eight Wave 2 fan-out PRs have a deterministic target to assert against. - tests/fixtures/cli-golden/ — hand-built ledger covering claude / codex / opencode sources, plus the activity buckets compare and hotspots aggregate over (coding, testing, review, delegation) - tests/fixtures/cli-golden/scripts/{build-ledger,capture-snapshots}.mjs — node scripts that regenerate the ledger and re-run every TS-CLI invocation against it under a sealed env (RELAYBURN_HOME at the fixture, HOME at a tmp dir so ingestAll finds no real sessions, RELAYBURN_ARCHIVE=0 to force the streaming path) - tests/fixtures/cli-golden/snapshots/ — 16 captured stdout files covering summary / hotspots / overhead / overhead-trim / compare / state-status (TTY + --json) plus help text for ingest / run / mcp-server / top-level - tests/fixtures/cli-golden/invocations.json — args + sealed env per snapshot, with enabled: bool toggles Wave 2 PRs flip on per command as the Rust implementation lands - crates/relayburn-cli/tests/golden.rs — Rust integration test that iterates invocations.json, spawns the Rust burn binary against the fixture, and prints a unified diff on mismatch. Gated on BURN_GOLDEN=1 so cargo test --workspace stays green while the Rust CLI is a stub; per-invocation enabled: false keeps even BURN_GOLDEN=1 runs silent until Wave 2 wires up each command. - tests/fixtures/cli-golden/README.md documents what's snapshotted, how to refresh, the per-command Wave 2 ownership table, and how each Wave 2 PR should flip its enabled flags. Refresh: pnpm run golden:capture (or run the script directly).

coderabbitai · 2026-05-06T12:24:48Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 16abfe75-07fc-4099-a9d4-47cb0d6ad45e

📥 Commits

Reviewing files that changed from the base of the PR and between 4b1e0cb and 47d0f25.

📒 Files selected for processing (2)

CHANGELOG.md
crates/relayburn-cli/Cargo.toml

📝 Walkthrough

Walkthrough

Adds a gated Rust golden-output integration test for the burn CLI plus a deterministic fixture ecosystem and capture tooling: invocation fixtures, reproducible ledger generator, snapshot capture/normalization scripts, many golden snapshots, docs, and changelog entries.

Changes

CLI Golden-Output Test Infrastructure

Layer / File(s)	Summary
Test Harness / Dev deps `crates/relayburn-cli/tests/golden.rs`, `crates/relayburn-cli/Cargo.toml`	Adds a Rust integration test that deserializes invocation fixtures, spawns the `burn` binary with a sealed HOME, normalizes stdout/stderr (paths, timestamps, numeric fields), diffs against snapshots, and gates via `BURN_GOLDEN`. Adds documentation-only dev-dep notes for serde.
Fixture Definitions `tests/fixtures/cli-golden/invocations.json`, `tests/fixtures/cli-golden/ledger/*`, `tests/fixtures/cli-golden/project/CLAUDE.md`, `tests/fixtures/cli-golden/ledger/.gitignore`	Introduces invocation specs (disabled by default), deterministic ledger artifacts (`ledger.idx`, `ledger.content.idx`, `ledger.jsonl`), project files for snapshotting, and a ledger `.gitignore`.
Fixture Generators `tests/fixtures/cli-golden/scripts/build-ledger.mjs`	Node script that deterministically builds the ledger JSONL/index with fixed timestamps and multi-session turns used for snapshot capture.
Snapshot Capture Orchestration `tests/fixtures/cli-golden/scripts/capture-snapshots.mjs`, `package.json`	Node capture script rebuilds ledger, seals HOME, runs CLI invocations, normalizes outputs for portability, and writes snapshot artifacts; adds `golden:capture` npm script.
Golden Snapshots `tests/fixtures/cli-golden/snapshots/.txt`, `tests/fixtures/cli-golden/snapshots/.json.txt`	Adds many textual and JSON golden snapshots (top-level help, summary, hotspots, compare, overhead, state-status, ingest, run, etc.) representing expected CLI outputs.
Docs & Changelog `tests/fixtures/cli-golden/README.md`, `CHANGELOG.md`	Adds README documenting fixture layout, provenance, refresh procedure, Wave-2 usage; updates CHANGELOG with an entry describing the golden-test rig and associated artifacts.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Runner as Golden Test Runner (Node/Rust)
  participant CLI as burn CLI (binary)
  participant Ledger as Deterministic Ledger (RELAYBURN_HOME)
  participant Snap as Golden Snapshots

  Runner->>Ledger: build or ensure deterministic ledger (build-ledger.mjs)
  Runner->>Runner: create sealed HOME env
  Runner->>CLI: spawn with args/env per invocation
  CLI->>Runner: stdout/stderr + exit status
  Runner->>Runner: normalize output (paths, timestamps, numeric fields)
  Runner->>Snap: compare normalized output vs snapshot
  alt mismatch
    Runner->>Runner: emit unified diff
  else match
    Runner->>Runner: record success
  end
  Runner->>Runner: aggregate failures and fail test if any

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

AgentWorkforce/burn#291: Ports hotspots aggregator into relayburn-analyze — directly related to the hotspots snapshots added here.
AgentWorkforce/burn#305: Monolith restructure and ports of ingest/analyze modules — overlaps with the Rust ports exercised by these fixtures.
AgentWorkforce/burn#296: Ports gap-warning state machine and reingest logic into Rust — related to the ingest help and ledger artifacts used by the harness.

Poem

🐇
I stitched a ledger neat and true,
fixed stamps and paths so tests pass through;
I ran the burn and scrubbed each line,
matched gold with care — the snaps now shine. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Linked Issues check	⚠️ Warning	The linked issue `#39` concerns plan-based monthly quota tracking for cost budgeting, which is unrelated to this PR's golden-output test harness. The PR makes no changes toward plan tracking, monthly budgets, or cost forecasting.	Either unlink issue `#39` and link only if a related cost-plan issue exists, or clarify how the golden fixture connects to plan-based quota tracking objectives.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title clearly and specifically describes the main deliverable: adding a CLI golden fixture corpus, TypeScript snapshots, and diff runner infrastructure for Wave 1 D5 of the Rust CLI port.
Description check	✅ Passed	The PR description is comprehensive and directly related to the changeset, detailing the fixture ledger, scripts, snapshot files, integration test, and determinism measures that align with all added files.
Out of Scope Changes check	✅ Passed	All changes are in-scope for the stated objective: building a golden-output test fixture corpus, snapshot capture scripts, snapshot files, invocation metadata, and a Rust diff-runner integration test. No unrelated edits are present.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch rust-port/cli-golden-snapshots

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6cb0838fd4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-06T12:28:23Z

+        out.push_str(&rest[..idx]);
+        out.push_str(&needle);
+        let after_key = &rest[idx + needle.len()..];
+        let trimmed_start = after_key.trim_start_matches(|c: char| c == ' ' || c == '\t');


Match all JSON whitespace when squashing volatile fields

Make squash_numeric_field consume all whitespace after "<key>": rather than only spaces/tabs. The snapshot capture path uses JS regexes like /"lastBuiltAt":\s*\d+/, so it will normalize values even if formatting places the number on the next line, but this Rust path currently skips only ' ' and '\t'. If output formatting ever inserts a newline (or other whitespace) before these numeric timestamps, the Rust runner will stop normalizing those fields and report false golden mismatches for otherwise equivalent output.

Useful? React with 👍 / 👎.

Fixed in 1ef101c. Switched the byte-level skip loop to match the full ASCII whitespace set ( , \t, \n, \r, \x0b, \x0c) so the Rust runner mirrors the JS capture path's \s*\d+ exactly — including pretty-printed numerics across line breaks. Worth flagging: char::is_ascii_whitespace is not equivalent (it excludes U+000B vertical tab, which JS \s matches), so I went with an explicit matches! over the byte set. Added four unit tests covering tab, newline+indent, CR/VT/FF, and the non-numeric bail; cargo test --workspace is green.

coderabbitai

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/relayburn-cli/tests/golden.rs`:
- Around line 63-75: The test prints a skip message when BURN_GOLDEN is not "1"
but continues running; modify the conditional that checks golden_gate (the
variable set by std::env::var("BURN_GOLDEN").ok()) to return immediately after
the eprintln so the test exits inertly when golden_gate.as_deref() != Some("1")
— i.e., after the eprintln add an early return from the test function (or a
short-circuit) so no further fixture-dependent logic in this test executes.

In `@tests/fixtures/cli-golden/README.md`:
- Line 11: Update the fenced code block in the README.md so the opening
backticks include a language tag (for example change the opening "```" to
"```text") to satisfy markdownlint MD040; locate the layout tree fenced block
and add the language identifier to the opening fence so the block becomes
"```text" while leaving the closing "```" unchanged.

In `@tests/fixtures/cli-golden/scripts/build-ledger.mjs`:
- Around line 39-62: The script currently computes HOME via ledgerHome() and
immediately deletes known files (FILES) and the content directory; add a hard
safety guard before the deletion loop that verifies HOME is the expected test
fixture path (or that an explicit force flag is present), and aborts with an
error if not; implement this by checking ledgerHome() result (HOME) against the
fixture directory pattern or an environment/CLI flag (e.g., RELAYBURN_HOME or a
--force boolean) and only proceed to run the rm loop and the
rm(`${HOME}/content`, ...) when the check passes, otherwise throw/exit early
with a clear message.
- Around line 389-395: The current replacement silently does nothing when the
regex /("kind":"stamp","ts":")[^"]+(")/g doesn't match, reintroducing
non-determinism; modify the block around rewritten/raw/STA MP_FIXED_TS so the
test fails explicitly when no match is found (e.g., if rewritten === raw or
regex.test(raw) is false) by throwing an Error or failing the test with a clear
message including the ledgerFile and the stamp regex, instead of just skipping
writeFile.

In `@tests/fixtures/cli-golden/snapshots/hotspots.stdout.txt`:
- Around line 17-18: The snapshot contains absolute fixture paths (e.g.
"/tmp/golden-project/...") which must be replaced with placeholder-normalized
paths before writing hotspots.stdout.txt; update the code that produces the
hotspots snapshot for tests/fixtures/cli-golden/snapshots/hotspots.stdout.txt to
run the output through the same path-normalization used by the test harness
(replace project-root paths with "${PROJECT}" and any home paths with
"${RELAYBURN_HOME}" or equivalent) so the lines like
"/tmp/golden-project/src/foo.ts" and "/tmp/golden-project/src/bar.ts" are
emitted with placeholder paths instead of absolute paths prior to snapshot
capture.

In `@tests/fixtures/cli-golden/snapshots/top-level-help.stdout.txt`:
- Around line 16-46: The help output for the subcommand "burn state rebuild
archive" is inconsistent: the synopsis shows "--vacuum" as a flag while the
examples show "vacuum" as a positional argument; pick one and make all usages
consistent. Update the CLI help generation and examples so the command
identifier "state rebuild archive" uses the same form everywhere (either change
the synopsis to "vacuum" positional or change the examples to "--vacuum"), and
adjust any code that constructs the usage string for the "state rebuild" /
"archive" command and the examples fixture "top-level-help.stdout.txt" so both
the synopsis and examples match the chosen syntax. Ensure option parsing and
help text (e.g. the code that defines the "archive" subcommand and the fixture
generator) reflect the single chosen form.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 4b853eb3-6fa6-43d8-8530-008cd40b7121

📥 Commits

Reviewing files that changed from the base of the PR and between 09cc4e0 and 6cb0838.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (29)

CHANGELOG.md
crates/relayburn-cli/Cargo.toml
crates/relayburn-cli/tests/golden.rs
package.json
tests/fixtures/cli-golden/README.md
tests/fixtures/cli-golden/invocations.json
tests/fixtures/cli-golden/ledger/.gitignore
tests/fixtures/cli-golden/ledger/ledger.content.idx
tests/fixtures/cli-golden/ledger/ledger.idx
tests/fixtures/cli-golden/ledger/ledger.jsonl
tests/fixtures/cli-golden/project/CLAUDE.md
tests/fixtures/cli-golden/scripts/build-ledger.mjs
tests/fixtures/cli-golden/scripts/capture-snapshots.mjs
tests/fixtures/cli-golden/snapshots/compare-json.stdout.txt
tests/fixtures/cli-golden/snapshots/compare.stdout.txt
tests/fixtures/cli-golden/snapshots/hotspots-json.stdout.txt
tests/fixtures/cli-golden/snapshots/hotspots.stdout.txt
tests/fixtures/cli-golden/snapshots/ingest-help.stdout.txt
tests/fixtures/cli-golden/snapshots/mcp-server-help.stdout.txt
tests/fixtures/cli-golden/snapshots/overhead-json.stdout.txt
tests/fixtures/cli-golden/snapshots/overhead-trim-json.stdout.txt
tests/fixtures/cli-golden/snapshots/overhead-trim.stdout.txt
tests/fixtures/cli-golden/snapshots/overhead.stdout.txt
tests/fixtures/cli-golden/snapshots/run-help.stdout.txt
tests/fixtures/cli-golden/snapshots/state-status-json.stdout.txt
tests/fixtures/cli-golden/snapshots/state-status.stdout.txt
tests/fixtures/cli-golden/snapshots/summary-json.stdout.txt
tests/fixtures/cli-golden/snapshots/summary.stdout.txt
tests/fixtures/cli-golden/snapshots/top-level-help.stdout.txt

coderabbitai · 2026-05-06T12:30:35Z

+  burn overhead      [trim] [--project <path>] [--since 7d] [--kind <k>] [--top <n>] [--json]
+  burn compare       <model_a,model_b[,...]> [--since 7d] [--project <path>] [--session <id>] [--workflow <id>] [--agent <id>] [--min-sample <n>] [--json|--csv]
+  burn run <claude|codex|opencode>  [--tag k=v ...] [-- <harness args>]
+  burn ingest        [--watch|--hook <name>] [--interval <ms>] [--quiet]


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Top-level ingest usage is missing discoverable flags.

Line [13] omits --opencode-stream and --opencode-url, but ingest command help includes them. Aligning top-level help prevents feature discoverability gaps.

Pushing back here — the snapshot is faithfully capturing intended TS-CLI behavior, not a CLI bug.

Top-level burn --help (packages/cli/src/cli.ts HELP block) renders the Usage section as a one-line summary per command — short, scannable, surface-level. The full per-command flag inventory lives in burn <cmd> --help. So burn ingest [--watch|--hook <name>] [--interval <ms>] [--quiet] deliberately omits --opencode-stream / --opencode-url (and --daemon for that matter); burn ingest --help (snapshot ingest-help.stdout.txt) renders the full grammar:

burn ingest [--quiet] burn ingest --watch [--interval <ms>] [--quiet] [--opencode-stream] [--opencode-url <url>] burn ingest --hook claude [--quiet]

The top-level Examples block also already references burn ingest --watch --opencode-stream, so the discoverability isn't lost. Keeping the top-level summary terse is a deliberate design choice across all commands (compare summary and hotspots summary lines vs their own --help outputs) — adding flags here would set a precedent for cluttering every line.

@willwashburn, that's a clear and well-reasoned explanation — understood. The terse one-liner-per-command convention for the top-level burn --help is intentional and consistent across all commands, with full flag grammars deferred to per-command --help. The Examples block also already surfaces --opencode-stream, so discoverability is intact. I'll withdraw this concern.

(　´･ω･)？

✏️ Learnings added

Learnt from: willwashburn Repo: AgentWorkforce/burn PR: 310 File: tests/fixtures/cli-golden/snapshots/top-level-help.stdout.txt:13-13 Timestamp: 2026-05-06T12:48:38.455Z Learning: In the burn CLI (`packages/cli/src/cli.ts`), the top-level `burn --help` Usage section intentionally shows a short, scannable one-line summary per command and deliberately omits many flags (e.g., `--opencode-stream`, `--opencode-url`, `--daemon` for `burn ingest`). Full per-command flag grammars are only shown in `burn <cmd> --help`. The Examples block in top-level help surfaces key flags for discoverability. Do not flag missing flags in the top-level help snapshot (`top-level-help.stdout.txt`) as a bug.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

coderabbitai · 2026-05-06T12:30:35Z

+  burn state rebuild index | classify | content | archive [--full|--vacuum] | all
+  burn state prune   [--days <n>] [--force]
+  burn state reset   [--force] [--reingest] [--json]
+
+Examples:
+  burn summary --since 24h
+  burn summary --by-provider --provider synthetic
+  burn summary --subagent-tree <session-id>
+  burn summary --by-subagent-type --since 7d
+  burn summary --by-relationship --since 7d
+  burn summary --by-tool --since 7d
+  burn hotspots --since 7d
+  burn hotspots --patterns --since 7d
+  burn hotspots --session --explain-drift
+  burn hotspots --session <session-id>
+  burn overhead --since 30d
+  burn overhead --kind claude-md
+  burn overhead trim --top 3
+  burn overhead trim --json
+  burn compare claude-sonnet-4-6,claude-haiku-4-5 --since 30d
+  burn run claude   --tag workflow=refactor -- --resume
+  burn run codex    --tag workflow=refactor
+  burn run opencode --tag workflow=refactor
+  burn ingest
+  burn ingest --watch
+  burn ingest --watch --opencode-stream
+  burn state
+  burn state prune --days 30
+  burn state rebuild archive
+  burn state rebuild archive --full
+  burn state rebuild archive vacuum


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

state rebuild archive syntax is internally inconsistent.

Line [16] documents --vacuum as a flag, while Line [46] shows vacuum as positional. Please pick one syntax and keep usage/examples consistent.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/fixtures/cli-golden/snapshots/top-level-help.stdout.txt` around lines 16 - 46, The help output for the subcommand "burn state rebuild archive" is inconsistent: the synopsis shows "--vacuum" as a flag while the examples show "vacuum" as a positional argument; pick one and make all usages consistent. Update the CLI help generation and examples so the command identifier "state rebuild archive" uses the same form everywhere (either change the synopsis to "vacuum" positional or change the examples to "--vacuum"), and adjust any code that constructs the usage string for the "state rebuild" / "archive" command and the examples fixture "top-level-help.stdout.txt" so both the synopsis and examples match the chosen syntax. Ensure option parsing and help text (e.g. the code that defines the "archive" subcommand and the fixture generator) reflect the single chosen form.

Pushing back here too — both forms are real, intentionally supported, and documented in the canonical place.

packages/cli/src/commands/state.ts::runArchiveTarget accepts both:

const action = args.positional[2]; const vacuum = args.flags['vacuum'] === true || action === 'vacuum';

So burn state rebuild archive --vacuum and burn state rebuild archive vacuum are equivalent (and --full likewise has flag-only form since there's no build positional sibling worth typing). The per-command help text in REBUILD_HELP documents both grammars:

burn state rebuild archive [--full|--vacuum] [--json] burn state rebuild archive vacuum [--json]

The top-level help's one-line Usage summary picks the flag form (it's the more compact representation and consistent with how the summary describes --full), while the Examples block deliberately demonstrates both invocations so users can see they work. So line 16 (flag-form summary) and line 46 (positional example) are not contradictory — they're showing two equally-valid spellings the CLI accepts. Same precedent as the ingest case above: the top-level summary is short by design; the per-command help is the source of truth for full grammar.

@willwashburn, understood — the dual-form design is intentional and backed by the implementation in runArchiveTarget. Having the synopsis use the compact flag form while the examples demonstrate the positional alias is a deliberate affordance for users, not a contradiction. I'll withdraw the concern.

（´・ω・`)?

✏️ Learnings added

Learnt from: willwashburn Repo: AgentWorkforce/burn PR: 310 File: tests/fixtures/cli-golden/snapshots/top-level-help.stdout.txt:16-46 Timestamp: 2026-05-06T12:48:52.688Z Learning: In `packages/cli/src/commands/state.ts`, `runArchiveTarget` intentionally accepts both `--vacuum` (flag) and `vacuum` (positional arg at index 2) as equivalent spellings. The top-level help synopsis uses the compact flag form (`[--full|--vacuum]`) while the Examples block deliberately shows the positional form (`burn state rebuild archive vacuum`) to demonstrate both valid grammars. This dual-form pattern is by design, not an inconsistency.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

…w fix) The byte-level skip loop only ate ' ' and '\t', so a newline or other ASCII whitespace before a numeric value would have stopped normalization and produced a false golden mismatch. Match the JS capture path's `\s*\d+` semantics by consuming the full ASCII whitespace set (space, tab, LF, CR, VT, FF) — `char::is_ascii_whitespace` is *not* equivalent (it excludes U+000B vertical tab), so list the bytes explicitly. Adds unit tests covering tab, newline+indent, CR/VT/FF, and the non-numeric bail. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

🧹 Nitpick comments (2)

crates/relayburn-cli/tests/golden.rs (2)
154-155: 💤 Low value

Consider distinguishing signal termination from normal exit.

When the process is killed by a signal (e.g., SIGSEGV), code() returns None and this defaults to -1. The failure message would say "expected 0, got -1" without indicating it was a signal. For better diagnostics, consider checking output.status.signal() on Unix platforms to report signal termination explicitly.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/relayburn-cli/tests/golden.rs` around lines 154 - 155, The test
currently maps a missing exit code to -1 which hides signal-based termination;
update the assertion logic in the golden test where expected_status and
actual_status are computed to detect signal termination by checking
output.status.code() and, on Unix, output.status.signal() (or use
StatusExt::signal()) and produce a clearer actual_status/result message (e.g.,
"terminated by signal SIGSEGV" or similar) instead of -1; keep expected_status
logic unchanged but change the reporting/assertion to include signal info when
code() is None so failures show signal termination explicitly.
16-26: 💤 Low value

Documentation references #[ignore] but the test uses BURN_GOLDEN gating instead.

The doc header says "Why this is #[ignore]d on main" and suggests running with --include-ignored, but the actual test function at line 61 is not marked #[ignore]. The gating is implemented via BURN_GOLDEN env var check. Consider updating the doc to reflect the actual mechanism, or remove --include-ignored from the example command since it has no effect.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/relayburn-cli/tests/golden.rs` around lines 16 - 26, The doc header
incorrectly states the test is #[ignore] and suggests using --include-ignored;
update the comment in crates/relayburn-cli/tests/golden.rs to reflect the actual
gating via the BURN_GOLDEN environment variable (or remove the --include-ignored
example). Specifically, edit the top comment that begins "Why this is
`#[ignore]`d on `main`" to explain that the golden test (the test function in
this file) is enabled/disabled by BURN_GOLDEN rather than #[ignore], and adjust
the example command to show setting BURN_GOLDEN=1 instead of using
--include-ignored.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@crates/relayburn-cli/tests/golden.rs`:
- Around line 154-155: The test currently maps a missing exit code to -1 which
hides signal-based termination; update the assertion logic in the golden test
where expected_status and actual_status are computed to detect signal
termination by checking output.status.code() and, on Unix,
output.status.signal() (or use StatusExt::signal()) and produce a clearer
actual_status/result message (e.g., "terminated by signal SIGSEGV" or similar)
instead of -1; keep expected_status logic unchanged but change the
reporting/assertion to include signal info when code() is None so failures show
signal termination explicitly.
- Around line 16-26: The doc header incorrectly states the test is #[ignore] and
suggests using --include-ignored; update the comment in
crates/relayburn-cli/tests/golden.rs to reflect the actual gating via the
BURN_GOLDEN environment variable (or remove the --include-ignored example).
Specifically, edit the top comment that begins "Why this is `#[ignore]`d on
`main`" to explain that the golden test (the test function in this file) is
enabled/disabled by BURN_GOLDEN rather than #[ignore], and adjust the example
command to show setting BURN_GOLDEN=1 instead of using --include-ignored.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 779cfe86-a19b-4ee1-8818-09b164fd5f19

📥 Commits

Reviewing files that changed from the base of the PR and between 6cb0838 and 1ef101c.

📒 Files selected for processing (1)

crates/relayburn-cli/tests/golden.rs

…apshot paths (review fixes round 2) - build-ledger.mjs: refuse to run unless the resolved ledger home is inside an allowlisted prefix (the in-repo fixture dir, $RUNNER_TEMP, or $TMPDIR). Aborts before any rm() so a missing RELAYBURN_HOME can no longer wipe a developer's real ledger. - build-ledger.mjs: hard-fail if the stamp ts normalization regex misses, instead of silently shipping a non-deterministic ledger. - golden.rs: BURN_GOLDEN!=1 now returns immediately so the skip path is truly fixture-free. Per-iteration gate check is dead code and gone. - capture-snapshots.mjs + golden.rs: normalize the synthetic /tmp/golden-project path to \${FIXTURE_PROJECT} on both sides so absolute-looking paths stay out of golden snapshots. - README.md: tag the layout fence as text for MD040.

coderabbitai

🧹 Nitpick comments (1)

crates/relayburn-cli/tests/golden.rs (1)
101-101: 💤 Low value

Consider using system temp directory for sealed HOME.

The sealed HOME is created under fixture_dir (inside the repo at tests/fixtures/cli-golden/), whereas the JS capture script uses os.tmpdir(). If cleanup fails (crash, signal kill), temp directories accumulate in the committed fixture directory. Using std::env::temp_dir() would match the JS behavior and keep the repo clean.
♻️ Suggested change
-    let sealed_home = tempdir_under(&fixture_dir);
+    let sealed_home = tempdir_under(&std::env::temp_dir());
Also applies to: 342-352
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/relayburn-cli/tests/golden.rs` at line 101, The test currently creates
sealed_home via tempdir_under(&fixture_dir) which places temp dirs inside the
repo; change it to create the sealed HOME in the system temp directory
(std::env::temp_dir()) instead — either call the tempdir helper with
std::env::temp_dir() as the base or replace the tempdir_under(&fixture_dir)
invocation in the golden test (sealed_home) with a tempdir created under
std::env::temp_dir(); apply the same change for the other tempdir_under usages
in the same file (the later sealed HOME/temp dir usages) so temp artifacts go to
the OS temp directory, not tests/fixtures.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@crates/relayburn-cli/tests/golden.rs`:
- Line 101: The test currently creates sealed_home via
tempdir_under(&fixture_dir) which places temp dirs inside the repo; change it to
create the sealed HOME in the system temp directory (std::env::temp_dir())
instead — either call the tempdir helper with std::env::temp_dir() as the base
or replace the tempdir_under(&fixture_dir) invocation in the golden test
(sealed_home) with a tempdir created under std::env::temp_dir(); apply the same
change for the other tempdir_under usages in the same file (the later sealed
HOME/temp dir usages) so temp artifacts go to the OS temp directory, not
tests/fixtures.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 33570f6b-6ecb-4b49-8f20-5d731d490051

📥 Commits

Reviewing files that changed from the base of the PR and between 1ef101c and 9d5d173.

📒 Files selected for processing (6)

crates/relayburn-cli/tests/golden.rs
tests/fixtures/cli-golden/README.md
tests/fixtures/cli-golden/scripts/build-ledger.mjs
tests/fixtures/cli-golden/scripts/capture-snapshots.mjs
tests/fixtures/cli-golden/snapshots/hotspots-json.stdout.txt
tests/fixtures/cli-golden/snapshots/hotspots.stdout.txt

🚧 Files skipped from review as they are similar to previous changes (1)

tests/fixtures/cli-golden/snapshots/hotspots-json.stdout.txt

…LOG, Cargo.toml, Cargo.lock conflicts) # Conflicts: # CHANGELOG.md # Cargo.lock # crates/relayburn-cli/Cargo.toml

…cts after #309 landed) # Conflicts: # CHANGELOG.md # Cargo.lock # crates/relayburn-cli/Cargo.toml

chatgpt-codex-connector Bot reviewed May 6, 2026

View reviewed changes

coderabbitai Bot reviewed May 6, 2026

View reviewed changes

willwashburn added 2 commits May 6, 2026 08:59

Merge origin/main into rust-port/cli-golden-snapshots (resolve CHANGE…

4b1e0cb

…LOG, Cargo.toml, Cargo.lock conflicts) # Conflicts: # CHANGELOG.md # Cargo.lock # crates/relayburn-cli/Cargo.toml

Merge origin/main into rust-port/cli-golden-snapshots (resolve confli…

47d0f25

…cts after #309 landed) # Conflicts: # CHANGELOG.md # Cargo.lock # crates/relayburn-cli/Cargo.toml

willwashburn merged commit 582ab95 into main May 6, 2026
3 checks passed

willwashburn deleted the rust-port/cli-golden-snapshots branch May 6, 2026 13:55

Conversation

willwashburn commented May 6, 2026

Summary

What ships

How Wave 2 un-ignores per command

Determinism notes

Test plan

Uh oh!

coderabbitai Bot commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

willwashburn May 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

willwashburn May 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

willwashburn May 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented May 6, 2026 •

edited

Loading

coderabbitai Bot May 6, 2026 •

edited

Loading

coderabbitai Bot May 6, 2026 •

edited

Loading