feat(tools): perf-harness.sh — multi-run boot-to-end aggregation (P10)#21
feat(tools): perf-harness.sh — multi-run boot-to-end aggregation (P10)#21cemililik wants to merge 3 commits into
Conversation
Single-run boot-to-end claims have ~15-30 % run-to-run variance under QEMU TCG (the counter advances on emulated instructions, not wall- clock time, so translation-cache state leaks into the number). The 2026-05-07 multi-axis review's Track D §D2 promoted the queued P10 proposal — a multi-run harness — from "queued" to "load-bearing before B2 ADR-0027 implementation" so future perf-relevant changes land against a tight percentile band rather than a single anecdote. The harness wraps tools/run-qemu.sh in an iteration loop with a portable per-run watchdog (the kernel halts in WFI after the demo; QEMU never exits on its own), parses the kernel's "boot-to-end elapsed = X ns" emission out of each run, and prints min / p10 / p50 / p90 / p99 / max / mean / stddev in both ns and ms. It optionally writes a markdown report under docs/analysis/reports/ using the perf-review master plan's Inputs / Methodology / Metric / Verdict shape. Pure bash + awk; macOS bash 3.2 compatible (matches run-qemu.sh's existing idioms; no GNU `timeout` binary required). A run aborts non-zero if fewer than half of its iterations produced a valid sample — that threshold is treated as environmental rather than a measurement worth aggregating, per the brief. Out of scope: CI integration (maintainer-launched only, like the QEMU smoke), automatic regression detection vs a stored baseline, the kernel-side `cfg(feature = "bench")` IPC microbench (a future P10 extension; v1 is multi-run aggregation of the existing boot- to-end emission). Refs: P10 (2026-05-06 Track D review), 2026-05-07 multi-axis review §D2
Records the first measured boot-to-end band produced by tools/perf-harness.sh and wires the harness into the project's documentation surface. - docs/analysis/reports/perf-baseline-2026-05-08-post-pr-19-pre- adr-0027.md — auto-generated baseline at HEAD `aa7e6c5`; debug build, 20 iterations, 5 s per-run timeout, QEMU TCG. Headline band: p10=3.884 ms / p50=4.642 ms / p90=5.584 ms / p99=6.558 ms, mean 4.711 ms, stddev 0.709 ms. Brackets the prior "~4-6.5 ms typical" anecdote tightly but is now a measured band on this host rather than an order-of-magnitude observation. - docs/standards/infrastructure.md — new "Performance harness" section names tools/perf-harness.sh as the canonical source for boot-to-end timing claims and deprecates single-run anecdotes in PR bodies. - docs/roadmap/current.md — new 2026-05-08 banner above the 2026-05-07 banner records the harness landing and the measured band; old banner preserved as the historical record (matches the append-only update discipline already used in this file). - docs/analysis/reviews/performance-optimization-reviews/ 2026-05-07-B1-closure.md — one-line cross-reference appended at the end of the "Post-T-015 amendment" section pointing at the new baseline report. The existing single-run claims throughout the 2026-05-07 baseline are deliberately preserved (the brief explicitly asked for the historical record to stay intact). The 2026-05-08 banner does not promote any task to In Progress / Done — the harness is tooling, not a roadmap task. B2 prep (ADR-0027 drafting) remains the active thread. Refs: P10 (2026-05-06 Track D review), 2026-05-07 multi-axis review §D2
ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one. |
There was a problem hiding this comment.
Sorry @cemililik, you have reached your weekly rate limit of 500000 diff characters.
Please try again later or upgrade to continue using Sourcery
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📝 WalkthroughWalkthroughThis PR introduces a performance harness tool for multi-run boot timing aggregation. The changes add ChangesPerformance Harness Tool and Standards
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a performance harness tool (tools/perf-harness.sh) for multi-run boot-to-end timing aggregation and updates documentation to establish a measured performance baseline. The feedback focuses on optimizing the bash script by replacing inefficient piped commands and subshells with more direct awk and while read patterns to improve performance and readability.
| SAMPLE=$(printf '%s\n' "$OUTPUT" \ | ||
| | grep -oE 'boot-to-end elapsed = [0-9]+ ns' \ | ||
| | head -n 1 \ | ||
| | grep -oE '[0-9]+' \ | ||
| | head -n 1 || true) |
There was a problem hiding this comment.
The extraction of the nanosecond sample can be significantly simplified using a single awk command. This approach is more efficient as it avoids multiple piped processes and stops processing after the first match is found. Using a here-string (<<<) is also more idiomatic in Bash for passing variable content to a command.
| SAMPLE=$(printf '%s\n' "$OUTPUT" \ | |
| | grep -oE 'boot-to-end elapsed = [0-9]+ ns' \ | |
| | head -n 1 \ | |
| | grep -oE '[0-9]+' \ | |
| | head -n 1 || true) | |
| SAMPLE=$(awk '/boot-to-end elapsed = [0-9]+ ns/ { print $4; exit }' <<< "$OUTPUT") |
| STATS=$(read_stats) | ||
| STAT_MIN=$(echo "$STATS" | awk '$1=="min" {print $2}') | ||
| STAT_P10=$(echo "$STATS" | awk '$1=="p10" {print $2}') | ||
| STAT_P50=$(echo "$STATS" | awk '$1=="p50" {print $2}') | ||
| STAT_P90=$(echo "$STATS" | awk '$1=="p90" {print $2}') | ||
| STAT_P99=$(echo "$STATS" | awk '$1=="p99" {print $2}') | ||
| STAT_MAX=$(echo "$STATS" | awk '$1=="max" {print $2}') | ||
| STAT_MEAN=$(echo "$STATS" | awk '$1=="mean" {print $2}') | ||
| STAT_STDDEV=$(echo "$STATS" | awk '$1=="stddev" {print $2}') |
There was a problem hiding this comment.
Parsing the STATS output by repeatedly calling awk in subshells is inefficient. A single while read loop can process the key-value pairs in one pass, improving performance and readability while remaining compatible with Bash 3.2.
| STATS=$(read_stats) | |
| STAT_MIN=$(echo "$STATS" | awk '$1=="min" {print $2}') | |
| STAT_P10=$(echo "$STATS" | awk '$1=="p10" {print $2}') | |
| STAT_P50=$(echo "$STATS" | awk '$1=="p50" {print $2}') | |
| STAT_P90=$(echo "$STATS" | awk '$1=="p90" {print $2}') | |
| STAT_P99=$(echo "$STATS" | awk '$1=="p99" {print $2}') | |
| STAT_MAX=$(echo "$STATS" | awk '$1=="max" {print $2}') | |
| STAT_MEAN=$(echo "$STATS" | awk '$1=="mean" {print $2}') | |
| STAT_STDDEV=$(echo "$STATS" | awk '$1=="stddev" {print $2}') | |
| while read -r key val; do | |
| case "$key" in | |
| min) STAT_MIN=$val ;; | |
| p10) STAT_P10=$val ;; | |
| p50) STAT_P50=$val ;; | |
| p90) STAT_P90=$val ;; | |
| p99) STAT_P99=$val ;; | |
| max) STAT_MAX=$val ;; | |
| mean) STAT_MEAN=$val ;; | |
| stddev) STAT_STDDEV=$val ;; | |
| esac | |
| done <<EOF | |
| $(read_stats) | |
| EOF |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tools/perf-harness.sh (1)
420-421: 💤 Low valuePre-compute relative paths to silence SC2295.
${KERNEL_ELF#${REPO_ROOT}/}(line 421) and${REPORT_PATH#${REPO_ROOT}/}(line 492) treat$REPO_ROOTas a glob pattern inside the#operator, so a path containing[,],*, or?would silently mismatch. WhileREPO_ROOTis arealpath-resolved string unlikely to carry glob characters, the fix is a one-liner pre-compute before each use-site.🔧 Proposed fix
+ KERNEL_ELF_REL="${KERNEL_ELF#"$REPO_ROOT"/}" { ... - echo "| Kernel ELF | \`${KERNEL_ELF#${REPO_ROOT}/}\` |" + echo "| Kernel ELF | \`${KERNEL_ELF_REL}\` |"- echo "report written: ${REPORT_PATH#${REPO_ROOT}/}" + REPORT_PATH_REL="${REPORT_PATH#"$REPO_ROOT"/}" + echo "report written: $REPORT_PATH_REL"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/perf-harness.sh` around lines 420 - 421, Pre-compute the relative paths instead of using parameter expansion with REPO_ROOT inline to avoid glob interpretation: create variables like kernel_rel by stripping REPO_ROOT from KERNEL_ELF (e.g., compute kernel_rel from KERNEL_ELF and REPO_ROOT before the echo) and similarly compute report_rel from REPORT_PATH and REPO_ROOT before its use; then use kernel_rel and report_rel in the echo lines that currently reference `${KERNEL_ELF#${REPO_ROOT}/}` and `${REPORT_PATH#${REPO_ROOT}/}` so the `#` operator never sees REPO_ROOT as a glob pattern.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tools/perf-harness.sh`:
- Around line 420-421: Pre-compute the relative paths instead of using parameter
expansion with REPO_ROOT inline to avoid glob interpretation: create variables
like kernel_rel by stripping REPO_ROOT from KERNEL_ELF (e.g., compute kernel_rel
from KERNEL_ELF and REPO_ROOT before the echo) and similarly compute report_rel
from REPORT_PATH and REPO_ROOT before its use; then use kernel_rel and
report_rel in the echo lines that currently reference
`${KERNEL_ELF#${REPO_ROOT}/}` and `${REPORT_PATH#${REPO_ROOT}/}` so the `#`
operator never sees REPO_ROOT as a glob pattern.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0b289f79-0dc6-4fa3-852d-3b738b6ac4c0
📒 Files selected for processing (5)
docs/analysis/reports/perf-baseline-2026-05-08-post-pr-19-pre-adr-0027.mddocs/analysis/reviews/performance-optimization-reviews/2026-05-07-B1-closure.mddocs/roadmap/current.mddocs/standards/infrastructure.mdtools/perf-harness.sh
…ctor) Both gemini-code-assist inline comments on PR #21 flagged efficiency improvements: 1. **L213 sample extraction (4-pipe → single awk).** Replaced the `printf | grep | head | grep | head` pipeline with one awk invocation. Departures from gemini's literal suggestion: - Used sub/sub stripping rather than `print $4` because the kernel's boot-to-end line has the format "tyrne: boot-to-end elapsed = NUM ns" — `$4` is the equals sign; `$5` is the number. The sub/sub pattern is format-shift-tolerant (any "...= NUM ns..." shape resolves to NUM regardless of preceding fields), avoiding the fragility of a positional field index that would break if a future kernel build added a prefix or moved the line. - Confirmed by 5-iteration sanity run: band 3.973 / 4.559 / 5.642 ms p10/p50/p90 (consistent with the existing baseline). 2. **L309 STATS parsing (8 echo|awk → single while-read loop).** Replaced eight `echo "$STATS" | awk '$1=="key" {print $2}'` invocations with one `while read -r key val; do case "$key" in ... esac done <<EOF $(read_stats) EOF` loop. Bash 3.2 compatible (heredoc keeps the loop in the parent shell so variable assignments persist past the loop end). Saves 7 fork+exec per harness invocation; runs once per harness invocation so the absolute saving is small, but the pattern is cleaner and more maintainable. Verification: cargo fmt clean, host-test 159/159 (no source changes), harness 5-iter sanity run produces consistent stats; refactor is behaviour-preserving. Refs: PR #21 review-round (gemini-code-assist) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Superseded by the integration PR for the B2 prep bundle: #19 + #20 + #21 + 2026-05-08 review artefacts + T3-M1 phase-b status-block fix all land together. Branch |
…losure status Re-verified all 13 §Follow-up backlog items from the 2026-05-07 PR #12-#17 multi-axis review against the current integration-branch state. All 9 hygiene items + the 1 forward-flagged P10 harness item are now closed; 3 remain forward-flagged on appropriate downstream venues. Two updates to the consolidated review file: 1. **Top-of-file closure-status banner** — readers see the per-item disposition at-a-glance without scrolling to the bottom backlog. 2. **§Follow-up backlog per-item closure annotations** — items 1-9 gain ✅ + closing-PR + closing-commit references; item 11 (P10 harness) gains ✅ + integration-PR reference + measured-baseline numbers; items 10/12/13 keep their forward-flagged status with "status unchanged 2026-05-08" markers. Re-verification at integration-branch HEAD (per item): | # | Item | Closing PR / commit | Verification | |---|---|---|---| | 1 | current.md + perf re-baseline `.text 22,020` | PR #18 / `94a6c0f` | grep "22,020 bytes" current.md → 1 hit | | 2 | cancel_recv_on_recv_complete test | PR #18 / `25854a1` | grep test name in ipc/mod.rs → 1 hit; host-test 159/159 | | 3 | ipc_cancel_recv doc-rider on cap-bearing state | PR #18 / `25854a1` | grep "destroy-drain callers (Phase B2+)" → 1 hit | | 4 | cancel-block SAFETY wording | PR #18 / `25854a1` | grep "caller_table.*shared.*reborrow" → 1 hit | | 5 | UNSAFE-2026-0014 SHA back-fill (c30f4ee, 7a402cb) | PR #18 / `94a6c0f` | grep both SHAs in unsafe-log.md → 2 hits each | | 6 | unsafe-policy.md §3 mechanical-edit exemption | PR #18 / `94a6c0f` | grep "Mechanical-edit exemption" → 1 hit | | 7 | ADR-0026 §Simulation chronology rider | PR #18 / `94a6c0f` | grep "§Simulation rule was retro-extracted" → 1 hit | | 8 | master-plan AC cross-reference | PR #18 / `94a6c0f` | grep "Closure-trio coordination cross-reference" in security + perf master-plans → 1 hit each | | 9 | ADR-0026 §skill-clause reconciliation rider | PR #18 / `94a6c0f` | grep "single-commit Propose+Accept landing reconciliation" → 1 hit | | 11 | P10 wall-clock harness | this integration PR (replaces #19/#20/#21) | tools/perf-harness.sh exists; baseline report exists; band p10=3.884/p50=4.642/p90=5.584 ms | Forward-flagged (status unchanged): - Item 10: RecvWaiting waiter-identity gap — ADR-0030 / ADR-0019 venue - Item 12: cancel-on-cap-bearing-state destroy-drain ADR — first userspace-destroy task venue - Item 13: B5+ preemption-rollback re-validation of ADR-0032 — B5+ preemption ADR venue This commit only touches the consolidated review's annotation; track files preserved as historical artefacts (their per-track verdicts are the snapshot at the moment of the review, not subject to back-edits). The review's per-item findings (Track-A NIT-2 SchedQueue::new doc rename; Track-G MIN-G1/G2/G3; Track-H MIN-1/MIN-2; Track-A MIN-2 ipc_cancel_recv doc-rider; Track-D D1; Track-F §F-1) are all closed in PR #18 + this integration PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… 2026-05-08 + 2026-05-07 follow-ups Closes the 12 remaining items flagged by the 2026-05-08 multi-axis review's §Follow-up backlog (3 Track-2 Majors, 1 Track-3 Major fix already closed in `59c08e9`, 6 Track-3 / Track-2 Minors, 2 Track-4 Minors, 2 Track-2 Nits) plus the carry-forward 2026-05-07 Track-H NIT-1. **Track 2 Majors (forward-flagged → ADR-0027 / phase-b ledger riders):** - M1 (escape-hatch doc): ADR-0027 §Decision outcome (c) gains a bullet documenting `mem::forget` / `ManuallyDrop` / `let _ = ...` as deliberate-but-rare escape hatches, mirroring the `x86_64::structures::paging::MapperFlush` precedent. - M2 (MMU-instance binding): ADR-0027 §Decision outcome (c) gains a bullet noting `MapperFlush::flush(self, mmu: &impl Mmu)` accepts any `Mmu`; multi-`Mmu` deployments (B3+ per-task `AddressSpace`, Phase C multi-CPU) will need a stronger token type. Out of scope for v1. - M3 (ADR-0034 placeholder): ADR-0027 §Decision outcome adds an "ADR-0034 (kernel-image section permissions) placeholder" block alongside ADR-0033, and `phase-b.md` ADR ledger gains rows for both ADR-0033 and ADR-0034 with named-but-unallocated discipline. **Track 3 Minors (governance / wording polish):** - m1 + m2 (current.md L52): drop "Phase-2" prefix on "§Simulation table" (the table walks Steps 0–4, not a "Phase 2"); "Accept will be" → "Accept landed as" + actual commit SHA `bb0a6ba`. - m3 (commit-style.md PR-numbering rider): new §"PR-number references in committed artefacts" subsection naming the recurrence (PR #18 + PR #20 each had a one-commit PR-number fix-up) and codifying three acceptable disciplines (defer banner authoring; reference branch slug; or use commit SHA). - n1 (framing alignment): "first to apply Simulation forward" wording in current.md (banner + Active decisions row) and phase-b.md §B2 status block aligned to the precise "first non-recovery-primitive state-machine ADR drafted under §Simulation" phrasing — ADR-0032's Propose did land with a table; the prior framing was technically defensible only under a narrow reading of "retro-extracted". **Track 2 Nits (substance riders in ADR-0027):** - #4 (DSB ISH vs DSB NSH rationale): §Simulation gains a rationale paragraph after the table — `ISH` is forward-compatible with the eventual SMP boot, sub-microsecond cost on single-core, matches Linux aarch64 `arch/arm64/mm/proc.S` for the same reason. - #5 (TCR_EL1.AS wording tighten): line 59 reworded — `AS = 0` selects 8-bit ASID *size*, not "the ASID value is 0" (the value is `TTBR0_EL1.ASID = 0` and is what's "globally used in v1"). - #7 (line 17 §-citation precision) + Track-3 n1: "first non-recovery-primitive state-machine" framing now precise. - #8 (memory-management.md L88 page-table descriptor cosmetic): ASCII bit-field diagram redrawn to match L2 block-descriptor reality (OutputAddress[47:21], not [47:12]); explanatory note added for L1-block / L3-page variants. **Track 4 Minors (perf-harness.sh):** - #1 (Ctrl-C cleanup trap): new `cleanup_in_flight` shell function + `trap '...' EXIT INT TERM` that kills any in-flight QEMU + watchdog PIDs tracked in `CURRENT_CMD_PID` / `CURRENT_WATCHDOG_PID` shell globals. `run_with_timeout` updates the globals at every call so the trap addresses whichever pair is currently in flight; clears them at every clean exit so the trap is a no-op outside iterations. - #2 (p99 small-N reporting hygiene): generated baseline report Methodology section gains a "**Note on p99 at small `n`**" paragraph explaining that under nearest-rank `p99 == max` for `n < 100` and callers should not over-read it as a tail-latency signal until `n >= 100`. **Track 4 Nit #3 (read_stats refactor):** **Already closed by PR #21 review-round commit `ef30b5c`** — the 8 `echo | awk` parses became a single `while read` loop. Verified at HEAD (`grep -c "while read -r key val" tools/perf-harness.sh` → 1 hit). **2026-05-07 Track-H NIT-1 (Pending Amendment closure-path indexing):** UNSAFE-2026-0019 / 0020 / 0021 each gain a 2026-05-08 "closure-path indexed" Amendment naming the canonical clearance trigger (B5 Milestone, ADR-0030 entry-point, deadline-arming syscall) explicitly, so a future reader of `unsafe-log.md` alone has the full picture without leaving the file. No semantic change; co-locates information that was previously distributed across `phase-b.md` cross-references. Verification gates re-run on the integration branch: - `cargo fmt --all -- --check` clean - `cargo host-test` 159/159 (25 + 100 + 34) - `cargo host-clippy` clean (-D warnings) - `cargo kernel-clippy` clean - `cargo kernel-build` clean - `tools/perf-harness.sh --iterations=3` runs end-to-end with the new trap + Methodology note + while-read parsing intact This commit + the prior 2026-05-08 review's recommendations close all follow-ups identified by both 2026-05-07 and 2026-05-08 multi-axis reviews. Forward-flagged items (10/12/13 from 2026-05-07) and any review-round bot input on this integration branch remain the only open items. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
P10 wall-clock benchmark harness lands. Promotes single-run boot-to-end anecdotes (the wide ~4–6.5 ms band that absorbed PR #17's 4.1 ms vs PR #16's 5.8 ms without flagging) to a multi-run percentile band measured against a tight statistical envelope — load-bearing for measuring B2 (T-016 MMU activation) regressions against a sound baseline rather than the pre-P10 envelope.
Two commits:
1de8143—feat(tooling): add perf-harness.sh for multi-run boot-to-end aggregation. Pure bash + awk (macOS bash 3.2 + BSD awk compatible). Portable per-run watchdog (no GNUtimeoutdependency).abf26b9—docs: P10 baseline report + roadmap / perf-rebaseline cross-references. New baseline report; new "Performance harness" section indocs/standards/infrastructure.md; new 2026-05-08 banner indocs/roadmap/current.md(older banners preserved); one-line cross-reference in the 2026-05-07 perf re-baseline §"Post-T-015 amendment".Measured baseline
Debug build, 20 iterations, 5 s per-run timeout, QEMU TCG, all 20 valid:
Brackets the previous "~4–6.5 ms typical" anecdote tightly. From now on,
current.mdquotes the measured band instead of single-run anecdotes; B2 changes are measured against this envelope.Harness usage
Flags:
--iterations=K(default 20),--timeout=SECONDS(default 5),--release(forwards totools/run-qemu.sh),--quiet(suppress per-iteration progress),--report=CONTEXT(write a markdown report todocs/analysis/reports/perf-baseline-<context>.md).What's not in this PR
.github/workflows/change. Harness is maintainer-launched, matching the existingtools/run-qemu.shconvention.--baseline=<file>flag that compares against a stored baseline and exits non-zero on drift.cfg(feature = "bench")) — that's a future P10 extension; v1 P10 is multi-run aggregation of the existing boot-to-end emission only.Deviations from the brief
YYYY-MM-DD-from--report=CONTEXTso the user-supplied context2026-05-08-post-pr-19-pre-adr-0027producesperf-baseline-2026-05-08-post-pr-19-pre-adr-0027.md, not the doubled-date form. Same de-duplication on the H1 title.(N+1)/2— at least 50 % valid (10/20), per the brief's literal "fewer than 50 % failure → exit non-zero".Project-side observations for the maintainer
originURL still points atcemililik/UmbrixOS.git; GitHub is redirecting tocemililik/Tyrne.git. Worth runninggit remote set-url origin https://github.com/cemililik/Tyrne.gitso future pushes don't depend on the redirect.awkrejects nested function definitions; this is now handled by liftingpctto top-level inside the awk program. Worth keeping in mind for any future bash + awk tooling.Test plan
timeoutcargo fmt --all -- --checkclean (no Rust changes)cargo host-test159/159 (no source changes)cargo +nightly miri test159/159tools/run-qemu.shsmoke unaffected (full demo trace; baseline report includes the same trace 20 times)docs/analysis/reports/perf-baseline-2026-05-08-post-pr-19-pre-adr-0027.md) cross-references match the harness's live output🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation