Skip to content

feat(tools): perf-harness.sh — multi-run boot-to-end aggregation (P10)#21

Closed
cemililik wants to merge 3 commits into
mainfrom
p10-wall-clock-bench-harness
Closed

feat(tools): perf-harness.sh — multi-run boot-to-end aggregation (P10)#21
cemililik wants to merge 3 commits into
mainfrom
p10-wall-clock-bench-harness

Conversation

@cemililik
Copy link
Copy Markdown
Collaborator

@cemililik cemililik commented May 8, 2026

Summary

P10 wall-clock benchmark harness lands. Promotes single-run boot-to-end anecdotes (the wide ~4–6.5 ms band that absorbed PR #17's 4.1 ms vs PR #16's 5.8 ms without flagging) to a multi-run percentile band measured against a tight statistical envelope — load-bearing for measuring B2 (T-016 MMU activation) regressions against a sound baseline rather than the pre-P10 envelope.

Two commits:

  1. 1de8143feat(tooling): add perf-harness.sh for multi-run boot-to-end aggregation. Pure bash + awk (macOS bash 3.2 + BSD awk compatible). Portable per-run watchdog (no GNU timeout dependency).
  2. abf26b9docs: P10 baseline report + roadmap / perf-rebaseline cross-references. New baseline report; new "Performance harness" section in docs/standards/infrastructure.md; new 2026-05-08 banner in docs/roadmap/current.md (older banners preserved); one-line cross-reference in the 2026-05-07 perf re-baseline §"Post-T-015 amendment".

Measured baseline

Debug build, 20 iterations, 5 s per-run timeout, QEMU TCG, all 20 valid:

Metric ms
p10 3.884
p50 (median) 4.642
p90 5.584
p99 6.558
mean 4.711
stddev 0.709

Brackets the previous "~4–6.5 ms typical" anecdote tightly. From now on, current.md quotes the measured band instead of single-run anecdotes; B2 changes are measured against this envelope.

Harness usage

# Default (20 iterations, 5 s timeout, no report file):
./tools/perf-harness.sh

# Custom iteration count + timeout + named report:
./tools/perf-harness.sh --iterations=50 --timeout=10 --report=2026-05-09-post-t-016

Flags: --iterations=K (default 20), --timeout=SECONDS (default 5), --release (forwards to tools/run-qemu.sh), --quiet (suppress per-iteration progress), --report=CONTEXT (write a markdown report to docs/analysis/reports/perf-baseline-<context>.md).

What's not in this PR

  • CI integration — no .github/workflows/ change. Harness is maintainer-launched, matching the existing tools/run-qemu.sh convention.
  • Regression-detection threshold — the harness prints the band; future enhancement: a --baseline=<file> flag that compares against a stored baseline and exits non-zero on drift.
  • IPC round-trip-specific kernel benchmark (kernel-side micro-bench under cfg(feature = "bench")) — that's a future P10 extension; v1 P10 is multi-run aggregation of the existing boot-to-end emission only.

Deviations from the brief

  • Report-filename construction strips a leading YYYY-MM-DD- from --report=CONTEXT so the user-supplied context 2026-05-08-post-pr-19-pre-adr-0027 produces perf-baseline-2026-05-08-post-pr-19-pre-adr-0027.md, not the doubled-date form. Same de-duplication on the H1 title.
  • Threshold uses (N+1)/2 — at least 50 % valid (10/20), per the brief's literal "fewer than 50 % failure → exit non-zero".

Project-side observations for the maintainer

  • The repo's origin URL still points at cemililik/UmbrixOS.git; GitHub is redirecting to cemililik/Tyrne.git. Worth running git remote set-url origin https://github.com/cemililik/Tyrne.git so future pushes don't depend on the redirect.
  • macOS default awk rejects nested function definitions; this is now handled by lifting pct to top-level inside the awk program. Worth keeping in mind for any future bash + awk tooling.

Test plan

  • Harness runs cleanly on macOS bash 3.2 + BSD awk + no GNU timeout
  • 20-iteration baseline produces a valid percentile band (no failed iterations)
  • cargo fmt --all -- --check clean (no Rust changes)
  • cargo host-test 159/159 (no source changes)
  • cargo +nightly miri test 159/159
  • tools/run-qemu.sh smoke unaffected (full demo trace; baseline report includes the same trace 20 times)
  • Baseline report (docs/analysis/reports/perf-baseline-2026-05-08-post-pr-19-pre-adr-0027.md) cross-references match the harness's live output

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Introduced a performance benchmarking harness for capturing and analyzing boot-to-end timing metrics across multiple iterations with statistical aggregation (min, percentiles, max, mean, stddev).
  • Documentation

    • Established performance measurement standards, including guidelines for percentile-based reporting and baseline capture methodology.
    • Generated initial performance baseline report.
    • Updated roadmap with performance infrastructure implementation status.

cemililik added 2 commits May 8, 2026 13:08
Single-run boot-to-end claims have ~15-30 % run-to-run variance under
QEMU TCG (the counter advances on emulated instructions, not wall-
clock time, so translation-cache state leaks into the number). The
2026-05-07 multi-axis review's Track D §D2 promoted the queued P10
proposal — a multi-run harness — from "queued" to "load-bearing
before B2 ADR-0027 implementation" so future perf-relevant changes
land against a tight percentile band rather than a single anecdote.

The harness wraps tools/run-qemu.sh in an iteration loop with a
portable per-run watchdog (the kernel halts in WFI after the demo;
QEMU never exits on its own), parses the kernel's
"boot-to-end elapsed = X ns" emission out of each run, and prints
min / p10 / p50 / p90 / p99 / max / mean / stddev in both ns and
ms. It optionally writes a markdown report under
docs/analysis/reports/ using the perf-review master plan's Inputs
/ Methodology / Metric / Verdict shape. Pure bash + awk; macOS
bash 3.2 compatible (matches run-qemu.sh's existing idioms; no
GNU `timeout` binary required).

A run aborts non-zero if fewer than half of its iterations
produced a valid sample — that threshold is treated as
environmental rather than a measurement worth aggregating, per
the brief.

Out of scope: CI integration (maintainer-launched only, like the
QEMU smoke), automatic regression detection vs a stored baseline,
the kernel-side `cfg(feature = "bench")` IPC microbench (a future
P10 extension; v1 is multi-run aggregation of the existing boot-
to-end emission).

Refs: P10 (2026-05-06 Track D review), 2026-05-07 multi-axis review §D2
Records the first measured boot-to-end band produced by
tools/perf-harness.sh and wires the harness into the project's
documentation surface.

- docs/analysis/reports/perf-baseline-2026-05-08-post-pr-19-pre-
  adr-0027.md — auto-generated baseline at HEAD `aa7e6c5`; debug
  build, 20 iterations, 5 s per-run timeout, QEMU TCG. Headline
  band: p10=3.884 ms / p50=4.642 ms / p90=5.584 ms / p99=6.558 ms,
  mean 4.711 ms, stddev 0.709 ms. Brackets the prior "~4-6.5 ms
  typical" anecdote tightly but is now a measured band on this
  host rather than an order-of-magnitude observation.
- docs/standards/infrastructure.md — new "Performance harness"
  section names tools/perf-harness.sh as the canonical source for
  boot-to-end timing claims and deprecates single-run anecdotes in
  PR bodies.
- docs/roadmap/current.md — new 2026-05-08 banner above the
  2026-05-07 banner records the harness landing and the measured
  band; old banner preserved as the historical record (matches the
  append-only update discipline already used in this file).
- docs/analysis/reviews/performance-optimization-reviews/
  2026-05-07-B1-closure.md — one-line cross-reference appended at
  the end of the "Post-T-015 amendment" section pointing at the
  new baseline report. The existing single-run claims throughout
  the 2026-05-07 baseline are deliberately preserved (the brief
  explicitly asked for the historical record to stay intact).

The 2026-05-08 banner does not promote any task to In Progress /
Done — the harness is tooling, not a roadmap task. B2 prep
(ADR-0027 drafting) remains the active thread.

Refs: P10 (2026-05-06 Track D review), 2026-05-07 multi-axis review §D2
@qodo-code-review
Copy link
Copy Markdown

ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one.

Copy link
Copy Markdown

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @cemililik, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 8, 2026

Review Change Stack

Warning

Rate limit exceeded

@cemililik has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 56 minutes and 26 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 30edee82-efe4-43ac-a7be-ad4efe94d3de

📥 Commits

Reviewing files that changed from the base of the PR and between abf26b9 and ef30b5c.

📒 Files selected for processing (1)
  • tools/perf-harness.sh
📝 Walkthrough

Walkthrough

This PR introduces a performance harness tool for multi-run boot timing aggregation. The changes add tools/perf-harness.sh (a Bash script that runs QEMU multiple times, extracts nanosecond samples, and computes percentiles), establish collection standards in docs/standards/infrastructure.md, generate the first baseline report, and update related documentation to deprecate single-run claims and record the new methodology.

Changes

Performance Harness Tool and Standards

Layer / File(s) Summary
Performance Collection Standard
docs/standards/infrastructure.md
New "Performance harness" section defines tools/perf-harness.sh as canonical source for boot-to-end timing claims, describes iteration/sample extraction/aggregation, specifies 50% validity threshold for non-zero exit, establishes reporting discipline for PR claims, and documents QEMU TCG now_ns() measurement caveats.
Harness Script Implementation
tools/perf-harness.sh
New Bash harness orchestrates multi-run iterations under per-run timeout, extracts first boot-to-end elapsed = <ns> ns sample per run, computes min/p10/p50/p90/p99/max/mean/stddev via awk, formats output with thousands separators, and optionally generates Markdown baseline reports with run inputs, methodology, metrics table, and raw samples.
First Baseline Report
docs/analysis/reports/perf-baseline-2026-05-08-post-pr-19-pre-adr-0027.md
New baseline report records run timestamp, iterations, timeout, build/kernel/git/environment details, extraction methodology, computed statistics table in ns/ms, ordered raw sample values, and verdict marking as baseline-only and instructing future comparisons to cite percentile bands.
Integration Updates
docs/analysis/reviews/performance-optimization-reviews/2026-05-07-B1-closure.md, docs/roadmap/current.md
Append-only notes documenting P10 harness landing, linking to first measured baseline, recording p10/p50/p90/p99 figures, and noting deprecation of single-run boot-to-end claims under infrastructure policy.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A harness born to measure boot,
Multi-runs through QEMU's root,
Percentiles dance in neat array,
Baselines baseline the faster way! 📊✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat(tools): perf-harness.sh — multi-run boot-to-end aggregation (P10)' directly and clearly summarizes the main change: adding a performance harness tool for multi-run boot-to-end timing aggregation.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch p10-wall-clock-bench-harness

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

cemililik added a commit that referenced this pull request May 8, 2026
The 2026-05-08 banner originally claimed P10 lands in PR #20; the
actual PR-number assignment landed P10 at #21 (after path-drift PR #19
and ADR-0027 PR #20). Single-line correction; nothing else changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a performance harness tool (tools/perf-harness.sh) for multi-run boot-to-end timing aggregation and updates documentation to establish a measured performance baseline. The feedback focuses on optimizing the bash script by replacing inefficient piped commands and subshells with more direct awk and while read patterns to improve performance and readability.

Comment thread tools/perf-harness.sh Outdated
Comment on lines +213 to +217
SAMPLE=$(printf '%s\n' "$OUTPUT" \
| grep -oE 'boot-to-end elapsed = [0-9]+ ns' \
| head -n 1 \
| grep -oE '[0-9]+' \
| head -n 1 || true)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The extraction of the nanosecond sample can be significantly simplified using a single awk command. This approach is more efficient as it avoids multiple piped processes and stops processing after the first match is found. Using a here-string (<<<) is also more idiomatic in Bash for passing variable content to a command.

Suggested change
SAMPLE=$(printf '%s\n' "$OUTPUT" \
| grep -oE 'boot-to-end elapsed = [0-9]+ ns' \
| head -n 1 \
| grep -oE '[0-9]+' \
| head -n 1 || true)
SAMPLE=$(awk '/boot-to-end elapsed = [0-9]+ ns/ { print $4; exit }' <<< "$OUTPUT")

Comment thread tools/perf-harness.sh Outdated
Comment on lines +310 to +318
STATS=$(read_stats)
STAT_MIN=$(echo "$STATS" | awk '$1=="min" {print $2}')
STAT_P10=$(echo "$STATS" | awk '$1=="p10" {print $2}')
STAT_P50=$(echo "$STATS" | awk '$1=="p50" {print $2}')
STAT_P90=$(echo "$STATS" | awk '$1=="p90" {print $2}')
STAT_P99=$(echo "$STATS" | awk '$1=="p99" {print $2}')
STAT_MAX=$(echo "$STATS" | awk '$1=="max" {print $2}')
STAT_MEAN=$(echo "$STATS" | awk '$1=="mean" {print $2}')
STAT_STDDEV=$(echo "$STATS" | awk '$1=="stddev" {print $2}')
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Parsing the STATS output by repeatedly calling awk in subshells is inefficient. A single while read loop can process the key-value pairs in one pass, improving performance and readability while remaining compatible with Bash 3.2.

Suggested change
STATS=$(read_stats)
STAT_MIN=$(echo "$STATS" | awk '$1=="min" {print $2}')
STAT_P10=$(echo "$STATS" | awk '$1=="p10" {print $2}')
STAT_P50=$(echo "$STATS" | awk '$1=="p50" {print $2}')
STAT_P90=$(echo "$STATS" | awk '$1=="p90" {print $2}')
STAT_P99=$(echo "$STATS" | awk '$1=="p99" {print $2}')
STAT_MAX=$(echo "$STATS" | awk '$1=="max" {print $2}')
STAT_MEAN=$(echo "$STATS" | awk '$1=="mean" {print $2}')
STAT_STDDEV=$(echo "$STATS" | awk '$1=="stddev" {print $2}')
while read -r key val; do
case "$key" in
min) STAT_MIN=$val ;;
p10) STAT_P10=$val ;;
p50) STAT_P50=$val ;;
p90) STAT_P90=$val ;;
p99) STAT_P99=$val ;;
max) STAT_MAX=$val ;;
mean) STAT_MEAN=$val ;;
stddev) STAT_STDDEV=$val ;;
esac
done <<EOF
$(read_stats)
EOF

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tools/perf-harness.sh (1)

420-421: 💤 Low value

Pre-compute relative paths to silence SC2295.

${KERNEL_ELF#${REPO_ROOT}/} (line 421) and ${REPORT_PATH#${REPO_ROOT}/} (line 492) treat $REPO_ROOT as a glob pattern inside the # operator, so a path containing [, ], *, or ? would silently mismatch. While REPO_ROOT is a realpath-resolved string unlikely to carry glob characters, the fix is a one-liner pre-compute before each use-site.

🔧 Proposed fix
+    KERNEL_ELF_REL="${KERNEL_ELF#"$REPO_ROOT"/}"
     {
         ...
-        echo "| Kernel ELF | \`${KERNEL_ELF#${REPO_ROOT}/}\` |"
+        echo "| Kernel ELF | \`${KERNEL_ELF_REL}\` |"
-    echo "report written: ${REPORT_PATH#${REPO_ROOT}/}"
+    REPORT_PATH_REL="${REPORT_PATH#"$REPO_ROOT"/}"
+    echo "report written: $REPORT_PATH_REL"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/perf-harness.sh` around lines 420 - 421, Pre-compute the relative paths
instead of using parameter expansion with REPO_ROOT inline to avoid glob
interpretation: create variables like kernel_rel by stripping REPO_ROOT from
KERNEL_ELF (e.g., compute kernel_rel from KERNEL_ELF and REPO_ROOT before the
echo) and similarly compute report_rel from REPORT_PATH and REPO_ROOT before its
use; then use kernel_rel and report_rel in the echo lines that currently
reference `${KERNEL_ELF#${REPO_ROOT}/}` and `${REPORT_PATH#${REPO_ROOT}/}` so
the `#` operator never sees REPO_ROOT as a glob pattern.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tools/perf-harness.sh`:
- Around line 420-421: Pre-compute the relative paths instead of using parameter
expansion with REPO_ROOT inline to avoid glob interpretation: create variables
like kernel_rel by stripping REPO_ROOT from KERNEL_ELF (e.g., compute kernel_rel
from KERNEL_ELF and REPO_ROOT before the echo) and similarly compute report_rel
from REPORT_PATH and REPO_ROOT before its use; then use kernel_rel and
report_rel in the echo lines that currently reference
`${KERNEL_ELF#${REPO_ROOT}/}` and `${REPORT_PATH#${REPO_ROOT}/}` so the `#`
operator never sees REPO_ROOT as a glob pattern.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0b289f79-0dc6-4fa3-852d-3b738b6ac4c0

📥 Commits

Reviewing files that changed from the base of the PR and between aa7e6c5 and abf26b9.

📒 Files selected for processing (5)
  • docs/analysis/reports/perf-baseline-2026-05-08-post-pr-19-pre-adr-0027.md
  • docs/analysis/reviews/performance-optimization-reviews/2026-05-07-B1-closure.md
  • docs/roadmap/current.md
  • docs/standards/infrastructure.md
  • tools/perf-harness.sh

…ctor)

Both gemini-code-assist inline comments on PR #21 flagged efficiency
improvements:

1. **L213 sample extraction (4-pipe → single awk).** Replaced the
   `printf | grep | head | grep | head` pipeline with one awk
   invocation. Departures from gemini's literal suggestion:
   - Used sub/sub stripping rather than `print $4` because the kernel's
     boot-to-end line has the format "tyrne: boot-to-end elapsed = NUM
     ns" — `$4` is the equals sign; `$5` is the number. The sub/sub
     pattern is format-shift-tolerant (any "...= NUM ns..." shape
     resolves to NUM regardless of preceding fields), avoiding the
     fragility of a positional field index that would break if a
     future kernel build added a prefix or moved the line.
   - Confirmed by 5-iteration sanity run: band 3.973 / 4.559 / 5.642
     ms p10/p50/p90 (consistent with the existing baseline).

2. **L309 STATS parsing (8 echo|awk → single while-read loop).**
   Replaced eight `echo "$STATS" | awk '$1=="key" {print $2}'`
   invocations with one `while read -r key val; do case "$key" in
   ... esac done <<EOF $(read_stats) EOF` loop. Bash 3.2 compatible
   (heredoc keeps the loop in the parent shell so variable
   assignments persist past the loop end). Saves 7 fork+exec per
   harness invocation; runs once per harness invocation so the
   absolute saving is small, but the pattern is cleaner and more
   maintainable.

Verification: cargo fmt clean, host-test 159/159 (no source changes),
harness 5-iter sanity run produces consistent stats; refactor is
behaviour-preserving.

Refs: PR #21 review-round (gemini-code-assist)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cemililik
Copy link
Copy Markdown
Collaborator Author

Superseded by the integration PR for the B2 prep bundle: #19 + #20 + #21 + 2026-05-08 review artefacts + T3-M1 phase-b status-block fix all land together. Branch p10-wall-clock-bench-harness preserved for reference; the integration branch integration-2026-05-08-b2-prep-bundle carries this PR's content (3 commits including the gemini-review-round refactor) unchanged.

@cemililik cemililik closed this May 8, 2026
cemililik added a commit that referenced this pull request May 8, 2026
…losure status

Re-verified all 13 §Follow-up backlog items from the 2026-05-07 PR
#12-#17 multi-axis review against the current integration-branch
state. All 9 hygiene items + the 1 forward-flagged P10 harness item
are now closed; 3 remain forward-flagged on appropriate downstream
venues. Two updates to the consolidated review file:

1. **Top-of-file closure-status banner** — readers see the per-item
   disposition at-a-glance without scrolling to the bottom backlog.
2. **§Follow-up backlog per-item closure annotations** — items 1-9
   gain ✅ + closing-PR + closing-commit references; item 11 (P10
   harness) gains ✅ + integration-PR reference + measured-baseline
   numbers; items 10/12/13 keep their forward-flagged status with
   "status unchanged 2026-05-08" markers.

Re-verification at integration-branch HEAD (per item):

| # | Item | Closing PR / commit | Verification |
|---|---|---|---|
| 1 | current.md + perf re-baseline `.text 22,020` | PR #18 / `94a6c0f` | grep "22,020 bytes" current.md → 1 hit |
| 2 | cancel_recv_on_recv_complete test | PR #18 / `25854a1` | grep test name in ipc/mod.rs → 1 hit; host-test 159/159 |
| 3 | ipc_cancel_recv doc-rider on cap-bearing state | PR #18 / `25854a1` | grep "destroy-drain callers (Phase B2+)" → 1 hit |
| 4 | cancel-block SAFETY wording | PR #18 / `25854a1` | grep "caller_table.*shared.*reborrow" → 1 hit |
| 5 | UNSAFE-2026-0014 SHA back-fill (c30f4ee, 7a402cb) | PR #18 / `94a6c0f` | grep both SHAs in unsafe-log.md → 2 hits each |
| 6 | unsafe-policy.md §3 mechanical-edit exemption | PR #18 / `94a6c0f` | grep "Mechanical-edit exemption" → 1 hit |
| 7 | ADR-0026 §Simulation chronology rider | PR #18 / `94a6c0f` | grep "§Simulation rule was retro-extracted" → 1 hit |
| 8 | master-plan AC cross-reference | PR #18 / `94a6c0f` | grep "Closure-trio coordination cross-reference" in security + perf master-plans → 1 hit each |
| 9 | ADR-0026 §skill-clause reconciliation rider | PR #18 / `94a6c0f` | grep "single-commit Propose+Accept landing reconciliation" → 1 hit |
| 11 | P10 wall-clock harness | this integration PR (replaces #19/#20/#21) | tools/perf-harness.sh exists; baseline report exists; band p10=3.884/p50=4.642/p90=5.584 ms |

Forward-flagged (status unchanged):
- Item 10: RecvWaiting waiter-identity gap — ADR-0030 / ADR-0019 venue
- Item 12: cancel-on-cap-bearing-state destroy-drain ADR — first userspace-destroy task venue
- Item 13: B5+ preemption-rollback re-validation of ADR-0032 — B5+ preemption ADR venue

This commit only touches the consolidated review's annotation; track
files preserved as historical artefacts (their per-track verdicts are
the snapshot at the moment of the review, not subject to back-edits).
The review's per-item findings (Track-A NIT-2 SchedQueue::new doc
rename; Track-G MIN-G1/G2/G3; Track-H MIN-1/MIN-2; Track-A MIN-2
ipc_cancel_recv doc-rider; Track-D D1; Track-F §F-1) are all closed
in PR #18 + this integration PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cemililik added a commit that referenced this pull request May 8, 2026
… 2026-05-08 + 2026-05-07 follow-ups

Closes the 12 remaining items flagged by the 2026-05-08 multi-axis
review's §Follow-up backlog (3 Track-2 Majors, 1 Track-3 Major fix
already closed in `59c08e9`, 6 Track-3 / Track-2 Minors, 2 Track-4
Minors, 2 Track-2 Nits) plus the carry-forward 2026-05-07 Track-H
NIT-1.

**Track 2 Majors (forward-flagged → ADR-0027 / phase-b ledger riders):**
- M1 (escape-hatch doc): ADR-0027 §Decision outcome (c) gains a bullet
  documenting `mem::forget` / `ManuallyDrop` / `let _ = ...` as
  deliberate-but-rare escape hatches, mirroring the
  `x86_64::structures::paging::MapperFlush` precedent.
- M2 (MMU-instance binding): ADR-0027 §Decision outcome (c) gains a
  bullet noting `MapperFlush::flush(self, mmu: &impl Mmu)` accepts
  any `Mmu`; multi-`Mmu` deployments (B3+ per-task `AddressSpace`,
  Phase C multi-CPU) will need a stronger token type. Out of scope
  for v1.
- M3 (ADR-0034 placeholder): ADR-0027 §Decision outcome adds an
  "ADR-0034 (kernel-image section permissions) placeholder" block
  alongside ADR-0033, and `phase-b.md` ADR ledger gains rows for
  both ADR-0033 and ADR-0034 with named-but-unallocated discipline.

**Track 3 Minors (governance / wording polish):**
- m1 + m2 (current.md L52): drop "Phase-2" prefix on "§Simulation
  table" (the table walks Steps 0–4, not a "Phase 2"); "Accept will
  be" → "Accept landed as" + actual commit SHA `bb0a6ba`.
- m3 (commit-style.md PR-numbering rider): new §"PR-number references
  in committed artefacts" subsection naming the recurrence (PR #18 +
  PR #20 each had a one-commit PR-number fix-up) and codifying three
  acceptable disciplines (defer banner authoring; reference branch
  slug; or use commit SHA).
- n1 (framing alignment): "first to apply Simulation forward" wording
  in current.md (banner + Active decisions row) and phase-b.md §B2
  status block aligned to the precise "first non-recovery-primitive
  state-machine ADR drafted under §Simulation" phrasing — ADR-0032's
  Propose did land with a table; the prior framing was technically
  defensible only under a narrow reading of "retro-extracted".

**Track 2 Nits (substance riders in ADR-0027):**
- #4 (DSB ISH vs DSB NSH rationale): §Simulation gains a rationale
  paragraph after the table — `ISH` is forward-compatible with the
  eventual SMP boot, sub-microsecond cost on single-core, matches
  Linux aarch64 `arch/arm64/mm/proc.S` for the same reason.
- #5 (TCR_EL1.AS wording tighten): line 59 reworded — `AS = 0`
  selects 8-bit ASID *size*, not "the ASID value is 0" (the value
  is `TTBR0_EL1.ASID = 0` and is what's "globally used in v1").
- #7 (line 17 §-citation precision) + Track-3 n1: "first
  non-recovery-primitive state-machine" framing now precise.
- #8 (memory-management.md L88 page-table descriptor cosmetic):
  ASCII bit-field diagram redrawn to match L2 block-descriptor
  reality (OutputAddress[47:21], not [47:12]); explanatory note
  added for L1-block / L3-page variants.

**Track 4 Minors (perf-harness.sh):**
- #1 (Ctrl-C cleanup trap): new `cleanup_in_flight` shell function +
  `trap '...' EXIT INT TERM` that kills any in-flight QEMU + watchdog
  PIDs tracked in `CURRENT_CMD_PID` / `CURRENT_WATCHDOG_PID` shell
  globals. `run_with_timeout` updates the globals at every call so
  the trap addresses whichever pair is currently in flight; clears
  them at every clean exit so the trap is a no-op outside iterations.
- #2 (p99 small-N reporting hygiene): generated baseline report
  Methodology section gains a "**Note on p99 at small `n`**" paragraph
  explaining that under nearest-rank `p99 == max` for `n < 100` and
  callers should not over-read it as a tail-latency signal until
  `n >= 100`.

**Track 4 Nit #3 (read_stats refactor):** **Already closed by PR #21
review-round commit `ef30b5c`** — the 8 `echo | awk` parses became a
single `while read` loop. Verified at HEAD (`grep -c "while read -r
key val" tools/perf-harness.sh` → 1 hit).

**2026-05-07 Track-H NIT-1 (Pending Amendment closure-path indexing):**
UNSAFE-2026-0019 / 0020 / 0021 each gain a 2026-05-08 "closure-path
indexed" Amendment naming the canonical clearance trigger (B5
Milestone, ADR-0030 entry-point, deadline-arming syscall) explicitly,
so a future reader of `unsafe-log.md` alone has the full picture
without leaving the file. No semantic change; co-locates information
that was previously distributed across `phase-b.md` cross-references.

Verification gates re-run on the integration branch:
- `cargo fmt --all -- --check` clean
- `cargo host-test` 159/159 (25 + 100 + 34)
- `cargo host-clippy` clean (-D warnings)
- `cargo kernel-clippy` clean
- `cargo kernel-build` clean
- `tools/perf-harness.sh --iterations=3` runs end-to-end with the new
  trap + Methodology note + while-read parsing intact

This commit + the prior 2026-05-08 review's recommendations close all
follow-ups identified by both 2026-05-07 and 2026-05-08 multi-axis
reviews. Forward-flagged items (10/12/13 from 2026-05-07) and any
review-round bot input on this integration branch remain the only
open items.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cemililik added a commit that referenced this pull request May 8, 2026
…bundle

B2 prep integration bundle (replaces #19, #20, #21) — ADR-0027 + T-016 Draft + P10 harness + path-drift sweep + 2026-05-08 review
@cemililik cemililik deleted the p10-wall-clock-bench-harness branch May 25, 2026 12:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant