feat(persona): per-phase latency decomposition in serve_persona_loop (#195 slice 1) #1532
Merged
…195 slice 1)

Slice 1 of task #195 (inference latency perfection campaign).

Per `[[observability-is-half-the-architecture]]` + Joel 2026-06-05 "focus on perfecting real inference latency": the substrate doesn't optimize what it can't measure. Today `ServeOutcome` captures only end-to-end `turn_latency`. Subsequent slices need phase-level data to know which step of the per-turn cycle to attack; otherwise we're guessing.

## What lands

Five new `LatencyAggregate` fields on `ServeOutcome`, each measuring one phase of the per-turn cognition cycle:

- `recall_latency`: `cognition.admission.recall_scored` (L2 retrieval; holds the cognition mutex)
- `admit_latency`: `cognition.admission.admit` (engram formation for the inbox message; holds the mutex)
- `compose_latency`: `cognition.compose_for_turn` (RAG flexbox composer + multi-source delivery; holds the mutex)
- `respond_latency`: `persona::response::respond` (the LLM bulk; does NOT hold the mutex per the existing async contract). Typically the dominant cost and the primary target of slice 2+.
- `say_latency`: `conversation.say` (airc publish + ack)

Each phase wraps with `Instant::now()` + `.elapsed().as_millis()` into a `PhaseTimings` scratchpad zeroed per turn. After `conversation.say` succeeds, the scratchpad is recorded into the matching `ServeOutcome` aggregates. Only successful replies sample the phase aggregates (identical gate as `turn_latency`), so the decomposition stays internally consistent.

## Tracing

The existing "turn complete" `tracing::info!` line gains five new fields (`recall_ms`, `admit_ms`, `compose_ms`, `respond_ms`, `say_ms`) so operators see the per-turn breakdown live without scraping the test fixture. Tag updated to `"turn complete — substrate's per-reply cost recorded with phase decomposition"` so log readers see the decomposition is wired.

## Why this matters for #195 sequencing

Without the decomposition we'd guess at which optimization moves the needle. With it, slice 2 can be data-driven: run real persona turns on the LCD tier, observe whether `respond_ms` is the dominant cost (almost certainly) or whether `compose_latency` is surprising the operator (possible, since the RAG composer touches multiple sources), and direct the next slice accordingly.

Per `[[init-once-handle-then-lease-zero-copy-refs]]`: substrate latency lives in reinit + memory/disk, not raw compute. Decomposition will tell us WHERE the reinit costs are hiding.

## Tests (+2 new, 8/8 service_loop tests pass)

`persona::service_loop::tests`:

- `serve_outcome_phase_aggregates_default_to_empty`: pins that every new phase aggregate starts at `count=0`, `mean_ms=None`. Future field additions can't accidentally regress the `Default` impl for existing fields.
- `phase_aggregates_use_same_type_as_turn_latency`: pins that every phase aggregate is the same `LatencyAggregate` shape as `turn_latency` (record/total/mean call sites compile identically). A future refactor that swaps one to a different type fails to compile. Also checks the sum-of-phases ≤ turn_latency invariant.

The four pre-existing `#[ignore]` tests in this module (gated on the slice 1D / #161 global adapter registration) remain ignored; this slice is structural addition with no behavioral change to the loop's reply path.

## Out of scope (subsequent slices)

- No optimization of any phase. Pure measurement layer.
- No format/breaking changes to `ServeOutcome` (additive fields only).
- No tracing format change beyond extending the existing "turn complete" line.
- No projection-overhead phase aggregate: if `RespondInput` construction shows up as material residual in real runs, slice 195.k adds its own aggregate.

## Doctrine

- `[[observability-is-half-the-architecture]]`: measure before optimize
- `[[init-once-handle-then-lease-zero-copy-refs]]`: knowing which phase is dominant directs the next slice
- `[[no-fallbacks-ever]]`: continue-on-error arms abandon the scratchpad implicitly; only fully-successful replies sample the phase aggregates, matching `turn_latency`'s sample set

card: b8b0c49c-39fa-483c-bc39-9f08fd13ea15

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
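The scratchpad-then-record pattern described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the `LatencyAggregate` internals, the `timed` and `run_turn` helpers, and the three-phase subset are all assumptions made for the example; only the type and field names come from the description above.

```rust
use std::time::Instant;

/// Hypothetical stand-in for `LatencyAggregate`: only count/total/mean are modeled.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct LatencyAggregate {
    count: u64,
    total_ms: u128,
}

impl LatencyAggregate {
    pub fn record(&mut self, ms: u128) {
        self.count += 1;
        self.total_ms += ms;
    }
    pub fn count(&self) -> u64 {
        self.count
    }
    /// `None` until at least one sample has been recorded.
    pub fn mean_ms(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.total_ms as f64 / self.count as f64)
        }
    }
}

/// Per-turn scratchpad (three of the five phases, for brevity).
#[derive(Debug, Default, Clone, Copy)]
pub struct PhaseTimings {
    pub recall_ms: u128,
    pub respond_ms: u128,
    pub say_ms: u128,
}

#[derive(Debug, Default)]
pub struct ServeOutcome {
    pub turn_latency: LatencyAggregate,
    pub recall_latency: LatencyAggregate,
    pub respond_latency: LatencyAggregate,
    pub say_latency: LatencyAggregate,
}

/// Hypothetical helper: run one phase and store its elapsed milliseconds in `slot`.
pub fn timed<T>(slot: &mut u128, phase: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let out = phase();
    *slot = start.elapsed().as_millis();
    out
}

/// Hypothetical turn driver; the phase bodies are empty placeholders.
pub fn run_turn(outcome: &mut ServeOutcome, say_succeeds: bool) {
    let turn_start = Instant::now();
    let mut t = PhaseTimings::default(); // zeroed per turn
    timed(&mut t.recall_ms, || ());
    timed(&mut t.respond_ms, || ());
    let said: Result<(), &'static str> = timed(&mut t.say_ms, || {
        if say_succeeds { Ok(()) } else { Err("publish failed") }
    });
    if said.is_err() {
        return; // scratchpad is dropped; no aggregate is sampled
    }
    // Same gate for the end-to-end total and every phase aggregate.
    outcome.turn_latency.record(turn_start.elapsed().as_millis());
    outcome.recall_latency.record(t.recall_ms);
    outcome.respond_latency.record(t.respond_ms);
    outcome.say_latency.record(t.say_ms);
}
```

Because the early return sits before every `record` call, a failed turn contributes to neither `turn_latency` nor the phase aggregates, which is what keeps the sample sets aligned.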
This was referenced Jun 6, 2026
joelteply added a commit that referenced this pull request Jun 6, 2026
…Id::as_str() (#195 slice 3)

Slice 3 of task #195 (inference latency perfection campaign).

Pre-slice-3, `serve_persona_loop_inner` did, every turn:

```rust
specialty: format!("{:?}", ctx.role).to_lowercase(),
```

Two allocations, a variadic-macro dispatch through derived-Debug, and a Unicode lowercase walk, all to produce a value `RoleId` already exposes via its canonical `as_str() -> &'static str` method (in `role_template.rs`, pinned at compile time).

## What ships

Switch the per-turn read from the Debug+lowercase chain to `ctx.role.as_str()`:

```rust
let respond_input = RespondInput {
    persona: PersonaSlot {
        // was: format!("{:?}", ctx.role).to_lowercase()
        specialty: ctx.role.as_str().to_string(),
        /* ... */
    },
    /* ... */
};
```

Per-turn cost drops from `format!()` (variadic macro + Debug formatter + heap alloc) + `.to_lowercase()` (second alloc + Unicode lowercase walk) to a single `String::from` of a static `&'static str`. The source IS the cache: no field on `PersonaContext`, no helper, no Arc plumbing.

## Why this shape (not a cache)

Initial draft of slice 3 added `PersonaContext.specialty: Arc<str>` populated by a `build_persona_specialty(role)` helper that pre-baked the Debug+lowercase output. Adversarial review (3 agents) correctly identified two problems:

1. **Caching the wrong source.** `RoleId::as_str()` already exists as the documented, intentionally-stable specialty derivation (kebab-case identifier, pinned by an explicit `match` so renames are deliberate). Caching the Debug+lowercase output couples the prompt-assembly contract to the derived-Debug format: a future contributor adding a custom Debug or wrapping the enum silently breaks the specialty string with no test failure.
2. **Caching test was a tautology.** The helper was `Arc::from(format!("{:?}", role).to_lowercase())` and the test compared its output to `format!("{:?}", role).to_lowercase()`, comparing the helper to its own implementation. A buggy helper using `.to_uppercase()` would have passed.

The revised slice picks the right source and the right contract: use `role.as_str()` directly. No cache needed because the static str pointer IS already pre-computed at compile time. The behavior-preservation test (below) is non-circular: it compares the new direct path (`role.as_str()`, a hand-curated match in role_template.rs) to the pre-slice-3 derived-Debug chain.

## Test (+1 new)

`persona::service_loop::tests::role_as_str_preserves_pre_slice3_specialty_format_for_each_role`:

- Pins that for every `RoleId` variant (Helper, Coder, Sentinel, Custom), `role.as_str()` produces byte-identical output to the pre-slice-3 `format!("{:?}", role).to_lowercase()`. Non-circular: two independently-derived strings. A future PR adjusting either `as_str()` or adding a variant where the two paths disagree must update this test to record the intentional divergence; silent drift breaks loudly.
- Compile-time exhaustiveness: a closure-typed `|role: RoleId| match role { ... }` with NO wildcard arm forces a new RoleId variant to be added here AND to the `variants` array. Cosmetic-exhaustiveness defeats from the initial draft (matching on a literal `RoleId::Helper`) are fixed.

## What this slice does NOT do (deliberate)

- No `PersonaContext.specialty` field (initial draft removed per review). The original goal was reducing per-turn cost; `role.as_str()` already gives that with zero field bloat.
- No `display_name` change: `String::clone` of a fixed-per-session name is the same per-turn cost whether we cache or not, so the smell is RespondInput's `String`-typed field, not the source. Lift in a follow-up when `RespondInput`'s type changes (#149-adjacent).
- No prompt-template change: substrate keeps the existing specialty-string contract verbatim; the migration is provably behavior-preserving.

## Doctrine

- `[[init-once-handle-then-lease-zero-copy-refs]]`: the canonical "init-once" here is the compile-time `match` in `RoleId::as_str()`; per-turn cost is a `&'static str` deref + one `String::from` (memcpy of 6-9 static bytes). Doesn't get cheaper without changing `RespondInput`'s type signature.
- `[[observability-is-half-the-architecture]]`: slice 1's `respond_latency` aggregate will tell us if this and the cumulative slice-2/3+ wins are moving the needle on real persona turns.
- `[[no-fallbacks-ever]]`: no graceful-degradation path; the behavior-preservation test breaks loudly if either side drifts.

card: 0d780926
parent task: #195
slice 1: #1532 (merged), phase decomposition
slice 2: #1533 (merged), cache system_prompt at construction

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
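A non-circular behavior-preservation check of the kind this commit describes might look like the sketch below. The `RoleId` enum here is a hypothetical stand-in with four unit variants taken from the names in the commit message; the real enum in `role_template.rs` may carry data or differ in other ways, and `legacy_specialty` and `all_roles` are names invented for the example.

```rust
/// Hypothetical stand-in for the real `RoleId`; unit variants are an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RoleId {
    Helper,
    Coder,
    Sentinel,
    Custom,
}

impl RoleId {
    /// Hand-written match: every string is a static literal, so a rename is deliberate.
    pub const fn as_str(self) -> &'static str {
        match self {
            RoleId::Helper => "helper",
            RoleId::Coder => "coder",
            RoleId::Sentinel => "sentinel",
            RoleId::Custom => "custom",
        }
    }
}

/// The pre-slice-3 derivation, kept only as the independent side of the comparison.
pub fn legacy_specialty(role: RoleId) -> String {
    format!("{:?}", role).to_lowercase()
}

/// Returns every variant. The closure has no wildcard arm, so adding a variant
/// to `RoleId` stops this function compiling until the variant is listed here.
pub fn all_roles() -> [RoleId; 4] {
    let _exhaustive = |role: RoleId| match role {
        RoleId::Helper | RoleId::Coder | RoleId::Sentinel | RoleId::Custom => (),
    };
    [RoleId::Helper, RoleId::Coder, RoleId::Sentinel, RoleId::Custom]
}

/// True when the hand-written path and the derived-Debug path agree for all variants.
pub fn paths_agree() -> bool {
    all_roles()
        .iter()
        .all(|role| role.as_str() == legacy_specialty(*role).as_str())
}
```

The two sides of the comparison come from different sources (a literal table versus the derived `Debug` output), so a mistake in either one shows up as a mismatch instead of passing silently.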
joelteply added a commit that referenced this pull request Jun 6, 2026
…Id::as_str() (#195 slice 3) (#1534)
joelteply added a commit that referenced this pull request Jun 6, 2026
…ion) (#1535)

* perf(persona): drop per-turn format!()+lowercase — use canonical RoleId::as_str() (#195 slice 3)

* feat(probes): JsonlProbeFileSink + RTOS-debugger manual + README/CLAUDE pointers (#151 foundation)

The substrate has the JTAG hardware (Slice P #176/#177): `probe!` / `time_sync!` / `stack!` macros, `ProbeRouterLayer` (in-process broadcast Layer), `ProbeStreamModule` (the `debug/probes/{class}/{open,next,close}` URI consumer), and `UriCaptureLayer` (URI ancestry). Zero usage sites in `persona/` or `cognition/`: the probes were infrastructure waiting for application.

Per Joel 2026-06-06 `[[jtag-probes-are-rtos-debugger]]`: "We are building an rtos. This is so complex we need to be able to have a detailed understanding of what's going on, like a debug breakpoint, in a way, to easily observe where the code and the surrounding vars are, ones we want to inspect via probes, and timing of anything, so we can hunt down bottlenecks."

This commit ships the FOUNDATION for that: the disk-persisted breakpoint log + the manual that documents the method. The sprinkle itself (probe calls inside `persona::response`, `persona::prompt_assembly`, `cognition::shared_analysis`, `persona::service_loop`) follows in subsequent commits using the checklist in the manual.

## What ships

### `routing/probe_file_sink.rs` (new, 5/5 tests passing)

`JsonlProbeFileSink`: a tracing Layer that visits every probe event, applies the operator-configured class filter, and writes a JSONL line to disk with the original `class` + `uri_chain` + `message` + `fields` plus a fresh `captured_at_ms` timestamp.

```rust
let sink = JsonlProbeFileSink::from_env()?;
// CONTINUUM_PROBE_FILE=/tmp/probes.jsonl
// CONTINUUM_PROBE_CLASSES=persona.turn.start,persona.response.render.prompt
tracing_subscriber::registry()
    .with(UriCaptureLayer::new())
    .with(ProbeRouterLayer::new()) // in-process broadcast
    .with(sink)                    // disk-persisted breakpoint log
    .init();
```

Composes with `ProbeRouterLayer`: both visit the same tracing event independently; broadcast subscribers stay in-process, this one persists to disk. Per-event filter is a single HashSet lookup (empty set = no filter, all classes pass).

### Tests (5)

- `from_env_returns_envvar_unset_when_path_missing`: env-var unset is a typed Err, not a silent skip (no-fallbacks doctrine)
- `unfiltered_sink_persists_every_class`: empty class filter passes everything through, captured_at_ms is populated, message + fields preserved
- `class_filter_drops_unallowed_classes`: only allowed classes reach disk
- `non_probe_tracing_events_are_ignored`: `tracing::info!` calls without `probe_class` field don't pollute the probe log (signal density preserved)
- `sink_captures_uri_chain_when_dispatched_span_active`: when inside a dispatched URI span, the on-disk record carries the URI ancestry (so an operator can trace which command's execution the probe fired during)

### `docs/architecture/RTOS-DEBUGGER-PROBES.md` (new)

The MANUAL. Persistent reference per Joel's directive: "We will want a manual (doc) on how to probe the persona so we don't forget where the work must be done and how. Be sure you document your method."

Covers:

- The mental model: probes are non-blocking breakpoints, time_sync! is RAII timing, stack!() is the URI ancestry; together they're the substrate's RTOS debugger
- When to add a probe (branch boundaries, stage entry/exit, external calls, state seams, decision points)
- The class taxonomy: stable names for `persona.turn.*`, `persona.response.*`, `persona.prompt.*`, `cognition.analyze.*`, plus general-purpose `decision` / `state` / `error`
- How to enable (CONTINUUM_PROBE_FILE + CONTINUUM_PROBE_CLASSES env vars, no recompile)
- How to read (jq one-liners for the common queries: per-persona filter, silence reasons, slow timings, single-turn reconstruction)
- How to add a new probe (the call shape + convention rules)
- **The sprinkle checklist**: every file that needs probes added, with the specific seams in each. Updates to this checklist live in the same doc; the doc IS the source of truth for what's wired and what isn't

### README.md "Debugging this substrate" section

Inserted between Research Foundations and Documentation. Worked code example + the env-var enable commands + link to the manual. Every contributor (human or AI agent) sees the probe convention from the public README, not just from buried docs.

The Documentation table gains a row for `RTOS-DEBUGGER-PROBES.md` directly after CONTINUUM-ARCHITECTURE.md, so the doc is discoverable from the canonical doc list as well.

### CLAUDE.md sub-entry

Added under canonical doc #5 (OBSERVABILITY-AS-SUBSTRATE): "the practical companion: how to USE the `probe!` / `time_sync!` / `time_async!` macros as RTOS-style breakpoints with variable inspection + timing." Any agent reading the project's CLAUDE.md finds the probe convention as a precedence-winning truth, alongside the canonical substrate docs.

## Why this slice is foundation-only

Sprinkling probes into the cognition is a separate slice (the manual's checklist tracks it). Shipping the file sink + the manual + the discoverability pointers FIRST gives reviewers a small, testable change that's clearly correct in isolation, and gives the next slice (the sprinkle) a stable target to write against: every probe call site in commit 2+ refers to the class taxonomy in commit 1's manual.

## Doctrine

- `[[jtag-probes-are-rtos-debugger]]`: the framing this slice enables
- `[[observability-is-half-the-architecture]]`: CaptureSink + Noop default + replay-as-first-class; this is the file-persisted consumer of that pattern
- `[[no-fallbacks-ever]]`: env-var unset is typed Err `ProbeFileSinkError::EnvVarUnset`, path failures are typed `OpenFailed { path, source }` (operator must fix path; substrate refuses to silently synthesize a default)

card: 8d7ca5c3
parent: #151 (echo-storm filter), the bug we're actually hunting, debugger-first so we can find the bug INSIDE the cognition pipeline, not in a Rust gate around it

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
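The class-filter and typed-error behavior described for the file sink can be illustrated without any tracing machinery. The sketch below is an assumption-laden model, not the sink itself: `ProbeSinkConfig` and `from_values` are hypothetical names, the comma-separated parsing is inferred from the `CONTINUUM_PROBE_CLASSES` example above, and the raw values are passed as arguments instead of being read from the environment so the logic stays deterministic.

```rust
use std::collections::HashSet;

/// Only the `EnvVarUnset` variant named in the commit message is modeled here.
#[derive(Debug, PartialEq, Eq)]
pub enum ProbeFileSinkError {
    EnvVarUnset,
}

/// Hypothetical config holder for the sink's path and class filter.
pub struct ProbeSinkConfig {
    pub path: String,
    pub classes: HashSet<String>,
}

impl ProbeSinkConfig {
    /// `path` and `classes` stand in for the values of CONTINUUM_PROBE_FILE and
    /// CONTINUUM_PROBE_CLASSES. A missing path is a typed error, never a default.
    pub fn from_values(
        path: Option<&str>,
        classes: Option<&str>,
    ) -> Result<Self, ProbeFileSinkError> {
        let path = path.ok_or(ProbeFileSinkError::EnvVarUnset)?.to_string();
        let classes = classes
            .unwrap_or("")
            .split(',')
            .map(str::trim)
            .filter(|c| !c.is_empty())
            .map(str::to_string)
            .collect();
        Ok(Self { path, classes })
    }

    /// Single HashSet lookup per event; an empty set means "no filter".
    pub fn allows(&self, class: &str) -> bool {
        self.classes.is_empty() || self.classes.contains(class)
    }
}
```

Treating the empty set as "allow everything" keeps the unfiltered case free of a separate flag, at the cost of making it impossible to express "allow nothing" through the filter alone.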
## Summary

Slice 1 of task #195 (inference latency perfection campaign). Adds five `LatencyAggregate` fields to `ServeOutcome` so we can see WHERE the per-turn cost lives instead of just the end-to-end total.

Per `[[observability-is-half-the-architecture]]` + Joel 2026-06-05 "focus on perfecting real inference latency": the substrate doesn't optimize what it can't measure. This is the measurement layer that all subsequent optimization slices read off of.

## What ships

Five new aggregates on `ServeOutcome`:

| Field | Phase measured |
|---|---|
| `recall_latency` | `cognition.admission.recall_scored` |
| `admit_latency` | `cognition.admission.admit` |
| `compose_latency` | `cognition.compose_for_turn` |
| `respond_latency` | `persona::response::respond` (LLM bulk) |
| `say_latency` | `conversation.say` |

Each phase wraps with `Instant::now()` + `.elapsed().as_millis()` into a `PhaseTimings` scratchpad zeroed per turn. After `conversation.say` succeeds, the scratchpad is recorded into the matching `ServeOutcome` aggregates. Only successful replies sample the phase aggregates (identical gate as `turn_latency`), so the decomposition stays internally consistent.

The existing "turn complete" `tracing::info!` line gains the five new `*_ms` fields so operators see the per-turn breakdown live.

## Why now

Without the decomposition, slice 2 would be guessing. With it, slice 2 can be data-driven: observe that `respond_ms` is (almost certainly) the dominant cost, then attack that with the existing inference-hot-path bypass tasks (#112/#113/#114) and the system-prompt pre-tokenization task (#149).

Per `[[init-once-handle-then-lease-zero-copy-refs]]`: substrate latency lives in reinit + memory/disk, not raw compute. The decomposition will tell us WHERE the reinit costs are hiding.

## Tests

8/8 `service_loop` tests pass (the 4 pre-existing `#[ignore]` tests remain ignored: same gate as before, no behavioral change to the reply path):

- `serve_outcome_phase_aggregates_default_to_empty`: `Default` impl initializes all 5 phase aggregates to `count=0`, `mean_ms=None`
- `phase_aggregates_use_same_type_as_turn_latency`: same shape as existing `turn_latency`; sum-of-phases ≤ turn_latency invariant

## Out of scope

- No format/breaking changes to `ServeOutcome` (additive fields only).
- No projection-overhead aggregate: if `RespondInput` construction shows as material residual in real runs, slice 195.k adds its own.

## Doctrine

- `[[observability-is-half-the-architecture]]`: measure first
- `[[init-once-handle-then-lease-zero-copy-refs]]`: decomposition directs which reinit to chase
- `[[no-fallbacks-ever]]`: continue-on-error arms abandon the scratchpad implicitly; phase aggregates' sample set matches `turn_latency`

card: b8b0c49c-39fa-483c-bc39-9f08fd13ea15
parent task: #195
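The sum-of-phases ≤ turn_latency invariant, and the "material residual" idea from the out-of-scope note, can be made concrete with a small sketch. Both function names below are hypothetical, and the example works on plain `Duration` values instead of the real aggregates.

```rust
use std::time::Duration;

/// Checks the invariant on whole milliseconds, matching `.as_millis()` sampling.
/// Truncation cannot break it: the sum of floors never exceeds the floor of the sum.
pub fn phases_fit_within_turn(phases: &[Duration], turn: Duration) -> bool {
    let sum_ms: u128 = phases.iter().map(Duration::as_millis).sum();
    sum_ms <= turn.as_millis()
}

/// Milliseconds of the turn not attributed to any timed phase (for example,
/// input construction between phases). `None` if the invariant is violated.
pub fn residual_ms(phases: &[Duration], turn: Duration) -> Option<u128> {
    let sum_ms: u128 = phases.iter().map(Duration::as_millis).sum();
    turn.as_millis().checked_sub(sum_ms)
}
```

A residual that stays large across real runs would be the signal that some untimed step between phases deserves its own aggregate.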