feat(persona): per-phase latency decomposition in serve_persona_loop (#195 slice 1) by joelteply · Pull Request #1532 · CambrianTech/continuum

joelteply · 2026-06-06T02:09:49Z

Summary

Slice 1 of task #195 (inference latency perfection campaign). Adds five `LatencyAggregate` fields to `ServeOutcome` so we can see WHERE the per-turn cost lives instead of just the end-to-end total.

Per [[observability-is-half-the-architecture]] and Joel's 2026-06-05 directive ("focus on perfecting real inference latency"): the substrate can't optimize what it can't measure. This is the measurement layer that every subsequent optimization slice reads from.

What ships

Five new aggregates on `ServeOutcome`:

| Field | Phase | Holds cognition mutex? |
|---|---|---|
| `recall_latency` | `cognition.admission.recall_scored` | yes |
| `admit_latency` | `cognition.admission.admit` | yes |
| `compose_latency` | `cognition.compose_for_turn` | yes |
| `respond_latency` | `persona::response::respond` (LLM bulk) | no |
| `say_latency` | `conversation.say` | no |

Each phase is wrapped in an `Instant::now()` / `.elapsed().as_millis()` pair, and the result is written into a `PhaseTimings` scratchpad that is zeroed at the start of every turn. After `conversation.say` succeeds, the scratchpad is recorded into the matching `ServeOutcome` aggregates. Only successful replies sample the phase aggregates (the same gate as `turn_latency`), so the decomposition stays internally consistent.
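The following is a minimal sketch of that flow, for illustration only. `LatencyAggregate`, `PhaseTimings` and `ServeOutcome` are the PR's names, but the fields and methods shown here are assumptions, reduced to the count / total / mean surface the PR mentions. `timed` and `serve_one_turn` are hypothetical helpers. The empty closures stand in for the real phase calls.

```rust
use std::time::Instant;

/// Hypothetical stand-in for `LatencyAggregate` (fields and methods assumed).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct LatencyAggregate {
    count: u64,
    total_ms: u128,
}

impl LatencyAggregate {
    pub fn record(&mut self, ms: u128) {
        self.count += 1;
        self.total_ms += ms;
    }
    pub fn count(&self) -> u64 {
        self.count
    }
    pub fn total_ms(&self) -> u128 {
        self.total_ms
    }
    /// `None` until the first sample, like the PR's `mean_ms=None` default.
    pub fn mean_ms(&self) -> Option<f64> {
        (self.count > 0).then(|| self.total_ms as f64 / self.count as f64)
    }
}

/// Per-turn scratchpad; `Default` zeroes every phase (field names assumed).
#[derive(Debug, Default)]
struct PhaseTimings {
    recall_ms: u128,
    admit_ms: u128,
    compose_ms: u128,
    respond_ms: u128,
    say_ms: u128,
}

/// Reduced `ServeOutcome`: the end-to-end aggregate plus the five phases.
#[derive(Debug, Default)]
pub struct ServeOutcome {
    pub turn_latency: LatencyAggregate,
    pub recall_latency: LatencyAggregate,
    pub admit_latency: LatencyAggregate,
    pub compose_latency: LatencyAggregate,
    pub respond_latency: LatencyAggregate,
    pub say_latency: LatencyAggregate,
}

/// Runs `f`, returning its result and the elapsed whole milliseconds.
fn timed<T>(f: impl FnOnce() -> T) -> (T, u128) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed().as_millis())
}

/// One turn; `say_succeeds` simulates the outcome of `conversation.say`.
pub fn serve_one_turn(outcome: &mut ServeOutcome, say_succeeds: bool) {
    let turn_start = Instant::now();
    let mut t = PhaseTimings::default(); // zeroed per turn

    t.recall_ms = timed(|| { /* cognition.admission.recall_scored */ }).1;
    t.admit_ms = timed(|| { /* cognition.admission.admit */ }).1;
    t.compose_ms = timed(|| { /* cognition.compose_for_turn */ }).1;
    let (reply, respond_ms) = timed(|| String::from("reply")); // LLM bulk
    t.respond_ms = respond_ms;
    let (said, say_ms) = timed(|| if say_succeeds { Ok(reply.len()) } else { Err("say failed") });
    t.say_ms = say_ms;

    // A failed say returns early and drops `t` unrecorded, so the phase
    // aggregates and `turn_latency` share one sample set.
    if said.is_err() {
        return;
    }
    outcome.recall_latency.record(t.recall_ms);
    outcome.admit_latency.record(t.admit_ms);
    outcome.compose_latency.record(t.compose_ms);
    outcome.respond_latency.record(t.respond_ms);
    outcome.say_latency.record(t.say_ms);
    outcome.turn_latency.record(turn_start.elapsed().as_millis());
}
```

Recording only after `say` succeeds keeps every phase aggregate's `count` equal to `turn_latency`'s.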

The existing "turn complete" `tracing::info!` line gains the five new `*_ms` fields (`recall_ms`, `admit_ms`, `compose_ms`, `respond_ms`, `say_ms`) so operators see the per-turn breakdown live.

Why now

Without the decomposition, slice 2 would be guessing. With it, slice 2 can be data-driven: confirm that `respond_ms` is the dominant cost (almost certainly), then attack it with the existing inference-hot-path bypass tasks (#112/#113/#114) and the system-prompt pre-tokenization task (#149).

Per [[init-once-handle-then-lease-zero-copy-refs]]: substrate latency lives in reinit + memory/disk, not raw compute. The decomposition will tell us WHERE the reinit costs are hiding.

Tests

8/8 `service_loop` tests pass, including two new ones. The 4 pre-existing `#[ignore]` tests remain ignored under the same gate as before, and there is no behavioral change to the reply path:

- `serve_outcome_phase_aggregates_default_to_empty`: the `Default` impl initializes all 5 phase aggregates to `count=0`, `mean_ms=None`
- `phase_aggregates_use_same_type_as_turn_latency`: every phase aggregate has the same type as the existing `turn_latency`; also checks the sum-of-phases ≤ `turn_latency` invariant
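The invariant is stated as ≤ rather than = for two reasons. First, each phase is truncated to whole milliseconds independently. Second, any work between phases counts only toward the turn total; `RespondInput` construction, which has no aggregate yet, is one example. The sketch below assumes the phases are disjoint, sequential sub-intervals of the turn. `sum_truncated_ms` and `phases_within_turn` are hypothetical helpers, not the PR's test code.

```rust
use std::time::Duration;

/// Sums per-phase durations after truncating each one to whole milliseconds,
/// as `Duration::as_millis` does at every phase boundary.
fn sum_truncated_ms(phases: &[Duration]) -> u128 {
    phases.iter().map(Duration::as_millis).sum()
}

/// The sum-of-phases <= turn check, assuming the phases are disjoint
/// sub-intervals of the turn (floor(a) + floor(b) <= floor(a + b)).
fn phases_within_turn(phases: &[Duration], turn: Duration) -> bool {
    sum_truncated_ms(phases) <= turn.as_millis()
}
```

Consider five 1.9 ms phases inside a 9.9 ms turn. The truncated phase sum is 5 ms against a 9 ms turn, so the check passes, but the two numbers are not expected to match.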

Out of scope

- No optimization of any phase. Pure measurement layer.
- No format/breaking changes to `ServeOutcome` (additive fields only).
- No projection-overhead aggregate yet: if `RespondInput` construction shows up as material residual in real runs, slice 195.k adds its own.

Doctrine

- [[observability-is-half-the-architecture]]: measure first
- [[init-once-handle-then-lease-zero-copy-refs]]: the decomposition directs which reinit to chase
- [[no-fallbacks-ever]]: continue-on-error arms abandon the scratchpad implicitly, so the phase aggregates' sample set matches `turn_latency`'s

card: b8b0c49c-39fa-483c-bc39-9f08fd13ea15
parent task: #195

…195 slice 1)

Slice 1 of task #195 (inference latency perfection campaign).

Per `[[observability-is-half-the-architecture]]` and Joel's 2026-06-05 directive ("focus on perfecting real inference latency"): the substrate can't optimize what it can't measure. Today `ServeOutcome` captures only the end-to-end `turn_latency`. Subsequent slices need phase-level data to know which step of the per-turn cycle to attack; otherwise we're guessing.

## What lands

Five new `LatencyAggregate` fields on `ServeOutcome`, each measuring one phase of the per-turn cognition cycle:

- `recall_latency`: `cognition.admission.recall_scored` (L2 retrieval; holds the cognition mutex)
- `admit_latency`: `cognition.admission.admit` (engram formation for the inbox message; holds the mutex)
- `compose_latency`: `cognition.compose_for_turn` (RAG flexbox composer + multi-source delivery; holds the mutex)
- `respond_latency`: `persona::response::respond` (the LLM bulk; does NOT hold the mutex, per the existing async contract). Typically the dominant cost and the primary target of slice 2+.
- `say_latency`: `conversation.say` (airc publish + ack)

Each phase is wrapped in `Instant::now()` + `.elapsed().as_millis()`, and the result is written into a `PhaseTimings` scratchpad zeroed per turn. After `conversation.say` succeeds, the scratchpad is recorded into the matching `ServeOutcome` aggregates. Only successful replies sample the phase aggregates (the same gate as `turn_latency`), so the decomposition stays internally consistent.

## Tracing

The existing "turn complete" `tracing::info!` line gains five new fields (`recall_ms`, `admit_ms`, `compose_ms`, `respond_ms`, `say_ms`) so operators see the per-turn breakdown live without scraping the test fixture. The tag is updated to `"turn complete — substrate's per-reply cost recorded with phase decomposition"` so log readers can see the decomposition is wired.

## Why this matters for #195 sequencing

Without the decomposition we'd be guessing which optimization moves the needle. With it, slice 2 can be data-driven: run real persona turns on the LCD tier, observe whether `respond_ms` is the dominant cost (almost certainly) or whether `compose_latency` surprises the operator (possible, since the RAG composer touches multiple sources), and direct the next slice accordingly.

Per `[[init-once-handle-then-lease-zero-copy-refs]]`: substrate latency lives in reinit + memory/disk, not raw compute. The decomposition will tell us WHERE the reinit costs are hiding.

## Tests (+2 new, 8/8 service_loop tests pass)

`persona::service_loop::tests`:

- `serve_outcome_phase_aggregates_default_to_empty`: pins that every new phase aggregate starts at `count=0`, `mean_ms=None`, so future field additions can't accidentally regress the `Default` impl for existing fields.
- `phase_aggregates_use_same_type_as_turn_latency`: pins that every phase aggregate has the same `LatencyAggregate` type as `turn_latency` (record/total/mean call sites compile identically). A future refactor that swaps one to a different type fails to compile. Also checks the sum-of-phases ≤ `turn_latency` invariant.

The four pre-existing `#[ignore]` tests in this module (gated on the slice 1D / #161 global adapter registration) remain ignored; this slice is a structural addition with no behavioral change to the loop's reply path.

## Out of scope (subsequent slices)

- No optimization of any phase. Pure measurement layer.
- No format/breaking changes to `ServeOutcome` (additive fields only).
- No tracing format change beyond extending the existing "turn complete" line.
- No projection-overhead phase aggregate: if `RespondInput` construction shows up as material residual in real runs, slice 195.k adds its own aggregate.

## Doctrine

- `[[observability-is-half-the-architecture]]`: measure before optimizing
- `[[init-once-handle-then-lease-zero-copy-refs]]`: knowing which phase dominates directs the next slice
- `[[no-fallbacks-ever]]`: continue-on-error arms abandon the scratchpad implicitly; only fully successful replies sample the phase aggregates, matching `turn_latency`'s sample set

card: b8b0c49c-39fa-483c-bc39-9f08fd13ea15

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
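A generic function whose two arguments must share one type parameter can pin the compile-time half of `phase_aggregates_use_same_type_as_turn_latency`. This is a sketch. `assert_same_type` and `pin_phase_types` are hypothetical names, and the structs are reduced stand-ins for the real types.

```rust
/// Compiles only when both arguments have the same type `T`
/// (a `Deref` coercion on the second argument aside).
fn assert_same_type<T>(_: &T, _: &T) {}

/// Reduced stand-ins; only the field types matter here.
#[derive(Debug, Default, PartialEq)]
struct LatencyAggregate {
    count: u64,
}

#[derive(Debug, Default)]
struct ServeOutcome {
    turn_latency: LatencyAggregate,
    recall_latency: LatencyAggregate,
    admit_latency: LatencyAggregate,
    compose_latency: LatencyAggregate,
    respond_latency: LatencyAggregate,
    say_latency: LatencyAggregate,
}

/// If any phase field is changed to a different type, this stops compiling.
fn pin_phase_types(o: &ServeOutcome) {
    assert_same_type(&o.turn_latency, &o.recall_latency);
    assert_same_type(&o.turn_latency, &o.admit_latency);
    assert_same_type(&o.turn_latency, &o.compose_latency);
    assert_same_type(&o.turn_latency, &o.respond_latency);
    assert_same_type(&o.turn_latency, &o.say_latency);
}
```

The check costs nothing at runtime. The compiler does the work, so a type swap is caught before any test runs.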

…Id::as_str() (#195 slice 3)

Slice 3 of task #195 (inference latency perfection campaign).

Pre-slice-3, `serve_persona_loop_inner` did this every turn:

```rust
specialty: format!("{:?}", ctx.role).to_lowercase(),
```

That is two allocations, a variadic-macro dispatch through derived-Debug, and a Unicode lowercase walk, all to produce a value `RoleId` already exposes via its canonical `as_str() -> &'static str` method (in `role_template.rs`, pinned at compile time).

## What ships

Switch the per-turn read from the Debug+lowercase chain to `ctx.role.as_str()`:

```rust
let respond_input = RespondInput {
    persona: PersonaSlot {
        // was: format!("{:?}", ctx.role).to_lowercase()
        specialty: ctx.role.as_str().to_string(),
        /* ... */
    },
    /* ... */
};
```

Per-turn cost drops from `format!()` (variadic macro + Debug formatter + heap alloc) + `.to_lowercase()` (second alloc + Unicode lowercase walk) to a single `String::from` of a `&'static str`. The source IS the cache: no field on `PersonaContext`, no helper, no Arc plumbing.

## Why this shape (not a cache)

The initial draft of slice 3 added `PersonaContext.specialty: Arc<str>`, populated by a `build_persona_specialty(role)` helper that pre-baked the Debug+lowercase output. Adversarial review (3 agents) correctly identified two problems:

1. **Caching the wrong source.** `RoleId::as_str()` already exists as the documented, intentionally stable specialty derivation (a kebab-case identifier, pinned by an explicit `match` so renames are deliberate). Caching the Debug+lowercase output couples the prompt-assembly contract to the derived-Debug format: a future contributor adding a custom `Debug` impl or wrapping the enum would silently break the specialty string with no test failure.
2. **The caching test was a tautology.** The helper was `Arc::from(format!("{:?}", role).to_lowercase())` and the test compared its output to `format!("{:?}", role).to_lowercase()`, i.e. the helper against its own implementation. It could only confirm that expression, never the specialty contract: a helper and test that both used `.to_uppercase()` would still have passed.

The revised slice picks the right source and the right contract: use `role.as_str()` directly. No cache is needed because the `&'static str` is already computed at compile time. The behavior-preservation test (below) is non-circular: it compares the new direct path (`role.as_str()`, a hand-curated match in `role_template.rs`) to the pre-slice-3 derived-Debug chain.

## Test (+1 new)

`persona::service_loop::tests::role_as_str_preserves_pre_slice3_specialty_format_for_each_role`:

- Pins that for every `RoleId` variant (Helper, Coder, Sentinel, Custom), `role.as_str()` produces byte-identical output to the pre-slice-3 `format!("{:?}", role).to_lowercase()`. Non-circular: two independently derived strings. A future PR that adjusts `as_str()` or adds a variant where the two paths disagree must update this test to record the intentional divergence, so silent drift breaks loudly.
- Compile-time exhaustiveness: a closure-typed `|role: RoleId| match role { ... }` with NO wildcard arm forces a new `RoleId` variant to be added here AND to the `variants` array. This fixes the initial draft's cosmetic exhaustiveness check (a match on a literal `RoleId::Helper`).

## What this slice does NOT do (deliberate)

- No `PersonaContext.specialty` field (the initial draft's field was removed per review). The original goal was reducing per-turn cost; `role.as_str()` already gives that with zero field bloat.
- No `display_name` change: `String::clone` of a fixed-per-session name costs the same per turn whether we cache or not, so the smell is `RespondInput`'s `String`-typed field, not the source. Lift it in a follow-up when `RespondInput`'s type changes (#149-adjacent).
- No prompt-template change: the substrate keeps the existing specialty-string contract verbatim; the migration is provably behavior-preserving.

## Doctrine

- `[[init-once-handle-then-lease-zero-copy-refs]]`: the canonical "init-once" here is the compile-time `match` in `RoleId::as_str()`; per-turn cost is a `&'static str` deref + one `String::from` (a memcpy of 5-8 static bytes). It doesn't get cheaper without changing `RespondInput`'s type signature.
- `[[observability-is-half-the-architecture]]`: slice 1's `respond_latency` aggregate will tell us whether this and the cumulative slice-2/3+ wins are moving the needle on real persona turns.
- `[[no-fallbacks-ever]]`: no graceful-degradation path; the behavior-preservation test breaks loudly if either side drifts.

card: 0d780926
parent task: #195
slice 1: #1532 (merged), phase decomposition
slice 2: #1533 (merged), cache system_prompt at construction

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
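This self-contained sketch covers the two pieces the slice-3 test relies on. `RoleId` is reconstructed here as a fieldless enum with the four variants listed above. Its string values are inferred from the preservation requirement (they must equal the lowercased `Debug` names) and are not copied from `role_template.rs`. `VARIANTS` and `all_roles_preserve_specialty` are hypothetical names.

```rust
/// Reconstructed `RoleId` (assumed shape: four fieldless variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RoleId {
    Helper,
    Coder,
    Sentinel,
    Custom,
}

impl RoleId {
    /// Hand-curated `match`: renaming a variant cannot silently change the string.
    pub fn as_str(self) -> &'static str {
        match self {
            RoleId::Helper => "helper",
            RoleId::Coder => "coder",
            RoleId::Sentinel => "sentinel",
            RoleId::Custom => "custom",
        }
    }
}

/// Every variant, kept in step with the exhaustive match below.
pub const VARIANTS: [RoleId; 4] = [RoleId::Helper, RoleId::Coder, RoleId::Sentinel, RoleId::Custom];

/// Non-circular check: the curated `as_str()` against the old Debug+lowercase chain.
pub fn all_roles_preserve_specialty() -> bool {
    // No wildcard arm: adding a variant fails to compile here until this
    // match (and, by convention, VARIANTS) is updated.
    let _exhaustive = |role: RoleId| match role {
        RoleId::Helper | RoleId::Coder | RoleId::Sentinel | RoleId::Custom => (),
    };
    VARIANTS
        .iter()
        .all(|&role| role.as_str() == format!("{:?}", role).to_lowercase())
}
```

Under these assumptions, the per-turn `ctx.role.as_str().to_string()` copies between 5 bytes (`coder`) and 8 bytes (`sentinel`).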


…ion) (#1535)

* perf(persona): drop per-turn format!()+lowercase — use canonical RoleId::as_str() (#195 slice 3)

* feat(probes): JsonlProbeFileSink + RTOS-debugger manual + README/CLAUDE pointers (#151 foundation)

The substrate has the JTAG hardware (Slice P #176/#177): the `probe!` / `time_sync!` / `stack!` macros, `ProbeRouterLayer` (in-process broadcast Layer), `ProbeStreamModule` (the `debug/probes/{class}/{open,next,close}` URI consumer), and `UriCaptureLayer` (URI ancestry). There are zero usage sites in `persona/` or `cognition/`; the probes were infrastructure waiting for an application.

Per Joel 2026-06-06 `[[jtag-probes-are-rtos-debugger]]`: "We are building an rtos. This is so complex we need to be able to have a detailed understanding of what's going on, like a debug breakpoint, in a way, to easily observe where the code and the surrounding vars are, ones we want to inspect via probes, and timing of anything, so we can hunt down bottlenecks."

This commit ships the FOUNDATION for that: the disk-persisted breakpoint log plus the manual that documents the method.
The sprinkle itself (probe calls inside `persona::response`, `persona::prompt_assembly`, `cognition::shared_analysis`, `persona::service_loop`) follows in subsequent commits, using the checklist in the manual.

## What ships

### `routing/probe_file_sink.rs` (new; 5/5 tests passing)

`JsonlProbeFileSink`: a tracing Layer that visits every probe event, applies the operator-configured class filter, and writes a JSONL line to disk with the original `class` + `uri_chain` + `message` + `fields`, plus a fresh `captured_at_ms` timestamp.

```rust
use tracing_subscriber::prelude::*; // brings `.with()` and `.init()` into scope

// CONTINUUM_PROBE_FILE=/tmp/probes.jsonl
// CONTINUUM_PROBE_CLASSES=persona.turn.start,persona.response.render.prompt
let sink = JsonlProbeFileSink::from_env()?;

tracing_subscriber::registry()
    .with(UriCaptureLayer::new())
    .with(ProbeRouterLayer::new()) // in-process broadcast
    .with(sink)                    // disk-persisted breakpoint log
    .init();
```

It composes with `ProbeRouterLayer`: both visit the same tracing event independently; broadcast subscribers stay in-process, while this one persists to disk. The per-event filter is a single HashSet lookup (empty set = no filter, all classes pass).

### Tests (5)

- `from_env_returns_envvar_unset_when_path_missing`: an unset env var is a typed Err, not a silent skip (no-fallbacks doctrine)
- `unfiltered_sink_persists_every_class`: an empty class filter passes everything through, `captured_at_ms` is populated, and message + fields are preserved
- `class_filter_drops_unallowed_classes`: only allowed classes reach disk
- `non_probe_tracing_events_are_ignored`: `tracing::info!` calls without a `probe_class` field don't pollute the probe log (signal density preserved)
- `sink_captures_uri_chain_when_dispatched_span_active`: inside a dispatched URI span, the on-disk record carries the URI ancestry (so an operator can trace which command's execution the probe fired during)

### `docs/architecture/RTOS-DEBUGGER-PROBES.md` (new)

The MANUAL, a persistent reference per Joel's directive: "We will want a manual (doc) on how to probe the persona so we don't forget where the work must be done and how. Be sure you document your method." It covers:

- The mental model: probes are non-blocking breakpoints, `time_sync!` is RAII timing, `stack!()` is the URI ancestry; together they're the substrate's RTOS debugger
- When to add a probe (branch boundaries, stage entry/exit, external calls, state seams, decision points)
- The class taxonomy: stable names for `persona.turn.*`, `persona.response.*`, `persona.prompt.*`, `cognition.analyze.*`, plus general-purpose `decision` / `state` / `error`
- How to enable (`CONTINUUM_PROBE_FILE` + `CONTINUUM_PROBE_CLASSES` env vars, no recompile)
- How to read (jq one-liners for the common queries: per-persona filter, silence reasons, slow timings, single-turn reconstruction)
- How to add a new probe (the call shape + convention rules)
- **The sprinkle checklist**: every file that needs probes added, with the specific seams in each. Updates to this checklist live in the same doc; the doc IS the source of truth for what's wired and what isn't

### README.md "Debugging this substrate" section

Inserted between Research Foundations and Documentation: a worked code example, the env-var enable commands, and a link to the manual. Every contributor (human or AI agent) sees the probe convention from the public README, not just from buried docs. The Documentation table gains a row for `RTOS-DEBUGGER-PROBES.md` directly after `CONTINUUM-ARCHITECTURE.md`, so the doc is also discoverable from the canonical doc list.

### CLAUDE.md sub-entry

Added under canonical doc #5 (OBSERVABILITY-AS-SUBSTRATE): "the practical companion: how to USE the `probe!` / `time_sync!` / `time_async!` macros as RTOS-style breakpoints with variable inspection + timing." Any agent reading the project's CLAUDE.md finds the probe convention as a precedence-winning truth, alongside the canonical substrate docs.

## Why this slice is foundation-only

Sprinkling probes into the cognition is a separate slice (the manual's checklist tracks it). Shipping the file sink, the manual, and the discoverability pointers FIRST gives reviewers a small, testable change that's clearly correct in isolation. It also gives the next slice (the sprinkle) a stable target to write against: every probe call site in commit 2+ refers to the class taxonomy in commit 1's manual.

## Doctrine

- `[[jtag-probes-are-rtos-debugger]]`: the framing this slice enables
- `[[observability-is-half-the-architecture]]`: CaptureSink + Noop default + replay-as-first-class; this is the file-persisted consumer of that pattern
- `[[no-fallbacks-ever]]`: an unset env var is the typed Err `ProbeFileSinkError::EnvVarUnset`, and path failures are typed `OpenFailed { path, source }` (the operator must fix the path; the substrate refuses to silently synthesize a default)

card: 8d7ca5c3
parent: #151 (echo-storm filter), the bug we're actually hunting. We go debugger-first so we can find the bug INSIDE the cognition pipeline, not in a Rust gate around it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
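The `JsonlProbeFileSink` class-filter rule works as follows: an empty set passes every class, and otherwise each event costs one `HashSet` lookup. That rule and the typed-error stance can be sketched without the tracing machinery. This is a hypothetical reduction. `ProbeSinkConfig` and `from_values` are invented names, and only the `EnvVarUnset` variant of `ProbeFileSinkError` is mirrored. Two behaviors are also assumptions rather than confirmed: comma-splitting with whitespace trimming, and treating an unset `CONTINUUM_PROBE_CLASSES` as an empty filter.

```rust
use std::collections::HashSet;
use std::path::PathBuf;

/// Mirrors one variant of the PR's typed error; `OpenFailed` is omitted
/// because this sketch never opens a file.
#[derive(Debug, PartialEq)]
pub enum ProbeFileSinkError {
    EnvVarUnset,
}

/// Hypothetical parsed configuration for the sink.
#[derive(Debug)]
pub struct ProbeSinkConfig {
    pub path: PathBuf,
    pub classes: HashSet<String>,
}

impl ProbeSinkConfig {
    /// Takes the values of CONTINUUM_PROBE_FILE / CONTINUUM_PROBE_CLASSES as
    /// arguments instead of reading the environment, so it is testable.
    pub fn from_values(file: Option<&str>, classes: Option<&str>) -> Result<Self, ProbeFileSinkError> {
        // No silent default path: an unset file variable is a typed error.
        let path = PathBuf::from(file.ok_or(ProbeFileSinkError::EnvVarUnset)?);
        let classes = classes
            .unwrap_or("")
            .split(',')
            .map(str::trim)
            .filter(|c| !c.is_empty())
            .map(String::from)
            .collect();
        Ok(Self { path, classes })
    }

    /// One HashSet lookup per event; an empty set means "no filter".
    pub fn allows(&self, class: &str) -> bool {
        self.classes.is_empty() || self.classes.contains(class)
    }
}
```

In a real entry point, the two arguments would come from `std::env::var("CONTINUUM_PROBE_FILE").ok().as_deref()` and the matching call for `CONTINUUM_PROBE_CLASSES`. Taking them as parameters keeps the parsing testable without mutating the process environment.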

github-actions Bot added the size: M label Jun 6, 2026

joelteply merged commit 0c10234 into canary Jun 6, 2026
3 checks passed

joelteply deleted the b8b0c49c/latency-decomp branch June 6, 2026 02:10

This was referenced Jun 6, 2026:

- perf(persona): cache system_prompt at PersonaContext construction (#195 slice 2) #1533 (Merged)
- perf(persona): drop per-turn format!()+lowercase — use canonical RoleId::as_str() (#195 slice 3) #1534 (Merged)

joelteply merged 1 commit into canary from b8b0c49c/latency-decomp
