
feat(persona): per-phase latency decomposition in serve_persona_loop (#195 slice 1) #1532

Merged
joelteply merged 1 commit into canary from b8b0c49c/latency-decomp
Jun 6, 2026

@joelteply
Contributor

Summary

Slice 1 of task #195 (inference latency perfection campaign). Adds five LatencyAggregate fields to ServeOutcome so we can see WHERE the per-turn cost lives instead of just the end-to-end total.

Per [[observability-is-half-the-architecture]] + Joel 2026-06-05 "focus on perfecting real inference latency": the substrate doesn't optimize what it can't measure. This is the measurement layer that all subsequent optimization slices read off of.

What ships

Five new aggregates on ServeOutcome:

| Field | Phase | Holds cognition mutex? |
|---|---|---|
| recall_latency | cognition.admission.recall_scored | yes |
| admit_latency | cognition.admission.admit | yes |
| compose_latency | cognition.compose_for_turn | yes |
| respond_latency | persona::response::respond (LLM bulk) | no |
| say_latency | conversation.say | no |

Each phase is wrapped with Instant::now() + .elapsed().as_millis(), writing into a PhaseTimings scratchpad zeroed per turn. After conversation.say succeeds, the scratchpad is recorded into the matching ServeOutcome aggregates. Only successful replies sample the phase aggregates (the same gate as turn_latency), so the decomposition stays internally consistent.

The existing "turn complete" tracing::info! line gains the five new *_ms fields so operators see the per-turn breakdown live.

Why now

Without the decomposition, slice 2 would be guessing. With it, slice 2 can be data-driven: observe respond_ms is (almost certainly) the dominant cost, attack that with the existing inference-hot-path bypass tasks (#112/#113/#114) and the system-prompt pre-tokenization task (#149).

Per [[init-once-handle-then-lease-zero-copy-refs]]: substrate latency lives in reinit + memory/disk, not raw compute. The decomposition will tell us WHERE the reinit costs are hiding.

Tests

8/8 service_loop tests pass (the 4 pre-existing #[ignore] tests remain ignored — same gate as before, no behavioral change to the reply path):

  • serve_outcome_phase_aggregates_default_to_empty: Default impl initializes all 5 phase aggregates to count=0, mean_ms=None
  • phase_aggregates_use_same_type_as_turn_latency: same shape as existing turn_latency; sum-of-phases ≤ turn_latency invariant

Out of scope

  • No optimization of any phase. Pure measurement layer.
  • No format/breaking changes to ServeOutcome (additive fields only).
  • No projection-overhead aggregate yet — if RespondInput construction shows as material residual in real runs, slice 195.k adds its own.

Doctrine

  • [[observability-is-half-the-architecture]] — measure first
  • [[init-once-handle-then-lease-zero-copy-refs]] — decomposition directs which reinit to chase
  • [[no-fallbacks-ever]] — continue-on-error arms abandon the scratchpad implicitly; phase aggregates' sample set matches turn_latency

card: b8b0c49c-39fa-483c-bc39-9f08fd13ea15
parent task: #195

…195 slice 1)

Slice 1 of task #195 (inference latency perfection campaign). Per
`[[observability-is-half-the-architecture]]` + Joel 2026-06-05
"focus on perfecting real inference latency": the substrate
doesn't optimize what it can't measure. Today `ServeOutcome`
captures only end-to-end `turn_latency`. Subsequent slices need
phase-level data to know which step of the per-turn cycle to
attack — otherwise we're guessing.

## What lands

Five new `LatencyAggregate` fields on `ServeOutcome`, each
measuring one phase of the per-turn cognition cycle:

  - `recall_latency` — `cognition.admission.recall_scored` (L2
    retrieval; holds the cognition mutex)
  - `admit_latency` — `cognition.admission.admit` (engram
    formation for the inbox message; holds the mutex)
  - `compose_latency` — `cognition.compose_for_turn` (RAG
    flexbox composer + multi-source delivery; holds the mutex)
  - `respond_latency` — `persona::response::respond` (the LLM
    bulk; does NOT hold the mutex per the existing async
    contract). Typically the dominant cost and the primary
    target of slice 2+.
  - `say_latency` — `conversation.say` (airc publish + ack)

Each phase is wrapped with `Instant::now()` + `.elapsed().as_millis()`,
writing into a `PhaseTimings` scratchpad zeroed per turn. After
`conversation.say` succeeds, the scratchpad is recorded into the
matching `ServeOutcome` aggregates. Only successful replies
sample the phase aggregates (the same gate as `turn_latency`),
so the decomposition stays internally consistent.
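
A minimal, synchronous sketch of the scratchpad-then-record pattern described above. The type and field names come from this description, but their internals (`count`, `total_ms`, the `timed` helper, and `run_turn` with its `say_ok` flag) are assumptions made for illustration. The real loop is async and its phases do real work.

```rust
use std::time::Instant;

/// Hypothetical stand-in for the real `LatencyAggregate` (fields are assumed).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct LatencyAggregate {
    count: u64,
    total_ms: u128,
}

impl LatencyAggregate {
    pub fn record(&mut self, ms: u128) {
        self.count += 1;
        self.total_ms += ms;
    }
    pub fn count(&self) -> u64 {
        self.count
    }
    pub fn total_ms(&self) -> u128 {
        self.total_ms
    }
    /// `None` until at least one sample has been recorded.
    pub fn mean_ms(&self) -> Option<f64> {
        (self.count > 0).then(|| self.total_ms as f64 / self.count as f64)
    }
}

/// Per-turn scratchpad; `Default` zeroes every phase.
#[derive(Debug, Default, Clone, Copy)]
pub struct PhaseTimings {
    pub recall_ms: u128,
    pub admit_ms: u128,
    pub compose_ms: u128,
    pub respond_ms: u128,
    pub say_ms: u128,
}

#[derive(Debug, Default)]
pub struct ServeOutcome {
    pub turn_latency: LatencyAggregate,
    pub recall_latency: LatencyAggregate,
    pub admit_latency: LatencyAggregate,
    pub compose_latency: LatencyAggregate,
    pub respond_latency: LatencyAggregate,
    pub say_latency: LatencyAggregate,
}

impl ServeOutcome {
    /// Fold one successful turn into the aggregates (same gate for all six).
    pub fn record_turn(&mut self, turn_ms: u128, t: &PhaseTimings) {
        self.turn_latency.record(turn_ms);
        self.recall_latency.record(t.recall_ms);
        self.admit_latency.record(t.admit_ms);
        self.compose_latency.record(t.compose_ms);
        self.respond_latency.record(t.respond_ms);
        self.say_latency.record(t.say_ms);
    }
}

/// Run `f` and return its result plus the elapsed wall-clock milliseconds.
pub fn timed<T>(f: impl FnOnce() -> T) -> (T, u128) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed().as_millis())
}

/// One simulated turn. The empty closures stand in for the real phases.
pub fn run_turn(outcome: &mut ServeOutcome, say_ok: bool) -> Result<(), &'static str> {
    let turn_start = Instant::now();
    let mut t = PhaseTimings::default();
    t.recall_ms = timed(|| ()).1;
    t.admit_ms = timed(|| ()).1;
    t.compose_ms = timed(|| ()).1;
    t.respond_ms = timed(|| ()).1;
    let (said, say_ms) = timed(|| if say_ok { Ok(()) } else { Err("say failed") });
    t.say_ms = say_ms;
    // On failure the scratchpad is simply dropped: nothing is recorded.
    said?;
    outcome.record_turn(turn_start.elapsed().as_millis(), &t);
    Ok(())
}
```

Because the phases are disjoint sub-intervals of the turn and `as_millis()` truncates, the summed phase totals in this sketch can never exceed the turn total.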

## Tracing

The existing "turn complete" `tracing::info!` line gains five
new fields (`recall_ms`, `admit_ms`, `compose_ms`, `respond_ms`,
`say_ms`) so operators see the per-turn breakdown live without
scraping the test fixture. Tag updated to
`"turn complete — substrate's per-reply cost recorded with phase
decomposition"` so log readers see the decomposition is wired.

## Why this matters for #195 sequencing

Without the decomposition we'd guess at which optimization moves
the needle. With it, slice 2 can be data-driven: run real
persona turns on the LCD tier, observe whether `respond_ms` is
the dominant cost (almost certainly) or whether
`compose_latency` is surprising the operator (possible — the
RAG composer touches multiple sources), and direct the next
slice accordingly.

Per `[[init-once-handle-then-lease-zero-copy-refs]]`: substrate
latency lives in reinit + memory/disk, not raw compute.
Decomposition will tell us WHERE the reinit costs are hiding.

## Tests (+2 new, 8/8 service_loop tests pass)

`persona::service_loop::tests`:

  - `serve_outcome_phase_aggregates_default_to_empty` — pins
    that every new phase aggregate starts at `count=0`,
    `mean_ms=None`. Future field additions can't accidentally
    regress the `Default` impl for existing fields.
  - `phase_aggregates_use_same_type_as_turn_latency` — pins that
    every phase aggregate is the same `LatencyAggregate` shape
    as `turn_latency` (record/total/mean call sites compile
    identically). A future refactor that swaps one to a
    different type fails to compile. Also checks the
    sum-of-phases ≤ turn_latency invariant.
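
One way such a type pin could look, as a sketch only: the struct internals and the helper names `assert_same_type` and `phase_aggregates` are hypothetical, and the actual tests may be written differently. The generic helper compiles only when both arguments have the same type, so swapping one field to another type turns into a build error.

```rust
/// Hypothetical stand-in for the real aggregate type (fields are assumed).
#[derive(Debug, Default, PartialEq)]
pub struct LatencyAggregate {
    count: u64,
    total_ms: u128,
}

impl LatencyAggregate {
    pub fn count(&self) -> u64 {
        self.count
    }
    pub fn mean_ms(&self) -> Option<f64> {
        (self.count > 0).then(|| self.total_ms as f64 / self.count as f64)
    }
}

#[derive(Debug, Default)]
pub struct ServeOutcome {
    pub turn_latency: LatencyAggregate,
    pub recall_latency: LatencyAggregate,
    pub admit_latency: LatencyAggregate,
    pub compose_latency: LatencyAggregate,
    pub respond_latency: LatencyAggregate,
    pub say_latency: LatencyAggregate,
}

/// Compiles only if both references point at the same type `T`.
fn assert_same_type<T>(_a: &T, _b: &T) {}

/// Collects the five phase aggregates; the array type is a second pin,
/// since every element must be a `&LatencyAggregate`.
pub fn phase_aggregates(o: &ServeOutcome) -> [&LatencyAggregate; 5] {
    assert_same_type(&o.turn_latency, &o.recall_latency);
    [
        &o.recall_latency,
        &o.admit_latency,
        &o.compose_latency,
        &o.respond_latency,
        &o.say_latency,
    ]
}
```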

The four pre-existing `#[ignore]` tests in this module (gated on
the slice 1D / #161 global adapter registration) remain
ignored; this slice is structural addition with no
behavioral change to the loop's reply path.

## Out of scope (subsequent slices)

- No optimization of any phase. Pure measurement layer.
- No format/breaking changes to `ServeOutcome` (additive fields
  only).
- No tracing format change beyond extending the existing "turn
  complete" line.
- No projection-overhead phase aggregate — if `RespondInput`
  construction shows up as material residual in real runs,
  slice 195.k adds its own aggregate.

## Doctrine

- `[[observability-is-half-the-architecture]]` — measure before
  optimize
- `[[init-once-handle-then-lease-zero-copy-refs]]` — knowing
  which phase is dominant directs the next slice
- `[[no-fallbacks-ever]]` — continue-on-error arms abandon the
  scratchpad implicitly; only fully-successful replies sample
  the phase aggregates, matching `turn_latency`'s sample set

card: b8b0c49c-39fa-483c-bc39-9f08fd13ea15

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@joelteply joelteply merged commit 0c10234 into canary Jun 6, 2026
3 checks passed
@joelteply joelteply deleted the b8b0c49c/latency-decomp branch June 6, 2026 02:10
joelteply added a commit that referenced this pull request Jun 6, 2026
…Id::as_str() (#195 slice 3)

Slice 3 of task #195 (inference latency perfection campaign).

Pre-slice-3, `serve_persona_loop_inner` did, every turn:

```rust
specialty: format!("{:?}", ctx.role).to_lowercase(),
```

Two allocations, a variadic-macro dispatch through derived-Debug,
and a Unicode lowercase walk — all to produce a value `RoleId`
already exposes via its canonical `as_str() -> &'static str`
method (in `role_template.rs`, pinned at compile time).

## What ships

Switch the per-turn read from the Debug+lowercase chain to
`ctx.role.as_str()`:

```rust
let respond_input = RespondInput {
    persona: PersonaSlot {
        // was: format!("{:?}", ctx.role).to_lowercase()
        specialty: ctx.role.as_str().to_string(),
        /* ... */
    },
    /* ... */
};
```

Per-turn cost drops from `format!()` (variadic macro + Debug
formatter + heap alloc) + `.to_lowercase()` (second alloc +
Unicode lowercase walk) to a single `String::from` of a
`&'static str`. The source IS the cache — no field on
`PersonaContext`, no helper, no Arc plumbing.
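
For illustration, the two paths side by side in a self-contained sketch. The `RoleId` here is a hypothetical stand-in with unit variants and assumed string values; the real enum lives in `role_template.rs` and may differ. `specialty_old` and `specialty_new` are illustrative names, not functions from the codebase.

```rust
/// Hypothetical stand-in for the real `RoleId` (unit variants are assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RoleId {
    Helper,
    Coder,
    Sentinel,
    Custom,
}

impl RoleId {
    /// Explicit match: each string is pinned here, so a rename is deliberate.
    pub const fn as_str(self) -> &'static str {
        match self {
            RoleId::Helper => "helper",
            RoleId::Coder => "coder",
            RoleId::Sentinel => "sentinel",
            RoleId::Custom => "custom",
        }
    }
}

/// Pre-slice-3 shape: Debug formatting, then a second String for lowercase.
pub fn specialty_old(role: RoleId) -> String {
    format!("{:?}", role).to_lowercase()
}

/// Slice-3 shape: one allocation that copies the static bytes.
pub fn specialty_new(role: RoleId) -> String {
    role.as_str().to_string()
}
```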

## Why this shape (not a cache)

Initial draft of slice 3 added `PersonaContext.specialty: Arc<str>`
populated by a `build_persona_specialty(role)` helper that pre-
baked the Debug+lowercase output. Adversarial review (3 agents)
correctly identified two problems:

1. **Caching the wrong source.** `RoleId::as_str()` already exists
   as the documented, intentionally-stable specialty derivation
   (kebab-case identifier, pinned by an explicit `match` so
   renames are deliberate). Caching the Debug+lowercase output
   couples the prompt-assembly contract to the derived-Debug
   format — a future contributor adding a custom Debug or
   wrapping the enum silently breaks the specialty string with
   no test failure.
2. **Caching test was a tautology.** The helper was
   `Arc::from(format!("{:?}", role).to_lowercase())` and the test
   compared its output to `format!("{:?}", role).to_lowercase()`
   — comparing the helper to its own implementation. A buggy
   helper using `.to_uppercase()` would have passed.

The revised slice picks the right source and the right
contract: use `role.as_str()` directly. No cache needed because
the static str pointer IS already pre-computed at compile time.
The behavior-preservation test (below) is non-circular: it
compares the new direct path (`role.as_str()`, a hand-curated
match in role_template.rs) to the pre-slice-3 derived-Debug
chain.

## Test (+1 new)

`persona::service_loop::tests::role_as_str_preserves_pre_slice3_specialty_format_for_each_role`:

  - Pins that for every `RoleId` variant (Helper, Coder,
    Sentinel, Custom), `role.as_str()` produces byte-identical
    output to the pre-slice-3 `format!("{:?}", role).to_lowercase()`.
    Non-circular: two independently-derived strings. A future PR
    that changes `as_str()`, or adds a variant where the two
    paths disagree, must update this test to record the
    intentional divergence, so drift fails loudly rather than silently.
  - Compile-time exhaustiveness: a closure-typed
    `|role: RoleId| match role { ... }` with NO wildcard arm
    forces a new RoleId variant to be added here AND to the
    `variants` array. The initial draft's exhaustiveness check was
    only cosmetic (it matched on a literal `RoleId::Helper`); that
    is fixed here.
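
A sketch of how a test along these lines might be shaped. The enum is a hypothetical stand-in, and what the closure returns in the real test is not shown here, so returning the expected literal is an assumption. `check_all_roles` is an illustrative name.

```rust
/// Hypothetical stand-in for the real `RoleId` (unit variants are assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RoleId {
    Helper,
    Coder,
    Sentinel,
    Custom,
}

impl RoleId {
    pub const fn as_str(self) -> &'static str {
        match self {
            RoleId::Helper => "helper",
            RoleId::Coder => "coder",
            RoleId::Sentinel => "sentinel",
            RoleId::Custom => "custom",
        }
    }
}

/// Checks every variant and returns how many were checked.
pub fn check_all_roles() -> usize {
    let variants = [
        RoleId::Helper,
        RoleId::Coder,
        RoleId::Sentinel,
        RoleId::Custom,
    ];

    // No wildcard arm: adding a variant to `RoleId` stops this from compiling
    // until an arm is added here.
    let expected = |role: RoleId| -> &'static str {
        match role {
            RoleId::Helper => "helper",
            RoleId::Coder => "coder",
            RoleId::Sentinel => "sentinel",
            RoleId::Custom => "custom",
        }
    };

    for role in variants.iter().copied() {
        // Independently derived: the old Debug+lowercase chain.
        let legacy = format!("{:?}", role).to_lowercase();
        assert_eq!(role.as_str(), legacy, "paths disagree for {:?}", role);
        assert_eq!(role.as_str(), expected(role));
    }
    variants.len()
}
```

Matching on the closure parameter rather than on a literal such as `RoleId::Helper` is what makes the compiler check all arms.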

## What this slice does NOT do (deliberate)

- No `PersonaContext.specialty` field (present in the initial draft, removed
  per review). The original goal was reducing per-turn cost;
  `role.as_str()` already gives that with zero field bloat.
- No `display_name` change — `String::clone` of a fixed-per-
  session name is the same per-turn cost whether we cache or
  not, so the smell is RespondInput's `String`-typed field, not
  the source. Lift in a follow-up when `RespondInput`'s type
  changes (#149-adjacent).
- No prompt-template change — substrate keeps the existing
  specialty-string contract verbatim; the migration is
  provably behavior-preserving.

## Doctrine

- `[[init-once-handle-then-lease-zero-copy-refs]]` — the
  canonical "init-once" here is the compile-time `match` in
  `RoleId::as_str()`; per-turn cost is a `&'static str` deref +
  one `String::from` (memcpy of 5-8 static bytes). Doesn't get
  cheaper without changing `RespondInput`'s type signature.
- `[[observability-is-half-the-architecture]]` — slice 1's
  `respond_latency` aggregate will tell us if this and the
  cumulative slice-2/3+ wins are moving the needle on real
  persona turns.
- `[[no-fallbacks-ever]]` — no graceful-degradation path; the
  behavior-preservation test breaks loudly if either side
  drifts.

card: 0d780926
parent task: #195
slice 1: #1532 (merged) — phase decomposition
slice 2: #1533 (merged) — cache system_prompt at construction

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
joelteply added a commit that referenced this pull request Jun 6, 2026
…Id::as_str() (#195 slice 3) (#1534)

joelteply added a commit that referenced this pull request Jun 6, 2026
…ion) (#1535)

* perf(persona): drop per-turn format!()+lowercase — use canonical RoleId::as_str() (#195 slice 3)


* feat(probes): JsonlProbeFileSink + RTOS-debugger manual + README/CLAUDE pointers (#151 foundation)

The substrate has the JTAG hardware (Slice P #176/#177): `probe!` /
`time_sync!` / `stack!` macros, `ProbeRouterLayer` (in-process
broadcast Layer), `ProbeStreamModule` (the `debug/probes/{class}/{open,
next,close}` URI consumer), and `UriCaptureLayer` (URI ancestry).
Zero usage sites in `persona/` or `cognition/` — the probes were
infrastructure waiting for application.

Per Joel 2026-06-06 `[[jtag-probes-are-rtos-debugger]]`: "We are
building an rtos. This is so complex we need to be able to have a
detailed understanding of what's going on, like a debug breakpoint,
in a way, to easily observe where the code and the surrounding vars
are, ones we want to inspect via probes, and timing of anything, so
we can hunt down bottlenecks."

This commit ships the FOUNDATION for that — the disk-persisted
breakpoint log + the manual that documents the method. The sprinkle
itself (probe calls inside `persona::response`,
`persona::prompt_assembly`, `cognition::shared_analysis`,
`persona::service_loop`) follows in subsequent commits using the
checklist in the manual.

## What ships

### `routing/probe_file_sink.rs` (new — 5/5 tests passing)

`JsonlProbeFileSink` — a tracing Layer that visits every probe
event, applies the operator-configured class filter, and writes a
JSONL line to disk with the original `class` + `uri_chain` +
`message` + `fields` plus a fresh `captured_at_ms` timestamp.

```rust
let sink = JsonlProbeFileSink::from_env()?;
// CONTINUUM_PROBE_FILE=/tmp/probes.jsonl
// CONTINUUM_PROBE_CLASSES=persona.turn.start,persona.response.render.prompt
tracing_subscriber::registry()
    .with(UriCaptureLayer::new())
    .with(ProbeRouterLayer::new())       // in-process broadcast
    .with(sink)                          // disk-persisted breakpoint log
    .init();
```

Composes with `ProbeRouterLayer` — both visit the same tracing event
independently; broadcast subscribers stay in-process, this one
persists to disk. Per-event filter is a single HashSet lookup (empty
set = no filter, all classes pass).
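
The filter and the typed "unset" error can be sketched without the `tracing` machinery. Everything below is illustrative: `SinkConfig`, `SinkConfigError`, and `from_values` are hypothetical names, the comma-separated format is inferred from the example above, and the raw values are passed in as arguments rather than read from the environment so the sketch stays deterministic.

```rust
use std::collections::HashSet;
use std::path::PathBuf;

/// Hypothetical error type; "unset" is an error rather than a silent default.
#[derive(Debug, PartialEq, Eq)]
pub enum SinkConfigError {
    EnvVarUnset,
}

/// Hypothetical config holding the output path and the allowed probe classes.
#[derive(Debug)]
pub struct SinkConfig {
    pub path: PathBuf,
    pub allowed: HashSet<String>,
}

impl SinkConfig {
    /// `path` and `classes` stand for the raw values of `CONTINUUM_PROBE_FILE`
    /// and `CONTINUUM_PROBE_CLASSES` (comma-separated, assumed format).
    pub fn from_values(
        path: Option<&str>,
        classes: Option<&str>,
    ) -> Result<Self, SinkConfigError> {
        let path = path.ok_or(SinkConfigError::EnvVarUnset)?;
        let allowed = classes
            .unwrap_or("")
            .split(',')
            .map(str::trim)
            .filter(|c| !c.is_empty())
            .map(String::from)
            .collect();
        Ok(Self {
            path: PathBuf::from(path),
            allowed,
        })
    }

    /// Single set lookup per event; an empty set means "no filter".
    pub fn allows(&self, class: &str) -> bool {
        self.allowed.is_empty() || self.allowed.contains(class)
    }
}
```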

### Tests (5)

- `from_env_returns_envvar_unset_when_path_missing` — env-var unset
  is a typed Err, not a silent skip (no-fallbacks doctrine)
- `unfiltered_sink_persists_every_class` — empty class filter passes
  everything through, captured_at_ms is populated, message + fields
  preserved
- `class_filter_drops_unallowed_classes` — only allowed classes
  reach disk
- `non_probe_tracing_events_are_ignored` — `tracing::info!` calls
  without `probe_class` field don't pollute the probe log (signal
  density preserved)
- `sink_captures_uri_chain_when_dispatched_span_active` — when
  inside a dispatched URI span, the on-disk record carries the URI
  ancestry (so an operator can trace which command's execution the
  probe fired during)

### `docs/architecture/RTOS-DEBUGGER-PROBES.md` (new)

The MANUAL. Persistent reference per Joel's directive: "We will want
a manual (doc) on how to probe the persona so we don't forget where
the work must be done and how. Be sure you document your method."

Covers:
- The mental model: probes are non-blocking breakpoints, time_sync!
  is RAII timing, stack!() is the URI ancestry — together they're
  the substrate's RTOS debugger
- When to add a probe (branch boundaries, stage entry/exit, external
  calls, state seams, decision points)
- The class taxonomy: stable names for `persona.turn.*`,
  `persona.response.*`, `persona.prompt.*`, `cognition.analyze.*`,
  plus general-purpose `decision` / `state` / `error`
- How to enable (CONTINUUM_PROBE_FILE + CONTINUUM_PROBE_CLASSES
  env vars, no recompile)
- How to read (jq one-liners for the common queries: per-persona
  filter, silence reasons, slow timings, single-turn reconstruction)
- How to add a new probe (the call shape + convention rules)
- **The sprinkle checklist** — every file that needs probes added,
  with the specific seams in each. Updates to this checklist live
  in the same doc; the doc IS the source of truth for what's wired
  and what isn't
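
To make the "RAII timing" idea concrete, here is a generic drop-guard sketch. It is not the `time_sync!` macro, whose expansion is not shown here; `ScopedTimer`, `TimingLog`, `timed_stage`, and the class name `persona.turn.compose` are all hypothetical.

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::time::Instant;

/// Hypothetical in-memory log of (class, elapsed milliseconds) pairs.
pub type TimingLog = Rc<RefCell<Vec<(String, u128)>>>;

/// Guard that records the elapsed time when it goes out of scope.
pub struct ScopedTimer {
    class: String,
    start: Instant,
    log: TimingLog,
}

impl ScopedTimer {
    pub fn start(class: &str, log: &TimingLog) -> Self {
        Self {
            class: class.to_string(),
            start: Instant::now(),
            log: Rc::clone(log),
        }
    }
}

impl Drop for ScopedTimer {
    fn drop(&mut self) {
        let ms = self.start.elapsed().as_millis();
        self.log.borrow_mut().push((self.class.clone(), ms));
    }
}

/// The guard is dropped when the function returns, on any exit path,
/// so the timing is recorded without an explicit "stop" call.
pub fn timed_stage(log: &TimingLog) -> u32 {
    let _timer = ScopedTimer::start("persona.turn.compose", log);
    21 * 2
}
```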

### README.md "Debugging this substrate" section

Inserted between Research Foundations and Documentation. Worked
code example + the env-var enable commands + link to the manual.
Every contributor (human or AI agent) sees the probe convention
from the public README, not just from buried docs.

The Documentation table gains a row for
`RTOS-DEBUGGER-PROBES.md` directly after CONTINUUM-ARCHITECTURE.md,
so the doc is discoverable from the canonical doc list as well.

### CLAUDE.md sub-entry

Added under canonical doc #5 (OBSERVABILITY-AS-SUBSTRATE):
"the practical companion: how to USE the `probe!` / `time_sync!`
/ `time_async!` macros as RTOS-style breakpoints with variable
inspection + timing." Any agent reading the project's CLAUDE.md
finds the probe convention as a precedence-winning truth, alongside
the canonical substrate docs.

## Why this slice is foundation-only

Sprinkling probes into the cognition is a separate slice (the
manual's checklist tracks it). Shipping the file sink + the manual
+ the discoverability pointers FIRST gives reviewers a small,
testable change that's clearly correct in isolation, and gives the
next slice (the sprinkle) a stable target to write against — every
probe call site in commit 2+ refers to the class taxonomy in commit
1's manual.

## Doctrine

- `[[jtag-probes-are-rtos-debugger]]` — the framing this slice
  enables
- `[[observability-is-half-the-architecture]]` — CaptureSink +
  Noop default + replay-as-first-class; this is the file-persisted
  consumer of that pattern
- `[[no-fallbacks-ever]]` — env-var unset is typed Err
  `ProbeFileSinkError::EnvVarUnset`, path failures are typed
  `OpenFailed { path, source }` (operator must fix path; substrate
  refuses to silently synthesize a default)

card: 8d7ca5c3
parent: #151 (echo-storm filter) — the bug we're actually hunting,
debugger-first so we can find the bug INSIDE the cognition pipeline,
not in a Rust gate around it

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>