feat(persona): per-turn latency metrics — LatencyAggregate + ServeOutcome.turn_latency by joelteply · Pull Request #1515 · CambrianTech/continuum

joelteply · 2026-06-02T21:16:53Z

Summary

Per Joel 2026-06-02 ("make sure timing and other metrics are in place"): the substrate doesn't get to claim "fast airc-bound persona" without measuring. This PR makes the per-reply cost structural.

What changed

LatencyAggregate { count, total_ms, min_ms, max_ms } — cheap online aggregator. O(1) record, allocation-free, saturating-add on overflow (locked by test).
ServeOutcome.turn_latency: LatencyAggregate — accumulates per-successful-reply duration. Excludes wait-for-next-message and pre-watermark / self-loop / RAG-only-skip cycles.
serve_persona_loop_inner instruments the per-reply path:
- Instant::now captured AFTER filters, BEFORE RAG inspect
- elapsed recorded into turn_latency only on successful say
- tracing::info! per turn with lamport, duration, mean/min/max so the substrate's observability layer captures the metric structurally per [[observability-is-half-the-architecture]]

Doctrine fit

Monotonic Instant (not wall-clock) — immune to clock skew
One Instant per turn, no Vec growth, no heap allocs on hot path
Per Joel's computer-engineer mental model in [[init-once-handle-then-lease-zero-copy-refs]]: cache-friendly, branch-predictable, autovectorization-friendly

Tests (7/7 pass)

latency_aggregate_records_min_max_sum_count — empty + populated math; mean = total/count
latency_aggregate_saturates_on_overflow — locks safety property per [[every-error-is-an-opportunity-to-battle-harden]]
replies_to_inbound_from_other_peer (extended) — asserts turn_latency.count == 1 after one successful reply

Stacked on

PR #1514 (feat/persona-prime-at-spawn).

Closes #150. Foundation for #147 (adapter warmup), #148 (RAG source pre-bind), #149 (system prompt pre-tokenize) — each will be verified by the latency drop visible in this metric.

🤖 Generated with Claude Code

…come.turn_latency (#150) Per Joel 2026-06-02: "make sure timing and other metrics are in place." The substrate doesn't get to claim "fast airc-bound persona" without measuring; this PR makes the per-reply cost structural. Added (all in persona/service_loop.rs): - LatencyAggregate { count, total_ms, min_ms, max_ms } — cheap online aggregator. O(1) record, allocation-free, saturating-add on overflow (locked by test). mean_ms returns Option<f64>. - ServeOutcome.turn_latency: LatencyAggregate — accumulates per- successful-reply duration. Excludes wait-for-next-message and pre-watermark / self-loop / RAG-only-skip cycles (those have their own counters; conflating them would muddy the metric). - serve_persona_loop_inner instruments the per-reply path: - Instant::now captured AFTER filters, BEFORE RAG inspect - elapsed recorded into turn_latency only on successful say - tracing::info per turn with lamport, duration, mean/min/max so the substrate's observability layer captures the metric structurally per [[observability-is-half-the-architecture]] Doctrine fit: - Monotonic Instant (not wall-clock) — immune to clock skew - One Instant per turn, no Vec growth, no heap allocs on hot path - Per Joel's computer-engineer mental model in [[init-once-handle-then-lease-zero-copy-refs]]: cache-friendly, branch-predictable, autovectorization-friendly Tests (7/7 pass): - latency_aggregate_records_min_max_sum_count — empty + populated math; mean = total/count - latency_aggregate_saturates_on_overflow — locks the safety property per [[every-error-is-an-opportunity-to-battle-harden]] - replies_to_inbound_from_other_peer (extended) — asserts turn_latency.count == 1 after one successful reply; min/max/mean set. If a future refactor forgets to record, count drops to 0 and the test fails loudly Test plan: - [x] cargo test --lib ... persona::service_loop:: — 7/7 pass Closes #150. Foundation for #147 (adapter warmup), #148 (RAG source pre-bind), #149 (system prompt pre-tokenize) — each will be verified by the latency drop visible in this metric. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…test (caller-primes contract) Per Joel 2026-06-02: "God I hope it's not more fallback cancer. You tend to turn stuff into fake demos." Two honest fixes addressing both criticisms. ## Fix 1: ONE place primes, not two (no more belt-and-suspenders) Before: `spawn_persona_service` called `conversation.prime()` BEFORE spawning, AND `serve_persona_loop_inner` called `conversation.prime()` unconditionally as a "safety net." Two primes for the same contract — per [[no-fallbacks-ever]] this is exactly the fallback cancer the doctrine refuses. After: `serve_persona_loop_inner` does NOT prime. Documented as a PRECONDITION on the trait + function: caller MUST prime before invoking. The supervisor's `spawn_persona_service` primes for production. Direct callers (`airc_chat_demo`, tests) prime explicitly. If a caller forgets, the first `next_message` returns the typed `Err("called before prime()")` shipped in cb2894f — fail-loud, never silently-warm. Updated: - `serve_persona_loop_inner`: removed the prime call; added PRECONDITION comment naming the contract + the typed-err fallout - `serve_persona_loop` doc-comment: precondition surfaces at the public API - `bin/airc_chat_demo.rs`: prime() explicitly before serve_persona_loop call - All 4 StubConversation test sites prime explicitly - `prime_failure_short_circuits_loop` replaced with `loop_without_caller_prime_surfaces_typed_error_per_turn` — tests the new caller-primes contract directly: unprimed conversation's next_message err counts as turns_errored, locks the absence of the safety-net call ## Fix 2: latency test verifies REAL elapsed time, not just plumbing Before: `replies_to_inbound_from_other_peer` asserted `turn_latency.count == 1` and that min/max/mean were Some. Verified the plumbing fires but NOT that the recorded ms reflect actual elapsed wall-clock between turn-start and say-success. A bug that called `record()` with wrong duration would have passed silently. Fake-demo-shaped. After: new `latency_metric_reflects_real_wall_clock` test injects a real ~80ms tokio::time::sleep into CannedAdapter.generate_text, runs the loop, asserts: - `observed_ms >= 50` (CI jitter floor — verifies metric tracks the injected delay, not always-zero) - `observed_ms < 5000` (upper bound for sanity) CannedAdapter gains `inject_delay_ms` field; `fake_hosted_with_delay` helper exposes it. Default (`fake_hosted`) passes 0 so existing tests are unaffected. Test plan: - [x] cargo test --lib ... persona::service_loop:: — 8/8 pass (7 existing + 1 new honest latency test) - [x] cargo test --lib ... persona:: — 713/713 pass overall Doctrine recap: - [[no-fallbacks-ever]] — one place primes, not two - [[every-error-is-an-opportunity-to-battle-harden]] — the caller-primes regression test locks the contract - The honest latency test prevents the "passes on plumbing, silent on correctness" anti-pattern Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

joelteply · 2026-06-07T15:51:40Z

Superseded — the semantic content of this PR landed via the cognition pipeline rewires on canary.

Verified on canary (75b21d08a):

AircCitizen trait → src/workers/continuum-core/src/persona/airc_citizen.rs:73
PersonaConversation::prime → src/workers/continuum-core/src/persona/airc_persona_conversation.rs:105 + service_loop.rs:100
LatencyAggregate + ServeOutcome.turn_latency → service_loop.rs:167,247
AIProviderAdapter::warmup → ai/adapter.rs:319, called at supervisor.rs:447
Test scaffolding primitives (scripted_conversation, scripted_adapter_factory) → persona/scripted_*.rs
PersonaSpawnSupervisor + BootSummary → persona/host.rs:119
.instrument(ctx.span()) (&ctx-pure tracing) → service_loop.rs:309
multi_persona_stress_baseline.rs → already on canary (via feat(persona): persona decides + responds via LLM in ONE structured call #1519)

The stack pre-dated #1519's "persona decides + responds via LLM in ONE structured call" rewire and #1539's cognition pipeline integration. Those PRs reshaped serve_persona_loop so heavily that rebasing this stack would mean replaying the architectural intent against code that already implements it. Closing as superseded; task system marks #144/#146/#147/#150/#154/#156 completed.

No work lost — every architectural insight from this stack is in the substrate today.

github-actions Bot added the size: M label Jun 2, 2026

github-actions Bot added size: L and removed size: M labels Jun 2, 2026

joelteply mentioned this pull request Jun 2, 2026

feat(persona): adapter warmup at boot — pay KV cache + kernel JIT cost off the hot path #1516

Closed

joelteply deleted the branch feat/persona-prime-at-spawn June 7, 2026 15:51

joelteply closed this Jun 7, 2026

joelteply deleted the feat/persona-turn-latency-metrics branch June 7, 2026 15:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(persona): per-turn latency metrics — LatencyAggregate + ServeOutcome.turn_latency#1515

feat(persona): per-turn latency metrics — LatencyAggregate + ServeOutcome.turn_latency#1515
joelteply wants to merge 2 commits into
feat/persona-prime-at-spawnfrom
feat/persona-turn-latency-metrics

joelteply commented Jun 2, 2026

Uh oh!

joelteply commented Jun 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant