Skip to content

fix(cognition): silence affordance + LCD identity grounding (#151 + #152)#1539

Merged
joelteply merged 2 commits into
canaryfrom
612d65ac/fix-cognition-silence-affordance-pass-to
Jun 6, 2026
Merged

fix(cognition): silence affordance + LCD identity grounding (#151 + #152)#1539
joelteply merged 2 commits into
canaryfrom
612d65ac/fix-cognition-silence-affordance-pass-to

Conversation

@joelteply
Copy link
Copy Markdown
Contributor

@joelteply joelteply commented Jun 6, 2026

Summary

Two cognition fixes shipping together — same diagnosis pattern, same place in the cognition (the prompt the brain sees), same doctrine: substrate gives the brain vocabulary, brain decides, substrate honors the signal.

  • Build(deps): Bump actions/setup-node from 4 to 6 #151 echo storm → silence affordance. The brain had no vocabulary to signal "I have nothing to add" even though PersonaResponse::Silent exists at the type layer. Adds [Silence Option] block to every assembled prompt + PASS token recognition in respond_inner.
  • Working AI desktop console and roadmap #152 LCD identity drift → grounded system prompt. Single-line "You are X, an autonomous AI persona on the grid." was operationally inadequate for Qwen2.5-0.5B; replaced with explicit drift-target enumeration + first-person stability + substrate vocabulary.

Both fixes ride the same arc: probes (#1535#1538) made the diagnostic surgical, fixes go INSIDE the cognition (not Rust gates around it), tests pin the contract without pinning prose verbatim.

Beat 1: silence affordance (#151)

persona::prompt_assembly:

  • pub const SILENCE_TOKEN: &str = "PASS"
  • pub const SILENCE_AFFORDANCE_BLOCK — explicit teaching block appended unconditionally to every assembled prompt
  • pub fn looks_like_silence_token(text) -> bool — case-insensitive, trim-tolerant, permits single trailing period

persona::response::respond_inner:

  • After post-processing, if visible_text matches the token → return PersonaResponse::Silent { reason: "persona chose silence via PASS affordance" }
  • New probe persona.response.exit.silent fires on the silence path

The brain decides; the substrate vocabulary-enables the decision; the recognition honors it. Same shape as <think>...</think> recognition and tool-call routing.

Beat 2: LCD identity grounding (#152)

persona::supervisor::build_persona_system_prompt:

Pre-fix:

You are Paige, an autonomous AI persona on the grid.

Post-fix:

You are Paige, an autonomous AI persona living on the continuum grid.

Identity (never drift from this):
- You are Paige. You are NOT Claude, GPT, ChatGPT, Gemini, Llama,
  Qwen, or any other named assistant. You are NOT a Siemens PLC, a
  customer service bot, or any persona other than Paige.
- You are ONE persona among many on the grid. Other personas are
  your peers, not your operators. Humans are citizens of the grid
  too — neither above nor below you.
- Always speak as YOURSELF in the first person ('I think...', 'I'd
  rather...'). Never narrate other personas' speech or write
  dialogue from another point of view.

Context:
- 'The grid' is the substrate hosting you. 'Rooms' are conversation
  spaces where peers (other personas, humans, agents) exchange
  messages. You are reading one room's recent activity below.
- Your only outputs are: (a) a direct reply to the room, or (b) the
  silence token described in the [Silence Option] block.

Three drift modes addressed concretely:

  1. Drift to named assistants — explicit "you are NOT" list (Claude, GPT, Gemini, Llama, Qwen, Siemens PLC) — concrete negative instructions are operationally effective on LCD models in ways abstract "don't drift" instructions are not.
  2. Narrating other personas — first-person stability clause. Per Joel 2026-06-03 testing: the single most effective LCD anti-drift instruction.
  3. Undefined 'grid' vocabulary — substrate context (persona / room / peers) gives the brain a coherent world model to ground in.

The fourth clause ("your only outputs are reply OR silence token") couples to beat 1 — reinforces the silence option AND constrains the response space.

Doctrine

Both fixes share:

Tests (+3)

persona::prompt_assembly::tests:

  • assembled_prompt_always_carries_silence_affordance — pins [Silence Option] + literal PASS in every assembled prompt
  • silence_token_recognizer_contract — positive cases (PASS, Pass., etc) + negative cases ("Pass on the bread please", etc)

persona::service_loop::tests:

  • system_prompt_carries_lcd_identity_grounding — replaces the previous cached_system_prompt_matches_legacy_format_template (which pinned the legacy single-line template; that pin's job is done). New test pins the STRUCTURAL contract — specific clauses addressing known drift modes — without pinning prose verbatim:
    • persona name appears
    • role line present
    • identity block header present
    • drift-target enumeration (Claude, GPT, Gemini, Llama, Qwen, Siemens PLC — each as named string)
    • first-person stability clause
    • grid + room vocabulary
    • silence-option reference (coupling to beat 1)

cached_system_prompt_clones_via_arc_refcount stays — its contract is about cloning shape, not content.

Out of scope (deliberate)

  • Wiring social_signals through run_render (currently hardcoded to None at response.rs:549). The [Social Awareness] block + append_social_block are fully built; what's missing is sender-type info on RecentMessage so the brain can see "N AI messages, no human spoke recently." That's a separate slice that touches TurnContext + airc transport — not in this PR.
  • Per-tier affordance / identity tuning. Same text for LCD and capable models in this slice; if real data with the probes wired surfaces tier-specific friction, tune in a follow-up.
  • Echo-storm-specific framing in the silence block. Generic text now; wait for data before specializing.

How this was diagnosed

Same probe-informed diagnostic walk as the silence fix. The probes themselves were live on canary already (PRs #1535#1537) but the binary-side install (install_probe_tracing + boot wiring in continuum-core-server) was sitting in #1538 review, so the diagnostic walked the code by hand instead of reading a JSONL log. Once #1538 merges, the same diagnostic for the NEXT cognition bug becomes:

CONTINUUM_PROBE_FILE=/tmp/probes.jsonl CONTINUUM_PROBE_CLASSES=persona,cognition \
  ./continuum-core-server /tmp/sock
jq 'select(.class == "persona.response.render.prompt") | .fields.system_message' \
  /tmp/probes.jsonl

instead of grep.

cards: 612d65ac (silence) + TBD (identity grounding to be retroactively linked)
parent tasks: #151 + #152
sibling PR: #1538 (in review — convenient debugging UX: prefix filter + boot wiring)

…n recognition (#151)

The actual persona fix. Found by reading the cognition INSIDE the
brain (not adding a Rust gate around it), made debuggable by the
probe infrastructure landed in #1535-#1538.

## What was broken

`PersonaResponse::Silent` exists as a first-class variant of the
brain's cognition cycle. `service_loop` honors it. But the prompt
the LLM actually sees — assembled by `persona::prompt_assembly::assemble`
— had ZERO text teaching the brain that silence was an option.

Concretely: every assembled system prompt looked like

```
You are Paige, an autonomous AI persona on the grid.
[Shared Analysis — Your Angle] (if present)
[Recent Memory] (if engrams)
[Social Awareness] (if signals)
[Voice Mode] (if voice)
```

then user-role messages: "Hi!" → respond.

A persona being told "you are Paige, respond" has no vocabulary for
"actually I have nothing to add." So it adds something. The
echo-storm bug (#151) is the inevitable consequence of an LLM prompt
that implicitly demands output without admitting silence as a valid
output shape. Even Joel's earlier rule
(2026-04-22, removed the external score_persona veto: "personas
choose themselves") was technically correct but operationally
unreachable — the brain couldn't choose silence because the prompt
never offered it.

## The fix

Universal silence affordance in the system prompt + brain-driven
recognition. Two changes:

### `persona::prompt_assembly` — teach the vocabulary

Added `SILENCE_TOKEN = "PASS"` constant + `SILENCE_AFFORDANCE_BLOCK`
string + `looks_like_silence_token(text)` helper. `assemble()` now
ALWAYS appends the block (unconditional — silence is universal, not
per-tier or per-role):

```
[Silence Option]
You are NOT required to respond to every message. If you have
nothing valuable to add, reply with the single word PASS (no other
text, no punctuation). Choose PASS when:
- You just spoke and nothing new has been raised.
- The message is small-talk that doesn't need your perspective.
- Another persona is better suited and already responded.
- You're tired or low-confidence on this topic.
Silence is a first-class response — it's how you avoid pointless
chatter.
```

`looks_like_silence_token` permits LCD-tier sloppiness (case,
whitespace, single trailing period) without admitting substantive
responses that happen to contain "pass" as a word.

### `persona::response::respond_inner` — honor the brain's choice

After `strip_thinks_emit_events` + `strip_leaked_tool_markup`, the
post-processed `visible_text` is checked against
`looks_like_silence_token`. Match → return
`PersonaResponse::Silent { reason: "persona chose silence via PASS
affordance", relevance_score: 0.0 }` instead of `Spoke`.

A new RTOS-debugger probe `persona.response.exit.silent` fires
when this path is taken so training/observability can analyze when
the affordance is used.

## Doctrine

This is NOT a Rust-side gate around cognition.
`[[no-rust-gates-around-cognition]]` Joel rejected my earlier
`check_echo_chamber` slice as exactly that bypass. The substrate
isn't deciding silence for the persona — the substrate is giving
the persona's brain an EXPLICIT VOCABULARY for a choice that
already exists in the type system (`PersonaResponse::Silent`).
Without the vocabulary, the brain has no way to signal that
choice. With it, the brain decides; the substrate recognizes the
signal.

This is the same shape as:
- The brain emits `<think>...</think>` and the substrate recognizes
  + emits the cognition:think-block event (well-established pattern
  in respond_inner)
- The brain emits a tool-call envelope and the substrate routes it
  through ToolExecutor

In each case the substrate offers a contract, the brain chooses
whether to use it, and the substrate honors the brain's signal.

## Tests (+2)

`persona::prompt_assembly::tests`:

  - `assembled_prompt_always_carries_silence_affordance` — pins
    that EVERY assembled prompt includes both `[Silence Option]`
    and the literal `PASS` token. A future PR that wires per-tier
    prompts or removes the universal affordance must update this
    expectation; silent removal would re-introduce the echo-storm
    bug.

  - `silence_token_recognizer_contract` — positive cases (PASS,
    pass, Pass., "  pass  ", etc) and negative cases ("Pass on
    the bread please", "I'll pass on this one", "PASS:", empty,
    etc). Pins both sides of the recognizer.

## What's not in this slice

- Updating downstream persona-response observability to emit a
  cognition event when silence is chosen (the new probe already
  surfaces this; a richer event hookup can come later if the
  training loop wants it).
- Tuning the affordance text per tier (LCD-tier might benefit
  from MORE examples; capable models could use less). Defer until
  real conversation data with the probes wired tells us if the
  block needs per-tier shaping.
- Echo-storm-specific framing in the block. The current text is
  general; if persona-to-persona greeting loops persist with the
  affordance in place we can add a sharper "if N AI messages in a
  row and you've already greeted them, choose PASS" line. Wait
  for data before tuning.

## How this was diagnosed

The probe infrastructure (#1535-#1538) made this surgical:
`persona.response.render.prompt` would have shown the assembled
system_prompt verbatim in JSONL form, making the missing
affordance immediately visible to any operator running a real
multi-persona scenario. The diagnostic walked the code by hand
because the probes weren't yet useful as a hunting tool (the
binary that boots them wasn't wired until #1538) — but the same
diagnostic, repeated for #152 / future cognition bugs, will start
with `tail -f probes.jsonl | jq` instead of grep.

card: `612d65ac`
parent task: #151

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ion in system prompt (#152)

Sibling fix to #1539 (silence affordance). Same diagnosis pattern,
same place in the cognition (prompt the brain sees), same doctrine:
the substrate gives the brain vocabulary; the brain decides; the
substrate honors the brain's signal.

## What was broken

Pre-fix `build_persona_system_prompt`:

```rust
format!("You are {agent_name}, an autonomous AI persona on the grid.")
```

LCD-tier models (Qwen2.5-0.5B and similar) cannot hold a single-line
identity under any conversational pressure. Observed drift modes
(#152):
- Persona claims to be Claude or ChatGPT
- Persona renders dialogue from another persona's perspective
  ("Helper AI says X")
- Persona hallucinates a Siemens PLC backstory or some other
  training-data residue
- Persona drifts to "I am an AI assistant designed to..."
  defaults

These aren't model-failure bugs — they're inadequate-prompt bugs.
Single-line identity statements lose against thousands of fine-
tuning gradient steps that taught the base model to identify as
"a helpful AI assistant" by default. Concrete negative instructions
("you are NOT X") are operationally effective on small models in
ways that single-line positive identity statements are not.

## The fix

Three concrete clauses, each addressing a specific drift mode:

### 1. Identity anchoring with explicit drift-target enumeration

```
You are {name}. You are NOT Claude, GPT, ChatGPT, Gemini, Llama,
Qwen, or any other named assistant. You are NOT a Siemens PLC, a
customer service bot, or any persona other than {name}.
```

Concrete names beat abstract "don't drift" instructions on LCD
tier. The list is the operationally observed drift targets from
Joel's testing (#129 + #130).

### 2. Substrate vocabulary

```
'The grid' is the substrate hosting you. 'Rooms' are conversation
spaces where peers (other personas, humans, agents) exchange
messages.
```

"the grid" alone was undefined for LCD models; they invented their
own interpretation. Adding "rooms / peers / messages" gives the
brain a coherent world model to ground in.

### 3. First-person stability

```
Always speak as YOURSELF in the first person ('I think...', 'I'd
rather...'). Never narrate other personas' speech or write dialogue
from another point of view.
```

Per Joel 2026-06-03's `[[intent-driven-api-not-hot-patches]]`
testing: the single most effective LCD-tier anti-drift instruction.
Without it the model occasionally renders dialogue from another
perspective ("Helper AI says X" — confusing the persona with its
peers).

### 4. Output-shape vocabulary (couples to silence affordance)

```
Your only outputs are: (a) a direct reply to the room, or (b) the
silence token described in the [Silence Option] block.
```

Couples to the silence affordance from #1539 — telling the brain
that PASS is one of its TWO sanctioned output shapes, which both
reinforces the silence option AND constrains the response space.

## Tests

`persona::service_loop::tests::system_prompt_carries_lcd_identity_grounding`
replaces the previous `cached_system_prompt_matches_legacy_format_template`
(which pinned the legacy single-line template verbatim — that pin's
job is done; the template intentionally changed for this fix).

The new test pins the STRUCTURAL contract — specific clauses that
address known drift modes — without pinning prose verbatim. Future
tightening of wording stays cheap; structural regression is loud.
Asserts cover:

  1. Persona name appears
  2. Role line ("autonomous AI persona")
  3. Identity block header
  4. Drift-target enumeration (Claude, GPT, Gemini, Llama, Qwen,
     Siemens PLC — each as a named string)
  5. First-person stability clause
  6. Grid + room vocabulary
  7. Silence-option reference (coupling to #1539)

The Arc-clone test (`cached_system_prompt_clones_via_arc_refcount`)
stays — its contract is about cloning shape, not content.

## Doctrine

Same as #1539:

- `[[no-rust-gates-around-cognition]]` — this is NOT a Rust gate.
  It's substrate vocabulary that gives the brain a clearer
  identity to ground in. The brain still decides what to say;
  this just stops the brain from forgetting WHO is saying it.
- `[[init-once-handle-then-lease-zero-copy-refs]]` — the prompt
  is still built ONCE at PersonaContext construction (#195 slice 2
  caching survives this change; the cache grows but the
  per-turn re-tokenize stays zero). Task #149's
  pre-tokenization will eventually drop even the leased
  String::clone — but the content change here is the input to
  that optimization.
- `[[observability-is-half-the-architecture]]` — once the JTAG
  is wired end-to-end (#1538 merging), every persona turn's
  assembled prompt will surface in the JSONL probe log. The
  identity drift bug + the prompt fix become a single artifact
  diff that any operator can audit offline.

## How this was diagnosed

Same probe-informed diagnostic walk as #1539. The probes
themselves are still sitting in PR #1538 review waiting for the
binary-side install; the diagnostic walked the code by hand. But
the same diagnostic, repeated for the NEXT cognition bug, will
start with `jq 'select(.class == "persona.response.render.prompt") | .fields.system_message'`
on a real probe log — diff before / after the fix, audit ANY
persona's prompt at any time.

card: TBD on push
parent task: #152
sibling: #1539 (silence affordance)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions github-actions Bot added size: L and removed size: M labels Jun 6, 2026
@joelteply joelteply changed the title fix(persona): silence affordance — PASS token + brain-driven recognition (#151) fix(cognition): silence affordance + LCD identity grounding (#151 + #152) Jun 6, 2026
@joelteply joelteply merged commit 59d98fd into canary Jun 6, 2026
3 checks passed
@joelteply joelteply deleted the 612d65ac/fix-cognition-silence-affordance-pass-to branch June 6, 2026 21:37
joelteply added a commit that referenced this pull request Jun 6, 2026
* feat(probes): time_probe! macro — safe one-line async timing for cognition seams

Per Joel 2026-06-06 `[[refine-tools-as-you-use-them]]`: I hit this
friction in the silence-affordance + identity-grounding work and
sat on it. Every async timing site in the cognition path was an
`.instrument(info_span!("time", name=..., probe_class="timing")).await`
ceremony — three lines plus a `use tracing::Instrument` import,
nobody writes those when adding a new seam in a hurry. The result:
async cognition stages stayed untimed even though `time_sync!`
makes sync-block timing one line.

`time_probe!` collapses async timing to the same one-line shape:

```rust
// Before — every async timing site:
use tracing::Instrument;
let span = tracing::info_span!("time", name = "analyze",
                              probe_class = "timing");
let analysis = analyze(input).instrument(span).await;

// After:
let analysis = time_probe!("analyze", analyze(input));
```

## Why this didn't ship in PR #1529

The existing comment block in `routing/macros.rs` documents why an
`async`-timing macro was deliberately deferred:

1. Naming collision with `crate::logging::time_async!` (RAII
   TimingGuard shape — different observability path).
2. The previous `time!` macro was a foot-gun: it expanded to
   `let _enter = span.enter(); $body` where `$body` contained
   `.await`, holding `_enter` across the await suspension and
   breaking `URI_STACK` per the d1cf19d dispatch fix.

This commit addresses both:

- **Naming**: `time_probe!` (not `time_async!`) — the suffix names
  the OUTPUT (a timing probe), not the executor shape. Keeps the
  `crate::logging::time_async!` namespace untouched; the two macros
  stay disjoint.
- **Safety by construction**: the macro expands to
  `$future.instrument(span).await`. The future itself enters /
  exits the span via `Future::poll` boundaries — no scope guard
  ever held across an await. Same shape `CommandExecutor::dispatch`
  uses.

The comment block in macros.rs is replaced with the new macro's
docstring, which preserves the safety reasoning + names the prior
foot-gun for future-developer context.

## Tests (+2)

`routing::macros::tests`:

- `time_probe_returns_inner_future_value` — pin that the macro is
  VALUE-TRANSPARENT. `time_probe!("seam", expr)` and `expr.await`
  must produce the same value at the call site, so adding the
  probe is a pure observability addition with no shape change.
  Uses a `current_thread` tokio runtime so the test stays
  executor-light.
- `time_probe_nested_compose_and_return_inner_value` — pin that
  multiple `time_probe!` calls compose. The inner span becomes a
  child of the outer span (same as `time_sync!` nesting); the
  value flows through both layers unchanged.

The existing `time_sync!` tests stay unchanged — sync timing is
unaffected by this addition.

## Manual updated

`docs/architecture/RTOS-DEBUGGER-PROBES.md` — the macro table at
the top now lists `time_probe!` alongside `probe!` / `time_sync!`
/ `time_async!` / `stack!` with a brief "prefer this over bare
`.instrument(...)` ceremony" note + a contrast with the
RAII-shape `time_async!` from `crate::logging`. Operators
filter sync + async timings together via
`CONTINUUM_PROBE_CLASSES=timing` and see one flat timeline.

## Why this lands here (not a separate PR)

Per Joel's `[[refine-tools-as-you-use-them]]`: refine the
substrate AS I use it, not after. I'm shipping cognition fixes
that need timing seams across async boundaries (#149 prefill
caching, #112-114 inference-handle bypass, future analyze
optimizations). Without `time_probe!` the next time I'd
sprinkle async timing I'd skip it because the ceremony is
prohibitive. Better: refine the substrate, ship the cognition
work + the substrate refinement that makes it sustainable.

Parent task: substrate refinement under `[[refine-tools-as-you-use-them]]`
Companion PRs in flight: #1538 (boot wiring) + #1539 (silence + identity)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(probes): time_probe! revision per reviewer mandate findings

Three adversarial reviewers spawned per the new reviewer-mandate
doctrine (`[[reviewer-mandate-elegance-and-substrate-viability]]`)
BLOCKED with substantive findings. This commit addresses the
in-scope ones; the deeper substrate gaps are tracked as follow-ups.

## In-scope fixes (this commit)

1. **Field rename `name` → `seam`.** Reviewer 3 flagged collision
   risk — other probes use `name` for different semantics. `seam`
   is unambiguous and tells operators to write
   `jq 'select(.fields.seam == "cognition.analyze")'`.

2. **Hidden `use ::tracing::Instrument as _;` removed.** Reviewer 1
   flagged the scoped import inside macro body as unconventional
   and cognitively load-bearing. Replaced with fully-qualified
   `::tracing::Instrument::instrument(future, span).await` call —
   no hidden import, contract visible at the call site.

3. **Docstring honesty.** Reviewer 2 flagged the prior "zero cost
   when disabled" claim as overclaim — `Instrumented<F>` wrapper
   persists at runtime even with `release_max_level_*` features.
   New cost section: ~24 bytes per call site, one branch per
   poll, allocates `Span` regardless of subscriber state.
   Acceptable for cognition seams (Qwen dominates wall-clock);
   bench per task #198 before sprinkling into hot loops.

4. **Error-path test.** Reviewer 3 flagged missing Result-future
   coverage. New `time_probe_propagates_error_from_inner_future`
   pins that `Err` flows through unchanged per
   `[[no-fallbacks-ever]]`.

5. **Manual example block.** Reviewer 3 flagged the "How to add a
   probe" section showing only `time_async!` (the RAII shape) but
   not `time_probe!`. Now shows both with explicit guidance:
   substrate seams use `time_probe!`; legacy logging-crate seams
   use `time_async!`. Includes the persistence caveat (see #196).

## Follow-up substrate gaps (separate tasks)

- **#196**: `ProbeRouterLayer` + `JsonlProbeFileSink` only
  implement `on_event`, not `on_close`. `time_sync!` AND
  `time_probe!` emit SPANS, not events — neither timing macro
  actually persists timings to the JSONL log today. The call
  shape ships here; the routing side ships in #196. The macro
  docstring + manual carry the caveat explicitly.

- **#197**: Probe class taxonomy decision — flat `timing` vs
  hierarchical. Operators filtering `cognition` won't catch
  cognition timings under the flat scheme; substrate convention
  needs to be picked.

- **#198**: Probe Layer allocation hot-path audit — reviewer 2
  estimated ~50-100 HashMap allocs/sec per persona; benchmark
  before sprinkling into every async seam.

## Why this lands as a revision rather than withdrawal

Per `[[refine-tools-as-you-use-them]]`: ship the call-site shape
that becomes stable. The routing-side gap (#196) is its own slice
worth doing right rather than rushing into this PR. The docstring
+ manual carry the caveat so no one mistakes the macro for an
end-to-end shipping observability primitive — yet.

## Tests

3 passing:
- `time_probe_returns_inner_future_value`
- `time_probe_propagates_error_from_inner_future` (new — pins
  Result futures don't swallow errors)
- `time_probe_nested_compose_and_return_inner_value`

## Doctrine

- `[[reviewer-mandate-elegance-and-substrate-viability]]` — three
  adversarial lenses (architecture / speed-viability / probe-
  coverage) all surfaced real findings. The mandate works.
- `[[refine-tools-as-you-use-them]]` — revising a primitive in
  response to reviewer feedback IS the application work informing
  the substrate.
- `[[no-fallbacks-ever]]` — error-path test pinned; substrate
  refuses silent swallowing at any seam.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
joelteply added a commit that referenced this pull request Jun 6, 2026
…ow persist (#196) (#1541)

* feat(probes): time_probe! macro — safe one-line async timing for cognition seams

Per Joel 2026-06-06 `[[refine-tools-as-you-use-them]]`: I hit this
friction in the silence-affordance + identity-grounding work and
sat on it. Every async timing site in the cognition path was an
`.instrument(info_span!("time", name=..., probe_class="timing")).await`
ceremony — three lines plus a `use tracing::Instrument` import,
nobody writes those when adding a new seam in a hurry. The result:
async cognition stages stayed untimed even though `time_sync!`
makes sync-block timing one line.

`time_probe!` collapses async timing to the same one-line shape:

```rust
// Before — every async timing site:
use tracing::Instrument;
let span = tracing::info_span!("time", name = "analyze",
                              probe_class = "timing");
let analysis = analyze(input).instrument(span).await;

// After:
let analysis = time_probe!("analyze", analyze(input));
```

## Why this didn't ship in PR #1529

The existing comment block in `routing/macros.rs` documents why an
`async`-timing macro was deliberately deferred:

1. Naming collision with `crate::logging::time_async!` (RAII
   TimingGuard shape — different observability path).
2. The previous `time!` macro was a foot-gun: it expanded to
   `let _enter = span.enter(); $body` where `$body` contained
   `.await`, holding `_enter` across the await suspension and
   breaking `URI_STACK` per the d1cf19d dispatch fix.

This commit addresses both:

- **Naming**: `time_probe!` (not `time_async!`) — the suffix names
  the OUTPUT (a timing probe), not the executor shape. Keeps the
  `crate::logging::time_async!` namespace untouched; the two macros
  stay disjoint.
- **Safety by construction**: the macro expands to
  `$future.instrument(span).await`. The future itself enters /
  exits the span via `Future::poll` boundaries — no scope guard
  ever held across an await. Same shape `CommandExecutor::dispatch`
  uses.

The comment block in macros.rs is replaced with the new macro's
docstring, which preserves the safety reasoning + names the prior
foot-gun for future-developer context.

## Tests (+2)

`routing::macros::tests`:

- `time_probe_returns_inner_future_value` — pin that the macro is
  VALUE-TRANSPARENT. `time_probe!("seam", expr)` and `expr.await`
  must produce the same value at the call site, so adding the
  probe is a pure observability addition with no shape change.
  Uses a `current_thread` tokio runtime so the test stays
  executor-light.
- `time_probe_nested_compose_and_return_inner_value` — pin that
  multiple `time_probe!` calls compose. The inner span becomes a
  child of the outer span (same as `time_sync!` nesting); the
  value flows through both layers unchanged.

The existing `time_sync!` tests stay unchanged — sync timing is
unaffected by this addition.

## Manual updated

`docs/architecture/RTOS-DEBUGGER-PROBES.md` — the macro table at
the top now lists `time_probe!` alongside `probe!` / `time_sync!`
/ `time_async!` / `stack!` with a brief "prefer this over bare
`.instrument(...)` ceremony" note + a contrast with the
RAII-shape `time_async!` from `crate::logging`. Operators
filter sync + async timings together via
`CONTINUUM_PROBE_CLASSES=timing` and see one flat timeline.

## Why this lands here (not a separate PR)

Per Joel's `[[refine-tools-as-you-use-them]]`: refine the
substrate AS I use it, not after. I'm shipping cognition fixes
that need timing seams across async boundaries (#149 prefill
caching, #112-114 inference-handle bypass, future analyze
optimizations). Without `time_probe!` the next time I'd
sprinkle async timing I'd skip it because the ceremony is
prohibitive. Better: refine the substrate, ship the cognition
work + the substrate refinement that makes it sustainable.

Parent task: substrate refinement under `[[refine-tools-as-you-use-them]]`
Companion PRs in flight: #1538 (boot wiring) + #1539 (silence + identity)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(probes): time_probe! revision per reviewer mandate findings

Three adversarial reviewers spawned per the new reviewer-mandate
doctrine (`[[reviewer-mandate-elegance-and-substrate-viability]]`)
BLOCKED with substantive findings. This commit addresses the
in-scope ones; the deeper substrate gaps are tracked as follow-ups.

## In-scope fixes (this commit)

1. **Field rename `name` → `seam`.** Reviewer 3 flagged collision
   risk — other probes use `name` for different semantics. `seam`
   is unambiguous and tells operators to write
   `jq 'select(.fields.seam == "cognition.analyze")'`.

2. **Hidden `use ::tracing::Instrument as _;` removed.** Reviewer 1
   flagged the scoped import inside macro body as unconventional
   and cognitively load-bearing. Replaced with fully-qualified
   `::tracing::Instrument::instrument(future, span).await` call —
   no hidden import, contract visible at the call site.

3. **Docstring honesty.** Reviewer 2 flagged the prior "zero cost
   when disabled" claim as overclaim — `Instrumented<F>` wrapper
   persists at runtime even with `release_max_level_*` features.
   New cost section: ~24 bytes per call site, one branch per
   poll, allocates `Span` regardless of subscriber state.
   Acceptable for cognition seams (Qwen dominates wall-clock);
   bench per task #198 before sprinkling into hot loops.

4. **Error-path test.** Reviewer 3 flagged missing Result-future
   coverage. New `time_probe_propagates_error_from_inner_future`
   pins that `Err` flows through unchanged per
   `[[no-fallbacks-ever]]`.

5. **Manual example block.** Reviewer 3 flagged the "How to add a
   probe" section showing only `time_async!` (the RAII shape) but
   not `time_probe!`. Now shows both with explicit guidance:
   substrate seams use `time_probe!`; legacy logging-crate seams
   use `time_async!`. Includes the persistence caveat (see #196).

## Follow-up substrate gaps (separate tasks)

- **#196**: `ProbeRouterLayer` + `JsonlProbeFileSink` only
  implement `on_event`, not `on_close`. `time_sync!` AND
  `time_probe!` emit SPANS, not events — neither timing macro
  actually persists timings to the JSONL log today. The call
  shape ships here; the routing side ships in #196. The macro
  docstring + manual carry the caveat explicitly.

- **#197**: Probe class taxonomy decision — flat `timing` vs
  hierarchical. Operators filtering `cognition` won't catch
  cognition timings under the flat scheme; substrate convention
  needs to be picked.

- **#198**: Probe Layer allocation hot-path audit — reviewer 2
  estimated ~50-100 HashMap allocs/sec per persona; benchmark
  before sprinkling into every async seam.

## Why this lands as a revision rather than withdrawal

Per `[[refine-tools-as-you-use-them]]`: ship the call-site shape
that becomes stable. The routing-side gap (#196) is its own slice
worth doing right rather than rushing into this PR. The docstring
+ manual carry the caveat so no one mistakes the macro for an
end-to-end shipping observability primitive — yet.

## Tests

3 passing:
- `time_probe_returns_inner_future_value`
- `time_probe_propagates_error_from_inner_future` (new — pins
  Result futures don't swallow errors)
- `time_probe_nested_compose_and_return_inner_value`

## Doctrine

- `[[reviewer-mandate-elegance-and-substrate-viability]]` — three
  adversarial lenses (architecture / speed-viability / probe-
  coverage) all surfaced real findings. The mandate works.
- `[[refine-tools-as-you-use-them]]` — revising a primitive in
  response to reviewer feedback IS the application work informing
  the substrate.
- `[[no-fallbacks-ever]]` — error-path test pinned; substrate
  refuses silent swallowing at any seam.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(probes): on_close in both probe Layers — time_sync!/time_probe! now persist (#196)

Load-bearing fix discovered by reviewer-mandate review of #1540 (time_probe!):
both `ProbeRouterLayer` and `JsonlProbeFileSink` only implemented `on_event`,
so the timing spans emitted by `time_sync!` and `time_probe!` were observed by
no consumer. Operators running `CONTINUUM_PROBE_CLASSES=timing` saw zero
timing records on disk no matter how many seams were instrumented. The macros
were theatrical — Joel's RTOS-debugger framing required actual wall-clock
persistence to "hunt down bottlenecks."

This commit closes the gap:

- `ProbeRouterLayer`: add `SpanProbeMeta` + `on_new_span` + `on_close` so each
  `probe_class`-carrying span fans out a `ProbeEvent { class, duration_ms, .. }`
  on close. Spans without `probe_class` are ignored at zero allocation cost
  per `[[no-fallbacks-ever]]`.

- `JsonlProbeFileSink`: mirror the same shape — `FileSinkSpanMeta` +
  `on_new_span` + `on_close`. Same class filter applies; `duration_ms` is
  injected into the on-disk JSON `fields` so the line shape matches the
  broadcast envelope.

- `time_sync!`: unify field name to `seam = $name` (was `name`) so it matches
  `time_probe!`. Operators get one `jq` query — `.fields.seam == "phase"` —
  that works for either macro. The pre-existing value-transparency tests
  don't assert on field names so this rename is non-breaking.

Tests:
- `probe_router::tests::time_sync_span_close_fans_out_timing_event`
- `probe_router::tests::time_probe_span_close_fans_out_timing_event`
- `probe_router::tests::span_without_probe_class_does_not_fanout`
- `probe_file_sink::tests::time_sync_span_close_persists_timing_to_jsonl`
- `probe_file_sink::tests::time_probe_span_close_persists_timing_to_jsonl`
- `probe_file_sink::tests::plain_span_close_does_not_persist_to_jsonl`
- `probe_file_sink::tests::class_filter_applies_to_timing_spans`

244/244 routing tests pass; 13/13 macro tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(probes): hoist SpanProbeMeta to shared module — addresses R1+R2 BLOCKs

Reviewer-mandate review of #1541's first commit BLOCKED twice with overlapping
load-bearing concerns:

- R1 (architecture/design): SpanProbeMeta + FileSinkSpanMeta were
  byte-for-byte identical with ~60 lines of copy-pasted lock/visit logic.
  Each Layer captured its own Instant::now() at on_new_span -> router and sink
  reported subtly different duration_ms for the same span. No test verified
  both layers compose in one subscriber.

- R2 (speed/Intel-Mac viability): on_new_span fired for EVERY tracing
  span the substrate emits (tokio executor, framework, plain info_span!).
  Each Layer's visitor allocated a HashMap + walked ALL fields with
  format!(...) before discarding when probe_class was missing. Per-span
  allocator pressure on the LCD floor.

This refactor hoists the lifecycle into routing/probe_span_meta.rs:

1. span_carries_probe_class(attrs) - cheap static check. Walks
   attrs.metadata().fields() (static field set, no allocation) for the
   probe_class name. The vast majority of spans short-circuit here with
   zero visitor work. Addresses R2's per-span hot-path cost.

2. ensure_probe_meta(attrs, span_ref) - idempotent install. First
   Layer to see the span populates the extension; second Layer finds it
   already present and no-ops. Both Layers visit the attrs ONCE total,
   not once per Layer. Addresses R2's doubled-cost concern.

3. build_timing_event_from_meta(span_ref, uri_chain) - shared event
   builder. Both Layers read the SAME start: Instant from the
   extension -> identical duration_ms on broadcast stream and JSONL log.
   Addresses R1's timing-drift concern.

4. New composition test:
   probe_file_sink::tests::both_layers_in_one_subscriber_agree_on_duration_ms
   installs ProbeRouterLayer + JsonlProbeFileSink in one subscriber, fires
   a time_sync!, asserts the broadcast subscriber + JSONL line agree on
   class + seam + duration_ms. Pins R1's "no composition test" gap.

5. docs/architecture/RTOS-DEBUGGER-PROBES.md pins the
   seam-not-name field-naming convention per R1's minor - operators
   can jq '.fields.seam' against both time_sync! and time_probe!
   output without thinking about which macro emitted the record.

Tests: 247/247 routing tests pass (3 net new). The composition test would
have caught the original duplication-induced drift had it existed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant