fix(probes): on_close in both probe Layers — time_sync!/time_probe! now persist (#196)#1541
Merged
joelteply merged 4 commits intoJun 6, 2026
Merged
Conversation
…ition seams
Per Joel 2026-06-06 `[[refine-tools-as-you-use-them]]`: I hit this
friction in the silence-affordance + identity-grounding work and
sat on it. Every async timing site in the cognition path was an
`.instrument(info_span!("time", name=..., probe_class="timing")).await`
ceremony — three lines plus a `use tracing::Instrument` import,
nobody writes those when adding a new seam in a hurry. The result:
async cognition stages stayed untimed even though `time_sync!`
makes sync-block timing one line.
`time_probe!` collapses async timing to the same one-line shape:
```rust
// Before — every async timing site:
use tracing::Instrument;
let span = tracing::info_span!("time", name = "analyze",
probe_class = "timing");
let analysis = analyze(input).instrument(span).await;
// After:
let analysis = time_probe!("analyze", analyze(input));
```
## Why this didn't ship in PR #1529
The existing comment block in `routing/macros.rs` documents why an
`async`-timing macro was deliberately deferred:
1. Naming collision with `crate::logging::time_async!` (RAII
TimingGuard shape — different observability path).
2. The previous `time!` macro was a foot-gun: it expanded to
`let _enter = span.enter(); $body` where `$body` contained
`.await`, holding `_enter` across the await suspension and
breaking `URI_STACK` per the d1cf19d dispatch fix.
This commit addresses both:
- **Naming**: `time_probe!` (not `time_async!`) — the suffix names
the OUTPUT (a timing probe), not the executor shape. Keeps the
`crate::logging::time_async!` namespace untouched; the two macros
stay disjoint.
- **Safety by construction**: the macro expands to
`$future.instrument(span).await`. The future itself enters /
exits the span via `Future::poll` boundaries — no scope guard
ever held across an await. Same shape `CommandExecutor::dispatch`
uses.
The comment block in macros.rs is replaced with the new macro's
docstring, which preserves the safety reasoning + names the prior
foot-gun for future-developer context.
## Tests (+2)
`routing::macros::tests`:
- `time_probe_returns_inner_future_value` — pin that the macro is
VALUE-TRANSPARENT. `time_probe!("seam", expr)` and `expr.await`
must produce the same value at the call site, so adding the
probe is a pure observability addition with no shape change.
Uses a `current_thread` tokio runtime so the test stays
executor-light.
- `time_probe_nested_compose_and_return_inner_value` — pin that
multiple `time_probe!` calls compose. The inner span becomes a
child of the outer span (same as `time_sync!` nesting); the
value flows through both layers unchanged.
The existing `time_sync!` tests stay unchanged — sync timing is
unaffected by this addition.
## Manual updated
`docs/architecture/RTOS-DEBUGGER-PROBES.md` — the macro table at
the top now lists `time_probe!` alongside `probe!` / `time_sync!`
/ `time_async!` / `stack!` with a brief "prefer this over bare
`.instrument(...)` ceremony" note + a contrast with the
RAII-shape `time_async!` from `crate::logging`. Operators
filter sync + async timings together via
`CONTINUUM_PROBE_CLASSES=timing` and see one flat timeline.
## Why this lands here (not a separate PR)
Per Joel's `[[refine-tools-as-you-use-them]]`: refine the
substrate AS I use it, not after. I'm shipping cognition fixes
that need timing seams across async boundaries (#149 prefill
caching, #112-114 inference-handle bypass, future analyze
optimizations). Without `time_probe!` the next time I'd
sprinkle async timing I'd skip it because the ceremony is
prohibitive. Better: refine the substrate, ship the cognition
work + the substrate refinement that makes it sustainable.
Parent task: substrate refinement under `[[refine-tools-as-you-use-them]]`
Companion PRs in flight: #1538 (boot wiring) + #1539 (silence + identity)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three adversarial reviewers spawned per the new reviewer-mandate doctrine (`[[reviewer-mandate-elegance-and-substrate-viability]]`) BLOCKED with substantive findings. This commit addresses the in-scope ones; the deeper substrate gaps are tracked as follow-ups. ## In-scope fixes (this commit) 1. **Field rename `name` → `seam`.** Reviewer 3 flagged collision risk — other probes use `name` for different semantics. `seam` is unambiguous and tells operators to write `jq 'select(.fields.seam == "cognition.analyze")'`. 2. **Hidden `use ::tracing::Instrument as _;` removed.** Reviewer 1 flagged the scoped import inside macro body as unconventional and cognitively load-bearing. Replaced with fully-qualified `::tracing::Instrument::instrument(future, span).await` call — no hidden import, contract visible at the call site. 3. **Docstring honesty.** Reviewer 2 flagged the prior "zero cost when disabled" claim as overclaim — `Instrumented<F>` wrapper persists at runtime even with `release_max_level_*` features. New cost section: ~24 bytes per call site, one branch per poll, allocates `Span` regardless of subscriber state. Acceptable for cognition seams (Qwen dominates wall-clock); bench per task #198 before sprinkling into hot loops. 4. **Error-path test.** Reviewer 3 flagged missing Result-future coverage. New `time_probe_propagates_error_from_inner_future` pins that `Err` flows through unchanged per `[[no-fallbacks-ever]]`. 5. **Manual example block.** Reviewer 3 flagged the "How to add a probe" section showing only `time_async!` (the RAII shape) but not `time_probe!`. Now shows both with explicit guidance: substrate seams use `time_probe!`; legacy logging-crate seams use `time_async!`. Includes the persistence caveat (see #196). ## Follow-up substrate gaps (separate tasks) - **#196**: `ProbeRouterLayer` + `JsonlProbeFileSink` only implement `on_event`, not `on_close`. `time_sync!` AND `time_probe!` emit SPANS, not events — neither timing macro actually persists timings to the JSONL log today. The call shape ships here; the routing side ships in #196. The macro docstring + manual carry the caveat explicitly. - **#197**: Probe class taxonomy decision — flat `timing` vs hierarchical. Operators filtering `cognition` won't catch cognition timings under the flat scheme; substrate convention needs to be picked. - **#198**: Probe Layer allocation hot-path audit — reviewer 2 estimated ~50-100 HashMap allocs/sec per persona; benchmark before sprinkling into every async seam. ## Why this lands as a revision rather than withdrawal Per `[[refine-tools-as-you-use-them]]`: ship the call-site shape that becomes stable. The routing-side gap (#196) is its own slice worth doing right rather than rushing into this PR. The docstring + manual carry the caveat so no one mistakes the macro for an end-to-end shipping observability primitive — yet. ## Tests 3 passing: - `time_probe_returns_inner_future_value` - `time_probe_propagates_error_from_inner_future` (new — pins Result futures don't swallow errors) - `time_probe_nested_compose_and_return_inner_value` ## Doctrine - `[[reviewer-mandate-elegance-and-substrate-viability]]` — three adversarial lenses (architecture / speed-viability / probe- coverage) all surfaced real findings. The mandate works. - `[[refine-tools-as-you-use-them]]` — revising a primitive in response to reviewer feedback IS the application work informing the substrate. - `[[no-fallbacks-ever]]` — error-path test pinned; substrate refuses silent swallowing at any seam. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ow persist (#196) Load-bearing fix discovered by reviewer-mandate review of #1540 (time_probe!): both `ProbeRouterLayer` and `JsonlProbeFileSink` only implemented `on_event`, so the timing spans emitted by `time_sync!` and `time_probe!` were observed by no consumer. Operators running `CONTINUUM_PROBE_CLASSES=timing` saw zero timing records on disk no matter how many seams were instrumented. The macros were theatrical — Joel's RTOS-debugger framing required actual wall-clock persistence to "hunt down bottlenecks." This commit closes the gap: - `ProbeRouterLayer`: add `SpanProbeMeta` + `on_new_span` + `on_close` so each `probe_class`-carrying span fans out a `ProbeEvent { class, duration_ms, .. }` on close. Spans without `probe_class` are ignored at zero allocation cost per `[[no-fallbacks-ever]]`. - `JsonlProbeFileSink`: mirror the same shape — `FileSinkSpanMeta` + `on_new_span` + `on_close`. Same class filter applies; `duration_ms` is injected into the on-disk JSON `fields` so the line shape matches the broadcast envelope. - `time_sync!`: unify field name to `seam = $name` (was `name`) so it matches `time_probe!`. Operators get one `jq` query — `.fields.seam == "phase"` — that works for either macro. The pre-existing value-transparency tests don't assert on field names so this rename is non-breaking. Tests: - `probe_router::tests::time_sync_span_close_fans_out_timing_event` - `probe_router::tests::time_probe_span_close_fans_out_timing_event` - `probe_router::tests::span_without_probe_class_does_not_fanout` - `probe_file_sink::tests::time_sync_span_close_persists_timing_to_jsonl` - `probe_file_sink::tests::time_probe_span_close_persists_timing_to_jsonl` - `probe_file_sink::tests::plain_span_close_does_not_persist_to_jsonl` - `probe_file_sink::tests::class_filter_applies_to_timing_spans` 244/244 routing tests pass; 13/13 macro tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…+R2 BLOCKs Reviewer-mandate review of #1541's first commit BLOCKED twice with overlapping load-bearing concerns: - R1 (architecture/design): SpanProbeMeta + FileSinkSpanMeta were byte-for-byte identical with ~60 lines of copy-pasted lock/visit logic. Each Layer captured its own Instant::now() at on_new_span -> router and sink reported subtly different duration_ms for the same span. No test verified both layers compose in one subscriber. - R2 (speed/Intel-Mac viability): on_new_span fired for EVERY tracing span the substrate emits (tokio executor, framework, plain info_span!). Each Layer's visitor allocated a HashMap + walked ALL fields with format!(...) before discarding when probe_class was missing. Per-span allocator pressure on the LCD floor. This refactor hoists the lifecycle into routing/probe_span_meta.rs: 1. span_carries_probe_class(attrs) - cheap static check. Walks attrs.metadata().fields() (static field set, no allocation) for the probe_class name. The vast majority of spans short-circuit here with zero visitor work. Addresses R2's per-span hot-path cost. 2. ensure_probe_meta(attrs, span_ref) - idempotent install. First Layer to see the span populates the extension; second Layer finds it already present and no-ops. Both Layers visit the attrs ONCE total, not once per Layer. Addresses R2's doubled-cost concern. 3. build_timing_event_from_meta(span_ref, uri_chain) - shared event builder. Both Layers read the SAME start: Instant from the extension -> identical duration_ms on broadcast stream and JSONL log. Addresses R1's timing-drift concern. 4. New composition test: probe_file_sink::tests::both_layers_in_one_subscriber_agree_on_duration_ms installs ProbeRouterLayer + JsonlProbeFileSink in one subscriber, fires a time_sync!, asserts the broadcast subscriber + JSONL line agree on class + seam + duration_ms. Pins R1's "no composition test" gap. 5. docs/architecture/RTOS-DEBUGGER-PROBES.md pins the seam-not-name field-naming convention per R1's minor - operators can jq '.fields.seam' against both time_sync! and time_probe! output without thinking about which macro emitted the record. Tests: 247/247 routing tests pass (3 net new). The composition test would have caught the original duplication-induced drift had it existed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Load-bearing fix for the load-bearing gap discovered by the reviewer-mandate review of PR #1540 (
time_probe!): bothProbeRouterLayerandJsonlProbeFileSinkonly implementedon_event, so the timing spans emitted bytime_sync!andtime_probe!were observed by no consumer.Operators running
CONTINUUM_PROBE_CLASSES=timingsaw zero timing records on disk no matter how many seams were instrumented. Both macros were theatrical — Joel's RTOS-debugger framing (docs/architecture/RTOS-DEBUGGER-PROBES.md) requires actual wall-clock persistence to "hunt down bottlenecks."This PR closes the gap:
ProbeRouterLayer(routing/probe_router.rs): addsSpanProbeMeta+on_new_span+on_close. Eachprobe_class-carrying span fans out aProbeEvent { class, duration_ms, .. }on close. Spans withoutprobe_classare ignored at zero allocation cost per [[no-fallbacks-ever]].JsonlProbeFileSink(routing/probe_file_sink.rs): mirrors the same shape —FileSinkSpanMeta+on_new_span+on_close. Class filter applies;duration_msis injected into on-diskfieldsso the line shape matches the broadcast envelope.time_sync!macro (routing/macros.rs): unify field name toseam = $name(wasname) so it matchestime_probe!. Operators get onejqquery —.fields.seam == "phase"— that works for either macro.Architecture context
Composes with PR #1540 (
time_probe!) — that PR added the safe-across-await async timing macro; this PR makes BOTH macros produce real records on disk and on subscribers. Without this fix, #1540 was the right macro shape with a missing back-end.Reference docs:
docs/architecture/RTOS-DEBUGGER-PROBES.md— the operator manual; "where the records land" section is now accuratedocs/architecture/GRID-ADDRESSING-AND-ROUTING.md§"Per-class routing" — the routing contract this implementsRelated memories:
[[jtag-probes-are-rtos-debugger]],[[refine-tools-as-you-use-them]],[[every-error-is-an-opportunity-to-battle-harden]].Test plan
cargo test --features metal,accelerate routing::probe_file_sink::— 9/9 pass (4 new tests covering timing-span persistence + class-filter on timing)cargo test --features metal,accelerate routing::— 244/244 pass (full routing suite including 3 new tests in probe_router)cargo test --features metal,accelerate macros— 13/13 pass (macro tests unaffected byseamrename — value transparency preserved)~/.continuum/probes.jsonlwithCONTINUUM_PROBE_CLASSES=timingand confirmtime_sync!/time_probe!calls in cognition produce records withduration_ms🤖 Generated with Claude Code