Skip to content

feat(core): BackgroundSupervisor Phase 2 — supervised spawns, latency histogram, TUI display, turn-boundary abort#2892

Merged
bug-ops merged 2 commits intomainfrom
2883-bg-supervisor-phase-2
Apr 11, 2026
Merged

feat(core): BackgroundSupervisor Phase 2 — supervised spawns, latency histogram, TUI display, turn-boundary abort#2892
bug-ops merged 2 commits intomainfrom
2883-bg-supervisor-phase-2

Conversation

@bug-ops
Copy link
Copy Markdown
Owner

@bug-ops bug-ops commented Apr 11, 2026

Closes #2883 (epic)
Closes #2884 #2885 #2886 #2887 #2888 #2889

Summary

Phase 2 expansion of BackgroundSupervisor (Phase 1: #2816). Six sub-phases implemented in a single PR on branch feat/2883-bg-supervisor-phase-2.

Spec: specs/039-background-task-supervisor/spec.md

Changes

2A: Route remaining agent-path spawns

  • native.rs:1261 audit log → supervisor.spawn(Telemetry, "audit-log")
  • sanitize.rs:161 audit log → supervisor.spawn(Telemetry, "audit-log-sanitize")
  • assembly.rs::spawn_outgoing_digest excluded: &self borrow conflicts with &mut self supervisor (documented in spec)

2B: Per-class bg_latency histogram

  • TaskResult carries spawned_at: Instant; reap() computes elapsed
  • observe_bg_task(label, duration) added to HistogramRecorder trait
  • Prometheus: zeph_bg_task_duration_seconds_{enrichment,telemetry} with 9 buckets [1ms…30s]

2C: TUI status bar

  • MetricsSnapshot gains bg_inflight_enrichment + bg_inflight_telemetry
  • Status bar renders bg: N enrich, M telem when either > 0; hidden when both are 0

2D: Turn-boundary abort for Enrichment

  • class_handles: [Vec<AbortHandle>; 2] per class; abort_class(class) with atomic abort
  • reap() prunes finished handles via retain(|h| !h.is_finished())
  • Turn-boundary abort gated by config.supervisor.abort_enrichment_on_turn (default: false)

2E: Configurable queue depths

  • TaskSupervisorConfig under [agent.supervisor] (enrichment_limit=4, telemetry_limit=8)
  • with_supervisor_config() on AgentBuilder
  • --migrate-config step 14 adds section with defaults

2F: Tracing span propagation

  • All supervised futures wrapped with .instrument(info_span!("bg_task", class, task))

Testing

  • 8158 tests, all passing (--features full)
  • 13 supervisor unit tests including new observe_bg_task_called_on_reap mock test
  • cargo +nightly fmt --check: clean
  • cargo clippy --features full --lib --bins -- -D warnings: clean

Validation summary

Agent Verdict
architect ✅ design approved
critic (arch) ✅ minor — S1 borrow conflict addressed
tester ✅ 13/13 pass, 3 non-blocking gaps noted
perf ✅ no issues — Vec bounded, overhead negligible
security ✅ 0 critical/high — 2 medium follow-ups filed
impl-critic ✅ minor — cancellation log split fixed
reviewer ✅ approved after 2 required fixes

Follow-up issues

  • Security M1 (shutdown audit loss): file separately
  • Security M2 (no config upper bound): file separately

bug-ops added 2 commits April 11, 2026 16:10
… histogram, TUI display, turn-boundary abort (#2883)

Closes #2884, #2885, #2886, #2887, #2888, #2889

## Phase 2A: route remaining agent-path fire-and-forget spawns
- native.rs audit log write → supervisor.spawn(Telemetry, "audit-log")
- sanitize.rs audit log write → supervisor.spawn(Telemetry, "audit-log-sanitize")
- assembly.rs::spawn_outgoing_digest excluded: &self borrow conflicts with
  &mut self supervisor; documented in spec 039

## Phase 2B: per-class bg_latency histogram
- TaskResult carries spawned_at: Instant
- reap() computes elapsed and calls observe_bg_task(label, duration)
- observe_bg_task() added to HistogramRecorder trait
- Prometheus: zeph_bg_task_duration_seconds_enrichment / _telemetry
  with buckets [1ms, 5ms, 10ms, 50ms, 100ms, 500ms, 1s, 5s, 30s]

## Phase 2C: TUI status bar background task display
- MetricsSnapshot gains bg_inflight_enrichment + bg_inflight_telemetry
- status.rs renders "bg: N enrich, M telem" when either > 0, hidden otherwise

## Phase 2D: turn-boundary abort for Enrichment class
- class_handles: [Vec<AbortHandle>; 2] tracks handles per TaskClass
- abort_class(class) aborts all handles of that class; clears vec
- reap() prunes finished handles via retain(|h| !h.is_finished())
- Aborted tasks counted in bg_dropped metric
- Turn-boundary abort gated by config.supervisor.abort_enrichment_on_turn

## Phase 2E: configurable queue depths
- TaskSupervisorConfig under [agent.supervisor] in AgentConfig
  (enrichment_limit=4, telemetry_limit=8, abort_enrichment_on_turn=false)
- with_supervisor_config() on AgentBuilder
- --migrate-config step 14: adds [agent.supervisor] block with defaults
- config/default.toml: commented-out [agent.supervisor] section

## Phase 2F: tracing span propagation
- All spawned futures wrapped with .instrument(info_span!("bg_task",
  class, task)) for Jaeger/Chrome trace visibility

## Code quality
- JoinError split: is_cancelled() -> debug!, panic -> warn!
- observe_bg_task_called_on_reap test with CountingRecorder mock
- BG_BUCKETS const moved to module level (clippy::items_after_statements)
- bool-to-int replaced with usize::from() (clippy::bool_to_int_with_if)
zeph-config does not depend on zeph-core, so the link to
BackgroundSupervisor resolved at doc build time. Replace with plain
text reference.
@bug-ops bug-ops force-pushed the 2883-bg-supervisor-phase-2 branch from ba0e63c to d1d6388 Compare April 11, 2026 14:10
@bug-ops bug-ops merged commit a7268a5 into main Apr 11, 2026
30 checks passed
@bug-ops bug-ops deleted the 2883-bg-supervisor-phase-2 branch April 11, 2026 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment