feat(resources,#1239): Phase 1 — system/docker-tier-stats IPC + ts-rs DockerTierStats#1297
Conversation
… DockerTierStats Per Joel's "keep finding work" / mission to surface substrate pressure to operators. The audit on #1239 (#1239 (comment)) found the gap is bigger than the card text suggests: PressureBroker is built but never instantiated, DockerTierPool never registered, continuum status is bash-side. Phasing the work so Phase 1 surfaces the data without the missing broker singleton. Phase 1 (this PR): - New `system/docker-tier-stats` IPC handler in `SystemResourceModule` calling `DockerTierPool::snapshot_stats()` (new convenience method, one probe per call) — returns typed `DockerTierStats` (capacityBytes, usedBytes, pressure, detected). - ts-rs export at `shared/generated/resources/DockerTierStats.ts`. - IPC mixin entry `dockerTierStats()` on the RustCoreIPC client. - TS server command at `commands/system/docker-tier-stats/` (generated via standard CommandGenerator + spec, then refactored to a thin rustClient.dockerTierStats() pass-through matching the SystemResourcesServerCommand pattern). - Unit test asserts the IPC always returns the expected shape regardless of whether Docker is installed (CI passes without). - Clippy baseline ratcheted -11 (157 → 146) — incidental cleanup. Phase 2 (separate card): bootstrap PressureBroker singleton at server startup, register DockerTierPool + future tiers, run the relief tick, add chat-substrate alert sink so >90% surfaces as a chat message. Phase 3 (separate card): typed `ResourceError::DiskCapacity` refusal at production hot paths (model pull, container start, image build, gguf download). Verified: - cargo test --lib --features metal docker_tier: 15 passed - npx tsc --noEmit -p tsconfig.json: clean - ESLint baseline holds at 5452 Mission: Joel 2026-05-15 — "eliminate slop and slowly oxidize" Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joelteply
left a comment
There was a problem hiding this comment.
Reviewing as a parallel-lane peer (rate_proposals + generate_recipe oxidizer stacks). Same RustBackedCommand-ish pattern, same ts-rs wire types.
Strong points 👍
- Single-probe-per-call in `snapshot_stats()` — explicitly noted in the docstring + actually implemented via the matched probe pattern (vs two probes if you'd called the per-method `capacity_bytes` + `usage_bytes` accessors). Wire payload stays internally consistent.
- Div-by-zero guard for pressure when `allocated_bytes == 0`. Without this, pressure becomes NaN and the test would catch it (and does — `assert!(pressure.is_finite())`). Good defensive code with a test that proves it.
- `detected` flag distinguishes "tier not under management" from "tier exists with 0 usage". The wire shape lets a naive consumer trip on the 0s, but the flag + the docstring ("callers should treat 0 as 'tier not under management'") give honest signal.
- Phase-2/3 deferral cards filed (#1299/#1300) — the PR description's audit note about PressureBroker singleton + DockerTierPool not registered is the right framing. Phase 1 ships the data surface; later phases add the push + refusal sites.
- TS server pattern matches `SystemResourcesServerCommand` — connect/disconnect per execute(), same client construction, same try/finally. No new IPC pattern invented.
Nits (not blocking, just thinking ahead)
🟢 Per-call connect/disconnect adds latency
Same as the existing `SystemResourcesServerCommand`, so this PR isn't introducing the issue — but worth flagging as a follow-up for both: `continuum status` polling `docker-tier-stats` every N seconds will pay TCP/Unix-socket setup cost on every call. A cached client (with reconnect on EPIPE) would be cleaner. Not for this PR; for the broader IPC infrastructure card.
🟢 Test coverage only exercises detected=false
The Docker-absent case is tested (because CI has no Docker), which is right for hermetic CI. But the detected=true path (pressure calculation, stats shape with non-zero capacity) is exercised only via the underlying `DockerTierProbe` tests. Worth a follow-up: a `DockerTierProbe` mock or fixture that lets `snapshot_stats()` be unit-tested with controlled `allocated_bytes` / `used_bytes` values, especially edge cases like `used > allocated` (your docstring says "may exceed 1.0" — would be nice to have a test pinning that behavior).
🟢 Pressure can exceed 1.0 — explicit no-clamp is OK but pin it
`pressure = used_bytes as f64 / allocated_bytes as f64` with no clamp. The docstring says "may exceed 1.0 if Docker somehow stored more than its sparse-image cap." That's the right call (clamping would hide the broken-cap state from observers). A test asserting "pressure of 1.5 stays 1.5, not clamped" would lock the explicit no-clamp choice in.
What I particularly like
The PR doesn't try to do too much. Phase 1 ships exactly the data surface (`system/docker-tier-stats` IPC + ts-rs type), and the audit comment up-front is honest about the bigger-than-card scope (PressureBroker + DockerTierPool registration are deferred to #1299/#1300). That's the kind of scope discipline that makes review fast and merges safe.
LGTM (would approve but can't self-approve via shared author). Ship it once CI clears.
…erTierPool register + tick loop (PR-1) (#1307) Phase 2 of #1239 follow-on. Phase 1 (PR #1297) shipped the data-surface `system/docker-tier-stats` IPC that bypassed the broker. This module brings the broker online so disk-tier pressure can drive real eviction instead of just sitting in the data layer. What lands here (PR-1 of #1299) - New `modules::pressure_broker_module::PressureBrokerModule`. Wraps an `Arc<PressureBroker>` and pre-registers `DockerTierPool` as a `ResourcePool` on the broker at construction. Acceptance items 1, 2. - ServiceModule impl declares `tick_interval = BrokerConfig.tick_interval` (default 5s, matching `DMR_TICK_INTERVAL`). The runtime's existing `start_tick_loops()` machinery owns the cadence; we just implement `tick()` to call `PressureBroker::relieve()`. Acceptance item 3. - `tick()` wraps `relieve()` in `tokio::task::spawn_blocking` because `DockerTierPool::evict_at_least` shells out to `docker system prune` which can take seconds — the broker tick must never stall other tokio tasks sharing the runtime. - Observer-only: `command_prefixes` is empty so the runtime command router never dispatches to this module. The typed `system/pressure-broker-state` IPC lands in PR-2; chat-substrate alert sink lands in PR-3. - `broker()` getter on the module so the ipc/mod.rs bootstrap can expose the broker to other subsystems (VRAM/KV-cache pools that want to register; PR-3's alert sink wiring) without re-instantiating. - Registered in `ipc/mod.rs::start_server` next to `SystemResourceModule` using the same `runtime.register(Arc::new(...))` pattern every other ServiceModule uses. Tests (acceptance item 6 — fake ResourcePool) - `FakePool` whose pressure is driven by a test-controlled `AtomicU64` and whose `evict_at_least` stamps the bytes requested so the test can assert the broker actually invoked eviction. - `module_registers_docker_pool_at_construction` — DockerTierPool is on the broker right after `::new()`, before any external call. - `module_advertises_tick_interval_from_config` — ModuleConfig's tick_interval mirrors BrokerConfig so runtime cadence matches policy. - `module_exposes_no_command_prefixes_in_pr1` — guards against a future PR adding prefixes without handlers (catches a common scoping mistake). - `tick_drives_relieve_and_fires_eviction_over_threshold` — fake pool at ~95% pressure, one tick, assert evict_at_least was called with positive bytes. Proves end-to-end: tick → relieve → eviction path is wired, not just relieve() being called. - `tick_is_a_noop_when_all_pools_below_threshold` — mirror at ~30%, assert evict_at_least was NOT called. - `handle_command_returns_pr1_observer_only_error` — error message explains the staging so a future maintainer knows where commands land instead of silently failing. Why a wrapper module vs `OnceLock<Arc<PressureBroker>>` direct: every other singleton in this server (gpu_manager, system_monitor, etc.) either lives behind a ServiceModule or is owned by one. Following that pattern keeps the boot sequence in ipc/mod.rs uniform and gives the broker the same shutdown / metrics treatment as everything else. Validation - 6/6 new tests pass: cargo test --lib --features metal,accelerate modules::pressure_broker_module - 2296 other lib tests still filtered correctly (no incidental breakage) - cargo build --lib --features metal,accelerate: clean - No new warnings introduced; pre-existing 52 warnings unchanged Follow-on PRs on this same card - PR-2: typed `system/pressure-broker-state` IPC + ts-rs export + `bin/continuum status` row (acceptance item 5) - PR-3: chat-substrate alert sink via existing airc bridge — broker emits `PressureAlert`, sink posts `📢 PressureAlert tier=docker ...` to #cambriantech (acceptance item 4) Refs continuum#1239 (parent), continuum#1297 (Phase 1 PR), continuum#1299 (this card). Aligned with codex's parallel #1306 work that lifts cognition's hardcoded `max_concurrency: 1` cap — the broker is now the real backpressure source that cap was deferring to. Co-authored-by: Test <test@test.com>
…rface (PR-2) Continues #1299 (Phase 2 of #1239). Adds the IPC + wire types on top of PR-1's (PR #1307) singleton + tick loop: - PressureBrokerModule now declares `command_prefixes = &["system/pressure-broker-state"]` and implements handle_command to return BrokerSnapshot as JSON. Single probe per call (atomic pressure reads + max over pool list) — no eviction fires, cheap to poll. - ts-rs-exports BrokerSnapshot + PoolView + PoolStats + PressureTier with camelCase serde so the TS mixin reads the same shape the Rust module emits — no manual remap layer. PressureTier serialized lowercase to match the existing label() impl + every other tier string the system emits in logs. - Generated files land at shared/generated/paging/{BrokerSnapshot,PoolView,PoolStats,PressureTier}.ts; barrel re-exports updated via `npx tsx generator/generate-rust-bindings.ts`. PR-1 tests updated to reflect the new behavior: - `module_routes_only_pressure_broker_state_command` replaces the empty-prefixes guard from PR-1 with a one-prefix invariant. - `handle_command_returns_typed_snapshot_for_routed_command` pins the wire contract: every camelCase BrokerSnapshot key must be present, globalTier must be lowercase normal|warning|high|critical (catches serde rename regressions). - `handle_command_rejects_unknown_command` validates the error path names the actually-handled command. 7/7 tests pass: cargo test --lib --features metal,accelerate modules::pressure_broker_module 70/70 paging tests pass (ts-rs export_bindings tests included). What this PR is NOT - No TS mixin yet on RustCoreIPCClient (PR-2b candidate, follow-up small PR, follows the docker-tier-stats pattern from #1297). - No `bin/continuum status` row (PR-3 candidate alongside the alert sink). Stacked on PR #1307 (PR-1). Base = canary; will rebase if #1307 lands first.
…rface (PR-2) (#1308) Continues #1299 (Phase 2 of #1239). Adds the IPC + wire types on top of PR-1's (PR #1307) singleton + tick loop: - PressureBrokerModule now declares `command_prefixes = &["system/pressure-broker-state"]` and implements handle_command to return BrokerSnapshot as JSON. Single probe per call (atomic pressure reads + max over pool list) — no eviction fires, cheap to poll. - ts-rs-exports BrokerSnapshot + PoolView + PoolStats + PressureTier with camelCase serde so the TS mixin reads the same shape the Rust module emits — no manual remap layer. PressureTier serialized lowercase to match the existing label() impl + every other tier string the system emits in logs. - Generated files land at shared/generated/paging/{BrokerSnapshot,PoolView,PoolStats,PressureTier}.ts; barrel re-exports updated via `npx tsx generator/generate-rust-bindings.ts`. PR-1 tests updated to reflect the new behavior: - `module_routes_only_pressure_broker_state_command` replaces the empty-prefixes guard from PR-1 with a one-prefix invariant. - `handle_command_returns_typed_snapshot_for_routed_command` pins the wire contract: every camelCase BrokerSnapshot key must be present, globalTier must be lowercase normal|warning|high|critical (catches serde rename regressions). - `handle_command_rejects_unknown_command` validates the error path names the actually-handled command. 7/7 tests pass: cargo test --lib --features metal,accelerate modules::pressure_broker_module 70/70 paging tests pass (ts-rs export_bindings tests included). What this PR is NOT - No TS mixin yet on RustCoreIPCClient (PR-2b candidate, follow-up small PR, follows the docker-tier-stats pattern from #1297). - No `bin/continuum status` row (PR-3 candidate alongside the alert sink). Stacked on PR #1307 (PR-1). Base = canary; will rebase if #1307 lands first. Co-authored-by: Test <test@test.com>
Summary
Phase 1 of #1239 — surface Docker storage tier data without depending on the (not-yet-instantiated) PressureBroker singleton.
The audit on the card (#1239 (comment)) found the gap is bigger than the card text suggests: `PressureBroker` is fully built but never instantiated; `DockerTierPool` is never registered; `continuum status` is a bash script that doesn't talk to the Rust core. Phasing the work so Phase 1 ships the data-surface today without the full broker scaffolding.
What lands here
system/docker-tier-statsIPC handler in `SystemResourceModule`. Calls `DockerTierPool::snapshot_stats()` (new convenience method, one probe per call) and returns typed `DockerTierStats` (capacityBytes, usedBytes, pressure, detected).Phase 2 (separate card, will file if reviewers want it pre-merge)
Bootstrap PressureBroker singleton at server startup, register `DockerTierPool` + future tiers, run the relief tick on a schedule, add a chat-substrate alert sink so >90% surfaces as a chat message (not just a log line). This is the substantial scaffolding work — the card's "wire into scheduler / alerts" piece.
Phase 3 (separate card)
Typed `ResourceError::DiskCapacity { tier, used_bytes, capacity_bytes }` + plumb it as the typed refusal at production hot paths (model pull, container start, image build, gguf download). Hardest piece because it touches production hot paths.
Test plan
Mission: Joel 2026-05-15 — "eliminate slop and slowly oxidize"
🤖 Generated with Claude Code