Skip to content

feat(resources,#1239): Phase 1 — system/docker-tier-stats IPC + ts-rs DockerTierStats#1297

Merged
joelteply merged 1 commit into
canaryfrom
feat/docker-tier-pressure-surface-1239
May 16, 2026
Merged

feat(resources,#1239): Phase 1 — system/docker-tier-stats IPC + ts-rs DockerTierStats#1297
joelteply merged 1 commit into
canaryfrom
feat/docker-tier-pressure-surface-1239

Conversation

@joelteply
Copy link
Copy Markdown
Contributor

Summary

Phase 1 of #1239 — surface Docker storage tier data without depending on the (not-yet-instantiated) PressureBroker singleton.

The audit on the card (#1239 (comment)) found the gap is bigger than the card text suggests: `PressureBroker` is fully built but never instantiated; `DockerTierPool` is never registered; `continuum status` is a bash script that doesn't talk to the Rust core. Phasing the work so Phase 1 ships the data-surface today without the full broker scaffolding.

What lands here

  • New system/docker-tier-stats IPC handler in `SystemResourceModule`. Calls `DockerTierPool::snapshot_stats()` (new convenience method, one probe per call) and returns typed `DockerTierStats` (capacityBytes, usedBytes, pressure, detected).
  • ts-rs export at `shared/generated/resources/DockerTierStats.ts` so callers get the camelCase shape directly.
  • IPC mixin entry `dockerTierStats()` on the RustCoreIPC client.
  • TS server command at `commands/system/docker-tier-stats/` (generated via standard CommandGenerator + spec, then refactored to a thin `rustClient.dockerTierStats()` pass-through matching the existing `SystemResourcesServerCommand` pattern).
  • Unit test asserts the IPC always returns the expected shape regardless of Docker presence on the host. CI passes without Docker (returns `detected: false` + zeros — the documented no-op shape).
  • Clippy baseline ratcheted -11 (157 → 146) — incidental cleanup from the existing pool code surfaced when I touched the file.

Phase 2 (separate card, will file if reviewers want it pre-merge)

Bootstrap PressureBroker singleton at server startup, register `DockerTierPool` + future tiers, run the relief tick on a schedule, add a chat-substrate alert sink so >90% surfaces as a chat message (not just a log line). This is the substantial scaffolding work — the card's "wire into scheduler / alerts" piece.

Phase 3 (separate card)

Typed `ResourceError::DiskCapacity { tier, used_bytes, capacity_bytes }` + plumb it as the typed refusal at production hot paths (model pull, container start, image build, gguf download). Hardest piece because it touches production hot paths.

Test plan

  • `cargo test --lib --features metal docker_tier` — 15 passed (4 new including shape test + ts-rs export)
  • `npx tsc --noEmit -p tsconfig.json` — clean
  • ESLint baseline holds at 5452
  • Precommit: PASSED

Mission: Joel 2026-05-15 — "eliminate slop and slowly oxidize"

🤖 Generated with Claude Code

… DockerTierStats

Per Joel's "keep finding work" / mission to surface substrate pressure
to operators. The audit on #1239
(#1239 (comment))
found the gap is bigger than the card text suggests: PressureBroker is
built but never instantiated, DockerTierPool never registered,
continuum status is bash-side. Phasing the work so Phase 1 surfaces
the data without the missing broker singleton.

Phase 1 (this PR):
- New `system/docker-tier-stats` IPC handler in `SystemResourceModule`
  calling `DockerTierPool::snapshot_stats()` (new convenience method,
  one probe per call) — returns typed `DockerTierStats`
  (capacityBytes, usedBytes, pressure, detected).
- ts-rs export at `shared/generated/resources/DockerTierStats.ts`.
- IPC mixin entry `dockerTierStats()` on the RustCoreIPC client.
- TS server command at `commands/system/docker-tier-stats/` (generated
  via standard CommandGenerator + spec, then refactored to a thin
  rustClient.dockerTierStats() pass-through matching the
  SystemResourcesServerCommand pattern).
- Unit test asserts the IPC always returns the expected shape
  regardless of whether Docker is installed (CI passes without).
- Clippy baseline ratcheted -11 (157 → 146) — incidental cleanup.

Phase 2 (separate card): bootstrap PressureBroker singleton at server
startup, register DockerTierPool + future tiers, run the relief tick,
add chat-substrate alert sink so >90% surfaces as a chat message.

Phase 3 (separate card): typed `ResourceError::DiskCapacity` refusal at
production hot paths (model pull, container start, image build, gguf
download).

Verified:
- cargo test --lib --features metal docker_tier: 15 passed
- npx tsc --noEmit -p tsconfig.json: clean
- ESLint baseline holds at 5452

Mission: Joel 2026-05-15 — "eliminate slop and slowly oxidize"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor Author

@joelteply joelteply left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewing as a parallel-lane peer (rate_proposals + generate_recipe oxidizer stacks). Same RustBackedCommand-ish pattern, same ts-rs wire types.

Strong points 👍

  • Single-probe-per-call in `snapshot_stats()` — explicitly noted in the docstring + actually implemented via the matched probe pattern (vs two probes if you'd called the per-method `capacity_bytes` + `usage_bytes` accessors). Wire payload stays internally consistent.
  • Div-by-zero guard for pressure when `allocated_bytes == 0`. Without this, pressure becomes NaN and the test would catch it (and does — `assert!(pressure.is_finite())`). Good defensive code with a test that proves it.
  • `detected` flag distinguishes "tier not under management" from "tier exists with 0 usage". The wire shape lets a naive consumer trip on the 0s, but the flag + the docstring ("callers should treat 0 as 'tier not under management'") give honest signal.
  • Phase-2/3 deferral cards filed (#1299/#1300) — the PR description's audit note about PressureBroker singleton + DockerTierPool not registered is the right framing. Phase 1 ships the data surface; later phases add the push + refusal sites.
  • TS server pattern matches `SystemResourcesServerCommand` — connect/disconnect per execute(), same client construction, same try/finally. No new IPC pattern invented.

Nits (not blocking, just thinking ahead)

🟢 Per-call connect/disconnect adds latency

Same as the existing `SystemResourcesServerCommand`, so this PR isn't introducing the issue — but worth flagging as a follow-up for both: `continuum status` polling `docker-tier-stats` every N seconds will pay TCP/Unix-socket setup cost on every call. A cached client (with reconnect on EPIPE) would be cleaner. Not for this PR; for the broader IPC infrastructure card.

🟢 Test coverage only exercises detected=false

The Docker-absent case is tested (because CI has no Docker), which is right for hermetic CI. But the detected=true path (pressure calculation, stats shape with non-zero capacity) is exercised only via the underlying `DockerTierProbe` tests. Worth a follow-up: a `DockerTierProbe` mock or fixture that lets `snapshot_stats()` be unit-tested with controlled `allocated_bytes` / `used_bytes` values, especially edge cases like `used > allocated` (your docstring says "may exceed 1.0" — would be nice to have a test pinning that behavior).

🟢 Pressure can exceed 1.0 — explicit no-clamp is OK but pin it

`pressure = used_bytes as f64 / allocated_bytes as f64` with no clamp. The docstring says "may exceed 1.0 if Docker somehow stored more than its sparse-image cap." That's the right call (clamping would hide the broken-cap state from observers). A test asserting "pressure of 1.5 stays 1.5, not clamped" would lock the explicit no-clamp choice in.

What I particularly like

The PR doesn't try to do too much. Phase 1 ships exactly the data surface (`system/docker-tier-stats` IPC + ts-rs type), and the audit comment up-front is honest about the bigger-than-card scope (PressureBroker + DockerTierPool registration are deferred to #1299/#1300). That's the kind of scope discipline that makes review fast and merges safe.

LGTM (would approve but can't self-approve via shared author). Ship it once CI clears.

@joelteply joelteply merged commit 48ab0a2 into canary May 16, 2026
4 checks passed
@joelteply joelteply deleted the feat/docker-tier-pressure-surface-1239 branch May 16, 2026 03:31
joelteply added a commit that referenced this pull request May 16, 2026
…erTierPool register + tick loop (PR-1) (#1307)

Phase 2 of #1239 follow-on. Phase 1 (PR #1297) shipped the data-surface
`system/docker-tier-stats` IPC that bypassed the broker. This module
brings the broker online so disk-tier pressure can drive real eviction
instead of just sitting in the data layer.

What lands here (PR-1 of #1299)

- New `modules::pressure_broker_module::PressureBrokerModule`. Wraps an
  `Arc<PressureBroker>` and pre-registers `DockerTierPool` as a
  `ResourcePool` on the broker at construction. Acceptance items 1, 2.
- ServiceModule impl declares `tick_interval = BrokerConfig.tick_interval`
  (default 5s, matching `DMR_TICK_INTERVAL`). The runtime's existing
  `start_tick_loops()` machinery owns the cadence; we just implement
  `tick()` to call `PressureBroker::relieve()`. Acceptance item 3.
- `tick()` wraps `relieve()` in `tokio::task::spawn_blocking` because
  `DockerTierPool::evict_at_least` shells out to `docker system prune`
  which can take seconds — the broker tick must never stall other
  tokio tasks sharing the runtime.
- Observer-only: `command_prefixes` is empty so the runtime command
  router never dispatches to this module. The typed
  `system/pressure-broker-state` IPC lands in PR-2; chat-substrate
  alert sink lands in PR-3.
- `broker()` getter on the module so the ipc/mod.rs bootstrap can
  expose the broker to other subsystems (VRAM/KV-cache pools that
  want to register; PR-3's alert sink wiring) without re-instantiating.
- Registered in `ipc/mod.rs::start_server` next to `SystemResourceModule`
  using the same `runtime.register(Arc::new(...))` pattern every other
  ServiceModule uses.

Tests (acceptance item 6 — fake ResourcePool)

- `FakePool` whose pressure is driven by a test-controlled `AtomicU64`
  and whose `evict_at_least` stamps the bytes requested so the test
  can assert the broker actually invoked eviction.
- `module_registers_docker_pool_at_construction` — DockerTierPool is on
  the broker right after `::new()`, before any external call.
- `module_advertises_tick_interval_from_config` — ModuleConfig's
  tick_interval mirrors BrokerConfig so runtime cadence matches policy.
- `module_exposes_no_command_prefixes_in_pr1` — guards against a future
  PR adding prefixes without handlers (catches a common scoping mistake).
- `tick_drives_relieve_and_fires_eviction_over_threshold` — fake pool
  at ~95% pressure, one tick, assert evict_at_least was called with
  positive bytes. Proves end-to-end: tick → relieve → eviction path
  is wired, not just relieve() being called.
- `tick_is_a_noop_when_all_pools_below_threshold` — mirror at ~30%,
  assert evict_at_least was NOT called.
- `handle_command_returns_pr1_observer_only_error` — error message
  explains the staging so a future maintainer knows where commands
  land instead of silently failing.

Why a wrapper module vs `OnceLock<Arc<PressureBroker>>` direct: every
other singleton in this server (gpu_manager, system_monitor, etc.)
either lives behind a ServiceModule or is owned by one. Following that
pattern keeps the boot sequence in ipc/mod.rs uniform and gives the
broker the same shutdown / metrics treatment as everything else.

Validation
- 6/6 new tests pass: cargo test --lib --features metal,accelerate modules::pressure_broker_module
- 2296 other lib tests still filtered correctly (no incidental breakage)
- cargo build --lib --features metal,accelerate: clean
- No new warnings introduced; pre-existing 52 warnings unchanged

Follow-on PRs on this same card
- PR-2: typed `system/pressure-broker-state` IPC + ts-rs export +
  `bin/continuum status` row (acceptance item 5)
- PR-3: chat-substrate alert sink via existing airc bridge — broker
  emits `PressureAlert`, sink posts `📢 PressureAlert tier=docker ...`
  to #cambriantech (acceptance item 4)

Refs continuum#1239 (parent), continuum#1297 (Phase 1 PR), continuum#1299 (this card).
Aligned with codex's parallel #1306 work that lifts cognition's
hardcoded `max_concurrency: 1` cap — the broker is now the real
backpressure source that cap was deferring to.

Co-authored-by: Test <test@test.com>
joelteply pushed a commit that referenced this pull request May 16, 2026
…rface (PR-2)

Continues #1299 (Phase 2 of #1239). Adds the IPC + wire types on top
of PR-1's (PR #1307) singleton + tick loop:

- PressureBrokerModule now declares `command_prefixes = &["system/pressure-broker-state"]`
  and implements handle_command to return BrokerSnapshot as JSON. Single
  probe per call (atomic pressure reads + max over pool list) — no
  eviction fires, cheap to poll.
- ts-rs-exports BrokerSnapshot + PoolView + PoolStats + PressureTier
  with camelCase serde so the TS mixin reads the same shape the Rust
  module emits — no manual remap layer. PressureTier serialized
  lowercase to match the existing label() impl + every other tier
  string the system emits in logs.
- Generated files land at shared/generated/paging/{BrokerSnapshot,PoolView,PoolStats,PressureTier}.ts;
  barrel re-exports updated via `npx tsx generator/generate-rust-bindings.ts`.

PR-1 tests updated to reflect the new behavior:
- `module_routes_only_pressure_broker_state_command` replaces the
  empty-prefixes guard from PR-1 with a one-prefix invariant.
- `handle_command_returns_typed_snapshot_for_routed_command` pins the
  wire contract: every camelCase BrokerSnapshot key must be present,
  globalTier must be lowercase normal|warning|high|critical (catches
  serde rename regressions).
- `handle_command_rejects_unknown_command` validates the error path
  names the actually-handled command.

7/7 tests pass: cargo test --lib --features metal,accelerate modules::pressure_broker_module
70/70 paging tests pass (ts-rs export_bindings tests included).

What this PR is NOT
- No TS mixin yet on RustCoreIPCClient (PR-2b candidate, follow-up small
  PR, follows the docker-tier-stats pattern from #1297).
- No `bin/continuum status` row (PR-3 candidate alongside the alert sink).

Stacked on PR #1307 (PR-1). Base = canary; will rebase if #1307 lands first.
joelteply added a commit that referenced this pull request May 16, 2026
…rface (PR-2) (#1308)

Continues #1299 (Phase 2 of #1239). Adds the IPC + wire types on top
of PR-1's (PR #1307) singleton + tick loop:

- PressureBrokerModule now declares `command_prefixes = &["system/pressure-broker-state"]`
  and implements handle_command to return BrokerSnapshot as JSON. Single
  probe per call (atomic pressure reads + max over pool list) — no
  eviction fires, cheap to poll.
- ts-rs-exports BrokerSnapshot + PoolView + PoolStats + PressureTier
  with camelCase serde so the TS mixin reads the same shape the Rust
  module emits — no manual remap layer. PressureTier serialized
  lowercase to match the existing label() impl + every other tier
  string the system emits in logs.
- Generated files land at shared/generated/paging/{BrokerSnapshot,PoolView,PoolStats,PressureTier}.ts;
  barrel re-exports updated via `npx tsx generator/generate-rust-bindings.ts`.

PR-1 tests updated to reflect the new behavior:
- `module_routes_only_pressure_broker_state_command` replaces the
  empty-prefixes guard from PR-1 with a one-prefix invariant.
- `handle_command_returns_typed_snapshot_for_routed_command` pins the
  wire contract: every camelCase BrokerSnapshot key must be present,
  globalTier must be lowercase normal|warning|high|critical (catches
  serde rename regressions).
- `handle_command_rejects_unknown_command` validates the error path
  names the actually-handled command.

7/7 tests pass: cargo test --lib --features metal,accelerate modules::pressure_broker_module
70/70 paging tests pass (ts-rs export_bindings tests included).

What this PR is NOT
- No TS mixin yet on RustCoreIPCClient (PR-2b candidate, follow-up small
  PR, follows the docker-tier-stats pattern from #1297).
- No `bin/continuum status` row (PR-3 candidate alongside the alert sink).

Stacked on PR #1307 (PR-1). Base = canary; will rebase if #1307 lands first.

Co-authored-by: Test <test@test.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant