
Control Plane Scheduling

coo1white edited this page Jun 8, 2026 · 1 revision

Control-Plane Scheduling (v0.1.37)

A scheduling-policy layer over the v0.1.28 Run Registry queue: priority + a hard concurrency ceiling + lease lifecycle + retry/backoff + a fail-closed park state. Policy-as-data, deterministic, in a distinct sched namespace. Shipped in v0.1.37. Repo doc: docs/control-plane-scheduling.7.md.

The v0.1.28 queue had ORDER (priority, enqueuedAt) but no policy — nothing limited in-flight runs, nothing retried with backoff, and queue drain would re-hand the same failing entry forever. v0.1.37 layers policy over the existing queue (no queue file duplicated). The sched namespace is separate from the unrelated wall-clock schedule (loop/cron) scheduler.

Design Mantra

The queue has order; this adds policy.
Concurrency is a hard ceiling — never exceeded.
Past the budget: park, not retry-forever.
Deterministic: inject now, no jitter.

The Borrowed Idea: Policy as Data Over a Pure Selector

SchedulingPolicy is a plain, diffable file ($CW_HOME/registry/scheduling-policy.json) with conservative, fail-closed defaults (maxConcurrent 1, maxAttempts 3, leaseTtlMs 300000, exponential backoff capped at 60s, no jitter). The core (src/scheduling.ts) is pure: every function takes an injected now, reuses compareQueue, and operates on the existing RunQueueEntry[]. "CW records readiness/order/leases; the host still executes the workers."
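A minimal TypeScript sketch of what policy-as-data with fail-closed defaults could look like. The field names maxConcurrent, maxAttempts, leaseTtlMs, baseMs and factor appear on this page. The nested backoff object, the capMs name, the baseMs/factor default values (1000 ms and 2), and the loadPolicy helper are hypothetical. Each field that is missing or invalid falls back to its conservative default on its own, so a malformed field cannot loosen the policy.

```typescript
// Sketch of a policy-as-data loader. Hypothetical: the nested `backoff`
// object, the `capMs` field name, the baseMs/factor defaults, and loadPolicy.
interface BackoffPolicy {
  baseMs: number; // first retry delay (default value assumed)
  factor: number; // growth per attempt (default value assumed)
  capMs: number;  // upper bound on any single delay (60s per the page)
}

interface SchedulingPolicy {
  maxConcurrent: number;
  maxAttempts: number;
  leaseTtlMs: number;
  backoff: BackoffPolicy;
}

// Conservative defaults; there is no jitter field, so delays stay deterministic.
const DEFAULT_POLICY: SchedulingPolicy = {
  maxConcurrent: 1,
  maxAttempts: 3,
  leaseTtlMs: 300000,
  backoff: { baseMs: 1000, factor: 2, capMs: 60000 },
};

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Accept only positive integers; anything else falls back to the default.
function positiveInt(v: unknown, fallback: number): number {
  return typeof v === "number" && Number.isInteger(v) && v > 0 ? v : fallback;
}

// A growth factor below 1 would shrink delays, so reject it.
function factorAtLeastOne(v: unknown, fallback: number): number {
  return typeof v === "number" && Number.isFinite(v) && v >= 1 ? v : fallback;
}

// Merge parsed JSON over the defaults, field by field (fail closed).
function loadPolicy(raw: unknown): SchedulingPolicy {
  const obj: Record<string, unknown> = isRecord(raw) ? raw : {};
  const b: Record<string, unknown> = isRecord(obj.backoff) ? obj.backoff : {};
  const d = DEFAULT_POLICY;
  return {
    maxConcurrent: positiveInt(obj.maxConcurrent, d.maxConcurrent),
    maxAttempts: positiveInt(obj.maxAttempts, d.maxAttempts),
    leaseTtlMs: positiveInt(obj.leaseTtlMs, d.leaseTtlMs),
    backoff: {
      baseMs: positiveInt(b.baseMs, d.backoff.baseMs),
      factor: factorAtLeastOne(b.factor, d.backoff.factor),
      capMs: positiveInt(b.capMs, d.backoff.capMs),
    },
  };
}
```

In practice the raw argument would be the JSON.parse result of the policy file's contents; a missing file can be treated as `undefined`, which yields the defaults.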

The lease lifecycle

ready --lease--> leased --complete--> drained
   ^               |  \--release(failed)/expire--> ready (+backoff) | parked
   |__reset________/
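The lifecycle above can be sketched as pure transition functions over a simplified entry, with now always passed in. This is an illustrative reading, not src/scheduling.ts: the Entry and Policy shapes, the "drained" status string, and the function names (lease, complete, isExpired, recordFailure, reset) are hypothetical. It also assumes that "under budget" means the incremented attempt count is still below maxAttempts, and that reset clears attempts and accepts both leased and parked entries, as the diagram's reset edge suggests.

```typescript
// Simplified, hypothetical queue entry. The page names the ready/leased/parked
// statuses and the attempts, leaseId, leaseExpiresAt and nextEligibleAt
// fields; "drained" as a status string and these exact types are assumptions.
type Status = "ready" | "leased" | "parked" | "drained";

interface Entry {
  id: string;
  status: Status;
  attempts: number;
  leaseId?: string;
  leaseExpiresAt?: number; // epoch ms
  nextEligibleAt?: number; // epoch ms
}

// Flattened policy fields used by the transitions (shape assumed).
interface Policy {
  maxAttempts: number;
  leaseTtlMs: number;
  baseMs: number;
  factor: number;
  capMs: number;
}

function requireStatus(e: Entry, ...allowed: Status[]): void {
  if (allowed.indexOf(e.status) === -1) {
    throw new Error(`entry ${e.id}: illegal transition from ${e.status}`);
  }
}

// ready --lease--> leased, only once any backoff window has passed.
function lease(e: Entry, leaseId: string, now: number, p: Policy): Entry {
  requireStatus(e, "ready");
  if (e.nextEligibleAt !== undefined && e.nextEligibleAt > now) {
    throw new Error(`entry ${e.id}: not eligible until ${e.nextEligibleAt}`);
  }
  return { ...e, status: "leased", leaseId, leaseExpiresAt: now + p.leaseTtlMs };
}

// leased --complete--> drained (terminal success).
function complete(e: Entry): Entry {
  requireStatus(e, "leased");
  return { ...e, status: "drained", leaseId: undefined, leaseExpiresAt: undefined };
}

// A lease whose deadline has passed can be reclaimed.
function isExpired(e: Entry, now: number): boolean {
  return e.status === "leased" && e.leaseExpiresAt !== undefined && e.leaseExpiresAt <= now;
}

// leased --release(failed)/expire--> ready (+backoff) | parked.
function recordFailure(e: Entry, now: number, p: Policy): Entry {
  requireStatus(e, "leased");
  const attempts = e.attempts + 1;
  const base = { ...e, attempts, leaseId: undefined, leaseExpiresAt: undefined };
  if (attempts >= p.maxAttempts) {
    return { ...base, status: "parked", nextEligibleAt: undefined };
  }
  // nextEligibleAt = now + baseMs * factor^(attempts - 1), capped; no jitter.
  const delay = Math.min(p.capMs, p.baseMs * Math.pow(p.factor, attempts - 1));
  return { ...base, status: "ready", nextEligibleAt: now + delay };
}

// Operator reset back to ready with a fresh attempt budget (assumed semantics).
function reset(e: Entry): Entry {
  requireStatus(e, "leased", "parked");
  return {
    ...e,
    status: "ready",
    attempts: 0,
    leaseId: undefined,
    leaseExpiresAt: undefined,
    nextEligibleAt: undefined,
  };
}
```

With maxAttempts 3, baseMs 1000 and factor 2, a run that fails three times becomes ready after 1000 ms, then after 2000 ms, and is then parked until reset.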

Principles

  1. sched plan: a READ-ONLY preview of the leases that would be granted for a given queue + policy + now; deterministic, replayable, and payload-identical across CLI/MCP.
  2. Hard concurrency ceiling (load-bearing, fail-closed): maxConcurrent bounds the number of in-flight (leased) entries; leasing stops at the ceiling and over-limit entries stay ready. The ceiling is never exceeded.
  3. Leases: sched lease claims eligible entries in priority order, stamping each with a leaseId and leaseExpiresAt; sched complete is terminal success; an EXPIRED lease is reclaimable (sched reclaim) and counts as one recorded failed attempt.
  4. Retry with computed backoff: a failed or expired attempt under budget increments attempts and sets nextEligibleAt = now + baseMs*factor^(attempts-1) (capped); deterministic, no randomness.
  5. Park past budget (fail closed): at maxAttempts the entry becomes parked and is NEVER re-selected; sched reset is the only way back. The queue can never re-hand a failing entry forever.
  6. Backward compatible: RunQueueEntry additively gains scheduling fields plus the leased/parked statuses; a pre-0.1.37 queue.json loads unchanged, and the queue add|list|drain|show verbs are untouched.
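Principles 1 and 2 together amount to a pure, read-only selector. The sketch below is one way to write it, under stated assumptions: lower priority values sort first (ascending ORDER (priority, enqueuedAt), plus an id tie-break), a lease past its expiry still counts as in flight until it is reclaimed, and the names planLeases, LeasePlan, heldByCeiling and inBackoff are hypothetical. The local compareQueue is a stand-in for the real one.

```typescript
// Hypothetical subset of a queue entry for planning purposes.
type Status = "ready" | "leased" | "parked" | "drained";

interface QueueEntry {
  id: string;
  priority: number;   // assumed: lower value sorts first
  enqueuedAt: number; // epoch ms
  status: Status;
  nextEligibleAt?: number;
}

// Stand-in for compareQueue: (priority, enqueuedAt), then id for a total order.
function compareQueue(a: QueueEntry, b: QueueEntry): number {
  if (a.priority !== b.priority) return a.priority - b.priority;
  if (a.enqueuedAt !== b.enqueuedAt) return a.enqueuedAt - b.enqueuedAt;
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}

interface LeasePlan {
  inFlight: number;        // entries currently leased
  slots: number;           // leases the ceiling still allows
  wouldLease: string[];    // ids that would be leased, in order
  heldByCeiling: string[]; // eligible but over the ceiling: they stay ready
  inBackoff: string[];     // ready, but nextEligibleAt is still in the future
}

// Read-only: never mutates `queue` and never reads the clock.
function planLeases(queue: readonly QueueEntry[], maxConcurrent: number, now: number): LeasePlan {
  // Expired-but-unreclaimed leases still count, so the ceiling is never exceeded.
  const inFlight = queue.filter((e) => e.status === "leased").length;
  const slots = Math.max(0, maxConcurrent - inFlight);
  // Only ready entries are candidates; parked entries are never selected.
  const ready = queue.filter((e) => e.status === "ready");
  // filter() returns a fresh array, so sorting it leaves `queue` untouched.
  const eligible = ready
    .filter((e) => e.nextEligibleAt === undefined || e.nextEligibleAt <= now)
    .sort(compareQueue);
  return {
    inFlight,
    slots,
    wouldLease: eligible.slice(0, slots).map((e) => e.id),
    heldByCeiling: eligible.slice(slots).map((e) => e.id),
    inBackoff: ready
      .filter((e) => e.nextEligibleAt !== undefined && e.nextEligibleAt > now)
      .map((e) => e.id),
  };
}
```

Because the function neither mutates its input nor reads a clock, the same queue, maxConcurrent and now always yield the same plan, which is what lets a plan be replayed and compared across front ends.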

Why It Matters

This makes the durable queue safe to actually drain at scale: concurrency is bounded, transient failures retry with backoff, and a persistently failing entry parks for an operator instead of looping forever — all as inspectable, deterministic policy data. Verified live via the CLI over a temp CW_HOME, including pre-v0.1.37 queue compatibility.

See Also
