Run Registry Control Plane
A derived, rebuildable, fingerprinted userland index over MANY runs across MANY repos: search, resume, queue, archive, cross-repo history, and failed-run rerun, while each per-run state.json stays the single source of truth. Append-only history; fail closed on stale or missing source. Shipped in v0.1.28. Repo doc: docs/run-registry-control-plane.7.md.
Before v0.1.28 a run lived only under its repo's .cw/runs/<id>/ and was loaded from the current directory (loadRunFromCwd); there was no cross-repo index and no unified lifecycle management. v0.1.28 adds search, resume, archive, a durable queue, cross-repo history, and failed-run rerun — without changing the run-state schema and without taking ownership of source truth.
The design mantra for this layer:
state.json is the only truth.
The registry is a derived cache.
Delete it, rebuild it from source.
Classify lifecycle, never invent it.
Append history; never overwrite the past.
Fail closed on stale or missing.
The registry is MECHANISM: a rebuildable cache over runs. POLICY — retention windows, queue ordering, and archive thresholds — is configurable and kept out of the index (RunRegistryPolicy, explicit flags). The index can be deleted and rebuilt from source at any time; it never holds authority a state.json does not.
State is plain files, readable and diffable:
<repo>/.cw/runs/<id>/state.json         source of truth (unchanged, never owned here)
<repo>/.cw/registry/index.json          per-repo derived index (rebuildable)
<repo>/.cw/registry/archive.json        archive overlay (mark; never deletes source)
<repo>/.cw/registry/provenance.json     rerun provenance links (derived metadata)
$CW_HOME/registry/repos.json            registered repo roots (explicit discovery set)
$CW_HOME/registry/index.json            cross-repo derived index (rebuildable)
$CW_HOME/registry/queue.json            durable run queue (plain, ordered)
The home registry root resolves from CW_HOME, then XDG_STATE_HOME/cool-workflow, then ~/.local/state/cool-workflow. A repo is registered into repos.json when it is refreshed (or when a queue entry names it). Reads never write: a search or show computes the repo set as the union of the registered repos and the current repo in memory, so reading the index never mutates discovery state.
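The lookup order and the read-only repo union described above can be sketched as two pure functions. This is a minimal illustration, not the shipped code: resolveRegistryHome, effectiveRepos, and RegistryEnv are hypothetical names, the environment and home directory are passed in as arguments so the sketch needs no imports, and treating an empty variable as unset is an assumption.

```typescript
// Hypothetical sketch of the documented lookup order; not the shipped implementation.
interface RegistryEnv {
  CW_HOME?: string;
  XDG_STATE_HOME?: string;
}

function resolveRegistryHome(env: RegistryEnv, homeDir: string): string {
  // 1. CW_HOME wins when set (assumption: an empty string counts as unset).
  if (env.CW_HOME) return env.CW_HOME;
  // 2. Otherwise fall back to XDG_STATE_HOME/cool-workflow.
  if (env.XDG_STATE_HOME) return `${env.XDG_STATE_HOME}/cool-workflow`;
  // 3. Otherwise use ~/.local/state/cool-workflow.
  return `${homeDir}/.local/state/cool-workflow`;
}

// Reads never write: the repo set for a search or show is a union computed
// in memory, so repos.json is left untouched.
function effectiveRepos(registered: readonly string[], currentRepo: string): string[] {
  return Array.from(new Set([...registered, currentRepo])).sort();
}
```

Because effectiveRepos returns a new array and writes nothing, calling it from a read path cannot register the current repo as a side effect.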
The per-run .cw/runs/<id>/state.json is the SINGLE source of truth. The registry is a DERIVED userland index, never a replacement for source records. There is no hidden database and no daemon required to read state. The registry, archive overlay, provenance overlay, queue, and home discovery set are all derived files that can be deleted and rebuilt from source at any time.
A RunRecord is derived per run and carries schemaVersion, runId, appId, appVersion, workflowId, title, repo (the owning repo root), runDir, statePath, createdAt, updatedAt, loopStage, a lifecycle and a derivedLifecycle, an archived flag with archivedAt/archiveReason, task counts, commitCount, verifierGatedCommitCount, openFeedbackCount, a bounded inputsDigest for free-text search, a deterministic sourceFingerprint, a per-record freshness (valid, stale, or missing), and optional provenance.
A RunRegistryIndex aggregates records for a scope (repo or home) with its own sourceFingerprint, the covered repos, the queue, and lifecycle counts. A RunRegistryReport wraps the index with explicit freshness (valid, stale, or absent) plus the staleRuns and missingRuns lists and a nextAction. Every read re-derives records from source; the persisted index is only compared against, never trusted as the live status.
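For orientation, a reduced shape of the record and the lifecycle counts an index carries might look like the following. The field names come from the description above, but the TypeScript types, the RunRecordSketch name, and the lifecycleCounts helper are assumptions made for illustration; only a subset of the documented fields is shown.

```typescript
// Assumed types for a subset of the documented RunRecord fields.
type Lifecycle = "queued" | "running" | "blocked" | "failed" | "completed" | "archived";
type RecordFreshness = "valid" | "stale" | "missing";

interface RunRecordSketch {
  runId: string;
  appId: string;
  repo: string;              // owning repo root
  createdAt: string;         // assumed to be an ISO timestamp
  updatedAt: string;
  lifecycle: Lifecycle;      // surfaced lifecycle (may be "archived")
  derivedLifecycle: Exclude<Lifecycle, "archived">; // source-derived state
  archived: boolean;
  sourceFingerprint: string;
  freshness: RecordFreshness;
}

// Hypothetical aggregation: count records per surfaced lifecycle.
function lifecycleCounts(records: readonly RunRecordSketch[]): Partial<Record<Lifecycle, number>> {
  const counts: Partial<Record<Lifecycle, number>> = {};
  for (const r of records) {
    counts[r.lifecycle] = (counts[r.lifecycle] ?? 0) + 1;
  }
  return counts;
}
```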
Lifecycle is CLASSIFIED from existing state. deriveLifecycle applies the following rules to a run's source state — first match wins:
1. running tasks > 0 -> running
2. open feedback > 0 -> blocked (failures under correction)
3. failed tasks > 0 -> failed
4. tasks > 0 and all tasks completed -> completed
5. verifier-gated commits > 0 and nothing pending -> completed (commit-only runs)
6. completed tasks > 0 -> running (mid-flight)
7. otherwise -> queued
archived is an OVERLAY disposition applied on top of this. The surfaced lifecycle becomes archived, but derivedLifecycle preserves the source-derived state so search and history can still match the underlying run. The classifier never reads the cache; it reads source state.json.
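The first-match-wins rules and the archive overlay can be sketched as follows. The counter names in RunCounts are hypothetical (this page does not show the state.json field names), and "nothing pending" is read here as "no task that is neither completed, failed, nor running", which is an assumption.

```typescript
type DerivedLifecycle = "running" | "blocked" | "failed" | "completed" | "queued";

// Hypothetical counters extracted from a run's source state.json.
interface RunCounts {
  totalTasks: number;
  runningTasks: number;
  failedTasks: number;
  completedTasks: number;
  openFeedback: number;
  verifierGatedCommits: number;
}

// Rules are checked top to bottom; the first match wins.
function deriveLifecycle(c: RunCounts): DerivedLifecycle {
  if (c.runningTasks > 0) return "running";                      // rule 1
  if (c.openFeedback > 0) return "blocked";                      // rule 2
  if (c.failedTasks > 0) return "failed";                        // rule 3
  if (c.totalTasks > 0 && c.completedTasks === c.totalTasks) {
    return "completed";                                          // rule 4
  }
  // Assumed meaning of "pending": tasks not completed, failed, or running.
  const pending = c.totalTasks - c.completedTasks - c.failedTasks - c.runningTasks;
  if (c.verifierGatedCommits > 0 && pending === 0) return "completed"; // rule 5
  if (c.completedTasks > 0) return "running";                    // rule 6
  return "queued";                                               // rule 7
}

// The archive overlay changes only the surfaced lifecycle;
// derivedLifecycle keeps the source-derived state.
function surfaceLifecycle(derived: DerivedLifecycle, archived: boolean) {
  return {
    lifecycle: archived ? ("archived" as const) : derived,
    derivedLifecycle: derived,
  };
}
```

Rule order matters: a run with both open feedback and failed tasks classifies as blocked, not failed, because rule 2 is checked before rule 3.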
History is append-only. Resume continues a run, rerun creates a NEW linked run, and archive marks rather than deletes.
- Search: run search queries by --app, --status, time range (--since, --until), --repo, and free text (--text, matched over runId, app, workflow, title, repo, lifecycle, loop stage, and a bounded digest of run inputs). Results are deterministic (ordered by createdAt, then runId) and paginated (--limit, --offset). Search is cross-repo by default (--scope home); use --scope repo to restrict to the current repo. Archived runs are included by default and can be excluded with --include-archived false.
- Resume: run resume <run-id> resolves a run by id across the registry, not just the cwd, loads its durable state, and returns the next runnable tasks and next actions for the host to execute. By default resume is read-only over source: it does not change state.json and does not un-archive a run; the default payload and next actions stay byte-identical. The opt-in run resume <run-id> --drive (or --once for one step) hands the found run to the agent-delegation drive loop, which DOES advance durable state (an unconfigured agent fails closed with drive.status: "blocked"). Continuing through the drive with --incremental reuses the cached result of every step whose inputs are unchanged (keyed by prompt + run inputs + delegation config + upstream-result digests), so a re-run replays the unchanged prefix and only the first changed task and everything downstream of it run live.
- Archive: run archive <run-id> writes an overlay mark to the owning repo's registry/archive.json; the run's state.json is never moved or deleted, and the run stays searchable (its derivedLifecycle is preserved). --unarchive clears the mark. Retention is POLICY: run archive --older-than-days N [--state completed --state failed] archives eligible runs older than the window. The default policy archives nothing (archiveOlderThanDays = 0) until a window is given.
- Rerun: run rerun <run-id> re-runs a failed run as a NEW run: it reuses the original inputs and app, lands the new run beside the original (same repo), and records a provenance link (rerunOf, rerunOfRepo, originRunId, generation, reason) in the repo's registry/provenance.json. The original failed run is PRESERVED for audit; the past is never overwritten. Rerunning a rerun increments generation and keeps originRunId pinned to the chain root.
- History: history reads a unified timeline of runs across all registered repos (newest first), each entry carrying its repo, lifecycle, loop stage, timestamps, freshness, and provenance back to its .cw/runs/<id>/. Filter with --app and --status; paginate with --limit and --offset.
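The rerun rule in the list above (generation increments, originRunId stays pinned to the chain root) can be sketched as a small pure function. The field names are the ones the provenance link is documented to carry; the linkRerun helper, the field types, and the choice that a first rerun has generation 1 are assumptions for illustration.

```typescript
// Field names from the documented provenance link; types are assumed.
interface RerunProvenance {
  rerunOf: string;      // the run this rerun was created from
  rerunOfRepo: string;  // repo that owns the rerun's source run
  originRunId: string;  // root of the rerun chain
  generation: number;   // assumed: 1 for the first rerun of an original run
  reason: string;
}

// Hypothetical helper: builds the link for a NEW run created from sourceRunId.
// sourceProvenance is the source run's own link, if it was itself a rerun.
function linkRerun(
  sourceRunId: string,
  sourceRepo: string,
  sourceProvenance: RerunProvenance | undefined,
  reason: string,
): RerunProvenance {
  return {
    rerunOf: sourceRunId,
    rerunOfRepo: sourceRepo,
    // Keep the chain root pinned; an original run is its own root.
    originRunId: sourceProvenance ? sourceProvenance.originRunId : sourceRunId,
    generation: sourceProvenance ? sourceProvenance.generation + 1 : 1,
    reason,
  };
}
```

Nothing here touches the source run: the link is metadata about the new run, which matches the rule that the failed original is preserved for audit.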
queue add appends a durable entry to $CW_HOME/registry/queue.json with an explicit --priority (lower drains first; ties break by enqueue time, then id). queue list prints the queue in policy order; queue show <id> shows one entry. queue drain [--limit N] marks the next ready entries drained and returns them — CW records order and readiness; the HOST still executes the workers. Nothing in the queue spawns work on its own.
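The ordering and drain behaviour described above can be sketched like this. QueueEntry, policyOrder, and drain are hypothetical names, the "ready" and "drained" status values are taken from the wording above rather than from the queue.json schema, and enqueue times are assumed to be same-format UTC ISO strings so that string comparison matches time order.

```typescript
// Hypothetical entry shape; the real queue.json schema is not shown in this page.
interface QueueEntry {
  id: string;
  priority: number;    // lower drains first
  enqueuedAt: string;  // assumed: UTC ISO timestamp
  status: "ready" | "drained";
}

function compareStrings(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Policy order: priority, then enqueue time, then id.
function policyOrder(entries: readonly QueueEntry[]): QueueEntry[] {
  return [...entries].sort(
    (a, b) =>
      a.priority - b.priority ||
      compareStrings(a.enqueuedAt, b.enqueuedAt) ||
      compareStrings(a.id, b.id),
  );
}

// Marks the next `limit` ready entries as drained and returns them.
// Nothing is executed here; the host runs the workers.
function drain(
  entries: readonly QueueEntry[],
  limit: number,
): { drained: QueueEntry[]; queue: QueueEntry[] } {
  const next = policyOrder(entries)
    .filter((e) => e.status === "ready")
    .slice(0, limit);
  const ids = new Set(next.map((e) => e.id));
  const queue = entries.map((e) =>
    ids.has(e.id) ? { ...e, status: "drained" as const } : e,
  );
  return { drained: next.map((e) => ({ ...e, status: "drained" as const })), queue };
}
```

Both functions return new arrays instead of mutating their input, so a caller can persist the updated queue in one write after deciding what to hand to the host.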
registry show recomputes the current source fingerprint for every run and compares it to the persisted index. If a run's source changed, the report status is stale and the run is named in staleRuns. If a persisted run's source is gone, the run is named in missingRuns, it is NOT fabricated into the current records, and the next action is registry refresh. run show of a run whose source is missing returns found: false with freshness: missing and only the last-known persisted record, clearly flagged — never as a live status. An unreadable or unsupported run state is treated as missing, never as success.
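A minimal sketch of that comparison, assuming fingerprints are already computed and held as runId-to-fingerprint maps. compareToSource and FreshnessReport are hypothetical names, and the sketch covers only the cases the text describes: changed source, missing source, and an index that has never been persisted. Reporting stale when only missing runs are found, and a null next action when everything matches, are assumptions.

```typescript
type ReportFreshness = "valid" | "stale" | "absent";

interface FreshnessReport {
  freshness: ReportFreshness;
  staleRuns: string[];
  missingRuns: string[];
  nextAction: string | null;
}

// persisted: fingerprints stored in the index (undefined if no index exists yet).
// current:   fingerprints recomputed from each run's source state.json.
function compareToSource(
  persisted: Map<string, string> | undefined,
  current: Map<string, string>,
): FreshnessReport {
  if (!persisted) {
    return { freshness: "absent", staleRuns: [], missingRuns: [], nextAction: "registry refresh" };
  }
  const staleRuns: string[] = [];
  const missingRuns: string[] = [];
  persisted.forEach((fingerprint, runId) => {
    const now = current.get(runId);
    if (now === undefined) {
      // Source is gone: name the run, never fabricate a live record for it.
      missingRuns.push(runId);
    } else if (now !== fingerprint) {
      staleRuns.push(runId);
    }
  });
  const drift = staleRuns.length > 0 || missingRuns.length > 0;
  return {
    freshness: drift ? "stale" : "valid", // assumption: missing-only also reports stale
    staleRuns: staleRuns.sort(),
    missingRuns: missingRuns.sort(),
    nextAction: drift ? "registry refresh" : null,
  };
}
```

The persisted map is only ever compared against; the records a reader sees would still come from the current side, which is what keeps the index disposable.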
Every command is declared once in the v0.1.28 capability registry (src/capability-registry.ts) and rendered on both surfaces, so cw <cmd> --json is schema-identical to the matching cw_<tool> result and the pair passes npm run parity:check:
node scripts/cw.js registry refresh [--scope repo|home] [--json]
node scripts/cw.js registry show [--scope repo|home] [--json]
node scripts/cw.js run search [--app ID] [--status STATE] [--text Q] [--repo PATH] [--since ISO] [--until ISO] [--limit N] [--offset N] [--scope repo|home] [--json]
node scripts/cw.js run list [--scope repo|home] [--json]
node scripts/cw.js run show <run-id> [--scope repo|home] [--json]
node scripts/cw.js run resume <run-id> [--limit N] [--json]
node scripts/cw.js run archive <run-id> [--reason TEXT] [--unarchive]
node scripts/cw.js run archive --older-than-days N [--state completed --state failed]
node scripts/cw.js run rerun <run-id> [--reason TEXT]
node scripts/cw.js queue add [--app ID|--workflow ID|--runId ID] [--repo PATH] [--priority N] [--note TEXT]
node scripts/cw.js queue list [--status STATE] [--repo PATH] [--json]
node scripts/cw.js queue show <queue-id>
node scripts/cw.js queue drain [--limit N] [--repo PATH]
node scripts/cw.js history [--app ID] [--status STATE] [--limit N] [--offset N] [--scope repo|home] [--json]
The MCP tools mirror the CLI one-for-one: cw_registry_refresh, cw_registry_show, cw_run_search, cw_run_list, cw_run_show, cw_run_resume, cw_run_archive, cw_run_rerun, cw_queue_add, cw_queue_list, cw_queue_drain, cw_queue_show, and cw_history. Read commands print terse human panels by default (lifecycle, freshness, counts, and next action) and full machine output under --json or --format json.
Pre-0.1.28 single-repo runs and existing .cw/runs/ layouts keep working with an empty, rebuildable registry: registry show reports absent until the first registry refresh, and every pre-0.1.28 CLI command and MCP tool is unchanged. No run-state schema change ships in v0.1.28; newer unsupported run-state schemas still fail closed.
The registry is the control plane that turns isolated per-run state into an operable fleet: you can find any run across any registered repo, continue it, queue more work, retire the old, and rerun the failed — without the index ever becoming the authority a state.json is. Because every read re-derives from source and fails closed on drift, the control plane is trustworthy precisely because it is disposable. It composes cleanly with the rest of CW: CLI MCP Parity guarantees both surfaces render one data source, and Execution Backends keep the result/evidence envelope backend-independent, so the registry stays agnostic to who executed a run.
- CLI MCP Parity
- Execution Backends
- Architecture Principles
- Runtime Contract
- CLI and MCP Surface
- Multi-Agent Topologies
- Home
- Repo doc: plugins/cool-workflow/docs/run-registry-control-plane.7.md
Organized from local Obsidian notes and reconciled with the current
coo1white/cool-workflow repository state.