
Run Registry Control Plane

coo1white edited this page Jun 8, 2026 · 2 revisions

Run Registry / Control Plane (v0.1.28)

A derived, rebuildable, fingerprinted userland index over MANY runs across MANY repos. It provides search, resume, queue, archive, cross-repo history, and failed-run rerun, while each per-run state.json stays the single source of truth. History is append-only, and reads fail closed on stale or missing source. Shipped in v0.1.28. Repo doc: docs/run-registry-control-plane.7.md.

Before v0.1.28 a run lived only under its repo's .cw/runs/<id>/ and was loaded from the current directory (loadRunFromCwd); there was no cross-repo index and no unified lifecycle management. v0.1.28 adds search, resume, archive, a durable queue, cross-repo history, and failed-run rerun — without changing the run-state schema and without taking ownership of source truth.

The design mantra for this layer:

state.json is the only truth.
The registry is a derived cache.
Delete it, rebuild it from source.
Classify lifecycle, never invent it.
Append history; never overwrite the past.
Fail closed on stale or missing.

The Borrowed Idea: A Rebuildable Index Over Source-of-Truth Files

The registry is MECHANISM: a rebuildable cache over runs. POLICY — retention windows, queue ordering, and archive thresholds — is configurable and kept out of the index (RunRegistryPolicy, explicit flags). The index can be deleted and rebuilt from source at any time; it never holds authority a state.json does not.

State is plain files, readable and diffable:

<repo>/.cw/runs/<id>/state.json     source of truth (unchanged, never owned here)
<repo>/.cw/registry/index.json      per-repo derived index (rebuildable)
<repo>/.cw/registry/archive.json    archive overlay (mark; never deletes source)
<repo>/.cw/registry/provenance.json rerun provenance links (derived metadata)

$CW_HOME/registry/repos.json        registered repo roots (explicit discovery set)
$CW_HOME/registry/index.json        cross-repo derived index (rebuildable)
$CW_HOME/registry/queue.json        durable run queue (plain, ordered)

The home registry root resolves from CW_HOME, then XDG_STATE_HOME/cool-workflow, then ~/.local/state/cool-workflow. A repo is registered into repos.json when it is refreshed (or when a queue entry names it). Reads never write: a search or show computes the repo set as the union of the registered repos and the current repo in memory, so reading the index never mutates discovery state.
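The root precedence and the read-only repo set described above can be sketched as a few pure functions. This is a minimal sketch under stated assumptions, not the cool-workflow implementation. The function names, passing the environment and home directory in as arguments, treating an empty variable as unset, and sorting the repo set are all hypothetical. Only the precedence order and the file names come from this page.

```typescript
// Hypothetical sketch: home-root precedence and registry file layout.
// The environment is passed in explicitly so these functions stay pure.
type EnvVars = { [name: string]: string | undefined };

// Join POSIX path segments and collapse duplicate slashes.
function joinPath(...parts: string[]): string {
  return parts.join("/").replace(/\/{2,}/g, "/");
}

// CW_HOME, then XDG_STATE_HOME/cool-workflow, then ~/.local/state/cool-workflow.
// Assumption: an empty string counts as unset.
function resolveHomeRoot(env: EnvVars, homeDir: string): string {
  const cwHome = env["CW_HOME"];
  if (cwHome) return cwHome;
  const xdg = env["XDG_STATE_HOME"];
  if (xdg) return joinPath(xdg, "cool-workflow");
  return joinPath(homeDir, ".local/state/cool-workflow");
}

// Derived home files: discovery set, cross-repo index, durable queue.
function homeRegistryPaths(homeRoot: string) {
  const dir = joinPath(homeRoot, "registry");
  return {
    repos: joinPath(dir, "repos.json"),
    index: joinPath(dir, "index.json"),
    queue: joinPath(dir, "queue.json"),
  };
}

// Per-repo derived files. state.json under .cw/runs/<id>/ is deliberately absent:
// the registry never owns it.
function repoRegistryPaths(repoRoot: string) {
  const dir = joinPath(repoRoot, ".cw/registry");
  return {
    index: joinPath(dir, "index.json"),
    archive: joinPath(dir, "archive.json"),
    provenance: joinPath(dir, "provenance.json"),
  };
}

// Reads never write: union the registered repos with the current repo in memory
// and return a new array, leaving the registered list untouched.
function readRepoSet(registered: readonly string[], currentRepo: string | null): string[] {
  const repos = registered.slice();
  if (currentRepo !== null && repos.indexOf(currentRepo) < 0) repos.push(currentRepo);
  return repos.sort();
}
```

For example, a search run from an unregistered repo sees that repo alongside the registered ones for the duration of the read, but repos.json only gains it on the next refresh.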

1. state.json Is the Only Truth; the Registry Is Derived

The per-run .cw/runs/<id>/state.json is the SINGLE source of truth. The registry is a DERIVED userland index, never a replacement for source records. There is no hidden database and no daemon required to read state. The registry, archive overlay, provenance overlay, queue, and home discovery set are all derived files that can be deleted and rebuilt from source at any time.

2. Every Read Re-Derives From Source

A RunRecord is derived per run and carries schemaVersion, runId, appId, appVersion, workflowId, title, repo (the owning repo root), runDir, statePath, createdAt, updatedAt, loopStage, a lifecycle and a derivedLifecycle, an archived flag with archivedAt/archiveReason, task counts, commitCount, verifierGatedCommitCount, openFeedbackCount, a bounded inputsDigest for free-text search, a deterministic sourceFingerprint, a per-record freshness (valid, stale, or missing), and optional provenance.

A RunRegistryIndex aggregates records for a scope (repo or home) with its own sourceFingerprint, the covered repos, the queue, and lifecycle counts. A RunRegistryReport wraps the index with explicit freshness (valid, stale, or absent) plus the staleRuns and missingRuns lists and a nextAction. Every read re-derives records from source; the persisted index is only compared against, never trusted as the live status.
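A record and its scope index might look roughly like the following. It is a sketch, not the shipped types. It keeps only a handful of the RunRecord fields named above, and the FNV-1a hash stands in for whatever fingerprint the real code computes (this page does not specify one). buildIndex, byCreatedThenId, and the lifecycleCounts field name are hypothetical.

```typescript
// Hypothetical sketch: a trimmed RunRecord, a fingerprint, and index aggregation.
type Lifecycle = "queued" | "running" | "blocked" | "failed" | "completed" | "archived";
type Freshness = "valid" | "stale" | "missing";

// A subset of the RunRecord fields listed above.
interface RunRecord {
  runId: string;
  repo: string;
  createdAt: string; // ISO-8601 UTC, so string order is time order (assumption)
  lifecycle: Lifecycle;
  derivedLifecycle: Exclude<Lifecycle, "archived">;
  sourceFingerprint: string;
  freshness: Freshness;
}

interface RegistryIndex {
  scope: "repo" | "home";
  repos: string[];
  records: RunRecord[];
  lifecycleCounts: Partial<Record<Lifecycle, number>>;
  sourceFingerprint: string;
}

// FNV-1a (32-bit) over the raw state.json text. A stand-in: the real
// fingerprint algorithm is not specified on this page.
function fingerprint(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    // h *= 16777619 (mod 2^32), expressed with shifts so the product stays exact.
    h = (h + (h << 1) + (h << 4) + (h << 7) + (h << 8) + (h << 24)) >>> 0;
  }
  return ("0000000" + h.toString(16)).slice(-8);
}

// Deterministic order: createdAt, then runId.
function byCreatedThenId(a: RunRecord, b: RunRecord): number {
  if (a.createdAt !== b.createdAt) return a.createdAt < b.createdAt ? -1 : 1;
  return a.runId < b.runId ? -1 : a.runId > b.runId ? 1 : 0;
}

function buildIndex(scope: "repo" | "home", records: RunRecord[]): RegistryIndex {
  const ordered = records.slice().sort(byCreatedThenId); // never reorder the caller's array
  const lifecycleCounts: Partial<Record<Lifecycle, number>> = {};
  const repos: string[] = [];
  for (const r of ordered) {
    lifecycleCounts[r.lifecycle] = (lifecycleCounts[r.lifecycle] ?? 0) + 1;
    if (repos.indexOf(r.repo) < 0) repos.push(r.repo);
  }
  // The index fingerprint covers every record fingerprint in a fixed order,
  // so it changes whenever any covered source changes.
  const sourceFingerprint = fingerprint(
    ordered.map((r) => r.repo + ":" + r.runId + ":" + r.sourceFingerprint).join("\n")
  );
  return { scope, repos: repos.sort(), records: ordered, lifecycleCounts, sourceFingerprint };
}
```

Because the index fingerprint depends only on the ordered record fingerprints, rebuilding from the same sources yields the same index, and any drift in a covered state.json is detectable without trusting the persisted copy.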

3. Lifecycle Is Classified, Never Invented

Lifecycle is CLASSIFIED from existing state. deriveLifecycle applies the following rules to a run's source state — first match wins:

1. running tasks > 0                              -> running
2. open feedback > 0                              -> blocked   (failures under correction)
3. failed tasks > 0                               -> failed
4. tasks > 0 and all tasks completed              -> completed
5. verifier-gated commits > 0 and nothing pending -> completed (commit-only runs)
6. completed tasks > 0                            -> running   (mid-flight)
7. otherwise                                      -> queued

archived is an OVERLAY disposition applied on top of this. The surfaced lifecycle becomes archived, but derivedLifecycle preserves the source-derived state so search and history can still match the underlying run. The classifier never reads the cache; it reads source state.json.
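The seven first-match rules and the archive overlay translate directly into code. The sketch below mirrors the documented rules of deriveLifecycle but is not that function. The SourceSummary shape (task counts by status, openFeedbackCount, verifierGatedCommitCount), the names classifyLifecycle and surfaceLifecycle, and the reading of "nothing pending" as zero pending tasks are assumptions.

```typescript
// Hypothetical sketch of the documented first-match rules plus the archive overlay.
type DerivedLifecycle = "queued" | "running" | "blocked" | "failed" | "completed";
type SurfacedLifecycle = DerivedLifecycle | "archived";

// Assumed summary of a run's source state.json; real field names may differ.
interface SourceSummary {
  tasks: { total: number; running: number; failed: number; completed: number; pending: number };
  openFeedbackCount: number;
  verifierGatedCommitCount: number;
}

// Reads only source-derived counts, never the registry cache. First match wins.
function classifyLifecycle(s: SourceSummary): DerivedLifecycle {
  const t = s.tasks;
  if (t.running > 0) return "running"; // 1
  if (s.openFeedbackCount > 0) return "blocked"; // 2: failures under correction
  if (t.failed > 0) return "failed"; // 3
  if (t.total > 0 && t.completed === t.total) return "completed"; // 4
  if (s.verifierGatedCommitCount > 0 && t.pending === 0) return "completed"; // 5: commit-only runs
  if (t.completed > 0) return "running"; // 6: mid-flight
  return "queued"; // 7
}

interface ArchiveMark {
  archivedAt: string;
  reason?: string;
}

// The overlay changes only the surfaced lifecycle; derivedLifecycle keeps the source view.
function surfaceLifecycle(s: SourceSummary, mark?: ArchiveMark) {
  const derivedLifecycle = classifyLifecycle(s);
  const lifecycle: SurfacedLifecycle = mark ? "archived" : derivedLifecycle;
  return { lifecycle, derivedLifecycle, archived: mark !== undefined, archivedAt: mark ? mark.archivedAt : null };
}
```

Ordering matters: a run with both a running task and open feedback reports running, and a run with failures under correction reports blocked rather than failed.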

4. Append-Only History: Resume, Rerun, Archive

History is append-only. Resume continues a run, rerun creates a NEW linked run, and archive marks rather than deletes.

  • Search: run search queries by --app, --status, time range (--since, --until), --repo, and free text (--text, matched over runId, app, workflow, title, repo, lifecycle, loop stage, and a bounded digest of run inputs). Results are deterministic (ordered by createdAt, then runId) and paginated (--limit, --offset). Search is cross-repo by default (--scope home); use --scope repo to restrict it to the current repo. Archived runs are included by default and can be excluded with --include-archived false.
  • Resume: run resume <run-id> resolves a run by id across the registry (not just the cwd), loads its durable state, and returns the next runnable tasks and next actions for the host to execute. Resume is read-only over source: it never mutates state.json and never un-archives a run.
  • Archive: run archive <run-id> writes an overlay mark to the owning repo's registry/archive.json; the run's state.json is never moved or deleted, and the run stays searchable (its derivedLifecycle is preserved). --unarchive clears the mark. Retention is POLICY: run archive --older-than-days N [--state completed --state failed] archives eligible runs older than the window. The default policy archives nothing (archiveOlderThanDays = 0) until a window is given.
  • Rerun: run rerun <run-id> re-runs a failed run as a NEW run. It reuses the original inputs and app, lands the new run beside the original (same repo), and records a provenance link (rerunOf, rerunOfRepo, originRunId, generation, reason) in the repo's registry/provenance.json. The original failed run is PRESERVED for audit; the past is never overwritten. Rerunning a rerun increments generation and keeps originRunId pinned to the chain root.
  • History: history reads a unified timeline of runs across all registered repos (newest first), each entry carrying its repo, lifecycle, loop stage, timestamps, freshness, and provenance back to its .cw/runs/<id>/. Filter with --app and --status; paginate with --limit and --offset.
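Two behaviors in the list above carry precise rules: search ordering with pagination, and the rerun provenance chain. The following sketch shows both over in-memory records. The function and type names are hypothetical, and so are three details this page leaves open: a status filter matches either the surfaced lifecycle or derivedLifecycle, the --since and --until bounds are inclusive, and the first rerun in a chain has generation 1.

```typescript
// Hypothetical sketch: deterministic, paginated search and a rerun provenance link.
interface SearchableRun {
  runId: string;
  appId: string;
  workflowId: string;
  title: string;
  repo: string;
  lifecycle: string;
  derivedLifecycle: string;
  loopStage: string;
  inputsDigest: string; // bounded digest of run inputs
  createdAt: string; // ISO-8601
  archived: boolean;
}

interface SearchQuery {
  app?: string; status?: string; text?: string; repo?: string;
  since?: string; until?: string; includeArchived?: boolean;
  limit?: number; offset?: number;
}

function searchRuns(runs: SearchableRun[], q: SearchQuery): { total: number; results: SearchableRun[] } {
  const needle = q.text ? q.text.toLowerCase() : null;
  const since = q.since ? Date.parse(q.since) : -Infinity;
  const until = q.until ? Date.parse(q.until) : Infinity;
  const matched = runs.filter((r) => {
    if (q.includeArchived === false && r.archived) return false; // archived included by default
    if (q.app && r.appId !== q.app) return false;
    if (q.repo && r.repo !== q.repo) return false;
    // Assumption: status matches the surfaced or the source-derived lifecycle.
    if (q.status && r.lifecycle !== q.status && r.derivedLifecycle !== q.status) return false;
    const t = Date.parse(r.createdAt);
    if (t < since || t > until) return false; // assumption: inclusive bounds
    if (needle) {
      const hay = [r.runId, r.appId, r.workflowId, r.title, r.repo, r.lifecycle, r.loopStage, r.inputsDigest]
        .join("\n")
        .toLowerCase();
      if (hay.indexOf(needle) < 0) return false;
    }
    return true;
  });
  // Sort before paginating so a given offset always addresses the same slice.
  matched.sort((a, b) => {
    const d = Date.parse(a.createdAt) - Date.parse(b.createdAt);
    if (d !== 0) return d;
    return a.runId < b.runId ? -1 : a.runId > b.runId ? 1 : 0;
  });
  const offset = q.offset ?? 0;
  const end = q.limit === undefined ? matched.length : offset + q.limit;
  return { total: matched.length, results: matched.slice(offset, end) };
}

interface RerunProvenance {
  rerunOf: string;
  rerunOfRepo: string;
  originRunId: string;
  generation: number;
  reason?: string;
}

// Link for a NEW run; the original run's record is not modified.
function provenanceForRerun(
  original: { runId: string; repo: string; provenance?: RerunProvenance },
  reason?: string
): RerunProvenance {
  const parent = original.provenance;
  return {
    rerunOf: original.runId,
    rerunOfRepo: original.repo, // the new run lands beside the original
    originRunId: parent ? parent.originRunId : original.runId, // pinned to the chain root
    generation: parent ? parent.generation + 1 : 1, // assumption: first rerun is generation 1
    reason,
  };
}
```

Keeping originRunId fixed while generation increments lets a reader walk from any rerun back to the first failed run without rewriting earlier links.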

5. The Queue Records Order; the Host Executes

queue add appends a durable entry to $CW_HOME/registry/queue.json with an explicit --priority (lower drains first; ties break by enqueue time, then id). queue list prints the queue in policy order; queue show <id> shows one entry. queue drain [--limit N] marks the next ready entries drained and returns them — CW records order and readiness; the HOST still executes the workers. Nothing in the queue spawns work on its own.
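Queue ordering and drain can be sketched as pure functions over the entries in queue.json. The status values ready and drained, the entry fields, the required limit argument, and the function names are assumptions; the ordering rule (priority, then enqueue time, then id) is the one stated above. drainQueue only marks and returns entries, so launching workers stays with the host.

```typescript
// Hypothetical sketch of queue ordering and drain: the queue records, the host executes.
type QueueStatus = "ready" | "drained";

interface QueueEntry {
  id: string;
  priority: number; // lower drains first
  enqueuedAt: string; // ISO-8601
  status: QueueStatus;
  repo?: string;
}

// Policy order: priority, then enqueue time, then id.
function policyOrder(a: QueueEntry, b: QueueEntry): number {
  if (a.priority !== b.priority) return a.priority - b.priority;
  const ta = Date.parse(a.enqueuedAt);
  const tb = Date.parse(b.enqueuedAt);
  if (ta !== tb) return ta - tb;
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}

function listQueue(entries: readonly QueueEntry[]): QueueEntry[] {
  return entries.slice().sort(policyOrder);
}

// Marks up to `limit` ready entries as drained and returns them in policy order.
// Nothing is executed; a new queue array is returned for the caller to persist.
function drainQueue(
  entries: readonly QueueEntry[],
  limit: number,
  repo?: string
): { queue: QueueEntry[]; drained: QueueEntry[] } {
  const picked: string[] = [];
  for (const e of listQueue(entries)) {
    if (picked.length >= limit) break;
    if (e.status === "ready" && (repo === undefined || e.repo === repo)) picked.push(e.id);
  }
  const queue = entries.map((e): QueueEntry => (picked.indexOf(e.id) >= 0 ? { ...e, status: "drained" } : e));
  const drained = listQueue(queue).filter((e) => picked.indexOf(e.id) >= 0);
  return { queue, drained };
}
```

Returning a new array rather than mutating in place keeps the "record order, do not execute" contract visible: the only side effect a caller can perform is writing the updated queue back.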

6. Fail Closed on Stale or Missing

registry show recomputes the current source fingerprint for every run and compares it to the persisted index. If a run's source changed, the report status is stale and the run is named in staleRuns. If a persisted run's source is gone, the run is named in missingRuns, it is NOT fabricated into the current records, and the next action is registry refresh. run show of a run whose source is missing returns found: false with freshness: missing and only the last-known persisted record, clearly flagged — never as a live status. An unreadable or unsupported run state is treated as missing, never as success.
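The fail-closed comparison amounts to a diff between re-derived fingerprints and the persisted index. In this sketch the caller supplies one SourceRead per run id. The type names, the MAX_SUPPORTED_SCHEMA value, reporting stale when only missingRuns is non-empty, and suggesting registry refresh for an absent index are assumptions; the valid/stale/absent vocabulary and the staleRuns/missingRuns lists come from this page.

```typescript
// Hypothetical sketch: compare re-derived source fingerprints with the persisted index.
type SourceRead =
  | { kind: "ok"; schemaVersion: number; fingerprint: string }
  | { kind: "unreadable" };

interface PersistedEntry {
  runId: string;
  sourceFingerprint: string;
}

interface FreshnessReport {
  freshness: "valid" | "stale" | "absent";
  staleRuns: string[];
  missingRuns: string[];
  liveRunIds: string[]; // only runs whose source is readable and supported right now
  nextAction: string | null;
}

const MAX_SUPPORTED_SCHEMA = 1; // hypothetical value

// Absent, unreadable, or newer-than-supported state is treated as missing, never as success.
function liveFingerprint(read: SourceRead | undefined): string | null {
  if (read === undefined || read.kind !== "ok") return null;
  return read.schemaVersion > MAX_SUPPORTED_SCHEMA ? null : read.fingerprint;
}

function freshnessReport(
  persisted: PersistedEntry[] | null,
  sources: { [runId: string]: SourceRead | undefined }
): FreshnessReport {
  const liveRunIds = Object.keys(sources).filter((id) => liveFingerprint(sources[id]) !== null).sort();
  if (persisted === null) {
    return { freshness: "absent", staleRuns: [], missingRuns: [], liveRunIds, nextAction: "registry refresh" };
  }
  const staleRuns: string[] = [];
  const missingRuns: string[] = [];
  for (const entry of persisted) {
    const current = liveFingerprint(sources[entry.runId]);
    if (current === null) missingRuns.push(entry.runId); // named, never fabricated into live records
    else if (current !== entry.sourceFingerprint) staleRuns.push(entry.runId);
  }
  const clean = staleRuns.length === 0 && missingRuns.length === 0;
  return {
    freshness: clean ? "valid" : "stale",
    staleRuns: staleRuns.sort(),
    missingRuns: missingRuns.sort(),
    liveRunIds,
    nextAction: clean ? null : "registry refresh",
  };
}
```

A missing run appears only in missingRuns, never in liveRunIds, which is the "clearly flagged, never a live status" rule expressed as data.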

7. One Selection, Two Surfaces

Every command is declared once in the v0.1.28 capability registry (src/capability-registry.ts) and rendered on both surfaces, so cw <cmd> --json is schema-identical to the matching cw_<tool> result and the pair passes npm run parity:check:

node scripts/cw.js registry refresh [--scope repo|home] [--json]
node scripts/cw.js registry show [--scope repo|home] [--json]
node scripts/cw.js run search [--app ID] [--status STATE] [--text Q] [--repo PATH] [--since ISO] [--until ISO] [--limit N] [--offset N] [--scope repo|home] [--json]
node scripts/cw.js run list [--scope repo|home] [--json]
node scripts/cw.js run show <run-id> [--scope repo|home] [--json]
node scripts/cw.js run resume <run-id> [--limit N] [--json]
node scripts/cw.js run archive <run-id> [--reason TEXT] [--unarchive]
node scripts/cw.js run archive --older-than-days N [--state completed --state failed]
node scripts/cw.js run rerun <run-id> [--reason TEXT]
node scripts/cw.js queue add [--app ID|--workflow ID|--runId ID] [--repo PATH] [--priority N] [--note TEXT]
node scripts/cw.js queue list [--status STATE] [--repo PATH] [--json]
node scripts/cw.js queue show <queue-id>
node scripts/cw.js queue drain [--limit N] [--repo PATH]
node scripts/cw.js history [--app ID] [--status STATE] [--limit N] [--offset N] [--scope repo|home] [--json]

The MCP tools mirror the CLI one-for-one: cw_registry_refresh, cw_registry_show, cw_run_search, cw_run_list, cw_run_show, cw_run_resume, cw_run_archive, cw_run_rerun, cw_queue_add, cw_queue_list, cw_queue_drain, cw_queue_show, and cw_history. Read commands print terse human panels by default (lifecycle, freshness, counts, and next action) and full machine output under --json or --format json.
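The "declare once, render twice" idea can be illustrated with a single declaration feeding both a CLI JSON renderer and an MCP-style result. This is a hypothetical sketch, not src/capability-registry.ts or the parity:check script: CapabilityDecl, the two renderers, and parityHolds are invented for illustration. The command-to-tool naming rule is inferred from the pairs listed above, each of which follows it.

```typescript
// Hypothetical sketch: one declaration, two renderings, and a parity check.
type Args = { [name: string]: unknown };
type Handler = (args: Args) => object;

interface CapabilityDecl {
  command: string; // e.g. "run search"
  handler: Handler; // the single data source both surfaces render
}

// Naming pattern visible in the lists above: "run search" -> "cw_run_search".
function mcpToolName(command: string): string {
  return "cw_" + command.trim().split(/\s+/).join("_");
}

// CLI surface: `cw <command> --json` prints the handler result as JSON text.
function renderCliJson(decl: CapabilityDecl, args: Args): string {
  return JSON.stringify(decl.handler(args));
}

// MCP surface: the matching tool returns the same handler result.
function renderMcp(decl: CapabilityDecl, args: Args): { tool: string; result: object } {
  return { tool: mcpToolName(decl.command), result: decl.handler(args) };
}

// Parity holds when both surfaces serialize to the same JSON for the same input.
function parityHolds(decl: CapabilityDecl, args: Args): boolean {
  const cli = JSON.parse(renderCliJson(decl, args));
  return JSON.stringify(cli) === JSON.stringify(renderMcp(decl, args).result);
}
```

Comparing serialized values is stricter than comparing schemas; it is enough here only because both surfaces call the same deterministic handler.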

8. Backward Compatible by Construction

Pre-0.1.28 single-repo runs and existing .cw/runs/ layouts keep working with an empty, rebuildable registry: registry show reports absent until the first registry refresh, and every pre-0.1.28 CLI command and MCP tool is unchanged. No run-state schema change ships in v0.1.28; newer unsupported run-state schemas still fail closed.

Why It Matters

The registry is the control plane that turns isolated per-run state into an operable fleet: you can find any run across any registered repo, continue it, queue more work, retire the old, and rerun the failed — without the index ever becoming the authority a state.json is. Because every read re-derives from source and fails closed on drift, the control plane is trustworthy precisely because it is disposable. It composes cleanly with the rest of CW: CLI MCP Parity guarantees both surfaces render one data source, and Execution Backends keep the result/evidence envelope backend-independent, so the registry stays agnostic to who executed a run.

See Also
