Observability Cost Accounting
Time/duration, failure/verifier/acceptance rates, and token/cost: all DERIVED from run state CW already keeps, with host-attested usage so cost is honest. Shipped in v0.1.31. Repo doc: docs/observability-cost-accounting.7.md.
The design mantra for this layer:
Derive, never collect.
A counter you cannot trust is worse than none.
Cost is attested, never measured.
Pricing is data, not kernel.
Fail closed: n/a over a fabricated zero.
CW models metrics the way a Unix base system exposes state — a DERIVED projection of records that already exist, never a separate database. It follows the same philosophy as Run Registry Control Plane and State Explosion Management: the per-run .cw/runs/<id>/state.json is the SINGLE source of truth, metrics are a read-only projection of it, and there is no telemetry pipeline, no background collector daemon, and no hidden counters. Before v0.1.31 there was no metrics module and no token or cost field anywhere; run state already carried createdAt/updatedAt/completedAt/dispatchedAt and outcome statuses on tasks, workers, verifier nodes, candidates, memberships, and feedback. This release projects those into a report — and adds an additive, host-attested usage record so cost can be accounted honestly — without changing the ResultEnvelope schema and without taking ownership of source truth.
durable run state -> pure projection -> rates/durations + attested usage -> cost under a policy
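The first arrow of that pipeline, durable state to pure projection, can be sketched for durations alone. This is a minimal illustration, not the real module: the trimmed-down RunState and TaskRecord shapes, the DurationReport type, and the function names are all hypothetical, and returning null for a missing timestamp is an assumption.

```typescript
// Hypothetical, trimmed-down shapes; the real run state carries many more fields.
interface TaskRecord {
  id: string;
  dispatchedAt?: string; // ISO 8601 timestamp recorded at dispatch
  completedAt?: string;  // ISO 8601 timestamp recorded at completion
}

interface RunState {
  createdAt: string;
  updatedAt: string;
  tasks: TaskRecord[];
}

interface DurationReport {
  generatedAt: string;                   // the only field derived from `now`
  runMs: number | null;
  taskMs: Record<string, number | null>; // null = not derivable (assumption), never a made-up 0
}

// Integer milliseconds between two recorded timestamps,
// or null if either one is missing or unparseable.
function elapsedMs(start?: string, end?: string): number | null {
  if (start === undefined || end === undefined) return null;
  const a = Date.parse(start);
  const b = Date.parse(end);
  return Number.isNaN(a) || Number.isNaN(b) ? null : b - a;
}

// Pure projection: reads only its arguments, writes nothing, never reads the clock.
function deriveDurations(run: RunState, now: Date): DurationReport {
  const taskMs: Record<string, number | null> = {};
  for (const t of run.tasks) {
    taskMs[t.id] = elapsedMs(t.dispatchedAt, t.completedAt);
  }
  return {
    generatedAt: now.toISOString(),
    runMs: elapsedMs(run.createdAt, run.updatedAt),
    taskMs,
  };
}

const exampleRun: RunState = {
  createdAt: "2025-01-01T00:00:00.000Z",
  updatedAt: "2025-01-01T00:05:00.000Z",
  tasks: [
    { id: "t1", dispatchedAt: "2025-01-01T00:01:00.000Z", completedAt: "2025-01-01T00:03:30.000Z" },
    { id: "t2", dispatchedAt: "2025-01-01T00:04:00.000Z" }, // no completedAt recorded
  ],
};

const exampleReport = deriveDurations(exampleRun, new Date("2025-01-02T00:00:00.000Z"));
// exampleReport.runMs is 300000, taskMs.t1 is 150000, taskMs.t2 is null
```

Because `now` is injected rather than read from the clock, calling the function again over the same snapshot with a different `now` changes only generatedAt.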
The runtime is MECHANISM — it records attested usage and derives rates/durations. The pricing table is POLICY — supplied as DATA (CostPolicy), not baked into the kernel. The same attested usage yields different cost reports under different pricing without touching the runtime. A bundled EXAMPLE policy lives at manifest/pricing.policy.json (USD per 1e6 tokens, an editable starting point — not a live price feed); pass --pricing <path> to use your own, or --pricing default for the bundled example. With no policy supplied, cost is unpriced/unreported, never guessed. CW never measures cost.
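To make the mechanism/policy split concrete, here is a hedged sketch of pricing the same attested usage under different policies, using the four cost states this page defines (attested, estimated, unpriced, unreported). The field names (inputUsdPerMTok, models, and so on), the model id, and the prices are invented for illustration and are not the real CostPolicy schema. Treating a run as unpriced as soon as any attested unit has neither an exact entry nor a default is also an assumption.

```typescript
// Hypothetical shapes: field names are illustrative, not the real CostPolicy schema.
interface ModelPrice {
  inputUsdPerMTok: number;  // USD per 1e6 input tokens
  outputUsdPerMTok: number; // USD per 1e6 output tokens
}
interface CostPolicy {
  models: Record<string, ModelPrice>;
  defaultPrice?: ModelPrice; // fallback; anything priced by it is only an estimate
}
interface AttestedUsage {
  model: string;
  inputTokens: number;
  outputTokens: number;
}
type CostState = "attested" | "estimated" | "unpriced" | "unreported";
interface CostReport {
  state: CostState;
  attestedUsd: number | null;  // exact-model-match portion only
  estimatedUsd: number | null; // defaultPrice portion only, kept separate
}

function usd(u: AttestedUsage, p: ModelPrice): number {
  return (u.inputTokens * p.inputUsdPerMTok + u.outputTokens * p.outputUsdPerMTok) / 1e6;
}

function priceUsage(usage: AttestedUsage[], policy?: CostPolicy): CostReport {
  if (usage.length === 0) return { state: "unreported", attestedUsd: null, estimatedUsd: null };
  if (policy === undefined) return { state: "unpriced", attestedUsd: null, estimatedUsd: null };

  let attested = 0;
  let estimated = 0;
  let exactUnits = 0;
  let fallbackUnits = 0;
  for (const u of usage) {
    const hasExact = Object.prototype.hasOwnProperty.call(policy.models, u.model);
    if (hasExact) {
      attested += usd(u, policy.models[u.model]);
      exactUnits++;
    } else if (policy.defaultPrice !== undefined) {
      estimated += usd(u, policy.defaultPrice);
      fallbackUnits++;
    } else {
      // Assumption: fail closed rather than report a partial total.
      return { state: "unpriced", attestedUsd: null, estimatedUsd: null };
    }
  }
  return {
    state: fallbackUnits > 0 ? "estimated" : "attested",
    attestedUsd: exactUnits > 0 ? attested : null,
    estimatedUsd: fallbackUnits > 0 ? estimated : null,
  };
}

// Made-up model id and prices, for illustration only.
const sampleUsage: AttestedUsage[] = [
  { model: "model-a", inputTokens: 2000000, outputTokens: 500000 },
];
const exactPolicy: CostPolicy = {
  models: { "model-a": { inputUsdPerMTok: 3, outputUsdPerMTok: 15 } },
};
const fallbackPolicy: CostPolicy = {
  models: {},
  defaultPrice: { inputUsdPerMTok: 1, outputUsdPerMTok: 2 },
};

const exactCost = priceUsage(sampleUsage, exactPolicy);       // attested, 13.5 USD
const fallbackCost = priceUsage(sampleUsage, fallbackPolicy); // estimated, 3 USD
const noPolicyCost = priceUsage(sampleUsage);                 // unpriced, no figure at all
```

The usage array never changes between the three calls; only the policy data does, which is the property the paragraph above describes.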
The same attested usage also drives bounded dynamic control flow: a loop(...) with until:{kind:"budget-target", target} scales its rounds toward a token target read from this attested-only usage, while the fail-closed limits.tokenBudget cap stays the absolute backstop that can never be overshot.
- Derive, never collect. Every number is a projection of existing durable state. deriveMetricsReport(run, { now, policy }) is a PURE function of one run's state, an injected now, and an optional pricing policy. There is no metrics store. The only now-derived field is generatedAt; durations are computed from recorded timestamps, so a report over a fixed snapshot is byte-reproducible (eval/replay agnostic).
- Durations come from recorded timestamps. dispatchedAt → completedAt for tasks, createdAt → worker output recordedAt for workers, createdAt → updatedAt for the run. Nothing is sampled live.
- Three rates, each with its own arithmetic. The failure rate pools failed/rejected workers, failed memberships, failed un-worker-backed tasks, and unresolved (open/tasked) feedback over the total of those samples. The verifier pass rate counts verifier state nodes whose status is a pass (verified/completed/committed) against decided gates (pass + failed/rejected/blocked); pending/running gates are undecided and excluded. The candidate acceptance rate counts selected/verified candidates over all candidate records.
- A counter you cannot trust is worse than none. Each rate is a RateMetric carrying state (ok/n/a), count, total, rate, and per-bucket sample counts. Over zero samples the state is n/a and count/rate are null, never 0. No divide-by-zero, no partial-data rate presented as complete. Sample counts and buckets accompany every rate so a reader can audit the numerator and denominator.
- Cost is attested, never measured or fabricated. CW does not call the model; the host/worker does. Token usage is recorded as HOST-ATTESTED provenance: a UsageRecord accepted on the existing intake path and stored on the task or worker record, never on ResultEnvelope. CW records what the host attests verbatim and synthesizes nothing. When the host reports no usage the value is an explicit unreported, never 0, never a silent guess. The report surfaces usage.coverage (the fraction of work units carrying attested usage) and usage.unreportedUnits so the gap is visible.
- Attested and estimated cost are never conflated. A monetary figure is attested ONLY when derived from attested usage × a recorded pricing policy with an EXACT model match. When a model is priced by the policy's defaultPrice fallback, that portion is a SEPARATE estimated figure and the cost state becomes estimated; the two USD figures are never conflated into one. Cost states: attested (every attested model exact-matched a policy entry), estimated (some attested usage priced by the policy default/fallback), unpriced (attested usage present but no policy entry and no default), unreported (no attested usage to price).
- One source, every surface. The metrics verbs are declared once in src/capability-registry.ts, so the CLI and MCP surfaces are two renderings of one core (src/observability.ts) and pass the v0.1.27 parity gate: cw <cmd> --json is byte-identical to cw_<tool> (durations are integers from recorded timestamps; only the ISO generatedAt is now-derived and neutralized by the parity probe). The v0.1.30 Workbench renders a read-only metrics panel from the same payload, showing coverage and unreported/n/a honestly; it shows nothing the CLI/MCP cannot.
- Snapshots are rebuildable and freshness is fail-closed. The per-run report is persisted as a rebuildable, fingerprinted snapshot under .cw/runs/<id>/metrics/metrics-report.json; the cross-repo summary reports each run's snapshot freshness as valid|stale|absent against current source, fail closed, exactly like the registry.
- Backward compatible by construction. Usage/cost fields are additive and optional. Old runs load and report unreported cost while still yielding correct time and rate metrics from their existing timestamps and outcomes; the ResultEnvelope schema is unchanged.
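The fail-closed rate arithmetic from the list above can be sketched for the verifier pass rate. The RateMetric fields and the status names follow the text; the exact TypeScript shape, the buckets field name, and the function name are assumptions.

```typescript
// Field names follow the RateMetric description; the exact shape is an assumption.
interface RateMetric {
  state: "ok" | "n/a";
  count: number | null;            // numerator, null when there are no samples
  total: number;                   // denominator (decided gates only)
  rate: number | null;             // count / total, null when there are no samples
  buckets: Record<string, number>; // per-status sample counts, for auditing
}

const PASS_STATUSES = new Set(["verified", "completed", "committed"]);
const FAIL_STATUSES = new Set(["failed", "rejected", "blocked"]);

// Verifier pass rate: passes over decided gates. Undecided gates
// (for example pending or running) are excluded from both sides.
function verifierPassRate(statuses: string[]): RateMetric {
  const buckets: Record<string, number> = {};
  let pass = 0;
  let decided = 0;
  for (const s of statuses) {
    const isPass = PASS_STATUSES.has(s);
    if (!isPass && !FAIL_STATUSES.has(s)) continue; // undecided: not a sample
    buckets[s] = (buckets[s] ?? 0) + 1;
    decided++;
    if (isPass) pass++;
  }
  if (decided === 0) {
    // Fail closed: no samples means n/a with nulls, never a rate of 0.
    return { state: "n/a", count: null, total: 0, rate: null, buckets };
  }
  return { state: "ok", count: pass, total: decided, rate: pass / decided, buckets };
}

const mixedGates = verifierPassRate(["verified", "failed", "running", "committed", "pending"]);
// state "ok", count 2, total 3: the running and pending gates are not counted

const undecidedGates = verifierPassRate(["running", "pending"]);
// state "n/a", count null, rate null
```

Keeping the buckets next to the rate is what lets a reader recompute the numerator and denominator by hand.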
- cw metrics show <run-id>: derived per-run report with durations, the three rates with sample counts, attested usage with coverage, and cost. Flags: --json for the canonical payload; --pricing <path>|default.
- cw metrics summary: cross-repo rollup over the v0.1.28 run registry with pooled rates, summed attested usage/cost with coverage, and per-app and per-backend breakdowns. Flags: --scope repo|home. Unreadable runs are counted (unreadableRuns), never dropped.
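The valid|stale|absent freshness that the summary reports for each run's snapshot can be illustrated with a small check. This is only a sketch: the MetricsSnapshot shape, the sourceFingerprint field, and the FNV-1a-style hash are stand-ins chosen to keep the example self-contained, not the fingerprint CW actually uses.

```typescript
type Freshness = "valid" | "stale" | "absent";

// Hypothetical snapshot shape: the report plus a fingerprint of the source it was built from.
interface MetricsSnapshot {
  sourceFingerprint: string;
  report: unknown;
}

// Stand-in fingerprint: a 32-bit FNV-1a-style hash over the raw state text.
// Illustrative only; a real implementation would likely use a cryptographic digest.
function fingerprint(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Fail closed: a snapshot is trusted only if it matches the current source exactly.
function snapshotFreshness(currentStateText: string, snapshot: MetricsSnapshot | null): Freshness {
  if (snapshot === null) return "absent";
  return snapshot.sourceFingerprint === fingerprint(currentStateText) ? "valid" : "stale";
}

const stateAtSnapshot = '{"status":"a"}';
const stateNow = '{"status":"b"}';
const snapshot: MetricsSnapshot = {
  sourceFingerprint: fingerprint(stateAtSnapshot),
  report: {},
};

const freshWhenUnchanged = snapshotFreshness(stateAtSnapshot, snapshot); // "valid"
const freshAfterEdit = snapshotFreshness(stateNow, snapshot);            // "stale"
const freshWhenMissing = snapshotFreshness(stateNow, null);              // "absent"
```

A stale or absent snapshot costs nothing but a rebuild, since the report is a projection of state.json and can always be derived again.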
Usage attestation rides the existing intake path:
cw result <run-id> <task-id> <file> --usage-input-tokens N --usage-output-tokens M --usage-model ID --usage-source host-attested
cw worker output <run-id> <worker-id> <file> --usage-input-tokens N --usage-output-tokens M --usage-model ID
MCP hosts call cw_metrics_show and cw_metrics_summary with identical payloads.
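How the --usage-* flags on the intake commands above might turn into a stored record can be sketched as follows. The flag names come from those commands; everything else is an assumption made for illustration: the UsageRecord field names, rejecting a partial attestation with an error, and defaulting the source to host-attested when --usage-source is omitted.

```typescript
// Hypothetical record shape; only the flag names are taken from the CLI examples.
interface UsageRecord {
  inputTokens: number;
  outputTokens: number;
  model: string;
  source: string;
}
type Usage = UsageRecord | "unreported";

// Accept only plain non-negative integers; anything else is rejected, not coerced.
function parseTokenCount(flag: string, raw: string): number {
  if (!/^\d+$/.test(raw)) {
    throw new Error(`--${flag} must be a non-negative integer, got "${raw}"`);
  }
  return Number(raw);
}

// `flags` maps a flag name (without the leading dashes) to its raw string value.
function usageFromFlags(flags: Record<string, string | undefined>): Usage {
  const input = flags["usage-input-tokens"];
  const output = flags["usage-output-tokens"];
  const model = flags["usage-model"];

  // Nothing attested: record an explicit "unreported", never zero tokens.
  if (input === undefined && output === undefined && model === undefined) {
    return "unreported";
  }
  // Assumption: a partial attestation is rejected instead of having gaps filled in.
  if (input === undefined || output === undefined || model === undefined) {
    throw new Error("partial usage attestation: input tokens, output tokens, and model are all required");
  }
  return {
    inputTokens: parseTokenCount("usage-input-tokens", input),
    outputTokens: parseTokenCount("usage-output-tokens", output),
    model,
    source: flags["usage-source"] ?? "host-attested", // assumed default
  };
}

const attestedExample = usageFromFlags({
  "usage-input-tokens": "1200",
  "usage-output-tokens": "340",
  "usage-model": "model-a", // made-up model id
  "usage-source": "host-attested",
});
const unreportedExample = usageFromFlags({}); // "unreported"
```

The values pass through unchanged apart from the string-to-integer conversion, in keeping with recording what the host attests verbatim.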
Metrics that are collected drift from truth, fabricate zeros over empty samples, and quietly conflate measured cost with guessed cost. CW does none of that: every rate is auditable against its own numerator/denominator, fails to n/a rather than lying, and cost is attested provenance priced by a data policy CW never measures — with the unreported gap always visible. You get honest accounting that stays byte-reproducible across CLI, MCP, and Workbench.
- Architecture Principles
- Runtime Contract
- CLI MCP Parity
- Run Registry Control Plane
- Execution Backends
- Web Desktop Workbench — the read-only metrics panel rendered from this payload
- Team Collaboration — v0.1.32, parity-gated and Workbench-rendered like these verbs
- Release Tooling
- Repo doc: plugins/cool-workflow/docs/observability-cost-accounting.7.md
Organized from local Obsidian notes and reconciled with the current
coo1white/cool-workflow repository state.