feat(#411): canary invariant harness Phase 1 (S-01, E-02, L-03) by obasilakis · Pull Request #653 · Abilityai/trinity

obasilakis · 2026-05-04T21:44:34Z

Summary

Implements Phase 1 of #411 — a continuous orchestration-invariant harness that catches the bug class behind PRs #378, #403, #129, #226: race conditions and cross-component state drift between Redis × SQLite × agent registries that unit tests miss.

Three components, each justified separately in the PR planning thread:

Deterministic library (src/backend/canary/) — pure-function checks shared between the watcher and the on-demand admin endpoint. No LLM reasoning anywhere; same snapshot in, same violations out.
Watcher service (services/canary_service.py) — 5-min loop modeled on cleanup_service.py. Disabled by default; enable on staging/dev with CANARY_ENABLED=1.
Fleet manifest (config/canary-fleet.yaml) — synthetic load via the existing /api/systems/deploy mechanism. Without traffic the harness reports trivially-green cycles.

Phase 1 invariants

ID	Check	Catches
S-01	Slot–row bijection: `ZRANGE agent:slots:{name}` (drain sentinels filtered) == SQL running rows for that agent	#378/#403 phantom-stale-slot
E-02	No phantom reversal: terminal executions stay terminal across cycles. Phase 1 uses Redis-backed state comparison (`canary:e02:terminal_seen`) instead of Vector log diff for simplicity.	Same class as #378/#403
L-03	Delete cascades: no orphan rows in `agent_sharing`, `agent_schedules`, non-terminal `schedule_executions`, `agent_skills`, `agent_tags`, `agent_shared_files`, `agent_public_links`, pending `operator_queue`/`access_requests`, agent-scoped `mcp_api_keys`, active `chat_sessions`, plus orphan `agent:slots:*` Redis keys	Issue #129 family

Severity: critical for active-orchestration orphans (running executions, Redis slot keys); major for dangling permissions/sharing/tags.

Terminal-execution set used by E-02 + L-03 is sourced from TaskExecutionStatus (success, failed, cancelled, skipped) — the same set PR #524's CAS state machine treats as write-once.

Why a separate `snapshot.py` from the invariant checks

Stated in the module docstring; summarised here:

One consistent view per cycle. All three Phase 1 invariants read overlapping state (Redis ZSETs, the agent_ownership set, execution rows). A shared Snapshot instance means per-check timing drift can't introduce spurious mismatches.
No duplicated query code. New invariants are pure functions (snapshot) → list[ViolationReport]; they never re-implement SELECTs or ZRANGEs.
Test-friendly. Tests pass synthetic Snapshot dataclasses straight in (tests/test_canary_invariants.py) — no live Redis or SQLite required to exercise checking logic.

The snapshot is not atomic across Redis and SQLite — those don't share transactions. The harness deliberately accepts sub-second inconsistencies because real bugs persist across a 5-minute cycle by definition; transient races self-resolve and aren't what the canary exists to catch.

Transition-detection rule

A green→red transition fires exactly one Slack alert.

Continuing-red cycles persist new violation rows (audit history) but stay silent on the alert sink — preventing alert fatigue.
Red→green→red fires a fresh alert on the second red flip.

Implementation uses a Redis cursor canary:last_cycle_at so each cycle compares this snapshot's violation timestamps against the previous cycle's snapshot, not the latest persisted violation. The router endpoint surfaces exactly the transitions the service emitted (no recomputation), so the API response and the alert sink can never disagree.

Alert sink (Slack incoming webhook)

The harness fires green→red alerts via a Slack incoming webhook — admin-side configuration, no Settings UI. The canary is staging/dev-only and the operator already has shell access.

Configuration: CANARY_SLACK_WEBHOOK_URL env var. Unset = silent sink (cycles still run, violations still persist, debug-line records the detected transition).
Payload: Block Kit — header (🚨 Canary: <invariant_id>), summary (severity, violation count, snapshot time), forensic detail per violation, runbook hint, "last red Xm ago" context badge when reopening after a green window.
Per-transition: exactly one POST per green→red flip per invariant. Continuing-red cycles do not re-post.
Failure handling: 5s timeout enforced by slack_service.post_webhook; failures are logged and swallowed so a hung webhook can't break the cycle. The violation row is still persisted regardless of webhook outcome.
No admin bypass anywhere: the alert path doesn't touch the in-app notification bell, doesn't insert into notifications, and doesn't change any per-agent access filter. The earlier admin-bypass approach was reverted in de6786f3 once Slack became the sink.

API + persistence

New canary_violations table (migration Smart Model Routing #34) — append-mostly, JSON observed_state carries invariant-specific payload.
GET /api/canary/violations + /violations/stats + /violations/{id} — admin-only read paths.
POST /api/canary/run-cycle — admin-only on-demand trigger that delegates to the same CanaryService.run_cycle() invoked by the background loop. Useful for smoke-tests, post-fix verification, and integration tests that need deterministic timing. Response surfaces sources_unavailable so a partial snapshot (e.g. Redis down) is visible to the caller instead of silently degrading checks (a3b77e34).

Tests

27 unit tests in tests/test_canary_invariants.py covering CanaryOperations CRUD/filters/stats, snapshot collector behaviour, each invariant's holds-and-fires cases, the L-03 critical/major severity split, and the runner registry. All green locally with a fake Redis + temp SQLite.

Smoke-test recipes

These are the manual end-to-end demos the harness was built to support. Run after CANARY_ENABLED=1 and (optionally) CANARY_SLACK_WEBHOOK_URL=<your-test-webhook> is set; fleet deployed via curl -X POST .../api/systems/deploy with config/canary-fleet.yaml.

Option 1 — L-03 via SQL poke (no code changes)

Surgical, deterministic, no rebuild. Triggers L-03 by inserting one orphan row directly:

sqlite3 ~/trinity-data/trinity.db \
  "INSERT INTO agent_sharing (agent_name, shared_with_email, shared_by_id, created_at) \
   VALUES ('ghost-canary-test', 'smoketest@example.com', '<admin-id>', datetime('now'));"

curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
  http://localhost:8000/api/canary/run-cycle | jq '.violations[0], .transitions'
# expect: invariant_id='L-03', severity='major', one transition

# cleanup
sqlite3 ~/trinity-data/trinity.db \
  "DELETE FROM agent_sharing WHERE agent_name='ghost-canary-test';"

Option 2 — L-03 via real-flow code regression

Most authentic for L-03 — exercises the actual delete cascade.

Edit src/backend/db/agents.py:128 and comment out the cascade DELETE:

# cursor.execute("DELETE FROM agent_sharing WHERE agent_name = ?", (agent_name,))

Backend --reload picks it up.
Real API flow: POST /api/agents → POST /api/agents/{name}/share → DELETE /api/agents/{name}. The agent_sharing row is left behind.
POST /api/canary/run-cycle → L-03 fires.
Revert the edit, manually clean the leftover orphan row, run cycle → green.

Option 3 — S-01 via real-flow code regression

Triggers S-01 by breaking slot release in services/slot_service.py.

Edit src/backend/services/slot_service.py:168 and replace the ZREM with a no-op:

# Remove from ZSET
# CANARY S-01 SMOKE TEST — temporarily disabled. Revert before merging.
# removed = self.redis.zrem(slots_key, execution_id)
removed = 0

Backend --reload picks it up.
Trigger any task on a fleet agent (or wait for canary-fleet-burst's 1-minute schedule). When the task completes, release_slot() is called but the ZREM is no-op'd — Redis ZSET keeps the execution_id while SQL marks the row success.
POST /api/canary/run-cycle → S-01 fires (severity=critical, in_redis_only contains the leaked execution_id).

Revert the edit. The leaked slot is not auto-cleaned by the fix:

docker exec trinity-redis sh -c \
  'REDISCLI_AUTH=$REDIS_BACKEND_PASSWORD redis-cli --user backend --no-auth-warning \
   ZREM agent:slots:canary-fleet-burst <leaked-execution-id>'

Run cycle → green.

Option 4 — E-02 via DB simulation

E-02 is the only invariant where direct DB manipulation is the realistic path: PR #524 added CAS guards specifically blocking terminal→non-terminal flips at the code level, so a "comment out the guard + trigger a reversal" recipe doesn't work cleanly. The DB poke simulates exactly what the bug class does — a silent status reversal — which is what E-02 exists to catch.

Trigger any task on a fleet agent and wait for it to complete to a terminal status (e.g. success).
Run cycle 1 → seeds canary:e02:terminal_seen with the completed execution_id.

Direct UPDATE simulating the reversal:

docker exec trinity-backend python3 -c "
import sqlite3
c = sqlite3.connect('/data/trinity.db')
c.execute(\"UPDATE schedule_executions SET status='running', completed_at=NULL WHERE id='<eid>'\")
c.commit()
"

Run cycle 2 → E-02 fires (severity=critical, prev=terminal curr=running).

Fix step:

docker exec trinity-backend python3 -c "
import sqlite3
c = sqlite3.connect('/data/trinity.db')
c.execute(\"UPDATE schedule_executions SET status='success', completed_at=datetime('now') WHERE id='<eid>'\")
c.commit()
"

Run cycle 3 → green.

Note: slot_service.py, db/agents.py, and schedule_executions rows are not modified by this PR. Options 2–4 are temporary local-edit smoke-test recipes, not production changes.

Cycle hygiene to expect (verified live for all four recipes)

First red cycle for an invariant: 1 violation row + 1 Slack alert (or one debug log line if CANARY_SLACK_WEBHOOK_URL is unset).
Subsequent cycles while the bug remains: more violation rows (audit trail) but 0 new alerts.
After the fix lands and bad state is cleaned: cycle returns violations: []. Alert count holds.
Red → green → red: a fresh alert on the second red flip.

Test plan

Unit tests pass locally (pytest tests/test_canary_invariants.py — 27/27)
All canary modules compile (python3 -m py_compile)
Local end-to-end L-03 demo via the cascade regression recipe (Option 2): one alert, continuing-red silent, fix step returns to green
Local end-to-end S-01 demo via the slot-release regression recipe (Option 3): one alert, continuing-red silent, manual slot cleanup + fix step returns to green
Local end-to-end E-02 demo via the DB simulation recipe (Option 4): one alert, continuing-red silent, fix step returns to green
Backend starts cleanly with CANARY_ENABLED=0 (default — no canary activity)
Backend starts cleanly with CANARY_ENABLED=1, watcher runs on the 5-min loop, fleet deployable via /api/systems/deploy
Backend starts cleanly with CANARY_ENABLED=1 and CANARY_SLACK_WEBHOOK_URL unset — silent sink, transitions still detected, debug log records them

Notes from the live demo

Real bugs surfaced during the end-to-end walkthrough and are fixed in this PR:

CANARY_ENABLED was set in .env but never propagated to the backend container — added explicit env passthrough in docker-compose.yml.
The canary's terminal-status set was ('completed', 'failed', 'cancelled', 'timeout') — wrong on three of four values vs Trinity's actual TaskExecutionStatus enum (success, failed, cancelled, skipped). Would have silently broken E-02 on production data and generated false-positive L-03 hits. Now sourced from the canonical enum and reused for both invariants.
Transition detection was firing on every continuing-red cycle (the old rule "previous violation predates this snapshot" is true on every later cycle). Replaced with a Redis previous-cycle cursor so transitions only fire on actual green→red flips. The endpoint now returns exactly what the service emitted (no recomputation), so API and alert sink agree.
Original design routed alerts through the in-app notification bell with an admin-bypass on the per-agent access filter (so platform-sourced alerts surfaced for admins). The bypass was reverted in de6786f3 once the alert sink moved to Slack incoming webhook in 21b833d4 — no notifications-table writes, no access-filter changes anywhere.
The S-01 reversal alert was reading prev_status from the canary:e02:terminal_seen Redis hash without preserving the actual terminal status; alerts now show the real prior status (7cca4b8c).

These argue for adding an integration-test layer in a follow-up — pytest-driven test_systems.py-style tests against a live backend would catch this class of issue automatically. Discussed in the PR thread.

Phase 2 (deferred)

S-02, S-03, E-01, E-05, E-06, B-01, B-02, G-01, R-01 (per docs/testing/orchestration-invariant-catalog.md). Each lands as a new file under src/backend/canary/invariants/ plus a registry entry. The watcher service and API surface stay unchanged.

Closes the infrastructure portion of #411's acceptance criteria.

Implements the canary harness from Abilityai#411 that catches the bug class behind PRs Abilityai#378, Abilityai#403, Abilityai#129, Abilityai#226 — race conditions and cross-component state drift between Redis × SQLite × agent registries that unit tests miss. Architecture (per design discussion in PR comments): - Deterministic library (src/backend/canary/) — pure-function checks shared between the watcher and the on-demand admin endpoint. No LLM reasoning anywhere — same snapshot in, same violations out. - Watcher service (services/canary_service.py) — 5-min APScheduler-style loop modeled on cleanup_service.py. Disabled by default; enable on staging/dev with CANARY_ENABLED=1. - Fleet manifest (config/canary-fleet.yaml) — synthetic load via the existing /api/systems/deploy mechanism. Without traffic the harness reports trivially-green cycles. Phase 1 invariants: - S-01 (slot–row bijection) — Redis ZRANGE vs SQL running rows; drain sentinels filtered. Critical severity. - E-02 (no phantom reversal) — terminal executions stay terminal. Phase 1 uses Redis-backed state comparison instead of Vector log diff for simplicity. Critical severity. - L-03 (delete cascades) — orphan-row scan across cross-cutting tables (agent_sharing, agent_schedules, schedule_executions [non-terminal], agent_skills, agent_tags, agent_shared_files, agent_public_links, pending operator_queue / access_requests, agent-scoped mcp_api_keys, active chat_sessions) plus orphan agent:slots:* Redis key scan. Critical for active-orchestration orphans, major otherwise. Persistence + API: - canary_violations table (migration 34, schema in db/schema.py). - GET /api/canary/violations + /violations/stats + /violations/{id} — admin-only read paths over the persisted history. - POST /api/canary/run-cycle — admin-only on-demand trigger; delegates to the same CanaryService.run_cycle() the background loop calls. Notifications: - One green→red transition emits one notification via db.create_notification(notification_type='alert', category='canary'). - Severity → priority: critical → urgent, major → high, minor → normal. - Continuing-red cycles do not re-notify; the row in canary_violations is the trend signal, the bell is the "now" signal. Tests: - 27 unit tests in tests/test_canary_invariants.py covering CanaryOperations CRUD/filters/stats, snapshot collector behaviour, each invariant's holds-and-fires cases, the L-03 critical/major severity split, and the runner registry. Smoke-test recipe (Option 1 — orphan SQL row, no code changes): # 1. Set CANARY_ENABLED=1 and restart backend (creates canary_violations # table via migration; starts the watcher). # 2. Deploy the fleet: curl -X POST -H "Authorization: Bearer $TOKEN" \ -H "Content-Type: application/json" \ -d "$(jq -Rs '{manifest: .}' < config/canary-fleet.yaml)" \ http://localhost:8000/api/systems/deploy # 3. Baseline cycle: curl -X POST -H "Authorization: Bearer $TOKEN" \ http://localhost:8000/api/canary/run-cycle | jq '.violations, .transitions' # 4. Inject one orphan row: sqlite3 ~/trinity-data/trinity.db \ "INSERT INTO agent_sharing (agent_name, shared_with_email, shared_by_id, created_at) \ VALUES ('ghost-canary-test', 'smoketest@example.com', '<admin-id>', datetime('now'));" # 5. Trigger again — expect L-03 in violations + transitions: curl -X POST -H "Authorization: Bearer $TOKEN" \ http://localhost:8000/api/canary/run-cycle | jq '.violations[0], .transitions' # 6. Verify in UI: bell shows new 'canary' notification (severity major). # 7. Cleanup: sqlite3 ~/trinity-data/trinity.db \ "DELETE FROM agent_sharing WHERE agent_name='ghost-canary-test';" Closes part of Abilityai#411 — Phase 1 acceptance criteria (S-01, E-02, L-03 running on staging continuously, table + read endpoint shipped). Phase 2 adds S-02, S-03, E-01, E-05, E-06, B-01, B-02, G-01, R-01 incrementally without changes to the watcher or API surface.

Six fixes from the live end-to-end smoke test on staging-equivalent local stack — all surfaced by walking the L-03 + S-01 demo recipes: 1. **`CANARY_ENABLED` env passthrough** (`docker-compose.yml`). The flag was set in `.env` but never reached the backend container, so the watcher silently no-op'd on staging. Added explicit `${CANARY_ENABLED:-0}` mapping. 2. **`canary-fleet.yaml` CPU fix** (`config/canary-fleet.yaml`). `cpu: "0.5"` failed `int()` in Trinity's resource parser. Bumped `burst` to `cpu: "1"` matching the platform's whole-number convention. 3. **Notifications API admin bypass** (`routers/notifications.py`). `agent_notifications.agent_name` for canary entries is the synthetic `canary-harness` (no Docker container). The existing per-agent access filter excluded it from `GET /api/notifications` so the bell stayed empty. Admins now skip the per-agent filter entirely; non-admins keep the existing semantics. 4. **Drop redundant timestamp from notification body** (`services/canary_service.py:_render_message`). The notification panel renders `created_at` as "just now / 4m ago" and the precise ISO `snapshot_time` is in `metadata.snapshot_time`. Embedding the full ISO microsecond suffix in the message text was pure clutter. 5. **Fix transition detection** (`services/canary_service.py`). Old rule "previous violation predates this snapshot → transition" fired on *every* continuing-red cycle, since prior violations are always older than the current snapshot. Replaced with a Redis-backed previous-cycle cursor (`canary:last_cycle_at`): transition only fires when the latest stored violation predates the *previous cycle's* snapshot, i.e. the previous cycle was green for that invariant. Handles first-ever / continuing-red / red→green→red correctly; verified live against L-03 (one notification across two consecutive red cycles). 6. **Run-cycle endpoint reflects what the service emitted** (`routers/canary.py`). Replaced the router's duplicate transition-detection algorithm with `CycleResult. transition_invariant_ids` returned from `CanaryService.run_cycle()`. The endpoint and the bell can no longer disagree about which invariants fired. Plus a `snapshot.py` module-docstring expansion explaining why state collection is its own module separate from the invariant checks (consistency-per-cycle, no duplicated query code, test-friendliness, explicit non-atomicity across Redis × SQLite).

…onStatus enum Surfaced by the live E-02 demo: cycle 1 was supposed to seed `canary:e02:terminal_seen` from a freshly-completed fleet task but the Redis hash stayed empty, so the simulated reversal in cycle 2 went undetected. Root cause: `TERMINAL_EXECUTION_STATUSES` in `canary/snapshot.py` listed `("completed", "failed", "cancelled", "timeout")` but Trinity's actual terminal set (per `models.TaskExecutionStatus` and the CAS state machine in PR Abilityai#524) is `("success", "failed", "cancelled", "skipped")`. Two of the four values were wrong: - `completed` is never written — `success` is the canonical name. - `timeout` is not a status; timed-out executions land in `failed`. - `skipped` (capacity-overflow path, SCHED-COND-001) was missing. Effects of the bug: 1. **E-02 silently broken on production data.** Fleet executions finishing with `status='success'` were not added to `terminal_exec_ids`, so the next cycle's reversal-detection comparison set was always empty. A real terminal→running flip would have gone undetected indefinitely. 2. **L-03 orphan-scan filter wrong.** The `schedule_executions` filter used the same hardcoded list, so legitimate-terminal `success`/`skipped` rows were treated as non-terminal and scanned for orphans on every cycle. Harmless when no orphans exist, but would have generated false-positive L-03 hits in ordinary recovery scenarios. Fix: - Replace the constant with `("success", "failed", "cancelled", "skipped")` matching `TaskExecutionStatus`. - Build the L-03 orphan-scan SQL filter from the constant (`f"status NOT IN ({_TERMINAL_SQL_LIST})"`) so the two stay in lockstep. - Drop the duplicate constant declaration further down the file. - Update e02 docstring + test fixtures (`'completed'` → `'success'`). Verified locally: - `pytest tests/test_canary_invariants.py` — 27/27 passing. - End-to-end E-02 demo: cycle 1 seeds `canary:e02:terminal_seen` (HEXISTS=1), cycle 2 fires after `UPDATE schedule_executions SET status='running'`, cycle 3 stays silent (continuing-red), fix-step reverts the row → cycle 4 green. Exactly 1 notification. This was a real defect waiting to ship — caught only because the demo went all the way through to a real fleet task. Argues for an integration-test layer in a follow-up PR (per `test_systems.py`- style harness discussed in the original PR thread).

…ration # Conflicts: # docs/memory/architecture.md

…variant #16) _collect_terminal_executions filtered with `datetime('now', '-N minutes')` against `completed_at` (written via `utc_now_iso()` → ISO-Z). The two formats diverge at the date/time separator (T vs space), so lexicographic compare let every same-day terminal row through regardless of age — the exact failure invariant #16 / issue Abilityai#476 was written about. Consequence: on busy installs the canary:e02:terminal_seen Redis hash grew unbounded until PREV_TERMINAL_MAX nuked the prev set, leaving E-02 with nothing to diff against on the following cycle. - Extend iso_cutoff() with a keyword-only `minutes=` parameter for sub-hour windows. Existing positional callers unaffected. - Switch snapshot._collect_terminal_executions to iso_cutoff(minutes=...). - Pin the new signature with TestMinutesKwarg in test_iso_cutoff.py (incl. a TypeError check on `iso_cutoff(2, 30)` so a future refactor can't quietly turn that into "2h30m").

…cleResult The router's re-fetch loop ran list_canary_violations once per violation (not per invariant), repeatedly querying the same (invariant_id, snapshot_time) window and matching rows back to violations by signal_query — both wasteful and silently wrong when two violations share a signal_query. The service already had the row id from insert_canary_violation and threw it away. Thread it through CycleResult.persisted_violation_ids (index-aligned with .violations, None on insert failure) so the router maps directly to row ids without touching the DB.

The 27-test suite hit DB CRUD and the pure invariants but never exercised the orchestrator — which is where the demo-driven bugs lived (e7c11b2 transition spam, ef40cf9 terminal-status typo). Both shipped past the unit suite and were caught only by hand-driven demo runs. Adds a TestCanaryService class with four end-to-end tests over run_cycle() using the existing FakeRedis + temp SQLite, plus a canary_service fixture that stubs `database` / `db_models` and records create_notification calls so transitions can be asserted directly: - test_first_cycle_violation_fires_one_notification — baseline - test_continuing_red_does_not_re_fire — regression for e7c11b2 - test_red_green_red_fires_twice — covers the re-arm branch - test_terminal_status_set_seeds_e02_side_table — regression for ef40cf9 (would fail against the pre-fix tuple) Also adds string get/set ops to FakeRedis so the previous-cycle cursor (canary:last_cycle_at) round-trips. 31/31 tests pass.

…ration # Conflicts: # src/backend/db/migrations.py

…tions Per product call: green→red transitions no longer write agent_notifications rows or broadcast over WebSocket. Slack lands in a follow-up PR and plugs into the same _emit_transition seam. - _emit_transition now calls a no-op _notify_transition_stub (logs only) - _render_message / _broadcast / _severity_rank / ws-manager hooks kept in place so the next PR is a single-file change - Tests assert on transition_invariant_ids / cumulative_transitions and pin notification_calls == [] to catch regressions - architecture.md notes Phase 1 has no alert sink wired

The dashboard surface is the Operating Room notifications panel, not a literal navbar bell — comments and docstrings now say "alert sink" or "panel" so a future reader doesn't go hunting for a bell icon.

…efault doesn't exist) The manifest referenced `local:default`, which has no template directory under config/agent-templates/. Agent creation silently skips template-volume mounting on missing paths (services/agent_service/crud.py:341), so the fleet members were spinning up from the bare base image with no CLAUDE.md — verified with `docker exec agent-canary-fleet-burst cat /home/developer/CLAUDE.md` returning "No such file". Switched to `local:test-echo`: minimal stock template, no MCP/credentials, deterministic short replies — exactly what S-01/E-02 slot-churn invariants need. Schedule messages simplified accordingly. Verified end-to-end on local: deleted broken containers, redeployed via POST /api/systems/deploy, agents came up with templates, autonomy enabled, schedules fired, real-flow ZREM regression in slot_service.py produced S-01 critical violation captured by both the 5-min watcher (transitions=1) and on-demand POST /api/canary/run-cycle. Recovery path returned to green.

The Notifications bullet still described the rejected dashboard path (db.create_notification → agent_notifications) after 6e1bb94 stubbed the alert sink. Replaced with an Alert sink bullet noting the no-op _notify_transition_stub and the preserved _emit_transition seam for the follow-up Slack PR. Architecture.md and requirements.md now agree.

…hook - _emit_transition POSTs Block Kit payload via slack_service.post_webhook - payload: severity emoji header + friendly invariant name, summary, per-invariant forensic detail, runbook hint, context line with snapshot_time + violation row #N + "last red Xm ago" badge - CANARY_SLACK_WEBHOOK_URL env var (unset = silent sink) - previous_violation_at now populated through CycleResult (was always None) - removes dead WS-broadcast scaffolding from the dropped Phase 1 path

Justification was canary harness notifications under a synthetic agent name, but the alert sink was rerouted to Slack (21b833d) and never writes to the notifications table. Reverting restores ownership-scoped filtering for admins — no live caller depends on the bypass.

… response Thread snapshot.sources_unavailable through CycleResult so the on-demand run-cycle endpoint reflects degraded Redis/SQLite reads instead of always returning []. The field's whole purpose is to expose partial-state cycles to the operator running the smoke test.

Pre-pivot vocabulary referenced a canary agent template invoking mcp__trinity__send_notification on a per-agent loop. The harness has been a backend service with a Slack webhook sink for several commits; update the docstrings to match.

Header docstring stopped at Abilityai#34 while MIGRATIONS tuple has 57 entries. Pre-existing drift; the canary migration added to it.

…erts The E-02 phantom-reversal side-table stored the literal string "terminal" against every id, so once the Slack sink landed its forensic line printed "terminal → running" — useless to on-call. The snapshot now carries terminal_exec_statuses: dict[eid, status] and E-02 maintains a parallel canary:e02:terminal_status hash keyed alongside the canary:e02:terminal_seen ZSET. The real prior status (success / failed / cancelled / skipped) is read back when a reversal fires and rendered into both observed_state and the signal_query line. Aged-out ids are HDEL'd in the same sweep that ZREMRANGEBYSCORE drops them so the two structures stay in sync. Bundled side-table refactor: HASH + hard 5000-row DEL-on-overflow becomes ZSET trimmed by ZREMRANGEBYSCORE on PREV_TERMINAL_RETENTION_SECONDS (1h). The hard-reset trim left a one-cycle E-02 blind spot every time the cap crossed on busy installs. Tests: extends FakeRedis with zrangebyscore / hmget / hdel; asserts the existing reversal test now reports previous_status="success", adds a new test that round-trips all four terminal statuses through seed→reverse, and a regression test for the age-based trim invariant.

… endpoint After the 422 raise on invalid_ids, requested_ids is provably all-valid, so re-filtering against INVARIANTS was dead code. Pass requested_ids directly to run_cycle().

Comment said `>=` but the code uses `<`.

CANARY-001 alert sink ships in Phase 1, not Phase 2.

vybe

Review: Request Changes

Two items before this can merge.

1. `NameError` in `POST /api/canary/run-cycle` — hard blocker

In src/backend/routers/canary.py, the run_canary_cycle() handler references valid_ids which is never defined. It should be requested_ids (defined ~50 lines earlier). Every call to the endpoint will throw NameError: name 'valid_ids' is not defined. The background loop is unaffected since it calls run_cycle() directly, but the on-demand endpoint is completely broken.

# BUG — valid_ids is undefined; crashes on every request
checks_skipped = [i for i in valid_ids if i not in cycle.violations]
checks_run=[i for i in valid_ids if i in cycle.violations],

# Fix — replace with requested_ids
checks_skipped = [i for i in requested_ids if i not in cycle.violations]
checks_run=[i for i in requested_ids if i in cycle.violations],

2. `canary_service.py` is 692 lines — please split before merge

The service file is already 692 lines and the diff shows canary_service.py at 692 lines added. Target is under 1000 lines per file, but this one is already getting close and Phase 2 invariants will make it grow. Recommend splitting before merge while the boundaries are obvious:

canary_service.py — just the CanaryService class and the 5-min loop (~200 lines)
canary/alert_sink.py (or similar) — Slack Block Kit formatting + webhook POST logic (currently embedded in the service)
canary/transition.py (or similar) — green→red transition detection logic

The snapshot collector and invariant library are already well-separated in src/backend/canary/. The service file should be equally lean.

Everything else here is well-designed — pure-function invariants, graceful degradation on source unavailability, age-based E-02 trimming, one-alert-per-transition suppression, and a solid 1400-line test suite. Happy to approve once these two are addressed.

…lse positives The bijection check fired on a transient mismatch caught by a snapshot landing in the ~30ms gap between the SQL `status='running'` commit and the Redis ZADD on start (or the ~5ms gap between SQL terminal flip and ZREM on stop). Real `release()` skips and phantom rows persist across multiple cycles, so 3s of grace silences the races without losing coverage of the bug class S-01 was designed to catch. - snapshot exposes `running_started_at` per running id and `slot_scores` per slot member (`time.time()` at ZADD), so the check can age-filter. - S-01 drops `in_sql_only` ids whose `started_at` is within 3s and `in_redis_only` ids whose ZSET score is within 3s. Existing strict behaviour applies for older / unparseable timestamps. - FakeRedis test stub gains `withscores` for ZRANGE. - Three new tests: fresh-sql suppressed, fresh-redis suppressed, durable mismatch still fires. 51/51 canary tests pass.

…ration # Conflicts: # src/backend/db/schema.py

… split alerts (1) NameError on POST /api/canary/run-cycle: handler referenced `valid_ids` which never existed (commit 02cfc97 only fixed the call site, not the two trailing usages). Endpoint crashed on every request. Replace with `requested_ids` (provably valid post-422 check). (2) services/canary_service.py was 692 lines and growing. Extract the Slack alert surface (constants + `_emit_transition` + `_build_slack_payload` + `_render_forensic` + `_format_row_refs` + `_format_last_red` + `_render_message`) into services/canary_alerts.py as `CanaryAlerts`. canary_service.py drops to 347 lines, alerts module is 370 — both well under the 1000-line target. No behaviour change. Tests pivoted to `CanaryAlerts._foo` references; 51/51 pass.

…ts split Stale `_emit_transition` / bare `_build_slack_payload` references in test docstrings and the slack_capture fixture's hint pointed at the old method names. Renaming to `CanaryAlerts.emit_transition` etc. so the hints match the actual code path. No test logic changes.

I1 — Webhook URL leak in error logs (services/slack_service.py:309-311): httpx exceptions embed the request URL in their __str__, and for canary alerts that URL IS the credential. The catchall returned `str(e)` which was then logged by `CanaryAlerts.emit_transition`. Sanitize by returning only the exception class name + a generic message; no URL ever lands in Vector logs. I2 — `checks_skipped` always empty (routers/canary.py): `run_invariants` populates `cycle.violations[inv_id]` for every selected id (with `[]` when clean or when a check raised), so the field would always be empty. The actionable signal lives in `sources_unavailable` already. Drop the field from `RunCycleResponse` and simplify `checks_run` to the request's filter list directly. I3 — Severity rank dict duplicated: promote `_severity_rank` → `severity_rank` in canary_alerts and import it into the router instead of redeclaring the literal. 51/51 canary tests pass; live-stack endpoint verified (POST /run-cycle, partial filter, 422 path).

- I1: architecture.md run-cycle row referenced a non-existent `checks_skipped` field; replaced with `sources_unavailable` and documented the new 409 response. - I2: lock-contention path returned an empty 200 indistinguishable from a real green cycle. CycleResult now carries `skipped: bool` and the router maps it to HTTP 409 `cycle in progress`. - I3: TERMINAL_EXECUTION_STATUSES is now derived from TaskExecutionStatus.{SUCCESS,FAILED,CANCELLED,SKIPPED}.value instead of a hand-maintained string tuple. Tests stub `models` so unit imports don't pull in pydantic/db_models. 51/51 canary unit tests + 13/13 iso_cutoff tests green.

`datetime.fromisoformat(ts.rstrip("Z")).timestamp()` stripped the Z suffix, leaving a naive datetime; `.timestamp()` then interprets naive datetimes as local time, so on any host with TZ != UTC the grace-window math was off by the host's UTC offset — producing false-positive S-01 alerts. Python 3.11's `fromisoformat` parses Z natively into `tzinfo=UTC`, so dropping the rstrip restores correctness on any host. Surfaced by /review I1.

Diff-scoped audit of the canary harness (Phase 1 of Abilityai#411). Zero findings at the 8/10 daily-mode confidence gate — admin-only API surface, outbound-only Slack with explicit URL-leak mitigation, parameterized SQL with constants-only f-string composition, env-gated activation. Five candidates considered and dropped below the gate (documented in the report).

vybe

Both items addressed — NameError fixed and alert sink split into . CSO audit clean (0 findings). LGTM.

vybe

Both items addressed — NameError fixed and alert sink split into canary_alerts.py. CSO audit clean (0 findings). LGTM.

Adds three moderate-complexity invariants on top of Phase 2. Each brings exactly one new piece of plumbing — first time the canary takes a hard dep on a non-trivial source beyond SQLite + Redis basic ops: - S-03 — Slot TTL ≥ execution timeout. For every member of `agent:slots:A`, the companion `agent:slot:A:{eid}` HASH must have `TTL ≥ execution_timeout_seconds + 300s` (SLOT_TTL_BUFFER). Three failure kinds surfaced explicitly: `missing` (-2; the #226 class), `no_expiry` (-1), `below_floor` (positive TTL under floor). Critical. Tier A. Per-slot `redis.ttl()` lookup, bounded by ZCARD per agent (≤ max_parallel_tasks). - B-02 — No queued without slots-full. If any agent has queued > 0, then either `slot_count == max_parallel` (legit backpressure) OR a drain tick fired in the last 60s (drain will pick it up). Critical. Tier B. Requires `CapacityManager.run_maintenance()` to write a unix-timestamp heartbeat to `canary:drain_tick_at` at the END of each successful sweep — mid-sweep crash leaves cursor stale and lets B-02 catch the breakage. One-line write in capacity_manager.py, rest is canary-local. - R-01 — No zombie Claude processes. For every running `trinity.platform=agent` container, `ps -eo stat,comm | grep '^Z.*claude' | wc -l` must be 0. Critical. Tier A. Guards PR #407. New source type — docker exec via the existing docker_service.docker_client. Per-container failures recorded in `sources_unavailable` so a single unhealthy container doesn't kill the cycle. Regex anchored at `^Z` rather than the catalog's ` Z` (leading-space) — procps-ng on the agent base image emits STAT left-aligned without padding; verified live by spawning an actual zombie via `os.fork()`+`prctl(PR_SET_NAME, "claude")`. Snapshot extended: - `AgentSnapshot.slot_ttls: Dict[str, int]` — per-slot metadata TTL, drain sentinels skipped at collection time. - `Snapshot.drain_tick_at: Optional[float]` — read from `canary:drain_tick_at`, sentinel-`None` on cold cluster. - `Snapshot.zombie_counts: Dict[str, int]` — per-agent zombie process count via container.exec_run; missing entry = exec failed for that container (recorded in `sources_unavailable`). Tests: 67 → 84 passing. Added `fake_docker` fixture so the synthetic container list is controllable; FakeRedis got a `ttl()` method with the standard -2/-1/positive sentinel semantics. Verification against local stack — for each invariant, provoke, run `POST /api/canary/run-cycle`, observe red, revert, observe green: - S-03: ZADD a slot + EXPIRE its metadata HASH to 30s while the floor is 360s → critical violation `kind: below_floor`. Also covered the `missing` kind by deleting the HASH entirely. Revert by EXPIRE 500. - B-02: inserted 1 queued row, set `canary:drain_tick_at` to 600s ago → critical violation, `free_slots: 2, drain_tick_age_seconds: 600`. Revert by writing a fresh timestamp. - R-01: spawned a real zombie inside agent-cornelius-m via Python fork + prctl PR_SET_NAME → critical violation `zombie_count: 1`. Reaped by killing the parent → green. All post-fix cycles returned `violations: 0, transitions: 0`. Final all-10-invariants cycle on clean platform: 106ms cycle duration, all green, `sources_unavailable: []`. Refs: #411, #653 (Phase 1), #884 (this PR — Phase 2 also)

… B-02, R-01) (#884) * feat(#882): canary invariant harness Phase 2 (S-02, E-01, E-05, B-01) Adds four single-source SQL/Redis invariants to the canary harness (#411). All four follow the Phase 1 (#653) pattern — no new source types, no new infrastructure, registered into the same `INVARIANTS` dict the run-cycle endpoint and background loop already drive. - S-02 — No overbooking. `ZCARD(agent:slots:A)` (drain sentinels filtered) > `max_parallel_tasks`. Critical. Tier A. Catches `acquire_slot` bypass — distinct from S-01 because the violation can be self-consistent (Redis and SQL agree on N+1 vs cap of N). - E-01 — Terminal-state closure. No `status='running'` row older than `execution_timeout_seconds + 300s` (matches `SLOT_TTL_BUFFER` so the check fires *after* cleanup has had its window). Critical. Tier B. - E-05 — Dispatched rows have session. No running row older than 60s with `claude_session_id IS NULL`. Major. Tier B. Guards #106. - B-01 — Queue-status coherence. `db.get_queued_count` (the accessor BacklogService calls) agrees with the snapshot's independently- collected `len(queued_exec_ids)`. Critical. Tier A. Trivially-green today after the #428 consolidation; regression guard against a future cache layer or status-filter drift on the production accessor. Snapshot extended with per-execution `claude_session_id` (E-05) and per-agent `queued_count_via_service` (B-01). The session-id collector PRAGMA-introspects the column so the minimal unit-test DDLs don't have to mirror every production column. The service-count collector lazy-imports `database.db`, returning `None` on import failure so unit tests (which stub `db.connection` but not the full facade) skip B-01 silently rather than firing a false positive. Unit tests: 67 passing (was 51). Each new invariant has positive, negative, and edge-case tests. Verification against local stack — for each invariant, provoke, run `POST /api/canary/run-cycle`, observe red, revert, observe green. All four reproduce as designed: - S-02 — ZADD'd 3 fake slot ids when max_parallel=2 → critical violation with `overbooked_by: 1`. - E-01 — inserted `status='running'` row with `started_at` 2h ago against a 60s-timeout agent → critical violation, `age_seconds: 7436 > timeout+buffer=360s`. - E-05 — inserted `status='running'` row 3 min old with `claude_session_id` NULL → major violation, age=188s. - B-01 — temporarily patched `db.get_queued_count` to return `count - 1` → critical violation, "db.get_queued_count = 0 != |queued ids in snapshot| = 1". Each post-fix cycle returned `violations: 0, transitions: 0`. Refs: #411, #653, docs/testing/orchestration-invariant-catalog.md * feat(#882): canary harness Phase 3 (S-03, B-02, R-01) Adds three moderate-complexity invariants on top of Phase 2. Each brings exactly one new piece of plumbing — first time the canary takes a hard dep on a non-trivial source beyond SQLite + Redis basic ops: - S-03 — Slot TTL ≥ execution timeout. For every member of `agent:slots:A`, the companion `agent:slot:A:{eid}` HASH must have `TTL ≥ execution_timeout_seconds + 300s` (SLOT_TTL_BUFFER). Three failure kinds surfaced explicitly: `missing` (-2; the #226 class), `no_expiry` (-1), `below_floor` (positive TTL under floor). Critical. Tier A. Per-slot `redis.ttl()` lookup, bounded by ZCARD per agent (≤ max_parallel_tasks). - B-02 — No queued without slots-full. If any agent has queued > 0, then either `slot_count == max_parallel` (legit backpressure) OR a drain tick fired in the last 60s (drain will pick it up). Critical. Tier B. Requires `CapacityManager.run_maintenance()` to write a unix-timestamp heartbeat to `canary:drain_tick_at` at the END of each successful sweep — mid-sweep crash leaves cursor stale and lets B-02 catch the breakage. One-line write in capacity_manager.py, rest is canary-local. - R-01 — No zombie Claude processes. For every running `trinity.platform=agent` container, `ps -eo stat,comm | grep '^Z.*claude' | wc -l` must be 0. Critical. Tier A. Guards PR #407. New source type — docker exec via the existing docker_service.docker_client. Per-container failures recorded in `sources_unavailable` so a single unhealthy container doesn't kill the cycle. Regex anchored at `^Z` rather than the catalog's ` Z` (leading-space) — procps-ng on the agent base image emits STAT left-aligned without padding; verified live by spawning an actual zombie via `os.fork()`+`prctl(PR_SET_NAME, "claude")`. Snapshot extended: - `AgentSnapshot.slot_ttls: Dict[str, int]` — per-slot metadata TTL, drain sentinels skipped at collection time. - `Snapshot.drain_tick_at: Optional[float]` — read from `canary:drain_tick_at`, sentinel-`None` on cold cluster. - `Snapshot.zombie_counts: Dict[str, int]` — per-agent zombie process count via container.exec_run; missing entry = exec failed for that container (recorded in `sources_unavailable`). Tests: 67 → 84 passing. Added `fake_docker` fixture so the synthetic container list is controllable; FakeRedis got a `ttl()` method with the standard -2/-1/positive sentinel semantics. Verification against local stack — for each invariant, provoke, run `POST /api/canary/run-cycle`, observe red, revert, observe green: - S-03: ZADD a slot + EXPIRE its metadata HASH to 30s while the floor is 360s → critical violation `kind: below_floor`. Also covered the `missing` kind by deleting the HASH entirely. Revert by EXPIRE 500. - B-02: inserted 1 queued row, set `canary:drain_tick_at` to 600s ago → critical violation, `free_slots: 2, drain_tick_age_seconds: 600`. Revert by writing a fresh timestamp. - R-01: spawned a real zombie inside agent-cornelius-m via Python fork + prctl PR_SET_NAME → critical violation `zombie_count: 1`. Reaped by killing the parent → green. All post-fix cycles returned `violations: 0, transitions: 0`. Final all-10-invariants cycle on clean platform: 106ms cycle duration, all green, `sources_unavailable: []`. Refs: #411, #653 (Phase 1), #884 (this PR — Phase 2 also) * fix(canary): /review fixes — alert quality + B-02 boot-window false-positive Addresses three findings from the pre-landing /review pass: I1 — Alert quality for Phase 2 + 3 invariants. canary_alerts.py only had S-01/E-02/L-03 entries in `_INVARIANT_NAMES`, `_INVARIANT_RUNBOOKS`, `_render_message`, and `_render_forensic`. New ids fell through to the "S-02 fired N violation(s)" generic fallback with the id doubled in the header. Added 7 entries each: - Friendly name and one-line runbook hint per invariant - Per-id `_render_message` (e.g. "3 zombie claude process(es) across 1 agent(s): cornelius-m" for R-01) - Per-id `_render_forensic` rendering of the relevant observed_state fields, truncated to 5 violations with a "+N more" footer Verified by hand-building an R-01 ViolationReport and inspecting the Block Kit payload — header, body, forensic, runbook, and context all render the new shape. I2 — B-02 boot-window false-positive. Background canary loop is fine (30s startup vs 15s maintenance loop), but the on-demand `POST /api/canary/run-cycle` endpoint can hit in the first 15s when no heartbeat exists. With pre-existing queued rows and free slots, B-02 would fire with `drain_tick_age_seconds: null`. `CapacityManager.__init__` now seeds the heartbeat with a fresh timestamp on construction. The maintenance loop overwrites on every successful tick; init only needs a non-stale floor. Verified live: deleted the heartbeat key, bounced the backend, key is present immediately — no waiting for the maintenance tick. I3 — Stale docstring in `_collect_zombie_counts`. The docstring still described the catalog's ` Z.*claude` (leading-space) reasoning while the actual `cmd` line uses `^Z.*claude`. Updated to describe the anchor-at-line-start version and reference the live-zombie verification. Bonus tidy: moved `import time` from inside `run_maintenance` to the top-of-file imports. Tests: 84/84 still green. Full all-10-invariant cycle clean. Refs: /review pass on #884 * feat(canary-fleet): replace long with sleep-echo slow agent `canary-fleet-long` was a duplicate of burst — same template, same task duration, same model — only the cron differed (*/5 vs */2). It added no coverage burst didn't already provide for S-01, E-02, S-03, R-01. Replace it with a `slow` agent backed by a new `sleep-echo` local template that sleeps 75s per task. This gives Phase 2 invariants something to inspect: - E-05 (dispatched rows have session): needs >60s running rows; was trivially-green with 4s test-echo tasks - S-03 below_floor: needs a slot to exist at canary snapshot time Also locks burst's live config into the yaml so a redeploy doesn't revert prior live SQL fixes: - cron: * * * * * -> */2 * * * * (cheapest cadence that phase-slides against the 5-min canary cycle) - description / comments updated to reflect actual coverage scope Manifest deploy can't express `model`, `max_parallel_tasks`, or `execution_timeout_seconds` (system_service.create_schedules drops them, SystemAgentConfig has no slot for capacity). Documented the four required post-deploy API calls in the yaml header so the next operator doesn't trip on it. * chore(lint): regenerate sys.modules baseline to absorb dev state Pre-existing CI failure inherited from dev — `dev` has been failing this lint since #871 (commit 98574f3) merged on 2026-05-17. That PR added 6 sys.modules violations in tests/unit/test_slot_per_slot_ttl.py without regenerating the baseline; a separate cleanup retired 3 violations in tests/unit/test_cleanup_unreachable_orphan.py. Regenerated via `python tests/lint_sys_modules.py --regenerate-baseline` — the path the lint script itself directs you to when violations move below baseline AND new files exceed it. Net: 235 violations in 67 files (unchanged total). No code-quality regression in this PR's actual diff — none of the canary tests use bare sys.modules manipulation (they use monkeypatch.setitem throughout).

obasilakis added 3 commits May 4, 2026 15:43

obasilakis marked this pull request as draft May 5, 2026 01:02

obasilakis added 18 commits May 7, 2026 10:49

Merge remote-tracking branch 'origin/dev' into feature/411-canary-mig…

17725d0

…ration # Conflicts: # docs/memory/architecture.md

Merge remote-tracking branch 'origin/dev' into feature/411-canary-mig…

5616f6b

…ration # Conflicts: # src/backend/db/migrations.py

docs(Abilityai#411): drop "bell" shorthand from canary comments

903669f

The dashboard surface is the Operating Room notifications panel, not a literal navbar bell — comments and docstrings now say "alert sink" or "panel" so a future reader doesn't go hunting for a bell icon.

docs(Abilityai#411): sync migrations.py header to all 57 entries

ce9f708

Header docstring stopped at Abilityai#34 while MIGRATIONS tuple has 57 entries. Pre-existing drift; the canary migration added to it.

refactor(Abilityai#411): drop redundant valid_ids filter in run-cycle…

02cfc97

… endpoint After the 422 raise on invalid_ids, requested_ids is provably all-valid, so re-filtering against INVARIANTS was dead code. Pass requested_ids directly to run_cycle().

docs(Abilityai#411): correct operator in _is_green_to_red comment

8e47292

Comment said `>=` but the code uses `<`.

docs(Abilityai#411): correct phase label in canary alert sink comment

43538ae

CANARY-001 alert sink ships in Phase 1, not Phase 2.

obasilakis marked this pull request as ready for review May 8, 2026 01:08

vybe requested changes May 8, 2026

View reviewed changes

obasilakis added 6 commits May 9, 2026 15:23

Merge remote-tracking branch 'origin/dev' into feature/411-canary-mig…

1c3cb4b

…ration # Conflicts: # src/backend/db/schema.py

obasilakis added 2 commits May 9, 2026 18:04

vybe approved these changes May 10, 2026

View reviewed changes

vybe merged commit a4eec13 into Abilityai:dev May 10, 2026
9 checks passed

This was referenced May 18, 2026

Canary Phase 2 + 3: add invariants S-02, E-01, E-05, B-01, S-03, B-02, R-01 #882

Open

feat(#882): canary harness Phase 2 + 3 (S-02, E-01, E-05, B-01, S-03, B-02, R-01) #884

Merged

Canary Phase 3: add invariants S-03, B-02, R-01 #885

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(#411): canary invariant harness Phase 1 (S-01, E-02, L-03)#653

feat(#411): canary invariant harness Phase 1 (S-01, E-02, L-03)#653
vybe merged 29 commits into
Abilityai:devfrom
obasilakis:feature/411-canary-migration

obasilakis commented May 4, 2026 •

edited

Loading

Uh oh!

vybe left a comment

Uh oh!

vybe left a comment

Uh oh!

vybe left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

obasilakis commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Phase 1 invariants

Why a separate snapshot.py from the invariant checks

Transition-detection rule

Alert sink (Slack incoming webhook)

API + persistence

Tests

Smoke-test recipes

Option 1 — L-03 via SQL poke (no code changes)

Option 2 — L-03 via real-flow code regression

Option 3 — S-01 via real-flow code regression

Option 4 — E-02 via DB simulation

Cycle hygiene to expect (verified live for all four recipes)

Test plan

Notes from the live demo

Phase 2 (deferred)

Uh oh!

vybe left a comment

Choose a reason for hiding this comment

Review: Request Changes

1. NameError in POST /api/canary/run-cycle — hard blocker

2. canary_service.py is 692 lines — please split before merge

Uh oh!

vybe left a comment

Choose a reason for hiding this comment

Uh oh!

vybe left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

obasilakis commented May 4, 2026 •

edited

Loading

Why a separate `snapshot.py` from the invariant checks

1. `NameError` in `POST /api/canary/run-cycle` — hard blocker

2. `canary_service.py` is 692 lines — please split before merge