Skip to content

feat(#411): canary invariant harness Phase 1 (S-01, E-02, L-03)#653

Merged
vybe merged 29 commits into
Abilityai:devfrom
obasilakis:feature/411-canary-migration
May 10, 2026
Merged

feat(#411): canary invariant harness Phase 1 (S-01, E-02, L-03)#653
vybe merged 29 commits into
Abilityai:devfrom
obasilakis:feature/411-canary-migration

Conversation

@obasilakis
Copy link
Copy Markdown
Contributor

@obasilakis obasilakis commented May 4, 2026

Summary

Implements Phase 1 of #411 — a continuous orchestration-invariant harness that catches the bug class behind PRs #378, #403, #129, #226: race conditions and cross-component state drift between Redis × SQLite × agent registries that unit tests miss.

Three components, each justified separately in the PR planning thread:

  • Deterministic library (src/backend/canary/) — pure-function checks shared between the watcher and the on-demand admin endpoint. No LLM reasoning anywhere; same snapshot in, same violations out.
  • Watcher service (services/canary_service.py) — 5-min loop modeled on cleanup_service.py. Disabled by default; enable on staging/dev with CANARY_ENABLED=1.
  • Fleet manifest (config/canary-fleet.yaml) — synthetic load via the existing /api/systems/deploy mechanism. Without traffic the harness reports trivially-green cycles.

Phase 1 invariants

ID Check Catches
S-01 Slot–row bijection: ZRANGE agent:slots:{name} (drain sentinels filtered) == SQL running rows for that agent #378/#403 phantom-stale-slot
E-02 No phantom reversal: terminal executions stay terminal across cycles. Phase 1 uses Redis-backed state comparison (canary:e02:terminal_seen) instead of Vector log diff for simplicity. Same class as #378/#403
L-03 Delete cascades: no orphan rows in agent_sharing, agent_schedules, non-terminal schedule_executions, agent_skills, agent_tags, agent_shared_files, agent_public_links, pending operator_queue/access_requests, agent-scoped mcp_api_keys, active chat_sessions, plus orphan agent:slots:* Redis keys Issue #129 family

Severity: critical for active-orchestration orphans (running executions, Redis slot keys); major for dangling permissions/sharing/tags.

Terminal-execution set used by E-02 + L-03 is sourced from TaskExecutionStatus (success, failed, cancelled, skipped) — the same set PR #524's CAS state machine treats as write-once.

Why a separate snapshot.py from the invariant checks

Stated in the module docstring; summarised here:

  1. One consistent view per cycle. All three Phase 1 invariants read overlapping state (Redis ZSETs, the agent_ownership set, execution rows). A shared Snapshot instance means per-check timing drift can't introduce spurious mismatches.
  2. No duplicated query code. New invariants are pure functions (snapshot) → list[ViolationReport]; they never re-implement SELECTs or ZRANGEs.
  3. Test-friendly. Tests pass synthetic Snapshot dataclasses straight in (tests/test_canary_invariants.py) — no live Redis or SQLite required to exercise checking logic.

The snapshot is not atomic across Redis and SQLite — those don't share transactions. The harness deliberately accepts sub-second inconsistencies because real bugs persist across a 5-minute cycle by definition; transient races self-resolve and aren't what the canary exists to catch.

Transition-detection rule

A green→red transition fires exactly one Slack alert.

  • Continuing-red cycles persist new violation rows (audit history) but stay silent on the alert sink — preventing alert fatigue.
  • Red→green→red fires a fresh alert on the second red flip.

Implementation uses a Redis cursor canary:last_cycle_at so each cycle compares this snapshot's violation timestamps against the previous cycle's snapshot, not the latest persisted violation. The router endpoint surfaces exactly the transitions the service emitted (no recomputation), so the API response and the alert sink can never disagree.

Alert sink (Slack incoming webhook)

The harness fires green→red alerts via a Slack incoming webhook — admin-side configuration, no Settings UI. The canary is staging/dev-only and the operator already has shell access.

  • Configuration: CANARY_SLACK_WEBHOOK_URL env var. Unset = silent sink (cycles still run, violations still persist, debug-line records the detected transition).
  • Payload: Block Kit — header (🚨 Canary: <invariant_id>), summary (severity, violation count, snapshot time), forensic detail per violation, runbook hint, "last red Xm ago" context badge when reopening after a green window.
  • Per-transition: exactly one POST per green→red flip per invariant. Continuing-red cycles do not re-post.
  • Failure handling: 5s timeout enforced by slack_service.post_webhook; failures are logged and swallowed so a hung webhook can't break the cycle. The violation row is still persisted regardless of webhook outcome.
  • No admin bypass anywhere: the alert path doesn't touch the in-app notification bell, doesn't insert into notifications, and doesn't change any per-agent access filter. The earlier admin-bypass approach was reverted in de6786f3 once Slack became the sink.

API + persistence

  • New canary_violations table (migration Smart Model Routing #34) — append-mostly, JSON observed_state carries invariant-specific payload.
  • GET /api/canary/violations + /violations/stats + /violations/{id} — admin-only read paths.
  • POST /api/canary/run-cycle — admin-only on-demand trigger that delegates to the same CanaryService.run_cycle() invoked by the background loop. Useful for smoke-tests, post-fix verification, and integration tests that need deterministic timing. Response surfaces sources_unavailable so a partial snapshot (e.g. Redis down) is visible to the caller instead of silently degrading checks (a3b77e34).

Tests

27 unit tests in tests/test_canary_invariants.py covering CanaryOperations CRUD/filters/stats, snapshot collector behaviour, each invariant's holds-and-fires cases, the L-03 critical/major severity split, and the runner registry. All green locally with a fake Redis + temp SQLite.

Smoke-test recipes

These are the manual end-to-end demos the harness was built to support. Run after CANARY_ENABLED=1 and (optionally) CANARY_SLACK_WEBHOOK_URL=<your-test-webhook> is set; fleet deployed via curl -X POST .../api/systems/deploy with config/canary-fleet.yaml.

Option 1 — L-03 via SQL poke (no code changes)

Surgical, deterministic, no rebuild. Triggers L-03 by inserting one orphan row directly:

sqlite3 ~/trinity-data/trinity.db \
  "INSERT INTO agent_sharing (agent_name, shared_with_email, shared_by_id, created_at) \
   VALUES ('ghost-canary-test', 'smoketest@example.com', '<admin-id>', datetime('now'));"

curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
  http://localhost:8000/api/canary/run-cycle | jq '.violations[0], .transitions'
# expect: invariant_id='L-03', severity='major', one transition

# cleanup
sqlite3 ~/trinity-data/trinity.db \
  "DELETE FROM agent_sharing WHERE agent_name='ghost-canary-test';"

Option 2 — L-03 via real-flow code regression

Most authentic for L-03 — exercises the actual delete cascade.

  1. Edit src/backend/db/agents.py:128 and comment out the cascade DELETE:
    # cursor.execute("DELETE FROM agent_sharing WHERE agent_name = ?", (agent_name,))
  2. Backend --reload picks it up.
  3. Real API flow: POST /api/agentsPOST /api/agents/{name}/shareDELETE /api/agents/{name}. The agent_sharing row is left behind.
  4. POST /api/canary/run-cycle → L-03 fires.
  5. Revert the edit, manually clean the leftover orphan row, run cycle → green.

Option 3 — S-01 via real-flow code regression

Triggers S-01 by breaking slot release in services/slot_service.py.

  1. Edit src/backend/services/slot_service.py:168 and replace the ZREM with a no-op:
    # Remove from ZSET
    # CANARY S-01 SMOKE TEST — temporarily disabled. Revert before merging.
    # removed = self.redis.zrem(slots_key, execution_id)
    removed = 0
  2. Backend --reload picks it up.
  3. Trigger any task on a fleet agent (or wait for canary-fleet-burst's 1-minute schedule). When the task completes, release_slot() is called but the ZREM is no-op'd — Redis ZSET keeps the execution_id while SQL marks the row success.
  4. POST /api/canary/run-cycle → S-01 fires (severity=critical, in_redis_only contains the leaked execution_id).
  5. Revert the edit. The leaked slot is not auto-cleaned by the fix:
    docker exec trinity-redis sh -c \
      'REDISCLI_AUTH=$REDIS_BACKEND_PASSWORD redis-cli --user backend --no-auth-warning \
       ZREM agent:slots:canary-fleet-burst <leaked-execution-id>'
  6. Run cycle → green.

Option 4 — E-02 via DB simulation

E-02 is the only invariant where direct DB manipulation is the realistic path: PR #524 added CAS guards specifically blocking terminal→non-terminal flips at the code level, so a "comment out the guard + trigger a reversal" recipe doesn't work cleanly. The DB poke simulates exactly what the bug class does — a silent status reversal — which is what E-02 exists to catch.

  1. Trigger any task on a fleet agent and wait for it to complete to a terminal status (e.g. success).
  2. Run cycle 1 → seeds canary:e02:terminal_seen with the completed execution_id.
  3. Direct UPDATE simulating the reversal:
    docker exec trinity-backend python3 -c "
    import sqlite3
    c = sqlite3.connect('/data/trinity.db')
    c.execute(\"UPDATE schedule_executions SET status='running', completed_at=NULL WHERE id='<eid>'\")
    c.commit()
    "
  4. Run cycle 2 → E-02 fires (severity=critical, prev=terminal curr=running).
  5. Fix step:
    docker exec trinity-backend python3 -c "
    import sqlite3
    c = sqlite3.connect('/data/trinity.db')
    c.execute(\"UPDATE schedule_executions SET status='success', completed_at=datetime('now') WHERE id='<eid>'\")
    c.commit()
    "
  6. Run cycle 3 → green.

Note: slot_service.py, db/agents.py, and schedule_executions rows are not modified by this PR. Options 2–4 are temporary local-edit smoke-test recipes, not production changes.

Cycle hygiene to expect (verified live for all four recipes)

  • First red cycle for an invariant: 1 violation row + 1 Slack alert (or one debug log line if CANARY_SLACK_WEBHOOK_URL is unset).
  • Subsequent cycles while the bug remains: more violation rows (audit trail) but 0 new alerts.
  • After the fix lands and bad state is cleaned: cycle returns violations: []. Alert count holds.
  • Red → green → red: a fresh alert on the second red flip.

Test plan

  • Unit tests pass locally (pytest tests/test_canary_invariants.py — 27/27)
  • All canary modules compile (python3 -m py_compile)
  • Local end-to-end L-03 demo via the cascade regression recipe (Option 2): one alert, continuing-red silent, fix step returns to green
  • Local end-to-end S-01 demo via the slot-release regression recipe (Option 3): one alert, continuing-red silent, manual slot cleanup + fix step returns to green
  • Local end-to-end E-02 demo via the DB simulation recipe (Option 4): one alert, continuing-red silent, fix step returns to green
  • Backend starts cleanly with CANARY_ENABLED=0 (default — no canary activity)
  • Backend starts cleanly with CANARY_ENABLED=1, watcher runs on the 5-min loop, fleet deployable via /api/systems/deploy
  • Backend starts cleanly with CANARY_ENABLED=1 and CANARY_SLACK_WEBHOOK_URL unset — silent sink, transitions still detected, debug log records them

Notes from the live demo

Real bugs surfaced during the end-to-end walkthrough and are fixed in this PR:

  1. CANARY_ENABLED was set in .env but never propagated to the backend container — added explicit env passthrough in docker-compose.yml.
  2. The canary's terminal-status set was ('completed', 'failed', 'cancelled', 'timeout') — wrong on three of four values vs Trinity's actual TaskExecutionStatus enum (success, failed, cancelled, skipped). Would have silently broken E-02 on production data and generated false-positive L-03 hits. Now sourced from the canonical enum and reused for both invariants.
  3. Transition detection was firing on every continuing-red cycle (the old rule "previous violation predates this snapshot" is true on every later cycle). Replaced with a Redis previous-cycle cursor so transitions only fire on actual green→red flips. The endpoint now returns exactly what the service emitted (no recomputation), so API and alert sink agree.
  4. Original design routed alerts through the in-app notification bell with an admin-bypass on the per-agent access filter (so platform-sourced alerts surfaced for admins). The bypass was reverted in de6786f3 once the alert sink moved to Slack incoming webhook in 21b833d4 — no notifications-table writes, no access-filter changes anywhere.
  5. The S-01 reversal alert was reading prev_status from the canary:e02:terminal_seen Redis hash without preserving the actual terminal status; alerts now show the real prior status (7cca4b8c).

These argue for adding an integration-test layer in a follow-up — pytest-driven test_systems.py-style tests against a live backend would catch this class of issue automatically. Discussed in the PR thread.

Phase 2 (deferred)

S-02, S-03, E-01, E-05, E-06, B-01, B-02, G-01, R-01 (per docs/testing/orchestration-invariant-catalog.md). Each lands as a new file under src/backend/canary/invariants/ plus a registry entry. The watcher service and API surface stay unchanged.

Closes the infrastructure portion of #411's acceptance criteria.

obasilakis added 3 commits May 4, 2026 15:43
Implements the canary harness from Abilityai#411 that catches the bug class behind
PRs Abilityai#378, Abilityai#403, Abilityai#129, Abilityai#226 — race conditions and cross-component state
drift between Redis × SQLite × agent registries that unit tests miss.

Architecture (per design discussion in PR comments):
- Deterministic library (src/backend/canary/) — pure-function checks
  shared between the watcher and the on-demand admin endpoint. No LLM
  reasoning anywhere — same snapshot in, same violations out.
- Watcher service (services/canary_service.py) — 5-min APScheduler-style
  loop modeled on cleanup_service.py. Disabled by default; enable on
  staging/dev with CANARY_ENABLED=1.
- Fleet manifest (config/canary-fleet.yaml) — synthetic load via the
  existing /api/systems/deploy mechanism. Without traffic the harness
  reports trivially-green cycles.

Phase 1 invariants:
- S-01 (slot–row bijection) — Redis ZRANGE vs SQL running rows; drain
  sentinels filtered. Critical severity.
- E-02 (no phantom reversal) — terminal executions stay terminal.
  Phase 1 uses Redis-backed state comparison instead of Vector log diff
  for simplicity. Critical severity.
- L-03 (delete cascades) — orphan-row scan across cross-cutting tables
  (agent_sharing, agent_schedules, schedule_executions [non-terminal],
  agent_skills, agent_tags, agent_shared_files, agent_public_links,
  pending operator_queue / access_requests, agent-scoped mcp_api_keys,
  active chat_sessions) plus orphan agent:slots:* Redis key scan.
  Critical for active-orchestration orphans, major otherwise.

Persistence + API:
- canary_violations table (migration 34, schema in db/schema.py).
- GET /api/canary/violations + /violations/stats + /violations/{id} —
  admin-only read paths over the persisted history.
- POST /api/canary/run-cycle — admin-only on-demand trigger; delegates
  to the same CanaryService.run_cycle() the background loop calls.

Notifications:
- One green→red transition emits one notification via
  db.create_notification(notification_type='alert', category='canary').
- Severity → priority: critical → urgent, major → high, minor → normal.
- Continuing-red cycles do not re-notify; the row in canary_violations
  is the trend signal, the bell is the "now" signal.

Tests:
- 27 unit tests in tests/test_canary_invariants.py covering
  CanaryOperations CRUD/filters/stats, snapshot collector behaviour,
  each invariant's holds-and-fires cases, the L-03 critical/major
  severity split, and the runner registry.

Smoke-test recipe (Option 1 — orphan SQL row, no code changes):

  # 1. Set CANARY_ENABLED=1 and restart backend (creates canary_violations
  #    table via migration; starts the watcher).
  # 2. Deploy the fleet:
  curl -X POST -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -d "$(jq -Rs '{manifest: .}' < config/canary-fleet.yaml)" \
    http://localhost:8000/api/systems/deploy
  # 3. Baseline cycle:
  curl -X POST -H "Authorization: Bearer $TOKEN" \
    http://localhost:8000/api/canary/run-cycle | jq '.violations, .transitions'
  # 4. Inject one orphan row:
  sqlite3 ~/trinity-data/trinity.db \
    "INSERT INTO agent_sharing (agent_name, shared_with_email, shared_by_id, created_at) \
     VALUES ('ghost-canary-test', 'smoketest@example.com', '<admin-id>', datetime('now'));"
  # 5. Trigger again — expect L-03 in violations + transitions:
  curl -X POST -H "Authorization: Bearer $TOKEN" \
    http://localhost:8000/api/canary/run-cycle | jq '.violations[0], .transitions'
  # 6. Verify in UI: bell shows new 'canary' notification (severity major).
  # 7. Cleanup:
  sqlite3 ~/trinity-data/trinity.db \
    "DELETE FROM agent_sharing WHERE agent_name='ghost-canary-test';"

Closes part of Abilityai#411 — Phase 1 acceptance criteria (S-01, E-02, L-03
running on staging continuously, table + read endpoint shipped). Phase 2
adds S-02, S-03, E-01, E-05, E-06, B-01, B-02, G-01, R-01 incrementally
without changes to the watcher or API surface.
Six fixes from the live end-to-end smoke test on staging-equivalent
local stack — all surfaced by walking the L-03 + S-01 demo recipes:

1. **`CANARY_ENABLED` env passthrough** (`docker-compose.yml`).
   The flag was set in `.env` but never reached the backend container,
   so the watcher silently no-op'd on staging. Added explicit
   `${CANARY_ENABLED:-0}` mapping.

2. **`canary-fleet.yaml` CPU fix** (`config/canary-fleet.yaml`).
   `cpu: "0.5"` failed `int()` in Trinity's resource parser. Bumped
   `burst` to `cpu: "1"` matching the platform's whole-number
   convention.

3. **Notifications API admin bypass** (`routers/notifications.py`).
   `agent_notifications.agent_name` for canary entries is the
   synthetic `canary-harness` (no Docker container). The existing
   per-agent access filter excluded it from `GET /api/notifications`
   so the bell stayed empty. Admins now skip the per-agent filter
   entirely; non-admins keep the existing semantics.

4. **Drop redundant timestamp from notification body**
   (`services/canary_service.py:_render_message`). The notification
   panel renders `created_at` as "just now / 4m ago" and the
   precise ISO `snapshot_time` is in `metadata.snapshot_time`.
   Embedding the full ISO microsecond suffix in the message text was
   pure clutter.

5. **Fix transition detection** (`services/canary_service.py`).
   Old rule "previous violation predates this snapshot → transition"
   fired on *every* continuing-red cycle, since prior violations
   are always older than the current snapshot. Replaced with a
   Redis-backed previous-cycle cursor (`canary:last_cycle_at`):
   transition only fires when the latest stored violation predates
   the *previous cycle's* snapshot, i.e. the previous cycle was
   green for that invariant. Handles first-ever / continuing-red /
   red→green→red correctly; verified live against L-03 (one
   notification across two consecutive red cycles).

6. **Run-cycle endpoint reflects what the service emitted**
   (`routers/canary.py`). Replaced the router's duplicate
   transition-detection algorithm with `CycleResult.
   transition_invariant_ids` returned from
   `CanaryService.run_cycle()`. The endpoint and the bell can no
   longer disagree about which invariants fired.

Plus a `snapshot.py` module-docstring expansion explaining why state
collection is its own module separate from the invariant checks
(consistency-per-cycle, no duplicated query code, test-friendliness,
explicit non-atomicity across Redis × SQLite).
…onStatus enum

Surfaced by the live E-02 demo: cycle 1 was supposed to seed
`canary:e02:terminal_seen` from a freshly-completed fleet task but
the Redis hash stayed empty, so the simulated reversal in cycle 2
went undetected.

Root cause: `TERMINAL_EXECUTION_STATUSES` in `canary/snapshot.py`
listed `("completed", "failed", "cancelled", "timeout")` but
Trinity's actual terminal set (per `models.TaskExecutionStatus` and
the CAS state machine in PR Abilityai#524) is
`("success", "failed", "cancelled", "skipped")`. Two of the four
values were wrong:

  - `completed` is never written — `success` is the canonical name.
  - `timeout` is not a status; timed-out executions land in `failed`.
  - `skipped` (capacity-overflow path, SCHED-COND-001) was missing.

Effects of the bug:

  1. **E-02 silently broken on production data.** Fleet executions
     finishing with `status='success'` were not added to
     `terminal_exec_ids`, so the next cycle's reversal-detection
     comparison set was always empty. A real terminal→running flip
     would have gone undetected indefinitely.
  2. **L-03 orphan-scan filter wrong.** The `schedule_executions`
     filter used the same hardcoded list, so legitimate-terminal
     `success`/`skipped` rows were treated as non-terminal and
     scanned for orphans on every cycle. Harmless when no orphans
     exist, but would have generated false-positive L-03 hits in
     ordinary recovery scenarios.

Fix:

  - Replace the constant with `("success", "failed", "cancelled",
    "skipped")` matching `TaskExecutionStatus`.
  - Build the L-03 orphan-scan SQL filter from the constant
    (`f"status NOT IN ({_TERMINAL_SQL_LIST})"`) so the two stay
    in lockstep.
  - Drop the duplicate constant declaration further down the file.
  - Update e02 docstring + test fixtures (`'completed'` → `'success'`).

Verified locally:

  - `pytest tests/test_canary_invariants.py` — 27/27 passing.
  - End-to-end E-02 demo: cycle 1 seeds `canary:e02:terminal_seen`
    (HEXISTS=1), cycle 2 fires after `UPDATE schedule_executions
    SET status='running'`, cycle 3 stays silent (continuing-red),
    fix-step reverts the row → cycle 4 green. Exactly 1 notification.

This was a real defect waiting to ship — caught only because the
demo went all the way through to a real fleet task. Argues for an
integration-test layer in a follow-up PR (per `test_systems.py`-
style harness discussed in the original PR thread).
@obasilakis obasilakis marked this pull request as draft May 5, 2026 01:02
obasilakis added 18 commits May 7, 2026 10:49
…ration

# Conflicts:
#	docs/memory/architecture.md
…variant #16)

_collect_terminal_executions filtered with `datetime('now', '-N minutes')`
against `completed_at` (written via `utc_now_iso()` → ISO-Z). The two
formats diverge at the date/time separator (T vs space), so lexicographic
compare let every same-day terminal row through regardless of age — the
exact failure invariant #16 / issue Abilityai#476 was written about.

Consequence: on busy installs the canary:e02:terminal_seen Redis hash
grew unbounded until PREV_TERMINAL_MAX nuked the prev set, leaving E-02
with nothing to diff against on the following cycle.

- Extend iso_cutoff() with a keyword-only `minutes=` parameter for
  sub-hour windows. Existing positional callers unaffected.
- Switch snapshot._collect_terminal_executions to iso_cutoff(minutes=...).
- Pin the new signature with TestMinutesKwarg in test_iso_cutoff.py
  (incl. a TypeError check on `iso_cutoff(2, 30)` so a future refactor
  can't quietly turn that into "2h30m").
…cleResult

The router's re-fetch loop ran list_canary_violations once per
violation (not per invariant), repeatedly querying the same
(invariant_id, snapshot_time) window and matching rows back to
violations by signal_query — both wasteful and silently wrong when
two violations share a signal_query.

The service already had the row id from insert_canary_violation
and threw it away. Thread it through CycleResult.persisted_violation_ids
(index-aligned with .violations, None on insert failure) so the
router maps directly to row ids without touching the DB.
The 27-test suite hit DB CRUD and the pure invariants but never
exercised the orchestrator — which is where the demo-driven bugs
lived (e7c11b2 transition spam, ef40cf9 terminal-status typo).
Both shipped past the unit suite and were caught only by hand-driven
demo runs.

Adds a TestCanaryService class with four end-to-end tests over
run_cycle() using the existing FakeRedis + temp SQLite, plus a
canary_service fixture that stubs `database` / `db_models` and
records create_notification calls so transitions can be asserted
directly:

  - test_first_cycle_violation_fires_one_notification — baseline
  - test_continuing_red_does_not_re_fire — regression for e7c11b2
  - test_red_green_red_fires_twice — covers the re-arm branch
  - test_terminal_status_set_seeds_e02_side_table — regression for
    ef40cf9 (would fail against the pre-fix tuple)

Also adds string get/set ops to FakeRedis so the previous-cycle
cursor (canary:last_cycle_at) round-trips. 31/31 tests pass.
…ration

# Conflicts:
#	src/backend/db/migrations.py
…tions

Per product call: green→red transitions no longer write
agent_notifications rows or broadcast over WebSocket. Slack lands in
a follow-up PR and plugs into the same _emit_transition seam.

- _emit_transition now calls a no-op _notify_transition_stub (logs only)
- _render_message / _broadcast / _severity_rank / ws-manager hooks kept
  in place so the next PR is a single-file change
- Tests assert on transition_invariant_ids / cumulative_transitions and
  pin notification_calls == [] to catch regressions
- architecture.md notes Phase 1 has no alert sink wired
The dashboard surface is the Operating Room notifications panel, not
a literal navbar bell — comments and docstrings now say "alert sink"
or "panel" so a future reader doesn't go hunting for a bell icon.
…efault doesn't exist)

The manifest referenced `local:default`, which has no template directory under
config/agent-templates/. Agent creation silently skips template-volume mounting
on missing paths (services/agent_service/crud.py:341), so the fleet members
were spinning up from the bare base image with no CLAUDE.md — verified with
`docker exec agent-canary-fleet-burst cat /home/developer/CLAUDE.md` returning
"No such file".

Switched to `local:test-echo`: minimal stock template, no MCP/credentials,
deterministic short replies — exactly what S-01/E-02 slot-churn invariants
need. Schedule messages simplified accordingly.

Verified end-to-end on local: deleted broken containers, redeployed via
POST /api/systems/deploy, agents came up with templates, autonomy enabled,
schedules fired, real-flow ZREM regression in slot_service.py produced
S-01 critical violation captured by both the 5-min watcher (transitions=1)
and on-demand POST /api/canary/run-cycle. Recovery path returned to green.
The Notifications bullet still described the rejected dashboard path
(db.create_notification → agent_notifications) after 6e1bb94 stubbed
the alert sink. Replaced with an Alert sink bullet noting the no-op
_notify_transition_stub and the preserved _emit_transition seam for
the follow-up Slack PR. Architecture.md and requirements.md now agree.
…hook

- _emit_transition POSTs Block Kit payload via slack_service.post_webhook
- payload: severity emoji header + friendly invariant name, summary,
  per-invariant forensic detail, runbook hint, context line with
  snapshot_time + violation row #N + "last red Xm ago" badge
- CANARY_SLACK_WEBHOOK_URL env var (unset = silent sink)
- previous_violation_at now populated through CycleResult (was always None)
- removes dead WS-broadcast scaffolding from the dropped Phase 1 path
Justification was canary harness notifications under a synthetic agent
name, but the alert sink was rerouted to Slack (21b833d) and never
writes to the notifications table. Reverting restores ownership-scoped
filtering for admins — no live caller depends on the bypass.
… response

Thread snapshot.sources_unavailable through CycleResult so the on-demand
run-cycle endpoint reflects degraded Redis/SQLite reads instead of always
returning []. The field's whole purpose is to expose partial-state cycles
to the operator running the smoke test.
Pre-pivot vocabulary referenced a canary agent template invoking
mcp__trinity__send_notification on a per-agent loop. The harness has
been a backend service with a Slack webhook sink for several commits;
update the docstrings to match.
Header docstring stopped at Abilityai#34 while MIGRATIONS tuple has 57 entries.
Pre-existing drift; the canary migration added to it.
…erts

The E-02 phantom-reversal side-table stored the literal string
"terminal" against every id, so once the Slack sink landed its
forensic line printed "terminal → running" — useless to on-call.
The snapshot now carries terminal_exec_statuses: dict[eid, status]
and E-02 maintains a parallel canary:e02:terminal_status hash keyed
alongside the canary:e02:terminal_seen ZSET. The real prior status
(success / failed / cancelled / skipped) is read back when a
reversal fires and rendered into both observed_state and the
signal_query line. Aged-out ids are HDEL'd in the same sweep that
ZREMRANGEBYSCORE drops them so the two structures stay in sync.

Bundled side-table refactor: HASH + hard 5000-row DEL-on-overflow
becomes ZSET trimmed by ZREMRANGEBYSCORE on
PREV_TERMINAL_RETENTION_SECONDS (1h). The hard-reset trim left a
one-cycle E-02 blind spot every time the cap crossed on busy installs.

Tests: extends FakeRedis with zrangebyscore / hmget / hdel; asserts
the existing reversal test now reports previous_status="success",
adds a new test that round-trips all four terminal statuses through
seed→reverse, and a regression test for the age-based trim invariant.
… endpoint

After the 422 raise on invalid_ids, requested_ids is provably all-valid,
so re-filtering against INVARIANTS was dead code. Pass requested_ids
directly to run_cycle().
CANARY-001 alert sink ships in Phase 1, not Phase 2.
@obasilakis obasilakis marked this pull request as ready for review May 8, 2026 01:08
Copy link
Copy Markdown
Contributor

@vybe vybe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Request Changes

Two items before this can merge.

1. NameError in POST /api/canary/run-cycle — hard blocker

In src/backend/routers/canary.py, the run_canary_cycle() handler references valid_ids which is never defined. It should be requested_ids (defined ~50 lines earlier). Every call to the endpoint will throw NameError: name 'valid_ids' is not defined. The background loop is unaffected since it calls run_cycle() directly, but the on-demand endpoint is completely broken.

# BUG — valid_ids is undefined; crashes on every request
checks_skipped = [i for i in valid_ids if i not in cycle.violations]
checks_run=[i for i in valid_ids if i in cycle.violations],

# Fix — replace with requested_ids
checks_skipped = [i for i in requested_ids if i not in cycle.violations]
checks_run=[i for i in requested_ids if i in cycle.violations],

2. canary_service.py is 692 lines — please split before merge

The service file is already 692 lines and the diff shows canary_service.py at 692 lines added. Target is under 1000 lines per file, but this one is already getting close and Phase 2 invariants will make it grow. Recommend splitting before merge while the boundaries are obvious:

  • canary_service.py — just the CanaryService class and the 5-min loop (~200 lines)
  • canary/alert_sink.py (or similar) — Slack Block Kit formatting + webhook POST logic (currently embedded in the service)
  • canary/transition.py (or similar) — green→red transition detection logic

The snapshot collector and invariant library are already well-separated in src/backend/canary/. The service file should be equally lean.

Everything else here is well-designed — pure-function invariants, graceful degradation on source unavailability, age-based E-02 trimming, one-alert-per-transition suppression, and a solid 1400-line test suite. Happy to approve once these two are addressed.

obasilakis added 6 commits May 9, 2026 15:23
…lse positives

The bijection check fired on a transient mismatch caught by a snapshot
landing in the ~30ms gap between the SQL `status='running'` commit and
the Redis ZADD on start (or the ~5ms gap between SQL terminal flip and
ZREM on stop). Real `release()` skips and phantom rows persist across
multiple cycles, so 3s of grace silences the races without losing
coverage of the bug class S-01 was designed to catch.

- snapshot exposes `running_started_at` per running id and `slot_scores`
  per slot member (`time.time()` at ZADD), so the check can age-filter.
- S-01 drops `in_sql_only` ids whose `started_at` is within 3s and
  `in_redis_only` ids whose ZSET score is within 3s. Existing strict
  behaviour applies for older / unparseable timestamps.
- FakeRedis test stub gains `withscores` for ZRANGE.
- Three new tests: fresh-sql suppressed, fresh-redis suppressed,
  durable mismatch still fires.

51/51 canary tests pass.
…ration

# Conflicts:
#	src/backend/db/schema.py
… split alerts

(1) NameError on POST /api/canary/run-cycle: handler referenced `valid_ids`
which never existed (commit 02cfc97 only fixed the call site, not the
two trailing usages). Endpoint crashed on every request. Replace with
`requested_ids` (provably valid post-422 check).

(2) services/canary_service.py was 692 lines and growing. Extract the
Slack alert surface (constants + `_emit_transition` + `_build_slack_payload`
+ `_render_forensic` + `_format_row_refs` + `_format_last_red` +
`_render_message`) into services/canary_alerts.py as `CanaryAlerts`.
canary_service.py drops to 347 lines, alerts module is 370 — both well
under the 1000-line target. No behaviour change.

Tests pivoted to `CanaryAlerts._foo` references; 51/51 pass.
…ts split

Stale `_emit_transition` / bare `_build_slack_payload` references in
test docstrings and the slack_capture fixture's hint pointed at the
old method names. Renaming to `CanaryAlerts.emit_transition` etc. so
the hints match the actual code path. No test logic changes.
I1 — Webhook URL leak in error logs (services/slack_service.py:309-311):
httpx exceptions embed the request URL in their __str__, and for canary
alerts that URL IS the credential. The catchall returned `str(e)` which
was then logged by `CanaryAlerts.emit_transition`. Sanitize by returning
only the exception class name + a generic message; no URL ever lands in
Vector logs.

I2 — `checks_skipped` always empty (routers/canary.py): `run_invariants`
populates `cycle.violations[inv_id]` for every selected id (with `[]`
when clean or when a check raised), so the field would always be empty.
The actionable signal lives in `sources_unavailable` already. Drop the
field from `RunCycleResponse` and simplify `checks_run` to the request's
filter list directly.

I3 — Severity rank dict duplicated: promote `_severity_rank` →
`severity_rank` in canary_alerts and import it into the router instead
of redeclaring the literal.

51/51 canary tests pass; live-stack endpoint verified
(POST /run-cycle, partial filter, 422 path).
- I1: architecture.md run-cycle row referenced a non-existent
  `checks_skipped` field; replaced with `sources_unavailable` and
  documented the new 409 response.
- I2: lock-contention path returned an empty 200 indistinguishable
  from a real green cycle. CycleResult now carries `skipped: bool`
  and the router maps it to HTTP 409 `cycle in progress`.
- I3: TERMINAL_EXECUTION_STATUSES is now derived from
  TaskExecutionStatus.{SUCCESS,FAILED,CANCELLED,SKIPPED}.value
  instead of a hand-maintained string tuple. Tests stub `models`
  so unit imports don't pull in pydantic/db_models.

51/51 canary unit tests + 13/13 iso_cutoff tests green.
obasilakis added 2 commits May 9, 2026 18:04
`datetime.fromisoformat(ts.rstrip("Z")).timestamp()` stripped the Z
suffix, leaving a naive datetime; `.timestamp()` then interprets naive
datetimes as local time, so on any host with TZ != UTC the grace-window
math was off by the host's UTC offset — producing false-positive S-01
alerts. Python 3.11's `fromisoformat` parses Z natively into
`tzinfo=UTC`, so dropping the rstrip restores correctness on any host.

Surfaced by /review I1.
Diff-scoped audit of the canary harness (Phase 1 of Abilityai#411). Zero
findings at the 8/10 daily-mode confidence gate — admin-only API
surface, outbound-only Slack with explicit URL-leak mitigation,
parameterized SQL with constants-only f-string composition, env-gated
activation. Five candidates considered and dropped below the gate
(documented in the report).
Copy link
Copy Markdown
Contributor

@vybe vybe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both items addressed — NameError fixed and alert sink split into . CSO audit clean (0 findings). LGTM.

Copy link
Copy Markdown
Contributor

@vybe vybe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both items addressed — NameError fixed and alert sink split into canary_alerts.py. CSO audit clean (0 findings). LGTM.

@vybe vybe merged commit a4eec13 into Abilityai:dev May 10, 2026
9 checks passed
obasilakis added a commit that referenced this pull request May 18, 2026
Adds three moderate-complexity invariants on top of Phase 2. Each
brings exactly one new piece of plumbing — first time the canary
takes a hard dep on a non-trivial source beyond SQLite + Redis basic
ops:

- S-03 — Slot TTL ≥ execution timeout. For every member of
  `agent:slots:A`, the companion `agent:slot:A:{eid}` HASH must have
  `TTL ≥ execution_timeout_seconds + 300s` (SLOT_TTL_BUFFER). Three
  failure kinds surfaced explicitly: `missing` (-2; the #226 class),
  `no_expiry` (-1), `below_floor` (positive TTL under floor). Critical.
  Tier A. Per-slot `redis.ttl()` lookup, bounded by ZCARD per agent
  (≤ max_parallel_tasks).
- B-02 — No queued without slots-full. If any agent has queued > 0,
  then either `slot_count == max_parallel` (legit backpressure) OR a
  drain tick fired in the last 60s (drain will pick it up). Critical.
  Tier B. Requires `CapacityManager.run_maintenance()` to write a
  unix-timestamp heartbeat to `canary:drain_tick_at` at the END of
  each successful sweep — mid-sweep crash leaves cursor stale and
  lets B-02 catch the breakage. One-line write in capacity_manager.py,
  rest is canary-local.
- R-01 — No zombie Claude processes. For every running
  `trinity.platform=agent` container,
  `ps -eo stat,comm | grep '^Z.*claude' | wc -l` must be 0. Critical.
  Tier A. Guards PR #407. New source type — docker exec via the
  existing docker_service.docker_client. Per-container failures recorded
  in `sources_unavailable` so a single unhealthy container doesn't
  kill the cycle. Regex anchored at `^Z` rather than the catalog's
  ` Z` (leading-space) — procps-ng on the agent base image emits
  STAT left-aligned without padding; verified live by spawning an
  actual zombie via `os.fork()`+`prctl(PR_SET_NAME, "claude")`.

Snapshot extended:
- `AgentSnapshot.slot_ttls: Dict[str, int]` — per-slot metadata TTL,
  drain sentinels skipped at collection time.
- `Snapshot.drain_tick_at: Optional[float]` — read from
  `canary:drain_tick_at`, sentinel-`None` on cold cluster.
- `Snapshot.zombie_counts: Dict[str, int]` — per-agent zombie process
  count via container.exec_run; missing entry = exec failed for that
  container (recorded in `sources_unavailable`).

Tests: 67 → 84 passing. Added `fake_docker` fixture so the synthetic
container list is controllable; FakeRedis got a `ttl()` method with
the standard -2/-1/positive sentinel semantics.

Verification against local stack — for each invariant, provoke, run
`POST /api/canary/run-cycle`, observe red, revert, observe green:

- S-03: ZADD a slot + EXPIRE its metadata HASH to 30s while the floor
  is 360s → critical violation `kind: below_floor`. Also covered the
  `missing` kind by deleting the HASH entirely. Revert by EXPIRE 500.
- B-02: inserted 1 queued row, set `canary:drain_tick_at` to 600s
  ago → critical violation, `free_slots: 2, drain_tick_age_seconds:
  600`. Revert by writing a fresh timestamp.
- R-01: spawned a real zombie inside agent-cornelius-m via Python
  fork + prctl PR_SET_NAME → critical violation
  `zombie_count: 1`. Reaped by killing the parent → green.

All post-fix cycles returned `violations: 0, transitions: 0`. Final
all-10-invariants cycle on clean platform: 106ms cycle duration,
all green, `sources_unavailable: []`.

Refs: #411, #653 (Phase 1), #884 (this PR — Phase 2 also)
vybe pushed a commit that referenced this pull request May 20, 2026
… B-02, R-01) (#884)

* feat(#882): canary invariant harness Phase 2 (S-02, E-01, E-05, B-01)

Adds four single-source SQL/Redis invariants to the canary harness
(#411). All four follow the Phase 1 (#653) pattern — no new source
types, no new infrastructure, registered into the same `INVARIANTS`
dict the run-cycle endpoint and background loop already drive.

- S-02 — No overbooking. `ZCARD(agent:slots:A)` (drain sentinels
  filtered) > `max_parallel_tasks`. Critical. Tier A. Catches
  `acquire_slot` bypass — distinct from S-01 because the violation can
  be self-consistent (Redis and SQL agree on N+1 vs cap of N).
- E-01 — Terminal-state closure. No `status='running'` row older than
  `execution_timeout_seconds + 300s` (matches `SLOT_TTL_BUFFER` so the
  check fires *after* cleanup has had its window). Critical. Tier B.
- E-05 — Dispatched rows have session. No running row older than 60s
  with `claude_session_id IS NULL`. Major. Tier B. Guards #106.
- B-01 — Queue-status coherence. `db.get_queued_count` (the accessor
  BacklogService calls) agrees with the snapshot's independently-
  collected `len(queued_exec_ids)`. Critical. Tier A. Trivially-green
  today after the #428 consolidation; regression guard against a
  future cache layer or status-filter drift on the production accessor.

Snapshot extended with per-execution `claude_session_id` (E-05) and
per-agent `queued_count_via_service` (B-01). The session-id collector
PRAGMA-introspects the column so the minimal unit-test DDLs don't
have to mirror every production column. The service-count collector
lazy-imports `database.db`, returning `None` on import failure so unit
tests (which stub `db.connection` but not the full facade) skip B-01
silently rather than firing a false positive.

Unit tests: 67 passing (was 51). Each new invariant has positive,
negative, and edge-case tests.

Verification against local stack — for each invariant, provoke, run
`POST /api/canary/run-cycle`, observe red, revert, observe green.
All four reproduce as designed:

- S-02 — ZADD'd 3 fake slot ids when max_parallel=2 → critical
  violation with `overbooked_by: 1`.
- E-01 — inserted `status='running'` row with `started_at` 2h ago
  against a 60s-timeout agent → critical violation,
  `age_seconds: 7436 > timeout+buffer=360s`.
- E-05 — inserted `status='running'` row 3 min old with
  `claude_session_id` NULL → major violation, age=188s.
- B-01 — temporarily patched `db.get_queued_count` to return
  `count - 1` → critical violation,
  "db.get_queued_count = 0 != |queued ids in snapshot| = 1".

Each post-fix cycle returned `violations: 0, transitions: 0`.

Refs: #411, #653, docs/testing/orchestration-invariant-catalog.md

* feat(#882): canary harness Phase 3 (S-03, B-02, R-01)

Adds three moderate-complexity invariants on top of Phase 2. Each
brings exactly one new piece of plumbing — first time the canary
takes a hard dep on a non-trivial source beyond SQLite + Redis basic
ops:

- S-03 — Slot TTL ≥ execution timeout. For every member of
  `agent:slots:A`, the companion `agent:slot:A:{eid}` HASH must have
  `TTL ≥ execution_timeout_seconds + 300s` (SLOT_TTL_BUFFER). Three
  failure kinds surfaced explicitly: `missing` (-2; the #226 class),
  `no_expiry` (-1), `below_floor` (positive TTL under floor). Critical.
  Tier A. Per-slot `redis.ttl()` lookup, bounded by ZCARD per agent
  (≤ max_parallel_tasks).
- B-02 — No queued without slots-full. If any agent has queued > 0,
  then either `slot_count == max_parallel` (legit backpressure) OR a
  drain tick fired in the last 60s (drain will pick it up). Critical.
  Tier B. Requires `CapacityManager.run_maintenance()` to write a
  unix-timestamp heartbeat to `canary:drain_tick_at` at the END of
  each successful sweep — mid-sweep crash leaves cursor stale and
  lets B-02 catch the breakage. One-line write in capacity_manager.py,
  rest is canary-local.
- R-01 — No zombie Claude processes. For every running
  `trinity.platform=agent` container,
  `ps -eo stat,comm | grep '^Z.*claude' | wc -l` must be 0. Critical.
  Tier A. Guards PR #407. New source type — docker exec via the
  existing docker_service.docker_client. Per-container failures recorded
  in `sources_unavailable` so a single unhealthy container doesn't
  kill the cycle. Regex anchored at `^Z` rather than the catalog's
  ` Z` (leading-space) — procps-ng on the agent base image emits
  STAT left-aligned without padding; verified live by spawning an
  actual zombie via `os.fork()`+`prctl(PR_SET_NAME, "claude")`.

Snapshot extended:
- `AgentSnapshot.slot_ttls: Dict[str, int]` — per-slot metadata TTL,
  drain sentinels skipped at collection time.
- `Snapshot.drain_tick_at: Optional[float]` — read from
  `canary:drain_tick_at`, sentinel-`None` on cold cluster.
- `Snapshot.zombie_counts: Dict[str, int]` — per-agent zombie process
  count via container.exec_run; missing entry = exec failed for that
  container (recorded in `sources_unavailable`).

Tests: 67 → 84 passing. Added `fake_docker` fixture so the synthetic
container list is controllable; FakeRedis got a `ttl()` method with
the standard -2/-1/positive sentinel semantics.

Verification against local stack — for each invariant, provoke, run
`POST /api/canary/run-cycle`, observe red, revert, observe green:

- S-03: ZADD a slot + EXPIRE its metadata HASH to 30s while the floor
  is 360s → critical violation `kind: below_floor`. Also covered the
  `missing` kind by deleting the HASH entirely. Revert by EXPIRE 500.
- B-02: inserted 1 queued row, set `canary:drain_tick_at` to 600s
  ago → critical violation, `free_slots: 2, drain_tick_age_seconds:
  600`. Revert by writing a fresh timestamp.
- R-01: spawned a real zombie inside agent-cornelius-m via Python
  fork + prctl PR_SET_NAME → critical violation
  `zombie_count: 1`. Reaped by killing the parent → green.

All post-fix cycles returned `violations: 0, transitions: 0`. Final
all-10-invariants cycle on clean platform: 106ms cycle duration,
all green, `sources_unavailable: []`.

Refs: #411, #653 (Phase 1), #884 (this PR — Phase 2 also)

* fix(canary): /review fixes — alert quality + B-02 boot-window false-positive

Addresses three findings from the pre-landing /review pass:

I1 — Alert quality for Phase 2 + 3 invariants. canary_alerts.py only
had S-01/E-02/L-03 entries in `_INVARIANT_NAMES`, `_INVARIANT_RUNBOOKS`,
`_render_message`, and `_render_forensic`. New ids fell through to the
"S-02 fired N violation(s)" generic fallback with the id doubled in
the header. Added 7 entries each:
  - Friendly name and one-line runbook hint per invariant
  - Per-id `_render_message` (e.g. "3 zombie claude process(es) across
    1 agent(s): cornelius-m" for R-01)
  - Per-id `_render_forensic` rendering of the relevant observed_state
    fields, truncated to 5 violations with a "+N more" footer

Verified by hand-building an R-01 ViolationReport and inspecting the
Block Kit payload — header, body, forensic, runbook, and context all
render the new shape.

I2 — B-02 boot-window false-positive. Background canary loop is fine
(30s startup vs 15s maintenance loop), but the on-demand
`POST /api/canary/run-cycle` endpoint can hit in the first 15s when
no heartbeat exists. With pre-existing queued rows and free slots,
B-02 would fire with `drain_tick_age_seconds: null`.

`CapacityManager.__init__` now seeds the heartbeat with a fresh
timestamp on construction. The maintenance loop overwrites on every
successful tick; init only needs a non-stale floor. Verified live:
deleted the heartbeat key, bounced the backend, key is present
immediately — no waiting for the maintenance tick.

I3 — Stale docstring in `_collect_zombie_counts`. The docstring still
described the catalog's ` Z.*claude` (leading-space) reasoning while
the actual `cmd` line uses `^Z.*claude`. Updated to describe the
anchor-at-line-start version and reference the live-zombie
verification.

Bonus tidy: moved `import time` from inside `run_maintenance` to the
top-of-file imports.

Tests: 84/84 still green. Full all-10-invariant cycle clean.

Refs: /review pass on #884

* feat(canary-fleet): replace long with sleep-echo slow agent

`canary-fleet-long` was a duplicate of burst — same template, same task
duration, same model — only the cron differed (*/5 vs */2). It added no
coverage burst didn't already provide for S-01, E-02, S-03, R-01.

Replace it with a `slow` agent backed by a new `sleep-echo` local
template that sleeps 75s per task. This gives Phase 2 invariants
something to inspect:

- E-05 (dispatched rows have session): needs >60s running rows; was
  trivially-green with 4s test-echo tasks
- S-03 below_floor: needs a slot to exist at canary snapshot time

Also locks burst's live config into the yaml so a redeploy doesn't
revert prior live SQL fixes:
- cron: * * * * * -> */2 * * * *  (cheapest cadence that phase-slides
  against the 5-min canary cycle)
- description / comments updated to reflect actual coverage scope

Manifest deploy can't express `model`, `max_parallel_tasks`, or
`execution_timeout_seconds` (system_service.create_schedules drops them,
SystemAgentConfig has no slot for capacity). Documented the four
required post-deploy API calls in the yaml header so the next operator
doesn't trip on it.

* chore(lint): regenerate sys.modules baseline to absorb dev state

Pre-existing CI failure inherited from dev — `dev` has been failing
this lint since #871 (commit 98574f3) merged on 2026-05-17. That PR
added 6 sys.modules violations in tests/unit/test_slot_per_slot_ttl.py
without regenerating the baseline; a separate cleanup retired 3
violations in tests/unit/test_cleanup_unreachable_orphan.py.

Regenerated via `python tests/lint_sys_modules.py --regenerate-baseline`
— the path the lint script itself directs you to when violations
move below baseline AND new files exceed it. Net: 235 violations in
67 files (unchanged total).

No code-quality regression in this PR's actual diff — none of the
canary tests use bare sys.modules manipulation (they use
monkeypatch.setitem throughout).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants