CI Gates
Every gate that runs on push to main and on every pull_request, with the invariant each one proves and the script behind it. Three per-OS workflows run in parallel. Run the deterministic subset locally in one command before every commit:

    bash scripts/check-all.sh

| Workflow | Runs on | Jobs |
|---|---|---|
| [T] Linux Tests | ubuntu-latest | install-smoke + adapter-parity + validate + check-references + check-wiki + unit tests + verify-v4 + verify-orchestration-briefing + verify-phases + verify-memory-roundtrip + syntax + lib-parity + pii-guardrails (check-no-pii + gitleaks) + dogfood-workflows |
| [T] Mac Tests | macos-latest | install-smoke + validate + check-references + unit tests + verify-v4 + verify-orchestration-briefing + verify-phases + verify-memory-roundtrip + syntax (both shells) + lib-parity + check-no-pii |
| [T] Windows Tests | windows-latest | install-smoke (pwsh) + installer-boundary + validate + check-references + pwsh syntax + lib-parity + check-no-pii + unit tests |

| Gate | Invariant | Script |
|---|---|---|
| install-smoke | Fresh install succeeds; re-run is idempotent; --update refreshes managed files but preserves user edits to wiki/ and AGENTS.md; test infra never propagates to scratch. | scripts/smoke-install-bash.sh, scripts/smoke-install-pwsh.ps1 |
| post-install integrity | Hook-command paths resolve; every .sh/.ps1 parses; bash installer produces bash commands, pwsh installer produces pwsh commands; settings.json has the expected schema; .harness state files are valid. | scripts/check-integrity-bash.sh, scripts/check-integrity-pwsh.ps1 |
| adapter-parity | Every adapter ships the canonical set of phase-commands, sub-agents, and skills. | scripts/check-parity.sh |
| validate | Every TOML, YAML frontmatter, and JSON across adapters/ and templates/ parses and has required keys. | scripts/validate-adapters.py |
| check-references | Every `harness/<phases\|agents\|skills\|pipelines>/*.md` mentioned in an adapter file exists; phase-spec "dispatch the `<name>` sub-agent / invoke the `<name>` skill" lines point at a canonical spec; settings-fragment-{bash,pwsh}.json have matching schemas. | scripts/check-references.py |
| check-wiki | Diátaxis structural rules (a–k): mode purity, ADR append-only + `Status: accepted\|superseded\|rejected`, orphan-link detection, globally-unique filenames, no banned-headings-per-mode. Runs --strict (blocks PRs) when wiki/.diataxis is present; warn-only otherwise. Shipped in v0.9.0 as part of the Diátaxis rollout (Foundations HLD). | scripts/check-wiki.py |
| syntax | bash -n on every .sh; PowerShell AST parse on every .ps1 across repo root + scripts/ + templates/ + adapters/. | scripts/check-syntax.sh, scripts/check-syntax.ps1 |
| unit tests | Every `scripts/test_*.py` (auto-discovered) passes, exercising the memory-script logic in isolation. | `(cd scripts && python3 -m unittest discover -p 'test_*.py')` |
| check-lib-parity | lib/install/ matches the committed checksums (byte-identical across agentm + crickets). | scripts/check-lib-parity.sh |
| check-vault-lock-parity | The two copies of the vault-write protocol (scripts/vault_lock.py and its vendored twin harness/skills/memory/scripts/vault_lock.py) are sha256-identical, so the memory skill and the harness core share one canonical lock implementation. | scripts/check-vault-lock-parity.sh |
| check-multi-plan-naming | Locks the named-plan naming contract three ways: (1) scripts/harness_memory.py still exposes the named-plan resolver surface, namely both resolve_active_plan (the session→plan binder) and harness_state_dir (the state-dir enumerator); (2) no curated `harness/*.md` doc hard-asserts a singleton via the narrow deny-pattern (definite-article "the PLAN.md" + possessive "PLAN.md's"), which still permits every legitimate mention (a named `PLAN-<name>.md`, a `PLAN*.md` glob, a `<slug>.PLAN.md` queued file, the vault-state-path PLAN.md CLI example, and `PLAN.archive.*`); (3) both session-start hook twins (harness-context-session-start.{sh,ps1}) still glob `PLAN-*.md`, so they cannot drift apart and silently lose named-plan discovery at session boot (assertion 3, added V5-10 part 1 task 5). Scans 7 curated docs; design/SKILL.md is included as a regression guard. See Named plans. | scripts/check-multi-plan-naming.sh |
| check-worktree-slug | The worktree slug-safety invariant (V5-10 / LC-2): a worker in a git worktree can't see the parent's gitignored .harness/, so it resolves the vault slug by the origin basename alone (Tier 3). If the full-chain slug (an explicit vault_project / github.repo override) diverges from the origin basename, a worktree worker would silently write plans/progress under the wrong `projects/<slug>/`. Delegates to vault_project.py check-worktree-slug (the same resolver the doctor probe calls, so gate + probe never drift); no origin remote → warn-only. | scripts/check-worktree-slug.sh |
| check-no-auto-worktree | No agentm automation surface auto-spawns a worktree (V5-10 / LC-3): worktrees are an operator-initiated primitive (the spawn helper lives crickets-side). Scans executable surfaces (shell · python · pwsh · CI yaml) for the git worktree add spawn verb; read/cleanup subcommands (list/remove/prune) are allowed, tests + this gate's own file excluded. Proves agentm itself never creates a worktree unprompted. | scripts/check-no-auto-worktree.sh |
| check-process-seam-import-direction | The memory↔process edge is one-directional (V5-4 / LC-4): the process-seam client (scripts/process_seam.py) imports the memory engine, never the reverse. Scans agentm's Python automation surfaces (scripts harness lib templates .github) for any module importing process_seam; excludes `test_*.py` (tests import by design), process_seam.py itself, and the empty SEAM_CONSUMERS allowlist (designed consumers live crickets-side, LC-5). A hit is a forbidden back-edge that would turn the one-way client dependency into a cycle. V5-5 bridge extension (LC-8): also asserts no kernel toolkit script (harness/skills/memory/scripts/) imports harness_memory; a back-edge through the orchestration bridge is equally forbidden. The gate's enforcement also runs cross-OS via the auto-discovered unit tests step (its subprocess tests live in test_process_seam.py), so no separate per-OS workflow step exists. See Memory↔process seam, Process seam, and Orchestration bridge. | scripts/check-process-seam-import-direction.sh |
| check-storage-seam-no-path-leak | No pathlib.Path crosses the memory↔storage seam (V5-1): every seam verb returns the seam's own Locator/Info, never a path, so a filesystem assumption can't reach the engine. Static (AST, not grep): parses each `scripts/storage_*.py` and flags any seam verb (resolve/read/write/list/exists/info/mkdir) whose return annotation references a path type, however nested (`Path`, `list[Path]`, `Path \| None`, `os.PathLike`). Internal `Path` use is fine, such as a filesystem backend's `root / key`; only handing one back is the leak. `test_*.py` is out of the glob, so conformance fixtures (which build a `Path`-returning backend to test the gate) don't trip it. The structural sibling of check-process-seam-import-direction; runs cross-OS via the auto-discovered unit tests step (subprocess tests in `test_storage_seam.py`). See Memory↔storage seam and Storage seam. | |
| check-no-pii | The regex PII scanner finds no personal info across the tree (this is a public repo). | scripts/check-no-pii.sh |
| check-no-hardcoded-vault-path | No tracked non-test file contains a hardcoded absolute vault path: (A) …/Library/CloudStorage/… literals (not shell tilde/variable expansions); (B) the retired pre-V5-3 vault root name as a path component (…/Obsidian/AgentMemory). Ensures vault-path hygiene survives refactors: callers must resolve the live path at runtime, not bake a machine-specific or stale-root path into source. | scripts/check-no-hardcoded-vault-path.py |
| verify-v4 | Kernel-owned auto-orchestration contracts: config seed/parse (A), idle-chain dry-run ordering + bounded execution (E), emit gating (shifted-guard + cooldown) + atomic state write + single-writer invariant (G static) — against a throwaway scratch vault. Linux/Mac only. | scripts/verify-v4.sh |
| verify-orchestration-briefing | PM-half of the orchestration push surface (V5-5 / LC-9): briefing signals (inbox · HIGH watchlist · incubator · idea-ledger · staged-adapt · both nudges) · staged-adapt surfaces-and-clears. 10 checks against a throwaway scratch vault. Linux/Mac only. Session-marker and phase-dispatch scenarios are in verify-phases. | scripts/verify-orchestration-briefing.sh |
| verify-phases | A full phase lifecycle (/setup → /plan → /work → /release) drives its deterministic seams (state read/write, progress.md appends, features.json updates, post-phase dispatch plumbing) end-to-end on a throwaway fixture project, run twice: once vault-resident, once repo-local. Also covers session-marker scenarios (no-session / single-marker / ambiguous concurrency-safe) and the post-release discover-skills chain (V5-5 / LC-9). Linux/Mac only. | scripts/verify-phases.sh |
| verify-memory-roundtrip | The memory engine round-trips on a throwaway fixture vault: stub-mode embed (deterministic hash vector, no network/model) → save → recall query surfaces it by content → reflect a synthetic transcript → vec_index full-sync/drain builds the index → nearest-neighbor read-back → vault_lint clean. 12 checks; a VERIFY_MEMORY_FAULT=drop-save injection drives the negative path. The nearest-neighbor sub-check is conditional on the backend: asserted when the Python sqlite3 supports enable_load_extension (Mac/Linux CI runs pip install sqlite-vec to exercise it), logged as SKIPPED (never silently dropped) when it falls back to keyword recall by design. Hermetic, Linux/Mac only. | scripts/verify-memory-roundtrip.sh |
| dogfood-workflows | Every workflow the harness ships as a template under templates/.github/workflows/ is active at the repo root, byte-identical to the template. Mirrored locally by check-workflow-parity (below), so a one-sided edit is caught in check-all.sh before the push, not as a red Linux run after it. | Inline job in tests-linux.yml |
| check-workflow-parity | The local mirror of dogfood-workflows (above): every templated `templates/.github/workflows/*.yml` is active at the repo root, byte-identical (diff -u, the same comparator CI uses, so the two verdicts cannot diverge). Active workflows without a template twin (e.g. ci-all.yml) are out of scope: the invariant is template→active, not the reverse. One deliberate divergence from CI: zero templated workflows is a setup error (exit 2), not the vacuous pass CI's nullglob yields, because a local gate that checked nothing must not read as green. | scripts/check-workflow-parity.sh |
verify-v4.sh checks the V4 #23 kernel-owned auto-orchestration contracts. It runs the real scripts via their CLIs against a throwaway scratch vault and asserts deterministic outputs, which makes it the integration complement to the per-function unit suite. It was trimmed to 85 lines at V5-5 / LC-9 when the PM-half and Developer-half were extracted.
| Property | Detail |
|---|---|
| Isolation | A mktemp -d scratch vault + an exported IDEAS_SURFACE_PATH — never reads or writes a real vault. |
| No side effects | No network, no transcript mining, no sub-agent dispatch; self-cleans via a trap. |
| Coverage | Segment A: config seed/parse. Segment E: idle-chain dry-run (ordering + bounded --max-batches/--limit). Segment G: emit gating (shifted-guard + cooldown) · atomic state write · static single-writer check: greps `orchestration_*.py` for `^def save_state` and asserts 0 matches, making the V5-5 LC-2 single-writer invariant a gate-checked property. |
| Out of scope | Briefing signals, staged-adapt, nudge triggers, phase-dispatch, and session-marker scenarios — those moved to verify-orchestration-briefing.sh (PM-half) and verify-phases.sh (Developer-half). |
| Extend it | Add a check_* per new kernel-owned contract (drive the real script against $SV; assert with assert_contains/assert_equals/assert_absent), keeping each check scratch-isolated. |
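
The static single-writer check in Segment G can be illustrated with a short sketch. This is an assumption-laden Python rendering of the grep described in the table, not the shell code in verify-v4.sh: `find_extra_writers` is a hypothetical name, and the directory to scan is left to the caller because the table does not say where the `orchestration_*.py` files live.

```python
import re
from pathlib import Path

# Same pattern the table quotes: a top-level "def save_state" at line start.
SAVE_STATE_DEF = re.compile(r"^def save_state\b", re.MULTILINE)


def find_extra_writers(scripts_dir):
    """Return the orchestration_*.py file names that define a top-level save_state."""
    offenders = []
    for path in sorted(Path(scripts_dir).glob("orchestration_*.py")):
        if SAVE_STATE_DEF.search(path.read_text(encoding="utf-8")):
            offenders.append(path.name)
    return offenders
```

An empty result corresponds to the "0 matches" assertion: no orchestration module defines its own state writer, so the single-writer invariant holds.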
verify-orchestration-briefing.sh checks the PM-owned orchestration signals extracted from verify-v4.sh at V5-5 / LC-9. It will travel to the crickets PM-trigger plan when that ships; until then it runs agentm-side.
| Property | Detail |
|---|---|
| Isolation | A mktemp -d scratch vault — never reads or writes a real vault. |
| No side effects | No network, no transcript mining, no sub-agent dispatch; self-cleans via a trap. |
| Coverage | Every briefing signal: inbox · HIGH watchlist · incubator · idea-ledger · staged-adapt · both nudges (idea-promotion + watchlist-authoring) · staged-adapt surfaces-and-clears. 10 checks. |
| Out of scope | Kernel contracts (config seed, idle chain, emit gating), which remain in verify-v4.sh; phase-dispatch and session-marker scenarios (no-session / single-marker / ambiguous concurrency-safe), which are in verify-phases.sh. |
| Extend it | Add a check_* per new PM-half signal, keeping each check scratch-isolated. |

    gh run list --workflow "[T] Linux Tests" --limit 3
    gh run list --workflow "[T] Mac Tests" --limit 3
    gh run list --workflow "[T] Windows Tests" --limit 3
    gh run view <run-id> --log-failed   # drill into the failing step

Red-on-Windows but green-on-POSIX almost always indicates a path-separator or pwsh-host assumption regression. Red-on-all is usually a canonical-spec or adapter-parity drift; try bash scripts/check-parity.sh locally.
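
The triage heuristic above can be written down as a small function. This is only a sketch of the rule of thumb, assuming the per-OS results have already been collected into a dict; `triage` and the `linux` / `mac` / `windows` keys are hypothetical names, and `success` / `failure` mirror the conclusion values GitHub reports for a run.

```python
def triage(conclusions):
    """Map per-OS run conclusions to the likely cause described in the text.

    `conclusions` maps an OS label to a run conclusion such as "success" or "failure".
    """
    failed = {name for name, result in conclusions.items() if result != "success"}
    if not failed:
        return "all green"
    if failed == {"windows"}:
        return "likely a path-separator or pwsh-host assumption regression"
    if failed == set(conclusions):
        return "likely canonical-spec or adapter-parity drift; try bash scripts/check-parity.sh"
    return "mixed result; read the failing step with gh run view <run-id> --log-failed"
```

The mixed case deliberately makes no guess: the text only gives a heuristic for Windows-only and all-red runs.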
One command runs the deterministic battery of 22 gates (unit tests + every check-* gate + the four integration checks: verify-v4, verify-orchestration-briefing, verify-phases, verify-memory-roundtrip), prints a PASS/FAIL table, and exits non-zero on any failure:

    bash scripts/check-all.sh

It deliberately omits the heavier smoke-install + gitleaks (slow / external tooling) that CI runs on every push; run those directly if you need them: bash scripts/smoke-install-bash.sh (POSIX) / pwsh -NoProfile -File scripts/smoke-install-pwsh.ps1 (Windows). check-all.sh is the maintained source of truth for the local battery; add a gate line as the project grows.
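
The run-everything, print-a-table, fail-on-any-red behaviour can be sketched in a few lines. This is not a port of check-all.sh, which remains the source of truth; `run_gates` is a hypothetical helper, and the gate names and commands passed to it are whatever the caller supplies.

```python
import subprocess


def run_gates(gates):
    """Run each (name, argv) gate, print a PASS/FAIL table, and return an exit code.

    Every gate runs even after an earlier failure, so the table is always complete.
    """
    results = []
    for name, argv in gates:
        completed = subprocess.run(argv, capture_output=True)
        results.append((name, completed.returncode == 0))
    width = max((len(name) for name, _ in results), default=0)
    for name, ok in results:
        print(f"{name.ljust(width)}  {'PASS' if ok else 'FAIL'}")
    # Non-zero on any failure, matching the behaviour described above.
    return 0 if all(ok for _, ok in results) else 1
```

A caller would pass entries such as `("check-parity", ["bash", "scripts/check-parity.sh"])` and hand the return value to `sys.exit`. Running every gate before exiting, rather than stopping at the first failure, is a design choice of this sketch so that one run reports all red gates at once.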
- How to cut a release: CI must be green before invoking ship-release.
- How to refresh an installed harness: what --update touches vs. leaves alone.
- Vault write protocol: the protocol the check-vault-lock-parity gate keeps byte-identical across its two copies.
- Foundations HLD: the installer-boundary rule the gates enforce.