CI Gates

CI gates reference

Every gate that runs on push to main and on every pull_request:, with the invariant each one proves and the script behind it. Three per-OS workflows run in parallel. Run the deterministic subset locally in one command before every commit:

bash scripts/check-all.sh

⚡ Quick Reference

Workflow	Runs on	Jobs
`[T] Linux Tests`	`ubuntu-latest`	install-smoke + adapter-parity + validate + check-references + check-wiki + unit tests + verify-v4 + verify-phases + verify-memory-roundtrip + syntax + lib-parity + pii-guardrails (check-no-pii + gitleaks) + dogfood-workflows
`[T] Mac Tests`	`macos-latest`	install-smoke + validate + check-references + unit tests + verify-v4 + verify-phases + verify-memory-roundtrip + syntax (both shells) + lib-parity + check-no-pii
`[T] Windows Tests`	`windows-latest`	install-smoke (pwsh) + installer-boundary + validate + check-references + pwsh syntax + lib-parity + check-no-pii + unit tests

What each gate proves

Gate	Invariant	Script
install-smoke	Fresh install succeeds; re-run is idempotent; `--update` refreshes managed files but preserves user edits to `wiki/` and `AGENTS.md`; test infra never propagates to scratch.	`scripts/smoke-install-bash.sh`, `scripts/smoke-install-pwsh.ps1`
post-install integrity	Hook-command paths resolve; every `.sh`/`.ps1` parses; bash installer produces bash commands, pwsh installer produces pwsh commands; `settings.json` has the expected schema; `.harness` state files are valid.	`scripts/check-integrity-bash.sh`, `scripts/check-integrity-pwsh.ps1`
adapter-parity	Every adapter ships the canonical set of phase-commands, sub-agents, and skills.	`scripts/check-parity.sh`
validate	Every TOML, YAML frontmatter, and JSON across `adapters/` and `templates/` parses and has required keys.	`scripts/validate-adapters.py`
check-references	Every `harness/<phases\|agents\|skills\|pipelines>/*.md` mentioned in an adapter file exists; phase-spec "dispatch the `<name>` sub-agent / invoke the `<name>` skill" lines point at a canonical spec; `settings-fragment-{bash,pwsh}.json` have matching schemas.	`scripts/check-references.py`
check-wiki	Diátaxis structural rules (a–k): mode purity, ADR append-only + `Status: accepted\|superseded\|rejected`, orphan-link detection, globally-unique filenames, no banned-headings-per-mode. Runs `--strict` (blocks PRs) when `wiki/.diataxis` is present; warn-only otherwise. Shipped in v0.9.0 as part of the Diátaxis rollout (ADR 0004).	`scripts/check-wiki.py`
syntax	`bash -n` on every `.sh`; PowerShell AST parse on every `.ps1` across repo root + `scripts/` + `templates/` + `adapters/`.	`scripts/check-syntax.sh`, `scripts/check-syntax.ps1`
unit tests	Every `scripts/test_*.py` (auto-discovered) passes — the memory-script logic in isolation.	`(cd scripts && python3 -m unittest discover -p 'test_*.py')`
check-lib-parity	`lib/install/` matches the committed checksums (byte-identical across agentm + crickets).	`scripts/check-lib-parity.sh`
check-vault-lock-parity	The two copies of the vault-write protocol — `scripts/vault_lock.py` and its vendored twin `harness/skills/memory/scripts/vault_lock.py` — are sha256-identical, so the memory skill and the harness core share one canonical lock implementation.	`scripts/check-vault-lock-parity.sh`
check-multi-plan-naming	Locks the named-plan naming contract three ways: (1) `scripts/harness_memory.py` still exposes the named-plan resolver surface — both `resolve_active_plan` (the session→plan binder) and `harness_state_dir` (the state-dir enumerator); (2) no curated `harness/.md` doc hard-asserts a singleton via the narrow deny-pattern — definite-article `the PLAN.md` + possessive `PLAN.md's` — which still permits every legitimate mention (a named `PLAN-<name>.md`, a `PLAN.md` glob, a `<slug>.PLAN.md` queued file, the `vault-state-path PLAN.md` CLI example, and `PLAN.archive.`); (3) both session-start hook twins (`harness-context-session-start.{sh,ps1}`) still glob `PLAN-.md`, so they cannot drift apart and silently lose named-plan discovery at session boot (assertion 3, added V5-10 part 1 task 5). Scans 7 curated docs; `design/SKILL.md` is included as a regression guard. See Named plans.	`scripts/check-multi-plan-naming.sh`
check-worktree-slug	The worktree slug-safety invariant (V5-10 / LC-2): a worker in a `git worktree` can't see the parent's gitignored `.harness/`, so it resolves the vault slug by the origin basename alone (Tier 3). If the full-chain slug (an explicit `vault_project` / `github.repo` override) diverges from the origin basename, a worktree worker would silently write plans/progress under the wrong `projects/<slug>/`. Delegates to `vault_project.py check-worktree-slug` (the same resolver the `doctor` probe calls, so gate + probe never drift); no origin remote → warn-only.	`scripts/check-worktree-slug.sh`
check-no-auto-worktree	No agentm automation surface auto-spawns a worktree (V5-10 / LC-3): worktrees are an operator-initiated primitive (the spawn helper lives crickets-side). Scans executable surfaces (shell · python · pwsh · CI yaml) for the `git worktree add` spawn verb; read/cleanup subcommands (`list`/`remove`/`prune`) are allowed, tests + this gate's own file excluded. Proves agentm itself never creates a worktree unprompted.	`scripts/check-no-auto-worktree.sh`
check-no-pii	The regex PII scanner finds no personal info across the tree (this is a public repo).	`scripts/check-no-pii.sh`
verify-v4	The V4 #23 auto-orchestration push surface works end-to-end (briefing · idle · phase-dispatch · nudges · config/state) against a throwaway scratch vault — see below. Linux/Mac only.	`scripts/verify-v4.sh`
verify-phases	A full phase lifecycle (`/setup → /plan → /work → /release`) drives its deterministic seams — state read/write, `progress.md` appends, `features.json` updates, post-phase dispatch plumbing — end-to-end on a throwaway fixture project, run twice: once vault-resident, once repo-local. Linux/Mac only.	`scripts/verify-phases.sh`
verify-memory-roundtrip	The memory engine round-trips on a throwaway fixture vault: stub-mode `embed` (deterministic hash vector, no network/model) → `save` → `recall query` surfaces it by content → `reflect` a synthetic transcript → `vec_index` full-sync/drain builds the index → nearest-neighbor read-back → `vault_lint` clean. 12 checks; a `VERIFY_MEMORY_FAULT=drop-save` injection drives the negative path. The nearest-neighbor sub-check is conditional on the backend — asserted when the Python `sqlite3` supports `enable_load_extension` (Mac/Linux CI `pip install sqlite-vec` to exercise it), logged as SKIPPED (never silently dropped) when it falls back to keyword recall by design. Hermetic, Linux/Mac only.	`scripts/verify-memory-roundtrip.sh`
dogfood-workflows	Every workflow the harness ships as a template under `templates/.github/workflows/` is active at the repo root, byte-identical to the template. Mirrored locally by `check-workflow-parity` (below), so a one-sided edit is caught in `check-all.sh` before the push, not as a red Linux run after it.	Inline job in `tests-linux.yml`
check-workflow-parity	The local mirror of `dogfood-workflows` (above): every templated `templates/.github/workflows/*.yml` is active at the repo root, byte-identical (`diff -u`, the same comparator CI uses, so the two verdicts cannot diverge). Active workflows without a template twin (e.g. `ci-all.yml`) are out of scope — the invariant is template→active, not the reverse. One deliberate divergence from CI: zero templated workflows is a setup error (exit 2), not the vacuous pass CI's `nullglob` yields — a local gate that checked nothing must not read as green.	`scripts/check-workflow-parity.sh`

`verify-v4.sh` — the push-surface integration check

End-to-end check of the V4 #23 auto-orchestration surface. Runs the real scripts via their CLIs against a throwaway scratch vault and asserts the deterministic outputs — the integration complement to the per-function unit suite.

Property	Detail
Isolation	A `mktemp -d` scratch vault + an exported `IDEAS_SURFACE_PATH` — never reads or writes a real vault.
No side effects	No network, no transcript mining, no sub-agent dispatch; self-cleans via a `trap`.
Coverage	config seed/parse · every briefing signal (inbox · HIGH watchlist · incubator · idea-ledger · staged-adapt · both nudges) · staged-adapt surfaces-and-clears · idle-chain dry-run (ordering + bounded `--max-batches`/`--limit`) · phase-dispatch plans + session-marker resolution (incl. the `ambiguous-session` concurrency guard) · emit gating (shifted-guard + cooldown) · atomic state write.
Out of scope	The real-boot integration (does a live session inject the briefing), the cross-surface read paths, and subjective fatigue calibration — those are the operator dogfood (vault-resident `projects/agentm/_harness/DOGFOOD-V4.md`).
Extend it	Add a `check_*` per new push-surface signal (drive the real script against `$SV`; assert with `assert_contains`/`assert_equals`/`assert_absent`), keeping each check scratch-isolated.

Reading red CI

gh run list --workflow "[T] Linux Tests"    --limit 3
gh run list --workflow "[T] Mac Tests"       --limit 3
gh run list --workflow "[T] Windows Tests"   --limit 3
gh run view <run-id> --log-failed             # drill into the failing step

Red-on-Windows but green-on-POSIX almost always indicates a path-separator or pwsh-host assumption regression. Red-on-all is usually a canonical-spec or adapter-parity drift — try bash scripts/check-parity.sh locally.

Running the gate set locally

One command runs the deterministic battery — 16 gates: unit tests + every check-* gate + the three integration checks (verify-v4, verify-phases, verify-memory-roundtrip) — prints a PASS/FAIL table, and exits non-zero on any failure:

bash scripts/check-all.sh

It deliberately omits the heavier smoke-install + gitleaks (slow / external tooling) that CI runs on every push — run those directly if you need them: bash scripts/smoke-install-bash.sh (POSIX) / pwsh -NoProfile -File scripts/smoke-install-pwsh.ps1 (Windows). check-all.sh is the maintained source of truth for the local battery — add a gate line as the project grows.

How to cut a release — CI must be green before invoking ship-release.
How to refresh an installed harness — what --update touches vs. leaves alone.
Vault write protocol — the protocol the check-vault-lock-parity gate keeps byte-identical across its two copies.
ADR 0002 — the installer-boundary rule the gates enforce.

CI Gates

CI gates reference

⚡ Quick Reference

What each gate proves

verify-v4.sh — the push-surface integration check

Reading red CI

Running the gate set locally

Related

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

`verify-v4.sh` — the push-surface integration check