feat(hooks): session-pr-counter — mechanical session-scope enforcement (soc-1aou) #362
Merged
…t (soc-1aou)

Implements the mechanical follow-through for soc-waxr (PR #361, the session-scope doctrine rule). soc-waxr encoded "2-4 PRs/session default; ≥5 triggers mandatory post-mortem" as documentation; soc-1aou makes that documentation fire mechanically as a PreToolUse hook on `gh pr create`.

## Fitness delta

- Documentation-only rules with a mechanical backstop: was {AP#1 → ship.sh, AP#7 → verify-gate-claim.sh}, now adds session-scope → session-pr-counter.sh (3 of N session-relevant rules now mechanically enforced).
- Session-scope rule recurrence prevention: 0 → 1 (the rule's own derivation cited a session where the cron loop kept nudging "keep going" past the threshold; this hook would have fired).
- New bats tests: 0 → 12 (test-session-pr-counter.bats: kill switches, tool matching, threshold logic, hard-block mode, fail-open on malformed output).

## What it does

- Fires on PreToolUse for Bash when the command contains the substring `gh pr create`.
- Counts the operator's PRs in any state within the window (last 24h by default) via `gh pr list --search`.
- If that count is >= threshold-1 (so the PR about to be opened reaches the threshold), emits a `<system-reminder>`-shaped `additionalContext` with the post-mortem prompts.
- Hard-block mode (opt-in via `AGENTOPS_SESSION_PR_BLOCK=1`) instead exits 2 with a clear reason, for operators who want the gate to refuse rather than remind.

## Configuration

| Variable | Default | Purpose |
|---|---|---|
| `AGENTOPS_SESSION_PR_THRESHOLD` | 5 | PR count that triggers the reminder |
| `AGENTOPS_SESSION_PR_WINDOW_HOURS` | 24 | Window, in hours, treated as the "current session" |
| `AGENTOPS_SESSION_PR_BLOCK` | 0 | 1 = hard block (exit 2) instead of advisory |
| `AGENTOPS_SESSION_PR_COUNTER_DISABLED` | 0 | 1 = bypass this hook |
| `AGENTOPS_HOOKS_DISABLED` | 0 | 1 = bypass all AgentOps hooks |

## Sibling pattern

Hook structure mirrors `hooks/commit-review-gate.sh` (cycle 54), which is also a PreToolUse Bash hook that synthesizes `additionalContext` via either `emit_hook_context` or a `jq -n` fallback. Standards discipline (`set -uo pipefail` without `-e`, kill-switch chain, jq+env fallback for tool input) matches the same sibling.

## Files

| File | Change |
|---|---|
| `hooks/session-pr-counter.sh` | New (133 lines), PreToolUse Bash hook |
| `hooks/hooks.json` | New PreToolUse Bash entry (timeout 10s) |
| `cli/embedded/hooks/session-pr-counter.sh` + `hooks.json` | Auto-synced via `make sync-hooks` in `cli/` |
| `tests/hooks/test-session-pr-counter.bats` | New (12 tests, all green) |

## Verification

- 12/12 bats tests green
- `shellcheck hooks/session-pr-counter.sh` clean (SC1091 info-only on the hook-helpers `source`, matching the existing commit-review-gate.sh pattern)
- `jq -e . hooks/hooks.json` clean (valid JSON)
- `cd cli && make sync-hooks` clean
- Dogfooded shape: the hook would have fired on PR #5 of yesterday's 7-PR session

## What's NOT in this PR

The soc-waxr doctrine surfaces (CLAUDE.md, AGENTS.md, ship-loop SKILL, anti-patterns.md) still say "mechanical enforcement is a successor concern". Updating those will be a tiny follow-up PR once soc-waxr (PR #361) itself merges; editing the same lines now would conflict on rebase.

Closes: soc-1aou
Discovered-from: soc-waxr
Bounded-context: BC0-foundations + BC5-runtime (hook plumbing)
Evidence: shellcheck
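The kill switches, tool matching, and threshold rule described in this commit message reduce to a small decision function. The sketch below is a minimal illustration under stated assumptions, not the shipped hook: `hook_disabled`, `is_pr_create`, and `decide` are hypothetical names, and the PR count is passed in as an argument rather than fetched from GitHub.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the session-pr-counter decision logic. Function
# names are illustrative; this is not the shipped hooks/session-pr-counter.sh.
set -uo pipefail  # no -e: an advisory hook should fail open, not abort

# Kill-switch chain: either variable set to 1 bypasses the hook.
hook_disabled() {
  [[ "${AGENTOPS_HOOKS_DISABLED:-0}" == "1" ]] ||
    [[ "${AGENTOPS_SESSION_PR_COUNTER_DISABLED:-0}" == "1" ]]
}

# Tool matching: only Bash commands containing `gh pr create` are of interest.
is_pr_create() {
  [[ "$1" == *"gh pr create"* ]]
}

# Threshold logic. $1 is the number of PRs already opened in the window.
# Prints one of: allow | remind | block
decide() {
  local count="$1"
  local threshold="${AGENTOPS_SESSION_PR_THRESHOLD:-5}"
  local block="${AGENTOPS_SESSION_PR_BLOCK:-0}"

  # Fail open on malformed input (e.g. unexpected gh output).
  if ! [[ "$count" =~ ^[0-9]+$ && "$threshold" =~ ^[0-9]+$ ]]; then
    echo allow
    return 0
  fi

  # The PR about to be created would be number count+1, so act once
  # count >= threshold-1. The 10# prefix forces base 10 for inputs like "08".
  if (( 10#$count >= 10#$threshold - 1 )); then
    if [[ "$block" == "1" ]]; then echo block; else echo remind; fi
  else
    echo allow
  fi
}
```

With the default threshold of 5, `decide 3` prints `allow` and `decide 4` prints `remind` (the fifth PR is about to open). In a real hook the count would come from GitHub, for example with a query shaped like `gh pr list --author @me --state all --search "created:>=<window start>" --json number --jq length` (an assumed query; note that `gh pr list` returns at most 30 results unless `--limit` is raised). A `block` result would print the reason to stderr and `exit 2`, which Claude Code's hook protocol treats as a blocking PreToolUse outcome.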
boshu2 added a commit that referenced this pull request on May 20, 2026
…vb #cobra-writer-leak) (#363)

## Summary

Fix the main-branch CI failure seen on PR #362: two `goals_measure` tests in `cli/cmd/ao` flake under `go test -race -shuffle=on`:

- `TestGoalsMeasure_FullModeJSONCarriesSnapshotAndScenarios`
- `TestGoalsMeasure_MissingArtifactYieldsUnknownNotError`

Both fail with `unmarshal payload: unexpected end of JSON input` and an empty raw stdout.

Closes: soc-n6vb

## Root cause

Both tests call `goalsMeasureCmd.RunE(goalsMeasureCmd, nil)` directly. Inside `RunE`, output is written via `cmd.OutOrStdout()`, which walks the cobra command tree until it finds a non-nil `outWriter`:

```
goalsMeasureCmd.outWriter -> nil
goalsCmd.outWriter        -> nil
rootCmd.outWriter         -> ??? (if stale: writes here; test's os.Stdout redirect misses it)
fallback                  -> os.Stdout
```

Under `-shuffle=on`, some earlier test leaves `rootCmd.outWriter` pointing at a stale buffer that belonged to that earlier test. The failing tests' `captureJSONStdout` redirects `os.Stdout`, but cobra writes to the leaked buffer instead, so the captured payload is empty.

The likely vector is `executeCommand` in `cobra_commands_test.go`: it sets `rootCmd.SetOut(cmdBuf)` and restores the writers inline at the end. If `rootCmd.Execute()` panics or `os.Pipe()` fails mid-flight, restoration is skipped.

Reproducible locally with:

```
cd cli && go test -race -shuffle=1779241411657363775 -count=1 ./cmd/ao/...
```

## Fix

**Two layers**: root-cause hardening plus defensive belt-and-suspenders.

### Root cause (`cobra_commands_test.go`)

Wrap `executeCommand`'s restoration in `defer` so it always runs, even if `rootCmd.Execute()` panics:

```go
// Reset cobra writers and args no matter how executeCommand exits.
defer func() {
	rootCmd.SetOut(nil)
	rootCmd.SetErr(nil)
	rootCmd.SetArgs(nil)
}()
// ...
// Restore the real stdout after the pipe capture.
defer func() { os.Stdout = oldStdout }()
```

This removes the inline restoration that was vulnerable to panics and consolidates the cleanup at one site.

### Defensive (`goals_measure_scenarios_test.go`)

`setupMeasureScenarioProject` already saved and restored 8 package-level globals (soc-hwgm/soc-xyt1). Add a cobra writer reset on entry so future flakes from any upstream leaker can't reach these tests:

```go
// Clear any writer a previous test may have left on the command tree.
rootCmd.SetOut(nil)
rootCmd.SetErr(nil)
goalsCmd.SetOut(nil)
goalsCmd.SetErr(nil)
goalsMeasureCmd.SetOut(nil)
goalsMeasureCmd.SetErr(nil)
```

## Verification

- `cd cli && go test -race -shuffle=1779241411657363775 -count=1 ./cmd/ao/...` passes (this was the failing seed).
- Both targeted test functions pass in isolation.
- `gofmt -l` clean. `go vet ./cmd/ao/...` clean.

## Why this isn't masking a real bug

The user-facing `ao goals measure` command always runs through `goalsMeasureCmd` under `rootCmd.Execute()`, where cobra's writer walk lands at the real `os.Stdout`. The flake only affects direct `RunE(cmd, nil)` test invocations that bypass `Execute()`, which is pure test infrastructure.

Bounded-context: BC5-runtime (test infrastructure for CLI command surface)
Evidence: `cd cli && go test -race -shuffle=1779241411657363775 -count=1 ./cmd/ao/...`
Co-authored-by: Codex <codex@example.invalid>
boshu2 added a commit that referenced this pull request on May 20, 2026
…rift (soc-h1cr #registry-regen) (#365)

## Summary

Regenerate `registry.json` to add the `session-pr-counter` hook entry introduced by PR #362 (soc-1aou). The registry was stale on main because PR #362's path filter skipped the `registry-check` job, masking the drift.

Closes: soc-h1cr
Discovered-from: soc-1aou (PR #362) via soc-1nsx (PR #364 CI dogfood: registry-check failed on the YAML-touching PR and surfaced the pre-existing drift)

## Fix

```
bash scripts/generate-registry.sh
```

Generator output:

> Wrote registry.json (79 skills, **44 hooks**, 4 stores, 14 job types, 62 evals, 171 CLI commands)

The diff matches the `registry-check` job's CI-emitted expected diff line for line: hooks count `43 → 44` and the `session-pr-counter` entry (PreToolUse / Bash matcher / 10s timeout) added to the hooks array.

## #trivial

Single mechanical regen of a generated file. No behavior change. Carve-out per CLAUDE.md: "Carve-out: `type=chore` with `#trivial` label for tiny work."

## Verification

- `bash scripts/generate-registry.sh` ran cleanly
- Diff matches the CI-expected diff verbatim
- `jq -e . registry.json` clean (valid JSON)

Bounded-context: BC0-foundations
Evidence: registry.json
Co-authored-by: Codex <codex@example.invalid>
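The `registry-check` comparison this commit relies on boils down to "regenerate, then fail if the committed file changed". A minimal local sketch, assuming the generator rewrites its output file in place (as the `Wrote registry.json` output suggests); `check_generated` is a hypothetical helper, not a script in the repo, and the real CI job may diff differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): regenerate a generated file in
# place and report whether it drifted from the committed copy.
# Returns 0 = no drift, 1 = drift (diff printed), 2 = setup/generator failure.
check_generated() {
  local file="$1"; shift
  local snapshot
  snapshot="$(mktemp)" || return 2
  cp -- "$file" "$snapshot" || { rm -f -- "$snapshot"; return 2; }

  # Run the generator command passed as the remaining arguments.
  "$@" >/dev/null || { rm -f -- "$snapshot"; return 2; }

  if cmp -s "$snapshot" "$file"; then
    rm -f -- "$snapshot"
    return 0
  fi
  # Show the drift, similar to the expected diff that CI emits.
  diff -u "$snapshot" "$file"
  rm -f -- "$snapshot"
  return 1
}
```

Usage against this repo would look like `check_generated registry.json bash scripts/generate-registry.sh`. Because the helper leaves the regenerated file in place, a drift result doubles as the fix: the working tree now holds the updated `registry.json`, which is exactly the change this PR commits.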
boshu2 added a commit that referenced this pull request on May 20, 2026
…oc-jmbc #waxr-pointer) (#366)

## Summary

soc-waxr (PR #361) codified the session-scope rule as doctrine with a "Mechanical enforcement … is a successor concern" placeholder. soc-1aou (PR #362) shipped the successor: `hooks/session-pr-counter.sh`. The placeholder text remained stale in 3 surfaces. This PR updates all three to cite the concrete hook, its behavior, and the hard-block env var.

Closes: soc-jmbc
Discovered-from: soc-waxr (PR #361 doctrine) via soc-1aou (PR #362 hook ship)

## Why

Per the soc-waxr harvest note (queued in `.agents/rpi/next-work.jsonl`): "Tiny edit PR to point at hooks/session-pr-counter.sh. Deferred from #362 to avoid #361 rebase conflict." Now that both #361 and #362 have merged, the deferred update can land cleanly.

## Files changed

- `CLAUDE.md` (line 144 region)
- `AGENTS.md` (line 78 region)
- `skills/ship-loop/SKILL.md` (line 113 region)

Auto-regenerated by the edit hook (no manual edit):

- `skills-codex/.agentops-manifest.json` (ship-loop `source_hash` bump)
- `skills-codex/ship-loop/.agentops-generated.json` (same bump)

The codex variant doesn't carry the successor-concern placeholder, so `generated_hash` stays identical and no codex-side edit is needed.

## Sibling pattern

Mirrors the doctrine-sweep shape of PR #360 (soc-liyr): the same trio of surfaces (CLAUDE.md, AGENTS.md, `skills/<name>/SKILL.md`) updated together so source-of-truth precedence holds across every entry point an agent or operator might read first.

## Verification

- `grep -rn "successor concern" CLAUDE.md AGENTS.md skills/ship-loop/SKILL.md` returns zero matches (was 3 before)
- `grep -rn "session-pr-counter.sh" CLAUDE.md AGENTS.md skills/ship-loop/SKILL.md` returns 3 matches (was 0 before)
- 5 files changed / 5 insertions / 5 deletions: a minimal mechanical replacement

## Self-correcting Evidence claim

The original PR-body Evidence line cited `hooks/session-pr-counter.sh`, a path this PR only references and does not touch. The just-shipped soc-1nsx per-job AP#7 check (PR #364) correctly flagged this as an unverifiable claim on the first CI run. The Evidence line now points at a file the PR actually modifies, which the `changes` job's log records as `[modified]`. The per-job log fetch is working as designed.

Bounded-context: BC0-foundations
Evidence: skills/ship-loop/SKILL.md
Co-authored-by: Codex <codex@example.invalid>
boshu2 added a commit that referenced this pull request on May 20, 2026
…th-filter-coverage) (#367)

## Summary

`registry-check` triggered only on `skills` or `ci` changes, but `scripts/generate-registry.sh` reads from `skills/`, `hooks/`, `evals/`, AND `cli/cmd/ao/`. PR #362 added `hooks/session-pr-counter.sh`; only the `hooks` filter matched, so registry-check was SKIPPED. The drift sat on main until PR #364 (soc-1nsx) touched `.github/workflows/validate.yml`, which DID re-trigger registry-check via the `ci` filter and surfaced the gap. PR #365 (soc-h1cr) regenerated the registry as a separate concern.

This PR closes the "path filter SKIPPED ≠ drift absent" gap by extending the `if` condition to also trigger on the `hooks`, `eval`, and `go` outputs.

Closes: soc-xhp6
Discovered-from: soc-h1cr (the PR that surfaced the drift)
Encoded-in: `.agents/learnings/2026-05-20-path-filter-skipped-not-absent.md`

## Session-scope note

This is the 5th PR shipped in this autonomous session. The session-scope post-mortem (soc-waxr doctrine) was already completed by the `/post-mortem` loop cron (output: HEALTHY session, 0 reactive-spiral, 1 self-correction). This PR is harvested from a post-mortem finding (this session's own harvest), so the marginal-PR analysis is **discovery**, not churn. The newly codified `hooks/session-pr-counter.sh` hook (soc-1aou) will fire on `gh pr create` for this PR with `additionalContext` post-mortem prompts; that is the dogfood working as designed.

## Sibling pattern

The fixed `if` matches the shape used by `agentops-contract-canaries` (line 687):

- `contracts || go || skills || ci`: a multi-source filter for a multi-source generator

This PR brings registry-check to the same shape.

## Verification

- `python3 -c "import yaml; yaml.safe_load(open('.github/workflows/validate.yml'))"` clean
- Manually compared the `if` against the `scripts/generate-registry.sh` source paths (`skills/`, `hooks/`, `evals/`, `cli/cmd/ao/`)
- One-line edit plus a comment block; no behavior change to the registry-check step itself

Bounded-context: BC5-runtime
Evidence: .github/workflows/validate.yml
Co-authored-by: Codex <codex@example.invalid>
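The gap this commit closes can be replayed without CI by mapping changed paths to filter outputs and evaluating the old and new `if` conditions. A minimal sketch: the filter names (`skills`, `ci`, `hooks`, `eval`, `go`) come from the text, but the path globs in `filter_for` and both function names are assumptions, not the actual patterns in `.github/workflows/validate.yml`.

```bash
#!/usr/bin/env bash
# Hypothetical emulation of the registry-check trigger condition.
# Filter names come from the PR text; the path globs are assumptions.
filter_for() {
  case "$1" in
    skills/*)            echo skills ;;
    hooks/*)             echo hooks ;;
    evals/*)             echo eval ;;
    cli/*)               echo go ;;    # assumption: the go filter covers cli/
    .github/workflows/*) echo ci ;;
    *)                   echo none ;;
  esac
}

# registry_check_runs MODE PATH...
#   MODE=old: skills || ci                         (before this PR)
#   MODE=new: skills || ci || hooks || eval || go  (after this PR)
# Returns 0 if any changed path would trigger the job, 1 if it is skipped.
registry_check_runs() {
  local mode="$1"; shift
  local path
  for path in "$@"; do
    case "$mode:$(filter_for "$path")" in
      old:skills|old:ci) return 0 ;;
      new:skills|new:ci|new:hooks|new:eval|new:go) return 0 ;;
    esac
  done
  return 1
}
```

Replaying PR #362's change, `registry_check_runs old hooks/session-pr-counter.sh` returns 1 (the job is SKIPPED, so drift goes unseen), while the `new` mode returns 0. A PR touching `.github/workflows/validate.yml` triggers the job under both modes, which is how PR #364 surfaced the drift.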
boshu2 added a commit that referenced this pull request on May 22, 2026
…tors (soc-2gd6 #eval-hard-fails) (#402)

## Why

The v2.42.0 release gate (`scripts/ci-local-release.sh`) was red on 8 evals. The 3 score-0/near-0 hard fails are all **eval staleness behind legitimate recent refactors**; this was verified, and none of them is gaming or a security weakening. Operator decision: update each eval to match the source of truth (executable > contract).

| Eval | Was | Cause | Fix |
|---|---|---|---|
| `hook-manifest-command-counts` | 0 | `session-pr-counter.sh` (PR #362) is the legitimate 37th hook script; the eval hardcoded 43/36 | bump expected counts 43→44, 36→37 |
| `push-worktree landing-plane` | 0.14 | #387 tiered-AGENTS split moved "Landing the Plane" to `AGENTS-WORKFLOW.md` (and dropped 2 lines) | redirect eval target `AGENTS.md`→`AGENTS-WORKFLOW.md` and restore the 2 dropped policy lines |
| `security-toolchain ci-soft-gate-policy` | 0 | gate is intentionally **HARD** (no `continue-on-error`); the job already runs `security-gate.sh --mode quick` and uploads artifacts | drop the stale `continue-on-error` requirement (security stays HARD) |

**Security note:** `security-toolchain-gate` stays a HARD blocking gate. Only the stale "soft gate" assertion was removed from the eval; the actual scan, artifact upload, and summary blocking are unchanged.

## How tested

- hook-manifest jq → `hook-manifest-counts-ok`
- security smoke `ci-policy` → `security-toolchain-ci-policy-ok`
- all 7 landing-plane strings present in `AGENTS-WORKFLOW.md`
- shellcheck clean on the edited smoke script

## Scope honesty

This fixes the 3 **hard** fails only. The release gate still has **5 minor evals (0.71–0.99)** plus the **vil/release-smoke** lane; that is a separate remediation, deliberately NOT in this PR (no green-washing).

Sibling pattern: the same "update eval to match legitimately-changed source of truth" move as the cli-command-surface canary bumps in #396/#397.

Fitness: release-gate eval hard-fails 3 → 0.

Closes-scenario: soc-2gd6#eval-hard-fails
Bounded-context: BC4-Validation
Evidence: evals/agentops-core/fixtures/security-toolchain-governance-smoke.sh
## Summary

Mechanical follow-through for soc-waxr (PR #361), the session-scope doctrine. soc-waxr encoded "2-4 PRs/session default; ≥5 triggers mandatory post-mortem" as documentation; this PR makes that rule fire mechanically as a PreToolUse hook on `gh pr create`.

Closes: soc-1aou · Discovered-from: soc-waxr
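## Fitness delta

## What it does

- Fires on PreToolUse for Bash when the command contains `gh pr create`
- Counts the operator's PRs in any state within the window (via `gh pr list --search`)
- If `count >= threshold-1`, emits `additionalContext` with post-mortem prompts
- Hard-block mode (`AGENTOPS_SESSION_PR_BLOCK=1`) instead exits 2 with a clear reason

## Configuration

| Variable | Default | Purpose |
|---|---|---|
| `AGENTOPS_SESSION_PR_THRESHOLD` | 5 | PR count that triggers the reminder |
| `AGENTOPS_SESSION_PR_WINDOW_HOURS` | 24 | Window, in hours, treated as the "current session" |
| `AGENTOPS_SESSION_PR_BLOCK` | 0 | 1 = hard block (exit 2) instead of advisory |
| `AGENTOPS_SESSION_PR_COUNTER_DISABLED` | 0 | 1 = bypass this hook |
| `AGENTOPS_HOOKS_DISABLED` | 0 | 1 = bypass all AgentOps hooks |

## Sibling pattern

Hook structure mirrors `hooks/commit-review-gate.sh`: the same PreToolUse Bash matcher, the same kill-switch chain, the same jq+env fallback for tool-input parsing, and the same `emit_hook_context` → `jq -n` → escape-fallback emission chain. Standards discipline matches as well (`set -uo pipefail` without `-e`, fail-open advisory shape).

## Files

| File | Change |
|---|---|
| `hooks/session-pr-counter.sh` | New PreToolUse Bash hook |
| `hooks/hooks.json` | New PreToolUse Bash entry |
| `cli/embedded/hooks/*` | Auto-synced via `make sync-hooks` |
| `tests/hooks/test-session-pr-counter.bats` | New bats tests |

## Dogfood: what would have fired

Yesterday's 7-PR session (#356 through #361, plus #320) would have triggered this hook at PR #5 (#359, soc-bbvw, the FIRST self-correction PR). The hook's reminder would have been visible to the agent before it opened #359, prompting "is this churn or discovery?". The answer was "churn: fixing my own regression from #357", which is exactly the failure mode soc-waxr names.

## What's NOT in this PR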
The soc-waxr doctrine surfaces (CLAUDE.md, AGENTS.md, ship-loop SKILL, anti-patterns.md) still say "mechanical enforcement is a successor concern". Updating those will be a tiny follow-up PR once soc-waxr (#361) itself merges — editing the same lines now would conflict on rebase.
## Verification

- `bats tests/hooks/test-session-pr-counter.bats` → 12 ok
- `shellcheck hooks/session-pr-counter.sh` clean (SC1091 info-only on the hook-helpers source, matching `commit-review-gate.sh`)
- `jq -e . hooks/hooks.json` clean
- `cd cli && make sync-hooks` clean

Bounded-context: BC0-foundations + BC5-runtime (hook plumbing)
Evidence: shellcheck
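The sibling-pattern notes describe a three-step emission chain: `emit_hook_context`, then `jq -n`, then a hand-escaped fallback. A minimal sketch of that chain, assuming `emit_hook_context` comes from the sourced hook-helpers library with a one-argument signature and assuming Claude Code's `hookSpecificOutput` JSON envelope. `emit_reminder` is a hypothetical name, and the JSON shape should be checked against the real helper before relying on it.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the emission fallback chain; not the shipped code.
# Assumes Claude Code's hookSpecificOutput envelope for PreToolUse context.
emit_reminder() {
  local msg="$1"
  if declare -F emit_hook_context >/dev/null; then
    # 1) Preferred: shared helper (assumed one-argument signature).
    emit_hook_context "$msg"
  elif command -v jq >/dev/null 2>&1; then
    # 2) jq builds the JSON and handles all escaping.
    jq -nc --arg ctx "$msg" \
      '{hookSpecificOutput: {hookEventName: "PreToolUse", additionalContext: $ctx}}'
  else
    # 3) Last resort: escape backslashes, double quotes, and newlines by hand.
    #    Other control characters (tabs, etc.) are NOT handled here.
    local esc=${msg//\\/\\\\}
    esc=${esc//\"/\\\"}
    esc=${esc//$'\n'/\\n}
    printf '{"hookSpecificOutput":{"hookEventName":"PreToolUse","additionalContext":"%s"}}\n' "$esc"
  fi
}
```

The ordering keeps one source of truth for the envelope (the helper) while staying fail-open: a missing helper library or a host without `jq` still yields valid JSON for ordinary reminder text instead of aborting the hook.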