feat(hooks): session-pr-counter — mechanical session-scope enforcement (soc-1aou)#362

Merged: boshu2 merged 2 commits into main from feat/soc-1aou-session-pr-counter on May 20, 2026

@boshu2 boshu2 commented May 19, 2026

Summary

Mechanical follow-through for soc-waxr (PR #361) — the session-scope doctrine. soc-waxr encoded "2-4 PRs/session default; ≥5 triggers mandatory post-mortem" as documentation; this PR makes that rule fire mechanically as a PreToolUse hook on gh pr create.

Closes: soc-1aou · Discovered-from: soc-waxr

Fitness delta

  • Doc-only rules with mechanical backstop: +1 (session-scope joins AP#1→ship.sh and AP#7→verify-gate-claim.sh)
  • Session-scope recurrence prevention: 0 → 1 (would have fired on PR #5 of yesterday's 7-PR session)
  • New bats: 0 → 12 (kill switches, tool matching, threshold logic, hard-block mode, fail-open on malformed output)

What it does

  • PreToolUse on Bash + gh pr create
  • Counts the operator's PRs (any state, last 24h via gh pr list --search)
  • At count >= threshold-1, emits additionalContext with post-mortem prompts:
    • Which PRs were planned vs reactive?
    • How many self-corrections so far?
    • Is the marginal PR discovery or churn?
  • Hard-block mode (AGENTOPS_SESSION_PR_BLOCK=1) exits 2 with clear reason instead
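
The decision described in the bullets above can be modeled in a few lines. The real hook is a bash script (hooks/session-pr-counter.sh) whose source is not shown here, so this Go sketch only illustrates the threshold logic under stated assumptions: the names `Action`, `Config`, and `decide` are hypothetical, both kill switches are collapsed into one boolean, and counting PRs via `gh pr list` is left out.

```go
package main

import "fmt"

// Action is a hypothetical label for what the hook does on `gh pr create`.
type Action string

const (
	Allow  Action = "allow"  // stay silent and let the command run
	Remind Action = "remind" // emit additionalContext with post-mortem prompts
	Block  Action = "block"  // hard-block mode: exit 2 with a reason
)

// Config mirrors the documented environment variables (field names are illustrative).
type Config struct {
	Threshold int  // AGENTOPS_SESSION_PR_THRESHOLD, default 5
	HardBlock bool // AGENTOPS_SESSION_PR_BLOCK=1
	Disabled  bool // either kill switch set to 1
}

// decide returns the action for a given count of PRs already opened in the window.
// The hook fires at count >= threshold-1, i.e. when the next PR would reach the threshold.
func decide(count int, cfg Config) Action {
	if cfg.Disabled {
		return Allow
	}
	if count < cfg.Threshold-1 {
		return Allow
	}
	if cfg.HardBlock {
		return Block
	}
	return Remind
}

func main() {
	cfg := Config{Threshold: 5}
	for _, n := range []int{3, 4, 6} {
		fmt.Printf("count=%d -> %s\n", n, decide(n, cfg))
	}
}
```

With the default threshold of 5, four existing PRs is the first count that produces a reminder, because the PR being created would be the fifth.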

Configuration

| Variable | Default | Purpose |
|---|---|---|
| AGENTOPS_SESSION_PR_THRESHOLD | 5 | PR count that triggers the reminder |
| AGENTOPS_SESSION_PR_WINDOW_HOURS | 24 | Window for "current session" |
| AGENTOPS_SESSION_PR_BLOCK | 0 | 1 = hard block (exit 2) instead of advisory |
| AGENTOPS_SESSION_PR_COUNTER_DISABLED | 0 | 1 = bypass this hook |
| AGENTOPS_HOOKS_DISABLED | 0 | 1 = bypass all AgentOps hooks |

Sibling pattern

Hook structure mirrors hooks/commit-review-gate.sh: same PreToolUse Bash matcher, same kill-switch chain, same jq+env fallback for tool input parsing, same emit_hook_context → jq -n → escape-fallback emission chain. Standards discipline matches (set -uo pipefail without -e, fail-open advisory shape). Sibling pattern: hooks/commit-review-gate.sh.

Files

| File | Change |
|---|---|
| hooks/session-pr-counter.sh | New, 133 lines |
| hooks/hooks.json | New PreToolUse Bash entry (timeout 10s) |
| cli/embedded/hooks/* | Auto-synced via make sync-hooks |
| tests/hooks/test-session-pr-counter.bats | New, 12 tests, all green |

Dogfood: what would have fired

Yesterday's 7-PR session (#356 through #361 + #320) would have triggered this hook at PR #5 (#359 soc-bbvw — the FIRST self-correction PR). The hook's reminder would have been visible to the agent before opening #359, prompting "is this churn or discovery?". The answer was "churn — fixing my own regression from #357", which exactly fits the failure mode soc-waxr names.

What's NOT in this PR

The soc-waxr doctrine surfaces (CLAUDE.md, AGENTS.md, ship-loop SKILL, anti-patterns.md) still say "mechanical enforcement is a successor concern". Updating those will be a tiny follow-up PR once soc-waxr (#361) itself merges — editing the same lines now would conflict on rebase.

Verification

  • bats tests/hooks/test-session-pr-counter.bats → 12 ok
  • shellcheck hooks/session-pr-counter.sh clean (SC1091 info-only on hook-helpers source, matching commit-review-gate.sh)
  • jq -e . hooks/hooks.json clean
  • cd cli && make sync-hooks clean

Bounded-context: BC0-foundations + BC5-runtime (hook plumbing)
Evidence: shellcheck

…t (soc-1aou)

Implements the mechanical follow-through for soc-waxr (PR #361, the
session-scope doctrine rule). soc-waxr encoded "2-4 PRs/session default;
≥5 triggers mandatory post-mortem" as documentation; soc-1aou makes that
documentation fire mechanically as a PreToolUse hook on `gh pr create`.

## Fitness delta

- Documentation-only rules with mechanical backstop: was {AP#1 → ship.sh,
  AP#7 → verify-gate-claim.sh}, now adds session-scope → session-pr-counter.sh
  (3 of N session-relevant rules now mechanically enforced).
- Session-scope rule recurrence prevention: 0 → 1 (the rule's own derivation
  cited a session where the cron-loop kept nudging "keep going" past the
  threshold; this hook would have fired).
- New bats: 0 → 12 (test-session-pr-counter.bats: kill switches, tool
  matching, threshold logic, hard-block mode, fail-open on malformed output).

## What it does

- Fires PreToolUse on Bash + `gh pr create` substring.
- Counts the operator's PRs (any state, last 24h via `gh pr list --search`).
- If that count is >= threshold-1 (so the next PR tips into ≥threshold), emits
  a `<system-reminder>`-shaped `additionalContext` with the post-mortem prompts.
- Hard-block mode (opt-in via `AGENTOPS_SESSION_PR_BLOCK=1`) exits 2 with a
  clear reason instead — for operators who want the gate to refuse rather
  than remind.

## Configuration

| Variable | Default | Purpose |
|---|---|---|
| `AGENTOPS_SESSION_PR_THRESHOLD` | 5 | PR count that triggers the reminder |
| `AGENTOPS_SESSION_PR_WINDOW_HOURS` | 24 | Window for "current session" |
| `AGENTOPS_SESSION_PR_BLOCK` | 0 | 1 = hard block (exit 2) instead of advisory |
| `AGENTOPS_SESSION_PR_COUNTER_DISABLED` | 0 | 1 = bypass this hook |
| `AGENTOPS_HOOKS_DISABLED` | 0 | 1 = bypass all AgentOps hooks |

## Sibling pattern

Hook structure mirrors `hooks/commit-review-gate.sh` (cycle 54 — also a
PreToolUse Bash hook that synthesizes `additionalContext` via either
`emit_hook_context` or a `jq -n` fallback). Sibling pattern:
`hooks/commit-review-gate.sh`.

Standards discipline (set -uo pipefail without -e, kill-switch chain, jq+env
fallback for tool input) matches the same sibling.

## Files

| File | Change |
|---|---|
| `hooks/session-pr-counter.sh` | New (133 lines), PreToolUse Bash hook |
| `hooks/hooks.json` | New PreToolUse Bash entry (timeout 10s) |
| `cli/embedded/hooks/session-pr-counter.sh` + `hooks.json` | Auto-synced via `cd cli && make sync-hooks` |
| `tests/hooks/test-session-pr-counter.bats` | New (12 tests, all green) |

## Verification

- 12/12 bats green
- `shellcheck hooks/session-pr-counter.sh` clean (SC1091 info-only on hook-helpers source, matching the existing commit-review-gate.sh pattern)
- `jq -e . hooks/hooks.json` clean (valid JSON)
- `cd cli && make sync-hooks` clean
- Dogfooded shape: the hook would have fired on PR #5 of yesterday's 7-PR session

## What's NOT in this PR

The soc-waxr doctrine surfaces (CLAUDE.md, AGENTS.md, ship-loop SKILL,
anti-patterns.md) still say "mechanical enforcement is a successor
concern". Updating those will be a tiny follow-up PR once soc-waxr (PR #361)
itself merges — editing the same lines now would conflict on rebase.

Closes: soc-1aou
Discovered-from: soc-waxr

Bounded-context: BC0-foundations + BC5-runtime (hook plumbing)
Evidence: shellcheck
@boshu2 boshu2 enabled auto-merge (squash) May 19, 2026 21:56
@boshu2 boshu2 merged commit eb7874f into main May 20, 2026
67 checks passed
@boshu2 boshu2 deleted the feat/soc-1aou-session-pr-counter branch May 20, 2026 01:41
boshu2 added a commit that referenced this pull request May 20, 2026
…vb #cobra-writer-leak) (#363)

## Summary

Fix the red CI on main after PR #362. Two `goals_measure` tests in
`cli/cmd/ao` flake under `go test -race -shuffle=on`:

- `TestGoalsMeasure_FullModeJSONCarriesSnapshotAndScenarios`
- `TestGoalsMeasure_MissingArtifactYieldsUnknownNotError`

Both fail with `unmarshal payload: unexpected end of JSON input` + empty
raw stdout.

Closes: soc-n6vb

## Root cause

Both tests call `goalsMeasureCmd.RunE(goalsMeasureCmd, nil)` directly.
Inside RunE, output is written via `cmd.OutOrStdout()`, which walks the
cobra command tree until it finds a non-nil `outWriter`:

```
goalsMeasureCmd.outWriter -> nil
goalsCmd.outWriter        -> nil
rootCmd.outWriter         -> ??? (if stale: writes here; test's os.Stdout redirect misses it)
fallback                  -> os.Stdout
```

Under `-shuffle=on`, some earlier test leaves `rootCmd.outWriter`
pointing at a buffer that's gone out of scope but whose pointer is still
live. The failing tests' `captureJSONStdout` redirects `os.Stdout`, but
cobra writes to the leaked buffer instead, so the captured payload is
empty.

The likely vector is `executeCommand` in `cobra_commands_test.go`: it
sets `rootCmd.SetOut(cmdBuf)` and restores inline at the end. If
`rootCmd.Execute()` panics or `os.Pipe()` fails mid-flight, restoration
is skipped.

Reproducible locally with:
```
cd cli && go test -race -shuffle=1779241411657363775 -count=1 ./cmd/ao/...
```

## Fix

**Two layers**: root-cause hardening plus a defensive belt-and-suspenders
reset.

### Root cause (`cobra_commands_test.go`)

Wrap `executeCommand`'s restoration in `defer` so it always runs even if
`rootCmd.Execute()` panics:

```go
defer func() {
    rootCmd.SetOut(nil)
    rootCmd.SetErr(nil)
    rootCmd.SetArgs(nil)
}()
// ...
defer func() {
    os.Stdout = oldStdout
}()
```

This removes the inline restoration that was vulnerable to panics, and
consolidates the cleanup at one site.
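
Why `defer` matters here can be shown with a toy example. This is a sketch under assumptions, not the repository's test helper: `sharedOut`, `runInline`, `runDeferred`, and `survive` are hypothetical names, and a string stands in for the command's writer.

```go
package main

import "fmt"

// sharedOut stands in for package-level state such as rootCmd's writer.
var sharedOut string

// runInline restores state after the call, so a panic in fn skips the restore.
func runInline(fn func()) {
	sharedOut = "test-buffer"
	fn()
	sharedOut = ""
}

// runDeferred registers the restore first, so it runs even if fn panics.
func runDeferred(fn func()) {
	sharedOut = "test-buffer"
	defer func() { sharedOut = "" }()
	fn()
}

// survive calls run and recovers any panic so the state can be inspected afterwards.
func survive(run func(func()), fn func()) {
	defer func() { _ = recover() }()
	run(fn)
}

func main() {
	boom := func() { panic("execute failed") }

	survive(runInline, boom)
	fmt.Printf("inline:   leaked=%t\n", sharedOut != "")

	sharedOut = ""
	survive(runDeferred, boom)
	fmt.Printf("deferred: leaked=%t\n", sharedOut != "")
}
```

The inline variant leaves `sharedOut` set after the panic, while the deferred variant clears it during unwinding.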

### Defensive (`goals_measure_scenarios_test.go`)

`setupMeasureScenarioProject` already saved/restored 8 package-level
globals (soc-hwgm/soc-xyt1). Add a cobra writer reset on entry so future
flakes from any upstream leaker can't reach these tests:

```go
rootCmd.SetOut(nil)
rootCmd.SetErr(nil)
goalsCmd.SetOut(nil)
goalsCmd.SetErr(nil)
goalsMeasureCmd.SetOut(nil)
goalsMeasureCmd.SetErr(nil)
```

## Verification

- `cd cli && go test -race -shuffle=1779241411657363775 -count=1 ./cmd/ao/...` passes (was the failing seed).
- Test of both targeted functions in isolation — PASSES.
- `gofmt -l` clean. `go vet ./cmd/ao/...` clean.

## Why this isn't masking a real bug

The user-facing `ao goals measure` command always runs through
`goalsMeasureCmd` under `rootCmd.Execute()`, where cobra's writer-walking
lands at the real `os.Stdout`. The flake only affects direct
`RunE(cmd, nil)` test invocations that bypass `Execute()`, which is pure
test infrastructure.

Bounded-context: BC5-runtime (test infrastructure for CLI command surface)
Evidence: `cd cli && go test -race -shuffle=1779241411657363775 -count=1 ./cmd/ao/...`

Co-authored-by: Codex <codex@example.invalid>
boshu2 added a commit that referenced this pull request May 20, 2026
…rift (soc-h1cr #registry-regen) (#365)

## Summary

Regenerate `registry.json` to add the `session-pr-counter` hook entry
added by PR #362 (soc-1aou). The registry was stale on main because PR
#362's path filter skipped the `registry-check` job, masking the drift.

Closes: soc-h1cr
Discovered-from: soc-1aou (PR #362) via soc-1nsx (PR #364 CI dogfood —
registry-check failed on the YAML-touching PR and surfaced the
pre-existing drift)

## Fix

```
bash scripts/generate-registry.sh
```

Generator output:
> Wrote registry.json (79 skills, **44 hooks**, 4 stores, 14 job types,
62 evals, 171 CLI commands)

The diff matches the `registry-check` job's CI-emitted expected diff
line-for-line: hooks count `43 → 44` and the `session-pr-counter` entry
(PreToolUse / Bash matcher / 10s timeout) added to the hooks array.

## #trivial

Single mechanical regen of a generated file. No behavior change.
Carve-out per CLAUDE.md: "Carve-out: `type=chore` with `#trivial` label
for tiny work."

## Verification

- `bash scripts/generate-registry.sh` ran cleanly
- Diff matches CI-expected diff verbatim
- `jq -e . registry.json` clean (valid JSON)

Bounded-context: BC0-foundations
Evidence: registry.json

Co-authored-by: Codex <codex@example.invalid>
boshu2 added a commit that referenced this pull request May 20, 2026
…oc-jmbc #waxr-pointer) (#366)

## Summary

soc-waxr (PR #361) wrote the session-scope rule into doctrine with a
"Mechanical enforcement … is a successor concern" placeholder. soc-1aou
(PR #362) shipped the successor: `hooks/session-pr-counter.sh`. The
placeholder text remained stale in 3 surfaces. This PR updates all three
to cite the concrete hook, its behavior, and the hard-block env var.

Closes: soc-jmbc
Discovered-from: soc-waxr (PR #361 doctrine) via soc-1aou (PR #362 hook
ship)

## Why

Per the soc-waxr harvest note (queued in `.agents/rpi/next-work.jsonl`):
"Tiny edit PR to point at hooks/session-pr-counter.sh. Deferred from
#362 to avoid #361 rebase conflict." Now that both #361 and #362 have
merged, the deferred update can land cleanly.

## Files changed

- `CLAUDE.md` (line 144 region)
- `AGENTS.md` (line 78 region)
- `skills/ship-loop/SKILL.md` (line 113 region)

Auto-regenerated by edit hook (no manual edit):
- `skills-codex/.agentops-manifest.json` (ship-loop `source_hash` bump)
- `skills-codex/ship-loop/.agentops-generated.json` (same bump)

The codex variant doesn't carry the successor-concern placeholder, so
`generated_hash` stays identical — no codex-side edit needed.

## Sibling pattern

Mirrors PR #360 (soc-liyr) doctrine-sweep shape — same trio of surfaces
(CLAUDE.md, AGENTS.md, `skills/<name>/SKILL.md`) updated together so
source-of-truth precedence holds across all entry points an agent or
operator might read first.

## Verification

- `grep -rn "successor concern" CLAUDE.md AGENTS.md skills/ship-loop/SKILL.md` returns zero matches (was 3 before)
- `grep -rn "session-pr-counter.sh" CLAUDE.md AGENTS.md skills/ship-loop/SKILL.md` returns 3 matches (was 0 before)
- 5 files changed / 5 insertions / 5 deletions: a minimal mechanical replacement

## Self-correcting Evidence claim

The original PR-body Evidence line cited `hooks/session-pr-counter.sh`,
a path this PR only references and does not touch. The just-shipped
soc-1nsx per-job AP#7 check (PR #364) correctly flagged this as an
unverifiable claim on the first CI run. The Evidence line now cites a
file the PR actually modifies, which the `changes` job's log records as
`[modified]`. The per-job log fetch is working as designed.

Bounded-context: BC0-foundations
Evidence: skills/ship-loop/SKILL.md

Co-authored-by: Codex <codex@example.invalid>
boshu2 added a commit that referenced this pull request May 20, 2026
…th-filter-coverage) (#367)

## Summary

`registry-check` triggered only on `skills` or `ci` changes, but
`scripts/generate-registry.sh` reads from `skills/`, `hooks/`, `evals/`,
AND `cli/cmd/ao/`. PR #362 added `hooks/session-pr-counter.sh`, so only
the `hooks` filter matched and registry-check was SKIPPED. The drift sat
on main until PR #364 (soc-1nsx) touched `.github/workflows/validate.yml`,
which DID re-trigger registry-check via the `ci` filter and surfaced the
gap. PR #365 (soc-h1cr) regenerated the registry as a separate concern.

This PR closes the path-filter-SKIPPED-≠-drift-absent gap by extending
the `if` condition to also trigger on `hooks`, `eval`, and `go` outputs.
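
The effect of widening the condition can be checked with a small model. The workflow itself is YAML and is not reproduced here; this Go sketch only mirrors the boolean logic, with `filters`, `anyOf`, `oldCondition`, and `newCondition` as hypothetical names and the filter keys taken from the text above.

```go
package main

import "fmt"

// filters models the boolean outputs of a path-filter job, keyed by filter name.
type filters map[string]bool

// anyOf reports whether at least one of the named filters matched.
func anyOf(f filters, names ...string) bool {
	for _, n := range names {
		if f[n] {
			return true
		}
	}
	return false
}

// oldCondition is the trigger before this PR: skills or ci only.
func oldCondition(f filters) bool { return anyOf(f, "skills", "ci") }

// newCondition also covers the other inputs the registry generator reads.
func newCondition(f filters) bool {
	return anyOf(f, "skills", "ci", "hooks", "eval", "go")
}

func main() {
	pr362 := filters{"hooks": true} // only the hooks filter matched
	fmt.Println("old runs registry-check:", oldCondition(pr362))
	fmt.Println("new runs registry-check:", newCondition(pr362))
}
```

A hooks-only change is skipped by the old condition and picked up by the new one, while a change that matches none of the five filters is still skipped.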

Closes: soc-xhp6
Discovered-from: soc-h1cr (the PR that surfaced the drift)
Encoded-in:
`.agents/learnings/2026-05-20-path-filter-skipped-not-absent.md`

## Session-scope note

This is the 5th PR shipped in this autonomous session. The session-scope
post-mortem (soc-waxr doctrine) was already completed by the
`/post-mortem` loop cron (output: HEALTHY session, 0 reactive-spiral,
1 self-correction). This PR is harvested from a post-mortem finding
(this session's own harvest), so the marginal-PR analysis =
**discovery**, not churn.

The just-shipped `hooks/session-pr-counter.sh` hook (soc-1aou) will fire
on `gh pr create` for this PR with `additionalContext` post-mortem
prompts, which is the dogfood working as designed.

## Sibling pattern

The fixed `if` matches the shape used by `agentops-contract-canaries`
(line 687):
- `contracts || go || skills || ci`: a multi-source filter for a multi-source generator

This PR brings registry-check to the same shape.

## Verification

- `python3 -c "import yaml; yaml.safe_load(open('.github/workflows/validate.yml'))"` clean
- Manually compared `if` against `scripts/generate-registry.sh` source
paths (`skills/`, `hooks/`, `evals/`, `cli/cmd/ao/`)
- One-line edit + a comment block; no behavior change to the
registry-check step itself

Bounded-context: BC5-runtime
Evidence: .github/workflows/validate.yml

Co-authored-by: Codex <codex@example.invalid>
boshu2 added a commit that referenced this pull request May 22, 2026
…tors (soc-2gd6 #eval-hard-fails) (#402)

## Why

The v2.42.0 release gate (`scripts/ci-local-release.sh`) was red on 8
evals. The 3 score-0/near-0 hard fails are all **eval-staleness behind
legitimate recent refactors** — verified, not gaming or security
weakening. Operator decision: update eval to match source of truth
(executable > contract).

| Eval | Was | Cause | Fix |
|---|---|---|---|
| `hook-manifest-command-counts` | 0 | `session-pr-counter.sh` (PR #362) is the legit 37th hook script; eval hardcoded 43/36 | bump expected counts 43→44, 36→37 |
| `push-worktree landing-plane` | 0.14 | #387 tiered-AGENTS split moved "Landing the Plane" to `AGENTS-WORKFLOW.md` (+ dropped 2 lines) | redirect eval target `AGENTS.md`→`AGENTS-WORKFLOW.md` + restore the 2 dropped policy lines |
| `security-toolchain ci-soft-gate-policy` | 0 | gate is intentionally **HARD** (no `continue-on-error`); job already runs `security-gate.sh --mode quick` + uploads artifacts | drop the stale `continue-on-error` requirement (security stays HARD) |

**Security note:** `security-toolchain-gate` stays a HARD blocking gate.
Only the stale "soft gate" assertion was removed from the eval; the
actual scan + artifact upload + summary-blocking are unchanged.

## How tested
- hook-manifest jq → `hook-manifest-counts-ok`
- security smoke `ci-policy` → `security-toolchain-ci-policy-ok`
- all 7 landing-plane strings present in `AGENTS-WORKFLOW.md`
- shellcheck clean on edited smoke

## Scope honesty
This fixes the 3 **hard** fails only. The release gate still has **5
minor evals (0.71–0.99)** + the **vil/release-smoke** lane — a separate
remediation, deliberately NOT in this PR (no green-washing).

Sibling pattern: same "update eval to match legitimately-changed source
of truth" move as the cli-command-surface canary bumps in #396/#397.

Fitness: release-gate eval hard-fails 3 → 0.

Closes-scenario: soc-2gd6#eval-hard-fails
Bounded-context: BC4-Validation
Evidence: evals/agentops-core/fixtures/security-toolchain-governance-smoke.sh