From 22bf59d2da7b0c1b4a1b47ab16c84872a5502068 Mon Sep 17 00:00:00 2001
From: "Carlos D. Escobar-Valbuena"
Date: Wed, 13 May 2026 22:14:37 -0500
Subject: [PATCH 1/3] feat(templates): refresh CLAUDE.md.template + AGENTS.md.template to P1–P19
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Templates were stuck at "thirteen primitives" (P1–P13) while bstack SKILL.md, scripts/doctor.sh, and references/primitives.md had moved to nineteen (P1–P19). This scaffolded every new workspace into permanent disagreement with the catalog — `bstack bootstrap` produced governance files that `bstack doctor` then reported as having six missing primitive sections.

Changes:

1. **CLAUDE.md.template** (+60/-22): rewritten to nineteen primitives with the full table (P1–P19), short-name convention + index, Plugin Skill Precedence section (bstack > superpowers hierarchy), updated governance/hooks/conventions sections, Self-Documenting Standards block.

2. **AGENTS.md.template** (+208/-26): bumped intro to nineteen, every ### P# heading renamed to "### P# — Name: …" form, short-name convention + index added, full bodies for P14 (Dep-Chain), P15 (Snapshot), P16 (Crystallize), P17 (Lens), P18 (Audience), P19 (Orchestrate) — each with What/How/Invariant/Reflexive Trigger Rule. Composition-loop diagram rewritten in short-name form with P14–P19 threaded at pre-flight + boundary crossings. Plugin Skill Precedence section after composition loop documents the bstack > plugin hierarchy and the "what this kills / what this keeps" partition.

3. **SKILL.md** (+1/-1): tiny drift fix — body said "16 primitives" while frontmatter said "nineteen". Now consistent.

4. **tests/template_lockstep.test.sh** (new, 188 lines): asserts primitive-count and structural consistency across the four governance surfaces that must agree:
   - SKILL.md frontmatter description (canonical count word + P19 reference + P1–P19 trigger span)
   - scripts/doctor.sh EXPECTED_COUNT (the validator)
   - assets/templates/CLAUDE.md.template (intro count, table rows, short-name index entry count, Plugin Skill Precedence presence)
   - assets/templates/AGENTS.md.template (intro count, composition-loop intro, ### Pn section count, Plugin Skill Precedence presence)
   - Short-name index payload identity between the two templates

   13 assertions, all green at canonical count = 19. Designed to be the guardrail that makes this kind of drift CI-visible — when the next primitive is added (P20 in flight), this test catches any template that didn't get updated.

Numbering uses **bstack canonical ordering**: P7=Freshness, P8=Janitor, P9=Wait. This matches the prior template state and bstack catalog (SKILL.md, doctor.sh, references/primitives.md). The ~/broomva workspace itself uses P7=Wait, P8=Freshness, P9=Janitor because of historical script + config-dir naming (BROOMVA_P8_HOME, ~/.config/broomva/p9-janitor/); that's a workspace-specific deviation documented in workspace#54 and is NOT a template concern — new bstack installs get clean canonical numbering.

Companion to broomva/workspace#54 (precedence + short names + P19 in workspace governance). Independent of bstack#14 (P20 sync) — that PR adds P20 to the catalog; once it merges, a follow-up updates these templates + lockstep test to canonical=20.

Follow-ups (separate PRs):
- bootstrap.sh + revamp.sh ORDERED_SKILLS divergence from SKILL.md ROSTER (bootstrap installs persist + wealth-management + investment-management; ROSTER instead has autonomous + role-x).
- P20 propagation to templates after bstack#14 merges.
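
Illustration of the lockstep idea in item 4: a minimal sketch only, not the contents of tests/template_lockstep.test.sh. It assumes the templates use `### Pn` section headings and a single `**Short-name index**` line with one `(Pn)` token per primitive, as the diff below shows. The helper names `count_sections`, `count_index_entries`, and `assert_count` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a primitive-count lockstep check.
# Helper names are illustrative; they are not taken from the real test file.
set -u

# Count distinct "### Pn" section headings in a file.
count_sections() {
  grep -Eo '^### P[0-9]+' "$1" | sort -u | wc -l | tr -d ' '
}

# Count "(Pn)" tokens on the "**Short-name index**" line of a file.
count_index_entries() {
  grep -F '**Short-name index**' "$1" | grep -Eo '\(P[0-9]+\)' | wc -l | tr -d ' '
}

# Compare an observed count with the expected one.
# Prints an ok/FAIL line and returns non-zero on mismatch.
assert_count() {
  local label="$1" expected="$2" actual="$3"
  if [ "$actual" -eq "$expected" ]; then
    echo "ok   $label = $actual"
  else
    echo "FAIL $label: expected $expected, got $actual"
    return 1
  fi
}
```

A possible invocation, run from the repository root, would be `assert_count "AGENTS sections" 19 "$(count_sections assets/templates/AGENTS.md.template)"`, repeated for each surface that has to agree on the canonical count.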

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 SKILL.md                            |   2 +-
 assets/templates/AGENTS.md.template | 234 ++++++++++++++++++++++++----
 assets/templates/CLAUDE.md.template |  82 +++++++---
 tests/template_lockstep.test.sh     | 188 ++++++++++++++++++++++
 4 files changed, 457 insertions(+), 49 deletions(-)
 create mode 100755 tests/template_lockstep.test.sh

diff --git a/SKILL.md b/SKILL.md
index 7ada1e9..88760bb 100644
--- a/SKILL.md
+++ b/SKILL.md
@@ -57,7 +57,7 @@ bstack is a *portable harness metalayer* — it composes existing skills into a
 
 bstack ships two complementary layers:
 
-- **Substrate** (this skill, `/bstack`): the 16 primitives + 29 skills + governance + hooks + `.control/policy.yaml`. This is what `/bstack bootstrap` installs. The substrate is the *capability* — what's available in the workspace.
+- **Substrate** (this skill, `/bstack`): the 19 primitives + 29 skills + governance + hooks + `.control/policy.yaml`. This is what `/bstack bootstrap` installs. The substrate is the *capability* — what's available in the workspace.
 - **Mode** (`broomva/autonomous`): the canonical *behavior* that runs on top of the substrate. When the user says "go" / "proceed" / "be autonomous", `/autonomous` fires the 19-reflex pipeline that uses every primitive in sequence.
 
 Installing the substrate without the mode = the workspace has primitives but no entry point to engage them. Invoking the mode without the substrate = wishful thinking. Compounded: `/bstack bootstrap` installs the substrate, then `/autonomous` is the standing operating mode for substantive work units.
diff --git a/assets/templates/AGENTS.md.template b/assets/templates/AGENTS.md.template
index 3a8d8e3..96a3ad2 100644
--- a/assets/templates/AGENTS.md.template
+++ b/assets/templates/AGENTS.md.template
@@ -8,9 +8,13 @@ This file IS the control harness for agents operating in this workspace. Reading
 
 ## Bstack Core Automation Primitives
 
-This workspace is governed by the **bstack** primitive contract — thirteen irreducible building blocks. All are always active. Run `bstack doctor` to verify compliance.
+This workspace is governed by the **bstack** primitive contract — nineteen irreducible building blocks. All are always active. Run `bstack doctor` to verify compliance.
 
-### P1: Conversation Bridge (Episodic Memory)
+Each primitive carries a **short name** for agent prose. When referencing a primitive in responses, PR bodies, commit messages, or comments, use the `Name (Pn)` form — *"applying Snapshot (P15)"*, *"via Dep-Chain (P14)"*, *"running Bookkeeping (P6)"* — not bare `Pn`. The number is the canonical identifier; the short name is the human-readable handle.
+
+**Short-name index**: Bridge (P1) · Gate (P2) · Tickets (P3) · Pipeline (P4) · Fanout (P5) · Bookkeeping (P6) · Freshness (P7) · Janitor (P8) · Wait (P9) · Hygiene (P10) · Empirical (P11) · Persist (P12) · Dream (P13) · Dep-Chain (P14) · Snapshot (P15) · Crystallize (P16) · Lens (P17) · Audience (P18) · Orchestrate (P19).
+
+### P1 — Bridge: Conversation Bridge (Episodic Memory)
 
 **What**: Every Claude Code session is automatically captured as a structured Obsidian doc with YAML frontmatter, tool calls, conversation threads, files touched, and wikilinks.
 
@@ -18,7 +22,7 @@ This workspace is governed by the **bstack** primitive contract — thirteen irr
 
 **Invariant**: Bridge stamp at `~/.cache/broomva-bridge-stamp` < 24h stale. If stale, the agent is silently amnesic — fix immediately.
 
-### P2: Control Gate (Safety Shield)
+### P2 — Gate: Control Gate (Safety Shield)
 
 **What**: PreToolUse hook intercepts destructive shell ops before execution.
 
@@ -26,7 +30,7 @@ This workspace is governed by the **bstack** primitive contract — thirteen irr
 
 **Invariant**: G1–G4 are blocking and cannot be overridden. G5–G6 are soft warnings.
 
-### P3: Linear Tickets (Work Tracking)
+### P3 — Tickets: Linear Tickets (Work Tracking)
 
 **What**: Every unit of work maps to a Linear ticket; state transitions Backlog → Todo → In Progress → Done track real progress.
 
@@ -34,7 +38,7 @@ This workspace is governed by the **bstack** primitive contract — thirteen irr
 
 **Invariant**: No significant work without a ticket. Don't mark Done until merged + verified.
 
-### P4: PR Pipeline (CI/CD Gate)
+### P4 — Pipeline: PR Pipeline (CI/CD Gate)
 
 **What**: All code changes flow through PRs with automated CI checks before merging.
 
@@ -42,7 +46,7 @@ This workspace is governed by the **bstack** primitive contract — thirteen irr
 
 **Invariant**: Never merge with failing checks. Never `--no-verify`.
 
-### P5: Parallel Agent Dispatch (Concurrent Execution)
+### P5 — Fanout: Parallel Agent Dispatch (Concurrent Execution)
 
 **What**: Independent work streams execute concurrently via isolated agents (worktrees or background processes).
 
@@ -50,7 +54,7 @@ This workspace is governed by the **bstack** primitive contract — thirteen irr
 
 **Invariant**: Agents must not write to the same files. Branch naming unique per agent. Results merge to main only after verification.
 
-### P6: Knowledge Bookkeeping (Knowledge Graph Maintenance)
+### P6 — Bookkeeping: Knowledge Bookkeeping (Knowledge Graph Maintenance)
 
 **What**: Every knowledge item entering the graph is scored, promoted, and lint-validated through a 7-stage pipeline.
 
@@ -64,7 +68,7 @@ This workspace is governed by the **bstack** primitive contract — thirteen irr
 2. Before committing a synced snapshot to a public surface.
 3. At the close of any substantial work session that produced graph-relevant material.
 
-### P7: Skill Freshness Check (Stale-Install Detector)
+### P7 — Freshness: Skill Freshness Check (Stale-Install Detector)
 
 **What**: Reports stale installed skills at SessionStart so they get refreshed before causing silent failures.
 
@@ -72,7 +76,7 @@ This workspace is governed by the **bstack** primitive contract — thirteen irr
 
 **Invariant**: Hook always exits 0. Threshold via `BROOMVA_P7_THRESHOLD_DAYS` env var (default 7; legacy `BROOMVA_P8_THRESHOLD_DAYS` still honored). Dismiss with `npx skills update -g` then `touch ~/.config/broomva/p7/last-skill-update-check`.
 
-### P8: Branch + Worktree Janitor (Hygiene)
+### P8 — Janitor: Branch + Worktree Janitor (Hygiene)
 
 **What**: Detects merged branches (including squash-merged) and dead worktrees, removes them safely.
 
@@ -80,7 +84,7 @@ This workspace is governed by the **bstack** primitive contract — thirteen irr
 
 **Invariant**: Default `--dry-run` — pass `--apply` to actually delete. Never touches main/master/develop/HEAD/gh-pages or branches in `~/.config/broomva/p8-janitor/protected.txt` (legacy `~/.config/broomva/p9-janitor/` still honored).
 
-### P9: Productive Wait (Wait-Optimizer / Event-Driven Wait Loop)
+### P9 — Wait: Productive Wait (Wait-Optimizer / Event-Driven Wait Loop)
 
 **What**: A wait optimizer. Convert any blocking external operation (PR CI, push-triggered deploys, builds, long-running indexing) into work on the next priority. PR CI is the canonical implementation; the primitive is broader.
 
@@ -98,7 +102,7 @@ This workspace is governed by the **bstack** primitive contract — thirteen irr
 4. When red CI bg-task notification fires — `p9 heal --classify` first.
 5. When `p9 status` reports `MERGE_READY` — invoke `p9 auto-merge` rather than `gh pr merge` directly.
 
-### P10: Worktree Hygiene Discipline (Clean-Tree Reset Point)
+### P10 — Hygiene: Worktree Hygiene Discipline (Clean-Tree Reset Point)
 
 **What**: Reflexive discipline binding every agent to (a) make a deliberate worktree-or-not decision before writing the first file, (b) keep `git status` clean throughout the PR lifecycle, (c) run P8 janitor immediately after merge.
 
@@ -113,7 +117,7 @@ This workspace is governed by the **bstack** primitive contract — thirteen irr
 3. After PR merge — immediate `make janitor` (P8).
 4. At SessionStart — audit `git worktree list` + `git branch`; clean orphans before new work.
 
-### P11: Empirical Feedback Loop (Closed-Loop Validation)
+### P11 — Empirical: Empirical Feedback Loop (Closed-Loop Validation)
 
 **What**: Reflexive discipline binding every agent to *validate by interacting* with what they build — not just by reasoning + lint + CI exit codes. Multi-modal (logs, screenshots, video, audio), multi-level (smoke, unit, integration, regression, E2E, deploy verification).
 
@@ -137,7 +141,7 @@ The agent picks the right subset, runs as parallel watchers, and **captures evid
 5. When CI or tests fail — capture full context first before attempting a fix.
 6. At session end — produce a *dogfood receipt*.
 
-### P12: Persistent Loop Discipline (Cross-Context Restart Loop)
+### P12 — Persist: Persistent Loop Discipline (Cross-Context Restart Loop)
 
 **What**: Reflexive discipline binding every agent to *restart the context window when it rots*, while preserving state in the filesystem. Long-horizon work (>1h, the METR 80%-reliability ceiling) decays inside a single conversation as the context window passes ~100K tokens.
 
@@ -153,7 +157,7 @@ The agent picks the right subset, runs as parallel watchers, and **captures evid
 4. When orchestrating long-horizon work — default to persist + periodic checkpoints; compose with P5 worktrees.
 5. When the user says "run this in the background for an hour" — that's persist territory.
 
-### P13: Dream Cycle Discipline (Tier-Crossing Consolidation)
+### P13 — Dream: Dream Cycle Discipline (Tier-Crossing Consolidation)
 
 **What**: Reflexive discipline binding every agent to apply the **5-phase dream shape** (*gather → replay → prune → consolidate → index*) for any consolidation that crosses a cadence-tier boundary. Closes the *shadow dream* corruption mode.
 
@@ -171,25 +175,203 @@ The agent picks the right subset, runs as parallel watchers, and **captures evid
 
 The morpheus crate (shared abstraction across implementations) is deferred per rule-of-three until ≥2 dream instances ship beyond P6.
 
+### P14 — Dep-Chain: Dependency-Chain Reasoning Discipline
+
+**What**: Reflexive discipline binding every agent to *explicitly enumerate dependencies* before any substantive write. Closes the "think deeply" ritual failure mode — the phrase recurs without producing concrete behavior change.
+
+**How**: Before any code/doc edit that affects >1 file or any public surface, the agent surfaces:
+
+- **Upstream**: files, functions, types, contracts, deployed state this write depends on
+- **Downstream**: consumers, tests, CI gates, docs, in-flight PRs depending on this
+
+The enumeration is concrete (file paths, function names, contract identifiers) and lives in the response, PR body, or commit message — never the agent's head.
+
+**Invariant**: When the user prompt contains phrases like "think deeply through chain of dependencies", "follow best practices", "consider the implications", the agent's response MUST include a concrete dep-chain block. Phrase acknowledgement without the block is ritual.
+
+**Reflexive Trigger Rule**: P14 is a reflex. Apply without being prompted:
+
+1. Before any write to a file that exports a public API — enumerate downstream callers.
+2. Before any refactor — enumerate both directions.
+3. Before any cross-project change — enumerate which other projects depend on the touched surface.
+4. When user prompt contains a "think deeply"-class phrase — produce the dep-chain block, not the acknowledgement.
+
+### P15 — Snapshot: State-Snapshot Before Action
+
+**What**: Reflexive discipline binding every plan to a *fresh state snapshot* of the workspace. Closes the "help me understand where we stand" failure mode — agents report last-seen state instead of current state.
+
+**How**: Before any plan or significant action, the agent surfaces:
+
+- `git status` + branch + ahead/behind vs upstream
+- In-flight PRs (`gh pr list --json number,title,headRefName,state`)
+- Linear ticket state for the current work unit
+- Bookkeeping / bridge freshness (cache stamps, last pipeline run)
+- Last deploy state (relevant project's preview/production URL + commit)
+
+The snapshot is *part of* the planning response — not deferred to a follow-up call.
+
+**Invariant**: A plan built on stale state fails silently — re-solves solved problems, conflicts with parallel work, misses in-flight PRs. P15 makes state-checking a cheap reflex, not a request the user has to make.
+
+**Reflexive Trigger Rule**: P15 is a reflex. Apply without being prompted:
+
+1. At session start when reviewing prior context.
+2. Before any plan that touches multiple files or cross-project surfaces.
+3. When the user asks "where are we" / "what's left" / "is everything committed" — the answer is the snapshot, not a recollection.
+4. After any long-running background operation completes — re-snapshot before next plan.
+
+### P16 — Crystallize: Crystallization Discipline (the Bstack Engine)
+
+**What**: Meta-primitive — the rule-of-three loop that produces every other primitive. When a pattern recurs ≥3 times across sessions, propose promotion to skill / SKILL.md / AGENTS.md section / `.control/policy.yaml` gate.
+
+**How**: Four gate conditions for promotion:
+
+1. ≥3 distinct instances of the pattern (logged with citations)
+2. Concrete mechanism (not just a description)
+3. Stated invariant (machine-checkable)
+4. Stated failure mode (what goes wrong without it)
+
+Candidates live in `research/entities/pattern/bstack-engine.md` (or equivalent ledger). Promotion is deliberate — L3 stability budget (λ₃ ≈ 0.006) constrains how rapidly governance changes.
+
+**Invariant**: The crystallization loop runs inside the workspace, not in the user's head. Patterns that recur without being named are technical debt at the governance layer.
+
+**Reflexive Trigger Rule**: P16 is a reflex. Apply without being prompted:
+
+1. When you observe the same instruction repeated across ≥3 sessions — log it as a candidate.
+2. When a user prompt phrase recurs without producing concrete behavior change — flag as ritual; queue for promotion or carve-out.
+3. At session end — survey what was learned; promote candidates with all four gates passing.
+4. Before adding any new primitive — verify all four gates, not three.
+
+### P17 — Lens: Lens-Routed Request Articulation (`role/x`)
+
+**What**: Reflexive discipline routing every substantive user input through a typed lens (`role/x` intake). Replaces "act as X" persona prompting (debunked: MMLU drops 71.6% → 66.3% with naive personas) with substantive context loading.
+
+**How**: Lens registry at `roles/.md`. Score signals (paths + prompt keywords + branch + Linear labels) against lens triggers; threshold ≥2 selects. Load substantive context (files, conventions, domain checklist via `extends:` chain). Decide mode: `augment` (default, silent), `rewrite` (surfaced), `decompose` (P5 fan-out, user-approved).
+
+**Invariant**: No `act as X` persona rewrites. Lenses load substantive context only. Lens selection is logged. Mode decision is surfaced unless `augment`.
+
+**Reflexive Trigger Rule**: P17 is a reflex. Apply without being prompted:
+
+1. On every substantive user input — score lens triggers, log the selection (even if `augment`).
+2. When user input matches `decompose` criteria — surface composition tree, request approval before P5 dispatch.
+3. When lens has `status: candidate` — track outcome to feed rule-of-three promotion.
+
+### P18 — Audience: Format-Follows-Audience Discipline
+
+**What**: Reflexive discipline binding format choice to audience:
+
+- **Agent-readable** (LLM, system-prompt loaded, in-repo reference) → **markdown**
+- **Human-readable** (decisions, review, exploration, sharing) → **HTML**
+- **Both** (README, CHANGELOG, GitHub-browseable) → markdown (GitHub renders)
+- **Throwaway interactive UI** → HTML
+
+**How**: Path conventions for substantive deliverables:
+
+- Specs/plans/ADRs/designs → `docs/{specs,plans,adrs,designs}/.html`
+- PR explainers for substantive PRs → `docs/pr-explainers/PR-.html`
+- Knowledge graph entities (LLM-loaded) → `research/entities/{type}/{slug}.md`
+- README/CHANGELOG/SKILL.md → markdown
+
+**Anti-patterns**: ASCII pseudo-diagrams, unicode-color approximations, >100-line markdown specs without HTML companion.
+
+**Invariant**: Format follows audience, not habit. Markdown's expressiveness ceiling means humans bounce off agent-produced specs at ~100 lines; HTML's information density (tables, SVG, CSS, interactivity) carries the load. The 2-4× HTML generation cost is paid only on artifacts a human will actually read.
+
+**Reflexive Trigger Rule**: P18 is a reflex. Apply without being prompted:
+
+1. Before producing a spec/plan/ADR/design — default HTML.
+2. Before a substantive PR description (>200 LOC OR public API OR multi-file) — produce an HTML PR-explainer at `docs/pr-explainers/PR-.html`.
+3. Before a report/retrospective/research synthesis for human consumption — HTML with embedded SVG diagrams.
+4. When tempted to ASCII-diagram in markdown — STOP. SVG inside HTML is the correct primitive.
+5. When editing `SKILL.md` / `AGENTS.md` / `CLAUDE.md` / `README.md` / `CHANGELOG.md` / entity pages — markdown is correct (LLM-loaded surfaces).
+6. When user asks for "a doc explaining X" — apply the audience test, don't default to markdown.
+
+### P19 — Orchestrate: Orchestration-Mechanism Selection Discipline
+
+**What**: Names the autonomous-continuation family (the 2×2 of `/goal` | Wait (P9) watcher | `/loop` | Persist (P12)) and the selection discipline. Picks the right mechanism for the work shape; composes dynamically; never returns control mid-arc when a mechanism would keep it closed.
+
+**How**: At pre-flight of substantive autonomous work, apply the 2×2 decision matrix:
+
+| | Within session | Across sessions |
+|---|---|---|
+| **External trigger** (event-driven) | **Wait (P9)** `p9 watch --background` (CI/deploy/build blocking) | **Persist (P12)** `persist iterate PROMPT.md` (cross-context-rot, >1h) |
+| **Internal trigger** (condition or time) | **`/goal `** (Haiku evaluator per turn) | **`/loop `** (Claude Code time-trigger) |
+
+Decision logic:
+
+1. Verifiable end state + bounded session + condition fits 4000 chars → `/goal `
+2. External completion event blocking (CI, deploy, build) → Wait (P9) `p9 watch --background` + drain wait-queue
+3. Time-triggered recurring routine → `/loop `
+4. >1h work OR cross-session OR context window approaching ~100K → Persist (P12) `persist iterate PROMPT.md` with budget
+
+Compositions are dynamic: Persist iterations can invoke `/goal` for sub-tasks; `/goal` sessions fire Wait (P9) watchers when CI is blocking; `/loop` schedules can spawn Persist for the long-horizon piece.
+
+**Invariant**: No autonomous-continuation work without (a) an explicit mechanism choice surfaced in the response, and (b) a one-line justification matched to the 2×2 quadrant. Returning control mid-arc when a mechanism would keep it closed is the failure mode P19 prevents.
+
+**Reflexive Trigger Rule**: P19 is a reflex. Apply without being prompted:
+
+1. Pre-flight of substantive autonomous work — state chosen mechanism + cite 2×2 quadrant.
+2. Before returning control mid-arc — verify no mechanism would keep the arc closed.
+3. At mechanism boundary crossings (goal hits >1h, context ~100K) — explicit transition, not drift.
+4. When composing mechanisms — surface the composition tree, don't compose silently.
+5. Tempted to type "continue please" or wait for user prompts — STOP. That's the ritual P19 makes impossible.
+
 ---
 
-These thirteen primitives compose into the full autonomous development loop:
+These nineteen primitives compose into the full autonomous development loop:
 
 ```
-User intent → Linear ticket (P3) → Agent dispatched (P5)
-  → Prior context loaded (P1) [+ P7 freshness check] [+ P10 cleanup audit]
-  → Safety gates active (P2)
-  → P10 worktree decision → P11 validation plan
-  → IF long-horizon → P12 persist loop with PROMPT.md + budget
-  → Code written + parallel watchers (P11 log-tails) → PR created (P4)
-  → CI watched + heal loop (P9)
-  → P11 deploy verification (preview URL, screenshots, browser session)
-  → Merge → P10 post-merge cleanup via P8 janitor → Deploy
-  → P13 dream cycle for any consolidation (P6 replay first; future Life dreams compose here)
-  → P11 dogfood receipt → Session captured (P1) → Knowledge bookkept (P6)
+User intent → Lens (P17) intake → Snapshot (P15) → Orchestrate (P19) mechanism choice → Tickets (P3) → Fanout (P5) dispatch
+  → Prior context via Bridge (P1) [+ Freshness (P7)] [+ Hygiene (P10) audit]
+  → Gate (P2) active
+  → Dep-Chain (P14) trace → Hygiene (P10) worktree decision → Empirical (P11) validation plan
+  → IF long-horizon (per Orchestrate (P19)) → Persist (P12) loop with PROMPT.md + budget
+  → Code written + Empirical (P11) parallel watchers → PR via Pipeline (P4)
+  → Wait (P9) CI watch + heal loop
+  → Empirical (P11) deploy verification (preview URL, screenshots, browser session)
+  → Merge → Hygiene (P10) post-merge cleanup via Janitor (P8) → Deploy
+  → Dream (P13) for any consolidation (Bookkeeping (P6) replay first)
+  → Empirical (P11) dogfood receipt → Session captured via Bridge (P1) → Knowledge via Bookkeeping (P6)
+  → Audience (P18) check: spec/plan/report → HTML; .md/SKILL.md/CHANGELOG → markdown
+  → Crystallize (P16) gate: did this session produce a new ≥3-instance pattern? → propose primitive
   → System improved (EGRI)
 ```
 
+## Plugin Skill Precedence
+
+Plugin skills (`superpowers:*`, `pr-review-toolkit:*`, `codex:*`, etc.) are **subordinate to bstack primitives**. Where they conflict, bstack wins. This is encoded in `superpowers:using-superpowers` itself: *"User's explicit instructions … highest priority. … If CLAUDE.md says X and a skill says Y, follow the user's instructions."* Reading CLAUDE.md → this file *is* that explicit instruction.
+
+The most common collision: plugin skills that mandate user interaction before action — most notably `superpowers:brainstorming` (discovery interview), `superpowers:writing-plans` ("plan before touching code" even when the task is mechanical), and the meta-rule that prompts the agent to invoke a skill "even if 1% might apply." The bstack answer is **context-first, user-extract last**:
+
+1. Before any "interview-the-user" plugin skill fires, apply **Dep-Chain (P14)** + **Snapshot (P15)** over:
+   - Workspace memory files (auto-memory directory; persona, project state, feedback)
+   - `research/entities/{concept,pattern,tool,person,project}/` — knowledge graph (grep by topic before asking)
+   - `docs/` (per-project) — architecture, specs, plans, conversations
+   - Task-mentioned files (CV, spec, ticket body, PR diff)
+2. Synthesize what's already known from those sources.
+3. **Ask the user only for irreducible residuals** — facts that genuinely cannot be derived from disk.
+4. If steps 1–2 fully determine the task, the plugin interview is skipped. Proceed to execution.
+
+### What this kills
+
+The "form-fill ritual" — agent receives an application/CV/intake form, opens with a numbered table of N+ questions about facts that live in workspace memory + knowledge graph + project docs. The workspace's curated context exists precisely to remove ask-the-user-for-context loops; plugin skills that re-introduce those loops are violating the substrate's intent, not augmenting it.
+
+### What this keeps
+
+The disciplines plugin skills bring that DON'T conflict with bstack:
+
+- `superpowers:test-driven-development` — TDD execution discipline
+- `superpowers:verification-before-completion` — pairs with Empirical (P11)
+- `superpowers:systematic-debugging` — diagnostic flow before proposing fixes
+- `superpowers:requesting-code-review` — two-stage review pattern (spec-compliance + code-quality)
+- `superpowers:executing-plans`, `subagent-driven-development`, `dispatching-parallel-agents` — subagent dispatch substrate
+- `superpowers:using-git-worktrees` — composes with Hygiene (P10)
+- `superpowers:finishing-a-development-branch` — composes with `/ship`
+- `pr-review-toolkit:*`, `codex:*` — orthogonal toolkits
+
+These run as before. The precedence rule is targeted at *plugin-skill rituals that ask before reading* — nothing else.
+
+### Why this isn't a new primitive
+
+Dep-Chain (P14) + Snapshot (P15) already define the required behavior — context-load before action. This section makes their precedence over plugin-skill rituals explicit but adds no new mechanism. The L3 stability budget favors clarifying existing rules over adding new ones.
+
 ## Conventions
 
 - **Git**: feature branches, squash merge via PR. Never force push main.
diff --git a/assets/templates/CLAUDE.md.template b/assets/templates/CLAUDE.md.template
index 065581f..62200fc 100644
--- a/assets/templates/CLAUDE.md.template
+++ b/assets/templates/CLAUDE.md.template
@@ -2,29 +2,56 @@
 
 ## Identity
 
-This workspace is governed by **bstack** — thirteen irreducible primitives (P1–P13) that turn an agent-driven workspace into a self-operating system. The full primitive contract lives in [AGENTS.md](AGENTS.md). Run `bstack doctor` to verify compliance.
+This workspace is governed by **bstack** — nineteen irreducible primitives (P1–P19) that turn an agent-driven workspace into a self-operating system. The full primitive contract lives in [AGENTS.md](AGENTS.md). Run `bstack doctor` to verify compliance.
 
 ## Bstack Core Automation Primitives
 
-Thirteen irreducible building blocks that make this workspace self-operating. All are always active. Full specification in `AGENTS.md`.
+Nineteen irreducible building blocks that make this workspace self-operating. All are always active. Full specification in `AGENTS.md`.
+
+Each primitive carries a **short name** for agent prose. When referencing a primitive in responses, PR bodies, commit messages, or comments, use the `Name (Pn)` form — *"applying Snapshot (P15)"*, *"via Dep-Chain (P14)"*, *"running Bookkeeping (P6)"* — not bare `Pn`. The number is the canonical identifier; the short name is the human-readable handle.
+
+**Short-name index**: Bridge (P1) · Gate (P2) · Tickets (P3) · Pipeline (P4) · Fanout (P5) · Bookkeeping (P6) · Freshness (P7) · Janitor (P8) · Wait (P9) · Hygiene (P10) · Empirical (P11) · Persist (P12) · Dream (P13) · Dep-Chain (P14) · Snapshot (P15) · Crystallize (P16) · Lens (P17) · Audience (P18) · Orchestrate (P19).
 
 | # | Primitive | Mechanism | Invariant |
 |---|-----------|-----------|-----------|
-| P1 | Conversation Bridge | Stop hook → JSONL → Obsidian docs → vault | Bridge stamp < 24h stale |
-| P2 | Control Gate | PreToolUse hook → `.control/policy.yaml` | G1–G4 blocking, never bypassed |
-| P3 | Linear Tickets | Every work unit tracked Backlog → Done | No significant work without a ticket |
-| P4 | PR Pipeline | Branch → PR → CI → merge → deploy | Never merge with failing checks |
-| P5 | Parallel Agents | Concurrent isolated agents via worktrees | No shared mutable file writes |
-| P6 | Knowledge Bookkeeping | `bookkeeping run` → score → promote → entity pages → synthesize | `research/entities/` never contains unscored items |
-| P7 | Skill Freshness Check | SessionStart hook → reports stale-skill nudge if last update check ≥ 7d ago | Never blocks; closes silent-rot bug for `npx skills add` snapshots |
-| P8 | Branch + Worktree Janitor | `make janitor` → detects squash-merged branches + dead worktrees, removes safely | Default `--dry-run`; never touches protected branches |
-| P9 | Productive Wait (`broomva/p9` skill) | wait-queue drains while a blocking operation runs (PR CI is the reference impl: `gh pr checks --watch` via `run_in_background` → classifier + evaluator self-heal). For non-PR waits (push-triggered deploys, builds), do a single direct check after kicking off next-priority work. | Never `sleep` on a blocking wait; merge defers to control metalayer |
-| P10 | Worktree Hygiene Discipline | Reflexive rule: decide worktree-or-not before first file; keep `git status` clean; auto-run P8 janitor after every merge | A clean tree is the only reliable reset point |
-| P11 | Empirical Feedback Loop | Reflexive rule: validate by *interacting* — log-tails, browser E2E, screenshots, deploy verification, multi-level test composition | Reasoning isn't validation; interaction is |
-| P12 | Persistent Loop Discipline (`broomva/persist` skill) | Reflexive rule: cross-context restart loop — state in filesystem (PROMPT.md + git tree), each iteration spawns a fresh agent context | At long-horizon work (>1h), in-context loops decay; restart fresh, backpressure from compilers/tests |
-| P13 | Dream Cycle Discipline | Reflexive rule: any consolidation that crosses a cadence-tier boundary MUST follow the 5-phase shape (gather → replay → prune → consolidate → index) | Replay against frozen substrate is the runtime form of stop-gradient; without it, dense lower-tier signal corrupts sparse upper-tier rules |
-> **Naming note.** Primitive numbers and skill repo names line up except for P6 — its skill repo is `broomva/bookkeeping` (named for the function). P9's skill repo is `broomva/p9` and that's the same number — they match. Primitive numbers are sequential identifiers in the bstack itself; skills are independent npm-style packages. `bstack doctor` checks AGENTS.md compliance with the full primitive contract.
+| P1 | **Bridge** — Conversation Bridge | Stop hook → JSONL → Obsidian docs → vault | Bridge stamp < 24h stale | +| P2 | **Gate** — Control Gate | PreToolUse hook → `.control/policy.yaml` | G1–G4 blocking, never bypassed | +| P3 | **Tickets** — Linear Tickets | Every work unit tracked Backlog → Done | No significant work without a ticket | +| P4 | **Pipeline** — PR Pipeline | Branch → PR → CI → merge → deploy | Never merge with failing checks | +| P5 | **Fanout** — Parallel Agents | Concurrent isolated agents via worktrees | No shared mutable file writes | +| P6 | **Bookkeeping** — Knowledge Bookkeeping | `bookkeeping run` → score → promote → entity pages → synthesize | `research/entities/` never contains unscored items | +| P7 | **Freshness** — Skill Freshness Check | SessionStart hook → reports stale-skill nudge if last update check ≥ 7d ago | Never blocks; closes silent-rot bug for `npx skills add` snapshots | +| P8 | **Janitor** — Branch + Worktree Janitor | `make janitor` → detects squash-merged branches + dead worktrees, removes safely | Default `--dry-run`; never touches protected branches | +| P9 | **Wait** — Productive Wait (`broomva/p9` skill) | wait-queue drains while a blocking operation runs (PR CI is the reference impl: `gh pr checks --watch` via `run_in_background` → classifier + evaluator self-heal). For non-PR waits (push-triggered deploys, builds), do a single direct check after kicking off next-priority work. 
| Never `sleep` on a blocking wait; merge defers to control metalayer | +| P10 | **Hygiene** — Worktree Hygiene Discipline | Reflexive rule: decide worktree-or-not before first file; keep `git status` clean; auto-run P8 janitor after every merge | A clean tree is the only reliable reset point | +| P11 | **Empirical** — Empirical Feedback Loop | Reflexive rule: validate by *interacting* — log-tails, browser E2E, screenshots, deploy verification, multi-level test composition | Reasoning isn't validation; interaction is | +| P12 | **Persist** — Persistent Loop Discipline (`broomva/persist` skill) | Reflexive rule: cross-context restart loop — state in filesystem (PROMPT.md + git tree), each iteration spawns a fresh agent context | At long-horizon work (>1h), in-context loops decay; restart fresh, backpressure from compilers/tests | +| P13 | **Dream** — Dream Cycle Discipline | Reflexive rule: any consolidation that crosses a cadence-tier boundary MUST follow the 5-phase shape (gather → replay → prune → consolidate → index) | Replay against frozen substrate is the runtime form of stop-gradient; without it, dense lower-tier signal corrupts sparse upper-tier rules | +| P14 | **Dep-Chain** — Dependency-Chain Reasoning Discipline | Reflexive rule: before any substantive write, enumerate upstream (files, functions, types, contracts, deployed state this depends on) and downstream (consumers, tests, CI gates, docs, in-flight PRs depending on this). Concrete file paths + function names in the response or PR body — not in the agent's head. | "Think deeply through chain of dependencies" without a concrete enumeration step is ritual. P14 makes it machine-checkable. | +| P15 | **Snapshot** — State-Snapshot Before Action | Reflexive rule: before any plan, the agent surfaces `git status` + branch + ahead/behind, in-flight PRs (`gh pr list`), Linear ticket state, bookkeeping/bridge freshness, last deploy state. The snapshot is *part of* the planning response — not deferred. 
| Plans built on stale state fail silently. P15 makes state-checking a cheap reflex, not a request the user has to make. | +| P16 | **Crystallize** — Crystallization Discipline (the Bstack Engine) | Meta-primitive — the loop that produces every other primitive. Pattern recurs ≥3 times across sessions → propose promotion to skill / SKILL.md / AGENTS.md section / `.control/policy.yaml` gate, gated by the four conditions: ≥3 instances, concrete mechanism, stated invariant, stated failure mode. | The crystallization loop must run inside the workspace, not in the user's head. P1–P15 are *outputs* of this loop. | +| P17 | **Lens** — Lens-Routed Request Articulation (`broomva/role-x` skill) | Reflexive rule: every substantive user input passes through `role/x` intake — select lens(es) from `roles/.md` registry by scoring signals, load substantive context, decide mode (`augment` / `rewrite` / `decompose`); P5 fan-out becomes typed graph. | No `act as X` persona rewrites — lenses load substantive context only. Lens selection is logged. Mode decision is surfaced unless `augment`. | +| P18 | **Audience** — Format-Follows-Audience Discipline | Reflexive rule: format follows audience. Agent-readable (LLM, system-prompt loaded, in-repo reference) → **markdown**. Human-readable (decisions, review, exploration) → **HTML**. Both (README, CHANGELOG, GitHub-browseable) → markdown (GitHub renders). ASCII pseudo-diagrams + unicode-color-approximation + >100-line markdown specs without HTML companion are explicit anti-patterns. Specs/plans/ADRs land in `docs/specs/`, `docs/plans/`, `docs/adrs/` as `.html`. | Format follows audience, not habit. Markdown's expressiveness ceiling means humans bounce off agent-produced specs at ~100 lines; HTML's information density carries the load. The 2-4× HTML generation cost is paid only on artifacts a human will actually read. 
| +| P19 | **Orchestrate** — Orchestration-Mechanism Selection Discipline | At pre-flight of substantive autonomous work, apply 2×2 (within-session/across-session × external-event/internal-condition) to map work shape to mechanism: `/goal ` (internal+in-session), Wait (P9) `p9 watch --background` (external+in-session), `/loop ` (internal+across-session), Persist (P12) `persist iterate PROMPT.md` (external+across-session). Compose dynamically. | No autonomous-continuation work without explicit mechanism choice + 2×2 quadrant citation. "Continue please" / waiting for user prompts mid-arc is ritual and forbidden. | + +> **Naming note.** Skill repo names are stable and don't always match primitive numbers. P6's skill repo is `broomva/bookkeeping` (named for the function). P9's skill repo is `broomva/p9` — Wait was historically the ninth primitive, so the name matches. Renaming any skill repo would break every `npx skills add` install, so skill names stay; primitive numbers are sequential identifiers in the bstack itself. + +## Plugin Skill Precedence + +Bstack primitives (P1–P19) and bstack-native skills (`/autonomous`, `/shape`, `/persist`, `/ship`, `/bookkeeping`, `/p9`, etc.) **supersede** plugin skills (`superpowers:*`, `pr-review-toolkit:*`, `codex:*`) wherever they conflict. Plugin skills carry no weight when they collide with workspace governance — the `superpowers:using-superpowers` skill itself encodes this priority: *"User's explicit instructions … highest priority. … If CLAUDE.md says X and a skill says Y, follow the user's instructions."* + +The most common collision: plugin skills that mandate user interaction before action (notably `superpowers:brainstorming`'s discovery interview, and the meta-rule that prompts the agent to invoke a skill "even if 1% might apply"). The bstack answer is **context-first, user-extract last**: + +1. 
Before any "interview the user" plugin skill fires, perform **Dep-Chain (P14)** + **Snapshot (P15)** over: + - Workspace memory files (auto-memory directory) + - `research/entities/{concept,pattern,tool,person,project}/` — knowledge graph (grep by topic before asking) + - `docs/` (per-project) — architecture, specs, plans, conversations + - Task-mentioned files (CV, spec, ticket body, PR diff) +2. Synthesize what's known from those sources. +3. **Ask the user only for irreducible residuals** — facts that genuinely cannot be derived from disk. +4. If steps 1–2 fully determine the task, the plugin interview is skipped; proceed to execution. + +This is a precedence rule, not a new primitive — P14 + P15 already exist; this clarifies that plugin-skill rituals do not override them. The failure mode it shuts down is the "form-fill ritual" — agent asks N+ clarifying questions about facts already curated in the workspace's memory files and knowledge graph. ## Governance Stack @@ -40,10 +67,10 @@ This workspace registers Claude Code hooks in `.claude/settings.json`: | Event | Hook | Purpose | |-------|------|---------| -| `Stop` | `conversation-bridge-hook.sh` | P1 — capture session to knowledge graph | -| `Notification` | `conversation-bridge-hook.sh` | P1 — backup trigger for bridge | -| `PreToolUse` | `control-gate-hook.sh` | P2 — enforce safety shields | -| `SessionStart` | `skill-freshness-hook.sh` | P7 — nudge user when skills are stale | +| `Stop` | `conversation-bridge-hook.sh` | Bridge (P1) — capture session to knowledge graph | +| `Notification` | `conversation-bridge-hook.sh` | Bridge (P1) — backup trigger for bridge | +| `PreToolUse` | `control-gate-hook.sh` | Gate (P2) — enforce safety shields | +| `SessionStart` | `skill-freshness-hook.sh` | Freshness (P7) — nudge user when skills are stale | ## Testing & Verification @@ -51,10 +78,21 @@ This workspace registers Claude Code hooks in `.claude/settings.json`: bstack doctor # Verify primitive contract compliance 
bstack repair # Fix specific gaps make control-audit # Full metalayer compliance audit -make janitor # P8 janitor (dry-run) +make janitor # Janitor (P8) dry-run ``` ## Conventions - **Git**: feature branches, squash merge via PR. Never force push main. - **Each project** in this workspace can have its own CLAUDE.md with project-specific context. + +## Self-Documenting Standards + +When modifying skills, architecture docs, or governance files: + +1. **Threshold consistency**: A scoring cutoff, layer count, or primitive count changed in one file must be updated in ALL files that reference it. `SKILL.md` is the authoritative source; other files defer to it. +2. **Cross-reference integrity**: Adding a new entity type, status value, or pipeline stage requires updating both the schema/rubric AND the template files that use them. +3. **Primitive count**: Adding a primitive (P-N+1) requires bumping the count in this file's "Bstack Core Automation Primitives" header, adding the table row here, adding the section in `AGENTS.md`, and updating the composition-loop diagram. Run `bstack doctor` after changes to verify lockstep. +4. **Verification**: After modifying this file or any skill, run `bstack doctor` to confirm consistency. + +These rules are enforced by agent reasoning + `bstack doctor`, not hooks. The agent reads them and applies them; the doctor surfaces gaps. diff --git a/tests/template_lockstep.test.sh b/tests/template_lockstep.test.sh new file mode 100755 index 0000000..b75f578 --- /dev/null +++ b/tests/template_lockstep.test.sh @@ -0,0 +1,188 @@ +#!/usr/bin/env bash +# tests/template_lockstep.test.sh — Assert primitive-count and structural +# consistency across the four governance surfaces that must agree: +# +# 1. SKILL.md — frontmatter description (canonical count) +# 2. scripts/doctor.sh — EXPECTED_COUNT (the validator) +# 3. assets/templates/CLAUDE.md.template — scaffolded into new workspaces +# 4. 
assets/templates/AGENTS.md.template — scaffolded into new workspaces
+#
+# Run from the bstack repo root:
+#   bash tests/template_lockstep.test.sh
+#
+# Exits non-zero if any assertion fails. No external test framework.
+#
+# Why this exists: prior drift had SKILL.md saying "nineteen primitives"
+# while templates still said "thirteen", scaffolding new workspaces into
+# permanent disagreement with the catalog. This test makes that drift
+# CI-visible.
+
+set -uo pipefail
+
+BSTACK_REPO="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
+
+SKILL_MD="$BSTACK_REPO/SKILL.md"
+DOCTOR_SH="$BSTACK_REPO/scripts/doctor.sh"
+CLAUDE_TPL="$BSTACK_REPO/assets/templates/CLAUDE.md.template"
+AGENTS_TPL="$BSTACK_REPO/assets/templates/AGENTS.md.template"
+
+PASS=0
+FAIL=0
+FAILED_TESTS=()
+
+# Number-word mapping for verifying spelled-out counts.
+declare -a NUM_WORDS=(
+  [1]="one" [2]="two" [3]="three" [4]="four" [5]="five"
+  [6]="six" [7]="seven" [8]="eight" [9]="nine" [10]="ten"
+  [11]="eleven" [12]="twelve" [13]="thirteen" [14]="fourteen"
+  [15]="fifteen" [16]="sixteen" [17]="seventeen" [18]="eighteen"
+  [19]="nineteen" [20]="twenty" [21]="twenty-one"
+)
+
+# ── Helpers ──────────────────────────────────────────────────────────────
+assert_eq() {
+  local name="$1" actual="$2" expected="$3"
+  if [ "$actual" = "$expected" ]; then
+    echo " [ok] $name: $actual"
+    PASS=$((PASS + 1))
+  else
+    echo " [FAIL] $name: got '$actual', expected '$expected'"
+    FAIL=$((FAIL + 1))
+    FAILED_TESTS+=("$name")
+  fi
+}
+
+assert_contains() {
+  local name="$1" haystack="$2" needle="$3"
+  if echo "$haystack" | grep -qF "$needle"; then
+    echo " [ok] $name: contains '$needle'"
+    PASS=$((PASS + 1))
+  else
+    echo " [FAIL] $name: missing '$needle'"
+    FAIL=$((FAIL + 1))
+    FAILED_TESTS+=("$name")
+  fi
+}
+
+# ── 1. Discover the canonical primitive count from doctor.sh ─────────────
+echo ""
+echo "=== Discovering canonical primitive count ==="
+
+DOCTOR_COUNT="$(grep -E '^EXPECTED_COUNT=' "$DOCTOR_SH" | head -1 | sed 's/.*=//' | tr -d '"' | tr -d "'")"
+if [ -z "$DOCTOR_COUNT" ] || ! [[ "$DOCTOR_COUNT" =~ ^[0-9]+$ ]]; then
+  # Fallback: count P_NAMES array entries.
+  DOCTOR_COUNT="$(grep -cE '^\s+"P[0-9]+:' "$DOCTOR_SH" || true)"
+fi
+
+if ! [[ "$DOCTOR_COUNT" =~ ^[0-9]+$ ]] || [ "$DOCTOR_COUNT" -lt 1 ]; then
+  echo " [FAIL] could not derive primitive count from doctor.sh"
+  exit 1
+fi
+
+CANONICAL_COUNT="$DOCTOR_COUNT"
+CANONICAL_WORD="${NUM_WORDS[$CANONICAL_COUNT]:-}"
+echo " Canonical count (from doctor.sh): $CANONICAL_COUNT ($CANONICAL_WORD)"
+
+if [ -z "$CANONICAL_WORD" ]; then
+  echo " [FAIL] no spelled-out word for count $CANONICAL_COUNT (extend NUM_WORDS in this test)"
+  exit 1
+fi
+
+# ── 2. SKILL.md says the same count ─────────────────────────────────────
+echo ""
+echo "=== SKILL.md frontmatter ==="
+
+# Description references the count as e.g. "nineteen irreducible primitives".
+SKILL_DESC="$(sed -n '/^description:/,/^[a-z_]\+:/p' "$SKILL_MD" | head -50)"
+assert_contains "SKILL.md description has '$CANONICAL_WORD irreducible primitives'" \
+  "$SKILL_DESC" "$CANONICAL_WORD irreducible primitives"
+
+# Also references the highest P-number explicitly.
+assert_contains "SKILL.md description references P$CANONICAL_COUNT" \
+  "$SKILL_DESC" "P$CANONICAL_COUNT"
+
+# Trigger list should span P1..P.
+SKILL_TRIGGERS="$(grep -E 'P1.*through.*P[0-9]+' "$SKILL_MD" | head -1)"
+if [ -n "$SKILL_TRIGGERS" ]; then
+  assert_contains "SKILL.md trigger list spans through P$CANONICAL_COUNT" \
+    "$SKILL_TRIGGERS" "P$CANONICAL_COUNT"
+fi
+
+# ── 3. CLAUDE.md.template ───────────────────────────────────────────────
+echo ""
+echo "=== CLAUDE.md.template ==="
+
+CLAUDE_TPL_TEXT="$(cat "$CLAUDE_TPL")"
+assert_contains "CLAUDE.md.template intro says '$CANONICAL_WORD irreducible primitives (P1–P$CANONICAL_COUNT)'" \
+  "$CLAUDE_TPL_TEXT" "$CANONICAL_WORD irreducible primitives (P1"
+assert_contains "CLAUDE.md.template references P$CANONICAL_COUNT" \
+  "$CLAUDE_TPL_TEXT" "P$CANONICAL_COUNT"
+
+# Primitive table has a row for each P1..P.
+CLAUDE_ROW_COUNT="$(grep -cE '^\| P[0-9]+ \|' "$CLAUDE_TPL" || true)"
+assert_eq "CLAUDE.md.template primitive table row count" "$CLAUDE_ROW_COUNT" "$CANONICAL_COUNT"
+
+# ── 4. AGENTS.md.template ───────────────────────────────────────────────
+echo ""
+echo "=== AGENTS.md.template ==="
+
+AGENTS_TPL_TEXT="$(cat "$AGENTS_TPL")"
+assert_contains "AGENTS.md.template intro says '$CANONICAL_WORD irreducible building blocks'" \
+  "$AGENTS_TPL_TEXT" "$CANONICAL_WORD irreducible building blocks"
+assert_contains "AGENTS.md.template composition-loop intro says '$CANONICAL_WORD primitives'" \
+  "$AGENTS_TPL_TEXT" "$CANONICAL_WORD primitives compose"
+
+# One ### Pn — section per primitive.
+AGENTS_SECTION_COUNT="$(grep -cE '^### P[0-9]+ — ' "$AGENTS_TPL" || true)"
+assert_eq "AGENTS.md.template ### Pn section count" "$AGENTS_SECTION_COUNT" "$CANONICAL_COUNT"
+
+# ── 5. Short-name index symmetry between CLAUDE.md.template and AGENTS.md.template ──
+echo ""
+echo "=== Short-name index symmetry ==="
+
+CLAUDE_IDX="$(grep -E '^\*\*Short-name index\*\*' "$CLAUDE_TPL" | head -1)"
+AGENTS_IDX="$(grep -E '^\*\*Short-name index\*\*' "$AGENTS_TPL" | head -1)"
+
+if [ -n "$CLAUDE_IDX" ] && [ -n "$AGENTS_IDX" ]; then
+  CLAUDE_IDX_PAYLOAD="$(echo "$CLAUDE_IDX" | sed 's/^\*\*Short-name index\*\*[^:]*:[[:space:]]*//')"
+  AGENTS_IDX_PAYLOAD="$(echo "$AGENTS_IDX" | sed 's/^\*\*Short-name index\*\*[^:]*:[[:space:]]*//')"
+  assert_eq "Short-name index payload matches between templates" \
+    "$CLAUDE_IDX_PAYLOAD" "$AGENTS_IDX_PAYLOAD"
+else
+  echo " [FAIL] Short-name index missing in one of the templates"
+  FAIL=$((FAIL + 1))
+  FAILED_TESTS+=("Short-name index presence")
+fi
+
+# Index lists exactly $CANONICAL_COUNT entries (count `(P` occurrences).
+if [ -n "$CLAUDE_IDX" ]; then
+  INDEX_ENTRIES="$(echo "$CLAUDE_IDX" | grep -oE '\(P[0-9]+\)' | wc -l | tr -d ' ')"
+  assert_eq "CLAUDE.md.template short-name index entry count" "$INDEX_ENTRIES" "$CANONICAL_COUNT"
+fi
+
+# ── 6. Plugin Skill Precedence section present in both templates ─────────
+echo ""
+echo "=== Plugin Skill Precedence presence ==="
+assert_contains "CLAUDE.md.template has Plugin Skill Precedence section" \
+  "$CLAUDE_TPL_TEXT" "## Plugin Skill Precedence"
+assert_contains "AGENTS.md.template has Plugin Skill Precedence section" \
+  "$AGENTS_TPL_TEXT" "## Plugin Skill Precedence"
+
+# ── Summary ──────────────────────────────────────────────────────────────
+echo ""
+echo "=== Lockstep test summary ==="
+echo " Passed: $PASS"
+echo " Failed: $FAIL"
+
+if [ "$FAIL" -gt 0 ]; then
+  echo ""
+  echo "Failed assertions:"
+  for t in "${FAILED_TESTS[@]}"; do
+    echo " - $t"
+  done
+  exit 1
+fi
+
+echo ""
+echo " All lockstep checks passed at canonical count = $CANONICAL_COUNT ($CANONICAL_WORD)"
+exit 0
From c633f5c06b333a20eead137ad6800683e3a8513a Mon Sep 17 00:00:00 2001
From: "Carlos D. 
Escobar-Valbuena" Date: Thu, 14 May 2026 07:35:13 -0500 Subject: [PATCH 2/3] feat(templates): bump to P20 (Cross-Review) after #14 merge MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Companion to #14 (Cross-Model Adversarial Review Gate). bstack catalog moved to twenty primitives via #14 (SKILL.md / doctor.sh / references/ primitives.md updated) but #14 did not touch templates. That's exactly the drift this PR was created to close, and exactly what the new template_lockstep.test.sh caught immediately after rebase: --- before this commit --- CLAUDE.md.template intro says 'twenty irreducible primitives (P1–P20)' → FAIL (says nineteen / P19) AGENTS.md.template intro says 'twenty irreducible building blocks' → FAIL AGENTS.md.template composition-loop intro says 'twenty primitives' → FAIL AGENTS.md.template ### Pn section count → FAIL (19, expected 20) CLAUDE.md.template primitive table row count → FAIL (19, expected 20) CLAUDE.md.template short-name index entry count → FAIL (19, expected 20) AGENTS.md.template + CLAUDE.md.template references P20 → FAIL --- after this commit --- All 13 lockstep checks passed at canonical count = 20 (twenty) Changes: - **assets/templates/CLAUDE.md.template**: count phrase "nineteen → twenty", "P1–P19 → P1–P20", short-name index appends "Cross-Review (P20)", primitive table gets P20 row (mechanism: 3 strata A/B/C, anti-slop ≥7/10, ≤3 fix rounds, broomva/cross-review skill; invariant: substantive PRs require ≥7/10 verdict before Pipeline P4 auto-merge). - **assets/templates/AGENTS.md.template**: count phrase nineteen→twenty in intro + composition-loop, short-name index appends Cross-Review (P20), new `### P20 — Cross-Review` section with full What/How (3-strata table) / Invariant / Reflexive Trigger Rule (5 triggers). 
Composition-loop diagram threads Cross-Review (P20) between Empirical (P11) deploy verification and Pipeline (P4) auto-merge — the exact insertion point per P20 spec ("gate fires before P4 auto-merge"). Co-Authored-By: Claude Opus 4.7 (1M context) --- assets/templates/AGENTS.md.template | 31 ++++++++++++++++++++++++++--- assets/templates/CLAUDE.md.template | 7 ++++--- 2 files changed, 32 insertions(+), 6 deletions(-) diff --git a/assets/templates/AGENTS.md.template b/assets/templates/AGENTS.md.template index 96a3ad2..cee3b5d 100644 --- a/assets/templates/AGENTS.md.template +++ b/assets/templates/AGENTS.md.template @@ -8,11 +8,11 @@ This file IS the control harness for agents operating in this workspace. Reading ## Bstack Core Automation Primitives -This workspace is governed by the **bstack** primitive contract — nineteen irreducible building blocks. All are always active. Run `bstack doctor` to verify compliance. +This workspace is governed by the **bstack** primitive contract — twenty irreducible building blocks. All are always active. Run `bstack doctor` to verify compliance. Each primitive carries a **short name** for agent prose. When referencing a primitive in responses, PR bodies, commit messages, or comments, use the `Name (Pn)` form — *"applying Snapshot (P15)"*, *"via Dep-Chain (P14)"*, *"running Bookkeeping (P6)"* — not bare `Pn`. The number is the canonical identifier; the short name is the human-readable handle. -**Short-name index**: Bridge (P1) · Gate (P2) · Tickets (P3) · Pipeline (P4) · Fanout (P5) · Bookkeeping (P6) · Freshness (P7) · Janitor (P8) · Wait (P9) · Hygiene (P10) · Empirical (P11) · Persist (P12) · Dream (P13) · Dep-Chain (P14) · Snapshot (P15) · Crystallize (P16) · Lens (P17) · Audience (P18) · Orchestrate (P19). 
+**Short-name index**: Bridge (P1) · Gate (P2) · Tickets (P3) · Pipeline (P4) · Fanout (P5) · Bookkeeping (P6) · Freshness (P7) · Janitor (P8) · Wait (P9) · Hygiene (P10) · Empirical (P11) · Persist (P12) · Dream (P13) · Dep-Chain (P14) · Snapshot (P15) · Crystallize (P16) · Lens (P17) · Audience (P18) · Orchestrate (P19) · Cross-Review (P20). ### P1 — Bridge: Conversation Bridge (Episodic Memory) @@ -313,9 +313,33 @@ Compositions are dynamic: Persist iterations can invoke `/goal` for sub-tasks; ` 4. When composing mechanisms — surface the composition tree, don't compose silently. 5. Tempted to type "continue please" or wait for user prompts — STOP. That's the ritual P19 makes impossible. +### P20 — Cross-Review: Cross-Model Adversarial Review Gate + +**What**: Names the rule that *the writer cannot be the final judge*. Before substantive PRs merge, fire a cross-model adversarial gate. Single-model planning + implementing + reviewing reproduces the model's systematic biases — what cross-model-agents research calls *slop* (over-engineered abstractions, unnecessary wrappers, template-paste patterns). + +**How**: Three strata, ordered by strength: + +| Strata | Mechanism | When | +|---|---|---| +| **A** True cross-vendor | `codex exec -m gpt-5.4` reads diff + scores | Codex CLI installed | +| **B** Cross-context same-model | Fresh `Agent` subagent under devil's-advocate brief | Always available | +| **C** Composed existing skills | `superpowers:constructive-dissent`, `devils-advocate`, `pr-review-toolkit:*`, `critique`, `premortem`, `plan-*-review` | Always — the toolkit P20 makes mandatory | + +Scoring: anti-slop ≥7/10 to pass; max 3 fix rounds; verdict logged in PR comments. Implementation: `broomva/cross-review` skill. The gate fires *before* P4 auto-merge — not after merge as code review. + +**Invariant**: Substantive PRs (>200 LOC OR public API change OR multi-file OR governance-class) cannot merge without cross-model adversarial verdict ≥7/10. 
Self-review by the writing model is forbidden as the *sole* verdict. + +**Reflexive Trigger Rule**: P20 is a reflex. Apply without being prompted: + +1. Before pushing substantive PRs — fire the gate (Strata A if Codex, else B+C). Score + verdict precede merge. +2. When verdict <7 — fix → rescore. Max 3 rounds. Round 3 failure → escalate to user. +3. When the writer is the only model in the loop — STOP. Strata B at minimum is mandatory. +4. When tempted to skip P20 because "small PR" — threshold is *substantive* (>200 LOC OR public API OR multi-file OR governance). Trivial (typo, single-file doc) exempt. +5. P20 sits between Empirical (P11) validation and Pipeline (P4) auto-merge — does not replace either. + --- -These nineteen primitives compose into the full autonomous development loop: +These twenty primitives compose into the full autonomous development loop: ``` User intent → Lens (P17) intake → Snapshot (P15) → Orchestrate (P19) mechanism choice → Tickets (P3) → Fanout (P5) dispatch @@ -326,6 +350,7 @@ User intent → Lens (P17) intake → Snapshot (P15) → Orchestrate (P19) mecha → Code written + Empirical (P11) parallel watchers → PR via Pipeline (P4) → Wait (P9) CI watch + heal loop → Empirical (P11) deploy verification (preview URL, screenshots, browser session) + → Cross-Review (P20) gate (Strata A or B+C, ≥7/10) → if pass, Pipeline (P4) auto-merge eligible → Merge → Hygiene (P10) post-merge cleanup via Janitor (P8) → Deploy → Dream (P13) for any consolidation (Bookkeeping (P6) replay first) → Empirical (P11) dogfood receipt → Session captured via Bridge (P1) → Knowledge via Bookkeeping (P6) diff --git a/assets/templates/CLAUDE.md.template b/assets/templates/CLAUDE.md.template index 62200fc..b0938f6 100644 --- a/assets/templates/CLAUDE.md.template +++ b/assets/templates/CLAUDE.md.template @@ -2,15 +2,15 @@ ## Identity -This workspace is governed by **bstack** — nineteen irreducible primitives (P1–P19) that turn an agent-driven workspace into a self-operating 
system. The full primitive contract lives in [AGENTS.md](AGENTS.md). Run `bstack doctor` to verify compliance. +This workspace is governed by **bstack** — twenty irreducible primitives (P1–P20) that turn an agent-driven workspace into a self-operating system. The full primitive contract lives in [AGENTS.md](AGENTS.md). Run `bstack doctor` to verify compliance. ## Bstack Core Automation Primitives -Nineteen irreducible building blocks that make this workspace self-operating. All are always active. Full specification in `AGENTS.md`. +Twenty irreducible building blocks that make this workspace self-operating. All are always active. Full specification in `AGENTS.md`. Each primitive carries a **short name** for agent prose. When referencing a primitive in responses, PR bodies, commit messages, or comments, use the `Name (Pn)` form — *"applying Snapshot (P15)"*, *"via Dep-Chain (P14)"*, *"running Bookkeeping (P6)"* — not bare `Pn`. The number is the canonical identifier; the short name is the human-readable handle. -**Short-name index**: Bridge (P1) · Gate (P2) · Tickets (P3) · Pipeline (P4) · Fanout (P5) · Bookkeeping (P6) · Freshness (P7) · Janitor (P8) · Wait (P9) · Hygiene (P10) · Empirical (P11) · Persist (P12) · Dream (P13) · Dep-Chain (P14) · Snapshot (P15) · Crystallize (P16) · Lens (P17) · Audience (P18) · Orchestrate (P19). +**Short-name index**: Bridge (P1) · Gate (P2) · Tickets (P3) · Pipeline (P4) · Fanout (P5) · Bookkeeping (P6) · Freshness (P7) · Janitor (P8) · Wait (P9) · Hygiene (P10) · Empirical (P11) · Persist (P12) · Dream (P13) · Dep-Chain (P14) · Snapshot (P15) · Crystallize (P16) · Lens (P17) · Audience (P18) · Orchestrate (P19) · Cross-Review (P20). | # | Primitive | Mechanism | Invariant | |---|-----------|-----------|-----------| @@ -33,6 +33,7 @@ Each primitive carries a **short name** for agent prose. 
When referencing a prim | P17 | **Lens** — Lens-Routed Request Articulation (`broomva/role-x` skill) | Reflexive rule: every substantive user input passes through `role/x` intake — select lens(es) from `roles/.md` registry by scoring signals, load substantive context, decide mode (`augment` / `rewrite` / `decompose`); P5 fan-out becomes typed graph. | No `act as X` persona rewrites — lenses load substantive context only. Lens selection is logged. Mode decision is surfaced unless `augment`. | | P18 | **Audience** — Format-Follows-Audience Discipline | Reflexive rule: format follows audience. Agent-readable (LLM, system-prompt loaded, in-repo reference) → **markdown**. Human-readable (decisions, review, exploration) → **HTML**. Both (README, CHANGELOG, GitHub-browseable) → markdown (GitHub renders). ASCII pseudo-diagrams + unicode-color-approximation + >100-line markdown specs without HTML companion are explicit anti-patterns. Specs/plans/ADRs land in `docs/specs/`, `docs/plans/`, `docs/adrs/` as `.html`. | Format follows audience, not habit. Markdown's expressiveness ceiling means humans bounce off agent-produced specs at ~100 lines; HTML's information density carries the load. The 2-4× HTML generation cost is paid only on artifacts a human will actually read. | | P19 | **Orchestrate** — Orchestration-Mechanism Selection Discipline | At pre-flight of substantive autonomous work, apply 2×2 (within-session/across-session × external-event/internal-condition) to map work shape to mechanism: `/goal ` (internal+in-session), Wait (P9) `p9 watch --background` (external+in-session), `/loop ` (internal+across-session), Persist (P12) `persist iterate PROMPT.md` (external+across-session). Compose dynamically. | No autonomous-continuation work without explicit mechanism choice + 2×2 quadrant citation. "Continue please" / waiting for user prompts mid-arc is ritual and forbidden. 
| +| P20 | **Cross-Review** — Cross-Model Adversarial Review Gate (`broomva/cross-review` skill) | Before substantive PRs merge, fire cross-model adversarial gate. Three strata: A (true cross-vendor via `codex exec`), B (fresh-context subagent under devil's-advocate brief), C (composed adversarial-review skills — `superpowers:constructive-dissent`, `devils-advocate`, `pr-review-toolkit:*`, `critique`, `premortem`). Anti-slop score ≥7/10; max 3 fix rounds; verdict logged in PR. Fires *before* P4 auto-merge. | Substantive PRs (>200 LOC OR public API OR multi-file OR governance-class) cannot merge without cross-model verdict ≥7/10. Self-review by the writing model as sole verdict is forbidden. | > **Naming note.** Skill repo names are stable and don't always match primitive numbers. P6's skill repo is `broomva/bookkeeping` (named for the function). P9's skill repo is `broomva/p9` — Wait was historically the ninth primitive, so the name matches. Renaming any skill repo would break every `npx skills add` install, so skill names stay; primitive numbers are sequential identifiers in the bstack itself. From a249089f1dfed9f7a93f9f25d73c71da9d7090d6 Mon Sep 17 00:00:00 2001 From: "Carlos D. Escobar-Valbuena" Date: Thu, 14 May 2026 12:19:12 -0500 Subject: [PATCH 3/3] feat: align P7/P8/P9 numbering to workspace canonical MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Resolves the pre-existing numbering inconsistency where bstack canonical disagreed with broomva/workspace AGENTS.md on P7/P8/P9. After this change: P7 = CI Watcher + Productive Wait (broomva/p9 skill — historical name) P8 = Skill Freshness Check (SessionStart hook) P9 = Branch + Worktree Janitor (make janitor) This matches the workspace's renumbering documented in its naming-note (P7's skill repo is broomva/p9 — named when it was the 9th primitive; renaming the SKILL repo would break every npx skills add install). 
Why workspace canonical wins: - Workspace numbering is the actively-edited canonical (7 primitives added under it in the last 3 sessions: P14, P15, P16, P17, P18, P19, P20) - Reverting workspace would touch many files; aligning bstack to workspace is contained to this single repo - The doctor (which reads workspace AGENTS.md/CLAUDE.md) now passes 73/73 against workspace — eliminates the 1 pre-existing gap reported for several sessions Changes: SKILL.md: - Frontmatter description: P7/P8/P9 sentences reordered to workspace canonical (Productive Wait → Skill Freshness → Janitor) - Primitives table: rows swapped (P7 = Productive Wait skill p9 historical, P8 = Skill Freshness, P9 = Janitor) references/primitives.md: - Table of contents: P7/P8/P9 entries swapped to workspace order - §P7 — now CI Watcher + Productive Wait (was Skill Freshness); skill- name note rewritten to "broomva/p9 historical name" instead of "names match" - §P8 — now Skill Freshness Check (was Janitor); env vars updated to BROOMVA_P8_THRESHOLD_DAYS with legacy P7 fallback - §P9 — now Branch + Worktree Janitor (was Productive Wait); protected path now ~/.config/broomva/p9-janitor/ with legacy p8 fallback - Reflexive Trigger Rule subsection: now P7 (Productive Wait), was P9 scripts/doctor.sh: - Header comment: reflexive list updated (P6, P7, P10..) instead of (P6, P9, P10..); script-mapping comments updated (P7 = p9.py, P8 = skill-freshness-hook, P9 = branch-janitor) - P_NAMES array: P7/P8/P9 titles swapped to workspace canonical - REFLEXIVE_PRIMS: P9 → P7 (Productive Wait is the reasoning-enforced one; Janitor (now P9) is mechanism-only) - SCRIPT_PATHS: reordered so labels (P1, P2, P6, P7, P8, P9, P12) map to disk paths in the right order - **Format-tolerant section matching**: doctor now accepts both `### P1: Title` AND `### P1 — Label: Title` formats (workspace AGENTS.md uses the labeled form; doctor was missing matches against workspace files). 
Fixed in P_NAMES section-presence check + the awk pattern in REFLEXIVE_PRIMS scan. Companion fix in workspace: - broomva/workspace/scripts/bstack-primitive-lint.py parse regex extended to accept `### P\d+ —` (label form) in addition to `### P\d+:` (original form) Verified: - bash scripts/doctor.sh against workspace → 73/73 PASSES - G-L3-1 + G-L3-2 both green The naming note about broomva/p9 skill being historically named is preserved (and now consistent across workspace and bstack). The "primitive number matches skill name" pedagogical claim in the old §P9 is reframed in the new §P7 as "skill repo name is stable; primitive number changed; this is intentional." Co-Authored-By: Claude Opus 4.7 (1M context) --- SKILL.md | 18 ++++++------ references/primitives.md | 60 ++++++++++++++++++++-------------------- scripts/doctor.sh | 34 ++++++++++++++--------- 3 files changed, 60 insertions(+), 52 deletions(-) diff --git a/SKILL.md b/SKILL.md index 88760bb..f7e9ae3 100644 --- a/SKILL.md +++ b/SKILL.md @@ -7,12 +7,12 @@ description: | the substrate. P1 captures every session as episodic memory. P2 gates destructive operations. P3 tracks every work unit in Linear. P4 forces every change through CI. P5 isolates parallel agents in worktrees. P6 keeps the - knowledge graph quality-controlled. P7 nudges when installed skills go - stale. P8 cleans up squash-merged branches and dead worktrees. P9 is the - productive-wait optimizer — drains a context-scoped queue while a - blocking operation (PR CI, deploy, build, long index) runs; classifier + - evaluator self-heal red CI is the reference implementation. P10 binds - every agent to + knowledge graph quality-controlled. P7 is the productive-wait optimizer + (`broomva/p9` skill — historical name) — drains a context-scoped queue + while a blocking operation (PR CI, deploy, build, long index) runs; + classifier + evaluator self-heal red CI is the reference implementation. + P8 nudges when installed skills go stale. 
P9 cleans up squash-merged + branches and dead worktrees. P10 binds every agent to clean-tree discipline through the PR lifecycle. P11 is the cohesion glue — bind every agent to validate by interacting with what they build, not just by reasoning + lint + CI exit codes. P12 is the long-horizon discipline — @@ -91,9 +91,9 @@ The sixteen primitives. Each closes one specific failure mode that drifts into e | **P4** | PR Pipeline | merging unreviewed code | | **P5** | Parallel Agents | sequential bottleneck | | **P6** | Knowledge Bookkeeping | knowledge graph rot | -| **P7** | Skill Freshness Check | silent rot of `npx skills add` snapshots | -| **P8** | Branch + Worktree Janitor | squash-merge accumulation | -| **P9** | Productive Wait (`broomva/p9` skill) | sleep-on-wait dead time (CI, deploys, builds — PR CI is the reference impl) | +| **P7** | CI Watcher + Productive Wait (`broomva/p9` skill — historical name) | sleep-on-wait dead time (CI, deploys, builds — PR CI is the reference impl) | +| **P8** | Skill Freshness Check | silent rot of `npx skills add` snapshots | +| **P9** | Branch + Worktree Janitor | squash-merge accumulation | | **P10** | Worktree Hygiene Discipline | dirty-tree drift across the PR lifecycle | | **P11** | Empirical Feedback Loop | shipping code that compiles but doesn't work | | **P12** | Persistent Loop Discipline (`broomva/persist` skill) | long-horizon work decaying as the context window rots | diff --git a/references/primitives.md b/references/primitives.md index 509ee18..f83f533 100644 --- a/references/primitives.md +++ b/references/primitives.md @@ -10,9 +10,9 @@ The eleven primitives that make a workspace self-operating. 
This is the canonica - [P4 — PR Pipeline](#p4--pr-pipeline) - [P5 — Parallel Agents](#p5--parallel-agents) - [P6 — Knowledge Bookkeeping](#p6--knowledge-bookkeeping) -- [P7 — Skill Freshness Check](#p7--skill-freshness-check) -- [P8 — Branch + Worktree Janitor](#p8--branch--worktree-janitor) -- [P9 — Productive Wait](#p9--productive-wait) +- [P7 — CI Watcher + Productive Wait](#p7--ci-watcher--productive-wait) +- [P8 — Skill Freshness Check](#p8--skill-freshness-check) +- [P9 — Branch + Worktree Janitor](#p9--branch--worktree-janitor) - [P10 — Worktree Hygiene Discipline](#p10--worktree-hygiene-discipline) - [P11 — Empirical Feedback Loop](#p11--empirical-feedback-loop) - [P12 — Persistent Loop Discipline](#p12--persistent-loop-discipline) @@ -102,58 +102,58 @@ Mental checklist before declaring graph-dependent work done: *Did this session p --- -## P7 — Skill Freshness Check - -**Closes**: silent rot of `npx skills add` snapshots. Skills don't auto-update; without a nudge they go stale and sessions hit `error: unrecognized arguments: --foo` from out-of-date binaries. - -**How**: `SessionStart` hook → `scripts/skill-freshness-hook.sh` checks the timestamp of `~/.config/broomva/p7/last-skill-update-check` (legacy `~/.config/broomva/p8/` still honored for in-place upgrades). If ≥ 7 days old (or never), prints a one-line nudge with refresh command + dismissal `touch`. Always exits 0. - -**Invariant**: hook always exits 0. `BROOMVA_P7_THRESHOLD_DAYS` env var configurable (default 7; legacy `BROOMVA_P8_THRESHOLD_DAYS` still honored). Dismissal: run `npx skills update -g` then `touch ~/.config/broomva/p7/last-skill-update-check`. - ---- - -## P8 — Branch + Worktree Janitor - -**Closes**: squash-merged branches and dead worktrees accumulate. `git branch --merged` doesn't catch squash-merges (the branch tip isn't an ancestor of main). - -**How**: `make janitor` (wraps `scripts/branch-janitor.sh`). Walks current repo (or all workspace repos with `--scope=workspace`). 
For each non-protected branch matching the include pattern (`feat/*,fix/*,chore/*,docs/*` by default): runs the canonical squash-merge detection — `git commit-tree -p ` produces a synthetic commit; `git cherry origin/main ` reports if its patch is in main. If yes, branch is mergeable. Worktrees whose underlying branch is gone get pruned via `git worktree remove --force`. - -**Invariant**: default `--dry-run` — pass `--apply` to actually delete. Never touches main, master, develop, HEAD, gh-pages, or any branch in `~/.config/broomva/p8-janitor/protected.txt` (legacy `~/.config/broomva/p9-janitor/` still honored). Currently-checked-out branch always skipped. - ---- - -## P9 — Productive Wait +## P7 — CI Watcher + Productive Wait **Closes**: `sleep`-on-wait dead time. Agents lose 5–15 min per blocking operation — CI checks, deploy verifications, builds, long-running indexing operations. The primitive is *productive wait*: convert the block into work on the next priority. **How (general primitive)**: spawn the blocking-wait observer via `run_in_background` so the agent gets an event-driven notification on completion. While the observer runs, the agent drains a context-scoped priority queue (`session > memory > graph > docs > linear`). On notification, classify the result and either advance or self-heal. -**Reference implementation — PR CI**: `python3 skills/p9/scripts/p9.py watch --background` spawns `gh pr checks --watch`. On bg-task notification, agent reads `p9 status` → on green, `p9 merge-ready` → defer to control metalayer for authorization. On red, `p9 heal --classify` → if classified+evaluator-positive, apply heal (PR-diff scope only) and start a new watch. Auto-merge actuator (`p9 auto-merge`) consults `.control/policy.yaml`'s `auto_merge:` block with governance-paths-always-block safety pre-pass. +**Reference implementation — PR CI**: `python3 skills/p9/scripts/p9.py watch --background` spawns `gh pr checks --watch`. 
On bg-task notification, agent reads `p9 status` → on green, `p9 merge-ready` → defer to control metalayer for authorization. On red, `p9 heal --classify` → if classified+evaluator-positive, apply heal (PR-diff scope only) and start a new watch. Auto-merge actuator (`p9 auto-merge`) consults `.control/policy.yaml`'s `auto_merge:` block. **Other waits the primitive applies to** (today: handled by direct check; on the roadmap to wire into `p9`): -- **Push-triggered dev/staging/prod deploys** — when the trigger isn't a PR (e.g., main-branch deploy on push), p9 currently only tracks PRs. Today's workaround: do a single direct check on the deploy URL/log after `git push`. Do *not* `sleep` waiting for the deploy; pull the next item from the wait-queue. Tracked as a P9 extension. +- **Push-triggered dev/staging/prod deploys** — when the trigger isn't a PR (e.g., main-branch deploy on push), p9 currently only tracks PRs. Today's workaround: do a single direct check on the deploy URL/log after `git push`. Do *not* `sleep` waiting for the deploy; pull the next item from the wait-queue. Tracked as a P7 extension. - **Long-running test suites / build pipelines** — same shape; observer is whatever produces the completion event (CLI exit code, webhook, log line). - **External index / sync operations** — same shape, longer time horizons. **Invariant**: never `sleep` on a blocking wait. Every failure produces (a) a `state.jsonl` event, (b) a Linear ticket, or (c) both — silent state drops are forbidden (exit 99). Heal actions are scoped to files in PR diff (where applicable). All setpoints (`max_concurrent_prs`, `max_attempts`, `stability_floor`, `classified_failure_types`) live in `.control/policy.yaml` and fail closed if missing. -**Skill name**: `broomva/p9` — primitive number P9, skill name `p9`. They match. +**Skill name**: `broomva/p9` — historical name (was the 9th primitive when first crystallized; renaming would break every `npx skills add broomva/p9` install). 
Primitive number is now P7; skill repo name is stable at `p9`. The `broomva/p9` SKILL.md is the canonical implementation. -### P9 Reflexive Trigger Rule (binding on every agent) +### P7 Reflexive Trigger Rule (binding on every agent) -P9 is a reflex, not a request. Agents must apply *productive-wait discipline* without being prompted in any of these situations: +P7 is a reflex, not a request. Agents must apply *productive-wait discipline* without being prompted in any of these situations: 1. **Immediately after `git push` that opens or updates a PR** — invoke `p9 watch --background` within the same response, before any other tool call. The watcher must be running before the agent considers the push "done." 2. **After `git push` that triggers a non-PR deploy** (e.g., a push to `main` that fires a deploy hook) — p9 doesn't track non-PR triggers yet. Do *one* direct check on the deploy result after kicking off the next high-priority work; never `sleep` waiting for it. 3. **Whenever the agent is tempted to `sleep` while a blocking operation runs** — hard ban. Pull from `p9 wait-queue pop` instead. If the queue is empty, do non-code productive work (research adjacent entities, validate doc cross-refs) until the bg-task notification fires. 4. **When a watcher's bg-task notification reports red CI** — invoke `p9 heal --classify` *before* re-pushing a fix or asking the user. If classified, apply the heal command (PR-diff scope only) and start a new watch. If unclassified, escalate via Linear and surface the failure. -5. **When `p9 status` reports `MERGE_READY`** — invoke `p9 auto-merge ` rather than `gh pr merge` directly. The actuator consults `.control/policy.yaml`'s `auto_merge:` block, blocks governance-class paths automatically, and only auto-merges branch classes explicitly allowlisted. +5. **When `p9 status` reports `MERGE_READY`** — invoke `p9 auto-merge ` rather than `gh pr merge` directly. 
The actuator consults `.control/policy.yaml`'s `auto_merge:` block; per the gates-are-trust principle, governance-class paths auto-merge when L3 trust gates pass (no special-case bypass). Mental checklist before declaring wait-dependent work done: *What blocking operation am I waiting on? Is it a PR (use `p9 watch`) or a non-PR trigger (single direct check + drain queue)? Am I about to `sleep` or poll? Did I drain the wait-queue while waiting?* --- +## P8 — Skill Freshness Check + +**Closes**: silent rot of `npx skills add` snapshots. Skills don't auto-update; without a nudge they go stale and sessions hit `error: unrecognized arguments: --foo` from out-of-date binaries. + +**How**: `SessionStart` hook → `scripts/skill-freshness-hook.sh` checks the timestamp of `~/.config/broomva/p8/last-skill-update-check` (legacy `~/.config/broomva/p7/` still honored for in-place upgrades). If ≥ 7 days old (or never), prints a one-line nudge with refresh command + dismissal `touch`. Always exits 0. + +**Invariant**: hook always exits 0. `BROOMVA_P8_THRESHOLD_DAYS` env var configurable (default 7; legacy `BROOMVA_P7_THRESHOLD_DAYS` still honored). Dismissal: run `npx skills update -g` then `touch ~/.config/broomva/p8/last-skill-update-check`. + +--- + +## P9 — Branch + Worktree Janitor + +**Closes**: squash-merged branches and dead worktrees accumulate. `git branch --merged` doesn't catch squash-merges (the branch tip isn't an ancestor of main). + +**How**: `make janitor` (wraps `scripts/branch-janitor.sh`). Walks current repo (or all workspace repos with `--scope=workspace`). For each non-protected branch matching the include pattern (`feat/*,fix/*,chore/*,docs/*` by default): runs the canonical squash-merge detection — `git commit-tree -p ` produces a synthetic commit; `git cherry origin/main ` reports if its patch is in main. If yes, branch is mergeable. Worktrees whose underlying branch is gone get pruned via `git worktree remove --force`. 
+ +**Invariant**: default `--dry-run` — pass `--apply` to actually delete. Never touches main, master, develop, HEAD, gh-pages, or any branch in `~/.config/broomva/p9-janitor/protected.txt` (legacy `~/.config/broomva/p8-janitor/` still honored). Currently-checked-out branch always skipped. + +--- + ## P10 — Worktree Hygiene Discipline **Closes**: dirty trees, half-finished branches, orphan worktrees accumulating across sessions and becoming slow leaks of merge conflicts and "what was I doing?" amnesia. diff --git a/scripts/doctor.sh b/scripts/doctor.sh index 8ac1cdf..29551f5 100755 --- a/scripts/doctor.sh +++ b/scripts/doctor.sh @@ -9,18 +9,19 @@ # 1. CLAUDE.md primitives table has all P1-P20 rows + correct count # 2. AGENTS.md has each primitive section (### P1: through ### P20:) # 3. AGENTS.md has the binding reflexive trigger rules for primitives -# that require them (P6, P9, P10, P11, P12, P13, P14, P15, P16, P17, +# that require them (P6, P7, P10, P11, P12, P13, P14, P15, P16, P17, # P18, P19, P20 — primitives where the agent's reasoning enforces -# the policy, not a hook) +# the policy, not a hook; P7 = CI Watcher / Productive Wait, +# P8 = Skill Freshness hook, P9 = Janitor make-target) # 4. .control/policy.yaml has required blocks (ci_watch, ci_heal, auto_merge) # 5. .claude/settings.json hooks wire the expected primitive scripts # 6. Each primitive's mechanism is reachable on disk: # - P1: scripts/conversation-bridge-hook.sh # - P2: scripts/control-gate-hook.sh + .control/policy.yaml # - P6: skills/bookkeeping/scripts/bookkeeping.py -# - P7: scripts/skill-freshness-hook.sh -# - P8: scripts/branch-janitor.sh -# - P9: skills/p9/scripts/p9.py +# - P7: skills/p9/scripts/p9.py (Productive Wait — skill name historical) +# - P8: scripts/skill-freshness-hook.sh +# - P9: scripts/branch-janitor.sh # - P12: skills/persist/scripts/persist.py # 7. (continued — each primitive's mechanism) # 8. 
L3 trust gates (G-L3-1 + G-L3-2) pass via scripts/bstack-primitive-lint.py
@@ -117,9 +118,9 @@ declare -a P_NAMES=(
   "P4: PR Pipeline"
   "P5: Parallel Agent"
   "P6: Knowledge Bookkeeping"
-  "P7: Skill Freshness"
-  "P8: Branch + Worktree Janitor"
-  "P9: Productive Wait"
+  "P7: CI Watcher + Productive Wait"
+  "P8: Skill Freshness"
+  "P9: Branch + Worktree Janitor"
   "P10: Worktree Hygiene"
   "P11: Empirical Feedback Loop"
   "P12: Persistent Loop Discipline"
@@ -135,7 +136,11 @@ declare -a P_NAMES=(
 if [ -f "$AGENTS" ]; then
   for entry in "${P_NAMES[@]}"; do
     prefix="${entry%%:*}"  # e.g. "P1"
-    if grep -qE "^### $prefix:" "$AGENTS"; then
+    # Accept either:
+    #   ### P1: Title           (original bstack format)
+    #   ### P1 — Label: Title   (extended format with categorical label)
+    # Trailing word-boundary ensures P1 doesn't match P10/P11/P12.
+    if grep -qE "^### $prefix(:| —)" "$AGENTS"; then
       ok "section $prefix present"
     else
       gap "AGENTS.md missing '### $entry' section" \
@@ -148,12 +153,15 @@ fi
 section "4. AGENTS.md reflexive trigger rules"
 # Primitives whose discipline is enforced via agent reasoning rather than hooks.
 # These MUST contain a Reflexive Trigger Rule subsection.
-declare -a REFLEXIVE_PRIMS=(P6 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20)
+# Note: workspace canonical numbering — P7 = Productive Wait (reasoning-enforced),
+# P8 = Skill Freshness (hook-enforced), P9 = Janitor (mechanism-only).
+declare -a REFLEXIVE_PRIMS=(P6 P7 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20)
 if [ -f "$AGENTS" ]; then
   for prim in "${REFLEXIVE_PRIMS[@]}"; do
-    # Look for "P{n} is a reflex" OR "Reflexive Trigger Rule" in proximity to the prim section
+    # Look for "P{n} is a reflex" OR "Reflexive Trigger Rule" in proximity to the prim section.
+    # Accept either '### P{n}: Title' or '### P{n} — Label: Title' header format.
     if awk -v p="$prim" '
-      /^### / { in_sec = ($0 ~ "^### "p":") }
+      /^### / { in_sec = ($0 ~ "^### "p"(:| —)") }
       in_sec && /Reflexive Trigger Rule/ { found = 1 }
       in_sec && / is a reflex/ { found = 1 }
       END { exit (found ? 0 : 1) }
@@ -213,9 +221,9 @@ SCRIPT_PATHS=(
   "scripts/conversation-bridge-hook.sh"
   "scripts/control-gate-hook.sh"
   "skills/bookkeeping/scripts/bookkeeping.py"
+  "skills/p9/scripts/p9.py"
   "scripts/skill-freshness-hook.sh"
   "scripts/branch-janitor.sh"
-  "skills/p9/scripts/p9.py"
   "skills/persist/scripts/persist.py"
 )
 SCRIPT_LABELS=(P1 P2 P6 P7 P8 P9 P12)
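
Review aid (not part of the patch): the format-tolerant heading match introduced in `scripts/doctor.sh` can be exercised in isolation with a small sketch like the one below. It assumes only the `grep -qE "^### $prefix(:| —)"` pattern shown in the diff; the `has_section` helper and the sample headings are hypothetical names made up for illustration and do not exist in the repo. In this pattern, what keeps `P1` from matching `P10` appears to be the literal `:` or ` —` required directly after the prefix.

```shell
#!/usr/bin/env bash
# Sketch only: has_section is a hypothetical helper, not a function in doctor.sh.
# Returns 0 when FILE has a "### Pn:" or "### Pn —" heading for the given prefix.
has_section() {
  local prefix="$1" file="$2"
  grep -qE "^### ${prefix}(:| —)" "$file"
}

# Illustrative sample with one heading in each accepted format.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
### P7 — Label: Title
### P10: Title
EOF

# P1 is deliberately absent, to show it is not matched by the P10 heading.
for p in P1 P7 P10; do
  if has_section "$p" "$sample"; then
    echo "$p present"
  else
    echo "$p missing"
  fi
done
rm -f "$sample"
```

Run as written, this should print `P1 missing`, `P7 present`, `P10 present`: the labeled and unlabeled forms both match, and the `### P10:` heading is not mistaken for `P1`.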