Central-package /implement skill + amend GC-O007 gate model (#791)#792
Merged
Brad-Edwards merged 7 commits intodevfrom May 3, 2026
Conversation
…791) Lands the canonical implement skill at skills/implement/SKILL.md as the single source of truth, parameterized by per-repo .ground-control.yaml via gc_get_repo_ground_control_context. Removes the per-repo .claude/skills/implement/SKILL.md fork. bin/install-skills.sh distributes to ~/.claude/skills/ and ~/.codex/prompts/ so the same workflow is drivable from Claude Code or Codex against identical content. Schema additions to .ground-control.yaml (ADR-027): - docs: { adr_dir, architecture_overview, coding_standards, workflow_reference, knowledge_base } — all path-valued, validated with realpath containment to reject absolute paths and symlink escapes. - example_paths: { source, test } - requirements: { uid_examples } - cross_cutting_concerns: { description } - workflow.base_branch (was hardcoded 'dev' in the skill prose). gc_get_repo_ground_control_context now returns these blocks; the canonical SKILL.md renders against them via {cfg.X|default Y} placeholders. Gate-model amendment (ADR-029, GC-O007 statement updated): - Drop synchronous plan-approval as a human touchpoint. PR merge is the only synchronous human gate. Plan posted to GitHub issue as comment; workflow proceeds directly to TDD. - Review findings and decisions on findings (fix / wontfix / not-applicable, each with one-line rationale) recorded as comments on the issue thread. Agent silence on a finding is a process violation. defer is removed from the decision set. - Codex review loops hard-capped at 2 cycles (no soft-cap escape). - Codex remains the reviewer of record regardless of driver. - ADR-021 amended (not superseded). ADR-027 has the transport-not-bypass clause removed and points at ADR-029. - ADR-028 (Temporal boundary) plan-approval reference cleaned up to align with ADR-029. Out of scope, surfaced as follow-ups on the issue thread: - gc_codex_review refactor reviewer set (Step 12.5 ships as a gap) - review-tests migration to skills/review-tests/ - Per-repo migrations for shifter, pulsar, aptl - Codex prompts directory convention as it settles GC-O007 is already ACTIVE; statement amended via gc_update_requirement. GC-O009 stays DRAFT (Temporal not delivered); linked via DOCUMENTS. Both linked to issue #791 as DOCUMENTS already; reconciliation in Phase D will add the IMPLEMENTS links to the canonical SKILL.md and the schema additions for GC-O007 specifically. Closes #791.
Address six core findings + one security finding from gc_codex_review on PR #792: - Validate workflow.base_branch against a safe Git ref allowlist in normalizeWorkflowConfig. The value flows into shell-evaluated `gh` and `git` commands rendered by the implement skill, so without validation a hostile .ground-control.yaml could inject shell commands. - Replace hardcoded `dev`/`main` in /implement Step 16 with `{cfg.workflow.base_branch|default dev}` so non-`dev` repos reconcile traceability against the right base branch. - Reorder /implement Step 15: classify each in-scope requirement as materially-implemented vs forward-looking BEFORE transitioning, so forward-looking ones stay DRAFT instead of being prematurely promoted. - Remove /implement Step 12.5 entirely. It was a documented no-op pending tooling that may never ship; an unimplemented "required gate" is worse than no gate. Cross-cutting-concerns coverage already lives in Step 3 (existing-helper inventory) and Step 12 (codex review). - Migrate review-tests to skills/review-tests/SKILL.md so /implement Step 13 is agent-neutral. bin/install-skills.sh ships the same content to Claude Code and Codex; the skill's diff command now reads workflow.base_branch. - Remove `defer` from review-finding decision vocabulary in the implement skill, the review-tests skill, and the CHANGELOG narrative. Workflow contract is "fix every finding before PR is ready"; defer was an explicit bypass. - Update docs/DEVELOPMENT_WORKFLOW.md Mermaid diagram and User Touchpoints section to match ADR-029. S5 is now a non-gating "post plan as issue comment" node, the approval edge is unconditional, and the prior hedge line is gone. Tests added in lib.test.js cover the base_branch allowlist (safe inputs accepted; shell-injection and unsafe-ref shapes rejected). Full unit suite: 124/124 pass. pre-commit: all hooks green under Java 21.
Per the corrected hard-cap-2 reading (cycle 2 findings get fixed; cycle 3 is the forbidden verification pass), apply the four fix-disposition findings from cycle 2 of gc_codex_review on PR #792. The two not-fixed-here findings have recorded decisions on the issue thread (#791 comment): one filed as follow-up #795, one recorded as intentional ("not-applicable"). - ADR-029 + GC-O007 statement: remove `defer` from the decision vocabulary. The workflow's contract is "fix every finding before PR is ready"; defer was an explicit bypass that contradicted the rest of the document. GC-O007 statement updated via gc_update_requirement with full change history. - skills/implement/SKILL.md Step 15: broaden "materially implemented" to cover artifacts-of-record beyond Java code — schema, configuration, ADRs, workflow definitions, skill prose, etc. Workflow / docs / config requirements should not be misclassified as forward-looking just because they don't ship Java. - skills/implement/SKILL.md Step 12 cap: rewrite the hard-cap-2 prose so the semantics are unambiguous. Cycle 2 findings ARE fixed; the cap forbids running cycle 3 as a verification pass — residual concerns escalate to the user as an issue-thread comment, not into a third review cycle. References #794 for the durable tool-layer enforcement. - docs/DEVELOPMENT_WORKFLOW.md Mermaid diagram: swap S17 and S18 so the diagram matches SKILL.md Step 15→16 (transition to ACTIVE first, then reconcile traceability). Reconciling first against a still-DRAFT requirement silently fails because the API enforces IMPLEMENTS-only-on-ACTIVE. Drift edge updated; prose updated. - skills/review-tests/SKILL.md: resolve the base_branch placeholder itself via gc_get_repo_ground_control_context rather than expecting the caller to resolve `{cfg.workflow.base_branch|default dev}`. The skill is invoked standalone (Skill tool, Codex prompts), so it has to be self-contained. Companion issues filed: - #793 — route codex review GitHub posting through MCP, not sandbox. - #794 (P0) — enforce caps and ordering at the tool layer. - #795 — add cross-cutting/refactor reviewer set to gc_codex_review. 124/124 unit tests pass. pre-commit clean (Java 21).
The hard-cap-2 contract from GC-O007 / ADR-029 was previously prose-only in skills/implement/SKILL.md. Agents routinely rationalized past it, which is the bug class #794 was filed to address. This change moves enforcement onto the trusted side (the MCP server), where the agent cannot rationalize past it. How it works: - Before each gc_codex_review invocation (post-push, prNumber bound), the MCP server reads the PR's issue-comments and counts existing cycle markers via the new pure helper `parseCodexReviewCycleMarkers`. - `evaluateCodexReviewCycleCap` decides allow vs refuse. The cap value (2) lives in the new exported constant `CODEX_REVIEW_HARD_CAP`. - If the cap is reached, the tool returns a structured refusal: `{ok: false, error: "codex_review_cap_reached", message, prior_cycles, cap}`. No codex run is consumed. - After a successful invocation, the tool posts a machine-readable cycle marker to the PR via `buildCodexReviewCycleMarker` (`<!-- gc:codex-review-cycle ... -->`). Marker-post failures are non-fatal; the worst case is a one-off count miscount that the next call's cap-evaluator handles. - Successful returns surface `cycle` and `cap` so the agent sees its position ("cycle 1 of 2", etc.) and self-paces. Persistence: PR issue-comments authored by whoever the MCP server's `gh` CLI authenticates as. No new database, no local state file. The durable record lives on the issue thread per ADR-029. Restart-safe by construction. Scope: this MVP enforces the cap on post-push reviews only. Pre-push uncommitted reviews (Step 6.5) have a separate cycle limit (5) and no PR yet; that's a follow-up MVP. Tests: 14 new unit tests in lib.test.js covering the marker parser (matching markers, ignoring other PRs, malformed markers, non-string input), the cap evaluator (allow at 0/1, refuse at 2/7, hardCap override, defensive throws), and the marker builder (round-trips through the parser, includes attribution to #794, scopes by PR). 138/138 unit tests pass. pre-commit clean (Java 21). Updates SKILL.md Step 12 to drop the "currently advisory" hedge — the cap is now mechanically enforced. CHANGELOG entry under Added. Refs: #794 (P0 enforcement issue), GC-O007, ADR-029.
Two related changes that target a specific failure mode the user
flagged: agents pause at cycle 2 to ask whether to fix findings,
instead of fixing them and then escalating with a summary so the user
can decide on cycle 3.
Skill prose:
- Step 12 cap section rewritten. The new flow is explicit: cycle 1
fix-without-asking-then-push; cycle 2 fix-without-asking, post a
summary to the issue thread, escalate to the user with the summary
and the cycle-3-vs-ship question. Cycle 3 is the user's call, not
the agent's. Stopping at a finding to ask "should I fix this?" is
itself a process violation.
- Removes the prior "cycle 3 is forbidden" framing — replaced with
"cycle 3 requires user authorization."
Mechanical:
- Added `next_action` field to gc_codex_review results.
- Cycle 1 result: `next_action: "fix_all_findings_and_push"`
- Cycle 2 result: `next_action:
"fix_all_findings_then_summarize_and_escalate"`
- Cap refusal: `next_action:
"post_summary_and_escalate_to_user"`
This nudges the agent toward the right discipline at each cycle so
the prose isn't carrying it alone.
- Added `override_cap` (boolean) + `override_reason` (string) params
to gc_codex_review. The agent cannot self-authorize: override_cap=
true requires a non-empty override_reason quoting the user's
authorization (captured in the conversation), which is recorded on
the cycle marker for audit. Override cycles are distinguishable
from regular ones via `override="true"` and `reason="..."` attrs.
- Cap-marker regex relaxed to allow optional attrs after pr= so
override markers parse the same way as regular ones.
Tests added in lib.test.js cover next_action on each cycle, override
happy-path, override missing-reason rejection, override mid-cap (not
just past-cap), override-marker round-trip through the parser, and
reasons containing embedded quotes (JSON-escaped). 143/143 pass.
The MCP tool registration (index.js) now exposes the override params
to MCP callers.
Refs: #794 (P0 enforcement issue), GC-O007, ADR-029.
Adds the second leg of #794's enforcement architecture: ordering checks in MCP tools, not in skill prose. Mechanism: - Workflow phases are recorded as machine-readable markers on the GitHub issue thread (`<!-- gc:phase phase="<name>" issue="<N>" -->`), same template family as the cycle markers from MVP-1. The issue thread is the durable record per ADR-029; no new database, no local state file. - gc_codex_architecture_preflight writes a `preflight` marker on success when issue_number is set. The preflight call already receives an issue_number (the /implement skill always resolves to one in Step 1), so the marker is reliably present after a legitimate preflight run. - New gc_post_implementation_plan MCP tool. Refuses to post the plan unless the issue has a `preflight` marker. On success, posts a single combined comment carrying both the `plan` phase marker and the agent's plan body, and returns the comment URL/id. - Override is available with `override=true` + `override_reason` for cases where the user explicitly authorizes skipping preflight (e.g., one-line bug fixes). Reason is required and logged on the marker for audit, same pattern as the cycle-cap override. This closes the specific bug the user flagged: "agents trying to defer codex preflight until after planning, but that's absolutely backwards." The MCP server now refuses to accept the plan until preflight has run; the agent can no longer reorder by rationalizing past the prose. Skill prose: - Step 4 plan-posting flow updated: agent calls gc_post_implementation_plan with repo_path, issue_number, plan_body. Replaces the previous `gh issue comment --body-file` pattern. The skill now warns that working around the gate via direct `gh issue comment` is a process violation. Tests added: 14 new unit tests in lib.test.js for the phase-marker parser (matching, ignoring other issues, duplicates, non-string input), the prerequisite evaluator (allow when prerequisites met, refuse when missing, multi-prerequisite reporting, empty-requires allow-all), and the marker builder (round-trip through parser, two phases on one issue, isolation across issues, attribution to #794). 157/157 unit tests pass. pre-commit clean. Refs: #794 (P0 enforcement issue, MVP-2 of 2), GC-O007.
…#794) Two more enforcement extensions on top of MVP-1 (cycle cap) and MVP-2 (preflight-before-planning). 1) Plan-before-review ordering gate on gc_codex_review. Post-push reviews look up the PR's closing-issue refs (via `gh pr view --json closingIssuesReferences`) and refuse unless at least one of those issues carries a `plan` phase marker. Closes the same ordering hole MVP-2 closes for preflight→plan, but one step further down the workflow (plan→review). Same override pattern as preflight: `override_phase_gate=true` + `override_phase_reason` for trivial cases the user explicitly authorizes. PRs that close no issues skip the gate (legitimate for some refactor/chore PRs). 2) Per-finding hard-cap-2 on gc_codex_verify_finding. Same failure mode as the cycle cap, scoped per (PR, comment_id): agents can rationalize "just one more verify pass" indefinitely. Markers are posted to the PR after each verify call (same template family as cycle markers, distinguished by prefix and key shape). The tool refuses cycle 3+ unless overrideCap=true with a non-empty overrideReason quoting the user's authorization. Successful returns surface cycle, cap, next_action, override, override_reason. Three new pure-function exports back this: `parseCodexVerifyCycleMarkers`, `evaluateCodexVerifyCycleCap`, `buildCodexVerifyCycleMarker`. Together with #794 MVP-1 and MVP-2, the full enforced ordering is now: preflight → plan → review (cycle 1, cycle 2 with hard cap) → verify (per-finding hard cap). Each step is gated server-side; agents cannot skip or reorder by rationalizing past prose. Tests: 10 new in lib.test.js (verify-cycle parser, verify-cycle cap evaluator with override path, verify-cycle marker round-trip with override). 167/167 unit tests pass. pre-commit clean. SKILL.md updated to document the plan-before-review gate and the verify per-finding override. Refs: #794 (P0 enforcement issue, extension #1 of N), GC-O007.
|
18 tasks
This was referenced May 4, 2026
Open
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.



Summary
skills/implement/SKILL.mdas one source of truth for the/implementworkflow, parameterized by.ground-control.yamlviagc_get_repo_ground_control_context. Removes the per-repo.claude/skills/implement/SKILL.mdfork.bin/install-skills.shdistributesskills/*to~/.claude/skills/(Claude Code) and~/.codex/prompts/(Codex), so the same workflow is drivable from either runtime.Requirement UIDs
ADR Impact
What changed
.ground-control.yamlschema additionsAll optional, backwards compatible. Path-valued fields (
docs.*) validated withresolveRepoRelativePath+assertRealpathInRepoto reject absolute paths and symlink escapes — same pattern as the existing knowledge resolver.MCP server (
mcp/ground-control/lib.js)parseGroundControlYamlandgetRepoGroundControlContextextended to accept and return the new blocks. Six new normalize functions following the existingnormalize*Configpattern. 122 tests, all passing.Canonical skill (
skills/implement/SKILL.md)EnterPlanModeremoved; plans posted to issue viagh issue comment, workflow proceeds to TDD without synchronous user-approval wait.fix/wontfix/not-applicable, each with one-line rationale) recorded as issue comments.deferremoved from the decision set.{cfg.X|default Y}placeholders for repo-specific paths and conventions.gc_codex_reviewadding a refactor reviewer set; direct codex / subagent fallbacks removed per ADR-027's reviewer-of-record boundary.review-testsis currently Claude-Code-only; Codex driver migration tracked as follow-up.Distribution
bin/install-skills.shsymlinks (or copies with--copy) everyskills/*/to~/.claude/skills/andskills/*/SKILL.mdto~/.codex/prompts/*.md.--dry-runis side-effect-free;--no-codexskips the Codex install target.GC-O007 statement amendment
Updated via
gc_update_requirement. The prior statement is preserved ingc_get_requirement_history. New statement: one human touchpoint (PR merge); issue-thread durable record; codex review hard-cap-2; codex-as-reviewer-of-record invariant.Out of scope (explicit follow-ons, surfaced on issue thread)
gc_codex_reviewrefactor reviewer set (would activate Step 12.5).review-testsmigration from.claude/skills/review-tests/toskills/review-tests/for Codex driver parity.Ground Control Checks
make policypassesgc_evaluate_quality_gatespasses or is unchanged by this repo-only changegc_run_sweepreviewed or intentionally deferred with reasonThis is a tooling/workflow PR. Sweep is unaffected; deferred with reason.
Test plan
npm testinmcp/ground-control/— 122 tests, 0 fail. New tests cover the 4 new normalize functions and the path-validation rejection cases.make policy— all checks pass.make check— backend untouched, completion gate stays green.bin/install-skills.sh --dry-run— verified side-effect-free; correctly enumerates targets.gc_codex_review --uncommitted=true— 2 cycles ran; surgical fixes applied; remaining open items recorded as decisions on the issue thread per ADR-029.Traceability
skills/implement/SKILL.md(canonical workflow contract)mcp/ground-control/lib.js(schema additions and MCP tool wiring)architecture/adrs/027-agent-neutral-implement-workflow-packaging.mdarchitecture/adrs/029-issue-thread-gate-model.mdbin/install-skills.sh.ground-control.yamlarchitecture/adrs/028-temporal-workflow-orchestration-boundary.md(forward-looking; Temporal not delivered)architecture/adrs/027-agent-neutral-implement-workflow-packaging.md(preserves the configuration model GC-O009 will consume)mcp/ground-control/lib.test.js(schema additions tested, including path-validation rejections)Closes #791