Central-package /implement skill + amend GC-O007 gate model (#791) by Brad-Edwards · Pull Request #792 · KeplerOps/Ground-Control

Brad-Edwards · 2026-05-03T18:58:11Z

Summary

Lands canonical skills/implement/SKILL.md as one source of truth for the /implement workflow, parameterized by .ground-control.yaml via gc_get_repo_ground_control_context. Removes the per-repo .claude/skills/implement/SKILL.md fork.
bin/install-skills.sh distributes skills/* to ~/.claude/skills/ (Claude Code) and ~/.codex/prompts/ (Codex), so the same workflow is drivable from either runtime.
Amends GC-O007 (statement updated; ADR-029 added) to drop synchronous plan-approval as a human touchpoint. PR merge is the only synchronous human gate. Plan, review findings, and finding-decisions live on the GitHub issue thread.

Requirement UIDs

GC-O007 — Gated Agentic Development Loop (statement amended in this PR)
GC-O009 — Workflow Orchestration via Temporal (durable end state; this PR is the interim packaging that feeds into it)

ADR Impact

ADR-027 added — Agent-Neutral Implement Workflow Packaging. Schema additions, canonical-source layout, codex-as-reviewer-of-record invariant, hard-cap-2 codex review.
ADR-028 added — Temporal Workflow Orchestration Boundary (forward-looking for GC-O009).
ADR-029 added — Issue-Thread Gate Model. Amends ADR-021's two-touchpoint clause to one (PR merge). Plan, review findings, decisions on findings live on the issue thread.
ADR-021 carries an inline amendment note pointing at ADR-029 (preserved structure, single-touchpoint contract).

What changed

`.ground-control.yaml` schema additions

docs: { adr_dir, architecture_overview, coding_standards, workflow_reference, knowledge_base }
example_paths: { source, test }
requirements: { uid_examples }
cross_cutting_concerns: { description }
workflow: { ..., base_branch }   # was hardcoded 'dev' in the skill

All optional, backwards compatible. Path-valued fields (docs.*) validated with resolveRepoRelativePath + assertRealpathInRepo to reject absolute paths and symlink escapes — same pattern as the existing knowledge resolver.

MCP server (`mcp/ground-control/lib.js`)

parseGroundControlYaml and getRepoGroundControlContext extended to accept and return the new blocks. Six new normalize functions following the existing normalize*Config pattern. 122 tests, all passing.

Canonical skill (`skills/implement/SKILL.md`)

EnterPlanMode removed; plans posted to issue via gh issue comment, workflow proceeds to TDD without synchronous user-approval wait.
Review-finding decisions (fix / wontfix / not-applicable, each with one-line rationale) recorded as issue comments. defer removed from the decision set.
Codex review loops hard-capped at 2 cycles (no soft-cap escape).
{cfg.X|default Y} placeholders for repo-specific paths and conventions.
Step 12.5 (refactor & cross-cutting concerns) ships as a gap pending gc_codex_review adding a refactor reviewer set; direct codex / subagent fallbacks removed per ADR-027's reviewer-of-record boundary.
Step 13 (test-quality review) notes that review-tests is currently Claude-Code-only; Codex driver migration tracked as follow-up.

Distribution

bin/install-skills.sh symlinks (or copies with --copy) every skills/*/ to ~/.claude/skills/ and skills/*/SKILL.md to ~/.codex/prompts/*.md. --dry-run is side-effect-free; --no-codex skips the Codex install target.

GC-O007 statement amendment

Updated via gc_update_requirement. The prior statement is preserved in gc_get_requirement_history. New statement: one human touchpoint (PR merge); issue-thread durable record; codex review hard-cap-2; codex-as-reviewer-of-record invariant.

Out of scope (explicit follow-ons, surfaced on issue thread)

gc_codex_review refactor reviewer set (would activate Step 12.5).
review-tests migration from .claude/skills/review-tests/ to skills/review-tests/ for Codex driver parity.
Per-repo migrations for shifter, pulsar, aptl (one PR each).
Codex prompts-directory convention as it settles.

Ground Control Checks

make policy passes
gc_evaluate_quality_gates passes or is unchanged by this repo-only change
gc_run_sweep reviewed or intentionally deferred with reason

This is a tooling/workflow PR. Sweep is unaffected; deferred with reason.

Test plan

npm test in mcp/ground-control/ — 122 tests, 0 fail. New tests cover the 4 new normalize functions and the path-validation rejection cases.
make policy — all checks pass.
make check — backend untouched, completion gate stays green.
bin/install-skills.sh --dry-run — verified side-effect-free; correctly enumerates targets.
Pre-push gc_codex_review --uncommitted=true — 2 cycles ran; surgical fixes applied; remaining open items recorded as decisions on the issue thread per ADR-029.
CI green (this PR's run will tell).
SonarCloud green.

Traceability

IMPLEMENTS: GC-O007 → skills/implement/SKILL.md (canonical workflow contract)
IMPLEMENTS: GC-O007 → mcp/ground-control/lib.js (schema additions and MCP tool wiring)
IMPLEMENTS: GC-O007 → architecture/adrs/027-agent-neutral-implement-workflow-packaging.md
IMPLEMENTS: GC-O007 → architecture/adrs/029-issue-thread-gate-model.md
IMPLEMENTS: GC-O007 → bin/install-skills.sh
IMPLEMENTS: GC-O007 → .ground-control.yaml
DOCUMENTS: GC-O009 → architecture/adrs/028-temporal-workflow-orchestration-boundary.md (forward-looking; Temporal not delivered)
DOCUMENTS: GC-O009 → architecture/adrs/027-agent-neutral-implement-workflow-packaging.md (preserves the configuration model GC-O009 will consume)
TESTS: GC-O007 → mcp/ground-control/lib.test.js (schema additions tested, including path-validation rejections)

Closes #791

…791) Lands the canonical implement skill at skills/implement/SKILL.md as the single source of truth, parameterized by per-repo .ground-control.yaml via gc_get_repo_ground_control_context. Removes the per-repo .claude/skills/implement/SKILL.md fork. bin/install-skills.sh distributes to ~/.claude/skills/ and ~/.codex/prompts/ so the same workflow is drivable from Claude Code or Codex against identical content. Schema additions to .ground-control.yaml (ADR-027): - docs: { adr_dir, architecture_overview, coding_standards, workflow_reference, knowledge_base } — all path-valued, validated with realpath containment to reject absolute paths and symlink escapes. - example_paths: { source, test } - requirements: { uid_examples } - cross_cutting_concerns: { description } - workflow.base_branch (was hardcoded 'dev' in the skill prose). gc_get_repo_ground_control_context now returns these blocks; the canonical SKILL.md renders against them via {cfg.X|default Y} placeholders. Gate-model amendment (ADR-029, GC-O007 statement updated): - Drop synchronous plan-approval as a human touchpoint. PR merge is the only synchronous human gate. Plan posted to GitHub issue as comment; workflow proceeds directly to TDD. - Review findings and decisions on findings (fix / wontfix / not-applicable, each with one-line rationale) recorded as comments on the issue thread. Agent silence on a finding is a process violation. defer is removed from the decision set. - Codex review loops hard-capped at 2 cycles (no soft-cap escape). - Codex remains the reviewer of record regardless of driver. - ADR-021 amended (not superseded). ADR-027 has the transport-not-bypass clause removed and points at ADR-029. - ADR-028 (Temporal boundary) plan-approval reference cleaned up to align with ADR-029. Out of scope, surfaced as follow-ups on the issue thread: - gc_codex_review refactor reviewer set (Step 12.5 ships as a gap) - review-tests migration to skills/review-tests/ - Per-repo migrations for shifter, pulsar, aptl - Codex prompts directory convention as it settles GC-O007 is already ACTIVE; statement amended via gc_update_requirement. GC-O009 stays DRAFT (Temporal not delivered); linked via DOCUMENTS. Both linked to issue #791 as DOCUMENTS already; reconciliation in Phase D will add the IMPLEMENTS links to the canonical SKILL.md and the schema additions for GC-O007 specifically. Closes #791.

Address six core findings + one security finding from gc_codex_review on PR #792: - Validate workflow.base_branch against a safe Git ref allowlist in normalizeWorkflowConfig. The value flows into shell-evaluated `gh` and `git` commands rendered by the implement skill, so without validation a hostile .ground-control.yaml could inject shell commands. - Replace hardcoded `dev`/`main` in /implement Step 16 with `{cfg.workflow.base_branch|default dev}` so non-`dev` repos reconcile traceability against the right base branch. - Reorder /implement Step 15: classify each in-scope requirement as materially-implemented vs forward-looking BEFORE transitioning, so forward-looking ones stay DRAFT instead of being prematurely promoted. - Remove /implement Step 12.5 entirely. It was a documented no-op pending tooling that may never ship; an unimplemented "required gate" is worse than no gate. Cross-cutting-concerns coverage already lives in Step 3 (existing-helper inventory) and Step 12 (codex review). - Migrate review-tests to skills/review-tests/SKILL.md so /implement Step 13 is agent-neutral. bin/install-skills.sh ships the same content to Claude Code and Codex; the skill's diff command now reads workflow.base_branch. - Remove `defer` from review-finding decision vocabulary in the implement skill, the review-tests skill, and the CHANGELOG narrative. Workflow contract is "fix every finding before PR is ready"; defer was an explicit bypass. - Update docs/DEVELOPMENT_WORKFLOW.md Mermaid diagram and User Touchpoints section to match ADR-029. S5 is now a non-gating "post plan as issue comment" node, the approval edge is unconditional, and the prior hedge line is gone. Tests added in lib.test.js cover the base_branch allowlist (safe inputs accepted; shell-injection and unsafe-ref shapes rejected). Full unit suite: 124/124 pass. pre-commit: all hooks green under Java 21.

Per the corrected hard-cap-2 reading (cycle 2 findings get fixed; cycle 3 is the forbidden verification pass), apply the four fix-disposition findings from cycle 2 of gc_codex_review on PR #792. The two not-fixed-here findings have recorded decisions on the issue thread (#791 comment): one filed as follow-up #795, one recorded as intentional ("not-applicable"). - ADR-029 + GC-O007 statement: remove `defer` from the decision vocabulary. The workflow's contract is "fix every finding before PR is ready"; defer was an explicit bypass that contradicted the rest of the document. GC-O007 statement updated via gc_update_requirement with full change history. - skills/implement/SKILL.md Step 15: broaden "materially implemented" to cover artifacts-of-record beyond Java code — schema, configuration, ADRs, workflow definitions, skill prose, etc. Workflow / docs / config requirements should not be misclassified as forward-looking just because they don't ship Java. - skills/implement/SKILL.md Step 12 cap: rewrite the hard-cap-2 prose so the semantics are unambiguous. Cycle 2 findings ARE fixed; the cap forbids running cycle 3 as a verification pass — residual concerns escalate to the user as an issue-thread comment, not into a third review cycle. References #794 for the durable tool-layer enforcement. - docs/DEVELOPMENT_WORKFLOW.md Mermaid diagram: swap S17 and S18 so the diagram matches SKILL.md Step 15→16 (transition to ACTIVE first, then reconcile traceability). Reconciling first against a still-DRAFT requirement silently fails because the API enforces IMPLEMENTS-only-on-ACTIVE. Drift edge updated; prose updated. - skills/review-tests/SKILL.md: resolve the base_branch placeholder itself via gc_get_repo_ground_control_context rather than expecting the caller to resolve `{cfg.workflow.base_branch|default dev}`. The skill is invoked standalone (Skill tool, Codex prompts), so it has to be self-contained. Companion issues filed: - #793 — route codex review GitHub posting through MCP, not sandbox. - #794 (P0) — enforce caps and ordering at the tool layer. - #795 — add cross-cutting/refactor reviewer set to gc_codex_review. 124/124 unit tests pass. pre-commit clean (Java 21).

The hard-cap-2 contract from GC-O007 / ADR-029 was previously prose-only in skills/implement/SKILL.md. Agents routinely rationalized past it, which is the bug class #794 was filed to address. This change moves enforcement onto the trusted side (the MCP server), where the agent cannot rationalize past it. How it works: - Before each gc_codex_review invocation (post-push, prNumber bound), the MCP server reads the PR's issue-comments and counts existing cycle markers via the new pure helper `parseCodexReviewCycleMarkers`. - `evaluateCodexReviewCycleCap` decides allow vs refuse. The cap value (2) lives in the new exported constant `CODEX_REVIEW_HARD_CAP`. - If the cap is reached, the tool returns a structured refusal: `{ok: false, error: "codex_review_cap_reached", message, prior_cycles, cap}`. No codex run is consumed. - After a successful invocation, the tool posts a machine-readable cycle marker to the PR via `buildCodexReviewCycleMarker` (``). Marker-post failures are non-fatal; the worst case is a one-off count miscount that the next call's cap-evaluator handles. - Successful returns surface `cycle` and `cap` so the agent sees its position ("cycle 1 of 2", etc.) and self-paces. Persistence: PR issue-comments authored by whoever the MCP server's `gh` CLI authenticates as. No new database, no local state file. The durable record lives on the issue thread per ADR-029. Restart-safe by construction. Scope: this MVP enforces the cap on post-push reviews only. Pre-push uncommitted reviews (Step 6.5) have a separate cycle limit (5) and no PR yet; that's a follow-up MVP. Tests: 14 new unit tests in lib.test.js covering the marker parser (matching markers, ignoring other PRs, malformed markers, non-string input), the cap evaluator (allow at 0/1, refuse at 2/7, hardCap override, defensive throws), and the marker builder (round-trips through the parser, includes attribution to #794, scopes by PR). 138/138 unit tests pass. pre-commit clean (Java 21). Updates SKILL.md Step 12 to drop the "currently advisory" hedge — the cap is now mechanically enforced. CHANGELOG entry under Added. Refs: #794 (P0 enforcement issue), GC-O007, ADR-029.

Two related changes that target a specific failure mode the user flagged: agents pause at cycle 2 to ask whether to fix findings, instead of fixing them and then escalating with a summary so the user can decide on cycle 3. Skill prose: - Step 12 cap section rewritten. The new flow is explicit: cycle 1 fix-without-asking-then-push; cycle 2 fix-without-asking, post a summary to the issue thread, escalate to the user with the summary and the cycle-3-vs-ship question. Cycle 3 is the user's call, not the agent's. Stopping at a finding to ask "should I fix this?" is itself a process violation. - Removes the prior "cycle 3 is forbidden" framing — replaced with "cycle 3 requires user authorization." Mechanical: - Added `next_action` field to gc_codex_review results. - Cycle 1 result: `next_action: "fix_all_findings_and_push"` - Cycle 2 result: `next_action: "fix_all_findings_then_summarize_and_escalate"` - Cap refusal: `next_action: "post_summary_and_escalate_to_user"` This nudges the agent toward the right discipline at each cycle so the prose isn't carrying it alone. - Added `override_cap` (boolean) + `override_reason` (string) params to gc_codex_review. The agent cannot self-authorize: override_cap= true requires a non-empty override_reason quoting the user's authorization (captured in the conversation), which is recorded on the cycle marker for audit. Override cycles are distinguishable from regular ones via `override="true"` and `reason="..."` attrs. - Cap-marker regex relaxed to allow optional attrs after pr= so override markers parse the same way as regular ones. Tests added in lib.test.js cover next_action on each cycle, override happy-path, override missing-reason rejection, override mid-cap (not just past-cap), override-marker round-trip through the parser, and reasons containing embedded quotes (JSON-escaped). 143/143 pass. The MCP tool registration (index.js) now exposes the override params to MCP callers. Refs: #794 (P0 enforcement issue), GC-O007, ADR-029.

Adds the second leg of #794's enforcement architecture: ordering checks in MCP tools, not in skill prose. Mechanism: - Workflow phases are recorded as machine-readable markers on the GitHub issue thread (``), same template family as the cycle markers from MVP-1. The issue thread is the durable record per ADR-029; no new database, no local state file. - gc_codex_architecture_preflight writes a `preflight` marker on success when issue_number is set. The preflight call already receives an issue_number (the /implement skill always resolves to one in Step 1), so the marker is reliably present after a legitimate preflight run. - New gc_post_implementation_plan MCP tool. Refuses to post the plan unless the issue has a `preflight` marker. On success, posts a single combined comment carrying both the `plan` phase marker and the agent's plan body, and returns the comment URL/id. - Override is available with `override=true` + `override_reason` for cases where the user explicitly authorizes skipping preflight (e.g., one-line bug fixes). Reason is required and logged on the marker for audit, same pattern as the cycle-cap override. This closes the specific bug the user flagged: "agents trying to defer codex preflight until after planning, but that's absolutely backwards." The MCP server now refuses to accept the plan until preflight has run; the agent can no longer reorder by rationalizing past the prose. Skill prose: - Step 4 plan-posting flow updated: agent calls gc_post_implementation_plan with repo_path, issue_number, plan_body. Replaces the previous `gh issue comment --body-file` pattern. The skill now warns that working around the gate via direct `gh issue comment` is a process violation. Tests added: 14 new unit tests in lib.test.js for the phase-marker parser (matching, ignoring other issues, duplicates, non-string input), the prerequisite evaluator (allow when prerequisites met, refuse when missing, multi-prerequisite reporting, empty-requires allow-all), and the marker builder (round-trip through parser, two phases on one issue, isolation across issues, attribution to #794). 157/157 unit tests pass. pre-commit clean. Refs: #794 (P0 enforcement issue, MVP-2 of 2), GC-O007.

…#794) Two more enforcement extensions on top of MVP-1 (cycle cap) and MVP-2 (preflight-before-planning). 1) Plan-before-review ordering gate on gc_codex_review. Post-push reviews look up the PR's closing-issue refs (via `gh pr view --json closingIssuesReferences`) and refuse unless at least one of those issues carries a `plan` phase marker. Closes the same ordering hole MVP-2 closes for preflight→plan, but one step further down the workflow (plan→review). Same override pattern as preflight: `override_phase_gate=true` + `override_phase_reason` for trivial cases the user explicitly authorizes. PRs that close no issues skip the gate (legitimate for some refactor/chore PRs). 2) Per-finding hard-cap-2 on gc_codex_verify_finding. Same failure mode as the cycle cap, scoped per (PR, comment_id): agents can rationalize "just one more verify pass" indefinitely. Markers are posted to the PR after each verify call (same template family as cycle markers, distinguished by prefix and key shape). The tool refuses cycle 3+ unless overrideCap=true with a non-empty overrideReason quoting the user's authorization. Successful returns surface cycle, cap, next_action, override, override_reason. Three new pure-function exports back this: `parseCodexVerifyCycleMarkers`, `evaluateCodexVerifyCycleCap`, `buildCodexVerifyCycleMarker`. Together with #794 MVP-1 and MVP-2, the full enforced ordering is now: preflight → plan → review (cycle 1, cycle 2 with hard cap) → verify (per-finding hard cap). Each step is gated server-side; agents cannot skip or reorder by rationalizing past prose. Tests: 10 new in lib.test.js (verify-cycle parser, verify-cycle cap evaluator with override path, verify-cycle marker round-trip with override). 167/167 unit tests pass. pre-commit clean. SKILL.md updated to document the plan-before-review gate and the verify per-finding override. Refs: #794 (P0 enforcement issue, extension #1 of N), GC-O007.

sonarqubecloud · 2026-05-03T23:43:57Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Brad-Edwards added 7 commits May 3, 2026 20:57

Brad-Edwards mentioned this pull request May 3, 2026

Central-package /implement skill as agent-neutral (Claude Code + Codex) #791

Open

18 tasks

Brad-Edwards merged commit 6956f55 into dev May 3, 2026
11 checks passed

Brad-Edwards deleted the 791-central-package-implement-skill-as-agent-neutral-claude-code-+-codex branch May 3, 2026 23:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Central-package /implement skill + amend GC-O007 gate model (#791)#792

Central-package /implement skill + amend GC-O007 gate model (#791)#792
Brad-Edwards merged 7 commits intodevfrom
791-central-package-implement-skill-as-agent-neutral-claude-code-+-codex

Brad-Edwards commented May 3, 2026

Uh oh!

sonarqubecloud Bot commented May 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Brad-Edwards commented May 3, 2026

Summary

Requirement UIDs

ADR Impact

What changed

.ground-control.yaml schema additions

MCP server (mcp/ground-control/lib.js)

Canonical skill (skills/implement/SKILL.md)

Distribution

GC-O007 statement amendment

Out of scope (explicit follow-ons, surfaced on issue thread)

Ground Control Checks

Test plan

Traceability

Uh oh!

sonarqubecloud Bot commented May 3, 2026

Quality Gate passed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

`.ground-control.yaml` schema additions

MCP server (`mcp/ground-control/lib.js`)

Canonical skill (`skills/implement/SKILL.md`)