Problem
The /implement workflow's caps and step ordering are encoded as prose in skills/implement/SKILL.md. The LLM driver reads that prose and chooses how (and whether) to follow it. Observed in practice:
- Caps are routinely exceeded. "Hard cap: 2 iterations of gc_codex_review" reads as a strong constraint to a human, but an LLM mid-task can rationalize "just one more cycle", and does.
- Steps are reordered. Codex architecture preflight (Step 2.5) is supposed to run before planning, so its guardrails inform the plan. Agents repeatedly defer it to after planning. The skill says "this MUST happen first," but reading prose is not enforcement.
- Steps are skipped. Anything advisory can be silently dropped. Without a structural check, "did the agent run preflight" is unverifiable except by post-hoc audit.
This is not an agent-misbehavior bug — it is a contract-design bug. A workflow contract that lives only in markdown the LLM is asked to read is advisory, not enforced. Even ADR-029 calls out that "agent silence on a finding is a process violation" — but there is no mechanism that prevents agent silence today.
Root cause
Enforcement lives on the wrong side of the trust boundary. The agent is unprivileged (it should not be trusted with workflow-contract decisions), but the workflow contract is encoded as prose addressed to the agent. The MCP server is privileged (host creds, GitHub credentials, Ground Control state) but does not enforce ordering or caps — it just exposes tools the agent calls in any order it pleases.
Compare to gc_create_traceability_link IMPLEMENTS, which already refuses to run against a DRAFT requirement (returns 422 requirement_not_active). That is the right pattern: the constraint is enforced server-side; the agent can call it any number of times in any order, and the server makes "wrong" calls fail loudly. The same pattern should apply to workflow caps and step ordering.
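A minimal sketch of that existing pattern, for illustration only. The function name createTraceabilityLink, the in-memory requirements map, the EXAMPLE-DRAFT id, the 404 branch, and the response shape are all assumptions made for this example; only the 422 status and the requirement_not_active code come from the behavior described above.

```javascript
// Hypothetical in-memory stand-in for Ground Control requirement state.
const requirements = new Map([
  ["GC-O007", { status: "ACTIVE" }],
  ["EXAMPLE-DRAFT", { status: "DRAFT" }], // made-up id for illustration
]);

// Sketch of a server-side handler: the server, not the caller,
// decides whether the call is allowed.
function createTraceabilityLink({ requirementId, linkType, artifact }) {
  const requirement = requirements.get(requirementId);
  if (!requirement) {
    // Assumed error shape; not taken from the real server.
    return { ok: false, status: 404, error: "requirement_not_found" };
  }
  if (linkType === "IMPLEMENTS" && requirement.status !== "ACTIVE") {
    // Fail loudly, no matter how often or in what order the agent calls.
    return { ok: false, status: 422, error: "requirement_not_active" };
  }
  return { ok: true, status: 201, link: { requirementId, linkType, artifact } };
}
```

The agent needs no instructions about when the call is legal: a wrong call simply fails with a structured error it can act on.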
Proposed fix (incremental, two MVP scopes)
MVP 1 — Server-side caps in MCP review tools
- gc_codex_review counts prior calls per (repo_path, pr_number, latest_commit_lineage) tuple. After 2 calls (per ADR-027 / GC-O007's hard-cap-2 contract), the 3rd invocation returns {ok: false, error: "hard cap reached for PR <N>; remaining findings must be escalated to user", prior_cycles: [...]}. Agent literally cannot run cycle 3.
- gc_codex_verify_finding counts prior calls per (repo_path, pr_number, comment_id) tuple. After 2 calls, the 3rd returns {ok: false, error: "per-finding cap reached for comment <id>; escalate"}.
- The cycle counter is server-side state (Ground Control DB, keyed by PR + commit ancestry). A new commit on the branch starts a fresh count for the new commit lineage only; counts already recorded against earlier commits are kept, not reset.
- Tool responses include the current cycle count so the agent can see "cycle 1 of 2" in the response and self-pace.
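The cap logic above could look roughly like the following. This is a sketch under stated assumptions, not the real tool: the in-memory Map stands in for the Ground Control DB, and the names codexReview, keyFor, runReview, and the cycle/cap response fields are hypothetical. The cap value and the error text follow the description above.

```javascript
const HARD_CAP = 2; // cap value from the hard-cap-2 contract

// Hypothetical stand-in for the Ground Control table of prior cycles.
const cyclesByKey = new Map();

// One counter per (repo_path, pr_number, latest_commit_lineage) tuple.
function keyFor({ repoPath, prNumber, commitLineage }) {
  return JSON.stringify([repoPath, prNumber, commitLineage]);
}

// runReview is an injected function that performs the actual review;
// it is a placeholder here so the sketch stays self-contained.
function codexReview(params, runReview) {
  const key = keyFor(params);
  const priorCycles = cyclesByKey.get(key) ?? [];
  if (priorCycles.length >= HARD_CAP) {
    // Refuse before doing any work: cycle 3 never runs.
    return {
      ok: false,
      error: `hard cap reached for PR ${params.prNumber}; remaining findings must be escalated to user`,
      prior_cycles: priorCycles,
    };
  }
  const cycle = priorCycles.length + 1;
  const findings = runReview(params);
  cyclesByKey.set(key, [...priorCycles, { cycle, findings }]);
  // Report "cycle N of cap" so the agent can self-pace.
  return { ok: true, cycle, cap: HARD_CAP, findings };
}
```

Because the key includes the commit lineage, a call with a new lineage starts again at cycle 1 while the earlier lineage stays capped. Whether a review that throws should still consume a cycle is a design choice this sketch leaves open; as written, only completed reviews are counted.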
MVP 2 — Prerequisite checks in MCP tools (ordering enforcement)
- Add gc_open_implementation_plan(issue_number, project): the planning step must call this MCP tool to mark planning open. It refuses to succeed unless gc_codex_architecture_preflight was called for this (issue_number, project) within the current session/branch (markers stored in Ground Control as a structured artifact row).
- Add gc_mark_phase(issue_number, phase): the agent records phase entry/exit (preflight, plan, tdd, push, review, ship). Subsequent phase entries refuse if their prerequisites are missing.
- The IMPLEMENTS-only-on-ACTIVE rule is the existing template: it rejects an out-of-order traceability call. Extend the same pattern to workflow-phase ordering.
- Phase markers also become the durable telemetry that ADR-029 implies but doesn't currently produce.
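A rough sketch of the prerequisite check, assuming strictly linear ordering (each phase requires every phase listed before it). The issue does not spell out the prerequisite graph, so that ordering rule is an assumption, as are the function name markPhase, the in-memory Map standing in for Ground Control artifact rows, the prerequisite_missing error code, and the simplification of recording one marker per phase rather than separate entry and exit markers.

```javascript
// Phase names from the list above; the linear ordering is an assumption.
const PHASES = ["preflight", "plan", "tdd", "push", "review", "ship"];

// Hypothetical stand-in for phase-marker rows stored in Ground Control.
const markersByIssue = new Map();

function markPhase(issueNumber, phase) {
  const index = PHASES.indexOf(phase);
  if (index === -1) {
    return { ok: false, error: `unknown phase: ${phase}` };
  }
  const recorded = markersByIssue.get(issueNumber) ?? new Set();
  // Every earlier phase must already have a marker.
  const missing = PHASES.slice(0, index).filter((p) => !recorded.has(p));
  if (missing.length > 0) {
    return { ok: false, error: "prerequisite_missing", missing };
  }
  recorded.add(phase);
  markersByIssue.set(issueNumber, recorded);
  return { ok: true, phase, recorded: [...recorded] };
}
```

With this shape, an agent that tries to enter plan before preflight gets a structured refusal naming the missing phase, and the recorded markers double as the audit trail.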
Why the MCP server is the right place
- Already on the trusted side of the boundary (host creds, project scoping, Envers audit).
- Already enforces semantic contracts (ACTIVE-required for IMPLEMENTS, project-scope checks).
- Driver-portable: works for Claude Code AND Codex AND any future Temporal worker. A Claude-Code-only PreToolUse hook (the alternative) doesn't.
- Recoverable: persistent counters / phase markers survive agent crashes and session restarts.
Why this is a stepping stone to GC-O009 (Temporal)
GC-O009 makes the workflow code be the workflow — await preflight(); await plan(); ... — at which point the agent's role shrinks from "decide what step to do next" to "execute the activity I'm currently in." That is the durable answer. This issue's MVP-1 + MVP-2 are the bridge: they move enforcement off the LLM today, in the MCP server we already operate, while GC-O009's Temporal substrate matures. The data shapes (cycle counters, phase markers) feed directly into Temporal Search Attributes when the cutover happens.
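To make the "workflow code is the workflow" idea concrete, here is a plain async JavaScript sketch. It deliberately does not use the Temporal SDK, and the names implementWorkflow and makeActivities, along with the stub activities, are hypothetical; it only illustrates that step order is fixed by code rather than chosen by the agent.

```javascript
// Hypothetical activity stubs; real activities would do the actual work.
// Each stub records its name so the execution order is observable.
function makeActivities(log) {
  const step = (name) => async () => {
    log.push(name);
  };
  return {
    preflight: step("preflight"),
    plan: step("plan"),
    tdd: step("tdd"),
    push: step("push"),
    review: step("review"),
    ship: step("ship"),
  };
}

// The ordering lives here, in code. An activity cannot run before
// the one awaited ahead of it has completed.
async function implementWorkflow(activities) {
  await activities.preflight();
  await activities.plan();
  await activities.tdd();
  await activities.push();
  await activities.review();
  await activities.ship();
}
```

In this shape the agent only ever executes the activity it is currently in; there is no point at which it can defer preflight or skip a step.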
Out of scope
- Replacing the implement skill prose. Prose stays as documentation. The fix is to make the prose's claims mechanically true.
- Changing the gate model (ADR-029) or the cycle cap value (2). Both stay; this issue makes them enforceable.
- Hooks-based enforcement (Claude Code PreToolUse). Considered and rejected as the primary fix because it is driver-specific.
- Sub-agent decomposition. Considered as an alternative; deferred until MVP-1 and MVP-2 are evaluated, since they are smaller and unlock most of the benefit.
Acceptance criteria
- gc_codex_review rejects a 3rd cycle on the same PR + commit lineage with a structured error and an escalation message.
- gc_codex_verify_finding rejects a 3rd verify call on the same comment id with a structured error.
- gc_open_implementation_plan (or equivalent prerequisite-check tool) refuses if preflight artifacts are missing for the current (issue, project).
- [...] mcp/ground-control/lib.test.js.
Requirements
- GC-O007 (Gated Agentic Development Loop): this issue enforces what GC-O007 today only describes.
- GC-O009 (Workflow Orchestration via Temporal): this issue is the bridge MVP that GC-O009 supersedes.
Related
- ADR-021: original gated loop (amended).
- ADR-027: agent-neutral packaging (this issue strengthens its claims).
- ADR-028: Temporal boundary (this issue's data model feeds it).
- ADR-029: issue-thread gate model (durable record requires enforced phase markers).