Enforce workflow caps and ordering at the tool layer, not in skill prose

## Problem

The `/implement` workflow's caps and step ordering are encoded as prose in `skills/implement/SKILL.md`. The LLM driver reads that prose and chooses how (and whether) to follow it. Observed in practice:

- **Caps are routinely exceeded.** "Hard cap: 2 iterations of `gc_codex_review`" reads as a strong constraint to a human, but an LLM mid-task can rationalize "just one more cycle" — and does.
- **Steps are reordered.** Codex architecture preflight (Step 2.5) is supposed to run *before* planning, so its guardrails inform the plan. Agents repeatedly defer it to *after* planning. The skill says "this MUST happen first," but reading prose is not enforcement.
- **Steps are skipped.** Anything advisory can be silently dropped. Without a structural check, "did the agent run preflight" is unverifiable except by post-hoc audit.

This is not an agent-misbehavior bug — it is a contract-design bug. **A workflow contract that lives only in markdown the LLM is asked to read is advisory, not enforced.** Even ADR-029 calls out that "agent silence on a finding is a process violation" — but there is no mechanism that prevents agent silence today.

## Root cause

Enforcement lives on the wrong side of the trust boundary. The agent is unprivileged (it should not be trusted with workflow-contract decisions), but the workflow contract is encoded as prose addressed to the agent. The MCP server is privileged (host creds, GitHub credentials, Ground Control state) but does not enforce ordering or caps — it just exposes tools the agent calls in any order it pleases.

Compare to `gc_create_traceability_link IMPLEMENTS`, which already refuses to run against a `DRAFT` requirement (returns `422 requirement_not_active`). That is the right pattern: the constraint is enforced server-side; the agent can call it any number of times in any order, and the server makes "wrong" calls fail loudly. The same pattern should apply to workflow caps and step ordering.

## Proposed fix (incremental, two MVP scopes)

### MVP 1 — Server-side caps in MCP review tools

1. `gc_codex_review` counts prior calls per `(repo_path, pr_number, latest_commit_lineage)` tuple. After 2 calls (per ADR-027 / GC-O007's hard-cap-2 contract), the 3rd invocation returns `{ok: false, error: "hard cap reached for PR <N>; remaining findings must be escalated to user", prior_cycles: [...]}`. Agent literally cannot run cycle 3.
2. `gc_codex_verify_finding` counts prior calls per `(repo_path, pr_number, comment_id)` tuple. After 2 calls, the 3rd returns `{ok: false, error: "per-finding cap reached for comment <id>; escalate"}`.
3. The cycle counter is server-side state (Ground Control DB, keyed by PR + commit ancestry — so a new commit on the branch resets the counter only for the *new* commits, not historical ones).
4. Tool responses include the current cycle count so the agent can see "cycle 1 of 2" in the response and self-pace.

### MVP 2 — Prerequisite checks in MCP tools (ordering enforcement)

1. Add `gc_open_implementation_plan(issue_number, project)` — the planning step must call this MCP tool to mark planning open. It refuses to succeed unless `gc_codex_architecture_preflight` was called for this `(issue_number, project)` within the current session/branch (markers stored in Ground Control as a structured artifact row).
2. Add `gc_mark_phase(issue_number, phase)` — agent records phase entry/exit (preflight, plan, tdd, push, review, ship). Subsequent phase entries refuse if their prerequisites are missing.
3. The `IMPLEMENTS-only-on-ACTIVE` rule is the existing template — it rejects an out-of-order traceability call. Extend the same pattern to workflow-phase ordering.
4. Phase markers also become the durable telemetry that ADR-029 implies but doesn't currently produce.

## Why the MCP server is the right place

- Already on the trusted side of the boundary (host creds, project scoping, Envers audit).
- Already enforces semantic contracts (ACTIVE-required for IMPLEMENTS, project-scope checks).
- Driver-portable: works for Claude Code AND Codex AND any future Temporal worker. A Claude-Code-only `PreToolUse` hook (the alternative) doesn't.
- Recoverable: persistent counters / phase markers survive agent crashes and session restarts.

## Why this is a stepping stone to GC-O009 (Temporal)

GC-O009 makes the workflow code *be* the workflow — `await preflight(); await plan(); ...` — at which point the agent's role shrinks from "decide what step to do next" to "execute the activity I'm currently in." That is the durable answer. This issue's MVP-1 + MVP-2 are the bridge: they move enforcement off the LLM today, in the MCP server we already operate, while GC-O009's Temporal substrate matures. The data shapes (cycle counters, phase markers) feed directly into Temporal Search Attributes when the cutover happens.

## Out of scope

- Replacing the implement skill prose. Prose stays as documentation. The fix is to make the prose's claims mechanically true.
- Changing the gate model (ADR-029) or the cycle cap value (2). Both stay; this issue makes them enforceable.
- Hooks-based enforcement (Claude Code `PreToolUse`). Considered and rejected as the primary fix because it is driver-specific.
- Sub-agent decomposition. Considered as an alternative; deferred until MVP-1 and MVP-2 are evaluated, since they are smaller and unlock most of the benefit.

## Acceptance criteria

- [ ] `gc_codex_review` rejects a 3rd cycle on the same PR + commit lineage with a structured error and an escalation message.
- [ ] `gc_codex_verify_finding` rejects a 3rd verify call on the same comment id with a structured error.
- [ ] Cycle counters survive agent restarts (server-side state, not in-memory).
- [ ] Tool responses include the current cycle count.
- [ ] `gc_open_implementation_plan` (or equivalent prerequisite-check tool) refuses if preflight artifacts are missing for the current `(issue, project)`.
- [ ] Phase markers are queryable via existing Ground Control APIs so post-hoc audits work.
- [ ] Tests cover happy-path, cap-exceeded, prerequisite-missing, and cross-restart cases in `mcp/ground-control/lib.test.js`.
- [ ] CHANGELOG entry under Fixed.
- [ ] ADR-027 amended (or a new ADR-030) documenting the enforcement architecture.

## Requirements

- GC-O007 (Gated Agentic Development Loop) — this issue enforces what GC-O007 today only describes.
- GC-O009 (Workflow Orchestration via Temporal) — this issue is the bridge MVP that GC-O009 supersedes.

## Related

- ADR-021 — original gated loop (amended).
- ADR-027 — agent-neutral packaging (this issue strengthens its claims).
- ADR-028 — Temporal boundary (this issue's data model feeds it).
- ADR-029 — issue-thread gate model (durable record requires enforced phase markers).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enforce workflow caps and ordering at the tool layer, not in skill prose #794

Problem

Root cause

Proposed fix (incremental, two MVP scopes)

MVP 1 — Server-side caps in MCP review tools

MVP 2 — Prerequisite checks in MCP tools (ordering enforcement)

Why the MCP server is the right place

Why this is a stepping stone to GC-O009 (Temporal)

Out of scope

Acceptance criteria

Requirements

Related

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Enforce workflow caps and ordering at the tool layer, not in skill prose #794

Description

Problem

Root cause

Proposed fix (incremental, two MVP scopes)

MVP 1 — Server-side caps in MCP review tools

MVP 2 — Prerequisite checks in MCP tools (ordering enforcement)

Why the MCP server is the right place

Why this is a stepping stone to GC-O009 (Temporal)

Out of scope

Acceptance criteria

Requirements

Related

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions