Context
During an RLCR session, the loop ran five rounds (Round 0 → Round 4) before the stagnation circuit breaker forced exit. The plan declared an all-or-nothing scope (lower bound = upper bound = full target deliverable). An external execution-environment blocker appeared from Round 2 onward and was explicitly surfaced with resolution options in the Round 3 summary's update request. The loop continued anyway for two more rounds; the reviewer's verdict drifted from ADVANCED (Rounds 0–2) to STALLED (Rounds 3–4) as the mainline gap list barely changed. Local round contracts were well-scoped and cleanly executed, but they accepted scope reduction the plan forbade, so the reviewer never credited the work as plan-level progress. The methodology produced clean local rounds and the circuit breaker did eventually fire — but it fired after three rounds of diminishing returns.
This issue summarizes the methodology-improvement suggestions distilled from a post-loop analysis. All content is sanitized; no project-specific details appear below.
Suggestions
Theme 1: Detecting and reacting to external blockers early
1. Treat reported "environment-bound" deferrals as a plan-level event, not a per-round footnote.
Once the implementer reports an unresolvable external constraint that prevents touching the mainline deliverable, every subsequent round will inherit the same constraint and produce only peripheral work. In the observed session, the implementer surfaced the blocker as early as Round 2 and explicitly listed four resolution options in a Round 3 update request, but the loop continued for two more rounds anyway. When a round summary contains an explicit "Goal Tracker Update Request" presenting decision options that only the user can resolve, the methodology should pause the implement-review loop and route to a human-decision gate. Continuing the loop after a clearly-articulated blocker is a form of busy-waiting that the circuit breaker eventually catches, but a dedicated escalation gate would catch it sooner and more cleanly.
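One way the escalation gate could be wired in is sketched below. This is a minimal illustration, not the existing loop's implementation: the `RoundSummary` fields, the `needs_user_decision` flag, and the `route_after_round` function are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from enum import Enum


class NextStep(Enum):
    CONTINUE_LOOP = "continue_loop"
    HUMAN_DECISION_GATE = "human_decision_gate"


@dataclass
class RoundSummary:
    round_index: int
    body: str
    # Options listed in a "Goal Tracker Update Request", if the summary has one.
    update_request_options: list = field(default_factory=list)
    # Hypothetical flag: True when only the user can choose among the options.
    needs_user_decision: bool = False


def route_after_round(summary: RoundSummary) -> NextStep:
    """Pause the implement-review loop when a summary asks the user to decide."""
    if summary.update_request_options and summary.needs_user_decision:
        return NextStep.HUMAN_DECISION_GATE
    return NextStep.CONTINUE_LOOP
```

With this shape, a Round 3 summary carrying four resolution options and `needs_user_decision=True` would route to the gate instead of starting Round 4.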
2. Distinguish "implementer cannot proceed" from "implementer chose to defer."
Round summaries listed the same set of unimplemented tasks as "deferred" for environment reasons across multiple rounds. The reviewer flagged this each time as "unjustified deferrals" because, per the plan boundaries, no deferrals were valid. Both sides were correct under their own framing, but the loop had no way to converge. The methodology should require the round summary to label each deferred item with one of three causes (scope choice / dependency on earlier in-loop work / external blocker) and require the reviewer to acknowledge that taxonomy. External-blocker items should automatically trigger the escalation gate from suggestion #1 rather than recurring as "unjustified deferral" findings.
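The three-cause taxonomy could be expressed as a small data model like the one below. The class and function names are assumptions made for illustration; only the three causes themselves come from the suggestion above.

```python
from dataclasses import dataclass
from enum import Enum


class DeferralCause(Enum):
    SCOPE_CHOICE = "scope choice"
    IN_LOOP_DEPENDENCY = "dependency on earlier in-loop work"
    EXTERNAL_BLOCKER = "external blocker"


@dataclass(frozen=True)
class Deferral:
    task_id: str
    cause: DeferralCause
    note: str = ""


def split_deferrals(deferrals):
    """Return (items to escalate, items the reviewer may still challenge)."""
    escalate = [d for d in deferrals if d.cause is DeferralCause.EXTERNAL_BLOCKER]
    reviewable = [d for d in deferrals if d.cause is not DeferralCause.EXTERNAL_BLOCKER]
    return escalate, reviewable
```

Under this sketch, only the `reviewable` list would be eligible for an "unjustified deferral" finding; a non-empty `escalate` list would go to the human-decision gate.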
Theme 2: Plan rigidity vs. execution reality
3. A plan whose lower bound equals its upper bound is brittle under any unexpected friction.
The plan collapsed the acceptable outcome range to a single point: full feature parity, no partial credit. Any friction (environment, missing tools, scope discovery) renders the entire loop incapable of producing a "complete" result, and the reviewer is structurally forced to return "incomplete" every time. Every review across all five rounds reported zero acceptance criteria fully addressed (or at most a small partial), even though substantive structural work had landed. This makes the reviewer's per-round verdict almost uninformative: it cannot distinguish a "great round" from a "bad round" because both report the same overall progress percentage. Plans should declare both a target scope and a minimum acceptable scope, with an explicit decision rule for what to do when execution can hit the minimum but not the target. The reviewer can then meaningfully grade rounds against the minimum boundary while still pointing toward the target.
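A two-bound plan and its grading rule might look like the following sketch. `PlanScope`, `Outcome`, and the idea of representing scope as a set of deliverable identifiers are assumptions made for this example, not part of the current plan format.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    TARGET_MET = "target met"
    MINIMUM_MET = "minimum met, target outstanding"
    BELOW_MINIMUM = "below minimum"


@dataclass(frozen=True)
class PlanScope:
    target: frozenset   # every deliverable the plan ultimately wants
    minimum: frozenset  # the smallest acceptable subset of the target

    def __post_init__(self):
        if not self.minimum <= self.target:
            raise ValueError("minimum scope must be a subset of target scope")

    def is_brittle(self) -> bool:
        # Lower bound equals upper bound: no room for partial credit.
        return self.minimum == self.target

    def grade(self, completed: set) -> Outcome:
        if self.target <= completed:
            return Outcome.TARGET_MET
        if self.minimum <= completed:
            return Outcome.MINIMUM_MET
        return Outcome.BELOW_MINIMUM
```

A plan-generation step could refuse, or at least warn, when `is_brittle()` returns `True`, which is the configuration the observed session ran under.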
4. Validate that the execution environment supports the plan before the loop starts.
The plan presumed a working hardware-and-software stack appropriate to the target task. The actual execution environment was missing key prerequisites, and this was discovered only after the first round of work was underway. Rounds 2 onward were entirely shaped by working around the environment instead of executing the plan. The plan-generation phase should produce an explicit "environment prerequisites" checklist, and the loop should run an environment-probe pre-flight (executing minimal smoke checks for each prerequisite) before Round 0. If a prerequisite fails the probe, the plan is revised or the user is asked before any implement-review cycles run.
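The pre-flight could be as simple as running one smoke check per prerequisite and collecting the failures, as in this sketch. The `run_preflight` name and the convention that each probe is a zero-argument callable returning a truthy value on success are assumptions for illustration.

```python
from typing import Callable, Dict, List


def run_preflight(probes: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run each smoke check and return the names of prerequisites that failed.

    A probe that raises is treated the same as a probe that returns False.
    """
    failed = []
    for name, probe in probes.items():
        try:
            ok = bool(probe())
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return failed
```

A probe can be any cheap check, for example `lambda: shutil.which("some-tool") is not None` for a required executable (the tool name here is a placeholder). If the returned list is non-empty, the loop would revise the plan or ask the user before Round 0 starts.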
Theme 3: Review effectiveness and stagnation detection
5. Reviews were specific and actionable, but the stagnation signal was buried.
The reviewer's findings were consistently concrete: precise citations, exact failure modes, and clear "next steps" lists. However, across rounds, large portions of the "required next implementation plan" text were near-verbatim copies of prior rounds. A reader skimming any single review would not realize that the mainline gap list had been almost identical for three consecutive rounds. The signal that the loop was circling lived in the diff between rounds, not in any single round. The reviewer prompt should include a directive to compute a "delta from previous round" summary — explicitly listing which mainline gaps moved, which closed, and which are unchanged. A rising "unchanged mainline gaps" count across rounds is a strong stagnation indicator that should feed into a softer circuit breaker (warning at N=2, hard stop at N=3) rather than waiting for the hard stagnation breaker to fire.
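The delta summary and the softer breaker could be computed roughly as follows. This sketch assumes each review exposes its mainline gaps as a mapping from a stable gap id to a short status string, and it reads N as the number of consecutive rounds in which no mainline gap closed or moved; both are interpretations introduced here, as are all names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoundDelta:
    closed: frozenset     # gap ids present last round, gone this round
    moved: frozenset      # gap ids whose status text changed
    unchanged: frozenset  # gap ids carried over with identical status
    new: frozenset        # gap ids that first appear this round


def diff_gaps(previous: dict, current: dict) -> RoundDelta:
    shared = previous.keys() & current.keys()
    return RoundDelta(
        closed=frozenset(previous.keys() - current.keys()),
        moved=frozenset(g for g in shared if previous[g] != current[g]),
        unchanged=frozenset(g for g in shared if previous[g] == current[g]),
        new=frozenset(current.keys() - previous.keys()),
    )


def breaker_state(deltas: list, warn_at: int = 2, stop_at: int = 3) -> str:
    """Count trailing rounds with no closed or moved mainline gap."""
    streak = 0
    for delta in reversed(deltas):
        if delta.closed or delta.moved:
            break
        streak += 1
    if streak >= stop_at:
        return "stop"
    if streak >= warn_at:
        return "warn"
    return "ok"
```

The reviewer would append one `RoundDelta` per round, and the loop would consult `breaker_state` before scheduling the next round.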
6. Reviews caught real issues with low false-positive rate, but the issues they caught shrank in importance over time.
The severity profile shifted across the five rounds: early reviews concerned the structural contract; later reviews concerned increasingly narrow literal readings of one acceptance criterion and re-flagged cleanup hygiene. The methodology spent its last two rounds polishing the edges of a single acceptance criterion while the central acceptance criteria made no progress. The review machinery functioned, but it ran out of high-value work to do. Reviews should be required to bucket findings as mainline or peripheral and to flag when, for two rounds in a row, all closed findings are peripheral. That signal ("we are closing only peripheral issues") is a clean stagnation criterion that complements the unchanged-mainline-gaps signal.
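The peripheral-only criterion can be checked mechanically once findings carry a bucket. The sketch below uses hypothetical `Bucket` and `Finding` types, and it treats a round that closed nothing as satisfying "all closed findings are peripheral"; that vacuous case is a choice made here, not something the analysis specifies.

```python
from dataclasses import dataclass
from enum import Enum


class Bucket(Enum):
    MAINLINE = "mainline"
    PERIPHERAL = "peripheral"


@dataclass(frozen=True)
class Finding:
    finding_id: str
    bucket: Bucket


def closing_only_peripheral(closed_by_round: list, window: int = 2) -> bool:
    """True when no mainline finding was closed in any of the last `window` rounds."""
    if len(closed_by_round) < window:
        return False
    recent = closed_by_round[-window:]
    return all(f.bucket is Bucket.PERIPHERAL for closed in recent for f in closed)
```

Here `closed_by_round` holds one list of closed findings per round, in round order.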
Theme 4: Communication and round contracts
7. Round contracts narrowed scope appropriately, but the narrowing was not reconciled with the plan boundaries.
Each round contract correctly identified a tractable local objective. Local execution against those contracts was clean. But the contracts implicitly accepted scope reduction that the plan explicitly forbade, and the reviewer correctly refused to recognize that local progress as plan progress. Implementer and reviewer were operating on different success contracts; both were right in their own frame. A round contract should either (a) be a strict subset of the plan-level deliverables, with the reviewer told to grade only that subset, or (b) explicitly request a plan amendment if the round will not advance the plan. Mixing local-scope contracts with plan-scope reviews is what produced the verdict mismatch.
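The (a)/(b) rule could be enforced when a round contract is accepted, along the lines of this sketch. The function name, the three return strings, and the representation of deliverables as sets of identifiers are all assumptions made for the example.

```python
def reconcile_contract(plan_deliverables: set,
                       contract_items: set,
                       amendment_requested: bool) -> str:
    """Decide how a round contract relates to the plan before the round starts."""
    outside_plan = contract_items - plan_deliverables
    # Case (a): every contract item is a plan deliverable and at least one
    # plan deliverable is left out, so the reviewer grades only this subset.
    if contract_items and not outside_plan and contract_items != plan_deliverables:
        return "grade_subset"
    # The contract covers the whole plan: ordinary plan-scope review applies.
    if contract_items == plan_deliverables:
        return "grade_plan"
    # Case (b): the round will not advance the plan as written.
    if amendment_requested:
        return "await_plan_amendment"
    return "reject_contract"
```

Rejecting a contract that is neither a subset of the plan nor accompanied by an amendment request is what keeps implementer and reviewer on the same success contract.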
8. Per-round summaries were thorough, but they normalized non-progress.
The implementer's summaries grew progressively better at justifying why the central block was not advanced each round. By the final round, the deferral justification was a polished, internally-consistent paragraph that referenced earlier rounds and a pending user decision. This is good craftsmanship in writing, but it had the effect of making non-progress feel like a stable, defensible state and partially masked the underlying problem. Round summaries should include a quantitative "tasks closed this round" and "tasks closed cumulatively / total tasks" header at the top. Once that ratio plateaus across two rounds, the loop should trigger a planning check-in rather than a sixth round.
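The quantitative header and the plateau check could be generated as below. The header wording and function names are illustrative, and "plateaus across two rounds" is interpreted here as two consecutive rounds that each closed zero tasks.

```python
def progress_header(closed_this_round: int, closed_cumulative: int, total: int) -> str:
    """Render the two counters that should sit at the top of a round summary."""
    return (
        f"Tasks closed this round: {closed_this_round}\n"
        f"Tasks closed cumulatively: {closed_cumulative} / {total}"
    )


def has_plateaued(closed_per_round: list, window: int = 2) -> bool:
    """True when each of the last `window` rounds closed zero tasks."""
    if len(closed_per_round) < window:
        return False
    return all(count == 0 for count in closed_per_round[-window:])
```

When `has_plateaued` returns `True`, the loop would schedule a planning check-in instead of another implement-review round.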
What worked well
- Clean local rounds. Each round was contained, the contract was followed, the validations described were what was actually performed, and the diff between rounds was small and focused.
- Honest blocker reporting. The implementer surfaced the external blocker as soon as it was confirmed and proposed concrete resolution paths rather than hiding it.
- Specific, citation-rich reviews. The reviewer consistently pointed to exact citations and the precise mismatch between expected and observed behavior, which made each round's "next action" unambiguous.
- Circuit breaker eventually fired. The stagnation breaker did catch the loop and force an exit, preventing an unbounded sequence of low-value rounds. The improvement opportunity is making that safety net trigger sooner and more gracefully.