Summary
Ran a 9-round RLCR session against a plan whose central deliverable required multi-hour
external compute. The loop was terminated by user intervention via the stagnation circuit
breaker. The mechanical iteration was healthy; the structural weakness was a mismatch
between wall-clock-gated plan shape and per-round closure semantics. Below are 10
concrete methodology suggestions derived from a sanitized post-session analysis.
Key observation
The reviewer's strict "did this round close a new mainline task?" metric returned
negative for every round between launch and completion of the external compute job,
regardless of what the executor actually did. This pushed the executor toward smaller
and smaller cleanup deliverables to "earn" forward motion, sometimes introducing
collateral bugs the next round had to revert.
Improvement suggestions (pattern → fix)
1. Wall-clock-gated plans collide with per-round closure semantics
Add a plan-level annotation for tasks gated on external time (compute jobs, human
approvals, scheduled events, etc.). When set and unresolved, evaluate the round against
an alternate criterion ("preserve readiness + capture available evidence") rather than
strict closure. Reviewer-controlled, not executor-controlled.
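
As an illustration of the annotation above, here is a minimal sketch of how a reviewer-side helper might pick the criterion for a round. The names `Task`, `wall_clock_gated`, `Criterion`, and `select_criterion` are hypothetical, and the rule "relax only when every open mainline task is gated" is an assumption, not something the session defined.

```python
from dataclasses import dataclass
from enum import Enum


class Criterion(Enum):
    STRICT_CLOSURE = "strict_closure"
    PRESERVE_READINESS = "preserve_readiness"  # "preserve readiness + capture available evidence"


@dataclass(frozen=True)
class Task:
    name: str
    wall_clock_gated: bool = False  # hypothetical plan-level annotation
    resolved: bool = False


def select_criterion(mainline_tasks: list[Task]) -> Criterion:
    """Pick the review criterion for a round. Intended to be called by the reviewer only."""
    open_tasks = [t for t in mainline_tasks if not t.resolved]
    # Assumption: relax only when every open mainline task is waiting on external time.
    if open_tasks and all(t.wall_clock_gated for t in open_tasks):
        return Criterion.PRESERVE_READINESS
    return Criterion.STRICT_CLOSURE
```

Keeping the selection in a function the executor never calls is one way to keep the switch reviewer-controlled.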
2. Verdicts conflate side-issue progress with true stagnation
Split the verdict into two orthogonal axes: mainline-closure (yes/no) and round-quality
(advanced / repaired / cleanup-only / regressed / circular). The stagnation circuit
breaker should fire on the round-quality axis, not on closure.
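
A rough sketch of the two-axis verdict, assuming the breaker counts consecutive unproductive rounds. Which quality values count as unproductive and the threshold of 3 are assumptions for illustration; `Verdict` and `breaker_should_fire` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class RoundQuality(Enum):
    ADVANCED = "advanced"
    REPAIRED = "repaired"
    CLEANUP_ONLY = "cleanup-only"
    REGRESSED = "regressed"
    CIRCULAR = "circular"


@dataclass(frozen=True)
class Verdict:
    mainline_closed: bool   # axis 1: did the round close a mainline task?
    quality: RoundQuality   # axis 2: what kind of round was it?


# Assumption: these values count toward stagnation; the others reset the streak.
UNPRODUCTIVE = {RoundQuality.CLEANUP_ONLY, RoundQuality.REGRESSED, RoundQuality.CIRCULAR}


def breaker_should_fire(history: list[Verdict], threshold: int = 3) -> bool:
    """Fire after `threshold` consecutive unproductive rounds, ignoring closure."""
    streak = 0
    for verdict in reversed(history):
        if verdict.quality not in UNPRODUCTIVE:
            break
        streak += 1
    return streak >= threshold
```

With this shape, a run of rounds that close nothing but are rated advanced or repaired never trips the breaker.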
should auto-escalate to the user before the standard circuit breaker.
4. Executor over-claims that reviewer must correct
Require summaries to include a machine-checkable claims block (artifact X exists at
path Y, commit Z touches files A/B/C, tracker row N reads ...). A pre-review hook
mechanically verifies those specific claims and rejects the summary back to the
executor before invoking the reviewer.
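
One possible shape for the pre-review hook, sketched under assumptions: claims are typed records, the tracker is available as a plain mapping from row id to text, and only two claim kinds are covered. `ArtifactClaim`, `TrackerClaim`, and `verify_claims` are hypothetical names. Commit-touches-files claims would need a VCS query and are left out.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ArtifactClaim:
    path: str  # "artifact exists at this path", relative to the repository root


@dataclass(frozen=True)
class TrackerClaim:
    row: str       # tracker row id
    expected: str  # text the summary says the row contains


def verify_claims(claims, tracker: dict[str, str], root: Path) -> list[str]:
    """Return failure messages; an empty list means the summary may go to the reviewer."""
    failures = []
    for claim in claims:
        if isinstance(claim, ArtifactClaim):
            if not (root / claim.path).exists():
                failures.append(f"missing artifact: {claim.path}")
        elif isinstance(claim, TrackerClaim):
            actual = tracker.get(claim.row)
            if actual != claim.expected:
                failures.append(
                    f"tracker row {claim.row!r} reads {actual!r}, not {claim.expected!r}"
                )
        else:
            # Unknown claim kinds fail closed rather than passing unverified.
            failures.append(f"unsupported claim type: {type(claim).__name__}")
    return failures
```

A non-empty result would be returned to the executor with the messages attached, so the reviewer only sees summaries whose mechanical claims already hold.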
5. "Cleanup" edits that change behavior get classified as cleanup
When a round describes itself as cleanup-only or behavior-preserving, require a
diff-based check that flags schema renames, interface changes, or public-symbol
renames. Force explicit reclassification as a behavior change rather than letting it
pass as cleanup.
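
A minimal sketch of such a check for the public-symbol case, assuming the changed files are Python sources and that "public" means a top-level name without a leading underscore. It compares the before and after versions of a file using the standard `ast` module; `public_symbols` and `cleanup_violations` are hypothetical names, and schema or interface changes would need their own detectors.

```python
import ast


def public_symbols(source: str) -> set[str]:
    """Top-level function, class, and simple-assignment names not starting with '_'."""
    names = set()
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Assign):
            names.update(t.id for t in node.targets if isinstance(t, ast.Name))
    return {n for n in names if not n.startswith("_")}


def cleanup_violations(before: str, after: str) -> dict[str, set[str]]:
    """Public names that disappeared or appeared between two versions of a file."""
    old, new = public_symbols(before), public_symbols(after)
    return {"removed": old - new, "added": new - old}
```

If either set is non-empty for a round that declared itself cleanup-only, the round would be sent back for reclassification as a behavior change.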
6. Paperwork volume scales with rounds, not with progress
When a round inherits a stable wall-clock blocker from the prior round, allow a
delta-only paperwork mode: short summary referencing the previous full summary by hash,
plus only what changed. Full re-baseline when the blocker clears, the scope changes
materially, or every K rounds.
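
The mode selection described above could be sketched as follows. The blocker identifier, the default `k = 5`, and the use of a truncated SHA-256 as the summary reference are all assumptions for illustration; `summary_hash` and `paperwork_mode` are hypothetical names.

```python
import hashlib
from typing import Optional


def summary_hash(text: str) -> str:
    """Short content hash a delta summary can use to reference the last full summary."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def paperwork_mode(
    prev_blocker: Optional[str],
    cur_blocker: Optional[str],
    scope_changed: bool,
    rounds_since_full: int,
    k: int = 5,  # assumed value for "every K rounds"
) -> str:
    """Return "delta" only while the same wall-clock blocker persists unchanged."""
    blocker_stable = cur_blocker is not None and cur_blocker == prev_blocker
    if not blocker_stable or scope_changed or rounds_since_full >= k:
        return "full"
    return "delta"
```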
7. No first-class "plan-unattainable" terminal state
Add an explicit terminal state distinct from STALLED / REGRESSED for "plan success
criteria are now permanently unreachable on this attempt" (artifact destroyed,
permission revoked, deadline passed, etc.). A round can request this state with
evidence; the reviewer confirms or rejects. Confirmed plan-unattainable produces clean
documented closure rather than indefinite stall.
8. Self-referential commit-hash placeholders
Add a pre-commit / round-finalization check that scans round artifacts for self-referential
placeholders matching common patterns (literal "this commit", "to be filled",
angle-bracket placeholders). Reject the round or mark artifacts as pending post-commit
fixup so the issue is tracked instead of silently shipped.
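
A small sketch of the scan, assuming artifacts are plain text. The three patterns mirror the examples named above; the angle-bracket pattern is deliberately loose and would also match HTML-like tags, so a real check would likely need an allowlist. `PLACEHOLDER_PATTERNS` and `find_placeholders` are hypothetical names.

```python
import re

PLACEHOLDER_PATTERNS = [
    re.compile(r"\bthis commit\b", re.IGNORECASE),
    re.compile(r"\bto be filled\b", re.IGNORECASE),
    re.compile(r"<[A-Za-z][A-Za-z0-9_ -]*>"),  # e.g. <commit-hash>; also matches HTML-like tags
]


def find_placeholders(text: str) -> list[tuple[int, str]]:
    """Return (line number, matched text) for every placeholder-looking span."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in PLACEHOLDER_PATTERNS:
            for match in pattern.finditer(line):
                hits.append((lineno, match.group(0)))
    return hits
```

Any hit can either reject the round or be recorded as a pending post-commit fixup, depending on which policy the loop adopts.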
9. Tracker as ground truth worked; summaries did not
Codify the asymmetry observed in this session: the tracker (especially with reviewer
write access) is the canonical state of record; summaries are narrative commentary that
must reference but cannot contradict the tracker. Auto-flag any summary claim that
conflicts with the tracker.
10. "Produce nothing this round" should be acceptable when blocked
Make a no-op round explicitly acceptable when a wall-clock-gated task is the only
mainline work. Pair with a longer scheduling delay before the next round (rather than
firing immediately and demanding new output). This reduces the busywork attractor and
aligns loop cadence with plan cadence.
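
One way to express the longer scheduling delay is a capped backoff on consecutive no-op rounds, sketched below. The doubling rule, the one-minute base, and the two-hour cap are assumptions chosen for illustration; `next_round_delay` is a hypothetical name.

```python
from datetime import timedelta


def next_round_delay(
    only_gated_work_remains: bool,
    consecutive_noops: int,
    base: timedelta = timedelta(minutes=1),  # assumed normal cadence
    cap: timedelta = timedelta(hours=2),     # assumed upper bound
) -> timedelta:
    """Delay before the next round; grows while only wall-clock-gated work remains."""
    if not only_gated_work_remains:
        return base
    # Double the delay per consecutive no-op round; bound the exponent to avoid overflow.
    exponent = min(consecutive_noops + 1, 16)
    return min(base * (2 ** exponent), cap)
```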
What worked well
- Goal-tracker discipline (especially with reviewer write access) stayed coherent
across all rounds and was the most reliable artifact.
- The reviewer caught at least three substantive bugs the executor missed: a schema
rename masquerading as cleanup, a factual transcription error in a results table, and
an over-claim of artifact landing. Direct evidence of verification value beyond
self-review.
- Communication was detailed and traceable to file lines; the cost was volume.