Problem
The RLCR review phase can enter an infinite loop: the reviewer re-scans the diff from scratch each round and repeatedly raises identical out-of-scope P2 findings. In a real humanize rlcr loop session, the implementation phase completed in 10 rounds, but the review phase ran for 38 additional rounds (48 total) — 79% of rounds produced zero output.
Root Causes
- Reviewer has no cross-round state — Each round scans the diff independently; it doesn't know a finding was already classified as "queued" last round, so it re-raises it identically
- No stagnation detection in the harness — max_iterations only caps the implementation phase; the review phase has no upper bound. The agent requested cancellation from Round 23 through Round 48 with no effect
- Pre-existing code contaminates review scope — Code in the base..HEAD diff that was not introduced by the current session is treated as blocking
Suggested Improvements
| Improvement |
Description |
| Queued-issue suppression |
Once both agents agree a finding is queued, record it in a session-level ledger (keyed by location + description hash). Skip it in subsequent rounds unless the code at that location changes |
| Stagnation detection |
If the finding set is unchanged for 3+ consecutive rounds with no code changes, auto-halt the review phase |
| Review scope boundary |
Plans should declare a change boundary. Findings outside it are advisory, not blocking |
| Separate review-phase max iterations |
Add an independent cap for the review phase (e.g., 10 rounds). Exit with an unresolved-findings summary instead of looping forever |
Impact
Observed: 10 productive rounds + 38 empty rounds. With stagnation detection, auto-halt would have triggered at Round 14, saving 34 rounds of compute and human time.
Problem
The RLCR review phase can enter an infinite loop: the reviewer re-scans the diff from scratch each round and repeatedly raises identical out-of-scope P2 findings. In a real humanize rlcr loop session, the implementation phase completed in 10 rounds, but the review phase ran for 38 additional rounds (48 total) — 79% of rounds produced zero output.
Root Causes
Suggested Improvements
Impact
Observed: 10 productive rounds + 38 empty rounds. With stagnation detection, auto-halt would have triggered at Round 14, saving 34 rounds of compute and human time.