Summary
The worker prompt's review-handling protocol has a silent ordering assumption that, when violated, deadlocks the worker. Specifically: if the worker marks a step complete in STATUS.md and commits before calling review_step for that step, and the subsequent code review returns REVISE, the worker faces an unresolvable contradiction (STATUS says step is done, reviewer says it isn't) and spins until the orch's no-progress timeout kills the lane.
This is the single root cause of a complete batch failure observed in production use. Worth a P0 fix because there is no operator-side workaround beyond manual recovery — the worker cannot self-recover from this state.
Reproduction
- Configure a task with
Review Level: 2 (Plan + Code).
- Worker implements step N.
- Worker writes
[x] to all checkboxes for step N in STATUS.md and sets **Status:** ✅ Complete.
- Worker commits with
feat(<task>): complete Step N — ....
- Worker calls
review_step(step=N, type='code', baseline=<sha>).
- Reviewer returns
REVISE with valid issues.
- Worker is now stuck — every subsequent iteration reads
STATUS.md, sees step N complete, sees the REVISE feedback, and cannot reconcile. Multiple iterations of file-reading + git status + git diff produce no code changes. Eventually the orch's noProgressLimit (default 3) kills the lane.
The same alternative ordering is reachable from the current templates/agents/task-worker.md text — nothing in the prompt forbids it.
Concrete evidence (real run)
Batch 20260506T105850 (taskplane@0.28.4 / source includes TP-184), failed after 45 minutes:
- Worker commit
0ae7f49 feat(MIG-002): complete Step 2 — demo-request bot-protection wiring + email limiter + tests
- Then
.reviews/R006-code-step2.md returned REVISE (form_token verify must run before IP rate-limit per the task's check-ordering spec)
- Iterations 4 → 6:
events.jsonl shows the worker reading STATUS.md, PROMPT.md, demo-request.ts, plus git status / git diff calls. Zero code changes across 3 iterations. Each iteration ended with an exit-no-progress request.
- Lane killed at iteration 6 by the no-progress safety mechanism.
$ tail -10 .pi/runtime/20260506T105850/agents/orch-henrylach-lane-1-worker/events.jsonl | grep -oE '"type":"[^"]+"'
"type":"tool_call"
"type":"tool_result"
"type":"retry_started"
"type":"exit_intercepted"
"type":"agent_exited"
"type":"agent_started"
"type":"prompt_sent"
"type":"context_usage"
"type":"tool_result"
"type":"tool_call"
(Pattern: retry_started → exit_intercepted → agent_exited → agent_started → prompt_sent repeated for each no-progress iteration.)
The same anti-pattern (mark-complete-before-final-APPROVE) recurred in the successful retry batch 20260506T131717 on Step 3 — Step 3 was marked ✅ Complete in STATUS.md while reviewer-state.json showed status: 'running', reviewType: 'code', reviewStep: 3. We got lucky: the in-flight review (R009) returned APPROVE. If R009 had returned REVISE, the death-spiral would have repeated exactly. So this isn't a one-off worker glitch — it's a structural pattern the prompt enables.
Root cause
templates/agents/task-worker.md (around the "Handling verdicts" section) describes APPROVE/REVISE/RETHINK responses but assumes the worker hasn't yet finalised the step. There's no explicit ordering directive, and crucially no recovery recipe for "I marked this done but the reviewer wants a revision."
Fix proposals
Option A (preferred) — explicit ordering rule + recovery recipe
Add to templates/agents/task-worker.md:
## Order of operations for any step at Review Level ≥ 2
The order is mandatory:
1. **Implement** the step (write code, run targeted tests).
2. **Commit** the implementation: `feat(<TASK-ID>): complete Step N — ...`.
3. **Call `review_step(step=N, type='code', baseline=<HEAD-before-step-2>)`**.
4. **Handle the verdict:**
- APPROVE → proceed to step 4.
- REVISE → read the review file, fix the issues, commit (`fix(<TASK-ID>): address R<NNN> — ...`), re-call `review_step` with the same step + type. Repeat until APPROVE.
- RETHINK → reconsider plan, adjust, re-implement.
5. **Only after final APPROVE:** update `STATUS.md` — set `[x]` on all checkboxes for this step and `**Status:** ✅ Complete`. Commit the STATUS update (`chore(<TASK-ID>): finalise STATUS.md for Step N`).
6. Move to step N+1.
**Forbidden:** marking checkboxes `[x]` or setting `Status: ✅ Complete` before step 4's final APPROVE. If you've already done this and a subsequent review returns REVISE: revert the STATUS update (un-check the boxes, set `Status: 🟨 In Progress`), commit the revert, then proceed with the normal REVISE-handling at step 4.
Option B — tool-level guard
Make review_step check STATUS.md for the target step's completion mark. If marked complete:
review_step refuses to run: Step N is already marked ✅ Complete in STATUS.md.
If you need to address review feedback on a completed step, first revert the
status update (un-check boxes, set Status: 🟨 In Progress), commit the revert,
then call review_step again.
Recommendation
Ship both. Option A teaches the worker the right behaviour proactively; Option B catches the worker if it slips. The base prompt already includes other "MUST NOT" rules at this level of strictness (e.g., "Workers MUST NOT add, remove, or renumber steps during execution"), so an order-of-operations rule fits.
Scope of impact
This is the death-spiral trigger. Every taskplane batch with Review Level ≥ 2 is at risk; the failure is silent (no error message — just no progress) until the orch's safety mechanism kicks in 3 iterations later. The lost work in the failing batch is recoverable manually but not autonomously.
Acceptance criteria
Related
Affected version: taskplane@0.28.4 (source includes TP-184). Reproducible repo + batch state available on request.
Summary
The worker prompt's review-handling protocol has a silent ordering assumption that, when violated, deadlocks the worker. Specifically: if the worker marks a step complete in
STATUS.mdand commits before callingreview_stepfor that step, and the subsequent code review returnsREVISE, the worker faces an unresolvable contradiction (STATUS says step is done, reviewer says it isn't) and spins until the orch's no-progress timeout kills the lane.This is the single root cause of a complete batch failure observed in production use. Worth a P0 fix because there is no operator-side workaround beyond manual recovery — the worker cannot self-recover from this state.
Reproduction
Review Level: 2(Plan + Code).[x]to all checkboxes for step N inSTATUS.mdand sets**Status:** ✅ Complete.feat(<task>): complete Step N — ....review_step(step=N, type='code', baseline=<sha>).REVISEwith valid issues.STATUS.md, sees step N complete, sees the REVISE feedback, and cannot reconcile. Multiple iterations of file-reading +git status+git diffproduce no code changes. Eventually the orch'snoProgressLimit(default 3) kills the lane.The same alternative ordering is reachable from the current
templates/agents/task-worker.mdtext — nothing in the prompt forbids it.Concrete evidence (real run)
Batch
20260506T105850(taskplane@0.28.4 / source includes TP-184), failed after 45 minutes:0ae7f49 feat(MIG-002): complete Step 2 — demo-request bot-protection wiring + email limiter + tests.reviews/R006-code-step2.mdreturned REVISE (form_token verify must run before IP rate-limit per the task's check-ordering spec)events.jsonlshows the worker reading STATUS.md, PROMPT.md, demo-request.ts, plusgit status/git diffcalls. Zero code changes across 3 iterations. Each iteration ended with an exit-no-progress request.(Pattern:
retry_started → exit_intercepted → agent_exited → agent_started → prompt_sentrepeated for each no-progress iteration.)The same anti-pattern (mark-complete-before-final-APPROVE) recurred in the successful retry batch
20260506T131717on Step 3 — Step 3 was marked✅ CompleteinSTATUS.mdwhilereviewer-state.jsonshowedstatus: 'running', reviewType: 'code', reviewStep: 3. We got lucky: the in-flight review (R009) returned APPROVE. If R009 had returned REVISE, the death-spiral would have repeated exactly. So this isn't a one-off worker glitch — it's a structural pattern the prompt enables.Root cause
templates/agents/task-worker.md(around the "Handling verdicts" section) describes APPROVE/REVISE/RETHINK responses but assumes the worker hasn't yet finalised the step. There's no explicit ordering directive, and crucially no recovery recipe for "I marked this done but the reviewer wants a revision."Fix proposals
Option A (preferred) — explicit ordering rule + recovery recipe
Add to
templates/agents/task-worker.md:Option B — tool-level guard
Make
review_stepcheckSTATUS.mdfor the target step's completion mark. If marked complete:Recommendation
Ship both. Option A teaches the worker the right behaviour proactively; Option B catches the worker if it slips. The base prompt already includes other "MUST NOT" rules at this level of strictness (e.g., "Workers MUST NOT add, remove, or renumber steps during execution"), so an order-of-operations rule fits.
Scope of impact
This is the death-spiral trigger. Every taskplane batch with
Review Level ≥ 2is at risk; the failure is silent (no error message — just no progress) until the orch's safety mechanism kicks in 3 iterations later. The lost work in the failing batch is recoverable manually but not autonomously.Acceptance criteria
review_stepfollows the recovery recipe (revert STATUS, fix, re-review) without operator intervention.review_stepreturns the documented refusal message when STATUS shows the step complete.Related
Affected version:
taskplane@0.28.4(source includes TP-184). Reproducible repo + batch state available on request.