Skip to content

Worker spins indefinitely when code review returns REVISE on a step already marked complete in STATUS #537

@HenryLach

Description

@HenryLach

Summary

The worker prompt's review-handling protocol has a silent ordering assumption that, when violated, deadlocks the worker. Specifically: if the worker marks a step complete in STATUS.md and commits before calling review_step for that step, and the subsequent code review returns REVISE, the worker faces an unresolvable contradiction (STATUS says step is done, reviewer says it isn't) and spins until the orch's no-progress timeout kills the lane.

This is the single root cause of a complete batch failure observed in production use. Worth a P0 fix because there is no operator-side workaround beyond manual recovery — the worker cannot self-recover from this state.

Reproduction

  1. Configure a task with Review Level: 2 (Plan + Code).
  2. Worker implements step N.
  3. Worker writes [x] to all checkboxes for step N in STATUS.md and sets **Status:** ✅ Complete.
  4. Worker commits with feat(<task>): complete Step N — ....
  5. Worker calls review_step(step=N, type='code', baseline=<sha>).
  6. Reviewer returns REVISE with valid issues.
  7. Worker is now stuck — every subsequent iteration reads STATUS.md, sees step N complete, sees the REVISE feedback, and cannot reconcile. Multiple iterations of file-reading + git status + git diff produce no code changes. Eventually the orch's noProgressLimit (default 3) kills the lane.

The same alternative ordering is reachable from the current templates/agents/task-worker.md text — nothing in the prompt forbids it.

Concrete evidence (real run)

Batch 20260506T105850 (taskplane@0.28.4 / source includes TP-184), failed after 45 minutes:

  • Worker commit 0ae7f49 feat(MIG-002): complete Step 2 — demo-request bot-protection wiring + email limiter + tests
  • Then .reviews/R006-code-step2.md returned REVISE (form_token verify must run before IP rate-limit per the task's check-ordering spec)
  • Iterations 4 → 6: events.jsonl shows the worker reading STATUS.md, PROMPT.md, demo-request.ts, plus git status / git diff calls. Zero code changes across 3 iterations. Each iteration ended with an exit-no-progress request.
  • Lane killed at iteration 6 by the no-progress safety mechanism.
$ tail -10 .pi/runtime/20260506T105850/agents/orch-henrylach-lane-1-worker/events.jsonl | grep -oE '"type":"[^"]+"'
"type":"tool_call"
"type":"tool_result"
"type":"retry_started"
"type":"exit_intercepted"
"type":"agent_exited"
"type":"agent_started"
"type":"prompt_sent"
"type":"context_usage"
"type":"tool_result"
"type":"tool_call"

(Pattern: retry_started → exit_intercepted → agent_exited → agent_started → prompt_sent repeated for each no-progress iteration.)

The same anti-pattern (mark-complete-before-final-APPROVE) recurred in the successful retry batch 20260506T131717 on Step 3 — Step 3 was marked ✅ Complete in STATUS.md while reviewer-state.json showed status: 'running', reviewType: 'code', reviewStep: 3. We got lucky: the in-flight review (R009) returned APPROVE. If R009 had returned REVISE, the death-spiral would have repeated exactly. So this isn't a one-off worker glitch — it's a structural pattern the prompt enables.

Root cause

templates/agents/task-worker.md (around the "Handling verdicts" section) describes APPROVE/REVISE/RETHINK responses but assumes the worker hasn't yet finalised the step. There's no explicit ordering directive, and crucially no recovery recipe for "I marked this done but the reviewer wants a revision."

Fix proposals

Option A (preferred) — explicit ordering rule + recovery recipe

Add to templates/agents/task-worker.md:

## Order of operations for any step at Review Level ≥ 2

The order is mandatory:

1. **Implement** the step (write code, run targeted tests).
2. **Commit** the implementation: `feat(<TASK-ID>): complete Step N — ...`.
3. **Call `review_step(step=N, type='code', baseline=<HEAD-before-step-2>)`**.
4. **Handle the verdict:**
   - APPROVE → proceed to step 4.
   - REVISE → read the review file, fix the issues, commit (`fix(<TASK-ID>): address R<NNN> — ...`), re-call `review_step` with the same step + type. Repeat until APPROVE.
   - RETHINK → reconsider plan, adjust, re-implement.
5. **Only after final APPROVE:** update `STATUS.md` — set `[x]` on all checkboxes for this step and `**Status:** ✅ Complete`. Commit the STATUS update (`chore(<TASK-ID>): finalise STATUS.md for Step N`).
6. Move to step N+1.

**Forbidden:** marking checkboxes `[x]` or setting `Status: ✅ Complete` before step 4's final APPROVE. If you've already done this and a subsequent review returns REVISE: revert the STATUS update (un-check the boxes, set `Status: 🟨 In Progress`), commit the revert, then proceed with the normal REVISE-handling at step 4.

Option B — tool-level guard

Make review_step check STATUS.md for the target step's completion mark. If marked complete:

review_step refuses to run: Step N is already marked ✅ Complete in STATUS.md.
If you need to address review feedback on a completed step, first revert the
status update (un-check boxes, set Status: 🟨 In Progress), commit the revert,
then call review_step again.

Recommendation

Ship both. Option A teaches the worker the right behaviour proactively; Option B catches the worker if it slips. The base prompt already includes other "MUST NOT" rules at this level of strictness (e.g., "Workers MUST NOT add, remove, or renumber steps during execution"), so an order-of-operations rule fits.

Scope of impact

This is the death-spiral trigger. Every taskplane batch with Review Level ≥ 2 is at risk; the failure is silent (no error message — just no progress) until the orch's safety mechanism kicks in 3 iterations later. The lost work in the failing batch is recoverable manually but not autonomously.

Acceptance criteria

  • Test: a worker that marks STATUS complete + commits + then receives REVISE on review_step follows the recovery recipe (revert STATUS, fix, re-review) without operator intervention.
  • Test: a worker that does the order correctly (review before STATUS) is unchanged in behaviour.
  • If Option B is implemented: review_step returns the documented refusal message when STATUS shows the step complete.

Related

Affected version: taskplane@0.28.4 (source includes TP-184). Reproducible repo + batch state available on request.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions