Worker spins indefinitely when code review returns REVISE on a step already marked complete in STATUS

## Summary

The worker prompt's review-handling protocol has a silent ordering assumption that, when violated, deadlocks the worker. Specifically: if the worker marks a step complete in `STATUS.md` and commits *before* calling `review_step` for that step, and the subsequent code review returns `REVISE`, the worker faces an unresolvable contradiction (STATUS says step is done, reviewer says it isn't) and spins until the orch's no-progress timeout kills the lane.

This is the **single root cause** of a complete batch failure observed in production use. Worth a P0 fix because there is no operator-side workaround beyond manual recovery — the worker cannot self-recover from this state.

## Reproduction

1. Configure a task with `Review Level: 2` (Plan + Code).
2. Worker implements step N.
3. Worker writes `[x]` to all checkboxes for step N in `STATUS.md` and sets `**Status:** ✅ Complete`.
4. Worker commits with `feat(<task>): complete Step N — ...`.
5. Worker calls `review_step(step=N, type='code', baseline=<sha>)`.
6. Reviewer returns `REVISE` with valid issues.
7. Worker is now stuck — every subsequent iteration reads `STATUS.md`, sees step N complete, sees the REVISE feedback, and cannot reconcile. Multiple iterations of file-reading + `git status` + `git diff` produce no code changes. Eventually the orch's `noProgressLimit` (default 3) kills the lane.

The same alternative ordering is reachable from the current `templates/agents/task-worker.md` text — nothing in the prompt forbids it.

## Concrete evidence (real run)

Batch `20260506T105850` (taskplane@0.28.4 / source includes TP-184), failed after 45 minutes:

- Worker commit `0ae7f49 feat(MIG-002): complete Step 2 — demo-request bot-protection wiring + email limiter + tests`
- Then `.reviews/R006-code-step2.md` returned **REVISE** (form_token verify must run before IP rate-limit per the task's check-ordering spec)
- Iterations 4 → 6: `events.jsonl` shows the worker reading STATUS.md, PROMPT.md, demo-request.ts, plus `git status` / `git diff` calls. **Zero code changes across 3 iterations.** Each iteration ended with an exit-no-progress request.
- Lane killed at iteration 6 by the no-progress safety mechanism.

```
$ tail -10 .pi/runtime/20260506T105850/agents/orch-henrylach-lane-1-worker/events.jsonl | grep -oE '"type":"[^"]+"'
"type":"tool_call"
"type":"tool_result"
"type":"retry_started"
"type":"exit_intercepted"
"type":"agent_exited"
"type":"agent_started"
"type":"prompt_sent"
"type":"context_usage"
"type":"tool_result"
"type":"tool_call"
```

(Pattern: `retry_started → exit_intercepted → agent_exited → agent_started → prompt_sent` repeated for each no-progress iteration.)

The same anti-pattern (mark-complete-before-final-APPROVE) recurred in the **successful** retry batch `20260506T131717` on Step 3 — Step 3 was marked `✅ Complete` in `STATUS.md` while `reviewer-state.json` showed `status: 'running', reviewType: 'code', reviewStep: 3`. We got lucky: the in-flight review (`R009`) returned APPROVE. If R009 had returned REVISE, the death-spiral would have repeated exactly. So this isn't a one-off worker glitch — it's a structural pattern the prompt enables.

## Root cause

`templates/agents/task-worker.md` (around the "Handling verdicts" section) describes APPROVE/REVISE/RETHINK responses but assumes the worker hasn't yet finalised the step. There's no explicit ordering directive, and crucially no recovery recipe for "I marked this done but the reviewer wants a revision."

## Fix proposals

### Option A (preferred) — explicit ordering rule + recovery recipe

Add to `templates/agents/task-worker.md`:

```markdown
## Order of operations for any step at Review Level ≥ 2

The order is mandatory:

1. **Implement** the step (write code, run targeted tests).
2. **Commit** the implementation: `feat(<TASK-ID>): complete Step N — ...`.
3. **Call `review_step(step=N, type='code', baseline=<HEAD-before-step-2>)`**.
4. **Handle the verdict:**
   - APPROVE → proceed to step 4.
   - REVISE → read the review file, fix the issues, commit (`fix(<TASK-ID>): address R<NNN> — ...`), re-call `review_step` with the same step + type. Repeat until APPROVE.
   - RETHINK → reconsider plan, adjust, re-implement.
5. **Only after final APPROVE:** update `STATUS.md` — set `[x]` on all checkboxes for this step and `**Status:** ✅ Complete`. Commit the STATUS update (`chore(<TASK-ID>): finalise STATUS.md for Step N`).
6. Move to step N+1.

**Forbidden:** marking checkboxes `[x]` or setting `Status: ✅ Complete` before step 4's final APPROVE. If you've already done this and a subsequent review returns REVISE: revert the STATUS update (un-check the boxes, set `Status: 🟨 In Progress`), commit the revert, then proceed with the normal REVISE-handling at step 4.
```

### Option B — tool-level guard

Make `review_step` check `STATUS.md` for the target step's completion mark. If marked complete:

```
review_step refuses to run: Step N is already marked ✅ Complete in STATUS.md.
If you need to address review feedback on a completed step, first revert the
status update (un-check boxes, set Status: 🟨 In Progress), commit the revert,
then call review_step again.
```

### Recommendation

Ship **both**. Option A teaches the worker the right behaviour proactively; Option B catches the worker if it slips. The base prompt already includes other "MUST NOT" rules at this level of strictness (e.g., "Workers MUST NOT add, remove, or renumber steps during execution"), so an order-of-operations rule fits.

## Scope of impact

This is the death-spiral trigger. Every taskplane batch with `Review Level ≥ 2` is at risk; the failure is silent (no error message — just no progress) until the orch's safety mechanism kicks in 3 iterations later. The lost work in the failing batch is recoverable manually but not autonomously.

## Acceptance criteria

- [ ] Test: a worker that marks STATUS complete + commits + then receives REVISE on `review_step` follows the recovery recipe (revert STATUS, fix, re-review) without operator intervention.
- [ ] Test: a worker that does the order correctly (review before STATUS) is unchanged in behaviour.
- [ ] If Option B is implemented: `review_step` returns the documented refusal message when STATUS shows the step complete.

## Related

- This and #6 (premature step-completion as a recurring pattern) share a root cause.
- The cascading zombie alerts that followed the death-spiral are tracked in #538.

Affected version: `taskplane@0.28.4` (source includes TP-184). Reproducible repo + batch state available on request.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Worker spins indefinitely when code review returns REVISE on a step already marked complete in STATUS #537

Summary

Reproduction

Concrete evidence (real run)

Root cause

Fix proposals

Option A (preferred) — explicit ordering rule + recovery recipe

Option B — tool-level guard

Recommendation

Scope of impact

Acceptance criteria

Related

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Worker spins indefinitely when code review returns REVISE on a step already marked complete in STATUS #537

Description

Summary

Reproduction

Concrete evidence (real run)

Root cause

Fix proposals

Option A (preferred) — explicit ordering rule + recovery recipe

Option B — tool-level guard

Recommendation

Scope of impact

Acceptance criteria

Related

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions