Worker pattern: premature step-completion in STATUS before final review APPROVE

## Summary

Workers exhibit a recurring behaviour pattern of marking step checkboxes `[x]` and setting `**Status:** ✅ Complete` in `STATUS.md` *before* the final code-review verdict for that step has returned APPROVE. This pattern is the proximate trigger for the death-spiral in #537, but it's also observable in successful runs — it's "lucky" when the in-flight review returns APPROVE; "unlucky" when it returns REVISE.

This is closely related to #1 but worth tracking separately because the symptoms are different:
- **#1** is the *deadlock* that occurs when this pattern intersects with a REVISE verdict
- **This issue** is the *behavioural pattern* itself, observable across both succeeding and failing batches

Hardening against the pattern (regardless of whether REVISE eventually arrives) reduces a class of latent risk.

## Reproduction

Run any task at `Review Level: 2` with non-trivial implementation work (so the review takes more than a few seconds). Watch the worker's STATUS.md edits relative to its `review_step` calls.

Observation: the worker frequently:
1. Implements step N
2. Updates STATUS.md to mark all checkboxes `[x]` and set `Status: ✅ Complete`
3. Commits with `feat(<task>): complete Step N — ...`
4. Calls `review_step(step=N, type='code')`

Steps 2 and 3 commit the worker to a "this step is done" stance before step 4 has confirmed it.

## Concrete evidence

**Failed batch `20260506T105850`** (Step 2):

- `0ae7f49 feat(MIG-002): complete Step 2 — demo-request bot-protection wiring + email limiter + tests` ← STATUS marked complete here
- `R006-code-step2.md` returned REVISE *after* the commit
- Death-spiral followed (issue #537)

**Successful batch `20260506T131717`** (Step 3) — same pattern, different luck:

- Worker committed `bd1b09e feat(MIG-002): complete Step 3 — wire DemoForm to Turnstile + form-token`
- STATUS.md showed Step 3 ✅ Complete
- `.reviewer-state.json` showed `status: 'running', reviewType: 'code', reviewStep: 3`
- R009 happened to return APPROVE; if it had returned REVISE we'd have repeated the death-spiral
- This batch succeeded *despite* the same anti-pattern, not because of correct protocol

The pattern recurs because the current worker prompt's "Order of operations" guidance is implicit — the order is implied by the description of what each phase does, but no explicit rule forbids marking the step complete before the final verdict arrives.

## Root cause

The base task-worker prompt (`templates/agents/task-worker.md`) describes:
- "Hydrate STATUS.md with sub-checkboxes for the step's work"
- "Implement the step"
- "Run targeted tests"
- "Call `review_step(step=N, type='code')`"
- "Handle the verdict"

But doesn't explicitly say *when* the worker should set checkboxes to `[x]` or change `Status: ✅ Complete`. A reasonable interpretation is "mark complete when you're done implementing" — which is wrong (it should be "when the final code-review APPROVE has landed").

This is the same root cause as issue #537 but viewed from the *behavioural* angle rather than the *deadlock* angle. The fix proposed in #537 (explicit ordering rule + recovery recipe) also resolves this issue.

## Fix proposals

### A. Explicit ordering rule (mirrors #1)

Add to the base prompt:

```markdown
## When to mark a step complete

Set `[x]` on a checkbox the moment the underlying outcome is satisfied (e.g.,
"DemoForm.astro adds hidden form_token input" → tick once the input is
written and committed).

Set `**Status:** ✅ Complete` on a step ONLY after:

1. All checkboxes are `[x]`, AND
2. For Review Level ≥ 2: the final code-review for that step has returned APPROVE.

If you mark `Status: ✅ Complete` and a subsequent review returns REVISE,
revert the status update (set back to `🟨 In Progress`), commit the revert,
then handle the REVISE through the normal recipe.
```

### B. STATUS.md template hint

Update the STATUS.md template (in `references/prompt-template.md` or wherever) so each step has a textual note:

```markdown
### Step N: ...
**Status:** ⬜ Not Started

> Set Status to ✅ Complete only after the final code-review APPROVE.
> See worker prompt §"When to mark a step complete".

- [ ] ...
```

The visible reminder in the per-step block is a constant nudge.

### C. Reviewer pre-check

When the reviewer is invoked for `review_step(type='code', step=N)`, it can read STATUS.md and surface a `Pattern Violations` finding if the step is already marked complete:

> **Pattern Violations:** Step N is already marked `✅ Complete` in STATUS.md before this review's verdict was issued. Mark steps complete only after final APPROVE — see worker prompt §"When to mark a step complete". This is a process-only finding; it does not change the verdict on the implementation itself.

### Recommendation

Ship A + B. They reinforce each other: the prompt teaches; the template reminds. C is a nice-to-have observability layer that flags the issue when it slips through.

## Why this is P2 not P1

The pattern is observed in production but doesn't *always* cause failure — it only fails when the in-flight review returns REVISE. So the impact is "latent risk that occasionally manifests as #1's deadlock". Fixing #1 mitigates the worst case; fixing this issue removes the latent risk entirely.

## Acceptance criteria

- [ ] Worker prompt explicitly states when to mark a step `Status: ✅ Complete`.
- [ ] STATUS.md template includes a per-step reminder.
- [ ] Test: a worker that marks a step complete before the final code review's APPROVE shows the pattern violation in the review (if option C is implemented) or in a hydration linter.

## Related

- Issue #537 — the deadlock that this pattern triggers when REVISE arrives.
- Issue #538 — downstream UX failure when the deadlock kills a lane.
- Issue #541 — reviewer's inability to catch quality issues per step is what allows this pattern to look "safe" to the worker.

Affected version: `taskplane@0.28.4`. Two production batches (20260506T105850 failed, 20260506T131717 succeeded) both exhibit the pattern. Detailed event logs available on request.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Worker pattern: premature step-completion in STATUS before final review APPROVE #542

Summary

Reproduction

Concrete evidence

Root cause

Fix proposals

A. Explicit ordering rule (mirrors #1)

B. STATUS.md template hint

C. Reviewer pre-check

Recommendation

Why this is P2 not P1

Acceptance criteria

Related

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Worker pattern: premature step-completion in STATUS before final review APPROVE #542

Description

Summary

Reproduction

Concrete evidence

Root cause

Fix proposals

A. Explicit ordering rule (mirrors #1)

B. STATUS.md template hint

C. Reviewer pre-check

Recommendation

Why this is P2 not P1

Acceptance criteria

Related

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions