Skip to content

Worker pattern: premature step-completion in STATUS before final review APPROVE #542

@HenryLach

Description

@HenryLach

Summary

Workers exhibit a recurring behaviour pattern of marking step checkboxes [x] and setting **Status:** ✅ Complete in STATUS.md before the final code-review verdict for that step has returned APPROVE. This pattern is the proximate trigger for the death-spiral in #537, but it's also observable in successful runs — it's "lucky" when the in-flight review returns APPROVE; "unlucky" when it returns REVISE.

This is closely related to #1 but worth tracking separately because the symptoms are different:

Hardening against the pattern (regardless of whether REVISE eventually arrives) reduces a class of latent risk.

Reproduction

Run any task at Review Level: 2 with non-trivial implementation work (so the review takes more than a few seconds). Watch the worker's STATUS.md edits relative to its review_step calls.

Observation: the worker frequently:

  1. Implements step N
  2. Updates STATUS.md to mark all checkboxes [x] and set Status: ✅ Complete
  3. Commits with feat(<task>): complete Step N — ...
  4. Calls review_step(step=N, type='code')

Steps 2 and 3 commit the worker to a "this step is done" stance before step 4 has confirmed it.

Concrete evidence

Failed batch 20260506T105850 (Step 2):

Successful batch 20260506T131717 (Step 3) — same pattern, different luck:

  • Worker committed bd1b09e feat(MIG-002): complete Step 3 — wire DemoForm to Turnstile + form-token
  • STATUS.md showed Step 3 ✅ Complete
  • .reviewer-state.json showed status: 'running', reviewType: 'code', reviewStep: 3
  • R009 happened to return APPROVE; if it had returned REVISE we'd have repeated the death-spiral
  • This batch succeeded despite the same anti-pattern, not because of correct protocol

The pattern recurs because the current worker prompt's "Order of operations" guidance is implicit — the order is implied by the description of what each phase does, but no explicit rule forbids marking the step complete before the final verdict arrives.

Root cause

The base task-worker prompt (templates/agents/task-worker.md) describes:

  • "Hydrate STATUS.md with sub-checkboxes for the step's work"
  • "Implement the step"
  • "Run targeted tests"
  • "Call review_step(step=N, type='code')"
  • "Handle the verdict"

But doesn't explicitly say when the worker should set checkboxes to [x] or change Status: ✅ Complete. A reasonable interpretation is "mark complete when you're done implementing" — which is wrong (it should be "when the final code-review APPROVE has landed").

This is the same root cause as issue #537 but viewed from the behavioural angle rather than the deadlock angle. The fix proposed in #537 (explicit ordering rule + recovery recipe) also resolves this issue.

Fix proposals

A. Explicit ordering rule (mirrors #1)

Add to the base prompt:

## When to mark a step complete

Set `[x]` on a checkbox the moment the underlying outcome is satisfied (e.g.,
"DemoForm.astro adds hidden form_token input" → tick once the input is
written and committed).

Set `**Status:** ✅ Complete` on a step ONLY after:

1. All checkboxes are `[x]`, AND
2. For Review Level ≥ 2: the final code-review for that step has returned APPROVE.

If you mark `Status: ✅ Complete` and a subsequent review returns REVISE,
revert the status update (set back to `🟨 In Progress`), commit the revert,
then handle the REVISE through the normal recipe.

B. STATUS.md template hint

Update the STATUS.md template (in references/prompt-template.md or wherever) so each step has a textual note:

### Step N: ...
**Status:** ⬜ Not Started

> Set Status to ✅ Complete only after the final code-review APPROVE.
> See worker prompt §"When to mark a step complete".

- [ ] ...

The visible reminder in the per-step block is a constant nudge.

C. Reviewer pre-check

When the reviewer is invoked for review_step(type='code', step=N), it can read STATUS.md and surface a Pattern Violations finding if the step is already marked complete:

Pattern Violations: Step N is already marked ✅ Complete in STATUS.md before this review's verdict was issued. Mark steps complete only after final APPROVE — see worker prompt §"When to mark a step complete". This is a process-only finding; it does not change the verdict on the implementation itself.

Recommendation

Ship A + B. They reinforce each other: the prompt teaches; the template reminds. C is a nice-to-have observability layer that flags the issue when it slips through.

Why this is P2 not P1

The pattern is observed in production but doesn't always cause failure — it only fails when the in-flight review returns REVISE. So the impact is "latent risk that occasionally manifests as #1's deadlock". Fixing #1 mitigates the worst case; fixing this issue removes the latent risk entirely.

Acceptance criteria

  • Worker prompt explicitly states when to mark a step Status: ✅ Complete.
  • STATUS.md template includes a per-step reminder.
  • Test: a worker that marks a step complete before the final code review's APPROVE shows the pattern violation in the review (if option C is implemented) or in a hydration linter.

Related

Affected version: taskplane@0.28.4. Two production batches (20260506T105850 failed, 20260506T131717 succeeded) both exhibit the pattern. Detailed event logs available on request.

Metadata

Metadata

Assignees

No one assigned

    Labels

    area:templatesTemplates and scaffolding defaultsbugSomething isn't workingpriority:P2Medium

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions