Harden .DONE authority for multi-segment tasks — monitor, resume, and discovery guards #462

@HenryLach

Context

TP-145 and its follow-up fixes added four layers of defense against a premature .DONE in multi-segment tasks: pre-segment deletion, the worker prompt, post-segment deletion, and expansion deletion. These cover the practical case, but Sage's review identified deeper architectural guards that are still needed.

Remaining recommendations from Sage review

1. Monitor guard

In resolveTaskMonitorState (execution.ts), do not treat .DONE as a success signal while the active segment is known to be non-final. Currently .DONE is the highest-priority signal and causes the terminal state to be cached. The monitor should require an explicit final-segment signal from the lane-runner snapshot before treating .DONE as authoritative.
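
A minimal sketch of what this guard could look like, written as a pure function so it is easy to test. The type names, the `segmentCount` and `isFinalSegment` fields, and the function name are all hypothetical; the real shapes in execution.ts and the lane-runner snapshot may differ.

```typescript
// Hypothetical types: illustrative only, not the actual execution.ts shapes.
type MonitorState = "running" | "succeeded";

interface LaneSnapshotSketch {
  // Assumed field: true only when the lane-runner reports the final segment.
  isFinalSegment: boolean;
}

interface MonitorInputSketch {
  doneMarkerExists: boolean;
  // Assumed field: 1 for tasks that are not segmented.
  segmentCount: number;
  // null when no lane-runner snapshot is available yet.
  snapshot: LaneSnapshotSketch | null;
}

function resolveMonitorStateSketch(input: MonitorInputSketch): MonitorState {
  if (!input.doneMarkerExists) {
    return "running";
  }
  // Single-segment tasks keep the existing behavior: .DONE is authoritative.
  if (input.segmentCount <= 1) {
    return "succeeded";
  }
  // Multi-segment tasks: require an explicit final-segment signal.
  // A missing snapshot is treated as "not final", so a transient .DONE
  // never becomes a terminal state that the caller might cache.
  return input.snapshot?.isFinalSegment === true ? "succeeded" : "running";
}
```

Under this sketch, a .DONE that appears during a non-final segment resolves to "running", so the caller has no terminal state to cache.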

2. Resume guard

In collectDoneTaskIdsForResume and reconciliation (resume.ts), do not accept .DONE as authoritative for multi-segment tasks whose segment frontier is incomplete. Optionally, auto-remove the stale marker and raise an alert.
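
One way the resume-side check might be structured, assuming resume state can be reduced to a per-task list of segment statuses. The record shape, the status values, and the function name are assumptions for illustration, not the actual resume.ts API.

```typescript
// Hypothetical shapes: the persisted state used by resume.ts may differ.
type SegmentStatusSketch = "pending" | "running" | "complete";

interface ResumeTaskRecordSketch {
  taskId: string;
  doneMarkerExists: boolean;
  // Assumed field: one status per planned segment, in order.
  segments: SegmentStatusSketch[];
}

interface ResumeDecisionSketch {
  // Tasks whose .DONE is accepted as authoritative.
  doneTaskIds: string[];
  // Tasks whose .DONE contradicts the segment frontier (candidates for
  // auto-removal and an alert).
  staleMarkerTaskIds: string[];
}

function collectDoneTaskIdsSketch(tasks: ResumeTaskRecordSketch[]): ResumeDecisionSketch {
  const doneTaskIds: string[] = [];
  const staleMarkerTaskIds: string[] = [];

  for (const task of tasks) {
    if (!task.doneMarkerExists) {
      continue;
    }
    const isMultiSegment = task.segments.length > 1;
    const frontierComplete = task.segments.every((s) => s === "complete");

    if (isMultiSegment && !frontierComplete) {
      // .DONE exists but segments remain: do not mark the task complete.
      staleMarkerTaskIds.push(task.taskId);
      continue;
    }
    doneTaskIds.push(task.taskId);
  }

  return { doneTaskIds, staleMarkerTaskIds };
}
```

Returning the stale IDs separately leaves the choice between auto-removal and alert-only to the caller.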

3. Discovery safeguard

Add a sanity check in discovery.ts before skipping tasks that have .DONE in segmented contexts. At minimum, emit a doctor warning when .DONE is inconsistent with segment state.
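
A rough sketch of such a consistency check, assuming discovery can see how many segments a task has and how many are complete. The field names, the warning text, and the choice to not skip an inconsistent task are assumptions; a warning-only variant would return `skip: true` alongside the warning.

```typescript
// Hypothetical shapes: discovery.ts may expose segment state differently.
interface DiscoveredTaskSketch {
  taskId: string;
  doneMarkerExists: boolean;
  // Assumed fields describing persisted segment progress.
  completedSegments: number;
  totalSegments: number;
}

interface DiscoveryCheckSketch {
  // Whether discovery should skip the task as already done.
  skip: boolean;
  // Doctor-style warning, or null when state is consistent.
  warning: string | null;
}

function checkDoneConsistencySketch(task: DiscoveredTaskSketch): DiscoveryCheckSketch {
  if (!task.doneMarkerExists) {
    return { skip: false, warning: null };
  }
  const segmented = task.totalSegments > 1;
  const allSegmentsComplete = task.completedSegments >= task.totalSegments;

  if (segmented && !allSegmentsComplete) {
    // .DONE contradicts segment state: surface it instead of silently skipping.
    return {
      skip: false,
      warning:
        `Task ${task.taskId}: .DONE present but only ` +
        `${task.completedSegments}/${task.totalSegments} segments complete`,
    };
  }
  return { skip: true, warning: null };
}
```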

4. Tests needed

  • Non-final unlink failure path
  • Transient .DONE monitor race (non-final should not go terminal)
  • Resume with .DONE plus incomplete frontier (should re-execute, not mark complete)
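
For the unlink failure path, a test could inject a failing unlink function rather than rely on real filesystem permissions. The sketch below assumes the cleanup step can take the unlink operation as a parameter; the helper name, the result shape, and the marker path are hypothetical.

```typescript
// Hypothetical seam: the real cleanup code may call the filesystem directly.
type UnlinkSketch = (path: string) => void;

interface CleanupResultSketch {
  removed: boolean;
  // Error message when removal failed, otherwise null.
  error: string | null;
}

// Attempts to delete a .DONE marker and reports failure instead of throwing,
// so the caller can keep treating the non-final segment as non-terminal.
function removeDoneMarkerSketch(path: string, unlink: UnlinkSketch): CleanupResultSketch {
  try {
    unlink(path);
    return { removed: true, error: null };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { removed: false, error: message };
  }
}

// Test double that simulates a permission failure on unlink.
const failingUnlinkSketch: UnlinkSketch = () => {
  throw new Error("EPERM: operation not permitted");
};

// Exercises the failure path with a made-up marker path.
function runUnlinkFailureCaseSketch(): CleanupResultSketch {
  return removeDoneMarkerSketch("tasks/example/.DONE", failingUnlinkSketch);
}
```

A full test would then assert that, with the marker still on disk, the monitor and resume guards do not report the task as complete.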

Priority

Medium. The current four-layer defense handles the common case; these guards cover edge cases (unlink failure, race timing, a crash between .DONE creation and deletion).

Labels: enhancement
