Skip to content

fix: reflect resume drops orchestrator @worker dispatch after restart#528

Merged
PureWeen merged 5 commits intomainfrom
shneuvil/fix-reflect-resume-drops-worker-dispatch
Apr 7, 2026
Merged

fix: reflect resume drops orchestrator @worker dispatch after restart#528
PureWeen merged 5 commits intomainfrom
shneuvil/fix-reflect-resume-drops-worker-dispatch

Conversation

@PureWeen
Copy link
Copy Markdown
Owner

@PureWeen PureWeen commented Apr 6, 2026

Summary

  • Reflect resume path dropped the orchestrator's @worker dispatch after app restart — fixed by using SendPromptAndWaitAsync + ParseTaskAssignments in MonitorAndSynthesizeAsync
  • availableWorkers in the reflect resume path included the orchestrator — fixed by filtering it out (matches both existing dispatch paths)
  • session.idle with a stale background-task payload kept sessions stuck in IsProcessing indefinitely — fixed by detecting when backgroundTasksChanged already confirmed empty before the idle arrived

Observed failure

Orchestrator dispatched @worker:Copilot Cli-worker-1 at 21:19:56 UTC, but fire-and-forget synthesis never parsed the response. Worker never received the task. Session timed out 30 min later.

PROMPT session: backgroundTasksChanged shells=0 fired, then session.idle shells=2 arrived (stale CLI snapshot). Fresh 60-min zombie clock started, session stuck in IsProcessing=true.

Test plan

  • MonitorAndSynthesize_ReflectResume_WaitsForOrchestratorResponse — verifies await + dispatch wiring
  • SessionIdle_StalePayload_NotDeferredWhenBgTasksAlreadyConfirmedEmpty — verifies staleness guard
  • Existing tests pass

When the app restarts mid-reflect-loop, ResumeOrchestrationIfPendingAsync
collects worker results and sends a synthesis prompt to the orchestrator.
For OrchestratorReflect groups, the orchestrator may respond with new
@worker blocks (it wants to keep iterating). The previous code used
SendPromptAsync (fire-and-forget), so that response was never read —
the @worker assignments were silently dropped and the loop stalled forever.

Fix: for reflect groups, use SendPromptAndWaitAsync to get the
orchestrator's response, parse it for @worker assignments, and if found
execute one more bounded dispatch+collect+synthesize round. The
OrchestratorCollectionTimeout and ForceCompleteProcessingAsync guards
match the normal reflect loop path.

Observed: orchestrator emitted @worker:Copilot Cli-worker-1 at 21:19:56
after resume synthesis; worker never received the message; loop stalled
until screen lock killed the connection at 21:23.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@PureWeen
Copy link
Copy Markdown
Owner Author

PureWeen commented Apr 6, 2026

🔍 Multi-Model Code Review — PR #528 (Re-review #3 — v3 sentinel fix)

PR: fix: reflect resume drops orchestrator @worker dispatch after restart
Branch: shneuvil/fix-reflect-resume-drops-worker-dispatch
Latest commit: 081399262 — fix: use string.Empty sentinel to distinguish confirmed-empty from never-seen in IDLE-DEFER
Tests: ✅ 122 targeted tests pass (local)
CI: ⚠️ No checks reported on the branch


Status of All Findings

# Finding Status
1 availableWorkers includes orchestrator — possible self-dispatch ✅ FIXED — .Where(m => m != pending.OrchestratorName) filter applied
2 Stale idle false-positive — null conflated "never seen" with "confirmed empty" ✅ FIXED — string.Empty sentinel in RefreshDeferredBackgroundTaskTracking; staleness check uses == string.Empty

v3 Sentinel Fix Verification

The fix correctly distinguishes three states of DeferredBackgroundTaskFingerprint:

  • null → initial/reset state (no backgroundTasksChanged has fired yet, or turn boundary cleared it)
  • string.EmptybackgroundTasksChanged explicitly confirmed zero tasks this turn
  • non-empty string → backgroundTasksChanged reported active tasks with this fingerprint

ClearDeferredIdleTracking (called at turn boundaries, abort, reconnect, etc.) correctly resets to null, preserving the sentinel design. Only RefreshDeferredBackgroundTaskTracking sets string.Empty when tasks are confirmed gone.

The staleness check: preIdleFingerprint == string.Empty && preIdleTicks == 0 && tracking.Snapshot.HasAny now correctly fires only when backgroundTasksChanged has explicitly confirmed empty — NOT on the first idle with genuine tasks (where fingerprint is still null).


Minor Observation (🟢)

🟢 MINOR — Assert.DoesNotContain("= null", method) test is fragile

File: BackgroundTasksIdleTests.cs, RefreshDeferredBackgroundTaskTracking_SetsEmptyStringSentinel_WhenTasksConfirmedGone

The assertion Assert.DoesNotContain("= null", method) matches any = null substring in the method, not specifically DeferredBackgroundTaskFingerprint = null. If a future change adds a legitimate null assignment for a different field in the same method, this test would false-fail.

Suggested fix: Use Assert.DoesNotContain("DeferredBackgroundTaskFingerprint = null", method) for precision.


Recommendation

Approve — All findings from all review rounds have been addressed. The reflect resume fix is correct, the orchestrator filter is in place, and the stale idle detection now properly distinguishes confirmed-empty from never-seen using the string.Empty sentinel. The one minor test fragility note is informational.

PureWeen and others added 2 commits April 6, 2026 17:23
…atch

Both existing dispatch paths (SendViaOrchestratorAsync, SendViaOrchestratorReflectAsync)
filter the orchestrator from the worker list. The new reflect-resume path in
MonitorAndSynthesizeAsync was missing this filter, allowing self-dispatch if the
orchestrator's response includes its own name in an @worker block.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… completion

Race condition in IDLE-DEFER: backgroundTasksChanged fires with shells=0 (PolyPilot
confirms all shells done) then a session.idle arrives with shells=2 (stale CLI snapshot
generated before completions landed). PolyPilot was starting a fresh 60-min zombie
clock on the stale payload, keeping IsProcessing=true indefinitely.

Fix: capture pre-idle fingerprint/ticks before calling RefreshDeferredBackgroundTaskTracking.
If they were null/0 (backgroundTasksChanged already confirmed empty), treat the idle
payload as stale and skip the defer. PolyPilot's real-time tracking is ground truth.

Observed in the PROMPT session: agents all completed (agents=0), backgroundTasksChanged
showed shells=0, but session.idle fired with shells=2 — session stuck in IsProcessing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PureWeen and others added 2 commits April 6, 2026 18:21
…ver-seen in IDLE-DEFER

null = never seen a backgroundTasksChanged event this turn (initial/reset state)
string.Empty = backgroundTasksChanged explicitly confirmed zero tasks
non-empty = backgroundTasksChanged reported active tasks with this fingerprint

The previous null check caused a false positive: when session.idle arrived
with genuine new tasks before backgroundTasksChanged had fired at all,
preIdleFingerprint==null && ticks==0 matched the same condition as confirmed-
empty, incorrectly treating the idle payload as stale and skipping the defer.
- BackgroundTasksIdleTests: update handler search to match variable
  binding pattern; replace CompareExchange assertion with
  RefreshDeferredBackgroundTaskTracking (logic moved there);
  anchor DoesNotContain to DeferredBackgroundTaskFingerprint field name
- MultiAgentRegressionTests: increase block size to 14000 chars
  to reach the reflect-resume code at the end of the method

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen PureWeen merged commit 42fb88d into main Apr 7, 2026
@PureWeen PureWeen deleted the shneuvil/fix-reflect-resume-drops-worker-dispatch branch April 7, 2026 00:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant