Skip to content

feat: point-person-led completion pipeline#552

Merged
khaliqgant merged 15 commits intomainfrom
feat/point-person-completion-isolated
Mar 13, 2026
Merged

feat: point-person-led completion pipeline#552
khaliqgant merged 15 commits intomainfrom
feat/point-person-completion-isolated

Conversation

@khaliqgant
Copy link
Copy Markdown
Member

@khaliqgant khaliqgant commented Mar 12, 2026

Summary

  • Replace hard STEP_COMPLETE marker assertion with evidence-based completion pipeline
  • Pipeline: verification → OWNER_DECISION → legacy marker → evidence fallback → hard fail
  • Add channel evidence plumbing with getStepCompletionEvidence() accessor
  • Tolerant review parsing (strict → tolerant → evidence-backed)
  • New types: WorkflowOwnerDecision, WorkflowStepCompletionReason, StepCompletionEvidence
  • 43 new tests, 122 total all green
  • Updated workflow skill and README docs

Test plan

  • cd packages/sdk && npm exec -- vitest run — 122 tests pass
  • Manual: Codex lead/worker repro (previously failed on malformed review)
  • Manual: Gemini lead/worker repro (previously failed on missing STEP_COMPLETE)
  • Manual: Legacy marker-based workflow unchanged

🤖 Generated with Claude Code


Open with Devin

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.

Open in Devin Review

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 3 new potential issues.

⚠️ 3 issues in files not directly in the diff

⚠️ Missing completed_by_process_exit case in buildStepCompletionDecision switch (packages/sdk/src/workflows/runner.ts:934-935)

The buildStepCompletionDecision switch statement at packages/sdk/src/workflows/runner.ts:918-936 handles completed_verified, completed_by_evidence, and completed_by_owner_decision, but has no case for the new completed_by_process_exit reason introduced in this same PR. When a step completes via tryProcessExitFallback (packages/sdk/src/workflows/runner.ts:4144), finalizeStepEvidence calls buildStepCompletionDecision which falls through to default: return undefined. This means no StepCompletionDecision is produced, so:

  • The trajectory never records a completion-evidence event for process-exit completions
  • collectOutcomes at packages/sdk/src/workflows/runner.ts:5749-5751 sets completionMode: undefined instead of a meaningful value
  • The stepCompleted trajectory event at line 3410 shows no mode suffix

⚠️ completionGracePeriodMs value ignored — acts as boolean toggle, not a timed wait (packages/sdk/src/workflows/runner.ts:4121-4122)

The tryProcessExitFallback method reads completionGracePeriodMs at packages/sdk/src/workflows/runner.ts:4121 but only checks whether the value is 0 (disabled) at line 4122. The actual numeric value (e.g., 5000ms, 30000ms) is never used for any timer or delay — the method proceeds to check evidence synchronously and returns immediately. However, the documentation in SKILL.md:512 states "The runner waits a configurable grace period" and schema.json:134 describes it as "Grace period (ms) after an agent exits... During this window the runner checks verification gates and evidence." Users setting completionGracePeriodMs: 30000 would expect a 30-second wait for late-arriving evidence but would get no wait at all — identical behavior to the default 5000.


⚠️ stepSignalParticipants map never cleared between workflow runs (packages/sdk/src/workflows/runner.ts:1943-1944)

The stepSignalParticipants map (introduced at packages/sdk/src/workflows/runner.ts:355) is populated via initializeStepSignalParticipants during step execution but is never cleared — neither at the start of runWorkflowCore (lines 1943-1944, where runtimeStepAgents and stepCompletionEvidence are cleared) nor in the finally cleanup block (lines 2303-2339, where supervisedRuntimeAgents, runtimeStepAgents, and other maps are cleared). For WorkflowRunner instances reused across multiple execute() calls, stale participant data from previous runs persists, potentially causing isSignalFromExpectedSender to incorrectly filter signals if step names are reused across runs.

View 9 additional findings in Devin Review.

Open in Devin Review

khaliqgant and others added 6 commits March 12, 2026 15:36
Replace brittle marker-based step completion with evidence-based
completion decision pipeline. Steps now complete via:
verification → OWNER_DECISION → legacy marker → evidence fallback.

- Add WorkflowOwnerDecision and WorkflowStepCompletionReason types
- Refactor assertOwnerCompletionMarker into completion decision pipeline
- Implement tolerant review parsing (strict → tolerant → evidence)
- Add channel evidence plumbing with getStepCompletionEvidence() accessor
- Update trajectory recording for evidence-based completions
- Add 43 new tests (122 total, all green)
- Update workflow authoring skill and README docs

Resolves Codex lead/worker (malformed REVIEW_DECISION) and
Gemini lead/worker (missing STEP_COMPLETE) failure modes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
COMPLIANCE REDUCTION SUMMARY (Items D+E):

Runner logic fixes:
- Add `tryProcessExitFallback()` — when agent exits with code 0 but posts
  no coordination signal, the runner checks verification gates and infers
  completion with reason `completed_by_process_exit`
- Add `completionGracePeriodMs` swarm config (default 5s, 0 to disable)
- Enhance `spawnAndWait` timeout handling — run verification on timeout
  before failing, recovering steps where agent completed work but failed
  to self-terminate
- Accept process exit code 0 as evidence signal in
  `judgeOwnerCompletionByEvidence()` alongside WORKER_DONE/LEAD_DONE
- Add `completed_by_process_exit` to WorkflowStepCompletionReason type

Prompting/skill fixes:
- Document 5-tier completion resolution system (explicit decision →
  legacy marker → verification → evidence → process-exit fallback)
- Document which signals are required vs optional (none are mandatory)
- Add "Robust Coordination Best Practices" section with completion
  strategy by step type
- Add configuration docs for completionGracePeriodMs

Both (runner + prompting):
- Schema updated with completionGracePeriodMs field
- Completion strategy guidance aligned with actual runner behavior

Tests: 4 new tests for process-exit fallback (142 total, all passing)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@khaliqgant khaliqgant force-pushed the feat/point-person-completion-isolated branch from d0990bc to 2af61e5 Compare March 12, 2026 14:37
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 4 new potential issues.

⚠️ 4 issues in files not directly in the diff

⚠️ buildStepCompletionDecision missing case for completed_by_process_exit (packages/sdk/src/workflows/runner.ts:934-935)

The buildStepCompletionDecision switch statement at packages/sdk/src/workflows/runner.ts:918-935 handles completed_verified, completed_by_evidence, and completed_by_owner_decision, but the newly added completed_by_process_exit reason falls through to default: return undefined. This means when a step completes via the process-exit fallback (tryProcessExitFallback at packages/sdk/src/workflows/runner.ts:4145), three things silently fail:

  1. Trajectory recordingfinalizeStepEvidence at line 691-695 calls buildStepCompletionDecision, gets undefined, and skips the stepCompletionDecision trajectory event entirely.
  2. Batch outcome completionMode — at line 2461-2462, the outcome's completionMode is set via buildStepCompletionDecision(...)?.mode which evaluates to undefined.
  3. Final outcome completionMode — at line 5749-5750, same issue in collectOutcomes.

The StepCompletionMode type already includes a 'heuristic' value (at packages/sdk/src/workflows/types.ts:426) that semantically fits process-exit inference.


⚠️ Missing completed_by_process_exit case in buildStepCompletionDecision switch (packages/sdk/src/workflows/runner.ts:934-935)

The buildStepCompletionDecision switch statement at packages/sdk/src/workflows/runner.ts:918-936 handles completed_verified, completed_by_evidence, and completed_by_owner_decision, but has no case for the new completed_by_process_exit reason introduced in this same PR. When a step completes via tryProcessExitFallback (packages/sdk/src/workflows/runner.ts:4144), finalizeStepEvidence calls buildStepCompletionDecision which falls through to default: return undefined. This means no StepCompletionDecision is produced, so:

  • The trajectory never records a completion-evidence event for process-exit completions
  • collectOutcomes at packages/sdk/src/workflows/runner.ts:5749-5751 sets completionMode: undefined instead of a meaningful value
  • The stepCompleted trajectory event at line 3410 shows no mode suffix

⚠️ completionGracePeriodMs value ignored — acts as boolean toggle, not a timed wait (packages/sdk/src/workflows/runner.ts:4121-4122)

The tryProcessExitFallback method reads completionGracePeriodMs at packages/sdk/src/workflows/runner.ts:4121 but only checks whether the value is 0 (disabled) at line 4122. The actual numeric value (e.g., 5000ms, 30000ms) is never used for any timer or delay — the method proceeds to check evidence synchronously and returns immediately. However, the documentation in SKILL.md:512 states "The runner waits a configurable grace period" and schema.json:134 describes it as "Grace period (ms) after an agent exits... During this window the runner checks verification gates and evidence." Users setting completionGracePeriodMs: 30000 would expect a 30-second wait for late-arriving evidence but would get no wait at all — identical behavior to the default 5000.


⚠️ stepSignalParticipants map never cleared between workflow runs (packages/sdk/src/workflows/runner.ts:1943-1944)

The stepSignalParticipants map (introduced at packages/sdk/src/workflows/runner.ts:355) is populated via initializeStepSignalParticipants during step execution but is never cleared — neither at the start of runWorkflowCore (lines 1943-1944, where runtimeStepAgents and stepCompletionEvidence are cleared) nor in the finally cleanup block (lines 2303-2339, where supervisedRuntimeAgents, runtimeStepAgents, and other maps are cleared). For WorkflowRunner instances reused across multiple execute() calls, stale participant data from previous runs persists, potentially causing isSignalFromExpectedSender to incorrectly filter signals if step names are reused across runs.

View 13 additional findings in Devin Review.

Open in Devin Review

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 4 new potential issues.

⚠️ 4 issues in files not directly in the diff

⚠️ buildStepCompletionDecision missing case for completed_by_process_exit (packages/sdk/src/workflows/runner.ts:934-935)

The buildStepCompletionDecision switch statement at packages/sdk/src/workflows/runner.ts:918-935 handles completed_verified, completed_by_evidence, and completed_by_owner_decision, but the newly added completed_by_process_exit reason falls through to default: return undefined. This means when a step completes via the process-exit fallback (tryProcessExitFallback at packages/sdk/src/workflows/runner.ts:4145), three things silently fail:

  1. Trajectory recordingfinalizeStepEvidence at line 691-695 calls buildStepCompletionDecision, gets undefined, and skips the stepCompletionDecision trajectory event entirely.
  2. Batch outcome completionMode — at line 2461-2462, the outcome's completionMode is set via buildStepCompletionDecision(...)?.mode which evaluates to undefined.
  3. Final outcome completionMode — at line 5749-5750, same issue in collectOutcomes.

The StepCompletionMode type already includes a 'heuristic' value (at packages/sdk/src/workflows/types.ts:426) that semantically fits process-exit inference.


⚠️ Missing completed_by_process_exit case in buildStepCompletionDecision switch (packages/sdk/src/workflows/runner.ts:934-935)

The buildStepCompletionDecision switch statement at packages/sdk/src/workflows/runner.ts:918-936 handles completed_verified, completed_by_evidence, and completed_by_owner_decision, but has no case for the new completed_by_process_exit reason introduced in this same PR. When a step completes via tryProcessExitFallback (packages/sdk/src/workflows/runner.ts:4144), finalizeStepEvidence calls buildStepCompletionDecision which falls through to default: return undefined. This means no StepCompletionDecision is produced, so:

  • The trajectory never records a completion-evidence event for process-exit completions
  • collectOutcomes at packages/sdk/src/workflows/runner.ts:5749-5751 sets completionMode: undefined instead of a meaningful value
  • The stepCompleted trajectory event at line 3410 shows no mode suffix

⚠️ completionGracePeriodMs value ignored — acts as boolean toggle, not a timed wait (packages/sdk/src/workflows/runner.ts:4121-4122)

The tryProcessExitFallback method reads completionGracePeriodMs at packages/sdk/src/workflows/runner.ts:4121 but only checks whether the value is 0 (disabled) at line 4122. The actual numeric value (e.g., 5000ms, 30000ms) is never used for any timer or delay — the method proceeds to check evidence synchronously and returns immediately. However, the documentation in SKILL.md:512 states "The runner waits a configurable grace period" and schema.json:134 describes it as "Grace period (ms) after an agent exits... During this window the runner checks verification gates and evidence." Users setting completionGracePeriodMs: 30000 would expect a 30-second wait for late-arriving evidence but would get no wait at all — identical behavior to the default 5000.


⚠️ stepSignalParticipants map never cleared between workflow runs (packages/sdk/src/workflows/runner.ts:1943-1944)

The stepSignalParticipants map (introduced at packages/sdk/src/workflows/runner.ts:355) is populated via initializeStepSignalParticipants during step execution but is never cleared — neither at the start of runWorkflowCore (lines 1943-1944, where runtimeStepAgents and stepCompletionEvidence are cleared) nor in the finally cleanup block (lines 2303-2339, where supervisedRuntimeAgents, runtimeStepAgents, and other maps are cleared). For WorkflowRunner instances reused across multiple execute() calls, stale participant data from previous runs persists, potentially causing isSignalFromExpectedSender to incorrectly filter signals if step names are reused across runs.

View 15 additional findings in Devin Review.

Open in Devin Review

…oting and fallback paths

Bug 1: parseOwnerDecision last-match heuristic picked COMPLETE from
re-quoted template text when the agent referenced the decision format
after stating INCOMPLETE_RETRY. Fixed by filtering matches that appear
on template-format lines (containing pipe-separated options).

Bug 2: judgeOwnerCompletionByEvidence and tryProcessExitFallback did
not check for explicit retry/fail signals in raw output. A clean exit
code or positive language could override an INCOMPLETE_RETRY. Fixed by
adding guards that refuse to infer completion when INCOMPLETE_RETRY,
INCOMPLETE_FAIL, or NEEDS_CLARIFICATION appears in raw output.

Also changed template format from pipe-separated to comma-separated
to prevent the template text from matching the decision regex.

4 regression tests added, 146 tests passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 3 new potential issues.

⚠️ 3 issues in files not directly in the diff

⚠️ buildStepCompletionDecision missing case for completed_by_process_exit (packages/sdk/src/workflows/runner.ts:934-935)

The buildStepCompletionDecision switch statement at packages/sdk/src/workflows/runner.ts:918-935 handles completed_verified, completed_by_evidence, and completed_by_owner_decision, but the newly added completed_by_process_exit reason falls through to default: return undefined. This means when a step completes via the process-exit fallback (tryProcessExitFallback at packages/sdk/src/workflows/runner.ts:4145), three things silently fail:

  1. Trajectory recordingfinalizeStepEvidence at line 691-695 calls buildStepCompletionDecision, gets undefined, and skips the stepCompletionDecision trajectory event entirely.
  2. Batch outcome completionMode — at line 2461-2462, the outcome's completionMode is set via buildStepCompletionDecision(...)?.mode which evaluates to undefined.
  3. Final outcome completionMode — at line 5749-5750, same issue in collectOutcomes.

The StepCompletionMode type already includes a 'heuristic' value (at packages/sdk/src/workflows/types.ts:426) that semantically fits process-exit inference.


⚠️ completionGracePeriodMs value ignored — acts as boolean toggle, not a timed wait (packages/sdk/src/workflows/runner.ts:4121-4122)

The tryProcessExitFallback method reads completionGracePeriodMs at packages/sdk/src/workflows/runner.ts:4121 but only checks whether the value is 0 (disabled) at line 4122. The actual numeric value (e.g., 5000ms, 30000ms) is never used for any timer or delay — the method proceeds to check evidence synchronously and returns immediately. However, the documentation in SKILL.md:512 states "The runner waits a configurable grace period" and schema.json:134 describes it as "Grace period (ms) after an agent exits... During this window the runner checks verification gates and evidence." Users setting completionGracePeriodMs: 30000 would expect a 30-second wait for late-arriving evidence but would get no wait at all — identical behavior to the default 5000.


⚠️ stepSignalParticipants map never cleared between workflow runs (packages/sdk/src/workflows/runner.ts:1943-1944)

The stepSignalParticipants map (introduced at packages/sdk/src/workflows/runner.ts:355) is populated via initializeStepSignalParticipants during step execution but is never cleared — neither at the start of runWorkflowCore (lines 1943-1944, where runtimeStepAgents and stepCompletionEvidence are cleared) nor in the finally cleanup block (lines 2303-2339, where supervisedRuntimeAgents, runtimeStepAgents, and other maps are cleared). For WorkflowRunner instances reused across multiple execute() calls, stale participant data from previous runs persists, potentially causing isSignalFromExpectedSender to incorrectly filter signals if step names are reused across runs.

View 20 additional findings in Devin Review.

Open in Devin Review

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 3 new potential issues.

⚠️ 3 issues in files not directly in the diff

⚠️ buildStepCompletionDecision missing case for completed_by_process_exit (packages/sdk/src/workflows/runner.ts:934-935)

The buildStepCompletionDecision switch statement at packages/sdk/src/workflows/runner.ts:918-935 handles completed_verified, completed_by_evidence, and completed_by_owner_decision, but the newly added completed_by_process_exit reason falls through to default: return undefined. This means when a step completes via the process-exit fallback (tryProcessExitFallback at packages/sdk/src/workflows/runner.ts:4145), three things silently fail:

  1. Trajectory recordingfinalizeStepEvidence at line 691-695 calls buildStepCompletionDecision, gets undefined, and skips the stepCompletionDecision trajectory event entirely.
  2. Batch outcome completionMode — at line 2461-2462, the outcome's completionMode is set via buildStepCompletionDecision(...)?.mode which evaluates to undefined.
  3. Final outcome completionMode — at line 5749-5750, same issue in collectOutcomes.

The StepCompletionMode type already includes a 'heuristic' value (at packages/sdk/src/workflows/types.ts:426) that semantically fits process-exit inference.


⚠️ completionGracePeriodMs value ignored — acts as boolean toggle, not a timed wait (packages/sdk/src/workflows/runner.ts:4136-4137)

The tryProcessExitFallback method reads completionGracePeriodMs at packages/sdk/src/workflows/runner.ts:4121 but only checks whether the value is 0 (disabled) at line 4122. The actual numeric value (e.g., 5000ms, 30000ms) is never used for any timer or delay — the method proceeds to check evidence synchronously and returns immediately. However, the documentation in SKILL.md:512 states "The runner waits a configurable grace period" and schema.json:134 describes it as "Grace period (ms) after an agent exits... During this window the runner checks verification gates and evidence." Users setting completionGracePeriodMs: 30000 would expect a 30-second wait for late-arriving evidence but would get no wait at all — identical behavior to the default 5000.


⚠️ stepSignalParticipants map never cleared between workflow runs (packages/sdk/src/workflows/runner.ts:1943-1944)

The stepSignalParticipants map (introduced at packages/sdk/src/workflows/runner.ts:355) is populated via initializeStepSignalParticipants during step execution but is never cleared — neither at the start of runWorkflowCore (lines 1943-1944, where runtimeStepAgents and stepCompletionEvidence are cleared) nor in the finally cleanup block (lines 2303-2339, where supervisedRuntimeAgents, runtimeStepAgents, and other maps are cleared). For WorkflowRunner instances reused across multiple execute() calls, stale participant data from previous runs persists, potentially causing isSignalFromExpectedSender to incorrectly filter signals if step names are reused across runs.

View 21 additional findings in Devin Review.

Open in Devin Review

khaliqgant and others added 6 commits March 13, 2026 12:10
Brings in validation workflows (legacy-marker, map-reduce, supervisor,
wrong-sender) from pr552-full-e2e and codex/gemini lead-worker,
map-reduce, supervisor test workflows.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
@khaliqgant khaliqgant merged commit 9e38cc2 into main Mar 13, 2026
1 check passed
@khaliqgant khaliqgant deleted the feat/point-person-completion-isolated branch March 13, 2026 11:31
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 3 new potential issues.

⚠️ 3 issues in files not directly in the diff

⚠️ buildStepCompletionDecision missing case for completed_by_process_exit (packages/sdk/src/workflows/runner.ts:934-935)

The buildStepCompletionDecision switch statement at packages/sdk/src/workflows/runner.ts:918-935 handles completed_verified, completed_by_evidence, and completed_by_owner_decision, but the newly added completed_by_process_exit reason falls through to default: return undefined. This means when a step completes via the process-exit fallback (tryProcessExitFallback at packages/sdk/src/workflows/runner.ts:4145), three things silently fail:

  1. Trajectory recordingfinalizeStepEvidence at line 691-695 calls buildStepCompletionDecision, gets undefined, and skips the stepCompletionDecision trajectory event entirely.
  2. Batch outcome completionMode — at line 2461-2462, the outcome's completionMode is set via buildStepCompletionDecision(...)?.mode which evaluates to undefined.
  3. Final outcome completionMode — at line 5749-5750, same issue in collectOutcomes.

The StepCompletionMode type already includes a 'heuristic' value (at packages/sdk/src/workflows/types.ts:426) that semantically fits process-exit inference.


⚠️ completionGracePeriodMs value ignored — acts as boolean toggle, not a timed wait (packages/sdk/src/workflows/runner.ts:4136-4137)

The tryProcessExitFallback method reads completionGracePeriodMs at packages/sdk/src/workflows/runner.ts:4121 but only checks whether the value is 0 (disabled) at line 4122. The actual numeric value (e.g., 5000ms, 30000ms) is never used for any timer or delay — the method proceeds to check evidence synchronously and returns immediately. However, the documentation in SKILL.md:512 states "The runner waits a configurable grace period" and schema.json:134 describes it as "Grace period (ms) after an agent exits... During this window the runner checks verification gates and evidence." Users setting completionGracePeriodMs: 30000 would expect a 30-second wait for late-arriving evidence but would get no wait at all — identical behavior to the default 5000.


⚠️ stepSignalParticipants map never cleared between workflow runs (packages/sdk/src/workflows/runner.ts:1943-1944)

The stepSignalParticipants map (introduced at packages/sdk/src/workflows/runner.ts:355) is populated via initializeStepSignalParticipants during step execution but is never cleared — neither at the start of runWorkflowCore (lines 1943-1944, where runtimeStepAgents and stepCompletionEvidence are cleared) nor in the finally cleanup block (lines 2303-2339, where supervisedRuntimeAgents, runtimeStepAgents, and other maps are cleared). For WorkflowRunner instances reused across multiple execute() calls, stale participant data from previous runs persists, potentially causing isSignalFromExpectedSender to incorrectly filter signals if step names are reused across runs.

View 28 additional findings in Devin Review.

Open in Devin Review

khaliqgant added a commit that referenced this pull request Mar 25, 2026
* feat: point-person-led completion pipeline

Replace brittle marker-based step completion with evidence-based
completion decision pipeline. Steps now complete via:
verification → OWNER_DECISION → legacy marker → evidence fallback.

- Add WorkflowOwnerDecision and WorkflowStepCompletionReason types
- Refactor assertOwnerCompletionMarker into completion decision pipeline
- Implement tolerant review parsing (strict → tolerant → evidence)
- Add channel evidence plumbing with getStepCompletionEvidence() accessor
- Update trajectory recording for evidence-based completions
- Add 43 new tests (122 total, all green)
- Update workflow authoring skill and README docs

Resolves Codex lead/worker (malformed REVIEW_DECISION) and
Gemini lead/worker (missing STEP_COMPLETE) failure modes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix owner decision precedence in completion pipeline

* fix: validate completion signal sender provenance

* fix: harden owner retry semantics

* test: prove happy-path lead-worker completion flows

* feat: reduce agent compliance dependence with process-exit fallback

COMPLIANCE REDUCTION SUMMARY (Items D+E):

Runner logic fixes:
- Add `tryProcessExitFallback()` — when agent exits with code 0 but posts
  no coordination signal, the runner checks verification gates and infers
  completion with reason `completed_by_process_exit`
- Add `completionGracePeriodMs` swarm config (default 5s, 0 to disable)
- Enhance `spawnAndWait` timeout handling — run verification on timeout
  before failing, recovering steps where agent completed work but failed
  to self-terminate
- Accept process exit code 0 as evidence signal in
  `judgeOwnerCompletionByEvidence()` alongside WORKER_DONE/LEAD_DONE
- Add `completed_by_process_exit` to WorkflowStepCompletionReason type

Prompting/skill fixes:
- Document 5-tier completion resolution system (explicit decision →
  legacy marker → verification → evidence → process-exit fallback)
- Document which signals are required vs optional (none are mandatory)
- Add "Robust Coordination Best Practices" section with completion
  strategy by step type
- Add configuration docs for completionGracePeriodMs

Both (runner + prompting):
- Schema updated with completionGracePeriodMs field
- Completion strategy guidance aligned with actual runner behavior

Tests: 4 new tests for process-exit fallback (142 total, all passing)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: cover relaycast mcp bootstrap paths

* fix: prevent INCOMPLETE_RETRY from being overridden by template re-quoting and fallback paths

Bug 1: parseOwnerDecision last-match heuristic picked COMPLETE from
re-quoted template text when the agent referenced the decision format
after stating INCOMPLETE_RETRY. Fixed by filtering matches that appear
on template-format lines (containing pipe-separated options).

Bug 2: judgeOwnerCompletionByEvidence and tryProcessExitFallback did
not check for explicit retry/fail signals in raw output. A clean exit
code or positive language could override an INCOMPLETE_RETRY. Fixed by
adding guards that refuse to infer completion when INCOMPLETE_RETRY,
INCOMPLETE_FAIL, or NEEDS_CLARIFICATION appears in raw output.

Also changed template format from pipe-separated to comma-separated
to prevent the template text from matching the decision regex.

4 regression tests added, 146 tests passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: cover remaining relaycast mcp startup branches

* fix: harden supervising leads and interactive PTYs

* fix: stabilize interactive lead-worker happy path

* fix: harden workflow owner retries and relay validation

* chore: add validation and test workflow yamls from PR #552 worktrees

Brings in validation workflows (legacy-marker, map-reduce, supervisor,
wrong-sender) from pr552-full-e2e and codex/gemini lead-worker,
map-reduce, supervisor test workflows.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

* style: auto-format Rust code with cargo fmt

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant