Skip to content

fix: WorkflowBuilder preset field + reviewer double-booking#566

Merged
khaliqgant merged 11 commits intomainfrom
fix/preset-interactive-reviewer-dedup
Mar 14, 2026
Merged

fix: WorkflowBuilder preset field + reviewer double-booking#566
khaliqgant merged 11 commits intomainfrom
fix/preset-interactive-reviewer-dedup

Conversation

@khaliqgant
Copy link
Copy Markdown
Member

@khaliqgant khaliqgant commented Mar 13, 2026

Summary

  • WorkflowBuilder drops preset field: WorkflowBuilder.agent() never mapped options.preset to the AgentDefinition. The AgentOptions interface was missing the preset field entirely. This meant preset: 'worker' via the builder API had no effect — resolveAgentDef never saw the preset, so interactive: false was never applied, and agents were spawned as interactive PTY with the full owner-supervisor-reviewer pipeline.
  • Reviewer double-booking: resolveAutoReviewAgent independently assigned the same agent as reviewer for multiple concurrent steps with no coordination. When e.g. worker-a was assigned to review both task-b and task-c simultaneously, one review could starve and hit the safety backstop timeout (observed in fan-out workflows).

Changes

  1. Added preset?: AgentPreset to AgentOptions interface and the agent() method in builder.ts
  2. Added activeReviewers Set to WorkflowRunner to track agents currently assigned as reviewers
  3. Updated resolveAutoReviewAgent to prefer agents not currently reviewing, falling back to busy agents only when no alternative exists
  4. Cleanup of activeReviewers in the finally block after review completes and in the run cleanup

Test plan

  • Workflow with preset: 'worker' via builder API should run agents as non-interactive (no PTY, no owner/reviewer pipeline)
  • Fan-out workflow with 3+ parallel steps should not assign the same reviewer to multiple concurrent reviews when alternatives exist
  • Existing review gate behavior unchanged when only one eligible reviewer exists

🤖 Generated with Claude Code


Open with Devin

Two fixes:

1. WorkflowBuilder.agent() was not mapping options.preset to the
   AgentDefinition, so preset: 'worker' had no effect via the builder
   API. Added preset to AgentOptions interface and the agent() method.
   This means resolveAgentDef can now see the preset and correctly
   default interactive to false for worker/reviewer/analyst presets.

2. resolveAutoReviewAgent independently assigned the same agent as
   reviewer for multiple concurrent steps. When worker-a reviewed both
   task-b and task-c simultaneously, one review could starve and hit
   the safety backstop timeout. Now tracks active reviewers in a Set
   and prefers unassigned agents, falling back to busy ones only when
   no alternative exists.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.

Open in Devin Review

The builder's step() method and StepOptions interface only supported
agent steps. Deterministic (shell command) steps had to be defined
in YAML. Now the builder accepts:

  .step('check', {
    type: 'deterministic',
    command: 'npm test',
    captureOutput: true,
    failOnError: true,
    dependsOn: ['build'],
  })

Also relaxes the toConfig() validation to allow workflows with only
deterministic steps (no agents required).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
devin-ai-integration[bot]

This comment was marked as resolved.

khaliqgant and others added 3 commits March 13, 2026 14:54
….0.2

- Added preset field to Agent Definition section
- Added Deterministic Steps subsection with builder API example
- Added hub-pattern gotcha to Common Mistakes table
- Bumped writing-agent-relay-workflows version to 1.0.2 in prpm.json

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addresses Devin review feedback: when multiple concurrent steps resolve
to the same reviewer agent, a Set incorrectly removes tracking on the
first delete even while another step is still using that reviewer.
A Map<string, number> with increment/decrement correctly handles
concurrent usage of the same reviewer name.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…-owned agents

For DAG/pipeline patterns, disable auto-hardening (owner+specialist+reviewer
spawning) since it causes agent explosion. Auto-hardening is only appropriate
for hub patterns (fan-out, hub-spoke, etc.) where cross-agent supervision is
expected.

Also fix self-owned agents being stuck in exit-only completion mode. The
evidenceRole was always set to 'owner' in the self-owned path, triggering
preserveIdleSupervisor and preventing idle-as-complete. Since self-owned
agents have no dedicated owner, they should use 'specialist' role which
races exit vs idle for faster completion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
devin-ai-integration[bot]

This comment was marked as resolved.

When an agent goes idle (30s silence) and has an output_contains
verification, check if the verification token is present before
treating idle as completion. If not yet present, continue waiting
instead of prematurely releasing the agent.

This prevents agents that need time to process (e.g., reading channel
inbox, computing results) from being released before they complete
their work.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
devin-ai-integration[bot]

This comment was marked as resolved.

khaliqgant and others added 2 commits March 13, 2026 20:03
1. Restore evidenceRole to 'owner' for interactive self-owned agents
   (fixes signal attribution) while using preserveOnIdle: false for
   non-hub patterns to prevent exit-only mode blocking DAG agents.

2. Defer reviewer resolution to just before the review gate so
   activeReviewers is populated correctly for concurrent steps.

3. Track elapsed time in idle verification loop so timeoutMs is
   respected across iterations instead of restarting full timeout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Only force exit-only mode (preserveOnIdle) for lead-like agents in
hub-spoke patterns. Workers/spokes should use idle-as-complete so
they don't hang waiting for process exit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
devin-ai-integration[bot]

This comment was marked as resolved.

khaliqgant and others added 2 commits March 13, 2026 20:36
… guard

The idle-as-complete loop used a simple `includes()` check for
output_contains verification. This caused false positives when the
verification token appeared in the task text (from the prompt
instruction) — the loop found the task-injected occurrence and
treated idle as complete, but runVerification requires a second
occurrence from the agent's actual output.

Also: only resolve auto-reviewer when usesDedicatedOwner is true,
preventing spurious review gates for self-owned agents.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The broker fires agent_idle only once per idle transition. When
verification fails on idle, the loop's continue would hang because
waitForIdle won't resolve again without new output. Now waits a
15s grace period for the agent to produce output and re-idle. If
the agent exits during grace, returns that result. If still idle
after grace, releases the agent and lets upstream verification
handle the completion decision.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 new potential issues.

View 12 additional findings in Devin Review.

Open in Devin Review

Comment on lines +5375 to +5377
if (graceResult.kind === 'exit') {
return graceResult.result as 'exited' | 'timeout' | 'released';
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Grace period exit-timeout check doesn't distinguish real exits from timeouts, causing non-deterministic behavior

When an agent goes idle but output_contains verification hasn't passed, the code enters a 15-second grace period at packages/sdk/src/workflows/runner.ts:5367. Both waitForExit and waitForIdle are raced with the same 15s timeout. If the agent is permanently idle, both resolve with 'timeout' near-simultaneously. Which promise wins Promise.race is non-deterministic:

  • If exit wins: graceResult = { kind: 'exit', result: 'timeout' } → the if (graceResult.kind === 'exit') check at line 5375 matches → returns 'timeout' without releasing the agent. Upstream spawnAndWait then treats this as a step timeout (packages/sdk/src/workflows/runner.ts:5146-5174), running verification and potentially throwing a timeout error.
  • If idle wins: graceResult = { kind: 'idle', result: 'timeout' } → falls through to line 5380-5383 → agent is released, returns 'released'. Upstream proceeds normally with output collection.

The same scenario (permanently idle agent, grace period expired, verification not passed) produces different outcomes depending on which promise resolves first. The comment at line 5378 ("Grace period timed out — agent is permanently idle") indicates the intended behavior is always the release path, but the overly broad graceResult.kind === 'exit' check intercepts exit timeouts along with real exits.

Fix: filter out timeout results from the exit check
if (graceResult.kind === 'exit' && graceResult.result !== 'timeout') {
  return graceResult.result as 'exited' | 'released';
}
Suggested change
if (graceResult.kind === 'exit') {
return graceResult.result as 'exited' | 'timeout' | 'released';
}
if (graceResult.kind === 'exit' && graceResult.result !== 'timeout') {
return graceResult.result as 'exited' | 'released';
}
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +5366 to +5369
const idleGraceSecs = 15;
const graceResult = await Promise.race([
agent.waitForExit(idleGraceSecs * 1000).then((r) => ({ kind: 'exit' as const, result: r })),
agent.waitForIdle(idleGraceSecs * 1000).then((r) => ({ kind: 'idle' as const, result: r })),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Idle verification grace period ignores remaining step timeout, potentially exceeding it by up to 15 seconds

At packages/sdk/src/workflows/runner.ts:5366-5369, the grace period uses a hardcoded idleGraceSecs = 15 without accounting for the remaining step timeout. The outer loop correctly computes remaining from timeoutMs at line 5334 and passes it to the main waitForExit/waitForIdle race. But once idle fires and verification fails, the grace period races with idleGraceSecs * 1000 (15000ms) regardless of how much timeout is left.

For example, if only 2 seconds of timeout remained when idle was detected, the main race resolves within 2s, but the grace period then waits up to 15s — exceeding the step timeout by 13 seconds. The timeout check at the top of the while loop (remaining <= 0) only runs on the next iteration, after the grace period has already blocked.

Suggested change
const idleGraceSecs = 15;
const graceResult = await Promise.race([
agent.waitForExit(idleGraceSecs * 1000).then((r) => ({ kind: 'exit' as const, result: r })),
agent.waitForIdle(idleGraceSecs * 1000).then((r) => ({ kind: 'idle' as const, result: r })),
const idleGraceSecs = 15;
const elapsedSinceLoopStart = Date.now() - idleLoopStart;
const graceMs = timeoutMs != null
? Math.min(idleGraceSecs * 1000, Math.max(0, timeoutMs - elapsedSinceLoopStart))
: idleGraceSecs * 1000;
const graceResult = await Promise.race([
agent.waitForExit(graceMs).then((r) => ({ kind: 'exit' as const, result: r })),
agent.waitForIdle(graceMs).then((r) => ({ kind: 'idle' as const, result: r })),
]);
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

@khaliqgant khaliqgant merged commit bf50ec5 into main Mar 14, 2026
32 checks passed
@khaliqgant khaliqgant deleted the fix/preset-interactive-reviewer-dedup branch March 14, 2026 06:37
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 12 additional findings in Devin Review.

Open in Devin Review

Comment on lines +3448 to +3451
if (usesAutoHardening && usesDedicatedOwner && !reviewDef) {
reviewDef = this.resolveAutoReviewAgent(ownerDef, agentMap);
supervised.reviewer = reviewDef;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 JIT reviewer resolution drops review for self-owned steps in hub patterns

The deferred reviewer resolution at line 3448 requires usesDedicatedOwner in addition to usesAutoHardening, but the old eager resolution only required usesAutoHardening. This means hub-pattern steps where no dedicated lead/coordinator agent was found (i.e., ownerDef.name === specialistDef.name, so usesDedicatedOwner is false) no longer get a reviewer resolved. Previously, self-owned steps in hub patterns still received auto-review because resolveAutoReviewAgent was called unconditionally when usesAutoHardening was true (packages/sdk/src/workflows/runner.ts:3172). Now the review gate at line 3453 (if (usesOwnerFlow && reviewDef)) silently skips because reviewDef remains undefined — the JIT block at line 3448 never fires.

Example scenario where review is lost

A hub-spoke workflow with two agents where neither has an owner-ish name/role:

workflow('test')
  .pattern('hub-spoke')
  .agent('alpha', { cli: 'claude', role: 'Engineer' })
  .agent('beta',  { cli: 'claude', role: 'Tester' })
  .step('build', { agent: 'alpha', task: 'Build feature' })

resolveAutoStepOwner finds no dedicated owner → ownerDef === specialistDefusesDedicatedOwner = false. Old code: reviewer resolved and review runs. New code: reviewer never resolved, review silently skipped.

Suggested change
if (usesAutoHardening && usesDedicatedOwner && !reviewDef) {
reviewDef = this.resolveAutoReviewAgent(ownerDef, agentMap);
supervised.reviewer = reviewDef;
}
if (usesAutoHardening && !reviewDef) {
reviewDef = this.resolveAutoReviewAgent(ownerDef, agentMap);
supervised.reviewer = reviewDef;
}
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

khaliqgant added a commit that referenced this pull request Mar 25, 2026
1. Restore evidenceRole to 'owner' for interactive self-owned agents
   (fixes signal attribution) while using preserveOnIdle: false for
   non-hub patterns to prevent exit-only mode blocking DAG agents.

2. Defer reviewer resolution to just before the review gate so
   activeReviewers is populated correctly for concurrent steps.

3. Track elapsed time in idle verification loop so timeoutMs is
   respected across iterations instead of restarting full timeout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant