Conversation
…low definitions

Three major additions to the workflows spec:

1. Reflection Protocol — event-driven reflection inspired by the Generative Agents paper (Park et al., 2023). Importance-weighted message accumulation triggers focal point generation, synthesis, and course correction. Includes ReflectionEngine implementation, REFLECT message protocol, and per-pattern reflection behavior.
2. Trajectory Integration — formal integration with the agent-trajectories SDK (v0.4.0). Workflows auto-record messages, reflections, and decisions as trajectory events. Auto-generates retrospectives on completion. Enables cross-workflow learning and compliance/attribution.
3. YAML Workflow Definitions — portable YAML schema for defining workflows, compatible with relay-cloud's relay.yaml (PR #94). Supports template variables, DAG-based step parallelism, built-in templates, and progressive configuration (one-liner to full custom).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
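The importance-weighted accumulation trigger described above can be sketched as a running sum that fires a reflection when it crosses a threshold, in the style of Park et al. (2023). The class name, threshold semantics, and reset-after-fire behavior are illustrative assumptions, not the spec's actual ReflectionEngine API:

```typescript
// Hypothetical sketch of an importance-weighted reflection trigger.
// Each observed message carries an importance score; when the running
// sum crosses the threshold, a reflection fires and the sum resets.
class ReflectionTrigger {
  private sum = 0;

  constructor(private readonly threshold: number) {}

  // Returns true when a reflection should run for this observation.
  observe(importance: number): boolean {
    this.sum += importance;
    if (this.sum >= this.threshold) {
      this.sum = 0; // reset so the next window accumulates from scratch
      return true;
    }
    return false;
  }
}
```

In this sketch the engine would call `observe()` per incoming message and, on `true`, run focal point generation and synthesis over the accumulated window.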
New patterns (6-10):
- handoff: dynamic routing with circuit breaker (max hops)
- cascade: cost-aware LLM escalation (cheap → capable)
- dag: directed acyclic graph with parallel execution
- debate: adversarial refinement with structured rounds + judge
- hierarchical: multi-level delegation tree (lead → coordinators → workers)

New primitives required:
- DAG Scheduler (topological sort, parallel dispatch, join tracking)
- Handoff Controller (active agent tracking, context transfer)
- Round Manager (debate rounds, turn order, convergence detection)
- Confidence Parser (extract [confidence=X.X] from DONE messages)
- Tree Validator (structural validation, sub-team computation)

New message protocol signals:
- HANDOFF, CONFIDENCE, ARGUMENT, CONCEDE, VERDICT, TEAM_DONE

Includes pattern × primitive matrix showing what each pattern needs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
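Of the new primitives, the Confidence Parser is small enough to sketch. Only the `[confidence=X.X]` tag format comes from the commit; the function name, the null-on-missing behavior, and the clamping are assumptions:

```typescript
// Hypothetical sketch of the Confidence Parser primitive: extract a
// [confidence=X.X] tag from a DONE message.
function parseConfidence(message: string): number | null {
  const match = message.match(/\[confidence=(\d+(?:\.\d+)?)\]/);
  if (!match) return null; // no tag present
  const value = parseFloat(match[1]);
  // Clamp to [0, 1] so a malformed tag can't skew downstream decisions.
  return Math.min(1, Math.max(0, value));
}

console.log(parseConfidence("DONE: merged runner changes [confidence=0.85]")); // 0.85
console.log(parseConfidence("DONE: no tag present")); // null
```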
Decision framework and reference for fan-out, pipeline, hub-spoke, consensus, mesh, handoff, cascade, dag, debate, and hierarchical patterns. Includes reflection protocol, YAML workflow definitions, and common mistakes guide. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DAG-based execution plan with 9 nodes covering shared types, DB migration, workflow runner, swarm coordinator, templates, API endpoints, CLI commands, dashboard panel, and integration tests. Uses broker SDK for agent lifecycle. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tor script

Adds stigmergic state store, agent pool manager, auction engine, branch pruner, and gossip disseminator to WORKFLOWS_SPEC.md (Phase 5). These bring coverage from 67% to 88% of the 42 swarm techniques catalogued from multi-agent orchestration literature.

Also adds executable broker SDK script (scripts/run-swarm-implementation.ts) that uses a DAG pattern to coordinate 9 work nodes implementing relay-cloud PR #94, with dependency-aware parallel execution and convention injection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Script fixes:
- Use Promise.allSettled instead of Promise.race for batch execution
- Add --resume support with state persistence to .relay/swarm-impl-state.json
- Propagate failures to downstream nodes immediately (mark as "blocked")
- Add readFirst field to DAG nodes so agents read existing code first
- Require detailed DONE messages with type signatures and file paths
- Add resolved guard to prevent double-resolution in polling loop
- Add "blocked" status to NodeResult for better reporting

Skill updates:
- Add "DAG Executor Pitfalls" section with 6 common implementation mistakes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
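The first and third fixes can be sketched together. The node and status shapes below are assumptions, not the script's actual types, and this sketch blocks only direct dependents rather than walking the DAG transitively:

```typescript
// Sketch: run a batch of DAG nodes with Promise.allSettled so one failure
// doesn't abort the batch (Promise.race would return after the first
// settlement), then mark dependents of any failed node as "blocked".
type NodeStatus = "done" | "failed" | "blocked" | "pending";

interface DagNode {
  id: string;
  deps: string[];
}

async function runBatch(
  batch: DagNode[],
  allNodes: DagNode[],
  run: (n: DagNode) => Promise<void>,
  status: Map<string, NodeStatus>,
): Promise<void> {
  // allSettled waits for every node in the batch, success or failure.
  const results = await Promise.allSettled(batch.map(run));
  results.forEach((r, i) => {
    const node = batch[i];
    if (r.status === "fulfilled") {
      status.set(node.id, "done");
    } else {
      status.set(node.id, "failed");
      // Propagate immediately: direct dependents of a failed node are blocked.
      for (const other of allNodes) {
        if (other.deps.includes(node.id)) status.set(other.id, "blocked");
      }
    }
  });
}
```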
Updated spawn/send/release/logs commands to match actual CLI syntax (positional args, not --flag format). Verified with --dry-run. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Import AgentRelayClient, getLogs, and BrokerEvent directly from the broker SDK sub-paths (client, logs, protocol) which avoid the @relaycast/sdk transitive dependency. Replaces all execSync calls with proper SDK methods: spawnPty, release, listAgents, onEvent. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The AgentRelayClient expects the Rust broker binary, which has an init --name --channels subcommand for protocol mode. The Node.js CLI binary has a different init command (a setup wizard). Built the Rust binary with cargo build and pointed binaryPath to target/debug/agent-relay. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Relaycast API returns 409 when creating a workspace with a name that already exists. Without cached credentials the broker can't recover. Use a timestamped broker name to ensure uniqueness. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
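A minimal sketch of the timestamped-name workaround; the name format is an assumption, the only requirement from the commit is uniqueness across runs:

```typescript
// Sketch: derive a unique broker/workspace name so repeated runs don't hit
// the Relaycast 409 (name already exists). ISO timestamp characters that
// are awkward in identifiers (":" and ".") are replaced with "-".
function uniqueBrokerName(base: string): string {
  const stamp = new Date().toISOString().replace(/[:.]/g, "-");
  return `${base}-${stamp}`;
}

console.log(uniqueBrokerName("swarm-impl")); // e.g. swarm-impl-2025-01-01T12-00-00-000Z
```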
…olling

The Rust broker doesn't write worker-logs/ files — that's a Node.js CLI feature. Switch watchForDone to use broker events:
- worker_stream: accumulate PTY output chunks, scan for DONE/ERROR
- relay_inbound: relay messages from agents
- agent_exited: detect agent termination

Remove unused getLogs import.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
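The worker_stream handling can be sketched as a rolling per-agent buffer scan, since a signal keyword can be split across PTY chunks. The event shape and window size are assumptions, and a later commit replaces this approach entirely because prompt echo caused false DONE matches:

```typescript
// Sketch: accumulate worker_stream PTY chunks per agent and scan the
// rolling buffer for DONE/ERROR keywords.
const buffers = new Map<string, string>();

function onWorkerStream(agentId: string, chunk: string): "done" | "error" | null {
  const buf = (buffers.get(agentId) ?? "") + chunk;
  // Keep only a tail window so long-running agents don't grow memory.
  buffers.set(agentId, buf.slice(-4096));
  if (/\bDONE\b/.test(buf)) return "done";
  if (/\bERROR\b/.test(buf)) return "error";
  return null;
}
```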
Instead of parsing PTY output for DONE signals (which matched the prompt template text), agents now:
1. Do their work
2. Send a relay message with "DONE: <summary>" to the workflow channel
3. Exit naturally

The orchestrator watches for:
- relay_inbound: captures DONE/ERROR summaries for downstream deps
- agent_exited: definitive completion signal (code 0 = success)

Removed all "DONE: <detailed summary>" template text from task prompts to prevent false positives.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Spawned PTY agents don't have MCP relay tools, so they can't send
relay messages. Instead, agents now write their summary to
.relay/summaries/{nodeId}.md before exiting. The orchestrator waits
for agent_exited, then reads the summary file for downstream deps.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
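The summary-file handoff above can be sketched as follows; the event shape and helper name are assumptions, only the .relay/summaries/{nodeId}.md convention and the wait-for-exit ordering come from the commit:

```typescript
// Sketch: after an agent_exited event, read the summary file the agent
// wrote before exiting so downstream dependencies can use it.
import { readFile } from "node:fs/promises";
import * as path from "node:path";

interface ExitEvent {
  agentId: string;
  nodeId: string;
  code: number;
}

async function collectSummary(evt: ExitEvent, summariesDir: string): Promise<string> {
  if (evt.code !== 0) {
    // Nonzero exit means the node failed; there is no trustworthy summary.
    throw new Error(`Agent for node ${evt.nodeId} exited with code ${evt.code}`);
  }
  const file = path.join(summariesDir, `${evt.nodeId}.md`);
  return readFile(file, "utf8");
}
```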
New pitfalls from running the swarm implementation script:
- PTY prompt echo matching signal keywords (false DONE completion)
- Assuming agent capabilities (PTY agents lack MCP tools)
- Rust broker vs Node.js CLI binary confusion
- Log polling assumes Node.js daemon (Rust broker doesn't write logs)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- runner.ts: executeStep now throws after marking the step failed, enabling fail-fast/continue error strategies to trigger via Promise.allSettled
- cli/index.ts: runScriptFile now only catches ENOENT errors, properly propagating script execution failures instead of trying the next runner

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
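The narrowed ENOENT-only catch can be sketched like this; the helper shape is an assumption, not the actual cli/index.ts code:

```typescript
// Sketch: only ENOENT (runner binary missing) means "try the next runner";
// any other error is a real script failure and must propagate.
import { spawnSync } from "node:child_process";

function runScriptFile(cmd: string, file: string): boolean {
  try {
    const res = spawnSync(cmd, [file], { stdio: "inherit" });
    if (res.error) throw res.error; // spawn failed (e.g. command not found)
    if (res.status !== 0) throw new Error(`${cmd} exited with ${res.status}`);
    return true;
  } catch (err) {
    if ((err as NodeJS.ErrnoException)?.code === "ENOENT") {
      return false; // runner not installed: caller may try the next one
    }
    throw err; // genuine script failure: do not mask it
  }
}
```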
Similar to Wrangler's telemetry.md, this document explains:
- What data is collected and why
- What is explicitly NOT collected
- How to opt out (CLI, env var, config file)
- How to view telemetry events for debugging

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
```ts
if (strategy === 'fail-fast') {
  // Mark all pending downstream steps as skipped
  await this.markDownstreamSkipped(step.name, workflow.steps, stepStates, runId);
  throw new Error(`Step "${step.name}" failed: ${error}`);
}
```
🟡 fail-fast with parallel step failures leaves downstream steps of subsequent failures in 'pending' state
When multiple steps run in parallel and more than one fails under the fail-fast strategy, only the first failed step's downstream dependents are marked as skipped. The loop throws immediately after processing the first failure, so downstream steps of the second (and subsequent) failed steps remain in pending state instead of skipped.
Root Cause and Impact
In executeSteps at packages/sdk-ts/src/workflows/runner.ts:593-614, the results of Promise.allSettled are iterated. When the first rejected result is encountered with fail-fast strategy, markDownstreamSkipped is called for that step, and then an error is thrown at line 607. This means subsequent rejected results in the same batch are never processed — their downstream steps are never marked as skipped.
For example, if steps A and B run in parallel and both fail:
- Step A's failure is processed: A's downstream steps are marked `skipped`, then `throw`
- Step B's failure is never processed in this loop (B itself is already marked `failed` by `executeStep`)
- Step B's downstream steps remain in `pending` state in the DB
The run correctly ends in failed status (via the catch block at line 437), but the step state in the database is inconsistent — some steps that should be skipped are left as pending. This affects any UI or API that reads step states to show workflow progress, and it affects resume() which would attempt to re-run those pending steps even though their upstream dependency failed.
Prompt for agents
In packages/sdk-ts/src/workflows/runner.ts, in the executeSteps method around lines 593-614, the fail-fast strategy throws immediately after the first rejected result, skipping processing of subsequent rejected results in the same batch. To fix this, process ALL rejected results before throwing. Specifically, change the loop so that it:
1) Iterates through all results and marks each failed step and its downstream as skipped
2) Collects the first error
3) After the loop, throws the collected error

This ensures all downstream steps of all failed parallel steps are properly marked as skipped before the throw.
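The fix described in the prompt can be sketched as follows; the types and helper signature are assumptions, not runner.ts's actual code:

```typescript
// Sketch: process ALL rejected results from the batch before throwing,
// so every failed parallel step's downstream is marked skipped, not just
// the first failure's.
type Settled = PromiseSettledResult<void>;

async function handleBatchResults(
  results: Settled[],
  stepNames: string[],
  markDownstreamSkipped: (step: string) => Promise<void>,
): Promise<void> {
  let firstError: unknown = null;
  for (let i = 0; i < results.length; i++) {
    const r = results[i];
    if (r.status === "rejected") {
      // Mark downstream of EVERY failed step in the batch.
      await markDownstreamSkipped(stepNames[i]);
      firstError = firstError ?? r.reason; // remember only the first error
    }
  }
  // Throw only after all failures have been processed.
  if (firstError !== null) throw firstError;
}
```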