Fix broker stdout drain in SDK workflow runner #94
📝 Walkthrough
This PR injects a one-time patch into the generated SDK loader that drains/resumes child stdout for `agent-relay-broker init` processes to avoid stdout backpressure, and adds a Vitest regression that simulates a broker stdout flood to verify the fix.
Changes
Broker Stdout Drain Workaround
Sequence Diagram(s)
sequenceDiagram
participant Client as Test runner
participant Runtime as Generated Loader / runtime
participant SpawnPatch as spawn override
participant Broker as agent-relay-broker init
participant Stdout as child.stdout
Client->>Runtime: start workflow (runLocal)
Runtime->>SpawnPatch: apply one-time patch & call syncBuiltinESMExports()
Runtime->>Broker: spawn agent-relay-broker init
Broker->>Stdout: flood with event-* lines
SpawnPatch->>Stdout: attach pause/resume and handle drain
Stdout->>Runtime: resume delivery, avoid blocking
Runtime->>Client: log BROKER_STDOUT_DRAINED and finish
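The flow in the diagram can be sketched in TypeScript as a wrapper around a spawn function. This is a minimal illustration and not the patch from this PR: `SpawnFn`, `ChildLike`, `isBrokerInit`, and `withBrokerStdoutDrain` are hypothetical names, the rule for recognising a broker init process is an assumption, and the actual loader patches the built-in spawn and calls `syncBuiltinESMExports()` rather than wrapping a function value.

```typescript
// Minimal structural types so the sketch does not depend on node:child_process.
interface StdoutLike {
  resume(): unknown;
}
interface ChildLike {
  stdout: StdoutLike | null;
}
type SpawnFn = (command: string, args: readonly string[]) => ChildLike;

// Assumption: a broker init process is recognised by its binary name plus an "init" argument.
function isBrokerInit(command: string, args: readonly string[]): boolean {
  return /(^|[\\\/])agent-relay-broker(\.exe)?$/.test(command) && args.includes("init");
}

// Remember wrapped functions so the patch is applied at most once.
const patchedSpawns = new WeakSet<SpawnFn>();

function withBrokerStdoutDrain(spawn: SpawnFn): SpawnFn {
  if (patchedSpawns.has(spawn)) return spawn;
  const patched: SpawnFn = (command, args) => {
    const child = spawn(command, args);
    if (isBrokerInit(command, args)) {
      // Switch stdout to flowing mode so output keeps moving even when
      // nobody reads it, instead of filling the pipe and stalling the child.
      child.stdout?.resume();
    }
    return child;
  };
  patchedSpawns.add(patched);
  return patched;
}
```

Wrapping an already wrapped function returns it unchanged, which mirrors the "one-time" property described in the walkthrough.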
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
0d9b04e to 70e18ec
Ricky Eval Review
Run:
Passed: 8 | Needs human: 43 | Reviewable: 43 | Missing output: 0 | Failed: 0 | Skipped: 0
Human Review Cases
These cases passed deterministic checks and include captured Ricky output for a human verdict against their REVIEW
| Flag | Description | Default |
|---|---|---|
| `--mode` | Must be first-run for this | |
| ...[truncated] | | |
**Must**
- Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
- End every branch with a concrete next step.
- Advertise only commands that are currently implemented.
**Must Not**
- Sound like a launch page or documentation dump.
- Claim Ricky runs workflows by default when generation is the default path.
- Require web or Slack onboarding before CLI use.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
</details>
<details>
<summary><strong>REVIEW</strong> <code>cli.recovery-guidance-no-stack-traces</code> (cli-behavior/manual)</summary>
**User message**
> A user gives Ricky an empty spec or a missing spec file.
**Ricky output**
```text
**Blocker – Missing Specification**
The workflow authoring process requires a concrete specification file (e.g., `workflow.yaml`, `contract.json`, or equivalent) that defines:
1. **Inputs / Triggers** – data schema, event sources, and required authentication.
2. **Steps / Tasks** – ordered actions, responsible agents, and any external service calls.
3. **Outputs / Artifacts** – result schema, storage locations, and downstream notifications.
4. **Verification Gates** – deterministic checks (unit tests, schema validation, contract assertions) that must pass before promotion.
5. **Review & Release Metadata** – branch naming, PR target, reviewers, and required approval count.
Without this spec, I cannot:
- Generate a deterministic workflow contract.
- Create verification gates or validation loops.
- Scope the branch/PR boundaries for review.
- Produce any of the required review artifacts (e.g., test matrix, CI config).
**Next Steps for You**
1. **Create or locate the spec file** in the repository root (or the designated `spec/` directory).
2. Ensure it follows the agreed schema (YAML or JSON) and includes the sections listed above.
3. Commit the file to a featur
...[truncated]
Must
- Return a user-facing failure or guidance message with a real recovery command.
- Distinguish generation failure from execution failure.
- Show stack traces only when verbose diagnostic mode is requested.
Must Not
- Crash with an uncaught exception in normal mode.
- Suggest commands that do not exist.
- Pretend a missing spec was accepted.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)
User message
Render `ricky status` when no provider checks have proven Google or GitHub are connected.
Ricky output
**`ricky status` output (no provider checks have proven Google or GitHub are connected)**
=== AgentWorkforce Status Report ===
Timestamp: 2026-05-10T14:32:07Z
🟡 Provider Connectivity
• Google : NOT VERIFIED (no successful health‑check)
• GitHub : NOT VERIFIED (no successful health‑check)
🔧 Core Services
• Scheduler : RUNNING
• Dispatcher : RUNNING
• Storage : RUNNING
• Metrics : RUNNING
⚙️ Configuration
• Active Branches : main, dev
• Feature Flags : all defaults (no overrides)
🚨 Alerts & Blockers
• ProviderCheck: Google connectivity unconfirmed – workflows requiring Google APIs will be blocked.
• ProviderCheck: GitHub connectivity unconfirmed – repository syncs and PR automation are disabled.
📝 Next Steps (recommended)
- Run `ricky provider check google` and `ricky provider check github` to trigger health checks.
- Verify OAuth/SSH credentials and network access for each provider.
- Re‑run `ricky status` after checks to confirm “CONNECTED” state.
--- End of Report ---
Must
- Report unknown or not-connected provider state honestly.
- Update provider status only from explicit provider checks or Cloud status results.
- Give concrete setup guidance for Cloud when relevant.
Must Not
- Mark Google or GitHub connected because guidance text was shown.
- Invent a provider connection URL or OAuth flow.
- Show empty fields with no recovery guidance when config is missing.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)
User message
Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.
Ricky output
## GitHub Primitive‑Change Workflow
**Purpose:** Apply a single‑file edit (e.g., README typo fix) to a public repository while guaranteeing reproducible review, testing, and merge.
### 1. Workflow Contract (YAML‑style summary)
```yaml
name: primitive-change
description: One‑file edit → PR → CI → Merge
trigger:
  - manual (Ricky UI) or API call
inputs:
  repo: # owner/repo string
  branch: main # target branch
  file_path: # relative path to edited file
  new_content: # full file contents (or diff)
  commit_message: # PR title & body
  author: # GitHub username
outputs:
  pr_number: int
  pr_url: string
  merge_status: {merged: bool, reason?: string}
gates:
  - syntax_check
  - lint
  - unit_test (if applicable)
  - review_approval
  - merge_protection
validation_loops: 80-100 # deterministic loops for each gate
2. Selected Ricky Skills
| Skill | Why it’s needed for this workflow |
|---|---|
| GitOps | Clone, branch, commit, push, PR creation, merge. |
| CI/CD Orchestration | Trigger GitHub Actions for lint, tests, and status checks. |
| Deterministic Verification | |
| ...[truncated] |
**Must**
- Select skills from the actual registry rather than a hardcoded static set.
- Record ranked skill matches with confidence and trigger evidence.
- Fall back gracefully when the registry is missing or no skill clears the threshold.
**Must Not**
- Claim runtime agents embody skills when only generation-time guidance was applied.
- Fail generation solely because optional skill files are missing.
- Hide skill selection evidence from artifacts or JSON output.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
</details>
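The Must items in the skill-matcher case above (ranked matches with confidence and trigger evidence, plus a graceful fallback) could look roughly like the following. This is a sketch under stated assumptions: `Skill`, `SkillMatch`, `rankSkills`, the keyword-overlap score, and the `0.5` threshold are all hypothetical and are not taken from Ricky's registry code.

```typescript
interface Skill {
  name: string;
  triggers: string[];
}
interface SkillMatch {
  name: string;
  confidence: number; // fraction of triggers found in the spec (assumed scoring rule)
  evidence: string[]; // the triggers that actually matched
}
interface SkillSelection {
  matches: SkillMatch[];
  fallback: string | null; // reason recorded when nothing was selected
}

function rankSkills(
  spec: string,
  registry: Skill[] | undefined,
  threshold = 0.5,
): SkillSelection {
  // A missing or empty registry is a fallback, not a failure.
  if (!registry || registry.length === 0) {
    return { matches: [], fallback: "registry missing or empty" };
  }
  const text = spec.toLowerCase();
  const matches = registry
    .map((skill) => {
      const evidence = skill.triggers.filter((t) => text.includes(t.toLowerCase()));
      const confidence =
        skill.triggers.length === 0 ? 0 : evidence.length / skill.triggers.length;
      return { name: skill.name, confidence, evidence };
    })
    .filter((match) => match.confidence >= threshold)
    .sort((a, b) => b.confidence - a.confidence);
  return {
    matches,
    fallback: matches.length === 0 ? "no skill cleared the threshold" : null,
  };
}
```

Returning the evidence alongside the score keeps the selection auditable in artifacts or JSON output, which is what the Must Not list asks reviewers to check for.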
<details>
<summary><strong>REVIEW</strong> <code>generation-quality.tool-selector-honors-spec-hints</code> (generation-quality/manual)</summary>
**User message**
> Generate a workflow where the spec says "use Claude to review and Codex to implement".
**Ricky output**
```text
**Workflow: “Claude‑Review → Codex‑Implement”**
*Purpose*: Automate a deterministic pipeline where Claude (LLM‑reviewer) validates design/spec text and Codex (code‑generator) produces the implementation. The workflow is scoped to a single feature branch and is gated by explicit verification steps to guarantee reliability and traceability.
---
## 1. Contract Overview
| Element | Description |
|---------|-------------|
| **Input** | `spec.md` – Markdown file containing the functional specification for a single feature. |
| **Outputs** | 1. `review_report.md` – Claude’s review with acceptance criteria and change requests.<br>2. `implementation/` – Generated source files (e.g., `.py`, `.js`).<br>3. `ci_report.json` – Structured verification results. |
| **Success Criteria** | • Claude marks the spec as **Approved** (no “BLOCKER” tags).<br>• All generated files pass static analysis, unit‑test suite, and schema validation.<br>• No open “review‑blocker” comments remain. |
| **Failure Modes** | • Claude returns a **BLOCKER** → abort, create issue.<br>• Codex output fails any verification gate → auto‑re‑run up to 3 attempts, then open a PR with “implementation‑needs‑fix”. |
| **Owner
...[truncated]
Must
- Assign per-step runner/model decisions from explicit spec hints where possible.
- Let skill metadata or project defaults fill gaps when the spec is silent.
- Write `tool-selection.json` or equivalent audit metadata.
Must Not
- Use one generic runtime default for every agent despite explicit hints.
- Let skill defaults override explicit user/spec runner hints.
- Omit the reason each tool or model was chosen.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)
User message
Use `--refine` to sharpen a generated workflow's step task descriptions and acceptance gates.
Ricky output
Below is a compact “how‑to” for using the **`--refine`** option when you want to tighten a workflow’s step‑level task descriptions and acceptance gates. The pattern follows the **Ricky** repository conventions (AGENTS.md, workflow standards, shared authoring rules).
---
## 1. When to invoke `--refine`
| Situation | Goal |
|-----------|------|
| Draft workflow already generated (e.g., via `ricky generate …`) | Sharpen language, make tasks **action‑oriented**, and turn vague success criteria into **deterministic verification gates**. |
| Existing workflow is passing but **reviewers flag ambiguity** | Add concrete inputs/outputs, explicit success metrics, and a “blocking‑condition” checklist. |
| You need to **prepare a PR** that meets the 80‑to‑100 validation‑loop requirement | Use `--refine` to produce a diff that only touches step‑level docs, keeping the code/graph unchanged. |
---
## 2. Command‑line syntax
```bash
ricky refine \
--workflow <path/to/workflow.yaml> \
--output <path/to/refined.yaml> \
[--steps step1,step3,…] \
[--gate deterministic|manual] \
[--dry-run]
| Flag | Meaning |
|---|---|
| `--workflow` | Source workflow file (YAML/JSON). |
...[truncated]
**Must**
- Keep the deterministic unrefined path as the default.
- Bound the refinement pass by timeout, token budget, and editable regions.
- Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.
**Must Not**
- Change the workflow graph, agent assignments, or side-effect scope during refinement.
- Fail the whole generation if optional refinement times out.
- Hide warnings when the deterministic artifact is returned unchanged.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
</details>
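The refinement rubric above (opt-in, never changing the workflow graph, falling back to the deterministic artifact with a visible warning) can be sketched as below. The names `Workflow`, `Refiner`, and `applyRefinement` are hypothetical, and the timeout and token budget are left out for brevity: in this sketch a provider timeout is assumed to surface as a thrown error.

```typescript
interface Workflow {
  graph: string[]; // ordered step ids; refinement must not change these
  stepDescriptions: Record<string, string>; // the only editable region in this sketch
}
interface RefineResult {
  workflow: Workflow;
  refined: boolean;
  warnings: string[];
}
type Refiner = (draft: Workflow) => Workflow;

function sameGraph(a: Workflow, b: Workflow): boolean {
  return a.graph.length === b.graph.length && a.graph.every((step, i) => step === b.graph[i]);
}

function applyRefinement(draft: Workflow, refine?: Refiner): RefineResult {
  // Default path: no refiner supplied, so the deterministic artifact is returned as-is.
  if (!refine) return { workflow: draft, refined: false, warnings: [] };
  try {
    const candidate = refine(draft);
    if (!sameGraph(draft, candidate)) {
      return {
        workflow: draft,
        refined: false,
        warnings: ["refinement changed the workflow graph; returning the deterministic artifact"],
      };
    }
    return { workflow: candidate, refined: true, warnings: [] };
  } catch (error) {
    // Provider failure or timeout must not fail the whole generation.
    return {
      workflow: draft,
      refined: false,
      warnings: [`refinement failed: ${String(error)}`],
    };
  }
}
```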
<details>
<summary><strong>REVIEW</strong> <code>generation-quality.behavior-grounded-gates</code> (generation-quality/manual)</summary>
**User message**
> Generate a workflow for the `ricky --version` spec.
**Ricky output**
```text
**Workflow: `ricky --version` Specification & Delivery**
*Repository:* `github.com/agentworkforce/ricky`
*Target Branch:* `main` → `feat/version-cli`
*Owner:* `@ricky-devlead`
*Reviewers:* `@ricky-qa`, `@ricky-docs`, `@ricky-sre`
---
## 1. Objective
Define, implement, test, and document the `ricky --version` command so that it:
1. Prints the current semantic version of the Ricky binary (e.g., `ricky version 2.4.1`).
2. Returns exit code **0** on success, **1** on failure (e.g., missing build metadata).
3. Is discoverable via `ricky --help` and `ricky -h`.
4. Is covered by unit, integration, and e2e tests with deterministic verification gates.
5. Is included in the generated CLI reference docs.
---
## 2. Contract (CLI & API)
| Element | Specification |
|---------|----------------|
| **Command** | `ricky --version` (alias `ricky -v`) |
| **Output** | `ricky version <MAJOR>.<MINOR>.<PATCH>` (no trailing newline required, but allowed) |
| **Exit Code** | `0` on success, `1` on any internal error |
| **Source of Truth** | `internal/version.Version` constant generated at build time via `-ldflags "-X internal/version.Version=$(git describe --tags --always --dirty)"` |
| **
...[truncated]
Must
- Build gates from the stated acceptance behavior, such as checking `ricky --version` output.
- Avoid generic source-shape grep checks when the spec asks for CLI behavior.
- Keep generated validation meaningful for the current repo shape.
Must Not
- Treat `grep -Eq 'export|function|class|workflow(' dist/ricky.js` as proof of version behavior.
- Claim the workflow is proven by source syntax alone.
- Ignore the package-json version resolution order in the spec.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)
User message
Generate a workflow for many independent artifacts with a validation/fix/rerun loop.
Ricky output
**Workflow Name:** `multi‑artifact‑validation‑loop`
**Purpose**
Process a batch of independent artifacts (e.g., data files, code modules, model checkpoints) through a deterministic **Validate → Fix → Rerun** cycle until each artifact meets its contract or is reported as a blocker. The workflow is scoped to a single branch (`feature/multi‑artifact‑loop`) and produces a single PR that aggregates all review artifacts.
---
## 1. Contract Definition (`contract.yaml`)
```yaml
artifacts:
  - id: string # unique identifier (e.g., filename or module name)
    type: enum # [data, code, model]
    version: semver
    checksum: sha256
validation:
  data:
    schema: path/to/schema.json
    max_missing: 0
  code:
    lint: true
    test_coverage: ">=80%"
  model:
    metrics:
      accuracy: ">=0.92"
      drift: "<=0.05"
fix_policy:
  auto_fix: true # apply auto‑fixers where safe
  manual_review: true # require human sign‑off for non‑trivial fixes
  rerun_limit: 5 # max cycles per artifact
All artifacts must satisfy all fields under `validation` for their type before the workflow can merge.
2. High‑Level Stages
| Stage | De
...[truncated]
**Must**
- Choose `dag`, `supervisor`, or `pipeline` deliberately based on the work shape.
- Explain the pattern choice in artifact metadata or a rationale.
- Use `dag` for validation/fix/rerun loops when dependencies matter.
**Must Not**
- Default blindly to `dag` for every workflow.
- Collapse independent artifact work into one vague agent task.
- Omit verification gates because the chosen pattern seems obvious.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
</details>
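One way to make the pattern choice deliberate and self-explaining, as the rubric above requires, is to return the rationale together with the pattern. The sketch below is illustrative only: `WorkShape` and `choosePattern` are hypothetical, and apart from the stated rule that `dag` fits validation/fix/rerun loops when dependencies matter, the mapping to `supervisor` and `pipeline` is an assumption.

```typescript
type Pattern = "dag" | "supervisor" | "pipeline";

interface WorkShape {
  independentArtifacts: number;
  hasCrossStepDependencies: boolean;
  hasFixRerunLoop: boolean;
}
interface PatternChoice {
  pattern: Pattern;
  rationale: string; // recorded in artifact metadata so reviewers can see why
}

function choosePattern(shape: WorkShape): PatternChoice {
  if (shape.hasFixRerunLoop && shape.hasCrossStepDependencies) {
    return {
      pattern: "dag",
      rationale: "validation/fix/rerun loop with dependencies between steps",
    };
  }
  if (shape.independentArtifacts > 1) {
    // Assumption: many independent artifacts are fanned out under a supervisor.
    return {
      pattern: "supervisor",
      rationale: `${shape.independentArtifacts} independent artifacts with no cross-step dependencies`,
    };
  }
  // Assumption: a single linear flow is modelled as a pipeline.
  return { pattern: "pipeline", rationale: "single linear flow" };
}
```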
<details>
<summary><strong>REVIEW</strong> <code>generation-quality.skill-boundary-copy</code> (generation-quality/manual)</summary>
**User message**
> Describe how selected workflow-writing skills affected a generated Ricky workflow.
**Ricky output**
```text
### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow
| Skill | What the Skill Does | Direct Effect on the Generated Workflow | Example Outcome |
|-------|--------------------|----------------------------------------|-----------------|
| **Deterministic Contract Definition** | Forces the author to write explicit input‑output contracts for every step (type, required fields, optional defaults). | • Every node in the DAG has a **`contract`** block that the runtime validates before execution.<br>• Guarantees that downstream steps receive exactly what they expect, eliminating “missing‑field” runtime errors. | `stepA.contract = { input: {url: "string"}, output: {html: "string"} }` |
| **Verification Gate Insertion** | Inserts automated checks (unit test, schema validation, smoke‑run) after each critical transition. | • The workflow includes **`gate`** stages that run a small test harness; failure aborts the pipeline and raises a blocker ticket. | After `parseHTML`, a gate runs `assert(html.length > 0)`; on failure the PR is marked **blocked**. |
| **Review Artifact Generation** | Auto‑creates markdown artifacts (change‑log, decision matrix, risk register) that acco
...[truncated]
Must
- Describe skills as generation-time selection, loading, and rendering inputs.
- Point to metadata such as `loaded-skills.txt` and `skill-application-boundary.json`.
- Say the workflow instructions were informed by selected skills.
Must Not
- Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
- Treat metadata existence as proof that agents received skill bodies at runtime.
- Overstate current tests beyond the generation boundary.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)
User message
Generate a workflow from a vague product spec and return it to the user.
Ricky output
Below is a concise “request‑for‑clarification” template you can send to the product owner. It follows our **Ricky** conventions for deterministic workflow authoring: it defines the contract (inputs, outputs, verification gates), the review artifacts we’ll need, and the iteration budget (80‑100 validation loops). Once the missing details are supplied we can generate a complete, branch‑ready workflow PR.
---
## Request for Clarification – Vague Product Spec
| **Area** | **What We Need** | **Why It Matters** |
|----------|------------------|--------------------|
| **Goal / Business Outcome** | A one‑sentence description of the primary user or business problem the workflow must solve. | Drives acceptance criteria and success metrics. |
| **Trigger(s)** | • Event source (e.g., webhook, schedule, UI action).<br>• Payload schema (JSON example or field list). | Determines the entry point and required validation. |
| **Core Steps** | For each logical step: <br>1. **Name** (e.g., “Validate Order”). <br>2. **Action type** (API call, DB query, transformation, human review). <br>3. **Inputs** (fields, external services). <br>4. **Outputs** (data produced, side‑effects). | Allows us to map de
...[truncated]
Must
- Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
- Validate with dry-run or targeted structural checks where possible.
- Return artifacts, warnings, and follow-up commands honestly.
Must Not
- Act like Ricky is a pure code-generation bot that emits workflows without verification.
- Stop at "code compiles" as the proof bar.
- Skip skill-aware workflow authoring guidance for serious workflows.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)
User message
A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.
Ricky output
**Ricky’s post‑failure protocol**
1. **Immediate containment**
- **Mark the timed‑out step as *failed*** in the workflow run record.
- **Terminate the stray worker** (send a graceful shutdown signal; if it does not respond within 30 s, force‑kill).
- Capture the worker’s final heartbeat and any partial logs before termination.
2. **Evidence collection**
- Pull the **step‑level logs**, the worker’s stdout/stderr streams, and the **resource‑usage snapshot** (CPU, memory, network) at the moment of timeout.
- Store these artifacts in the run‑specific `artifacts/` bucket with a deterministic name:
```
artifacts/<run‑id>/<step‑name>_timeout_<timestamp>.log
```
- Record the **worker ID**, launch parameters, and any environment variables in a `worker‑metadata.json` file.
3. **Automated incident ticket**
- Create a ticket in the configured issue tracker (e.g., GitHub Issues) with the label `workflow‑failure`.
- Populate the ticket body with:
* Run ID, branch, commit SHA
* Step that timed out and its configured timeout value
* Worker ID and termination reason
* Links to the collected logs/artifacts
* A
...[truncated]
Must
- Classify the failure before deciding whether to retry.
- Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
- Preserve raw evidence and uncertainty when the class is not known.
Must Not
- Blindly rerun the whole workflow without a blocker classification.
- Treat every failure as a broken workflow definition.
- Claim Ricky fixed the workflow before rerun evidence proves it.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)
User message
Ricky detects stale `.agent-relay/`, `.relay/`, and `.trajectories/` state before launching a local workflow.
Ricky output
**Workflow: Detect and Guard Against Stale Relay State Before a Local Run**
| Phase | Goal | Contract (input → output) | Deterministic Verification Gate | Review Artifact |
|-------|------|---------------------------|--------------------------------|-----------------|
| **1️⃣ Scan** | Locate any stale state directories in the project root. | `projectRoot/` → `foundDirs: { .agent‑relay?, .relay?, .trajectories? }` | ‑ Exists‑check: each path must be a directory **and** contain at least one file older than `MAX_AGE_MS` (default = 7 days). | `scan‑report.md` (list of dirs, timestamps, age) |
| **2️⃣ Evaluate** | Decide whether the stale state is safe to keep. | `foundDirs` → `action: { keep, clean, abort }` | ‑ Policy‑check: if any dir’s age > MAX_AGE_MS **and** `workflowConfig.allowStale===false` → `abort`. Otherwise → `clean`. | `decision‑log.md` (policy rationale) |
| **3️⃣ Guard** | Enforce the chosen action before the workflow engine starts. | `action` → side‑effects (no‑op / rm‑rf / exit 1) | ‑ Post‑condition: <br>• `keep` → directories unchanged.<br>• `clean` → directories removed and **empty**.<br>• `abort` → process terminated with exit code 1 and error message. | `guard‑s
...[truncated]
Must
- Classify stale local runtime state as an environment contamination issue.
- Recommend quarantine or isolated-run guidance before launch.
- Record the observed paths and the action taken or recommended.
Must Not
- Treat stale runtime state as a workflow logic failure.
- Delete or overwrite state without an explicit safe path or user intent.
- Continue into execution as if the workspace were clean.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)
User message
A run marker says another Ricky or Relay run is already active in this workspace.
Ricky output
**Quick‑Fix Checklist for “Another Ricky/Relay run is already active in this workspace”**
| Step | Action | Command / UI | Expected Result |
|------|--------|--------------|-----------------|
| 1️⃣ | **Inspect the lock file** | `cat .ricky/run/active.marker` (or open the file in the IDE) | Shows the PID and start‑time of the process that created the marker. |
| 2️⃣ | **Verify the process is still alive** | `ps -p <PID>` or `ricky status` | If the PID is **not** running, the marker is stale. |
| 3️⃣ | **If stale – remove the marker** | `rm .ricky/run/active.marker` | Marker cleared; you can start a new run. |
| 4️⃣ | **If the process is alive – decide what to do** | • **Continue** the existing run (no further action).<br>• **Abort** it: `ricky abort --pid <PID>` or `kill -SIGTERM <PID>` | The original run stops and the marker is removed automatically. |
| 5️⃣ | **Confirm the workspace is clean** | `ricky status` | Should report *No active runs*. |
| 6️⃣ | **Start your new run** | `ricky run <workflow‑name>` | New run creates a fresh marker and proceeds. |
---
### Detailed Flow (Deterministic Verification Gates)
1. **Gate A – Marker Presence**
- **Check**: `test -f .ricky/
...[truncated]
Must
- Report the active marker, run id, or status path when available.
- Ask the user to inspect, wait for, or explicitly clear the active run.
- Avoid launching a competing run that could corrupt evidence.
Must Not
- Silently start another run.
- Hide the existing run marker from the user.
- Treat the conflict as a generic failure with no recovery path.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.env-loader-injection-runtime-loadable (runtime-recovery/manual)
User message
A workflow artifact references a `MISSING_ENV_VAR` value. Ricky's deterministic auto-fix injects the `.env.local`/`.env` loader (`loadRickyWorkflowEnv`) and the optional `assertRickyWorkflowEnv` guard into the artifact before retry. The artifact may be a master-rendered workflow whose `.step({ command: ... })` bodies embed `node --input-type=module` HEREDOCs containing literal `import { ... } from 'node:fs'` / `from 'node:path'` strings.
Ricky output
**Deterministic auto‑fix for missing environment variables**
Below is the concrete, reproducible patch you can apply to any workflow artifact that currently fails because it references a `MISSING_ENV_VAR`. The fix does three things:
1. **Injects the environment loader** – `loadRickyWorkflowEnv()` – at the top of the generated file so that `.env.local` (or fallback `.env`) is read before any step runs.
2. **Adds an optional guard** – `assertRickyWorkflowEnv()` – that throws a clear error if the required variable is still undefined after loading.
3. **Leaves the existing step bodies untouched**, even when they contain HEREDOC‑style `node --input-type=module` blocks with literal `import … from 'node:fs'` or `import … from 'node:path'` strings.
---
## 1. Helper utilities (single source of truth)
Create (or update) `ricky-workflow-env.ts` in the workflow’s shared utilities folder:
```ts
// ricky-workflow-env.ts
import { config } from 'dotenv';
import { resolve } from 'node:path';
import { existsSync } from 'node:fs';
/**
* Load environment variables from .env.local (preferred) or .env.
* Returns the loaded env object for optional downstream use.
*/
export function loadRick
...[truncated]
Must
- Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected `loadRickyWorkflowEnv` body references `rickyWorkflowFs.*` and `rickyWorkflowPath.*`, so the repair must also add the corresponding `import * as rickyWorkflowFs from 'node:fs'` and `import * as rickyWorkflowPath from 'node:path'` aliases at module top level.
- Detect existing alias imports by matching real top-level `import * as <alias> from '<module>'` statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in `.step({ command: ... })` bodies do not count as imports).
- Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.
Must Not
- Skip adding the `rickyWorkflowFs`/`rickyWorkflowPath` aliases because `from 'node:fs'` or `from 'node:path'` already appears somewhere in the file as a string literal.
- Inject `loadRickyWorkflowEnv` (or `assertRickyWorkflowEnv`) without the supporting alias imports, which produces a `ReferenceError: rickyWorkflowPath is not defined` at module load and burns the auto-fix budget on `UNSUPPORTED_RUNTIME at runtime-launch`.
- Rewrite or escape the embedded HEREDOC text in step commands.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
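The difference between matching a real alias import and substring-matching the module specifier, which this case turns on, can be illustrated with a short sketch. It is not Ricky's implementation: `hasNamespaceImport` and `ensureAliasImports` are hypothetical helpers, and the line-anchored regex is a simplification (it would still be fooled by a HEREDOC line that repeats the exact alias import, so a parser-based check would be more robust).

```typescript
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True only when some line starts with `import * as <alias> from '<module>'`.
// A bare occurrence of the module specifier elsewhere in the file does not count.
function hasNamespaceImport(source: string, alias: string, moduleName: string): boolean {
  const pattern = new RegExp(
    `^import\\s+\\*\\s+as\\s+${escapeRegExp(alias)}\\s+from\\s+['"]${escapeRegExp(moduleName)}['"]`,
    "m",
  );
  return pattern.test(source);
}

const REQUIRED_ALIASES: ReadonlyArray<readonly [string, string]> = [
  ["rickyWorkflowFs", "node:fs"],
  ["rickyWorkflowPath", "node:path"],
];

// Prepend whichever alias imports are missing; the rest of the file,
// including any embedded HEREDOC text, is left byte-for-byte unchanged.
function ensureAliasImports(source: string): string {
  const missing = REQUIRED_ALIASES
    .filter(([alias, moduleName]) => !hasNamespaceImport(source, alias, moduleName))
    .map(([alias, moduleName]) => `import * as ${alias} from '${moduleName}';`);
  return missing.length === 0 ? source : `${missing.join("\n")}\n${source}`;
}
```

Because the check keys on the alias as well as the module, a HEREDOC line such as `import { readFileSync } from 'node:fs'` does not suppress the injection, and running the repair twice adds nothing the second time.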
REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)
User message
Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.
Ricky output
**Local Workflow Execution – Auto‑Fix Run**
Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script) to run the workflow, trigger the auto‑fix path on the first failure, and verify the repaired artifact. The contract assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failed step name and the previous run ID are known.
---
## 1️⃣ Pre‑flight Checks
| Check | Command | Expected outcome |
|-------|---------|------------------|
| CLI version | `aw --version` | `>=2.5.0` |
| Repo root | `git rev-parse --show-toplevel` | prints absolute path |
| Workflow definition present | `ls .aw/workflows/<workflow‑name>.yml` | file exists |
| Auto‑fix feature enabled in config | `cat .aw/config.yml \| grep auto_fix` | `auto_fix: true` |
If any check fails, abort and fix the environment before proceeding.
---
## 2️⃣ Input Contract
```yaml
# auto_fix_input.yml
workflow: "<workflow-name>" # e.g. data‑pipeline
failed_step: "<failed-step-id>" # exact step identifier from the first run
prev_run_id: "<previous-run-uuid>" # e.g. 2023‑09‑15T12:34:56Z‑abc123
auto_fix: true
*Save this file as auto_fix_input.yml in
...[truncated]
**Must**
- Use a bounded retry budget and summarize every attempt.
- Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
- Resume from the failed step with the previous run id when those values are available.
**Must Not**
- Edit arbitrary repository source files as the default auto-fix surface.
- Keep retrying after the configured max attempts.
- Lose the single Ricky tracking run id across repair/resume attempts.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
</details>
<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.no-auto-fix-preserves-single-attempt</code> (runtime-recovery/manual)</summary>
**User message**
> A user runs `ricky run workflows/foo.ts --no-auto-fix` and the workflow fails.
**Ricky output**
```text
### Quick‑Start Debug Checklist for a Failed `ricky run workflows/foo.ts --no-auto-fix`
| Step | Action | Expected Outcome | How to Verify |
|------|--------|------------------|---------------|
| 1️⃣ | **Collect the raw run output** | Full console dump (including stack traces) saved to a file | `ricky run workflows/foo.ts --no-auto-fix 2>&1 | tee run‑log.txt` |
| 2️⃣ | **Locate the workflow contract** | `workflows/foo.ts` should export a `contract` object (or `getContract()`) that defines inputs, outputs, and verification gates | Open the file and confirm a `export const contract = { … }` block exists |
| 3️⃣ | **Validate input payload** | All required fields are present, types match, and any `enum` constraints are satisfied | Run `ricky validate-input workflows/foo.ts --payload <payload.json>` |
| 4️⃣ | **Run the workflow in debug mode** | Step‑by‑step execution with verbose logging, no auto‑fixes, and a break‑point after each gate | `ricky run workflows/foo.ts --debug --no-auto-fix` |
| 5️⃣ | **Inspect deterministic verification gates** | Each gate should either **PASS**, **FAIL**, or **BLOCK**. A **BLOCK** stops the run and is reported as a blocker. | In the debug log look for
...[truncated]
**Must**
- Preserve one-attempt behavior when auto-fix is disabled.
- Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
- Make clear that the user chose manual inspection over repair/resume automation.
**Must Not**
- Start a repair loop despite `--no-auto-fix`.
- Suppress the diagnosis because no repair was attempted.
- Present the failure as a completed repair attempt.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
</details>
**REVIEW** `runtime-recovery.in-process-local-runner` (runtime-recovery/manual)
**User message**
> Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.
**Ricky output**
**Ricky’s Standard Procedure for Executing a Local TypeScript Workflow Artifact**
Below is the concrete, repeatable contract that Ricky follows whenever a developer asks to run a TypeScript‑based workflow that lives in the *primary local path* (by convention `./workflows`). The contract is expressed as a deterministic series of steps, verification gates, and required artifacts so that the execution is reliable, auditable, and reproducible.
---
## 1. Preconditions (Input Contract)
| Item | Requirement | Source |
|------|-------------|--------|
| **Workflow source** | A single entry‑point file `*.workflow.ts` (or `index.ts`) located under `./workflows/<name>/` | Developer‑provided |
| **Node runtime** | Node ≥ 18.x installed and available on `$PATH` | System |
| **Package manager** | `npm` (or `pnpm`/`yarn` if the repo lockfile specifies) | System |
| **TypeScript compiler** | `typescript` version matching the repo’s `package.json` | Installed via `npm ci` |
| **ts-node** | Runtime for on‑the‑fly compilation (`ts-node` version pinned in `devDependencies`) | Installed via `npm ci` |
| **Environment variables** | Any variables declared in `.env.local` (or `workflow.env`) that the w
...[truncated]
**Must**
- Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the `agent-relay` binary on PATH.
- Precheck that Node and `@agent-relay/sdk` are resolvable for the workflow.
- Record the actual spawn command in execution evidence.
**Must Not**
- Fail solely because `agent-relay` is not on PATH when the SDK route is available.
- Hide the actual runtime command from evidence.
- Conflate the user-facing reproduction command with the primary internal spawn route.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
**REVIEW** `runtime-recovery.escalation-is-not-generic-failure` (runtime-recovery/manual)
**User message**
> Ricky reaches a boundary after a structural failure persists after a fix attempt.
**Ricky output**
**Ricky’s Boundary‑Handling Protocol for Persistent Structural Failures**
When a structural failure remains after an initial remediation attempt, Ricky follows a deterministic, contract‑driven workflow that isolates the problem, escalates responsibly, and preserves the integrity of the AgentWorkforce codebase.
---
## 1. Immediate Post‑Fix Verification
| Step | Action | Deterministic Gate |
|------|--------|--------------------|
| 1.1 | Run the **Structural‑Integrity Test Suite (SITS)** on the affected module. | All tests must pass **≥ 95 %** coverage; any failure triggers the “Persisting Failure” gate. |
| 1.2 | Execute the **Static‑Analysis Lint (SAL)** pipeline (e.g., `ricky lint --strict`). | Zero lint violations of severity **high** or **critical**. |
| 1.3 | Perform a **runtime sanity check** on a fresh sandbox environment (spin‑up via `ricky env create --sandbox`). | No crash or exception logs for 5 min of idle + 10 min of typical workload. |
If any gate fails, the failure is classified as *persistent*.
---
## 2. Honest Blocker Reporting
1. **Create a Blocker Issue**
- Repository: `agentworkforce/infra`
- Branch: `bugfix/persist‑struct‑<timestamp>`
- Ti
...[truncated]
**Must**
- Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
- Distinguish escalation from a generic product failure.
- Preserve enough context for a human operator to continue.
**Must Not**
- Retry speculative fixes indefinitely.
- Collapse to "something went wrong" without the attempted actions.
- Discard evidence from failed repair attempts.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
**REVIEW** `runtime-recovery.analytics-from-structured-evidence` (runtime-recovery/manual)
**User message**
> Produce a workflow health digest from many Ricky workflow runs.
**Ricky output**
**Ricky Workflow Health Digest – Consolidated Report**
---
### 1. Scope & Data Sources
| Source | Description | Retrieval Method |
|--------|-------------|------------------|
| **Workflow Runs** | All Ricky‑managed workflow executions in the last 30 days (or custom window) | `ricky run list --since <date> --format json` |
| **Verification Gates** | Pass/fail of each deterministic gate (e.g., schema validation, contract test, CI lint) | `ricky gate status --run-id <id> --json` |
| **Review Artifacts** | PR review outcomes, reviewer comments, and approval timestamps | `ricky review export --run-id <id> --json` |
| **Blocker Reports** | Explicit “blocker” flags raised during runs | `ricky blocker list --run-id <id> --json` |
| **Validation Loops** | Count of completed 80‑to‑100 % validation loops per run | `ricky validate stats --run-id <id> --json` |
> **Note:** Adjust the `--since` flag to match the desired reporting period (e.g., `--since 2024-04-01`).
---
### 2. Aggregation Logic (Pseudo‑code)
```bash
#!/usr/bin/env bash
# ricky-health-digest.sh
set -euo pipefail
# 1️⃣ Define window
START=${1:-$(date -d "-30 days" +%Y-%m-%d)}
END=$(date +%Y-%m-%d)
# 2️⃣ Pull runs
RUNS=$(
...[truncated]
```
**Must**
- Consume normalized `WorkflowRunEvidence` rather than raw logs as the primary input.
- Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
- Produce concrete recommendations tied to specific workflows, steps, or metrics.
**Must Not**
- Mutate evidence while analyzing it.
- Return generic advice like "improve your workflow" without references.
- Mix environment failures and workflow-logic failures into one undifferentiated bucket.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
**REVIEW** `surfaces-ingress.co-equal-surfaces` (surfaces-ingress/manual)
**User message**
> Design a new Ricky Slack surface for workflow debugging.
**Ricky output**
## Ricky Slack Surface – “Workflow Debugger”
### 1. Overview
The **Ricky Slack Debugger** is a dedicated Slack app surface that lets engineers and product owners inspect, step‑through, and mutate running Ricky workflows in real time. It follows the **Ricky repository conventions** (AGENTS.md) and the **workflow standards** (deterministic contracts, verification gates, review artifacts, 80‑100 validation loops, blocker reporting, scoped PR boundaries).
---
## 2. Public Contract (Slack‑App Manifest)
| Field | Value |
|-------|-------|
| **App Name** | `ricky-debugger` |
| **Scopes** | `commands`, `chat:write`, `chat:write.public`, `im:history`, `im:read`, `channels:history`, `channels:read`, `workflow:read`, `workflow:write` |
| **Slash Commands** | `/ricky-debug` – entry point for all debugging actions |
| **Interactive Components** | Buttons, select menus, and modals identified by `action_id` prefixes `debug_` |
| **Event Subscriptions** | `app_mention`, `message.im`, `workflow_state_changed` |
| **Rate Limits** | ≤ 5 req/s per workspace (enforced by internal token bucket) |
| **Versioning** | Semantic version in `manifest.yaml` (e.g., `v1.2.0`) |
| **Error Reporting** | All in
...[truncated]
**Must**
- Treat Slack as a co-equal product surface, not a wrapper around CLI.
- Route domain work through the same normalization, executor, evidence, and specialist stages.
- Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.
**Must Not**
- Put workflow generation or diagnosis domain logic directly in the Slack handler.
- Degrade Slack to a developer shortcut with weaker routing than CLI.
- Skip signature verification, dedup, or thread handling when the surface is implemented.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
**REVIEW** `surfaces-ingress.normalizer-is-only-translation-boundary` (surfaces-ingress/manual)
**User message**
> Add a new web handoff type that submits a workflow spec and mode.
**Ricky output**
**New Hand‑off Type:** `web_submit_spec`
**Purpose**
Enable external web clients to hand‑off a complete **Workflow Specification** together with an execution **Mode** (`sync` | `async`) to the AgentWorkforce engine. The engine validates the spec, creates a run, and returns a deterministic run‑ID (or error) in a single HTTP round‑trip.
---
## 1. Contract Definition (OpenAPI 3.1)
```yaml
paths:
/handoff/web-submit-spec:
post:
summary: Submit a workflow spec and mode for immediate execution
operationId: submitWebSpec
requestBody:
required: true
content:
application/json:
schema:
$ref: '#/components/schemas/WebSpecHandoffRequest'
responses:
'200':
description: Run accepted
content:
application/json:
schema:
$ref: '#/components/schemas/WebSpecHandoffResponse'
'400':
description: Validation failure
content:
application/json:
schema:
$ref: '#/components/schemas/ErrorResponse'
'422':
description: Spec parsing or mode incompatibility
conte
...[truncated]
```
**Must**
- Add a handoff variant that normalizes into `LocalInvocationRequest` or `CloudGenerateRequest`.
- Keep auth/session validation in the web surface before normalization.
- Ensure downstream code does not need to know the request came from web.
**Must Not**
- Short-circuit directly from the web handler to executors.
- Create a parallel domain model for web requests.
- Make the web surface the owner of local or Cloud routing semantics.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
**REVIEW** `surfaces-ingress.mcp-claude-context-is-metadata` (surfaces-ingress/manual)
**User message**
> A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordinate.
**Ricky output**
**Decision Matrix for the Requested Action**
| **Input Condition** | **Recommended Action** | **Rationale** |
|----------------
---
_Comment truncated to stay within GitHub limits. Download the `ricky-eval-run` artifact for the full `human-review.md`._
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/local/entrypoint.test.ts`:
- Around line 3138-3180: The fake broker launcher currently writes a POSIX
shebang script to brokerPath and relies on AgentRelayClient.spawn() to execute
it, which fails on Windows; change the test to produce a platform-appropriate
launcher: on win32 write a .cmd stub (or spawn using { shell: true }), or better
yet write the broker body to a .js file and invoke it via process.execPath
(spawn process.execPath with the .js brokerPath as an arg) so Windows runs Node
explicitly; ensure chmod() is only used for POSIX paths and adjust the spawn
invocation in the test to match (replace direct script execution with explicit
Node invocation or shell dispatch) so the stdout drain behavior is exercised
cross-platform.
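The launcher shape suggested in the comment above could be sketched as a small pure helper that decides which wrapper file to write per platform. This is an illustrative sketch only: `planBrokerLauncher` and `LauncherPlan` are hypothetical names, not part of the test or the SDK, and the wrapper bodies are assumptions modeled on the review suggestion.

```typescript
import { join } from 'node:path';

// Hypothetical description of the launcher file a test would write.
interface LauncherPlan {
  launcherPath: string; // file handed to the spawner
  launcherBody: string; // contents of that file
  needsChmod: boolean; // chmod(0o755) only makes sense on POSIX
}

// Decide the wrapper for a broker script that Node should run explicitly.
// `platform` and `nodePath` are parameters so both branches can be exercised on any host.
export function planBrokerLauncher(
  brokerDir: string,
  scriptPath: string,
  platform: string = process.platform,
  nodePath: string = process.execPath,
): LauncherPlan {
  if (platform === 'win32') {
    return {
      launcherPath: join(brokerDir, 'agent-relay-broker.cmd'),
      launcherBody: `@echo off\r\n"${nodePath}" "${scriptPath}" %*\r\n`,
      needsChmod: false,
    };
  }
  return {
    launcherPath: join(brokerDir, 'agent-relay-broker'),
    launcherBody: `#!/bin/sh\nexec "${nodePath}" "${scriptPath}" "$@"\n`,
    needsChmod: true,
  };
}
```

One caveat worth checking before relying on the `.cmd` branch: recent Node.js releases refuse to spawn `.cmd`/`.bat` files unless `shell: true` is set, so spawning `process.execPath` with the script path as an argument may be the more robust of the two options the comment lists.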
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 194c634c-3903-4510-84b2-5eb245af0bbf
📒 Files selected for processing (2)
- src/local/entrypoint.test.ts
- src/local/entrypoint.ts
♻️ Duplicate comments (1)
src/local/entrypoint.test.ts (1)
3128-3180: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Make the fake broker launcher cross-platform.
This test still writes an extensionless shebang script and hands that path to `AgentRelayClient.spawn()`. On Windows, that launcher shape does not reliably execute, so the regression can fail before it ever exercises the stdout-drain path. Please switch to a platform-specific wrapper (`.cmd` on `win32`) or write the broker body to a real `.js`/`.mjs` file and invoke Node explicitly; `chmod()` should stay POSIX-only.
♻️ Minimal cross-platform shape

    - const brokerPath = join(brokerDir, 'agent-relay-broker');
    + const brokerScriptPath = join(brokerDir, 'agent-relay-broker.mjs');
    + const brokerPath =
    +   process.platform === 'win32'
    +     ? join(brokerDir, 'agent-relay-broker.cmd')
    +     : join(brokerDir, 'agent-relay-broker');
    - await writeFile(
    -   brokerPath,
    -   [
    -     '#!/usr/bin/env node',
    -     "import http from 'node:http';",
    -     ...
    -   ].join('\n'),
    -   'utf8',
    - );
    - await chmod(brokerPath, 0o755);
    + await writeFile(
    +   brokerScriptPath,
    +   [
    +     "import http from 'node:http';",
    +     ...
    +   ].join('\n'),
    +   'utf8',
    + );
    + if (process.platform === 'win32') {
    +   await writeFile(
    +     brokerPath,
    +     `@echo off\r\n"${process.execPath}" "${brokerScriptPath}" %*\r\n`,
    +     'utf8',
    +   );
    + } else {
    +   await writeFile(
    +     brokerPath,
    +     `#!/bin/sh\nexec "${process.execPath}" "${brokerScriptPath}" "$@"\n`,
    +     'utf8',
    +   );
    +   await chmod(brokerPath, 0o755);
    + }

Can Node.js child_process.spawn execute an extensionless shebang script directly on Windows, or is an explicit .cmd wrapper / process.execPath invocation required?
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/local/entrypoint.test.ts` around lines 3128 - 3180, The fake broker script written to brokerPath is extensionless and relies on a shebang, which fails on Windows; update the test to create a platform-appropriate launcher and keep chmod POSIX-only: either write the broker body to a real .js (or .mjs) file and ensure AgentRelayClient.spawn() is invoked with process.execPath and that .js path as the first arg, or on win32 additionally write a small .cmd wrapper next to brokerPath that forwards args to "%~dp0\\agent-relay-broker.js" and point AgentRelayClient.spawn() at the .cmd shim; retain the existing chmod(brokerPath, 0o755) but only call it on non-win32, and ensure the test still looks for the FAKE_BROKER_FLOOD_DONE stdout marker.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@src/local/entrypoint.test.ts`:
- Around line 3128-3180: The fake broker script written to brokerPath is
extensionless and relies on a shebang, which fails on Windows; update the test
to create a platform-appropriate launcher and keep chmod POSIX-only: either
write the broker body to a real .js (or .mjs) file and ensure
AgentRelayClient.spawn() is invoked with process.execPath and that .js path as
the first arg, or on win32 additionally write a small .cmd wrapper next to
brokerPath that forwards args to "%~dp0\\agent-relay-broker.js" and point
AgentRelayClient.spawn() at the .cmd shim; retain the existing chmod(brokerPath,
0o755) but only call it on non-win32, and ensure the test still looks for the
FAKE_BROKER_FLOOD_DONE stdout marker.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: faeb4846-c659-4a0c-9fc7-680e5cffd4dd
📒 Files selected for processing (2)
- src/local/entrypoint.test.ts
- src/local/entrypoint.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/local/entrypoint.ts
`'#!/usr/bin/env node',`
`"import http from 'node:http';",`
🔴 Fake broker script uses ESM import syntax in extensionless file, fails on Node.js 20
The test writes a fake broker script to an extensionless file path (`join(brokerDir, 'agent-relay-broker')`) at line 3128, then fills it with ES module syntax (`import http from 'node:http'` at line 3141). When the SDK spawns this file (via its `#!/usr/bin/env node` shebang), Node.js determines the module type from the file extension and the nearest `package.json`. Because the file has no extension and sits in an OS temp directory (no `package.json` with `"type": "module"` in the hierarchy), Node.js 20 treats it as CommonJS and throws `SyntaxError: Cannot use import statement outside a module`. The project's `.node-version` specifies 20 and the actual runtime is v20.18.1. Automatic ESM syntax detection is not enabled by default on that runtime; there it requires the `--experimental-detect-module` flag, which the shebang does not pass. As a result, the broker process crashes on startup, `AgentRelayClient.spawn()` times out, the workflow throws, and the test always fails, leaving the production monkey-patch (`src/local/entrypoint.ts:987-1005`) effectively untested.
The fix is to use CommonJS syntax in the fake broker script (e.g., const http = require('node:http')).
Prompt for agents
The fake broker script at lines 3139-3176 uses ESM import syntax (import http from 'node:http') but is written to an extensionless file in a temp directory. On Node.js 20 (the version in .node-version), this is treated as CommonJS and fails with SyntaxError.
To fix: Convert the fake broker script body from ESM to CJS. Replace the import statement with a require call:
- Change: import http from 'node:http';
- To: const http = require('node:http');
The rest of the script (http.createServer, server.listen, process.stdout.write, etc.) uses only runtime APIs and should work identically in CJS. No other files need to change.
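As a minimal sketch of the CommonJS shape recommended above, the snippet below writes a `require`-based script to an extensionless temp file and runs it with the current Node binary. `runExtensionlessCjsScript` is a hypothetical helper, and the two-line script body is a stand-in for the real fake broker, which is not reproduced here. It assumes no `package.json` with `"type": "module"` sits above the OS temp directory.

```typescript
import { execFile } from 'node:child_process';
import { mkdtemp, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Hypothetical helper: proves that CommonJS syntax loads from an extensionless file.
export async function runExtensionlessCjsScript(): Promise<string> {
  const dir = await mkdtemp(join(tmpdir(), 'fake-broker-'));
  try {
    // No extension on purpose, mirroring the launcher path used by the test.
    const scriptPath = join(dir, 'agent-relay-broker');
    await writeFile(
      scriptPath,
      [
        // require() instead of import, so the file parses as CommonJS.
        "const http = require('node:http');",
        "process.stdout.write(typeof http.createServer + '\\n');",
      ].join('\n'),
      'utf8',
    );
    // Invoke Node explicitly rather than relying on a shebang.
    const { stdout } = await execFileAsync(process.execPath, [scriptPath]);
    return stdout.trim();
  } finally {
    await rm(dir, { recursive: true, force: true });
  }
}
```

Swapping the first script line back to `import http from 'node:http';` is a quick way to check how a given Node version treats ESM syntax in an extensionless file.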
Addressed in 53ae63b. The fake broker body is now CommonJS (require) and lives in the init script that the copied Node executable runs, so Node 20 no longer parses ESM from an extensionless temp-file script.
Pulls in agent-relay/relay#838 ("Drain broker stdout after SDK startup")
which is the upstream/root fix for the broker pipe-buffer deadlock that
wedged overnight runs of the proactive-runtime workflows.
Reproducible symptom prior to this bump: after M1 fans out to 9 PTY
workers, every worker's log goes silent within seconds of each other,
broker process parks in `write()` (or `_pthread_cond_wait` once enough
events queue), and the workflow node's `step.run` awaits a drain signal
that never arrives. Two diagnostic captures (~/wedge-92b45d3e-diag/ and
~/wedge-c9600674-diag/, ~14h apart on independent runs) showed the same
freeze-at-fanout shape.
Ricky already shipped the loader-level unblocker in #94. This bump
brings in the upstream SDK fix so direct SDK consumers — including
Ricky-driven workflows whose runtime SDK is the bundled copy under
Ricky's node_modules (per #92's loader redirect) — pick up the proper
drain.
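The pipe-buffer deadlock described above can be reproduced in miniature. The sketch below is not the SDK's or Ricky's actual code: `floodExits` and `FLOOD_SCRIPT` are hypothetical names, and the child is a plain Node one-liner standing in for the broker. It spawns a child that writes about 4 MiB of `event-*` lines and reports whether the child exits on its own, with and without the parent draining `child.stdout`.

```typescript
import { spawn } from 'node:child_process';

// Child body: ~4 MiB of event-* lines, far more than an OS pipe buffer holds.
const FLOOD_SCRIPT = `
const line = 'event-' + 'x'.repeat(1017) + '\\n';
for (let i = 0; i < 4096; i++) process.stdout.write(line);
`;

// Hypothetical helper for illustration only.
// Resolves true if the child exits by itself before timeoutMs,
// false if it was still running and had to be killed.
export function floodExits(drain: boolean, timeoutMs = 3000): Promise<boolean> {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', FLOOD_SCRIPT], {
      stdio: ['ignore', 'pipe', 'inherit'],
    });
    let timedOut = false;
    const timer = setTimeout(() => {
      timedOut = true;
      child.kill('SIGKILL');
    }, timeoutMs);
    child.once('error', (err) => {
      clearTimeout(timer);
      reject(err);
    });
    child.once('exit', () => {
      clearTimeout(timer);
      resolve(!timedOut);
    });
    if (drain) {
      // Discard the output but keep the pipe flowing so the child never blocks.
      child.stdout.resume();
    }
  });
}
```

With `drain` set, the child should finish quickly. Without it, on POSIX systems the child is expected to park in `write()` once the parent's stream buffer and the OS pipe buffer fill, which matches the freeze-at-fanout shape reported in the diagnostics.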
Verification
- npm run typecheck — clean
- npm run build — bundles cleanly, externals unchanged
- npm test — 1075 / 1075 pass, including the existing
"drains broker stdout after SDK startup so event floods cannot wedge
the workflow node" regression in src/local/entrypoint.test.ts
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Verification