
Fix broker stdout drain in SDK workflow runner #94

Merged
khaliqgant merged 2 commits into main from codex/fix-broker-stdout-drain on May 10, 2026

Conversation

@khaliqgant
Member

Summary

  • patch the Ricky workflow SDK preload so agent-relay-broker stdout is resumed and drained after startup readers pause it
  • add a regression with a fake broker that floods stdout and fails if the workflow node wedges
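A fake broker of the kind the second bullet describes can be very small. The following is a minimal sketch, not the test added in this PR: `writeFakeBroker`, the line count, and the temp-directory prefix are hypothetical, and it assumes the runner can be pointed at the script through the `FAKE_BROKER_PATH` variable named in the review walkthrough.

```typescript
import { chmodSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: writes an executable Node script that floods stdout.
function writeFakeBroker(dir: string, lineCount: number): string {
  const scriptPath = join(dir, "agent-relay-broker");
  const source = [
    "#!/usr/bin/env node",
    "// Emit many event-* lines so an undrained pipe fills up quickly.",
    `for (let i = 0; i < ${lineCount}; i++) {`,
    '  process.stdout.write("event-" + i + "\\n");',
    "}",
    "",
  ].join("\n");
  writeFileSync(scriptPath, source);
  chmodSync(scriptPath, 0o755);
  return scriptPath;
}

const tempDir = mkdtempSync(join(tmpdir(), "fake-broker-"));
const previousBrokerPath = process.env.FAKE_BROKER_PATH;
const brokerPath = writeFakeBroker(tempDir, 200_000);
process.env.FAKE_BROKER_PATH = brokerPath; // assumption: the runner reads this variable
const brokerSource = readFileSync(brokerPath, "utf8");
console.log(`fake broker written to ${brokerPath}`);

// A real regression would run the workflow here under a timeout and fail if it wedges.

// Restore the environment and remove the temp directory.
if (previousBrokerPath === undefined) delete process.env.FAKE_BROKER_PATH;
else process.env.FAKE_BROKER_PATH = previousBrokerPath;
rmSync(tempDir, { recursive: true, force: true });
```

Restoring the environment variable and deleting the temp directory keeps the fake from leaking into other tests in the same process.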

Verification

  • npx tsc --noEmit --pretty false
  • npx vitest run src/local/entrypoint.test.ts -t "drains broker stdout"
  • npm test

@coderabbitai

coderabbitai Bot commented May 10, 2026

Review Change Stack
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 91a169ac-082e-417a-bd66-0c15381329fc

📥 Commits

Reviewing files that changed from the base of the PR and between 70e18ec and 53ae63b.

📒 Files selected for processing (1)
  • src/local/entrypoint.test.ts

📝 Walkthrough


This PR injects a one-time patch into the generated SDK loader that drains/resumes child stdout for agent-relay-broker init processes to avoid stdout backpressure, and adds a Vitest regression that simulates a broker stdout flood to verify the fix.
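The backpressure mentioned above can be illustrated in-process. This is a sketch under stated assumptions: a `PassThrough` with a small `highWaterMark` stands in for the OS pipe between broker and parent, and `chunksUntilBackpressure` is a hypothetical helper. A writer facing a paused reader is eventually told to stop, and resuming the reader lets it continue.

```typescript
import { PassThrough } from "node:stream";

// Count how many chunks a writer can push before write() signals backpressure.
function chunksUntilBackpressure(pipe: PassThrough, chunk: string, limit: number): number {
  for (let i = 1; i <= limit; i++) {
    if (!pipe.write(chunk)) return i;
  }
  return -1; // no backpressure observed within the limit
}

const pipe = new PassThrough({ highWaterMark: 1024 });
pipe.pause(); // models a startup reader that has stopped consuming

const stalledAfter = chunksUntilBackpressure(pipe, "event-0\n".repeat(64), 100);
console.log(`writer told to stop after ${stalledAfter} chunk(s)`);

// Once the reader flows again the buffers empty and 'drain' tells the writer to go on.
pipe.once("drain", () => console.log("drained: writer may continue"));
pipe.resume(); // flowing mode with no 'data' listener discards the buffered output
```

With a real child process the writer is the broker itself, so a full pipe blocks it rather than returning `false`; the parent-side remedy is the same: keep the readable side flowing.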

Changes

Broker Stdout Drain Workaround

| Layer / File(s) | Summary |
|-----------------|---------|
| **Core stdout drain implementation**<br>`src/local/entrypoint.ts` | The generated `registerSource` script now imports `createRequire` and `syncBuiltinESMExports`, calls `syncBuiltinESMExports()`, and applies a Symbol-guarded override of `node:child_process.spawn` that drains/resumes `agent-relay-broker init` child stdout. |
| **Regression test**<br>`src/local/entrypoint.test.ts` | New end-to-end test writes a fake `agent-relay-broker` binary that floods stdout with `event-*` lines, sets `FAKE_BROKER_PATH`/`RICKY_STATE_HOME`, runs a workflow via `runLocal`, asserts the run completes, and verifies the `BROKER_STDOUT_DRAINED` log marker; includes cleanup of env and temp dirs. |
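The Symbol-guarded override summarized in the table can be sketched as follows. This is an illustration, not the code in `src/local/entrypoint.ts`: the guard key, `isBrokerInit`, `installStdoutDrain`, and the simplified `SpawnHost` shape are all hypothetical, and the demo runs against a fake host so no process is spawned.

```typescript
import { PassThrough, type Readable } from "node:stream";

// Minimal shapes so the sketch does not depend on the full child_process typings.
interface ChildLike {
  stdout: Readable | null;
}
type SpawnLike = (command: string, args?: readonly string[]) => ChildLike;
interface SpawnHost {
  spawn: SpawnLike;
}

// Hypothetical guard key; the symbol used by the real patch is not shown on this page.
const DRAIN_PATCHED = Symbol.for("example.broker-stdout-drain");

// Hypothetical matcher for `agent-relay-broker init` invocations.
function isBrokerInit(command: string, args: readonly string[] = []): boolean {
  return command.includes("agent-relay-broker") && args.includes("init");
}

// Wraps host.spawn once; returns false when the guard shows it is already wrapped.
function installStdoutDrain(host: SpawnHost): boolean {
  const guard = host as unknown as Record<symbol, boolean | undefined>;
  if (guard[DRAIN_PATCHED]) return false;
  guard[DRAIN_PATCHED] = true;

  const originalSpawn = host.spawn;
  host.spawn = (command, args) => {
    const child = originalSpawn.call(host, command, args);
    const stdout = child.stdout;
    if (stdout && isBrokerInit(command, args)) {
      // Whenever a reader pauses the stream, put it back into flowing mode
      // so the child never blocks on a full pipe.
      stdout.on("pause", () => setImmediate(() => stdout.resume()));
      stdout.resume();
    }
    return child;
  };
  return true;
}

// Demo against a fake host; no real process is spawned.
const fakeStdout = new PassThrough();
const fakeHost: SpawnHost = { spawn: () => ({ stdout: fakeStdout }) };
const firstInstall = installStdoutDrain(fakeHost);
const secondInstall = installStdoutDrain(fakeHost);
fakeHost.spawn("agent-relay-broker", ["init"]);
console.log({ firstInstall, secondInstall, flowing: fakeStdout.readableFlowing });
```

To apply the same wrapper to the real module, a preload could pass the CommonJS `child_process` object obtained through `createRequire` and then call `syncBuiltinESMExports()` so that ESM named imports of `spawn` see the override. The real `spawn` also accepts an options object in place of the argument array, which this sketch ignores.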

Sequence Diagram(s)

sequenceDiagram
  participant Client as Test runner
  participant Runtime as Generated Loader / runtime
  participant SpawnPatch as spawn override
  participant Broker as agent-relay-broker init
  participant Stdout as child.stdout
  Client->>Runtime: start workflow (runLocal)
  Runtime->>SpawnPatch: apply one-time patch & call syncBuiltinESMExports()
  Runtime->>Broker: spawn agent-relay-broker init
  Broker->>Stdout: flood with event-* lines
  SpawnPatch->>Stdout: attach pause/resume and handle drain
  Stdout->>Runtime: resume delivery, avoid blocking
  Runtime->>Client: log BROKER_STDOUT_DRAINED and finish

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I wrote a tiny patch today,
A drain to keep the streams at bay.
When brokers shout and pipes run full,
We pause and resume — avoid the lull.
Now workflows hop along, hooray! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title directly describes the main fix in the changeset: patching broker stdout drain in the SDK workflow runner. |
| Description check | ✅ Passed | The description accurately explains both the fix (patching stdout drain) and the regression test, and includes verification steps. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@khaliqgant khaliqgant force-pushed the codex/fix-broker-stdout-drain branch from 0d9b04e to 70e18ec Compare May 10, 2026 20:21
@github-actions
Contributor

github-actions Bot commented May 10, 2026

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-10T20-30-18-322Z-HEAD-provider
Mode: provider
Git SHA: 790df77

Passed: 8 | Needs human: 43 | Reviewable: 43 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Scope**  
- Update the **Ricky** repository to surface that Ricky now leverages the **Agent Assistant** (AA) for deeper workflow orchestration, validation, and coordination.  
- Changes are limited to documentation, the top‑level `ricky/__init__.py` metadata, and the AA integration shim in `ricky/agent_assistant.py`.  
- No functional behavior changes beyond exposing the AA dependency; existing unit‑tests continue to pass.

---

## 1. Repository Changes (Branch: `feature/aa‑deep‑integration`)

| Path | Change Type | Description |
|------|-------------|-------------|
| `README.md` | Doc update | Add a “Deep Integration with Agent Assistant” section, update badge, and adjust the quick‑start example to show AA‑driven validation. |
| `docs/architecture.md` | Doc update | Expand the architecture diagram description to include the AA orchestration layer, its contract (`AAContract`), and the deterministic verification gate (`AAValidator`). |
| `ricky/__init__.py` | Code/metadata | Bump `__version__` to `0.4.0‑aa` and add `AGENT_ASSISTANT_VERSION = ">=1.2.0"` constant. |
| `ricky/agent_assistant.py` | New shim | Implement a thin wrapper that imports `agent_assistant.core` and exposes two
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review (as of 2026‑05‑10)**  

| Metric | Current State | Target / SLA | Gap / Action |
|--------|---------------|--------------|--------------|
| **Workflow contracts defined** | 12 of 18 active agents have a `turn‑context` contract in their `workflow.yaml`. | 100 % of agents with turn‑based interactions must expose a contract. | Identify the 6 agents lacking contracts; create minimal contracts (input: `prevTurn`, `currTurn`; output: `contextSummary`). |
| **Deterministic verification gates** | 7 agents have automated gate tests that assert `turn‑context` payload shape and idempotence. | Every contract must be guarded by a gate that runs on every PR. | Add gates for the remaining 5 agents; use the shared `verify-turn-context.yml` template. |
| **Review artifacts** | 68 % of PRs touching `turn‑context` include a **Context‑Impact Review** checklist. | 100 % compliance. | Enforce checklist via branch‑policy “Require status check: Context‑Impact Review”. |
| **Validation loops** | Average of 42 validation loops per PR (target 80‑100). | 80‑100 loops to surface edge‑case regressions. | Increase fuzz‑testing depth (`fuzz-turn-context.yml`) 
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation: Move all four capabilities into the Agent Assistant**

| Capability | Why it belongs in the Agent Assistant | Impact on reliability & coordination |
|------------|----------------------------------------|--------------------------------------|
| **Workflow generation** | The Assistant already owns the *authoring contract* (input spec → workflow DSL). Adding generation lets it produce a complete, version‑controlled workflow in a single step, eliminating hand‑off to a separate “generator” service. | Guarantees that generated artifacts are immediately wrapped in the standard `workflow.yaml` contract, enabling downstream deterministic verification gates. |
| **Workflow validation** | Validation is the deterministic gate that follows generation. Embedding it in the Assistant lets the same execution context run the generated DSL through the shared validator (schema, type‑check, static analysis). | Immediate feedback loop (≤ 2 s) reduces the number of validation‑failure PRs and keeps the 80‑100 loop target realistic. |
| **Workflow debugging** | Debugging is the *repair* phase of the authoring lifecycle. By exposing a `debug` sub‑command that re‑runs the workflow with in
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target window: 2026‑07‑01 → 2026‑09‑30 (13 weeks)*  

---

## 1. Slice Objective
Deliver a production‑ready, self‑service adoption package for the **Ricky** AgentWorkforce assistant that enables **10 pilot teams** to author, test, and deploy reliable workflows with the full set of Ricky‑specific contracts, verification gates, and review artifacts.

Success is measured by:
| KPI | Target |
|-----|--------|
| **On‑boarded pilot teams** | 10 (≥ 8 % ≥ 90 % adoption after 4 weeks) |
| **Workflow pass‑rate** (all contracts satisfied) | ≥ 95 % |
| **Mean Time to Repair (MTTR)** for blocker tickets | ≤ 24 h |
| **Documentation coverage** (contract + verification + review) | 100 % of new artifacts |

---

## 2. Scope & Boundaries  

| In‑Scope | Out‑of‑Scope |
|----------|--------------|
| • Ricky contract library (creation, versioning, CI checks)  <br>• Deterministic verification gates (unit, integration, contract‑test)  <br>• Review workflow (PR template, mandatory reviewer matrix, “Gate‑Keeper” bot)  <br>• Adoption playbook (step‑by‑step guide, FAQ, sample repo)  <br>• Pilot onboarding (Slack channel, onboarding tickets, weekl
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky – Future Support Interface (Slack & Web)

Below is a **concrete, contract‑driven design** for extending Ricky’s support capabilities to Slack and a web‑based console. The design follows the **AgentWorkforce workflow standards** (deterministic gates, review artifacts, 80‑100 validation loops, blocker reporting) and is scoped for a single feature branch **`feat/support‑ui‑v2`**.

---

### 1. High‑Level Architecture

| Component | Responsibility | Primary Tech | Interaction |
|-----------|----------------|--------------|-------------|
| **Ricky Core** | Business logic, workflow orchestration | Python 3.11, FastAPI | Exposes **REST** and **gRPC** endpoints |
| **Slack Adapter** | Translate Slack events ↔ Ricky Core | Bolt for Python, Socket Mode | Subscribes to `message`, `app_mention`, `block_actions` |
| **Web Console** | Rich UI for ticket view, escalation, logs | React 18 + TypeScript, Vite, TailwindCSS | Calls Ricky Core via **OpenAPI** client |
| **Auth Service** | OAuth2 for Slack & SSO for web | Auth0 (or internal OIDC) | Issues JWTs for both adapters |
| **Observability** | Tracing, metrics, error aggregation | OpenTelemetry, Prometheus, Grafana | All services emit `t
...[truncated]

Must

  • Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
  • Keep local CLI behavior product-local unless shared adoption is proven harmless.
  • Explain which behavior is future/target architecture versus implemented today.

Must Not

  • Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
  • Let future surface abstractions distort the current CLI contract.
  • Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What happens when you run**

```bash
ricky --mode local --spec "generate a workflow for package checks"

without the --run flag?

  1. Specification‑only mode

    • ricky parses the --spec string, expands any template variables, and builds an in‑memory workflow contract (the JSON/YAML description of the workflow, its stages, inputs, outputs, and verification gates).
    • No tasks are dispatched to the executor, no containers are started, and no side‑effects (e.g., file writes, network calls) occur.
  2. Validation & linting

    • The generated contract is run through the deterministic verification gates defined in workflow/validation/.
    • These gates check:
      • Schema compliance (required fields, correct types).
      • Branch/PR boundaries (each stage’s branch and pr scopes are well‑formed).
      • Dependency graph (no cycles, all inputs satisfied).
      • Resource limits (CPU/memory caps are within allowed ranges).
    • Any violations are reported as blockers with line‑number references to the generated spec.
  3. Review artifacts

    • Because you omitted --run, ricky automatically writes two read‑o
      ...[truncated]

**Must**

- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.

**Must Not**

- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>

**User message**

> Render Ricky's first-run CLI onboarding for a new user.

**Ricky output**

```text
**Ricky – First‑Run CLI Onboarding Workflow**  
*(Designed for a brand‑new user installing the `ricky` command‑line tool)*  

---

## 1. Overview
The onboarding workflow guides the user through:

1. **Environment verification** – ensure required runtimes and permissions.  
2. **Ricky installation validation** – confirm the binary is reachable.  
3. **Workspace initialization** – create a local `ricky-workspace/` with the standard repo layout.  
4. **Authentication & profile setup** – generate a personal access token (PAT) and store it securely.  
5. **Sample workflow bootstrap** – scaffold a minimal “Hello‑World” workflow and run a dry‑run verification.  
6. **Final health check** – run `ricky health` and present a concise success report.

All steps are deterministic, idempotent, and emit structured JSON logs for downstream CI/CD verification.

---

## 2. Contract (CLI Invocation)

```bash
# One‑liner for first‑run onboarding
ricky onboarding --mode=first-run \
    --workspace=~/ricky-workspace \
    --log-format=json \
    --no-interactive   # optional, for scripted CI
Flag Description Default
--mode Must be first-run for this
...[truncated]

**Must**

- Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
- End every branch with a concrete next step.
- Advertise only commands that are currently implemented.

**Must Not**

- Sound like a launch page or documentation dump.
- Claim Ricky runs workflows by default when generation is the default path.
- Require web or Slack onboarding before CLI use.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.recovery-guidance-no-stack-traces</code> (cli-behavior/manual)</summary>

**User message**

> A user gives Ricky an empty spec or a missing spec file.

**Ricky output**

```text
**Blocker – Missing Specification**

The workflow authoring process requires a concrete specification file (e.g., `workflow.yaml`, `contract.json`, or equivalent) that defines:

1. **Inputs / Triggers** – data schema, event sources, and required authentication.  
2. **Steps / Tasks** – ordered actions, responsible agents, and any external service calls.  
3. **Outputs / Artifacts** – result schema, storage locations, and downstream notifications.  
4. **Verification Gates** – deterministic checks (unit tests, schema validation, contract assertions) that must pass before promotion.  
5. **Review & Release Metadata** – branch naming, PR target, reviewers, and required approval count.

Without this spec, I cannot:

- Generate a deterministic workflow contract.  
- Create verification gates or validation loops.  
- Scope the branch/PR boundaries for review.  
- Produce any of the required review artifacts (e.g., test matrix, CI config).

**Next Steps for You**

1. **Create or locate the spec file** in the repository root (or the designated `spec/` directory).  
2. Ensure it follows the agreed schema (YAML or JSON) and includes the sections listed above.  
3. Commit the file to a featur
...[truncated]

Must

  • Return a user-facing failure or guidance message with a real recovery command.
  • Distinguish generation failure from execution failure.
  • Show stack traces only when verbose diagnostic mode is requested.

Must Not

  • Crash with an uncaught exception in normal mode.
  • Suggest commands that do not exist.
  • Pretend a missing spec was accepted.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**  

=== AgentWorkforce Status Report ===
Timestamp: 2026-05-10T14:32:07Z

🟡 Provider Connectivity
• Google : NOT VERIFIED (no successful health‑check)
• GitHub : NOT VERIFIED (no successful health‑check)

🔧 Core Services
• Scheduler : RUNNING
• Dispatcher : RUNNING
• Storage : RUNNING
• Metrics : RUNNING

⚙️ Configuration
• Active Branches : main, dev
• Feature Flags : all defaults (no overrides)

🚨 Alerts & Blockers
• ProviderCheck: Google connectivity unconfirmed – workflows requiring Google APIs will be blocked.
• ProviderCheck: GitHub connectivity unconfirmed – repository syncs and PR automation are disabled.

📝 Next Steps (recommended)

  1. Run ricky provider check google and ricky provider check github to trigger health checks.
  2. Verify OAuth/SSH credentials and network access for each provider.
  3. Re‑run ricky status after checks to confirm “CONNECTED” state.

--- End of Report ---

Must

  • Report unknown or not-connected provider state honestly.
  • Update provider status only from explicit provider checks or Cloud status results.
  • Give concrete setup guidance for Cloud when relevant.

Must Not

  • Mark Google or GitHub connected because guidance text was shown.
  • Invent a provider connection URL or OAuth flow.
  • Show empty fields with no recovery guidance when config is missing.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)

User message

Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

Ricky output

## GitHub Primitive‑Change Workflow  
**Purpose:** Apply a single‑file edit (e.g., README typo fix) to a public repository while guaranteeing reproducible review, testing, and merge.  

### 1. Workflow Contract (YAML‑style summary)

```yaml
name: primitive-change
description: One‑file edit → PR → CI → Merge
trigger:
  - manual (Ricky UI) or API call
inputs:
  repo:               # owner/repo string
  branch: main        # target branch
  file_path:          # relative path to edited file
  new_content:        # full file contents (or diff)
  commit_message:     # PR title & body
  author:             # GitHub username
outputs:
  pr_number: int
  pr_url: string
  merge_status: {merged: bool, reason?: string}
gates:
  - syntax_check
  - lint
  - unit_test (if applicable)
  - review_approval
  - merge_protection
validation_loops: 80-100   # deterministic loops for each gate

2. Selected Ricky Skills

Skill Why it’s needed for this workflow
GitOps Clone, branch, commit, push, PR creation, merge.
CI/CD Orchestration Trigger GitHub Actions for lint, tests, and status checks.
Deterministic Verification
...[truncated]

**Must**

- Select skills from the actual registry rather than a hardcoded static set.
- Record ranked skill matches with confidence and trigger evidence.
- Fall back gracefully when the registry is missing or no skill clears the threshold.

**Must Not**

- Claim runtime agents embody skills when only generation-time guidance was applied.
- Fail generation solely because optional skill files are missing.
- Hide skill selection evidence from artifacts or JSON output.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.tool-selector-honors-spec-hints</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow where the spec says "use Claude to review and Codex to implement".

**Ricky output**

```text
**Workflow: “Claude‑Review → Codex‑Implement”**  
*Purpose*: Automate a deterministic pipeline where Claude (LLM‑reviewer) validates design/spec text and Codex (code‑generator) produces the implementation. The workflow is scoped to a single feature branch and is gated by explicit verification steps to guarantee reliability and traceability.

---

## 1. Contract Overview  

| Element | Description |
|---------|-------------|
| **Input** | `spec.md` – Markdown file containing the functional specification for a single feature. |
| **Outputs** | 1. `review_report.md` – Claude’s review with acceptance criteria and change requests.<br>2. `implementation/` – Generated source files (e.g., `.py`, `.js`).<br>3. `ci_report.json` – Structured verification results. |
| **Success Criteria** | • Claude marks the spec as **Approved** (no “BLOCKER” tags).<br>• All generated files pass static analysis, unit‑test suite, and schema validation.<br>• No open “review‑blocker” comments remain. |
| **Failure Modes** | • Claude returns a **BLOCKER** → abort, create issue.<br>• Codex output fails any verification gate → auto‑re‑run up to 3 attempts, then open a PR with “implementation‑needs‑fix”. |
| **Owner
...[truncated]

Must

  • Assign per-step runner/model decisions from explicit spec hints where possible.
  • Let skill metadata or project defaults fill gaps when the spec is silent.
  • Write tool-selection.json or equivalent audit metadata.

Must Not

  • Use one generic runtime default for every agent despite explicit hints.
  • Let skill defaults override explicit user/spec runner hints.
  • Omit the reason each tool or model was chosen.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

Below is a compact “how‑to” for using the **`--refine`** option when you want to tighten a workflow’s step‑level task descriptions and acceptance gates. The pattern follows the **Ricky** repository conventions (AGENTS.md, workflow standards, shared authoring rules).

---

## 1. When to invoke `--refine`

| Situation | Goal |
|-----------|------|
| Draft workflow already generated (e.g., via `ricky generate …`) | Sharpen language, make tasks **action‑oriented**, and turn vague success criteria into **deterministic verification gates**. |
| Existing workflow is passing but **reviewers flag ambiguity** | Add concrete inputs/outputs, explicit success metrics, and a “blocking‑condition” checklist. |
| You need to **prepare a PR** that meets the 80‑to‑100 validation‑loop requirement | Use `--refine` to produce a diff that only touches step‑level docs, keeping the code/graph unchanged. |

---

## 2. Command‑line syntax

```bash
ricky refine \
  --workflow <path/to/workflow.yaml> \
  --output   <path/to/refined.yaml> \
  [--steps step1,step3,…] \
  [--gate  deterministic|manual] \
  [--dry-run]
Flag Meaning
--workflow Source workflow file (YAML/JSON).

...[truncated]


**Must**

- Keep the deterministic unrefined path as the default.
- Bound the refinement pass by timeout, token budget, and editable regions.
- Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

**Must Not**

- Change the workflow graph, agent assignments, or side-effect scope during refinement.
- Fail the whole generation if optional refinement times out.
- Hide warnings when the deterministic artifact is returned unchanged.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.behavior-grounded-gates</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for the `ricky --version` spec.

**Ricky output**

```text
**Workflow: `ricky --version` Specification & Delivery**  
*Repository:* `github.com/agentworkforce/ricky`  
*Target Branch:* `main` → `feat/version-cli`  
*Owner:* `@ricky-devlead`  
*Reviewers:* `@ricky-qa`, `@ricky-docs`, `@ricky-sre`  

---

## 1. Objective
Define, implement, test, and document the `ricky --version` command so that it:

1. Prints the current semantic version of the Ricky binary (e.g., `ricky version 2.4.1`).
2. Returns exit code **0** on success, **1** on failure (e.g., missing build metadata).
3. Is discoverable via `ricky --help` and `ricky -h`.
4. Is covered by unit, integration, and e2e tests with deterministic verification gates.
5. Is included in the generated CLI reference docs.

---

## 2. Contract (CLI & API)

| Element | Specification |
|---------|----------------|
| **Command** | `ricky --version` (alias `ricky -v`) |
| **Output** | `ricky version <MAJOR>.<MINOR>.<PATCH>` (no trailing newline required, but allowed) |
| **Exit Code** | `0` on success, `1` on any internal error |
| **Source of Truth** | `internal/version.Version` constant generated at build time via `-ldflags "-X internal/version.Version=$(git describe --tags --always --dirty)"` |
| **
...[truncated]

Must

  • Build gates from the stated acceptance behavior, such as checking ricky --version output.
  • Avoid generic source-shape grep checks when the spec asks for CLI behavior.
  • Keep generated validation meaningful for the current repo shape.

Must Not

  • Treat grep -Eq 'export|function|class|workflow(' dist/ricky.js as proof of version behavior.
  • Claim the workflow is proven by source syntax alone.
  • Ignore the package-json version resolution order in the spec.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow Name:** `multi‑artifact‑validation‑loop`

**Purpose**  
Process a batch of independent artifacts (e.g., data files, code modules, model checkpoints) through a deterministic **Validate → Fix → Rerun** cycle until each artifact meets its contract or is reported as a blocker. The workflow is scoped to a single branch (`feature/multi‑artifact‑loop`) and produces a single PR that aggregates all review artifacts.

---

## 1. Contract Definition (`contract.yaml`)

```yaml
artifacts:
  - id: string            # unique identifier (e.g., filename or module name)
    type: enum            # [data, code, model]
    version: semver
    checksum: sha256
validation:
  data:
    schema: path/to/schema.json
    max_missing: 0
  code:
    lint: true
    test_coverage: ">=80%"
  model:
    metrics:
      accuracy: ">=0.92"
      drift: "<=0.05"
fix_policy:
  auto_fix: true          # apply auto‑fixers where safe
  manual_review: true     # require human sign‑off for non‑trivial fixes
rerun_limit: 5            # max cycles per artifact

All artifacts must satisfy all fields under validation for their type before the workflow can merge.


2. High‑Level Stages

| Stage | De
...[truncated]


**Must**

- Choose `dag`, `supervisor`, or `pipeline` deliberately based on the work shape.
- Explain the pattern choice in artifact metadata or a rationale.
- Use `dag` for validation/fix/rerun loops when dependencies matter.

**Must Not**

- Default blindly to `dag` for every workflow.
- Collapse independent artifact work into one vague agent task.
- Omit verification gates because the chosen pattern seems obvious.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.skill-boundary-copy</code> (generation-quality/manual)</summary>

**User message**

> Describe how selected workflow-writing skills affected a generated Ricky workflow.

**Ricky output**

```text
### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow  

| Skill | What the Skill Does | Direct Effect on the Generated Workflow | Example Outcome |
|-------|--------------------|----------------------------------------|-----------------|
| **Deterministic Contract Definition** | Forces the author to write explicit input‑output contracts for every step (type, required fields, optional defaults). | • Every node in the DAG has a **`contract`** block that the runtime validates before execution.<br>• Guarantees that downstream steps receive exactly what they expect, eliminating “missing‑field” runtime errors. | `stepA.contract = { input: {url: "string"}, output: {html: "string"} }` |
| **Verification Gate Insertion** | Inserts automated checks (unit test, schema validation, smoke‑run) after each critical transition. | • The workflow includes **`gate`** stages that run a small test harness; failure aborts the pipeline and raises a blocker ticket. | After `parseHTML`, a gate runs `assert(html.length > 0)`; on failure the PR is marked **blocked**. |
| **Review Artifact Generation** | Auto‑creates markdown artifacts (change‑log, decision matrix, risk register) that acco
...[truncated]
```

Must

  • Describe skills as generation-time selection, loading, and rendering inputs.
  • Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
  • Say the workflow instructions were informed by selected skills.

Must Not

  • Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
  • Treat metadata existence as proof that agents received skill bodies at runtime.
  • Overstate current tests beyond the generation boundary.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

Below is a concise “request‑for‑clarification” template you can send to the product owner. It follows our **Ricky** conventions for deterministic workflow authoring: it defines the contract (inputs, outputs, verification gates), the review artifacts we’ll need, and the iteration budget (80‑100 validation loops). Once the missing details are supplied we can generate a complete, branch‑ready workflow PR.

---

## Request for Clarification – Vague Product Spec

| **Area** | **What We Need** | **Why It Matters** |
|----------|------------------|--------------------|
| **Goal / Business Outcome** | A one‑sentence description of the primary user or business problem the workflow must solve. | Drives acceptance criteria and success metrics. |
| **Trigger(s)** | • Event source (e.g., webhook, schedule, UI action).<br>• Payload schema (JSON example or field list). | Determines the entry point and required validation. |
| **Core Steps** | For each logical step: <br>1. **Name** (e.g., “Validate Order”). <br>2. **Action type** (API call, DB query, transformation, human review). <br>3. **Inputs** (fields, external services). <br>4. **Outputs** (data produced, side‑effects). | Allows us to map de
...[truncated]

Must

  • Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
  • Validate with dry-run or targeted structural checks where possible.
  • Return artifacts, warnings, and follow-up commands honestly.

Must Not

  • Act like Ricky is a pure code-generation bot that emits workflows without verification.
  • Stop at "code compiles" as the proof bar.
  • Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s post‑failure protocol**

1. **Immediate containment**  
   - **Mark the timed‑out step as *failed*** in the workflow run record.  
   - **Terminate the stray worker** (send a graceful shutdown signal; if it does not respond within 30 s, force‑kill).  
   - Capture the worker’s final heartbeat and any partial logs before termination.

2. **Evidence collection**  
   - Pull the **step‑level logs**, the worker’s stdout/stderr streams, and the **resource‑usage snapshot** (CPU, memory, network) at the moment of timeout.  
   - Store these artifacts in the run‑specific `artifacts/` bucket with a deterministic name:  
     ```
     artifacts/<run‑id>/<step‑name>_timeout_<timestamp>.log
     ```
   - Record the **worker ID**, launch parameters, and any environment variables in a `worker‑metadata.json` file.

3. **Automated incident ticket**  
   - Create a ticket in the configured issue tracker (e.g., GitHub Issues) with the label `workflow‑failure`.  
   - Populate the ticket body with:
     * Run ID, branch, commit SHA  
     * Step that timed out and its configured timeout value  
     * Worker ID and termination reason  
     * Links to the collected logs/artifacts  
     * A
...[truncated]

Must

  • Classify the failure before deciding whether to retry.
  • Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
  • Preserve raw evidence and uncertainty when the class is not known.

Must Not

  • Blindly rerun the whole workflow without a blocker classification.
  • Treat every failure as a broken workflow definition.
  • Claim Ricky fixed the workflow before rerun evidence proves it.
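A minimal sketch of the classify-before-retry rule this case checks for, applied to the scenario in the user message (one step timed out, one worker stayed alive without artifacts). Every type and function name below is hypothetical; the class names only mirror the rubric's wording, and the retry policy is an assumption for illustration, not Ricky's implementation.

```typescript
// Illustrative only: the class names follow the rubric's wording, and every
// type and function here is hypothetical rather than Ricky's real API.
export type FailureClass =
  | 'agent-runtime-opacity'
  | 'timeout'
  | 'environment-blocker'
  | 'workflow-structure-bug'
  | 'validation-strategy-mismatch'
  | 'unknown';

export interface StepObservation {
  stepId: string;
  timedOut: boolean;
  workerAlive: boolean;
  artifactsProduced: number;
  rawLogPath: string; // raw evidence is carried along, never summarized away
}

export interface Classification {
  stepId: string;
  failureClass: FailureClass;
  evidence: string[];
}

// Classify a single step from what was observed. Anything that does not fit
// a known shape stays 'unknown' so the uncertainty is preserved.
export function classifyStep(obs: StepObservation): Classification {
  const evidence = [obs.rawLogPath];
  if (obs.timedOut) {
    return { stepId: obs.stepId, failureClass: 'timeout', evidence };
  }
  if (obs.workerAlive && obs.artifactsProduced === 0) {
    return { stepId: obs.stepId, failureClass: 'agent-runtime-opacity', evidence };
  }
  return { stepId: obs.stepId, failureClass: 'unknown', evidence };
}

// Assumed policy: a retry is only considered once every failure has a class.
export function mayRetry(classifications: Classification[]): boolean {
  return (
    classifications.length > 0 &&
    classifications.every((c) => c.failureClass !== 'unknown')
  );
}
```

The point of the shape is ordering: the retry decision takes classifications as input, so a whole-workflow rerun cannot happen before a class has been recorded for each failure.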

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow: Detect and Guard Against Stale Relay State Before a Local Run**  

| Phase | Goal | Contract (input → output) | Deterministic Verification Gate | Review Artifact |
|-------|------|---------------------------|--------------------------------|-----------------|
| **1️⃣ Scan** | Locate any stale state directories in the project root. | `projectRoot/` → `foundDirs: { .agent‑relay?, .relay?, .trajectories? }` | ‑ Exists‑check: each path must be a directory **and** contain at least one file older than `MAX_AGE_MS` (default = 7 days). | `scan‑report.md` (list of dirs, timestamps, age) |
| **2️⃣ Evaluate** | Decide whether the stale state is safe to keep. | `foundDirs` → `action: { keep, clean, abort }` | ‑ Policy‑check: if any dir’s age > MAX_AGE_MS **and** `workflowConfig.allowStale===false` → `abort`. Otherwise → `clean`. | `decision‑log.md` (policy rationale) |
| **3️⃣ Guard** | Enforce the chosen action before the workflow engine starts. | `action` → side‑effects (no‑op / rm‑rf / exit 1) | ‑ Post‑condition: <br>• `keep` → directories unchanged.<br>• `clean` → directories removed and **empty**.<br>• `abort` → process terminated with exit code 1 and error message. | `guard‑s
...[truncated]

Must

  • Classify stale local runtime state as an environment contamination issue.
  • Recommend quarantine or isolated-run guidance before launch.
  • Record the observed paths and the action taken or recommended.

Must Not

  • Treat stale runtime state as a workflow logic failure.
  • Delete or overwrite state without an explicit safe path or user intent.
  • Continue into execution as if the workspace were clean.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**Quick‑Fix Checklist for “Another Ricky/Relay run is already active in this workspace”**

| Step | Action | Command / UI | Expected Result |
|------|--------|--------------|-----------------|
| 1️⃣ | **Inspect the lock file** | `cat .ricky/run/active.marker` (or open the file in the IDE) | Shows the PID and start‑time of the process that created the marker. |
| 2️⃣ | **Verify the process is still alive** | `ps -p <PID>`  or  `ricky status` | If the PID is **not** running, the marker is stale. |
| 3️⃣ | **If stale – remove the marker** | `rm .ricky/run/active.marker` | Marker cleared; you can start a new run. |
| 4️⃣ | **If the process is alive – decide what to do** | • **Continue** the existing run (no further action).<br>• **Abort** it: `ricky abort --pid <PID>` or `kill -SIGTERM <PID>` | The original run stops and the marker is removed automatically. |
| 5️⃣ | **Confirm the workspace is clean** | `ricky status` | Should report *No active runs*. |
| 6️⃣ | **Start your new run** | `ricky run <workflow‑name>` | New run creates a fresh marker and proceeds. |

---

### Detailed Flow (Deterministic Verification Gates)

1. **Gate A – Marker Presence**  
   - **Check**: `test -f .ricky/
...[truncated]

Must

  • Report the active marker, run id, or status path when available.
  • Ask the user to inspect, wait for, or explicitly clear the active run.
  • Avoid launching a competing run that could corrupt evidence.

Must Not

  • Silently start another run.
  • Hide the existing run marker from the user.
  • Treat the conflict as a generic failure with no recovery path.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.env-loader-injection-runtime-loadable (runtime-recovery/manual)

User message

A workflow artifact references a MISSING_ENV_VAR value. Ricky's deterministic auto-fix injects the .env.local / .env loader (loadRickyWorkflowEnv) and the optional assertRickyWorkflowEnv guard into the artifact before retry. The artifact may be a master-rendered workflow whose .step({ command: ... }) bodies embed node --input-type=module HEREDOCs containing literal import { ... } from 'node:fs' / from 'node:path' strings.

Ricky output

**Deterministic auto‑fix for missing environment variables**

Below is the concrete, reproducible patch you can apply to any workflow artifact that currently fails because it references a `MISSING_ENV_VAR`. The fix does three things:

1. **Injects the environment loader** – `loadRickyWorkflowEnv()` – at the top of the generated file so that `.env.local` (or fallback `.env`) is read before any step runs.  
2. **Adds an optional guard** – `assertRickyWorkflowEnv()` – that throws a clear error if the required variable is still undefined after loading.  
3. **Leaves the existing step bodies untouched**, even when they contain HEREDOC‑style `node --input-type=module` blocks with literal `import … from 'node:fs'` or `import … from 'node:path'` strings.

---

## 1. Helper utilities (single source of truth)

Create (or update) `ricky-workflow-env.ts` in the workflow’s shared utilities folder:

```ts
// ricky-workflow-env.ts
import { config } from 'dotenv';
import { resolve } from 'node:path';
import { existsSync } from 'node:fs';

/**
 * Load environment variables from .env.local (preferred) or .env.
 * Returns the loaded env object for optional downstream use.
 */
export function loadRick
...[truncated]
```

Must

  • Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected loadRickyWorkflowEnv body references rickyWorkflowFs.* and rickyWorkflowPath.*, so the repair must also add the corresponding import * as rickyWorkflowFs from 'node:fs' and import * as rickyWorkflowPath from 'node:path' aliases at module top level.
  • Detect existing alias imports by matching real top-level import * as <alias> from '<module>' statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in .step({ command: ... }) bodies do not count as imports).
  • Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

Must Not

  • Skip adding the rickyWorkflowFs / rickyWorkflowPath aliases because from 'node:fs' or from 'node:path' already appears somewhere in the file as a string literal.
  • Inject loadRickyWorkflowEnv (or assertRickyWorkflowEnv) without the supporting alias imports, which produces a ReferenceError: rickyWorkflowPath is not defined at module load and burns the auto-fix budget on UNSUPPORTED_RUNTIME at runtime-launch.
  • Rewrite or escape the embedded HEREDOC text in step commands.
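The detection rule above can be sketched as follows. This is a rough illustration under assumptions, not the repair code in the repository: the helper names `hasTopLevelNamespaceImport` and `ensureAliasImports` are hypothetical, and a line-anchored regex stands in for a real parser.

```typescript
// Hypothetical helpers; only the alias and module names come from the case text.

// True when `source` contains a real namespace import of the form
//   import * as <alias> from '<moduleName>';
// starting at the beginning of a line. A bare substring such as
// "from 'node:fs'" inside a HEREDOC does not satisfy this pattern.
export function hasTopLevelNamespaceImport(
  source: string,
  alias: string,
  moduleName: string,
): boolean {
  const esc = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const pattern = new RegExp(
    `^import\\s+\\*\\s+as\\s+${esc(alias)}\\s+from\\s+['"]${esc(moduleName)}['"]\\s*;?\\s*$`,
    'm',
  );
  return pattern.test(source);
}

const REQUIRED_ALIASES = [
  { alias: 'rickyWorkflowFs', moduleName: 'node:fs' },
  { alias: 'rickyWorkflowPath', moduleName: 'node:path' },
];

// Prepend whichever alias imports are missing. The rest of the artifact,
// including embedded HEREDOC text in step commands, is passed through unchanged.
export function ensureAliasImports(source: string): string {
  const missing = REQUIRED_ALIASES
    .filter((r) => !hasTopLevelNamespaceImport(source, r.alias, r.moduleName))
    .map((r) => `import * as ${r.alias} from '${r.moduleName}';`);
  return missing.length === 0 ? source : `${missing.join('\n')}\n${source}`;
}
```

A regex is not a parser: a column-0 line inside a template literal that happens to use the identical alias would still match. If that matters, an AST-based check would be the more robust choice.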

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Run**

Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script) to run the workflow, trigger the auto‑fix path on the first failure, and verify the repaired artifact. The contract assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failed step name and the previous run ID are known.

---

## 1️⃣ Pre‑flight Checks

| Check | Command | Expected outcome |
|-------|---------|------------------|
| CLI version | `aw --version` | `>=2.5.0` |
| Repo root | `git rev-parse --show-toplevel` | prints absolute path |
| Workflow definition present | `ls .aw/workflows/<workflow‑name>.yml` | file exists |
| Auto‑fix feature enabled in config | `cat .aw/config.yml \| grep auto_fix` | `auto_fix: true` |

If any check fails, abort and fix the environment before proceeding.

---

## 2️⃣ Input Contract

```yaml
# auto_fix_input.yml
workflow: "<workflow-name>"          # e.g. data‑pipeline
failed_step: "<failed-step-id>"      # exact step identifier from the first run
prev_run_id: "<previous-run-uuid>"   # e.g. 2023‑09‑15T12:34:56Z‑abc123
auto_fix: true
```

*Save this file as auto_fix_input.yml in
...[truncated]


**Must**

- Use a bounded retry budget and summarize every attempt.
- Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
- Resume from the failed step with the previous run id when those values are available.

**Must Not**

- Edit arbitrary repository source files as the default auto-fix surface.
- Keep retrying after the configured max attempts.
- Lose the single Ricky tracking run id across repair/resume attempts.
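A bounded repair/resume loop in the spirit of these requirements might look like the sketch below. All names are hypothetical, the callbacks stand in for whatever actually runs and repairs the workflow, and the loop is synchronous only to keep the example short.

```typescript
// Hypothetical shapes for illustration; not Ricky's real types.
export interface AttemptResult {
  ok: boolean;
  failedStep?: string;
}

export interface AttemptSummary {
  attempt: number;
  runId: string; // the single tracking run id, shared by every attempt
  resumedFrom: string | undefined;
  ok: boolean;
}

export function runWithBoundedAutoFix(
  trackingRunId: string,
  maxAttempts: number,
  runAttempt: (attempt: number, resumeFrom?: string) => AttemptResult,
  repairArtifact: (failedStep: string) => boolean,
): AttemptSummary[] {
  const summaries: AttemptSummary[] = [];
  let resumeFrom: string | undefined;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = runAttempt(attempt, resumeFrom);
    summaries.push({ attempt, runId: trackingRunId, resumedFrom: resumeFrom, ok: result.ok });

    if (result.ok) break; // success ends the loop
    if (attempt === maxAttempts) break; // budget exhausted, no further repair
    if (!result.failedStep) break; // nothing to resume from
    if (!repairArtifact(result.failedStep)) break; // artifact not repairable

    resumeFrom = result.failedStep; // the next attempt resumes at the failed step
  }
  return summaries;
}
```

The returned list doubles as the per-attempt summary, and because `trackingRunId` is passed in once and copied into every entry, the tracking id cannot drift between repair and resume attempts.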

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.no-auto-fix-preserves-single-attempt</code> (runtime-recovery/manual)</summary>

**User message**

> A user runs `ricky run workflows/foo.ts --no-auto-fix` and the workflow fails.

**Ricky output**

```text
### Quick‑Start Debug Checklist for a Failed `ricky run workflows/foo.ts --no-auto-fix`

| Step | Action | Expected Outcome | How to Verify |
|------|--------|------------------|---------------|
| 1️⃣ | **Collect the raw run output** | Full console dump (including stack traces) saved to a file | `ricky run workflows/foo.ts --no-auto-fix 2>&1 | tee run‑log.txt` |
| 2️⃣ | **Locate the workflow contract** | `workflows/foo.ts` should export a `contract` object (or `getContract()`) that defines inputs, outputs, and verification gates | Open the file and confirm a `export const contract = { … }` block exists |
| 3️⃣ | **Validate input payload** | All required fields are present, types match, and any `enum` constraints are satisfied | Run `ricky validate-input workflows/foo.ts --payload <payload.json>` |
| 4️⃣ | **Run the workflow in debug mode** | Step‑by‑step execution with verbose logging, no auto‑fixes, and a break‑point after each gate | `ricky run workflows/foo.ts --debug --no-auto-fix` |
| 5️⃣ | **Inspect deterministic verification gates** | Each gate should either **PASS**, **FAIL**, or **BLOCK**. A **BLOCK** stops the run and is reported as a blocker. | In the debug log look for 
...[truncated]
```

Must

  • Preserve one-attempt behavior when auto-fix is disabled.
  • Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
  • Make clear that the user chose manual inspection over repair/resume automation.

Must Not

  • Start a repair loop despite --no-auto-fix.
  • Suppress the diagnosis because no repair was attempted.
  • Present the failure as a completed repair attempt.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)

User message

Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

Ricky output

**Ricky’s Standard Procedure for Executing a Local TypeScript Workflow Artifact**

Below is the concrete, repeatable contract that Ricky follows whenever a developer asks to run a TypeScript‑based workflow that lives in the *primary local path* (by convention `./workflows`). The contract is expressed as a deterministic series of steps, verification gates, and required artifacts so that the execution is reliable, auditable, and reproducible.

---

## 1. Preconditions (Input Contract)

| Item | Requirement | Source |
|------|-------------|--------|
| **Workflow source** | A single entry‑point file `*.workflow.ts` (or `index.ts`) located under `./workflows/<name>/` | Developer‑provided |
| **Node runtime** | Node ≥ 18.x installed and available on `$PATH` | System |
| **Package manager** | `npm` (or `pnpm`/`yarn` if the repo lockfile specifies) | System |
| **TypeScript compiler** | `typescript` version matching the repo’s `package.json` | Installed via `npm ci` |
| **ts-node** | Runtime for on‑the‑fly compilation (`ts-node` version pinned in `devDependencies`) | Installed via `npm ci` |
| **Environment variables** | Any variables declared in `.env.local` (or `workflow.env`) that the w
...[truncated]

Must

  • Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
  • Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
  • Record the actual spawn command in execution evidence.

Must Not

  • Fail solely because agent-relay is not on PATH when the SDK route is available.
  • Hide the actual runtime command from evidence.
  • Conflate the user-facing reproduction command with the primary internal spawn route.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky’s Boundary‑Handling Protocol for Persistent Structural Failures**

When a structural failure remains after an initial remediation attempt, Ricky follows a deterministic, contract‑driven workflow that isolates the problem, escalates responsibly, and preserves the integrity of the AgentWorkforce codebase.

---

## 1. Immediate Post‑Fix Verification

| Step | Action | Deterministic Gate |
|------|--------|--------------------|
| 1.1 | Run the **Structural‑Integrity Test Suite (SITS)** on the affected module. | All tests must pass **≥ 95 %** coverage; any failure triggers the “Persisting Failure” gate. |
| 1.2 | Execute the **Static‑Analysis Lint (SAL)** pipeline (e.g., `ricky lint --strict`). | Zero lint violations of severity **high** or **critical**. |
| 1.3 | Perform a **runtime sanity check** on a fresh sandbox environment (spin‑up via `ricky env create --sandbox`). | No crash or exception logs for 5 min of idle + 10 min of typical workload. |

If any gate fails, the failure is classified as *persistent*.

---

## 2. Honest Blocker Reporting

1. **Create a Blocker Issue**  
   - Repository: `agentworkforce/infra`  
   - Branch: `bugfix/persist‑struct‑<timestamp>`  
   - Ti
...[truncated]

Must

  • Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
  • Distinguish escalation from a generic product failure.
  • Preserve enough context for a human operator to continue.

Must Not

  • Retry speculative fixes indefinitely.
  • Collapse to "something went wrong" without the attempted actions.
  • Discard evidence from failed repair attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Consolidated Report**

---

### 1. Scope & Data Sources
| Source | Description | Retrieval Method |
|--------|-------------|------------------|
| **Workflow Runs** | All Ricky‑managed workflow executions in the last 30 days (or custom window) | `ricky run list --since <date> --format json` |
| **Verification Gates** | Pass/fail of each deterministic gate (e.g., schema validation, contract test, CI lint) | `ricky gate status --run-id <id> --json` |
| **Review Artifacts** | PR review outcomes, reviewer comments, and approval timestamps | `ricky review export --run-id <id> --json` |
| **Blocker Reports** | Explicit “blocker” flags raised during runs | `ricky blocker list --run-id <id> --json` |
| **Validation Loops** | Count of completed 80‑to‑100 % validation loops per run | `ricky validate stats --run-id <id> --json` |

> **Note:** Adjust the `--since` flag to match the desired reporting period (e.g., `--since 2024-04-01`).  

---

### 2. Aggregation Logic (Pseudo‑code)

```bash
#!/usr/bin/env bash
# ricky-health-digest.sh
set -euo pipefail

# 1️⃣ Define window
START=${1:-$(date -d "-30 days" +%Y-%m-%d)}
END=$(date +%Y-%m-%d)

# 2️⃣ Pull runs
RUNS=$(
...[truncated]
```

Must

  • Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
  • Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
  • Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

  • Mutate evidence while analyzing it.
  • Return generic advice like "improve your workflow" without references.
  • Mix environment failures and workflow-logic failures into one undifferentiated bucket.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

## Ricky Slack Surface – “Workflow Debugger”

### 1. Overview
The **Ricky Slack Debugger** is a dedicated Slack app surface that lets engineers and product owners inspect, step‑through, and mutate running Ricky workflows in real time. It follows the **Ricky repository conventions** (AGENTS.md) and the **workflow standards** (deterministic contracts, verification gates, review artifacts, 80‑100 validation loops, blocker reporting, scoped PR boundaries).

---

## 2. Public Contract (Slack‑App Manifest)

| Field | Value |
|-------|-------|
| **App Name** | `ricky-debugger` |
| **Scopes** | `commands`, `chat:write`, `chat:write.public`, `im:history`, `im:read`, `channels:history`, `channels:read`, `workflow:read`, `workflow:write` |
| **Slash Commands** | `/ricky-debug` – entry point for all debugging actions |
| **Interactive Components** | Buttons, select menus, and modals identified by `action_id` prefixes `debug_` |
| **Event Subscriptions** | `app_mention`, `message.im`, `workflow_state_changed` |
| **Rate Limits** | ≤ 5 req/s per workspace (enforced by internal token bucket) |
| **Versioning** | Semantic version in `manifest.yaml` (e.g., `v1.2.0`) |
| **Error Reporting** | All in
...[truncated]

Must

  • Treat Slack as a co-equal product surface, not a wrapper around CLI.
  • Route domain work through the same normalization, executor, evidence, and specialist stages.
  • Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.

Must Not

  • Put workflow generation or diagnosis domain logic directly in the Slack handler.
  • Degrade Slack to a developer shortcut with weaker routing than CLI.
  • Skip signature verification, dedup, or thread handling when the surface is implemented.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)

User message

Add a new web handoff type that submits a workflow spec and mode.

Ricky output

**New Hand‑off Type:** `web_submit_spec`

**Purpose**  
Enable external web clients to hand‑off a complete **Workflow Specification** together with an execution **Mode** (`sync` | `async`) to the AgentWorkforce engine. The engine validates the spec, creates a run, and returns a deterministic run‑ID (or error) in a single HTTP round‑trip.

---

## 1. Contract Definition (OpenAPI 3.1)

```yaml
paths:
  /handoff/web-submit-spec:
    post:
      summary: Submit a workflow spec and mode for immediate execution
      operationId: submitWebSpec
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/WebSpecHandoffRequest'
      responses:
        '200':
          description: Run accepted
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/WebSpecHandoffResponse'
        '400':
          description: Validation failure
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '422':
          description: Spec parsing or mode incompatibility
          conte
...[truncated]
```

Must

  • Add a handoff variant that normalizes into LocalInvocationRequest or CloudGenerateRequest.
  • Keep auth/session validation in the web surface before normalization.
  • Ensure downstream code does not need to know the request came from web.

Must Not

  • Short-circuit directly from the web handler to executors.
  • Create a parallel domain model for web requests.
  • Make the web surface the owner of local or Cloud routing semantics.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.mcp-claude-context-is-metadata (surfaces-ingress/manual)

User message

A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordinate.

Ricky output

**Decision Matrix for the Requested Action**

| **Input Condition** | **Recommended Action** | **Rationale** |
|----------------

---

_Comment truncated to stay within GitHub limits. Download the `ricky-eval-run` artifact for the full `human-review.md`._

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/local/entrypoint.test.ts`:
- Around line 3138-3180: The fake broker launcher currently writes a POSIX
shebang script to brokerPath and relies on AgentRelayClient.spawn() to execute
it, which fails on Windows; change the test to produce a platform-appropriate
launcher: on win32 write a .cmd stub (or spawn using { shell: true }), or better
yet write the broker body to a .js file and invoke it via process.execPath
(spawn process.execPath with the .js brokerPath as an arg) so Windows runs Node
explicitly; ensure chmod() is only used for POSIX paths and adjust the spawn
invocation in the test to match (replace direct script execution with explicit
Node invocation or shell dispatch) so the stdout drain behavior is exercised
cross-platform.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 194c634c-3903-4510-84b2-5eb245af0bbf

📥 Commits

Reviewing files that changed from the base of the PR and between 99bf048 and 0d9b04e.

📒 Files selected for processing (2)
  • src/local/entrypoint.test.ts
  • src/local/entrypoint.ts

Comment thread src/local/entrypoint.test.ts Outdated
@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
src/local/entrypoint.test.ts (1)

3128-3180: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make the fake broker launcher cross-platform.

This test still writes an extensionless shebang script and hands that path to AgentRelayClient.spawn(). On Windows, that launcher shape does not reliably execute, so the regression can fail before it ever exercises the stdout-drain path. Please switch to a platform-specific wrapper (.cmd on win32) or write the broker body to a real .js/.mjs file and invoke Node explicitly; chmod() should stay POSIX-only.

♻️ Minimal cross-platform shape
- const brokerPath = join(brokerDir, 'agent-relay-broker');
+ const brokerScriptPath = join(brokerDir, 'agent-relay-broker.mjs');
+ const brokerPath =
+   process.platform === 'win32'
+     ? join(brokerDir, 'agent-relay-broker.cmd')
+     : join(brokerDir, 'agent-relay-broker');

- await writeFile(
-   brokerPath,
-   [
-     '#!/usr/bin/env node',
-     "import http from 'node:http';",
-     ...
-   ].join('\n'),
-   'utf8',
- );
- await chmod(brokerPath, 0o755);
+ await writeFile(
+   brokerScriptPath,
+   [
+     "import http from 'node:http';",
+     ...
+   ].join('\n'),
+   'utf8',
+ );
+ if (process.platform === 'win32') {
+   await writeFile(
+     brokerPath,
+     `@echo off\r\n"${process.execPath}" "${brokerScriptPath}" %*\r\n`,
+     'utf8',
+   );
+ } else {
+   await writeFile(
+     brokerPath,
+     `#!/bin/sh\nexec "${process.execPath}" "${brokerScriptPath}" "$@"\n`,
+     'utf8',
+   );
+   await chmod(brokerPath, 0o755);
+ }
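The diff above takes the wrapper route. The other option named in the comment, invoking Node explicitly, could be sketched roughly as follows. It assumes the test controls the spawn call itself, which may not hold when `AgentRelayClient.spawn()` only accepts a binary path; `nodeLaunchSpec` and `launchAndDrain` are hypothetical helper names.

```typescript
import { spawn } from 'node:child_process';
import type { ChildProcess } from 'node:child_process';

export interface LaunchSpec {
  command: string;
  args: string[];
}

// Hypothetical helper: always run the broker script through the Node binary,
// so no shebang, executable bit, or .cmd shim is needed on any platform.
export function nodeLaunchSpec(
  scriptPath: string,
  extraArgs: string[] = [],
  execPath: string = process.execPath,
): LaunchSpec {
  return { command: execPath, args: [scriptPath, ...extraArgs] };
}

// Spawn the child and put its stdout into flowing mode. With no 'data'
// listener attached, resume() discards the output, which keeps the pipe
// from filling up and blocking a child that writes heavily.
export function launchAndDrain(spec: LaunchSpec): ChildProcess {
  const child = spawn(spec.command, spec.args, {
    stdio: ['ignore', 'pipe', 'inherit'],
  });
  child.stdout?.resume();
  return child;
}
```

For example, `launchAndDrain(nodeLaunchSpec(brokerScriptPath))` would start the `.mjs` broker body from the diff without relying on how the platform resolves extensionless scripts.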
Can Node.js child_process.spawn execute an extensionless shebang script directly on Windows, or is an explicit .cmd wrapper / process.execPath invocation required?
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/local/entrypoint.test.ts` around lines 3128 - 3180, The fake broker
script written to brokerPath is extensionless and relies on a shebang, which
fails on Windows; update the test to create a platform-appropriate launcher and
keep chmod POSIX-only: either write the broker body to a real .js (or .mjs) file
and ensure AgentRelayClient.spawn() is invoked with process.execPath and that
.js path as the first arg, or on win32 additionally write a small .cmd wrapper
next to brokerPath that forwards args to "%~dp0\\agent-relay-broker.js" and
point AgentRelayClient.spawn() at the .cmd shim; retain the existing
chmod(brokerPath, 0o755) but only call it on non-win32, and ensure the test
still looks for the FAKE_BROKER_FLOOD_DONE stdout marker.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: faeb4846-c659-4a0c-9fc7-680e5cffd4dd

📥 Commits

Reviewing files that changed from the base of the PR and between 0d9b04e and 70e18ec.

📒 Files selected for processing (2)
  • src/local/entrypoint.test.ts
  • src/local/entrypoint.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/local/entrypoint.ts

Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 potential issue.

View 4 additional findings in Devin Review.

Comment thread src/local/entrypoint.test.ts Outdated
Comment on lines +3140 to +3141
'#!/usr/bin/env node',
"import http from 'node:http';",
🔴 Fake broker script uses ESM import syntax in extensionless file, fails on Node.js 20

The test writes a fake broker script to an extensionless file path (join(brokerDir, 'agent-relay-broker')) at line 3128, then fills it with ES module syntax (import http from 'node:http' at line 3141). When the SDK spawns this file (via its #!/usr/bin/env node shebang), Node.js determines the module type from the file extension and the nearest package.json. Because the file has no extension and sits in an OS temp directory (no package.json with "type": "module" in the hierarchy), Node.js 20 treats it as CommonJS and throws SyntaxError: Cannot use import statement outside a module. The project's .node-version specifies 20 and the actual runtime is v20.18.1, where automatic ESM syntax detection is still opt-in behind --experimental-detect-module and only became the default in later releases. As a result, the broker process crashes on startup, AgentRelayClient.spawn() times out, the workflow throws, and the test always fails, leaving the production monkey-patch (src/local/entrypoint.ts:987-1005) effectively untested.

The fix is to use CommonJS syntax in the fake broker script (e.g., const http = require('node:http')).

Prompt for agents
The fake broker script at lines 3139-3176 uses ESM import syntax (import http from 'node:http') but is written to an extensionless file in a temp directory. On Node.js 20 (the version in .node-version), this is treated as CommonJS and fails with SyntaxError.

To fix: Convert the fake broker script body from ESM to CJS. Replace the import statement with a require call:
  - Change: import http from 'node:http';
  - To:     const http = require('node:http');

The rest of the script (http.createServer, server.listen, process.stdout.write, etc.) uses only runtime APIs and should work identically in CJS. No other files need to change.
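
To illustrate the CommonJS form this finding asks for, here is a minimal sketch of a generator for a fake broker body. It is an assumption-laden stand-in, not the script from the test: `fakeBrokerSource` is a hypothetical name, the HTTP server from the original script is omitted, and only the `event-*` flood and the FAKE_BROKER_FLOOD_DONE marker mentioned in the review are reproduced.

```typescript
// Hypothetical generator for a fake broker body. The emitted script uses
// require() rather than import, so Node.js 20 can run it from a file that has
// no extension and no package.json with "type": "module" nearby.
function fakeBrokerSource(eventCount: number): string {
  return [
    '#!/usr/bin/env node',
    "const { stdout } = require('node:process');",
    // Flood stdout with event lines to exercise the parent's drain path.
    `for (let i = 0; i < ${eventCount}; i += 1) {`,
    "  stdout.write('event-' + i + '\\n');",
    '}',
    // Marker the test can look for once the flood has been written.
    "stdout.write('FAKE_BROKER_FLOOD_DONE\\n');",
  ].join('\n');
}
```

The generated text would be written to the broker path with writeFile(), in the same way the test writes its existing script body.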
Member Author

Addressed in 53ae63b. The fake broker body is now CommonJS (require) and lives in the init script that the copied Node executable runs, so Node 20 no longer parses ESM from an extensionless temp-file script.

@khaliqgant khaliqgant merged commit b2e450b into main May 10, 2026
2 checks passed
@khaliqgant khaliqgant deleted the codex/fix-broker-stdout-drain branch May 10, 2026 20:32
khaliqgant added a commit that referenced this pull request May 11, 2026
Pulls in agent-relay/relay#838 ("Drain broker stdout after SDK startup")
which is the upstream/root fix for the broker pipe-buffer deadlock that
wedged overnight runs of the proactive-runtime workflows.

Reproducible symptom prior to this bump: after M1 fans out to 9 PTY
workers, every worker's log goes silent within seconds of each other,
broker process parks in `write()` (or `_pthread_cond_wait` once enough
events queue), and the workflow node's `step.run` awaits a drain signal
that never arrives. Two diagnostic captures (~/wedge-92b45d3e-diag/ and
~/wedge-c9600674-diag/, ~14h apart on independent runs) showed the same
freeze-at-fanout shape.

Ricky already shipped the loader-level unblocker in #94. This bump
brings in the upstream SDK fix so direct SDK consumers — including
Ricky-driven workflows whose runtime SDK is the bundled copy under
Ricky's node_modules (per #92's loader redirect) — pick up the proper
drain.

Verification
- npm run typecheck — clean
- npm run build — bundles cleanly, externals unchanged
- npm test — 1075 / 1075 pass, including the existing
  "drains broker stdout after SDK startup so event floods cannot wedge
  the workflow node" regression in src/local/entrypoint.test.ts

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
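
The loader-level unblocker that the commit message above attributes to #94 is described in the PR as a Symbol-guarded override of node:child_process.spawn that resumes broker stdout after startup readers pause it. The sketch below shows that general shape under stated assumptions: `withBrokerStdoutDrain`, `isBrokerInit`, the guard key, and the structural types are hypothetical, and matching on an `init` argument is a guess at how "broker init processes" are recognized. It is not the patch that shipped.

```typescript
// Structural stand-ins for the parts of child_process used here, so the
// sketch does not depend on the real ChildProcess type.
interface StdoutLike {
  on(event: 'pause', listener: () => void): unknown;
  resume(): unknown;
}
interface ChildLike {
  stdout?: StdoutLike | null;
}
type SpawnLike = (command: string, args?: readonly string[], options?: unknown) => ChildLike;

// Hypothetical guard key; the real patch uses its own Symbol.
const DRAIN_PATCHED = Symbol.for('example.broker-stdout-drain');

// Assumption: a broker init process is identified by binary name plus an "init" argument.
function isBrokerInit(command: string, args: readonly string[] = []): boolean {
  const name = command.split(/[\\/]/).pop() ?? '';
  return name.startsWith('agent-relay-broker') && args.includes('init');
}

function withBrokerStdoutDrain(spawnFn: SpawnLike): SpawnLike {
  // Idempotent: wrapping an already wrapped function returns it unchanged.
  if (Reflect.get(spawnFn, DRAIN_PATCHED) === true) return spawnFn;

  const wrapped: SpawnLike = (command, args, options) => {
    const child = spawnFn(command, args, options);
    const stdout = child.stdout;
    if (stdout && isBrokerInit(command, args)) {
      // When a startup reader pauses the pipe, resume it once the current call
      // stack unwinds, so later broker output is consumed and discarded
      // instead of filling the OS pipe buffer and blocking the broker in write().
      stdout.on('pause', () => queueMicrotask(() => stdout.resume()));
    }
    return child;
  };
  Object.defineProperty(wrapped, DRAIN_PATCHED, { value: true });
  return wrapped;
}
```

Deferring resume() to a microtask avoids re-entering the stream while its pause() call is still running. Wiring the wrapper into the real module exports, as the generated loader does, is left out of this sketch.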