
fix(loader): drain broker stdout on spawn (was wired to 'pause' which never fires)#98

Merged
khaliqgant merged 1 commit into main from fix/broker-stdout-drain-unblocker
May 11, 2026

@khaliqgant
Member

Summary

The loader's NODE_OPTIONS register script monkeypatches child_process.spawn so that agent-relay-broker children have their stdout drained, which keeps the broker from blocking in write() once the OS pipe buffer fills. The previous wiring was a no-op: it hooked the stream's 'pause' event, which Node's Readable streams never emit when the internal buffer fills. 'pause' only fires when something explicitly calls .pause(), and nothing in this code path does. The stream stayed in paused mode, libuv stopped draining the kernel pipe once the stream reached its high-water mark, and a chatty broker would block in write() once ~64KB queued up.
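A minimal sketch, using only Node's `node:stream` API, of the behavior described above: a `Readable` with no consumer keeps buffering past its `highWaterMark` without emitting `'pause'`, while an explicit `.pause()` call emits it once. The 16-byte `highWaterMark`, the 64 bytes pushed, and the helper name `countPauseEvents` are illustrative assumptions, not values taken from the loader or the broker.

```typescript
import { Readable } from "node:stream";

interface PauseProbe {
  onFill: number;          // 'pause' events seen while filling past highWaterMark
  onExplicitPause: number; // 'pause' events seen after calling .pause()
  bufferedBytes: number;   // bytes left unread in the internal buffer
}

// Illustrative probe: fill a consumer-less Readable past its highWaterMark,
// then call .pause() explicitly, counting 'pause' events at each stage.
function countPauseEvents(): PauseProbe {
  const stream = new Readable({ highWaterMark: 16, read() {} });
  let pauses = 0;
  stream.on("pause", () => {
    pauses += 1;
  });

  // Stand-in for a chatty broker: 64 bytes pushed with nobody reading.
  // push() starts returning false once the buffer reaches highWaterMark,
  // but no 'pause' event is emitted.
  for (let i = 0; i < 4; i++) {
    stream.push("x".repeat(16));
  }
  const onFill = pauses;
  const bufferedBytes = stream.readableLength;

  // Only an explicit pause() emits the event the old loader waited for.
  stream.pause();
  const onExplicitPause = pauses - onFill;

  stream.destroy();
  return { onFill, onExplicitPause, bufferedBytes };
}

console.log(countPauseEvents());
// → { onFill: 0, onExplicitPause: 1, bufferedBytes: 64 }
```

In a real child process the same situation ends with libuv no longer reading from the pipe, so the bytes pile up in the kernel pipe buffer and the writer on the other end blocks.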

Symptom

Overnight proactive-runtime-m1 runs (Ricky-driven, M1 fans out to 9 PTY workers) froze within seconds of fanout with every worker log stuck at the same mtime, broker process parked in write() or _pthread_cond_wait, M1's step.run awaiting a never-arriving drain signal. Reproduced twice ~14 hours apart on independent runs, with diagnostic bundles capturing the same freeze-at-fanout shape both times.

Root cause (decoded from the running NODE_OPTIONS)

// Previous (broken):
const drainBrokerStdout = () => {
  child.stdout?.off("pause", drainBrokerStdout);
  child.stdout?.on("data", () => {});
  child.stdout?.resume();
};
child.stdout.on("pause", drainBrokerStdout);   // ← never fires

vs. what SDK 6.0.15's drainBrokerStdoutAfterStartup (relay#838) does for direct SDK consumers:

child.stdout.on('data', () => {});
child.stdout.resume();

What changed (`src/local/entrypoint.ts`)

  1. Attach data listener and resume() synchronously at spawn time. No more waiting on 'pause'. Stream goes into flowing mode at birth and stays there for the lifetime of the broker.
  2. Expand argv guard from argv[0]==="init" to also include "pty". M1 spawns one init broker (channel multiplexer) plus one pty broker per worker (1 lead + 9 impl-* workers). All of them write events back through stdout. Covering only init left the PTY brokers vulnerable.
  3. Updated the comment block above registerSource to capture the new semantics and the prior bug.
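The spawn-time wiring in the list above can be sketched as follows. This is a hypothetical reconstruction, not the code in `src/local/entrypoint.ts`. The helper names (`isDrainedBroker`, `drainNow`, `wrapSpawn`, `SpawnLike`) and the rule that the executable's basename must be exactly `agent-relay-broker` are assumptions. Only the `init`/`pty` argv guard and the `on("data")` + `resume()` pair come from the PR. The sketch takes a spawn-like function as a parameter instead of reassigning `child_process.spawn`, so it can be exercised without launching a broker.

```typescript
import type { ChildProcess } from "node:child_process";
import * as path from "node:path";
import { Readable } from "node:stream";

// Hypothetical names throughout; subcommands taken from the PR's argv guard.
const DRAINED_SUBCOMMANDS = new Set(["init", "pty"]);

// Assumed guard: the executable's basename is "agent-relay-broker" (an .exe
// suffix on Windows would not match) and argv[0] is init or pty.
function isDrainedBroker(command: string, args: readonly string[]): boolean {
  return (
    path.basename(command) === "agent-relay-broker" &&
    DRAINED_SUBCOMMANDS.has(args[0] ?? "")
  );
}

// Switch stdout into flowing mode right away. The no-op 'data' listener
// discards chunks; resume() is still needed because, once something has
// called pause(), adding a 'data' listener alone leaves the stream paused.
function drainNow(stdout: Readable | null | undefined): void {
  stdout?.on("data", () => {});
  stdout?.resume();
}

type SpawnLike = (
  command: string,
  args?: readonly string[],
  options?: object,
) => ChildProcess;

// Wrap a spawn-like function so matching broker children are drained at
// spawn time instead of waiting for a 'pause' event that never comes.
function wrapSpawn(original: SpawnLike): SpawnLike {
  return (command, args = [], options) => {
    const child = original(command, args, options);
    if (isDrainedBroker(command, args)) {
      drainNow(child.stdout);
    }
    return child;
  };
}

// Usage with a fake spawn, so no real broker binary is needed.
const fakeSpawn: SpawnLike = () =>
  ({ stdout: new Readable({ read() {} }) }) as unknown as ChildProcess;
const patchedSpawn = wrapSpawn(fakeSpawn);
console.log(patchedSpawn("/usr/local/bin/agent-relay-broker", ["pty"]).stdout?.readableFlowing); // true
console.log(patchedSpawn("/usr/local/bin/agent-relay-broker", ["status"]).stdout?.readableFlowing); // null
```

Keeping the explicit `resume()` next to the `'data'` listener matches the SDK snippet quoted earlier and makes the drain independent of whether anything paused the stream first.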

Verification

  • npm run typecheck: clean
  • npm test: 1077 / 1077 pass, including the existing "drains broker stdout after SDK startup so event floods cannot wedge the workflow node" regression at entrypoint.test.ts:3122. That test previously passed only because the bundled SDK 6.0.15 has its own drain; this fix makes the loader-level protection actually do what it advertises for consumers still on older SDKs.

Related

  • relay#838 — upstream/root drain in @agent-relay/sdk@6.0.15
  • ricky#94 — original loader-level unblocker (introduced the broken 'pause' wiring)
  • ricky#96 — earlier bump to bundle SDK ^6.0.15 (necessary but not sufficient when consumers have their own local SDK pin)

fix(loader): drain broker stdout on spawn (was wired to 'pause' which never fires)

The NODE_OPTIONS register script monkeypatches child_process.spawn so that
agent-relay-broker children have their stdout drained — preventing the
broker from blocking on write() once the OS pipe buffer fills.

The previous wiring attached a 'data' listener inside a callback that only
ran when child.stdout emitted 'pause'. Node Readable streams never emit
'pause' on internal buffer fill — that event only fires when something
explicitly calls .pause() (which nothing in this code path ever does),
so the stream stayed in paused mode, libuv stopped draining the kernel
pipe at the high-water mark, and a chatty broker would block in write() once
~64KB of stdout queued up.

Symptom: overnight proactive-runtime runs (Ricky-driven, M1 fans out to
9 PTY workers) froze within seconds of fanout with every worker log
stuck at the same mtime, broker process parked in write() or
_pthread_cond_wait, M1's step.run awaiting a never-arriving drain
signal. Reproduced twice (~14h apart) with diagnostic bundles capturing
the same shape.

Changes
- Attach `data` listener and call `resume()` synchronously at spawn time
  for both `init` and `pty` broker invocations. This matches what SDK
  6.0.15's `drainBrokerStdoutAfterStartup` does for direct SDK consumers.
- Expand the argv guard from `argv[0]==="init"` to also include `"pty"`,
  so per-worker PTY brokers (M1's lead + impl-* fanout) are protected,
  not just the channel-multiplexer init broker.
- Update the explanatory comment block above `registerSource` to capture
  the new semantics and the prior bug.

Verification
- npm run typecheck — clean
- npm test — 1077 / 1077 pass, including the existing
  "drains broker stdout after SDK startup so event floods cannot wedge
  the workflow node" regression at entrypoint.test.ts:3122.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 11, 2026


Caution

Review failed

Pull request was closed or merged during review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 4f8f8868-3268-4f62-8ec6-ff80ca586e68

📥 Commits

Reviewing files that changed from the base of the PR and between 684c865 and cd6093c.

📒 Files selected for processing (1)
  • src/local/entrypoint.ts

📝 Walkthrough


The SDK runtime loader's child-process stdout draining workaround is updated to fix broker stdout blockage. Instead of relying on pause-event hooks, the injected spawn patch now immediately attaches a no-op data listener and calls resume() on stdout for managed agent-relay-broker processes in init and pty modes. The accompanying comment explains the readline-detached behavior that necessitates this fix.

Changes

Stdout Draining Workaround

| Layer | File(s) | Summary |
|-------|---------|---------|
| Problem Context Comment | `src/local/entrypoint.ts` | Updated inline comment explaining readline-detached behavior leaves broker stdout in paused mode, shifting from pause-event explanation to paused-state explanation. |
| Spawn Patch Logic Fix | `src/local/entrypoint.ts` | Injected spawn patch now unconditionally installs no-op `stdout.on("data")` handler and calls `child.stdout.resume()` for init and pty agent-relay-broker children, replacing previous pause-handler defer logic. |

Possibly Related PRs

  • AgentWorkforce/ricky#94: Also modifies the spawn patch for agent-relay-broker stdout draining in the same entrypoint file.
  • AgentWorkforce/ricky#96: Addresses the same stdout-draining deadlock by bumping the vendored @agent-relay/sdk upstream fix while this PR updates the loader-side workaround.

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A brook with no flow, held in a pause,
Now wakes with a touch and resumes its cause,
No more shall the broker wait idle and still,
A listener and resume—a simple fix will! 🐰

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: fixing broker stdout draining by moving from a broken 'pause' event hook to immediate attachment of a data listener. |
| Description check | ✅ Passed | The description is detailed and directly related to the changeset, explaining the bug, root cause, symptoms, and the specific changes made to src/local/entrypoint.ts. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


khaliqgant merged commit 18ffb66 into main May 11, 2026
0 of 3 checks passed
khaliqgant deleted the fix/broker-stdout-drain-unblocker branch May 11, 2026 11:52
Contributor

devin-ai-integration Bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.


@github-actions
Contributor

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-11T11-52-07-013Z-HEAD-provider
Mode: provider
Git SHA: 9fe2a95

Passed: 8 | Needs human: 42 | Reviewable: 42 | Missing output: 0 | Failed: 1 | Skipped: 0

Blocking Cases

FAIL workflow-authoring.generated-workflow-template-read (workflow-authoring/manual)

User message

Generate several Ricky workflows from a meta-workflow.

Ricky output

No Ricky output captured for this case. Executor: manual.

Must

  • Read docs/workflows/WORKFLOW_STANDARDS.md, workflows/shared/WORKFLOW_AUTHORING_RULES.md, relevant specs, and workflows/meta/spec/generated-workflow-template.md at runtime.
  • Materialize audit artifacts under .workflow-artifacts/<meta-slug>/.
  • Dry-run or structurally validate generated workflows before signoff.

Must Not

  • Rely only on ambient agent context for workflow standards.
  • Claim generated workflows are ready before dry-run or structural sanity checks.
  • Hand-tune generated workflows when the template/spec/rules should be fixed upstream.

Error

OpenRouter eval failed: 429 Provider returned error

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Ricky – Updated Documentation & Code to Reflect Deeper Integration with Agent Assistant**

Below is a complete, deterministic change package that brings the repository in line with the new product spec: *Ricky now leverages the Agent Assistant (AA) for all coordination, verification, and suggestion steps.*  

The package follows **Ricky‑repo conventions** (AGENTS.md), the **workflow standards** (contract‑first, 80‑100 validation loops), and the **shared authoring rules** (scoped branches, PR‑level review artifacts).

---

## 1. Scope of the Change  

| Area | What is added/modified | Reason |
|------|------------------------|--------|
| **README.md** | New “Deep Agent Assistant Integration” section; updated feature list and architecture diagram. | Makes the product claim visible to users and contributors. |
| **docs/architecture.md** | Expanded diagram and description showing AA in the *Coordination*, *Verification*, and *Suggestion* layers. | Provides a single source of truth for the architecture. |
| **src/ricky/__init__.py** | Updated module docstring to mention AA usage. | Keeps code‑level documentation consistent. |
| **src/ricky/coordination.py** | Refactored to call `agent
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review (as of 2026‑05‑11)**  

| Metric | Current State | Target / Trend | Comments |
|--------|---------------|----------------|----------|
| **Workflow contracts defined** | 12 of 15 active agents have a formal `turn‑context` contract (input schema, output schema, version tag). | 100 % by Q4 2026 | 3 agents still use ad‑hoc context passing; they are flagged as **adoption blockers**. |
| **Deterministic verification gates** | 9 agents run the `turn‑context‑gate` CI step (schema validation + deterministic hash check). | 100 % by Q3 2026 | 3 agents lack the gate; CI logs show intermittent schema mismatches. |
| **Review artifacts** | 78 % of PRs include a `turn‑context‑review` checklist entry; 22 % missing. | 95 % by Q2 2026 | Missing entries correlate with the three agents lacking contracts. |
| **Validation loops** | Average of 62 validation loops per PR (target 80‑100). | 80‑100 loops per PR by Q4 2026 | Loop count limited by missing mock‑turn fixtures for two agents. |
| **Blocker reporting** | 4 open blocker tickets (IDs: #312, #417, #529, #601). | Zero blockers after remediation. | All blockers relate to schema drift or missing ve
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation:** Move **all four** capabilities—workflow generation, validation, debugging, and the staged‑CLI UX with blocker/evidence wording—into the **Agent Assistant** product line.

---

## Rationale (aligned with AGENTS.md & workflow standards)

| Capability | Current State | Why it belongs in Agent Assistant | Impact on reliability & coordination |
|------------|---------------|-----------------------------------|--------------------------------------|
| **Workflow Generation** | Exists in separate “Generator” micro‑service, invoked via HTTP from the UI. | Agent Assistant already owns the *authoring* contract (JSON‑LD workflow spec). Centralising generation eliminates cross‑service latency, guarantees that generated artifacts conform to the same schema version, and enables deterministic “generation‑gate” tests. | Reduces race conditions between generator and validator; single source of truth for versioned templates. |
| **Workflow Validation** | Implemented as a post‑generation batch job that writes validation reports to a DB. | Validation is a deterministic verification gate that must run **immediately** after generation to enforce the 80‑100 loop contract. Embedding it
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out (Phase 1)”**  
*Target: Core Customer‑Success (CS) teams in North America (≈120 users)*  

---

## 1. Slice Objective
Enable CS agents to **create, test, and publish reliable AgentWorkforce (AW) workflows** using Ricky’s authoring assistant, while establishing the **verification‑gate framework** that will become the standard for all future slices.

| Success Metric | Target |
|----------------|--------|
| Agents who can independently author a workflow (end‑to‑end) | ≥ 90 % of pilot participants |
| First‑time‑right workflow acceptance (no post‑publish bugs) | ≥ 95 % |
| Average time from idea → publish | ≤ 2 days |
| Blocker detection & reporting compliance | 100 % of identified blockers logged in JIRA |

---

## 2. Scope & Boundaries  

| In‑Scope | Out‑of‑Scope |
|----------|--------------|
| • Ricky‑enabled authoring UI (web) – v2.0 feature flag | • Enterprise‑wide rollout (later slices) |
| • Deterministic verification gates (syntax, schema, sandbox execution) | • Custom connector development (handled by Platform team) |
| • Review artifacts: PR template, gate‑report markdown, test‑suite summary | • Legacy workflow migrati
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky × Slack / Web Support – Agent‑Assistant Package Design  

Below is a **complete, production‑ready package specification** that lets Ricky (the reliability‑coordination assistant) surface in Slack channels or a dedicated web widget. The design follows the **Ricky repository conventions** (AGENTS.md), the **workflow standards** (deterministic contracts, verification gates, review artifacts, 80‑100 validation loops, blocker reporting), and respects **branch/PR scoping**.

---  

### 1. High‑Level Architecture  

| Component | Responsibility | Runtime | Key Interfaces |
|-----------|----------------|---------|----------------|
| **ricky‑core** | Business logic, reliability checks, coordination engine | Python 3.11 (async) | `ricky.core.*` |
| **ricky‑slack‑adapter** | Slack Events API & Interactivity bridge | FastAPI (ASGI) | `/slack/events`, `/slack/actions` |
| **ricky‑web‑widget** | Embedded React widget + WebSocket client | Node 20 (Vite) | `ws://<host>/ws/ricky` |
| **ricky‑gateway** | API‑gateway (nginx + auth) that routes Slack ↔ web ↔ core | Nginx + JWT | `POST /api/v1/*` |
| **ricky‑infra** | Terraform + Helm charts (K8s) for deployment | – | – |

All components are v
...[truncated]

Must

  • Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
  • Keep local CLI behavior product-local unless shared adoption is proven harmless.
  • Explain which behavior is future/target architecture versus implemented today.

Must Not

  • Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
  • Let future surface abstractions distort the current CLI contract.
  • Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What the command does**

```bash
ricky --mode local --spec "generate a workflow for package checks"
  • --mode local – Runs the agent in the sandboxed, file‑system‑only environment (no external services are touched).
  • --spec "…" – Supplies a natural‑language specification that Ricky will turn into a workflow contract (YAML/JSON) and a set of verification gates.
  • No --run flag – Tells Ricky to stop after authoring the workflow; it will not execute any steps.

So the command will:

  1. Parse the spec → “generate a workflow for package checks”.
  2. Create a deterministic workflow contract (workflow.yaml) that:
    • Checks that required package files exist.
    • Runs lint, unit‑test, and security‑scan stages.
    • Publishes a summary artifact.
  3. Add verification gates (e.g., gate: file_exists, gate: lint_pass, gate: test_pass, gate: scan_pass).
  4. Write supporting artifacts:
    • README.md with usage notes.
    • CONTRIBUTING.md snippet describing the 80‑to‑100 validation loop.
    • A stub review.md where a human reviewer can add comments before the PR is opened.
  5. Exit with a success status
    ...[truncated]

**Must**

- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.

**Must Not**

- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>

**User message**

> Render Ricky's first-run CLI onboarding for a new user.

**Ricky output**

```text
**Ricky – First‑Run CLI Onboarding**

Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the Ricky CLI. It follows the repository conventions (AGENTS.md), the workflow standards, and the shared authoring rules. All steps are deterministic, include verification gates, and produce review artifacts that can be inspected in CI.

---

## 1️⃣  Repository Layout (for reference)

ricky/
├─ .github/
│  └─ workflows/
│     └─ onboarding.yml        # CI gate for first‑run onboarding
├─ src/
│  └─ cli/
│     └─ main.ts               # Entry point (Node/TS)
├─ scripts/
│  └─ onboarding/
│     ├─ 01-setup-env.sh
│     ├─ 02-install-deps.sh
│     ├─ 03-configure-ricky.sh
│     └─ 04-verify.sh
├─ docs/
│  └─ onboarding.md            # Human‑readable guide (generated)
└─ package.json


---

## 2️⃣  Onboarding Contract (YAML)

```yaml
# onboarding/contract.yml
name: Ricky First‑Run CLI Onboarding
description: |
  Bootstraps a new developer workstation for the Ricky AgentWorkforce
  platform. Installs dependencies, creates a local config, and validates
  the installation.
scope: local‑machine
owner: ricky-team
version: 1.0.0
steps:
  - id: setup-env
...[truncated]

Must

  • Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
  • End every branch with a concrete next step.
  • Advertise only commands that are currently implemented.

Must Not

  • Sound like a launch page or documentation dump.
  • Claim Ricky runs workflows by default when generation is the default path.
  • Require web or Slack onboarding before CLI use.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.recovery-guidance-no-stack-traces (cli-behavior/manual)

User message

A user gives Ricky an empty spec or a missing spec file.

Ricky output

**Blocker – Missing Specification**

The workflow authoring process requires a concrete specification file (e.g., `workflow.yaml`, `contract.json`, or equivalent) that defines:

1. **Inputs / Triggers** – event schema, required fields, source system.  
2. **Outputs / Artifacts** – expected results, data contracts, downstream consumers.  
3. **Steps / Tasks** – ordered actions, responsible agents, required resources.  
4. **Verification Gates** – deterministic checks, test vectors, success criteria.  
5. **Review & Release Metadata** – owners, reviewers, version tags, branch/PR targets.

Without this spec, I cannot:

- Generate a deterministic workflow contract.  
- Create verification gates or validation loops.  
- Scope the branch/PR boundaries for review.  
- Produce any of the required review artifacts (design doc, test plan, CI config).

**Next Steps for You**

1. **Create or locate the spec file** in the repository root (e.g., `specs/my_workflow.yaml`).  
2. Ensure it includes the sections listed above.  
3. Commit the file to a feature branch (e.g., `feature/add‑my‑workflow`) and push it.  
4. Notify me (or open a PR) once the spec is present; I’ll then generate the workflow 
...[truncated]

Must

  • Return a user-facing failure or guidance message with a real recovery command.
  • Distinguish generation failure from execution failure.
  • Show stack traces only when verbose diagnostic mode is requested.

Must Not

  • Crash with an uncaught exception in normal mode.
  • Suggest commands that do not exist.
  • Pretend a missing spec was accepted.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**  

=== AgentWorkforce Status Report ===

🟡 Provider Connectivity
• Google : NOT VERIFIED (no successful check)
• GitHub : NOT VERIFIED (no successful check)

🔧 Workflow Engine
• Scheduler : RUNNING
• Queue : EMPTY
• Workers : 0 ACTIVE / 0 IDLE

📦 Repository State
• Branches : master (default)
• Pending PRs : 0
• Unmerged : 0

⚙️ System Health
• CPU Usage : 3%
• Memory : 210 MiB / 2 GiB
• Disk Space : 12 GiB free / 20 GiB total

🚨 Alerts
• Provider check for Google failed or not executed.
• Provider check for GitHub failed or not executed.

📝 Next Steps (recommended)

  1. Run ricky provider check google to validate Google connectivity.
  2. Run ricky provider check github to validate GitHub connectivity.
  3. Review provider credentials in ~/.ricky/config.yaml if checks continue to fail.
  4. Re‑run ricky status after successful checks to see green ✅ indicators.

--- End of Report ---

Must

  • Report unknown or not-connected provider state honestly.
  • Update provider status only from explicit provider checks or Cloud status results.
  • Give concrete setup guidance for Cloud when relevant.

Must Not

  • Mark Google or GitHub connected because guidance text was shown.
  • Invent a provider connection URL or OAuth flow.
  • Show empty fields with no recovery guidance when config is missing.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)

User message

Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

Ricky output

**GitHub Primitive‑Change Workflow**

| Stage | Owner | Action | Input | Output | Verification Gate |
|-------|-------|--------|-------|--------|--------------------|
| **1️⃣ Plan** | **Product Owner** | Draft a concise change ticket (title, description, scope, risk) in the project board. | Feature‑request or bug‑report. | Ticket status = *Ready for Development* (✅). |
| **2️⃣ Branch** | **Developer** | `git checkout -b <type>/<ticket‑id>-<short‑slug>` (e.g., `feat/1234-add‑api`). | `main` HEAD. | Branch naming lint passes (pre‑commit hook). |
| **3️⃣ Code** | **Developer** | Implement the primitive change (single‑file edit, config tweak, doc update). | Local repo. | `npm run lint && npm test` (or language‑specific CI) must return **0**. |
| **4️⃣ Local Verify** | **Developer** | Run **deterministic verification**: <br>• Unit tests covering the touched code. <br>• Static analysis (security, style). | Code + test results. | All tests pass **and** no new lint violations. |
| **5️⃣ Push** | **Developer** | `git push origin <branch>` | Remote branch. | Remote CI pipeline triggers; must reach **“Checks Passed”** status. |
| **6️⃣ Pull Request (PR) Open** | **Developer** | Create PR targ
...[truncated]

Must

  • Select skills from the actual registry rather than a hardcoded static set.
  • Record ranked skill matches with confidence and trigger evidence.
  • Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

  • Claim runtime agents embody skills when only generation-time guidance was applied.
  • Fail generation solely because optional skill files are missing.
  • Hide skill selection evidence from artifacts or JSON output.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow: `review‑then‑implement`**

| **Element** | **Definition** |
|-------------|----------------|
| **Name** | `review‑then‑implement` |
| **Description** | Executes a Claude‑based code review on a proposed change, then hands the approved diff to Codex for implementation. |
| **Repository** | `github.com/your‑org/your‑repo` (default branch `main`) |
| **Branching Strategy** | <ul><li>Feature work is done on a short‑lived branch `feat/<ticket‑id>`.</li><li>Claude review runs on a PR opened against `main`.</li><li>After successful review, a deterministic “implementation” branch `impl/<ticket‑id>` is created from the PR head and submitted to Codex.</li></ul> |
| **PR Boundaries** | <ul><li>**Review PR** – contains the original change set and a `CLAUDE‑REVIEW` label.</li><li>**Implementation PR** – created automatically, labeled `CODEX‑IMPL`, targets `main`.</li></ul> |

---

## 1. Workflow Contract (YAML)

```yaml
name: review-then-implement
on:
  pull_request:
    types: [opened, synchronize, labeled]
    branches: [main]

env:
  CLAUDE_MODEL: "claude-3-sonnet"
  CODEX_MODEL: "codex-2.0"
  MAX_LOOP: 100          # validation loops
  REVIEW_TIMEOUT: "30m"
  IMPLEMENT_TIMEOUT: "
...[truncated]

Must

  • Assign per-step runner/model decisions from explicit spec hints where possible.
  • Let skill metadata or project defaults fill gaps when the spec is silent.
  • Write tool-selection.json or equivalent audit metadata.

Must Not

  • Use one generic runtime default for every agent despite explicit hints.
  • Let skill defaults override explicit user/spec runner hints.
  • Omit the reason each tool or model was chosen.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

## Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is the **deterministic post‑processing flag** that takes a draft workflow JSON/YAML, runs it through the **Ricky refinement engine**, and returns a new version where:

1. **Task‑level descriptions** are concrete, action‑oriented, and bounded (≤ 2 sentences).  
2. **Acceptance gates** are explicit, testable predicates (e.g., “output file exists and size > 0”, “API returns 200 and schema matches”).  
3. **Verification metadata** (e.g., `gate_id`, `gate_type`, `expected`) is added for downstream CI checks.  

Below is a step‑by‑step contract you can follow when invoking `--refine`.

---

### 1. Prepare the Draft Workflow  

```yaml
# draft-workflow.yaml
name: data‑ingest
steps:
  - id: fetch
    task: "grab the data"
    uses: http-get
    with:
      url: "{{ inputs.source_url }}"
  - id: transform
    task: "make it pretty"
    uses: python-script
    with:
      script: transform.py
  - id: store
    task: "save it"
    uses: s3-put
    with:
      bucket: "{{ inputs.bucket }}"
```

The draft may contain vague verbs (“grab”, “make it pretty”) and no gates.


### 2. Run the Refinement Command

...[truncated]


**Must**

- Keep the deterministic unrefined path as the default.
- Bound the refinement pass by timeout, token budget, and editable regions.
- Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

**Must Not**

- Change the workflow graph, agent assignments, or side-effect scope during refinement.
- Fail the whole generation if optional refinement times out.
- Hide warnings when the deterministic artifact is returned unchanged.
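
A bounded, opt-in refinement pass with a deterministic fallback could look like the sketch below. The `Workflow`/`Step` shapes and the `refine`/`validate` callbacks are hypothetical. The token budget is assumed to be enforced by whatever provider call `refine` wraps and is not modeled here; only the timeout, the editable-region check, and re-validation are.

```typescript
// Hypothetical workflow shape: only `task` text is treated as an editable region.
interface Step {
  id: string;
  agent: string;
  dependsOn: string[];
  task: string;
}
interface Workflow {
  steps: Step[];
}
interface RefineOutcome {
  workflow: Workflow;
  refined: boolean;
  warnings: string[];
}

// True when the graph shape and agent assignments are identical (task text may differ).
function sameGraphAndAgents(before: Workflow, after: Workflow): boolean {
  return (
    before.steps.length === after.steps.length &&
    before.steps.every((s, i) => {
      const t = after.steps[i];
      return t.id === s.id && t.agent === s.agent && t.dependsOn.join("\u0000") === s.dependsOn.join("\u0000");
    })
  );
}

async function refineBounded(
  draft: Workflow,
  opts: {
    enabled: boolean; // e.g. set by --refine; off by default
    timeoutMs: number;
    refine: (w: Workflow) => Promise<Workflow>;
    validate: (w: Workflow) => boolean;
  },
): Promise<RefineOutcome> {
  if (!opts.enabled) return { workflow: draft, refined: false, warnings: [] };
  const keep = (why: string): RefineOutcome => ({
    workflow: draft,
    refined: false,
    warnings: [`${why}; returning the deterministic artifact unchanged`],
  });
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`refinement timed out after ${opts.timeoutMs}ms`)), opts.timeoutMs);
  });
  try {
    const candidate = await Promise.race([opts.refine(draft), timeout]);
    if (!sameGraphAndAgents(draft, candidate)) return keep("refinement changed the graph or agent assignments");
    if (!opts.validate(candidate)) return keep("refined artifact failed re-validation");
    return { workflow: candidate, refined: true, warnings: [] };
  } catch (err) {
    // Provider errors and timeouts degrade to the deterministic artifact instead of failing generation.
    return keep(`refinement failed: ${err instanceof Error ? err.message : String(err)}`);
  } finally {
    clearTimeout(timer);
  }
}
```

Every fallback path returns the draft together with a warning, so an unchanged artifact is never silent.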

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.behavior-grounded-gates</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for the `ricky --version` spec.

**Ricky output**

```text
**Workflow: `ricky --version` Specification & Release**  
*Repository: `github.com/yourorg/ricky`*  
*Target branch: `main` → `release/vX.Y.Z`*  

---  

## 1. Scope & Objective
Produce a deterministic, test‑driven workflow that validates the `ricky --version` command, guarantees correct semantic‑version output, and publishes the new version tag. The workflow must be repeatable, auditable, and safe to run on CI/CD pipelines.

---

## 2. Contract (Inputs → Outputs)

| Item | Description |
|------|-------------|
| **Input** | Source code on `release/vX.Y.Z` (branch created from `main`). |
| **Environment** | Ubuntu‑22.04 runner, Go 1.22+, `goreleaser` 2.x, `git` 2.40+. |
| **Outputs** | 1. Binary `ricky` with embedded version string.<br>2. `ricky --version` prints `ricky version vX.Y.Z` (exact match).<br>3. Git tag `vX.Y.Z` pushed to origin.<br>4. Release notes draft in `CHANGELOG.md`.<br>5. Artifact uploaded to GitHub Releases. |
| **Failure Conditions** | Any step returns non‑zero, version string mismatches, or tag already exists. |

---

## 3. Deterministic Verification Gates  

| Gate | Command | Expected Result | Action on Failure |
|------|---------|----------------|-----------
...[truncated]
```

Must

  • Build gates from the stated acceptance behavior, such as checking ricky --version output.
  • Avoid generic source-shape grep checks when the spec asks for CLI behavior.
  • Keep generated validation meaningful for the current repo shape.

Must Not

  • Treat grep -Eq 'export|function|class|workflow(' dist/ricky.js as proof of version behavior.
  • Claim the workflow is proven by source syntax alone.
  • Ignore the package-json version resolution order in the spec.
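
A behavior-grounded gate checks what the CLI prints rather than what the bundle contains. In the sketch below, `checkVersionOutput` is a pure comparison and `runVersionGate` executes `dist/ricky.js --version` with the current Node binary. The whole-token match rule is an assumption, and the spec's package.json version resolution order is not modeled: a real gate should read the expected version from whichever source the spec designates.

```typescript
import { execFileSync } from "node:child_process";
import { readFileSync } from "node:fs";

interface GateResult {
  gate: string;
  pass: boolean;
  detail: string;
}

// Pure comparison: passes only if the expected version appears in the CLI output
// as a standalone token (optionally prefixed with "v").
function checkVersionOutput(stdout: string, expectedVersion: string): GateResult {
  const escaped = expectedVersion.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp(`(^|\\s)v?${escaped}(\\s|$)`);
  const out = stdout.trim();
  return { gate: "cli-version-output", pass: re.test(out), detail: `expected ${expectedVersion}, got ${JSON.stringify(out)}` };
}

// Runs the built CLI and compares its output with package.json (paths are assumptions).
function runVersionGate(cliPath: string, pkgJsonPath: string): GateResult {
  const { version } = JSON.parse(readFileSync(pkgJsonPath, "utf8")) as { version: string };
  const stdout = execFileSync(process.execPath, [cliPath, "--version"], { encoding: "utf8" });
  return checkVersionOutput(stdout, version);
}
```

Unlike a `grep` over `dist/ricky.js`, this gate fails when the command prints the wrong version even if the bundle contains plausible-looking source.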

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact Validation‑Fix‑Rerun Loop”**  
*Purpose*: Process a large set of independent artifacts (e.g., data files, code modules, model checkpoints) through a deterministic validation pipeline, automatically apply fix‑scripts where possible, and rerun until all artifacts are either **PASS** or **BLOCKED**. The workflow is scoped to a single branch (`batch‑validation`) and produces a single PR that aggregates the final state of every artifact.

---

## 1. Contract (YAML)

```yaml
name: batch_artifact_validation
description: >
  Validate, auto‑fix, and rerun a collection of independent artifacts.
  Stops when every artifact is PASS or BLOCKED.
branch: batch-validation
trigger:
  - push
  - workflow_dispatch
inputs:
  artifact_path:   # root folder containing artifacts
    type: string
    default: artifacts/
  max_iterations:
    type: integer
    default: 10
  validation_schema:   # JSON‑Schema file used by the validator
    type: string
    default: schema.json
outputs:
  summary_report:   # markdown report attached to the PR
    type: string
```

## 2. High‑Level Stages

| Stage | Deterministic Gate | Action | Success → Next | Failure → Next |
|-------|-----
...[truncated]


**Must**

- Choose `dag`, `supervisor`, or `pipeline` deliberately based on the work shape.
- Explain the pattern choice in artifact metadata or a rationale.
- Use `dag` for validation/fix/rerun loops when dependencies matter.

**Must Not**

- Default blindly to `dag` for every workflow.
- Collapse independent artifact work into one vague agent task.
- Omit verification gates because the chosen pattern seems obvious.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.skill-boundary-copy</code> (generation-quality/manual)</summary>

**User message**

> Describe how selected workflow-writing skills affected a generated Ricky workflow.

**Ricky output**

```text
### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow  

| Skill | What the Skill Does | Direct Effect on the Generated Workflow | Example of the Resulting Artifact |
|-------|---------------------|------------------------------------------|------------------------------------|
| **Deterministic Contract Definition** | Forces the author to write explicit input‑output schemas, versioned contracts, and type‑checked data models. | • Every step declares `input: { … }` and `output: { … }` in the **`contract.yaml`**. <br>• The CI pipeline runs a contract‑validation gate that fails on any schema mismatch before the workflow can be merged. | ```yaml\n# contract.yaml\nversion: 1.2\ninputs:\n  orderId: string\n  customer:\n    id: string\n    email: string\noutputs:\n  receiptUrl: string\n``` |
| **Verification Gate Insertion** | Inserts automated checks (unit tests, integration tests, static analysis) at defined “gate” points. | • After each logical block a **`gate.yml`** is generated that runs `ricky test` and `ricky lint`. <br>• The workflow will not advance to the next stage unless the gate passes, guaranteeing early defect detection. | ```yaml\n# gate.yml (after paym
...[truncated]
```

Must

  • Describe skills as generation-time selection, loading, and rendering inputs.
  • Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
  • Say the workflow instructions were informed by selected skills.

Must Not

  • Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
  • Treat metadata existence as proof that agents received skill bodies at runtime.
  • Overstate current tests beyond the generation boundary.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

Below is a **minimal, deterministic workflow skeleton** that you can flesh out once the missing product details are clarified.  
It follows the AgentWorkforce conventions (AGENTS.md, workflow standards, shared‑authoring rules) and includes:

* **Explicit contract definitions** (inputs, outputs, success criteria)  
* **Verification gates** that can be automated or manually reviewed  
* **Review & hand‑off artifacts** (design doc, test plan, PR checklist)  
* **Loop count** (80 %‑100 % of the total steps are validation loops)  
* **Blocker‑reporting** hooks for early failure detection  
* **Scoped branch/PR boundaries** for incremental delivery  

---

## 1. High‑Level Overview

| Stage | Purpose | Owner | Primary Artifact |
|-------|---------|-------|------------------|
| **A – Inception** | Capture vague spec, create concrete **Product Requirements Document (PRD)** | Product Owner | `docs/PRD.md` |
| **B – Design** | Translate PRD into **Workflow Contract** and **Architecture Diagram** | System Architect | `contracts/workflow_contract.yaml`, `docs/arch.svg` |
| **C – Implementation** | Build workflow steps in a feature branch | Engineers | Feature branch `feat/<workflow‑name>` |
| 
...[truncated]

Must

  • Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
  • Validate with dry-run or targeted structural checks where possible.
  • Return artifacts, warnings, and follow-up commands honestly.

Must Not

  • Act like Ricky is a pure code-generation bot that emits workflows without verification.
  • Stop at "code compiles" as the proof bar.
  • Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s Immediate Response Plan**

| Phase | Action | Why | Deliverable / Artifact |
|-------|--------|-----|------------------------|
| 1️⃣ Detect & Log | • Pull the run‑log from the orchestrator (e.g., Temporal, Airflow). <br>• Identify the exact step that timed‑out (Step X) and the worker that remained alive (Worker Y). <br>• Record timestamps, input payloads, and any partial outputs. | Provides a deterministic audit trail that can be referenced in every downstream gate. | `run‑log‑<run‑id>.json` + `timeout‑event‑X.md` |
| 2️⃣ Blocker Report | • Create a **Blocker Issue** in the workflow repo (e.g., `workflow‑<name>/issues`). <br>• Title: *“Step X timeout – Worker Y hung, no artifacts produced”*. <br>• Include: log excerpt, worker ID, environment (container image, node), and current SLA breach. | Guarantees visibility for all stakeholders and prevents further downstream jobs from being scheduled. | GitHub Issue #\<N\> (Blocker) |
| 3️⃣ Verification Gate – “Timeout‑Gate” | • Define a deterministic gate that must pass before any retry or repair can proceed. <br>• Conditions: <br> a) Confirm that the timeout threshold (e.g., 5 min) was indeed exceeded. <br> b) Verify that no arti
...[truncated]

Must

  • Classify the failure before deciding whether to retry.
  • Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
  • Preserve raw evidence and uncertainty when the class is not known.

Must Not

  • Blindly rerun the whole workflow without a blocker classification.
  • Treat every failure as a broken workflow definition.
  • Claim Ricky fixed the workflow before rerun evidence proves it.
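
The classify-first rule can be sketched as a function that maps normalized step evidence to one of the blocker classes named above and only then picks a next action. The `StepEvidence` fields, the stderr patterns, and the class-to-action table are hypothetical choices for illustration; a real classifier would key off structured error codes rather than text.

```typescript
type BlockerClass =
  | "environment-blocker"
  | "workflow-structure-bug"
  | "timeout"
  | "agent-runtime-opacity"
  | "validation-strategy-mismatch"
  | "unknown";

// Hypothetical normalized evidence for one step.
interface StepEvidence {
  stepId: string;
  timedOut: boolean;
  workerAlive: boolean;
  artifactsProduced: number;
  validationFailed: boolean;
  stderrTail: string;
}

interface Classification {
  stepId: string;
  blocker: BlockerClass;
  nextAction: string;
  retryEligible: boolean;
  evidence: StepEvidence; // raw evidence is kept alongside the verdict
}

// Illustrative patterns only.
const ENV_PATTERN = /ENOENT|EACCES|command not found|MISSING_ENV_VAR/i;
const STRUCTURE_PATTERN = /SyntaxError|ReferenceError|unknown step|dependency cycle/i;

const NEXT_ACTION: Record<BlockerClass, [string, boolean]> = {
  "environment-blocker": ["fix or report the environment before any rerun", false],
  "workflow-structure-bug": ["repair the workflow artifact, then rerun", false],
  timeout: ["rerun the step with a reviewed timeout budget", true],
  "agent-runtime-opacity": ["inspect or stop the live worker and collect its output first", false],
  "validation-strategy-mismatch": ["revise the gate to check the stated behavior", false],
  unknown: ["escalate with raw evidence; class not determined", false],
};

function classifyStep(e: StepEvidence): Classification {
  let blocker: BlockerClass = "unknown";
  if (ENV_PATTERN.test(e.stderrTail)) blocker = "environment-blocker";
  else if (STRUCTURE_PATTERN.test(e.stderrTail)) blocker = "workflow-structure-bug";
  else if (e.timedOut) blocker = "timeout";
  else if (e.workerAlive && e.artifactsProduced === 0) blocker = "agent-runtime-opacity";
  else if (e.validationFailed && e.artifactsProduced > 0) blocker = "validation-strategy-mismatch";
  const [nextAction, retryEligible] = NEXT_ACTION[blocker];
  return { stepId: e.stepId, blocker, nextAction, retryEligible, evidence: e };
}
```

In the scenario from the user message, the timed-out step and the silent live worker land in different classes, so only one of them is even eligible for a retry.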

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow Contract – Stale‑State Detection for Local Launches**  

| **Element** | **Definition** |
|-------------|----------------|
| **Name** | `detect‑stale‑state` |
| **Scope** | Runs **before** any `local‑launch` entry point in a developer’s workspace. |
| **Inputs** | <ul><li>`repoRoot: string` – absolute path to the repository root.</li><li>`stateDirs: string[]` – default `[".agent‑relay", ".relay", ".trajectories"]`.</li></ul> |
| **Outputs** | <ul><li>`stale: boolean` – true if any directory contains files older than the configured TTL.</li><li>`report: string` – human‑readable summary of stale items.</li></ul> |
| **TTL (Time‑to‑Live)** | 24 h for any file inside the three directories (configurable via `AGENT_STATE_TTL_HOURS`). |
| **Failure Mode** | If `stale === true` the workflow **halts** and emits a **Blocker** ticket (see “Blocker Reporting” below). |
| **Success Mode** | `stale === false` – workflow proceeds to the next stage (`local‑launch`). |
| **Deterministic Verification Gate** | `verify‑stale‑state` – a pure function that, given a snapshot of the filesystem, returns the exact same `stale`/`report` pair. Unit‑tested with 100% branch coverage. |
| **Review Art
...[truncated]

Must

  • Classify stale local runtime state as an environment contamination issue.
  • Recommend quarantine or isolated-run guidance before launch.
  • Record the observed paths and the action taken or recommended.

Must Not

  • Treat stale runtime state as a workflow logic failure.
  • Delete or overwrite state without an explicit safe path or user intent.
  • Continue into execution as if the workspace were clean.
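
A read-only pre-launch check along these lines could be sketched as follows. The directory names come from the user message; treating the presence of any of them as contamination (rather than applying an age threshold) is an assumption of this sketch, as are the report field names.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

const RUNTIME_STATE_DIRS = [".agent-relay", ".relay", ".trajectories"] as const;

interface StaleStateReport {
  classification: "environment-contamination" | "clean";
  observed: { path: string; modified: string }[];
  recommendation: string;
  proceed: boolean;
}

// Read-only: reports what exists and never deletes or moves anything.
function detectStaleRelayState(repoRoot: string): StaleStateReport {
  const observed: { path: string; modified: string }[] = [];
  for (const dir of RUNTIME_STATE_DIRS) {
    const full = path.join(repoRoot, dir);
    // throwIfNoEntry: false returns undefined for missing paths instead of throwing.
    const stat = fs.statSync(full, { throwIfNoEntry: false });
    if (stat?.isDirectory()) observed.push({ path: full, modified: stat.mtime.toISOString() });
  }
  if (observed.length === 0) {
    return { classification: "clean", observed, recommendation: "no prior runtime state found", proceed: true };
  }
  return {
    classification: "environment-contamination",
    observed,
    recommendation:
      "quarantine these paths (move them aside) or launch in an isolated worktree; " +
      "clear them only with explicit user intent",
    proceed: false,
  };
}
```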

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**Resolving “Another Ricky or Relay run is already active in this workspace”**

The marker is a safety lock that prevents overlapping executions. Follow the deterministic repair contract below.

---

## 1️⃣ Verify the current lock state
```bash
# Show the lock file and its contents
cat .ricky/run.marker
# Example output:
# PID=12345
# START=2026-05-11T09:12:33Z
# TYPE=relay
```

If the file is missing, the error is stale – skip to Step 4.


## 2️⃣ Confirm the referenced process is still alive

```bash
ps -p $(cat .ricky/run.marker | grep ^PID= | cut -d= -f2)
```

- If the process exists → it is a legitimate active run.
  - Wait for it to finish, or
  - Cancel it safely (see Step 3).
- If the process does not exist → the lock is orphaned. Proceed to Step 4.


## 3️⃣ Safely terminate an active run (optional)

```bash
# Graceful stop
ricky stop --pid $(cat .ricky/run.marker | grep ^PID= | cut -d= -f2)

# If the graceful stop fails, force‑kill
kill -9 $(cat .ricky/run.marker | grep ^PID= | cut -d= -f2)
```

After termination, verify the process is gone (`ps -p <PID>` returns nothing) before moving on.


## 4️⃣ Clean the stale lock

```bash
# Remove the mar
...[truncated]
```

Must

  • Report the active marker, run id, or status path when available.
  • Ask the user to inspect, wait for, or explicitly clear the active run.
  • Avoid launching a competing run that could corrupt evidence.

Must Not

  • Silently start another run.
  • Hide the existing run marker from the user.
  • Treat the conflict as a generic failure with no recovery path.
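
A report-only conflict check might look like this. The marker location and its `KEY=VALUE` layout (`PID=`, `START=`, `TYPE=`) are taken from the Ricky output in this case and are not confirmed as the real marker format. `process.kill(pid, 0)` is the standard Node way to probe whether a process exists without signalling it.

```typescript
import * as fs from "node:fs";

interface RunConflict {
  conflict: boolean;
  markerPath: string;
  marker?: Record<string, string>;
  status?: "active" | "orphaned" | "unreadable";
  advice?: string;
}

// Signal 0 only checks existence/permissions; EPERM still means the process exists.
function isPidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return (err as { code?: string }).code === "EPERM";
  }
}

function parseMarker(text: string): Record<string, string> {
  const fields: Record<string, string> = {};
  for (const line of text.split(/\r?\n/)) {
    const eq = line.indexOf("=");
    if (eq > 0) fields[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return fields;
}

// Never starts a run and never removes the marker; it only reports what it found.
function checkRunConflict(markerPath: string): RunConflict {
  if (!fs.existsSync(markerPath)) return { conflict: false, markerPath };
  const marker = parseMarker(fs.readFileSync(markerPath, "utf8"));
  const pid = Number(marker.PID);
  if (!Number.isInteger(pid) || pid <= 0) {
    return { conflict: true, markerPath, marker, status: "unreadable", advice: "inspect the marker before clearing it" };
  }
  if (isPidAlive(pid)) {
    return { conflict: true, markerPath, marker, status: "active", advice: `wait for PID ${pid} or stop it explicitly` };
  }
  return {
    conflict: true,
    markerPath,
    marker,
    status: "orphaned",
    advice: `PID ${pid} is gone; clear ${markerPath} explicitly if you agree the run is dead`,
  };
}
```

Even the orphaned case returns `conflict: true`: clearing the marker stays a user decision.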

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.env-loader-injection-runtime-loadable (runtime-recovery/manual)

User message

A workflow artifact references a MISSING_ENV_VAR value. Ricky's deterministic auto-fix injects the .env.local / .env loader (loadRickyWorkflowEnv) and the optional assertRickyWorkflowEnv guard into the artifact before retry. The artifact may be a master-rendered workflow whose .step({ command: ... }) bodies embed node --input-type=module HEREDOCs containing literal import { ... } from 'node:fs' / from 'node:path' strings.

Ricky output

**Deterministic auto‑fix for missing environment variables**

Below is the exact, reproducible patch you can apply to any workflow artifact that currently fails because it references a `MISSING_ENV_VAR`. The fix does three things:

1. **Injects the environment loader** – `loadRickyWorkflowEnv()` – so that `.env.local` (or fallback `.env`) is read before any step runs.  
2. **Optionally adds a guard** – `assertRickyWorkflowEnv()` – that throws a clear error if the required variable is still undefined after loading.  
3. **Leaves the existing step bodies untouched**, including any HEREDOC‑wrapped `node --input-type=module` snippets that contain literal `import … from 'node:fs'` / `from 'node:path'` strings.

---

## 1. Shared utility (add once, import everywhere)

Create (or update) `src/rickyWorkflowEnv.ts` in the repository root:

```ts
// src/rickyWorkflowEnv.ts
import * as fs from 'node:fs';
import * as path from 'node:path';
import * as dotenv from 'dotenv';

/**
 * Load environment variables from .env.local (preferred) or .env.
 * Returns the loaded env object for convenience.
 */
export function loadRickyWorkflowEnv(): NodeJS.ProcessEnv {
  const cwd = process.cwd();
  const c
...[truncated]
```

Must

  • Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected loadRickyWorkflowEnv body references rickyWorkflowFs.* and rickyWorkflowPath.*, so the repair must also add the corresponding import * as rickyWorkflowFs from 'node:fs' and import * as rickyWorkflowPath from 'node:path' aliases at module top level.
  • Detect existing alias imports by matching real top-level import * as <alias> from '<module>' statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in .step({ command: ... }) bodies do not count as imports).
  • Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

Must Not

  • Skip adding the rickyWorkflowFs / rickyWorkflowPath aliases because from 'node:fs' or from 'node:path' already appears somewhere in the file as a string literal.
  • Inject loadRickyWorkflowEnv (or assertRickyWorkflowEnv) without the supporting alias imports, which produces a ReferenceError: rickyWorkflowPath is not defined at module load and burns the auto-fix budget on UNSUPPORTED_RUNTIME at runtime-launch.
  • Rewrite or escape the embedded HEREDOC text in step commands.
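
The alias-detection rule can be sketched by first blanking comments and template-literal bodies (where `.step({ command: ... })` HEREDOCs live) and only then matching real `import * as <alias> from '<module>'` lines. This is a lightweight scanner written for illustration, not Ricky's implementation: it does not recognize regex literals, and a production version would more likely walk a real TypeScript AST. Function names here are hypothetical.

```typescript
// Blank out comments and template-literal bodies (newlines kept) so HEREDOC text that
// merely looks like an import cannot match. Plain quoted strings are kept for specifiers.
function maskNonCode(src: string): string {
  const out = src.split("");
  const blank = (j: number) => {
    if (j < out.length && out[j] !== "\n") out[j] = " ";
  };
  // "tpl" = inside template text; a number = brace depth inside a ${ ... } expression.
  const stack: Array<"tpl" | number> = [];
  let i = 0;
  while (i < src.length) {
    const c = src[i];
    const top = stack[stack.length - 1];
    const inTemplate = stack.length > 0;
    if (top === "tpl") {
      blank(i);
      if (c === "\\") { blank(i + 1); i += 2; }
      else if (c === "`") { stack.pop(); i += 1; }
      else if (c === "$" && src[i + 1] === "{") { blank(i + 1); stack.push(0); i += 2; }
      else i += 1;
      continue;
    }
    if (c === "/" && src[i + 1] === "/") {
      while (i < src.length && src[i] !== "\n") blank(i++);
      continue;
    }
    if (c === "/" && src[i + 1] === "*") {
      const end = src.indexOf("*/", i + 2);
      const stop = end === -1 ? src.length : end + 2;
      while (i < stop) blank(i++);
      continue;
    }
    if (c === "'" || c === '"') {
      let j = i + 1;
      while (j < src.length && src[j] !== c && src[j] !== "\n") j += src[j] === "\\" ? 2 : 1;
      if (inTemplate) for (let k = i; k <= j; k++) blank(k);
      i = j + 1;
      continue;
    }
    if (inTemplate) blank(i);
    if (c === "`") stack.push("tpl");
    else if (typeof top === "number") {
      if (c === "{") stack[stack.length - 1] = top + 1;
      else if (c === "}") {
        if (top === 0) stack.pop();
        else stack[stack.length - 1] = top - 1;
      }
    }
    i += 1;
  }
  return out.join("");
}

function hasNamespaceImport(src: string, alias: string, moduleName: string): boolean {
  const re = /^[ \t]*import\s+\*\s+as\s+([A-Za-z_$][\w$]*)\s+from\s+(['"])([^'"\n]+)\2/gm;
  for (const m of maskNonCode(src).matchAll(re)) {
    if (m[1] === alias && m[3] === moduleName) return true;
  }
  return false;
}

const REQUIRED_ALIASES: Array<[alias: string, moduleName: string]> = [
  ["rickyWorkflowFs", "node:fs"],
  ["rickyWorkflowPath", "node:path"],
];

// Prepend missing alias imports (after a shebang, if any); step bodies are left as-is.
function ensureAliasImports(src: string): string {
  const missing = REQUIRED_ALIASES.filter(([a, m]) => !hasNamespaceImport(src, a, m));
  if (missing.length === 0) return src;
  const header = missing.map(([a, m]) => `import * as ${a} from '${m}';\n`).join("");
  if (src.startsWith("#!")) {
    const nl = src.indexOf("\n");
    if (nl === -1) return `${src}\n${header}`;
    return src.slice(0, nl + 1) + header + src.slice(nl + 1);
  }
  return header + src;
}
```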

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution with Auto‑Fix – Step‑by‑Step Contract**

Below is a deterministic, reproducible contract you can copy‑paste into your terminal (or CI script). It assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failing workflow artifact is located at `./artifacts/workflow.yaml`. The previous successful run ID is supplied as `PREV_RUN_ID`.

---

### 1. Preconditions (Verification Gate)

| Check | Command | Expected Result |
|-------|---------|-----------------|
| **CLI availability** | `aw --version` | `aw version x.y.z` |
| **Workflow file exists** | `test -f ./artifacts/workflow.yaml && echo OK` | `OK` |
| **Previous run ID is non‑empty** | `[[ -n "$PREV_RUN_ID" ]] && echo OK` | `OK` |
| **Docker daemon (if containerized)** | `docker info >/dev/null 2>&1 && echo OK` | `OK` |

If any gate fails, abort and report the blocker before proceeding.

---

### 2. Execution Contract

```bash
#!/usr/bin/env bash
set -euo pipefail

# ---- INPUTS -------------------------------------------------
WORKFLOW_FILE="./artifacts/workflow.yaml"
PREV_RUN_ID="${PREV_RUN_ID:-}"          # must be exported in the env
AUTO_FIX=true                           # enable auto
...[truncated]
```

Must

  • Use a bounded retry budget and summarize every attempt.
  • Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
  • Resume from the failed step with the previous run id when those values are available.

Must Not

  • Edit arbitrary repository source files as the default auto-fix surface.
  • Keep retrying after the configured max attempts.
  • Lose the single Ricky tracking run id across repair/resume attempts.
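
The bounded loop in the Must list can be sketched with injected dependencies so that the attempt budget, the resume arguments, and the single tracking id are all explicit. `AutoFixDeps` and its methods are hypothetical stand-ins for the real runner and for the Workforce workflow persona that edits only the workflow artifact.

```typescript
// Hypothetical result of one workflow attempt.
interface AttemptResult {
  ok: boolean;
  runId: string;
  failedStep?: string;
  error?: string;
}

interface AttemptSummary {
  attempt: number;
  runId: string;
  resumedFrom?: string;
  ok: boolean;
  error?: string;
  repaired?: boolean;
}

interface AutoFixDeps {
  // Starts (or resumes) the workflow; resume info is passed when available.
  run(resume: { startFrom?: string; previousRunId?: string }): Promise<AttemptResult>;
  // Asks the workflow persona to repair the workflow artifact only; false = no safe repair.
  repairArtifact(failure: AttemptResult): Promise<boolean>;
}

async function runWithAutoFix(trackingRunId: string, deps: AutoFixDeps, maxAttempts: number) {
  const attempts: AttemptSummary[] = [];
  let resume: { startFrom?: string; previousRunId?: string } = {};
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await deps.run(resume);
    const summary: AttemptSummary = { attempt, runId: result.runId, resumedFrom: resume.startFrom, ok: result.ok, error: result.error };
    attempts.push(summary);
    if (result.ok || attempt === maxAttempts) break;
    summary.repaired = await deps.repairArtifact(result);
    if (!summary.repaired) break; // nothing safe to change: stop rather than retry blindly
    resume = { startFrom: result.failedStep, previousRunId: result.runId };
  }
  const ok = attempts[attempts.length - 1]?.ok ?? false;
  // One tracking id spans every repair/resume attempt; each attempt is summarized.
  return { trackingRunId, ok, attempts };
}
```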

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Quick‑Start Failure Diagnosis**

When `ricky run workflows/foo.ts --no-auto-fix` exits with a non‑zero status, the engine has hit a **deterministic verification gate** that cannot be auto‑repaired. Follow the steps below to isolate the blocker, produce the required review artifacts, and get the workflow back to a green state.

---

## 1️⃣ Capture the Run Report

```bash
ricky run workflows/foo.ts --no-auto-fix --report ./tmp/foo-run-report.json
```

- The JSON report contains:
  - `gateResults[]` – each verification gate outcome.
  - `errorStack` – full stack trace.
  - `artifactPaths[]` – paths to generated logs, snapshots, and diff files.

Save this file and attach it to the issue you’ll open (see §4).


## 2️⃣ Identify the Failing Gate

Open the report and look for the first entry where `"status": "FAIL"`.

Typical gate categories:

| Gate | What it checks | Common failure cause |
|------|----------------|----------------------|
| SchemaGate | Input/Output JSON schema compliance | Missing required field, type mismatch |
| IdempotenceGate | Deterministic output for identical inputs | Non‑deterministic timestamps, random IDs |
| ResourceGate | External resource a
...[truncated]

**Must**

- Preserve one-attempt behavior when auto-fix is disabled.
- Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
- Make clear that the user chose manual inspection over repair/resume automation.

**Must Not**

- Start a repair loop despite `--no-auto-fix`.
- Suppress the diagnosis because no repair was attempted.
- Present the failure as a completed repair attempt.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.in-process-local-runner</code> (runtime-recovery/manual)</summary>

**User message**

> Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

**Ricky output**

```text
**Ricky’s Standard Procedure for Running a Local TypeScript Workflow Artifact**

| Phase | Action | Rationale / Artifact |
|-------|--------|----------------------|
| 1️⃣  | **Locate the primary workflow directory**  | The workflow root is always `./workflows/` (repo‑wide convention). The primary local path for a new artifact is `./workflows/<workflow‑name>/`. |
| 2️⃣  | **Validate the contract file**  | Every workflow must contain a `workflow.contract.json` that declares: <br>• `name`  <br>• `entry` (relative path to the TS file) <br>• `inputs` schema <br>• `outputs` schema <br>• `verificationGate` (optional). <br>Ricky aborts if the contract is missing or malformed and logs a **BLOCKER**. |
| 3️⃣  | **Install dependencies**  | ```bash\ncd ./workflows/<workflow‑name>\nnpm ci   # deterministic install, lockfile required\n``` <br>Ensures reproducible environment; any missing `package.json` or lockfile is reported as a **BLOCKER**. |
| 4️⃣  | **Compile‑time check**  | Run TypeScript’s type‑checker without emitting code: <br>```bash\ntsc --noEmit\n``` <br>If `tsc` returns non‑zero, Ricky creates a **Verification Gate Failure** report (includes `tsc` diagnostics) and stops execution. |
...[truncated]
```

Must

  • Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
  • Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
  • Record the actual spawn command in execution evidence.

Must Not

  • Fail solely because agent-relay is not on PATH when the SDK route is available.
  • Hide the actual runtime command from evidence.
  • Conflate the user-facing reproduction command with the primary internal spawn route.
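
A precheck that prefers the Node type-stripping route and records both commands could be sketched as below. Type stripping arrived in Node 22.6 behind `--experimental-strip-types`; the SDK lookup simply walks up `node_modules` directories and ignores package `exports`, Yarn PnP, and similar resolution details, so treat it as a heuristic. `planLocalRun` and the plan shape are hypothetical; the reproduction command mirrors the `ricky run <file>` form used elsewhere in this review.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

interface LocalRunPlan {
  ok: boolean;
  blockers: string[];
  spawnCommand: string[]; // recorded verbatim in execution evidence
  reproCommand: string[]; // user-facing; not the internal spawn route
}

function supportsStripTypes(nodeVersion: string): boolean {
  const [major = 0, minor = 0] = nodeVersion.split(".").map(Number);
  return major > 22 || (major === 22 && minor >= 6);
}

// Walk up from the workflow's directory looking for an installed SDK package.json.
function findSdk(startDir: string): string | undefined {
  let dir = path.resolve(startDir);
  for (;;) {
    const candidate = path.join(dir, "node_modules", "@agent-relay", "sdk", "package.json");
    if (fs.existsSync(candidate)) return candidate;
    const parent = path.dirname(dir);
    if (parent === dir) return undefined;
    dir = parent;
  }
}

// Does not require an agent-relay binary on PATH.
function planLocalRun(workflowFile: string, nodeVersion: string = process.versions.node): LocalRunPlan {
  const blockers: string[] = [];
  if (!fs.existsSync(workflowFile)) blockers.push(`workflow file not found: ${workflowFile}`);
  if (!supportsStripTypes(nodeVersion)) blockers.push(`Node ${nodeVersion} cannot strip TypeScript types (needs >= 22.6)`);
  if (!findSdk(path.dirname(workflowFile))) blockers.push("@agent-relay/sdk is not resolvable from the workflow directory");
  return {
    ok: blockers.length === 0,
    blockers,
    spawnCommand: [process.execPath, "--experimental-strip-types", workflowFile],
    reproCommand: ["ricky", "run", workflowFile],
  };
}
```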

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky – Boundary‑Escalation Workflow for Persistent Structural Failures**  

| **Phase** | **Goal** | **Inputs** | **Outputs / Artifacts** | **Verification Gate** |
|-----------|----------|------------|--------------------------|-----------------------|
| **1️⃣ Detect & Log** | Capture the failure that survived the first fix. | - Failure ticket (e.g., `FAIL‑1234`)  <br> - Original fix PR (`fix/struct‑leak‑v2`)  <br> - Test run logs, stack traces, metrics | - `failure_report.md` (structured JSON/YAML)  <br> - Updated ticket with `status: “persistent”` | **Gate A** – Automated parser confirms required fields (`failure_id`, `original_pr`, `symptoms`, `timestamp`). |
| **2️⃣ Re‑run Verification Suite** | Ensure the failure is reproducible and not a fluke. | - `failure_report.md`  <br> - Full CI pipeline for the affected component | - `repro_run.log`  <br> - `repro_result.json` (`pass: false`) | **Gate B** – CI must report a deterministic failure on ≥ 2 clean agents. |
| **3️⃣ Scoped Branch Creation** | Isolate further investigation without contaminating the main line. | - `main` (or current release branch)  <br> - `failure_report.md` | - New branch `boundary/FAIL‑1234‑investigate`  <
...[truncated]

Must

  • Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
  • Distinguish escalation from a generic product failure.
  • Preserve enough context for a human operator to continue.

Must Not

  • Retry speculative fixes indefinitely.
  • Collapse to "something went wrong" without the attempted actions.
  • Discard evidence from failed repair attempts.
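
One way to keep an escalation from collapsing into "something went wrong" is to refuse to render it without the fields a human operator needs. The record shape and field names below are hypothetical.

```typescript
interface EscalationReport {
  workflow: string;
  blockerClass: string;
  attemptedFixes: { description: string; validation: string; passed: boolean }[];
  evidencePaths: string[];
  recommendation: string;
}

// Throws instead of emitting a generic message when required context is missing.
function buildEscalation(r: EscalationReport): string {
  const missing: string[] = [];
  if (!r.blockerClass) missing.push("blockerClass");
  if (r.attemptedFixes.length === 0) missing.push("attemptedFixes");
  if (!r.recommendation) missing.push("recommendation");
  if (missing.length > 0) throw new Error(`incomplete escalation, missing: ${missing.join(", ")}`);
  const fixes = r.attemptedFixes
    .map((f, i) => `  ${i + 1}. ${f.description} -> ${f.validation}: ${f.passed ? "passed" : "FAILED"}`)
    .join("\n");
  return [
    `ESCALATION (not a generic failure): ${r.workflow}`,
    `Blocker class: ${r.blockerClass}`,
    "Attempted fixes:",
    fixes,
    `Evidence: ${r.evidencePaths.join(", ") || "(none recorded)"}`,
    `Recommendation: ${r.recommendation}`,
  ].join("\n");
}
```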

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Summary of Recent Runs**  
*(Generated per the “Workflow Health Digest” contract in AGENTS.md – deterministic, scoped, and ready for PR review)*  

| Run ID | Workflow | Branch | Trigger | Start ⏱ | End ⏱ | Duration | Status | Exit‑Code | Key Metrics | Notable Logs / Alerts |
|--------|----------|--------|---------|----------|-------|----------|--------|-----------|-------------|------------------------|
| `wf‑20240528‑001` | **Deploy‑Prod** | `main` | `push` (v2.3.1) | 2024‑05‑28 02:13 UTC | 2024‑05‑28 02:27 UTC | 14 min | ✅ **Success** | 0 | CPU ≈ 68 % (peak) <br>Memory ≈ 1.2 GB | No alerts |
| `wf‑20240527‑014` | **Data‑Ingest** | `feature/ingest‑v5` | `schedule` (nightly) | 2024‑05‑27 00:00 UTC | 2024‑05‑27 00:42 UTC | 42 min | ⚠️ **Partial‑Fail** | 2 | Rows ≈ 4.2 M (processed) <br>Failed ≈ 1.3 % | “Row‑validation‑error” at step 3 (see Log‑A) |
| `wf‑20240526‑009` | **Deploy‑Prod** | `release/2024‑05‑26` | `manual` (approval) | 2024‑05‑26 14:05 UTC | 2024‑05‑26 14:23 UTC | 18 min | ❌ **Failed** | 1 | No deployment artifacts | “Docker‑pull‑timeout” in step 2 |
| `wf‑20240525‑022` | **Smoke‑Test** | `main` | `push` (v2.3.0) | 2024‑05‑25 09:12 UTC 
...[truncated]

Must

  • Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
  • Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
  • Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

  • Mutate evidence while analyzing it.
  • Return generic advice like "improve your workflow" without references.
  • Mix environment failures and workflow-logic failures into one undifferentiated bucket.
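
A digest built from normalized evidence rather than raw logs could be sketched as a pure aggregation. The `WorkflowRunEvidence` fields used here are a hypothetical subset; the real record may carry different names.

```typescript
// Hypothetical subset of a normalized WorkflowRunEvidence record.
interface WorkflowRunEvidence {
  readonly workflow: string;
  readonly runId: string;
  readonly status: "passed" | "failed";
  readonly failureClass?: string;
  readonly failureKind?: "environment" | "workflow-logic";
  readonly durationMs: number;
}

interface WorkflowDigest {
  workflow: string;
  runs: number;
  failures: { environment: number; workflowLogic: number }; // kept in separate buckets
  recurringFailureClasses: string[]; // classes seen in 2+ failed runs
  medianDurationMs: number;
  exampleRunIds: string[]; // concrete references for each recommendation
}

function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Pure aggregation: input records are only read, never modified.
function digest(evidence: readonly WorkflowRunEvidence[]): WorkflowDigest[] {
  const byWorkflow = new Map<string, WorkflowRunEvidence[]>();
  for (const e of evidence) byWorkflow.set(e.workflow, [...(byWorkflow.get(e.workflow) ?? []), e]);
  return [...byWorkflow].map(([workflow, runs]) => {
    const failed = runs.filter((r) => r.status === "failed");
    const classCounts = new Map<string, number>();
    for (const r of failed) if (r.failureClass) classCounts.set(r.failureClass, (classCounts.get(r.failureClass) ?? 0) + 1);
    return {
      workflow,
      runs: runs.length,
      failures: {
        environment: failed.filter((r) => r.failureKind === "environment").length,
        workflowLogic: failed.filter((r) => r.failureKind === "workflow-logic").length,
      },
      recurringFailureClasses: [...classCounts].filter(([, n]) => n >= 2).map(([c]) => c),
      medianDurationMs: median(runs.map((r) => r.durationMs)),
      exampleRunIds: failed.slice(0, 3).map((r) => r.runId),
    };
  });
}
```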

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

## Ricky Slack Surface – “Workflow Debugger”

### 1. Overview
The **Ricky Slack Debugger** is a dedicated Slack app surface that lets engineers and product owners inspect, step‑through, and repair Ricky‑managed workflows in real time. It follows the **Ricky repository conventions** (AGENTS.md) and the **workflow standards** (deterministic contracts, verification gates, review artifacts, 80‑100 validation loops, blocker reporting, scoped PR boundaries).

| Component | Purpose |
|-----------|---------|
| **Debug Panel** (modal) | Visual tree of the active workflow, node status, input/output payloads. |
| **Step Controls** (buttons) | *Pause*, *Resume*, *Step‑Into*, *Step‑Over*, *Jump‑To* a specific node. |
| **Live Log Stream** (section) | Timestamped, filtered logs with severity tags. |
| **Patch Submit** (form) | Inline edit of node configuration + deterministic contract diff. |
| **Verification Gate** (modal) | Auto‑run pre‑commit checks before a patch is merged. |
| **Review Artifact** (thread) | Auto‑generated summary of changes, test results, and blocker report. |
| **Loop Counter** (badge) | Shows current validation‑loop iteration (target 80‑100). |

---

### 2. Interaction Fl
...[truncated]

Must

  • Treat Slack as a co-equal product surface, not a wrapper around CLI.
  • Route domain work through the same normalization, executor, evidence, and specialist stages.
  • Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.

Must Not

  • Put workflow generation or diagnosis domain logic directly in the Slack handler.
  • Degrade Slack to a developer shortcut with weaker routing than CLI.
  • Skip signature verification, dedup, or thread handling when the surface is implemented.
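
Keeping Slack concerns in the surface layer can be sketched as a thin adapter: verify the request signature, drop redelivered events, preserve the thread, and emit a surface-neutral request for the shared normalization/executor/evidence stages. The signing scheme (`v0:` base string, HMAC-SHA256, a five-minute replay window) follows Slack's documented request verification. `NormalizedRequest` and the in-memory dedup set are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Slack request signing: "v0=" + hex(HMAC_SHA256(secret, "v0:{timestamp}:{raw body}")).
function verifySlackSignature(
  secret: string,
  timestamp: string,
  rawBody: string,
  signature: string,
  nowSec: number = Math.floor(Date.now() / 1000),
): boolean {
  const ts = Number(timestamp);
  if (!Number.isFinite(ts) || Math.abs(nowSec - ts) > 60 * 5) return false; // replay window
  const expected = "v0=" + createHmac("sha256", secret).update(`v0:${timestamp}:${rawBody}`).digest("hex");
  const a = Buffer.from(expected);
  const b = Buffer.from(signature);
  return a.length === b.length && timingSafeEqual(a, b);
}

// Surface-neutral request shared with the CLI and web surfaces (hypothetical shape).
interface NormalizedRequest {
  surface: "slack" | "cli" | "web";
  text: string;
  replyTarget: { channel: string; threadTs: string };
}

interface SlackMessageEvent {
  event_id: string;
  event: { channel: string; text: string; ts: string; thread_ts?: string };
}

const seenEventIds = new Set<string>(); // in-memory only; a real surface would persist and expire ids

// Translation only: no workflow generation or diagnosis logic lives here.
function slackToNormalized(payload: SlackMessageEvent): NormalizedRequest | undefined {
  if (seenEventIds.has(payload.event_id)) return undefined; // Slack retries redeliver the same event_id
  seenEventIds.add(payload.event_id);
  const { channel, text, ts, thread_ts } = payload.event;
  return { surface: "slack", text, replyTarget: { channel, threadTs: thread_ts ?? ts } };
}
```

Replying with `thread_ts ?? ts` keeps follow-ups in the originating thread, which is the Slack-specific behavior the rubric wants kept out of the domain layer.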

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)

User message

Add a new web handoff type that submits a workflow spec and mode.

Ricky output

**New Web‑Handoff Type – `workflowSubmit`**

| Element | Definition |
|---------|------------|
| **Name** | `workflowSubmit` |
| **Purpose** | Allows a web client (e.g., UI, external service) to hand off a complete **Workflow Specification** together with an execution **mode** (e.g., `dryRun`, `run`, `preview`). The receiving AgentWorkforce service validates, stores, and optionally triggers the workflow. |
| **Direction** | **Client → AgentWorkforce** (POST) |
| **Content‑Type** | `application/json` |
| **Schema** | See **JSON‑Schema** below. |
| **Response** | Deterministic status object with `requestId`, `validationResult`, and optional `triggerId`. |
| **Idempotency** | `X‑Idempotency‑Key` header required for safe retries. |
| **Auth** | Bearer token (OAuth2) with `workflow:submit` scope. |
| **Rate‑limit** | 30 req/min per token. |

---

## 1. JSON‑Schema for the Request Payload

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "WorkflowSubmitPayload",
  "type": "object",
  "required": ["workflowSpec", "mode"],
  "properties": {
    "workflowSpec": {
      "type": "object",
      "description": "Full workflow definition adhering to the internal Workf
...[truncated]
```

Must

  • Add a hand

Comment truncated to stay within GitHub limits. Download the ricky-eval-run artifact for the full human-review.md.
