Skip to content

fix(local): redirect all @agent-relay/sdk subpaths and @agent-relay/config in loader#92

Merged
khaliqgant merged 2 commits into
mainfrom
fix/sdk-loader-subpaths
May 10, 2026
Merged

fix(local): redirect all @agent-relay/sdk subpaths and @agent-relay/config in loader#92
khaliqgant merged 2 commits into
mainfrom
fix/sdk-loader-subpaths

Conversation

@khaliqgant
Copy link
Copy Markdown
Member

Summary

  • The sdk-runtime-loader Ricky generates per-run only redirected @agent-relay/sdk/workflows to the bundled SDK. Workflow files routinely import other SDK subpaths (@agent-relay/sdk/github, @agent-relay/sdk/relay, etc.) and the sibling @agent-relay/config package — those fell through to standard node resolution and failed in consumer repos that hadn't npm installed the SDK locally, defeating the point of having Ricky bundle it.
  • Loader now resolves every @agent-relay/sdk/* subpath and every @agent-relay/config* import against the bundled location.

Reproducer

Run any workflow that imports a non-workflows SDK subpath from a consumer repo without the SDK installed locally:

import { workflow } from "@agent-relay/sdk/workflows";
import { createGitHubStep } from "@agent-relay/sdk/github";  // ← failed
import { ClaudeModels } from "@agent-relay/config";          // ← failed

Pre-fix error:

Error [ERR_MODULE_NOT_FOUND]: Cannot find package '@agent-relay/sdk'
    imported from .../workflows/whatever.ts
    at resolve (file:///.../sdk-runtime-loader.mjs:6:10)

What changed (src/local/entrypoint.ts)

The generated loader now matches:

Specifier Resolution
@agent-relay/sdk/workflows bundled (unchanged)
@agent-relay/sdk bundled root index
@agent-relay/sdk/<anything> resolved against bundled SDK package root → uses its own exports map
@agent-relay/config bundled config index
@agent-relay/config/<anything> resolved against bundled config package root
anything else delegated to nextResolve (unchanged)

Test plan

  • npm run typecheck — clean
  • npx vitest run src/local/entrypoint.test.ts — 113 / 113 pass

🤖 Generated with Claude Code

…onfig in loader

The sdk-runtime-loader.mjs that Ricky generates per-run only redirected
`@agent-relay/sdk/workflows` to the bundled SDK. Workflow files
routinely import other SDK subpaths (e.g. `@agent-relay/sdk/github`
for `createGitHubStep`) and the sibling `@agent-relay/config` package
(for `ClaudeModels` / `CodexModels`). Those import paths fell through
to standard node resolution and failed in consumer repos that hadn't
also `npm install`ed the SDK locally — defeating the point of having
Ricky bundle it.

Reproducible failure (cloud worktree, no local @agent-relay deps):

    Error [ERR_MODULE_NOT_FOUND]: Cannot find package '@agent-relay/sdk'
        imported from .../workflows/proactive-runtime-m1.ts
        at resolve (file:///.../sdk-runtime-loader.mjs:6:10)
        at nextResolve (...)

The bundled SDK already exposes the missing subpaths via its package
exports map (`./github`, `./client`, `./communicate/...`, `./broker-path`,
etc.), and `@agent-relay/config` ships alongside in the same scope dir
inside Ricky's node_modules. The loader can resolve all of them
against the bundled location without consumer-repo state.

Behavior after this change:

  • `@agent-relay/sdk/workflows` → bundled (unchanged)
  • `@agent-relay/sdk` (root) → bundled
  • `@agent-relay/sdk/<anything>` → resolved against bundled SDK root,
    so the package's own exports map handles the subpath
  • `@agent-relay/config` → bundled
  • `@agent-relay/config/<anything>` → resolved against bundled config
    package root
  • All other specifiers → unchanged (delegate to nextResolve)

All 113 src/local/entrypoint.test.ts tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 10, 2026

Review Change Stack
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: e06ce017-b7c2-459a-b948-e3c765b0aea7

📥 Commits

Reviewing files that changed from the base of the PR and between cbd70a7 and 2180df3.

📒 Files selected for processing (2)
  • src/local/entrypoint.test.ts
  • src/local/entrypoint.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/local/entrypoint.ts

📝 Walkthrough

Walkthrough

Generates an enhanced Node ESM loader to map @agent-relay/sdk and @agent-relay/config (including subpaths) to bundled package URLs, and adds a Vitest regression that runs the local workflow runtime to verify subpath resolution without consumer node_modules.

Changes

ESM Loader Package Resolution & Regression

Layer / File(s) Summary
Loader Resolve Logic
src/local/entrypoint.ts
Compute SDK and config package/index URLs and parent anchors; short-circuit @agent-relay/sdk/workflows; short-circuit bare @agent-relay/sdk and @agent-relay/config to bundled dist/index.js; re-resolve @agent-relay/sdk/* and @agent-relay/config/* by calling nextResolve with updated parentURL; fallback to nextResolve for others.
Local Runtime Regression Test
src/local/entrypoint.test.ts
Adds a Vitest test that writes a workflow importing @agent-relay/sdk/* and @agent-relay/config/*, executes runLocal (execution stage), asserts successful run, confirms no node_modules in the test repo, and cleans up state/env.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • AgentWorkforce/ricky#70: Modifies workflowSdkLoaderNodeOption registration approach; touches the same loader generation/registration code this PR extends with additional resolution mappings.

Poem

🐇 I hopped through URLs and mapped each trail,

Bundled roots and subpaths follow the snail,
A test ran local, no node_modules to show,
The loader hums softly, resolving each flow,
A tiny rabbit cheers — imports all aglow.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: extending the loader to redirect all @agent-relay/sdk subpaths and @agent-relay/config imports, which is the core objective of this PR.
Description check ✅ Passed The description is directly related to the changeset, providing a clear summary of the problem, the solution, a reproducer, a detailed table of resolutions, and test results.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/sdk-loader-subpaths

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/local/entrypoint.ts`:
- Around line 936-966: The loader's resolve function is concatenating file://
URLs (sdkRootUrl/configRootUrl) into a resolved path and calling nextResolve,
which bypasses package exports; instead, for specifiers that start with
SDK_SUBPATH_PREFIX or CONFIG_SUBPATH_PREFIX call nextResolve with the original
bare subpath (e.g. the incoming specifier) but pass a context.parentURL anchored
inside the target package root (use sdkRootUrl + '/' or configRootUrl + '/' as
the parentURL) so Node will re-run package exports resolution; update the
exported resolve function's branches that handle SDK_SUBPATH_PREFIX and
CONFIG_SUBPATH_PREFIX to re-resolve the original specifier via
nextResolve(specifier, { ...context, parentURL: sdkRootUrl + '/' }, ...) and
similarly for configRootUrl.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 46098ead-c708-428b-9b78-78135c451477

📥 Commits

Reviewing files that changed from the base of the PR and between 45ea489 and cbd70a7.

📒 Files selected for processing (1)
  • src/local/entrypoint.ts

Comment thread src/local/entrypoint.ts Outdated
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

View 2 additional findings in Devin Review.

Open in Devin Review

Comment thread src/local/entrypoint.ts Outdated
Comment on lines +959 to +961
' if (specifier.startsWith(SDK_SUBPATH_PREFIX)) {',
' return nextResolve(sdkRootUrl + "/" + specifier.slice(SDK_SUBPATH_PREFIX.length), context);',
' }',
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Wildcard SDK/config subpath resolution produces incorrect file URLs that miss the dist/ directory and .js extension

The wildcard subpath resolution for @agent-relay/sdk/* and @agent-relay/config/* constructs sdkRootUrl + "/" + subpath (e.g. file:///<sdk>/github), but the actual SDK exports map every subpath into the dist/ directory with a .js extension (e.g. "./github""./dist/github.js"). Since file URLs in Node.js ESM bypass the package's exports map and do NOT get automatic .js extension resolution, nextResolve("file:///<sdk>/github", context) will fail with ERR_MODULE_NOT_FOUND.

Verified against the actual @agent-relay/sdk@6.0.13 package

The SDK package exports (downloaded and inspected from npm):

  • "./github""./dist/github.js"
  • "./relay""./dist/relay.js"
  • "./communicate""./dist/communicate/index.js"
  • "./communicate/a2a-types""./dist/communicate/a2a-types.js"

The loader constructs file:///<sdk>/github but no file exists at <sdk>/github — the file is at <sdk>/dist/github.js. The same issue applies to all @agent-relay/config subpaths (e.g. "./relay-config""./dist/relay-config.js").

The three exact-match cases (@agent-relay/sdk, @agent-relay/sdk/workflows, @agent-relay/config) work correctly because they hardcode paths to the actual .js files under dist/.

Prompt for agents
The wildcard subpath resolution in the generated ESM loader hook constructs file URLs by concatenating sdkRootUrl (the package root) with the subpath name, but the SDK's exports map all subpaths into the dist/ directory with .js extensions. For example, @agent-relay/sdk/github should resolve to <sdk>/dist/github.js, not <sdk>/github.

The fix needs to handle two patterns seen in the SDK exports:
1. Simple subpaths: ./github → ./dist/github.js (most exports)
2. Directory subpaths: ./workflows → ./dist/workflows/index.js, ./communicate → ./dist/communicate/index.js

Possible approaches:
- Read the SDK package.json exports map at build time and generate explicit resolve entries for each known subpath
- Construct the URL as sdkRootUrl + /dist/ + subpath and use nextResolve, relying on it to find .js files (but ESM wont add .js automatically)
- Try resolving sdkRootUrl + /dist/ + subpath + .js first, falling back to sdkRootUrl + /dist/ + subpath + /index.js
- Use import.meta.resolve or a createRequire-based resolver pointed at the SDK package root to properly consult the exports map

The same fix is needed for the CONFIG_SUBPATH_PREFIX case at line 965-967. The three exact-match cases (sdk, sdk/workflows, config) are correct and do not need changes.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 10, 2026

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-10T17-54-55-005Z-HEAD-provider
Mode: provider
Git SHA: 2ca8f42

Passed: 8 | Needs human: 43 | Reviewable: 43 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

## Updated Ricky Documentation – “Deeper Integration with Agent Assistant”

### 1. Overview
Ricky (AgentWorkforce workflow reliability, coordination, and authoring assistant) now **leverages the Agent Assistant (AA) platform as a first‑class service** for:

| Feature | How AA is used | Benefit |
|---------|----------------|---------|
| **Deterministic verification gates** | AA runs *contract‑check* and *property‑based* tests in a sandboxed executor before any PR merge. | Guarantees that every workflow contract is satisfied before code lands. |
| **Dynamic review artifacts** | AA generates **review‑summary** markdown, diff‑highlights, and **risk‑heatmaps** automatically. | Reduces reviewer cognitive load and surfaces hidden blockers early. |
| **Scalable validation loops** | AA orchestrates **80‑to‑100 validation loops** per workflow change (unit, integration, performance, security). | Provides statistical confidence that changes won’t regress reliability. |
| **Blocker detection & reporting** | AA monitors execution logs, surface‑level metrics, and external service health; any failure is reported as a **Ricky Blocker** ticket. | Immediate, actionable feedback for developers. |
| **
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review**  
*(Compiled 10 May 2026 – based on the latest CI/CD dashboards, repository analytics, and stakeholder feedback)*  

---

## 1. Adoption Snapshot

| Metric | Current Value | Target (Q4 2026) | Status |
|--------|---------------|------------------|--------|
| **Agents using `turn-context`** | 42 / 58 active agents (≈ 72 %) | 100 % | ⚠️ |
| **Turn‑context API calls per day** | 1.84 M | 3 M | ⚠️ |
| **Average latency per call** | 28 ms (p95) | ≤ 20 ms | ⚠️ |
| **Error rate (5xx / timeout)** | 0.42 % | < 0.1 % | ⚠️ |
| **Documentation coverage** | 68 % of agents have `turn-context` section in README | 100 % | ⚠️ |
| **Automated test coverage** | 54 % of `turn-context` code paths exercised | 80 % | ⚠️ |
| **Feedback score (internal survey)** | 3.6 / 5 | ≥ 4.5 | ⚠️ |

*Overall health: **Yellow** – functional but not yet at production‑grade reliability or full adoption.*

---

## 2. Key Findings

| Area | Observation | Impact |
|------|-------------|--------|
| **Partial migration** | 16 agents still rely on the legacy `session‑state` pattern. | Inconsistent context handling; higher cognitive load for developers. |
| **Performance bo
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation:** Move **all four** capabilities—workflow generation, validation, debugging, and the staged CLI UX (including blocker/evidence wording)—into the **Agent Assistant** module.

---

## Rationale (aligned with AGENTS.md & workflow standards)

| Capability | Why it belongs in Agent Assistant | Impact on reliability & coordination |
|------------|-----------------------------------|---------------------------------------|
| **Workflow Generation** | Core “authoring” function; Agent Assistant already owns the *contract‑first* pattern for creating new workflow definitions. Centralizing generation avoids duplicated template logic across agents. | Guarantees a single source‑of‑truth for the initial contract, enabling deterministic downstream verification. |
| **Workflow Validation** | Validation gates (schema, type‑checking, deterministic‑state‑transition tests) are the natural next step after generation. Agent Assistant already hosts the *verification* stage of the pipeline. | Provides immediate feedback loops (80‑100 % coverage) and produces the **validation artifact** required for PR gating. |
| **Workflow Debugging** | Debugging is essentially a *reverse‑validation* (re
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target window: 12 weeks (Q3 2026)*  

---

## 1. Slice Goal
Deploy the next‑generation Ricky (v2.0) to **30 % of active AgentWorkforce teams** (≈ 45 teams) while guaranteeing **≥ 99 % workflow reliability** and **full auditability** of all coordination artifacts.

---

## 2. Scope & Boundaries  

| **In‑Scope** | **Out‑of‑Scope** |
|--------------|------------------|
| • New **workflow contracts** for “Task‑Sync”, “Review‑Gate”, and “Metrics‑Export”.<br>• Deterministic **verification gates** (unit, integration, contract, performance).<br>• Automated **validation loops** (80‑100 per week).<br>• **Branch/PR policy**: feature branch `ricky/v2‑adopt‑slice`, PR label `adoption‑slice`.<br>• Training & rollout docs for pilot teams. | • Re‑architecting core Ricky engine (planned for FY27).<br>• Migration of legacy on‑prem agents (handled in a later slice).<br>• UI redesign beyond the “quick‑action bar”. |

---

## 3. Milestones & Deliverables  

| **Week** | **Milestone** | **Key Deliverables** | **Gate / Verification** |
|----------|---------------|----------------------|--------------------------|
| 1‑2 | **Kick‑off & Baseline
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky × Slack / Web Support – Agent‑Assistant Package Design  

Below is a **complete, production‑ready package definition** that lets Ricky expose its workflow‑reliability, coordination, and authoring capabilities through two front‑ends:

| Front‑end | Package name | Entry point | Primary use‑case |
|-----------|--------------|-------------|------------------|
| **Slack** | `ricky-slack‑bot` | `src/slack_bot.py` | Interactive “Ask Ricky” channel, on‑demand workflow diagnostics, PR‑gate triggers |
| **Web**   | `ricky‑web‑ui`   | `src/web_app.py`   | Dashboard for live workflow health, manual gate approvals, blocker reporting |

Both packages share a **core library** (`ricky-core`) that implements the contract‑driven workflow engine, deterministic verification gates, and review‑artifact generators.

---

## 1. Repository Layout (AGENTS.md‑compliant)

/ricky-agent/

├─ core/
│ ├─ init.py
│ ├─ workflow_engine.py # deterministic engine, 80‑100 loop runner
│ ├─ contracts.py # pydantic contracts for inputs/outputs
│ ├─ verification.py # gate implementations, artifact serializers
│ └─ utils/

├─ slack/
│ ├─ src/
│ │ ├─ slac
...[truncated]


**Must**

- Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
- Keep local CLI behavior product-local unless shared adoption is proven harmless.
- Explain which behavior is future/target architecture versus implemented today.

**Must Not**

- Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
- Let future surface abstractions distort the current CLI contract.
- Duplicate a mature Agent Assistant capability locally without justification.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.generation-default-not-execution</code> (cli-behavior/manual)</summary>

**User message**

> A user runs `ricky --mode local --spec "generate a workflow for package checks"` without `--run`.

**Ricky output**

```text
**What the command does**

Running  

```bash
ricky --mode local --spec "generate a workflow for package checks"
  • --mode local – tells Ricky to operate in the local development environment (uses the local repository, local Docker daemon, etc.).
  • --spec "generate a workflow for package checks" – is the natural‑language request that Ricky will translate into a concrete workflow definition (YAML/JSON) following the AgentWorkforce workflow contract.
  • No --run flag – means Ricky will only generate the workflow artifact; it will not submit the workflow to the executor, start any containers, or perform any checks.

So the command will:

  1. Parse the spec → turn the English description into a deterministic workflow graph (steps, inputs, outputs, verification gates, and artifact locations).
  2. Validate the workflow against the repository’s workflow-schema.json (ensuring required fields, proper naming, and that all steps have deterministic verification gates).
  3. Write the workflow file to the default output directory (e.g., ./ricky/workflows/generated/package-checks.yaml).
  4. Print a short summary to stdout, including:
    • Wor
      ...[truncated]

**Must**

- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.

**Must Not**

- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>

**User message**

> Render Ricky's first-run CLI onboarding for a new user.

**Ricky output**

```text
**Ricky – First‑Run CLI Onboarding**

Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the Ricky CLI. Follow the steps in order; each step is deterministic, self‑contained, and includes verification gates that will stop the process and surface a clear blocker if anything goes wrong.

---

## 1️⃣ Prerequisites (Verification Gate)

| Check | Command | Expected Output | Failure Action |
|-------|---------|----------------|----------------|
| **Node ≥18** | `node -v` | `v18.x.x` or higher | Abort – install Node 18+ |
| **Git** | `git --version` | `git version 2.x.x` | Abort – install Git |
| **Docker** (optional, for container‑based agents) | `docker version --format '{{.Server.Version}}'` | `20.x.x` or higher | Warn – you can continue without Docker, but container agents will be unavailable |
| **Ricky binary** | `ricky --version` | `ricky version 0.1.0` (or later) | Abort – install via npm (see step 2) |

Run the table as a single script to auto‑fail on any missing prerequisite:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Node
node -v | grep -E '^v1[89]\.' || { echo "❌ Node 18+ required"; exit 1; }

# Git
git --version | grep -E '^git version 2\.'
...[truncated]

Must

  • Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
  • End every branch with a concrete next step.
  • Advertise only commands that are currently implemented.

Must Not

  • Sound like a launch page or documentation dump.
  • Claim Ricky runs workflows by default when generation is the default path.
  • Require web or Slack onboarding before CLI use.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.recovery-guidance-no-stack-traces (cli-behavior/manual)

User message

A user gives Ricky an empty spec or a missing spec file.

Ricky output

**Blocker Detected – Missing Specification**

The workflow authoring process requires a concrete specification file (e.g., `workflow.yaml`, `contract.json`, or equivalent) that defines:

1. **Inputs / Triggers** – data shapes, event sources, and required authentication.  
2. **Steps / Tasks** – ordered actions, responsible agents, and any conditional branching.  
3. **Outputs / Artifacts** – expected results, storage locations, and downstream contracts.  
4. **Verification Gates** – deterministic checks (unit tests, schema validation, integration assertions) that will be run in the 80‑to‑100 validation loop.  
5. **Review & Release Metadata** – owners, reviewers, target branch, and PR scope.

Without this spec, I cannot:

- Generate a deterministic workflow contract.  
- Create verification gates or validation loops.  
- Produce review artifacts or scoped PR boundaries.  
- Report progress or completion status.

**Next Steps**

1. **Provide the missing spec** – upload or paste the complete workflow definition file.  
2. **If you need a template**, let me know and I can generate a minimal skeleton you can fill in.  

Once the spec is available, I’ll:

- Validate the contract against
...[truncated]

Must

  • Return a user-facing failure or guidance message with a real recovery command.
  • Distinguish generation failure from execution failure.
  • Show stack traces only when verbose diagnostic mode is requested.

Must Not

  • Crash with an uncaught exception in normal mode.
  • Suggest commands that do not exist.
  • Pretend a missing spec was accepted.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**  

=== AgentWorkforce Status Report ===

🟡 Provider Connectivity
• Google : NOT CONNECTED (no successful health check)
• GitHub : NOT CONNECTED (no successful health check)

✅ Core Services
• Scheduler : RUNNING
• Dispatcher : RUNNING
• Storage : RUNNING

🔧 Configuration
• Active Branches : main
• Last Deploy : 2026‑05‑09 14:32 UTC
• Feature Flags : all defaults

⚠️ Alerts & Blockers
• Provider health checks have not returned a positive result for Google or GitHub.
• Dependent workflows that require these providers are currently blocked.
• Recommended action: verify OAuth tokens / API credentials and re‑run ricky provider check --all.

🗒️ Next Steps

  1. Run ricky provider check google and ricky provider check github to diagnose.
  2. If credentials are missing or expired, update them via ricky provider configure.
  3. Re‑run ricky status to confirm connectivity.

--- End of Report ---

Must

  • Report unknown or not-connected provider state honestly.
  • Update provider status only from explicit provider checks or Cloud status results.
  • Give concrete setup guidance for Cloud when relevant.

Must Not

  • Mark Google or GitHub connected because guidance text was shown.
  • Invent a provider connection URL or OAuth flow.
  • Show empty fields with no recovery guidance when config is missing.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)

User message

Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

Ricky output

**GitHub Primitive‑Change Workflow**

| Stage | Action | Owner | Input | Output | Verification Gate |
|-------|--------|-------|-------|--------|--------------------|
| **1️⃣ Plan** | Create a **Change Request (CR)** in the repository’s `CHANGELOG.md` and tag the issue with `primitive-change`. | Product Owner | Issue description, scope, acceptance criteria | CR ticket (GitHub Issue) | ✅ Issue labeled `primitive-change` and linked to a milestone |
| **2️⃣ Branch** | `git checkout -b primitive/<short‑id>` from `main`. | Developer | CR ticket ID | Feature branch `primitive/<id>` | ✅ Branch name matches pattern `primitive/*` |
| **3️⃣ Code Edit** | Apply the minimal code edit (e.g., bump a constant, rename a variable). | Developer | Local repo, test harness | Modified files | ✅ `git diff` shows only the intended primitive change |
| **4️⃣ Lint/Static Check** | Run `npm run lint` (or language‑specific linter). | CI Bot | Changed files | Lint report | ✅ All lint warnings/errors = 0 |
| **5️⃣ Unit Test** | Execute `npm test` (or equivalent). | CI Bot | Code + test suite | Test report | ✅ All tests pass (≥ 100 % pass) |
| **6️⃣ Deterministic Verification Gate** | Run a **deterministic cont
...[truncated]

Must

  • Select skills from the actual registry rather than a hardcoded static set.
  • Record ranked skill matches with confidence and trigger evidence.
  • Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

  • Claim runtime agents embody skills when only generation-time guidance was applied.
  • Fail generation solely because optional skill files are missing.
  • Hide skill selection evidence from artifacts or JSON output.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow Title:** Claude‑Review → Codex‑Implementation Pipeline  

**Purpose:** Automatically route a feature request through a Claude‑based review stage, then hand‑off the approved design to Codex for code generation, with deterministic verification and clear blocker reporting.

---

## 1. Contract Overview  

| Element | Definition |
|---------|------------|
| **Input** | `FeatureSpec` – JSON object `{id:string, title:string, description:string, acceptanceCriteria:Array<string>}` |
| **Output** | `ImplementationBundle` – ZIP containing generated source files, a `README.md`, and a `VerificationReport.json`. |
| **Success Criteria** | • Claude review returns `reviewStatus: "APPROVED"`.<br>• Codex generates code that passes **all** automated tests derived from `acceptanceCriteria`.<br>• Verification report shows 0 failures. |
| **Failure Modes** | 1. Claude returns `REJECTED` → workflow aborts, blocker logged.<br>2. Codex generation errors or test failures → retry loop (max 3 attempts) then abort with blocker. |
| **SLAs** | Total wall‑clock time ≤ 30 min per request (including retries). |

---

## 2. Workflow Steps (Deterministic Gates)

| Step | Actor | Action | Gate | Artifact 
...[truncated]

Must

  • Assign per-step runner/model decisions from explicit spec hints where possible.
  • Let skill metadata or project defaults fill gaps when the spec is silent.
  • Write tool-selection.json or equivalent audit metadata.

Must Not

  • Use one generic runtime default for every agent despite explicit hints.
  • Let skill defaults override explicit user/spec runner hints.
  • Omit the reason each tool or model was chosen.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

### Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is a deterministic post‑processing flag that takes an existing workflow JSON/YAML and rewrites the **`task.description`** and **`acceptance.gate`** fields for every step. It is meant for the “polish” phase after the core logic has been validated (i.e., after you have passed the 80‑to‑100 validation loops and the workflow is merge‑ready).

---

## 1. When to invoke `--refine`

| Situation | Reason |
|-----------|--------|
| **Initial authoring** | After the first draft passes structural linting (`awf lint`) but before the first review. |
| **Post‑review iteration** | To incorporate reviewer feedback that targets clarity, measurability, or testability of tasks. |
| **Branch‑merge preparation** | Right before opening a PR to guarantee that every step is self‑documenting and has a deterministic gate. |
| **Blocker resolution** | If a blocker is reported as “vague description / non‑deterministic gate”, run `--refine` to auto‑generate concrete language. |

---

## 2. Command Syntax

```bash
awf refine \
  --input  path/to/workflow.yaml \
  --output path/to/workflow.refined.yaml \
  [--style concise|verbose
...[truncated]

Must

  • Keep the deterministic unrefined path as the default.
  • Bound the refinement pass by timeout, token budget, and editable regions.
  • Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

Must Not

  • Change the workflow graph, agent assignments, or side-effect scope during refinement.
  • Fail the whole generation if optional refinement times out.
  • Hide warnings when the deterministic artifact is returned unchanged.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.behavior-grounded-gates (generation-quality/manual)

User message

Generate a workflow for the ricky --version spec.

Ricky output

**Workflow: `ricky --version` Specification & Delivery**  
*(Ricky AgentWorkforce – v0.9.3)*  

---

## 1. Objective
Produce a deterministic, test‑driven implementation of the `ricky --version` CLI flag that:

1. Returns the current Ricky version string (e.g., `ricky 0.9.3`).
2. Exits with status 0 on success, non‑zero on internal error.
3. Is reproducible across all supported platforms (Linux, macOS, Windows).
4. Is covered by unit‑tests, integration‑tests, and CI verification gates.

---

## 2. Scope & Boundaries
| Scope | Included | Excluded |
|-------|----------|----------|
| **Code** | CLI entry point, version constant, help integration, test suite. | Unrelated sub‑commands, packaging scripts, documentation beyond `README.md` version badge. |
| **Branch** | `feature/ricky-version` (new branch from `main`). | Changes to `main` unrelated to version flag. |
| **PR** | Single PR titled `feat: expose ricky version via --version`. | Additional feature work, refactors not directly tied to version flag. |

---

## 3. Contract (Specification)

| Item | Requirement | Acceptance Criteria |
|------|-------------|----------------------|
| **CLI flag** | `ricky --version` (short `-v` option
...[truncated]

Must

  • Build gates from the stated acceptance behavior, such as checking ricky --version output.
  • Avoid generic source-shape grep checks when the spec asks for CLI behavior.
  • Keep generated validation meaningful for the current repo shape.

Must Not

  • Treat grep -Eq 'export|function|class|workflow(' dist/ricky.js as proof of version behavior.
  • Claim the workflow is proven by source syntax alone.
  • Ignore the package-json version resolution order in the spec.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact Validation‑Fix‑Rerun Loop”**  
*Purpose*: Process a large set of independent artifacts (e.g., data files, code modules, model checkpoints) through deterministic validation, automatic fixing where possible, and controlled re‑execution until all artifacts pass or are blocked.  

---  

## 1. Scope & Boundaries  

| Scope Element | Description |
|---------------|-------------|
| **Input** | A *manifest* (JSON/CSV) listing artifact IDs, locations, and type metadata. |
| **Output** | For each artifact: <br>• `status: PASS | FIXED | BLOCKED` <br>• `validation_report` (JSON) <br>• `fixed_artifact` (if applicable) |
| **Branch/PR** | All changes to the manifest or fix scripts are made in a dedicated branch `batch‑loop‑<run‑id>`. A single PR aggregates the final “All‑Pass” commit. |
| **Owner** | `@team‑artifact‑ops` (reviewers: `@qa‑lead`, `@dev‑lead`). |
| **Runtime** | Executed in the CI‑CD pipeline `batch‑artifact‑loop.yml` (GitHub Actions / Azure Pipelines). |

---  

## 2. Contract (YAML)  

```yaml
name: batch-artifact-loop
on:
  workflow_dispatch:
    inputs:
      manifest_path:
        description: Path to manifest file
        required: true
        ty
...[truncated]

Must

  • Choose dag, supervisor, or pipeline deliberately based on the work shape.
  • Explain the pattern choice in artifact metadata or a rationale.
  • Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

  • Default blindly to dag for every workflow.
  • Collapse independent artifact work into one vague agent task.
  • Omit verification gates because the chosen pattern seems obvious.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow  

| Skill | What the skill contributes | Direct impact on the generated workflow | Example of the artifact it produces |
|-------|----------------------------|------------------------------------------|--------------------------------------|
| **Deterministic Contract Definition** | Writes explicit input‑output contracts (JSON schema, versioned API, data‑type constraints). | Guarantees that every step receives a well‑typed payload and that downstream steps can be statically verified. The workflow engine can auto‑generate type‑checking gates before each transition. | ```json { "type": "object", "properties": { "orderId": {"type":"string"}, "amount": {"type":"number"} }, "required": ["orderId"] }``` |
| **Verification Gate Design** | Inserts deterministic validation nodes (e.g., schema validation, checksum, business‑rule predicates). | Prevents “bad data” from propagating, turning silent failures into early, observable blockers. Each gate emits a **GateResult** artifact that downstream steps can query. | `gate: validateOrderPayload → {status: "PASS"| "FAIL", errors: []}` |
| **Review‑Artifact Generation** | Em
...[truncated]

Must

  • Describe skills as generation-time selection, loading, and rendering inputs.
  • Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
  • Say the workflow instructions were informed by selected skills.

Must Not

  • Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
  • Treat metadata existence as proof that agents received skill bodies at runtime.
  • Overstate current tests beyond the generation boundary.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

Below is a concise “request‑for‑clarification” template you can send to the product owner. It follows our **Ricky workflow authoring contract**: it defines the exact inputs we need, the validation gates we’ll apply, and the deliverables we’ll produce once the spec is clarified.

---

## 📄 Request for Specification Details (Workflow‑Authoring Contract)

**Purpose:** Gather the missing information required to generate a deterministic, review‑ready workflow that satisfies the product’s intent.

| # | Required Detail | Why We Need It | Expected Format |
|---|----------------|----------------|-----------------|
| 1 | **Workflow Goal / Business Outcome** | Drives the high‑level success criteria and determines the final verification gate. | One‑sentence statement (e.g., “Publish a user‑generated article after editorial approval”). |
| 2 | **Primary Actors & Roles** | Determines who owns each step and which access controls are needed. | List of roles (e.g., *Author, Reviewer, Publisher*). |
| 3 | **Key Steps (at least 3)** | Defines the deterministic state‑transition graph. | Ordered list with brief description per step. |
| 4 | **Inputs & Outputs per Step** | Enables us to create typed c
...[truncated]

Must

  • Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
  • Validate with dry-run or targeted structural checks where possible.
  • Return artifacts, warnings, and follow-up commands honestly.

Must Not

  • Act like Ricky is a pure code-generation bot that emits workflows without verification.
  • Stop at "code compiles" as the proof bar.
  • Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s immediate response to a failed workflow**

1. **Capture the failure state**  
   - Pull the latest run‑metadata from the orchestration engine (e.g., Airflow, Temporal, or the internal workflow runner).  
   - Record:
     * The step that timed‑out (step‑ID, start‑time, timeout‑limit, error message).  
     * The worker that remained alive (worker‑ID, heartbeat timestamps, last‑known state).  
     * Absence of expected artifacts (artifact‑paths, checksum‑expected vs. actual).  

2. **Mark a deterministic blocker**  
   - Create a **Blocker Ticket** in the workflow‑tracking repo (e.g., `BLOCKER‑<date>-<seq>`).  
   - Tag it with `workflow-failure`, `timeout`, `orphan‑worker`.  
   - Include the captured metadata and a short “impact” statement (e.g., “ downstream jobs cannot start because required artifact X is missing”).  

3. **Isolate the run**  
   - Abort the current run via the orchestration API (if not already terminated).  
   - Force‑kill the orphaned worker process (using the worker‑ID and the platform’s termination command).  
   - Ensure the worker’s container/pod is removed so it cannot produce stray artifacts later.

4. **Generate a repair contract**  
   - Op
...[truncated]

Must

  • Classify the failure before deciding whether to retry.
  • Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
  • Preserve raw evidence and uncertainty when the class is not known.

Must Not

  • Blindly rerun the whole workflow without a blocker classification.
  • Treat every failure as a broken workflow definition.
  • Claim Ricky fixed the workflow before rerun evidence proves it.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow: Pre‑Launch Stale‑State Guard for Local AgentWorkforce Runs**  

---

### 1. Purpose
Detect and abort a local workflow launch when any of the following runtime‑state directories contain stale data:

| Directory | Meaning |
|-----------|---------|
| `.agent-relay/` | Cached relay messages and pending ACKs |
| `.relay/`       | Transport‑layer buffers for inter‑agent communication |
| `.trajectories/`| Persisted execution traces and state snapshots |

Stale data can cause nondeterministic behavior, replay of old messages, or corrupted trajectory logs. This guard ensures a clean start for every local run.

---

### 2. Scope & Boundaries
- **Branch**: `feature/prelaunch-stale‑guard` (or a hot‑fix branch `bugfix/clean‑state‑guard`).
- **PR Target**: `main`.
- **Affected Packages**: `@agentworkforce/cli`, `@agentworkforce/runtime`.
- **No changes** to production agents or external services; only local bootstrap logic.

---

### 3. Contract (Input → Output)

| Input | Description |
|-------|-------------|
| `process.cwd()` | Root of the local workflow project. |
| `options.clean?: boolean` | Optional flag to auto‑clean stale dirs (`true` = delete, `false` = abort). |
| `options
...[truncated]

Must

  • Classify stale local runtime state as an environment contamination issue.
  • Recommend quarantine or isolated-run guidance before launch.
  • Record the observed paths and the action taken or recommended.

Must Not

  • Treat stale runtime state as a workflow logic failure.
  • Delete or overwrite state without an explicit safe path or user intent.
  • Continue into execution as if the workspace were clean.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**Fixing “Another Ricky or Relay run is already active in this workspace”**

The marker is a lock file that prevents two concurrent executions from stepping on each other. When it is left behind (crash, stray process, or stale workspace) you’ll see the error you described. Follow the deterministic repair contract below.

---

## 1️⃣ Verify the current state

| Step | Command | Expected outcome |
|------|---------|------------------|
| 1.1 | `ricky status` | Should show **RUNNING** with a PID, or **IDLE** with a stale lock. |
| 1.2 | `ps -ef \| grep ricky` (or `tasklist /FI "IMAGENAME eq ricky.exe"` on Windows) | List any live Ricky/Relay processes. |
| 1.3 | `cat .ricky/run/lock.json` (or `.relay/run/lock.json`) | Shows JSON `{ "pid": <num>, "started": "<ISO‑ts>" }`. |

If **status** reports *RUNNING* **and** a PID exists in the lock file **and** the PID is alive → the run is genuine. If the PID is missing or the process is dead, the lock is stale.

---

## 2️⃣ Resolve a **genuine** active run

If the lock is valid:

1. **Let the run finish** – most CI pipelines clean up automatically.  
2. If you need to abort, use the official gate:  

   ```bash
   ricky abort --pid <pid-from-lo
...[truncated]

Must

  • Report the active marker, run id, or status path when available.
  • Ask the user to inspect, wait for, or explicitly clear the active run.
  • Avoid launching a competing run that could corrupt evidence.

Must Not

  • Silently start another run.
  • Hide the existing run marker from the user.
  • Treat the conflict as a generic failure with no recovery path.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.env-loader-injection-runtime-loadable (runtime-recovery/manual)

User message

A workflow artifact references a MISSING_ENV_VAR value. Ricky's deterministic auto-fix injects the .env.local / .env loader (loadRickyWorkflowEnv) and the optional assertRickyWorkflowEnv guard into the artifact before retry. The artifact may be a master-rendered workflow whose .step({ command: ... }) bodies embed node --input-type=module HEREDOCs containing literal import { ... } from 'node:fs' / from 'node:path' strings.

Ricky output

**Deterministic auto‑fix for missing environment variables**

Below is the concrete, reproducible patch you can apply to any workflow artifact that currently fails because it references a `MISSING_ENV_VAR`. The fix does three things:

1. **Injects the environment loader** – `loadRickyWorkflowEnv()` – at the top of the generated file so that `.env.local` (or fallback `.env`) is parsed before any step runs.  
2. **Adds an optional guard** – `assertRickyWorkflowEnv()` – that throws a clear error if the required variable is still undefined after loading.  
3. **Wraps any HEREDOC‑embedded `node --input-type=module` command** so that the loader/guard are available inside the temporary script that the step executes.

---

## 1. Shared helper module (`ricky-workflow-env.ts`)

Create (or update) a tiny utility file that lives next to your workflow source (e.g., `src/ricky-workflow-env.ts`). This file is **purely deterministic** – no side‑effects other than reading the env files.

```ts
// src/ricky-workflow-env.ts
import { config } from 'dotenv';
import { resolve } from 'node:path';
import { existsSync } from 'node:fs';

/**
 * Load environment variables from .env.local (preferred) or .env.
...[truncated]

Must

  • Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected loadRickyWorkflowEnv body references rickyWorkflowFs.* and rickyWorkflowPath.*, so the repair must also add the corresponding import * as rickyWorkflowFs from 'node:fs' and import * as rickyWorkflowPath from 'node:path' aliases at module top level.
  • Detect existing alias imports by matching real top-level import * as <alias> from '<module>' statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in .step({ command: ... }) bodies do not count as imports).
  • Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

Must Not

  • Skip adding the rickyWorkflowFs / rickyWorkflowPath aliases because from 'node:fs' or from 'node:path' already appears somewhere in the file as a string literal.
  • Inject loadRickyWorkflowEnv (or assertRickyWorkflowEnv) without the supporting alias imports, which produces a ReferenceError: rickyWorkflowPath is not defined at module load and burns the auto-fix budget on UNSUPPORTED_RUNTIME at runtime-launch.
  • Rewrite or escape the embedded HEREDOC text in step commands.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Run**

Below is a deterministic, contract‑driven procedure you can copy‑paste into your terminal (or CI step) to run the workflow, detect the failure, invoke the auto‑fix routine, and re‑run the repaired artifact. All verification gates, review artifacts, and loop limits are spelled out so you can audit each stage.

---

## 1️⃣ Prerequisites (contract)

| Item | Requirement | How to verify |
|------|-------------|---------------|
| `workflow.yaml` | Exists in the current repo root | `test -f workflow.yaml && echo OK` |
| `aw-cli` (AgentWorkforce CLI) | Version ≥ 2.4.0, auto‑fix flag supported | `aw-cli --version` |
| Docker (or the runtime specified in `workflow.yaml`) | Running & reachable | `docker ps` |
| Previous run ID (`PREV_RUN_ID`) | Exported env var or passed as arg | `echo $PREV_RUN_ID` |
| Failed step name (`FAILED_STEP`) | Exported env var or passed as arg | `echo $FAILED_STEP` |

If any check fails, abort and fix the prerequisite before proceeding.

---

## 2️⃣ Execution Contract

```bash
#!/usr/bin/env bash
set -euo pipefail

# ---- INPUTS -------------------------------------------------
PREV_RUN_ID="${PREV_RUN_ID:-}"          # 
...[truncated]

Must

  • Use a bounded retry budget and summarize every attempt.
  • Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
  • Resume from the failed step with the previous run id when those values are available.

Must Not

  • Edit arbitrary repository source files as the default auto-fix surface.
  • Keep retrying after the configured max attempts.
  • Lose the single Ricky tracking run id across repair/resume attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Immediate troubleshooting checklist**

| Step | Action | Expected outcome |
|------|--------|-------------------|
| 1️⃣ | **Collect the run log**<br>`ricky logs workflows/foo.ts --last` | A full JSON‑Lined log file (`foo.run.log`) appears in `./ricky/logs/`. |
| 2️⃣ | **Identify the failure gate**<br>Open the log and look for a line with `"gate":"<name>"` and `"status":"FAIL"` | The gate name tells you which contract validation failed (e.g., `input-schema`, `resource‑availability`, `type‑check`). |
| 3️⃣ | **Inspect the error payload**<br>Copy the `"error"` field from the failing gate entry | Gives the exact message (e.g., “Missing required field `payload.id`”, “Circular dependency detected”). |
| 4️⃣ | **Run the workflow in dry‑run mode**<br>`ricky run workflows/foo.ts --dry-run` | The engine will stop before any side‑effects and print the same validation sequence, confirming whether the failure is deterministic. |
| 5️⃣ | **Validate the contract locally**<br>`ricky verify contracts/foo.contract.ts` | Returns `PASS`/`FAIL` for each contract clause. If it fails, the contract itself is broken. |
| 6️⃣ | **Check for recent PR merges**<br>`git log -n 10 --oneline -- workflows/foo.ts
...[truncated]

Must

  • Preserve one-attempt behavior when auto-fix is disabled.
  • Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
  • Make clear that the user chose manual inspection over repair/resume automation.

Must Not

  • Start a repair loop despite --no-auto-fix.
  • Suppress the diagnosis because no repair was attempted.
  • Present the failure as a completed repair attempt.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)

User message

Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

Ricky output

**Ricky – Local Execution of a TypeScript Workflow Artifact**  
*(primary local path = `./workflows` in the repository root)*  

---

## 1. Preconditions (Contract)

| Item | Requirement | Verification |
|------|-------------|--------------|
| **Repo layout** | `./workflows/<workflow‑name>/` contains `index.ts` (entry point) and a `workflow.yaml` manifest. | `ls ./workflows/*/index.ts` must succeed. |
| **Node environment** | Node ≥ 18, `npm`/`pnpm`/`yarn` installed, and `ts-node` available globally or as a dev‑dependency. | `node -v && ts-node -v` returns versions. |
| **Dependencies** | All `package.json` deps for the workflow are installed (`npm ci` or `pnpm install`). | `npm ls` exits 0. |
| **TypeScript config** | A `tsconfig.json` exists at repo root or inside the workflow folder and includes `"moduleResolution": "node"` and `"esModuleInterop": true`. | `tsc --noEmit` passes. |
| **Execution guard** | No uncommitted changes in the workflow folder (to guarantee reproducibility). | `git status --porcelain ./workflows/<wf>` returns empty. |

If any verification fails, **Ricky reports a blocker** with a concise error and aborts the run.

---

## 2. Deterministic Execution Gate

1
...[truncated]

Must

  • Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
  • Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
  • Record the actual spawn command in execution evidence.

Must Not

  • Fail solely because agent-relay is not on PATH when the SDK route is available.
  • Hide the actual runtime command from evidence.
  • Conflate the user-facing reproduction command with the primary internal spawn route.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky’s Boundary‑Handling Protocol for a Persistent Structural Failure**

When a structural failure remains after an initial remediation attempt, Ricky follows a deterministic, contract‑driven workflow that isolates the problem, prevents downstream impact, and creates a clear hand‑off for human owners. The protocol is split into four phases: **Detect → Attempt → Verify → Escalate**. Each phase is bounded by explicit contracts and verification gates that guarantee repeatability and traceability.

---

## 1. Detect – Failure Capture & Contract Assertion  

| Step | Action | Contract |
|------|--------|----------|
| 1.1 | Pull the latest **failure event** from the monitoring bus (e.g., `structural_failure_v2` topic). | Event must contain: `failure_id`, `timestamp`, `affected_components[]`, `initial_root_cause` (if any), `attempted_fix_id` (optional). |
| 1.2 | Log the event to the **Failure Ledger** (`failures/ledger.yaml`) with a unique `failure_id`. | Ledger entry must be **append‑only** and include a SHA‑256 hash of the raw event for immutability. |
| 1.3 | Emit a **Detection Gate** (`detect_gate_{failure_id}`) that other agents can subscribe to. | Gate payload = `{status: "DETEC
...[truncated]

Must

  • Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
  • Distinguish escalation from a generic product failure.
  • Preserve enough context for a human operator to continue.

Must Not

  • Retry speculative fixes indefinitely.
  • Collapse to "something went wrong" without the attempted actions.
  • Discard evidence from failed repair attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Summary (last 30 days)**  

| Workflow | Runs (30 d) | Success % | Avg Dur (s) | Failure Reasons (top 3) | SLA Met % | Last Review |
|----------|-------------|----------|-------------|--------------------------|----------|-------------|
| **wf‑data‑ingest** | 124 | 96.8 % | 42 | 1️⃣ Schema‑mismatch (7) 2️⃣ Timeout (3) 3️⃣ Auth‑token‑expired (2) | 99 % (≤ 45 s) | 2026‑04‑28 (peer‑review) |
| **wf‑model‑train** | 58 | 91.4 % | 3120 | 1️⃣ GPU‑out‑of‑memory (9) 2️⃣ Data‑leak (4) 3️⃣ Config‑validation‑fail (2) | 95 % (≤ 3600 s) | 2026‑04‑15 (auto‑gate) |
| **wf‑report‑gen** | 210 | 98.6 % | 18 | 1️⃣ PDF‑render‑error (3) 2️⃣ Missing‑locale (1) | 100 % (≤ 30 s) | 2026‑04‑30 (peer‑review) |
| **wf‑notify‑slack** | 342 | 99.4 % | 4 | 1️⃣ Rate‑limit (2) | 100 % (≤ 5 s) | 2026‑04‑27 (auto‑gate) |
| **wf‑cleanup‑tmp** | 87 | 93.1 % | 27 | 1️⃣ File‑lock (5) 2️⃣ Permission‑denied (2) | 97 % (≤ 30 s) | 2026‑04‑20 (peer‑review) |

---

### 1. Key Health Indicators  

| Indicator | Target | Current | Status |
|-----------|--------|---------|--------|
| **Overall Success Rate** | ≥ 95 % | **96.2 %** | ✅ |
| **Mean Time To Recovery (MTTR)** | ≤ 5 min | **3 min 42 s** 
...[truncated]

Must

  • Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
  • Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
  • Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

  • Mutate evidence while analyzing it.
  • Return generic advice like "improve your workflow" without references.
  • Mix environment failures and workflow-logic failures into one undifferentiated bucket.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

**Ricky Slack Surface – “Workflow Debugger”**  
*Version 0.1 – Design Draft*  

---

## 1. Overview
The **Workflow Debugger** Slack surface gives engineers a deterministic, low‑latency channel to inspect, step‑through, and repair Ricky‑managed workflows directly from Slack. It follows the Ricky repository conventions (AGENTS.md) and the standard workflow lifecycle:

1. **Trigger** – Slack command or button press.  
2. **Validate** – Contract check (payload schema, auth).  
3. **Execute** – Invoke the debug engine (read‑only or interactive).  
4. **Report** – Structured response (status, logs, actionable suggestions).  
5. **Close** – Optional PR creation or ticket filing.

All interactions are **idempotent** and **audit‑logged** for compliance.

---

## 2. Public Contract (Slack App Manifest)

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `command` | `string` | ✅ | `/ricky-debug` – entry point. |
| `subcommand` | `enum` | ✅ | `inspect`, `step`, `patch`, `status`. |
| `workflow_id` | `uuid` | ✅ (except `status`) | Target workflow definition. |
| `run_id` | `uuid` | ❌ (optional) | Specific execution to debug; if omitted, latest run is used. |
|
...[truncated]

Must

  • Treat Slack as a co-equal product surface, not a wrapper around CLI.
  • Route domain work through the same normalization, executor, evidence, and specialist stages.
  • Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.

Must Not

  • Put workflow generation or diagnosis domain logic directly in the Slack handler.
  • Degrade Slack to a developer shortcut with weaker routing than CLI.
  • Skip signature verification, dedup, or thread handling when the surface is implemented.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)

User message

Add a new web handoff type that submits a workflow spec and mode.

Ricky output

**New Hand‑off Type:** `web_submit_spec`

| Field | Type | Description |
|-------|------|-------------|
| **type** | `string` (enum) | Fixed value `"web_submit_spec"` – identifies this hand‑off. |
| **payload** | `object` | Container for the data the UI sends. |
| **payload.spec** | `object` | Full workflow specification (JSON‑Schema‑validated). |
| **payload.mode** | `string` (enum) | `"draft"` | `"publish"` | `"test"` – determines how the spec is processed downstream. |
| **payload.metadata** *(optional)* | `object` | Free‑form key/value pairs (e.g., `requestId`, `userId`). |
| **response** | `object` | Returned to the UI after processing. |
| **response.status** | `string` (enum) | `"accepted"` | `"rejected"` | `"error"` |
| **response.message** | `string` | Human‑readable explanation of the result. |
| **response.errors** *(optional)* | `array<object>` | Validation or processing errors, each with `path` and `detail`. |

---

## 1. Contract Definition (JSON‑Schema)

```json
{
  "$id": "urn:agentworkforce:handoff:web_submit_spec",
  "type": "object",
  "required": ["type", "payload"],
  "properties": {
    "type": { "const": "web_submit_spec" },
    "payload": {
      "type": "ob
...[truncated]

Must

  • Add a handoff variant that normalizes into LocalInvocationRequest or CloudGenerateRequest.
  • Keep auth/session validation in the web surface before normalization.
  • Ensure downstream code does not need to know the request came from web.

Must Not

  • Short-circuit directly from the web handler to executors.
  • Create a parallel domain model for web requests.
  • Make the web surface the owner of local or Cloud routing semantics.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.mcp-claude-context-is-metadata (surfaces-ingress/manual)

User message

A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordinate.

Ricky output

**Decision Matrix for the Requested Action**

| **Input Condition** | **Recommended Action** | **Rationale** |


---

_Comment truncated to stay within GitHub limits. Download the `ricky-eval-run` artifact for the full `human-review.md`._

…// concat

CodeRabbit + Devin both flagged the bug correctly: the previous
subpath branch did

    return nextResolve(sdkRootUrl + "/" + "github", context);

which passes a fully-qualified file:// URL to nextResolve. Per node's
ESM resolution algorithm (Node docs `esm.html` + `packages.html`),
file:// URLs are treated as already-resolved and Node SKIPS package-
exports lookup. So `@agent-relay/sdk/github` would try to load a
literal file at `<sdkRoot>/github` instead of going through the SDK's
exports map (which maps `./github` → `./dist/github.js`). Verified
against actual @agent-relay/sdk@6.0.13.

The right shape: pass the original BARE specifier to nextResolve
with `parentURL` anchored inside the target package root. Node then
walks up looking for node_modules, finds the bundled package, and
applies its exports map.

Behavior after this commit (the three exact-match branches were
already correct and stay):

  • `@agent-relay/sdk/workflows` → bundled (shortCircuit)
  • `@agent-relay/sdk` → bundled root index
  • `@agent-relay/sdk/<anything>` → re-resolved via parentURL anchor
    so the SDK's exports map handles the subpath
  • `@agent-relay/config` → bundled config index
  • `@agent-relay/config/<anything>` → re-resolved via config
    parentURL anchor

Adds a regression test that imports a non-`workflows` SDK subpath
(`@agent-relay/sdk/github`) AND two `@agent-relay/config` subpaths
(`/relay-config`, `/agent-config`) from a workflow file and asserts
the run succeeds without consumer-repo node_modules. Test fails
against the previous file-URL-concat implementation; passes against
this one.

114/114 src/local/entrypoint.test.ts pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@khaliqgant
Copy link
Copy Markdown
Member Author

Pushed 2180df3 addressing the CodeRabbit + Devin findings. Both bots correctly flagged that the previous subpath branch did:

return nextResolve(sdkRootUrl + "/" + "github", context);  // file:// URL

Per node's ESM resolution algorithm, file:// URLs are treated as already-resolved and the package's exports map is skipped. So @agent-relay/sdk/github would try to load a literal file at <sdkRoot>/github instead of going through the SDK's exports map (which maps ./github./dist/github.js). Verified against the actual @agent-relay/sdk@6.0.13 package.

Fix

Pass the original bare specifier with parentURL anchored inside the bundled package, so node walks up looking for node_modules, finds the bundled @agent-relay/sdk (or /config), and applies its exports map:

if (specifier.startsWith(SDK_SUBPATH_PREFIX)) {
  return nextResolve(specifier, { ...context, parentURL: sdkParentUrl });
}

Regression test

Added runLocal > resolves @agent-relay/sdk subpaths and @agent-relay/config via the bundled package — imports a non-workflows SDK subpath (@agent-relay/sdk/github) plus two config subpaths (/relay-config, /agent-config) from a workflow file with no consumer-repo node_modules, and asserts the run succeeds. Fails against the previous file-URL-concat implementation; passes against this one.

114/114 in src/local/entrypoint.test.ts passing.

@khaliqgant khaliqgant merged commit 28361a4 into main May 10, 2026
2 checks passed
@khaliqgant khaliqgant deleted the fix/sdk-loader-subpaths branch May 10, 2026 17:56
khaliqgant added a commit that referenced this pull request May 11, 2026
Pulls in agent-relay/relay#838 ("Drain broker stdout after SDK startup")
which is the upstream/root fix for the broker pipe-buffer deadlock that
wedged overnight runs of the proactive-runtime workflows.

Reproducible symptom prior to this bump: after M1 fans out to 9 PTY
workers, every worker's log goes silent within seconds of each other,
broker process parks in `write()` (or `_pthread_cond_wait` once enough
events queue), and the workflow node's `step.run` awaits a drain signal
that never arrives. Two diagnostic captures (~/wedge-92b45d3e-diag/ and
~/wedge-c9600674-diag/, ~14h apart on independent runs) showed the same
freeze-at-fanout shape.

Ricky already shipped the loader-level unblocker in #94. This bump
brings in the upstream SDK fix so direct SDK consumers — including
Ricky-driven workflows whose runtime SDK is the bundled copy under
Ricky's node_modules (per #92's loader redirect) — pick up the proper
drain.

Verification
- npm run typecheck — clean
- npm run build — bundles cleanly, externals unchanged
- npm test — 1075 / 1075 pass, including the existing
  "drains broker stdout after SDK startup so event floods cannot wedge
  the workflow node" regression in src/local/entrypoint.test.ts

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant