fix(auto-fix): don't repair non-executable --spec-file paths as workflows by khaliqgant · Pull Request #111 · AgentWorkforce/ricky

khaliqgant · 2026-05-15T15:20:12Z

Summary

resolveArtifactPath in auto-fix-loop.ts used to fall back to request.specPath unconditionally. For CLI invocations like ricky --spec-file foo.md, that pointed at the source markdown spec — not an executable workflow. When intake-stage generation failed (e.g. unresolved clarification questions in the spec), auto-fix handed the markdown to the workflow repairer, re-fed the result as source: 'workflow-artifact', and on the next attempt the natural-language classifier misrouted the spec body to debug (triggered by words like "logs", "stderr", "evidence" in the spec). The loop ran the full maxAttempts (7×) producing the misleading message routing: Debug routing requires failed-run evidence, logs, or a run identifier.
Gate the request.specPath fallback on isExecutableWorkflowPath (exported from entrypoint.ts). When generation fails before producing an artifact and no executable workflow path exists, auto-fix now bails on attempt 1 with the original failure reason.
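
The gated fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in auto-fix-loop.ts: the `SketchRequest` and `SketchResponse` shapes, the `resolveArtifactPathSketch` name, and the stand-in predicate are all assumptions made for the example, and the real `resolveArtifactPath` signature may differ.

```typescript
// Hypothetical shapes for illustration; the real types in auto-fix-loop.ts may differ.
interface SketchRequest {
  specPath?: string;
}

interface SketchResponse {
  artifactPath?: string;
}

type PathPredicate = (path: string) => boolean;

function resolveArtifactPathSketch(
  request: SketchRequest,
  response: SketchResponse,
  isExecutableWorkflowPath: PathPredicate,
): string | undefined {
  // Prefer an artifact that generation actually produced.
  if (response.artifactPath) {
    return response.artifactPath;
  }
  // Fall back to the spec path only when it names an executable workflow.
  const { specPath } = request;
  if (specPath && isExecutableWorkflowPath(specPath)) {
    return specPath;
  }
  // A markdown spec (or no path at all) is not a repair target.
  return undefined;
}

// Stand-in predicate for the example only; it is not the exported helper.
const endsWithWorkflowTs: PathPredicate = (path) => path.endsWith('.workflow.ts');
```

With the stand-in predicate, `resolveArtifactPathSketch({ specPath: 'docs/foo.md' }, {}, endsWithWorkflowTs)` yields `undefined`, which is the signal the loop needs to stop instead of repairing the markdown.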

Repro (before)

$ ricky --mode local --spec-file docs/some-spec-with-open-questions.md --run
Generation: failed (status: error).
Reason: routing: Debug routing requires failed-run evidence, logs, or a run identifier.
Auto-fix: errored after 7/7 attempt(s)

After

Auto-fix bails on attempt 1 surfacing the actual intake failure (e.g. routing: Spec has unresolved workflow authoring questions...) instead of looping.
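
To make the early bail concrete, here is a simplified, synchronous sketch of a bounded retry loop. It assumes hypothetical names (`runWithAutoFixSketch`, `AttemptOutcome`, `LoopResult`) and a much smaller contract than the real `runWithAutoFix`; it only shows how a missing repair target ends the loop on the first attempt with the original reason.

```typescript
// Hypothetical, reduced types; the real LocalResponse carries far more fields.
type AttemptOutcome = { ok: boolean; artifactPath?: string; reason?: string };
type LoopResult = { status: 'ok' | 'error'; attempts: number; reason?: string };

function runWithAutoFixSketch(
  runAttempt: (artifactPath?: string) => AttemptOutcome,
  repair: (artifactPath: string) => string,
  maxAttempts: number,
  initialArtifactPath?: string,
): LoopResult {
  let artifactPath = initialArtifactPath;
  let lastReason: string | undefined;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const outcome = runAttempt(artifactPath);
    if (outcome.ok) {
      return { status: 'ok', attempts: attempt };
    }
    lastReason = outcome.reason;

    // Repair needs an executable artifact; a bare spec gives us none.
    const target = outcome.artifactPath ?? artifactPath;
    if (!target) {
      // Stop immediately and surface the original failure reason.
      return { status: 'error', attempts: attempt, reason: outcome.reason };
    }
    artifactPath = repair(target);
  }

  // Every attempt failed even though there was something to repair.
  return { status: 'error', attempts: maxAttempts, reason: lastReason };
}
```

In this sketch an intake failure with no artifact returns after one attempt, while a failure that does carry an artifact path still uses the full attempt budget.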

Test plan

New regression test "does not treat a non-executable --spec-file path as a workflow artifact to repair" in src/local/auto-fix-loop.test.ts
npx vitest run src/local/auto-fix-loop.test.ts — 36/36 pass
npx vitest run src/local/entrypoint.test.ts — 118/118 pass
npx tsc --noEmit clean

🤖 Generated with Claude Code

…lows

When `ricky --spec-file foo.md` fails at the intake stage (e.g. unresolved clarification questions), the auto-fix loop used to fall back to `request.specPath` in `resolveArtifactPath` and hand the source markdown spec to the workflow repairer. The repaired markdown was then re-fed as `source: 'workflow-artifact'`, losing the CLI spec-file routing in `toRawSpecPayload`. The next attempt parsed the spec body as natural language; failure-vocabulary keywords ("logs", "stderr", "evidence") misrouted it to debug, which failed with the misleading "Debug routing requires failed-run evidence" message, repeated up to maxAttempts (7×) before erroring.

Gate the `request.specPath` fallback on `isExecutableWorkflowPath` so only paths that actually name a workflow (`workflows/**/*.{ts,js}` or `*.workflow.{ts,js,yaml,yml}`) qualify. When generation fails before producing an artifact and no executable path exists, auto-fix bails on attempt 1 with the original failure reason instead of looping.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
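
The two path patterns named in the commit message can be approximated with a pair of regular expressions. This is a sketch under assumptions, not the helper exported from entrypoint.ts: the function name `looksLikeExecutableWorkflowPath` is hypothetical, and whether the real `workflows/**` glob is anchored at the repository root is not stated, so the sketch accepts a `workflows/` segment anywhere in the path.

```typescript
// Approximates `workflows/**/*.{ts,js}`: a `workflows/` segment followed by a .ts or .js file.
const WORKFLOW_DIR_FILE = /(^|\/)workflows\/(.+\/)?[^/]+\.(ts|js)$/;

// Approximates `*.workflow.{ts,js,yaml,yml}`: any file with a `.workflow.<ext>` suffix.
const WORKFLOW_SUFFIX_FILE = /\.workflow\.(ts|js|yaml|yml)$/;

// Hypothetical stand-in for the exported helper; the real implementation may differ.
function looksLikeExecutableWorkflowPath(path: string): boolean {
  // Normalize Windows separators so both patterns can assume forward slashes.
  const normalized = path.replace(/\\/g, '/');
  return WORKFLOW_DIR_FILE.test(normalized) || WORKFLOW_SUFFIX_FILE.test(normalized);
}
```

Under these patterns a source spec such as `docs/some-spec-with-open-questions.md` is rejected, while `workflows/release.ts` and `ci/build.workflow.yaml` are accepted.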

coderabbitai · 2026-05-15T15:20:24Z

📝 Walkthrough

This PR prevents the auto-fix workflow repair system from attempting to repair markdown or other non-executable spec files by exporting an executable-path validator, using it to guard artifact path resolution, and adding regression test coverage.

Changes

Non-executable spec auto-fix prevention

| Layer / File(s) | Summary |
|-----------------|---------|
| Export executable workflow path validator `src/local/entrypoint.ts` | `isExecutableWorkflowPath` is exported so other modules can determine whether a path points to an executable workflow. |
| Guard auto-fix artifact resolution for non-executable specs `src/local/auto-fix-loop.ts` | Import and use `isExecutableWorkflowPath` in `resolveArtifactPath` to prevent non-executable `--spec-file` paths (e.g., markdown docs) from being treated as repair targets; the fallback now returns `undefined` if the spec path is not executable. |
| Regression test for non-executable spec handling `src/local/auto-fix-loop.test.ts` | Add regression test verifying that `runWithAutoFix` skips workflow repair when given a non-executable markdown spec file; introduces `generationOnlyFailureResponse()` helper for generation-stage failure responses. |

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

AgentWorkforce/ricky#56 (fix(spec-intake): pin --spec-file handoffs to generate intent): Both PRs hinge on detecting when --spec-file resolves to a non-executable path via isExecutableWorkflowPath. #56 changes toRawSpecPayload() routing to intent: 'generate', while this PR exports that helper and uses it in resolveArtifactPath to prevent auto-fix/workflow repair from being triggered for non-executable spec inputs.

Poem

🐰 A markdown spec walked in,
Auto-fix said "not my kin!"
Executable paths pass through,
Repos saved from mis-repair too.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title accurately and concisely describes the main fix: preventing non-executable --spec-file paths from being treated as workflow artifacts for repair. |
| Description check | ✅ Passed | The description thoroughly explains the bug, the fix, the rationale, and includes a test plan showing how the changes were validated. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/local/auto-fix-loop.test.ts`:
- Around line 1098-1112: The helper generationOnlyFailureResponse models a
generation/intake failure (generation.status 'needs_clarification') but sets a
blocker exit code (exitCode: 2); change the exitCode in
generationOnlyFailureResponse to the non-blocker failure value used for real
generation failures (e.g., exitCode: 1) so tests accurately reflect
generation-only failures.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: ef1dfa1f-3f86-4bac-b59b-091f66fb81f1

📥 Commits

Reviewing files that changed from the base of the PR and between 31f002f and 8333cf5.

📒 Files selected for processing (3)

src/local/auto-fix-loop.test.ts
src/local/auto-fix-loop.ts
src/local/entrypoint.ts

coderabbitai · 2026-05-15T15:22:29Z

+function generationOnlyFailureResponse(): LocalResponse {
+  return {
+    ok: false,
+    artifacts: [],
+    logs: [],
+    warnings: ['routing: Spec has unresolved workflow authoring questions'],
+    nextActions: ['Clarify the local workflow request and retry.'],
+    generation: {
+      stage: 'generate',
+      status: 'needs_clarification',
+      error: 'routing: Spec has unresolved workflow authoring questions',
+    },
+    exitCode: 2,
+  };
+}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align generation-only fixture exit code with real generation failures.

Line 1110 sets exitCode: 2, but this helper models a generation/intake failure (status: 'needs_clarification'). Using a blocker exit code here can hide regressions in auto-fix behavior for non-runtime failures.

Suggested fix

 function generationOnlyFailureResponse(): LocalResponse {
   return {
@@
-    exitCode: 2,
+    exitCode: 1,
   };
 }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-function generationOnlyFailureResponse(): LocalResponse {
-  return {
-    ok: false,
-    artifacts: [],
-    logs: [],
-    warnings: ['routing: Spec has unresolved workflow authoring questions'],
-    nextActions: ['Clarify the local workflow request and retry.'],
-    generation: {
-      stage: 'generate',
-      status: 'needs_clarification',
-      error: 'routing: Spec has unresolved workflow authoring questions',
-    },
-    exitCode: 2,
-  };
-}
+function generationOnlyFailureResponse(): LocalResponse {
+  return {
+    ok: false,
+    artifacts: [],
+    logs: [],
+    warnings: ['routing: Spec has unresolved workflow authoring questions'],
+    nextActions: ['Clarify the local workflow request and retry.'],
+    generation: {
+      stage: 'generate',
+      status: 'needs_clarification',
+      error: 'routing: Spec has unresolved workflow authoring questions',
+    },
+    exitCode: 1,
+  };
+}


cubic-dev-ai

No issues found across 3 files


devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.

github-actions · 2026-05-15T16:16:14Z

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-15T15-20-52-729Z-HEAD-provider
Mode: provider
Git SHA: 939fe88

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Ricky – AgentWorkforce Workflow Reliability Assistant**  
*Update: Deep integration with the Agent Assistant framework*  

---  

## 1. Scope of Change  

| Area | What is changed | Why |
|------|----------------|-----|
| **README.md** | Add a “Deep Agent Assistant Integration” section describing the new `agent_assistant` service calls, configuration flags, and runtime expectations. | Makes the contract visible to downstream developers and ops teams. |
| **src/ricky/__init__.py** | Export `AgentAssistantClient` and expose a new `use_agent_assistant` flag in `RickyConfig`. | Provides a deterministic entry‑point for the deeper integration. |
| **src/ricky/core.py** | Refactor `execute_workflow` to route *verification* and *recovery* steps through `AgentAssistantClient.run_step`. | Aligns execution path with the Agent Assistant orchestration model. |
| **tests/** | Add 12 new unit tests covering the new client injection, flag handling, and fallback behavior. | Guarantees 80‑100 % validation loops for the new path. |
| **docs/architecture.md** | Update the component diagram and narrative to show the bidirectional data flow between Ricky and Agent Assistant. | Keeps architecture docs 
...[truncated]

Must

Ground claims in real package imports and runtime paths.
Distinguish current implementation from target architecture.
Identify which Agent Assistant primitive is actually exercised.

Must Not

Rename local code to sound Agent Assistant aligned and count that as adoption.
Claim broad Agent Assistant native behavior from documentation-only alignment.
Blur target architecture with landed behavior.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review (as of 2026‑05‑15)**  

| Metric | Current State | Target / SLA | Gap / Action |
|--------|---------------|--------------|--------------|
| **Workflow contracts defined** | 12 of 18 active agents have a formal `turn‑context` contract (JSON‑Schema v1.2). | 100 % | Draft contracts for the remaining 6 agents (e.g., *Ticket‑Triage*, *Knowledge‑Base‑Search*, *Escalation‑Router*). |
| **Deterministic verification gates** | 8 agents run a pre‑merge gate that validates `turn‑context` payloads against the schema; 4 rely on runtime asserts only. | All agents must pass a CI gate before PR merge. | Add CI gate to the 4 agents; update `ricky-ci.yml` with `ricky/verify-turn-context`. |
| **Review artifacts** | 65 % of PRs include a “Turn‑Context Impact” section; 35 % lack it. | 100 % compliance. | Enforce via PR template (`.github/PULL_REQUEST_TEMPLATE.md`) and a required review checklist. |
| **Validation loops** | Average of 42 validation loops per release cycle (target 80‑100). | 80‑100 loops to ensure edge‑case coverage. | Introduce automated fuzzing of `turn‑context` fields in the CI pipeline; schedule nightly regression runs. |
| **Bloc
...[truncated]

Must

Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
Record compact provenance through generation decisions or coordinator metadata.
Keep the shared turn context internal to the adapter boundary.

Must Not

Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
Drop Ricky-specific workflow metadata during envelope assembly.
Treat turn context as a product decision engine.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation: Move the listed capabilities into the Agent Assistant**

| Capability | Fit for Agent Assistant | Rationale |
|------------|------------------------|-----------|
| **Workflow generation** | ✅ | The Assistant already owns the “author‑as‑you‑go” contract (see `workflow/generator/`). Adding generation here centralises the source of truth and lets the Assistant emit concrete **workflow contracts** (YAML/JSON schema) that downstream validators can consume. |
| **Workflow validation** | ✅ | Validation is a deterministic gate that the Assistant can run automatically after each generation or edit. By exposing a `validate()` API that returns a **structured validation report** (pass/fail, error codes, line numbers), the Assistant becomes the single source of truth for “is this workflow ready to be merged?”. |
| **Workflow debugging** | ✅ | Debugging is essentially a **feedback loop** on validation failures. Embedding a step‑wise execution sandbox inside the Assistant lets it produce reproducible logs, state snapshots, and suggested fixes. This aligns with the “80‑to‑100 validation loops” guideline and keeps the debugging artefacts (trace files, diff patches) scoped to the s
...[truncated]

Must

Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
Make extraction follow typed, tested, live product proof.

Must Not

Generalize workflow-specific behavior prematurely.
Adopt moving shared seams merely for architectural purity.
Lose the precise local-first staged workflow UX.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target: Expand Ricky’s reliability‑first workflow orchestration to the next 3 product teams (Payments, Identity, and Marketplace) while cementing the deterministic‑verification gate model.*

---

## 1. Slice Scope & Objectives  

| Scope Item | Description | Success Metric |
|------------|-------------|----------------|
| **Team onboarding** | Introduce Ricky to Payments, Identity, Marketplace squads (≈ 30 engineers). | 100 % of squads attend kickoff & receive onboarding docs. |
| **Workflow contract** | Publish a **Ricky‑v2.0 Contract** (YAML) that each team must adopt for any new or migrated workflow. | 100 % of new workflows reference the contract; 0 % of legacy workflows violate it after migration. |
| **Deterministic verification gates** | Add three mandatory CI gates: *Schema‑Validate*, *Determinism‑Check*, *Review‑Signoff*. | Gate pass rate ≥ 99 % on first attempt. |
| **Review artifacts** | Enforce a **Ricky Review Dossier** (Markdown) per PR, auto‑generated via a GitHub Action. | Dossier attached to 100 % of PRs that touch Ricky‑managed files. |
| **Validation loops** | Run **80‑to‑100** automated validation loo
...[truncated]

Must

Pick exactly one real shared seam to evaluate or adopt.
Define a live Ricky product path that will prove the adoption.
Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
Skip the comparison/evaluation step for mature Ricky-local seams.
Treat adoption as successful without a live product-path proof.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky × Slack / Web Support – Agent‑Assistant Package Design  

Below is a **complete, contract‑driven package** that lets Ricky surface workflow‑reliability assistance inside Slack (or any web‑chat widget) while preserving the same deterministic verification pipeline used for all AgentWorkforce components.

---

### 1. Package Overview  

| Component | Responsibility | Public API (contract) | Runtime |
|-----------|----------------|-----------------------|---------|
| **ricky‑slack‑bot** | Slack event listener, message routing, auth, rate‑limit handling | `handle(event: SlackEvent) → BotResponse` | Node 18 (AWS Lambda) |
| **ricky‑web‑widget** | Embeddable iframe widget, UI glue, postMessage bridge | `init(config: WidgetConfig) → void` | Vanilla JS (ES2022) |
| **ricky‑core‑bridge** | Shared thin wrapper around the core **ricky‑engine** (workflow‑repair, validation, review) | `invoke(request: RickyRequest) → Promise<RickyResult>` | Same runtime as core (Python 3.11) |
| **ricky‑contract‑schema** | JSON‑Schema definitions for all request/response payloads | `RickyRequest.schema`, `RickyResult.schema` | Language‑agnostic (published to npm & PyPI) |
| **ricky‑ci‑gate** | Determini
...[truncated]

Must

Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
Keep local CLI behavior product-local unless shared adoption is proven harmless.
Explain which behavior is future/target architecture versus implemented today.

Must Not

Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
Let future surface abstractions distort the current CLI contract.
Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What happens when you run**

```bash
ricky --mode local --spec "generate a workflow for package checks"
```

without the `--run` flag

| Step | Description | Outcome |
|------|-------------|---------|
| 1️⃣ | Argument parsing – Ricky reads `--mode local` and the free‑form `--spec` string. | The spec is stored as the desired workflow contract for later processing. |
| 2️⃣ | Specification‑only mode – Because `--run` is omitted, Ricky enters dry‑run / validation mode. | No workflow files are written, no containers are started, and no external services are invoked. |
| 3️⃣ | Contract generation – Ricky translates the natural‑language spec into a workflow contract (YAML/JSON) that follows the AgentWorkforce schema (see `specs/workflow-schema.yaml`). | The contract is printed to `stdout` and also saved to `./ricky_output/<timestamp>_contract.yaml` for inspection. |
| 4️⃣ | Deterministic verification gates – Ricky runs the built‑in contract validator (`ricky validate <contract>`). This checks: • Required fields (`name`, `steps`, `inputs`, `outputs`). • Step‑type compliance (e.g., `checkout`, `lint`, `test`, `publish`). • Naming conventions
...[truncated]


Must

Say generation is the default and execution was not requested.
Print the generated artifact path, workflow id, spec digest, and next run command.
Avoid showing execution evidence for a generation-only request.

Must Not

Imply the workflow ran automatically.
Present a generation-only result as execution success.
Hide the opt-in commands for running the artifact.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.first-run-copy-is-compact-and-truthful (cli-behavior/manual)

User message

Render Ricky's first-run CLI onboarding for a new user.

Ricky output

**Ricky – First‑Run CLI Onboarding**

Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the **Ricky** AgentWorkforce CLI. Follow the steps in order; each stage includes a deterministic verification gate that will halt the process and emit a clear blocker message if anything goes wrong.

---

## 1️⃣ Prerequisites (Verification Gate #1)

| Check | Command | Expected Output |
|-------|---------|-----------------|
| **Node ≥ 18** | `node -v` | `v18.x.x` or higher |
| **Git ≥ 2.30** | `git --version` | `git version 2.30.x` or higher |
| **Python 3.9+** (for optional linting) | `python3 --version` | `Python 3.9.x` or higher |

*If any check fails, install the missing tool and re‑run the onboarding script.*

---

## 2️⃣ Install the Ricky CLI (Verification Gate #2)

```bash
# Global install via npm (preferred)
npm i -g @ricky/agentworkforce-cli
# Verify installation
ricky --version
```

Expected output: ricky-cli vX.Y.Z (the latest released version).
Blocker: “Ricky CLI not found – ensure npm global bin is on your PATH.”

## 3️⃣ Initialise a New AgentWorkforce Project (Verification Gate #3)

```bash
# Create a fresh directory for your first agent
```

...[truncated]

Must

Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
End every branch with a concrete next step.
Advertise only commands that are currently implemented.

Must Not

Sound like a launch page or documentation dump.
Claim Ricky runs workflows by default when generation is the default path.
Require web or Slack onboarding before CLI use.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.recovery-guidance-no-stack-traces (cli-behavior/manual)

User message

A user gives Ricky an empty spec or a missing spec file.

Ricky output

**Ricky’s Standard Response to an Empty or Missing Spec File**

---

### 1. Immediate Detection & Blocker Report
| Step | Action | Output |
|------|--------|--------|
| **1.1** | Verify the presence of the spec file at the path declared in the workflow manifest. | If `fs.existsSync(path)` → **false** → raise a **Blocker**. |
| **1.2** | If the file exists, read its contents and check for non‑whitespace characters. | If `content.trim().length === 0` → raise a **Blocker**. |
| **1.3** | Emit a deterministic **Blocker Report** (JSON) to the CI console and to the `#workflow‑issues` Slack channel. | ```json { "type":"BLOCKER", "reason":"Missing or empty spec", "file": "<path>", "timestamp": "<ISO>" } ``` |

> **Rule:** A blocker must halt any downstream validation or deployment until the spec is supplied or repaired.

---

### 2. Auto‑Generated Placeholder Spec (Optional “Scaffold” Mode)

If the team has enabled *scaffold‑on‑missing* in `.ricky/config.yml`, Ricky will:

1. **Create a new branch**: `ricky/spec‑scaffold/<workflow‑name>/<timestamp>`
2. **Add a minimal spec skeleton** (`spec.yaml`) with the required top‑level keys:
   ```yaml
   name: <workflow-name>
   version: 0.1.0
   de
...[truncated]

Must

Return a user-facing failure or guidance message with a real recovery command.
Distinguish generation failure from execution failure.
Show stack traces only when verbose diagnostic mode is requested.

Must Not

Crash with an uncaught exception in normal mode.
Suggest commands that do not exist.
Pretend a missing spec was accepted.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output – no provider checks succeeded**

=== Ricky AgentWorkforce Status Report ===

🟡 Provider Connectivity
• Google: NOT CONNECTED (no successful health‑check)
• GitHub: NOT CONNECTED (no successful health‑check)

🔧 Configuration
• Google OAuth client ID:
• GitHub App ID:
• Expected scopes: google.read, repo.write

🛠️ Last Verification Gates
• Google health‑check: FAILED (timeout / 0/1 attempts)
• GitHub health‑check: FAILED (timeout / 0/1 attempts)

🚧 Blockers
• Google API unreachable – verify service account credentials and network egress.
• GitHub App token invalid – re‑issue token or check installation permissions.

📋 Recommended Actions
1. Run ricky provider test google and ricky provider test github to get detailed error logs.
2. Confirm that the service‑account key (Google) and private key (GitHub) are correctly placed in ~/.ricky/creds/.
3. Ensure outbound HTTPS (port 443) is allowed from the host running Ricky.
4. Re‑authenticate:
- ricky auth google login
- ricky auth github login
5. After fixing, re‑run `ri
...[truncated]


Must

Report unknown or not-connected provider state honestly.
Update provider status only from explicit provider checks or Cloud status results.
Give concrete setup guidance for Cloud when relevant.

Must Not

Mark Google or GitHub connected because guidance text was shown.
Invent a provider connection URL or OAuth flow.
Show empty fields with no recovery guidance when config is missing.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)

User message

Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

Ricky output

**Workflow: “GitHub Primitive Change”**

| Step | Owner | Action | Input | Output | Verification Gate |
|------|-------|--------|-------|--------|--------------------|
| 1️⃣ | **Ricky (Coordination)** | **Create a scoped branch** | `main` (or target base) | New branch `feat/<short‑description>` | Branch exists, naming follows `feat/…` pattern |
| 2️⃣ | **Ricky (Coordination)** | **Open a draft Pull Request (PR)** | Branch from step 1 | Draft PR with title “🛠️ <short‑description>” and placeholder description | PR appears in repo, draft flag set |
| 3️⃣ | **Developer (Implementation)** | **Apply the primitive change** (e.g., edit a single file, add a config flag, bump a version) | Draft PR | Updated commit(s) on the branch | CI lint & unit tests pass (if any) |
| 4️⃣ | **Ricky (Verification)** | **Run deterministic CI checks** | PR head SHA | CI status badge (✅/❌) | All required checks green |
| 5️⃣ | **Ricky (Review Coordination)** | **Assign reviewers** (code‑owner or designated reviewer) | Draft PR | Reviewers assigned | At least one reviewer assigned |
| 6️⃣ | **Reviewer** | **Perform deterministic review** (focus on change scope, test impact, documentation) | PR diff | Review c
...[truncated]

Must

Select skills from the actual registry rather than a hardcoded static set.
Record ranked skill matches with confidence and trigger evidence.
Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

Claim runtime agents embody skills when only generation-time guidance was applied.
Fail generation solely because optional skill files are missing.
Hide skill selection evidence from artifacts or JSON output.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow: `review‑and‑implement‑with‑Claude‑and‑Codex`**

| Element | Description |
|---------|-------------|
| **Purpose** | Automate a two‑stage development cycle: (1) use Claude (LLM‑review) to critique a design/spec, (2) hand the approved review to Codex (code‑generation) for implementation. |
| **Owner** | AgentWorkforce – Ricky (workflow reliability) |
| **Branch** | `feature/review‑and‑implement‑claude‑codex` |
| **PR Target** | `main` (protected, requires 2‑approval + CI) |

---

## 1. Workflow Contract (YAML)

```yaml
name: review-and-implement-with-Claude-and-Codex
on:
  workflow_dispatch:
    inputs:
      spec_path:
        description: Path to the design/spec markdown file
        required: true
        type: string
      output_dir:
        description: Directory where generated code will be placed
        required: true
        type: string
jobs:
  review:
    runs-on: ubuntu-latest
    outputs:
      review_id: ${{ steps.claude-review.outputs.review_id }}
      review_pass: ${{ steps.claude-review.outputs.pass }}
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3

      - name: Load spec
        id: load-spec
        run: |
          SPEC=$(c
...[truncated]

Must

Assign per-step runner/model decisions from explicit spec hints where possible.
Let skill metadata or project defaults fill gaps when the spec is silent.
Write tool-selection.json or equivalent audit metadata.

Must Not

Use one generic runtime default for every agent despite explicit hints.
Let skill defaults override explicit user/spec runner hints.
Omit the reason each tool or model was chosen.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

## Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is the **deterministic post‑processing flag** that takes a draft workflow (JSON/YAML) and returns a tightened version where every step’s **task description** and **acceptance gate** are explicit, testable, and scoped to a single responsibility.  

Below is the end‑to‑end contract you should follow when invoking the flag, plus a quick‑reference checklist and an example transformation.

---

### 1️⃣ Contract Overview  

| Input | Format | Required fields |
|-------|--------|-----------------|
| `draft_workflow` | `workflow.json` or `workflow.yaml` | `steps[]` with `id`, `task`, `gate` (may be placeholder) |
| `--refine` flag | CLI argument | No extra args; optional `--max‑tokens <n>` to bound LLM output |

| Output | Format | Guarantees |
|--------|--------|------------|
| `refined_workflow` | Same format as input | • Every `task` is a single‑sentence imperative.<br>• Every `gate` is a **boolean predicate** expressed in `{{ }}` Jinja‑style that can be evaluated by the runtime.<br>• No ambiguous pronouns or “etc.”.<br>• All external dependencies are listed in `requires[]` at the step level. |

**Determin
...[truncated]

Must

Keep the deterministic unrefined path as the default.
Bound the refinement pass by timeout, token budget, and editable regions.
Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

Must Not

Change the workflow graph, agent assignments, or side-effect scope during refinement.
Fail the whole generation if optional refinement times out.
Hide warnings when the deterministic artifact is returned unchanged.
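
One way to read the fallback rule is as a structural comparison: a refined artifact is accepted only when its graph and agent assignments are identical to the deterministic one. The sketch below assumes a hypothetical `Step` shape and an `acceptRefinement` helper; it does not cover side-effect scope, timeouts, or token budgets, which a real implementation would also need to bound.

```typescript
// Hypothetical step shape; Ricky's real artifact format may differ.
interface Step {
  id: string;
  agent: string;
  dependsOn: string[];
  task: string;
  gate: string;
}

interface RefineResult {
  steps: Step[];
  refined: boolean;
  warnings: string[];
}

// Fingerprint of everything refinement is NOT allowed to change.
function shape(steps: Step[]): string {
  return JSON.stringify(steps.map((s) => [s.id, s.agent, s.dependsOn.slice().sort()]));
}

// `candidate` is null when the optional pass timed out or the provider failed.
function acceptRefinement(original: Step[], candidate: Step[] | null): RefineResult {
  if (candidate === null) {
    return { steps: original, refined: false, warnings: ["refinement unavailable; returning deterministic artifact"] };
  }
  if (shape(candidate) !== shape(original)) {
    return { steps: original, refined: false, warnings: ["refinement changed graph or agents; returning deterministic artifact"] };
  }
  return { steps: candidate, refined: true, warnings: [] };
}
```

Note that both fallback branches return a warning rather than throwing, so an optional refinement failure never fails the whole generation.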

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.behavior-grounded-gates (generation-quality/manual)

User message

Generate a workflow for the ricky --version spec.

Ricky output

**Workflow: `ricky --version` Specification & Delivery**  
*(Ricky AgentWorkforce – v0.9.3)*  

---  

### 1. Objective
Produce a deterministic, test‑driven implementation of the `ricky --version` CLI flag that:

* Returns the current Ricky version string (e.g., `ricky 0.9.3`).
* Exits with status 0 on success, non‑zero on unexpected failure.
* Is reproducible across supported platforms (Linux, macOS, Windows).
* Is fully covered by unit‑ and integration‑tests and passes CI gates before merge.

---

### 2. Scope & Boundaries  

| Item | Inclusion | Exclusion |
|------|-----------|-----------|
| **Code** | `src/cli/main.rs` (or entry point), version constant, flag parser, output routine. | Unrelated sub‑commands, feature flags, or runtime configuration. |
| **Docs** | `docs/cli.md` section *Version*, changelog entry. | Full user guide rewrite. |
| **Tests** | Unit test for version constant, integration test invoking `ricky --version`. | Load‑testing, performance benchmarks. |
| **CI** | Lint, cargo test, binary build, version‑output verification. | Deployment to production environments. |

All changes must be confined to a **feature branch** named `feat/version-flag` and a single **
...[truncated]

Must

Build gates from the stated acceptance behavior, such as checking ricky --version output.
Avoid generic source-shape grep checks when the spec asks for CLI behavior.
Keep generated validation meaningful for the current repo shape.

Must Not

Treat grep -Eq 'export|function|class|workflow(' dist/ricky.js as proof of version behavior.
Claim the workflow is proven by source syntax alone.
Ignore the package-json version resolution order in the spec.
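
A behavior-grounded gate checks what the command prints, not what the source looks like. The following sketch assumes the caller has already run `ricky --version` and captured its exit code and stdout, and that `expectedVersion` was resolved separately (for example from package.json). The accepted output format is an assumption, and `versionGate` is a hypothetical name.

```typescript
interface CommandResult {
  exitCode: number;
  stdout: string;
}

interface GateResult {
  pass: boolean;
  reason: string;
}

// Passes only when the command succeeded AND printed the expected version.
function versionGate(result: CommandResult, expectedVersion: string): GateResult {
  if (result.exitCode !== 0) {
    return { pass: false, reason: `exit code ${result.exitCode}` };
  }
  const printed = result.stdout.trim();
  // Assumption: output is either "<version>" or "<name> <version>".
  const tokens = printed.split(/\s+/);
  const pass = tokens[tokens.length - 1] === expectedVersion;
  return {
    pass,
    reason: pass ? "printed version matches" : `expected ${expectedVersion}, got "${printed}"`,
  };
}
```

A source-shape grep would pass on a bundle that prints nothing at all, which is why the gate here looks only at observed output.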

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact Validation‑Fix‑Rerun Loop”**  
*Purpose*: Process a large set of independent artifacts (e.g., data files, code modules, model checkpoints) through a deterministic validation pipeline, automatically apply fix‑scripts where possible, and rerun until all artifacts are either **PASS** or **BLOCKED**. The workflow is scoped to a single branch (`batch‑validation`) and produces a single PR that aggregates all changes.

---

## 1. Repository Layout (contract)

/batch-validation/
│
├─ artifacts/                    # source artifacts (read‑only)
│   ├─ artifact‑001.json
│   ├─ artifact‑002.json
│   └─ …
│
├─ fixes/                        # deterministic fix scripts (one per type)
│   ├─ fix‑schema.py
│   ├─ fix‑checksum.py
│   └─ …
│
├─ validators/                   # pure functions, no side‑effects
│   ├─ validate‑schema.py
│   ├─ validate‑checksum.py
│   └─ …
│
├─ workflow/
│   ├─ batch_loop.yml            # this workflow definition
│   └─ utils/
│       └─ loop_helpers.py       # shared helpers (logging, status DB)
│
└─ .github/
    └─ workflows/
        └─ batch_validation.yml  # CI entry point (calls workflow/batch_loop.yml)


*All scripts must be pure, version‑contr
...[truncated]

Must

Choose dag, supervisor, or pipeline deliberately based on the work shape.
Explain the pattern choice in artifact metadata or a rationale.
Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

Default blindly to dag for every workflow.
Collapse independent artifact work into one vague agent task.
Omit verification gates because the chosen pattern seems obvious.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shaped the Generated Ricky Workflow  

| Skill | What the Skill Enforces | Direct Effect on the Generated Workflow |
|------|------------------------|------------------------------------------|
| **Concrete Contract Definition** | Every step must declare its **input schema**, **output schema**, and **exit codes** in a machine‑readable contract (`contract.yaml`). | The workflow begins with a `contract.yaml` block that lists required parameters (`sourceRepo`, `targetBranch`) and the exact shape of the JSON payload each action returns. Down‑stream steps reference these contracts, so the runtime can validate data before proceeding. |
| **Deterministic Verification Gates** | After each action, a **verification gate** runs a pure function that checks the contract‑adhering output (e.g., `verify‑branch‑name`, `verify‑test‑results`). The gate must return `PASS` or `FAIL` before the next step is scheduled. | The generated workflow inserts a `verify` stage after every `run` step. If a gate fails, the workflow aborts with a clear `BLOCKED` status and emits a **Blocker Report** (see “Honest Blocker Reporting”). This eliminates flaky runs caused by silent
...[truncated]

Must

Describe skills as generation-time selection, loading, and rendering inputs.
Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
Say the workflow instructions were informed by selected skills.

Must Not

Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
Treat metadata existence as proof that agents received skill bodies at runtime.
Overstate current tests beyond the generation boundary.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

**Ricky – Workflow Generation Assistant**

To create a useful, production‑ready workflow we need a concrete product specification. A “vague” spec makes it impossible to define deterministic verification gates, review artifacts, or validation loops that meet our reliability standards.

**Please provide the following details (feel free to answer only what you have; we’ll fill in the rest later):**

| # | Required Information | Why It Matters |
|---|----------------------|----------------|
| 1 | **Goal & Scope** – What is the end‑to‑end business outcome? (e.g., “process new customer sign‑ups and provision a SaaS account”) | Determines the high‑level stages and success criteria. |
| 2 | **Inputs** – Data sources, file formats, APIs, triggers, or events that start the workflow. | Drives the *Input Validation* gate and contract definitions. |
| 3 | **Outputs** – What artifacts, notifications, or state changes must be produced? | Shapes the *Output Verification* gate and downstream hand‑offs. |
| 4 | **Key Steps** – Rough list of major actions (e.g., “validate email, call billing API, write to DB, send welcome email”). | Allows us to draft deterministic tasks and assign owners. |
| 5 | **
...[truncated]

Must

Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
Validate with dry-run or targeted structural checks where possible.
Return artifacts, warnings, and follow-up commands honestly.

Must Not

Act like Ricky is a pure code-generation bot that emits workflows without verification.
Stop at "code compiles" as the proof bar.
Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s Immediate Response Plan**

| Phase | Action | Rationale | Deliverable |
|-------|--------|-----------|-------------|
| 1️⃣ Detect & Log | • Capture the timeout exception details (step name, start‑time, timeout limit, error payload).  <br>• Record the “alive‑but‑idle” worker’s heartbeat logs and lack of artifact writes. | Provides an immutable audit trail for post‑mortem and for the verification gate to consume. | `workflow‑run‑<run‑id>.log` + `timeout‑event.json` |
| 2️⃣ Blocker Report | • Mark the run as **BLOCKED** in the orchestration DB. <br>• Emit a **Ricky‑Blocker** event with severity *high* and a short‑code `STEP_TIMEOUT|WORKER_IDLE`. | Guarantees downstream pipelines do not consume partial output and surfaces the issue to the team instantly. | `ricky‑blocker‑<run‑id>.json` (published to the `ricky‑blockers` topic) |
| 3️⃣ Automated Triage | • Run the **Deterministic Verification Gate (DVG‑01)**: <br> - Verify that the timed‑out step’s input checksum matches the expected version. <br> - Confirm that no artifact files exist for the idle worker’s output directory. <br>• If DVG‑01 fails, abort further automated repair and flag for human review. | Guarantees we only a
...[truncated]

Must

Classify the failure before deciding whether to retry.
Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
Preserve raw evidence and uncertainty when the class is not known.

Must Not

Blindly rerun the whole workflow without a blocker classification.
Treat every failure as a broken workflow definition.
Claim Ricky fixed the workflow before rerun evidence proves it.
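
As a rough illustration of "classify before retry", the sketch below maps the two symptoms in the user message onto the failure classes named in the rubric and refuses to pick a recovery action when nothing matches. The mapping itself (for example, idle worker to agent-runtime opacity) is an assumption made for this example, and all type and function names are hypothetical.

```typescript
type FailureClass =
  | "agent-runtime-opacity"
  | "timeout"
  | "environment-blocker"
  | "workflow-structure-bug"
  | "validation-strategy-mismatch"
  | "unknown";

interface FailureEvidence {
  stepTimedOut: boolean;
  workerAliveWithoutArtifacts: boolean;
  rawLog: string;
}

interface Triage {
  classes: FailureClass[];
  nextAction: "targeted-recovery" | "collect-more-evidence";
  rawLog: string; // raw evidence is carried through unchanged
}

function triage(evidence: FailureEvidence): Triage {
  const classes: FailureClass[] = [];
  if (evidence.stepTimedOut) classes.push("timeout");
  if (evidence.workerAliveWithoutArtifacts) classes.push("agent-runtime-opacity");
  if (classes.length === 0) {
    // Unknown class: keep the uncertainty explicit instead of guessing or rerunning.
    return { classes: ["unknown"], nextAction: "collect-more-evidence", rawLog: evidence.rawLog };
  }
  return { classes, nextAction: "targeted-recovery", rawLog: evidence.rawLog };
}
```

There is deliberately no "rerun everything" action in the result type; a retry decision would be made per class after this step.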

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow: Detect and Clean Stale Relay State Before a Local Run**  

| **Stage** | **Purpose** | **Inputs** | **Outputs** | **Deterministic Gate** |
|-----------|-------------|------------|-------------|------------------------|
| **1️⃣ Scan** | Locate any stale directories that may corrupt the upcoming run. | Project root path (default `.`) | List of found stale paths (`.agent‑relay/`, `.relay/`, `.trajectories/`) | ‑ If any path exists → **gate passes** (needs clean‑up). If none exist → **gate passes** (skip clean‑up). |
| **2️⃣ Verify‑Lock** | Ensure no other process is currently holding a lock on those directories. | Same list from Stage 1 | Boolean `locked` flag per path | ‑ If any lock file (`*.lock`) is present → **gate fails** → **Blocker** reported (see § 5). |
| **3️⃣ Clean‑Up** | Safely delete stale state. | List of stale paths (unlocked) | Empty directories removed, git‑ignore updated if needed | ‑ File‑system deletion succeeds for every path → **gate passes**. |
| **4️⃣ Post‑Check** | Confirm the workspace is clean. | Project root | Confirmation log “No stale relay state detected.” | ‑ Re‑run Stage 1; must return empty list → **gate passes**. |
| **5️⃣ Blocker Report
...[truncated]

Must

Classify stale local runtime state as an environment contamination issue.
Recommend quarantine or isolated-run guidance before launch.
Record the observed paths and the action taken or recommended.

Must Not

Treat stale runtime state as a workflow logic failure.
Delete or overwrite state without an explicit safe path or user intent.
Continue into execution as if the workspace were clean.
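
A preflight that reports instead of deleting might look like the sketch below. It takes the filesystem check as a parameter so it stays side-effect free. Treating mere presence of a directory as "stale" is a simplification for this example; the names `preflight` and `PreflightReport` are hypothetical.

```typescript
const RELAY_STATE_DIRS: string[] = [".agent-relay", ".relay", ".trajectories"];

interface PreflightReport {
  blockerClass: "environment-contamination" | null;
  observedPaths: string[];
  recommendation: string;
  proceed: boolean;
}

// `exists` is injected (e.g. a wrapper around fs.existsSync) so nothing is touched here.
function preflight(exists: (path: string) => boolean): PreflightReport {
  const observedPaths = RELAY_STATE_DIRS.filter((dir) => exists(dir));
  if (observedPaths.length === 0) {
    return { blockerClass: null, observedPaths, recommendation: "workspace is clean", proceed: true };
  }
  return {
    blockerClass: "environment-contamination",
    observedPaths,
    // Recommend only; deletion requires explicit user intent.
    recommendation: "quarantine the listed paths or launch in an isolated run directory",
    proceed: false,
  };
}
```

The function never removes anything, which keeps it on the right side of the "no deletion without explicit user intent" rule above.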

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**Quick‑Fix Checklist for “Another Ricky/Relay run is already active in this workspace”**

| Step | Action | Command / UI | Expected Result |
|------|--------|--------------|-----------------|
| 1️⃣ | Verify the lock file | `cat .ricky/run.lock` (or `.relay/run.lock`) | Shows the PID and start‑time of the claimed run. |
| 2️⃣ | Check if the PID is still alive | `ps -p <PID>` | If the process exists, it really is running; otherwise the lock is stale. |
| 3️⃣ | If the process is alive | • Stop it gracefully: `kill -SIGINT <PID>`  <br>• If it does not exit: `kill -SIGTERM <PID>`  <br>• As a last resort: `kill -9 <PID>` | Process terminates and the lock file is automatically removed by Ricky/Relay. |
| 4️⃣ | If the PID is dead (stale lock) | Delete the lock file manually: `rm .ricky/run.lock` (or `.relay/run.lock`) | Removes the stale marker so a new run can start. |
| 5️⃣ | Re‑run the workflow | `ricky run <workflow‑name>` or `relay run <workflow‑name>` | Should start cleanly. |
| 6️⃣ | Verify workspace health | `ricky status` (or `relay status`) | Shows *Idle* and no active runs. |

---

### Why This Happens
- **Unclean shutdown** – a previous run crashed or was killed before it coul
...[truncated]

Must

Report the active marker, run id, or status path when available.
Ask the user to inspect, wait for, or explicitly clear the active run.
Avoid launching a competing run that could corrupt evidence.

Must Not

Silently start another run.
Hide the existing run marker from the user.
Treat the conflict as a generic failure with no recovery path.
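
The expected behavior is closer to "report and stop" than to the kill-and-delete checklist in the output above. A minimal sketch, assuming a hypothetical `RunMarker` shape (the real marker format is not shown in this review):

```typescript
// Hypothetical marker shape for illustration only.
interface RunMarker {
  runId: string;
  pid: number;
  statusPath: string;
}

type LaunchDecision =
  | { launch: true }
  | { launch: false; marker: RunMarker; message: string };

function decideLaunch(marker: RunMarker | null): LaunchDecision {
  if (marker === null) {
    return { launch: true };
  }
  // Surface the marker and hand the decision to the user; never start a competing run.
  return {
    launch: false,
    marker,
    message:
      `Run ${marker.runId} (pid ${marker.pid}) is already active; see ${marker.statusPath}. ` +
      "Inspect it, wait for it to finish, or explicitly clear it before retrying.",
  };
}
```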

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.env-loader-injection-runtime-loadable (runtime-recovery/manual)

User message

A workflow artifact references a MISSING_ENV_VAR value. Ricky's deterministic auto-fix injects the .env.local / .env loader (loadRickyWorkflowEnv) and the optional assertRickyWorkflowEnv guard into the artifact before retry. The artifact may be a master-rendered workflow whose .step({ command: ... }) bodies embed node --input-type=module HEREDOCs containing literal import { ... } from 'node:fs' / from 'node:path' strings.

Ricky output

**Deterministic auto‑fix for missing environment variables**

Below is the concrete patch you can apply to any workflow artifact that currently fails because it references a `MISSING_ENV_VAR`. The fix does three things:

1. **Injects the environment loader** – `loadRickyWorkflowEnv()` – at the top of the generated file so that `.env.local` (or fallback `.env`) is read before any step runs.  
2. **Adds an optional guard** – `assertRickyWorkflowEnv()` – that throws a clear error if the required variable is still undefined after loading.  
3. **Wraps each step’s inline Node‑module script** (the HEREDOC passed to `node --input-type=module`) with a tiny pre‑amble that re‑exports the loaded env, guaranteeing the script sees the same `process.env` as the surrounding workflow.

---

## 1. Helper utilities (add once, e.g. in `src/rickyEnv.ts`)

```ts
// src/rickyEnv.ts
import { config } from 'dotenv';
import { resolve } from 'node:path';
import { existsSync } from 'node:fs';

/**
 * Load .env.local if present, otherwise .env.
 * Returns the loaded env object (process.env is mutated in‑place).
 */
export function loadRickyWorkflowEnv(): NodeJS.ProcessEnv {
  const cwd = process.cwd();
  cons
```
...[truncated]

Must

Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected loadRickyWorkflowEnv body references rickyWorkflowFs.* and rickyWorkflowPath.*, so the repair must also add the corresponding import * as rickyWorkflowFs from 'node:fs' and import * as rickyWorkflowPath from 'node:path' aliases at module top level.
Detect existing alias imports by matching real top-level import * as <alias> from '<module>' statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in .step({ command: ... }) bodies do not count as imports).
Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

Must Not

Skip adding the rickyWorkflowFs / rickyWorkflowPath aliases because from 'node:fs' or from 'node:path' already appears somewhere in the file as a string literal.
Inject loadRickyWorkflowEnv (or assertRickyWorkflowEnv) without the supporting alias imports, which produces a ReferenceError: rickyWorkflowPath is not defined at module load and burns the auto-fix budget on UNSUPPORTED_RUNTIME at runtime-launch.
Rewrite or escape the embedded HEREDOC text in step commands.
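
The difference between substring matching and matching a real import statement can be sketched as follows. This is an assumption-laden illustration, not Ricky's implementation: the helper names are hypothetical, and a line-anchored regular expression is only an approximation of "top level" (a parser-based check would be more robust).

```typescript
// Escape a literal so it can be embedded in a RegExp.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True only when a whole line is `import * as <alias> from '<module>'`.
function hasAliasImport(source: string, alias: string, moduleName: string): boolean {
  const pattern = new RegExp(
    `^import\\s+\\*\\s+as\\s+${escapeRegExp(alias)}\\s+from\\s+['"]${escapeRegExp(moduleName)}['"]\\s*;?\\s*$`,
    "m",
  );
  return pattern.test(source);
}

// Returns the import lines the repair still needs to add at module top level.
function missingAliasImports(source: string): string[] {
  const required: Array<[string, string]> = [
    ["rickyWorkflowFs", "node:fs"],
    ["rickyWorkflowPath", "node:path"],
  ];
  return required
    .filter(([alias, moduleName]) => !hasAliasImport(source, alias, moduleName))
    .map(([alias, moduleName]) => `import * as ${alias} from '${moduleName}';`);
}
```

Because the check looks for the alias form specifically, a HEREDOC line such as `import { readFileSync } from 'node:fs'` does not suppress the injection, and the HEREDOC text itself is never modified.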

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Run**

Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script). It assumes you have the **AgentWorkforce CLI** (`awf`) installed and that the failing workflow artifact is stored in `./artifacts/<workflow‑id>.json`. The previous run identifier is supplied as `PREV_RUN_ID`.

---

## 1️⃣ Preconditions (Verification Gate)

| Check | Command | Expected outcome |
|-------|---------|------------------|
| **CLI installed** | `awf --version` | `awf version x.y.z` |
| **Workflow artifact present** | `test -f ./artifacts/<workflow-id>.json && echo ok` | `ok` |
| **Previous run ID reachable** | `awf run status $PREV_RUN_ID --json` | JSON with `"status":"COMPLETED"` (or any non‑failed state) |
| **Auto‑fix feature enabled** | `awf config get autoFix.enabled` | `true` |

If any gate fails, abort and report the blocker before proceeding.

---

## 2️⃣ Execution Contract

```bash
#!/usr/bin/env bash
set -euo pipefail

# ------------------------------------------------------------------
# INPUTS (replace placeholders)
# ------------------------------------------------------------------
WORKFLOW_ID="my-workflow"       
```
...[truncated]

Must

Use a bounded retry budget and summarize every attempt.
Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
Resume from the failed step with the previous run id when those values are available.

Must Not

Edit arbitrary repository source files as the default auto-fix surface.
Keep retrying after the configured max attempts.
Lose the single Ricky tracking run id across repair/resume attempts.
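
The bounded-loop contract can be sketched as below, with the runner and repairer injected as plain functions. All names (`autoFixLoop`, `AttemptSummary`, `Resume`) are hypothetical, and the real loop in `auto-fix-loop.ts` may be structured differently. The sketch only shows the three properties the rubric asks for: a hard attempt cap, a summary per attempt, and one tracking id across attempts.

```typescript
interface Resume {
  fromStep: string;
  previousRunId: string;
}

interface RunOutcome {
  ok: boolean;
  failedStep: string | undefined;
  runId: string;
}

interface AttemptSummary {
  attempt: number;
  ok: boolean;
  action: "initial-run" | "repair-and-resume";
  failedStep: string | undefined;
}

interface LoopResult {
  trackingRunId: string; // stays constant across repair/resume attempts
  ok: boolean;
  attempts: AttemptSummary[];
}

function autoFixLoop(
  trackingRunId: string,
  maxAttempts: number,
  run: (resume: Resume | undefined) => RunOutcome,
  repair: (failedStep: string | undefined) => boolean,
): LoopResult {
  const attempts: AttemptSummary[] = [];
  let resume: Resume | undefined = undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const outcome = run(resume);
    attempts.push({
      attempt,
      ok: outcome.ok,
      action: attempt === 1 ? "initial-run" : "repair-and-resume",
      failedStep: outcome.failedStep,
    });
    if (outcome.ok) return { trackingRunId, ok: true, attempts };
    if (attempt === maxAttempts) break; // budget exhausted
    if (!repair(outcome.failedStep)) break; // nothing repairable: stop instead of looping
    // Resume from the failed step when it is known; otherwise rerun from the start.
    resume =
      outcome.failedStep !== undefined
        ? { fromStep: outcome.failedStep, previousRunId: outcome.runId }
        : undefined;
  }
  return { trackingRunId, ok: false, attempts };
}
```

Calling it with `maxAttempts` set to 1 gives single-attempt behavior, and a `repair` that returns `false` ends the loop after the first failure.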

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Quick‑Start Debug Checklist**

| Step | Command / Action | What to Look For |
|------|------------------|-----------------|
| 1️⃣  | `ricky logs workflows/foo.ts` | Full stdout/stderr of the last run. Identify the exact error line and stack trace. |
| 2️⃣  | `ricky run workflows/foo.ts --debug` | Runs the workflow with verbose tracing (each gate, input, output). Pinpoint the gate that threw. |
| 3️⃣  | Open the workflow file | `src/workflows/foo.ts` – verify the **contract block** at the top matches the expected input schema. |
| 4️⃣  | Validate the contract | `ricky contract validate src/workflows/foo.ts` – ensures the TypeScript types are consistent with the JSON schema used at runtime. |
| 5️⃣  | Run the failing gate in isolation | `ricky gate run <gate-id> --input <sample.json>` – reproduces the failure without the whole pipeline. |
| 6️⃣  | Check external dependencies | If the gate calls a service, run `ricky dep status <service-name>` and confirm credentials, endpoint URLs, and health. |
| 7️⃣  | Review recent PRs | `ricky pr list --branch foo` – see if a recent change introduced the regression. |
| 8️⃣  | Re‑run with auto‑fix disabled (already) | Confirms the failure is no
...[truncated]

Must

Preserve one-attempt behavior when auto-fix is disabled.
Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
Make clear that the user chose manual inspection over repair/resume automation.

Must Not

Start a repair loop despite --no-auto-fix.
Suppress the diagnosis because no repair was attempted.
Present the failure as a completed repair attempt.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)

User message

Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

Ricky output

**Ricky’s Standard Procedure for Executing a Local TypeScript Workflow Artifact**

Below is the concrete, repeatable contract Ricky follows when a developer asks to run a TypeScript‑based workflow that lives in the repository’s *primary local path* (by default `./workflows`). The steps are written as a deterministic “execution contract” that can be copied into a branch‑specific PR checklist or a CI job.

---

## 1️⃣ Preconditions (Verification Gate #1)

| Check | How to verify | Expected result |
|------|---------------|-----------------|
| **Repo root** | `git rev-parse --show-toplevel` | Returns the absolute path of the repository root |
| **Primary workflow directory** | `test -d ./workflows && echo ok` | `ok` (directory exists) |
| **Node ≥ 18** | `node -v` | `v18.x` or higher |
| **npm ≥ 9** | `npm -v` | `9.x` or higher |
| **ts-node installed** | `npx ts-node -v` | Prints a version (e.g., `v10.x`) |
| **Workflow entry file** | `test -f ./workflows/<name>.ts && echo ok` | `ok` (the file you intend to run exists) |

If any gate fails, **Ricky reports a blocker** with the exact command output and aborts the execution contract.

---

## 2️⃣ Environment Setup (Deterministic Steps)
...[truncated]

Must

Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
Record the actual spawn command in execution evidence.

Must Not

Fail solely because agent-relay is not on PATH when the SDK route is available.
Hide the actual runtime command from evidence.
Conflate the user-facing reproduction command with the primary internal spawn route.
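
A precheck-then-plan step along these lines would satisfy the rubric: the SDK route is chosen whenever Node and the SDK resolve, regardless of whether an `agent-relay` binary is on PATH, and the exact command is returned so it can be written into execution evidence. The sketch is hypothetical throughout. In particular, using Node's `--experimental-strip-types` flag (available in Node 22.6 and later) is an assumption about what "the Node strip-types route" means here.

```typescript
interface Precheck {
  nodeAvailable: boolean;
  sdkResolvable: boolean; // whether @agent-relay/sdk resolves for the workflow
  relayBinaryOnPath: boolean; // recorded, but not required for the SDK route
}

type SpawnPlan =
  | { ok: true; route: "node-strip-types"; command: string[] }
  | { ok: false; blocker: string };

function planSpawn(artifactPath: string, pre: Precheck): SpawnPlan {
  const missing: string[] = [];
  if (!pre.nodeAvailable) missing.push("node");
  if (!pre.sdkResolvable) missing.push("@agent-relay/sdk");
  if (missing.length > 0) {
    return { ok: false, blocker: `cannot resolve: ${missing.join(", ")}` };
  }
  // Assumed flag; the exact command is returned so evidence records what was spawned.
  return {
    ok: true,
    route: "node-strip-types",
    command: ["node", "--experimental-strip-types", artifactPath],
  };
}
```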

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky’s Boundary‑Escalation Workflow for Persistent Structural Failures**

When a structural failure remains after an initial fix attempt, Ricky follows a deterministic, contract‑driven escalation path. The process is bounded, auditable, and designed to surface the blocker to the appropriate owners without dead‑locking the pipeline.

---

## 1. Failure Detection & Initial Fix Attempt

| Step | Action | Artifact |
|------|--------|----------|
| 1.1 | Run the **Structural‑Integrity Gate** (unit‑test, schema‑validation, CI lint) on the affected branch. | `gate‑structural.yml` result |
| 1.2 | If the gate fails, automatically generate a **Fix‑Attempt PR** (`fix/<branch‑name>-structural‑repair`). | PR #\<n\> with `fix-attempt` label |
| 1.3 | Apply the **Deterministic Verification Loop** (80‑100 iterations) on the PR: <br>‑ Execute the gate, collect metrics, compare against the **Structural‑Success Contract** (≤ 0.5 % regression, schema‑compliance = 100 %). | `verification‑log.json` |

If the contract is still violated after the loop, the failure is deemed *persistent*.

---

## 2. Boundary Recognition (Blocker State)

1. **Blocker Flagging** – Ricky adds the `blocker:persistent-struc
...[truncated]

Must

Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
Distinguish escalation from a generic product failure.
Preserve enough context for a human operator to continue.

Must Not

Retry speculative fixes indefinitely.
Collapse to "something went wrong" without the attempted actions.
Discard evidence from failed repair attempts.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Generated 2026‑05‑15**

| Metric | Value (last 30 days) | Trend |
|--------|----------------------|-------|
| **Total Runs** | 4 872 | ↗︎ +8 % |
| **Successful Runs** | 4 531 (93 %) | ↗︎ +5 % |
| **Failed Runs** | 241 (5 %) | ↘︎ ‑2 % |
| **Blocked Runs** | 100 (2 %) | ↔︎ steady |
| **Avg. Runtime** | 3 m 27 s | ↘︎ ‑12 % |
| **Avg. Verification Gate Passes** | 4.7 / 5 | ↗︎ +0.1 |
| **Avg. Validation Loops** | 84 | ↘︎ ‑6 loops |
| **Mean Time to Recovery (MTTR)** | 7 min 12 s | ↘︎ ‑1 min 30 s |

---

### 1. Success‑Rate Summary
- **≥ 90 %** success across all branches, meeting the **Ricky SLA** (≥ 85 %).
- **Top‑performing branches**: `feature/contract‑v2` (96 % success), `release/2026.1` (95 %).
- **Lowest‑performing branch**: `bugfix/verification‑gate‑flaky` (78 % success) – see Blocker section.

### 2. Verification Gate Health
| Gate | Pass Rate | Typical Failure Reason |
|------|-----------|------------------------|
| **Contract‑Schema** | 99.2 % | Schema drift (out‑of‑date JSON‑Schema) |
| **Deterministic‑Output** | 98.7 % | Non‑deterministic random seeds |
| **Resource‑Quota** | 100 % | — |
| **Security‑Scan** | 97.5 % | New CVE in third‑party l
...[truncated]

Must

Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

Mutate evidence while analyzing it.
Return generic advice like "improve your workflow" without references.
Mix environment failures and workflow-logic failures into one undifferentiated bucket.
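
To make "recommendations tied to specific workflows, steps, or metrics" concrete, the sketch below groups failures by kind, workflow, and step over read-only evidence. The `WorkflowRunEvidence` fields shown are hypothetical; only the type name appears in the rubric.

```typescript
type FailureKind = "environment" | "workflow-logic";

// Hypothetical subset of a normalized evidence record.
interface WorkflowRunEvidence {
  readonly workflow: string;
  readonly failedStep: string | null;
  readonly failureKind: FailureKind | null;
}

interface Finding {
  kind: FailureKind;
  workflow: string;
  step: string;
  count: number;
}

// Read-only input: evidence is counted, never mutated.
function recurringFailures(runs: readonly WorkflowRunEvidence[], minCount: number = 2): Finding[] {
  const byKey = new Map<string, Finding>();
  for (const run of runs) {
    if (run.failureKind === null || run.failedStep === null) continue;
    // Kind is part of the key so environment and logic failures never merge.
    const key = `${run.failureKind}|${run.workflow}|${run.failedStep}`;
    const existing = byKey.get(key);
    if (existing !== undefined) {
      existing.count += 1;
    } else {
      byKey.set(key, { kind: run.failureKind, workflow: run.workflow, step: run.failedStep, count: 1 });
    }
  }
  return Array.from(byKey.values())
    .filter((finding) => finding.count >= minCount)
    .sort((a, b) => b.count - a.count);
}
```

Each finding names a workflow and step, so a digest built from it can say which gate to fix rather than offering generic advice.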

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.preserve-pr-shipping-during-repair (runtime-recovery/manual)

User message

A persona-authored workflow that imports @agent-relay/github-primitive and uses createGitHubStep to open a PR fails its runtime-precheck. The auto-fix repair persona is invoked. Describe the contract the repaired artifact must satisfy.

Ricky output

**Repaired Workflow Contract**

| **Section** | **Requirement** | **Verification Gate** | **Pass‑Criteria** |
|-------------|----------------|-----------------------|-------------------|
| **1. Imports** | The workflow must import the **exact** package name `@agent‑relay/github-primitive` and **only** the symbols it needs. | *Static import lint* – `npm run lint:imports` must report zero errors. | `import { createGitHubStep } from "@agent-relay/github-primitive"` appears once, no unused imports. |
| **2. Step Construction** | A single GitHub step must be created with `createGitHubStep` that: <br>• Targets the **pull‑request** action.<br>• Supplies a **valid `repo` object** (`owner`, `repo`).<br>• Supplies a **valid `branch`** (string, non‑empty).<br>• Supplies a **`title`** and **`body`** (both non‑empty strings).<br>• Returns a **`Step`** object that is added to the workflow’s `steps` array. | *Type‑check & schema validation* – `npm run typecheck` and `npm run validate:steps`. | The generated step matches the JSON schema defined in `schemas/github-step.json` (all required fields present, correct types). |
| **3. Runtime‑Precheck Compatibility** | The workflow’s **precheck script** 
...[truncated]

Must

Preserve the @agent-relay/github-primitive import, the GitHubStepExecutor reference, and every createGitHubStep(...) invocation that the original artifact declared.
Keep the failing workflow runnable from the same path with the same workflow(...) builder and .run({ cwd: process.cwd() }) invocation.
Retain at least ceil(N / 2) of the original workflow's .step(...) calls when the original declared four or more steps; a repair that collapses to a 2-3 step placeholder is a regression, not a fix.
Reject (do not apply) any repair output whose step list reduces to prepare-context / runtime-precheck: true / final-signoff: echo placeholder; surface the regression diagnostic instead.

Must Not

Strip createGitHubStep, GitHubStepExecutor, or @agent-relay/github-primitive because the runtime-precheck failure mentioned PR-shipping or git side effects. The repair contract's "no commit / no push" constraint applies to the REPAIR AGENT's runtime behavior, not to the workflow's step declarations.
Emit a "minimal repair-safe master" or "simplified Ricky master" scaffold that passes the builder validator while doing none of the original work.
Treat "the workflow now builds and runs" as success when the work it was supposed to ship is gone.
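
The retention rules above are mechanical enough to check before a repair is applied. The sketch below is a hypothetical validator, not the project's actual one: it counts `.step(` and `createGitHubStep(` occurrences textually, which is a simplification, and it rejects a repair that drops PR-shipping markers or keeps fewer than ceil(N / 2) steps when the original had four or more.

```typescript
// Count non-overlapping literal occurrences of `needle` in `source`.
function countOccurrences(source: string, needle: string): number {
  return source.split(needle).length - 1;
}

function countSteps(source: string): number {
  return (source.match(/\.step\s*\(/g) ?? []).length;
}

// Returns regression diagnostics; an empty list means the repair may be applied.
function repairRegressions(original: string, repaired: string): string[] {
  const problems: string[] = [];
  for (const marker of ["@agent-relay/github-primitive", "GitHubStepExecutor"]) {
    if (original.includes(marker) && !repaired.includes(marker)) {
      problems.push(`repair dropped ${marker}`);
    }
  }
  const callsBefore = countOccurrences(original, "createGitHubStep(");
  const callsAfter = countOccurrences(repaired, "createGitHubStep(");
  if (callsAfter < callsBefore) {
    problems.push(`createGitHubStep calls reduced from ${callsBefore} to ${callsAfter}`);
  }
  const stepsBefore = countSteps(original);
  const stepsAfter = countSteps(repaired);
  if (stepsBefore >= 4 && stepsAfter < Math.ceil(stepsBefore / 2)) {
    problems.push(`steps collapsed from ${stepsBefore} to ${stepsAfter}`);
  }
  return problems;
}
```

A caller would surface the returned diagnostics and keep the original artifact whenever the list is non-empty.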

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

## Ricky Slack Surface – “Workflow Debugger”

### 1. Overview
A dedicated Slack app (Ricky Debug) that lets engineers **inspect, step‑through, and repair** any Ricky‑managed workflow directly from a channel or DM. It follows the **Ricky repository conventions** (AGENTS.md) and **workflow standards**: deterministic contracts, verification gates, review artifacts, and bounded PR cycles.

---

## 2. User‑Facing Commands (Slash + Message Actions)

| Command | Syntax | Description | Output |
|---------|--------|-------------|--------|
| `/ricky debug <workflow‑id>` | `/ricky debug wf-1234` | Pull latest definition, show status, and open an interactive debug session. | Summary card + “Start Session” button. |
| `/ricky step <run‑id>` | `/ricky step run-5678` | Advance a paused run by one step (or auto‑step if no breakpoints). | Updated run state card. |
| `/ricky break <node‑id>` | `/ricky break node-42` | Set a breakpoint on a node for the current run. | Confirmation toast. |
| `/ricky vars <run‑id>` | `/ricky vars run-5678` | Dump current variable map (JSON, truncated). | Collapsible code block. |
| `/ricky replay <run‑id>` | `/ricky replay run-5678` | Re‑run the workfl

---

_Comment truncated to stay within GitHub limits. Download the `ricky-eval-run` artifact for the full `human-review.md`._

coderabbitai Bot reviewed May 15, 2026

cubic-dev-ai Bot reviewed May 15, 2026

devin-ai-integration Bot reviewed May 15, 2026

kjgbot merged commit 3c7666e into main May 15, 2026
5 checks passed

kjgbot deleted the fix/auto-fix-specpath-fallback branch May 15, 2026 15:46

khaliqgant mentioned this pull request May 15, 2026

fix(auto-fix): don't resume from a gate whose inputs come from skipped agent steps #117

Merged

4 tasks

kjgbot merged 1 commit into main from fix/auto-fix-specpath-fallback

2 participants

devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found
