
fix(auto-fix): detect rickyWorkflow* alias imports via TypeScript AST#86

Merged
kjgbot merged 3 commits into main from fix/env-loader-alias-detection
May 10, 2026

@kjgbot kjgbot commented May 9, 2026

Summary

`injectWorkflowEnvLoader` decided whether to add the `import * as rickyWorkflowFs from 'node:fs'` and `import * as rickyWorkflowPath from 'node:path'` aliases by checking whether the substring `from 'node:fs'` (or `from 'node:path'`) appeared anywhere in the artifact text. Master-rendered workflows embed those exact substrings inside a shell HEREDOC in a `.step({ command: ... })` body. In master-workflow-renderer.ts:138-149, the master renderer emits a `node --input-type=module <<'NODE'\nimport { mkdirSync, writeFileSync } from 'node:fs';\n…NODE` block as part of a deterministic step's command string.

The substring check matched that string literal and skipped adding the real top-level alias imports. The injected `loadRickyWorkflowEnv` body still referenced `rickyWorkflowFs` and `rickyWorkflowPath`, so Node threw `ReferenceError: rickyWorkflowPath is not defined` at module load, and the auto-fix loop burned 7/7 attempts on `UNSUPPORTED_RUNTIME` at runtime-launch.
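To make the failure concrete, here is a minimal TypeScript sketch of the pre-fix behavior. `naiveHasFsImport` is a hypothetical stand-in for the original substring check (its real name and shape are not shown in this PR), and the artifact string is an illustrative master-rendered step, not the renderer's actual output.

```typescript
// Hypothetical stand-in for the pre-fix check: a plain substring test over the whole file.
function naiveHasFsImport(content: string): boolean {
  return content.includes("from 'node:fs'");
}

// Illustrative master-rendered step: a Node script embedded in a shell HEREDOC,
// carried inside an ordinary string literal (the \n escapes stay escaped in the artifact text).
const artifact = [
  "workflow('demo').step('write', {",
  "  command: \"node --input-type=module <<'NODE'\\nimport { mkdirSync, writeFileSync } from 'node:fs';\\nNODE\",",
  "});",
].join('\n');

// No module-scope `import * as rickyWorkflowFs from 'node:fs'` exists, yet the
// substring test says the import is present, so the real aliases are never added.
console.log(naiveHasFsImport(artifact)); // true (false positive)
```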

Repro

After #84 unblocked the artifact write, this latent bug surfaced:

$ ricky --mode local --spec-file docs/plans/2026-05-05-proactive-pr-remediation-spec.md --run
Generation: ok — workflows/generated/ricky-…ts
Workflow name: wf-1e14175ec3b5
Execution: blocked — UNSUPPORTED_RUNTIME at runtime-launch
Cause: node --experimental-strip-types exited with code 1
Auto-fix: stopped after 7/7 attempt(s) (UNSUPPORTED_RUNTIME)

ReferenceError: rickyWorkflowPath is not defined
    at loadRickyWorkflowEnv (workflows/generated/ricky-…ts:10:18)

The generated workflow defined `loadRickyWorkflowEnv` and called it from `main()`, but had no `rickyWorkflowFs` / `rickyWorkflowPath` import at module scope, precisely because the master renderer's embedded HEREDOC fooled the substring check.

Fix

Replace the substring check with `ts.createSourceFile` plus a walk over module-scope `ImportDeclaration` nodes. `hasRickyWorkflowAliasImport(content, alias, moduleName)` now matches `import * as <alias> from '<module>'` structurally: text inside `StringLiteral`, `NoSubstitutionTemplateLiteral`, and `TemplateExpression` nodes is inert, so HEREDOC strings no longer fool detection regardless of how the embedded shell text is escaped. Comments are also inert.
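A minimal sketch of the structural check described above, assuming the `typescript` package is installed. The function name mirrors the PR's `hasRickyWorkflowAliasImport`, but the body is an illustrative reconstruction rather than the merged code; in particular, ignoring `import type * as ...` declarations (which create no runtime binding) is an assumption added here.

```typescript
import * as ts from 'typescript';

// Illustrative reconstruction of the AST-based check; not the merged implementation.
function hasRickyWorkflowAliasImport(content: string, alias: string, moduleName: string): boolean {
  // Parse only (no type-checking). Strings and template literals become nodes, and
  // comments become trivia; neither is ever inspected as an import by the walk below.
  const sourceFile = ts.createSourceFile(
    'artifact.ts',
    content,
    ts.ScriptTarget.Latest,
    /* setParentNodes */ false,
    ts.ScriptKind.TS,
  );

  // Direct children of the SourceFile are exactly the module-scope statements.
  return sourceFile.statements.some((statement) => {
    if (!ts.isImportDeclaration(statement)) return false;

    const specifier = statement.moduleSpecifier;
    if (!ts.isStringLiteral(specifier) || specifier.text !== moduleName) return false;

    const clause = statement.importClause;
    // Assumption: `import type * as X` creates no runtime binding, so it does not count.
    if (clause === undefined || clause.isTypeOnly) return false;

    const bindings = clause.namedBindings;
    return bindings !== undefined && ts.isNamespaceImport(bindings) && bindings.name.text === alias;
  });
}

// An import-shaped line inside a template literal is ignored...
const embedded = "wf.step({ command: `node <<'NODE'\nimport * as rickyWorkflowFs from 'node:fs';\nNODE` });";
console.log(hasRickyWorkflowAliasImport(embedded, 'rickyWorkflowFs', 'node:fs')); // false

// ...while a declaration split across lines is still recognized.
const multiLine = "import * as\n  rickyWorkflowFs\n  from 'node:fs';\n";
console.log(hasRickyWorkflowAliasImport(multiLine, 'rickyWorkflowFs', 'node:fs')); // true
```

Because only `sourceFile.statements` is inspected, nothing nested inside a string, template literal, or comment can satisfy the check, and a multi-line declaration parses to the same `ImportDeclaration` as a single-line one.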

Why AST and not a tighter regex / preamble parser:

  • Multi-line `import * as\n alias\n from 'mod'` declarations slip through any line-anchored regex and get re-injected as duplicates, which the strip-types loader rejects with `SyntaxError: Identifier has already been declared`. (See the new "recognizes already-present rickyWorkflow* alias imports declared via multi-line statement and skips re-injection" test.)
  • A future template that emits an alias import below the file preamble would also be missed by a preamble-only scanner.
  • Comment-stripping heuristics (`startsWith('//')`, `startsWith('/*')`) miss inline `import { x } /* trailing */ from 'mod';` and other edge cases that the parser handles for free.
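A plain line-anchored regex shows two of these failure modes directly. The pattern below is hypothetical, not the exact regex from either commit: it misses a declaration split across lines (the first bullet), and because the `m` flag lets `^` match after any newline, it also fires on an import-shaped line inside a multi-line template literal.

```typescript
// Hypothetical line-anchored check; not the exact regex from either commit.
const lineAnchored = /^import \* as rickyWorkflowFs from 'node:fs'/m;

// Failure mode 1: a real, multi-line declaration is missed (false negative),
// so an injector relying on this check would add a duplicate binding.
const multiLine = "import * as\n  rickyWorkflowFs\n  from 'node:fs';\n";

// Failure mode 2: a line inside a template literal starts with `import`, and the
// `m` flag lets `^` match there (false positive), so the real import is skipped.
const embedded = [
  "workflow('demo').step('write', {",
  "  command: `node --input-type=module <<'NODE'",
  "import * as rickyWorkflowFs from 'node:fs';",
  "NODE`,",
  "});",
].join('\n');

console.log(lineAnchored.test(multiLine)); // false
console.log(lineAnchored.test(embedded)); // true
```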

`typescript` moves from `devDependencies` to `dependencies`. The bundler externalizes it (it is now in the externals list), so dist/ricky.js stays at 2.4MB. The install footprint grows by ~23MB at `npm install` time, comparable to the dependency weight ricky already carries through @agent-relay/sdk and @agentworkforce/harness-kit.

Test plan

  • npm run typecheck
  • npm test: 1029 / 1030 tests pass; the lone failure is a pre-existing flake in local-run-monitor.test.ts > "exposes background monitor progress across subprocess status checks" (a race between checking subprocess state and the subprocess completing; it passes 3/3 in isolation and is unrelated to this change).
  • New regression tests in src/local/auto-fix-loop.test.ts:
    • "adds the rickyWorkflow* alias imports even when the workflow embeds `from 'node:fs'` inside a .step command HEREDOC": pins the original bug.
    • "recognizes already-present rickyWorkflow* alias imports declared via multi-line statement and skips re-injection": exercises a shape that neither the original substring check nor the intermediate preamble regex could distinguish from "not imported", proving the AST upgrade is meaningful.
  • Existing env-loader injection test still passes.
  • New eval case runtime-recovery.env-loader-injection-runtime-loadable in evals/suites/runtime-recovery/cases.md captures the behavioral requirement at the human-review layer (compiled into cases.jsonl via npm run evals:compile).
  • npm run build — bundle still emits 2.4MB dist/ricky.js; typescript correctly resolved as an external.
  • Manual end-to-end: with the same fix patched into the installed dist, the user's repro produces a workflow file that actually loads under Node (vs. the previous ReferenceError-on-import).

Commits

This PR has two commits that fix the same user-visible bug at successive levels of structural robustness:

  1. b9f9f47 fix(auto-fix): limit alias import detection to file preamble — pragmatic intermediate fix using a line-anchored regex limited to the file preamble.
  2. cc2e2d9 refactor(auto-fix): use TypeScript AST for alias-import detection — final fix using ts.createSourceFile, eliminating the class of failure modes a regex/preamble approach is structurally vulnerable to.

Reviewers may prefer to squash. The intermediate commit is preserved so the rationale for upgrading to AST is visible in history.

Relationship to other PRs

Stacks logically on top of #84 (persona-writer fallback) — that PR ensures the deterministic artifact actually gets written to disk; this PR ensures the deterministic artifact actually loads under Node. Either PR is independently valuable; together they unblock the user's ricky --mode local --spec-file <md> --run flow when the workforce-persona writer is unavailable.

🤖 Generated with Claude Code

…substring

`injectWorkflowEnvLoader` decides whether to add
`import * as rickyWorkflowFs from 'node:fs'` and
`import * as rickyWorkflowPath from 'node:path'` aliases by checking
whether the substring `from 'node:fs'` (or `node:path`) appears anywhere
in the artifact text. Master-rendered workflows (master-workflow-renderer.ts:138-149)
emit a `node --input-type=module` HEREDOC inside a
`.step({ command: ... })` body that contains literal
`import { mkdirSync, writeFileSync } from 'node:fs'` /
`import { dirname } from 'node:path'` lines as part of the embedded
shell script. The substring check matches that string literal and skips
adding the real top-level alias imports. The injected
`loadRickyWorkflowEnv` body still references `rickyWorkflowFs` and
`rickyWorkflowPath`, so Node throws
`ReferenceError: rickyWorkflowPath is not defined` at module load and
the auto-fix loop burns 7/7 attempts on
`UNSUPPORTED_RUNTIME at runtime-launch`.

Repro from the user's session (after PR #84 unblocked the artifact
write so this latent bug could surface):

  Generation: ok — workflows/generated/ricky-...ts
  Workflow name: wf-1e14175ec3b5
  Execution: blocked — UNSUPPORTED_RUNTIME at runtime-launch
  Cause: node --experimental-strip-types exited with code 1
  Auto-fix: stopped after 7/7 attempt(s) (UNSUPPORTED_RUNTIME)

  ReferenceError: rickyWorkflowPath is not defined
      at loadRickyWorkflowEnv (workflows/generated/ricky-...ts:10:18)

Fix: introduce `hasRickyWorkflowAliasImport(content, alias, moduleName)`
which matches an actual top-of-file
`import * as <alias> from '<module>'` statement (anchored to line start
with the multiline flag). Substring matches inside HEREDOC strings no
longer count, so the alias imports always get added when the helpers
that reference them are.

Tests:
- `src/local/auto-fix-loop.test.ts` — new regression case
  "adds the rickyWorkflow* alias imports even when the workflow embeds
  `from 'node:fs'` inside a .step command HEREDOC" reproduces the
  master-rendered shape (`.step({ command: "node --input-type=module
  <<'NODE'\\nimport { mkdirSync, ... } from 'node:fs';\\n..." })`) and
  asserts the top-level aliases land while the HEREDOC stays untouched.
  Existing env-loader injection test still passes.

Evals:
- `evals/suites/runtime-recovery/cases.md` — new
  `runtime-recovery.env-loader-injection-runtime-loadable` case captures
  the behavioral requirement at the human-review layer (compiled into
  cases.jsonl by `npm run evals:compile`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
coderabbitai Bot commented May 9, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: a6eb3780-ce33-4ade-9f2b-68e88563d5be

📥 Commits

Reviewing files that changed from the base of the PR and between b9f9f47 and cc2e2d9.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (3)
  • package.json
  • src/local/auto-fix-loop.test.ts
  • src/local/auto-fix-loop.ts

📝 Walkthrough

This PR fixes env-loader injection logic to use anchored, top-of-file import detection instead of substring matching. When repairing workflows with missing environment variables, the injector must add Node.js alias imports but avoid false positives from embedded HEREDOC command text. A new helper function validates actual import statements rather than string presence.

Changes

Env-Loader Deterministic Injection Fix

| Layer | File(s) | Summary |
|-------|---------|---------|
| Test Specification | evals/suites/runtime-recovery/cases.md, evals/suites/runtime-recovery/cases.jsonl | New regression case runtime-recovery.env-loader-injection-runtime-loadable specifies that auto-fix must inject the rickyWorkflowFs/rickyWorkflowPath top-level imports unless matching import statements already exist (substring matches do not count), and must preserve embedded HEREDOC shell command text unchanged. |
| TypeScript Import | src/local/auto-fix-loop.ts | Adds `import ts from 'typescript'` to enable the AST parsing used by the alias-detection logic. |
| Import Detection Helper | src/local/auto-fix-loop.ts | New `hasRickyWorkflowAliasImport(content, alias, moduleName)` parses the module AST and detects `import * as <alias> from '<moduleName>'` at module scope. |
| Core Injection Logic | src/local/auto-fix-loop.ts | `injectWorkflowEnvLoader` replaces substring checks with alias-aware AST detection and conditionally injects `import * as rickyWorkflowFs from 'node:fs';` and/or `import * as rickyWorkflowPath from 'node:path';`, plus the env-loader and assertions. |
| Regression Tests | src/local/auto-fix-loop.test.ts | Adds two tests: one ensures injection occurs despite a HEREDOC containing `from 'node:fs'`/`from 'node:path'` substrings and preserves the HEREDOC; the other ensures pre-existing `import * as` alias imports are not duplicated. |
| Manifest / Dependencies | package.json | Moves typescript from devDependencies to dependencies and reorders ssh2 placement in the dependencies list. |
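The conditional-injection step summarized in the Core Injection Logic row can be sketched as follows. `ensureAliasImports`, `AliasDetector`, and `REQUIRED_ALIASES` are hypothetical names; the detector is passed in (for example, an AST-based check like `hasRickyWorkflowAliasImport`) so the sketch runs without the `typescript` package, and the loader body and assertions that the real `injectWorkflowEnvLoader` also inserts are omitted.

```typescript
// Hypothetical names throughout; the detector is injected so this runs standalone.
type AliasDetector = (content: string, alias: string, moduleName: string) => boolean;

// The two aliases the injected env-loader body references.
const REQUIRED_ALIASES: ReadonlyArray<readonly [alias: string, moduleName: string]> = [
  ['rickyWorkflowFs', 'node:fs'],
  ['rickyWorkflowPath', 'node:path'],
];

// Prepend only the alias imports the detector reports as missing, so an existing
// declaration is never duplicated and a missing one is always added.
function ensureAliasImports(content: string, hasAlias: AliasDetector): string {
  const missing = REQUIRED_ALIASES
    .filter(([alias, moduleName]) => !hasAlias(content, alias, moduleName))
    .map(([alias, moduleName]) => `import * as ${alias} from '${moduleName}';`);
  return missing.length === 0 ? content : `${missing.join('\n')}\n${content}`;
}

// Example: a detector that reports only rickyWorkflowFs as already imported.
const onlyFsPresent: AliasDetector = (_content, alias) => alias === 'rickyWorkflowFs';
console.log(ensureAliasImports('main();\n', onlyFsPresent));
// import * as rickyWorkflowPath from 'node:path';
// main();
```

Prepending at the very top is a simplification: ES module import declarations are hoisted, so their position does not change evaluation order, but a file that starts with a shebang line would need the imports inserted after it.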

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • AgentWorkforce/ricky#49: Both PRs modify the auto-fix flow that injects the Ricky env loader and add tests for injecting top-level Node alias imports (rickyWorkflowFs/path).
  • AgentWorkforce/ricky#47: Related changes to auto-fix handling of MISSING_ENV_VAR and deterministic env-loader behavior.

Poem

🐰 A workflow once fooled by embedded text,
Now guards its imports with anchored regex blessed!
No HEREDOC trickery shall prevail today—
Top-level aliases mark the proper way.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title accurately and concisely summarizes the main change: replacing substring-based detection with TypeScript AST-based detection for rickyWorkflow* alias imports. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, explaining the bug, root cause, fix approach, testing, and impact. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/local/auto-fix-loop.ts`:
- Around line 755-769: The hasRickyWorkflowAliasImport function currently tests
importPattern against the whole file (using the 'm' flag) so a HEREDOC/template
literal with a left-anchored import line can produce a false positive; change
hasRickyWorkflowAliasImport to first extract the file preamble (scan lines from
the top and stop at the first line that is not a top-of-file
import/export/blank/comment statement), then run the existing importPattern.test
only against that preamble string; keep the same importPattern and
escapedAlias/escapedModule logic and update references in
hasRickyWorkflowAliasImport to perform the header-scan-and-slice before testing.
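The preamble-scan approach suggested in this review comment, similar in spirit to the intermediate commit b9f9f47 (fix(auto-fix): limit alias import detection to file preamble), might look like the sketch below. The helper names and the exact header heuristics are assumptions, not the code from that commit. It removes the template-literal false positive but still misses a multi-line declaration, which is why the PR moved on to the AST walk.

```typescript
// Hypothetical sketch of the preamble-scan idea; helper names and heuristics are assumed.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Collect leading lines until the first one that does not look like a header line.
function extractPreamble(content: string): string {
  const preamble: string[] = [];
  for (const line of content.split('\n')) {
    const trimmed = line.trim();
    // Heuristic: blank lines, comment lines, and import/export statements.
    const isHeaderLine =
      trimmed === '' ||
      trimmed.startsWith('//') ||
      trimmed.startsWith('/*') ||
      trimmed.startsWith('*') ||
      /^(import|export)\b/.test(trimmed);
    if (!isHeaderLine) break;
    preamble.push(line);
  }
  return preamble.join('\n');
}

// Run the line-anchored pattern against the preamble only, not the whole file.
function hasAliasImportInPreamble(content: string, alias: string, moduleName: string): boolean {
  const pattern = new RegExp(
    `^import\\s+\\*\\s+as\\s+${escapeRegExp(alias)}\\s+from\\s+['"]${escapeRegExp(moduleName)}['"]`,
    'm',
  );
  return pattern.test(extractPreamble(content));
}

// The embedded HEREDOC sits below the preamble, so it no longer counts...
const embeddedInStep = [
  "import { workflow } from './workflow.js';",
  '',
  "workflow('demo').step('write', {",
  "  command: `node --input-type=module <<'NODE'",
  "import * as rickyWorkflowFs from 'node:fs';",
  'NODE`,',
  '});',
].join('\n');
console.log(hasAliasImportInPreamble(embeddedInStep, 'rickyWorkflowFs', 'node:fs')); // false

// ...but the scan stops at the second line of a multi-line declaration and misses it.
console.log(hasAliasImportInPreamble("import * as\n  rickyWorkflowFs\n  from 'node:fs';\n", 'rickyWorkflowFs', 'node:fs')); // false
```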
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 396c6af8-7009-409f-a684-ec416c0a6736

📥 Commits

Reviewing files that changed from the base of the PR and between 98f3c30 and 50e4c3e.

📒 Files selected for processing (4)
  • evals/suites/runtime-recovery/cases.jsonl
  • evals/suites/runtime-recovery/cases.md
  • src/local/auto-fix-loop.test.ts
  • src/local/auto-fix-loop.ts

github-actions Bot commented May 9, 2026

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-10T07-05-41-570Z-HEAD-provider
Mode: provider
Git SHA: 3ebd4e3

Passed: 5 | Needs human: 43 | Reviewable: 43 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Scope of the change**  
- **Docs** – Update the README, the “Architecture” section, and the “Getting Started” guide to state that Ricky now delegates core reliability checks to the **Agent Assistant** service (instead of the legacy internal helper).  
- **Code** – Replace the internal `ReliabilityEngine` calls with the new `AgentAssistantClient` wrapper, add a thin adapter (`RickyAgentAdapter`) and unit‑test the integration path.  
- **Tests** – Add integration‑test stubs that mock the Agent Assistant API and verify the deterministic verification gates (see contract below).  

All changes are confined to the `docs/` and `ricky/` packages; no cross‑module side‑effects.

---

## 1. Deterministic verification contract

| Gate | Input | Expected output | Success criteria |
|------|-------|----------------|------------------|
| **AA‑Auth** | `RickyConfig.api_key` | HTTP 200 + `{ token: <jwt> }` | Token is non‑empty, expires ≤ 1 h |
| **AA‑Check** | `CheckRequest{ workflow_id, payload }` | `{ status: "ok" \| "fail", details: string }` | `status` matches internal `ReliabilityEngine` result |
| **AA‑Report** | `Report{ run_id, metrics }` | HTTP 202 | Response body empty, `Location` heade
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review**  
*(as of the latest CI run 2026‑05‑08)*  

| Metric | Current Value | Target / Trend | Status |
|--------|---------------|----------------|--------|
| **Workflow contracts defined** | 12 of 15 active agents have a `turn‑context` contract in `workflow.yaml` | 100 % | ⚠️ 3 agents missing |
| **Deterministic verification gates** | 9 agents use `turn‑context‑gate.yml` (checks schema, required fields, version bump) | 100 % for agents with contract | ⚠️ 3 agents lack gate |
| **Review artifacts** | 78 % of PRs include `turn‑context‑review.md` (auto‑generated diff + rationale) | ≥ 90 % | ⚠️ 22 % missing |
| **Validation loops** | Avg. 62 loops per turn (target 80‑100) | Below target | ⚠️ |
| **Blocker reporting** | 4 blockers logged in the last sprint (all resolved) | ≤ 2 per sprint | ⚠️ |
| **Branch/PR scoping** | 85 % of PRs keep `turn‑context` changes isolated to a single feature branch | ≥ 95 % | ⚠️ |
| **Test coverage** | 71 % of `turn‑context` code paths covered (unit + integration) | ≥ 80 % | ⚠️ |
| **Documentation compliance** | 6 of 9 `turn‑context` docs have up‑to‑date usage examples | ≥ 8 | ⚠️ |

### Key Findings
1. **Par
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation:** Move **all** of the listed capabilities—workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording—into the **Agent Assistant** component.

---

## Rationale (aligned with AGENTS.md and workflow standards)

| Capability | Why it belongs in Agent Assistant | Impact on reliability & coordination |
|------------|-----------------------------------|--------------------------------------|
| **Workflow generation** | Agent Assistant already owns the *authoring contract* (input schema → workflow DSL). Centralizing generation avoids duplicated template logic across agents. | Guarantees a single source‑of‑truth for DSL syntax, enabling deterministic verification gates. |
| **Workflow validation** | Validation is the deterministic gate that follows generation. Embedding it in the same service ensures the generated artifact is immediately checked against the **Workflow Contract** (schema + invariants). | Reduces false‑positive passes, enforces 80‑100 validation loops, and surfaces blockers early. |
| **Debugging (runtime & static)** | Debug hooks (trace IDs, step‑state snapshots) are part of the *execution contract*. Agent Assistant already med
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target: Core Customer‑Success (CS) teams, 3 months, 2 sprints per month (6 sprints total)*  

---

## 1. Slice Objective
Enable CS agents to **create, validate, and publish** AgentWorkforce (AW) workflows without developer hand‑off, while guaranteeing **deterministic reliability** and **traceability** for every new workflow artifact.

---

## 2. Success Metrics (adoption KPIs)

| Metric | Target (by end of slice) |
|--------|--------------------------|
| % of CS agents able to author a workflow end‑to‑end | **≥ 85 %** |
| Avg. time from workflow idea → production PR | **≤ 3 days** |
| Validation gate pass rate (all 80‑100 loops) | **≥ 95 %** |
| Blocker escalation rate (issues > 48 h) | **≤ 2 %** |
| NPS of CS agents on workflow tooling | **≥ 8/10** |

---

## 3. Scope & Boundaries

| In‑Scope | Out‑of‑Scope |
|----------|--------------|
| **Authoring UI** – new “Ricky Studio” web UI (form‑driven, schema‑guided) | Full‑stack refactor of existing AW backend |
| **Deterministic contracts** – JSON‑Schema + OpenAPI for each workflow step | Custom AI‑generated code beyond the contract |
| **Verification gates** – lint, unit, 
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky – Future Support Interface (Slack & Web)

Below is a **contract‑first design** for extending Ricky’s support capabilities through two delivery channels:

| Channel | Package | Core Capabilities | Contract (API) | Verification Gate | Review Artifact |
|---------|---------|-------------------|----------------|-------------------|-----------------|
| **Slack** | `ricky-slack‑agent` | • Real‑time query handling  <br>• Escalation to human SME <br>• Context‑preserving thread state | `POST /slack/events`  (JSON)  <br>• `event.type` ∈ {`message`, `app_mention`} <br>• `event.text` ≤ 500 chars <br>• `event.thread_ts` optional | 1️⃣ Schema validation (OpenAPI 3.0) <br>2️⃣ Unit test: 100% of event types return a deterministic `action` field <br>3️⃣ End‑to‑end: simulated Slack workspace runs 10k events with < 1 % failure | OpenAPI spec + generated SDK (Python/TS) |
| **Web** | `ricky‑web‑ui` | • Self‑service knowledge base <br>• Guided workflow wizard <br>• Ticket creation & status view | `GET /ui/v1/knowledge`  (JSON) <br>`POST /ui/v1/ticket`  (JSON) <br>• Request/response schemas versioned (`v1`) | 1️⃣ Contract test against mock server (Pact) <br>2️⃣ UI‑component snapshot diff (Chrom
...[truncated]

Must

  • Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
  • Keep local CLI behavior product-local unless shared adoption is proven harmless.
  • Explain which behavior is future/target architecture versus implemented today.

Must Not

  • Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
  • Let future surface abstractions distort the current CLI contract.
  • Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What the command does**

Running  

```bash
ricky --mode local --spec "generate a workflow for package checks"
  • --mode local – tells Ricky to operate against the local repository (no remote CI/CD integration).
  • --spec "generate a workflow for package checks" – is the natural‑language request that Ricky will turn into a workflow contract (a YAML/JSON description that follows the AgentWorkforce schema).
  • No --run flag – means Ricky will stop after authoring the workflow; it will not submit a PR, trigger any verification gates, or execute the workflow.

So the command will:

  1. Parse the spec and map it to the standard workflow template (e.g., package-checks.yml).
  2. Create a branch (e.g., ricky/package-checks‑<timestamp>) in the local repo.
  3. Write the workflow file under .github/workflows/ (or the configured workflow directory).
  4. Commit the file with a conventional commit message (feat(workflow): add package‑checks workflow).
  5. Print a summary to the console, showing:
    • The branch name
    • The diff of the new workflow file
    • Instructions on how to review, test, and merge (e.g., `git checkout <
      ...[truncated]

**Must**

- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.

**Must Not**

- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

REVIEW cli.first-run-copy-is-compact-and-truthful (cli-behavior/manual)

**User message**

> Render Ricky's first-run CLI onboarding for a new user.

**Ricky output**

```text
**Ricky – First‑Run CLI Onboarding**

Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the **Ricky** CLI. It follows the repository conventions (AGENTS.md), includes deterministic verification gates, and is scoped to a single feature branch (`onboarding‑first‑run`).  

---  

## 1️⃣ Repository Setup (one‑time)

```bash
# Clone the AgentWorkforce monorepo (if not already)
git clone https://github.com/AgentWorkforce/agentworkforce.git
cd agentworkforce

# Create a dedicated onboarding branch
git checkout -b onboarding-first-run

2️⃣ Install the Ricky CLI

# Use the official installer (works on macOS, Linux, Windows WSL)
curl -fsSL https://ricky.agentworkforce.com/install.sh | bash

# Verify installation (deterministic gate)
ricky --version | grep -E '^ricky version [0-9]+\.[0-9]+\.[0-9]+$' && echo "✅ CLI installed"

Gate 1 – Installation verification
If the version string does not match the regex, abort and open an issue: #onboarding‑install‑failure.

3️⃣ Initialise a New Ricky Project

# Initialise a fresh project in the current directory
ricky init my‑first‑workflow

# Expected output (deterministic):
#   • Cr
...[truncated]

Must

  • Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
  • End every branch with a concrete next step.
  • Advertise only commands that are currently implemented.

Must Not

  • Sound like a launch page or documentation dump.
  • Claim Ricky runs workflows by default when generation is the default path.
  • Require web or Slack onboarding before CLI use.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.recovery-guidance-no-stack-traces (cli-behavior/manual)

User message

A user gives Ricky an empty spec or a missing spec file.

Ricky output

**Ricky’s Standard Response to an Empty or Missing Spec File**

| Phase | Action | Deliverable | Success Gate |
|-------|--------|-------------|--------------|
| 1️⃣ Detect | • Verify the spec path exists in the repository.<br>• If the file is present, read its contents and check for a non‑empty JSON/YAML object. | **Detection Log** – `spec‑status.md` (found | empty | missing). | File must be **present && non‑empty** to proceed. |
| 2️⃣ Blocker Report | • If the spec is empty or missing, automatically open a **blocking issue** in the repo’s `#workflow‑issues` project. | **Issue #\<id\>** – “Spec <path> is empty/missing – workflow cannot be generated.” | Issue is created and labeled `blocker`, `spec‑missing`. |
| 3️⃣ Request Spec | • Post a templated comment on the issue tagging the spec owner(s) and linking the workflow that depends on it. | **Comment** – “@owner Please provide a valid spec (JSON/YAML) that conforms to `workflow‑spec‑v2`. See `docs/spec‑template.md` for guidance.” | Owner acknowledges the request. |
| 4️⃣ Placeholder Branch (optional) | • Create a short‑lived branch `spec‑placeholder/<workflow‑name>` containing a **minimal stub spec** that satisfies the schema (e.g
...[truncated]

Must

  • Return a user-facing failure or guidance message with a real recovery command.
  • Distinguish generation failure from execution failure.
  • Show stack traces only when verbose diagnostic mode is requested.

Must Not

  • Crash with an uncaught exception in normal mode.
  • Suggest commands that do not exist.
  • Pretend a missing spec was accepted.
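
A minimal TypeScript sketch of the precheck these rules describe. `checkSpecFile`, its result shape, and the `verbose` switch are hypothetical. The recovery command reuses the `ricky --mode local --spec-file <path>` form from the PR's repro. The `stage` field marks the problem as a generation-time failure, so it cannot be confused with an execution failure.

```typescript
import * as fs from "node:fs";

// Hypothetical result shape: spec problems become guidance, never an uncaught throw.
type SpecCheck =
  | { ok: true; text: string }
  | { ok: false; stage: "generation"; reason: "missing" | "empty"; message: string; exitCode: number };

// Checks a spec before generation starts; stack traces appear only when verbose is set.
function checkSpecFile(specPath: string, verbose = false): SpecCheck {
  let text: string;
  try {
    text = fs.readFileSync(specPath, "utf8");
  } catch (err) {
    const trace = verbose && err instanceof Error && err.stack ? `\n${err.stack}` : "";
    return {
      ok: false,
      stage: "generation",
      reason: "missing",
      message: `Spec file not found or unreadable: ${specPath}\nNext step: create it, then run: ricky --mode local --spec-file ${specPath}${trace}`,
      exitCode: 1,
    };
  }
  if (text.trim() === "") {
    return {
      ok: false,
      stage: "generation",
      reason: "empty",
      message: `Spec file is empty: ${specPath}\nNext step: describe the workflow in it, then run: ricky --mode local --spec-file ${specPath}`,
      exitCode: 1,
    };
  }
  return { ok: true, text };
}
```

The caller prints `message` and exits with `exitCode`. Only the verbose path carries the original error's stack.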

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**  

=== AgentWorkforce Status Report ===

Providers:
• Google : DISCONNECTED (no successful health check)
• GitHub : DISCONNECTED (no successful health check)

Workflow Engine:
• Scheduler : RUNNING
• Queue depth : 0
• Last heartbeat:

Active Branches:
• main (up‑to‑date)
• feature/* (no pending PRs)

Recent Logs:

  • 2026‑05‑10 09:12:03 INFO Provider health checks initiated
  • 2026‑05‑10 09:12:07 WARN Google provider check failed (timeout)
  • 2026‑05‑10 09:12:09 WARN GitHub provider check failed (authentication error)

Next Actions:
• Verify OAuth credentials for Google
• Verify PAT/token for GitHub
• Re‑run ricky provider check --all after remediation


*Interpretation*: Both Google and GitHub integrations are currently **disconnected** because no health‑check succeeded. Resolve the credential or network issues indicated in the “Next Actions” section, then re‑run the provider checks to bring the services back online.

Must

  • Report unknown or not-connected provider state honestly.
  • Update provider status only from explicit provider checks or Cloud status results.
  • Give concrete setup guidance for Cloud when relevant.

Must Not

  • Mark Google or GitHub connected because guidance text was shown.
  • Invent a provider connection URL or OAuth flow.
  • Show empty fields with no recovery guidance when config is missing.
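
A sketch of the status rule as a pure function, assuming hypothetical provider names and check-result fields. Only recorded results from an explicit provider check or a Cloud status call can move a provider out of `unknown`. Rendering guidance text never changes the state.

```typescript
// Hypothetical shapes: only explicit provider checks or Cloud status results
// may move a provider out of the "unknown" state.
type ProviderName = "google" | "github";
type ProviderState = "unknown" | "connected" | "not-connected";

interface ProviderCheckResult {
  provider: ProviderName;
  source: "provider-check" | "cloud-status";
  ok: boolean;
  checkedAt: string; // ISO-8601 timestamp
}

interface ProviderStatusLine {
  provider: ProviderName;
  state: ProviderState;
  note?: string;
}

function providerStatus(checks: ProviderCheckResult[]): ProviderStatusLine[] {
  const providers: ProviderName[] = ["google", "github"];
  return providers.map((provider): ProviderStatusLine => {
    // Pick the most recent explicit result for this provider, if one exists.
    const latest = checks
      .filter((c) => c.provider === provider)
      .sort((a, b) => (a.checkedAt < b.checkedAt ? -1 : a.checkedAt > b.checkedAt ? 1 : 0))
      .pop();
    if (!latest) {
      return { provider, state: "unknown", note: "No provider check has run, so the connection state is unknown." };
    }
    return latest.ok
      ? { provider, state: "connected" }
      : { provider, state: "not-connected", note: `Last ${latest.source} result failed at ${latest.checkedAt}.` };
  });
}
```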

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)

User message

Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

Ricky output

**Workflow: “GitHub Primitive Change”**  
*Purpose*: Apply a small, self‑contained change to a repository (e.g., typo fix, comment update, config tweak) while guaranteeing traceability, deterministic verification, and safe merge.  

---  

## 1. Scope & Branching Contract  

| Item | Definition |
|------|------------|
| **Change Type** | Primitive (single‑file, < 50 lines, no API impact) |
| **Target Branch** | `main` (or protected default) |
| **Feature Branch** | `pr/<issue‑id>-<short‑desc>` – created from the latest `main` |
| **PR Title** | `fix: <short description>` – must include `Fixes #<issue‑id>` |
| **PR Description** | • Issue link<br>• Before/after snippet<br>• Test plan (if applicable) |
| **Merge Strategy** | **Rebase‑and‑merge** – keep linear history |
| **Protected Rules** | Required status checks: `lint`, `unit‑test`, `code‑review` (≥1 approver) |

*Contract enforcement*: CI will reject any PR that does not meet the naming or description pattern (regex checks).  

---  

## 2. Step‑by‑Step Workflow (deterministic gates highlighted)

| Step | Actor | Action | Verification Gate |
|------|-------|--------|--------------------|
| **0 – Issue Creation** | Reporter | Ope
...[truncated]

Must

  • Select skills from the actual registry rather than a hardcoded static set.
  • Record ranked skill matches with confidence and trigger evidence.
  • Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

  • Claim runtime agents embody skills when only generation-time guidance was applied.
  • Fail generation solely because optional skill files are missing.
  • Hide skill selection evidence from artifacts or JSON output.
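
One way to meet these rules, sketched in TypeScript under stated assumptions. Registry entries carry a hypothetical `triggers` keyword list, confidence is the fraction of triggers found in the spec, and the 0.3 threshold is arbitrary. What matters is that ranking, evidence, and the fallback warning all come from registry data rather than a hardcoded skill set.

```typescript
interface SkillEntry {
  id: string;
  triggers: string[]; // lowercase keywords suggesting the skill applies (assumed field)
}

interface SkillMatch {
  id: string;
  confidence: number; // fraction of the skill's triggers found in the spec
  evidence: string[]; // triggers that actually matched
}

interface SkillSelection {
  matches: SkillMatch[];
  warning?: string;
}

function matchSkills(spec: string, registry: SkillEntry[] | undefined, threshold = 0.3): SkillSelection {
  if (!registry || registry.length === 0) {
    // A missing registry degrades to a warning instead of failing generation.
    return { matches: [], warning: "Skill registry unavailable; generated without skill guidance." };
  }
  const text = spec.toLowerCase();
  const matches = registry
    .map((skill): SkillMatch => {
      const evidence = skill.triggers.filter((t) => text.includes(t));
      const confidence = skill.triggers.length > 0 ? evidence.length / skill.triggers.length : 0;
      return { id: skill.id, confidence, evidence };
    })
    .filter((m) => m.confidence >= threshold)
    .sort((a, b) => b.confidence - a.confidence);
  if (matches.length === 0) {
    return { matches, warning: `No skill cleared the ${threshold} confidence threshold.` };
  }
  return { matches };
}
```

The returned `matches` array (id, confidence, evidence) is the kind of record that could be written into artifacts or JSON output.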

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow Title:** `review‑and‑implement‑claude‑codex`

**Purpose**  
Automate a two‑stage development cycle where Claude (LLM‑reviewer) validates design/spec changes and Codex (LLM‑coder) generates the implementation. The workflow is deterministic, auditable, and bounded to a single feature branch.

---

## 1. Contract (YAML)

```yaml
name: review-and-implement-claude-codex
description: |
  Takes a feature spec, runs Claude for a textual review,
  then passes the approved review to Codex for code generation.
  All artifacts are persisted and gated before merge.
trigger:
  - pull_request
  - workflow_dispatch
inputs:
  spec_path:
    type: string
    description: Relative path to the markdown spec file
    required: true
  target_branch:
    type: string
    description: Branch where generated code will be committed
    default: feature/claude-codex
outputs:
  review_report:
    description: Path to Claude’s review markdown
  implementation_pr:
    description: URL of the PR created by Codex
environment:
  python: "3.11"
  node: "20"
resources:
  cpu: "2"
  memory: "4Gi"

2. Step‑by‑Step Execution Plan

| Step | Agent | Action | Deterministic Gate | Artifact |
|------|
...[truncated]


**Must**

- Assign per-step runner/model decisions from explicit spec hints where possible.
- Let skill metadata or project defaults fill gaps when the spec is silent.
- Write `tool-selection.json` or equivalent audit metadata.

**Must Not**

- Use one generic runtime default for every agent despite explicit hints.
- Let skill defaults override explicit user/spec runner hints.
- Omit the reason each tool or model was chosen.
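
The precedence these rules imply can be sketched as a small resolver. An explicit spec hint wins, a skill default fills the gap, and the project default comes last, with the reason recorded for each step. The step shape, runner names, and hint parsing are hypothetical. The returned array stands in for what an audit file such as `tool-selection.json` might hold.

```typescript
// Hypothetical step input: specHint would come from phrases like "use Claude to review".
interface StepToolInput {
  stepId: string;
  specHint?: string;
  skillDefault?: string;
}

interface ToolDecision {
  stepId: string;
  runner: string;
  reason: "spec-hint" | "skill-default" | "project-default";
}

// Explicit spec hints always beat skill metadata; the project default only fills gaps.
function selectTools(steps: StepToolInput[], projectDefault: string): ToolDecision[] {
  return steps.map((step): ToolDecision => {
    if (step.specHint) return { stepId: step.stepId, runner: step.specHint, reason: "spec-hint" };
    if (step.skillDefault) return { stepId: step.stepId, runner: step.skillDefault, reason: "skill-default" };
    return { stepId: step.stepId, runner: projectDefault, reason: "project-default" };
  });
}

const decisions = selectTools(
  [
    { stepId: "review", specHint: "claude", skillDefault: "codex" },
    { stepId: "implement", specHint: "codex" },
    { stepId: "summarize", skillDefault: "claude" },
    { stepId: "verify" },
  ],
  "codex",
);
// Audit payload, e.g. for tool-selection.json.
console.log(JSON.stringify(decisions, null, 2));
```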

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.refine-is-opt-in-and-bounded</code> (generation-quality/manual)</summary>

**User message**

> Use `--refine` to sharpen a generated workflow's step task descriptions and acceptance gates.

**Ricky output**

```text
## Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is a deterministic post‑processing flag that takes an existing workflow JSON/YAML and rewrites the **`task.description`** and **`task.acceptanceGate`** fields for every step. The goal is to make each step **action‑oriented, measurable, and verifiable** while keeping the original logical flow intact.

Below is a concise contract, the verification gate checklist, and a step‑by‑step usage guide you can copy‑paste into your CI pipeline or local dev loop.

---

### 1. Contract for `--refine`

| Input | Required | Format |
|-------|----------|--------|
| `workflowFile` | ✅ | Path to a valid workflow definition (`.json` or `.yaml`). |
| `outputFile` | ✅ | Destination path for the refined workflow. |
| `options` | ❌ | Optional JSON object: `{ "maxLength": 200, "strictMode": true }`. |

**Output** – A workflow file identical to the input except that every step’s `task.description` and `task.acceptanceGate` are replaced with refined, deterministic text that satisfies the **Refinement Validation Gate** (see §3).

---

### 2. Invocation Syntax

```bash
# Basic usage (writes to a new file)
ricky workflow refine \

...[truncated]

Must

  • Keep the deterministic unrefined path as the default.
  • Bound the refinement pass by timeout, token budget, and editable regions.
  • Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

Must Not

  • Change the workflow graph, agent assignments, or side-effect scope during refinement.
  • Fail the whole generation if optional refinement times out.
  • Hide warnings when the deterministic artifact is returned unchanged.
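
A sketch of a bounded, opt-in refinement wrapper, assuming a hypothetical `refiner` callback and a simplified step shape. It enforces the timeout itself and checks that step ids, order, and agents are unchanged before accepting a result. A token budget would have to be enforced inside the refiner, which this sketch does not model.

```typescript
// Hypothetical simplified workflow shape: refinement may only rewrite `task` text.
interface RefinableStep {
  id: string;
  agent: string;
  task: string;
}

interface RefineOutcome {
  workflow: RefinableStep[];
  refined: boolean;
  warnings: string[];
}

async function maybeRefine(
  original: RefinableStep[],
  refineEnabled: boolean,
  refiner: (steps: RefinableStep[]) => Promise<RefinableStep[]>,
  timeoutMs = 30_000,
): Promise<RefineOutcome> {
  // Default path: the deterministic artifact, untouched.
  if (!refineEnabled) return { workflow: original, refined: false, warnings: [] };

  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${timeoutMs}ms`)), timeoutMs);
  });
  try {
    const candidate = await Promise.race([refiner(original), timeout]);
    // Re-validate: same steps in the same order with the same agents.
    const sameGraph =
      candidate.length === original.length &&
      candidate.every((s, i) => s.id === original[i].id && s.agent === original[i].agent);
    if (!sameGraph) {
      return { workflow: original, refined: false, warnings: ["Refinement changed the workflow graph; kept the deterministic artifact."] };
    }
    return { workflow: candidate, refined: true, warnings: [] };
  } catch (err) {
    // Provider failure or timeout: fall back with a visible warning.
    const reason = err instanceof Error ? err.message : String(err);
    return { workflow: original, refined: false, warnings: [`Refinement skipped (${reason}); kept the deterministic artifact.`] };
  } finally {
    clearTimeout(timer);
  }
}
```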

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.behavior-grounded-gates (generation-quality/manual)

User message

Generate a workflow for the ricky --version spec.

Ricky output

**Workflow: `ricky --version` Specification & Release**  
*Repository:* `github.com/agentworkforce/ricky`  
*Target Branch:* `main` → `release/vX.Y.Z` (semantic version bump)  

---

## 1. Scope & Objective
Produce a deterministic, test‑driven workflow that validates the `ricky --version` command, guarantees correct version string formatting, and publishes the new version tag. The workflow must be reproducible, auditable, and safe to run in CI/CD pipelines.

---

## 2. Contract (Inputs / Outputs)

| Item | Type | Description |
|------|------|-------------|
| `VERSION` | `semver` (e.g., `1.4.2`) | New version to be released. |
| `BUILD_ID` | `string` (CI generated) | Unique identifier for the build. |
| `GIT_TAG` | `string` (`v${VERSION}`) | Git tag created after successful verification. |
| `VERSION_OUTPUT` | `string` | Exact stdout of `ricky --version`. Must match `ricky version ${VERSION}`. |
| `ARTIFACTS` | `zip` | Binary distribution (`ricky-${VERSION}-${OS}-${ARCH}.zip`). |

**Success Criteria**  
- `VERSION_OUTPUT` matches regex `^ricky version ${VERSION}$`.  
- All verification gates (unit, integration, smoke) pass.  
- Tag `v${VERSION}` is pushed and signed.  
- Release ass
...[truncated]

Must

  • Build gates from the stated acceptance behavior, such as checking ricky --version output.
  • Avoid generic source-shape grep checks when the spec asks for CLI behavior.
  • Keep generated validation meaningful for the current repo shape.

Must Not

  • Treat grep -Eq 'export|function|class|workflow(' dist/ricky.js as proof of version behavior.
  • Claim the workflow is proven by source syntax alone.
  • Ignore the package-json version resolution order in the spec.
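
A behavior-grounded version gate can be sketched as two small functions. One runs the built CLI, and the other compares its real `--version` output with `package.json`. The sketch assumes the CLI prints the version as a whitespace-separated token (optionally `v`-prefixed) and that the built file runs under Node. It does not reproduce the version resolution order the spec defines.

```typescript
import { execFileSync } from "node:child_process";

interface GateResult {
  pass: boolean;
  detail: string;
}

// Compares real CLI output with the version declared in package.json.
function versionGateFromOutput(stdout: string, packageJsonText: string): GateResult {
  const expected: unknown = JSON.parse(packageJsonText).version;
  if (typeof expected !== "string" || expected === "") {
    return { pass: false, detail: "package.json declares no version" };
  }
  // Accept "1.4.2" or "v1.4.2" as a whole token, so "1.4.20" does not match "1.4.2".
  const pass = stdout.trim().split(/\s+/).some((token) => token.replace(/^v/, "") === expected);
  return { pass, detail: pass ? `found ${expected}` : `expected ${expected}, got "${stdout.trim()}"` };
}

// Runs the built CLI with Node (the path is supplied by the caller) and gates on its output.
function runVersionGate(cliPath: string, packageJsonText: string): GateResult {
  const stdout = execFileSync(process.execPath, [cliPath, "--version"], { encoding: "utf8" });
  return versionGateFromOutput(stdout, packageJsonText);
}
```

In a generated workflow, the gate step would call something like `runVersionGate("dist/ricky.js", packageJsonText)` instead of grepping `dist/ricky.js` for `export` or `function`.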

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact Validation‑Fix‑Rerun Loop”**  
*Purpose*: Process a large set of independent artifacts (e.g., data files, config bundles, code modules) through deterministic validation, automatic fixing where possible, and controlled re‑execution until all pass or a blocker is reported.

---

## 1. Scope & Boundaries
| Item | Description |
|------|-------------|
| **Branch** | `feature/batch‑artifact‑validation` (isolated from `main`) |
| **PR Target** | `main` – only after **100 % pass** on the batch |
| **Artifacts** | Any files under `artifacts/` that are self‑contained (no cross‑artifact dependencies). |
| **Loop Limit** | Max **80–100** validation‑fix‑rerun cycles per artifact (configurable). |
| **Blocker Policy** | If an artifact exceeds the loop limit or a non‑recoverable error occurs, the workflow aborts and creates a **Blocker Issue** with full diagnostics. |

---

## 2. Contract (Inputs / Outputs)

| Contract | Type | Details |
|----------|------|---------|
| **Input** | `artifact_set` | JSON array of artifact identifiers (paths relative to repo root). |
| **Output** | `validation_report` | JSON summary: `{artifact: {status: "PASS|FIXED|FAIL", attempts: N, lo
...[truncated]

Must

  • Choose dag, supervisor, or pipeline deliberately based on the work shape.
  • Explain the pattern choice in artifact metadata or a rationale.
  • Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

  • Default blindly to dag for every workflow.
  • Collapse independent artifact work into one vague agent task.
  • Omit verification gates because the chosen pattern seems obvious.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow  

Below is a concise “skill‑impact map” that shows the concrete ways each of the core Ricky authoring skills influences the structure, reliability, and reviewability of a generated workflow. The map is illustrated with a short, end‑to‑end example workflow (a **Customer‑Onboarding** process) that we built using those skills.

| **Skill** | **What the Skill Enforces** | **Resulting Artifact in the Workflow** | **Why It Improves Reliability** |
|----------|----------------------------|----------------------------------------|---------------------------------|
| **Deterministic Contract Definition** | Every step declares an explicit **input‑output contract** (type, required fields, optional defaults). | `step: verify‑email` <br> `input: { email: string }` <br> `output: { isValid: boolean, reason?: string }` | Guarantees that downstream steps receive exactly what they expect, eliminating “missing‑field” runtime errors. |
| **Verification Gates** | After each step, a **gate** runs a deterministic predicate on the step’s output before allowing the next step to start. | `gate: email‑valid‑gate` <br> `condition: outp
...[truncated]

Must

  • Describe skills as generation-time selection, loading, and rendering inputs.
  • Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
  • Say the workflow instructions were informed by selected skills.

Must Not

  • Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
  • Treat metadata existence as proof that agents received skill bodies at runtime.
  • Overstate current tests beyond the generation boundary.
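
Copy rules like these can be enforced with a small lint over generated descriptions. This sketch flags the phrasings the Must Not list names. The regex list is an assumption and would miss paraphrases, so it complements review rather than replacing it.

```typescript
// Phrasings named in the Must Not list above; case-insensitive and deliberately narrow.
const RUNTIME_SKILL_CLAIMS: RegExp[] = [
  /\bskill-embedded\b/i,
  /\bskill-powered at runtime\b/i,
  /\bembod(?:y|ies|ied)\b[^.]*\bskills?\b/i,
];

// Returns the patterns a piece of generated copy trips, or [] if it stays within the boundary.
function findRuntimeSkillOverclaims(copy: string): string[] {
  return RUNTIME_SKILL_CLAIMS.filter((re) => re.test(copy)).map((re) => re.source);
}

const ok = "Workflow instructions were informed by the selected skills (see loaded-skills.txt).";
const bad = "At runtime the agents embody the selected skills.";
console.log(findRuntimeSkillOverclaims(ok)); // []
console.log(findRuntimeSkillOverclaims(bad).length); // 1
```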

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

**Ricky – Workflow Generation Assistant**

---

### 📋 Initial Assessment
The product specification you provided is too vague to create a concrete, verifiable workflow. To deliver a reliable, testable workflow we need the following details:

| Required Detail | Why It’s Needed |
|-----------------|-----------------|
| **Goal / Outcome** | Defines the final deliverable the workflow must achieve. |
| **Key Actors / Roles** | Determines who performs each step (e.g., Engineer, QA, Product Owner). |
| **Core Steps / Milestones** | Establishes the sequence of work items and hand‑offs. |
| **Inputs & Outputs** | Enables deterministic verification gates (e.g., artifact A → artifact B). |
| **Success Criteria / Acceptance Tests** | Provides the validation loop for the 80‑100% verification target. |
| **Dependencies / External Services** | Allows us to scope branch/PR boundaries and identify blockers. |
| **Compliance / Review Gates** | Determines required reviews (code, design, security, etc.). |
| **Timeline / SLA (optional)** | Helps size iteration cycles and gating windows. |

---

### 📐 What the Result Will Look Like (once we have the details)

1. **Workflow Contract (YAML)**
   ```yam
...[truncated]

Must

  • Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
  • Validate with dry-run or targeted structural checks where possible.
  • Return artifacts, warnings, and follow-up commands honestly.

Must Not

  • Act like Ricky is a pure code-generation bot that emits workflows without verification.
  • Stop at "code compiles" as the proof bar.
  • Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s Immediate Response Plan**

| Phase | Action | Why | Deliverable / Artifact |
|-------|--------|-----|------------------------|
| 1️⃣ Detect & Log | • Pull the execution trace from the orchestration engine (e.g., Airflow, Temporal). <br>• Identify the exact task that timed‑out and the worker that remained alive. | Provides a deterministic source of truth for the failure. | `failure‑trace.json` (timestamped, immutable) |
| 2️⃣ Blocker Report | • Open a **Blocker Issue** in the workflow repo (e.g., `workflow‑<name>-blocker-<date>.md`). <br>• Tag the issue with `#blocker`, `#timeout`, `#orphan‑worker`. <br>• Include the trace, logs, and a short “impact” summary (e.g., “down‑stream artifact missing → downstream pipelines halted”). | Guarantees visibility to all stakeholders and stops any downstream merges that depend on the broken contract. | GitHub Issue #1234 (or equivalent) |
| 3️⃣ Isolate & Quarantine | • Pause the affected workflow run (or cancel the DAG if the platform permits). <br>• Mark the live worker as *quarantined* in the worker registry so it cannot be scheduled again until the root cause is resolved. | Prevents the same faulty state from propagating. | Updated `
...[truncated]

Must

  • Classify the failure before deciding whether to retry.
  • Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
  • Preserve raw evidence and uncertainty when the class is not known.

Must Not

  • Blindly rerun the whole workflow without a blocker classification.
  • Treat every failure as a broken workflow definition.
  • Claim Ricky fixed the workflow before rerun evidence proves it.
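
A sketch of classification before retry. The class names follow the list above. The evidence fields and the retry policy attached to each class are illustrative assumptions, not Ricky's actual rules. The raw evidence is returned with the verdict so uncertainty stays visible.

```typescript
type FailureClass =
  | "agent-runtime-opacity"
  | "timeout"
  | "environment-blocker"
  | "workflow-structure-bug"
  | "validation-strategy-mismatch"
  | "unknown";

// Hypothetical evidence gathered from the failed run.
interface FailureEvidence {
  stepTimedOut: boolean;
  workerAliveWithoutArtifacts: boolean;
  environmentErrors: string[]; // e.g. missing binary or env var
  structuralErrors: string[];  // e.g. ReferenceError at module load
  gateRejectedValidOutput: boolean;
  rawLog: string;
}

interface Classification {
  classes: FailureClass[];
  retry: "allowed" | "after-repair" | "no";
  evidence: FailureEvidence; // preserved verbatim
}

function classifyFailure(ev: FailureEvidence): Classification {
  const classes: FailureClass[] = [];
  if (ev.environmentErrors.length > 0) classes.push("environment-blocker");
  if (ev.structuralErrors.length > 0) classes.push("workflow-structure-bug");
  if (ev.stepTimedOut) classes.push("timeout");
  if (ev.workerAliveWithoutArtifacts) classes.push("agent-runtime-opacity");
  if (ev.gateRejectedValidOutput) classes.push("validation-strategy-mismatch");
  if (classes.length === 0) classes.push("unknown");

  // Retry is decided only after classification (illustrative policy).
  const has = (c: FailureClass) => classes.includes(c);
  let retry: Classification["retry"] = "allowed";
  if (has("environment-blocker") || has("unknown")) retry = "no";
  else if (has("workflow-structure-bug") || has("validation-strategy-mismatch") || has("agent-runtime-opacity")) retry = "after-repair";
  return { classes, retry, evidence: ev };
}
```

In the scenario from the user message, a timed-out step plus a live worker with no artifacts yields `["timeout", "agent-runtime-opacity"]` and no blind rerun.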

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow Contract – Pre‑Launch Stale‑State Guard**

| **Element** | **Definition** |
|-------------|----------------|
| **Name** | `guard‑stale‑state` |
| **Scope** | Local development runs of any `.agent‑relay/`, `.relay/`, or `.trajectories/` workflow. |
| **Inputs** | <ul><li>`WORKDIR` – absolute path to the repository root (string).</li></ul> |
| **Outputs** | <ul><li>`STALE_FOUND` – boolean flag (`true` if any stale directory is detected).</li><li>`STALE_PATHS` – newline‑separated list of stale directories (string).</li></ul> |
| **Success Criteria** | The guard returns `STALE_FOUND = false`. If `true`, the launch is aborted and a **Blocker Report** is emitted. |
| **Deterministic Verification Gate** | The guard must complete in ≤ 200 ms on a typical dev machine (≤ 2 CPU, 8 GB RAM). Timing is asserted by the CI verification step `verify‑guard‑performance`. |
| **Idempotence** | Re‑running the guard without modifying the filesystem must yield the same `STALE_FOUND`/`STALE_PATHS` values. |
| **Error Handling** | Any unexpected I/O error is treated as a hard failure and escalates to the **Workflow Coordinator** for manual triage. |

---

### 1. Detection Logic (implementation s
...[truncated]

Must

  • Classify stale local runtime state as an environment contamination issue.
  • Recommend quarantine or isolated-run guidance before launch.
  • Record the observed paths and the action taken or recommended.

Must Not

  • Treat stale runtime state as a workflow logic failure.
  • Delete or overwrite state without an explicit safe path or user intent.
  • Continue into execution as if the workspace were clean.
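
A read-only detection sketch. The three directory names come from the case above. Treating any existing directory as potentially stale, and the report fields, are assumptions. The function records what it saw and recommends isolation, but deletes nothing.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Directory names from the case above; any that exist are treated as potentially stale (assumption).
const RELAY_STATE_DIRS = [".agent-relay", ".relay", ".trajectories"];

interface StaleStateReport {
  classification: "clean" | "environment-contamination";
  observedPaths: string[];
  recommendation: string;
}

// Read-only: reports what exists and recommends isolation, but never deletes anything.
function detectStaleRelayState(workdir: string): StaleStateReport {
  const observedPaths = RELAY_STATE_DIRS.map((dir) => path.join(workdir, dir)).filter((p) => fs.existsSync(p));
  if (observedPaths.length === 0) {
    return { classification: "clean", observedPaths, recommendation: "No local relay state found; launch can proceed." };
  }
  return {
    classification: "environment-contamination",
    observedPaths,
    recommendation: "Quarantine these paths (move them aside) or launch from an isolated copy of the workspace before running.",
  };
}
```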

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**Fixing “another Ricky or Relay run is already active in this workspace”**

The message means a *run‑marker* file is still present (or a stray process is holding the lock). Follow the deterministic clean‑up contract below; each step is verifiable and idempotent.

---

## 1️⃣ Verify the current state

```bash
# Show the marker (if any) and its contents
cat .ricky/run_marker 2>/dev/null || echo "No marker file"

# List any Ricky/Relay processes owned by you
ps -u $USER -f | grep -E '(ricky|relay)' | grep -v grep

Expected outcome:

  • If the marker file is empty or missing → the error is stale; skip to Step 3.
  • If a process list shows a running ricky/relay PID → note the PID for termination.

2️⃣ Gracefully stop the active run (if it exists)

A. If a PID is found

# Replace <PID> with the number you saw
kill -SIGTERM <PID>
# Wait a few seconds, then verify it exited
sleep 5
ps -p <PID> && echo "Process still alive – escalating" || echo "Process stopped"

If the process persists:

kill -SIGKILL <PID>

B. If no process is listed but the marker remains

The marker may be orphaned (e.g., a crash). Delete it safely:

rm -f 
...[truncated]

Must

  • Report the active marker, run id, or status path when available.
  • Ask the user to inspect, wait for, or explicitly clear the active run.
  • Avoid launching a competing run that could corrupt evidence.

Must Not

  • Silently start another run.
  • Hide the existing run marker from the user.
  • Treat the conflict as a generic failure with no recovery path.
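
A sketch of the conflict check, assuming a hypothetical JSON marker with `runId` and `pid` fields at a caller-supplied path. The marker's real location and format are not specified here. The check reports the marker instead of starting a competing run, and leaves clearing it to the user.

```typescript
import * as fs from "node:fs";

// Hypothetical marker format; the real marker location and fields may differ.
interface RunMarker {
  runId: string;
  pid: number;
}

type LaunchDecision =
  | { action: "launch" }
  | { action: "blocked"; markerPath: string; runId?: string; pidAlive: boolean; message: string };

function isPidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0 only tests whether the process exists
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

// Never starts a competing run and never clears the marker on its own.
function checkRunMarker(markerPath: string): LaunchDecision {
  if (!fs.existsSync(markerPath)) return { action: "launch" };
  let marker: Partial<RunMarker> = {};
  try {
    marker = JSON.parse(fs.readFileSync(markerPath, "utf8"));
  } catch {
    // An unreadable marker is still reported rather than ignored.
  }
  const pidAlive = typeof marker.pid === "number" && marker.pid > 0 && isPidAlive(marker.pid);
  const runLabel = marker.runId ?? "(unknown run id)";
  return {
    action: "blocked",
    markerPath,
    runId: marker.runId,
    pidAlive,
    message: pidAlive
      ? `Run ${runLabel} is still active (pid ${marker.pid}, marker ${markerPath}). Wait for it or stop it, then retry.`
      : `Run marker ${markerPath} (${runLabel}) has no live process. Inspect it and remove it explicitly if that run is abandoned.`,
  };
}
```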

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.env-loader-injection-runtime-loadable (runtime-recovery/manual)

User message

A workflow artifact references a MISSING_ENV_VAR value. Ricky's deterministic auto-fix injects the .env.local / .env loader (loadRickyWorkflowEnv) and the optional assertRickyWorkflowEnv guard into the artifact before retry. The artifact may be a master-rendered workflow whose .step({ command: ... }) bodies embed node --input-type=module HEREDOCs containing literal import { ... } from 'node:fs' / from 'node:path' strings.

Ricky output

**Deterministic auto‑fix for a workflow that references `MISSING_ENV_VAR`**

Below is the exact, minimal change you can apply to any workflow artifact (including a master‑rendered workflow whose `.step({ command: … })` bodies embed HEREDOCs) so that the missing environment variable is injected safely and the workflow can be retried without manual intervention.

---

## 1. Add the environment‑loader utility

Create (or update) **`ricky-workflow-env.js`** at the root of the repository:

```js
// ricky-workflow-env.js
import { config } from 'dotenv';
import { resolve } from 'node:path';
import { existsSync } from 'node:fs';

/**
 * Load .env.local first, then fallback to .env.
 * Returns the loaded environment object.
 */
export function loadRickyWorkflowEnv() {
  const cwd = process.cwd();

  // Prefer .env.local, then .env
  const candidates = ['.env.local', '.env'];
  for (const file of candidates) {
    const fullPath = resolve(cwd, file);
    if (existsSync(fullPath)) {
      config({ path: fullPath, override: true });
      console.info(`[Ricky] Loaded environment from ${file}`);
      return process.env;
    }
  }

  console.warn('[Ricky] No .env.* file found – proceeding with 
...[truncated]

Must

  • Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected loadRickyWorkflowEnv body references rickyWorkflowFs.* and rickyWorkflowPath.*, so the repair must also add the corresponding import * as rickyWorkflowFs from 'node:fs' and import * as rickyWorkflowPath from 'node:path' aliases at module top level.
  • Detect existing alias imports by matching real top-level import * as <alias> from '<module>' statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in .step({ command: ... }) bodies do not count as imports).
  • Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

Must Not

  • Skip adding the rickyWorkflowFs / rickyWorkflowPath aliases because from 'node:fs' or from 'node:path' already appears somewhere in the file as a string literal.
  • Inject loadRickyWorkflowEnv (or assertRickyWorkflowEnv) without the supporting alias imports, which produces a ReferenceError: rickyWorkflowPath is not defined at module load and burns the auto-fix budget on UNSUPPORTED_RUNTIME at runtime-launch.
  • Rewrite or escape the embedded HEREDOC text in step commands.
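
The PR's title says detection now goes through the TypeScript AST. The sketch below is a dependency-free approximation of the same idea, not the PR's code. It walks only module-level code and skips comments, quoted strings, and template literals, so a HEREDOC inside a `.step({ command: ... })` template is ignored. It then collects real `import * as <alias> from '<module>'` declarations. `findTopLevelNamespaceImports` and `ensureRickyAliasImports` are hypothetical names, and regex literals are not handled.

```typescript
interface NamespaceImport {
  alias: string;
  module: string;
}

// Sticky regex: matches `import * as X from 'm'` exactly at lastIndex.
const NAMESPACE_IMPORT = /import\s*\*\s*as\s+([A-Za-z_$][\w$]*)\s+from\s*(['"])([^'"\n]+)\2/y;

// Scans module-level code only, skipping comments, quoted strings, and template
// literals (including nested `${...}` expressions). Regex literals are not handled.
function findTopLevelNamespaceImports(source: string): NamespaceImport[] {
  const found: NamespaceImport[] = [];
  type Frame = { kind: "code"; depth: number } | { kind: "template" };
  const stack: Frame[] = [{ kind: "code", depth: 0 }];
  let i = 0;
  while (i < source.length) {
    const frame = stack[stack.length - 1];
    const ch = source[i];
    if (frame.kind === "template") {
      if (ch === "\\") i += 2;
      else if (ch === "`") { stack.pop(); i++; }
      else if (ch === "$" && source[i + 1] === "{") { stack.push({ kind: "code", depth: 0 }); i += 2; }
      else i++;
      continue;
    }
    if (ch === "/" && source[i + 1] === "/") {
      const end = source.indexOf("\n", i);
      i = end === -1 ? source.length : end;
    } else if (ch === "/" && source[i + 1] === "*") {
      const end = source.indexOf("*/", i + 2);
      i = end === -1 ? source.length : end + 2;
    } else if (ch === "'" || ch === '"') {
      let j = i + 1;
      while (j < source.length && source[j] !== ch && source[j] !== "\n") j += source[j] === "\\" ? 2 : 1;
      i = j + 1;
    } else if (ch === "`") {
      stack.push({ kind: "template" });
      i++;
    } else if (ch === "{" || ch === "(" || ch === "[") {
      frame.depth++;
      i++;
    } else if (ch === "}" || ch === ")" || ch === "]") {
      // A `}` at depth 0 inside `${...}` returns to the enclosing template.
      if (ch === "}" && frame.depth === 0 && stack.length > 1) stack.pop();
      else frame.depth = Math.max(0, frame.depth - 1);
      i++;
    } else {
      const atTopLevel = stack.length === 1 && frame.depth === 0;
      const wordStart = i === 0 || !/[\w$]/.test(source[i - 1]);
      NAMESPACE_IMPORT.lastIndex = i;
      const m = atTopLevel && wordStart && ch === "i" ? NAMESPACE_IMPORT.exec(source) : null;
      if (m) {
        found.push({ alias: m[1], module: m[3] });
        i = NAMESPACE_IMPORT.lastIndex;
      } else {
        i++;
      }
    }
  }
  return found;
}

const RICKY_ALIASES: NamespaceImport[] = [
  { alias: "rickyWorkflowFs", module: "node:fs" },
  { alias: "rickyWorkflowPath", module: "node:path" },
];

// Adds only the alias imports that are really missing. Step command strings,
// including embedded HEREDOCs, are never rewritten.
function ensureRickyAliasImports(source: string): string {
  const existing = findTopLevelNamespaceImports(source);
  const missing = RICKY_ALIASES.filter(
    (req) => !existing.some((e) => e.alias === req.alias && e.module === req.module),
  );
  if (missing.length === 0) return source;
  const block = missing.map((m) => `import * as ${m.alias} from '${m.module}';\n`).join("");
  // Keep a shebang line first if the artifact has one.
  if (source.startsWith("#!")) {
    const nl = source.indexOf("\n");
    if (nl === -1) return `${source}\n${block}`;
    return source.slice(0, nl + 1) + block + source.slice(nl + 1);
  }
  return block + source;
}
```

With the real compiler, the equivalent check would walk `sourceFile.statements` and test each one with `ts.isImportDeclaration` and `ts.isNamespaceImport`. The hand-rolled scanner only avoids that dependency for the sake of a self-contained sketch.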

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Mode**  

Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script). It assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failing workflow artifact is stored in `./artifacts/<workflow‑id>.json`.  

---

## 1️⃣ Preconditions (verify before running)

| Check | Command | Expected outcome |
|-------|---------|------------------|
| CLI version | `aw --version` | `>=2.4.0` |
| Workflow definition present | `ls ./workflows/<workflow‑name>.yaml` | File exists |
| Artifact readable | `cat ./artifacts/<workflow‑id>.json` | Valid JSON |
| Previous run ID known | `echo $PREV_RUN_ID` | Non‑empty string (e.g., `run‑20240530‑01`) |
| Auto‑fix flag enabled in config | `awk '/auto_fix/ {print}' ~/.aw/config.yaml` | `auto_fix: true` |

If any check fails, **block** the run and fix the issue before proceeding.

---

## 2️⃣ Execution Contract

```bash
#!/usr/bin/env bash
set -euo pipefail   # deterministic failure handling
IFS=$'\n\t'

# ---------- INPUTS ----------
WORKFLOW_NAME="my_workflow"                     # <‑‑ replace
WORKFLOW_DEF="./workflows/${WORKFLOW_NAME}.yaml"
ARTIFACT_PATH="./artifa
...[truncated]

Must

  • Use a bounded retry budget and summarize every attempt.
  • Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
  • Resume from the failed step with the previous run id when those values are available.

Must Not

  • Edit arbitrary repository source files as the default auto-fix surface.
  • Keep retrying after the configured max attempts.
  • Lose the single Ricky tracking run id across repair/resume attempts.
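
A synchronous sketch of the bounded loop, with hypothetical `run` and `repairArtifact` callbacks standing in for the runtime and the workflow persona. It keeps one tracking id for every attempt, resumes from the failed step with the previous run id when both are known, and stops at the budget. Setting `autoFix` to false collapses it to a single attempt with no repair.

```typescript
interface ResumePoint {
  fromStep: string;
  previousRunId: string;
}

interface AttemptResult {
  ok: boolean;
  runId: string; // run id reported by the runtime for this attempt
  failedStep?: string;
  blocker?: string;
}

interface AutoFixOptions {
  trackingRunId: string; // single Ricky tracking id shared by every attempt
  maxAttempts: number;
  autoFix: boolean; // false: exactly one attempt and no repair
  run: (resume?: ResumePoint) => AttemptResult;
  repairArtifact: (failure: AttemptResult) => boolean; // true if the artifact was repaired
}

interface AutoFixReport {
  trackingRunId: string;
  ok: boolean;
  attempts: Array<AttemptResult & { attempt: number }>; // one summary per attempt
}

function runWithAutoFix(opts: AutoFixOptions): AutoFixReport {
  const budget = opts.autoFix ? Math.max(1, opts.maxAttempts) : 1;
  const attempts: AutoFixReport["attempts"] = [];
  let resume: ResumePoint | undefined;
  for (let attempt = 1; attempt <= budget; attempt++) {
    const result = opts.run(resume);
    attempts.push({ ...result, attempt });
    if (result.ok) return { trackingRunId: opts.trackingRunId, ok: true, attempts };
    // No repair after the last allowed attempt, and stop if the artifact is not repairable.
    if (attempt === budget || !opts.repairArtifact(result)) break;
    resume = result.failedStep ? { fromStep: result.failedStep, previousRunId: result.runId } : undefined;
  }
  return { trackingRunId: opts.trackingRunId, ok: false, attempts };
}
```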

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Quick‑Start Debug Checklist**

1. **Capture the failure output**  
   ```bash
   ricky run workflows/foo.ts --no-auto-fix 2>&1 | tee ricky-run.log
  • Keep ricky-run.log – you’ll need it for the verification gate and any PR you open.
  1. Locate the deterministic verification gate

    • Every workflow in workflows/ has a companion *_spec.ts (or a spec/ folder) that defines the contract (inputs → expected outputs).
    • Run the contract manually to see which assertion is tripping:
      ricky verify workflows/foo_spec.ts
    • The command will print the exact step and the mismatched payload.
  2. Inspect the failing step

    • Open workflows/foo.ts and find the step name reported by the verification gate (e.g., step: "transformData").
    • Check:
      • Input shape (type/interface) vs. what the step actually receives.
      • Side‑effects (file writes, external API calls) – they must be pure or wrapped in a withMock() block for test runs.
  3. Run the step in isolation

    ricky run-step workflows/foo.ts transformData --no-auto-fix
    • This isolates the failure from the rest of the pipeline and g
      ...[truncated]

**Must**

- Preserve one-attempt behavior when auto-fix is disabled.
- Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
- Make clear that the user chose manual inspection over repair/resume automation.

**Must Not**

- Start a repair loop despite `--no-auto-fix`.
- Suppress the diagnosis because no repair was attempted.
- Present the failure as a completed repair attempt.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.in-process-local-runner</code> (runtime-recovery/manual)</summary>

**User message**

> Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

**Ricky output**

```text
**Ricky’s Standard Procedure for Executing a Local TypeScript Workflow Artifact**

Below is the concrete, repeatable contract Ricky follows when a developer asks to run a workflow artifact that lives in the primary local path (e.g., `./workflows/`). The steps are written as a deterministic “execution contract” that can be copied into a README or CI script and verified with automated gates.

---

## 1️⃣ Preconditions (Verification Gate #1)

| Check | How to verify | Pass criteria |
|------|---------------|---------------|
| **Repo root** | `git rev-parse --show-toplevel` | Returns the absolute path of the repository root |
| **Node version** | `node -v` | `>= 18.x` (LTS) |
| **npm / pnpm / yarn** | `npm -v` (or `pnpm -v` / `yarn -v`) | `>= 9.x` (or equivalent) |
| **TypeScript compiler** | `npx tsc --version` | `>= 5.0` |
| **ts-node** (runtime) | `npx ts-node --version` | `>= 10.x` |
| **Workflow artifact exists** | `test -f ./workflows/<name>.ts` | File is present and readable |
| **Dependencies installed** | `npm ls` (or `pnpm ls`) | No “missing” or “extraneous” errors |

If any check fails, Ricky **blocks** the run and emits a concise error report (see “Blocker Reporting” below)
...[truncated]

Must

  • Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
  • Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
  • Record the actual spawn command in execution evidence.

Must Not

  • Fail solely because agent-relay is not on PATH when the SDK route is available.
  • Hide the actual runtime command from evidence.
  • Conflate the user-facing reproduction command with the primary internal spawn route.
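
A sketch of the precheck and evidence record. It assumes the workflow is run as `node --experimental-strip-types <file>` with the current Node binary, which is the route named in the PR's repro output. `planLocalLaunch` and its result shape are hypothetical. The SDK probe uses `createRequire` relative to the workflow file, which assumes `@agent-relay/sdk` exposes a require-resolvable entry; an ESM-only export map would need `import.meta.resolve` instead.

```typescript
import { createRequire } from "node:module";
import * as path from "node:path";

interface LocalLaunchPlan {
  command: string; // recorded verbatim in execution evidence
  args: string[];
  sdkResolved: boolean;
  problems: string[];
}

function planLocalLaunch(workflowFile: string): LocalLaunchPlan {
  const problems: string[] = [];
  const absFile = path.resolve(workflowFile);

  // --experimental-strip-types first appeared in Node 22.6.
  const [major, minor] = process.versions.node.split(".").map(Number);
  if (major < 22 || (major === 22 && minor < 6)) {
    problems.push(`Node ${process.versions.node} cannot strip TypeScript types; use Node 22.6 or newer.`);
  }

  // Resolve the SDK from the workflow's own directory, as the workflow's imports would.
  let sdkResolved = false;
  try {
    createRequire(absFile).resolve("@agent-relay/sdk");
    sdkResolved = true;
  } catch {
    problems.push("@agent-relay/sdk is not resolvable from the workflow's directory.");
  }

  // The SDK route needs only the current Node binary, not an agent-relay binary on PATH.
  return { command: process.execPath, args: ["--experimental-strip-types", absFile], sdkResolved, problems };
}
```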

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Boundary‑Reached Protocol – Structural Failure Persisting After a Fix Attempt**

When a structural failure remains after an attempted remediation, the workflow must transition from “repair” to “escalation” in a deterministic, auditable way. Below is the **Ricky‑Standard Boundary‑Reached Contract (R‑SBRC)** that you should invoke immediately.

---

## 1. Contract Preconditions
| Condition | Must be true before invoking R‑SBRC |
|-----------|--------------------------------------|
| **F‑Attempt** | A fix has been applied and **all** unit‑, integration‑, and system‑level verification gates defined in the original **Repair Contract (R‑RC‑\<id\>)** have been re‑run. |
| **F‑Result** | At least **one** verification gate returns **FAIL** and the failure persists for **≥ 2 consecutive runs** (to rule out flakiness). |
| **Owner‑Ack** | The primary owner of the failing component has **signed off** that the fix was applied as specified (signature in the PR comment thread). |
| **No‑New‑Scope** | No new functional scope has been introduced since the fix attempt. |

If any precondition is not met, continue the normal repair loop until they are satisfied.

---

## 2. Deterministic Verificatio
...[truncated]

Must

  • Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
  • Distinguish escalation from a generic product failure.
  • Preserve enough context for a human operator to continue.
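One way to keep escalation distinct from a generic failure is to make it a separate variant in a discriminated union, built only once a fixed attempt budget is spent. This is a sketch under assumptions: `RepairAttempt`, `RepairOutcome`, and `classifyRepair` are hypothetical names, not Ricky's types.

```typescript
// Hypothetical record of one repair attempt.
interface RepairAttempt {
  fix: string;        // what was changed
  validation: string; // the gate that was re-run
  passed: boolean;
  evidence: string;   // log excerpt or artifact path kept for the operator
}

// Escalation is its own variant, so callers cannot treat it as a generic error.
type RepairOutcome =
  | { kind: "fixed"; attempts: number }
  | { kind: "retry"; remaining: number }
  | {
      kind: "escalation";
      attemptedFixes: string[];
      failedValidation: string;
      blockerClass: string;   // e.g. "UNSUPPORTED_RUNTIME"
      recommendation: string; // next step for a human operator
      evidence: string[];
    };

function classifyRepair(
  attempts: RepairAttempt[],
  maxAttempts: number,
  blockerClass: string,
  recommendation: string,
): RepairOutcome {
  const firstPass = attempts.findIndex((a) => a.passed);
  if (firstPass !== -1) return { kind: "fixed", attempts: firstPass + 1 };
  // Bounded loop: speculative retries stop once the budget is spent.
  if (attempts.length < maxAttempts) {
    return { kind: "retry", remaining: maxAttempts - attempts.length };
  }
  const last = attempts[attempts.length - 1];
  return {
    kind: "escalation",
    attemptedFixes: attempts.map((a) => a.fix),
    failedValidation: last ? last.validation : "none recorded",
    blockerClass,
    recommendation,
    // Evidence from every failed attempt is kept, not only the last one.
    evidence: attempts.map((a) => a.evidence),
  };
}
```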

Must Not

  • Retry speculative fixes indefinitely.
  • Collapse to "something went wrong" without the attempted actions.
  • Discard evidence from failed repair attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Aggregated Summary**  
*(Generated from the last N workflow runs across all active branches)*  

| Metric | Value | Target / SLA | Status |
|--------|-------|--------------|--------|
| **Total Runs** | 1 842 | – | ✅ |
| **Successful Runs** | 1 679 | ≥ 95 % success | ✅ |
| **Failed Runs** | 163 | ≤ 5 % failure | ⚠️ |
| **Avg. Runtime** | 4 m 27 s | ≤ 5 m | ✅ |
| **90‑pctile Runtime** | 5 m 12 s | ≤ 6 m | ✅ |
| **Deterministic Verification Pass** | 1 642 | 100 % of successful runs | ✅ |
| **Validation Loop Count** (per run) | 84 ± 7 | 80‑100 loops | ✅ |
| **Blocked Runs (unresolved)** | 7 | 0 | ❌ |
| **Open Review Artifacts** | 12 PRs awaiting sign‑off | ≤ 5 | ⚠️ |
| **Branch Scope Violations** | 3 (out‑of‑scope file changes) | 0 | ❌ |
| **Contract Breaches** | 5 (missing `workflow.contract.yaml`) | 0 | ❌ |

---

### 1. Success / Failure Breakdown
| Branch | Runs | Success % | Avg. Runtime | Recent Failure Reason |
|--------|------|-----------|--------------|-----------------------|
| `main` | 742 | 98 % | 4 m 02 s | None |
| `feature/contract‑v2` | 311 | 93 % | 4 m 58 s | Missing deterministic gate |
| `release/2024‑Q2` | 189 | 96 % | 5 m 10 s | Time
...[truncated]

Must

  • Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
  • Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
  • Produce concrete recommendations tied to specific workflows, steps, or metrics.
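The Must rules above can be sketched as grouping normalized evidence instead of parsing raw logs. This sketch assumes a hypothetical, simplified `RunEvidence` record, because the real `WorkflowRunEvidence` fields are not shown in this excerpt. Failures are bucketed by origin and class, so environment failures and workflow-logic failures never share a bucket. The input is typed as a readonly array, so the function cannot reorder it or append to it.

```typescript
type FailureOrigin = "environment" | "workflow-logic" | "unclassified";

// Hypothetical, simplified evidence record.
interface RunEvidence {
  workflow: string;
  status: "passed" | "failed";
  failureClass?: string;
  failureOrigin?: "environment" | "workflow-logic";
}

interface FailureBucket {
  origin: FailureOrigin;
  failureClass: string;
  count: number;
  workflows: string[]; // which workflows hit this failure, for concrete references
}

function bucketFailures(runs: readonly RunEvidence[]): FailureBucket[] {
  const buckets = new Map<
    string,
    { origin: FailureOrigin; failureClass: string; count: number; workflows: Set<string> }
  >();
  for (const run of runs) {
    if (run.status !== "failed") continue;
    // Unlabeled failures get their own origin instead of joining another bucket.
    const origin: FailureOrigin = run.failureOrigin ?? "unclassified";
    const failureClass = run.failureClass ?? "UNKNOWN";
    const key = `${origin}|${failureClass}`;
    let bucket = buckets.get(key);
    if (!bucket) {
      bucket = { origin, failureClass, count: 0, workflows: new Set<string>() };
      buckets.set(key, bucket);
    }
    bucket.count += 1;
    bucket.workflows.add(run.workflow);
  }
  // Most frequent failure classes first.
  return Array.from(buckets.values())
    .map((b) => ({
      origin: b.origin,
      failureClass: b.failureClass,
      count: b.count,
      workflows: Array.from(b.workflows).sort(),
    }))
    .sort((a, b) => b.count - a.count);
}
```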

Must Not

  • Mutate evidence while analyzing it.
  • Return generic advice like "improve your workflow" without references.
  • Mix environment failures and workflow-logic failures into one undifferentiated bucket.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

**Ricky Slack Surface – “Workflow Debugger”**  
*Version 0.1 – Design Draft*  

---

## 1. Overview
The **Workflow Debugger** Slack surface gives engineers, product managers, and ops a deterministic, low‑latency UI for inspecting, stepping through, and fixing Ricky‑managed workflows directly from a Slack channel or DM. It follows the **Ricky repository conventions** (AGENTS.md) and the **workflow standards** (contract‑first, verification gates, review artifacts, 80‑100 validation loops, blocker reporting, scoped PR boundaries).

---

## 2. Surface Architecture

| Component | Responsibility | Slack Interaction | Persistence |
|-----------|----------------|-------------------|-------------|
| **Slack Bot (ricky‑debugger)** | Receives slash commands / interactive messages, authenticates user, routes to backend services. | `/ricky debug <workflow‑id>` → opens modal. | Stateless; token‑based auth. |
| **API Gateway** | Exposes deterministic REST/GraphQL contracts for all debug actions. | Called by bot. | Logs request/response for audit. |
| **Workflow Engine Adapter** | Reads workflow definition, current state, and execution graph from the central workflow store. | Provides `GET /debug/
...[truncated]

Must

  • Treat Slack as a co-equal product surface, not a wrapper around CLI.
  • Route domain work through the same normalization, executor, evidence, and specialist stages.
  • Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.

Must Not

  • Put workflow generation or diagnosis domain logic directly in the Slack handler.
  • Degrade Slack to a developer shortcut with weaker routing than CLI.
  • Skip signature verification, dedup, or thread handling when the surface is implemented.
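Signature verification and dedup both belong in the Slack surface layer, before any domain routing. The sketch below implements Slack's documented request-signing check. That check is an HMAC-SHA256 over `v0:<timestamp>:<raw body>`, keyed with the app's signing secret and compared against the `X-Slack-Signature` header. The sketch also adds a naive dedup on the Events API `event_id`, because Slack retries deliveries. `verifySlackSignature` and `isDuplicateEvent` are hypothetical names.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Slack recommends rejecting requests older than five minutes to limit replay.
const MAX_SKEW_SECONDS = 60 * 5;

function verifySlackSignature(
  signingSecret: string,
  timestampHeader: string, // X-Slack-Request-Timestamp
  rawBody: string,         // exact raw body, before any JSON parsing
  signatureHeader: string, // X-Slack-Signature, "v0=<hex>"
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  const timestamp = Number(timestampHeader);
  if (!Number.isInteger(timestamp) || Math.abs(nowSeconds - timestamp) > MAX_SKEW_SECONDS) {
    return false;
  }
  const baseString = `v0:${timestampHeader}:${rawBody}`;
  const expected = "v0=" + createHmac("sha256", signingSecret).update(baseString).digest("hex");
  const expectedBuf = Buffer.from(expected, "utf8");
  const actualBuf = Buffer.from(signatureHeader, "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expectedBuf.length === actualBuf.length && timingSafeEqual(expectedBuf, actualBuf);
}

// Hypothetical in-memory dedup keyed by event_id.
const seenEventIds = new Set<string>();
function isDuplicateEvent(eventId: string): boolean {
  if (seenEventIds.has(eventId)) return true;
  seenEventIds.add(eventId);
  return false;
}
```

A production surface would need a shared store with a TTL instead of a process-local `Set`. Only after both checks pass would the handler hand the request to the shared normalizer.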

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)

User message

Add a new web handoff type that submits a workflow spec and mode.

Ricky output

**New Web‑Handoff Type: `submitWorkflowSpec`**

| Field | Type | Description | Required |
|-------|------|-------------|----------|
| **type** | `string` | Fixed value `"submitWorkflowSpec"` – identifies the handoff. | ✅ |
| **payload** | `object` | Container for the submission data. | ✅ |
| └─ **spec** | `object` | Full workflow specification (JSON‑serializable). Must conform to the **WorkflowSpec v2** schema (see `schemas/workflowSpecV2.json`). | ✅ |
| └─ **mode** | `enum` | Execution mode for the submitted spec: `"dryRun"`, `"execute"`, or `"preview"`. | ✅ |
| **metadata** | `object` *(optional)* | Caller‑supplied context (e.g., `requestId`, `initiator`). | – |
| **callbackUrl** | `string` *(optional)* | URL to POST the asynchronous result to. If omitted, the caller receives a synchronous 202 response with a `jobId`. | – |

---

## 1. Contract Definition (YAML)

```yaml
# contracts/webHandoff/submitWorkflowSpec.yaml
type: object
required: [type, payload]
properties:
  type:
    const: submitWorkflowSpec
  payload:
    type: object
    required: [spec, mode]
    properties:
      spec:
        $ref: '../../schemas/workflowSpecV2.json'
      mode:
        type: string
        enum: 
...[truncated]

Must

  • Add a handoff variant that normalizes into LocalInvocationRequest or CloudGenerateRequest.
  • Keep auth/session validation in the web surface before normalization.
  • Ensure downstream code does not need to know the request came from web.
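The normalization boundary these rules describe can be sketched as follows. The sketch takes the `submitWorkflowSpec` fields from the table in Ricky's output and assumes hypothetical `LocalInvocationRequest`/`CloudGenerateRequest` shapes, since the real types are not shown here. The web surface would call the guard only after auth/session validation. The `target` argument stands in for whatever shared routing layer chooses local versus Cloud, so the web handler does not own that decision.

```typescript
type WorkflowMode = "dryRun" | "execute" | "preview";
const MODES: readonly WorkflowMode[] = ["dryRun", "execute", "preview"];

interface SubmitWorkflowSpecHandoff {
  type: "submitWorkflowSpec";
  payload: { spec: Record<string, unknown>; mode: WorkflowMode };
  metadata?: Record<string, unknown>;
  callbackUrl?: string;
}

// Hypothetical stand-ins for Ricky's real request types.
interface LocalInvocationRequest { kind: "local"; spec: Record<string, unknown>; mode: WorkflowMode; requestId?: string }
interface CloudGenerateRequest { kind: "cloud"; spec: Record<string, unknown>; mode: WorkflowMode; requestId?: string; callbackUrl?: string }
type NormalizedRequest = LocalInvocationRequest | CloudGenerateRequest;

// Structural check on untrusted input; auth happens before this in the web surface.
function isSubmitWorkflowSpec(value: unknown): value is SubmitWorkflowSpecHandoff {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const payload = v.payload as Record<string, unknown> | undefined;
  return (
    v.type === "submitWorkflowSpec" &&
    typeof payload === "object" && payload !== null &&
    typeof payload.spec === "object" && payload.spec !== null &&
    MODES.includes(payload.mode as WorkflowMode) &&
    (v.callbackUrl === undefined || typeof v.callbackUrl === "string")
  );
}

// Translation only: downstream code sees the same shapes as for CLI or Slack.
function normalizeSubmitWorkflowSpec(
  handoff: SubmitWorkflowSpecHandoff,
  target: "local" | "cloud", // decided by a shared routing layer, not the web handler
): NormalizedRequest {
  const rawId = handoff.metadata?.requestId;
  const requestId = typeof rawId === "string" ? rawId : undefined;
  const { spec, mode } = handoff.payload;
  if (target === "local") return { kind: "local", spec, mode, requestId };
  return { kind: "cloud", spec, mode, requestId, callbackUrl: handoff.callbackUrl };
}
```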

Must Not

  • Short-circuit directly from the web handler to executors.
  • Create a parallel domain model for web requests.
  • Make the web surface the owner of local or Cloud routing semantics.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.mcp-claude-context-is-metadata (surfaces-ingress/manual)

User message

A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordinate.

Ricky output

**Decision:** Generate a new workflow artifact.

**Rationale**

1. **Input Type** – You supplied a *

---

_Comment truncated to stay within GitHub limits. Download the `ricky-eval-run` artifact for the full `human-review.md`._


@coderabbitai (bot) left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/local/auto-fix-loop.ts`:
- Around lines 768-786: The preamble-slicing logic advances `preambleLength` by `line.length + 1`, which breaks on CRLF and can mis-slice the content before the `importPattern` check (causing duplicate alias imports). Update the loop that computes `preambleLength` so it advances by the actual substring length, including the real newline separator, instead of assuming a single-character newline: iterate over the original content with a cursor, using `content.indexOf` to find the next newline (handling both `"\r\n"` and `"\n"`), compute the exact increment to add to `preambleLength`, and keep the same trimmed checks for comment/import/export markers. Ensure the updated `preambleLength` is used in `importPattern.test(content.slice(0, preambleLength))` so alias imports are reliably detected.
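The cursor-based scan described in the comment can be sketched as a minimal function. The sketch assumes that the preamble consists of blank lines, `//` or `/* … */` comment lines, and `import`/`export` lines, because the exact checks in `auto-fix-loop.ts` are not shown here. `computePreambleLength` is a hypothetical name. The cursor always advances to just past each `\n`, so a preceding `\r` is counted automatically and `content.slice(0, preambleLength)` ends on a line boundary for both LF and CRLF files.

```typescript
// Hypothetical helper: length of the leading comment/import/export block.
function computePreambleLength(content: string): number {
  let cursor = 0;
  let preambleLength = 0;
  while (cursor < content.length) {
    const newline = content.indexOf("\n", cursor);
    // Exclusive end of this line, including its "\n" (and any "\r" before it).
    const next = newline === -1 ? content.length : newline + 1;
    // trim() drops the trailing "\r\n" or "\n" before the marker checks.
    const trimmed = content.slice(cursor, next).trim();
    const isPreambleLine =
      trimmed === "" ||
      trimmed.startsWith("//") ||
      trimmed.startsWith("/*") ||
      trimmed.startsWith("*") ||
      trimmed.startsWith("import ") ||
      trimmed.startsWith("export ");
    if (!isPreambleLine) break;
    // Advance by the exact number of characters consumed, never line.length + 1.
    preambleLength = next;
    cursor = next;
  }
  return preambleLength;
}
```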
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 3753c073-d40f-4756-b601-1ec06271a916

📥 Commits

Reviewing files that changed from the base of the PR and between 50e4c3e and b9f9f47.

📒 Files selected for processing (1)
  • src/local/auto-fix-loop.ts

Follow-up to b9f9f47 (preamble-limited regex). Both this commit and
b9f9f47 fix the same user-visible bug — `injectWorkflowEnvLoader`
silently skipping the rickyWorkflowFs/rickyWorkflowPath alias imports
when a master-rendered workflow embeds `from 'node:fs'` inside a
`.step({ command: ... })` shell HEREDOC — but the regex/preamble
approach is structurally fragile and has class-of-bug failure modes
that AST-based detection eliminates for free:

- Multi-line `import * as\n  alias\n  from 'mod'` declarations slip
  through a line-anchored regex and get re-injected as duplicates,
  which the strip-types loader rejects with `SyntaxError: Identifier
  has already been declared`.
- A future template that emits an alias import below the file
  preamble (after a const, function, comment block boundary, etc.)
  would also be missed.
- Comment-stripping heuristics ("starts with //", "starts with /*")
  miss inline `import { x } /* trailing */ from 'mod';` and other
  edge cases the parser owns.
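Two of the failure modes described above are easy to reproduce. Here, `linePattern` is a hypothetical line-anchored regex of the kind the bullets describe, not the exact pattern from b9f9f47. The `includes` call mirrors the original substring check.

```typescript
// Hypothetical line-anchored pattern: expects the whole declaration on one line.
const linePattern = /^import \* as rickyWorkflowFs from ['"]node:fs['"];?\s*$/m;

const singleLine = "import * as rickyWorkflowFs from 'node:fs';\n";
const multiLine = "import * as\n  rickyWorkflowFs\n  from 'node:fs';\n";

const singleLineMatches = linePattern.test(singleLine); // true
const multiLineMatches = linePattern.test(multiLine); // false: same import, not detected

// The original substring check is fooled by import text inside a string literal.
const heredocArtifact =
  "const cmd = `node --input-type=module <<'NODE'\nimport { mkdirSync } from 'node:fs';\nNODE`;\n";
const substringSaysImported = heredocArtifact.includes("from 'node:fs'"); // true, yet nothing is imported at module scope
```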

Replaces the regex with `ts.createSourceFile` + a walk over module-
scope `ImportDeclaration` nodes, matching `import * as <alias> from
'<module>'` structurally. Contents inside StringLiteral /
NoSubstitutionTemplateLiteral / TemplateExpression nodes are inert,
so HEREDOC strings stop fooling detection regardless of how the
embedded shell text is escaped.
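The AST check described above can be sketched with the public TypeScript compiler API (`ts.createSourceFile`, `ts.isImportDeclaration`, `ts.isNamespaceImport`, `ts.isStringLiteral`). `hasNamespaceImport` is a hypothetical name, and the actual helper in `src/local/auto-fix-loop.ts` may be structured differently. Only `sourceFile.statements` (module scope) is scanned, so import-like text inside any string or template literal is never examined.

```typescript
import * as ts from "typescript";

// Hypothetical helper: is `import * as <alias> from '<moduleName>'` declared at module scope?
function hasNamespaceImport(sourceText: string, alias: string, moduleName: string): boolean {
  // Parse only; no type-checking or program setup is needed for this scan.
  const sourceFile = ts.createSourceFile(
    "artifact.ts",
    sourceText,
    ts.ScriptTarget.Latest,
    /* setParentNodes */ false,
    ts.ScriptKind.TS,
  );
  for (const statement of sourceFile.statements) {
    if (!ts.isImportDeclaration(statement)) continue;
    const specifier = statement.moduleSpecifier;
    if (!ts.isStringLiteral(specifier) || specifier.text !== moduleName) continue;
    const bindings = statement.importClause?.namedBindings;
    // Matches the `* as alias` form regardless of line breaks or comments in between.
    if (bindings && ts.isNamespaceImport(bindings) && bindings.name.text === alias) {
      return true;
    }
  }
  return false;
}
```

With a helper like this, the loader injection would add `import * as rickyWorkflowFs from 'node:fs'` only when `hasNamespaceImport(source, "rickyWorkflowFs", "node:fs")` returns false.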

`typescript` moves from devDependencies to dependencies. The bundler
externalizes it (it is already used at runtime via the AST API), so
`dist/ricky.js` stays at 2.4MB. Install footprint grows by ~23MB at
`npm install` time but matches the dep weight ricky already carries
through @agent-relay/sdk and @agentworkforce/harness-kit.

Tests:
- New regression case in `src/local/auto-fix-loop.test.ts`:
  "recognizes already-present rickyWorkflow* alias imports declared
  via multi-line statement and skips re-injection" — proves the AST
  upgrade meaningfully improves on the regex approach by handling a
  shape neither the original substring check nor the preamble regex
  could distinguish from "not imported."
- Existing HEREDOC regression test still passes.
- npm test green (1029 pass; the lone failing test in
  local-run-monitor.test.ts is a pre-existing flake — passes 3/3 in
  isolation, unrelated to this change).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kjgbot kjgbot changed the title fix(auto-fix): detect rickyWorkflow* alias imports by statement, not substring fix(auto-fix): detect rickyWorkflow* alias imports via TypeScript AST May 10, 2026
@kjgbot kjgbot merged commit a3e5868 into main May 10, 2026
2 checks passed
@kjgbot kjgbot deleted the fix/env-loader-alias-detection branch May 10, 2026 08:19