
fix(master-renderer): use workspace-aware typecheck command #90

Merged
kjgbot merged 1 commit into main from fix/master-renderer-workspace-aware-typecheck
May 10, 2026

kjgbot commented May 10, 2026

Summary

Master-rendered workflows hardcoded npx tsc --noEmit in three places:

  • master-workflow-renderer.ts:264 (final-hard-validation step body)
  • master-workflow-renderer.ts:465 (initial-soft-validation gate)
  • master-workflow-renderer.ts:467 (final-hard-validation gate)

When a generated workflow ran from a monorepo root with no top-level tsconfig.json (npm workspaces with a packages/*/tsconfig.json layout, common in MSD-style repos), npx tsc --noEmit found neither input files nor a config, so it dumped the full tsc --help text on stdout and exited 1. The final-hard-validation step retried 2 more times via onError, and the auto-fix loop then "repaired" it 7×. Every attempt failed identically because the workflow command was correct in general, just wrong for that repo shape.
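Why does the root-level command miss the per-package configs? The TypeScript handbook states that when tsc is invoked with no input files, it searches for tsconfig.json starting in the current directory and continuing up the parent directory chain. It never looks down into subdirectories such as packages/*. The following sketch is a simplified model of that lookup, not compiler code (`findNearestTsconfig` is a hypothetical name), run against a throwaway workspace layout.

```typescript
import { existsSync, mkdtempSync, mkdirSync, writeFileSync } from "node:fs";
import { dirname, join, resolve } from "node:path";
import { tmpdir } from "node:os";

// Simplified model of tsc's project lookup when called with no input files:
// check `startDir` for tsconfig.json, then each parent directory in turn.
// Returns the first match, or undefined once the filesystem root is passed.
export function findNearestTsconfig(startDir: string): string | undefined {
  let dir = resolve(startDir);
  for (;;) {
    const candidate = join(dir, "tsconfig.json");
    if (existsSync(candidate)) return candidate;
    const parent = dirname(dir);
    if (parent === dir) return undefined; // reached the filesystem root
    dir = parent;
  }
}

// Throwaway npm-workspaces layout: a per-package tsconfig, none at the root.
const root = mkdtempSync(join(tmpdir(), "ws-"));
mkdirSync(join(root, "packages", "webapp"), { recursive: true });
writeFileSync(join(root, "packages", "webapp", "tsconfig.json"), "{}");

// From the root, the upward search never visits packages/*. This prints
// undefined unless some ancestor of the temp dir happens to hold a tsconfig.json.
console.log(findNearestTsconfig(root));
// From inside the package, the package's own config is found.
console.log(findNearestTsconfig(join(root, "packages", "webapp")));
```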

Repro

After PR #84 (persona-writer fallback) and PR #86 (env-loader alias detection) had unblocked the artifact-write and module-load paths so the workflow could actually reach final-hard-validation:

$ ricky --mode local --spec-file docs/plans/2026-05-05-proactive-pr-remediation-spec.md --run
Generation: ok — workflows/generated/ricky-…ts
Workflow name: wf-1e14175ec3b5
Execution: blocked — INVALID_ARTIFACT at final-hard-validation
Cause: ○ final-signoff — skipped
Auto-fix: stopped after 7/7 attempt(s) (INVALID_ARTIFACT)

# stdout from each attempt:
Version 5.9.3
tsc: The TypeScript Compiler - Version 5.9.3
COMMON COMMANDS
  tsc
  Compiles the current project (tsconfig.json in the working directory.)
…(full tsc --help dump)…

The user's MSD repo is an npm workspace with per-package tsconfigs (packages/backend/tsconfig.json, packages/webapp/tsconfig.json, etc.) and no root tsconfig.json. Its actual typecheck command is npm run typecheck. That script is defined in the root package.json and performs the workspace-aware steps (npm run build --workspace=packages/shared && … && tsc -b packages/webapp/tsconfig.app.json).

Fix

Extract a single TYPECHECK_COMMAND constant emitting:

if [ "$(npm pkg get scripts.typecheck 2>/dev/null)" != "{}" ]; then
  npm run typecheck
else
  npx tsc --noEmit
fi

npm pkg get scripts.typecheck returns "<command>" when the script is defined and {} when it is not. The npm pkg command requires npm v7.20.0+. That version ships with later Node 16.x releases and every newer Node line, but the earliest 16.x releases bundled an older npm 7. The substring npx tsc --noEmit is preserved so existing expect.stringContaining('npx tsc --noEmit') assertions, evidence-capture readers, debugger recommendations, and human readers still recognize the intent.
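The routing rule can be modeled as a pure function over the captured stdout. This sketch (`chooseTypecheckCommand` is a hypothetical name, not part of the renderer) mirrors the shell comparison, including the fact that `$(...)` strips trailing newlines. The model also makes one consequence visible. Any output other than the literal `{}` routes to npm run typecheck. That includes empty output when npm pkg get fails and writes nothing to stdout, because `2>/dev/null` discards its stderr.

```typescript
// Models the shell test used by the snippet:
//   [ "$(npm pkg get scripts.typecheck 2>/dev/null)" != "{}" ]
// `pkgGetStdout` is whatever `npm pkg get scripts.typecheck` wrote to stdout.
export function chooseTypecheckCommand(pkgGetStdout: string): string {
  // Command substitution `$(...)` drops trailing newlines before comparing.
  const captured = pkgGetStdout.replace(/\n+$/, "");
  // Only the literal `{}` (script absent) selects the tsc fallback.
  return captured !== "{}" ? "npm run typecheck" : "npx tsc --noEmit";
}

// Script defined: npm prints it as a JSON string (this body is made up).
console.log(chooseTypecheckCommand('"tsc -b"\n')); // npm run typecheck
// Script absent: npm prints an empty object.
console.log(chooseTypecheckCommand("{}\n")); // npx tsc --noEmit
// npm pkg get failed and printed nothing to stdout.
console.log(chooseTypecheckCommand("")); // npm run typecheck
```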

Use the constant in all three master-renderer call sites.
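The following is a minimal sketch of the single constant and of how the three call sites might share it. It assumes the renderer builds step bodies and gate commands as plain strings. `TYPECHECK_COMMAND` reproduces the snippet above, while `renderFinalHardValidationBody` and `renderGateCommand` are hypothetical helpers, not the renderer's actual functions.

```typescript
// Hypothetical shape of the shared constant; the merged code in
// master-workflow-renderer.ts may format it differently.
export const TYPECHECK_COMMAND = [
  'if [ "$(npm pkg get scripts.typecheck 2>/dev/null)" != "{}" ]; then',
  "  npm run typecheck",
  "else",
  "  npx tsc --noEmit",
  "fi",
].join("\n");

type GateName = "initial-soft-validation" | "final-hard-validation";

// Hypothetical step-body renderer: interpolates the constant after `set -e`
// instead of emitting a bare `npx tsc --noEmit` line.
export function renderFinalHardValidationBody(): string {
  return `set -e\n${TYPECHECK_COMMAND}\n`;
}

// Hypothetical gate renderer: both gates reuse the exact same command string.
export function renderGateCommand(gate: GateName): { gate: GateName; command: string } {
  return { gate, command: TYPECHECK_COMMAND };
}
```

Because the fallback line still reads `npx tsc --noEmit`, assertions of the form `expect.stringContaining('npx tsc --noEmit')` keep matching the rendered output.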

Verified end-to-end against the user's MSD repo: npm pkg get scripts.typecheck returns the project's workspace-aware typecheck script and the snippet routes to npm run typecheck, which correctly typechecks packages/shared, lints packages/backend, and runs tsc -b packages/webapp/tsconfig.app.json.

Test plan

  • npm run typecheck — clean
  • npm test — 1059 / 1059 green across 50 files
  • New regression test in pipeline.test.ts ("emits a workspace-aware typecheck command in master-rendered final-hard-validation") asserts that the rendered workflow contains both branches of the conditional and rejects the bare set -e\nnpx tsc --noEmit\n pattern that produced the bug.
  • Existing 45 pipeline tests still pass (all use expect.stringContaining('npx tsc --noEmit') rather than exact-string equality, so they survive the conditional wrapper unchanged).
  • Manual: snippet correctly routes to npm run typecheck from the user's MSD repo root.
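The regression test's checks (both branches present, bare pattern absent) can also be written without a test framework. The sketch below is a hypothetical stand-in for the pipeline.test.ts case, not its actual code. `checkWorkspaceAwareTypecheck` returns a list of problems so that a failing render explains itself.

```typescript
// Hypothetical framework-free version of the regression assertions.
export function checkWorkspaceAwareTypecheck(rendered: string): string[] {
  const problems: string[] = [];
  if (!rendered.includes("npm pkg get scripts.typecheck")) problems.push("missing npm pkg get probe");
  if (!rendered.includes("npm run typecheck")) problems.push("missing npm run typecheck branch");
  if (!rendered.includes("npx tsc --noEmit")) problems.push("missing npx tsc --noEmit fallback");
  // The exact shape that produced the bug: tsc as the first command after set -e.
  if (rendered.includes("set -e\nnpx tsc --noEmit\n")) problems.push("bare npx tsc --noEmit directly after set -e");
  return problems;
}

const buggy = "set -e\nnpx tsc --noEmit\n";
const fixed = [
  "set -e",
  'if [ "$(npm pkg get scripts.typecheck 2>/dev/null)" != "{}" ]; then',
  "  npm run typecheck",
  "else",
  "  npx tsc --noEmit",
  "fi",
  "",
].join("\n");

console.log(checkWorkspaceAwareTypecheck(buggy));
console.log(checkWorkspaceAwareTypecheck(fixed)); // []
```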

Out of scope (explicitly)

  • template-renderer.ts:288 (typecheckCommand constant for non-master workflows) and :637 (informational verification command shown to LLM agents). The same one-line change pattern would apply, but the existing test coupling there asserts on a wider surface (pipeline.test.ts:980, validator.test.ts, debugger.test.ts, recovery-preflight.test.ts, evidence/capture.test.ts). Worth a follow-up PR to keep blast radius proportionate to user impact — the user's failure was specifically in master-rendered workflows.
  • The [startFrom] No cached output for skipped step … lines on retries. That's a separate auto-fix-loop issue: when final-hard-validation fails, retries resume from that step only, but the upstream 14 steps' outputs aren't cached, so the retry runs against an empty workspace context. Not the cause of the user's failure here, but worth noting.

Relationship to other PRs

Third in a stack of three independent fixes that together unblock ricky --mode local --spec-file <md> --run against monorepo specs:

  1. fix(generation): fall back to deterministic renderer when persona writer throws #84 — persona-writer fallback (deterministic artifact actually gets written to disk)
  2. fix(auto-fix): detect rickyWorkflow* alias imports via TypeScript AST #86 — env-loader alias detection via TypeScript AST (deterministic artifact actually loads under Node)
  3. this PR — workspace-aware typecheck (deterministic workflow's own validation step actually runs in monorepos)

Each is independently shippable.

🤖 Generated with Claude Code

Master-rendered workflows hardcoded `npx tsc --noEmit` in three places:

- `master-workflow-renderer.ts:264` (final-hard-validation step body)
- `master-workflow-renderer.ts:465` (initial-soft-validation gate)
- `master-workflow-renderer.ts:467` (final-hard-validation gate)

When a generated workflow ran from a monorepo root with no top-level
`tsconfig.json` (npm workspaces with `packages/*/tsconfig.json` layout
— common in MSD-style repos), `npx tsc --noEmit` found neither input
files nor a config and dumped the full `tsc --help` text on stdout
while exiting 1. The `final-hard-validation` step retried 2 more
times via onError; the auto-fix loop then "repaired" 7×, all failing
identically because the workflow command was correct in general —
just wrong for that repo shape.

User-visible repro from the proactive-pr-remediation spec, after
PR #84 and PR #86 had unblocked the artifact-write and module-load
paths so the workflow could actually reach final-hard-validation:

  Generation: ok — workflows/generated/ricky-...ts
  Workflow name: wf-1e14175ec3b5
  Execution: blocked — INVALID_ARTIFACT at final-hard-validation
  Auto-fix: stopped after 7/7 attempt(s) (INVALID_ARTIFACT)

Fix: extract a single `TYPECHECK_COMMAND` constant emitting

  if [ "$(npm pkg get scripts.typecheck 2>/dev/null)" != "{}" ]; then
    npm run typecheck
  else
    npx tsc --noEmit
  fi

`npm pkg get scripts.typecheck` returns `"<command>"` when present
and `{}` when absent (requires npm v7.20.0+, bundled with later
Node 16.x releases and all newer Node versions). The substring
`npx tsc --noEmit` is preserved so existing
`expect.stringContaining('npx tsc --noEmit')` assertions, evidence
capture, debugger recommendations, and human readers continue to
recognize the intent.

Verified end-to-end in the user's MSD repo: `npm pkg get
scripts.typecheck` returns the project's workspace-aware typecheck
script and the snippet routes to `npm run typecheck`, which
correctly typechecks `packages/shared`, lints `packages/backend`,
and runs `tsc -b packages/webapp/tsconfig.app.json`.

Tests:
- New regression case in `pipeline.test.ts`,
  "emits a workspace-aware typecheck command in master-rendered
  final-hard-validation", asserts the rendered workflow contains
  both branches of the conditional and rejects the bare
  `set -e\nnpx tsc --noEmit\n` pattern that produced the bug.
- Existing 45 pipeline tests still pass (all use
  `stringContaining('npx tsc --noEmit')` rather than exact-string
  equality, so they survive the conditional wrapper unchanged).
- `npm test` — 1059 / 1059 green across 50 files.

Out of scope (explicitly):
- `template-renderer.ts:288` (`typecheckCommand` constant for
  non-master workflows) and `:637` (informational verification
  command shown to LLM agents). Same one-line change pattern would
  apply, but the existing test coupling there asserts on a wider
  surface (`pipeline.test.ts:980`, `validator.test.ts`,
  `debugger.test.ts`, `recovery-preflight.test.ts`,
  `evidence/capture.test.ts`). Worth a follow-up PR to keep blast
  radius proportionate to user impact — the user's failure was
  specifically in master-rendered workflows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 10, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 116bf079-87bb-4b22-a55e-acab1814a3bb

📥 Commits

Reviewing files that changed from the base of the PR and between 52c9729 and 9f16c34.

📒 Files selected for processing (2)
  • src/product/generation/master-workflow-renderer.ts
  • src/product/generation/pipeline.test.ts

📝 Walkthrough


A new workspace-safe TYPECHECK_COMMAND shell snippet conditionally runs npm run typecheck when available, falling back to npx tsc --noEmit. This replaces hardcoded TypeScript checks in the master workflow's final-hard-validation step and initial/final-hard-validation gates. A regression test verifies the conditional logic and fallback behavior.

Changes

Workspace-aware Typecheck Implementation

  • Typecheck Command Definition (src/product/generation/master-workflow-renderer.ts): New TYPECHECK_COMMAND shell snippet conditionally selects between npm run typecheck (when scripts.typecheck exists) and npx tsc --noEmit (fallback).
  • Workflow & Gate Integration (src/product/generation/master-workflow-renderer.ts): The final-hard-validation step and the initial-soft-validation and final-hard-validation gates now use TYPECHECK_COMMAND instead of a hardcoded npx tsc --noEmit.
  • Regression Test (src/product/generation/pipeline.test.ts): New test case verifies that final-hard-validation contains the conditional workspace-aware typecheck, with the npm pkg get scripts.typecheck branch and the npx tsc --noEmit fallback, and that the bare fallback does not appear directly after set -e.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A typecheck command, nimble and wise,
Adapts to each workspace, a practical prize,
When scripts exist, npm takes the lead,
When they don't, tsc fills the need!
thump thump

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check: ✅ Passed. The title accurately and concisely summarizes the main change: fixing the master-renderer to use a workspace-aware typecheck command instead of hardcoded npx tsc --noEmit.
  • Description check: ✅ Passed. The description thoroughly explains the problem (hardcoded typecheck failing in monorepos), the fix (conditional command), test coverage, and verification, all directly related to the changeset.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.


kjgbot merged commit 1b2a0be into main May 10, 2026
3 checks passed
kjgbot deleted the fix/master-renderer-workspace-aware-typecheck branch May 10, 2026 11:24
@github-actions

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-10T11-09-06-709Z-HEAD-provider
Mode: provider
Git SHA: 643f142

Passed: 8 | Needs human: 43 | Reviewable: 43 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

## Updated Ricky Documentation – “Deeper Integration with Agent Assistant”

### 1. Overview
Ricky (AgentWorkforce workflow reliability, coordination, and authoring assistant) now **leverages the Agent Assistant (AA) platform as a first‑class service** for:

| Capability | How AA is used | Benefit |
|------------|----------------|---------|
| **Deterministic verification gates** | AA runs the *contract‑check* and *state‑audit* micro‑services on every PR merge. | Guarantees that workflow contracts remain invariant across releases. |
| **Dynamic review artifacts** | AA generates *review‑summary* and *risk‑heatmap* artifacts that are automatically attached to PRs. | Reduces manual review time and surfaces hidden failure modes. |
| **Scalable validation loops** | AA orchestrates 80‑to‑100 × validation loops (fuzz, property‑based, and regression) per workflow change. | Drives statistical confidence > 99.9 % before promotion. |
| **Blocker detection & reporting** | AA’s *blocker‑engine* watches for contract violations, resource dead‑locks, or external‑API regressions and raises a **Ricky‑Blocker** ticket. | Early detection prevents cascade failures in production. |
| **Branch/PR scoping**
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Evaluation (Q1 2026)**  

| Metric | Current State | Target (Q3 2026) | Gap | Action |
|--------|---------------|------------------|-----|--------|
| **Enabled Repositories** | 12 of 28 active repos (43 %) | 22 of 28 (≥ 80 %) | +10 repos | • Add `turn-context` to the shared `ricky.yml` config of the remaining repos.<br>• Run the “Enable‑Turn‑Context” CI job to auto‑inject the dependency. |
| **Turn‑Context Calls per Workflow** | Avg 2.1 calls / workflow (baseline 5) | Avg ≥ 4.5 calls / workflow | −2.4 calls | • Refactor legacy steps to use `@agent-assistant/turn-context` for state passing.<br>• Provide a migration checklist in the repo‑wide `README`. |
| **Deterministic Verification Gates** (pass rate) | 71 % of PRs pass the “Turn‑Context Consistency” gate | ≥ 95 % | +24 % | • Harden the gate script (`verify-turn-context.js`) to fail on missing `turnId` or mismatched `contextHash`.<br>• Add a pre‑merge “dry‑run” stage that simulates a full turn. |
| **Review Artifacts** (turn‑context diff reports) | Produced in 38 % of PRs | 100 % | +62 % | • Enforce `turn-context-report` as a required check in branch protection rules.<br>• Auto‑commen
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation: Move the listed capabilities into the Agent Assistant**

| Capability | Fit with Agent Assistant | Rationale | Migration Path |
|------------|--------------------------|-----------|----------------|
| **Workflow generation** | ✅ Core | The Assistant already owns the “author‑as‑you‑go” contract (AGENTS.md §2.1). Adding generation lets it produce the initial YAML/JSON skeleton that downstream validators consume. | 1. Add a `generate‑workflow` contract (input: high‑level intent, output: workflow spec).<br>2. Unit‑test with 5‑sample intents.<br>3. Gate: schema validation + deterministic diff against golden fixtures. |
| **Workflow validation** | ✅ Core | Validation is the deterministic gate that the Assistant must expose to callers (CLI, UI, other agents). It aligns with the “verification gate” pattern (AGENTS.md §3.2). | 1. Implement `validate‑workflow` contract (input: spec, output: pass/fail + diagnostics).<br>2. Gate: 100% schema compliance + 80‑100 rule‑coverage checks.<br>3. Add CI step that runs the validator on every PR. |
| **Workflow debugging** | ✅ Core + Extension | Debugging is the “repair loop” that follows a failed validation gate. The Assistant can sug
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Rollout”**  
*Target: Core Customer‑Success (CS) teams, 3 pilot squads (10 users total). 6‑week execution window.*

---

## 1. Slice Objectives
| Goal | Success Metric | Acceptance Threshold |
|------|----------------|----------------------|
| **Feature Enablement** – Deploy Ricky’s workflow‑reliability UI and verification gates to CS squads. | % of pilot users who can create, edit, and submit a *Ricky‑compliant* workflow. | ≥ 90 % (≥ 9/10 users) |
| **Reliability Baseline** – Capture deterministic gate pass‑rates for newly authored workflows. | Avg. gate pass‑rate on first submission. | ≥ 85 % |
| **Feedback Loop** – Close the review‑artifact loop within the sprint. | Avg. time from PR open → reviewer sign‑off. | ≤ 24 h |
| **Blocker Transparency** – Log all adoption blockers in the “Ricky‑Adopt” issue board. | % of blockers reported within 2 h of detection. | 100 % |

---

## 2. Scope & Boundaries

| In‑Scope | Out‑Of‑Scope |
|----------|--------------|
| • Creation of **Ricky‑compliant workflow contracts** (YAML spec, versioned).<br>• Deterministic verification gates (syntax, schema, idempotence, test‑coverage ≥ 80 %).<br>• Review a
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky – Future Support Layer (Slack / Web)

Below is a **concrete, contract‑driven design** for extending Ricky with a **Slack‑app** and a **Web‑portal** that both surface the same Agent‑Assistant package (AAP) services.  The design follows the **Ricky repository conventions** (AGENTS.md), the **workflow standards** (deterministic gates, review artifacts, 80‑100 validation loops), and the **shared authoring rules** (scoped branches/PRs, blocker reporting).

---

### 1. High‑Level Architecture

┌─────────────────────┐ ┌─────────────────────┐
│ Slack App (Bot) │ │ Web Portal (SPA) │
│ - Events API │ │ - React + Vite │
│ - Interactivity │ │ - Auth (OAuth2) │
└───────┬─────────────┘ └───────┬─────────────┘
│ │
▼ ▼
┌─────────────────────────────────────────────────────┐
│ Ricky Service Mesh (K8s) │
│ • AAP Dispatcher (gRPC) ← All Agent‑Assistant PKGs │
│ • Policy Engine (OPA) ← Rate‑limit, RBAC │
│ • Persistence (Postgres) ← Workflow state, logs │
│ • Observability (OTEL) ← Traces, metrics
...[truncated]


**Must**

- Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
- Keep local CLI behavior product-local unless shared adoption is proven harmless.
- Explain which behavior is future/target architecture versus implemented today.

**Must Not**

- Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
- Let future surface abstractions distort the current CLI contract.
- Duplicate a mature Agent Assistant capability locally without justification.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.generation-default-not-execution</code> (cli-behavior/manual)</summary>

**User message**

> A user runs `ricky --mode local --spec "generate a workflow for package checks"` without `--run`.

**Ricky output**

```text
**What the command does**

Running  

```bash
ricky --mode local --spec "generate a workflow for package checks"
  • --mode local – tells Ricky to operate against the local repository (no remote CI/CD interaction).
  • --spec "generate a workflow for package checks" – is the natural‑language request that Ricky will translate into a concrete workflow definition (YAML/JSON) following the AgentWorkforce contract.
  • No --run flag – means Ricky will only generate the workflow artifact; it will not execute any of the steps, nor will it create a PR or push anything to the remote.

So the outcome is a draft workflow file (e.g., package-checks.yml) placed in the appropriate .ricky/workflows/ directory, ready for review, validation, and eventual execution.


Next steps you’ll likely want to take

Step Command What it does
1. Review the generated workflow cat .ricky/workflows/package-checks.yml Inspect the YAML to ensure it matches your expectations (stages, inputs, outputs, verification gates).
2. Run deterministic verification gates `ricky --mode local --verify .ricky/workflows/pac
...[truncated]

**Must**

- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.

**Must Not**

- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>

**User message**

> Render Ricky's first-run CLI onboarding for a new user.

**Ricky output**

```text
**Ricky – First‑Run CLI Onboarding**

Below is the complete, copy‑and‑paste ready onboarding workflow that a new user should run the first time they invoke the `ricky` CLI. It follows the **Ricky repository conventions** (AGENTS.md), the **workflow standards** (deterministic gates, review artifacts, 80‑100 validation loops), and the **shared authoring rules** (scoped branches/PRs, honest blocker reporting).

---

## 1️⃣  Create a Scoped Onboarding Branch

```bash
# From the repository root
git checkout -b onboarding/first‑run‑<your‑username>

All onboarding changes stay isolated until the final PR is merged.


2️⃣ Install the CLI (if not already)

# Recommended: use the official installer script
curl -fsSL https://get.ricky.ai/install.sh | bash

# Verify installation
ricky --version
# Expected output: ricky version X.Y.Z

Deterministic gate: The version printed must match the version declared in package.json (or pyproject.toml). If it does not, abort and open an issue.


3️⃣ Initialise the Local Workspace

ricky init \
  --org <YOUR_ORG_NAME> \
  --project <YOUR_PROJECT_NAME> \
  --template workflow‑starter

What happens:

  • Gene
    ...[truncated]

**Must**

- Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
- End every branch with a concrete next step.
- Advertise only commands that are currently implemented.

**Must Not**

- Sound like a launch page or documentation dump.
- Claim Ricky runs workflows by default when generation is the default path.
- Require web or Slack onboarding before CLI use.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.recovery-guidance-no-stack-traces</code> (cli-behavior/manual)</summary>

**User message**

> A user gives Ricky an empty spec or a missing spec file.

**Ricky output**

```text
**Ricky’s Standard Response to an Empty or Missing Spec File**

| Step | Action | Expected Output | Why it matters |
|------|--------|----------------|----------------|
| 1️⃣  | **Detect the problem** – As soon as the workflow runner attempts to load the spec, check `fs.stat` / `fs.readFile`. If the file does not exist **or** its size is 0 bytes, raise a *Spec‑Load Blocker*. | `BLOCKER: spec missing or empty (path: <repo>/workflow/spec.yaml)` | Guarantees that no downstream steps run on an undefined contract. |
| 2️⃣  | **Emit a deterministic verification gate** – Create a `spec‑availability` gate that must pass before any other gate can be evaluated. | `gate: spec‑availability → PASS/FAIL` (FAIL in this case) | Makes the failure explicit in CI dashboards and prevents silent “no‑op” runs. |
| 3️⃣  | **Auto‑generate a placeholder spec** – In a new branch `ricky/spec‑placeholder‑<timestamp>`, add a minimal, well‑formed spec skeleton that includes: <br>• `name`, `description`, `version` <br>• `inputs: []` <br>• `outputs: []` <br>• `steps: []` <br>• `contract: {}` | `workflow/spec.yaml` with the skeleton above | Gives the team a concrete artifact to edit, keeps the repo in a buildable 
...[truncated]

Must

  • Return a user-facing failure or guidance message with a real recovery command.
  • Distinguish generation failure from execution failure.
  • Show stack traces only when verbose diagnostic mode is requested.

Must Not

  • Crash with an uncaught exception in normal mode.
  • Suggest commands that do not exist.
  • Pretend a missing spec was accepted.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**  

=== AgentWorkforce Status Report ===

🟡 Provider Connectivity
• Google: NOT VERIFIED (no successful auth check)
• GitHub: NOT VERIFIED (no successful auth check)

🔧 Configuration
• Google OAuth client ID:
• GitHub App ID:
• Required scopes: configured

⚙️ System Health
• Core services: ✅ Running
• Scheduler: ✅ Running
• Database: ✅ Connected
• Cache (Redis): ✅ Connected

🚨 Alerts
• Provider connectivity not confirmed.
• Action required: run ricky provider check google and ricky provider check github
• Or re‑authenticate the missing providers.

📝 Next Steps
1. Verify Google credentials:
ricky provider auth google --interactive
2. Verify GitHub credentials:
ricky provider auth github --interactive
3. Re‑run status after successful checks:
ricky status

--- End of Report ---

Must

  • Report unknown or not-connected provider state honestly.
  • Update provider status only from explicit provider checks or Cloud status results.
  • Give concrete setup guidance for Cloud when relevant.

Must Not

  • Mark Google or GitHub connected because guidance text was shown.
  • Invent a provider connection URL or OAuth flow.
  • Show empty fields with no recovery guidance when config is missing.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)

User message

Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

Ricky output

**Workflow: “GitHub Primitive Change”**

| Step | Owner | Action | Input | Output | Verification Gate |
|------|-------|--------|-------|--------|--------------------|
| 1️⃣ | **Ricky (Coordination)** | **Create a scoped branch** | `main` (latest) | New branch `feat/<short‑description>` | Branch exists, naming follows `feat/…` pattern |
| 2️⃣ | **Ricky (Branch‑Management)** | **Sync branch with upstream** | `main` | Branch is up‑to‑date (no diverge) | `git fetch && git rebase origin/main` succeeds |
| 3️⃣ | **Developer (Implementation)** | **Apply primitive change** (e.g., edit a config file, bump a version, add a comment) | Scoped branch | Commit `c1` with clear message `chore: <short description>` | `git log -1` shows conventional commit format |
| 4️⃣ | **Ricky (Deterministic‑Verification)** | **Run local lint / unit tests** | Commit `c1` | Test report `test‑report.xml` | All tests pass (`exit code 0`) |
| 5️⃣ | **Ricky (CI‑Trigger)** | **Push branch & open PR** | Scoped branch | PR #\<n\> targeting `main` | PR created, title follows `chore: …` |
| 6️⃣ | **Ricky (Review‑Orchestration)** | **Assign reviewers & label** | PR #\<n\> | Reviewers added, labels `primitive‑change`, `ci‑
...[truncated]

Must

  • Select skills from the actual registry rather than a hardcoded static set.
  • Record ranked skill matches with confidence and trigger evidence.
  • Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

  • Claim runtime agents embody skills when only generation-time guidance was applied.
  • Fail generation solely because optional skill files are missing.
  • Hide skill selection evidence from artifacts or JSON output.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
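
A minimal TypeScript sketch of registry-backed selection follows. It assumes a hypothetical `SkillEntry` shape with trigger keywords. The scoring rule (the fraction of triggers found in the spec) and the 0.3 default threshold are illustrative choices, not Ricky's actual matcher. When the registry is missing or no skill clears the threshold, the function returns a warning instead of failing generation.

```ts
// Hypothetical registry entry; Ricky's real skill metadata may differ.
interface SkillEntry {
  id: string;
  triggers: string[]; // lower-case keywords that suggest the skill applies
}

interface SkillMatch {
  id: string;
  confidence: number; // fraction of the skill's triggers found in the spec
  evidence: string[]; // the triggers that actually matched
}

export function selectSkills(
  spec: string,
  registry: SkillEntry[] | undefined,
  threshold = 0.3,
): { matches: SkillMatch[]; warnings: string[] } {
  // A missing registry degrades to "no skills" instead of failing generation.
  if (!registry || registry.length === 0) {
    return { matches: [], warnings: ["skill registry missing or empty; generating without skills"] };
  }
  const text = spec.toLowerCase();
  const matches = registry
    .map((skill): SkillMatch => {
      const evidence = skill.triggers.filter((t) => text.includes(t));
      const confidence = skill.triggers.length > 0 ? evidence.length / skill.triggers.length : 0;
      return { id: skill.id, confidence, evidence };
    })
    .filter((m) => m.confidence >= threshold)
    // Highest confidence first; ties broken by id so the ranking is deterministic.
    .sort((a, b) => b.confidence - a.confidence || a.id.localeCompare(b.id));
  const warnings = matches.length === 0 ? [`no skill cleared the ${threshold} threshold`] : [];
  return { matches, warnings };
}
```

Each match keeps its `evidence`, so the ranked list can be written to an artifact or to JSON output as-is. That keeps the selection inspectable.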
REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow: Claude‑Review → Codex‑Implementation**

| **Stage** | **Owner** | **Input** | **Output** | **Verification Gate** | **Artifacts** |
|-----------|-----------|-----------|------------|-----------------------|---------------|
| 1️⃣ **Prompt Draft** | Product Owner | Feature ticket (title, description, acceptance criteria) | Structured prompt JSON for Claude | ✅ Schema validation (JSON‑Schema v2020‑12) | `prompt_draft.json` |
| 2️⃣ **Claude Review** | **Claude (LLM‑review)** | Prompt JSON | Reviewed prompt + review notes | ✅ Passes *Claude‑Review* test: contains “review‑summary”, “issues‑list”, “suggested‑changes” fields; length ≤ 2 k tokens | `claude_review.yaml` |
| 3️⃣ **Human Gate** | Senior Engineer | Claude review artifact | Go/No‑Go decision | ✅ Decision recorded in `gate_decision.md` (approved / blocked with blocker reason) | `gate_decision.md` |
| 4️⃣ **Prompt Finalization** | Engineer | Approved reviewed prompt | Final prompt JSON for Codex | ✅ Diff check: only allowed fields changed (no removal of required keys) | `prompt_final.json` |
| 5️⃣ **Codex Generation** | **Codex (LLM‑code)** | Final prompt JSON | Code diff (patch) + unit‑test scaffold | ✅ Compilation che
...[truncated]

Must

  • Assign per-step runner/model decisions from explicit spec hints where possible.
  • Let skill metadata or project defaults fill gaps when the spec is silent.
  • Write tool-selection.json or equivalent audit metadata.

Must Not

  • Use one generic runtime default for every agent despite explicit hints.
  • Let skill defaults override explicit user/spec runner hints.
  • Omit the reason each tool or model was chosen.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
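
The precedence in the Must list (explicit spec hint first, then skill default, then project default) can be sketched as follows. The `<Runner> to <verb>` hint syntax, the `parseRunnerHints` and `selectRunner` names, and the decision shape are all hypothetical. What the sketch shows is that every decision carries its source and its reason.

```ts
type DecisionSource = "spec-hint" | "skill-default" | "project-default";

interface ToolDecision {
  step: string;
  runner: string;
  source: DecisionSource;
  reason: string; // recorded so the audit metadata can explain every choice
}

// Hypothetical hint syntax: "<Runner> to <verb>", as in "use Claude to review and Codex to implement".
export function parseRunnerHints(spec: string): Map<string, string> {
  const hints = new Map<string, string>();
  const re = /\b([A-Za-z]+)\s+to\s+([a-z]+)\b/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(spec)) !== null) {
    const runner = m[1];
    const verb = m[2];
    if (runner && verb) hints.set(verb, runner.toLowerCase()); // verb -> runner
  }
  return hints;
}

export function selectRunner(
  step: string,
  hints: Map<string, string>,
  skillDefaults: Record<string, string>,
  projectDefault: string,
): ToolDecision {
  // 1. Explicit spec hints always win.
  const hinted = hints.get(step);
  if (hinted) return { step, runner: hinted, source: "spec-hint", reason: `spec says "${hinted} to ${step}"` };
  // 2. Skill metadata fills gaps only when the spec is silent.
  const skill = skillDefaults[step];
  if (skill) return { step, runner: skill, source: "skill-default", reason: `spec silent; skill default for ${step}` };
  // 3. Project default is the last resort.
  return { step, runner: projectDefault, source: "project-default", reason: "spec and skills silent; project default" };
}
```

Serializing the `ToolDecision` objects would produce the kind of audit metadata that tool-selection.json is meant to hold.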
REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

### Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is a deterministic post‑processing flag that takes an existing workflow JSON/YAML and rewrites the **`task.description`** and **`task.acceptanceGate`** fields for every step. The goal is to make each step **action‑oriented, measurable, and verifiable** while keeping the original logical flow intact.

Below is the end‑to‑end contract you can copy‑paste into your CI pipeline or local CLI.

---

## 1️⃣ Contract Overview  

| Element | Requirement | Example |
|--------|-------------|---------|
| **Input** | `workflow.{json|yaml}` – a valid workflow file with `steps[].task` objects. | `my‑pipeline.yaml` |
| **Flag** | `--refine` – must be supplied exactly once. | `ricky-cli generate --refine my‑pipeline.yaml` |
| **Output** | Same file format, same step order, but each `task.description` and `task.acceptanceGate` rewritten to meet the **Concrete‑Task Rule** (see §2). | `my‑pipeline.refined.yaml` |
| **Determinism** | No random text generation; the refinement algorithm is pure functional (same input → same output). | ✅ |
| **Verification Gate** | After refinement, a **static validator** (`ricky-validate`) 
...[truncated]

Must

  • Keep the deterministic unrefined path as the default.
  • Bound the refinement pass by timeout, token budget, and editable regions.
  • Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

Must Not

  • Change the workflow graph, agent assignments, or side-effect scope during refinement.
  • Fail the whole generation if optional refinement times out.
  • Hide warnings when the deterministic artifact is returned unchanged.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
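
Below is a sketch of an opt-in, bounded refinement pass under assumed types. `Workflow`, the injected `refiner` and `validate` callbacks, and the 30-second default timeout are all hypothetical. Refinement runs only when requested. A timeout, provider error, structural change, or failed validation each return the deterministic artifact with a warning. The token budget from the Must list is left to the refiner here.

```ts
interface Step {
  id: string;
  agent: string;
  dependsOn: string[];
  task: string; // the only field refinement may edit in this sketch
}
interface Workflow {
  steps: Step[];
}
interface RefineResult {
  workflow: Workflow;
  refined: boolean;
  warnings: string[];
}

// Graph and agent assignments must survive refinement unchanged.
function structureKey(wf: Workflow): string {
  return JSON.stringify(wf.steps.map(({ id, agent, dependsOn }) => ({ id, agent, dependsOn })));
}

function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    p.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}

export async function maybeRefine(
  base: Workflow,
  opts: { refine?: boolean; timeoutMs?: number },
  refiner: (wf: Workflow) => Promise<Workflow>,
  validate: (wf: Workflow) => boolean,
): Promise<RefineResult> {
  // Default path: the deterministic, unrefined artifact.
  if (!opts.refine) return { workflow: base, refined: false, warnings: [] };
  try {
    const candidate = await withTimeout(refiner(base), opts.timeoutMs ?? 30_000);
    if (structureKey(candidate) !== structureKey(base)) {
      return { workflow: base, refined: false, warnings: ["refinement changed graph or agents; kept deterministic artifact"] };
    }
    if (!validate(candidate)) {
      return { workflow: base, refined: false, warnings: ["refined artifact failed validation; kept deterministic artifact"] };
    }
    return { workflow: candidate, refined: true, warnings: [] };
  } catch (err) {
    // Timeouts and provider failures never fail the whole generation.
    return { workflow: base, refined: false, warnings: [`refinement failed (${(err as Error).message}); kept deterministic artifact`] };
  }
}
```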
REVIEW generation-quality.behavior-grounded-gates (generation-quality/manual)

User message

Generate a workflow for the ricky --version spec.

Ricky output

**Workflow: `ricky --version` Specification & Delivery**  
*Repository:* `github.com/agentworkforce/ricky`  
*Target Branch:* `main` → `feat/version-cli`  
*PR:* `feat/version-cli` → `main`  

---

## 1. Scope & Acceptance Criteria  

| # | Requirement | Acceptance Test |
|---|-------------|-----------------|
| 1 | `ricky --version` prints a semantic version string (`MAJOR.MINOR.PATCH`) matching the `package.json`/`go.mod` version. | Run `ricky --version`; output matches `git describe --tags --abbrev=0` (or `package.json.version`). |
| 2 | The command exits with status `0`. | Verify `$? == 0`. |
| 3 | The version flag is discoverable via `ricky --help`. | `ricky --help` lists `--version` under “Global Flags”. |
| 4 | The flag is case‑insensitive (`-V` and `--VERSION` also work). | Test all variants; each prints the same version. |
| 5 | The version string is injected at build time (no runtime file reads). | Build artifact contains hard‑coded version; source does not read external files at runtime. |
| 6 | CI must fail if the version string diverges from the repository tag. | CI step `verify-version-tag` compares binary output to `git tag --points-at HEAD`. |
| 7 | Documentation upd
...[truncated]

Must

  • Build gates from the stated acceptance behavior, such as checking ricky --version output.
  • Avoid generic source-shape grep checks when the spec asks for CLI behavior.
  • Keep generated validation meaningful for the current repo shape.

Must Not

  • Treat grep -Eq 'export|function|class|workflow(' dist/ricky.js as proof of version behavior.
  • Claim the workflow is proven by source syntax alone.
  • Ignore the package-json version resolution order in the spec.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
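
A behavior-grounded version gate can be sketched as a pure check over what the built CLI actually printed. The sketch assumes the CLI prints only the version and that the expected value has already been read from package.json. The function name and result shape are hypothetical.

```ts
// MAJOR.MINOR.PATCH with an optional leading "v" and optional pre-release/build suffix.
const SEMVER = /^v?\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$/;

export interface GateResult {
  pass: boolean;
  detail: string;
}

// Checks observed CLI behavior (exit code and stdout), not the shape of the source.
export function checkVersionGate(stdout: string, exitCode: number | null, expected: string): GateResult {
  if (exitCode !== 0) return { pass: false, detail: `ricky --version exited with ${exitCode}` };
  const printed = stdout.trim();
  if (!SEMVER.test(printed)) return { pass: false, detail: `not a semantic version: ${JSON.stringify(printed)}` };
  const version = printed.replace(/^v/, "");
  if (version !== expected) return { pass: false, detail: `printed ${version}, expected ${expected}` };
  return { pass: true, detail: `ricky --version printed ${version}` };
}
```

In a workflow step, the inputs could come from something like `spawnSync('node', ['dist/ricky.js', '--version'], { encoding: 'utf8' })` from `node:child_process`. That way the gate exercises the CLI instead of grepping its source.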
REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact Validation‑Fix‑Rerun Loop”**  
*Purpose*: Process a large set of independent artifacts (e.g., data files, code modules, model checkpoints) through a deterministic validation pipeline, automatically apply fix‑scripts where possible, and rerun until all artifacts are either **PASS** or **BLOCKED**. The workflow is scoped to a single branch (`batch‑validation`) and produces a single PR that aggregates all results.

---

## 1. Contract (YAML)

```yaml
name: batch_artifact_validation
description: |
  Validate, auto‑fix, and rerun a collection of independent artifacts.
scope:
  branch: batch-validation
  pr_title: "Batch validation results – {run_id}"
  pr_labels: [batch‑validation, auto‑fix, ci]
inputs:
  artifacts_path:   # root folder containing artifacts
    type: string
    description: Relative path to the directory with artifacts.
    default: artifacts/
  validator:
    type: string
    description: Executable that returns exit‑code 0 for PASS, 1 for FAIL.
    default: ./validate.sh
  fixer:
    type: string
    description: Script that attempts to fix a failed artifact.
    default: ./fix.sh
  max_iterations:
    type: integer
    description: Upper b
...[truncated]

Must

  • Choose dag, supervisor, or pipeline deliberately based on the work shape.
  • Explain the pattern choice in artifact metadata or a rationale.
  • Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

  • Default blindly to dag for every workflow.
  • Collapse independent artifact work into one vague agent task.
  • Omit verification gates because the chosen pattern seems obvious.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
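
Here is a sketch of a deliberate pattern choice that returns a rationale alongside the pattern. The `WorkShape` fields and the mapping from shape to pattern are assumptions for illustration. They are not Relay's documented selection rules.

```ts
type Pattern = "dag" | "supervisor" | "pipeline";

// Hypothetical summary of the work shape, derived from the spec.
interface WorkShape {
  independentUnits: number; // artifacts that can be handled in parallel
  validationFixLoop: boolean; // validate -> fix -> rerun cycles
  runtimeDiscoveredWork: boolean; // work items only known while running
}

export function choosePattern(shape: WorkShape): { pattern: Pattern; rationale: string } {
  if (shape.validationFixLoop) {
    return { pattern: "dag", rationale: "validate, fix, and rerun steps need explicit dependency edges" };
  }
  if (shape.runtimeDiscoveredWork) {
    return { pattern: "supervisor", rationale: "a coordinator must assign work discovered at runtime" };
  }
  if (shape.independentUnits > 1) {
    return { pattern: "dag", rationale: `${shape.independentUnits} independent units fan out as separate steps` };
  }
  return { pattern: "pipeline", rationale: "a single unit of work flowing through ordered stages" };
}
```

The returned `rationale` is the kind of text that could be stored in artifact metadata, so the choice is explained rather than implied.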
REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shaped the Generated Ricky Workflow  

| Skill | What the Skill Does | Direct Effect on the Generated Workflow | Why It Matters for Reliability |
|-------|--------------------|----------------------------------------|--------------------------------|
| **Deterministic Contract Definition** | Forces every step to declare its input‑output schema (JSON schema, type hints, or OpenAPI fragment). | Each node in the workflow begins with a `contract:` block that lists required fields, optional defaults, and exact data types. The orchestrator validates the payload before the step runs and aborts early on mismatches. | Guarantees that downstream agents receive exactly what they expect, eliminating “type‑drift” bugs that are common in loosely‑typed pipelines. |
| **Verification Gate Insertion** | Inserts explicit `gate:` stages that run automated checks (unit test, schema validation, policy compliance) before allowing the next step to execute. | After every major transformation (e.g., data enrichment, model inference) a gate runs a tiny test suite (`pytest`‑style assertions) and a policy lint (`ricky‑policy‑check`). The workflow only proceeds on a gree
...[truncated]

Must

  • Describe skills as generation-time selection, loading, and rendering inputs.
  • Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
  • Say the workflow instructions were informed by selected skills.

Must Not

  • Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
  • Treat metadata existence as proof that agents received skill bodies at runtime.
  • Overstate current tests beyond the generation boundary.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
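
A small copy check can catch the overclaims in the Must Not list before they reach generated text. This is only a sketch. The phrase list is taken from this case and is not exhaustive, and the naive sentence splitting on `.`, `!`, and `?` will mis-split abbreviations.

```ts
// Phrases ruled out unless runtime skill loading is implemented and tested.
const OVERCLAIMS: RegExp[] = [
  /skill[- ]embedded/i,
  /skill[- ]powered at runtime/i,
  /\bembod(?:y|ies|ied)\b[^.]*\bskills?\b/i,
];

// Returns the sentences that overstate what skills do at runtime.
export function findSkillOverclaims(copy: string): string[] {
  const sentences = (copy.match(/[^.!?]+[.!?]*/g) ?? []).map((s) => s.trim());
  return sentences.filter((s) => OVERCLAIMS.some((re) => re.test(s)));
}
```

Text that passes this check can still point readers at loaded-skills.txt and skill-application-boundary.json as the evidence of generation-time selection.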
REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

**Ricky – Workflow Generation Assistant**

To turn a product specification into a reliable, review‑ready workflow we need a concrete set of inputs. With a vague spec the generated workflow would be incomplete and could introduce blockers later in the pipeline.

**Please provide the following details (you can answer as many as you have):**

| # | Required Information | Why It Matters |
|---|----------------------|----------------|
| 1 | **High‑level product goal** (e.g., “automate onboarding of new SaaS customers”) | Drives the overall workflow purpose and success criteria. |
| 2 | **Key functional steps** (list in order, e.g., “collect user data → verify email → provision account”) | Determines the sequence of jobs, dependencies, and hand‑offs. |
| 3 | **Inputs & outputs for each step** (e.g., “CollectUserData: input = API request, output = JSON payload”) | Enables deterministic contracts and verification gates. |
| 4 | **Success & failure conditions** for each step (e.g., “email verification succeeds if SMTP returns 250”) | Required for automated gating and retry policies. |
| 5 | **Stakeholder review points** (who must sign‑off, what artifacts are needed) | Defines review artifac
...[truncated]

Must

  • Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
  • Validate with dry-run or targeted structural checks where possible.
  • Return artifacts, warnings, and follow-up commands honestly.

Must Not

  • Act like Ricky is a pure code-generation bot that emits workflows without verification.
  • Stop at "code compiles" as the proof bar.
  • Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
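
A targeted structural check of the kind the Must list mentions might look like the sketch below. It assumes a hypothetical `StepSummary` extracted from the generated workflow, and it assumes step roles can be inferred from step ids (for example `final-hard-validation` and `final-signoff`). It complements a dry run rather than replacing it.

```ts
// Hypothetical minimal step summary; real Relay workflow steps carry more fields.
interface StepSummary {
  id: string;
  kind: "agent" | "deterministic";
  dependsOn: string[];
}

// Returns structural problems; an empty list means these checks passed.
export function structuralProblems(steps: StepSummary[]): string[] {
  const problems: string[] = [];
  const ids = new Set(steps.map((s) => s.id));
  for (const s of steps) {
    for (const dep of s.dependsOn) {
      if (!ids.has(dep)) problems.push(`${s.id} depends on unknown step ${dep}`);
    }
  }
  // Role detection by id is an assumption of this sketch.
  if (!steps.some((s) => s.kind === "deterministic" && /verif|valid/.test(s.id))) {
    problems.push("no deterministic verification step");
  }
  if (!steps.some((s) => /review/.test(s.id))) problems.push("no review step");
  if (!steps.some((s) => /sign-?off/.test(s.id))) problems.push("no signoff step");
  return problems;
}
```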
REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s Immediate Response Plan**

| Phase | Action | Why | Deliverable / Artifact |
|-------|--------|-----|------------------------|
| 1️⃣ Detect & Log | • Pull the execution trace from the orchestrator (step‑ID, timestamps, exit codes, heartbeat logs). <br>• Record the timeout event and the “alive‑but‑idle” worker state in the **Workflow Failure Log (WFL)**. | Provides a deterministic, auditable record that can be referenced in every downstream gate. | `wfl/<workflow‑id>/failure‑<timestamp>.json` |
| 2️⃣ Blocker Classification | • Mark the failure as a **Hard Blocker** (cannot continue without human‑verified fix). <br>• Tag the blocker with: <br> `type: timeout` <br> `affected‑step: <step‑name>` <br> `symptom: orphaned‑worker`. | Guarantees that the orchestrator will pause the pipeline and that no downstream steps are scheduled. | Updated entry in `WFL` + entry in `blockers.yaml` |
| 3️⃣ Verification Gate – “Failure‑Intake” | • Gate **FAIL‑INTAKE** checks: <br> a) Failure log exists and is well‑formed. <br> b) All relevant metrics (CPU, memory, heartbeat) are attached. <br> c) No other concurrent failures are pending. | Deterministic gate ensures the failure is fully captured 
...[truncated]

Must

  • Classify the failure before deciding whether to retry.
  • Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
  • Preserve raw evidence and uncertainty when the class is not known.

Must Not

  • Blindly rerun the whole workflow without a blocker classification.
  • Treat every failure as a broken workflow definition.
  • Claim Ricky fixed the workflow before rerun evidence proves it.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
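
Here is a sketch of classify-then-decide. It uses a hypothetical `StepEvidence` shape and illustrative matching rules; the `tsc` help-dump rule mirrors the repro in this PR. Only a timeout is treated as retryable as-is. Every other class needs a fix first, and unmatched failures stay `unknown` with their raw evidence attached.

```ts
type FailureClass =
  | "timeout"
  | "agent-runtime-opacity"
  | "validation-strategy-mismatch"
  | "environment-blocker"
  | "workflow-structure-bug"
  | "unknown";

// Hypothetical per-step evidence; real WorkflowRunEvidence fields may differ.
interface StepEvidence {
  step: string;
  timedOut: boolean;
  workerAlive: boolean;
  artifactsWritten: number;
  exitCode: number | null;
  output: string; // combined stdout + stderr, kept verbatim
}

interface Classification {
  step: string;
  failureClass: FailureClass;
  retryRecommended: boolean;
  rationale: string;
  evidence: StepEvidence; // raw evidence attached, never summarized away
}

// Checked in order; the first matching rule wins.
const RULES: Array<[FailureClass, (e: StepEvidence) => boolean, string]> = [
  ["timeout", (e) => e.timedOut, "step exceeded its time budget"],
  ["agent-runtime-opacity", (e) => e.workerAlive && e.artifactsWritten === 0, "worker alive but produced no artifacts"],
  ["validation-strategy-mismatch", (e) => /tsc: The TypeScript Compiler/.test(e.output), "validation command printed help instead of checking the repo"],
  ["environment-blocker", (e) => /MISSING_ENV_VAR|ENOENT|EACCES|command not found/.test(e.output), "missing env, file, permission, or binary"],
  ["workflow-structure-bug", (e) => /ReferenceError|SyntaxError/.test(e.output), "workflow artifact failed to load"],
];

export function classifyFailure(e: StepEvidence): Classification {
  const hit = RULES.find(([, matches]) => matches(e));
  const failureClass: FailureClass = hit ? hit[0] : "unknown";
  return {
    step: e.step,
    failureClass,
    retryRecommended: failureClass === "timeout",
    rationale: hit ? hit[2] : "no rule matched; evidence preserved for a human",
    evidence: e,
  };
}
```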
REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow Contract – “Detect‑and‑Purge Stale Relay State”**  
*Scope*: Local development runs of any `.agent‑relay/`, `.relay/`, or `.trajectories/` directories that exist under the repository root.  
*Pre‑condition*: The invoking command (`ricky run …` or `npm run workflow`) must be executed from the repository root.  
*Post‑condition*: All stale relay artefacts are removed **or** a deterministic blocker is raised with a clear remediation path; the workflow then proceeds only if the blocker is resolved.

---

## 1. Trigger & Entry Point
```yaml
name: detect-stale-relay-state
on:
  workflow_dispatch:   # manual or CI‑triggered local run
  push:
    paths:
      - '**/.agent-relay/**'
      - '**/.relay/**'
      - '**/.trajectories/**'
```

Ricky injects this step automatically at the very start of every local workflow definition (`.ricky/workflows/*.yml`).

## 2. Detection Logic (deterministic, testable)

| Step | Action | Deterministic Check | Expected Output |
|------|--------|---------------------|-----------------|
| 2.1 | Scan for directories matching the three patterns. | `find . -type d \( -path "*/.agent-relay" -o -path "*/.relay" -o -path "*/.trajectories" \)
...[truncated]

**Must**

- Classify stale local runtime state as an environment contamination issue.
- Recommend quarantine or isolated-run guidance before launch.
- Record the observed paths and the action taken or recommended.

**Must Not**

- Treat stale runtime state as a workflow logic failure.
- Delete or overwrite state without an explicit safe path or user intent.
- Continue into execution as if the workspace were clean.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
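
A read-only detection sketch for the three directories named in the user message follows. Treating a directory as stale when it exists and is non-empty is an assumption of this sketch, as are the report fields. Nothing is deleted, and the launch decision is left to the caller.

```ts
import * as fs from "node:fs";
import * as path from "node:path";

const RELAY_STATE_DIRS = [".agent-relay", ".relay", ".trajectories"];

export interface StaleStateReport {
  classification: "clean" | "environment-contamination";
  observedPaths: string[];
  recommendation: string;
  deleted: false; // detection never removes anything
}

export function detectStaleRelayState(root: string): StaleStateReport {
  // "Stale" here means: present, a directory, and non-empty (an assumption).
  const observedPaths = RELAY_STATE_DIRS.map((d) => path.join(root, d)).filter(
    (p) => fs.existsSync(p) && fs.statSync(p).isDirectory() && fs.readdirSync(p).length > 0,
  );
  if (observedPaths.length === 0) {
    return { classification: "clean", observedPaths, recommendation: "no relay state found; safe to launch", deleted: false };
  }
  return {
    classification: "environment-contamination",
    observedPaths,
    recommendation: "launch blocked: move these directories aside or run in an isolated worktree, then retry",
    deleted: false,
  };
}
```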

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.already-running-conflict</code> (runtime-recovery/manual)</summary>

**User message**

> A run marker says another Ricky or Relay run is already active in this workspace.

**Ricky output**

```text
**Fixing “Another Ricky or Relay run is already active in this workspace”**

The marker that blocks a new run is a safety file created by the **Ricky** CLI to prevent overlapping executions. When the file is left behind (e.g., a crash, a killed process, or a stale workspace), the next invocation thinks a run is still in progress.

Below is the deterministic, step‑by‑step contract to clear the blocker safely and get the workspace back to a clean state.

---

## 1️⃣ Verify the Current State

| Action | Command | Expected Output |
|--------|---------|-----------------|
| Show active Ricky processes | `ps aux \| grep ricky | grep -v grep` | *No lines containing `ricky`* |
| Show active Relay processes | `ps aux \| grep relay | grep -v grep` | *No lines containing `relay`* |
| Show the run‑marker file | `cat .ricky/run.marker` (or `.relay/run.marker`) | Path to the marker and a UUID, e.g. `workspace:/tmp/ricky-2024-09-12-abcdef` |

If any `ricky`/`relay` processes appear, **kill them first** (`kill -9 <PID>`). If none are running, proceed to step 2.

---

## 2️⃣ Remove the Stale Marker

> **Deterministic verification gate:** The marker must be removed **only** after confirming no live p
...[truncated]

Must

  • Report the active marker, run id, or status path when available.
  • Ask the user to inspect, wait for, or explicitly clear the active run.
  • Avoid launching a competing run that could corrupt evidence.

Must Not

  • Silently start another run.
  • Hide the existing run marker from the user.
  • Treat the conflict as a generic failure with no recovery path.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
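
Here is a sketch of a marker check that reports the conflict instead of clearing it. The JSON marker format (`runId`, `pid`, optional `statusPath`) is hypothetical. Liveness uses `process.kill(pid, 0)`, which sends no signal and only tests whether the process exists. Both `active` and `stale` results are meant to be shown to the user, who then decides whether to wait or clear.

```ts
import * as fs from "node:fs";

// Hypothetical marker format written by a running Ricky/Relay process.
interface RunMarker {
  runId: string;
  pid: number;
  statusPath?: string;
}

export type MarkerCheck =
  | { state: "none" }
  | { state: "active"; marker: RunMarker; markerPath: string }
  | { state: "stale"; marker: RunMarker; markerPath: string };

function pidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0: existence/permission check only
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return (err as { code?: string }).code === "EPERM";
  }
}

// Reads the marker but never removes it; clearing is an explicit user action.
export function checkRunMarker(markerPath: string): MarkerCheck {
  if (!fs.existsSync(markerPath)) return { state: "none" };
  const marker = JSON.parse(fs.readFileSync(markerPath, "utf8")) as RunMarker;
  return pidAlive(marker.pid) ? { state: "active", marker, markerPath } : { state: "stale", marker, markerPath };
}
```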
REVIEW runtime-recovery.env-loader-injection-runtime-loadable (runtime-recovery/manual)

User message

A workflow artifact references a MISSING_ENV_VAR value. Ricky's deterministic auto-fix injects the .env.local / .env loader (loadRickyWorkflowEnv) and the optional assertRickyWorkflowEnv guard into the artifact before retry. The artifact may be a master-rendered workflow whose .step({ command: ... }) bodies embed node --input-type=module HEREDOCs containing literal import { ... } from 'node:fs' / from 'node:path' strings.

Ricky output

**Deterministic auto‑fix for missing environment variables**

Below is the exact patch you can apply to any workflow artifact that currently fails because it references a `MISSING_ENV_VAR`. The fix does three things:

1. **Injects the environment loader** – `loadRickyWorkflowEnv()` – which reads `.env.local` first, then falls back to `.env`.  
2. **Adds an optional guard** – `assertRickyWorkflowEnv()` – that throws a clear error if the required variable is still undefined after loading.  
3. **Leaves the existing step bodies untouched** (including the `node --input-type=module` HEREDOCs that embed literal `import … from 'node:fs'` / `from 'node:path'` strings).  

---

## 1. Helper utilities (add once to a shared lib or the top of the workflow file)

```ts
// ricky-workflow-env.ts
import { config as loadDotenv } from 'dotenv';
import { resolve } from 'node:path';
import { existsSync } from 'node:fs';

/**
 * Load environment variables for a Ricky workflow.
 * Priority: .env.local → .env → process.env (already set).
 */
export function loadRickyWorkflowEnv(): void {
  const cwd = process.cwd();

  // .env.local has highest priority
  const localPath = resolve(cwd, '.env.local');
  i
...[truncated]

Must

  • Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected loadRickyWorkflowEnv body references rickyWorkflowFs.* and rickyWorkflowPath.*, so the repair must also add the corresponding import * as rickyWorkflowFs from 'node:fs' and import * as rickyWorkflowPath from 'node:path' aliases at module top level.
  • Detect existing alias imports by matching real top-level import * as <alias> from '<module>' statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in .step({ command: ... }) bodies do not count as imports).
  • Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

Must Not

  • Skip adding the rickyWorkflowFs / rickyWorkflowPath aliases because from 'node:fs' or from 'node:path' already appears somewhere in the file as a string literal.
  • Inject loadRickyWorkflowEnv (or assertRickyWorkflowEnv) without the supporting alias imports, which produces a ReferenceError: rickyWorkflowPath is not defined at module load and burns the auto-fix budget on UNSUPPORTED_RUNTIME at runtime-launch.
  • Rewrite or escape the embedded HEREDOC text in step commands.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
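
The alias check in the Must list can be sketched with a small scanner. It accepts `import * as <alias> from '<module>'` only when the `import` keyword sits in top-level code: outside strings, template literals (where step-command HEREDOCs live), comments, and nested blocks. This is a hand-rolled approximation, not Ricky's implementation. It does not track regex literals, and a production version might walk the AST from the TypeScript compiler API instead.

```ts
type ScanMode = "code" | "line" | "block" | "sq" | "dq" | "tpl";

// Marks characters that are top-level code: not in a string, template literal,
// comment, or any { ... } block. Regex literals are not tracked.
function topLevelCodeMask(src: string): boolean[] {
  const mask: boolean[] = new Array(src.length).fill(false);
  const stack: Array<"brace" | "tplExpr"> = [];
  let mode: ScanMode = "code";
  for (let i = 0; i < src.length; i++) {
    const c = src[i];
    const next = src[i + 1];
    if (mode === "code") {
      if (c === "/" && next === "/") { mode = "line"; i++; continue; }
      if (c === "/" && next === "*") { mode = "block"; i++; continue; }
      mask[i] = stack.length === 0;
      if (c === "'") mode = "sq";
      else if (c === '"') mode = "dq";
      else if (c === "`") mode = "tpl";
      else if (c === "{") stack.push("brace");
      else if (c === "}" && stack.pop() === "tplExpr") mode = "tpl"; // end of ${ ... }
    } else if (mode === "line") {
      if (c === "\n") mode = "code";
    } else if (mode === "block") {
      if (c === "*" && next === "/") { mode = "code"; i++; }
    } else if (c === "\\") {
      i++; // skip the escaped character in any string-like mode
    } else if ((mode === "sq" && c === "'") || (mode === "dq" && c === '"') || (mode === "tpl" && c === "`")) {
      mode = "code";
    } else if (mode === "tpl" && c === "$" && next === "{") {
      stack.push("tplExpr");
      mode = "code";
      i++;
    }
  }
  return mask;
}

// Returns the alias of a real top-level `import * as <alias> from '<moduleName>'`, if any.
export function findNamespaceAlias(src: string, moduleName: string): string | undefined {
  const mask = topLevelCodeMask(src);
  const re = /\bimport\s+\*\s+as\s+([A-Za-z_$][\w$]*)\s+from\s+(['"])([^'"]+)\2/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(src)) !== null) {
    if (m[3] === moduleName && mask[m.index]) return m[1];
  }
  return undefined;
}
```

If `findNamespaceAlias(src, "node:fs")` returns `undefined`, the repair would add the `rickyWorkflowFs` import. The embedded HEREDOC text is left untouched either way.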
REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Run**

Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script) to run the workflow, trigger the auto‑fix path on the first failure, and verify the repaired artifact. The contract assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failed step name and the previous run ID are known.

---

## 1️⃣ Prerequisites (verify once)

| Check | Command | Expected outcome |
|-------|---------|------------------|
| `aw` CLI present | `aw --version` | `aw version x.y.z` |
| Working directory is the workflow repo root | `git rev-parse --show-toplevel` | Path to repo |
| Current branch is a short‑lived feature branch (e.g., `auto‑fix/run‑<timestamp>`) | `git branch --show-current` | `auto-fix/run-...` |
| Docker (or the runtime required by the workflow) is running | `docker ps` | List of containers (or empty) |
| Environment variables for secrets are exported | `printenv | grep ^AW_` | `AW_…` variables listed |

If any check fails, resolve it before proceeding.

---

## 2️⃣ Create a scoped branch & PR (isolated repair)

```bash
# 2.1 Create a short‑lived branch
BRANCH="auto-fix/run-$(date +%s)"

...[truncated]

Must

  • Use a bounded retry budget and summarize every attempt.
  • Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
  • Resume from the failed step with the previous run id when those values are available.

Must Not

  • Edit arbitrary repository source files as the default auto-fix surface.
  • Keep retrying after the configured max attempts.
  • Lose the single Ricky tracking run id across repair/resume attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
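
Below is a bounded repair-and-resume loop, sketched with hypothetical `run` and `repairArtifact` callbacks. The tracking id stays fixed for the whole loop, and each attempt is recorded. Repairs are limited to the workflow artifact by the callback's contract. On resume, the loop passes the failed step and the previous run id.

```ts
// Hypothetical shapes; Ricky's real runner and repair hooks are not shown in this report.
interface RunOutcome {
  ok: boolean;
  runId: string;
  failedStep?: string;
}
interface Resume {
  fromStep: string;
  previousRunId: string;
}
interface AttemptRecord {
  attempt: number;
  runId: string;
  failedStep?: string;
  action: string;
}

export async function runWithAutoFix(
  trackingId: string, // one Ricky tracking id across all attempts
  maxAttempts: number,
  run: (resume?: Resume) => Promise<RunOutcome>,
  repairArtifact: (failedStep: string) => Promise<boolean>, // edits only the workflow artifact
): Promise<{ trackingId: string; ok: boolean; attempts: AttemptRecord[] }> {
  const attempts: AttemptRecord[] = [];
  let resume: Resume | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const outcome = await run(resume);
    const record: AttemptRecord = { attempt, runId: outcome.runId, failedStep: outcome.failedStep, action: outcome.ok ? "passed" : "failed" };
    attempts.push(record);
    if (outcome.ok) return { trackingId, ok: true, attempts };
    // Stop when the budget is spent or there is no failed step to resume from.
    if (attempt === maxAttempts || !outcome.failedStep) break;
    if (!(await repairArtifact(outcome.failedStep))) {
      record.action = "failed; artifact not repairable";
      break;
    }
    record.action = "failed; artifact repaired, resuming";
    resume = { fromStep: outcome.failedStep, previousRunId: outcome.runId };
  }
  return { trackingId, ok: false, attempts };
}
```

With `maxAttempts` set to 1, the loop runs exactly once and never calls `repairArtifact`. That is the single-attempt behavior expected when auto-fix is disabled.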
REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Failure Report – `ricky run workflows/foo.ts --no-auto-fix`**

| Item | Detail |
|------|--------|
| **Workflow** | `workflows/foo.ts` |
| **Run mode** | Manual (`--no-auto-fix`) |
| **Exit code** | Non‑zero (see CLI output) |
| **Timestamp** | `2026‑05‑10 T…Z` |
| **Agent version** | `ricky v2.7.4` |
| **Node runtime** | `v20.11.1` |
| **OS** | `linux/ubuntu‑22.04` |

---

## 1️⃣ Immediate diagnostics

1. **Collect the run log**  
   ```bash
   ricky logs workflows/foo.ts --last
   ```

   Save the output to logs/foo.run.log. The log contains the deterministic verification gate failures.

2. **Check the workflow contract**

       ricky contract show workflows/foo.ts

   Verify that the contract file (workflows/foo.contract.json) is present and matches the current source (hashes are printed).

3. **Run the static validator** (does not execute the workflow)

       ricky validate workflows/foo.ts

   This will surface schema mismatches, missing imports, or type‑checking errors before execution.

4. **Inspect the deterministic verification gate report**

       ricky gate report workflows/foo.ts --last

   Look for any FAILED gates;
   ...[truncated]


**Must**

- Preserve one-attempt behavior when auto-fix is disabled.
- Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
- Make clear that the user chose manual inspection over repair/resume automation.

**Must Not**

- Start a repair loop despite `--no-auto-fix`.
- Suppress the diagnosis because no repair was attempted.
- Present the failure as a completed repair attempt.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.in-process-local-runner</code> (runtime-recovery/manual)</summary>

**User message**

> Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

**Ricky output**

```text
**Ricky’s Standard Procedure for Executing a Local TypeScript Workflow Artifact**

Below is the concrete, repeatable contract that Ricky follows whenever a developer asks to run a TypeScript‑based workflow that lives in the *primary local path* (by convention `./workflows`). The steps are written as a deterministic “run‑book” that can be copied into a branch‑scoped PR or a CI‑gate script.

---

## 1️⃣ Preconditions (Verification Gate #1)

| Check | Expected State | How Ricky Verifies |
|------|----------------|--------------------|
| **Repository root** | Current working directory is the repo root (`git rev-parse --show-toplevel`) | `assert $(git rev-parse --show-toplevel) = $(pwd)` |
| **Primary workflow directory** | Exists at `./workflows` and contains at least one `*.ts` file | `test -d ./workflows && ls ./workflows/*.ts >/dev/null` |
| **Node version** | `>=18.x` (LTS) | `node -v` |
| **Package manager** | `pnpm` (preferred) or `npm` installed | `command -v pnpm || command -v npm` |
| **ts-node** | Available either globally or as a dev‑dependency | `pnpm list ts-node || npm list ts-node` |
| **Clean build state** | No stale `dist/` artifacts that could mask runtime errors | `r
...[truncated]

Must

  • Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
  • Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
  • Record the actual spawn command in execution evidence.

Must Not

  • Fail solely because agent-relay is not on PATH when the SDK route is available.
  • Hide the actual runtime command from evidence.
  • Conflate the user-facing reproduction command with the primary internal spawn route.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
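
A sketch of the in-process spawn route follows. It assumes Node 22.6 or later, where `--experimental-strip-types` exists; newer releases may not need the flag. The function names and the `SpawnPlan` shape are hypothetical. The plan's `evidence` string is the exact command line to record. `sdkResolvable` resolves the package from the workflow file's location via `createRequire`, which expects an absolute path.

```ts
import { createRequire } from "node:module";

export interface SpawnPlan {
  command: string;
  args: string[];
  evidence: string; // the exact command line, recorded with the run
}

// True when this Node version has --experimental-strip-types (added in v22.6.0).
export function supportsStripTypes(version: string = process.versions.node): boolean {
  const [major = 0, minor = 0] = version.split(".").map(Number);
  return major > 22 || (major === 22 && minor >= 6);
}

// The current Node binary runs the .ts artifact directly; no agent-relay binary on PATH is needed.
export function planLocalSpawn(workflowPath: string, nodeExec: string = process.execPath): SpawnPlan {
  const args = ["--experimental-strip-types", workflowPath];
  return { command: nodeExec, args, evidence: [nodeExec, ...args].join(" ") };
}

// Precheck: can the workflow's location resolve the SDK package?
export function sdkResolvable(workflowFile: string, pkg = "@agent-relay/sdk"): boolean {
  try {
    createRequire(workflowFile).resolve(pkg);
    return true;
  } catch {
    return false;
  }
}
```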
REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky – Boundary‑Escalation Protocol for Persistent Structural Failures**  

When a structural failure remains after an initial remediation attempt, the workflow must transition to a *Boundary‑Escalation* state. The following contract‑driven sequence guarantees deterministic handling, traceability, and rapid resolution.

---

## 1. Trigger Condition
| Condition | Source |
|----------|--------|
| `failure.persisted == true` **AND** `fix.attempted == true` **AND** `verification.passed == false` | `failure_monitor` service (event `STRUCTURAL_FAILURE_PERSISTED`) |

*If any of the above fields are missing, raise a **Blocker** and abort the escalation.*

---

## 2. Escalation Contract (JSON)

```json
{
  "escalation_id": "uuid-v4",
  "origin_workflow": "structural_fix_v3",
  "failure_id": "uuid-failure",
  "attempted_fix": {
    "commit_sha": "abc123def",
    "branch": "fix/structural‑leak‑2024‑09‑12",
    "pr_number": 8421,
    "review_status": "APPROVED"
  },
  "verification": {
    "gate": "structural_integrity_test",
    "result": "FAIL",
    "log_ref": "s3://logs/structural/8421.log"
  },
  "escalation_reason": "Persistent failure after approved fix",
  "timestamp": "2026-05-10T14
...[truncated]

Must

  • Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
  • Distinguish escalation from a generic product failure.
  • Preserve enough context for a human operator to continue.

Must Not

  • Retry speculative fixes indefinitely.
  • Collapse to "something went wrong" without the attempted actions.
  • Discard evidence from failed repair attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
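
Here is a sketch of an escalation record that cannot be built without the context a human operator needs. The field names are hypothetical. The `kind` tag is what separates an escalation from a generic failure.

```ts
// Hypothetical escalation record.
export interface Escalation {
  kind: "escalation";
  blockerClass: string;
  attemptedFixes: { description: string; artifact: string }[];
  failedValidation: { gate: string; output: string };
  recommendation: string;
  evidenceRefs: string[]; // pointers to preserved evidence from failed repairs
}

export function buildEscalation(input: Omit<Escalation, "kind">): Escalation {
  const missing: string[] = [];
  if (!input.blockerClass) missing.push("blockerClass");
  if (input.attemptedFixes.length === 0) missing.push("attemptedFixes");
  if (!input.failedValidation.gate) missing.push("failedValidation.gate");
  if (!input.recommendation) missing.push("recommendation");
  // Without this context an escalation reads like "something went wrong".
  if (missing.length > 0) throw new Error(`incomplete escalation, missing: ${missing.join(", ")}`);
  return { kind: "escalation", ...input };
}
```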
REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Summary Report**

*Generated: 2026‑05‑10 12:00 UTC*  
*Scope: All Ricky workflow runs recorded in the last 24 h (2026‑05‑09 00:00 – 2026‑05‑09 23:59 UTC)*  

---

## 1. Overview

| Metric | Value | Target / SLA |
|--------|-------|--------------|
| Total runs (all branches) | **1 842** | — |
| Successful runs (✔) | **1 617** | ≥ 90 % |
| Failed runs (✖) | **225** | ≤ 10 % |
| Blocked runs (⛔) | **0** | 0 |
| Avg. duration (successful) | **3 m 27 s** | ≤ 5 m |
| Avg. duration (failed) | **4 m 12 s** | ≤ 6 m |
| Runs with **≥ 2** validation loops | **112** | ≤ 5 % |
| Runs requiring manual **review artifact** | **38** | ≤ 2 % |

**Health status:** **⚠️ Warning** – Failure rate (12.2 %) exceeds the 10 % SLA. The primary cause is a spike in “Contract Mismatch” gate failures (see Section 2).

---

## 2. Failure Breakdown (by deterministic verification gate)

| Gate | Failures | % of total failures | Typical root cause |
|------|----------|---------------------|--------------------|
| **Contract Mismatch** (input‑output schema) | 132 | 58.7 % | Upstream schema drift in `data‑ingest` service |
| **Branch Policy** (protected‑branch rule) | 41 | 18.2 % | Una
...[truncated]

Must

  • Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
  • Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
  • Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

  • Mutate evidence while analyzing it.
  • Return generic advice like "improve your workflow" without references.
  • Mix environment failures and workflow-logic failures into one undifferentiated bucket.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
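
A sketch of a digest over normalized evidence follows. It assumes a hypothetical subset of `WorkflowRunEvidence` fields. The function never mutates its input, keeps environment and workflow failures in separate groups, and ties every finding to concrete `workflow#step` references. Duration and oversized-step analysis would need more fields than this sketch uses.

```ts
// Hypothetical subset of the normalized WorkflowRunEvidence record.
interface WorkflowRunEvidence {
  workflow: string;
  step?: string;
  outcome: "success" | "failure";
  layer?: "environment" | "workflow"; // kept separate on purpose
  failureClass?: string;
}

interface Finding {
  layer: string;
  failureClass: string;
  count: number;
  locations: string[]; // workflow or workflow#step references
  recommendation: string;
}

// Groups failures by (layer, class) and reports only recurring ones.
export function digest(runs: readonly WorkflowRunEvidence[], minCount = 2): Finding[] {
  const groups = new Map<string, { layer: string; failureClass: string; runs: WorkflowRunEvidence[] }>();
  for (const r of runs) {
    if (r.outcome !== "failure") continue;
    const layer = r.layer ?? "unclassified";
    const failureClass = r.failureClass ?? "unknown";
    const key = JSON.stringify([layer, failureClass]);
    const group = groups.get(key) ?? { layer, failureClass, runs: [] };
    group.runs.push(r);
    groups.set(key, group);
  }
  return [...groups.values()]
    .filter((g) => g.runs.length >= minCount)
    .map((g): Finding => {
      const locations = [...new Set(g.runs.map((r) => (r.step ? `${r.workflow}#${r.step}` : r.workflow)))].sort();
      const recommendation =
        g.layer === "environment"
          ? `fix the environment (${g.failureClass}) before changing ${locations.join(", ")}`
          : `review ${locations.join(", ")} for recurring ${g.failureClass}`;
      return { layer: g.layer, failureClass: g.failureClass, count: g.runs.length, locations, recommendation };
    })
    .sort((a, b) => b.count - a.count);
}
```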
REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

**Ricky Slack Surface – “Workflow Debugger”**

---

## 1. Overview
The **Workflow Debugger** Slack surface gives engineers a deterministic, low‑latency way to inspect, step‑through, and repair Ricky‑managed workflows directly from a Slack channel. It follows the **Ricky repository conventions** (AGENTS.md), the **workflow standards** (contract‑first, verification gates, review artifacts), and the **shared authoring rules** (branch/PR scoping, blocker reporting).

---

## 2. High‑Level Interaction Flow

| Step | Slack Action | Backend Trigger | Output |
|------|--------------|----------------|--------|
| 1 | `/ricky debug <workflow‑id>` | `debug_start` event | A **debug session** is created, a unique `session_id` is returned, and the user receives a **session card** with controls. |
| 2 | Click **“Pause”** on the card | `debug_pause(session_id)` | Workflow execution halts at the next gate; current state snapshot is posted. |
| 3 | Click **“Step”** | `debug_step(session_id)` | Executes exactly one verification gate; posts gate result and updated state. |
| 4 | Click **“Resume”** | `debug_resume(session_id)` | Continues execution until next pause point or completion; posts final statu
...[truncated]

Must

  • Treat Slack as a co-equal product surface, not a wrapper around CLI.
  • Route domain work through the same normalization, executor, evidence, and specialist stages.
  • Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.

Must Not

  • Put workflow generation or diagnosis domain logic directly in the Slack handler.
  • Degrade Slack to a developer shortcut with weaker routing than CLI.
  • Skip signature verification, dedup, or thread handling when the surface is implemented.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
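
The last "Must Not" item requires signature verification and dedup. Below is a minimal TypeScript sketch of what the surface layer could run before anything reaches normalization. The v0 HMAC-SHA256 scheme and the `X-Slack-Signature` / `X-Slack-Request-Timestamp` headers follow Slack's published request-signing docs. The function names, the five-minute window, and the in-memory `EventDeduper` are assumptions for illustration, not Ricky's code.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Slack's v0 signing scheme: HMAC-SHA256 over "v0:<timestamp>:<raw body>".
function verifySlackSignature(
  signingSecret: string,
  timestamp: string, // X-Slack-Request-Timestamp header
  rawBody: string, // body exactly as received, before parsing
  signature: string, // X-Slack-Signature header, e.g. "v0=ab12..."
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  const ts = Number(timestamp);
  // Reject stale requests to limit replay attacks (assumed 5-minute window).
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > 60 * 5) return false;
  const expected =
    "v0=" + createHmac("sha256", signingSecret).update(`v0:${timestamp}:${rawBody}`).digest("hex");
  const a = Buffer.from(expected);
  const b = Buffer.from(signature);
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}

// Hypothetical in-memory dedup keyed on the Events API event_id. A real
// deployment would need a shared store with expiry.
class EventDeduper {
  private readonly seen = new Set<string>();

  firstSeen(eventId: string): boolean {
    if (this.seen.has(eventId)) return false;
    this.seen.add(eventId);
    return true;
  }
}
```

Both checks stay in the Slack surface layer. Only a verified, first-seen request would be handed to the shared normalizer, so the domain stages never see Slack transport details.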

REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)

User message

Add a new web handoff type that submits a workflow spec and mode.

Ricky output

**New Hand‑off Type:** `web_submit_spec`

**Purpose**  
Enable external web clients (e.g., UI forms, API gateways) to hand off a **complete workflow specification** together with an **execution mode** (`preview` | `run`) to the AgentWorkforce engine. The engine validates the spec, stores it, and either runs a dry‑run preview or launches the full workflow.

---

## 1. Contract Definition (JSON‑Schema)

```json
{
  "$id": "https://agentworkforce.com/schemas/web_submit_spec.json",
  "title": "Web Submit Spec Hand‑off",
  "type": "object",
  "required": ["workflowSpec", "mode"],
  "properties": {
    "workflowSpec": {
      "$ref": "https://agentworkforce.com/schemas/workflow_spec.json"
    },
    "mode": {
      "type": "string",
      "enum": ["preview", "run"],
      "description": "Execution mode – preview validates & simulates, run executes live."
    },
    "metadata": {
      "type": "object",
      "additionalProperties": true,
      "description": "Optional free‑form key/value pairs for tracing, tenant ID, etc."
    }
  },
  "additionalProperties": false
}
```
  • workflowSpec – the full AgentWorkforce workflow definition (steps, contracts, gates, etc.) as defined in the exis
    ...[truncated]

**Must**

- Add a handoff variant that normalizes into `LocalInvocationRequest` or `CloudGenerateRequest`.
- Keep auth/session validation in the web surface before normalization.
- Ensure downstream code does not need to know the request came from web.

**Must Not**

- Short-circuit directly from the web handler to executors.
- Create a parallel domain model for web requests.
- Make the web surface the owner of local or Cloud routing semantics.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
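
The "Must" list above treats normalization as the only translation boundary. A minimal TypeScript sketch of that boundary follows. Only the names `LocalInvocationRequest` and `CloudGenerateRequest` come from this review, and every field on them is hypothetical. The handoff interface mirrors the `web_submit_spec` schema. Mapping `preview` to a dry run is an assumption.

```typescript
// Hypothetical shapes: only the type names LocalInvocationRequest and
// CloudGenerateRequest come from the review; every field here is assumed.
interface WorkflowSpec {
  readonly name: string;
  readonly steps: readonly unknown[];
}

// Mirrors the web_submit_spec JSON Schema: workflowSpec, mode, optional metadata.
interface WebSubmitSpecHandoff {
  workflowSpec: WorkflowSpec;
  mode: "preview" | "run";
  metadata?: Record<string, unknown>;
}

interface LocalInvocationRequest {
  kind: "local";
  spec: WorkflowSpec;
  dryRun: boolean;
  metadata: Record<string, unknown>;
}

interface CloudGenerateRequest {
  kind: "cloud";
  spec: WorkflowSpec;
  dryRun: boolean;
  metadata: Record<string, unknown>;
}

type NormalizedRequest = LocalInvocationRequest | CloudGenerateRequest;

// Pure translation: assumes auth/session validation already ran in the web
// surface, and takes the routing target from a shared router instead of
// deciding local vs Cloud here. Nothing web-specific leaks into the output.
function normalizeWebSubmitSpec(
  handoff: WebSubmitSpecHandoff,
  target: NormalizedRequest["kind"],
): NormalizedRequest {
  const base = {
    spec: handoff.workflowSpec,
    dryRun: handoff.mode === "preview", // assumed: preview maps to a dry run
    metadata: { ...(handoff.metadata ?? {}) },
  };
  return target === "local" ? { kind: "local", ...base } : { kind: "cloud", ...base };
}
```

Because `target` is a parameter, the local vs Cloud decision stays with whatever shared component calls this function. The web handler only supplies the validated handoff.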

</details>

<details>
<summary><strong>REVIEW</strong> <code>surfaces-ingress.mcp-claude-context-is-metadata</code> (surfaces-ingress/manual)</summary>

**User message**

> A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordinate.

**Ricky output**

```text
**Decision:** Generate a new workflow draft.

**Rationale**

1. **Input Type** – You supplied a *workf
```

</details>

---

_Comment truncated to stay within GitHub limits. Download the `ricky-eval-run` artifact for the full `human-review.md`._
