feat(generation): load review-fix-signoff-loop as a default workflow skill #115

Merged
khaliqgant merged 1 commit into main from ricky/default-review-fix-signoff-loop-skill
May 15, 2026

@khaliqgant
Member

Summary

  • Add review-fix-signoff-loop to Ricky's DEFAULT_WORKFLOW_SKILL_IDS so the generator loads it alongside choosing-swarm-patterns, writing-agent-relay-workflows, and relay-80-100-workflow whenever a spec is turned into a workflow.
  • Wire the new skill through the rendering pipeline: template-renderer.ts emits a generation_rendering / workflow_contract evidence entry and adds a boundary-gate grep, master-workflow-renderer.ts includes it in the master-executor fallback, and pipeline.ts requires the rendering evidence so workflows cannot ship without recording that the dual-reviewer review-fix-signoff loop was applied.
  • Bundle the prpm-installed skill (.agents/skills/review-fix-signoff-loop/SKILL.md, .claude/skills/review-fix-signoff-loop/SKILL.md), the sibling skill description updates, and the prpm.lock bump so fresh checkouts discover the skill via Ricky's existing discovery roots.
  • Document the new default in AGENTS.md (the CLAUDE.md symlink picks it up automatically) so authors know the loop pattern is now a hard convention for generated workflows.
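
The evidence requirement described above can be sketched roughly as follows. This is a minimal illustration, not the code in pipeline.ts: only the four skill ids, the `generation_rendering` stage, and the `workflow_contract` effect come from this PR description. The `SkillEvidence` shape, its field names, and the `missingRenderingEvidence` helper are hypothetical.

```typescript
// Skill ids as listed in the PR summary.
const DEFAULT_WORKFLOW_SKILL_IDS = [
  'choosing-swarm-patterns',
  'writing-agent-relay-workflows',
  'relay-80-100-workflow',
  'review-fix-signoff-loop',
] as const;

// Hypothetical evidence shape; the real type in the repository may differ.
interface SkillEvidence {
  skillId: string;
  stage: string; // e.g. 'generation_rendering'
  effect: string; // e.g. 'workflow_contract'
}

// Returns the required skill ids that have no rendering evidence recorded.
// An empty result means the (assumed) validation would pass.
function missingRenderingEvidence(
  evidence: readonly SkillEvidence[],
  required: readonly string[] = DEFAULT_WORKFLOW_SKILL_IDS,
): string[] {
  return required.filter(
    (id) =>
      !evidence.some(
        (entry) =>
          entry.skillId === id &&
          entry.stage === 'generation_rendering' &&
          entry.effect === 'workflow_contract',
      ),
  );
}
```

A validator built this way would reject a workflow whenever the returned list is non-empty, which matches the stated intent that workflows cannot ship without the loop being recorded as applied.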

Test plan

  • npx tsc --noEmit
  • npx vitest run src/product/generation/skill-matcher.test.ts src/product/generation/pipeline.test.ts — 60 tests pass, including new assertions that pin review-fix-signoff-loop as a default match, a generation_rendering evidence entry with effect: workflow_contract, and a boundary-gate grep on the skill name.
  • CI typecheck + full test suite on PR.

🤖 Generated with Claude Code

…skill

When Ricky reads a spec and renders a workflow it now applies the new
review-fix-signoff-loop skill alongside choosing-swarm-patterns,
writing-agent-relay-workflows, and relay-80-100-workflow: the generator
loads the skill, emits generation_rendering evidence for it, the boundary
gate greps for the skill name, and the pipeline validator requires the
rendering evidence so workflows cannot ship without the dual-reviewer
review-fix-signoff loop being recorded as applied.

The prpm-installed skill bundle (new review-fix-signoff-loop SKILL.md
under .agents/skills and .claude/skills, plus updated sibling skill
descriptions and prpm.lock) is included so fresh checkouts of this branch
can discover and load the skill via Ricky's existing discovery roots.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@khaliqgant khaliqgant merged commit d485e41 into main May 15, 2026
3 of 5 checks passed
@khaliqgant khaliqgant deleted the ricky/default-review-fix-signoff-loop-skill branch May 15, 2026 18:27
@coderabbitai
Contributor

coderabbitai Bot commented May 15, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 538ec9d0-644e-4493-a25f-19928ee62a7f

📥 Commits

Reviewing files that changed from the base of the PR and between e0a4e2d and 5dc860c.

⛔ Files ignored due to path filters (1)
  • prpm.lock is excluded by !**/*.lock
📒 Files selected for processing (13)
  • .agents/skills/relay-80-100-workflow/SKILL.md
  • .agents/skills/review-fix-signoff-loop/SKILL.md
  • .agents/skills/writing-agent-relay-workflows/SKILL.md
  • .claude/skills/relay-80-100-workflow/SKILL.md
  • .claude/skills/review-fix-signoff-loop/SKILL.md
  • .claude/skills/writing-agent-relay-workflows/SKILL.md
  • AGENTS.md
  • src/product/generation/master-workflow-renderer.ts
  • src/product/generation/pipeline.test.ts
  • src/product/generation/pipeline.ts
  • src/product/generation/skill-matcher.test.ts
  • src/product/generation/skill-matcher.ts
  • src/product/generation/template-renderer.ts

📝 Walkthrough

This PR introduces a comprehensive review-fix-signoff-loop workflow skill with mandatory sequential review/fix iterations using fresh agent context and dual-signoff gates, integrates it into existing relay-80-100-workflow patterns, documents new pr_url verification gates, and wires the skill throughout the workflow generation system with validation, evidence building, and test coverage.

Changes

Workflow Skill Generation

| Layer / File(s) | Summary |
|-----------------|---------|
| Define review-fix-signoff-loop skill pattern<br>.agents/skills/review-fix-signoff-loop/SKILL.md, .claude/skills/review-fix-signoff-loop/SKILL.md | New skill document defines mandatory review → fix → re-validate loops with fresh agent context, deterministic verdict contracts, fresh-context iterations per loop, PR signoff comments, blocked-state handling via BLOCKED_NO_COMMIT.md, and common mistakes to prevent. |
| Update relay-80-100-workflow to mandate sequential loop<br>.agents/skills/relay-80-100-workflow/SKILL.md, .claude/skills/relay-80-100-workflow/SKILL.md | Relay-80-100 skills now explicitly require sequential Claude-then-Codex review/fix loops with test hardening. Final acceptance changed from two independent reviewers to sequential Claude review → fixer repair → Claude re-review → Codex repeat, with fallback to BLOCKED_NO_COMMIT if issues persist; commits/PR creation gated on post-Codex-fix green acceptance. |
| Add pr_url verification gate documentation<br>.agents/skills/writing-agent-relay-workflows/SKILL.md, .claude/skills/writing-agent-relay-workflows/SKILL.md | Verification Gates section expanded to include new pr_url type for steps that must publish a PR. Clarifies <owner>/<repo> repository scoping and emphasizes createGitHubStep({ action: 'createPR' }) mechanism. |
| Wire review-fix-signoff-loop into generation pipeline<br>AGENTS.md, src/product/generation/skill-matcher.ts, src/product/generation/master-workflow-renderer.ts, src/product/generation/pipeline.ts, src/product/generation/template-renderer.ts, src/product/generation/pipeline.test.ts, src/product/generation/skill-matcher.test.ts | Skill is added to default skill ids, fallback defaults, and validation requirements. AGENTS.md convention updated to require the skill and corresponding generation_rendering evidence. Boundary gates extended to verify skill markers. Evidence builder computes reviewer task counts. Tests across pipeline and skill-matcher expect review-fix-signoff-loop in defaults, evidence arrays, and gate commands. |
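
The sequential loop summarized above (Claude review, fixer repair, Claude re-review, then the same cycle for Codex, with a blocked fallback) could look roughly like the sketch below. It is an illustration under stated assumptions, not the skill's actual contract: the verdict labels, the `LoopDeps` callbacks, the `READY_TO_COMMIT` outcome name, and the `maxIterations` cap are all hypothetical. Only the reviewer order and the `BLOCKED_NO_COMMIT` fallback come from the summary. The callbacks are synchronous here for brevity; real agent calls would presumably be asynchronous.

```typescript
type Reviewer = 'claude' | 'codex';
type Verdict = 'APPROVED' | 'CHANGES_REQUESTED'; // hypothetical verdict labels
type Outcome = 'READY_TO_COMMIT' | 'BLOCKED_NO_COMMIT';

// Hypothetical hooks standing in for fresh-context reviewer and fixer agents.
interface LoopDeps {
  review: (reviewer: Reviewer, iteration: number) => Verdict;
  fix: (reviewer: Reviewer, iteration: number) => void;
}

// Runs Claude's review/fix loop to approval first, then Codex's.
// If either reviewer still requests changes after maxIterations reviews,
// the run ends blocked and nothing should be committed.
function runReviewFixSignoffLoop(deps: LoopDeps, maxIterations = 3): Outcome {
  const reviewers: readonly Reviewer[] = ['claude', 'codex'];
  for (const reviewer of reviewers) {
    let approved = false;
    for (let iteration = 1; iteration <= maxIterations; iteration++) {
      if (deps.review(reviewer, iteration) === 'APPROVED') {
        approved = true;
        break;
      }
      // Skip the fix after the final review: no re-review would follow it.
      if (iteration < maxIterations) {
        deps.fix(reviewer, iteration);
      }
    }
    if (!approved) {
      return 'BLOCKED_NO_COMMIT';
    }
  }
  return 'READY_TO_COMMIT';
}
```

Gating commit and PR creation on the `READY_TO_COMMIT` outcome mirrors the summary's point that publishing happens only after the post-Codex-fix acceptance is green.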

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A looping review dance so fine,
Fresh eyes twice, verdicts align,
No hidden bias, blocked or clear,
Sequential sign-offs bring good cheer! ✨

Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

@github-actions
Contributor

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-15T18-27-09-609Z-HEAD-provider
Mode: provider
Git SHA: 64272db

Passed: 11 | Needs human: 44 | Reviewable: 44 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Ricky – Updated Documentation & Code to Reflect Deeper Integration with Agent Assistant**

Below is a complete, deterministic change package that brings the repository into alignment with the new product spec: *Ricky now leverages the Agent Assistant (AA) for all internal reasoning, validation, and coordination steps.*  

The package follows **Ricky repository conventions** (AGENTS.md), the **workflow standards** (contract → implementation → verification → review), and the **shared authoring rules** (atomic PRs, scoped branches, 80‑to‑100 validation loops).

---

## 1. Scope & Branch Layout
| Item | Branch | Description |
|------|--------|-------------|
| **Feature** | `feat/aa‑deep‑integration` | Introduces AA‑driven reasoning, adds AA‑specific contracts, updates docs, and adds unit‑test suite. |
| **PR** | `PR #274 – Deep Agent Assistant Integration` | Single PR containing all changes; passes CI gate “AA‑Contract‑Check”. |
| **Reviewers** | `@ricky‑lead`, `@aa‑team` | Required approvals: 2 reviewers, one from AA team. |
| **Blocking Issues** | None (all dependencies already in `main`). |

---

## 2. Contract – “AA‑Deep‑Integration” (YAML)

```yaml
# contracts/aa_deep_integratio
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review**  
*(as of the latest CI run on 2026‑05‑12)*  

| Metric | Current Value | Target / Trend | Status |
|--------|---------------|----------------|--------|
| **Workflow contracts defined** | 7 of 12 agents have a `turn‑context` contract in `workflow.yaml` | 100 % by Q4 2026 | ⚠️ |
| **Deterministic verification gates** | 5 agents use `turn‑context`‑gate in CI | 8 by Q3 2026 | ⚠️ |
| **Review artifacts** | 23 PRs contain `turn‑context` diff; 14 approved, 9 pending | 100 % of PRs with turn‑context changes must have a reviewer‑signed artifact | ⚠️ |
| **Validation loops (80‑100)** | Avg. loops per PR: 62 | ≥80 loops before merge | ❌ |
| **Blocker reports** | 3 open blocker tickets (IDs #212, #219, #227) related to missing `turn‑context` in shared libraries | All blockers must be resolved before next release | ❌ |
| **Branch/PR scoping** | 4 PRs merged with unrelated feature changes alongside `turn‑context` updates | Enforce single‑purpose PRs for `turn‑context` changes | ⚠️ |
| **Adoption growth (last 30 days)** | +12 % new contracts, +8 % gate coverage | +15 % month‑over‑month | ↗️ |

### Key Findings
1. **Partial Coverage** – Only
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation:** Move **all** of the listed capabilities—workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording—into the **Agent Assistant** component.

---

## Rationale (aligned with AGENTS.md and workflow standards)

| Capability | Why it belongs in Agent Assistant | Impact on reliability & coordination |
|------------|-----------------------------------|---------------------------------------|
| **Workflow generation** | Agent Assistant already owns *authoring* contracts and can emit deterministic workflow definitions (YAML/JSON). Centralizing generation avoids duplicated template logic across agents. | Guarantees a single source‑of‑truth for workflow shape; downstream agents consume a stable contract. |
| **Validation** | Validation gates (schema checks, type‑safety, idempotency) are deterministic and fit the “verification gate” pattern that Agent Assistant enforces before a PR is merged. | Early failure detection, reduces flaky runs, and keeps the CI pipeline lean. |
| **Debugging** | Agent Assistant can attach deterministic “debug‑trace” artifacts (log snippets, diff reports) to the workflow PR, providing a reproducible debugging surface. 
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target window: July 1 – September 30 2026*  

---

## 1. Slice Objective
Deploy the next‑generation reliability and coordination features of Ricky to **30 % of the active AgentWorkforce (AW) tenant base** while establishing a repeatable, gated adoption pipeline that can be scaled to 100 % in subsequent slices.

| Success Metric | Target |
|----------------|--------|
| **Feature activation** – Ricky‑v2.0 enabled in production | ≥ 30 % of tenants |
| **Mean‑time‑to‑recovery (MTTR)** for workflow failures (post‑adoption) | ↓ 20 % vs. baseline |
| **User‑reported friction** (support tickets / NPS) | ≤ 5 % of adopters |
| **Automation coverage** (validated contracts) | ≥ 80 % of new workflow steps |
| **Verification loops executed** | 80‑100 per PR (see §4) |

---

## 2. Scope & Boundaries  

| In‑Scope | Out‑of‑Scope |
|----------|--------------|
| • New **deterministic verification gates** for workflow contracts (pre‑merge & post‑deploy).<br>• **Branch/PR policy**: feature branch `ricky/v2‑adopt‑slice`, PR target `main` with required reviewers (R1, R2, QA).<br>• **Automation scripts** for onboarding tenant config, feature‑f
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky – Future Support Interface (Slack / Web)

Below is a **concrete, contract‑driven design** for adding a “Ricky Support Assistant” that can be invoked from Slack or a web portal. The design follows the **AgentWorkforce workflow standards** (AGENTS.md) and includes deterministic verification gates, review artifacts, and a bounded PR scope.

---

### 1. High‑Level Architecture

| Component | Responsibility | Tech Stack (suggested) |
|-----------|----------------|------------------------|
| **Ricky‑API Gateway** | HTTP/HTTPS entry point; validates auth, routes to the correct Agent package. | FastAPI (Python) + OpenAPI spec |
| **Agent Packages** | Self‑contained Ricky agents (e.g., `ricky‑workflow‑repair`, `ricky‑doc‑lookup`). Each package ships a `manifest.yaml` with contracts. | Python packages, Poetry |
| **Slack Bot** | Listens to `/ricky` slash command and interactive messages; forwards payload to the API Gateway. | Bolt for Python |
| **Web UI** | Single‑page React app; uses the same API endpoints as Slack. | React + Vite |
| **Verification Service** | Executes deterministic verification gates (schema validation, contract tests, sandboxed execution). | pytest‑asyncio + sc
...[truncated]

Must

  • Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
  • Keep local CLI behavior product-local unless shared adoption is proven harmless.
  • Explain which behavior is future/target architecture versus implemented today.

Must Not

  • Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
  • Let future surface abstractions distort the current CLI contract.
  • Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What happens when you run**

```bash
ricky --mode local --spec "generate a workflow for package checks"

without the --run flag

Step Action Result
1️⃣ Parse arguments – Ricky reads --mode local and the free‑form --spec string. The request is interpreted as “create a workflow definition for package checks”.
2️⃣ Select the authoring pipeline – Because --run is absent, Ricky enters author‑only mode. No execution engines are started; only the workflow generation logic is invoked.
3️⃣ Generate the workflow contract – Ricky builds a deterministic JSON/YAML contract that describes the steps, inputs, outputs, and verification gates for the package‑check pipeline. The contract is written to ./ricky_output/<timestamp>_package-checks.workflow.json (or .yaml depending on your config).
4️⃣ Print a summary – Ricky prints a concise summary to STDOUT, e.g.:

✅ Workflow generated: ./ricky_output/20240515_1432_package-checks.workflow.yaml
🛠️ Mode: local (no execution)
You see where the artifact lives and that no jobs have been launched.
5️⃣ Exit – The process termin
...[truncated]

**Must**

- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.

**Must Not**

- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>

**User message**

> Render Ricky's first-run CLI onboarding for a new user.

**Ricky output**

```text
**Ricky – First‑Run CLI Onboarding Workflow**  
*(Designed for a brand‑new user installing the `ricky` command‑line tool on a supported platform.)*  

---  

### 1️⃣ Prerequisites (Verification Gate)

| Check | Command | Expected Result | Failure Action |
|-------|---------|----------------|----------------|
| **Node ≥18** | `node -v` | `v18.x` or higher | Prompt user to install Node LTS and abort. |
| **Git ≥2.30** | `git --version` | `git version 2.30.x` or higher | Prompt user to install Git and abort. |
| **Supported OS** | `uname -s` (Linux/macOS) or `systeminfo` (Windows) | `Linux`, `Darwin`, or `Windows` | Abort with “Unsupported OS” message. |

*All checks are run synchronously before any further steps. If any gate fails, the onboarding stops and a **Blocker Report** is emitted (see § 7).*

---  

### 2️⃣ Install the `ricky` CLI (Deterministic Step)

```bash
# Preferred: npm (global)
npm install -g @ricky/cli@latest

# Alternative: Homebrew (macOS/Linux)
brew tap ricky/cli
brew install ricky

Verification:

ricky --version   # should output a semver string, e.g. 1.2.3

If the version string is missing or malformed, abort and emit a blocker report.


...[truncated]


**Must**

- Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
- End every branch with a concrete next step.
- Advertise only commands that are currently implemented.

**Must Not**

- Sound like a launch page or documentation dump.
- Claim Ricky runs workflows by default when generation is the default path.
- Require web or Slack onboarding before CLI use.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.recovery-guidance-no-stack-traces</code> (cli-behavior/manual)</summary>

**User message**

> A user gives Ricky an empty spec or a missing spec file.

**Ricky output**

```text
**Missing or Empty Specification – Next Steps**

1. **Confirm the Issue**  
   - If you see a file named `spec.yaml` (or similar) that is completely empty, or if the expected spec file is absent from the repository, the workflow cannot be generated or validated.

2. **Provide a Minimal Spec Skeleton**  
   To get the process moving, create a placeholder spec with the required top‑level keys. Below is a generic template you can copy‑paste and then fill in the details relevant to your use case:

   ```yaml
   # spec.yaml – minimal skeleton
   version: "1.0"
   name: "<Your Workflow Name>"
   description: "<Brief description of the workflow’s purpose>"
   author: "<Your Name or Team>"
   contacts:
     - email: "<your.email@example.com>"
   triggers:
     - type: "manual"   # or "schedule", "webhook", etc.
   inputs:
     # Define input parameters here
     # - name: "input_name"
     #   type: "string"
     #   required: true
   steps:
     # List each step in execution order
     # - id: "step1"
     #   name: "Descriptive Step Name"
     #   action: "<action_identifier>"
     #   inputs: {}
     #   outputs: {}
   outputs:
     # Define what the workflow returns
     # - name: "res
...[truncated]

Must

  • Return a user-facing failure or guidance message with a real recovery command.
  • Distinguish generation failure from execution failure.
  • Show stack traces only when verbose diagnostic mode is requested.

Must Not

  • Crash with an uncaught exception in normal mode.
  • Suggest commands that do not exist.
  • Pretend a missing spec was accepted.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**  

=== AgentWorkforce Status Report ===
Timestamp: 2026‑05‑15T12:34:56Z

▶ Provider Connectivity
• Google : ❌ NOT CONNECTED (no successful health‑check)
• GitHub : ❌ NOT CONNECTED (no successful health‑check)

▶ Workflow Engine
• Scheduler : ✅ RUNNING
• Queue : 0 pending jobs
• Workers : 4 active / 4 idle

▶ Recent Activity
• Last successful provider check: 2026‑05‑14T08:12:03Z (none)
• Last workflow execution: 2026‑05‑15T11:58:20Z (completed)

▶ Alerts & Blockers
• BLOCKER: Provider connectivity missing for Google and GitHub.
→ Required for any workflow that depends on external auth, repo access,
or Google‑API integrations.
→ Resolve by:
1. Verifying OAuth credentials / service‑account keys.
2. Running ricky provider test --name google and --name github.
3. Ensuring network egress to *.googleapis.com and api.github.com.

▶ Recommendations
• Re‑authenticate Google provider:
ricky provider auth --name google --type oauth2
• Re‑authenticate GitHub provider:
ricky provider auth --name github -
...[truncated]


**Must**

- Report unknown or not-connected provider state honestly.
- Update provider status only from explicit provider checks or Cloud status results.
- Give concrete setup guidance for Cloud when relevant.

**Must Not**

- Mark Google or GitHub connected because guidance text was shown.
- Invent a provider connection URL or OAuth flow.
- Show empty fields with no recovery guidance when config is missing.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.skill-matcher-registry-backed</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

**Ricky output**

```text
## GitHub Primitive‑Change Workflow  
**Purpose:** Apply a single‑file edit (e.g., README typo fix) to a repository while guaranteeing traceability, deterministic verification, and safe merge.  

| Phase | Owner | Skill(s) Used | Artifact | Gate / Validation |
|-------|-------|---------------|----------|-------------------|
| **1️⃣ Plan** | **Ricky – Planning** | *Scope Definition, Impact Forecast* | `change‑plan.md` (includes ticket ID, file path, diff preview, risk rating) | **Gate A:** Plan approved by repository‑owner (review comment “✅ Plan approved”). |
| **2️⃣ Branch** | **Ricky – Branching** | *Branch Naming, Isolation* | New branch `fix/<ticket‑id>-<short‑desc>` | **Gate B:** Branch created from `main` and CI lint passes on empty commit. |
| **3️⃣ Edit** | **Ricky – Edit‑Assist** | *Diff Generation, Contextual Lint* | Commit `c1` with atomic change (single‑hunk diff) | **Gate C:** Local pre‑commit hook runs `markdown‑lint` / `spellcheck`; must return 0. |
| **4️⃣ Local Verify** | **Ricky – Verification** | *Deterministic Test, Snapshot* | `verification‑log.txt` (hash of file before/after, diff checksum) | **Gate D:** `sha256sum` of the new file matches expected checksum re
...[truncated]

Must

  • Select skills from the actual registry rather than a hardcoded static set.
  • Record ranked skill matches with confidence and trigger evidence.
  • Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

  • Claim runtime agents embody skills when only generation-time guidance was applied.
  • Fail generation solely because optional skill files are missing.
  • Hide skill selection evidence from artifacts or JSON output.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow: “Claude‑Review → Codex‑Implement”**  
*Purpose*: Automate a two‑stage content pipeline where Claude (LLM‑reviewer) validates design/spec text and Codex (code‑generator) produces the implementation artifact. The workflow is version‑controlled, deterministic, and includes explicit verification gates and review artifacts.

---

## 1. Repository Layout (per **AGENTS.md**)

/workflows/
│ claude-review-codex-implement.yaml # ← workflow definition (this file)
│ README.md # usage & diagram
│ contracts/
│ review_contract.json # Claude input/output contract
│ impl_contract.json # Codex input/output contract
│ gates/
│ review_gate.py # deterministic verification of Claude output
│ impl_gate.py # deterministic verification of Codex output
│ scripts/
│ run_review.py # wrapper for Claude call
│ run_impl.py # wrapper for Codex call
│ artifacts/
│ <run‑id>/ # per‑run artifact folder (review, impl, logs)

/docs/
│ workflow_diagram.png
│ changelog.md


---

## 2. Workfl
...[truncated]

Must

  • Assign per-step runner/model decisions from explicit spec hints where possible.
  • Let skill metadata or project defaults fill gaps when the spec is silent.
  • Write tool-selection.json or equivalent audit metadata.

Must Not

  • Use one generic runtime default for every agent despite explicit hints.
  • Let skill defaults override explicit user/spec runner hints.
  • Omit the reason each tool or model was chosen.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

## Using `--refine` to Sharpen a Workflow

`--refine` is the flag you add when you run the **workflow‑author** CLI (or the internal authoring service) to tell the system to iterate over an existing workflow definition and improve:

* **Step‑task descriptions** – make them concrete, measurable, and scoped.  
* **Acceptance gates** – add deterministic verification checks that can be automatically evaluated.

Below is a step‑by‑step contract you can follow to get reproducible, high‑quality refinements.

---

### 1. Prepare the Base Workflow

| Artifact | Required Fields |
|----------|-----------------|
| `workflow.yaml` | `name`, `version`, `steps[]` (each with `id`, `task`, `inputs`, `outputs`) |
| `contract.json` *(optional)* | `scope`, `quality‑criteria`, `review‑owners` |

> **Tip:** Keep the base version **immutable** (e.g., `v1.0.0`). The refined version will be a new branch/PR.

---

### 2. Invoke the Refine Command

```bash
awf workflow-author \
  --input workflow.yaml \
  --contract contract.json \
  --refine \
  --target-branch feature/refine‑<workflow‑name>-v2 \
  --reviewer @team-lead
  • --refine tells the engine to:
    1. Parse each step’s task string.
    2. Replace
      ...[truncated]

**Must**

- Keep the deterministic unrefined path as the default.
- Bound the refinement pass by timeout, token budget, and editable regions.
- Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

**Must Not**

- Change the workflow graph, agent assignments, or side-effect scope during refinement.
- Fail the whole generation if optional refinement times out.
- Hide warnings when the deterministic artifact is returned unchanged.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.behavior-grounded-gates</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for the `ricky --version` spec.

**Ricky output**

```text
**Workflow: `ricky --version` Feature Implementation & Release**  
*(Ricky AgentWorkforce – version‑display command)*  

---

## 1. Scope & Objective
Create a deterministic, test‑driven workflow that adds the `--version` flag to the `ricky` CLI, exposing the current package version (semantic‑version string) and integrates it into the release pipeline. The workflow must guarantee:

* Zero runtime regressions for existing commands.  
* Accurate version output matching `package.json` (or `pyproject.toml` for Python).  
* Automated verification before any merge to `main`.  

---

## 2. Branching Model
| Branch | Purpose |
|--------|---------|
| `feature/version-flag` | Development of `--version` implementation. |
| `release/vX.Y.Z` | Release candidate branch; version bump applied here. |
| `main` | Production‑ready code. |
| `hotfix/*` | Emergency patches (not used for this feature). |

All work must stay within the `feature/version-flag` branch until the final PR is approved.

---

## 3. Contract (Specification)

| Item | Description |
|------|-------------|
| **Command** | `ricky --version` |
| **Output** | Exact string: `ricky version <MAJOR>.<MINOR>.<PATCH>` (e.g., `ricky version 2
...[truncated]
```

Must

  • Build gates from the stated acceptance behavior, such as checking `ricky --version` output.
  • Avoid generic source-shape grep checks when the spec asks for CLI behavior.
  • Keep generated validation meaningful for the current repo shape.

Must Not

  • Treat `grep -Eq 'export|function|class|workflow(' dist/ricky.js` as proof of version behavior.
  • Claim the workflow is proven by source syntax alone.
  • Ignore the package-json version resolution order in the spec.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
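
A behavior-grounded gate of the kind this rubric asks for might look like the sketch below. It assumes the caller has already captured the stdout of `ricky --version` and read the expected version from `package.json`; `versionBehaviorGate` and `GateResult` are hypothetical names, and the exact output format of the real command is not assumed.

```ts
// Hypothetical sketch: the gate checks observed CLI output, not source syntax.
type GateResult = { passed: boolean; detail: string };

function versionBehaviorGate(stdout: string, expectedVersion: string): GateResult {
  const observed = stdout.trim();
  // Pass only when the expected version appears as a whole whitespace-separated token.
  const tokens = observed.split(/\s+/);
  const passed = tokens.indexOf(expectedVersion) !== -1;
  return {
    passed,
    detail: passed
      ? `found ${expectedVersion}`
      : `expected ${expectedVersion}, got "${observed}"`,
  };
}
```

Matching a whole token keeps a near miss such as `1.2.30` from passing when `1.2.3` is expected, which a substring check would allow.
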
REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact Validation‑Fix‑Rerun Loop”**  
*Purpose*: Process a large set of independent artifacts (e.g., data files, code modules, model checkpoints) through a deterministic validation stage, automatically apply fix‑scripts where possible, and rerun until all artifacts pass or are blocked.  

---  

## 1. Scope & Boundaries  

| Scope Element | Description |
|---------------|-------------|
| **Branch** | `feature/batch‑artifact‑pipeline` (isolated from `main`) |
| **PR Target** | `main` – only after **100 % pass** on the batch run |
| **Artifacts** | Any files under `artifacts/` that match `**/*.json` (or other extensions via config) |
| **Fix Scripts** | Located in `fixes/` – one script per artifact type, named `fix_<type>.sh` |
| **Validation Suite** | `validate/` – deterministic test harness returning JSON status |
| **Loop Limit** | Max **8** iterations (≈ 80‑100 validation‑fix cycles) to avoid infinite loops |
| **Blocker Policy** | If an artifact fails after the final iteration, the workflow **fails** and creates a **Blocker Issue** with details. |

---  

## 2. Contract (Inputs / Outputs)

| Item | Type | Required | Description |
|------|------|----------|--
...[truncated]

Must

  • Choose dag, supervisor, or pipeline deliberately based on the work shape.
  • Explain the pattern choice in artifact metadata or a rationale.
  • Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

  • Default blindly to dag for every workflow.
  • Collapse independent artifact work into one vague agent task.
  • Omit verification gates because the chosen pattern seems obvious.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow  

| **Skill** | **What the Skill Does** | **Concrete Effect on the Generated Workflow** | **Why It Matters for Reliability** |
|----------|------------------------|-----------------------------------------------|------------------------------------|
| **Deterministic Contract Definition** | Declares input‑output schemas, pre‑conditions, and post‑conditions in a machine‑readable JSON‑Schema block. | The workflow begins with a `contract` section that is validated by the **ContractGate** before any step runs. If the incoming payload violates the schema, the workflow aborts with a clear `ContractViolation` error. | Guarantees that downstream steps receive exactly the data they expect, eliminating “garbage‑in‑garbage‑out” failures. |
| **Verification Gate Insertion** | Inserts explicit verification steps (e.g., `SchemaCheck`, `IdempotencyCheck`, `ResourceLockCheck`) after each logical group of actions. | After every **action block** the generated code adds a `gate: Verify<GateName>` stanza. The gate runs a deterministic test script; on failure it triggers a **Rollback** branch. | Provides early detection of hidde
...[truncated]

Must

  • Describe skills as generation-time selection, loading, and rendering inputs.
  • Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
  • Say the workflow instructions were informed by selected skills.

Must Not

  • Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
  • Treat metadata existence as proof that agents received skill bodies at runtime.
  • Overstate current tests beyond the generation boundary.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

**Blocker – Insufficient Specification**

The request is to “generate a workflow from a vague product spec.” Without concrete details (e.g., product goals, key actions, inputs/outputs, success criteria, integration points, and any required compliance or performance constraints) I cannot produce a deterministic, review‑ready workflow contract.

**What’s needed to proceed**

| Category | Example details to provide |
|----------|----------------------------|
| **Product Vision** | One‑sentence description of the product’s purpose and primary user. |
| **Core Use Cases** | 2‑3 concrete scenarios (e.g., “User uploads a CSV and receives a summary report”). |
| **Inputs & Outputs** | Data formats, APIs, files, or UI elements that start and end the workflow. |
| **Key Steps & Actors** | Ordered list of actions, responsible agents (human or system), and any hand‑offs. |
| **Success / Failure Conditions** | Metrics or states that define a successful run and how failures are handled. |
| **Compliance / Security** | Any regulatory or security gates (e.g., GDPR consent, OAuth token validation). |
| **Performance SLAs** | Expected latency or throughput constraints for each stage. |
| **Review Ar
...[truncated]

Must

  • Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
  • Validate with dry-run or targeted structural checks where possible.
  • Return artifacts, warnings, and follow-up commands honestly.

Must Not

  • Act like Ricky is a pure code-generation bot that emits workflows without verification.
  • Stop at "code compiles" as the proof bar.
  • Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s next actions after a step‑timeout + orphaned worker**

| Phase | Action | Why | Artifact |
|-------|--------|-----|----------|
| 1️⃣ Detect & Log | • Capture the timeout event (step X) and the “alive‑but‑idle” worker (worker Y). <br>• Append a **FailureLog** entry to the workflow run’s audit trail (timestamp, step ID, worker ID, error code). | Provides an immutable record for post‑mortem and for the verification gate to consume. | `audit/failure‑log.json` |
| 2️⃣ Blocker Report | • Mark the run as **BLOCKED** in the orchestration DB. <br>• Emit a **BlockerTicket** (GitHub Issue or JIRA) with: <br>  – Summary: *Step X timed‑out; worker Y left dangling.* <br>  – Severity: **High** (no downstream artifacts). <br>  – Owner: *workflow‑owner* (or auto‑assign to the “Ricky‑Ops” team). | Guarantees visibility and prevents downstream pipelines from consuming incomplete artifacts. | `issues/####-timeout-orphan.md` |
| 3️⃣ Deterministic Verification Gate | • Insert a **Gate** `gate/timeout‑orphan‑check` that must pass before any retry or continuation. <br>• Gate logic: <br>  1. Confirm no partial artifacts exist in the expected output bucket. <br>  2. Verify the worker process has b
...[truncated]

Must

  • Classify the failure before deciding whether to retry.
  • Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
  • Preserve raw evidence and uncertainty when the class is not known.

Must Not

  • Blindly rerun the whole workflow without a blocker classification.
  • Treat every failure as a broken workflow definition.
  • Claim Ricky fixed the workflow before rerun evidence proves it.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
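
The classify-before-retry rule above could be expressed roughly as follows. The class names mirror the rubric; the input signals, the `classifyBeforeRetry` name, and the policy that an unknown class blocks the retry decision are assumptions made for illustration.

```ts
// Hypothetical sketch: class names follow the rubric; signals and policy are assumptions.
type FailureClass =
  | "agent-runtime-opacity"
  | "timeout"
  | "environment-blocker"
  | "workflow-structure-bug"
  | "validation-strategy-mismatch"
  | "unknown";

type FailureSignals = {
  stepTimedOut: boolean;
  workerAliveWithoutArtifacts: boolean;
  rawEvidence: string[]; // preserved verbatim alongside the classification
};

type Classification = {
  classes: FailureClass[];
  readyForRetryDecision: boolean;
  rawEvidence: string[];
};

function classifyBeforeRetry(signals: FailureSignals): Classification {
  const classes: FailureClass[] = [];
  if (signals.stepTimedOut) classes.push("timeout");
  if (signals.workerAliveWithoutArtifacts) classes.push("agent-runtime-opacity");
  if (classes.length === 0) classes.push("unknown");
  // A retry decision is only considered once every observed class is known.
  const readyForRetryDecision = classes.indexOf("unknown") === -1;
  return { classes, readyForRetryDecision, rawEvidence: [...signals.rawEvidence] };
}
```

For the scenario in the user message (one timed-out step plus an idle worker), this sketch reports two classes instead of collapsing them into a single generic failure.
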
REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow Contract – “Stale‑State Guard for Local AgentWorkforce Runs”**  

| **Element** | **Definition** |
|-------------|----------------|
| **Name** | `guard‑stale‑state` |
| **Scope** | Local developer machines before any `aw run` (or equivalent) command. |
| **Inputs** | 1. Working‑tree root (detected via `git rev‑parse --show-toplevel`).<br>2. Optional config file `.aw/guard‑stale‑state.yml` (see *Config* section). |
| **Outputs** | - Exit code `0` → clean state, workflow may proceed.<br>- Exit code `1` → stale state detected; abort run.<br>- Human‑readable report printed to `stderr` and written to `$ROOT/.aw/guard‑stale‑state/report.txt`. |
| **Side‑effects** | - May delete or archive stale directories when the user opts‑in via config.<br>- Writes a timestamped log entry to `$ROOT/.aw/guard‑stale‑state/log.json`. |
| **Deterministic Verification Gates** | 1. **Presence Check** – Verify existence of any of the three directories.<br>2. **Freshness Check** – Compare each directory’s `mtime` of the newest file against the configured freshness threshold (default = 24 h).<br>3. **Integrity Check** – Run a quick hash‑sum (`sha256sum`) of a sentinel file (`.agent‑relay/.stamp`, `.
...[truncated]

Must

  • Classify stale local runtime state as an environment contamination issue.
  • Recommend quarantine or isolated-run guidance before launch.
  • Record the observed paths and the action taken or recommended.

Must Not

  • Treat stale runtime state as a workflow logic failure.
  • Delete or overwrite state without an explicit safe path or user intent.
  • Continue into execution as if the workspace were clean.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**Fixing “Another Ricky or Relay run is already active in this workspace”**

The marker file is a safety lock that prevents two concurrent executions from stepping on each other. When it’s left behind (e.g., after a crash or an interrupted run) you’ll see the error you described. Follow the deterministic, low‑risk repair flow below.

---

## 1️⃣ Verify the Situation (Gate 1)

| Check | Command | Expected outcome |
|-------|---------|-------------------|
| Active process? | `ps aux \| grep -i ricky` | No `ricky` or `relay` PID should be listed (aside from the `grep` line). |
| Marker file existence | `ls -l .ricky/run.marker` | File exists → proceed. |
| Marker timestamp | `stat -c %y .ricky/run.marker` | Note the modification time; if it’s older than the last successful run, it’s stale. |

If a live `ricky`/`relay` process **is** running, **do not delete the marker** – let the process finish or kill it deliberately (see “Force‑stop” below).  

If no process is running, the marker is orphaned and safe to clear.

---

## 2️⃣ Safe Cleanup (Gate 2)

1. **Backup the marker (optional, for audit)**  
   ```bash
   cp .ricky/run.marker .ricky/run.marker.bak_$(date +%Y%m%d%H%M%S)
   ```

...[truncated]


**Must**

- Report the active marker, run id, or status path when available.
- Ask the user to inspect, wait for, or explicitly clear the active run.
- Avoid launching a competing run that could corrupt evidence.

**Must Not**

- Silently start another run.
- Hide the existing run marker from the user.
- Treat the conflict as a generic failure with no recovery path.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.env-loader-injection-runtime-loadable</code> (runtime-recovery/manual)</summary>

**User message**

> A workflow artifact references a `MISSING_ENV_VAR` value. Ricky's deterministic auto-fix injects the `.env.local` / `.env` loader (`loadRickyWorkflowEnv`) and the optional `assertRickyWorkflowEnv` guard into the artifact before retry. The artifact may be a master-rendered workflow whose `.step({ command: ... })` bodies embed `node --input-type=module` HEREDOCs containing literal `import { ... } from 'node:fs'` / `from 'node:path'` strings.

**Ricky output**

```text
**Deterministic auto‑fix for missing environment variables**

Below is the concrete patch you can apply to any workflow artifact that currently fails because it references a `MISSING_ENV_VAR`. The fix does three things:

1. **Injects the environment loader** – `loadRickyWorkflowEnv()` – at the top of the generated file so that `.env.local` (or fallback `.env`) is read before any step runs.  
2. **Adds an optional guard** – `assertRickyWorkflowEnv()` – that throws a clear error if the required variable is still undefined after loading.  
3. **Wraps any HEREDOC‑embedded `node --input-type=module` command** with the same loader/guard so that even dynamically‑generated modules see the variables.

The patch is scoped to the single workflow file (or the master‑rendered workflow that contains the HEREDOCs). No other branches or PRs are required.

---

## 1. Helper utilities (add once to a shared `ricky-workflow-env.ts`)

```ts
// ricky-workflow-env.ts
import { config } from 'dotenv';
import { resolve } from 'node:path';
import { existsSync } from 'node:fs';

/**
 * Load environment variables from .env.local (preferred) or .env.
 * Returns the loaded env object for inspection.
 */
export f
...[truncated]
```

Must

  • Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected `loadRickyWorkflowEnv` body references `rickyWorkflowFs.*` and `rickyWorkflowPath.*`, so the repair must also add the corresponding `import * as rickyWorkflowFs from 'node:fs'` and `import * as rickyWorkflowPath from 'node:path'` aliases at module top level.
  • Detect existing alias imports by matching real top-level `import * as <alias> from '<module>'` statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in `.step({ command: ... })` bodies do not count as imports).
  • Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

Must Not

  • Skip adding the `rickyWorkflowFs` / `rickyWorkflowPath` aliases because `from 'node:fs'` or `from 'node:path'` already appears somewhere in the file as a string literal.
  • Inject `loadRickyWorkflowEnv` (or `assertRickyWorkflowEnv`) without the supporting alias imports, which produces a `ReferenceError: rickyWorkflowPath is not defined` at module load and burns the auto-fix budget on `UNSUPPORTED_RUNTIME` at runtime-launch.
  • Rewrite or escape the embedded HEREDOC text in step commands.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
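
The difference between substring matching and matching a real alias import can be sketched as below. This is an approximation under stated assumptions: a line-anchored regular expression stands in for a proper parser, and `hasNamespaceImport` and `missingEnvLoaderImports` are hypothetical helper names, not functions from Ricky's codebase.

```ts
// Hypothetical sketch: a line-anchored regex approximates "real import statement".
// A production check would more likely walk a parsed AST.
function hasNamespaceImport(source: string, alias: string, moduleName: string): boolean {
  const escape = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(
    `^import\\s+\\*\\s+as\\s+${escape(alias)}\\s+from\\s+['"]${escape(moduleName)}['"]\\s*;?\\s*$`,
    "m"
  );
  return pattern.test(source);
}

// Returns the alias import lines that still need to be added at module top level.
function missingEnvLoaderImports(source: string): string[] {
  const required: Array<[string, string]> = [
    ["rickyWorkflowFs", "node:fs"],
    ["rickyWorkflowPath", "node:path"],
  ];
  return required
    .filter(([alias, mod]) => !hasNamespaceImport(source, alias, mod))
    .map(([alias, mod]) => `import * as ${alias} from '${mod}';`);
}
```

Because the check requires the exact `import * as <alias>` form, a HEREDOC line such as `import { readFileSync } from 'node:fs'` does not suppress the alias import, and the HEREDOC text itself is never modified.
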
REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Mode**  

Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script) to run the workflow, trigger the auto‑fix path on the first failure, and verify the repaired artifact before committing any changes.

---

## 1️⃣ Preconditions (inputs you already have)

| Variable | Source | Example |
|----------|--------|---------|
| `WORKFLOW_PATH` | Path to the workflow definition (e.g., `./workflows/build.yml`) | `./workflows/build.yml` |
| `FAILED_STEP_ID` | Identifier of the step that failed on the first run | `test:unit` |
| `PREV_RUN_ID` | Run‑ID of the previous successful execution (used for diff/trace) | `run‑20240512‑a1b2c3` |
| `AUTO_FIX=true` | Flag to enable the auto‑repair loop | – |
| `MAX_REPAIR_ATTEMPTS` | Upper bound for repair loops (recommended 3) | `3` |

> **Note:** All variables must be exported in the shell before invoking the runner.

```bash
export WORKFLOW_PATH=./workflows/build.yml
export FAILED_STEP_ID=test:unit
export PREV_RUN_ID=run-20240512-a1b2c3
export AUTO_FIX=true
export MAX_REPAIR_ATTEMPTS=3
```

## 2️⃣ Execution Contract

Create a single‑purpose script (`run-with-autofix
...[truncated]


**Must**

- Use a bounded retry budget and summarize every attempt.
- Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
- Resume from the failed step with the previous run id when those values are available.

**Must Not**

- Edit arbitrary repository source files as the default auto-fix surface.
- Keep retrying after the configured max attempts.
- Lose the single Ricky tracking run id across repair/resume attempts.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
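
A bounded repair loop that satisfies the Must and Must Not lists might be shaped like the sketch below. All names (`runWithAutoFix`, `Attempt`, the two callbacks) are hypothetical; the callbacks stand in for whatever actually executes the workflow and asks the repair persona to fix the artifact.

```ts
// Hypothetical sketch: names are illustrative, not Ricky's runner API.
type Attempt = { attempt: number; runId: string; resumedFrom: string | undefined; ok: boolean };

function runWithAutoFix(
  trackingRunId: string,
  maxAttempts: number,
  // Runs (or resumes) the workflow; returns the failed step id, or null on success.
  execute: (resumeFromStep: string | undefined) => string | null,
  // Requests a repair of the workflow artifact only; returns false if it cannot be repaired.
  repairArtifact: (failedStep: string) => boolean
): { ok: boolean; attempts: Attempt[] } {
  const attempts: Attempt[] = [];
  let resumeFrom: string | undefined = undefined;
  for (let n = 1; n <= maxAttempts; n++) {
    const failedStep = execute(resumeFrom);
    // Every attempt is recorded under the same tracking run id.
    attempts.push({ attempt: n, runId: trackingRunId, resumedFrom: resumeFrom, ok: failedStep === null });
    if (failedStep === null) return { ok: true, attempts };
    // Stop at the budget or when the artifact is not repairable.
    if (n === maxAttempts || !repairArtifact(failedStep)) break;
    resumeFrom = failedStep;
  }
  return { ok: false, attempts };
}
```

The returned `attempts` array is the per-attempt summary, and the loop cannot run more than `maxAttempts` times regardless of what the callbacks return.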

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.no-auto-fix-preserves-single-attempt</code> (runtime-recovery/manual)</summary>

**User message**

> A user runs `ricky run workflows/foo.ts --no-auto-fix` and the workflow fails.

**Ricky output**

```text
**Quick‑Start Debug Checklist for a Failing `ricky run workflows/foo.ts --no-auto-fix`**

| Step | Action | Expected Outcome | How to Verify |
|------|--------|------------------|---------------|
| 1️⃣ | **Collect the run log** | A complete, timestamped log file (`ricky‑run‑foo‑<ts>.log`) in the workspace `/.ricky/logs/` | `cat .ricky/logs/ricky‑run‑foo-*.log` |
| 2️⃣ | **Locate the failure gate** | Identify the deterministic verification gate that threw the error (e.g., `SchemaValidate`, `DependencyCheck`, `RuntimeAssert`) | Search the log for `✖ Gate` or `❌ ` markers |
| 3️⃣ | **Inspect the offending node** | The log will show the workflow node ID, source file line, and the exact exception message | Example: `NodeID: step‑transform‑1 (workflows/foo.ts:27)` |
| 4️⃣ | **Run the node in isolation** | Use `ricky run-node <node-id> --no-auto-fix` to reproduce the error without the full pipeline | `ricky run-node step‑transform‑1 --no-auto-fix` |
| 5️⃣ | **Validate contract compliance** | Ensure the node’s input/output matches the **Workflow Contract** defined in `workflows/foo.contract.ts` | `ricky validate-contract workflows/foo.contract.ts` |
| 6️⃣ | **Check external dependencies** 
...[truncated]
```

Must

  • Preserve one-attempt behavior when auto-fix is disabled.
  • Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
  • Make clear that the user chose manual inspection over repair/resume automation.

Must Not

  • Start a repair loop despite --no-auto-fix.
  • Suppress the diagnosis because no repair was attempted.
  • Present the failure as a completed repair attempt.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)

User message

Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

Ricky output

**Ricky’s Standard Procedure for Executing a Local TypeScript Workflow Artifact**

Below is the concrete, repeatable contract Ricky follows when a developer asks to run a TypeScript‑based workflow that lives in the repository’s *primary local path* (by default `./workflows`). The steps are written as a deterministic “execution contract” that can be copied into a branch‑specific README or CI script and verified by the built‑in validation gates.

---

## 1️⃣ Preconditions (Contract Input)

| Item | Requirement | How Ricky Checks |
|------|-------------|------------------|
| **Repo root** | The command must be run from the repository root (contains `package.json`). | `git rev-parse --show-toplevel` |
| **Primary workflow directory** | Must exist and contain an `index.ts` (or a file matching `*.workflow.ts`). | `test -d ./workflows && test -f ./workflows/index.ts` |
| **Node version** | `>=18.x` (LTS) – matches `engines.node` in `package.json`. | `node -v` |
| **TypeScript compiler** | `ts-node` must be available (either globally or as a dev‑dependency). | `npx ts-node -v` |
| **Dependencies** | All npm packages declared in `package.json` are installed. | `npm ls` returns no missing mo
...[truncated]

Must

  • Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
  • Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
  • Record the actual spawn command in execution evidence.

Must Not

  • Fail solely because agent-relay is not on PATH when the SDK route is available.
  • Hide the actual runtime command from evidence.
  • Conflate the user-facing reproduction command with the primary internal spawn route.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
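
The precheck and evidence requirements above could be sketched as follows. This is a hypothetical planner, not Ricky's runner: `planLocalSpawn` and `SpawnPlan` are invented names, the two booleans are assumed to come from earlier prechecks, and the `--experimental-strip-types` flag reflects Node 22.6 and may be unnecessary on newer Node releases.

```ts
// Hypothetical sketch: the plan and evidence shapes are assumptions.
type SpawnPlan =
  | { ok: true; command: string; args: string[]; evidence: string }
  | { ok: false; blocker: string };

function planLocalSpawn(artifactPath: string, nodeAvailable: boolean, sdkResolvable: boolean): SpawnPlan {
  if (!nodeAvailable) return { ok: false, blocker: "node is not available" };
  if (!sdkResolvable) return { ok: false, blocker: "@agent-relay/sdk is not resolvable from the workflow" };
  const command = "node";
  // Flag name as of Node 22.6; later Node versions may strip types by default.
  const args = ["--experimental-strip-types", artifactPath];
  // The exact spawn command is recorded so evidence shows what really ran.
  return { ok: true, command, args, evidence: [command, ...args].join(" ") };
}
```

The planner never consults whether the `agent-relay` binary is on `PATH`, so a missing binary alone cannot produce a blocker in this sketch.
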
REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky’s Boundary‑Handling Protocol for Persistent Structural Failures**

When a structural failure in an AgentWorkforce workflow remains after an initial remediation attempt, Ricky follows a deterministic, contract‑driven protocol that isolates the problem, escalates it cleanly, and guarantees traceability. The steps below constitute the **“Persistent‑Failure Boundary”** workflow contract.

---

## 1. Failure Detection & Initial Fix Attempt  

| Phase | Action | Artifact |
|-------|--------|----------|
| **Detect** | Automated test suite (or runtime monitor) raises a *StructuralFailure* alert with a unique `FAIL‑ID`. | `failure.log` (JSON) |
| **Log** | Ricky writes a *Failure Record* (FR) to `/.ricky/failures/FAIL‑ID.json` containing: <br>• Timestamp <br>• Affected workflow name & version <br>• Failure signature (stack trace, diff) <br>• Owner (team/individual) | FR file |
| **Attempt Fix** | Ricky creates a *fix branch* `fix/FAIL‑ID‑<short‑desc>` from the latest stable `main`. <br>Applies the prescribed remediation (e.g., schema bump, dependency pin). | `fix/FAIL‑ID‑<short‑desc>` |
| **Verify** | Runs the **Deterministic Verification Gate (DVG)**: <br>1. Unit + integration test
...[truncated]

Must

  • Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
  • Distinguish escalation from a generic product failure.
  • Preserve enough context for a human operator to continue.

Must Not

  • Retry speculative fixes indefinitely.
  • Collapse to "something went wrong" without the attempted actions.
  • Discard evidence from failed repair attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Compiled from Recent Runs**  
*(Generated: 2026‑05‑15 08:12 UTC)*  

---

## 1. Executive Summary
| Metric | Value | Trend |
|--------|-------|-------|
| Total runs (last 24 h) | **312** | ↗ +8 % |
| Successful runs | **274** | ↗ +5 % |
| Failed runs | **38** | ↘ ‑12 % |
| Runs with **critical blockers** | **3** | ↔ steady |
| Avg. run duration | **2 m 17 s** | ↘ ‑14 s |
| Avg. verification‑gate latency | **12 s** | ↔ steady |

**Overall health:** **🟢 Green** – > 85 % success, no regression in verification‑gate pass‑rates.  

---

## 2. Run‑Level Health Indicators  

| Run ID | Branch | PR | Status | Verification Gates (pass/total) | Duration | Blockers |
|--------|--------|----|--------|----------------------------------|----------|----------|
| `wf‑20260514‑a1b2c3` | `feature/auto‑review` | #8421 | ✅ Success | 12/12 | 1 m 42 s | – |
| `wf‑20260514‑d4e5f6` | `bugfix/verify‑gate‑timeout` | #8399 | ❌ Failed | 9/12 | 3 m 05 s | **Gate‑Timeout** (gate #7) |
| `wf‑20260514‑g7h8i9` | `release/v2.3.0` | #8450 | ✅ Success | 12/12 | 2 m 10 s | – |
| `wf‑20260514‑j0k1l2` | `hotfix/critical‑blocker` | #8462 | ❌ Failed | 5/12 | 4 m 22 s | **Critical Blocker**
...[truncated]

Must

  • Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
  • Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
  • Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

  • Mutate evidence while analyzing it.
  • Return generic advice like "improve your workflow" without references.
  • Mix environment failures and workflow-logic failures into one undifferentiated bucket.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
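
A digest built from structured evidence rather than raw logs might start from something like the sketch below. The `WorkflowRunEvidence` fields shown here are assumptions (the real schema is not given in this report), and `buildDigest` is a hypothetical name. The input is typed as read-only so the analysis cannot mutate evidence.

```ts
// Hypothetical sketch: this evidence shape is an assumption, not Ricky's schema.
type WorkflowRunEvidence = {
  workflow: string;
  failureClass: "environment" | "workflow-logic" | null; // null means the run passed
  durationMs: number;
};

type WorkflowDigest = {
  runs: number;
  environmentFailures: number;
  logicFailures: number;
  meanDurationMs: number;
};

function buildDigest(evidence: readonly Readonly<WorkflowRunEvidence>[]): Record<string, WorkflowDigest> {
  const digest: Record<string, WorkflowDigest> = {};
  const totalMs: Record<string, number> = {};
  for (const run of evidence) {
    const entry = digest[run.workflow] ?? { runs: 0, environmentFailures: 0, logicFailures: 0, meanDurationMs: 0 };
    entry.runs += 1;
    // Environment and workflow-logic failures are counted in separate buckets.
    if (run.failureClass === "environment") entry.environmentFailures += 1;
    if (run.failureClass === "workflow-logic") entry.logicFailures += 1;
    const total = (totalMs[run.workflow] ?? 0) + run.durationMs;
    totalMs[run.workflow] = total;
    entry.meanDurationMs = total / entry.runs;
    digest[run.workflow] = entry;
  }
  return digest;
}
```

Keying the digest by workflow name gives each recommendation a concrete workflow and metric to point at.
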
REVIEW runtime-recovery.preserve-pr-shipping-during-repair (runtime-recovery/manual)

User message

A persona-authored workflow that imports @agent-relay/github-primitive and uses createGitHubStep to open a PR fails its runtime-precheck. The auto-fix repair persona is invoked. Describe the contract the repaired artifact must satisfy.

Ricky output

**Repaired Workflow Contract**

| **Section** | **Requirement** | **Verification Gate** | **Pass‑Criteria** |
|-------------|----------------|-----------------------|-------------------|
| **1. Imports** | The workflow must import the **exact** package name `@agent‑relay/github-primitive` and **only** the symbols it needs. | *Static import lint* – `import { createGitHubStep } from "@agent-relay/github-primitive"` must be present and resolve without warnings. | No missing‑module or duplicate‑import errors. |
| **2. Step Construction** | A single GitHub step must be created with `createGitHubStep` that: <br>• Targets the **pull‑request** action.<br>• Supplies a **valid `repo` object** (`owner`, `repo`).<br>• Supplies a **valid `branch`** (source & target).<br>• Supplies a **non‑empty `title`** and **body**.<br>• Returns a **step object** that conforms to `GitHubStep` interface (has `id`, `run`, `outputs`). | *Type‑check* against `GitHubStep` schema + *runtime‑precheck* that the step’s `run` string is a valid GitHub CLI command. | `step.id` is a non‑empty string, `run` contains `gh pr create …`, and `outputs` includes at least `prNumber` and `prUrl`. |
| **3. Workflow Definition** | T
...[truncated]

Must

  • Preserve the `@agent-relay/github-primitive` import, the `GitHubStepExecutor` reference, and every `createGitHubStep(...)` invocation that the original artifact declared.
  • Keep the failing workflow runnable from the same path with the same `workflow(...)` builder and `.run({ cwd: process.cwd() })` invocation.
  • Retain at least `ceil(N / 2)` of the original workflow's `.step(...)` calls when the original declared four or more steps; a repair that collapses to a 2-3 step placeholder is a regression, not a fix.
  • Reject (do not apply) any repair output whose step list reduces to `prepare-context` / `runtime-precheck: true` / `final-signoff: echo` placeholder; surface the regression diagnostic instead.

Must Not

  • Strip `createGitHubStep`, `GitHubStepExecutor`, or `@agent-relay/github-primitive` because the runtime-precheck failure mentioned PR-shipping or git side effects. The repair contract's "no commit / no push" constraint applies to the REPAIR AGENT's runtime behavior, not to the workflow's step declarations.
  • Emit a "minimal repair-safe master" or "simplified Ricky master" scaffold that passes the builder validator while doing none of the original work.
  • Treat "the workflow now builds and runs" as success when the work it was supposed to ship is gone.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
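
The retention rules above lend themselves to a mechanical check. The sketch below is a rough approximation under stated assumptions: it counts call sites with regular expressions instead of inspecting a parsed workflow, and `repairRegressions` and `countMatches` are hypothetical names.

```ts
// Hypothetical sketch: regex counting approximates inspection of the workflow.
function countMatches(source: string, pattern: RegExp): number {
  return (source.match(pattern) ?? []).length;
}

// Returns a list of regression diagnostics; an empty list means the repair may be applied.
function repairRegressions(original: string, repaired: string): string[] {
  const problems: string[] = [];
  const originalSteps = countMatches(original, /\.step\(/g);
  const repairedSteps = countMatches(repaired, /\.step\(/g);
  // Rubric: keep at least ceil(N / 2) steps when the original declared four or more.
  if (originalSteps >= 4 && repairedSteps < Math.ceil(originalSteps / 2)) {
    problems.push(`step count dropped from ${originalSteps} to ${repairedSteps}`);
  }
  const originalGitHub = countMatches(original, /createGitHubStep\(/g);
  const repairedGitHub = countMatches(repaired, /createGitHubStep\(/g);
  if (repairedGitHub < originalGitHub) {
    problems.push(`createGitHubStep calls dropped from ${originalGitHub} to ${repairedGitHub}`);
  }
  const pkg = "@agent-relay/github-primitive";
  if (original.indexOf(pkg) !== -1 && repaired.indexOf(pkg) === -1) {
    problems.push(`${pkg} import removed`);
  }
  return problems;
}
```

A repair that collapses a four-step PR-shipping workflow into a single placeholder step would produce three diagnostics here and be rejected instead of applied.
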
REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

## Ricky Slack Surface – “Workflow Debugger”

### 1. Overview
The **Ricky Slack Debugger** is a dedicated Slack app surface that lets engineers and product owners inspect, step‑through, and repair Ricky‑managed workflows without leaving Slack. It follows the **Ricky repository conventions** (AGENTS.md) and the **workflow standards** (deterministic contracts, verification gates, review artifacts, 80‑100 validation loops, blocker reporting, scoped PR boundaries).

---

## 2. High‑Level Architecture

| Component | Responsibility | Contract |
|-----------|----------------|----------|
| **Slack Bot (ricky‑debugger)** | Receives slash‑commands & interactive actions, formats responses, forwards requests to the backend. | `SlackEvent → BotRequest` (JSON, schema in `contracts/slack_event.json`). |
| **API Gateway (ricky‑debugger‑api)** | Auth‑checks, rate‑limits, routes to service layer. | `BotRequest → ServiceCall` (OpenAPI v3, `api/spec.yaml`). |
| **Debug Service** | Core logic: fetch workflow definition, current state, step execution, patch generation. | `DebugRequest → DebugResponse` (protobuf v3, `contracts/debug.proto`). |
| **Workflow Engine (Ricky Core)** | Executes

---

_Comment truncated to stay within GitHub limits. Download the `ricky-eval-run` artifact for the full `human-review.md`._
