fix(generation): fall back to deterministic renderer when persona writer throws#84

Merged
kjgbot merged 1 commit into main from fix/persona-writer-fallback-to-deterministic
May 9, 2026

Contributor

@kjgbot kjgbot commented May 9, 2026

Summary

ricky --mode local --spec-file <foo>.md --run consumes the entire 7/7 auto-fix budget on identical INVALID_ARTIFACT at runtime-precheck failures whenever the workforce-persona writer (opencode/claude CLI) fails — non-completed status, malformed output, harness errored, etc. — even though Ricky already rendered a perfectly valid deterministic artifact in memory before invoking the persona writer.

$ ricky --mode local --spec-file docs/plans/2026-05-05-proactive-pr-remediation-spec.md --run
Generation: ok — workflows/generated/ricky-…ts
Workflow name: wf-1e14175ec3b5
Execution: blocked — INVALID_ARTIFACT at runtime-precheck
Resume: ricky run workflows/generated/ricky-…ts --start-from runtime-precheck
Cause: Workflow artifact is not readable at /…/workflows/generated/ricky-…ts.
Auto-fix: stopped after 7/7 attempt(s) (INVALID_ARTIFACT)

The Generation: ok — <path> line is misleading: the artifact was rendered but never written. The workflows/generated/ directory named in <path> does not exist on disk after the run.

(This supersedes #83, which only patched the gate to re-render on retries — that doesn't help, because the persona writer just fails again.)

Root cause

Traced end-to-end against the user's spec by instrumenting the installed dist around defaultArtifactWriter.writeArtifact, precheckRuntimeLaunch, the gate, the auto-fix writer, and the generation call. Concrete attempt-1 trace:

[trace] gen call start workforcePersona=true
[trace] gen call returned success=false hasArtifact=true \
        artifactPath=workflows/generated/ricky-…ts validErrors=1
[trace] gen EARLY-RETURN no write success=false hasArtifact=true \
        firstErr=Workforce persona writer did not complete: failed.

So:

  1. generateWithWorkforcePersona builds baseResult (deterministic template render — always valid). Persona writer is invoked. Persona writer throws WorkforcePersonaWriterError("Workforce persona writer did not complete: failed.").
  2. The catch block at pipeline.ts:204-239 builds a WORKFORCE_PERSONA_WRITER_FAILED blocking validation issue and returns { ...baseResult, success: false, validation, workforcePersona }. Crucially it spreads baseResult.artifact so the artifact field is non-null, but it sets success: false to surface the writer failure.
  3. entrypoint.execute() hits the early-return at entrypoint.ts:1083 (!success || !artifact) and returns without writing anything. The artifact-in-memory is pushed onto response.artifacts (line 1075) but never persisted.
  4. Auto-fix sees the failure, runs retryBaseRequest, which promotes response.artifacts[0].path → request.specPath (source: 'workflow-artifact').
  5. On the retry, workflowFileForRoute returns the (synthetic) .ts specPath because it matches isExecutableWorkflowPath. Routing target stays at 'debug' (spec content unchanged). The gate target === 'generate' || !workflowFile evaluates false → skips generation entirely.
  6. Runtime-precheck ENOENTs on the never-written file. INVALID_ARTIFACT. Repeat 7×.
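
The six steps above can be condensed into a small model. This is a minimal sketch, not Ricky's code: the types, the `simulate` helper, and the example path are hypothetical, and every failure is collapsed to INVALID_ARTIFACT for brevity. Only the two boolean conditions (`!success || !artifact` and `target === 'generate' || !workflowFile`) are taken from the trace.

```typescript
// Hypothetical, simplified model of the retry loop; not the real entrypoint.
type RoutingTarget = "generate" | "debug";

interface GenerationResult {
  success: boolean;
  artifact: { path: string; content: string } | null;
}

interface Attempt {
  generated: boolean;
  written: boolean;
  blocker: "INVALID_ARTIFACT" | null;
}

// Step 5: generation runs only for a 'generate' target or when no workflow file is known.
function shouldGenerate(target: RoutingTarget, workflowFile: string | undefined): boolean {
  return target === "generate" || !workflowFile;
}

// Step 3: the entrypoint returns early, writing nothing, on (!success || !artifact).
function shouldWrite(result: GenerationResult): boolean {
  return !(!result.success || !result.artifact);
}

// The persona writer is assumed to fail on every call; the flag selects old vs. new behavior.
function simulate(fallBackToDeterministic: boolean, budget: number = 7): Attempt[] {
  const disk: string[] = []; // stand-in for the filesystem
  const attempts: Attempt[] = [];
  const target: RoutingTarget = "debug"; // step 5: routing stays at 'debug'
  let workflowFile: string | undefined = undefined;

  for (let i = 0; i < budget; i++) {
    let generated = false;
    let written = false;
    let path: string | undefined = workflowFile;

    if (shouldGenerate(target, workflowFile)) {
      generated = true;
      const artifact = { path: "workflows/generated/example.ts", content: "workflow(" };
      const result: GenerationResult = { success: fallBackToDeterministic, artifact };
      path = artifact.path;
      if (shouldWrite(result)) {
        disk.push(artifact.path);
        written = true;
      }
    }

    // Step 6: runtime-precheck fails when the file was never written.
    const readable = path !== undefined && disk.indexOf(path) !== -1;
    attempts.push({ generated, written, blocker: readable ? null : "INVALID_ARTIFACT" });
    if (readable) break;

    // Step 4: the retry promotes the unwritten artifact path to the spec path.
    workflowFile = path;
  }
  return attempts;
}
```

Under these assumptions, `simulate(false)` yields seven blocked attempts with generation running only on the first, while `simulate(true)` stops after one attempt with the artifact written.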

Generation: ok prints because on retries 2–7, createArtifactReferenceGenerationStage synthesizes a generation stage with status: 'ok' for the artifact-reference path — the CLI doesn't distinguish "rendered" from "written-and-readable".
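
One way to picture the missing distinction is sketched below. This is an illustration only: the `rendered-not-written` label, the `generationStatus` function, and the injected `isReadable` predicate are all hypothetical and are not part of Ricky's CLI.

```typescript
// Hypothetical status labels; only "ok" appears in the CLI output quoted in this PR.
type GenerationStatus = "ok" | "rendered-not-written";

// isReadable is injected so the sketch stays filesystem-agnostic; a real CLI
// would presumably back it with an actual readability check on disk.
function generationStatus(
  artifactPath: string,
  isReadable: (path: string) => boolean,
): GenerationStatus {
  return isReadable(artifactPath) ? "ok" : "rendered-not-written";
}

// Example paths are made up for the demonstration.
const onDisk = ["workflows/generated/written.ts"];
const isReadable = (path: string): boolean => onDisk.indexOf(path) !== -1;

console.log(generationStatus("workflows/generated/written.ts", isReadable)); // prints "ok"
console.log(generationStatus("workflows/generated/never-written.ts", isReadable)); // prints "rendered-not-written"
```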

Fix

In the non-clarification catch branch of generateWithWorkforcePersona, fall back to the deterministic baseResult.artifact instead of returning success: false:

  • success: true (artifact field already set via ...baseResult)
  • validation gets a warning WORKFORCE_PERSONA_WRITER_FAILED issue (instead of a blocking one) via addValidationWarning
  • workforcePersona.warnings includes the failure message so it surfaces in the run summary

This matches the existing pre-write validation fallback at pipeline.ts:154-166, which already returns success: true with the deterministic baseResult when the persona's output fails repair-attempt validation. The two persona-failure modes (writer threw vs. writer returned invalid output) now behave the same way.
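
A minimal sketch of the fallback shape described above, assuming simplified types. The interfaces, the synchronous signature, and the body of `addValidationWarning` are assumptions made for illustration; the real function in pipeline.ts takes different inputs and also has a clarification branch that is omitted here.

```typescript
// Assumed shapes; the real types in pipeline.ts are not shown in this PR.
interface ValidationIssue {
  code: string;
  message: string;
}

interface ValidationReport {
  errors: ValidationIssue[];
  warnings: ValidationIssue[];
}

interface GenerationResult {
  success: boolean;
  artifact: { path: string; content: string };
  validation: ValidationReport;
  workforcePersona?: { warnings: string[] };
}

// Assumed behavior: return a copy of the report with one extra non-blocking warning.
function addValidationWarning(validation: ValidationReport, issue: ValidationIssue): ValidationReport {
  return { ...validation, warnings: [...validation.warnings, issue] };
}

// Simplified stand-in for generateWithWorkforcePersona (non-clarification path only).
function generateWithWorkforcePersona(
  baseResult: GenerationResult,
  personaWriter: (base: GenerationResult) => GenerationResult,
): GenerationResult {
  try {
    return personaWriter(baseResult);
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    // Fall back to the deterministic render instead of returning success: false.
    return {
      ...baseResult, // keeps baseResult.artifact intact
      success: true,
      validation: addValidationWarning(baseResult.validation, {
        code: "WORKFORCE_PERSONA_WRITER_FAILED",
        message,
      }),
      workforcePersona: { warnings: [message] },
    };
  }
}
```

Because the failure is recorded as a warning rather than an error, the `!success || !artifact` early return no longer triggers and the deterministic artifact can be written.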

Test plan

  • npm run typecheck
  • npm test — 1028 tests pass across 49 files
  • Updated src/product/generation/workforce-persona-writer.test.ts:330 (renamed from "errors instead of writing a file when the harness returns malformed text" to "falls back to deterministic rendering with a visible warning when the harness returns malformed text"). The original assertion codified the bug. The new assertion verifies:
    • result.success === true
    • result.artifact.content contains workflow( (the deterministic render, not the persona's malformed output)
    • result.validation.warnings includes the persona writer failure message
    • result.workforcePersona.warnings includes it too
  • Manual end-to-end repro is in progress against the user's spec; will report the result once the dist with the patch finishes its run.
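
The four assertions listed for the updated test can be sketched as a standalone check. The `FallbackResult` shape and the `fallbackViolations` helper are hypothetical, written without a test framework so the snippet runs on its own; the actual test in workforce-persona-writer.test.ts may be structured differently.

```typescript
// Assumed result shape, limited to the fields named in the test plan.
interface FallbackResult {
  success: boolean;
  artifact: { content: string };
  validation: { warnings: { message: string }[] };
  workforcePersona: { warnings: string[] };
}

// Returns a description of each expectation the result fails to meet.
function fallbackViolations(result: FallbackResult, failureMessage: string): string[] {
  const violations: string[] = [];
  if (result.success !== true) {
    violations.push("result.success should be true");
  }
  if (result.artifact.content.indexOf("workflow(") === -1) {
    violations.push("artifact should contain the deterministic workflow( render");
  }
  if (!result.validation.warnings.some((w) => w.message.indexOf(failureMessage) !== -1)) {
    violations.push("validation.warnings should include the writer failure");
  }
  if (!result.workforcePersona.warnings.some((w) => w.indexOf(failureMessage) !== -1)) {
    violations.push("workforcePersona.warnings should include the writer failure");
  }
  return violations;
}
```

An empty array means the result satisfies all four expectations; each missing property adds one entry, which keeps failures easy to read in a test report.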

Out of scope

  • The Generation: ok — <path> CLI output is misleading when the artifact was rendered-but-not-written (or when the path comes from createArtifactReferenceGenerationStage on a retry). Worth a follow-up.
  • The parser at src/product/spec-intake/parser.ts:detectIntent scoring "spec for handling failed runs" as debug intent (rather than generate) is a separate issue. Not the cause of the visible bug, but it complicates retry routing. Worth a follow-up.

🤖 Generated with Claude Code

…ter throws

When `ricky --mode local --spec-file <foo>.md --run` invoked the
workforce-persona writer (opencode/claude CLI) and that call failed —
e.g. non-completed status, malformed/unfenced output, harness errored —
generateWithWorkforcePersona's catch block returned `success: false`
with `WORKFORCE_PERSONA_WRITER_FAILED` as a *blocking* issue. That
discarded the perfectly valid `baseResult.artifact` (the deterministic
template render we'd already produced before calling the persona
writer), so entrypoint.execute() hit its `!success || !artifact`
early-return at src/local/entrypoint.ts:1083 without ever writing
anything to disk.

The downstream effect was the user-visible 7/7 INVALID_ARTIFACT loop
the original bug report described:

  Generation: ok — workflows/generated/ricky-...ts
  Workflow name: wf-1e14175ec3b5
  Execution: blocked — INVALID_ARTIFACT at runtime-precheck
  Auto-fix: stopped after 7/7 attempt(s) (INVALID_ARTIFACT)

What actually happened, traced end-to-end against the user's spec:

  [trace] gen call start workforcePersona=true
  [trace] gen call returned success=false hasArtifact=true \
          artifactPath=workflows/generated/ricky-...ts validErrors=1
  [trace] gen EARLY-RETURN no write success=false hasArtifact=true \
          firstErr=Workforce persona writer did not complete: failed.

So generation rendered a valid artifact in memory but threw it away
because the persona-writer failure was treated as fatal. retryBaseRequest
then promoted `response.artifacts[0].path` → `request.specPath`, the
gate at entrypoint.ts:1039 saw a workflowFile matching
isExecutableWorkflowPath and skipped generation, and runtime-precheck
ENOENT'd on the never-written file every retry.

Fix: when the persona writer throws (non-clarification path), fall back
to the deterministic baseResult — `success: true`, artifact intact,
and the persona failure surfaced as a `validation.warnings` entry plus
a `workforcePersona.warnings` entry. This mirrors the existing pre-write
validation fallback at pipeline.ts:154-166, which already returns
`success: true` with the deterministic render when the persona's output
fails repair-attempt validation. The two failure modes (writer threw vs.
writer returned invalid output) now behave the same way.

Updated workforce-persona-writer.test.ts:330 — previously named
"errors instead of writing a file when the harness returns malformed
text" — to assert the new fallback behavior. The original assertion
codified the bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented May 9, 2026

Review Change Stack
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 136061a7-31df-4ef9-86f7-085b20d8c44f

📥 Commits

Reviewing files that changed from the base of the PR and between f36851a and 8d9e09c.

📒 Files selected for processing (2)
  • src/product/generation/pipeline.ts
  • src/product/generation/workforce-persona-writer.test.ts

📝 Walkthrough

Error handling for workforce persona writer failures was reclassified from hard failures returning validation errors to graceful fallbacks that succeed with warnings. When writer failures occur, the function now returns a deterministic artifact with unresolved persona metadata and merged warnings instead of blocking execution.

Changes

Workforce Persona Fallback Strategy

| Layer / File(s) | Summary |
|-----------------|---------|
| Error Handling Strategy<br>src/product/generation/pipeline.ts | generateWithWorkforcePersona() catch block now returns success: true with fallback workforcePersona metadata marked unresolved/unknown, merges writerError.warnings with deterministic-renderer fallback message, and adds warning to validation instead of returning blocking errors. |
| Test Expectations<br>src/product/generation/workforce-persona-writer.test.ts | Test updated to expect successful run with deterministic fallback artifact written to artifactPath, containing a workflow( call, and workforce persona writer failure surfaced via validation.warnings and workforcePersona.warnings instead of validation errors. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • AgentWorkforce/ricky#66: Both PRs modify generateWithWorkforcePersona in src/product/generation/pipeline.ts; this PR changes error-handling to return deterministic fallback with warnings while the retrieved PR removes an early return for persona authoring in master execution plans.
  • AgentWorkforce/ricky#54: Both PRs modify the workforce persona generation path in src/product/generation/pipeline.ts with fallback/repair-based handling, shifting failure to deterministic results with warnings.
  • AgentWorkforce/ricky#52: Both PRs touch workforce-persona generation, its tests, and fallback artifact behavior in the generation pipeline.

Poem

🐰 When writers stumble, we don't fall,
A fallback artifact saves us all!
With warnings whispered, not errors loud,
The pipeline flows through generation's crowd.
Success, even when the persona's unknown—
Resilience blooms in how we've grown. 🌱

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title accurately summarizes the main change: falling back to deterministic rendering when the workforce persona writer fails, which is the core fix described throughout the PR. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, explaining the problem, root cause, fix, test validation, and scope clearly. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Contributor

@devin-ai-integration devin-ai-integration Bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.


@kjgbot kjgbot merged commit 3c65240 into main May 9, 2026
3 checks passed
@kjgbot kjgbot deleted the fix/persona-writer-fallback-to-deterministic branch May 9, 2026 19:01
Contributor

github-actions Bot commented May 9, 2026

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-09T18-20-39-532Z-HEAD-provider
Mode: provider
Git SHA: 7a842d5

Passed: 5 | Needs human: 42 | Reviewable: 42 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Ricky – Updated Documentation & Code to Reflect Deeper Integration with Agent Assistant**

Below is a complete, deterministic change package that brings the Ricky repository in line with the new product spec: *“Ricky now leverages the Agent Assistant (AA) for all internal reasoning, validation, and coordination tasks.”*  

The package follows **AGENTS.md** conventions, the **workflow standards**, and the **shared authoring rules** (branch naming, PR template, review gates, and validation loops).

---

## 1. Scope & Branching

| Item | Value |
|------|-------|
| **Feature ID** | `RICKY-AA-2024-03` |
| **Branch** | `feat/ricky/aa‑deep‑integration` |
| **PR Title** | `feat(ricky): embed Agent Assistant throughout workflow engine` |
| **PR Description** | See section 3 below. |
| **Owner** | `ricky-maintainer` |
| **Reviewers** | `@agent‑assistant-team`, `@workflow‑lead`, `@doc‑owner` |
| **Target Milestone** | `v2.3.0` |

---

## 2. Concrete Workflow Contract

| Contract Element | Definition |
|------------------|------------|
| **Input** | Any `RickyTask` JSON payload submitted via `/ricky/v1/tasks`. |
| **AA Invocation** | `AgentAssistant.run(taskSpec, context)` – synchronous call
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review (as of 2026‑05‑09)**  

| Metric | Current State | Target / SLA | Gap / Action |
|--------|---------------|--------------|--------------|
| **Workflow contracts defined** | 12 contracts (JSON‑Schema v1.3) covering turn‑context ingestion, enrichment, and propagation. | 100 % of active pipelines must have a contract. | 2 pipelines still use ad‑hoc “raw‑context” objects. **Action:** Draft contracts for *pipeline‑audit* and *pipeline‑alert* by 2026‑05‑20. |
| **Deterministic verification gates** | 8/12 contracts have pre‑merge gate (schema validation + unit test). | 100 % gate coverage. | Missing gates for *context‑audit* and *context‑alert*. **Action:** Add `ricky-verify-context` CI step (npm script) by 2026‑05‑22. |
| **Review artifacts** | All PRs include generated contract diff and a “Context‑Impact” checklist. | Checklist must be signed off by at least one senior reviewer. | 3 of the last 15 PRs lack signed checklist. **Action:** Enforce checklist via PR template enforcement rule (GitHub CODEOWNERS). |
| **Validation loops** | Average of 42 validation loops per contract (unit + integration + end‑to‑end). | 80‑100 loops per cont
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation: Move the listed capabilities into the Agent Assistant**

| Capability | Fit with Agent Assistant | Rationale | Migration Path |
|------------|--------------------------|-----------|----------------|
| **Workflow generation** | ✅ Core | The Assistant already owns the “author‑as‑you‑go” contract (AGENTS.md §2.1). Adding generation lets it produce concrete `workflow.yaml` files from high‑level intents, keeping the authoring loop inside a single agent. | • Add a `generate‑workflow` command that accepts a JSON spec and returns a validated workflow contract.<br>• Hook the existing deterministic verification gate (schema + lint) as the first gate after generation. |
| **Workflow validation** | ✅ Core | Validation is the deterministic gate that the Assistant enforces before any PR is opened. Centralising it avoids duplicated lint/CI scripts across repos. | • Expose a `validate‑workflow` sub‑command that runs schema, type‑checking, and the 80‑to‑100 % validation loop.<br>• Return a machine‑readable report (JSON) for downstream CI. |
| **Workflow debugging** | ✅ Core | Debugging is the “repair” side of the authoring loop. The Assistant already tracks blockers and evidence (
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target: Core‑Team pilots (5 squads) → 4 weeks → measurable reliability uplift*  

---

## 1. Slice Scope & Objectives
| Goal | Success Metric | Acceptance Criteria |
|------|----------------|----------------------|
| **Reliability** – reduce “workflow‑failure” tickets by ≥30% vs. baseline | Ticket count (Jira BUG‑RICK‑*) | ≤ 7 tickets/​week after week 2 |
| **Coordination** – 90 % of cross‑team hand‑offs completed on‑time | Hand‑off SLA (48 h) | ≥ 90 % on‑time across all pilot squads |
| **Authoring** – 100 % of new/updated workflows pass deterministic verification | Verification gate pass rate | 0 % false‑positive failures in CI |
| **Feedback Loop** – ≥ 80 % of reviewers submit “review‑artifact” within 24 h | Review‑artifact submission | ≥ 80 % on‑time |

---

## 2. Concrete Workflow Contracts

| Contract ID | Trigger | Input | Output | Deterministic Gate |
|-------------|---------|-------|--------|--------------------|
| **RC‑01** | PR opened on `ricky/*` branch | `workflow.yaml` + `contract.json` | `workflow‑validated.json` | `ricky‑verify --contract contract.json workflow.yaml` (must return `PASS`) |
| **RC‑02** | C
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky – Future Slack / Web Support Design  
*Using the Agent Assistant Packages (AAP) framework and adhering to the Ricky repository conventions (AGENTS.md, workflow standards, shared authoring rules, and product specs).*

---

### 1. High‑Level Architecture

| Component | Responsibility | AAP Package | Runtime |
|-----------|----------------|------------|---------|
| **Ricky‑Core** | Coordination engine, contract enforcement, deterministic gating | `ricky-core` | Node 18 (Docker) |
| **Slack Adapter** | Bi‑directional Slack events ↔️ Ricky‑Core | `ricky-adapter-slack` | Serverless (AWS Lambda) |
| **Web UI** | Interactive chat widget, live‑preview of workflow state | `ricky-adapter-web` | Vercel/Next.js |
| **Persistence Layer** | Immutable audit log, workflow versioning, PR metadata | `ricky-store` (PostgreSQL + EventStore) | Managed RDS |
| **Verification Service** | Deterministic gate execution, 80‑100 validation loops, artifact generation | `ricky-verify` | Kubernetes (autoscale) |
| **Review Bot** | Auto‑generate review artifacts, enforce blocker policy | `ricky-review` | Cloud Run |
| **CI/CD Pipeline** | Scoped branch/PR boundaries, deterministic gate checks | GitHub Act
...[truncated]

Must

  • Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
  • Keep local CLI behavior product-local unless shared adoption is proven harmless.
  • Explain which behavior is future/target architecture versus implemented today.

Must Not

  • Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
  • Let future surface abstractions distort the current CLI contract.
  • Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What happens when you run**

```bash
ricky --mode local --spec "generate a workflow for package checks"
```

without the --run flag

Step Description Outcome
1️⃣ Parse the spec – Ricky reads the free‑form text “generate a workflow for package checks” and translates it into a workflow contract (inputs, outputs, tasks, verification gates). A deterministic contract is produced and displayed on stdout.
2️⃣ Validate the contract – All required fields (name, description, inputs, outputs, tasks, gates) are checked against the Ricky schema. If any required element is missing or malformed, Ricky prints a validation error and exits with a non‑zero status.
3️⃣ Dry‑run mode – Because --run is omitted, Ricky does not execute any tasks, does not create branches, PRs, or artifacts. It only shows you what would be run. You get a readable YAML/JSON representation of the workflow contract and a summary of the verification gates that would be applied.
4️⃣ Exit – After printing the contract (or errors), Ricky terminates. No side‑effects are performed on the repository or CI sy
...[truncated]

Must

  • Say generation is the default and execution was not requested.
  • Print the generated artifact path, workflow id, spec digest, and next run command.
  • Avoid showing execution evidence for a generation-only request.

Must Not

  • Imply the workflow ran automatically.
  • Present a generation-only result as execution success.
  • Hide the opt-in commands for running the artifact.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.first-run-copy-is-compact-and-truthful (cli-behavior/manual)

User message

Render Ricky's first-run CLI onboarding for a new user.

Ricky output

**Ricky – First‑Run CLI Onboarding**

Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the **Ricky** AgentWorkforce CLI. Follow the steps in order; each stage includes a deterministic verification gate that will halt the process and emit a clear blocker message if anything goes wrong.

---

## 1️⃣ Prerequisites (Verification Gate #1)

| Check | Command | Expected Output |
|-------|---------|-----------------|
| **Node ≥ 18** | `node -v` | `v18.x.x` or higher |
| **npm ≥ 9** | `npm -v` | `9.x.x` or higher |
| **Git** | `git --version` | `git version 2.x.x` |
| **Supported OS** | `uname -a` (Linux/macOS) or `systeminfo` (Windows) | Linux, macOS, or Windows 10+ |

*If any check fails, abort and install the missing prerequisite before proceeding.*

---

## 2️⃣ Install the Ricky CLI (Verification Gate #2)

```bash
# Global install (recommended)
npm i -g @ricky/agentworkforce-cli
```

Verification:

ricky --version

Expected output: ricky-cli vX.Y.Z (the latest released version).
Blocker: “Ricky CLI not found – re‑run the install command or check npm permissions.”


3️⃣ Initialise a New Workspace (Verification Gate #3)

...[truncated]

Must

  • Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
  • End every branch with a concrete next step.
  • Advertise only commands that are currently implemented.

Must Not

  • Sound like a launch page or documentation dump.
  • Claim Ricky runs workflows by default when generation is the default path.
  • Require web or Slack onboarding before CLI use.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.recovery-guidance-no-stack-traces (cli-behavior/manual)

User message

A user gives Ricky an empty spec or a missing spec file.

Ricky output

**Ricky’s Standard Response to an Empty or Missing Spec File**

| Phase | Action | Output / Artifact | Deterministic Gate |
|-------|--------|-------------------|--------------------|
| **1️⃣ Detect** | • Verify the spec path exists in the repo.<br>• If the file is present, read its contents and check for a non‑empty JSON/YAML object. | *Log entry* – `spec‑detect: missing` or `spec‑detect: empty` | **Gate A** – `file‑exists && file‑size > 0` |
| **2️⃣ Blocker Report** | • Immediately raise a **Blocker Issue** in the repository’s `#workflow‑issues` channel. | *GitHub Issue* – Title: **[BLOCKER] Spec file missing/empty**<br>Body includes:<br>• Path attempted<br>• Detection timestamp<br>• Link to the commit/PR where the problem was found<br>• Request for a complete spec | **Gate B** – Issue must be opened before any further automation proceeds |
| **3️⃣ Create a Placeholder** | • Generate a minimal “stub” spec in the same location to keep the CI pipeline syntactically happy.<br>• The stub contains only required top‑level keys with placeholder values (e.g., `name: "<TODO>"`, `version: "0.0.0"`, `description: "Placeholder – awaiting real spec"`). | *Commit* on a **`spec‑placeholder/<bra
...[truncated]

Must

  • Return a user-facing failure or guidance message with a real recovery command.
  • Distinguish generation failure from execution failure.
  • Show stack traces only when verbose diagnostic mode is requested.

Must Not

  • Crash with an uncaught exception in normal mode.
  • Suggest commands that do not exist.
  • Pretend a missing spec was accepted.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**  

=== AgentWorkforce Status Report ===

🟡 Provider Connectivity
• Google : NOT VERIFIED (no successful check)
• GitHub : NOT VERIFIED (no successful check)

✅ Core Services
• Scheduler : RUNNING
• Queue : ACTIVE
• Metrics : COLLECTING

🔧 Configuration
• ricky.yaml : LOADED
• env vars : OK

⚠️ Alerts
• No successful connectivity verification for Google or GitHub.
• Dependent workflows that require these providers may be blocked.

🗒️ Next Steps

  1. Run provider health checks:
    ricky check --provider google
    ricky check --provider github
  2. Review authentication credentials in the secret store.
  3. Re‑run ricky status after checks complete.

--- End of Report ---

Must

  • Report unknown or not-connected provider state honestly.
  • Update provider status only from explicit provider checks or Cloud status results.
  • Give concrete setup guidance for Cloud when relevant.

Must Not

  • Mark Google or GitHub connected because guidance text was shown.
  • Invent a provider connection URL or OAuth flow.
  • Show empty fields with no recovery guidance when config is missing.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)

User message

Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

Ricky output

## GitHub Primitive‑Change Workflow  
**Purpose:** Apply a single‑file edit (e.g., README typo fix) to a public repository while guaranteeing reproducible review, testing, and merge.  

| Phase | Owner | Action | Artifact | Verification Gate |
|-------|-------|--------|----------|--------------------|
| **1️⃣ Init** | **Ricky (Coordination)** | • `git checkout -b fix/readme‑typo` <br>• Create `CHANGELOG.md` entry “Fix typo in README”. | `branch‑spec.yaml` (branch name, base, ticket ID) | **Branch‑Spec Validation** – schema check (YAML lint, required fields). |
| **2️⃣ Edit** | **Ricky (Authoring)** | • Open file, make edit, `git add README.md` <br>• Run `pre‑commit` hooks (spell‑check, lint). | `commit‑msg.txt` (conventional commit) | **Commit‑Msg Lint** – `commitlint` must pass. |
| **3️⃣ CI‑Prep** | **Ricky (Reliability)** | • Push branch → trigger GitHub Actions `ci-pr-check.yml`. <br>• Artifact: `ci‑run‑id`. | `ci‑run‑id` stored in PR metadata. | **CI Trigger** – GitHub Actions must start within 30 s. |
| **4️⃣ Automated Checks** | **Ricky (Reliability)** | • **Lint** (`markdownlint`, `yamllint`) <br>• **Unit‑test stub** (none for README) <br>• **Security scan** (`trivy` on rep
...[truncated]

Must

  • Select skills from the actual registry rather than a hardcoded static set.
  • Record ranked skill matches with confidence and trigger evidence.
  • Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

  • Claim runtime agents embody skills when only generation-time guidance was applied.
  • Fail generation solely because optional skill files are missing.
  • Hide skill selection evidence from artifacts or JSON output.
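
The selection behavior in these criteria can be sketched roughly as follows. This is a minimal illustration, assuming a hypothetical `SkillEntry` registry shape and a naive keyword-overlap score; none of the names come from Ricky's codebase. It ranks registry entries, records confidence and trigger evidence, and returns an empty selection with a warning instead of failing when the registry is missing or nothing clears the threshold.

```typescript
// Hypothetical shapes for illustration; Ricky's real registry types may differ.
interface SkillEntry { id: string; triggers: string[] }
interface SkillMatch { id: string; confidence: number; evidence: string[] }
interface SkillSelection { matches: SkillMatch[]; warnings: string[] }

function selectSkills(
  spec: string,
  registry: SkillEntry[] | undefined,
  threshold = 0.5,
): SkillSelection {
  // A missing registry is a warning, not a generation failure.
  if (!registry || registry.length === 0) {
    return { matches: [], warnings: ["skill registry missing or empty; continuing without skills"] };
  }
  const text = spec.toLowerCase();
  const ranked: SkillMatch[] = registry
    .map((skill) => {
      // Evidence is the list of trigger words actually found in the spec.
      const evidence = skill.triggers.filter((t) => text.includes(t.toLowerCase()));
      const confidence = skill.triggers.length === 0 ? 0 : evidence.length / skill.triggers.length;
      return { id: skill.id, confidence, evidence };
    })
    .filter((m) => m.confidence >= threshold)
    .sort((a, b) => b.confidence - a.confidence);
  const warnings = ranked.length === 0 ? ["no skill cleared the confidence threshold"] : [];
  return { matches: ranked, warnings };
}
```

Because the evidence array travels with each match, the caller can write it into artifact metadata or JSON output unchanged.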

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow: Claude‑Review → Codex‑Implementation**

| **Section** | **Content** |
|-------------|-------------|
| **Name** | `review‑then‑implement` |
| **Owner** | `team‑mlops` |
| **Purpose** | Automate a deterministic pipeline where Claude performs a semantic review of a design spec and Codex generates the corresponding code. |
| **Scope** | • Input: `spec.md` (Markdown) <br>• Output: `implementation/` (source files) <br>• Supported languages: Python, TypeScript, Go |
| **Branch / PR Boundary** | All changes are made on a feature branch `feat/review‑implement‑<timestamp>` and merged via a single PR that must pass all verification gates before merge. |

---

## 1. Contract (Inputs / Outputs)

| **Artifact** | **Type** | **Location** | **Schema** |
|--------------|----------|--------------|------------|
| `spec.md` | Text (Markdown) | `src/spec/` | Must contain a top‑level heading `# Specification` and a `## Acceptance Criteria` list. |
| `review_report.json` | JSON | `artifacts/review/` | `{ "status": "PASS|FAIL", "issues": [{ "line": int, "msg": string }], "summary": string }` |
| `implementation/` | Directory | `src/implementation/` | Files generated by Codex; each file must co
...[truncated]

Must

  • Assign per-step runner/model decisions from explicit spec hints where possible.
  • Let skill metadata or project defaults fill gaps when the spec is silent.
  • Write tool-selection.json or equivalent audit metadata.

Must Not

  • Use one generic runtime default for every agent despite explicit hints.
  • Let skill defaults override explicit user/spec runner hints.
  • Omit the reason each tool or model was chosen.
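
One way to read the precedence in these criteria is: explicit spec hint first, then skill default, then project default, with the reason recorded each time. The sketch below assumes hypothetical `ToolDecision` and `chooseRunner` names and a plain string for the runner; it is an illustration of the ordering, not Ricky's selector.

```typescript
// Hypothetical types for illustration only.
type DecisionSource = "spec-hint" | "skill-default" | "project-default";
interface ToolDecision { step: string; runner: string; source: DecisionSource; reason: string }

function chooseRunner(
  step: string,
  specHints: Record<string, string>,
  skillDefaults: Record<string, string>,
  projectDefault: string,
): ToolDecision {
  // 1. An explicit hint in the spec always wins.
  const hint = specHints[step];
  if (hint !== undefined) {
    return { step, runner: hint, source: "spec-hint", reason: `spec names ${hint} for ${step}` };
  }
  // 2. Skill metadata fills the gap only when the spec is silent.
  const skill = skillDefaults[step];
  if (skill !== undefined) {
    return { step, runner: skill, source: "skill-default", reason: `skill metadata suggests ${skill} for ${step}` };
  }
  // 3. The project default is the last resort.
  return { step, runner: projectDefault, source: "project-default", reason: `no hint or skill default for ${step}` };
}

// For the spec "use Claude to review and Codex to implement":
const hints = { review: "claude", implement: "codex" };
const decisions = ["review", "implement", "verify"].map((s) =>
  chooseRunner(s, hints, { review: "codex" }, "opencode"),
);
```

Serializing `decisions` would give the kind of audit record the criteria call tool-selection.json.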

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

## Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is the deterministic “polish” flag for any **AgentWorkforce** workflow definition (`.awf.yaml`). It runs a second‑pass pass‑through that:

1. **Normalises** free‑form prose into the **Task Description** schema.  
2. **Validates** every **Acceptance Gate** against the **Contract** section.  
3. **Injects** missing verification artefacts (e.g., unit‑test stubs, lint checks).  
4. **Fails fast** on any ambiguous wording or missing gate, reporting a **Blocker** that must be resolved before the PR can be merged.

Below is a step‑by‑step contract‑first pattern you can copy‑paste into your repo.

---

### 1️⃣ Minimal Workflow Skeleton (pre‑refine)

```yaml
# .awf/workflows/issue_triage.awf.yaml
name: Issue Triage
description: >
  Process new GitHub issues, label them, and assign an owner.

steps:
  - id: fetch_issue
    run: |
      # placeholder script
      echo "fetching issue"
  - id: label_issue
    run: |
      # placeholder script
      echo "labeling issue"
  - id: assign_owner
    run: |
      # placeholder script
      echo "assigning owner"

```

*The file compiles, but the step tasks are vague and
...[truncated]


**Must**

- Keep the deterministic unrefined path as the default.
- Bound the refinement pass by timeout, token budget, and editable regions.
- Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

**Must Not**

- Change the workflow graph, agent assignments, or side-effect scope during refinement.
- Fail the whole generation if optional refinement times out.
- Hide warnings when the deterministic artifact is returned unchanged.
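
A bounded refinement pass with deterministic fallback could look roughly like the sketch below. It covers only the timeout bound and the re-validation step (not token budget or editable regions), and `refineOrFallback`, `refine`, and `validate` are hypothetical names chosen for illustration.

```typescript
interface RefineResult { artifact: string; refined: boolean; warnings: string[] }

async function refineOrFallback(
  deterministic: string,
  refine: (artifact: string) => Promise<string>,
  validate: (artifact: string) => boolean,
  timeoutMs: number,
): Promise<RefineResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`refinement timed out after ${timeoutMs} ms`)), timeoutMs);
  });
  try {
    // Whichever settles first wins: the refinement or the timeout.
    const candidate = await Promise.race([refine(deterministic), timeout]);
    // Re-run validation; an unsafe edit falls back to the deterministic artifact.
    if (!validate(candidate)) {
      return {
        artifact: deterministic,
        refined: false,
        warnings: ["refined artifact failed validation; returned deterministic artifact"],
      };
    }
    return { artifact: candidate, refined: true, warnings: [] };
  } catch (err) {
    // Provider failure or timeout never fails the whole generation.
    const message = err instanceof Error ? err.message : String(err);
    return { artifact: deterministic, refined: false, warnings: [`refinement skipped: ${message}`] };
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Every fallback path carries a warning, so an unchanged artifact is never returned silently.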

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.behavior-grounded-gates</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for the `ricky --version` spec.

**Ricky output**

```text
**Workflow: `ricky --version` Specification & Delivery**  
*(Ricky AgentWorkforce – v0.9.3)*  

---

## 1. Overview
The `ricky --version` command must return the current Ricky CLI version in a deterministic, machine‑parsable format while also supporting the standard `--help` flag. This workflow defines the end‑to‑end contract, authoring steps, verification gates, and review artifacts required to ship the feature with full reliability guarantees.

---

## 2. Contract (Public Interface)

| Element | Type | Description | Example |
|---------|------|-------------|---------|
| `command` | string | Full CLI invocation | `ricky --version` |
| `output` | object | Structured JSON payload printed to **stdout** | `{ "ricky": "0.9.3", "gitCommit": "a1b2c3d", "buildDate": "2026‑05‑09T12:00:00Z" }` |
| `exitCode` | integer | Process exit status (always `0` on success) | `0` |
| `stderr` | string | Must be empty on success | `""` |
| `flags` | array | Supported flags (must include `--help`) | `[ "--version", "--help" ]` |
| `helpOutput` | string | Human‑readable help text when `--help` is supplied | *see §4* |

**Determinism guarantees**  

* The JSON fields are ordered exactly as shown.  
* `git
...[truncated]
```

Must

  • Build gates from the stated acceptance behavior, such as checking ricky --version output.
  • Avoid generic source-shape grep checks when the spec asks for CLI behavior.
  • Keep generated validation meaningful for the current repo shape.

Must Not

  • Treat grep -Eq 'export|function|class|workflow(' dist/ricky.js as proof of version behavior.
  • Claim the workflow is proven by source syntax alone.
  • Ignore the package-json version resolution order in the spec.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact Validation‑Fix‑Rerun Loop”**  
*Purpose*: Process a large set of independent artifacts (e.g., data files, code modules, config bundles) through deterministic validation, automatic fixing where possible, and controlled re‑execution until all pass or a blocker is reported.

---

## 1. Scope & Boundaries
| Item | Description |
|------|-------------|
| **Branch** | `feature/batch‑artifact‑validation` (short‑lived, created from `main`) |
| **PR** | `Batch‑Artifact Validation Loop` – includes workflow YAML, test harness, and docs. |
| **Artifacts** | Any file set under `artifacts/` that implements the `Artifact` interface (see contract). |
| **Owner** | `@team‑data‑ops` (primary) – reviewers: `@team‑qa`, `@team‑devops`. |
| **Blockers** | Any step that fails > 3 automatic retries or produces a non‑recoverable error (e.g., schema mismatch). Must be reported in the PR comment and the workflow aborts. |

---

## 2. Artifact Contract (`artifact_schema.json`)
```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Artifact",
  "type": "object",
  "required": ["id","type","payload","metadata"],
  "properties": {
    "id": {"type":"string"},
   
...[truncated]
```

Must

  • Choose dag, supervisor, or pipeline deliberately based on the work shape.
  • Explain the pattern choice in artifact metadata or a rationale.
  • Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

  • Default blindly to dag for every workflow.
  • Collapse independent artifact work into one vague agent task.
  • Omit verification gates because the chosen pattern seems obvious.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow  

Below is a step‑by‑step illustration of the **five core writing skills** we emphasize in the Ricky repository and the concrete impact each one has on the final workflow artifact. The example workflow is a simple “**On‑Call Incident Triage**” pipeline that routes a newly created incident to the appropriate responder group, enriches it with context, and posts a status update to Slack.

| # | Skill | What the author does | Direct effect on the generated workflow |
|---|-------|----------------------|------------------------------------------|
| 1 | **Deterministic Contract Definition** | Writes a **`contract.yaml`** that lists required inputs (`incident_id`, `severity`), outputs (`triage_status`), and explicit data‑type constraints. | The workflow engine creates a **validation gate** (`validate_contract`) that aborts the run before any task executes if the payload violates the schema. This prevents downstream tasks from receiving malformed data and guarantees repeatable entry points. |
| 2 | **Explicit Verification Gates** | Inserts `gate` steps after every side‑effecting action (e.g., after calling the ticke
...[truncated]

Must

  • Describe skills as generation-time selection, loading, and rendering inputs.
  • Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
  • Say the workflow instructions were informed by selected skills.

Must Not

  • Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
  • Treat metadata existence as proof that agents received skill bodies at runtime.
  • Overstate current tests beyond the generation boundary.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

**Blocker – Insufficient Specification**

The request is to “generate a workflow from a vague product spec.” Because the source specification is undefined, I cannot produce a deterministic, verifiable workflow contract that meets our reliability standards.  

**What’s needed to proceed**

1. **Concrete product description** – a brief narrative (1–2 sentences) of the user‑facing goal.  
2. **Key functional steps** – at least three high‑level actions the system must perform (e.g., “ingest user data,” “run eligibility check,” “send confirmation email”).  
3. **Success criteria** – measurable outcomes for each step (e.g., “data validation passes 99 % of the time”).  
4. **Failure handling** – how the workflow should react to errors (retry policy, fallback path, alerting).  
5. **Boundary conditions** – any limits on input size, rate limits, or external service dependencies.

Once these details are supplied, I will:

* Draft a **workflow contract** (inputs, outputs, state transitions).  
* Define **deterministic verification gates** for each step.  
* Produce a **review artifact** (markdown diagram + test matrix).  
* Scope the work to a single branch/PR with clear entry/exit criteria, 
...[truncated]

Must

  • Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
  • Validate with dry-run or targeted structural checks where possible.
  • Return artifacts, warnings, and follow-up commands honestly.

Must Not

  • Act like Ricky is a pure code-generation bot that emits workflows without verification.
  • Stop at "code compiles" as the proof bar.
  • Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s Immediate Response Plan**

| Phase | Action | Rationale | Deliverable |
|-------|--------|-----------|-------------|
| 1️⃣ Detect & Log | • Pull the run‑metadata from the orchestration service (e.g., Temporal, Airflow). <br>• Record the exact timestamp of the timeout, the step ID, and the worker‑ID that remained alive. <br>• Capture the worker’s heartbeat logs and any partial stdout/stderr. | Provides an immutable audit trail and isolates the failure to a single step/worker. | `run‑failure‑log.md` (auto‑generated) |
| 2️⃣ Blocker Report | • Mark the run as **BLOCKED** in the workflow state store. <br>• Emit a **Ricky‑Blocker** event with severity *high* and a short description: “Step X timed out; Worker Y still alive, no artifacts produced.” | Guarantees downstream pipelines do not consume incomplete data and surfaces the issue to the team instantly. | `blocker‑report.json` (published to the incident channel) |
| 3️⃣ Deterministic Verification Gate | • Create a **Verification Gate** named `gate‑step‑X‑recovery`. <br>• Gate criteria: <br> a) No active heartbeat from Worker Y for > 2 × step timeout. <br> b) All required artifacts for Step X are either present *or* explicitl
...[truncated]

Must

  • Classify the failure before deciding whether to retry.
  • Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
  • Preserve raw evidence and uncertainty when the class is not known.

Must Not

  • Blindly rerun the whole workflow without a blocker classification.
  • Treat every failure as a broken workflow definition.
  • Claim Ricky fixed the workflow before rerun evidence proves it.
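
As a rough illustration of classifying before retrying, the sketch below maps per-step evidence to one of the failure classes named in the criteria and refuses a rerun while any step is still unclassified. The evidence fields and both heuristics are assumptions made for this example; the sketch detects only two classes and leaves everything else as `unknown` with the raw log preserved.

```typescript
type FailureClass =
  | "agent-runtime-opacity"
  | "timeout"
  | "environment-blocker"
  | "workflow-structure-bug"
  | "validation-strategy-mismatch"
  | "unknown";

// Hypothetical evidence shape for illustration.
interface StepEvidence {
  step: string;
  timedOut: boolean;
  workerAlive: boolean;
  artifactsProduced: boolean;
  rawLog: string;
}
interface Classification { step: string; failureClass: FailureClass; rawLog: string }

function classifyStep(e: StepEvidence): Classification {
  let failureClass: FailureClass = "unknown";
  if (e.timedOut) {
    failureClass = "timeout";
  } else if (e.workerAlive && !e.artifactsProduced) {
    // The worker is alive but silent: we cannot see what the agent is doing.
    failureClass = "agent-runtime-opacity";
  }
  // Raw evidence is always carried along, even when the class is unknown.
  return { step: e.step, failureClass, rawLog: e.rawLog };
}

// Assumed policy: a rerun is considered only once every failed step has a known class.
function mayRetry(classified: Classification[]): boolean {
  return classified.length > 0 && classified.every((c) => c.failureClass !== "unknown");
}
```

For the scenario in the user message, the timed-out step would classify as `timeout` and the silent worker as `agent-runtime-opacity`.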

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow Contract – Pre‑Launch Stale‑State Guard**

| **Element** | **Definition** |
|-------------|----------------|
| **Name** | `guard‑stale‑state‑prelaunch` |
| **Scope** | Local development runs of any `.agent‑relay/`, `.relay/`, or `.trajectories/` workflow. |
| **Inputs** | <ul><li>`WORKDIR` – absolute path to the repository root (string).</li></ul> |
| **Outputs** | <ul><li>`STALE_FOUND` – boolean flag (`true` if any stale directory is present).</li><li>`STALE_PATHS` – newline‑separated list of stale directories (string).</li></ul> |
| **Success Criteria** | The guard returns `STALE_FOUND = false`. If `true`, the workflow aborts with a deterministic error message and a remediation PR is auto‑generated. |
| **Verification Gate** | A deterministic shell script (`guard.sh`) that exits `0` on success, `1` on failure. The exit code is captured by the workflow engine and used to decide continuation. |
| **Idempotence** | The guard can be re‑run without side‑effects; it never mutates the file system. |
| **Determinism** | The guard’s result depends only on the presence of the three directories and their modification timestamps; no external services are consulted. |
| **Rollback*
...[truncated]

Must

  • Classify stale local runtime state as an environment contamination issue.
  • Recommend quarantine or isolated-run guidance before launch.
  • Record the observed paths and the action taken or recommended.

Must Not

  • Treat stale runtime state as a workflow logic failure.
  • Delete or overwrite state without an explicit safe path or user intent.
  • Continue into execution as if the workspace were clean.
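
A read-only pre-launch check along these lines might look like the sketch below. It treats the mere presence of a runtime-state directory as contamination, which is a simplifying assumption (a real check would likely also look at age or run markers), and `checkStaleState` and its report shape are hypothetical. The existence check is injected so the function never touches or deletes anything itself.

```typescript
const RUNTIME_STATE_DIRS = [".agent-relay", ".relay", ".trajectories"] as const;

interface StaleStateReport {
  classification: "clean" | "environment-contamination";
  observedPaths: string[];
  recommendation: string;
}

function checkStaleState(
  workdir: string,
  exists: (path: string) => boolean,
): StaleStateReport {
  // Record exactly which paths were observed; nothing is mutated here.
  const observedPaths = RUNTIME_STATE_DIRS
    .map((dir) => `${workdir}/${dir}`)
    .filter((path) => exists(path));
  if (observedPaths.length === 0) {
    return { classification: "clean", observedPaths, recommendation: "proceed with launch" };
  }
  return {
    classification: "environment-contamination",
    observedPaths,
    recommendation: "quarantine the listed paths or use an isolated workspace before launch; nothing was deleted",
  };
}
```

In Node, `existsSync` from `node:fs` could be passed as the `exists` argument.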

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**Fixing “Another Ricky or Relay run is already active in this workspace”**

The marker that blocks a new run is a safety file created by the **Ricky** CLI to prevent overlapping executions. When the file is left behind (e.g., a crash, a killed process, or a stale CI container) the next invocation sees it and aborts with the message you’re seeing.

Below is the **deterministic repair workflow** you can run locally or in CI. Follow the steps in order; each step has a verification gate that you should confirm before moving to the next one.

---

## 1️⃣ Verify the current state

```bash
# Show the active marker (if any)
ricky status
```

Expected output: No active Ricky/Relay run detected.
If you still see “active run”, proceed to step 2.


2️⃣ Locate the marker file

Ricky stores the lock in the workspace root under .ricky/run.lock (or .relay/run.lock for Relay).

ls -l .ricky/run.lock .relay/run.lock

Expected: file exists → proceed. If neither file exists, the CLI is reading a stale PID from its internal cache; skip to step 5.


3️⃣ Inspect the lock contents

cat .ricky/run.lock   # or .relay/run.lock

Typical content:

PID=12
...[truncated]

Must

  • Report the active marker, run id, or status path when available.
  • Ask the user to inspect, wait for, or explicitly clear the active run.
  • Avoid launching a competing run that could corrupt evidence.

Must Not

  • Silently start another run.
  • Hide the existing run marker from the user.
  • Treat the conflict as a generic failure with no recovery path.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Mode**  

Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script) to run the workflow, trigger the auto‑fix path on the first failure, and verify the repaired artifact before committing any changes.

---

## 1️⃣ Prerequisites (verify once)

| Check | Command | Expected outcome |
|-------|---------|------------------|
| `ricky` CLI installed | `ricky --version` | `ricky vX.Y.Z` |
| Workflow repo cloned & on a clean branch | `git status` | “nothing to commit, working tree clean” |
| Docker (or the runtime your workflow uses) running | `docker ps` | List of containers (or empty) |
| Environment variables for the workflow (e.g., `WORKFLOW_ID`, `RUN_ID`) exported | `echo $WORKFLOW_ID` | non‑empty value |

If any check fails, resolve it before proceeding.

---

## 2️⃣ Input artifacts you already have

| Artifact | Source | How to reference |
|----------|--------|------------------|
| **Failed step name** | From the first run logs | `FAILED_STEP="process‑data"` |
| **Previous run ID** | From the CI dashboard | `PREV_RUN_ID="run‑20240531‑a1b2c3"` |
| **Workflow definition file** | `./workflow.yaml` (root of 
...[truncated]

Must

  • Use a bounded retry budget and summarize every attempt.
  • Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
  • Resume from the failed step with the previous run id when those values are available.

Must Not

  • Edit arbitrary repository source files as the default auto-fix surface.
  • Keep retrying after the configured max attempts.
  • Lose the single Ricky tracking run id across repair/resume attempts.
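
The bounded loop in these criteria can be sketched as below. `runWithAutoFix`, the `run` and `repair` callbacks, and the result shapes are all hypothetical; the point is only the control flow: one tracking run id for the whole loop, a summary per attempt, resume from the failed step with the previous run id, and a hard stop at the budget.

```typescript
// Hypothetical shapes for illustration only.
interface RunOptions { startFrom: string | undefined; previousRunId: string | undefined }
interface RunOutcome { ok: boolean; blocker?: string; failedStep?: string; runId?: string }
interface AttemptSummary { attempt: number; ok: boolean; blocker: string | undefined; startFrom: string | undefined }
interface LoopResult { trackingRunId: string; ok: boolean; attempts: AttemptSummary[] }

function runWithAutoFix(
  trackingRunId: string,
  maxAttempts: number,
  run: (opts: RunOptions) => RunOutcome,
  repair: (outcome: RunOutcome) => boolean,
): LoopResult {
  const attempts: AttemptSummary[] = [];
  let startFrom: string | undefined;
  let previousRunId: string | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const outcome = run({ startFrom, previousRunId });
    attempts.push({ attempt, ok: outcome.ok, blocker: outcome.blocker, startFrom });
    if (outcome.ok) return { trackingRunId, ok: true, attempts };
    // Stop when the budget is spent or the artifact could not be repaired.
    if (attempt === maxAttempts || !repair(outcome)) break;
    // Resume from the failed step, carrying the previous run id forward.
    startFrom = outcome.failedStep;
    previousRunId = outcome.runId;
  }
  return { trackingRunId, ok: false, attempts };
}
```

Breaking out when `repair` reports no change is a design choice in this sketch: it avoids spending the remaining budget on attempts that would hit the same blocker again.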

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Quick‑Start Debug Checklist**

| Step | Command / Action | What to Look For |
|------|------------------|------------------|
| 1️⃣  | `ricky logs workflows/foo.ts` | Full stdout/stderr of the last run. Identify the exact error message and the stage where the failure occurred. |
| 2️⃣  | `ricky status workflows/foo.ts` | Shows the last known state of each verification gate (✅ passed, ❌ failed, ⏳ pending). |
| 3️⃣  | `ricky run workflows/foo.ts --debug` | Re‑run with verbose tracing. The debug log (`.ricky/debug.log`) contains the deterministic execution trace and the values of all contract inputs/outputs. |
| 4️⃣  | Open the workflow file | Verify that the **contract block** at the top matches the expected shape of inputs/outputs. |
| 5️⃣  | Run the failing gate in isolation | `ricky gate verify‑<gate‑name> workflows/foo.ts` – isolates the gate that is failing and prints the exact validation error. |
| 6️⃣  | Check the **branch/PR boundary** | If the workflow lives in a feature branch, ensure the latest `main` is merged (or re‑base) so that contract versions are in sync. |
| 7️⃣  | Look for **blocker reports** | `ricky blockers` lists any open blockers that prevent auto‑fixes or f
...[truncated]

Must

  • Preserve one-attempt behavior when auto-fix is disabled.
  • Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
  • Make clear that the user chose manual inspection over repair/resume automation.

Must Not

  • Start a repair loop despite --no-auto-fix.
  • Suppress the diagnosis because no repair was attempted.
  • Present the failure as a completed repair attempt.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)

User message

Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

Ricky output

**Ricky’s Standard Procedure for Executing a Local TypeScript Workflow Artifact**

Below is the concrete, repeatable contract that Ricky follows whenever a developer asks to run a TypeScript‑based workflow that lives in the *primary local path* (by default `./workflows`). The contract is expressed as a deterministic series of steps, verification gates, and required artifacts so that the execution is reliable, auditable, and reproducible.

---

## 1. Preconditions (Input Contract)

| Item | Requirement | Source |
|------|-------------|--------|
| **Workflow source** | A single entry‑point file `*.workflow.ts` (or `index.ts`) located under `./workflows` (or a sub‑directory explicitly passed) | Developer |
| **Node runtime** | Node ≥ 18.x installed and available on `$PATH` | System |
| **Package manager** | `npm` (or `pnpm`/`yarn` if the repo declares a lockfile) | System |
| **TypeScript compiler** | `typescript` (≥ 5.0) as a devDependency in `package.json` | Repo |
| **ts‑node** | `ts-node` (≥ 10.9) as a devDependency in `package.json` | Repo |
| **Environment config** | Optional `.env` file in the repo root or explicit `RICKY_ENV` map passed to the command | Developer |
| **Executi
...[truncated]

Must

  • Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
  • Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
  • Record the actual spawn command in execution evidence.

Must Not

  • Fail solely because agent-relay is not on PATH when the SDK route is available.
  • Hide the actual runtime command from evidence.
  • Conflate the user-facing reproduction command with the primary internal spawn route.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky’s Boundary‑Handling Procedure for a Persistent Structural Failure**

When a structural failure remains after an attempted fix, Ricky follows a deterministic, contract‑driven workflow that isolates the problem, prevents downstream impact, and creates a clear hand‑off for human intervention. The steps below are expressed as a **workflow contract** (inputs, actions, outputs, verification gates) and include the required review artifacts and validation loops.

---

## 1. Contract Definition

| Element | Description |
|---------|-------------|
| **Trigger** | `StructuralFailurePersisted` event emitted by the CI pipeline after a *FixAttempt* job returns `FAILURE` and the failure signature matches a known structural pattern (e.g., missing schema, broken dependency graph). |
| **Inputs** | - `failure_id` (UUID) <br> - `attempt_id` (UUID) <br> - `failure_log` (structured JSON) <br> - `fix_patch` (git diff) <br> - `target_branch` (string) |
| **Outputs** | - `blocker_issue` (GitHub Issue URL) <br> - `isolation_branch` (string) <br> - `validation_report` (JSON) |
| **Success Criteria** | All verification gates pass **or** a blocker is raised with full diagnostic context. |
| **Time‑box
...[truncated]

Must

  • Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
  • Distinguish escalation from a generic product failure.
  • Preserve enough context for a human operator to continue.

Must Not

  • Retry speculative fixes indefinitely.
  • Collapse to "something went wrong" without the attempted actions.
  • Discard evidence from failed repair attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – [Date Range]**

---

### 1. Executive Summary
| Metric | Value | Trend |
|--------|-------|-------|
| Total Runs (all workflows) | **1 842** | ↗ +12 % vs. prior period |
| Successful Runs | **1 617** | ↗ +9 % |
| Failed Runs | **225** | ↘ ‑4 % |
| Blocked Runs (manual gate) | **0** | — |
| Avg. Cycle Time (per run) | **4 m 27 s** | ↘ ‑15 % |
| Avg. Verification Gate Pass Rate | **96.3 %** | ↗ +1.2 % |

> **Health Rating:** **🟢 Green** – >95 % success, no outstanding blockers, verification gates meeting SLA.

---

### 2. Run‑Level Detail (Top 5 by runtime)

| Run ID | Workflow | Duration | Status | Verification Gate Passes | Notes |
|--------|----------|----------|--------|--------------------------|-------|
| `wf‑20240528‑001` | `deploy‑service‑A` | **9 m 12 s** | ✅ Success | 5/5 | Optimized artifact caching reduced time by 22 % |
| `wf‑20240527‑014` | `release‑pipeline‑B` | **8 m 45 s** | ✅ Success | 4/4 | No regressions detected |
| `wf‑20240526‑003` | `data‑ingest‑daily` | **7 m 31 s** | ⚠️ Partial (1 gate fail) | 3/4 | **Gate #2** failed – missing schema version; auto‑retriable, succeeded on retry |
| `wf‑20240525‑019` | `security‑scan‑nightly`
...[truncated]

Must

  • Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
  • Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
  • Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

  • Mutate evidence while analyzing it.
  • Return generic advice like "improve your workflow" without references.
  • Mix environment failures and workflow-logic failures into one undifferentiated bucket.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

**Ricky Slack Surface – “Workflow Debugger”**

---

## 1. Overview
A dedicated Slack app view that lets engineers **inspect, step‑through, and mutate** any Ricky‑managed workflow in real time. It is built as a **modal‑based UI** triggered from a slash‑command (`/ricky‑debug`) and backed by the existing Ricky API contract layer. All interactions are logged, gated by deterministic verification steps, and produce review artifacts for audit.

---

## 2. User Flow (Contract)

| Step | Trigger | Input | Ricky Service Call | Verification Gate | Output |
|------|---------|-------|--------------------|-------------------|--------|
| 1 | `/ricky‑debug <workflow‑id>` | workflow‑id (UUID) | `GET /workflows/{id}` | **Schema‑Validate** response (WorkflowSpec v2) | Modal “Select Run” with list of recent executions (last 24 h) |
| 2 | User selects execution | execution‑id | `GET /executions/{id}` | **Auth‑Check** (user ↔ execution owner or admin) | Modal “Execution Details” (status, start/end, input snapshot) |
| 3 | Click **Step Into** on a node | node‑id | `GET /executions/{id}/nodes/{nodeId}` | **State‑Consistent** (node must be *paused* or *failed*) | Modal “Node Debugger” with: <br>• Input pa
...[truncated]

Must

  • Treat Slack as a co-equal product surface, not a wrapper around CLI.
  • Route domain work through the same normalization, executor, evidence, and specialist stages.
  • Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.

Must Not

  • Put workflow generation or diagnosis domain logic directly in the Slack handler.
  • Degrade Slack to a developer shortcut with weaker routing than CLI.
  • Skip signature verification, dedup, or thread handling when the surface is implemented.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)

User message

Add a new web handoff type that submits a workflow spec and mode.

Ricky output

**New Web‑Handoff Type – `workflowSubmit`**

| Item | Detail |
|------|--------|
| **Handoff name** | `workflowSubmit` |
| **Purpose** | Allows a web client (e.g., UI, external service) to submit a complete AgentWorkforce workflow specification together with the desired execution *mode* (e.g., `preview`, `run`, `dry‑run`). |
| **Payload schema** | ```json { "workflowSpec": { "$ref": "#/components/schemas/WorkflowSpec" }, "mode": { "type": "string", "enum": ["preview","run","dry-run"] } } ``` |
| **Response schema** | ```json { "requestId": "string", "status": "queued|accepted|rejected", "validationReport": { "errors": [], "warnings": [] }, "executionUrl": "string (optional)" } ``` |
| **Auth** | Bearer token (same scopes as other handoffs). |
| **Rate limit** | 30 req/min per token. |
| **Idempotency** | Client‑supplied `Idempotency-Key` header; duplicate submissions return the original `requestId`. |

---

## 1. Contract Definition (OpenAPI snippet)

```yaml
paths:
  /handoff/workflowSubmit:
    post:
      summary: Submit a workflow spec for execution or preview.
      operationId: submitWorkflow
      security:
        - bearerAuth: []
      requestBody:
        required: true
 
...[truncated]
```

Must

  • Add a handoff variant that normalizes into LocalInvocationRequest or CloudGenerateRequest.
  • Keep auth/session validation in the web surface before normalization.
  • Ensure downstream code does not need to know the request came from web.

Must Not

  • Short-circuit directly from the web handler to executors.
  • Create a parallel domain model for web requests.
  • Make the web surface the owner of local or Cloud routing semantics.
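Reviewer sketch of what these Must / Must Not rules imply for a web handoff variant. Only the type names `LocalInvocationRequest` and `CloudGenerateRequest` come from the rubric above; every field, the `WebWorkflowSubmitHandoff` shape, and the `validateSession` helper are hypothetical and are not taken from the Ricky codebase.

```typescript
// Assumed shapes for illustration only; the real request types may differ.
type Mode = "local" | "cloud";

interface WebWorkflowSubmitHandoff {
  kind: "web-workflow-submit";
  sessionToken: string;
  spec: string;
  mode: Mode;
}

interface LocalInvocationRequest {
  target: "local";
  spec: string;
}

interface CloudGenerateRequest {
  target: "cloud";
  spec: string;
}

type NormalizedRequest = LocalInvocationRequest | CloudGenerateRequest;

// Hypothetical stand-in for the web surface's auth/session validation.
function validateSession(token: string): boolean {
  return token.trim().length > 0;
}

// The only translation boundary: validate first, then emit a request
// that carries no web-specific fields.
function normalizeWebHandoff(handoff: WebWorkflowSubmitHandoff): NormalizedRequest {
  if (!validateSession(handoff.sessionToken)) {
    throw new Error("unauthenticated web handoff");
  }
  if (handoff.spec.trim().length === 0) {
    throw new Error("empty workflow spec");
  }
  return handoff.mode === "local"
    ? { target: "local", spec: handoff.spec }
    : { target: "cloud", spec: handoff.spec };
}
```

Under these assumptions the web handler never calls an executor itself; it hands the normalized request to whatever router already serves the other surfaces, so downstream code cannot tell the request came from web.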

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.mcp-claude-context-is-metadata (surfaces-ingress/manual)

User message

A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordinate.

Ricky output

**Decision Matrix**

| Workflow State | Indicator | Recommended Action |
|----------------|-----------|---------------------|
| **Spec is complete, syntactically valid, and passes all deterministic verification gates** | All required sections (metadata, inputs, steps, outputs, contracts) are present; `ricky lint` returns **PASS**; unit‑test suite reports **0 failures** | **Generate** – create the new branch, add the workflow files, and open a PR with the standard review checklist. |
| **Spec is syntactically correct but fails one or more verification gates** (e.g., missing contract, mismatched input‑output types, circular dependency) | `ricky lint` reports **ERROR**; `ricky verify` shows failing contracts | **Debug** – return a concise list of the failing checks, suggest concrete fixes, and optionally open a temporary “debug” branch for iterative correction. |
| **Spec is missing critical sections or the conversation context indicates a major change request** (e.g., new trigger, altered SLA) | No `metadata.id`, undefined `trigger`, or contradictory requirements in the chat log | **Coordinate** – summarize the missing pieces, request clarification from the user, and propose a coordi
...[truncated]

Must

  • Normalize the tool call through the MCP/Claude handoff path.
  • Treat conversation context as advisory metadata.
  • Make the spec itself sufficient for routing whenever possible.

Must Not

  • Require the user to rewrite the spec manually as a workflow.
  • Put domain routing logic inside the MCP tool handler itself.
  • Ignore provided mode, workflow path, or prior decisions metadata.
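Reviewer sketch of the precedence these rules suggest: an explicit mode wins, then the spec, and conversation context is consulted only as a last-resort hint. All names here (`McpHandoff`, `normalizeMcpHandoff`, `routeIntent`, `classify`) are hypothetical, and the keyword matching is a toy placeholder for whatever routing Ricky actually uses.

```typescript
type Intent = "generate" | "debug" | "restart" | "analyze" | "coordinate";

// Assumed MCP tool-call payload; not the real handoff schema.
interface McpHandoff {
  spec: string;
  mode?: Intent;
  workflowPath?: string;
  conversationContext?: string[];
}

interface NormalizedHandoff {
  spec: string;
  metadata: {
    mode: Intent | undefined;
    workflowPath: string | undefined;
    contextHints: string[]; // advisory only
  };
}

// The tool handler only translates; it makes no routing decision.
function normalizeMcpHandoff(h: McpHandoff): NormalizedHandoff {
  return {
    spec: h.spec,
    metadata: {
      mode: h.mode,
      workflowPath: h.workflowPath,
      contextHints: h.conversationContext ?? [],
    },
  };
}

// Toy keyword classifier standing in for real intent detection.
function classify(text: string): Intent | undefined {
  const t = text.toLowerCase();
  if (t.includes("restart")) return "restart";
  if (t.includes("debug") || t.includes("failing")) return "debug";
  if (t.includes("analyze")) return "analyze";
  if (t.includes("coordinate")) return "coordinate";
  if (t.includes("generate")) return "generate";
  return undefined;
}

// Routing lives outside the handler and reads the spec before the context.
function routeIntent(req: NormalizedHandoff): Intent {
  if (req.metadata.mode !== undefined) return req.metadata.mode;
  const fromSpec = classify(req.spec);
  if (fromSpec !== undefined) return fromSpec;
  for (const hint of req.metadata.contextHints) {
    const fromHint = classify(hint);
    if (fromHint !== undefined) return fromHint;
  }
  return "generate";
}
```

The design point is the ordering in `routeIntent`: context hints can break a tie but cannot override a provided mode or a spec that already determines the route.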

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.cloud-api-versioning (surfaces-ingress/manual)

User message

Change the Cloud Ricky generate API response to include a new field and change one existing error code name.

Ricky output

**Scope**  
- Add a new field **`requestId`** (string, UUID) to the successful JSON payload returned by **`POST /v1/generate`**.  
- Rename error code **`INVALID_PROMPT`** to **`PROMPT_VALIDATION_ERROR`** in the error‑response schema and all references.  

All changes are limited to the **`cloud-ricky`** service repository. No other services are impacted.

---

## 1. Contract Update (API Spec)

| Element | Change | Rationale |
|---------|--------|-----------|
| **Success response** (`200`) | Add `requestId: string (format: uuid)` | Enables tracing of individual generation requests across logs and downstream systems. |
| **Error response** (`400`) | Rename `code: "INVALID_PROMPT"` → `code: "PROMPT_VALIDATION_ERROR"` | Aligns naming with the new validation framework. |
| **OpenAPI file** | `openapi/cloud-ricky.yaml` – update `components.schemas.GenerateResponse` and `ErrorResponse` accordingly. | Source of truth for client SDK generation and contract tests. |

*Contract version bump*: `v1.2.0 → v1.3.0` (semantic‑minor).

---

## 2. Code Changes

| Path | Change | Owner |
|------|--------|-------|
| `src/handlers/generate.ts` | - Generate a UUID (`uuidv4()`) at request start and inclu
...[truncated]

Must

  • Allow additive response fields within /api/v1/ricky/....
  • Reject renaming or changing the meaning of an existing error code inside the same version.
  • Require a new API version for breaking request/response changes.
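Reviewer sketch of how the three rules above could be checked mechanically. The `ApiContract` shape and both function names are hypothetical; a real check would more likely diff the OpenAPI document, and a change in the *meaning* of an existing error code cannot be caught by a name comparison like this.

```typescript
// Assumed, simplified view of one API version's contract.
interface ApiContract {
  version: string;
  responseFields: string[];
  requiredRequestFields: string[];
  errorCodes: string[];
}

// Lists changes that would break existing clients of `prev`.
function findBreakingChanges(prev: ApiContract, next: ApiContract): string[] {
  const problems: string[] = [];
  for (const field of prev.responseFields) {
    if (!next.responseFields.includes(field)) {
      problems.push(`response field removed or renamed: ${field}`);
    }
  }
  for (const code of prev.errorCodes) {
    if (!next.errorCodes.includes(code)) {
      problems.push(`error code removed or renamed: ${code}`);
    }
  }
  for (const field of next.requiredRequestFields) {
    if (!prev.requiredRequestFields.includes(field)) {
      problems.push(`new required request field: ${field}`);
    }
  }
  return problems;
}

// Additive changes pass within a version; breaking ones need a new version.
function isAllowedChange(prev: ApiContract, next: ApiContract): boolean {
  return next.version !== prev.version || findBreakingChanges(prev, next).length === 0;
}
```

Applied to the user message in this case, adding `requestId` alone would pass within v1, while renaming `INVALID_PROMPT` would be flagged unless the contract moves to a new version.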

Must Not

  • Remove or rename existing response fields in v1.
  • Add a new required request field within v1.
  • Treat version transitions as a

Comment truncated to stay within GitHub limits. Download the ricky-eval-run artifact for the full human-review.md.
