…ter throws
When `ricky --mode local --spec-file <foo>.md --run` invoked the
workforce-persona writer (opencode/claude CLI) and that call failed —
e.g. non-completed status, malformed/unfenced output, harness errored —
generateWithWorkforcePersona's catch block returned `success: false`
with `WORKFORCE_PERSONA_WRITER_FAILED` as a *blocking* issue. That
discarded the perfectly valid `baseResult.artifact` (the deterministic
template render we'd already produced before calling the persona
writer), so entrypoint.execute() hit its `!success || !artifact`
early-return at src/local/entrypoint.ts:1083 without ever writing
anything to disk.
The downstream effect was the user-visible 7/7 INVALID_ARTIFACT loop
the original bug report described:
Generation: ok — workflows/generated/ricky-...ts
Workflow name: wf-1e14175ec3b5
Execution: blocked — INVALID_ARTIFACT at runtime-precheck
Auto-fix: stopped after 7/7 attempt(s) (INVALID_ARTIFACT)
What actually happened, traced end-to-end against the user's spec:
[trace] gen call start workforcePersona=true
[trace] gen call returned success=false hasArtifact=true \
artifactPath=workflows/generated/ricky-...ts validErrors=1
[trace] gen EARLY-RETURN no write success=false hasArtifact=true \
firstErr=Workforce persona writer did not complete: failed.
So generation rendered a valid artifact in memory but threw it away
because the persona-writer failure was treated as fatal. retryBaseRequest
then promoted `response.artifacts[0].path` → `request.specPath`, the
gate at entrypoint.ts:1039 saw a workflowFile matching
isExecutableWorkflowPath and skipped generation, and runtime-precheck
ENOENT'd on the never-written file every retry.
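
The loop described above can be sketched as a small simulation. This is only an illustration under stated assumptions: `looksExecutable`, `runOnce`, `promote`, `simulate`, `fakeDisk`, and the example paths are hypothetical stand-ins, not the real `isExecutableWorkflowPath`, `retryBaseRequest`, or entrypoint code.

```typescript
// Hypothetical stand-ins for illustration; not the real Ricky code.
type RunRequest = { specPath: string };
type RunResponse = { status: "ok" | "blocked"; artifacts: { path: string }[] };

const fakeDisk = new Set<string>(); // paths that were actually written

// Assumed rule: a .ts file under workflows/ counts as an executable workflow.
function looksExecutable(path: string): boolean {
  return path.startsWith("workflows/") && path.endsWith(".ts");
}

function runOnce(request: RunRequest, personaFails: boolean): RunResponse {
  const artifactPath = "workflows/generated/example.ts"; // made-up path
  if (looksExecutable(request.specPath)) {
    // Gate: the "spec" already looks like a workflow file, so generation is skipped
    // and the precheck only asks whether the file exists.
    const exists = fakeDisk.has(request.specPath);
    return { status: exists ? "ok" : "blocked", artifacts: [{ path: request.specPath }] };
  }
  if (personaFails) {
    // Pre-fix behavior: artifact rendered in memory, early return, nothing written.
    return { status: "blocked", artifacts: [{ path: artifactPath }] };
  }
  fakeDisk.add(artifactPath);
  return { status: "ok", artifacts: [{ path: artifactPath }] };
}

// Retry promotion: the previous artifact path becomes the next spec path.
function promote(request: RunRequest, response: RunResponse): RunRequest {
  return { specPath: response.artifacts[0]?.path ?? request.specPath };
}

function simulate(maxAttempts: number): string[] {
  let request: RunRequest = { specPath: "specs/example.md" };
  const statuses: string[] = [];
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = runOnce(request, true);
    statuses.push(response.status);
    if (response.status === "ok") break;
    request = promote(request, response);
  }
  return statuses;
}

console.log(simulate(7)); // every attempt is blocked: the file never exists
```

Once the first attempt returns a path without writing it, no later attempt can recover, because the gate never lets generation run again.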
Fix: when the persona writer throws (non-clarification path), fall back
to the deterministic baseResult — `success: true`, artifact intact,
and the persona failure surfaced as a `validation.warnings` entry plus
a `workforcePersona.warnings` entry. This mirrors the existing pre-write
validation fallback at pipeline.ts:154-166, which already returns
`success: true` with the deterministic render when the persona's output
fails repair-attempt validation. The two failure modes (writer threw vs.
writer returned invalid output) now behave the same way.
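
A minimal sketch of the fallback shape, assuming simplified types. `GenerationResult`, `PersonaWriter`, and `generateWithFallback` are hypothetical names for illustration, the warning text is made up, and the clarification path is omitted; the real logic lives in generateWithWorkforcePersona.

```typescript
// Hypothetical shapes; field names follow the description above where possible.
type Artifact = { path: string; content: string };
type GenerationResult = {
  success: boolean;
  artifact?: Artifact;
  validation: { errors: string[]; warnings: string[] };
  workforcePersona?: { warnings: string[] };
};

// Hypothetical writer signature; the real writer calls out to a CLI harness.
type PersonaWriter = (base: Artifact) => Promise<Artifact>;

async function generateWithFallback(
  baseResult: GenerationResult,
  writePersona: PersonaWriter,
): Promise<GenerationResult> {
  // Nothing to fall back to if the deterministic render itself failed.
  if (!baseResult.success || !baseResult.artifact) return baseResult;
  try {
    const artifact = await writePersona(baseResult.artifact);
    return { ...baseResult, artifact };
  } catch (error) {
    // Writer threw: keep the deterministic artifact and downgrade to a warning.
    const message = error instanceof Error ? error.message : String(error);
    const warning = `WORKFORCE_PERSONA_WRITER_FAILED: ${message}`;
    return {
      ...baseResult,
      success: true,
      validation: {
        ...baseResult.validation,
        warnings: [...baseResult.validation.warnings, warning],
      },
      workforcePersona: {
        warnings: [...(baseResult.workforcePersona?.warnings ?? []), warning],
      },
    };
  }
}
```

The point of the shape is that the caller's `!success || !artifact` check passes, so the deterministic render reaches disk and the failure stays visible as a warning.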
Updated workforce-persona-writer.test.ts:330 — previously named
"errors instead of writing a file when the harness returns malformed
text" — to assert the new fallback behavior. The original assertion
codified the bug.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
📝 Walkthrough
Error handling for workforce persona writer failures was reclassified from hard failures returning validation errors to graceful fallbacks that succeed with warnings. When writer failures occur, the function now returns a deterministic artifact with unresolved persona metadata and merged warnings instead of blocking execution.
Changes
Workforce Persona Fallback Strategy
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Ricky Eval Review
Run:
Passed: 5 | Needs human: 42 | Reviewable: 42 | Missing output: 0 | Failed: 0 | Skipped: 0
Human Review Cases
These cases passed deterministic checks and include captured Ricky output for a human verdict against their REVIEW
| Step | Description | Outcome |
|---|---|---|
| 1️⃣ | Parse the spec – Ricky reads the free‑form text “generate a workflow for package checks” and translates it into a workflow contract (inputs, outputs, tasks, verification gates). | A deterministic contract is produced and displayed on stdout. |
| 2️⃣ | Validate the contract – All required fields (name, description, inputs, outputs, tasks, gates) are checked against the Ricky schema. | If any required element is missing or malformed, Ricky prints a validation error and exits with a non‑zero status. |
| 3️⃣ | Dry‑run mode – Because --run is omitted, Ricky does not execute any tasks, does not create branches, PRs, or artifacts. It only shows you what would be run. | You get a readable YAML/JSON representation of the workflow contract and a summary of the verification gates that would be applied. |
| 4️⃣ | Exit – After printing the contract (or errors), Ricky terminates. No side‑effects are performed on the repository or CI sy | |
| ...[truncated] |
**Must**
- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.
**Must Not**
- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
</details>
<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>
**User message**
> Render Ricky's first-run CLI onboarding for a new user.
**Ricky output**
```text
**Ricky – First‑Run CLI Onboarding**
Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the **Ricky** AgentWorkforce CLI. Follow the steps in order; each stage includes a deterministic verification gate that will halt the process and emit a clear blocker message if anything goes wrong.
---
## 1️⃣ Prerequisites (Verification Gate #1)
| Check | Command | Expected Output |
|-------|---------|-----------------|
| **Node ≥ 18** | `node -v` | `v18.x.x` or higher |
| **npm ≥ 9** | `npm -v` | `9.x.x` or higher |
| **Git** | `git --version` | `git version 2.x.x` |
| **Supported OS** | `uname -a` (Linux/macOS) or `systeminfo` (Windows) | Linux, macOS, or Windows 10+ |
*If any check fails, abort and install the missing prerequisite before proceeding.*
---
## 2️⃣ Install the Ricky CLI (Verification Gate #2)
```bash
# Global install (recommended)
npm i -g @ricky/agentworkforce-cli
Verification:
ricky --version
Expected output: ricky-cli vX.Y.Z (the latest released version).
Blocker: “Ricky CLI not found – re‑run the install command or check npm permissions.”
3️⃣ Initialise a New Workspace (Verification Gate #3)
...[truncated]
Must
- Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
- End every branch with a concrete next step.
- Advertise only commands that are currently implemented.
Must Not
- Sound like a launch page or documentation dump.
- Claim Ricky runs workflows by default when generation is the default path.
- Require web or Slack onboarding before CLI use.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW cli.recovery-guidance-no-stack-traces (cli-behavior/manual)
User message
A user gives Ricky an empty spec or a missing spec file.
Ricky output
**Ricky’s Standard Response to an Empty or Missing Spec File**
| Phase | Action | Output / Artifact | Deterministic Gate |
|-------|--------|-------------------|--------------------|
| **1️⃣ Detect** | • Verify the spec path exists in the repo.<br>• If the file is present, read its contents and check for a non‑empty JSON/YAML object. | *Log entry* – `spec‑detect: missing` or `spec‑detect: empty` | **Gate A** – `file‑exists && file‑size > 0` |
| **2️⃣ Blocker Report** | • Immediately raise a **Blocker Issue** in the repository’s `#workflow‑issues` channel. | *GitHub Issue* – Title: **[BLOCKER] Spec file missing/empty**<br>Body includes:<br>• Path attempted<br>• Detection timestamp<br>• Link to the commit/PR where the problem was found<br>• Request for a complete spec | **Gate B** – Issue must be opened before any further automation proceeds |
| **3️⃣ Create a Placeholder** | • Generate a minimal “stub” spec in the same location to keep the CI pipeline syntactically happy.<br>• The stub contains only required top‑level keys with placeholder values (e.g., `name: "<TODO>"`, `version: "0.0.0"`, `description: "Placeholder – awaiting real spec"`). | *Commit* on a **`spec‑placeholder/<bra
...[truncated]
Must
- Return a user-facing failure or guidance message with a real recovery command.
- Distinguish generation failure from execution failure.
- Show stack traces only when verbose diagnostic mode is requested.
Must Not
- Crash with an uncaught exception in normal mode.
- Suggest commands that do not exist.
- Pretend a missing spec was accepted.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)
User message
Render `ricky status` when no provider checks have proven Google or GitHub are connected.
Ricky output
**`ricky status` output (no provider checks have proven Google or GitHub are connected)**
=== AgentWorkforce Status Report ===
🟡 Provider Connectivity
• Google : NOT VERIFIED (no successful check)
• GitHub : NOT VERIFIED (no successful check)
✅ Core Services
• Scheduler : RUNNING
• Queue : ACTIVE
• Metrics : COLLECTING
🔧 Configuration
• ricky.yaml : LOADED
• env vars : OK
• No successful connectivity verification for Google or GitHub.
• Dependent workflows that require these providers may be blocked.
🗒️ Next Steps
- Run provider health checks:
ricky check --provider google
ricky check --provider github
- Review authentication credentials in the secret store.
- Re‑run `ricky status` after checks complete.
--- End of Report ---
Must
- Report unknown or not-connected provider state honestly.
- Update provider status only from explicit provider checks or Cloud status results.
- Give concrete setup guidance for Cloud when relevant.
Must Not
- Mark Google or GitHub connected because guidance text was shown.
- Invent a provider connection URL or OAuth flow.
- Show empty fields with no recovery guidance when config is missing.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)
User message
Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.
Ricky output
## GitHub Primitive‑Change Workflow
**Purpose:** Apply a single‑file edit (e.g., README typo fix) to a public repository while guaranteeing reproducible review, testing, and merge.
| Phase | Owner | Action | Artifact | Verification Gate |
|-------|-------|--------|----------|--------------------|
| **1️⃣ Init** | **Ricky (Coordination)** | • `git checkout -b fix/readme‑typo` <br>• Create `CHANGELOG.md` entry “Fix typo in README”. | `branch‑spec.yaml` (branch name, base, ticket ID) | **Branch‑Spec Validation** – schema check (YAML lint, required fields). |
| **2️⃣ Edit** | **Ricky (Authoring)** | • Open file, make edit, `git add README.md` <br>• Run `pre‑commit` hooks (spell‑check, lint). | `commit‑msg.txt` (conventional commit) | **Commit‑Msg Lint** – `commitlint` must pass. |
| **3️⃣ CI‑Prep** | **Ricky (Reliability)** | • Push branch → trigger GitHub Actions `ci-pr-check.yml`. <br>• Artifact: `ci‑run‑id`. | `ci‑run‑id` stored in PR metadata. | **CI Trigger** – GitHub Actions must start within 30 s. |
| **4️⃣ Automated Checks** | **Ricky (Reliability)** | • **Lint** (`markdownlint`, `yamllint`) <br>• **Unit‑test stub** (none for README) <br>• **Security scan** (`trivy` on rep
...[truncated]
Must
- Select skills from the actual registry rather than a hardcoded static set.
- Record ranked skill matches with confidence and trigger evidence.
- Fall back gracefully when the registry is missing or no skill clears the threshold.
Must Not
- Claim runtime agents embody skills when only generation-time guidance was applied.
- Fail generation solely because optional skill files are missing.
- Hide skill selection evidence from artifacts or JSON output.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)
User message
Generate a workflow where the spec says "use Claude to review and Codex to implement".
Ricky output
**Workflow: Claude‑Review → Codex‑Implementation**
| **Section** | **Content** |
|-------------|-------------|
| **Name** | `review‑then‑implement` |
| **Owner** | `team‑mlops` |
| **Purpose** | Automate a deterministic pipeline where Claude performs a semantic review of a design spec and Codex generates the corresponding code. |
| **Scope** | • Input: `spec.md` (Markdown) <br>• Output: `implementation/` (source files) <br>• Supported languages: Python, TypeScript, Go |
| **Branch / PR Boundary** | All changes are made on a feature branch `feat/review‑implement‑<timestamp>` and merged via a single PR that must pass all verification gates before merge. |
---
## 1. Contract (Inputs / Outputs)
| **Artifact** | **Type** | **Location** | **Schema** |
|--------------|----------|--------------|------------|
| `spec.md` | Text (Markdown) | `src/spec/` | Must contain a top‑level heading `# Specification` and a `## Acceptance Criteria` list. |
| `review_report.json` | JSON | `artifacts/review/` | `{ "status": "PASS|FAIL", "issues": [{ "line": int, "msg": string }], "summary": string }` |
| `implementation/` | Directory | `src/implementation/` | Files generated by Codex; each file must co
...[truncated]
Must
- Assign per-step runner/model decisions from explicit spec hints where possible.
- Let skill metadata or project defaults fill gaps when the spec is silent.
- Write `tool-selection.json` or equivalent audit metadata.
Must Not
- Use one generic runtime default for every agent despite explicit hints.
- Let skill defaults override explicit user/spec runner hints.
- Omit the reason each tool or model was chosen.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)
User message
Use `--refine` to sharpen a generated workflow's step task descriptions and acceptance gates.
Ricky output
## Using `--refine` to Sharpen Step Descriptions & Acceptance Gates
`--refine` is the deterministic “polish” flag for any **AgentWorkforce** workflow definition (`.awf.yaml`). It runs a second‑pass pass‑through that:
1. **Normalises** free‑form prose into the **Task Description** schema.
2. **Validates** every **Acceptance Gate** against the **Contract** section.
3. **Injects** missing verification artefacts (e.g., unit‑test stubs, lint checks).
4. **Fails fast** on any ambiguous wording or missing gate, reporting a **Blocker** that must be resolved before the PR can be merged.
Below is a step‑by‑step contract‑first pattern you can copy‑paste into your repo.
---
### 1️⃣ Minimal Workflow Skeleton (pre‑refine)
```yaml
# .awf/workflows/issue_triage.awf.yaml
name: Issue Triage
description: >
Process new GitHub issues, label them, and assign an owner.
steps:
- id: fetch_issue
run: |
# placeholder script
echo "fetching issue"
- id: label_issue
run: |
# placeholder script
echo "labeling issue"
- id: assign_owner
run: |
# placeholder script
echo "assigning owner"
*The file compiles, but the step tasks are vague and
...[truncated]
**Must**
- Keep the deterministic unrefined path as the default.
- Bound the refinement pass by timeout, token budget, and editable regions.
- Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.
**Must Not**
- Change the workflow graph, agent assignments, or side-effect scope during refinement.
- Fail the whole generation if optional refinement times out.
- Hide warnings when the deterministic artifact is returned unchanged.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
</details>
<details>
<summary><strong>REVIEW</strong> <code>generation-quality.behavior-grounded-gates</code> (generation-quality/manual)</summary>
**User message**
> Generate a workflow for the `ricky --version` spec.
**Ricky output**
```text
**Workflow: `ricky --version` Specification & Delivery**
*(Ricky AgentWorkforce – v0.9.3)*
---
## 1. Overview
The `ricky --version` command must return the current Ricky CLI version in a deterministic, machine‑parsable format while also supporting the standard `--help` flag. This workflow defines the end‑to‑end contract, authoring steps, verification gates, and review artifacts required to ship the feature with full reliability guarantees.
---
## 2. Contract (Public Interface)
| Element | Type | Description | Example |
|---------|------|-------------|---------|
| `command` | string | Full CLI invocation | `ricky --version` |
| `output` | object | Structured JSON payload printed to **stdout** | `{ "ricky": "0.9.3", "gitCommit": "a1b2c3d", "buildDate": "2026‑05‑09T12:00:00Z" }` |
| `exitCode` | integer | Process exit status (always `0` on success) | `0` |
| `stderr` | string | Must be empty on success | `""` |
| `flags` | array | Supported flags (must include `--help`) | `[ "--version", "--help" ]` |
| `helpOutput` | string | Human‑readable help text when `--help` is supplied | *see §4* |
**Determinism guarantees**
* The JSON fields are ordered exactly as shown.
* `git
...[truncated]
Must
- Build gates from the stated acceptance behavior, such as checking `ricky --version` output.
- Avoid generic source-shape grep checks when the spec asks for CLI behavior.
- Keep generated validation meaningful for the current repo shape.
Must Not
- Treat `grep -Eq 'export|function|class|workflow(' dist/ricky.js` as proof of version behavior.
- Claim the workflow is proven by source syntax alone.
- Ignore the package-json version resolution order in the spec.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)
User message
Generate a workflow for many independent artifacts with a validation/fix/rerun loop.
Ricky output
**Workflow: “Batch‑Artifact Validation‑Fix‑Rerun Loop”**
*Purpose*: Process a large set of independent artifacts (e.g., data files, code modules, config bundles) through deterministic validation, automatic fixing where possible, and controlled re‑execution until all pass or a blocker is reported.
---
## 1. Scope & Boundaries
| Item | Description |
|------|-------------|
| **Branch** | `feature/batch‑artifact‑validation` (short‑lived, created from `main`) |
| **PR** | `Batch‑Artifact Validation Loop` – includes workflow YAML, test harness, and docs. |
| **Artifacts** | Any file set under `artifacts/` that implements the `Artifact` interface (see contract). |
| **Owner** | `@team‑data‑ops` (primary) – reviewers: `@team‑qa`, `@team‑devops`. |
| **Blockers** | Any step that fails > 3 automatic retries or produces a non‑recoverable error (e.g., schema mismatch). Must be reported in the PR comment and the workflow aborts. |
---
## 2. Artifact Contract (`artifact_schema.json`)
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Artifact",
"type": "object",
"required": ["id","type","payload","metadata"],
"properties": {
"id": {"type":"string"},
...[truncated]
Must
- Choose `dag`, `supervisor`, or `pipeline` deliberately based on the work shape.
- Explain the pattern choice in artifact metadata or a rationale.
- Use `dag` for validation/fix/rerun loops when dependencies matter.
Must Not
- Default blindly to `dag` for every workflow.
- Collapse independent artifact work into one vague agent task.
- Omit verification gates because the chosen pattern seems obvious.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)
User message
Describe how selected workflow-writing skills affected a generated Ricky workflow.
Ricky output
### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow
Below is a step‑by‑step illustration of the **five core writing skills** we emphasize in the Ricky repository and the concrete impact each one has on the final workflow artifact. The example workflow is a simple “**On‑Call Incident Triage**” pipeline that routes a newly created incident to the appropriate responder group, enriches it with context, and posts a status update to Slack.
| # | Skill | What the author does | Direct effect on the generated workflow |
|---|-------|----------------------|------------------------------------------|
| 1 | **Deterministic Contract Definition** | Writes a **`contract.yaml`** that lists required inputs (`incident_id`, `severity`), outputs (`triage_status`), and explicit data‑type constraints. | The workflow engine creates a **validation gate** (`validate_contract`) that aborts the run before any task executes if the payload violates the schema. This prevents downstream tasks from receiving malformed data and guarantees repeatable entry points. |
| 2 | **Explicit Verification Gates** | Inserts `gate` steps after every side‑effecting action (e.g., after calling the ticke
...[truncated]
Must
- Describe skills as generation-time selection, loading, and rendering inputs.
- Point to metadata such as `loaded-skills.txt` and `skill-application-boundary.json`.
- Say the workflow instructions were informed by selected skills.
Must Not
- Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
- Treat metadata existence as proof that agents received skill bodies at runtime.
- Overstate current tests beyond the generation boundary.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)
User message
Generate a workflow from a vague product spec and return it to the user.
Ricky output
**Blocker – Insufficient Specification**
The request is to “generate a workflow from a vague product spec.” Because the source specification is undefined, I cannot produce a deterministic, verifiable workflow contract that meets our reliability standards.
**What’s needed to proceed**
1. **Concrete product description** – a brief narrative (1–2 sentences) of the user‑facing goal.
2. **Key functional steps** – at least three high‑level actions the system must perform (e.g., “ingest user data,” “run eligibility check,” “send confirmation email”).
3. **Success criteria** – measurable outcomes for each step (e.g., “data validation passes 99 % of the time”).
4. **Failure handling** – how the workflow should react to errors (retry policy, fallback path, alerting).
5. **Boundary conditions** – any limits on input size, rate limits, or external service dependencies.
Once these details are supplied, I will:
* Draft a **workflow contract** (inputs, outputs, state transitions).
* Define **deterministic verification gates** for each step.
* Produce a **review artifact** (markdown diagram + test matrix).
* Scope the work to a single branch/PR with clear entry/exit criteria,
...[truncated]
Must
- Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
- Validate with dry-run or targeted structural checks where possible.
- Return artifacts, warnings, and follow-up commands honestly.
Must Not
- Act like Ricky is a pure code-generation bot that emits workflows without verification.
- Stop at "code compiles" as the proof bar.
- Skip skill-aware workflow authoring guidance for serious workflows.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)
User message
A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.
Ricky output
**Ricky’s Immediate Response Plan**
| Phase | Action | Rationale | Deliverable |
|-------|--------|-----------|-------------|
| 1️⃣ Detect & Log | • Pull the run‑metadata from the orchestration service (e.g., Temporal, Airflow). <br>• Record the exact timestamp of the timeout, the step ID, and the worker‑ID that remained alive. <br>• Capture the worker’s heartbeat logs and any partial stdout/stderr. | Provides an immutable audit trail and isolates the failure to a single step/worker. | `run‑failure‑log.md` (auto‑generated) |
| 2️⃣ Blocker Report | • Mark the run as **BLOCKED** in the workflow state store. <br>• Emit a **Ricky‑Blocker** event with severity *high* and a short description: “Step X timed out; Worker Y still alive, no artifacts produced.” | Guarantees downstream pipelines do not consume incomplete data and surfaces the issue to the team instantly. | `blocker‑report.json` (published to the incident channel) |
| 3️⃣ Deterministic Verification Gate | • Create a **Verification Gate** named `gate‑step‑X‑recovery`. <br>• Gate criteria: <br> a) No active heartbeat from Worker Y for > 2 × step timeout. <br> b) All required artifacts for Step X are either present *or* explicitl
...[truncated]
Must
- Classify the failure before deciding whether to retry.
- Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
- Preserve raw evidence and uncertainty when the class is not known.
Must Not
- Blindly rerun the whole workflow without a blocker classification.
- Treat every failure as a broken workflow definition.
- Claim Ricky fixed the workflow before rerun evidence proves it.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)
User message
Ricky detects stale `.agent-relay/`, `.relay/`, and `.trajectories/` state before launching a local workflow.
Ricky output
**Workflow Contract – Pre‑Launch Stale‑State Guard**
| **Element** | **Definition** |
|-------------|----------------|
| **Name** | `guard‑stale‑state‑prelaunch` |
| **Scope** | Local development runs of any `.agent‑relay/`, `.relay/`, or `.trajectories/` workflow. |
| **Inputs** | <ul><li>`WORKDIR` – absolute path to the repository root (string).</li></ul> |
| **Outputs** | <ul><li>`STALE_FOUND` – boolean flag (`true` if any stale directory is present).</li><li>`STALE_PATHS` – newline‑separated list of stale directories (string).</li></ul> |
| **Success Criteria** | The guard returns `STALE_FOUND = false`. If `true`, the workflow aborts with a deterministic error message and a remediation PR is auto‑generated. |
| **Verification Gate** | A deterministic shell script (`guard.sh`) that exits `0` on success, `1` on failure. The exit code is captured by the workflow engine and used to decide continuation. |
| **Idempotence** | The guard can be re‑run without side‑effects; it never mutates the file system. |
| **Determinism** | The guard’s result depends only on the presence of the three directories and their modification timestamps; no external services are consulted. |
| **Rollback*
...[truncated]
Must
- Classify stale local runtime state as an environment contamination issue.
- Recommend quarantine or isolated-run guidance before launch.
- Record the observed paths and the action taken or recommended.
Must Not
- Treat stale runtime state as a workflow logic failure.
- Delete or overwrite state without an explicit safe path or user intent.
- Continue into execution as if the workspace were clean.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)
User message
A run marker says another Ricky or Relay run is already active in this workspace.
Ricky output
**Fixing “Another Ricky or Relay run is already active in this workspace”**
The marker that blocks a new run is a safety file created by the **Ricky** CLI to prevent overlapping executions. When the file is left behind (e.g., a crash, a killed process, or a stale CI container) the next invocation sees it and aborts with the message you’re seeing.
Below is the **deterministic repair workflow** you can run locally or in CI. Follow the steps in order; each step has a verification gate that you should confirm before moving to the next one.
---
## 1️⃣ Verify the current state
```bash
# Show the active marker (if any)
ricky status
Expected output: No active Ricky/Relay run detected.
If you still see “active run”, proceed to step 2.
2️⃣ Locate the marker file
Ricky stores the lock in the workspace root under .ricky/run.lock (or .relay/run.lock for Relay).
ls -l .ricky/run.lock .relay/run.lock
Expected: file exists → proceed. If neither file exists, the CLI is reading a stale PID from its internal cache; skip to step 5.
3️⃣ Inspect the lock contents
cat .ricky/run.lock # or .relay/run.lock
Typical content:
PID=12
...[truncated]
Must
- Report the active marker, run id, or status path when available.
- Ask the user to inspect, wait for, or explicitly clear the active run.
- Avoid launching a competing run that could corrupt evidence.
Must Not
- Silently start another run.
- Hide the existing run marker from the user.
- Treat the conflict as a generic failure with no recovery path.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)
User message
Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.
Ricky output
**Local Workflow Execution – Auto‑Fix Mode**
Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script) to run the workflow, trigger the auto‑fix path on the first failure, and verify the repaired artifact before committing any changes.
---
## 1️⃣ Prerequisites (verify once)
| Check | Command | Expected outcome |
|-------|---------|------------------|
| `ricky` CLI installed | `ricky --version` | `ricky vX.Y.Z` |
| Workflow repo cloned & on a clean branch | `git status` | “nothing to commit, working tree clean” |
| Docker (or the runtime your workflow uses) running | `docker ps` | List of containers (or empty) |
| Environment variables for the workflow (e.g., `WORKFLOW_ID`, `RUN_ID`) exported | `echo $WORKFLOW_ID` | non‑empty value |
If any check fails, resolve it before proceeding.
---
## 2️⃣ Input artifacts you already have
| Artifact | Source | How to reference |
|----------|--------|------------------|
| **Failed step name** | From the first run logs | `FAILED_STEP="process‑data"` |
| **Previous run ID** | From the CI dashboard | `PREV_RUN_ID="run‑20240531‑a1b2c3"` |
| **Workflow definition file** | `./workflow.yaml` (root of
...[truncated]
Must
- Use a bounded retry budget and summarize every attempt.
- Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
- Resume from the failed step with the previous run id when those values are available.
Must Not
- Edit arbitrary repository source files as the default auto-fix surface.
- Keep retrying after the configured max attempts.
- Lose the single Ricky tracking run id across repair/resume attempts.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)
User message
A user runs `ricky run workflows/foo.ts --no-auto-fix` and the workflow fails.
Ricky output
**Quick‑Start Debug Checklist**
| Step | Command / Action | What to Look For |
|------|------------------|------------------|
| 1️⃣ | `ricky logs workflows/foo.ts` | Full stdout/stderr of the last run. Identify the exact error message and the stage where the failure occurred. |
| 2️⃣ | `ricky status workflows/foo.ts` | Shows the last known state of each verification gate (✅ passed, ❌ failed, ⏳ pending). |
| 3️⃣ | `ricky run workflows/foo.ts --debug` | Re‑run with verbose tracing. The debug log (`.ricky/debug.log`) contains the deterministic execution trace and the values of all contract inputs/outputs. |
| 4️⃣ | Open the workflow file | Verify that the **contract block** at the top matches the expected shape of inputs/outputs. |
| 5️⃣ | Run the failing gate in isolation | `ricky gate verify‑<gate‑name> workflows/foo.ts` – isolates the gate that is failing and prints the exact validation error. |
| 6️⃣ | Check the **branch/PR boundary** | If the workflow lives in a feature branch, ensure the latest `main` is merged (or re‑base) so that contract versions are in sync. |
| 7️⃣ | Look for **blocker reports** | `ricky blockers` lists any open blockers that prevent auto‑fixes or f
...[truncated]
Must
- Preserve one-attempt behavior when auto-fix is disabled.
- Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
- Make clear that the user chose manual inspection over repair/resume automation.
Must Not
- Start a repair loop despite `--no-auto-fix`.
- Suppress the diagnosis because no repair was attempted.
- Present the failure as a completed repair attempt.
Deterministic checks
- PASS
status: case executed - PASS
maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)
User message
Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.
Ricky output
**Ricky’s Standard Procedure for Executing a Local TypeScript Workflow Artifact**
Below is the concrete, repeatable contract that Ricky follows whenever a developer asks to run a TypeScript‑based workflow that lives in the *primary local path* (by default `./workflows`). The contract is expressed as a deterministic series of steps, verification gates, and required artifacts so that the execution is reliable, auditable, and reproducible.
---
## 1. Preconditions (Input Contract)
| Item | Requirement | Source |
|------|-------------|--------|
| **Workflow source** | A single entry‑point file `*.workflow.ts` (or `index.ts`) located under `./workflows` (or a sub‑directory explicitly passed) | Developer |
| **Node runtime** | Node ≥ 18.x installed and available on `$PATH` | System |
| **Package manager** | `npm` (or `pnpm`/`yarn` if the repo declares a lockfile) | System |
| **TypeScript compiler** | `typescript` (≥ 5.0) as a devDependency in `package.json` | Repo |
| **ts‑node** | `ts-node` (≥ 10.9) as a devDependency in `package.json` | Repo |
| **Environment config** | Optional `.env` file in the repo root or explicit `RICKY_ENV` map passed to the command | Developer |
| **Executi
...[truncated]
Must
- Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the `agent-relay` binary on PATH.
- Precheck that Node and `@agent-relay/sdk` are resolvable for the workflow.
- Record the actual spawn command in execution evidence.
Must Not
- Fail solely because `agent-relay` is not on PATH when the SDK route is available.
- Hide the actual runtime command from evidence.
- Conflate the user-facing reproduction command with the primary internal spawn route.
Deterministic checks
- PASS
status: case executed - PASS
maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)
User message
Ricky reaches a boundary after a structural failure persists after a fix attempt.
Ricky output
**Ricky’s Boundary‑Handling Procedure for a Persistent Structural Failure**
When a structural failure remains after an attempted fix, Ricky follows a deterministic, contract‑driven workflow that isolates the problem, prevents downstream impact, and creates a clear hand‑off for human intervention. The steps below are expressed as a **workflow contract** (inputs, actions, outputs, verification gates) and include the required review artifacts and validation loops.
---
## 1. Contract Definition
| Element | Description |
|---------|-------------|
| **Trigger** | `StructuralFailurePersisted` event emitted by the CI pipeline after a *FixAttempt* job returns `FAILURE` and the failure signature matches a known structural pattern (e.g., missing schema, broken dependency graph). |
| **Inputs** | - `failure_id` (UUID) <br> - `attempt_id` (UUID) <br> - `failure_log` (structured JSON) <br> - `fix_patch` (git diff) <br> - `target_branch` (string) |
| **Outputs** | - `blocker_issue` (GitHub Issue URL) <br> - `isolation_branch` (string) <br> - `validation_report` (JSON) |
| **Success Criteria** | All verification gates pass **or** a blocker is raised with full diagnostic context. |
| **Time‑box
...[truncated]
Must
- Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
- Distinguish escalation from a generic product failure.
- Preserve enough context for a human operator to continue.
Must Not
- Retry speculative fixes indefinitely.
- Collapse to "something went wrong" without the attempted actions.
- Discard evidence from failed repair attempts.
Deterministic checks
- PASS
status: case executed - PASS
maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)
User message
Produce a workflow health digest from many Ricky workflow runs.
Ricky output
**Ricky Workflow Health Digest – [Date Range]**
---
### 1. Executive Summary
| Metric | Value | Trend |
|--------|-------|-------|
| Total Runs (all workflows) | **1 842** | ↗ +12 % vs. prior period |
| Successful Runs | **1 617** | ↗ +9 % |
| Failed Runs | **225** | ↘ ‑4 % |
| Blocked Runs (manual gate) | **0** | — |
| Avg. Cycle Time (per run) | **4 m 27 s** | ↘ ‑15 % |
| Avg. Verification Gate Pass Rate | **96.3 %** | ↗ +1.2 % |
> **Health Rating:** **🟢 Green** – >95 % success, no outstanding blockers, verification gates meeting SLA.
---
### 2. Run‑Level Detail (Top 5 by runtime)
| Run ID | Workflow | Duration | Status | Verification Gate Passes | Notes |
|--------|----------|----------|--------|--------------------------|-------|
| `wf‑20240528‑001` | `deploy‑service‑A` | **9 m 12 s** | ✅ Success | 5/5 | Optimized artifact caching reduced time by 22 % |
| `wf‑20240527‑014` | `release‑pipeline‑B` | **8 m 45 s** | ✅ Success | 4/4 | No regressions detected |
| `wf‑20240526‑003` | `data‑ingest‑daily` | **7 m 31 s** | ⚠️ Partial (1 gate fail) | 3/4 | **Gate #2** failed – missing schema version; auto‑retriable, succeeded on retry |
| `wf‑20240525‑019` | `security‑scan‑nightly`
...[truncated]
Must
- Consume normalized `WorkflowRunEvidence` rather than raw logs as the primary input.
- Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
- Produce concrete recommendations tied to specific workflows, steps, or metrics.
Must Not
- Mutate evidence while analyzing it.
- Return generic advice like "improve your workflow" without references.
- Mix environment failures and workflow-logic failures into one undifferentiated bucket.
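
As an illustration of the requirements above, here is a small sketch that aggregates run evidence per workflow while keeping environment failures and workflow-logic failures in separate counters. The `RunEvidence` shape, the `FailureClass` values, and `buildDigest` are assumptions made for this example; the real `WorkflowRunEvidence` type is not shown in this review.

```typescript
// Hypothetical, simplified evidence shape for illustration only.
type FailureClass = "environment" | "workflow-logic";

interface RunEvidence {
  readonly workflow: string;
  readonly durationMs: number;
  readonly failureClass?: FailureClass; // undefined means the run passed
}

interface WorkflowDigest {
  runs: number;
  totalDurationMs: number;
  failures: Record<FailureClass, number>;
}

// Reads evidence without mutating it and returns per-workflow aggregates.
function buildDigest(evidence: readonly RunEvidence[]): Record<string, WorkflowDigest> {
  const digest: Record<string, WorkflowDigest> = {};
  for (const run of evidence) {
    const entry = digest[run.workflow] ?? {
      runs: 0,
      totalDurationMs: 0,
      failures: { environment: 0, "workflow-logic": 0 },
    };
    entry.runs += 1;
    entry.totalDurationMs += run.durationMs;
    if (run.failureClass !== undefined) {
      // Each failure class keeps its own counter, so the buckets never merge.
      entry.failures[run.failureClass] += 1;
    }
    digest[run.workflow] = entry;
  }
  return digest;
}
```

Because the result is keyed by workflow name, any recommendation built on top of it can cite a specific workflow and metric rather than giving generic advice.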
Deterministic checks
- PASS
status: case executed - PASS
maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)
User message
Design a new Ricky Slack surface for workflow debugging.
Ricky output
**Ricky Slack Surface – “Workflow Debugger”**
---
## 1. Overview
A dedicated Slack app view that lets engineers **inspect, step‑through, and mutate** any Ricky‑managed workflow in real time. It is built as a **modal‑based UI** triggered from a slash‑command (`/ricky‑debug`) and backed by the existing Ricky API contract layer. All interactions are logged, gated by deterministic verification steps, and produce review artifacts for audit.
---
## 2. User Flow (Contract)
| Step | Trigger | Input | Ricky Service Call | Verification Gate | Output |
|------|---------|-------|--------------------|-------------------|--------|
| 1 | `/ricky‑debug <workflow‑id>` | workflow‑id (UUID) | `GET /workflows/{id}` | **Schema‑Validate** response (WorkflowSpec v2) | Modal “Select Run” with list of recent executions (last 24 h) |
| 2 | User selects execution | execution‑id | `GET /executions/{id}` | **Auth‑Check** (user ↔ execution owner or admin) | Modal “Execution Details” (status, start/end, input snapshot) |
| 3 | Click **Step Into** on a node | node‑id | `GET /executions/{id}/nodes/{nodeId}` | **State‑Consistent** (node must be *paused* or *failed*) | Modal “Node Debugger” with: <br>• Input pa
...[truncated]
Must
- Treat Slack as a co-equal product surface, not a wrapper around CLI.
- Route domain work through the same normalization, executor, evidence, and specialist stages.
- Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.
Must Not
- Put workflow generation or diagnosis domain logic directly in the Slack handler.
- Degrade Slack to a developer shortcut with weaker routing than CLI.
- Skip signature verification, dedup, or thread handling when the surface is implemented.
Deterministic checks
- PASS
status: case executed - PASS
maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)
User message
Add a new web handoff type that submits a workflow spec and mode.
Ricky output
**New Web‑Handoff Type – `workflowSubmit`**
| Item | Detail |
|------|--------|
| **Handoff name** | `workflowSubmit` |
| **Purpose** | Allows a web client (e.g., UI, external service) to submit a complete AgentWorkforce workflow specification together with the desired execution *mode* (e.g., `preview`, `run`, `dry‑run`). |
| **Payload schema** | ```json { "workflowSpec": { "$ref": "#/components/schemas/WorkflowSpec" }, "mode": { "type": "string", "enum": ["preview","run","dry-run"] } } ``` |
| **Response schema** | ```json { "requestId": "string", "status": "queued|accepted|rejected", "validationReport": { "errors": [], "warnings": [] }, "executionUrl": "string (optional)" } ``` |
| **Auth** | Bearer token (same scopes as other handoffs). |
| **Rate limit** | 30 req/min per token. |
| **Idempotency** | Client‑supplied `Idempotency-Key` header; duplicate submissions return the original `requestId`. |
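
The payload schema in the table above could be checked at the web surface with a validator along these lines. This is a sketch only: `parseWorkflowSubmit`, `WorkflowSubmitPayload`, and the error strings are hypothetical names chosen for the example, and the `workflowSpec` contents are treated as an opaque object because the referenced `WorkflowSpec` schema is not shown.

```typescript
// Mode values are taken from the payload schema's enum.
const MODES = ["preview", "run", "dry-run"] as const;
type Mode = (typeof MODES)[number];

// Hypothetical type for illustration; workflowSpec is left opaque.
interface WorkflowSubmitPayload {
  workflowSpec: Record<string, unknown>;
  mode: Mode;
}

type ParseResult =
  | { ok: true; value: WorkflowSubmitPayload }
  | { ok: false; errors: string[] };

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function parseWorkflowSubmit(body: unknown): ParseResult {
  if (!isRecord(body)) {
    return { ok: false, errors: ["body must be a JSON object"] };
  }
  const errors: string[] = [];
  const { workflowSpec, mode } = body;
  if (!isRecord(workflowSpec)) {
    errors.push("workflowSpec must be an object");
  }
  const modeOk = typeof mode === "string" && MODES.some((m) => m === mode);
  if (!modeOk) {
    errors.push("mode must be one of: " + MODES.join(", "));
  }
  if (errors.length > 0 || !isRecord(workflowSpec)) {
    return { ok: false, errors };
  }
  return { ok: true, value: { workflowSpec, mode: mode as Mode } };
}
```

A validator like this only checks the request shape; turning the parsed value into a domain request would still happen in the shared normalization step rather than in the web handler.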
---
## 1. Contract Definition (OpenAPI snippet)
```yaml
paths:
  /handoff/workflowSubmit:
    post:
      summary: Submit a workflow spec for execution or preview.
      operationId: submitWorkflow
      security:
        - bearerAuth: []
      requestBody:
        required: true
```
...[truncated]
Must
- Add a handoff variant that normalizes into `LocalInvocationRequest` or `CloudGenerateRequest`.
- Keep auth/session validation in the web surface before normalization.
- Ensure downstream code does not need to know the request came from web.
Must Not
- Short-circuit directly from the web handler to executors.
- Create a parallel domain model for web requests.
- Make the web surface the owner of local or Cloud routing semantics.
Deterministic checks
- PASS
status: case executed - PASS
maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.mcp-claude-context-is-metadata (surfaces-ingress/manual)
User message
A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordinate.
Ricky output
**Decision Matrix**
| Workflow State | Indicator | Recommended Action |
|----------------|-----------|---------------------|
| **Spec is complete, syntactically valid, and passes all deterministic verification gates** | All required sections (metadata, inputs, steps, outputs, contracts) are present; `ricky lint` returns **PASS**; unit‑test suite reports **0 failures** | **Generate** – create the new branch, add the workflow files, and open a PR with the standard review checklist. |
| **Spec is syntactically correct but fails one or more verification gates** (e.g., missing contract, mismatched input‑output types, circular dependency) | `ricky lint` reports **ERROR**; `ricky verify` shows failing contracts | **Debug** – return a concise list of the failing checks, suggest concrete fixes, and optionally open a temporary “debug” branch for iterative correction. |
| **Spec is missing critical sections or the conversation context indicates a major change request** (e.g., new trigger, altered SLA) | No `metadata.id`, undefined `trigger`, or contradictory requirements in the chat log | **Coordinate** – summarize the missing pieces, request clarification from the user, and propose a coordi
...[truncated]
Must
- Normalize the tool call through the MCP/Claude handoff path.
- Treat conversation context as advisory metadata.
- Make the spec itself sufficient for routing whenever possible.
Must Not
- Require the user to rewrite the spec manually as a workflow.
- Put domain routing logic inside the MCP tool handler itself.
- Ignore provided mode, workflow path, or prior decisions metadata.
Deterministic checks
- PASS
status: case executed - PASS
maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.cloud-api-versioning (surfaces-ingress/manual)
User message
Change the Cloud Ricky generate API response to include a new field and change one existing error code name.
Ricky output
**Scope**
- Add a new field **`requestId`** (string, UUID) to the successful JSON payload returned by **`POST /v1/generate`**.
- Rename error code **`INVALID_PROMPT`** to **`PROMPT_VALIDATION_ERROR`** in the error‑response schema and all references.
All changes are limited to the **`cloud-ricky`** service repository. No other services are impacted.
---
## 1. Contract Update (API Spec)
| Element | Change | Rationale |
|---------|--------|-----------|
| **Success response** (`200`) | Add `requestId: string (format: uuid)` | Enables tracing of individual generation requests across logs and downstream systems. |
| **Error response** (`400`) | Rename `code: "INVALID_PROMPT"` → `code: "PROMPT_VALIDATION_ERROR"` | Aligns naming with the new validation framework. |
| **OpenAPI file** | `openapi/cloud-ricky.yaml` – update `components.schemas.GenerateResponse` and `ErrorResponse` accordingly. | Source of truth for client SDK generation and contract tests. |
*Contract version bump*: `v1.2.0 → v1.3.0` (semantic‑minor).
---
## 2. Code Changes
| Path | Change | Owner |
|------|--------|-------|
| `src/handlers/generate.ts` | - Generate a UUID (`uuidv4()`) at request start and inclu
...[truncated]
Must
- Allow additive response fields within `/api/v1/ricky/...`.
- Reject renaming or changing the meaning of an existing error code inside the same version.
- Require a new API version for breaking request/response changes.
Must Not
- Remove or rename existing response fields in v1.
- Add a new required request field within v1.
- Treat version transitions as a
Comment truncated to stay within GitHub limits. Download the ricky-eval-run artifact for the full human-review.md.
Summary
`ricky --mode local --spec-file <foo>.md --run` consumes the entire 7/7 auto-fix budget on identical `INVALID_ARTIFACT at runtime-precheck` failures whenever the workforce-persona writer (opencode/claude CLI) fails (non-completed status, malformed output, harness errored, etc.), even though Ricky already rendered a perfectly valid deterministic artifact in memory before invoking the persona writer.

The `Generation: ok — <path>` line is misleading: the artifact was rendered but never written. The `<path>` directory `workflows/generated/` does not exist on disk after the run.

(This supersedes #83, which only patched the gate to re-render on retries; that doesn't help, because the persona writer just fails again.)
Root cause
Traced end-to-end against the user's spec by instrumenting the installed dist around `defaultArtifactWriter.writeArtifact`, `precheckRuntimeLaunch`, the gate, the auto-fix writer, and the generation call. Concrete attempt-1 trace:

So:

1. `generateWithWorkforcePersona` builds `baseResult` (deterministic template render, always valid). Persona writer is invoked. Persona writer throws `WorkforcePersonaWriterError("Workforce persona writer did not complete: failed.")`.
2. `pipeline.ts:204-239` builds a `WORKFORCE_PERSONA_WRITER_FAILED` blocking validation issue and returns `{ ...baseResult, success: false, validation, workforcePersona }`. Crucially it spreads `baseResult.artifact` so the artifact field is non-null, but it sets `success: false` to surface the writer failure.
3. `entrypoint.execute()` hits the early-return at `entrypoint.ts:1083` (`!success || !artifact`) and returns without writing anything. The artifact-in-memory is pushed onto `response.artifacts` (line 1075) but never persisted.
4. `retryBaseRequest` then promotes `response.artifacts[0].path` → `request.specPath` → `source: 'workflow-artifact'`.
5. `workflowFileForRoute` returns the (synthetic) `.ts` specPath because it matches `isExecutableWorkflowPath`. Routing target stays at `'debug'` (spec content unchanged). The gate `target === 'generate' || !workflowFile` evaluates `false` → skips generation entirely.
6. `Generation: ok` prints because on retries 2–7, `createArtifactReferenceGenerationStage` synthesizes a generation stage with `status: 'ok'` for the artifact-reference path; the CLI doesn't distinguish "rendered" from "written-and-readable".

Fix
In the non-clarification catch branch of `generateWithWorkforcePersona`, fall back to the deterministic `baseResult.artifact` instead of returning `success: false`:

- `success: true` (artifact field already set via `...baseResult`)
- `validation` gets a warning `WORKFORCE_PERSONA_WRITER_FAILED` issue (instead of a blocking one) via `addValidationWarning`
- `workforcePersona.warnings` includes the failure message so it surfaces in the run summary

This matches the existing pre-write validation fallback at `pipeline.ts:154-166`, which already returns `success: true` with the deterministic baseResult when the persona's output fails repair-attempt validation. The two persona-failure modes (writer threw vs. writer returned invalid output) now behave the same way.

Test plan
- `npm run typecheck`
- `npm test`: 1028 tests pass across 49 files
- `src/product/generation/workforce-persona-writer.test.ts:330` (renamed from "errors instead of writing a file when the harness returns malformed text" to "falls back to deterministic rendering with a visible warning when the harness returns malformed text"). The original assertion codified the bug. The new assertion verifies:
  - `result.success === true`
  - `result.artifact.content` contains `workflow(` (the deterministic render, not the persona's malformed output)
  - `result.validation.warnings` includes the persona writer failure message
  - `result.workforcePersona.warnings` includes it too

Out of scope
- `Generation: ok — <path>` CLI output is misleading when the artifact was rendered-but-not-written (or when the path comes from `createArtifactReferenceGenerationStage` on a retry). Worth a follow-up.
- `src/product/spec-intake/parser.ts:detectIntent` scoring "spec for handling failed runs" as `debug` intent (rather than `generate`) is a separate issue. Not the cause of the visible bug, but it complicates retry routing. Worth a follow-up.

🤖 Generated with Claude Code
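
For reference, the fallback described under Fix can be sketched roughly as follows. The types and the `withPersonaFallback` helper are simplified, hypothetical stand-ins; the actual shapes in `pipeline.ts` and the real `addValidationWarning` signature are not shown in this description, and the clarification path is omitted entirely.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface Issue {
  code: string;
  message: string;
}

interface GenerationResult {
  success: boolean;
  artifact: { path: string; content: string } | null;
  validation: { errors: Issue[]; warnings: Issue[] };
  workforcePersona?: { warnings: string[] };
}

function withPersonaFallback(
  baseResult: GenerationResult,
  runPersonaWriter: () => GenerationResult,
): GenerationResult {
  try {
    return runPersonaWriter();
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    // Keep the deterministic artifact and downgrade the failure to a warning.
    return {
      ...baseResult,
      success: true,
      validation: {
        ...baseResult.validation,
        warnings: [
          ...baseResult.validation.warnings,
          { code: "WORKFORCE_PERSONA_WRITER_FAILED", message },
        ],
      },
      workforcePersona: { warnings: [message] },
    };
  }
}
```

The point of the sketch is that the artifact from `baseResult` survives the catch block with `success: true`, so a caller that checks `!success || !artifact` would proceed to write it instead of returning early.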