
fix(auto-fix): don't resume from a gate whose inputs come from skipped agent steps#117

Merged
khaliqgant merged 1 commit into main from fix/auto-fix-resume-anchor-skips-agent-deps
May 15, 2026


@khaliqgant
Member

Summary

Follow-up to the auto-fix work in #111/#113. Surfaced by a real master-orchestrator run (workspace-primitives PR-39 spec): one child workflow (update-last-week) looped for hours.

When a generated child workflow failed at a deterministic gate (e.g. final-review-pass-gate, which greps an artifact produced by the upstream final-fix-claude / final-fix-codex agent steps), the auto-fix loop resumed with --start-from final-review-pass-gate --previous-run-id <id>. That skips the 14 upstream steps, including the agent producers:

```text
[startFrom] Skipping 14 steps, starting from "final-review-pass-gate"
[startFrom] No cached output for skipped step "review-claude" — using empty string
[final-review-pass-gate] Command failed (exit code 1)
```

The gate's input is never regenerated → it fails identically on every attempt → the child exhausts its 7 auto-fix attempts → the master retries the whole child → ~37 min/cycle, effectively forever.

Fix

  • resolveSafeResumeAnchor(failedStep, workflowContent) — parses the workflow step graph from the artifact via the TypeScript AST (per AGENTS.md "Source-Text Analysis: Use Grammar-Aware Parsers, Not Regex"; string-literal/HEREDOC step ids stay inert). When the failed step is non-regenerating (type deterministic/worktree/integration, no agent:) and its transitive dependency chain contains an agent/regenerating step, the anchor moves to the nearest upstream producer so the gate's input is regenerated. Earlier artifacts persist on disk between runs, so they aren't recomputed.
  • failedStepRepaired(...) gate: when a repair rewrote the failed step's own config (e.g. the lead-plan-gate marker-append logic — the gate itself was the bug), the fix lives in that step and resuming from it is correct. The downgrade only fires when the repair left the failed step untouched (the child-10 case: repair made validation non-terminal but didn't fix the gate's input).
  • Applied at the persona/deterministic-repair retry site (has old+new artifact content) and the repair-provider-failure site. The direct-repair path (V1 env/binary blockers) has no artifact content in scope and is unaffected.
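The anchor-selection rule in the Fix bullets can be illustrated in a few lines. This is a minimal sketch, not the code from the PR. It assumes the step graph has already been extracted into a hypothetical `StepNode` map; the real resolveSafeResumeAnchor builds that graph from the TypeScript AST of the workflow artifact. `resolveAnchorSketch` and `failedStepRepairedSketch` are illustrative names. Producers at the same distance are tie-broken by `dependsOn` order here, not by declaration order.

```typescript
// Simplified, hypothetical step model. The real resolveSafeResumeAnchor
// builds its graph from the TypeScript AST of the workflow artifact.
interface StepNode {
  id: string;
  type?: string; // e.g. "deterministic", "worktree", "integration"
  agent?: string; // set on agent (regenerating) steps
  dependsOn: string[];
}

const NON_REGENERATING_TYPES = new Set(["deterministic", "worktree", "integration"]);

// Non-regenerating: a gate-like step with no `agent:` that only reads
// artifacts produced upstream.
function isNonRegenerating(step: StepNode): boolean {
  return step.agent === undefined && NON_REGENERATING_TYPES.has(step.type ?? "");
}

// Assumed check: did the repair change the failed step's own config?
// JSON.stringify is key-order sensitive, which is acceptable for a sketch.
function failedStepRepairedSketch(
  failedStep: string,
  before: Map<string, StepNode>,
  after: Map<string, StepNode>,
): boolean {
  return JSON.stringify(before.get(failedStep)) !== JSON.stringify(after.get(failedStep));
}

// Breadth-first walk up `dependsOn` edges from the failed step; the first
// regenerating step reached is the nearest upstream producer.
function resolveAnchorSketch(
  failedStep: string,
  steps: Map<string, StepNode>,
  repaired: boolean,
): string {
  const failed = steps.get(failedStep);
  // Unknown step, repaired step, or a regenerating step: keep the original anchor.
  if (!failed || repaired || !isNonRegenerating(failed)) return failedStep;

  const seen = new Set<string>([failedStep]);
  let frontier: string[] = [...failed.dependsOn];
  while (frontier.length > 0) {
    const next: string[] = [];
    for (const id of frontier) {
      if (seen.has(id)) continue;
      seen.add(id);
      const node = steps.get(id);
      if (!node) continue;
      if (!isNonRegenerating(node)) return id; // nearest producer
      next.push(...node.dependsOn);
    }
    frontier = next;
  }
  // Purely deterministic chain: nothing upstream regenerates the input.
  return failedStep;
}

const steps = new Map<string, StepNode>([
  ["review-claude", { id: "review-claude", agent: "claude", dependsOn: [] }],
  ["final-fix-claude", { id: "final-fix-claude", agent: "claude", dependsOn: ["review-claude"] }],
  [
    "final-review-pass-gate",
    { id: "final-review-pass-gate", type: "deterministic", dependsOn: ["final-fix-claude"] },
  ],
]);

// The repair did not touch the gate, so the anchor moves upstream.
const repaired = failedStepRepairedSketch("final-review-pass-gate", steps, steps);
console.log(resolveAnchorSketch("final-review-pass-gate", steps, repaired)); // final-fix-claude
```

Passing `repaired = true` keeps the gate as the anchor. This mirrors the lead-plan-gate case, where the repair fixed the gate itself.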

Test plan

  • New resolveSafeResumeAnchor unit suite: gate→producer downgrade, agent step unchanged, pure-deterministic chain unchanged, missing/unparseable content, unknown step, HEREDOC-embedded step ids inert
  • The existing test "deterministically repairs generated lead-plan marker gates and resumes from the failed gate" still asserts startFromStep: lead-plan-gate (the repair touched the gate, so no downgrade)
  • npx vitest run — 1171/1171 pass
  • npx tsc --noEmit clean

🤖 Generated with Claude Code

…d agent steps

When a generated child workflow failed at a deterministic gate (e.g.
`final-review-pass-gate`, which greps an artifact written by upstream
`final-fix-claude`/`final-fix-codex` agent steps), the auto-fix loop
resumed with `--start-from <gate> --previous-run-id <id>`. That skips the
14 upstream steps including the agent producers, so the gate's input
artifact is never regenerated and the gate fails identically on every
attempt. The child exhausts its 7 auto-fix attempts, the master
orchestrator retries the whole child, and it loops ~37 min/cycle
forever — observed on the workspace-primitives PR-39 master run
(child 10 `update-last-week`).

Add `resolveSafeResumeAnchor`: parse the workflow step graph from the
artifact via the TypeScript AST (per AGENTS.md grammar-aware-parser
rule), and when the failed step is a non-regenerating step (type
deterministic/worktree/integration, no `agent:`) whose transitive
dependency chain contains an agent/regenerating step, move the resume
anchor to the *nearest* upstream producer so the gate's input is
regenerated. Earlier artifacts persist on disk between runs so they
need not be recomputed.

Gate the downgrade on `failedStepRepaired`: when a repair rewrote the
failed step's own config (e.g. the `lead-plan-gate` marker-append
logic — the gate itself was the bug), the fix lives in that step and
resuming from it is correct, so the anchor is left unchanged. The
downgrade only fires when the repair left the failed step untouched.

Applies at the persona/deterministic-repair retry site (carries
old+new artifact content) and the repair-provider-failure site. The
direct-repair path (V1 env/binary blockers) has no artifact content in
scope and is unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented May 15, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d98e10aa-e933-4a40-9364-79229d0ee3d3

📥 Commits

Reviewing files that changed from the base of the PR and between 4710e55 and 9435b71.

📒 Files selected for processing (2)
  • src/local/auto-fix-loop.test.ts
  • src/local/auto-fix-loop.ts

📝 Walkthrough


The PR adds AST-based workflow step-graph analysis to auto-fix retry logic. A new resolveSafeResumeAnchor function parses TypeScript workflow builder calls to build a dependency graph, then computes the nearest upstream regenerating agent step as a safe resume anchor instead of always retrying from the originally failed step. This anchor is applied across three auto-fix retry paths with appropriate warnings and progress messaging.

Changes

Safe Resume Anchor Implementation and Integration

| Layer / File(s) | Summary |
|---|---|
| Safe Resume Anchor Algorithm and Tests (src/local/auto-fix-loop.ts, src/local/auto-fix-loop.test.ts) | Implements resolveSafeResumeAnchor function that parses workflow builder calls via TypeScript AST to build a step dependency graph, identifies regenerating steps, and selects the nearest upstream regenerating agent as a safe retry anchor. Tests validate graph traversal, deterministic chain handling, missing/unknown step fallback, and HEREDOC protection against false step-ID matches. |
| Integrate Safe Resume Anchors into Retry Paths (src/local/auto-fix-loop.ts) | Applies the anchor resolution across three retry scenarios: persona workflow repair uses the rewritten step or computed anchor, provider fallback retry computes anchor from repair target artifact, and direct-repair path computes anchor before retry. Each path emits warnings when anchor differs from failed step and updates retry request and progress messages to reflect the chosen anchor. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • AgentWorkforce/ricky#99: Both PRs modify src/local/auto-fix-loop.ts retry/start-from-step behavior; #99 adds safety gates for avoiding full restart on expensive work completion, while this PR adds graph-based anchor selection.
  • AgentWorkforce/ricky#82: Both PRs modify resume-anchor decision paths in src/local/auto-fix-loop.ts; #82 filters synthetic stage IDs while this PR adds upstream anchor computation via dependency graph.

Poem

🐰 A rabbit hops through workflow trees,
Finding safe ground with AST expertise,
No jumping to failed steps so bold—
Instead upstream where agents hold,
Regenerating wisdom, tried and true! 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically summarizes the main fix: preventing unsafe resume anchors at deterministic gates whose inputs depend on skipped agent steps. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, explaining the bug, root cause, fix implementation, and test coverage. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.




@cubic-dev-ai cubic-dev-ai Bot left a comment


No issues found across 2 files


@khaliqgant khaliqgant merged commit 2dcc7b2 into main May 15, 2026
4 of 5 checks passed
@khaliqgant khaliqgant deleted the fix/auto-fix-resume-anchor-skips-agent-deps branch May 15, 2026 21:09
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 1 potential issue.

View 4 additional findings in Devin Review.


Comment on lines +339 to +341
```ts
// Prefer the most-recently-declared producer when several sit at the
// same dependency distance — it is the closest to the gate.
.sort((a, b) => b.order - a.order)[0];
```
Contributor


🟡 Inverted order assignment in parseWorkflowStepGraph causes sort to pick earliest-declared producer instead of most-recently-declared

The order field on WorkflowStepNode is assigned via a DFS counter (order++) as the visit callback recurses through the AST. For chained .step() builder calls (e.g. .step("A", {...}).step("B", {...}).step("C", {...})), the TypeScript AST nests them as outermost→innermost = last-declared→first-declared. The visit function processes the current node before recursing into children (ts.forEachChild), so the last-declared step gets the lowest order value and the first-declared step gets the highest.

The sort at line 341, .sort((a, b) => b.order - a.order)[0], picks the element with the highest order — which is the earliest-declared producer. The comment on lines 339-340 says "Prefer the most-recently-declared producer when several sit at the same dependency distance," intending the opposite. When a deterministic gate has multiple agent producers at the same BFS distance (e.g. dependsOn: ["agent-a", "agent-b"]), the wrong producer is chosen. Depending on --start-from semantics, the other producer may not re-run, leaving its stale artifacts in place and the gate still failing.

Example scenario where the bug manifests:

```ts
.step("agent-a", { agent: "a" })
.step("agent-b", { agent: "b" })
.step("gate", { type: "deterministic", dependsOn: ["agent-a", "agent-b"], ... })
```

AST visit order: gate(order=0), agent-b(order=1), agent-a(order=2). Sorting descending picks agent-a (order=2). The intent was to pick agent-b (most recently declared, closest to the gate).

Suggested change

```diff
 // Prefer the most-recently-declared producer when several sit at the
 // same dependency distance — it is the closest to the gate.
-.sort((a, b) => b.order - a.order)[0];
+.sort((a, b) => a.order - b.order)[0];
```
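The ordering issue Devin describes can be reproduced without the TypeScript compiler. This hypothetical sketch models each chained `.step(...)` call as a `CallNode` that wraps the call it was chained onto. That is a simplification: the real AST also has a property-access node in between. The sketch numbers nodes in pre-order with a shared counter, as a visitor that records a node before `ts.forEachChild` recurses would. It then compares the original descending sort with the suggested ascending one.

```typescript
// Hypothetical stand-in for the AST of a chained builder expression:
// `.step("agent-a").step("agent-b").step("gate")` nests as
// call(gate) -> call(agent-b) -> call(agent-a); outermost = last-declared.
interface CallNode {
  stepId: string;
  receiver?: CallNode; // the call this `.step(...)` was chained onto
}

interface Numbered {
  id: string;
  order: number;
}

// Pre-order traversal with a shared counter: record the current node,
// then recurse into the inner (earlier-declared) call.
function numberSteps(root: CallNode): Numbered[] {
  const out: Numbered[] = [];
  let order = 0;
  const visit = (node: CallNode | undefined): void => {
    if (!node) return;
    out.push({ id: node.stepId, order: order++ });
    visit(node.receiver);
  };
  visit(root);
  return out;
}

// Build the chain in declaration order: each new call wraps the previous one.
function chain(ids: string[]): CallNode {
  let node: CallNode | undefined = undefined;
  for (const id of ids) node = { stepId: id, receiver: node };
  return node!;
}

const numbered = numberSteps(chain(["agent-a", "agent-b", "gate"]));
// gate=0, agent-b=1, agent-a=2

const producers = numbered.filter((n) => n.id !== "gate");
const buggy = [...producers].sort((a, b) => b.order - a.order)[0].id; // "agent-a"
const fixed = [...producers].sort((a, b) => a.order - b.order)[0].id; // "agent-b"
console.log({ buggy, fixed });
```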


@github-actions
Contributor

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-15T21-04-09-684Z-HEAD-provider
Mode: provider
Git SHA: 062cfc3

Passed: 11 | Needs human: 44 | Reviewable: 44 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Scope**  
- Update the **Ricky** repository documentation to state that Ricky now leverages the **Agent Assistant** (AA) for deeper orchestration, validation, and self‑repair.  
- Add a small **code shim** (`src/agent_assistant_integration.py`) that demonstrates the new AA entry‑point and registers the required hooks.  
- Ensure all changes are isolated to a dedicated feature branch (`feat/aa‑deep‑integration`) and pass the existing CI gate (unit tests, lint, type‑check) plus a new deterministic verification gate that asserts the AA contract is satisfied.

---

## 1. Documentation Changes

| File | Section | Update |
|------|---------|--------|
| `README.md` | Overview | Add a paragraph: “Ricky now embeds the **Agent Assistant** as a first‑class orchestrator. All high‑level workflow coordination, deterministic verification, and self‑repair loops are delegated to AA via the `AgentAssistant` class.” |
| `docs/architecture.md` | Architecture diagram | Insert a new block diagram showing **Ricky → AgentAssistant → WorkflowEngine** and label the AA‑driven verification gate. |
| `docs/usage.md` | Getting Started | Add a code snippet showing how to instantiate `Ricky` with an `AgentAssis
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review (as of 2026‑05‑15)**  

| Metric | Current State | Target / SLA | Gap / Action |
|--------|---------------|--------------|--------------|
| **Workflow contracts defined** | 12 of 18 active agents have a formal `turn‑context` contract (JSON‑Schema v1.3). | 100 % | Draft contracts for the remaining 6 agents (e.g., `@agent‑assistant/knowledge‑base`, `@agent‑assistant/notification‑router`). |
| **Deterministic verification gates** | 8 agents run the `turn‑context‑gate` CI step; 4 still rely on manual sanity checks. | All agents must pass the gate on every PR. | Add the gate to the CI pipelines of the 4 outliers and tag the PR template with `requires-turn-context`. |
| **Review artifacts** | 73 % of PRs include a “Turn‑Context Impact” section; 27 % omit it. | 100 % compliance. | Update the PR checklist in `AGENTS.md` and enforce via a required PR template field. |
| **Validation loops** | Average of 42 validation loops per turn‑context change (target 80‑100). | 80‑100 loops to ensure edge‑case coverage. | Expand the automated test matrix (add 3 new context permutations per agent) and schedule a weekly “validation sprint” to hit the l
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation: Move the listed capabilities into the Agent Assistant**

| Capability | Fit for Agent Assistant | Rationale |
|------------|------------------------|-----------|
| **Workflow generation** | ✅ | The Assistant already owns the “authoring” surface (template selection, contract scaffolding, branch/PR creation). Adding generation keeps the end‑to‑end flow in one place and eliminates hand‑offs to a separate service. |
| **Workflow validation** | ✅ | Validation is a deterministic gate that the Assistant can run after generation. It aligns with the “verification gate” pattern in our standards and allows the Assistant to surface concrete errors before a PR is opened. |
| **Workflow debugging** | ✅ | Debugging is the natural follow‑up when validation fails. Embedding a step‑by‑step “debug mode” (e.g., `ricky debug <workflow-id>`) lets the Assistant provide reproducible logs, diff‑based hints, and suggested fixes without leaving the CLI. |
| **Staged CLI UX** | ✅ | A staged, wizard‑style CLI (choose template → configure → validate → preview → submit) is exactly the user‑experience the Assistant is designed to orchestrate. It enforces the 80‑to‑100 validation loop and makes t
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target window: July 1 – September 30 2024*  

---

## 1. Slice Goal  

Deliver a **production‑ready, self‑service adoption package** that lets any engineering team onboard Ricky as the “workflow reliability & coordination assistant” with zero‑touch CI integration, deterministic verification, and a documented hand‑off for ongoing governance.

Success is measured by **≥ 3 pilot teams** completing the onboarding checklist, **≥ 95 %** of their CI pipelines passing the Ricky‑contract gate, and **≤ 2 %** of post‑adoption tickets flagged as “workflow‑reliability blocker”.

---

## 2. Scope & Boundaries  

| In‑Scope | Out‑of‑Scope |
|----------|--------------|
| • Ricky‑core Docker image (v2.0) with AGENTS.md‑compliant entry points. <br>• GitHub‑Actions “Ricky‑Gate” reusable workflow (contract enforcement). <br>• CLI bootstrap script (`ricky‑bootstrap`) that creates a `ricky/` folder with typed contracts, sample tests, and a PR template. <br>• Documentation site (MkDocs) with “Adopt‑Ricky” guide, FAQ, and escalation matrix. <br>• Pilot‑team support (Slack channel, office‑hours). | • Custom agent development beyond the provided 
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky – Future Support Layer (Slack & Web)

Below is a **contract‑first design** for adding a “Ricky Support Assistant” that can be invoked from Slack or a web portal. The design follows the **Ricky repository conventions** (AGENTS.md), the **workflow standards** (deterministic gates, review artifacts, 80‑100 validation loops), and the **shared authoring rules** (scoped branches/PRs, blocker reporting).

---

### 1. High‑Level Architecture

| Component | Responsibility | Tech Stack (suggested) |
|-----------|----------------|------------------------|
| **Ricky‑Core** | Business‑logic engine, workflow orchestration, contract validation | Python 3.11, FastAPI, Pydantic v2 |
| **Slack Adapter** | Slash‑command & interactive‑message handling | Bolt for Python, Socket Mode |
| **Web Adapter** | React SPA + embedded chat widget | React 18, Vite, TypeScript, Tailwind |
| **Auth Service** | SSO (OAuth2) for both channels, token introspection | Keycloak, FastAPI‑OAuth2 |
| **Persistence** | Workflow state, audit logs, user preferences | PostgreSQL 15, SQLAlchemy 2.0 |
| **Observability** | Metrics, tracing, alerting | Prometheus, Grafana, OpenTelemetry |
| **CI/CD** | Automated contract 
...[truncated]

Must

  • Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
  • Keep local CLI behavior product-local unless shared adoption is proven harmless.
  • Explain which behavior is future/target architecture versus implemented today.

Must Not

  • Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
  • Let future surface abstractions distort the current CLI contract.
  • Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What the command does**

Running  

```bash
ricky --mode local --spec "generate a workflow for package checks"

invokes the Ricky CLI in local mode with a specification string that asks the system to author a workflow. Because the --run flag is omitted, Ricky will:

  1. Parse the spec – turn the natural‑language request into a concrete workflow contract (tasks, inputs, outputs, verification gates, etc.).
  2. Render the workflow – output the generated YAML/JSON (or the format your repo uses) to stdout and/or a file in the workflows/ directory.
  3. Stop short of execution – no tasks are scheduled, no agents are launched, and no side‑effects occur.

In short, you get a draft workflow that you can review, edit, and commit before ever running it.


Typical next steps after generation

Step Command / Action Purpose
Review Open the generated file (e.g., workflows/package-checks.yaml) in your IDE. Verify task definitions, inputs, and deterministic verification gates.
Validate ricky --mode local --validate --file workflows/package-checks.yaml Run the built‑in schema validat
...[truncated]

**Must**

- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.

**Must Not**

- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>

**User message**

> Render Ricky's first-run CLI onboarding for a new user.

**Ricky output**

```text
**Ricky – First‑Run CLI Onboarding**

Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the **Ricky** AgentWorkforce CLI. It follows the repository conventions (AGENTS.md), workflow standards, and shared authoring rules. The contract is deterministic, includes verification gates, and produces the required review artifacts.

---

## 1️⃣ Workflow Contract

| Item | Description |
|------|-------------|
| **Name** | `ricky-onboard-first-run` |
| **Scope** | New‑user environment bootstrap (config, auth, sample workflow). |
| **Inputs** | None (executed from a fresh shell). |
| **Outputs** | - `~/.ricky/config.yaml` (validated config) <br> - `~/.ricky/credentials.json` (encrypted token) <br> - `ricky-demo-workflow/` (sample workflow repo) <br> - `onboarding-report.md` (human‑readable summary). |
| **Success Criteria** | All outputs exist, config passes schema validation, and the demo workflow passes `ricky lint` and `ricky test` with ≥ 90 % pass rate. |
| **Failure Mode** | Any step aborts with a clear blocker message; the report records the blocker and exits with non‑zero status. |
| **Idempotence** | Re‑running the script after a successful onboarding is a
...[truncated]

Must

  • Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
  • End every branch with a concrete next step.
  • Advertise only commands that are currently implemented.

Must Not

  • Sound like a launch page or documentation dump.
  • Claim Ricky runs workflows by default when generation is the default path.
  • Require web or Slack onboarding before CLI use.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.recovery-guidance-no-stack-traces (cli-behavior/manual)

User message

A user gives Ricky an empty spec or a missing spec file.

Ricky output

**Blocker – Missing Specification**

The workflow authoring process requires a concrete specification file (e.g., `workflow.yaml`, `contract.json`, or equivalent) that defines:

1. **Inputs / Triggers** – event schema, required fields, and validation rules.  
2. **Steps / Tasks** – ordered actions, responsible agents, and any branching logic.  
3. **Outputs / Artifacts** – expected results, data contracts, and downstream dependencies.  
4. **Verification Gates** – deterministic checks (unit tests, schema validation, integration assertions) that must pass before promotion.  
5. **Review & Release Metadata** – owners, reviewers, version tags, and PR branch conventions.

Without this spec, I cannot:

- Generate a deterministic workflow contract.  
- Create verification gates or validation loops.  
- Scope the branch/PR boundaries for review.  
- Produce any of the required artifacts (e.g., `README.md`, test suites, CI configs).

**Next Steps**

1. **Provide the missing spec** – upload or paste the complete specification file in the repository root (or the path you intend to use).  
2. **If you don’t have a spec yet**, let me know the high‑level goal of the workflow (trigger, major ste
...[truncated]

Must

  • Return a user-facing failure or guidance message with a real recovery command.
  • Distinguish generation failure from execution failure.
  • Show stack traces only when verbose diagnostic mode is requested.

Must Not

  • Crash with an uncaught exception in normal mode.
  • Suggest commands that do not exist.
  • Pretend a missing spec was accepted.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**  

=== AgentWorkforce Status Report ===

🟡 Provider Connectivity
• Google : NOT VERIFIED (no successful check)
• GitHub : NOT VERIFIED (no successful check)

🔧 Last Run
• Workflow Engine : idle
• Last Execution : N/A

📦 Configuration
• Active Branches : main
• Loaded Plugins : core, workflow‑engine, provider‑adapters

⚠️ Alerts
• No successful connectivity verification for Google or GitHub.
• Dependent workflows that require these providers are currently blocked.

🗒️ Next Steps (recommended)

  1. Run provider health checks:
    ricky check --provider google
    ricky check --provider github
  2. Review API credentials and network access for each provider.
  3. Re‑run ricky status after successful checks to confirm connectivity.

--- End of Report ---

Must

  • Report unknown or not-connected provider state honestly.
  • Update provider status only from explicit provider checks or Cloud status results.
  • Give concrete setup guidance for Cloud when relevant.

Must Not

  • Mark Google or GitHub connected because guidance text was shown.
  • Invent a provider connection URL or OAuth flow.
  • Show empty fields with no recovery guidance when config is missing.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)

User message

Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

Ricky output

**GitHub Primitive‑Change Workflow**

| Phase | Step | Action | Owner | Artifact | Verification Gate |
|-------|------|--------|-------|----------|-------------------|
| **1️⃣ Planning** | 1.1 | Open a **Feature‑Request Issue** (or Bug‑Report) with a clear title, description, and `type: primitive‑change`. | Reporter | Issue #ID (template) | Issue must contain `type: primitive‑change` label and acceptance criteria. |
| | 1.2 | Add **Ricky‑Review** label to trigger the workflow. | Reporter | Updated Issue | Presence of label verified by GitHub Actions. |
| **2️⃣ Branch Creation** | 2.1 | Ricky auto‑generates a short‑lived branch `prc/<issue‑num>-<slug>` from `main`. | Ricky (bot) | Branch `prc/123‑update‑readme` | Branch exists and is protected (no direct pushes). |
| **3️⃣ Implementation** | 3.1 | Developer clones the branch, makes the primitive change (e.g., edit a single file, add a line, bump a version). | Developer | Commit(s) on branch | Lint/format CI passes. |
| | 3.2 | Developer opens a **Pull Request** targeting `main` with the issue linked (`Closes #123`). | Developer | PR #ID | PR template filled, issue linked, `ricky/ready` label added. |
| **4️⃣ Automated Checks** | 4.1
...[truncated]

Must

  • Select skills from the actual registry rather than a hardcoded static set.
  • Record ranked skill matches with confidence and trigger evidence.
  • Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

  • Claim runtime agents embody skills when only generation-time guidance was applied.
  • Fail generation solely because optional skill files are missing.
  • Hide skill selection evidence from artifacts or JSON output.
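
The registry-backed selection described in these criteria can be sketched as a small ranking function. This is a minimal sketch under stated assumptions. The `SkillEntry` shape, the keyword triggers, the 0.5 threshold and `selectSkills` itself are hypothetical and are not Ricky's actual matcher. The sketch produces ranked matches with confidence and trigger evidence. When the registry is missing or no skill clears the threshold, it returns a fallback reason instead of failing.

```typescript
// Hypothetical registry shape: the real Ricky registry format is not shown in this report.
interface SkillEntry {
  id: string;
  triggers: string[]; // lowercase keywords whose presence suggests the skill applies
}

interface SkillMatch {
  id: string;
  confidence: number; // fraction of the skill's triggers found in the spec
  evidence: string[]; // the triggers that actually matched, kept for the artifact
}

interface SkillSelection {
  matches: SkillMatch[]; // ranked, highest confidence first
  fallbackReason?: string; // set when no skill could be applied
}

function selectSkills(
  spec: string,
  registry: SkillEntry[] | undefined,
  threshold = 0.5,
): SkillSelection {
  // A missing or empty registry degrades to "no skills" instead of failing generation.
  if (!registry || registry.length === 0) {
    return { matches: [], fallbackReason: "skill registry missing or empty" };
  }
  const text = spec.toLowerCase();
  const matches = registry
    .map((skill): SkillMatch => {
      const evidence = skill.triggers.filter((t) => text.includes(t));
      const confidence =
        skill.triggers.length === 0 ? 0 : evidence.length / skill.triggers.length;
      return { id: skill.id, confidence, evidence };
    })
    .filter((m) => m.confidence >= threshold)
    // Highest confidence first; ties broken by id so the ranking is deterministic.
    .sort((a, b) => b.confidence - a.confidence || a.id.localeCompare(b.id));
  return matches.length > 0
    ? { matches }
    : { matches: [], fallbackReason: `no skill cleared threshold ${threshold}` };
}
```

Naive substring matching is only a stand-in. A real matcher would likely weight triggers differently, but the contract stays the same: ranked output, recorded evidence, and a graceful fallback.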

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow: “Claude‑Review → Codex‑Implement”**  
*Purpose*: Automate a deterministic pipeline where Claude (LLM‑reviewer) validates design/spec changes and Codex (code‑generation model) produces the implementation. The workflow is scoped to a single feature branch and is gated by explicit contracts and verification loops.

---

## 1. Scope & Branching

| Item | Definition |
|------|------------|
| **Feature Branch** | `feature/claude‑review‑codex‑impl` (created from `main`) |
| **PR Target** | `main` |
| **Artifacts** | - `design.md` (design spec) <br> - `review_report.json` (Claude output) <br> - `src/` (generated code) <br> - `test/` (auto‑generated tests) |
| **Boundaries** | No cross‑repo changes; only files under `src/` and `test/` may be modified by Codex. All other directories are read‑only for this run. |

---

## 2. Contracts

### 2.1 Input Contract (Design Spec)

```json
{
  "type": "object",
  "required": ["feature_id", "description", "acceptance_criteria"],
  "properties": {
    "feature_id": {"type": "string"},
    "description": {"type": "string"},
    "acceptance_criteria": {
      "type": "array",
      "items": {"type": "string"}
    },
    "constraints": {
      
...[truncated]

Must

  • Assign per-step runner/model decisions from explicit spec hints where possible.
  • Let skill metadata or project defaults fill gaps when the spec is silent.
  • Write tool-selection.json or equivalent audit metadata.

Must Not

  • Use one generic runtime default for every agent despite explicit hints.
  • Let skill defaults override explicit user/spec runner hints.
  • Omit the reason each tool or model was chosen.
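
The precedence rule in these criteria fits in one small function. An explicit spec hint comes first, then skill metadata, then a project default, and every choice records a reason. This is a hedged sketch: `selectRunners`, `RunnerDecision` and the runner names are illustrative assumptions, and the real `tool-selection.json` layout may differ.

```typescript
// Hypothetical shapes; the real tool-selection.json schema is not shown in this report.
type DecisionSource = "spec-hint" | "skill-default" | "project-default";

interface RunnerDecision {
  stepId: string;
  runner: string;
  source: DecisionSource;
  reason: string; // recorded so the audit file explains every choice
}

function selectRunners(
  stepIds: string[],
  specHints: Map<string, string>, // explicit "use X for step Y" hints from the spec
  skillDefaults: Map<string, string>, // runner suggestions from skill metadata
  projectDefault: string,
): RunnerDecision[] {
  return stepIds.map((stepId): RunnerDecision => {
    // Explicit spec hints win over skill metadata, which wins over the project default.
    const hint = specHints.get(stepId);
    if (hint !== undefined) {
      return { stepId, runner: hint, source: "spec-hint", reason: `spec assigns ${hint} to ${stepId}` };
    }
    const skill = skillDefaults.get(stepId);
    if (skill !== undefined) {
      return { stepId, runner: skill, source: "skill-default", reason: `spec is silent; skill metadata suggests ${skill}` };
    }
    return { stepId, runner: projectDefault, source: "project-default", reason: `no spec hint or skill default; using ${projectDefault}` };
  });
}
```

For the spec in this case, `specHints` would map the review step to `claude` and the implementation step to `codex`. Serializing the result with `JSON.stringify({ decisions }, null, 2)` is one possible shape for the audit file.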

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

## Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is the deterministic “polish” flag for any **AgentWorkforce** workflow file (`.awf`). It runs a second‑pass pass‑through that:

1. **Normalises** free‑form language into the **Task Description** schema.  
2. **Adds** or **tightens** the **Acceptance Gate** clause (pre‑, post‑, or invariant).  
3. **Validates** the refined step against the **Workflow Contract** (see below).  

Below is a **step‑by‑step contract** you can copy‑paste into your repo, followed by the exact CLI usage pattern.

---

### 1️⃣ Minimal Workflow Contract (YAML)

```yaml
# .awf/contracts/standard.yml
contract_version: 1.0
steps:
  - id:          # required, unique string
    description: # required, 1‑sentence imperative
    inputs:      # optional, map of name → type
    outputs:     # optional, map of name → type
    gate:        # required, one of [pre, post, invariant]
      condition: # boolean expression over inputs/outputs
      timeout:   # optional, seconds (default 300)
      retry:     # optional, max attempts (default 0)

Key rules (enforced by --refine):

Rule What it enforces
...[truncated]

**Must**

- Keep the deterministic unrefined path as the default.
- Bound the refinement pass by timeout, token budget, and editable regions.
- Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

**Must Not**

- Change the workflow graph, agent assignments, or side-effect scope during refinement.
- Fail the whole generation if optional refinement times out.
- Hide warnings when the deterministic artifact is returned unchanged.
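
One way to honor these constraints is to treat refinement output as a proposal. The proposal is applied only if it leaves the graph intact, stays within budget and passes revalidation. The sketch below assumes a hypothetical `Step` shape and `applyRefinement` helper. It is synchronous, so a timeout is modeled as the provider returning an `Error`. A real pass would race the provider call against a timer and report the timeout the same way.

```typescript
// Hypothetical step shape: only `task` and `gate` text is treated as editable.
interface Step {
  id: string;
  agent?: string;
  dependsOn: string[];
  task: string;
  gate: string;
}

interface RefineResult {
  steps: Step[];
  refined: boolean;
  warnings: string[]; // always non-empty when the deterministic artifact is returned
}

// Structural fingerprint: ids, agents and dependencies must survive refinement unchanged.
function graphKey(steps: Step[]): string {
  return JSON.stringify(steps.map((s) => [s.id, s.agent ?? null, [...s.dependsOn].sort()]));
}

function applyRefinement(
  original: Step[],
  proposal: { steps: Step[]; tokensUsed: number } | Error, // Error: provider failure or timeout
  tokenBudget: number,
  validate: (steps: Step[]) => string[], // returns validation errors
): RefineResult {
  const fallback = (warning: string): RefineResult => ({ steps: original, refined: false, warnings: [warning] });
  if (proposal instanceof Error) return fallback(`refinement provider failed: ${proposal.message}`);
  if (proposal.tokensUsed > tokenBudget) {
    return fallback(`refinement exceeded token budget (${proposal.tokensUsed} > ${tokenBudget})`);
  }
  if (graphKey(proposal.steps) !== graphKey(original)) {
    return fallback("refinement changed the workflow graph or agent assignments; discarded");
  }
  const errors = validate(proposal.steps);
  if (errors.length > 0) return fallback(`refined workflow failed validation: ${errors.join("; ")}`);
  return { steps: proposal.steps, refined: true, warnings: [] };
}
```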

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.behavior-grounded-gates</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for the `ricky --version` spec.

**Ricky output**

```text
**Workflow: `ricky --version` Specification & Release**  
*(Ricky AgentWorkforce – v0.9.0)*  

---

## 1. Overview
The `ricky --version` command must reliably report the current Ricky CLI version, the Git commit SHA, and the build timestamp. This workflow defines the end‑to‑end process for **spec authoring, implementation, deterministic verification, and release**. All artifacts are version‑controlled under `specs/ricky-version/`.

---

## 2. Scope & Boundaries
| Item | Scope |
|------|-------|
| **Branch** | `feat/ricky-version-spec` (created from `main`) |
| **PR** | `ricky-version-spec` – contains spec, tests, docs, and CI config |
| **Artifacts** | `specs/ricky-version/spec.yaml`, `tests/ricky-version.test.js`, `docs/cli/version.md` |
| **Exclusions** | UI changes, unrelated CLI flags, packaging scripts not directly tied to version output. |

---

## 3. Contract (spec.yaml)

```yaml
# specs/ricky-version/spec.yaml
name: ricky-version
description: |
  Implements `ricky --version` to output semantic version, git SHA,
  and build timestamp in a deterministic, machine‑parseable format.
version: 1.0.0
author: Ricky Team <ricky@agentworkforce.io>
status: draft   # → review → approved
...[truncated]
```

Must

  • Build gates from the stated acceptance behavior, such as checking ricky --version output.
  • Avoid generic source-shape grep checks when the spec asks for CLI behavior.
  • Keep generated validation meaningful for the current repo shape.

Must Not

  • Treat grep -Eq 'export|function|class|workflow(' dist/ricky.js as proof of version behavior.
  • Claim the workflow is proven by source syntax alone.
  • Ignore the package-json version resolution order in the spec.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact Validation & Auto‑Fix Loop”**  
*Purpose*: Process a large collection of independent artifacts (e.g., data files, config bundles, code modules) through a deterministic validation stage, apply automated fixes where possible, and re‑validate until all artifacts pass or are marked as blockers. The workflow is scoped as a single branch (`batch‑validation‑loop`) with a dedicated PR for each release cycle.

---

## 1. High‑Level Contract

| Item | Description |
|------|-------------|
| **Input** | `artifacts/` – a directory containing N independent artifact sub‑folders (`artifact‑001/`, `artifact‑002/`, …). Each sub‑folder must contain a manifest `manifest.yaml` describing type, schema version, and optional fix scripts. |
| **Output** | `results/` – for each artifact a JSON result file `artifact‑XXX.result.json` with status (`PASS`, `FIXED`, `FAIL`) and a log reference. A global summary `batch_summary.json`. |
| **Success Criteria** | All artifacts have status `PASS` or `FIXED`. No artifact remains in `FAIL` **unless** it is flagged as a *hard blocker* (see §5). |
| **Determinism** | All steps are pure functions of the artifact contents and the version‑pinned 
...[truncated]

Must

  • Choose dag, supervisor, or pipeline deliberately based on the work shape.
  • Explain the pattern choice in artifact metadata or a rationale.
  • Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

  • Default blindly to dag for every workflow.
  • Collapse independent artifact work into one vague agent task.
  • Omit verification gates because the chosen pattern seems obvious.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow  

Below is a step‑by‑step illustration of the **five core writing skills** we emphasize in the Ricky repository and the concrete impact each one has on the final workflow artifact. The example workflow is a simple “**On‑Call Incident Triage**” pipeline that routes a newly created incident to the appropriate responder group, logs the hand‑off, and notifies stakeholders.

| # | Writing Skill | What the Skill Enforces | Direct Effect on the Generated Workflow |
|---|---------------|------------------------|------------------------------------------|
| 1 | **Deterministic Contract Definition** | Every step must declare **input schema**, **output schema**, and **exit codes** in a `contract.yaml` file. | The generated workflow contains a **`contract` block** for each node (e.g., `fetch_incident`, `classify_severity`). The runtime engine validates payloads before and after each node, preventing schema drift. |
| 2 | **Verification Gates** | Insert **pre‑ and post‑execution checks** (`gate.yaml`) that assert invariants (e.g., required fields, idempotency). | After `classify_severity`, a gate asserts that `severity` 
...[truncated]

Must

  • Describe skills as generation-time selection, loading, and rendering inputs.
  • Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
  • Say the workflow instructions were informed by selected skills.

Must Not

  • Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
  • Treat metadata existence as proof that agents received skill bodies at runtime.
  • Overstate current tests beyond the generation boundary.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

**Ricky – Workflow Generation Assistant**

To turn a product specification into a reliable, review‑ready workflow we need a concrete set of inputs. Because the spec you provided is vague, I’m going to ask for the missing details that let us create a deterministic, gate‑checked workflow that complies with our **AGENTS.md** conventions and the **workflow standards** (contract definition, verification gates, review artifacts, 80‑100 validation loops, blocker reporting, and scoped PR boundaries).

---

## Required Clarifications

| Area | What I need from you | Why it matters |
|------|----------------------|----------------|
| **Goal & Scope** | A concise statement of the product’s primary outcome (e.g., “generate a PDF invoice from an order payload”) and any bounded sub‑features. | Defines the **workflow contract** (inputs → outputs) and limits the PR scope. |
| **Inputs** | Exact data schema (JSON/YAML, field types, required vs optional) and source (API, DB, file). | Enables deterministic type‑checking and input validation gates. |
| **Outputs** | Desired artifact(s) format, location, and any post‑processing (e.g., “store PDF in S3 bucket `invoices/`”). | Drives the final **verifica
...[truncated]

Must

  • Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
  • Validate with dry-run or targeted structural checks where possible.
  • Return artifacts, warnings, and follow-up commands honestly.

Must Not

  • Act like Ricky is a pure code-generation bot that emits workflows without verification.
  • Stop at "code compiles" as the proof bar.
  • Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s post‑failure playbook**

1. **Capture the failure context**  
   - Pull the run‑log for the failed workflow execution.  
   - Record the exact step that timed‑out (step X, timeout = Y s) and the worker that remained alive (worker ID Z, heartbeat > last‑artifact = Δ s).  
   - Store this snapshot in the *failure‑artifact* bucket under `failures/<run‑id>/context.json`.  

2. **Mark a deterministic blocker**  
   - Create a **Blocker Ticket** in the workflow tracker (e.g., GitHub Issue `#wf‑<run‑id>-blocker`).  
   - Title: `Timeout on step X + orphaned worker Z`.  
   - Body must contain:
     * Run ID, commit SHA, branch.
     * Full log excerpt (step X stdout/stderr, worker Z heartbeat trace).
     * The exact contract that was violated (e.g., `StepX.timeout ≤ 300s`, `Worker.alive ⇒ artifactProducedWithin(Δ=60s)`).
   - Add the label `workflow‑blocker` and assign to the *Reliability* team.

3. **Trigger the verification gate**  
   - Emit a **Gate‑Event** `workflow.blocked` with payload `{runId, step: X, worker: Z, blockerId}`.  
   - The gate’s policy (`workflow-repair-gate.yml`) requires:
     * At least one reviewer from *Reliability* to acknowledge the blocker.
     *
...[truncated]

Must

  • Classify the failure before deciding whether to retry.
  • Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
  • Preserve raw evidence and uncertainty when the class is not known.

Must Not

  • Blindly rerun the whole workflow without a blocker classification.
  • Treat every failure as a broken workflow definition.
  • Claim Ricky fixed the workflow before rerun evidence proves it.
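
The classify-then-decide rule can be sketched as a pure function over step evidence. The failure classes come from the criteria above. The `StepEvidence` fields, the stderr patterns and the retry table are assumptions chosen for illustration, not Ricky's real classifier.

```typescript
// Failure classes taken from the criteria above; the detection rules are illustrative heuristics.
type FailureClass =
  | "timeout"
  | "agent-runtime-opacity"
  | "environment-blocker"
  | "workflow-structure-bug"
  | "validation-strategy-mismatch"
  | "unknown";

interface StepEvidence {
  stepId: string;
  timedOut: boolean;
  exitCode: number | null; // null: the worker never exited
  producedArtifacts: boolean;
  stderr: string;
}

interface Classification {
  stepId: string;
  failureClass: FailureClass;
  retryable: boolean;
  evidence: StepEvidence; // raw evidence carried through unchanged
}

// Only a plain timeout is retried automatically; every other class needs a decision first.
const RETRYABLE: Record<FailureClass, boolean> = {
  timeout: true,
  "agent-runtime-opacity": false,
  "environment-blocker": false,
  "workflow-structure-bug": false,
  "validation-strategy-mismatch": false,
  unknown: false,
};

function classifyFailure(e: StepEvidence): Classification {
  let failureClass: FailureClass = "unknown"; // uncertainty is kept, not guessed away
  if (e.timedOut) {
    failureClass = "timeout";
  } else if (e.exitCode === null && !e.producedArtifacts) {
    failureClass = "agent-runtime-opacity"; // alive but silent: nothing to inspect yet
  } else if (/ENOENT|EACCES|command not found|MISSING_ENV_VAR/.test(e.stderr)) {
    failureClass = "environment-blocker";
  } else if (/dependency cycle|unknown step|duplicate step id/i.test(e.stderr)) {
    failureClass = "workflow-structure-bug";
  } else if (e.exitCode !== 0 && e.producedArtifacts) {
    failureClass = "validation-strategy-mismatch"; // artifacts exist but the gate rejected them
  }
  return { stepId: e.stepId, failureClass, retryable: RETRYABLE[failureClass], evidence: e };
}
```

For the scenario in this case, the timed-out step is classified as `timeout` and the silent live worker as `agent-runtime-opacity`. Only the first is eligible for an automatic retry, and both records keep their raw evidence.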

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow Contract – Stale‑State Detection for Local Launches**  

| **Element** | **Definition** |
|-------------|----------------|
| **Name** | `detect‑stale‑state` |
| **Scope** | Runs **before** any local workflow launch (`aw launch …`). |
| **Inputs** | - Working directory (root of the repo). |
| **Outputs** | - `stale: true|false` flag. <br> - `stalePaths: string[]` (relative paths of stale directories). |
| **Exit Codes** | `0` – No stale state detected (or user elected to continue). <br> `1` – Stale state detected and user aborted. |
| **Side‑effects** | None (read‑only). |
| **Deterministic Gate** | Must produce the same `stale` result for identical file‑system snapshots. |
| **Verification** | Unit test `detect‑stale‑state.test.ts` asserts: <br>1. Returns `false` when none of the target dirs exist. <br>2. Returns `true` with correct `stalePaths` when any of the three dirs contain files older than **30 min**. <br>3. Returns `false` when dirs exist but are empty. |
| **Review Artifacts** | - PR must include the test file. <br> - CI job `verify‑stale‑detect` runs the test suite and a static‑analysis lint pass. |
| **Blocker Reporting** | If the gate fails, the CI job posts 
...[truncated]

Must

  • Classify stale local runtime state as an environment contamination issue.
  • Recommend quarantine or isolated-run guidance before launch.
  • Record the observed paths and the action taken or recommended.

Must Not

  • Treat stale runtime state as a workflow logic failure.
  • Delete or overwrite state without an explicit safe path or user intent.
  • Continue into execution as if the workspace were clean.
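
A read-only detector for the three directories named in this case might look like the sketch below. It assumes that a non-empty directory signals leftover state; Ricky's real staleness rule may differ, for example by using file age. `detectStaleRuntimeState` is a hypothetical name. The function only reports and recommends and never deletes anything.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// The three runtime-state directories named in the scenario above.
const RUNTIME_STATE_DIRS = [".agent-relay", ".relay", ".trajectories"];

interface StaleStateReport {
  classification: "clean" | "environment-contamination";
  observedPaths: string[];
  recommendation: string;
}

// Read-only check: it inspects the workspace but never deletes or moves anything.
// Assumption: a non-empty state directory counts as leftover state from an earlier run.
function detectStaleRuntimeState(workspace: string): StaleStateReport {
  const observedPaths = RUNTIME_STATE_DIRS.filter((dir) => {
    const full = path.join(workspace, dir);
    return (
      fs.existsSync(full) &&
      fs.statSync(full).isDirectory() &&
      fs.readdirSync(full).length > 0
    );
  });
  if (observedPaths.length === 0) {
    return { classification: "clean", observedPaths, recommendation: "launch normally" };
  }
  return {
    classification: "environment-contamination",
    observedPaths,
    recommendation:
      `quarantine ${observedPaths.join(", ")} (move aside only with explicit user consent) ` +
      "or launch from an isolated checkout",
  };
}
```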

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**How to resolve “another Ricky or Relay run is already active in this workspace”**

The marker that blocks a new run is created when a Ricky/Relay execution starts and is only removed when the process finishes cleanly. If the marker is left behind (e.g., crash, forced termination, or a parallel launch), the next invocation will abort with the message you’re seeing.

Follow the deterministic clean‑up procedure below. Each step is a **verification gate** – stop and confirm before moving to the next one. This keeps the workspace in a known state and avoids hidden side‑effects.

---

## 1️⃣ Verify the stale marker

| Action | Command | Expected result |
|--------|---------|-----------------|
| List the marker file | `ls -l .ricky/run.marker` | File exists, size > 0 |
| Inspect its contents | `cat .ricky/run.marker` | JSON with fields `pid`, `startedAt`, `runId` |
| Check the PID | `ps -p <pid>` | *No* process found → marker is stale |

If the PID **is** still running, skip to **Step 4** (the run is genuine).

---

## 2️⃣ Safely remove the stale marker

> **Gate 2 – Confirmation**: Only delete if the PID is not alive.

```bash
# From the workspace root
if [ -f .ricky/run.marker ]; then
...[truncated]

Must

  • Report the active marker, run id, or status path when available.
  • Ask the user to inspect, wait for, or explicitly clear the active run.
  • Avoid launching a competing run that could corrupt evidence.

Must Not

  • Silently start another run.
  • Hide the existing run marker from the user.
  • Treat the conflict as a generic failure with no recovery path.
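
A report-only marker check could look like this sketch. The JSON marker layout (`pid`, `runId`, `startedAt`) and the helper names are assumptions. `process.kill(pid, 0)` is Node's standard way to test whether a process exists without sending it a signal.

```typescript
import * as fs from "node:fs";

// Hypothetical marker format: a JSON file such as {"pid": 123, "runId": "...", "startedAt": "..."}.
interface RunMarker {
  pid: number;
  runId: string;
  startedAt: string;
}

type MarkerCheck =
  | { state: "none" }
  | { state: "active" | "stale"; markerPath: string; marker: RunMarker }
  | { state: "unreadable"; markerPath: string; error: string };

function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0 tests for existence without sending anything
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user.
    return (err as { code?: string }).code === "EPERM";
  }
}

// Report-only: callers surface the result to the user and never launch over an active run.
function checkRunMarker(markerPath: string): MarkerCheck {
  if (!fs.existsSync(markerPath)) return { state: "none" };
  try {
    const marker = JSON.parse(fs.readFileSync(markerPath, "utf8")) as RunMarker;
    if (typeof marker.pid !== "number") {
      return { state: "unreadable", markerPath, error: "marker has no numeric pid" };
    }
    return { state: isProcessAlive(marker.pid) ? "active" : "stale", markerPath, marker };
  } catch (err) {
    return { state: "unreadable", markerPath, error: String(err) };
  }
}
```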

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.env-loader-injection-runtime-loadable (runtime-recovery/manual)

User message

A workflow artifact references a MISSING_ENV_VAR value. Ricky's deterministic auto-fix injects the .env.local / .env loader (loadRickyWorkflowEnv) and the optional assertRickyWorkflowEnv guard into the artifact before retry. The artifact may be a master-rendered workflow whose .step({ command: ... }) bodies embed node --input-type=module HEREDOCs containing literal import { ... } from 'node:fs' / from 'node:path' strings.

Ricky output

**Deterministic auto‑fix for missing environment variables**

Below is the concrete, reproducible patch you can apply to any workflow artifact that currently fails because it references a `MISSING_ENV_VAR`. The fix does three things:

1. **Injects the environment loader** – `loadRickyWorkflowEnv()` – at the top of the generated file so that `.env.local` (or fallback `.env`) is parsed before any step runs.  
2. **Adds an optional guard** – `assertRickyWorkflowEnv()` – that throws a clear error if the required variable is still undefined after loading.  
3. **Wraps each step’s inline Node‑module script** (the HEREDOC that contains `node --input-type=module`) with a tiny bootstrap that guarantees the loader runs **once** per workflow execution, even when the step bodies are rendered as raw strings.

---

## 1. Shared helper module (`ricky-workflow-env.ts`)

Create (or update) a helper file that lives next to your workflow source (e.g., `src/ricky-workflow-env.ts`). This file is **import‑free** and can be required from any generated artifact.

```ts
// src/ricky-workflow-env.ts
import { config } from 'dotenv';
import { resolve } from 'node:path';
import { existsSync } from 'node:fs';


...[truncated]

Must

  • Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected loadRickyWorkflowEnv body references rickyWorkflowFs.* and rickyWorkflowPath.*, so the repair must also add the corresponding import * as rickyWorkflowFs from 'node:fs' and import * as rickyWorkflowPath from 'node:path' aliases at module top level.
  • Detect existing alias imports by matching real top-level import * as <alias> from '<module>' statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in .step({ command: ... }) bodies do not count as imports).
  • Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

Must Not

  • Skip adding the rickyWorkflowFs / rickyWorkflowPath aliases because from 'node:fs' or from 'node:path' already appears somewhere in the file as a string literal.
  • Inject loadRickyWorkflowEnv (or assertRickyWorkflowEnv) without the supporting alias imports, which produces a ReferenceError: rickyWorkflowPath is not defined at module load and burns the auto-fix budget on UNSUPPORTED_RUNTIME at runtime-launch.
  • Rewrite or escape the embedded HEREDOC text in step commands.
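
The import-detection rule can be illustrated with a small lexer that never looks inside strings, template literals or comments. That way, HEREDOC text in a `.step({ command: ... })` body cannot be mistaken for a top-level import. This is a simplified stand-in for a real parser; the TypeScript compiler API would be the grammar-aware choice. `findNamespaceImports` and `ensureAliasImports` are hypothetical names, and regex literals are not handled.

```typescript
// Illustrative lexer, not the TypeScript AST: it skips comments, quoted strings and
// template literals (including nested `${...}`), but does not recognize regex literals.
type Token = { kind: "word" | "punct" | "string"; value: string };
interface NamespaceImport { alias: string; module: string }

function skipQuoted(src: string, i: number): number {
  let j = i + 1;
  while (j < src.length && src[j] !== src[i]) j += src[j] === "\\" ? 2 : 1;
  return j + 1; // just past the closing quote
}

function skipTemplate(src: string, i: number): number {
  let j = i + 1;
  while (j < src.length && src[j] !== "`") {
    if (src[j] === "\\") j += 2;
    else if (src.startsWith("${", j)) j = skipExpression(src, j + 2);
    else j++;
  }
  return j + 1; // just past the closing backtick
}

// Skips the body of a `${...}` placeholder, returning the index just past its `}`.
function skipExpression(src: string, i: number): number {
  let depth = 0;
  let j = i;
  while (j < src.length) {
    const c = src[j];
    if (c === "`") { j = skipTemplate(src, j); continue; }
    if (c === "'" || c === '"') { j = skipQuoted(src, j); continue; }
    if (c === "}" && depth === 0) return j + 1;
    if (c === "{") depth++;
    if (c === "}") depth--;
    j++;
  }
  return j;
}

function tokenize(src: string): Token[] {
  const tokens: Token[] = [];
  let i = 0;
  while (i < src.length) {
    const c = src[i];
    if (/\s/.test(c)) {
      i++;
    } else if (src.startsWith("//", i)) {
      const nl = src.indexOf("\n", i);
      i = nl === -1 ? src.length : nl;
    } else if (src.startsWith("/*", i)) {
      const end = src.indexOf("*/", i + 2);
      i = end === -1 ? src.length : end + 2;
    } else if (c === "'" || c === '"') {
      const end = skipQuoted(src, i);
      tokens.push({ kind: "string", value: src.slice(i + 1, end - 1) });
      i = end;
    } else if (c === "`") {
      i = skipTemplate(src, i); // template text (e.g. a HEREDOC) never becomes code
      tokens.push({ kind: "string", value: "" });
    } else if (/[A-Za-z_$]/.test(c)) {
      let j = i + 1;
      while (j < src.length && /[\w$]/.test(src[j])) j++;
      tokens.push({ kind: "word", value: src.slice(i, j) });
      i = j;
    } else {
      tokens.push({ kind: "punct", value: c });
      i++;
    }
  }
  return tokens;
}

// Finds `import * as <alias> from '<module>'` statements in real code only.
function findNamespaceImports(src: string): NamespaceImport[] {
  const t = tokenize(src);
  const is = (k: number, kind: Token["kind"], value?: string) =>
    t[k].kind === kind && (value === undefined || t[k].value === value);
  const found: NamespaceImport[] = [];
  for (let k = 0; k + 5 < t.length; k++) {
    if (is(k, "word", "import") && is(k + 1, "punct", "*") && is(k + 2, "word", "as") &&
        is(k + 3, "word") && is(k + 4, "word", "from") && is(k + 5, "string")) {
      found.push({ alias: t[k + 3].value, module: t[k + 5].value });
    }
  }
  return found;
}

const REQUIRED_ALIASES: NamespaceImport[] = [
  { alias: "rickyWorkflowFs", module: "node:fs" },
  { alias: "rickyWorkflowPath", module: "node:path" },
];

// Prepends only the alias imports that are genuinely missing; the rest of the file is untouched.
function ensureAliasImports(src: string): string {
  const present = findNamespaceImports(src);
  const missing = REQUIRED_ALIASES.filter(
    (req) => !present.some((p) => p.alias === req.alias && p.module === req.module),
  );
  return missing.map((m) => `import * as ${m.alias} from '${m.module}';\n`).join("") + src;
}
```

Because the function only prepends missing lines, the embedded HEREDOC text stays byte-for-byte identical, and a second pass is a no-op.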

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Run**

Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script) to run the workflow, trigger the auto‑fix path on the first failure, and verify the repaired artifact. The contract assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failed step identifier (`<FAILED_STEP_ID>`) and the previous run ID (`<PREV_RUN_ID>`) are known.

---

## 1️⃣ Preconditions (verify before you start)

| Check | Command | Expected outcome |
|-------|---------|------------------|
| **CLI availability** | `aw --version` | `aw version x.y.z` |
| **Workflow repo is clean** | `git status --porcelain` | No uncommitted changes |
| **Docker (or runtime) is running** | `docker ps` | At least one container listed (or appropriate runtime) |
| **Previous run metadata** | `aw run get <PREV_RUN_ID> --json` | JSON containing `runId`, `status`, and `steps` (including `<FAILED_STEP_ID>`) |
| **Auto‑fix feature enabled** | `aw config get autoFix.enabled` | `true` (if `false`, run `aw config set autoFix.enabled true`) |

If any check fails, resolve it before proceeding.

---

## 2️⃣ Execution Contract

```bash
# --------
...[truncated]

Must

  • Use a bounded retry budget and summarize every attempt.
  • Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
  • Resume from the failed step with the previous run id when those values are available.

Must Not

  • Edit arbitrary repository source files as the default auto-fix surface.
  • Keep retrying after the configured max attempts.
  • Lose the single Ricky tracking run id across repair/resume attempts.
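
This PR's summary refines the "resume from the failed step" rule. When the failed step is a non-regenerating gate whose inputs come from agent steps, resuming from the gate just replays the same failure. The sketch below restates that idea over a plain step list. `pickResumeAnchor` and the `WorkflowStep` shape are hypothetical and simpler than the PR's `resolveSafeResumeAnchor`, which parses the artifact's TypeScript AST. Here, "nearest producer" is read as the first dependency level that contains an agent step, taking the earliest of those steps in declaration order.

```typescript
// Simplified step model; the PR itself derives the graph from the artifact's TypeScript AST.
interface WorkflowStep {
  id: string;
  type?: string; // e.g. "deterministic", "worktree", "integration"
  agent?: string; // set on steps that regenerate their outputs
  dependsOn: string[];
  config?: string; // serialized step config, compared to detect a repair of this step
}

const NON_REGENERATING_TYPES = new Set(["deterministic", "worktree", "integration"]);

function isNonRegenerating(step: WorkflowStep): boolean {
  return step.agent === undefined && step.type !== undefined && NON_REGENERATING_TYPES.has(step.type);
}

function pickResumeAnchor(failedId: string, steps: WorkflowStep[], repairedSteps: WorkflowStep[]): string {
  const byId = new Map(steps.map((s) => [s.id, s] as const));
  const failed = byId.get(failedId);
  if (!failed || !isNonRegenerating(failed)) return failedId;

  // The repair rewrote the failed step itself, so the fix lives there: resume from it.
  const repaired = repairedSteps.find((s) => s.id === failedId);
  if (repaired !== undefined && repaired.config !== failed.config) return failedId;

  // Breadth-first over dependencies; stop at the first level that contains an agent step.
  const order = new Map(steps.map((s, index) => [s.id, index] as const));
  const seen = new Set<string>([failedId, ...failed.dependsOn]);
  let frontier = Array.from(new Set(failed.dependsOn));
  while (frontier.length > 0) {
    const producers = frontier.filter((id) => byId.get(id)?.agent !== undefined);
    if (producers.length > 0) {
      // Several equally near producers: start from the earliest so all of them rerun.
      producers.sort((a, b) => (order.get(a) ?? 0) - (order.get(b) ?? 0));
      return producers[0];
    }
    const next: string[] = [];
    for (const id of frontier) {
      for (const dep of byId.get(id)?.dependsOn ?? []) {
        if (!seen.has(dep)) {
          seen.add(dep);
          next.push(dep);
        }
      }
    }
    frontier = next;
  }
  return failedId; // no agent step upstream: resuming from the gate cannot starve it
}
```

Take the step ids from the run described in the summary: a `final-review-pass-gate` fed by `final-fix-claude` and `final-fix-codex` would resume from whichever producer is declared first, unless the repair rewrote the gate itself.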

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Debugging a Failed `ricky run workflows/foo.ts --no-auto-fix`**

Below is a concrete, step‑by‑step workflow‑repair contract you can follow to isolate the failure, verify the root cause, and get the workflow back to a green state. All steps are deterministic and produce artefacts that can be reviewed or attached to a PR.

---

## 1️⃣ Re‑run with Full Diagnostics

```bash
ricky run workflows/foo.ts --no-auto-fix --debug --log-level=trace > foo.run.log 2>&1
  • What you get

    • foo.run.log – complete trace of the execution engine, including:
      • Contract validation results
      • Dependency graph resolution
      • Each task’s input/output payload
      • Any thrown exceptions or rejected promises
  • Gate 1 – Log Presence
    Verify that foo.run.log exists and is > 1 KB. If not, the runner itself failed to start – raise a blocker and attach the console output.


2️⃣ Identify the Failure Point

Search the log for the first line that contains ERROR or FAIL:

grep -n -E 'ERROR|FAIL' foo.run.log | head -n 5
  • Typical categories
    1. Contract violation – input does not satisfy the declared schema.
    2. Dependency resolution – missing
      ...[truncated]

**Must**

- Preserve one-attempt behavior when auto-fix is disabled.
- Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
- Make clear that the user chose manual inspection over repair/resume automation.

**Must Not**

- Start a repair loop despite `--no-auto-fix`.
- Suppress the diagnosis because no repair was attempted.
- Present the failure as a completed repair attempt.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.in-process-local-runner</code> (runtime-recovery/manual)</summary>

**User message**

> Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

**Ricky output**

```text
**Ricky – Executing a Local TypeScript Workflow Artifact**

Below is the concrete, reproducible contract that Ricky follows to run a TypeScript‑based workflow that lives in the primary local repository path (e.g., `./workflows/`). The contract is expressed as a deterministic series of steps, verification gates, and required artifacts so that any downstream reviewer or CI job can confirm that the execution succeeded or that a blocker was encountered.

---

## 1️⃣ Pre‑condition Checklist (Gate A)

| Item | Requirement | Verification |
|------|-------------|--------------|
| **Node runtime** | `>=18.0.0` (LTS) | `node -v` matches |
| **Package manager** | `npm` or `pnpm` (consistent with repo lockfile) | `npm -v` / `pnpm -v` |
| **TypeScript compiler** | Installed locally (`devDependency`) | `npx tsc -v` |
| **Workflow entry point** | `./workflows/<workflow‑name>.ts` exists and exports a default async function `run()` | `ls ./workflows/*.ts` |
| **Environment config** | `.env.local` (or CI env) provides any required secrets/variables | `printenv | grep <KEY>` |
| **Git clean state** | No uncommitted changes in `workflows/` (ensures reproducibility) | `git status --porcelain` returns e
...[truncated]
```

Must

  • Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
  • Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
  • Record the actual spawn command in execution evidence.

Must Not

  • Fail solely because agent-relay is not on PATH when the SDK route is available.
  • Hide the actual runtime command from evidence.
  • Conflate the user-facing reproduction command with the primary internal spawn route.
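
A precheck plus spawn plan along these lines would satisfy the criteria without requiring `agent-relay` on PATH. The sketch rests on three assumptions. First, the strip-types route is `node --experimental-strip-types <file>`, a flag that first shipped in Node 22.6. Second, the SDK is located by walking up `node_modules` directories, which ignores Yarn Plug'n'Play and `NODE_PATH`. Third, `planLocalRun` is a hypothetical helper.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

interface SpawnPlan {
  ok: boolean;
  argv: string[]; // the exact command that would be spawned
  evidence: string; // the same command as one string, for the execution record
  missing: string[]; // precheck failures, reported instead of spawning
}

// Walks up from `fromDir` looking for node_modules/<name>/package.json.
// Assumption: plain node_modules layout (no Yarn Plug'n'Play, no NODE_PATH).
function findPackageDir(name: string, fromDir: string): string | undefined {
  let dir = path.resolve(fromDir);
  for (;;) {
    const candidate = path.join(dir, "node_modules", name, "package.json");
    if (fs.existsSync(candidate)) return path.dirname(candidate);
    const parent = path.dirname(dir);
    if (parent === dir) return undefined;
    dir = parent;
  }
}

// Hypothetical helper: plans `node --experimental-strip-types <workflow>` (flag added in Node 22.6)
// so no agent-relay binary on PATH is required.
function planLocalRun(workflowPath: string, cwd: string, nodeVersion: string = process.versions.node): SpawnPlan {
  const missing: string[] = [];
  const [major, minor] = nodeVersion.split(".").map(Number);
  if (major < 22 || (major === 22 && minor < 6)) {
    missing.push(`node >= 22.6 required for type stripping (found ${nodeVersion})`);
  }
  const workflowFile = path.resolve(cwd, workflowPath);
  if (findPackageDir("@agent-relay/sdk", path.dirname(workflowFile)) === undefined) {
    missing.push("@agent-relay/sdk is not resolvable from the workflow's directory");
  }
  const argv = [process.execPath, "--experimental-strip-types", workflowFile];
  return { ok: missing.length === 0, argv, evidence: argv.join(" "), missing };
}
```

Recording `evidence` verbatim keeps the internal spawn route visible and separate from whatever reproduction command is shown to the user.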

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Boundary‑Reached Protocol – Persistent Structural Failure**

When a structural failure remains after an attempted fix, the workflow must be halted, the failure escalated, and a deterministic repair loop opened. Follow the contract below to keep the AgentWorkforce repository clean, auditable, and ready for rapid iteration.

---

## 1. Immediate Blocker Report (Deterministic Gate)

| Item | Action | Owner | Deadline |
|------|--------|-------|----------|
| **Create a blocker issue** | `git issue new` – title: **[BLOCKER] Persistent structural failure in `<component>`**. Include: error logs, stack traces, last successful commit SHA, and a concise “What we tried” summary. | Engineer on‑call | **Within 15 min** |
| **Tag the issue** | Add labels: `blocker`, `structural-failure`, `needs‑triage`. | Same | – |
| **Notify stakeholders** | Post to `#wf‑alerts` channel with `@team-leads @arch-team`. | Same | – |
| **Lock the affected workflow** | Add a `status: blocked` flag in the workflow manifest (`.wf/manifest.yml`). | Same | – |

*No further merges or CI runs may proceed on the affected branch until the blocker is cleared.*

---

## 2. Scoped Repair Branch & PR

1. **Branch Naming** – 
...[truncated]

Must

  • Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
  • Distinguish escalation from a generic product failure.
  • Preserve enough context for a human operator to continue.

Must Not

  • Retry speculative fixes indefinitely.
  • Collapse to "something went wrong" without the attempted actions.
  • Discard evidence from failed repair attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow – Health‑Digest Generator**  
*Purpose:* Aggregate the latest run metadata from a set of Ricky workflows, compute health metrics, and emit a concise “Health Digest” report (Markdown) that can be posted to Slack, emailed, or stored in a repo.  

---  

## 1. Contract (Inputs / Outputs)

| Artifact | Type | Description |
|----------|------|-------------|
| `workflow_runs.json` | JSON array | Exported run logs from the target Ricky workflows (produced by `ricky export‑runs`). |
| `config.yaml` | YAML | Optional overrides: <br>• `lookback_days: int` (default 7) <br>• `failure_threshold: float` (default 0.2) <br>• `output_target: {slack, email, file}` |
| `health_digest.md` | Markdown | Human‑readable summary (title, table, key alerts). |
| `digest_meta.json` | JSON | Machine‑readable summary (counts, percentages, timestamps) for downstream automation. |

**Schema – `workflow_runs.json` (excerpt)**  

```json
[
  {
    "workflow_id": "wf-123",
    "run_id": "run-456",
    "started_at": "2026‑05‑10T14:32:01Z",
    "ended_at": "2026‑05‑10T14:33:12Z",
    "status": "success|failed|canceled|timeout",
    "duration_seconds": 71,
    "trigger": "push|schedule|manual",
    "br
...[truncated]

Must

  • Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
  • Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
  • Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

  • Mutate evidence while analyzing it.
  • Return generic advice like "improve your workflow" without references.
  • Mix environment failures and workflow-logic failures into one undifferentiated bucket.
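
A digest built from structured evidence rather than logs might aggregate runs per workflow, as in this sketch. The `WorkflowRunEvidence` fields shown are a hypothetical subset, and the set of classes treated as environment failures is also an assumption. Unclassified failures get their own count instead of being folded into either bucket.

```typescript
// Hypothetical subset of WorkflowRunEvidence; the real type is not shown in this report.
interface WorkflowRunEvidence {
  workflowId: string;
  runId: string;
  status: "passed" | "failed";
  failureClass?: string; // e.g. "environment-blocker", "workflow-structure-bug"
  failedStepId?: string;
  durationSeconds: number;
}

interface DigestLine {
  workflowId: string;
  runs: number;
  environmentFailures: number;
  logicFailures: number;
  unclassifiedFailures: number; // kept separate rather than guessed
  topFailedStep?: string;
  medianDurationSeconds: number;
}

// Assumption: which classes count as environment problems rather than workflow logic.
const ENVIRONMENT_CLASSES = new Set(["environment-blocker", "environment-contamination"]);

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Pure: reads the evidence and never mutates it.
function buildDigest(evidence: readonly WorkflowRunEvidence[]): DigestLine[] {
  const byWorkflow = new Map<string, WorkflowRunEvidence[]>();
  for (const e of evidence) {
    byWorkflow.set(e.workflowId, [...(byWorkflow.get(e.workflowId) ?? []), e]);
  }
  const lines = Array.from(byWorkflow, ([workflowId, runs]): DigestLine => {
    const failed = runs.filter((r) => r.status === "failed");
    const env = failed.filter((r) => r.failureClass !== undefined && ENVIRONMENT_CLASSES.has(r.failureClass)).length;
    const unclassified = failed.filter((r) => r.failureClass === undefined).length;
    const stepCounts = new Map<string, number>();
    for (const r of failed) {
      if (r.failedStepId !== undefined) stepCounts.set(r.failedStepId, (stepCounts.get(r.failedStepId) ?? 0) + 1);
    }
    // Most frequent failing step; ties broken alphabetically for a stable digest.
    const top = Array.from(stepCounts).sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))[0];
    return {
      workflowId,
      runs: runs.length,
      environmentFailures: env,
      logicFailures: failed.length - env - unclassified,
      unclassifiedFailures: unclassified,
      topFailedStep: top?.[0],
      medianDurationSeconds: median(runs.map((r) => r.durationSeconds)),
    };
  });
  return lines.sort((a, b) => a.workflowId.localeCompare(b.workflowId));
}
```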

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.preserve-pr-shipping-during-repair (runtime-recovery/manual)

User message

A persona-authored workflow that imports @agent-relay/github-primitive and uses createGitHubStep to open a PR fails its runtime-precheck. The auto-fix repair persona is invoked. Describe the contract the repaired artifact must satisfy.

Ricky output

**Repaired Workflow Contract**

| **Section** | **Requirement** | **Verification Gate** | **Pass‑Criteria** |
|-------------|----------------|-----------------------|-------------------|
| **1. Imports** | The workflow must import the **exact** package name `@agent-relay/github-primitive` and **only** the symbols it needs. | *Static import lint* – `npm run lint:imports` | - No missing or misspelled package name.<br>- No unused imports.<br>- The import statement resolves to a published version of the package (semver `^` range or exact pin). |
| **2. `createGitHubStep` Usage** | The step that opens a PR must be created with `createGitHubStep` **once** and must supply a **complete** `GitHubStepConfig` object: <br>• `action: "pull_request"` <br>• `title: string` (non-empty) <br>• `head: string` (branch name) <br>• `base: string` (target branch) <br>• `body?: string` (optional) | *Schema validation* – `npm run validate:github-step` (JSON Schema) | - All required keys present.<br>- Required values are non-empty strings.<br>- No keys outside the schema. |
| **3. Runtime-Precheck Compatibility** | The workflow’s **precheck script** (`npm run precheck`) must succeed after the repai
...[truncated]
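
The schema-validation gate in row 2 can be sketched as a plain validator that reports every violation rather than stopping at the first. This assumes the `GitHubStepConfig` keys exactly as listed in the table. `validateGitHubStepConfig` is a hypothetical name, and this is not the validation logic of `@agent-relay/github-primitive` itself.

```typescript
// Keys taken from the contract table; the real package may define its config differently.
const REQUIRED_KEYS = ["action", "title", "head", "base"] as const;
const ALLOWED_KEYS = new Set<string>([...REQUIRED_KEYS, "body"]);

// Returns every violation found; an empty array means the config passes the check.
function validateGitHubStepConfig(config: unknown): string[] {
  if (typeof config !== "object" || config === null || Array.isArray(config)) {
    return ["config must be a plain object"];
  }
  const record = config as Record<string, unknown>;
  const errors: string[] = [];

  // Required keys must be present and be non-empty strings.
  for (const key of REQUIRED_KEYS) {
    const value = record[key];
    if (typeof value !== "string" || value.trim() === "") {
      errors.push(`${key} must be a non-empty string`);
    }
  }
  // A present, non-empty action must be exactly "pull_request".
  const action = record["action"];
  if (typeof action === "string" && action.trim() !== "" && action !== "pull_request") {
    errors.push('action must be "pull_request"');
  }
  // body is optional, but must be a string when supplied.
  if ("body" in record && typeof record["body"] !== "string") {
    errors.push("body, when present, must be a string");
  }
  // Reject keys the schema does not declare.
  for (const key of Object.keys(record)) {
    if (!ALLOWED_KEYS.has(key)) errors.push(`unexpected key: ${key}`);
  }
  return errors;
}
```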

Must

  • Preserve the @agent-relay/github-primitive import, the GitHubStepExecutor reference, and every createGitHubStep(...) invocation that the original artifact declared.
  • Keep the failing workflow runnable from the same path with the same workflow(...) builder and .run({ cwd: process.cwd() }) invocation.
  • Retain at least ceil(N / 2) of the original workflow's .step(...) calls when the original declared four or more steps; a repair that collapses to a 2-3 step placeholder is a regression, not a fix.
  • Reject (do not apply) any repair output whose step list reduces to prepare-context / runtime-precheck: true / final-signoff: echo placeholder; surface the regression diagnostic instead.
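
The two step-count rules can be combined into a single pre-apply check. This is a minimal sketch, assuming the step ids of the original and repaired artifacts have already been extracted from their `.step(...)` calls. It matches placeholder steps by id only (not by their `true` / `echo placeholder` bodies) and counts retention by shared step id. `checkStepRetention` is a hypothetical helper.

```typescript
// Step ids that, per the contract, make up the collapsed placeholder repair.
const PLACEHOLDER_STEP_IDS = new Set(["prepare-context", "runtime-precheck", "final-signoff"]);

interface RetentionVerdict {
  apply: boolean;
  diagnostic?: string;
}

// Compares step ids extracted from the original and repaired artifacts.
function checkStepRetention(originalSteps: string[], repairedSteps: string[]): RetentionVerdict {
  // Reject a repair that reduces to placeholder steps when the original did real work.
  const repairedIsPlaceholder =
    repairedSteps.length > 0 && repairedSteps.every((id) => PLACEHOLDER_STEP_IDS.has(id));
  const originalDidRealWork = originalSteps.some((id) => !PLACEHOLDER_STEP_IDS.has(id));
  if (repairedIsPlaceholder && originalDidRealWork) {
    return {
      apply: false,
      diagnostic: `regression: repair collapsed ${originalSteps.length} steps to placeholder [${repairedSteps.join(", ")}]`,
    };
  }

  // The ceil(N / 2) retention floor applies only when the original declared 4+ steps.
  const n = originalSteps.length;
  if (n >= 4) {
    const repairedSet = new Set(repairedSteps);
    const retained = originalSteps.filter((id) => repairedSet.has(id)).length;
    const floor = Math.ceil(n / 2);
    if (retained < floor) {
      return {
        apply: false,
        diagnostic: `regression: retained ${retained} of ${n} original steps; need at least ${floor}`,
      };
    }
  }
  return { apply: true };
}
```

Returning a diagnostic instead of throwing lets the caller surface the regression rather than apply the repair.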

Must Not

  • Strip createGitHubStep, GitHubStepExecutor, or @agent-relay/github-primitive because the runtime-precheck failure mentioned PR-shipping or git side effects. The repair contract's "no commit / no push" constraint applies to the REPAIR AGENT's runtime behavior, not to the workflow's step declarations.
  • Emit a "minimal repair-safe master" or "simplified Ricky master" scaffold that passes the builder validator while doing none of the original work.
  • Treat "the workflow now builds and runs" as success when the work it was supposed to ship is gone.
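
The preservation rules in both lists reduce to a diff between facts extracted from the original and the repaired artifact. This is a minimal sketch, assuming a hypothetical `ArtifactFacts` record has already been extracted from each artifact (ideally with a grammar-aware parser rather than regex, so identifiers inside string literals are not miscounted). `findStrippedPrimitives` is likewise hypothetical.

```typescript
const PRIMITIVE_MODULE = "@agent-relay/github-primitive";

// Hypothetical facts extracted from a workflow artifact before and after repair.
interface ArtifactFacts {
  importedModules: string[];
  referencedSymbols: string[];
  githubStepIds: string[]; // ids of steps declared via createGitHubStep(...)
}

// Returns one violation per primitive the repair stripped; an empty list means preserved.
function findStrippedPrimitives(original: ArtifactFacts, repaired: ArtifactFacts): string[] {
  const violations: string[] = [];
  if (
    original.importedModules.includes(PRIMITIVE_MODULE) &&
    !repaired.importedModules.includes(PRIMITIVE_MODULE)
  ) {
    violations.push(`import of ${PRIMITIVE_MODULE} was removed`);
  }
  if (
    original.referencedSymbols.includes("GitHubStepExecutor") &&
    !repaired.referencedSymbols.includes("GitHubStepExecutor")
  ) {
    violations.push("GitHubStepExecutor reference was removed");
  }
  // Every createGitHubStep step the original declared must survive the repair.
  const repairedStepIds = new Set(repaired.githubStepIds);
  for (const id of original.githubStepIds) {
    if (!repairedStepIds.has(id)) violations.push(`createGitHubStep step "${id}" was removed`);
  }
  return violations;
}
```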

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

## Ricky Slack Surface – “Workflow Debugger”

Ricky Debug is a dedicated Slack app that lets engineers **inspect, step through, and repair** any Ricky-managed workflow directly from a channel or DM. The surface is built as a **modal-driven UI** with deterministic contracts and verification gates, so every action is auditable, testable, and safe to run in production.

---

### 1. High-Level Interaction Flow

| Step | Slack UI Element | Ricky Backend Call | Verification / Gate |
|------|------------------|--------------------|----------------------|
| 1️⃣ | **Slash command** `/ricky-debug <workflow-id>` | `GET /debug/manifest/:id` | Auth check (team member with the `debug` role) |
| 2️⃣ | **Modal – Manifest** (read-only) showing workflow name, version, last run, status, and a **“Start Session”** button | `POST /debug/session` (creates a temporary debug session, returns `sessionId`) | Rate limit (≤ 1 session per user per 30 s) |
| 3️⃣ | **Modal – Step List** (paginated): each step shows its index, name, state (pending/running/failed/completed), and an **“Inspect”** button | `GET /debug/session/:sessionId/steps` | Session-validity check (expires after 15 min of inactivity) |
| 4️⃣ | **Inspect S
...[truncated]
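
The rate-limit and session-validity gates in the flow table can be sketched as a small in-memory gate. This assumes the limits exactly as stated: at most one new session per user per 30 s, and expiry after 15 minutes of inactivity. `DebugSessionGate` and its methods are hypothetical, and keeping state in process memory is a simplification for illustration.

```typescript
const SESSION_RATE_WINDOW_MS = 30 * 1000; // at most one new session per user per 30 s
const SESSION_IDLE_TIMEOUT_MS = 15 * 60 * 1000; // expire after 15 min of inactivity

interface DebugSession {
  sessionId: string;
  userId: string;
  lastActivityMs: number;
}

class DebugSessionGate {
  private lastCreatedAt = new Map<string, number>();
  private sessions = new Map<string, DebugSession>();
  private counter = 0;

  // Returns a new session, or null while the user is still inside the rate-limit window.
  createSession(userId: string, nowMs: number): DebugSession | null {
    const last = this.lastCreatedAt.get(userId);
    if (last !== undefined && nowMs - last < SESSION_RATE_WINDOW_MS) return null;
    this.lastCreatedAt.set(userId, nowMs);
    const session: DebugSession = { sessionId: `dbg-${++this.counter}`, userId, lastActivityMs: nowMs };
    this.sessions.set(session.sessionId, session);
    return session;
  }

  // Validity check run before each backend call; refreshes the activity timestamp on success
  // and drops the session once it has been idle longer than the timeout.
  touch(sessionId: string, nowMs: number): boolean {
    const session = this.sessions.get(sessionId);
    if (!session) return false;
    if (nowMs - session.lastActivityMs > SESSION_IDLE_TIMEOUT_MS) {
      this.sessions.delete(sessionId);
      return false;
    }
    session.lastActivityMs = nowMs;
    return true;
  }
}
```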

Must

  • Treat Slack as a co-equal product surface, not a wrapper around CLI.
  • Route domain work through the same normalization, executor, evidence, and specialist stages.
  • Keep Slack-specific formatting, thread handling, and interactive components in the

Comment truncated to stay within GitHub limits. Download the ricky-eval-run artifact for the full human-review.md.
