fix(generation): drop cross-repo paths from derived test command #103
When a spec's targetFiles include paths in a sibling repo (e.g.
`relayfile-adapters/packages/core/src/digest-contract.test.ts` while
the workflow ships in `relayfile`), deriveTestCommand used to render
that path straight into the final-hard-validation vitest invocation.
The generated workflow runs in a single repo's cwd, so vitest's
include glob `packages/**/*.test.ts` cannot reach a path under another
repo's directory, and the file does not exist locally anyway -- vitest
exits 1 with "No test files found". The workflow auto-fix loop then
burns its full budget (INVALID_ARTIFACT x maxAttempts) trying to
repair the phantom artifact path, even though the fix lives in another
repo it cannot reach.
Filter cross-repo paths in deriveTestCommand before constructing the
vitest invocation. A path is treated as cross-repo when:
- it escapes the cwd (`../sibling`, `./..`, embedded `/../`)
- or its first segment is not a recognized in-repo source root
(`packages`, `apps`, `services`, `src`, `lib`, `libs`, `tests`,
`test`, `e2e`, `integration-tests`)
Root-level files with no `/` (e.g. `index.test.ts`) are treated as
local since vitest can resolve them from the cwd. The same filtered
list is fed into uniqueWorkspacesFromTargetFiles, so sibling-repo
paths also do not surface as `npm test --workspace=` invocations.
Add a regression test mirroring the relayfile failure mode that
exhausted the auto-fix budget.
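
The classification rule described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `isLocalTargetPath`, `filterLocalTargets`, and `IN_REPO_ROOTS` are hypothetical names, and only the rules themselves (escape detection, recognized first segments, root-level files) come from the description.

```typescript
// Hypothetical sketch of the cross-repo filter; names are assumptions.
const IN_REPO_ROOTS = new Set<string>([
  "packages", "apps", "services", "src", "lib", "libs",
  "tests", "test", "e2e", "integration-tests",
]);

function isLocalTargetPath(rawPath: string): boolean {
  const normalized = rawPath.trim().replace(/\\/g, "/");
  if (normalized === "") return false;

  // Any ".." segment means the path escapes the cwd:
  // "../sibling", "./..", or an embedded "/../".
  if (normalized.split("/").includes("..")) return false;

  // Ignore leading "./" so "./packages/x" is judged like "packages/x".
  const path = normalized.replace(/^(\.\/)+/, "");
  if (path === "") return false;

  // Root-level files with no "/" can be resolved from the cwd.
  if (!path.includes("/")) return true;

  // Otherwise the first segment must be a recognized in-repo source root.
  return IN_REPO_ROOTS.has(path.split("/")[0]);
}

function filterLocalTargets(targetFiles: string[]): string[] {
  return targetFiles.filter(isLocalTargetPath);
}
```

Under these assumptions, the sibling-repo path from the failure report is dropped while `packages/...` paths and root-level test files are kept, so the same filtered list can feed both the vitest invocation and the workspace derivation.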
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
No actionable comments were generated in the recent review.
Walkthrough
This PR updates test command generation in the workflow pipeline to handle cross-repository target paths. It introduces a local-path classifier, updates ...[truncated]
Changes
Cross-repo target path filtering in test command generation
Estimated code review effort: 3 (Moderate) | ~20 minutes
Pre-merge checks: 5 passed
Warning: There were issues while running some tools. ESLint skipped: no ESLint configuration detected in root package.json.
Ricky Eval Review
Run: Passed: 8 | Needs human: 43 | Reviewable: 43 | Missing output: 0 | Failed: 0 | Skipped: 0
Human Review Cases
These cases passed deterministic checks and include captured Ricky output for a human verdict against their REVIEW
| Step | Action | Result |
|---|---|---|
| 1️⃣ | Parse arguments – Ricky reads --mode local and the free‑form --spec string. | The request is interpreted as “create a workflow definition for package checks” and store it in a temporary spec object. |
| 2️⃣ | Validate the spec – Ricky runs the spec‑validation gate (deterministic, schema‑based). | If the spec string is syntactically valid, the gate passes; otherwise you get a clear error (e.g., “Spec must be a JSON/YAML object with name, steps, …”). |
| 3️⃣ | Generate the workflow contract – Using the built‑in Package‑Check template, Ricky expands the contract: • name: package-checks • trigger: manual • steps: [checkout, install‑deps, lint, test, security‑scan, artifact‑publish] | The contract is written to ./ricky-output/<timestamp>-package-checks.yaml. |
| 4️⃣ | Persist the artifact – Because --run is absent, Ricky stops after artifact creation. No CI/CD pipeline is triggered, no containers are spun up, and no external service |
...[truncated]
**Must**
- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.
**Must Not**
- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
</details>
<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>
**User message**
> Render Ricky's first-run CLI onboarding for a new user.
**Ricky output**
```text
**Ricky – First‑Run CLI Onboarding**
Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the **Ricky** AgentWorkforce CLI. Follow the steps in order; each stage includes a deterministic verification gate that will abort the flow with a clear blocker message if the expected condition is not met.
---
## 1️⃣ Prerequisites (Verification Gate)
| Check | Command | Expected Result | Blocker Message |
|-------|---------|----------------|-----------------|
| **Node ≥18** | `node -v` | `v18.x` or higher | “❌ Node 18+ not detected. Install from https://nodejs.org/.” |
| **Git** | `git --version` | `git version 2.x` | “❌ Git not found. Install Git before proceeding.” |
| **Supported OS** | `uname -s` (Linux/macOS) or `ver` (Windows) | `Linux`, `Darwin`, or `Windows_NT` | “❌ Unsupported OS. Use Linux, macOS, or Windows.” |
*If any gate fails, resolve the blocker before re‑running the onboarding script.*
---
## 2️⃣ Install the Ricky CLI (Deterministic Contract)
```bash
# 2.1 Install via npm (global)
npm i -g @ricky/agentworkforce-cli
# 2.2 Verify installation
ricky --version
Verification Gate:
- Expected output:
ricky-cli vX.Y.Z(semantic version
...[truncated]
```
**Must**
- Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
- End every branch with a concrete next step.
- Advertise only commands that are currently implemented.
**Must Not**
- Sound like a launch page or documentation dump.
- Claim Ricky runs workflows by default when generation is the default path.
- Require web or Slack onboarding before CLI use.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
</details>
<details>
<summary><strong>REVIEW</strong> <code>cli.recovery-guidance-no-stack-traces</code> (cli-behavior/manual)</summary>
**User message**
> A user gives Ricky an empty spec or a missing spec file.
**Ricky output**
```text
**Ricky – Handling an Empty or Missing Specification File**
Below is the concrete, deterministic workflow you should follow whenever a user submits an empty spec or a spec file that cannot be found. The contract is scoped to a single branch (`spec‑recovery/<ticket‑id>`) and a single PR, keeping the change set minimal and review‑ready.
---
## 1️⃣ Detection & Immediate Blocker Report
| Step | Action | Output |
|------|--------|--------|
| 1.1 | Verify the path supplied by the user (`<repo>/specs/<name>.md`). | `exists?` boolean |
| 1.2 | If the file **does not exist** **or** its size is **0 bytes**, raise a **Blocker** in the ticket. | `BLOCKER: Spec missing or empty – cannot proceed without a valid spec.` |
| 1.3 | Post the blocker comment on the ticket and on the PR (if already opened). | `Comment ID: <cid>` |
| 1.4 | Attach a **“Spec‑Request”** template (see §2) to the ticket for the user to fill. | `attachment: spec‑request.md` |
*The blocker must be cleared before any further automation runs.*
---
## 2️⃣ Spec‑Request Template (User‑Facing)
```markdown
# Specification Request
**Ticket:** <ticket‑id>
**Requested by:** <user>
## Required Sections
1. **Purpose** – What
...[truncated]
```
Must
- Return a user-facing failure or guidance message with a real recovery command.
- Distinguish generation failure from execution failure.
- Show stack traces only when verbose diagnostic mode is requested.
Must Not
- Crash with an uncaught exception in normal mode.
- Suggest commands that do not exist.
- Pretend a missing spec was accepted.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)
User message
Render `ricky status` when no provider checks have proven Google or GitHub are connected.
Ricky output
**`ricky status` output (no provider checks have proven Google or GitHub are connected)**
=== AgentWorkforce Status Report ===
🟡 Provider Connectivity
• Google : NOT VERIFIED (no successful check)
• GitHub : NOT VERIFIED (no successful check)
🔧 Last Run
• Workflow Engine : idle
• Last Execution : N/A
• Pending Jobs : 0
📦 Configuration
• Active Branches : main
• Loaded Plugins : core, auth, sync
• Feature Flags : all defaults
• No successful connectivity verification for Google or GitHub.
• Actions that depend on these providers will be skipped until a successful check occurs.
🛠️ Next Steps (recommended)
- Run provider health checks:
  ricky check --provider google
  ricky check --provider github
- Verify credentials / tokens are present in the secret store.
- Re‑run the status command after checks complete.
--- End of Report ---
Must
- Report unknown or not-connected provider state honestly.
- Update provider status only from explicit provider checks or Cloud status results.
- Give concrete setup guidance for Cloud when relevant.
Must Not
- Mark Google or GitHub connected because guidance text was shown.
- Invent a provider connection URL or OAuth flow.
- Show empty fields with no recovery guidance when config is missing.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)
User message
Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.
Ricky output
**Workflow: “GitHub Primitive Change”**
*Purpose*: Apply a small, self‑contained change to a repository (e.g., typo fix, comment update, config tweak) while guaranteeing reproducibility, traceability, and zero‑surprise merges.
---
## 1. Scope & Branching Contract
| Item | Definition |
|------|------------|
| **Change Type** | Primitive (single‑file, < 50 lines, no API impact) |
| **Target Repo** | Any GitHub repo under the organization’s `github.com/<org>/…` namespace |
| **Branch Naming** | `feat/primitive‑<short‑desc>-<YYYYMMDD>` |
| **PR Title Template** | `Primitive: <short‑desc> (#[issue‑id])` |
| **PR Base** | `main` (or the repo’s default protected branch) |
| **Merge Strategy** | **Squash‑and‑merge** only – preserves a linear history and a single commit message. |
| **Protected Rules** | - Require at least 1 approving review from a senior engineer.<br>- Status checks: `lint`, `unit‑test`, `code‑coverage`, `security‑scan`. |
| **Rollback Policy** | If any post‑merge gate fails, create a revert PR titled `Revert: Primitive: <short‑desc>` within 2 h. |
---
## 2. Deterministic Verification Gates
| Gate | Tool | Success Criteria |
|------|------|-------------------|
...[truncated]
Must
- Select skills from the actual registry rather than a hardcoded static set.
- Record ranked skill matches with confidence and trigger evidence.
- Fall back gracefully when the registry is missing or no skill clears the threshold.
Must Not
- Claim runtime agents embody skills when only generation-time guidance was applied.
- Fail generation solely because optional skill files are missing.
- Hide skill selection evidence from artifacts or JSON output.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)
User message
Generate a workflow where the spec says "use Claude to review and Codex to implement".
Ricky output
**Workflow: Claude‑Review → Codex‑Implementation**
| Phase | Owner | Input | Output | Contract (Pre‑/Post‑conditions) | Verification Gate |
|-------|-------|-------|--------|----------------------------------|--------------------|
| **1. Review** | **Claude** (LLM‑reviewer) | • Feature spec (Markdown) <br>• Acceptance criteria (JSON) | • Review report (Markdown) <br>• Annotated spec with “review‑notes” block | **Pre:** Spec is syntactically valid Markdown; acceptance criteria conform to schema `{"id":string,"description":string,"expected":any}`. <br>**Post:** Review report contains at least one “✅ Approved” or “❌ Changes Required” flag and a `review‑notes` block with line‑level comments. | **Gate‑R1:** Automated parser validates Markdown structure and JSON schema. <br>**Gate‑R2:** Lint checks that `review‑notes` block is present and contains a status flag. |
| **2. Decision** | **Workflow Engine** | Review report | Decision flag (`approved` / `rework`) | **Pre:** Output of Gate‑R2 passed. <br>**Post:** Decision flag is set; if `rework`, workflow aborts with blocker report. | **Gate‑D1:** Simple boolean check. |
| **3. Implementation** | **Codex** (code generator) | • Annotated spe
...[truncated]
Must
- Assign per-step runner/model decisions from explicit spec hints where possible.
- Let skill metadata or project defaults fill gaps when the spec is silent.
- Write `tool-selection.json` or equivalent audit metadata.
Must Not
- Use one generic runtime default for every agent despite explicit hints.
- Let skill defaults override explicit user/spec runner hints.
- Omit the reason each tool or model was chosen.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)
User message
Use `--refine` to sharpen a generated workflow's step task descriptions and acceptance gates.
Ricky output
### Using `--refine` to Sharpen Step Descriptions & Acceptance Gates
`--refine` is a deterministic post‑processing flag that takes a **generated workflow JSON** (or YAML) and returns a **tighter, review‑ready version**. It focuses on two things:
| Target | What `--refine` does |
|--------|----------------------|
| **Task descriptions** | • Removes vague verbs (“do”, “handle”). <br>• Adds concrete inputs, outputs, and responsible role. <br>• Limits each description to 1‑2 sentences (≈30‑45 words). |
| **Acceptance gates** | • Converts informal “check” statements into explicit **verification predicates** (e.g., `output.fileExists && output.schemaValid`). <br>• Adds a **deterministic verification step** (`verify_<stepId>`) that runs before the next step. <br>• Tags each gate with a **severity level** (`critical`, `warning`). |
---
## 1. Basic Invocation
```bash
awf generate --template my_template.yaml \
--output workflow_raw.json \
--model gpt-4o-mini
awf refine --input workflow_raw.json \
--output workflow_refined.json \
--refine
```
`--refine` tells the engine to run the Refinement Pass after the initial g
...[truncated]
**Must**
- Keep the deterministic unrefined path as the default.
- Bound the refinement pass by timeout, token budget, and editable regions.
- Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.
**Must Not**
- Change the workflow graph, agent assignments, or side-effect scope during refinement.
- Fail the whole generation if optional refinement times out.
- Hide warnings when the deterministic artifact is returned unchanged.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
</details>
<details>
<summary><strong>REVIEW</strong> <code>generation-quality.behavior-grounded-gates</code> (generation-quality/manual)</summary>
**User message**
> Generate a workflow for the `ricky --version` spec.
**Ricky output**
```text
**Workflow: `ricky --version` Specification & Delivery**
*(Ricky AgentWorkforce – v0.9.3)*
---
## 1. Objective
Produce a deterministic, test‑driven implementation of the `ricky --version` CLI flag that:
1. Returns the current Ricky package version (semantic‑version string) on **stdout**.
2. Exits with status **0** on success, **1** on any internal error.
3. Is reproducible across all supported runtimes (Python 3.9+, Node 18+, Go 1.22).
4. Is fully covered by unit, integration, and contract tests and gated by CI.
---
## 2. Scope & Boundaries
| Scope | Included | Excluded |
|-------|----------|----------|
| **Language** | Python implementation (primary) – other language wrappers may reference the same contract. | Platform‑specific packaging (e.g., Docker image) – handled in separate workflow. |
| **Version source** | `ricky/__init__.py` `__version__` constant; fallback to `pyproject.toml` if missing. | Dynamic versioning from Git tags at runtime. |
| **Output format** | Plain string `X.Y.Z` (no trailing newline required, but allowed). | JSON, verbose logs, or additional metadata. |
| **Error handling** | Unexpected import errors, missing version constant → exit 1 with
...[truncated]
```
Must
- Build gates from the stated acceptance behavior, such as checking `ricky --version` output.
- Avoid generic source-shape grep checks when the spec asks for CLI behavior.
- Keep generated validation meaningful for the current repo shape.
Must Not
- Treat `grep -Eq 'export|function|class|workflow(' dist/ricky.js` as proof of version behavior.
- Claim the workflow is proven by source syntax alone.
- Ignore the package-json version resolution order in the spec.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)
User message
Generate a workflow for many independent artifacts with a validation/fix/rerun loop.
Ricky output
**Workflow: “Batch‑Artifact Validation‑Fix‑Rerun Loop”**
*Purpose*: Process a large set of independent artifacts (e.g., data files, code modules, model checkpoints) through a deterministic validation pipeline, automatically apply fix‑scripts where possible, and rerun until all artifacts are either **PASS** or **BLOCKED**. The workflow is scoped to a single branch (`batch‑validation`) and produces a single PR that aggregates all changes.
---
## 1. Contract (YAML)
```yaml
name: batch_artifact_validation
description: |
Validate, auto‑fix, and rerun a collection of independent artifacts.
scope:
branch: batch-validation
pr_title: "Batch validation & auto‑fix for <artifact‑set>"
pr_body: |
This PR contains the results of the validation‑fix‑rerun loop for the
artifact set <artifact‑set>. Each artifact ends in one of three states:
- PASS – no further action required
- FIXED – auto‑fix applied, now PASS
- BLOCKED – manual intervention required (see attached report)
inputs:
artifacts_path: # root directory containing artifacts
type: string
description: Relative path to the folder with all artifacts.
validator: # executable that returns
...[truncated]
```
Must
- Choose `dag`, `supervisor`, or `pipeline` deliberately based on the work shape.
- Explain the pattern choice in artifact metadata or a rationale.
- Use `dag` for validation/fix/rerun loops when dependencies matter.
Must Not
- Default blindly to `dag` for every workflow.
- Collapse independent artifact work into one vague agent task.
- Omit verification gates because the chosen pattern seems obvious.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)
User message
Describe how selected workflow-writing skills affected a generated Ricky workflow.
Ricky output
### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow
Below is a concise “skill‑impact map” that shows the concrete ways each of the core Ricky writing skills is reflected in a generated workflow. The example workflow is a **“Customer‑Onboarding Email Sequence”** that moves a new user from sign‑up to first‑value activation.
| **Skill** | **What the Skill Enforces** | **Resulting Artifact in the Workflow** | **Why It Matters** |
|----------|----------------------------|----------------------------------------|--------------------|
| **Deterministic Contract Definition** | Every step declares its **input schema**, **output schema**, and **pre‑condition** in a JSON‑Schema block. | ```yaml\nsteps:\n - id: fetch_user\n input: {type: object, required: [user_id]}\n output: {type: object, required: [email, name]}\n``` | Guarantees that downstream steps can be type‑checked automatically, eliminating runtime mismatches. |
| **Verification Gates** | After each step a **gate** runs a lightweight test (e.g., schema validation, API‑status check). | ```yaml\n - id: gate_fetch_user\n type: verification\n condition: $.fetch_user.status == 200\n``` | Early failu
...[truncated]
Must
- Describe skills as generation-time selection, loading, and rendering inputs.
- Point to metadata such as `loaded-skills.txt` and `skill-application-boundary.json`.
- Say the workflow instructions were informed by selected skills.
Must Not
- Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
- Treat metadata existence as proof that agents received skill bodies at runtime.
- Overstate current tests beyond the generation boundary.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)
User message
Generate a workflow from a vague product spec and return it to the user.
Ricky output
**Ricky – Workflow Generation Assistant**
To turn a product specification into a reliable, review‑ready workflow we need a concrete set of inputs. With a vague spec the generated workflow would be incomplete and could introduce blockers later in the pipeline.
**Please provide the following details (you can answer as many as you have; missing items will be marked as “open” and revisited in the validation loop):**
| # | Required Input | Why It Matters |
|---|----------------|----------------|
| 1 | **Goal / Outcome** – a one‑sentence description of the product’s primary purpose (e.g., “Generate a PDF invoice from an order payload”). | Defines the workflow’s terminal state and success criteria. |
| 2 | **Trigger** – event that starts the workflow (e.g., HTTP POST, Pub/Sub message, schedule). | Determines the entry contract and required authentication. |
| 3 | **Core Steps** – ordered list of high‑level actions (e.g., *Validate payload → Enrich with customer data → Render PDF → Store in GCS → Notify via Slack*). | Drives the DAG structure and gate placement. |
| 4 | **Data Contracts** – schema (JSON, Protobuf, etc.) for inputs and outputs of each step. | Enables deterministic verifi
...[truncated]
Must
- Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
- Validate with dry-run or targeted structural checks where possible.
- Return artifacts, warnings, and follow-up commands honestly.
Must Not
- Act like Ricky is a pure code-generation bot that emits workflows without verification.
- Stop at "code compiles" as the proof bar.
- Skip skill-aware workflow authoring guidance for serious workflows.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)
User message
A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.
Ricky output
**Ricky’s post‑failure protocol**
1. **Capture the failure state**
- Pull the latest run log from the orchestration engine.
- Record the exact timestamp, step identifier, and the timeout error message.
- Snapshot the alive worker’s process ID, heartbeat timestamps, and any partial output directories.
2. **Mark a deterministic blocker**
- Insert a **“BLOCKER: STEP_TIMEOUT”** entry in the workflow’s status table (or equivalent state store).
- Tag the blocker with the affected step name, the timeout threshold that was exceeded, and the worker ID that remained alive.
3. **Generate a verification gate ticket**
- Create a PR‑style ticket in the workflow‑ops repo (e.g., `workflow/ops#<next‑id>`).
- Title: `BLOCKER – <workflow‑name> – step <X> timed‑out, worker <Y> hung`.
- Body must contain:
* Full log excerpt (including the timeout stack trace).
* Worker heartbeat dump.
* Current artifact manifest (showing the missing artifact).
* Suggested remediation checklist (see steps 4‑7).
- Assign the ticket to the **Reliability Lead** and add the **#workflow‑blockers** label.
4. **Isolate and terminate the stray worker**
...[truncated]
Must
- Classify the failure before deciding whether to retry.
- Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
- Preserve raw evidence and uncertainty when the class is not known.
Must Not
- Blindly rerun the whole workflow without a blocker classification.
- Treat every failure as a broken workflow definition.
- Claim Ricky fixed the workflow before rerun evidence proves it.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)
User message
Ricky detects stale `.agent-relay/`, `.relay/`, and `.trajectories/` state before launching a local workflow.
Ricky output
**Ricky – Stale‑State Detection for Local Workflow Launches**
*(implementation sketch that follows the AgentWorkforce repository conventions, workflow standards, and shared authoring rules)*
---
## 1. Scope & Contract
| Item | Description |
|------|-------------|
| **Trigger** | `ricky launch <workflow‑name>` (or any local `ricky run …` command) |
| **Inputs** | • Current working directory (must contain a valid `agent.yaml` or `workflow.yaml`).<br>• Presence of the three runtime caches: `.agent‑relay/`, `.relay/`, `.trajectories/`. |
| **Outputs** | • Pass/Fail status.<br>• Structured report (`ricky‑state‑report.json`).<br>• Optional interactive prompt to clean or ignore stale data. |
| **Deterministic Gate** | The launch proceeds **only** if the stale‑state gate returns *clean* or the user explicitly overrides. |
| **Verification** | Unit‑test suite `test_state_detection.py` (≥ 90 % coverage) + integration test `e2e_state_gate.sh`. |
| **Review Artifacts** | • PR must include updated `README.md` section *“Local launch pre‑flight checks”*.<br>• Add `docs/ops/stale_state.md` with troubleshooting guide.<br>• Include `CHANGELOG` entry: “Add stale‑state detection before workflow
...[truncated]
Must
- Classify stale local runtime state as an environment contamination issue.
- Recommend quarantine or isolated-run guidance before launch.
- Record the observed paths and the action taken or recommended.
Must Not
- Treat stale runtime state as a workflow logic failure.
- Delete or overwrite state without an explicit safe path or user intent.
- Continue into execution as if the workspace were clean.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)
User message
A run marker says another Ricky or Relay run is already active in this workspace.
Ricky output
**How to resolve “another Ricky or Relay run is already active in this workspace”**
The marker file is a safety lock that prevents two concurrent executions of the same workspace. When it’s left behind (e.g., after a crash, forced termination, or a stray background process), the next run will stop with the message you’re seeing.
Below is a deterministic, step‑by‑step repair workflow that you can run locally. It follows the **Ricky repository conventions** (AGENTS.md) and includes verification gates so you can be sure the workspace is clean before you start a new run.
---
## 1️⃣ Verify the workspace state
| Gate | Command | Expected result |
|------|---------|-----------------|
| **G1 – No dangling PID** | `ps aux \| grep -i ricky | grep -v grep` | No `ricky` or `relay` processes should appear. |
| **G2 – Marker file exists** | `ls -l .ricky/run.marker` | File should be present (the error you’re seeing). |
| **G3 – Workspace git clean** | `git status --porcelain` | Output should be empty (no uncommitted changes). |
If any gate fails, address it before proceeding.
---
## 2️⃣ Safely remove the stale marker
1. **Confirm no active run** – double‑check G1. If a process *is* runn
...[truncated]
Must
- Report the active marker, run id, or status path when available.
- Ask the user to inspect, wait for, or explicitly clear the active run.
- Avoid launching a competing run that could corrupt evidence.
Must Not
- Silently start another run.
- Hide the existing run marker from the user.
- Treat the conflict as a generic failure with no recovery path.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.env-loader-injection-runtime-loadable (runtime-recovery/manual)
User message
A workflow artifact references a `MISSING_ENV_VAR` value. Ricky's deterministic auto-fix injects the `.env.local` / `.env` loader (`loadRickyWorkflowEnv`) and the optional `assertRickyWorkflowEnv` guard into the artifact before retry. The artifact may be a master-rendered workflow whose `.step({ command: ... })` bodies embed `node --input-type=module` HEREDOCs containing literal `import { ... } from 'node:fs'` / `from 'node:path'` strings.
Ricky output
**Deterministic auto‑fix for missing environment variables**
Below is the concrete, reproducible patch you can apply to any workflow artifact that currently fails because it references a `MISSING_ENV_VAR`. The fix does three things:
1. **Injects the environment‑loader** – `loadRickyWorkflowEnv()` – at the top of the generated workflow file so that `.env.local` (or fallback `.env`) is parsed before any step runs.
2. **Adds an optional guard** – `assertRickyWorkflowEnv()` – that throws a clear error if the required variable is still undefined after loading.
3. **Wraps each step’s inline Node‑module script** (the HEREDOC that contains `node --input-type=module`) with a tiny bootstrap that re‑exports the loader/guard, guaranteeing the same environment is available inside the embedded script.
---
## 1. Helper utilities (single‑source)
Create (or update) `ricky-workflow-env.js` in the repository root (or a shared `utils/` folder). This file is **import‑only** and can be reused by every workflow artifact.
```js
// ricky-workflow-env.js
import { config } from 'dotenv';
import { resolve } from 'node:path';
import { existsSync } from 'node:fs';
/**
* Load .env.local if present, o
...[truncated]
```
Must
- Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected `loadRickyWorkflowEnv` body references `rickyWorkflowFs.*` and `rickyWorkflowPath.*`, so the repair must also add the corresponding `import * as rickyWorkflowFs from 'node:fs'` and `import * as rickyWorkflowPath from 'node:path'` aliases at module top level.
- Detect existing alias imports by matching real top-level `import * as <alias> from '<module>'` statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in `.step({ command: ... })` bodies do not count as imports).
- Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.
Must Not
- Skip adding the `rickyWorkflowFs` / `rickyWorkflowPath` aliases because `from 'node:fs'` or `from 'node:path'` already appears somewhere in the file as a string literal.
- Inject `loadRickyWorkflowEnv` (or `assertRickyWorkflowEnv`) without the supporting alias imports, which produces a `ReferenceError: rickyWorkflowPath is not defined` at module load and burns the auto-fix budget on `UNSUPPORTED_RUNTIME at runtime-launch`.
- Rewrite or escape the embedded HEREDOC text in step commands.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
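The import-detection requirement in this case can be illustrated with a minimal sketch. It is not the Ricky implementation: `hasTopLevelAliasImport` and `addMissingAliasImports` are hypothetical names, and the sketch assumes the artifact keeps its imports as single-line statements in a leading header. Scanning stops at the first non-import line, so import-looking text inside a later HEREDOC string is never counted and the HEREDOC itself is left untouched.

```typescript
// Hypothetical sketch: names and the single-line-import assumption are
// illustrative, not taken from the Ricky source.
const REQUIRED_ALIASES = [
  { alias: "rickyWorkflowFs", moduleSpecifier: "node:fs" },
  { alias: "rickyWorkflowPath", moduleSpecifier: "node:path" },
];

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

/**
 * True only when the leading import header contains
 * `import * as <alias> from '<moduleSpecifier>'` as its own statement.
 * The scan ends at the first line that is not blank, a comment, or an import.
 */
function hasTopLevelAliasImport(
  source: string,
  alias: string,
  moduleSpecifier: string
): boolean {
  const wanted = new RegExp(
    `^import\\s+\\*\\s+as\\s+${escapeRegExp(alias)}\\s+from\\s*['"]${escapeRegExp(moduleSpecifier)}['"]\\s*;?$`
  );
  for (const rawLine of source.split(/\r?\n/)) {
    const line = rawLine.trim();
    const isCommentOrBlank =
      line === "" || line.startsWith("//") || line.startsWith("/*") || line.startsWith("*");
    if (isCommentOrBlank) continue;
    if (!line.startsWith("import ")) break; // end of the import header
    if (wanted.test(line)) return true;
  }
  return false;
}

/** Prepends any missing alias import; the rest of the artifact is not modified. */
function addMissingAliasImports(source: string): string {
  const missing = REQUIRED_ALIASES
    .filter(({ alias, moduleSpecifier }) => !hasTopLevelAliasImport(source, alias, moduleSpecifier))
    .map(({ alias, moduleSpecifier }) => `import * as ${alias} from '${moduleSpecifier}';`);
  return missing.length === 0 ? source : `${missing.join("\n")}\n${source}`;
}
```

A line-based scan like this errs toward false negatives on unusual headers (multi-line imports, a shebang line), which would produce a duplicate import; a real repairer would more likely walk the module's top-level statements with a parser.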
REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)
User message
Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.
Ricky output
**Local Workflow Execution – Auto‑Fix Run**
Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script) to run the workflow, trigger the auto‑fix path on the first failure, and verify the repaired artifact. The contract assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failed step name and the previous run ID are known.
---
## 1️⃣ Pre‑flight Checks
| Check | Command | Expected outcome |
|-------|---------|------------------|
| CLI version | `aw --version` | `>=2.4.0` |
| Repo root | `git rev-parse --show-toplevel` | prints absolute path |
| Clean working tree | `git status --porcelain` | no output (no uncommitted changes) |
| Docker daemon (if workflow uses containers) | `docker info` | daemon reachable |
If any check fails, resolve it before proceeding.
---
## 2️⃣ Input Parameters (populate before running)
```bash
# ---- USER‑SUPPLIED VALUES ----
FAILED_STEP="my_step_name" # exact name as reported in the failure log
PREV_RUN_ID="run-2024-05-12-01" # the run identifier of the successful prior execution
WORKFLOW_PATH="./workflows/data_ingest.yml" # relative path to the workflow definition
AUTO_FI
```
...[truncated]
Must
- Use a bounded retry budget and summarize every attempt.
- Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
- Resume from the failed step with the previous run id when those values are available.
Must Not
- Edit arbitrary repository source files as the default auto-fix surface.
- Keep retrying after the configured max attempts.
- Lose the single Ricky tracking run id across repair/resume attempts.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
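The bounded-loop expectations in this case (a fixed attempt budget, a summary of every attempt, resume from the failed step with the previous run id, one tracking run id throughout) can be sketched as below. This is a hedged illustration rather than Ricky's actual loop: `runWithBoundedAutoFix`, `RunRequest`, `RunResult`, and the injected `runWorkflow` / `repairArtifact` callbacks are all hypothetical, and the sketch is synchronous for brevity.

```typescript
// Hypothetical sketch: every type and function name here is illustrative.
interface RunRequest {
  trackingRunId: string;   // stays constant across repair/resume attempts
  startFrom?: string;      // failed step to resume from, when known
  previousRunId?: string;  // run id of the attempt being resumed
}

interface RunResult {
  runId: string;
  ok: boolean;
  failedStep?: string;
  blocker?: string;
}

interface AttemptSummary {
  attempt: number;
  request: RunRequest;
  result: RunResult;
  repaired: boolean;
}

function runWithBoundedAutoFix(
  trackingRunId: string,
  maxAttempts: number,
  runWorkflow: (request: RunRequest) => RunResult,
  repairArtifact: (failure: RunResult) => boolean
): { ok: boolean; attempts: AttemptSummary[] } {
  const attempts: AttemptSummary[] = [];
  let request: RunRequest = { trackingRunId };

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = runWorkflow(request);
    if (result.ok) {
      attempts.push({ attempt, request, result, repaired: false });
      return { ok: true, attempts };
    }
    // No repair after the final attempt: the budget is exhausted.
    const isLastAttempt = attempt === maxAttempts;
    const repaired = !isLastAttempt && repairArtifact(result);
    attempts.push({ attempt, request, result, repaired });
    if (!repaired) break;
    // Resume from the failed step, keeping the same tracking run id.
    request = {
      trackingRunId,
      startFrom: result.failedStep,
      previousRunId: result.runId,
    };
  }
  return { ok: false, attempts };
}
```

In this sketch, passing `maxAttempts = 1` gives exactly one run and never calls `repairArtifact`, which is one way to model one-attempt behavior when auto-fix is disabled.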
REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)
User message
A user runs `ricky run workflows/foo.ts --no-auto-fix` and the workflow fails.
Ricky output
**Quick‑Start Troubleshooting Checklist**
| Step | Command / Action | What to Look For |
|------|------------------|------------------|
| 1️⃣ | `ricky logs workflows/foo.ts --tail 200` | Recent error messages, stack traces, and which **gate** failed. |
| 2️⃣ | `ricky run workflows/foo.ts --debug` | Full execution trace, input‑output of each **contract** (pre‑, post‑, and invariant checks). |
| 3️⃣ | Open the workflow file | Verify the **contract signatures** at the top of the file match the expected schema (e.g., `export const contract = { input: {...}, output: {...} }`). |
| 4️⃣ | Run the failing contract in isolation | `ricky test contracts/<contract‑name>.ts --input <sample‑json>` – confirms whether the contract itself is broken or the surrounding logic. |
| 5️⃣ | Check the **deterministic verification gate** logs | Look for `Gate <gate‑id> FAILED` and the accompanying `reason:` field. |
| 6️⃣ | Search recent PRs / merges | A recent change may have introduced a breaking change to a shared library or a dependent workflow. |
| 7️⃣ | If the failure is reproducible, create a **blocker issue** | Include: command used, trimmed logs, relevant contract definitions, and a minimal
...[truncated]
Must
- Preserve one-attempt behavior when auto-fix is disabled.
- Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
- Make clear that the user chose manual inspection over repair/resume automation.
Must Not
- Start a repair loop despite `--no-auto-fix`.
- Suppress the diagnosis because no repair was attempted.
- Present the failure as a completed repair attempt.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)
User message
Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.
Ricky output
**Ricky – Local Execution of a TypeScript Workflow Artifact**
*(Primary local path = `./workflows` – the repository‑root default for all workflow source files.)*
---
## 1. Preconditions (Verification Gate 1)
| Check | Expected | How to Verify |
|-------|----------|---------------|
| **Node runtime** | `>=18.x` (LTS) | `node -v` |
| **npm / pnpm / yarn** | Installed, matches repo lockfile | `npm -v` (or `pnpm -v` / `yarn -v`) |
| **TypeScript compiler** | `tsc` available (via dev‑dependency) | `npx tsc --version` |
| **Workflow source** | `./workflows/<name>.ts` exists and is **exported as default** | `ls ./workflows` |
| **Ricky config** | `ricky.yaml` present with `runtime: local` | `cat ricky.yaml` |
If any gate fails, **report a blocker** with the missing item and abort execution.
---
## 2. Contract – “Run‑TS‑Local” (deterministic contract)
| Field | Type | Description |
|-------|------|-------------|
| `workflowId` | `string` | File name without extension (e.g., `orderProcessor`) |
| `entryPoint` | `string` | Relative path to the TS file (`./workflows/${workflowId}.ts`) |
| `env` | `Record<string,string>` | Optional env vars to inject (merged with process.env) |
| `arg
...[truncated]
Must
- Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the `agent-relay` binary on PATH.
- Precheck that Node and `@agent-relay/sdk` are resolvable for the workflow.
- Record the actual spawn command in execution evidence.
Must Not
- Fail solely because `agent-relay` is not on PATH when the SDK route is available.
- Hide the actual runtime command from evidence.
- Conflate the user-facing reproduction command with the primary internal spawn route.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)
User message
Ricky reaches a boundary after a structural failure persists after a fix attempt.
Ricky output
**Ricky – Boundary Protocol for Persistent Structural Failures**
When a structural failure survives an initial remediation attempt, Ricky follows a deterministic, contract‑driven “Boundary Protocol.” The protocol isolates the failure, prevents downstream contamination, and creates a clear hand‑off for human intervention.
---
## 1. Failure Detection & Initial Fix Attempt
| Step | Action | Contract / Artifact |
|------|--------|----------------------|
| 1.1 | **Detect** a structural failure via the *Structural‑Integrity Gate* (SIG) – a static‑analysis + unit‑test suite that must return `PASS` before any merge. | `SIG.result: {status: "FAIL", details: {...}}` |
| 1.2 | **Apply** the prescribed auto‑fix (e.g., lint‑fix, schema migration, dependency pin). | `autoFix.apply(): boolean` |
| 1.3 | **Re‑run** SIG. | `SIG.result` must be `PASS` to proceed. |
If SIG still returns `FAIL`, the Boundary Protocol is triggered.
---
## 2. Entering the Boundary
1. **Lock the Branch**
* Create a *branch lock* file `.ricky/lock.json` containing:
```json
{
"branch": "<branch‑name>",
"reason": "Persistent structural failure",
"timestamp": "<ISO‑8601>",
"owner"
```
...[truncated]
Must
- Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
- Distinguish escalation from a generic product failure.
- Preserve enough context for a human operator to continue.
Must Not
- Retry speculative fixes indefinitely.
- Collapse to "something went wrong" without the attempted actions.
- Discard evidence from failed repair attempts.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)
User message
Produce a workflow health digest from many Ricky workflow runs.
Ricky output
**Ricky Workflow Health Digest – Generated 2026‑05‑13**
| Workflow | Total Runs (30 d) | Success % | Avg Duration | # Failures | Top Failure Reason | Blocker Flag |
|----------|-------------------|----------|--------------|-----------|--------------------|--------------|
| **Data‑Ingest‑ETL** | 124 | 92 % | 3 m 12 s | 10 | “Source API timeout (HTTP 504)” | ❌ |
| **User‑Sync‑Delta** | 87 | 98 % | 1 m 45 s | 2 | “Schema‑mismatch on profile table” | ❌ |
| **Report‑Gen‑Daily** | 30 | 85 % | 7 m 03 s | 5 | “Insufficient disk space on worker‑node‑3” | ⚠️ (needs ops attention) |
| **Billing‑Reconcile** | 45 | 100 % | 2 m 18 s | 0 | – | ❌ |
| **Alert‑Dispatch** | 210 | 97 % | 0 m 58 s | 6 | “SMTP rate‑limit exceeded” | ⚠️ (rate‑limit increase pending) |
---
### 1. Overall Health Score
**Weighted Success % = 94.3 %** (weights: critical = 1.5×, non‑critical = 1×).
**Health Tier:** **🟢 Green** – All critical workflows (> 90 % success) are within SLA.
### 2. Critical Issues & Action Items
| Issue | Affected Workflow(s) | Impact | Current Blocker? | Recommended Fix (deterministic gate) |
|-------|----------------------|--------|------------------|------------------------------------
...[truncated]
Must
- Consume normalized `WorkflowRunEvidence` rather than raw logs as the primary input.
- Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
- Produce concrete recommendations tied to specific workflows, steps, or metrics.
Must Not
- Mutate evidence while analyzing it.
- Return generic advice like "improve your workflow" without references.
- Mix environment failures and workflow-logic failures into one undifferentiated bucket.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)
User message
Design a new Ricky Slack surface for workflow debugging.
Ricky output
## Ricky Slack Surface – “Workflow Debugger”
### 1. Overview
The **Ricky Slack Debugger** is a dedicated Slack app surface that lets engineers and product owners inspect, step‑through, and repair Ricky‑managed workflows in real time. It follows the **Ricky repository conventions** (AGENTS.md) and the **workflow standards** for deterministic verification, review artifacts, and bounded PR cycles.
| Component | Purpose | Primary Users |
|-----------|---------|---------------|
| **Debug Console** | Interactive view of a workflow instance (state, context, logs) | Engineers |
| **Step‑Control Panel** | Pause, resume, step‑into, step‑out, jump to node | Engineers |
| **Verification Gate** | Auto‑run deterministic checks after each step | CI/CD |
| **Review Artifact Generator** | Emits a markdown “debug report” for PR review | Reviewers |
| **Blocker Reporter** | Flags non‑recoverable errors as GitHub issues | Ops/Team Leads |
| **Scope Selector** | Limits debugging to a branch/PR or a tag‑bound workflow version | Engineers, Release Managers |
### 2. Interaction Flow (Contract)
| Phase | Slack Action | Backend Contract | Output |
|-------|--------------|------------------|--------|
| *
...[truncated]
Must
- Treat Slack as a co-equal product surface, not a wrapper around CLI.
- Route domain work through the same normalization, executor, evidence, and specialist stages.
- Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.
Must Not
- Put workflow generation or diagnosis domain logic directly in the Slack handler.
- Degrade Slack to a developer shortcut with weaker routing than CLI.
- Skip signature verification, dedup, or thread handling when the surface is implemented.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)
User message
Add a new web handoff type that submits a workflow spec and mode.
Ricky output
**New Hand‑off Type:** `web_submit_spec`
| Field | Type | Description |
|-------|------|-------------|
| **type** | `string` (enum) | Fixed value `"web_submit_spec"` – identifies this hand‑off. |
| **payload** | `object` | Container for the data the UI sends. |
| **payload.spec** | `object` | Full workflow specification (JSON‑Schema‑validated). |
| **payload.mode** | `string` (enum) | Execution mode – `"dry‑run"` | `"run"` | `"preview"` |
| **payload.metadata** *(optional)* | `object` | Arbitrary key‑value pairs for tracing (e.g., `requestId`, `userId`). |
---
## 1. Contract Definition (YAML)
```yaml
# agents/web_submit_spec/contract.yaml
type: object
required: [type, payload]
properties:
type:
const: web_submit_spec
payload:
type: object
required: [spec, mode]
properties:
spec:
$ref: "#/definitions/workflowSpec"
mode:
type: string
enum: [dry-run, run, preview]
metadata:
type: object
additionalProperties: true
definitions:
workflowSpec:
type: object
# reference the existing workflow‑spec schema used by the engine
$ref: "../../schemas/workflow-spec.json"
additionalProperties: false
```
...[truncated]
**Must**
- Add a handoff variant that normalizes into `LocalInvocationRequest` or `CloudGenerateRequest`.
- Keep auth/session validation in the web surface before normalization.
- Ensure downstream code does not need to know the request came from web.
**Must Not**
- Short-circuit directly from the web handler to executors.
- Create a parallel domain model for web requests.
- Make the web surface the owner of local or Cloud routing semantics.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
</details>
<details>
<summary><strong>REVIEW</strong> <code>surfaces-ingress.mcp-claude-context-is-metadata</code> (surfaces-ingress/manual)</summary>
**User message**
> A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordinate.
**Ricky output**
```text
**Decision:** Generate a new workflow draft.
**Rationale**
1. **Input Type** – You supplied a *workflow specification* together with *conversation context* but no existing workflow artifact (code, PR, or execution log) to debug or restart.
2. **Requested Outcome** – The goal is to turn the specification and context into an actionable, version‑controlled workflow that ca
```
</details>
---
_Comment truncated to stay within GitHub limits. Download the `ricky-eval-run` artifact for the full `human-review.md`._
Summary
- `deriveTestCommand` now filters cross-repo paths out of `spec.targetFiles` before constructing the `final-hard-validation` vitest invocation, fixing the failure mode that burns the full auto-fix budget when a spec spans multiple repos.
- Triggering case: `spec.targetFiles` included `relayfile-adapters/packages/core/src/digest-contract.test.ts` while the workflow shipped in the `relayfile` repo.

The bug
A spec that legitimately spans two repos (e.g. `relayfile` + `relayfile-adapters`) produced this `final-hard-validation` command:
`npx vitest run 'relayfile-adapters/packages/core/src/digest-contract.test.ts'`
The generated workflow runs in a single repo's cwd (`relayfile`). Vitest's `include` glob `packages/**/*.test.ts` can't reach a path under a sibling repo's directory, and the file doesn't exist locally anyway → `No test files found`, exit 1.
The auto-fix loop then tried to repair this 7×, each time regenerating the workflow inside `relayfile`, each time pointing at the same phantom path. It can't fix it: the workflow-persona repairer operates on `repairTarget.cwd`, and the actual test file belongs in `../relayfile-adapters`. The loop burned its full budget (`INVALID_ARTIFACT` × `maxAttempts`) on a problem it had no means to solve.
This is the exact failure class already called out in the source:
- `pipeline.ts:245–248`: "made the auto-fix loop chase a phantom artifact path… fails INVALID_ARTIFACT every retry until the auto-fix budget burns."
- `master-workflow-renderer.ts:40`: "The auto-fix loop then 'repairs' the workflow 7×, all failing identically."

The fix
`deriveTestCommand` now drops cross-repo paths before deciding what to emit. A path is treated as cross-repo when:
- it escapes the cwd (`../sibling`, `./..`, embedded `/../`), or
- its first segment is not a recognized in-repo source root: `packages`, `apps`, `services`, `src`, `lib`, `libs`, `tests`, `test`, `e2e`, `integration-tests`.
Root-level files with no `/` (e.g. `index.test.ts`) are treated as local since vitest can resolve them from the cwd.
The same filtered list is fed into `uniqueWorkspacesFromTargetFiles`, so sibling-repo paths also don't surface as `npm test --workspace=` invocations downstream.

Test plan
- `pipeline.test.ts`: a spec with mixed local + cross-repo test paths now renders only the local test in `final-hard-validation`; the sibling-repo path is absent from both the gate command and the rendered step body.
- `npx vitest run src/product/generation/pipeline.test.ts`: 49/49 pass.
- `npx vitest run src/product/generation`: 124/124 pass.
- (… `scheduled-agent.test.ts` due to missing `@agent-relay/agent` dep, unrelated).
- … `npx vitest run 'relayfile-adapters/...'` from the failing generated workflow and resumed via `ricky run … --start-from final-hard-validation --previous-run-id …` → succeeded. Confirms this is the right knob.

🤖 Generated with Claude Code
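For reviewers, the classification rule described under "The fix" can be sketched roughly as follows. This is an illustration of the stated rule, not the code in this PR: `isLocalTargetPath` and `filterLocalTargetFiles` are hypothetical names, and stripping a leading `./` before classifying is an assumption of the sketch.

```typescript
// Hypothetical sketch of the classification rule; not the PR's actual code.
const IN_REPO_SOURCE_ROOTS = new Set([
  "packages", "apps", "services", "src", "lib", "libs",
  "tests", "test", "e2e", "integration-tests",
]);

function isLocalTargetPath(targetFile: string): boolean {
  // Assumption: a leading "./" is harmless and is stripped before checking.
  const normalized = targetFile.trim().replace(/^(?:\.\/)+/, "");
  if (normalized === "") return false;
  // Any ".." segment escapes the cwd ("../sibling", "./..", embedded "/../").
  if (normalized.split("/").includes("..")) return false;
  const firstSlash = normalized.indexOf("/");
  // Root-level files such as "index.test.ts" resolve from the cwd.
  if (firstSlash === -1) return true;
  // Otherwise the first segment must be a recognized in-repo source root.
  return IN_REPO_SOURCE_ROOTS.has(normalized.slice(0, firstSlash));
}

function filterLocalTargetFiles(targetFiles: string[]): string[] {
  return targetFiles.filter(isLocalTargetPath);
}

const exampleKept = filterLocalTargetFiles([
  "packages/core/src/local.test.ts",
  "relayfile-adapters/packages/core/src/digest-contract.test.ts",
  "../sibling/other.test.ts",
  "index.test.ts",
]);
console.log(exampleKept);
// → [ 'packages/core/src/local.test.ts', 'index.test.ts' ]
```

An allowlist of source roots keeps the check independent of the filesystem, at the cost of dropping local tests that live under an unlisted top-level directory.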