fix(generation): make master child slices baseline-aware in a shared worktree#121

Merged
khaliqgant merged 2 commits into main from fix/master-children-shared-worktree-scope
May 16, 2026

@khaliqgant
Member

Summary

Root cause of the workspace-primitives PR-39 child update-last-week stall (the third finding from the #117 / #119 thread).

The master executor runs every child slice in the same checkout — .run({ cwd: process.cwd() }), and each run-<child> step is just ricky run <child> --foreground in that cwd. Child slices declare a per-slice changed-files scope; reviewers and fix-loops enforce it by reading git status / git diff. With a shared worktree, by the time a later child is reviewed the tree already contains every earlier sibling's dirty files, so the reviewer/validator saw dozens of out-of-scope files, assigned BLOCKED, and wrote BLOCKED_NO_COMMIT.md.

Concretely, update-last-week's codex validator blocked on:

git status --short | wc -l   -> 50
git diff --shortstat         -> 21 files changed, 4433 insertions(+), 115 deletions(-)

— all sibling-owned (digest runtime, mountfuse, mountsync, Python SDK, CLI…). The slice's own deliverable was correct and every proof passed (go test, pytest, contract check, typecheck, dry-run). It was a false block, and combined with #117/#119 it looped for hours.

The spec's own execution model prescribes one worktree per slice, but the renderer emits shared-checkout runs — so the scope check must be baseline-relative.

Fix

  • prepare-context snapshots the pre-child dirty set to <artifactsDir>/scope-baseline.txt (git status --porcelain) before the child touches anything.
  • lead-plan / implement-slice prose: the worktree is shared; baseline files are sibling/parent state and must not be edited, reverted, or cleaned.
  • A single sharedWorktreeScopeRule injected into every review/fix/final task: judge scope, findings, and BLOCKED only on the delta this child introduces on top of the baseline (current git status --porcelain minus scope-baseline.txt); never BLOCKED solely because unrelated sibling files are dirty.

Preserves the scope gate's real purpose (catch a child editing outside its declared targets) while tolerating the shared worktree the renderer actually produces.
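
The baseline subtraction described above can be sketched as a set difference over paths. This is a minimal illustration, not the renderer's code: in this PR the rule is prose injected into reviewer tasks, and `porcelainPaths`, `scopeDelta`, and the sample file paths are hypothetical. It assumes the plain `git status --porcelain` line shape (two status characters, a space, then the path) and does not handle quoted paths.

```typescript
// Hypothetical helper: collects the paths named in `git status --porcelain` output.
// The input must not be trimmed, because a leading space is part of the status code.
function porcelainPaths(porcelain: string): Set<string> {
  const paths = new Set<string>();
  for (const line of porcelain.split("\n")) {
    if (line.length < 4) continue; // needs "XY " plus at least one path character
    const entry = line.slice(3);
    // Rename entries look like "old -> new"; keep the destination path.
    const arrow = entry.indexOf(" -> ");
    paths.add(arrow === -1 ? entry : entry.slice(arrow + 4));
  }
  return paths;
}

// Hypothetical helper: paths dirty now that were not dirty in the baseline snapshot.
function scopeDelta(baseline: string, current: string): string[] {
  const base = porcelainPaths(baseline);
  return Array.from(porcelainPaths(current))
    .filter((path) => !base.has(path))
    .sort();
}

// Sample data (invented paths): two sibling-owned files were already dirty
// when the child started, and the child then edited one file of its own.
const baseline = [" M digest/runtime.go", "?? sdk/python/client.py"].join("\n");
const current = [
  " M digest/runtime.go",
  "?? sdk/python/client.py",
  " M cli/update_last_week.ts",
].join("\n");

console.log(scopeDelta(baseline, current)); // only the child's own file remains
```

Comparing by path rather than by whole line means a baseline file whose status code changes still counts as baseline. The trade-off is that a child editing a sibling-owned baseline file would not appear in the delta, which is why the planning prose forbids touching those files.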

Alternative considered

True per-child git worktree isolation (what the spec prescribes). Rejected for this PR: it needs a merge-back strategy for each slice's declared files and changes the master execution model significantly. Baseline-relative scoping is the minimal correct fix; worktree isolation can follow as a separate, larger change.
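
For comparison, the rejected alternative would start from something like the following. It is only a sketch of the setup and teardown commands, assuming one branch and one directory per child. `worktreeCommands`, the `slice/<child>` branch naming, and the directory layout are hypothetical, and the merge-back step that makes this a larger change is deliberately left out.

```typescript
interface WorktreePlan {
  create: string[]; // argv that creates the isolated checkout
  remove: string[]; // argv that removes it afterwards
}

// Hypothetical helper: builds git argv arrays but does not execute them.
function worktreeCommands(child: string, rootDir: string, baseRef: string): WorktreePlan {
  const dir = `${rootDir}/${child}`;
  return {
    // `git worktree add -b <branch> <path> <commit-ish>` creates a new branch
    // and checks it out in its own directory.
    create: ["git", "worktree", "add", "-b", `slice/${child}`, dir, baseRef],
    // `git worktree remove <path>` deletes the checkout; git refuses when it
    // still has uncommitted changes unless --force is given.
    remove: ["git", "worktree", "remove", dir],
  };
}

const plan = worktreeCommands("update-last-week", ".worktrees", "HEAD");
console.log(plan.create.join(" "));
console.log(plan.remove.join(" "));
```

With a checkout per child, `git status` inside that directory would reflect only that child's edits, so no baseline subtraction would be needed.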

Test plan

  • New pipeline regression test: child source contains the scope-baseline.txt snapshot, the Shared-worktree scope rule, the "do not BLOCK on sibling dirt" clause, and the baseline-subtraction definition
  • npx vitest run — 1192/1192 pass
  • npx tsc --noEmit clean

Related

Completes the trio from one master run: #117 (resume-anchor skip-loop), #119 (opaque gate failure), this (shared-worktree false BLOCK — the actual root cause).

🤖 Generated with Claude Code

…worktree

The master executor runs every child slice in the SAME checkout
(`.run({ cwd: process.cwd() })`, and each `run-<child>` step is just
`ricky run <child> --foreground` in that cwd). So by the time a later
child is reviewed, earlier siblings' changes are already dirty in the
tree. Child slices declare a per-slice changed-files scope; reviewers
and fix-loops enforce it by reading `git status`/`git diff`. With a
shared worktree they saw dozens of out-of-scope sibling files, assigned
BLOCKED, and wrote BLOCKED_NO_COMMIT.md — a false block that stalled the
entire master plan for hours even though the slice's own work was
correct and all proofs passed (observed on the workspace-primitives
PR-39 run, child `update-last-week`: 50 dirty entries / 21 files /
4433 insertions, all sibling-owned).

The spec's own execution model prescribes one worktree per slice, but
the renderer emits shared-checkout runs, so the scope check must be
baseline-relative:

- `prepare-context` now snapshots the pre-child dirty set to
  `<artifactsDir>/scope-baseline.txt` (`git status --porcelain`) before
  the child touches anything.
- `lead-plan` and `implement-slice` prose tell agents the worktree is
  shared, baseline files are sibling/parent state, and must not be
  edited, reverted, or cleaned.
- A single `sharedWorktreeScopeRule` is injected into every
  review/fix/final task: judge scope, findings, and BLOCKED only on the
  delta this child introduces on top of the baseline (current
  `git status --porcelain` minus scope-baseline.txt), and never BLOCK
  solely because unrelated sibling files are dirty.

This preserves the scope gate's real purpose (catch a child editing
outside its declared targets) while tolerating the shared worktree the
renderer actually produces.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented May 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 46bce90e-214e-4510-ae09-c20115907e52

📥 Commits

Reviewing files that changed from the base of the PR and between 5d1723b and 924ec3b.

📒 Files selected for processing (2)
  • src/product/generation/master-workflow-renderer.ts
  • src/product/generation/pipeline.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/product/generation/master-workflow-renderer.ts

📝 Walkthrough

This PR adds shared-worktree baseline scoping to child workflow generation. Before execution, each child snapshots the worktree's initial dirty state. Every reviewer and fix-loop stage is given a sharedWorktreeScopeRule that restricts scope evaluation to the delta relative to that baseline, which prevents false BLOCKED verdicts caused by unrelated sibling file modifications. A regression test validates the baseline-aware behavior.

Changes

Shared-worktree baseline scoping for child workflows

| Layer / File(s) | Summary |
|-----------------|---------|
| Shared-worktree baseline rule definition <br> src/product/generation/master-workflow-renderer.ts | Defines sharedWorktreeScopeRule constant explaining how reviewers/fix-loops must judge scope only on delta relative to baseline snapshot, disallowing BLOCKED verdicts caused solely by unrelated sibling artifacts. |
| Baseline capture and prompt updates <br> src/product/generation/master-workflow-renderer.ts | Updates prepare-context step to snapshot worktree dirty set into scope-baseline.txt via git status --porcelain, and updates planning/implementation prompts to treat baseline files as pre-existing state not subject to scope violations. |
| Rule injection into all review and fix-loop stages <br> src/product/generation/master-workflow-renderer.ts | Injects sharedWorktreeScopeRule into Claude and Codex reviewer instructions across first-pass review, fix-loop, final-review, and final-fix stages to enforce consistent baseline-relative scope constraints throughout the review lifecycle. |
| Regression test for baseline-aware rendering <br> src/product/generation/pipeline.test.ts | Vitest case verifies generated child workflow snapshots initial worktree state, includes required baseline artifacts, and injects the shared-worktree scope rule to prevent false BLOCK outcomes from sibling dirt. |

sequenceDiagram
participant Renderer as childWorkflowSource
participant Git as git
participant Child as ChildWorkflow (prepare-context)
participant Claude as ClaudeReviewer
participant Codex as CodexReviewer
Renderer->>Git: run git status --porcelain -> write artifactsDir/scope-baseline.txt
Renderer->>Child: include reference to scope-baseline.txt in prompts
Child->>Claude: submit changes + sharedWorktreeScopeRule
Child->>Codex: submit changes + sharedWorktreeScopeRule
Claude->>Child: evaluate delta (git status - scope-baseline) -> verdict/fixes
Codex->>Child: evaluate delta (git status - scope-baseline) -> verdict/fixes

🎯 2 (Simple) | ⏱️ ~12 minutes

A rabbit hops through the worktree,
Snapshotting baseline dust to be free,
No false blocks from sibling noise,
Each child reviews just its own joys! 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title directly describes the main change: making child slices baseline-aware to handle scope validation in a shared worktree environment, which is the core fix. |
| Description check | ✅ Passed | The description is comprehensive and closely related to the changeset, explaining the root cause, the fix implementation, alternatives considered, and test plan. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/product/generation/master-workflow-renderer.ts`:
- Around line 387-395: The baseline snapshot command writing to
scope-baseline.txt (uses artifactsDir and the git status command) is taken while
sibling children may still mutate the shared checkout (master executor runs
concurrent children), so later sibling changes can be misattributed to the wrong
child; fix by either (A) serializing child execution in the master workflow
renderer so the snapshot is taken with no concurrent siblings (turn off/guard
the concurrency path around the child-run dispatch referenced near the
concurrent-run logic) or (B) switch to per-child isolated worktrees before
taking the baseline (create a per-child git worktree/checkout or temporary clone
for the child, then run the `git status --porcelain >
${artifactsDir}/scope-baseline.txt` inside that isolated worktree), and ensure
all subsequent scope-delta calculations use that isolated workspace; update the
code paths that reference artifactsDir/scope-baseline.txt and the
child-dispatch/concurrency logic accordingly.

In `@src/product/generation/pipeline.test.ts`:
- Around line 85-95: Replace the raw substring assertions on the generated
workflow text (the local variable content and its toContain checks for
'Shared-worktree scope rule' and related strings) with parser-based structural
assertions: parse the generated workflow source (use a TypeScript/JSON/YAML
parser appropriate to the workflow format) to extract the steps/tasks for the
review and fix stages, then assert that each expected review/fix step contains
the shared-worktree scope rule object (i.e., the rule is present as a structured
node/field attached to those tasks) rather than matching on literal text; update
assertions around the "prepare-context" snapshot check to verify the scope
baseline is used by checking the task's input/argument AST/node that references
scope-baseline.txt and the subtraction expression (current git status minus
scope-baseline.txt). Ensure you operate on the same test-local variables
(content) and test names but replace toContain usages with AST/JSON/YAML field
checks against the task/step nodes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 2398453d-63f1-42e3-871b-1e32f3d02234

📥 Commits

Reviewing files that changed from the base of the PR and between e52e7b8 and 5d1723b.

📒 Files selected for processing (2)
  • src/product/generation/master-workflow-renderer.ts
  • src/product/generation/pipeline.test.ts

Comment on lines +387 to +395
// Snapshot the worktree's dirty set BEFORE this child touches anything.
// The master executor runs every child in the SAME checkout, so by the
// time a later child starts, earlier siblings' changes are already
// dirty here. Files listed in scope-baseline.txt are pre-existing
// sibling/parent state this child does not own — the scope gate must
// judge this child only on the delta it introduces, never on this
// baseline. Without it, every child after the first false-blocks on
// sibling contamination.
`git status --porcelain > ${shellQuote(`${artifactsDir}/scope-baseline.txt`)} 2>/dev/null || : > ${shellQuote(`${artifactsDir}/scope-baseline.txt`)}`,
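
The snapshot line above relies on a `shellQuote` helper whose implementation is not shown in this thread. A minimal stand-in, assuming POSIX single-quote escaping, shows how the command string comes together. `snapshotCommand` is a hypothetical wrapper name and the artifacts path in the usage line is invented.

```typescript
// Hypothetical stand-in for the renderer's shellQuote: wraps the value in
// single quotes and rewrites each embedded single quote as '\''.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Builds the baseline snapshot command for one child's artifacts directory.
function snapshotCommand(artifactsDir: string): string {
  const target = shellQuote(`${artifactsDir}/scope-baseline.txt`);
  // If `git status` fails (for example outside a repository), `: > file`
  // still leaves an empty baseline file so later steps can read it.
  return `git status --porcelain > ${target} 2>/dev/null || : > ${target}`;
}

console.log(snapshotCommand("artifacts/update-last-week"));
```

Quoting the path keeps the redirect target intact when the artifacts directory contains spaces or quotes.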

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Baseline snapshot is not stable when sibling children run concurrently.

Line 395 captures scope-baseline.txt only once at child start, but Line 206 allows concurrent child runs in the same checkout. Sibling dirt introduced after this snapshot is still outside the baseline and can be misclassified as this child’s scope delta, which reintroduces false BLOCKED outcomes.

Consider either serializing child runs in shared worktree mode or moving to per-child isolated worktrees before relying on baseline subtraction for gating.
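
The difference between the sequential case and the concurrent case can be made concrete with a small simulation. It is a sketch under stated assumptions: `judgeScope` is a hypothetical deterministic stand-in for what the reviewers are asked to do in prose, and the file paths are invented.

```typescript
interface Verdict {
  blocked: boolean;
  outOfScope: string[];
}

// Hypothetical scope gate: a path counts against the child only when it is
// dirty now, was not dirty at snapshot time, and is not a declared target.
function judgeScope(baselinePaths: string[], currentPaths: string[], declared: string[]): Verdict {
  const baseline = new Set(baselinePaths);
  const allowed = new Set(declared);
  const outOfScope = currentPaths.filter((path) => !baseline.has(path) && !allowed.has(path));
  return { blocked: outOfScope.length > 0, outOfScope };
}

const declaredTargets = ["cli/update.ts"];
const dirtyAtReview = ["mountsync/sync.go", "cli/update.ts"];

// Sequential: the sibling finished before the snapshot, so its file is baseline.
const sequential = judgeScope(["mountsync/sync.go"], dirtyAtReview, declaredTargets);

// Concurrent: the sibling wrote its file after the snapshot, so the same
// file now looks like this child's out-of-scope edit.
const concurrent = judgeScope([], dirtyAtReview, declaredTargets);

console.log(sequential.blocked, concurrent.blocked, concurrent.outOfScope);
```

The review-time tree is identical in both runs; only the moment of the snapshot differs, which is why a one-time baseline cannot separate concurrent siblings.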

Comment thread src/product/generation/pipeline.test.ts Outdated
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.


@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 2 files

… concurrency residual

Addresses PR #121 review:

- CodeRabbit (pipeline.test.ts): replace raw substring assertions on the
  generated workflow blob with TypeScript-AST extraction. Export
  childWorkflowSource; the test now parses each `.step("<id>", {...})`
  config and asserts the baseline snapshot is on `prepare-context` and
  the shared-worktree scope rule is attached to each review/fix stage's
  `task` — verifying the contract is wired to the right steps, not that
  text appears anywhere (AGENTS.md parser-not-regex rule).
- CodeRabbit (master-workflow-renderer.ts): document the honest scope of
  the baseline heuristic. It fully fixes sequential accumulation (the
  observed stall). Concurrent siblings within a .maxConcurrency(4) wave
  remain a residual a snapshot cannot close; per-child git-worktree
  isolation is the durable fix, deferred as a separate larger change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@khaliqgant
Member Author

Addressed both review comments in 924ec3b:

1. pipeline.test.ts — parser-based assertions (CodeRabbit, quick win): Done. Exported childWorkflowSource and the test now parses each generated child workflow with the TypeScript AST (extractStepConfigs), then asserts the baseline snapshot is on the prepare-context step and the shared-worktree scope rule is attached to each review/fix stage's task value specifically — verifying the contract is wired to the right steps, not that a string appears somewhere in the blob. Matches the AGENTS.md parser-not-regex rule.

2. Concurrent baseline race (CodeRabbit, valid): Acknowledged and documented honestly rather than papered over. The observed failure (update-last-week stall) was the sequential-accumulation case — siblings had already completed when the child ran (50 dirty entries, all from finished siblings) — which baseline subtraction fully fixes. The concurrent-within-wave race (master is .maxConcurrency(4)) is a genuine residual a one-time snapshot cannot close; a sibling that dirties a new file after the snapshot is still misattributable. The only deterministic fix is per-child git worktree isolation, which I explicitly deferred as a separate, larger change in the "Alternative considered" section. Added an in-source comment at the snapshot site spelling out this boundary so future readers know the heuristic's limits.

All checks were green; full suite still 1192/1192, typecheck clean.

@github-actions
Contributor

github-actions Bot commented May 16, 2026

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-16T08-49-42-664Z-HEAD-provider
Mode: provider
Git SHA: 6d056f5

Passed: 11 | Needs human: 44 | Reviewable: 44 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

## Updated Ricky Documentation – “Deeper Integration with Agent Assistant”

### 1. Overview
Ricky (AgentWorkforce workflow reliability, coordination, and authoring assistant) now **leverages the Agent Assistant (AA) platform as a first‑class dependency** for:

| Feature | How AA is used | Benefit |
|---------|----------------|---------|
| **Deterministic verification gates** | AA’s `verify()` micro‑service runs contract checks on every workflow commit. | Guarantees that every change satisfies the shared contract before merge. |
| **Dynamic review artifacts** | AA’s `review()` endpoint generates structured review reports (diff, risk score, test coverage). | Provides a single source of truth for reviewers and automates gating. |
| **Validation loops** | AA orchestrates 80‑to‑100 % loop coverage via its `loop()` orchestrator, feeding back into Ricky’s `validate()` stage. | Ensures near‑complete path coverage without manual test authoring. |
| **Blocker detection & reporting** | AA’s `blocker()` service surfaces real‑time blockers (missing contracts, failing gates) in the PR comment thread. | Early visibility, reduces merge‑time friction. |
| **Scoped branch/PR boundaries** | AA enforc
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review**  
*(as of the latest CI run on 2026‑05‑14)*  

| Metric | Current Value | Target / Trend | Status |
|--------|---------------|----------------|--------|
| **Workflow contracts defined** | 12 of 15 active agents have a `turn‑context` contract in `workflow.yaml` | 100 % | ⚠️ 3 agents missing |
| **Deterministic verification gates** | 9 agents have `turn‑context` gate in CI (unit‑test + schema‑check) | 100 % | ⚠️ 6 agents lack gate |
| **Review artifacts** | 78 % of PRs include `turn‑context` impact analysis (auto‑generated checklist) | ≥ 90 % | ⚠️ Below target |
| **Validation loops (80‑100)** | Avg. loops per agent: 62 | ≥ 80 | ❌ Under‑validated |
| **Blocker reporting** | 4 open blocker tickets (all “missing contract”) | 0 open blockers | ❌ |
| **Branch/PR scoping** | 68 % of PRs scoped to a single `turn‑context` change | ≥ 85 % | ⚠️ |
| **Adoption growth (last 30 d)** | +7 % new contracts, +4 % new verification gates | +10 % / 30 d | ↗️ Slightly lagging |

---

### Key Findings

1. **Partial Coverage** – While most agents have added a `turn‑context` contract, three production agents (`@agent‑assistant/notify‑slack`, `@agent‑a
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation: Move the listed capabilities into the Agent Assistant**

| Capability | Fit for Agent Assistant | Rationale | Migration Path |
|------------|------------------------|-----------|----------------|
| **Workflow generation** | ✅ | Core to the Assistant’s purpose of authoring reliable pipelines. It leverages the existing contract‑first templating engine and can be exposed as a first‑class command (`assistant generate‑workflow`). | 1. Add a `generate` sub‑command that accepts a **workflow contract** (YAML/JSON) and produces a scaffold in the `workflows/` directory. <br>2. Include deterministic verification that the generated files pass the **schema‑validation gate** before committing. |
| **Workflow validation** | ✅ | Validation is already a deterministic gate in the CI pipeline; exposing it in the Assistant gives developers immediate feedback and reduces PR turnaround. | 1. Implement `assistant validate‑workflow <path>` that runs the same schema + lint checks used in CI. <br>2. Return a **machine‑readable report** (JSON) plus a concise human summary. |
| **Workflow debugging** | ✅ | Debugging is a coordination problem (identifying missing artifacts, mismatched contrac
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption – “Ricky v2.0 Slice”**  
*Target window: 12 weeks (May 20 – Aug 12 2026)*  

---

## 1. Slice Goal
Deliver a production‑ready, self‑service adoption package that lets any AgentWorkforce team onboard Ricky with **zero‑touch CI/CD**, **deterministic verification**, and **clear rollback**. The slice ends with a **public “Adopt‑Now” release** and a **post‑mortem** that feeds into the next roadmap.

---

## 2. Scope (What’s In / Out)

| In‑Scope | Out‑Scope |
|----------|-----------|
| • Ricky‑core (workflow engine, contract DSL, verification runner) | • Feature‑parity with future “AI‑coach” extensions |
| • CLI/VS‑Code extension for local authoring | • Full‑blown UI portal (deferred to v2.1) |
| • GitHub‑template repo (`ricky‑adopt‑template`) with CI pipelines | • Enterprise‑wide SSO integration (use existing Org‑SSO) |
| • Deterministic test‑suite (80‑100 validation loops) | • Custom domain‑specific adapters (customers add later) |
| • Adoption playbook (markdown + diagrams) | • Legacy‑system migration scripts |

---

## 3. Milestones & Deliverables

| Week | Milestone | Deliverable | Verification Gate |
|------|-----------|-------------|--------------------
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky – Future Slack / Web Support Design  
*Agent Assistant Package (AAP) v2.3 – “Ricky‑Support”*  

---

### 1. High‑level Architecture  

| Component | Responsibility | Runtime | Key AAP Packages |
|-----------|----------------|---------|------------------|
| **Slack Bot Front‑End** | Receives user messages, renders interactive blocks, forwards to core engine. | Node 18 (AWS Lambda) | `@ricky/slack‑adapter` |
| **Web Chat Front‑End** | Embedded React widget, supports rich media, fallback to Slack link. | Browser (React 18) | `@ricky/web‑widget` |
| **Core Engine** | Executes workflow contracts, coordinates agents, enforces verification gates. | Python 3.11 (AWS Fargate) | `@ricky/engine`, `@ricky/contract‑runtime` |
| **Agent Registry** | Stores versioned agent definitions, metadata, and capability matrix. | DynamoDB (v2) | `@ricky/registry` |
| **Verification Service** | Runs deterministic checks, stores results, exposes gate API. | Go 1.22 (ECS) | `@ricky/verify` |
| **Review & Audit Store** | Immutable log of PRs, review artifacts, blocker reports. | S3‑ObjectLock + QLDB | `@ricky/audit` |
| **CI/CD Pipeline** | Enforces 80‑100 validation loops, branch/PR boundaries, autom
...[truncated]

Must

  • Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
  • Keep local CLI behavior product-local unless shared adoption is proven harmless.
  • Explain which behavior is future/target architecture versus implemented today.

Must Not

  • Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
  • Let future surface abstractions distort the current CLI contract.
  • Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What the command does**

Running  

```bash
ricky --mode local --spec "generate a workflow for package checks"
```

  • --mode local – tells Ricky to operate in the local development environment (uses the local repository, local Docker daemon, etc.).
  • --spec "generate a workflow for package checks" – is the natural‑language request that Ricky will translate into a concrete workflow definition (YAML/JSON) following the AgentWorkforce contract.
  • No --run flag – means Ricky will only generate the workflow artifact; it will not submit the workflow to the executor, start any containers, or perform any checks.

So the command will:

  1. Parse the spec string.
  2. Resolve the appropriate workflow template (e.g., package-checks.yml).
  3. Fill in any required parameters (defaults, inferred values, etc.).
  4. Output the fully‑rendered workflow file to the console (or to ./ricky-output/ if the config enables file output).
  5. Exit with status 0 if generation succeeded, or a non‑zero code with an error message if the spec could not be mapped to a known template.

Typical output

--- Generated workflow: package-checks.yml ---
name: package-checks
o
...[truncated]

Must

  • Say generation is the default and execution was not requested.
  • Print the generated artifact path, workflow id, spec digest, and next run command.
  • Avoid showing execution evidence for a generation-only request.

Must Not

  • Imply the workflow ran automatically.
  • Present a generation-only result as execution success.
  • Hide the opt-in commands for running the artifact.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.first-run-copy-is-compact-and-truthful (cli-behavior/manual)

User message

Render Ricky's first-run CLI onboarding for a new user.

Ricky output

**Ricky – First‑Run CLI Onboarding**

Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the **Ricky** AgentWorkforce CLI. It follows the repository conventions (AGENTS.md), the workflow standards, and the shared authoring rules. The contract, verification gates, and review artifacts are all included so you can copy‑paste the script into a new branch, run the validation loop, and merge with confidence.

---

## 1️⃣  Workflow Contract

| Item | Description |
|------|-------------|
| **Name** | `ricky-onboard-first-run` |
| **Scope** | One‑time setup for a new local development environment (install dependencies, configure secrets, bootstrap a sample project). |
| **Inputs** | None (runs interactively). |
| **Outputs** | - `~/.ricky/config.yaml` (user config) <br> - `ricky-demo/` (sample project) <br> - `ricky.log` (run‑time log) |
| **Success Criteria** | - All commands exit `0` <br> - Config file contains a valid `api_key` field <br> - Sample project builds and passes `ricky test` |
| **Failure Mode** | Any non‑zero exit aborts the run, prints a clear blocker message, and leaves a `ricky.onboard.error` artifact for CI review. |

---

## 2️⃣  Onboarding S
...[truncated]

Must

  • Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
  • End every branch with a concrete next step.
  • Advertise only commands that are currently implemented.

Must Not

  • Sound like a launch page or documentation dump.
  • Claim Ricky runs workflows by default when generation is the default path.
  • Require web or Slack onboarding before CLI use.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.recovery-guidance-no-stack-traces (cli-behavior/manual)

User message

A user gives Ricky an empty spec or a missing spec file.

Ricky output

**Handling an Empty or Missing Specification File**

When a user submits an empty spec or fails to provide a spec file, follow the deterministic, contract‑driven workflow below. This ensures the issue is surfaced early, the team stays aligned, and the work can resume without ambiguity.

---

## 1. Immediate Blocker Detection (Gate #0)

| Condition | Action |
|-----------|--------|
| Spec file **absent** or **zero‑byte** | **Fail** the “Spec Presence” gate. Emit a **Blocker Report** (see §2). |
| Spec file present but **fails JSON/YAML schema validation** | **Fail** the “Spec Syntax” gate. Emit a **Blocker Report**. |

*No downstream steps are executed until the blocker is cleared.*

---

## 2. Blocker Report Template

🛑 BLOCKER – Missing or Empty Specification
Repository: <repo‑name>
Branch: <branch‑name>
PR: <PR‑number>
Detected by: Ricky (AgentWorkforce)
Timestamp: <ISO‑8601>

Details:

  • File: <path/to/spec.file>
  • Condition: <missing | empty | invalid>
  • Impact: All downstream validation, code‑gen, and CI steps are halted.

Required Action:

  1. Upload a complete spec file that conforms to the <SPEC‑SCHEMA‑VERSION> schema.
  2. If the spec is intentionally empty (e.g
    ...[truncated]

**Must**

- Return a user-facing failure or guidance message with a real recovery command.
- Distinguish generation failure from execution failure.
- Show stack traces only when verbose diagnostic mode is requested.

**Must Not**

- Crash with an uncaught exception in normal mode.
- Suggest commands that do not exist.
- Pretend a missing spec was accepted.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.status-does-not-invent-provider-state</code> (cli-behavior/manual)</summary>

**User message**

> Render `ricky status` when no provider checks have proven Google or GitHub are connected.

**Ricky output**

```text
**`ricky status` output (no provider checks have proven Google or GitHub are connected)**  

=== Ricky AgentWorkforce Status Report ===

🟡 Provider Connectivity
• Google: NOT VERIFIED – no successful OAuth / API health check
• GitHub: NOT VERIFIED – no successful token validation / API health check

🔧 Configuration
• Google OAuth client ID:
• Google OAuth secret:
• GitHub App ID:
• GitHub App private key:

⚙️ Last Verification Run
• Timestamp: 2026‑05‑16 09:12:03 UTC
• Checks executed: 0 (provider health checks skipped)
• Result: Incomplete – missing provider verification

🚨 Blockers
• Google provider not confirmed – workflow steps that depend on Google (e.g., user SSO, Drive access) are disabled.
• GitHub provider not confirmed – repository sync, PR automation, and code‑review hooks are disabled.

🛠️ Recommended Actions
1. Run provider health checks:
- ricky provider check google
- ricky provider check github
2. Verify that OAuth / App credentials are correct and reachable from the runtime envir
...[truncated]
```

**Must**

- Report unknown or not-connected provider state honestly.
- Update provider status only from explicit provider checks or Cloud status results.
- Give concrete setup guidance for Cloud when relevant.

**Must Not**

- Mark Google or GitHub connected because guidance text was shown.
- Invent a provider connection URL or OAuth flow.
- Show empty fields with no recovery guidance when config is missing.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.skill-matcher-registry-backed</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

**Ricky output**

```text
## GitHub Primitive‑Change Workflow  
**Purpose:** Apply a single‑file edit (e.g., README typo fix) to a public repository while guaranteeing reproducible review, CI, and merge safety.

| Phase | Owner | Action | Artifact | Verification Gate |
|-------|-------|--------|----------|--------------------|
| **1️⃣ Init** | **Ricky (Coordination)** | `ricky init‑change --repo <owner>/<repo> --branch <base‑branch>` | `change‑spec.yaml` (contains PR title, description, file path, diff stub) | **Spec‑Schema** – YAML validates against `schemas/change‑spec.json`. |
| **2️⃣ Branch** | **Ricky (Branching)** | `git checkout -b <feature‑branch>` | Local branch created | **Git‑State** – `git status` must be clean; no untracked files. |
| **3️⃣ Edit** | **Developer** | Apply edit (e.g., `sed -i 's/teh/the/g' README.md`) | Updated file + `git diff` | **Diff‑Lint** – `ricky lint‑diff` checks for stray whitespace, line‑end changes. |
| **4️⃣ Commit** | **Developer** | `git add <file>` → `git commit -m "<PR title>"` | Commit object | **Commit‑Policy** – Commit message matches `change‑spec.title` (regex). |
| **5️⃣ Push** | **Developer** | `git push origin <feature‑branch>` | Remote branch | **Push‑Guar
...[truncated]
```

Must

  • Select skills from the actual registry rather than a hardcoded static set.
  • Record ranked skill matches with confidence and trigger evidence.
  • Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

  • Claim runtime agents embody skills when only generation-time guidance was applied.
  • Fail generation solely because optional skill files are missing.
  • Hide skill selection evidence from artifacts or JSON output.
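
The Must and Must Not items above can be read as a small ranking function with a threshold and a soft fallback. The following is a minimal sketch under stated assumptions, not Ricky's actual matcher: `SkillEntry`, `rankSkills`, the trigger-overlap scoring, the `0.5` threshold, and the skill ids are all hypothetical and chosen only for illustration.

```typescript
// Hypothetical registry entry: an id plus trigger phrases.
interface SkillEntry {
  id: string;
  triggers: string[];
}

interface SkillMatch {
  id: string;
  confidence: number; // fraction of triggers found in the spec
  evidence: string[]; // the triggers that matched
}

interface SkillSelection {
  matches: SkillMatch[];
  warnings: string[];
}

// Rank registry skills against the spec; never throw when the registry is
// missing or nothing clears the threshold.
function rankSkills(spec: string, registry: SkillEntry[] | undefined, threshold = 0.5): SkillSelection {
  if (!registry || registry.length === 0) {
    return { matches: [], warnings: ["skill registry missing or empty; generating without skills"] };
  }
  const text = spec.toLowerCase();
  const matches = registry
    .map((skill): SkillMatch => {
      const evidence = skill.triggers.filter((trigger) => text.includes(trigger.toLowerCase()));
      const confidence = skill.triggers.length === 0 ? 0 : evidence.length / skill.triggers.length;
      return { id: skill.id, confidence, evidence };
    })
    .filter((match) => match.confidence >= threshold)
    .sort((a, b) => b.confidence - a.confidence);
  const warnings = matches.length === 0 ? ["no skill cleared the threshold; generating without skills"] : [];
  return { matches, warnings };
}

// Illustrative registry; a real one would be loaded from disk.
const selection = rankSkills("Generate a workflow for a GitHub primitive change", [
  { id: "github-primitive", triggers: ["github", "primitive"] },
  { id: "slack-notify", triggers: ["slack", "channel"] },
]);
console.log(JSON.stringify(selection, null, 2));
```

Returning the evidence and warnings alongside the matches keeps the selection auditable in artifacts or JSON output, which is what the rubric asks for.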

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow: “Claude‑Review → Codex‑Implement”**  
*Purpose*: Automate a deterministic pipeline where Claude (LLM‑reviewer) validates design/spec changes and Codex (code‑generation model) produces the implementation. The workflow is scoped to a single feature branch and enforces explicit contracts, verification gates, and review artifacts.

---

## 1. High‑Level Diagram  

[Feature Branch] ──► (1) Claude Review Stage ──► (2) Codex Generation Stage ──► (3) CI Verification ──► (4) PR Open → Human Gate → Merge


---

## 2. Contract Definitions  

| Contract | Owner | Input | Output | Success Criteria |
|----------|-------|-------|--------|------------------|
| **C1 – ReviewSpec** | Claude | `spec.yaml` (user‑provided) | `review_report.json` | All required fields present, no “critical” issues, confidence ≥ 0.85 |
| **C2 – GenerateCode** | Codex | `spec.yaml` + `review_report.json` | `src/` (generated code) + `gen_meta.json` | Compiles, passes lint, matches spec coverage ≥ 90 % |
| **C3 – VerifyBuild** | CI | `src/` | `build_artifact.zip` + `test_report.xml` | Build succeeds, all unit tests pass, coverage ≥ 80 % |
| **C4 – PRReady** | Automation | `build_artifact.zip`, `test_repor
...[truncated]

Must

  • Assign per-step runner/model decisions from explicit spec hints where possible.
  • Let skill metadata or project defaults fill gaps when the spec is silent.
  • Write tool-selection.json or equivalent audit metadata.

Must Not

  • Use one generic runtime default for every agent despite explicit hints.
  • Let skill defaults override explicit user/spec runner hints.
  • Omit the reason each tool or model was chosen.
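
The precedence described above (explicit spec hints first, then skill metadata, then project defaults) could be sketched as follows. This is an illustration only: `selectRunner`, `ToolDecision`, the step names, and the `"claude"` project default are assumptions, and the real shape of tool-selection.json is not shown in this report.

```typescript
type DecisionSource = "spec-hint" | "skill-metadata" | "project-default";

interface ToolDecision {
  step: string;
  runner: string;
  source: DecisionSource;
  reason: string; // recorded so the audit file explains every choice
}

// Resolve one step's runner. Explicit hints always win over skill defaults.
function selectRunner(
  step: string,
  specHints: Record<string, string>,
  skillDefaults: Record<string, string>,
  projectDefault: string,
): ToolDecision {
  const hinted = specHints[step];
  if (hinted !== undefined) {
    return { step, runner: hinted, source: "spec-hint", reason: `spec names ${hinted} for ${step}` };
  }
  const fromSkill = skillDefaults[step];
  if (fromSkill !== undefined) {
    return { step, runner: fromSkill, source: "skill-metadata", reason: `skill metadata suggests ${fromSkill} for ${step}` };
  }
  return { step, runner: projectDefault, source: "project-default", reason: `no hint or skill default for ${step}` };
}

// "use Claude to review and Codex to implement" parsed into hints (hypothetical parse).
const decisions = ["review", "implement", "verify"].map((step) =>
  selectRunner(step, { review: "claude", implement: "codex" }, { review: "codex", verify: "codex" }, "claude"),
);
console.log(JSON.stringify(decisions, null, 2));
```

In this sketch the skill default for `review` is deliberately different from the spec hint, to show that the hint is not overridden.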

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

### Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is a deterministic post‑processing flag that takes an existing workflow JSON/YAML and rewrites the **`task.description`** and **`task.acceptanceGate`** fields for every step. The goal is to make each step **action‑oriented, measurable, and verifiable** while keeping the original logical flow intact.

Below is the end‑to‑end contract, a minimal example, and the verification checklist you should run after the refinement.

---

## 1️⃣ Contract Overview  

| Input | Type | Required | Notes |
|-------|------|----------|-------|
| `workflow` | JSON/YAML file | ✅ | Must contain a top‑level `steps` array, each step with `id`, `task.description`, and optional `task.acceptanceGate`. |
| `--refine` | Flag | ✅ | Triggers the refinement engine. |
| `--output <path>` | String | ❌ | Destination for the refined workflow (defaults to `stdout`). |
| `--dry-run` | Flag | ❌ | Prints diff without writing file. |
| `--max‑tokens <int>` | Integer | ❌ | Upper bound for LLM token usage (default 2048). |

**Outputs**

* Refined workflow file (same format as input).  
* Optional diff report (`--dry-run`).  

**Deterministic Gu
...[truncated]

Must

  • Keep the deterministic unrefined path as the default.
  • Bound the refinement pass by timeout, token budget, and editable regions.
  • Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

Must Not

  • Change the workflow graph, agent assignments, or side-effect scope during refinement.
  • Fail the whole generation if optional refinement times out.
  • Hide warnings when the deterministic artifact is returned unchanged.
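
One way to read the editable-region and fallback rules above is as a structural comparison between the deterministic artifact and the refined candidate. The sketch below assumes a simplified workflow shape (`Step`, `Workflow`) and hypothetical helper names (`graphShape`, `applyRefinement`). It covers only the unsafe-edit and provider-failure fallbacks, not the timeout or token budget.

```typescript
interface Step {
  id: string;
  agent: string;
  dependsOn: string[];
  description: string; // editable by refinement
  acceptanceGate: string; // editable by refinement
}

interface Workflow {
  steps: Step[];
}

// Fingerprint of everything refinement must not touch: ids, agents, edges.
function graphShape(workflow: Workflow): string {
  return JSON.stringify(workflow.steps.map((s) => [s.id, s.agent, s.dependsOn.slice().sort()]));
}

// Run a refinement pass and fall back to the original on unsafe edits or errors.
function applyRefinement(
  original: Workflow,
  refine: (draft: Workflow) => Workflow,
): { workflow: Workflow; warnings: string[] } {
  try {
    const draft = JSON.parse(JSON.stringify(original)) as Workflow;
    const candidate = refine(draft);
    if (graphShape(candidate) !== graphShape(original)) {
      return { workflow: original, warnings: ["refinement changed graph or agents; deterministic artifact returned"] };
    }
    return { workflow: candidate, warnings: [] };
  } catch (error) {
    return { workflow: original, warnings: [`refinement failed (${String(error)}); deterministic artifact returned`] };
  }
}
```

The warning is returned even when the artifact comes back unchanged, so the caller can surface it rather than hide the fallback.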

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.behavior-grounded-gates (generation-quality/manual)

User message

Generate a workflow for the ricky --version spec.

Ricky output

**Workflow: `ricky --version` Specification & Release Pipeline**  
*(Ricky AgentWorkforce – v0.9.0)*  

---  

### 1. Scope & Objective
Create a deterministic, test‑driven workflow that defines, validates, and releases the `ricky --version` command. The command must:

1. Print the current Ricky version string in **semantic‑version** format (`MAJOR.MINOR.PATCH`).
2. Exit with status **0** on success, **1** on any internal error.
3. Support an optional `--json` flag that returns `{ "ricky": "X.Y.Z" }`.
4. Be reproducible across all supported CI runners (Linux‑x86_64, macOS‑arm64, Windows‑x86_64).

The workflow must enforce **80‑100 validation loops** (unit → integration → contract → e2e) before any merge to `main`.

---  

### 2. Repository Layout (Ricky conventions)

/ricky

├─ src/
│ └─ cli/
│ └─ version.rs # implementation

├─ tests/
│ ├─ unit/
│ │ └─ version_test.rs
│ ├─ integration/
│ │ └─ version_cli_test.rs
│ └─ contract/
│ └─ version_contract.yaml

├─ .github/
│ └─ workflows/
│ └─ version_spec.yml # CI pipeline (this file)

├─ Cargo.toml
└─ README.md


---  

### 3. Contract (Machine‑Readable)

**File:** `tests/contra
...[truncated]

Must

  • Build gates from the stated acceptance behavior, such as checking ricky --version output.
  • Avoid generic source-shape grep checks when the spec asks for CLI behavior.
  • Keep generated validation meaningful for the current repo shape.

Must Not

  • Treat grep -Eq 'export|function|class|workflow(' dist/ricky.js as proof of version behavior.
  • Claim the workflow is proven by source syntax alone.
  • Ignore the package-json version resolution order in the spec.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact Validation / Fix / Rerun Loop”**  
*Purpose*: Process a large set of independent artifacts (e.g., data files, config bundles, code snippets) through a deterministic validation pipeline, automatically apply fix‑scripts where possible, and re‑run until all artifacts are either **PASS** or **BLOCKED**. The workflow is scoped to a single feature branch and produces a single PR that contains only the *fix* commits; original artifacts remain untouched.

---

## 1. Contract (Inputs / Outputs)

| Element | Type | Description |
|---------|------|-------------|
| `artifact_set` | Directory path (relative to repo root) | Root folder containing *N* independent artifact sub‑folders (`artifact_001/`, `artifact_002/`, …). Each sub‑folder must contain a manifest `artifact.yaml`. |
| `validation_rules` | JSON file (`rules.json`) | Declarative schema (JSON‑Schema) and custom script hooks that each artifact must satisfy. |
| `fix_scripts` | Directory (`fixes/`) | One executable per rule (`fix_<rule_id>.sh`) that can transform an artifact in‑place to satisfy the rule. |
| `max_iterations` | Integer (default = 10) | Upper bound on validation/fix cycles to avoid infinite loop
...[truncated]

Must

  • Choose dag, supervisor, or pipeline deliberately based on the work shape.
  • Explain the pattern choice in artifact metadata or a rationale.
  • Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

  • Default blindly to dag for every workflow.
  • Collapse independent artifact work into one vague agent task.
  • Omit verification gates because the chosen pattern seems obvious.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shaped the Generated Ricky Workflow  

| Skill | What the Skill Enforces | Direct Effect on the Generated Workflow |
|------|------------------------|------------------------------------------|
| **Concrete Contract Definition** | Every step declares its **input schema**, **output schema**, and **exit codes** in a machine‑readable `contract.yaml`. | The workflow begins with a `contract` block that lists required artifacts (e.g., `source_repo`, `target_branch`) and the exact shape of the JSON payload each task must return. This eliminates ambiguous “hand‑off” data and lets the CI runner validate payloads before a step runs. |
| **Deterministic Verification Gates** | After each task, a **verification gate** runs a pure function that checks the task’s output against its contract. The gate must return `PASS` or `FAIL` before the next task is scheduled. | The generated workflow inserts a `verify_<task>` step after every core action. If verification fails, the pipeline aborts with a clear error code (`VERIF_<TASK>_FAIL`) and a link to the generated diff of expected vs. actual payloads. |
| **Review Artifacts Production** | Tasks must emit **review
...[truncated]

Must

  • Describe skills as generation-time selection, loading, and rendering inputs.
  • Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
  • Say the workflow instructions were informed by selected skills.

Must Not

  • Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
  • Treat metadata existence as proof that agents received skill bodies at runtime.
  • Overstate current tests beyond the generation boundary.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

Below is a **minimal, deterministic workflow skeleton** that you can flesh out once the missing product details are clarified.  
It follows the AgentWorkforce conventions (AGENTS.md, workflow standards, shared‑authoring rules) and is ready to be checked into a feature branch.

---

## 1️⃣ Feature Branch & PR Scope
| Item | Value |
|------|-------|
| **Branch name** | `feature/<product‑code>-<short‑desc>` (e.g., `feature/ABC‑123‑order‑pipeline`) |
| **PR title** | `feat: implement <short‑desc> workflow` |
| **PR description** | • Goal summary  <br>• Link to spec (or placeholder)  <br>• Acceptance criteria (see §4)  <br>• Checklist (see §5) |
| **Reviewers** | • Primary workflow author  <br>• QA lead  <br>• Security / Infra (if applicable) |
| **Labels** | `workflow`, `needs‑spec`, `draft` |

---

## 2️⃣ Workflow Contract (YAML)

```yaml
# .github/workflows/<workflow‑name>.yml
name: <WORKFLOW_NAME>          # e.g. Order Processing
on:
  workflow_dispatch: {}        # manual trigger
  push:
    branches: [main]           # optional CI trigger
  schedule:
    - cron: '0 2 * * *'        # daily run (adjust)

jobs:
  # -----------------------------------------------------------------
  #
...[truncated]
```

Must

  • Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
  • Validate with dry-run or targeted structural checks where possible.
  • Return artifacts, warnings, and follow-up commands honestly.

Must Not

  • Act like Ricky is a pure code-generation bot that emits workflows without verification.
  • Stop at "code compiles" as the proof bar.
  • Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s Immediate Action Plan**

| Phase | Goal | Concrete Steps | Deliverable |
|-------|------|----------------|-------------|
| 1️⃣ Detect & Log | Capture the exact failure context | • Pull the run‑metadata for the failed workflow (step ID, timeout value, timestamps). <br>• Record the “alive‑but‑idle” worker’s heartbeat logs and any partial output directories. <br>• Tag the run with **`status: BLOCKED`** and **`reason: STEP_TIMEOUT + ORPHAN_WORKER`**. | `run‑<uuid>.log` + `run‑<uuid>.metadata.json` stored in the **audit‑log** bucket. |
| 2️⃣ Blocker Report | Communicate the problem to the coordination layer | • Create a **Blocker Issue** in the repository (e.g., `issues/blocked-<run‑id>.md`). <br>• Include: <br> - Step name, timeout config, actual duration. <br> - Worker ID, last heartbeat, missing artifact list. <br> - Links to logs and to the workflow definition file. <br>• Add the label `blocker:timeout-orphan`. | `issues/blocked-<run‑id>.md` |
| 3️⃣ Deterministic Verification Gate | Prevent downstream runs until the problem is resolved | • Insert a **Gate** node in the workflow DAG immediately after the failing step: <br> `gate: verify‑orphan‑worker‑resolved`. <br>• Gate c
...[truncated]

Must

  • Classify the failure before deciding whether to retry.
  • Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
  • Preserve raw evidence and uncertainty when the class is not known.

Must Not

  • Blindly rerun the whole workflow without a blocker classification.
  • Treat every failure as a broken workflow definition.
  • Claim Ricky fixed the workflow before rerun evidence proves it.
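
The classify-before-retry rule above can be illustrated with a small evidence record and a classifier that keeps unknown cases unknown. This is a hedged sketch: the class names mirror the rubric, but `StepEvidence`, `classify`, `mayRetry`, and the two heuristics are hypothetical. The remaining classes would need evidence this sketch does not model.

```typescript
type FailureClass =
  | "agent-runtime-opacity"
  | "timeout"
  | "environment-blocker"
  | "workflow-structure-bug"
  | "validation-strategy-mismatch"
  | "unknown";

interface StepEvidence {
  step: string;
  timedOut: boolean;
  workerAlive: boolean;
  artifactsProduced: boolean;
  rawLog: string;
}

interface Classification {
  step: string;
  failureClass: FailureClass;
  rawLog: string; // raw evidence is preserved, never summarized away
}

// Assumed heuristics: a timeout is a timeout; a live worker with no artifacts is opaque.
function classify(evidence: StepEvidence): Classification {
  let failureClass: FailureClass = "unknown";
  if (evidence.timedOut) {
    failureClass = "timeout";
  } else if (evidence.workerAlive && !evidence.artifactsProduced) {
    failureClass = "agent-runtime-opacity";
  }
  return { step: evidence.step, failureClass, rawLog: evidence.rawLog };
}

// Assumed policy: retry only once every failed step has a known class.
function mayRetry(classifications: Classification[]): boolean {
  return classifications.length > 0 && classifications.every((c) => c.failureClass !== "unknown");
}

const report = [
  classify({ step: "build", timedOut: true, workerAlive: false, artifactsProduced: false, rawLog: "step exceeded limit" }),
  classify({ step: "review", timedOut: false, workerAlive: true, artifactsProduced: false, rawLog: "heartbeat only" }),
];
console.log(report.map((c) => `${c.step}: ${c.failureClass}`).join("\n"));
```

For the scenario in the user message, this sketch yields one `timeout` and one `agent-runtime-opacity` classification instead of a single blanket failure.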

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow Contract – “Detect‑and‑Purge Stale Relay State”**  
*Scope:* Local development runs of any AgentWorkforce workflow.  
*Inputs:*  
- `WORKSPACE_ROOT` – absolute path to the repository root (environment variable).  
- Optional `FORCE_PURGE=true|false` – override interactive prompt.  

*Outputs (artifact):*  
- `stale‑state‑report.json` – list of stale directories, timestamps, and actions taken.  
- Exit code `0` on clean start, `1` if purge failed, `2` if user aborted.  

*Pre‑conditions:*  
1. The repository contains a valid `agent.yml` manifest.  
2. The current user has write permission on `WORKSPACE_ROOT`.  

*Post‑conditions:*  
1. No `.agent‑relay/`, `.relay/`, or `.trajectories/` directories exist **or** they contain only fresh state (≤ 5 min old).  
2. `stale‑state‑report.json` is written to `WORKSPACE_ROOT/.ricky/reports/`.  

---

## 1. Deterministic Verification Gate – “Stale‑State Scan”

| Step | Action | Deterministic Check | Expected Result |
|------|--------|---------------------|-----------------|
| 1 | Resolve absolute paths for `.agent-relay/`, `.relay/`, `.trajectories/`. | `fs.statSync` on each directory. | Path exists → continue; missing → skip. |
| 2 
...[truncated]

Must

  • Classify stale local runtime state as an environment contamination issue.
  • Recommend quarantine or isolated-run guidance before launch.
  • Record the observed paths and the action taken or recommended.

Must Not

  • Treat stale runtime state as a workflow logic failure.
  • Delete or overwrite state without an explicit safe path or user intent.
  • Continue into execution as if the workspace were clean.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**How to resolve “another Ricky or Relay run is already active in this workspace”**

The marker file is a safety lock that prevents two Ricky/Relay executions from stepping on each other. When the lock is left behind (e.g., a crash, a killed terminal, or a stray background process) you’ll see the error you described. Follow the deterministic, gated procedure below to clear the lock safely and get back to a clean workspace.

---

## 1️⃣ Verify the current state (non‑destructive)

| Step | Command | Expected outcome |
|------|---------|-------------------|
| 1.1 | `ricky status` | Should report **“Run lock present – PID \<pid\>”** or **“No active run”**. |
| 1.2 | `ps -p <pid> -o pid,cmd` (replace `<pid>` from the status output) | If the PID exists, you have a live process; otherwise the PID is stale. |
| 1.3 | `ls -l .ricky/run.lock` | Shows the lock file’s timestamp and the PID stored inside. |

*If the PID is alive and you **intentionally** have a run in progress, simply wait for it to finish or attach to its terminal.*  
*If the PID is dead or you’re unsure, proceed to step 2.*

---

## 2️⃣ Graceful lock removal (deterministic gate)

1. **Attempt a graceful unlock** – Ricky provi
...[truncated]

Must

  • Report the active marker, run id, or status path when available.
  • Ask the user to inspect, wait for, or explicitly clear the active run.
  • Avoid launching a competing run that could corrupt evidence.

Must Not

  • Silently start another run.
  • Hide the existing run marker from the user.
  • Treat the conflict as a generic failure with no recovery path.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.env-loader-injection-runtime-loadable (runtime-recovery/manual)

User message

A workflow artifact references a MISSING_ENV_VAR value. Ricky's deterministic auto-fix injects the .env.local / .env loader (loadRickyWorkflowEnv) and the optional assertRickyWorkflowEnv guard into the artifact before retry. The artifact may be a master-rendered workflow whose .step({ command: ... }) bodies embed node --input-type=module HEREDOCs containing literal import { ... } from 'node:fs' / from 'node:path' strings.

Ricky output

**Deterministic auto‑fix for missing environment variables**

Below is the exact patch you can apply to any workflow artifact that currently fails because it references a `MISSING_ENV_VAR`. The fix does three things:

1. **Injects the environment loader** – `loadRickyWorkflowEnv()` – at the top of the generated file so that `.env.local` (or fallback `.env`) is read before any step runs.  
2. **Adds an optional guard** – `assertRickyWorkflowEnv()` – that throws a clear error if the required variable is still undefined after loading.  
3. **Leaves the existing step bodies untouched** (including the `node --input-type=module` HEREDOCs that embed literal `import … from 'node:fs'` / `from 'node:path'` strings).

---

## 1. Helper utilities (add once per repository)

Create (or update) `ricky-workflow-env.ts` in a shared utilities folder (e.g., `src/utils/`).

```ts
// src/utils/ricky-workflow-env.ts
import { config as loadDotenv } from 'dotenv';
import { resolve } from 'node:path';
import { existsSync } from 'node:fs';

/**
 * Load environment variables from .env.local (preferred) or .env.
 * This function is idempotent – calling it multiple times is safe.
 */
export function loadRickyW
...[truncated]
```

Must

  • Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected loadRickyWorkflowEnv body references rickyWorkflowFs.* and rickyWorkflowPath.*, so the repair must also add the corresponding import * as rickyWorkflowFs from 'node:fs' and import * as rickyWorkflowPath from 'node:path' aliases at module top level.
  • Detect existing alias imports by matching real top-level import * as <alias> from '<module>' statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in .step({ command: ... }) bodies do not count as imports).
  • Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

Must Not

  • Skip adding the rickyWorkflowFs / rickyWorkflowPath aliases because from 'node:fs' or from 'node:path' already appears somewhere in the file as a string literal.
  • Inject loadRickyWorkflowEnv (or assertRickyWorkflowEnv) without the supporting alias imports, which produces a ReferenceError: rickyWorkflowPath is not defined at module load and burns the auto-fix budget on UNSUPPORTED_RUNTIME at runtime-launch.
  • Rewrite or escape the embedded HEREDOC text in step commands.
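
The import-detection rule above (match real top-level `import * as <alias>` statements, not substrings) can be sketched with an anchored, multiline regular expression. The helper names `hasTopLevelNamespaceImport` and `missingAliasImports` are hypothetical; only the alias names and module specifiers come from the case. A line-anchored regex is an approximation: a HEREDOC line that starts at column 0 with the exact alias import would still match, so a production repair would more likely parse the module.

```typescript
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True only when a whole line is `import * as <alias> from '<module>'`.
// A bare occurrence of `from 'node:fs'` elsewhere in the file does not count.
function hasTopLevelNamespaceImport(source: string, alias: string, moduleName: string): boolean {
  const pattern = new RegExp(
    `^import \\* as ${escapeRegExp(alias)} from ['"]${escapeRegExp(moduleName)}['"];?\\s*$`,
    "m",
  );
  return pattern.test(source);
}

const REQUIRED_ALIASES = [
  { alias: "rickyWorkflowFs", moduleName: "node:fs" },
  { alias: "rickyWorkflowPath", moduleName: "node:path" },
];

// Return the import lines the repair still needs to add at module top level.
function missingAliasImports(source: string): string[] {
  return REQUIRED_ALIASES
    .filter(({ alias, moduleName }) => !hasTopLevelNamespaceImport(source, alias, moduleName))
    .map(({ alias, moduleName }) => `import * as ${alias} from '${moduleName}';`);
}

// Illustrative artifact text: the only 'node:fs' mention is inside a HEREDOC.
const artifact = [
  "const step = { command: `node --input-type=module <<'EOF'",
  "import { readFileSync } from 'node:fs';",
  "EOF` };",
].join("\n");
console.log(missingAliasImports(artifact));
```

Because the function only reports what is missing, the HEREDOC text itself is never rewritten or escaped.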

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Run**

Below is a deterministic, contract‑driven procedure you can copy‑paste into your terminal (or CI step) to run the workflow, trigger the auto‑fix path on the first failure, and verify the repaired artifact. It assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failed step name and the previous run‑ID are known.

---

## 1️⃣ Preconditions (contract)

| Item | Requirement | How to verify |
|------|-------------|---------------|
| `aw` binary | `>= v2.4.0` (auto‑fix flag introduced) | `aw --version` |
| Workflow definition | Present in `./workflow.yaml` (or path you supply) | `cat workflow.yaml` |
| Failed step name | e.g. `build-image` (provided by user) | variable `FAILED_STEP` |
| Previous run ID | e.g. `run‑20240512‑a1b2c3` (provided) | variable `PREV_RUN_ID` |
| Write permissions | Current user can write to `./.aw/` cache & repair dirs | `touch ./.aw/test && rm ./.aw/test` |
| Docker / runtime | Required by workflow steps (if any) | `docker version` |

If any of the above checks fail, abort and report the blocker before proceeding.

---

## 2️⃣ Deterministic verification gates

```bash
# Gate 1 – CLI sanity
aw --v
...[truncated]
```

Must

  • Use a bounded retry budget and summarize every attempt.
  • Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
  • Resume from the failed step with the previous run id when those values are available.

Must Not

  • Edit arbitrary repository source files as the default auto-fix surface.
  • Keep retrying after the configured max attempts.
  • Lose the single Ricky tracking run id across repair/resume attempts.
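
The bounded repair-and-resume behavior above could look roughly like the loop below. It is a sketch under assumptions, not Ricky's runner: `runWithAutoFix`, `AttemptResult`, `ResumeOptions`, and the injected `runAttempt` / `repairArtifact` callbacks are hypothetical names, and the callbacks are synchronous only to keep the example short.

```typescript
interface AttemptResult {
  ok: boolean;
  runId: string; // id of this individual attempt
  failedStep?: string;
}

interface ResumeOptions {
  startFromStep?: string;
  previousRunId?: string;
}

interface AttemptSummary {
  attempt: number;
  ok: boolean;
  runId: string;
  resumedFrom: string | undefined;
}

// Run, repair the workflow artifact on failure, and resume from the failed
// step, never exceeding maxAttempts. One tracking id spans all attempts.
function runWithAutoFix(
  trackingRunId: string,
  maxAttempts: number,
  runAttempt: (resume: ResumeOptions) => AttemptResult,
  repairArtifact: (failedStep: string | undefined) => boolean,
): { trackingRunId: string; ok: boolean; attempts: AttemptSummary[] } {
  const attempts: AttemptSummary[] = [];
  let resume: ResumeOptions = {};
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = runAttempt(resume);
    attempts.push({ attempt, ok: result.ok, runId: result.runId, resumedFrom: resume.startFromStep });
    if (result.ok) {
      return { trackingRunId, ok: true, attempts };
    }
    // Stop when the budget is spent or the artifact cannot be repaired.
    if (attempt === maxAttempts || !repairArtifact(result.failedStep)) {
      break;
    }
    resume = { startFromStep: result.failedStep, previousRunId: result.runId };
  }
  return { trackingRunId, ok: false, attempts };
}

// Simulated run: the first attempt fails at "build", the second succeeds.
let calls = 0;
const outcome = runWithAutoFix(
  "ricky-run-1",
  3,
  () => {
    calls += 1;
    return calls === 1 ? { ok: false, runId: "relay-a", failedStep: "build" } : { ok: true, runId: "relay-b" };
  },
  () => true,
);
console.log(JSON.stringify(outcome, null, 2));
```

The repair callback only receives the failed step for the workflow artifact, which keeps arbitrary repository source files out of the default auto-fix surface.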

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Quick‑Start Debug Checklist**

| Step | Command / Action | What to Look For |
|------|------------------|-----------------|
| 1️⃣  | `ricky logs workflows/foo.ts` | Full execution trace. Identify the exact line / task where the failure occurs. |
| 2️⃣  | `ricky run workflows/foo.ts --debug` | Runs the workflow in interactive debug mode (break‑points, step‑through). |
| 3️⃣  | Open the generated **artifact report**: `cat .ricky/artifacts/foo.run.json` | Shows input‑output contracts, validation results, and any deterministic‑verification gate failures. |
| 4️⃣  | Verify the **contract file** (if present): `cat workflows/foo.contract.ts` | Ensure the declared `inputSchema`/`outputSchema` matches what the workflow actually produces. |
| 5️⃣  | Run the **stand‑alone verifier**: `ricky verify workflows/foo.ts` | Executes all deterministic verification gates (type‑checks, schema validation, idempotency checks). |
| 6️⃣  | Check for **blocked resources**: `ricky status --blocked` | Lists any external services, secrets, or file‑system locks that may have caused a hard stop. |
| 7️⃣  | Review the **review artifact** (if a PR exists): `cat .ricky/reviews/foo.pr.json` | Shows reviewer commen
...[truncated]

Must

  • Preserve one-attempt behavior when auto-fix is disabled.
  • Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
  • Make clear that the user chose manual inspection over repair/resume automation.

Must Not

  • Start a repair loop despite --no-auto-fix.
  • Suppress the diagnosis because no repair was attempted.
  • Present the failure as a completed repair attempt.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)

User message

Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

Ricky output

**Ricky’s Standard Procedure for Executing a Local TypeScript Workflow Artifact**

| Phase | Action | Details & Commands | Expected Outcome |
|-------|--------|--------------------|------------------|
| 1️⃣ **Locate the Primary Path** | Resolve the canonical workspace root and the workflow’s entry file. | ```bash\n# Resolve workspace root (repo‑root) – defined in .ricky/config.yml\nRICKY_ROOT=$(git rev-parse --show-toplevel)\n# Primary local path for the workflow (default: $RICKY_ROOT/workflows/local)\nWORKFLOW_ROOT=\"$RICKY_ROOT/workflows/local\"\n# Entry point – must be named <workflow>.ts (e.g., myFlow.ts)\nENTRY=\"$WORKFLOW_ROOT/myFlow.ts\"\n``` | `RICKY_ROOT`, `WORKFLOW_ROOT`, and `ENTRY` variables point to the correct locations. |
| 2️⃣ **Validate Contract & Dependencies** | Ensure the workflow declares a **Workflow Contract** (`workflow.contract.json`) and that all npm deps are satisfied. | ```bash\n# Verify contract exists and is JSON‑schema valid\nif ! npx ajv validate -s $RICKY_ROOT/schemas/workflow.contract.schema.json -d $WORKFLOW_ROOT/workflow.contract.json; then\n  echo \"❌ Contract validation failed – aborting execution\"\n  exit 1\nfi\n# Install exact versions decla
...[truncated]

Must

  • Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
  • Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
  • Record the actual spawn command in execution evidence.

Must Not

  • Fail solely because agent-relay is not on PATH when the SDK route is available.
  • Hide the actual runtime command from evidence.
  • Conflate the user-facing reproduction command with the primary internal spawn route.
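
One way to picture the route selection these rules describe is the sketch below. It is an assumption-laden illustration: `Precheck`, `SpawnDecision`, and `chooseSpawnRoute` are hypothetical names, the precheck results are passed in rather than probed, and the strip-types route is assumed to mean Node's `--experimental-strip-types` flag.

```typescript
// Hypothetical types for illustration; not Ricky's actual API.
interface Precheck {
  nodeResolvable: boolean;   // a suitable `node` was found
  sdkResolvable: boolean;    // @agent-relay/sdk resolves for the workflow
  agentRelayOnPath: boolean; // the agent-relay binary is on PATH
}

interface SpawnDecision {
  route: "node-strip-types" | "agent-relay-binary" | "blocked";
  spawnCommand: string[] | null; // what is actually executed; goes into evidence
  reproductionCommand: string;   // what the user is told to run
  reason: string;
}

// binaryArgv is supplied by the caller so this sketch does not guess at the
// agent-relay CLI's arguments.
function chooseSpawnRoute(
  workflowPath: string,
  pre: Precheck,
  binaryArgv: string[],
): SpawnDecision {
  const reproductionCommand = `ricky run ${workflowPath}`;
  if (pre.nodeResolvable && pre.sdkResolvable) {
    // Preferred route: does not depend on agent-relay being on PATH.
    return {
      route: "node-strip-types",
      spawnCommand: ["node", "--experimental-strip-types", workflowPath],
      reproductionCommand,
      reason: "Node and @agent-relay/sdk are resolvable",
    };
  }
  if (pre.agentRelayOnPath && binaryArgv.length > 0) {
    return {
      route: "agent-relay-binary",
      spawnCommand: binaryArgv,
      reproductionCommand,
      reason: "SDK route unavailable; falling back to the binary",
    };
  }
  return {
    route: "blocked",
    spawnCommand: null,
    reproductionCommand,
    reason: "neither the SDK route nor the agent-relay binary is available",
  };
}
```

Keeping `spawnCommand` and `reproductionCommand` as separate fields is a deliberate choice here: the evidence records the real runtime command, while the user still sees the command they would type.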

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky – Boundary‑Escalation Workflow for Persistent Structural Failures**  

---

### 1. Scope & Preconditions
| Item | Description |
|------|-------------|
| **Trigger** | A structural failure (e.g., broken DAG, missing dependency, cyclic reference) remains after an initial automated or manual fix attempt. |
| **Assumptions** | • The failing workflow is version‑controlled in `main` (or `stable`). <br>• A recent commit (`<fix‑sha>`) attempted the repair. <br>• All unit‑, integration‑, and contract‑tests have been re‑run. |
| **Goal** | Halt further automated propagation, surface a **blocker**, and create a deterministic repair loop that respects the 80‑to‑100 validation‑loop guideline. |

---

### 2. Deterministic Verification Gate (DVG)

| Step | Action | Artifact | Success Criterion |
|------|--------|----------|--------------------|
| **DVG‑1** | Re‑run **full test matrix** (unit, integration, contract, lint) on the exact commit `<fix‑sha>`. | `test‑report_<fix‑sha>.json` | All suites **pass** **AND** structural‑validation script returns *failure*. |
| **DVG‑2** | Execute **Structural‑Integrity Scan** (`ricky scan --mode=structural <workflow‑path>`). | `struct‑scan_<fix‑sha>.l
...[truncated]

Must

  • Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
  • Distinguish escalation from a generic product failure.
  • Preserve enough context for a human operator to continue.

Must Not

  • Retry speculative fixes indefinitely.
  • Collapse to "something went wrong" without the attempted actions.
  • Discard evidence from failed repair attempts.
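
As a rough illustration of these constraints, an escalation can be modeled as a record that carries every repair attempt with it. The names `RepairAttempt`, `Escalation`, and `escalateIfExhausted`, and the idea of a caller-supplied `maxAttempts`, are assumptions made for this sketch.

```typescript
// Hypothetical shapes for illustration only.
interface RepairAttempt {
  description: string;      // what fix was tried
  validationPassed: boolean;
  validationOutput: string; // evidence from the validation run
}

interface Escalation {
  kind: "escalation"; // distinct from a generic product failure
  blocker: string;    // classified blocker
  attempts: RepairAttempt[];
  recommendation: string;
}

// Returns an escalation once the attempt budget is spent and the last
// validation still fails; returns null while retrying is still allowed.
function escalateIfExhausted(
  blocker: string,
  attempts: readonly RepairAttempt[],
  maxAttempts: number,
  recommendation: string,
): Escalation | null {
  const last = attempts[attempts.length - 1];
  const stillFailing = last !== undefined && !last.validationPassed;
  if (!stillFailing || attempts.length < maxAttempts) {
    return null;
  }
  return {
    kind: "escalation",
    blocker,
    attempts: attempts.slice(), // copy, so failed-attempt evidence is kept intact
    recommendation,
  };
}
```

The bounded `maxAttempts` is what prevents indefinite speculative retries; the copied `attempts` array is what lets a human operator pick up where the automation stopped.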

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Template & Example**

Below is a ready‑to‑use **Workflow Health Digest** that aggregates the key reliability signals from multiple Ricky workflow runs. The digest follows the **Ricky repository conventions** (AGENTS.md, workflow standards, shared authoring rules) and is organized for quick stakeholder consumption and downstream automation.

---

## 1. Digest Structure (YAML)

```yaml
# workflow-health-digest.yaml
generated_at: 2026-05-16T12:00:00Z
summary:
  total_workflows: 0
  total_runs: 0
  success_rate: 0.0%
  avg_duration: 0s
  blockers: []
  top_issues: []
runs:
  - workflow: <workflow-id>
    run_id: <run-id>
    started_at: <timestamp>
    completed_at: <timestamp>
    duration: <seconds>
    status: SUCCESS|FAIL|BLOCKED
    verification_gate: PASS|FAIL
    blockers:
      - type: <code|infra|dependency>
        description: <short description>
        owner: <team/individual>
        created_at: <timestamp>
    review_artifacts:
      - name: <artifact-name>
        url: <link>
        status: APPROVED|CHANGES_REQUESTED
    validation_loops: <int>   # 80‑100 recommended
    notes: <free‑form notes>
```

*All fields are deterministic an
...[truncated]


**Must**

- Consume normalized `WorkflowRunEvidence` rather than raw logs as the primary input.
- Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
- Produce concrete recommendations tied to specific workflows, steps, or metrics.

**Must Not**

- Mutate evidence while analyzing it.
- Return generic advice like "improve your workflow" without references.
- Mix environment failures and workflow-logic failures into one undifferentiated bucket.
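
A minimal sketch of an analysis that follows these rules is shown below. Only the `WorkflowRunEvidence` name comes from the criteria above; its fields, the `FailureSummary` type, and `summarizeFailures` are assumptions chosen for illustration.

```typescript
// The real WorkflowRunEvidence shape is not shown in this comment;
// these fields are assumed for the sake of the example.
interface WorkflowRunEvidence {
  workflow: string;
  status: "SUCCESS" | "FAIL" | "BLOCKED";
  failureClass?: string;
  failureOrigin?: "environment" | "workflow-logic";
  durationSeconds: number;
}

interface FailureSummary {
  workflow: string;
  failureClass: string;
  origin: "environment" | "workflow-logic";
  count: number;
}

// Groups failures per workflow and failure class, keeping environment and
// workflow-logic failures in separate buckets. The input is read-only.
function summarizeFailures(
  runs: readonly Readonly<WorkflowRunEvidence>[],
): FailureSummary[] {
  const byKey = new Map<string, FailureSummary>();
  for (const run of runs) {
    if (run.status === "SUCCESS" || !run.failureClass || !run.failureOrigin) {
      continue;
    }
    const key = `${run.failureOrigin}|${run.workflow}|${run.failureClass}`;
    const existing = byKey.get(key);
    if (existing) {
      existing.count += 1;
    } else {
      byKey.set(key, {
        workflow: run.workflow,
        failureClass: run.failureClass,
        origin: run.failureOrigin,
        count: 1,
      });
    }
  }
  // Most frequent failure classes first, so recommendations can cite them.
  return Array.from(byKey.values()).sort((a, b) => b.count - a.count);
}
```

Each summary row names a specific workflow and failure class, which gives a recommendation something concrete to reference.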

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.preserve-pr-shipping-during-repair</code> (runtime-recovery/manual)</summary>

**User message**

> A persona-authored workflow that imports `@agent-relay/github-primitive` and uses `createGitHubStep` to open a PR fails its runtime-precheck. The auto-fix repair persona is invoked. Describe the contract the repaired artifact must satisfy.

**Ricky output**

```text
**Repaired Workflow Contract – `github‑primitive‑pr‑opener`**

| **Section** | **Requirement** | **Rationale / Verification** |
|-------------|----------------|------------------------------|
| **1. Repository Layout** | • File must live at `workflows/github‑primitive‑pr‑opener.yml`  <br>• All source files referenced by the workflow must be under `src/` (e.g., `src/steps/createGitHubStep.ts`). | Guarantees deterministic path resolution for the CI runner and for the `@agent‑relay/github‑primitive` import. |
| **2. Imports** | • The workflow **must** import the package exactly as `import { createGitHubStep } from "@agent‑relay/github‑primitive";`  <br>• No other version‑specifier syntax (e.g., `^`, `~`, or relative paths) is allowed. | The runtime‑precheck validates that the import resolves to a **single, pinned** package version defined in `package.json`. |
| **3. Step Construction** | • The workflow must call `createGitHubStep` **once** to produce a step named `open‑pr`.  <br>• The returned step object must contain the keys `name`, `action`, and `inputs`.  <br>• `inputs` must include: <br> `repo` (string, non‑empty) <br> `baseBranch` (string, non‑empty) <br> `headBranch` (string, n
...[truncated]
```

Must

  • Preserve the @agent-relay/github-primitive import, the GitHubStepExecutor reference, and every createGitHubStep(...) invocation that the original artifact declared.
  • Keep the failing workflow runnable from the same path with the same workflow(...) builder and .run({ cwd: process.cwd() }) invocation.
  • Retain at least ceil(N / 2) of the original workflow's .step(...) calls when the original declared four or more steps; a repair that collapses to a 2-3 step placeholder is a regression, not a fix.
  • Reject (do not apply) any repair output whose step list reduces to prepare-context / runtime-precheck: true / final-signoff: echo placeholder; surface the regression diagnostic instead.

Must Not

  • Strip createGitHubStep, GitHubStepExecutor, or @agent-relay/github-primitive because the runtime-precheck failure mentioned PR-shipping or git side effects. The repair contract's "no commit / no push" constraint applies to the REPAIR AGENT's runtime behavior, not to the workflow's step declarations.
  • Emit a "minimal repair-safe master" or "simplified Ricky master" scaffold that passes the builder validator while doing none of the original work.
  • Treat "the workflow now builds and runs" as success when the work it was supposed to ship is gone.
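
The retention rules above lend themselves to a simple regression check. The sketch below is a text-level approximation under stated assumptions: `countOccurrences` and `checkRepair` are hypothetical helpers, and counting substrings is cruder than parsing the workflow source properly.

```typescript
// Counts non-overlapping occurrences of a literal substring.
function countOccurrences(source: string, needle: string): number {
  return source.split(needle).length - 1;
}

// Compares the original workflow source with a proposed repair and returns a
// list of regression diagnostics. An empty list means no regression was found
// by this (deliberately shallow) textual check.
function checkRepair(original: string, repaired: string): string[] {
  const problems: string[] = [];

  // The import and executor reference must survive the repair.
  for (const token of ["@agent-relay/github-primitive", "GitHubStepExecutor"]) {
    if (original.includes(token) && !repaired.includes(token)) {
      problems.push(`repair dropped ${token}`);
    }
  }

  // Every createGitHubStep(...) invocation must be preserved.
  const originalPrSteps = countOccurrences(original, "createGitHubStep(");
  const repairedPrSteps = countOccurrences(repaired, "createGitHubStep(");
  if (repairedPrSteps < originalPrSteps) {
    problems.push(
      `repair kept ${repairedPrSteps} of ${originalPrSteps} createGitHubStep calls`,
    );
  }

  // With four or more original steps, at least ceil(N / 2) must remain.
  const originalSteps = countOccurrences(original, ".step(");
  const repairedSteps = countOccurrences(repaired, ".step(");
  if (originalSteps >= 4 && repairedSteps < Math.ceil(originalSteps / 2)) {
    problems.push(
      `repair kept ${repairedSteps} of ${originalSteps} steps; need at least ${Math.ceil(originalSteps / 2)}`,
    );
  }
  return problems;
}
```

A caller would reject the repair output and surface the returned diagnostics whenever the list is non-empty, rather than treating "builds and runs" as success.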

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

## Ricky Slack Surface – “Workflow Debugger”

A dedicated Slack app (Ricky Debug) that lets engineers **inspect, step‑through, and repair** any Ricky‑managed workflow directly from a channel or DM. The surface is built as a **modal‑driven UI** with deterministic contracts and verification gates that keep the workflow state immutable until a reviewer explicitly approves a change.

---

### 1. High‑Level Interaction Flow

| Step | User Action | Ricky Response | Verification / Artifact |
|------|-------------|----------------|--------------------------|
| 1 | `/ricky debug <workflow‑id>` (or click “Debug” button on a workflow card) | Opens a **Debug Modal** pre‑populated with the workflow definition, latest run snapshot, and a *Step List* view. | Contract `DebugRequest → DebugModalPayload` |
| 2 | Select a **run instance** (or “latest”) and click **Load** | Populates the modal with **Execution Trace** (ordered list of steps, timestamps, input/output payloads). | Contract `LoadRunRequest → RunTracePayload` |
| 3 | Click a step → **Step Detail** panel expands showing: <br>• Input JSON <br>• Output JSON <br>• Env vars <br>• Execution logs | No mutation; just display. | Co

---

_Comment truncated to stay within GitHub limits. Download the `ricky-eval-run` artifact for the full `human-review.md`._

@khaliqgant khaliqgant merged commit 261338f into main May 16, 2026
4 checks passed
@khaliqgant khaliqgant deleted the fix/master-children-shared-worktree-scope branch May 16, 2026 08:52