fix(local): inject --input KEY=VALUE into workflow env + idle-output watchdog by kjgbot · Pull Request #139 · AgentWorkforce/ricky

kjgbot · 2026-05-30T07:37:01Z

Problem

Two issues caused ricky local --run to stall for hours in overnight batch runs, even with auto-fix enabled.

1. `--input KEY=VALUE` was silently dropped

The overnight runner invokes reusable review/fix workflows like:

ricky local --spec-file _review.md --run --best-judgement --input TARGET_SPEC=specs/021-sentry.md

But --input was parsed nowhere in ricky. The value never reached the workflow runner subprocess env, so the generated review workflow's assertRickyWorkflowEnv('TARGET_SPEC') failed with:

Error: MISSING_ENV_VAR: TARGET_SPEC. Add missing values to .env.local or export them before rerunning.

The auto-fix loop then tried to "repair" the workflow — adding a repo-local .env loader — but no loader can conjure a value that was never passed in. Each repair spawns a fresh workflow gen+run (minutes each), burning the attempt budget and wall-clock.

2. No inactivity detection

A hung runner (dead broker, half-open stdio pipe, a subprocess parked at 0% CPU) stalled for the full 12 h DEFAULT_RUN_TIMEOUT_MS before the outer timeout fired. Observed in practice: a node --experimental-strip-types workflow process sat at 0% CPU for 8+ hours.

Fix

1. Thread --input into the runner env. --input KEY=VALUE (and --input=KEY=VALUE) now parse into an inputs record that flows CLI args → RawHandoff → normalized LocalInvocationRequest → coordinator.launch env, so workflow scripts read them via process.env.KEY. Invalid env-var keys and malformed pairs are reported as CLI errors. Empty values (--input TARGET_SPEC=) are allowed.

2. Idle-output watchdog. A healthy run constantly emits broker/agent output; total silence for the idle window means it's hung. Added a watchdog (default DEFAULT_RUN_IDLE_TIMEOUT_MS = 30 min, override via RICKY_RUN_IDLE_TIMEOUT_MS, 0 disables) that aborts the runner on prolonged silence so the run fails fast and the orchestrator moves on instead of stalling.

Together these mean: the review/fix workflows get their TARGET_SPEC and succeed, and any future hang is bounded to 30 minutes instead of 12 hours — so auto-fix can stay enabled without risk of multi-hour stalls.
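The watchdog behaviour described above can be sketched roughly as follows. This is a minimal sketch, not the code from this PR: the constant name, the `RICKY_RUN_IDLE_TIMEOUT_MS` override, the 30-minute default, and "0 disables" come from the description, while `IdleTracker`, `startIdleWatchdog`, the env-record parameter, and the fallback for unparsable values are assumptions made for illustration.

```typescript
// Default taken from the PR description: 30 minutes.
const DEFAULT_RUN_IDLE_TIMEOUT_MS = 30 * 60 * 1000;

// Assumption: unset, empty, negative, or unparsable values fall back to the default.
// A value of 0 disables the watchdog.
function resolveIdleTimeoutMs(env: Record<string, string | undefined>): number {
  const raw = env["RICKY_RUN_IDLE_TIMEOUT_MS"];
  if (raw === undefined || raw.trim() === "") return DEFAULT_RUN_IDLE_TIMEOUT_MS;
  const parsed = Number(raw);
  if (!Number.isFinite(parsed) || parsed < 0) return DEFAULT_RUN_IDLE_TIMEOUT_MS;
  return Math.floor(parsed);
}

// Hypothetical helper: remembers when output was last seen and reports
// whether the idle window has elapsed. The clock is passed in explicitly.
class IdleTracker {
  private readonly idleTimeoutMs: number;
  private lastOutputAt: number;

  constructor(idleTimeoutMs: number, now: number) {
    this.idleTimeoutMs = idleTimeoutMs;
    this.lastOutputAt = now;
  }

  // Call on every stdout/stderr chunk from the runner.
  touch(now: number): void {
    this.lastOutputAt = now;
  }

  // True only when the watchdog is enabled and the silence spans the whole window.
  isIdle(now: number): boolean {
    return this.idleTimeoutMs > 0 && now - this.lastOutputAt >= this.idleTimeoutMs;
  }
}

// Hypothetical wiring: poll the tracker and abort the runner on prolonged silence.
// Returns a stop function intended to be called from a finally block.
function startIdleWatchdog(
  tracker: IdleTracker,
  abort: () => void,
  pollMs: number,
  clock: () => number = Date.now,
): () => void {
  const timer = setInterval(() => {
    if (tracker.isIdle(clock())) {
      clearInterval(timer);
      abort();
    }
  }, pollMs);
  return () => clearInterval(timer);
}
```

Passing the clock in rather than reading `Date.now()` inside the tracker keeps the idle check testable without waiting for real time to pass.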

Verification

npm run typecheck → clean
npm test → 1383 passed (added 4 --input parser tests)

🤖 Generated with Claude Code

…watchdog

Two fixes so `ricky local --run` never stalls or fails spuriously, even with auto-fix enabled.

1. --input KEY=VALUE was silently dropped. `ricky local --spec-file _review.md --run --input TARGET_SPEC=<path>` parsed the flag nowhere; the value never reached the workflow runner subprocess env, so reusable review/fix workflows that read `process.env.TARGET_SPEC` failed with MISSING_ENV_VAR. The auto-fix loop then "repaired" the workflow repeatedly (adding an env loader that can't conjure a value that was never passed), burning attempts and wall-clock. Now `--input KEY=VALUE` (and `--input=KEY=VALUE`) parse into an `inputs` record that threads CLI → handoff → normalized request → coordinator launch env, so workflow scripts read them via process.env.KEY. Invalid keys / malformed pairs are reported as CLI errors.

2. No inactivity detection: a hung runner stalled for the full 12 h DEFAULT_RUN_TIMEOUT_MS. A healthy run constantly emits broker/agent output; total silence means it's hung (dead broker, half-open stdio pipe, a subprocess parked at 0% CPU). Added an idle-output watchdog (DEFAULT_RUN_IDLE_TIMEOUT_MS = 30 min, override via RICKY_RUN_IDLE_TIMEOUT_MS, 0 disables) that aborts the runner on prolonged silence so the run fails fast and the orchestrator moves on.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai · 2026-05-30T07:37:13Z


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 4ffaefa4-078d-417c-8865-bec0f19c99fd

📥 Commits

Reviewing files that changed from the base of the PR and between 54c0f7c and ebb4863.

📒 Files selected for processing (4)

src/local/entrypoint.ts
src/surfaces/cli/commands/cli-main.ts
src/surfaces/cli/flows/power-user-parser.test.ts
src/surfaces/cli/flows/power-user-parser.ts

📝 Walkthrough


This PR introduces CLI --input KEY=VALUE arguments to inject environment variables into workflow subprocesses and adds an inactivity watchdog that aborts runs producing prolonged output silence, configurable via environment.

Changes

Workflow Input Parameters and Inactivity Monitoring

| Layer / File(s) | Summary |
|-----------------|---------|
| CLI Input Parsing with Validation<br>`src/surfaces/cli/flows/power-user-parser.ts`, `src/surfaces/cli/flows/power-user-parser.test.ts` | `PowerUserParsedArgs` gains an `inputs` field; `parseInputFlags` helper validates `--input KEY=VALUE` pairs and surfaces errors for invalid env-var names and malformed syntax; four test cases verify multi-flag accumulation, inline syntax, and error handling. |
| CLI Handoff Building and Threading<br>`src/surfaces/cli/commands/cli-main.ts` | `ParsedArgs` adds `inputs` field; `parseArgs` assigns parsed inputs; `buildCliHandoff` injects inputs into `RawHandoff` for all spec variants (artifact, inline, file, stdin). |
| Request Normalization and Data Threading<br>`src/local/request-normalizer.ts` | `BaseHandoff` and `LocalInvocationRequest` interfaces gain optional `inputs` fields; `runtimeOptionsFor()` conditionally includes non-empty inputs in normalized requests. |
| Runtime Execution: Idle Watchdog and Environment Injection<br>`src/shared/constants.ts`, `src/local/entrypoint.ts` | New `DEFAULT_RUN_IDLE_TIMEOUT_MS` constant defines idle timeout; `resolveIdleTimeoutMs()` reads configurable `RICKY_RUN_IDLE_TIMEOUT_MS`; `SdkScriptWorkflowCoordinator.launch` tracks output timestamps, starts an abort timer for silence, and injects `activeRequest.inputs` into subprocess environment with runtime state taking precedence. |
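
The precedence rule in the last row ("runtime state taking precedence") can be illustrated with a small merge helper. This is a sketch under assumptions: `buildRunnerEnv` and the `RICKY_RUN_ID` variable in the usage note are hypothetical names, and letting inputs override the inherited environment is a guess rather than something the walkthrough states.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper. Later spreads win, so:
//   inherited env  <  user --input values  <  runtime-owned variables
function buildRunnerEnv(
  inheritedEnv: Env,
  inputs: Record<string, string> | undefined,
  runtimeEnv: Env,
): Env {
  return { ...inheritedEnv, ...(inputs ?? {}), ...runtimeEnv };
}
```

For example, `buildRunnerEnv({ PATH: "/usr/bin" }, { TARGET_SPEC: "specs/021-sentry.md", RICKY_RUN_ID: "from-user" }, { RICKY_RUN_ID: "run-1" })` yields `TARGET_SPEC` from the input and `RICKY_RUN_ID` set to `"run-1"`, so a user-supplied input cannot shadow a variable the runtime owns.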

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Inputs hop through the CLI, each KEY=VALUE plays,
Watchdog's eyes stay open, counting idle days,
When silence stretches long, the subprocess takes a bow,
Environment variables flow—the workflow knows how! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 36.36% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title clearly and specifically summarizes the main changes: injecting --input KEY=VALUE into workflow env and adding an idle-output watchdog for inactivity detection. |
| Description check | ✅ Passed | The description is comprehensive and directly relates to all changes in the PR, explaining both the problem and the fix in detail with verification steps. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


gemini-code-assist

Code Review

This pull request introduces an inactivity watchdog to abort hung workflow runner subprocesses early and adds support for passing custom environment variables via --input KEY=VALUE CLI flags. Feedback on these changes highlights two issues: a parsing bug where a trailing flag following --input could be skipped silently, and a potential resource leak where the watchdog's interval timer might not be cleared if the runner throws a synchronous error.

gemini-code-assist · 2026-05-30T07:38:11Z

+    let pair: string | undefined;
+    if (arg === '--input') {
+      pair = argv[index + 1];
+      index += 1;
+    } else if (arg.startsWith('--input=')) {
+      pair = arg.slice('--input='.length);
+    } else {
+      continue;
+    }
+    if (pair === undefined || pair.startsWith('--')) {
+      errors.push('--input requires a KEY=VALUE argument.');
+      continue;
+    }


If the --input flag is followed by another flag starting with -- (for example, if the user forgets the argument or makes a typo like ricky local --input --run), the parser will consume --run as the pair value and increment index. This causes the next flag (--run) to be completely skipped in the next iteration of the loop, silently ignoring it.

We should validate that the next argument exists and does not start with -- before consuming it and incrementing the index.

Suggested change

 let pair: string | undefined;
 if (arg === '--input') {
-  pair = argv[index + 1];
-  index += 1;
+  const next = argv[index + 1];
+  if (next === undefined || next.startsWith('--')) {
+    errors.push('--input requires a KEY=VALUE argument.');
+    continue;
+  }
+  pair = next;
+  index += 1;
 } else if (arg.startsWith('--input=')) {
   pair = arg.slice('--input='.length);
 } else {
   continue;
 }
 if (pair === undefined || pair.startsWith('--')) {
   errors.push('--input requires a KEY=VALUE argument.');
   continue;
 }
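
For reference, a self-contained version of a parser with the look-ahead check might look like the sketch below. It is an illustration, not the contents of `power-user-parser.ts`: the `ParsedInputs` shape, the `ENV_KEY` pattern, and the second and third error messages are assumptions; only the flag forms, the first error message, and the "empty values are allowed" rule come from this PR.

```typescript
interface ParsedInputs {
  inputs: Record<string, string>;
  errors: string[];
}

// Assumed rule for a valid env-var name: a letter or underscore,
// followed by letters, digits, or underscores.
const ENV_KEY = /^[A-Za-z_][A-Za-z0-9_]*$/;

function parseInputFlags(argv: string[]): ParsedInputs {
  const inputs: Record<string, string> = {};
  const errors: string[] = [];
  for (let index = 0; index < argv.length; index += 1) {
    const arg = argv[index];
    let pair: string;
    if (arg === "--input") {
      const next = argv[index + 1];
      // Look ahead before consuming, so a following flag such as --run
      // is left for the next loop iteration instead of being swallowed.
      if (next === undefined || next.startsWith("--")) {
        errors.push("--input requires a KEY=VALUE argument.");
        continue;
      }
      pair = next;
      index += 1;
    } else if (arg.startsWith("--input=")) {
      pair = arg.slice("--input=".length);
    } else {
      continue;
    }
    // Split on the first "=" only, so values may themselves contain "=".
    const eq = pair.indexOf("=");
    if (eq <= 0) {
      errors.push(`--input expects KEY=VALUE, got "${pair}".`);
      continue;
    }
    const key = pair.slice(0, eq);
    if (!ENV_KEY.test(key)) {
      errors.push(`--input key "${key}" is not a valid environment variable name.`);
      continue;
    }
    inputs[key] = pair.slice(eq + 1); // an empty value (KEY=) is allowed
  }
  return { inputs, errors };
}
```

With this shape, `parseInputFlags(["--input", "--run"])` reports one error and does not advance past `--run`.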

gemini-code-assist · 2026-05-30T07:38:11Z

+      ).finally(() => {
+        if (idleInterval) clearInterval(idleInterval);
+      });


If this.runner throws a synchronous error (for example, during argument validation or initial setup before returning a promise), withTimeout is never called. As a result, the .finally() block is never attached, and idleInterval is leaked (never cleared).

To prevent this resource leak, we should ensure that any synchronous errors from this.runner are converted into a promise rejection so that withTimeout is always invoked and the .finally() block always runs.

For example, you can wrap the runner call in an async block or Promise.resolve().then(...) like this:

const runnerPromise = (async () => {
  return this.runner(request.workflowFile, {
    // ... options ...
  });
})();
const runnerResult = await withTimeout(
  runnerPromise,
  request.timeoutMs ?? DEFAULT_RUN_TIMEOUT_MS,
  () => abortController.abort(),
).finally(() => {
  if (idleInterval) clearInterval(idleInterval);
});
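
The same idea can be shown in isolation. The helper below is hypothetical (`runWithCleanup` and `Runner` are not names from the codebase); it starts the runner inside a promise chain, so a synchronous throw becomes a rejection and the cleanup attached with `finally()` runs either way.

```typescript
type Runner<T> = () => Promise<T>;

// Hypothetical helper: the runner is invoked inside .then(), so an exception
// thrown before it returns a promise is converted into a rejection.
// The cleanup callback therefore runs on success, rejection, and sync throw.
function runWithCleanup<T>(runner: Runner<T>, cleanup: () => void): Promise<T> {
  return Promise.resolve()
    .then(() => runner())
    .finally(cleanup);
}
```

In the watchdog case the cleanup would be the `clearInterval` call, which is how the interval avoids outliving a runner that fails during setup.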

pullfrog

Reviewed changes — two fixes for ricky local --run reliability: threading --input KEY=VALUE through the full CLI→handoff→runner env pipeline, and adding an idle-output watchdog that aborts the SDK script runner after 30 min of zero output.

--input flag parsing and env injection — parseInputFlags() in power-user-parser.ts supports both --input KEY=VALUE and --input=KEY=VALUE forms, with validation for valid env-var names and malformed pairs. Flows through PowerUserParsedArgs → BaseHandoff → LocalInvocationRequest → RunRequest.env → subprocess env.
Idle-output watchdog — setInterval in SdkScriptWorkflowCoordinator.launch() tracks output activity and aborts the runner after DEFAULT_RUN_IDLE_TIMEOUT_MS (30 min) of silence. Configurable via RICKY_RUN_IDLE_TIMEOUT_MS; 0 disables.


coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/local/entrypoint.ts (1)

447-478: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve idle-watchdog aborts as a distinct timeout signal.

If the watchdog aborts the run, the awaited runner promise rejects before Lines 450-452 execute, so the inactivity marker never reaches stderr. The catch path then records only script workflow run aborted and marks the run as failed, which makes downstream blocker classification miss the idle timeout and fall back to generic failure handling.

Suggested fix

-      ).finally(() => {
+      ).finally(() => {
         if (idleInterval) clearInterval(idleInterval);
       });
-      if (idleAborted) {
-        stderr.push(`Workflow runner aborted after ${Math.round(idleTimeoutMs / 1000)}s of inactivity (suspected hang).`);
-      }
       const reportedFailure =
         failureFromScriptWorkflowResult(runnerResult) ?? failureFromScriptWorkflowOutput(stdout, stderr);
       if (reportedFailure) {
         throw new Error(reportedFailure);
       }
@@
     } catch (error) {
-      const message = error instanceof Error ? error.message : String(error);
-      status = message.startsWith('timed out after ') ? 'timed_out' : 'failed';
-      stderr.push(message);
+      const idleMessage = idleAborted
+        ? `Workflow runner aborted after ${Math.round(idleTimeoutMs / 1000)}s of inactivity (suspected hang).`
+        : undefined;
+      const message = idleMessage ?? (error instanceof Error ? error.message : String(error));
+      status = idleAborted || message.startsWith('timed out after ') ? 'timed_out' : 'failed';
+      stderr.push(message);
       emit(status === 'timed_out' ? 'timeout' : 'error', message, { error: message });
       return coordinatorResultFromSdkRun({

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/local/entrypoint.ts` around lines 447 - 478, The catch block needs to
preserve watchdog aborts: detect if idleAborted is true and, if so, set message
to `Workflow runner aborted after ${Math.round(idleTimeoutMs/1000)}s of
inactivity (suspected hang).`, set status to 'timed_out', push that message to
stderr and emit the 'timeout' event (instead of treating it as a generic
failure); otherwise continue the existing error handling. Update the catch path
where status is assigned and emit() is called so the idleAborted branch runs
before the generic message/status logic.
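
The classification this comment asks for can be sketched as a pure function. The message text and the `'timed out after '` prefix check are taken from the suggested fix above; `classifyRunFailure` and its return shape are hypothetical and only illustrate the ordering (the idle-abort branch runs before the generic one).

```typescript
type RunStatus = "timed_out" | "failed";

interface FailureClassification {
  status: RunStatus;
  message: string;
}

// Hypothetical helper mirroring the suggested catch-block logic.
function classifyRunFailure(
  error: unknown,
  idleAborted: boolean,
  idleTimeoutMs: number,
): FailureClassification {
  // Idle-watchdog aborts are reported as timeouts with their own message,
  // regardless of what the rejected runner promise said.
  if (idleAborted) {
    return {
      status: "timed_out",
      message: `Workflow runner aborted after ${Math.round(idleTimeoutMs / 1000)}s of inactivity (suspected hang).`,
    };
  }
  const message = error instanceof Error ? error.message : String(error);
  return {
    status: message.startsWith("timed out after ") ? "timed_out" : "failed",
    message,
  };
}
```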

src/surfaces/cli/flows/power-user-parser.ts (1)

150-166: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Treat --input as consuming the next argv token in legacy ricky run parsing.

parseInputFlags() accepts --input KEY=VALUE, but readRunArtifactPositional() still sees that KEY=VALUE token as the artifact for commands like ricky run --input TARGET_SPEC=spec.md workflows/generated/review.ts. That makes the new flag break the positional run form whenever it appears before the artifact.

Suggested fix

 function isValueForRunOption(previous: string | undefined, candidate: string): boolean {
   if (!previous) return false;
+  if (previous === '--input') return true;
   if ((previous === '--auto-fix' || previous === '--repair') && isAutoFixValue(candidate)) return true;
   return previous === '--start-from' || previous === '--previous-run-id' || previous === '--resume-from-run';
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/surfaces/cli/flows/power-user-parser.ts` around lines 150 - 166, The
parseInputFlags() changes made `--input KEY=VALUE` work but the legacy
positional parser still treats the KEY=VALUE token as an artifact; update the
CLI arg handling so `--input` is treated like a flag that consumes its next argv
token during legacy parsing: add '--input' to the list passed to the loop that
checks flags-with-values (the array iterated in this file) or update
readFlagValue(effectiveArgv, flag) to recognize and consume the token after
'--input', ensuring readRunArtifactPositional will not see the KEY=VALUE token
as a positional artifact and leaving parseInputFlags and
readRunArtifactPositional behavior consistent.
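
A simplified illustration of the interaction follows. `findRunArtifact` is a hypothetical stand-in for `readRunArtifactPositional()`, whose real implementation is not shown in this review; the flag names come from `isValueForRunOption` above, and the optional values of `--auto-fix` / `--repair` are deliberately left out.

```typescript
// Flags assumed to always consume the next argv token.
const VALUE_FLAGS = new Set([
  "--input",
  "--start-from",
  "--previous-run-id",
  "--resume-from-run",
]);

// Hypothetical, simplified positional lookup over the tokens after `ricky run`.
function findRunArtifact(argv: string[]): string | undefined {
  for (let index = 0; index < argv.length; index += 1) {
    const arg = argv[index];
    if (arg.startsWith("--")) {
      // Skip the flag's value too, unless it was given inline as --flag=value.
      if (VALUE_FLAGS.has(arg)) index += 1;
      continue;
    }
    return arg; // first token that is neither a flag nor a flag's value
  }
  return undefined;
}
```

Without `"--input"` in the set, `findRunArtifact(["--input", "TARGET_SPEC=spec.md", "workflows/generated/review.ts"])` would return `TARGET_SPEC=spec.md`, which is the failure mode described in this comment.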

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/surfaces/cli/commands/cli-main.ts`:
- Around line 108-109: Add documentation for the new --input flag inside the
renderHelp() function so it appears in ricky --help; specifically, update the
help output generated by renderHelp() to include an entry describing "--input
KEY=VALUE" (referencing the inputs?: Record<string, string> option added to the
CLI command options) and explain that multiple --input flags set KEY=VALUE pairs
injected into the workflow runner env and how to use them. Ensure the help text
is concise and matches existing help formatting and examples used by
renderHelp().



ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d1dcf161-5542-42e0-9e44-ca0d99acf583

📥 Commits

Reviewing files that changed from the base of the PR and between 1ee2354 and 54c0f7c.

📒 Files selected for processing (6)

src/local/entrypoint.ts
src/local/request-normalizer.ts
src/shared/constants.ts
src/surfaces/cli/commands/cli-main.ts
src/surfaces/cli/flows/power-user-parser.test.ts
src/surfaces/cli/flows/power-user-parser.ts

cubic-dev-ai

3 issues found across 6 files


github-actions · 2026-05-30T08:29:58Z

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-30T08-33-24-237Z-HEAD-provider
Mode: provider
Git SHA: 1745f6b

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Ricky – Updated Integration with Agent Assistant**  
*(Version 2.1 – “Deep Agent Assistant Integration”)*  

---

## 1. Scope of Change
| Area | Change |
|------|--------|
| **Documentation** | Expand `README.md`, `AGENTS.md`, and `docs/architecture.md` to describe Ricky’s deeper reliance on the Agent Assistant (AA) for intent routing, context stitching, and error‑recovery orchestration. |
| **Code** | Refactor `ricky/core/dispatcher.py` to call `AgentAssistant.route_intent()` and `AgentAssistant.enrich_context()`. Add AA‑based fallback in `ricky/core/error_handler.py`. |
| **Tests** | Add unit tests for the new AA calls and integration tests that verify the deterministic verification gate (see §4). |
| **CI/CD** | Update the workflow file `.github/workflows/ci.yml` to include a new verification step `AA‑contract‑check`. |
| **Review Artifacts** | Provide a **Design Review** (DR‑2024‑08) and a **Security Review** (SR‑2024‑08) as required by the workflow contract. |

---

## 2. Concrete Workflow Contract (RFC‑2024‑08)

| Phase | Deliverable | Owner | Gate |
|-------|-------------|-------|------|
| **Design** | DR‑2024‑08 (Markdown) – architecture diagram, AA API surface, failure m
...[truncated]

Must

Ground claims in real package imports and runtime paths.
Distinguish current implementation from target architecture.
Identify which Agent Assistant primitive is actually exercised.

Must Not

Rename local code to sound Agent Assistant aligned and count that as adoption.
Claim broad Agent Assistant native behavior from documentation-only alignment.
Blur target architecture with landed behavior.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review (as of 2026‑05‑30)**  

| Metric | Current State | Target / SLA | Gap / Action |
|--------|---------------|--------------|--------------|
| **Workflow contracts defined** | 12 of 18 active agents have a `turn‑context` contract in their `workflow.yaml`. | 100 % of agents | Identify the 6 agents lacking contracts; create minimal contracts (input: `turnId`, `sessionId`; output: `contextSnapshot`). |
| **Deterministic verification gates** | 8 agents run a pre‑merge gate that validates `turn‑context` schema against `schemas/turn-context.json`. | 100 % gate coverage | Add gate to remaining 10 agents; automate via `npm run verify:turn‑context`. |
| **Review artifacts** | 65 % of PRs include a “Turn‑Context Impact” section; 40 % include generated diff of `contextSnapshot` changes. | 100 % inclusion | Update PR template (`.github/pull_request_template.md`) to make the section mandatory. |
| **Validation loops** | Average of 42 validation loops per turn (unit + integration + end‑to‑end). | 80‑100 loops per spec | Increase integration test depth for agents with <70 loops; add end‑to‑end scenarios for edge‑case context propagation. |
| **Bl
...[truncated]

Must

Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
Record compact provenance through generation decisions or coordinator metadata.
Keep the shared turn context internal to the adapter boundary.

Must Not

Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
Drop Ricky-specific workflow metadata during envelope assembly.
Treat turn context as a product decision engine.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation:** Move **all** of the listed capabilities—workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording—into the **Agent Assistant** component.

---

## Rationale (aligned with AGENTS.md and workflow standards)

| Capability | Why it belongs in Agent Assistant | Impact on reliability & coordination |
|------------|-----------------------------------|---------------------------------------|
| **Workflow Generation** | Agent Assistant already owns the *authoring* contract (template → concrete workflow). Centralizing generation avoids duplicated logic across CLI and UI layers. | Guarantees a single source‑of‑truth for workflow shape; deterministic contracts can be version‑controlled and unit‑tested. |
| **Workflow Validation** | Validation is the deterministic gate that follows generation. Embedding it in Agent Assistant lets the same verification suite be reused by CLI, UI, and CI pipelines. | Enables 80‑100 validation loops with consistent pass/fail semantics; failures become explicit “blocker” artifacts. |
| **Debugging (step‑by‑step introspection, state diff)** | Debug support is a *runtime* service that inspects the workflow graph produ
...[truncated]

Must

Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
Make extraction follow typed, tested, live product proof.

Must Not

Generalize workflow-specific behavior prematurely.
Adopt moving shared seams merely for architectural purity.
Lose the precise local-first staged workflow UX.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target window: 12 weeks (Weeks 1‑12)*  

---

## 1. Slice Goal
Deploy the next‑generation reliability and coordination features of Ricky to **30 % of active AgentWorkforce teams** (≈ 45 teams) while establishing a repeatable adoption pipeline that can be scaled to 100 % in subsequent slices.

Success criteria (all must be met):

| Metric | Target |
|--------|--------|
| **Adoption** | ≥ 30 % of target teams have Ricky enabled in production |
| **Reliability** | < 0.5 % of Ricky‑driven workflows generate a “blocker” after the first 48 h |
| **Cycle‑time** | Mean time to review & merge a Ricky‑generated PR ≤ 12 h |
| **Feedback** | ≥ 80 % of surveyed users rate “usefulness” ≥ 4/5 |

---

## 2. Scope & Boundaries  

| Included | Excluded |
|----------|----------|
| • New **deterministic verification gates** for workflow contracts (schema, idempotency, timeout)  <br>• **Branch/PR policy**: `ricky/adopt‑slice‑v2` → `main` with required CODEOWNERS review <br>• **Automated test harness**: 80‑100 validation loops per workflow (unit + integration) <br>• **Blocker reporting**: Slack #ricky‑alerts + JIRA RICK‑BLOCKER tickets <br>• 
...[truncated]

Must

Pick exactly one real shared seam to evaluate or adopt.
Define a live Ricky product path that will prove the adoption.
Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
Skip the comparison/evaluation step for mature Ricky-local seams.
Treat adoption as successful without a live product-path proof.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky – Future Support Interface (Slack & Web)

Below is a **contract‑first design** for extending Ricky’s support capabilities through a Slack bot and a companion web portal. The design follows the **AgentWorkforce workflow standards** (deterministic gates, review artifacts, 80‑100 validation loops, blocker reporting) and is scoped for a single feature branch (`feat/support‑slack‑web`) with a clean PR boundary.

---

### 1. High‑Level Architecture

| Component | Responsibility | Tech Stack (suggested) |
|-----------|----------------|------------------------|
| **Ricky Core** | Executes workflow contracts, stores state, emits events | Python 3.11, FastAPI, PostgreSQL, Redis |
| **Slack Adapter** | Translates Slack events ↔️ Ricky contracts | Bolt for Python, Socket Mode |
| **Web Portal** | UI for ticket view, manual overrides, analytics | React 18 + Vite, TypeScript, TailwindCSS, Next.js (SSR) |
| **Event Bus** | Decouples adapters from core, guarantees at‑least‑once delivery | NATS JetStream (or Kafka) |
| **Auth Service** | SSO (OAuth2) for web, token validation for Slack | Auth0 / Azure AD |
| **Observability** | Tracing, metrics, alerting | OpenTelemetry → Grafana/Prometheu
...[truncated]

Must

Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
Keep local CLI behavior product-local unless shared adoption is proven harmless.
Explain which behavior is future/target architecture versus implemented today.

Must Not

Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
Let future surface abstractions distort the current CLI contract.
Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What happened**

You invoked Ricky with the *local* mode and a specification, but you didn’t include the `--run` flag:

```bash
ricky --mode local --spec "generate a workflow for package checks"
```

In this form Ricky only parses the request, validates the syntax, and returns a preview of the workflow contract (the JSON/YAML skeleton). No actual workflow files are written, no branches are created, and no CI checks are triggered.

How to proceed

Preview (what you just did)
- The command you ran is useful when you want to see the generated contract before committing anything.
- The output will be a deterministic workflow definition (e.g., workflow.json) that you can inspect, edit, or copy‑paste into your repository.
Create / Apply the workflow
- To have Ricky author the workflow, push it to a new branch, open a PR, and run the verification gates, you need to add --run.
- Example:
```
ricky --mode local --spec "generate a workflow for package checks" --run
```
This will:
- Generate the workflow contract.
- Create a scoped branch (e.g., ricky/package-checks-<timestamp>).
- Add the workflow file(s) und
  ...[truncated]


**Must**

- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.

**Must Not**

- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>

**User message**

> Render Ricky's first-run CLI onboarding for a new user.

**Ricky output**

```text
**Ricky – First‑Run CLI Onboarding**

Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the **Ricky** AgentWorkforce CLI. Follow the steps in order; each step includes a deterministic verification gate that will stop the process and surface a clear blocker if anything goes wrong.

---

## 1️⃣ Prerequisites (Verification Gate)

| Check | Command | Expected Result |
|-------|---------|-----------------|
| **Node ≥18** | `node -v` | `v18.x` or higher |
| **Git** | `git --version` | `git version 2.x` |
| **Docker** (optional, for container‑based agents) | `docker version --format '{{.Server.Version}}'` | version string (e.g., `20.10.24`) |
| **Internet connectivity** | `curl -s https://api.github.com/zen` | any non‑empty response |

*If any check fails, abort and resolve the prerequisite before proceeding.*

---

## 2️⃣ Install the Ricky CLI (Deterministic Contract)

```bash
# Install globally via npm (preferred)
npm i -g @ricky/agentworkforce-cli
# Verify installation
ricky --version
```

Expected output: ricky-cli vX.Y.Z (matches the version published on npm).

Blocker: If ricky is not found or version mismatches, reinstall with `npm cache clea
...[truncated]


**Must**

- Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
- End every branch with a concrete next step.
- Advertise only commands that are currently implemented.

**Must Not**

- Sound like a launch page or documentation dump.
- Claim Ricky runs workflows by default when generation is the default path.
- Require web or Slack onboarding before CLI use.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.recovery-guidance-no-stack-traces</code> (cli-behavior/manual)</summary>

**User message**

> A user gives Ricky an empty spec or a missing spec file.

**Ricky output**

```text
**Blocker Report – Missing or Empty Specification**

| **Item** | **Status** |
|----------|------------|
| **Problem** | The workflow specification file is either missing from the repository or present but contains no content. |
| **Impact** | Without a valid spec we cannot generate a deterministic contract, create verification gates, or produce the required review artifacts. All downstream CI/CD steps (validation loops, test generation, deployment) will be blocked. |
| **Immediate Action** | Halt any further automated processing of this workflow until a concrete spec is supplied. |

---

## 1. Request for a Concrete Spec

Please provide a **non‑empty** `workflow.yaml` (or the appropriate spec file name) that includes at least the following mandatory sections, per the **AGENTS.md** contract template:

```yaml
name: <workflow-name>
description: >-
  Brief, human‑readable description of the workflow’s purpose.
version: "1.0.0"
inputs:
  - name: <input‑name>
    type: <string|int|bool|...>
    required: true
    description: <what the input represents>
steps:
  - id: <step‑id>
    name: <human‑readable name>
    action: <builtin|custom>
    parameters:
      <key>: <value>
    depends
...[truncated]

**Must**

- Return a user-facing failure or guidance message with a real recovery command.
- Distinguish generation failure from execution failure.
- Show stack traces only when verbose diagnostic mode is requested.

**Must Not**

- Crash with an uncaught exception in normal mode.
- Suggest commands that do not exist.
- Pretend a missing spec was accepted.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**

=== AgentWorkforce Status Report ===

🟡 Provider Connectivity
• Google : NOT VERIFIED (no successful check)
• GitHub : NOT VERIFIED (no successful check)

✅ Core Services
• Scheduler : RUNNING
• Dispatcher : RUNNING
• Logger : ACTIVE

🔧 Configuration
• Google OAuth Token : MISSING / UNVERIFIED
• GitHub Personal Token: MISSING / UNVERIFIED

⚠️ Blockers
• Google integration has not been confirmed – workflows that depend on Google APIs may fail.
• GitHub integration has not been confirmed – repository sync, PR automation, and code‑review hooks are disabled.

🗒️ Next Steps

Run provider health checks:
- ricky provider check google
- ricky provider check github
Supply missing credentials if needed:
- ricky provider set google --token <TOKEN>
- ricky provider set github --token <TOKEN>
Re‑run ricky status to verify connectivity.

🧭 Summary

Core AgentWorkforce services are up.
External provider connections (Google, GitHub) are currently unverified and must be validated
...[truncated]


**Must**

- Report unknown or not-connected provider state honestly.
- Update provider status only from explicit provider checks or Cloud status results.
- Give concrete setup guidance for Cloud when relevant.

**Must Not**

- Mark Google or GitHub connected because guidance text was shown.
- Invent a provider connection URL or OAuth flow.
- Show empty fields with no recovery guidance when config is missing.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
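
The rule that provider status changes only on explicit check results can be sketched as a small state update. This is a minimal illustration, assuming hypothetical names (`ProviderState`, `applyProviderChecks`, `renderProviderStatus`); it is not Ricky's actual status code, and it deliberately prints no recovery commands because the real command names are not confirmed here.

```typescript
// Hypothetical types for illustration only.
type ProviderState = "unknown" | "connected" | "not-connected";
type ProviderCheck = { provider: string; ok: boolean };

// Every provider starts as "unknown"; only an explicit check result can change that.
function applyProviderChecks(providers: string[], checks: ProviderCheck[]): Map<string, ProviderState> {
  const stateByProvider = new Map<string, ProviderState>();
  for (const provider of providers) stateByProvider.set(provider, "unknown");
  for (const check of checks) {
    if (stateByProvider.has(check.provider)) {
      stateByProvider.set(check.provider, check.ok ? "connected" : "not-connected");
    }
  }
  return stateByProvider;
}

// Unknown state is reported as unknown, never silently rendered as connected or left blank.
function renderProviderStatus(stateByProvider: Map<string, ProviderState>): string[] {
  return Array.from(stateByProvider.entries()).map(([provider, providerState]) =>
    providerState === "unknown"
      ? `${provider}: unknown (no provider check has run)`
      : `${provider}: ${providerState}`,
  );
}
```

Showing guidance text never touches the map, so a provider cannot become connected as a side effect of rendering.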

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.skill-matcher-registry-backed</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

**Ricky output**

```text
**Workflow: “GitHub Primitive Change”**

| Step | Owner | Action | Input | Output | Verification Gate |
|------|-------|--------|-------|--------|--------------------|
| 1️⃣ | **Ricky (Coordination)** | **Create a scoped feature branch** | `main` HEAD | New branch `feat/<short‑description>` | Branch exists & is based on latest `main` |
| 2️⃣ | **Ricky (Branch‑Management)** | **Sync latest CI config** | Repo root | Updated `.github/workflows/ci.yml` (if needed) | CI config passes lint (`actionlint`) |
| 3️⃣ | **Developer** | **Implement the primitive change** (e.g., rename a variable, adjust a comment, bump a version) | Branch from step 1 | Code change + unit‑test (if applicable) | Local `git diff` shows only intended files; `npm test`/`go test` passes |
| 4️⃣ | **Ricky (Deterministic Verification)** | **Run pre‑commit checks** | Changed files | Lint, format, static‑analysis results | All checks return `0` exit code |
| 5️⃣ | **Developer** | **Commit with conventional message** | Verified changes | Commit `feat: <short description>` | Commit message matches Conventional Commits regex |
| 6️⃣| **Ricky (Automation)** | **Open a Pull Request** | Commit from step 5 | PR titled “feat: <s
...[truncated]

**Must**

- Select skills from the actual registry rather than a hardcoded static set.
- Record ranked skill matches with confidence and trigger evidence.
- Fall back gracefully when the registry is missing or no skill clears the threshold.

**Must Not**

- Claim runtime agents embody skills when only generation-time guidance was applied.
- Fail generation solely because optional skill files are missing.
- Hide skill selection evidence from artifacts or JSON output.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
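
The ranking and fallback requirements above can be sketched roughly as follows. Everything here is an assumption made for illustration: the `SkillEntry` shape, the `rankSkills` name, the trigger-word overlap score, and the 0.5 threshold are hypothetical and say nothing about how Ricky's real matcher scores skills.

```typescript
// Hypothetical registry and match shapes; not Ricky's actual types.
type SkillEntry = { id: string; triggers: string[] };
type SkillMatch = { id: string; confidence: number; evidence: string[] };
type SkillSelection = { selected: SkillMatch[]; ranked: SkillMatch[]; warnings: string[] };

// Score each registry skill by the fraction of its trigger words present in the spec.
function rankSkills(spec: string, registry: SkillEntry[] | undefined, threshold = 0.5): SkillSelection {
  if (!registry || registry.length === 0) {
    // Missing registry: degrade gracefully instead of failing generation.
    return { selected: [], ranked: [], warnings: ["skill registry unavailable; generated without skills"] };
  }
  const specWords = new Set(spec.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
  const ranked: SkillMatch[] = registry
    .map((skill) => {
      const evidence = skill.triggers.filter((trigger) => specWords.has(trigger.toLowerCase()));
      const confidence = skill.triggers.length === 0 ? 0 : evidence.length / skill.triggers.length;
      return { id: skill.id, confidence, evidence };
    })
    .sort((a, b) => b.confidence - a.confidence);
  const selected = ranked.filter((match) => match.confidence >= threshold);
  const warnings = selected.length === 0 ? ["no skill cleared the confidence threshold"] : [];
  return { selected, ranked, warnings };
}
```

Returning `ranked` alongside `selected` keeps the full evidence trail available for artifacts or JSON output even when nothing is selected.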

REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow: Claude‑Review → Codex‑Implementation**

| Phase | Owner | Action | Input | Output | Verification Gate |
|-------|-------|--------|-------|--------|--------------------|
| **1. Requirement Ingestion** | *Product Owner* | Submit functional spec (markdown) to `specs/` folder. | `spec.md` | `spec.md` committed. | CI lint (YAML/MD schema) passes. |
| **2. Claude Review** | *Claude (LLM‑reviewer)* | Run `claude-review.yml` to generate a **Review Report**. | `spec.md` | `reviews/claude/<spec‑id>_review.json` | Automated test checks: <br>• JSON schema compliance. <br>• Presence of `issues[]`, `suggestions[]`, `approval` flag. |
| **3. Human Gate – Review Acceptance** | *Design Lead* | Inspect `*_review.json`. Approve if `approval:true` **or** add manual comments. | `*_review.json` | `reviews/claude/<spec‑id>_review_signed.md` | PR must be **approved** + `approval:true` OR contain a signed “override” comment. |
| **4. Codex Generation** | *Codex (LLM‑coder)* | Execute `codex-implement.yml` using the signed review as context. | `spec.md` + `*_review_signed.md` | `src/<spec‑id>/` with generated code files. | Unit‑test scaffold generated; compile check passes. |
| **5. Automated Te
...[truncated]

**Must**

- Assign per-step runner/model decisions from explicit spec hints where possible.
- Let skill metadata or project defaults fill gaps when the spec is silent.
- Write `tool-selection.json` or equivalent audit metadata.

**Must Not**

- Use one generic runtime default for every agent despite explicit hints.
- Let skill defaults override explicit user/spec runner hints.
- Omit the reason each tool or model was chosen.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
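
The precedence these rules describe (explicit spec hint first, then skill metadata, then project default, each with a recorded reason) might look like the sketch below. The type and function names are hypothetical, and the record layout is only a guess at what audit metadata in the spirit of `tool-selection.json` could contain.

```typescript
// Hypothetical shapes; the precedence order mirrors the Must / Must Not lists above.
type RunnerSource = "spec-hint" | "skill-metadata" | "project-default";
type ToolDecision = { step: string; runner: string; source: RunnerSource; reason: string };

function selectRunner(
  step: string,
  specHints: Map<string, string>,
  skillDefaults: Map<string, string>,
  projectDefault: string,
): ToolDecision {
  const hinted = specHints.get(step);
  if (hinted !== undefined) {
    // An explicit hint always wins, even when skill metadata disagrees.
    return { step, runner: hinted, source: "spec-hint", reason: `spec names ${hinted} for ${step}` };
  }
  const fromSkill = skillDefaults.get(step);
  if (fromSkill !== undefined) {
    return { step, runner: fromSkill, source: "skill-metadata", reason: `spec is silent; skill metadata suggests ${fromSkill}` };
  }
  return { step, runner: projectDefault, source: "project-default", reason: "no spec hint or skill metadata; using project default" };
}

// One auditable decision per step, each carrying the reason it was chosen.
function buildToolSelection(
  steps: string[],
  specHints: Map<string, string>,
  skillDefaults: Map<string, string>,
  projectDefault: string,
): ToolDecision[] {
  return steps.map((step) => selectRunner(step, specHints, skillDefaults, projectDefault));
}
```

For the spec in this case, hints of `review → claude` and `implement → codex` would yield two `spec-hint` decisions, and only unhinted steps would fall through to the other sources.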

REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

Below is a compact “how‑to” for using the **`--refine`** option when you want to tighten a workflow’s step‑level task descriptions and acceptance gates. The pattern follows the **Ricky** repository conventions (AGENTS.md, workflow standards, shared authoring rules).

---

## 1. When to invoke `--refine`

| Situation | Why you need `--refine` |
|-----------|------------------------|
| **Vague step text** – e.g. “process data” | Adds concrete inputs, outputs, and success criteria. |
| **Missing verification gate** | Inserts deterministic checks (unit test, schema validation, SLA). |
| **Over‑broad acceptance** | Narrows to 80‑100 % validation loops and explicit pass/fail thresholds. |
| **Branch/PR scoping** | Guarantees the refinement lives in a dedicated branch (`refine/<workflow‑id>`). |

---

## 2. Command syntax

```bash
ricky workflow edit <workflow-id> \
    --refine \
    --step "<step‑name>" \
    --task "New precise task description" \
    --gate "deterministic verification gate (e.g. `jsonschema validate output.json`)" \
    --branch "refine/<workflow-id>"
```

<workflow-id> – the UUID or slug of the workflow you are editing.
--step – the exact step label
...[truncated]


**Must**

- Keep the deterministic unrefined path as the default.
- Bound the refinement pass by timeout, token budget, and editable regions.
- Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

**Must Not**

- Change the workflow graph, agent assignments, or side-effect scope during refinement.
- Fail the whole generation if optional refinement times out.
- Hide warnings when the deterministic artifact is returned unchanged.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
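
A rough sketch of the fallback behavior these rules ask for: refinement may edit step text, but any change to the graph or agent assignments, a failed validation, or a provider error returns the deterministic artifact with a visible warning. All names (`WorkflowStep`, `applyRefinement`) are hypothetical, and the timeout and token-budget bounds are left out to keep the example short.

```typescript
// Hypothetical step shape: id and agent are structural, task and gate are the editable regions.
type WorkflowStep = { id: string; agent: string; task: string; gate: string };
type RefineResult = { steps: WorkflowStep[]; refined: boolean; warnings: string[] };

function applyRefinement(
  original: WorkflowStep[],
  refine: (steps: WorkflowStep[]) => WorkflowStep[],
  validate: (steps: WorkflowStep[]) => boolean,
): RefineResult {
  // Every failure path returns the untouched deterministic artifact plus a warning.
  const fallback = (why: string): RefineResult => ({
    steps: original,
    refined: false,
    warnings: [`refinement skipped: ${why}`],
  });

  let candidate: WorkflowStep[];
  try {
    // Hand the refiner copies so it cannot mutate the deterministic artifact.
    candidate = refine(original.map((step) => ({ ...step })));
  } catch (err) {
    return fallback(`provider failure (${String(err)})`);
  }

  const sameShape =
    candidate.length === original.length &&
    candidate.every((step, i) => step.id === original[i].id && step.agent === original[i].agent);
  if (!sameShape) return fallback("edit touched the workflow graph or agent assignments");
  if (!validate(candidate)) return fallback("refined artifact failed validation");
  return { steps: candidate, refined: true, warnings: [] };
}
```

Because the function never throws, an optional refinement pass cannot fail the whole generation.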

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.behavior-grounded-gates</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for the `ricky --version` spec.

**Ricky output**

```text
**Workflow: `ricky --version` Specification & Delivery**  
*Repository:* `github.com/agentworkforce/ricky`  
*Target Branch:* `main` → `feature/ricky-version`  

---

## 1. Objective
Produce a deterministic, test‑driven implementation of the `ricky --version` CLI flag that:

1. Returns the current package version string (e.g., `ricky v1.4.2`).
2. Exposes the version via a Go constant and a `--version` flag in the Cobra command tree.
3. Is covered by unit, integration, and contract tests.
4. Passes all CI gates before merge.

---

## 2. Contract (Specification)

| Item | Requirement | Acceptance Criteria |
|------|-------------|---------------------|
| **CLI flag** | `--version` (global) | `ricky --version` prints exactly `ricky v<semver>` and exits with code 0. |
| **Version source** | Single source of truth in `internal/version/version.go` | Constant `Version = "vX.Y.Z"`; updated only via `make bump-version`. |
| **Output format** | `ricky v<semver>` (no trailing newline required, but allowed) | Regex `^ricky v\d+\.\d+\.\d+$` must match. |
| **Help integration** | `ricky --help` lists `--version` under “Global Flags”. | Flag appears in generated help output. |
| **Cross‑platform**
...[truncated]

**Must**

- Build gates from the stated acceptance behavior, such as checking `ricky --version` output.
- Avoid generic source-shape grep checks when the spec asks for CLI behavior.
- Keep generated validation meaningful for the current repo shape.

**Must Not**

- Treat `grep -Eq 'export|function|class|workflow(' dist/ricky.js` as proof of version behavior.
- Claim the workflow is proven by source syntax alone.
- Ignore the package-json version resolution order in the spec.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
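
To make the contrast with source-shape greps concrete, here is a minimal sketch of a gate that judges the observed behavior of `ricky --version` instead. The function name, the result shape, and the accepted output formats (`1.2.3` or `v1.2.3` as a whitespace-separated token) are assumptions; `expectedVersion` stands for whatever the spec's package-json resolution order yields, which is not reproduced here.

```typescript
// Hypothetical gate result shape.
type GateResult = { pass: boolean; detail: string };

// Pass only when the command succeeded and its output reports the expected version.
function versionBehaviorGate(exitCode: number, stdout: string, expectedVersion: string): GateResult {
  if (exitCode !== 0) {
    return { pass: false, detail: `ricky --version exited with code ${exitCode}` };
  }
  // Compare whole tokens, tolerating a leading "v", so "11.2.3" does not satisfy "1.2.3".
  const tokens = stdout.trim().split(/\s+/).map((token) => token.replace(/^v/, ""));
  return tokens.some((token) => token === expectedVersion)
    ? { pass: true, detail: `output reports ${expectedVersion}` }
    : { pass: false, detail: `expected ${expectedVersion}, got "${stdout.trim()}"` };
}
```

A caller would feed it the exit code and stdout captured from actually running the CLI; text that merely looks like source code fails the gate.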

REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow Title:** Batch‑Artifact Validation‑Fix‑Rerun Loop  
**Scope:** Applies to any collection of independent artifacts (e.g., data files, config bundles, compiled binaries) that must each pass a deterministic validation suite before promotion to the next environment.  

---

## 1. Contract Overview  

| Element | Description |
|---------|-------------|
| **Input** | `artifact_set` – a JSON array of objects `{ id: string, uri: string, type: enum }`. All URIs must be reachable via the configured storage provider. |
| **Output** | `validated_set` – JSON array of the same objects that have passed validation. <br> `failed_set` – JSON array of objects that could not be fixed after the max‑retry limit. |
| **Success Criteria** | `validated_set.length == artifact_set.length` **or** the workflow terminates with a **BLOCKED** status reporting the unrecoverable items. |
| **Idempotency** | Each artifact is processed by a deterministic hash of its `uri`. Re‑running the workflow on the same input yields identical `validated_set`/`failed_set`. |
| **SLAs** | Max total wall‑clock time: **4 h** for ≤ 500 artifacts. Individual artifact retry loop: **≤ 5 min** per attempt. |

---

## 2. High‑L
...[truncated]

**Must**

- Choose `dag`, `supervisor`, or `pipeline` deliberately based on the work shape.
- Explain the pattern choice in artifact metadata or a rationale.
- Use `dag` for validation/fix/rerun loops when dependencies matter.

**Must Not**

- Default blindly to `dag` for every workflow.
- Collapse independent artifact work into one vague agent task.
- Omit verification gates because the chosen pattern seems obvious.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow  

| Skill | What the Skill Contributes | Direct Impact on the Generated Workflow |
|-------|----------------------------|------------------------------------------|
| **Deterministic Contract Definition** | Writes explicit input‑output schemas (JSON Schema, OpenAPI) for every step. | Guarantees that each node can be type‑checked before execution. The workflow engine inserts **contract‑validation gates** automatically, aborting early if a payload violates the schema. |
| **Verification Gate Insertion** | Adds deterministic checkpoints (e.g., `assert`, `expect`, `test‑case` nodes). | Each gate produces a **pass/fail artifact** that is stored in the run‑log. Failed gates become **blockers** that halt downstream steps, preventing cascading errors. |
| **Review Artifact Generation** | Emits markdown or HTML summaries (step diagram, contract diff, test matrix). | Enables a **human‑in‑the‑loop** review stage. The PR template automatically includes a “Review Checklist” that references these artifacts, ensuring reviewers have concrete evidence of correctness. |
| **80‑to‑100 Validation Loops** | Encodes a loop that r
...[truncated]

**Must**

- Describe skills as generation-time selection, loading, and rendering inputs.
- Point to metadata such as `loaded-skills.txt` and `skill-application-boundary.json`.
- Say the workflow instructions were informed by selected skills.

**Must Not**

- Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
- Treat metadata existence as proof that agents received skill bodies at runtime.
- Overstate current tests beyond the generation boundary.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

**Ricky – Workflow Authoring Assistant**

To turn a product specification into a reliable, testable workflow we need a concrete set of inputs, outputs, and success criteria. The current description is too high‑level for me to generate a deterministic workflow contract and verification gates.

**Please provide the following details:**

| Category | What to specify |
|----------|-----------------|
| **Goal** | One‑sentence description of the business outcome the workflow must achieve. |
| **Actors / Systems** | List of human roles, services, or APIs the workflow will interact with (e.g., “Customer”, “PaymentGateway API”, “Inventory DB”). |
| **Trigger** | Event that starts the workflow (e.g., “order placed”, “file uploaded”). |
| **Inputs** | Data payloads or parameters supplied at start (e.g., `orderId`, `customerId`). |
| **Steps** | High‑level actions in order (e.g., “Validate order”, “Reserve inventory”, “Charge payment”). If you have any existing sub‑tasks or scripts, list them. |
| **Outputs / Artifacts** | What the workflow should produce (e.g., “order confirmation email”, “transaction receipt”, “audit log entry”). |
| **Success / Failure Conditions** | Deterministic criteria 
...[truncated]

**Must**

- Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
- Validate with dry-run or targeted structural checks where possible.
- Return artifacts, warnings, and follow-up commands honestly.

**Must Not**

- Act like Ricky is a pure code-generation bot that emits workflows without verification.
- Stop at "code compiles" as the proof bar.
- Skip skill-aware workflow authoring guidance for serious workflows.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s post‑failure protocol**

1. **Immediate detection & logging**  
   - The orchestrator’s watchdog records the timeout (Step X) and the “orphan” worker (Step Y).  
   - Ricky writes a structured log entry (`failure_event.json`) containing:
     ```json
     {
       "workflow_id": "<id>",
       "failed_step": "X",
       "error_type": "timeout",
       "orphan_worker": "Y",
       "timestamp": "<ISO‑8601>",
       "runtime_metrics": { … }
     }
     ```

2. **Blocker classification**  
   - Because a required artifact is missing, the failure is **hard‑blocking**.  
   - Ricky raises a **Blocker Ticket** in the issue tracker with the `workflow‑failure` label and links the `failure_event.json`.  
   - The ticket’s title follows the contract:  
     `BLOCKER: <workflow_id> – step X timeout & orphan worker Y`.

3. **Automatic containment actions**  
   - **Terminate** the orphan worker gracefully (or force‑kill if unresponsive) via the worker‑lifecycle API.  
   - **Rollback** any partial state changes recorded in the workflow’s transaction log up to the last successful checkpoint.  
   - **Mark** the workflow instance as `FAILED` in the state store; no further steps are sche
...[truncated]

**Must**

- Classify the failure before deciding whether to retry.
- Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
- Preserve raw evidence and uncertainty when the class is not known.

**Must Not**

- Blindly rerun the whole workflow without a blocker classification.
- Treat every failure as a broken workflow definition.
- Claim Ricky fixed the workflow before rerun evidence proves it.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
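
The classify-then-decide ordering can be illustrated with a small sketch. The class names come from the list above; everything else is an assumption made for the example, including the `StepEvidence` fields, the signal-to-class mapping, and the choice of which classes count as retryable. Only three of the five classes are mapped here; the rest fall through to `unknown`.

```typescript
// Class names follow the Must list; "unknown" covers anything the signals cannot settle.
type FailureClass =
  | "agent-runtime-opacity"
  | "timeout"
  | "environment-blocker"
  | "workflow-structure-bug"
  | "validation-strategy-mismatch"
  | "unknown";

// Hypothetical evidence captured for one failed step.
type StepEvidence = { step: string; timedOut: boolean; workerAlive: boolean; artifactsProduced: boolean; stderr: string };
type StepClassification = { step: string; failureClass: FailureClass; retryable: boolean; rawEvidence: StepEvidence };

function classifyStep(evidence: StepEvidence): StepClassification {
  let failureClass: FailureClass = "unknown";
  if (/MISSING_ENV_VAR/.test(evidence.stderr)) {
    failureClass = "environment-blocker"; // rerunning cannot supply a value that was never passed in
  } else if (evidence.timedOut) {
    failureClass = "timeout";
  } else if (evidence.workerAlive && !evidence.artifactsProduced) {
    failureClass = "agent-runtime-opacity"; // alive but silent: not proof of a broken workflow definition
  }
  // The retry decision is derived from the class; an unknown class is never retried blindly.
  const retryable = failureClass === "timeout" || failureClass === "agent-runtime-opacity";
  return { step: evidence.step, failureClass, retryable, rawEvidence: evidence };
}
```

For the scenario in this case, the timed-out step and the silent worker would receive different classes, and both classifications keep the raw evidence attached for later review.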

REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale `.agent-relay/`, `.relay/`, and `.trajectories/` state before launching a local workflow.

Ricky output

**Workflow Contract – “Stale‑State Guard for Local AgentWorkforce Runs”**  

| **Element** | **Definition** |
|-------------|----------------|
| **Name** | `guard‑stale‑state` |
| **Scope** | Local development environment – invoked automatically by the `aw‑run` wrapper before any workflow execution. |
| **Inputs** | 1. `repoRoot` – absolute path to the repository root (inferred from `git rev‑parse --show‑toplevel`).<br>2. `staleDirs` – list of relative paths to check: `[".agent‑relay", ".relay", ".trajectories"]`. |
| **Outputs** | `status` – `OK` or `BLOCKED`.<br>`report` – human‑readable markdown summary of findings. |
| **Exit Codes** | `0` – clean (no stale state).<br>`1` – stale state detected (blocked). |
| **Side‑effects** | If blocked, the workflow aborts and prints `report`. No files are modified. |
| **Deterministic Verification Gates** | 1. **Existence Check** – each directory must exist **and** contain at least one file whose modification timestamp is **≤ 5 min** old.<br>2. **Age Threshold** – any file older than **30 min** inside a target directory marks the directory as *stale*.<br>3. **Cross‑Dir Consistency** – if *any* of the three directories is stale, the whole gu
...[truncated]

**Must**

- Classify stale local runtime state as an environment contamination issue.
- Recommend quarantine or isolated-run guidance before launch.
- Record the observed paths and the action taken or recommended.

**Must Not**

- Treat stale runtime state as a workflow logic failure.
- Delete or overwrite state without an explicit safe path or user intent.
- Continue into execution as if the workspace were clean.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
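
A minimal sketch of a pre-launch check that satisfies these rules: it records which runtime-state paths were observed, classifies the finding as environment contamination, recommends an action, and never deletes anything. The names are hypothetical, and how staleness is actually decided is not specified in this report, so it is abstracted behind an `isStale` predicate supplied by the caller.

```typescript
// Directory names come from the user message; everything else is illustrative.
const RUNTIME_STATE_DIRS = [".agent-relay", ".relay", ".trajectories"];

type PreflightFinding = {
  blocked: boolean;
  classification: "environment-contamination" | "clean";
  observedPaths: string[];
  recommendation: string;
};

// isStale is a caller-supplied predicate (hypothetical); this function only reads and reports.
function checkStaleState(workspace: string, isStale: (path: string) => boolean): PreflightFinding {
  const observedPaths = RUNTIME_STATE_DIRS.map((dir) => `${workspace}/${dir}`).filter(isStale);
  if (observedPaths.length === 0) {
    return { blocked: false, classification: "clean", observedPaths, recommendation: "none" };
  }
  return {
    blocked: true,
    classification: "environment-contamination",
    observedPaths,
    recommendation: "quarantine the listed directories or rerun in an isolated workspace; nothing was deleted",
  };
}
```

A blocked finding stops the launch and hands the decision back to the user rather than cleaning up on their behalf.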

REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**Fixing “Another Ricky or Relay run is already active in this workspace”**

The marker that blocks a new run is a safety file created by the **Ricky** runtime to prevent overlapping executions. When the file is left behind (e.g., a crash, an interrupted CI job, or a stray background process), the next invocation thinks a run is still in progress and aborts with the message you’re seeing.

Below is the deterministic, step‑by‑step contract for clearing the marker safely, verifying that no other run is actually active, and restoring normal workflow operation.

---

## 1️⃣ Verify the workspace state

| Check | Command | Expected outcome |
|-------|---------|------------------|
| Active Ricky processes | `ps aux \| grep -i ricky | grep -v grep` | **No** lines that show a `ricky` or `relay` binary running. |
| Active Relay processes | `ps aux \| grep -i relay | grep -v grep` | **No** lines that show a `relay` binary running. |
| Workspace lock file | `ls -l .ricky/run.lock` | File **exists** (that’s why you got the error). |
| Git status (optional) | `git status --porcelain` | Clean working tree (no uncommitted changes). |

If any process is still alive, **stop it first** (e.g., `kill -
...[truncated]

**Must**

- Report the active marker, run id, or status path when available.
- Ask the user to inspect, wait for, or explicitly clear the active run.
- Avoid launching a competing run that could corrupt evidence.

**Must Not**

- Silently start another run.
- Hide the existing run marker from the user.
- Treat the conflict as a generic failure with no recovery path.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.env-loader-injection-runtime-loadable (runtime-recovery/manual)

User message

A workflow artifact references a `MISSING_ENV_VAR` value. Ricky's deterministic auto-fix injects the `.env.local` / `.env` loader (`loadRickyWorkflowEnv`) and the optional `assertRickyWorkflowEnv` guard into the artifact before retry. The artifact may be a master-rendered workflow whose `.step({ command: ... })` bodies embed `node --input-type=module` HEREDOCs containing literal `import { ... } from 'node:fs'` / `from 'node:path'` strings.

Ricky output

**Deterministic auto‑fix for a workflow that references `MISSING_ENV_VAR`**

Below is a concrete, reproducible contract that you can apply to any AgentWorkforce workflow artifact (including master‑rendered workflows that embed HEREDOC command strings). The fix injects the environment loader **once**, adds an optional guard, and guarantees that the workflow will retry only after the environment is available.

---

## 1. Scope & Preconditions  

| Item | Requirement |
|------|--------------|
| **Artifact type** | Any `.js`/`.ts` workflow file that is later rendered by the master‑renderer. It may contain `.step({ command: \`node --input-type=module …\` })` blocks with HEREDOC strings. |
| **Missing variable** | `process.env.MISSING_ENV_VAR` is accessed directly or via a template literal. |
| **Runtime** | Node ≥ 18 (supports `node:` imports). |
| **Repo layout** | Root contains `loadRickyWorkflowEnv.js` (or `.ts`) and optionally `assertRickyWorkflowEnv.js`. |
| **CI gate** | `npm run lint && npm test` must pass after the fix. |

---

## 2. Fix Contract  

1. **Add the loader import at the top of the file** (once, before any step definitions).  
   ```js
   // ----- AUTO‑INSERTED BY RI
...[truncated]

**Must**

- Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected `loadRickyWorkflowEnv` body references `rickyWorkflowFs.*` and `rickyWorkflowPath.*`, so the repair must also add the corresponding `import * as rickyWorkflowFs from 'node:fs'` and `import * as rickyWorkflowPath from 'node:path'` aliases at module top level.
- Detect existing alias imports by matching real top-level `import * as <alias> from '<module>'` statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in `.step({ command: ... })` bodies do not count as imports).
- Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

**Must Not**

- Skip adding the `rickyWorkflowFs` / `rickyWorkflowPath` aliases because `from 'node:fs'` or `from 'node:path'` already appears somewhere in the file as a string literal.
- Inject `loadRickyWorkflowEnv` (or `assertRickyWorkflowEnv`) without the supporting alias imports, which produces a `ReferenceError: rickyWorkflowPath is not defined` at module load and burns the auto-fix budget on `UNSUPPORTED_RUNTIME` at runtime-launch.
- Rewrite or escape the embedded HEREDOC text in step commands.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
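
The difference between substring matching and statement matching can be sketched as below. This is a heuristic illustration and not Ricky's repair code: the helper names `hasAliasImport` and `ensureAliasImports` are hypothetical, and the check is a line-anchored regex for the exact alias form. A HEREDOC line that spelled out the exact alias import at column 0 would still fool it, so a parser-based check would be the more robust choice.

```typescript
// Alias names come from the requirement above; the helpers are hypothetical.
const REQUIRED_ALIASES: Array<[string, string]> = [
  ["rickyWorkflowFs", "node:fs"],
  ["rickyWorkflowPath", "node:path"],
];

// True only when a whole line has the shape: import * as <alias> from '<specifier>'
// A bare "from 'node:fs'" substring elsewhere in the file does not count.
function hasAliasImport(source: string, alias: string, specifier: string): boolean {
  const pattern = new RegExp(
    `^import\\s+\\*\\s+as\\s+${alias}\\s+from\\s+['"]${specifier}['"]\\s*;?\\s*$`,
    "m",
  );
  return pattern.test(source);
}

// Prepend whichever alias imports are missing; the original text, HEREDOCs included, is not modified.
function ensureAliasImports(source: string): string {
  const missing = REQUIRED_ALIASES.filter(([alias, specifier]) => !hasAliasImport(source, alias, specifier));
  const header = missing.map(([alias, specifier]) => `import * as ${alias} from '${specifier}';\n`).join("");
  return header + source;
}
```

Because the repair only prepends lines, the embedded HEREDOC text reaches the child process unchanged, and running the repair twice adds nothing the second time.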

REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Run**

Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script) to run the workflow, trigger the auto‑fix path on the first failure, and verify the repaired artifact. The contract assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failed step name and the previous run ID are known.

---

## 1️⃣ Pre‑flight Checks

| Check | Command | Expected outcome |
|-------|---------|------------------|
| CLI version | `aw --version` | `>=2.4.0` |
| Repo root | `git rev-parse --show-toplevel` | prints absolute path |
| Clean working tree | `git status --porcelain` | no output (no uncommitted changes) |
| Docker daemon (if workflow uses containers) | `docker info` | daemon reachable |

If any check fails, resolve it before proceeding.

---

## 2️⃣ Input Parameters (populate before running)

```bash
# ---- USER‑PROVIDED VALUES ----
FAILED_STEP="my_step_name"          # exact name as shown in the failure log
PREV_RUN_ID="run-2024-05-28-01"     # ID of the successful run that produced the artifact
WORKFLOW_PATH="./workflows/ci.yml"  # relative path to the workflow definition
ARTIFACT_DIR="./artif
...[truncated]

**Must**

- Use a bounded retry budget and summarize every attempt.
- Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
- Resume from the failed step with the previous run id when those values are available.

**Must Not**

- Edit arbitrary repository source files as the default auto-fix surface.
- Keep retrying after the configured max attempts.
- Lose the single Ricky tracking run id across repair/resume attempts.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
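
The bounded loop these rules describe can be sketched as follows. The names (`runWithAutoFix`, `AttemptResult`, `ResumePoint`) and the callback shapes are assumptions for illustration; a real runner would be asynchronous, and the callbacks are synchronous here only to keep the sketch short.

```typescript
// Hypothetical shapes; each child attempt has its own runId, the tracking id never changes.
type ResumePoint = { step: string; previousRunId: string };
type AttemptResult = { ok: boolean; runId: string; failedStep?: string };
type AttemptSummary = { attempt: number; ok: boolean; resumedFrom: ResumePoint | undefined };
type LoopOutcome = { trackingRunId: string; ok: boolean; attempts: AttemptSummary[] };

function runWithAutoFix(
  trackingRunId: string,
  maxAttempts: number,
  runAttempt: (resume?: ResumePoint) => AttemptResult,
  repairArtifact: (failedStep: string) => boolean, // repairs the workflow artifact only
): LoopOutcome {
  const attempts: AttemptSummary[] = [];
  let resume: ResumePoint | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = runAttempt(resume);
    attempts.push({ attempt, ok: result.ok, resumedFrom: resume }); // every attempt is summarized
    if (result.ok) return { trackingRunId, ok: true, attempts };
    // Stop when the budget is spent or there is no failed step to repair and resume from.
    if (attempt === maxAttempts || result.failedStep === undefined) break;
    if (!repairArtifact(result.failedStep)) break;
    resume = { step: result.failedStep, previousRunId: result.runId };
  }
  return { trackingRunId, ok: false, attempts };
}
```

With `maxAttempts` set to 1 the loop reduces to a single attempt with no repair call, which is the shape a `--no-auto-fix` run needs.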

REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Quick‑Start Failure Diagnosis**

When `ricky run workflows/foo.ts --no-auto-fix` exits with a non‑zero status, the engine has hit a **deterministic verification gate** that could not be auto‑repaired. Follow the steps below to isolate the root cause, capture the required artifacts, and get the workflow back to a green state.

---

## 1. Capture the Failure Report

1. **Run with full diagnostics**  
   ```bash
   ricky run workflows/foo.ts --no-auto-fix --log-level=debug --output=./ricky-report
   ```

Creates ricky-report/ containing:

run.log – raw console output
trace.json – step‑by‑step execution trace (state before/after each node)
validation.json – results of all contract checks (pre‑, intra‑, post‑conditions)

Locate the gate that failed
Open validation.json and search for "status":"FAIL"; note the gateId, nodeId, and the expected vs actual values.

2. Common Failure Categories & Immediate Fixes

Category	Typical Symptom	What to Check	Quick Fix
Schema Mismatch	“Contract violation: input.type !== string”	Verify the input contract
...[truncated]


**Must**

- Preserve one-attempt behavior when auto-fix is disabled.
- Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
- Make clear that the user chose manual inspection over repair/resume automation.

**Must Not**

- Start a repair loop despite `--no-auto-fix`.
- Suppress the diagnosis because no repair was attempted.
- Present the failure as a completed repair attempt.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.in-process-local-runner</code> (runtime-recovery/manual)</summary>

**User message**

> Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

**Ricky output**

```text
**Ricky – Local Execution of a TypeScript Workflow Artifact**  
*(Primary local path = `./workflows` – the default location for all workflow source files in the AgentWorkforce repo.)*

---

## 1. Preconditions (Contract)

| Item | Requirement | How to Verify |
|------|-------------|---------------|
| **Workspace** | You are in the root of the AgentWorkforce repo (`git rev‑parse --show-toplevel`). | `pwd` should end with the repo root. |
| **Node version** | `>= 18.0.0` (LTS) and `npm >= 9`. | `node -v && npm -v`. |
| **TypeScript compiler** | `tsc` available (installed via dev‑deps). | `npx tsc --version`. |
| **Workflow artifact** | A single `.ts` file under `./workflows` (e.g., `myWorkflow.ts`). | `ls ./workflows/*.ts`. |
| **Dependencies** | All `package.json` deps installed. | `npm ci` must succeed with a clean `node_modules`. |
| **Ricky config** | `ricky.yaml` present at repo root with `execution.localPath: "./workflows"`. | `cat ricky.yaml | grep execution.localPath`. |

If any contract item fails, **report a blocker** and abort execution.

---

## 2. Deterministic Verification Gates

| Gate | Command | Expected Outcome |
|------|---------|------------------|
| **G‑01: Lint*
...[truncated]
```

**Must**

- Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
- Precheck that Node and `@agent-relay/sdk` are resolvable for the workflow.
- Record the actual spawn command in execution evidence.

**Must Not**

- Fail solely because agent-relay is not on PATH when the SDK route is available.
- Hide the actual runtime command from evidence.
- Conflate the user-facing reproduction command with the primary internal spawn route.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>
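As a rough illustration of the precheck and evidence rules in this case, the sketch below plans a local run without ever consulting the `agent-relay` binary on PATH. `planLocalRun`, `SpawnEvidence`, and the injected `Resolver` are hypothetical names. A real resolver could be built on Node's `createRequire(...).resolve`, but that wiring is left out here and is an assumption.

```typescript
// Hypothetical evidence shape; the real execution-evidence schema is not shown in this thread.
interface SpawnEvidence {
  route: "node-strip-types";
  command: string;
  args: string[];
}

// Returns true when the module specifier can be resolved for the workflow.
type Resolver = (specifier: string) => boolean;

// Prechecks the SDK, then returns the exact command that would be spawned so
// it can be recorded verbatim in evidence.
function planLocalRun(workflowPath: string, isResolvable: Resolver): SpawnEvidence {
  const missing = ["@agent-relay/sdk"].filter((specifier) => !isResolvable(specifier));
  if (missing.length > 0) {
    throw new Error(`precheck failed: cannot resolve ${missing.join(", ")}`);
  }
  return {
    route: "node-strip-types",
    command: "node",
    args: ["--experimental-strip-types", workflowPath],
  };
}
```

Keeping the planned command as data, rather than building it inline at spawn time, makes it straightforward to log the same value that was actually executed.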

**REVIEW** `runtime-recovery.escalation-is-not-generic-failure` (runtime-recovery/manual)

**User message**

> Ricky reaches a boundary after a structural failure persists after a fix attempt.

**Ricky output**

**Boundary‑Reached Protocol – Persistent Structural Failure**

When a structural failure remains after an attempted fix, the workflow must be halted, documented, and escalated in a deterministic, auditable way. Follow the contract below to keep the AgentWorkforce (AW) repository clean, reproducible, and ready for a coordinated recovery.

---

## 1. Immediate Blocker Declaration
1. **Create a blocker issue** in the `#workflow‑blockers` project board (or the repository’s `issues` with label `blocker`).  
   - **Title:** `BLOCKER – Persistent structural failure in <component‑name>`  
   - **Body template:**
     ```markdown
     ## Summary
     Structural failure persists after fix attempt #<PR‑id>.

     ## Symptoms
     - Symptom 1
     - Symptom 2
     - ...

     ## Attempted Fix
     - PR/branch: <link>
     - Changes: <short description>
     - Verification results: <failed gate name> (details)

     ## Impact
     - Downstream workflows affected: <list>
     - SLA breach risk: <yes/no>

     ## Next Steps (proposed)
     - Open a dedicated recovery branch `recovery/<component>-<date>`
     - Assign a lead (see §2)
     - Schedule a triage meeting (within 4 h)

     ## Attachmen
...[truncated]
     ```

**Must**

- Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
- Distinguish escalation from a generic product failure.
- Preserve enough context for a human operator to continue.

**Must Not**

- Retry speculative fixes indefinitely.
- Collapse to "something went wrong" without the attempted actions.
- Discard evidence from failed repair attempts.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

**REVIEW** `runtime-recovery.analytics-from-structured-evidence` (runtime-recovery/manual)

**User message**

> Produce a workflow health digest from many Ricky workflow runs.

**Ricky output**

**Ricky Workflow Health Digest – Template & Usage Guide**  

Below is a ready‑to‑use, deterministic digest that aggregates the health signals from any number of Ricky workflow runs. Plug‑in the data from your CI/CD system (GitHub Actions, GitLab CI, Jenkins, etc.) and the digest will produce a concise, actionable report that respects the Ricky repository conventions (AGENTS.md, workflow standards, shared authoring rules).

---

## 1️⃣ Digest Contract (Input / Output)

| Element | Type | Description | Required |
|---------|------|-------------|----------|
| `run_id` | string | Unique identifier of the workflow run (e.g., GitHub run number). | ✅ |
| `branch` | string | Git ref the run executed on (e.g., `main`, `feature/x`). | ✅ |
| `started_at` | ISO‑8601 timestamp | When the run started. | ✅ |
| `ended_at` | ISO‑8601 timestamp | When the run finished. | ✅ |
| `status` | enum(`success`,`failure`,`cancelled`,`skipped`) | Final outcome. | ✅ |
| `stage_results` | array of objects | Per‑stage health (see **StageResult** below). | ✅ |
| `blockers` | array of strings | Human‑readable blocker messages (empty if none). | ✅ |
| `review_artifacts` | array of URLs | Links to PR reviews, covera
...[truncated]

**Must**

- Consume normalized `WorkflowRunEvidence` rather than raw logs as the primary input.
- Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
- Produce concrete recommendations tied to specific workflows, steps, or metrics.

**Must Not**

- Mutate evidence while analyzing it.
- Return generic advice like "improve your workflow" without references.
- Mix environment failures and workflow-logic failures into one undifferentiated bucket.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
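To make the expectations for this case concrete, here is a small sketch that groups failures by origin and class without mutating its input. The `WorkflowRunEvidence` shape below is a simplified assumption (the real type is not shown in this thread), and `findRecurringFailures` is a hypothetical helper.

```typescript
// Simplified, assumed evidence shape for illustration.
interface WorkflowRunEvidence {
  workflow: string;
  durationMs: number;
  failure?: { class: string; origin: "environment" | "workflow-logic" };
}

interface RecurringFailure {
  origin: "environment" | "workflow-logic";
  failureClass: string;
  count: number;
  workflows: string[]; // which workflows hit this failure class
}

// Groups failures by (origin, class) so environment problems and
// workflow-logic problems never share a bucket. Reads `runs` only.
function findRecurringFailures(
  runs: readonly WorkflowRunEvidence[],
  minCount = 2,
): RecurringFailure[] {
  const groups = new Map<string, RecurringFailure>();
  for (const run of runs) {
    if (!run.failure) continue;
    const key = `${run.failure.origin}:${run.failure.class}`;
    const group: RecurringFailure = groups.get(key) ?? {
      origin: run.failure.origin,
      failureClass: run.failure.class,
      count: 0,
      workflows: [],
    };
    group.count += 1;
    if (!group.workflows.includes(run.workflow)) group.workflows.push(run.workflow);
    groups.set(key, group);
  }
  return [...groups.values()].filter((group) => group.count >= minCount);
}
```

Each returned entry names the affected workflows, which gives a digest something specific to reference instead of generic advice.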

**REVIEW** `runtime-recovery.preserve-pr-shipping-during-repair` (runtime-recovery/manual)

**User message**

> A persona-authored workflow that imports `@agent-relay/github-primitive` and uses `createGitHubStep` to open a PR fails its runtime-precheck. The auto-fix repair persona is invoked. Describe the contract the repaired artifact must satisfy.

**Ricky output**

**Repaired Workflow Contract**

The repaired artifact must be a **single, self‑contained workflow file** that satisfies the **runtime‑precheck** and the **deterministic verification pipeline** defined in the AgentWorkforce standards. Below is the contract broken into concrete sections that the auto‑fix persona must guarantee before the PR can be merged.

| # | Contract Element | Required State | How to Verify |
|---|------------------|----------------|---------------|
| 1 | **File Header & Metadata** | • `name: <workflow‑name>` <br>• `description: <short‑sentence>` <br>• `version: semver (≥ 1.0.0)` <br>• `author: <persona‑id>` | Lint rule `wf‑metadata‑header` checks presence and format. |
| 2 | **Import Statement** | Must import the **exact** package name and version that the runtime expects: <br>```ts\nimport { createGitHubStep } from '@agent‑relay/github‑primitive@^2.3.0';\n``` | The pre‑check parser resolves the import via the workspace lockfile; any mismatch fails `import‑resolution‑gate`. |
| 3 | **Step Factory Usage** | The workflow must call `createGitHubStep` **once** (or more, but each call must be wrapped in a `GitHubStep` object) and assign the result to a variable that 
...[truncated]

**Must**

- Preserve the `@agent-relay/github-primitive` import, the `GitHubStepExecutor` reference, and every `createGitHubStep(...)` invocation that the original artifact declared.
- Keep the failing workflow runnable from the same path with the same `workflow(...)` builder and `.run({ cwd: process.cwd() })` invocation.
- Retain at least ceil(N / 2) of the original workflow's `.step(...)` calls when the original declared four or more steps; a repair that collapses to a 2-3 step placeholder is a regression, not a fix.
- Reject (do not apply) any repair output whose step list reduces to prepare-context / runtime-precheck: true / final-signoff: echo placeholder; surface the regression diagnostic instead.

**Must Not**

- Strip `createGitHubStep`, `GitHubStepExecutor`, or `@agent-relay/github-primitive` because the runtime-precheck failure mentioned PR-shipping or git side effects. The repair contract's "no commit / no push" constraint applies to the REPAIR AGENT's runtime behavior, not to the workflow's step declarations.
- Emit a "minimal repair-safe master" or "simplified Ricky master" scaffold that passes the builder validator while doing none of the original work.
- Treat "the workflow now builds and runs" as success when the work it was supposed to ship is gone.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
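The retention rules in this case lend themselves to a mechanical check. The sketch below is a text-based heuristic under stated assumptions: `checkRepair` and `RepairVerdict` are hypothetical names, and counting substrings would also count occurrences inside comments or strings, so a real guard would likely need to parse the artifact.

```typescript
interface RepairVerdict {
  accepted: boolean;
  reasons: string[];
}

// Counts non-overlapping occurrences of `needle` in `text`.
function countOccurrences(text: string, needle: string): number {
  return text.split(needle).length - 1;
}

// Counts `.step(` calls; `createGitHubStep(` does not match this pattern.
function countSteps(source: string): number {
  return (source.match(/\.step\s*\(/g) ?? []).length;
}

const PRESERVED_MARKERS = [
  "@agent-relay/github-primitive",
  "GitHubStepExecutor",
  "createGitHubStep(",
];

// Rejects a repair that drops PR-shipping markers or collapses the step list.
function checkRepair(original: string, repaired: string): RepairVerdict {
  const reasons: string[] = [];
  for (const marker of PRESERVED_MARKERS) {
    const before = countOccurrences(original, marker);
    const after = countOccurrences(repaired, marker);
    if (after < before) reasons.push(`${marker}: ${before} -> ${after}`);
  }
  const originalSteps = countSteps(original);
  const repairedSteps = countSteps(repaired);
  if (originalSteps >= 4 && repairedSteps < Math.ceil(originalSteps / 2)) {
    reasons.push(`.step(...) calls: ${originalSteps} -> ${repairedSteps}`);
  }
  return { accepted: reasons.length === 0, reasons };
}
```

Returning the reasons alongside the verdict lets the caller surface a regression diagnostic rather than silently discarding the repair.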

**REVIEW** `surfaces-ingress.co-equal-surfaces` (surfaces-ingress/manual)

**User message**

> Design a new Ricky Slack surface for workflow debugging.

**Ricky output**

**Ricky Slack Surface – “Workflow Debugger”**

---

## 1. Overview
The **Workflow Debugger** Slack surface gives engineers a deterministic, low‑latency way to inspect, step‑through, and mutate running Ricky workflows directly from a Slack channel or DM. It is built as a **Slack App** that talks to the existing Ricky API gateway and uses the **Workflow Execution Service (WES)** for introspection.

| Component | Responsibility |
|-----------|-----------------|
| **Slack Bot** (`ricky-debugger`) | Slash‑command handling, interactive blocks, modal dialogs, and result posting. |
| **Ricky API Extension** (`/debug/*`) | Auth‑checked endpoints for: <br>• `GET /debug/{runId}` – snapshot of current state <br>• `POST /debug/{runId}/step` – advance one step <br>• `POST /debug/{runId}/set` – inject variable/value <br>• `POST /debug/{runId}/abort` – terminate run |
| **Verification Service** | Executes deterministic “gate” functions before any state‑changing debug action (e.g., “step” only allowed if run is *paused*). |
| **Audit Log** | Immutable record of every debug command, user, timestamp, and before/after state diff. |
| **Permission Matrix** | Scoped to `ricky-debugger` r

---

_Comment truncated to stay within GitHub limits. Download the `ricky-eval-run` artifact for the full `human-review.md`._

- power-user-parser: don't consume a following flag as the --input value. `--input --run` previously swallowed `--run`, silently dropping it. Now only advance the index when the next token is a real value; report an error otherwise. Empty pair ("") is also rejected.
- entrypoint idle watchdog: record the abort reason on stderr + events inside the watchdog callback (before abort()), so it survives the runner promise rejecting and surfaces as the real failure cause. The catch block now classifies an idle abort as 'timed_out' and avoids double-logging.
- entrypoint: clear the idle interval in a finally block so a synchronous throw from this.runner() (before withTimeout is reached) can't leak it.
- resolveIdleTimeoutMs: require a non-negative integer. A fractional value like 0.5 previously floored to 0 and silently disabled the watchdog; now it falls back to the default.
- cli-main: document --input KEY=VALUE in renderHelp().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
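The parser and timeout behavior described in this commit can be sketched as follows. This is not the code from `power-user-parser.ts` or `entrypoint.ts`, whose internals are not shown here: `parseInputFlags`, its return shape, and the env-key pattern are assumptions. So is the rule that any token starting with `-` counts as a flag, not a value.

```typescript
const DEFAULT_RUN_IDLE_TIMEOUT_MS = 30 * 60 * 1000; // 30 min, per the PR description

interface ParsedInputs {
  inputs: Record<string, string>;
  errors: string[];
  rest: string[]; // arguments that are not part of an --input pair
}

const ENV_KEY = /^[A-Za-z_][A-Za-z0-9_]*$/; // assumed key rule

function parseInputFlags(argv: readonly string[]): ParsedInputs {
  const out: ParsedInputs = { inputs: {}, errors: [], rest: [] };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    let pair: string;
    if (arg.startsWith("--input=")) {
      pair = arg.slice("--input=".length);
    } else if (arg === "--input") {
      const next = argv[i + 1];
      // Only consume the next token when it is a value, not another flag.
      if (next === undefined || next.startsWith("-")) {
        out.errors.push("--input requires KEY=VALUE");
        continue;
      }
      pair = next;
      i++;
    } else {
      out.rest.push(arg);
      continue;
    }
    const eq = pair.indexOf("=");
    if (eq <= 0) {
      out.errors.push(`malformed --input pair: "${pair}"`);
      continue;
    }
    const key = pair.slice(0, eq);
    if (!ENV_KEY.test(key)) {
      out.errors.push(`invalid --input key: "${key}"`);
      continue;
    }
    out.inputs[key] = pair.slice(eq + 1); // an empty value is allowed
  }
  return out;
}

// Accepts only non-negative integers; anything else falls back to the default.
function resolveIdleTimeoutMs(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") return DEFAULT_RUN_IDLE_TIMEOUT_MS;
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed >= 0 ? parsed : DEFAULT_RUN_IDLE_TIMEOUT_MS;
}
```

With this shape, `--input --run` produces one error and leaves `--run` in `rest`, and a value of `0.5` for the idle timeout resolves to the default instead of disabling the watchdog.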

gemini-code-assist Bot reviewed May 30, 2026

pullfrog Bot reviewed May 30, 2026

coderabbitai Bot reviewed May 30, 2026

Comment thread src/surfaces/cli/commands/cli-main.ts

cubic-dev-ai Bot reviewed May 30, 2026

Comment thread src/local/entrypoint.ts Outdated

Comment thread src/surfaces/cli/flows/power-user-parser.ts Outdated

Comment thread src/local/entrypoint.ts Outdated

kjgbot merged commit fb53704 into main May 30, 2026
4 checks passed

kjgbot deleted the fix/inject-input-env-and-idle-watchdog branch May 30, 2026 09:00
