fix(auto-fix): refuse full workflow restart when expensive work completed#99

Merged
khaliqgant merged 4 commits into
mainfrom
fix/router-loop-skips-impl-rerun
May 12, 2026

@khaliqgant
Member

Summary

  • Fixes a silent restart-from-scratch bug in Ricky's auto-fix loop that wasted ~1.5–2 hours per cycle in long multi-milestone workflows
  • When Ricky couldn't parse a concrete failedStep from evidence, it omitted --start-from. The SDK runner treats --previous-run-id without --start-from as a brand-new run, so it re-spawned every impl-* track
  • New behavior: refuse to restart from scratch if any non-synthetic step already completed; escalate with a clear refusal message; preserve restart-from-scratch for synthetic-stage failures (runtime-launch, local-runtime, runtime-precheck) where no expensive work has happened
  • Escape hatch: RICKY_AUTO_FIX_ALLOW_FULL_RESTART=1 opts back into old behavior for small workflows where a full restart is genuinely cheap
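
The policy in the bullets above can be sketched as a small pure function. This is a minimal illustration, not the code in auto-fix-loop.ts: the `StepEvidence` and `RunEvidence` shapes, the explicit `env` parameter, the exact-match check on `"1"`, and the body of `isSyntheticStageId` are assumptions. Only the helper names, the three synthetic stage ids, and the environment variable name come from this PR.

```typescript
// Illustrative sketch of the restart-refusal policy; types are simplified assumptions.
type StepStatus = 'passed' | 'failed' | 'skipped';

interface StepEvidence {
  stepId: string;
  status: StepStatus;
}

interface RunEvidence {
  steps: StepEvidence[];
}

// Synthetic stages named in the PR description; no expensive work happens in them.
const SYNTHETIC_STAGE_IDS = new Set(['runtime-launch', 'local-runtime', 'runtime-precheck']);

function isSyntheticStageId(stepId: string): boolean {
  return SYNTHETIC_STAGE_IDS.has(stepId);
}

// Assumption: the escape hatch is an exact match on "1".
function fullRestartExplicitlyAllowed(env: Record<string, string | undefined>): boolean {
  return env['RICKY_AUTO_FIX_ALLOW_FULL_RESTART'] === '1';
}

function workflowDidExpensiveWork(evidence: RunEvidence): boolean {
  return evidence.steps.some(
    (step) => step.status === 'passed' && !isSyntheticStageId(step.stepId),
  );
}

function safeToRetryWithoutStartFromStep(
  failedStep: string | undefined,
  evidence: RunEvidence,
  env: Record<string, string | undefined>,
): boolean {
  if (failedStep) return true; // a step-level resume anchor exists
  if (fullRestartExplicitlyAllowed(env)) return true; // operator opted back in
  return !workflowDidExpensiveWork(evidence); // safe only if nothing real completed
}
```

In this sketch a caller would pass `process.env` as `env`. Taking the environment as a parameter keeps the function easy to test without mutating global state; the real helper may read the variable directly.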

Why this happened

Both the M2 and M3 chain runs (proactive-runtime, 7 repos, ~$1k+ of Claude/Codex compute per cycle) hit this. signoff-r1 would emit SIGNOFF: COMPLETE with a full multi-page audit, then a downstream fix-r2/signoff-final/commit-push-* step would fail in a way that left evidence.failed_step empty, and Ricky's three retry paths in auto-fix-loop.ts (lines 357, 382, 470) all silently dropped startFromStep. The next attempt got a fresh runId and ran the entire DAG from the top. M2 was killed manually after its third loop, about 5 hours in.

What changed

  • src/local/auto-fix-loop.ts — added workflowDidExpensiveWork, safeToRetryWithoutStartFromStep, fullRestartExplicitlyAllowed, FULL_RESTART_REFUSAL_REASON; wired into all three retry sites
  • src/local/auto-fix-loop.test.ts — added expensiveWorkBlockerResponse helper and two regression tests covering both the refusal path and the env-var bypass

Test plan

  • npx vitest run src/local/auto-fix-loop.test.ts — 32/32 pass (was 30/30)
  • npx tsc --noEmit clean
  • Full suite npx vitest run — 1077/1079 (same 2 pre-existing flaky integration tests fail on main)
  • Reviewer: confirm synthetic-stage retry path (runtime-launch etc.) still restarts correctly — this is preserved on purpose

Operational note

The next time a long workflow hits this failure mode with this branch installed, Ricky surfaces a clear refusal message pointing the operator at:

ricky run <workflow.ts> --start-from <step> --previous-run-id <id>

Cached step outputs live at .agent-relay/step-outputs/<runId>/, so manual resume-from-any-step works.
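
As a rough sketch of the manual resume described above, the command line and the cache location can be assembled like this. `buildResumeArgs`, `stepOutputDir`, and the `ResumeTarget` shape are hypothetical names for illustration; only the flags and the `.agent-relay/step-outputs/<runId>/` layout come from this PR.

```typescript
// Hypothetical helpers: assemble the manual resume command and locate cached outputs.
interface ResumeTarget {
  workflowPath: string;
  previousRunId: string;
  startFromStep?: string;
}

function buildResumeArgs(target: ResumeTarget): string[] {
  if (!target.startFromStep) {
    // Without --start-from the runner treats the invocation as a brand-new run,
    // so fail loudly instead of building a command that restarts from the top.
    throw new Error('startFromStep is required to resume a previous run');
  }
  return [
    'run',
    target.workflowPath,
    '--start-from',
    target.startFromStep,
    '--previous-run-id',
    target.previousRunId,
  ];
}

// Directory where cached step outputs for a run are expected to live.
function stepOutputDir(runId: string): string {
  return `.agent-relay/step-outputs/${runId}/`;
}
```

The design point is that the two flags only make sense together: a helper that refuses to emit `--previous-run-id` alone cannot reproduce the silent restart.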

🤖 Generated with Claude Code

khaliqgant and others added 2 commits May 11, 2026 13:51
… never fires)

The NODE_OPTIONS register script monkeypatches child_process.spawn so that
agent-relay-broker children have their stdout drained — preventing the
broker from blocking on write() once the OS pipe buffer fills.

The previous wiring attached a 'data' listener inside a callback that only
ran when child.stdout emitted 'pause'. Node Readable streams never emit
'pause' on internal buffer fill — that event only fires when something
explicitly calls .pause() (which nothing in this code path ever does),
so the stream stayed in paused mode, libuv stopped draining the kernel
pipe at high-water mark, and a chatty broker would block in write() once
~64KB of stdout queued up.

Symptom: overnight proactive-runtime runs (Ricky-driven, M1 fans out to
9 PTY workers) froze within seconds of fanout with every worker log
stuck at the same mtime, broker process parked in write() or
_pthread_cond_wait, M1's step.run awaiting a never-arriving drain
signal. Reproduced twice (~14h apart) with diagnostic bundles capturing
the same shape.

Changes
- Attach `data` listener and call `resume()` synchronously at spawn time
  for both `init` and `pty` broker invocations. This matches what SDK
  6.0.15's `drainBrokerStdoutAfterStartup` does for direct SDK consumers.
- Expand the argv guard from `argv[0]==="init"` to also include `"pty"`,
  so per-worker PTY brokers (M1's lead + impl-* fanout) are protected,
  not just the channel-multiplexer init broker.
- Update the explanatory comment block above `registerSource` to capture
  the new semantics and the prior bug.
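
The stream behavior behind this fix can be checked in isolation. The sketch below uses a `PassThrough` as a stand-in for `child.stdout`; `drainReadable` is a hypothetical helper, not the register script's actual code. It relies only on documented Node behavior: a Readable with no consumer reports `readableFlowing === null`, and attaching a `data` listener plus `resume()` switches it to flowing mode synchronously.

```typescript
import { PassThrough, Readable } from 'node:stream';

// Hypothetical helper: discard output so the producer never blocks on a full pipe.
function drainReadable(stream: Readable): void {
  stream.on('data', () => {
    // Intentionally empty: consuming chunks is what keeps the stream flowing.
  });
  stream.resume();
}

// Stand-ins for child.stdout of two spawned brokers.
const undrained = new PassThrough();
const drained = new PassThrough();

// Attach the drain synchronously, as the fix does at spawn time.
drainReadable(drained);

const flowingWithoutConsumer = undrained.readableFlowing; // null: nothing is reading
const flowingWithDrain = drained.readableFlowing; // true: data is consumed as it arrives

console.log({ flowingWithoutConsumer, flowingWithDrain });
```

A stream that stays at `null` buffers until its high-water mark and then stops reading from the underlying pipe, which is the condition the commit message describes.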

Verification
- npm run typecheck — clean
- npm test — 1077 / 1077 pass, including the existing
  "drains broker stdout after SDK startup so event floods cannot wedge
  the workflow node" regression at entrypoint.test.ts:3122.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ve work completes

When Ricky's auto-fix loop could not parse a concrete `failedStep` from the
workflow's evidence, the retry path silently omitted `--start-from`. The
agent-relay SDK runner treats `--previous-run-id` without `--start-from` as
"no resume" (run.ts:91) and starts the workflow over from step 0.

For the proactive-runtime M2/M3 chain runs this meant: after `signoff-r1`
emitted "SIGNOFF: COMPLETE" the downstream gate or fixer step would occasionally
fail in a way that left no per-step failure record, and Ricky would respond by
re-spawning every `impl-*` track from scratch — burning ~1.5–2 hours per cycle
on duplicated work that the prior signoff already certified.

The new policy is: if the prior attempt completed at least one real
(non-synthetic) step and we have no step-level resume anchor, escalate to the
operator instead of restarting. Synthetic-stage failures
(runtime-launch / local-runtime / runtime-precheck) still retry as before
because no expensive work happened. Operators who want the old behavior can
set `RICKY_AUTO_FIX_ALLOW_FULL_RESTART=1`.

The refusal is wired into all three retry paths that previously allowed an
undefined `failedStep`:
  - workflow-repair retry (post-persona / post-deterministic repair)
  - workflow-repair-provider-failure retry
  - direct-repair retry

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 12, 2026

Review Change Stack

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: e53c5661-254b-43ff-8e3f-fbbc9abfa949

📥 Commits

Reviewing files that changed from the base of the PR and between de29355 and a7485eb.

📒 Files selected for processing (5)
  • src/local/auto-fix-loop.test.ts
  • src/local/auto-fix-loop.ts
  • src/local/entrypoint.test.ts
  • src/local/entrypoint.ts
  • src/shared/models/workflow-evidence.ts

📝 Walkthrough

This PR adds a safety gate preventing automatic full-restart retries when prior workflow attempts have already completed real steps but lack a step-level resume anchor, enriches evidence with parsed per-step data and completedStepCount, integrates the gate into multiple runWithAutoFix retry decision points, and adds tests and fixtures validating escalation and the env override.

Changes

Auto-fix retry safety and evidence parsing

| Layer / File(s) | Summary |
|-----------------|---------|
| Retry-safety helpers and imports<br>src/local/auto-fix-loop.ts | New helper functions detect whether full restart is explicitly allowed via RICKY_AUTO_FIX_ALLOW_FULL_RESTART, determine if evidence shows completed real (non-synthetic) workflow steps, and compute whether retry without startFromStep is safe when failedStep is missing. A shared FULL_RESTART_REFUSAL_REASON is introduced for aligned warnings/escalations. |
| runWithAutoFix safety gate integration<br>src/local/auto-fix-loop.ts | Safety gate checks are integrated into multiple repair-retry paths within runWithAutoFix: after code-drift repair, after Workforce persona workflow repair, when the workflow repair provider fails, and before retrying after direct repair failure. When unsafe, the run escalates with fix_error, attemptSummary.warning, pushed warnings, nextActions, and escalation options instead of scheduling a root restart. |
| Evidence enrichment and conversion<br>src/local/auto-fix-loop.ts, src/shared/models/workflow-evidence.ts | localResponseToWorkflowRunEvidence now prefers structured workflow_steps when present, adds completedStepCount from execution.steps_completed, and introduces workflowStepsFromExecutionEvidence to convert structured step snapshots into the shared WorkflowStepEvidence[]. The WorkflowRunEvidence interface gains optional completedStepCount?: number. |
| Runtime parsing and entrypoint updates<br>src/local/entrypoint.ts | Parse coordinator runtime stdout/stderr into LocalWorkflowStepSnapshot entries via workflowStepsFromRuntimeOutput(), filter synthetic stages, derive steps_completed/steps_total, and populate evidence.workflow_steps when available; fallback behavior preserved for runs without parsed steps. |
| Tests and fixtures<br>src/local/auto-fix-loop.test.ts, src/local/entrypoint.test.ts | Add expensiveWorkBlockerResponse fixture and multiple runWithAutoFix Vitest cases covering missing failed_step with many completed steps (escalation, truncated tail handling, suppression of missing-previous-run-id warning, code-drift restart refusal while still applying repair, and RICKY_AUTO_FIX_ALLOW_FULL_RESTART override). Add runLocal regression test asserting parsed per-step evidence and step counts. Update fs/promises imports used by tests. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • AgentWorkforce/ricky#69: Touches runWithAutoFix and code-drift repair/retry behavior that interacts with the restart-refusal gate.
  • AgentWorkforce/ricky#58: Modifies auto-fix retry/resume logic and tests; overlaps with startFromStep/previous-run resume behavior.
  • AgentWorkforce/ricky#82: Related edits to retry/evidence handling and filtering of synthetic stage IDs impacting startFromStep selection.

Poem

A rabbit counts the steps beneath the logs,
If anchors hide, it chooses careful logs—
It rings an alarm instead of breaking ground,
And watches streams flow steady, without a sound. 🐰✨

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (3 warnings)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Linked Issues check | ⚠️ Warning | PR #39 objective to repair git diff manifest gates is not addressed in the changeset; only auto-fix loop and entrypoint stream handling changes are present. | Either address the git diff manifest gate repair requirements from #39 or clarify why they were descoped from this PR. |
| Out of Scope Changes check | ⚠️ Warning | entrypoint.ts changes to fix streaming deadlock are out of scope relative to the linked issue #39 objectives focused on git diff manifest gates. | Verify whether entrypoint.ts stream handling fix is intentionally related to #39 or should be addressed separately. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | Title accurately describes the main change: refusing full workflow restart when expensive work is completed, directly addressing the core bug fix. |
| Description check | ✅ Passed | Description provides detailed context about the bug, its impact, the fix, and test coverage; all directly related to the changeset. |


Resolves conflict in src/local/entrypoint.ts (broker stdout drain):
took main's superset implementation that handles both the immediate
flow-mode kick AND the SDK readline-close pause path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/local/auto-fix-loop.ts`:
- Around line 424-443: The code emits a "retrying..." missing-run-id warning
even when you immediately refuse a full restart; update the logic so the
missing-run-id warning is only added when
safeToRetryWithoutStartFromStep(failedStep, evidence) returns true (or move the
missing-run-id warning emission to after that guard). Specifically, adjust where
you push the "Auto-fix retry could not resolve a previous run id; retrying
without step-level resume." into warnings (or set attemptSummary) so it is
conditional on safeToRetryWithoutStartFromStep(failedStep, evidence) before
calling withAutoFix(...) and attachEscalationOptions(...); ensure
FULL_RESTART_REFUSAL_REASON, attemptSummary.fix_error, withAutoFix,
attachEscalationOptions, and debuggerResult.summary/recommendation.steps remain
unchanged otherwise.
- Around line 137-140: workflowDidExpensiveWork currently inspects only
evidence.steps (function workflowDidExpensiveWork, type WorkflowRunEvidence)
which can be truncated when execution.evidence.logs.tail is chatty and causes
false negatives; update the check to use an authoritative source before allowing
a retry: either require and read a persisted full log or an explicit
completed-steps signal carried on the response evidence (e.g., add/read
evidence.completedStepCount or evidence.fullLogs and use that to determine if
any real non-synthetic step completed), and ensure the retry path that looks at
failedStep / startFromStep consults that authoritative field (or rehydrate full
logs from the log store) instead of relying solely on
execution.evidence.logs.tail and evidence.steps.
- Around line 149-155: The code-drift (drift-report) retry path currently
rebuilds a root restart request when failedStep is missing without consulting
safeToRetryWithoutStartFromStep; update the drift-report/code-drift retry branch
to call safeToRetryWithoutStartFromStep(failedStep, evidence) and only perform
the expensive root-restart rebuild if that returns true (otherwise skip the
rebuild/abort the retry or return a non-restart outcome); use the same gating
logic as the workforce/direct/provider-failure flows so that drift retries are
blocked when workflowDidExpensiveWork(evidence) or
fullRestartExplicitlyAllowed() indicate it’s unsafe.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: cd4cddbd-6605-4642-8fe0-7c11463f500f

📥 Commits

Reviewing files that changed from the base of the PR and between 0c28ce6 and de29355.

📒 Files selected for processing (3)
  • src/local/auto-fix-loop.test.ts
  • src/local/auto-fix-loop.ts
  • src/local/entrypoint.ts

Comment on lines +137 to +140
function workflowDidExpensiveWork(evidence: WorkflowRunEvidence): boolean {
  return evidence.steps.some(
    (step) => step.status === 'passed' && !isSyntheticStageId(step.stepId),
  );

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

This expensive-work detector still has a long-run false negative.

workflowDidExpensiveWork() only looks at evidence.steps, but later in this file those steps are reconstructed from execution.evidence.logs.tail. On chatty workflows, the tail can omit every earlier ✓ ... — completed line while still failing to expose a concrete failedStep, which makes this return false and allows the full restart anyway. To make the refusal reliable, carry an authoritative “real steps completed” signal/count into the response evidence or rehydrate from the persisted full logs before allowing a no-startFromStep retry.
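
One way to close this gap, sketched under stated assumptions: consult an authoritative completed-step count before falling back to the parsed step list. The `completedStepCount` field name matches the one added to `WorkflowRunEvidence` in this PR, but the simplified types, the assumption that the count already excludes synthetic stages, and the exact way the merged helper uses it are illustrative guesses.

```typescript
// Simplified evidence shape; only the fields this sketch needs.
interface StepEvidence {
  stepId: string;
  status: 'passed' | 'failed' | 'skipped';
}

interface RunEvidence {
  steps: StepEvidence[]; // may be rebuilt from a truncated log tail
  completedStepCount?: number; // assumed to count real (non-synthetic) completed steps
}

const SYNTHETIC_STAGE_IDS = new Set(['runtime-launch', 'local-runtime', 'runtime-precheck']);

function workflowDidExpensiveWork(evidence: RunEvidence): boolean {
  // Prefer the count: it survives even when the tail dropped every completion line.
  if ((evidence.completedStepCount ?? 0) > 0) return true;
  // Fall back to whatever per-step records were recovered.
  return evidence.steps.some(
    (step) => step.status === 'passed' && !SYNTHETIC_STAGE_IDS.has(step.stepId),
  );
}
```

With this shape, a chatty run whose tail yields an empty `steps` array still counts as having done expensive work as long as the count was carried through.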

Comment on lines +149 to +155
function safeToRetryWithoutStartFromStep(
  failedStep: string | undefined,
  evidence: WorkflowRunEvidence,
): boolean {
  if (failedStep) return true;
  if (fullRestartExplicitlyAllowed()) return true;
  return !workflowDidExpensiveWork(evidence);

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Apply this gate to the code-drift retry path too.

safeToRetryWithoutStartFromStep() is only consulted in the workforce/direct/provider-failure flows. The earlier code-drift branch still rebuilds a root restart request when failedStep is missing, so a drift-report workflow can still re-run every completed real step — the same expensive-restart case this PR is trying to block.

Comment on lines +424 to +443
if (!safeToRetryWithoutStartFromStep(failedStep, evidence)) {
  attemptSummary.fix_error = FULL_RESTART_REFUSAL_REASON;
  warnings.push(FULL_RESTART_REFUSAL_REASON);
  const escalated = withAutoFix(response, maxAttempts, attempts, attemptSummary.status, warnings, trackingRunId);
  escalated.nextActions = [
    ...escalated.nextActions,
    FULL_RESTART_REFUSAL_REASON,
    debuggerResult.summary,
    ...debuggerResult.recommendation.steps.map((step) => step.description),
  ];
  attachEscalationOptions(escalated, {
    request: currentRequest,
    response,
    debuggerResult,
    reason: FULL_RESTART_REFUSAL_REASON,
    trackingRunId,
    artifactPath: repairedArtifactPath,
  });
  return escalated;
}

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid emitting a “retrying…” warning on the escalation path.

If runId is missing, the block just above this one already appends "Auto-fix retry could not resolve a previous run id; retrying without step-level resume." When this guard then refuses the full restart, the response contains contradictory guidance: Ricky says it is retrying and refusing to retry in the same result. Move the missing-run-id warning behind this check, or only emit it when safeToRetryWithoutStartFromStep() is true.
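
The ordering this comment asks for can be sketched as follows. `decideRetry` and `RetryDecision` are hypothetical names, and the refusal text is a placeholder because the real `FULL_RESTART_REFUSAL_REASON` wording is not shown in this thread; only the missing-run-id message is quoted from the review.

```typescript
const MISSING_RUN_ID_WARNING =
  'Auto-fix retry could not resolve a previous run id; retrying without step-level resume.';

// Placeholder wording: the actual constant's text is not shown in this thread.
const FULL_RESTART_REFUSAL_REASON = 'Refusing full workflow restart (placeholder wording).';

interface RetryDecision {
  action: 'retry' | 'escalate';
  warnings: string[];
}

function decideRetry(runId: string | undefined, safeToRetry: boolean): RetryDecision {
  // Check the gate first so an escalation never carries a "retrying" message.
  if (!safeToRetry) {
    return { action: 'escalate', warnings: [FULL_RESTART_REFUSAL_REASON] };
  }
  // Only a retry that will actually happen may warn about the missing run id.
  const warnings: string[] = runId ? [] : [MISSING_RUN_ID_WARNING];
  return { action: 'retry', warnings };
}
```

Evaluating the gate before emitting any retry-related warning means each result carries exactly one kind of guidance.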

Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.

@github-actions
Contributor

github-actions Bot commented May 12, 2026

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-12T09-04-25-000Z-HEAD-provider
Mode: provider
Git SHA: a45fe62

Passed: 8 | Needs human: 43 | Reviewable: 43 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Summary**  
We will extend the Ricky repository so that the documentation and the codebase explicitly state that Ricky now leverages the **Agent Assistant** (AA) for deeper orchestration, validation, and self‑repair. The change is limited to documentation, a small wrapper around the AA client, and updated CI checks that enforce the new contract.

---

## 1. Scope & Branch Strategy
| Item | Detail |
|------|--------|
| **Branch** | `feature/aa‑integration‑ricky` |
| **PR Title** | `feat(ricky): integrate Agent Assistant for deeper orchestration` |
| **PR Description** | Adds AA client wrapper, updates `README.md`, `AGENTS.md`, and workflow contracts to reflect the deeper AA usage. Includes new deterministic verification gate `aa‑validation` and expands the validation loop target to 80‑100 cycles. |
| **Reviewers** | `@team‑mlops`, `@doc‑lead`, `@ci‑engineer` |
| **Blocking Issues** | None – all dependent packages (`agent‑assistant>=2.3.0`) are already in `requirements.txt`. |

---

## 2. Code Changes

| File | Change |
|------|--------|
| `ricky/__init__.py` | Add import of `AgentAssistantClient` and expose `aa_client` singleton. |
| `ricky/aa_wrapper.py` | New module: thin wrappe
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review**  
*(as of the latest CI run on 2026‑04‑28)*  

| Metric | Current Value | Target / Trend | Status |
|--------|---------------|----------------|--------|
| **Workflow contracts defined** | 12 of 15 active agents have a `turn‑context` contract in `workflow.yaml` | 100 % | ⚠️ 3 agents missing |
| **Deterministic verification gates** | 9 agents have `verify-turn-context` gate enabled | 12 agents | ⚠️ 3 gaps |
| **Review artifacts (turn‑context diff)** | 78 % of PRs include a `turn‑context` diff report | ≥ 90 % | ⚠️ |
| **Validation loops (80‑100 per turn)** | Avg. 63 loops per turn (median 58) | ≥ 80 | ❌ |
| **Blocker reports** | 4 blockers logged in the last sprint (all related to missing context keys) | ≤ 1 per sprint | ❌ |
| **Branch/PR scope compliance** | 84 % of PRs keep `turn‑context` changes isolated to a single feature branch | ≥ 95 % | ⚠️ |
| **Documentation coverage** | 6 of 12 agents have up‑to‑date `TURN_CONTEXT.md` | 100 % | ⚠️ |
| **Test coverage (unit + integration)** | 71 % of `turn‑context` code paths exercised | ≥ 85 % | ❌ |

---

### Key Findings

1. **Partial Contract Adoption** – Three production agents (`@ag
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation: Move all four capabilities into the Agent Assistant**  

| Capability | Why it belongs in the Agent Assistant | Integration notes |
|------------|----------------------------------------|--------------------|
| **Workflow generation** | The Assistant is already the entry point for users who need a new workflow. Embedding generation here lets us apply the same contract‑first templating (JSON‑Schema + Jinja) that the rest of the platform uses, and it keeps the “create‑first” experience in a single UI. | - Expose a `generateWorkflow(request: GenerationSpec) → WorkflowDefinition` endpoint.<br>- Attach a deterministic verification gate that runs the generated definition through the **SchemaValidator** and **StaticAnalyzer** before returning it. |
| **Workflow validation** | Validation is the natural next step after generation or edit. Centralising it in the Assistant guarantees that every workflow a user sees has passed the same 80‑to‑100 validation loop (schema, lint, test‑case simulation). | - Provide `validateWorkflow(workflowId: string) → ValidationReport`.<br>- The report must include a **blocker list** (critical failures) and **evidence** (line numbers, test logs
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target: Core Customer‑Success (CS) teams, 3 months, 2 pilot squads*  

---

## 1. Slice Objective
Deliver a deterministic, review‑ready version of the **Ricky** AgentWorkforce assistant that can be adopted by CS squads for **workflow reliability** and **coordination** tasks. The slice must include:

| Deliverable | Success Metric |
|-------------|----------------|
| **Workflow contract** (YAML) for “Ticket‑Triage → Assignment → Follow‑up” | 100 % schema validation, 0 % runtime contract violations in pilot |
| **Deterministic verification gate** (GitHub Action) | Fails on any contract breach or test flake |
| **Review artifact** (Design Doc + Test Matrix) | Signed‑off by CS Lead & Platform Architecture |
| **80‑to‑100 validation loops** (automated + manual) | ≥ 90 % pass rate before PR merge |
| **Blocker reporting** (Slack‑bot + GitHub issue template) | All blockers logged within 15 min of detection |
| **Scoped branch/PR policy** (feature‑branch per workflow, protected main) | No direct pushes to `main`; PR must pass all gates |

---

## 2. Timeline & Milestones (3 months)

| Week | Milestone | Owner | Gate |
|------|--
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky – Future Support Interface (Slack / Web)

### 1. High‑level Vision
Provide a **self‑service, deterministic support layer** for Ricky agents that lives both in Slack (for on‑the‑fly assistance) and in a lightweight web console (for deeper diagnostics). The interface is built from **Agent Assistant Packages (AAPs)** that expose reusable contracts, verification gates, and review artifacts, keeping the support surface **deterministic, auditable, and version‑controlled**.

---

## 2. Architecture Overview

| Component | Responsibility | Tech Stack | AAP Contract |
|-----------|----------------|------------|--------------|
| **Slack Bot** | Receive user commands, render interactive blocks, forward to backend | Bolt‑JS, Slack Events API | `SlackCommandContract` |
| **Web Console** | Rich UI for logs, metrics, and manual trigger of repair flows | React + Vite, Tailwind, Auth0 | `WebUIContract` |
| **Ricky Core Service** | Executes workflow repairs, runs verification gates, stores artifacts | Node 18, Express, PostgreSQL, Redis | `WorkflowEngineContract` |
| **AAP Registry** | Publishes, versions, and validates AAPs | npm private registry, semantic‑release | `AAPPackageContract` |

...[truncated]

Must

  • Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
  • Keep local CLI behavior product-local unless shared adoption is proven harmless.
  • Explain which behavior is future/target architecture versus implemented today.

Must Not

  • Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
  • Let future surface abstractions distort the current CLI contract.
  • Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**Ricky CLI – What happened?**

You invoked the tool in *local* mode with a specification, but you didn’t include a command that tells Ricky to actually create or execute the workflow.  
The `--spec` flag only supplies the high‑level description; you still need to tell Ricky what to do with that spec.

**How to proceed**

| Goal | Command you need |
|------|------------------|
| **Just generate the workflow files** (no execution) | `ricky --mode local --spec "generate a workflow for package checks" --plan` |
| **Generate **and** run the workflow immediately** | `ricky --mode local --spec "generate a workflow for package checks" --run` |
| **Preview the plan without writing files** | `ricky --mode local --spec "generate a workflow for package checks" --dry-run` |

**Typical workflow creation pattern**

```bash
# 1️⃣ Define the spec (you already have it)
SPEC="generate a workflow for package checks"

# 2️⃣ Create the workflow files (plan phase)
ricky --mode local --spec "$SPEC" --plan

# 3️⃣ Review the generated files (they’ll appear under ./workflows/…)
git status   # see new/modified files
cat ./workflows/package-checks.yml   # quick glance

# 4️⃣ Run the workflow locally (optional
...[truncated]

Must

  • Say generation is the default and execution was not requested.
  • Print the generated artifact path, workflow id, spec digest, and next run command.
  • Avoid showing execution evidence for a generation-only request.

Must Not

  • Imply the workflow ran automatically.
  • Present a generation-only result as execution success.
  • Hide the opt-in commands for running the artifact.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.first-run-copy-is-compact-and-truthful (cli-behavior/manual)

User message

Render Ricky's first-run CLI onboarding for a new user.

Ricky output

**Ricky – First‑Run CLI Onboarding**

Below is the complete, copy‑and‑paste ready onboarding workflow that a new user should run the first time they invoke the `ricky` CLI. It follows the **Ricky repository conventions** (AGENTS.md), the **workflow standards** (deterministic gates, review artifacts, 80‑100 validation loops), and the **shared authoring rules** (scoped branches/PRs, honest blocker reporting).

---

## 1. Overview

| Phase | Goal | Success Gate |
|------|------|--------------|
| **Bootstrap** | Install dependencies, create a clean workspace | `ricky bootstrap --check` returns exit‑code 0 |
| **Config Init** | Generate a minimal `ricky.yaml` with user‑specific defaults | `ricky config validate` passes |
| **Credential Provision** | Store required secrets in the encrypted vault | `ricky vault test` succeeds |
| **Sample Workflow Load** | Pull the “hello‑world” starter workflow and run a dry‑run | `ricky run hello-world --dry` reports *PASS* |
| **Verification Loop** | Execute 80‑100 automated validation loops (unit + integration) | ≥ 95 % of loops report *PASS* |
| **Final Report** | Summarize onboarding status and next‑step recommendations | `ricky status` shows *ONBOA
...[truncated]

Must

  • Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
  • End every branch with a concrete next step.
  • Advertise only commands that are currently implemented.

Must Not

  • Sound like a launch page or documentation dump.
  • Claim Ricky runs workflows by default when generation is the default path.
  • Require web or Slack onboarding before CLI use.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.recovery-guidance-no-stack-traces (cli-behavior/manual)

User message

A user gives Ricky an empty spec or a missing spec file.

Ricky output

**Ricky – Handling an Empty or Missing Specification File**

Below is the concrete, deterministic workflow you should follow whenever a user submits an empty spec or a spec file that cannot be found. The contract is written in the standard **Ricky‑Workflow DSL** (see AGENTS.md) and can be copied into a new branch (`feature/handle‑missing‑spec`) for immediate execution.

---

## 1. Workflow Contract (YAML)

```yaml
name: handle-missing-spec
description: Detect, report, and remediate empty or missing specification files.
owner: ricky@agentworkforce.com
trigger:
  - type: user_input
    condition: spec_file_path == null OR file_is_empty(spec_file_path)
steps:
  - id: detect_missing
    name: Detect missing/empty spec
    action: python
    script: |
      import os, sys, json
      path = "{{ inputs.spec_file_path }}"
      if not path or not os.path.exists(path):
          result = {"status":"missing","path":path}
      elif os.path.getsize(path) == 0:
          result = {"status":"empty","path":path}
      else:
          result = {"status":"present","path":path}
      print(json.dumps(result))
    output: detection

  - id: report_blocker
    name: Report blocker to user
    when: 
...[truncated]

Must

  • Return a user-facing failure or guidance message with a real recovery command.
  • Distinguish generation failure from execution failure.
  • Show stack traces only when verbose diagnostic mode is requested.

Must Not

  • Crash with an uncaught exception in normal mode.
  • Suggest commands that do not exist.
  • Pretend a missing spec was accepted.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
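
The rubric for this case can be sketched as a small renderer. This is a minimal illustration under stated assumptions: `CliFailure` and `renderCliFailure` are hypothetical names, not part of the Ricky codebase, and the recovery command is supplied by the caller so that only commands that really exist are printed.

```typescript
// Hypothetical shapes for illustration; not Ricky's actual types.
type FailureStage = "generation" | "execution";

interface CliFailure {
  stage: FailureStage;
  message: string;
  recoveryCommand: string; // caller must pass a command that exists
  stack?: string;
}

function renderCliFailure(failure: CliFailure, verbose: boolean): string {
  const label = failure.stage === "generation" ? "Generation" : "Execution";
  const lines = [
    `${label} failed: ${failure.message}`,
    `Next: ${failure.recoveryCommand}`,
  ];
  // Stack traces are diagnostic detail, shown only when verbose mode is on.
  if (verbose && failure.stack !== undefined) {
    lines.push("", failure.stack);
  }
  return lines.join("\n");
}
```

Keeping the stage in the failure object is what lets the message distinguish a generation failure from an execution failure without inspecting the error text.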
REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**  

=== AgentWorkforce Status Report ===

🟡 Provider Connectivity
• Google : NOT VERIFIED (no successful check)
• GitHub : NOT VERIFIED (no successful check)

✅ Core Services
• Scheduler : RUNNING
• Queue : ACTIVE
• Metrics : COLLECTING

🔧 Configuration
• Workflow Engine : v2.4.1
• Auth Mode : TOKEN‑BASED
• Last Sync : 2026‑05‑12 08:14 UTC

⚠️ Alerts
• Provider health checks pending for Google and GitHub.
• Actions dependent on these providers may be degraded or unavailable.

📝 Next Steps

  1. Run provider health checks:
    ricky provider check --name google
    ricky provider check --name github
  2. Review connection logs if checks fail:
    ricky logs --provider google
    ricky logs --provider github
  3. Re‑authenticate or re‑configure credentials as needed.

--- End of Report ---

Must

  • Report unknown or not-connected provider state honestly.
  • Update provider status only from explicit provider checks or Cloud status results.
  • Give concrete setup guidance for Cloud when relevant.

Must Not

  • Mark Google or GitHub connected because guidance text was shown.
  • Invent a provider connection URL or OAuth flow.
  • Show empty fields with no recovery guidance when config is missing.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
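
One way to read the Must items above is that provider state starts as unknown and changes only when an explicit check result arrives. The sketch below assumes hypothetical names (`ProviderState`, `ProviderCheckResult`, `applyProviderChecks`); it is not Ricky's implementation.

```typescript
// Hypothetical model: state is "unknown" until an explicit check says otherwise.
type ProviderState = "unknown" | "connected" | "not-connected";

interface ProviderCheckResult {
  provider: string;
  ok: boolean;
}

function applyProviderChecks(
  current: ReadonlyMap<string, ProviderState>,
  checks: readonly ProviderCheckResult[],
): Map<string, ProviderState> {
  const next = new Map(current);
  for (const check of checks) {
    // Only an explicit check result may move a provider out of "unknown".
    next.set(check.provider, check.ok ? "connected" : "not-connected");
  }
  return next;
}

const initialState = new Map<string, ProviderState>([
  ["google", "unknown"],
  ["github", "unknown"],
]);
```

Because showing guidance text never produces a `ProviderCheckResult`, it cannot flip a provider to connected in this model.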
REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)

User message

Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

Ricky output

**Workflow: “GitHub Primitive Change”**  
*Purpose*: Apply a single‑file edit (e.g., typo fix, config tweak) to a repository, run CI, and merge to the target branch with full auditability.

---

## 1. Scope & Branching Contract  

| Item | Definition |
|------|------------|
| **Change Type** | Primitive (single‑file, < 200 LOC, no dependency changes) |
| **Source Branch** | `feature/primitive‑<short‑desc>` (created from the target) |
| **Target Branch** | `main` (or configurable `default` in repo settings) |
| **PR Title Pattern** | `fix: <short‑desc>`  (e.g., `fix: correct typo in README`) |
| **PR Description Template** | <ul><li>**Why** – short rationale</li><li>**What** – file(s) changed, line range</li><li>**How** – steps to verify locally</li><li>**Tests** – CI checks that must pass</li></ul> |
| **Merge Strategy** | **Squash‑merge** with signed commit (`--signoff`) |
| **Rollback Policy** | If post‑merge verification fails, revert the squash commit via a new PR titled `revert: <original‑title>` |

---

## 2. Skill Set Selected by Ricky  

| Skill | Role in Workflow |
|-------|------------------|
| **GitOps‑Branching** | Creates, checks out, and deletes the feature branch; en
...[truncated]

Must

  • Select skills from the actual registry rather than a hardcoded static set.
  • Record ranked skill matches with confidence and trigger evidence.
  • Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

  • Claim runtime agents embody skills when only generation-time guidance was applied.
  • Fail generation solely because optional skill files are missing.
  • Hide skill selection evidence from artifacts or JSON output.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
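
The ranking and fallback behavior in the Must list could look roughly like the sketch below. All names (`SkillEntry`, `SkillMatch`, `selectSkills`) and the scoring rule (share of trigger words found in the spec) are assumptions made for illustration; the real matcher may score quite differently.

```typescript
// Hypothetical registry entry and match record.
interface SkillEntry {
  id: string;
  triggers: string[];
}

interface SkillMatch {
  id: string;
  confidence: number;
  evidence: string[]; // trigger words actually found in the spec
}

interface SkillSelection {
  selected: SkillMatch[];
  ranked: SkillMatch[];
  fallbackReason?: string;
}

function selectSkills(
  spec: string,
  registry: readonly SkillEntry[] | undefined,
  threshold = 0.5,
): SkillSelection {
  // A missing registry is a graceful fallback, not a generation failure.
  if (registry === undefined || registry.length === 0) {
    return { selected: [], ranked: [], fallbackReason: "registry missing or empty" };
  }
  const text = spec.toLowerCase();
  const ranked: SkillMatch[] = registry
    .map((skill) => {
      const evidence = skill.triggers.filter((t) => text.includes(t.toLowerCase()));
      const confidence =
        skill.triggers.length === 0 ? 0 : evidence.length / skill.triggers.length;
      return { id: skill.id, confidence, evidence };
    })
    .sort((a, b) => b.confidence - a.confidence);
  const selected = ranked.filter((match) => match.confidence >= threshold);
  if (selected.length === 0) {
    return { selected, ranked, fallbackReason: "no skill cleared the threshold" };
  }
  return { selected, ranked };
}
```

Returning the full `ranked` list alongside `selected` keeps the selection evidence available for artifacts or JSON output even when nothing is selected.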
REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow Title:** `Claude‑Review → Codex‑Implement`  

**Repository:** `agentworkforce/workflows`  
**Branch:** `feature/claude-review-codex-implement`  
**PR Target:** `main`  

---  

## 1. High‑Level Contract  

| Item | Description |
|------|-------------|
| **Input** | `src/**/*.py` (or any language files) submitted via a PR. |
| **Output** | Updated source files with implementation changes **and** a `CLAUDE_REVIEW.md` artifact summarizing the review. |
| **Success Criteria** | 1. Claude’s review passes all **review‑gate** checks (no “BLOCKER” tags). 2. Codex’s implementation compiles/tests with **≥ 90 %** pass rate. 3. All deterministic verification steps return `PASS`. |
| **Failure Mode** | Any gate returns `FAIL` → workflow aborts, posts a **blocker** comment on the PR, and opens an issue `wf‑blocker‑<timestamp>`. |
| **Loop Count** | Up to **80‑100** validation loops (review → implement → test) until success or blocker. |

---  

## 2. Workflow Diagram (YAML‑style)

```yaml
name: Claude‑Review → Codex‑Implement
on:
  pull_request:
    types: [opened, synchronize, reopened]

jobs:
  review:
    runs-on: ubuntu-latest
    outputs:
      review_id: ${{ steps.claude.outputs
...[truncated]

Must

  • Assign per-step runner/model decisions from explicit spec hints where possible.
  • Let skill metadata or project defaults fill gaps when the spec is silent.
  • Write tool-selection.json or equivalent audit metadata.

Must Not

  • Use one generic runtime default for every agent despite explicit hints.
  • Let skill defaults override explicit user/spec runner hints.
  • Omit the reason each tool or model was chosen.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
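
The precedence this case asks for (explicit spec hint first, then skill metadata, then a project default, each with a recorded reason) can be sketched as follows. `ToolDecision` and `chooseRunner` are hypothetical names, and the runner strings are placeholders taken from the user message.

```typescript
// Hypothetical audit record for one step's runner choice.
type DecisionSource = "spec-hint" | "skill-default" | "project-default";

interface ToolDecision {
  step: string;
  runner: string;
  source: DecisionSource;
  reason: string;
}

function chooseRunner(
  step: string,
  specHints: ReadonlyMap<string, string>,
  skillDefaults: ReadonlyMap<string, string>,
  projectDefault: string,
): ToolDecision {
  // Explicit spec hints always win over skill defaults.
  const hinted = specHints.get(step);
  if (hinted !== undefined) {
    return { step, runner: hinted, source: "spec-hint", reason: `spec names ${hinted} for ${step}` };
  }
  // Skill metadata fills gaps only when the spec is silent.
  const fromSkill = skillDefaults.get(step);
  if (fromSkill !== undefined) {
    return { step, runner: fromSkill, source: "skill-default", reason: `no spec hint; skill metadata suggests ${fromSkill}` };
  }
  return { step, runner: projectDefault, source: "project-default", reason: "no spec hint or skill default" };
}
```

Serializing the returned decisions would give the kind of per-step audit metadata that `tool-selection.json` is meant to hold.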
REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

### Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is a deterministic post‑processing flag that takes a **generated workflow JSON** (or YAML) and returns a **tighter, review‑ready version**. It focuses on two things:

| Target | What `--refine` does |
|--------|----------------------|
| **Task descriptions** | • Removes vague verbs (“do”, “handle”). <br>• Adds concrete inputs, outputs, and responsible role. <br>• Limits each description to 1‑2 sentences (≈30‑45 words). |
| **Acceptance gates** | • Converts informal “check” statements into explicit, testable predicates. <br>• Adds a **verification type** (`unit`, `integration`, `policy`, `manual`). <br>• Supplies a **pass‑criteria** field (e.g., `statusCode == 200 && body.id != null`). |

---

## 1. Invocation Pattern  

```bash
awf generate --template my_template.yaml --output workflow.json
awf refine --input workflow.json --output workflow.refined.json
```

  • --input – path to the raw workflow you just generated.
  • --output – where the refined artifact will be written.
  • Optional flags:
    • --max-length 45 – caps description word count.
    • --gate-style strict|lenient – selects the s
...[truncated]

Must

  • Keep the deterministic unrefined path as the default.
  • Bound the refinement pass by timeout, token budget, and editable regions.
  • Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

Must Not

  • Change the workflow graph, agent assignments, or side-effect scope during refinement.
  • Fail the whole generation if optional refinement times out.
  • Hide warnings when the deterministic artifact is returned unchanged.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.behavior-grounded-gates (generation-quality/manual)

User message

Generate a workflow for the ricky --version spec.

Ricky output

**Workflow: `ricky --version` Implementation & Release**  
*Repository:* `github.com/agentworkforce/ricky`  
*Target Branch:* `main` → `release/vX.Y.Z` (semantic version bump)  

---

## 1. Scope & Acceptance Contract  

| Item | Requirement | Success Criterion |
|------|-------------|--------------------|
| **CLI flag** | `ricky --version` must output the current semantic version string (e.g., `ricky v2.4.1`) and exit with code 0. | Exact match to `ricky v${VERSION}` on stdout; no trailing whitespace; exit 0. |
| **Version source** | Single source of truth in `go.mod` (or `pyproject.toml` / `package.json` depending on language). | Workflow reads `${VERSION}` from that file; any mismatch fails verification. |
| **Help integration** | `ricky --help` must list `--version` with description “Print the Ricky version and exit”. | Help output contains the line `--version   Print the Ricky version and exit`. |
| **Cross‑platform** | Must work on Linux, macOS, and Windows CI runners. | All three runner matrices pass the verification gate. |
| **Documentation** | `README.md` and `docs/cli.md` must contain a “Version” section with the same string. | Diff‑check against `${VERSION}` passes. |
|
...[truncated]

Must

  • Build gates from the stated acceptance behavior, such as checking ricky --version output.
  • Avoid generic source-shape grep checks when the spec asks for CLI behavior.
  • Keep generated validation meaningful for the current repo shape.

Must Not

  • Treat grep -Eq 'export|function|class|workflow(' dist/ricky.js as proof of version behavior.
  • Claim the workflow is proven by source syntax alone.
  • Ignore the package-json version resolution order in the spec.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact Validation‑Fix‑Rerun Loop”**  
*Purpose*: Process a large set of independent artifacts (e.g., data files, code modules, config bundles) through deterministic validation, automatic fixing where possible, and controlled re‑execution until all pass or a blocker is reported.

---

## 1. Scope & Boundaries
| Item | Description |
|------|-------------|
| **Branch** | `feature/batch‑artifact‑validation` (isolated from `main`) |
| **PR Target** | `main` – only merged after **100 % pass** or explicit blocker resolution |
| **Artifacts** | Any files under `artifacts/` (flat or nested). Each artifact is self‑contained; no cross‑artifact dependencies. |
| **Loop Limit** | Max **80–100** validation‑fix‑rerun cycles per artifact (configurable). Exceeding limit → **Blocker**. |
| **Owner** | `@team‑validation` (reviewers: `@qa‑lead`, `@dev‑ops`) |
| **CI Provider** | GitHub Actions (or Azure Pipelines) – deterministic runners, pinned Docker images. |

---

## 2. Contract (Inputs / Outputs)

| Contract | Type | Details |
|----------|------|---------|
| **Input** | `artifact_set` | JSON list: `{ "id": "<path>", "type": "<enum>", "checksum": "<sha256>" }` |
| **Output** |
...[truncated]

Must

  • Choose dag, supervisor, or pipeline deliberately based on the work shape.
  • Explain the pattern choice in artifact metadata or a rationale.
  • Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

  • Default blindly to dag for every workflow.
  • Collapse independent artifact work into one vague agent task.
  • Omit verification gates because the chosen pattern seems obvious.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow  

Below is a step‑by‑step illustration of the **five core Ricky authoring skills** and the concrete artifacts they produce in a typical workflow for “On‑board a New Contractor”.  Each skill injects deterministic contracts, verification gates, and review deliverables that keep the pipeline reliable, auditable, and easy to iterate.

| # | Skill | What the author does | Concrete Output in the Workflow | Effect on the Final Pipeline |
|---|-------|----------------------|--------------------------------|------------------------------|
| 1 | **Scope Definition & Branch Boundary** | Writes a **`scope.yaml`** that lists the exact functional boundaries (e.g., “Create contractor record”, “Assign manager”, “Provision VPN”). | `scope.yaml` (list of *capabilities* with owners, version tags, and a `branch: feature/onboard‑contractor` label). | Guarantees the PR lives in a dedicated feature branch, preventing cross‑feature bleed‑through. The scope file is used by the CI gate to enforce *only* the listed capabilities can be touched in the PR. |
| 2 | **Deterministic Contract Authoring** | Drafts **`contract.json`** for each
...[truncated]

Must

  • Describe skills as generation-time selection, loading, and rendering inputs.
  • Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
  • Say the workflow instructions were informed by selected skills.

Must Not

  • Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
  • Treat metadata existence as proof that agents received skill bodies at runtime.
  • Overstate current tests beyond the generation boundary.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

**Ricky – Workflow Authoring Assistant**

---

### 📋 Quick Summary  
To turn a product specification into a reliable AgentWorkforce workflow we need a **clear, concrete contract** that defines:

1. **Inputs & Outputs** – data shapes, types, and validation rules.  
2. **Stages & Tasks** – deterministic steps, each with its own verification gate.  
3. **Success/Failure Paths** – explicit branching, retry policies, and blocker handling.  
4. **Review & Validation Loop** – 80‑to‑100 automated checks + manual sign‑off before merge.  

Because the spec you provided is vague, I’ll ask for the missing pieces that are required to generate a complete, testable workflow. Once we have those details, I’ll produce a **branch‑ready workflow definition** (YAML) with all the contracts and verification gates you asked for.

---

## ❓ Information Needed

| # | What I Need | Why It Matters |
|---|--------------|----------------|
| 1 | **High‑level goal** – e.g., “extract customer sentiment from support tickets” | Determines overall pipeline shape. |
| 2 | **Input payload** – JSON schema or example (fields, types, required/optional) | Drives input validation gate. |
| 3 | **Core processing steps** – l
...[truncated]

Must

  • Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
  • Validate with dry-run or targeted structural checks where possible.
  • Return artifacts, warnings, and follow-up commands honestly.

Must Not

  • Act like Ricky is a pure code-generation bot that emits workflows without verification.
  • Stop at "code compiles" as the proof bar.
  • Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s Immediate Response Plan**

| Phase | Action | Rationale | Deliverable |
|-------|--------|-----------|-------------|
| 1️⃣ Detect & Log | • Capture the timeout event (step X) and the “alive‑but‑idle” worker (step Y). <br>• Record timestamps, input payloads, and any partial logs. | Provides an immutable audit trail for post‑mortem and for the verification gate. | `ricky/logs/2026‑05‑12_failure_<run‑id>.json` |
| 2️⃣ Blocker Report | • Mark the run as **BLOCKED** in the workflow state store. <br>• Emit a **Ricky‑Blocker** issue with severity = high, tags `timeout`, `orphan‑worker`. | Guarantees downstream pipelines do not consume incomplete artifacts. | GitHub Issue #\<next‑num\> – “Workflow <name> run <id> blocked – step X timeout, step Y idle”. |
| 3️⃣ Deterministic Verification Gate | • Create a **Gate‑Spec** (`gate.yaml`) that must pass before any retry: <br>  - `step_X_status == "timeout"` <br>  - `step_Y_artifact_exists == false` <br>  - `worker_Y_uptime < max_allowed_idle` | Enforces that the same failure cannot be silently re‑run. | `gate.yaml` in `ricky/gates/<run‑id>/` |
| 4️⃣ Scoped Repair Branch | • Branch off `main` → `repair/<workflow‑name>/<run‑id>`.<br>• Add
...[truncated]

Must

  • Classify the failure before deciding whether to retry.
  • Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
  • Preserve raw evidence and uncertainty when the class is not known.

Must Not

  • Blindly rerun the whole workflow without a blocker classification.
  • Treat every failure as a broken workflow definition.
  • Claim Ricky fixed the workflow before rerun evidence proves it.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Ricky – Stale‑State Detection for Local Workflow Launches**  

Below is the concrete, contract‑driven workflow that Ricky executes every time a developer runs `aw launch` (or any other local entry‑point). The purpose is to guarantee that the three runtime‑state directories – **`.agent‑relay/`**, **`.relay/`**, and **`.trajectories/`** – are fresh, consistent, and safe to use before any workflow code is executed.

---

## 1. Contract Overview  

| Artifact | Expected State | Validation Rule | Failure Action |
|----------|----------------|-----------------|----------------|
| `.agent-relay/` | Contains only files generated by the most recent successful **agent‑relay** run. | `hash‑manifest.json` must exist and its SHA‑256 digest must match the concatenated digests of all files in the directory. | Abort launch, emit **StaleRelayError**. |
| `.relay/` | Holds the latest **relay‑engine** snapshot. | Presence of `snapshot.meta` with a `timestamp` newer than the last successful workflow run (recorded in `.aw/last‑run.json`). | Abort launch, emit **StaleEngineError**. |
| `.trajectories/` | Stores trajectory logs for the current branch only. | Directory must be empty **or** contain a `br
...[truncated]

Must

  • Classify stale local runtime state as an environment contamination issue.
  • Recommend quarantine or isolated-run guidance before launch.
  • Record the observed paths and the action taken or recommended.

Must Not

  • Treat stale runtime state as a workflow logic failure.
  • Delete or overwrite state without an explicit safe path or user intent.
  • Continue into execution as if the workspace were clean.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
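
As a rough sketch of the rubric above: once stale paths have been detected (how staleness is decided is out of scope here), the result is reported as environment contamination, the paths are recorded, and launch is blocked without deleting anything. `StaleStateReport` and `reportStaleState` are hypothetical names.

```typescript
// Directories named in the user message for this case.
const RUNTIME_STATE_DIRS: readonly string[] = [".agent-relay/", ".relay/", ".trajectories/"];

interface StaleStateReport {
  classification: "environment-contamination";
  observedPaths: string[];
  recommendedAction: "quarantine-or-isolated-run";
  safeToLaunch: false;
}

// `detectedStalePaths` is assumed to come from an earlier detection step.
function reportStaleState(detectedStalePaths: readonly string[]): StaleStateReport | null {
  const observedPaths = RUNTIME_STATE_DIRS.filter((dir) => detectedStalePaths.includes(dir));
  if (observedPaths.length === 0) {
    return null; // nothing stale: the workspace can be treated as clean
  }
  // The report only recommends; it never deletes or overwrites state.
  return {
    classification: "environment-contamination",
    observedPaths,
    recommendedAction: "quarantine-or-isolated-run",
    safeToLaunch: false,
  };
}
```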
REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**Resolution steps for “another Ricky or Relay run is already active in this workspace”**

Below is a deterministic, repeatable workflow you can run in any workspace that uses the Ricky/Relay tooling. Follow the steps in order; each step has a verification gate that must pass before you proceed to the next one. If any gate fails, treat it as a **blocker** and open a PR that documents the failure (see “Review artifacts” at the end).

---

## 1️⃣ Verify the active‑run marker

| Action | Command | Expected outcome |
|--------|---------|------------------|
| List the marker file | `cat .ricky/run.marker` | Shows a JSON payload with `runId`, `pid`, `startedAt`, and `owner`. |
| Confirm the PID is still alive | `ps -p <pid>` | Should return a line with the PID and the command `ricky run …`. |
| If the PID is **not** running → go to step 2. | | |
| If the PID **is** running → decide whether to keep it or terminate it. | | |

**Gate 1 – Marker‑PID consistency**  
- Pass: PID exists **and** matches the marker.  
- Fail: PID missing or different → treat as a stale marker (proceed to step 2).

---

## 2️⃣ Clean a stale marker (if needed)

1. **Backup the marker** (for audit):  
   ```bash
   
...[truncated]

Must

  • Report the active marker, run id, or status path when available.
  • Ask the user to inspect, wait for, or explicitly clear the active run.
  • Avoid launching a competing run that could corrupt evidence.

Must Not

  • Silently start another run.
  • Hide the existing run marker from the user.
  • Treat the conflict as a generic failure with no recovery path.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.env-loader-injection-runtime-loadable (runtime-recovery/manual)

User message

A workflow artifact references a MISSING_ENV_VAR value. Ricky's deterministic auto-fix injects the .env.local / .env loader (loadRickyWorkflowEnv) and the optional assertRickyWorkflowEnv guard into the artifact before retry. The artifact may be a master-rendered workflow whose .step({ command: ... }) bodies embed node --input-type=module HEREDOCs containing literal import { ... } from 'node:fs' / from 'node:path' strings.

Ricky output

**Deterministic auto‑fix for missing environment variables in a Ricky workflow artifact**

Below is a concrete, repeatable contract that you can apply to any workflow artifact that fails because it references an undefined `MISSING_ENV_VAR`. The fix injects the environment‑loader utilities, adds a guard, and (when needed) rewrites embedded HEREDOC command strings so that the Node‑ESM imports resolve correctly.

---

## 1. Preconditions (Verification Gate)

| Check | Expected | Action if not met |
|-------|----------|-------------------|
| Artifact is a **master‑rendered** workflow (i.e. a JSON/YAML file that contains `.step({ command: … })` objects). | ✅ | Abort – use a separate repair branch. |
| Artifact contains at least one reference to `process.env.MISSING_ENV_VAR` (or `${MISSING_ENV_VAR}` in a HEREDOC). | ✅ | Continue. |
| Repository has a **`.env.local`** or **`.env`** file at the root (or a plan to create one). | ✅ | Continue. |
| `loadRickyWorkflowEnv` and `assertRickyWorkflowEnv` utilities exist in `src/workflow/env.ts`. | ✅ | Continue. |
| CI pipeline can run a **deterministic validation loop** (≥80 cycles) after the fix. | ✅ | Continue. |

If any gate fails, open a **blo
...[truncated]

Must

  • Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected loadRickyWorkflowEnv body references rickyWorkflowFs.* and rickyWorkflowPath.*, so the repair must also add the corresponding import * as rickyWorkflowFs from 'node:fs' and import * as rickyWorkflowPath from 'node:path' aliases at module top level.
  • Detect existing alias imports by matching real top-level import * as <alias> from '<module>' statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in .step({ command: ... }) bodies do not count as imports).
  • Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

Must Not

  • Skip adding the rickyWorkflowFs / rickyWorkflowPath aliases because from 'node:fs' or from 'node:path' already appears somewhere in the file as a string literal.
  • Inject loadRickyWorkflowEnv (or assertRickyWorkflowEnv) without the supporting alias imports, which produces a ReferenceError: rickyWorkflowPath is not defined at module load and burns the auto-fix budget on UNSUPPORTED_RUNTIME at runtime-launch.
  • Rewrite or escape the embedded HEREDOC text in step commands.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
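
The alias-detection rule in the Must list can be illustrated with a line-anchored match on the exact `import * as <alias> from '<module>'` form, instead of a substring search for the module specifier. This is a sketch under stated assumptions: `hasTopLevelNamespaceImport` and `ensureAliasImports` are hypothetical helpers, not the functions in the Ricky repair code, and a regex is only a heuristic. A HEREDOC line that reproduced the alias import verbatim at column 0 would still match, so a real parser would be stricter.

```typescript
// Escape a string for literal use inside a RegExp.
function escapeForRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True only when a whole line is `import * as <alias> from '<module>'`.
// `import { x } from 'node:fs'` inside a HEREDOC does not count.
function hasTopLevelNamespaceImport(source: string, alias: string, moduleName: string): boolean {
  const pattern = new RegExp(
    `^import\\s+\\*\\s+as\\s+${escapeForRegExp(alias)}\\s+from\\s+['"]${escapeForRegExp(moduleName)}['"]\\s*;?\\s*$`,
    "m",
  );
  return pattern.test(source);
}

// Alias names taken from the rubric above.
const REQUIRED_ALIASES: ReadonlyArray<readonly [string, string]> = [
  ["rickyWorkflowFs", "node:fs"],
  ["rickyWorkflowPath", "node:path"],
];

// Prepend any missing alias imports; the rest of the source is left untouched.
function ensureAliasImports(source: string): string {
  const missing = REQUIRED_ALIASES
    .filter(([alias, moduleName]) => !hasTopLevelNamespaceImport(source, alias, moduleName))
    .map(([alias, moduleName]) => `import * as ${alias} from '${moduleName}';`);
  return missing.length === 0 ? source : `${missing.join("\n")}\n${source}`;
}
```

Because the helper only prepends lines, the embedded HEREDOC text in step commands is passed through byte for byte, and running it twice adds nothing the second time.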
REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Run**

Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script). It assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failing workflow artifact is stored in `./artifacts/<workflow‑id>.json`. The previous successful run ID is supplied as `PREV_RUN_ID`.

---

## 1️⃣ Preconditions (verify before you start)

| Check | Command | Expected outcome |
|-------|---------|------------------|
| CLI version | `aw --version` | `>=2.4.0` (auto‑fix introduced in v2.4) |
| Working directory | `pwd` | Root of the repo (contains `.aw/` config) |
| Artifact exists | `ls ./artifacts/<workflow-id>.json` | File present, readable |
| Previous run ID available | `echo $PREV_RUN_ID` | Non‑empty string |
| Docker (if workflow uses containers) | `docker ps` | Daemon running |

If any check fails, abort and fix the environment before proceeding.

---

## 2️⃣ Load the failing workflow & capture the failure context

```bash
# Export identifiers for reproducibility
export WORKFLOW_ID="<workflow-id>"
export PREV_RUN_ID="<previous-run-id>"
export ARTIFACT_PATH="./artifacts/${WORKFLOW_ID}.json"

# Pull the faili
...[truncated]

Must

  • Use a bounded retry budget and summarize every attempt.
  • Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
  • Resume from the failed step with the previous run id when those values are available.

Must Not

  • Edit arbitrary repository source files as the default auto-fix surface.
  • Keep retrying after the configured max attempts.
  • Lose the single Ricky tracking run id across repair/resume attempts.
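
The Must / Must Not items above can be sketched as a small loop. This is a minimal illustration, not the code in `src/local/auto-fix-loop.ts`: `runWithBoundedAutoFix`, `AttemptResult`, `AttemptSummary`, and the callback shapes are hypothetical names chosen for the example. It assumes each attempt reports its own runtime run id and, when evidence allows, the failed step.

```typescript
// Hypothetical shapes for illustration only; not Ricky's actual types.
interface AttemptResult {
  ok: boolean;
  runId?: string; // run id the runtime reported for this attempt
  failedStep?: string; // failed step, when evidence names one
}

interface AttemptSummary {
  attempt: number;
  trackingRunId: string; // the single Ricky tracking id, constant across attempts
  resumedFrom: string | undefined;
  previousRunId: string | undefined;
  ok: boolean;
}

interface AttemptOptions {
  startFromStep: string | undefined;
  previousRunId: string | undefined;
}

function runWithBoundedAutoFix(
  trackingRunId: string,
  maxAttempts: number,
  runAttempt: (opts: AttemptOptions) => AttemptResult,
  repairArtifact: (failedStep: string) => void,
): AttemptSummary[] {
  const summaries: AttemptSummary[] = [];
  let startFromStep: string | undefined;
  let previousRunId: string | undefined;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = runAttempt({ startFromStep, previousRunId });
    // Every attempt is summarized, whether it passed or failed.
    summaries.push({
      attempt,
      trackingRunId,
      resumedFrom: startFromStep,
      previousRunId,
      ok: result.ok,
    });
    if (result.ok) break;
    // Without a concrete failed step this sketch stops instead of
    // silently restarting the whole workflow from scratch.
    if (!result.failedStep) break;
    if (attempt === maxAttempts) break; // budget exhausted: no further repair
    repairArtifact(result.failedStep); // repair the workflow artifact only
    startFromStep = result.failedStep; // resume from the failed step...
    previousRunId = result.runId; // ...against the previous run id
  }
  return summaries;
}
```

The loop is synchronous to keep the example short; a real runner would presumably be asynchronous. Stopping when no failed step is known mirrors the intent of this PR and is a design choice of the sketch, not a description of the merged code.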

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

### Quick‑Start Debug Checklist for a Failing `ricky run workflows/foo.ts --no-auto‑fix`

| Step | Action | Expected Outcome | How to Verify |
|------|--------|------------------|----------------|
| 1️⃣ | **Open the run log** | `ricky` writes a detailed JSON log to `.ricky/run‑logs/<timestamp>.json`. | `cat .ricky/run‑logs/*foo*.json \| jq .` |
| 2️⃣ | **Locate the failure gate** | The log contains a `gate` object with `status: "failed"` and a `reason` field. | Search for `"status": "failed"` in the log. |
| 3️⃣ | **Run in debug mode** | Adds step‑by‑step console output and writes a separate `debug.log`. | `ricky run workflows/foo.ts --debug --no-auto-fix` |
| 4️⃣ | **Validate the workflow contract** | The contract (`workflows/foo.contract.ts`) must pass the deterministic validator. | `ricky validate workflows/foo.contract.ts` |
| 5️⃣ | **Check the generated plan** | With `--no-auto-fix`, the plan is emitted but not applied. Verify the plan file under `.ricky/plans/`. | `cat .ricky/plans/<timestamp>.plan.json \| jq .` |
| 6️⃣ | **Inspect the affected resources** | The plan lists resources and the intended actions (`create`, `update`, `delete`). Confirm they match expectations. | Lo
...[truncated]

Must

  • Preserve one-attempt behavior when auto-fix is disabled.
  • Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
  • Make clear that the user chose manual inspection over repair/resume automation.

Must Not

  • Start a repair loop despite --no-auto-fix.
  • Suppress the diagnosis because no repair was attempted.
  • Present the failure as a completed repair attempt.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)

User message

Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

Ricky output

**Ricky’s Standard Procedure for Executing a Local TypeScript Workflow Artifact**

| Phase | Action | Details | Verification Gate |
|-------|--------|---------|--------------------|
| **1️⃣ Locate & Scope** | Identify the *primary local path* (the root of the workflow repo). | `WORKFLOW_ROOT=$(git rev-parse --show-toplevel)`  <br>Confirm the presence of `package.json`, `tsconfig.json`, and the workflow entry file (e.g., `src/main.ts`). | `[[ -f "$WORKFLOW_ROOT/package.json" && -f "$WORKFLOW_ROOT/tsconfig.json" && -f "$WORKFLOW_ROOT/src/main.ts" ]]` |
| **2️⃣ Environment Prep** | Ensure a clean, reproducible Node environment. | ```bash\ncd "$WORKFLOW_ROOT"\n# Use the repo‑defined Node version (via .nvmrc or engines field)\nif command -v nvm >/dev/null; then nvm install; nvm use; fi\n``` | `node -v` matches `engines.node` in `package.json`. |
| **3️⃣ Dependency Installation** | Install exact versions declared in `package-lock.json` (or `pnpm-lock.yaml` / `yarn.lock`). | ```bash\nnpm ci   # deterministic, no package‑updates\n``` | Exit code `0` and `node_modules/` populated. |
| **4️⃣ Type‑Checking (Deterministic Gate)** | Run the TypeScript compiler in *no‑emit* mode to catch type er
...[truncated]

Must

  • Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
  • Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
  • Record the actual spawn command in execution evidence.

Must Not

  • Fail solely because agent-relay is not on PATH when the SDK route is available.
  • Hide the actual runtime command from evidence.
  • Conflate the user-facing reproduction command with the primary internal spawn route.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky – Boundary‑Reached Protocol for Persistent Structural Failures**

When a structural failure remains after an initial fix attempt, Ricky must treat the situation as a *hard boundary* and follow the deterministic, review‑driven workflow below. The protocol guarantees that the failure is either resolved or escalated with full auditability.

---

## 1. Contract Preconditions (Gate A)

| Item | Requirement | Source |
|------|-------------|--------|
| **Failure Ticket** | Opened in the *AgentWorkforce* issue tracker with label `structural-failure`. | Tracker API |
| **Initial Fix PR** | Merged to `fix/<ticket‑id>` branch, status **merged**. | GitHub |
| **Post‑Fix Test Suite** | All **core‑validation** tests passed on CI. | CI pipeline |
| **Verification Window** | ≥ 30 min of idle time after merge (to allow downstream propagation). | Scheduler |

If any precondition is false, abort the boundary protocol and return to the *Fix Attempt* loop.

---

## 2. Deterministic Verification Gate (Gate B)

1. **Re‑run Full Validation Suite**  
   - Execute `./ci/run‑full‑suite.sh` on the *post‑merge* commit.  
   - Capture **exit code** and **artifact logs**.

2. **Structural Integrity Check
...[truncated]

Must

  • Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
  • Distinguish escalation from a generic product failure.
  • Preserve enough context for a human operator to continue.

Must Not

  • Retry speculative fixes indefinitely.
  • Collapse to "something went wrong" without the attempted actions.
  • Discard evidence from failed repair attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Summary of Recent Runs**  
*(Generated 2026‑05‑12)*  

| Workflow | Run ID | Start | End | Duration | Status | Success % | Avg Latency (ms) | Error Types (top 3) | Last Review |
|----------|--------|-------|-----|----------|--------|-----------|------------------|----------------------|--------------|
| **Data‑Ingest‑v2** | `r20260511-01` | 2026‑05‑11 08:02 | 2026‑05‑11 08:07 | 5 m 12 s | ✅ PASS | 99.3 % | 42 | `Timeout(12)`, `SchemaMismatch(3)` | 2026‑05‑10 PR‑#842 |
| **Model‑Train‑Alpha** | `r20260510-07` | 2026‑05‑10 22:15 | 2026‑05‑11 01:03 | 2 h 48 m | ⚠️ WARN | 94.1 % | 128 | `GPUOom(4)`, `DataLeak(1)` | 2026‑05‑09 PR‑#819 |
| **Report‑Gen‑Daily** | `r20260511-14` | 2026‑05‑11 14:00 | 2026‑05‑11 14:01 | 1 m 03 s | ✅ PASS | 100 % | 19 | — | 2026‑05‑11 PR‑#857 |
| **User‑Sync‑Batch** | `r20260509-03` | 2026‑05‑09 03:00 | 2026‑05‑09 03:12 | 12 m 05 s | ❌ FAIL | 78.5 % | 67 | `RateLimit(9)`, `AuthError(5)` | 2026‑05‑08 PR‑#801 |
| **Audit‑Log‑Export** | `r20260510-22` | 2026‑05‑10 22:30 | 2026‑05‑10 22:45 | 15 m 02 s | ✅ PASS | 100 % | 33 | — | 2026‑05‑10 PR‑#830 |

---

### 1. Overall Health Score  
- **Pass Rate:** 3 / 5 workflows = 60 % (≥ 80 
...[truncated]

Must

  • Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
  • Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
  • Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

  • Mutate evidence while analyzing it.
  • Return generic advice like "improve your workflow" without references.
  • Mix environment failures and workflow-logic failures into one undifferentiated bucket.
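
As a rough sketch of the criteria above, a digest could group failures by workflow, origin, and failure class over read-only evidence. Only the name `WorkflowRunEvidence` comes from the eval text; its fields here, the `Recurrence` shape, and `recurringFailures` are assumptions made for illustration.

```typescript
// Field names are hypothetical; the real WorkflowRunEvidence shape may differ.
interface WorkflowRunEvidence {
  readonly workflow: string;
  readonly status: "passed" | "failed";
  readonly failureClass?: string;
  readonly failureOrigin?: "environment" | "workflow-logic";
}

interface Recurrence {
  workflow: string;
  origin: "environment" | "workflow-logic";
  failureClass: string;
  count: number;
}

// Counts repeated failure classes per workflow, keeping environment and
// workflow-logic failures in separate buckets. The input is never mutated.
function recurringFailures(
  runs: readonly WorkflowRunEvidence[],
  minCount = 2,
): Recurrence[] {
  const counts = new Map<string, Recurrence>();
  for (const run of runs) {
    if (run.status !== "failed" || !run.failureClass || !run.failureOrigin) continue;
    const key = [run.workflow, run.failureOrigin, run.failureClass].join("\u0000");
    const entry = counts.get(key);
    if (entry) {
      entry.count += 1;
    } else {
      counts.set(key, {
        workflow: run.workflow,
        origin: run.failureOrigin,
        failureClass: run.failureClass,
        count: 1,
      });
    }
  }
  return Array.from(counts.values())
    .filter((r) => r.count >= minCount)
    .sort((a, b) => b.count - a.count);
}
```

Each returned entry names a specific workflow and failure class, so a recommendation built on it can point at something concrete.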

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

**Ricky Slack Surface – “Workflow Debugger”**  
*Version 0.1 – Design Draft*  

---

## 1. Overview
The **Workflow Debugger** Slack surface gives engineers and product owners a deterministic, low‑latency way to inspect, step‑through, and repair Ricky‑managed workflows directly from a Slack channel. It follows the **Ricky repository conventions** (AGENTS.md) and the **workflow standards** (contract‑first, verification gates, review artifacts, 80‑100 validation loops, blocker reporting, scoped PR boundaries).

---

## 2. High‑Level Interaction Flow

| Step | Slack Action | Ricky Backend | Result |
|------|--------------|---------------|--------|
| 1 | `/ricky debug <workflow‑id>` | Resolve workflow definition, load latest run state | Bot posts a **Debug Card** with workflow summary |
| 2 | Click **“Step ▶︎”** on a node | Execute deterministic step‑gate, capture output | Card updates with node status, logs, and next‑step options |
| 3 | Click **“Inject <key>=<value>”** (optional) | Apply input injection gate, re‑run affected sub‑graph | Updated state shown, validation loop triggered |
| 4 | Click **“Rollback to <node>”** | Trigger rollback gate, restore persisted snapshot | Card shows
...[truncated]

Must

  • Treat Slack as a co-equal product surface, not a wrapper around CLI.
  • Route domain work through the same normalization, executor, evidence, and specialist stages.
  • Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.

Must Not

  • Put workflow generation or diagnosis domain logic directly in the Slack handler.
  • Degrade Slack to a developer shortcut with weaker routing than CLI.
  • Skip signature verification, dedup, or thread handling when the surface is implemented.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)

User message

Add a new web handoff type that submits a workflow spec and mode.

Ricky output

**New Hand‑off Type:** `web_submit_spec`

| Field | Type | Description |
|-------|------|-------------|
| **type** | `string` (enum) | Fixed value `"web_submit_spec"` – identifies this hand‑off. |
| **payload** | `object` | Container for the data the UI sends. |
| **payload.spec** | `object` | Full workflow specification (JSON‑Schema‑validated). |
| **payload.mode** | `string` (enum) | `"draft"` | `"publish"` | `"test"` – determines how the backend should treat the spec. |
| **payload.metadata** *(optional)* | `object` | Arbitrary key/value pairs for audit/logging (e.g., `requestId`, `userId`). |
| **response** | `object` | Returned to the UI after processing. |
| **response.status** | `string` (enum) | `"accepted"` | `"rejected"` | `"error"` |
| **response.message** | `string` | Human‑readable explanation of the result. |
| **response.errors** *(optional)* | `array<object>` | Validation or processing errors, each with `path` and `detail`. |

---

## 1. Contract Definition (JSON‑Schema)

```json
{
  "$id": "urn:awf:handoff:web_submit_spec",
  "type": "object",
  "required": ["type", "payload"],
  "properties": {
    "type": { "const": "web_submit_spec" },
    "payload": {
      "ty
...[truncated]

Must

  • Add a handoff variant that normalizes into LocalInvocationRequest or CloudGenerateRequest.
  • Keep auth/session validation in the web surface before normalization.
  • Ensure downstream code does not need to know the request came from web.

Must Not

  • Short-circuit directly from the web handler to executors.
  • Create a parallel domain model for web requests.
  • Make the web surface the owner of local or Cloud routing semantics.
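
One way to picture the normalization boundary these criteria describe is the sketch below. The type names `LocalInvocationRequest` and `CloudGenerateRequest` appear in the criteria, but every field, the `WebSubmitSpecHandoff` shape, and `normalizeWebHandoff` are hypothetical. It assumes the web surface has already validated auth and session before calling the normalizer.

```typescript
// Hypothetical shapes; only the two request type names come from the eval text.
interface WebSubmitSpecHandoff {
  source: "web";
  spec: string;
  mode: "local" | "cloud";
}

interface LocalInvocationRequest {
  kind: "local";
  spec: string;
}

interface CloudGenerateRequest {
  kind: "cloud";
  spec: string;
}

type NormalizedRequest = LocalInvocationRequest | CloudGenerateRequest;

// The only translation step: web handoff in, surface-agnostic request out.
// The result carries no marker that it originated from the web surface.
function normalizeWebHandoff(handoff: WebSubmitSpecHandoff): NormalizedRequest {
  if (handoff.spec.trim().length === 0) {
    throw new Error("workflow spec is empty");
  }
  return handoff.mode === "local"
    ? { kind: "local", spec: handoff.spec }
    : { kind: "cloud", spec: handoff.spec };
}
```

Downstream executors would then branch on `kind` alone, so the web handler never calls them directly and no web-specific domain model is needed.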

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.mcp-claude-context-is-metadata (surfaces-ingress/manual)

User message

A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordinate.

Ricky output

**Decision Matrix for the Claude‑to‑Ricky Hand‑off**

| **Input Condition** | **What Ricky Should Do** | **Rationale / Next Steps** |
|---------------------|--------------------------|--------

---

_Comment truncated to stay within GitHub limits. Download the `ricky-eval-run` artifact for the full `human-review.md`._

@khaliqgant khaliqgant merged commit ad7d0d3 into main May 12, 2026
1 of 2 checks passed
@khaliqgant khaliqgant deleted the fix/router-loop-skips-impl-rerun branch May 12, 2026 09:05