Emit repair-aware generated workflows by khaliqgant · Pull Request #75 · AgentWorkforce/ricky

khaliqgant · 2026-05-08T15:03:13Z

Summary

make Ricky-generated ordinary, master, and child workflows opt into repair-aware retry with repairAgent/repairRetries
reject generated workflows that still use live fail-fast or retry without repair metadata
extend generator coverage across code, doc/spec, low-risk pipeline, master, and child workflow shapes
keep the local auto-fix failure ladder passing and sync package proof/docs for existing eval scripts

Verification

npm run typecheck
npx vitest run src/product/generation/pipeline.test.ts
npx vitest run src/local/auto-fix-loop.test.ts
npx vitest run test/local-auto-fix-workflow-failures.e2e.test.ts
npx vitest run test/package-proof/package-layout-proof.test.ts
npm test
git diff --check

Workflows that own the PR-creation boundary must use @agent-relay/github-primitive (createGitHubStep with action createPR, or GitHubClient.createPR) rather than shelling out to gh pr create. The primitive handles runtime selection across local gh, Nango cloud, and relay-cloud proxy and produces structured outputs the orchestrator can chain on. Master/orchestration workflows that complete an end-to-end deliverable should default to including a final-pr step using this primitive. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-08T15:04:07Z

📝 Walkthrough

Walkthrough

This PR introduces two independent, significant features: (1) repair-aware retry hardening for generated workflows, ensuring all workflows emit retry handlers that include repair agents rather than fail-fast, and (2) a comprehensive human-authored product evaluation suite framework with six thematic test suites, four npm CLI scripts for compilation and execution, and detailed specifications covering CLI behavior, generation quality, runtime recovery, surface ingress, workflow authoring, and agent-assistant boundary semantics. Supporting changes include trajectory recording, configuration updates, package dependency bumps, and test proof updates.

Changes

Repair-Aware Retry Hardening

Layer / File(s)	Summary
Shared Constant `src/shared/constants.ts`	`DEFAULT_REPAIR_RETRY_ATTEMPTS = 2` exported for workflow error handler generation.
Rendering Logic `src/product/generation/master-workflow-renderer.ts`, `src/product/generation/template-renderer.ts`	Master and template renderers now generate `onError` handlers that include `repairAgent` and `repairRetries`; master-run child invocation no longer passes `--no-auto-fix`; generated child final validation `failOnError` set non-terminal.
Validation Gate `src/product/generation/pipeline.ts`	Adds `REPAIR_AWARE_RETRY_MISSING` validation check requiring generated artifacts with deterministic gates to include repair-aware retry (retry handler with both `repairAgent` and `repairRetries`).
Auto-fix Runtime `src/local/auto-fix-loop.ts`	`runWithAutoFix` now retries when the repair provider throws (records warning, updates retry metadata); deterministic repair pipeline applies master/child and child-final-validation rewrites.
Tests / Fixtures `src/product/generation/pipeline.test.ts`, `src/local/auto-fix-loop.test.ts`	Updated assertions across workflow shapes to expect repair-aware retry with appropriate `repairAgent` values; new tests validate rejection of artifacts lacking `repairAgent`; auto-fix tests cover retry-on-throw behavior; helper fixtures for legacy master/child content added.

Evaluation Suite Infrastructure

Layer / File(s)	Summary
Evaluation Runners `scripts/evals/compile-ricky-evals.mjs`, `scripts/evals/run-ricky-evals.mjs`, `scripts/evals/summarize-ricky-evals.mjs`, `scripts/evals/compare-ricky-evals.mjs`	CLI scripts: compile markdown eval cases into JSONL, execute evaluations via manual/opencode/ricky-cli executors with redaction/truncation and provider routing, summarize run history, and compare recent runs for regressions.
Framework Documentation `evals/README.md`	Guide to authoring, compiling, and running eval suites; describes executor patterns, run artifact organization, and environment configuration.
NPM Scripts & Dependencies `package.json`	Adds npm scripts (`evals`, `evals:compile`, `evals:opencode`, `evals:list`, `evals:summary`, `evals:compare`); updates `@agent-assistant/turn-context` to `^0.4.31` and adds `@agent-assistant/telemetry` `^0.4.31`.
CLI Behavior Suite `evals/suites/cli-behavior/{cases.md,cases.jsonl,rubric.md}`	Suite covering `--help` surfaces, `version` output, generation defaults, onboarding copy, recovery guidance, and `ricky status` honesty.
Generation Quality Suite `evals/suites/generation-quality/{cases.md,cases.jsonl,rubric.md}`	Suite covering skill matching, tool selection, bounded `--refine`, behavior-grounded gates, pattern selection, skill-boundary copy, and proof requirements.
Runtime Recovery Suite `evals/suites/runtime-recovery/{cases.md,cases.jsonl,rubric.md}`	Suite covering failure classification, stale state detection, run conflicts, bounded auto-fix, no-auto-fix single attempt, in-process execution, escalation, and analytics.
Surfaces Ingress Suite `evals/suites/surfaces-ingress/{cases.md,cases.jsonl,rubric.md}`	Suite covering co-equal surfaces, normalizer boundaries, MCP/Claude context, Cloud API versioning, JSON output, Linear readiness/completion, and provider guidance.
Workflow Authoring Suite `evals/suites/workflow-authoring/{cases.md,cases.jsonl,rubric.md}`	Suite covering deterministic gates, reviewer separation, Cloud fallback, boundary preservation, evidence trails, naming, runtime wrapper shape, env loading, GitHub PR primitive usage, and template auditing.
Agent Assistant Boundary Suite `evals/suites/agent-assistant-boundary/{cases.md,cases.jsonl,rubric.md}`	Suite covering real reuse grounding, envelope preservation, product-core ownership, staged single-slice adoption, and future surface primitives.
Test Proof Updates `test/package-proof/package-layout-proof.ts`, `test/package-proof/package-layout-proof.test.ts`	Expanded canonical package script allowlist to include eval scripts; tests updated to assert README/manifest parity.
Trajectories & Fixtures `.trajectories/**`, `evals/fixtures/transcripts/.gitkeep`	Adds completed trajectory JSON/MD records, updates index.json, and a .gitkeep to preserve eval fixtures directory.
AGENTS guideline `AGENTS.md`	Mandates PR creation via `@agent-relay/github-primitive` instead of `gh pr create` for in-scope workflows.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

AgentWorkforce/ricky#72: Shares AGENTS.md change enforcing GitHub PR primitive usage.
AgentWorkforce/ricky#45: Overlapping auto-fix/deterministic repair logic and tests.
AgentWorkforce/ricky#58: Related deterministic auto-fix pipeline and test updates.

🐰 I nibbled at failures, made retries kind,
Renderers now mend where errors unwind.
Suites and scripts keep quality in sight,
Trails and tests make recovery right.
A rabbit hops on—proofs snug and bright.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title 'Emit repair-aware generated workflows' clearly and specifically describes the main change—making generated workflows use repair-aware retry patterns with repairAgent/repairRetries.
Description check	✅ Passed	The PR description is directly related to the changeset, providing a clear summary of objectives (repair-aware retry implementation, validation rules, coverage expansion, and package proof updates) that align with the file changes shown.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch codex/ricky-workflow-never-fail

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

devin-ai-integration

Devin Review found 1 potential issue.

View 5 additional findings in Devin Review.

devin-ai-integration · 2026-05-08T15:09:37Z

+      "status": "completed",
+      "startedAt": "2026-05-08T14:46:04.652Z",
+      "completedAt": "2026-05-08T14:54:06.222Z",
+      "path": "/Users/khaliqgant/Projects/AgentWorkforce/ricky-workflow-never-fail/.trajectories/completed/2026-05/traj_2f8sfg3akql2.json"


🟡 Trajectory index contains hardcoded absolute local path that breaks on all other machines

The .trajectories/index.json file (newly tracked after removing .trajectories/ from .gitignore) stores the trajectory path as an absolute local filesystem path: /Users/khaliqgant/Projects/AgentWorkforce/ricky-workflow-never-fail/.trajectories/completed/2026-05/traj_2f8sfg3akql2.json. The trail CLI tool (documented in AGENTS.md:86-170) uses this index to resolve trajectory files via commands like trail show <id> and trail list. On any machine other than the original author's, this path is invalid, making the index functionally broken for trajectory resolution. The same absolute path pattern appears in projectId at .trajectories/completed/2026-05/traj_2f8sfg3akql2.json:59.

Suggested change

"path": "/Users/khaliqgant/Projects/AgentWorkforce/ricky-workflow-never-fail/.trajectories/completed/2026-05/traj_2f8sfg3akql2.json"

"path": ".trajectories/completed/2026-05/traj_2f8sfg3akql2.json"

Was this helpful? React with 👍 or 👎 to provide feedback.

coderabbitai

Actionable comments posted: 7

🧹 Nitpick comments (2)

evals/suites/workflow-authoring/cases.md (1)
35-50: ⚡ Quick win

Add operator-plan artifact intake to convention-update case criteria

For this convention-update scenario, add a Must requiring the workflow to read the operator plan artifact when one is provided. That closes a known reliability gap for convention-bound edits.

Based on learnings: “For convention-update workflows, read the operator plan artifact when one is provided before editing the declared convention files.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@evals/suites/workflow-authoring/cases.md` around lines 35 - 50, Update the
convention-update case's "Must" criteria to require reading the operator plan
artifact when present: add a Must entry stating the workflow must read the
operator plan artifact (e.g., from .workflow-artifacts/operator-plan) before
editing the declared convention files; locate the convention-update scenario's
Must list in cases.md and insert this requirement alongside the existing
persistence and reviewer rules so the workflow is required to consume the
operator plan artifact when provided.
evals/suites/generation-quality/cases.jsonl (1)
7-7: ⚡ Quick win

Align validation/fix/rerun loop guidance with the repo rule

Line 7 currently makes dag conditional (“when dependencies matter”). For eval consistency, this should require dag for validation/fix/rerun loops (update cases.md, then regenerate this JSONL).

Based on learnings: “Choose the swarm pattern from the work shape … use dag for validation, fix, and rerun loops.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@evals/suites/generation-quality/cases.jsonl` at line 7, Update the test case
with id "generation-quality.pattern-selection-deliberate" so the "must" array
requires "Use `dag` for validation/fix/rerun loops" unconditionally (remove the
conditional phrasing “when dependencies matter”), then update the corresponding
guidance in cases.md to match this rule and regenerate the JSONL so the file
reflects the tightened requirement; locate the JSON object by its id and edit
the "must" entry and the cases.md text accordingly before regenerating the JSONL
output.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.trajectories/index.json:
- Line 10: The index entry contains an absolute "path" value which breaks
portability; update the "path" field in .trajectories/index.json (the JSON
object with key "path") to a relative path from the repo root (e.g., "
.trajectories/completed/2026-05/traj_2f8sfg3akql2.json") and, if this file is
generated, modify the generator to produce relative paths (use
path.relative(repoRoot, filePath) or equivalent) so all entries use
repository-relative paths instead of user-specific absolute paths.

In `@evals/suites/cli-behavior/cases.md`:
- Line 53: Replace the hard-coded semver prefix in the CLI test pattern (the
line currently matching "^ricky 0\.1\.\d+") with a flexible semver-shaped regex
so the test validates version format rather than a specific minor version;
update the pattern in evals/suites/cli-behavior/cases.md (the line containing
the "^ricky 0\.1\.\d+" string) to something like a general semver match (e.g.
start with "^ricky " followed by "\d+\.\d+\.\d+([-.].+)?$") so pre-release/build
metadata are allowed and future releases don't break the case.

In `@scripts/evals/compare-ricky-evals.mjs`:
- Around line 58-65: The loadRuns function currently crashes when a malformed or
truncated result.json is encountered; modify loadRuns to handle per-file
read/parse errors by wrapping the JSON read/parse for each candidate file in a
try/catch, skip files that throw (optionally log or warn which file was
skipped), and only include successfully parsed run objects in the array before
sorting; keep the existing behavior: look under RUNS_DIR, build paths to
result.json, filter for existence, but protect the JSON.parse step in loadRuns
so a single bad result.json does not abort the whole operation.

In `@scripts/evals/compile-ricky-evals.mjs`:
- Around line 42-47: readOptionValue currently only rejects missing values or
tokens beginning with `--`, so short-option tokens like `-h` get accepted as
values; update the function `readOptionValue(argv, index, option)` to also treat
single-dash short-option tokens as missing values (e.g., detect tokens that
start with `-` but are not negative numbers) and throw the same `${option}
requires a value` error when encountered so `--suite -h` is rejected as a
missing value rather than accepted as `suite="-h"`.

In `@scripts/evals/run-ricky-evals.mjs`:
- Around line 193-205: The parseRickyEvalArgs function currently consumes the
token after '--executor' unconditionally; update it to validate that argv[index
+ 1] exists and is not another flag (e.g., does not start with '-') before
setting executorOverride and skipping the next token; if the next token is
missing or looks like a flag, do not consume it—either push '--executor' into
passthrough (or throw a clear error) and continue scanning. Locate
parseRickyEvalArgs, the executorOverride and passthrough logic, and implement
the token-existence and prefix-check (argv[index+1] &&
!argv[index+1].startsWith('-')) before assigning executorOverride.

In `@scripts/evals/summarize-ricky-evals.mjs`:
- Around line 35-39: The summary pipeline currently calls JSON.parse on every
result.json and will crash on a malformed file; update the chain that builds run
objects (the readdirSync(...).map(...).filter(...).map(...).sort(...)) to catch
JSON.parse/readFileSync errors per file: for each file path produced, read and
parse inside a try/catch (referencing RUNS_DIR, result.json, readFileSync,
JSON.parse, existsSync), log or warn about any malformed file and return null
for failures, then filter out null/undefined results before sorting so one bad
result.json no longer terminates the whole summary command.

In `@src/product/generation/pipeline.ts`:
- Around line 321-327: The check using requiresRepairAwareRetry(content) only
inspects top-level lines and misses .onError(...) blocks embedded inside
serialized child workflows; update the validation to examine all rendered child
sources (or the entire rendered content as a whole) so embedded onError handlers
are validated too. Concretely, locate where issues.push(blockingIssue(...)) is
called (the requiresRepairAwareRetry(content) check) and either (A) iterate over
child workflow sources before they are serialized and call
requiresRepairAwareRetry(childSource) for each, or (B) change
requiresRepairAwareRetry to operate on the full rendered content (not
line-based) and use a multiline/dotall regex to find .onError(...) blocks and
ensure they include repairAgent and repairRetries; apply the same fix to the
other occurrence around the 434-443 area so both top-level and embedded child
workflows are validated.

---

Nitpick comments:
In `@evals/suites/generation-quality/cases.jsonl`:
- Line 7: Update the test case with id
"generation-quality.pattern-selection-deliberate" so the "must" array requires
"Use `dag` for validation/fix/rerun loops" unconditionally (remove the
conditional phrasing “when dependencies matter”), then update the corresponding
guidance in cases.md to match this rule and regenerate the JSONL so the file
reflects the tightened requirement; locate the JSON object by its id and edit
the "must" entry and the cases.md text accordingly before regenerating the JSONL
output.

In `@evals/suites/workflow-authoring/cases.md`:
- Around line 35-50: Update the convention-update case's "Must" criteria to
require reading the operator plan artifact when present: add a Must entry
stating the workflow must read the operator plan artifact (e.g., from
.workflow-artifacts/operator-plan) before editing the declared convention files;
locate the convention-update scenario's Must list in cases.md and insert this
requirement alongside the existing persistence and reviewer rules so the
workflow is required to consume the operator plan artifact when provided.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: a69fdfcb-c5b3-48bf-a16c-737ed4400ca4

📥 Commits

Reviewing files that changed from the base of the PR and between 82abd39 and 800ddc8.

⛔ Files ignored due to path filters (1)

package-lock.json is excluded by !**/package-lock.json

📒 Files selected for processing (38)

.gitignore
.trajectories/completed/2026-05/traj_2f8sfg3akql2.json
.trajectories/completed/2026-05/traj_2f8sfg3akql2.md
.trajectories/index.json
AGENTS.md
README.md
evals/README.md
evals/fixtures/transcripts/.gitkeep
evals/suites/agent-assistant-boundary/cases.jsonl
evals/suites/agent-assistant-boundary/cases.md
evals/suites/agent-assistant-boundary/rubric.md
evals/suites/cli-behavior/cases.jsonl
evals/suites/cli-behavior/cases.md
evals/suites/cli-behavior/rubric.md
evals/suites/generation-quality/cases.jsonl
evals/suites/generation-quality/cases.md
evals/suites/generation-quality/rubric.md
evals/suites/runtime-recovery/cases.jsonl
evals/suites/runtime-recovery/cases.md
evals/suites/runtime-recovery/rubric.md
evals/suites/surfaces-ingress/cases.jsonl
evals/suites/surfaces-ingress/cases.md
evals/suites/surfaces-ingress/rubric.md
evals/suites/workflow-authoring/cases.jsonl
evals/suites/workflow-authoring/cases.md
evals/suites/workflow-authoring/rubric.md
package.json
scripts/evals/compare-ricky-evals.mjs
scripts/evals/compile-ricky-evals.mjs
scripts/evals/run-ricky-evals.mjs
scripts/evals/summarize-ricky-evals.mjs
src/product/generation/master-workflow-renderer.ts
src/product/generation/pipeline.test.ts
src/product/generation/pipeline.ts
src/product/generation/template-renderer.ts
src/shared/constants.ts
test/package-proof/package-layout-proof.test.ts
test/package-proof/package-layout-proof.ts

💤 Files with no reviewable changes (1)

.gitignore

coderabbitai · 2026-05-08T15:11:36Z

+      "status": "completed",
+      "startedAt": "2026-05-08T14:46:04.652Z",
+      "completedAt": "2026-05-08T14:54:06.222Z",
+      "path": "/Users/khaliqgant/Projects/AgentWorkforce/ricky-workflow-never-fail/.trajectories/completed/2026-05/traj_2f8sfg3akql2.json"


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Absolute path in committed index may cause portability issues.

The path field contains an absolute filesystem path (/Users/khaliqgant/Projects/AgentWorkforce/...), which will not work for other developers or in CI environments. Consider using a relative path from the repository root.

🔧 Suggested fix

- "path": "/Users/khaliqgant/Projects/AgentWorkforce/ricky-workflow-never-fail/.trajectories/completed/2026-05/traj_2f8sfg3akql2.json" + "path": ".trajectories/completed/2026-05/traj_2f8sfg3akql2.json"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

"path": "/Users/khaliqgant/Projects/AgentWorkforce/ricky-workflow-never-fail/.trajectories/completed/2026-05/traj_2f8sfg3akql2.json"

"path": ".trajectories/completed/2026-05/traj_2f8sfg3akql2.json"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.trajectories/index.json at line 10, The index entry contains an absolute "path" value which breaks portability; update the "path" field in .trajectories/index.json (the JSON object with key "path") to a relative path from the repo root (e.g., " .trajectories/completed/2026-05/traj_2f8sfg3akql2.json") and, if this file is generated, modify the generator to produce relative paths (use path.relative(repoRoot, filePath) or equivalent) so all entries use repository-relative paths instead of user-specific absolute paths.

coderabbitai · 2026-05-08T15:11:36Z

+### Deterministic Checks
+ok: true
+contentMatches:
+- ^ricky 0\.1\.\d+


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid pinning the version check to 0.1.x

Line 53 hard-codes 0.1, which makes this eval fail on normal package version evolution. Prefer a semver-shape match (for example, ^ricky \d+\.\d+\.\d+([-.].+)?$) so the case validates CLI behavior, not release cadence.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@evals/suites/cli-behavior/cases.md` at line 53, Replace the hard-coded semver prefix in the CLI test pattern (the line currently matching "^ricky 0\.1\.\d+") with a flexible semver-shaped regex so the test validates version format rather than a specific minor version; update the pattern in evals/suites/cli-behavior/cases.md (the line containing the "^ricky 0\.1\.\d+" string) to something like a general semver match (e.g. start with "^ricky " followed by "\d+\.\d+\.\d+([-.].+)?$") so pre-release/build metadata are allowed and future releases don't break the case.

coderabbitai · 2026-05-08T15:11:36Z

+function loadRuns() {
+  if (!existsSync(RUNS_DIR)) return [];
+  return readdirSync(RUNS_DIR)
+    .map((dir) => path.join(RUNS_DIR, dir, 'result.json'))
+    .filter((file) => existsSync(file))
+    .map((file) => JSON.parse(readFileSync(file, 'utf8')))
+    .sort((a, b) => String(b.timestamp).localeCompare(String(a.timestamp)));
+}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Harden run loading against malformed result.json files.

A single truncated/invalid run artifact currently crashes evals:compare; the command should skip bad entries and continue with readable runs.

💡 Proposed fix

function loadRuns() { if (!existsSync(RUNS_DIR)) return []; - return readdirSync(RUNS_DIR) - .map((dir) => path.join(RUNS_DIR, dir, 'result.json')) - .filter((file) => existsSync(file)) - .map((file) => JSON.parse(readFileSync(file, 'utf8'))) - .sort((a, b) => String(b.timestamp).localeCompare(String(a.timestamp))); + const runs = []; + for (const dir of readdirSync(RUNS_DIR)) { + const file = path.join(RUNS_DIR, dir, 'result.json'); + if (!existsSync(file)) continue; + try { + runs.push(JSON.parse(readFileSync(file, 'utf8'))); + } catch (error) { + const message = error instanceof Error ? error.message : String(error); + console.warn(`Skipping unreadable eval run: ${path.relative(ROOT, file)} (${message})`); + } + } + return runs.sort((a, b) => String(b.timestamp).localeCompare(String(a.timestamp))); }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/evals/compare-ricky-evals.mjs` around lines 58 - 65, The loadRuns function currently crashes when a malformed or truncated result.json is encountered; modify loadRuns to handle per-file read/parse errors by wrapping the JSON read/parse for each candidate file in a try/catch, skip files that throw (optionally log or warn which file was skipped), and only include successfully parsed run objects in the array before sorting; keep the existing behavior: look under RUNS_DIR, build paths to result.json, filter for existence, but protect the JSON.parse step in loadRuns so a single bad result.json does not abort the whole operation.

coderabbitai · 2026-05-08T15:11:36Z

+function readOptionValue(argv, index, option) {
+  const value = argv[index];
+  if (value === undefined || value.startsWith('--')) {
+    throw new Error(`${option} requires a value`);
+  }
+  return value;


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reject short-option tokens as missing option values.

--suite -h is currently accepted as suite="-h" instead of being treated as a missing value, which leads to confusing downstream failures.

💡 Proposed fix

function readOptionValue(argv, index, option) { const value = argv[index]; - if (value === undefined || value.startsWith('--')) { + if (value === undefined || value.startsWith('-')) { throw new Error(`${option} requires a value`); } return value; }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/evals/compile-ricky-evals.mjs` around lines 42 - 47, readOptionValue currently only rejects missing values or tokens beginning with `--`, so short-option tokens like `-h` get accepted as values; update the function `readOptionValue(argv, index, option)` to also treat single-dash short-option tokens as missing values (e.g., detect tokens that start with `-` but are not negative numbers) and throw the same `${option} requires a value` error when encountered so `--suite -h` is rejected as a missing value rather than accepted as `suite="-h"`.

coderabbitai · 2026-05-08T15:11:36Z

+function parseRickyEvalArgs(argv) {
+  const passthrough = [];
+  let executorOverride;
+  for (let index = 0; index < argv.length; index += 1) {
+    const arg = argv[index];
+    if (arg === '--executor') {
+      executorOverride = argv[index + 1];
+      index += 1;
+      continue;
+    }
+    passthrough.push(arg);
+  }
+  return { argv: passthrough, executorOverride };


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate --executor value before consuming the next token.

The parser currently accepts missing/flag-like values for --executor, which can swallow the next CLI option (for example --suite) and change behavior silently.

💡 Proposed fix

function parseRickyEvalArgs(argv) { const passthrough = []; let executorOverride; for (let index = 0; index < argv.length; index += 1) { const arg = argv[index]; if (arg === '--executor') { - executorOverride = argv[index + 1]; + const value = argv[index + 1]; + if (!value || value.startsWith('-')) { + throw new Error('--executor requires a value'); + } + executorOverride = value; index += 1; continue; } passthrough.push(arg); } return { argv: passthrough, executorOverride }; }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

function parseRickyEvalArgs(argv) {

const passthrough = [];

let executorOverride;

for (let index = 0; index < argv.length; index += 1) {

const arg = argv[index];

if (arg === '--executor') {

executorOverride = argv[index + 1];

index += 1;

continue;

}

passthrough.push(arg);

}

return { argv: passthrough, executorOverride };

function parseRickyEvalArgs(argv) {

const passthrough = [];

let executorOverride;

for (let index = 0; index < argv.length; index += 1) {

const arg = argv[index];

if (arg === '--executor') {

const value = argv[index + 1];

if (!value || value.startsWith('-')) {

throw new Error('--executor requires a value');

}

executorOverride = value;

index += 1;

continue;

}

passthrough.push(arg);

}

return { argv: passthrough, executorOverride };

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/evals/run-ricky-evals.mjs` around lines 193 - 205, The parseRickyEvalArgs function currently consumes the token after '--executor' unconditionally; update it to validate that argv[index + 1] exists and is not another flag (e.g., does not start with '-') before setting executorOverride and skipping the next token; if the next token is missing or looks like a flag, do not consume it—either push '--executor' into passthrough (or throw a clear error) and continue scanning. Locate parseRickyEvalArgs, the executorOverride and passthrough logic, and implement the token-existence and prefix-check (argv[index+1] && !argv[index+1].startsWith('-')) before assigning executorOverride.

coderabbitai · 2026-05-08T15:11:36Z

+  return readdirSync(RUNS_DIR)
+    .map((dir) => path.join(RUNS_DIR, dir, 'result.json'))
+    .filter((file) => existsSync(file))
+    .map((file) => JSON.parse(readFileSync(file, 'utf8')))
+    .sort((a, b) => String(b.timestamp).localeCompare(String(a.timestamp)));


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Handle malformed run files without crashing the summary command.

At Line 38, JSON.parse can throw on one bad result.json and terminate the entire report, even if other runs are valid.

Proposed fix

function loadRuns() { if (!existsSync(RUNS_DIR)) return []; return readdirSync(RUNS_DIR) .map((dir) => path.join(RUNS_DIR, dir, 'result.json')) .filter((file) => existsSync(file)) - .map((file) => JSON.parse(readFileSync(file, 'utf8'))) + .flatMap((file) => { + try { + return [JSON.parse(readFileSync(file, 'utf8'))]; + } catch { + console.warn(`Skipping invalid eval run file: ${path.relative(ROOT, file)}`); + return []; + } + }) .sort((a, b) => String(b.timestamp).localeCompare(String(a.timestamp))); }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/evals/summarize-ricky-evals.mjs` around lines 35 - 39, The summary pipeline currently calls JSON.parse on every result.json and will crash on a malformed file; update the chain that builds run objects (the readdirSync(...).map(...).filter(...).map(...).sort(...)) to catch JSON.parse/readFileSync errors per file: for each file path produced, read and parse inside a try/catch (referencing RUNS_DIR, result.json, readFileSync, JSON.parse, existsSync), log or warn about any malformed file and return null for failures, then filter out null/undefined results before sorting so one bad result.json no longer terminates the whole summary command.

coderabbitai · 2026-05-08T15:11:36Z

+  if (requiresRepairAwareRetry(content)) {
+    issues.push(blockingIssue(
+      'validation',
+      'REPAIR_AWARE_RETRY_MISSING',
+      'Rendered workflow must use retry error handling with repairAgent and repairRetries so repairable deterministic gates do not fail the workflow outright.',
+    ));
+  }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate embedded child .onError() blocks, not just top-level lines.

Master artifacts serialize child workflows into JSON strings, so those repair policies never start a physical line in the outer artifact. This line-based scan only validates the outer master workflow, which means a child workflow can regress to fail-fast or plain retry and still avoid REPAIR_AWARE_RETRY_MISSING. Matching .onError(...) across the full rendered content, or validating each child source before serialization, would close that gap and also make the check less brittle to multiline property formatting.

Also applies to: 434-443

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/product/generation/pipeline.ts` around lines 321 - 327, The check using requiresRepairAwareRetry(content) only inspects top-level lines and misses .onError(...) blocks embedded inside serialized child workflows; update the validation to examine all rendered child sources (or the entire rendered content as a whole) so embedded onError handlers are validated too. Concretely, locate where issues.push(blockingIssue(...)) is called (the requiresRepairAwareRetry(content) check) and either (A) iterate over child workflow sources before they are serialized and call requiresRepairAwareRetry(childSource) for each, or (B) change requiresRepairAwareRetry to operate on the full rendered content (not line-based) and use a multiline/dotall regex to find .onError(...) blocks and ensure they include repairAgent and repairRetries; apply the same fix to the other occurrence around the 434-443 area so both top-level and embedded child workflows are validated.

coderabbitai

🧹 Nitpick comments (1)

src/local/auto-fix-loop.ts (1)

1016-1023: ⚡ Quick win

Consider using shared constants for repair-aware retry configuration.

The repair configuration at line 1021 uses hardcoded values (maxRetries: 2, retryDelayMs: 1000, repairRetries: 2), while master-workflow-renderer.ts uses imported constants (DEFAULT_RETRY_MAX_ATTEMPTS, DEFAULT_RETRY_BACKOFF_MS, DEFAULT_REPAIR_RETRY_ATTEMPTS). This divergence can lead to inconsistent behavior if constants change.

♻️ Suggested fix

+import {
+  DEFAULT_REPAIR_RETRY_ATTEMPTS,
+  DEFAULT_RETRY_BACKOFF_MS,
+  DEFAULT_RETRY_MAX_ATTEMPTS,
+} from '../shared/constants.js';

 // ... in repairMasterChildRunRepairLoop:
       changes.push('replaced fail-fast error handling with repair-aware retry');
       const indent = match.match(/^\s*/)?.[0] ?? '';
-      return `${indent}.onError('retry', { maxRetries: 2, retryDelayMs: 1000, repairAgent: "master-lead", repairRetries: 2 })`;
+      return `${indent}.onError('retry', { maxRetries: ${DEFAULT_RETRY_MAX_ATTEMPTS}, retryDelayMs: ${DEFAULT_RETRY_BACKOFF_MS}, repairAgent: "master-lead", repairRetries: ${DEFAULT_REPAIR_RETRY_ATTEMPTS} })`;

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/local/auto-fix-loop.ts` around lines 1016 - 1023, Replace the hardcoded
retry/repair numbers in the .onError replacement with the shared constants used
by master-workflow-renderer.ts: import DEFAULT_RETRY_MAX_ATTEMPTS,
DEFAULT_RETRY_BACKOFF_MS, and DEFAULT_REPAIR_RETRY_ATTEMPTS at the top of
src/local/auto-fix-loop.ts, then use those constants instead of 2, 1000, and 2
in the returned string inside the next.replace callback (the callback that
currently pushes 'replaced fail-fast error handling with repair-aware retry');
this keeps retry, backoff and repair retry values consistent across modules.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/local/auto-fix-loop.ts`:
- Around line 1016-1023: Replace the hardcoded retry/repair numbers in the
.onError replacement with the shared constants used by
master-workflow-renderer.ts: import DEFAULT_RETRY_MAX_ATTEMPTS,
DEFAULT_RETRY_BACKOFF_MS, and DEFAULT_REPAIR_RETRY_ATTEMPTS at the top of
src/local/auto-fix-loop.ts, then use those constants instead of 2, 1000, and 2
in the returned string inside the next.replace callback (the callback that
currently pushes 'replaced fail-fast error handling with repair-aware retry');
this keeps retry, backoff and repair retry values consistent across modules.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: a6c16171-c532-40a6-8872-79cc0c1ec53d

📥 Commits

Reviewing files that changed from the base of the PR and between 800ddc8 and d800f6e.

📒 Files selected for processing (7)

.trajectories/completed/2026-05/traj_b77ch3xz99r8.json
.trajectories/completed/2026-05/traj_b77ch3xz99r8.md
.trajectories/index.json
src/local/auto-fix-loop.test.ts
src/local/auto-fix-loop.ts
src/product/generation/master-workflow-renderer.ts
src/product/generation/pipeline.test.ts

✅ Files skipped from review due to trivial changes (2)

.trajectories/completed/2026-05/traj_b77ch3xz99r8.json
.trajectories/completed/2026-05/traj_b77ch3xz99r8.md

🚧 Files skipped from review as they are similar to previous changes (1)

src/product/generation/pipeline.test.ts

khaliqgant and others added 5 commits May 8, 2026 15:33

test(evals): add Ricky doc-derived eval suite

d291798

test(evals): add local opencode provider path

39f581e

test(evals): simplify opencode eval command

3354938

feat(generation): emit repair-aware workflows

800ddc8

devin-ai-integration Bot reviewed May 8, 2026

View reviewed changes

coderabbitai Bot reviewed May 8, 2026

View reviewed changes

fix(generation): keep master child repairs alive

d800f6e

coderabbitai Bot reviewed May 8, 2026

View reviewed changes

khaliqgant merged commit 45ae3cf into main May 8, 2026
1 check passed

khaliqgant deleted the codex/ricky-workflow-never-fail branch May 8, 2026 15:46

khaliqgant mentioned this pull request May 8, 2026

test: add generated workflow reliability contract #77

Open

4 tasks

coderabbitai Bot mentioned this pull request May 8, 2026

ci: run Ricky provider evals on PRs #78

Merged

	"path": "/Users/khaliqgant/Projects/AgentWorkforce/ricky-workflow-never-fail/.trajectories/completed/2026-05/traj_2f8sfg3akql2.json"
	"path": ".trajectories/completed/2026-05/traj_2f8sfg3akql2.json"

Conversation

khaliqgant commented May 8, 2026

Summary

Related

Verification

Uh oh!

coderabbitai Bot commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

Review ran into problems

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented May 8, 2026 •

edited

Loading