Skip to content

Emit repair-aware generated workflows#75

Merged
khaliqgant merged 6 commits intomainfrom
codex/ricky-workflow-never-fail
May 8, 2026
Merged

Emit repair-aware generated workflows#75
khaliqgant merged 6 commits intomainfrom
codex/ricky-workflow-never-fail

Conversation

@khaliqgant
Copy link
Copy Markdown
Member

Summary

  • make Ricky-generated ordinary, master, and child workflows opt into repair-aware retry with repairAgent/repairRetries
  • reject generated workflows that still use live fail-fast or retry without repair metadata
  • extend generator coverage across code, doc/spec, low-risk pipeline, master, and child workflow shapes
  • keep the local auto-fix failure ladder passing and sync package proof/docs for existing eval scripts

Related

Verification

  • npm run typecheck
  • npx vitest run src/product/generation/pipeline.test.ts
  • npx vitest run src/local/auto-fix-loop.test.ts
  • npx vitest run test/local-auto-fix-workflow-failures.e2e.test.ts
  • npx vitest run test/package-proof/package-layout-proof.test.ts
  • npm test
  • git diff --check

khaliqgant and others added 5 commits May 8, 2026 15:33
Workflows that own the PR-creation boundary must use
@agent-relay/github-primitive (createGitHubStep with action createPR, or
GitHubClient.createPR) rather than shelling out to gh pr create. The
primitive handles runtime selection across local gh, Nango cloud, and
relay-cloud proxy and produces structured outputs the orchestrator can
chain on. Master/orchestration workflows that complete an end-to-end
deliverable should default to including a final-pr step using this
primitive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 8, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR introduces two independent, significant features: (1) repair-aware retry hardening for generated workflows, ensuring all workflows emit retry handlers that include repair agents rather than fail-fast, and (2) a comprehensive human-authored product evaluation suite framework with six thematic test suites, four npm CLI scripts for compilation and execution, and detailed specifications covering CLI behavior, generation quality, runtime recovery, surface ingress, workflow authoring, and agent-assistant boundary semantics. Supporting changes include trajectory recording, configuration updates, package dependency bumps, and test proof updates.

Changes

Repair-Aware Retry Hardening

Layer / File(s) Summary
Shared Constant
src/shared/constants.ts
DEFAULT_REPAIR_RETRY_ATTEMPTS = 2 exported for workflow error handler generation.
Rendering Logic
src/product/generation/master-workflow-renderer.ts, src/product/generation/template-renderer.ts
Master and template renderers now generate onError handlers that include repairAgent and repairRetries; master-run child invocation no longer passes --no-auto-fix; generated child final validation failOnError set non-terminal.
Validation Gate
src/product/generation/pipeline.ts
Adds REPAIR_AWARE_RETRY_MISSING validation check requiring generated artifacts with deterministic gates to include repair-aware retry (retry handler with both repairAgent and repairRetries).
Auto-fix Runtime
src/local/auto-fix-loop.ts
runWithAutoFix now retries when the repair provider throws (records warning, updates retry metadata); deterministic repair pipeline applies master/child and child-final-validation rewrites.
Tests / Fixtures
src/product/generation/pipeline.test.ts, src/local/auto-fix-loop.test.ts
Updated assertions across workflow shapes to expect repair-aware retry with appropriate repairAgent values; new tests validate rejection of artifacts lacking repairAgent; auto-fix tests cover retry-on-throw behavior; helper fixtures for legacy master/child content added.

Evaluation Suite Infrastructure

Layer / File(s) Summary
Evaluation Runners
scripts/evals/compile-ricky-evals.mjs, scripts/evals/run-ricky-evals.mjs, scripts/evals/summarize-ricky-evals.mjs, scripts/evals/compare-ricky-evals.mjs
CLI scripts: compile markdown eval cases into JSONL, execute evaluations via manual/opencode/ricky-cli executors with redaction/truncation and provider routing, summarize run history, and compare recent runs for regressions.
Framework Documentation
evals/README.md
Guide to authoring, compiling, and running eval suites; describes executor patterns, run artifact organization, and environment configuration.
NPM Scripts & Dependencies
package.json
Adds npm scripts (evals, evals:compile, evals:opencode, evals:list, evals:summary, evals:compare); updates @agent-assistant/turn-context to ^0.4.31 and adds @agent-assistant/telemetry ^0.4.31.
CLI Behavior Suite
evals/suites/cli-behavior/{cases.md,cases.jsonl,rubric.md}
Suite covering --help surfaces, version output, generation defaults, onboarding copy, recovery guidance, and ricky status honesty.
Generation Quality Suite
evals/suites/generation-quality/{cases.md,cases.jsonl,rubric.md}
Suite covering skill matching, tool selection, bounded --refine, behavior-grounded gates, pattern selection, skill-boundary copy, and proof requirements.
Runtime Recovery Suite
evals/suites/runtime-recovery/{cases.md,cases.jsonl,rubric.md}
Suite covering failure classification, stale state detection, run conflicts, bounded auto-fix, no-auto-fix single attempt, in-process execution, escalation, and analytics.
Surfaces Ingress Suite
evals/suites/surfaces-ingress/{cases.md,cases.jsonl,rubric.md}
Suite covering co-equal surfaces, normalizer boundaries, MCP/Claude context, Cloud API versioning, JSON output, Linear readiness/completion, and provider guidance.
Workflow Authoring Suite
evals/suites/workflow-authoring/{cases.md,cases.jsonl,rubric.md}
Suite covering deterministic gates, reviewer separation, Cloud fallback, boundary preservation, evidence trails, naming, runtime wrapper shape, env loading, GitHub PR primitive usage, and template auditing.
Agent Assistant Boundary Suite
evals/suites/agent-assistant-boundary/{cases.md,cases.jsonl,rubric.md}
Suite covering real reuse grounding, envelope preservation, product-core ownership, staged single-slice adoption, and future surface primitives.
Test Proof Updates
test/package-proof/package-layout-proof.ts, test/package-proof/package-layout-proof.test.ts
Expanded canonical package script allowlist to include eval scripts; tests updated to assert README/manifest parity.
Trajectories & Fixtures
.trajectories/**, evals/fixtures/transcripts/.gitkeep
Adds completed trajectory JSON/MD records, updates index.json, and a .gitkeep to preserve eval fixtures directory.
AGENTS guideline
AGENTS.md
Mandates PR creation via @agent-relay/github-primitive instead of gh pr create for in-scope workflows.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

🐰 I nibbled at failures, made retries kind,
Renderers now mend where errors unwind.
Suites and scripts keep quality in sight,
Trails and tests make recovery right.
A rabbit hops on—proofs snug and bright.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title 'Emit repair-aware generated workflows' clearly and specifically describes the main change—making generated workflows use repair-aware retry patterns with repairAgent/repairRetries.
Description check ✅ Passed The PR description is directly related to the changeset, providing a clear summary of objectives (repair-aware retry implementation, validation rules, coverage expansion, and package proof updates) that align with the file changes shown.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/ricky-workflow-never-fail

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

View 5 additional findings in Devin Review.

Open in Devin Review

Comment thread .trajectories/index.json
"status": "completed",
"startedAt": "2026-05-08T14:46:04.652Z",
"completedAt": "2026-05-08T14:54:06.222Z",
"path": "/Users/khaliqgant/Projects/AgentWorkforce/ricky-workflow-never-fail/.trajectories/completed/2026-05/traj_2f8sfg3akql2.json"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Trajectory index contains hardcoded absolute local path that breaks on all other machines

The .trajectories/index.json file (newly tracked after removing .trajectories/ from .gitignore) stores the trajectory path as an absolute local filesystem path: /Users/khaliqgant/Projects/AgentWorkforce/ricky-workflow-never-fail/.trajectories/completed/2026-05/traj_2f8sfg3akql2.json. The trail CLI tool (documented in AGENTS.md:86-170) uses this index to resolve trajectory files via commands like trail show <id> and trail list. On any machine other than the original author's, this path is invalid, making the index functionally broken for trajectory resolution. The same absolute path pattern appears in projectId at .trajectories/completed/2026-05/traj_2f8sfg3akql2.json:59.

Suggested change
"path": "/Users/khaliqgant/Projects/AgentWorkforce/ricky-workflow-never-fail/.trajectories/completed/2026-05/traj_2f8sfg3akql2.json"
"path": ".trajectories/completed/2026-05/traj_2f8sfg3akql2.json"
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (2)
evals/suites/workflow-authoring/cases.md (1)

35-50: ⚡ Quick win

Add operator-plan artifact intake to convention-update case criteria

For this convention-update scenario, add a Must requiring the workflow to read the operator plan artifact when one is provided. That closes a known reliability gap for convention-bound edits.

Based on learnings: “For convention-update workflows, read the operator plan artifact when one is provided before editing the declared convention files.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@evals/suites/workflow-authoring/cases.md` around lines 35 - 50, Update the
convention-update case's "Must" criteria to require reading the operator plan
artifact when present: add a Must entry stating the workflow must read the
operator plan artifact (e.g., from .workflow-artifacts/operator-plan) before
editing the declared convention files; locate the convention-update scenario's
Must list in cases.md and insert this requirement alongside the existing
persistence and reviewer rules so the workflow is required to consume the
operator plan artifact when provided.
evals/suites/generation-quality/cases.jsonl (1)

7-7: ⚡ Quick win

Align validation/fix/rerun loop guidance with the repo rule

Line 7 currently makes dag conditional (“when dependencies matter”). For eval consistency, this should require dag for validation/fix/rerun loops (update cases.md, then regenerate this JSONL).

Based on learnings: “Choose the swarm pattern from the work shape … use dag for validation, fix, and rerun loops.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@evals/suites/generation-quality/cases.jsonl` at line 7, Update the test case
with id "generation-quality.pattern-selection-deliberate" so the "must" array
requires "Use `dag` for validation/fix/rerun loops" unconditionally (remove the
conditional phrasing “when dependencies matter”), then update the corresponding
guidance in cases.md to match this rule and regenerate the JSONL so the file
reflects the tightened requirement; locate the JSON object by its id and edit
the "must" entry and the cases.md text accordingly before regenerating the JSONL
output.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.trajectories/index.json:
- Line 10: The index entry contains an absolute "path" value which breaks
portability; update the "path" field in .trajectories/index.json (the JSON
object with key "path") to a relative path from the repo root (e.g., "
.trajectories/completed/2026-05/traj_2f8sfg3akql2.json") and, if this file is
generated, modify the generator to produce relative paths (use
path.relative(repoRoot, filePath) or equivalent) so all entries use
repository-relative paths instead of user-specific absolute paths.

In `@evals/suites/cli-behavior/cases.md`:
- Line 53: Replace the hard-coded semver prefix in the CLI test pattern (the
line currently matching "^ricky 0\.1\.\d+") with a flexible semver-shaped regex
so the test validates version format rather than a specific minor version;
update the pattern in evals/suites/cli-behavior/cases.md (the line containing
the "^ricky 0\.1\.\d+" string) to something like a general semver match (e.g.
start with "^ricky " followed by "\d+\.\d+\.\d+([-.].+)?$") so pre-release/build
metadata are allowed and future releases don't break the case.

In `@scripts/evals/compare-ricky-evals.mjs`:
- Around line 58-65: The loadRuns function currently crashes when a malformed or
truncated result.json is encountered; modify loadRuns to handle per-file
read/parse errors by wrapping the JSON read/parse for each candidate file in a
try/catch, skip files that throw (optionally log or warn which file was
skipped), and only include successfully parsed run objects in the array before
sorting; keep the existing behavior: look under RUNS_DIR, build paths to
result.json, filter for existence, but protect the JSON.parse step in loadRuns
so a single bad result.json does not abort the whole operation.

In `@scripts/evals/compile-ricky-evals.mjs`:
- Around line 42-47: readOptionValue currently only rejects missing values or
tokens beginning with `--`, so short-option tokens like `-h` get accepted as
values; update the function `readOptionValue(argv, index, option)` to also treat
single-dash short-option tokens as missing values (e.g., detect tokens that
start with `-` but are not negative numbers) and throw the same `${option}
requires a value` error when encountered so `--suite -h` is rejected as a
missing value rather than accepted as `suite="-h"`.

In `@scripts/evals/run-ricky-evals.mjs`:
- Around line 193-205: The parseRickyEvalArgs function currently consumes the
token after '--executor' unconditionally; update it to validate that argv[index
+ 1] exists and is not another flag (e.g., does not start with '-') before
setting executorOverride and skipping the next token; if the next token is
missing or looks like a flag, do not consume it—either push '--executor' into
passthrough (or throw a clear error) and continue scanning. Locate
parseRickyEvalArgs, the executorOverride and passthrough logic, and implement
the token-existence and prefix-check (argv[index+1] &&
!argv[index+1].startsWith('-')) before assigning executorOverride.

In `@scripts/evals/summarize-ricky-evals.mjs`:
- Around line 35-39: The summary pipeline currently calls JSON.parse on every
result.json and will crash on a malformed file; update the chain that builds run
objects (the readdirSync(...).map(...).filter(...).map(...).sort(...)) to catch
JSON.parse/readFileSync errors per file: for each file path produced, read and
parse inside a try/catch (referencing RUNS_DIR, result.json, readFileSync,
JSON.parse, existsSync), log or warn about any malformed file and return null
for failures, then filter out null/undefined results before sorting so one bad
result.json no longer terminates the whole summary command.

In `@src/product/generation/pipeline.ts`:
- Around line 321-327: The check using requiresRepairAwareRetry(content) only
inspects top-level lines and misses .onError(...) blocks embedded inside
serialized child workflows; update the validation to examine all rendered child
sources (or the entire rendered content as a whole) so embedded onError handlers
are validated too. Concretely, locate where issues.push(blockingIssue(...)) is
called (the requiresRepairAwareRetry(content) check) and either (A) iterate over
child workflow sources before they are serialized and call
requiresRepairAwareRetry(childSource) for each, or (B) change
requiresRepairAwareRetry to operate on the full rendered content (not
line-based) and use a multiline/dotall regex to find .onError(...) blocks and
ensure they include repairAgent and repairRetries; apply the same fix to the
other occurrence around the 434-443 area so both top-level and embedded child
workflows are validated.

---

Nitpick comments:
In `@evals/suites/generation-quality/cases.jsonl`:
- Line 7: Update the test case with id
"generation-quality.pattern-selection-deliberate" so the "must" array requires
"Use `dag` for validation/fix/rerun loops" unconditionally (remove the
conditional phrasing “when dependencies matter”), then update the corresponding
guidance in cases.md to match this rule and regenerate the JSONL so the file
reflects the tightened requirement; locate the JSON object by its id and edit
the "must" entry and the cases.md text accordingly before regenerating the JSONL
output.

In `@evals/suites/workflow-authoring/cases.md`:
- Around line 35-50: Update the convention-update case's "Must" criteria to
require reading the operator plan artifact when present: add a Must entry
stating the workflow must read the operator plan artifact (e.g., from
.workflow-artifacts/operator-plan) before editing the declared convention files;
locate the convention-update scenario's Must list in cases.md and insert this
requirement alongside the existing persistence and reviewer rules so the
workflow is required to consume the operator plan artifact when provided.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: a69fdfcb-c5b3-48bf-a16c-737ed4400ca4

📥 Commits

Reviewing files that changed from the base of the PR and between 82abd39 and 800ddc8.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (38)
  • .gitignore
  • .trajectories/completed/2026-05/traj_2f8sfg3akql2.json
  • .trajectories/completed/2026-05/traj_2f8sfg3akql2.md
  • .trajectories/index.json
  • AGENTS.md
  • README.md
  • evals/README.md
  • evals/fixtures/transcripts/.gitkeep
  • evals/suites/agent-assistant-boundary/cases.jsonl
  • evals/suites/agent-assistant-boundary/cases.md
  • evals/suites/agent-assistant-boundary/rubric.md
  • evals/suites/cli-behavior/cases.jsonl
  • evals/suites/cli-behavior/cases.md
  • evals/suites/cli-behavior/rubric.md
  • evals/suites/generation-quality/cases.jsonl
  • evals/suites/generation-quality/cases.md
  • evals/suites/generation-quality/rubric.md
  • evals/suites/runtime-recovery/cases.jsonl
  • evals/suites/runtime-recovery/cases.md
  • evals/suites/runtime-recovery/rubric.md
  • evals/suites/surfaces-ingress/cases.jsonl
  • evals/suites/surfaces-ingress/cases.md
  • evals/suites/surfaces-ingress/rubric.md
  • evals/suites/workflow-authoring/cases.jsonl
  • evals/suites/workflow-authoring/cases.md
  • evals/suites/workflow-authoring/rubric.md
  • package.json
  • scripts/evals/compare-ricky-evals.mjs
  • scripts/evals/compile-ricky-evals.mjs
  • scripts/evals/run-ricky-evals.mjs
  • scripts/evals/summarize-ricky-evals.mjs
  • src/product/generation/master-workflow-renderer.ts
  • src/product/generation/pipeline.test.ts
  • src/product/generation/pipeline.ts
  • src/product/generation/template-renderer.ts
  • src/shared/constants.ts
  • test/package-proof/package-layout-proof.test.ts
  • test/package-proof/package-layout-proof.ts
💤 Files with no reviewable changes (1)
  • .gitignore

Comment thread .trajectories/index.json
"status": "completed",
"startedAt": "2026-05-08T14:46:04.652Z",
"completedAt": "2026-05-08T14:54:06.222Z",
"path": "/Users/khaliqgant/Projects/AgentWorkforce/ricky-workflow-never-fail/.trajectories/completed/2026-05/traj_2f8sfg3akql2.json"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Absolute path in committed index may cause portability issues.

The path field contains an absolute filesystem path (/Users/khaliqgant/Projects/AgentWorkforce/...), which will not work for other developers or in CI environments. Consider using a relative path from the repository root.

🔧 Suggested fix
-      "path": "/Users/khaliqgant/Projects/AgentWorkforce/ricky-workflow-never-fail/.trajectories/completed/2026-05/traj_2f8sfg3akql2.json"
+      "path": ".trajectories/completed/2026-05/traj_2f8sfg3akql2.json"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"path": "/Users/khaliqgant/Projects/AgentWorkforce/ricky-workflow-never-fail/.trajectories/completed/2026-05/traj_2f8sfg3akql2.json"
"path": ".trajectories/completed/2026-05/traj_2f8sfg3akql2.json"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.trajectories/index.json at line 10, The index entry contains an absolute
"path" value which breaks portability; update the "path" field in
.trajectories/index.json (the JSON object with key "path") to a relative path
from the repo root (e.g., "
.trajectories/completed/2026-05/traj_2f8sfg3akql2.json") and, if this file is
generated, modify the generator to produce relative paths (use
path.relative(repoRoot, filePath) or equivalent) so all entries use
repository-relative paths instead of user-specific absolute paths.

### Deterministic Checks
ok: true
contentMatches:
- ^ricky 0\.1\.\d+
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid pinning the version check to 0.1.x

Line 53 hard-codes 0.1, which makes this eval fail on normal package version evolution. Prefer a semver-shape match (for example, ^ricky \d+\.\d+\.\d+([-.].+)?$) so the case validates CLI behavior, not release cadence.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@evals/suites/cli-behavior/cases.md` at line 53, Replace the hard-coded semver
prefix in the CLI test pattern (the line currently matching "^ricky 0\.1\.\d+")
with a flexible semver-shaped regex so the test validates version format rather
than a specific minor version; update the pattern in
evals/suites/cli-behavior/cases.md (the line containing the "^ricky 0\.1\.\d+"
string) to something like a general semver match (e.g. start with "^ricky "
followed by "\d+\.\d+\.\d+([-.].+)?$") so pre-release/build metadata are allowed
and future releases don't break the case.

Comment on lines +58 to +65
function loadRuns() {
if (!existsSync(RUNS_DIR)) return [];
return readdirSync(RUNS_DIR)
.map((dir) => path.join(RUNS_DIR, dir, 'result.json'))
.filter((file) => existsSync(file))
.map((file) => JSON.parse(readFileSync(file, 'utf8')))
.sort((a, b) => String(b.timestamp).localeCompare(String(a.timestamp)));
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Harden run loading against malformed result.json files.

A single truncated/invalid run artifact currently crashes evals:compare; the command should skip bad entries and continue with readable runs.

💡 Proposed fix
 function loadRuns() {
   if (!existsSync(RUNS_DIR)) return [];
-  return readdirSync(RUNS_DIR)
-    .map((dir) => path.join(RUNS_DIR, dir, 'result.json'))
-    .filter((file) => existsSync(file))
-    .map((file) => JSON.parse(readFileSync(file, 'utf8')))
-    .sort((a, b) => String(b.timestamp).localeCompare(String(a.timestamp)));
+  const runs = [];
+  for (const dir of readdirSync(RUNS_DIR)) {
+    const file = path.join(RUNS_DIR, dir, 'result.json');
+    if (!existsSync(file)) continue;
+    try {
+      runs.push(JSON.parse(readFileSync(file, 'utf8')));
+    } catch (error) {
+      const message = error instanceof Error ? error.message : String(error);
+      console.warn(`Skipping unreadable eval run: ${path.relative(ROOT, file)} (${message})`);
+    }
+  }
+  return runs.sort((a, b) => String(b.timestamp).localeCompare(String(a.timestamp)));
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/evals/compare-ricky-evals.mjs` around lines 58 - 65, The loadRuns
function currently crashes when a malformed or truncated result.json is
encountered; modify loadRuns to handle per-file read/parse errors by wrapping
the JSON read/parse for each candidate file in a try/catch, skip files that
throw (optionally log or warn which file was skipped), and only include
successfully parsed run objects in the array before sorting; keep the existing
behavior: look under RUNS_DIR, build paths to result.json, filter for existence,
but protect the JSON.parse step in loadRuns so a single bad result.json does not
abort the whole operation.

Comment on lines +42 to +47
function readOptionValue(argv, index, option) {
const value = argv[index];
if (value === undefined || value.startsWith('--')) {
throw new Error(`${option} requires a value`);
}
return value;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reject short-option tokens as missing option values.

--suite -h is currently accepted as suite="-h" instead of being treated as a missing value, which leads to confusing downstream failures.

💡 Proposed fix
 function readOptionValue(argv, index, option) {
   const value = argv[index];
-  if (value === undefined || value.startsWith('--')) {
+  if (value === undefined || value.startsWith('-')) {
     throw new Error(`${option} requires a value`);
   }
   return value;
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/evals/compile-ricky-evals.mjs` around lines 42 - 47, readOptionValue
currently only rejects missing values or tokens beginning with `--`, so
short-option tokens like `-h` get accepted as values; update the function
`readOptionValue(argv, index, option)` to also treat single-dash short-option
tokens as missing values (e.g., detect tokens that start with `-` but are not
negative numbers) and throw the same `${option} requires a value` error when
encountered so `--suite -h` is rejected as a missing value rather than accepted
as `suite="-h"`.

Comment on lines +193 to +205
function parseRickyEvalArgs(argv) {
const passthrough = [];
let executorOverride;
for (let index = 0; index < argv.length; index += 1) {
const arg = argv[index];
if (arg === '--executor') {
executorOverride = argv[index + 1];
index += 1;
continue;
}
passthrough.push(arg);
}
return { argv: passthrough, executorOverride };
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate --executor value before consuming the next token.

The parser currently accepts missing/flag-like values for --executor, which can swallow the next CLI option (for example --suite) and change behavior silently.

💡 Proposed fix
 function parseRickyEvalArgs(argv) {
   const passthrough = [];
   let executorOverride;
   for (let index = 0; index < argv.length; index += 1) {
     const arg = argv[index];
     if (arg === '--executor') {
-      executorOverride = argv[index + 1];
+      const value = argv[index + 1];
+      if (!value || value.startsWith('-')) {
+        throw new Error('--executor requires a value');
+      }
+      executorOverride = value;
       index += 1;
       continue;
     }
     passthrough.push(arg);
   }
   return { argv: passthrough, executorOverride };
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function parseRickyEvalArgs(argv) {
const passthrough = [];
let executorOverride;
for (let index = 0; index < argv.length; index += 1) {
const arg = argv[index];
if (arg === '--executor') {
executorOverride = argv[index + 1];
index += 1;
continue;
}
passthrough.push(arg);
}
return { argv: passthrough, executorOverride };
function parseRickyEvalArgs(argv) {
const passthrough = [];
let executorOverride;
for (let index = 0; index < argv.length; index += 1) {
const arg = argv[index];
if (arg === '--executor') {
const value = argv[index + 1];
if (!value || value.startsWith('-')) {
throw new Error('--executor requires a value');
}
executorOverride = value;
index += 1;
continue;
}
passthrough.push(arg);
}
return { argv: passthrough, executorOverride };
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/evals/run-ricky-evals.mjs` around lines 193 - 205, The
parseRickyEvalArgs function currently consumes the token after '--executor'
unconditionally; update it to validate that argv[index + 1] exists and is not
another flag (e.g., does not start with '-') before setting executorOverride and
skipping the next token; if the next token is missing or looks like a flag, do
not consume it—either push '--executor' into passthrough (or throw a clear
error) and continue scanning. Locate parseRickyEvalArgs, the executorOverride
and passthrough logic, and implement the token-existence and prefix-check
(argv[index+1] && !argv[index+1].startsWith('-')) before assigning
executorOverride.

Comment on lines +35 to +39
return readdirSync(RUNS_DIR)
.map((dir) => path.join(RUNS_DIR, dir, 'result.json'))
.filter((file) => existsSync(file))
.map((file) => JSON.parse(readFileSync(file, 'utf8')))
.sort((a, b) => String(b.timestamp).localeCompare(String(a.timestamp)));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Handle malformed run files without crashing the summary command.

At Line 38, JSON.parse can throw on one bad result.json and terminate the entire report, even if other runs are valid.

Proposed fix
 function loadRuns() {
   if (!existsSync(RUNS_DIR)) return [];
   return readdirSync(RUNS_DIR)
     .map((dir) => path.join(RUNS_DIR, dir, 'result.json'))
     .filter((file) => existsSync(file))
-    .map((file) => JSON.parse(readFileSync(file, 'utf8')))
+    .flatMap((file) => {
+      try {
+        return [JSON.parse(readFileSync(file, 'utf8'))];
+      } catch {
+        console.warn(`Skipping invalid eval run file: ${path.relative(ROOT, file)}`);
+        return [];
+      }
+    })
     .sort((a, b) => String(b.timestamp).localeCompare(String(a.timestamp)));
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/evals/summarize-ricky-evals.mjs` around lines 35 - 39, The summary
pipeline currently calls JSON.parse on every result.json and will crash on a
malformed file; update the chain that builds run objects (the
readdirSync(...).map(...).filter(...).map(...).sort(...)) to catch
JSON.parse/readFileSync errors per file: for each file path produced, read and
parse inside a try/catch (referencing RUNS_DIR, result.json, readFileSync,
JSON.parse, existsSync), log or warn about any malformed file and return null
for failures, then filter out null/undefined results before sorting so one bad
result.json no longer terminates the whole summary command.

Comment on lines +321 to +327
if (requiresRepairAwareRetry(content)) {
issues.push(blockingIssue(
'validation',
'REPAIR_AWARE_RETRY_MISSING',
'Rendered workflow must use retry error handling with repairAgent and repairRetries so repairable deterministic gates do not fail the workflow outright.',
));
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate embedded child .onError() blocks, not just top-level lines.

Master artifacts serialize child workflows into JSON strings, so those repair policies never start a physical line in the outer artifact. This line-based scan only validates the outer master workflow, which means a child workflow can regress to fail-fast or plain retry and still avoid REPAIR_AWARE_RETRY_MISSING. Matching .onError(...) across the full rendered content, or validating each child source before serialization, would close that gap and also make the check less brittle to multiline property formatting.

Also applies to: 434-443

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/product/generation/pipeline.ts` around lines 321 - 327, The check using
requiresRepairAwareRetry(content) only inspects top-level lines and misses
.onError(...) blocks embedded inside serialized child workflows; update the
validation to examine all rendered child sources (or the entire rendered content
as a whole) so embedded onError handlers are validated too. Concretely, locate
where issues.push(blockingIssue(...)) is called (the
requiresRepairAwareRetry(content) check) and either (A) iterate over child
workflow sources before they are serialized and call
requiresRepairAwareRetry(childSource) for each, or (B) change
requiresRepairAwareRetry to operate on the full rendered content (not
line-based) and use a multiline/dotall regex to find .onError(...) blocks and
ensure they include repairAgent and repairRetries; apply the same fix to the
other occurrence around the 434-443 area so both top-level and embedded child
workflows are validated.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/local/auto-fix-loop.ts (1)

1016-1023: ⚡ Quick win

Consider using shared constants for repair-aware retry configuration.

The repair configuration at line 1021 uses hardcoded values (maxRetries: 2, retryDelayMs: 1000, repairRetries: 2), while master-workflow-renderer.ts uses imported constants (DEFAULT_RETRY_MAX_ATTEMPTS, DEFAULT_RETRY_BACKOFF_MS, DEFAULT_REPAIR_RETRY_ATTEMPTS). This divergence can lead to inconsistent behavior if constants change.

♻️ Suggested fix
+import {
+  DEFAULT_REPAIR_RETRY_ATTEMPTS,
+  DEFAULT_RETRY_BACKOFF_MS,
+  DEFAULT_RETRY_MAX_ATTEMPTS,
+} from '../shared/constants.js';

 // ... in repairMasterChildRunRepairLoop:
       changes.push('replaced fail-fast error handling with repair-aware retry');
       const indent = match.match(/^\s*/)?.[0] ?? '';
-      return `${indent}.onError('retry', { maxRetries: 2, retryDelayMs: 1000, repairAgent: "master-lead", repairRetries: 2 })`;
+      return `${indent}.onError('retry', { maxRetries: ${DEFAULT_RETRY_MAX_ATTEMPTS}, retryDelayMs: ${DEFAULT_RETRY_BACKOFF_MS}, repairAgent: "master-lead", repairRetries: ${DEFAULT_REPAIR_RETRY_ATTEMPTS} })`;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/local/auto-fix-loop.ts` around lines 1016 - 1023, Replace the hardcoded
retry/repair numbers in the .onError replacement with the shared constants used
by master-workflow-renderer.ts: import DEFAULT_RETRY_MAX_ATTEMPTS,
DEFAULT_RETRY_BACKOFF_MS, and DEFAULT_REPAIR_RETRY_ATTEMPTS at the top of
src/local/auto-fix-loop.ts, then use those constants instead of 2, 1000, and 2
in the returned string inside the next.replace callback (the callback that
currently pushes 'replaced fail-fast error handling with repair-aware retry');
this keeps retry, backoff and repair retry values consistent across modules.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/local/auto-fix-loop.ts`:
- Around line 1016-1023: Replace the hardcoded retry/repair numbers in the
.onError replacement with the shared constants used by
master-workflow-renderer.ts: import DEFAULT_RETRY_MAX_ATTEMPTS,
DEFAULT_RETRY_BACKOFF_MS, and DEFAULT_REPAIR_RETRY_ATTEMPTS at the top of
src/local/auto-fix-loop.ts, then use those constants instead of 2, 1000, and 2
in the returned string inside the next.replace callback (the callback that
currently pushes 'replaced fail-fast error handling with repair-aware retry');
this keeps retry, backoff and repair retry values consistent across modules.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: a6c16171-c532-40a6-8872-79cc0c1ec53d

📥 Commits

Reviewing files that changed from the base of the PR and between 800ddc8 and d800f6e.

📒 Files selected for processing (7)
  • .trajectories/completed/2026-05/traj_b77ch3xz99r8.json
  • .trajectories/completed/2026-05/traj_b77ch3xz99r8.md
  • .trajectories/index.json
  • src/local/auto-fix-loop.test.ts
  • src/local/auto-fix-loop.ts
  • src/product/generation/master-workflow-renderer.ts
  • src/product/generation/pipeline.test.ts
✅ Files skipped from review due to trivial changes (2)
  • .trajectories/completed/2026-05/traj_b77ch3xz99r8.json
  • .trajectories/completed/2026-05/traj_b77ch3xz99r8.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/product/generation/pipeline.test.ts

@khaliqgant khaliqgant merged commit 45ae3cf into main May 8, 2026
1 check passed
@khaliqgant khaliqgant deleted the codex/ricky-workflow-never-fail branch May 8, 2026 15:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant