fix: classify timeouts correctly + restore per-step timeout budgets by khaliqgant · Pull Request #51 · AgentWorkforce/ricky

khaliqgant · 2026-05-06T11:58:33Z

Summary

Add STEP_TIMEOUT blocker code so workflow timeouts stop being misclassified as CREDENTIALS_REJECTED.
Tighten the credentials regex so bare mentions of "auth" / "token" in agent PTY output no longer false-positive.
Restore per-step timeoutMs on lead-plan, implement-artifact, review-*, fix-loop, and final-signoff (the regression from 661ae4c).
Bump DEFAULT_RUN_TIMEOUT_MS 10min → 45min so the outer process killer is a safety net, not the constraint.

Why

A workflow run came back with Execution: blocked — CREDENTIALS_REJECTED at runtime-launch even though credentials were fine. Investigation showed two converging issues:

1. The classifier was over-matching. classifyCoordinatorBlocker did /(?:credential|auth|unauthorized|forbidden|token|api key)/i.test(combined) on the full combined stdout+stderr, which is 1.5MB+ of codex/claude PTY bytes that mention "token" and "auth" constantly during normal operation. The credentials check ran before the timeout|timed out check, so every timeout that ran codex got bucketed into credentials.

2. Per-step timeouts were collectively removed. Commit 661ae4c ("generation: remove hardcoded review-step timeout") deleted timeoutMs: 300_000 on review steps. After that, every step in a generated workflow shared a single 10-min global pool. A typical pipeline (lead-plan + implement + 2 reviewers + validators + fix-loop) sums to well over 10 min, so timeouts started firing routinely. PR #50 attempts to fix this by bumping the default to 30min, but that's a band-aid — the same issue returns for any slightly larger workflow, and the misclassification is still there.

This PR fixes the root causes.

Changes

Classifier (`src/local/entrypoint.ts`)

Add STEP_TIMEOUT to LocalBlockerCode and timeout to LocalBlockerCategory.
In classifyCoordinatorBlocker, detect timeouts via the authoritative result.status === 'timed_out' signal before any regex heuristics. Coordinator status is the source of truth.
Replace the bare-word credentials regex with matchesCredentialFailure(), which requires per-line co-occurrence of an explicit rejection marker (401, 403, unauthorized, forbidden, invalid token, invalid api key, credentials rejected/expired/invalid, please sign in again, etc.).

Per-step timeouts (`src/product/generation/template-renderer.ts` + `src/shared/constants.ts`)

Add per-role budget constants:
- DEFAULT_LEAD_PLAN_TIMEOUT_MS = 600_000 (10min)
- DEFAULT_IMPLEMENT_TIMEOUT_MS = 1_200_000 (20min)
- DEFAULT_REVIEW_TIMEOUT_MS = 600_000 (10min)
- DEFAULT_FIX_LOOP_TIMEOUT_MS = 1_200_000 (20min)
Render timeoutMs: on lead-plan, implement-artifact, review (claude + codex), fix-loop, and final-signoff steps.
Bump DEFAULT_RUN_TIMEOUT_MS from 600_000 to 2_700_000 (45 min) so the outer wall-clock kill never fires before per-step constraints — per-step is now the meaningful constraint.

Tests

Update existing "kills the SDK workflow process tree when the local timeout fires" to assert STEP_TIMEOUT (previously NETWORK_UNREACHABLE).
Add does not classify incidental "auth"/"token" mentions in agent output as CREDENTIALS_REJECTED (regression for the false-positive).
Add still classifies real credential failures (401/unauthorized/invalid token) as CREDENTIALS_REJECTED to lock in the tightened regex.

Test plan

npx tsc --noEmit clean
npx vitest run src/local/entrypoint.test.ts src/runtime/local-coordinator.test.ts src/product/generation/pipeline.test.ts test/generated-workflow-hygiene.test.ts — 162 tests pass
Full suite (excluding two long-running e2e files) — 828 tests pass across 41 files
npm run bundle — clean, constants land correctly in dist/ricky.js
Regenerate a workflow and confirm rendered timeoutMs: per step

Supersedes

Supersedes #50 — closing that PR. This PR addresses the underlying issue rather than the symptom.

🤖 Generated with Claude Code

A workflow timeout was being misclassified as `CREDENTIALS_REJECTED` whenever codex/claude PTY output happened to mention the words "auth" or "token" — which they do constantly during normal operation. The aggressive classifier regex ran before the timeout regex, so any failure that captured those bytes leaked into the credentials bucket and surfaced misleading "refresh local provider credentials" next steps. Two related issues converged to make this hit much more often: 1. The credentials-detection regex `/(?:credential|auth|unauthorized|forbidden| token|api key)/i` matched bare words rather than failure markers. Replaced with a per-line check that requires an explicit rejection signal (401/403, "invalid token", "unauthorized", "credentials rejected/expired", etc.). 2. Per-step `timeoutMs` was removed from generated review steps in 661ae4c ("generation: remove hardcoded review-step timeout"), which collapsed every step into one shared 10-min global pool. Implementation + 2 reviewers + validators routinely exceed that, so timeouts started firing constantly. Changes: - Add `STEP_TIMEOUT` blocker code with `timeout` category. Classify based on the authoritative `result.status === 'timed_out'` signal *before* falling through to regex heuristics. - Tighten credential-failure detection to require co-occurring rejection markers; bare mentions of "auth"/"token" no longer trigger a false positive. - Restore per-step `timeoutMs` on lead-plan, implement-artifact, review-*, fix-loop, and final-signoff. Tuned values: implement/fix-loop = 20min, lead-plan/review/signoff = 10min. - Bump `DEFAULT_RUN_TIMEOUT_MS` 10min → 45min so the outer process killer never fires before per-step constraints can do their job. Per-step is now the meaningful constraint; the outer timeout is a safety net. - Update existing test "kills the SDK workflow process tree when the local timeout fires" to assert STEP_TIMEOUT (was NETWORK_UNREACHABLE). - Add regression tests for the credentials-classifier false positive and the real-credential-failure path. Supersedes #50 (the simple default-bump). Closes the underlying classifier bug rather than just delaying it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-06T11:58:47Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 0e9ca569-e456-475a-be5f-1b9810db3aad

📥 Commits

Reviewing files that changed from the base of the PR and between 94f656c and 9f68e66.

📒 Files selected for processing (4)

src/local/entrypoint.test.ts
src/local/entrypoint.ts
src/product/generation/template-renderer.ts
src/shared/constants.ts

📝 Walkthrough

Walkthrough

This PR expands timeout handling throughout the codebase by introducing dedicated timeout constants, updating the local runtime blocker classifier to recognize timeouts as a distinct failure category, and applying per-step timeout specifications to workflow steps. Regression tests verify credential-rejection classification remains accurate.

Changes

Timeout Handling & Blocker Classification

Layer / File(s)	Summary
Constants Definition `src/shared/constants.ts`	`DEFAULT_RUN_TIMEOUT_MS` increased to 2,700,000 ms (45 minutes), `DEFAULT_STEP_TIMEOUT_MS` increased to 600,000 ms, and four new per-step budget constants introduced: `DEFAULT_LEAD_PLAN_TIMEOUT_MS`, `DEFAULT_IMPLEMENT_TIMEOUT_MS`, `DEFAULT_REVIEW_TIMEOUT_MS`, and `DEFAULT_FIX_LOOP_TIMEOUT_MS`.
Blocker Taxonomy & Classification `src/local/entrypoint.ts`	`LocalBlockerCode` and `LocalBlockerCategory` types expanded with `STEP_TIMEOUT` and `timeout` members. `classifyCoordinatorBlocker` logic updated to explicitly detect and classify timed-out runs with precedence over credential checks. New `matchesCredentialFailure` helper refactors credential-detection logic for reusability.
Workflow Template Integration `src/product/generation/template-renderer.ts`	Imports new timeout constants and applies `timeoutMs` specifications to lead-plan, implement-artifact, review, fix-loop, and final-signoff steps using their respective budget constants.
Tests & Verification `src/local/entrypoint.test.ts`	Regression tests added: one verifies incidental "auth"/"token" mentions in runtime output are not misclassified as credential rejections; another confirms real credential failures (HTTP 401) are correctly classified as `CREDENTIALS_REJECTED`.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

AgentWorkforce/ricky#49: Modifies local entrypoint blocker-classification logic for coordinator blockers including credentials, missing-env, and timeouts.
AgentWorkforce/ricky#47: Modifies credential-related blocker handling and classification in the local-run framework.

Poem

🐰 A timeout's born, no more unseen—
Step budgets set, each role has sheen,
No false "token" cries in the dark,
Real failures spark, credentials stark,
The warren now runs on better time! ✨

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/per-step-timeout-and-classifier

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9f68e66a05

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-06T12:01:01Z

+// exceed the longest sequential path of per-step budgets below so that the
+// outer process killer never fires before per-step timeouts can do their job.
+// Per-step timeouts are the meaningful constraint; this is a safety net.
+export const DEFAULT_RUN_TIMEOUT_MS = 2_700_000; // 45 min


Make the run timeout exceed the generated critical path

For default generated workflows, this 45-minute whole-run timeout is still shorter than the sequential per-step budgets added in this change: lead-plan 10m + implement 20m + one review wave 10m + fix-loop 20m + final-review 10m + final-signoff 10m is already 80 minutes, before deterministic gates or retries. Because DEFAULT_TIMEOUT_MS is rendered into the workflow's .timeout(...) and also used by the local coordinator when no override is supplied, a workflow can have every agent step stay within its own timeoutMs yet still be killed by the outer budget before the later steps finish, so the outer timeout is not actually a safety net.

Useful? React with 👍 / 👎.

khaliqgant mentioned this pull request May 6, 2026

chore: bump default run timeout from 10m to 30m #50

Closed

3 tasks

khaliqgant merged commit c89e75d into main May 6, 2026
1 check was pending

khaliqgant deleted the fix/per-step-timeout-and-classifier branch May 6, 2026 11:59

chatgpt-codex-connector Bot reviewed May 6, 2026

View reviewed changes

coderabbitai Bot mentioned this pull request May 7, 2026

chore(timeout): bump default run wallclock 45m -> 4h #64

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: classify timeouts correctly + restore per-step timeout budgets#51

fix: classify timeouts correctly + restore per-step timeout budgets#51
khaliqgant merged 1 commit intomainfrom
fix/per-step-timeout-and-classifier

khaliqgant commented May 6, 2026

Uh oh!

coderabbitai Bot commented May 6, 2026 •

edited

Loading

Review failed

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

khaliqgant commented May 6, 2026

Summary

Why

Changes

Classifier (src/local/entrypoint.ts)

Per-step timeouts (src/product/generation/template-renderer.ts + src/shared/constants.ts)

Tests

Test plan

Supersedes

Uh oh!

coderabbitai Bot commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Classifier (`src/local/entrypoint.ts`)

Per-step timeouts (`src/product/generation/template-renderer.ts` + `src/shared/constants.ts`)

coderabbitai Bot commented May 6, 2026 •

edited

Loading