Skip to content

fix: classify timeouts correctly + restore per-step timeout budgets#51

Merged
khaliqgant merged 1 commit intomainfrom
fix/per-step-timeout-and-classifier
May 6, 2026
Merged

fix: classify timeouts correctly + restore per-step timeout budgets#51
khaliqgant merged 1 commit intomainfrom
fix/per-step-timeout-and-classifier

Conversation

@khaliqgant
Copy link
Copy Markdown
Member

Summary

  • Add STEP_TIMEOUT blocker code so workflow timeouts stop being misclassified as CREDENTIALS_REJECTED.
  • Tighten the credentials regex so bare mentions of "auth" / "token" in agent PTY output no longer false-positive.
  • Restore per-step timeoutMs on lead-plan, implement-artifact, review-*, fix-loop, and final-signoff (the regression from 661ae4c).
  • Bump DEFAULT_RUN_TIMEOUT_MS 10min → 45min so the outer process killer is a safety net, not the constraint.

Why

A workflow run came back with Execution: blocked — CREDENTIALS_REJECTED at runtime-launch even though credentials were fine. Investigation showed two converging issues:

1. The classifier was over-matching. classifyCoordinatorBlocker did /(?:credential|auth|unauthorized|forbidden|token|api key)/i.test(combined) on the full combined stdout+stderr, which is 1.5MB+ of codex/claude PTY bytes that mention "token" and "auth" constantly during normal operation. The credentials check ran before the timeout|timed out check, so every timeout that ran codex got bucketed into credentials.

2. Per-step timeouts were collectively removed. Commit 661ae4c ("generation: remove hardcoded review-step timeout") deleted timeoutMs: 300_000 on review steps. After that, every step in a generated workflow shared a single 10-min global pool. A typical pipeline (lead-plan + implement + 2 reviewers + validators + fix-loop) sums to well over 10 min, so timeouts started firing routinely. PR #50 attempts to fix this by bumping the default to 30min, but that's a band-aid — the same issue returns for any slightly larger workflow, and the misclassification is still there.

This PR fixes the root causes.

Changes

Classifier (src/local/entrypoint.ts)

  • Add STEP_TIMEOUT to LocalBlockerCode and timeout to LocalBlockerCategory.
  • In classifyCoordinatorBlocker, detect timeouts via the authoritative result.status === 'timed_out' signal before any regex heuristics. Coordinator status is the source of truth.
  • Replace the bare-word credentials regex with matchesCredentialFailure(), which requires per-line co-occurrence of an explicit rejection marker (401, 403, unauthorized, forbidden, invalid token, invalid api key, credentials rejected/expired/invalid, please sign in again, etc.).

Per-step timeouts (src/product/generation/template-renderer.ts + src/shared/constants.ts)

  • Add per-role budget constants:
    • DEFAULT_LEAD_PLAN_TIMEOUT_MS = 600_000 (10min)
    • DEFAULT_IMPLEMENT_TIMEOUT_MS = 1_200_000 (20min)
    • DEFAULT_REVIEW_TIMEOUT_MS = 600_000 (10min)
    • DEFAULT_FIX_LOOP_TIMEOUT_MS = 1_200_000 (20min)
  • Render timeoutMs: on lead-plan, implement-artifact, review (claude + codex), fix-loop, and final-signoff steps.
  • Bump DEFAULT_RUN_TIMEOUT_MS from 600_000 to 2_700_000 (45 min) so the outer wall-clock kill never fires before per-step constraints — per-step is now the meaningful constraint.

Tests

  • Update existing "kills the SDK workflow process tree when the local timeout fires" to assert STEP_TIMEOUT (previously NETWORK_UNREACHABLE).
  • Add does not classify incidental "auth"/"token" mentions in agent output as CREDENTIALS_REJECTED (regression for the false-positive).
  • Add still classifies real credential failures (401/unauthorized/invalid token) as CREDENTIALS_REJECTED to lock in the tightened regex.

Test plan

  • npx tsc --noEmit clean
  • npx vitest run src/local/entrypoint.test.ts src/runtime/local-coordinator.test.ts src/product/generation/pipeline.test.ts test/generated-workflow-hygiene.test.ts — 162 tests pass
  • Full suite (excluding two long-running e2e files) — 828 tests pass across 41 files
  • npm run bundle — clean, constants land correctly in dist/ricky.js
  • Regenerate a workflow and confirm rendered timeoutMs: per step

Supersedes

Supersedes #50 — closing that PR. This PR addresses the underlying issue rather than the symptom.

🤖 Generated with Claude Code

A workflow timeout was being misclassified as `CREDENTIALS_REJECTED` whenever
codex/claude PTY output happened to mention the words "auth" or "token" — which
they do constantly during normal operation. The aggressive classifier regex ran
before the timeout regex, so any failure that captured those bytes leaked into
the credentials bucket and surfaced misleading "refresh local provider
credentials" next steps.

Two related issues converged to make this hit much more often:

1. The credentials-detection regex `/(?:credential|auth|unauthorized|forbidden|
   token|api key)/i` matched bare words rather than failure markers. Replaced
   with a per-line check that requires an explicit rejection signal (401/403,
   "invalid token", "unauthorized", "credentials rejected/expired", etc.).

2. Per-step `timeoutMs` was removed from generated review steps in 661ae4c
   ("generation: remove hardcoded review-step timeout"), which collapsed every
   step into one shared 10-min global pool. Implementation + 2 reviewers +
   validators routinely exceed that, so timeouts started firing constantly.

Changes:
- Add `STEP_TIMEOUT` blocker code with `timeout` category. Classify based on
  the authoritative `result.status === 'timed_out'` signal *before* falling
  through to regex heuristics.
- Tighten credential-failure detection to require co-occurring rejection
  markers; bare mentions of "auth"/"token" no longer trigger a false positive.
- Restore per-step `timeoutMs` on lead-plan, implement-artifact, review-*,
  fix-loop, and final-signoff. Tuned values: implement/fix-loop = 20min,
  lead-plan/review/signoff = 10min.
- Bump `DEFAULT_RUN_TIMEOUT_MS` 10min → 45min so the outer process killer
  never fires before per-step constraints can do their job. Per-step is now
  the meaningful constraint; the outer timeout is a safety net.
- Update existing test "kills the SDK workflow process tree when the local
  timeout fires" to assert STEP_TIMEOUT (was NETWORK_UNREACHABLE).
- Add regression tests for the credentials-classifier false positive and the
  real-credential-failure path.

Supersedes #50 (the simple default-bump). Closes the underlying classifier
bug rather than just delaying it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 6, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 0e9ca569-e456-475a-be5f-1b9810db3aad

📥 Commits

Reviewing files that changed from the base of the PR and between 94f656c and 9f68e66.

📒 Files selected for processing (4)
  • src/local/entrypoint.test.ts
  • src/local/entrypoint.ts
  • src/product/generation/template-renderer.ts
  • src/shared/constants.ts

📝 Walkthrough

Walkthrough

This PR expands timeout handling throughout the codebase by introducing dedicated timeout constants, updating the local runtime blocker classifier to recognize timeouts as a distinct failure category, and applying per-step timeout specifications to workflow steps. Regression tests verify credential-rejection classification remains accurate.

Changes

Timeout Handling & Blocker Classification

Layer / File(s) Summary
Constants Definition
src/shared/constants.ts
DEFAULT_RUN_TIMEOUT_MS increased to 2,700,000 ms (45 minutes), DEFAULT_STEP_TIMEOUT_MS increased to 600,000 ms, and four new per-step budget constants introduced: DEFAULT_LEAD_PLAN_TIMEOUT_MS, DEFAULT_IMPLEMENT_TIMEOUT_MS, DEFAULT_REVIEW_TIMEOUT_MS, and DEFAULT_FIX_LOOP_TIMEOUT_MS.
Blocker Taxonomy & Classification
src/local/entrypoint.ts
LocalBlockerCode and LocalBlockerCategory types expanded with STEP_TIMEOUT and timeout members. classifyCoordinatorBlocker logic updated to explicitly detect and classify timed-out runs with precedence over credential checks. New matchesCredentialFailure helper refactors credential-detection logic for reusability.
Workflow Template Integration
src/product/generation/template-renderer.ts
Imports new timeout constants and applies timeoutMs specifications to lead-plan, implement-artifact, review, fix-loop, and final-signoff steps using their respective budget constants.
Tests & Verification
src/local/entrypoint.test.ts
Regression tests added: one verifies incidental "auth"/"token" mentions in runtime output are not misclassified as credential rejections; another confirms real credential failures (HTTP 401) are correctly classified as CREDENTIALS_REJECTED.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • AgentWorkforce/ricky#49: Modifies local entrypoint blocker-classification logic for coordinator blockers including credentials, missing-env, and timeouts.
  • AgentWorkforce/ricky#47: Modifies credential-related blocker handling and classification in the local-run framework.

Poem

🐰 A timeout's born, no more unseen—
Step budgets set, each role has sheen,
No false "token" cries in the dark,
Real failures spark, credentials stark,
The warren now runs on better time! ✨

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/per-step-timeout-and-classifier

Comment @coderabbitai help to get the list of available commands and usage tips.

@khaliqgant khaliqgant merged commit c89e75d into main May 6, 2026
1 check was pending
@khaliqgant khaliqgant deleted the fix/per-step-timeout-and-classifier branch May 6, 2026 11:59
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9f68e66a05

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/shared/constants.ts
// exceed the longest sequential path of per-step budgets below so that the
// outer process killer never fires before per-step timeouts can do their job.
// Per-step timeouts are the meaningful constraint; this is a safety net.
export const DEFAULT_RUN_TIMEOUT_MS = 2_700_000; // 45 min
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Make the run timeout exceed the generated critical path

For default generated workflows, this 45-minute whole-run timeout is still shorter than the sequential per-step budgets added in this change: lead-plan 10m + implement 20m + one review wave 10m + fix-loop 20m + final-review 10m + final-signoff 10m is already 80 minutes, before deterministic gates or retries. Because DEFAULT_TIMEOUT_MS is rendered into the workflow's .timeout(...) and also used by the local coordinator when no override is supplied, a workflow can have every agent step stay within its own timeoutMs yet still be killed by the outer budget before the later steps finish, so the outer timeout is not actually a safety net.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant