fix: classify timeouts correctly + restore per-step timeout budgets#51
fix: classify timeouts correctly + restore per-step timeout budgets#51khaliqgant merged 1 commit intomainfrom
Conversation
A workflow timeout was being misclassified as `CREDENTIALS_REJECTED` whenever codex/claude PTY output happened to mention the words "auth" or "token" — which they do constantly during normal operation. The aggressive classifier regex ran before the timeout regex, so any failure that captured those bytes leaked into the credentials bucket and surfaced misleading "refresh local provider credentials" next steps. Two related issues converged to make this hit much more often: 1. The credentials-detection regex `/(?:credential|auth|unauthorized|forbidden| token|api key)/i` matched bare words rather than failure markers. Replaced with a per-line check that requires an explicit rejection signal (401/403, "invalid token", "unauthorized", "credentials rejected/expired", etc.). 2. Per-step `timeoutMs` was removed from generated review steps in 661ae4c ("generation: remove hardcoded review-step timeout"), which collapsed every step into one shared 10-min global pool. Implementation + 2 reviewers + validators routinely exceed that, so timeouts started firing constantly. Changes: - Add `STEP_TIMEOUT` blocker code with `timeout` category. Classify based on the authoritative `result.status === 'timed_out'` signal *before* falling through to regex heuristics. - Tighten credential-failure detection to require co-occurring rejection markers; bare mentions of "auth"/"token" no longer trigger a false positive. - Restore per-step `timeoutMs` on lead-plan, implement-artifact, review-*, fix-loop, and final-signoff. Tuned values: implement/fix-loop = 20min, lead-plan/review/signoff = 10min. - Bump `DEFAULT_RUN_TIMEOUT_MS` 10min → 45min so the outer process killer never fires before per-step constraints can do their job. Per-step is now the meaningful constraint; the outer timeout is a safety net. - Update existing test "kills the SDK workflow process tree when the local timeout fires" to assert STEP_TIMEOUT (was NETWORK_UNREACHABLE). - Add regression tests for the credentials-classifier false positive and the real-credential-failure path. Supersedes #50 (the simple default-bump). Closes the underlying classifier bug rather than just delaying it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughThis PR expands timeout handling throughout the codebase by introducing dedicated timeout constants, updating the local runtime blocker classifier to recognize timeouts as a distinct failure category, and applying per-step timeout specifications to workflow steps. Regression tests verify credential-rejection classification remains accurate. ChangesTimeout Handling & Blocker Classification
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9f68e66a05
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // exceed the longest sequential path of per-step budgets below so that the | ||
| // outer process killer never fires before per-step timeouts can do their job. | ||
| // Per-step timeouts are the meaningful constraint; this is a safety net. | ||
| export const DEFAULT_RUN_TIMEOUT_MS = 2_700_000; // 45 min |
There was a problem hiding this comment.
Make the run timeout exceed the generated critical path
For default generated workflows, this 45-minute whole-run timeout is still shorter than the sequential per-step budgets added in this change: lead-plan 10m + implement 20m + one review wave 10m + fix-loop 20m + final-review 10m + final-signoff 10m is already 80 minutes, before deterministic gates or retries. Because DEFAULT_TIMEOUT_MS is rendered into the workflow's .timeout(...) and also used by the local coordinator when no override is supplied, a workflow can have every agent step stay within its own timeoutMs yet still be killed by the outer budget before the later steps finish, so the outer timeout is not actually a safety net.
Useful? React with 👍 / 👎.
Summary
STEP_TIMEOUTblocker code so workflow timeouts stop being misclassified asCREDENTIALS_REJECTED.timeoutMson lead-plan, implement-artifact, review-*, fix-loop, and final-signoff (the regression from 661ae4c).DEFAULT_RUN_TIMEOUT_MS10min → 45min so the outer process killer is a safety net, not the constraint.Why
A workflow run came back with
Execution: blocked — CREDENTIALS_REJECTED at runtime-launcheven though credentials were fine. Investigation showed two converging issues:1. The classifier was over-matching.
classifyCoordinatorBlockerdid/(?:credential|auth|unauthorized|forbidden|token|api key)/i.test(combined)on the full combined stdout+stderr, which is 1.5MB+ of codex/claude PTY bytes that mention "token" and "auth" constantly during normal operation. The credentials check ran before thetimeout|timed outcheck, so every timeout that ran codex got bucketed into credentials.2. Per-step timeouts were collectively removed. Commit 661ae4c ("generation: remove hardcoded review-step timeout") deleted
timeoutMs: 300_000on review steps. After that, every step in a generated workflow shared a single 10-min global pool. A typical pipeline (lead-plan + implement + 2 reviewers + validators + fix-loop) sums to well over 10 min, so timeouts started firing routinely. PR #50 attempts to fix this by bumping the default to 30min, but that's a band-aid — the same issue returns for any slightly larger workflow, and the misclassification is still there.This PR fixes the root causes.
Changes
Classifier (
src/local/entrypoint.ts)STEP_TIMEOUTtoLocalBlockerCodeandtimeouttoLocalBlockerCategory.classifyCoordinatorBlocker, detect timeouts via the authoritativeresult.status === 'timed_out'signal before any regex heuristics. Coordinator status is the source of truth.matchesCredentialFailure(), which requires per-line co-occurrence of an explicit rejection marker (401,403,unauthorized,forbidden,invalid token,invalid api key,credentials rejected/expired/invalid,please sign in again, etc.).Per-step timeouts (
src/product/generation/template-renderer.ts+src/shared/constants.ts)DEFAULT_LEAD_PLAN_TIMEOUT_MS = 600_000(10min)DEFAULT_IMPLEMENT_TIMEOUT_MS = 1_200_000(20min)DEFAULT_REVIEW_TIMEOUT_MS = 600_000(10min)DEFAULT_FIX_LOOP_TIMEOUT_MS = 1_200_000(20min)timeoutMs:on lead-plan, implement-artifact, review (claude + codex), fix-loop, and final-signoff steps.DEFAULT_RUN_TIMEOUT_MSfrom600_000to2_700_000(45 min) so the outer wall-clock kill never fires before per-step constraints — per-step is now the meaningful constraint.Tests
STEP_TIMEOUT(previouslyNETWORK_UNREACHABLE).does not classify incidental "auth"/"token" mentions in agent output as CREDENTIALS_REJECTED(regression for the false-positive).still classifies real credential failures (401/unauthorized/invalid token) as CREDENTIALS_REJECTEDto lock in the tightened regex.Test plan
npx tsc --noEmitcleannpx vitest run src/local/entrypoint.test.ts src/runtime/local-coordinator.test.ts src/product/generation/pipeline.test.ts test/generated-workflow-hygiene.test.ts— 162 tests passnpm run bundle— clean, constants land correctly indist/ricky.jstimeoutMs:per stepSupersedes
Supersedes #50 — closing that PR. This PR addresses the underlying issue rather than the symptom.
🤖 Generated with Claude Code