-
Notifications
You must be signed in to change notification settings - Fork 11
🤖 fix: prevent infinite retry loop for non-retryable errors #485
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
603f688 to
0f883b5
Compare
|
Nice. Had that happen today. |
Fixes perpetual retry loop when authentication or other non-retryable errors occur. Previously, all stream errors would trigger auto-retry, causing the system to infinitely retry errors that require user action. ## Problem When a user provides a bad API key (or hits quota limits, model not found, context exceeded), the system would enter a perpetual retry loop: 1. User sets API key via /providers set 2. Request fails with authentication error 3. useResumeManager auto-retries every 1-60s with exponential backoff 4. Error repeats forever until user force-closes cmux This happened because hasInterruptedStream() returned true for ALL stream-error types, triggering auto-retry even when user action is needed. ## Solution Classify error types into retryable vs non-retryable: **Non-retryable (require user action):** - authentication - Bad API key - quota - Billing/usage limits - model_not_found - Invalid model selection - context_exceeded - Message too long - aborted - User cancelled **Retryable (transient issues):** - network - Temporary network issues - server_error - Provider issues (5xx) - rate_limit - Can retry with backoff - api - Generic API errors - unknown - Defensive retry hasInterruptedStream() now checks errorType and returns false for non-retryable errors, preventing useResumeManager from auto-retrying. ## Testing Added 8 new test cases covering: - Each non-retryable error type returns false - Retryable error types (network, server_error, rate_limit) return true - All 19 tests pass _Generated with `cmux`_
Allows users to specify which host and port to bind to when running in server mode, instead of hardcoding 0.0.0.0:3000. Usage: node dist/main.js server # default: 0.0.0.0:3000 node dist/main.js server -h localhost # bind to localhost only node dist/main.js server -h 0.0.0.0 -p 8080 # custom port Defaults to 0.0.0.0:3000 for network accessibility. _Generated with `cmux`_
Replaces manual argument parsing with commander library for better UX: - Automatic help generation with --help - Proper flag validation (errors on unknown flags) - Supports both short (-h, -p) and long (--host, --port) flags - Type-safe option parsing Usage: node dist/main.js server --help node dist/main.js server # 0.0.0.0:3000 node dist/main.js server --host localhost node dist/main.js server -h 127.0.0.1 -p 8080 Dependencies: Added commander@14.0.2 _Generated with `cmux`_
Fixes issue where errors (like api_key_not_found) are invisible on page reload. Previously, when resumeStream failed during auto-retry, the error was silently swallowed and users only saw "Retrying" without knowing why. ## Problem SendMessageError (validation errors before streaming starts) were never emitted as stream-error events, so they: - Were not persisted to partial.json - Were not visible on page reload - Left users confused during auto-retry loops ## Solution Emit all SendMessageErrors as stream-error events in streamWithHistory: - Both sendMessage and resumeStream now emit errors consistently - Errors persist via partial.json and survive page reloads - ChatInput still shows toast for immediate feedback (in addition to stream-error) ## Parity Before: - sendMessage: validation fails → returns error → toast shown (ephemeral) - resumeStream: validation fails → returns error → silently retries (no UI) After: - sendMessage: validation fails → returns error + emits stream-error → toast + persisted - resumeStream: validation fails → returns error + emits stream-error → persisted Both paths now emit errors for consistent UI feedback and persistence. _Generated with `cmux`_
Adds `make dev-server` for rapid iteration on server mode changes. Setup: - Watches TypeScript files in src/ and rebuilds main process - Watches dist/main.js and dist/main-server.js - Auto-restarts server on changes (500ms debounce) - Uses nodemon for process management Usage: make dev-server Server runs on http://localhost:3000 with automatic reload on file changes. Dependencies: Added nodemon@3.1.10 _Generated with `cmux`_
- Split retry eligibility logic: - hasInterruptedStream() shows UI for ALL errors - isEligibleForAutoRetry() filters non-retryable errors - isNonRetryableSendError() checks SendMessageError types - Fixed localStorage event synchronization: - useResumeManager now uses updatePersistedState() - Dispatches custom events for usePersistedState listeners - RetryBarrier receives updates immediately - Made RetryBarrier self-contained: - Accepts only workspaceId (like AgentStatusIndicator) - Manages own state and event handlers - Computes effectiveAutoRetry internally - Removed bloat from AIView.tsx - Fixed manual retry button: - Added isManual flag to bypass eligibility checks - User clicks = immediate retry regardless of error type - Auto-retry still respects eligibility rules - Centralized event type system: - CustomEventPayloads interface for all events - createCustomEvent() helper for type-safe dispatch - CustomEventType for type-safe listeners - Single source of truth for event payloads Tests: All 30 tests pass in retryEligibility.test.ts _Generated with `cmux`_
Created src/utils/errors/formatSendError.ts as single source of truth for SendMessageError formatting. Both RetryBarrier and ChatInputToasts now use formatSendMessageError() helper, eliminating duplicate logic for provider commands and error messages. - formatSendMessageError() returns message + optional providerCommand - RetryBarrier combines them: "message. Configure with command" - ChatInputToasts uses providerCommand in solution section - All provider-related strings now defined in one place
Added tests that explicitly demonstrate the key distinction: - hasInterruptedStream = Should RetryBarrier UI be shown? - isEligibleForAutoRetry = Should system auto-retry? For non-retryable errors (authentication, quota): - hasInterruptedStream returns true (show UI) - isEligibleForAutoRetry returns false (don't auto-retry) - Result: User sees manual Retry button, not 'Retrying...' For retryable errors (network, server): - Both return true - Result: User sees 'Retrying...' with countdown and exponential backoff This prevents confusion about why we have two similar-sounding functions.
Removed tests that only checked 'does stream-error mean interrupted?' without testing actual retry logic. These were misleading because: - They don't test retry behavior - They're trivial (of course errors mean interrupted) - Test names implied retry testing but just checked message detection Kept only tests that verify isEligibleForAutoRetry correctly filters non-retryable errors, which is the actual important logic.
25ca2a3 to
5a03fe0
Compare
Problem
When users hit non-retryable errors (e.g.,
api_key_not_found,authentication,quota), the system would infinitely retry without user intervention. UI showed "Retrying..." indefinitely with no error message, and the manual retry button was non-functional.Root Causes
Auto-retry didn't distinguish error types: Two error systems existed (
SendMessageErrorfrom resumeStream failures,StreamErrorTypein stream-error messages).isEligibleForAutoRetry()only checked the latter, missingapi_key_not_founderrors stored inRetryState.lastError.RetryBarrier didn't receive updates:
useResumeManagerusedlocalStorage.setItem()directly whileRetryBarrierusedusePersistedStatewithlistener: trueexpecting custom events.Manual retry was blocked:
isEligibleForResume()treated manual retries the same as auto-retries, blocking user intent after fixing issues.Solution
Split retry eligibility logic into three functions in
src/utils/messages/retryEligibility.ts:hasInterruptedStream()- Shows UI for ALL errors (user may have fixed it)isEligibleForAutoRetry()- Filters non-retryable stream errors (authentication, quota, model_not_found, context_exceeded, aborted)isNonRetryableSendError()- Checks SendMessageError types (api_key_not_found, provider_not_supported, invalid_model_string)Fixed event synchronization:
useResumeManagernow usesupdatePersistedState()which dispatches custom events that listeners receive.Made RetryBarrier self-contained: Following
AgentStatusIndicatorpattern, accepts onlyworkspaceIdprop, fetches workspace state directly, manages ownautoRetrystate, computeseffectiveAutoRetryinternally. Removed ~37 lines from AIView.tsx.Added manual retry bypass:
RESUME_CHECK_REQUESTEDevents now includeisManualflag. Manual retry bypasses eligibility checks, respecting user intent.Centralized event type system: Created
CustomEventPayloadsinterface andcreateCustomEvent()helper insrc/constants/events.tsfor type-safe event handling.Centralized error formatting: Created
formatSendMessageError()insrc/utils/errors/formatSendError.ts, eliminating duplicate error message strings.Behavior After Fix
Non-retryable errors: Auto-retry stops immediately, UI shows manual "Retry" button (not "Retrying..."), error message displayed clearly, user can fix and retry.
Retryable errors: Auto-retry with exponential backoff (1s → 2s → 4s → ... → 60s max), UI shows countdown, user can stop with Ctrl+C.
Testing
All 30 tests in
retryEligibility.test.tspass, verifying correct filtering of non-retryable errors and edge cases.Generated with
cmux