fix: improve rate limit handling with exponential backoff#109
leonvanzyl merged 7 commits into AutoForgeAI:master
Conversation
When Claude API hits rate limits via HTTP 429 exceptions (rather than response text), the agent now properly detects and handles them:

- Add RATE_LIMIT_PATTERNS constant for comprehensive detection
- Add parse_retry_after() to extract wait times from error messages
- Add is_rate_limit_error() helper for pattern matching
- Return new "rate_limit" status from exception handler
- Implement exponential backoff: 60s → 120s → 240s... (max 1 hour)
- Improve generic error backoff: 30s → 60s → 90s... (max 5 minutes)
- Expand text-based detection patterns in response handling
- Add unit tests for new functions

Fixes AutoForgeAI#41

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
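As a rough illustration of the two schedules above, the sketch below computes the exponential rate-limit delays and the linear generic-error delays; the function and constant names are illustrative, not necessarily those used in agent.py.

```python
# Illustrative sketch of the backoff schedules described in this PR.
RATE_LIMIT_BASE_SECONDS = 60    # 60s -> 120s -> 240s ...
RATE_LIMIT_MAX_SECONDS = 3600   # capped at 1 hour
ERROR_BASE_SECONDS = 30         # 30s -> 60s -> 90s ...
ERROR_MAX_SECONDS = 300         # capped at 5 minutes


def rate_limit_backoff(retries: int) -> int:
    """Exponential backoff for rate-limit errors, capped at 1 hour."""
    return min(RATE_LIMIT_BASE_SECONDS * (2 ** retries), RATE_LIMIT_MAX_SECONDS)


def error_backoff(retries: int) -> int:
    """Linear backoff for generic errors, capped at 5 minutes."""
    return min(ERROR_BASE_SECONDS * max(retries, 1), ERROR_MAX_SECONDS)
```

With this schedule, three consecutive rate-limit hits would wait 60s, 120s, and 240s, while three consecutive generic errors would wait 30s, 60s, and 90s.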
📝 Walkthrough

Adds a new rate-limit utilities module and updates the agent loop to detect rate-limit errors, parse retry hints, and apply clamped exponential/linear backoff with separate counters for rate-limit and general errors; unit tests for the utilities are included.
Sequence Diagram

```mermaid
sequenceDiagram
participant Agent
participant API
participant RateLimitUtils
participant Backoff
Agent->>API: perform request/operation
API-->>Agent: response (success or error)
alt error returned
Agent->>RateLimitUtils: is_rate_limit_error(error_message)
RateLimitUtils-->>Agent: true/false
alt rate limit detected
Agent->>RateLimitUtils: parse_retry_after(error_message)
RateLimitUtils-->>Agent: retry_seconds or None
alt retry_seconds present
Agent->>Backoff: clamp_retry_delay(retry_seconds)
Backoff-->>Agent: clamped_delay
else retry unknown
Agent->>Backoff: calculate_rate_limit_backoff(rate_limit_retries)
Backoff-->>Agent: exponential_delay
end
else not a rate limit
Agent->>Backoff: calculate_error_backoff(error_retries)
Backoff-->>Agent: linear_delay
end
Agent->>Agent: sleep(delay) and adjust retry counters
Agent->>API: retry or continue loop
else success
Agent->>Agent: reset retry counters and proceed
    end
```
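In Python, the decision flow in the diagram looks roughly like the sketch below. The helper names come from the diagram itself; the wrapper function and the module they are imported from are assumptions, and the real call sites in agent.py may differ.

```python
from rate_limit_utils import (  # module name assumed; helper names appear in the diagram
    calculate_error_backoff,
    calculate_rate_limit_backoff,
    clamp_retry_delay,
    is_rate_limit_error,
    parse_retry_after,
)


def choose_backoff(error_message: str, rate_limit_retries: int, error_retries: int):
    """Return (delay_seconds, rate_limit_retries, error_retries) for one failed iteration."""
    if is_rate_limit_error(error_message):
        retry_seconds = parse_retry_after(error_message)
        if retry_seconds is not None:
            delay = clamp_retry_delay(retry_seconds)                  # honor the server hint, clamped
        else:
            delay = calculate_rate_limit_backoff(rate_limit_retries)  # exponential fallback
        return delay, rate_limit_retries + 1, error_retries
    # Not a rate limit: linear backoff for generic errors.
    return calculate_error_backoff(error_retries), rate_limit_retries, error_retries + 1


# The agent then sleeps for the returned delay and retries; on success both
# counters are reset and the loop proceeds without sleeping.
```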
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
agent.py (1)
319-367: Rate‑limit response text can still retry too quickly.
In the"continue"path, counters are reset before checking response text. If a rate‑limit message lacks a reset time, the delay stays at 3s and the backoff counter is cleared, risking rapid retry loops. Consider applyingparse_retry_afteror exponential backoff here and only resettingrate_limit_retrieswhen no rate‑limit signal is present.💡 One possible adjustment
```diff
-                # Reset retry counters on success
-                rate_limit_retries = 0
-                error_retries = 0
+                # Reset error retries on success; rate-limit retries only reset if no signal
+                error_retries = 0
+                reset_rate_limit_retries = True
                 delay_seconds = AUTO_CONTINUE_DELAY_SECONDS
                 target_time_str = None

                 # Check for rate limit indicators in response text
                 response_lower = response.lower()
                 if any(pattern in response_lower for pattern in RATE_LIMIT_PATTERNS):
                     print("Claude Agent SDK indicated rate limit reached.")
+                    reset_rate_limit_retries = False
+                    retry_seconds = parse_retry_after(response)
+                    if retry_seconds is not None:
+                        delay_seconds = retry_seconds
+                    else:
+                        delay_seconds = min(60 * (2 ** rate_limit_retries), 3600)
+                    rate_limit_retries += 1
                 ...
+                if reset_rate_limit_retries:
+                    rate_limit_retries = 0
```
🤖 Fix all issues with AI agents
In `@agent.py`:
- Around line 165-172: The rate-limit handling block incorrectly treats a parsed
retry value of 0 as falsy; update the conditional in the agent.py rate-limit
branch (the code using is_rate_limit_error and parse_retry_after) to check
retry_seconds is not None (explicitly) instead of using a truthy check so that
"Retry-After: 0" is honored and returned as ("rate_limit", "0") rather than
falling back to "unknown".
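A minimal sketch of that fix, using a hypothetical wrapper around the exception handler; the key point is the explicit None comparison so a parsed value of 0 is still treated as a valid hint:

```python
from rate_limit_utils import is_rate_limit_error, parse_retry_after


def classify_exception(error_message: str) -> tuple[str, str]:
    """Map an exception message to a (status, detail) pair (hypothetical wrapper)."""
    if is_rate_limit_error(error_message):
        retry_seconds = parse_retry_after(error_message)
        # Explicit None check: "Retry-After: 0" yields ("rate_limit", "0"), not "unknown".
        if retry_seconds is not None:
            return "rate_limit", str(retry_seconds)
        return "rate_limit", "unknown"
    return "error", error_message
```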
In `@test_agent.py`:
- Around line 12-54: Extract the duplicated rate-limit logic
(RATE_LIMIT_PATTERNS, parse_retry_after, is_rate_limit_error) into a small
shared module (e.g., rate_limit_utils) and change both test_agent.py and
agent.py to import those symbols from that module; ensure the new module exports
RATE_LIMIT_PATTERNS, parse_retry_after, and is_rate_limit_error with the same
signatures and behavior, update tests to import from it, and remove the copied
definitions from test_agent.py so production and tests use the single source of
truth.
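The resulting layout might look like this; a sketch of the suggested extraction, not the exact module contents:

```python
# rate_limit_utils.py -- single source of truth shared by agent.py and the tests:
#   RATE_LIMIT_PATTERNS          detection patterns
#   is_rate_limit_error(message) -> bool
#   parse_retry_after(message)   -> int | None

# agent.py and test_agent.py then import the same symbols instead of duplicating them:
from rate_limit_utils import RATE_LIMIT_PATTERNS, is_rate_limit_error, parse_retry_after
```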
🧹 Nitpick comments (2)
test_agent.py (1)
129-147: Backoff tests re‑encode formulas instead of exercising production behavior.
These assertions will still pass if `run_autonomous_agent` changes its backoff rules. Consider extracting backoff calculations into helpers (e.g., `calc_rate_limit_backoff`, `calc_error_backoff`) and testing those directly.

agent.py (1)
415-418: Backoff comment says “exponential” but formula is linear.
The code uses `30 * retries`, so the comment should match to avoid confusion.

✏️ Suggested edit
```diff
-            # Non-rate-limit errors: shorter backoff but still exponential
+            # Non-rate-limit errors: linear backoff capped at 5 minutes
```
- Fix comment: "exponential" -> "linear" for error backoff (30 * retries)
- Fix rate limit counter reset: only reset when no rate limit signal detected
- Apply exponential backoff to rate limit in response text (not just exceptions)
- Use explicit `is not None` check for retry_seconds to handle Retry-After: 0

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…orgeAI#100, AutoForgeAI#108, AutoForgeAI#109, AutoForgeAI#110

PR AutoForgeAI#110 (Quality Gates):
- Move quality checks before DB session to avoid holding locks
- Return error instead of None for missing configured custom script
- Use contextlib.closing for SQLite connections in progress.py

PR AutoForgeAI#109 (Rate Limit):
- Extract rate limit logic to shared rate_limit_utils.py module
- Remove duplicated code from agent.py and test_agent.py

PR AutoForgeAI#108 (SQLite Parallel):
- Sort imports alphabetically in feature_mcp.py

PR AutoForgeAI#100 (Config Diagnostics):
- Add logger.warning for pkill_processes validation failures

PR AutoForgeAI#95 (Infrastructure Mock):
- Add language tags to fenced code blocks in initializer template

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…odule

- Create rate_limit_utils.py with shared constants and functions
- Update agent.py to import from shared module
- Update test_agent.py to import from shared module (removes duplication)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fixes ruff F401 lint error - the constant was imported but not used in test_agent.py. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
SUMMARY:
Fixed TypeScript build error in ProjectSetupRequired.tsx where startAgent
was being called with a boolean instead of an options object.
DETAILS:
- The startAgent API function signature was updated (in previous PR merges)
to accept an options object: { yoloMode?, parallelMode?, maxConcurrency?, testingAgentRatio? }
- ProjectSetupRequired.tsx was still calling it with the old signature:
startAgent(projectName, yoloMode) - passing boolean directly
- Changed to: startAgent(projectName, { yoloMode }) - wrapping in options object
This was the only remaining build error after merging 13+ PRs from upstream:
- PR AutoForgeAI#112: Security vulnerabilities and race conditions
- PR AutoForgeAI#89: Windows subprocess blocking fix
- PR AutoForgeAI#109: Rate limit handling with exponential backoff
- PR AutoForgeAI#88: MCP server config for ExpandChatSession
- PR AutoForgeAI#100: Diagnostic warnings for config loading
- PR AutoForgeAI#110: Quality gates (quality_gates.py)
- PR AutoForgeAI#113: Structured logging (structured_logging.py)
- PR AutoForgeAI#48: Knowledge files support (API, schemas, prompts)
- PR AutoForgeAI#29: Feature editing/deletion (MCP tools)
- PR AutoForgeAI#45: Chat persistence
- PR AutoForgeAI#52: Refactoring feature guidance
- PR AutoForgeAI#4: Project reset functionality
- PR AutoForgeAI#95: UI polish, health checks, cross-platform fixes
Build now passes: npm run build succeeds with all 2245 modules transformed.
Rate limit utilities now extracted to shared module. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Hey @cabana8471-arch! Thanks for tackling rate limit handling — this is a real pain point (issues #41 and #111), and the core design here is solid: the exponential backoff approach, the shared utility extraction, and the test coverage of the parsing functions are all good work. I ran through a detailed review and found a few things that need attention before we can merge. Flagging them below roughly by priority.

🚫 Blocker: agent.py imports clear_stuck_features and includes a psutil block that don't exist in upstream, so the module crashes on import; this needs to be removed before merge.

⚠️ Overly broad detection patterns

| Pattern | Concern |
|---|---|
| `"please wait"` | Claude naturally says "please wait while I..." during normal coding |
| `"try again later"` | Could appear in error-handling discussions or npm output |
| `"429"` | Matches port numbers (4293), line numbers (line 429), counts, etc. |
Since these are substring-matched on response.lower(), any natural language output triggers exponential backoff. The other patterns ("rate limit", "rate_limit", "too many requests", "quota exceeded", "limit reached") are specific enough to be reliable. Could you remove the broad ones, or replace bare "429" with something like "http 429" or "429 too many"?
⚠️ Missing bounds check on parsed retry delay
In the `elif status == "rate_limit":` block:

```python
if response != "unknown":
    delay_seconds = int(response)
```

Two issues here:

- No `try/except` — a malformed value would crash the loop
- No upper bound — a weird error message like "retry after 999999 seconds" would sleep for days
Suggestion: `delay_seconds = min(int(response), 3600)` wrapped in a `try/except` that falls back to the exponential backoff path.
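A sketch of that suggestion as a small helper; the function name and the exponential fallback constants are assumptions:

```python
def bounded_retry_delay(response: str, rate_limit_retries: int, cap: int = 3600) -> int:
    """Parse the hinted delay defensively, clamping it to the 1-hour cap."""
    try:
        return min(int(response), cap)
    except (TypeError, ValueError):
        # Malformed or missing hint: fall back to the exponential schedule.
        return min(60 * (2 ** rate_limit_retries), cap)
```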
💡 Smaller suggestions
- Use `is_rate_limit_error()` consistently — The "continue" path inlines `any(pattern in response_lower for pattern in RATE_LIMIT_PATTERNS)` instead of calling the function you already wrote. Using the function keeps detection logic in one place.
- Rename `test_agent.py` → `test_rate_limit_utils.py` — The tests only cover `rate_limit_utils.py` functions, and the project convention is `test_{module}.py` (e.g., `test_security.py`).
- Backoff tests don't test production code — `TestExponentialBackoff` independently computes the formula rather than importing/calling anything from `agent.py`. If the formula changes in production, these tests still pass. Consider extracting the backoff calculation into a function in `rate_limit_utils.py` and testing that.
- False-positive test cases — Would be great to add tests confirming that strings like `"please wait for the build"` or `"see line 429"` do NOT trigger rate-limit detection (especially if the broad patterns stay); a sketch of such tests follows below.
📝 For awareness (not blocking)
- Parallel mode: The orchestrator runs agents with `--max-iterations 1`, so backoff never accumulates across subprocess restarts. Rate-limited agents burn through `MAX_FEATURE_RETRIES` without benefit of the backoff. Worth documenting as a known limitation or opening a follow-up issue.
- SDK behavior: Rate limits (HTTP 429) may arrive as `AssistantMessage` objects rather than exceptions, so the `except Exception` path in `run_agent_session` may not fire for the most common rate-limit scenario. The response-text detection in the "continue" path is the actual safety net — worth confirming which path fires in practice.
Overall the direction is great — the backoff logic, utility extraction, and test coverage of the parsing functions are all good work. Just need to sort out the import crash, tighten the patterns, and add the bounds check. Looking forward to the updated version! 🙌
- BLOCKER: Remove clear_stuck_features import and psutil block (doesn't exist in upstream)
- Fix overly broad rate limit patterns to avoid false positives
  - Remove "please wait", "try again later", "limit reached", "429" (bare)
  - Convert to regex-based detection with word boundaries
  - Add patterns for "http 429", "status 429", "error 429"
- Add bounds checking (1-3600s) for parsed retry delays
- Use is_rate_limit_error() consistently instead of inline pattern matching
- Extract backoff functions to rate_limit_utils.py for testability
  - calculate_rate_limit_backoff() for exponential backoff
  - calculate_error_backoff() for linear backoff
  - clamp_retry_delay() for safe range enforcement
- Rename test_agent.py to test_rate_limit_utils.py (matches module)
- Add comprehensive false-positive tests:
  - Version numbers (v14.29.0)
  - Issue/PR numbers (#429)
  - Line numbers (file.py:429)
  - Port numbers (4293)
  - Legitimate wait/retry messages

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
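A sketch of what the regex-based detection and the clamp described in this commit might look like; the exact patterns, bounds, and names in rate_limit_utils.py may differ:

```python
import re

# Word-boundary patterns avoid matching "line 429", "port 4293", "v14.29.0", "#429", etc.
RATE_LIMIT_REGEXES = [
    re.compile(r"\brate[ _-]?limit(ed|s)?\b", re.IGNORECASE),
    re.compile(r"\btoo many requests\b", re.IGNORECASE),
    re.compile(r"\bquota exceeded\b", re.IGNORECASE),
    re.compile(r"\b(?:http|status|error)\s+429\b", re.IGNORECASE),
]


def is_rate_limit_error(message: str) -> bool:
    """True only when an unambiguous rate-limit signal is present."""
    return any(regex.search(message) for regex in RATE_LIMIT_REGEXES)


def clamp_retry_delay(seconds: int, minimum: int = 1, maximum: int = 3600) -> int:
    """Keep a parsed retry delay inside a safe 1s-1h window."""
    return max(minimum, min(seconds, maximum))
```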
…view fixes)

Resolved conflicts:
- Kept clear_stuck_features functionality (my-features specific)
- Integrated clamp_retry_delay() for bounds checking
- Preserved structured logging additions
- Used calculate_rate_limit_backoff() for exponential backoff
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@agent.py`:
- Around line 373-389: The rate-limit branch should reset the error counter so
backoffs remain consecutive: when handling status == "rate_limit" (in the block
using clamp_retry_delay and calculate_rate_limit_backoff and updating
rate_limit_retries), set error_retries = 0 before incrementing/using
rate_limit_retries; likewise, mirror this change in the error-handling branch
(where error_retries is updated) to reset rate_limit_retries = 0. Ensure both
places reference the existing variables rate_limit_retries and error_retries so
mixed events don't inflate delays.
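In the loop itself this amounts to resetting the opposite counter at the top of each branch, roughly as follows (the status and counter variables are those referenced above; surrounding code is omitted):

```python
# Inside the agent loop (illustrative): reset the opposite counter on entry to each branch.
if status == "rate_limit":
    error_retries = 0                 # a 429 is not a generic failure
    rate_limit_retries += 1
    delay_seconds = calculate_rate_limit_backoff(rate_limit_retries)
elif status == "error":
    rate_limit_retries = 0            # keep exponential delays tied to consecutive 429s
    error_retries += 1
    delay_seconds = calculate_error_backoff(error_retries)
```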
In `@rate_limit_utils.py`:
- Around line 48-52: The current patterns list in rate_limit_utils.py (the regex
entries starting with "retry.?after" and similar) will capture bare digits even
when the text says "minutes" or "hours"; update those regexes to prohibit
minute/hour units after the captured number by adding a negative lookahead for
minute/hour variants (e.g. (?!\s*(?:m|min|minute|minutes|h|hr|hour|hours)) )
immediately after the digit capture, or alternatively require an explicit
seconds unit when a unit is present (so only seconds/s are accepted); adjust the
pattern strings in the patterns list accordingly so "retry after 5 minutes" no
longer returns 5 while keeping valid "5 seconds" matches.
In `@test_rate_limit_utils.py`:
- Around line 53-57: Add a negative assertion to the test_minutes_not_supported
test to ensure strings like "retry after 5 minutes" are not parsed;
specifically, call parse_retry_after("retry after 5 minutes") and assert it
returns None so the existing "retry after" regex doesn't accept minutes. Update
test_minutes_not_supported to include this additional assert alongside the
existing "wait 5 minutes" and "try again in 2 minutes" checks.
1. agent.py: Reset opposite retry counter when entering rate_limit or error status to prevent mixed events from inflating delays
2. rate_limit_utils.py: Fix parse_retry_after() regex to reject minute/hour units - patterns now require explicit "seconds"/"s" unit or end of string
3. test_rate_limit_utils.py: Add tests for "retry after 5 minutes" and other minute/hour variants to ensure they return None

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
agent.py (1)
277-371: ⚠️ Potential issue | 🟠 Major: Clamp reset‑time delays to the 1‑hour cap.

Line 328 allows up to 24 hours, which bypasses the 1‑hour cap used by `clamp_retry_delay`/`calculate_rate_limit_backoff` and the PR objectives. Align this path with the same cap (or update the objective if 24h is intended).

🔧 Suggested fix (use shared clamp)

```diff
-                delay_seconds = int(min(
-                    delta.total_seconds(), 24 * 60 * 60
-                ))  # Clamp to 24 hours max
+                delay_seconds = clamp_retry_delay(int(delta.total_seconds()))
```
Use the shared clamp_retry_delay() function (1-hour cap) for parsed reset-time delays instead of a separate 24-hour cap. This aligns with the PR's consistent 1-hour maximum delay objective. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Thank you
Summary
Problem
When Claude API hits rate limits via HTTP 429 exceptions, the agent would:
Solution
- Add `parse_retry_after()` to extract wait times from error messages
- Add `is_rate_limit_error()` for comprehensive pattern matching
- Return new `"rate_limit"` status from exception handler

Test plan

- Unit tests (`test_agent.py` - 11 tests)

Fixes #41
Fixes #111
🤖 Generated with Claude Code