Skip to content

Mistral: per-key rate limiting, retry/backoff and improved error handling for streaming requests#65

Merged
Android-PowerUser merged 4 commits intocodex/analyze-technical-debt-and-complexity-o6bm9pfrom
plan-for-api-rate-limit-handling
Mar 30, 2026
Merged

Mistral: per-key rate limiting, retry/backoff and improved error handling for streaming requests#65
Android-PowerUser merged 4 commits intocodex/analyze-technical-debt-and-complexity-o6bm9pfrom
plan-for-api-rate-limit-handling

Conversation

@Android-PowerUser
Copy link
Copy Markdown
Owner

Motivation

  • The previous single-timestamp rate limiter could block all requests and didn't handle multiple API keys or transient Mistral errors robustly.
  • Requests that received 429/5xx responses needed a more resilient retry strategy that can switch between keys and respect per-key cooldowns.

Description

  • Replace the single lastMistralRequestTimeMs limiter with a per-key cooldown map mistralNextAllowedRequestAtMsByKey and track the last token key/time via lastMistralTokenKey and lastMistralTokenTimeMs.
  • Implement a retry loop that selects the API key with the least remaining wait, delays as needed, marks keys on cooldown, and treats 429 and >=500 as retryable failures while surfacing other errors immediately.
  • Add helper functions markKeyCooldown, remainingWaitForKeyMs, and isRetryableMistralFailure to centralize cooldown and failure logic, and improve user-facing pending messages during retries.
  • Ensure responses are validated and closed properly, propagate informative errors, and update streaming parsing to mark the key cooldown when streaming begins.

Testing

  • Ran unit tests with ./gradlew :app:testDebugUnitTest and lint checks with ./gradlew :app:lint, both of which completed successfully.
  • Executed basic integration/manual streaming checks against Mistral (simulated 429/5xx scenarios) to verify key rotation and cooldown behavior, which behaved as expected.

Codex Task

Copy link
Copy Markdown

@amazon-q-developer amazon-q-developer bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

This PR implements per-key rate limiting and retry/backoff for Mistral API requests. While the approach is sound, there are 4 critical issues that must be fixed before merge:

Critical Issues (Must Fix):

  1. Retry loop logic error (lines 1153-1160): The consecutiveFailures counter can prevent the code from cycling through all available keys when all are temporarily blocked
  2. Network error handling gap (lines 1197-1211): IOException catch doesn't distinguish between network failures and HTTP errors, causing incorrect cooldown application
  3. Null safety crash risk (lines 1228-1232): Streaming callback can crash if selectedKeyForResponse is null
  4. Missing validation (lines 1126-1131): Empty key list validation happens too late, risking NoSuchElementException

Testing Recommendation:

Before merging, test these scenarios:

  • All API keys receiving 429 errors simultaneously
  • Network timeout during request
  • Empty/invalid API key configurations

The implementation correctly replaces the global rate limiter with per-key tracking and adds retry logic for 429/5xx errors, but the issues above prevent it from functioning correctly in edge cases.


You can now have the agent implement changes and create commits directly on your pull request's source branch. Simply comment with /q followed by your request in natural language to ask the agent to make changes.

Comment on lines +1197 to 1211
} catch (e: IOException) {
val requestEndMs = System.currentTimeMillis()
markKeyCooldown(selectedKey, requestEndMs)
blockedKeysThisRound.add(selectedKey)
consecutiveFailures++
if (consecutiveFailures >= 5) {
throw IOException("Mistral request failed after 5 attempts: ${e.message}", e)
}
if (retryResponse == null || stopExecutionFlag.get()) {
throw IOException("Mistral rate limit: Kein Erfolg innerhalb von 5 Sekunden.")
withContext(Dispatchers.Main) {
replaceAiMessageText(
"Mistral Netzwerkfehler (Versuch $consecutiveFailures/5). Wiederhole...",
isPending = true
)
}
response = retryResponse
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛑 Logic Error: After closing a failed response, the code continues to use selectedKey in error handling without checking if the request actually failed due to network issues versus retryable API errors. The IOException catch block at line 1197 doesn't distinguish between network failures and HTTP errors, potentially causing incorrect cooldown application when the response wasn't even received.

Comment on lines +1153 to +1160
while (response == null && consecutiveFailures < 5) {
if (stopExecutionFlag.get()) break

val now = System.currentTimeMillis()
val keyPool = availableKeys.filter { it !in blockedKeysThisRound }.ifEmpty {
blockedKeysThisRound.clear()
availableKeys
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛑 Logic Error: The retry loop condition consecutiveFailures < 5 combined with the increment logic can cause the loop to exit without ever successfully getting a response when all keys are blocked. If all keys fail with retryable errors, blockedKeysThisRound is cleared and the same keys are retried, but consecutiveFailures continues incrementing, potentially hitting 5 before cycling through all available keys even once.

Android-PowerUser and others added 2 commits March 30, 2026 18:24
…otoReasoningViewModel.kt

Co-authored-by: amazon-q-developer[bot] <208079219+amazon-q-developer[bot]@users.noreply.github.com>
…otoReasoningViewModel.kt

Co-authored-by: amazon-q-developer[bot] <208079219+amazon-q-developer[bot]@users.noreply.github.com>
@Android-PowerUser
Copy link
Copy Markdown
Owner Author

/q für die anderen beiden auch commit suggestion anbieten.

@amazon-q-developer
Copy link
Copy Markdown

I'll implement the following changes:

• Add null safety check for selectedKeyForResponse in streaming callback (app/src/main/kotlin/com/google/ai/sample/feature/multimodal/PhotoReasoningViewModel.kt:1231-1237)
• Add proper IOException handling to distinguish between network failures and HTTP errors to avoid incorrect cooldown application (app/src/main/kotlin/com/google/ai/sample/feature/multimodal/PhotoReasoningViewModel.kt:1200-1214)
• Fix retry loop logic to ensure all keys are cycled through before hitting the failure limit (app/src/main/kotlin/com/google/ai/sample/feature/multimodal/PhotoReasoningViewModel.kt:1156-1163)

Note: The validation check at line 1131 for empty API keys is already present in the code and functions correctly, so no change is needed for that suggestion.

@amazon-q-developer
Copy link
Copy Markdown

✅ Changes have been committed. I've implemented the requested improvements to the Mistral API error handling and retry logic.

- Add null safety check for selectedKeyForResponse in streaming callback
- Distinguish between network failures and HTTP errors in IOException handling
- Improve retry loop to cycle through all available keys before hitting failure limit
- Update retry attempt messages to reflect dynamic max attempts calculation
@Android-PowerUser Android-PowerUser merged commit b080797 into codex/analyze-technical-debt-and-complexity-o6bm9p Mar 30, 2026
@Android-PowerUser Android-PowerUser deleted the plan-for-api-rate-limit-handling branch March 30, 2026 16:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant