LLM provider SDK lacks retry logic for transient API errors #11705

@ryan-mt

Summary

The LLM provider integration lacks retry logic for transient API errors (rate limits, temporary failures), treating all errors as permanent failures.

Location

packages/opencode/src/provider/provider.ts (overall architecture)
packages/opencode/src/provider/sdk/copilot/openai-compatible-error.ts

Issue

The provider SDK:

  1. Has no exponential backoff for 429 (rate limit) responses
  2. Doesn't parse or respect `Retry-After` headers
  3. Treats all API errors as permanent failures
  4. Has no retry budget or max attempts configuration
  5. Error structures don't implement `isRetryable` checks

```typescript
// Current error handling doesn't distinguish retryable from permanent errors
const errorStructure = config.errorStructure ?? defaultOpenAICompatibleErrorStructure
```
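
To illustrate point 2 in the list above: a `Retry-After` header can carry either a number of seconds or an HTTP date, so a parser should handle both forms. The following is a minimal sketch, not code from the repository. The name `parseRetryAfterMs` and its `now` parameter are assumptions introduced here for illustration.

```typescript
// Hypothetical helper; not part of the existing provider SDK.
// Converts a Retry-After header value into a delay in milliseconds.
// Returns undefined when the header is absent or cannot be parsed.
function parseRetryAfterMs(value: string | null, now: number = Date.now()): number | undefined {
  if (value === null) return undefined
  const trimmed = value.trim()
  if (trimmed === "") return undefined

  // Form 1: delay in seconds, a non-negative integer such as "120".
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000

  // Form 2: an HTTP date such as "Wed, 21 Oct 2015 07:28:00 GMT".
  const date = Date.parse(trimmed)
  if (Number.isNaN(date)) return undefined

  // A date in the past means the caller may retry immediately.
  return Math.max(0, date - now)
}
```

A caller could fall back to its own backoff calculation whenever this returns `undefined`, and might also want to cap the result at `maxDelayMs` so a large server-supplied value cannot stall a request indefinitely.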

Impact

  • Severity: Medium
  • Type: Reliability
  • Effect:
    • Rate limit errors immediately fail instead of backing off
    • Temporary network issues cause permanent failures
    • Higher error rates during API instability

Suggested Fix

  1. Add retry configuration to provider options:
```typescript
interface ProviderRetryConfig {
  maxAttempts?: number
  initialDelayMs?: number
  maxDelayMs?: number
  retryableStatusCodes?: number[]
}
```
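  2. Implement exponential backoff with jitter:
```typescript
// Assumes calculateBackoff, sleep, and isRetryableError are defined elsewhere.
async function fetchWithRetry(url: string, options: RequestInit, retryConfig: ProviderRetryConfig) {
  // maxAttempts is optional in ProviderRetryConfig; 3 is a placeholder default.
  const maxAttempts = retryConfig.maxAttempts ?? 3
  for (let attempt = 0; ; attempt++) {
    const isLastAttempt = attempt >= maxAttempts - 1
    try {
      const response = await fetch(url, options)
      const retryable = response.status === 429 || response.status >= 500
      // Hand back successes, non-retryable errors, and the final failed attempt
      // (without sleeping first) so the caller never receives undefined.
      if (!retryable || isLastAttempt) return response
      // Retry-After may also be an HTTP date; parseInt yields NaN for that form,
      // in which case we fall back to the computed backoff.
      const retryAfter = Number.parseInt(response.headers.get('Retry-After') ?? '', 10)
      const delay = Number.isNaN(retryAfter)
        ? calculateBackoff(attempt, retryConfig)
        : retryAfter * 1000
      await sleep(delay)
    } catch (e) {
      if (!isRetryableError(e) || isLastAttempt) throw e
      await sleep(calculateBackoff(attempt, retryConfig))
    }
  }
}
```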
  3. Add `isRetryable` to error structures to distinguish between:
    • Rate limit (429) - retry with backoff
    • Server error (5xx) - retry with backoff
    • Client error (4xx except 429) - don't retry
    • Network error - retry with backoff
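
The `fetchWithRetry` sketch above calls `calculateBackoff`, `sleep`, and `isRetryableError` without defining them. One possible shape for these helpers, plus a status classifier matching the four cases listed above, is shown below. This is an illustrative sketch under stated assumptions: `BackoffConfig`, `isRetryableStatus`, the injectable `random` parameter, and the default delays are all hypothetical and do not exist in the provider SDK.

```typescript
// All names below are illustrative; none exist in the provider SDK today.
interface BackoffConfig {
  initialDelayMs?: number
  maxDelayMs?: number
}

// Exponential backoff with "full jitter": pick a random delay in
// [0, min(maxDelayMs, initialDelayMs * 2^attempt)).
// The 500 ms and 30 s defaults are placeholders, not project settings.
function calculateBackoff(
  attempt: number,
  config: BackoffConfig,
  random: () => number = Math.random,
): number {
  const initial = config.initialDelayMs ?? 500
  const max = config.maxDelayMs ?? 30_000
  const ceiling = Math.min(max, initial * 2 ** attempt)
  return Math.floor(random() * ceiling)
}

// Resolves after the given number of milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms))
}

// Classifies an HTTP status: 429 and 5xx are retryable, everything else is not.
function isRetryableStatus(status: number): boolean {
  if (status === 429) return true
  if (status >= 500 && status <= 599) return true
  return false
}

// fetch() rejects with a TypeError on network failure. This check is
// deliberately coarse: it also matches TypeErrors caused by malformed
// requests, so a real implementation would likely inspect the error further.
function isRetryableError(error: unknown): boolean {
  return error instanceof TypeError
}
```

Jitter spreads retries from many clients over time so they do not all hit the API again at the same instant after a rate-limit window. Passing `random` as a parameter is only there to make the delay calculation deterministic in tests.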
