Skip to content

[ENHANCEMENT] Honor provider Retry-After on 429 in embedder (faster indexing) #8101

@hannesrudolph

Description

@hannesrudolph

Type

Enhancement

Problem / Value

When embedding requests hit provider rate limits (e.g., 429), providers like Gemini and others return a specific cooldown (Retry-After or similar). The current behavior uses fixed/exponential delays and a global wait, which can either slow down users who aren’t hitting limits or keep retrying too soon and waste quota. Respecting the provider’s exact retry guidance improves throughput and avoids unnecessary slowdowns or quota exhaustion.

Context

Affects code indexing that generates embeddings across providers (including Gemini via the OpenAI-compatible endpoint). Providers often include a clear retry window or structured rate-limit info in the response. Today, retries use guessed backoff and a shared global delay rather than the provider’s suggested wait. Users indexing larger projects are most impacted; honoring provider signals would keep indexing fast for those not rate-limited and recover efficiently when limits are hit.

Constraints/Preferences

  • Parse and use the provider’s Retry-After or equivalent backoff signal on 429 responses.
  • For Gemini, also parse structured retry info when present.
  • Only delay when a rate limit is actually hit; do not introduce per-request delays otherwise.
  • Fall back to exponential backoff only if no provider guidance is available.
  • Keep a reasonable max cap and avoid log flooding; retain global coordination but base it on provider-provided reset times.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    Status

    Issue [Unassigned]

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions