Description
Type: Enhancement
Problem / Value
When embedding requests hit provider rate limits (HTTP 429), providers such as Gemini return a specific cooldown via a Retry-After header or a similar signal. The current behavior uses fixed/exponential delays and a global wait, which can slow down users who aren't hitting limits or retry too soon and waste quota. Respecting the provider's exact retry guidance improves throughput and avoids unnecessary slowdowns or quota exhaustion.
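For illustration, Retry-After can arrive either as a delay in seconds or as an HTTP date, so both forms need handling. A minimal sketch, assuming a raw header string (the function name `parseRetryAfterMs` is hypothetical, not from the codebase):

```typescript
// Parse a Retry-After header into a wait in milliseconds.
// Returns null when the header is absent or unparseable, so the
// caller can fall back to exponential backoff.
function parseRetryAfterMs(header: string | null): number | null {
  if (!header) return null;
  // Form 1: delay in seconds, e.g. "30"
  const secs = Number(header);
  if (!Number.isNaN(secs)) return secs * 1000;
  // Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  const at = Date.parse(header);
  if (!Number.isNaN(at)) return Math.max(0, at - Date.now());
  return null;
}
```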
Context
Affects code indexing that generates embeddings across providers (including Gemini via the OpenAI-compatible endpoint). Providers often include a clear retry window or structured rate-limit info in the response. Today, retries use heuristic backoff and a shared global delay rather than the provider's suggested wait. Users indexing larger projects are most impacted; honoring provider signals would keep indexing fast for those not rate-limited and recover efficiently when limits are hit.
Constraints/Preferences
- Parse and use the provider’s Retry-After or equivalent backoff signal on 429 responses.
- For Gemini, also parse structured retry info when present.
- Only delay when a rate limit is actually hit; do not introduce per-request delays otherwise.
- Fall back to exponential backoff only if no provider guidance is available.
- Keep a reasonable maximum delay cap and avoid log flooding; retain global coordination, but base it on provider-provided reset times.