Skip to content

Harden Azure OpenAI embedding retry for 429 throttling#1114

Merged
BenjaminMichaelis merged 5 commits into
mainfrom
benjaminmichaelis/embedding-rate-limit-retry
May 16, 2026
Merged

Harden Azure OpenAI embedding retry for 429 throttling#1114
BenjaminMichaelis merged 5 commits into
mainfrom
benjaminmichaelis/embedding-rate-limit-retry

Conversation

@BenjaminMichaelis
Copy link
Copy Markdown
Member

Why

Embedding generation could fail fast on transient Azure OpenAI throttling (HTTP 429), which interrupted full vector rebuild runs. We need resilient retry behavior that respects service guidance while still failing clearly when retries are exhausted.

What changed

  • Added robust retry handling in EmbeddingService for transient failures, including ClientResultException status-based detection (429/5xx/408), exponential backoff, jitter, and Retry-After header support when present.
  • Kept failure behavior explicit: retries are logged with context, and final exhaustion throws a clear terminal exception with attempt details.
  • Preserved cancellation semantics: caller-requested cancellation is rethrown directly and not wrapped as retry exhaustion.
  • Improved staging cleanup behavior after failures so cleanup is best effort and does not mask the original error.
  • Replaced ad-hoc logger calls with source-generated LoggerMessage methods to match project logging conventions.

Configuration and ASP.NET conventions

  • Introduced EmbeddingRetryOptions and bound it via standard options binding at AIOptions:EmbeddingRetry.
  • Added validation (data annotations plus runtime validation) and safe defaults.
  • Added EmbeddingRetry defaults in EssentialCSharp.Web/appsettings.json.
  • Added MaxDelayMs cap to prevent delay overflow/unbounded waits.

Additional cleanup

  • Removed tracked build_output.txt and added it to .gitignore.
  • Removed obsolete RetryOptions model after renaming to avoid ambiguity with Azure.Core.RetryOptions.

Validation

  • Built EssentialCSharp.Chat.Shared/EssentialCSharp.Chat.Common.csproj successfully after changes.
  • Ran dual review passes with Opus 4.6 and GPT-5.5 and incorporated findings.

…ackoff retry

- Add RetryOptions configuration model with configurable backoff parameters
- Implement retry logic with exponential backoff + jitter for transient Azure OpenAI errors
- Honor Retry-After header from 429 responses
- Wrap embedding generation calls with automatic retry wrapper
- Ensure batch processing can recover from transient failures
- Wire configuration via options pattern with safe defaults
- Add comprehensive logging for retry attempts and final failures

Fixes issue where transient 429 errors from text-embedding-3-small-v1
would fail entire embedding batch. Now retries with exponential backoff
(max 5 attempts by default) before failing with clear error context.
- Switch to ASP.NET-style nested options path AIOptions:EmbeddingRetry
- Rename retry options model to avoid Azure.Core RetryOptions ambiguity
- Add data annotations and runtime validation for retry configuration
- Handle ClientResultException transient status codes (429/5xx/408)
- Parse and honor Retry-After header when present
- Use LoggerMessage source-generated logging instead of CA1848 suppression
- Use Random.Shared for thread-safe jitter in singleton service
- Preserve caller cancellation semantics (no retry/wrap on requested cancel)
- Use CancellationToken.None for staging cleanup to avoid masking root failures
- Cap exponential delay with MaxDelayMs to avoid overflow
Copilot AI review requested due to automatic review settings May 16, 2026 02:22
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds resilient retry handling for transient Azure OpenAI errors (429, 408, 5xx, timeouts) in EmbeddingService, configurable via a new EmbeddingRetryOptions bound at AIOptions:EmbeddingRetry. Final retry exhaustion is logged via LoggerMessage source generators and surfaced as a clear terminal exception, while caller cancellation is preserved and staging cleanup is hardened to not mask the original error.

Changes:

  • Introduce EmbeddingRetryOptions (with data annotations + Validate()) and bind/register it in ServiceCollectionExtensions for both DI entry points.
  • Wrap GenerateEmbeddingAsync and the per-batch embedding call in ExecuteWithRetryAsync (exponential backoff + jitter + Retry-After), with source-generated logging and a non-cancellable staging-collection cleanup on failure.
  • Add EmbeddingRetry defaults to appsettings.json and ignore build_output.txt in .gitignore.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
EssentialCSharp.Chat.Shared/Services/EmbeddingService.cs Adds retry pipeline, transient-error detection, Retry-After parsing, LoggerMessage methods, and switches staging cleanup to CancellationToken.None.
EssentialCSharp.Chat.Shared/Models/EmbeddingRetryOptions.cs New options class with [Range] annotations, SectionPath constant, and imperative Validate().
EssentialCSharp.Chat.Shared/Extensions/ServiceCollectionExtensions.cs Registers and binds EmbeddingRetryOptions in both AddAzureOpenAIServices overloads with data-annotation + custom validation.
EssentialCSharp.Web/appsettings.json Adds default AIOptions:EmbeddingRetry configuration block.
.gitignore Ignores build_output.txt.
Comments suppressed due to low confidence (1)

EssentialCSharp.Chat.Shared/Models/EmbeddingRetryOptions.cs:82

  • Validate() only enforces lower-bound (and partial cross-field) checks, but the [Range] attributes on the properties define upper bounds (e.g., MaxRetries capped at 20, BaseDelayMs/MaxDelayMs at 600000, BackoffMultiplier at 10.0). Because the registration calls ValidateDataAnnotations() and .Validate(o => { o.Validate(); return true; }), the imperative Validate() will accept values (e.g., MaxRetries = 1000, BackoffMultiplier = 100.0) that the data-annotation validator rejects, leading to inconsistent behavior depending on which validation path is exercised (and when called directly from the secondary constructor / ValidateRetryOptions, the data annotations are never applied). Consider mirroring the [Range] upper bounds in Validate() so both paths agree.
    public void Validate()
    {
        if (MaxRetries < 0)
            throw new InvalidOperationException("MaxRetries must be non-negative.");

        if (BaseDelayMs < 0)
            throw new InvalidOperationException("BaseDelayMs must be non-negative.");

        if (MaxDelayMs <= 0)
            throw new InvalidOperationException("MaxDelayMs must be positive.");

        if (BaseDelayMs > MaxDelayMs)
            throw new InvalidOperationException("BaseDelayMs must be less than or equal to MaxDelayMs.");

        if (BackoffMultiplier < 1.0)
            throw new InvalidOperationException("BackoffMultiplier must be >= 1.0.");

        if (MaxJitterFraction < 0.0 || MaxJitterFraction > 1.0)
            throw new InvalidOperationException("MaxJitterFraction must be between 0.0 and 1.0.");
    }

Comment thread EssentialCSharp.Chat.Shared/Models/EmbeddingRetryOptions.cs Outdated
Comment thread EssentialCSharp.Chat.Shared/Services/EmbeddingService.cs Outdated
Comment thread EssentialCSharp.Chat.Shared/Services/EmbeddingService.cs Outdated
Comment thread EssentialCSharp.Chat.Shared/Services/EmbeddingService.cs Outdated
- Clarify MaxRetries XML docs as retries (not total attempts)
- Clamp server Retry-After delays to MaxDelayMs
- Rethrow original transient exception after retry exhaustion
- Remove unnecessary string interpolation marker
@BenjaminMichaelis BenjaminMichaelis merged commit 5b3721a into main May 16, 2026
8 checks passed
@BenjaminMichaelis BenjaminMichaelis deleted the benjaminmichaelis/embedding-rate-limit-retry branch May 16, 2026 03:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants