Add adaptive embedding throughput shaping for Azure 429 limits #1115
Merged
BenjaminMichaelis merged 5 commits on May 16, 2026
- Downshift embedding batch size on repeated 429s by recursively splitting batches
- Reuse successful smaller batch size for subsequent requests in the same run
- Fail clearly when batch size 1 still receives sustained 429 throttling
- Add sequential request pacing with configurable min inter-request delay
- Add configurable MaxEmbeddingBatchSize and MinInterRequestDelayMs options
- Harden Retry-After parsing for retry-after, retry-after-ms, x-ms-retry-after-ms, and message hints
- Update configuration comments and default appsettings values
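The downshift behavior described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in EmbeddingService.cs: `ThrottledException`, `AdaptiveBatcher`, and the synchronous `send` delegate are hypothetical names, and the sketch lowers the remembered batch size at the moment of the split rather than after a confirmed success.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an HTTP 429 / RateLimitReached failure
// that has already exhausted its ordinary retries.
public sealed class ThrottledException : Exception
{
    public ThrottledException() : base("429 Too Many Requests") { }
}

// Hypothetical helper: sends items in batches and halves a batch when it is throttled.
public sealed class AdaptiveBatcher
{
    private readonly Action<IReadOnlyList<string>> _send;
    private int _effectiveBatchSize;

    public AdaptiveBatcher(int maxBatchSize, Action<IReadOnlyList<string>> send)
    {
        if (maxBatchSize < 1) throw new ArgumentOutOfRangeException(nameof(maxBatchSize));
        _effectiveBatchSize = maxBatchSize;
        _send = send ?? throw new ArgumentNullException(nameof(send));
    }

    public int EffectiveBatchSize => _effectiveBatchSize;

    public void SendAll(IReadOnlyList<string> items)
    {
        int offset = 0;
        while (offset < items.Count)
        {
            // Re-read the effective size on every pass so a downshift
            // carries over to the remaining batches in this run.
            int take = Math.Min(_effectiveBatchSize, items.Count - offset);
            SendWithSplit(items.Skip(offset).Take(take).ToList());
            offset += take;
        }
    }

    private void SendWithSplit(IReadOnlyList<string> batch)
    {
        try
        {
            _send(batch);
        }
        catch (ThrottledException) when (batch.Count > 1)
        {
            // Split in two, remember the smaller size, and recurse into each half.
            int half = (batch.Count + 1) / 2;
            _effectiveBatchSize = Math.Min(_effectiveBatchSize, half);
            SendWithSplit(batch.Take(half).ToList());
            SendWithSplit(batch.Skip(half).ToList());
        }
        catch (ThrottledException ex)
        {
            // A single-item batch cannot be split further, so fail clearly.
            throw new InvalidOperationException(
                "Embedding request is still throttled at batch size 1.", ex);
        }
    }
}
```

With a maximum batch size of 8 and a service that only accepts batches of 2 or fewer, the first batch is split twice and every later batch starts at size 2.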
- Log embedding rebuild start with known total (when available)
- Emit progress at 10% milestones when total chunk count is known
- Fall back to every 500 chunks when total is unknown
- Include current adaptive batch size in progress logs
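One way to picture the milestone rules in this commit is the sketch below. `RebuildProgress`, `Report`, and the message wording are hypothetical; only the 10% and 500-chunk thresholds come from the commit message. The percent calculation uses `long` arithmetic, in line with the overflow fix mentioned in a later commit.

```csharp
#nullable enable
using System;

// Hypothetical tracker: decides when a progress line should be logged.
public sealed class RebuildProgress
{
    private const int FallbackInterval = 500;
    private readonly long? _totalChunks;
    private int _lastPercentMilestone;
    private long _nextFallbackAt = FallbackInterval;

    public RebuildProgress(long? totalChunks) => _totalChunks = totalChunks;

    // Returns a log message when a milestone is crossed, otherwise null.
    public string? Report(long processed, int currentBatchSize)
    {
        if (_totalChunks is long total && total > 0)
        {
            // long arithmetic keeps processed * 100 from overflowing Int32.
            long percent = Math.Min(100L, processed * 100L / total);
            int milestone = (int)(percent / 10 * 10);
            if (milestone <= _lastPercentMilestone) return null;
            _lastPercentMilestone = milestone;
            return $"Embedded {processed}/{total} chunks ({milestone}%), batch size {currentBatchSize}";
        }

        // Total unknown: report each time another 500 chunks have been processed.
        if (processed < _nextFallbackAt) return null;
        _nextFallbackAt = (processed / FallbackInterval + 1) * FallbackInterval;
        return $"Embedded {processed} chunks (total unknown), batch size {currentBatchSize}";
    }
}
```

A caller would invoke `Report` after each successful batch and pass any non-null result to its logger.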
Pull request overview
Improves resilience of Azure OpenAI embedding rebuilds under sustained throttling by introducing adaptive batch downshifting, request pacing, and more robust Retry-After handling so rebuilds can continue progressing instead of repeatedly exhausting retries.
Changes:
- Added adaptive batch splitting/downshifting on 429/RateLimitReached during embedding rebuild uploads.
- Serialized and paced embedding requests with a configurable minimum inter-request delay.
- Hardened Retry-After parsing (more header variants + message parsing) and added coarse rebuild progress logging.
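The hardened Retry-After handling could look something like the sketch below. `RetryAfterParser` and its `Parse` signature are hypothetical, and two choices are assumptions rather than facts from the pull request: the millisecond headers take precedence over `retry-after`, and the HTTP-date form of `Retry-After` is left out for brevity.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical parser for the header variants and message hint named above.
public static class RetryAfterParser
{
    private static readonly Regex MessageHint = new Regex(
        @"retry after (\d+) seconds?",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    public static TimeSpan? Parse(
        IEnumerable<KeyValuePair<string, string>> headers, string? message)
    {
        string? secondsValue = null;
        foreach (KeyValuePair<string, string> header in headers)
        {
            // Assumption: millisecond headers are more precise, so they win.
            if ((IsHeader(header.Key, "retry-after-ms") || IsHeader(header.Key, "x-ms-retry-after-ms"))
                && TryParseBounded(header.Value, out double ms))
            {
                return TimeSpan.FromMilliseconds(ms);
            }

            if (IsHeader(header.Key, "retry-after")) secondsValue = header.Value;
        }

        // Only the delta-seconds form of retry-after is handled here.
        if (secondsValue != null && TryParseBounded(secondsValue, out double seconds))
            return TimeSpan.FromSeconds(seconds);

        // Last resort: a "retry after N seconds" hint in the exception message.
        Match match = MessageHint.Match(message ?? string.Empty);
        if (match.Success && TryParseBounded(match.Groups[1].Value, out double hinted))
            return TimeSpan.FromSeconds(hinted);

        return null;
    }

    private static bool IsHeader(string name, string expected) =>
        string.Equals(name, expected, StringComparison.OrdinalIgnoreCase);

    // Rejects negative, non-numeric, and absurdly large values.
    private static bool TryParseBounded(string value, out double result) =>
        double.TryParse(value, NumberStyles.Float, CultureInfo.InvariantCulture, out result)
        && result >= 0 && result <= int.MaxValue;
}
```

Returning `null` when nothing parses lets the caller fall back to its own backoff schedule.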
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| EssentialCSharp.Web/appsettings.json | Adds default configuration values for max embedding batch size and inter-request pacing delay. |
| EssentialCSharp.Chat.Shared/Services/EmbeddingService.cs | Implements pacing/serialization, adaptive batch downshifting on throttling, improved Retry-After extraction, and rebuild progress logging. |
| EssentialCSharp.Chat.Shared/Models/EmbeddingRetryOptions.cs | Introduces new retry/pacing configuration knobs with validation. |
| EssentialCSharp.Chat.Shared/Extensions/ServiceCollectionExtensions.cs | Clarifies configuration override semantics for the embedding retry options binding. |
- Make embedding pacing timestamp static to match static request lock scope
- Use long arithmetic in percent progress threshold comparison to avoid overflow
- Make _lastEmbeddingRequestStartedUtc instance-scoped
- Keep pacing behavior unchanged for singleton DI registration
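A rough sketch of serialized, paced requests with an instance-scoped timestamp follows. `RequestPacer`, `RunAsync`, and the injectable clock and delay delegates are hypothetical and exist mainly so the pacing can be exercised without real waiting; the actual service may be structured differently.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical pacer: one request at a time, with a minimum gap between request starts.
public sealed class RequestPacer
{
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private readonly TimeSpan _minDelay;
    private readonly Func<DateTimeOffset> _utcNow;
    private readonly Func<TimeSpan, Task> _delay;

    // Instance-scoped: with a singleton registration there is only one pacer anyway.
    private DateTimeOffset? _lastRequestStartedUtc;

    public RequestPacer(
        TimeSpan minDelay,
        Func<DateTimeOffset>? utcNow = null,
        Func<TimeSpan, Task>? delay = null)
    {
        _minDelay = minDelay;
        _utcNow = utcNow ?? (() => DateTimeOffset.UtcNow);
        _delay = delay ?? (d => Task.Delay(d));
    }

    public async Task<T> RunAsync<T>(Func<Task<T>> request)
    {
        await _gate.WaitAsync();
        try
        {
            if (_lastRequestStartedUtc is DateTimeOffset last)
            {
                // Wait out whatever remains of the minimum inter-request delay.
                TimeSpan wait = last + _minDelay - _utcNow();
                if (wait > TimeSpan.Zero) await _delay(wait);
            }

            _lastRequestStartedUtc = _utcNow();
            return await request();
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

Because the semaphore is held for the whole request, concurrent callers queue up and each one observes the timestamp left by the previous request.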
- Log request attempt state before each embedding call with batch sizing fields
- Log successful batch requests using the same structured state event
- Log throttled downshift transitions with old/new effective batch size context
- Add end-of-run successful batch-size summary counts for production tuning
Why
The previous retry-only fix still failed under sustained S0 throttling: large embedding requests kept exhausting retries at the same payload size. We need throughput shaping so rebuilds can continue progressing under rate limits instead of stalling at repeated 429 exhaustion.
What changed
- New option AIOptions:EmbeddingRetry:MaxEmbeddingBatchSize (default 2048)
- New option AIOptions:EmbeddingRetry:MinInterRequestDelayMs (default 250)
- Retry-After parsing reads the retry-after, retry-after-ms, and x-ms-retry-after-ms headers
- Retry-After parsing also reads "retry after N seconds" hints from exception message text
Validation
- dotnet build EssentialCSharp.Chat.Shared/EssentialCSharp.Chat.Common.csproj -c Release --nologo
- dotnet test EssentialCSharp.Chat.Tests/EssentialCSharp.Chat.Tests.csproj -c Release --no-restore -v q
Both passed.