fix: embedding progress tracking + 429 rate limit retry #268
Closed
oxkage wants to merge 1 commit into HKUDS:main
Conversation
- Add real-time progress callback during KB embedding (fixes 0% -> 100% stuck)
- Add HTTP 429 retry with exponential backoff (5 retries, 5s/10s/20s/40s/80s)
- Reduce default batch_size from 10 to 3 (Gemini free tier friendly)
- Add 1.5s delay between embedding batches to prevent rate limiting
- Increase request timeout from 60s to 120s for slow APIs
- Wire progress callback through CustomEmbedding -> EmbeddingClient -> initializer

Files changed:
- openai_compatible.py: 429 handling + exponential backoff
- client.py: progress callback + batch delay
- config.py: batch_size 10 -> 3, timeout 60 -> 120
- llamaindex.py: wire progress callback through CustomEmbedding
- initializer.py: pass progress callback to RAG service
- service.py: pass kwargs through to pipeline
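The 429 handling described in the commit (5 retries with 5s/10s/20s/40s/80s waits, honoring a Retry-After hint when the server sends one) can be sketched roughly like this. `RateLimitError`, `embed_with_retry`, and the injected `sleep` are illustrative names for this sketch, not the actual adapter API:

```python
import time


class RateLimitError(Exception):
    """Stand-in for an HTTP 429 response; real adapter code would inspect
    the status code and the Retry-After header on the HTTP response."""
    def __init__(self, retry_after=None):
        super().__init__("HTTP 429 Too Many Requests")
        self.retry_after = retry_after


def backoff_delays(max_retries=5, base_delay=5.0):
    """The delay schedule described above: 5s, 10s, 20s, 40s, 80s."""
    return [base_delay * (2 ** i) for i in range(max_retries)]


def embed_with_retry(call, max_retries=5, base_delay=5.0, sleep=time.sleep):
    """Run `call()`; on a rate-limit error, wait with exponential backoff
    (preferring a Retry-After hint when present) and try again, up to
    `max_retries` retries before re-raising."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError as err:
            if attempt == max_retries:
                raise  # retries exhausted; surface the 429
            sleep(err.retry_after or base_delay * (2 ** attempt))
```

Injecting `sleep` keeps the retry logic testable without real waits; the production default is `time.sleep`.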
Collaborator
wow thanks! Will review this really soon.
pancacake added a commit that referenced this pull request on Apr 9, 2026
Merge oxkage's embedding progress tracking and HTTP 429 rate limit retry (PR #268), with the following review-driven improvements:

- Restore default batch_size=10 and request_timeout=60 to avoid a performance regression for non-free-tier users
- Promote batch_delay to EmbeddingConfig (was a private adapter attribute) so all adapters respect user configuration instead of a hardcoded 0.5s
- Use set_progress_callback() on the existing embed model instead of creating a new CustomEmbedding instance (avoids global state mutation)
- Clean up the progress callback in finally blocks to prevent it leaking into subsequent search/query calls
- Wire progress_callback through add_documents (not just initialize)
- Remove the dead HTTPStatusError 429 branch (already handled before raise_for_status)
- Remove the unused total_docs variable in initializer

Made-with: Cursor
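The set_progress_callback() plus finally-cleanup pattern this commit describes can be sketched as follows; the class and function names here are illustrative stand-ins, not the project's real `CustomEmbedding` API:

```python
class EmbedModel:
    """Toy embed model with a settable progress callback, mirroring the
    set_progress_callback() approach on the existing model instance."""
    def __init__(self):
        self._progress_cb = None

    def set_progress_callback(self, cb):
        self._progress_cb = cb

    def embed_batches(self, batches):
        done = 0
        for batch in batches:
            done += len(batch)  # pretend each item in the batch was embedded
            if self._progress_cb:
                self._progress_cb(done)
        return done


def initialize(model, batches, progress_cb):
    """Reuse the existing embed model rather than constructing a new one,
    and clear the callback in `finally` so it cannot leak into later
    search/query calls that use the same model."""
    model.set_progress_callback(progress_cb)
    try:
        return model.embed_batches(batches)
    finally:
        model.set_progress_callback(None)
```

The `finally` block is the key point: even if embedding raises, the shared model is left callback-free.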
Collaborator
Thanks. Your PR is merged with additional edits!
Problem
During KB initialization with large document sets (265+ files), the progress indicator stays stuck at 0/265 until the entire embedding process completes, then jumps straight to 100%. In addition, the client has no retry logic for the HTTP 429 errors that Gemini and other rate-limited embedding APIs return, so initialization fails outright.
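The shape of the fix for both problems can be sketched as a batched embedding loop that reports progress after every batch and throttles between batches; all names here are hypothetical, with `embed_one_batch` standing in for the real adapter call:

```python
import time


def embed_in_batches(texts, embed_one_batch, batch_size=3,
                     batch_delay=1.5, progress_callback=None, sleep=time.sleep):
    """Embed `texts` in small batches, invoking `progress_callback` after
    each batch (so the UI can show e.g. 7/265 instead of 0/265) and
    pausing `batch_delay` seconds between batches to stay under provider
    rate limits."""
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        vectors.extend(embed_one_batch(batch))
        if progress_callback:
            progress_callback(len(vectors), len(texts))  # done, total
        if start + batch_size < len(texts):
            sleep(batch_delay)  # throttle; skip the delay after the last batch
    return vectors
```

Reporting inside the loop (not once at the end) is what keeps the progress bar moving during long runs.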
Changes
1. Real-time embedding progress tracking
- Thread a progress_callback parameter through CustomEmbedding -> EmbeddingClient -> LlamaIndexPipeline -> initializer

2. HTTP 429 rate limit retry

- OpenAICompatibleEmbeddingAdapter retries on HTTP 429 with exponential backoff
- Honors the Retry-After response header when present

3. Gemini free tier friendly defaults

- batch_size: 10 -> 3 (fewer concurrent requests per batch)
- request_timeout: 60s -> 120s (slower APIs need more time)
- BATCH_DELAY: 1.5s between batches to prevent rate limiting

Files changed

- deeptutor/services/embedding/adapters/openai_compatible.py -- 429 handling + exponential backoff
- deeptutor/services/embedding/client.py -- progress callback + batch delay
- deeptutor/services/embedding/config.py -- batch_size 10 -> 3, timeout 60 -> 120
- deeptutor/services/rag/pipelines/llamaindex.py -- wire progress callback through CustomEmbedding
- deeptutor/knowledge/initializer.py -- pass progress callback to RAG service
- deeptutor/services/rag/service.py -- pass kwargs through to pipeline

Testing