fix: embedding progress tracking + 429 rate limit retry #268
Closed
oxkage wants to merge 1 commit into HKUDS:main
Conversation
- Add real-time progress callback during KB embedding (fixes 0% -> 100% stuck)
- Add HTTP 429 retry with exponential backoff (5 retries, 5s/10s/20s/40s/80s)
- Reduce default batch_size from 10 to 3 (Gemini free tier friendly)
- Add 1.5s delay between embedding batches to prevent rate limiting
- Increase request timeout from 60s to 120s for slow APIs
- Wire progress callback through CustomEmbedding -> EmbeddingClient -> initializer

Files changed:
- openai_compatible.py: 429 handling + exponential backoff
- client.py: progress callback + batch delay
- config.py: batch_size 10 -> 3, timeout 60 -> 120
- llamaindex.py: wire progress callback through CustomEmbedding
- initializer.py: pass progress callback to RAG service
- service.py: pass kwargs through to pipeline
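The 429 handling described in the commit (5 retries with 5s/10s/20s/40s/80s waits, honoring a Retry-After hint when the server sends one) can be sketched roughly like this. `RateLimitError`, `embed_with_retry`, and the injected `sleep` are illustrative names for this sketch, not the actual adapter API:

```python
import time


class RateLimitError(Exception):
    """Stand-in for an HTTP 429 response; real adapter code would inspect
    the status code and the Retry-After header on the HTTP response."""
    def __init__(self, retry_after=None):
        super().__init__("HTTP 429 Too Many Requests")
        self.retry_after = retry_after


def backoff_delays(max_retries=5, base_delay=5.0):
    """The delay schedule described above: 5s, 10s, 20s, 40s, 80s."""
    return [base_delay * (2 ** i) for i in range(max_retries)]


def embed_with_retry(call, max_retries=5, base_delay=5.0, sleep=time.sleep):
    """Run `call()`; on a rate-limit error, wait with exponential backoff
    (preferring a Retry-After hint when present) and try again, up to
    `max_retries` retries before re-raising."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError as err:
            if attempt == max_retries:
                raise  # retries exhausted; surface the 429
            sleep(err.retry_after or base_delay * (2 ** attempt))
```

Injecting `sleep` keeps the retry logic testable without real waits; the production default is `time.sleep`.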
Collaborator
wow thanks! Will review this really soon.
pancacake added a commit that referenced this pull request on Apr 9, 2026
Merge oxkage's embedding progress tracking and HTTP 429 rate limit retry (PR #268), with the following review-driven improvements:

- Restore default batch_size=10 and request_timeout=60 to avoid a performance regression for non-free-tier users
- Promote batch_delay to EmbeddingConfig (was a private adapter attribute) so all adapters respect user configuration instead of a hardcoded 0.5s
- Use set_progress_callback() on the existing embed model instead of creating a new CustomEmbedding instance (avoids global state mutation)
- Clean up the progress callback in finally blocks to prevent it leaking into subsequent search/query calls
- Wire progress_callback through add_documents (not just initialize)
- Remove the dead HTTPStatusError 429 branch (already handled before raise_for_status)
- Remove the unused total_docs variable in initializer

Made-with: Cursor
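The set_progress_callback() plus finally-cleanup pattern this commit describes can be sketched as follows; the class and function names here are illustrative stand-ins, not the project's real `CustomEmbedding` API:

```python
class EmbedModel:
    """Toy embed model with a settable progress callback, mirroring the
    set_progress_callback() approach on the existing model instance."""
    def __init__(self):
        self._progress_cb = None

    def set_progress_callback(self, cb):
        self._progress_cb = cb

    def embed_batches(self, batches):
        done = 0
        for batch in batches:
            done += len(batch)  # pretend each item in the batch was embedded
            if self._progress_cb:
                self._progress_cb(done)
        return done


def initialize(model, batches, progress_cb):
    """Reuse the existing embed model rather than constructing a new one,
    and clear the callback in `finally` so it cannot leak into later
    search/query calls that use the same model."""
    model.set_progress_callback(progress_cb)
    try:
        return model.embed_batches(batches)
    finally:
        model.set_progress_callback(None)
```

The `finally` block is the key point: even if embedding raises, the shared model is left callback-free.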
Collaborator
Thanks. Your PR is merged with additional edits!
Problem
During KB initialization with large document sets (265+ files), the progress indicator stays stuck at 0/265 until the entire embedding process completes, then jumps straight to 100%. In addition, the client has no retry logic for the HTTP 429 errors that Gemini and other rate-limited embedding APIs return, so initialization fails outright.
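The shape of the fix for both problems can be sketched as a batched embedding loop that reports progress after every batch and throttles between batches; all names here are hypothetical, with `embed_one_batch` standing in for the real adapter call:

```python
import time


def embed_in_batches(texts, embed_one_batch, batch_size=3,
                     batch_delay=1.5, progress_callback=None, sleep=time.sleep):
    """Embed `texts` in small batches, invoking `progress_callback` after
    each batch (so the UI can show e.g. 7/265 instead of 0/265) and
    pausing `batch_delay` seconds between batches to stay under provider
    rate limits."""
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        vectors.extend(embed_one_batch(batch))
        if progress_callback:
            progress_callback(len(vectors), len(texts))  # done, total
        if start + batch_size < len(texts):
            sleep(batch_delay)  # throttle; skip the delay after the last batch
    return vectors
```

Reporting inside the loop (not once at the end) is what keeps the progress bar moving during long runs.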
Changes
1. Real-time embedding progress tracking
- Thread a progress_callback parameter through CustomEmbedding -> EmbeddingClient -> LlamaIndexPipeline -> initializer

2. HTTP 429 rate limit retry

- OpenAICompatibleEmbeddingAdapter retries on HTTP 429 with exponential backoff
- Honors the Retry-After response header when present

3. Gemini free tier friendly defaults

- batch_size: 10 -> 3 (fewer concurrent requests per batch)
- request_timeout: 60s -> 120s (slower APIs need more time)
- BATCH_DELAY: 1.5s between batches to prevent rate limiting

Files changed

- deeptutor/services/embedding/adapters/openai_compatible.py -- 429 handling + exponential backoff
- deeptutor/services/embedding/client.py -- progress callback + batch delay
- deeptutor/services/embedding/config.py -- batch_size 10 -> 3, timeout 60 -> 120
- deeptutor/services/rag/pipelines/llamaindex.py -- wire progress callback through CustomEmbedding
- deeptutor/knowledge/initializer.py -- pass progress callback to RAG service
- deeptutor/services/rag/service.py -- pass kwargs through to pipeline

Testing