fix: embedding progress tracking + 429 rate limit retry#268

Closed
oxkage wants to merge 1 commit into HKUDS:main from oxkage:fix/embedding-progress-and-rate-limit
Conversation


@oxkage oxkage commented Apr 8, 2026

Problem

During KB initialization with large document sets (265+ files), the progress indicator gets stuck at 0/265 until the entire embedding process completes, then jumps to 100%. Additionally, Gemini and other rate-limited embedding APIs return HTTP 429 errors without any retry logic, causing initialization to fail.

Changes

1. Real-time embedding progress tracking

  • Added a progress_callback parameter threaded through CustomEmbedding → EmbeddingClient → LlamaIndexPipeline → initializer
  • Progress updates after each embedding batch, giving real-time feedback in the UI
  • Callback is optional and backwards-compatible
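The batch-level callback described above can be sketched as follows. This is a minimal illustration with hypothetical names (`EmbeddingClient`, `_embed_batch`), not the actual DeepTutor implementation; the point is that progress fires after every batch rather than once at the end, and that omitting the callback changes nothing.

```python
from typing import Callable, List, Optional

class EmbeddingClient:
    """Sketch: batches texts and reports (done, total) after each batch."""

    def __init__(self, batch_size: int = 3,
                 progress_callback: Optional[Callable[[int, int], None]] = None):
        self.batch_size = batch_size
        self.progress_callback = progress_callback  # optional => backwards-compatible

    def _embed_batch(self, batch: List[str]) -> List[List[float]]:
        # Placeholder for the real embedding API call; one vector per text.
        return [[0.0] for _ in batch]

    def embed(self, texts: List[str]) -> List[List[float]]:
        vectors: List[List[float]] = []
        for start in range(0, len(texts), self.batch_size):
            batch = texts[start:start + self.batch_size]
            vectors.extend(self._embed_batch(batch))
            if self.progress_callback:  # fire after each batch, not at the end
                self.progress_callback(len(vectors), len(texts))
        return vectors
```

With this shape, a UI can subscribe with `EmbeddingClient(progress_callback=update_bar)` and see the counter move batch by batch instead of jumping 0 → 100%.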

2. HTTP 429 rate limit retry

  • Added 429-specific retry logic in OpenAICompatibleEmbeddingAdapter
  • Exponential backoff: 5s → 10s → 20s → 40s → 80s (up to 5 retries)
  • Respects Retry-After response header when present
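The backoff schedule above reduces to a small pure function. This is a sketch (the function name is mine, not from the PR): delay doubles per attempt starting at 5s, and a numeric Retry-After header, when present, takes precedence.

```python
from typing import Optional

def backoff_delay(attempt: int, retry_after: Optional[str] = None) -> float:
    """Delay (seconds) before retry `attempt` (0-based): 5, 10, 20, 40, 80.

    A numeric Retry-After response header overrides the computed backoff.
    """
    if retry_after is not None:
        try:
            return float(retry_after)
        except ValueError:
            pass  # the HTTP-date form of Retry-After is not handled in this sketch
    return 5.0 * (2 ** attempt)
```

The retry loop would call this on each 429, sleep for the returned delay, and give up after 5 attempts.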

3. Gemini free tier friendly defaults

  • batch_size: 10 → 3 (fewer texts per request, gentler on free-tier quotas)
  • request_timeout: 60s → 120s (slower APIs need more time)
  • BATCH_DELAY: 1.5s between batches to prevent rate limiting
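Taken together, the defaults above amount to a small config object. A sketch, with a hypothetical `EmbeddingConfig` dataclass standing in for `deeptutor/services/embedding/config.py`:

```python
from dataclasses import dataclass

@dataclass
class EmbeddingConfig:
    """Defaults tuned for rate-limited (e.g. Gemini free tier) embedding APIs."""
    batch_size: int = 3          # was 10; fewer texts per request
    request_timeout: float = 120.0  # was 60; slower APIs need more headroom
    batch_delay: float = 1.5     # pause between batches to stay under rate limits
```

Note that the maintainer's follow-up commit below restores batch_size=10 and timeout=60 as the shipped defaults, keeping these values as user configuration instead.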

Files changed

  • deeptutor/services/embedding/adapters/openai_compatible.py — 429 handling + exponential backoff
  • deeptutor/services/embedding/client.py — progress callback + batch delay
  • deeptutor/services/embedding/config.py — batch_size 10→3, timeout 60→120
  • deeptutor/services/rag/pipelines/llamaindex.py — wire progress callback through CustomEmbedding
  • deeptutor/knowledge/initializer.py — pass progress callback to RAG service
  • deeptutor/services/rag/service.py — pass kwargs through to pipeline
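The last bullet (service.py passing kwargs through) is the glue that lets the callback travel from the initializer to the pipeline without the service knowing about it. A minimal sketch with hypothetical class names:

```python
class FakePipeline:
    """Stand-in pipeline that records what it was called with."""
    def initialize(self, docs, **kwargs):
        self.last_kwargs = kwargs
        return len(docs)

class RAGService:
    """Sketch: the service forwards **kwargs to the pipeline unchanged."""
    def __init__(self, pipeline):
        self.pipeline = pipeline

    def initialize(self, docs, **kwargs):
        # progress_callback (and any future option) flows through untouched
        return self.pipeline.initialize(docs, **kwargs)
```

Because the service only forwards `**kwargs`, adding new pipeline options later needs no service-layer changes.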

Testing

  • Verified all imports work in venv
  • Tested with 265 markdown documents using Gemini embedding API
  • Progress now updates in real-time during embedding batches

@pancacake
Collaborator

wow thanks! Will review this really soon.

pancacake added a commit that referenced this pull request Apr 9, 2026
Merge oxkage's embedding progress tracking and HTTP 429 rate limit
retry (PR #268), with the following review-driven improvements:

- Restore default batch_size=10 and request_timeout=60 to avoid
  performance regression for non-free-tier users
- Promote batch_delay to EmbeddingConfig (was private adapter attr)
  so all adapters respect user configuration instead of hardcoded 0.5s
- Use set_progress_callback() on existing embed model instead of
  creating a new CustomEmbedding instance (avoids global state mutation)
- Clean up progress callback in finally blocks to prevent leaking
  into subsequent search/query calls
- Wire progress_callback through add_documents (not just initialize)
- Remove dead HTTPStatusError 429 branch (already handled before
  raise_for_status)
- Remove unused total_docs variable in initializer

Made-with: Cursor
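Two of the review-driven changes above (set the callback on the existing embed model, then clean it up in a finally block) can be sketched together. Hypothetical names throughout; the real classes live in the files listed in the PR.

```python
class CustomEmbedding:
    """Sketch of the embed model's callback slot (hypothetical names)."""
    def __init__(self):
        self._progress_callback = None

    def set_progress_callback(self, cb):
        self._progress_callback = cb


def initialize(embed_model, docs, progress_callback=None):
    """Reuse the existing model instead of constructing a fresh one."""
    embed_model.set_progress_callback(progress_callback)
    try:
        return list(docs)  # stand-in for the real indexing work
    finally:
        # Clear in `finally` so later search/query calls on the same
        # (shared) embed model don't report stale progress.
        embed_model.set_progress_callback(None)
```

The finally block is the key detail: without it, a callback registered for one KB initialization would keep firing during unrelated queries.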
@pancacake
Collaborator

Thanks. Your PR is merged with additional edits!

@pancacake pancacake closed this Apr 9, 2026