v0.6.8 - Vector Search Optimization & True Incremental Indexing
What is New
True Incremental Vector Indexing
- Added `vector_indexed_at` field across the full data pipeline: Repository type, SQLite schema, backend CRUD, sync import, and repository merge operations
- Incremental indexing now only processes unindexed or updated repositories (comparing `vector_indexed_at` vs `last_edited`/`analyzed_at`)
- Rebuild indexing clears all `vector_indexed_at` fields before processing and only sets them for successfully indexed repos
- Added unindexed repo count badge next to the incremental index button
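
The staleness check described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Repo` shape, `toMillis`, `needsIndexing`, and `clearIndexMarkers` are hypothetical names, and only the three timestamp fields come from these notes.

```typescript
// Hypothetical shape: only the three timestamp fields are taken from the notes.
interface Repo {
  id: number;
  last_edited?: string | null;       // ISO 8601 timestamp
  analyzed_at?: string | null;       // ISO 8601 timestamp
  vector_indexed_at?: string | null; // ISO 8601 timestamp, unset if never indexed
}

// Parse a timestamp into epoch milliseconds; missing or invalid values become null.
function toMillis(value?: string | null): number | null {
  if (!value) return null;
  const ms = Date.parse(value);
  return Number.isNaN(ms) ? null : ms;
}

// A repo needs indexing if it was never indexed, or if it was edited or
// analyzed after the last successful indexing run.
function needsIndexing(repo: Repo): boolean {
  const indexedAt = toMillis(repo.vector_indexed_at);
  if (indexedAt === null) return true;
  const changes = [toMillis(repo.last_edited), toMillis(repo.analyzed_at)];
  return changes.some((t) => t !== null && t > indexedAt);
}

// Rebuild mode: clear every marker first so that only successes get re-stamped.
function clearIndexMarkers(repos: Repo[]): Repo[] {
  return repos.map((r) => ({ ...r, vector_indexed_at: null }));
}
```

With this shape, the unindexed badge count would simply be `repos.filter(needsIndexing).length`.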
Vector Search Optimization
- Implemented true semantic reranking: replaces the previous approach of keyword extraction and substring matching with LLM-based semantic relevance sorting of candidate repositories
- Added HyDE (Hypothetical Document Embeddings) preprocessing: generates an ideal repository description from user queries before embedding, significantly improving recall for short, Chinese, and conceptual queries
- Structured embedding text with semantic labels (`Repository:`, `Description:`, `Topics:`) to help embedding models understand field roles
- Added lightweight keyword boosting for exact matches in name/description/tags
- Search parameters (similarity threshold, TopK, HyDE toggle, reranking toggle) are now configurable in Settings UI
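
To illustrate the labelled embedding text and the keyword boost, here is a minimal sketch. The `RepoDoc` shape, both function names, the case-insensitive substring matching, and the `0.05` step and `0.15` cap are all assumptions made for the example, not the project's actual values.

```typescript
// Hypothetical repository shape used only for this sketch.
interface RepoDoc {
  name: string;
  description?: string;
  topics?: string[];
}

// Build the text sent to the embedding model, one labelled field per line.
// Empty fields are skipped so the model never sees a bare label.
function buildEmbeddingText(repo: RepoDoc): string {
  const lines = [`Repository: ${repo.name}`];
  if (repo.description) lines.push(`Description: ${repo.description}`);
  if (repo.topics && repo.topics.length > 0) {
    lines.push(`Topics: ${repo.topics.join(", ")}`);
  }
  return lines.join("\n");
}

// Add a small bonus to the vector similarity score for each query term that
// appears (case-insensitively) in the name, description, or topics.
// The 0.05 step and 0.15 cap are illustrative values only.
function boostScore(score: number, query: string, repo: RepoDoc): number {
  const haystack = [repo.name, repo.description ?? "", ...(repo.topics ?? [])]
    .join(" ")
    .toLowerCase();
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  const hits = terms.filter((t) => haystack.includes(t)).length;
  return score + Math.min(hits * 0.05, 0.15);
}
```

Capping the bonus keeps the boost "lightweight": keyword hits can reorder near-ties but should not outrank a clearly better semantic match.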
Performance Improvements
- README fetching now runs concurrently in batches of 5 using `Promise.allSettled` instead of sequentially
- Sync button no longer triggers automatic indexing (which was effectively a no-op for new repos without `analyzed_at`)
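
The batched README fetching can be pictured with a small helper like the one below. It is a sketch under assumptions: `settleInBatches` is a hypothetical name, and the real fetch function is not shown in these notes, so the task is passed in as a parameter.

```typescript
// Run `task` over `items` in fixed-size batches. Items within a batch run
// concurrently; batches run one after another. A failing task does not
// abort the others, because Promise.allSettled never rejects.
async function settleInBatches<T, R>(
  items: T[],
  batchSize: number,
  task: (item: T) => Promise<R>,
): Promise<PromiseSettledResult<R>[]> {
  const results: PromiseSettledResult<R>[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // The async wrapper also turns a synchronous throw into a rejection.
    const settled = await Promise.allSettled(batch.map(async (item) => task(item)));
    results.push(...settled);
  }
  return results;
}
```

Called as `settleInBatches(repos, 5, fetchReadme)` with a hypothetical `fetchReadme`, this keeps at most five requests in flight, and each entry in the result reports `fulfilled` or `rejected` for the repo at the same position.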
Bug Fixes
- Fixed a greedy regex used for JSON parsing in gist and semantic reranking, which could fail when LLM output contains multiple bracketed sections
- Fixed variable shadowing issue in keyword boosting
- Added `EMBEDDING_FORMAT_VERSION` tracking to auto-trigger full reindex when embedding text format changes
- HyDE timeout now properly aborts underlying HTTP requests and cleans up timers
- Fixed stale closure issues with `onRepoIndexed` using `getState()` for latest data
- Added ISO 8601 validation for `vector_indexed_at` in bulk upsert operations
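
Two of the fixes above lend themselves to a short illustration. The sketch below shows one way to avoid the greedy-regex failure (a pattern like `/\[.*\]/s` spans from the first `[` to the last `]`) and one way to validate a timestamp before an upsert. Both function names are hypothetical, and neither is claimed to be the approach actually taken in the fix.

```typescript
// Return the first bracket-balanced span in `text` that parses as a JSON array,
// or null if none does. Limitation of this sketch: brackets inside JSON string
// values are counted too, so such input may be missed.
function extractFirstJsonArray(text: string): unknown[] | null {
  for (let start = text.indexOf("["); start !== -1; start = text.indexOf("[", start + 1)) {
    let depth = 0;
    for (let i = start; i < text.length; i++) {
      if (text[i] === "[") depth++;
      else if (text[i] === "]") depth--;
      if (depth === 0) {
        try {
          const parsed: unknown = JSON.parse(text.slice(start, i + 1));
          if (Array.isArray(parsed)) return parsed;
        } catch {
          // Not valid JSON; move on to the next opening bracket.
        }
        break;
      }
    }
  }
  return null;
}

// Accept only full ISO 8601 timestamps with a time and a zone designator,
// e.g. 2024-05-01T12:00:00Z or 2024-05-01T12:00:00.123+08:00.
const ISO_8601 = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;

function isIso8601(value: unknown): value is string {
  return typeof value === "string" && ISO_8601.test(value) && !Number.isNaN(Date.parse(value));
}
```

For an LLM reply such as `Note [draft]. Ranking: [3, 1, 2] and more [x]`, the greedy pattern would capture everything from `[draft` to `x]` and fail to parse, whereas the scan above skips `[draft]` and returns the ranking array.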
Related
- PR #230: feat: true incremental vector index with concurrent README fetching
- PR #231: feat: optimize vector search with HyDE, semantic reranking, and structured embeddings
Update Notes
Users upgrading from v0.6.7: Please re-index your vectors and upgrade your Worker deployment to ensure compatibility with the new incremental indexing features.