v0.6.8

@AmintaCCCP AmintaCCCP released this 27 Jun 15:44
1bf4768

v0.6.8 - Vector Search Optimization & True Incremental Indexing

What's New

True Incremental Vector Indexing

  • Added vector_indexed_at field across the full data pipeline: Repository type, SQLite schema, backend CRUD, sync import, and repository merge operations
  • Incremental indexing now processes only unindexed or updated repositories, determined by comparing vector_indexed_at against last_edited and analyzed_at
  • Rebuild indexing clears all vector_indexed_at fields before processing and only sets them for successfully indexed repos
  • Added unindexed repo count badge next to the incremental index button
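
The selection rule described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the RepoTimestamps shape and the toMillis and needsIndexing helpers are hypothetical names, and only the three timestamp field names come from these notes.

```typescript
// Hypothetical shape: only the three field names are taken from the release notes.
interface RepoTimestamps {
  vector_indexed_at?: string | null;
  last_edited?: string | null;
  analyzed_at?: string | null;
}

// Parse a timestamp string to epoch milliseconds, or null if missing or unparseable.
function toMillis(value?: string | null): number | null {
  if (!value) return null;
  const ms = Date.parse(value);
  return Number.isNaN(ms) ? null : ms;
}

// A repository needs indexing if it was never indexed, or if it was edited
// or analyzed after its last successful index.
function needsIndexing(repo: RepoTimestamps): boolean {
  const indexedAt = toMillis(repo.vector_indexed_at);
  if (indexedAt === null) return true;
  const changes = [toMillis(repo.last_edited), toMillis(repo.analyzed_at)];
  return changes.some((t) => t !== null && t > indexedAt);
}
```

Under this sketch, the count shown on an unindexed badge would simply be repos.filter(needsIndexing).length.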

Vector Search Optimization

  • Implemented true semantic reranking: LLM-based semantic relevance sorting of candidate repositories replaces the previous approach of keyword extraction and substring matching
  • Added HyDE (Hypothetical Document Embeddings) preprocessing: generates an ideal repository description from the user query before embedding, significantly improving recall for short, Chinese, and conceptual queries
  • Structured embedding text with semantic labels (Repository:, Description:, Topics:) to help embedding models understand field roles
  • Added lightweight keyword boosting for exact matches in name/description/tags
  • Search parameters (similarity threshold, TopK, HyDE toggle, reranking toggle) are now configurable in Settings UI
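
As an illustration of the structured embedding text mentioned above, here is a minimal sketch. The three labels come from these notes; the RepoFields shape, the buildEmbeddingText name, the line-per-field layout, and the comma-joined topics are assumptions and may differ from the actual format.

```typescript
// Hypothetical input shape for illustration only.
interface RepoFields {
  name: string;
  description?: string;
  topics?: string[];
}

// Build one labeled line per available field so the embedding model can
// tell which role each piece of text plays. Empty fields are skipped.
function buildEmbeddingText(repo: RepoFields): string {
  const lines = [`Repository: ${repo.name}`];
  if (repo.description) {
    lines.push(`Description: ${repo.description}`);
  }
  if (repo.topics && repo.topics.length > 0) {
    lines.push(`Topics: ${repo.topics.join(", ")}`);
  }
  return lines.join("\n");
}
```

Because any change to this layout changes the vectors it produces, a format like this is the kind of thing a version constant such as EMBEDDING_FORMAT_VERSION would need to track.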

Performance Improvements

  • README fetching now runs concurrently in batches of 5 using Promise.allSettled, instead of sequentially
  • Sync button no longer triggers automatic indexing (which was effectively a no-op for new repos without analyzed_at)
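
The batched concurrency pattern can be sketched like this. The mapInBatches helper is a hypothetical name, and the worker function stands in for whatever actually fetches a README; only Promise.allSettled and the batch size of 5 come from these notes.

```typescript
// Run `worker` over `items` in consecutive batches. Items within a batch run
// concurrently; the next batch starts only after the current one settles.
// Promise.allSettled never rejects, so one failed item cannot abort its batch.
async function mapInBatches<T, R>(
  items: T[],
  batchSize: number,
  worker: (item: T) => Promise<R>,
): Promise<PromiseSettledResult<R>[]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const results: PromiseSettledResult<R>[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    const settled = await Promise.allSettled(batch.map((item) => worker(item)));
    results.push(...settled);
  }
  return results;
}
```

Results come back in input order, so a caller could pair each entry with its repository and skip the ones whose status is "rejected", for example via mapInBatches(repos, 5, fetchReadme) with a hypothetical fetchReadme function.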

Bug Fixes

  • Fixed a greedy regex in the JSON parsing for gist and semantic reranking that could fail when LLM output contains multiple bracketed sections
  • Fixed variable shadowing issue in keyword boosting
  • Added EMBEDDING_FORMAT_VERSION tracking to auto-trigger full reindex when embedding text format changes
  • HyDE timeout now properly aborts underlying HTTP requests and cleans up timers
  • Fixed stale closure issues in onRepoIndexed by using getState() to read the latest data
  • Added ISO 8601 validation for vector_indexed_at in bulk upsert operations
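
To illustrate the greedy regex problem from the first fix above: a pattern such as /\[[\s\S]*\]/ matches from the first "[" to the last "]", so extra bracketed text in the LLM output produces a span that is not valid JSON. One possible way to avoid this is sketched below; the extractFirstJsonArray helper is hypothetical and is not necessarily how the fix was implemented.

```typescript
// Return the first substring that starts with "[" and parses as a JSON array.
// For each "[" we try successively longer candidates ending at each later "]",
// letting JSON.parse decide validity (so brackets inside strings are handled).
function extractFirstJsonArray(text: string): unknown[] | null {
  for (let start = text.indexOf("["); start !== -1; start = text.indexOf("[", start + 1)) {
    for (let end = text.indexOf("]", start); end !== -1; end = text.indexOf("]", end + 1)) {
      try {
        const parsed: unknown = JSON.parse(text.slice(start, end + 1));
        if (Array.isArray(parsed)) return parsed;
      } catch {
        // Not valid JSON yet; widen the candidate to the next closing bracket.
      }
    }
  }
  return null;
}
```

This scan is quadratic in the number of brackets, which is acceptable for short LLM responses but would need a proper bracket-matching pass for large inputs.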

Related

  • PR #230: feat: true incremental vector index with concurrent README fetching
  • PR #231: feat: optimize vector search with HyDE, semantic reranking, and structured embeddings

Update Notes

Users upgrading from v0.6.7: Please re-index your vectors and upgrade your Worker deployment to ensure compatibility with the new incremental indexing features.