refact-planner edited this page Jun 7, 2026 · 1 revision

VecDB

SQLite-backed semantic search with file splitting, embedding fetch, and cached vector storage.

Storage and schema

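VecDB stores embeddings in SQLite and uses the vec0 virtual table module (provided by the sqlite-vec extension) for vector search. The active embedding table is created as a vec0 virtual table configured for cosine distance, with scope and line-range metadata stored alongside each vector.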
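
As a rough illustration of what such a table definition could look like, the sketch below builds a vec0 CREATE statement with cosine distance. The helper name embedding_table_ddl, the table name, and the column names (embedding, scope, start_line, end_line) are assumptions made for this example, not VecDB's actual schema.

```python
def embedding_table_ddl(table: str, dim: int) -> str:
    """Build a CREATE statement for a vec0 table that uses cosine distance.

    Column names are hypothetical; only the vec0 module and the cosine
    metric come from the description above.
    """
    if not table.isidentifier():
        # Table names cannot be bound as SQL parameters, so validate them.
        raise ValueError(f"unsafe table name: {table!r}")
    if dim <= 0:
        raise ValueError("embedding dimension must be positive")
    return (
        f"CREATE VIRTUAL TABLE IF NOT EXISTS {table} USING vec0(\n"
        f"    embedding float[{dim}] distance_metric=cosine,\n"
        "    scope text,\n"
        "    start_line integer,\n"
        "    end_line integer\n"
        ")"
    )


if __name__ == "__main__":
    print(embedding_table_ddl("emb_example", 384))
```

Executing the statement requires a SQLite connection with the sqlite-vec extension loaded; the helper only produces the SQL text.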

Splitters

Before embedding, files are split into search chunks using content-aware splitters:

  • Trajectory JSON: 4 messages per chunk, with a 1-message overlap
  • Markdown: heading-aware sections with frontmatter support
  • Code: AST-aware token windows through the AST splitter path
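
The trajectory rule above (4 messages per chunk, 1 message shared between neighbours) amounts to a sliding window. The following is a minimal sketch of that windowing; the function name chunk_with_overlap is hypothetical and the real splitter may treat trailing messages differently.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_with_overlap(items: Sequence[T], size: int = 4, overlap: int = 1) -> List[List[T]]:
    """Split items into windows of `size`, each sharing `overlap` items with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: List[List[T]] = []
    for start in range(0, len(items), step):
        chunks.append(list(items[start:start + size]))
        # Stop once a window reaches the end, so no chunk holds only overlap.
        if start + size >= len(items):
            break
    return chunks


if __name__ == "__main__":
    messages = [f"msg{i}" for i in range(10)]
    for chunk in chunk_with_overlap(messages):
        print(chunk)
```

With ten messages this produces three chunks, and the last message of each chunk reappears as the first message of the next.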

Embedding pipeline

Embeddings are fetched through an external HTTP API. The vectorization path batches requests and retries failures. The background flow is:

  1. enqueue documents
  2. split them
  3. check the cache
  4. embed missing chunks
  5. store vectors
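
Steps 3 to 5 of the flow above can be sketched as a cache lookup followed by batched embedding with retries. Everything named here is an assumption for illustration: the in-memory dict stands in for the SQLite cache, embed is any callable that wraps the HTTP API, and the batch size, retry count, and backoff are placeholder values.

```python
import hashlib
import time
from typing import Callable, Dict, List, Sequence

Vector = List[float]
EmbedFn = Callable[[Sequence[str]], List[Vector]]


def chunk_key(text: str) -> str:
    """Cache key derived from the chunk text (hypothetical keying scheme)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def embed_with_retry(embed: EmbedFn, batch: Sequence[str], retries: int, backoff_s: float) -> List[Vector]:
    """Call embed(batch), retrying failed calls with exponential backoff."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return embed(batch)
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(backoff_s * (2 ** attempt))
    raise AssertionError("unreachable")


def vectorize(chunks: Sequence[str], cache: Dict[str, Vector], embed: EmbedFn,
              batch_size: int = 32, retries: int = 3, backoff_s: float = 0.5) -> List[Vector]:
    """Return one vector per chunk, embedding only chunks missing from the cache."""
    missing: List[str] = []
    seen = set()
    for text in chunks:
        key = chunk_key(text)
        if key not in cache and key not in seen:
            seen.add(key)
            missing.append(text)
    for i in range(0, len(missing), batch_size):
        batch = missing[i:i + batch_size]
        vectors = embed_with_retry(embed, batch, retries, backoff_s)
        for text, vector in zip(batch, vectors):
            cache[chunk_key(text)] = vector  # store step
    return [cache[chunk_key(text)] for text in chunks]
```

Keying the cache on chunk content means an unchanged chunk is never sent to the API twice, even if its file is re-enqueued.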

Search behavior

Search uses cosine KNN queries, then applies a reject threshold and normalizes the usefulness score. Scope-prefix filtering is applied after the query when needed.
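
The post-query steps can be illustrated with a small sketch that takes KNN results, drops those beyond the reject threshold, applies an optional scope prefix, and assigns a normalized usefulness score. The Hit type, the threshold value, and the linear normalization to the range 0 to 1 are assumptions; the source does not specify the formula VecDB uses.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Hit:
    path: str
    distance: float          # cosine distance: 0 means same direction
    usefulness: float = 0.0  # filled in by postprocess


def postprocess(hits: List[Hit], reject_threshold: float,
                scope_prefix: Optional[str] = None) -> List[Hit]:
    """Filter KNN hits by distance and scope, then score them, best first."""
    if reject_threshold <= 0:
        raise ValueError("reject_threshold must be positive")
    kept = [
        h for h in hits
        if h.distance <= reject_threshold
        and (scope_prefix is None or h.path.startswith(scope_prefix))
    ]
    kept.sort(key=lambda h: h.distance)
    # Hypothetical normalization: distance 0 maps to 1.0, the threshold to 0.0.
    return [
        replace(h, usefulness=max(0.0, min(1.0, 1.0 - h.distance / reject_threshold)))
        for h in kept
    ]


if __name__ == "__main__":
    results = [Hit("src/a.rs", 0.2), Hit("docs/b.md", 0.1), Hit("src/c.rs", 0.9)]
    for hit in postprocess(results, reject_threshold=0.4, scope_prefix="src/"):
        print(hit)
```

Filtering by prefix after the query, as described above, means a KNN query can return fewer in-scope results than requested, so a caller may want to over-fetch when a scope is set.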

Cleanup

VecDB cleans up old embedding tables by keeping the 10 newest tables and dropping tables older than 7 days.
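
One way to read this retention rule is sketched below: a table is dropped only if it is both outside the 10 newest and older than 7 days. That combination is an assumption; the sentence above could also be read as two independent rules. The function name and the mapping of table names to creation times are likewise hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def tables_to_drop(created_at: Dict[str, datetime], now: datetime,
                   keep_newest: int = 10,
                   max_age: timedelta = timedelta(days=7)) -> List[str]:
    """Return names of embedding tables eligible for dropping, newest first."""
    newest_first = sorted(created_at, key=lambda name: created_at[name], reverse=True)
    cutoff = now - max_age
    # Assumed rule: always protect the newest `keep_newest` tables,
    # then drop whatever remains if it is older than the cutoff.
    return [name for name in newest_first[keep_newest:] if created_at[name] < cutoff]


if __name__ == "__main__":
    now = datetime(2026, 6, 7)
    tables = {f"emb_{i}": now - timedelta(days=i) for i in range(12)}
    print(tables_to_drop(tables, now))
```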

HTTP endpoint

The VecDB search endpoint is /vecdb-search.
