v0.7.0 — Context-Aware Chunking & Search Quality Fix

catlog22 released this 23 Mar 09:38
· 18 commits to main since this release

Highlights

Context-Aware Chunking (S2 Strategy)

Chunks are now prefixed with structural context headers (file path, class, function) before embedding. This gives the embedding model richer context about each code chunk and substantially improves semantic search recall, as the ablation results below show.
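To make the idea concrete, here is a minimal sketch of header injection for Python sources. It is not the project's implementation: the helper names `enclosing_scopes` and `add_context_header`, the one-header-per-line layout, and the use of Python's `ast` module are all assumptions made for illustration.

```python
import ast


def enclosing_scopes(source: str, line: int):
    """Return (class_name, function_name) of the innermost scopes containing `line`.

    Hypothetical helper: ast.walk visits nodes breadth-first, so the last
    enclosing match of each kind is the most deeply nested one.
    """
    cls = fn = None
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.lineno <= line <= node.end_lineno:
                if isinstance(node, ast.ClassDef):
                    cls = node.name
                else:
                    fn = node.name
    return cls, fn


def add_context_header(chunk_text: str, file_path: str, source: str, start_line: int) -> str:
    """Prepend '// File / Class / Function' lines to a chunk (assumed layout)."""
    cls, fn = enclosing_scopes(source, start_line)
    header = [f"// File: {file_path}"]
    if cls:
        header.append(f"// Class: {cls}")
    if fn:
        header.append(f"// Function: {fn}")
    return "\n".join(header) + "\n" + chunk_text
```

The text that gets embedded is then the header plus the original chunk, so a query such as "where is the store loaded" can match on the class and function names even when the chunk body never mentions them.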

Ablation benchmark results (20-query test suite):

  • Recall: 0.208 → 0.850 (+309%)
  • MRR: 0.151 → 0.696 (+361%)
  • Zero-recall queries: 15 → 3 (-80%)
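For readers unfamiliar with these metrics, the sketch below shows the standard definitions of recall at a cutoff, mean reciprocal rank (MRR), and a zero-recall count. The release notes do not state the cutoff or the exact evaluation code, so the function names, the `k=10` default, and the input shape are assumptions.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant items that appear in the top-k retrieved results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, item in enumerate(retrieved, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs, k=10):
    """Aggregate metrics over a list of (retrieved_list, relevant_set) pairs."""
    recalls = [recall_at_k(ret, rel, k) for ret, rel in runs]
    rranks = [reciprocal_rank(ret, rel) for ret, rel in runs]
    return {
        "recall": sum(recalls) / len(runs),
        "mrr": sum(rranks) / len(runs),
        "zero_recall": sum(1 for r in recalls if r == 0.0),
    }
```

A zero-recall query is one where none of the expected results show up in the top-k list at all.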

Critical Bug Fix: Auto-Quality Vector Index Detection

Fixed _has_vector_index() in SearchPipeline. The method failed to trigger the lazy load of the binary store, so every auto-quality search fell back to the FTS-only fast path. This silently bypassed the entire two-stage vector search pipeline.
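The failure mode can be reproduced with a toy lazy store. This is a sketch under assumptions, not the library's code: `LazyBinaryStore`, `EagerStore`, and the free functions are hypothetical stand-ins; only the `_ensure_loaded()` name and the `hasattr` guard come from the release notes.

```python
class LazyBinaryStore:
    """Hypothetical store that defers loading its vectors until first use."""

    def __init__(self, vectors):
        self._pending = vectors
        self._vectors = None  # nothing loaded yet

    def _ensure_loaded(self):
        if self._vectors is None:
            self._vectors = list(self._pending)

    def __len__(self):
        return 0 if self._vectors is None else len(self._vectors)


class EagerStore:
    """Hypothetical store without a lazy-load hook."""

    def __init__(self, vectors):
        self._vectors = list(vectors)

    def __len__(self):
        return len(self._vectors)


def has_vector_index_buggy(store):
    # Checks the size before anything has been loaded: always False for lazy stores.
    return len(store) > 0


def has_vector_index(store):
    # Trigger the lazy load only when the store exposes the hook.
    if hasattr(store, "_ensure_loaded"):
        store._ensure_loaded()
    return len(store) > 0
```

With the buggy check, a populated lazy store looks empty, which is how a search could fall through to a text-only path without raising any error.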

Orphan FTS Entry Purge (from 0.6.9)

Added purge_orphan_fts() to sync(). It removes stale FTS entries that are no longer tracked by metadata, fixing ANN/FTS count mismatches after metadata resets.
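The purge amounts to deleting every text-index row whose chunk is unknown to the metadata store. The sketch below illustrates that with SQLite; the table names, column names, and the function `purge_orphans` are hypothetical, and an ordinary table stands in for the FTS table so the example runs on any SQLite build. The project's actual schema is not described in these notes.

```python
import sqlite3


def purge_orphans(conn: sqlite3.Connection) -> int:
    """Delete text-index rows with no matching metadata row; return rows removed."""
    cur = conn.execute(
        "DELETE FROM fts_entries "
        "WHERE chunk_id NOT IN (SELECT chunk_id FROM metadata)"
    )
    conn.commit()
    return cur.rowcount


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database where the text index has two orphan rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE metadata (chunk_id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE fts_entries (chunk_id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO metadata VALUES (?)", [(1,), (2,)])
    conn.executemany(
        "INSERT INTO fts_entries VALUES (?, ?)",
        [(1, "a"), (2, "b"), (3, "stale"), (4, "stale")],
    )
    conn.commit()
    return conn
```

After the purge, both tables hold the same set of chunk IDs, which is the property the count-mismatch fix relies on.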

Changes

  • config.py: Add chunk_context_header: bool = True toggle
  • bridge.py: Add CODEXLENS_CHUNK_CONTEXT_HEADER env var support
  • indexing/pipeline.py: Add _inject_context_headers() — prepends // File: ... // Class: ... // Function: ... to chunks using AST parsing
  • search/pipeline.py: Fix _has_vector_index() to use hasattr check before calling _ensure_loaded(), supporting both FAISS and numpy binary stores
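As an illustration of how the config toggle and the environment variable might fit together, here is a minimal sketch. The `chunk_context_header` field and the `CODEXLENS_CHUNK_CONTEXT_HEADER` variable name come from the change list above; the `Config` dataclass shape, the `config_from_env` helper, and the set of values treated as false are assumptions.

```python
import os
from dataclasses import dataclass


@dataclass
class Config:
    # Default mirrors the release notes: context headers are on unless disabled.
    chunk_context_header: bool = True


# Assumed spellings of "off"; the real parser may accept a different set.
_FALSE_VALUES = {"0", "false", "no", "off"}


def config_from_env(environ=None) -> Config:
    """Build a Config, letting the environment variable override the default."""
    env = os.environ if environ is None else environ
    cfg = Config()
    raw = env.get("CODEXLENS_CHUNK_CONTEXT_HEADER")
    if raw is not None:
        cfg.chunk_context_header = raw.strip().lower() not in _FALSE_VALUES
    return cfg
```

Passing a plain dict as `environ` keeps the helper easy to test without touching the real process environment.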

Upgrade

pip install --upgrade codexlens-search

After upgrading, re-index your projects to benefit from context-aware chunking:

# Existing indexes will work but won't have context headers
# Re-indexing applies the new chunking strategy
pipeline.sync(root_path)