Releases: catlog22/codexlens-search
v0.8.0 — Two-Hop Query Expansion
What's New
Two-Hop Query Expansion
Natural language queries now get automatically expanded with relevant code symbols before search, closing the vocabulary gap between abstract descriptions and concrete code.
How it works:
- Intent gate — only expands natural language queries; code symbol queries pass through unchanged
- First hop — embeds the query and finds nearest symbol names from the index vocabulary (cosine > 0.35)
- Second hop — discovers neighbor symbols that co-occur with first-hop hits in the same code chunks
- Expanded query — original query + discovered symbols feed into the existing search pipeline
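The four steps above can be sketched roughly as follows. This is a toy illustration, not the library's implementation: `looks_like_code`, `expand_query`, the `max_symbols` cap, and the in-memory `symbol_vecs` / `chunk_symbols` structures are all hypothetical stand-ins for the real embedder and index vocabulary. Only the 0.35 cosine threshold comes from the notes above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def looks_like_code(query):
    # Crude intent gate (an assumption): a single token that contains
    # code punctuation or uppercase letters is treated as a symbol query.
    tokens = query.split()
    return len(tokens) == 1 and (any(c in query for c in "_.()") or query != query.lower())

def expand_query(query, query_vec, symbol_vecs, chunk_symbols, threshold=0.35, max_symbols=5):
    if looks_like_code(query):
        return query  # code symbol queries pass through unchanged

    # First hop: symbols whose embedding is close to the query embedding.
    scored = [(cosine(query_vec, vec), name) for name, vec in symbol_vecs.items()]
    first_hop = [name for score, name in sorted(scored, reverse=True) if score > threshold]

    # Second hop: symbols that share a chunk with any first-hop hit.
    second_hop = []
    for symbols in chunk_symbols:
        if any(s in symbols for s in first_hop):
            for s in sorted(symbols):
                if s not in first_hop and s not in second_hop:
                    second_hop.append(s)

    # Expanded query: original text plus the discovered symbols.
    return " ".join([query] + (first_hop + second_hop)[:max_symbols])

# Toy index: 2-d vectors stand in for real embeddings.
symbol_vecs = {"parse_config": [1.0, 0.0], "load_yaml": [0.9, 0.1], "render_html": [0.0, 1.0]}
chunk_symbols = [{"parse_config", "ConfigError"}, {"render_html", "escape"}]
expanded = expand_query("how do we read settings", [1.0, 0.1], symbol_vecs, chunk_symbols)
```

In this toy run, `render_html` falls below the threshold and its chunk contributes nothing, while `ConfigError` is pulled in only because it co-occurs with `parse_config`.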
Impact (20-query complex benchmark):
| Metric | v0.7.x | v0.8.0 | Change (absolute) |
|---|---|---|---|
| Recall@10 | 0.833 | 0.892 | +5.8% |
| MRR | 0.900 | 0.967 | +6.7% |
| NDCG@5 | 0.824 | 0.894 | +7.0% |
| Top-3 Hit Rate | 0.950 | 1.000 | +5.0% |
| Zero-recall queries | 1 | 0 | eliminated |
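For readers unfamiliar with the metrics, here is a minimal sketch of how Recall@k and MRR are conventionally computed. The release notes do not state the benchmark's exact definitions, so these textbook forms (and the file names in the example) are assumptions.

```python
def recall_at_k(ranked, relevant, k=10):
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def mean(values):
    return sum(values) / len(values) if values else 0.0

# Two hypothetical queries: (ranked results, set of relevant files).
runs = [
    (["a.py", "b.py", "c.py"], {"a.py", "c.py"}),
    (["x.py", "y.py"], {"y.py", "z.py"}),
]
mean_recall = mean([recall_at_k(r, rel) for r, rel in runs])
mrr = mean([reciprocal_rank(r, rel) for r, rel in runs])
```

A zero-recall query, in these terms, is one where `recall_at_k` returns 0.0.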
Configuration
Enabled by default. Control via environment variables:
| Variable | Default | Description |
|---|---|---|
| `CODEXLENS_EXPANSION_ENABLED` | `true` | Enable/disable query expansion |
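A boolean environment toggle like this is typically read along the following lines. The helper name `expansion_enabled` and the set of accepted spellings are assumptions; the notes only document the variable name and its default of `true`.

```python
import os

def expansion_enabled(environ=None):
    """Return True unless the variable is set to a value outside the 'truthy' set.

    The accepted spellings below are an assumption, not documented behavior.
    """
    env = os.environ if environ is None else environ
    raw = env.get("CODEXLENS_EXPANSION_ENABLED", "true")
    return raw.strip().lower() in {"1", "true", "yes", "on"}

# Passing a dict keeps the example independent of the real process environment.
default_on = expansion_enabled({})
turned_off = expansion_enabled({"CODEXLENS_EXPANSION_ENABLED": "false"})
```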
Details
- Zero new dependencies — uses existing embedder and FTS infrastructure
- Lazy vocabulary construction (~0.4s on first search, thread-safe)
- Quality filters: cosine threshold (0.35), public symbol preference, intent gating
- Overhead: <0.5s per query
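One common way to get lazy, thread-safe construction is double-checked locking, sketched below. The `LazyVocabulary` class and `build` function are hypothetical; the notes do not say which mechanism the library actually uses.

```python
import threading

class LazyVocabulary:
    """Builds the vocabulary on first access, at most once across threads."""

    def __init__(self, build_fn):
        self._build_fn = build_fn
        self._lock = threading.Lock()
        self._vocab = None

    def get(self):
        if self._vocab is None:          # fast path once built: no locking
            with self._lock:
                if self._vocab is None:  # re-check after acquiring the lock
                    self._vocab = self._build_fn()
        return self._vocab

build_calls = []

def build():
    # Stand-in for the expensive vocabulary scan done on first search.
    build_calls.append(1)
    return {"parse_config", "load_yaml"}

vocab = LazyVocabulary(build)
threads = [threading.Thread(target=vocab.get) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The one-time build cost is paid by whichever search arrives first; later calls return the cached object.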
Install
pip install codexlens-search[all]==0.8.0
Full Changelog: v0.7.1...v0.8.0
v0.7.0 — Context-Aware Chunking & Search Quality Fix
Highlights
Context-Aware Chunking (S2 Strategy)
Chunks now include structural context headers (file path, class, function) before embedding. This dramatically improves semantic search recall by giving the embedding model richer context about each code chunk.
Ablation benchmark results (20-query test suite):
- Recall: 0.208 → 0.850 (+320%)
- MRR: 0.151 → 0.696 (+361%)
- Zero-recall queries: 15 → 3 (-80%)
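To make the context-header idea concrete, here is a minimal sketch for Python sources using the standard `ast` module. The function names, the one-field-per-line layout, and the example file are assumptions; only the `// File:`, `// Class:`, `// Function:` labels come from these notes.

```python
import ast

def context_header(path, source, line):
    """Header naming the file and the class/function enclosing `line` (1-based)."""
    class_name = func_name = None
    # ast.walk visits outer nodes before nested ones, so the innermost
    # enclosing class and function are the last ones assigned.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.lineno <= line <= node.end_lineno:
                if isinstance(node, ast.ClassDef):
                    class_name = node.name
                else:
                    func_name = node.name
    parts = [f"// File: {path}"]
    if class_name:
        parts.append(f"// Class: {class_name}")
    if func_name:
        parts.append(f"// Function: {func_name}")
    return "\n".join(parts)

def with_context_header(path, source, chunk_text, start_line):
    """Prepend the header to a chunk before it is sent to the embedder."""
    return context_header(path, source, start_line) + "\n" + chunk_text

source = "class Cache:\n    def get(self, key):\n        return self._data[key]\n"
chunk = with_context_header("cache.py", source, "        return self._data[key]", 3)
```

Without the header, the embedder sees only `return self._data[key]`; with it, the chunk also carries the words `cache.py`, `Cache`, and `get`.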
Critical Bug Fix: Auto-Quality Vector Index Detection
Fixed _has_vector_index() in SearchPipeline. The method failed to trigger the lazy load of the binary store, so every auto-quality search fell back to the FTS-only fast path, silently bypassing the entire 2-stage vector search pipeline.
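The failure mode can be reproduced with a small sketch. The two store classes and both checker functions are hypothetical; only the use of a `hasattr` check before calling `_ensure_loaded()` reflects the fix described in these notes.

```python
class LazyBinaryStore:
    """Hypothetical store that reads its vectors from disk on first use."""

    def __init__(self, vectors_on_disk):
        self._on_disk = vectors_on_disk
        self._vectors = None

    def _ensure_loaded(self):
        if self._vectors is None:
            self._vectors = list(self._on_disk)

    def __len__(self):
        return 0 if self._vectors is None else len(self._vectors)

class EagerStore:
    """Hypothetical store with no lazy-load hook."""

    def __init__(self, vectors):
        self._vectors = list(vectors)

    def __len__(self):
        return len(self._vectors)

def has_vector_index_buggy(store):
    # Asks for the size before anything has loaded: a lazy store looks empty.
    return len(store) > 0

def has_vector_index_fixed(store):
    # Trigger the lazy load when the store supports it, then check the size.
    if hasattr(store, "_ensure_loaded"):
        store._ensure_loaded()
    return len(store) > 0

buggy_result = has_vector_index_buggy(LazyBinaryStore([[0.1, 0.2]]))
fixed_result = has_vector_index_fixed(LazyBinaryStore([[0.1, 0.2]]))
```

The `hasattr` guard keeps the check safe for stores that never define the hook.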
Orphan FTS Entry Purge (from 0.6.9)
Added purge_orphan_fts() to sync() — removes stale FTS entries not tracked by metadata, fixing ANN/FTS count mismatches after metadata resets.
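Conceptually, the purge is an anti-join between the FTS entries and the metadata. The sketch below uses plain SQLite tables; the table names, column names, and `purge_orphans` helper are illustrative assumptions, not the library's schema.

```python
import sqlite3

def purge_orphans(conn):
    """Delete FTS rows whose chunk id is not tracked in metadata; return the count."""
    cur = conn.execute(
        "DELETE FROM fts_entries WHERE chunk_id NOT IN (SELECT chunk_id FROM metadata)"
    )
    conn.commit()
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (chunk_id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE fts_entries (chunk_id INTEGER, body TEXT)")
conn.executemany("INSERT INTO metadata VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO fts_entries VALUES (?, ?)",
    [(1, "def a(): ..."), (2, "def b(): ..."), (3, "stale entry")],
)

removed = purge_orphans(conn)
remaining = conn.execute("SELECT COUNT(*) FROM fts_entries").fetchone()[0]
```

After the purge, the FTS row count matches the number of tracked chunks again.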
Changes
- `config.py`: Add `chunk_context_header: bool = True` toggle
- `bridge.py`: Add `CODEXLENS_CHUNK_CONTEXT_HEADER` env var support
- `indexing/pipeline.py`: Add `_inject_context_headers()`, which prepends `// File: ... // Class: ... // Function: ...` to chunks using AST parsing
- `search/pipeline.py`: Fix `_has_vector_index()` to use a `hasattr` check before calling `_ensure_loaded()`, supporting both FAISS and numpy binary stores
Upgrade
pip install --upgrade codexlens-search
After upgrading, re-index your projects to benefit from context-aware chunking:
# Existing indexes will work but won't have context headers
# Re-indexing applies the new chunking strategy
pipeline.sync(root_path)