What's New
Two-Hop Query Expansion
Natural-language queries are now automatically expanded with relevant code symbols before search, closing the vocabulary gap between abstract descriptions and concrete code.
How it works:
- Intent gate: only natural-language queries are expanded; queries that already name code symbols pass through unchanged
- First hop: embeds the query and retrieves the nearest symbol names from the index vocabulary (cosine similarity > 0.35)
- Second hop: finds neighbor symbols that co-occur with the first-hop hits in the same code chunks
- Expanded query: the original query plus the discovered symbols is fed into the existing search pipeline
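
The four steps can be sketched as follows. This is a minimal illustration, assuming embeddings are already computed and each code chunk is represented as the set of symbol names it contains. The intent heuristic (`CODE_HINT`), the `top_k`/`max_neighbors` limits, and ranking neighbors by co-occurrence count are hypothetical choices; the release itself reuses the existing embedder and FTS index rather than in-memory sets, and its intent gate may work differently.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical intent heuristic: a token that looks like code (snake_case,
# dotted path, call parens, or camelCase) marks the query as a symbol query.
CODE_HINT = re.compile(r"[_.()]|[a-z][A-Z]")
SIMILARITY_THRESHOLD = 0.35  # first-hop cosine cutoff quoted in these notes


def is_natural_language(query: str) -> bool:
    words = query.split()
    return len(words) >= 2 and not any(CODE_HINT.search(w) for w in words)


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def expand_query(query, query_vec, vocab_vecs, chunks, top_k=5, max_neighbors=3):
    # Intent gate: symbol-like queries are returned unchanged.
    if not is_natural_language(query):
        return query

    # First hop: vocabulary symbols whose embedding is close to the query.
    scored = [(cosine(query_vec, vec), name) for name, vec in vocab_vecs.items()]
    scored.sort(key=lambda t: (-t[0], t[1]))
    hits = [name for score, name in scored if score > SIMILARITY_THRESHOLD][:top_k]

    # Second hop: count symbols that share a chunk with any first-hop hit.
    hit_set = set(hits)
    counts = Counter()
    for chunk in chunks:
        if chunk & hit_set:
            counts.update(chunk - hit_set)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    neighbors = [name for name, _ in ranked[:max_neighbors]]

    # Expanded query: original text followed by the discovered symbols.
    return " ".join([query, *hits, *neighbors])


vocab = {
    "authenticate_user": [1.0, 0.0],
    "hash_password": [0.9, 0.3],
    "render_page": [0.0, 1.0],
}
chunks = [
    {"authenticate_user", "hash_password", "SessionStore"},
    {"authenticate_user", "SessionStore"},
    {"render_page", "Template"},
]
print(expand_query("how do users log in", [1.0, 0.1], vocab, chunks))
# how do users log in authenticate_user hash_password SessionStore
```

In this toy example `render_page` falls below the 0.35 cutoff, and `SessionStore` is picked up only in the second hop because it shares chunks with the first-hop hits.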
Impact (20-query complex benchmark):
| Metric | v0.7.x | v0.8.0 | Change (percentage points) |
|---|---|---|---|
| Recall@10 | 0.833 | 0.892 | +5.8 |
| MRR | 0.900 | 0.967 | +6.7 |
| NDCG@5 | 0.824 | 0.894 | +7.0 |
| Top-3 Hit Rate | 0.950 | 1.000 | +5.0 |
| Zero-recall queries | 1 | 0 | eliminated |
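
For reference, the table uses standard ranking metrics. The sketch below computes them for a single query with binary relevance (a result is either relevant or not); the benchmark's own relevance judgments and evaluation code are not described in these notes, so binary relevance is an assumption. Per-query values would then be averaged over the 20 queries to produce each row.

```python
from __future__ import annotations

import math


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    # Fraction of relevant results that appear in the top k.
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    # 1 / position of the first relevant result; 0 if none is found. MRR is its mean.
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / i
    return 0.0


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    # Discounted gain of the actual ranking, normalized by the ideal ranking.
    dcg = sum(1.0 / math.log2(i + 1)
              for i, doc in enumerate(ranked[:k], start=1) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 1) for i in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0


def hit_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    # 1 if any relevant result is in the top k (the "Top-3 Hit Rate" uses k=3).
    return 1.0 if set(ranked[:k]) & relevant else 0.0


ranked = ["a", "b", "c", "d"]
relevant = {"b", "d"}
print(recall_at_k(ranked, relevant, 2), reciprocal_rank(ranked, relevant))  # 0.5 0.5
```

The "Change" column is the difference between the two mean values, e.g. 0.967 − 0.900 = 0.067 for MRR, i.e. 6.7 percentage points.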
Configuration
Enabled by default. Control it with the following environment variable:
| Variable | Default | Description |
|---|---|---|
| `CODEXLENS_EXPANSION_ENABLED` | `true` | Enable or disable query expansion |
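
To turn expansion off, set the variable to `false` in the environment of the process that runs the search. The sketch below shows one plausible way such a flag is interpreted; `expansion_enabled` is a hypothetical helper, and the exact set of "off" spellings that codexlens-search accepts is an assumption. Only the variable name and its `true` default come from these notes.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Hypothetical: spellings treated as "off". The library's actual parsing rules
# are not documented here, so "false" is the safest value to use.
_FALSY = {"0", "false", "no", "off"}


def expansion_enabled(env: Mapping[str, str] | None = None) -> bool:
    """Return True unless CODEXLENS_EXPANSION_ENABLED is set to a falsy value."""
    env = os.environ if env is None else env
    raw = env.get("CODEXLENS_EXPANSION_ENABLED", "true")  # documented default
    return raw.strip().lower() not in _FALSY


print(expansion_enabled({"CODEXLENS_EXPANSION_ENABLED": "false"}))  # False
```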
Details
- Zero new dependencies: reuses the existing embedder and FTS (full-text search) infrastructure
- Lazy vocabulary construction (~0.4s on first search, thread-safe)
- Quality filters: cosine threshold (0.35), public symbol preference, intent gating
- Overhead: <0.5s per query
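
Lazy construction means the ~0.4s vocabulary build is paid on the first search rather than at startup. One common way to make that thread-safe is double-checked locking, sketched below in generic form. `LazyValue` and `build_vocabulary` are illustrative names, and this is not necessarily how codexlens-search implements it.

```python
from __future__ import annotations

import threading
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyValue(Generic[T]):
    """Builds a value on first access; safe to call from many threads.

    Assumes build() never returns None (None means "not built yet").
    """

    def __init__(self, build: Callable[[], T]) -> None:
        self._build = build
        self._lock = threading.Lock()
        self._value: Optional[T] = None
        self.build_count = 0  # for illustration: how many times build() ran

    def get(self) -> T:
        value = self._value
        if value is None:  # fast path: no locking once the value exists
            with self._lock:
                if self._value is None:  # re-check: another thread may have built it
                    self._value = self._build()
                    self.build_count += 1
                value = self._value
        return value


def build_vocabulary() -> dict[str, list[float]]:
    time.sleep(0.05)  # stand-in for embedding every symbol name in the index
    return {"authenticate_user": [1.0, 0.0]}


vocab = LazyValue(build_vocabulary)
threads = [threading.Thread(target=vocab.get) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(vocab.build_count)  # 1
```

The second `is None` check inside the lock is what prevents threads that queued up behind the first builder from rebuilding the vocabulary.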
Install
pip install codexlens-search[all]==0.8.0

Full Changelog: v0.7.1...v0.8.0