v0.8.0 — Two-Hop Query Expansion

@catlog22 released this 23 Mar 12:52

What's New

Two-Hop Query Expansion

Natural language queries now get automatically expanded with relevant code symbols before search, closing the vocabulary gap between abstract descriptions and concrete code.

How it works:

  1. Intent gate — only expands natural language queries; code symbol queries pass through unchanged
  2. First hop — embeds the query and finds nearest symbol names from the index vocabulary (cosine > 0.35)
  3. Second hop — discovers neighbor symbols that co-occur with first-hop hits in the same code chunks
  4. Expanded query — original query + discovered symbols feed into the existing search pipeline
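
The four steps above can be sketched roughly as follows. This is a toy illustration, not the codexlens-search implementation: the bag-of-subwords `embed` function stands in for the real embedding model, the intent-gate heuristic is a guess, and every function name, the `CHUNKS` data, and the `top_k` parameter are hypothetical. Only the 0.35 cosine threshold comes from the release notes.

```python
import math
import re
from collections import Counter

COSINE_THRESHOLD = 0.35  # value stated in the release notes


def embed(text):
    """Toy embedder (assumption): bag of lowercase subword tokens."""
    tokens = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", text)
    return Counter(t.lower() for t in tokens)


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)  # Counter yields 0 for missing keys
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def is_natural_language(query):
    """Hypothetical intent gate: several words, none shaped like an identifier."""
    words = query.split()
    looks_like_code = any(re.search(r"[_.()]|[a-z][A-Z]", w) for w in words)
    return len(words) >= 2 and not looks_like_code


def first_hop(query, vocabulary, top_k=5):
    """Symbols whose embedding is close to the query embedding."""
    q = embed(query)
    scored = [(cosine(q, embed(sym)), sym) for sym in vocabulary]
    hits = [(score, sym) for score, sym in scored if score > COSINE_THRESHOLD]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [sym for _, sym in hits[:top_k]]


def second_hop(hits, chunks, top_k=5):
    """Symbols that share a chunk with at least one first-hop hit."""
    hit_set = set(hits)
    counts = Counter()
    for symbols in chunks:
        if hit_set.intersection(symbols):
            for sym in symbols:
                if sym not in hit_set:
                    counts[sym] += 1
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [sym for sym, _ in ranked[:top_k]]


def expand_query(query, chunks):
    if not is_natural_language(query):
        return query  # code symbol queries pass through unchanged
    vocabulary = sorted({sym for symbols in chunks for sym in symbols})
    hits = first_hop(query, vocabulary)
    neighbors = second_hop(hits, chunks)
    return " ".join([query, *hits, *neighbors])


# Hypothetical index: each inner list holds the symbols found in one code chunk.
CHUNKS = [
    ["parse_config", "load_yaml"],
    ["ConfigLoader", "load_yaml", "validate_schema"],
    ["render_html", "escape_text"],
]

print(expand_query("parse config file", CHUNKS))
print(expand_query("parse_config", CHUNKS))
```

With this toy data, the natural language query picks up `parse_config` and `ConfigLoader` in the first hop and `load_yaml` and `validate_schema` in the second, while the bare symbol query is returned unchanged.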

Impact (20-query complex benchmark):

| Metric | v0.7.x | v0.8.0 | Change (absolute) |
| --- | --- | --- | --- |
| Recall@10 | 0.833 | 0.892 | +5.8% |
| MRR | 0.900 | 0.967 | +6.7% |
| NDCG@5 | 0.824 | 0.894 | +7.0% |
| Top-3 Hit Rate | 0.950 | 1.000 | +5.0% |
| Zero-recall queries | 1 | 0 | eliminated |
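
For readers unfamiliar with the metrics, here is a minimal sketch of how Recall@k and MRR are conventionally computed. It uses the standard textbook definitions, which are assumed (not confirmed) to match the benchmark's. The file names and rankings are made up for illustration and are not benchmark data.

```python
def recall_at_k(ranked, relevant, k=10):
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


# Made-up example runs: (ranked results, set of relevant items) per query.
runs = [
    (["a.py", "b.py", "c.py"], {"a.py", "c.py"}),
    (["x.py", "y.py"], {"y.py", "z.py"}),
]

mean_recall = sum(recall_at_k(r, rel) for r, rel in runs) / len(runs)
mrr = sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)
print(mean_recall, mrr)
```

Under these definitions, a "zero-recall query" is one where `recall_at_k` returns 0.0, meaning no relevant result appears in the top k.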

Configuration

Enabled by default. Control via environment variables:

| Variable | Default | Description |
| --- | --- | --- |
| CODEXLENS_EXPANSION_ENABLED | true | Enable/disable query expansion |
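
A minimal sketch of how a boolean toggle like this is commonly read. The release notes only document the variable name and its default of `true`; the set of accepted "off" spellings and the helper name `expansion_enabled` are assumptions, and the library's actual parsing may differ.

```python
import os


def expansion_enabled(environ=os.environ):
    """Hypothetical helper: interpret CODEXLENS_EXPANSION_ENABLED as a boolean."""
    raw = environ.get("CODEXLENS_EXPANSION_ENABLED", "true")  # documented default
    # Assumed "off" spellings; only the default value is documented.
    return raw.strip().lower() not in {"false", "0", "no", "off"}


print(expansion_enabled({}))
print(expansion_enabled({"CODEXLENS_EXPANSION_ENABLED": "false"}))
```

To turn expansion off for a process, the variable would typically be set before the search is first run, for example `os.environ["CODEXLENS_EXPANSION_ENABLED"] = "false"`. Exactly when the library reads the variable is not stated in the release notes.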

Details

  • Zero new dependencies — uses existing embedder and FTS infrastructure
  • Lazy vocabulary construction (~0.4s on first search, thread-safe)
  • Quality filters: cosine threshold (0.35), public symbol preference, intent gating
  • Overhead: <0.5s per query
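
The lazy, thread-safe vocabulary construction mentioned above could look something like the following. This is a generic double-checked locking sketch, not the project's code: the class name `LazyVocabulary` and the `build` callback are hypothetical.

```python
import threading


class LazyVocabulary:
    """Builds the vocabulary once, on first access, even under concurrent calls."""

    def __init__(self, build):
        self._build = build          # expensive function that returns the vocabulary
        self._lock = threading.Lock()
        self._vocab = None

    def get(self):
        if self._vocab is None:              # fast path once built, no locking
            with self._lock:
                if self._vocab is None:      # re-check: another thread may have built it
                    self._vocab = self._build()
        return self._vocab


build_calls = []


def build_vocabulary():
    build_calls.append(1)  # track how often the expensive build runs
    return frozenset({"parse_config", "ConfigLoader"})


vocab = LazyVocabulary(build_vocabulary)
threads = [threading.Thread(target=vocab.get) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(build_calls))
```

The first search pays the construction cost; later searches, from any thread, reuse the cached result.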

Install

pip install "codexlens-search[all]==0.8.0"

Full Changelog: v0.7.1...v0.8.0