Releases · catlog22/codexlens-search

What's New

Two-Hop Query Expansion

Natural-language queries are now automatically expanded with relevant code symbols before search, closing the vocabulary gap between abstract descriptions and concrete code.

How it works:

Intent gate: only natural-language queries are expanded; code symbol queries pass through unchanged

First hop: embeds the query and finds the nearest symbol names in the index vocabulary (cosine similarity > 0.35)

Second hop: discovers neighbor symbols that co-occur with first-hop hits in the same code chunks

Expanded query: the original query plus the discovered symbols feed into the existing search pipeline
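
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `toy_embed` (character trigrams) stands in for the real embedder, `looks_like_code` is a guessed intent heuristic, and `expand_query`, `chunk_symbols`, and `max_symbols` are hypothetical names. Only the 0.35 cosine threshold comes from the release notes.

```python
import math
import re
from collections import Counter

COSINE_THRESHOLD = 0.35  # threshold quoted in the release notes


def toy_embed(text: str) -> Counter:
    """Stand-in embedder (assumption): bag of character trigrams per word."""
    # Split identifiers such as parseConfig / parse_config into words first.
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text).replace("_", " ").lower()
    grams = Counter()
    for word in spaced.split():
        padded = f" {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def looks_like_code(query: str) -> bool:
    """Hypothetical intent gate: a single token with a code-like shape."""
    q = query.strip()
    return " " not in q and bool(re.search(r"[_.()]|[a-z][A-Z]", q))


def expand_query(query, chunk_symbols, max_symbols=5):
    """Return the query followed by first-hop and second-hop symbols."""
    if looks_like_code(query):
        return query  # code symbol queries pass through unchanged

    vocabulary = sorted({s for symbols in chunk_symbols for s in symbols})
    query_vec = toy_embed(query)

    # First hop: symbols whose embedding is close enough to the query.
    scored = [(cosine(query_vec, toy_embed(s)), s) for s in vocabulary]
    first_hop = [s for score, s in sorted(scored, reverse=True) if score > COSINE_THRESHOLD]
    first_hop = first_hop[:max_symbols]

    # Second hop: symbols that share a chunk with any first-hop hit.
    neighbours = Counter()
    for symbols in chunk_symbols:
        if any(s in symbols for s in first_hop):
            neighbours.update(s for s in symbols if s not in first_hop)
    second_hop = [s for s, _ in neighbours.most_common(max_symbols)]

    return " ".join([query, *first_hop, *second_hop])


# Each set lists the symbols found in one indexed code chunk.
chunks = [{"load_config", "read_yaml"}, {"render_page", "html_escape"}]
expanded = expand_query("load the config file", chunks)
```

In this toy setup `read_yaml` is not similar to the query on its own; it is picked up only because it shares a chunk with `load_config`, which is the point of the second hop.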

Impact (20-query complex benchmark):

| Metric | v0.7.x | v0.8.0 | Change |
|---|---|---|---|
| Recall@10 | 0.833 | 0.892 | +5.8% |
| MRR | 0.900 | 0.967 | +6.7% |
| NDCG@5 | 0.824 | 0.894 | +7.0% |
| Top-3 Hit Rate | 0.950 | 1.000 | +5.0% |
| Zero-recall queries | 1 | 0 | eliminated |
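
For readers unfamiliar with the metrics in the table, the sketch below shows the usual binary-relevance definitions for a single query. The release notes do not say exactly how the benchmark computes them, so treat these functions (and their names) as assumptions. MRR is the mean of `reciprocal_rank` over all queries.

```python
import math


def recall_at_k(ranked, relevant, k=10):
    """Fraction of the relevant items that appear in the top k results."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant) if relevant else 0.0


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, relevant, k=5):
    """Binary-relevance NDCG: discounted gain divided by the ideal ordering's gain."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0
```

A query counts as zero-recall when `recall_at_k` returns 0.0, that is, when none of its relevant items appear in the top k.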

Configuration

Enabled by default. Control via environment variables:

| Variable | Default | Description |
|---|---|---|
| `CODEXLENS_EXPANSION_ENABLED` | `true` | Enable/disable query expansion |
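
As an illustration of how such a flag is typically read, here is a small sketch. The variable name and its default come from the table above; the `expansion_enabled` helper and the set of values it treats as false are assumptions, since the release notes do not describe how the value is parsed.

```python
import os


def expansion_enabled(environ=None) -> bool:
    """Hypothetical parser for the flag; accepted spellings are an assumption."""
    env = os.environ if environ is None else environ
    raw = env.get("CODEXLENS_EXPANSION_ENABLED", "true")  # enabled by default
    return raw.strip().lower() not in {"false", "0", "no", "off"}
```

Under that assumption, setting `CODEXLENS_EXPANSION_ENABLED=false` in the environment before starting the process would turn expansion off.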

Details

Zero new dependencies — uses existing embedder and FTS infrastructure

Lazy vocabulary construction (~0.4s on first search, thread-safe)

Quality filters: cosine threshold (0.35), public symbol preference, intent gating

Overhead: <0.5s per query
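
The lazy, thread-safe vocabulary construction mentioned above could look something like this double-checked locking sketch. `LazyVocabulary` and `build_fn` are hypothetical names; the actual class layout in codexlens-search is not shown in the release notes.

```python
import threading


class LazyVocabulary:
    """Builds the symbol vocabulary once, on first use, even under concurrency."""

    def __init__(self, build_fn):
        self._build_fn = build_fn      # expensive scan of the index (hypothetical)
        self._lock = threading.Lock()
        self._symbols = None

    def get(self):
        if self._symbols is None:              # fast path: no lock once built
            with self._lock:
                if self._symbols is None:      # re-check: another thread may have built it
                    self._symbols = self._build_fn()
        return self._symbols
```

Only the first search pays the construction cost; later calls return the cached vocabulary without taking the lock.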

Highlights

Context-Aware Chunking (S2 Strategy)

Chunks now include structural context headers (file path, class, function) before embedding. This improves semantic search recall by giving the embedding model more context about each code chunk.
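
A rough sketch of the idea for Python sources, using the standard `ast` module, is shown below. The `// File:`, `// Class:`, and `// Function:` labels follow the format listed under Changes; the helper names, the one-label-per-line layout, and the choice to look up the scope by the chunk's first line are assumptions.

```python
import ast


def enclosing_scopes(source: str, line: int):
    """Return (class_name, function_name) enclosing a 1-based line, None if absent."""
    class_name = function_name = None
    # ast.walk is breadth-first, so inner definitions overwrite outer ones.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.lineno <= line <= node.end_lineno:
                if isinstance(node, ast.ClassDef):
                    class_name = node.name
                else:
                    function_name = node.name
    return class_name, function_name


def with_context_header(path, source, chunk_text, chunk_start_line):
    """Prepend file/class/function context to a chunk before it is embedded."""
    class_name, function_name = enclosing_scopes(source, chunk_start_line)
    header = [f"// File: {path}"]
    if class_name:
        header.append(f"// Class: {class_name}")
    if function_name:
        header.append(f"// Function: {function_name}")
    return "\n".join(header) + "\n" + chunk_text
```

With a header like this, a chunk containing only a method body still carries the file, class, and function names into the embedding.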

Ablation benchmark results (20-query test suite):

Recall: 0.208 → 0.850 (+320%)
MRR: 0.151 → 0.696 (+361%)
Zero-recall queries: 15 → 3 (-80%)

Critical Bug Fix: Auto-Quality Vector Index Detection

Fixed `_has_vector_index()` in `SearchPipeline`. The method failed to trigger the lazy load of the binary store, so every search using the auto quality setting fell back to the FTS-only fast path, silently bypassing the entire 2-stage vector search pipeline.
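
The failure mode can be reproduced in miniature. In the sketch below, `LazyBinaryStore` and both free functions are hypothetical stand-ins; only the `_ensure_loaded()` name and the `hasattr` check are taken from the Changes list.

```python
class LazyBinaryStore:
    """Hypothetical store that loads its vectors from disk on first use."""

    def __init__(self, vectors_on_disk):
        self._on_disk = list(vectors_on_disk)
        self._vectors = None  # nothing in memory until _ensure_loaded() runs

    def _ensure_loaded(self):
        if self._vectors is None:
            self._vectors = list(self._on_disk)

    def __len__(self):
        return 0 if self._vectors is None else len(self._vectors)


def has_vector_index_buggy(store) -> bool:
    # Checks the in-memory size without loading, so a lazy store always looks empty.
    return len(store) > 0


def has_vector_index_fixed(store) -> bool:
    # Trigger the lazy load when the store supports it, then check the size.
    if hasattr(store, "_ensure_loaded"):
        store._ensure_loaded()
    return len(store) > 0
```

The `hasattr` guard lets the same check work for stores that load eagerly and have no `_ensure_loaded()` method.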

Orphan FTS Entry Purge (from 0.6.9)

Added `purge_orphan_fts()` to `sync()`. It removes stale FTS entries that are not tracked by metadata, fixing ANN/FTS count mismatches after metadata resets.
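
Conceptually, the purge is an anti-join between the FTS entries and the metadata. The sketch below uses ordinary SQLite tables with a made-up schema (`fts_entries`, `chunk_metadata`, `chunk_id`); the project's real storage layout and function signature are not described in the release notes.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with one orphaned FTS row (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE chunk_metadata (chunk_id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE fts_entries (chunk_id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO chunk_metadata VALUES (?)", [(1,), (2,)])
    conn.executemany(
        "INSERT INTO fts_entries VALUES (?, ?)",
        [(1, "def load(): ..."), (2, "def save(): ..."), (3, "stale chunk")],
    )
    conn.commit()
    return conn


def purge_orphan_fts(conn) -> int:
    """Delete FTS rows whose chunk_id is not tracked by metadata; return the count."""
    cur = conn.execute(
        "DELETE FROM fts_entries "
        "WHERE chunk_id NOT IN (SELECT chunk_id FROM chunk_metadata)"
    )
    conn.commit()
    return cur.rowcount
```

After the purge, the number of FTS rows matches the number of chunks tracked by metadata.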

Changes

`config.py`: Add `chunk_context_header: bool = True` toggle
`bridge.py`: Add `CODEXLENS_CHUNK_CONTEXT_HEADER` env var support
`indexing/pipeline.py`: Add `_inject_context_headers()`, which prepends `// File: ... // Class: ... // Function: ...` to chunks using AST parsing
`search/pipeline.py`: Fix `_has_vector_index()` to use a `hasattr` check before calling `_ensure_loaded()`, supporting both FAISS and numpy binary stores

Upgrade

pip install --upgrade codexlens-search

After upgrading, re-index your projects to benefit from context-aware chunking:

# Existing indexes will work but won't have context headers
# Re-indexing applies the new chunking strategy
pipeline.sync(root_path)

Releases: catlog22/codexlens-search

v0.8.0 — Two-Hop Query Expansion

v0.7.0 — Context-Aware Chunking & Search Quality Fix
