Aggregate candidate chunk scores per file and boost the top chunk from files that appear multiple times in the candidate pool. Files with many high-scoring chunks are stronger relevance signals than a single chunk from an otherwise-irrelevant file.
Benchmarked on 66 repos (20 languages):
- Total NDCG@10: 0.832 → 0.844 (+0.012)
- architecture: 0.782 → 0.801 (+0.019)
- semantic: 0.831 → 0.834 (+0.003)
- symbol: 0.934 → 0.956 (+0.022)
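A minimal sketch of the per-file aggregation described above. The exact formula is not given in this message, so the rule used here (add a fraction of the sibling chunks' scores to the file's top chunk) is an assumption, as are the function name, the tuple layout, and the constant's value.

```python
from collections import defaultdict

# Hypothetical value; the PR tunes a constant named _FILE_COHERENCE_BOOST_FRAC.
FILE_COHERENCE_BOOST_FRAC = 0.2


def boost_file_coherence(candidates, frac=FILE_COHERENCE_BOOST_FRAC):
    """candidates: list of (chunk_id, file_path, score). Returns {chunk_id: score}."""
    by_file = defaultdict(list)
    for chunk_id, path, score in candidates:
        by_file[path].append((score, chunk_id))

    boosted = {chunk_id: score for chunk_id, _, score in candidates}
    for chunks in by_file.values():
        if len(chunks) < 2:
            continue  # a lone chunk carries no file-level signal
        top_score, top_id = max(chunks)
        # Assumed rule: the top chunk gains a fraction of its siblings' scores.
        rest = sum(score for score, _ in chunks) - top_score
        boosted[top_id] = top_score + frac * rest
    return boosted


candidates = [("a1", "a.py", 1.0), ("a2", "a.py", 0.5), ("b1", "b.py", 1.05)]
boosted = boost_file_coherence(candidates)
```

In this toy pool the top chunk of a.py overtakes the single chunk from b.py, because a.py contributes a second candidate.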
Add lightweight morphological stemming to _split_identifier: common suffixes (s, es, er, ed, ing, tion, ity, ...) are stripped to generate a stem variant that is added alongside the original token. This lets plurals, gerunds, and nominalizations match across query/document boundaries (colors↔color, utility↔util, serialization↔serial).
Benchmarked on 66 repos (20 languages):
- Total NDCG@10: 0.844 → 0.847 (+0.003)
- architecture: 0.801 → 0.805 (+0.004)
- semantic: 0.834 → 0.839 (+0.005)
- symbol: 0.956 → 0.952 (-0.004)
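A rough sketch of suffix stripping that emits the stem next to the original token. The suffix list, the minimum stem length, and the name `stem_variants` are assumptions for illustration ("ization" is added here only so the serialization example reproduces); the real list in _split_identifier is elided above.

```python
# Hypothetical suffix list, checked longest first.
_SUFFIXES = ("ization", "tion", "ity", "ing", "es", "ed", "er", "s")
_MIN_STEM_LEN = 3  # assumed guard against stripping very short tokens


def stem_variants(token):
    """Return the lowercased token, plus a stem variant if a suffix can be stripped."""
    lower = token.lower()
    for suffix in _SUFFIXES:
        if lower.endswith(suffix) and len(lower) - len(suffix) >= _MIN_STEM_LEN:
            return [lower, lower[: -len(suffix)]]
    return [lower]


variants = [stem_variants(t) for t in ("colors", "utility", "serialization")]
```

Because both forms are indexed, a query for "color" can hit a document that only contains "colors", and vice versa.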
Sweep over boost multipliers on 66-repo benchmark finds:
- _DEFINITION_BOOST_MULTIPLIER: 2.0 → 2.5 (+0.001 symbol/semantic)
- _FILE_COHERENCE_BOOST_FRAC: 0.3 → 0.2 (tighter coherence signal)
Total NDCG@10: 0.847 → 0.848 (+0.001)
semantic: 0.839 → 0.845 (+0.006)
Lower the RRF constant k from 60 to 30. A smaller k amplifies rank differences in RRF fusion, boosting semantic and architecture retrieval. Net gain: NDCG@10 0.848 → 0.850 on full 66-repo benchmark (architecture +0.000, semantic +0.005, symbol -0.007).
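For reference, a small sketch of standard reciprocal rank fusion, where each retriever contributes 1 / (k + rank) per document. The function names are illustrative and not taken from search.py.

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc ids; each list contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def contribution_ratio(k, hi_rank=1, lo_rank=10):
    """How much more a hit at hi_rank contributes than a hit at lo_rank."""
    return (k + lo_rank) / (k + hi_rank)


fused = rrf_fuse([["a", "b", "c"], ["b", "a", "c"]], k=30)
```

With k=60 a rank-1 hit is worth 70/61 (about 1.15x) a rank-10 hit; with k=30 the ratio grows to 40/31 (about 1.29x), which is the amplification the message refers to.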
Elixir's defmodule was not matched by the existing 'module' keyword because defmodule is a macro prefix, not a bare keyword. Adding it recovers symbol ranking for Elixir modules. Also allow optional namespace prefix in definition patterns (e.g. defmodule Phoenix.Router matches query 'Router'). NDCG@10: 0.850 -> 0.851, symbol: 0.946 -> 0.949.
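One way such a definition pattern could look, as a sketch only: the keyword list, the separator set (`.` and `::`), and the function name are assumptions, not the patterns used in the PR. It also shows why a word-boundary match on 'module' alone misses defmodule.

```python
import re

# Hypothetical keyword list for illustration.
_DEFINITION_KEYWORDS = ("class", "def", "defmodule", "module", "function", "struct")


def definition_pattern(symbol, keywords=_DEFINITION_KEYWORDS):
    """Match `<keyword> [Namespace.]*<symbol>` with the symbol as a whole word."""
    alternation = "|".join(keywords)
    return re.compile(
        rf"\b(?:{alternation})\s+(?:\w+(?:\.|::))*{re.escape(symbol)}\b"
    )


router = definition_pattern("Router")
old_style = definition_pattern("Router", keywords=("module",))

hit = router.search("defmodule Phoenix.Router do")
miss_old = old_style.search("defmodule Phoenix.Router do")
```

There is no word boundary between "def" and "module", so the module-only pattern never fires on Elixir sources until defmodule is listed explicitly.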
Files like 'requests.py' define 'Request' but weren't reached by the
non-candidate stem scan because stem('requests') != 'request'. Fix by
also matching when stem.rstrip('s') == symbol_lower, so 'requests.py'
matches symbol query 'Request'.
Combined with defmodule/namespace fix: symbol NDCG@10 0.946 -> 0.952.
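The plural check above can be sketched as follows; the function name and the use of `PurePosixPath` are illustrative choices, while the `rstrip('s')` comparison comes from the message itself.

```python
from pathlib import PurePosixPath


def file_stem_matches_symbol(path, symbol):
    """True if the file stem equals the symbol, ignoring case and trailing 's'."""
    stem = PurePosixPath(path).stem.lower()
    symbol_lower = symbol.lower()
    # rstrip('s') removes every trailing 's', so 'requests' becomes 'request'.
    return stem == symbol_lower or stem.rstrip("s") == symbol_lower


matches = file_stem_matches_symbol("src/requests.py", "Request")
```

Note that `rstrip("s")` strips all trailing "s" characters, so a stem like "class" becomes "cla"; that is harmless for this equality check but worth keeping in mind.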
Extract class/function/def/etc. names from each chunk and append them to the BM25 content. This ensures symbol queries match chunks that define those symbols even when BM25 tokenisation would otherwise miss the name at ranking time. Symbol NDCG@10 recovers to 0.953 (pre-RRF-tuning level), confirming the defmodule + plural-stem + name-enrichment changes together absorb the symbol regression from RRF k=30. Overall: 0.851 (architecture=0.801, semantic=0.850, symbol=0.953).
Higher multiplier pushes definition chunks above non-defining candidates more strongly. Symbol NDCG@10: 0.953 -> 0.954. Overall unchanged at 0.851.
Files named 'example[s].*' outside an examples/ directory (which is already penalized) can flood results for broad queries. Apply _STRONG_PENALTY (0.3x) to these files so genuinely relevant implementation files rank higher. Fixes lazy.nvim 'config/loader' queries where example.lua incorrectly ranked first. Overall NDCG@10 unchanged (lazy.nvim gains offset by rounding variance).
Natural language (semantic/architecture) queries benefit from a larger candidate pool since the target may rank lower in individual retrievers before boosting. Symbol queries already get strong BM25 signal so 5x is sufficient. NDCG@10: 0.851 -> 0.853 (architecture=0.804, semantic=0.852, symbol=0.954).
single_include/nlohmann/json.hpp is a 27k-line generated amalgam that was outranking the real source files. Apply _STRONG_PENALTY (0.3x) to any path containing single_include/. Full benchmark: 0.805 → 0.807 (arch +0.005, sem +0.004, sym +0.001).
Directories like example_dart/ and example_flutter_app/ (dio repo) were ranking above real source files. Extend _EXAMPLES_DIR_RE to also match these compound example directory names while avoiding false positives on filenames like example_code.py (requires trailing /). Full benchmark: 0.807 → 0.808.
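A hedged reconstruction of what the extended directory pattern might look like. The regex below is an assumption written to satisfy the cases named above; it is not the actual _EXAMPLES_DIR_RE.

```python
import re

# Assumed pattern: 'example' or 'examples', optional '_suffix', then a required '/'.
EXAMPLES_DIR_RE = re.compile(r"(?:^|/)examples?(?:_\w+)?/")
STRONG_PENALTY = 0.3


def path_penalty(path):
    """Return the score multiplier for a repo-relative path."""
    return STRONG_PENALTY if EXAMPLES_DIR_RE.search(path) else 1.0


penalties = {
    p: path_penalty(p)
    for p in (
        "example_dart/lib/main.dart",
        "dio/example_flutter_app/lib/main.dart",
        "src/example_code.py",
    )
}
```

The trailing `/` is what keeps a file such as example_code.py from being treated as an example directory.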
website/ directories contain documentation site source code (Docusaurus, etc.) that is unrelated to library implementation. These were showing up in top-10 results for riverpod queries. Apply _STRONG_PENALTY (0.3x). No annotation targets fall in website/ dirs. Overall: ~0.808 → ~0.809.
Directories like deps/ contain vendored third-party libraries (e.g. jemalloc in redis) that should not rank above the project's own source files. Apply _STRONG_PENALTY (0.3x). No annotation targets in deps/. Score is stable at ~0.808.
- Remove BM25 suffix stemming (tokens.py): 45 LOC, gain indistinguishable from noise; rule-based stemmers on code identifiers are noisy across 20 languages.
- Revert RRF k=30 (search.py) and BM25 def-name enrichment (sparse.py): these were coupled. k=30 alone regressed symbol, and enrichment was added to cancel that regression. Net +0.001 for ~30 LOC. Keep k=60.
- Remove example file, example_dart/, website/, deps/ penalties (penalties.py): all zero measured gain, and several carry false-positive risk on repos outside the benchmark.
- Keep: single_include/ penalty (+0.002, real C/C++ pattern).
For queries like 'how the StateManager tracks state', extract embedded CamelCase/camelCase identifiers and apply a symbol-definition boost at half the strength of pure symbol queries. Non-candidate scan uses prefix matching (min 4-char stem) so e.g. state.ts is found for symbol StateManager even when it doesn't rank in the initial candidate pool.
Benchmark: 0.844 -> 0.850 (+0.006, above ±0.003 noise floor)
- architecture: 0.801 -> 0.812
- typescript: 0.699 -> 0.710
- csharp: 0.851 -> 0.878
- rust: 0.844 -> 0.857
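A sketch of the two pieces described above: pulling CamelCase/camelCase identifiers out of a natural-language query, and prefix-matching file stems against them. The regex, the helper names, and the reading of "prefix matching" (the file stem is a prefix of the symbol) are assumptions.

```python
import re
from pathlib import PurePosixPath

# Assumed shape: a lowercase run followed by at least one capitalised segment,
# which covers both StateManager and beforeAll but skips ordinary words.
_EMBEDDED_SYMBOL_RE = re.compile(r"\b[A-Za-z][a-z0-9]+(?:[A-Z][a-z0-9]*)+\b")
_MIN_PREFIX_LEN = 4  # the message states a 4-character minimum stem


def extract_embedded_symbols(query):
    """Return identifier-like tokens embedded in a natural-language query."""
    return _EMBEDDED_SYMBOL_RE.findall(query)


def stem_prefix_matches(path, symbol):
    """True if the file stem is at least 4 chars and is a prefix of the symbol."""
    stem = PurePosixPath(path).stem.lower()
    return len(stem) >= _MIN_PREFIX_LEN and symbol.lower().startswith(stem)


symbols = extract_embedded_symbols("how the StateManager tracks state")
found = stem_prefix_matches("src/state.ts", "StateManager")
```

The length guard keeps short stems such as app.ts from matching every symbol that happens to start with the same three letters.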
…ALE constant, single-pass coherence, revert tokens.py
…andidate condition
…bedded-symbol pass, walrus + docstring trims
…cstring, drop _stem_matches from _prefix_or_exact
…egex, fold double guard
This PR introduces two new boosting mechanisms:
StateManager or beforeAll now also get a boost.