feat: Add initial code for hybrid code search #1
Merged
MechRosey pushed a commit to MechRosey/SembleSharp that referenced this pull request on May 3, 2026
Adds an end-to-end test class that builds a SembleIndex.FromPath()
against Semble's own src/ tree and asserts that BM25 searches surface
the right files. This catches glue mistakes between the file walker,
Roslyn chunker, encoder seam, indexer, and search pipeline that unit
tests miss.
Uses MockEncoder (deterministic Box-Muller-via-FNV vectors) so the test
runs without a real ONNX/static model on the host. Semantic and hybrid
search would be noisy under a mock encoder, so the assertions only run
in BM25 mode. BM25 quality is a pure function of the indexed corpus
and tokenizer, with no encoder involved.
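
The "deterministic Box-Muller-via-FNV vectors" idea can be sketched roughly as follows. This is an illustration in Python, not the project's C# MockEncoder: the function names, the counter-suffix scheme, the dimension, and the final L2 normalization are all assumptions made for the example. It only shows how hashing the input with FNV-1a and feeding the hashes through the Box-Muller transform gives Gaussian-looking vectors that are identical on every run.

```python
import math
import struct

# Standard 64-bit FNV-1a constants.
FNV_OFFSET = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a hash of a byte string."""
    h = FNV_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & MASK64
    return h


def _unit(h: int) -> float:
    """Map a 64-bit hash to a float in (0, 1], so log() never sees 0."""
    return ((h >> 11) + 1) / float(1 << 53)


def mock_embed(text: str, dim: int = 8) -> list[float]:
    """Hypothetical mock encoder: same text always yields the same unit vector."""
    data = text.encode("utf-8")
    out: list[float] = []
    i = 0
    while len(out) < dim:
        # Two hash-derived uniforms per Box-Muller step (assumed counter scheme).
        u1 = _unit(fnv1a_64(data + struct.pack("<I", 2 * i)))
        u2 = _unit(fnv1a_64(data + struct.pack("<I", 2 * i + 1)))
        radius = math.sqrt(-2.0 * math.log(u1))
        out.append(radius * math.cos(2.0 * math.pi * u2))
        if len(out) < dim:
            out.append(radius * math.sin(2.0 * math.pi * u2))
        i += 1
    norm = math.sqrt(sum(x * x for x in out)) or 1.0
    return [x / norm for x in out]
```

Because the vectors depend only on the input bytes, two runs on different hosts agree exactly, but the vectors carry no meaning, which is why semantic and hybrid assertions would be noisy.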
Source location: the test walks up from its bin directory
(../../../../../src/Semble), which works in both Debug and Release
configurations and in CI.
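
A small sketch of why the same relative hop works for both configurations. It is written in Python rather than the project's C#, and the bin directory layout below (tests/Semble.Tests/bin/<Configuration>/net8.0) is an assumption chosen to match the five ".." segments, not something the commit states.

```python
import posixpath

# Relative hop quoted in the commit message.
RELATIVE_SRC = "../../../../../src/Semble"


def resolve_source_root(test_bin_dir: str) -> str:
    """Join the bin directory with the relative hop and collapse the '..' segments."""
    return posixpath.normpath(posixpath.join(test_bin_dir, RELATIVE_SRC))


# Hypothetical bin directories; only the configuration segment differs.
debug_bin = "/repo/tests/Semble.Tests/bin/Debug/net8.0"
release_bin = "/repo/tests/Semble.Tests/bin/Release/net8.0"

print(resolve_source_root(debug_bin))    # /repo/src/Semble
print(resolve_source_root(release_bin))  # /repo/src/Semble
```

Debug and Release sit at the same depth under bin, so the hop lands on the same directory either way. A layout with a different depth would need a different number of ".." segments.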
Tests:
- Self-index has ≥20 files and >50 csharp chunks
- BM25 surfaces the right file for distinctive symbols:
    "tokenize" → Tokens.cs
    "WordPiece" → HuggingFaceTokenizer.cs
    "safetensors" → SafeTensors.cs
    "RerankTopK" → Penalties.cs
    "ChunkSource" → Chunker.cs
    "IsGitUrl" → Formatting.cs
- "PotionCodeEncoder" returns PotionCodeEncoder.cs as the #1 BM25 hit
- find_related on a Tokens.cs chunk excludes the seed
- filter_paths constrains BM25 results
- empty / whitespace-only queries return [] across all modes
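
The BM25 assertions in the list above can be illustrated with a toy index. This is a Python sketch, not Semble's C# pipeline: the TinyBM25 class, its tokenizer, the k1/b values, and the three stand-in file contents are all made up for the example. Only the file names and the filter_paths idea come from the commit message.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric runs; a stand-in for the real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyBM25:
    """Minimal Okapi BM25 over a {path: text} mapping (illustrative only)."""

    def __init__(self, docs: dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.paths = list(docs)
        self.tfs = [Counter(tokenize(docs[p])) for p in self.paths]
        self.lens = [sum(tf.values()) for tf in self.tfs]
        self.avg_len = sum(self.lens) / len(self.lens)
        # Document frequency: number of documents containing each term.
        self.df = Counter(term for tf in self.tfs for term in tf)
        self.k1, self.b = k1, b

    def search(self, query: str, filter_paths=None, top_k: int = 3) -> list[str]:
        terms = tokenize(query)
        if not terms:  # empty or whitespace-only query
            return []
        n = len(self.paths)
        scored = []
        for path, tf, doc_len in zip(self.paths, self.tfs, self.lens):
            if filter_paths is not None and path not in filter_paths:
                continue
            score = 0.0
            for term in terms:
                if term not in tf:
                    continue
                idf = math.log(1 + (n - self.df[term] + 0.5) / (self.df[term] + 0.5))
                length_norm = self.k1 * (1 - self.b + self.b * doc_len / self.avg_len)
                score += idf * tf[term] * (self.k1 + 1) / (tf[term] + length_norm)
            if score > 0:
                scored.append((score, path))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [path for _, path in scored[:top_k]]


# Made-up file contents; only the file names appear in the commit message.
DOCS = {
    "Tokens.cs": "public static class Tokens { public static string[] Tokenize(string text) { } }",
    "SafeTensors.cs": "public sealed class SafeTensors { public static SafeTensors Load(string path) { } }",
    "Chunker.cs": "public static class Chunker { public static Chunk[] ChunkSource(string source) { } }",
}
index = TinyBM25(DOCS)
```

With this toy corpus, index.search("tokenize") returns only Tokens.cs, index.search("   ") returns an empty list, and index.search("public", filter_paths={"Chunker.cs"}) is limited to Chunker.cs even though every file contains the term. No encoder is involved at any point, which is the property the commit relies on.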
Suite is now 272 passing, 0 skipped (231 lib + 13 CLI + 28 MCP).
https://claude.ai/code/session_01MFLw7DcMwFQWaokFePX5ud