feat: Add initial code for hybrid code search #1

Merged: Pringled merged 26 commits into main from initial-code on Apr 9, 2026

Pringled commented Apr 9, 2026

No description provided.

Pringled merged commit 2542b7f into main Apr 9, 2026
Pringled deleted the initial-code branch April 22, 2026 05:05
MechRosey pushed a commit to MechRosey/SembleSharp that referenced this pull request May 3, 2026
Adds an end-to-end test class that builds a SembleIndex.FromPath()
against Semble's own src/ tree and asserts that BM25 searches surface
the right files. This catches glue mistakes between the file walker,
Roslyn chunker, encoder seam, indexer, and search pipeline that unit
tests miss.

Uses MockEncoder (deterministic Box-Muller-via-FNV vectors) so the test
runs without a real ONNX/static model on the host. Semantic and hybrid
search would be noisy under a mock encoder, so the assertions only run
in BM25 mode, where result quality is a pure function of the indexed
corpus and tokenizer, with no encoder involved.
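
The idea of a deterministic mock encoder can be illustrated with a short
sketch. It is written in Python for brevity and is not the MockEncoder from
this commit: the function names, the 8-dimensional default, the counter-based
way of deriving uniforms from the FNV-1a hash, and the final normalization
are all assumptions made for illustration.

```python
import math
from typing import List

FNV_OFFSET = 0xCBF29CE484222325  # standard 64-bit FNV offset basis
FNV_PRIME = 0x100000001B3        # standard 64-bit FNV prime
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes, seed: int = FNV_OFFSET) -> int:
    """64-bit FNV-1a: XOR each byte in, then multiply by the prime."""
    h = seed
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & MASK64
    return h


def _uniform(text_hash: int, counter: int) -> float:
    """Derive a uniform in (0, 1] from the text hash and a counter (assumed scheme)."""
    h = fnv1a_64(counter.to_bytes(8, "little"), seed=text_hash)
    return ((h >> 11) + 1) / (1 << 53)


def mock_embed(text: str, dim: int = 8) -> List[float]:
    """Map text to a unit-length pseudo-Gaussian vector, identical on every run."""
    base = fnv1a_64(text.encode("utf-8"))
    values: List[float] = []
    pair = 0
    while len(values) < dim:
        u1 = _uniform(base, 2 * pair)
        u2 = _uniform(base, 2 * pair + 1)
        # Box-Muller: two uniforms give two independent standard normals.
        radius = math.sqrt(-2.0 * math.log(u1))
        values.append(radius * math.cos(2.0 * math.pi * u2))
        if len(values) < dim:
            values.append(radius * math.sin(2.0 * math.pi * u2))
        pair += 1
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [v / norm for v in values]
```

Because the vector depends only on the input bytes, repeated runs give
identical embeddings, but similar texts get unrelated vectors, which is why
semantic ranking under such an encoder carries no signal.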

Source location: walks up from the test bin (../../../../../src/Semble)
which works in both Debug and Release configurations and in CI.
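
A minimal Python sketch of this walk-up, assuming the test binary sits five
directories below the repository root. The helper name `resolve_source_root`
and the `levels_up` parameter are hypothetical, and the real test presumably
does the equivalent in C#.

```python
from pathlib import Path


def resolve_source_root(test_bin: Path, levels_up: int = 5) -> Path:
    """Walk up `levels_up` directories from the test bin, then into src/Semble."""
    root = test_bin.resolve()
    for _ in range(levels_up):
        root = root.parent
    source = root / "src" / "Semble"
    if not source.is_dir():
        # Fail loudly rather than silently indexing the wrong tree.
        raise FileNotFoundError(f"expected source tree at {source}")
    return source
```

Debug and Release output directories differ only in one path segment at the
same depth, so a fixed number of parent hops lands on the same root in both.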

Tests:
  - Self-index has ≥20 files and >50 csharp chunks
  - BM25 surfaces the right file for distinctive symbols:
    "tokenize" → Tokens.cs
    "WordPiece" → HuggingFaceTokenizer.cs
    "safetensors" → SafeTensors.cs
    "RerankTopK" → Penalties.cs
    "ChunkSource" → Chunker.cs
    "IsGitUrl" → Formatting.cs
  - "PotionCodeEncoder" is the #1 BM25 hit in PotionCodeEncoder.cs
  - find_related on a Tokens.cs chunk excludes the seed
  - filter_paths constrains BM25 results
  - empty / whitespace-only queries return [] across all modes
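
The shape of these BM25 assertions can be sketched with a tiny stand-alone
index. This is a Python illustration only: the `BM25` class, its tokenizer,
the `k1`/`b` defaults, and the three file contents are assumptions, not
Semble's indexer or its real source files.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-alphanumerics (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    def __init__(self, docs: Dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.names = list(docs)
        self.tf = [Counter(tokenize(docs[name])) for name in self.names]
        self.lengths = [sum(tf.values()) for tf in self.tf]
        self.avgdl = sum(self.lengths) / max(len(self.lengths), 1)
        df = Counter(term for tf in self.tf for term in tf)
        n = len(self.names)
        self.idf = {t: math.log(1 + (n - d + 0.5) / (d + 0.5)) for t, d in df.items()}

    def search(self, query: str, top_k: int = 3,
               filter_paths: Optional[Set[str]] = None) -> List[Tuple[str, float]]:
        terms = tokenize(query)
        if not terms:  # empty or whitespace-only query
            return []
        scored = []
        for name, tf, dl in zip(self.names, self.tf, self.lengths):
            if filter_paths is not None and name not in filter_paths:
                continue
            score = 0.0
            for term in terms:
                f = tf.get(term, 0)
                if f:
                    denom = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                    score += self.idf[term] * f * (self.k1 + 1) / denom
            if score > 0:
                scored.append((name, score))
        scored.sort(key=lambda item: (-item[1], item[0]))
        return scored[:top_k]


# Hypothetical file contents, for illustration only.
CORPUS = {
    "Tokens.cs": "public static class Tokens { static string[] Tokenize(string text) }",
    "SafeTensors.cs": "public sealed class SafeTensors { load safetensors header }",
    "Penalties.cs": "public static class Penalties { const int RerankTopK = 50; }",
}
index = BM25(CORPUS)
```

With this toy corpus, `index.search("RerankTopK")` ranks `Penalties.cs` first
because the term occurs in no other document, and `index.search("   ")`
returns an empty list without touching any encoder.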

Suite is now 272 passing, 0 skipped (231 lib + 13 CLI + 28 MCP).

https://claude.ai/code/session_01MFLw7DcMwFQWaokFePX5ud