feat(search): hybrid code search — exact + semantic in one call (#67) #75
Merged
Adds a hybrid_search tool that fuses exact regex/keyword matching with semantic ranking over a symbol-aware local index, in a single call. It finds code that does what you describe even when that code doesn't contain the query words, and it ranks grep-style hits by relevance. The new src/tools/hybrid_search.js reuses the existing local BM25 + hashed-vector engine (src/rag/index_store): fully offline, with zero model downloads, no native runtime, and no external services. Modes: hybrid (default), regex, keyword, semantic. The tool is wired into the executor, tool schemas, search/code-intel routing, and dedup; path arguments are contained via safeResolvePath. Inspired by colgrep and semble (issue #67), but kept dependency-free to match SmallCode's local-first design. 11 new tests; full suite 313 passing.
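The PR does not show how src/rag/index_store builds its hashed vectors. As a rough illustration of the general technique (the "hashing trick"), here is a minimal sketch that assumes FNV-1a hashing, 1024 dimensions, and whole-word plus character-trigram features; the real engine may hash different features into a different size. The names embed, cosine, and fnv1a are illustrative, not SmallCode's API. Shared subwords give a non-zero similarity even when no whole word matches, and no model weights are involved.

```javascript
// Hashed-vector embedding sketch (feature hashing + cosine similarity).
// Assumptions: FNV-1a 32-bit hash, 1024 dimensions, word + character-trigram
// features. Illustrative only; src/rag/index_store may differ.
const DIMS = 1024;

function fnv1a(str) {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned 32-bit
  }
  return h;
}

function embed(text) {
  const vec = new Float32Array(DIMS);
  const words = text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  for (const w of words) {
    vec[fnv1a(w) % DIMS] += 1; // whole-word feature
    const padded = `#${w}#`; // mark word boundaries for edge trigrams
    for (let i = 0; i + 3 <= padded.length; i++) {
      vec[fnv1a(padded.slice(i, i + 3)) % DIMS] += 0.5; // trigram feature
    }
  }
  return vec;
}

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0; // 0 for empty vectors
}

// "loadConfig" shares no whole word with the query, only subwords.
const query = embed("configuration loader");
console.log(cosine(query, embed("function loadConfig()")).toFixed(3));
console.log(cosine(query, embed("const retries = 3")).toFixed(3));
```

Trigram overlap like this captures spelling-level similarity (config/configuration, load/loader), not true synonymy; it is the kind of cheap signal that keeps the engine free of downloaded weights.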
Hybrid code search — exact + semantic in one call
Implements the ideas from #67: a search tool that combines regex/keyword AND semantic search in the same call, over a symbol-aware (AST-ish) local index, fully on-device.
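The PR does not say how the exact and semantic signals are merged into one list. One standard option is Reciprocal Rank Fusion (RRF), sketched below as a stand-in rather than as what hybrid_search.js actually does; the reciprocalRankFusion name, the "file:line" ids, and k = 60 (the constant commonly used with RRF) are assumptions.

```javascript
// Reciprocal Rank Fusion (RRF) sketch: merge an exact-match ranking and a
// semantic ranking into one list. Hypothetical stand-in; hybrid_search.js
// may weight and combine its scores differently.
function reciprocalRankFusion(rankings, k = 60) {
  const fused = new Map(); // id -> { id, score, sources }
  for (const [source, ids] of Object.entries(rankings)) {
    ids.forEach((id, rank) => {
      const entry = fused.get(id) || { id, score: 0, sources: [] };
      entry.score += 1 / (k + rank + 1); // rank is 0-based; RRF uses 1-based ranks
      entry.sources.push(source);
      fused.set(id, entry);
    });
  }
  // Highest fused score first.
  return [...fused.values()].sort((a, b) => b.score - a.score);
}

// Usage: hypothetical "file:line" hits from a regex pass and a semantic pass.
const exactHits = ["src/a.js:10", "src/b.js:42"];
const semanticHits = ["src/c.js:7", "src/a.js:10", "src/d.js:3"];
for (const r of reciprocalRankFusion({ exact: exactHits, semantic: semanticHits })) {
  console.log(r.score.toFixed(4), r.id, r.sources.join("+"));
}
```

Hits found by both passes accumulate two reciprocal-rank terms, so they rise above single-source hits without having to normalize regex scores against similarity scores.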
What it does
New hybrid_search tool ("grep on steroids"): … ●, semantic-only ○.
Modes: hybrid (default), regex, keyword, semantic.
Example (run against this repo):
None of those lines contain all the query words; semantic ranking finds them.
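To make the four modes concrete, here is a hypothetical sketch of how a single call could dispatch on them. Only the mode names come from the PR; searchLines, its result fields, the +1 boost for exact hits, and the toy scorer standing in for the vector ranking are all assumptions for illustration.

```javascript
// Hypothetical single-call mode dispatch. Only the mode names come from the
// PR; the function shape, result fields, and +1 exact-match boost are assumed.
const MODES = new Set(["hybrid", "regex", "keyword", "semantic"]);

function searchLines(lines, query, mode = "hybrid", semanticScore = () => 0) {
  if (!MODES.has(mode)) throw new Error(`unknown mode: ${mode}`);
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  const regex = mode === "regex" ? new RegExp(query) : null;

  const results = [];
  lines.forEach((text, i) => {
    // Exact signal: regex match in "regex" mode, otherwise all query words present.
    const exact = regex ? regex.test(text) : words.every((w) => text.toLowerCase().includes(w));
    // Semantic signal is only computed in the modes that use it.
    const semantic = mode === "hybrid" || mode === "semantic" ? semanticScore(text, query) : 0;

    if (mode === "regex" || mode === "keyword") {
      if (exact) results.push({ line: i + 1, text, score: 1, exact });
    } else if (mode === "semantic") {
      if (semantic > 0) results.push({ line: i + 1, text, score: semantic, exact });
    } else if (exact || semantic > 0) {
      // hybrid: keep semantic-only hits, but boost lines that also match exactly
      results.push({ line: i + 1, text, score: semantic + (exact ? 1 : 0), exact });
    }
  });
  return results.sort((a, b) => b.score - a.score);
}

// Toy stand-in for the real vector ranking: share of query words whose
// first four letters appear in the line.
const toyScore = (text, q) => {
  const t = text.toLowerCase();
  const ws = q.toLowerCase().split(/\s+/).filter(Boolean);
  return ws.filter((w) => t.includes(w.slice(0, 4))).length / ws.length;
};

const src = ["function loadConfig() {}", "// parse the settings file", "const x = 1;"];
console.log(searchLines(src, "config", "keyword"));
console.log(searchLines(src, "load settings", "hybrid", toyScore));
```

In this sketch, neither line returned by the hybrid query contains both "load" and "settings", so keyword mode would return nothing; the semantic signal is what surfaces them.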
On the referenced projects
#67 suggested colgrep (Rust + ColBERT multi-vector) and semble (Python + model2vec). Both are great, but they pull in heavy native/Python runtimes and downloaded model weights. SmallCode's whole premise is staying small and fully local with zero external services, so this PR reuses the existing local hybrid engine (src/rag/index_store) instead of shipping an embedding model: the same single-call hybrid ergonomics, no new dependencies, and it runs on CPU with no model to load. If someone wants true neural embeddings later, this leaves room to layer a semantic MCP on top.
Changes
New: src/tools/hybrid_search.js, test/hybrid_search.test.js (11 cases).
Wiring: bin/executor.js (tool case, path-contained via safeResolvePath), bin/tools.js (schema), src/compiled/tool_router.js + src/tools/two_stage_router.js (search/code-intel categories), src/tools/dedup.js (marked pure).
Limits: SMALLCODE_HYBRID_MAX_FILES (1500), SMALLCODE_HYBRID_MAX_BYTES (512KiB).
Testing
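Neither safeResolvePath nor the handling of the two SMALLCODE_HYBRID_* variables is shown in this PR description. The following is a hypothetical sketch of what a lexical containment check and an environment-variable cap commonly look like in Node; containedResolve and readPositiveInt are illustrative names, and treating 1500 and 512 KiB (in bytes) as defaults is an assumption.

```javascript
const path = require("node:path");

// Hypothetical containment check in the spirit of safeResolvePath (not its
// actual code). Resolves a user-supplied path against the workspace root and
// rejects results outside it. Purely lexical: it does not follow symlinks,
// which a real implementation may also need to handle.
function containedResolve(root, userPath) {
  const absRoot = path.resolve(root);
  const target = path.resolve(absRoot, userPath);
  const rel = path.relative(absRoot, target);
  // "..", "../x", or an absolute relative path (other drive on Windows) means escape.
  const escapes = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (escapes) throw new Error(`path escapes workspace: ${userPath}`);
  return target;
}

// Reads a positive integer from the environment, else returns the fallback.
// Treating 1500 and 512 KiB as defaults (and the byte cap as bytes) is assumed.
function readPositiveInt(name, fallback, env = process.env) {
  const n = Number.parseInt(env[name] ?? "", 10);
  return Number.isFinite(n) && n > 0 ? n : fallback;
}

const maxFiles = readPositiveInt("SMALLCODE_HYBRID_MAX_FILES", 1500);
const maxBytes = readPositiveInt("SMALLCODE_HYBRID_MAX_BYTES", 512 * 1024);
console.log({ maxFiles, maxBytes });
console.log(containedResolve("/workspace", "src/tools/hybrid_search.js"));
```

Comparing path.relative output, rather than checking string prefixes of the resolved path, avoids treating a sibling such as /workspace-old as inside /workspace.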
Full suite 313 passing (node --test test/*.test.js, excluding the pre-existing environment-dependent shell_session.test.js). Build clean.
Closes #67.