
Fix search performance: skip LLM query expansion and reranking by default#49

Merged
antoninbas merged 5 commits into main from fix/search-performance
Apr 18, 2026

Conversation

Owner

@antoninbas antoninbas commented Apr 16, 2026

Summary

Performance fix

  • Hybrid search was running three local GGUF models per query (1.7B query expansion + 300M embed + 0.6B reranker), causing 30–80s search times on CPU
  • Default hybrid mode now does BM25 + vector only, skipping LLM inference → warm search drops from ~27s to ~43ms
  • Config now defaults to rerank: false and queryExpand: false; both can be overridden globally via knotes config set rerank true / knotes config set queryExpand true
  • Fix race condition in getStore(): concurrent callers (background embed + user search on startup) previously created separate store instances, loading models twice
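The getStore() race can be closed with a single-flight promise cache: cache the in-flight promise rather than the resolved store, so concurrent callers await the same load. A minimal sketch in TypeScript (names and the Store shape are hypothetical, not the actual knotes implementation):

```typescript
// Hypothetical sketch of a single-flight getStore(): concurrent callers
// share one in-flight promise instead of each constructing a store and
// loading the models twice.
type Store = { search: (q: string) => Promise<string[]> };

let storePromise: Promise<Store> | null = null;

async function createStore(): Promise<Store> {
  // Stands in for the expensive model-loading path.
  return { search: async (q) => [q] };
}

function getStore(): Promise<Store> {
  // Caching the promise itself (not the resolved store) means a caller
  // that arrives while loading is still in progress awaits the same
  // instance instead of kicking off a second load.
  if (!storePromise) {
    storePromise = createStore();
  }
  return storePromise;
}

// Two concurrent callers resolve to the identical store object.
async function demo(): Promise<boolean> {
  const [a, b] = await Promise.all([getStore(), getStore()]);
  return a === b;
}
```

The key design choice is that the null check and the assignment happen synchronously before any await, so in a single-threaded event loop no second caller can slip in between them.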

Search UX overhaul

  • Dedicated search screen replaces the search overlay — toggled with Ctrl+K or the Search button (highlighted when active), ESC to close
  • No dynamic search — search only fires on Enter or the Search button
  • Per-search mode selector in the UI: BM25 / Vector / Hybrid (segmented buttons) + Query Expansion and Reranking checkboxes (Hybrid only, labelled "slow")
  • Same rerank and queryExpand options exposed in CLI (--rerank, --expand) and MCP (rerank, queryExpand params)
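Per-request rerank/queryExpand params layered over global config defaults can be resolved with a simple nullish merge. A hypothetical sketch (the option names match the PR; the resolver function itself is illustrative):

```typescript
// Hypothetical sketch: a request (CLI flags / MCP params) may carry
// rerank/queryExpand overrides; anything it omits falls back to the
// global config defaults described in this PR (both false).
interface SearchOpts {
  rerank: boolean;
  queryExpand: boolean;
}

const configDefaults: SearchOpts = { rerank: false, queryExpand: false };

function resolveSearchOpts(req: Partial<SearchOpts>): SearchOpts {
  // `??` keeps an explicit `false` from the request, unlike `||`.
  return {
    rerank: req.rerank ?? configDefaults.rerank,
    queryExpand: req.queryExpand ?? configDefaults.queryExpand,
  };
}
```

For example, `resolveSearchOpts({ rerank: true })` enables reranking for that one search while leaving query expansion at its configured default.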

Search result improvements

  • bestChunk for all modes: all three modes (BM25, vector, hybrid) now use store.search() with pre-built queries, so results always return the specific matching section of the document rather than the full body from the start. This is semantically correct — vector search matches at chunk level, so the result should be the matching chunk.
  • No backend truncation: full chunk content returned to CLI/MCP
  • 600-char display limit in the UI with ellipsis for long snippets
  • Clickable results: clicking a result opens the note. Fixed a bug where qmd lowercases paths internally (notes/Test becomes notes/test), causing getNote to fail; added a case-insensitive filename fallback in getNote.
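The case-insensitive fallback amounts to: try the exact path first, then retry comparing lowercased paths. A hypothetical sketch (the real getNote resolves against the filesystem; here a known-path list stands in):

```typescript
// Hypothetical sketch of the getNote fallback: qmd returns lowercased
// paths (e.g. "notes/test"), so an exact lookup can miss the real file
// ("notes/Test"). Fall back to a case-insensitive comparison.
function findNotePath(requested: string, knownPaths: string[]): string | null {
  // Fast path: exact match.
  const exact = knownPaths.find((p) => p === requested);
  if (exact) return exact;
  // Fallback: compare lowercased forms so "notes/test" resolves
  // to "notes/Test".
  const lower = requested.toLowerCase();
  return knownPaths.find((p) => p.toLowerCase() === lower) ?? null;
}
```

Exact matches win over case-insensitive ones, so notes that differ only in case still resolve deterministically when the requested casing is correct.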

Test plan

  • Ctrl+K opens search screen; ESC closes it; Search button highlights when active
  • Typing does not trigger search; Enter and Search button do
  • BM25 / Vector / Hybrid mode buttons switch correctly; results show the matching chunk
  • Query Expansion and Reranking checkboxes only appear in Hybrid mode
  • Clicking a result opens the note
  • knotes search <query> --rerank --expand works from CLI
  • MCP knotes_search accepts rerank and queryExpand params
  • config show displays rerank: false and queryExpand: false

🤖 Generated with Claude Code

antoninbas and others added 5 commits April 15, 2026 23:21
…ault

Hybrid search was running three local GGUF models on every query
(1.7B query expansion + 300M embed + 0.6B reranker), making searches
take 30-80s on CPU. Now defaults to fast BM25+vector hybrid by passing
pre-built queries to qmd, skipping LLM inference entirely.

Two new config options (both default false):
  knotes config set queryExpand true  # enable LLM query expansion
  knotes config set rerank true       # enable LLM reranking

Also fix a race condition in getStore(): concurrent callers (background
embed + user search on startup) previously created separate store
instances, loading models twice in parallel.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace search overlay with a full search screen (Ctrl+K or Search button toggles it)
- Search only fires on Enter or Search button click, not while typing
- Per-search mode selector: BM25 / Vector / Hybrid (segmented buttons)
- Query Expansion and Reranking checkboxes (Hybrid only, labelled "slow")
- Results show full snippet (500 chars), score badge, clickable to open note
- API: rerank and queryExpand params now accepted per-request, overriding config defaults
- CLI: --rerank and --expand flags added to knotes search
- MCP: rerank and queryExpand parameters added to knotes_search tool

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- notes.getNote: add case-insensitive filename fallback so search results
  with qmd-lowercased paths (e.g. "notes/test") resolve to actual files
  (e.g. "notes/Test")
- search: use full bestChunk/body content in snippet (no backend truncation);
  CLI/MCP now receive the complete matched text
- SearchView: truncate snippet display at 600 chars with ellipsis so long
  notes don't overwhelm the results list

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
BM25 and vector modes were using searchLex/searchVector which return the
full document body. Switching to store.search() with pre-built lex/vec
queries gives the same search behavior but returns bestChunk — the
specific matching section — consistent with how vector search actually
works (chunk-level matching, not document-level).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
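Merging BM25 and vector rankings without an LLM reranker can be done with a standard rank-based fusion such as reciprocal rank fusion (RRF). This is a hedged sketch of that general technique, not the qmd/store.search() internals:

```typescript
// Hypothetical sketch: fuse two ranked ID lists (BM25 and vector) with
// reciprocal rank fusion. Each list contributes 1 / (k + rank + 1) to a
// document's score; documents ranked well in both lists rise to the top.
function rrfFuse(bm25: string[], vec: string[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of [bm25, vec]) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

Because fusion uses only ranks, it needs no model inference at query time, which is what lets the LLM-free hybrid path stay in the millisecond range.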
@antoninbas antoninbas merged commit 3f8dbf8 into main Apr 18, 2026
6 checks passed
@antoninbas antoninbas deleted the fix/search-performance branch April 18, 2026 04:20