
Fix search performance: skip LLM query expansion and reranking by default#49

Merged
antoninbas merged 5 commits into main from fix/search-performance
Apr 18, 2026

Conversation

Owner

@antoninbas antoninbas commented Apr 16, 2026

Summary

Performance fix

  • Hybrid search was running three local GGUF models per query (1.7B query expansion + 300M embed + 0.6B reranker), causing 30–80s search times on CPU
  • Default hybrid mode now does BM25 + vector only, skipping LLM inference → warm search drops from ~27s to ~43ms
  • Config now defaults to rerank: false and queryExpand: false; both can be overridden globally via knotes config set rerank true / knotes config set queryExpand true
  • Fix race condition in getStore(): concurrent callers (background embed + user search on startup) previously created separate store instances, loading models twice
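The getStore() race can be closed with a single-flight promise cache: cache the in-flight promise rather than the resolved store, so concurrent callers await the same load. A minimal sketch in TypeScript (names and the Store shape are hypothetical, not the actual knotes implementation):

```typescript
// Hypothetical sketch of a single-flight getStore(): concurrent callers
// share one in-flight promise instead of each constructing a store and
// loading the models twice.
type Store = { search: (q: string) => Promise<string[]> };

let storePromise: Promise<Store> | null = null;

async function createStore(): Promise<Store> {
  // Stands in for the expensive model-loading path.
  return { search: async (q) => [q] };
}

function getStore(): Promise<Store> {
  // Caching the promise itself (not the resolved store) means a caller
  // that arrives while loading is still in progress awaits the same
  // instance instead of kicking off a second load.
  if (!storePromise) {
    storePromise = createStore();
  }
  return storePromise;
}

// Two concurrent callers resolve to the identical store object.
async function demo(): Promise<boolean> {
  const [a, b] = await Promise.all([getStore(), getStore()]);
  return a === b;
}
```

The key design choice is that the null check and the assignment happen synchronously before any await, so in a single-threaded event loop no second caller can slip in between them.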

Search UX overhaul

  • Dedicated search screen replaces the search overlay — toggled with Ctrl+K or the Search button (highlighted when active), ESC to close
  • No dynamic search — search only fires on Enter or the Search button
  • Per-search mode selector in the UI: BM25 / Vector / Hybrid (segmented buttons) + Query Expansion and Reranking checkboxes (Hybrid only, labelled "slow")
  • Same rerank and queryExpand options exposed in CLI (--rerank, --expand) and MCP (rerank, queryExpand params)
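Per-request rerank/queryExpand params layered over global config defaults can be resolved with a simple nullish merge. A hypothetical sketch (the option names match the PR; the resolver function itself is illustrative):

```typescript
// Hypothetical sketch: a request (CLI flags / MCP params) may carry
// rerank/queryExpand overrides; anything it omits falls back to the
// global config defaults described in this PR (both false).
interface SearchOpts {
  rerank: boolean;
  queryExpand: boolean;
}

const configDefaults: SearchOpts = { rerank: false, queryExpand: false };

function resolveSearchOpts(req: Partial<SearchOpts>): SearchOpts {
  // `??` keeps an explicit `false` from the request, unlike `||`.
  return {
    rerank: req.rerank ?? configDefaults.rerank,
    queryExpand: req.queryExpand ?? configDefaults.queryExpand,
  };
}
```

For example, `resolveSearchOpts({ rerank: true })` enables reranking for that one search while leaving query expansion at its configured default.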

Search result improvements

  • bestChunk for all modes: all three modes (BM25, vector, hybrid) now use store.search() with pre-built queries, so results always return the specific matching section of the document rather than the full body from the start. This is semantically correct — vector search matches at chunk level, so the result should be the matching chunk.
  • No backend truncation: full chunk content returned to CLI/MCP
  • 600-char display limit in the UI with ellipsis for long snippets
  • Clickable results: clicking a result opens the note. Fixed a bug where qmd lowercases paths internally (notes/Test becomes notes/test), causing getNote to fail; added a case-insensitive filename fallback in getNote.
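The case-insensitive fallback amounts to: try the exact path first, then retry comparing lowercased paths. A hypothetical sketch (the real getNote resolves against the filesystem; here a known-path list stands in):

```typescript
// Hypothetical sketch of the getNote fallback: qmd returns lowercased
// paths (e.g. "notes/test"), so an exact lookup can miss the real file
// ("notes/Test"). Fall back to a case-insensitive comparison.
function findNotePath(requested: string, knownPaths: string[]): string | null {
  // Fast path: exact match.
  const exact = knownPaths.find((p) => p === requested);
  if (exact) return exact;
  // Fallback: compare lowercased forms so "notes/test" resolves
  // to "notes/Test".
  const lower = requested.toLowerCase();
  return knownPaths.find((p) => p.toLowerCase() === lower) ?? null;
}
```

Exact matches win over case-insensitive ones, so notes that differ only in case still resolve deterministically when the requested casing is correct.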

Test plan

  • Ctrl+K opens search screen; ESC closes it; Search button highlights when active
  • Typing does not trigger search; Enter and Search button do
  • BM25 / Vector / Hybrid mode buttons switch correctly; results show the matching chunk
  • Query Expansion and Reranking checkboxes only appear in Hybrid mode
  • Clicking a result opens the note
  • knotes search <query> --rerank --expand works from CLI
  • MCP knotes_search accepts rerank and queryExpand params
  • config show displays rerank: false and queryExpand: false

🤖 Generated with Claude Code

antoninbas and others added 5 commits April 15, 2026 23:21
…ault

Hybrid search was running three local GGUF models on every query
(1.7B query expansion + 300M embed + 0.6B reranker), making searches
take 30-80s on CPU. Now defaults to fast BM25+vector hybrid by passing
pre-built queries to qmd, skipping LLM inference entirely.

Two new config options (both default false):
  knotes config set queryExpand true  # enable LLM query expansion
  knotes config set rerank true       # enable LLM reranking

Also fix a race condition in getStore(): concurrent callers (background
embed + user search on startup) previously created separate store
instances, loading models twice in parallel.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace search overlay with a full search screen (Ctrl+K or Search button toggles it)
- Search only fires on Enter or Search button click, not while typing
- Per-search mode selector: BM25 / Vector / Hybrid (segmented buttons)
- Query Expansion and Reranking checkboxes (Hybrid only, labelled "slow")
- Results show full snippet (500 chars), score badge, clickable to open note
- API: rerank and queryExpand params now accepted per-request, overriding config defaults
- CLI: --rerank and --expand flags added to knotes search
- MCP: rerank and queryExpand parameters added to knotes_search tool

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- notes.getNote: add case-insensitive filename fallback so search results
  with qmd-lowercased paths (e.g. "notes/test") resolve to actual files
  (e.g. "notes/Test")
- search: use full bestChunk/body content in snippet (no backend truncation);
  CLI/MCP now receive the complete matched text
- SearchView: truncate snippet display at 600 chars with ellipsis so long
  notes don't overwhelm the results list

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
BM25 and vector modes were using searchLex/searchVector which return the
full document body. Switching to store.search() with pre-built lex/vec
queries gives the same search behavior but returns bestChunk — the
specific matching section — consistent with how vector search actually
works (chunk-level matching, not document-level).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
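Merging BM25 and vector rankings without an LLM reranker can be done with a standard rank-based fusion such as reciprocal rank fusion (RRF). This is a hedged sketch of that general technique, not the qmd/store.search() internals:

```typescript
// Hypothetical sketch: fuse two ranked ID lists (BM25 and vector) with
// reciprocal rank fusion. Each list contributes 1 / (k + rank + 1) to a
// document's score; documents ranked well in both lists rise to the top.
function rrfFuse(bm25: string[], vec: string[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of [bm25, vec]) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

Because fusion uses only ranks, it needs no model inference at query time, which is what lets the LLM-free hybrid path stay in the millisecond range.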
@antoninbas antoninbas merged commit 3f8dbf8 into main Apr 18, 2026
6 checks passed
@antoninbas antoninbas deleted the fix/search-performance branch April 18, 2026 04:20