Address search audit findings (P0 + P1 + P2) #51
Merged
antoninbas merged 8 commits into main on Apr 19, 2026
newestMdMtime only stats existing .md files, so deleting a file outside the API (e.g. via rm, git pull) doesn't bump any remaining file's mtime and the cached lastIndexedAt never gets invalidated.
Include directory mtimes in the scan: a directory's mtime bumps on any add/remove/rename, so max(file mtimes, dir mtimes) advances on deletes and catches the case.
Covers P0-2 from SEARCH_AUDIT.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
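The freshness scan described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `newestTreeMtime` is a hypothetical name, and the demo backdates timestamps in a throwaway temp directory so the effect of an out-of-band delete is unambiguous.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: newest mtime (in ms) among .md files AND directories
// under `root`. Counting directories is what makes deletions visible.
function newestTreeMtime(root: string): number {
  let newest = fs.statSync(root).mtimeMs; // the directory itself counts
  for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
    const full = path.join(root, entry.name);
    if (entry.isDirectory()) {
      newest = Math.max(newest, newestTreeMtime(full));
    } else if (entry.isFile() && entry.name.endsWith(".md")) {
      newest = Math.max(newest, fs.statSync(full).mtimeMs);
    }
  }
  return newest;
}

// Demo in a temp directory: backdate the file and the directory, then
// delete the file the way `rm` or `git pull` would.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), "mtime-demo-"));
const demoFile = path.join(demoRoot, "a.md");
fs.writeFileSync(demoFile, "hello");
const past = new Date("2020-01-01T00:00:00Z");
fs.utimesSync(demoFile, past, past);
fs.utimesSync(demoRoot, past, past);

const before = newestTreeMtime(demoRoot);
fs.rmSync(demoFile); // out-of-band delete: no remaining file mtime changes
const after = newestTreeMtime(demoRoot);

console.log(after > before); // the directory mtime advanced on the delete
fs.rmSync(demoRoot, { recursive: true });
```

A scan over file mtimes alone would have nothing left to stat after the delete; the directory's own mtime is what moves.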
qmd extracts titles from the first Markdown heading in the body. Knotes
puts the canonical title in YAML frontmatter and emits no heading, so
qmd's extraction yields the filename for notes and the latest entry's
"## <timestamp> {#e-...}" line for logs — garbage in both cases.
Parse the frontmatter from the body qmd already returns in each result
and use that as the title. This fixes titles for every existing note
without touching their contents or requiring a migration.
Covers P0-3 from SEARCH_AUDIT.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
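A rough sketch of the frontmatter title lookup, assuming the body starts with a `---` delimited block and the title sits on a single `title:` line. `titleFromFrontmatter` is a hypothetical name, and this regex-based version is not a full YAML parser (no multi-line scalars, no nested keys); the real change may well use a proper parser.

```typescript
// Hypothetical helper: pull `title` out of a leading YAML frontmatter block.
// Returns undefined when there is no frontmatter or no usable title.
function titleFromFrontmatter(body: string): string | undefined {
  const block = /^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)/.exec(body);
  if (block === null) return undefined;
  for (const line of (block[1] ?? "").split(/\r?\n/)) {
    const field = /^title:\s*(.*)$/.exec(line);
    if (field !== null) {
      const raw = (field[1] ?? "").trim();
      // Drop one pair of matching surrounding quotes, if present.
      const unquoted = raw.replace(/^(["'])(.*)\1$/, "$2");
      return unquoted.length > 0 ? unquoted : undefined;
    }
  }
  return undefined;
}

const noteBody = "---\ntitle: Weekly planning\ntags: [work]\n---\nBody text";
const logBody = "## 2026-04-19 {#e-1234}\nNo frontmatter here";

console.log(titleFromFrontmatter(noteBody));
console.log(titleFromFrontmatter(logBody));
```

Callers would fall back to whatever title qmd extracted when this returns `undefined`.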
qmd applies handelize() to every indexed path (lowercases + replaces non-alphanumeric runs with dashes), so search results for a file named "Mixed Case With Spaces.md" come back as "notes/odd/mixed-case-with-spaces". The previous getNote fallback only tried a case-insensitive match, so these results were unreachable.
Extend the lookup to walk each path segment, accepting either an exact match or an entry whose handelized form matches the requested segment. Apply the same resolver in updateNote and deleteNote so UI actions on search results work too.
Reimplemented handelize locally (one segment's worth) because qmd does not re-export it from the package root.
Covers P0-1 from SEARCH_AUDIT.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
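The segment-by-segment lookup could look something like the sketch below. Everything here is illustrative: `handelizeSegment` only approximates the behaviour described above (lowercase, collapse non-alphanumeric runs to dashes) and may differ from qmd's real `handelize`, and `resolvePath` takes a hypothetical `listDir` callback instead of touching the filesystem so the example stays self-contained.

```typescript
// Assumed approximation of qmd's handelize for a single path segment.
function handelizeSegment(segment: string): string {
  return segment
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Strip a trailing ".md" so "Foo Bar.md" can match the extension-less result path.
function stripMd(name: string): string {
  return name.endsWith(".md") ? name.slice(0, -3) : name;
}

type ListDir = (dir: string) => string[];

// Walk the requested path one segment at a time. Prefer an exact match,
// then fall back to an entry whose handelized form equals the segment.
function resolvePath(requested: string, listDir: ListDir): string | undefined {
  const resolved: string[] = [];
  for (const segment of requested.split("/")) {
    const entries = listDir(resolved.join("/"));
    const match =
      entries.find((e) => e === segment || stripMd(e) === segment) ??
      entries.find((e) => handelizeSegment(stripMd(e)) === segment);
    if (match === undefined) return undefined;
    resolved.push(match);
  }
  return resolved.join("/");
}

// In-memory directory tree standing in for the notes folder.
const tree: Record<string, string[]> = {
  "": ["notes"],
  "notes": ["odd"],
  "notes/odd": ["Mixed Case With Spaces.md"],
};
const listTree: ListDir = (dir) => tree[dir] ?? [];

console.log(resolvePath("notes/odd/mixed-case-with-spaces", listTree));
```

Two entries in the same directory can handelize to the same string; this sketch simply takes the first, which is a design decision the real resolver would need to make deliberately.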
The collections option on search() was declared but never forwarded to qmd, and none of the web API, CLI, or MCP tool exposed it, so callers could pass collections=notes and still silently get results from both collections.
Plumb the parameter through store.search() and every surface (HTTP query string, CLI -c/--collection flag, MCP tool input).
Covers P0-4 from SEARCH_AUDIT.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
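As an illustration of the plumbing, here is a sketch of the HTTP leg only. The names (`parseCollections`, `searchNotes`, the `Backend` type) are hypothetical, and accepting both repeated and comma-separated `collections` values is an assumption; the source only shows `collections=notes`.

```typescript
interface SearchOptions {
  query: string;
  collections?: string[];
}

// Stand-in for the search backend; the real call goes to qmd.
type Backend = (opts: SearchOptions) => string[];

// Read `collections` from a query string. Assumed format: repeated params
// and/or comma-separated values. Returns undefined when nothing was given.
function parseCollections(params: URLSearchParams): string[] | undefined {
  const values = params
    .getAll("collections")
    .flatMap((v) => v.split(","))
    .map((v) => v.trim())
    .filter((v) => v.length > 0);
  return values.length > 0 ? values : undefined;
}

// The fix in miniature: forward `collections` instead of dropping it.
function searchNotes(backend: Backend, opts: SearchOptions): string[] {
  return backend({ query: opts.query, collections: opts.collections });
}

// Fake backend that records what it received.
let received: SearchOptions | undefined;
const fakeBackend: Backend = (opts) => {
  received = opts;
  return [];
};

const params = new URLSearchParams("q=meeting&collections=notes");
searchNotes(fakeBackend, {
  query: params.get("q") ?? "",
  collections: parseCollections(params),
});
console.log(received);
```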
bm25 and vector modes now call store.searchLex / store.searchVector directly, so results carry the raw backend relevance score in [0, 1) instead of qmd's position-based 1/(rank+1) value. hybrid mode still goes through store.search, but now passes `explain: true` so we can surface the real fused RRF score from `explain.rrf.totalScore` rather than the overwritten 1/rank value.
Snippets now go through qmd's extractSnippet so callers get a concise, query-focused excerpt, and YAML frontmatter is stripped first so "title: ..." / "tags: [...]" preamble can't leak into snippets for results where qmd returns the full body.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
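To make the difference between the two kinds of score concrete, the sketch below contrasts a purely position-based value with a generic reciprocal rank fusion. This is textbook RRF, not qmd's implementation: the constant `k = 60` is a conventional default chosen here as an assumption, and qmd's actual fusion may weight or normalise its inputs differently.

```typescript
// Position-based score: depends only on where the result landed in the list.
function positionScore(rank: number): number {
  return 1 / (rank + 1); // rank is 0-based
}

// Generic reciprocal rank fusion over several ranked lists of document ids.
// Each list contributes 1 / (k + position) with a 1-based position.
function reciprocalRankFusion(rankings: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return scores;
}

// "a" tops both lists; "b" and "c" each appear once in second place.
const fused = reciprocalRankFusion([
  ["a", "b"],
  ["a", "c"],
]);

console.log(fused.get("a"), fused.get("b"), fused.get("c"));
console.log(positionScore(0), positionScore(1));
```

The fused value reflects how many lists agreed on a document, which a bare `1/(rank+1)` discards.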
embed jobs are recorded as "embed:on-demand" or "embed:background",
but getLastJob("embed") was doing exact-match so the web API's
/embed/status endpoint always returned lastJob: null. Match either
exact type or "type:suffix" variants so callers get the latest job
regardless of the trigger suffix.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
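The matching rule amounts to "exact type, or the type followed by a colon". A small sketch, with a hypothetical `JobRecord` shape and helper names (the real `getLastJob` presumably queries a store rather than an array):

```typescript
// Hypothetical job record; field names are illustrative only.
interface JobRecord {
  type: string;
  finishedAt: number;
}

// True for "embed" itself and for suffixed variants such as "embed:on-demand",
// but not for unrelated types that merely share a prefix ("embedding").
function matchesJobType(recorded: string, requested: string): boolean {
  return recorded === requested || recorded.startsWith(requested + ":");
}

// Latest matching job, or null when there is none.
function lastJobOfType(jobs: JobRecord[], type: string): JobRecord | null {
  let latest: JobRecord | null = null;
  for (const job of jobs) {
    if (!matchesJobType(job.type, type)) continue;
    if (latest === null || job.finishedAt > latest.finishedAt) latest = job;
  }
  return latest;
}

const jobs: JobRecord[] = [
  { type: "embed:on-demand", finishedAt: 1 },
  { type: "embed:background", finishedAt: 2 },
  { type: "embedding", finishedAt: 3 },
  { type: "index", finishedAt: 4 },
];

console.log(lastJobOfType(jobs, "embed"));
```

Requiring the colon keeps the match from widening into a plain prefix test.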
qmd's store.update() only accepts { collections, onProgress } — passing
force was a no-op that misleadingly suggested the CLI/API could rebuild
the index from scratch. qmd's reindex already hashes each file and
skips unchanged ones, so plumbing force through every layer was purely
vestigial.
Remove the option from the core function, router, client, CLI flag,
MCP tool, and HTTP API schema. The existing api test that posted
force: true is updated to send an empty body; the "400 for non-boolean
force" test for /search/index is removed (the /search/embed equivalent
stays since qmd's embed() does accept force).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
P2-3: lastIndexedAt was being set to Date.now() *after* updateIndex returned, which opened a tiny race: a file written during the reindex would have an mtime earlier than the recorded timestamp, and the next search would think nothing had changed. Capture the timestamp before the update instead.
P2-4: add a 10ms TTL cache around newestMdMtime so a burst of rapid searches doesn't stat every file on disk for each query. TTL is intentionally tiny so deletions still get picked up quickly.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
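Both changes can be illustrated with an injectable clock, which keeps the example deterministic. `withTtl` and `reindexAndStamp` are hypothetical names, and the update callback is synchronous here purely for brevity; the real updateIndex is presumably asynchronous.

```typescript
// Wrap an expensive computation in a small time-to-live cache.
function withTtl<T>(
  compute: () => T,
  ttlMs: number,
  now: () => number = Date.now,
): () => T {
  let cached: { value: T; at: number } | undefined;
  return () => {
    const t = now();
    if (cached === undefined || t - cached.at >= ttlMs) {
      cached = { value: compute(), at: t };
    }
    return cached.value;
  };
}

// Record the timestamp BEFORE running the update, so anything written
// while the update runs still looks newer than the recorded value.
function reindexAndStamp(update: () => void, now: () => number = Date.now): number {
  const startedAt = now();
  update();
  return startedAt;
}

// Fake clock shared by both demos.
let fakeNow = 0;

// TTL demo: two calls inside the 10ms window trigger one tree walk.
let walks = 0;
const cachedWalk = withTtl(() => ++walks, 10, () => fakeNow);
cachedWalk();
cachedWalk();
fakeNow = 10; // window expired
cachedWalk();

// Race demo: a file is written at t=15 while the reindex runs from t=10 to t=20.
let fileMtime = -1;
const stamp = reindexAndStamp(() => {
  fileMtime = 15;
  fakeNow = 20;
}, () => fakeNow);

console.log(walks, stamp, fileMtime > stamp);
```

Stamping after the update would have recorded 20, and the file written at 15 would never trigger a reindex.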
Summary
Implements the fixes identified in the search audit. Eight commits, each covering one audit finding or a small group of related findings, so each is independently reviewable and revertable.
Blockers (P0)
3a0fb6a P0-2: detect out-of-band file deletions by including directory mtimes in the freshness check
86e4ddb P0-3: resolve result titles from YAML frontmatter (no more "logs/…#e-xxxx" titles)
171ba72 P0-1: result paths with spaces/punctuation are now reachable via a handelize-aware path resolver
0e47fc5 P0-4: wire the `collections` filter through core/router/client/HTTP/MCP/CLI
Quality (P1)
adcefa2 P1-1/P1-2/P1-4: bm25/vector modes use `searchLex`/`searchVector` directly for raw relevance scores; hybrid uses `explain: true` to surface the real RRF score instead of qmd's `1/(rank+1)`; snippets go through `extractSnippet` with frontmatter stripped first
Hygiene (P2)
5b17b29 P2-2: `getLastJob("embed")` now matches `embed:on-demand`/`embed:background`
1456669 P2-1: drop the no-op `force` plumbing on `updateIndex` (qmd's `store.update` never accepted it)
fa53761 P2-3/P2-4: record `lastIndexedAt` before the reindex (fix tiny race), 10ms TTL cache around the tree-walk
Test plan
`npx vitest run`: 144/144 pass
`npx tsc --noEmit`: clean