Address search audit findings (P0 + P1 + P2) #51

Merged
antoninbas merged 8 commits into main from search-audit-fixes on Apr 19, 2026

@antoninbas
Owner

Summary

Implements the fixes identified in the search audit. Eight commits, each covering one audit finding or a small group of related findings, so each is independently reviewable and revertable.

Blockers (P0)

  • 3a0fb6a P0-2 — detect out-of-band file deletions by including directory mtimes in the freshness check
  • 86e4ddb P0-3 — resolve result titles from YAML frontmatter (no more "logs/…#e-xxxx" titles)
  • 171ba72 P0-1 — result paths with spaces/punctuation are now reachable via a handelize-aware path resolver
  • 0e47fc5 P0-4 — wire the collections filter through core/router/client/HTTP/MCP/CLI

Quality (P1)

  • adcefa2 P1-1/P1-2/P1-4 — bm25/vector modes use searchLex/searchVector directly for raw relevance scores; hybrid uses explain: true to surface the real RRF score instead of qmd's 1/(rank+1); snippets go through extractSnippet with frontmatter stripped first

Hygiene (P2)

  • 5b17b29 P2-2 — getLastJob("embed") now matches embed:on-demand / embed:background
  • 1456669 P2-1 — drop the no-op force plumbing on updateIndex (qmd's store.update never accepted it)
  • fa53761 P2-3/P2-4 — record lastIndexedAt before the reindex (fix tiny race), 10ms TTL cache around the tree-walk

Test plan

  • npx vitest run — 144/144 pass
  • npx tsc --noEmit clean
  • Verify CI is green before merging

antoninbas and others added 8 commits April 18, 2026 16:09
newestMdMtime only stats existing .md files, so deleting a file outside
the API (e.g. via rm, git pull) doesn't bump any remaining file's mtime
and the cached lastIndexedAt never gets invalidated. Include directory
mtimes in the scan: directory mtime bumps on any add/remove/rename, so
max(file mtimes, dir mtimes) advances on deletes and catches the case.

Covers P0-2 from SEARCH_AUDIT.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
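
To make the failure mode above concrete, here is a minimal sketch of the two freshness scans. It runs over a hypothetical in-memory `TreeNode` tree rather than the real filesystem, and the function names are illustrative, not the ones in the commit.

```typescript
// Hypothetical in-memory stand-in for a directory tree; a real scan would stat the disk.
interface TreeNode {
  name: string;
  mtimeMs: number;
  children?: TreeNode[]; // present only for directories
}

// Newest mtime across .md files only (the pre-fix behaviour described above).
function newestFileMtime(node: TreeNode): number {
  if (!node.children) return node.name.endsWith(".md") ? node.mtimeMs : 0;
  return node.children.reduce((max, c) => Math.max(max, newestFileMtime(c)), 0);
}

// Newest mtime across .md files and every directory, so deletions become visible.
function newestMtimeWithDirs(node: TreeNode): number {
  if (!node.children) return node.name.endsWith(".md") ? node.mtimeMs : 0;
  return node.children.reduce(
    (max, c) => Math.max(max, newestMtimeWithDirs(c)),
    node.mtimeMs,
  );
}

function isStale(newest: number, lastIndexedAt: number): boolean {
  return newest > lastIndexedAt;
}

// Index built at t=200; a sibling file is then removed at t=300,
// which bumps only the directory's mtime.
const afterDelete: TreeNode = {
  name: "notes",
  mtimeMs: 300,
  children: [{ name: "a.md", mtimeMs: 100 }],
};
const lastIndexedAt = 200;

console.log(isStale(newestFileMtime(afterDelete), lastIndexedAt)); // false: delete missed
console.log(isStale(newestMtimeWithDirs(afterDelete), lastIndexedAt)); // true: delete caught
```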
qmd extracts titles from the first Markdown heading in the body. Knotes
puts the canonical title in YAML frontmatter and emits no heading, so
qmd's extraction yields the filename for notes and the latest entry's
"## <timestamp> {#e-...}" line for logs — garbage in both cases.

Parse the frontmatter from the body qmd already returns in each result
and use that as the title. This fixes titles for every existing note
without touching their contents or requiring a migration.

Covers P0-3 from SEARCH_AUDIT.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
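
A rough sketch of the title lookup described above, assuming the frontmatter is a leading `---` fenced block with a top-level `title:` scalar. The helper names and the sample body are made up for illustration, and the regex is not a full YAML parser.

```typescript
// Splits a leading "---" fenced frontmatter block from the rest of the document.
function splitFrontmatter(body: string): { frontmatter: string; content: string } {
  const match = /^---\r?\n([\s\S]*?)\r?\n---[ \t]*(?:\r?\n|$)/.exec(body);
  if (!match) return { frontmatter: "", content: body };
  return { frontmatter: match[1] ?? "", content: body.slice(match[0].length) };
}

// Reads a top-level `title:` scalar from the frontmatter; falls back when absent.
function titleFromFrontmatter(body: string, fallback: string): string {
  const { frontmatter } = splitFrontmatter(body);
  const line = /^title:[ \t]*(.*)$/m.exec(frontmatter);
  if (!line) return fallback;
  const raw = (line[1] ?? "").trim();
  // Drop one pair of matching surrounding quotes, if present.
  const unquoted = raw.replace(/^(["'])(.*)\1$/, "$2");
  return unquoted.length > 0 ? unquoted : fallback;
}

// Made-up log body: frontmatter carries the title, the body starts with an entry heading.
const logBody = [
  "---",
  'title: "Deploy log"',
  "tags: [ops]",
  "---",
  "## 2026-04-18 {#e-1a2b}",
  "Rolled out the new build.",
].join("\n");

console.log(titleFromFrontmatter(logBody, "logs/deploy")); // Deploy log
```

The same `splitFrontmatter` split also yields a frontmatter-free `content` string, which is the shape of input a snippet extractor would want.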
qmd applies handelize() to every indexed path (lowercases + replaces
non-alphanumeric runs with dashes), so search results for a file named
"Mixed Case With Spaces.md" come back as
"notes/odd/mixed-case-with-spaces". The previous getNote fallback only
tried a case-insensitive match, so these results were unreachable.

Extend the lookup to walk each path segment, accepting either an exact
match or an entry whose handelized form matches the requested segment.
Apply the same resolver in updateNote and deleteNote so UI actions on
search results work too.

Reimplemented handelize locally (one segment's worth) because qmd does
not re-export it from the package root.

Covers P0-1 from SEARCH_AUDIT.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
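
The segment-by-segment lookup can be sketched as follows. `handelizeSegment` is an approximation built from the description above (drop the `.md` extension, lowercase, collapse non-alphanumeric runs to dashes), so qmd's real handelize may differ in details. The `Listing` map is a hypothetical stand-in for reading directories from disk.

```typescript
// Approximation of the normalisation described above; not qmd's actual handelize.
function handelizeSegment(segment: string): string {
  return segment
    .replace(/\.md$/i, "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Hypothetical directory listing: maps a directory path ("" for the root) to its entry names.
type Listing = Record<string, string[]>;

// Walks the requested path one segment at a time, preferring an exact match and
// falling back to an entry whose handelized form equals the requested segment.
function resolvePath(requested: string, listing: Listing): string | null {
  let current = "";
  for (const segment of requested.split("/")) {
    const entries = listing[current] ?? [];
    const hit =
      entries.find((e) => e === segment) ??
      entries.find((e) => handelizeSegment(e) === segment);
    if (hit === undefined) return null;
    current = current === "" ? hit : `${current}/${hit}`;
  }
  return current;
}

const listing: Listing = {
  "": ["notes"],
  notes: ["odd"],
  "notes/odd": ["Mixed Case With Spaces.md"],
};

console.log(resolvePath("notes/odd/mixed-case-with-spaces", listing));
// notes/odd/Mixed Case With Spaces.md
```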
The collections option on search() was declared but never forwarded to
qmd, and none of the web API, CLI, or MCP tool exposed it, so callers
could pass collections=notes and silently get results from every
collection. Plumb the parameter through store.search() and every
surface (HTTP query string, CLI -c/--collection flag, MCP tool input).

Covers P0-4 from SEARCH_AUDIT.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
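
The "declared but never forwarded" bug is easy to reproduce in miniature. In the sketch below, `backendSearch` is a hypothetical stand-in for the search backend (it is not qmd's API), and the two wrappers show the pre-fix and post-fix shapes.

```typescript
interface SearchOptions {
  collections?: string[];
}

interface Doc {
  collection: string;
  text: string;
}

// Stand-in backend (hypothetical): filters by collection only when asked to.
function backendSearch(docs: Doc[], query: string, opts: SearchOptions = {}): Doc[] {
  return docs.filter(
    (d) =>
      d.text.includes(query) &&
      (opts.collections === undefined || opts.collections.includes(d.collection)),
  );
}

// Pre-fix shape: the option is accepted but dropped before it reaches the backend.
function searchDropped(docs: Doc[], query: string, _opts: SearchOptions = {}): Doc[] {
  return backendSearch(docs, query);
}

// Post-fix shape: the option is forwarded unchanged.
function searchForwarded(docs: Doc[], query: string, opts: SearchOptions = {}): Doc[] {
  return backendSearch(docs, query, opts);
}

const docs: Doc[] = [
  { collection: "notes", text: "deploy checklist" },
  { collection: "logs", text: "deploy finished" },
];

console.log(searchDropped(docs, "deploy", { collections: ["notes"] }).length); // 2
console.log(searchForwarded(docs, "deploy", { collections: ["notes"] }).length); // 1
```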
bm25 and vector modes now call store.searchLex / store.searchVector
directly, so results carry the raw backend relevance score in [0, 1)
instead of qmd's position-based 1/(rank+1) value.

hybrid mode still goes through store.search, but now passes
`explain: true` so we can surface the real fused RRF score from
`explain.rrf.totalScore` rather than the overwritten 1/(rank+1) value.

Snippets now go through qmd's extractSnippet so callers get a concise,
query-focused excerpt, and YAML frontmatter is stripped first so
"title: ..." / "tags: [...]" preamble can't leak into snippets for
results where qmd returns the full body.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
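
For readers unfamiliar with the distinction, this sketch contrasts a fused reciprocal-rank-fusion (RRF) score with a position-only score. It uses the textbook RRF formula with the conventional k = 60, which is an assumption here; qmd's constants and weighting are not stated in this PR and may differ.

```typescript
// Reciprocal rank fusion over ranked id lists: score(d) = sum over lists of 1 / (k + rank).
// k = 60 is the conventional default and is an assumption in this sketch.
function rrfScores(rankings: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // 1-based rank within this list
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return scores;
}

// Position-only score: depends solely on the final position in the result list.
function positionScore(zeroBasedRank: number): number {
  return 1 / (zeroBasedRank + 1);
}

const lexical = ["a", "b", "c"];
const vector = ["c", "a"];

// Sort ids by fused score, highest first.
const fused = Array.from(rrfScores([lexical, vector]).entries()).sort(
  (x, y) => y[1] - x[1],
);

console.log(fused.map(([id]) => id).join(",")); // a,c,b
console.log(positionScore(0), positionScore(1)); // 1 0.5
```

The position-only score is identical for every query (the top hit always scores 1), whereas the fused score reflects how many lists ranked a document and how highly.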
embed jobs are recorded as "embed:on-demand" or "embed:background",
but getLastJob("embed") matched the type exactly, so the web API's
/embed/status endpoint always returned lastJob: null. Match either the
exact type or a "type:suffix" variant so callers get the latest job
regardless of the trigger suffix.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
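
The matching rule above amounts to "exact type, or type followed by a colon". A minimal sketch, assuming a hypothetical `Job` record with a `finishedAt` timestamp (the real job schema is not shown in this PR):

```typescript
// Hypothetical job record; the field names are assumptions for illustration.
interface Job {
  type: string;
  finishedAt: number;
}

// True for an exact type match or a "type:suffix" variant such as "embed:on-demand".
function matchesJobType(recorded: string, requested: string): boolean {
  return recorded === requested || recorded.startsWith(`${requested}:`);
}

// Returns the most recently finished matching job, or null if none match.
function lastJobOfType(jobs: Job[], requested: string): Job | null {
  let latest: Job | null = null;
  for (const job of jobs) {
    if (!matchesJobType(job.type, requested)) continue;
    if (latest === null || job.finishedAt > latest.finishedAt) latest = job;
  }
  return latest;
}

const jobs: Job[] = [
  { type: "embed:on-demand", finishedAt: 1 },
  { type: "embed:background", finishedAt: 2 },
  { type: "embedding", finishedAt: 3 }, // shares a prefix but is not a suffix variant
  { type: "index", finishedAt: 4 },
];

console.log(lastJobOfType(jobs, "embed")?.type); // embed:background
```

Requiring the colon keeps unrelated types that merely share a prefix from matching.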
qmd's store.update() only accepts { collections, onProgress } — passing
force was a no-op that misleadingly suggested the CLI/API could rebuild
the index from scratch. qmd's reindex already hashes each file and
skips unchanged ones, so plumbing force through every layer was purely
vestigial.

Remove the option from the core function, router, client, CLI flag,
MCP tool, and HTTP API schema. The existing api test that posted
force: true is updated to send an empty body; the "400 for non-boolean
force" test for /search/index is removed (the /search/embed equivalent
stays since qmd's embed() does accept force).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
P2-3: lastIndexedAt was being set to Date.now() *after* updateIndex
returned, which opened a tiny race: a file written during the reindex
would have an mtime earlier than the recorded timestamp, and the next
search would think nothing had changed. Capture the timestamp before
the update instead.

P2-4: add a 10ms TTL cache around newestMdMtime so a burst of rapid
searches doesn't stat every file on disk for each query. TTL is
intentionally tiny so deletions still get picked up quickly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
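
Both changes can be sketched with an injectable clock so the timing is easy to follow. `withTtl` and `reindexAndStamp` are hypothetical names, not the functions in the commit, and the demo uses a fake clock instead of real delays.

```typescript
// Wraps an expensive zero-argument computation in a time-based cache.
// `now` is injectable so the behaviour can be exercised without real delays.
function withTtl<T>(compute: () => T, ttlMs: number, now: () => number = Date.now): () => T {
  let cached: { value: T; at: number } | null = null;
  return () => {
    const t = now();
    if (cached !== null && t - cached.at < ttlMs) return cached.value;
    const value = compute();
    cached = { value, at: t };
    return value;
  };
}

// Captures the timestamp before the (possibly slow) reindex runs, so a file
// written mid-reindex has an mtime later than the recorded value and still
// triggers the next freshness check.
async function reindexAndStamp(
  reindex: () => Promise<void>,
  now: () => number = Date.now,
): Promise<number> {
  const startedAt = now();
  await reindex();
  return startedAt;
}

// Fake clock and a counter standing in for the tree walk.
let clock = 0;
let scans = 0;
const newest = withTtl(() => {
  scans += 1;
  return 123;
}, 10, () => clock);

newest();
newest(); // within the TTL: served from cache
clock = 10;
newest(); // TTL elapsed: recomputed
console.log(scans); // 2
```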
@antoninbas antoninbas merged commit c66ebd9 into main Apr 19, 2026
6 checks passed
@antoninbas antoninbas deleted the search-audit-fixes branch April 19, 2026 03:54