
fix(search): scope local-search graph expansion to top-hit docs (RAN-35) #80

Open
aksOps wants to merge 1 commit into main from fix/local-search-doc-scope-ran35


aksOps commented Apr 24, 2026

Summary

  • search.LocalSearch narrowed the initial entity set to top-hit docs but then called RelationshipsForEntity with no doc filter, so BFS expansion could leak off-scope entities/edges into a scoped result. Fixes RAN-35.
  • Adds Store.RelationshipsForEntityInDocs(entityID, depth, docIDs) — BFS that constrains every hop to doc_id IN (...) and dedups across hops. LocalSearch now uses it so the expansion stays inside the top-hit doc set.
  • REST (/search/local) and MCP (docsiq_search_local) both route through search.LocalSearch, so behaviour stays aligned after the fix. Entity-focused endpoints that take an entity ID (not a doc set) keep the unscoped walk.
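
For readers who want the shape of the scoped walk without opening the diff, here is a minimal in-memory sketch. It is not the code in this PR: the `Relationship` struct, its field names, and the `relationshipsInDocs` helper are hypothetical, and the real method queries the store rather than a slice. It only illustrates the three properties described above: every hop is filtered by doc ID, each edge is reported once, and an empty doc set yields no expansion.

```go
package main

import "fmt"

// Relationship is a hypothetical edge record used only for this sketch;
// the real schema in the store may differ.
type Relationship struct {
	ID       int
	SourceID string
	TargetID string
	DocID    string
}

// relationshipsInDocs walks breadth-first from seed for at most depth hops,
// following only edges whose DocID is in docIDs. Each edge is returned once,
// even if more than one hop touches it.
func relationshipsInDocs(edges []Relationship, seed string, depth int, docIDs []string) []Relationship {
	if len(docIDs) == 0 || depth <= 0 {
		return nil // empty scope: no expansion at all
	}
	scope := make(map[string]bool, len(docIDs))
	for _, d := range docIDs {
		scope[d] = true
	}
	visited := map[string]bool{seed: true}
	seenEdge := map[int]bool{}
	frontier := []string{seed}
	var out []Relationship

	for hop := 0; hop < depth && len(frontier) > 0; hop++ {
		inFrontier := make(map[string]bool, len(frontier))
		for _, id := range frontier {
			inFrontier[id] = true
		}
		var next []string
		for _, e := range edges {
			// The doc filter applies on every hop, not just the first.
			if !scope[e.DocID] || seenEdge[e.ID] {
				continue
			}
			if !inFrontier[e.SourceID] && !inFrontier[e.TargetID] {
				continue
			}
			seenEdge[e.ID] = true
			out = append(out, e)
			for _, n := range []string{e.SourceID, e.TargetID} {
				if !visited[n] {
					visited[n] = true
					next = append(next, n)
				}
			}
		}
		frontier = next
	}
	return out
}

func main() {
	edges := []Relationship{
		{ID: 1, SourceID: "a", TargetID: "b", DocID: "doc1"},
		{ID: 2, SourceID: "b", TargetID: "c", DocID: "doc1"},
		{ID: 3, SourceID: "a", TargetID: "x", DocID: "doc2"}, // out of scope
	}
	for _, e := range relationshipsInDocs(edges, "a", 2, []string{"doc1"}) {
		fmt.Println(e.ID, e.SourceID, e.TargetID, e.DocID)
	}
}
```

In this fixture the `a`→`x` edge lives in `doc2`, so a walk scoped to `doc1` never reaches `x`, which is the leak the unscoped walk allowed.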

Regression coverage

  • internal/store/relationships_for_entity_in_docs_test.go
    • TestRelationshipsForEntityInDocs_OnlyReturnsEdgesFromScopedDocs — proves a relationship from an unrelated document cannot appear in a scoped BFS result (directly exercises the leak described in RAN-35).
    • TestRelationshipsForEntityInDocs_EmptyDocsReturnsNil — empty doc set → no expansion.
    • TestRelationshipsForEntityInDocs_RespectsDepthLimit — depth=1 from a seed only returns the direct edge.
  • internal/search/local_scope_test.go
    • TestLocalSearch_GraphExpansionScopedToTopHitDocs — end-to-end: top chunk's doc set scopes the graph walk; a relationship a seed entity participates in via an unrelated doc does not leak through.

Test plan

  • CGO_ENABLED=1 go build -tags sqlite_fts5 ./...
  • CGO_ENABLED=1 go vet -tags sqlite_fts5 ./...
  • CGO_ENABLED=1 go test -tags sqlite_fts5 -timeout 300s ./... — 662 passed
  • New leak regression tests fail on main (verified by asserting that unscoped RelationshipsForEntity still surfaces the out-of-scope edge in the fixture)

`search.LocalSearch` narrowed the initial entity set to the top-hit
documents, but its BFS walk called `RelationshipsForEntity` without any
doc filter. That re-expanded through unrelated relationships anywhere
in the project, so a scoped query could leak off-scope entities and
edges into the result set.

Add `Store.RelationshipsForEntityInDocs(entityID, depth, docIDs)` — a
BFS variant that constrains every hop to `doc_id IN (...)` and dedups
relationships across hops. Use it from `LocalSearch` so the expansion
stays inside the top-hit doc set.
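
As a rough illustration of what one scoped hop could look like at the SQL level, the sketch below builds a query with the three `IN (...)` lists. The table name `relationships`, the selected columns, and the helper names are assumptions made for this example, not taken from the diff.

```go
package main

import (
	"fmt"
	"strings"
)

// placeholders returns n comma-separated "?" markers, e.g. "?,?,?" for n=3.
func placeholders(n int) string {
	if n <= 0 {
		return ""
	}
	return strings.TrimSuffix(strings.Repeat("?,", n), ",")
}

// buildScopedHopQuery builds the query for a single BFS hop. The frontier is
// bound twice (once for source_id, once for target_id) and the doc IDs once.
// Table and column list are hypothetical.
func buildScopedHopQuery(frontier, docIDs []string) (string, []any) {
	f := placeholders(len(frontier))
	d := placeholders(len(docIDs))
	query := fmt.Sprintf(
		"SELECT id, source_id, target_id, doc_id FROM relationships "+
			"WHERE (source_id IN (%s) OR target_id IN (%s)) AND doc_id IN (%s)",
		f, f, d)

	args := make([]any, 0, 2*len(frontier)+len(docIDs))
	for _, id := range frontier {
		args = append(args, id)
	}
	for _, id := range frontier {
		args = append(args, id)
	}
	for _, id := range docIDs {
		args = append(args, id)
	}
	return query, args
}

func main() {
	q, args := buildScopedHopQuery([]string{"e1", "e2"}, []string{"doc1"})
	fmt.Println(q)
	fmt.Println(len(args), "bound parameters")
}
```

Note that the number of bound parameters per query is twice the frontier size plus the number of doc IDs, which matters for any per-statement parameter limit.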

Both the REST search path (`/search/local`) and the MCP
`docsiq_search_local` tool go through `search.LocalSearch`, so the
REST/MCP behaviour stays aligned after the fix. Entity-focused paths
(`/entities/:id`, MCP entity-graph tools) intentionally keep the
unscoped walk — they take an entity as input, not a document set.

Regression coverage:
- store: `TestRelationshipsForEntityInDocs_OnlyReturnsEdgesFromScopedDocs`
  proves a relationship from an unrelated document cannot appear in a
  scoped BFS result; depth and empty-input cases covered too.
- search: `TestLocalSearch_GraphExpansionScopedToTopHitDocs` exercises
  the full `LocalSearch` path end-to-end.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
aksOps enabled auto-merge (squash) April 24, 2026 19:53

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7ebcb85fd5


Comment thread internal/store/store.go
Comment on lines +852 to +853
const docChunkSize = 900
const frontierChunkSize = 900

P1: Reduce chunk sizes to honor SQLite bind parameter limit

`RelationshipsForEntityInDocs` chunks both `frontier` and `docIDs` at 900, but each query binds `source_id IN (...)` and `target_id IN (...)` for the same frontier chunk plus `doc_id IN (...)`, i.e. `2*len(fChunk)+len(dChunk)` placeholders. At the configured sizes this reaches 2700 parameters, which exceeds the limit on SQLite builds that keep the common 999-variable cap and causes runtime `too many SQL variables` errors when local search spans larger frontiers or doc sets. In `LocalSearch` those errors are silently skipped (`continue`), so graph expansion can disappear without surfacing an error.
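
The arithmetic behind this suggestion can be checked with a small sketch. The names `maxBindParams`, `paramsPerQuery`, and `chunk` are hypothetical, and the 300/300 split is just one choice that fits under a 999-parameter cap, not a value taken from this PR. The 999 figure is the default `SQLITE_MAX_VARIABLE_NUMBER` for SQLite versions before 3.32.0; newer builds default to a much higher limit, so the right budget depends on the SQLite build in use.

```go
package main

import "fmt"

// maxBindParams is the per-statement cap assumed for this sketch
// (the default in SQLite builds older than 3.32.0).
const maxBindParams = 999

// paramsPerQuery counts placeholders for one hop query: the frontier chunk is
// bound twice (source_id and target_id) and the doc chunk once.
func paramsPerQuery(frontierChunk, docChunk int) int {
	return 2*frontierChunk + docChunk
}

// chunk splits ids into consecutive slices of at most size elements.
func chunk(ids []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var out [][]string
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		out = append(out, ids[start:end])
	}
	return out
}

func main() {
	fmt.Println(paramsPerQuery(900, 900), "params at 900/900")
	fmt.Println(paramsPerQuery(300, 300), "params at 300/300")
	fmt.Println(paramsPerQuery(300, 300) <= maxBindParams)
	fmt.Println(len(chunk(make([]string, 700), 300)), "chunks for 700 ids")
}
```

Any pair of chunk sizes satisfying `2*frontierChunk + docChunk <= maxBindParams` would avoid the error on such builds; smaller chunks trade more queries per hop for a safe parameter count.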
