fix: KG search via pure SQL path — no embedding model required (R69)#191
Root cause: brain_search's KG path required embedding model loading
(~650MB, slow) and wrapped everything in `except Exception` that
silently fell back to text-only search. Users never saw KG results.
Fix: Two-path architecture:
- Path 1 (always runs): _kg_facts_sql() does pure SQL lookup against
kg_relations — no embeddings, no vector search, just SELECT.
Returns typed relations excluding co_occurs_with.
- Path 2 (optional): Full kg_hybrid_search with embedding model.
If it fails, Path 1 results are still shown, with a degradation notice.
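A minimal sketch of what a Path 1 style lookup could look like, using Python's built-in `sqlite3`. This is not the PR's actual `_kg_facts_sql()`: the function name `kg_facts_sql_sketch`, the column names (`source`, `relation_type`, `target`, `confidence`), and the case-insensitive match are assumptions made for illustration. The confidence ordering and the limit of 20 follow the review summaries on this page.

```python
import sqlite3


def kg_facts_sql_sketch(conn, entity, limit=20):
    """Return typed relations for an entity using plain SQL (no embeddings)."""
    # Hypothetical schema: kg_relations(source, relation_type, target, confidence).
    rows = conn.execute(
        """
        SELECT source, relation_type, target, confidence
        FROM kg_relations
        WHERE (LOWER(source) = LOWER(?) OR LOWER(target) = LOWER(?))
          AND relation_type != 'co_occurs_with'
        ORDER BY confidence DESC
        LIMIT ?
        """,
        (entity, entity, limit),
    ).fetchall()
    return [
        {"source": s, "relation": r, "target": t, "confidence": c}
        for s, r, t, c in rows
    ]


def demo_connection():
    """Build an in-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE kg_relations ("
        "source TEXT, relation_type TEXT, target TEXT, confidence REAL)"
    )
    conn.executemany(
        "INSERT INTO kg_relations VALUES (?, ?, ?, ?)",
        [
            ("anthropic", "created", "Claude Code", 0.9),
            ("anthropic", "co_occurs_with", "Claude Code", 0.5),
        ],
    )
    return conn
```

Because the query is a single parameterized SELECT, it has no dependency on model loading and can run on every request.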
Also:
- Replaced broad `except Exception` with specific handlers
(RuntimeError, OSError, MemoryError) + warning-level logging
- Added "KG degraded" notice to MCP response when Path 2 fails
- KG facts always appear when entity is detected, regardless
of embedding model availability
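The control flow described above can be sketched roughly as follows. The function `two_path_kg_search`, its parameters, and the returned dictionary shape are hypothetical and chosen for illustration; only the exception types, the warning-level logging, and the idea of a degraded flag come from the commit message.

```python
import logging

logger = logging.getLogger(__name__)


def two_path_kg_search(query, entities, sql_facts, hybrid_search):
    """Run the SQL path unconditionally, then try the embedding path."""
    # Path 1: always runs, needs no embedding model.
    facts = []
    for entity in entities:
        facts.extend(sql_facts(entity))

    # Path 2: optional; only the named failure types are caught.
    chunks = []
    kg_degraded = False
    try:
        chunks = hybrid_search(query)
    except (RuntimeError, OSError, MemoryError) as exc:
        logger.warning("KG hybrid search failed, using SQL-only facts: %s", exc)
        kg_degraded = True

    return {"facts": facts, "chunks": chunks, "kg_degraded": kg_degraded}


def unavailable_hybrid(query):
    """Stand-in for a hybrid search whose embedding model cannot load."""
    raise RuntimeError("embedding model unavailable")
```

Catching a narrow tuple instead of `Exception` means unexpected errors, such as a `TypeError` from a programming mistake, still surface instead of being hidden behind a silent fallback.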
Eval results (local):
- brain_search('anthropic created claude code') → KG: created→Claude Code ✓
- brain_entity('Etan Heyman') → 9 relations, 0 co_occurs_with ✓
- Entity detection: finds "anthropic" + "Claude Code" ✓
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sequence Diagram(s)
sequenceDiagram
participant Client as Search Request
participant Entity as Entity Detection
participant KGSQL as KG SQL Facts
participant Hybrid as Hybrid Search
participant Struct as Structured Results
participant Resp as Response Builder
Client->>Entity: Detect entities (no active filters)
Entity-->>Client: Entities found
Client->>KGSQL: _kg_facts_sql() for each entity
KGSQL-->>Client: fact_items (20 relations max)
Client->>Hybrid: Attempt kg_hybrid_search
alt Hybrid Success
Hybrid-->>Client: embedding results
else Hybrid Fails
Hybrid-->>Client: Exception caught
end
Client->>Struct: Process chunk results
Struct-->>Client: structured_results
Client->>Resp: Combine fact_items + structured_results
alt Either path succeeded
Resp-->>Client: Return combined results
Resp->>Client: Add kg_degraded flag if hybrid failed
else Both empty
Resp-->>Client: Empty results
end
Summary
Root cause of KG results never appearing in brain_search: the entire KG path required loading bge-large-en-v1.5 (~650MB), and any failure silently fell back to text-only search via `except Exception`.
Fix: Two-path architecture:
- Path 1 (always runs): `_kg_facts_sql()`, a pure SQL lookup against `kg_relations`, no embeddings needed. Returns typed relations excluding `co_occurs_with`.
- Path 2 (optional): `kg_hybrid_search` with vector similarity. If it fails, Path 1 results are still shown with a degradation notice.
Before/After
- `brain_search('anthropic created claude code')` → `anthropic --created--> Claude Code` + text
- `brain_search('who works at Cantaloupe AI')` → `Josh --affiliated_with--> Cantaloupe AI` + text
What changed
- `_kg_facts_sql()`: new pure-SQL KG lookup (no embeddings, always works)
- `except Exception` → specific `RuntimeError, OSError, MemoryError` + `logger.warning`
- Notice shown when Path 2 fails: "⚠ KG search degraded — showing SQL-only results"
Test plan
- `_kg_facts_sql('anthropic')` → `created → Claude Code`
- `_kg_facts_sql('Etan Heyman')` → 9 relations, 0 `co_occurs_with`
🤖 Generated with Claude Code
Note
Fix KG search in `_brain_search` to work without an embedding model via pure SQL fallback
- Adds `_kg_facts_sql` in search_handler.py, a pure SQL helper that looks up KG relations for an entity by name, excluding `co_occurs_with` relations, ordered by confidence (up to 20 results).
- Changes `_brain_search` to always attempt SQL-based fact retrieval first, then attempt `kg_hybrid_search` for chunk retrieval separately.
- If hybrid search fails (`RuntimeError`, `OSError`, `MemoryError`), a `kg_degraded` flag is set and the response still returns SQL facts with a warning notice appended to formatted text.
- `co_occurs_with` relations are excluded.
Macroscope summarized 54a71c1.
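As an illustration of the warning notice being appended to the formatted text, here is a small sketch. The function `format_kg_response` and its arguments are hypothetical. The arrow layout mirrors the example output in the PR summary, and the notice string is quoted from it; how the real handler assembles its response is not shown on this page.

```python
DEGRADED_NOTICE = "⚠ KG search degraded — showing SQL-only results"


def format_kg_response(facts, text_results, kg_degraded=False):
    """Render KG facts, then text results, then an optional degraded notice."""
    # Each fact is assumed to be a (source, relation, target) tuple.
    lines = [f"{source} --{relation}--> {target}" for source, relation, target in facts]
    lines.extend(text_results)
    if kg_degraded:
        # Appended last so SQL facts and text results stay at the top.
        lines.append(DEGRADED_NOTICE)
    return "\n".join(lines)
```

Putting the notice after the results keeps the response usable when the embedding model is missing while still telling the caller that vector-based chunks were skipped.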