feat: compact search default + brain_expand tool + entity dedup #60
Make brain_search return compact results by default (150-char snippets + chunk_id for drill-down), add brain_expand as a first-class MCP tool, deduplicate entity mentions in digest pipeline, and add vector similarity fallback to entity resolution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
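To illustrate the compact default described above, here is a minimal sketch of how a search hit might be reduced to a snippet plus a `chunk_id` pointer. The 150-character limit and the `detail` values come from this PR; the function names `build_compact_result` and `format_result`, the dict shape, and the whitespace collapsing are assumptions made for illustration, not the actual code in `mcp/_shared.py`.

```python
from typing import Any

SNIPPET_CHARS = 150  # snippet length stated in the PR description


def build_compact_result(chunk: dict[str, Any]) -> dict[str, Any]:
    """Reduce a hit to a short snippet plus the chunk_id needed to drill down."""
    # Assumption: whitespace is collapsed so the budget is not spent on newlines.
    flat = " ".join(chunk.get("content", "").split())
    return {"chunk_id": chunk["chunk_id"], "snippet": flat[:SNIPPET_CHARS]}


def format_result(chunk: dict[str, Any], detail: str = "compact") -> dict[str, Any]:
    """Hypothetical dispatcher: compact by default, full content on request."""
    if detail == "full":
        return {"chunk_id": chunk["chunk_id"], "content": chunk.get("content", "")}
    return build_compact_result(chunk)


hit = {"chunk_id": "c-123", "content": "word " * 100}
compact = format_result(hit)
full = format_result(hit, detail="full")
```

A caller that needs more than the snippet would pass the returned `chunk_id` to `brain_expand` rather than re-running the search with `detail="full"`.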
PR #60 added entity dedup for future digests, but existing duplicate entities in the DB were never cleaned up. This script finds duplicates by (lower(name), entity_type), keeps the entity with the most relations and evidence chunks, and merges orphans using merge_entities().

Results on brainlayer.db: 39→37 entities, 2 case-variant duplicates resolved (BrainLayer/brainlayer, Golems/golems). zikaron.db was clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
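The cleanup logic described in this commit can be sketched as a grouping step followed by a keeper selection. This is a sketch under stated assumptions: the `Entity` dataclass, its count fields, and `plan_merges` are hypothetical, and ranking by the sum of relations and evidence chunks is one plausible reading of "most relations and evidence chunks". The real script calls `merge_entities()`, whose signature is not shown in this PR, so the sketch only produces a merge plan.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Entity:
    # Hypothetical in-memory view of an entity row.
    id: str
    name: str
    entity_type: str
    relation_count: int = 0
    chunk_count: int = 0


def plan_merges(entities: list[Entity]) -> list[tuple[str, str]]:
    """Return (keeper_id, duplicate_id) pairs for case-variant duplicates."""
    groups: dict[tuple[str, str], list[Entity]] = defaultdict(list)
    for entity in entities:
        # Same key as the commit message: (lower(name), entity_type).
        groups[(entity.name.lower(), entity.entity_type)].append(entity)

    plan: list[tuple[str, str]] = []
    for members in groups.values():
        if len(members) < 2:
            continue  # no duplicates for this key
        # Assumption: rank by relations + evidence chunks; ties keep the first seen.
        keeper = max(members, key=lambda e: e.relation_count + e.chunk_count)
        plan.extend((keeper.id, dup.id) for dup in members if dup is not keeper)
    return plan


entities = [
    Entity("e1", "BrainLayer", "project", relation_count=5, chunk_count=3),
    Entity("e2", "brainlayer", "project", relation_count=1),
    Entity("e3", "Golems", "project", chunk_count=2),
    Entity("e4", "golems", "project", relation_count=4, chunk_count=1),
    Entity("e5", "Golems", "person"),
]
plan = plan_merges(entities)
```

Each pair in `plan` would then be handed to the merge routine, with the second id folded into the first. Note that `e5` is left alone because its `entity_type` differs.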
- Recent Hardening section traces each claim to a merged PR
- BrainBar build-script guards (#264, #265) called out at the install step
- Phase B preventive infra block (orchestrator#58, #60) connects deploy registry to the BrainBar build-stamp + canonical-build refuse layer
- In-flight PR #251 entry documents NSPanel revival + trigram FTS5 startup-safety guard (10K-chunk threshold) + preserved /tmp/brainbar.sock pub/sub plane

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- `brain_search` now returns 150-char snippets + `chunk_id` pointers instead of full content (~60% token savings). Use `detail="full"` for verbose output.
- `brain_expand` tool: First-class MCP tool to drill into specific search results; takes `chunk_id` and returns full content + surrounding context.
- `_dedup_entities()` deduplicates by `(normalized_name, entity_type)` before resolution, which fixes duplicate entity references when content mentions the same entity multiple times.
- `resolve_entity()` now accepts optional `embed_fn` for cosine similarity matching (threshold 0.92 auto-match, 0.75 fuzzy match) when exact/alias/FTS fails.

Changes
| File | Change |
|---|---|
| `mcp/__init__.py` | `brain_expand` tool (8 tools now), rename `format`→`detail` param, update server instructions |
| `mcp/_shared.py` | `_build_compact_result()` returns `snippet` (150 chars) + `chunk_id` instead of `content` (500 chars) |
| `mcp/search_handler.py` | `detail="compact"`, backward compat for old `format` kwarg |
| `pipeline/batch_extraction.py` | `_dedup_entities()` function, dedup before resolution |
| `pipeline/digest.py` | |
| `pipeline/entity_resolution.py` | `resolve_entity()` accepts `embed_fn`, vector similarity cascade (0.92/0.75 thresholds) |

Test plan
- `tests/test_smart_search_entity_dedup.py`
- `test_phase6_critical.py` and `test_search_routing.py` for new compact format

🤖 Generated with Claude Code
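For reviewers, the vector similarity fallback listed in the Summary can be sketched as a thresholded nearest-neighbour lookup. Only the 0.92 and 0.75 thresholds and the idea of an injected `embed_fn` come from this PR. The function `vector_fallback`, the candidate dict, and the `"auto"`/`"fuzzy"` labels are hypothetical, and the real `resolve_entity()` may treat a fuzzy match differently.

```python
import math
from typing import Callable, Optional, Sequence

AUTO_MATCH = 0.92   # threshold named in the PR: accept automatically
FUZZY_MATCH = 0.75  # threshold named in the PR: weaker, fuzzy match

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def vector_fallback(
    name: str,
    candidates: dict[str, Vector],
    embed_fn: Callable[[str], Vector],
) -> tuple[Optional[str], Optional[str]]:
    """Return (entity_id, match_kind) for the closest candidate, or (None, None)."""
    query = embed_fn(name)
    best_id, best_score = None, -1.0
    for entity_id, vector in candidates.items():
        score = cosine(query, vector)
        if score > best_score:
            best_id, best_score = entity_id, score
    if best_score >= AUTO_MATCH:
        return best_id, "auto"
    if best_score >= FUZZY_MATCH:
        return best_id, "fuzzy"
    return None, None


# Toy 2-d embeddings standing in for a real embedding model.
known = {"ent-a": [1.0, 0.0], "ent-b": [0.0, 1.0]}
auto = vector_fallback("close", known, lambda _: [1.0, 0.1])
fuzzy = vector_fallback("near", known, lambda _: [1.0, 0.8])
miss = vector_fallback("far", known, lambda _: [1.0, 1.0])
```

Passing `embed_fn` as a parameter keeps this step optional: callers without an embedding model fall through, consistent with the Summary's note that the fallback only runs when exact/alias/FTS fails.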