fix: include NULL-project entries in search + embed BrainBar chunks (#95)
Conversation
BrainBar MCP entries had source='mcp', project=NULL, created_at=NULL —
invisible to both semantic and keyword search due to strict project filter.
- search_repo.py: change `c.project = ?` to `(c.project = ? OR c.project IS NULL)`
in 4 locations (semantic, FTS5, text search, post-RRF filter)
- store.py: expand embed_pending_chunks to `source IN ('manual', 'mcp')`
so BrainBar entries get vector embeddings
- README.md: add Groq as primary enrichment backend, update chunk count to 224K
Verified: 4/5 test queries now surface BrainBar entries as #1 result (was 0/5).
923 tests pass, 0 regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
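The core filter change described in the commit can be sketched as follows. This is a minimal illustration assuming only a `chunks` table with a nullable `project` column (names follow the PR description; everything else is illustrative, not the repo's actual schema):

```python
import sqlite3

# Minimal sketch of the before/after predicate, assuming a `chunks` table
# with a nullable `project` column. Other details are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, project TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO chunks (project, text) VALUES (?, ?)",
    [("brainlayer", "repo chunk"), (None, "BrainBar MCP entry")],
)

# Before: strict equality silently drops NULL-project rows, because
# `NULL = 'brainlayer'` evaluates to NULL (not true) in SQL.
strict = conn.execute(
    "SELECT text FROM chunks c WHERE c.project = ?", ("brainlayer",)
).fetchall()

# After: NULL-project entries are included alongside the requested project.
inclusive = conn.execute(
    "SELECT text FROM chunks c WHERE (c.project = ? OR c.project IS NULL)",
    ("brainlayer",),
).fetchall()

print(len(strict), len(inclusive))  # → 1 2
```

The strict query misses the MCP entry entirely, which is why the BrainBar rows were invisible to every search path that applied the project filter.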
📝 Walkthrough
Updated README with new backend information and revised indexing metrics. Modified project filtering logic in search queries to include NULL values, and expanded chunk embeddings backfill scope to support MCP-sourced chunks alongside manual sources.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@README.md`:
- Line 15: Update the README text that currently states "9 MCP tools" to reflect
the Phase B contract by changing it to "8 MCP tools" and document the required
canonical tool names: brain_search, brain_store, brain_recall, brain_entity,
brain_expand, brain_update, brain_digest, brain_tags; also note that
brain_get_person should be documented as a legacy alias (backward compatibility)
replaced by brain_tags.
- Around line 164-170: Update the README's backend precedence to reframe Groq as
an optional cloud backend (not the default) and document the runtime selection:
auto-detect MLX on Apple Silicon, fall back to Ollama after 3 consecutive MLX
failures, and allow explicit override via the BRAINLAYER_ENRICH_BACKEND
environment variable; also update any wording that currently calls Groq the
“Primary backend” to avoid implying it is the default. Ensure the README
mentions the failure-count fallback behavior and the override env var so docs
match the implemented auto-detect logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2269aaab-35e7-4c79-b82d-e2943f0e4f94
📒 Files selected for processing (3)
- README.md
- src/brainlayer/search_repo.py
- src/brainlayer/store.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: test (3.12)
- GitHub Check: test (3.11)
- GitHub Check: test (3.13)
🧰 Additional context used
📓 Path-based instructions (2)
src/brainlayer/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/brainlayer/**/*.py: Python package structure should follow the layout: `src/brainlayer/` for package code, with separate modules for `vector_store.py`, `embeddings.py`, `daemon.py`, `dashboard/`, and `mcp/` for different concerns
Use `paths.py:get_db_path()` for all database path resolution instead of hardcoding paths; support environment variable overrides and canonical path fallback (`~/.local/share/brainlayer/brainlayer.db`)
Lint and format Python code using `ruff check src/` and `ruff format src/`
Preserve verbatim content for `ai_code`, `stack_trace`, and `user_message` message types during classification and chunking; skip `noise` content entirely; summarize `build_log` content; extract structure-only for `dir_listing`
Use AST-aware chunking with tree-sitter; never split stack traces; mask large tool output during chunking
Handle SQLite concurrency by implementing retry logic on `SQLITE_BUSY` errors; ensure each worker uses its own database connection
Prioritize MLX (`Qwen2.5-Coder-14B-Instruct-4bit`) on Apple Silicon (port 8080) as the enrichment backend; fall back to Ollama (`glm-4.7-flash` on port 11434) after 3 consecutive MLX failures; support backend override via `BRAINLAYER_ENRICH_BACKEND` environment variable
Brain graph API must expose endpoints: `/brain/graph`, `/brain/node/{node_id}` (FastAPI)
Backlog API must support endpoints: `/backlog/items` with GET, POST, PATCH, DELETE operations (FastAPI)
Provide `brainlayer brain-export` command to export brain graph as JSON for dashboard consumption
Provide `brainlayer export-obsidian` command to export as Markdown vault with backlinks and tags
For bulk database operations: stop enrichment workers first, checkpoint WAL before and after operations, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks with checkpoint every 3 batches, never delete from `chunks` while FTS trigger is active
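The `SQLITE_BUSY` retry guideline above can be sketched as a small wrapper. This is an illustrative sketch only — the helper name, retry count, and backoff values are assumptions, not the repo's actual implementation:

```python
import sqlite3
import time

def with_busy_retry(fn, retries=5, base_delay=0.05):
    """Retry a SQLite operation when the database is busy/locked.

    Illustrative sketch: exponential backoff between attempts, re-raising
    any OperationalError that is not a busy/locked condition.
    """
    for attempt in range(retries):
        try:
            return fn()
        except sqlite3.OperationalError as exc:
            if "locked" not in str(exc) and "busy" not in str(exc):
                raise
            time.sleep(base_delay * (2 ** attempt))
    return fn()  # final attempt; let any error propagate

# Per the guideline, each worker opens its own connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
with_busy_retry(lambda: conn.execute("INSERT INTO kv VALUES ('a', '1')"))
value = conn.execute("SELECT v FROM kv WHERE k = 'a'").fetchone()[0]
print(value)  # → 1
```

In the happy path the wrapper adds no overhead; it only matters when a concurrent writer holds the lock long enough to surface `SQLITE_BUSY`.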
Files:
- src/brainlayer/store.py
- src/brainlayer/search_repo.py
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests
Files:
- src/brainlayer/store.py
- src/brainlayer/search_repo.py
🧠 Learnings (3)
📚 Learning: 2026-03-12T14:22:54.809Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-12T14:22:54.809Z
Learning: Applies to src/brainlayer/**/*.py : Prioritize MLX (`Qwen2.5-Coder-14B-Instruct-4bit`) on Apple Silicon (port 8080) as the enrichment backend; fall back to Ollama (`glm-4.7-flash` on port 11434) after 3 consecutive MLX failures; support backend override via `BRAINLAYER_ENRICH_BACKEND` environment variable
Applied to files:
README.md
📚 Learning: 2026-03-17T01:04:22.497Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-03-17T01:04:22.497Z
Learning: Applies to src/brainlayer/mcp/**/*.py and brain-bar/Sources/BrainBar/MCPRouter.swift: The 8 required MCP tools are `brain_search`, `brain_store`, `brain_recall`, `brain_entity`, `brain_expand`, `brain_update`, `brain_digest`, `brain_tags`. `brain_tags` is the 8th tool, replacing `brain_get_person`, as defined in the Phase B spec merged in PR `#72`. The Python MCP server already implements `brain_tags`. Legacy `brainlayer_*` aliases must be maintained for backward compatibility.
Applied to files:
README.md
📚 Learning: 2026-03-14T02:20:54.656Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-14T02:20:54.656Z
Learning: Be aware of known BrainLayer issues: DB locking during enrichment and WAL growth up to 4.7GB
Applied to files:
README.md
🔇 Additional comments (6)

README.md (1)
- 174-174: Groq override example looks good. Explicitly showing `BRAINLAYER_ENRICH_BACKEND=groq` is clear and matches an opt-in backend flow.

src/brainlayer/store.py (1)
- 219-230: LGTM — correctly expands embedding backfill to include MCP-sourced chunks. The SQL `IN ('manual', 'mcp')` clause is syntactically correct and the per-chunk error handling isolates failures. The function already runs in a dedicated background thread with its own VectorStore connection (per `store_handler.py:336-357`), so no new concurrency concerns are introduced.

src/brainlayer/search_repo.py (4)
- 136-138: LGTM — consistent update across all search paths. The SQL predicate `(c.project = ? OR c.project IS NULL)` correctly includes NULL-project entries when a project filter is specified. This change is consistently applied to semantic search, text LIKE search, and FTS5 search.
- 203-205: Consistent with semantic search update.
- 512-514: Consistent with semantic search update.
- 644-650: Post-RRF filter correctly aligned with SQL changes. The tuple membership check `not in (project_filter, None)` ensures FTS-only results with `project=None` are retained, matching the behavior of the SQL predicate changes upstream.
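The post-RRF tuple-membership check the reviewer approves can be sketched in isolation. The result dicts below are hypothetical; only the `not in (project_filter, None)` drop condition mirrors the reviewed change:

```python
# Sketch of the post-RRF project filter. Result shapes are hypothetical;
# the drop condition mirrors the reviewed `not in (project_filter, None)`.
results = [
    {"text": "repo chunk", "project": "brainlayer"},
    {"text": "BrainBar MCP entry", "project": None},       # FTS-only, NULL project
    {"text": "other repo chunk", "project": "otherproj"},  # should be dropped
]

project_filter = "brainlayer"
kept = []
for r in results:
    if project_filter and r["project"] not in (project_filter, None):
        continue  # drop results from other projects, keep matches and NULLs
    kept.append(r)

print([r["text"] for r in kept])  # → ['repo chunk', 'BrainBar MCP entry']
```

The `None` member of the tuple is what keeps this filter aligned with the SQL-side `OR c.project IS NULL`: without it, NULL-project rows surfaced by FTS would be dropped again after fusion.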
- **317,000+ chunks indexed** · **1,002 Python + 28 Swift tests** · **Real-time indexing hooks** · **9 MCP tools** · **BrainBar daemon (209KB)** · **Zero cloud dependencies**
+ **224,000+ chunks indexed** · **1,002 Python + 28 Swift tests** · **Real-time indexing hooks** · **9 MCP tools** · **BrainBar daemon (209KB)** · **Zero cloud dependencies**
Update MCP tool count to match the current contract.
Line 15 says 9 MCP tools, but current Phase B spec requires 8 canonical tools with brain_tags replacing brain_get_person (which should be documented as a legacy alias only).
Based on learnings: The 8 required MCP tools are brain_search, brain_store, brain_recall, brain_entity, brain_expand, brain_update, brain_digest, brain_tags, with legacy aliases kept for backward compatibility.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@README.md` at line 15, Update the README text that currently states "9 MCP
tools" to reflect the Phase B contract by changing it to "8 MCP tools" and
document the required canonical tool names: brain_search, brain_store,
brain_recall, brain_entity, brain_expand, brain_update, brain_digest,
brain_tags; also note that brain_get_person should be documented as a legacy
alias (backward compatibility) replaced by brain_tags.
Three enrichment backends:

| Backend | Best for | Speed |
|---------|----------|-------|
| **Groq** (cloud) | Primary backend (March 2026+) | ~1-2s/chunk |
| **MLX** (Apple Silicon) | M1/M2/M3 Macs | 21-87% faster than Ollama |
| **Ollama** | Any platform | ~1s/chunk (short), ~13s (long) |
Backend precedence is internally inconsistent in docs.
Declaring Groq as “Primary backend” conflicts with the README’s default behavior (auto-detect MLX on Apple Silicon, else Ollama) and local-first/zero-cloud positioning. This should be reframed as an optional cloud backend unless runtime default selection changed accordingly.
Based on learnings: Prioritize MLX on Apple Silicon, fall back to Ollama after 3 consecutive MLX failures, and allow override via BRAINLAYER_ENRICH_BACKEND.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@README.md` around lines 164 - 170, Update the README's backend precedence to
reframe Groq as an optional cloud backend (not the default) and document the
runtime selection: auto-detect MLX on Apple Silicon, fall back to Ollama after 3
consecutive MLX failures, and allow explicit override via the
BRAINLAYER_ENRICH_BACKEND environment variable; also update any wording that
currently calls Groq the “Primary backend” to avoid implying it is the default.
Ensure the README mentions the failure-count fallback behavior and the override
env var so docs match the implemented auto-detect logic.
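The selection order this review asks the README to document — MLX on Apple Silicon, Ollama after 3 consecutive MLX failures, explicit override via `BRAINLAYER_ENRICH_BACKEND` — can be sketched as follows. Only the env var name and fallback rule come from the review; the function name and detection logic are illustrative:

```python
import os
import platform

MLX_FAILURE_THRESHOLD = 3  # per the documented fallback rule

def choose_backend(mlx_consecutive_failures: int = 0) -> str:
    """Pick an enrichment backend.

    Illustrative sketch of the documented precedence; the repo's actual
    selection code may differ.
    """
    override = os.environ.get("BRAINLAYER_ENRICH_BACKEND")
    if override:
        return override  # explicit override wins ('mlx', 'ollama', 'groq', ...)
    on_apple_silicon = (
        platform.system() == "Darwin" and platform.machine() == "arm64"
    )
    if on_apple_silicon and mlx_consecutive_failures < MLX_FAILURE_THRESHOLD:
        return "mlx"    # local MLX server on port 8080
    return "ollama"     # glm-4.7-flash on port 11434

os.environ["BRAINLAYER_ENRICH_BACKEND"] = "groq"
print(choose_backend())  # → groq
```

Framing Groq this way — reachable only via the override — keeps the docs consistent with the local-first default the review describes.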
Reframe Groq as fallback (not primary) to match auto-detect logic: MLX → Ollama → Groq. Addresses CodeRabbit review on #95.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- BrainBar MCP entries had `source='mcp'`, `project=NULL`, `created_at=NULL` — invisible to both semantic and keyword search
- Changed `c.project = ?` to `(c.project = ? OR c.project IS NULL)` in 4 filter locations in `search_repo.py`
- Expanded `embed_pending_chunks` to process `source IN ('manual', 'mcp')` so BrainBar entries get embeddings

Test plan
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Documentation