
fix: include NULL-project entries in search + embed BrainBar chunks #95

Merged

EtanHey merged 2 commits into main from fix/search-quality-null-project
Mar 19, 2026

Conversation

@EtanHey (Owner) commented Mar 19, 2026

Summary

  • BrainBar MCP entries had source='mcp', project=NULL, created_at=NULL, leaving them invisible to both semantic and keyword search
  • Changed c.project = ? to (c.project = ? OR c.project IS NULL) in 4 filter locations in search_repo.py (see the sketch after this list)
  • Expanded embed_pending_chunks to process source IN ('manual', 'mcp') so BrainBar entries get embeddings
  • Updated README: added Groq as primary enrichment backend, corrected chunk count to 224K
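
A minimal sketch of the predicate change, assuming a hypothetical build_project_filter helper; the actual search_repo.py function and schema names may differ:

```python
import sqlite3

def build_project_filter(project: str | None) -> tuple[str, list]:
    """WHERE fragment for project filtering.

    The old strict `c.project = ?` silently dropped rows whose project
    column is NULL (e.g. BrainBar MCP entries); the fix widens the
    predicate to also match NULL.
    """
    if project is None:
        return "1=1", []  # no filter requested
    return "(c.project = ? OR c.project IS NULL)", [project]

def text_search(conn: sqlite3.Connection, term: str, project: str | None):
    # Illustrative keyword-search path; the same fragment would be
    # reused by the semantic and FTS5 queries.
    where, params = build_project_filter(project)
    sql = f"SELECT c.id, c.text FROM chunks c WHERE c.text LIKE ? AND {where}"
    return conn.execute(sql, [f"%{term}%", *params]).fetchall()
```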

Test plan

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added Groq cloud as a new enrichment backend option.
  • Bug Fixes

    • Improved search filtering to include results with unset project values across all search types.
    • Extended embedding generation to cover chunks from additional data sources.
  • Documentation

    • Updated README with current indexing scale and example CLI configuration.

BrainBar MCP entries had source='mcp', project=NULL, created_at=NULL —
invisible to both semantic and keyword search due to strict project filter.

- search_repo.py: change `c.project = ?` to `(c.project = ? OR c.project IS NULL)`
  in 4 locations (semantic, FTS5, text search, post-RRF filter)
- store.py: expand embed_pending_chunks to `source IN ('manual', 'mcp')`
  so BrainBar entries get vector embeddings
- README.md: add Groq as primary enrichment backend, update chunk count to 224K

Verified: 4/5 test queries now surface BrainBar entries as #1 result (was 0/5).
923 tests pass, 0 regressions.
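
For illustration, one of those verification queries could be phrased as a regression test like the sketch below; the store fixture, search signature, and query string are assumptions, not the repo's actual test code:

```python
def test_brainbar_entries_surface_in_project_search(store):
    # BrainBar rows arrive with source='mcp' and project=NULL; before
    # this fix, any project-filtered search excluded them outright.
    results = store.search("brainbar reminder", project="brainlayer", limit=5)
    assert results, "expected at least one hit"
    top = results[0]
    # The fix holds if a NULL-project, MCP-sourced chunk can rank first.
    assert top.source == "mcp"
    assert top.project is None
```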

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>



coderabbitai Bot commented Mar 19, 2026


ℹ️ Review info
Review profile: ASSERTIVE · Plan: Pro · Configuration: Organization UI

📥 Commits
Reviewing files that changed from the base of the PR and between 943a592 and 15a212a.

📒 Files selected for processing (1)
  • README.md
📝 Walkthrough

Updated README with new backend information and revised indexing metrics. Modified project filtering logic in search queries to include NULL values, and expanded the chunk-embedding backfill to support MCP-sourced chunks alongside manual sources.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation Updates: README.md | Updated indexing scale from 317,000+ to 224,000+ chunks. Added Groq (cloud) as primary enrichment backend with ~1–2s/chunk timing (March 2026+). Updated the example CLI command to use BRAINLAYER_ENRICH_BACKEND=groq instead of mlx. |
| Database Query Logic: src/brainlayer/search_repo.py, src/brainlayer/store.py | Modified project filtering to include NULL project values across semantic vector search, text LIKE search, and FTS5 hybrid search. Updated post-RRF filtering to treat None projects as matching. Expanded embed_pending_chunks() to backfill embeddings for both 'manual' and 'mcp' sourced chunks. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Searching through the garden of chunks, we hop with glee,
NULL values now included in our queries so free!
Groq joins the feast, swift as a spring,
While MCP sources fill our embedding ring—
Brainlayer grows stronger, a rabbit's delight! 🥕✨

🚥 Pre-merge checks | ✅ 3 passed

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title accurately summarizes the main fixes: including NULL-project entries in search and embedding BrainBar (MCP) chunks, directly matching the core changes across search_repo.py and store.py. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, above the required threshold of 80.00%. |



@coderabbitai Bot left a comment

Actionable comments posted: 2


ℹ️ Review info
Review profile: ASSERTIVE · Plan: Pro · Configuration: Organization UI

📥 Commits
Reviewing files that changed from the base of the PR and between 1fc7ba7 and 943a592.

📒 Files selected for processing (3)
  • README.md
  • src/brainlayer/search_repo.py
  • src/brainlayer/store.py

⏰ Context from checks skipped due to the 90000ms timeout (3): GitHub Checks test (3.11), test (3.12), test (3.13). The timeout can be raised in the CodeRabbit configuration to a maximum of 15 minutes (900000ms).
🧰 Additional context used
📓 Path-based instructions (2)
src/brainlayer/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/brainlayer/**/*.py: Python package structure should follow the layout: src/brainlayer/ for package code, with separate modules for vector_store.py, embeddings.py, daemon.py, dashboard/, and mcp/ for different concerns
Use paths.py:get_db_path() for all database path resolution instead of hardcoding paths; support environment variable overrides and canonical path fallback (~/.local/share/brainlayer/brainlayer.db)
Lint and format Python code using ruff check src/ and ruff format src/
Preserve verbatim content for ai_code, stack_trace, and user_message message types during classification and chunking; skip noise content entirely; summarize build_log content; extract structure-only for dir_listing
Use AST-aware chunking with tree-sitter; never split stack traces; mask large tool output during chunking
Handle SQLite concurrency by implementing retry logic on SQLITE_BUSY errors; ensure each worker uses its own database connection
Prioritize MLX (Qwen2.5-Coder-14B-Instruct-4bit) on Apple Silicon (port 8080) as the enrichment backend; fall back to Ollama (glm-4.7-flash on port 11434) after 3 consecutive MLX failures; support backend override via BRAINLAYER_ENRICH_BACKEND environment variable
Brain graph API must expose endpoints: /brain/graph, /brain/node/{node_id} (FastAPI)
Backlog API must support endpoints: /backlog/items with GET, POST, PATCH, DELETE operations (FastAPI)
Provide brainlayer brain-export command to export brain graph as JSON for dashboard consumption
Provide brainlayer export-obsidian command to export as Markdown vault with backlinks and tags
For bulk database operations: stop enrichment workers first, checkpoint WAL before and after operations, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks with checkpoint every 3 batches, never delete from chunks while FTS trigger is active

Files:

  • src/brainlayer/store.py
  • src/brainlayer/search_repo.py
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests

Files:

  • src/brainlayer/store.py
  • src/brainlayer/search_repo.py
🧠 Learnings (3)
📚 Learning: 2026-03-12T14:22:54.809Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-12T14:22:54.809Z
Learning: Applies to src/brainlayer/**/*.py : Prioritize MLX (`Qwen2.5-Coder-14B-Instruct-4bit`) on Apple Silicon (port 8080) as the enrichment backend; fall back to Ollama (`glm-4.7-flash` on port 11434) after 3 consecutive MLX failures; support backend override via `BRAINLAYER_ENRICH_BACKEND` environment variable

Applied to files:

  • README.md
📚 Learning: 2026-03-17T01:04:22.497Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-03-17T01:04:22.497Z
Learning: Applies to src/brainlayer/mcp/**/*.py and brain-bar/Sources/BrainBar/MCPRouter.swift: The 8 required MCP tools are `brain_search`, `brain_store`, `brain_recall`, `brain_entity`, `brain_expand`, `brain_update`, `brain_digest`, `brain_tags`. `brain_tags` is the 8th tool, replacing `brain_get_person`, as defined in the Phase B spec merged in PR `#72`. The Python MCP server already implements `brain_tags`. Legacy `brainlayer_*` aliases must be maintained for backward compatibility.

Applied to files:

  • README.md
📚 Learning: 2026-03-14T02:20:54.656Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-14T02:20:54.656Z
Learning: Be aware of known BrainLayer issues: DB locking during enrichment and WAL growth up to 4.7GB

Applied to files:

  • README.md
🔇 Additional comments (6)
README.md (1)

174-174: Groq override example looks good.

Explicitly showing BRAINLAYER_ENRICH_BACKEND=groq is clear and matches an opt-in backend flow.

src/brainlayer/store.py (1)

219-230: LGTM — correctly expands embedding backfill to include MCP-sourced chunks.

The SQL IN ('manual', 'mcp') clause is syntactically correct and the per-chunk error handling isolates failures. The function already runs in a dedicated background thread with its own VectorStore connection (per store_handler.py:336-357), so no new concurrency concerns are introduced.
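
A sketch of the widened backfill under those assumptions; the helper names and the embedding column are illustrative, not store.py's actual internals:

```python
import sqlite3
import struct

def _pack(vec: list[float]) -> bytes:
    # Naive float32 serialization, just for this illustration.
    return struct.pack(f"{len(vec)}f", *vec)

def embed_pending_chunks(conn: sqlite3.Connection, embed, batch_size: int = 256) -> int:
    """Backfill vectors for chunks that lack one. The IN clause is the
    change under review: previously only source='manual' was eligible."""
    rows = conn.execute(
        "SELECT id, text FROM chunks "
        "WHERE embedding IS NULL AND source IN ('manual', 'mcp') LIMIT ?",
        (batch_size,),
    ).fetchall()
    done = 0
    for chunk_id, text in rows:
        try:
            vec = embed(text)  # per-chunk call, so one failure cannot abort the batch
        except Exception:
            continue  # isolate failures, matching the error handling noted above
        conn.execute(
            "UPDATE chunks SET embedding = ? WHERE id = ?",
            (_pack(vec), chunk_id),
        )
        done += 1
    conn.commit()
    return done
```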

src/brainlayer/search_repo.py (4)

136-138: LGTM — consistent update across all search paths.

The SQL predicate (c.project = ? OR c.project IS NULL) correctly includes NULL-project entries when a project filter is specified. This change is consistently applied to semantic search, text LIKE search, and FTS5 search.


203-205: Consistent with semantic search update.


512-514: Consistent with semantic search update.


644-650: Post-RRF filter correctly aligned with SQL changes.

The tuple membership check not in (project_filter, None) ensures FTS-only results with project=None are retained, matching the behavior of the SQL predicate changes upstream.
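
For context, a minimal reciprocal-rank-fusion sketch paired with that NULL-tolerant post-filter; k=60 and the result shape are assumptions, not search_repo.py's actual code:

```python
def rrf_merge(semantic_ids: list[str], fts_ids: list[str], k: int = 60) -> list[str]:
    # Reciprocal rank fusion: score(id) = sum over lists of 1 / (k + rank).
    scores: dict[str, float] = {}
    for ranked in (semantic_ids, fts_ids):
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def post_filter(results, project_filter):
    # Mirrors the SQL predicate: keep a row when its project matches the
    # filter or is None, i.e. drop it only if project not in (filter, None).
    if project_filter is None:
        return results
    return [r for r in results if r.project in (project_filter, None)]
```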

Comment thread: README.md

- **317,000+ chunks indexed** · **1,002 Python + 28 Swift tests** · **Real-time indexing hooks** · **9 MCP tools** · **BrainBar daemon (209KB)** · **Zero cloud dependencies**
+ **224,000+ chunks indexed** · **1,002 Python + 28 Swift tests** · **Real-time indexing hooks** · **9 MCP tools** · **BrainBar daemon (209KB)** · **Zero cloud dependencies**

⚠️ Potential issue | 🟠 Major

Update MCP tool count to match the current contract.

Line 15 says 9 MCP tools, but current Phase B spec requires 8 canonical tools with brain_tags replacing brain_get_person (which should be documented as a legacy alias only).

Based on learnings: The 8 required MCP tools are brain_search, brain_store, brain_recall, brain_entity, brain_expand, brain_update, brain_digest, brain_tags, with legacy aliases kept for backward compatibility.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` at line 15, Update the README text that currently states "9 MCP
tools" to reflect the Phase B contract by changing it to "8 MCP tools" and
document the required canonical tool names: brain_search, brain_store,
brain_recall, brain_entity, brain_expand, brain_update, brain_digest,
brain_tags; also note that brain_get_person should be documented as a legacy
alias (backward compatibility) replaced by brain_tags.

Comment thread: README.md (outdated)
Comment on lines 164 to 170:
Three enrichment backends:

| Backend | Best for | Speed |
|---------|----------|-------|
| **Groq** (cloud) | Primary backend (March 2026+) | ~1-2s/chunk |
| **MLX** (Apple Silicon) | M1/M2/M3 Macs | 21-87% faster than Ollama |
| **Ollama** | Any platform | ~1s/chunk (short), ~13s (long) |

⚠️ Potential issue | 🟠 Major

Backend precedence is internally inconsistent in docs.

Declaring Groq as “Primary backend” conflicts with the README’s default behavior (auto-detect MLX on Apple Silicon, else Ollama) and local-first/zero-cloud positioning. This should be reframed as an optional cloud backend unless runtime default selection changed accordingly.

Based on learnings: Prioritize MLX on Apple Silicon, fall back to Ollama after 3 consecutive MLX failures, and allow override via BRAINLAYER_ENRICH_BACKEND.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 164 - 170, Update the README's backend precedence to
reframe Groq as an optional cloud backend (not the default) and document the
runtime selection: auto-detect MLX on Apple Silicon, fall back to Ollama after 3
consecutive MLX failures, and allow explicit override via the
BRAINLAYER_ENRICH_BACKEND environment variable; also update any wording that
currently calls Groq the “Primary backend” to avoid implying it is the default.
Ensure the README mentions the failure-count fallback behavior and the override
env var so docs match the implemented auto-detect logic.

Reframe Groq as fallback (not primary) to match auto-detect logic:
MLX → Ollama → Groq. Addresses CodeRabbit review on #95.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
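
A hedged sketch of the selection order that reframing implies (env override first, then MLX on Apple Silicon with an Ollama fallback after 3 consecutive failures, Groq last); the function and parameter names are illustrative:

```python
import os
import platform

MLX_FAILURE_LIMIT = 3  # consecutive failures before falling back to Ollama

def pick_enrich_backend(mlx_consecutive_failures: int = 0,
                        ollama_available: bool = True) -> str:
    # 1. Explicit override wins, e.g. BRAINLAYER_ENRICH_BACKEND=groq.
    override = os.environ.get("BRAINLAYER_ENRICH_BACKEND")
    if override:
        return override
    # 2. Prefer MLX on Apple Silicon until it has failed repeatedly.
    on_apple_silicon = platform.system() == "Darwin" and platform.machine() == "arm64"
    if on_apple_silicon and mlx_consecutive_failures < MLX_FAILURE_LIMIT:
        return "mlx"     # Qwen2.5-Coder-14B-Instruct-4bit, port 8080
    # 3. Ollama is the universal local fallback.
    if ollama_available:
        return "ollama"  # glm-4.7-flash, port 11434
    # 4. Groq as the cloud fallback of last resort, per this commit.
    return "groq"
```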
@EtanHey EtanHey merged commit 4af55ff into main Mar 19, 2026
6 checks passed
@EtanHey EtanHey deleted the fix/search-quality-null-project branch March 19, 2026 07:01
