fix: Phase 3 core fixes — DB paths, date filtering, search metadata by EtanHey · Pull Request #1 · EtanHey/brainlayer

EtanHey · 2026-02-19T17:45:48Z

Summary

DB path fix: Centralized path resolution in paths.py — all 8 modules now resolve to the correct ~/.local/share/zikaron/zikaron.db instead of the empty brainlayer.db
Date filtering: Added a created_at column, date_from/date_to search params in the MCP tool, and a backfill script that searches archives
Search metadata: Results now include created_at timestamps and source (claude_code/whatsapp/youtube)
Project normalization: Decode Claude Code path encoding (-Users-etanheyman-Gits-golems → golems) for cleaner filtering
Chunker fix: Sentence-aware splitting for oversized paragraphs instead of mid-sentence truncation
Data documentation: Comprehensive docs/data-locations.md — where all data lives, archive strategy, migration history

Backfill Results

Ran scripts/backfill-created-at.py — 107,935/268,864 chunks (40.1%) now have created_at:

35,339 from metadata timestamps (WhatsApp/YouTube)
72,596 from archived JSONL session files
160,929 remaining are pre-archiver sessions whose JSONL files no longer exist
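The counts above are internally consistent, which a quick check confirms. The variable names here are illustrative only and do not come from the backfill script.

```python
# Figures quoted in the backfill results above.
total_chunks = 268_864
from_metadata = 35_339   # WhatsApp/YouTube metadata timestamps
from_archives = 72_596   # archived JSONL session files

dated = from_metadata + from_archives
remaining = total_chunks - dated
coverage_pct = round(100 * dated / total_chunks, 1)

print(dated, remaining, coverage_pct)
```

The two sources sum to the 107,935 dated chunks, and the remainder matches the 160,929 pre-archiver chunks.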

Test plan

Verify MCP search works with date_from/date_to params
Verify project normalization in search results
Verify BRAINLAYER_DB env var override works
Run backfill script on fresh DB to verify archive path scanning

🤖 Generated with Claude Code

Note

Medium Risk
Touches core persistence/search paths and extends the SQLite schema with a new indexed created_at field, which can affect query results and indexing behavior across multiple entrypoints. Risk is mitigated by additive schema changes and backfill tooling, but rollout should verify searches still hit the intended DB and filters behave as expected.

Overview
Centralizes DB path resolution via new src/brainlayer/paths.py (supports BRAINLAYER_DB override and prefers the legacy ~/.local/share/zikaron/zikaron.db when present), and updates daemon/dashboard/MCP/pipelines/scripts to import DEFAULT_DB_PATH instead of hardcoding paths.
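The resolution order described above (env override, then the legacy file if present, then a default) might look roughly like the sketch below. This is not the code from paths.py: the function name `resolve_db_path` and the non-legacy default location are assumptions made for illustration.

```python
import os
from pathlib import Path

# Legacy location named in the PR description.
LEGACY_DB = Path.home() / ".local" / "share" / "zikaron" / "zikaron.db"
# Assumed default; the PR only says the empty DB was called brainlayer.db.
ASSUMED_DEFAULT_DB = Path.home() / ".local" / "share" / "brainlayer" / "brainlayer.db"


def resolve_db_path(env=None, legacy=LEGACY_DB, default=ASSUMED_DEFAULT_DB):
    """Return the DB path: env override, then legacy file, then default."""
    env = os.environ if env is None else env
    override = env.get("BRAINLAYER_DB")
    if override:
        return Path(override).expanduser()
    if legacy.exists():
        return legacy
    return default


print(resolve_db_path(env={"BRAINLAYER_DB": "/tmp/custom.db"}))
```

Keeping the lookup in one function means every entrypoint imports the same answer, which is the point of the centralization.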

Adds temporal metadata and filtering by introducing a created_at column + index on chunks, populating it on new ingests (index_new.py), surfacing it (and source) in search metadata, and extending MCP brainlayer_search with date_from/date_to that flow through VectorStore.search/hybrid_search.

Adds a new scripts/backfill-created-at.py to backfill timestamps from existing metadata, session JSONL files (including archives/manifests), file mtimes, and a final rowid-based estimate pass; also improves text chunking to split oversized paragraphs on sentence boundaries and documents data/archive locations in docs/data-locations.md.
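To illustrate what splitting oversized paragraphs on sentence boundaries can mean in practice, here is a minimal sketch. It is not the chunker from this PR: the function name, the regex-based boundary rule, and the choice to keep an overlong single sentence whole are all assumptions.

```python
import re

# Naive sentence boundary: whitespace that follows ., ! or ?
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_paragraph(paragraph, max_chars):
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole in this sketch
    rather than being cut mid-sentence.
    """
    chunks, current = [], ""
    for sentence in _SENTENCE_END.split(paragraph.strip()):
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


print(split_paragraph("One two. Three four. Five six.", 20))
```

With a 20-character limit the example yields two chunks, each ending on a sentence boundary.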

Written by Cursor Bugbot for commit 68869bb. This will update automatically on new commits.

Critical fixes:
- Centralize DB path resolution in paths.py (fixes empty DB bug)
- All modules now resolve to ~/.local/share/zikaron/zikaron.db
- ENV override: BRAINLAYER_DB for custom paths

New features:
- Date filtering: created_at column, date_from/date_to search params
- Backfill script for existing chunks (searches archives)
- Search results include created_at and source metadata
- Project name normalization (decode Claude path encoding)
- Sentence-aware text chunking for oversized paragraphs
- Comprehensive data locations documentation

Files: 10 modified, 3 new (paths.py, backfill-created-at.py, data-locations.md)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 3 potential issues.


cursor · 2026-02-19T17:53:24Z

+                if os.path.isdir(candidate_path):
+                    return candidate_name
+            # Fallback: return first segment (best guess)
+            return remaining[0]


Filesystem fallback truncates multi-segment project names

Medium Severity

normalize_project_name falls back to remaining[0] (only the first segment after "Gits") when the os.path.isdir() filesystem lookup fails. For multi-segment project names like rudy-monorepo, this returns "rudy" instead of "rudy-monorepo" when the original directory path no longer exists. Since the PR documents that repos moved from ~/Desktop/Gits/ to ~/Gits/, old encoded paths will fail the lookup against the old gits_dir and produce truncated names, breaking project filtering.
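One possible way to avoid the truncation is to try the longest on-disk match first and, when nothing exists on disk, keep every segment after the marker instead of only the first. The helper below is a hypothetical sketch, not the PR's normalize_project_name; its name, signature, and fallback policy are assumptions.

```python
import os


def decode_project(encoded, gits_dir, marker="Gits"):
    """Decode an encoded path such as -Users-me-Gits-rudy-monorepo.

    Hypothetical helper: tries the longest on-disk match first and, when
    nothing exists on disk, keeps every segment after the marker.
    """
    segments = [s for s in encoded.split("-") if s]
    if marker not in segments:
        return encoded
    remaining = segments[segments.index(marker) + 1:]
    if not remaining:
        return encoded
    # Longest candidate first, so rudy-monorepo wins over rudy.
    for end in range(len(remaining), 0, -1):
        candidate = "-".join(remaining[:end])
        if os.path.isdir(os.path.join(gits_dir, candidate)):
            return candidate
    # Fallback: keep all segments instead of only remaining[0].
    return "-".join(remaining)


print(decode_project("-Users-etanheyman-Gits-rudy-monorepo", "/nonexistent-gits-dir"))
```

The trade-off is that when the directory is gone, any encoded subdirectory segments are retained in the name too. That errs toward over-long names rather than truncated ones.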

cursor · 2026-02-19T17:53:25Z

+                filter_params.append(date_from)
+            if date_to:
+                where_clauses.append("c.created_at <= ?")
+                filter_params.append(date_to)


Date-only date_to excludes entries from that day

Medium Severity

The date_to filter uses created_at <= ? with string comparison, but the MCP tool schema encourages date-only values like '2026-02-19'. Since created_at stores full ISO 8601 timestamps (e.g. "2026-02-19T10:30:00+00:00"), and in lexicographic ordering "2026-02-19T..." > "2026-02-19", all entries actually on the date_to date are excluded. date_from is unaffected because the inequality goes the other direction.

Additional Locations (2)

src/brainlayer/vector_store.py#L427-L429

src/brainlayer/vector_store.py#L608-L611
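The exclusion can be reproduced with an in-memory SQLite table, along with one possible fix: padding a date-only bound to the end of that day before binding it. The helper name `inclusive_date_to` and the table layout are assumptions for this sketch. Because it relies on plain string comparison, it also assumes all stored timestamps use the same UTC offset.

```python
import sqlite3


def inclusive_date_to(date_to):
    """Extend a date-only bound (YYYY-MM-DD) to the end of that day."""
    if len(date_to) == 10:
        return date_to + "T23:59:59.999999"
    return date_to


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO chunks (created_at) VALUES (?)",
    [
        ("2026-02-18T09:00:00+00:00",),
        ("2026-02-19T10:30:00+00:00",),
        ("2026-02-20T08:00:00+00:00",),
    ],
)

query = "SELECT COUNT(*) FROM chunks c WHERE c.created_at <= ?"
naive = conn.execute(query, ("2026-02-19",)).fetchone()[0]
fixed = conn.execute(query, (inclusive_date_to("2026-02-19"),)).fetchone()[0]
print(naive, fixed)  # 1 2
```

The naive bound matches only the Feb 18 row, while the padded bound also includes the Feb 19 row. An exclusive comparison against the following day would be an equally reasonable alternative.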

cursor · 2026-02-19T17:53:25Z

+    import re
+    name = re.sub(r'-(?:nightshift|haiku|worktree)-\d+$', '', name)
+
+    return name


Duplicate project normalization diverges from CLI implementation

Low Severity

normalize_project_name in mcp/__init__.py reimplements project name normalization that already exists in cli/__init__.py (_clean_project_name + _normalize_project_name). The two implementations differ: the CLI version handles additional path markers (Desktop, projects, config), supports monorepo package mappings and project aliases, and has no filesystem dependency. Having two divergent implementations risks inconsistent behavior between indexing (CLI) and searching (MCP).

For ~160K chunks whose source JSONL files no longer exist (pre-archiver era), estimate dates using rowid proximity to chunks with known dates. Chunks are indexed sequentially, so nearby rowids correlate with similar timestamps.

Result: 268,864/268,864 chunks (100%) now have created_at.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
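The commit message says only that dates are estimated from rowid proximity. One simple reading is a nearest-neighbour lookup over the chunks that already have dates. The sketch below uses that reading; the function name and data layout are assumptions, and the actual script may interpolate or work differently.

```python
import bisect


def estimate_created_at(rowid, known):
    """Return the timestamp of the dated chunk whose rowid is closest.

    known: list of (rowid, iso_timestamp) pairs sorted by rowid.
    """
    if not known:
        return None
    rowids = [r for r, _ in known]
    i = bisect.bisect_left(rowids, rowid)
    if i == 0:
        return known[0][1]
    if i == len(known):
        return known[-1][1]
    before, after = known[i - 1], known[i]
    # Ties go to the earlier neighbour.
    return before[1] if rowid - before[0] <= after[0] - rowid else after[1]


known = [(10, "2025-01-01T00:00:00+00:00"), (20, "2025-02-01T00:00:00+00:00")]
print(estimate_created_at(12, known))
```

Values produced this way are estimates, so date filters over backfilled chunks are only approximate.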


* fix: include NULL-project entries in search + embed BrainBar chunks

BrainBar MCP entries had source='mcp', project=NULL, created_at=NULL, so they were invisible to both semantic and keyword search due to the strict project filter.

- search_repo.py: change `c.project = ?` to `(c.project = ? OR c.project IS NULL)` in 4 locations (semantic, FTS5, text search, post-RRF filter)
- store.py: expand embed_pending_chunks to `source IN ('manual', 'mcp')` so BrainBar entries get vector embeddings
- README.md: add Groq as primary enrichment backend, update chunk count to 224K

Verified: 4/5 test queries now surface BrainBar entries as #1 result (was 0/5). 923 tests pass, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: clarify enrichment backend precedence in README

Reframe Groq as fallback (not primary) to match auto-detect logic: MLX → Ollama → Groq. Addresses CodeRabbit review on #95.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
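The effect of the project filter change quoted above can be seen on a toy table. The schema and rows below are made up for illustration; only the two WHERE clauses come from the commit message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, project TEXT, source TEXT)")
conn.executemany(
    "INSERT INTO chunks (project, source) VALUES (?, ?)",
    [("golems", "claude_code"), (None, "mcp"), ("other", "claude_code")],
)

strict = "SELECT source FROM chunks c WHERE c.project = ? ORDER BY c.id"
relaxed = (
    "SELECT source FROM chunks c "
    "WHERE (c.project = ? OR c.project IS NULL) ORDER BY c.id"
)

# In SQL, NULL = 'golems' is not true, so the strict filter drops the mcp row.
strict_rows = [row[0] for row in conn.execute(strict, ("golems",))]
relaxed_rows = [row[0] for row in conn.execute(relaxed, ("golems",))]
print(strict_rows, relaxed_rows)
```

Rows from other named projects stay filtered out in both cases. Only the NULL-project row is newly admitted.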


* fix: improve FTS recall across exact IDs and aliases

* docs: add Bugbot review for FTS recall hardening

- Comprehensive review of retrieval correctness across 3 layers
- Write safety analysis for schema migrations
- MCP stability verification
- Performance observations (storage +1.8GB, query latency)
- Edge case analysis and recommendations
- Approve with confidence for merge

Co-authored-by: Etan Heyman <EtanHey@users.noreply.github.com>

* fix: default exact chunk lookup project label

* fix: preserve search routing and trigram repair semantics

* docs: update Bugbot review - fix markdown issues and add re-review addendum

Fixes:
- Corrected chunk-id regex false positive examples (whitespace issue)
- Fixed markdown heading spacing (MD022, MD031 compliance)
- Added blank lines around headings and code blocks

Re-review addendum (commit bcddd14):
- Reviewed 6 additional fixes since initial review
- All fixes approved: KG error handling, trigram repair, lifecycle filtering, sender/language filters
- Critical correctness fixes: exact bypass lifecycle + FTS filter completeness
- Updated verdict: APPROVED with increased confidence
- Production-ready, ship with confidence

Co-authored-by: Etan Heyman <EtanHey@users.noreply.github.com>

* docs: add Bugbot re-review summary

Co-authored-by: Etan Heyman <EtanHey@users.noreply.github.com>

* fix: satisfy lint and review follow-ups

* docs: Bugbot critical issues - 3 bugs must be fixed before merge

Critical Issues Identified:
1. P0: Trigram-only results bypass post-RRF filters (search_repo.py:1141)
2. P1: Exact chunk-ID bypass ignores project/filter scope (search_handler.py:389)
3. P2: Alias expansion breaks FTS token-level semantics (search_handler.py:131)

All three issues represent real correctness bugs:
- Cross-project data leakage via exact bypass
- Filter contract violations for trigram hits
- Potential recall regression on multi-word queries

Verdict: APPROVE WITH MANDATORY FIXES
Fixes are straightforward (5-15 min each), must be completed before merge

Credit: Issues identified by Macroscope and Codex reviews

Co-authored-by: Etan Heyman <EtanHey@users.noreply.github.com>

* fix: harden scoped exact-id and alias expansion search

* docs: fix merge conflict and add final Bugbot status report

Fixes:
- Removed merge conflict marker from BUGBOT_REVIEW_FTS_RECALL.md
- Added critical bug warning to exact chunk-ID bypass section

Final Status Report:
- 3 critical issues confirmed by Cursor Bugbot (independent verification)
- Issue #1 (P0): Cross-project data leakage via exact bypass
- Issue #2 (P2): Recall regression from phrase matching
- Issue #3 (P0): Trigram-only results bypass filters

Verdict: MERGE BLOCKED until critical issues fixed

Credit: Issues independently confirmed by Cursor's own Bugbot system with actionable fix links provided

Co-authored-by: Etan Heyman <EtanHey@users.noreply.github.com>

* style: format alias expansion regression test

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Etan Heyman <EtanHey@users.noreply.github.com>

cursor Bot reviewed Feb 19, 2026


EtanHey merged commit 700f829 into main Feb 19, 2026
0 of 4 checks passed

EtanHey deleted the fix/db-path-resolution branch February 19, 2026 20:22

cursor Bot mentioned this pull request Mar 9, 2026

feat: eval suite + entity injection in prompt hook #72

Merged

4 tasks

EtanHey mentioned this pull request Mar 19, 2026

fix: include NULL-project entries in search + embed BrainBar chunks #95

Merged

4 tasks

cursor Bot mentioned this pull request Mar 29, 2026

feat: BrainBar quick capture foundation + single-instance enforcement #137

Merged

3 tasks

cursor Bot mentioned this pull request Apr 28, 2026

Fix queue fallback for Swift brain_store #261

Merged

cursor Bot mentioned this pull request Apr 30, 2026

fix: harden BrainLayer FTS recall across all three layers #263

Merged

EtanHey merged 2 commits into main from fix/db-path-resolution
