Skip to content

fix: BrainBar FTS5 JOIN uses chunk_id instead of rowid#94

Merged
EtanHey merged 1 commit intomainfrom
fix/brainbar-importance-filter
Mar 18, 2026
Merged

fix: BrainBar FTS5 JOIN uses chunk_id instead of rowid#94
EtanHey merged 1 commit intomainfrom
fix/brainbar-importance-filter

Conversation

@EtanHey
Copy link
Copy Markdown
Owner

@EtanHey EtanHey commented Mar 18, 2026

Summary

  • Root cause: Production FTS5 trigger (Python) doesn't set explicit rowid. After FTS5 rebuilds, auto-assigned rowids diverge from chunks.rowid. The f.rowid = c.rowid JOIN mapped to wrong chunks — importance_min, project, and tag filters checked the wrong chunk's data.
  • Fix: JOIN chunks c ON c.id = f.chunk_id instead of ON f.rowid = c.rowid. The chunk_id UNINDEXED column in FTS5 stores the actual chunk ID, immune to rowid divergence.
  • Also fixed: importance read as Double (was rounded to Int via sqlite3_column_int), test triggers updated to match production (no explicit rowid).

Verified via socat

With importance_min=7: 5 results (importance: 8, 8, 8, 7, 7)
Without filter:        5 results (importance: 5, 5, 5, 5, 5)

Test plan

  • 39 BrainBar Swift tests pass (0 failures)
  • New testImportanceFilterWorksWithDivergedRowids reproduces exact production bug (RED before fix, GREEN after)
  • New testAllResultsRespectImportanceMin verifies all results >= threshold
  • Live socat test confirms filter works against 324K-chunk production DB
  • BrainBar rebuilt and restarted with fix

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Improved Full-Text Search record synchronization for enhanced data reliability and consistency
    • Enhanced importance-based filtering to work correctly across all scenarios and edge cases
    • Refined data type handling for improved search result accuracy
  • Tests

    • Added validation tests for importance-based filtering behavior under various conditions

Root cause: production FTS5 trigger (Python) doesn't set explicit rowid,
so FTS5 auto-assigned rowids diverge from chunks.rowid after rebuilds.
The f.rowid = c.rowid JOIN mapped to wrong chunks, making importance_min,
project, and tag filters check the wrong chunk's data.

Fix: JOIN on c.id = f.chunk_id (stored as UNINDEXED column in FTS5).
Also: importance now read as Double (was rounded to Int via column_int),
test triggers updated to match production (no explicit rowid).

39 tests pass including new testImportanceFilterWorksWithDivergedRowids
that reproduces the exact production bug (FTS5 rebuild + rowid divergence).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@EtanHey
Copy link
Copy Markdown
Owner Author

EtanHey commented Mar 18, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 18, 2026

📝 Walkthrough

Walkthrough

Modified FTS (Full-Text Search) synchronization triggers in BrainDatabase to replace explicit rowid usage with chunk_id as the primary join key across insert, update, and delete operations. Adjusted FTS query joins from rowid matching to chunk_id-based joins. Changed importance field mapping from integer to double and elevated exec() method visibility from private to internal. Added tests validating importance-based filtering under edge cases.

Changes

Cohort / File(s) Summary
FTS and Data Type Updates
brain-bar/Sources/BrainBar/BrainDatabase.swift
Refactored FTS synchronization triggers to use chunk_id instead of rowid for join operations. Updated FTS query to join chunks_fts with chunks using c.id = f.chunk_id. Changed importance field mapping from integer to double. Made exec() method internal (removed private visibility).
Importance Filtering Tests
brain-bar/Tests/BrainBarTests/DatabaseTests.swift
Added two new test cases: testImportanceFilterWorksWithDivergedRowids validates importance filtering when FTS5 rowids diverge from chunk rowids; testAllResultsRespectImportanceMin verifies that search results with importanceMin parameter only return chunks meeting the threshold.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰 A rabbit's ode to the database dance:
Where rowids once ruled with tyrannical stance,
Now chunk_ids lead with elegant grace,
FTS queries find their rightful place.
Importance flows smooth as a double-precision dream,
Triggers aligned—oh, what a team! 🎯

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and accurately describes the main fix: changing FTS5 JOIN logic from using rowid to using chunk_id, which is the core change addressing the production bug where rowid divergence caused filtering failures.
Docstring Coverage ✅ Passed Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/brainbar-importance-filter
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@EtanHey
Copy link
Copy Markdown
Owner Author

EtanHey commented Mar 18, 2026

@codex review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 18, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@EtanHey
Copy link
Copy Markdown
Owner Author

EtanHey commented Mar 18, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 18, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@brain-bar/Sources/BrainBar/BrainDatabase.swift`:
- Line 283: The test-accessible database helper function exec(_ sql: String)
currently relies on Swift's default internal visibility; make this explicit by
adding the internal access modifier to the function declaration (change func
exec(_ sql: String) to internal func exec(_ sql: String)) so tests can call
db.exec() and the intent is documented and protected from accidental visibility
changes.

In `@brain-bar/Tests/BrainBarTests/DatabaseTests.swift`:
- Around line 206-210: The tests currently check importance as either Int or
Double, but BrainDatabase.search() always returns a Double (it uses
sqlite3_column_double); remove the dead as? Int branches and simplify assertions
to treat importance as Double only (e.g., cast/guard let importance as? Double
and XCTAssertGreaterThanOrEqual(importance, 7.0, ...)) in the test methods
testImportanceFilterWorksWithDivergedRowids and
testAllResultsRespectImportanceMin so the tests fail fast if the return type
changes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8f4a7db4-6691-4efa-bb8e-a3a2aaff0a52

📥 Commits

Reviewing files that changed from the base of the PR and between 0cf1a78 and db8c796.

📒 Files selected for processing (2)
  • brain-bar/Sources/BrainBar/BrainDatabase.swift
  • brain-bar/Tests/BrainBarTests/DatabaseTests.swift
📜 Review details
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-14T02:20:54.656Z
Learning: Request codex review, cursor review, and bugbot review for BrainLayer PRs
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-12T14:22:54.809Z
Learning: Applies to src/brainlayer/**/*.py : For bulk database operations: stop enrichment workers first, checkpoint WAL before and after operations, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks with checkpoint every 3 batches, never delete from `chunks` while FTS trigger is active
📚 Learning: 2026-03-12T14:22:54.809Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-12T14:22:54.809Z
Learning: Applies to src/brainlayer/**/*.py : For bulk database operations: stop enrichment workers first, checkpoint WAL before and after operations, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks with checkpoint every 3 batches, never delete from `chunks` while FTS trigger is active

Applied to files:

  • brain-bar/Sources/BrainBar/BrainDatabase.swift
📚 Learning: 2026-03-18T00:12:08.774Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 87
File: brain-bar/Sources/BrainBar/BrainBarServer.swift:118-129
Timestamp: 2026-03-18T00:12:08.774Z
Learning: In Swift files under brain-bar/Sources/BrainBar, enforce that when a critical dependency like the database is nil due to startup ordering (socket before DB), any tool handler that accesses the database must throw an explicit error (e.g., ToolError.noDatabase) instead of returning a default/empty value. Do not allow silent defaults (e.g., guard let db else { return ... }). Flag patterns that silently return defaults when db is nil, as this masks startup timing issues. This guidance applies broadly to similar Swift files in the BrainBar module, not just this one location.

Applied to files:

  • brain-bar/Sources/BrainBar/BrainDatabase.swift
🔇 Additional comments (4)
brain-bar/Sources/BrainBar/BrainDatabase.swift (3)

87-107: LGTM! Triggers correctly use chunk_id linkage.

The updated triggers properly:

  • Insert without explicit rowid, storing new.id in the chunk_id UNINDEXED column
  • Delete by chunk_id = old.id instead of rowid
  • Update via delete-then-insert pattern using chunk_id

This matches the production trigger behavior and fixes the rowid divergence issue.


199-199: LGTM! Core fix for the rowid divergence bug.

Joining on c.id = f.chunk_id uses the stable UNINDEXED column that stores the actual chunk ID, rather than FTS5's auto-assigned rowid which diverges after table rebuilds. This ensures filters (importance_min, project, tag) check the correct chunk data.


233-233: LGTM! Correct type for importance column.

Using sqlite3_column_double aligns with the schema where importance is defined as REAL (line 73). This prevents unintended truncation that would occur with sqlite3_column_int.

brain-bar/Tests/BrainBarTests/DatabaseTests.swift (1)

160-211: Good test that reproduces the production bug scenario.

The test correctly simulates rowid divergence by:

  1. Inserting chunks with synced rowids via triggers
  2. Dropping and recreating FTS5 without explicit rowids
  3. Re-populating in reverse order to guarantee mismatch
  4. Verifying the importance filter returns correct data

This provides excellent regression coverage for the fix.

// MARK: - Helpers

private func exec(_ sql: String) {
func exec(_ sql: String) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider adding internal explicitly for clarity.

The change from private to default (internal) visibility is needed for tests to call db.exec(). While Swift's default is internal, making it explicit documents the intent and prevents accidental changes.

💡 Optional: Make visibility explicit
-    func exec(_ sql: String) {
+    internal func exec(_ sql: String) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@brain-bar/Sources/BrainBar/BrainDatabase.swift` at line 283, The
test-accessible database helper function exec(_ sql: String) currently relies on
Swift's default internal visibility; make this explicit by adding the internal
access modifier to the function declaration (change func exec(_ sql: String) to
internal func exec(_ sql: String)) so tests can call db.exec() and the intent is
documented and protected from accidental visibility changes.

Comment on lines +206 to +210
if let imp = importance as? Int {
XCTAssertGreaterThanOrEqual(imp, 7, "Returned importance must be >= 7")
} else if let imp = importance as? Double {
XCTAssertGreaterThanOrEqual(imp, 7.0, "Returned importance must be >= 7")
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Simplify importance type assertion—it's always Double.

Since BrainDatabase.search() uses sqlite3_column_double (line 233), the importance value is always a Double. The as? Int branches are dead code and could mask a regression if the type accidentally changes back.

♻️ Proposed fix for testImportanceFilterWorksWithDivergedRowids
         // Also verify the returned importance is correct (not from wrong row)
-        let importance = filtered.first?["importance"]
-        XCTAssertNotNil(importance, "importance should be present")
-        // Should be 9, not 2 (which would happen if rowids mapped to wrong chunk)
-        if let imp = importance as? Int {
-            XCTAssertGreaterThanOrEqual(imp, 7, "Returned importance must be >= 7")
-        } else if let imp = importance as? Double {
-            XCTAssertGreaterThanOrEqual(imp, 7.0, "Returned importance must be >= 7")
-        }
+        let importance = filtered.first?["importance"] as? Double
+        XCTAssertNotNil(importance, "importance should be present as Double")
+        // Should be 9, not 2 (which would happen if rowids mapped to wrong chunk)
+        XCTAssertGreaterThanOrEqual(importance!, 7.0, "Returned importance must be >= 7")
♻️ Proposed fix for testAllResultsRespectImportanceMin
         for result in results {
-            if let imp = result["importance"] as? Int {
-                XCTAssertGreaterThanOrEqual(imp, 7, "Every result must have importance >= 7")
-            } else if let imp = result["importance"] as? Double {
-                XCTAssertGreaterThanOrEqual(imp, 7.0, "Every result must have importance >= 7")
-            }
+            guard let imp = result["importance"] as? Double else {
+                XCTFail("importance should be present as Double")
+                continue
+            }
+            XCTAssertGreaterThanOrEqual(imp, 7.0, "Every result must have importance >= 7")
         }

Also applies to: 222-226

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@brain-bar/Tests/BrainBarTests/DatabaseTests.swift` around lines 206 - 210,
The tests currently check importance as either Int or Double, but
BrainDatabase.search() always returns a Double (it uses sqlite3_column_double);
remove the dead as? Int branches and simplify assertions to treat importance as
Double only (e.g., cast/guard let importance as? Double and
XCTAssertGreaterThanOrEqual(importance, 7.0, ...)) in the test methods
testImportanceFilterWorksWithDivergedRowids and
testAllResultsRespectImportanceMin so the tests fail fast if the return type
changes.

@EtanHey EtanHey merged commit 1fc7ba7 into main Mar 18, 2026
5 checks passed
@EtanHey EtanHey deleted the fix/brainbar-importance-filter branch March 18, 2026 11:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant