feat: Phase 3 — Brain Digest + Brain Entity tools #32

Merged
EtanHey merged 4 commits into main from feat/phase-3-brain-digest
Feb 25, 2026

Conversation

@EtanHey (Owner) commented Feb 25, 2026

Summary

  • brain_digest MCP tool: ingest raw content (transcripts, docs, articles) → extract entities, relations, sentiment, action items, decisions, questions. Creates searchable chunks with source="digest"
  • brain_entity MCP tool: look up entities in KG by name (FTS + semantic), returns relations and evidence chunks
  • user_verified column on kg_entities and kg_relations for human confirmation flags
  • CLI brainlayer digest command for terminal usage
  • New src/brainlayer/pipeline/digest.py module integrating Phase 2 (entity extraction) + Phase 6 (sentiment analysis)
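For orientation, a brain_digest call from an MCP client might carry a payload like the following. This is an illustrative sketch: the exact input schema lives in src/brainlayer/mcp/__init__.py, and the field names here simply mirror the CLI flags (--title, --project, --participants) listed below.

```json
{
  "tool": "brain_digest",
  "arguments": {
    "content": "Transcript: Alice and Bob agreed to ship the digest tool...",
    "title": "Planning sync",
    "project": "brainlayer",
    "participants": ["Alice", "Bob"]
  }
}
```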

Changes

| File | What |
| --- | --- |
| src/brainlayer/pipeline/digest.py | New: digest_content() + entity_lookup() + regex extractors for action items/decisions/questions |
| src/brainlayer/mcp/__init__.py | Add brain_digest + brain_entity tools (schema + handlers), update server instructions (3→5 tools) |
| src/brainlayer/vector_store.py | Add user_verified column migration for kg_entities + kg_relations |
| src/brainlayer/cli/__init__.py | Add brainlayer digest command with --file, --title, --project, --participants |
| tests/test_phase3_digest.py | 17 new tests covering schema, pipeline, MCP tools, entity lookup, integration |
| tests/test_kg_schema.py | Update column assertions for new user_verified column |

Test plan

  • 17 Phase 3 tests pass
  • 434 total tests pass, 9 skipped
  • ruff check src/ clean
  • brainlayer digest --help works
  • CodeRabbit review

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Added digest CLI command to analyze text and extract structured insights including entities, sentiment, actions, decisions, and questions
    • Introduced brain_digest and brain_entity MCP tools for content processing and entity lookup
    • Support for organizing digests by project and tracking participant metadata

Note

Medium Risk
Introduces new write paths into the DB (new chunk ingestion + KG writes) and applies schema migrations on startup, so correctness and compatibility with existing databases matter; changes are localized and covered by new tests.

Overview
Adds Phase 3 ingestion and lookup capabilities: brain_digest writes a new source="digest" chunk and runs entity/relation extraction + sentiment + basic action/decision/question parsing, while brain_entity performs KG entity lookup (FTS then semantic) and returns relations and evidence chunks.

Extends the KG schema by adding a user_verified column to kg_entities and kg_relations with lightweight migrations, exposes digestion via a new brainlayer digest CLI command, and updates/extends tests to cover the new schema, tools, and end-to-end digest→lookup flow (including MCP tool count from 3→5).

Written by Cursor Bugbot for commit 6d0dc39. This will update automatically on new commits.

Add brain_digest MCP tool for structured content ingestion (transcripts,
documents, articles) and brain_entity for KG entity lookup with evidence.

- digest.py: digest_content() creates chunk, extracts entities (Phase 2),
  analyzes sentiment (Phase 6), extracts action items/decisions/questions
- entity_lookup() searches entities via FTS + semantic fallback, returns
  relations and evidence chunks
- user_verified column on kg_entities and kg_relations tables
- CLI: brainlayer digest command (text or --file input)
- 17 new tests, 434 total pass, lint clean

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai Bot commented Feb 25, 2026

📝 Walkthrough

Introduces Phase 3 Brain Digest feature with a digest pipeline module that extracts structured knowledge from content (entities, relations, actions, decisions, questions, sentiment), adds CLI digest command and MCP brain_digest/brain_entity tools, updates KG schema with user_verified column, and establishes comprehensive tests.

Changes

Cohort / File(s) Summary
Planning Documentation
docs/plans/2026-02-25-phase-3-brain-digest.md
Detailed Phase 3 implementation plan covering goals, architecture, six implementation tasks, integration points with Phase 2/6, and file manifest.
Digest Pipeline
src/brainlayer/pipeline/digest.py
New digest_content function ingests text, creates chunks, extracts entities/relations via Phase 2, analyzes sentiment, extracts action items/decisions/questions via pattern matching, and returns DigestResult. Includes entity_lookup utility for FTS and semantic entity search with relations/evidence retrieval.
CLI & MCP Tooling
src/brainlayer/cli/__init__.py, src/brainlayer/mcp/__init__.py
CLI digest command accepts text/file input with optional metadata (title, project, participants) and outputs structured digest with stats. MCP layer adds brain_digest and brain_entity tools with corresponding handlers (_brain_digest, _brain_entity), tool definitions, and routing in call_tool dispatcher.
Database Schema
src/brainlayer/vector_store.py
Adds user_verified column (INTEGER DEFAULT 0) migrations to kg_entities and kg_relations tables for user verification flag support.
Test Suite
tests/test_phase3_digest.py, tests/test_kg_schema.py
test_kg_schema validates user_verified column presence and defaults on KG tables; test_phase3_digest covers digest pipeline behavior, chunk/entity/sentiment storage, confidence tiers, MCP tool schemas, entity_lookup with evidence, and end-to-end integration scenarios.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant CLI
    participant VectorStore
    participant EmbedModel
    participant Phase2Extractor
    participant SentimentAnalyzer
    participant KG as Knowledge Graph

    User->>CLI: digest --content "text" --title "Title" --participants "Alice,Bob"
    CLI->>VectorStore: Initialize store
    CLI->>EmbedModel: Load embedding model
    CLI->>Phase2Extractor: digest_content(content, store, embed_fn)
    Phase2Extractor->>VectorStore: Create digest chunk
    Phase2Extractor->>Phase2Extractor: Build seed entities from participants
    Phase2Extractor->>Phase2Extractor: Extract entities & relations (Phase 2)
    Phase2Extractor->>SentimentAnalyzer: Analyze sentiment (Phase 6)
    Phase2Extractor->>Phase2Extractor: Extract actions, decisions, questions (regex)
    Phase2Extractor->>EmbedModel: Embed chunk content
    Phase2Extractor->>VectorStore: Upsert chunk with embeddings
    Phase2Extractor->>KG: Update chunk sentiment
    Phase2Extractor->>Phase2Extractor: Compute confidence tiers
    Phase2Extractor-->>CLI: Return DigestResult
    CLI->>VectorStore: Close store
    CLI-->>User: Print digest ID, summary, sentiment, stats, entities, actions
sequenceDiagram
    actor User
    participant MCP
    participant VectorStore
    participant EmbedModel
    participant KG as Knowledge Graph

    User->>MCP: brain_entity(query="Alice", entity_type="person")
    MCP->>VectorStore: Initialize store
    MCP->>EmbedModel: Load embedding model
    MCP->>VectorStore: Full-text search for "Alice"
    alt FTS results found
        VectorStore-->>MCP: Return FTS matches
    else FTS empty
        MCP->>EmbedModel: Embed query
        MCP->>VectorStore: Semantic search (embeddings)
        VectorStore-->>MCP: Return semantic matches
    end
    MCP->>KG: Retrieve entity data & relations
    MCP->>KG: Fetch evidence chunks
    MCP-->>User: Return entity with relations and evidence

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • PR #29: Extends Knowledge Graph schema and VectorStore APIs; provides foundational KG table structure and vector storage interfaces that the digest pipeline directly depends on.
  • PR #31: Implements Phase 2 entity extraction logic; the digest pipeline integrates Phase 2 extraction as a core processing step and shares vector_store.py modifications.

Poem

🐰 A digest for the brain, how divine!
Chunks and entities, all in line,
Actions and sentiments, questions too,
Knowledge graph blooming, shiny and new!
Phase 3 hops forward, dreams take flight!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check: Passed. The title 'feat: Phase 3 — Brain Digest + Brain Entity tools' directly and clearly summarizes the main changes: introducing Phase 3 features with two new tools (brain_digest and brain_entity).
  • Docstring Coverage: Passed. Docstring coverage is 91.67%, which meets the required threshold of 80.00%.
  • Description Check: Passed. Check skipped because CodeRabbit's high-level summary is enabled.



@coderabbitai Bot left a comment

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/brainlayer/vector_store.py (1)

355-372: 🧹 Nitpick | 🔵 Trivial

user_verified should be declared in the CREATE TABLE DDL, not only via migration

Because user_verified is absent from both CREATE TABLE IF NOT EXISTS kg_entities and CREATE TABLE IF NOT EXISTS kg_relations, every fresh database init always executes PRAGMA table_info + ALTER TABLE for both tables. The migration guard was designed for upgrading existing databases, not for seeding new ones.

Including the column in the DDL (and keeping the migration guard) is idempotent, self-documenting, and avoids the unnecessary overhead on fresh installs.

💡 Proposed fix
         cursor.execute("""
             CREATE TABLE IF NOT EXISTS kg_entities (
                 id TEXT PRIMARY KEY,
                 entity_type TEXT NOT NULL,
                 name TEXT NOT NULL,
                 metadata TEXT DEFAULT '{}',
                 created_at TEXT DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ','now')),
                 updated_at TEXT DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ','now')),
+                user_verified INTEGER DEFAULT 0,
                 UNIQUE(entity_type, name)
             )
         """)
         cursor.execute("""
             CREATE TABLE IF NOT EXISTS kg_relations (
                 id TEXT PRIMARY KEY,
                 source_id TEXT NOT NULL,
                 target_id TEXT NOT NULL,
                 relation_type TEXT NOT NULL,
                 properties TEXT DEFAULT '{}',
                 confidence REAL DEFAULT 1.0,
                 created_at TEXT DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ','now')),
+                user_verified INTEGER DEFAULT 0,
                 UNIQUE(source_id, target_id, relation_type)
             )
         """)

The existing migration guards at lines 370-372 and 392-394 remain and continue to handle old databases correctly.
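The guard-plus-DDL pattern described above can be sketched with the stdlib sqlite3 module (the project itself uses APSW, and the helper name ensure_column is illustrative, not from the codebase):

```python
import sqlite3


def ensure_column(conn, table, column, ddl="INTEGER DEFAULT 0"):
    """Add `column` to `table` only if it is missing (idempotent migration guard)."""
    cols = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column not in cols:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl}")


conn = sqlite3.connect(":memory:")
# Fresh install: the column is already in the DDL, so the guard is a no-op.
conn.execute(
    "CREATE TABLE IF NOT EXISTS kg_entities ("
    "id TEXT PRIMARY KEY, name TEXT, user_verified INTEGER DEFAULT 0)"
)
ensure_column(conn, "kg_entities", "user_verified")  # no-op on fresh DBs
ensure_column(conn, "kg_entities", "user_verified")  # safe to call repeatedly
cols = {row[1] for row in conn.execute("PRAGMA table_info(kg_entities)")}
print("user_verified" in cols)
```

Declaring the column in the CREATE TABLE keeps fresh installs on the fast path, while the guard still upgrades databases created before the column existed.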

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/brainlayer/vector_store.py` around lines 355 - 372, The CREATE TABLE DDL
for kg_entities (and similarly for kg_relations) must include the user_verified
column so new DBs don't always run the migration: update the CREATE TABLE IF NOT
EXISTS kg_entities statement to declare "user_verified INTEGER DEFAULT 0" (and
do the same for kg_relations' CREATE TABLE) while keeping the existing migration
guard that checks PRAGMA table_info and the ALTER TABLE in functions/blocks
where those statements live; this ensures idempotent schema creation for fresh
installs and still upgrades old databases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/plans/2026-02-25-phase-3-brain-digest.md`:
- Line 13: The "### Task N:" headings jump from H1 to H3 causing markdownlint
MD001; change each "### Task N:" (e.g., "### Task 1: Add user_verified column to
KG tables") to an H2 ("## Task 1: ...") so headings increment H1→H2→H3
consistently, apply this change to all occurrences of "### Task" in the document
and re-run markdownlint to verify no MD001 warnings remain.

In `@src/brainlayer/cli/__init__.py`:
- Around line 364-431: The VectorStore instance created in digest (store =
VectorStore(DEFAULT_DB_PATH)) can leak if an exception is raised before the
happy-path store.close(); ensure the store is always closed by moving creation
outside the try or by introducing a finally block that calls store.close() when
store is not None. Update the digest function in src/brainlayer/cli/__init__.py
to wrap the core logic (calls to get_embedding_model, digest_content, and
console output) in try/finally (or set store = None before try and close in
finally) so that store.close() runs on both success and error paths; reference
VectorStore, DEFAULT_DB_PATH, store.close(), get_embedding_model, and
digest_content to locate the changes.

In `@src/brainlayer/mcp/__init__.py`:
- Around line 945-959: The timeout wrapper is ineffective because _brain_entity
does synchronous blocking I/O on the event loop, so asyncio.wait_for
(_with_timeout) cannot interrupt it; instead, run the blocking work off the loop
(use asyncio.get_running_loop().run_in_executor or asyncio.to_thread) and apply
_with_timeout to that task. Concretely, change the caller that returns await
_with_timeout(_brain_entity(...)) to schedule the blocking portion of
_brain_entity in an executor (or refactor the blocking logic into a sync helper
and call it via run_in_executor/to_thread) and await _with_timeout on the
resulting future; keep function names _brain_entity and _with_timeout referenced
so you update the correct call sites.
- Around line 1078-1107: The _brain_digest handler currently calls the
synchronous digest_content directly, which blocks the event loop; modify
_brain_digest to offload the blocking work to a thread executor (e.g., use
asyncio.get_running_loop().run_in_executor or asyncio.to_thread) when invoking
digest_content, passing store from _get_vector_store(), embed_fn=model.embed
from _get_embedding_model(), and the normalized project via
_normalize_project_name; preserve the existing try/except behavior and return
types (CallToolResult with TextContent on success, _error_result on exceptions)
so all embedding/DB/entity extraction/sentiment work runs off the MCP event
loop.
- Around line 1110-1134: _brain_entity blocks the event loop because it calls
the synchronous entity_lookup (which touches the DB and may call model.embed)
and lacks error handling; wrap the call to entity_lookup inside an executor
(e.g., use asyncio.get_running_loop().run_in_executor or asyncio.to_thread) and
pass store/_get_vector_store and model/_get_embedding_model as before, then
surround that awaited offloaded call with try/except to catch any exceptions,
log or convert the exception to a safe text message, and return a CallToolResult
containing an error TextContent instead of letting the traceback propagate;
refer to symbols _brain_entity, entity_lookup, _get_vector_store,
_get_embedding_model, and CallToolResult to locate and update the code.

In `@src/brainlayer/pipeline/digest.py`:
- Around line 91-103: The _classify_confidence function currently collapses
medium and low confidences into "needs_review", making
MEDIUM_CONFIDENCE_THRESHOLD unused; update the function to preserve a distinct
medium bucket by adding a third counter (e.g., medium = 0), increment medium
when conf >= MEDIUM_CONFIDENCE_THRESHOLD and < HIGH_CONFIDENCE_THRESHOLD, leave
low to the else branch, and return {"high_confidence": high,
"medium_confidence": medium, "needs_review": low}; reference
_classify_confidence, HIGH_CONFIDENCE_THRESHOLD, and MEDIUM_CONFIDENCE_THRESHOLD
when making the change.
- Around line 28-45: ACTION_PATTERNS[3] is too broad and will generate many
false positives; replace or remove it — either restrict it to
first‑person/imperative forms (e.g. anchors like ^\s*(?:I|we)\s+(?:will|need
to|should)\b or require the phrase to start a line/paragraph or be prefixed by
TODO/ACTION) or drop it and rely on the structured list patterns; also fix
ACTION_PATTERNS[1] which currently uses re.S and a lazy dot that can span the
whole document — remove the re.S flag and change the capture to
per-line/non-newline matching (e.g. use [^\n]+ or apply re.M with an explicit
line-based pattern) so a numbered item only captures its own line/block instead
of to the document end.

In `@tests/test_phase3_digest.py`:
- Around line 31-55: Multiple tests create VectorStore(tmp_path / "test.db")
inline without closing it, leaking APSW connections; fix by adding a shared
pytest fixture that yields a VectorStore and calls store.close() on teardown
(follow the pattern used in tests/test_kg_schema.py) and update affected tests
to accept that fixture (or alternatively ensure each test calls store.close() or
uses a context manager around VectorStore); reference the VectorStore
constructor usage and the store.close() method when making the change so all
inline creations are replaced or closed.

---

Outside diff comments:
In `@src/brainlayer/vector_store.py`:
- Around line 355-372: The CREATE TABLE DDL for kg_entities (and similarly for
kg_relations) must include the user_verified column so new DBs don't always run
the migration: update the CREATE TABLE IF NOT EXISTS kg_entities statement to
declare "user_verified INTEGER DEFAULT 0" (and do the same for kg_relations'
CREATE TABLE) while keeping the existing migration guard that checks PRAGMA
table_info and the ALTER TABLE in functions/blocks where those statements live;
this ensures idempotent schema creation for fresh installs and still upgrades
old databases.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5ffa38c and fd36c8b.

📒 Files selected for processing (7)
  • docs/plans/2026-02-25-phase-3-brain-digest.md
  • src/brainlayer/cli/__init__.py
  • src/brainlayer/mcp/__init__.py
  • src/brainlayer/pipeline/digest.py
  • src/brainlayer/vector_store.py
  • tests/test_kg_schema.py
  • tests/test_phase3_digest.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.11)
🧰 Additional context used
📓 Path-based instructions (5)
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Run tests using pytest from the project root

Files:

  • tests/test_kg_schema.py
  • tests/test_phase3_digest.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use ruff check src/ for linting and ruff format src/ for code formatting

Files:

  • src/brainlayer/vector_store.py
  • src/brainlayer/pipeline/digest.py
  • src/brainlayer/cli/__init__.py
  • src/brainlayer/mcp/__init__.py
src/brainlayer/vector_store.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use sqlite-vec with APSW for vector storage, WAL mode, and PRAGMA busy_timeout = 5000 for concurrent multi-process safety

Files:

  • src/brainlayer/vector_store.py
src/brainlayer/cli/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Enable project-specific indexing with brainlayer index --project <project_name> and incremental indexing with brainlayer index-fast

Files:

  • src/brainlayer/cli/__init__.py
src/brainlayer/mcp/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement MCP server with brain_search, brain_store, and brain_recall tools, maintaining backward compatibility with old brainlayer_* tool names

Files:

  • src/brainlayer/mcp/__init__.py
🧠 Learnings (2)
📚 Learning: 2026-02-23T16:51:38.317Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-23T16:51:38.317Z
Learning: Applies to src/brainlayer/mcp/**/*.py : Implement MCP server with brain_search, brain_store, and brain_recall tools, maintaining backward compatibility with old brainlayer_* tool names

Applied to files:

  • docs/plans/2026-02-25-phase-3-brain-digest.md
  • src/brainlayer/mcp/__init__.py
📚 Learning: 2026-02-23T16:51:38.317Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-23T16:51:38.317Z
Learning: Applies to src/brainlayer/pipeline/enrichment.py : Enrich chunks with 10 metadata fields: summary, tags, importance (1-10), intent, primary_symbols, resolved_query, epistemic_level, version_scope, debt_impact, and external_deps

Applied to files:

  • src/brainlayer/pipeline/digest.py
🧬 Code graph analysis (3)
src/brainlayer/cli/__init__.py (3)
src/brainlayer/embeddings.py (1)
  • get_embedding_model (109-114)
src/brainlayer/pipeline/digest.py (1)
  • digest_content (106-222)
src/brainlayer/vector_store.py (2)
  • VectorStore (72-2534)
  • close (2523-2528)
tests/test_phase3_digest.py (4)
src/brainlayer/vector_store.py (2)
  • VectorStore (72-2534)
  • upsert_chunks (490-549)
tests/test_kg_schema.py (1)
  • store (23-28)
src/brainlayer/pipeline/digest.py (2)
  • digest_content (106-222)
  • entity_lookup (225-286)
src/brainlayer/mcp/__init__.py (1)
  • list_tools (527-827)
src/brainlayer/mcp/__init__.py (1)
src/brainlayer/pipeline/digest.py (2)
  • digest_content (106-222)
  • entity_lookup (225-286)
🪛 markdownlint-cli2 (0.21.0)
docs/plans/2026-02-25-phase-3-brain-digest.md

[warning] 13-13: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

🔇 Additional comments (6)
tests/test_kg_schema.py (1)

74-82: LGTM — column set assertions correctly extended for user_verified

Both exact-set comparisons (kg_entities at line 77 and kg_relations at line 82) now include user_verified, matching the Phase 3 schema migrations. Using exact-set equality rather than in-checks provides good regression protection.

src/brainlayer/pipeline/digest.py (2)

225-286: LGTM — two-stage entity lookup is well-structured

FTS-first with semantic fallback is the right pattern here. The relation hydration correctly handles both outgoing (target_name/target_type) and incoming (source_name/source_type) directions, and the evidence truncation at 300 chars is a safe default for MCP response size.
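The FTS-first, semantic-fallback shape can be illustrated with a minimal in-memory sketch; substring matching stands in for the SQLite FTS query, and embed_fn for the embedding model:

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def lookup(query, entities, embed_fn):
    """Lexical match first; fall back to embedding similarity if it misses."""
    # Stage 1: cheap lexical match (stand-in for the FTS query).
    hits = [e for e in entities if query.lower() in e["name"].lower()]
    if hits:
        return hits
    # Stage 2: semantic fallback — rank entities by cosine similarity.
    q = embed_fn(query)
    return sorted(entities, key=lambda e: cosine(q, e["vec"]), reverse=True)[:1]


entities = [
    {"name": "Alice", "vec": [1.0, 0.0]},
    {"name": "Bob", "vec": [0.0, 1.0]},
]
embed = {"Alice": [1.0, 0.0], "alicia": [0.9, 0.1]}.get
print(lookup("alice", entities, embed)[0]["name"])   # lexical hit → Alice
print(lookup("alicia", entities, embed)[0]["name"])  # semantic fallback → Alice
```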


155-168: No issue found. store_extraction_result returns a Dict[str, str] mapping entity text → entity_id (as documented and implemented in batch_extraction.py line 95: entity_ids[entity.text] = entity_id). The code at line 162 correctly uses ext_entity.text as the lookup key, so all .get() calls will succeed for extracted entities that were stored.

Likely an incorrect or invalid review comment.

tests/test_phase3_digest.py (1)

299-308: The assertion on entities_found >= 2 is robust and does not need mocking

This test relies on seed entity matching (deterministic string matching), not Phase 2 NER. Both participants are explicitly mentioned in the content and will be found by seed matching with high confidence. No mocking or refactoring is needed.

src/brainlayer/mcp/__init__.py (2)

50-66: LGTM — Server instructions updated correctly for the two new tools.

Tool count is accurate (5 tools), and the new tool descriptions are clear and consistent with the existing format.


766-826: Tool definitions look well-structured and consistent with existing tools.

Both schemas follow the established patterns. brain_digest correctly uses _WRITE annotations (it creates chunks), and brain_entity correctly uses _READ_ONLY.


---

### Task 1: Add user_verified column to KG tables

⚠️ Potential issue | 🟡 Minor

Fix heading-level jump (MD001)

Line 13 opens at ### (H3) directly beneath the # (H1) title, skipping H2. markdownlint flags this as MD001 heading-increment.

🛠 Proposed fix
-### Task 1: Add user_verified column to KG tables
+## Task 1: Add user_verified column to KG tables

Apply consistently to all ### Task N: headings in the file.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
### Task 1: Add user_verified column to KG tables
## Task 1: Add user_verified column to KG tables
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 13-13: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/plans/2026-02-25-phase-3-brain-digest.md` at line 13, The "### Task N:"
headings jump from H1 to H3 causing markdownlint MD001; change each "### Task
N:" (e.g., "### Task 1: Add user_verified column to KG tables") to an H2 ("##
Task 1: ...") so headings increment H1→H2→H3 consistently, apply this change to
all occurrences of "### Task" in the document and re-run markdownlint to verify
no MD001 warnings remain.

Collapsed comment threads: src/brainlayer/cli/__init__.py (1); src/brainlayer/mcp/__init__.py (3, one outdated); src/brainlayer/pipeline/digest.py (2, one outdated)
Comment on lines +31 to +55
def test_user_verified_column_on_kg_entities(tmp_path):
    """kg_entities has user_verified column."""
    store = VectorStore(tmp_path / "test.db")
    cursor = store.conn.cursor()
    cols = {row[1] for row in cursor.execute("PRAGMA table_info(kg_entities)")}
    assert "user_verified" in cols


def test_user_verified_column_on_kg_relations(tmp_path):
    """kg_relations has user_verified column."""
    store = VectorStore(tmp_path / "test.db")
    cursor = store.conn.cursor()
    cols = {row[1] for row in cursor.execute("PRAGMA table_info(kg_relations)")}
    assert "user_verified" in cols


def test_user_verified_defaults_to_false(tmp_path):
    """user_verified defaults to 0 (false) on new entities."""
    store = VectorStore(tmp_path / "test.db")
    eid = store.upsert_entity("test-ent-1", "person", "Test Person")
    cursor = store.conn.cursor()
    row = list(cursor.execute(
        "SELECT user_verified FROM kg_entities WHERE id = ?", [eid]
    ))[0]
    assert row[0] == 0

⚠️ Potential issue | 🟠 Major

VectorStore instances are never closed — resource leak across all inline-created stores

Every test function in this file that creates VectorStore(tmp_path / "test.db") inline (lines 33, 41, 49, 65, 91, 115, 134, 151, 170, 183, 252, 275, 289) never calls store.close(). This leaks open APSW connections and WAL file handles. tests/test_kg_schema.py avoids this via a store fixture that properly yields and closes the instance.

The simplest fix is to add a shared store fixture (matching the pattern in test_kg_schema.py) and use it in each test, or to call store.close() / use a with-statement where a fixture isn't practical.

🛠 Proposed fixture + usage pattern
+import pytest
+from brainlayer.vector_store import VectorStore
+
+@pytest.fixture
+def store(tmp_path):
+    s = VectorStore(tmp_path / "test.db")
+    yield s
+    s.close()

Then each test that previously created its own store inline:

-def test_user_verified_column_on_kg_entities(tmp_path):
-    store = VectorStore(tmp_path / "test.db")
-    cursor = store.conn.cursor()
+def test_user_verified_column_on_kg_entities(store):
+    cursor = store.conn.cursor()

Tests that need the embed_fn can still accept both store and the mock_embedding fixture.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_phase3_digest.py` around lines 31 - 55, Multiple tests create
VectorStore(tmp_path / "test.db") inline without closing it, leaking APSW
connections; fix by adding a shared pytest fixture that yields a VectorStore and
calls store.close() on teardown (follow the pattern used in
tests/test_kg_schema.py) and update affected tests to accept that fixture (or
alternatively ensure each test calls store.close() or uses a context manager
around VectorStore); reference the VectorStore constructor usage and the
store.close() method when making the change so all inline creations are replaced
or closed.

EtanHey and others added 3 commits February 25, 2026 20:17
- Remove unused imports (json, patch) in test file
- Replace lambda assignments with def function (E731)
- Fix import sorting (I001)
- Update tool count test: 3 → 5 (brain_digest + brain_entity added)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Run MCP handlers via loop.run_in_executor to avoid blocking event loop
- Add error handling to brain_entity handler
- Fix VectorStore resource leak in CLI digest command (try/finally)
- Remove overly broad modal-verb action item pattern (false positives)
- Separate low_confidence tier from needs_review in confidence stats
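The tier separation from the last bullet can be sketched like this; the threshold values are assumptions, and the real constants (HIGH_CONFIDENCE_THRESHOLD, MEDIUM_CONFIDENCE_THRESHOLD) live in digest.py:

```python
# Illustrative thresholds — placeholders for the constants in digest.py.
HIGH_CONFIDENCE_THRESHOLD = 0.8
MEDIUM_CONFIDENCE_THRESHOLD = 0.5


def classify_confidence(confidences):
    """Bucket per-entity confidence scores into three distinct tiers."""
    high = medium = low = 0
    for conf in confidences:
        if conf >= HIGH_CONFIDENCE_THRESHOLD:
            high += 1
        elif conf >= MEDIUM_CONFIDENCE_THRESHOLD:
            medium += 1
        else:
            low += 1
    return {"high_confidence": high, "medium_confidence": medium, "needs_review": low}


print(classify_confidence([0.9, 0.6, 0.3]))
# → {'high_confidence': 1, 'medium_confidence': 1, 'needs_review': 1}
```

The fix matters because collapsing medium and low into one bucket left MEDIUM_CONFIDENCE_THRESHOLD unused and hid mid-confidence extractions behind the needs_review count.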

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@EtanHey EtanHey merged commit 6c5db6f into main Feb 25, 2026
6 checks passed

@cursor Bot left a comment
Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

lambda: digest_content(
    content=content,
    store=store,
    embed_fn=model.embed,

model.embed attribute doesn't exist on EmbeddingModel

High Severity

EmbeddingModel has embed_query and embed_chunks methods but no embed method. Passing model.embed as embed_fn will raise an AttributeError at runtime when brain_digest, brain_entity, or the CLI digest command is invoked. Tests pass because they use a _dummy_embed function that bypasses the real model entirely.
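One minimal fix for this is a thin adapter: pass the single-text method instead of the nonexistent attribute. The class below is a stand-in with only the interface Bugbot describes (embed_query/embed_chunks, no embed); the toy embedding is purely illustrative.

```python
class EmbeddingModel:
    """Stand-in mirroring the reported interface: no `embed` method."""

    def embed_query(self, text: str) -> list[float]:
        return [float(len(text))]  # toy embedding for demonstration

    def embed_chunks(self, texts: list[str]) -> list[list[float]]:
        return [self.embed_query(t) for t in texts]


model = EmbeddingModel()
# Instead of the nonexistent model.embed, hand over the single-text method:
embed_fn = model.embed_query
print(embed_fn("Alice"))  # → [5.0]
```

This also explains why the test suite stayed green: the tests inject a _dummy_embed function, so the missing attribute is never touched until a real model is wired in.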

Additional Locations (2)



store = _get_vector_store()
model = _get_embedding_model()
loop = asyncio.get_event_loop()

Inconsistent use of deprecated get_event_loop API

Low Severity

_brain_digest and _brain_entity use asyncio.get_event_loop(), while the rest of the codebase (e.g., _brain_search at line 1401) consistently uses asyncio.get_running_loop(). get_event_loop() is deprecated in Python 3.10+ for this use case and may be removed in future versions.

Additional Locations (1)

