feat: Phase 3 — Brain Digest + Brain Entity tools #32

Merged
EtanHey merged 4 commits into main from feat/phase-3-brain-digest
Feb 25, 2026

Conversation

@EtanHey (Owner) commented Feb 25, 2026

Summary

  • brain_digest MCP tool: ingest raw content (transcripts, docs, articles) → extract entities, relations, sentiment, action items, decisions, questions. Creates searchable chunks with source="digest"
  • brain_entity MCP tool: look up entities in KG by name (FTS + semantic), returns relations and evidence chunks
  • user_verified column on kg_entities and kg_relations for human confirmation flags
  • CLI brainlayer digest command for terminal usage
  • New src/brainlayer/pipeline/digest.py module integrating Phase 2 (entity extraction) + Phase 6 (sentiment analysis)
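For orientation, a brain_digest call from an MCP client might carry a payload like the following. This is an illustrative sketch: the exact input schema lives in src/brainlayer/mcp/__init__.py, and the field names here simply mirror the CLI flags (--title, --project, --participants) listed below.

```json
{
  "tool": "brain_digest",
  "arguments": {
    "content": "Transcript: Alice and Bob agreed to ship the digest tool...",
    "title": "Planning sync",
    "project": "brainlayer",
    "participants": ["Alice", "Bob"]
  }
}
```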

Changes

| File | What |
| --- | --- |
| src/brainlayer/pipeline/digest.py | New: digest_content() + entity_lookup() + regex extractors for action items/decisions/questions |
| src/brainlayer/mcp/__init__.py | Add brain_digest + brain_entity tools (schema + handlers), update server instructions (3→5 tools) |
| src/brainlayer/vector_store.py | Add user_verified column migration for kg_entities + kg_relations |
| src/brainlayer/cli/__init__.py | Add brainlayer digest command with --file, --title, --project, --participants |
| tests/test_phase3_digest.py | 17 new tests covering schema, pipeline, MCP tools, entity lookup, integration |
| tests/test_kg_schema.py | Update column assertions for new user_verified column |

Test plan

  • 17 Phase 3 tests pass
  • 434 total tests pass, 9 skipped
  • ruff check src/ clean
  • brainlayer digest --help works
  • CodeRabbit review

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Added digest CLI command to analyze text and extract structured insights including entities, sentiment, actions, decisions, and questions
    • Introduced brain_digest and brain_entity MCP tools for content processing and entity lookup
    • Support for organizing digests by project and tracking participant metadata

Note

Medium Risk
Introduces new write paths into the DB (new chunk ingestion + KG writes) and applies schema migrations on startup, so correctness and compatibility with existing databases matter; changes are localized and covered by new tests.

Overview
Adds Phase 3 ingestion and lookup capabilities: brain_digest writes a new source="digest" chunk and runs entity/relation extraction + sentiment + basic action/decision/question parsing, while brain_entity performs KG entity lookup (FTS then semantic) and returns relations and evidence chunks.

Extends the KG schema by adding a user_verified column to kg_entities and kg_relations with lightweight migrations, exposes digestion via a new brainlayer digest CLI command, and updates/extends tests to cover the new schema, tools, and end-to-end digest→lookup flow (including MCP tool count from 3→5).

Written by Cursor Bugbot for commit 6d0dc39. This will update automatically on new commits.

Add brain_digest MCP tool for structured content ingestion (transcripts,
documents, articles) and brain_entity for KG entity lookup with evidence.

- digest.py: digest_content() creates chunk, extracts entities (Phase 2),
  analyzes sentiment (Phase 6), extracts action items/decisions/questions
- entity_lookup() searches entities via FTS + semantic fallback, returns
  relations and evidence chunks
- user_verified column on kg_entities and kg_relations tables
- CLI: brainlayer digest command (text or --file input)
- 17 new tests, 434 total pass, lint clean

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai Bot commented Feb 25, 2026

📝 Walkthrough

Introduces Phase 3 Brain Digest feature with a digest pipeline module that extracts structured knowledge from content (entities, relations, actions, decisions, questions, sentiment), adds CLI digest command and MCP brain_digest/brain_entity tools, updates KG schema with user_verified column, and establishes comprehensive tests.

Changes

Cohort / File(s) Summary
Planning Documentation
docs/plans/2026-02-25-phase-3-brain-digest.md
Detailed Phase 3 implementation plan covering goals, architecture, six implementation tasks, integration points with Phase 2/6, and file manifest.
Digest Pipeline
src/brainlayer/pipeline/digest.py
New digest_content function ingests text, creates chunks, extracts entities/relations via Phase 2, analyzes sentiment, extracts action items/decisions/questions via pattern matching, and returns DigestResult. Includes entity_lookup utility for FTS and semantic entity search with relations/evidence retrieval.
CLI & MCP Tooling
src/brainlayer/cli/__init__.py, src/brainlayer/mcp/__init__.py
CLI digest command accepts text/file input with optional metadata (title, project, participants) and outputs structured digest with stats. MCP layer adds brain_digest and brain_entity tools with corresponding handlers (_brain_digest, _brain_entity), tool definitions, and routing in call_tool dispatcher.
Database Schema
src/brainlayer/vector_store.py
Adds user_verified column (INTEGER DEFAULT 0) migrations to kg_entities and kg_relations tables for user verification flag support.
Test Suite
tests/test_phase3_digest.py, tests/test_kg_schema.py
test_kg_schema validates user_verified column presence and defaults on KG tables; test_phase3_digest covers digest pipeline behavior, chunk/entity/sentiment storage, confidence tiers, MCP tool schemas, entity_lookup with evidence, and end-to-end integration scenarios.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant CLI
    participant VectorStore
    participant EmbedModel
    participant Phase2Extractor
    participant SentimentAnalyzer
    participant KG as Knowledge Graph

    User->>CLI: digest --content "text" --title "Title" --participants "Alice,Bob"
    CLI->>VectorStore: Initialize store
    CLI->>EmbedModel: Load embedding model
    CLI->>Phase2Extractor: digest_content(content, store, embed_fn)
    Phase2Extractor->>VectorStore: Create digest chunk
    Phase2Extractor->>Phase2Extractor: Build seed entities from participants
    Phase2Extractor->>Phase2Extractor: Extract entities & relations (Phase 2)
    Phase2Extractor->>SentimentAnalyzer: Analyze sentiment (Phase 6)
    Phase2Extractor->>Phase2Extractor: Extract actions, decisions, questions (regex)
    Phase2Extractor->>EmbedModel: Embed chunk content
    Phase2Extractor->>VectorStore: Upsert chunk with embeddings
    Phase2Extractor->>KG: Update chunk sentiment
    Phase2Extractor->>Phase2Extractor: Compute confidence tiers
    Phase2Extractor-->>CLI: Return DigestResult
    CLI->>VectorStore: Close store
    CLI-->>User: Print digest ID, summary, sentiment, stats, entities, actions
sequenceDiagram
    actor User
    participant MCP
    participant VectorStore
    participant EmbedModel
    participant KG as Knowledge Graph

    User->>MCP: brain_entity(query="Alice", entity_type="person")
    MCP->>VectorStore: Initialize store
    MCP->>EmbedModel: Load embedding model
    MCP->>VectorStore: Full-text search for "Alice"
    alt FTS results found
        VectorStore-->>MCP: Return FTS matches
    else FTS empty
        MCP->>EmbedModel: Embed query
        MCP->>VectorStore: Semantic search (embeddings)
        VectorStore-->>MCP: Return semantic matches
    end
    MCP->>KG: Retrieve entity data & relations
    MCP->>KG: Fetch evidence chunks
    MCP-->>User: Return entity with relations and evidence

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • PR #29: Extends Knowledge Graph schema and VectorStore APIs; provides foundational KG table structure and vector storage interfaces that the digest pipeline directly depends on.
  • PR #31: Implements Phase 2 entity extraction logic; the digest pipeline integrates Phase 2 extraction as a core processing step and shares vector_store.py modifications.

Poem

🐰 A digest for the brain, how divine!
Chunks and entities, all in line,
Actions and sentiments, questions too,
Knowledge graph blooming, shiny and new!
Phase 3 hops forward, dreams take flight!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check: Passed. The title 'feat: Phase 3 — Brain Digest + Brain Entity tools' directly and clearly summarizes the main changes: introducing Phase 3 features with two new tools (brain_digest and brain_entity).
  • Docstring Coverage: Passed. Docstring coverage is 91.67%, which meets the required threshold of 80.00%.
  • Description Check: Passed. Check skipped because CodeRabbit's high-level summary is enabled.



@coderabbitai Bot left a comment

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/brainlayer/vector_store.py (1)

355-372: 🧹 Nitpick | 🔵 Trivial

user_verified should be declared in the CREATE TABLE DDL, not only via migration

Because user_verified is absent from both CREATE TABLE IF NOT EXISTS kg_entities and CREATE TABLE IF NOT EXISTS kg_relations, every fresh database init always executes PRAGMA table_info + ALTER TABLE for both tables. The migration guard was designed for upgrading existing databases, not for seeding new ones.

Including the column in the DDL (and keeping the migration guard) is idempotent, self-documenting, and avoids the unnecessary overhead on fresh installs.

💡 Proposed fix
         cursor.execute("""
             CREATE TABLE IF NOT EXISTS kg_entities (
                 id TEXT PRIMARY KEY,
                 entity_type TEXT NOT NULL,
                 name TEXT NOT NULL,
                 metadata TEXT DEFAULT '{}',
                 created_at TEXT DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ','now')),
                 updated_at TEXT DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ','now')),
+                user_verified INTEGER DEFAULT 0,
                 UNIQUE(entity_type, name)
             )
         """)
         cursor.execute("""
             CREATE TABLE IF NOT EXISTS kg_relations (
                 id TEXT PRIMARY KEY,
                 source_id TEXT NOT NULL,
                 target_id TEXT NOT NULL,
                 relation_type TEXT NOT NULL,
                 properties TEXT DEFAULT '{}',
                 confidence REAL DEFAULT 1.0,
                 created_at TEXT DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ','now')),
+                user_verified INTEGER DEFAULT 0,
                 UNIQUE(source_id, target_id, relation_type)
             )
         """)

The existing migration guards at lines 370-372 and 392-394 remain and continue to handle old databases correctly.
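The guard-plus-DDL pattern described above can be sketched with the stdlib sqlite3 module (the project itself uses APSW, and the helper name ensure_column is illustrative, not from the codebase):

```python
import sqlite3


def ensure_column(conn, table, column, ddl="INTEGER DEFAULT 0"):
    """Add `column` to `table` only if it is missing (idempotent migration guard)."""
    cols = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column not in cols:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl}")


conn = sqlite3.connect(":memory:")
# Fresh install: the column is already in the DDL, so the guard is a no-op.
conn.execute(
    "CREATE TABLE IF NOT EXISTS kg_entities ("
    "id TEXT PRIMARY KEY, name TEXT, user_verified INTEGER DEFAULT 0)"
)
ensure_column(conn, "kg_entities", "user_verified")  # no-op on fresh DBs
ensure_column(conn, "kg_entities", "user_verified")  # safe to call repeatedly
cols = {row[1] for row in conn.execute("PRAGMA table_info(kg_entities)")}
print("user_verified" in cols)
```

Declaring the column in the CREATE TABLE keeps fresh installs on the fast path, while the guard still upgrades databases created before the column existed.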

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/brainlayer/vector_store.py` around lines 355 - 372, The CREATE TABLE DDL
for kg_entities (and similarly for kg_relations) must include the user_verified
column so new DBs don't always run the migration: update the CREATE TABLE IF NOT
EXISTS kg_entities statement to declare "user_verified INTEGER DEFAULT 0" (and
do the same for kg_relations' CREATE TABLE) while keeping the existing migration
guard that checks PRAGMA table_info and the ALTER TABLE in functions/blocks
where those statements live; this ensures idempotent schema creation for fresh
installs and still upgrades old databases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/plans/2026-02-25-phase-3-brain-digest.md`:
- Line 13: The "### Task N:" headings jump from H1 to H3 causing markdownlint
MD001; change each "### Task N:" (e.g., "### Task 1: Add user_verified column to
KG tables") to an H2 ("## Task 1: ...") so headings increment H1→H2→H3
consistently, apply this change to all occurrences of "### Task" in the document
and re-run markdownlint to verify no MD001 warnings remain.

In `@src/brainlayer/cli/__init__.py`:
- Around line 364-431: The VectorStore instance created in digest (store =
VectorStore(DEFAULT_DB_PATH)) can leak if an exception is raised before the
happy-path store.close(); ensure the store is always closed by moving creation
outside the try or by introducing a finally block that calls store.close() when
store is not None. Update the digest function in src/brainlayer/cli/__init__.py
to wrap the core logic (calls to get_embedding_model, digest_content, and
console output) in try/finally (or set store = None before try and close in
finally) so that store.close() runs on both success and error paths; reference
VectorStore, DEFAULT_DB_PATH, store.close(), get_embedding_model, and
digest_content to locate the changes.

In `@src/brainlayer/mcp/__init__.py`:
- Around line 945-959: The timeout wrapper is ineffective because _brain_entity
does synchronous blocking I/O on the event loop, so asyncio.wait_for
(_with_timeout) cannot interrupt it; instead, run the blocking work off the loop
(use asyncio.get_running_loop().run_in_executor or asyncio.to_thread) and apply
_with_timeout to that task. Concretely, change the caller that returns await
_with_timeout(_brain_entity(...)) to schedule the blocking portion of
_brain_entity in an executor (or refactor the blocking logic into a sync helper
and call it via run_in_executor/to_thread) and await _with_timeout on the
resulting future; keep function names _brain_entity and _with_timeout referenced
so you update the correct call sites.
- Around line 1078-1107: The _brain_digest handler currently calls the
synchronous digest_content directly, which blocks the event loop; modify
_brain_digest to offload the blocking work to a thread executor (e.g., use
asyncio.get_running_loop().run_in_executor or asyncio.to_thread) when invoking
digest_content, passing store from _get_vector_store(), embed_fn=model.embed
from _get_embedding_model(), and the normalized project via
_normalize_project_name; preserve the existing try/except behavior and return
types (CallToolResult with TextContent on success, _error_result on exceptions)
so all embedding/DB/entity extraction/sentiment work runs off the MCP event
loop.
- Around line 1110-1134: _brain_entity blocks the event loop because it calls
the synchronous entity_lookup (which touches the DB and may call model.embed)
and lacks error handling; wrap the call to entity_lookup inside an executor
(e.g., use asyncio.get_running_loop().run_in_executor or asyncio.to_thread) and
pass store/_get_vector_store and model/_get_embedding_model as before, then
surround that awaited offloaded call with try/except to catch any exceptions,
log or convert the exception to a safe text message, and return a CallToolResult
containing an error TextContent instead of letting the traceback propagate;
refer to symbols _brain_entity, entity_lookup, _get_vector_store,
_get_embedding_model, and CallToolResult to locate and update the code.

In `@src/brainlayer/pipeline/digest.py`:
- Around line 91-103: The _classify_confidence function currently collapses
medium and low confidences into "needs_review", making
MEDIUM_CONFIDENCE_THRESHOLD unused; update the function to preserve a distinct
medium bucket by adding a third counter (e.g., medium = 0), increment medium
when conf >= MEDIUM_CONFIDENCE_THRESHOLD and < HIGH_CONFIDENCE_THRESHOLD, leave
low to the else branch, and return {"high_confidence": high,
"medium_confidence": medium, "needs_review": low}; reference
_classify_confidence, HIGH_CONFIDENCE_THRESHOLD, and MEDIUM_CONFIDENCE_THRESHOLD
when making the change.
- Around line 28-45: ACTION_PATTERNS[3] is too broad and will generate many
false positives; replace or remove it — either restrict it to
first‑person/imperative forms (e.g. anchors like ^\s*(?:I|we)\s+(?:will|need
to|should)\b or require the phrase to start a line/paragraph or be prefixed by
TODO/ACTION) or drop it and rely on the structured list patterns; also fix
ACTION_PATTERNS[1] which currently uses re.S and a lazy dot that can span the
whole document — remove the re.S flag and change the capture to
per-line/non-newline matching (e.g. use [^\n]+ or apply re.M with an explicit
line-based pattern) so a numbered item only captures its own line/block instead
of to the document end.

In `@tests/test_phase3_digest.py`:
- Around line 31-55: Multiple tests create VectorStore(tmp_path / "test.db")
inline without closing it, leaking APSW connections; fix by adding a shared
pytest fixture that yields a VectorStore and calls store.close() on teardown
(follow the pattern used in tests/test_kg_schema.py) and update affected tests
to accept that fixture (or alternatively ensure each test calls store.close() or
uses a context manager around VectorStore); reference the VectorStore
constructor usage and the store.close() method when making the change so all
inline creations are replaced or closed.

---

Outside diff comments:
In `@src/brainlayer/vector_store.py`:
- Around line 355-372: The CREATE TABLE DDL for kg_entities (and similarly for
kg_relations) must include the user_verified column so new DBs don't always run
the migration: update the CREATE TABLE IF NOT EXISTS kg_entities statement to
declare "user_verified INTEGER DEFAULT 0" (and do the same for kg_relations'
CREATE TABLE) while keeping the existing migration guard that checks PRAGMA
table_info and the ALTER TABLE in functions/blocks where those statements live;
this ensures idempotent schema creation for fresh installs and still upgrades
old databases.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5ffa38c and fd36c8b.

📒 Files selected for processing (7)
  • docs/plans/2026-02-25-phase-3-brain-digest.md
  • src/brainlayer/cli/__init__.py
  • src/brainlayer/mcp/__init__.py
  • src/brainlayer/pipeline/digest.py
  • src/brainlayer/vector_store.py
  • tests/test_kg_schema.py
  • tests/test_phase3_digest.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.11)
🧰 Additional context used
📓 Path-based instructions (5)
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Run tests using pytest from the project root

Files:

  • tests/test_kg_schema.py
  • tests/test_phase3_digest.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use ruff check src/ for linting and ruff format src/ for code formatting

Files:

  • src/brainlayer/vector_store.py
  • src/brainlayer/pipeline/digest.py
  • src/brainlayer/cli/__init__.py
  • src/brainlayer/mcp/__init__.py
src/brainlayer/vector_store.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use sqlite-vec with APSW for vector storage, WAL mode, and PRAGMA busy_timeout = 5000 for concurrent multi-process safety

Files:

  • src/brainlayer/vector_store.py
src/brainlayer/cli/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Enable project-specific indexing with brainlayer index --project <project_name> and incremental indexing with brainlayer index-fast

Files:

  • src/brainlayer/cli/__init__.py
src/brainlayer/mcp/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement MCP server with brain_search, brain_store, and brain_recall tools, maintaining backward compatibility with old brainlayer_* tool names

Files:

  • src/brainlayer/mcp/__init__.py
🧠 Learnings (2)
📚 Learning: 2026-02-23T16:51:38.317Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-23T16:51:38.317Z
Learning: Applies to src/brainlayer/mcp/**/*.py : Implement MCP server with brain_search, brain_store, and brain_recall tools, maintaining backward compatibility with old brainlayer_* tool names

Applied to files:

  • docs/plans/2026-02-25-phase-3-brain-digest.md
  • src/brainlayer/mcp/__init__.py
📚 Learning: 2026-02-23T16:51:38.317Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-23T16:51:38.317Z
Learning: Applies to src/brainlayer/pipeline/enrichment.py : Enrich chunks with 10 metadata fields: summary, tags, importance (1-10), intent, primary_symbols, resolved_query, epistemic_level, version_scope, debt_impact, and external_deps

Applied to files:

  • src/brainlayer/pipeline/digest.py
🧬 Code graph analysis (3)
src/brainlayer/cli/__init__.py (3)
src/brainlayer/embeddings.py (1)
  • get_embedding_model (109-114)
src/brainlayer/pipeline/digest.py (1)
  • digest_content (106-222)
src/brainlayer/vector_store.py (2)
  • VectorStore (72-2534)
  • close (2523-2528)
tests/test_phase3_digest.py (4)
src/brainlayer/vector_store.py (2)
  • VectorStore (72-2534)
  • upsert_chunks (490-549)
tests/test_kg_schema.py (1)
  • store (23-28)
src/brainlayer/pipeline/digest.py (2)
  • digest_content (106-222)
  • entity_lookup (225-286)
src/brainlayer/mcp/__init__.py (1)
  • list_tools (527-827)
src/brainlayer/mcp/__init__.py (1)
src/brainlayer/pipeline/digest.py (2)
  • digest_content (106-222)
  • entity_lookup (225-286)
🪛 markdownlint-cli2 (0.21.0)
docs/plans/2026-02-25-phase-3-brain-digest.md

[warning] 13-13: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

🔇 Additional comments (6)
tests/test_kg_schema.py (1)

74-82: LGTM — column set assertions correctly extended for user_verified

Both exact-set comparisons (kg_entities at line 77 and kg_relations at line 82) now include user_verified, matching the Phase 3 schema migrations. Using exact-set equality rather than in-checks provides good regression protection.

src/brainlayer/pipeline/digest.py (2)

225-286: LGTM — two-stage entity lookup is well-structured

FTS-first with semantic fallback is the right pattern here. The relation hydration correctly handles both outgoing (target_name/target_type) and incoming (source_name/source_type) directions, and the evidence truncation at 300 chars is a safe default for MCP response size.
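The FTS-first, semantic-fallback shape can be illustrated with a minimal in-memory sketch; substring matching stands in for the SQLite FTS query, and embed_fn for the embedding model:

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def lookup(query, entities, embed_fn):
    """Lexical match first; fall back to embedding similarity if it misses."""
    # Stage 1: cheap lexical match (stand-in for the FTS query).
    hits = [e for e in entities if query.lower() in e["name"].lower()]
    if hits:
        return hits
    # Stage 2: semantic fallback — rank entities by cosine similarity.
    q = embed_fn(query)
    return sorted(entities, key=lambda e: cosine(q, e["vec"]), reverse=True)[:1]


entities = [
    {"name": "Alice", "vec": [1.0, 0.0]},
    {"name": "Bob", "vec": [0.0, 1.0]},
]
embed = {"Alice": [1.0, 0.0], "alicia": [0.9, 0.1]}.get
print(lookup("alice", entities, embed)[0]["name"])   # lexical hit → Alice
print(lookup("alicia", entities, embed)[0]["name"])  # semantic fallback → Alice
```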


155-168: No issue found. store_extraction_result returns a Dict[str, str] mapping entity text → entity_id (as documented and implemented in batch_extraction.py line 95: entity_ids[entity.text] = entity_id). The code at line 162 correctly uses ext_entity.text as the lookup key, so all .get() calls will succeed for extracted entities that were stored.

Likely an incorrect or invalid review comment.

tests/test_phase3_digest.py (1)

299-308: The assertion on entities_found >= 2 is robust and does not need mocking

This test relies on seed entity matching (deterministic string matching), not Phase 2 NER. Both participants are explicitly mentioned in the content and will be found by seed matching with high confidence. No mocking or refactoring is needed.

src/brainlayer/mcp/__init__.py (2)

50-66: LGTM — Server instructions updated correctly for the two new tools.

Tool count is accurate (5 tools), and the new tool descriptions are clear and consistent with the existing format.


766-826: Tool definitions look well-structured and consistent with existing tools.

Both schemas follow the established patterns. brain_digest correctly uses _WRITE annotations (it creates chunks), and brain_entity correctly uses _READ_ONLY.


---

### Task 1: Add user_verified column to KG tables

⚠️ Potential issue | 🟡 Minor

Fix heading-level jump (MD001)

Line 13 opens at ### (H3) directly beneath the # (H1) title, skipping H2. markdownlint flags this as MD001 heading-increment.

🛠 Proposed fix
-### Task 1: Add user_verified column to KG tables
+## Task 1: Add user_verified column to KG tables

Apply consistently to all ### Task N: headings in the file.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
### Task 1: Add user_verified column to KG tables
## Task 1: Add user_verified column to KG tables
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 13-13: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/plans/2026-02-25-phase-3-brain-digest.md` at line 13, The "### Task N:"
headings jump from H1 to H3 causing markdownlint MD001; change each "### Task
N:" (e.g., "### Task 1: Add user_verified column to KG tables") to an H2 ("##
Task 1: ...") so headings increment H1→H2→H3 consistently, apply this change to
all occurrences of "### Task" in the document and re-run markdownlint to verify
no MD001 warnings remain.

Collapsed comment threads: src/brainlayer/cli/__init__.py (1); src/brainlayer/mcp/__init__.py (3, one outdated); src/brainlayer/pipeline/digest.py (2, one outdated)
Comment on lines +31 to +55
def test_user_verified_column_on_kg_entities(tmp_path):
    """kg_entities has user_verified column."""
    store = VectorStore(tmp_path / "test.db")
    cursor = store.conn.cursor()
    cols = {row[1] for row in cursor.execute("PRAGMA table_info(kg_entities)")}
    assert "user_verified" in cols


def test_user_verified_column_on_kg_relations(tmp_path):
    """kg_relations has user_verified column."""
    store = VectorStore(tmp_path / "test.db")
    cursor = store.conn.cursor()
    cols = {row[1] for row in cursor.execute("PRAGMA table_info(kg_relations)")}
    assert "user_verified" in cols


def test_user_verified_defaults_to_false(tmp_path):
    """user_verified defaults to 0 (false) on new entities."""
    store = VectorStore(tmp_path / "test.db")
    eid = store.upsert_entity("test-ent-1", "person", "Test Person")
    cursor = store.conn.cursor()
    row = list(cursor.execute(
        "SELECT user_verified FROM kg_entities WHERE id = ?", [eid]
    ))[0]
    assert row[0] == 0

⚠️ Potential issue | 🟠 Major

VectorStore instances are never closed — resource leak across all inline-created stores

Every test function in this file that creates VectorStore(tmp_path / "test.db") inline (lines 33, 41, 49, 65, 91, 115, 134, 151, 170, 183, 252, 275, 289) never calls store.close(). This leaks open APSW connections and WAL file handles. tests/test_kg_schema.py avoids this via a store fixture that properly yields and closes the instance.

The simplest fix is to add a shared store fixture (matching the pattern in test_kg_schema.py) and use it in each test, or to call store.close() / use a with-statement where a fixture isn't practical.

🛠 Proposed fixture + usage pattern
+import pytest
+from brainlayer.vector_store import VectorStore
+
+@pytest.fixture
+def store(tmp_path):
+    s = VectorStore(tmp_path / "test.db")
+    yield s
+    s.close()

Then each test that previously created its own store inline:

-def test_user_verified_column_on_kg_entities(tmp_path):
-    store = VectorStore(tmp_path / "test.db")
-    cursor = store.conn.cursor()
+def test_user_verified_column_on_kg_entities(store):
+    cursor = store.conn.cursor()

Tests that need the embed_fn can still accept both store and the mock_embedding fixture.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_phase3_digest.py` around lines 31 - 55, Multiple tests create
VectorStore(tmp_path / "test.db") inline without closing it, leaking APSW
connections; fix by adding a shared pytest fixture that yields a VectorStore and
calls store.close() on teardown (follow the pattern used in
tests/test_kg_schema.py) and update affected tests to accept that fixture (or
alternatively ensure each test calls store.close() or uses a context manager
around VectorStore); reference the VectorStore constructor usage and the
store.close() method when making the change so all inline creations are replaced
or closed.

EtanHey and others added 3 commits February 25, 2026 20:17
- Remove unused imports (json, patch) in test file
- Replace lambda assignments with def function (E731)
- Fix import sorting (I001)
- Update tool count test: 3 → 5 (brain_digest + brain_entity added)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Run MCP handlers via loop.run_in_executor to avoid blocking event loop
- Add error handling to brain_entity handler
- Fix VectorStore resource leak in CLI digest command (try/finally)
- Remove overly broad modal-verb action item pattern (false positives)
- Separate low_confidence tier from needs_review in confidence stats
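The tier separation from the last bullet can be sketched like this; the threshold values are assumptions, and the real constants (HIGH_CONFIDENCE_THRESHOLD, MEDIUM_CONFIDENCE_THRESHOLD) live in digest.py:

```python
# Illustrative thresholds — placeholders for the constants in digest.py.
HIGH_CONFIDENCE_THRESHOLD = 0.8
MEDIUM_CONFIDENCE_THRESHOLD = 0.5


def classify_confidence(confidences):
    """Bucket per-entity confidence scores into three distinct tiers."""
    high = medium = low = 0
    for conf in confidences:
        if conf >= HIGH_CONFIDENCE_THRESHOLD:
            high += 1
        elif conf >= MEDIUM_CONFIDENCE_THRESHOLD:
            medium += 1
        else:
            low += 1
    return {"high_confidence": high, "medium_confidence": medium, "needs_review": low}


print(classify_confidence([0.9, 0.6, 0.3]))
# → {'high_confidence': 1, 'medium_confidence': 1, 'needs_review': 1}
```

The fix matters because collapsing medium and low into one bucket left MEDIUM_CONFIDENCE_THRESHOLD unused and hid mid-confidence extractions behind the needs_review count.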

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@EtanHey EtanHey merged commit 6c5db6f into main Feb 25, 2026
6 checks passed

@cursor Bot left a comment
Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

lambda: digest_content(
    content=content,
    store=store,
    embed_fn=model.embed,

model.embed attribute doesn't exist on EmbeddingModel

High Severity

EmbeddingModel has embed_query and embed_chunks methods but no embed method. Passing model.embed as embed_fn will raise an AttributeError at runtime when brain_digest, brain_entity, or the CLI digest command is invoked. Tests pass because they use a _dummy_embed function that bypasses the real model entirely.
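One minimal fix for this is a thin adapter: pass the single-text method instead of the nonexistent attribute. The class below is a stand-in with only the interface Bugbot describes (embed_query/embed_chunks, no embed); the toy embedding is purely illustrative.

```python
class EmbeddingModel:
    """Stand-in mirroring the reported interface: no `embed` method."""

    def embed_query(self, text: str) -> list[float]:
        return [float(len(text))]  # toy embedding for demonstration

    def embed_chunks(self, texts: list[str]) -> list[list[float]]:
        return [self.embed_query(t) for t in texts]


model = EmbeddingModel()
# Instead of the nonexistent model.embed, hand over the single-text method:
embed_fn = model.embed_query
print(embed_fn("Alice"))  # → [5.0]
```

This also explains why the test suite stayed green: the tests inject a _dummy_embed function, so the missing attribute is never touched until a real model is wired in.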

Additional Locations (2)



store = _get_vector_store()
model = _get_embedding_model()
loop = asyncio.get_event_loop()

Inconsistent use of deprecated get_event_loop API

Low Severity

_brain_digest and _brain_entity use asyncio.get_event_loop(), while the rest of the codebase (e.g., _brain_search at line 1401) consistently uses asyncio.get_running_loop(). get_event_loop() is deprecated in Python 3.10+ for this use case and may be removed in future versions.

Additional Locations (1)

