fix: persist digest LLM entities by EtanHey · Pull Request #290 · EtanHey/brainlayer

EtanHey · 2026-05-17T20:02:23Z

Summary

Re-enable LLM entity extraction in the brain_digest batch extraction wrapper.
Add a regression test that digests PEOPLE-ROLES-style content and verifies entity_lookup can find a newly extracted person with evidence.

Root Cause

src/brainlayer/pipeline/batch_extraction.py:87 passed use_llm=llm_caller is not None into extract_entities_combined(). The normal MCP/CLI brain_digest path does not pass an explicit test llm_caller, so default Gemini extraction was silently disabled and non-seed people were never materialized into kg_entities / kg_entity_chunks.
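Below is a minimal, self-contained sketch of the failure mode. It assumes that `extract_entities_combined` falls back to a default Gemini caller when `llm_caller` is None, which matches what the Codex review describes for `extract_entities_llm()`. All function bodies here are hypothetical stand-ins, not the brainlayer code.

```python
from typing import Callable, List, Optional

# Hypothetical type for an LLM caller: takes chunk text, returns entity names.
LLMCaller = Callable[[str], List[str]]


def default_gemini_caller(text: str) -> List[str]:
    # Stand-in for the default Gemini fallback; "finds" one known person.
    return ["Example Person"] if "Example Person" in text else []


def extract_entities_combined(
    text: str,
    seed_entities: List[str],
    llm_caller: Optional[LLMCaller] = None,
    use_llm: bool = False,
) -> List[str]:
    # Seed matching always runs.
    found = [name for name in seed_entities if name in text]
    # LLM extraction runs only when use_llm is True; with no explicit
    # caller it falls back to the default (Gemini) caller.
    if use_llm:
        caller = llm_caller or default_gemini_caller
        found += [name for name in caller(text) if name not in found]
    return found


def process_chunk_buggy(
    text: str, seed_entities: List[str], llm_caller: Optional[LLMCaller] = None
) -> List[str]:
    # Before the fix: no explicit caller -> use_llm=False -> fallback never reached.
    return extract_entities_combined(
        text, seed_entities, llm_caller=llm_caller, use_llm=llm_caller is not None
    )


def process_chunk_fixed(
    text: str, seed_entities: List[str], llm_caller: Optional[LLMCaller] = None
) -> List[str]:
    # After the fix: always enable LLM extraction and let the callee pick the caller.
    return extract_entities_combined(
        text, seed_entities, llm_caller=llm_caller, use_llm=True
    )


chunk = "Example Person now leads the platform team alongside Seed Person."
print(process_chunk_buggy(chunk, ["Seed Person"]))  # ['Seed Person']
print(process_chunk_fixed(chunk, ["Seed Person"]))  # ['Seed Person', 'Example Person']
```

In this sketch, passing an explicit stub caller to `process_chunk_buggy` enables extraction. That suggests why tests that inject their own `llm_caller` would not have caught the regression.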

Test Plan

RED first: pytest -q tests/test_phase3_digest.py::test_digest_content_persists_llm_people_entities_for_lookup failed with entities_found == 0 before the fix.
pytest -q tests/test_phase3_digest.py::test_digest_content_persists_llm_people_entities_for_lookup
GOOGLE_API_KEY= GEMINI_API_KEY= pytest -q tests/test_phase3_digest.py tests/test_digest_pipeline_v2.py tests/test_mcp_digest_modes.py tests/test_kg_extraction.py tests/test_kg_rebuild.py tests/test_kg_relations.py tests/test_entity_extraction.py tests/test_entity_contracts.py tests/test_daemon_kg -> 185 passed, 6 skipped.
Pre-push test gate passed: 1980 passed, 9 skipped, 75 deselected, 1 xfailed; MCP registration 3 passed; isolated eval/hook routing 32 passed; bun 1 passed; FTS5 determinism shell passed.

Notes

A full pytest -q run under the system Python (outside the repo venv) failed during collection because of unrelated local dependency state: deepchecks is missing, and numba (pulled in through ranx) rejects NumPy 2.4. The repo pre-push gate uses .venv and passed.

Note

Medium Risk
Changes digest/batch extraction behavior to always run LLM-based entity extraction, which can increase external LLM calls/cost and introduce new failure modes if credentials/rate limits are misconfigured.
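One way to contain the failure modes described above is to check credentials explicitly and to treat a failing external call as "no LLM entities for this chunk" rather than a digest failure. This is a hypothetical sketch, not what this PR implements. The environment variable names come from the test plan's command (`GOOGLE_API_KEY= GEMINI_API_KEY= pytest ...`); treating them as the only credential sources is an assumption.

```python
import logging
import os
from typing import Callable, List, Mapping, Optional

logger = logging.getLogger(__name__)

# Assumption: these two variables are where a Gemini key would be configured.
GEMINI_KEY_VARS = ("GOOGLE_API_KEY", "GEMINI_API_KEY")


def gemini_key_configured(env: Optional[Mapping[str, str]] = None) -> bool:
    # An empty value (as in `GOOGLE_API_KEY= pytest ...`) counts as not configured.
    env = os.environ if env is None else env
    return any(env.get(name, "").strip() for name in GEMINI_KEY_VARS)


def safe_llm_entities(caller: Callable[[str], List[str]], text: str) -> List[str]:
    # Degrade to "no LLM entities" for this chunk instead of failing the whole
    # digest when the external call raises (bad key, rate limit, network error).
    try:
        return caller(text)
    except Exception as exc:  # deliberately broad: any provider error is non-fatal here
        logger.warning("LLM entity extraction failed, keeping seed entities only: %s", exc)
        return []
```

Skipping, retrying, or failing loudly are all defensible policies; this sketch only shows the non-fatal option, and the warning log keeps the failure visible.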

Overview
Fixes brain_digest entity persistence by forcing process_chunk to call extract_entities_combined(..., use_llm=True) even when no explicit llm_caller is passed.

Adds a regression test that stubs Gemini extraction, digests PEOPLE-ROLES content, and asserts newly LLM-extracted person entities are stored with evidence and retrievable via entity_lookup.
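The shape of that regression test can be sketched without the real database or Gemini. Everything below (`InMemoryKG`, this `digest_content` signature, `stub_llm_extract`, and the sample names) is hypothetical. The actual test in tests/test_phase3_digest.py monkeypatches Gemini extraction rather than injecting an extractor, and it checks results through the real entity_lookup.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class InMemoryKG:
    # Hypothetical stand-in for the kg_entities / kg_entity_chunks tables.
    entities: Dict[str, str] = field(default_factory=dict)        # name -> entity type
    evidence: Dict[str, List[str]] = field(default_factory=dict)  # name -> supporting chunks

    def upsert(self, name: str, entity_type: str, chunk: str) -> None:
        self.entities[name] = entity_type
        self.evidence.setdefault(name, []).append(chunk)

    def entity_lookup(self, name: str) -> Optional[dict]:
        if name not in self.entities:
            return None
        return {"name": name, "type": self.entities[name], "evidence": self.evidence[name]}


def digest_content(kg: InMemoryKG, text: str, extract: Callable[[str], List[str]]) -> None:
    # Hypothetical digest: one chunk per non-empty line; every extracted
    # person is persisted together with the chunk it came from.
    for chunk in (line.strip() for line in text.splitlines()):
        if not chunk:
            continue
        for name in extract(chunk):
            kg.upsert(name, "person", chunk)


def stub_llm_extract(chunk: str) -> List[str]:
    # Plays the role of the stubbed Gemini extraction: "Name: role" -> ["Name"].
    return [chunk.split(":", 1)[0].strip()]


def test_digest_persists_llm_people_for_lookup() -> None:
    content = "Alice Example: head of research\nBob Example: staff engineer\n"
    kg = InMemoryKG()
    digest_content(kg, content, stub_llm_extract)

    hit = kg.entity_lookup("Alice Example")
    assert hit is not None, "newly extracted person should be persisted"
    assert hit["type"] == "person"
    assert hit["evidence"] == ["Alice Example: head of research"]


if __name__ == "__main__":
    test_digest_persists_llm_people_for_lookup()
```

The assertion on `evidence` mirrors the PR's requirement that the person is stored together with its supporting chunk, not just as a bare name.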

Reviewed by Cursor Bugbot for commit 3b61cb1. Bugbot is set up for automated code reviews on this repo.

Note

Fix `process_chunk` to persist LLM-extracted entities during digest

Sets use_llm=True unconditionally in process_chunk when calling extract_entities_combined, fixing a bug where LLM-extracted entities were not persisted during digest. Previously, use_llm was True only when an explicit llm_caller was passed; the normal MCP/CLI brain_digest path passes none, so LLM extraction was silently skipped. A new test verifies that digest_content persists person entities from LLM extraction so they are retrievable via entity_lookup.

Macroscope summarized 3b61cb1.

Summary by CodeRabbit

Release Notes

Bug Fixes
- LLM-based entity extraction now runs consistently during digest, so extracted entities are persisted to the knowledge graph and remain available for subsequent lookup.


coderabbitai · 2026-05-17T20:02:33Z


Walkthrough

This PR decouples LLM entity extraction from the presence of a custom LLM caller by forcing process_chunk to always enable LLM usage, and validates the end-to-end behavior through an integration test that confirms LLM-extracted entities persist and are retrievable.

Changes

LLM Entity Extraction and Persistence

Layer	File(s)	Summary
Force LLM usage in entity extraction	`src/brainlayer/pipeline/batch_extraction.py`	`process_chunk` now unconditionally passes `use_llm=True` to `extract_entities_combined`, removing the prior condition that tied LLM execution to custom LLM caller presence.
Entity persistence and lookup validation	`tests/test_phase3_digest.py`	New test monkeypatches Gemini extraction to return three predetermined person entities, runs `digest_content` to persist them in the knowledge graph, and validates that `entity_lookup` retrieves a specific extracted person by name with evidence.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

EtanHey/brainlayer#188: Aligns with a prior refactor that decouples LLM execution from explicit caller provision, achieving consistent LLM usage control flow.
EtanHey/brainlayer#32: The forced LLM extraction behavior directly impacts the entity persistence pathway through digest_content and entity_lookup that this PR tests.

Poem

🐰 LLM entities now flow free,
No caller needed, extraction decree—
Through digest they hop and persist with care,
Lookup finds them everywhere! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'fix: persist digest LLM entities' directly summarizes the main change: re-enabling LLM entity extraction in the digest pipeline by fixing how use_llm is passed.
Docstring Coverage	✅ Passed	Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.




chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3b61cb1bc7


chatgpt-codex-connector · 2026-05-17T20:35:10Z

```diff
         seed_entities,
         llm_caller=llm_caller,
-        use_llm=llm_caller is not None,
+        use_llm=True,
```


Sanitize digest text before LLM extraction

When a Gemini key is configured, digest_content() reaches this path without an explicit llm_caller, so extract_entities_llm() falls back to call_gemini_for_extraction() and sends the raw chunk text to an external API. The existing digest Gemini enrichment path immediately below uses Sanitizer.from_env()/build_external_prompt() before calling Gemini, and pipeline/sanitize.py documents that PII is stripped before external LLM APIs; this change bypasses that guard for CLI/MCP brain_digest inputs containing names, emails, paths, or secrets. Please either keep default extraction local/opt-in or build the extraction prompt from sanitized content as well.
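One way to follow the second option in this suggestion is to build the extraction prompt only from redacted text. The sketch below is minimal and assumes simple regex rules. `sanitize_for_external` and `build_extraction_prompt` are hypothetical names, not brainlayer functions, and they do not reproduce Sanitizer.from_env() or build_external_prompt(). The sketch deliberately leaves names alone: a sanitizer that strips person names would also remove the very entities the extractor is meant to find, which is the tension this review points at.

```python
import re
from typing import List, Tuple

# Hypothetical redaction rules, applied in order.
REDACTIONS: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),      # email addresses
    (re.compile(r"(?:/Users|/home)/\S+"), "[PATH]"),                # home-directory paths
    (re.compile(r"\b(?:sk-|AIza)[A-Za-z0-9_-]{16,}"), "[SECRET]"),  # key-like tokens
]


def sanitize_for_external(text: str) -> str:
    # Replace each match with its placeholder; person names are left intact
    # because they are exactly what the extractor is asked to find.
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text


def build_extraction_prompt(chunk: str) -> str:
    # Only sanitized text ever reaches the external prompt.
    return "List the people mentioned and their roles:\n\n" + sanitize_for_external(chunk)
```

Regex rules like these are a floor, not a guarantee. The repo's existing sanitizer is the natural place to centralize whatever policy is chosen.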


Extends the Recent Hardening window from 2026-05-02 to 2026-05-17 and adds a "Phase 5 ship wave" subsection covering:

- PR #289: reject MCP-unavailable diagnostics and PreCompact checkpoint noise at the watcher / drain / store ingest heads; demote (not remove) any chunk with precompact/quarantine signals in hybrid rerank so explicit include_checkpoints callers still see them.
- PR #290: fix the KG persistence regression in process_chunk where use_llm=llm_caller is not None silently disabled Gemini entity extraction on the MCP/CLI digest path. Non-seed person entities were never materialized into kg_entities. This is the second recurrence of the same 2026-04-06 root cause; a RED-first regression test guards it.
- Enrichment LaunchAgent recovered after the 2026-05-15 11:50 IDT unload; com.brainlayer.enrichment verified live (launchctl PID present), draining the 56K-chunk backfill against the Gemini flex tier.

Every claim cites the merged PR by number.
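The "demote (not remove)" rule from PR #289 can be illustrated with a small rerank sketch. The `Hit` type, the tag names, and the 0.5 demotion factor are hypothetical; the actual hybrid rerank in brainlayer is not shown in this thread.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Hypothetical signal names and demotion factor.
DEMOTE_TAGS = frozenset({"precompact", "quarantine"})
DEMOTION_FACTOR = 0.5


@dataclass(frozen=True)
class Hit:
    chunk_id: str
    score: float
    tags: FrozenSet[str] = frozenset()


def rerank(hits: List[Hit], factor: float = DEMOTION_FACTOR) -> List[Hit]:
    # Demote, don't drop: flagged chunks stay in the result list, just lower,
    # so callers that explicitly ask for checkpoints can still reach them.
    def adjusted(hit: Hit) -> float:
        return hit.score * factor if hit.tags & DEMOTE_TAGS else hit.score

    return sorted(hits, key=adjusted, reverse=True)


hits = [Hit("ckpt", 0.9, frozenset({"precompact"})), Hit("a", 0.6), Hit("b", 0.4)]
print([h.chunk_id for h in rerank(hits)])  # ['a', 'ckpt', 'b']
```

A multiplicative penalty keeps the flagged chunk's relative ordering against other flagged chunks. Removing it outright would make it unreachable even for include_checkpoints callers.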

greptile-apps Bot reviewed May 17, 2026


fix: persist digest llm entities (3b61cb1)

EtanHey force-pushed the fix/kg-entity-persistence branch from 9a0cbc4 to 3b61cb1 on May 17, 2026 20:33

greptile-apps Bot reviewed May 17, 2026


chatgpt-codex-connector Bot reviewed May 17, 2026


EtanHey merged commit 6fd85eb into main May 17, 2026
7 checks passed

EtanHey deleted the fix/kg-entity-persistence branch May 17, 2026 21:15

EtanHey mentioned this pull request on May 17, 2026 in #291, "docs: add 2026-05-17 phase-5 ship wave to Recent Hardening" (merged).



EtanHey merged 1 commit into main from fix/kg-entity-persistence
