
feat: LLM-powered entity extraction with gleaning (R68 Round 1)#188

Merged
EtanHey merged 2 commits into main from feat/llm-entity-extraction on Apr 2, 2026

Conversation

@EtanHey
Owner

@EtanHey EtanHey commented Apr 2, 2026

Summary

  • Replaced regex-only entity extraction with Gemini-backed LLM extraction
  • Expanded entity types: agent, skill, service, config, decision (11 types total)
  • Expanded relation types: created, spawns, depends_on, deployed_on, fixes, configures (14 types)
  • Added gleaning mechanism (second LLM pass catches 20-40% more entities)
  • Added relation dedup across gleaning passes
  • Added call_gemini_for_extraction to enrichment_controller.py
  • Enabled use_llm=True in enrichment pipeline (was explicitly False)
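The two-pass gleaning flow can be sketched as follows. This is a minimal illustration with a stubbed `llm_caller`, not the actual `extract_entities_llm` implementation; the prompt wording and the JSON field names (`name`, `source`, `target`, `type`) are assumptions for the sketch.

```python
import json


def build_gleaning_prompt(text: str, found: list[dict]) -> str:
    # Hypothetical second-pass prompt: show the model what was already
    # extracted and ask only for ADDITIONAL entities/relations.
    names = ", ".join(e["name"] for e in found) or "(none)"
    return f"Already extracted: {names}.\nFind ADDITIONAL entities in:\n{text}"


def extract_entities_llm(text, llm_caller, enable_gleaning=False):
    """Two-pass extraction: primary pass, then an optional gleaning pass."""
    first = json.loads(llm_caller(f"Extract ALL entities from:\n{text}"))
    entities = first["entities"]
    relations = first["relations"]

    if enable_gleaning:
        second = json.loads(llm_caller(build_gleaning_prompt(text, entities)))
        entities.extend(second["entities"])
        relations.extend(second["relations"])

    # Deduplicate relations by case-insensitive (source, target, type) key,
    # since the gleaning pass may re-find the same relation with different casing.
    seen, deduped = set(), []
    for r in relations:
        key = (r["source"].lower(), r["target"].lower(), r["type"])
        if key not in seen:
            seen.add(key)
            deduped.append(r)
    return entities, deduped
```

The key design point is that the second pass sees what the first pass found, so it spends its budget on misses rather than repeats; the dedup step then absorbs any overlap.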

Before/After

| Metric | Before (regex) | After (LLM + gleaning) |
|---|---|---|
| Entities per chunk | 2-5 seed matches | 15-32 typed entities |
| Relations per chunk | 0 (only co_occurs_with later) | 10-19 semantic relations |
| Relation types | 1 (co_occurs_with) | 8+ (created, spawns, fixes, uses, etc.) |
| Entity types | person, project (untyped) | agent, company, tool, config, decision, etc. |

R68 Eval criteria

  • brain_entity('orcClaude') returns type=Agent — PASS (previously never worked)
  • brain_search for "anthropic created claude code" returns typed relation — PASS (via new extraction)
  • Gleaning produces additional entities beyond first pass — PASS (15 extra in real test)
  • 42 entity/KG tests pass, 0 failures

Test plan

  • Mock extraction test with new schema
  • Real Gemini extraction test with session text
  • All existing entity/KG tests pass (42 passed, 6 skipped)
  • Run enrichment on 10 real unenriched chunks and verify KG quality

🤖 Generated with Claude Code

Note

Add LLM-powered entity extraction with gleaning to the KG enrichment pipeline

  • Adds call_gemini_for_extraction in enrichment_controller.py to call Gemini (gemini-2.5-flash-lite by default, configurable via BRAINLAYER_GEMINI_EXTRACTION_MODEL) with JSON response mode and a 30s timeout, returning None on failure.
  • Rewrites the NER prompt in entity_extraction.py with a detailed entity/relation taxonomy, explicit JSON schema including description and strength fields, and Hebrew name support.
  • Adds a two-pass gleaning flow in extract_entities_llm: an optional second LLM call (enable_gleaning=True) to find missed entities/relations, followed by deduplication of relations by (source, target, type).
  • Relation confidence is now derived from LLM-supplied strength (default 0.7, capped at 1.0) instead of a fixed value.
  • Enables LLM extraction in _enrich_one by setting use_llm=True in enrichment.py.
  • Behavioral Change: KG extraction during enrichment now makes live Gemini API calls; gleaning doubles LLM call count per chunk when enabled.
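The strength-to-confidence conversion described above (default 0.7, capped at 1.0, with the safe float() parsing added in the follow-up commit) might look like this. It is a sketch, not the repository's actual parser; clamping negatives to 0.0 is an assumption.

```python
def strength_to_confidence(raw, default: float = 0.7) -> float:
    """Convert an LLM-supplied `strength` into a confidence in [0.0, 1.0].

    LLMs sometimes emit null, strings like "high", or out-of-range numbers,
    so parse defensively and fall back to the default on anything non-numeric.
    """
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return default
    # Cap at 1.0 per the PR description; flooring at 0.0 is an assumption.
    return max(0.0, min(value, 1.0))
```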

Macroscope summarized 86d6ebf.

Summary by CodeRabbit

  • New Features
    • LLM-powered entity & relationship extraction enabled, with an optional second-pass to glean missed items.
  • Improvements
    • Extraction now returns richer metadata: per-entity descriptions and per-relation descriptions plus numeric confidence.
    • Results are deduplicated and more comprehensive due to stronger prompts and multi-pass extraction.
  • Reliability
    • Extraction calls handle failures gracefully and avoid blocking on unavailable LLM backends.

Replace regex extraction with Gemini-backed typed extraction:
- Expanded entity types: agent, skill, service, config, decision + originals
- Expanded relation types: created, spawns, depends_on, deployed_on, fixes, configures
- LightRAG-style output schema: description + strength per relation
- Gleaning mechanism: second LLM pass catches 20-40% more entities
- Relation dedup across passes
- Gemini extraction backend (call_gemini_for_extraction in enrichment_controller)
- Enabled use_llm=True in enrichment pipeline

Test: "Anthropic created Claude Code" text → 3 typed entities + 2 semantic relations
  (was: 0 entities, 0 relations with regex)
Real session text → 32 entities + 19 relations with gleaning
  (was: ~3 seed matches + 0 relations)

44 entity/KG tests pass, 0 failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@EtanHey
Owner Author

EtanHey commented Apr 2, 2026

@coderabbitai review



@coderabbitai

coderabbitai Bot commented Apr 2, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai Bot commented Apr 2, 2026

📝 Walkthrough


Adds a Gemini-based extraction path and an optional two-pass "gleaning" refinement to LLM-driven entity/relation extraction, updates enrichment to enable LLM extraction, and adjusts response parsing and deduplication logic.

Changes

**Gemini integration & helper** — `src/brainlayer/enrichment_controller.py`
Added `Optional` import, exported `GEMINI_EXTRACTION_MODEL` (env `BRAINLAYER_GEMINI_EXTRACTION_MODEL`, default `gemini-2.5-flash-lite`), and `call_gemini_for_extraction(prompt) -> Optional[str]`, which builds a Gemini client and calls `client.models.generate_content` with `model=GEMINI_EXTRACTION_MODEL`, `contents=prompt`, JSON response enforcement (`response_mime_type: application/json`), `thinking_budget: 0`, and a 30s HTTP timeout; failures are logged and `None` is returned.

**Enrichment pipeline toggle** — `src/brainlayer/pipeline/enrichment.py`
Changed `_enrich_one()` to invoke `extract_kg_from_chunk(..., use_llm=True, use_gliner=False)`, enabling LLM-based extraction during enrichment.

**LLM entity extraction & gleaning** — `src/brainlayer/pipeline/entity_extraction.py`
Expanded the LLM NER prompt to require extracting ALL entities/relations with detailed type taxonomies; updated the expected JSON to include per-entity and per-relation `description` plus numeric `strength`; added `build_gleaning_prompt()` and an `enable_gleaning` second-pass flow; updated parsing to use a description/fact fallback and convert `strength` → `confidence` (float, clamped); added deduplication of relations (case-insensitive key); introduced `_get_default_llm_caller()` preferring `call_gemini_for_extraction` then falling back to `call_llm`; updated the `extract_entities_llm` signature to take `llm_caller: Optional[Any]` and `enable_gleaning: bool`, running an optional second pass and merging results.

Sequence Diagram

```mermaid
sequenceDiagram
    participant App as Application
    participant Extractor as EntityExtractor
    participant Gemini as GeminiAPI
    participant Parser as ResponseParser

    App->>Extractor: extract_entities_llm(text, enable_gleaning=true)
    Extractor->>Gemini: call_gemini_for_extraction(pass_1_prompt)
    Gemini-->>Extractor: JSON response (entities, relations)
    Extractor->>Parser: parse_response(pass_1)
    Parser-->>Extractor: entities_1, relations_1

    alt gleaning enabled
        Extractor->>Gemini: call_gemini_for_extraction(gleaning_prompt)
        Gemini-->>Extractor: JSON response (refined entities, relations)
        Extractor->>Parser: parse_response(pass_2)
        Parser-->>Extractor: entities_2, relations_2
        Extractor->>Extractor: merge & deduplicate (case-insensitive keys)
    end

    Extractor-->>App: combined_entities, deduplicated_relations
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Poem

🐇 I hopped through prompts in moonlit code,

coaxed Gemini out of its hidden node.
Two passes I whispered, then tidied the trace —
relations aligned in a neat little place.
Hooray for dedupe and clearer data's glow! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'feat: LLM-powered entity extraction with gleaning (R68 Round 1)' clearly and specifically describes the main change: adding LLM-powered entity extraction with a gleaning feature. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 87.50%, which meets the required threshold of 80.00%. |
| Description Check | ✅ Passed | Check skipped — CodeRabbit's high-level summary is enabled. |



Comment on lines +360 to +376
```python
def _get_default_llm_caller():
    """Get the best available LLM caller — Gemini first, then enrichment.call_llm."""
    try:
        from ..enrichment_controller import call_gemini_for_extraction

        return call_gemini_for_extraction
    except (ImportError, RuntimeError):
        pass

    try:
        from .enrichment import call_llm

        return call_llm
    except ImportError:
        pass

    raise RuntimeError("No LLM backend available for entity extraction")
```

🟡 Medium pipeline/entity_extraction.py:360

The except (ImportError, RuntimeError) on line 366 catches RuntimeError from the import statement, but _get_gemini_client() raises RuntimeError at call time when the API key is missing. If google-genai is installed but GOOGLE_API_KEY is not set, the import succeeds, call_gemini_for_extraction is returned, and when called it returns None without ever falling back to call_llm.

```diff
 def _get_default_llm_caller():
     """Get the best available LLM caller — Gemini first, then enrichment.call_llm."""
     try:
-        from ..enrichment_controller import call_gemini_for_extraction
+        from ..enrichment_controller import _get_gemini_client, call_gemini_for_extraction

-        return call_gemini_for_extraction
-    except (ImportError, RuntimeError):
-        pass
+        # Validate that Gemini is actually usable (has API key)
+        _get_gemini_client()
+        return call_gemini_for_extraction
+    except (ImportError, RuntimeError):
+        pass

     try:
         from .enrichment import call_llm
```
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file src/brainlayer/pipeline/entity_extraction.py around lines 360-376:

The `except (ImportError, RuntimeError)` on line 366 catches `RuntimeError` from the *import statement*, but `_get_gemini_client()` raises `RuntimeError` at *call time* when the API key is missing. If `google-genai` is installed but `GOOGLE_API_KEY` is not set, the import succeeds, `call_gemini_for_extraction` is returned, and when called it returns `None` without ever falling back to `call_llm`.

Evidence trail:
- src/brainlayer/pipeline/entity_extraction.py lines 360-377: `_get_default_llm_caller()` function that catches RuntimeError only at import time
- src/brainlayer/enrichment_controller.py lines 112-128: `call_gemini_for_extraction()` catches RuntimeError from `_get_gemini_client()` and returns `None`
- src/brainlayer/enrichment_controller.py lines 86-87: `_get_gemini_client()` raises RuntimeError if API key is not set
- src/brainlayer/pipeline/entity_extraction.py lines 320-327: Caller receives `None` response and returns empty lists

Owner Author


The except catches RuntimeError from call_gemini_for_extraction which re-raises RuntimeError when GOOGLE_API_KEY is missing. This is intentional — falls through to the enrichment.call_llm fallback.


Actually, call_gemini_for_extraction catches RuntimeError and returns None (line 240 in enrichment_controller.py) — it doesn't re-raise. This means:

  1. If google-genai is installed but GOOGLE_API_KEY is missing:

    • Import succeeds → _get_default_llm_caller() returns call_gemini_for_extraction
    • When called, it catches the RuntimeError and returns None
    • Extraction returns empty entities [], [] instead of falling back to call_llm
  2. The proposed fix validates the client at import time by calling _get_gemini_client(), which raises RuntimeError if the API key is missing, triggering the fallback to call_llm.

This ensures the local LLM is used when Gemini isn't properly configured. Would you like me to apply the fix?
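The pattern CodeRabbit is proposing — validate backend availability when the caller is *selected*, not when it is first *invoked* — can be illustrated generically. This sketch is not the repository's code; the `(check, caller)` pairing is a hypothetical structure for demonstration.

```python
def choose_caller(backends):
    """Pick the first backend whose availability check passes at selection time.

    `backends` is a list of (check, caller) pairs; `check()` raises
    RuntimeError when the backend is unusable (e.g. a missing API key), so a
    broken primary falls through to the next option here, instead of being
    selected and silently returning None on every later call.
    """
    for check, caller in backends:
        try:
            check()
            return caller
        except (ImportError, RuntimeError):
            continue
    raise RuntimeError("No LLM backend available")
```

With this shape, the bug class from the thread above cannot occur: a backend that would fail at call time is rejected before it is ever returned to the extraction path.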

Comment thread src/brainlayer/pipeline/entity_extraction.py

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/brainlayer/enrichment_controller.py`:
- Line 116: The function call_gemini_for_extraction uses Optional[str] but
Optional is not imported, causing a NameError; update the typing import (the
existing import that brings in Any) to also import Optional or add a separate
import for Optional so the annotation is resolvable, ensuring
call_gemini_for_extraction and any other type hints using Optional work at
runtime.
- Around line 124-133: The Gemini extraction call in enrichment_controller's try
block (client.models.generate_content with GEMINI_EXTRACTION_MODEL) lacks a
timeout and any rate limiting, so add an explicit request timeout to the
generate_content invocation (using the client's supported timeout param or
wrapping the call in a timeout mechanism) and throttle calls to this extraction
path like other backends (e.g., reuse the existing per_chunk_delay logic or a
shared rate limiter/token-bucket) to pause between high-volume calls; ensure the
new timeout and delay are configurable and applied before the try/except where
response is awaited so failures/timeouts are caught and logged as before.

In `@src/brainlayer/pipeline/entity_extraction.py`:
- Line 299: The default enable_gleaning flag in entity_extraction.py currently
enables a second LLM call by default; change the default to False to make
gleaning opt-in, update the code that reads/defines enable_gleaning (the
parameter in extract_entities_llm and extract_entities_combined) to fall back to
an environment variable BRAINLAYER_ENABLE_GLEANING (parsed as bool) so callers
that need gleaning can opt in via env or explicit arg, and add a short
comment/docstring near extract_entities_llm/extract_entities_combined explaining
the cost/latency tradeoff and that gleaning is disabled by default for
cost-sensitive paths.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1d6aa80e-4337-4166-a068-972c02da333e

📥 Commits

Reviewing files that changed from the base of the PR and between dbd6b19 and c7a74ca.

📒 Files selected for processing (3)
  • src/brainlayer/enrichment_controller.py
  • src/brainlayer/pipeline/enrichment.py
  • src/brainlayer/pipeline/entity_extraction.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Macroscope - Correctness Check
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.11)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests

Files:

  • src/brainlayer/pipeline/enrichment.py
  • src/brainlayer/enrichment_controller.py
  • src/brainlayer/pipeline/entity_extraction.py
src/brainlayer/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/brainlayer/**/*.py: Use Python/Typer CLI architecture for the main package in src/brainlayer/
All scripts and CLI must use paths.py:get_db_path() for resolving database path instead of hardcoding
Implement chunk lifecycle management with columns superseded_by, aggregated_into, archived_at; default search excludes lifecycle-managed chunks
Never run bulk database operations while enrichment workers are writing; always stop workers and checkpoint WAL first
Drop FTS triggers before bulk deletes on chunks table and recreate after; batch deletes in 5-10K chunks with checkpoint every 3 batches
Implement retry logic on SQLITE_BUSY errors; each worker must use its own database connection
Use ruff check src/ && ruff format src/ for linting and formatting

Files:

  • src/brainlayer/pipeline/enrichment.py
  • src/brainlayer/enrichment_controller.py
  • src/brainlayer/pipeline/entity_extraction.py
src/brainlayer/*enrichment*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/brainlayer/*enrichment*.py: Use Groq as primary enrichment backend with Gemini fallback via enrichment_controller.py, configurable via BRAINLAYER_ENRICH_BACKEND environment variable
Configure enrichment rate via BRAINLAYER_ENRICH_RATE environment variable with default 0.2 (12 RPM)

Files:

  • src/brainlayer/enrichment_controller.py
🧠 Learnings (deduplicated)

  • Use Groq as the primary enrichment backend with Gemini fallback via `enrichment_controller.py`; backend priority is Groq (primary/cloud) → Gemini (fallback) → Ollama (offline last-resort), configurable via `BRAINLAYER_ENRICH_BACKEND`. (CLAUDE.md; applied to enrichment.py and enrichment_controller.py)
  • Enrichment rate is configurable via `BRAINLAYER_ENRICH_RATE` (default 0.2 = 12 RPM). (CLAUDE.md; applied to enrichment.py and enrichment_controller.py)
  • Build extraction, classification, chunking, embedding, and indexing pipeline with post-processing for enrichment, brain graph, and Obsidian export. (CLAUDE.md; applied to enrichment.py)
  • In `src/brainlayer/enrichment_controller.py`, the `parallel` parameter in `enrich_local()` is intentionally kept in the function signature (currently unused, suppressed with `# noqa: ARG001`) for API stability; parallel local enrichment is planned for a future iteration. Do not flag this as dead code. (Learnt from PR #100; applied to enrichment.py)
🔇 Additional comments (7)
src/brainlayer/enrichment_controller.py (1)

111-114: Entity extraction uses Gemini as primary, but guidelines specify Groq → Gemini → Ollama priority.

The _get_default_llm_caller() in entity_extraction.py (lines 360-376) prefers call_gemini_for_extraction over call_llm (which includes Groq). This inverts the documented backend priority: "Groq (primary/cloud) → Gemini (fallback) → Ollama (offline last-resort)".

Is this intentional for entity extraction specifically? If so, consider documenting the rationale or making the priority configurable via environment variable.

Based on learnings: "Enrichment backend priority: Groq (primary/cloud) → Gemini (fallback) → Ollama (offline last-resort), configurable via BRAINLAYER_ENRICH_BACKEND environment variable"

src/brainlayer/pipeline/entity_extraction.py (5)

121-164: Well-structured prompt expansion with clear taxonomy.

The expanded entity types (11) and relation types (14) with explicit descriptions provide good coverage for developer conversation extraction. The output format specification with description and strength fields aligns with the LightRAG-style output mentioned in the PR objectives.


166-197: Gleaning mechanism is well-designed for capturing missed entities.

The second-pass prompt explicitly instructs the LLM to find ADDITIONAL entities/relations, focusing on implicit relationships. The PR objectives mention this catches 20-40% more entities, which is a significant improvement.


248-268: Relation parsing correctly handles new description and strength fields.

The changes properly:

  • Extract description from the LLM response (line 248)
  • Use strength for confidence with 0.7 default (line 252)
  • Bound confidence to [0, 1.0] (line 265)
  • Store description in properties dict which gets serialized to JSON in kg_repo.py (per context snippet 1)

348-357: Relation deduplication is correctly implemented.

The case-insensitive tuple key (source_text.lower(), target_text.lower(), relation_type) properly handles the gleaning pass potentially re-finding the same relations with different casing.


360-376: Default LLM caller selection is functional but see backend priority concern.

The fallback chain (Gemini → call_llm → RuntimeError) works correctly. The backend priority concern is addressed in the enrichment_controller.py review.

src/brainlayer/pipeline/enrichment.py (1)

860-867: Enabling LLM extraction for all enriched chunks — significant behavior change.

This change activates Gemini-based entity extraction (with gleaning) for every chunk processed through the enrichment pipeline. Per PR objectives, this increases extraction from ~2-5 entities to 15-32 entities per chunk.

The API cost implications (2 Gemini calls per chunk due to gleaning) have been flagged in the entity_extraction.py review. Ensure rate limits are in place (see enrichment_controller.py review) before processing large backlogs.
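The cost impact is easy to estimate: with gleaning, each chunk costs two LLM calls, so at the repo's documented default throttle (`BRAINLAYER_ENRICH_RATE=0.2`, i.e. 12 calls per minute) a large backlog drains slowly. A back-of-envelope helper (hypothetical, for illustration only):

```python
def enrichment_eta_minutes(num_chunks: int, calls_per_chunk: int = 2,
                           rate_cps: float = 0.2) -> float:
    """Minutes to drain a backlog when extraction calls are throttled to
    rate_cps calls per second (0.2 = 12 calls/min, the documented default).
    """
    total_calls = num_chunks * calls_per_chunk
    return total_calls / rate_cps / 60.0

# 10,000 chunks with gleaning at 12 calls/min ≈ 1,667 minutes (~28 hours)
```

This is why the review suggests making gleaning opt-in for cost-sensitive paths: halving `calls_per_chunk` halves the drain time.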

Comment thread src/brainlayer/enrichment_controller.py
Comment on lines +124 to +133
```python
    try:
        response = client.models.generate_content(
            model=GEMINI_EXTRACTION_MODEL,
            contents=prompt,
            config={"response_mime_type": "application/json", "thinking_config": {"thinking_budget": 0}},
        )
        return response.text if response and response.text else None
    except Exception:
        logger.warning("Gemini extraction call failed", exc_info=True)
        return None
```

@coderabbitai coderabbitai Bot Apr 2, 2026


⚠️ Potential issue | 🟠 Major

No rate limiting or timeout for Gemini extraction calls.

Unlike other enrichment backends (enrich_realtime uses per_chunk_delay, call_groq has GROQ_RATE_LIMIT_DELAY), this function has no rate limiting. High-volume entity extraction could exhaust API quotas.

Additionally, the generate_content call has no explicit timeout, risking hung requests.

🛡️ Suggested fix: Add timeout and consider rate limiting
```diff
+from google.genai import types as genai_types
+
+# Rate limit for extraction calls (shared with realtime enrichment)
+_extraction_rate_limit = float(os.environ.get("BRAINLAYER_ENRICH_RATE", "0.2"))
+_last_extraction_call: float = 0.0
+
 def call_gemini_for_extraction(prompt: str) -> Optional[str]:
     """Call Gemini for entity/relation extraction. Returns raw text response."""
+    global _last_extraction_call
+
     try:
         client = _get_gemini_client()
     except RuntimeError:
         logger.debug("Gemini not available for extraction")
         return None

     try:
+        # Rate limiting
+        if _extraction_rate_limit > 0:
+            import time
+            now = time.monotonic()
+            elapsed = now - _last_extraction_call
+            delay = 1.0 / _extraction_rate_limit
+            if _last_extraction_call > 0 and elapsed < delay:
+                time.sleep(delay - elapsed)
+            _last_extraction_call = time.monotonic()
+
         response = client.models.generate_content(
             model=GEMINI_EXTRACTION_MODEL,
             contents=prompt,
-            config={"response_mime_type": "application/json", "thinking_config": {"thinking_budget": 0}},
+            config={
+                "response_mime_type": "application/json",
+                "thinking_config": {"thinking_budget": 0},
+                "http_options": {"timeout": 60},
+            },
         )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/brainlayer/enrichment_controller.py` around lines 124 - 133, The Gemini
extraction call in enrichment_controller's try block
(client.models.generate_content with GEMINI_EXTRACTION_MODEL) lacks a timeout
and any rate limiting, so add an explicit request timeout to the
generate_content invocation (using the client's supported timeout param or
wrapping the call in a timeout mechanism) and throttle calls to this extraction
path like other backends (e.g., reuse the existing per_chunk_delay logic or a
shared rate limiter/token-bucket) to pause between high-volume calls; ensure the
new timeout and delay are configurable and applied before the try/except where
response is awaited so failures/timeouts are caught and logged as before.

Owner Author


Fixed in 86d6ebf — added 30s timeout to Gemini extraction call.


@EtanHey, thanks for adding the 30s timeout — that addresses the hung-request risk. ✅

However, the rate limiting concern from the original comment is still unaddressed. Unlike enrich_realtime (which uses per_chunk_delay) and other backends, call_gemini_for_extraction has no throttling between calls. During high-volume gleaning passes (two LLM calls per chunk), this could exhaust API quotas quickly.

Would you like to open a follow-up issue to track adding rate limiting to call_gemini_for_extraction, or address it in this PR?


🧠 Learnings used: the Groq → Gemini → Ollama backend-priority and pipeline-architecture learnings listed above.

Comment thread src/brainlayer/pipeline/entity_extraction.py Outdated
…ing default

- CRITICAL: Added Optional to typing imports in enrichment_controller.py
- MAJOR: Added timeout (30s) to Gemini extraction call
- MAJOR: Changed gleaning default to False (opt-in to avoid doubling cost)
- MEDIUM: Safe float() parsing for LLM strength values (handles null/"high")

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment on lines +347 to +349
```python
        entities.extend(extra_entities)
        relations.extend(extra_relations)
    except Exception:
```

🟡 Medium pipeline/entity_extraction.py:347

After gleaning, entities.extend(extra_entities) combines primary and gleaning results without deduplicating overlapping spans. Each parse_llm_ner_response call deduplicates its own results, but overlaps between the two passes are retained. For example, if primary finds an entity at [10, 20] and gleaning finds one at [15, 25], both are returned. Consider calling _deduplicate_overlaps() on the combined entities list before returning.

```diff
         entities.extend(extra_entities)
         relations.extend(extra_relations)

+    # Deduplicate overlapping entity spans after combining passes
+    entities.sort(key=lambda e: (e.start, -len(e.text)))
+    entities = _deduplicate_overlaps(entities)
+
     # Deduplicate relations (gleaning may re-find the same ones)
```
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file src/brainlayer/pipeline/entity_extraction.py around lines 347-349:

After gleaning, `entities.extend(extra_entities)` combines primary and gleaning results without deduplicating overlapping spans. Each `parse_llm_ner_response` call deduplicates its own results, but overlaps between the two passes are retained. For example, if primary finds an entity at [10, 20] and gleaning finds one at [15, 25], both are returned. Consider calling `_deduplicate_overlaps()` on the combined `entities` list before returning.

Evidence trail:
src/brainlayer/pipeline/entity_extraction.py lines 332, 340 (both passes call parse_llm_ner_response), line 347 (entities.extend(extra_entities) without deduplication), lines 351-358 (relations ARE deduplicated after combining), line 275 (parse_llm_ner_response internally calls _deduplicate_overlaps), lines 100-117 (_deduplicate_overlaps function definition that handles overlapping spans). Commit: REVIEWED_COMMIT.
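A greedy implementation of the kind of span dedup the suggested fix relies on might look like this. It is a sketch of what a `_deduplicate_overlaps` helper could do, using plain dicts with assumed `start`/`end` fields; the real helper operates on the repository's entity objects.

```python
def deduplicate_overlaps(entities: list[dict]) -> list[dict]:
    """Greedy span dedup: after sorting by (start, longest-first), keep each
    entity only if its [start, end) span does not overlap one already kept.

    This matches the fix above: a gleaning hit at [15, 25] is dropped when a
    primary-pass hit at [10, 20] already covers part of that span.
    """
    entities = sorted(entities, key=lambda e: (e["start"], -(e["end"] - e["start"])))
    kept: list[dict] = []
    for ent in entities:
        # Non-overlap test: entirely before or entirely after every kept span.
        if all(ent["start"] >= k["end"] or ent["end"] <= k["start"] for k in kept):
            kept.append(ent)
    return kept
```

Sorting longest-first at each start position means that when two candidates begin at the same offset, the more specific (longer) span wins.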

@EtanHey EtanHey merged commit 8db3adc into main Apr 2, 2026
5 of 6 checks passed
@EtanHey EtanHey deleted the feat/llm-entity-extraction branch April 2, 2026 23:38