feat: add parent_id to kg_entities + expand canonical relation types #219

Merged
EtanHey merged 2 commits into main from feat/entity-parent-id-relations
Apr 6, 2026

Conversation

EtanHey (Owner) commented Apr 6, 2026

Summary

  • Added parent_id column to kg_entities table with index for instance-level hierarchy
  • Added KGMixin methods: get_entity_parent(), get_entity_children(), set_entity_parent()
  • Expanded CANONICAL_RELATION_TYPES from 8 to 14: added depends_on, spawns, created, lives_in, leads, freelances_for
  • Added _RELATION_TYPE_ALIASES mapping: ceo_of → leads, cto_of → leads, worked_at → works_at, framework_for → depends_on, etc.
  • Wired parent/children info into brain_entity MCP output
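The alias mapping described above can be sketched roughly as follows. This is a minimal illustration, not the code from kg_extraction.py: the canonical set below is only the subset of the 14 types that this thread names, `normalize_relation_type` is a hypothetical helper name, and the fallback to `related_to` is an assumption based on the review comment further down.

```python
# Hypothetical subset: the PR says there are 14 canonical types, but this
# thread names only some of them.
CANONICAL_RELATION_TYPES = frozenset({
    "works_at", "related_to",
    "depends_on", "spawns", "created", "lives_in", "leads", "freelances_for",
})

# Only the aliases spelled out in the PR description.
_RELATION_TYPE_ALIASES = {
    "ceo_of": "leads",
    "cto_of": "leads",
    "worked_at": "works_at",
    "framework_for": "depends_on",
}


def normalize_relation_type(raw: str) -> str:
    """Rewrite a legacy alias first, then validate against the canonical set."""
    key = raw.strip().lower()
    candidate = _RELATION_TYPE_ALIASES.get(key, key)
    # Assumption: unknown types fall back to the generic "related_to".
    return candidate if candidate in CANONICAL_RELATION_TYPES else "related_to"


print(normalize_relation_type("ceo_of"))
print(normalize_relation_type("deployed_on"))
```

Normalizing before the canonical check matters: if the order were reversed, `ceo_of` would be rejected as non-canonical before it could be rewritten to `leads`.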

Context

P2 #7 from brainlayer-r75-r78-unimplemented.md

Test plan

  • pytest tests/test_entity_parent_relations.py -v
  • ruff check src/ tests/ clean
  • CodeRabbit review addressed

🤖 Generated with Claude Code

Note

Add parent_id to kg_entities and expand canonical relation types

  • Adds a parent_id column to the kg_entities table (with index) via schema migration in vector_store.py, enabling hierarchical entity relationships.
  • Extends KGMixin in kg_repo.py with get_entity_children, get_entity_parent, and set_entity_parent methods, and updates upsert_entity/get_entity/get_entity_by_name to include parent_id.
  • The entity lookup MCP handler in entity_handler.py now attaches parent and children to results, and _format.py renders them in output.
  • Adds new canonical relation types (depends_on, spawns, created, lives_in, leads, freelances_for) and alias mappings (e.g. ceo_of → leads, worked_at → works_at) in kg_extraction.py.
  • Risk: get_entity_parent contains malformed SQL (stray + characters), so any entity lookup with a parent_id will raise a SQL error at runtime.

Macroscope summarized 31f275e.

Summary by CodeRabbit

Release Notes

  • New Features
    • Added entity hierarchy support—entities can now have parent-child relationships.
    • Enhanced entity display to show parent and child entity information when available.
    • Added new relation types: depends_on, spawns, created, lives_in, leads, and freelances_for.
    • Automatic normalization of legacy relation type aliases to their canonical forms.

coderabbitai Bot commented Apr 6, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4ac74dcf-55f2-44c2-9f40-b1d5723b5d81

📥 Commits

Reviewing files that changed from the base of the PR and between 429d43f and 31f275e.

📒 Files selected for processing (7)
  • src/brainlayer/kg_repo.py
  • src/brainlayer/mcp/_format.py
  • src/brainlayer/mcp/entity_handler.py
  • src/brainlayer/pipeline/kg_extraction.py
  • src/brainlayer/vector_store.py
  • tests/test_entity_parent_relations.py
  • tests/test_kg_schema.py
📝 Walkthrough

Walkthrough

The PR adds entity hierarchy support to the knowledge graph system by introducing a parent_id column to kg_entities, implementing parent/child retrieval methods, normalizing additional relation types in the extraction pipeline, and enriching MCP entity responses with hierarchical information.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Entity Hierarchy Storage & Retrieval: src/brainlayer/kg_repo.py, src/brainlayer/vector_store.py | Added parent_id parameter to upsert_entity, updated get_entity and get_entity_by_name to return parent_id, and introduced three new methods: get_entity_children, get_entity_parent, and set_entity_parent. Schema migration adds parent_id TEXT column and index to kg_entities. |
| Entity Hierarchy Formatting & Enrichment: src/brainlayer/mcp/entity_handler.py, src/brainlayer/mcp/_format.py | Entity handler now fetches and attaches parent and children data to lookup results. Format function conditionally renders "Parent" and "Children" sections in entity output when hierarchical data is present. |
| Relation Type Normalization: src/brainlayer/pipeline/kg_extraction.py | Expanded canonical relation types to include depends_on, spawns, created, lives_in, leads, freelances_for. Added _RELATION_TYPE_ALIASES and normalization logic in validate_extraction_result to rewrite legacy types (ceo_of, worked_at, framework_for, etc.) to their canonical equivalents. |
| Test Coverage: tests/test_entity_parent_relations.py, tests/test_kg_schema.py | New test file validates parent/child persistence, retrieval, and relation type normalization. Schema test updated to expect parent_id column in kg_entities. |

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant MCP as MCP Handler
    participant KGRepo as KG Repository
    participant DB as SQLite DB
    participant Format as Format Service

    User->>MCP: Request entity details
    MCP->>KGRepo: entity_lookup(entity_name)
    KGRepo->>DB: SELECT entity data
    DB-->>KGRepo: entity record
    
    alt Entity has parent
        MCP->>KGRepo: get_entity_parent(entity_id)
        KGRepo->>DB: SELECT parent entity
        DB-->>KGRepo: parent record
        KGRepo-->>MCP: parent dict
    end
    
    MCP->>KGRepo: get_entity_children(entity_id)
    KGRepo->>DB: SELECT children (parent_id = ?)
    DB-->>KGRepo: children records
    KGRepo-->>MCP: children list
    
    MCP->>Format: format_entity_simple(enriched_entity)
    Format->>Format: Render parent section (if exists)
    Format->>Format: Render children section (if exists)
    Format-->>User: Formatted entity with hierarchy

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • PR #29: Adds the foundational kg_entities table and entity CRUD APIs (upsert_entity, get_entity) that this PR directly extends with parent/child hierarchy methods and schema columns.
  • PR #47: Modifies the KG extraction pipeline (kg_extraction.py) to define relation types and validation; this PR updates the same module to add new canonical relation types and normalization aliases.
  • PR #218: Modifies the KG repository and MCP entity handler to expand entity-facing functionality; this PR adds parent/child hierarchy methods and MCP enrichment in the same classes.

Poem

🐰 A rabbit hops through hierarchies,
Parents and children now clearly seen,
Relations normalized, types aligned with ease,
Knowledge graphs bloom with structure pristine! 🌿

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the two main changes: adding parent_id support to kg_entities and expanding canonical relation types. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 95.45% which is sufficient. The required threshold is 80.00%. |

🟡 Medium

The new database calls store.get_entity(), store.get_entity_parent(), and store.get_entity_children() on lines 44-50 are outside the try/except block that wraps entity_lookup. If any of these raise an exception, it propagates unhandled and crashes the handler instead of returning _error_result. Consider wrapping lines 44-52 in the existing try/except or adding a separate error handler.


Evidence trail:
src/brainlayer/mcp/entity_handler.py lines 1-70 at REVIEWED_COMMIT: try/except block spans lines 26-35 (wrapping entity_lookup), database calls store.get_entity() at line 42, store.get_entity_parent() at line 44, store.get_entity_children() at line 48 are all outside this try/except block.
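One way to address this finding is to put the three store calls inside their own guard so a failure becomes an error result instead of an unhandled exception. The sketch below is illustrative only: `FakeStore`, `enrich_with_hierarchy`, and the dict shape returned by `_error_result` are hypothetical stand-ins, since the real handler and helper are not shown in this thread.

```python
class FakeStore:
    """Hypothetical stand-in for the real store; the parent lookup fails on purpose."""

    def get_entity(self, entity_id):
        return {"id": entity_id, "name": "child", "parent_id": "p1"}

    def get_entity_parent(self, entity_id):
        raise RuntimeError("database is locked")

    def get_entity_children(self, entity_id):
        return []


def _error_result(message):
    # Assumed shape; the project's real _error_result may differ.
    return {"isError": True, "message": message}


def enrich_with_hierarchy(store, result, entity_id):
    """Attach parent/children to a lookup result, converting failures to an error result."""
    try:
        record = store.get_entity(entity_id)
        if record and record.get("parent_id"):
            parent = store.get_entity_parent(entity_id)
            if parent:
                result["parent"] = parent
        children = store.get_entity_children(entity_id)
        if children:
            result["children"] = children
    except Exception as exc:  # broad on purpose: any store failure maps to an error result
        return _error_result(f"Entity hierarchy lookup failed: {exc}")
    return result


outcome = enrich_with_hierarchy(FakeStore(), {"name": "child"}, "c1")
print(outcome)
```

An alternative design would be to log the failure and return the un-enriched result, treating hierarchy data as optional; which behavior is preferable depends on how callers use the handler.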

Comment thread src/brainlayer/kg_repo.py
"""Set the parent of an entity."""
cursor = self.conn.cursor()
cursor.execute(
"UPDATE kg_entities SET parent_id = ?, updated_at = strftime('%Y-%m-%dT%H:%M:%fZ','now') WHERE id = ?",

🟢 Low brainlayer/kg_repo.py:351

The updated_at timestamp format differs based on which method updates the entity. upsert_entity uses Python's strftime("%Y-%m-%dT%H:%M:%S.%fZ") producing 6-digit microseconds, while set_entity_parent uses SQLite's strftime('%Y-%m-%dT%H:%M:%fZ','now') where %f is seconds with 3-digit milliseconds. This creates inconsistent timestamp formats in the same table.

Evidence trail:
1. src/brainlayer/kg_repo.py line 44: `now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")` (Python strftime with %f = 6-digit microseconds)
2. src/brainlayer/kg_repo.py line 351: `strftime('%Y-%m-%dT%H:%M:%fZ','now')` (SQLite strftime where %f = SS.SSS format)
3. SQLite documentation (https://sqlite.org/lang_datefunc.html): "%f fractional seconds: SS.SSS" confirms SQLite %f includes seconds with 3-digit milliseconds
4. Python documentation: %f is microseconds as 6 digits (000000-999999)
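The difference is easy to reproduce with the standard library. The snippet below uses the two format strings quoted in the evidence trail; `fraction_digits` and `utc_millis_timestamp` are hypothetical helpers, and truncating the Python side to milliseconds is just one possible way to align the formats, not necessarily the fix the project adopted.

```python
import sqlite3
from datetime import datetime, timezone

# Python's %f is six-digit microseconds.
python_ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")

# SQLite's %f is "SS.SSS": seconds plus three-digit milliseconds.
conn = sqlite3.connect(":memory:")
sqlite_ts = conn.execute("SELECT strftime('%Y-%m-%dT%H:%M:%fZ','now')").fetchone()[0]
conn.close()


def fraction_digits(ts: str) -> int:
    """Count the digits after the decimal point of an ISO-like timestamp ending in Z."""
    return len(ts.rsplit(".", 1)[1].rstrip("Z"))


def utc_millis_timestamp() -> str:
    """Hypothetical helper: Python-side timestamp truncated to milliseconds."""
    now = datetime.now(timezone.utc)
    return now.strftime("%Y-%m-%dT%H:%M:%S.") + f"{now.microsecond // 1000:03d}Z"


print(python_ts, fraction_digits(python_ts))
print(sqlite_ts, fraction_digits(sqlite_ts))
print(utc_millis_timestamp())
```

Both variants are ISO-like strings with the same fixed-width prefix, so the mismatch mostly matters for code that parses `updated_at` with a single strict format.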

EtanHey and others added 2 commits April 6, 2026 13:15
- Add parent_id column to kg_entities with index
- Add get_entity_parent(), get_entity_children(), set_entity_parent() to KGMixin
- Expand CANONICAL_RELATION_TYPES: depends_on, spawns, created, lives_in, leads, freelances_for
- Add _RELATION_TYPE_ALIASES mapping (ceo_of→leads, worked_at→works_at, etc.)
- Wire parent/children into brain_entity output format
- Tests for schema, parent/child queries, relation aliases

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@EtanHey EtanHey force-pushed the feat/entity-parent-id-relations branch from 429d43f to 31f275e April 6, 2026 10:15
coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/brainlayer/pipeline/kg_extraction.py`:
- Around line 52-60: Update the extraction prompts so they enumerate the current
canonical relation types (including "leads" and "freelances_for") and remove
obsolete types ("deployed_on", "fixes", "configures") referenced in the prompt
text, ensuring the prompt language matches the normalization map
_RELATION_TYPE_ALIASES; specifically edit the prompt strings used in the
entity_extraction module (the entity extraction prompt around where
entities/relations are built) and the prompt in kg_extraction_groq to list only
the canonical relation types and include the new ones so the model emits the
canonical names rather than falling back to "related_to".
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e9cc5430-0307-44c2-9715-ff69d848f4e4

📥 Commits

Reviewing files that changed from the base of the PR and between d673c51 and 429d43f.

📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.13)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests

**/*.py: Use paths.py:get_db_path() for all database path resolution; all scripts and CLI must use this function rather than hardcoding paths
When performing bulk database operations: stop enrichment workers first, checkpoint WAL before and after, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks, and checkpoint every 3 batches

Files:

  • tests/test_kg_schema.py
  • src/brainlayer/vector_store.py
  • src/brainlayer/mcp/_format.py
  • src/brainlayer/mcp/entity_handler.py
  • tests/test_entity_parent_relations.py
  • src/brainlayer/pipeline/kg_extraction.py
  • src/brainlayer/kg_repo.py
src/brainlayer/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/brainlayer/**/*.py: Use retry logic on SQLITE_BUSY errors; each worker must use its own database connection to handle concurrency safely
Classification must preserve ai_code, stack_trace, and user_message verbatim; skip noise entries entirely and summarize build_log and dir_listing entries (structure only)
Use AST-aware chunking via tree-sitter; never split stack traces; mask large tool output
For enrichment backend selection: use Groq as primary backend (cloud, configured in launchd plist), Gemini as fallback via enrichment_controller.py, and Ollama as offline last-resort; allow override via BRAINLAYER_ENRICH_BACKEND env var
Configure enrichment rate via BRAINLAYER_ENRICH_RATE environment variable (default 0.2 = 12 RPM)
Implement chunk lifecycle columns: superseded_by, aggregated_into, archived_at on chunks table; exclude lifecycle-managed chunks from default search; allow include_archived=True to show history
Implement brain_supersede with safety gate for personal data (journals, notes, health/finance); use soft-delete for brain_archive with timestamp
Add supersedes parameter to brain_store for atomic store-and-replace operations
Run linting and formatting with: ruff check src/ && ruff format src/
Run tests with pytest
Use PRAGMA wal_checkpoint(FULL) before and after bulk database operations to prevent WAL bloat

Files:

  • src/brainlayer/vector_store.py
  • src/brainlayer/mcp/_format.py
  • src/brainlayer/mcp/entity_handler.py
  • src/brainlayer/pipeline/kg_extraction.py
  • src/brainlayer/kg_repo.py
🧠 Learnings (4)
📚 Learning: 2026-03-29T23:19:50.743Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:50.743Z
Learning: Applies to src/brainlayer/{vector_store,search}*.py : Chunk lifecycle: implement columns `superseded_by`, `aggregated_into`, `archived_at` on chunks table; exclude lifecycle-managed chunks from default search

Applied to files:

  • src/brainlayer/vector_store.py
📚 Learning: 2026-03-29T23:19:50.743Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:50.743Z
Learning: Applies to src/brainlayer/vector_store.py : Use sqlite-vec with APSW for vector storage and retrieval

Applied to files:

  • src/brainlayer/vector_store.py
📚 Learning: 2026-04-01T01:24:44.281Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T01:24:44.281Z
Learning: Applies to src/brainlayer/mcp/*.py : MCP tools include: brain_search, brain_store, brain_recall, brain_entity, brain_expand, brain_update, brain_digest, brain_get_person, brain_tags, brain_supersede, brain_archive (legacy brainlayer_* aliases still supported)

Applied to files:

  • src/brainlayer/mcp/entity_handler.py
📚 Learning: 2026-03-14T02:20:54.656Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-14T02:20:54.656Z
Learning: Treat retrieval correctness, write safety, and MCP stability as critical-path concerns in BrainLayer reviews

Applied to files:

  • src/brainlayer/mcp/entity_handler.py
🔇 Additional comments (15)
tests/test_kg_schema.py (1)

92-92: LGTM!

The test expectation correctly updated to include the new parent_id column, aligning with the schema migration in vector_store.py.

src/brainlayer/vector_store.py (1)

724-726: LGTM!

The migration correctly adds the parent_id column conditionally and creates an index. The pattern is consistent with other migrations in the file and is idempotent.
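A conditional, idempotent migration of this kind can be sketched as below. This is an assumption-laden illustration rather than the code in vector_store.py: it uses the stdlib `sqlite3` module instead of APSW, and the function name `migrate_add_parent_id` and index name `idx_kg_entities_parent` are hypothetical.

```python
import sqlite3


def migrate_add_parent_id(conn: sqlite3.Connection) -> None:
    """Add kg_entities.parent_id and its index only if they are missing."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    columns = {row[1] for row in conn.execute("PRAGMA table_info(kg_entities)")}
    if "parent_id" not in columns:
        conn.execute("ALTER TABLE kg_entities ADD COLUMN parent_id TEXT")
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_kg_entities_parent ON kg_entities(parent_id)"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kg_entities (id TEXT PRIMARY KEY, name TEXT NOT NULL)")

# Running the migration twice must not raise.
migrate_add_parent_id(conn)
migrate_add_parent_id(conn)

column_names = [row[1] for row in conn.execute("PRAGMA table_info(kg_entities)")]
index_names = {row[1] for row in conn.execute("PRAGMA index_list(kg_entities)")}
print(column_names, index_names)
```

The column check is needed because SQLite has no `ADD COLUMN IF NOT EXISTS`; the index statement can rely on `IF NOT EXISTS` directly.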

src/brainlayer/mcp/entity_handler.py (1)

43-52: LGTM — enrichment logic is sound.

The handler defensively checks entity_record and parent_id before fetching parent, and correctly attaches children when non-empty. The lambda captures are safe since entity_id doesn't change during execution.

Note: The PR objectives mention a SQL bug in get_entity_parent. I'll verify this in the kg_repo.py review.

src/brainlayer/mcp/_format.py (1)

194-203: LGTM!

The formatting logic is defensive with .get() and isinstance() checks, consistent with the file's patterns and safely handles callers that don't provide parent or children keys.
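The defensive pattern described here might look roughly like the following. It is a sketch only: `format_hierarchy_lines` is a hypothetical name, and the exact labels and layout used by _format.py are not shown in this thread.

```python
def format_hierarchy_lines(entity: dict) -> list[str]:
    """Render optional Parent/Children lines, tolerating missing or malformed keys."""
    lines: list[str] = []

    parent = entity.get("parent")
    if isinstance(parent, dict) and parent.get("name"):
        lines.append(f"Parent: {parent['name']}")

    children = entity.get("children")
    if isinstance(children, list) and children:
        names = [c["name"] for c in children if isinstance(c, dict) and c.get("name")]
        if names:
            lines.append("Children: " + ", ".join(names))

    return lines


print(format_hierarchy_lines({"name": "Platform"}))
print(format_hierarchy_lines({
    "name": "Platform",
    "parent": {"id": "org1", "name": "Acme"},
    "children": [{"id": "s1", "name": "Search"}, {"id": "s2", "name": "Ingest"}],
}))
```

Because every access goes through `.get()` and `isinstance()`, callers that never attach `parent` or `children` get the same output as before the change.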

src/brainlayer/pipeline/kg_extraction.py (3)

41-46: LGTM — canonical types expanded correctly.

The new relation types align with the PR objectives and are properly integrated into the validation logic.


70-75: Direction rules correctly defined for new types.

The source/target type constraints are sensible for the new relation types.


122-122: Alias normalization correctly applied before canonical check.

The order is correct: normalize aliases first, then validate against canonical types.

tests/test_entity_parent_relations.py (1)

1-130: Comprehensive test coverage for the new hierarchy features.

The tests cover schema verification, CRUD operations, edge cases (empty children, no parent), ordering behavior, and relation type alias normalization. Well-structured test suite.

src/brainlayer/kg_repo.py (7)

29-29: LGTM — parent_id parameter added to upsert_entity.

The new optional parameter follows the existing pattern of keyword-only arguments with sensible defaults.


39-74: LGTM — SQL updated correctly for parent_id persistence.

The INSERT includes parent_id and the conflict clause uses COALESCE(excluded.parent_id, kg_entities.parent_id) to preserve existing values when the new value is NULL — consistent with how group_id, valid_from, and valid_until are handled.
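The effect of the COALESCE clause can be seen in a reduced example. The table here is a hypothetical three-column stand-in for kg_entities, and the conflict target (`id`) is an assumption, since the real upsert statement is not quoted in this thread. SQLite's upsert syntax requires version 3.24 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE kg_entities (id TEXT PRIMARY KEY, name TEXT NOT NULL, parent_id TEXT)"
)

UPSERT_SQL = """
INSERT INTO kg_entities (id, name, parent_id) VALUES (?, ?, ?)
ON CONFLICT(id) DO UPDATE SET
    name = excluded.name,
    parent_id = COALESCE(excluded.parent_id, kg_entities.parent_id)
"""


def upsert_entity(entity_id, name, parent_id=None):
    """Insert or update; a NULL parent_id keeps whatever is already stored."""
    conn.execute(UPSERT_SQL, (entity_id, name, parent_id))


upsert_entity("e1", "BrainLayer", parent_id="org1")
upsert_entity("e1", "BrainLayer v2")  # parent_id omitted, so the stored value survives
row = conn.execute("SELECT name, parent_id FROM kg_entities WHERE id = 'e1'").fetchone()
print(row)
```

A consequence of this pattern is that an upsert cannot clear an existing parent by passing NULL; detaching an entity would need an explicit update such as the one in set_entity_parent.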


174-204: LGTM — get_entity correctly returns parent_id.

The SELECT and return dict now include parent_id at index 13.


206-236: LGTM — get_entity_by_name correctly returns parent_id.

Consistent with get_entity — both methods now include parent_id in the returned entity dict.


315-328: LGTM — get_entity_children implementation is correct.

The query filters by parent_id = ?, includes status check for active entities, orders by importance DESC, name ASC, and respects the limit. The returned dict structure matches what's expected by the formatter.
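The query shape described in this comment can be sketched as follows. The column names `importance` and `status`, the `'active'` value, and the default limit are assumptions made for the illustration; the real schema has more columns and the real method lives on KGMixin.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE kg_entities (
        id TEXT PRIMARY KEY,
        name TEXT NOT NULL,
        importance REAL DEFAULT 0,
        status TEXT DEFAULT 'active',
        parent_id TEXT
    )"""
)
conn.executemany(
    "INSERT INTO kg_entities (id, name, importance, status, parent_id) VALUES (?, ?, ?, ?, ?)",
    [
        ("org1", "Acme", 1.0, "active", None),
        ("c1", "beta", 0.5, "active", "org1"),
        ("c2", "zeta", 0.9, "active", "org1"),
        ("c3", "alpha", 0.5, "active", "org1"),
        ("c4", "gone", 1.0, "archived", "org1"),  # filtered out by the status check
    ],
)


def get_entity_children(parent_id, limit=50):
    """Active children of an entity, most important first, ties broken by name."""
    rows = conn.execute(
        "SELECT id, name, importance FROM kg_entities "
        "WHERE parent_id = ? AND status = 'active' "
        "ORDER BY importance DESC, name ASC LIMIT ?",
        (parent_id, limit),
    ).fetchall()
    return [{"id": r[0], "name": r[1], "importance": r[2]} for r in rows]


child_names = [c["name"] for c in get_entity_children("org1")]
print(child_names)
```

The index on parent_id added by the migration is what keeps the `WHERE parent_id = ?` filter cheap as the table grows.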


347-353: LGTM — set_entity_parent is correct.

Simple UPDATE that sets parent_id and updates updated_at timestamp. Uses write cursor (self.conn.cursor()) appropriately.


330-345: SQL is syntactically correct — no bugs in this method.

The code shows clean SQL with a proper self-join on kg_entities and no stray characters or syntax errors. The PR objectives may reference an older issue that was already fixed, or they may be outdated.
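For reference, a self-join of the kind described might look like this. It is a reduced sketch, not the method from kg_repo.py: the table has only the columns needed here, the functions are module-level rather than KGMixin methods, and the UPDATE reuses the statement quoted in the set_entity_parent comment thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE kg_entities ("
    "id TEXT PRIMARY KEY, name TEXT NOT NULL, parent_id TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO kg_entities (id, name) VALUES (?, ?)",
    [("org1", "Acme"), ("team1", "Platform")],
)


def set_entity_parent(entity_id, parent_id):
    """Point an entity at a parent (or detach it by passing None)."""
    conn.execute(
        "UPDATE kg_entities SET parent_id = ?, "
        "updated_at = strftime('%Y-%m-%dT%H:%M:%fZ','now') WHERE id = ?",
        (parent_id, entity_id),
    )


def get_entity_parent(entity_id):
    """Self-join: alias c is the child row, alias p is its parent row."""
    row = conn.execute(
        "SELECT p.id, p.name FROM kg_entities AS c "
        "JOIN kg_entities AS p ON p.id = c.parent_id "
        "WHERE c.id = ?",
        (entity_id,),
    ).fetchone()
    return {"id": row[0], "name": row[1]} if row else None


set_entity_parent("team1", "org1")
parent = get_entity_parent("team1")
print(parent)
```

The inner join returns no row both when the entity has no parent_id and when parent_id points at a missing entity, so callers see `None` in either case.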

Comment thread src/brainlayer/pipeline/kg_extraction.py
@EtanHey EtanHey merged commit a62b023 into main Apr 6, 2026
6 checks passed