feat: add parent_id to kg_entities + expand canonical relation types#219
feat: add parent_id to kg_entities + expand canonical relation types#219
Conversation
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 11 minutes and 6 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (7)
📝 WalkthroughWalkthroughThe PR adds entity hierarchy support to the knowledge graph system by introducing a Changes
Sequence Diagram(s)sequenceDiagram
actor User
participant MCP as MCP Handler
participant KGRepo as KG Repository
participant DB as SQLite DB
participant Format as Format Service
User->>MCP: Request entity details
MCP->>KGRepo: entity_lookup(entity_name)
KGRepo->>DB: SELECT entity data
DB-->>KGRepo: entity record
alt Entity has parent
MCP->>KGRepo: get_entity_parent(entity_id)
KGRepo->>DB: SELECT parent entity
DB-->>KGRepo: parent record
KGRepo-->>MCP: parent dict
end
MCP->>KGRepo: get_entity_children(entity_id)
KGRepo->>DB: SELECT children (parent_id = ?)
DB-->>KGRepo: children records
KGRepo-->>MCP: children list
MCP->>Format: format_entity_simple(enriched_entity)
Format->>Format: Render parent section (if exists)
Format->>Format: Render children section (if exists)
Format-->>User: Formatted entity with hierarchy
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
There was a problem hiding this comment.
🟡 Medium
The new database calls store.get_entity(), store.get_entity_parent(), and store.get_entity_children() on lines 44-50 are outside the try/except block that wraps entity_lookup. If any of these raise an exception, it propagates unhandled and crashes the handler instead of returning _error_result. Consider wrapping lines 44-52 in the existing try/except or adding a separate error handler.
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file src/brainlayer/mcp/entity_handler.py around line 27:
The new database calls `store.get_entity()`, `store.get_entity_parent()`, and `store.get_entity_children()` on lines 44-50 are outside the `try/except` block that wraps `entity_lookup`. If any of these raise an exception, it propagates unhandled and crashes the handler instead of returning `_error_result`. Consider wrapping lines 44-52 in the existing try/except or adding a separate error handler.
Evidence trail:
src/brainlayer/mcp/entity_handler.py lines 1-70 at REVIEWED_COMMIT: try/except block spans lines 26-35 (wrapping entity_lookup), database calls store.get_entity() at line 42, store.get_entity_parent() at line 44, store.get_entity_children() at line 48 are all outside this try/except block.
| """Set the parent of an entity.""" | ||
| cursor = self.conn.cursor() | ||
| cursor.execute( | ||
| "UPDATE kg_entities SET parent_id = ?, updated_at = strftime('%Y-%m-%dT%H:%M:%fZ','now') WHERE id = ?", |
There was a problem hiding this comment.
🟢 Low brainlayer/kg_repo.py:351
The updated_at timestamp format differs based on which method updates the entity. upsert_entity uses Python's strftime("%Y-%m-%dT%H:%M:%S.%fZ") producing 6-digit microseconds, while set_entity_parent uses SQLite's strftime('%Y-%m-%dT%H:%M:%fZ','now') where %f is seconds with 3-digit milliseconds. This creates inconsistent timestamp formats in the same table.
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file src/brainlayer/kg_repo.py around line 351:
The `updated_at` timestamp format differs based on which method updates the entity. `upsert_entity` uses Python's `strftime("%Y-%m-%dT%H:%M:%S.%fZ")` producing 6-digit microseconds, while `set_entity_parent` uses SQLite's `strftime('%Y-%m-%dT%H:%M:%fZ','now')` where `%f` is seconds with 3-digit milliseconds. This creates inconsistent timestamp formats in the same table.
Evidence trail:
1. src/brainlayer/kg_repo.py line 44: `now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")` (Python strftime with %f = 6-digit microseconds)
2. src/brainlayer/kg_repo.py line 351: `strftime('%Y-%m-%dT%H:%M:%fZ','now')` (SQLite strftime where %f = SS.SSS format)
3. SQLite documentation (https://sqlite.org/lang_datefunc.html): "%f fractional seconds: SS.SSS" confirms SQLite %f includes seconds with 3-digit milliseconds
4. Python documentation: %f is microseconds as 6 digits (000000-999999)
- Add parent_id column to kg_entities with index - Add get_entity_parent(), get_entity_children(), set_entity_parent() to KGMixin - Expand CANONICAL_RELATION_TYPES: depends_on, spawns, created, lives_in, leads, freelances_for - Add _RELATION_TYPE_ALIASES mapping (ceo_of→leads, worked_at→works_at, etc.) - Wire parent/children into brain_entity output format - Tests for schema, parent/child queries, relation aliases Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
429d43f to
31f275e
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/brainlayer/pipeline/kg_extraction.py`:
- Around line 52-60: Update the extraction prompts so they enumerate the current
canonical relation types (including "leads" and "freelances_for") and remove
obsolete types ("deployed_on", "fixes", "configures") referenced in the prompt
text, ensuring the prompt language matches the normalization map
_RELATION_TYPE_ALIASES; specifically edit the prompt strings used in the
entity_extraction module (the entity extraction prompt around where
entities/relations are built) and the prompt in kg_extraction_groq to list only
the canonical relation types and include the new ones so the model emits the
canonical names rather than falling back to "related_to".
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: e9cc5430-0307-44c2-9715-ff69d848f4e4
📒 Files selected for processing (7)
src/brainlayer/kg_repo.pysrc/brainlayer/mcp/_format.pysrc/brainlayer/mcp/entity_handler.pysrc/brainlayer/pipeline/kg_extraction.pysrc/brainlayer/vector_store.pytests/test_entity_parent_relations.pytests/test_kg_schema.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: test (3.12)
- GitHub Check: test (3.11)
- GitHub Check: test (3.13)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests
**/*.py: Usepaths.py:get_db_path()for all database path resolution; all scripts and CLI must use this function rather than hardcoding paths
When performing bulk database operations: stop enrichment workers first, checkpoint WAL before and after, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks, and checkpoint every 3 batches
Files:
tests/test_kg_schema.pysrc/brainlayer/vector_store.pysrc/brainlayer/mcp/_format.pysrc/brainlayer/mcp/entity_handler.pytests/test_entity_parent_relations.pysrc/brainlayer/pipeline/kg_extraction.pysrc/brainlayer/kg_repo.py
src/brainlayer/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/brainlayer/**/*.py: Use retry logic onSQLITE_BUSYerrors; each worker must use its own database connection to handle concurrency safely
Classification must preserveai_code,stack_trace, anduser_messageverbatim; skipnoiseentries entirely and summarizebuild_loganddir_listingentries (structure only)
Use AST-aware chunking via tree-sitter; never split stack traces; mask large tool output
For enrichment backend selection: use Groq as primary backend (cloud, configured in launchd plist), Gemini as fallback viaenrichment_controller.py, and Ollama as offline last-resort; allow override viaBRAINLAYER_ENRICH_BACKENDenv var
Configure enrichment rate viaBRAINLAYER_ENRICH_RATEenvironment variable (default 0.2 = 12 RPM)
Implement chunk lifecycle columns:superseded_by,aggregated_into,archived_aton chunks table; exclude lifecycle-managed chunks from default search; allowinclude_archived=Trueto show history
Implementbrain_supersedewith safety gate for personal data (journals, notes, health/finance); use soft-delete forbrain_archivewith timestamp
Addsupersedesparameter tobrain_storefor atomic store-and-replace operations
Run linting and formatting with:ruff check src/ && ruff format src/
Run tests withpytest
UsePRAGMA wal_checkpoint(FULL)before and after bulk database operations to prevent WAL bloat
Files:
src/brainlayer/vector_store.pysrc/brainlayer/mcp/_format.pysrc/brainlayer/mcp/entity_handler.pysrc/brainlayer/pipeline/kg_extraction.pysrc/brainlayer/kg_repo.py
🧠 Learnings (4)
📚 Learning: 2026-03-29T23:19:50.743Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:50.743Z
Learning: Applies to src/brainlayer/{vector_store,search}*.py : Chunk lifecycle: implement columns `superseded_by`, `aggregated_into`, `archived_at` on chunks table; exclude lifecycle-managed chunks from default search
Applied to files:
src/brainlayer/vector_store.py
📚 Learning: 2026-03-29T23:19:50.743Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:50.743Z
Learning: Applies to src/brainlayer/vector_store.py : Use sqlite-vec with APSW for vector storage and retrieval
Applied to files:
src/brainlayer/vector_store.py
📚 Learning: 2026-04-01T01:24:44.281Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T01:24:44.281Z
Learning: Applies to src/brainlayer/mcp/*.py : MCP tools include: brain_search, brain_store, brain_recall, brain_entity, brain_expand, brain_update, brain_digest, brain_get_person, brain_tags, brain_supersede, brain_archive (legacy brainlayer_* aliases still supported)
Applied to files:
src/brainlayer/mcp/entity_handler.py
📚 Learning: 2026-03-14T02:20:54.656Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-14T02:20:54.656Z
Learning: Treat retrieval correctness, write safety, and MCP stability as critical-path concerns in BrainLayer reviews
Applied to files:
src/brainlayer/mcp/entity_handler.py
🔇 Additional comments (15)
tests/test_kg_schema.py (1)
92-92: LGTM!The test expectation correctly updated to include the new
parent_idcolumn, aligning with the schema migration invector_store.py.src/brainlayer/vector_store.py (1)
724-726: LGTM!The migration correctly adds the
parent_idcolumn conditionally and creates an index. The pattern is consistent with other migrations in the file and is idempotent.src/brainlayer/mcp/entity_handler.py (1)
43-52: LGTM — enrichment logic is sound.The handler defensively checks
entity_recordandparent_idbefore fetching parent, and correctly attaches children when non-empty. The lambda captures are safe sinceentity_iddoesn't change during execution.Note: The PR objectives mention a SQL bug in
get_entity_parent. I'll verify this in thekg_repo.pyreview.src/brainlayer/mcp/_format.py (1)
194-203: LGTM!The formatting logic is defensive with
.get()andisinstance()checks, consistent with the file's patterns and safely handles callers that don't provideparentorchildrenkeys.src/brainlayer/pipeline/kg_extraction.py (3)
41-46: LGTM — canonical types expanded correctly.The new relation types align with the PR objectives and are properly integrated into the validation logic.
70-75: Direction rules correctly defined for new types.The source/target type constraints are sensible for the new relation types.
122-122: Alias normalization correctly applied before canonical check.The order is correct: normalize aliases first, then validate against canonical types.
tests/test_entity_parent_relations.py (1)
1-130: Comprehensive test coverage for the new hierarchy features.The tests cover schema verification, CRUD operations, edge cases (empty children, no parent), ordering behavior, and relation type alias normalization. Well-structured test suite.
src/brainlayer/kg_repo.py (7)
29-29: LGTM — parent_id parameter added to upsert_entity.The new optional parameter follows the existing pattern of keyword-only arguments with sensible defaults.
39-74: LGTM — SQL updated correctly for parent_id persistence.The INSERT includes
parent_idand the conflict clause usesCOALESCE(excluded.parent_id, kg_entities.parent_id)to preserve existing values when the new value is NULL — consistent with howgroup_id,valid_from, andvalid_untilare handled.
174-204: LGTM — get_entity correctly returns parent_id.The SELECT and return dict now include
parent_idat index 13.
206-236: LGTM — get_entity_by_name correctly returns parent_id.Consistent with
get_entity— both methods now includeparent_idin the returned entity dict.
315-328: LGTM — get_entity_children implementation is correct.The query filters by
parent_id = ?, includes status check for active entities, orders byimportance DESC, name ASC, and respects the limit. The returned dict structure matches what's expected by the formatter.
347-353: LGTM — set_entity_parent is correct.Simple UPDATE that sets
parent_idand updatesupdated_attimestamp. Uses write cursor (self.conn.cursor()) appropriately.
330-345: SQL is syntactically correct — no bugs in this method.The code shows clean SQL with a proper self-join on
kg_entitiesand no stray characters or syntax errors. The PR objectives may reference an older issue that was already fixed, or they may be outdated.
Summary
parent_idcolumn tokg_entitiestable with index for instance-level hierarchyget_entity_parent(),get_entity_children(),set_entity_parent()CANONICAL_RELATION_TYPESfrom 8 to 14: addeddepends_on,spawns,created,lives_in,leads,freelances_for_RELATION_TYPE_ALIASESmapping:ceo_of→leads,cto_of→leads,worked_at→works_at,framework_for→depends_on, etc.Context
P2 #7 from brainlayer-r75-r78-unimplemented.md
Test plan
pytest tests/test_entity_parent_relations.py -vruff check src/ tests/clean🤖 Generated with Claude Code
Note
Add
parent_idto kg_entities and expand canonical relation typesparent_idcolumn to thekg_entitiestable (with index) via schema migration in vector_store.py, enabling hierarchical entity relationships.KGMixinin kg_repo.py withget_entity_children,get_entity_parent, andset_entity_parentmethods, and updatesupsert_entity/get_entity/get_entity_by_nameto includeparent_id.parentandchildrento results, and _format.py renders them in output.depends_on,spawns,created,lives_in,leads,freelances_for) and alias mappings (e.g.ceo_of→leads,worked_at→works_at) in kg_extraction.py.get_entity_parentcontains malformed SQL (stray+characters), so any entity lookup with aparent_idwill raise a SQL error at runtime.Macroscope summarized 31f275e.
Summary by CodeRabbit
Release Notes
depends_on,spawns,created,lives_in,leads, andfreelances_for.