feat: KG standard tables — matches Convex kgSpec.ts #46

Merged
EtanHey merged 1 commit into main from feat/kg-tables
Feb 27, 2026
@EtanHey (Owner) commented Feb 27, 2026

Summary

  • Adds standardized KG schema to BrainLayer SQLite (matching 6PM Convex kgSpec.ts)
  • New columns on kg_entities (canonical_name, description, confidence, importance, valid_from/until, group_id)
  • New columns on kg_relations (fact, importance, valid_from/until, expired_at, source_chunk_id)
  • New kg_current_facts VIEW for auto-filtering expired relations
  • New methods: soft_close_relation, get_current_facts, traverse (2-hop CTE), resolve_entity
  • Shared constants module: ENTITY_TYPES, RELATION_TYPES, DECAY_CONSTANTS, effective_score()
  • 41 new tests, all existing tests pass (backward compatible)
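
A minimal sketch of how a soft close and the kg_current_facts view could fit together, using an in-memory SQLite database. The WHERE clause mirrors the one quoted in the review further down; the reduced kg_relations column set, the id column, and the body of soft_close_relation are assumptions for illustration, not the PR's actual implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed, reduced schema: the real table has more columns.
    CREATE TABLE kg_relations (
        id TEXT PRIMARY KEY,
        source_id TEXT NOT NULL,
        target_id TEXT NOT NULL,
        relation_type TEXT NOT NULL,
        fact TEXT,
        valid_from TEXT,
        valid_until TEXT,
        expired_at TEXT
    );
    CREATE VIEW kg_current_facts AS
    SELECT * FROM kg_relations
    WHERE (valid_from IS NULL OR valid_from <= strftime('%Y-%m-%dT%H:%M:%fZ','now'))
      AND (valid_until IS NULL OR valid_until >= strftime('%Y-%m-%dT%H:%M:%fZ','now'))
      AND expired_at IS NULL;
""")


def soft_close_relation(conn, relation_id):
    """Mark a relation as expired without deleting it.

    Returns True if an open relation was closed, False otherwise.
    """
    cur = conn.execute(
        "UPDATE kg_relations "
        "SET expired_at = strftime('%Y-%m-%dT%H:%M:%fZ','now') "
        "WHERE id = ? AND expired_at IS NULL",
        (relation_id,),
    )
    return cur.rowcount == 1


rows = [
    ("r1", "a", "b", "works_at", "A works at B", None),
    ("r2", "a", "c", "knows", "A knows C", None),
    # Validity window ended long ago, so the view should hide this row.
    ("r3", "a", "d", "knows", "A knew D", "2000-01-01T00:00:00.000Z"),
]
conn.executemany(
    "INSERT INTO kg_relations "
    "(id, source_id, target_id, relation_type, fact, valid_until) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    rows,
)

first_close = soft_close_relation(conn, "r1")
second_close = soft_close_relation(conn, "r1")  # already closed
current_ids = [r[0] for r in conn.execute("SELECT id FROM kg_current_facts ORDER BY id")]
total_rows = conn.execute("SELECT COUNT(*) FROM kg_relations").fetchone()[0]
```

The closed relation stays in kg_relations for history, while readers of the view only see rows that are neither expired nor outside their validity window.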

Test plan

  • 41 new tests in test_kg_standard.py
  • 40 existing tests in test_kg_schema.py pass
  • Full suite: 333 pass, 1 pre-existing MLX failure
  • Lint clean (ruff check + format)

🤖 Generated with Claude Code


Note

Medium Risk
Medium risk because it introduces SQLite schema migrations (new columns, indexes, and a view) and expands KG CRUD/query behavior, which could affect existing databases and downstream consumers if assumptions about returned fields change.

Overview
Adds a standardized KG schema to the SQLite VectorStore via migrations: new metadata/validity columns on kg_entities and kg_relations, mention_type on kg_entity_chunks, new supporting indexes, and a kg_current_facts view that filters out expired/out-of-validity relations.

Extends KG APIs to read/write these standard fields (updated upsert_entity, add_relation, link_entity_chunk, and expanded getters), and adds new utilities: soft_close_relation, get_current_facts, multi-hop traverse (recursive CTE), and resolve_entity (alias/name/canonical/FTS fallback). Introduces brainlayer.kg shared constants plus effective_score() for time-decayed scoring, and adds/updates tests to cover the new schema and behaviors while asserting backward compatibility.

Written by Cursor Bugbot for commit db9fb56.

Summary by CodeRabbit

Release Notes

  • New Features
    • Enhanced knowledge graph with richer entity metadata including canonical names, descriptions, confidence scores, and importance ratings.
    • Entities and relations now support validity windows to track temporal accuracy.
    • Added graph traversal capabilities to explore relationships across multiple hops.
    • Introduced entity resolution to match references by name, alias, or canonical identifier.
    • Relations can now be marked as expired while preserving historical data.

Add standardized knowledge graph schema to BrainLayer's SQLite:
- kg_entities: canonical_name, description, confidence, importance, valid_from/until, group_id
- kg_relations: fact, importance, valid_from/until, expired_at, source_chunk_id
- kg_entity_chunks: mention_type (explicit/inferred/embedding_match)
- kg_current_facts VIEW (auto-filters expired relations)
- New methods: soft_close_relation, get_current_facts, traverse (2-hop CTE), resolve_entity
- Shared constants: ENTITY_TYPES, RELATION_TYPES, DECAY_CONSTANTS, effective_score()
- 41 new tests in test_kg_standard.py, all backward compatible

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@EtanHey EtanHey merged commit 86f097a into main Feb 27, 2026
1 of 5 checks passed
coderabbitai Bot commented Feb 27, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f0839fd and db9fb56.

📒 Files selected for processing (4)
  • src/brainlayer/kg/__init__.py
  • src/brainlayer/vector_store.py
  • tests/test_kg_schema.py
  • tests/test_kg_standard.py

📝 Walkthrough

This PR extends the Knowledge Graph system with standardized constants, time-decay scoring, and enhanced entity/relation management. Introduces entity and relation type definitions, adds time-based decay scoring, expands the KG schema with validity windows and metadata fields, and provides new traversal and resolution utilities.

Changes

| Cohort / File(s) | Summary |
|---|---|
| KG Constants & Scoring: src/brainlayer/kg/__init__.py | New module defining ENTITY_TYPES, RELATION_TYPES, DECAY_CONSTANTS mappings, and effective_score() function for time-decayed scoring based on entity type and age. |
| Vector Store KG Enhancements: src/brainlayer/vector_store.py | Comprehensive schema extensions: added canonical_name, description, confidence, importance, valid_from/valid_until, group_id to kg_entities; fact, importance, valid_from/valid_until, expired_at, source_chunk_id to kg_relations; mention_type to kg_entity_chunks. Updated method signatures for upsert_entity, add_relation, link_entity_chunk with new parameters. Introduced new utility methods: soft_close_relation(), get_current_facts(), traverse(), resolve_entity(). Created kg_current_facts view for filtering expired relations. |
| Schema & Standard Tests: tests/test_kg_schema.py, tests/test_kg_standard.py | Updated schema expectations for new KG columns in test_kg_schema.py. Added comprehensive new test suite (test_kg_standard.py) validating schema structure, effective scoring, entity/relation CRUD with standard fields, soft-close semantics, graph traversal, entity resolution, mention_type handling, and backward compatibility. |
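
The walkthrough does not show the formula behind effective_score(). One common way to express time-decayed scoring by entity type is exponential half-life decay, and the sketch below assumes that form. The name effective_score_sketch, the HALF_LIFE_DAYS table, and every numeric value are hypothetical and not taken from the PR.

```python
import math

# Hypothetical half-lives in days per entity type. The PR's real
# DECAY_CONSTANTS mapping is not shown in this conversation.
HALF_LIFE_DAYS = {"person": 365.0, "project": 90.0, "task": 14.0}
DEFAULT_HALF_LIFE_DAYS = 60.0


def effective_score_sketch(base_score, entity_type, age_days):
    """Scale base_score by 0.5 for every half-life that has elapsed."""
    half_life = HALF_LIFE_DAYS.get(entity_type, DEFAULT_HALF_LIFE_DAYS)
    age = max(age_days, 0.0)  # treat future-dated items as brand new
    return base_score * math.pow(0.5, age / half_life)


fresh = effective_score_sketch(1.0, "task", 0)
one_half_life = effective_score_sketch(1.0, "task", 14)
slow_decay = effective_score_sketch(1.0, "person", 14)
```

With this shape, short-lived entity types lose weight quickly while long-lived ones keep most of their score over the same period.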

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • PR #33: Directly related—both PRs modify kg_entity_chunks handling and link_entity_chunk method in the vector store.
  • PR #29: Extends the foundational KG schema and vector_store KG functionality introduced in this PR.

Poem

🐰 Hops through graphs with joy so bright,
Time-decayed scores, validity's might,
Traverse relations, resolve with care,
Canonical names floating in the air,
Schema bloom—the KG's delight!

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 5 potential issues.

WHERE r.depth > 0
ORDER BY r.depth, e.name
"""
params_list = [entity_id, entity_id] + relation_types + [max_depth]

Traverse parameter order wrong with relation_types filter

High Severity

When relation_types is provided, params_list puts the relation type strings before max_depth, but the SQL query's WHERE re.depth < ? placeholder expects max_depth as the third parameter. The actual ordering in the query is: entity_id, entity_id, max_depth, then the IN (?, ...) relation type placeholders. But params_list = [entity_id, entity_id] + relation_types + [max_depth] puts relation types third and max_depth last. This causes a string to be compared against depth, completely breaking traversal when relation_types is specified.

Additional Locations (1)
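
The ordering issue described above comes down to one rule: positional `?` parameters must be listed in the order the placeholders appear in the SQL text. The sketch below is an assumed, simplified recursive CTE, not the PR's query. The function name traverse_sketch and the reduced kg_relations schema are hypothetical.

```python
import sqlite3


def traverse_sketch(conn, entity_id, max_depth=2, relation_types=None):
    """Return (entity_id, depth) pairs reachable within max_depth hops."""
    relation_types = list(relation_types or [])
    rel_filter = ""
    if relation_types:
        placeholders = ", ".join("?" for _ in relation_types)
        rel_filter = f"AND r.relation_type IN ({placeholders})"

    sql = f"""
        WITH RECURSIVE reach(id, depth) AS (
            SELECT ?, 0                          -- 1st placeholder: start entity
            UNION
            SELECT r.target_id, reach.depth + 1
            FROM kg_relations r
            JOIN reach ON r.source_id = reach.id
            WHERE reach.depth < ?                -- 2nd placeholder: max_depth
              {rel_filter}                       -- then: one per relation type
        )
        SELECT id, MIN(depth) AS depth
        FROM reach
        WHERE depth > 0 AND id != ?              -- last placeholder: start entity
        GROUP BY id
        ORDER BY depth, id
    """
    # Build the parameter list in exactly the order the placeholders appear.
    params = [entity_id, max_depth] + relation_types + [entity_id]
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE kg_relations (source_id TEXT, target_id TEXT, relation_type TEXT)"
)
conn.executemany(
    "INSERT INTO kg_relations VALUES (?, ?, ?)",
    [
        ("a", "b", "works_at"),
        ("b", "c", "knows"),
        ("a", "d", "knows"),
        ("c", "e", "knows"),
    ],
)

two_hops = traverse_sketch(conn, "a", max_depth=2)
only_knows = traverse_sketch(conn, "a", max_depth=2, relation_types=["knows"])
```

Building the SQL text and the parameter list side by side, in the same order, makes this class of mismatch easier to spot in review. Named parameters are another option when the list grows.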

# 1. Exact alias
result = self.get_entity_by_alias(name_or_alias)
if result:
    return result

Alias resolution returns dict missing new standard fields

Medium Severity

resolve_entity calls get_entity_by_alias as its first resolution step and returns the result directly. But get_entity_by_alias wasn't updated — it only returns 6 fields (id, entity_type, name, metadata, created_at, updated_at), missing the 7 new standard fields (canonical_name, description, confidence, importance, valid_from, valid_until, group_id). Steps 2–4 of resolution all return the full 13-field dict, so callers get inconsistent dict shapes depending on how the entity was resolved.

Additional Locations (1)
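
One way to keep the dict shape consistent is to route every lookup through a single column list and a single row-to-dict helper. The sketch below assumes that approach. The 13 column names come from the comment above; the kg_entity_aliases table, the helper names, and the *_sketch functions are hypothetical and only cover two of the four resolution steps.

```python
import sqlite3

# The 6 original fields plus the 7 new standard fields named in the review.
ENTITY_COLUMNS = (
    "id", "entity_type", "name", "metadata", "created_at", "updated_at",
    "canonical_name", "description", "confidence", "importance",
    "valid_from", "valid_until", "group_id",
)
_SELECT = ", ".join(f"e.{col}" for col in ENTITY_COLUMNS)


def _row_to_entity(row):
    """Convert a result row into a dict with the full, fixed key set."""
    return dict(zip(ENTITY_COLUMNS, row)) if row is not None else None


def get_entity_by_alias_sketch(conn, alias):
    row = conn.execute(
        f"SELECT {_SELECT} FROM kg_entities e "
        "JOIN kg_entity_aliases a ON a.entity_id = e.id "  # assumed table name
        "WHERE a.alias = ?",
        (alias,),
    ).fetchone()
    return _row_to_entity(row)


def get_entity_by_name_sketch(conn, name):
    row = conn.execute(
        f"SELECT {_SELECT} FROM kg_entities e WHERE e.name = ?", (name,)
    ).fetchone()
    return _row_to_entity(row)


def resolve_entity_sketch(conn, name_or_alias):
    """Try alias first, then exact name; both paths return the same shape."""
    return (
        get_entity_by_alias_sketch(conn, name_or_alias)
        or get_entity_by_name_sketch(conn, name_or_alias)
    )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE kg_entities (
        id TEXT PRIMARY KEY, entity_type TEXT, name TEXT, metadata TEXT,
        created_at TEXT, updated_at TEXT, canonical_name TEXT, description TEXT,
        confidence REAL, importance REAL, valid_from TEXT, valid_until TEXT,
        group_id TEXT
    );
    CREATE TABLE kg_entity_aliases (alias TEXT PRIMARY KEY, entity_id TEXT);
    INSERT INTO kg_entities (id, entity_type, name, canonical_name)
    VALUES ('e1', 'person', 'Ada Lovelace', 'ada lovelace');
    INSERT INTO kg_entity_aliases (alias, entity_id) VALUES ('Ada', 'e1');
""")

by_alias = resolve_entity_sketch(conn, "Ada")
by_name = resolve_entity_sketch(conn, "Ada Lovelace")
missing = resolve_entity_sketch(conn, "nobody")
```

Because both lookups share ENTITY_COLUMNS, adding a column later changes every getter at once instead of leaving one behind.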

if relation_types:
    placeholders = ", ".join("?" for _ in relation_types)
    rel_filter = f"AND r.relation_type IN ({placeholders})"
    params = [entity_id] + relation_types + [max_depth]

Dead params variable never used in traverse

Low Severity

The params variable is computed (lines 2919–2923) but never used — params_list is what actually gets passed to cursor.execute. This dead code is confusing, especially since params also has an incorrect parameter ordering, which could mislead future maintainers into thinking it's the intended parameter list.

- confidence = excluded.confidence
+ confidence = excluded.confidence,
+ fact = COALESCE(excluded.fact, kg_relations.fact),
+ importance = COALESCE(excluded.importance, kg_relations.importance),

COALESCE on importance is ineffective due to default

Medium Severity

In add_relation, importance has a Python default of 0.5 (never None), so COALESCE(excluded.importance, kg_relations.importance) in the ON CONFLICT clause will always use the excluded value. On upsert without an explicit importance, the existing stored importance is silently overwritten with 0.5 instead of being preserved. This contradicts the COALESCE pattern's intent — importance would need an Optional[float] = None default to allow preservation.
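
A minimal sketch of the Optional[float] = None approach suggested above, assuming a reduced kg_relations table with a unique key on (source_id, target_id, relation_type). The function name add_relation_sketch and the schema are hypothetical. The 0.5 default is applied only on first insert, and the update branch falls back to the stored value when the caller passes nothing.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE kg_relations (
        source_id TEXT NOT NULL,
        target_id TEXT NOT NULL,
        relation_type TEXT NOT NULL,
        fact TEXT,
        importance REAL,
        UNIQUE (source_id, target_id, relation_type)
    )
""")


def add_relation_sketch(
    conn,
    source_id: str,
    target_id: str,
    relation_type: str,
    fact: Optional[str] = None,
    importance: Optional[float] = None,  # None means "caller did not specify"
) -> None:
    conn.execute(
        """
        INSERT INTO kg_relations (source_id, target_id, relation_type, fact, importance)
        VALUES (:source_id, :target_id, :relation_type, :fact,
                COALESCE(:importance, 0.5))      -- default only for new rows
        ON CONFLICT (source_id, target_id, relation_type) DO UPDATE SET
            fact = COALESCE(excluded.fact, kg_relations.fact),
            -- Use the raw parameter here, not excluded.importance, which
            -- already has the 0.5 default baked in.
            importance = COALESCE(:importance, kg_relations.importance)
        """,
        {
            "source_id": source_id,
            "target_id": target_id,
            "relation_type": relation_type,
            "fact": fact,
            "importance": importance,
        },
    )


add_relation_sketch(conn, "a", "b", "knows", importance=0.9)
add_relation_sketch(conn, "a", "b", "knows", fact="met in 2020")  # no importance
add_relation_sketch(conn, "a", "c", "knows")                      # brand-new row

stored = dict(
    ((s, t), (f, i))
    for s, t, f, i in conn.execute(
        "SELECT source_id, target_id, fact, importance FROM kg_relations"
    )
)
```

The second call updates the fact but leaves the stored importance of 0.9 untouched, which is the behavior the COALESCE pattern is meant to give.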

WHERE (valid_from IS NULL OR valid_from <= strftime('%Y-%m-%dT%H:%M:%fZ','now'))
AND (valid_until IS NULL OR valid_until >= strftime('%Y-%m-%dT%H:%M:%fZ','now'))
AND expired_at IS NULL
""")

View timestamp comparison breaks with standard ISO 8601 input

Medium Severity

The kg_current_facts VIEW compares user-provided valid_from/valid_until strings against strftime('%Y-%m-%dT%H:%M:%fZ','now'), which produces timestamps with 3 fractional digits (e.g., ...T09:00:00.000Z). SQLite compares TEXT values byte-by-byte. If a user stores a common ISO 8601 format without fractional seconds (e.g., "2026-02-27T09:00:00Z"), the Z character (ASCII 90) at the seconds boundary sorts higher than . (ASCII 46), causing valid_from <= now to return FALSE when the times are at the same second. This silently excludes valid relations or includes expired ones depending on the field.
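
The byte-wise comparison problem can be reproduced with plain Python strings, which also compare character by character. One possible mitigation, sketched below, is to normalize timestamps to the same millisecond format the view produces before storing them. The helper name normalize_utc_timestamp is hypothetical, and treating timezone-naive input as UTC is an assumption.

```python
from datetime import datetime, timezone


def normalize_utc_timestamp(value: str) -> str:
    """Return value as YYYY-MM-DDTHH:MM:SS.mmmZ in UTC."""
    text = value.strip()
    if text.endswith(("Z", "z")):
        # Older Python versions do not accept a trailing "Z" in fromisoformat.
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    parsed = parsed.astimezone(timezone.utc)
    millis = parsed.microsecond // 1000
    return parsed.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


stored = "2026-02-27T09:00:00Z"    # common ISO 8601, no fractional seconds
now = "2026-02-27T09:00:00.500Z"   # shape produced by strftime('%f') in SQLite

# 'Z' (ASCII 90) sorts after '.' (ASCII 46), so the earlier time looks later.
naive_ok = stored <= now
normalized_ok = normalize_utc_timestamp(stored) <= now
```

Normalizing on write keeps the view's SQL unchanged. Comparing through SQLite's date functions in the view would be an alternative, at the cost of not being able to use a plain index on the text columns.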
