docs: update README — 1,454 tests, 12 MCP tools, entity contracts#120

Merged
EtanHey merged 2 commits into main from docs/march-28-sweep
Mar 28, 2026

Conversation


@EtanHey EtanHey commented Mar 28, 2026

Summary

  • Updated test count: 1,204 -> 1,454 Python tests
  • Updated MCP tool count: 11 -> 12 (added brain_enrich)
  • Updated chunk count: 281K+ -> 284K+
  • Added entity contracts and health scoring to feature list
  • Added MLX as default enrichment backend
  • Added brain_enrich tool to MCP tools table

Test plan

  • Verify badge renders correctly on GitHub
  • Confirm all numbers match pytest --co -q output

Generated with Claude Code

Note

Add entity health scoring, contracts, and new DB schema tables

  • Introduces contracts/entity-types.yaml defining field requirements, relationship expectations, and health criteria for entity types (agent, person, tool, project, concept).
  • Adds scripts/score_entity_health.py which computes a weighted completeness score (required/expected fields, relationship density, chunk density, recency decay) per entity and upserts results into a new entity_health table.
  • Extends VectorStore._init_db in vector_store.py to create entity_contracts, entity_health, and entity_type_hierarchy tables, and adds entity_subtype/status columns to kg_entities and relation_tier/weight to kg_entity_chunks.
  • Updates entity_lookup in digest.py to include completeness_score and health_level from the entity_health table when available.
  • Updates README to reflect 12 MCP tools, 1,454 tests, and documents the new enrichment and entity contracts features.
  • Risk: schema alterations run via ALTER TABLE on every DB init; existing databases will be migrated silently with new default-valued columns.
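As a rough illustration of the weighted completeness score described above, here is a minimal sketch. The weights come from the commit message (required 40%, expected 25%, relationships 15%, chunks 10%, recency 10%, with a 90-day half-life); the function and parameter names are hypothetical, not the script's actual API.

```python
WEIGHTS = {"required": 0.40, "expected": 0.25, "relations": 0.15,
           "chunks": 0.10, "recency": 0.10}
HALF_LIFE_DAYS = 90

def completeness(required_have, required_total, expected_have, expected_total,
                 relation_count, relation_target, chunk_count, chunk_target,
                 days_since_update):
    """Weighted 0-1 completeness score across five dimensions (sketch)."""
    req = required_have / required_total if required_total else 1.0
    exp = expected_have / expected_total if expected_total else 1.0
    rel = min(relation_count / relation_target, 1.0) if relation_target else 1.0
    chk = min(chunk_count / chunk_target, 1.0) if chunk_target else 1.0
    rec = 0.5 ** (days_since_update / HALF_LIFE_DAYS)  # exponential recency decay
    return (WEIGHTS["required"] * req + WEIGHTS["expected"] * exp
            + WEIGHTS["relations"] * rel + WEIGHTS["chunks"] * chk
            + WEIGHTS["recency"] * rec)
```

A fully populated entity updated today scores 1.0; the same entity untouched for one half-life (90 days) loses half of the 10% recency component.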
📊 Macroscope summarized 1ad9a62. 6 files reviewed, 3 issues evaluated, 1 issue filtered, 1 comment posted

🗂️ Filtered Issues

scripts/score_entity_health.py — 0 comments posted, 2 evaluated, 1 filtered
  • line 42: If the YAML file is empty or contains only whitespace/comments, yaml.safe_load(f) returns None. The subsequent data.get("entity_types", {}) will raise AttributeError: 'NoneType' object has no attribute 'get'. [ Skipped comment generation ]
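A minimal guard for the failure mode flagged here — `yaml.safe_load` returning `None` for an empty or comment-only file — might look like this. The helper name is illustrative; the real script's structure may differ.

```python
def load_entity_types(parsed):
    """parsed is whatever yaml.safe_load(f) returned; None for an empty file."""
    data = parsed or {}  # coalesce None to an empty mapping before .get()
    return data.get("entity_types", {})

# Empty YAML file -> safe_load gives None -> no AttributeError, just {}
assert load_entity_types(None) == {}
assert load_entity_types({"entity_types": {"agent": {}}}) == {"agent": {}}
```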

Summary by CodeRabbit

  • New Features

    • Entity health scoring system now tracks completeness and validates entity data integrity.
    • Entity type contracts framework provides structured validation for agent, person, tool, project, and concept entities.
    • New enrichment controller tool (brain_enrich) available.
  • Documentation

    • Updated capabilities and tool count in project documentation.
  • Tests

    • Added comprehensive test suite validating entity contract functionality and health scoring.

EtanHey and others added 2 commits March 28, 2026 20:10
…rarchy

Add entity contract system for evaluating knowledge graph entity completeness:

- SQL schema: entity_contracts, entity_health, entity_type_hierarchy tables
- ALTER: kg_entities +entity_subtype +status; kg_entity_chunks +relation_tier +weight
- ALTER: kg_entity_aliases +valid_from +valid_to
- contracts/entity-types.yaml: YAML contracts for agent, person, tool, project, concept
- scripts/score_entity_health.py: 5-dimension weighted scoring (required 40%, expected 25%,
  relationships 15%, chunks 10%, recency 10%) with Wikidata 5-tier health levels
- brain_entity response now includes completeness_score and health_level
- Type hierarchy seeded with core types + subtypes (golem->agent, platform->tool, etc.)
- 39 new tests (TDD: RED first, all GREEN), 0 regressions on existing suite

R49 research: brainbar-b917124c-913

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tures

Tests: 1,204 -> 1,454 Python. MCP tools: 11 -> 12 (brain_enrich added).
Chunks: 281K+ -> 284K+. Added entity contracts, health scoring, MLX backend.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Mar 28, 2026

📝 Walkthrough


This PR implements R49 Phase 1 entity contract and health scoring infrastructure. It introduces an entity type contracts schema (contracts/entity-types.yaml), a health scoring CLI script that computes completeness metrics, database migrations for new entity-related tables and columns, and extends the entity lookup API with health metadata. Comprehensive tests validate the end-to-end implementation.

Changes

  • Documentation & Contracts — README.md, contracts/entity-types.yaml: Updated README to reflect 12 MCP tools (was 11), 1,454 tests (was 1,204), the new brain_enrich tool, the MLX backend addition, and the "Entity contracts" feature. Added a new versioned R49 entity type contracts file defining required/expected/optional fields, relationship expectations, and health criteria for five entity types.
  • Health Scoring Implementation — scripts/score_entity_health.py: New CLI script that loads entity contracts, iterates entities, computes weighted completeness scores based on field coverage, relationship density, chunk density, and recency (90-day half-life), classifies health levels (1–5), and persists results to the entity_health table via upsert.
  • Core Database & Logic Updates — src/brainlayer/vector_store.py, src/brainlayer/pipeline/digest.py: Database initialization adds three new tables (entity_contracts, entity_health, entity_type_hierarchy) with seeded hierarchy data and conditional DDL migrations to kg_entities (adds entity_subtype, status), kg_entity_chunks (adds relation_tier, weight), and kg_entity_aliases (adds valid_from, valid_to). Entity lookup is enriched with a conditional fetch of completeness_score and health_level from entity_health.
  • Test Coverage — tests/test_entity_contracts.py, tests/test_kg_schema.py: Comprehensive 540-line end-to-end test suite validating schema initialization, contracts loading, health scoring behavior, entity count verification, health classification boundaries, persistence and deduplication, and enriched entity lookup fields. Updated KG schema assertions to include the new columns on kg_entities and kg_entity_chunks.
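The 1–5 health classification mentioned for score_entity_health.py can be sketched as a simple threshold map. The thresholds below are illustrative, not the script's actual cutoffs.

```python
def health_level(score: float) -> int:
    """Map a 0-1 completeness score to a 5-tier, Wikidata-style health level.
    Thresholds are hypothetical placeholders for the sketch."""
    for level, threshold in ((5, 0.9), (4, 0.75), (3, 0.5), (2, 0.25)):
        if score >= threshold:
            return level
    return 1
```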

Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant digest as entity_lookup<br/>(pipeline)
    participant store as VectorStore<br/>(DB)
    participant health as entity_health<br/>(table)
    participant scorer as score_entity_health<br/>(script)

    User->>digest: entity_lookup(entity_name)
    digest->>store: query FTS/semantic search
    store-->>digest: entity results + relations
    digest->>store: fetch health metadata
    store->>health: SELECT completeness_score,<br/>health_level by entity_name
    health-->>store: score data (or NULL)
    store-->>digest: completeness_score, health_level
    digest-->>User: enriched entity result

    User->>scorer: run score_entity_health.py
    scorer->>store: load contracts from YAML
    scorer->>store: iterate kg_entities
    loop per entity
        scorer->>store: fetch fields, relationships, chunks, updated_at
        scorer->>scorer: compute weighted completeness<br/>(fields + relations + chunks + recency)
        scorer->>scorer: classify health level (1-5)
    end
    scorer->>store: upsert entity_health records
    store-->>scorer: persist (commit)
    scorer-->>User: print health distribution counts
```
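The upsert step at the end of the diagram can be sketched with SQLite's `ON CONFLICT` clause. The table columns follow the PR description; the script's real SQL may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entity_health ("
    "entity_name TEXT PRIMARY KEY, completeness_score REAL, health_level INTEGER)"
)
# Writing the same entity twice updates in place instead of duplicating rows
for score, level in ((0.4, 2), (0.9, 5)):
    conn.execute(
        "INSERT INTO entity_health (entity_name, completeness_score, health_level) "
        "VALUES (?, ?, ?) "
        "ON CONFLICT(entity_name) DO UPDATE SET "
        "completeness_score = excluded.completeness_score, "
        "health_level = excluded.health_level",
        ("coachClaude", score, level),
    )
rows = conn.execute("SELECT * FROM entity_health").fetchall()
# rows holds a single, updated record for the entity
```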

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 A rabbit hops through entity contracts divine,
Scoring health with chunks and bonds aligned,
Completeness blossoms in the KG store—
R49 contracts open up the door! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 46.55%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title clearly and specifically summarizes the main documentation updates: test count increase, new MCP tool, and entity contracts feature addition.


Comment on lines +395 to +403
```python
def test_missing_required_detected(self, populated_store, contracts_path):
    """coachClaude doesn't have all required fields populated — detect gaps."""
    from scripts.score_entity_health import score_entity, load_contracts

    contracts = load_contracts(str(contracts_path))
    result = score_entity(
        populated_store, "ent-coach", "agent", contracts["agent"]
    )
    # capabilities is partially there in metadata, memory_domains is missing
    assert isinstance(result["missing_required"], list)
```

🟡 Medium tests/test_entity_contracts.py:395

test_missing_required_detected only asserts that missing_required is a list, which passes even when empty. The docstring and comment claim memory_domains should be detected as missing, but the test never verifies this field (or any field) actually appears in the list. Consider adding an assertion that specific expected fields are present in missing_required.

```diff
         assert isinstance(result["missing_required"], list)
+        # Verify specific missing fields are detected
+        assert "memory_domains" in result["missing_required"]
```
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file tests/test_entity_contracts.py around lines 395-403:

`test_missing_required_detected` only asserts that `missing_required` is a list, which passes even when empty. The docstring and comment claim `memory_domains` should be detected as missing, but the test never verifies this field (or any field) actually appears in the list. Consider adding an assertion that specific expected fields are present in `missing_required`.

Evidence trail:
tests/test_entity_contracts.py lines 395-403 at REVIEWED_COMMIT: The test function `test_missing_required_detected` has docstring claiming detection of missing required fields, comment specifically mentioning `memory_domains is missing`, but the only assertion is `assert isinstance(result["missing_required"], list)` with no check for specific field presence.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
README.md (2)

94-110: ⚠️ Potential issue | 🟡 Minor

Inconsistency: Update architecture diagram to 12 tools.

The Mermaid diagram label still shows "11 tools" while the rest of the document reflects 12.

📝 Proposed fix
```diff
-    A["Claude Code / Cursor / Zed"] -->|MCP| B["BrainLayer MCP Server<br/>11 tools"]
+    A["Claude Code / Cursor / Zed"] -->|MCP| B["BrainLayer MCP Server<br/>12 tools"]
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 94 - 110, Update the Mermaid diagram node B that
currently reads "BrainLayer MCP Server<br/>11 tools" to reflect "12 tools" so
the architecture diagram matches the rest of the doc; locate the graph LR block
and change the label for node B (symbol B / "BrainLayer MCP Server") to show 12
tools.

3-3: ⚠️ Potential issue | 🟡 Minor

Inconsistency: Update tagline to 12 MCP tools.

The opening tagline still mentions "11 MCP tools" but the rest of the README (badge, stats line, section header, comparison table) now reflects 12 tools.

📝 Proposed fix
```diff
-> Persistent memory and knowledge graph for AI agents — 11 MCP tools, real-time JSONL watcher, Axiom telemetry, and a native macOS daemon for always-on recall across every conversation.
+> Persistent memory and knowledge graph for AI agents — 12 MCP tools, real-time JSONL watcher, Axiom telemetry, and a native macOS daemon for always-on recall across every conversation.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` at line 3, Update the README opening tagline to reflect "12 MCP
tools" instead of "11 MCP tools" so it matches the badges, stats line, section
header, and comparison table; locate the tagline string in the README (the first
paragraph/heading that currently reads "Persistent memory and knowledge graph
for AI agents — 11 MCP tools, real-time JSONL watcher, Axiom telemetry, and a
native macOS daemon for always-on recall across every conversation.") and
replace "11 MCP tools" with "12 MCP tools".

📥 Commits

Reviewing files that changed from the base of the PR and between d0bcac9 and 1ad9a62.

📒 Files selected for processing (7)
  • README.md
  • contracts/entity-types.yaml
  • scripts/score_entity_health.py
  • src/brainlayer/pipeline/digest.py
  • src/brainlayer/vector_store.py
  • tests/test_entity_contracts.py
  • tests/test_kg_schema.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.13)
🔇 Additional comments (7)
README.md (1)

9-10: LGTM — Documentation updates are consistent.

The test count (1,454), MCP tool count (12), chunk count (284K+), entity contracts feature, MLX backend addition, and brain_enrich tool documentation all align with the PR objectives and commit messages.

Also applies to: 15-15, 125-125, 128-128, 149-154, 183-183, 187-190, 201-201, 216-218, 289-289

contracts/entity-types.yaml (1)

1-162: LGTM — Well-structured entity type contracts.

The contract specification is well-documented with clear requirement levels and consistent health criteria across all entity types. The hierarchical structure (required/expected/optional) provides flexibility for progressive entity completeness.

tests/test_kg_schema.py (1)

74-95: LGTM — Schema assertions updated for R49 columns.

The expected column sets correctly reflect the new entity_subtype, status, relation_tier, and weight columns added by the R49 migrations in vector_store.py.

Also applies to: 118-122

src/brainlayer/vector_store.py (1)

586-662: LGTM — R49 schema additions are idempotent and well-structured.

The schema changes follow best practices:

  • Table creation uses IF NOT EXISTS
  • Column additions check existence via PRAGMA table_info before ALTER TABLE
  • Type hierarchy seeding uses INSERT OR IGNORE
  • All changes are safe for repeated _init_db() calls
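The PRAGMA-before-ALTER pattern praised here can be illustrated with a minimal sqlite3 sketch (the helper name is hypothetical, not taken from vector_store.py):

```python
import sqlite3

def add_column_if_missing(conn, table, column, ddl):
    """Idempotent migration: only ALTER TABLE when the column is absent."""
    cols = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column not in cols:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {ddl}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kg_entities (id TEXT PRIMARY KEY)")
add_column_if_missing(conn, "kg_entities", "status", "status TEXT DEFAULT 'active'")
add_column_if_missing(conn, "kg_entities", "status", "status TEXT DEFAULT 'active'")  # safe no-op
```

Calling it twice is harmless, which is exactly what makes repeated `_init_db()` runs safe.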
src/brainlayer/pipeline/digest.py (1)

847-872: LGTM — Health data integration with graceful fallback.

The implementation correctly:

  • Uses _read_cursor() for thread-safe reads
  • Queries by entity_name matching the entity_health table's primary key
  • Falls back to None when the table doesn't exist or entity is unscored

The bare except Exception is appropriate here since it needs to handle both OperationalError (table doesn't exist) and empty results, and the fallback behavior (return None for both fields) is the same.
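A stripped-down version of that fallback pattern, for illustration (the function name and exact query are assumptions; the exception is narrowed to OperationalError here for the sketch, whereas the real code catches Exception):

```python
import sqlite3

def fetch_health(conn, entity_name):
    """Return (completeness_score, health_level), or (None, None) when the
    entity_health table is missing or the entity has not been scored."""
    try:
        row = conn.execute(
            "SELECT completeness_score, health_level FROM entity_health "
            "WHERE entity_name = ?",
            (entity_name,),
        ).fetchone()
        return row if row else (None, None)
    except sqlite3.OperationalError:  # e.g. no such table: entity_health
        return (None, None)

conn = sqlite3.connect(":memory:")  # fresh DB: entity_health doesn't exist
result = fetch_health(conn, "coachClaude")
```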

scripts/score_entity_health.py (1)

1-22: LGTM — Well-documented scoring implementation.

The scoring algorithm is clearly documented with:

  • 5-dimension weighted scoring (40% required, 25% expected, 15% relationships, 10% chunks, 10% recency)
  • Wikidata-style 5-tier health classification
  • Exponential decay for recency with 90-day half-life
  • Proper use of get_db_path() for DB resolution per coding guidelines
  • Clean resource management with finally block

Also applies to: 34-42, 45-63, 129-219, 286-318

tests/test_entity_contracts.py (1)

30-128: LGTM — Comprehensive R49 test coverage.

The test suite thoroughly covers:

  • Schema creation (3 new tables, altered columns with defaults)
  • Contract YAML validation (structure, required fields, health criteria)
  • Scoring algorithm (5-dimension weights, tier boundaries, edge cases)
  • Persistence (upsert behavior, idempotency)
  • Lookup integration (health fields present/absent)
  • Type hierarchy seeding

Good use of the populated_store fixture to set up realistic test data with entities, relations, and chunks.

Also applies to: 134-178, 184-249, 254-265, 271-317, 323-431, 436-463, 469-508, 514-539

Comment on lines +240 to +245
```python
            parent_row = list(cursor.execute(
                "SELECT parent_type FROM entity_type_hierarchy WHERE child_type = ?",
                (entity_type,),
            ))
            if parent_row:
                contract = contracts.get(parent_row[0][1] if len(parent_row[0]) > 1 else parent_row[0][0])
```

⚠️ Potential issue | 🟡 Minor

Bug: Incorrect tuple indexing for parent type lookup.

The query SELECT parent_type FROM entity_type_hierarchy returns a single-column tuple, so parent_row[0] is (parent_type,). The condition len(parent_row[0]) > 1 will never be true for a single-column result, making this always fall through to parent_row[0][0].

The ternary is unnecessary and confusing.

🐛 Proposed fix

```diff
             parent_row = list(cursor.execute(
                 "SELECT parent_type FROM entity_type_hierarchy WHERE child_type = ?",
                 (entity_type,),
             ))
             if parent_row:
-                contract = contracts.get(parent_row[0][1] if len(parent_row[0]) > 1 else parent_row[0][0])
+                contract = contracts.get(parent_row[0][0])
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/score_entity_health.py` around lines 240 - 245, The tuple indexing
for the parent type lookup is incorrect/overly complex: the SELECT returns
single-column rows so parent_row[0] is like (parent_type,) and the ternary with
len(parent_row[0]) > 1 is useless; change the logic that follows the
cursor.execute call to extract the parent_type as parent_row[0][0] (or assign
parent_type = parent_row[0][0]) and then use contracts.get(parent_type) instead
of the current conditional expression so the lookup is correct and readable
(refer to variables parent_row, cursor.execute and contracts in this change).
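To see why the `len(parent_row[0]) > 1` branch can never fire, a self-contained sqlite3 demo of single-column row shape (the golem→agent pairing follows the commit message's seeded hierarchy):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity_type_hierarchy (child_type TEXT, parent_type TEXT)")
conn.execute("INSERT INTO entity_type_hierarchy VALUES ('golem', 'agent')")
parent_row = list(conn.execute(
    "SELECT parent_type FROM entity_type_hierarchy WHERE child_type = ?",
    ("golem",),
))
# A single-column SELECT yields 1-tuples: parent_row[0] == ('agent',)
parent_type = parent_row[0][0]  # the straightforward extraction
```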

Comment on lines +17 to +20
```python
import subprocess
import sys
from pathlib import Path
from unittest.mock import patch
```

🧹 Nitpick | 🔵 Trivial

Minor: Unused imports.

subprocess and patch are imported but not used in this test file.

🧹 Proposed cleanup

```diff
 import json
 import math
 import os
-import subprocess
-import sys
 from pathlib import Path
-from unittest.mock import patch

 import pytest
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_entity_contracts.py` around lines 17 - 20, The file imports unused
symbols subprocess and patch; remove these unused imports from the top of
tests/test_entity_contracts.py so only needed imports (sys, Path) remain, or
alternatively use them in tests if intended; specifically edit the import
line(s) to drop subprocess and from unittest.mock import patch unless patch is
actually referenced by functions or fixtures like any test_* functions.

@EtanHey EtanHey merged commit 3df38b8 into main Mar 28, 2026
2 of 6 checks passed