Conversation
…tadata (R68 Round 4)

Scans ~/Gits/ for repos with pyproject.toml or package.json, extracts metadata (name, version, description, dependencies, scripts, language/framework), and upserts structured project entities into the KG with depends_on relations for notable dependencies.

- New module: pipeline/code_intelligence.py (scan, extract, enrich)
- New CLI command: brainlayer code-intel [--base-dir] [--dry-run]
- 14 tests covering scan, enrich, idempotency, dry-run, metadata storage
- Detects package manager (npm/bun/pnpm/yarn) and framework (React, Next.js, MCP, etc.)
- Creates library entities for notable deps (frameworks, SDKs, databases)
- Follows agent_enrichment.py pattern for KG entity upsert

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai review
✅ Actions performed: review triggered.
📝 Walkthrough

Added a new code-intel CLI and a code intelligence pipeline that scans local repos (pyproject/package.json), extracts metadata/dependencies, upserts project/library entities into the KG SQLite DB, creates depends_on relations, supports dry-run, and reports counts.
Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant CLI as "CLI"
    participant Scanner as "Scanner"
    participant Enricher as "Enricher"
    participant Database as "Database"
    User->>CLI: run code-intel (--base-dir, --dry-run)
    CLI->>Scanner: scan_projects(base_dir)
    Scanner->>Scanner: detect repos (pyproject.toml / package.json)
    Scanner-->>CLI: return list of project metadata
    CLI->>Enricher: enrich_projects(db_path, base_dir, dry_run)
    loop for each project
        Enricher->>Database: query project by LOWER(name)
        alt not found
            Enricher->>Database: insert project entity
        else found
            Enricher->>Database: update project entity
        end
        loop for each notable dependency
            Enricher->>Database: upsert library entity
            Enricher->>Database: create depends_on relation (if missing)
        end
    end
    Database-->>Enricher: return operation counts
    Enricher-->>CLI: return stats
    CLI-->>User: print completion report
```
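For readers who prefer code to diagrams, the scan step above can be sketched roughly as follows. This is a simplified illustration: only the `scan_projects` name and the two manifest filenames come from the PR; the body and the returned dict shape are assumptions.

```python
from pathlib import Path

# Manifest files that mark a directory as a scannable project (per the PR description).
MANIFESTS = ("pyproject.toml", "package.json")


def scan_projects(base_dir: Path) -> list[dict]:
    """Return one metadata dict per repo under base_dir that has a known manifest."""
    projects = []
    for repo in sorted(p for p in base_dir.iterdir() if p.is_dir()):
        for manifest in MANIFESTS:
            if (repo / manifest).exists():
                projects.append({"name": repo.name, "manifest": manifest})
                break  # first matching manifest wins for this sketch
    return projects
```

The real pipeline then feeds each entry into the enrich loop shown in the diagram.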
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
```python
) -> None:
    """Add a depends_on relation from project to dependency.

    Creates the dependency as a 'library' entity if it doesn't exist.
```
🟢 Low pipeline/code_intelligence.py:265
In dry-run mode, stats["dep_entities_created"] over-counts when multiple projects share the same notable dependency. When dry_run=True, the library entity is never inserted, so each subsequent call for the same dependency fails the target_row check, generates a new random target_id, and increments the counter again. For example, 3 projects depending on react reports dep_entities_created: 3 instead of 1. Consider tracking which dependencies have been "virtually" created in dry-run mode to keep the stats accurate.
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file src/brainlayer/pipeline/code_intelligence.py around line 265:
In dry-run mode, `stats["dep_entities_created"]` over-counts when multiple projects share the same notable dependency. When `dry_run=True`, the library entity is never inserted, so each subsequent call for the same dependency fails the `target_row` check, generates a new random `target_id`, and increments the counter again. For example, 3 projects depending on `react` reports `dep_entities_created: 3` instead of `1`. Consider tracking which dependencies have been "virtually" created in dry-run mode to keep the stats accurate.
Evidence trail:
src/brainlayer/pipeline/code_intelligence.py lines 309-323 at REVIEWED_COMMIT:
- Line 309-312: target_row query checks database for existing entity
- Line 314: `if not target_row:` branch
- Line 315: generates new UUID-based target_id
- Line 316-320: `if not dry_run:` - INSERT only runs when dry_run=False
- Line 321: `stats["dep_entities_created"] += 1` - incremented unconditionally within the `if not target_row` block
In dry-run mode, the entity is never inserted, so subsequent calls for the same dependency will find target_row=None again, generating new IDs and incrementing the counter multiple times.
Acknowledged — minor dry-run accuracy issue. In practice dry-run is for previewing what would happen, and the slight overcount doesn't affect actual DB writes. Won't fix for now.
…issing key

- Fix framework detection order: check specific (expo, react-native, next) before generic (react) to avoid misclassification
- Add ~ to version specifier regex for PEP 440 compatible release (~=)
- Add dep_entities_created key to early return path (empty scan)
- Add 3 regression tests for all 3 fixes (17 total)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
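The version-specifier fix in this commit can be illustrated with a minimal sketch. The PR's actual regex is not shown in this thread, so `SPEC_RE` and `split_requirement` below are hypothetical names; the point is that adding `~` to the specifier character class lets PEP 440 compatible-release pins (`~=`) parse.

```python
import re

# Split a PEP 508-style requirement into (name, specifier).
# The '~' in the character class is what makes compatible-release pins (~=) work.
SPEC_RE = re.compile(r"^([A-Za-z0-9._-]+)\s*([<>=!~].*)?$")


def split_requirement(req: str) -> tuple[str, str]:
    m = SPEC_RE.match(req.strip())
    if not m:
        raise ValueError(f"unparseable requirement: {req!r}")
    return m.group(1), (m.group(2) or "").strip()
```

Without `~` in the class, `fastapi~=0.110` would fail to split cleanly into name and specifier.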
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/brainlayer/cli/__init__.py`:
- Around line 1936-1957: The code block inside the CLI command function
code_intel is failing ruff format checks; run the project's formatter and linter
(e.g., ruff check src/ && ruff format src/) or directly run ruff format on the
file containing the code_intel function so the import/enrichment block and the
f-strings are reformatted to match project style, then re-run tests/CI and
commit the formatted changes (ensure changes include the code_intel function and
any nearby formatting Ruff adjusts).
In `@src/brainlayer/pipeline/code_intelligence.py`:
- Around line 57-65: The pyproject manifest loader in _extract_pyproject
currently lets tomllib.load() exceptions propagate and abort the scan; update
_extract_pyproject to wrap the file load/parsing in a try/except that catches
tomllib.TOMLDecodeError and OSError (or generic Exception if
backend-specific), logs a warning that includes the repo_path/path and the
error, and returns an empty dict so the scan continues; apply the same defensive
pattern to the package.json extractor referenced at lines 91-94 (wrap
json.load() in try/except, log a warning with file/path and error, and return an
empty dict).
- Around line 349-366: Remove the duplicated argparse-based CLI block under if
__name__ == "__main__" that imports argparse, sets up logging, parses
--base-dir/--dry-run, calls enrich_projects and prints results; this duplicate
entrypoint should be deleted so the Typer CLI is the single source of truth.
Ensure enrich_projects, DEFAULT_BASE_DIR and any needed types remain
exported/importable by the Typer command (do not change enrich_projects
signature), and remove the print/result-handling code in this module so callers
(the Typer command) handle invocation and output.
- Around line 168-171: The early-return when no projects are found in
process_scan (in src/brainlayer/pipeline/code_intelligence.py where projects =
scan_projects(base_dir)) returns a stats dict missing the dep_entities_created
key; update that return to include dep_entities_created: 0 so the stats shape
matches downstream expectations (e.g., the caller in
src/brainlayer/cli/__init__.py that reads dep_entities_created) and avoids
KeyError.
- Around line 309-321: The SELECT that looks up target_row should restrict to
library entities to avoid matching non-library rows; update the query executed
via conn.execute that currently uses "SELECT id FROM kg_entities WHERE
LOWER(name) = LOWER(?)" to include "AND entity_type = 'library'" (and pass
dep_name as before), so the lookup uses both entity_type and name, and leave the
INSERT into kg_entities (which already sets entity_type='library') unchanged;
make this change in the block that references target_row, conn.execute, dep_name
and the subsequent INSERT that creates target_id.
- Around line 173-175: The bulk writer in
src/brainlayer/pipeline/code_intelligence.py currently opens SQLite directly
with sqlite3.connect(db_path), bypassing the shared KG bootstrap in
src/brainlayer/vector_store.py (the kg_entities/kg_relations initializer) and
thus can run against un-migrated DBs and conflict with enrichment workers;
change the code to obtain the DB connection via the shared initializer in
vector_store (reuse the existing bootstrap/init function that ensures migrations
and columns like description/importance exist), ensure you stop enrichment
workers and checkpoint the WAL before performing bulk writes, use a per-worker
DB connection (do not share a single connection across threads/processes), and
add retry/backoff logic for SQLITE_BUSY errors (exponential backoff with limited
attempts) so the bulk writer coordinates correctly with other BrainLayer
writers.
In `@tests/test_code_intelligence.py`:
- Around line 89-111: The test fixture creates kg_entities and kg_relations
without the UNIQUE constraints used in production, causing different behavior
for idempotency/upsert tests; update the CREATE TABLE statements for kg_entities
to include UNIQUE(entity_type, name) and for kg_relations to include
UNIQUE(source_id, target_id, relation_type) so the test schema matches the
production constraints used by the vector store code (kg_entities,
kg_relations).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1b97f866-bf99-4893-86b9-cb6c135ce6d1
📒 Files selected for processing (3)
- src/brainlayer/cli/__init__.py
- src/brainlayer/pipeline/code_intelligence.py
- tests/test_code_intelligence.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests
Files:
- src/brainlayer/cli/__init__.py
- tests/test_code_intelligence.py
- src/brainlayer/pipeline/code_intelligence.py
src/brainlayer/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/brainlayer/**/*.py: Use Python/Typer CLI architecture for the main package in src/brainlayer/
All scripts and CLI must use paths.py:get_db_path() for resolving the database path instead of hardcoding
Implement chunk lifecycle management with columns superseded_by, aggregated_into, archived_at; default search excludes lifecycle-managed chunks
Never run bulk database operations while enrichment workers are writing; always stop workers and checkpoint WAL first
Drop FTS triggers before bulk deletes on the chunks table and recreate them after; batch deletes in 5-10K chunks with a checkpoint every 3 batches
Implement retry logic on SQLITE_BUSY errors; each worker must use its own database connection
Use ruff check src/ && ruff format src/ for linting and formatting
Files:
- src/brainlayer/cli/__init__.py
- src/brainlayer/pipeline/code_intelligence.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use pytest for testing
Files:
tests/test_code_intelligence.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-14T02:20:54.656Z
Learning: Request codex review, cursor review, and bugbot review for BrainLayer PRs
📚 Learning: 2026-03-29T23:19:50.743Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:50.743Z
Learning: Applies to src/brainlayer/cli.py : Use Typer CLI framework for command-line interface in `src/brainlayer/`
Applied to files:
src/brainlayer/cli/__init__.py
📚 Learning: 2026-03-29T23:19:51.321Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:51.321Z
Learning: Applies to src/brainlayer/**/*.py : Use Python/Typer CLI architecture for the main package in `src/brainlayer/`
Applied to files:
src/brainlayer/cli/__init__.py
🪛 GitHub Actions: CI
src/brainlayer/cli/__init__.py
[error] 1-1: ruff format --check failed: file would be reformatted (1 file out of 172). Run 'ruff format src/brainlayer/cli/__init__.py' to apply formatting.
```python
def _extract_pyproject(repo_path: Path, path: Path) -> dict[str, Any]:
    """Extract metadata from pyproject.toml."""
    try:
        import tomllib
    except ImportError:
        import tomli as tomllib  # type: ignore[no-redef]

    with open(path, "rb") as f:
        data = tomllib.load(f)
```
Skip malformed manifests instead of aborting the whole scan.
tomllib.load() and json.load() failures bubble straight out of these helpers. One half-written pyproject.toml or package.json under the scan root will terminate the entire command instead of warning and continuing with the next repo.
Also applies to: 91-94
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/brainlayer/pipeline/code_intelligence.py` around lines 57 - 65, The
pyproject manifest loader in _extract_pyproject currently lets tomllib.load()
exceptions propagate and abort the scan; update _extract_pyproject to wrap the
file load/parsing in a try/except that catches tomllib.TOMLDecodeError and
OSError (or generic Exception if backend-specific), logs a
warning that includes the repo_path/path and the error, and returns an empty
dict so the scan continues; apply the same defensive pattern to the package.json
extractor referenced at lines 91-94 (wrap json.load() in try/except, log a
warning with file/path and error, and return an empty dict).
```python
conn = sqlite3.connect(db_path)
conn.execute("PRAGMA journal_mode = WAL")
```
Go through the shared KG bootstrap before writing here.
This raw sqlite3.connect() path bypasses the kg_entities/kg_relations bootstrap in src/brainlayer/vector_store.py, while the later statements already assume migrated columns like description and importance exist. Fresh or pre-migration DBs can fail immediately, and this bulk writer still has no SQLITE_BUSY retry or coordination with other BrainLayer writers. Reuse the shared initializer before issuing these writes. As per coding guidelines 'Never run bulk database operations while enrichment workers are writing; always stop workers and checkpoint WAL first' and 'Implement retry logic on SQLITE_BUSY errors; each worker must use its own database connection'.
Also applies to: 231-245
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/brainlayer/pipeline/code_intelligence.py` around lines 173 - 175, The
bulk writer in src/brainlayer/pipeline/code_intelligence.py currently opens
SQLite directly with sqlite3.connect(db_path), bypassing the shared KG bootstrap
in src/brainlayer/vector_store.py (the kg_entities/kg_relations initializer) and
thus can run against un-migrated DBs and conflict with enrichment workers;
change the code to obtain the DB connection via the shared initializer in
vector_store (reuse the existing bootstrap/init function that ensures migrations
and columns like description/importance exist), ensure you stop enrichment
workers and checkpoint the WAL before performing bulk writes, use a per-worker
DB connection (do not share a single connection across threads/processes), and
add retry/backoff logic for SQLITE_BUSY errors (exponential backoff with limited
attempts) so the bulk writer coordinates correctly with other BrainLayer
writers.
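The retry/backoff that the guideline calls for could take roughly this shape. This is a sketch, not the repo's actual helper; `execute_with_retry` and its parameters are assumed names.

```python
import sqlite3
import time


def execute_with_retry(conn: sqlite3.Connection, sql: str, params=(), attempts: int = 5):
    """Run a statement, retrying with exponential backoff when SQLite reports a lock."""
    delay = 0.05
    for attempt in range(attempts):
        try:
            return conn.execute(sql, params)
        except sqlite3.OperationalError as exc:
            # Python's sqlite3 surfaces SQLITE_BUSY/SQLITE_LOCKED as OperationalError
            # with "database is locked" / "database is busy" messages.
            if "locked" not in str(exc) and "busy" not in str(exc):
                raise
            if attempt == attempts - 1:
                raise
            time.sleep(delay)
            delay *= 2  # exponential backoff, capped by the attempt count
```

Each worker would call this on its own connection, after the shared bootstrap has run and a WAL checkpoint has been taken.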
```python
target_row = conn.execute(
    "SELECT id FROM kg_entities WHERE LOWER(name) = LOWER(?)",
    (dep_name,),
).fetchone()

if not target_row:
    target_id = f"lib-{uuid.uuid4().hex[:12]}"
    if not dry_run:
        conn.execute(
            """INSERT INTO kg_entities (id, entity_type, name, importance, created_at, updated_at)
            VALUES (?, 'library', ?, 3.0, strftime('%Y-%m-%dT%H:%M:%fZ','now'), strftime('%Y-%m-%dT%H:%M:%fZ','now'))""",
            (target_id, dep_name),
        )
```
Restrict the dependency lookup to library entities.
kg_entities is unique on (entity_type, name), not on name alone. Without entity_type = 'library' here, a dependency like openai or fastapi can resolve to some unrelated entity with the same name and the depends_on edge will point at the wrong node.
Suggested fix:

```diff
 target_row = conn.execute(
-    "SELECT id FROM kg_entities WHERE LOWER(name) = LOWER(?)",
+    "SELECT id FROM kg_entities WHERE LOWER(name) = LOWER(?) AND entity_type = 'library'",
     (dep_name,),
 ).fetchone()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/brainlayer/pipeline/code_intelligence.py` around lines 309 - 321, The
SELECT that looks up target_row should restrict to library entities to avoid
matching non-library rows; update the query executed via conn.execute that
currently uses "SELECT id FROM kg_entities WHERE LOWER(name) = LOWER(?)" to
include "AND entity_type = 'library'" (and pass dep_name as before), so the
lookup uses both entity_type and name, and leave the INSERT into kg_entities
(which already sets entity_type='library') unchanged; make this change in the
block that references target_row, conn.execute, dep_name and the subsequent
INSERT that creates target_id.
```python
if __name__ == "__main__":
    import argparse

    logging.basicConfig(level=logging.INFO, format="%(message)s")

    parser = argparse.ArgumentParser(description="Populate project entities from repo metadata")
    parser.add_argument("--base-dir", type=Path, default=DEFAULT_BASE_DIR, help="Directory containing repos")
    parser.add_argument("--dry-run", action="store_true", help="Show changes without applying")
    args = parser.parse_args()

    result = enrich_projects(base_dir=args.base_dir, dry_run=args.dry_run)
    prefix = "[DRY RUN] " if args.dry_run else ""
    print(f"\n{prefix}Code intelligence scan complete:")
    print(f"  Projects scanned: {result['projects_scanned']}")
    print(f"  Entities created: {result['entities_created']}")
    print(f"  Entities updated: {result['entities_updated']}")
    print(f"  Relations added: {result['relations_added']}")
    print(f"  Library entities added: {result['dep_entities_created']}")
```
🛠️ Refactor suggestion | 🟠 Major
Drop the secondary argparse CLI from the pipeline module.
The Typer command already exposes this flow. Keeping another entrypoint here duplicates option parsing/output and makes the behavior easy to drift. As per coding guidelines 'Use Python/Typer CLI architecture for the main package in src/brainlayer/'.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/brainlayer/pipeline/code_intelligence.py` around lines 349 - 366, Remove
the duplicated argparse-based CLI block under if __name__ == "__main__" that
imports argparse, sets up logging, parses --base-dir/--dry-run, calls
enrich_projects and prints results; this duplicate entrypoint should be deleted
so the Typer CLI is the single source of truth. Ensure enrich_projects,
DEFAULT_BASE_DIR and any needed types remain exported/importable by the Typer
command (do not change enrich_projects signature), and remove the
print/result-handling code in this module so callers (the Typer command) handle
invocation and output.
```python
conn.execute(
    """CREATE TABLE IF NOT EXISTS kg_entities (
        id TEXT PRIMARY KEY,
        entity_type TEXT NOT NULL,
        name TEXT NOT NULL,
        description TEXT DEFAULT '',
        metadata TEXT DEFAULT '{}',
        importance REAL DEFAULT 0.0,
        created_at TEXT DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ','now')),
        updated_at TEXT DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ','now'))
    )"""
)
conn.execute(
    """CREATE TABLE IF NOT EXISTS kg_relations (
        id TEXT PRIMARY KEY,
        source_id TEXT NOT NULL,
        target_id TEXT NOT NULL,
        relation_type TEXT NOT NULL,
        properties TEXT DEFAULT '{}',
        confidence REAL DEFAULT 0.5,
        created_at TEXT DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ','now'))
    )"""
)
```
Mirror the production KG constraints in this fixture.
The real schema in src/brainlayer/vector_store.py adds UNIQUE(entity_type, name) on kg_entities and UNIQUE(source_id, target_id, relation_type) on kg_relations, but this fixture omits both. That makes the new idempotency/upsert tests run against a schema that behaves differently from production.
Suggested fix
```diff
 conn.execute(
     """CREATE TABLE IF NOT EXISTS kg_entities (
         id TEXT PRIMARY KEY,
         entity_type TEXT NOT NULL,
         name TEXT NOT NULL,
         description TEXT DEFAULT '',
         metadata TEXT DEFAULT '{}',
         importance REAL DEFAULT 0.0,
         created_at TEXT DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ','now')),
-        updated_at TEXT DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ','now'))
+        updated_at TEXT DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ','now')),
+        UNIQUE(entity_type, name)
     )"""
 )
 conn.execute(
     """CREATE TABLE IF NOT EXISTS kg_relations (
         id TEXT PRIMARY KEY,
         source_id TEXT NOT NULL,
         target_id TEXT NOT NULL,
         relation_type TEXT NOT NULL,
         properties TEXT DEFAULT '{}',
-        confidence REAL DEFAULT 0.5,
-        created_at TEXT DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ','now'))
+        confidence REAL DEFAULT 1.0,
+        created_at TEXT DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ','now')),
+        UNIQUE(source_id, target_id, relation_type)
     )"""
 )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_code_intelligence.py` around lines 89 - 111, The test fixture
creates kg_entities and kg_relations without the UNIQUE constraints used in
production, causing different behavior for idempotency/upsert tests; update the
CREATE TABLE statements for kg_entities to include UNIQUE(entity_type, name) and
for kg_relations to include UNIQUE(source_id, target_id, relation_type) so the
test schema matches the production constraints used by the vector store code
(kg_entities, kg_relations).
Actionable comments posted: 1
♻️ Duplicate comments (5)
tests/test_code_intelligence.py (1)
89-111: ⚠️ Potential issue | 🟠 Major: Mirror the canonical KG schema in this fixture.
This test DB still differs from src/brainlayer/vector_store.py:456-491: it adds description/importance, keeps confidence at 0.5, and omits the production UNIQUE(...) constraints. That lets these tests pass against a schema production never boots and masks the kg_entities column mismatch in src/brainlayer/pipeline/code_intelligence.py.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_code_intelligence.py` around lines 89 - 111: The test fixture's CREATE TABLE statements for kg_entities and kg_relations diverge from the canonical schema in src/brainlayer/vector_store.py (lines ~456-491): remove the extra description and importance columns from kg_entities, align default values (e.g., set confidence to the production default), and add the same UNIQUE(...) constraints present in the canonical schema for both kg_entities and kg_relations so the test DB matches production; update the CREATE TABLE blocks named kg_entities and kg_relations in tests/test_code_intelligence.py to exactly mirror the column names, defaults, and UNIQUE constraints from src/brainlayer/vector_store.py.

src/brainlayer/pipeline/code_intelligence.py (4)
57-65: ⚠️ Potential issue | 🟠 Major: Don't let one bad manifest abort the whole scan.
Line 65 and Line 94 still let parse errors bubble out, so a single half-written pyproject.toml or package.json stops discovery for every other repo under the base directory. Catch load/decode failures, log the path, and return None so the scan can continue.
🐛 Suggested hardening
```diff
-def _extract_pyproject(repo_path: Path, path: Path) -> dict[str, Any]:
+def _extract_pyproject(repo_path: Path, path: Path) -> dict[str, Any] | None:
     """Extract metadata from pyproject.toml."""
     try:
         import tomllib
     except ImportError:
         import tomli as tomllib  # type: ignore[no-redef]
-    with open(path, "rb") as f:
-        data = tomllib.load(f)
+    try:
+        with path.open("rb") as f:
+            data = tomllib.load(f)
+    except Exception as exc:
+        logger.warning("Skipping unreadable manifest %s: %s", path, exc)
+        return None
@@
-def _extract_package_json(repo_path: Path, path: Path) -> dict[str, Any]:
+def _extract_package_json(repo_path: Path, path: Path) -> dict[str, Any] | None:
     """Extract metadata from package.json."""
-    with open(path) as f:
-        data = json.load(f)
+    try:
+        with path.open(encoding="utf-8") as f:
+            data = json.load(f)
+    except Exception as exc:
+        logger.warning("Skipping unreadable manifest %s: %s", path, exc)
+        return None
```

Also applies to: 91-94
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/brainlayer/pipeline/code_intelligence.py` around lines 57 - 65, The _extract_pyproject function (and likewise the package.json parsing code around _extract_package_json or the block at lines 91-94) currently lets tomllib/tomli load errors propagate and abort the whole scan; wrap the tomllib.load (and the JSON load for package.json) in a try/except that catches parsing/decoding exceptions (e.g., tomllib.TOMLDecodeError, tomli.TOMLDecodeError, json.JSONDecodeError, and generic Exception for IO issues), log a warning including the manifest path and repo_path using the existing logger, and return None on failure so discovery continues for other repos. Ensure the function signature and callers tolerate a None return and only treat valid dicts as successful parses.
360-377: 🛠️ Refactor suggestion | 🟠 Major: Keep the src/brainlayer/ CLI surface on Typer.
The code-intel Typer command already owns --base-dir and --dry-run; this second argparse entrypoint duplicates parsing/output and will drift. If module execution must stay supported, make __main__ a thin wrapper into the Typer command or a shared helper instead of another parser. Run this to inspect both entrypoints side by side:
```shell
#!/bin/bash
set -euo pipefail
echo "== Existing Typer command =="
sed -n '1937,1957p' src/brainlayer/cli/__init__.py
echo
echo "== Duplicate argparse entrypoint =="
sed -n '360,377p' src/brainlayer/pipeline/code_intelligence.py
```

As per coding guidelines "Use Python/Typer CLI architecture for the main package in src/brainlayer/".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/brainlayer/pipeline/code_intelligence.py` around lines 360 - 377, The __main__ argparse block duplicates the existing Typer "code-intel" CLI; remove the argparse parser and replace the if __name__ == "__main__": section with a thin wrapper that delegates to the existing Typer command (do not re-implement argument parsing). Import and invoke the Typer command (the CLI entry that exposes "code-intel" in src.brainlayer.cli) or call the shared helper used by that command (the enrich_projects(...) function) with the parsed values, preserving the same printed summary format; reference the enrich_projects function and the Typer "code-intel" command when making the change so the module execution reuses the canonical CLI parsing instead of duplicating it.
320-323: ⚠️ Potential issue | 🟠 Major: Constrain dependency lookups to library entities.
Line 321 matches on name alone. Because kg_entities is keyed by (entity_type, name), a dependency like fastapi can resolve to an unrelated project/entity row and the depends_on edge will point at the wrong node.
🎯 Suggested fix
```diff
 target_row = conn.execute(
-    "SELECT id FROM kg_entities WHERE LOWER(name) = LOWER(?)",
+    "SELECT id FROM kg_entities WHERE LOWER(name) = LOWER(?) AND entity_type = 'library'",
     (dep_name,),
 ).fetchone()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/brainlayer/pipeline/code_intelligence.py` around lines 320 - 323, The dependency lookup in code_intelligence.py uses a SELECT by name only (see the conn.execute call that sets target_row) which can match non-library entities; modify the query used in the function that builds dependency edges to also constrain entity_type = 'library' (i.e. SELECT id FROM kg_entities WHERE entity_type = ? AND LOWER(name) = LOWER(?) and pass 'library' as the first parameter) so target_row resolves only to library nodes before creating the depends_on edge.
184-200: ⚠️ Potential issue | 🔴 Critical: Use the shared KG bootstrap/writer before these inserts.
The canonical KG bootstrap in src/brainlayer/vector_store.py:456-470 does not create kg_entities.description or kg_entities.importance, so these write paths can raise sqlite3.OperationalError on a freshly initialized DB. This direct sqlite3.connect() path also skips the repo's required SQLITE_BUSY retry and writer-coordination flow, so it can collide with other BrainLayer writers on an active database. Run this to verify whether any migration adds those columns and whether a shared retry/bootstrap helper already exists:

```shell
#!/bin/bash
set -euo pipefail
echo "== KG schema / migrations =="
rg -n -C2 'CREATE TABLE IF NOT EXISTS kg_entities|ALTER TABLE kg_entities ADD COLUMN description|ALTER TABLE kg_entities ADD COLUMN importance|CREATE TABLE IF NOT EXISTS kg_relations' src/brainlayer
echo
echo "== SQLite retry / coordination helpers =="
rg -n -C2 'SQLITE_BUSY|wal_checkpoint|sqlite3\.connect|journal_mode = WAL' src/brainlayer
```

As per coding guidelines "Implement retry logic on SQLITE_BUSY errors; each worker must use its own database connection" and "Never run bulk database operations while enrichment workers are writing; always stop workers and checkpoint WAL first".
Also applies to: 241-255, 329-353
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/brainlayer/pipeline/code_intelligence.py` around lines 184 - 200, Replace the direct sqlite3.connect/PRAGMA usage and uncoordinated writes with the repo's shared KG bootstrap/writer (the canonical bootstrap/writer used by the vector_store module) so migrations that add kg_entities.description/importance are applied and SQLITE_BUSY retry/coordination is respected; obtain the connection/writer from that shared helper and use it in place of conn for calls to _upsert_project, ensure the helper's retry logic handles SQLITE_BUSY, ensure you stop/enqueue workers and run a WAL checkpoint via the shared writer before performing bulk inserts, and perform the commit/flush through the writer API (only when not dry_run) instead of calling conn.commit().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/brainlayer/pipeline/code_intelligence.py`:
- Around line 325-333: The code in the block that creates library entities
(checks target_row, builds target_id, inserts to kg_entities and increments
stats["dep_entities_created"]) overcounts when dry_run=True because missing
libraries never get inserted and subsequent occurrences are treated as new; fix
by tracking planned library names (e.g., a module-level or function-local set
like planned_libs) and change the logic in the branch for missing target_row to
first check if dep_name is already in planned_libs: if not, add dep_name to
planned_libs and increment stats["dep_entities_created"]; only perform the
actual DB INSERT when not dry_run (as now). This ensures each missing library is
counted once whether or not dry_run is set.
---
Duplicate comments:
In `@src/brainlayer/pipeline/code_intelligence.py`:
- Around line 57-65: The _extract_pyproject function (and likewise the
package.json parsing code around _extract_package_json or the block at lines
91-94) currently lets tomllib/tomli load errors propagate and abort the whole
scan; wrap the tomllib.load (and the JSON load for package.json) in a try/except
that catches parsing/decoding exceptions (e.g., tomllib.TOMLDecodeError,
tomli.TOMLDecodeError, json.JSONDecodeError, and generic Exception for IO
issues), log a warning including the manifest path and repo_path using the
existing logger, and return None on failure so discovery continues for other
repos. Ensure the function signature and callers tolerate a None return and only
treat valid dicts as successful parses.
- Around line 360-377: The __main__ argparse block duplicates the existing Typer
"code-intel" CLI; remove the argparse parser and replace the if __name__ ==
"__main__": section with a thin wrapper that delegates to the existing Typer
command (do not re-implement argument parsing). Import and invoke the Typer
command (the CLI entry that exposes "code-intel" in src.brainlayer.cli) or call
the shared helper used by that command (the enrich_projects(...) function) with
the parsed values, preserving the same printed summary format; reference the
enrich_projects function and the Typer "code-intel" command when making the
change so the module execution reuses the canonical CLI parsing instead of
duplicating it.
- Around line 320-323: The dependency lookup in code_intelligence.py uses a
SELECT by name only (see the conn.execute call that sets target_row) which can
match non-library entities; modify the query used in the function that builds
dependency edges to also constrain entity_type = 'library' (i.e. SELECT id FROM
kg_entities WHERE entity_type = ? AND LOWER(name) = LOWER(?) and pass 'library'
as the first parameter) so target_row resolves only to library nodes before
creating the depends_on edge.
- Around line 184-200: Replace the direct sqlite3.connect/PRAGMA usage and
uncoordinated writes with the repo's shared KG bootstrap/writer (the canonical
bootstrap/writer used by the vector_store module) so migrations that add
kg_entities.description/importance are applied and SQLITE_BUSY
retry/coordination is respected; obtain the connection/writer from that shared
helper and use it in place of conn for calls to _upsert_project, ensure the
helper's retry logic handles SQLITE_BUSY, ensure you stop/enqueue workers and
run a WAL checkpoint via the shared writer before performing bulk inserts, and
perform the commit/flush through the writer API (only when not dry_run) instead
of calling conn.commit().
In `@tests/test_code_intelligence.py`:
- Around line 89-111: The test fixture's CREATE TABLE statements for kg_entities
and kg_relations diverge from the canonical schema in
src/brainlayer/vector_store.py (lines ~456-491): remove the extra description
and importance columns from kg_entities, align default values (e.g., set
confidence to the production default), and add the same UNIQUE(...) constraints
present in the canonical schema for both kg_entities and kg_relations so the
test DB matches production; update the CREATE TABLE blocks named kg_entities and
kg_relations in tests/test_code_intelligence.py to exactly mirror the column
names, defaults, and UNIQUE constraints from src/brainlayer/vector_store.py.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8f12c60e-60bb-4bfa-9418-0a5e93d5a03b
📒 Files selected for processing (2)
src/brainlayer/pipeline/code_intelligence.py
tests/test_code_intelligence.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests
Files:
tests/test_code_intelligence.py
src/brainlayer/pipeline/code_intelligence.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use `pytest` for testing
Files:
tests/test_code_intelligence.py
src/brainlayer/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`src/brainlayer/**/*.py`: Use Python/Typer CLI architecture for the main package in `src/brainlayer/`
All scripts and CLI must use `paths.py:get_db_path()` for resolving database path instead of hardcoding
Implement chunk lifecycle management with columns `superseded_by`, `aggregated_into`, `archived_at`; default search excludes lifecycle-managed chunks
Never run bulk database operations while enrichment workers are writing; always stop workers and checkpoint WAL first
Drop FTS triggers before bulk deletes on `chunks` table and recreate after; batch deletes in 5-10K chunks with checkpoint every 3 batches
Implement retry logic on `SQLITE_BUSY` errors; each worker must use its own database connection
Use `ruff check src/ && ruff format src/` for linting and formatting
Files:
src/brainlayer/pipeline/code_intelligence.py
🧠 Learnings (11)
📚 Learning: 2026-03-14T02:20:54.656Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-14T02:20:54.656Z
Learning: Applies to **/*.py : Run pytest before claiming behavior changed safely; current test suite has 929 tests
Applied to files:
tests/test_code_intelligence.py
📚 Learning: 2026-03-29T23:19:51.321Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:51.321Z
Learning: Applies to src/brainlayer/**/*.py : Never run bulk database operations while enrichment workers are writing; always stop workers and checkpoint WAL first
Applied to files:
src/brainlayer/pipeline/code_intelligence.py
📚 Learning: 2026-03-29T23:19:50.743Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:50.743Z
Learning: Applies to src/brainlayer/*bulk*.py : Before bulk database operations: stop enrichment workers, checkpoint WAL with `PRAGMA wal_checkpoint(FULL)`, drop FTS triggers before bulk deletes
Applied to files:
src/brainlayer/pipeline/code_intelligence.py
📚 Learning: 2026-03-29T23:19:51.321Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:51.321Z
Learning: Applies to src/brainlayer/**/*.py : Implement retry logic on `SQLITE_BUSY` errors; each worker must use its own database connection
Applied to files:
src/brainlayer/pipeline/code_intelligence.py
📚 Learning: 2026-03-29T23:19:50.743Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:50.743Z
Learning: Applies to src/**/*.py : All database connections must retry on `SQLITE_BUSY`; each worker uses its own connection
Applied to files:
src/brainlayer/pipeline/code_intelligence.py
📚 Learning: 2026-03-14T02:20:54.656Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-14T02:20:54.656Z
Learning: Be aware of known BrainLayer issues: DB locking during enrichment and WAL growth up to 4.7GB
Applied to files:
src/brainlayer/pipeline/code_intelligence.py
📚 Learning: 2026-03-14T02:20:54.656Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-14T02:20:54.656Z
Learning: Applies to **/*.py : Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Applied to files:
src/brainlayer/pipeline/code_intelligence.py
📚 Learning: 2026-03-29T23:19:50.743Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:50.743Z
Learning: Never run bulk database operations while enrichment is writing; this causes WAL bloat and potential freeze
Applied to files:
src/brainlayer/pipeline/code_intelligence.py
📚 Learning: 2026-03-18T00:12:15.607Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 87
File: brain-bar/Sources/BrainBar/BrainBarServer.swift:118-129
Timestamp: 2026-03-18T00:12:15.607Z
Learning: In `brain-bar/Sources/BrainBar/MCPRouter.swift` (Swift, BrainBar daemon), the socket-before-DB startup pattern means the Unix socket binds immediately (~1ms) while the database may take several seconds to open on cold start (8GB file). Any tool handler that accesses `database` MUST throw an explicit error (e.g., `ToolError.noDatabase`) when `database` is nil — never return empty or default results (e.g., `guard let db else { return "[]" }` is forbidden). The false-success pattern hides startup timing issues from MCP clients. Flag any `guard let db = database else { return ... }` patterns that silently return defaults instead of throwing.
Applied to files:
src/brainlayer/pipeline/code_intelligence.py
📚 Learning: 2026-03-29T23:19:51.321Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:51.321Z
Learning: Applies to src/brainlayer/**/*.py : Use Python/Typer CLI architecture for the main package in `src/brainlayer/`
Applied to files:
src/brainlayer/pipeline/code_intelligence.py
📚 Learning: 2026-03-29T23:19:50.743Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:50.743Z
Learning: Applies to src/brainlayer/cli.py : Use Typer CLI framework for command-line interface in `src/brainlayer/`
Applied to files:
src/brainlayer/pipeline/code_intelligence.py
🔇 Additional comments (2)
tests/test_code_intelligence.py (1)
173-199: Good regression coverage for the last two fixes. These cases pin both the framework-precedence bug and the `~=` dependency parsing regression.

src/brainlayer/pipeline/code_intelligence.py (1)

133-153: Nice specificity ordering here. Checking Expo / React Native / Next.js before generic React preserves the intended classification.
```python
if not target_row:
    target_id = f"lib-{uuid.uuid4().hex[:12]}"
    if not dry_run:
        conn.execute(
            """INSERT INTO kg_entities (id, entity_type, name, importance, created_at, updated_at)
            VALUES (?, 'library', ?, 3.0, strftime('%Y-%m-%dT%H:%M:%fZ','now'), strftime('%Y-%m-%dT%H:%M:%fZ','now'))""",
            (target_id, dep_name),
        )
    stats["dep_entities_created"] += 1
```
`dry_run` overcounts shared library creations.
When `dry_run=True`, the first missing library is never inserted, so the second project that depends on the same missing package hits this branch again and increments `dep_entities_created` a second time. Track planned library names in memory or precompute unique notable dependencies before the loop.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/brainlayer/pipeline/code_intelligence.py` around lines 325 - 333, The
code in the block that creates library entities (checks target_row, builds
target_id, inserts to kg_entities and increments stats["dep_entities_created"])
overcounts when dry_run=True because missing libraries never get inserted and
subsequent occurrences are treated as new; fix by tracking planned library names
(e.g., a module-level or function-local set like planned_libs) and change the
logic in the branch for missing target_row to first check if dep_name is already
in planned_libs: if not, add dep_name to planned_libs and increment
stats["dep_entities_created"]; only perform the actual DB INSERT when not
dry_run (as now). This ensures each missing library is counted once whether or
not dry_run is set.
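The counting fix can be demonstrated in isolation. `plan_library_creations` below is a hypothetical helper invented for this sketch; the real logic lives inline in the enrichment loop, but the set-based dedup is the same idea:

```python
def plan_library_creations(deps: list[str], existing: set[str]) -> int:
    """Count how many library entities would be created, dedup-aware.

    `existing` holds lowercase names already in kg_entities; `planned`
    remembers names we have already decided to create, so repeats across
    projects are counted once — with or without dry_run.
    """
    planned: set[str] = set()
    created = 0
    for dep in deps:
        key = dep.lower()
        if key in existing or key in planned:
            continue
        planned.add(key)
        created += 1  # the actual INSERT would run here only when not dry_run
    return created

# Two projects both depend on "fastapi"; "react" already exists in the KG.
# The buggy version would report 2; the dedup version reports 1.
print(plan_library_creations(["fastapi", "react", "fastapi"], {"react"}))  # 1
```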
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/brainlayer/cli/__init__.py`:
- Line 1947: enrich_projects() in src/brainlayer/pipeline/code_intelligence.py
opens sqlite3 directly and sets PRAGMA journal_mode = WAL but lacks PRAGMA
busy_timeout and retry-on-SQLITE_BUSY semantics; update the DB open and write
paths in enrich_projects() (and the helper that calls sqlite3.connect()) to
execute PRAGMA busy_timeout = <ms> after opening the connection and wrap write
transactions in a retry loop that catches sqlite3.OperationalError /
SQLITE_BUSY, sleeps/backoffs and retries a few times (matching patterns used in
clustering.py and vector_store.py) so concurrent MCP/enrichment work won't fail
on lock contention. Ensure the same connection/config is used for all writes in
enrich_projects() so the busy_timeout applies.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 956e09b0-1a09-4594-8e10-f2811ccfd66e
📒 Files selected for processing (1)
src/brainlayer/cli/__init__.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: test (3.13)
- GitHub Check: test (3.12)
- GitHub Check: test (3.11)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests
Files:
src/brainlayer/cli/__init__.py
src/brainlayer/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`src/brainlayer/**/*.py`: Use Python/Typer CLI architecture for the main package in `src/brainlayer/`
All scripts and CLI must use `paths.py:get_db_path()` for resolving database path instead of hardcoding
Implement chunk lifecycle management with columns `superseded_by`, `aggregated_into`, `archived_at`; default search excludes lifecycle-managed chunks
Never run bulk database operations while enrichment workers are writing; always stop workers and checkpoint WAL first
Drop FTS triggers before bulk deletes on `chunks` table and recreate after; batch deletes in 5-10K chunks with checkpoint every 3 batches
Implement retry logic on `SQLITE_BUSY` errors; each worker must use its own database connection
Use `ruff check src/ && ruff format src/` for linting and formatting
Files:
src/brainlayer/cli/__init__.py
🧠 Learnings (3)
📚 Learning: 2026-03-29T23:19:50.743Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:50.743Z
Learning: Applies to src/brainlayer/cli.py : Use Typer CLI framework for command-line interface in `src/brainlayer/`
Applied to files:
src/brainlayer/cli/__init__.py
📚 Learning: 2026-03-29T23:19:51.321Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:51.321Z
Learning: Applies to src/brainlayer/**/*.py : Use Python/Typer CLI architecture for the main package in `src/brainlayer/`
Applied to files:
src/brainlayer/cli/__init__.py
📚 Learning: 2026-03-29T23:19:51.321Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:51.321Z
Learning: Applies to src/brainlayer/**/*.py : Use `ruff check src/ && ruff format src/` for linting and formatting
Applied to files:
src/brainlayer/cli/__init__.py
```python
db_path = get_db_path()
rprint(f"[bold blue]זיכרון[/] - Code intelligence scan: [italic]{base_dir}[/]")

result = enrich_projects(db_path=str(db_path), base_dir=base_dir, dry_run=dry_run)
```
❓ Verification inconclusive
Script executed:
```bash
#!/bin/bash
set -euo pipefail
echo "1) Confirm call site in CLI"
rg -n -C2 '\bdef code_intel\s*\(' src/brainlayer/cli/__init__.py
rg -n -C2 '\benrich_projects\s*\(' src/brainlayer/cli/__init__.py
echo "2) Inspect enrichment implementation for write-serialization / busy handling"
rg -n -C4 '\bdef enrich_projects\s*\(' src/brainlayer/pipeline/code_intelligence.py
rg -n -C3 'sqlite3|connect\(|busy_timeout|SQLITE_BUSY|retry|lock|BEGIN|commit|rollback|checkpoint' src/brainlayer/pipeline/code_intelligence.py
echo "3) Locate existing shared one-writer guards for reuse"
rg -n -C2 'write.*lock|exclusive.*lock|one-write|filelock|portalocker|SQLITE_BUSY|busy_timeout' src/brainlayer --type=py
```

Repository: EtanHey/brainlayer
Exit code: 0
stdout:
1) Confirm call site in CLI
1935-
1936-@app.command("code-intel")
1937:def code_intel(
1938- base_dir: Path = typer.Option(Path.home() / "Gits", "--base-dir", "-d", help="Directory containing repos to scan"),
1939- dry_run: bool = typer.Option(False, "--dry-run", help="Show changes without applying"),
1945- rprint(f"[bold blue]זיכרון[/] - Code intelligence scan: [italic]{base_dir}[/]")
1946-
1947: result = enrich_projects(db_path=str(db_path), base_dir=base_dir, dry_run=dry_run)
1948-
1949- prefix = "[dim][DRY RUN][/] " if dry_run else ""
2) Inspect enrichment implementation for write-serialization / busy handling
152- return name
153- return None
154-
155-
156:def enrich_projects(
157- db_path: str | None = None,
158- base_dir: Path | None = None,
159- dry_run: bool = False,
160-) -> dict[str, Any]:
11-import json
12-import logging
13-import re
14:import sqlite3
15-import uuid
16-from pathlib import Path
17-from typing import Any
--
103-
104- # Detect package manager
105- pkg_manager = "npm"
106: if (repo_path / "bun.lockb").exists() or (repo_path / "bun.lock").exists():
107- pkg_manager = "bun"
108: elif (repo_path / "pnpm-lock.yaml").exists():
109- pkg_manager = "pnpm"
110: elif (repo_path / "yarn.lock").exists():
111- pkg_manager = "yarn"
112-
113- # Detect framework from dependencies
--
181- "dep_entities_created": 0,
182- }
183-
184: conn = sqlite3.connect(db_path)
185- conn.execute("PRAGMA journal_mode = WAL")
186-
187- stats = {
--
197- _upsert_project(conn, project, stats, dry_run)
198-
199- if not dry_run:
200: conn.commit()
201- finally:
202- conn.close()
203-
--
205-
206-
207-def _upsert_project(
208: conn: sqlite3.Connection,
209- project: dict[str, Any],
210- stats: dict[str, int],
211- dry_run: bool,
--
264-
265-
266-def _add_dependency_relation(
267: conn: sqlite3.Connection,
268- source_id: str,
269- source_name: str,
270- dep_name: str,
3) Locate existing shared one-writer guards for reuse
src/brainlayer/vector_store.py-37-
src/brainlayer/vector_store.py-38-
src/brainlayer/vector_store.py:39:def _set_busy_timeout_hook(conn: apsw.Connection) -> None:
src/brainlayer/vector_store.py:40: """Set busy_timeout on every new connection before any other hooks.
src/brainlayer/vector_store.py-41-
src/brainlayer/vector_store.py-42- APSW bestpractice hooks (connection_optimize) run PRAGMA optimize inside
src/brainlayer/vector_store.py:43: the Connection() constructor. Without busy_timeout set first, this PRAGMA
src/brainlayer/vector_store.py-44- fails with BusyError when other processes hold the DB lock.
src/brainlayer/vector_store.py-45- """
--
src/brainlayer/vector_store.py-47-
src/brainlayer/vector_store.py-48-
src/brainlayer/vector_store.py:49:# Register busy_timeout hook BEFORE bestpractice hooks so it fires first.
src/brainlayer/vector_store.py-50-# bestpractice.apply() adds hooks that run PRAGMA optimize inside Connection(),
src/brainlayer/vector_store.py:51:# which needs busy_timeout active or it crashes under contention.
src/brainlayer/vector_store.py:52:apsw.connection_hooks.insert(0, _set_busy_timeout_hook)
src/brainlayer/vector_store.py-53-apsw.bestpractice.apply(apsw.bestpractice.recommended)
src/brainlayer/vector_store.py-54-
--
src/brainlayer/vector_store.py-74-
src/brainlayer/vector_store.py-75- Multiple BrainLayer processes (MCP instances, daemon, enrichment) may
src/brainlayer/vector_store.py:76: contend for write locks during DDL. Retry with exponential backoff
src/brainlayer/vector_store.py-77- instead of crashing on the first BusyError.
src/brainlayer/vector_store.py-78- """
--
src/brainlayer/pipeline/enrichment.py-326-PRIMARY_SYMBOLS: Extract class names, function names, file paths, and variable names that are central to this chunk. Empty array if none.
src/brainlayer/pipeline/enrichment.py-327-
src/brainlayer/pipeline/enrichment.py:328:RESOLVED_QUERY: Write a natural question that someone would ask to find this chunk. E.g., "How do I fix EADDRINUSE errors in Bun?" or "What's the SQLite busy_timeout fix for concurrent access?"
src/brainlayer/pipeline/enrichment.py-329-
src/brainlayer/pipeline/enrichment.py-330-EPISTEMIC_LEVEL:
--
src/brainlayer/mcp/search_handler.py-642-
src/brainlayer/mcp/search_handler.py-643- # Retry hybrid_search on BusyError — WAL reads shouldn't block but
src/brainlayer/mcp/search_handler.py:644: # they can during checkpoint or when enrichment holds exclusive lock.
src/brainlayer/mcp/search_handler.py-645- results = None
src/brainlayer/mcp/search_handler.py-646- for attempt in range(_RETRY_MAX_ATTEMPTS):
--
src/brainlayer/clustering.py-417- """
src/brainlayer/clustering.py-418- cursor = conn.cursor()
src/brainlayer/clustering.py:419: cursor.execute("PRAGMA busy_timeout = 5000")
src/brainlayer/clustering.py-420- now = datetime.now(timezone.utc).isoformat()
src/brainlayer/clustering.py-421-
--
src/brainlayer/clustering.py-486-
src/brainlayer/clustering.py-487- cursor = conn.cursor()
src/brainlayer/clustering.py:488: cursor.execute("PRAGMA busy_timeout = 5000")
src/brainlayer/clustering.py-489-
src/brainlayer/clustering.py-490- # Build a mapping from global index to chunk content (sample for efficiency)
--
src/brainlayer/hooks/indexer.py-40- self._db = sqlite3.connect(self.db_path)
src/brainlayer/hooks/indexer.py-41- self._db.execute("PRAGMA journal_mode = WAL")
src/brainlayer/hooks/indexer.py:42: self._db.execute("PRAGMA busy_timeout = 5000")
src/brainlayer/hooks/indexer.py-43- self._db.execute("PRAGMA synchronous = NORMAL")
src/brainlayer/hooks/indexer.py-44- self._create_schema()
Add write-serialization protection to enrich_projects() before release.
The call to enrich_projects() at line 1947 invokes a write operation on the database without busy_timeout or retry logic. The function creates a direct sqlite3.connect() at line 184 with only PRAGMA journal_mode = WAL but omits PRAGMA busy_timeout and retry handling on SQLITE_BUSY errors. If enrichment or other MCP work runs concurrently, this will fail on lock contention instead of retrying. Other modules in the codebase already implement this pattern (see clustering.py PRAGMA busy_timeout, vector_store.py busy_timeout hook, search_handler.py BusyError retry). Add the same protection to enrich_projects() in src/brainlayer/pipeline/code_intelligence.py.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/brainlayer/cli/__init__.py` at line 1947, enrich_projects() in
src/brainlayer/pipeline/code_intelligence.py opens sqlite3 directly and sets
PRAGMA journal_mode = WAL but lacks PRAGMA busy_timeout and retry-on-SQLITE_BUSY
semantics; update the DB open and write paths in enrich_projects() (and the
helper that calls sqlite3.connect()) to execute PRAGMA busy_timeout = <ms> after
opening the connection and wrap write transactions in a retry loop that catches
sqlite3.OperationalError / SQLITE_BUSY, sleeps/backoffs and retries a few times
(matching patterns used in clustering.py and vector_store.py) so concurrent
MCP/enrichment work won't fail on lock contention. Ensure the same
connection/config is used for all writes in enrich_projects() so the
busy_timeout applies.
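A sketch of the requested pattern, loosely modeled on the `busy_timeout` usage from `clustering.py` quoted in the script output above; the timeout, attempt count, and backoff constants are illustrative, not the repo's actual values:

```python
import sqlite3
import time

def connect_with_busy_timeout(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode = WAL")
    conn.execute("PRAGMA busy_timeout = 5000")  # wait up to 5s for lock holders
    return conn

def commit_with_retry(conn: sqlite3.Connection, attempts: int = 4,
                      base_delay: float = 0.1) -> None:
    """Commit, retrying with exponential backoff on SQLITE_BUSY-style errors."""
    for attempt in range(attempts):
        try:
            conn.commit()
            return
        except sqlite3.OperationalError as exc:
            if "locked" not in str(exc) and "busy" not in str(exc):
                raise  # not a contention error — don't mask it
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("gave up committing after SQLITE_BUSY retries")

conn = connect_with_busy_timeout(":memory:")
conn.execute("CREATE TABLE t (x)")
conn.execute("INSERT INTO t VALUES (1)")
commit_with_retry(conn)
print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 1
```

With a single process and an in-memory DB the retry path never fires; the point is that every write in `enrich_projects()` would go through a connection configured this way.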
Summary

- New `code_intelligence.py` module that scans ~/Gits/ for repos with `pyproject.toml` or `package.json`
- Creates `depends_on` relations for notable dependencies (frameworks, SDKs, databases)
- New CLI command: `brainlayer code-intel [--base-dir] [--dry-run]`

R68 Sprint Context

Test plan

- Follows the `agent_enrichment.py` pattern for KG entity upsert

Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com
🤖 Generated with Claude Code
Note
Adds a `code-intel` pipeline to auto-populate project entities from repo metadata. It scans for `pyproject.toml` or `package.json` and upserts project entities and `depends_on` relations into the knowledge graph SQLite DB; library entities get `lib-`prefixed IDs. A new `code-intel` Typer CLI command in `cli/__init__.py` supports `--base-dir` and `--dry-run`, and returns stats covering entities created/updated and relations added.

Macroscope summarized 29611c0.
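The one-level repo discovery described above can be sketched roughly as follows. `discover_projects` is a hypothetical stand-in for the module's `scan_projects`, which may walk directories differently:

```python
import tempfile
from pathlib import Path

def discover_projects(base_dir: Path) -> list[Path]:
    """Treat each immediate child dir holding a known manifest as a project."""
    found = []
    for child in sorted(base_dir.iterdir()):
        if not child.is_dir():
            continue
        if (child / "pyproject.toml").exists() or (child / "package.json").exists():
            found.append(child)
    return found

# Build a fake ~/Gits/ layout: two projects plus one plain directory.
base = Path(tempfile.mkdtemp())
(base / "pyapp").mkdir()
(base / "pyapp" / "pyproject.toml").touch()
(base / "webapp").mkdir()
(base / "webapp" / "package.json").touch()
(base / "notes").mkdir()  # no manifest — ignored

print([p.name for p in discover_projects(base)])  # ['pyapp', 'webapp']
```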
Summary by CodeRabbit
- New `code-intel` CLI to scan repositories, extract metadata (name, version, language, package manager, scripts, framework hints), and enrich the knowledge graph; supports `--base-dir` and `--dry-run`.