
fix: close BrainLayer eval reds #80

Merged
EtanHey merged 1 commit into main from fix/brainlayer-eval-reds
Mar 14, 2026
Conversation

@EtanHey (Owner) commented Mar 14, 2026

Summary

  • validate brain_search detail and public num_results at the MCP boundary
  • add the eval_project fixture so eval tests can isolate seeded data by project
  • run seed-only KG extraction after remote cloud_backfill imports
  • cover the regressions with focused pytest coverage

Testing

  • PYTHONPATH=src pytest -q tests/test_search_quality.py tests/test_search_chunk_id.py tests/test_search_validation.py tests/test_cloud_backfill.py

@codex review

Summary by CodeRabbit

Bug Fixes

  • Added validation for search parameters: detail must be "compact" or "full", and num_results must be between 1 and 100.

Tests

  • Added regression test coverage for search validation and cloud data import operations.


@coderabbitai Bot commented Mar 14, 2026

Warning: Rate limit exceeded. @EtanHey has exceeded the hourly commit review limit; after the wait time elapses, a review can be re-triggered with the @coderabbitai review command or by pushing new commits.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a9fbe03b-9291-42cb-b361-46eaa4c07480

📥 Commits

Reviewing files that changed from the base of the PR and between 6820286 and e73a7ea.

📒 Files selected for processing (2)
  • scripts/cloud_backfill.py
  • tests/test_cloud_backfill.py
📝 Walkthrough

The changes add post-enrichment KG extraction to cloud backfill processing, introduce input validation for brain search operations, and establish comprehensive test coverage including a reusable project fixture and regression tests for both cloud backfill and search validation workflows.

Changes

Cohort / File(s) / Summary

  • Cloud Backfill KG Extraction (scripts/cloud_backfill.py): Added imports for DEFAULT_SEED_ENTITIES and extract_kg_from_chunk, then integrated a post-enrichment KG extraction call within import_results using non-LLM and non-GLiNER flags, with exception handling to prevent blocking the import flow.
  • Search Input Validation (src/brainlayer/mcp/search_handler.py): Introduced parameter validation in _brain_search to enforce that detail must be "compact" or "full" and num_results between 1 and 100, returning error results for invalid inputs.
  • Test Fixtures & Infrastructure (tests/conftest.py): Added a uuid import and a new eval_project() fixture that generates unique project names using 8-character hex identifiers for test isolation.
  • Cloud Backfill Regression Tests (tests/test_cloud_backfill.py): New test module verifying cloud backfill import behavior: creates an in-memory VectorStore-backed database, mocks KG extraction and checkpointing, and asserts correct import counts and extract_kg_from_chunk invocation with proper parameters.
  • Search Validation Regression Tests (tests/test_search_validation.py): New test suite with two test classes: TestDetailParamValidation validating detail and num_results constraints, and TestEvalProjectIsolation verifying project fixture uniqueness and cross-test contamination prevention using in-memory stores.

Sequence Diagram

sequenceDiagram
    participant Caller as Cloud Backfill<br/>import_results
    participant Parser as Enrichment<br/>Parser
    participant KGExtractor as extract_kg_from_chunk
    participant Handler as Exception<br/>Handler
    
    Caller->>Parser: parse enrichment result
    Parser-->>Caller: enrichment data
    Caller->>KGExtractor: extract_kg_from_chunk(chunk,<br/>seed_entities=DEFAULT_SEED_ENTITIES,<br/>use_llm=False, use_gliner=False)
    alt KG Extraction Success
        KGExtractor-->>Caller: extraction complete
        Caller->>Caller: continue import flow
    else KG Extraction Failure
        KGExtractor-->>Handler: exception raised
        Handler->>Handler: print warning
        Handler-->>Caller: error handled (non-blocking)
        Caller->>Caller: continue import flow
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A rabbit hops through data streams so clean,
Extracting knowledge from the in-between,
With validation guards and fixtures bright,
Tests now bloom to ensure all is right—
KG flows and searches pure and true! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 43.75%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Title check ❓ Inconclusive: The title 'fix: close BrainLayer eval reds' is vague and does not clearly convey what the primary changes accomplish. While it mentions 'eval reds' (likely failing evaluation tests), it does not explain what fixes are being applied. Consider a more specific title, such as 'fix: add search validation and KG extraction for cloud backfill'.

✅ Passed checks (1 passed)

  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.


@EtanHey (Owner, Author) commented Mar 14, 2026

@codex review

@coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_search_validation.py`:
- Around line 29-41: Add a new unit test mirroring
test_num_results_over_100_returns_error to verify the lower bound: call
_brain_search with num_results=0 (imported from brainlayer.mcp.search_handler),
patch _get_vector_store, _detect_entities, and _search (AsyncMock) the same way,
then assert mock_search.assert_not_called(), result.isError is True, and that
the error message in result.content[0].text contains "must be between 1 and
100"; name the test e.g. test_num_results_zero_returns_error to make the
lower-bound validation explicit.
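The suggested lower-bound test could look roughly like the sketch below. Since the real brainlayer.mcp.search_handler is not reproduced here, a stand-in _brain_search with the described validation behavior is defined inline; in the repo you would import the real function and patch _get_vector_store, _detect_entities, and _search as the prompt describes:

```python
# Sketch of the suggested lower-bound regression test. _brain_search, _Result,
# and _Content are inline stand-ins for the real brainlayer types.
import asyncio
from dataclasses import dataclass, field
from unittest.mock import AsyncMock


@dataclass
class _Content:
    text: str


@dataclass
class _Result:
    isError: bool
    content: list = field(default_factory=list)


async def _brain_search(query, num_results, search):
    # Validation mirrors the described handler: reject out-of-range num_results
    # before any search work happens.
    if not (1 <= num_results <= 100):
        return _Result(True, [_Content("num_results must be between 1 and 100")])
    return await search(query, num_results)


def test_num_results_zero_returns_error():
    mock_search = AsyncMock()
    result = asyncio.run(_brain_search("q", num_results=0, search=mock_search))
    mock_search.assert_not_called()          # search never runs on invalid input
    assert result.isError is True
    assert "must be between 1 and 100" in result.content[0].text
```

Mirroring the existing over-100 test with num_results=0 makes the lower bound explicit instead of leaving it implied by the upper-bound case.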

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 331d6332-42a6-438b-a346-4c12feb499b7

📥 Commits

Reviewing files that changed from the base of the PR and between 50d637d and 6820286.

📒 Files selected for processing (5)
  • scripts/cloud_backfill.py
  • src/brainlayer/mcp/search_handler.py
  • tests/conftest.py
  • tests/test_cloud_backfill.py
  • tests/test_search_validation.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.11)
🧰 Additional context used
📓 Path-based instructions (2)
src/brainlayer/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/brainlayer/**/*.py: Python package structure should follow the layout: src/brainlayer/ for package code, with separate modules for vector_store.py, embeddings.py, daemon.py, dashboard/, and mcp/ for different concerns
Use paths.py:get_db_path() for all database path resolution instead of hardcoding paths; support environment variable overrides and canonical path fallback (~/.local/share/brainlayer/brainlayer.db)
Lint and format Python code using ruff check src/ and ruff format src/
Preserve verbatim content for ai_code, stack_trace, and user_message message types during classification and chunking; skip noise content entirely; summarize build_log content; extract structure-only for dir_listing
Use AST-aware chunking with tree-sitter; never split stack traces; mask large tool output during chunking
Handle SQLite concurrency by implementing retry logic on SQLITE_BUSY errors; ensure each worker uses its own database connection
Prioritize MLX (Qwen2.5-Coder-14B-Instruct-4bit) on Apple Silicon (port 8080) as the enrichment backend; fall back to Ollama (glm-4.7-flash on port 11434) after 3 consecutive MLX failures; support backend override via BRAINLAYER_ENRICH_BACKEND environment variable
Brain graph API must expose endpoints: /brain/graph, /brain/node/{node_id} (FastAPI)
Backlog API must support endpoints: /backlog/items with GET, POST, PATCH, DELETE operations (FastAPI)
Provide brainlayer brain-export command to export brain graph as JSON for dashboard consumption
Provide brainlayer export-obsidian command to export as Markdown vault with backlinks and tags
For bulk database operations: stop enrichment workers first, checkpoint WAL before and after operations, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks with checkpoint every 3 batches, never delete from chunks while FTS trigger is active

Files:

  • src/brainlayer/mcp/search_handler.py
src/brainlayer/mcp/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

MCP tools must implement: brain_search, brain_store, brain_recall, brain_entity, brain_expand, brain_update, brain_digest, brain_get_person with legacy brainlayer_* aliases for backward compatibility

Files:

  • src/brainlayer/mcp/search_handler.py
🧠 Learnings (2)
📚 Learning: 2026-03-12T14:22:54.798Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-12T14:22:54.798Z
Learning: Applies to src/brainlayer/mcp/**/*.py : MCP tools must implement: `brain_search`, `brain_store`, `brain_recall`, `brain_entity`, `brain_expand`, `brain_update`, `brain_digest`, `brain_get_person` with legacy `brainlayer_*` aliases for backward compatibility

Applied to files:

  • src/brainlayer/mcp/search_handler.py
📚 Learning: 2026-03-12T14:22:54.798Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-12T14:22:54.798Z
Learning: Run tests using `pytest` for the project

Applied to files:

  • tests/conftest.py
🔇 Additional comments (8)
tests/conftest.py (1)

17-21: LGTM!

The eval_project fixture correctly generates a unique project name per test case using UUID hex prefix. Function scope (default) ensures test isolation.

src/brainlayer/mcp/search_handler.py (2)

12-14: LGTM!

Constants are well-defined using frozenset for O(1) membership testing and clear boundary values for num_results validation.


130-136: LGTM!

Input validation correctly rejects invalid detail values and out-of-range num_results before any expensive operations. Error messages are descriptive and include valid options/ranges.

scripts/cloud_backfill.py (2)

37-45: LGTM!

New imports correctly reference internal modules for seed-only KG extraction. The import of DEFAULT_SEED_ENTITIES provides domain-specific entities for fast, deterministic extraction without LLM overhead.


515-524: LGTM!

KG extraction is appropriately wrapped in try/except to ensure failures don't block the import flow. The warning message includes both chunk_id and exception details for debugging. Parameters correctly disable LLM/GLiNER for fast seed-only extraction during bulk backfill.

tests/test_cloud_backfill.py (1)

1-102: LGTM!

Well-structured regression test that:

  • Creates isolated VectorStore with tmp_path
  • Inserts minimal unenriched chunk matching DB schema
  • Uses monkeypatch correctly with raising=False for imported symbols
  • Verifies both return counts and KG extraction call parameters
  • Properly cleans up with finally block
tests/test_search_validation.py (2)

12-57: LGTM!

Comprehensive validation tests with correct patch targeting at the module namespace level. Tests cover both rejection paths (invalid detail, num_results > 100) and the valid path (detail="compact" delegates to _search).


60-121: LGTM!

The TestEvalProjectIsolation class effectively demonstrates project isolation:

  • The xfail test documents the contamination issue when sharing projects
  • The passing test proves unique project IDs prevent cross-case leakage
  • The fixture validation test ensures eval_project follows the expected format

The _embed helper provides deterministic embeddings suitable for testing without external dependencies.

Comment thread: tests/test_search_validation.py (Outdated)

@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6820286609


Comment thread scripts/cloud_backfill.py
Comment on lines +523 to 525
    except Exception as exc:
        print(f"  WARNING: KG extraction failed for {chunk_id}: {exc}")
    success += 1

P2: Treat KG extraction failure as failed import

When extract_kg_from_chunk throws, this path only logs a warning and still increments success. In import_results, chunks with non-null enriched_at are skipped on future runs, so any transient extraction error here leaves a chunk permanently marked as imported but missing KG links with no retry path. Count this as a failure (or otherwise preserve retryability) so reruns can repair KG coverage.
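One way to act on this suggestion is to count the chunk as failed so a rerun can retry it. The sketch below is illustrative: the import_chunk wrapper, mark_enriched callback, and return convention are assumptions, not the actual import_results code:

```python
# Sketch of the suggested fix: treat KG extraction errors as failures so the
# chunk stays retryable on a later run. mark_enriched stands in for whatever
# sets enriched_at in the real import bookkeeping.
def import_chunk(chunk_id, extract_kg, mark_enriched):
    try:
        extract_kg(chunk_id)
    except Exception as exc:
        print(f"  WARNING: KG extraction failed for {chunk_id}: {exc}")
        return False  # leave enriched_at unset so a rerun repairs KG coverage
    mark_enriched(chunk_id)
    return True
```

The key design point is ordering: only mark the chunk enriched after extraction succeeds, so transient failures are not permanently skipped on future runs.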


@EtanHey (Owner, Author) commented Mar 14, 2026

@cursor review
@BugBot review

@EtanHey force-pushed the fix/brainlayer-eval-reds branch from f995465 to e73a7ea on March 14, 2026 02:27
@EtanHey merged commit 4054d11 into main Mar 14, 2026
@EtanHey deleted the fix/brainlayer-eval-reds branch March 14, 2026 02:27