Skip to content

DEV-1532: unify search() — flat results list, single max_results cap, naive Cypher fallback#166

Merged
ZmeiGorynych merged 4 commits into
egor/dev-1464-unify-memories-search-on-sqlite-fts5-sqlite-vec-graphqlitefrom
egor/dev-1532-clean-up-search-post-cypher
Jun 5, 2026
Merged

DEV-1532: unify search() — flat results list, single max_results cap, naive Cypher fallback#166
ZmeiGorynych merged 4 commits into
egor/dev-1464-unify-memories-search-on-sqlite-fts5-sqlite-vec-graphqlitefrom
egor/dev-1532-clean-up-search-post-cypher

Conversation

@ZmeiGorynych
Copy link
Copy Markdown
Member

@ZmeiGorynych ZmeiGorynych commented Jun 5, 2026

Summary

  • Replace the three-bucket SearchResponse (memories/example_queries/entities) with a single flat ranked results: List[SearchHit] list
  • Remove max_memories/max_example_queries/max_entities; replace with a single max_results: int = 10 cap across all hit types
  • Add naive Cypher parser (slayer/search/cypher_naive.py) that supports MATCH (n:Label1:Label2) RETURN n.id AS id without the advanced_search extra; complex Cypher raises SlayerError with an install hint
  • Fix LadybugDB 0.15.x incompatibilities: ColumnModelColumn node table rename, $desc$descr parameter rename, Connection() API change
  • Update all four surfaces (REST, MCP, CLI, Python client), all existing search tests, and docs

Test plan

  • poetry run pytest tests/test_cypher_naive.py — 19 unit tests for the naive parser
  • poetry run pytest tests/test_search_unified.py — 64 new integration tests pinning the unified API
  • poetry run pytest -m "not integration" — full unit suite (3604 pass, 0 fail)
  • poetry run pytest tests/integration/test_integration.py -m integration -k search — SQLite integration search tests

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Unified search results: memories and entities returned in a single flat results list with a kind discriminator and a single max_results cap (CLI flag --max-results).
    • Naive Cypher fallback for simple label-only graph filters when advanced graph support is unavailable; updated deterministic ranking/recency fusion.
  • Documentation

    • Search docs, CLI help, and examples updated to the new API/response shape and ranking semantics.
  • Tests

    • Large test-suite updates and new tests covering unified results, Cypher fallback, ranking stability, and warnings.

…sults cap, naive Cypher fallback

Replace the three-bucket search response (memories/example_queries/entities) with
a single ranked results: List[SearchHit] list capped by max_results. Remove the
per-bucket max_memories/max_example_queries/max_entities params. Add a naive Cypher
parser (cypher_naive.py) that supports MATCH (n:Label1:Label2) RETURN n.id AS id
without the advanced_search extra, raising SlayerError for complex Cypher. Fix
LadybugDB 0.15.x incompatibilities: Column→ModelColumn node table rename, $desc→$descr
parameter rename, Connection() API change.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@linear
Copy link
Copy Markdown

linear Bot commented Jun 5, 2026

DEV-1532

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 5, 2026

Review Change Stack

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 515607af-05fc-4997-9cc3-2c92fdb598b4

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Converts search to a single fused SearchResponse.results list of SearchHit items capped by a new max_results parameter; adds a naive Cypher label-filter fallback when advanced_search is unavailable; renames graph Column→ModelColumn; and updates REST/MCP/CLI/client surfaces, docs, notebooks, and extensive tests to the new contract.

Changes

Unified Search Response Refactor

Layer / File(s) Summary
SearchHit and SearchResponse unified models
slayer/search/service.py, slayer/search/__init__.py
Introduces SearchHit with kind, id, score, text, matched_entities, and optional query; replaces MemoryHit/ExampleQueryHit/EntityHit. SearchResponse now contains results: List[SearchHit].
SearchService refactoring with unified fusion
slayer/search/service.py
SearchService.search accepts max_results (validated >=1), fuses channel outputs via RRF into a single flat results list (_fuse_all_hits), applies kind_filter/cypher_filter logic, and refactors recency fallback and stale-query warnings to operate over unified SearchHit objects.
Naive Cypher fallback parser
slayer/search/cypher_naive.py, tests/test_cypher_naive.py
Adds parse_naive_label_filter implementing strict MATCH (var:Label[:...]) RETURN var.id AS id parsing for environments without advanced_search; raises SlayerError for unknown labels or complex Cypher; unit tests cover happy and error paths.
Graph schema Column→ModelColumn rename
slayer/search/graph.py, tests/test_search_graph.py
Renames ephemeral graph node label for schema columns to ModelColumn, updates MENTIONS/CONTAINS wiring and node/edge insert helpers, and adapts tests and example Cypher accordingly.
API surfaces: REST, MCP, CLI, client
slayer/api/server.py, slayer/mcp/server.py, slayer/cli.py, slayer/client/slayer_client.py
SearchRequest, MCP tool signature, CLI flags, and SlayerClient.search now use max_results; cypher_filter is forwarded; error handling expanded to include SlayerError; CLI and client output iterate response.results.
Documentation and notebook examples
CLAUDE.md, docs/concepts/search.md, docs/examples/09_lightning_talk/lightning_talk_nb.ipynb
Docs updated to document max_results, unified results shape, ModelColumn naming, ranking-stability note (DEV-1414), DEV-1513 self-ref behavior, cypher_filter naive fallback, and notebook examples adapted to filter resp.results by hit.kind.
Comprehensive test updates and helpers
tests/*, tests/search_helpers.py
Adds tests/test_search_unified.py and tests/test_cypher_naive.py; updates many existing tests to filter response.results by hit.kind; adds tests/search_helpers.py with seeding and MCP-call helpers; updates REST/MCP/CLI/client surface tests.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • MotleyAI/slayer#160: Overlaps with SearchService refactor and unified SearchHit/SearchResponse shape changes.
  • MotleyAI/slayer#161: Related changes to SearchService fusion pipeline and post-fusion column handling.
  • MotleyAI/slayer#127: Related to DEV-1414 ranking stability and per-kind ranking behavior that this PR documents and enforces.

"🐰 I hopped through docs and tests with a cheerful sprint,
One flat results to rule them—search now speaks in mint.
Max results in a pocket, ModelColumn in the cart,
Naive Cypher on fallback keeps the queries smart."

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 45.71% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the primary changes: unifying search results into a flat list, introducing a single max_results cap, and adding naive Cypher fallback support.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch egor/dev-1532-clean-up-search-post-cypher

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (4)
tests/test_search_service.py (1)

438-445: ⚡ Quick win

Align these cap tests with unified max_results semantics.

Line 438 and Line 452 still describe legacy per-bucket behavior (default_is_two, caps_independently), but current assertions only verify generic presence/flat cap. Please rename and tighten these tests so they validate the actual unified contract.

♻️ Suggested cleanup
-async def test_max_example_queries_default_is_two(
+async def test_query_bearing_memories_surface_with_default_max_results(
     service_with_query_memories: SearchService,
 ) -> None:
-    # With max_results=10 (default), the flat list may include up to 10 items.
-    # We check that query-bearing memories surface.
     response = await service_with_query_memories.search(
         entities=["warehouse.orders.amount_paid"],
-        max_results=20,
     )
+    assert len(response.results) <= 10
     example_query_hits = [h for h in response.results if h.kind == "memory" and h.query is not None]
     assert len(example_query_hits) >= 1
 
-async def test_max_example_queries_caps_independently(
+async def test_max_results_caps_flat_list(
     service_with_query_memories: SearchService,
 ) -> None:
-    # With max_results=1, at most 1 hit surfaces total.
     response = await service_with_query_memories.search(
         entities=["warehouse.orders.amount_paid"],
         max_results=1,
     )
     assert len(response.results) <= 1

Also applies to: 452-457

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_search_service.py` around lines 438 - 445, The tests referencing
legacy per-bucket semantics should be renamed and tightened to assert the
unified flat max_results contract: update the test that calls
service_with_query_memories.search(entities=["warehouse.orders.amount_paid"],
max_results=20) (and the related test around service_with_query_memories.search
with default/other max_results) to (1) rename from legacy names like
default_is_two / caps_independently to something like
unified_max_results_flat_cap, and (2) assert that len(response.results) <=
max_results and that example_query_hits (filter on h.kind == "memory" and
h.query is not None) is present when expected; optionally also assert that
results across all buckets combined do not exceed max_results to validate the
unified cap behavior (use service_with_query_memories.search and
example_query_hits to locate the code to change).
tests/test_search_unified.py (1)

148-148: ⚡ Quick win

Move function-local imports to module top level.

This file repeatedly imports inside functions; that conflicts with the repository rule for Python files.

As per coding guidelines, **/*.py: Imports at the top of files.

Also applies to: 456-456, 471-471, 488-488, 505-505, 520-521, 540-540, 624-624, 637-637, 652-652, 675-675, 687-687, 748-748, 767-767

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_search_unified.py` at line 148, The test file performs repeated
function-local imports like "import slayer.search.service as svc_mod" inside
test functions; move these imports to the module top
(tests/test_search_unified.py) so all occurrences (including the other listed
locations) use a single top-level import, remove the in-function duplicates, and
update references to use the top-level name (svc_mod) to satisfy the repository
rule that Python imports must be at file top; if there are circular-import
concerns, resolve them by deferring only the import-sensitive parts (e.g., use
importlib.import_module in a single helper at top) rather than scattering
in-function import statements.
tests/test_search_named_entity_surfacing.py (1)

406-423: ⚡ Quick win

Rename test to reflect new behavior.

The test name test_max_entities_zero_suppresses_channel_1_named_output is now misleading. The test asserts that the named entity is present (line 423), which is opposite to what the name suggests. With the unified max_results API, there's no longer a separate max_entities=0 cap to suppress entity output.

Consider renaming to test_named_entity_surfaces_with_max_results or similar to accurately describe the validated behavior.

Suggested rename
-async def test_max_entities_zero_suppresses_channel_1_named_output(
+async def test_named_entity_surfaces_with_max_results(
     service: SearchService,
 ) -> None:
-    # With max_results=1, a single memory hit (if any) would fill the slot,
-    # but since no memories are tagged on this entity and no question is provided,
-    # only the named entity hit would appear. Use max_results=20 and filter.
-    # The original test checked max_entities=0 suppressed entity output.
-    # With the flat API, we just verify that entity hits are present when max_results>0.
-    # To mirror the old behavior of max_entities=0, we verify the named entity
-    # surfaces when max_results is large enough.
+    """Named entity refs surface in the flat results list when max_results allows."""
     resp = await service.search(
         entities=["warehouse.orders.amount"],
         max_results=20,
     )
     entity_hits = [h for h in resp.results if h.kind != "memory"]
-    # Named entity should surface (unlike old max_entities=0 which suppressed it).
-    # This test is now a positive assertion: the entity IS present.
     assert any(h.id == "warehouse.orders.amount" for h in entity_hits)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_search_named_entity_surfacing.py` around lines 406 - 423, Rename
the test function test_max_entities_zero_suppresses_channel_1_named_output to a
name that reflects the new positive assertion (for example
test_named_entity_surfaces_with_max_results) and update any references to that
function; locate the async test function definition named
test_max_entities_zero_suppresses_channel_1_named_output in
tests/test_search_named_entity_surfacing.py, change its identifier to the new
descriptive name, and run the test suite to ensure no other callers or fixtures
reference the old name.
slayer/search/service.py (1)

636-639: ⚡ Quick win

Pass the graph-filter call by keyword.

This new 2-argument call is the one changed spot in the file that still relies on positional ordering.
As per coding guidelines, "Use keyword arguments for functions with more than 1 parameter".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/search/service.py` around lines 636 - 639, The call to
_search_graph.get_filtered_ids currently uses positional args; change it to use
keyword arguments to follow the guideline for functions with more than one
parameter by calling _search_graph.get_filtered_ids with named parameters (e.g.,
pass cypher_filter=cypher_filter and storage=self._storage) so the invocation is
explicit and order-independent while still relying on
_search_graph.is_available() and the existing cypher_filter and self._storage
variables.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/concepts/search.md`:
- Around line 233-246: Update the naive parser documentation and its accepted
label set so it matches the actual graph schema: replace the standalone "Column"
label with "ModelColumn" (or list both "Column" and "ModelColumn" as aliases) in
the label array described for the naive Cypher parser (the section that lists
Memory, Datasource, Model, Column, Measure, Aggregation); if the parser code
(naive parser implementation) only accepts "ModelColumn" ensure the docs reflect
that exact symbol, or if you add alias support, update the naive parser logic to
accept both "Column" and "ModelColumn" and document both.

In `@slayer/search/cypher_naive.py`:
- Around line 58-69: The current label-parsing code (labels_str, labels,
_LABEL_TO_KIND) builds a set of kinds for multi-label MATCHes, which incorrectly
implements Cypher's label-intersection semantics; instead, update the logic to
detect when more than one label is present (len(labels) > 1) and raise a
SlayerError rejecting multi-label MATCHes in the naive fallback (include a clear
message like "multi-label MATCH not supported in naive fallback"), leaving the
existing single-label mapping and unknown-label check intact so only
single-label MATCHes are converted to kinds via _LABEL_TO_KIND.
- Around line 19-26: The fallback label map _LABEL_TO_KIND currently lacks the
new ModelColumn label so queries like MATCH (n:ModelColumn) RETURN n.id AS id
break; update the _LABEL_TO_KIND dict (the symbol _LABEL_TO_KIND) to include an
entry mapping "ModelColumn" to "column" (i.e., "ModelColumn": "column") so
column-only filters continue to work with the renamed schema.

In `@tests/test_search_unified.py`:
- Around line 195-216: The TypeError tests should avoid static named-arg
detection by replacing direct named-argument calls to SearchService.search with
dict-based kwarg unpacking; update the three tests
(test_max_memories_param_removed_raises_type_error,
test_max_example_queries_param_removed_raises_type_error,
test_max_entities_param_removed_raises_type_error) to call await
service.search(**{"question": "x", "max_memories": 5}) /
**{"question":"x","max_example_queries":2} / **{"question":"x","max_entities":5}
respectively so the runtime still raises TypeError but static analyzers won't
flag the named args, and apply the same pattern for the similar occurrences
around the noted 763-773 range.

---

Nitpick comments:
In `@slayer/search/service.py`:
- Around line 636-639: The call to _search_graph.get_filtered_ids currently uses
positional args; change it to use keyword arguments to follow the guideline for
functions with more than one parameter by calling _search_graph.get_filtered_ids
with named parameters (e.g., pass cypher_filter=cypher_filter and
storage=self._storage) so the invocation is explicit and order-independent while
still relying on _search_graph.is_available() and the existing cypher_filter and
self._storage variables.

In `@tests/test_search_named_entity_surfacing.py`:
- Around line 406-423: Rename the test function
test_max_entities_zero_suppresses_channel_1_named_output to a name that reflects
the new positive assertion (for example
test_named_entity_surfaces_with_max_results) and update any references to that
function; locate the async test function definition named
test_max_entities_zero_suppresses_channel_1_named_output in
tests/test_search_named_entity_surfacing.py, change its identifier to the new
descriptive name, and run the test suite to ensure no other callers or fixtures
reference the old name.

In `@tests/test_search_service.py`:
- Around line 438-445: The tests referencing legacy per-bucket semantics should
be renamed and tightened to assert the unified flat max_results contract: update
the test that calls
service_with_query_memories.search(entities=["warehouse.orders.amount_paid"],
max_results=20) (and the related test around service_with_query_memories.search
with default/other max_results) to (1) rename from legacy names like
default_is_two / caps_independently to something like
unified_max_results_flat_cap, and (2) assert that len(response.results) <=
max_results and that example_query_hits (filter on h.kind == "memory" and
h.query is not None) is present when expected; optionally also assert that
results across all buckets combined do not exceed max_results to validate the
unified cap behavior (use service_with_query_memories.search and
example_query_hits to locate the code to change).

In `@tests/test_search_unified.py`:
- Line 148: The test file performs repeated function-local imports like "import
slayer.search.service as svc_mod" inside test functions; move these imports to
the module top (tests/test_search_unified.py) so all occurrences (including the
other listed locations) use a single top-level import, remove the in-function
duplicates, and update references to use the top-level name (svc_mod) to satisfy
the repository rule that Python imports must be at file top; if there are
circular-import concerns, resolve them by deferring only the import-sensitive
parts (e.g., use importlib.import_module in a single helper at top) rather than
scattering in-function import statements.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e313e15a-5319-4a6b-9b2e-6342eac86f41

📥 Commits

Reviewing files that changed from the base of the PR and between 58036dc and fb4fd46.

📒 Files selected for processing (26)
  • CLAUDE.md
  • docs/concepts/search.md
  • docs/examples/09_lightning_talk/lightning_talk_nb.ipynb
  • slayer/api/server.py
  • slayer/cli.py
  • slayer/client/slayer_client.py
  • slayer/mcp/server.py
  • slayer/search/__init__.py
  • slayer/search/cypher_naive.py
  • slayer/search/graph.py
  • slayer/search/service.py
  • tests/integration/test_integration.py
  • tests/test_cypher_naive.py
  • tests/test_lightning_talk_notebook.py
  • tests/test_memory_id_in_responses.py
  • tests/test_search_datasource_filter.py
  • tests/test_search_example_query_stale_warning.py
  • tests/test_search_graph.py
  • tests/test_search_invariance.py
  • tests/test_search_lazy_gc_in_memory.py
  • tests/test_search_lenient_validation.py
  • tests/test_search_named_entity_surfacing.py
  • tests/test_search_service.py
  • tests/test_search_surfaces.py
  • tests/test_search_three_channel.py
  • tests/test_search_unified.py

Comment thread docs/concepts/search.md
Comment thread slayer/search/cypher_naive.py
Comment thread slayer/search/cypher_naive.py Outdated
Comment on lines +58 to +69
labels_str = match.group(1)
labels = [lb.strip() for lb in re.split(r"\s*:\s*", labels_str) if lb.strip()]
kinds: Set[str] = set()
for label in labels:
kind = _LABEL_TO_KIND.get(label.lower())
if kind is None:
raise SlayerError(
f"unknown entity type {label!r} in cypher_filter; "
f"known types: {sorted(_LABEL_TO_KIND)!r}."
)
kinds.add(kind)
return kinds
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Jun 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject multi-label MATCHes in the naive fallback.

MATCH (n:Model:Measure) RETURN n.id AS id currently becomes {"model", "measure"}, and SearchService treats that as an allowlist. That broadens results instead of preserving Cypher's label-intersection semantics.

💡 Suggested fix
     labels_str = match.group(1)
     labels = [lb.strip() for lb in re.split(r"\s*:\s*", labels_str) if lb.strip()]
-    kinds: Set[str] = set()
-    for label in labels:
-        kind = _LABEL_TO_KIND.get(label.lower())
-        if kind is None:
-            raise SlayerError(
-                f"unknown entity type {label!r} in cypher_filter; "
-                f"known types: {sorted(_LABEL_TO_KIND)!r}."
-            )
-        kinds.add(kind)
-    return kinds
+    if len(labels) != 1:
+        raise SlayerError(
+            "cypher_filter naive fallback only supports a single node label; "
+            "install the advanced_search extra for multi-label MATCH clauses."
+        )
+    label = labels[0]
+    kind = _LABEL_TO_KIND.get(label.lower())
+    if kind is None:
+        raise SlayerError(
+            f"unknown entity type {label!r} in cypher_filter; "
+            f"known types: {sorted(_LABEL_TO_KIND)!r}."
+        )
+    return {kind}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/search/cypher_naive.py` around lines 58 - 69, The current
label-parsing code (labels_str, labels, _LABEL_TO_KIND) builds a set of kinds
for multi-label MATCHes, which incorrectly implements Cypher's
label-intersection semantics; instead, update the logic to detect when more than
one label is present (len(labels) > 1) and raise a SlayerError rejecting
multi-label MATCHes in the naive fallback (include a clear message like
"multi-label MATCH not supported in naive fallback"), leaving the existing
single-label mapping and unknown-label check intact so only single-label MATCHes
are converted to kinds via _LABEL_TO_KIND.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai LadybugDB uses union semantics for multi-label MATCH: every node has exactly one label (it lives in a typed node table), so MATCH (n:Model:Measure) returns the union of all Model nodes and all Measure nodes. This is documented in both the code (graph.py docstring) and search.md: 'Multi-label union: MATCH (n:Memory:ModelColumn) returns nodes from both Memory AND ModelColumn tables (LadybugDB union semantics).' The naive parser correctly implements this same union behavior. Restricting multi-label MATCHes to a single label would silently break the parity with the full LadybugDB path.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ZmeiGorynych, thanks for the detailed clarification — you're absolutely right. LadybugDB's single-label-per-node-table architecture means multi-label MATCH is a union operation by definition, and the naive parser's Set[str] of kinds is the correct way to implement that same semantics. My suggestion was based on a faulty assumption about intersection semantics. The current implementation is correct as-is.


✏️ Learnings added
Learnt from: ZmeiGorynych
Repo: MotleyAI/slayer PR: 166
File: slayer/search/cypher_naive.py:58-69
Timestamp: 2026-06-05T07:02:54.018Z
Learning: In MotleyAI/slayer, LadybugDB uses **union semantics** for multi-label MATCH clauses: every node lives in exactly one typed node table and carries exactly one label, so `MATCH (n:Model:Measure) RETURN n.id AS id` returns the *union* of all Model nodes and all Measure nodes (not an intersection). This is documented in `slayer/search/graph.py`'s docstring and `docs/concepts/search.md` ("Multi-label union: MATCH (n:Memory:ModelColumn) returns nodes from both Memory AND ModelColumn tables"). The naive Cypher parser in `slayer/search/cypher_naive.py` correctly implements this by collecting all resolved labels into a `Set[str]` of kind strings passed as a kind-filter. Do NOT suggest restricting multi-label MATCHes to a single label in the naive fallback — that would silently break parity with the full LadybugDB path.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: ZmeiGorynych
Repo: MotleyAI/slayer PR: 163
File: slayer/sql/generator.py:4959-4976
Timestamp: 2026-06-03T10:48:07.772Z
Learning: In MotleyAI/slayer DEV-1503 filtered-local isolation, do not flag `slayer/sql/generator.py`'s empty-base FROM-less `_placeholder` path as dropping host filters. For filtered-local plans, `slayer/engine/cross_model_planner.py::_classify_subplan_filters` propagates ROW-phase host filters into the rerooted host-rooted sub-plan, so `SlayerQuery.filters` and `SlayerModel.filters` render inside the `_cm_*` CTE. AGGREGATE-phase filters referencing isolated aggregates route to the outer combined SELECT `WHERE`, and POST filters remain in the post-transform wrapper.

Learnt from: CR
Repo: MotleyAI/slayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T13:31:52.102Z
Learning: Applies to slayer/engine/query_enrichment.py : DEV-1443: Query measure rename via `name` — filters and ORDER BY may reference either raw colon form or user alias; both resolve to user alias, and colon-form filter classifies as HAVING. Renames colliding with source columns or canonical-alias shadowing source columns are rejected

Learnt from: ZmeiGorynych
Repo: MotleyAI/slayer PR: 101
File: slayer/engine/ingestion.py:826-831
Timestamp: 2026-05-06T11:03:22.101Z
Learning: In MotleyAI/slayer auto-ingestion (`slayer/engine/ingestion.py`, `_columns_to_model`), the only reserved result alias that conflicts with a physical column name is `_count` (produced by the `*:count` aggregation formula via `canonical_agg_name` in `slayer/core/formula.py`). The rename guard `"count_col" if col_name == "_count" else col_name` is the correct and sufficient fix. A physical column literally named `count` does NOT conflict: it yields dimension alias `count` and aggregation aliases like `count_sum`, neither of which collides with `_count`. Do not flag bare `count` columns as needing renaming during auto-ingestion.

Learnt from: ZmeiGorynych
Repo: MotleyAI/slayer PR: 67
File: slayer/engine/query_engine.py:323-332
Timestamp: 2026-05-03T07:08:36.453Z
Learning: In MotleyAI/slayer, `dry_run` and `explain` on `SlayerQuery` are considered execution flags (not query-shape fields), but they currently live on `SlayerQuery` and therefore survive serialization into stored `source_queries`. The asymmetric "only override when truthy" pattern in `_execute_by_name` (slayer/engine/query_engine.py) is a known limitation tracked in issue `#71`, which aims to move these flags off `SlayerQuery` entirely. Do not suggest unconditional overrides as a fix; point to the deeper redesign tracked in `#71` instead.

Learnt from: ZmeiGorynych
Repo: MotleyAI/slayer PR: 99
File: slayer/core/formula.py:776-835
Timestamp: 2026-05-06T07:58:02.748Z
Learning: In MotleyAI/slayer, flat-dispatch functions that exceed Sonar's cognitive-complexity threshold (S3776) are intentionally left as-is and suppressed with NOSONAR when splitting into helpers would require threading context variables (e.g., `transform`, `original`) through every helper call solely to preserve error-message quality. This is an established repo convention used in at least 7 locations (slayer/cli.py:418, slayer/cli.py:893, and 5 spots in slayer/engine/query_engine.py). Do not flag or suggest refactoring these flat-dispatch blocks on cognitive-complexity grounds alone.

Learnt from: CR
Repo: MotleyAI/slayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T13:31:52.102Z
Learning: DEV-1445: Cross-model filter remap — same-stage filter referencing renamed cross-model measure is NOT auto-resolved (both raise at strict resolution). Workaround: restructure as multi-stage `source_queries` so cross-model measure becomes local

Learnt from: CR
Repo: MotleyAI/slayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T13:31:52.102Z
Learning: Applies to slayer/engine/query_enrichment.py : DEV-1378: Mode A model filters route through `parse_sql_predicate` at enrichment time. `SlayerQuery.filters` (Mode B) accepts lowercase allowlist: `lower`, `upper`, `trim`, `replace`, `substr`, `instr`, `length`, `concat`, plus `||` concat operator (rewritten to `concat(...)`)

Learnt from: ZmeiGorynych
Repo: MotleyAI/slayer PR: 61
File: docs/reference/rest-api.md:0-0
Timestamp: 2026-04-29T14:35:38.432Z
Learning: In MotleyAI/slayer (PR `#61`, v2 schema), `SlayerQuery` uses `measures` (not `fields`) as the query-time aggregation key. `fields` was the v1 name; it was renamed to `measures` as part of the v2 schema migration. The REST API `POST /query` payload and all other query examples should use `"measures": [...]` with colon-aggregation syntax (e.g., `"*:count"`, `"revenue:sum"`). Do not suggest reverting `measures` to `fields`.

Learnt from: CR
Repo: MotleyAI/slayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T13:31:52.102Z
Learning: Applies to slayer/search/service.py : DEV-1513: BM25 operates with implicit self-references — every doc carries a single tag pointing at itself. So `entities=["<canonical>"]` surfaces the named entity, and `entities=["memory:<id>"]` surfaces the named memory

Learnt from: CR
Repo: MotleyAI/slayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T09:40:10.632Z
Learning: Applies to **/formula.py : SlayerQuery.filters (Mode B DSL) accepts only a lowercase allowlist of string-hygiene scalars: lower, upper, trim, replace, substr, instr, length, concat—plus the SQL || concat operator (rewritten to concat(...))

Learnt from: CR
Repo: MotleyAI/slayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T13:31:52.102Z
Learning: Applies to slayer/facade/**/*.py : DEV-1486: Facade-agnostic shared layer extracted to `slayer/facade/` (translator.py, catalog.py, info_schema.py, probe_queries.py, rows.py, datatypes.py). Flight/Postgres facades re-export under historical names and wrap RowBatch at edge

Learnt from: CR
Repo: MotleyAI/slayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T15:53:24.229Z
Learning: Applies to **/search/*.py : Search runs three parallel channels merged by RRF: Channel 1 (entity-overlap BM25 over memories), Channel 2 (tantivy full-text over memories ∪ entities), Channel 3 (optional dense embedding similarity via litellm)

Learnt from: CR
Repo: MotleyAI/slayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T13:31:52.102Z
Learning: `slayer search [--entity ENT ...] [--query JSON] [--question TEXT] [--datasource DS] [--max-memories N] [--max-example-queries N] [--max-entities N] [--format json|text]` runs three-channel semantic search: BM25 + tantivy full-text + optional embeddings (DEV-1375/DEV-1386/DEV-1409)

Learnt from: CR
Repo: MotleyAI/slayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T15:53:24.229Z
Learning: Implement three-channel semantic search (BM25 + Tantivy + optional embeddings) with Reciprocal Rank Fusion (k=60) over memories and entities, with cascade-on-delete to strip dangling references

Learnt from: ZmeiGorynych
Repo: MotleyAI/slayer PR: 36
File: slayer/mcp/server.py:167-201
Timestamp: 2026-04-17T09:00:49.541Z
Learning: In MotleyAI/slayer, custom aggregations defined via `Aggregation` with `params` always declare default values for each parameter. Therefore, when reviewing code that uses a measure’s `allowed_aggregations` (including custom aggregations), do not raise/flag an “unsafe/missing arguments” concern for calling `allowed_aggregations[0]` (or any listed aggregation) solely because explicit arguments are not supplied—defaults make the call safe.

Learnt from: ZmeiGorynych
Repo: MotleyAI/slayer PR: 59
File: slayer/mcp/server.py:1725-1740
Timestamp: 2026-04-29T13:25:08.874Z
Learning: Do not treat `attrs.get(col_name)` as an error/crash when `attrs` is a `ResponseAttributes` instance. In `slayer/engine/query_engine.py`, `ResponseAttributes` is a Pydantic `BaseModel` with `dimensions: Dict[str, FieldMetadata]` and `measures: Dict[str, FieldMetadata]`, and it provides a custom `get(column: str) -> Optional[FieldMetadata]` that searches both dicts. Also note `SlayerResponse.attributes` should always be a `ResponseAttributes` instance (non-optional) because it uses `default_factory=ResponseAttributes`—so `attributes.get(...)` is intentionally safe and may legitimately return `None` if the column key isn’t present.

Learnt from: ZmeiGorynych
Repo: MotleyAI/slayer PR: 60
File: examples/verify_common.py:163-175
Timestamp: 2026-04-29T14:15:45.724Z
Learning: In the MotleyAI/slayer repo, do not require defensive input validation inside private helper functions when (1) all call sites are within the repo and fully controlled, (2) the helper’s inputs are guaranteed by the callers’ logic to be valid (so invalid inputs would only occur due to an unrelated bug elsewhere), and (3) there is no external/unsafe data entry path reaching the helper. In those cases, allow omission of empty-collection guards and range checks and avoid flagging “missing validation” for the private helper. If any call site can receive untrusted/externally derived data or the helper can be reached with invalid inputs due to local logic, then missing validation should be flagged.

Learnt from: ZmeiGorynych
Repo: MotleyAI/slayer PR: 80
File: examples/embedded/run.py:66-66
Timestamp: 2026-05-03T12:19:49.584Z
Learning: In the MotleyAI/slayer repo (post v2/v3 schema migration, PR `#80`), when handling `*:count` aggregations, the correct public `result.data` column key is `orders._count` (note the leading underscore), not `orders.count`. The leading underscore is intentionally preserved and should match the corresponding `EnrichedMeasure.alias` to avoid collisions with user-defined columns named `count`. During code reviews, do not flag uses of `orders._count` in `result.data` lookups or user-facing docs as incorrect.

Learnt from: ZmeiGorynych
Repo: MotleyAI/slayer PR: 94
File: tests/integration/test_integration_duckdb.py:595-602
Timestamp: 2026-05-05T09:19:23.936Z
Learning: In the MotleyAI/slayer repo, the public `SlayerQueryEngine.save_model(model)` method (in `slayer/engine/query_engine.py`) is the intended engine-side save helper: it performs source-mode validation, cache refresh, and persistence. When reviewing, do not flag or “correct” code that calls `engine.save_model(...)` as missing or incorrect (and don’t insist on calling a lower-level `storage.save_model` instead). Direct calls to `engine.save_model(...)` are the documented intended pattern (see CLAUDE.md).

Learnt from: ZmeiGorynych
Repo: MotleyAI/slayer PR: 103
File: tests/test_storage_type_refinement.py:298-313
Timestamp: 2026-05-07T08:28:38.648Z
Learning: When reviewing Python `unittest.mock.patch` usage, verify the patch target based on *where* the imported symbol was created:
- If the code uses module-scope `from X import foo`, it creates a local binding in the consumer module, so the correct patch target is the consumer namespace (e.g., `consumer_module.foo`).
- If the code uses function-local `from X import foo` inside a function body, the symbol is re-imported/resolved from `X` on each call, so the correct patch target is the source namespace (e.g., `X.foo`).
Do not assume the consumer module’s symbol name is always the patch target—first determine whether the import is at module scope or inside a function before recommending a different patch path.

Comment thread tests/test_search_unified.py Outdated
- Add ModelColumn alias to naive parser label map (CR thread 2)
- Fix ReDoS in _NAIVE_PATTERN: structured (w+(:w+)*) avoids overlapping
  quantifiers (Sonar S5852 hotspot)
- Add SlayerError to except clauses in REST, CLI, and MCP search handlers
  so invalid cypher_filter expressions return 400/clean error, not 500/crash
  (Codex finding)
- Convert intentional TypeError tests to **{} kwarg form to avoid Sonar
  S930 BLOCKER false positives
- Extract _build_hit_from_fused_key and _apply_cypher_filter helpers,
  _query_bearing_memory_hits, and simplify _call_mcp_tool to reduce
  cognitive complexity below S3776 threshold (CC 21/18 -> <15)
- Move shared test model seed and MCP tool helper to tests/search_helpers.py
  to reduce cross-file duplicate-lines density below gate threshold

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
slayer/search/cypher_naive.py (1)

33-35: ⚡ Quick win

Use keyword arguments in re.compile.

This call passes more than one argument positionally; switch to kwargs to match repo rules.

Suggested change
 _NAIVE_PATTERN = re.compile(
-    r"^\s*MATCH\s*\(\s*\w+\s*:\s*(\w+(?:\s*:\s*\w+)*)\s*\)\s*RETURN\s+\w+\.id\s+AS\s+id\s*$",
-    re.IGNORECASE,
+    pattern=r"^\s*MATCH\s*\(\s*\w+\s*:\s*(\w+(?:\s*:\s*\w+)*)\s*\)\s*RETURN\s+\w+\.id\s+AS\s+id\s*$",
+    flags=re.IGNORECASE,
 )

As per coding guidelines "**/*.py: Use keyword arguments for functions with more than 1 parameter".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/search/cypher_naive.py` around lines 33 - 35, The call to re.compile
for _NAIVE_PATTERN passes multiple arguments positionally; change it to use
keyword args so it follows the repo rule (e.g., call re.compile(pattern=...,
flags=re.IGNORECASE)). Update the _NAIVE_PATTERN assignment to pass the regex
string as pattern= and the flag as flags= to eliminate positional arguments
while keeping the same regex and IGNORECASE behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@slayer/search/cypher_naive.py`:
- Around line 33-35: The call to re.compile for _NAIVE_PATTERN passes multiple
arguments positionally; change it to use keyword args so it follows the repo
rule (e.g., call re.compile(pattern=..., flags=re.IGNORECASE)). Update the
_NAIVE_PATTERN assignment to pass the regex string as pattern= and the flag as
flags= to eliminate positional arguments while keeping the same regex and
IGNORECASE behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 07f508ba-4117-49e2-83c2-e31b08c539e8

📥 Commits

Reviewing files that changed from the base of the PR and between fb4fd46 and e78049d.

📒 Files selected for processing (9)
  • slayer/api/server.py
  • slayer/cli.py
  • slayer/mcp/server.py
  • slayer/search/cypher_naive.py
  • slayer/search/service.py
  • tests/search_helpers.py
  • tests/test_search_service.py
  • tests/test_search_surfaces.py
  • tests/test_search_unified.py
🚧 Files skipped from review as they are similar to previous changes (5)
  • slayer/api/server.py
  • tests/test_search_surfaces.py
  • slayer/search/service.py
  • tests/test_search_service.py
  • tests/test_search_unified.py

ZmeiGorynych and others added 2 commits June 5, 2026 10:03
- Simplify search handler except tuples to (SlayerError, ValueError):
  EntityResolutionError and AmbiguousModelError are SlayerError subclasses
  and redundant in the same tuple (Sonar S5713)
- Suppress S5843 on _NAIVE_PATTERN: each \s* and \w token maps to a
  distinct syntactic Cypher element; complexity cannot be reduced without
  losing expressiveness
- Update naive parser docs to mention Column/ModelColumn alias

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix naive regex to backreference MATCH variable name in RETURN clause
  so MATCH (m:Memory) RETURN n.id AS id is correctly rejected (Codex)
- Add warning when named memory:<id> refs are excluded by naive kind_filter,
  for parity with the graph path's candidate_ids exclusion warnings (Codex)
- Move NOSONAR(S5843) to the flagged pattern string line
- Use keyword args (pattern=, flags=) for re.compile calls per repo
  convention (CR nitpick)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Jun 5, 2026

@ZmeiGorynych ZmeiGorynych merged commit b04a0dd into egor/dev-1464-unify-memories-search-on-sqlite-fts5-sqlite-vec-graphqlite Jun 5, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant