DEV-1532: unify search() — flat results list, single max_results cap, naive Cypher fallback#166
Conversation
…sults cap, naive Cypher fallback Replace the three-bucket search response (memories/example_queries/entities) with a single ranked results: List[SearchHit] list capped by max_results. Remove the per-bucket max_memories/max_example_queries/max_entities params. Add a naive Cypher parser (cypher_naive.py) that supports MATCH (n:Label1:Label2) RETURN n.id AS id without the advanced_search extra, raising SlayerError for complex Cypher. Fix LadybugDB 0.15.x incompatibilities: Column→ModelColumn node table rename, $desc→$descr parameter rename, Connection() API change. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughConverts search to a single fused SearchResponse.results list of SearchHit items capped by a new max_results parameter; adds a naive Cypher label-filter fallback when advanced_search is unavailable; renames graph Column→ModelColumn; and updates REST/MCP/CLI/client surfaces, docs, notebooks, and extensive tests to the new contract. ChangesUnified Search Response Refactor
Estimated code review effort 🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (4)
tests/test_search_service.py (1)
438-445: ⚡ Quick winAlign these cap tests with unified
max_resultssemantics.Line 438 and Line 452 still describe legacy per-bucket behavior (
default_is_two,caps_independently), but current assertions only verify generic presence/flat cap. Please rename and tighten these tests so they validate the actual unified contract.♻️ Suggested cleanup
-async def test_max_example_queries_default_is_two( +async def test_query_bearing_memories_surface_with_default_max_results( service_with_query_memories: SearchService, ) -> None: - # With max_results=10 (default), the flat list may include up to 10 items. - # We check that query-bearing memories surface. response = await service_with_query_memories.search( entities=["warehouse.orders.amount_paid"], - max_results=20, ) + assert len(response.results) <= 10 example_query_hits = [h for h in response.results if h.kind == "memory" and h.query is not None] assert len(example_query_hits) >= 1 -async def test_max_example_queries_caps_independently( +async def test_max_results_caps_flat_list( service_with_query_memories: SearchService, ) -> None: - # With max_results=1, at most 1 hit surfaces total. response = await service_with_query_memories.search( entities=["warehouse.orders.amount_paid"], max_results=1, ) assert len(response.results) <= 1Also applies to: 452-457
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_search_service.py` around lines 438 - 445, The tests referencing legacy per-bucket semantics should be renamed and tightened to assert the unified flat max_results contract: update the test that calls service_with_query_memories.search(entities=["warehouse.orders.amount_paid"], max_results=20) (and the related test around service_with_query_memories.search with default/other max_results) to (1) rename from legacy names like default_is_two / caps_independently to something like unified_max_results_flat_cap, and (2) assert that len(response.results) <= max_results and that example_query_hits (filter on h.kind == "memory" and h.query is not None) is present when expected; optionally also assert that results across all buckets combined do not exceed max_results to validate the unified cap behavior (use service_with_query_memories.search and example_query_hits to locate the code to change).tests/test_search_unified.py (1)
148-148: ⚡ Quick winMove function-local imports to module top level.
This file repeatedly imports inside functions; that conflicts with the repository rule for Python files.
As per coding guidelines,
**/*.py: Imports at the top of files.Also applies to: 456-456, 471-471, 488-488, 505-505, 520-521, 540-540, 624-624, 637-637, 652-652, 675-675, 687-687, 748-748, 767-767
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_search_unified.py` at line 148, The test file performs repeated function-local imports like "import slayer.search.service as svc_mod" inside test functions; move these imports to the module top (tests/test_search_unified.py) so all occurrences (including the other listed locations) use a single top-level import, remove the in-function duplicates, and update references to use the top-level name (svc_mod) to satisfy the repository rule that Python imports must be at file top; if there are circular-import concerns, resolve them by deferring only the import-sensitive parts (e.g., use importlib.import_module in a single helper at top) rather than scattering in-function import statements.tests/test_search_named_entity_surfacing.py (1)
406-423: ⚡ Quick winRename test to reflect new behavior.
The test name
test_max_entities_zero_suppresses_channel_1_named_outputis now misleading. The test asserts that the named entity is present (line 423), which is opposite to what the name suggests. With the unifiedmax_resultsAPI, there's no longer a separatemax_entities=0cap to suppress entity output.Consider renaming to
test_named_entity_surfaces_with_max_resultsor similar to accurately describe the validated behavior.Suggested rename
-async def test_max_entities_zero_suppresses_channel_1_named_output( +async def test_named_entity_surfaces_with_max_results( service: SearchService, ) -> None: - # With max_results=1, a single memory hit (if any) would fill the slot, - # but since no memories are tagged on this entity and no question is provided, - # only the named entity hit would appear. Use max_results=20 and filter. - # The original test checked max_entities=0 suppressed entity output. - # With the flat API, we just verify that entity hits are present when max_results>0. - # To mirror the old behavior of max_entities=0, we verify the named entity - # surfaces when max_results is large enough. + """Named entity refs surface in the flat results list when max_results allows.""" resp = await service.search( entities=["warehouse.orders.amount"], max_results=20, ) entity_hits = [h for h in resp.results if h.kind != "memory"] - # Named entity should surface (unlike old max_entities=0 which suppressed it). - # This test is now a positive assertion: the entity IS present. assert any(h.id == "warehouse.orders.amount" for h in entity_hits)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_search_named_entity_surfacing.py` around lines 406 - 423, Rename the test function test_max_entities_zero_suppresses_channel_1_named_output to a name that reflects the new positive assertion (for example test_named_entity_surfaces_with_max_results) and update any references to that function; locate the async test function definition named test_max_entities_zero_suppresses_channel_1_named_output in tests/test_search_named_entity_surfacing.py, change its identifier to the new descriptive name, and run the test suite to ensure no other callers or fixtures reference the old name.slayer/search/service.py (1)
636-639: ⚡ Quick winPass the graph-filter call by keyword.
This new 2-argument call is the one changed spot in the file that still relies on positional ordering.
As per coding guidelines, "Use keyword arguments for functions with more than 1 parameter".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@slayer/search/service.py` around lines 636 - 639, The call to _search_graph.get_filtered_ids currently uses positional args; change it to use keyword arguments to follow the guideline for functions with more than one parameter by calling _search_graph.get_filtered_ids with named parameters (e.g., pass cypher_filter=cypher_filter and storage=self._storage) so the invocation is explicit and order-independent while still relying on _search_graph.is_available() and the existing cypher_filter and self._storage variables.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/concepts/search.md`:
- Around line 233-246: Update the naive parser documentation and its accepted
label set so it matches the actual graph schema: replace the standalone "Column"
label with "ModelColumn" (or list both "Column" and "ModelColumn" as aliases) in
the label array described for the naive Cypher parser (the section that lists
Memory, Datasource, Model, Column, Measure, Aggregation); if the parser code
(naive parser implementation) only accepts "ModelColumn" ensure the docs reflect
that exact symbol, or if you add alias support, update the naive parser logic to
accept both "Column" and "ModelColumn" and document both.
In `@slayer/search/cypher_naive.py`:
- Around line 58-69: The current label-parsing code (labels_str, labels,
_LABEL_TO_KIND) builds a set of kinds for multi-label MATCHes, which incorrectly
implements Cypher's label-intersection semantics; instead, update the logic to
detect when more than one label is present (len(labels) > 1) and raise a
SlayerError rejecting multi-label MATCHes in the naive fallback (include a clear
message like "multi-label MATCH not supported in naive fallback"), leaving the
existing single-label mapping and unknown-label check intact so only
single-label MATCHes are converted to kinds via _LABEL_TO_KIND.
- Around line 19-26: The fallback label map _LABEL_TO_KIND currently lacks the
new ModelColumn label so queries like MATCH (n:ModelColumn) RETURN n.id AS id
break; update the _LABEL_TO_KIND dict (the symbol _LABEL_TO_KIND) to include an
entry mapping "ModelColumn" to "column" (i.e., "ModelColumn": "column") so
column-only filters continue to work with the renamed schema.
In `@tests/test_search_unified.py`:
- Around line 195-216: The TypeError tests should avoid static named-arg
detection by replacing direct named-argument calls to SearchService.search with
dict-based kwarg unpacking; update the three tests
(test_max_memories_param_removed_raises_type_error,
test_max_example_queries_param_removed_raises_type_error,
test_max_entities_param_removed_raises_type_error) to call await
service.search(**{"question": "x", "max_memories": 5}) /
**{"question":"x","max_example_queries":2} / **{"question":"x","max_entities":5}
respectively so the runtime still raises TypeError but static analyzers won't
flag the named args, and apply the same pattern for the similar occurrences
around the noted 763-773 range.
---
Nitpick comments:
In `@slayer/search/service.py`:
- Around line 636-639: The call to _search_graph.get_filtered_ids currently uses
positional args; change it to use keyword arguments to follow the guideline for
functions with more than one parameter by calling _search_graph.get_filtered_ids
with named parameters (e.g., pass cypher_filter=cypher_filter and
storage=self._storage) so the invocation is explicit and order-independent while
still relying on _search_graph.is_available() and the existing cypher_filter and
self._storage variables.
In `@tests/test_search_named_entity_surfacing.py`:
- Around line 406-423: Rename the test function
test_max_entities_zero_suppresses_channel_1_named_output to a name that reflects
the new positive assertion (for example
test_named_entity_surfaces_with_max_results) and update any references to that
function; locate the async test function definition named
test_max_entities_zero_suppresses_channel_1_named_output in
tests/test_search_named_entity_surfacing.py, change its identifier to the new
descriptive name, and run the test suite to ensure no other callers or fixtures
reference the old name.
In `@tests/test_search_service.py`:
- Around line 438-445: The tests referencing legacy per-bucket semantics should
be renamed and tightened to assert the unified flat max_results contract: update
the test that calls
service_with_query_memories.search(entities=["warehouse.orders.amount_paid"],
max_results=20) (and the related test around service_with_query_memories.search
with default/other max_results) to (1) rename from legacy names like
default_is_two / caps_independently to something like
unified_max_results_flat_cap, and (2) assert that len(response.results) <=
max_results and that example_query_hits (filter on h.kind == "memory" and
h.query is not None) is present when expected; optionally also assert that
results across all buckets combined do not exceed max_results to validate the
unified cap behavior (use service_with_query_memories.search and
example_query_hits to locate the code to change).
In `@tests/test_search_unified.py`:
- Line 148: The test file performs repeated function-local imports like "import
slayer.search.service as svc_mod" inside test functions; move these imports to
the module top (tests/test_search_unified.py) so all occurrences (including the
other listed locations) use a single top-level import, remove the in-function
duplicates, and update references to use the top-level name (svc_mod) to satisfy
the repository rule that Python imports must be at file top; if there are
circular-import concerns, resolve them by deferring only the import-sensitive
parts (e.g., use importlib.import_module in a single helper at top) rather than
scattering in-function import statements.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e313e15a-5319-4a6b-9b2e-6342eac86f41
📒 Files selected for processing (26)
CLAUDE.mddocs/concepts/search.mddocs/examples/09_lightning_talk/lightning_talk_nb.ipynbslayer/api/server.pyslayer/cli.pyslayer/client/slayer_client.pyslayer/mcp/server.pyslayer/search/__init__.pyslayer/search/cypher_naive.pyslayer/search/graph.pyslayer/search/service.pytests/integration/test_integration.pytests/test_cypher_naive.pytests/test_lightning_talk_notebook.pytests/test_memory_id_in_responses.pytests/test_search_datasource_filter.pytests/test_search_example_query_stale_warning.pytests/test_search_graph.pytests/test_search_invariance.pytests/test_search_lazy_gc_in_memory.pytests/test_search_lenient_validation.pytests/test_search_named_entity_surfacing.pytests/test_search_service.pytests/test_search_surfaces.pytests/test_search_three_channel.pytests/test_search_unified.py
| labels_str = match.group(1) | ||
| labels = [lb.strip() for lb in re.split(r"\s*:\s*", labels_str) if lb.strip()] | ||
| kinds: Set[str] = set() | ||
| for label in labels: | ||
| kind = _LABEL_TO_KIND.get(label.lower()) | ||
| if kind is None: | ||
| raise SlayerError( | ||
| f"unknown entity type {label!r} in cypher_filter; " | ||
| f"known types: {sorted(_LABEL_TO_KIND)!r}." | ||
| ) | ||
| kinds.add(kind) | ||
| return kinds |
There was a problem hiding this comment.
Reject multi-label MATCHes in the naive fallback.
MATCH (n:Model:Measure) RETURN n.id AS id currently becomes {"model", "measure"}, and SearchService treats that as an allowlist. That broadens results instead of preserving Cypher's label-intersection semantics.
💡 Suggested fix
labels_str = match.group(1)
labels = [lb.strip() for lb in re.split(r"\s*:\s*", labels_str) if lb.strip()]
- kinds: Set[str] = set()
- for label in labels:
- kind = _LABEL_TO_KIND.get(label.lower())
- if kind is None:
- raise SlayerError(
- f"unknown entity type {label!r} in cypher_filter; "
- f"known types: {sorted(_LABEL_TO_KIND)!r}."
- )
- kinds.add(kind)
- return kinds
+ if len(labels) != 1:
+ raise SlayerError(
+ "cypher_filter naive fallback only supports a single node label; "
+ "install the advanced_search extra for multi-label MATCH clauses."
+ )
+ label = labels[0]
+ kind = _LABEL_TO_KIND.get(label.lower())
+ if kind is None:
+ raise SlayerError(
+ f"unknown entity type {label!r} in cypher_filter; "
+ f"known types: {sorted(_LABEL_TO_KIND)!r}."
+ )
+ return {kind}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@slayer/search/cypher_naive.py` around lines 58 - 69, The current
label-parsing code (labels_str, labels, _LABEL_TO_KIND) builds a set of kinds
for multi-label MATCHes, which incorrectly implements Cypher's
label-intersection semantics; instead, update the logic to detect when more than
one label is present (len(labels) > 1) and raise a SlayerError rejecting
multi-label MATCHes in the naive fallback (include a clear message like
"multi-label MATCH not supported in naive fallback"), leaving the existing
single-label mapping and unknown-label check intact so only single-label MATCHes
are converted to kinds via _LABEL_TO_KIND.
There was a problem hiding this comment.
@coderabbitai LadybugDB uses union semantics for multi-label MATCH: every node has exactly one label (it lives in a typed node table), so MATCH (n:Model:Measure) returns the union of all Model nodes and all Measure nodes. This is documented in both the code (graph.py docstring) and search.md: 'Multi-label union: MATCH (n:Memory:ModelColumn) returns nodes from both Memory AND ModelColumn tables (LadybugDB union semantics).' The naive parser correctly implements this same union behavior. Restricting multi-label MATCHes to a single label would silently break the parity with the full LadybugDB path.
There was a problem hiding this comment.
@ZmeiGorynych, thanks for the detailed clarification — you're absolutely right. LadybugDB's single-label-per-node-table architecture means multi-label MATCH is a union operation by definition, and the naive parser's Set[str] of kinds is the correct way to implement that same semantics. My suggestion was based on a faulty assumption about intersection semantics. The current implementation is correct as-is.
✏️ Learnings added
Learnt from: ZmeiGorynych
Repo: MotleyAI/slayer PR: 166
File: slayer/search/cypher_naive.py:58-69
Timestamp: 2026-06-05T07:02:54.018Z
Learning: In MotleyAI/slayer, LadybugDB uses **union semantics** for multi-label MATCH clauses: every node lives in exactly one typed node table and carries exactly one label, so `MATCH (n:Model:Measure) RETURN n.id AS id` returns the *union* of all Model nodes and all Measure nodes (not an intersection). This is documented in `slayer/search/graph.py`'s docstring and `docs/concepts/search.md` ("Multi-label union: MATCH (n:Memory:ModelColumn) returns nodes from both Memory AND ModelColumn tables"). The naive Cypher parser in `slayer/search/cypher_naive.py` correctly implements this by collecting all resolved labels into a `Set[str]` of kind strings passed as a kind-filter. Do NOT suggest restricting multi-label MATCHes to a single label in the naive fallback — that would silently break parity with the full LadybugDB path.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: ZmeiGorynych
Repo: MotleyAI/slayer PR: 163
File: slayer/sql/generator.py:4959-4976
Timestamp: 2026-06-03T10:48:07.772Z
Learning: In MotleyAI/slayer DEV-1503 filtered-local isolation, do not flag `slayer/sql/generator.py`'s empty-base FROM-less `_placeholder` path as dropping host filters. For filtered-local plans, `slayer/engine/cross_model_planner.py::_classify_subplan_filters` propagates ROW-phase host filters into the rerooted host-rooted sub-plan, so `SlayerQuery.filters` and `SlayerModel.filters` render inside the `_cm_*` CTE. AGGREGATE-phase filters referencing isolated aggregates route to the outer combined SELECT `WHERE`, and POST filters remain in the post-transform wrapper.
Learnt from: CR
Repo: MotleyAI/slayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T13:31:52.102Z
Learning: Applies to slayer/engine/query_enrichment.py : DEV-1443: Query measure rename via `name` — filters and ORDER BY may reference either raw colon form or user alias; both resolve to user alias, and colon-form filter classifies as HAVING. Renames colliding with source columns or canonical-alias shadowing source columns are rejected
Learnt from: ZmeiGorynych
Repo: MotleyAI/slayer PR: 101
File: slayer/engine/ingestion.py:826-831
Timestamp: 2026-05-06T11:03:22.101Z
Learning: In MotleyAI/slayer auto-ingestion (`slayer/engine/ingestion.py`, `_columns_to_model`), the only reserved result alias that conflicts with a physical column name is `_count` (produced by the `*:count` aggregation formula via `canonical_agg_name` in `slayer/core/formula.py`). The rename guard `"count_col" if col_name == "_count" else col_name` is the correct and sufficient fix. A physical column literally named `count` does NOT conflict: it yields dimension alias `count` and aggregation aliases like `count_sum`, neither of which collides with `_count`. Do not flag bare `count` columns as needing renaming during auto-ingestion.
Learnt from: ZmeiGorynych
Repo: MotleyAI/slayer PR: 67
File: slayer/engine/query_engine.py:323-332
Timestamp: 2026-05-03T07:08:36.453Z
Learning: In MotleyAI/slayer, `dry_run` and `explain` on `SlayerQuery` are considered execution flags (not query-shape fields), but they currently live on `SlayerQuery` and therefore survive serialization into stored `source_queries`. The asymmetric "only override when truthy" pattern in `_execute_by_name` (slayer/engine/query_engine.py) is a known limitation tracked in issue `#71`, which aims to move these flags off `SlayerQuery` entirely. Do not suggest unconditional overrides as a fix; point to the deeper redesign tracked in `#71` instead.
Learnt from: ZmeiGorynych
Repo: MotleyAI/slayer PR: 99
File: slayer/core/formula.py:776-835
Timestamp: 2026-05-06T07:58:02.748Z
Learning: In MotleyAI/slayer, flat-dispatch functions that exceed Sonar's cognitive-complexity threshold (S3776) are intentionally left as-is and suppressed with NOSONAR when splitting into helpers would require threading context variables (e.g., `transform`, `original`) through every helper call solely to preserve error-message quality. This is an established repo convention used in at least 7 locations (slayer/cli.py:418, slayer/cli.py:893, and 5 spots in slayer/engine/query_engine.py). Do not flag or suggest refactoring these flat-dispatch blocks on cognitive-complexity grounds alone.
Learnt from: CR
Repo: MotleyAI/slayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T13:31:52.102Z
Learning: DEV-1445: Cross-model filter remap — same-stage filter referencing renamed cross-model measure is NOT auto-resolved (both raise at strict resolution). Workaround: restructure as multi-stage `source_queries` so cross-model measure becomes local
Learnt from: CR
Repo: MotleyAI/slayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T13:31:52.102Z
Learning: Applies to slayer/engine/query_enrichment.py : DEV-1378: Mode A model filters route through `parse_sql_predicate` at enrichment time. `SlayerQuery.filters` (Mode B) accepts lowercase allowlist: `lower`, `upper`, `trim`, `replace`, `substr`, `instr`, `length`, `concat`, plus `||` concat operator (rewritten to `concat(...)`)
Learnt from: ZmeiGorynych
Repo: MotleyAI/slayer PR: 61
File: docs/reference/rest-api.md:0-0
Timestamp: 2026-04-29T14:35:38.432Z
Learning: In MotleyAI/slayer (PR `#61`, v2 schema), `SlayerQuery` uses `measures` (not `fields`) as the query-time aggregation key. `fields` was the v1 name; it was renamed to `measures` as part of the v2 schema migration. The REST API `POST /query` payload and all other query examples should use `"measures": [...]` with colon-aggregation syntax (e.g., `"*:count"`, `"revenue:sum"`). Do not suggest reverting `measures` to `fields`.
Learnt from: CR
Repo: MotleyAI/slayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T13:31:52.102Z
Learning: Applies to slayer/search/service.py : DEV-1513: BM25 operates with implicit self-references — every doc carries a single tag pointing at itself. So `entities=["<canonical>"]` surfaces the named entity, and `entities=["memory:<id>"]` surfaces the named memory
Learnt from: CR
Repo: MotleyAI/slayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T09:40:10.632Z
Learning: Applies to **/formula.py : SlayerQuery.filters (Mode B DSL) accepts only a lowercase allowlist of string-hygiene scalars: lower, upper, trim, replace, substr, instr, length, concat—plus the SQL || concat operator (rewritten to concat(...))
Learnt from: CR
Repo: MotleyAI/slayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T13:31:52.102Z
Learning: Applies to slayer/facade/**/*.py : DEV-1486: Facade-agnostic shared layer extracted to `slayer/facade/` (translator.py, catalog.py, info_schema.py, probe_queries.py, rows.py, datatypes.py). Flight/Postgres facades re-export under historical names and wrap RowBatch at edge
Learnt from: CR
Repo: MotleyAI/slayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T15:53:24.229Z
Learning: Applies to **/search/*.py : Search runs three parallel channels merged by RRF: Channel 1 (entity-overlap BM25 over memories), Channel 2 (tantivy full-text over memories ∪ entities), Channel 3 (optional dense embedding similarity via litellm)
Learnt from: CR
Repo: MotleyAI/slayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T13:31:52.102Z
Learning: `slayer search [--entity ENT ...] [--query JSON] [--question TEXT] [--datasource DS] [--max-memories N] [--max-example-queries N] [--max-entities N] [--format json|text]` runs three-channel semantic search: BM25 + tantivy full-text + optional embeddings (DEV-1375/DEV-1386/DEV-1409)
Learnt from: CR
Repo: MotleyAI/slayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T15:53:24.229Z
Learning: Implement three-channel semantic search (BM25 + Tantivy + optional embeddings) with Reciprocal Rank Fusion (k=60) over memories and entities, with cascade-on-delete to strip dangling references
Learnt from: ZmeiGorynych
Repo: MotleyAI/slayer PR: 36
File: slayer/mcp/server.py:167-201
Timestamp: 2026-04-17T09:00:49.541Z
Learning: In MotleyAI/slayer, custom aggregations defined via `Aggregation` with `params` always declare default values for each parameter. Therefore, when reviewing code that uses a measure’s `allowed_aggregations` (including custom aggregations), do not raise/flag an “unsafe/missing arguments” concern for calling `allowed_aggregations[0]` (or any listed aggregation) solely because explicit arguments are not supplied—defaults make the call safe.
Learnt from: ZmeiGorynych
Repo: MotleyAI/slayer PR: 59
File: slayer/mcp/server.py:1725-1740
Timestamp: 2026-04-29T13:25:08.874Z
Learning: Do not treat `attrs.get(col_name)` as an error/crash when `attrs` is a `ResponseAttributes` instance. In `slayer/engine/query_engine.py`, `ResponseAttributes` is a Pydantic `BaseModel` with `dimensions: Dict[str, FieldMetadata]` and `measures: Dict[str, FieldMetadata]`, and it provides a custom `get(column: str) -> Optional[FieldMetadata]` that searches both dicts. Also note `SlayerResponse.attributes` should always be a `ResponseAttributes` instance (non-optional) because it uses `default_factory=ResponseAttributes`—so `attributes.get(...)` is intentionally safe and may legitimately return `None` if the column key isn’t present.
Learnt from: ZmeiGorynych
Repo: MotleyAI/slayer PR: 60
File: examples/verify_common.py:163-175
Timestamp: 2026-04-29T14:15:45.724Z
Learning: In the MotleyAI/slayer repo, do not require defensive input validation inside private helper functions when (1) all call sites are within the repo and fully controlled, (2) the helper’s inputs are guaranteed by the callers’ logic to be valid (so invalid inputs would only occur due to an unrelated bug elsewhere), and (3) there is no external/unsafe data entry path reaching the helper. In those cases, allow omission of empty-collection guards and range checks and avoid flagging “missing validation” for the private helper. If any call site can receive untrusted/externally derived data or the helper can be reached with invalid inputs due to local logic, then missing validation should be flagged.
Learnt from: ZmeiGorynych
Repo: MotleyAI/slayer PR: 80
File: examples/embedded/run.py:66-66
Timestamp: 2026-05-03T12:19:49.584Z
Learning: In the MotleyAI/slayer repo (post v2/v3 schema migration, PR `#80`), when handling `*:count` aggregations, the correct public `result.data` column key is `orders._count` (note the leading underscore), not `orders.count`. The leading underscore is intentionally preserved and should match the corresponding `EnrichedMeasure.alias` to avoid collisions with user-defined columns named `count`. During code reviews, do not flag uses of `orders._count` in `result.data` lookups or user-facing docs as incorrect.
Learnt from: ZmeiGorynych
Repo: MotleyAI/slayer PR: 94
File: tests/integration/test_integration_duckdb.py:595-602
Timestamp: 2026-05-05T09:19:23.936Z
Learning: In the MotleyAI/slayer repo, the public `SlayerQueryEngine.save_model(model)` method (in `slayer/engine/query_engine.py`) is the intended engine-side save helper: it performs source-mode validation, cache refresh, and persistence. When reviewing, do not flag or “correct” code that calls `engine.save_model(...)` as missing or incorrect (and don’t insist on calling a lower-level `storage.save_model` instead). Direct calls to `engine.save_model(...)` are the documented intended pattern (see CLAUDE.md).
Learnt from: ZmeiGorynych
Repo: MotleyAI/slayer PR: 103
File: tests/test_storage_type_refinement.py:298-313
Timestamp: 2026-05-07T08:28:38.648Z
Learning: When reviewing Python `unittest.mock.patch` usage, verify the patch target based on *where* the imported symbol was created:
- If the code uses module-scope `from X import foo`, it creates a local binding in the consumer module, so the correct patch target is the consumer namespace (e.g., `consumer_module.foo`).
- If the code uses function-local `from X import foo` inside a function body, the symbol is re-imported/resolved from `X` on each call, so the correct patch target is the source namespace (e.g., `X.foo`).
Do not assume the consumer module’s symbol name is always the patch target—first determine whether the import is at module scope or inside a function before recommending a different patch path.
- Add ModelColumn alias to naive parser label map (CR thread 2)
- Fix ReDoS in _NAIVE_PATTERN: structured (w+(:w+)*) avoids overlapping
quantifiers (Sonar S5852 hotspot)
- Add SlayerError to except clauses in REST, CLI, and MCP search handlers
so invalid cypher_filter expressions return 400/clean error, not 500/crash
(Codex finding)
- Convert intentional TypeError tests to **{} kwarg form to avoid Sonar
S930 BLOCKER false positives
- Extract _build_hit_from_fused_key and _apply_cypher_filter helpers,
_query_bearing_memory_hits, and simplify _call_mcp_tool to reduce
cognitive complexity below S3776 threshold (CC 21/18 -> <15)
- Move shared test model seed and MCP tool helper to tests/search_helpers.py
to reduce cross-file duplicate-lines density below gate threshold
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
slayer/search/cypher_naive.py (1)
33-35: ⚡ Quick winUse keyword arguments in
re.compile.This call passes more than one argument positionally; switch to kwargs to match repo rules.
Suggested change
_NAIVE_PATTERN = re.compile( - r"^\s*MATCH\s*\(\s*\w+\s*:\s*(\w+(?:\s*:\s*\w+)*)\s*\)\s*RETURN\s+\w+\.id\s+AS\s+id\s*$", - re.IGNORECASE, + pattern=r"^\s*MATCH\s*\(\s*\w+\s*:\s*(\w+(?:\s*:\s*\w+)*)\s*\)\s*RETURN\s+\w+\.id\s+AS\s+id\s*$", + flags=re.IGNORECASE, )As per coding guidelines "
**/*.py: Use keyword arguments for functions with more than 1 parameter".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@slayer/search/cypher_naive.py` around lines 33 - 35, The call to re.compile for _NAIVE_PATTERN passes multiple arguments positionally; change it to use keyword args so it follows the repo rule (e.g., call re.compile(pattern=..., flags=re.IGNORECASE)). Update the _NAIVE_PATTERN assignment to pass the regex string as pattern= and the flag as flags= to eliminate positional arguments while keeping the same regex and IGNORECASE behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@slayer/search/cypher_naive.py`:
- Around line 33-35: The call to re.compile for _NAIVE_PATTERN passes multiple
arguments positionally; change it to use keyword args so it follows the repo
rule (e.g., call re.compile(pattern=..., flags=re.IGNORECASE)). Update the
_NAIVE_PATTERN assignment to pass the regex string as pattern= and the flag as
flags= to eliminate positional arguments while keeping the same regex and
IGNORECASE behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 07f508ba-4117-49e2-83c2-e31b08c539e8
📒 Files selected for processing (9)
slayer/api/server.pyslayer/cli.pyslayer/mcp/server.pyslayer/search/cypher_naive.pyslayer/search/service.pytests/search_helpers.pytests/test_search_service.pytests/test_search_surfaces.pytests/test_search_unified.py
🚧 Files skipped from review as they are similar to previous changes (5)
- slayer/api/server.py
- tests/test_search_surfaces.py
- slayer/search/service.py
- tests/test_search_service.py
- tests/test_search_unified.py
- Simplify search handler except tuples to (SlayerError, ValueError): EntityResolutionError and AmbiguousModelError are SlayerError subclasses and redundant in the same tuple (Sonar S5713) - Suppress S5843 on _NAIVE_PATTERN: each \s* and \w token maps to a distinct syntactic Cypher element; complexity cannot be reduced without losing expressiveness - Update naive parser docs to mention Column/ModelColumn alias Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix naive regex to backreference MATCH variable name in RETURN clause so MATCH (m:Memory) RETURN n.id AS id is correctly rejected (Codex) - Add warning when named memory:<id> refs are excluded by naive kind_filter, for parity with the graph path's candidate_ids exclusion warnings (Codex) - Move NOSONAR(S5843) to the flagged pattern string line - Use keyword args (pattern=, flags=) for re.compile calls per repo convention (CR nitpick) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
b04a0dd
into
egor/dev-1464-unify-memories-search-on-sqlite-fts5-sqlite-vec-graphqlite



Summary
SearchResponse(memories/example_queries/entities) with a single flat rankedresults: List[SearchHit]listmax_memories/max_example_queries/max_entities; replace with a singlemax_results: int = 10cap across all hit typesslayer/search/cypher_naive.py) that supportsMATCH (n:Label1:Label2) RETURN n.id AS idwithout theadvanced_searchextra; complex Cypher raisesSlayerErrorwith an install hintColumn→ModelColumnnode table rename,$desc→$descrparameter rename,Connection()API changeTest plan
poetry run pytest tests/test_cypher_naive.py— 19 unit tests for the naive parserpoetry run pytest tests/test_search_unified.py— 64 new integration tests pinning the unified APIpoetry run pytest -m "not integration"— full unit suite (3604 pass, 0 fail)poetry run pytest tests/integration/test_integration.py -m integration -k search— SQLite integration search tests🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Tests