MotleyAI · ZmeiGorynych · Jun 5, 2026
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -95,7 +95,7 @@ poetry run ruff check slayer/ tests/
 
   **Write side**: `save_memory(learning, linked_entities, id=None)` and `forget_memory(id)`, exposed via MCP, REST (`POST /memories`, `DELETE /memories/{id}`), CLI (`slayer memory {save,forget}`), and `SlayerClient`. `linked_entities` is either a list of entity strings (resolved strictly; `memory:<id>` accepted) or an inline `SlayerQuery` / dict (entities auto-extracted; the query is persisted on the memory). Optional `id` (DEV-1428) pins a user-controlled canonical memory id; duplicate id → unconditional upsert, `created_at` preserved.
 
-  **Read side**: a single `search(entities, query, question, datasource=None, cypher_filter=None, max_memories=5, max_example_queries=2, max_entities=5)` tool. Surfaces: MCP, REST (`POST /search`), CLI (`slayer search …`), `SlayerClient.search()`. DEV-1428: search is **lenient** — unresolved `entities` / `query` tokens become warnings rather than raising; stale memory entity tags are filtered out at retrieval time (belt) before BM25 ranks AND before `matched_entities` is surfaced; for `example_queries` hits whose attached `Memory.query` no longer resolves, a `example_query memory:<id>: attached query has stale references; re-save to clean.` warning is emitted (the query is not rewritten).
+  **Read side**: a single `search(entities, query, question, datasource=None, cypher_filter=None, max_results=10)` tool. Surfaces: MCP, REST (`POST /search`), CLI (`slayer search …`), `SlayerClient.search()`. Returns a `SearchResponse` whose `results: List[SearchHit]` is one flat ranked list mixing memories (`kind="memory"`) and entities (`kind` one of `datasource` / `model` / `column` / `measure` / `aggregation`). DEV-1428: search is **lenient**: unresolved `entities` / `query` tokens become warnings rather than raising; stale memory entity tags are filtered out at retrieval time, before BM25 ranks AND before `matched_entities` is surfaced; for query-bearing memory hits whose attached `Memory.query` no longer resolves, an `example_query memory:<id>: attached query has stale references; re-save to clean.` warning is emitted (the query is not rewritten).
 
   **Canonical entity form** is `<ds>`, `<ds>.<model>`, `<ds>.<model>.<leaf>`, or — DEV-1428 — `memory:<id>` (cross-memory references). Aggregation suffixes are stripped (`revenue:sum` → `<ds>.<model>.revenue`); `*:count` collapses to the source model; multi-hop paths keep only the leaf. Resolver: `slayer/memories/resolver.py` (the `memory:` branch runs at the top of `resolve_entity`, before `_strip_agg_suffix`, so `memory:abc` parses as the memory branch). Memory ids are non-empty strings (DEV-1428) — pure-digit auto-allocated by the storage layer (`"1"`, `"2"`, ...), or user-supplied (`"kb.policy.42"`); forbidden charset: `:`, `/`, `?`, `#`, whitespace, ASCII control. Bare names never resolve to memories (the `memory:` prefix is mandatory). `delete_memory` cascades to the matching embedding row AND strips every `memory:<id>` reference to it from every other memory's `entities` (DEV-1428 cascade layer 1).
 
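The following is a minimal, string-only sketch of the canonicalization rules described in the context paragraph above. It assumes a single known `<ds>.<model>` and skips the storage lookups, multi-hop handling, and strict-mode errors that the real resolver in `slayer/memories/resolver.py` performs. `is_valid_memory_id`, `parse_memory_ref`, and `resolve_token` are hypothetical names.

The sketch is mainly meant to show the ordering. The `memory:` branch has to run before aggregation-suffix stripping; otherwise `memory:abc` would be cut down to `memory`.

```python
import re
from typing import Optional

MEMORY_PREFIX = "memory:"
# Characters the memory-id rules forbid: ':', '/', '?', '#', whitespace, ASCII control (incl. DEL).
_FORBIDDEN = re.compile(r"[:/?#\s\x00-\x1f\x7f]")


def is_valid_memory_id(memory_id: str) -> bool:
    """Memory ids are non-empty strings without any forbidden character."""
    return bool(memory_id) and _FORBIDDEN.search(memory_id) is None


def parse_memory_ref(token: str) -> Optional[str]:
    """Return <id> for a `memory:<id>` token, else None; bare names never resolve to memories."""
    if not token.startswith(MEMORY_PREFIX):
        return None
    memory_id = token[len(MEMORY_PREFIX):]
    if not is_valid_memory_id(memory_id):
        raise ValueError(f"invalid memory id: {memory_id!r}")
    return memory_id


def resolve_token(ds: str, model: str, token: str) -> str:
    """Hypothetical resolver for one token against a known <ds>.<model>."""
    # The memory branch runs first, before any ':' suffix is stripped.
    memory_id = parse_memory_ref(token)
    if memory_id is not None:
        return MEMORY_PREFIX + memory_id
    if token == "*:count":
        return f"{ds}.{model}"  # *:count collapses to the source model
    leaf = token.split(":", 1)[0]  # strip an aggregation suffix such as ':sum'
    return f"{ds}.{model}.{leaf}"
```

Example behaviour: `resolve_token("shop", "orders", "revenue:sum")` yields `shop.orders.revenue`, while `memory:kb.policy.42` passes through unchanged because dots are allowed in memory ids.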
@@ -106,21 +106,21 @@ poetry run ruff check slayer/ tests/
   - **Tantivy** in-memory full-text index, built fresh per call over memories ∪ non-hidden entities (datasources / models / columns / named measures / aggregations), using the `en_stem` analyzer.
   - **Embeddings** (optional `advanced_search` pip extra) — dense cosine over a persistent `embeddings` sidecar keyed by `(canonical_id, embedding_model_name)`. Model from `SLAYER_EMBEDDING_MODEL` (default `openai/text-embedding-3-small`), dispatched via litellm. When the extra is missing, no API key is set, or the corpus is empty, the channel contributes nothing and emits one warning into `SearchResponse.warnings`.
 
-  BM25 (channel 1) operates with implicit self-references (DEV-1513): every doc — memory or entity — is treated as carrying a single tag pointing at itself. So `entities=["<canonical>"]` surfaces the named entity in the entities bucket, and `entities=["memory:<id>"]` surfaces the named memory in the memories bucket, on top of the usual entity-overlap matches. Entity rankings from channels 1, 2 (tantivy), and 3 (embeddings) are RRF-fused. Memory hits are partitioned by `Memory.query is None` into `memories` (learning-only) and `example_queries` (query-bearing), each with its own cap. Each output bucket is ranked independently of the others — varying one `max_X` cap cannot reorder or move items in/out of any other bucket. The in-memory tantivy index is built with `writer(num_threads=1)` so doc-id tiebreaks on equal BM25 scores are deterministic. Empty-input fallback returns the newest memories per bucket with a warning.
+  BM25 (channel 1) operates with implicit self-references (DEV-1513): every doc — memory or entity — is treated as carrying a single tag pointing at itself. So `entities=["<canonical>"]` surfaces the named entity, and `entities=["memory:<id>"]` surfaces the named memory, on top of the usual entity-overlap matches. All rankings from channels 1, 2 (tantivy), and 3 (embeddings) are RRF-fused into a single flat list. The in-memory tantivy index is built with `writer(num_threads=1)` so doc-id tiebreaks on equal BM25 scores are deterministic. Empty-input fallback returns the newest memories capped at `max_results` with a warning.
 
   **Indexed text** is rendered by `slayer/search/render.py`. Hidden models / columns are excluded; `meta` is never indexed. Named children (columns, measures, aggregations, join targets) are referenced by name + kind only (each child has its own indexed doc).
 
   **`datasource` filter**: all surfaces accept optional `datasource: Optional[str] = None`. When set, every channel pre-filters its corpus to canonical ids rooted at that datasource (exact name or strict dotted-path descendant); memories surface iff at least one of their `entities` is rooted there. Unknown datasource → `ValueError` (HTTP 400 on REST).
 
-  **`cypher_filter` graph pre-filter** (DEV-1464): all surfaces accept optional `cypher_filter: Optional[str] = None`. When set, an openCypher `MATCH … RETURN … AS id` query runs against an ephemeral in-memory LadybugDB property graph built from current storage state. Returned IDs become a hard allowlist for all three channels. Requires `advanced_search` extra (LadybugDB). Query must be a single read-only statement returning one `id` column. Graph nodes: Memory (id, learning), Datasource (id, name), Model (id, name, description), Column (id, name, data_type, description), Measure (id, name, description), Aggregation (id, name). Relationships: MENTIONS (Memory→any), CONTAINS (Datasource→Model, Model→{Column,Measure,Aggregation}), JOINS (Model→Model). Hidden models/columns excluded. Graph is rebuilt when `storage.graph_fingerprint()` changes (file mtime). Cache: per-storage-path with asyncio double-checked locking.
+  **`cypher_filter` graph pre-filter** (DEV-1464): all surfaces accept optional `cypher_filter: Optional[str] = None`. When set, the IDs the filter returns become a hard allowlist for all three channels. With `advanced_search` installed, a full openCypher `MATCH … RETURN … AS id` query runs against an ephemeral in-memory LadybugDB property graph. Without it, only `MATCH (n:Label1:Label2) RETURN n.id AS id` patterns are supported, as a kind filter (naive fallback, DEV-1532); more complex Cypher raises `SlayerError` with an install hint. Graph nodes (full LadybugDB path): Memory (id=`memory:<id>`, learning), Datasource (id, name), Model (id=`<ds>.<model>`, name, description), ModelColumn (id=`<ds>.<model>.<col>`, name, data_type, description), Measure (id, name, description), Aggregation (id, name). Relationships: MENTIONS (Memory→any), CONTAINS (Datasource→Model, Model→{ModelColumn,Measure,Aggregation}), JOINS (Model→Model). Hidden models/columns are excluded. The graph is rebuilt when `storage.graph_fingerprint()` changes (file mtime). Cache: per-storage-path with asyncio double-checked locking.
 
   **Embedding refresh** runs inline on `slayer ingest`, `edit_model`, `save_memory`, and `--ingest-on-startup`. Each per-datasource ingest pass refreshes the datasource doc, every visible model + its visible children, and every memory whose canonical entities are rooted at the datasource. Content-hash skips the litellm call when nothing has changed; the hot path issues one batched read + one batched write per refresh, independent of subtree size. Per-entity failures are non-fatal; per-memory failures surface as `IngestionError(model_name="memory:<id>", …)` in `IdempotentIngestResult.errors`.
 
   **Embedding storage**: `SQLiteStorage` writes embeddings into the main `.db`; `YAMLStorage` uses a sidecar `<base_dir>/embeddings.db` so the YAML store stays git-diffable. Both go through `slayer/storage/sidecar_embedding_store.py`. Cascade-delete on a `canonical_id` matches exactly or as a strict dotted-path descendant — never as a character prefix.
 
   **Sample-value snapshots** are cached on `Column.sampled` (text), `Column.sampled_values` (structured top-50 list for categorical columns, DEV-1480), and `Column.distinct_count` (true cardinality for categorical columns, DEV-1480). Refreshed on `slayer ingest` (table-backed models only), on `slayer search refresh-samples`, on `edit_model` (column edits → that column; model-level changes to `filters` / `sql` / `source_queries` → every column), and lazily on `inspect_model` cache miss (best-effort write-back). Categorical columns are ordered by count desc with alphabetical tie-break; the structured list is the consumer-facing way to compare predicate literals against actual stored values (text-split on `sampled` is ambiguous for values containing commas, e.g. `"R$ 1,000–3,000"`). Cache validity for categorical columns requires `sampled_values is not None` (v6 → v7 upgrades re-profile on next `inspect_model`). sql-mode and query-backed models do not yet have sample-value coverage.
 
-  `inspect_model` auto-renders a `Learnings` section showing only learning-only memories (`query is None`); query-bearing memories surface only via `search` in the `example_queries` bucket.
+  `inspect_model` auto-renders a `Learnings` section showing only learning-only memories (`query is None`); query-bearing memories surface only via `search` (as hits with `hit.query is not None`).
 
   See [docs/concepts/memories.md](docs/concepts/memories.md) and [docs/concepts/search.md](docs/concepts/search.md).
 
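The sketch below illustrates two mechanics from the hunk above. First, it shows the "rooted at" predicate shared by the `datasource` filter and embedding cascade-delete: an exact match or a strict dotted-path descendant, never a bare character prefix. Second, it shows reciprocal rank fusion (RRF) merging per-channel rankings into one flat list capped at `max_results`.

Several details are assumptions, not confirmed for SLayer:

- `RRF_K = 60` is the value commonly used with RRF. SLayer's actual constant is not stated here.
- The id-order tie-break is a hypothetical stand-in.
- All function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

RRF_K = 60  # assumption: the conventional RRF constant, not confirmed for SLayer


def is_rooted_at(canonical_id: str, datasource: str) -> bool:
    """Exact match or strict dotted-path descendant; never a bare character prefix."""
    return canonical_id == datasource or canonical_id.startswith(datasource + ".")


def memory_in_scope(entities: Iterable[str], datasource: str) -> bool:
    """A memory surfaces iff at least one of its entities is rooted at the datasource."""
    return any(is_rooted_at(entity, datasource) for entity in entities)


def rrf_fuse(rankings: List[List[str]], max_results: int = 10) -> List[str]:
    """Fuse per-channel rankings (best first) into one flat list via reciprocal rank fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (RRF_K + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    fused = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return fused[:max_results]
```

Worked example: with rankings `[["a", "b", "c"], ["b", "a"], ["c"]]`, `a` and `b` tie on fused score and `c` trails, so the result is `["a", "b", "c"]`. Also, `is_rooted_at("shopping.orders", "shop")` is false, which is the "never a character prefix" rule.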
@@ -174,7 +174,7 @@ poetry run ruff check slayer/ tests/
 - `slayer serve --ingest-on-startup` and `slayer mcp --ingest-on-startup` (DEV-1392) — opt-in boot-time idempotent auto-ingestion across every configured datasource, sync-before-listen (uvicorn/mcp.run don't start until ingest finishes). Continue-on-failure: per-datasource errors are friendly-formatted to stderr and never abort startup; `storage.list_datasources()` raising is the only thing that prevents the server from starting. `to_delete` drift entries are printed but **never auto-applied** — destructive cleanup stays gated behind `slayer validate-models --force-clean [--yes]`. Composes freely with `--demo` (demo first, then the ingest pass over every datasource including the freshly-created demo). Also exposed via `SLAYER_INGEST_ON_STARTUP=1` env var (flag wins when both set) and the `ingest_on_startup=True` kwarg on `create_app` / `create_mcp_server`. All output goes to stderr — `slayer mcp` stdio JSON-RPC remains protocol-safe. Orchestrator: `slayer/engine/ingestion.py::ingest_all_datasources_idempotent`. **Memory embeddings** (DEV-1416): each per-datasource pass also re-embeds every memory whose canonical entities are rooted at the datasource, so a stale `embeddings.db` is repaired by the next `--ingest-on-startup` without extra steps. See [docs/concepts/ingestion.md](docs/concepts/ingestion.md#ingesting-at-startup).
 - `slayer validate-models [--datasource X] [--force-clean] [--yes]` (DEV-1356) — read-only diff against live schemas; with `--force-clean`, prompts to apply each delete via `engine.apply_drift_deletes`. See [docs/concepts/schema-drift.md](docs/concepts/schema-drift.md).
 - `slayer storage migrate-types [--data-source X] [--dry-run]` (DEV-1361) — refine `DOUBLE → INT` on base columns whose live SQL type is integer for every persisted model, then write the refined v5 dict back. Hard-fails if a datasource is unreachable. The same refinement runs transparently inside `storage.get_model` on first load; this CLI is a batch / inspectable alternative.
-- `slayer search [--entity ENT ...] [--query JSON_OR_@FILE] [--question TEXT] [--datasource DS] [--cypher-filter CYPHER] [--max-memories N] [--max-example-queries N] [--max-entities N] [--format json|text]` (DEV-1375 / DEV-1386 / DEV-1409 / DEV-1464) — up to three-channel semantic search over memories + canonical entities (BM25 over memory entity tags + tantivy full-text + optional dense embedding similarity). `--datasource` scopes the corpus to one datasource. `--cypher-filter` runs an openCypher MATCH query against the LadybugDB property graph and pre-filters all channels to the returned IDs (requires `advanced_search` extra). See [docs/concepts/search.md](docs/concepts/search.md).
+- `slayer search [--entity ENT ...] [--query JSON_OR_@FILE] [--question TEXT] [--datasource DS] [--cypher-filter CYPHER] [--max-results N] [--format json|text]` (DEV-1375 / DEV-1386 / DEV-1409 / DEV-1464 / DEV-1532) — up to three-channel semantic search over memories + canonical entities (BM25 over memory entity tags + tantivy full-text + optional dense embedding similarity). Returns a single flat ranked `results` list. `--datasource` scopes the corpus to one datasource. `--cypher-filter` pre-filters all channels: full openCypher when `advanced_search` is installed; simple `MATCH (n:Label) RETURN n.id AS id` kind-filter without it. See [docs/concepts/search.md](docs/concepts/search.md).
 - `slayer search refresh-samples [--data-source X] [--model M ...]` (DEV-1375) — re-profile and persist `Column.sampled` for table-backed models. Best-effort: per-column failures are reported but don't abort.
 - MCP `query()` tool has a `format` parameter: `"markdown"` (default), `"json"`, or `"csv"`.
 - **`query_nested` MCP tool**: companion to `query` for the multi-stage DAG shape that `engine.execute(query=list[...])` already supports. Takes `queries: List[Dict[str, Any]]` plus the usual `variables` / `show_sql` / `dry_run` / `explain` / `format` knobs. Earlier entries are named sub-queries that later entries reference via `source_model: "<sibling_name>"` or `joins.target_model`; the engine auto-sorts the list (Kahn's algorithm), so order doesn't matter. The single-stage `query` tool is unchanged — keep using it whenever the typed per-field schema fits, since it surfaces a richer signature to agents.
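Without the `advanced_search` extra, `--cypher-filter` accepts only the `MATCH (n:Label1:Label2) RETURN n.id AS id` shape. The following is a minimal regex-based sketch of recognizing that shape and extracting its labels.

The regex, `parse_kind_filter`, and `UnsupportedCypher` (a stand-in for `SlayerError`) are hypothetical. This document does not say how the real fallback combines multiple labels, so the sketch only returns them.

```python
import re
from typing import List

# Hypothetical pattern for the supported fallback shape: MATCH (n:L1[:L2...]) RETURN n.id AS id
_KIND_FILTER = re.compile(
    r"^\s*MATCH\s*\(\s*(\w+)((?::\w+)+)\s*\)\s*RETURN\s+(\w+)\.id\s+AS\s+id\s*;?\s*$",
    re.IGNORECASE,
)


class UnsupportedCypher(ValueError):
    """Stand-in for the SlayerError raised for Cypher the naive fallback cannot handle."""


def parse_kind_filter(cypher: str) -> List[str]:
    """Return the node labels of a simple kind-filter query, or raise for anything else."""
    match = _KIND_FILTER.match(cypher)
    # The returned variable must be the matched node variable (n ... n.id).
    if match is None or match.group(1) != match.group(3):
        raise UnsupportedCypher(
            "only MATCH (n:Label) RETURN n.id AS id is supported without the "
            "advanced_search extra; install it for full openCypher"
        )
    return [label for label in match.group(2).split(":") if label]
```

Anything with a relationship pattern, a `WHERE` clause, or a mismatched return variable falls through to the error branch. In that case the full LadybugDB path is needed.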