Add schema-drift validation, idempotent ingestion, and SchemaDriftError (DEV-1356) by ZmeiGorynych · Pull Request #101 · MotleyAI/slayer

ZmeiGorynych · 2026-05-06T10:23:27Z

Summary

Implements DEV-1356: three first-class behaviours for keeping persisted SLayer models in sync with live database schemas.

validate_models — read-only diff that returns the minimal list of deletes (drop columns / measures / joins / filters / models) needed to make persisted models valid against the current live schema. Output replays directly as edit_model / delete_model calls.
Idempotent ingest_datasource_models — re-runs are additive only (add missing columns / joins / tables, never mutate existing) and return a combined report of what was added plus what validate_models says still needs deleting.
SchemaDriftError — engine.execute() failures get post-hoc-attributed via validate_models against the touched models' datasources, surfacing a clear error pointing at the validator and edit_model instead of a raw DBAPI message.

Surfaces

Layer	Read-only diff	Apply (destructive)	Idempotent ingest
Engine	`engine.validate_models()`	`engine.apply_drift_deletes()`	`ingest_datasource_idempotent()`
MCP	`validate_models` tool	—	`ingest_datasource_models` (idempotent)
REST	`POST /validate-models`	—	`POST /ingest`
CLI	`slayer validate-models`	`slayer validate-models --force-clean [--yes]`	`slayer ingest`

--force-clean is intentionally CLI-only — destructive auto-application is opt-in at the human-typed layer.

Cascade rules implemented

A drop on M.X cascades through:

Derived Column.sql on M referencing X (transitive across chains of derived columns).
ModelMeasure.formula on M referencing X or any other dropped measure.
Joins on M whose local FK column is X; joins on K whose target_model == M and foreign_column == X.
Model-level filter strings on M referencing X → moved to remove_filters.
Cross-model derived references through the join graph (single-hop and multi-hop via __-delimited path aliases).
Query-backed source_queries chains transitively referencing dropped M.X or whole-dropped M → whole-model drop.
PK drops do not cascade — only the bare drop_column is emitted.

Cascade walking stays strictly within the parent datasource. The collapse rule preempts an EditModelDelete with a WholeModelDelete on the same model.

Test plan

Unit tests for diff + cascade + collapse — tests/test_validate_models.py (30 tests)
Unit tests for idempotent ingestion — tests/test_idempotent_ingestion.py (10 tests, covering preservation of description/label/format/meta on existing columns, sql-mode and query-backed skip, per-model error capture, exclude-table scoping, type-drift two-pass invariant)
Unit tests for SchemaDriftError query-time wrap — tests/test_schema_drift_error.py (5 tests, covering wrap-on-drift, no-drift passthrough, healthy-path zero overhead, attribution-failure swallowing, models_touched computation)
Unit tests for apply_drift_deletes + CLI --force-clean — tests/test_force_clean.py (8 tests, including subprocess CLI integration: prompt-aborts-on-n, --yes skips prompt)
Integration tests against DuckDB — tests/integration/test_schema_drift_duckdb.py (3 tests, real DDL mutations)
Full unit suite — 1889 passed, ruff clean
Existing DuckDB integration suite — 36 passed

Out of scope (per issue)

Type-change auto-coercion (the two-pass flow handles type drift via drop + re-ingest).
Cross-datasource drift — tracked separately in DEV-1360.
Save-time sql-mode projection invariant — tracked separately in DEV-1358.
Rename detection (renames present as drop + add).
Web UI.

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Idempotent ingestion with structured results (additions, pending deletes, errors); new validate-models API/CLI for drift detection and optional apply.
User Experience
- Clearer, human-friendly CLI/API reporting for ingestion and drift, plus guidance for applying minimal deletes.
Documentation
- Expanded guides and a worked example covering idempotent re-ingestion and schema-drift workflows.
Tests
- New unit and integration tests covering idempotent ingestion, drift detection, error handling, and apply flows.

…or (DEV-1356) - validate_models — read-only diff between persisted SlayerModels and live schemas, returning the minimal list of replayable deletes. Surfaced via engine.validate_models, MCP validate_models tool, REST POST /validate-models, and slayer validate-models CLI. - ingest_datasource_idempotent — additive-only re-ingestion that preserves customised column metadata and returns IdempotentIngestResult bundling validate_models output. Wired into MCP / REST / slayer ingest CLI. - SchemaDriftError — query-time wrap that attributes DBAPI failures via validate_models and surfaces a structured payload (HTTP 422 on REST). - apply_drift_deletes — engine helper used by slayer validate-models --force-clean [--yes] to apply the diff. CLI-only; not exposed via MCP/REST. - Cascade walker handles derived columns, measures, joins, filters, cross-model refs, and query-backed source_queries chains. PK drops do not cascade. Cascade walking stays within parent datasource. - 56 new tests (46 unit + 10 idempotent + 5 SchemaDriftError + 8 force-clean + 3 DuckDB integration). Full unit suite passes (1889). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-06T10:24:31Z

Caution

Review failed

Failed to post review comments

📝 Walkthrough

Walkthrough

Implements schema-drift detection and remediation plus an idempotent ingestion flow: new drift payload types and validator, additive idempotent ingest that merges discovered models into persisted ones, query-engine drift attribution and apply APIs, new REST/CLI/MCP endpoints, documentation updates, and extensive tests.

Changes

Schema Drift & Idempotent Ingestion

Layer / File(s)	Summary
Public errors & payloads `slayer/core/errors.py`, `slayer/engine/schema_drift.py`	Adds `SchemaDriftError` and Pydantic payloads/types: DeleteReason, RemoveSpec, EditModelDelete, WholeModelDelete, ToDeleteEntry, ModelAddition, IngestionError, IdempotentIngestResult, AppliedEntry, ApplyError, ApplyDriftResult, LiveTable.
Drift validation core `slayer/engine/schema_drift.py`	Implements full drift diffing and orchestration: live introspection/trial-execute, diff helpers (sql-table/sql-model), cascade rules, compute_datasource_drops, and async `validate_datasource` orchestration producing ToDeleteEntry lists.
Idempotent ingestion implementation `slayer/engine/ingestion.py`	Adds idempotent ingestion flow and helpers: `_existing_join_signatures`, `_additive_merge_existing`, `_process_one_table`, `_scoped_models_for_validation`, and public `ingest_datasource_idempotent` returning `IdempotentIngestResult` (additions, to_delete, errors).
Query engine wiring & remediation `slayer/engine/query_engine.py`	Adds explain-path support, helpers to collect touched models and expand join graph, `_maybe_raise_schema_drift` to attribute runtime SQL errors to drift and raise `SchemaDriftError`; adds `edit_model_remove`, `delete_model_by_name`, and `apply_drift_deletes` with per-entry isolation and post-apply validation.
MCP tooling & rendering `slayer/mcp/server.py`	Adds MCP `validate_models` tool, rendering helpers (`_render_ingest_result`, sections for new/updated/unchanged/drift/errors), and routes existing ingest tool through idempotent ingestion.
API surfaces `slayer/api/server.py`	Adds `ValidateModelsRequest`, `POST /validate-models` (drift diff), `POST /ingest` (idempotent ingest result), and 422 handling that serializes `SchemaDriftError` context (models, to_delete, cause).
CLI `slayer/cli.py`	Adds `validate-models` subcommand (`--force-clean/--yes`) with `_run_validate_models`, `_format_validate_models_output`, switches existing ingest command to `ingest_datasource_idempotent`, and adds printers/parsers for additions, drift, and errors.
Docs & examples `.claude/skills/.md`, `CLAUDE.md`, `docs/concepts/.md`, `docs/examples/schema-drift.md`	Updates documentation and examples to demonstrate idempotent ingestion, idempotent re-runs semantics, schema-drift guidance, CLI usage, and a worked schema-drift example.
Tests `tests/*` (many new files)	Adds extensive tests: validate_models unit/integration (SQLite/DuckDB), idempotent ingestion tests, SchemaDriftError behavior, apply_drift_deletes and CLI force-clean flows, integration updates, and small expectation text changes.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant CLI
    participant Engine as QueryEngine
    participant Storage
    participant DB as LiveDB
    participant Drift as DriftValidator

    User->>CLI: slayer validate-models
    CLI->>Storage: load persisted SlayerModels
    CLI->>Drift: validate_datasource(models, sql clients)
    Drift->>DB: introspect / trial-execute
    DB-->>Drift: live metadata
    Drift->>Drift: diff & compute_datasource_drops
    Drift-->>CLI: ToDeleteEntry[] (edits / whole-drops)
    CLI->>User: display diff
    alt User confirms --force-clean
        CLI->>Engine: apply_drift_deletes(deletes)
        Engine->>Storage: edit/delete persisted models
        Storage-->>Engine: applied entries
        Engine->>Drift: validate_datasource(post-apply)
        Drift-->>Engine: residual drift
        Engine-->>CLI: ApplyDriftResult
        CLI->>User: report applied + residuals
    end

sequenceDiagram
    actor User
    participant CLI
    participant Ingest as IngestEngine
    participant Storage
    participant DB as LiveDB
    participant Drift as DriftValidator

    User->>CLI: slayer ingest (idempotent)
    CLI->>Ingest: ingest_datasource_idempotent(datasource, storage, clients)
    Ingest->>Ingest: generate fresh models
    Ingest->>Storage: load persisted models
    loop per table
        alt persisted exists
            Ingest->>Ingest: _additive_merge_existing(persisted, fresh)
            Ingest->>Storage: save if additions
        else
            Ingest->>Storage: create new model
        end
    end
    Ingest->>Drift: validate_datasource(merged models)
    Drift->>DB: introspect/trial-execute
    Drift-->>Ingest: to_delete (type drift)
    Ingest-->>CLI: IdempotentIngestResult(additions, to_delete, errors)
    CLI->>User: show created/updated + pending drift + errors

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

MotleyAI/slayer#31: Modifies ingestion helpers/introspection used by idempotent ingestion flows.
MotleyAI/slayer#9: Related changes to ingestion and join/model handling overlapping with additive ingestion logic.
MotleyAI/slayer#29: Async/storage API changes that intersect with the async idempotent ingestion and validation wiring.

Suggested reviewers

AivanF
whimo

Poem

🐇 I hopped through schemas, sniffed each drifting leaf,

Kept old marks safe and stitched on something new.
I counted joins and columns — tidy brief —
I flagged the ghosts, then nudged the next review.
Hop, run, confirm — the models sleep anew.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 46.60% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically summarizes the main changes: adding schema-drift validation, idempotent ingestion, and a SchemaDriftError exception, all corresponding to the core features implemented in the PR.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch egor/dev-1356-schema-drift-validate_models-tool-idempotent-re-ingestion

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 9

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/concepts/ingestion.md`:
- Around line 148-150: The doc wording overstates the guarantee for
exclude_tables: update the sentence near
validate_models/IdempotentIngestResult.to_delete to either (A) narrow the docs
to say exclude_tables only prevents touching models in additive pass and
validator for non-sql_table modes (i.e., call out that sql/query-backed models
are still passed into validate_models), or (B) change the validator in
slayer/engine/ingestion.py so validate_models respects exclude_tables for
sql/query-backed models too (skip feeding sql/query-backed models into
validate_models when their table is excluded). Reference symbols:
validate_models, IdempotentIngestResult.to_delete,
include_tables/exclude_tables, and the validator invocation in
slayer/engine/ingestion.py.
- Around line 65-77: The example calls await ingest_datasource_idempotent at
module scope which causes a SyntaxError; wrap the async call inside an async
function (e.g., async def main()) and invoke it with asyncio.run(main()), or
alternatively show a synchronous wrapper if a sync API is available; update the
snippet to call ingest_datasource_idempotent from that function and preserve the
use of result.additions / result.to_delete / result.errors for results handling
so readers can execute the example.

In `@docs/examples/schema-drift.md`:
- Line 28: Several fenced code blocks in docs/examples/schema-drift.md are
missing language identifiers (triggering markdownlint MD040); update each of the
output-style triple-backtick blocks (the terminal/output samples such as the
"EDIT MODEL: products..." block, the "slayer validate-models ..." commands
block, the "Applied 1 entry/entries." block, and the "Updated: products..."
block) to include a language tag (e.g., text) immediately after the opening ```
so each fenced output block becomes ```text; ensure you update all occurrences
referenced in the review (the multiple output/terminal examples) for
consistency.
- Around line 55-73: The example output incorrectly shows an interactive
confirmation despite using the CLI flags --force-clean --yes; update the docs so
the slayer validate-models --datasource demo --force-clean --yes example
reflects non-interactive behavior by removing the "Apply ...? [y/N] y" prompt
and showing it applied automatically (or explicitly note that --yes
auto-approves), and ensure any referenced internal call names like
validate_models remain consistent with the command output.

In `@slayer/api/server.py`:
- Around line 171-182: The /query endpoint decorator's responses parameter needs
a 422 entry describing the schema-drift error shape returned when
SchemaDriftError is raised; update the decorator on the /query route to add
responses={..., 422: {"description": "schema_drift", "content":
{"application/json": {"example": {"error":"schema_drift","models": [...],
"to_delete":[...], "original": null}}}}} (or an equivalent OpenAPI schema) so
the documented response matches the raised HTTPException payload (keys: "error",
"models", "to_delete", "original") referenced where SchemaDriftError is handled
in the try/except block that raises the 422.
- Around line 371-380: The call to ingest_datasource_idempotent in server.py can
raise DB/auth/introspection exceptions that lead to 500s; wrap the await
ingest_datasource_idempotent(...) call in a try/except that catches those
operational exceptions (e.g., connection/auth/introspection related exceptions
your ingestion layer raises) and raise fastapi.HTTPException with a 4xx status
(400 or 422) and a client-friendly message, log the original exception for
server-side debugging, and return the same result.model_dump(...) on success;
reference the ingest_datasource_idempotent call and the surrounding handler in
server.py when making this change.

In `@slayer/engine/ingestion.py`:
- Around line 826-831: The ingest path currently allows a physical column
literally named "count" to persist; update the shared ingest logic in
ingest_datasource so that during auto-ingestion any source column named "count"
is normalized (renamed) to "count_col" before it is persisted or returned,
ensuring idempotent re-ingestion and avoiding conflicts with the reserved
"*:count" naming; target the ingest_datasource function and any helper that
builds the final column list/schema so the rename happens early in the pipeline
(before persistence) and is consistently applied on reruns.

In `@tests/test_force_clean.py`:
- Around line 260-267: The post-apply assertion in tests/test_force_clean.py is
too permissive (it accepts any "applied" text); update the assertion after
calling _run_cli so it verifies residual drift is actually cleared by asserting
either "no remaining drift" is present in result.stdout.lower(), or that
"applied" appears AND there is no mention of residual words (e.g., ensure
"residual" or "remaining" are not in result.stdout.lower()); keep using the
existing result from the _run_cli call to locate the check.
- Around line 275-279: The test calls _run_cli(...) with ["validate-models",
"--datasource", "ds", "--force-clean"] but ignores its result so failures before
the abort prompt are missed; capture the result (e.g., assign to a variable like
result) and assert it completed successfully (check result.exit_code == 0 or
result.returncode == 0 depending on _run_cli's return type) before continuing
with the abort-path assertions to ensure the CLI invocation didn't fail
silently.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8f8275e7-2d69-4e7b-9059-f47b0c05c740

📥 Commits

Reviewing files that changed from the base of the PR and between 6ab0c51 and 1e561bb.

📒 Files selected for processing (19)

.claude/skills/slayer-models.md
.claude/skills/slayer-overview.md
CLAUDE.md
docs/concepts/ingestion.md
docs/concepts/models.md
docs/concepts/schema-drift.md
docs/examples/schema-drift.md
slayer/api/server.py
slayer/cli.py
slayer/core/errors.py
slayer/engine/ingestion.py
slayer/engine/query_engine.py
slayer/engine/schema_drift.py
slayer/mcp/server.py
tests/integration/test_schema_drift_duckdb.py
tests/test_force_clean.py
tests/test_idempotent_ingestion.py
tests/test_schema_drift_error.py
tests/test_validate_models.py

…d /ingest (PR #101 review) - tests/test_api_server.py: 3 tests (test_post_models_creates_query_backed_model, test_post_models_rejects_user_columns_on_query_backed, test_post_query_run_by_name_dry_run_returns_sql_without_executing) used asyncio.get_event_loop().run_until_complete(...). After the new test_force_clean.py tests run via slayer.async_utils.run_sync() (which calls asyncio.run() and clears the main-thread loop on exit), get_event_loop() raises RuntimeError. Replace with run_sync() to match the rest of the suite. - tests/integration/test_mcp_inspect.py: ingest_datasource_models now renders "Created N new model(s):" / "Updated N existing model(s):" instead of the legacy "Ingested N model(s):" wording. Update the assertion. - slayer/api/server.py /query: add 422 schema-drift response to the OpenAPI decorator (CodeRabbit + Sonar S8415 — documented body must match the SchemaDriftError translation introduced in this PR). - slayer/api/server.py /ingest: wrap ingest_datasource_idempotent in try/except (OperationalError, DatabaseError, SQLAlchemyError) so bad datasource configuration surfaces as 422 with a friendly message instead of a 500 (CodeRabbit major). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

🧹 Nitpick comments (2)

slayer/api/server.py (1)

394-396: ⚡ Quick win

Move endpoint-local imports to module scope

These imports should be at the top of the file for consistency and clearer import-time failure behavior.

Suggested diff

 from fastapi import FastAPI, HTTPException
 from pydantic import BaseModel, ConfigDict
+from sqlalchemy.exc import DatabaseError, OperationalError, SQLAlchemyError

 from slayer.mcp.server import create_mcp_server
 from slayer.core.errors import AmbiguousModelError, SchemaDriftError
 from slayer.core.format import NumberFormat
 from slayer.core.models import DatasourceConfig, SlayerModel
 from slayer.core.query import SlayerQuery
+from slayer.engine.ingestion import ingest_datasource_idempotent
 from slayer.engine.query_engine import SlayerQueryEngine
 from slayer.storage.base import StorageBackend
...
-        from sqlalchemy.exc import DatabaseError, OperationalError, SQLAlchemyError
-        from slayer.engine.ingestion import ingest_datasource_idempotent
-
         try:
             result = await ingest_datasource_idempotent(

As per coding guidelines, "`**/*.py`: Imports at the top of files".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/api/server.py` around lines 394 - 396, The imports DatabaseError,
OperationalError, SQLAlchemyError and ingest_datasource_idempotent are declared
inside the endpoint and should be moved to module scope for consistency; remove
the in-function imports and add "from sqlalchemy.exc import DatabaseError,
OperationalError, SQLAlchemyError" and "from slayer.engine.ingestion import
ingest_datasource_idempotent" to the top-of-file imports so the endpoint (where
ingest_datasource_idempotent is used and exceptions are handled) references
these already-imported symbols.

tests/test_api_server.py (1)

266-267: ⚡ Quick win

Consolidate run_sync import at file top

run_sync is imported three times inside functions; move it to module imports.

Suggested diff

 import pytest
 from fastapi.testclient import TestClient

 from slayer.api.server import QueryRequest, create_app
+from slayer.async_utils import run_sync
 from slayer.core.enums import DataType
 from slayer.core.models import Column, SlayerModel
 from slayer.core.query import SlayerQuery
 from slayer.storage.yaml_storage import YAMLStorage
...
-        from slayer.async_utils import run_sync
         run_sync(
             storage.save_datasource(DatasourceConfig(
                 name="ds", type="sqlite", database=":memory:"
             ))
         )
...
-        from slayer.async_utils import run_sync
         run_sync(
             storage.save_datasource(DatasourceConfig(
                 name="ds", type="sqlite", database=":memory:"
             ))
         )
...
-        from slayer.async_utils import run_sync
         run_sync(storage.save_datasource(
             DatasourceConfig(name="ds", type="sqlite", database=":memory:")
         ))

As per coding guidelines, "`**/*.py`: Imports at the top of files".

Also applies to: 300-301, 404-405

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_api_server.py` around lines 266 - 267, Several test functions
import run_sync locally; move the run_sync import to the module-level imports
and remove the duplicate local imports inside the test functions so the symbol
is imported once. Locate uses of run_sync in your test functions (the local
"from slayer.async_utils import run_sync" occurrences) and add a single "from
slayer.async_utils import run_sync" at the top of the file, then delete the
in-function import statements and ensure tests still reference the top-level
run_sync.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@slayer/api/server.py`:
- Around line 394-396: The imports DatabaseError, OperationalError,
SQLAlchemyError and ingest_datasource_idempotent are declared inside the
endpoint and should be moved to module scope for consistency; remove the
in-function imports and add "from sqlalchemy.exc import DatabaseError,
OperationalError, SQLAlchemyError" and "from slayer.engine.ingestion import
ingest_datasource_idempotent" to the top-of-file imports so the endpoint (where
ingest_datasource_idempotent is used and exceptions are handled) references
these already-imported symbols.

In `@tests/test_api_server.py`:
- Around line 266-267: Several test functions import run_sync locally; move the
run_sync import to the module-level imports and remove the duplicate local
imports inside the test functions so the symbol is imported once. Locate uses of
run_sync in your test functions (the local "from slayer.async_utils import
run_sync" occurrences) and add a single "from slayer.async_utils import
run_sync" at the top of the file, then delete the in-function import statements
and ensure tests still reference the top-level run_sync.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 02f7750a-dccd-450b-9a6b-e8aef9438d88

📥 Commits

Reviewing files that changed from the base of the PR and between 1e561bb and 7e6dfb0.

📒 Files selected for processing (3)

slayer/api/server.py
tests/integration/test_mcp_inspect.py
tests/test_api_server.py

Docs (CodeRabbit minor): - docs/concepts/ingestion.md: wrap the Python ingest_datasource_idempotent example in `async def main()` + `asyncio.run(main())` so it's executable (await at module scope is a SyntaxError). - docs/concepts/ingestion.md: narrow the exclude_tables wording — the filter scopes the additive pass and the sql_table-mode subset of validation; sql-mode and query-backed models are still validated regardless of the table filter (matches the actual implementation). - docs/examples/schema-drift.md: add `text` language tag to fenced output blocks (markdownlint MD040). Fix --force-clean --yes example to reflect non-interactive behaviour (drop the [y/N] prompt from the sample output; document the prompt explicitly outside the code block). Sonar refactors (S3776 + S1066): - slayer/engine/schema_drift.py: split _cascade_one_pass (CC 135) into per-rule helpers (_cascade_derived_columns, _cascade_measures, _cascade_joins, _cascade_filters, _cascade_query_backed) sharing a plain _CascadeState container (Pydantic was tried first but it copies the dict/set fields, breaking in-place cascade mutations). Refactor _walk_field_spec_measure_refs (CC 31) and _stage_referenced_columns_for_base (CC 44) into smaller per-section helpers. Flatten the rule-3a nested if into a single boolean OR. - slayer/engine/query_engine.py: drop the no-op `for cm in enriched.cross_model_measures: pass` block (S1481) — the comment now lives on the data_sources expression. - slayer/cli.py: split _run_ingest (CC 21) and _format_validate_models_output (CC 22) into focused helpers. - slayer/engine/ingestion.py: split ingest_datasource_idempotent (CC 22) into _process_one_table + _scoped_models_for_validation helpers. - slayer/mcp/server.py: split _render_ingest_result (CC 29) into per-section renderers; drop unused datasource_name parameter (S1172). Test cleanups (CodeRabbit minor + Sonar S1481/S7503): - tests/test_force_clean.py: tighten post-apply assertion — re-run validate-models and assert the diff is empty rather than matching on free-text. Capture and assert returncode in the abort-path test. - tests/test_validate_models.py: rename unused `dropped` to `_`. - tests/test_schema_drift_error.py: NOSONAR(S7503) the patched async side_effect (must be a coroutine function for the awaited engine method). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@slayer/cli.py`:
- Around line 620-631: After running ingest_datasource_idempotent and printing
additions/drift, check the returned result.errors and if it's non-empty exit
with a failing status (e.g., sys.exit(1) or raise SystemExit(1)) so the CLI
returns non-zero on model errors; update the block around
ingest_datasource_idempotent/result/_print_ingest_drift_and_errors to mirror the
validate-models --force-clean behavior and call sys.exit(1) when result.errors
exists.

In `@slayer/engine/ingestion.py`:
- Around line 900-905: ingest_datasource_idempotent() currently calls
ingest_datasource(...) synchronously on the async path, which blocks the event
loop during SQLAlchemy introspection; change the call site to offload the
synchronous work to a thread (e.g. wrap the ingest_datasource(...) call in
asyncio.to_thread(...) or an equivalent executor) before awaiting any async
merge/save work so the event loop remains responsive; update the code that
invokes ingest_datasource (referenced as ingest_datasource(...) inside
ingest_datasource_idempotent) to call await asyncio.to_thread(lambda:
ingest_datasource(...)) and keep subsequent async operations unchanged.

In `@slayer/engine/query_engine.py`:
- Around line 494-528: _collect_models_touched currently only adds direct join
targets (model.joins) so multihop join targets (e.g., orders -> customers ->
regions) are omitted; update _collect_models_touched to walk joins transitively
by following enriched.resolved_joins (and stage.joins for source_queries) to
collect all reachable target_model names, avoiding cycles (track visited names)
and adding each visited model name to the touched set so validate_models can
attribute DBAPI errors to SchemaDriftError as intended.
- Around line 717-749: The edit_model_remove() path currently mutates cached
fields and calls storage.save_model(updated) directly, which will persist cache
state for query-backed models and bypass engine-side validation/refresh; update
edit_model_remove() to detect query-backed models (e.g., check
existing.source_queries or the model's query-backed indicator) and either
raise/reject the operation for those models or call the engine-level helper
SlayerQueryEngine.save_model(updated) instead of storage.save_model; ensure you
reference existing, edit_model_remove, storage.save_model, and
SlayerQueryEngine.save_model when making the change.
- Around line 780-832: The loop over `deletes` only adds `entry.data_source` to
`touched_ds` on success, so datasources whose all applies fail are skipped from
the post-apply `validate_models()` pass; fix by ensuring
`touched_ds.add(entry.data_source)` runs regardless of success or failure (e.g.,
add it before the try or in a finally block) in the loop that calls
`delete_model` and `edit_model_remove`, so `validate_models(data_source=...)`
and the subsequent `residual` calculation will always run for every touched
datasource when constructing the `ApplyDriftResult`.

In `@slayer/engine/schema_drift.py`:
- Around line 759-835: _stage_referenced_columns_for_base currently only treats
same-model (no-dot) refs as belonging to base_name and misses dotted refs that
resolve to base via joins; update it to also attribute dotted refs whose prefix
model resolves to base_name through the stage's join graph. Concretely, in
_stage_referenced_columns_for_base, walk stage.joins (and any alias/name fields
on join objects) to build the set of models reachable from the stage's source
model (follow joins transitively), then for each dotted ref returned by
_measure_refs_on_base, _dimension_refs_on_base, _time_dimension_refs_on_base,
and _filter_refs_on_base treat the ref as referencing base_name if its prefix
resolves (via that join traversal) to base_name; keep existing same-model logic
(sm_is_base) unchanged. This ensures _query_backed_should_whole_drop sees hits
when a stage references columns via joins that map back to the base model.
- Around line 992-1032: _cascade_joins currently uses foreign_missing = [pair[1]
for pair in join.join_pairs if pair[1] in state.cascadable(tgt.name)], which
ignores primary keys removed by state.cascadable and therefore fails to mark the
join itself as invalid when a target PK is dropped; update _cascade_joins to
treat missing target PKs as a reason to add a dropped join regardless of
state.cascadable filtering by checking join.join_pairs for any pair[1] that is
absent from tgt.columns (or otherwise not present on the live target model) and
call _add_dropped_join (same args as existing calls) with a clear reason (e.g.,
"Foreign column(s) ... dropped on target model ...") so the join is invalidated;
reference functions/vars: _cascade_joins, join.join_pairs, foreign_missing,
state.cascadable, state.models_by_name, _add_dropped_join.

In `@tests/test_schema_drift_error.py`:
- Around line 163-165: The test currently only asserts that the raised exception
is not SchemaDriftError; change it to assert the original DB failure by either
checking exc.value is an instance of the concrete DBAPI exception you expect
(e.g., OperationalError/DatabaseError class used by your DB backend) or by
asserting a stable substring from the underlying DB error message is present
(e.g., "no such table" or the specific error text your DB returns) after calling
engine.execute(q) in the pytest.raises block; apply the same tighter assertion
to the other occurrence at the second block (lines 184-189) so the test fails if
the original DBAPI error path is not reached.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b10eb18c-a013-41fb-b17c-61222036fee1

📥 Commits

Reviewing files that changed from the base of the PR and between 7e6dfb0 and f039bb5.

📒 Files selected for processing (10)

docs/concepts/ingestion.md
docs/examples/schema-drift.md
slayer/cli.py
slayer/engine/ingestion.py
slayer/engine/query_engine.py
slayer/engine/schema_drift.py
slayer/mcp/server.py
tests/test_force_clean.py
tests/test_schema_drift_error.py
tests/test_validate_models.py

…ardening Correctness fixes (CodeRabbit major) — schema drift attribution and apply - engine/query_engine.py:_collect_models_touched walks the join graph transitively (visited-set guarded) so multi-hop targets like orders→customers→regions are included. SchemaDriftError now triggers when drift lives on the second hop. Made the helper async + storage-aware. Also fixed a latent bug: SlayerQuery has no .joins attribute, so the prior `for j in stage.joins or []` would have raised AttributeError on any query-backed model with stage joins — switched to getattr defaults. - engine/query_engine.py:apply_drift_deletes records each entry's data_source up front (before the mutation) so post-apply validate_models still re-runs when every entry on a DS fails. - engine/query_engine.py:edit_model_remove rejects query-backed models with a clear error pointing at engine.save_model. Bypassing it would persist stale cache that no longer matches source_queries. - engine/schema_drift.py:_cascade_joins uses raw dropped_cols (not the PK-filtered cascadable) when checking the foreign side of a join. A target PK drop still invalidates the join itself even though rule 7 stops downstream column cascades. - engine/schema_drift.py:_query_backed_should_whole_drop iterates every candidate base model in the DS (passed via candidate_base_names from _cascade_query_backed) so cross-model dotted refs in stage measures / dimensions / filters are caught when the *referenced* model drifts — not just when the stage's own source_model drifts. Adds whole-drop detection on stage.joins targets too (defensive — handled via getattr since SlayerQuery itself has no joins field). CLI / API hardening (CodeRabbit major + Sonar S5713 + S5145) - cli.py:_run_ingest exits 1 when result.errors is non-empty, mirroring validate-models --force-clean. Automation can no longer treat partial ingest as success. - api/server.py /ingest catches only SQLAlchemyError (OperationalError and DatabaseError both derive from it — Sonar S5713). Strips CR/LF from request.datasource before logging or composing into the response detail (Sonar S5145 log-injection). Sonar refactors (S3776) and test tightening - engine/schema_drift.py:diff_sql_table_model split into _diff_sql_table_columns + _diff_sql_table_joins (CC 16 → ≤15). - engine/schema_drift.py:_cascade_measures factored a _first_dropped_cause helper out of the inner ref loop (CC 18 → ≤15). - engine/schema_drift.py:validate_datasource split into _resolve_live_table + _collect_sql_table_diffs + _collect_sql_diffs (CC 25 → ≤15). - tests/test_schema_drift_error.py: NOSONAR(S7503) moved onto the `async def _no_drift` line (Sonar requires it on the offending line). Tightened the two re-raise tests: assert pytest.raises(SQLAlchemyError) and check a stable substring of the original DBAPI error so a regression that masks the original failure path actually fails. Async architecture (CodeRabbit major) - engine/ingestion.py:ingest_datasource_idempotent wraps the synchronous ingest_datasource(...) call in await asyncio.to_thread(...) so a slow or large datasource doesn't block the event loop while server-facing requests are in flight. Matches the project's async-first contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 4

♻️ Duplicate comments (1)

slayer/engine/schema_drift.py (1)
721-789: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Multi-hop stage refs still evade whole-drop detection.

_attribute_ref_to_base() only matches exact dotted prefixes, so a stage ref like customers.regions.name is never attributed to the terminal regions model. If regions.name drifts, _query_backed_should_whole_drop() misses the hit and leaves a query-backed model whose SQL can no longer resolve.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/engine/schema_drift.py` around lines 721 - 789, _attribute_ref_to_base
currently only matches when the dotted prefix equals base_name exactly, so
multi-hop refs like "customers.regions.name" are not attributed to the terminal
model; change _attribute_ref_to_base(ref, base_name, sm_is_base) to, when ref
contains dots, extract the prefix and test the last segment of the prefix
(prefix.rsplit(".",1)[-1]) against base_name (return leaf if they match) instead
of comparing the whole prefix, leaving the existing no-dot behavior (return ref
if sm_is_base else None) intact; callers to update implicitly are
_measure_refs_on_base, _dimension_refs_on_base, _time_dimension_refs_on_base,
and _filter_refs_on_base which rely on _attribute_ref_to_base.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@slayer/api/server.py`:
- Around line 402-420: The handler currently returns 200 whenever
ingest_datasource_idempotent(...) completes, even if result.errors is non-empty;
modify the post-ingest logic in server.py to detect non-empty result.errors and
raise an HTTPException (e.g., status_code=422) with a descriptive detail payload
listing the partial errors instead of returning result.model_dump(...), so
clients receive a failure status when per-model save/merge errors occurred;
reference the ingest_datasource_idempotent call and the result.errors property
when implementing this check.

In `@slayer/engine/ingestion.py`:
- Around line 775-783: The loop currently appends a fresh ModelJoin even when
its target_model already exists with a different join_pairs signature, which
creates duplicate targets; change the logic in the block that iterates over
fresh.joins to first build a set of existing target names (from persisted.joins)
and also check new_join_targets to avoid duplicates, then for each j: if
j.target_model is in existing targets or new_join_targets, compare the signature
(tuple(sorted((p[0],p[1]) for p in j.join_pairs))) to the existing signature
from _existing_join_signatures(persisted); if signatures match skip/continue,
but if they differ raise/return a surfaced conflict (e.g., raise ValueError with
a clear message mentioning j.target_model) instead of appending a second join;
update new_join_targets when you append to new_joins.

In `@slayer/engine/query_engine.py`:
- Around line 514-530: The loop that builds touched from model.source_queries
misses joins defined on source_model extensions because it only checks
getattr(stage, "joins", None); update the logic in the block iterating stages
(handling model.source_queries) to, for each stage, if getattr(stage, "joins",
None) exists add those join targets, and additionally if the stage has a
source_model that is a ModelExtension-like object (i.e. not just a string) or if
source_model is a dict/extension container, inspect that source_model's joins as
well and add each join.target_model to touched; use the existing symbols stage,
stage.source_model, ModelExtension/SlayerQuery semantics, and ensure
touched.add(...) is called for those join targets so _maybe_raise_schema_drift()
sees them.

In `@slayer/engine/schema_drift.py`:
- Around line 697-719: _resolve_stage_source_to_base currently only handles str
and SlayerModel and returns None for ModelExtension; update it to detect
ModelExtension and return the underlying base model name (e.g.,
extension.base_model or extension.source_model whichever holds the base) so
query-backed stages resolve correctly, and likewise update _stage_join_targets
to inspect a stage's ModelExtension when stage.source_model is an extension (use
the extension's source/base model when collecting join targets). Ensure both
places reference ModelExtension, stage.source_model,
_resolve_stage_source_to_base, and _stage_join_targets so
_query_backed_should_whole_drop will include extension-backed stages; apply the
same fix pattern to the analogous code block around the other occurrence (lines
~806-819).

---

Duplicate comments:
In `@slayer/engine/schema_drift.py`:
- Around line 721-789: _attribute_ref_to_base currently only matches when the
dotted prefix equals base_name exactly, so multi-hop refs like
"customers.regions.name" are not attributed to the terminal model; change
_attribute_ref_to_base(ref, base_name, sm_is_base) to, when ref contains dots,
extract the prefix and test the last segment of the prefix
(prefix.rsplit(".",1)[-1]) against base_name (return leaf if they match) instead
of comparing the whole prefix, leaving the existing no-dot behavior (return ref
if sm_is_base else None) intact; callers to update implicitly are
_measure_refs_on_base, _dimension_refs_on_base, _time_dimension_refs_on_base,
and _filter_refs_on_base which rely on _attribute_ref_to_base.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 10c6322c-978d-48de-945d-379e2484a7a6

📥 Commits

Reviewing files that changed from the base of the PR and between f039bb5 and a76fb9a.

📒 Files selected for processing (6)

slayer/api/server.py
slayer/cli.py
slayer/engine/ingestion.py
slayer/engine/query_engine.py
slayer/engine/schema_drift.py
tests/test_schema_drift_error.py

…Sonar splits Correctness fixes (CodeRabbit major) - api/server.py /ingest now mirrors the CLI's exit-1 behaviour: when result.errors is non-empty, raise HTTPException(422) with the full IdempotentIngestResult body so API clients see partial failures instead of a 200. - engine/ingestion.py:_additive_merge_existing surfaces a ValueError when a fresh ingest discovers a different join_pairs signature for an existing target_model. Downstream consumers key joins by target_model alone — appending a second one would let the stale join shadow the live one and ``remove.joins=[name]`` would wipe both. - ModelExtension stages were silently skipped by query-backed drift attribution. SlayerQuery has no .joins attribute; joins on a stage live on its source_model when that's a ModelExtension, and ModelExtension carries source_name (not source_model). Two fixes: - engine/schema_drift.py:_resolve_stage_source_to_base now follows extension.source_name (str) so query-backed stages resolve to a real base model; _stage_join_targets reads off stage.source_model.joins via getattr. - engine/query_engine.py:_collect_query_backed_base_names mirrors the same getattr lookup so multi-hop drift attribution covers extension-added joins. Sonar S3776 (cognitive complexity) refactors - query_engine.py:_collect_models_touched (CC 18 → ≤15): extracted _collect_query_backed_base_names static helper so the orchestrator just unions sets and triggers the join-graph walk. - schema_drift.py:_query_backed_should_whole_drop (CC 22 → ≤15): extracted _check_stage_for_whole_drop covering the per-stage base-name / join-target / candidate-base check; outer loop just iterates stages. - schema_drift.py:_cascade_derived_columns (CC 18 → ≤15): extracted _first_dropped_sql_column_ref returning Optional[(target, ref_col)]; the loop body becomes a flat dispatch. - schema_drift.py:compute_datasource_drops (CC 25 → ≤15): extracted _seed_state_from_diffs and _collapse_entries; orchestrator becomes seed → iterate → collapse. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (2)

slayer/engine/ingestion.py (1)

775-799: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Still reject duplicate join targets introduced in the same fresh ingest pass.

This guard only checks existing_join_targets. If fresh.joins contains two different signatures for the same target_model, both still get appended because new_join_targets is never consulted. Downstream code still keys joins by target_model, so the ambiguity remains.

Possible minimal fix

     for j in fresh.joins:
         sig = (j.target_model, tuple(sorted((p[0], p[1]) for p in j.join_pairs)))
         if sig in existing_join_sigs:
             continue
-        if j.target_model in existing_join_targets:
+        if j.target_model in existing_join_targets or j.target_model in new_join_targets:
             # Same target_model already present with a different
             # join_pairs signature. Downstream consumers key joins by
             # target_model only — appending a second one would let the
             # stale join shadow the live one and ``remove.joins=[name]``
             # would wipe both. Surface the conflict so the user can

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/engine/ingestion.py` around lines 775 - 799, The loop that merges
fresh.joins fails to detect conflicting joins introduced within the same ingest
because it only checks existing_join_targets and not duplicates already appended
in this run; update the logic in the for j in fresh.joins loop (which builds
new_joins and new_join_targets) to also check whether j.target_model is already
present in new_join_targets (or maintain a set of appended targets) and raise
the same ValueError if a different join signature for the same target_model is
found; ensure you compare signatures the same way as existing_join_sigs (using
sig = (j.target_model, tuple(sorted(...)))) so identical signatures are allowed
but differing ones are rejected.

slayer/engine/schema_drift.py (1)

731-813: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Multihop stage refs still never attribute the terminal model.

_attribute_ref_to_base() only matches prefix == base_name, so a ref like customers.regions.name is never attributed to regions. If regions.name drifts, _query_backed_should_whole_drop() keeps the query-backed model even though its source_queries can no longer resolve.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/engine/schema_drift.py` around lines 731 - 813, _attribute_ref_to_base
currently only matches a single-dot prefix equality so multihop refs like
"customers.regions.name" never attribute to the terminal model; change
_attribute_ref_to_base to recognize when the dotted prefix ends with the
base_name (e.g. prefix == base_name or prefix.endswith("." + base_name)) and
return the leaf in that case, preserving the existing behavior for no-dot refs
(use ref if sm_is_base else None); update callers
(_stage_referenced_columns_for_base, _measure_refs_on_base,
_dimension_refs_on_base, _time_dimension_refs_on_base, _filter_refs_on_base)
require no other changes.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@slayer/engine/query_engine.py`:
- Around line 524-541: The helper currently seeds touched with the
possibly-virtual model.name, which loses attribution for query-backed stages;
change _collect_models_touched to detect query-backed/virtual models (e.g.,
check model.is_query_backed or presence of source/backing info on SlayerModel)
and initialize touched with the persisted backing base names instead of only
model.name by calling _collect_query_backed_base_names first (and adding
model.name when appropriate), then proceed to add cross_model_measures and
expand the join graph so drift validation sees the real persisted models.

In `@slayer/engine/schema_drift.py`:
- Around line 855-901: The function _check_stage_for_whole_drop currently never
considers dropped_joins so stages that reference a base via a removed FK
survive; add a dropped_joins parameter (e.g. Dict[str, Set[str]] or Set tuples)
to _check_stage_for_whole_drop and update callers, then treat any join target
returned by _stage_join_targets(stage) as fatal if it appears in
whole_dropped_models OR is listed in dropped_joins (and similarly consider
candidate base names that have dropped joins when calling
_check_stage_against_base); use the symbols _check_stage_for_whole_drop,
_stage_join_targets and _check_stage_against_base to locate the changes and
ensure the returned DeleteReason mirrors existing messages for dropped joins.
- Around line 1489-1492: The wrapper SELECT fails when model.sql ends with a
statement terminator (e.g., "SELECT 1;"), so before building trial_sql in the
validate flow strip trailing semicolons and trailing whitespace from model.sql;
update the code that constructs trial_sql (where trial_sql = f"SELECT * FROM
({model.sql}) AS _sd_validate WHERE 1=0") to use a cleaned_sql variable (trimmed
of trailing ';' and whitespace) and pass that into the f-string, ensuring
client.get_column_types receives a wrapper without a trailing terminator; adjust
any related usage in validate_models()/trial_sql construction accordingly.

---

Duplicate comments:
In `@slayer/engine/ingestion.py`:
- Around line 775-799: The loop that merges fresh.joins fails to detect
conflicting joins introduced within the same ingest because it only checks
existing_join_targets and not duplicates already appended in this run; update
the logic in the for j in fresh.joins loop (which builds new_joins and
new_join_targets) to also check whether j.target_model is already present in
new_join_targets (or maintain a set of appended targets) and raise the same
ValueError if a different join signature for the same target_model is found;
ensure you compare signatures the same way as existing_join_sigs (using sig =
(j.target_model, tuple(sorted(...)))) so identical signatures are allowed but
differing ones are rejected.

In `@slayer/engine/schema_drift.py`:
- Around line 731-813: _attribute_ref_to_base currently only matches a
single-dot prefix equality so multihop refs like "customers.regions.name" never
attribute to the terminal model; change _attribute_ref_to_base to recognize when
the dotted prefix ends with the base_name (e.g. prefix == base_name or
prefix.endswith("." + base_name)) and return the leaf in that case, preserving
the existing behavior for no-dot refs (use ref if sm_is_base else None); update
callers (_stage_referenced_columns_for_base, _measure_refs_on_base,
_dimension_refs_on_base, _time_dimension_refs_on_base, _filter_refs_on_base)
require no other changes.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 065fec1f-6414-4b22-b8d9-91f208a939e3

📥 Commits

Reviewing files that changed from the base of the PR and between a76fb9a and 3de3f41.

📒 Files selected for processing (4)

slayer/api/server.py
slayer/engine/ingestion.py
slayer/engine/query_engine.py
slayer/engine/schema_drift.py

… sql terminator Correctness fixes (CodeRabbit major) - engine/query_engine.py:_collect_models_touched seeds touched from the PERSISTED model with the same name when the resolved ``model`` is a virtual stage from _query_as_model(). For an exec like ``stage1(source_model="orders") -> final(source_model="stage1")``, the resolved model.name was ``stage1`` (a virtual name), so validate_models produced deletes for ``orders`` and the touched-model filter dropped them — execute() then re-raised the raw DB error instead of SchemaDriftError. Falls back to storage.get_model(model.name, ...) and unions in its query-backed base names. - engine/schema_drift.py:_check_stage_for_whole_drop now considers dropped_joins (threaded through _query_backed_should_whole_drop and _cascade_query_backed). A stage with source_model="orders" and dimensions=["customers.name"] whose orders→customers join was removed (FK drifted) used to silently survive; it now whole-drops via a new _stage_uses_dropped_join helper that checks each candidate base for references through any dropped-join target. - engine/schema_drift.py:_live_columns_for_sql_model strips trailing whitespace and a single ';' from model.sql before wrapping it in ``SELECT * FROM (...) AS _sd_validate WHERE 1=0``. A persisted ``SELECT 1;`` is valid at top level but invalid inside the wrapper — without the strip, the bogus structural error was attributed to drift and produced a false WholeModelDelete. Sonar S3776 - engine/schema_drift.py:_seed_state_from_diffs (CC 18 → ≤15) — extracted _seed_one_diff_entry; the outer function only iterates the diffs dicts and delegates each entry to the helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

slayer/engine/ingestion.py (1)
775-799: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fresh duplicate join targets still slip past this safety check.

This only compares against persisted.joins. If fresh.joins contains two different signatures for the same target_model—which _generate_joins() can emit when a table has multiple FKs to the same target—both still get persisted because the appended target/signature never enters the seen sets, and the new-model path below saves fresh as-is. That recreates the same shadowing/removal ambiguity this guard is meant to prevent.

Also applies to: 823-832
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/engine/ingestion.py` around lines 775 - 799, The loop only checks
persisted.joins for conflicting target_model signatures but fails to detect
multiple differing joins for the same target inside fresh.joins; update the
logic in the block that iterates fresh.joins (and the analogous code path around
the new-model handling at lines 823-832) to maintain and consult a local seen
set for target_models and signatures (e.g., extend existing_join_targets and
existing_join_sigs or create fresh_seen_targets/fresh_seen_sigs) as you append
each j to new_joins; if you encounter a j.target_model already seen with a
different signature (tuple(sorted(...)) vs the stored signature) raise the same
ValueError so duplicate differing joins inside fresh.joins are rejected before
persisting.

🧹 Nitpick comments (2)

slayer/engine/ingestion.py (1)
823-823: ⚡ Quick win

Use keyword arguments for storage.get_model() here.

This call has more than one parameter, so it should follow the repo’s keyword-argument convention as well. As per coding guidelines: **/*.py: Use keyword arguments for functions with more than 1 parameter.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/engine/ingestion.py` at line 823, The call to storage.get_model uses a
positional argument for table_name and a second positional parameter for
data_source; update the call to pass named keyword arguments (e.g., use
table_name=... and data_source=...) so it follows the repo convention for
functions with more than one parameter and to make the intent explicit; edit the
invocation where persisted is assigned (storage.get_model) to use those keywords
with the existing variables (table_name and datasource.name).
slayer/engine/schema_drift.py (1)
1498-1499: ⚡ Quick win

Use keyword arguments for these helper calls.

_safe_get_columns(...) and _safe_get_pk_constraint(...) both take multiple parameters, and the repo guideline requires keyword arguments for that shape. As per coding guidelines: **/*.py: Use keyword arguments for functions with more than 1 parameter.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/engine/schema_drift.py` around lines 1498 - 1499, Change the two
helper calls to pass arguments by name: replace the positional calls to
_safe_get_columns(...) and _safe_get_pk_constraint(...) with keyword-argument
calls using the parameter names (e.g. inspector=inspector, sa_engine=sa_engine,
table_name=table_name, schema=schema) so the call sites (_safe_get_columns and
_safe_get_pk_constraint) follow the repository guideline requiring keyword
arguments for functions with more than one parameter.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@slayer/engine/schema_drift.py`:
- Around line 731-740: The helper _attribute_ref_to_base should stop doing raw
string prefix checks and instead walk the source/join graph for the stage to
resolve refs: detect if ref is a bare name on a ModelExtension (use
stage.source_model type/identity rather than sm_is_base boolean) and resolve
multi-hop dotted refs by traversing joins/relations (e.g., resolve
"customers.regions.name" by following the join chain from base_name through
intermediate models to the leaf column); update _attribute_ref_to_base (and the
similar logic around lines 795-813) to accept the stage/graph context, perform
graph traversal to map a dotted ref to the final base column or return None when
it resolves to a different model, and ensure ModelExtension bare refs treat the
extension’s source as the base when appropriate.
- Around line 855-876: _stage_uses_dropped_join currently flags a dropped join
if the stage references the target column, but it must also ensure the dropped
target belongs to a model reachable from the stage's resolved base/extension
join graph; update _stage_uses_dropped_join to first compute the set of models
reachable from the stage (starting from the stage's resolved base/extension
joins) and only consider dropped_joins whose target is in that reachable set
before calling _stage_referenced_columns_for_base(stage=..., base_name=target);
apply the same reachable-model restriction to the analogous logic at the other
occurrence (lines 911-922) so we only whole-drop stages for dropped joins that
are actually reachable from the stage.

---

Duplicate comments:
In `@slayer/engine/ingestion.py`:
- Around line 775-799: The loop only checks persisted.joins for conflicting
target_model signatures but fails to detect multiple differing joins for the
same target inside fresh.joins; update the logic in the block that iterates
fresh.joins (and the analogous code path around the new-model handling at lines
823-832) to maintain and consult a local seen set for target_models and
signatures (e.g., extend existing_join_targets and existing_join_sigs or create
fresh_seen_targets/fresh_seen_sigs) as you append each j to new_joins; if you
encounter a j.target_model already seen with a different signature
(tuple(sorted(...)) vs the stored signature) raise the same ValueError so
duplicate differing joins inside fresh.joins are rejected before persisting.

---

Nitpick comments:
In `@slayer/engine/ingestion.py`:
- Line 823: The call to storage.get_model uses a positional argument for
table_name and a second positional parameter for data_source; update the call to
pass named keyword arguments (e.g., use table_name=... and data_source=...) so
it follows the repo convention for functions with more than one parameter and to
make the intent explicit; edit the invocation where persisted is assigned
(storage.get_model) to use those keywords with the existing variables
(table_name and datasource.name).

In `@slayer/engine/schema_drift.py`:
- Around line 1498-1499: Change the two helper calls to pass arguments by name:
replace the positional calls to _safe_get_columns(...) and
_safe_get_pk_constraint(...) with keyword-argument calls using the parameter
names (e.g. inspector=inspector, sa_engine=sa_engine, table_name=table_name,
schema=schema) so the call sites (_safe_get_columns and _safe_get_pk_constraint)
follow the repository guideline requiring keyword arguments for functions with
more than one parameter.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 15adac8a-8e37-4fc6-8405-fa07d5762ad4

📥 Commits

Reviewing files that changed from the base of the PR and between a76fb9a and 9f4b302.

📒 Files selected for processing (4)

slayer/api/server.py
slayer/engine/ingestion.py
slayer/engine/query_engine.py
slayer/engine/schema_drift.py

🚧 Files skipped from review as they are similar to previous changes (2)

slayer/api/server.py
slayer/engine/query_engine.py

…ool-idempotent-re-ingestion Resolve conflicts in errors.py (keep SchemaDriftError alongside EntityResolutionError + MemoryNotFoundError), api/server.py (merge error imports), and slayer-overview.md (keep both schema-drift and memory tool bullets).

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (1)

slayer/cli.py (1)
693-693: ⚡ Quick win

Move new function-local imports to module top.

The newly added imports are inside functions. This breaks the repository import-placement rule.

As per coding guidelines: "**/*.py: Imports should be placed at the top of files".

Also applies to: 755-755
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/cli.py` at line 693, Move the function-local imports (e.g.,
ingest_datasource_idempotent from slayer.engine.ingestion and the other import
added around line 755) to the module top of slayer.cli with the other imports;
remove the in-function import statements so callers just reference the
already-imported symbols, run/import ordering checks, and if you encounter a
circular-import error refactor minimally (export the symbol from a different
module or use a small helper) rather than leaving imports inside functions.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.claude/skills/slayer-overview.md:
- Line 32: The docs claim ingest_datasource_models returns an
IdempotentIngestResult, but the actual implementation in slayer/mcp/server.py
returns a rendered text summary string; update either the docs or the function
so they match: (A) change the .claude/skills/slayer-overview.md entry for
ingest_datasource_models to describe the actual rendered text summary format and
examples produced by slayer/mcp/server.py, or (B) modify the
ingest_datasource_models implementation to return a literal
IdempotentIngestResult (with additions, to_delete, errors) instead of a rendered
string; search for all occurrences of ingest_datasource_models and the old field
names across skills/docs and update them for consistency.

In `@slayer/api/server.py`:
- Around line 397-403: The validate_models_endpoint currently forwards straight
to engine.validate_models so a typoed or missing data_source looks like "no
drift" and introspection failures surface as 500s; update
validate_models_endpoint to first verify the requested datasource exists (e.g.
via engine.has_data_source / engine.list_data_sources /
engine.introspect_data_source — whichever exists in your engine API) and return
HTTPException(404) when the data_source is unknown, then call
engine.validate_models inside a try/except and convert known
introspection/connection errors into HTTPException(422) with a clear message;
keep references to ValidateModelsRequest, validate_models_endpoint and
engine.validate_models when making the changes.

In `@slayer/cli.py`:
- Around line 757-760: If a user passes --datasource with a name that doesn't
exist, fail fast instead of returning no drift: update the call site around
_resolve_storage / SlayerQueryEngine to detect a missing datasource and
raise/exit with a non-zero status. Concretely, when args.datasource is set, have
_resolve_storage (or immediately after calling it) verify the requested
datasource exists and if not throw a descriptive error (e.g., ValueError) or
call process exit with a non-zero code before instantiating SlayerQueryEngine or
invoking run_sync(engine.validate_models(...)); include the datasource name in
the error message so callers can see the typo. Ensure validate_models is not
relied on to signal this condition later.

In `@slayer/mcp/server.py`:
- Around line 2397-2413: The validate_models mcp tool currently treats an empty
list from engine.validate_models as “no drift,” which masks typos or missing
datasources and allows introspection errors to surface as raw tool exceptions;
update validate_models to first verify the datasource exists via storage (or a
provided SlayerQueryEngine method) when data_source is given, then wrap the call
to engine.validate_models in a try/except that converts introspection errors
into a clear tool error response; if the datasource is missing return a distinct
error message (or non-empty result) indicating "unknown datasource" and only
return an empty JSON array when validation truly succeeds with no drift.

---

Nitpick comments:
In `@slayer/cli.py`:
- Line 693: Move the function-local imports (e.g., ingest_datasource_idempotent
from slayer.engine.ingestion and the other import added around line 755) to the
module top of slayer.cli with the other imports; remove the in-function import
statements so callers just reference the already-imported symbols, run/import
ordering checks, and if you encounter a circular-import error refactor minimally
(export the symbol from a different module or use a small helper) rather than
leaving imports inside functions.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: edfa6695-177e-4b05-a694-a91eff702101

📥 Commits

Reviewing files that changed from the base of the PR and between 9f4b302 and d33e654.

📒 Files selected for processing (9)

.claude/skills/slayer-models.md
.claude/skills/slayer-overview.md
CLAUDE.md
docs/concepts/models.md
slayer/api/server.py
slayer/cli.py
slayer/core/errors.py
slayer/mcp/server.py
tests/integration/test_mcp_inspect.py

✅ Files skipped from review due to trivial changes (1)

CLAUDE.md

…achable-stage filter Correctness fixes (CodeRabbit major) - Unknown-datasource handling on validate_models, mirrored across all three surfaces. engine.validate_models(data_source=<typo>) returns [] because no models match, which made a typo indistinguishable from "no drift". Each entry point now fails loudly: - api/server.py: POST /validate-models → 404 when storage.get_datasource is None; SQLAlchemyError → 422 with the friendly DB error message. - cli.py: slayer validate-models --datasource <name> → sys.exit(1) with "Datasource 'X' not found in <storage>". - mcp/server.py: validate_models tool → "Datasource 'X' not found." string, plus DBAPI errors routed through _friendly_db_error. - Query-backed cascade graph correctness: replace the legacy raw-prefix attribution with a per-stage join-graph walk. - New _StageGraph (stage_source_name, extension_targets, reachable, models_by_name) carries the resolved per-stage join context. - _build_stage_graph performs a transitive walk from the stage's resolved source through both the base model's stored joins and any extension-added joins. - _attribute_ref_to_base now walks the dotted ref hop-by-hop through the graph, so multi-hop refs like ``customers.regions.name`` from a stage rooted at ``orders`` resolve to ``regions.name`` (legacy prefix matching never could). - _stage_uses_dropped_join is now bounded by graph.reachable, so a dropped ``invoices → customers`` join no longer whole-drops a stage rooted at ``orders`` that uses ``customers.name`` via its own link. - All per-section ref helpers (_measure_refs_on_base / _dimension_refs / _time_dimension_refs / _filter_refs_on_base) now consume the graph instead of the bool sm_is_base flag. Docs - .claude/skills/slayer-overview.md: ingest_datasource_models entry rewritten to describe the rendered text summary the MCP tool actually returns; the structured IdempotentIngestResult is documented as the engine-layer contract underneath. validate_models entry adds the unknown-datasource error message to set agent expectations. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@slayer/cli.py`:
- Around line 767-769: Wrap the call to
run_sync(engine.validate_models(data_source=args.datasource)) in a try/except
that catches the expected validation/introspection exceptions raised by
SlayerQueryEngine.validate_models (e.g., the engine's custom
validation/introspection error types or a common base like
ValidationError/IntrospectionError), log/print a concise error message (not a
traceback) and exit with a non-zero status; on success continue to call
_format_validate_models_output(entries) and print the result. Ensure you
reference SlayerQueryEngine, validate_models, run_sync and args.datasource when
locating the code to modify.

In `@slayer/engine/schema_drift.py`:
- Around line 1721-1723: The call that populates live_tables uses
_live_schema_for_datasource but always introspects the default schema; update
the call site to pass the datasource's configured schema (e.g., pass
datasource.schema_name or datasource.config.schema_name) into
_live_schema_for_datasource so validate_models() diffs against the correct
schema, and adjust _live_schema_for_datasource's signature/implementation to
accept and use that schema_name parameter when introspecting. Ensure the symbol
names involved are _live_schema_for_datasource, live_tables, and validate_models
so the change is applied where the schema is propagated through.
- Around line 800-816: The dotted ref handling treats "<source>.<col>" as a join
hop; change it so when parts[:-1] equals [graph.stage_source_name] (i.e., the
ref is exactly "stage_source_name.column") you treat it as a same-model column
reference instead of walking joins: return the leaf column if
graph.stage_source_name == base_name, otherwise None. Update the logic around
ref/parts/leaf/path (in the function using graph.stage_source_name and
base_name) to short-circuit that case before iterating joins so root-qualified
refs map to the base model's column.

In `@slayer/mcp/server.py`:
- Around line 308-309: The current branch returns _empty_ingest_message when
additions is empty and result.to_delete and result.errors are false, which
misreports “No tables found” for idempotent ingests that simply had nothing new;
instead, before returning _empty_ingest_message from this condition in
server.py, perform a real live-table check (e.g., call the datasource API such
as ds.list_tables(schema_name) or an equivalent method to count/lookup tables)
and only return the empty-schema message if that lookup shows zero tables; if
tables exist, return a different message indicating “no additive changes” (or
add an explicit table_count field to the result) so agents/users aren’t misled
by the existing branch involving additions, result.to_delete, result.errors, and
_empty_ingest_message.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: da2ab6ed-f957-46dc-ad52-6cc1f050fa22

📥 Commits

Reviewing files that changed from the base of the PR and between d33e654 and 8d5dd81.

📒 Files selected for processing (5)

.claude/skills/slayer-overview.md
slayer/api/server.py
slayer/cli.py
slayer/engine/schema_drift.py
slayer/mcp/server.py

✅ Files skipped from review due to trivial changes (1)

.claude/skills/slayer-overview.md

…e error normalization Schema-drift correctness (CodeRabbit major) - engine/schema_drift.py:_attribute_ref_to_base now short-circuits when the dotted-ref path is exactly [stage_source_name] — refs like ``orders.amount`` from a stage rooted at ``orders`` previously failed to resolve because ``orders`` is not in its own join set, so the walk loop returned None. Root-qualified column refs now attribute to the base correctly, so query-backed drift attribution doesn't miss them after a base-column drop. - engine/schema_drift.py:_collect_sql_table_diffs now passes ``datasource.schema_name`` through to _live_schema_for_datasource. Without this, validate_models always introspected the dialect's default schema and produced false WholeModelDeletes for non-default- schema datasources. - engine/schema_drift.py:_StageGraph: explicit defaults on every field (Sonar S8396 — Pydantic v2 prefers Optional fields with explicit None default; Field(default_factory=...) for the dict/set fields). Surface error normalization (CodeRabbit major) - cli.py:_run_validate_models wraps ``engine.validate_models(...)`` in try/except. DB / auth / introspection failures previously escaped as raw Python tracebacks; they now print a one-line ``validate-models failed: <message>`` and exit 1, matching the rest of the CLI's error-handling shape. - mcp/server.py:_render_ingest_result distinguishes "schema is empty" from "no additive work to do". Previously, an idempotent ingest that had nothing to add (because every persisted model was sql / query-backed and got skipped silently) returned "No tables found in schema ... Try schema_name=..." — misleading the agent. Now probes ``_fetch_tables`` first and only shows the empty-schema hint when there are actually no live tables; otherwise reports "Datasource already in sync — no additive changes." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

sonarqubecloud · 2026-05-06T14:52:17Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
1.8% Duplication on New Code

See analysis details on SonarQube Cloud

coderabbitai Bot reviewed May 6, 2026

View reviewed changes

Comment thread slayer/api/server.py

Comment thread slayer/engine/ingestion.py

Comment thread slayer/engine/query_engine.py Outdated

Comment thread slayer/engine/schema_drift.py

coderabbitai Bot reviewed May 6, 2026

View reviewed changes

Comment thread slayer/engine/query_engine.py

Comment thread slayer/engine/schema_drift.py

Comment thread slayer/engine/schema_drift.py Outdated

coderabbitai Bot reviewed May 6, 2026

View reviewed changes

Comment thread slayer/engine/schema_drift.py Outdated

Comment thread slayer/engine/schema_drift.py

coderabbitai Bot reviewed May 6, 2026

View reviewed changes

Comment thread .claude/skills/slayer-overview.md Outdated

Comment thread slayer/api/server.py Outdated

Comment thread slayer/cli.py

Comment thread slayer/mcp/server.py

ZmeiGorynych mentioned this pull request May 6, 2026

Type-aware CAST emission + sqlglot-aligned DataType + v5 migration (DEV-1361) #103

Merged

7 tasks

coderabbitai Bot reviewed May 6, 2026

View reviewed changes

Comment thread slayer/cli.py

Comment thread slayer/engine/schema_drift.py

Comment thread slayer/engine/schema_drift.py

Comment thread slayer/mcp/server.py Outdated

ZmeiGorynych merged commit 258e4df into main May 6, 2026
4 checks passed

This was referenced May 10, 2026

DEV-1375: semantic search #112

Merged

Add embedding-based semantic search (third retrieval channel) #114

Merged

DEV-1392: add --ingest-on-startup to slayer serve / slayer mcp #116

Merged

Conversation

ZmeiGorynych commented May 6, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Surfaces

Cascade rules implemented

Test plan

Out of scope (per issue)

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ZmeiGorynych commented May 6, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 6, 2026 •

edited

Loading