Add schema-drift validation, idempotent ingestion, and SchemaDriftError (DEV-1356)#101
Conversation
…or (DEV-1356) - validate_models — read-only diff between persisted SlayerModels and live schemas, returning the minimal list of replayable deletes. Surfaced via engine.validate_models, MCP validate_models tool, REST POST /validate-models, and slayer validate-models CLI. - ingest_datasource_idempotent — additive-only re-ingestion that preserves customised column metadata and returns IdempotentIngestResult bundling validate_models output. Wired into MCP / REST / slayer ingest CLI. - SchemaDriftError — query-time wrap that attributes DBAPI failures via validate_models and surfaces a structured payload (HTTP 422 on REST). - apply_drift_deletes — engine helper used by slayer validate-models --force-clean [--yes] to apply the diff. CLI-only; not exposed via MCP/REST. - Cascade walker handles derived columns, measures, joins, filters, cross-model refs, and query-backed source_queries chains. PK drops do not cascade. Cascade walking stays within parent datasource. - 56 new tests (46 unit + 10 idempotent + 5 SchemaDriftError + 8 force-clean + 3 DuckDB integration). Full unit suite passes (1889). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Caution Review failedFailed to post review comments 📝 WalkthroughWalkthroughImplements schema-drift detection and remediation plus an idempotent ingestion flow: new drift payload types and validator, additive idempotent ingest that merges discovered models into persisted ones, query-engine drift attribution and apply APIs, new REST/CLI/MCP endpoints, documentation updates, and extensive tests. ChangesSchema Drift & Idempotent Ingestion
Sequence Diagram(s)sequenceDiagram
actor User
participant CLI
participant Engine as QueryEngine
participant Storage
participant DB as LiveDB
participant Drift as DriftValidator
User->>CLI: slayer validate-models
CLI->>Storage: load persisted SlayerModels
CLI->>Drift: validate_datasource(models, sql clients)
Drift->>DB: introspect / trial-execute
DB-->>Drift: live metadata
Drift->>Drift: diff & compute_datasource_drops
Drift-->>CLI: ToDeleteEntry[] (edits / whole-drops)
CLI->>User: display diff
alt User confirms --force-clean
CLI->>Engine: apply_drift_deletes(deletes)
Engine->>Storage: edit/delete persisted models
Storage-->>Engine: applied entries
Engine->>Drift: validate_datasource(post-apply)
Drift-->>Engine: residual drift
Engine-->>CLI: ApplyDriftResult
CLI->>User: report applied + residuals
end
sequenceDiagram
actor User
participant CLI
participant Ingest as IngestEngine
participant Storage
participant DB as LiveDB
participant Drift as DriftValidator
User->>CLI: slayer ingest (idempotent)
CLI->>Ingest: ingest_datasource_idempotent(datasource, storage, clients)
Ingest->>Ingest: generate fresh models
Ingest->>Storage: load persisted models
loop per table
alt persisted exists
Ingest->>Ingest: _additive_merge_existing(persisted, fresh)
Ingest->>Storage: save if additions
else
Ingest->>Storage: create new model
end
end
Ingest->>Drift: validate_datasource(merged models)
Drift->>DB: introspect/trial-execute
Drift-->>Ingest: to_delete (type drift)
Ingest-->>CLI: IdempotentIngestResult(additions, to_delete, errors)
CLI->>User: show created/updated + pending drift + errors
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 9
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/concepts/ingestion.md`:
- Around line 148-150: The doc wording overstates the guarantee for
exclude_tables: update the sentence near
validate_models/IdempotentIngestResult.to_delete to either (A) narrow the docs
to say exclude_tables only prevents touching models in additive pass and
validator for non-sql_table modes (i.e., call out that sql/query-backed models
are still passed into validate_models), or (B) change the validator in
slayer/engine/ingestion.py so validate_models respects exclude_tables for
sql/query-backed models too (skip feeding sql/query-backed models into
validate_models when their table is excluded). Reference symbols:
validate_models, IdempotentIngestResult.to_delete,
include_tables/exclude_tables, and the validator invocation in
slayer/engine/ingestion.py.
- Around line 65-77: The example calls await ingest_datasource_idempotent at
module scope which causes a SyntaxError; wrap the async call inside an async
function (e.g., async def main()) and invoke it with asyncio.run(main()), or
alternatively show a synchronous wrapper if a sync API is available; update the
snippet to call ingest_datasource_idempotent from that function and preserve the
use of result.additions / result.to_delete / result.errors for results handling
so readers can execute the example.
In `@docs/examples/schema-drift.md`:
- Line 28: Several fenced code blocks in docs/examples/schema-drift.md are
missing language identifiers (triggering markdownlint MD040); update each of the
output-style triple-backtick blocks (the terminal/output samples such as the
"EDIT MODEL: products..." block, the "slayer validate-models ..." commands
block, the "Applied 1 entry/entries." block, and the "Updated: products..."
block) to include a language tag (e.g., text) immediately after the opening ```
so each fenced output block becomes ```text; ensure you update all occurrences
referenced in the review (the multiple output/terminal examples) for
consistency.
- Around line 55-73: The example output incorrectly shows an interactive
confirmation despite using the CLI flags --force-clean --yes; update the docs so
the slayer validate-models --datasource demo --force-clean --yes example
reflects non-interactive behavior by removing the "Apply ...? [y/N] y" prompt
and showing it applied automatically (or explicitly note that --yes
auto-approves), and ensure any referenced internal call names like
validate_models remain consistent with the command output.
In `@slayer/api/server.py`:
- Around line 171-182: The /query endpoint decorator's responses parameter needs
a 422 entry describing the schema-drift error shape returned when
SchemaDriftError is raised; update the decorator on the /query route to add
responses={..., 422: {"description": "schema_drift", "content":
{"application/json": {"example": {"error":"schema_drift","models": [...],
"to_delete":[...], "original": null}}}}} (or an equivalent OpenAPI schema) so
the documented response matches the raised HTTPException payload (keys: "error",
"models", "to_delete", "original") referenced where SchemaDriftError is handled
in the try/except block that raises the 422.
- Around line 371-380: The call to ingest_datasource_idempotent in server.py can
raise DB/auth/introspection exceptions that lead to 500s; wrap the await
ingest_datasource_idempotent(...) call in a try/except that catches those
operational exceptions (e.g., connection/auth/introspection related exceptions
your ingestion layer raises) and raise fastapi.HTTPException with a 4xx status
(400 or 422) and a client-friendly message, log the original exception for
server-side debugging, and return the same result.model_dump(...) on success;
reference the ingest_datasource_idempotent call and the surrounding handler in
server.py when making this change.
In `@slayer/engine/ingestion.py`:
- Around line 826-831: The ingest path currently allows a physical column
literally named "count" to persist; update the shared ingest logic in
ingest_datasource so that during auto-ingestion any source column named "count"
is normalized (renamed) to "count_col" before it is persisted or returned,
ensuring idempotent re-ingestion and avoiding conflicts with the reserved
"*:count" naming; target the ingest_datasource function and any helper that
builds the final column list/schema so the rename happens early in the pipeline
(before persistence) and is consistently applied on reruns.
In `@tests/test_force_clean.py`:
- Around line 260-267: The post-apply assertion in tests/test_force_clean.py is
too permissive (it accepts any "applied" text); update the assertion after
calling _run_cli so it verifies residual drift is actually cleared by asserting
either "no remaining drift" is present in result.stdout.lower(), or that
"applied" appears AND there is no mention of residual words (e.g., ensure
"residual" or "remaining" are not in result.stdout.lower()); keep using the
existing result from the _run_cli call to locate the check.
- Around line 275-279: The test calls _run_cli(...) with ["validate-models",
"--datasource", "ds", "--force-clean"] but ignores its result so failures before
the abort prompt are missed; capture the result (e.g., assign to a variable like
result) and assert it completed successfully (check result.exit_code == 0 or
result.returncode == 0 depending on _run_cli's return type) before continuing
with the abort-path assertions to ensure the CLI invocation didn't fail
silently.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8f8275e7-2d69-4e7b-9059-f47b0c05c740
📒 Files selected for processing (19)
.claude/skills/slayer-models.md.claude/skills/slayer-overview.mdCLAUDE.mddocs/concepts/ingestion.mddocs/concepts/models.mddocs/concepts/schema-drift.mddocs/examples/schema-drift.mdslayer/api/server.pyslayer/cli.pyslayer/core/errors.pyslayer/engine/ingestion.pyslayer/engine/query_engine.pyslayer/engine/schema_drift.pyslayer/mcp/server.pytests/integration/test_schema_drift_duckdb.pytests/test_force_clean.pytests/test_idempotent_ingestion.pytests/test_schema_drift_error.pytests/test_validate_models.py
…d /ingest (PR #101 review) - tests/test_api_server.py: 3 tests (test_post_models_creates_query_backed_model, test_post_models_rejects_user_columns_on_query_backed, test_post_query_run_by_name_dry_run_returns_sql_without_executing) used asyncio.get_event_loop().run_until_complete(...). After the new test_force_clean.py tests run via slayer.async_utils.run_sync() (which calls asyncio.run() and clears the main-thread loop on exit), get_event_loop() raises RuntimeError. Replace with run_sync() to match the rest of the suite. - tests/integration/test_mcp_inspect.py: ingest_datasource_models now renders "Created N new model(s):" / "Updated N existing model(s):" instead of the legacy "Ingested N model(s):" wording. Update the assertion. - slayer/api/server.py /query: add 422 schema-drift response to the OpenAPI decorator (CodeRabbit + Sonar S8415 — documented body must match the SchemaDriftError translation introduced in this PR). - slayer/api/server.py /ingest: wrap ingest_datasource_idempotent in try/except (OperationalError, DatabaseError, SQLAlchemyError) so bad datasource configuration surfaces as 422 with a friendly message instead of a 500 (CodeRabbit major). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (2)
slayer/api/server.py (1)
394-396: ⚡ Quick winMove endpoint-local imports to module scope
These imports should be at the top of the file for consistency and clearer import-time failure behavior.
As per coding guidelines, "`**/*.py`: Imports at the top of files".Suggested diff
from fastapi import FastAPI, HTTPException from pydantic import BaseModel, ConfigDict +from sqlalchemy.exc import DatabaseError, OperationalError, SQLAlchemyError from slayer.mcp.server import create_mcp_server from slayer.core.errors import AmbiguousModelError, SchemaDriftError from slayer.core.format import NumberFormat from slayer.core.models import DatasourceConfig, SlayerModel from slayer.core.query import SlayerQuery +from slayer.engine.ingestion import ingest_datasource_idempotent from slayer.engine.query_engine import SlayerQueryEngine from slayer.storage.base import StorageBackend ... - from sqlalchemy.exc import DatabaseError, OperationalError, SQLAlchemyError - from slayer.engine.ingestion import ingest_datasource_idempotent - try: result = await ingest_datasource_idempotent(🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@slayer/api/server.py` around lines 394 - 396, The imports DatabaseError, OperationalError, SQLAlchemyError and ingest_datasource_idempotent are declared inside the endpoint and should be moved to module scope for consistency; remove the in-function imports and add "from sqlalchemy.exc import DatabaseError, OperationalError, SQLAlchemyError" and "from slayer.engine.ingestion import ingest_datasource_idempotent" to the top-of-file imports so the endpoint (where ingest_datasource_idempotent is used and exceptions are handled) references these already-imported symbols.tests/test_api_server.py (1)
266-267: ⚡ Quick winConsolidate
run_syncimport at file top
run_syncis imported three times inside functions; move it to module imports.As per coding guidelines, "`**/*.py`: Imports at the top of files".Suggested diff
import pytest from fastapi.testclient import TestClient from slayer.api.server import QueryRequest, create_app +from slayer.async_utils import run_sync from slayer.core.enums import DataType from slayer.core.models import Column, SlayerModel from slayer.core.query import SlayerQuery from slayer.storage.yaml_storage import YAMLStorage ... - from slayer.async_utils import run_sync run_sync( storage.save_datasource(DatasourceConfig( name="ds", type="sqlite", database=":memory:" )) ) ... - from slayer.async_utils import run_sync run_sync( storage.save_datasource(DatasourceConfig( name="ds", type="sqlite", database=":memory:" )) ) ... - from slayer.async_utils import run_sync run_sync(storage.save_datasource( DatasourceConfig(name="ds", type="sqlite", database=":memory:") ))Also applies to: 300-301, 404-405
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_api_server.py` around lines 266 - 267, Several test functions import run_sync locally; move the run_sync import to the module-level imports and remove the duplicate local imports inside the test functions so the symbol is imported once. Locate uses of run_sync in your test functions (the local "from slayer.async_utils import run_sync" occurrences) and add a single "from slayer.async_utils import run_sync" at the top of the file, then delete the in-function import statements and ensure tests still reference the top-level run_sync.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@slayer/api/server.py`:
- Around line 394-396: The imports DatabaseError, OperationalError,
SQLAlchemyError and ingest_datasource_idempotent are declared inside the
endpoint and should be moved to module scope for consistency; remove the
in-function imports and add "from sqlalchemy.exc import DatabaseError,
OperationalError, SQLAlchemyError" and "from slayer.engine.ingestion import
ingest_datasource_idempotent" to the top-of-file imports so the endpoint (where
ingest_datasource_idempotent is used and exceptions are handled) references
these already-imported symbols.
In `@tests/test_api_server.py`:
- Around line 266-267: Several test functions import run_sync locally; move the
run_sync import to the module-level imports and remove the duplicate local
imports inside the test functions so the symbol is imported once. Locate uses of
run_sync in your test functions (the local "from slayer.async_utils import
run_sync" occurrences) and add a single "from slayer.async_utils import
run_sync" at the top of the file, then delete the in-function import statements
and ensure tests still reference the top-level run_sync.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 02f7750a-dccd-450b-9a6b-e8aef9438d88
📒 Files selected for processing (3)
slayer/api/server.pytests/integration/test_mcp_inspect.pytests/test_api_server.py
Docs (CodeRabbit minor): - docs/concepts/ingestion.md: wrap the Python ingest_datasource_idempotent example in `async def main()` + `asyncio.run(main())` so it's executable (await at module scope is a SyntaxError). - docs/concepts/ingestion.md: narrow the exclude_tables wording — the filter scopes the additive pass and the sql_table-mode subset of validation; sql-mode and query-backed models are still validated regardless of the table filter (matches the actual implementation). - docs/examples/schema-drift.md: add `text` language tag to fenced output blocks (markdownlint MD040). Fix --force-clean --yes example to reflect non-interactive behaviour (drop the [y/N] prompt from the sample output; document the prompt explicitly outside the code block). Sonar refactors (S3776 + S1066): - slayer/engine/schema_drift.py: split _cascade_one_pass (CC 135) into per-rule helpers (_cascade_derived_columns, _cascade_measures, _cascade_joins, _cascade_filters, _cascade_query_backed) sharing a plain _CascadeState container (Pydantic was tried first but it copies the dict/set fields, breaking in-place cascade mutations). Refactor _walk_field_spec_measure_refs (CC 31) and _stage_referenced_columns_for_base (CC 44) into smaller per-section helpers. Flatten the rule-3a nested if into a single boolean OR. - slayer/engine/query_engine.py: drop the no-op `for cm in enriched.cross_model_measures: pass` block (S1481) — the comment now lives on the data_sources expression. - slayer/cli.py: split _run_ingest (CC 21) and _format_validate_models_output (CC 22) into focused helpers. - slayer/engine/ingestion.py: split ingest_datasource_idempotent (CC 22) into _process_one_table + _scoped_models_for_validation helpers. - slayer/mcp/server.py: split _render_ingest_result (CC 29) into per-section renderers; drop unused datasource_name parameter (S1172). Test cleanups (CodeRabbit minor + Sonar S1481/S7503): - tests/test_force_clean.py: tighten post-apply assertion — re-run validate-models and assert the diff is empty rather than matching on free-text. Capture and assert returncode in the abort-path test. - tests/test_validate_models.py: rename unused `dropped` to `_`. - tests/test_schema_drift_error.py: NOSONAR(S7503) the patched async side_effect (must be a coroutine function for the awaited engine method). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@slayer/cli.py`:
- Around line 620-631: After running ingest_datasource_idempotent and printing
additions/drift, check the returned result.errors and if it's non-empty exit
with a failing status (e.g., sys.exit(1) or raise SystemExit(1)) so the CLI
returns non-zero on model errors; update the block around
ingest_datasource_idempotent/result/_print_ingest_drift_and_errors to mirror the
validate-models --force-clean behavior and call sys.exit(1) when result.errors
exists.
In `@slayer/engine/ingestion.py`:
- Around line 900-905: ingest_datasource_idempotent() currently calls
ingest_datasource(...) synchronously on the async path, which blocks the event
loop during SQLAlchemy introspection; change the call site to offload the
synchronous work to a thread (e.g. wrap the ingest_datasource(...) call in
asyncio.to_thread(...) or an equivalent executor) before awaiting any async
merge/save work so the event loop remains responsive; update the code that
invokes ingest_datasource (referenced as ingest_datasource(...) inside
ingest_datasource_idempotent) to call await asyncio.to_thread(lambda:
ingest_datasource(...)) and keep subsequent async operations unchanged.
In `@slayer/engine/query_engine.py`:
- Around line 494-528: _collect_models_touched currently only adds direct join
targets (model.joins) so multihop join targets (e.g., orders -> customers ->
regions) are omitted; update _collect_models_touched to walk joins transitively
by following enriched.resolved_joins (and stage.joins for source_queries) to
collect all reachable target_model names, avoiding cycles (track visited names)
and adding each visited model name to the touched set so validate_models can
attribute DBAPI errors to SchemaDriftError as intended.
- Around line 717-749: The edit_model_remove() path currently mutates cached
fields and calls storage.save_model(updated) directly, which will persist cache
state for query-backed models and bypass engine-side validation/refresh; update
edit_model_remove() to detect query-backed models (e.g., check
existing.source_queries or the model's query-backed indicator) and either
raise/reject the operation for those models or call the engine-level helper
SlayerQueryEngine.save_model(updated) instead of storage.save_model; ensure you
reference existing, edit_model_remove, storage.save_model, and
SlayerQueryEngine.save_model when making the change.
- Around line 780-832: The loop over `deletes` only adds `entry.data_source` to
`touched_ds` on success, so datasources whose all applies fail are skipped from
the post-apply `validate_models()` pass; fix by ensuring
`touched_ds.add(entry.data_source)` runs regardless of success or failure (e.g.,
add it before the try or in a finally block) in the loop that calls
`delete_model` and `edit_model_remove`, so `validate_models(data_source=...)`
and the subsequent `residual` calculation will always run for every touched
datasource when constructing the `ApplyDriftResult`.
In `@slayer/engine/schema_drift.py`:
- Around line 759-835: _stage_referenced_columns_for_base currently only treats
same-model (no-dot) refs as belonging to base_name and misses dotted refs that
resolve to base via joins; update it to also attribute dotted refs whose prefix
model resolves to base_name through the stage's join graph. Concretely, in
_stage_referenced_columns_for_base, walk stage.joins (and any alias/name fields
on join objects) to build the set of models reachable from the stage's source
model (follow joins transitively), then for each dotted ref returned by
_measure_refs_on_base, _dimension_refs_on_base, _time_dimension_refs_on_base,
and _filter_refs_on_base treat the ref as referencing base_name if its prefix
resolves (via that join traversal) to base_name; keep existing same-model logic
(sm_is_base) unchanged. This ensures _query_backed_should_whole_drop sees hits
when a stage references columns via joins that map back to the base model.
- Around line 992-1032: _cascade_joins currently uses foreign_missing = [pair[1]
for pair in join.join_pairs if pair[1] in state.cascadable(tgt.name)], which
ignores primary keys removed by state.cascadable and therefore fails to mark the
join itself as invalid when a target PK is dropped; update _cascade_joins to
treat missing target PKs as a reason to add a dropped join regardless of
state.cascadable filtering by checking join.join_pairs for any pair[1] that is
absent from tgt.columns (or otherwise not present on the live target model) and
call _add_dropped_join (same args as existing calls) with a clear reason (e.g.,
"Foreign column(s) ... dropped on target model ...") so the join is invalidated;
reference functions/vars: _cascade_joins, join.join_pairs, foreign_missing,
state.cascadable, state.models_by_name, _add_dropped_join.
In `@tests/test_schema_drift_error.py`:
- Around line 163-165: The test currently only asserts that the raised exception
is not SchemaDriftError; change it to assert the original DB failure by either
checking exc.value is an instance of the concrete DBAPI exception you expect
(e.g., OperationalError/DatabaseError class used by your DB backend) or by
asserting a stable substring from the underlying DB error message is present
(e.g., "no such table" or the specific error text your DB returns) after calling
engine.execute(q) in the pytest.raises block; apply the same tighter assertion
to the other occurrence at the second block (lines 184-189) so the test fails if
the original DBAPI error path is not reached.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b10eb18c-a013-41fb-b17c-61222036fee1
📒 Files selected for processing (10)
docs/concepts/ingestion.mddocs/examples/schema-drift.mdslayer/cli.pyslayer/engine/ingestion.pyslayer/engine/query_engine.pyslayer/engine/schema_drift.pyslayer/mcp/server.pytests/test_force_clean.pytests/test_schema_drift_error.pytests/test_validate_models.py
…ardening Correctness fixes (CodeRabbit major) — schema drift attribution and apply - engine/query_engine.py:_collect_models_touched walks the join graph transitively (visited-set guarded) so multi-hop targets like orders→customers→regions are included. SchemaDriftError now triggers when drift lives on the second hop. Made the helper async + storage-aware. Also fixed a latent bug: SlayerQuery has no .joins attribute, so the prior `for j in stage.joins or []` would have raised AttributeError on any query-backed model with stage joins — switched to getattr defaults. - engine/query_engine.py:apply_drift_deletes records each entry's data_source up front (before the mutation) so post-apply validate_models still re-runs when every entry on a DS fails. - engine/query_engine.py:edit_model_remove rejects query-backed models with a clear error pointing at engine.save_model. Bypassing it would persist stale cache that no longer matches source_queries. - engine/schema_drift.py:_cascade_joins uses raw dropped_cols (not the PK-filtered cascadable) when checking the foreign side of a join. A target PK drop still invalidates the join itself even though rule 7 stops downstream column cascades. - engine/schema_drift.py:_query_backed_should_whole_drop iterates every candidate base model in the DS (passed via candidate_base_names from _cascade_query_backed) so cross-model dotted refs in stage measures / dimensions / filters are caught when the *referenced* model drifts — not just when the stage's own source_model drifts. Adds whole-drop detection on stage.joins targets too (defensive — handled via getattr since SlayerQuery itself has no joins field). CLI / API hardening (CodeRabbit major + Sonar S5713 + S5145) - cli.py:_run_ingest exits 1 when result.errors is non-empty, mirroring validate-models --force-clean. Automation can no longer treat partial ingest as success. - api/server.py /ingest catches only SQLAlchemyError (OperationalError and DatabaseError both derive from it — Sonar S5713). Strips CR/LF from request.datasource before logging or composing into the response detail (Sonar S5145 log-injection). Sonar refactors (S3776) and test tightening - engine/schema_drift.py:diff_sql_table_model split into _diff_sql_table_columns + _diff_sql_table_joins (CC 16 → ≤15). - engine/schema_drift.py:_cascade_measures factored a _first_dropped_cause helper out of the inner ref loop (CC 18 → ≤15). - engine/schema_drift.py:validate_datasource split into _resolve_live_table + _collect_sql_table_diffs + _collect_sql_diffs (CC 25 → ≤15). - tests/test_schema_drift_error.py: NOSONAR(S7503) moved onto the `async def _no_drift` line (Sonar requires it on the offending line). Tightened the two re-raise tests: assert pytest.raises(SQLAlchemyError) and check a stable substring of the original DBAPI error so a regression that masks the original failure path actually fails. Async architecture (CodeRabbit major) - engine/ingestion.py:ingest_datasource_idempotent wraps the synchronous ingest_datasource(...) call in await asyncio.to_thread(...) so a slow or large datasource doesn't block the event loop while server-facing requests are in flight. Matches the project's async-first contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (1)
slayer/engine/schema_drift.py (1)
721-789:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftMulti-hop stage refs still evade whole-drop detection.
_attribute_ref_to_base()only matches exact dotted prefixes, so a stage ref likecustomers.regions.nameis never attributed to the terminalregionsmodel. Ifregions.namedrifts,_query_backed_should_whole_drop()misses the hit and leaves a query-backed model whose SQL can no longer resolve.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@slayer/engine/schema_drift.py` around lines 721 - 789, _attribute_ref_to_base currently only matches when the dotted prefix equals base_name exactly, so multi-hop refs like "customers.regions.name" are not attributed to the terminal model; change _attribute_ref_to_base(ref, base_name, sm_is_base) to, when ref contains dots, extract the prefix and test the last segment of the prefix (prefix.rsplit(".",1)[-1]) against base_name (return leaf if they match) instead of comparing the whole prefix, leaving the existing no-dot behavior (return ref if sm_is_base else None) intact; callers to update implicitly are _measure_refs_on_base, _dimension_refs_on_base, _time_dimension_refs_on_base, and _filter_refs_on_base which rely on _attribute_ref_to_base.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@slayer/api/server.py`:
- Around line 402-420: The handler currently returns 200 whenever
ingest_datasource_idempotent(...) completes, even if result.errors is non-empty;
modify the post-ingest logic in server.py to detect non-empty result.errors and
raise an HTTPException (e.g., status_code=422) with a descriptive detail payload
listing the partial errors instead of returning result.model_dump(...), so
clients receive a failure status when per-model save/merge errors occurred;
reference the ingest_datasource_idempotent call and the result.errors property
when implementing this check.
In `@slayer/engine/ingestion.py`:
- Around line 775-783: The loop currently appends a fresh ModelJoin even when
its target_model already exists with a different join_pairs signature, which
creates duplicate targets; change the logic in the block that iterates over
fresh.joins to first build a set of existing target names (from persisted.joins)
and also check new_join_targets to avoid duplicates, then for each j: if
j.target_model is in existing targets or new_join_targets, compare the signature
(tuple(sorted((p[0],p[1]) for p in j.join_pairs))) to the existing signature
from _existing_join_signatures(persisted); if signatures match skip/continue,
but if they differ raise/return a surfaced conflict (e.g., raise ValueError with
a clear message mentioning j.target_model) instead of appending a second join;
update new_join_targets when you append to new_joins.
In `@slayer/engine/query_engine.py`:
- Around line 514-530: The loop that builds touched from model.source_queries
misses joins defined on source_model extensions because it only checks
getattr(stage, "joins", None); update the logic in the block iterating stages
(handling model.source_queries) to, for each stage, if getattr(stage, "joins",
None) exists add those join targets, and additionally if the stage has a
source_model that is a ModelExtension-like object (i.e. not just a string) or if
source_model is a dict/extension container, inspect that source_model's joins as
well and add each join.target_model to touched; use the existing symbols stage,
stage.source_model, ModelExtension/SlayerQuery semantics, and ensure
touched.add(...) is called for those join targets so _maybe_raise_schema_drift()
sees them.
In `@slayer/engine/schema_drift.py`:
- Around line 697-719: _resolve_stage_source_to_base currently only handles str
and SlayerModel and returns None for ModelExtension; update it to detect
ModelExtension and return the underlying base model name (e.g.,
extension.base_model or extension.source_model whichever holds the base) so
query-backed stages resolve correctly, and likewise update _stage_join_targets
to inspect a stage's ModelExtension when stage.source_model is an extension (use
the extension's source/base model when collecting join targets). Ensure both
places reference ModelExtension, stage.source_model,
_resolve_stage_source_to_base, and _stage_join_targets so
_query_backed_should_whole_drop will include extension-backed stages; apply the
same fix pattern to the analogous code block around the other occurrence (lines
~806-819).
---
Duplicate comments:
In `@slayer/engine/schema_drift.py`:
- Around line 721-789: _attribute_ref_to_base currently only matches when the
dotted prefix equals base_name exactly, so multi-hop refs like
"customers.regions.name" are not attributed to the terminal model; change
_attribute_ref_to_base(ref, base_name, sm_is_base) to, when ref contains dots,
extract the prefix and test the last segment of the prefix
(prefix.rsplit(".",1)[-1]) against base_name (return leaf if they match) instead
of comparing the whole prefix, leaving the existing no-dot behavior (return ref
if sm_is_base else None) intact; callers to update implicitly are
_measure_refs_on_base, _dimension_refs_on_base, _time_dimension_refs_on_base,
and _filter_refs_on_base which rely on _attribute_ref_to_base.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 10c6322c-978d-48de-945d-379e2484a7a6
📒 Files selected for processing (6)
slayer/api/server.pyslayer/cli.pyslayer/engine/ingestion.pyslayer/engine/query_engine.pyslayer/engine/schema_drift.pytests/test_schema_drift_error.py
…Sonar splits
Correctness fixes (CodeRabbit major)
- api/server.py /ingest now mirrors the CLI's exit-1 behaviour: when
result.errors is non-empty, raise HTTPException(422) with the full
IdempotentIngestResult body so API clients see partial failures
instead of a 200.
- engine/ingestion.py:_additive_merge_existing surfaces a ValueError when
a fresh ingest discovers a different join_pairs signature for an
existing target_model. Downstream consumers key joins by target_model
alone — appending a second one would let the stale join shadow the
live one and ``remove.joins=[name]`` would wipe both.
- ModelExtension stages were silently skipped by query-backed drift
attribution. SlayerQuery has no .joins attribute; joins on a stage
live on its source_model when that's a ModelExtension, and
ModelExtension carries source_name (not source_model). Two fixes:
- engine/schema_drift.py:_resolve_stage_source_to_base now follows
extension.source_name (str) so query-backed stages resolve to a real
base model; _stage_join_targets reads off
stage.source_model.joins via getattr.
- engine/query_engine.py:_collect_query_backed_base_names mirrors the
same getattr lookup so multi-hop drift attribution covers
extension-added joins.
Sonar S3776 (cognitive complexity) refactors
- query_engine.py:_collect_models_touched (CC 18 → ≤15): extracted
_collect_query_backed_base_names static helper so the orchestrator
just unions sets and triggers the join-graph walk.
- schema_drift.py:_query_backed_should_whole_drop (CC 22 → ≤15):
extracted _check_stage_for_whole_drop covering the per-stage
base-name / join-target / candidate-base check; outer loop just
iterates stages.
- schema_drift.py:_cascade_derived_columns (CC 18 → ≤15): extracted
_first_dropped_sql_column_ref returning Optional[(target, ref_col)];
the loop body becomes a flat dispatch.
- schema_drift.py:compute_datasource_drops (CC 25 → ≤15): extracted
_seed_state_from_diffs and _collapse_entries; orchestrator becomes
seed → iterate → collapse.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (2)
slayer/engine/ingestion.py (1)
775-799:⚠️ Potential issue | 🟠 Major | ⚡ Quick winStill reject duplicate join targets introduced in the same fresh ingest pass.
This guard only checks
existing_join_targets. Iffresh.joinscontains two different signatures for the sametarget_model, both still get appended becausenew_join_targetsis never consulted. Downstream code still keys joins bytarget_model, so the ambiguity remains.Possible minimal fix
for j in fresh.joins: sig = (j.target_model, tuple(sorted((p[0], p[1]) for p in j.join_pairs))) if sig in existing_join_sigs: continue - if j.target_model in existing_join_targets: + if j.target_model in existing_join_targets or j.target_model in new_join_targets: # Same target_model already present with a different # join_pairs signature. Downstream consumers key joins by # target_model only — appending a second one would let the # stale join shadow the live one and ``remove.joins=[name]`` # would wipe both. Surface the conflict so the user can🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@slayer/engine/ingestion.py` around lines 775 - 799, The loop that merges fresh.joins fails to detect conflicting joins introduced within the same ingest because it only checks existing_join_targets and not duplicates already appended in this run; update the logic in the for j in fresh.joins loop (which builds new_joins and new_join_targets) to also check whether j.target_model is already present in new_join_targets (or maintain a set of appended targets) and raise the same ValueError if a different join signature for the same target_model is found; ensure you compare signatures the same way as existing_join_sigs (using sig = (j.target_model, tuple(sorted(...)))) so identical signatures are allowed but differing ones are rejected.slayer/engine/schema_drift.py (1)
731-813:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftMultihop stage refs still never attribute the terminal model.
_attribute_ref_to_base()only matchesprefix == base_name, so a ref likecustomers.regions.nameis never attributed toregions. Ifregions.namedrifts,_query_backed_should_whole_drop()keeps the query-backed model even though itssource_queriescan no longer resolve.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@slayer/engine/schema_drift.py` around lines 731 - 813, _attribute_ref_to_base currently only matches a single-dot prefix equality so multihop refs like "customers.regions.name" never attribute to the terminal model; change _attribute_ref_to_base to recognize when the dotted prefix ends with the base_name (e.g. prefix == base_name or prefix.endswith("." + base_name)) and return the leaf in that case, preserving the existing behavior for no-dot refs (use ref if sm_is_base else None); update callers (_stage_referenced_columns_for_base, _measure_refs_on_base, _dimension_refs_on_base, _time_dimension_refs_on_base, _filter_refs_on_base) require no other changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@slayer/engine/query_engine.py`:
- Around line 524-541: The helper currently seeds touched with the
possibly-virtual model.name, which loses attribution for query-backed stages;
change _collect_models_touched to detect query-backed/virtual models (e.g.,
check model.is_query_backed or presence of source/backing info on SlayerModel)
and initialize touched with the persisted backing base names instead of only
model.name by calling _collect_query_backed_base_names first (and adding
model.name when appropriate), then proceed to add cross_model_measures and
expand the join graph so drift validation sees the real persisted models.
In `@slayer/engine/schema_drift.py`:
- Around line 855-901: The function _check_stage_for_whole_drop currently never
considers dropped_joins so stages that reference a base via a removed FK
survive; add a dropped_joins parameter (e.g. Dict[str, Set[str]] or Set tuples)
to _check_stage_for_whole_drop and update callers, then treat any join target
returned by _stage_join_targets(stage) as fatal if it appears in
whole_dropped_models OR is listed in dropped_joins (and similarly consider
candidate base names that have dropped joins when calling
_check_stage_against_base); use the symbols _check_stage_for_whole_drop,
_stage_join_targets and _check_stage_against_base to locate the changes and
ensure the returned DeleteReason mirrors existing messages for dropped joins.
- Around line 1489-1492: The wrapper SELECT fails when model.sql ends with a
statement terminator (e.g., "SELECT 1;"), so before building trial_sql in the
validate flow strip trailing semicolons and trailing whitespace from model.sql;
update the code that constructs trial_sql (where trial_sql = f"SELECT * FROM
({model.sql}) AS _sd_validate WHERE 1=0") to use a cleaned_sql variable (trimmed
of trailing ';' and whitespace) and pass that into the f-string, ensuring
client.get_column_types receives a wrapper without a trailing terminator; adjust
any related usage in validate_models()/trial_sql construction accordingly.
---
Duplicate comments:
In `@slayer/engine/ingestion.py`:
- Around line 775-799: The loop that merges fresh.joins fails to detect
conflicting joins introduced within the same ingest because it only checks
existing_join_targets and not duplicates already appended in this run; update
the logic in the for j in fresh.joins loop (which builds new_joins and
new_join_targets) to also check whether j.target_model is already present in
new_join_targets (or maintain a set of appended targets) and raise the same
ValueError if a different join signature for the same target_model is found;
ensure you compare signatures the same way as existing_join_sigs (using sig =
(j.target_model, tuple(sorted(...)))) so identical signatures are allowed but
differing ones are rejected.
In `@slayer/engine/schema_drift.py`:
- Around line 731-813: _attribute_ref_to_base currently only matches a
single-dot prefix equality so multihop refs like "customers.regions.name" never
attribute to the terminal model; change _attribute_ref_to_base to recognize when
the dotted prefix ends with the base_name (e.g. prefix == base_name or
prefix.endswith("." + base_name)) and return the leaf in that case, preserving
the existing behavior for no-dot refs (use ref if sm_is_base else None); update
callers (_stage_referenced_columns_for_base, _measure_refs_on_base,
_dimension_refs_on_base, _time_dimension_refs_on_base, _filter_refs_on_base)
require no other changes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 065fec1f-6414-4b22-b8d9-91f208a939e3
📒 Files selected for processing (4)
slayer/api/server.pyslayer/engine/ingestion.pyslayer/engine/query_engine.pyslayer/engine/schema_drift.py
… sql terminator Correctness fixes (CodeRabbit major) - engine/query_engine.py:_collect_models_touched seeds touched from the PERSISTED model with the same name when the resolved ``model`` is a virtual stage from _query_as_model(). For an exec like ``stage1(source_model="orders") -> final(source_model="stage1")``, the resolved model.name was ``stage1`` (a virtual name), so validate_models produced deletes for ``orders`` and the touched-model filter dropped them — execute() then re-raised the raw DB error instead of SchemaDriftError. Falls back to storage.get_model(model.name, ...) and unions in its query-backed base names. - engine/schema_drift.py:_check_stage_for_whole_drop now considers dropped_joins (threaded through _query_backed_should_whole_drop and _cascade_query_backed). A stage with source_model="orders" and dimensions=["customers.name"] whose orders→customers join was removed (FK drifted) used to silently survive; it now whole-drops via a new _stage_uses_dropped_join helper that checks each candidate base for references through any dropped-join target. - engine/schema_drift.py:_live_columns_for_sql_model strips trailing whitespace and a single ';' from model.sql before wrapping it in ``SELECT * FROM (...) AS _sd_validate WHERE 1=0``. A persisted ``SELECT 1;`` is valid at top level but invalid inside the wrapper — without the strip, the bogus structural error was attributed to drift and produced a false WholeModelDelete. Sonar S3776 - engine/schema_drift.py:_seed_state_from_diffs (CC 18 → ≤15) — extracted _seed_one_diff_entry; the outer function only iterates the diffs dicts and delegates each entry to the helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
slayer/engine/ingestion.py (1)
775-799:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFresh duplicate join targets still slip past this safety check.
This only compares against
persisted.joins. Iffresh.joinscontains two different signatures for the sametarget_model—which_generate_joins()can emit when a table has multiple FKs to the same target—both still get persisted because the appended target/signature never enters the seen sets, and the new-model path below savesfreshas-is. That recreates the same shadowing/removal ambiguity this guard is meant to prevent.Also applies to: 823-832
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@slayer/engine/ingestion.py` around lines 775 - 799, The loop only checks persisted.joins for conflicting target_model signatures but fails to detect multiple differing joins for the same target inside fresh.joins; update the logic in the block that iterates fresh.joins (and the analogous code path around the new-model handling at lines 823-832) to maintain and consult a local seen set for target_models and signatures (e.g., extend existing_join_targets and existing_join_sigs or create fresh_seen_targets/fresh_seen_sigs) as you append each j to new_joins; if you encounter a j.target_model already seen with a different signature (tuple(sorted(...)) vs the stored signature) raise the same ValueError so duplicate differing joins inside fresh.joins are rejected before persisting.
🧹 Nitpick comments (2)
slayer/engine/ingestion.py (1)
823-823: ⚡ Quick winUse keyword arguments for
storage.get_model()here.This call has more than one parameter, so it should follow the repo’s keyword-argument convention as well. As per coding guidelines:
**/*.py: Use keyword arguments for functions with more than 1 parameter.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@slayer/engine/ingestion.py` at line 823, The call to storage.get_model uses a positional argument for table_name and a second positional parameter for data_source; update the call to pass named keyword arguments (e.g., use table_name=... and data_source=...) so it follows the repo convention for functions with more than one parameter and to make the intent explicit; edit the invocation where persisted is assigned (storage.get_model) to use those keywords with the existing variables (table_name and datasource.name).slayer/engine/schema_drift.py (1)
1498-1499: ⚡ Quick winUse keyword arguments for these helper calls.
_safe_get_columns(...)and_safe_get_pk_constraint(...)both take multiple parameters, and the repo guideline requires keyword arguments for that shape. As per coding guidelines:**/*.py: Use keyword arguments for functions with more than 1 parameter.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@slayer/engine/schema_drift.py` around lines 1498 - 1499, Change the two helper calls to pass arguments by name: replace the positional calls to _safe_get_columns(...) and _safe_get_pk_constraint(...) with keyword-argument calls using the parameter names (e.g. inspector=inspector, sa_engine=sa_engine, table_name=table_name, schema=schema) so the call sites (_safe_get_columns and _safe_get_pk_constraint) follow the repository guideline requiring keyword arguments for functions with more than one parameter.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@slayer/engine/schema_drift.py`:
- Around line 731-740: The helper _attribute_ref_to_base should stop doing raw
string prefix checks and instead walk the source/join graph for the stage to
resolve refs: detect if ref is a bare name on a ModelExtension (use
stage.source_model type/identity rather than sm_is_base boolean) and resolve
multi-hop dotted refs by traversing joins/relations (e.g., resolve
"customers.regions.name" by following the join chain from base_name through
intermediate models to the leaf column); update _attribute_ref_to_base (and the
similar logic around lines 795-813) to accept the stage/graph context, perform
graph traversal to map a dotted ref to the final base column or return None when
it resolves to a different model, and ensure ModelExtension bare refs treat the
extension’s source as the base when appropriate.
- Around line 855-876: _stage_uses_dropped_join currently flags a dropped join
if the stage references the target column, but it must also ensure the dropped
target belongs to a model reachable from the stage's resolved base/extension
join graph; update _stage_uses_dropped_join to first compute the set of models
reachable from the stage (starting from the stage's resolved base/extension
joins) and only consider dropped_joins whose target is in that reachable set
before calling _stage_referenced_columns_for_base(stage=..., base_name=target);
apply the same reachable-model restriction to the analogous logic at the other
occurrence (lines 911-922) so we only whole-drop stages for dropped joins that
are actually reachable from the stage.
---
Duplicate comments:
In `@slayer/engine/ingestion.py`:
- Around line 775-799: The loop only checks persisted.joins for conflicting
target_model signatures but fails to detect multiple differing joins for the
same target inside fresh.joins; update the logic in the block that iterates
fresh.joins (and the analogous code path around the new-model handling at lines
823-832) to maintain and consult a local seen set for target_models and
signatures (e.g., extend existing_join_targets and existing_join_sigs or create
fresh_seen_targets/fresh_seen_sigs) as you append each j to new_joins; if you
encounter a j.target_model already seen with a different signature
(tuple(sorted(...)) vs the stored signature) raise the same ValueError so
duplicate differing joins inside fresh.joins are rejected before persisting.
---
Nitpick comments:
In `@slayer/engine/ingestion.py`:
- Line 823: The call to storage.get_model uses a positional argument for
table_name and a second positional parameter for data_source; update the call to
pass named keyword arguments (e.g., use table_name=... and data_source=...) so
it follows the repo convention for functions with more than one parameter and to
make the intent explicit; edit the invocation where persisted is assigned
(storage.get_model) to use those keywords with the existing variables
(table_name and datasource.name).
In `@slayer/engine/schema_drift.py`:
- Around line 1498-1499: Change the two helper calls to pass arguments by name:
replace the positional calls to _safe_get_columns(...) and
_safe_get_pk_constraint(...) with keyword-argument calls using the parameter
names (e.g. inspector=inspector, sa_engine=sa_engine, table_name=table_name,
schema=schema) so the call sites (_safe_get_columns and _safe_get_pk_constraint)
follow the repository guideline requiring keyword arguments for functions with
more than one parameter.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 15adac8a-8e37-4fc6-8405-fa07d5762ad4
📒 Files selected for processing (4)
slayer/api/server.pyslayer/engine/ingestion.pyslayer/engine/query_engine.pyslayer/engine/schema_drift.py
🚧 Files skipped from review as they are similar to previous changes (2)
- slayer/api/server.py
- slayer/engine/query_engine.py
…ool-idempotent-re-ingestion Resolve conflicts in errors.py (keep SchemaDriftError alongside EntityResolutionError + MemoryNotFoundError), api/server.py (merge error imports), and slayer-overview.md (keep both schema-drift and memory tool bullets).
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
slayer/cli.py (1)
693-693: ⚡ Quick winMove new function-local imports to module top.
The newly added imports are inside functions. This breaks the repository import-placement rule.
As per coding guidelines: "
**/*.py: Imports should be placed at the top of files".Also applies to: 755-755
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@slayer/cli.py` at line 693, Move the function-local imports (e.g., ingest_datasource_idempotent from slayer.engine.ingestion and the other import added around line 755) to the module top of slayer.cli with the other imports; remove the in-function import statements so callers just reference the already-imported symbols, run/import ordering checks, and if you encounter a circular-import error refactor minimally (export the symbol from a different module or use a small helper) rather than leaving imports inside functions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.claude/skills/slayer-overview.md:
- Line 32: The docs claim ingest_datasource_models returns an
IdempotentIngestResult, but the actual implementation in slayer/mcp/server.py
returns a rendered text summary string; update either the docs or the function
so they match: (A) change the .claude/skills/slayer-overview.md entry for
ingest_datasource_models to describe the actual rendered text summary format and
examples produced by slayer/mcp/server.py, or (B) modify the
ingest_datasource_models implementation to return a literal
IdempotentIngestResult (with additions, to_delete, errors) instead of a rendered
string; search for all occurrences of ingest_datasource_models and the old field
names across skills/docs and update them for consistency.
In `@slayer/api/server.py`:
- Around line 397-403: The validate_models_endpoint currently forwards straight
to engine.validate_models so a typoed or missing data_source looks like "no
drift" and introspection failures surface as 500s; update
validate_models_endpoint to first verify the requested datasource exists (e.g.
via engine.has_data_source / engine.list_data_sources /
engine.introspect_data_source — whichever exists in your engine API) and return
HTTPException(404) when the data_source is unknown, then call
engine.validate_models inside a try/except and convert known
introspection/connection errors into HTTPException(422) with a clear message;
keep references to ValidateModelsRequest, validate_models_endpoint and
engine.validate_models when making the changes.
In `@slayer/cli.py`:
- Around line 757-760: If a user passes --datasource with a name that doesn't
exist, fail fast instead of returning no drift: update the call site around
_resolve_storage / SlayerQueryEngine to detect a missing datasource and
raise/exit with a non-zero status. Concretely, when args.datasource is set, have
_resolve_storage (or immediately after calling it) verify the requested
datasource exists and if not throw a descriptive error (e.g., ValueError) or
call process exit with a non-zero code before instantiating SlayerQueryEngine or
invoking run_sync(engine.validate_models(...)); include the datasource name in
the error message so callers can see the typo. Ensure validate_models is not
relied on to signal this condition later.
In `@slayer/mcp/server.py`:
- Around line 2397-2413: The validate_models mcp tool currently treats an empty
list from engine.validate_models as “no drift,” which masks typos or missing
datasources and allows introspection errors to surface as raw tool exceptions;
update validate_models to first verify the datasource exists via storage (or a
provided SlayerQueryEngine method) when data_source is given, then wrap the call
to engine.validate_models in a try/except that converts introspection errors
into a clear tool error response; if the datasource is missing return a distinct
error message (or non-empty result) indicating "unknown datasource" and only
return an empty JSON array when validation truly succeeds with no drift.
---
Nitpick comments:
In `@slayer/cli.py`:
- Line 693: Move the function-local imports (e.g., ingest_datasource_idempotent
from slayer.engine.ingestion and the other import added around line 755) to the
module top of slayer.cli with the other imports; remove the in-function import
statements so callers just reference the already-imported symbols, run/import
ordering checks, and if you encounter a circular-import error refactor minimally
(export the symbol from a different module or use a small helper) rather than
leaving imports inside functions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: edfa6695-177e-4b05-a694-a91eff702101
📒 Files selected for processing (9)
.claude/skills/slayer-models.md.claude/skills/slayer-overview.mdCLAUDE.mddocs/concepts/models.mdslayer/api/server.pyslayer/cli.pyslayer/core/errors.pyslayer/mcp/server.pytests/integration/test_mcp_inspect.py
✅ Files skipped from review due to trivial changes (1)
- CLAUDE.md
…achable-stage filter
Correctness fixes (CodeRabbit major)
- Unknown-datasource handling on validate_models, mirrored across all
three surfaces. engine.validate_models(data_source=<typo>) returns []
because no models match, which made a typo indistinguishable from
"no drift". Each entry point now fails loudly:
- api/server.py: POST /validate-models → 404 when storage.get_datasource
is None; SQLAlchemyError → 422 with the friendly DB error message.
- cli.py: slayer validate-models --datasource <name> → sys.exit(1)
with "Datasource 'X' not found in <storage>".
- mcp/server.py: validate_models tool → "Datasource 'X' not found."
string, plus DBAPI errors routed through _friendly_db_error.
- Query-backed cascade graph correctness: replace the legacy raw-prefix
attribution with a per-stage join-graph walk.
- New _StageGraph (stage_source_name, extension_targets, reachable,
models_by_name) carries the resolved per-stage join context.
- _build_stage_graph performs a transitive walk from the stage's
resolved source through both the base model's stored joins and any
extension-added joins.
- _attribute_ref_to_base now walks the dotted ref hop-by-hop through
the graph, so multi-hop refs like ``customers.regions.name`` from a
stage rooted at ``orders`` resolve to ``regions.name`` (legacy
prefix matching never could).
- _stage_uses_dropped_join is now bounded by graph.reachable, so a
dropped ``invoices → customers`` join no longer whole-drops a stage
rooted at ``orders`` that uses ``customers.name`` via its own link.
- All per-section ref helpers (_measure_refs_on_base / _dimension_refs
/ _time_dimension_refs / _filter_refs_on_base) now consume the
graph instead of the bool sm_is_base flag.
Docs
- .claude/skills/slayer-overview.md: ingest_datasource_models entry
rewritten to describe the rendered text summary the MCP tool actually
returns; the structured IdempotentIngestResult is documented as the
engine-layer contract underneath. validate_models entry adds the
unknown-datasource error message to set agent expectations.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@slayer/cli.py`:
- Around line 767-769: Wrap the call to
run_sync(engine.validate_models(data_source=args.datasource)) in a try/except
that catches the expected validation/introspection exceptions raised by
SlayerQueryEngine.validate_models (e.g., the engine's custom
validation/introspection error types or a common base like
ValidationError/IntrospectionError), log/print a concise error message (not a
traceback) and exit with a non-zero status; on success continue to call
_format_validate_models_output(entries) and print the result. Ensure you
reference SlayerQueryEngine, validate_models, run_sync and args.datasource when
locating the code to modify.
In `@slayer/engine/schema_drift.py`:
- Around line 1721-1723: The call that populates live_tables uses
_live_schema_for_datasource but always introspects the default schema; update
the call site to pass the datasource's configured schema (e.g., pass
datasource.schema_name or datasource.config.schema_name) into
_live_schema_for_datasource so validate_models() diffs against the correct
schema, and adjust _live_schema_for_datasource's signature/implementation to
accept and use that schema_name parameter when introspecting. Ensure the symbol
names involved are _live_schema_for_datasource, live_tables, and validate_models
so the change is applied where the schema is propagated through.
- Around line 800-816: The dotted ref handling treats "<source>.<col>" as a join
hop; change it so when parts[:-1] equals [graph.stage_source_name] (i.e., the
ref is exactly "stage_source_name.column") you treat it as a same-model column
reference instead of walking joins: return the leaf column if
graph.stage_source_name == base_name, otherwise None. Update the logic around
ref/parts/leaf/path (in the function using graph.stage_source_name and
base_name) to short-circuit that case before iterating joins so root-qualified
refs map to the base model's column.
In `@slayer/mcp/server.py`:
- Around line 308-309: The current branch returns _empty_ingest_message when
additions is empty and result.to_delete and result.errors are false, which
misreports “No tables found” for idempotent ingests that simply had nothing new;
instead, before returning _empty_ingest_message from this condition in
server.py, perform a real live-table check (e.g., call the datasource API such
as ds.list_tables(schema_name) or an equivalent method to count/lookup tables)
and only return the empty-schema message if that lookup shows zero tables; if
tables exist, return a different message indicating “no additive changes” (or
add an explicit table_count field to the result) so agents/users aren’t misled
by the existing branch involving additions, result.to_delete, result.errors, and
_empty_ingest_message.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: da2ab6ed-f957-46dc-ad52-6cc1f050fa22
📒 Files selected for processing (5)
.claude/skills/slayer-overview.mdslayer/api/server.pyslayer/cli.pyslayer/engine/schema_drift.pyslayer/mcp/server.py
✅ Files skipped from review due to trivial changes (1)
- .claude/skills/slayer-overview.md
…e error normalization Schema-drift correctness (CodeRabbit major) - engine/schema_drift.py:_attribute_ref_to_base now short-circuits when the dotted-ref path is exactly [stage_source_name] — refs like ``orders.amount`` from a stage rooted at ``orders`` previously failed to resolve because ``orders`` is not in its own join set, so the walk loop returned None. Root-qualified column refs now attribute to the base correctly, so query-backed drift attribution doesn't miss them after a base-column drop. - engine/schema_drift.py:_collect_sql_table_diffs now passes ``datasource.schema_name`` through to _live_schema_for_datasource. Without this, validate_models always introspected the dialect's default schema and produced false WholeModelDeletes for non-default- schema datasources. - engine/schema_drift.py:_StageGraph: explicit defaults on every field (Sonar S8396 — Pydantic v2 prefers Optional fields with explicit None default; Field(default_factory=...) for the dict/set fields). Surface error normalization (CodeRabbit major) - cli.py:_run_validate_models wraps ``engine.validate_models(...)`` in try/except. DB / auth / introspection failures previously escaped as raw Python tracebacks; they now print a one-line ``validate-models failed: <message>`` and exit 1, matching the rest of the CLI's error-handling shape. - mcp/server.py:_render_ingest_result distinguishes "schema is empty" from "no additive work to do". Previously, an idempotent ingest that had nothing to add (because every persisted model was sql / query-backed and got skipped silently) returned "No tables found in schema ... Try schema_name=..." — misleading the agent. Now probes ``_fetch_tables`` first and only shows the empty-schema hint when there are actually no live tables; otherwise reports "Datasource already in sync — no additive changes." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|



Summary
Implements DEV-1356: three first-class behaviours for keeping persisted SLayer models in sync with live database schemas.
validate_models— read-only diff that returns the minimal list of deletes (drop columns / measures / joins / filters / models) needed to make persisted models valid against the current live schema. Output replays directly asedit_model/delete_modelcalls.ingest_datasource_models— re-runs are additive only (add missing columns / joins / tables, never mutate existing) and return a combined report of what was added plus whatvalidate_modelssays still needs deleting.SchemaDriftError—engine.execute()failures get post-hoc-attributed viavalidate_modelsagainst the touched models' datasources, surfacing a clear error pointing at the validator andedit_modelinstead of a raw DBAPI message.Surfaces
engine.validate_models()engine.apply_drift_deletes()ingest_datasource_idempotent()validate_modelstoolingest_datasource_models(idempotent)POST /validate-modelsPOST /ingestslayer validate-modelsslayer validate-models --force-clean [--yes]slayer ingest--force-cleanis intentionally CLI-only — destructive auto-application is opt-in at the human-typed layer.Cascade rules implemented
A drop on
M.Xcascades through:Column.sqlonMreferencingX(transitive across chains of derived columns).ModelMeasure.formulaonMreferencingXor any other dropped measure.Mwhose local FK column isX; joins onKwhosetarget_model == Mandforeign_column == X.MreferencingX→ moved toremove_filters.__-delimited path aliases).source_querieschains transitively referencing droppedM.Xor whole-droppedM→ whole-model drop.drop_columnis emitted.Cascade walking stays strictly within the parent datasource. The collapse rule preempts an
EditModelDeletewith aWholeModelDeleteon the same model.Test plan
tests/test_validate_models.py(30 tests)tests/test_idempotent_ingestion.py(10 tests, covering preservation ofdescription/label/format/metaon existing columns,sql-mode and query-backed skip, per-model error capture, exclude-table scoping, type-drift two-pass invariant)SchemaDriftErrorquery-time wrap —tests/test_schema_drift_error.py(5 tests, covering wrap-on-drift, no-drift passthrough, healthy-path zero overhead, attribution-failure swallowing, models_touched computation)apply_drift_deletes+ CLI--force-clean—tests/test_force_clean.py(8 tests, including subprocess CLI integration: prompt-aborts-on-n,--yesskips prompt)tests/integration/test_schema_drift_duckdb.py(3 tests, real DDL mutations)Out of scope (per issue)
sql-mode projection invariant — tracked separately in DEV-1358.🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
User Experience
Documentation
Tests