S1: schema versioning + read-time migration chain (#51) #57

Merged: ZmeiGorynych merged 3 commits into main from egor/data-model-change on Apr 29, 2026

@ZmeiGorynych ZmeiGorynych commented Apr 29, 2026

Summary

  • Adds a version: int = 1 field to every persisted entity (SlayerModel, SlayerQuery, DatasourceConfig) and a converter chain in slayer/storage/migrations.py that upgrades older dicts before Pydantic validates them.
  • Foundational, additive prep for the upcoming S3 schema redesign — current schema is v1 and the chain is empty, so this is functionally a no-op today.
  • The migration hook lives on the Pydantic class itself (@model_validator(mode="before")), not on individual storage backends. Every call site that does Model.model_validate(dict) (YAMLStorage, SQLiteStorage, third-party backends registered via register_storage(), plus the HTTP API, MCP server, and dbt importer) picks up migrations transparently with zero backend-side changes.
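
As a rough illustration of the hook placement described above, here is a minimal sketch assuming Pydantic v2. The `Demo` entity, its `title` to `name` rename, and the toy `migrate()` body are hypothetical stand-ins, not the code in slayer/storage/migrations.py; only the `@model_validator(mode="before")` shape is taken from the PR description.

```python
from typing import Any

from pydantic import BaseModel, model_validator

# Hypothetical stand-in for the real registry in slayer/storage/migrations.py.
CURRENT_VERSIONS = {"Demo": 2}


def migrate(entity: str, data: Any) -> Any:
    """Toy walker: upgrade a v1 'Demo' dict to v2 by renaming 'title' to 'name'."""
    if not isinstance(data, dict):
        return data  # e.g. an already-built instance; nothing to migrate
    data = dict(data)  # work on a copy so the caller's payload is untouched
    if data.get("version", 1) == 1:
        if "title" in data:
            data["name"] = data.pop("title")
        data["version"] = CURRENT_VERSIONS[entity]
    return data


class Demo(BaseModel):
    version: int = CURRENT_VERSIONS["Demo"]
    name: str

    @model_validator(mode="before")
    @classmethod
    def _apply_schema_migrations(cls, data: Any) -> Any:
        # Runs before field validation, whichever backend produced the dict.
        return migrate(entity="Demo", data=data)


old = {"title": "orders"}        # legacy payload with no version key
demo = Demo.model_validate(old)  # the hook upgrades it before validation
```

Because the hook sits on the class, any caller that reaches `Demo.model_validate(...)` gets the upgrade without knowing migrations exist.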

Design notes

  • The issue suggests wiring conversion into each backend's get_* method. That would silently miss third-party backends and other entry points; instead the chain runs at the Pydantic-validation layer, which is the single point every persisted dict flows through. This is the same pattern as the existing DatasourceConfig._accept_user_alias (user → username rename), generalised.
  • Per-entity CURRENT_VERSIONS so each schema can evolve independently.
  • Forward tolerance: dicts with a higher version than this SLayer knows about pass through untouched (Pydantic's default extra="ignore" handles unknown fields).
  • Save path is unchanged — version defaults to CURRENT_VERSIONS[entity], so model_dump(mode="json", exclude_none=True) always emits the current value.
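
The design notes above can be sketched as a small per-entity registry plus a walker. This is a simplified sketch, not the actual module: the names `CURRENT_VERSIONS`, `register_migration`, `migrate`, and `_REGISTRY` come from this PR, but the `Demo` entity, its converters, and details such as raising `ValueError` on duplicates and `KeyError` on a missing converter are assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple

Converter = Callable[[dict], dict]

CURRENT_VERSIONS: Dict[str, int] = {"Demo": 3}  # hypothetical entity
_REGISTRY: Dict[Tuple[str, int], Converter] = {}


def register_migration(entity: str, source_version: int) -> Callable[[Converter], Converter]:
    """Register a converter that upgrades `entity` from source_version to the next one."""
    def deco(fn: Converter) -> Converter:
        key = (entity, source_version)
        if key in _REGISTRY:
            raise ValueError(f"duplicate migration for {key}")
        _REGISTRY[key] = fn
        return fn
    return deco


def migrate(entity: str, data: dict) -> dict:
    """Apply converters one step at a time until the current version is reached."""
    data = dict(data)                        # do not mutate the caller's dict
    current = data.setdefault("version", 1)  # a missing version is treated as v1
    target = CURRENT_VERSIONS[entity]
    while current < target:
        data = _REGISTRY[(entity, current)](data)  # KeyError if a step is missing
        current += 1
        data["version"] = current
    return data  # forward tolerance: a version above target passes through as-is


@register_migration(entity="Demo", source_version=1)
def _v1_to_v2(d: dict) -> dict:
    d = dict(d)
    d["name"] = d.pop("title")
    return d


@register_migration(entity="Demo", source_version=2)
def _v2_to_v3(d: dict) -> dict:
    return {**d, "tags": d.get("tags", [])}


upgraded = migrate(entity="Demo", data={"title": "orders"})
future = migrate(entity="Demo", data={"version": 9, "name": "x", "new_field": 1})
```

Here `upgraded` has walked v1 to v2 to v3 in order, while `future` is returned unchanged because its version is higher than the one this registry knows about.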

Test plan

  • tests/test_migrations.py: 17 new tests covering v1→v1 no-op, synthetic v0→v1 chain ordering, forward tolerance, duplicate-registration rejection, end-to-end migration through both YAMLStorage and SQLiteStorage, round-trip version preservation, and the existing user → username alias surviving the new before-validator.
  • poetry run pytest --ignore=tests/integration → 944 passed.
  • poetry run ruff check slayer/ tests/ → clean.
  • Reviewer: confirm docs/concepts/models.md Schema versioning section reads correctly and the new version row in the model fields reference is in the right place.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Automatic schema versioning for persisted entities: saved records include a version and older records are upgraded on load across storage and API paths.
  • Documentation

    • Added guidance on schema versioning, migration chains, and compatibility behavior.
  • Tests

    • Added comprehensive tests covering migrations, storage load/save round-trips, and validation of upgrade behavior.

ZmeiGorynych and others added 2 commits April 28, 2026 19:31
Foundational, additive prep for the upcoming S3 schema redesign. Every
persisted entity (SlayerModel, SlayerQuery, DatasourceConfig) now carries
a version: int field, and a per-entity converter registry runs older
dicts through a chain of pure dict->dict transforms before Pydantic
validates them. The chain is empty today (current schema is v1), so this
is a no-op functionally — but it puts the rails in place so S3 can ship
without breaking on-disk files.

The migration hook lives on the Pydantic class itself
(@model_validator(mode="before")), not inside each storage backend's
get_* method. As a result, every call site that does
Model.model_validate(dict) — YAMLStorage, SQLiteStorage, third-party
backends registered via register_storage(), the HTTP API, the MCP
server, and the dbt importer — picks up migrations transparently with
zero backend-side changes.

- New module slayer/storage/migrations.py with the registry, the
  migrate() walker, and CURRENT_VERSIONS per entity.
- Forward tolerance: dicts with a version higher than this SLayer knows
  about pass through untouched (Pydantic's default extra="ignore"
  handles unknown fields).
- tests/test_migrations.py covers v1->v1 no-op, synthetic v0->v1 chains,
  forward tolerance, end-to-end migration through both YAMLStorage and
  SQLiteStorage, and round-trip version preservation.
- docs/concepts/models.md gets a new Schema versioning section and a
  version row in the model fields table; CLAUDE.md gains a Key
  Conventions bullet pointing at it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Apr 29, 2026

Walkthrough

Adds schema versioning for persisted entities (SlayerModel, SlayerQuery, DatasourceConfig), a migrations registry module, and Pydantic pre-validation hooks that run migrations on load; documentation and tests for migration behaviors and storage round-trips are included.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation: CLAUDE.md, docs/concepts/models.md | Documented schema versioning: new version field (default 1), migration lifecycle, Pydantic-level migration hook, and central migration registry location. |
| Core Models & Queries: slayer/core/models.py, slayer/core/query.py | Added version: int = 1 to SlayerModel, DatasourceConfig, and SlayerQuery. Introduced @model_validator(mode="before") hooks that call schema migration before Pydantic validation; datasource aliasing merged into migrations hook. |
| Migration Infrastructure: slayer/storage/migrations.py | New migration registry: CURRENT_VERSIONS, register_migration(entity, source_version) decorator, and migrate(entity, data) that applies conversion steps until current version. |
| Tests: tests/test_migrations.py | New tests for migration registry behavior, validator integration, duplicate-registration errors, storage integration (YAML/SQLite load-time upgrade), and YAML round-trip persistence of version. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Validator as Pydantic Validator<br/>(mode="before")
    participant Migration as Migration System<br/>(slayer/storage/migrations.py)
    participant Model as Pydantic Model
    participant Storage

    rect rgba(100,150,255,0.5)
    Note over Client,Storage: Load flow (YAML/SQLite/API)
    Client->>Storage: Request persisted payload
    Storage-->>Validator: Raw dict input (version=N or missing)
    Validator->>Migration: migrate(entity, data)
    Migration->>Migration: Read data["version"] (default 1)
    loop Until current version reached
        Migration->>Migration: Apply converter vN → vN+1
        Migration->>Migration: Set data["version"]=vN+1
    end
    Migration-->>Validator: Migrated dict (version=current)
    Validator->>Model: Pydantic validation/create instance
    Model-->>Client: Validated model instance
    end

    rect rgba(100,255,150,0.5)
    Note over Client,Storage: Save flow
    Client->>Model: Model instance to persist
    Model->>Storage: Serialize with `version` included
    Storage-->>Client: Persisted (YAML/SQLite contains version)
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • DEV-1261: Meta fields #33: Touches SlayerModel and related model fields; overlaps changes to model schema and should be reviewed together.

Suggested reviewers

  • AivanF

Poem

🐰 I hopped through versions, one by one,

Converted fields till work was done.
From YAML burrows to SQLite den,
I stitched the past to present again.
Hooray — your data hops with zen!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 30.30%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'S1: schema versioning + read-time migration chain (#51)' directly and clearly summarizes the main changes: adding schema versioning and a read-time migration chain to the codebase. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
slayer/core/query.py (1)

306-309: Prefer keyword arguments in migration hook call.

Use named args when calling _migrate_schema here as well, matching repository conventions.

Proposed refactor
 def _apply_schema_migrations(cls, data: Any) -> Any:
-    return _migrate_schema("SlayerQuery", data)
+    return _migrate_schema(entity="SlayerQuery", data=data)

As per coding guidelines: Use keyword arguments for functions with more than 1 parameter.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/core/query.py` around lines 306 - 309, The call in the model validator
_apply_schema_migrations uses positional args for _migrate_schema; update the
call to use keyword args to match repo conventions (pass entity="SlayerQuery" and
data=data) while keeping the method signature and return value the same so
_apply_schema_migrations still returns the migrated data.
slayer/core/models.py (1)

219-223: Use keyword arguments for _migrate_schema(...) calls.

Both migration hook calls pass two arguments positionally; switch to keyword args for consistency/readability with project conventions.

Proposed refactor
 def _apply_schema_migrations(cls, data: Any) -> Any:
-    return _migrate_schema("SlayerModel", data)
+    return _migrate_schema(entity="SlayerModel", data=data)
@@
 def _apply_schema_migrations_and_aliases(cls, data: Any) -> Any:
-    data = _migrate_schema("DatasourceConfig", data)
+    data = _migrate_schema(entity="DatasourceConfig", data=data)

As per coding guidelines: Use keyword arguments for functions with more than 1 parameter.

Also applies to: 298-300

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/core/models.py` around lines 219 - 223, The _apply_schema_migrations
validator is calling _migrate_schema with positional args; change that call to
use keyword arguments (e.g., pass entity="SlayerModel" and data=data) for
readability and consistency with project conventions, and make the same change
for the DatasourceConfig migration hook (_apply_schema_migrations_and_aliases,
around lines 298 - 300), which also calls _migrate_schema positionally, so both
usages of _migrate_schema use explicit keyword parameters.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@slayer/storage/migrations.py`:
- Around line 61-77: The migrate() function mutates the caller-supplied dict
(e.g., via setdefault), so make a shallow copy of the input data immediately on
entry (e.g., data = dict(data) or data.copy()) and perform all subsequent
reads/writes against that copy; keep the rest of the logic (version lookup via
CURRENT_VERSIONS, migration loop using _REGISTRY, calling fn, updating current
and data["version"]) unchanged so callers’ payloads are not mutated.
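
To make the concern in this inline comment concrete, the following toy sketch contrasts an in-place `setdefault` with the copy-on-entry variant. Both function names are hypothetical and neither is the real `migrate()`; they only isolate the mutation behaviour being discussed.

```python
def migrate_in_place(data: dict) -> dict:
    # setdefault writes into the dict the caller handed in.
    data.setdefault("version", 1)
    return data


def migrate_copying(data: dict) -> dict:
    # A shallow copy on entry keeps top-level writes away from the caller.
    data = dict(data)
    data.setdefault("version", 1)
    return data


payload_a = {"name": "orders"}
migrate_in_place(payload_a)   # payload_a now carries a "version" key

payload_b = {"name": "orders"}
result_b = migrate_copying(payload_b)  # payload_b is left as it was
```

Note that `dict(data)` is a shallow copy: nested lists or dicts are still shared, so a converter that edits nested values in place would need to copy those too.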

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d233dfd5-a4e9-4784-8ea4-22cc7e57ec2b

📥 Commits

Reviewing files that changed from the base of the PR and between 0ee53e6 and 1c24147.

📒 Files selected for processing (6)
  • CLAUDE.md
  • docs/concepts/models.md
  • slayer/core/models.py
  • slayer/core/query.py
  • slayer/storage/migrations.py
  • tests/test_migrations.py

Comment thread slayer/storage/migrations.py
Avoid mutating caller-provided dicts in migrate() by copying on entry,
and switch _migrate_schema() call sites to keyword arguments per repo
convention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
slayer/core/query.py (1)

301-301: Bind query default version to the central version registry.

Line [301] hardcodes 1, which can silently drift from CURRENT_VERSIONS["SlayerQuery"] after a future schema bump.

Proposed change
-from slayer.storage.migrations import migrate as _migrate_schema
+from slayer.storage.migrations import CURRENT_VERSIONS as _CURRENT_VERSIONS
+from slayer.storage.migrations import migrate as _migrate_schema
@@
-    version: int = 1
+    version: int = _CURRENT_VERSIONS["SlayerQuery"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/core/query.py` at line 301, The hardcoded default version "version:
int = 1" should be bound to the central registry so it won't drift; change the
default to use CURRENT_VERSIONS["SlayerQuery"] (ensure CURRENT_VERSIONS is
imported or referenced where defined) and keep the type as int (e.g., version:
int = CURRENT_VERSIONS["SlayerQuery"]). Update any import or module-qualified
reference needed so the symbol resolves in slayer/core/query.py.
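
The drift this nitpick describes can be shown with a small sketch. It uses stdlib dataclasses as a stand-in for the Pydantic models, and the bump of `SlayerQuery` to version 2 is a hypothetical future state, not something this PR does.

```python
from dataclasses import dataclass

# Hypothetical future registry state: the schema has been bumped to v2.
CURRENT_VERSIONS = {"SlayerQuery": 2}


@dataclass
class HardcodedQuery:
    # Literal default: nobody remembered to update it during the bump.
    version: int = 1


@dataclass
class BoundQuery:
    # Default read from the registry when the class is defined.
    version: int = CURRENT_VERSIONS["SlayerQuery"]


stale = HardcodedQuery().version  # still the old literal
fresh = BoundQuery().version      # follows the registry
```

The default is evaluated once, at class-definition time, so binding it to the registry protects against a forgotten edit during a schema bump but does not track later runtime changes to the dict.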
slayer/storage/migrations.py (1)

27-49: Fail fast on invalid migration registrations.

register_migration() should reject unknown entities and invalid source versions at registration time (instead of failing later during migrate()).

Proposed change
 def register_migration(
     entity: str, source_version: int
 ) -> Callable[[Callable[[dict], dict]], Callable[[dict], dict]]:
@@
+    if entity not in CURRENT_VERSIONS:
+        raise KeyError(f"Unknown entity '{entity}' in register_migration()")
+    if source_version < 1:
+        raise ValueError("source_version must be >= 1")
+
     def deco(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
         key = (entity, source_version)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/storage/migrations.py` around lines 27 - 49, register_migration
currently defers validation until migrate() — update it to fail fast: inside
deco (before the duplicate check) validate that entity is a known entity (e.g.
exists in your canonical list like VALID_ENTITIES or _ENTITY_SCHEMAS) and that
source_version is an integer >= 0 and within the allowed range (use the module's
current/latest version mapping such as _LATEST_VERSIONS.get(entity) or add a
VALID_ENTITIES/LATEST map if missing) and raise ValueError for unknown entities
or out-of-range/invalid source_version; keep the existing duplicate-registration
check on _REGISTRY afterwards.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1d4905ea-bf70-47fa-9f77-e69422f31e37

📥 Commits

Reviewing files that changed from the base of the PR and between 1c24147 and f5f5a61.

📒 Files selected for processing (4)
  • slayer/core/models.py
  • slayer/core/query.py
  • slayer/storage/migrations.py
  • tests/test_migrations.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • slayer/core/models.py
  • tests/test_migrations.py
