stable_pydantic

stable_pydantic helps you avoid breaking changes and navigate schema migrations of your pydantic models.

Development is in progress, but the core works. A few pydantic config options are not yet supported in compatibility checks, and some error messages are still rough.

Quick start

In the snippet below, replace the models and the path with your own, and optionally adjust the kind of compatibility check you want.
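
For illustration, Root1 and Root2 stand in for ordinary pydantic models defined somewhere in your codebase. A hypothetical example (these names and fields are placeholders, not part of the library):

# models.py (hypothetical example)
from pydantic import BaseModel

class Root1(BaseModel):
    name: str
    amount: int = 0

class Root2(BaseModel):
    id: int
    tags: list[str] = []

In test.py you would import these from wherever they live; the rest of the file then looks like this: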

# test.py
import stable_pydantic as sp

# This is what you need:
MODELS = [Root1, Root2]
PATH = "./schemas"

# These are defaults you can tweak:
BACKWARD = True # Check for backward compatibility?
FORWARD = False # Check for forward compatibility?

# A test helps gate CI
def test_schemas():
    sp.skip_if_migrating() # See test below.

    # Assert that the schemas are unchanged
    sp.assert_unchanged_schemas(PATH, MODELS)

    # Assert that all the schemas are compatible
    sp.assert_compatible_schemas(
      PATH,
      MODELS,
      backward=BACKWARD,
      forward=FORWARD,
    )

# Another test regenerates a schema after a change.
# To run it:
# STABLE_PYDANTIC_MIGRATING=true pytest
def test_update_versioned_schemas(request):
    sp.skip_if_not_migrating()

    sp.update_versioned_schemas(PATH, MODELS)

That's it. Just make sure you are running the tests in CI and you are good to go. When you make a change that needs to be reflected in the schemas, run the tests with the environment variable STABLE_PYDANTIC_MIGRATING=true and a new version will be placed in the schemas folder. See below for more advanced manual migrations.

Options and features

You can introduce this library incrementally, as your needs grow:

  1. Reference files: just check the current version into git for easy reviewing.
  2. Compatibility check: store multiple versions, and require that they are backward compatible.
  3. Manual migrations: provide manual migrations for incompatible versions, and optionally delegate serialization to stable_pydantic; it will take care of versioning.

Features:

  • Backward compatibility checks (new clients read old data)
  • Forward compatibility checks (old clients read new data)
  • Basic, composite, union¹, literal, and nested types (enums are TODO)
  • Recursive models (both Self and late-bound string-y types)
  • Inheritance from other models
  • Field config: default, alias, discriminator, exclude, gt, min_length, ...
  • Model config: only extra for now (controls how extra fields are handled)
  • Unsupported config options are allowed as long as they are unchanged between the models being compared (please still report the warnings if you see them!)
  • Tested with the latest patch versions of pydantic 2.9, 2.10, 2.11, and 2.12
  • Tested on Python 3.10, 3.11, 3.12, 3.13

TODOs:

  • Quick stash of sample values to test upgrades
  • One-line fuzzing of migrations
  • Enums
  • Decorators

Non-goals:

  • stable_pydantic handles structure and built-in validation; you might still fail to deserialize data because of differing custom validation logic.

The library makes extensive use of defensive programming to surface issues at test-time, and is quite conservative to minimise the risk of errors at runtime, so if you hit an assertion please do report it!

Usage

stable_pydantic requires pydantic 2.9.2+

uv add stable_pydantic
# or
pip install stable_pydantic

Alternatives

  • JSON Schema: Use this when you need language-agnostic schema validation. Tools like json-schema-diff can help check for compatibility.
  • Protobuf / Avro / Thrift: Use these when you need cross-language schema definitions and have a build step for code generation. They have built-in schema evolution but require maintaining separate .proto/.avsc files.
  • stable_pydantic: Use this when Pydantic models are your source of truth and you want CI-integrated compatibility testing and migration without leaving Python.

Levels of stability

Level 1: reference files

At this level, you just dump your schema to a file and check it into git. This allows you to see all important schema changes in one place.

Folder structure:

schemas/
  Model1/current.py
  Model2/current.py

And the tests:

# test.py
def test_schemas():
    sp.skip_if_migrating()
  
    # Assert that the current schema is unchanged
    sp.assert_unchanged_schemas(PATH, MODELS)

# To run:
# STABLE_PYDANTIC_MIGRATING=true pytest
def test_regenerate_current(request):
    sp.skip_if_not_migrating()

    # Overwrite the schema with the new one.
    sp.regenerate_current(PATH, MODELS)

Level 2: backward compatible changes only

Here multiple schema versions are checked into git, and we check for compatibility between them. If you are checking for backward compatibility (the default), you will always be able to deserialize old data with your latest model.

+-----+-----+----------------------------------------------+
| Bwd | Fwd | Change                                       |
+-----+-----+----------------------------------------------+
|  ✓  |  ✓  | Reorder fields                               |
|  ✓  |  ✓* | Add optional field                           |
|  ✓* |  ✓  | Drop optional field                          |
|  ✗  |  ✓* | Add required field                           |
|  ✓* |  ✗  | Drop required field                          |
|  ✓  |  ✗  | Make required → optional (int → int | None)  |
|  ✗  |  ✓  | Make optional → required (int | None → int)  |
|  ✓  |  ✗  | Widen type (int → int | str)                 |
|  ✗  |  ✓  | Narrow type (int | str → int)                |
|  ✗  |  ✗  | Rename field (no alias)                      |
+-----+-----+----------------------------------------------+
  • Optional/required: indicates whether the field has a default value (not whether it can be None), per the pydantic docs.
  • Backward: New code can deserialize old data
  • Forward: Old code can deserialize new data (e.g., after a rollback)
  • *: If extra/unknown fields are set to allow or ignore (Pydantic default)
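
To make the table concrete, here is how the "add optional field" row plays out with plain pydantic models (a hypothetical v0/v1 pair, not taken from this repo):

from pydantic import BaseModel

class SettingsV0(BaseModel):
    name: str

class SettingsV1(BaseModel):
    name: str
    retries: int = 3  # new field with a default, i.e. "optional" in the table's sense

# Backward compatible: new code reads old data,
# the missing field simply falls back to its default.
SettingsV1.model_validate_json('{"name": "api"}')

# Forward compatible only with the asterisk: old code reads new data,
# and the unknown "retries" field is ignored (pydantic's default extra behaviour).
SettingsV0.model_validate_json('{"name": "api", "retries": 5}')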

How does this work?

  • The modules are loaded dynamically to ensure we are working with the real pydantic models (see the sketch below).
    • Note that this happens by running exec on Python files (which should be checked into git), so it has the same security properties as any other Python file in your repo.
    • If you are just doing compatibility checks, this loading only happens while running tests.
  • The pydantic models are inspected to determine compatibility.
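
Conceptually, loading a checked-in schema file is just a standard dynamic import, along these lines (a simplified sketch of the general technique, not the library's actual code; the path is hypothetical):

import importlib.util
from pathlib import Path

def load_schema_module(path: Path):
    # Compile and execute the file as a module; this is why it has the same
    # trust model as any other Python file checked into your repo.
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

v0 = load_schema_module(Path("schemas/Model1/v0_schema.py"))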

stable_pydantic is quite conservative in what it allows. By default, if something looks like it might cause you to fail to deserialize old data, the compatibility check will raise an error.

Folder structure:

schemas/
  Model1/
    current.py
    v0_schema.py
    v1_schema.py
  Model2/
    current.py
    v0_schema.py # Version numbers are unique across models.

The tests:

# test.py
def test_schemas():
    sp.skip_if_migrating()

    # Assert that the schemas are unchanged
    sp.assert_unchanged_schemas(PATH, MODELS)

    # Assert that all the schemas are compatible
    sp.assert_compatible_schemas(
      PATH,
      MODELS,
      backward=True, # default
      forward=False, # default
    )

# To run:
# STABLE_PYDANTIC_MIGRATING=true pytest
def test_update_versioned_schemas(request):
    sp.skip_if_not_migrating()

    # Add a new schema & update current.py
    sp.update_versioned_schemas(PATH, MODELS)

Level 3: Turing's migration

Just add a file to allow arbitrary migration logic. The mock models that have been checked into git are used to deserialize old values and then upgrade them stage by stage. This requires including a single version number in the serialized data, or providing it manually.

With greater power comes greater complexity. Do you need it?

To use this, just add a v0_to_1.py file or equivalent to the model folder, and import the relevant models!

The way this works: the data is deserialized with the latest version that is compatible with the original, the migration function is used to migrate it up, and the process repeats, skipping over any intermediate compatible versions and running the subsequent migrations.
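
In rough pseudocode, the upgrade chain behaves something like this (a simplified sketch of the idea, not the library's internals; version discovery and the "latest compatible version" optimisation are elided, and all names are hypothetical):

from typing import Callable
from pydantic import BaseModel

def migrate_to_current(
    version: int,
    data: dict,
    schemas: dict[int, type[BaseModel]],  # version -> checked-in schema class
    upgrades: dict[int, Callable],        # version -> upgrade() from the vN_to_M.py file
) -> BaseModel:
    latest = max(schemas)
    model = schemas[version].model_validate(data)  # parse with the old schema
    for v in range(version, latest):
        if v in upgrades:
            # explicit manual migration to the next version
            model = upgrades[v](model)
        else:
            # compatible step: the next schema can read the previous one's data directly
            model = schemas[v + 1].model_validate(model.model_dump())
    return model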

The folder structure:

schemas/
  Model1/
    current.py
    v0_schema.py
    v0_to_1.py # Just manually add this file!
    v1_schema.py
  Model2/
    current.py
    v0_schema.py
    v1_schema.py # If no upgrade file is present, it is checked to be backward compatible.

The migration file:

# v0_to_1.py
import v0_schema as v0
import v1_schema as v1

# The only requirement is an upgrade function
# mapping the old model to the new one.
def upgrade(old: v0.Settings) -> v1.Settings:
    return v1.Settings(name=old.name, amount=old.value)

The tests:

# test.py

# Are the same as the above.
# Migrations are automatically sanity-checked.

And to let stable_pydantic handle serialisation:

# app.py
import stable_pydantic as sp

# You can use this to handle versioned serialization
VERSION_HANDLER = sp.ModelEntry[Model1].open(PATH, Model1)

# Serialization:
# injects the version number into the JSON
value = Model1(...)
value_json: str = VERSION_HANDLER.to_json(value)

# Deserialization:
# unpacks and upgrades old versions
# automatically into the current model.
old_json = '{"version": 0, "data": {"name": "test"}}'
old: Model1 = VERSION_HANDLER.from_json(old_json)

Pro tip: you can jump over multiple versions with a migration, but any two consecutive versions with no migration between them are always checked for compatibility.

And that's a wrap! Thank you for reading this far. If you end up using or considering this, please open an issue to let me know how it went.

Footnotes

  1. If union types are nested, we can't reliably walk the union-tree without assuming all types named Optional or Union are the standard library types. To err on the side of caution we disallow this. This is fine though, as you basically never want to manually nest union types.
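
To make the footnote concrete, "manually nesting" means writing one union directly inside another in the annotation (a hypothetical illustration):

from typing import Union

flat: Union[int, str, float]           # a single flat union: supported
nested: Union[int, Union[str, float]]  # a union written inside another union: disallowed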
