# Pydantic V2 demo

[https://github.com/Wajih-O/pydantic-v2-demo](https://github.com/Wajih-O/pydantic-v2-demo)

This notebook explores and demonstrates the features and updates announced in the v2 plan (https://docs.pydantic.dev/latest/blog/pydantic-v2/) from a Pydantic v2(.2) perspective.

In [1]:

from pydantic import BaseModel, ValidationError, Field
from typing import Self, Optional
import pytest
import json


In [2]:
# ! pip freeze | grep pydantic

--
## V2 gained in performance

Using PyO3/Rust underneath (pydantic-core) brings:

- A gain in performance of roughly an order of magnitude (about 10x, ranging from 5x to 50x)
- Multithreading
- Reuse of Rust libraries
- More explicit error handling (within Rust)



## Architecture

## Namespace clean-up

All methods on models will start with `model_`, and field names will not be allowed to start with "model" (aliases can be used if required). This will:

 - avoid confusing gotchas when field names clash with methods on a model
 - make it safer to add more methods to a model without risking new clashes



```python

class BaseModel:
    model_fields: List[FieldInfo]
    """previously `__fields__`, although the format will change a lot"""
    @classmethod
    def model_validate(cls, data: Any, *, context=None) -> Self:
        """ previously `parse_obj()`, validate data"""
    @classmethod
    def model_validate_json(
        cls,
        data: str | bytes | bytearray,
        *,
        context=None
    ) -> Self:
        """
        previously `parse_raw(..., content_type='application/json')`
        validate data from JSON
        """
    @classmethod
    def model_is_instance(cls, data: Any, *, context=None) -> bool:
        """
        new, check if data is value for the model
        """
    @classmethod
    def model_is_instance_json(
        cls,
        data: str | bytes | bytearray,
        *,
        context=None
    ) -> bool:
        """
        Same as `model_is_instance`, but from JSON
        """
    def model_dump(
        self,
        include: ... = None,
        exclude: ... = None,
        by_alias: bool = False,
        exclude_unset: bool = False,
        exclude_defaults: bool = False,
        exclude_none: bool = False,
        mode: Literal['unchanged', 'dicts', 'json-compliant'] = 'unchanged',
        converter: Callable[[Any], Any] | None = None
    ) -> Any:
        """
        previously `dict()`, as before
        with new `mode` argument
        """
    def model_dump_json(self, ...) -> str:
        """
        previously `json()`, arguments as above
        effectively equivalent to `json.dumps(self.model_dump(..., mode='json'))`,
        but more performant
        """
    def model_json_schema(self, ...) -> dict[str, Any]:
        """
        previously `schema()`, arguments roughly as before
        JSON schema as a dict
        """
    def model_update_forward_refs(self) -> None:
        """
        previously `update_forward_refs()`, update forward references
        """
    @classmethod
    def model_construct(
        cls,
        _fields_set: set[str] | None = None,
        **values: Any
    ) -> Self:
        """
        previously `construct()`, arguments roughly as before
        construct a model with no validation
        """
    @classmethod
    def model_customize_schema(cls, schema: dict[str, Any]) -> dict[str, Any]:
        """
        new, way to customize validation,
        e.g. if you wanted to alter how the model validates certain types,
        or add validation for a specific type without custom types or
        decorated validators
        """
    class ModelConfig:
        """
        previously `Config`, configuration class for models
        """

```
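
For comparison with the plan above, here is a minimal sketch of the renamed methods as they can be called in a released 2.x version. The `Car` model and its `model_name` key are made up for illustration. The alias shows one way to accept an external key that starts with `model_` without giving a field that name.

```python
from pydantic import BaseModel, Field


class Car(BaseModel):
    # Hypothetical model: the incoming key starts with "model_",
    # so it is mapped onto a differently named field through an alias.
    name: str = Field(alias="model_name")
    seats: int = 4


car = Car.model_validate({"model_name": "T"})               # v1: parse_obj()
from_json = Car.model_validate_json('{"model_name": "T"}')  # v1: parse_raw()
assert car == from_json

as_dict = car.model_dump()                     # v1: dict()
as_alias_dict = car.model_dump(by_alias=True)  # keys use the alias
as_json = car.model_dump_json()                # v1: json()
schema = Car.model_json_schema()               # v1: schema()

print(as_dict)        # {'name': 'T', 'seats': 4}
print(as_alias_dict)  # {'model_name': 'T', 'seats': 4}
print(as_json)        # {"name":"T","seats":4}
```

Several names in the plan (for example `model_is_instance`) are not used here, since only the methods exercised elsewhere in this notebook are shown.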

##  Strict mode

In strict mode, data is not coerced; a validation error is raised instead.


In [3]:
class Energy(BaseModel):
    value: int  # energy value in Wh

    @classmethod
    def from_kwh(cls, kwh: int) -> Self:
        return cls(value=kwh * 1_000)  # 1 kWh = 1000 Wh

In [4]:
Energy(value="3") # data coerced

Energy(value=3)

In [5]:
# testing model validate json
Energy.model_validate_json("{\"value\": 100}")

Energy(value=100)

In [6]:
class EnergyStrictMode(BaseModel):
    model_config = dict(strict=True)
    value: int  # energy value in Wh

    @classmethod
    def from_kwh(cls, kwh: int) -> Self:
        return cls(value=kwh * 1_000)  # 1 kWh = 1000 Wh

In [7]:
with pytest.raises(ValidationError):
    EnergyStrictMode(value="3")
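
Strictness does not have to be switched on for a whole model. The sketch below, which uses a hypothetical `Reading` model, shows two narrower options: `Field(strict=True)` for a single field and the `strict=True` argument of `model_validate` for a single call.

```python
from pydantic import BaseModel, Field, ValidationError


class Reading(BaseModel):
    value: int                       # lax: "3" is coerced to 3
    count: int = Field(strict=True)  # strict for this field only


ok = Reading(value="3", count=3)

try:
    Reading(value="3", count="3")
except ValidationError as err:
    field_errors = [e["loc"] for e in err.errors()]

try:
    # strict=True here applies to every field, for this call only
    Reading.model_validate({"value": "3", "count": 3}, strict=True)
except ValidationError as err:
    call_errors = [e["loc"] for e in err.errors()]

print(ok)            # value=3 count=3
print(field_errors)  # [('count',)]
print(call_errors)   # [('value',)]
```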

## Formalized Conversion table

(to link from the blog)

It offers a solution to the inconsistency around data conversion.

If the input data has a single, intuitive representation in the field's type and no data is lost during the conversion, then the data is converted; otherwise a validation error is raised.

String fields are the exception to this rule:
 only **str, bytes and bytearray** are valid as inputs to string fields.


In [8]:
class WithStringFields(BaseModel):
    s1: str
    s2: str

with pytest.raises(ValidationError):
    WithStringFields(s1=5, s2="5")

WithStringFields(s1="5", s2=b"test")


WithStringFields(s1='5', s2='test')
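
The "no data is lost" part of the rule can be seen with an `int` field. This is a minimal sketch with a hypothetical `Measurement` model: a float with no fractional part is converted, while one that would be truncated is rejected.

```python
from pydantic import BaseModel, ValidationError


class Measurement(BaseModel):
    count: int


lossless = Measurement(count=3.0)  # 3.0 has an exact int representation

try:
    Measurement(count=3.5)  # converting would drop the fractional part
    lossy_rejected = False
except ValidationError:
    lossy_rejected = True

print(lossless, lossy_rejected)
```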

## Built-in JSON support

pydantic-core can parse JSON directly into a model or output type. This both:

- improves performance
- avoids issues with strictness

Pydantic V2 therefore allows some conversions when validating JSON directly, even in strict mode (e.g. ISO 8601 string -> datetime, str -> bytes), even though these would not be allowed when validating a Python object.




(to check and further detail)

In the future, direct validation of JSON will also allow (maybe in 2.1):
- Parsing in a separate thread while starting validation in the main thread
- Line numbers from JSON to be included in the validation errors

In [9]:
json_str = '{"s1": "s1", "s2": "s2", "t3": [1, 2, "third"]}'
class WithStringFieldsandTuple(WithStringFields):
    model_config=dict(strict=True)
    t3: tuple[int, int, str]

WithStringFieldsandTuple.model_validate_json(json_str).model_dump_json()

'{"s1":"s1","s2":"s2","t3":[1,2,"third"]}'

In [10]:
try:
    WithStringFieldsandTuple.model_validate(json.loads(json_str))
except ValidationError as e:
    print(e)

1 validation error for WithStringFieldsandTuple
t3
  Input should be a valid tuple [type=tuple_type, input_value=[1, 2, 'third'], input_type=list]
    For further information visit https://errors.pydantic.dev/2.2/v/tuple_type
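
The same asymmetry applies to the ISO 8601 case mentioned earlier in this section. In this sketch (the `Event` model is hypothetical), a strict model accepts a datetime string from JSON but rejects the same string when it arrives as a Python object.

```python
from datetime import datetime

from pydantic import BaseModel, ConfigDict, ValidationError


class Event(BaseModel):
    model_config = ConfigDict(strict=True)
    when: datetime


# JSON has no datetime type, so the string is accepted even in strict mode
from_json = Event.model_validate_json('{"when": "2023-08-01T12:00:00"}')

try:
    # a Python str is not a datetime, so strict mode refuses to convert it
    Event.model_validate({"when": "2023-08-01T12:00:00"})
    python_str_rejected = False
except ValidationError:
    python_str_rejected = True

print(from_json.when, python_str_rejected)
```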


## Required vs nullable clean-up

A nullable field (one that accepts None as a value) can also be required, in which case None must be passed explicitly.


In [11]:
# https://pydantic-docs.helpmanual.io/usage/models/#required-vs-nullable-cleanup

class Foo(BaseModel):
    f1: str  # required, cannot be None
    f2: str | None  # required, can be None - same as Optional[str] / Union[str, None]
    f22: Optional[str]  # required, can be None (in Pydantic v1 it would default to None)
    f3: str | None = None  # not required, can be None
    f4: str = 'Foobar'  # not required, but cannot be None


In [12]:
Foo(f1="test", f2="123", f22="22")
Foo(f1="test", f2=None, f22="22")

with pytest.raises(ValidationError):
    Foo(f1="test", f2="123")



## Validation without a model using TypeAdapter (formerly AnalyzedType)

In pydantic V1 the core of all validation was a pydantic model. This led to:

 - a performance penalty
 - extra complexity when the output data type was not a model

In V2, pydantic-core operates on a tree of validators with no model type required at the base of that tree. It can therefore validate a single string or datetime value, a TypedDict or a Model equally easily.

In [13]:
from dataclasses import dataclass
from pydantic import model_validator, TypeAdapter # TypeAdapter is the new name for AnalyzedType (https://github.com/pydantic/pydantic/issues/5580)

@dataclass
class Point:
    x: float
    y: float

@dataclass
class Circle:
    center: Point
    radius: float


@dataclass
class Square:
    center: Point
    side: float


class Rectangle(BaseModel):
    center: Point
    width: float
    height: float

    @model_validator(mode='after')
    def make_dimensions_positive(self) -> Self:
        # an "after" validator runs on the validated instance: flip negative sizes
        if self.width < 0:
            self.width *= -1
        if self.height < 0:
            self.height *= -1
        return self


In [14]:
simple_forms = TypeAdapter(Circle|Square|Rectangle)

for form in [{"center": {"x": 0, "y": 0}, "radius": 1},
             {"center": {"x": 0, "y": 0}, "side": 1},
             {"center": {"x": 0, "y": 0}, "width": 10, "height": -5}]:
    print(simple_forms.dump_json(simple_forms.validate_python(form)))

b'{"center":{"x":0.0,"y":0.0},"radius":1.0}'
b'{"center":{"x":0.0,"y":0.0},"side":1.0}'
b'{"center":{"x":0.0,"y":0.0},"width":10.0,"height":5.0}'


In [15]:
simple_forms.validate_python(Rectangle(center=Point(x=0, y=0), width=1, height=-4))

Rectangle(center=Point(x=0, y=0), width=1.0, height=4.0)
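
`TypeAdapter` is not limited to dataclasses and unions. As a minimal sketch of the "single value" case described above, the same API can validate a plain `list[int]` or a lone `datetime`:

```python
from datetime import datetime

from pydantic import TypeAdapter, ValidationError

int_list = TypeAdapter(list[int])
numbers = int_list.validate_python(["1", 2, 3.0])  # lax mode coerces each item
numbers_json = int_list.dump_json(numbers)

# a bare JSON string validated straight into a datetime
moment = TypeAdapter(datetime).validate_json('"2023-08-01T12:00:00"')

try:
    int_list.validate_python(["one"])
    rejected = False
except ValidationError:
    rejected = True

print(numbers, numbers_json, moment, rejected)
```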

## Wrap validators

A wrap validator can run logic both before and after the inner validation: it can catch the error, raise a new error or return a default.

In [16]:
from pydantic import field_validator


class Energy(BaseModel):
    value: int  # energy value in wh

    @field_validator("value", mode="wrap")
    @classmethod
    def validate_value(cls, value, handler):
        if value == "null":  # logic before the inner validation runs
            return 0
        try:
            return handler(value)  # run the standard int validation
        except ValidationError:
            return 0  # logic after: fall back to a default when validation fails

    @classmethod
    def from_kwh(cls, kwh: int) -> Self:
        return cls(value=kwh * 1_000)  # 1 kWh = 1000 Wh

Energy(value="null")

Energy(value=0)

## Validation using context

In [17]:
import json
from pydantic import field_validator

class User(BaseModel):
    id: int
    name: str

    @field_validator("id")
    def check_user_in_vip(cls, v, info):
        if v not in info.context["vip_ids"]:
            raise ValueError("user is not in vip list")
        return v


In [18]:
vip_ids = [1, 2, 3]
User.model_validate_json(json.dumps({"id": 1, "name": "John"}),
                        context = {"vip_ids": vip_ids})

User(id=1, name='John')

In [19]:
with pytest.raises(ValidationError):
    User.model_validate_json(json.dumps({"id": 4, "name": "John"}),
                        context = {"vip_ids": vip_ids})
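
A validator that indexes `info.context` directly would fail when the model is built without any context, because `info.context` is then `None`. The variant below is a sketch of one possible way to tolerate that; skipping the check when no VIP list is supplied is an assumption made for the example, not a recommendation.

```python
from pydantic import BaseModel, ValidationError, field_validator


class User(BaseModel):
    id: int
    name: str

    @field_validator("id")
    @classmethod
    def check_user_in_vip(cls, v, info):
        # info.context is None when the caller passes no context
        vip_ids = (info.context or {}).get("vip_ids")
        if vip_ids is not None and v not in vip_ids:
            raise ValueError("user is not in vip list")
        return v


no_context = User(id=4, name="John")  # no context, so the VIP check is skipped

try:
    User.model_validate({"id": 4, "name": "John"}, context={"vip_ids": [1, 2, 3]})
    rejected = False
except ValidationError:
    rejected = True

print(no_context, rejected)
```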


## More powerful alias(es)

Fields can support alias paths as well as simple string aliases, to flatten data as it's validated.

In [20]:
from pydantic import AliasPath

class FooSimplePath(BaseModel):
    bar: str = Field(validation_alias="al-bar")


class FooLongerPath(BaseModel):
    bar: str = Field(validation_alias=AliasPath('baz', 2, 'qux'))

data = {
    'al-bar': "simple",
    'baz': [
        {'qux': 'a'},
        {'qux': 'b'},
        {'qux': 'longer'},
        {'qux': 'd'},
    ]
}

assert FooSimplePath(**data).bar == "simple"
assert FooLongerPath(**data).bar == "longer"

In [21]:
from pydantic import AliasChoices

class FooPrecedenceRule(BaseModel):
    bar: str = Field(validation_alias=AliasChoices("al-bar", AliasPath('baz', 2, 'qux')))

assert FooPrecedenceRule(**data).bar == "simple"
data.pop('al-bar')
assert FooPrecedenceRule(**data).bar == "longer"



In [22]:
# Another (maybe better) alias example
import json
from pprint import pprint


class TweetSimplified(BaseModel):
    id: str = Field(alias='id_str')
    text: str
    user_id: int = Field(validation_alias=AliasPath('user', 'id'))
    url: str = Field(validation_alias=AliasPath("entities", "urls", 0, "unwound", "url"))  # todo: get the url list

with open("tweet.json", "r", encoding="utf-8") as tweet_file:
    tweet = TweetSimplified(**json.load(tweet_file))

pprint(tweet.model_dump())

{'id': '850006245121695744',
 'text': '1/ Today we’re sharing our vision for the future of the Twitter API '
         'platform!\n'
         'https://t.co/XweGngmxlP',
 'url': 'https://cards.twitter.com/cards/18ce53wgo4h/3xo1c',
 'user_id': 2244994945}


## Recursive models

A model with a reference to itself.

In [23]:
class Energy(BaseModel):
    offset: int = 0
    # partial energy from different sources (contributors); the quoted name makes
    # the annotation refer to this class rather than to an earlier `Energy`
    slots: list["Energy"] = Field(default_factory=list)

    @classmethod
    def from_kwh(cls, kwh: int) -> Self:
        return cls(offset=kwh * 1_000)  # 1 kWh = 1000 Wh

    def simplify(self):
        offset_ = sum(slot.offset for slot in self.slots)
        return Energy(offset=offset_, slots=[])

In [24]:
e = Energy(offset=0)
e.slots.append(e)  # circular reference; in Pydantic v1 it would raise a recursion error

In [25]:
# Pydantic v2 detects the cycle and raises a ValueError instead of recursing
with pytest.raises(ValueError):
    e.model_dump_json()
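
A self-referencing model also validates nested input recursively. This is a minimal sketch with a hypothetical `Node` model, where the quoted annotation is the forward reference to the class being defined:

```python
from pydantic import BaseModel, Field


class Node(BaseModel):
    offset: int = 0
    slots: list["Node"] = Field(default_factory=list)  # self-reference by name

    def total(self) -> int:
        # own offset plus the totals of all nested nodes
        return self.offset + sum(slot.total() for slot in self.slots)


tree = Node.model_validate(
    {"offset": 1, "slots": [{"offset": 2}, {"offset": 3, "slots": [{"offset": 4}]}]}
)
print(tree.total())  # 10
```

Every nested dict is turned into a `Node` instance, so methods such as `total()` are available at each level.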

## Generics

In [26]:
from typing import Generic, TypeVar

DataT = TypeVar('DataT')

class Stack(BaseModel, Generic[DataT]):
    """ a stack (pile) of data"""
    data: list[DataT] = Field(default_factory=list)

    def add(self, item: DataT):
        self.data.append(item)

    def pop(self) -> Optional[DataT]:
        if len(self.data):
            return self.data.pop()

    def __repr__(self):
        return f"Stack({self.data})"




In [27]:
class EnergyContributions(Stack[Energy]):
    def __repr__(self):
        return f"EnergyContributions({self.data})"

    def simplify(self) -> Self:
        offset_ = 0
        # pop() returns None only when the stack is empty
        while (energy := self.pop()) is not None:
            offset_ += energy.offset
        self.add(Energy(offset=offset_))
        return self

class PileOfInts(Stack[int]):
    def __repr__(self):
        return f"PileOfInts({self.data})"

    @property
    def sum(self):
        return sum(self.data)

    def reduce(self) -> Self:
        print(self.sum)
        total = 0  # avoid shadowing the built-in sum()
        # `is not None` keeps the loop going when a 0 is popped
        while (item := self.pop()) is not None:
            total += item
        self.add(total)
        return self

In [28]:
EnergyContributions(data = [Energy(offset=i) for i in range(10)]).simplify().model_dump()

{'data': [{'offset': 45, 'slots': []}]}

In [29]:
PileOfInts(data = [i for i in range(10)]).reduce().model_dump()

45


{'data': [45]}
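
The type parameter is more than a hint: it drives validation of the parametrized model. A short sketch with a hypothetical `Box` generic:

```python
from typing import Generic, TypeVar

from pydantic import BaseModel, Field, ValidationError

ItemT = TypeVar("ItemT")


class Box(BaseModel, Generic[ItemT]):
    items: list[ItemT] = Field(default_factory=list)


ints = Box[int](items=["1", 2])    # items are validated (and coerced) as int
strs = Box[str](items=["1", "2"])  # the same input shape, validated as str

try:
    Box[int](items=["not a number"])
    rejected = False
except ValidationError:
    rejected = True

print(ints, strs, rejected)
```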

### A recursive generics example

In [30]:
# a Tree example with recursive generics
from typing import Generic, TypeVar, Optional, Union

DataT = TypeVar('DataT')

class BinaryTree(BaseModel, Generic[DataT]):
    left: Optional[Union[DataT, "BinaryTree[DataT]"]] = None
    right: Optional[Union[DataT, "BinaryTree[DataT]"]] = None
    data: DataT

    def add_most_left(self, item: DataT):
        if self.left is None:
            self.left = item
        else:
            self.left.add_most_left(item)
    def add_most_right(self, item: DataT):
        if self.right is None:
            self.right = item
        else:
            self.right.add_most_right(item)
    def traverse(self):
        """ traverse depth first, (left to right) """
        if self.left:
            yield from self.left.traverse()
        yield self.data
        if self.right:
            yield from self.right.traverse()


In [31]:


# let's build a tree
tree :BinaryTree[int] = BinaryTree(left=BinaryTree(data=1), data=2, right=BinaryTree(data=3))
tree.add_most_right(BinaryTree(data=4))
# let's check the data with a traverse

assert list(tree.traverse())==list(range(1,5))

BinaryTree[int].model_validate_json(tree.model_dump_json())


BinaryTree[int](left=BinaryTree[int](left=None, right=None, data=1), right=BinaryTree[int](left=None, right=BinaryTree[int](left=None, right=None, data=4), data=3), data=2)

## Serialization

In v1, serialization asks the value how to serialize itself; in v2, it asks the field's type annotation.
This solves the **"do not ask the type"** problem.

In [32]:
class PublicCustomer(BaseModel):
    id: int
    name: str

class PrivateCustomer(PublicCustomer):
    vat_number: str = Field(validation_alias=AliasPath("vat", "number"))
    email: str = Field(validation_alias=AliasPath("contact", "email"))
    phone: str = Field(validation_alias=AliasPath("contact", "phone"))

class PublicAccount(BaseModel):
    account_id: int
    customer: PublicCustomer

class PrivateAccount(BaseModel):
    account_id: int
    customer: PrivateCustomer


In [33]:

private_customer = PrivateCustomer.model_validate({"vat": {"number": "123"}, "id":"123", "name": "John", "contact": {"email": "abc@abc.com", "phone": "123456789"}})

# it doesn't serialize the private fields since the model is PublicAccount
# it doesn't ask the value which is private_customer but rather the type of the field which is PublicCustomer
print(PublicAccount(account_id=1, customer=private_customer).model_dump())

{'account_id': 1, 'customer': {'id': 123, 'name': 'John'}}
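
When the v1-style behaviour is wanted for a particular field, `SerializeAsAny` can be used to opt back into serializing from the runtime value. A minimal sketch, with reduced versions of the customer models so the block stands alone:

```python
from pydantic import BaseModel, SerializeAsAny


class PublicCustomer(BaseModel):
    id: int
    name: str


class PrivateCustomer(PublicCustomer):
    email: str


class ByAnnotation(BaseModel):
    customer: PublicCustomer  # serialized according to the annotation


class ByValue(BaseModel):
    customer: SerializeAsAny[PublicCustomer]  # serialized from the runtime value


customer = PrivateCustomer(id=1, name="John", email="abc@abc.com")
by_annotation = ByAnnotation(customer=customer).model_dump()
by_value = ByValue(customer=customer).model_dump()

print(by_annotation)  # {'customer': {'id': 1, 'name': 'John'}}
print(by_value)       # {'customer': {'id': 1, 'name': 'John', 'email': 'abc@abc.com'}}
```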


## Migration

https://docs.pydantic.dev/dev-v2/migration/

Code transformation tool: bump-pydantic (a Python package)

```bash 
pip install bump-pydantic
```
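
To give an idea of the kind of changes a migration involves, here is a hand-written sketch (not output produced by bump-pydantic) of a small v2 model, with the v1 spelling noted in comments. The `Tweet` model here is a simplified stand-in.

```python
from pydantic import BaseModel, ConfigDict, ValidationError, field_validator


class Tweet(BaseModel):
    # v1: class Config: anystr_strip_whitespace = True
    model_config = ConfigDict(str_strip_whitespace=True)

    text: str

    # v1: @validator("text")
    @field_validator("text")
    @classmethod
    def text_not_empty(cls, v: str) -> str:
        if not v:
            raise ValueError("text must not be empty")
        return v


tweet = Tweet.model_validate({"text": "  hello  "})  # v1: Tweet.parse_obj(...)
dumped = tweet.model_dump()                           # v1: tweet.dict()

try:
    Tweet(text="   ")  # stripped to "" first, then rejected by the validator
    blank_rejected = False
except ValidationError:
    blank_rejected = True

print(dumped, blank_rejected)
```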

In [34]:
! bump-pydantic --help


 Usage: bump-pydantic [OPTIONS] PATH COMMAND [ARGS]...

 Convert Pydantic from V1 to V2 ♻️
 Check the README for more information:
 https://github.com/pydantic/bump-pydantic.

╭─ Arguments ──────────────────────────────────────────────────────────────────╮
│ *    path      PATH  [default: None] [required]                              │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Opti

Let's try migrating the tweet models.

In [35]:
! bump-pydantic  --log-file tweet_model_migration_log.txt --diff ./tweet_v1.py


[15:17:29] Start bump-pydantic.                                       main.py:60
           Found 1 files to process                                   main.py:75
Looking for Pydantic Models... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
Executing codemods... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

In [36]:
# let's try migrating a bigger project ...