# Pydantic V2 demo

This notebook explores and demonstrates the features and claims announced in the v2 plan (https://docs.pydantic.dev/latest/blog/pydantic-v2/),
from the perspective of Pydantic v2.2.

In [1]:
from pydantic import BaseModel, ValidationError, Field
from typing import Self, List
import pytest


In [2]:
! pip freeze | grep pydantic

pydantic==2.2.0
pydantic_core==2.6.0


## Performance gains in V2

Using Rust (via PyO3) underneath brings:

- Performance gains of roughly an order of magnitude (about 10x, ranging from 5x to 50x)
- Multithreading
- Reuse of existing Rust libraries
- More explicit error handling (within Rust)


## Namespace clean-up

All methods on models will start with `model_`, and field names will not be allowed to start with "model" (aliases can be used if required). This will:

 - avoid confusing gotchas when field names clash with methods on a model
 - make it safer to add more methods to a model without risking new clashes

```python

class BaseModel:
    model_fields: List[FieldInfo]
    """previously `__fields__`, although the format will change a lot"""
    @classmethod
    def model_validate(cls, data: Any, *, context=None) -> Self:
        """
        previously `parse_obj()`, validate data
        """
    @classmethod
    def model_validate_json(
        cls,
        data: str | bytes | bytearray,
        *,
        context=None
    ) -> Self:
        """
        previously `parse_raw(..., content_type='application/json')`
        validate data from JSON
        """
    @classmethod
    def model_is_instance(cls, data: Any, *, context=None) -> bool:
        """
        new, check if data is valid for the model
        """
    @classmethod
    def model_is_instance_json(
        cls,
        data: str | bytes | bytearray,
        *,
        context=None
    ) -> bool:
        """
        Same as `model_is_instance`, but from JSON
        """
    def model_dump(
        self,
        include: ... = None,
        exclude: ... = None,
        by_alias: bool = False,
        exclude_unset: bool = False,
        exclude_defaults: bool = False,
        exclude_none: bool = False,
        mode: Literal['unchanged', 'dicts', 'json-compliant'] = 'unchanged',
        converter: Callable[[Any], Any] | None = None
    ) -> Any:
        """
        previously `dict()`, as before
        with new `mode` argument
        """
    def model_dump_json(self, ...) -> str:
        """
        previously `json()`, arguments as above
        effectively equivalent to `json.dumps(self.model_dump(..., mode='json'))`,
        but more performant
        """
    def model_json_schema(self, ...) -> dict[str, Any]:
        """
        previously `schema()`, arguments roughly as before
        JSON schema as a dict
        """
    def model_update_forward_refs(self) -> None:
        """
        previously `update_forward_refs()`, update forward references
        """
    @classmethod
    def model_construct(
        cls,
        _fields_set: set[str] | None = None,
        **values: Any
    ) -> Self:
        """
        previously `construct()`, arguments roughly as before
        construct a model with no validation
        """
    @classmethod
    def model_customize_schema(cls, schema: dict[str, Any]) -> dict[str, Any]:
        """
        new, way to customize validation,
        e.g. if you wanted to alter how the model validates certain types,
        or add validation for a specific type without custom types or
        decorated validators
        """
    class ModelConfig:
        """
        previously `Config`, configuration class for models
        """

```
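
The listing above is quoted from the plan, and some names changed before the release: 2.x has no `model_is_instance`, `update_forward_refs()` became `model_rebuild()`, and the inner `Config` class became the `model_config` attribute. A minimal sketch of the corresponding calls as they exist in Pydantic 2.x follows; the `Reading` model is a hypothetical example, not something from the plan.

```python
from pydantic import BaseModel, ConfigDict


class Reading(BaseModel):  # hypothetical model, for illustration only
    model_config = ConfigDict(extra="forbid")  # replaces the v1 inner `Config` class
    value: int


r = Reading.model_validate({"value": "3"})            # v1: parse_obj()
r_json = Reading.model_validate_json('{"value": 3}')  # v1: parse_raw()

print(r.model_dump())               # v1: dict()
print(r.model_dump_json())          # v1: json()
print(Reading.model_json_schema())  # v1: schema(); called on the class
print(list(Reading.model_fields))   # v1: __fields__; a dict of name -> FieldInfo

unchecked = Reading.model_construct(value="not an int")  # v1: construct(), no validation
Reading.model_rebuild()             # v1: update_forward_refs()
```

Note that `model_construct` skips validation entirely, so `unchecked.value` stays a string.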

##  Strict mode

In strict mode, data is not coerced; an error is raised instead.


In [3]:
class Energy(BaseModel):
    value: int  # energy value in Wh

    @classmethod
    def from_kwh(cls, kwh: int) -> Self:
        # 1 kWh = 1000 Wh; integer arithmetic keeps the value an int
        return cls(value=kwh * 1000)

In [4]:
Energy(value="3") # data coerced

Energy(value=3)

In [5]:
# testing model validate json
Energy.model_validate_json("{\"value\": 100}")

Energy(value=100)

In [6]:
class EnergyStrictMode(BaseModel):
    model_config = dict(strict=True)
    value: int  # energy value in Wh

    @classmethod
    def from_kwh(cls, kwh: int) -> Self:
        # 1 kWh = 1000 Wh; an int result also passes strict validation
        return cls(value=kwh * 1000)

In [7]:
with pytest.raises(ValidationError):
    EnergyStrictMode(value="3")

## Formalized Conversion table

(to link from the blog)

It addresses inconsistencies around data conversion.

If the input data has a single and intuitive representation in the field's type and no data is lost during the conversion, then the data will be converted; otherwise a validation error is raised.

String fields are the exception to this rule:
 only **str, bytes and bytearray** are valid as inputs to string fields.


In [8]:
class WithStringFields(BaseModel):
    s1: str
    s2: str

with pytest.raises(ValidationError):
    WithStringFields(s1=5, s2="5")

WithStringFields(s1="5", s2=b"test")


WithStringFields(s1='5', s2='test')
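
The "no data is lost" rule can be sketched with an `int` field in the default (lax) mode. `Sample` and `accepts` are hypothetical names used only for this illustration.

```python
from pydantic import BaseModel, ValidationError


class Sample(BaseModel):  # hypothetical model
    n: int


def accepts(value) -> bool:
    """Return True if `value` is accepted for the int field (hypothetical helper)."""
    try:
        Sample(n=value)
    except ValidationError:
        return False
    return True


print(accepts("3"))      # numeric string: one obvious int representation
print(accepts(3.0))      # float without a fractional part: nothing is lost
print(accepts(3.5))      # converting would drop the fractional part
print(accepts("three"))  # no int representation at all
```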

## Builtin JSON support

pydantic-core can parse JSON directly into a model or output type. This both:

- improves performance
- avoids issues with strictness

Pydantic V2 therefore allows some conversions when validating JSON directly, even in strict mode (e.g. ISO8601 string -> datetime, str -> bytes), even though these would not be allowed when validating a Python object.

In the future, direct validation of JSON will also allow (maybe in 2.1):
- parsing in a separate thread while starting validation in the main thread
- line numbers from JSON to be included in the validation errors


In [9]:
# TODO example (let's get a tweet json and try to use it also for aliases)
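
Until the tweet example is in place, here is a small sketch of the strict-mode difference between JSON and Python input. It assumes a strict model with a `datetime` field; `Event` and `when` are hypothetical names.

```python
from datetime import datetime

from pydantic import BaseModel, ConfigDict, ValidationError


class Event(BaseModel):  # hypothetical model
    model_config = ConfigDict(strict=True)
    when: datetime


# JSON has no datetime type, so an ISO8601 string is accepted even in strict mode.
from_json = Event.model_validate_json('{"when": "2023-08-01T12:00:00"}')
print(from_json.when)

# The same string inside a Python dict is rejected in strict mode.
try:
    Event.model_validate({"when": "2023-08-01T12:00:00"})
    python_str_accepted = True
except ValidationError:
    python_str_accepted = False
print(python_str_accepted)
```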

## Required vs nullable clean-up

A nullable field (one that accepts None as a value) can also be required, in which case None has to be passed explicitly.


In [10]:
# Required vs nullable cleanup
# https://pydantic-docs.helpmanual.io/usage/models/#required-vs-nullable-cleanup

from typing import Optional
from pydantic import BaseModel


class Foo(BaseModel):
    f1: str  # required, cannot be None
    f2: str | None  # required, can be None - same as Optional[str] / Union[str, None]
    f22: Optional[str]  # required, can be None (in Pydantic v1 this defaulted to None)
    f3: str | None = None  # not required, can be None
    f4: str = 'Foobar'  # not required, but cannot be None



In [11]:
Foo(f1="test", f2="123", f22="22")
Foo(f1="test", f2=None, f22="22")

with pytest.raises(ValidationError):
    Foo(f1="test", f2="123")



## Validation without a model

In Pydantic V1 the core of all validation was a Pydantic model. This led to:
 - a performance penalty
 - extra complexity when the output data type was not a model

In V2, pydantic-core operates on a tree of validators with no model type required at the base of that tree. It can therefore validate a single string or datetime value, a TypedDict, or a model equally easily.

In [12]:
# TODO example (and move it later) in the presentation
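
As a placeholder for that example, the sketch below uses `TypeAdapter`, which Pydantic 2.x provides for validating arbitrary types without a model. The `Point` TypedDict is a hypothetical type; it is imported from `typing_extensions` because Pydantic asks for that variant on Python versions before 3.12.

```python
from datetime import datetime
from typing import List

from pydantic import TypeAdapter, ValidationError
from typing_extensions import TypedDict


class Point(TypedDict):  # hypothetical type, no BaseModel involved
    x: int
    y: int


ints = TypeAdapter(List[int]).validate_python(["1", 2, 3.0])
when = TypeAdapter(datetime).validate_json('"2023-08-01T12:00:00"')
point = TypeAdapter(Point).validate_python({"x": "1", "y": 2})
print(ints, when, point)

# Errors are reported with the same ValidationError used for models.
try:
    TypeAdapter(int).validate_python("not a number")
    error_count = 0
except ValidationError as exc:
    error_count = exc.error_count()
print(error_count)
```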

## Wrap validators

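A wrap validator runs logic both before and after the inner validation (the handler). It can catch the handler's error and then raise a new error or return a default.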

In [13]:
from pydantic import field_validator


class Energy(BaseModel):
    value: int  # energy value in Wh

    @field_validator("value", mode="wrap")
    @classmethod
    def validate_value(cls, value, handler):
        if value == "null":  # runs before the inner validation
            return 0
        try:
            return handler(value)  # inner (standard) validation
        except ValidationError:
            return 0  # fall back to a default when the inner validation fails

    @classmethod
    def from_kwh(cls, kwh: int) -> Self:
        # 1 kWh = 1000 Wh
        return cls(value=kwh * 1000)



In [14]:
Energy(value="null")

Energy(value=0)

## Validation using context

In [15]:
import json
from pydantic import field_validator

class User(BaseModel):
    id: int
    name: str

    @field_validator("id")
    @classmethod
    def check_user_in_vip(cls, v, info):
        # info.context is the dict passed as `context=` to the validate call
        if v not in info.context["vip_ids"]:
            raise ValueError("user is not in vip list")
        return v


In [16]:
vip_ids = [1, 2, 3]
User.model_validate_json(json.dumps({"id": 1, "name": "John"}),
                         context={"vip_ids": vip_ids})

User(id=1, name='John')

In [17]:
with pytest.raises(ValidationError):
    User.model_validate_json(json.dumps({"id": 4, "name": "John"}),
                             context={"vip_ids": vip_ids})


## More powerful alias(es)

Aliases can now be paths as well as simple strings, which makes it possible to flatten nested data as it is validated.

In [18]:
from pydantic import AliasPath

class FooSimplePath(BaseModel):
    bar: str = Field(validation_alias=AliasPath("al-bar"))


class FooLongerPath(BaseModel):
    bar: str = Field(validation_alias=AliasPath('baz', 2, 'qux'))

data = {
    'al-bar': "simple",
    'baz': [
        {'qux': 'a'},
        {'qux': 'b'},
        {'qux': 'longer'},
        {'qux': 'd'},
    ]
}

assert FooSimplePath(**data).bar == "simple"
assert FooLongerPath(**data).bar == "longer"

In [19]:
# Another (maybe better) alias example
import json
from pprint import pprint


class TweetSimplified(BaseModel):
    id: str = Field(alias='id_str')
    text: str
    user_id: int = Field(validation_alias=AliasPath('user', 'id'))
    url: str = Field(validation_alias=AliasPath("entities", "urls", 0, "unwound", "url"))

with open("tweet.json", "r", encoding="utf-8") as tweet_file:
    tweet = TweetSimplified(**json.load(tweet_file))

pprint(tweet.model_dump())

{'id': '850006245121695744',
 'text': '1/ Today we’re sharing our vision for the future of the Twitter API '
         'platform!\n'
         'https://t.co/XweGngmxlP',
 'url': 'https://cards.twitter.com/cards/18ce53wgo4h/3xo1c',
 'user_id': 2244994945}


## Recursive models

In [20]:
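class Energy(BaseModel):
    offset: int = 0
    # Partial energy from different sources (contributors).
    # NOTE: `Energy` resolves here only because an earlier cell already defined a class
    # with that name; in a fresh module the annotation would have to be List["Energy"].
    slots: List[Energy] = []

    @classmethod
    def from_kwh(cls, kwh: int) -> Self:
        return cls(offset=kwh * 1000)  # 1 kWh = 1000 Wh

    def simplify(self) -> Self:
        offset_ = sum(slot.offset for slot in self.slots)
        return Energy(offset=offset_, slots=[])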

In [21]:
e = Energy(offset=0) # .model_dump()
e.slots.append(e)



In [22]:
e.model_dump()

{'offset': 0, 'slots': [{}]}

In [23]:
e.simplify().model_dump()

{'offset': 0, 'slots': []}
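
For comparison, here is a sketch of a self-referencing model written with a string forward reference, so it does not depend on a previously defined class. `Node` and `total` are hypothetical names. The last part appends the model to its own `slots`, as in the cell above; with a true self-reference, serialization is expected to raise an error on the cycle rather than recurse forever.

```python
from typing import List

from pydantic import BaseModel


class Node(BaseModel):  # hypothetical model
    offset: int = 0
    slots: List["Node"] = []  # string forward reference to the class itself

    def total(self) -> int:
        """Sum this node's offset and the offsets of all nested slots."""
        return self.offset + sum(slot.total() for slot in self.slots)


tree = Node(offset=1, slots=[{"offset": 2}, {"offset": 3, "slots": [{"offset": 4}]}])
print(tree.total())
print(tree.model_dump())

# A model that contains itself cannot be dumped to a finite dict.
loop = Node()
loop.slots.append(loop)
try:
    loop.model_dump()
    cycle_error = None
except (ValueError, RecursionError) as exc:
    cycle_error = str(exc)
print(cycle_error)
```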

## Generics

## Serialization

## Migration

https://docs.pydantic.dev/dev-v2/migration/

Let's try migrating the tweet models.