# Intro to Pydantic V2

## Basic Model

In [2]:
from pydantic import BaseModel

In [3]:
class Person(BaseModel):
    first_name: str
    last_name: str
    age: int

As you can see, we define the data type of the fields in the model using Python type hints, and we inherit from `BaseModel`.

We can now create instances of this model in a variety of ways:

In [4]:
p = Person(first_name="John", last_name="Doe", age=30)

In [7]:
print(p)

first_name='John' last_name='Doe' age=30


Pydantic will also perform validation on your "input" data. In some cases it will attempt to coerce the input data to the proper type - when it cannot do so, it will raise a `pydantic.ValidationError` exception.

So, when a Pydantic model instance has just been created, you are guaranteed that its fields are of the types specified in the model.

In [8]:
from pydantic import ValidationError

In [11]:
try:
    Person(first_name="John", last_name="Doe", age="junk")
except ValidationError as e:
    print(e)

1 validation error for Person
age
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='junk', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/int_parsing
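To make the coercion side of this concrete, here is a minimal sketch (the variable names are just for illustration). It shows a numeric string being coerced to an `int` in the default mode, and the same input being rejected when strict validation is requested for a single call via the `strict` argument of `model_validate`:

```python
from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    first_name: str
    last_name: str
    age: int


# Default (lax) mode: the numeric string "30" is coerced to the integer 30.
coerced = Person(first_name="John", last_name="Doe", age="30")
print(type(coerced.age).__name__)

# Strict mode for one call: the same input is rejected instead of coerced.
raw = {"first_name": "John", "last_name": "Doe", "age": "30"}
try:
    Person.model_validate(raw, strict=True)
    strict_error_type = None
except ValidationError as exc:
    strict_error_type = exc.errors()[0]["type"]

print(strict_error_type)
```

Whether lax or strict mode is the better fit depends on how much you trust the source of the data.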


Fields in model instances can be accessed using object dot notation:

In [12]:
p = Person(first_name="John", last_name="Doe", age=30)
print(p)

first_name='John' last_name='Doe' age=30


In [13]:
p.first_name

'John'

We can even mutate these field values:

In [14]:
p.first_name = 'James'
print(p)

first_name='James' last_name='Doe' age=30


One caution here - by default Pydantic validates the data being deserialized, but does not validate data being changed via assignments (although you can modify Pydantic's behavior):

In [16]:
p.age = "unknown"
print(p)

first_name='James' last_name='Doe' age='unknown'
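As a sketch of how that behavior can be changed, the model configuration accepts a `validate_assignment` setting. The class name `CheckedPerson` below is just an illustrative name, not something used elsewhere in this notebook:

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class CheckedPerson(BaseModel):
    # Opt in to validation whenever a field is assigned.
    model_config = ConfigDict(validate_assignment=True)

    first_name: str
    last_name: str
    age: int


p = CheckedPerson(first_name="James", last_name="Doe", age=30)

# A coercible value is validated and converted on assignment.
p.age = "31"
print(repr(p.age))

# A value that cannot be coerced is rejected, and the old value is kept.
try:
    p.age = "unknown"
    assignment_rejected = False
except ValidationError:
    assignment_rejected = True

print(assignment_rejected, p.age)
```

This costs a little extra work on every assignment, which is presumably why it is not the default.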


## Validation Exceptions

Pydantic validates all the fields - it does not just stop at the first validation error, so a single `ValidationError` can report several problems at once.

So far we have just been printing the error message, but you can also get the list of errors as data using some special methods provided by `ValidationError` exceptions:

In [17]:
try:
    Person(first_name="John", last_name="Doe", age="junk")
except ValidationError as e:
    exceptions = e

In [18]:
exceptions.errors()

[{'type': 'int_parsing',
  'loc': ('age',),
  'msg': 'Input should be a valid integer, unable to parse string as an integer',
  'input': 'junk',
  'url': 'https://errors.pydantic.dev/2.10/v/int_parsing'}]

This gave us a Python list of dictionaries (one per error), and we can also get it back as JSON:

In [20]:
print(exceptions.json())

[{"type":"int_parsing","loc":["age"],"msg":"Input should be a valid integer, unable to parse string as an integer","input":"junk","url":"https://errors.pydantic.dev/2.10/v/int_parsing"}]


This is useful when you need to return the exceptions to a caller - for example if you have a REST API that needs to return any validation exceptions for JSON data that was submitted to your endpoints.
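As a rough sketch of that idea, the helper below (the name `validation_problems` and the shape of the returned dictionaries are my own assumptions, not a Pydantic convention) turns a `ValidationError` into a compact list a caller could send back. It also shows that several invalid fields are reported together:

```python
from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    first_name: str
    last_name: str
    age: int


def validation_problems(payload: dict) -> list[dict]:
    """Return a compact list of problems, or an empty list if the payload is valid."""
    try:
        Person.model_validate(payload)
    except ValidationError as exc:
        return [
            {
                # "loc" is a tuple of keys/indexes leading to the bad value.
                "field": ".".join(str(part) for part in err["loc"]),
                "message": err["msg"],
            }
            # include_url=False drops the documentation link from each entry.
            for err in exc.errors(include_url=False)
        ]
    return []


# Two fields are invalid here, and both are reported.
problems = validation_problems({"first_name": 123, "last_name": "Doe", "age": "junk"})
for problem in problems:
    print(problem)

no_problems = validation_problems({"first_name": "John", "last_name": "Doe", "age": 30})
```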

## Deserializing Data

We have two additional ways to "load" data to generate model instances.

This process of taking data in one format and generating a Python object is called deserialization.

Pydantic supports deserializing from a Python dictionary:

In [21]:
data = {
    'first_name': 'John',
    'last_name': 'Doe',
    'age': 30
}

p = Person.model_validate(data)
print(p)

first_name='John' last_name='Doe' age=30


It also supports deserializing from JSON:

In [22]:
data_json = '''
{
    "first_name": "John",
    "last_name": "Doe",
    "age": 30
}
'''

p = Person.model_validate_json(data_json)
print(p)

first_name='John' last_name='Doe' age=30
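One detail worth knowing, sketched below under the assumption that you want to handle bad input gracefully: when the string is not valid JSON at all, `model_validate_json` still reports the problem through a `ValidationError`, so one `except` clause covers both malformed JSON and invalid field values.

```python
from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    first_name: str
    last_name: str
    age: int


# The closing brace is missing, so this is not valid JSON.
broken_json = '{"first_name": "John", "last_name": "Doe", "age": 30'

try:
    Person.model_validate_json(broken_json)
    json_error_type = None
except ValidationError as exc:
    json_error_type = exc.errors()[0]["type"]

print(json_error_type)
```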


We can inspect the model's fields this way, to see how they are currently defined:

In [23]:
Person.model_fields

{'first_name': FieldInfo(annotation=str, required=True),
 'last_name': FieldInfo(annotation=str, required=True),
 'age': FieldInfo(annotation=int, required=True)}

## Required vs Optional Fields

One thing you'll notice in that output is that the fields are marked as **required**.

And indeed, if we try to deserialize data that is missing any of those fields, we'll get a `ValidationError` exception:

In [24]:
try:
    Person(age=42)
except ValidationError as e:
    print(e)

2 validation errors for Person
first_name
  Field required [type=missing, input_value={'age': 42}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
last_name
  Field required [type=missing, input_value={'age': 42}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing


In [25]:
data = {"age": 42}

try:
    Person.model_validate(data)
except ValidationError as e:
    print(e)

2 validation errors for Person
first_name
  Field required [type=missing, input_value={'age': 42}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
last_name
  Field required [type=missing, input_value={'age': 42}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing


To make a field optional, we simply specify a default value for it:

In [26]:
class Person(BaseModel):
    first_name: str
    last_name: str
    age: int = 0
    

Person.model_fields

{'first_name': FieldInfo(annotation=str, required=True),
 'last_name': FieldInfo(annotation=str, required=True),
 'age': FieldInfo(annotation=int, required=False, default=0)}

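You can also combine a default value with `Optional` from the `typing` module. The field is then optional because of its default, and `Optional` additionally allows it to be `None`: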

In [32]:
from typing import Optional

class Person(BaseModel):
    first_name: str
    last_name: str
    age: Optional[int] = 0
    
Person.model_fields

{'first_name': FieldInfo(annotation=str, required=True),
 'last_name': FieldInfo(annotation=str, required=True),
 'age': FieldInfo(annotation=Union[int, NoneType], required=False, default=0)}

As you can see, the `age` field is no longer a required field since it has a default value.

In [33]:
p = Person(first_name="John", last_name="Doe")
print(p)


first_name='John' last_name='Doe' age=0
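If you ever need to tell a defaulted value apart from one that was explicitly supplied, a model instance keeps track of which fields were set. A small sketch (variable names are illustrative):

```python
from pydantic import BaseModel


class Person(BaseModel):
    first_name: str
    last_name: str
    age: int = 0


defaulted = Person(first_name="John", last_name="Doe")
explicit = Person(first_name="John", last_name="Doe", age=0)

# Both instances hold age=0, but only one of them was given it explicitly.
print(defaulted.model_fields_set)
print(explicit.model_fields_set)

# exclude_unset=True leaves out fields that only have their default value.
only_supplied = defaulted.model_dump(exclude_unset=True)
print(only_supplied)

# FieldInfo objects can also be asked directly whether a field is required.
age_is_required = Person.model_fields["age"].is_required()
print(age_is_required)
```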


## Nullable Fields

We can also set defaults to `None`, but there you have to be a bit more careful - the type should technically allow `None`, since the type of `None` is not the same as, say, `str`.

So, we just need to amend our type hint accordingly:

In [41]:
class Person(BaseModel):
    first_name: str | None = None 
    last_name: str 
    age: int | None = None 
    

In [40]:
print(Person.model_fields)

{'first_name': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'last_name': FieldInfo(annotation=str, required=True), 'age': FieldInfo(annotation=Union[int, NoneType], required=False, default=None)}


As you can see from the inspection, `first_name` and `age` are optional fields (not required), and are nullable (can be set to `None`).

In [37]:
p = Person(last_name='Smith')
print(p)

first_name=None last_name='Smith' age=None


The notation `str | None` is just an alternative syntax available in more recent versions of Python (3.10 and later) - for older versions you can use the canonical way of doing this, using a `Union`:

In [42]:
from typing import Union

class Person(BaseModel):
    first_name: Union[str, None] = None 
    last_name: str
    age: Union[int, None] = None 

In [43]:
Person.model_fields

{'first_name': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
 'last_name': FieldInfo(annotation=str, required=True),
 'age': FieldInfo(annotation=Union[int, NoneType], required=False, default=None)}

In [45]:
print(Person(last_name='Smith'))

first_name=None last_name='Smith' age=None


You can also use the `Optional` type annotation - but I never use it with Pydantic, as I find that name makes it too easy to mistakenly think a field is optional. The `Optional` hint simply means **nullable** (can be `None`), not that the field is optional in the Pydantic sense.

In [46]:
from typing import Optional

class Person(BaseModel):
    first_name: Optional[str] = None 
    last_name: str
    age: Optional[int] = None 

In [47]:
Person.model_fields

{'first_name': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
 'last_name': FieldInfo(annotation=str, required=True),
 'age': FieldInfo(annotation=Union[int, NoneType], required=False, default=None)}

The `Optional` hint here has nothing to do with whether the field is optional or not:

In [48]:
class Test(BaseModel):
    name: Optional[str]
    age: int

In [49]:
Test.model_fields

{'name': FieldInfo(annotation=Union[str, NoneType], required=True),
 'age': FieldInfo(annotation=int, required=True)}

As you can see, `name` is a required field, not optional.

In [50]:
try:
    Test(age=42)
except ValidationError as e:
    print(e)

1 validation error for Test
name
  Field required [type=missing, input_value={'age': 42}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
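To round this out, here is a small sketch of the other half of the story: a required nullable field happily accepts an explicit `None`, it just cannot be left out. The model name `Contact` is a hypothetical stand-in for the `Test` model above:

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Contact(BaseModel):
    name: Optional[str]  # nullable, but still required (no default)
    age: int


# Passing None explicitly is fine, because the field is nullable.
explicit_none = Contact(name=None, age=42)
print(explicit_none)

# Omitting the field is not fine, because it is required.
try:
    Contact(age=42)
    missing_error_type = None
except ValidationError as exc:
    missing_error_type = exc.errors()[0]["type"]

print(missing_error_type)
```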


Pydantic fully supports Python's type hinting system, so you could for example specify a field to be a list of a certain type:

In [51]:
class Person(BaseModel):
    first_name: str | None = None 
    last_name: str 
    age: int | None = None 
    lucky_numbers: list[int] = []
    

In [52]:
Person.model_fields

{'first_name': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
 'last_name': FieldInfo(annotation=str, required=True),
 'age': FieldInfo(annotation=Union[int, NoneType], required=False, default=None),
 'lucky_numbers': FieldInfo(annotation=list[int], required=False, default=[])}

And the type coercion will apply to the elements of the list as well:

In [55]:
p = Person(last_name='Smith', lucky_numbers=[1, '2', 3.0])
print(p)

first_name=None last_name='Smith' age=None lucky_numbers=[1, 2, 3]


As you can see, the elements of the list were all successfully coerced to integers:

In [57]:
for number in p.lucky_numbers:
    print(f"{number} ({type(number).__name__})")

1 (int)
2 (int)
3 (int)
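Coercion only goes so far, though. As a sketch of what happens when an element cannot be converted without losing information (here, a float with a fractional part), the error location includes the index of the offending element:

```python
from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    last_name: str
    lucky_numbers: list[int] = []


try:
    # 1 and "2" can be coerced, but 3.5 has a fractional part.
    Person(last_name="Smith", lucky_numbers=[1, "2", 3.5])
    failures = []
except ValidationError as exc:
    failures = [(err["loc"], err["type"]) for err in exc.errors()]

print(failures)
```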


## Aliases and the Field Class

Sometimes the data we are attempting to deserialize uses names that we simply do not, or cannot use in our model.

For example consider this data we would like to model using Pydantic:

In [58]:
data = {
    "id": 100,
    "First Name": "John",
    "LASTNAME": "Smith",
    "age in years": 42,
}

Obviously, some of these field names we cannot even specify in Python (the ones with the spaces in the names).

To help with that, Pydantic has a way to define an alternative name to our field names, called **aliases**.

Here's how we would set up a model to handle that data:

In [59]:
from pydantic import Field

class Person(BaseModel):
    id_: int = Field(alias='id')
    first_name: str = Field(alias='First Name')
    last_name: str = Field(alias='LASTNAME')
    age: int = Field(alias='age in years')
    

In [60]:
p = Person.model_validate(data)
print(p)

id_=100 first_name='John' last_name='Smith' age=42


## Serialization

Pydantic models also give us the ability to **serialize** data models - that is, take the Python object and generate either a Python dictionary or a JSON string with the data:

In [62]:
p.model_dump()

{'id_': 100, 'first_name': 'John', 'last_name': 'Smith', 'age': 42}

In [63]:
p.model_dump_json()

'{"id_":100,"first_name":"John","last_name":"Smith","age":42}'

As you can see, serialization uses the field names, not the aliases to serialize. 

Since we have aliases, we could, if we wanted to, also serialize using the aliases instead of the field names:

In [64]:
p.model_dump(by_alias=True)

{'id': 100, 'First Name': 'John', 'LASTNAME': 'Smith', 'age in years': 42}

In [65]:
p.model_dump_json(by_alias=True)

'{"id":100,"First Name":"John","LASTNAME":"Smith","age in years":42}'
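If the names you receive differ from the names you want to emit, `Field` also accepts separate `validation_alias` and `serialization_alias` arguments. The sketch below uses made-up output names (`firstName`, `age`) purely for illustration:

```python
from pydantic import BaseModel, Field


class Person(BaseModel):
    # validation_alias is used when loading, serialization_alias when dumping.
    first_name: str = Field(
        validation_alias="First Name", serialization_alias="firstName"
    )
    age: int = Field(validation_alias="age in years", serialization_alias="age")


p = Person.model_validate({"First Name": "John", "age in years": 42})

by_field_name = p.model_dump()
by_alias = p.model_dump(by_alias=True)

print(by_field_name)
print(by_alias)
```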

## Field and Defaults

When we used the `Field` object to define an alias, we could no longer assign a default value directly to the field, since the right-hand side of the assignment is now taken up by the `Field` object.

However, we can use an argument in the `Field` object to define the default value:

In [66]:
class Person(BaseModel):
    first_name: str | None = Field(alias='firstname', default=None)
    last_name: str | None = Field(alias='lastname')

In [67]:
data = {
    'lastname': 'Smith'
}

p = Person.model_validate(data)
print(p)

first_name=None last_name='Smith'


When we specify an alias, we **must** (by default) use the alias when deserializing data.

For example, this will not work:

In [68]:
try:
    Person(last_name='Smith')
except ValidationError as e:
    print(e)

1 validation error for Person
lastname
  Field required [type=missing, input_value={'last_name': 'Smith'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing


In [69]:
data = {
    'last_name': 'Smith'
}

try:
    Person.model_validate(data)
except ValidationError as e:
    print(e)

1 validation error for Person
lastname
  Field required [type=missing, input_value={'last_name': 'Smith'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing


## Model Config: Populate by Name

We can, however, configure our model to allow population by not just the alias, but the field name as well.

To do that we have to provide a model configuration. We do this by adding a special `model_config` attribute to our model:

In [71]:
from pydantic import ConfigDict

class Person(BaseModel):
    model_config = ConfigDict(populate_by_name=True)
    
    first_name: str | None = Field(alias='firstname', default=None)
    last_name: str = Field(alias='lastname')

And now we can deserialize using either the alias or the field name:

In [72]:
p = Person(first_name='John', last_name='Smith')
print(p)

first_name='John' last_name='Smith'


In [73]:
data = {
    'first_name': 'John',
    'last_name': 'Smith'
}

p = Person.model_validate(data)
print(p)

first_name='John' last_name='Smith'


## Mutable Defaults

Returning to defaults, one thing Pydantic can handle is setting default values to mutable objects - something that is usually problematic in Python, and disallowed (by default) in dataclasses.

So, defining defaults this way is perfectly acceptable in Pydantic (Pydantic basically identifies mutable defaults, and creates a deepcopy of the default when creating new instances of the model):

In [74]:
from typing import List

class Model(BaseModel):
    numbers: List[int] = []

In [75]:
m1 = Model()
m2 = Model()

In [77]:
m1.numbers.extend([1, 2, 3])
print(m1)

numbers=[1, 2, 3]


In [78]:
m2

Model(numbers=[])

As you can see, the default `[]` is not shared by the instances.

## Default Factories

Sometimes, however, we want to generate a default not as a static value, but rather as a value that should be calculated each time an instance is created that needs the default.

We can do that using a **default factory**. Basically, we provide a function that Pydantic will call to generate the default value each time a model instance is created that requires that default.

For example, suppose we want a default to be the current time at which the instance is created:

In [95]:
from datetime import datetime, timezone

class Log(BaseModel):
    dt: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
    message: str

In [96]:
log1 = Log(message='message 1')

In [97]:
log2 = Log(message='message 2')

In [98]:
print(log1)

dt=datetime.datetime(2024, 12, 10, 9, 35, 53, 224504, tzinfo=datetime.timezone.utc) message='message 1'


In [99]:
print(log2)

dt=datetime.datetime(2024, 12, 10, 9, 36, 0, 994378, tzinfo=datetime.timezone.utc) message='message 2'
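Any callable that takes no arguments can serve as a default factory, not just a lambda. Here is a small sketch with a hypothetical `Job` model that uses `uuid4` to generate a fresh ID and `list` to generate a fresh empty list for each instance:

```python
from uuid import UUID, uuid4

from pydantic import BaseModel, Field


class Job(BaseModel):
    # Each instance gets its own freshly generated values.
    job_id: UUID = Field(default_factory=uuid4)
    tags: list[str] = Field(default_factory=list)


job1 = Job()
job2 = Job()

# Mutating one instance's list does not affect the other.
job1.tags.append("urgent")

print(job1.job_id != job2.job_id)
print(job1.tags, job2.tags)
```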


## Custom Serializers

Pydantic has a default way of serializing data. For example, serializing floats will result in a certain number of digits after the decimal point being used, dependent on the actual float:

In [100]:
class Model(BaseModel):
    number: float

In [101]:
m = Model(number=1.0)
m.model_dump()

{'number': 1.0}

In [102]:
m = Model(number=1/3)
m.model_dump()

{'number': 0.3333333333333333}

Similarly, datetimes get serialized to JSON as ISO 8601 strings, much like the output of the `isoformat()` method of datetimes (in a Python dictionary they remain `datetime` objects):

In [104]:
dt = datetime.now(timezone.utc)
dt.isoformat()

'2024-12-10T09:38:45.572140+00:00'

In [105]:
class Model(BaseModel):
    dt: datetime


In [106]:
m = Model(dt=datetime.now(timezone.utc))
m.model_dump()

{'dt': datetime.datetime(2024, 12, 10, 9, 39, 14, 239576, tzinfo=datetime.timezone.utc)}
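To see the difference between the two serialization modes side by side, here is a short sketch using a fixed datetime so the result is reproducible. I have deliberately not hard-coded the exact JSON string, since details such as how the UTC offset is written may differ slightly from what `isoformat()` returns:

```python
from datetime import datetime, timezone

from pydantic import BaseModel


class Model(BaseModel):
    dt: datetime


m = Model(dt=datetime(2024, 12, 10, 9, 39, 14, tzinfo=timezone.utc))

as_python = m.model_dump()                 # 'dt' stays a datetime object
as_json_ready = m.model_dump(mode="json")  # 'dt' becomes an ISO 8601 string
as_json = m.model_dump_json()              # the whole model as a JSON string

print(as_python)
print(as_json_ready)
print(as_json)

# The JSON form can be loaded back into an equal datetime.
round_tripped = Model.model_validate_json(as_json)
```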

We can choose to override this serialization.

We have to be a bit careful since we actually have two modes of serialization: to a Python dictionary (so Python objects), and to JSON (so strings).

Let's say we want to customize the float serialization so all floats are rounded to 2 decimal places, in both dictionary and JSON serialization.

In [107]:
from pydantic import field_serializer

In [108]:
class Model(BaseModel):
    number: float
    
    @field_serializer('number')
    def serialize_float(self, value):
        return round(value, 2)

In [109]:
m = Model(number=1/3)

In [110]:
m.model_dump()

{'number': 0.33}

In [111]:
m.model_dump_json()

'{"number":0.33}'

To specify the serializer for our `datetime`, however, we only want to affect the JSON serialization. We still want our dictionary serialization to output the actual datetime object.

We can do that by specifying that our serializer should only apply to JSON serialization. In fact, we'll go one step further and only specify a serializer if serializing to JSON and if the value is not `None` (we can let Pydantic handle serializing `None` objects).

In [161]:
class Model(BaseModel):
    number: float
    dt: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
    
    @field_serializer('number')
    def serialize_float(self, value):
        return round(value, 2)
    
    @field_serializer("dt", when_used="json-unless-none")
    def serialize_datetime_to_json(self, value):
        # The unpadded codes "%-m" and "%-d" are platform-specific and raise
        # "ValueError: Invalid format string" on some platforms (e.g. Windows),
        # so the date part is built from the attributes instead.
        return f"{value.year}/{value.month}/{value.day} {value:%I:%M %p}"

In [162]:
m = Model(number=1/3)
print(m)

number=0.3333333333333333 dt=datetime.datetime(2024, 12, 10, 9, 53, 10, 271371, tzinfo=datetime.timezone.utc)


In [119]:
m.model_dump()

{'number': 0.33,
 'dt': datetime.datetime(2024, 12, 10, 9, 53, 10, 271371, tzinfo=datetime.timezone.utc)}

As you can see, the dictionary serialization of `dt` remained unaffected, but when we serialize to JSON, our custom serializer is used:

In [169]:
m.model_dump_json()

'{"number":0.33,"dt":"2024/12/10 09:53 AM"}'

## Custom Validators

There are different types of validators available in Pydantic. One type are **before** validators, that run before Pydantic has a chance to validate and coerce the data according to our field definition. The second type are **after** validators that happen after Pydantic has already processed the raw data, validated it and coerced it to the proper type, as defined by the field definition.

Before validators can be very handy to provide custom parsing of data that Pydantic would otherwise be unable to do. For example, they can deserialize a date provided in a format that Pydantic does not recognize by default (e.g. `2024/1/1 3:15pm`).

Here, I am only going to cover after validators - Pydantic's docs have more information on before validators.
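For reference only, a before validator for the date format mentioned above might be sketched as follows. The `Event` model and the `starts_at` field are hypothetical names, and the format string is my assumption about how to parse `2024/1/1 3:15pm` with `datetime.strptime`:

```python
from datetime import datetime

from pydantic import BaseModel, field_validator


class Event(BaseModel):
    starts_at: datetime

    @field_validator("starts_at", mode="before")
    @classmethod
    def parse_custom_format(cls, value):
        # Runs on the raw input, before Pydantic's own datetime parsing.
        if isinstance(value, str):
            try:
                return datetime.strptime(value, "%Y/%m/%d %I:%M%p")
            except ValueError:
                # Not in the custom format: fall through to Pydantic's parser.
                pass
        return value


custom_event = Event(starts_at="2024/1/1 3:15pm")
iso_event = Event(starts_at="2024-01-01T15:15:00")

print(custom_event)
print(iso_event)
```

Returning the untouched value when the custom format does not match keeps all of Pydantic's standard datetime inputs working.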

Validators are not just validation functions - they are also **transformation** functions - for example Pydantic's validators can modify the type of the data being deserialized to coerce it into the proper type. Many of Pydantic's pre-defined special types also perform both validation and transformation.

An after validator can therefore be used to transform the data as it is being deserialized.

Let's take a look:

In [170]:
from pydantic import field_validator

In [171]:
class Model(BaseModel):
    absolute: int
    
    @field_validator("absolute")
    @classmethod
    def make_absolute(cls, value):
        return abs(value)

In [172]:
Model(absolute=-10)

Model(absolute=10)

As you can see, our custom validator **transformed** the input value.

One thing that's very important to note is that our custom validator, being an **after** validator, will get called once Pydantic has had a chance to parse the input data to an int. If that validation fails, our custom validator will not even get called.

In [176]:
class Model(BaseModel):
    absolute: int
    
    @field_validator("absolute")
    @classmethod
    def make_absolute(cls, value):
        print(f"Running custom validator: {value=}, {type(value)=}")
        return abs(value)

In [178]:
print(Model(absolute=-10))

Running custom validator: value=-10, type(value)=<class 'int'>
absolute=10


Let's pass something that is not an integer, but can be coerced to an integer:

In [179]:
Model(absolute='-10')

Running custom validator: value=-10, type(value)=<class 'int'>


Model(absolute=10)

As you can see, our validator received an integer, not a string.

And if Pydantic's validation fails, our validator is not called:

In [180]:
try:
    Model(absolute='abc')
except ValidationError as e:
    print(e)

1 validation error for Model
absolute
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='abc', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/int_parsing


Notice our custom validator's print statement never executed.

We can, of course, also use custom validators to perform validation.

Let's say we want to define a field that should be a list of unique integers.

We can do it this way:

In [181]:
class Model(BaseModel):
    numbers: list[int] = []
    
    @field_validator("numbers")
    @classmethod
    def ensure_unique(cls, numbers):
        if len(set(numbers)) != len(numbers):
            raise ValueError("elements must be unique")
        return numbers

In [184]:
print(Model(numbers=[1, 2, 3]))

numbers=[1, 2, 3]


In [185]:
try:
    Model(numbers=[1, 1, 2, 3])
except ValidationError as e:
    print(e)


1 validation error for Model
numbers
  Value error, elements must be unique [type=value_error, input_value=[1, 1, 2, 3], input_type=list]
    For further information visit https://errors.pydantic.dev/2.10/v/value_error


Notice how I raised a `ValueError`. If you want a failure in your validator to surface as a Pydantic `ValidationError`, you should raise a `ValueError`. Most of the other exception types (such as `TypeError`, `KeyError`, etc.) will bubble up as those exceptions, not as a `ValidationError` exception. There are a few other errors you can raise that will result in a `ValidationError` exception, but by far `ValueError` is the easiest and safest way to do so.
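Here is a small sketch of that difference. The model, its field names and the `outcome` helper are all made up for this illustration:

```python
from pydantic import BaseModel, ValidationError, field_validator


class Model(BaseModel):
    value_error_field: int = 0
    type_error_field: int = 0

    @field_validator("value_error_field")
    @classmethod
    def reject_with_value_error(cls, value):
        if value < 0:
            raise ValueError("must not be negative")
        return value

    @field_validator("type_error_field")
    @classmethod
    def reject_with_type_error(cls, value):
        if value < 0:
            raise TypeError("must not be negative")
        return value


def outcome(**kwargs) -> str:
    """Report which kind of exception (if any) creating the model raises."""
    try:
        Model(**kwargs)
    except ValidationError:
        return "ValidationError"
    except TypeError:
        return "TypeError"
    return "ok"


print(outcome(value_error_field=-1))  # the ValueError is wrapped by Pydantic
print(outcome(type_error_field=-1))   # the TypeError bubbles up as-is
print(outcome(value_error_field=1, type_error_field=1))
```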

## Nested Models

You can nest Pydantic models, and deserialization and serialization of the sub models will be handled by Pydantic automatically.

For example, let's say we want to create a model to deserialize this data:

In [186]:
data = {
    "firstName": "Arthur",
    "lastName": "Clarke",
    "born": {
        "place": {
            "country": "Lunar Colony",
            "city": "Central City",
        },
        "date": "3001-01-01",
    }
}

As you can see we have three levels of nested dictionaries - which we can model this way:

In [187]:
from datetime import date

class Place(BaseModel):
    country: str
    city: str

class Born(BaseModel):
    place: Place
    dt: date = Field(alias="date")
    
class Person(BaseModel):
    first_name: str | None = Field(alias="firstName", default=None)
    last_name: str = Field(alias="lastName")
    born: Born

In [188]:
arthur = Person.model_validate(data)
print(arthur)

first_name='Arthur' last_name='Clarke' born=Born(place=Place(country='Lunar Colony', city='Central City'), dt=datetime.date(3001, 1, 1))


And of course, we can now access all these fields using object dot notation:

In [189]:
arthur.born.place.country

'Lunar Colony'