# Learning Pydantic data validation

[Using this guided tutorial series.](https://www.youtube.com/watch?v=7aBRk_JP-qY&)

## Introduction

Pydantic is important because it **checks types** at runtime, based on a class's type hints. It makes validation efficient and minimal.

Let's check our version to make sure we're working with at least version 2.

In [1]:
import pydantic

print(pydantic.__version__)

2.12.5


First, let's look at how we'd validate data in a class using the classic Python way.

In [2]:
class Bottle:
	def __init__(self, ounces: int, drink_name="Coca-Cola") -> None:
		if not isinstance(ounces, int):
			raise TypeError(
				f"Expected ounces to be an int, got {type(ounces).__name__}"
			)

		if not isinstance(drink_name, str):
			raise TypeError(
				f"Expected name to be a str, got {type(drink_name).__name__}"
			)

		self.ounces = ounces
		self.drink_name = drink_name


try:
	sprite = Bottle(ounces="20", drink_name="Sprite")
except TypeError as e:
	print(e)

Expected ounces to be an int, got str


## Basics

Here's how this would be done in `pydantic`.

In [3]:
from pydantic import BaseModel


class Bottle(BaseModel):
	ounces: int
	drink_name: str = "Coca-Cola"


sprite = Bottle(ounces="twenty", drink_name="Sprite")

ValidationError: 1 validation error for Bottle
ounces
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='twenty', input_type=str]
    For further information visit https://errors.pydantic.dev/2.12/v/int_parsing

`pydantic` is useful because it will attempt to convert a variable to the correct type when initializing.

In [None]:
sprite = Bottle(ounces="20", drink_name="Sprite")
print(sprite.ounces)
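
As a contrast to the conversion shown above, here is a minimal sketch of Pydantic's strict mode, which turns the conversion off. The class names `LaxBottle` and `StrictBottle` are made up for this illustration and are not part of the tutorial.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class LaxBottle(BaseModel):
    # Default (lax) mode: compatible inputs such as "20" are converted to int
    ounces: int


class StrictBottle(BaseModel):
    # Strict mode: the input must already be an int, no conversion is attempted
    model_config = ConfigDict(strict=True)
    ounces: int


print(LaxBottle(ounces="20").ounces)  # 20

strict_rejected = False
try:
    StrictBottle(ounces="20")
except ValidationError:
    strict_rejected = True

print(strict_rejected)  # True
```

Strict mode is opt-in, so the conversion behavior is what you get unless you ask otherwise.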

We can use `model_fields_set` to see which fields were explicitly set when the instance was created, as opposed to fields that fell back to their defaults.

In [None]:
print(sprite.model_fields_set)

dr_pepper = Bottle(ounces=16)
print(dr_pepper.model_fields_set)

Additionally, we have some helpers to check out our data.

In [None]:
print("Model dump")
print(dr_pepper.model_dump())

print("Model dump json")
print(dr_pepper.model_dump_json())

print("Model json schema")
print(dr_pepper.model_json_schema())
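
As a small sketch of how these helpers fit together, the example below dumps a `Bottle` to JSON and loads it back with `model_validate_json`. It also uses `exclude_unset=True`, which I'm assuming is a useful companion to `model_fields_set` here; the tutorial itself doesn't cover it.

```python
from pydantic import BaseModel


class Bottle(BaseModel):
    ounces: int
    drink_name: str = "Coca-Cola"


dr_pepper = Bottle(ounces=16)

# Full dump includes the default value for drink_name
as_dict = dr_pepper.model_dump()
print(as_dict)  # {'ounces': 16, 'drink_name': 'Coca-Cola'}

# exclude_unset keeps only the fields listed in model_fields_set
only_set = dr_pepper.model_dump(exclude_unset=True)
print(only_set)  # {'ounces': 16}

# Round trip: serialize to a JSON string, then validate it back into a model
as_json = dr_pepper.model_dump_json()
restored = Bottle.model_validate_json(as_json)
print(restored.model_dump() == as_dict)  # True
```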

## Nested models

Let's take a look at the `pydantic` approach to **nested models**. 

In [None]:
from typing import List, Optional
from pydantic import BaseModel


class Drink(BaseModel):
	name: str
	price: float
	ingredients: Optional[List[str]] = None


class Bar(BaseModel):
	name: str
	location: str
	drinks: List[Drink]


bar_instance = Bar(
	name="Champagne Haven",
	location="6 Tasty St.",
	drinks=[
		{"name": "Prosecco", "price": 7.49, "ingredients": ["Champagne"]},
		{"name": "Brut", "price": 9.99},
	],
)


print(bar_instance)
print(bar_instance.model_dump())

The above works, but there are type hinting problems with `Pylance` because each dictionary is not an exact instance of a `Drink`.

In [None]:
bar_instance = Bar(
	name="Champagne Haven",
	location="6 Tasty St.",
	drinks=[Drink(name="Prosecco", price=7.49)],
)

print(bar_instance)
print(bar_instance.model_dump())

## Additional Parsers

We can also use more advanced features of `pydantic`.

In [None]:
!py -m pip install pydantic[email]

This includes some more advanced features to verify emails, positive integers, and more. Helpful for detailed data validation.

To verify emails, `EmailStr` ensures that there's an `@` symbol and a domain ending such as `.com`, `.net`, etc.

In [None]:
from typing import Annotated, List
from pydantic import BaseModel, EmailStr, PositiveInt, Field, HttpUrl, TypeAdapter


class Address(BaseModel):
	street: str
	city: str
	state: str
	zip_code: str


class Student(BaseModel):
	name: str
	major: str
	email: EmailStr


class Teacher(BaseModel):
	name: str
	email: EmailStr


class School(BaseModel):
	# The three dots after Field tell us this is a *required* property
	name: str = Field(..., pattern=r"^[a-zA-Z0-9;' ]+$")
	teacher: Teacher
	address: Address
	# This tells us that we want a list of Students minimum size of 2
	students: Annotated[List[Student], Field(min_length=2)]
	number_of_classes: PositiveInt
	online_classes: bool
	website: HttpUrl

Let's look at how we'd use this.

In [None]:
school_instance = School(
	name="University of the People",
	teacher=Teacher(name="Bill Teach", email="bill@uopeople.com"),
	address=Address(
		street="551 Tall St.",
		city="Columbia",
		state="South Carolina",
		zip_code="29210",
	),
	students=[
		Student(
			name="Chris Wright", major="Computer Science", email="cwright@uopeople.com"
		),
		Student(name="Elio Ransom", major="Construction", email="eransom@uopeople.com"),
	],
	number_of_classes=12,
	online_classes=True,
	website=TypeAdapter(HttpUrl).validate_python("https://www.uopeople.com/"),
)

print(school_instance)

An important note is that each of the classes could be built using dictionaries. For example, the students could be:

```python
students=[
	{"name": "Chris Wright", "major": "Computer Science", "email": "cwright@uopeople.com"},
	{"name": "Elio Ransom", "major": "Construction", "email": "eransom@uopeople.com"},
]
```
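
To make the dictionary idea concrete, here is a minimal sketch that validates a plain dictionary with `model_validate`. The `Classroom` model is a hypothetical, trimmed-down stand-in for `School`, and the `email` field is left out so the snippet doesn't need the email extra.

```python
from typing import List

from pydantic import BaseModel


class Student(BaseModel):
    name: str
    major: str


class Classroom(BaseModel):  # hypothetical model, only for this illustration
    students: List[Student]


raw = {
    "students": [
        {"name": "Chris Wright", "major": "Computer Science"},
        {"name": "Elio Ransom", "major": "Construction"},
    ]
}

# model_validate accepts a dict and builds the nested models for us
room = Classroom.model_validate(raw)

print(type(room.students[0]).__name__)  # Student
print(room.students[1].major)  # Construction
```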

## Field validators

Next, let's take a look at `field_validator`s in relation to the `pydantic` `BaseModel`.

In [6]:
from pydantic import BaseModel, EmailStr, field_validator


class Student(BaseModel):
	name: str
	email: EmailStr

	@field_validator("name")
	@classmethod
	def name_must_contain_space(cls, v: str) -> str:
		if " " not in v:
			raise ValueError(f"Student name must contain a space, got {v}")
		return v.title()


try:
	student_instance = Student(name="chris wright", email="cwright@uopeople.com")
	print(student_instance)
except ValueError as e:
	print(e)

name='Chris Wright' email='cwright@uopeople.com'


What happens if there is no space in the `Student`'s `name` field?

In [7]:
try:
	student_instance = Student(name="chris", email="cwright@uopeople.com")
	print(student_instance)
except ValueError as e:
	print(e)

1 validation error for Student
name
  Value error, Student name must contain a space, got chris [type=value_error, input_value='chris', input_type=str]
    For further information visit https://errors.pydantic.dev/2.12/v/value_error


The `field_validator`, as described by the [Pydantic documentation](https://docs.pydantic.dev/latest/concepts/validators/), is:

> ... a callable taking the value to be validated as an argument and **returning the validated value**.

Up 'til now, we've been using the built-in classes to validate our data, like when we checked for a positive int, a valid email, etc.

We can also validate the fields on their own. To do this, we use the `field_validator` decorator (`@`) in which we pass in the field we want to check; in the above example, we validate the `name` field with `@field_validator("name")`.

We then write the validator as a method decorated with `@classmethod`. It runs **before** the instance of the `Student` model is created. The first argument is `cls`, the class itself, similar to `self` in the methods I would've used before. The second argument is the value we want to validate. In our case, we validate and return a `str`.

Outside of just validating the string, we can also adjust its appearance. In the above we `return`ed the `name` in title case with `return v.title()`.
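
One validator can also be attached to several fields at once by passing more than one field name to `field_validator`. The sketch below assumes a made-up `Contact` model with `first_name` and `last_name` fields; neither is from the tutorial.

```python
from pydantic import BaseModel, ValidationError, field_validator


class Contact(BaseModel):  # hypothetical model, only for this illustration
    first_name: str
    last_name: str

    # The same validator runs once for each listed field
    @field_validator("first_name", "last_name")
    @classmethod
    def tidy_name(cls, v: str) -> str:
        cleaned = v.strip()
        if not cleaned:
            raise ValueError("name must not be blank")
        # Return the adjusted value, as with v.title() earlier
        return cleaned.title()


contact = Contact(first_name="  chris ", last_name="wright")
print(contact)  # first_name='Chris' last_name='Wright'

blank_rejected = False
try:
    Contact(first_name="   ", last_name="wright")
except ValidationError:
    blank_rejected = True

print(blank_rejected)  # True
```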

## Model validators

Let's move on to validating models.

In [2]:
from pydantic import BaseModel, EmailStr, ValidationError, model_validator
from typing import Any


class Student(BaseModel):
	name: str
	email: EmailStr

	@model_validator(mode="before")
	@classmethod
	def check_sensitive_info_omitted(cls, data: Any) -> Any:
		if isinstance(data, dict):
			if "password" in data:
				raise ValueError("password should not be included")
			if "card_number" in data:
				raise ValueError("card_number should not be included")
		return data
	
	@model_validator(mode="after")
	def check_name_contains_space(self) -> "Student":
		if " " not in self.name:
			raise ValueError(f"Student name must contain a space, got {self.name}")
		self.name = self.name.title()
		return self


print(Student(name="chris wright", email="cwright@uopeople.com"))

try:
	Student(name=212, email="cwright@uopeople", password="password123")
except ValidationError as e:
	print(e)

name='Chris Wright' email='cwright@uopeople.com'
1 validation error for Student
  Value error, password should not be included [type=value_error, input_value={'name': 212, 'email': 'c...assword': 'password123'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.12/v/value_error


In the first `model_validator`, we validate the model `before`, in the second we validate `after`.

When we validate before, we're checking to make sure the data (especially if using a dictionary to create the instance of the class) is valid or to the standards we define.

What is `data` in this scenario? The `data` is everything provided in the arguments for the constructor phase of creating our `Student` object. Specifically, the lines:

```python
print(Student(name="chris wright", email="cwright@uopeople.com"))

Student(name=212, email="cwright@uopeople", password="password123")
```

What do `before` and `after` mean?

- `before` means that `check_sensitive_info_omitted` runs before Pydantic validates the individual fields, so it receives the raw input data.
- `after` means that `check_name_contains_space` runs after the fields have been validated, so it receives the constructed `Student` instance as `self`.

In the code above, the validator never ends up running the `check_name_contains_space` function because the `check_sensitive_info_omitted` function raised a `ValueError`, which Pydantic reports as a `ValidationError`.

It never even checks if the `name` is a string. In the instance where `name=212`, the validator never reaches the phase of construction in which it validates that the `name` is a `str` and not (as in our case) an `int`.
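
The ordering can be made visible with a small sketch that records which validator runs and what it sees. The `Point` model and the `calls` list are hypothetical names used only for this illustration.

```python
from typing import Any

from pydantic import BaseModel, ValidationError, model_validator

calls: list[str] = []  # records the order in which the validators run


class Point(BaseModel):  # hypothetical model, only for this illustration
    x: int

    @model_validator(mode="before")
    @classmethod
    def log_before(cls, data: Any) -> Any:
        # Runs first and sees the raw, unvalidated input
        if isinstance(data, dict):
            calls.append(f"before saw x={data.get('x')!r}")
        return data

    @model_validator(mode="after")
    def log_after(self) -> "Point":
        # Runs last and sees the validated (converted) value
        calls.append(f"after saw x={self.x!r}")
        return self


Point(x="3")
valid_calls = list(calls)
print(valid_calls)  # ["before saw x='3'", 'after saw x=3']

calls.clear()
try:
    Point(x="three")  # cannot be parsed as an int
except ValidationError:
    pass
invalid_calls = list(calls)
print(invalid_calls)  # ["before saw x='three'"]
```

With valid input both validators run, and the `after` validator sees the converted `int`. With invalid input only the `before` validator runs, because field validation fails in between.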

Let's see what happens if we omit the sensitive information.

In [6]:
try:
	Student(name="chris", email="cwright@uopeople")
except ValidationError as e:
	print(e)

1 validation error for Student
email
  value is not a valid email address: The part after the @-sign is not valid. It should have a period. [type=value_error, input_value='cwright@uopeople', input_type=str]


In this case, we only get the error for the `email`. Why don't we see the `name` error? (It doesn't include a space)

The `check_sensitive_info_omitted` function is a `@classmethod`, so it runs before an instance of the object exists. The `check_name_contains_space` function is a normal method: its first argument is `self` instead of `cls`. **We only get `self` if our instance was constructed successfully.** Here the `email` failed field validation, so no instance was constructed and the `after` validator never ran.

What happens if we fix the email?

In [8]:
try:
	Student(name="chris", email="cwright@uopeople.com")
except ValidationError as e:
	print(e)

1 validation error for Student
  Value error, Student name must contain a space, got chris [type=value_error, input_value={'name': 'chris', 'email': 'cwright@uopeople.com'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.12/v/value_error


This time the `email` passed validation, so the instance was constructed and `self` exists. Now the `check_name_contains_space` function can run, and it raises our validation error for the `name`.

## Fields

Let's examine `Field`s further now.

In [9]:
from pydantic import BaseModel, Field


class Student(BaseModel):
	name: str = Field(default="John Doe")


student = Student()
print(student)

name='John Doe'


The code above is a barebones example of a `Field`. The `Field` line sets the default value and is essentially the same as:

```python
name: str = "John Doe"
```

**If you're only using the `default` value part of the `Field`, it's probably better to just use the traditional Python method above.**

We can use `Field`s to make our classes more complex. Let's look at another example.

In [11]:
from pydantic import BaseModel, Field
from uuid import uuid4


class Student(BaseModel):
	id: str = Field(default_factory=lambda: uuid4().hex)


student = Student()
print(student)

id='c70de286b4ab4b27ae5f86072e9ecac0'


Now we can see that the `Field` can become more useful when using something like `default_factory` to set up a default value generated from something like `uuid4`. Unless we pass an `id` ourselves, the `Student`'s `id` will be a randomly generated `uuid4` hex value.
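
As a quick sketch of what `default_factory` buys us, the factory is called once per instance, so each `Student` gets its own `id`, and an explicitly passed `id` still wins. The value `"abc123"` is just a placeholder.

```python
from uuid import uuid4

from pydantic import BaseModel, Field


class Student(BaseModel):
    id: str = Field(default_factory=lambda: uuid4().hex)


first = Student()
second = Student()

# The factory runs separately for each instance
print(first.id != second.id)  # True
print(len(first.id))  # 32

# Passing a value skips the factory entirely
explicit = Student(id="abc123")
print(explicit.id)  # abc123
```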

Briefly, here are **field aliases**. An alias is an alternative name for a field that is used during validation, serialization, or both. Here are the three ways to define an alias:

- `Field(..., alias="foo")`
- `Field(..., validation_alias="foo")`
- `Field(..., serialization_alias="foo")`

In [12]:
from pydantic import BaseModel, Field


class Student(BaseModel):
	name: str = Field(..., alias="username")


student = Student(username="chriswright")
print(student)
print(student.model_dump(by_alias=True))

name='chriswright'
{'username': 'chriswright'}


Here we create a new class whose `name` field has the alias `username`. Now when we construct our `Student` with `student = Student(username="chriswright")`, the `name` is assigned from `username` because of the `Field(..., alias="username")` line.

When we `model_dump` with `by_alias` set to `True`, we can see the dictionary with the key value of `username`. If we set `by_alias` to `False`, we get the `name` key instead.

In [13]:
print(student.model_dump(by_alias=False))

{'name': 'chriswright'}


This could be helpful if our data comes from an SQL database or another source where the attribute names differ. Maybe the database had `username`s instead of `name`s. The `Field(..., alias="username")` lets us assign the database's `username` to our `name` field.
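
The other two alias options can be combined when the incoming and outgoing names differ. This is a minimal sketch; the alias names `username` and `display_name` are assumptions picked for the example.

```python
from pydantic import BaseModel, Field


class Student(BaseModel):
    # Read the value from "username" on input, write it as "display_name" on output
    name: str = Field(
        validation_alias="username",
        serialization_alias="display_name",
    )


student = Student(username="chriswright")

print(student.name)  # chriswright
print(student.model_dump())  # {'name': 'chriswright'}
print(student.model_dump(by_alias=True))  # {'display_name': 'chriswright'}
```

Plain `alias` sets both directions to the same name, while these two let each direction have its own.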

Let's look at field constraints now.

In [None]:
from decimal import Decimal
from pydantic import BaseModel, Field, EmailStr
from typing import List


class User(BaseModel):
	username: str = Field(..., min_length=3, max_length=10, pattern=r"^\w+$")
	email: EmailStr = Field(...)
	age: int = Field(..., gt=0, le=120)
	height: float = Field(..., gt=0.0)
	is_active: bool = Field(True)
	balance: Decimal = Field(..., max_digits=10, decimal_places=2)
	favorite_numbers: List[int] = Field(..., min_length=1)
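
Here is a short sketch of how constraints like these behave in practice. It uses a hypothetical, trimmed-down `Account` model (three of the constraints above, no `EmailStr`) so it runs without the email extra. The input values are made up.

```python
from decimal import Decimal

from pydantic import BaseModel, Field, ValidationError


class Account(BaseModel):  # hypothetical model, only for this illustration
    username: str = Field(..., min_length=3, max_length=10, pattern=r"^\w+$")
    age: int = Field(..., gt=0, le=120)
    balance: Decimal = Field(..., max_digits=10, decimal_places=2)


# Every constraint is satisfied, so the model is created
ok = Account(username="chris_w", age=30, balance="19.99")
print(ok.balance)  # 19.99

# Each field below breaks one constraint: too short, not > 0, too many decimals
failed_fields = set()
try:
    Account(username="c!", age=0, balance="1.999")
except ValidationError as exc:
    failed_fields = {err["loc"][0] for err in exc.errors()}

print(sorted(failed_fields))  # ['age', 'balance', 'username']
```

Pydantic collects the failures for all fields into a single `ValidationError` instead of stopping at the first bad field.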

Left off at **18:18**.