# Data Validation with Pydantic

**Pydantic** is a data validation and settings management library for Python. It uses Python type annotations to validate and parse data. Pydantic is particularly useful for ensuring that data conforms to a specified schema, making it a great tool for data validation in web applications, APIs, and other data-driven projects.

**Key Features of Pydantic**:
* **Data Validation**: Automatically validates data against the defined types and constraints.
* **Type Coercion**: Converts input data to the specified types.
* **Error Reporting**: Provides detailed error messages when validation fails.
* **Settings Management**: Can be used to manage application settings with environment variable support.
* **Serialization**: Easily serialize and deserialize data to and from JSON.
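
As a minimal sketch of the first three features, the snippet below coerces numeric strings to the annotated types and then inspects the error raised for a value that can't be coerced. The `Item` model and its fields are hypothetical and exist only for this illustration.

```python
from pydantic import BaseModel, ValidationError


class Item(BaseModel):
    # Hypothetical model used only for this illustration
    name: str
    quantity: int
    price: float


# Strings that look like numbers are coerced to the annotated types
item = Item(name="widget", quantity="3", price="9.99")
print(item.quantity, item.price)

# A value that can't be coerced raises a ValidationError that names the field
try:
    Item(name="widget", quantity="three", price="9.99")
except ValidationError as exc:
    error_fields = [err["loc"] for err in exc.errors()]
    print(error_fields)
```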

Pydantic’s `field_validator()` lets you customize the validation of a single field. However, `field_validator()` won’t work if you want to compare multiple fields to one another or validate your model as a whole. For this, you’ll need to use model validators.

As an example, suppose your company only hires contract workers in the IT department. Because of this, IT workers don’t qualify for benefits and their elected_benefits field should be False. You can use Pydantic’s `model_validator()` to enforce this constraint:


In [3]:
from datetime import date
from typing import Self
from uuid import UUID, uuid4
from enum import Enum
from pydantic import BaseModel, EmailStr, Field, field_validator, model_validator

class Department(Enum):
    HR = "Human Resources"
    ENG = "Engineering"
    MKT = "Marketing"
    FIN = "Finance"
    SALES = "Sales"
    IT = "Information Technology"


class Employee(BaseModel):
    employee_id: UUID = Field(default_factory=uuid4, frozen=True)
    name: str = Field(..., min_length=3, frozen=True)
    email: EmailStr = Field(pattern=r".+@example\.com$")
    date_of_birth: date = Field(alias="birth_date", repr=False, frozen=True)
    salary: float = Field(alias="compensation", ge=0, repr=False)
    department: Department
    elected_benefits: bool

    @field_validator("date_of_birth")
    @classmethod
    def check_valid_age(cls, date_of_birth: date) -> date:
        today = date.today()
        # Compare (month, day) tuples so the check also works on February 29
        age = today.year - date_of_birth.year - (
            (today.month, today.day) < (date_of_birth.month, date_of_birth.day)
        )
        if age < 18:
            raise ValueError("employee must be at least 18 years old")
        
        return date_of_birth
    
    @model_validator(mode="after")
    def check_it_benefits(self) -> Self:
        department = self.department
        elected_benefits = self.elected_benefits

        if department == Department.IT and elected_benefits:
            raise ValueError(
                "IT employees are contractors and don't qualify for benefits"
            )
        return self
        

Here, you import `Field` along with the other dependencies you used previously, and you assign default values to some of the Employee fields. Here’s a breakdown of the Field parameters you used to add additional validation and metadata to your fields:

* `default_factory`: You use this to define a callable that generates default values. In the example above, you set default_factory to uuid4. This calls uuid4() to generate a random UUID for employee_id when needed. You can also use a lambda function for more flexibility.

* `frozen`: This is a Boolean parameter you can set to make your fields immutable. This means, when frozen is set to True, the corresponding field can’t be changed after your model is instantiated. In this example, employee_id, name, and date_of_birth are made immutable using the frozen parameter.

* `min_length`: You can control the length of string fields with min_length and max_length. In the example above, you ensure that name is at least three characters long.

* `pattern`: For string fields, you can set pattern to a regular expression that matches whatever pattern you’re expecting for that field. For instance, when you use the regular expression in the example above for email, Pydantic will ensure that every email ends with @example.com.

* `alias`: You can use this parameter when you want to assign an alias to your fields. For example, you can allow date_of_birth to be called birth_date or salary to be called compensation. You can use these aliases when instantiating or serializing a model.

* `ge`: This parameter, short for “greater than or equal to”, is used for numeric fields to set minimum values. In this example, setting ge=0 ensures salary is never negative. Pydantic also has other numeric constraints, such as gt, which is short for “greater than”, and lt, which is short for “less than”.

* `repr`: This Boolean parameter determines whether a field is displayed in the model’s field representation. In this example, you won’t see date_of_birth or salary when you print an Employee instance.
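
The following sketch exercises several of these parameters on a smaller model. The `Contact` class is hypothetical and only mirrors the `Field` options described above; it leaves out `EmailStr` so that it runs without the optional email-validator dependency.

```python
from datetime import date
from uuid import UUID, uuid4

from pydantic import BaseModel, Field, ValidationError


class Contact(BaseModel):
    # Hypothetical model for illustration; mirrors the Field options above
    contact_id: UUID = Field(default_factory=uuid4, frozen=True)
    name: str = Field(min_length=3)
    date_of_birth: date = Field(alias="birth_date", repr=False)
    salary: float = Field(alias="compensation", ge=0, repr=False)


contact = Contact(name="Ada", birth_date="1990-01-01", compensation=0)

# repr=False hides date_of_birth and salary from the printed representation
shown = repr(contact)
print(shown)

# frozen=True rejects reassignment after instantiation
try:
    contact.contact_id = uuid4()
    frozen_enforced = False
except ValidationError:
    frozen_enforced = True

# ge=0 accepts zero (used above) but rejects negative numbers
try:
    Contact(name="Ada", birth_date="1990-01-01", compensation=-1)
    negative_rejected = False
except ValidationError:
    negative_rejected = True

# by_alias=True serializes with the alias names instead of the field names
aliased = contact.model_dump(by_alias=True)
```

Passing `by_alias=True` is optional; without it, `.model_dump()` uses the field names, as the serialization examples in this document do.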


Here, you add Python’s Self type and Pydantic’s model_validator() to your imports. You then create a method, `.check_it_benefits()`, that raises an error if the employee belongs to the IT department and the elected_benefits field is True. When you set mode to after in @model_validator, Pydantic waits until after you’ve instantiated your model to run `.check_it_benefits()`.

In [4]:
Employee(
    name="John Doe",
    email="john.doe@example.com",
    birth_date="1990-01-01",
    compensation=50_000.00,
    department="Human Resources",
    elected_benefits=True
)

Employee(employee_id=UUID('5763f0b9-d3bf-42d0-ac82-4197e655f900'), name='John Doe', email='john.doe@example.com', department=<Department.HR: 'Human Resources'>, elected_benefits=True)

Pydantic’s BaseModel is equipped with a suite of methods that make it easy to create models from other objects, such as dictionaries and JSON. 

For example, if you want to instantiate an Employee object from a dictionary, you can use the `.model_validate()` class method:

In [26]:
new_employee_dict = {
    "name": "John Doe",
    "email": "john.doe@example.com",
    "birth_date": "1990-01-01",
    "compensation": 50_000.00,
    "department": "Human Resources",
    "elected_benefits": True
}

new_employee = Employee.model_validate(new_employee_dict)

In [28]:
new_employee

Employee(employee_id=UUID('886def64-d42b-43d8-9f16-369fba3d917e'), name='John Doe', email='john.doe@example.com', department=<Department.HR: 'Human Resources'>, elected_benefits=True)

You can do the same thing with JSON objects using `.model_validate_json()`:

In [14]:
new_employee_json = """
{"employee_id": "d2e7b773-926b-49df-939a-5e98cbb9c9eb",
    "name": "John Doe",
    "email": "john.doe@example.com",
    "birth_date": "1990-01-01",
    "compensation": 50000.00,
    "department": "Human Resources",
    "elected_benefits": false}"""

# The JSON keys must use the field aliases (birth_date, compensation), and
# JSON numbers can't contain underscores
Employee.model_validate_json(new_employee_json)

You can also serialize Pydantic models as dictionaries and JSON:

In [27]:
new_employee.model_dump()

{'employee_id': UUID('886def64-d42b-43d8-9f16-369fba3d917e'),
 'name': 'John Doe',
 'email': 'john.doe@example.com',
 'date_of_birth': datetime.date(1990, 1, 1),
 'salary': 50000.0,
 'department': <Department.HR: 'Human Resources'>,
 'elected_benefits': True}

In [None]:
new_employee.model_dump_json()

Here, you use `.model_dump()` and `.model_dump_json()` to convert your new_employee model to a dictionary and a JSON string, respectively. Notice how `.model_dump_json()` returns a JSON string in which date_of_birth and department are stored as strings.
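
To make the difference between the two methods concrete, here is a small sketch. The `Team` enum and `Member` model are hypothetical stand-ins for `Department` and `Employee`.

```python
import json
from datetime import date
from enum import Enum

from pydantic import BaseModel


class Team(Enum):
    # Hypothetical enum for illustration
    HR = "Human Resources"


class Member(BaseModel):
    # Hypothetical model for illustration
    start_date: date
    team: Team


member = Member(start_date="2020-05-17", team="Human Resources")

as_dict = member.model_dump()       # keeps Python objects (date, Enum)
as_json = member.model_dump_json()  # a JSON string
round_trip = json.loads(as_json)    # dates and enums are now plain strings

# Validating the JSON string rebuilds an equivalent model
restored = Member.model_validate_json(as_json)
```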

While Pydantic already validated these fields and converted your model to JSON, whoever uses this JSON downstream won’t know that date_of_birth needs to be a valid date and department needs to be a category in your Department enum. To solve this, you can create a JSON schema from your Employee model.

In [17]:
Employee.model_json_schema()

{'$defs': {'Department': {'enum': ['Human Resources',
    'Engineering',
    'Marketing',
    'Finance',
    'Sales',
    'Information Technology'],
   'title': 'Department',
   'type': 'string'}},
 'properties': {'employee_id': {'default': 'd0d03852-5128-4fd5-863d-b275e2216c2c',
   'format': 'uuid',
   'title': 'Employee Id',
   'type': 'string'},
  'name': {'title': 'Name', 'type': 'string'},
  'email': {'format': 'email', 'title': 'Email', 'type': 'string'},
  'date_of_birth': {'format': 'date',
   'title': 'Date Of Birth',
   'type': 'string'},
  'salary': {'title': 'Salary', 'type': 'number'},
  'department': {'$ref': '#/$defs/Department'},
  'elected_benefits': {'title': 'Elected Benefits', 'type': 'boolean'}},
 'required': ['name',
  'email',
  'date_of_birth',
  'salary',
  'department',
  'elected_benefits'],
 'title': 'Employee',
 'type': 'object'}

When you call `.model_json_schema()`, you get a dictionary representing your model’s JSON schema. The first entry you see shows you the values that department can take on. You also see information about how your fields should be formatted. For instance, according to this JSON schema, employee_id is expected to be a UUID and date_of_birth is expected to be a date.
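
Because the schema is an ordinary dictionary, you can also read individual entries from it in code. The sketch below assumes a hypothetical `Member` model with a `Team` enum and pulls out the allowed enum values, a field format, and the required fields.

```python
from datetime import date
from enum import Enum

from pydantic import BaseModel


class Team(Enum):
    # Hypothetical enum for illustration
    HR = "Human Resources"
    IT = "Information Technology"


class Member(BaseModel):
    # Hypothetical model for illustration
    start_date: date
    team: Team


schema = Member.model_json_schema()

# Enum definitions live under "$defs" and are referenced from "properties"
team_values = schema["$defs"]["Team"]["enum"]
date_format = schema["properties"]["start_date"]["format"]
required_fields = schema["required"]
```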

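To see the model validator in action, try to create an IT employee with elected_benefits set to True:

In [7]:
new_contract = {
    "name": "Alexis Tau",
    "email": "ataue@example.com",
    "birth_date": "2001-04-12",
    "compensation": 100_000,
    "department": "Information Technology",
    "elected_benefits": True,
}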

Employee.model_validate(new_contract)

ValidationError: 1 validation error for Employee
  Value error, IT employees are contractors and don't qualify for benefits [type=value_error, input_value={'name': 'Alexis Tau', 'e...elected_benefits': True}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/value_error
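
If you need to handle a failure like this in code rather than read the traceback, you can catch `ValidationError` and inspect `.errors()`. The sketch below uses a hypothetical `Booking` model with its own model validator; it is an illustration, not part of the `Employee` example.

```python
from pydantic import BaseModel, ValidationError, model_validator


class Booking(BaseModel):
    # Hypothetical model: check-out must not come before check-in
    check_in_day: int
    check_out_day: int

    @model_validator(mode="after")
    def check_day_order(self) -> "Booking":
        if self.check_out_day < self.check_in_day:
            raise ValueError("check-out can't be before check-in")
        return self


try:
    Booking(check_in_day=10, check_out_day=5)
except ValidationError as exc:
    errors = exc.errors()
    first = errors[0]
    # A model-level error has an empty location because it isn't tied to one field
    print(first["type"], first["loc"], first["msg"])
```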

### Using Validation Decorators to Validate Functions

You can also use Pydantic to validate function arguments with the `@validate_call` decorator. This allows you to create robust functions with informative type errors without having to manually implement validation logic.

To see how this works, suppose you’re writing a function that sends invoices to clients after they’ve made a purchase. Your function takes in the client’s name, email, items purchased, and total billing amount, and it constructs and sends them an email. You need to validate all of these inputs because getting them wrong could result in the email not being sent, being misformatted, or the client being invoiced incorrectly.

In [8]:
import time 
from typing import Annotated, List
from pydantic import PositiveFloat, Field, EmailStr, validate_call


@validate_call
def send_invoice(
    client_name: Annotated[str, Field(min_length=3)],
    client_email: EmailStr,
    items_purchased: List[str],
    amount_owed: PositiveFloat
):
    email_str = f"""
    Dear {client_name}, \n 
    Thank you for choosing xyz inc! You 
    owe ${amount_owed:,.2f} for the following items: \n
    {items_purchased}
    """

    print(f"Sending email to {client_email}...")
    time.sleep(2)

    return email_str

In [11]:
email_str = send_invoice(
    client_name="Andrew Jolawson",
    client_email="ajolawson@fakedomain.com",
    items_purchased=["pie", "cookie", "cake"],
    amount_owed=20,
)

print(email_str)

Sending email to ajolawson@fakedomain.com...

    Dear Andrew Jolawson, 
 
    Thank you for choosing xyz inc! You 
    owe $20.00 for the following items: 

    ['pie', 'cookie', 'cake']
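
The call above passes validation. To see what happens when it doesn't, here is a sketch with a hypothetical `total_due()` helper, simplified from `send_invoice()` so that it needs neither `EmailStr` nor a sleep. A name that is too short and a negative amount are both reported in a single `ValidationError`.

```python
from typing import Annotated, List

from pydantic import Field, PositiveFloat, ValidationError, validate_call


@validate_call
def total_due(
    client_name: Annotated[str, Field(min_length=3)],
    items_purchased: List[str],
    amount_owed: PositiveFloat,
) -> str:
    # Hypothetical helper, simplified from send_invoice() for illustration
    return f"{client_name} owes ${amount_owed:,.2f} for {len(items_purchased)} item(s)"


# Valid arguments pass through, with the integer 20 coerced to a float
ok_message = total_due(client_name="Ann", items_purchased=["pie"], amount_owed=20)

# Invalid arguments are collected and reported together
try:
    total_due(client_name="Al", items_purchased=["pie"], amount_owed=-5)
except ValidationError as exc:
    failed_args = sorted(err["loc"][0] for err in exc.errors())
    print(failed_args)
```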
    


### Configuring Applications With BaseSettings

pydantic-settings is one of the most powerful ways to manage environment variables in Python, and it has been widely adopted and recommended by popular libraries like FastAPI. You can use pydantic-settings to create models, similar to BaseModel, that parse and validate environment variables.

To see how this works, suppose your application connects to a database and another API service. Your database credentials and API key can change over time and often change depending on which environment you’re deploying in. To handle this, you can create the following BaseSettings model:

In [12]:
from pydantic import HttpUrl, Field
from pydantic_settings import BaseSettings, SettingsConfigDict

class AppConfig(BaseSettings):
    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        case_sensitive=True,
        extra="forbid",
    )
    
    database_host: HttpUrl
    database_user: str = Field(min_length=5)
    database_password: str = Field(min_length=10)
    api_key: str = Field(min_length=20)

In [15]:
# AppConfig()