#### Pydantic is Python’s most popular data validation library that can turn type hints into runtime validation rules.

Validating incoming data by hand means scattering type checks and conversions throughout your code. Pydantic replaces that with three concepts: type hints, runtime validation, and automatic serialization. You define your data structure once using Python’s type annotation syntax, and Pydantic handles the validation automatically:

In [2]:
from pydantic import BaseModel, EmailStr  # EmailStr needs the optional email-validator package
from typing import Optional

class User(BaseModel):
    age: int
    email: EmailStr
    is_active: bool = True
    nickname: Optional[str] = None

# Pydantic automatically validates and converts data
user_data = {
   "age": "25",  # String gets converted to int
   "email": "john@example.com",
   "is_active": "true"  # String gets converted to bool
}

user = User(**user_data)
print(user.age)  # 25 (as integer)
print(user.model_dump())  # Clean dictionary output

25
{'age': 25, 'email': 'john@example.com', 'is_active': True, 'nickname': None}


* Pydantic’s core validation logic is written in Rust, making it faster than hand-written Python validation in most cases.

FastAPI, one of Python’s fastest-growing web frameworks, uses Pydantic models to automatically generate API documentation, validate request bodies, and serialize responses. When you define a Pydantic model, you get OpenAPI schema generation for free.

**Serialization = converting Python objects into a format that can be sent over the network or stored (like JSON, XML, or bytes).**

---

### Example with FastAPI + Pydantic

Suppose you define a Pydantic model:

```python
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str
```

Now when FastAPI returns this model in an API response:

```python
# Assumes `app = FastAPI()` and the User model defined above
@app.get("/user", response_model=User)
def get_user():
    return User(id=1, name="Arshnoor", email="test@example.com")
```

FastAPI (with Pydantic) **serializes** it automatically into JSON:

```json
{
  "id": 1,
  "name": "Arshnoor",
  "email": "test@example.com"
}
```

---

### In one line:

**Serialization = turning Python objects into JSON (or another transferable format) so they can be sent to clients via an API.**

---

⚡ Flip side → **Deserialization** is the opposite: taking JSON input from an API request and converting it into Python objects.
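
As a minimal sketch of both directions outside FastAPI, the snippet below uses Pydantic v2's `model_validate_json()` to parse a JSON string into a model and `model_dump_json()` to turn it back into JSON. The payload is made up for illustration, and `email` is a plain `str` as in the model above.

```python
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str

# Deserialization: JSON text -> validated Python object
raw_json = '{"id": 1, "name": "Arshnoor", "email": "test@example.com"}'
user = User.model_validate_json(raw_json)
print(user.id, user.name)

# Serialization: Python object -> JSON text
json_text = user.model_dump_json()
print(json_text)
```

FastAPI performs both steps for you on every request and response; calling them directly is mainly useful in scripts and tests.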



In [4]:
from fastapi import FastAPI
from pydantic import BaseModel, EmailStr

app = FastAPI()

class UserCreate(BaseModel):
    name: str
    email: EmailStr
    age: int
    
@app.post("/users/")
async def create_user(user: UserCreate):
    # FastAPI automatically validates the request body
    # and generates API docs from your Pydantic model
    return {"message": f"Created user {user.name}"}

**“JSON schema generation happens automatically with every Pydantic model. This means your data structures become self-documenting, and you can generate client libraries, validation rules for frontend applications, or database schemas from the same source of truth.”**

### Step by step:

1. **JSON schema generation happens automatically**

   * Whenever you create a `Pydantic` model, Pydantic can automatically describe it in a standard JSON Schema format (a blueprint that explains what data looks like).
   * Example:

     ```python
     from pydantic import BaseModel

     class User(BaseModel):
         id: int
         name: str
         email: str
     ```

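     👉 Behind the scenes, this model can produce a **JSON Schema** like the following (per-field `title` entries omitted for brevity). Because `email` is declared as a plain `str`, it appears as a plain string; declaring it as `EmailStr` would add `"format": "email"`:

     ```json
     {
       "title": "User",
       "type": "object",
       "properties": {
         "id": {"type": "integer"},
         "name": {"type": "string"},
         "email": {"type": "string"}
       },
       "required": ["id", "name", "email"]
     }
     ```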

---

2. **Self-documenting**

   * You don’t need to write separate docs describing what fields your API expects.
   * The schema itself **is the documentation** because it defines field types, required fields, and constraints.

---

3. **Same source of truth**

   * Instead of writing separate definitions for backend, frontend, and database, you only write your **Pydantic model once**, and all others can be generated from it:

     * **Client libraries** → auto-generate TypeScript clients from JSON Schema.
     * **Frontend validation** → use JSON Schema to validate user input in forms.
     * **Database schemas** → some tools can map JSON Schema to SQL/NoSQL schema.

---

✅ **In plain words**:
When you define a Pydantic model, it not only validates your data in Python, but also produces a **universal blueprint (JSON Schema)**. This blueprint can be reused across backend, frontend, and even databases — meaning less duplication, fewer errors, and automatic documentation.
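
To inspect the schema yourself, a minimal sketch using Pydantic v2's `model_json_schema()` class method could look like this. The `User` model repeats the one from the steps above, and the exact key order of the printed output may differ between Pydantic versions.

```python
import json
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str

# Build the JSON Schema as a plain Python dict
schema = User.model_json_schema()

# Pretty-print it so it can be saved or handed to other tools
print(json.dumps(schema, indent=2))
```

The resulting dict is what tools such as client generators or form validators would consume.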

Python’s `@dataclass` is perfect for simple data containers where you trust the input, but Pydantic excels when you need validation, serialization, and integration with web frameworks:

In [6]:
from dataclasses import dataclass
from pydantic import BaseModel

# Dataclass: fast, simple, no validation
@dataclass
class UserDataclass:
    name: str
    age: int

# Pydantic: validation, serialization, framework integration
class UserPydantic(BaseModel):
    name: str
    age: int

Pydantic works best when you’re building APIs, processing external data, managing configuration, or any scenario where data validation failure should be caught early rather than causing mysterious bugs later. 

In [1]:
from pydantic import BaseModel, EmailStr
from typing import Optional
from datetime import datetime

class User(BaseModel):
    name: str
    email: EmailStr
    age: int
    is_active: bool = True
    created_at: Optional[datetime] = None
    
clean_data = {
    "name": "Arshnoor",
    "email": "arshnoor@gmail.com",
    "age":24
}

user = User(**clean_data)
print(f"User created: {user.name}, Age: {user.age}")
print(f"Model output: {user.model_dump()}")

User created: Arshnoor, Age: 24
Model output: {'name': 'Arshnoor', 'email': 'arshnoor@gmail.com', 'age': 24, 'is_active': True, 'created_at': None}


* Your User class inherits from BaseModel, which gives it all of Pydantic's validation and serialization capabilities. This inheritance turns a regular Python class into a data validation tool.

* Default values: Fields like is_active: bool = True have default values. If you don't provide these fields when creating a user, Pydantic uses the defaults. The = None for created_at makes this field optional.

* Model instantiation: When you call User(**clean_data), the ** unpacks your dictionary and passes each key-value pair as keyword arguments to the model constructor.

In [3]:
# Messy data that still works
messy_data = {
    "name": "Bob Smith",
    "email": "bob@company.com",
    "age": "35",  # String instead of int
    "is_active": "true"  # String instead of bool
}

user = User(**messy_data)
print(f"Age type: {type(user.age)}")  # <class 'int'>
print(f"Is active type: {type(user.is_active)}")  # <class 'bool'>

Age type: <class 'int'>
Is active type: <class 'bool'>
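
Coercion only applies to values Pydantic recognizes as compatible. As a small sketch with a hypothetical `Flags` model, common truthy and falsy spellings are converted to `bool`, while an arbitrary string such as `"maybe"` is rejected:

```python
from pydantic import BaseModel, ValidationError

class Flags(BaseModel):  # hypothetical model, for illustration only
    is_active: bool

# Recognized spellings are converted to real booleans
accepted = [Flags(is_active=value).is_active for value in ("true", "yes", "off", 0)]
print(accepted)  # [True, True, False, False]

# Anything Pydantic cannot interpret as a boolean raises ValidationError
rejected = False
try:
    Flags(is_active="maybe")
except ValidationError:
    rejected = True
print(f"'maybe' rejected: {rejected}")
```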


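When validation fails, Pydantic raises a `ValidationError` with clear error messages. Only the email is rejected in the example below: the `User` model declares no constraints on `name` or `age`, so an empty string and a negative number are accepted: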

In [4]:
from pydantic import ValidationError

try:
    invalid_user = User(
        name="",
        email="not-an-email",
        age=-5
    )
except ValidationError as e:
    print(e)

1 validation error for User
email
  value is not a valid email address: An email address must have an @-sign. [type=value_error, input_value='not-an-email', input_type=str]
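
To have values such as an empty name or a negative age rejected too, the model needs explicit constraints. The sketch below uses a hypothetical `Account` model with `Field()` constraints and reads the structured error list through `errors()`, which returns one dict per problem with keys such as `loc`, `type`, and `msg`:

```python
from pydantic import BaseModel, Field, ValidationError

class Account(BaseModel):  # hypothetical model, for illustration only
    name: str = Field(min_length=1)
    age: int = Field(ge=0)

problems = []
try:
    Account(name="", age=-5)
except ValidationError as exc:
    # Collect (field path, error type) pairs for programmatic handling
    problems = [(err["loc"], err["type"]) for err in exc.errors()]
    for err in exc.errors():
        print(err["loc"], err["msg"])

print(f"{len(problems)} problems found")
```

Working with `errors()` rather than the printed message is handy when you want to map each failure back to a form field or an API response.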


## BaseModel vs. data classes
Understanding when to use Pydantic’s BaseModel versus Python's @dataclass helps you choose the right tool for each situation.

Python dataclasses are perfect for simple data containers where you control the input:

In [6]:
from dataclasses import dataclass

@dataclass
class ProductDataclass:
    name: str
    price: float
    in_stock: bool

# Fast, simple, but no validation
product = ProductDataclass("Laptop", 999.99, True)

# This also works, even though types are wrong:
broken_product = ProductDataclass(123, "expensive", "maybe")

Pydantic models add validation, serialization, and framework integration:


In [8]:
from pydantic import BaseModel, Field, ValidationError

class ProductPydantic(BaseModel):
   name: str = Field(min_length=1)
   price: float = Field(gt=0)  # Must be greater than 0
   in_stock: bool

# Automatic validation prevents bad data
try:
   product = ProductPydantic(name="", price=-10, in_stock="maybe")
except ValidationError as e:
   print("Validation caught the errors!")

# Valid data works perfectly
good_product = ProductPydantic(
    name="Laptop",
    price="999.99",  # String converted to float
    in_stock=True
)

Validation caught the errors!


When to choose each approach:

* Use dataclasses for internal data structures, configuration objects, or when performance is critical and you trust your data sources
* Use Pydantic for API endpoints, user input, external data parsing, or when you need JSON serialization

Pydantic adds some overhead compared to dataclasses, but this cost is usually negligible compared to the bugs it prevents and the development time it saves. For web applications, the automatic integration with frameworks like FastAPI makes Pydantic the clear choice.

The validation and serialization features become more valuable as your application grows. Starting with Pydantic models gives you a solid foundation that scales with your needs.

# Building Data Models With Pydantic

## Field validation and constraints

Pydantic’s `Field()` function transforms basic type hints into sophisticated validation rules that protect your application:

In [9]:
from pydantic import BaseModel, Field, ValidationError
from decimal import Decimal
from typing import Optional

class Product(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    price: Decimal = Field(gt=0, le=10000)  # Greater than 0, less than or equal to 10,000
    description: Optional[str] = Field(None, max_length=500)
    category: str = Field(..., pattern=r'^[A-Za-z\s]+$')  # Only letters and spaces
    stock_quantity: int = Field(ge=0)  # Greater than or equal to 0
    is_available: bool = True

# This works - all constraints satisfied
valid_product = Product(
    name="Wireless Headphones",
    price="199.99",  # String converted to Decimal
    description="High-quality wireless headphones",
    category="Electronics",
    stock_quantity=50
)

# This fails with clear error messages
try:
    invalid_product = Product(
        name="",  # Too short
        price=-50,  # Negative price
        category="Electronics123",  # Contains numbers
        stock_quantity=-5  # Negative stock
    )
except ValidationError as e:
    print(f"Validation errors: {len(e.errors())} issues found")

Validation errors: 4 issues found


Each `Field()` parameter serves a specific purpose: `min_length` and `max_length` keep strings within the limits your database columns can hold, `gt` and `le` set business-logic boundaries, and `pattern` validates formatted data using regular expressions. `Field(...)` with an ellipsis marks a field as required, while `Field(None, ...)` creates an optional field that is still validated when a value is provided.

## Type coercion vs strict validation

By default, Pydantic converts compatible types rather than rejecting them outright. This flexibility works well for user input, but some scenarios demand exact type matching:

### 🔹 Type coercion (default in Pydantic)

* **Meaning**: Pydantic will *try to convert* the input into the right type if it’s compatible.
* It’s forgiving → “I’ll make it work if I can.”

Example:

```python
from pydantic import BaseModel

class User(BaseModel):
    age: int

user = User(age="25")   # string "25"
print(user.age)         # 👉 25 (converted to int)
```

Here `"25"` (string) was automatically **coerced** into an `int`.

---

### 🔹 Strict validation

* **Meaning**: Pydantic will *reject* the input if the type doesn’t match exactly.
* It’s strict → “If it’s not the right type, I’ll throw an error.”

Example:

```python
from pydantic import BaseModel, StrictInt

class User(BaseModel):
    age: StrictInt

user = User(age="25")   # ❌ raises ValidationError
```

Here `"25"` is a string, not an int → so it fails.

---

### ✅ Why this matters

* **Type coercion** → great for handling messy user input (like from web forms, APIs, CSVs).
* **Strict validation** → important when data *must be exact* (e.g., financial transactions, database constraints, security-sensitive inputs).

---

👉 **In simple terms**:

* *Type coercion* = “I’ll fix it for you if possible.”
* *Strict validation* = “Only exact matches allowed.”

---



In [None]:
from pydantic import BaseModel, Field, ValidationError

# Default: lenient type coercion
class FlexibleOrder(BaseModel):
    order_id: int
    total_amount: float
    is_paid: bool

# These all work due to automatic conversion
flexible_order = FlexibleOrder(
    order_id="12345",  # String to int
    total_amount="99.99",  # String to float
    is_paid="true"  # String to bool
)

# Strict validation when precision matters
class StrictOrder(BaseModel):
    model_config = {"str_strip_whitespace": True, "validate_assignment": True}
    order_id: int = Field(strict=True)
    total_amount: float = Field(strict=True)
    is_paid: bool = Field(strict=True)
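
Strictness can also be switched on for a whole model instead of field by field. A minimal sketch, assuming Pydantic v2's `ConfigDict(strict=True)` and a hypothetical `StrictModelWide` class, shows the string inputs that `FlexibleOrder` accepted being rejected:

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class StrictModelWide(BaseModel):  # hypothetical name, for illustration only
    # strict=True applies to every field in this model
    model_config = ConfigDict(strict=True)

    order_id: int
    total_amount: float
    is_paid: bool

strict_error_count = 0
try:
    StrictModelWide(order_id="12345", total_amount="99.99", is_paid="true")
except ValidationError as exc:
    strict_error_count = exc.error_count()
print(f"Strict mode rejected {strict_error_count} fields")

# Values that already have the declared types pass unchanged
strict_order = StrictModelWide(order_id=12345, total_amount=99.99, is_paid=True)
print(strict_order.order_id)
```

Per-field `Field(strict=True)` is the better fit when only a few fields need exact types; the model-wide setting avoids repeating it everywhere.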

## Nested models and complex data

In [None]:
from typing import List, Optional
from datetime import datetime
from decimal import Decimal
from pydantic import BaseModel, EmailStr, Field

class Address(BaseModel):
   street: str = Field(min_length=5)
   city: str = Field(min_length=2)
   postal_code: str = Field(pattern=r'^\d{5}(-\d{4})?$')
   country: str = "USA"

class Customer(BaseModel):
   name: str = Field(min_length=1)
   email: EmailStr
   shipping_address: Address
   billing_address: Optional[Address] = None

class OrderItem(BaseModel):
   product_id: int = Field(gt=0)
   quantity: int = Field(gt=0, le=100)
   unit_price: Decimal = Field(gt=0)

class Order(BaseModel):
   order_id: str = Field(pattern=r'^ORD-\d{6}$')
   customer: Customer
   items: List[OrderItem] = Field(min_length=1)  # min_items is the deprecated Pydantic v1 name
   order_date: datetime = Field(default_factory=datetime.now)

# Complex nested data validation
order_data = {
    "order_id": "ORD-123456",
    "customer": {
        "name": "John Doe",
        "email": "john@example.com",
        "shipping_address": {
            "street": "123 Main Street",
            "city": "Anytown",
            "postal_code": "12345"
        }
    },
    "items": [
        {"product_id": 1, "quantity": 2, "unit_price": "29.99"},
        {"product_id": 2, "quantity": 1, "unit_price": "149.99"}
    ]
}

order = Order(**order_data)
print(f"Order validated with {len(order.items)} items")

Order validated with 2 items
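
When a nested value is invalid, the error's `loc` tuple spells out the path to the offending field. The sketch below uses trimmed-down versions of the `Address` and `Customer` models (fewer fields, and no `EmailStr`, to keep it self-contained) with a deliberately bad postal code:

```python
from pydantic import BaseModel, Field, ValidationError

class Address(BaseModel):
    city: str = Field(min_length=2)
    postal_code: str = Field(pattern=r'^\d{5}(-\d{4})?$')

class Customer(BaseModel):
    name: str = Field(min_length=1)
    shipping_address: Address

error_paths = []
try:
    Customer(
        name="John Doe",
        shipping_address={"city": "Anytown", "postal_code": "ABCDE"},  # invalid
    )
except ValidationError as exc:
    # Each loc is a tuple describing where in the nested structure the error is
    error_paths = [err["loc"] for err in exc.errors()]

print(error_paths)  # [('shipping_address', 'postal_code')]
```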


## Optional fields and None handling

In [None]:
from typing import Optional

class UserCreate(BaseModel):
   name: str = Field(min_length=1)
   email: EmailStr
   age: int = Field(ge=13, le=120)
   phone: Optional[str] = Field(None, pattern=r'^\+?1?\d{9,15}$')

class UserUpdate(BaseModel):
   name: Optional[str] = Field(None, min_length=1)
   email: Optional[EmailStr] = None
   age: Optional[int] = Field(None, ge=13, le=120)
   phone: Optional[str] = Field(None, pattern=r'^\+?1?\d{9,15}$')

# PATCH request with partial data
update_data = {"name": "Jane Smith", "age": 30}
user_update = UserUpdate(**update_data)

# Serialize only provided fields
patch_data = user_update.model_dump(exclude_none=True)
print(f"Fields to update: {list(patch_data.keys())}")

Fields to update: ['name', 'age']



Serialization converts Pydantic objects back into dictionaries or JSON strings for storage or transmission. The `model_dump()` method handles this conversion, and `exclude_none=True` drops every field whose value is `None`, which here means the fields the client did not send. This pattern suits PATCH requests where clients send only the fields they want to change, and it prevents accidental overwrites in your database.


### 🔹 What is **serialization** in Pydantic?

* **Serialization** = turning a Python object (like a Pydantic model) into a format that can be stored or sent over the network (like a `dict` or `JSON`).
* Think of it as *“packing”* your Python object into a standard format.

---

### 🔹 `model_dump()` in action

```python
from pydantic import BaseModel

class User(BaseModel):
    name: str
    email: str | None = None
    age: int | None = None

# Client only sends "name"
user = User(name="Alice Johnson")

# Convert to dict
print(user.model_dump())  
# 👉 {'name': 'Alice Johnson', 'email': None, 'age': None}

# Convert to dict but skip empty/None fields
print(user.model_dump(exclude_none=True))  
# 👉 {'name': 'Alice Johnson'}
```

---

### 🔹 Why `exclude_none=True` matters

Imagine you’re doing a **PATCH request** (update only some fields in the database).

* Without `exclude_none=True` → missing fields (like `email`, `age`) might get written as `NULL` in your database, wiping out real data.
* With `exclude_none=True` → only the fields actually provided (`name` in this case) are updated, so nothing else is touched.

---

👉 **In simple terms**:

* `model_dump()` = turns your Pydantic model into a plain dictionary (use `model_dump_json()` for a JSON string).
* `exclude_none=True` = don’t include fields whose value is `None`, so you don’t accidentally overwrite existing data.
* Perfect for **PATCH requests** where you only want to update specific fields.
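
One caveat: `exclude_none=True` cannot tell a field the client omitted from one the client deliberately set to `null`. If that distinction matters (for example, to let a client clear a value), Pydantic v2's `exclude_unset=True` may be the better fit. A minimal sketch, with a simplified `UserUpdate` model that uses plain `str` for the email:

```python
from typing import Optional
from pydantic import BaseModel

class UserUpdate(BaseModel):
    name: Optional[str] = None
    email: Optional[str] = None
    age: Optional[int] = None

# The client sends a new name and explicitly clears the email; age is omitted
update = UserUpdate(name="Jane Smith", email=None)

without_none = update.model_dump(exclude_none=True)
without_unset = update.model_dump(exclude_unset=True)

print(without_none)   # {'name': 'Jane Smith'}
print(without_unset)  # {'name': 'Jane Smith', 'email': None}
```

With `exclude_unset=True` the explicit `None` survives, so the update can set `email` to `NULL` on purpose while leaving `age` untouched.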


More details here: https://www.datacamp.com/tutorial/pydantic
