# Lesson 3: Pydantic Validation

**Module 5: Model Deployment**  
**Estimated Time**: 1 hour  
**Difficulty**: Beginner

---

## 🎯 Learning Objectives

By the end of this lesson, you will:

✅ Understand why manually checking `if 'key' in json` is bad  
✅ Learn **Pydantic** for schema definition  
✅ Integrate Pydantic models with FastAPI  
✅ Answer interview questions on API robustness  

---

## 📚 Table of Contents

1. [The Problem: Fragile JSON](#1-problem)
2. [The Solution: Pydantic Models](#2-pydantic)
3. [Hands-On: Validating ML Inputs](#3-hands-on)
4. [Interview Preparation](#4-interview-questions)

---

## 1. The Problem: Fragile JSON

Without a validation library, every endpoint ends up checking its input by hand:
```python
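def predict(data):
    # Each feature needs its own presence check and its own type check
    if 'age' not in data:
        raise ValueError("Missing required field: 'age'")
    if not isinstance(data['age'], int):
        raise TypeError("'age' must be an integer")
    # ... and so on for 20 features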
```

This is tedious and error-prone.

## 2. The Solution: Pydantic Models

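Pydantic enforces the types you declare. If you declare `age: int` and a client sends `"25"`, Pydantic coerces it to `25` (its default, non-strict behavior). If the client sends `"hello"`, it raises a `ValidationError` that names the offending field.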
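
The coercion and rejection behavior described above can be sketched in a few lines. The `Person` model is a hypothetical example that is not part of the lesson's House Price schema, and the snippet assumes Pydantic's default (non-strict) mode.

```python
from pydantic import BaseModel, ValidationError


class Person(BaseModel):  # hypothetical model, for illustration only
    age: int


# A numeric string is coerced to an int in the default (non-strict) mode.
coerced = Person(age="25")
print(coerced.age, type(coerced.age))  # 25 <class 'int'>

# A non-numeric string cannot be coerced, so validation fails.
error = None
try:
    Person(age="hello")
except ValidationError as exc:
    error = exc
    print(exc)
```

The printed error names the field (`age`) and explains why the value was rejected, which is what a client needs in order to fix its request.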

FastAPI uses these same models to validate request bodies and to generate the OpenAPI schema behind the interactive Swagger UI documentation (served at `/docs` by default).

## 3. Hands-On: Validating ML Inputs

We will define an input schema for a House Price model.

```python
from pydantic import BaseModel, Field, ValidationError, validator

# 1. Define Input Schema
class HouseFeatures(BaseModel):
    sqft: float = Field(..., gt=0, description="Square footage of the house")
    bedrooms: int = Field(..., ge=1, le=10)
    has_garden: bool = False
    location: str

    # Custom validation. `validator` is the Pydantic v1 decorator; in
    # Pydantic v2 it still works but is deprecated in favor of `field_validator`.
    @validator('location')
    def location_must_be_known(cls, v):
        allowed = ['NY', 'CA', 'TX']
        if v not in allowed:
            raise ValueError(f'Location must be one of {allowed}')
        return v

# 2. Define Output Schema
class PricePrediction(BaseModel):
    price: float
    currency: str = "USD"

# 3. Simulate Parsing
print("--- Valid Request ---")
valid_data = {
    "sqft": 1500,
    "bedrooms": 3,
    "location": "NY"
}
house = HouseFeatures(**valid_data)
print(house)

print("\n--- Invalid Request (Bad Location) ---")
invalid_data = {
    "sqft": 1000,
    "bedrooms": 2,
    "location": "UK"  # Not in the allowed list, so validation fails
}
try:
    HouseFeatures(**invalid_data)
except ValidationError as e:
    print(f"Validation Error: {e}")
```
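
The `Field` constraints (`gt`, `ge`, `le`) are checked the same way as the custom validator. As a small sketch, the snippet below sends a request that breaks two constraints at once and reads the structured error list instead of the printed message. The model is a trimmed copy of `HouseFeatures`, and the input values are made up for illustration.

```python
from pydantic import BaseModel, Field, ValidationError


class HouseFeatures(BaseModel):  # trimmed copy of the schema above
    sqft: float = Field(..., gt=0)
    bedrooms: int = Field(..., ge=1, le=10)


failed_fields = []
try:
    # Both values violate their constraints: sqft <= 0 and bedrooms > 10.
    HouseFeatures(sqft=-50, bedrooms=25)
except ValidationError as exc:
    # errors() returns one dict per failure; "loc" holds the field path.
    failed_fields = [err["loc"][0] for err in exc.errors()]

print(failed_fields)
```

Pydantic collects every failure in a single pass rather than stopping at the first one, so a client can correct all of its mistakes in one round trip.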

## 4. Interview Preparation

### Common Questions

#### Q1: "Why define an Output Schema?"
**Answer**: "It prevents **leaking internal data**. If my internal result object carries sensitive fields like `user_id` or `debug_info`, declaring an Output Schema (in FastAPI, via `response_model`) ensures that ONLY the declared fields, such as `price` and `currency`, are sent back to the client."
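
A quick way to see the filtering effect is with plain Pydantic, which ignores undeclared fields by default. The `internal_result` dictionary below is a hypothetical stand-in for whatever the model code produces internally.

```python
from pydantic import BaseModel


class PricePrediction(BaseModel):
    price: float
    currency: str = "USD"


# Hypothetical internal result carrying fields the client should never see.
internal_result = {"price": 350000.0, "user_id": 42, "debug_info": "model=v3"}

# Undeclared keys are ignored by default, so only schema fields survive.
response = PricePrediction(**internal_result)
payload = dict(response)
print(payload)
```

FastAPI applies comparable filtering when an endpoint declares `response_model=PricePrediction`, so the endpoint function does not have to strip the extra fields itself.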

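#### Q2: "How do you handle optional fields?"
**Answer**: "Annotate the field with `Optional[str]` from `typing`, or `str | None` in Python 3.10+, and give it a default, e.g., `description: Optional[str] = None`. The `= None` default tells Pydantic not to error if the field is missing. In Pydantic v2, `Optional[str]` without a default is still a required field that merely accepts `None`."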
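
As a short illustration, the hypothetical `Listing` model below has one required field and one optional field. It is not part of the lesson's schema.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Listing(BaseModel):  # hypothetical model, for illustration only
    sqft: float                         # required: no default
    description: Optional[str] = None   # optional: defaults to None


with_desc = Listing(sqft=900, description="Corner unit")
without_desc = Listing(sqft=900)
print(without_desc.description)  # None

# Omitting the required field still fails validation.
missing_error = None
try:
    Listing(description="no square footage given")
except ValidationError as exc:
    missing_error = exc
    print(exc)
```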