# Lesson 2: Pydantic Basics

In this lesson, you'll learn the fundamentals of Pydantic models for data validation using a customer support system as your example application. You'll see how to define data models, validate user input, and handle validation errors gracefully.

By the end of this lesson, you'll be able to:
- Create Pydantic models to validate user input data
- Catch validation errors and report them in a readable way
- Use optional fields and field constraints in your models
- Work with JSON data validation methods

---

### Summary of the Notebook's Key Ideas

The notebook teaches, in a practical way, how to use Pydantic to ensure the quality of the data entering a system. The central idea is simple: **"Declare the structure of your data with Python classes and let Pydantic do the heavy lifting of validation."**

The main concepts it covers are:

1.  **Structured Data Declaration with `BaseModel`**: Instead of working with "loose" Python dictionaries and checking manually whether keys exist (`'name' in data`), you define a class that inherits from `pydantic.BaseModel`. That class becomes the single source of truth for the expected data structure. It is self-documenting and clear.

2.  **Automatic Type Validation and Coercion**: Pydantic not only checks that a field has the correct type, it also tries to coerce the data to the expected type when that is reasonable. The notebook demonstrates this:
    * A date string such as `"2025-12-31"` is automatically converted to a `datetime.date` object.
    * An `order_id` passed as the string `"12345"` is converted to the integer `12345`.
    * Coercion is not unconditional: the integer `99999` passed for the `str` field `name` is rejected.
    * It also uses specialized types such as `EmailStr` for ready-made complex validation.

3.  **Centralized Error Handling with `ValidationError`**: When data cannot be validated or converted, Pydantic does not fail in some unpredictable way. It raises a single exception, `ValidationError`, that contains a detailed list of every error found. The notebook's `validate_user_input` function shows how to catch this exception and present the errors in a readable form, which makes the application more robust.

4.  **Flexibility with Optional Fields and Constraints**: Not all data is required. The notebook shows how to use `Optional` for fields that may or may not be present. It also introduces `Field` to add business rules directly to the model definition (for example, `order_id` must be a number between 10000 and 99999). This moves the validation logic next to the data definition, which keeps the code cleaner.

5.  **Efficient JSON Handling**: The lesson shows a direct way to handle JSON data. Instead of calling `json.loads(data)` and then passing the dictionary to the model, you can call `UserInput.model_validate_json(json_data)`. This combines JSON parsing and validation in a single step and is generally faster.
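Point 3 above says that a single `ValidationError` carries every problem found, not just the first one. A minimal sketch of that behavior, assuming Pydantic v2 and a hypothetical `Ticket` model that is not part of the notebook (plain `str` fields are used so the example needs no email dependency):

```python
from pydantic import BaseModel, ValidationError

# Hypothetical model for illustration only; not part of the lesson's code.
class Ticket(BaseModel):
    name: str
    order_id: int
    query: str

try:
    # Two problems at once: order_id is not a number and query is missing.
    Ticket(name="Joe User", order_id="abc")
except ValidationError as exc:
    errors = exc.errors()
    # Each entry is a dict with keys such as 'loc', 'msg', and 'type'.
    failed_fields = sorted(error["loc"][0] for error in errors)
    print(exc.error_count(), failed_fields)
```

Both failing fields are reported by the same exception, so the caller can show all problems to the user in one pass.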

---

### Application in LangChain and LangGraph

Now the most important part: why does all of this matter for LangChain?

LLMs are, by nature, generators of unstructured text. To build reliable applications, we need to make that text output conform to a well-defined data structure. **Pydantic is the main tool for defining that structure.**

1.  **Structured Output**:
    * **Scenario**: You ask an LLM to extract information from a customer email and want the response in a specific JSON format.
    * **Pydantic application**: You define a Pydantic `BaseModel` with the fields you want (e.g., `nome_cliente`, `numero_pedido`, `sentimento`). In LangChain, you pass your model to `PydanticOutputParser`, which produces format instructions from the model's schema that tell the LLM how to structure its response. The parser then validates the LLM output and hands it to you as an instance of your Pydantic class, already validated and with the correct types. Output that does not match the schema raises an error instead of flowing into the rest of your code.

2.  **Tool Definitions for Agents**:
    * **Scenario**: You are building an agent that can use tools, such as a function that looks up order information in a database.
    * **Pydantic application**: A good practice is to define the arguments your tool expects with a Pydantic `BaseModel`. This serves as a schema for the LLM. The agent uses the schema to understand that, to call the `buscar_pedido` tool, it must supply an argument named `order_id` that must be an integer. This makes function calling much more reliable.

3.  **State Definition in LangGraph**:
    * **Scenario**: In LangGraph, you build execution graphs in which state (information) flows from one node to the next.
    * **Pydantic application**: A robust way to define your graph's `State` is with a Pydantic `BaseModel` (or a `TypedDict`, which has a similar shape but is not validated at runtime). Each field in your Pydantic model represents one part of the application state (e.g., `email_original`, `resumo`, `dados_extraidos`). This works as a **data contract** between the graph's nodes. If a node is responsible for filling in the `resumo` field, validation can catch a non-string value before the next node breaks while trying to use it.

In short, what you learned in this notebook is the foundation for turning LangChain prototypes that work "most of the time" into production systems that are **reliable, predictable, and easy to maintain**.
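The tool-argument idea from point 2 can be sketched with Pydantic alone, without importing LangChain. The `OrderLookupArgs` model and the `llm_output` string below are hypothetical stand-ins: the first plays the role of a tool's argument schema, the second simulates the JSON a model might return.

```python
from pydantic import BaseModel, Field, ValidationError

# Hypothetical argument schema for a tool such as `buscar_pedido`.
class OrderLookupArgs(BaseModel):
    order_id: int = Field(
        description="5-digit order number", ge=10000, le=99999
    )

# JSON Schema that a framework could hand to the LLM to describe the tool.
schema = OrderLookupArgs.model_json_schema()
print(schema["properties"]["order_id"])

# Simulated LLM output (an assumption for this sketch, not a real API call).
llm_output = '{"order_id": "12345"}'
args = OrderLookupArgs.model_validate_json(llm_output)
print(args.order_id)

# Output that violates the schema is rejected instead of reaching the tool.
try:
    OrderLookupArgs.model_validate_json('{"order_id": 7}')
    rejected = False
except ValidationError:
    rejected = True
```

The same model therefore does double duty: it describes the expected arguments and it checks what actually comes back.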

In [1]:
# Import libraries needed for the lesson
from typing import Any
from pydantic import BaseModel, ValidationError, EmailStr
import json

### Define a UserInput Pydantic model and populate it with data

In [2]:
# Create a Pydantic model for validating user input
class UserInput(BaseModel):
    """A Pydantic model for validating user input data.
    
    This class represents user input containing personal information and a query.
    It uses Pydantic for automatic validation and serialization of the input data.
    
    Attributes:
        name (str): The full name of the user. Any string is accepted,
            including an empty one, because no length constraint is set.
        email (EmailStr): A valid email address. Automatically validated for
            proper email format using Pydantic's EmailStr type.
        query (str): The user's question or request. Any string is accepted.

    Example:
        >>> user_data = UserInput(
        ...     name="João Silva",
        ...     email="joao.silva@email.com",
        ...     query="How can I learn Python?"
        ... )
        >>> print(user_data.name)
        João Silva
        >>> print(user_data.email)
        joao.silva@email.com

    Raises:
        ValidationError: If any of the provided values don't meet the validation
            criteria (e.g., invalid email format, missing field, non-string value).
    
    Note:
        This class inherits from Pydantic's BaseModel, which provides automatic
        validation, serialization, and deserialization capabilities.
    """
    name: str
    email: EmailStr
    query: str

In [3]:
# Create a model instance
user_input = UserInput(
    name="Joe User", 
    email="joe.user@example.com", 
    query="I forgot my password."
)
print(user_input)

name='Joe User' email='joe.user@example.com' query='I forgot my password.'


### Note: the following cell will produce a validation error. You can correct the error by following along with the video, or just proceed with the rest of the notebook as cells below do not depend on this cell. 

In [4]:
# Attempt to create another model instance with an invalid email
user_input = UserInput(
    name="Joe User", 
    email="not-an-email", 
    query="I forgot my password."
)
print(user_input)

ValidationError: 1 validation error for UserInput
email
  value is not a valid email address: An email address must have an @-sign. [type=value_error, input_value='not-an-email', input_type=str]

### Define a function for error handling and try different inputs

In [5]:
# Define a function to handle user input validation safely
def validate_user_input(input_data: dict[str, Any]) -> UserInput | None:
    """Validate user input data and create a UserInput model instance.
    
    This function attempts to create and validate a UserInput instance from
    the provided dictionary data. It handles validation errors gracefully by
    printing user-friendly error messages and returning None on failure.
    
    Args:
        input_data: A dictionary containing user input data with keys 'name',
            'email', and 'query'. All values should be strings, with 'email'
            being a valid email address format.
    
    Returns:
        A validated UserInput instance if validation succeeds, None if
        validation fails due to invalid data format or missing required fields.
    
    Raises:
        ValidationError: Caught internally and converted to user-friendly
            error messages. The function will not propagate this exception.
    
    Example:
        >>> data = {
        ...     "name": "Maria Santos",
        ...     "email": "maria@example.com",
        ...     "query": "How to learn Python?"
        ... }
        >>> result = validate_user_input(data)
        ✅ Valid user input created:
        {
          "name": "Maria Santos",
          "email": "maria@example.com",
          "query": "How to learn Python?"
        }
        >>> isinstance(result, UserInput)
        True
        
        >>> invalid_data = {"name": "", "email": "invalid", "query": ""}
        >>> result = validate_user_input(invalid_data)
        ❌ Validation error occurred:
          - email: value is not a valid email address: An email address must have an @-sign.
        >>> result is None
        True
    
    Note:
        This function prints validation results directly to stdout. Consider
        using logging for production applications instead of print statements.
    """
    try:
        # Attempt to create a UserInput model instance from user input data
        user_input = UserInput(**input_data)
        print("✅ Valid user input created:")
        print(user_input.model_dump_json(indent=2))
        return user_input
    except ValidationError as e:
        # Capture and display validation errors in a readable format
        print("❌ Validation error occurred:")
        for error in e.errors():
            print(f"  - {error['loc'][0]}: {error['msg']}")
        return None

The `UserInput` class constructor expects to receive its arguments like this:
```python
user_input = UserInput(name="Maria", email="maria@example.com", query="...")
```
However, the data is in a dictionary:
```python
input_data = {
    "name": "Maria",
    "email": "maria@example.com",
    "query": "..."
}
```
The expression `UserInput(**input_data)` is a shortcut that makes Python **unpack** the `input_data` dictionary, turning each key-value pair into a keyword argument.

Therefore, the code:
```python
user_input = UserInput(**input_data)
```
is, for a dictionary with exactly these three keys, equivalent to:
```python
user_input = UserInput(
    name=input_data["name"], 
    email=input_data["email"], 
    query=input_data["query"]
)
```
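Pydantic v2 also provides `model_validate`, which accepts the dictionary directly without unpacking. A short sketch comparing the two calls, using a hypothetical `Message` model with plain `str` fields so that it runs without the email dependency:

```python
from pydantic import BaseModel

# Hypothetical stand-in for UserInput, kept minimal for illustration.
class Message(BaseModel):
    name: str
    query: str

input_data = {"name": "Maria", "query": "..."}

unpacked = Message(**input_data)                 # keyword-argument unpacking
validated = Message.model_validate(input_data)   # dictionary passed as-is

print(unpacked == validated)
```

Both routes run the same validation, so the choice is mostly a matter of style.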


In [6]:
# Create an instance of UserInput using validate_user_input() function
input_data = {
    "name": "Joe User", 
    "email": "joe.user@example.com",
    "query": "I forgot my password."
}

user_input = validate_user_input(input_data)

✅ Valid user input created:
{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "I forgot my password."
}


In [7]:
# Attempt to create an instance of UserInput with missing query field
input_data = {
    "name": "Joe User", 
    "email": "joe.user@example.com"
}

user_input = validate_user_input(input_data)

❌ Validation error occurred:
  - query: Field required


### Update your UserInput data model with additional fields and experiment with different input data

In [8]:
# Import additional libraries for enhanced validation
from pydantic import Field
from typing import Optional
from datetime import date

# Define a new UserInput model with optional fields
class UserInput(BaseModel):
    """Enhanced Pydantic model for validating user input with optional fields.
    
    This class represents comprehensive user input data including personal 
    information, queries, and optional order details. It uses Pydantic's Field
    for advanced validation rules and constraints.
    
    Attributes:
        name (str): The full name of the user. Required field.
        email (EmailStr): A valid email address. Automatically validated for 
            proper email format.
        query (str): The user's question or request. Required field.
        order_id (Optional[int]): An optional 5-digit order number. Must be 
            between 10000-99999 (cannot start with 0). Defaults to None.
        purchase_date (Optional[date]): An optional purchase date. Accepts 
            ISO format dates (YYYY-MM-DD). Defaults to None.
    
    Examples:
        Basic usage with required fields only:
        >>> user = UserInput(
        ...     name="Ana Silva",
        ...     email="ana@example.com",
        ...     query="I need help with my order"
        ... )
        
        Complete usage with all fields:
        >>> from datetime import date
        >>> user = UserInput(
        ...     name="Carlos Oliveira",
        ...     email="carlos@example.com",
        ...     query="Order status",
        ...     order_id=12345,
        ...     purchase_date=date(2024, 1, 15)
        ... )
        
        Invalid order_id (will raise ValidationError):
        >>> user = UserInput(
        ...     name="Test User",
        ...     email="test@example.com", 
        ...     query="Test query",
        ...     order_id=123  # Too small, must be 5 digits
        ... )
    
    Raises:
        ValidationError: If any validation constraint is violated:
            - Invalid email format
            - order_id outside range 10000-99999
            - Invalid date format for purchase_date
    
    Note:
        This class leverages Pydantic's Field for advanced validation.
        Optional fields can be omitted and will default to None.
    """
    name: str
    email: EmailStr
    query: str
    order_id: Optional[int] = Field(
        None,
        description="5-digit order number (cannot start with 0)",
        ge=10000,
        le=99999
    )
    purchase_date: Optional[date] = None

### The `Field` Function in Pydantic
`Field` is a special Pydantic function that lets you define advanced validation rules
and metadata for the model's fields.

```python
field_name: field_type = Field(default_value, **settings)
```
* Main `Field` parameters
1. Default value
    ```python
    order_id: Optional[int] = Field(None, ...)  # None is the default value
    ```
2. Description - field documentation
    ```python
    Field(description="5-digit order number (cannot start with 0)")
    ```
    * Purpose: Documents the field for developers and for automatically generated documentation
    * Usage: Appears in JSON schemas, OpenAPI documentation, etc.

3. ge - Greater than or Equal
    ```python
    Field(ge=10000)  # order_id >= 10000
    ```
    * Applies to: numbers (`int`, `float`)
    * Validation: Ensures the value is greater than or equal to the specified bound

4. le - Less than or Equal
    ```python
    Field(le=99999)  # order_id <= 99999
    ```
    * Applies to: numbers (`int`, `float`)
    * Validation: Ensures the value is less than or equal to the specified bound

Other useful `Field` parameters

For strings:
```python
name: str = Field(
    min_length=2,        # At least 2 characters
    max_length=100,      # At most 100 characters
    pattern=r"^[A-Za-z\s]+$"  # Letters and whitespace only (Pydantic v2 uses `pattern`, not `regex`)
)
```
For numbers:
```python
from decimal import Decimal

price: Decimal = Field(
    gt=0,           # Greater than
    lt=1000000,     # Less than
    decimal_places=2  # At most 2 decimal places (applies to Decimal, not float)
)
```
For lists:
```python
tags: list[str] = Field(
    min_length=1,    # At least 1 item (Pydantic v2 name; v1 used min_items)
    max_length=10    # At most 10 items (Pydantic v2 name; v1 used max_items)
)
```
Conclusion: `Field` turns data validation from a manual, error-prone task into a declarative, automatic process! 🚀
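To see several of these constraints working together, here is a minimal runnable sketch. It assumes Pydantic v2, where string patterns are set with `pattern` and list sizes with `min_length`/`max_length`; the `Product` model and its fields are hypothetical.

```python
from decimal import Decimal
from pydantic import BaseModel, Field, ValidationError

# Hypothetical model used only to exercise the constraints.
class Product(BaseModel):
    name: str = Field(min_length=2, max_length=100, pattern=r"^[A-Za-z\s]+$")
    price: Decimal = Field(gt=0, lt=1000000, decimal_places=2)
    tags: list[str] = Field(min_length=1, max_length=10)

# Valid data: the price string is converted to a Decimal.
ok = Product(name="Laptop case", price="49.90", tags=["bags"])
print(ok.price)

try:
    # Every field breaks one rule: digit in name, 3 decimals, empty list.
    Product(name="Case 2", price="49.905", tags=[])
except ValidationError as exc:
    broken = sorted(error["loc"][0] for error in exc.errors())
    print(broken)
```

Using `Decimal` for the price is a deliberate choice in this sketch, since a limit on decimal places only makes sense for an exact decimal type.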

In [9]:
# Define a dictionary with required fields only
input_data = {
    "name": "Joe User",
    "email": "joe.user@example.com",
    "query": "I forgot my password."
}

# Validate the user input data
user_input = validate_user_input(input_data)

✅ Valid user input created:
{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "I forgot my password.",
  "order_id": null,
  "purchase_date": null
}


In [10]:
print(user_input)

name='Joe User' email='joe.user@example.com' query='I forgot my password.' order_id=None purchase_date=None


In [11]:
print(user_input.model_dump_json(indent=2))

{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "I forgot my password.",
  "order_id": null,
  "purchase_date": null
}


In [12]:
user_input.model_dump()  # Dict Python

{'name': 'Joe User',
 'email': 'joe.user@example.com',
 'query': 'I forgot my password.',
 'order_id': None,
 'purchase_date': None}

In [13]:
# Define a dictionary with all fields including optional ones
input_data = {
    "name": "Joe User",
    "email": "joe.user@example.com",
    "query": """I bought a laptop carrying case and it turned out to be 
             the wrong size. I need to return it.""",
    "order_id": 12345,
    "purchase_date": date(2025, 12, 31)
}

# Validate the user input data
user_input = validate_user_input(input_data)

✅ Valid user input created:
{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "I bought a laptop carrying case and it turned out to be \n             the wrong size. I need to return it.",
  "order_id": 12345,
  "purchase_date": "2025-12-31"
}


In [14]:
# Define a dictionary with all fields plus extra keys the model does not declare
input_data = {
    "name": "Joe User",
    "email": "joe.user@example.com",
    "query": """I bought a laptop carrying case and it turned out to be 
             the wrong size. I need to return it.""",
    "order_id": 12345,
    "purchase_date": date(2025, 12, 31),
    "system_message": "logging status regarding order processing...",
    "iteration": 1 
}

# Validate the user input data
user_input = validate_user_input(input_data)

✅ Valid user input created:
{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "I bought a laptop carrying case and it turned out to be \n             the wrong size. I need to return it.",
  "order_id": 12345,
  "purchase_date": "2025-12-31"
}


In [15]:
print(user_input)

name='Joe User' email='joe.user@example.com' query='I bought a laptop carrying case and it turned out to be \n             the wrong size. I need to return it.' order_id=12345 purchase_date=datetime.date(2025, 12, 31)
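As the output shows, the undeclared keys `system_message` and `iteration` are dropped without any error, which is Pydantic's default behavior for extra fields. If rejecting unknown keys is preferable, one option is the `extra="forbid"` setting. A sketch with a hypothetical `StrictInput` model:

```python
from pydantic import BaseModel, ConfigDict, ValidationError

# Hypothetical model; the notebook's UserInput keeps the default behavior.
class StrictInput(BaseModel):
    model_config = ConfigDict(extra="forbid")

    name: str
    query: str

try:
    # 'iteration' is not declared on the model, so validation fails.
    StrictInput(name="Joe User", query="Return request", iteration=1)
    error_types = []
except ValidationError as exc:
    error_types = [error["type"] for error in exc.errors()]

print(error_types)
```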


In [16]:
# Create an instance of UserInput with valid data
input_data = {
    "name": "Joe User",
    "email": "joe.user@example.com",
    "query": """I bought a laptop carrying case and it turned out to be 
             the wrong size. I need to return it.""",
    "order_id": 12345,
    "purchase_date": "2025-12-31"
}

user_input = validate_user_input(input_data)

✅ Valid user input created:
{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "I bought a laptop carrying case and it turned out to be \n             the wrong size. I need to return it.",
  "order_id": 12345,
  "purchase_date": "2025-12-31"
}


In [17]:
# Define order_id as a string
input_data = {
    "name": "Joe User",
    "email": "joe.user@example.com",
    "query": """I bought a laptop carrying case and it turned out to be 
             the wrong size. I need to return it.""",
    "order_id": "12345",
    "purchase_date": "2025-12-31"
}

# Validate the user input data
user_input = validate_user_input(input_data)

✅ Valid user input created:
{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "I bought a laptop carrying case and it turned out to be \n             the wrong size. I need to return it.",
  "order_id": 12345,
  "purchase_date": "2025-12-31"
}


In [18]:
# Define name field as an integer
input_data = {
    "name": 99999,
    "email": "joe.user@example.com",
    "query": """I bought a laptop carrying case and it turned out to be 
             the wrong size. I need to return it.""",
    "order_id": 12345,
    "purchase_date": "2025-12-31"
}

# Validate the user input data
user_input = validate_user_input(input_data)

❌ Validation error occurred:
  - name: Input should be a valid string
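The last two cells show that Pydantic's default (lax) mode converts the string `"12345"` to an integer but does not convert the integer `99999` to a string. If even the string-to-integer conversion is unwanted, a field can be marked strict. A sketch, assuming Pydantic v2 and the hypothetical `LaxOrder` and `StrictOrder` models:

```python
from pydantic import BaseModel, Field, ValidationError

class LaxOrder(BaseModel):
    order_id: int  # default mode: numeric strings are converted

class StrictOrder(BaseModel):
    order_id: int = Field(strict=True)  # only real integers are accepted

lax = LaxOrder(order_id="12345")
print(type(lax.order_id).__name__)

try:
    StrictOrder(order_id="12345")
    strict_rejected = False
except ValidationError:
    strict_rejected = True

print(strict_rejected)
```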


### Try starting with JSON data as input

In [19]:
# Define user input as JSON data
json_data = '''
{
    "name": "Joe User",
    "email": "joe.user@example.com",
    "query": "I bought a keyboard and mouse and was overcharged.",
    "order_id": 12345,
    "purchase_date": "2025-12-31"
}
'''

# Parse the JSON string into a Python dictionary
input_data = json.loads(json_data)
print("Parsed JSON:", input_data)

Parsed JSON: {'name': 'Joe User', 'email': 'joe.user@example.com', 'query': 'I bought a keyboard and mouse and was overcharged.', 'order_id': 12345, 'purchase_date': '2025-12-31'}


In [20]:
import json
# Parse the JSON string into a Python dictionary
input_data = json.loads(json_data)

# Convert the dictionary back into a formatted JSON string
# The 'indent=4' parameter adds 4 spaces of indentation per nesting level
pretty_json = json.dumps(input_data, indent=4)

print(pretty_json)

{
    "name": "Joe User",
    "email": "joe.user@example.com",
    "query": "I bought a keyboard and mouse and was overcharged.",
    "order_id": 12345,
    "purchase_date": "2025-12-31"
}


In [21]:
# Validate the user input data
user_input = validate_user_input(input_data)

✅ Valid user input created:
{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "I bought a keyboard and mouse and was overcharged.",
  "order_id": 12345,
  "purchase_date": "2025-12-31"
}


In [22]:
# Try different JSON input
json_data = '''
{
    "name": "Joe User",
    "email": "joe.user@example.com",
    "query": "My account has been locked for some reason.",
    "order_id": "12345",
    "purchase_date": "2025-12-31"
}
'''

# Parse the JSON into a Python dictionary
input_data = json.loads(json_data)
print("Parsed JSON:", input_data)

Parsed JSON: {'name': 'Joe User', 'email': 'joe.user@example.com', 'query': 'My account has been locked for some reason.', 'order_id': '12345', 'purchase_date': '2025-12-31'}


In [23]:
# Validate the customer support data from JSON with non-standard formats
user_input = validate_user_input(input_data)

✅ Valid user input created:
{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "My account has been locked for some reason.",
  "order_id": 12345,
  "purchase_date": "2025-12-31"
}


### Try the `model_validate_json` method

### Note: the following cell raises a validation error when `json_data` contains invalid values. With the `json_data` defined above it validates successfully, as the output shows.

In [24]:
# Parse JSON and validate user input data in one step using model_validate_json method
user_input = UserInput.model_validate_json(json_data)
print(user_input.model_dump_json(indent=2))

{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "My account has been locked for some reason.",
  "order_id": 12345,
  "purchase_date": "2025-12-31"
}


---

## Conclusion

In this lesson, you learned how to use Pydantic models to validate user input for a customer support scenario. By defining clear data models and handling validation errors, you can ensure your code only works with well-formed data. This approach helps you build more robust and reliable applications, and sets the stage for more advanced validation and structured output in future lessons.