You'll learn the fundamentals of Pydantic models for data validation using a customer support system as your example application. You'll see how to define data models, validate user input, and handle validation errors gracefully.

You'll be able to:
- Create Pydantic models to validate user input data
- Catch validation errors and report them clearly
- Use optional fields and field constraints in your models
- Work with JSON data validation methods


!pip install "pydantic[email]"

In [28]:
from pydantic import BaseModel, EmailStr, ValidationError

In [29]:
class UserInput(BaseModel):
    name: str
    email: EmailStr
    query: str

In [30]:
user_input = UserInput(name="Waqas", email="waqasali@gmail.com", query="I forgot my password")

In [31]:
user_input 

UserInput(name='Waqas', email='waqasali@gmail.com', query='I forgot my password')

In [32]:
print(user_input.model_dump_json(indent=2))

{
  "name": "Waqas",
  "email": "waqasali@gmail.com",
  "query": "I forgot my password"
}


In [33]:
user_input = UserInput(name="Waqas", email="waqasaligmail.com", query="I forgot my password")

ValidationError: 1 validation error for UserInput
email
  value is not a valid email address: An email address must have an @-sign. [type=value_error, input_value='waqasaligmail.com', input_type=str]

In [34]:
user_input

UserInput(name='Waqas', email='waqasali@gmail.com', query='I forgot my password')

In [1]:
def validate_user_input(input_data):
    try:
        user_input = UserInput(**input_data)
        print("Valid User Input Created")
        print(user_input.model_dump_json(indent=2))
        return user_input
    except ValidationError as e:
        print("Validation error occurred")
        for error in e.errors():
            print(f"  - {error['loc'][0]}: {error['msg']}")
        return None
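
`validate_user_input` prints `error['loc'][0]`, which is enough for a flat model like `UserInput`. For nested models, `loc` is a tuple holding the full path to the failing field. A minimal sketch of printing that whole path, assuming hypothetical `Customer` and `Address` models that are not part of this notebook:

```python
from pydantic import BaseModel, ValidationError


class Address(BaseModel):
    # Hypothetical nested model, used only for illustration
    city: str


class Customer(BaseModel):
    # Hypothetical parent model with one nested field
    name: str
    address: Address


def format_errors(e: ValidationError) -> list[str]:
    """Render each error as 'path.to.field: message'."""
    return [
        f"{'.'.join(str(part) for part in err['loc'])}: {err['msg']}"
        for err in e.errors()
    ]


try:
    Customer(name="Joe User", address={})  # address is missing 'city'
    messages = []
except ValidationError as e:
    messages = format_errors(e)

print(messages)
```

Joining every element of `loc` keeps the message unambiguous when two nested models share a field name.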

In [36]:
input_data = {
    "name": "Joe User", 
    "email": "joe.user@example.com",
    "query": "I forgot my password."
}

user_input = validate_user_input(input_data)

Valid User Input Created
{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "I forgot my password."
}


In [39]:
# Attempt to create an instance of UserInput with missing query field
input_data = {
    "name": "Joe User", 
    "email": "joe.user@example.com"
}

user_input = validate_user_input(input_data)

Validation error occurred
{'type': 'missing', 'loc': ('query',), 'msg': 'Field required', 'input': {'name': 'Joe User', 'email': 'joe.user@example.com'}, 'url': 'https://errors.pydantic.dev/2.11/v/missing'}


In [42]:
# Same input, re-run after validate_user_input was updated to print only each error's field and message
input_data = {
    "name": "Joe User", 
    "email": "joe.user@example.com"
}

user_input = validate_user_input(input_data)

Validation error occurred
  - query: Field required


In [44]:
from pydantic import Field
from typing import Optional
from datetime import date

In [50]:
class UserInput(BaseModel):
    name: str
    email: EmailStr
    query: str
    order_id: Optional[int] = Field(
        None,
        description="5-digit order number (cannot start with 0)",
        ge=10000,
        le=99999,
    )
    purchase_date: Optional[date] = None

In [51]:
# Define a dictionary with required fields only
input_data = {
    "name": "Joe User",
    "email": "joe.user@example.com",
    "query": "I forgot my password."
}

# Validate the user input data
user_input = validate_user_input(input_data)

Valid User Input Created
{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "I forgot my password.",
  "order_id": null,
  "purchase_date": null
}


In [52]:
user_input

UserInput(name='Joe User', email='joe.user@example.com', query='I forgot my password.', order_id=None, purchase_date=None)

In [53]:
# Define a dictionary with required and optional fields
input_data = {
    "name": "Joe User",
    "email": "joe.user@example.com",
    "query": "I forgot my password.",
    "order_id": 12345,
    "purchase_date": date(2015,12,1)
}

# Validate the user input data
user_input = validate_user_input(input_data)

Valid User Input Created
{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "I forgot my password.",
  "order_id": 12345,
  "purchase_date": "2015-12-01"
}


In [54]:
# Define a dictionary with all fields plus extra keys that are not in the model
input_data = {
    "name": "Joe User",
    "email": "joe.user@example.com",
    "query": """I bought a laptop carrying case and it turned out to be 
             the wrong size. I need to return it.""",
    "order_id": 12345,
    "purchase_date": date(2025, 12, 31),
    "system_message": "logging status regarding order processing...",
    "iteration": 1 
}

# Validate the user input data
user_input = validate_user_input(input_data)

Valid User Input Created
{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "I bought a laptop carrying case and it turned out to be \n             the wrong size. I need to return it.",
  "order_id": 12345,
  "purchase_date": "2025-12-31"
}


In [55]:
user_input

UserInput(name='Joe User', email='joe.user@example.com', query='I bought a laptop carrying case and it turned out to be \n             the wrong size. I need to return it.', order_id=12345, purchase_date=datetime.date(2025, 12, 31))
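
The output above shows that `system_message` and `iteration` were silently dropped: by default, Pydantic ignores keys that are not declared on the model. If you would rather reject such input, one option is Pydantic v2's `ConfigDict(extra="forbid")`. The sketch below assumes a hypothetical `StrictUserInput` model, trimmed to plain `str` fields for brevity:

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class StrictUserInput(BaseModel):
    # Hypothetical variant of UserInput that rejects undeclared keys
    model_config = ConfigDict(extra="forbid")

    name: str
    query: str


def extra_field_errors(data: dict) -> list[str]:
    """Return the names of keys rejected as extra, or [] if none were."""
    try:
        StrictUserInput(**data)
        return []
    except ValidationError as e:
        return [
            str(err["loc"][0])
            for err in e.errors()
            if err["type"] == "extra_forbidden"
        ]


rejected = extra_field_errors(
    {"name": "Joe User", "query": "I need to return it.", "iteration": 1}
)
print(rejected)
```

Whether to ignore or forbid extras is a design choice: ignoring is forgiving toward upstream systems that attach metadata, while forbidding surfaces unexpected keys early.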

In [56]:
# Pass purchase_date as an ISO-format string instead of a date object
input_data = {
    "name": "Joe User",
    "email": "joe.user@example.com",
    "query": """I bought a laptop carrying case and it turned out to be 
             the wrong size. I need to return it.""",
    "order_id": 12345,
    "purchase_date": "2025-12-31"
}

user_input = validate_user_input(input_data)

Valid User Input Created
{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "I bought a laptop carrying case and it turned out to be \n             the wrong size. I need to return it.",
  "order_id": 12345,
  "purchase_date": "2025-12-31"
}


In [57]:
user_input

UserInput(name='Joe User', email='joe.user@example.com', query='I bought a laptop carrying case and it turned out to be \n             the wrong size. I need to return it.', order_id=12345, purchase_date=datetime.date(2025, 12, 31))

In [58]:
# Define order_id as a string
input_data = {
    "name": "Joe User",
    "email": "joe.user@example.com",
    "query": """I bought a laptop carrying case and it turned out to be 
             the wrong size. I need to return it.""",
    "order_id": "12345",
    "purchase_date": "2025-12-31"
}

# Validate the user input data
user_input = validate_user_input(input_data)

Valid User Input Created
{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "I bought a laptop carrying case and it turned out to be \n             the wrong size. I need to return it.",
  "order_id": 12345,
  "purchase_date": "2025-12-31"
}
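
Here the string `"12345"` came back as the integer `12345` because Pydantic validates in lax mode by default and coerces compatible values. A minimal sketch of turning coercion off for a single call, assuming Pydantic v2's `strict=True` argument to `model_validate` and a hypothetical `Order` model that reuses the `order_id` constraints:

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class Order(BaseModel):
    # Hypothetical model with the same constraints as UserInput.order_id
    order_id: Optional[int] = Field(None, ge=10000, le=99999)


# Default (lax) mode: the numeric string is coerced to an int
lax = Order.model_validate({"order_id": "12345"})
print(type(lax.order_id).__name__, lax.order_id)

# Strict mode: the same input is rejected instead of coerced
try:
    Order.model_validate({"order_id": "12345"}, strict=True)
    strict_error = None
except ValidationError as e:
    strict_error = e.errors()[0]["type"]

print(strict_error)
```

Lax mode is usually convenient for form or LLM input, where numbers often arrive as strings; strict mode suits cases where a wrong type signals an upstream bug.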


In [59]:
# Define name field as an integer
input_data = {
    "name": 99999,
    "email": "joe.user@example.com",
    "query": """I bought a laptop carrying case and it turned out to be 
             the wrong size. I need to return it.""",
    "order_id": 12345,
    "purchase_date": "2025-12-31"
}

# Validate the user input data
user_input = validate_user_input(input_data)

Validation error occurred
  - name: Input should be a valid string


In [60]:
# Define name as an integer and order_id as a non-numeric string
input_data = {
    "name": 99999,
    "email": "joe.user@example.com",
    "query": """I bought a laptop carrying case and it turned out to be 
             the wrong size. I need to return it.""",
    "order_id": "12345W",
    "purchase_date": "2025-12-31"
}

# Validate the user input data
user_input = validate_user_input(input_data)

Validation error occurred
  - name: Input should be a valid string
  - order_id: Input should be a valid integer, unable to parse string as an integer


In [62]:
import json

In [63]:
# Define user input as JSON data
json_data = '''
{
    "name": "Joe User",
    "email": "joe.user@example.com",
    "query": "I bought a keyboard and mouse and was overcharged.",
    "order_id": 12345,
    "purchase_date": "2025-12-31"
}
'''

# Parse the JSON string into a Python dictionary
input_data = json.loads(json_data)
print("Parsed JSON:", input_data)

Parsed JSON: {'name': 'Joe User', 'email': 'joe.user@example.com', 'query': 'I bought a keyboard and mouse and was overcharged.', 'order_id': 12345, 'purchase_date': '2025-12-31'}


In [64]:
user_input = validate_user_input(input_data)

Valid User Input Created
{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "I bought a keyboard and mouse and was overcharged.",
  "order_id": 12345,
  "purchase_date": "2025-12-31"
}


In [65]:
# Try JSON input where order_id is a string with a leading zero
json_data = '''
{
    "name": "Joe User",
    "email": "joe.user@example.com",
    "query": "My account has been locked for some reason.",
    "order_id": "01234",
    "purchase_date": "2025-12-31"
}
'''

# Parse the JSON into a Python dictionary
input_data = json.loads(json_data)
print("Parsed JSON:", input_data)

Parsed JSON: {'name': 'Joe User', 'email': 'joe.user@example.com', 'query': 'My account has been locked for some reason.', 'order_id': '01234', 'purchase_date': '2025-12-31'}


In [66]:
# Validate the parsed data; order_id "01234" is coerced to 1234, which is below the minimum
user_input = validate_user_input(input_data)

Validation error occurred
  - order_id: Input should be greater than or equal to 10000


In [68]:
user_input = UserInput.model_validate_json(json_data)
print(user_input.model_dump_json(indent=2))

ValidationError: 1 validation error for UserInput
order_id
  Input should be greater than or equal to 10000 [type=greater_than_equal, input_value='01234', input_type=str]
    For further information visit https://errors.pydantic.dev/2.11/v/greater_than_equal

In [72]:
# Try JSON input with a valid order_id supplied as a string
json_data = '''
{
    "name": "Joe User",
    "email": "joe.user@example.com",
    "query": "My account has been locked for some reason.",
    "order_id": "11234",
    "purchase_date": "2025-12-31"
}
'''

user_input = UserInput.model_validate_json(json_data)
print(user_input.model_dump_json(indent = 2))

{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "My account has been locked for some reason.",
  "order_id": 11234,
  "purchase_date": "2025-12-31"
}


In [73]:
# Try malformed JSON (the closing brace is missing)
json_data = '''
{
    "name": "Joe User",
    "email": "joe.user@example.com",
    "query": "My account has been locked for some reason.",
    "order_id": "11234",
    "purchase_date": "2025-12-31"

'''

user_input = UserInput.model_validate_json(json_data)
print(user_input.model_dump_json(indent = 2))

ValidationError: 1 validation error for UserInput
  Invalid JSON: EOF while parsing an object at line 9 column 0 [type=json_invalid, input_value='\n{\n    "name": "Joe Us...date": "2025-12-31"\n\n', input_type=str]
    For further information visit https://errors.pydantic.dev/2.11/v/json_invalid
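
Malformed JSON and field-level failures both raise `ValidationError`, but they call for different handling: the first means the payload could not be parsed at all, the second means it parsed but broke a rule. One way to tell them apart is to check the error `type`, which is `json_invalid` in the output above. A minimal sketch, assuming a hypothetical `Ticket` model that stands in for `UserInput`:

```python
from pydantic import BaseModel, ValidationError


class Ticket(BaseModel):
    # Hypothetical stand-in for UserInput with only plain fields
    name: str
    query: str


def classify_json(raw: str) -> str:
    """Return 'ok', 'malformed_json', or 'invalid_fields' for a JSON string."""
    try:
        Ticket.model_validate_json(raw)
        return "ok"
    except ValidationError as e:
        error_types = {err["type"] for err in e.errors()}
        if "json_invalid" in error_types:
            return "malformed_json"
        return "invalid_fields"


print(classify_json('{"name": "Joe User", "query": "Locked out"}'))
print(classify_json('{"name": "Joe User", "query": "Locked out"'))
print(classify_json('{"name": "Joe User"}'))
```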

# Prompting for structure and setting up a retry method

You'll learn how to combine Pydantic with retry strategies to reliably extract structured output from an LLM.

By the end, you'll be able to:
- Define structured data models for LLM responses
- Build robust retry mechanisms for validation errors
- Create reusable functions for LLM interactions




In [6]:
import os
from dotenv import load_dotenv

import openai
from openai import OpenAI

load_dotenv(".env", override=True)

True

In [3]:
openai.api_key = os.getenv("OPENAI_API_KEY")

In [7]:
client = OpenAI()

In [20]:
user_input_json = '''
{
    "name": "Waqas",
    "email": "waqas@gmail.com",
    "query": "I forgot my password",
    "order_id": null,
    "purchase_date": null
}
'''
    

In [25]:
from pydantic import BaseModel, EmailStr, ValidationError, Field
from typing import Optional, Literal, List
from datetime import date

In [21]:
class UserInput(BaseModel):
    name: str
    email: EmailStr
    query: str
    order_id: Optional[int] = Field(
        None,
        description="5 Digit order id (cant be started with 0)",
        ge=10000,
        le=99999,
    )
    purchase_date: Optional[date] = None

In [22]:
user_input = UserInput.model_validate_json(user_input_json)

In [23]:
user_input

UserInput(name='Waqas', email='waqas@gmail.com', query='I forgot my password', order_id=None, purchase_date=None)

# Create a new data model called CustomerQuery

In [26]:
class CustomerQuery(UserInput):
    priority: str = Field(..., description="Priority Level: low, medium, high")
    category: Literal["refund_request", "information_request", "other"] = Field(
        ..., description="Query Category"
    )
    is_complaint: bool = Field(..., description="Whether this is a complaint")
    tags: List[str] = Field(..., description="Relevant keyword tags")
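
In `CustomerQuery`, `category` is constrained with `Literal`, while `priority` is a plain `str` whose allowed values appear only in the description, so any string passes validation for it. If you want both fields enforced, a sketch along these lines may help; `Triage` is a hypothetical model and not part of the notebook's code:

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class Triage(BaseModel):
    # Hypothetical model: both fields are limited to a fixed set of values
    priority: Literal["low", "medium", "high"] = Field(
        ..., description="Priority level"
    )
    category: Literal["refund_request", "information_request", "other"] = Field(
        ..., description="Query category"
    )


def invalid_fields(data: dict) -> list[str]:
    """Return the sorted names of fields that failed validation."""
    try:
        Triage(**data)
        return []
    except ValidationError as e:
        return sorted(str(err["loc"][0]) for err in e.errors())


print(invalid_fields({"priority": "urgent", "category": "other"}))
print(invalid_fields({"priority": "low", "category": "password_reset"}))
```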
    

In [28]:
example_response_structure = """{
    name="Example User",
    email="user@example.com",
    query="I ordered a new computer monitor and it arrived with the screen cracked. I need to exchange it for a new one.",
    order_id=12345,
    purchase_date="2025-12-31",
    priority="medium",
    category="refund_request",
    is_complaint=True,
    tags=["monitor", "support", "exchange"] 
}"""

In [29]:
# Create prompt with user data and expected JSON structure
prompt = f"""
Please analyze this user query\n {user_input.model_dump_json(indent=2)}:

Return your analysis as a JSON object matching this exact structure 
and data types:
{example_response_structure}

Respond ONLY with valid JSON. Do not include any explanations or 
other text or formatting before or after the JSON object.
"""

print(prompt)


Please analyze this user query
 {
  "name": "Waqas",
  "email": "waqas@gmail.com",
  "query": "I forgot my password",
  "order_id": null,
  "purchase_date": null
}:

Return your analysis as a JSON object matching this exact structure 
and data types:
{
    name="Example User",
    email="user@example.com",
    query="I ordered a new computer monitor and it arrived with the screen cracked. I need to exchange it for a new one.",
    order_id=12345,
    purchase_date="2025-12-31",
    priority="medium",
    category="refund_request",
    is_complaint=True,
    tags=["monitor", "support", "exchange"] 
}

Respond ONLY with valid JSON. Do not include any explanations or 
other text or formatting before or after the JSON object.



In [32]:
def call_llm(prompt, model="gpt-5-nano"):
    # Send a single user message and return the text of the first choice
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

In [33]:
response_content = call_llm(prompt)
print(response_content)

{
  "name": "Waqas",
  "email": "waqas@gmail.com",
  "query": "I forgot my password",
  "order_id": null,
  "purchase_date": null,
  "priority": "low",
  "category": "password_reset",
  "is_complaint": false,
  "tags": ["password", "login", "account"]
}


In [34]:
type(response_content)

str

In [35]:
# Attempt to parse the response into CustomerQuery model
valid_data = CustomerQuery.model_validate_json(response_content)

ValidationError: 1 validation error for CustomerQuery
category
  Input should be 'refund_request', 'information_request' or 'other' [type=literal_error, input_value='password_reset', input_type=str]
    For further information visit https://errors.pydantic.dev/2.11/v/literal_error

In [63]:
# Define a function to validate an LLM response
def validate_with_model(data_model, llm_response):
    try:
        validated_data = data_model.model_validate_json(llm_response)
        print("data validation successful!")
        print(validated_data.model_dump_json(indent=2))
        return validated_data, None
    except ValidationError as e:
        print(f"error validating data: {e}")
        error_message = (
            f"This response generated a validation error: {e}."
        )
        return None, error_message

In [37]:
validated_data, validation_error = validate_with_model(
    CustomerQuery, response_content
)

error validating data: 1 validation error for CustomerQuery
category
  Input should be 'refund_request', 'information_request' or 'other' [type=literal_error, input_value='password_reset', input_type=str]
    For further information visit https://errors.pydantic.dev/2.11/v/literal_error


In [38]:
# Define a function to create a retry prompt with error feedback
def create_retry_prompt(
    original_prompt, original_response, error_message
):
    retry_prompt = f"""
This is a request to fix an error in the structure of an llm_response.
Here is the original request:
<original_prompt>
{original_prompt}
</original_prompt>

Here is the original llm_response:
<llm_response>
{original_response}
</llm_response>

This response generated an error: 
<error_message>
{error_message}
</error_message>

Compare the error message and the llm_response and identify what 
needs to be fixed or removed
in the llm_response to resolve this error. 

Respond ONLY with valid JSON. Do not include any explanations or 
other text or formatting before or after the JSON string.
"""
    return retry_prompt

In [39]:
# Create a retry prompt for validation errors
validation_retry_prompt = create_retry_prompt(
    original_prompt=prompt,
    original_response=response_content,
    error_message=validation_error
)

print(validation_retry_prompt)


This is a request to fix an error in the structure of an llm_response.
Here is the original request:
<original_prompt>

Please analyze this user query
 {
  "name": "Waqas",
  "email": "waqas@gmail.com",
  "query": "I forgot my password",
  "order_id": null,
  "purchase_date": null
}:

Return your analysis as a JSON object matching this exact structure 
and data types:
{
    name="Example User",
    email="user@example.com",
    query="I ordered a new computer monitor and it arrived with the screen cracked. I need to exchange it for a new one.",
    order_id=12345,
    purchase_date="2025-12-31",
    priority="medium",
    category="refund_request",
    is_complaint=True,
    tags=["monitor", "support", "exchange"] 
}

Respond ONLY with valid JSON. Do not include any explanations or 
other text or formatting before or after the JSON object.

</original_prompt>

Here is the original llm_response:
<llm_response>
{
  "name": "Waqas",
  "email": "waqas@gmail.com",
  "query": "I forgot my p

In [40]:
# Call the LLM with the validation retry prompt
validation_retry_response = call_llm(validation_retry_prompt)
print(validation_retry_response)

{
  "issue": "Invalid category literal",
  "current_value": "password_reset",
  "allowed_values": ["refund_request", "information_request", "other"],
  "recommended_fix": {
    "category": "information_request"
  }
}


In [41]:
# Create a second retry prompt for validation errors
second_validation_retry_prompt = create_retry_prompt(
    original_prompt=validation_retry_prompt,
    original_response=validation_retry_response,
    error_message=validation_error
)

print(second_validation_retry_prompt)


This is a request to fix an error in the structure of an llm_response.
Here is the original request:
<original_prompt>

This is a request to fix an error in the structure of an llm_response.
Here is the original request:
<original_prompt>

Please analyze this user query
 {
  "name": "Waqas",
  "email": "waqas@gmail.com",
  "query": "I forgot my password",
  "order_id": null,
  "purchase_date": null
}:

Return your analysis as a JSON object matching this exact structure 
and data types:
{
    name="Example User",
    email="user@example.com",
    query="I ordered a new computer monitor and it arrived with the screen cracked. I need to exchange it for a new one.",
    order_id=12345,
    purchase_date="2025-12-31",
    priority="medium",
    category="refund_request",
    is_complaint=True,
    tags=["monitor", "support", "exchange"] 
}

Respond ONLY with valid JSON. Do not include any explanations or 
other text or formatting before or after the JSON object.

</original_prompt>

Here i

In [42]:
# Call the LLM with the second validation retry prompt
second_validation_retry_response = call_llm(
    second_validation_retry_prompt
)
print(second_validation_retry_response)

{
  "field_to_fix": "category",
  "current_value": "password_reset",
  "allowed_values": ["refund_request", "information_request", "other"],
  "recommended_fix": "information_request",
  "action": "update_llm_response_to_valid_category"
}


In [52]:
# Define a function to automatically retry an LLM call multiple times
def validate_llm_response(
    prompt, data_model, n_retry=5, model="gpt-4.1-mini"
):
    # Initial LLM call
    response_content = call_llm(prompt, model=model)
    current_prompt = prompt

    # Try to validate with the model
    # attempt: 0=initial, 1=first retry, ...
    for attempt in range(n_retry + 1):

        validated_data, validation_error = validate_with_model(
            data_model, response_content
        )

        if validation_error:
            if attempt < n_retry:
                print(f"retry {attempt} of {n_retry} failed, trying again...")
            else:
                print(f"Max retries reached. Last error: {validation_error}")
                return None, (
                    f"Max retries reached. Last error: {validation_error}"
                )

            validation_retry_prompt = create_retry_prompt(
                original_prompt=current_prompt,
                original_response=response_content,
                error_message=validation_error
            )
            response_content = call_llm(
                validation_retry_prompt, model=model
            )
            current_prompt = validation_retry_prompt
            continue

        # If you get here, both parsing and validation succeeded
        return validated_data, None
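
The control flow of a retry loop like this one can be checked without any API calls by swapping the LLM for a stub. The sketch below is a simplified stand-in and not the function above: `retry_until_valid`, `validate_category`, and `fake_llm` are hypothetical names, and validation uses only the standard `json` module so the example runs offline.

```python
import json
from typing import Callable, Optional, Tuple

Result = Tuple[Optional[dict], Optional[str]]


def retry_until_valid(
    prompt: str,
    call: Callable[[str], str],
    validate: Callable[[str], Result],
    n_retry: int = 2,
) -> Result:
    """Call `call`, validate the reply, and re-prompt with the error up to n_retry times."""
    response = call(prompt)
    for attempt in range(n_retry + 1):
        data, error = validate(response)
        if error is None:
            return data, None
        if attempt == n_retry:
            return None, f"Max retries reached. Last error: {error}"
        # Feed the failed reply and its error back to the model
        prompt = f"{prompt}\n\nPrevious reply:\n{response}\n\nError:\n{error}"
        response = call(prompt)
    return None, "No attempts were made"  # only reachable if n_retry < 0


def validate_category(raw: str) -> Result:
    """Toy validator: the reply must be a JSON object with an allowed category."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        return None, f"Invalid JSON: {e}"
    allowed = {"refund_request", "information_request", "other"}
    if not isinstance(data, dict) or data.get("category") not in allowed:
        return None, "category is missing or not allowed"
    return data, None


# Stub LLM: returns a bad category first, then a corrected reply
replies = iter(['{"category": "password_reset"}', '{"category": "other"}'])
calls = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return next(replies)


result, error = retry_until_valid("Classify this query.", fake_llm, validate_category)
print(result, error, len(calls))
```

Passing the LLM call and the validator in as arguments keeps the loop testable and lets you reuse it with a different client or data model.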

In [44]:
# Test your complete solution with the original prompt
validated_data, error = validate_llm_response(
    prompt, CustomerQuery
)

error validating data: 1 validation error for CustomerQuery
category
  Input should be 'refund_request', 'information_request' or 'other' [type=literal_error, input_value='password_reset', input_type=str]
    For further information visit https://errors.pydantic.dev/2.11/v/literal_error
retry 0 of 5 failed, trying again...
error validating data: 7 validation errors for CustomerQuery
name
  Field required [type=missing, input_value={'issue': 'Invalid catego...n_request' / 'other')."}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.11/v/missing
email
  Field required [type=missing, input_value={'issue': 'Invalid catego...n_request' / 'other')."}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.11/v/missing
query
  Field required [type=missing, input_value={'issue': 'Invalid catego...n_request' / 'other')."}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.11/v/missing
priority
  Field requ

In [48]:
import json
# Investigate the model_json_schema for CustomerQuery
data_model_schema = json.dumps(
    CustomerQuery.model_json_schema(), indent=2
)
print(data_model_schema)

{
  "properties": {
    "name": {
      "title": "Name",
      "type": "string"
    },
    "email": {
      "format": "email",
      "title": "Email",
      "type": "string"
    },
    "query": {
      "title": "Query",
      "type": "string"
    },
    "order_id": {
      "anyOf": [
        {
          "maximum": 99999,
          "minimum": 10000,
          "type": "integer"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "5 Digit order id (cant be started with 0)",
      "title": "Order Id"
    },
    "purchase_date": {
      "anyOf": [
        {
          "format": "date",
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "title": "Purchase Date"
    },
    "priority": {
      "description": "Priority Level: low, medium, high",
      "title": "Priority",
      "type": "string"
    },
    "category": {
      "description": "Query Category",
      "enum":

In [49]:
# Print the original prompt from above
print(prompt)


Please analyze this user query
 {
  "name": "Waqas",
  "email": "waqas@gmail.com",
  "query": "I forgot my password",
  "order_id": null,
  "purchase_date": null
}:

Return your analysis as a JSON object matching this exact structure 
and data types:
{
    name="Example User",
    email="user@example.com",
    query="I ordered a new computer monitor and it arrived with the screen cracked. I need to exchange it for a new one.",
    order_id=12345,
    purchase_date="2025-12-31",
    priority="medium",
    category="refund_request",
    is_complaint=True,
    tags=["monitor", "support", "exchange"] 
}

Respond ONLY with valid JSON. Do not include any explanations or 
other text or formatting before or after the JSON object.



In [53]:
# Create new prompt with user input and model_json_schema
prompt = f"""
Please analyze this user query\n {user_input.model_dump_json(indent=2)}:

Return your analysis as a JSON object matching the following schema:
{data_model_schema}

Respond ONLY with valid JSON. Do not include any explanations or 
other text or formatting before or after the JSON object.
"""

In [64]:
# Run your validate_llm_response function with the new prompt
final_analysis, error = validate_llm_response(
    prompt, CustomerQuery
)

error validating data: 1 validation error for CustomerQuery
  Invalid JSON: expected value at line 1 column 1 [type=json_invalid, input_value='```json\n{\n  "name": "W... "login issue"]\n}\n```', input_type=str]
    For further information visit https://errors.pydantic.dev/2.11/v/json_invalid
retry 0 of 5 failed, trying again...
error validating data: 1 validation error for CustomerQuery
  Invalid JSON: expected value at line 1 column 1 [type=json_invalid, input_value='```json\n{\n  "name": "W... "login issue"]\n}\n```', input_type=str]
    For further information visit https://errors.pydantic.dev/2.11/v/json_invalid
retry 1 of 5 failed, trying again...
data validation successful!
{
  "name": "Waqas",
  "email": "waqas@gmail.com",
  "query": "I forgot my password",
  "order_id": null,
  "purchase_date": null,
  "priority": "high",
  "category": "information_request",
  "is_complaint": false,
  "tags": [
    "password",
    "account access",
    "login issue"
  ]
}
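
The first two attempts in this run failed only because the model wrapped its JSON in a Markdown code fence, even though the prompt asked for bare JSON. Rather than spending retries on that, you could strip the fence before validating. A minimal sketch using only the standard library; `strip_code_fence` is a hypothetical helper, not part of Pydantic or the OpenAI client:

```python
import json
import re

# Three backticks, built indirectly so this example can sit inside a fenced block
FENCE = "`" * 3

# Opening fence with an optional language tag, the body, then the closing fence
_FENCED = re.compile(
    rf"^\s*{FENCE}[a-zA-Z]*\s*\n(.*?)\n\s*{FENCE}\s*$", re.DOTALL
)


def strip_code_fence(reply: str) -> str:
    """Return the reply without a surrounding Markdown code fence, if one is present."""
    match = _FENCED.match(reply)
    return match.group(1) if match else reply.strip()


wrapped = f'{FENCE}json\n{{\n  "priority": "high"\n}}\n{FENCE}'
cleaned = strip_code_fence(wrapped)
print(json.loads(cleaned))
```

You would call such a helper on the LLM response just before `model_validate_json`, leaving the retry loop to handle genuine schema violations.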


---

## Conclusion

In this notebook, you used Pydantic models to validate user input and JSON data, then combined them with retry logic to extract structured data from LLM outputs. You built reusable validation functions and retry prompts, and saw how feeding validation errors back to the model lets it correct its own response. These techniques give you a foundation for getting consistent, usable results in larger LLM-powered workflows.

