Lesson 4: Using Pydantic Models for Structured LLM Output (DeepLearning.AI)

Instead of the OpenAI client used in the course, this notebook uses Google's GenAI SDK (google-genai) with Gemini models.

The goals:
1. Use Pydantic models directly in your API calls to LLMs
2. Reliably receive structured, validated output across different frameworks and LLM providers


In [1]:
# Import all the libraries
from pydantic import BaseModel, Field, field_validator
from typing import List, Literal, Optional
# import google.generativeai as genai  # Deprecated: all support for this package (including bug fixes) ends on August 31, 2025
from google import genai
from google.genai import types
import instructor
import anthropic
from dotenv import load_dotenv
from datetime import datetime

In [2]:
# Define the UserInput model
class UserInput(BaseModel):
    name: str = Field(..., description="Customer name")
    email: str = Field(..., description="Customer email address")
    query: str = Field(..., description="Customer query")
    order_id: Optional[int] = Field(
        None,
        description="5-digit order number (cannot start with 0)",
        ge=10000,
        le=99999
    )
    # The validator below works around an instructor compatibility issue that raised:
    #   InstructorRetryException: 1 validation error for CustomerQuery
    #   purchase_date Input should be a valid datetime [type=datetime_type,
    #   input_value='2025-01-25T10:38:20Z', input_type=str]
    purchase_date: Optional[datetime] = Field(None)

    @field_validator('purchase_date', mode='before')
    @classmethod
    def parse_purchase_date(cls, v):
        if isinstance(v, str):
            # Convert 'Z' (UTC) to '+00:00' for compatibility with fromisoformat
            return datetime.fromisoformat(v.replace('Z', '+00:00'))
        # Pass None and already-parsed datetimes through unchanged
        return v


# Define the CustomerQuery model
class CustomerQuery(UserInput):
    priority: str = Field(..., description="Priority level: low, medium, high")
    category: Literal['refund_request', 'information_request', 'other'] = Field(
        ..., description="Request category: refund_request, information_request, or other"
    )
    tags: List[str] = Field(..., description="Relevant keyword tags")

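The purchase_date workaround relies on a "before" validator, which sees the raw input before Pydantic's own datetime parsing runs. A minimal sketch isolating that behavior, using a hypothetical PurchaseRecord model; this version returns non-string values unchanged, so an already-parsed datetime (or None) is not silently replaced:

```python
from datetime import datetime, timezone
from typing import Optional

from pydantic import BaseModel, Field, field_validator


class PurchaseRecord(BaseModel):
    # Hypothetical minimal model that isolates the purchase_date handling
    purchase_date: Optional[datetime] = Field(None)

    @field_validator("purchase_date", mode="before")
    @classmethod
    def parse_purchase_date(cls, v):
        if isinstance(v, str):
            # Rewrite a 'Z' suffix as '+00:00' so datetime.fromisoformat accepts it
            # (Python versions before 3.11 reject 'Z')
            return datetime.fromisoformat(v.replace("Z", "+00:00"))
        # Anything else (None, datetime objects) continues to normal validation
        return v


utc_record = PurchaseRecord(purchase_date="2025-01-25T10:38:20Z")
print(utc_record.purchase_date)  # 2025-01-25 10:38:20+00:00

date_only = PurchaseRecord(purchase_date="2025-01-25")
print(date_only.purchase_date)  # 2025-01-25 00:00:00

empty = PurchaseRecord()
print(empty.purchase_date)  # None
```

A date-only string becomes a naive midnight datetime, which matches the purchase_date shown for user_input below.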

In [3]:
# Example user input data as a JSON string
user_input_json = '''{
    "name": "Joe User",
    "email": "joe.user@example.com",
    "query": "I ordered a new computer monitor and it arrived with the screen cracked. This is the second time that has happened. I need a replacement ASAP.",
    "order_id": 12345,
    "purchase_date": "2025-01-25"
} '''

In [4]:
# Validate the user_input_json by creating a UserInput instance
user_input = UserInput.model_validate_json(user_input_json)
print(user_input)

name='Joe User' email='joe.user@example.com' query='I ordered a new computer monitor and it arrived with the screen cracked. This is the second time that has happened. I need a replacement ASAP.' order_id=12345 purchase_date=datetime.datetime(2025, 1, 25, 0, 0)
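model_validate_json succeeds here because every field satisfies its constraints. When input breaks a rule, Pydantic raises ValidationError instead of returning a model. A sketch with a hypothetical trimmed-down OrderRef model that reuses the order_id bounds from UserInput:

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class OrderRef(BaseModel):
    # Hypothetical model reusing the order_id constraint from UserInput
    name: str = Field(..., description="Customer name")
    order_id: Optional[int] = Field(None, ge=10000, le=99999)


bad_json = '{"name": "Joe User", "order_id": 1234}'  # only 4 digits

caught_errors = []
try:
    OrderRef.model_validate_json(bad_json)
except ValidationError as exc:
    caught_errors = exc.errors()
    # Each entry records where the problem is ("loc") and which rule failed ("type")
    for err in caught_errors:
        print(err["loc"], err["type"])  # ('order_id',) greater_than_equal
```

These structured error entries are what an LLM-facing library can feed back to the model when asking it to correct a bad answer.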


Build a prompt and call the Gemini API through the instructor package to get structured output (the course used the Anthropic API here).

In [5]:
prompt = (
    f"Analyze the following customer query {user_input} "
    "and provide a structured response"
)
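The prompt carries only the customer data; the output structure comes from the response model. Structured-output modes work from the JSON schema Pydantic derives from that model, so printing the schema shows roughly what the LLM is asked to fill in. A sketch with a hypothetical TicketLabel model holding a subset of the CustomerQuery fields (instructor may adjust the schema before sending it):

```python
import json
from typing import List, Literal

from pydantic import BaseModel, Field


class TicketLabel(BaseModel):
    # Hypothetical subset of the CustomerQuery fields, kept short for readability
    priority: str = Field(..., description="Priority level: low, medium, high")
    category: Literal["refund_request", "information_request", "other"]
    tags: List[str] = Field(..., description="Relevant keyword tags")


schema = TicketLabel.model_json_schema()
print(json.dumps(schema, indent=2))
```

Field descriptions end up in the schema, which is why wording them precisely matters: they act as instructions to the model.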

In [6]:
import os

# Load environment variables
load_dotenv(dotenv_path="C:\\Users\\yawen\\Documents\\Learning\\Pydantic\\geminiai.env")

# Configure the Google generative AI client
api_key = os.getenv("Google_API_KEY")
if not api_key:
    raise ValueError("Google_API_KEY not found in environment variables")

client = genai.Client(api_key=api_key)

# Use Google Gemini with instructor to get structured output
# (this reassigns `client`; the instructor client below replaces the raw GenAI client)
client = instructor.from_provider(
    "google/gemini-2.5-pro",
    mode=instructor.Mode.GENAI_STRUCTURED_OUTPUTS,
)
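Conceptually, instructor validates the model's reply against response_model and, when validation fails, re-asks with the error details. The following is a simplified sketch of that validate-and-retry loop, not instructor's actual implementation; extract_with_retries, Triage, and fake_llm are hypothetical names, and fake_llm stands in for a real API call:

```python
from typing import Callable, Dict, List, Type, TypeVar

from pydantic import BaseModel, Field, ValidationError

T = TypeVar("T", bound=BaseModel)


class Triage(BaseModel):
    # Hypothetical response model for the sketch
    priority: str = Field(..., description="low, medium, or high")
    order_id: int = Field(..., ge=10000, le=99999)


def extract_with_retries(
    call_llm: Callable[[List[Dict[str, str]]], str],
    response_model: Type[T],
    prompt: str,
    max_retries: int = 3,
) -> T:
    """Validate the LLM's JSON reply; on failure, send the errors back and retry."""
    messages = [{"role": "user", "content": prompt}]
    last_error = None
    for _ in range(max_retries):
        reply = call_llm(messages)
        try:
            return response_model.model_validate_json(reply)
        except ValidationError as exc:
            last_error = exc
            # Show the model its previous answer and what was wrong with it
            messages.append({"role": "assistant", "content": reply})
            messages.append({"role": "user", "content": f"Fix these validation errors:\n{exc}"})
    raise RuntimeError(f"No valid response after {max_retries} attempts") from last_error


# Hypothetical canned replies: the first breaks the order_id constraint, the second is valid
canned_replies = iter([
    '{"priority": "high", "order_id": 123}',
    '{"priority": "high", "order_id": 12345}',
])


def fake_llm(messages: List[Dict[str, str]]) -> str:
    return next(canned_replies)


result = extract_with_retries(fake_llm, Triage, "Triage order 12345, it is urgent")
print(result)  # priority='high' order_id=12345
```

Real libraries add more on top (provider-specific schema passing, token accounting, async support), but the core idea of treating validation errors as feedback is the same.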



In [7]:
# Extract structured data
response = client.messages.create(
    messages=[{"role": "user", "content": prompt}],
    response_model=CustomerQuery,
)
print("Response:", response)


Response: name='Joe User' email='joe.user@example.com' query='I ordered a new computer monitor and it arrived with the screen cracked. This is the second time that has happened. I need a replacement ASAP.' order_id=12345 purchase_date=datetime.datetime(2025, 1, 25, 0, 0, tzinfo=datetime.timezone.utc) priority='high' category='refund_request' tags=['damaged_item', 'replacement', 'repeat_issue']


In [8]:
print(type(response))
print(response.model_dump_json(indent=2))

<class '__main__.CustomerQuery'>
{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "I ordered a new computer monitor and it arrived with the screen cracked. This is the second time that has happened. I need a replacement ASAP.",
  "order_id": 12345,
  "purchase_date": "2025-01-25T00:00:00Z",
  "priority": "high",
  "category": "refund_request",
  "tags": [
    "damaged_item",
    "replacement",
    "repeat_issue"
  ]
}


In [9]:
# Call the model without a response_model to see its raw, unstructured output
raw_response = client.messages.create(
    messages=[{"role": "user", "content": prompt}],
    response_model=None,
)
print("Raw response:", raw_response)

Raw response: sdk_http_response=HttpResponse(
  headers=<dict len=11>
) candidates=[Candidate(
  content=Content(
    parts=[
      Part(
        text="""Of course. Here is a structured analysis and response plan for the customer query.

---

### **Analysis of Customer Query**

#### **1. Summary of Query**
The customer, Joe User, received a computer monitor for order #12345 that was damaged on arrival (cracked screen). This is the second time this specific issue has occurred for the customer, who is now requesting an urgent replacement.

#### **2. Sentiment and Urgency Analysis**
*   **Sentiment:** **Highly Negative**. The customer is justifiably frustrated and dissatisfied. Key indicators are "cracked," "second time this has happened," and the overall tone of the complaint.
*   **Urgency:** **High**. The customer explicitly uses the term "ASAP" (As Soon As Possible). The repeat nature of the problem escalates the urgency, as the customer's patience and trust are likely exhausted. This
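Without a response_model the reply is free-form text (truncated above), so getting typed data out of it means parsing by hand. A rough sketch of what that involves, using a made-up reply string and a hypothetical helper extract_first_json_object; real free-text replies may contain no JSON at all, which is exactly why structured output is preferable:

```python
import json
from typing import List

from pydantic import BaseModel


class Labels(BaseModel):
    # Hypothetical target model for the sketch
    priority: str
    tags: List[str]


def extract_first_json_object(text: str) -> dict:
    """Decode the first JSON object embedded in free-form text."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(text[start:])
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # Not valid JSON at this brace; try the next one
        start = text.find("{", start + 1)
    raise ValueError("no JSON object found in text")


# Made-up free-text reply for illustration
raw_text = (
    "Of course. Here is my analysis. "
    '{"priority": "high", "tags": ["damaged_item", "replacement"]} '
    "Let me know if you need anything else."
)

labels = Labels.model_validate(extract_first_json_object(raw_text))
print(labels)  # priority='high' tags=['damaged_item', 'replacement']
```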

# Additional advanced usage and inspection

In [10]:
# Validate the response you got from the LLM
valid_data = CustomerQuery.model_validate_json(response.model_dump_json())
print(type(valid_data))
print(valid_data.model_dump_json(indent=2))

<class '__main__.CustomerQuery'>
{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "I ordered a new computer monitor and it arrived with the screen cracked. This is the second time that has happened. I need a replacement ASAP.",
  "order_id": 12345,
  "purchase_date": "2025-01-25T00:00:00Z",
  "priority": "high",
  "category": "refund_request",
  "tags": [
    "damaged_item",
    "replacement",
    "repeat_issue"
  ]
}
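The cell above serializes the instructor result to JSON and validates it again. The same round trip can be checked in isolation, along with the difference between model_dump() (keeps Python objects such as datetime) and model_dump(mode="json") (JSON-friendly values). A sketch with a hypothetical QuerySummary model:

```python
from datetime import datetime, timezone
from typing import List

from pydantic import BaseModel


class QuerySummary(BaseModel):
    # Hypothetical slimmed-down stand-in for CustomerQuery
    order_id: int
    purchase_date: datetime
    tags: List[str]


original = QuerySummary(
    order_id=12345,
    purchase_date=datetime(2025, 1, 25, tzinfo=timezone.utc),
    tags=["damaged_item", "replacement"],
)

# JSON string -> validated model should reproduce an equal object
round_tripped = QuerySummary.model_validate_json(original.model_dump_json())
print(round_tripped == original)  # True

# model_dump() keeps the datetime object; mode="json" turns it into a string
print(type(original.model_dump()["purchase_date"]))
print(original.model_dump(mode="json")["purchase_date"])  # 2025-01-25T00:00:00Z
```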


In [16]:
# Try out the Pydantic AI package for defining an agent and getting a structured response
from pydantic_ai import Agent
import nest_asyncio

# Let run_sync start an event loop inside Jupyter's already-running one
nest_asyncio.apply()

agent = Agent(
    model="google-gla:gemini-2.0-flash",
    # model="openai:gpt-4o",  # requires an OpenAI API key, which I don't have
    output_type=CustomerQuery,
)

response = agent.run_sync(prompt)

In [12]:
# Print out the response type and content
print(type(response.output))
print(response.output.model_dump_json(indent=2))

<class '__main__.CustomerQuery'>
{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "I ordered a new computer monitor and it arrived with the screen cracked. This is the second time that has happened. I need a replacement ASAP.",
  "order_id": 12345,
  "purchase_date": "2025-01-25T00:00:00",
  "priority": "high",
  "category": "refund_request",
  "tags": [
    "broken",
    "monitor",
    "replacement"
  ]
}
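Compared with the instructor run, this result has different tags and a naive purchase_date ("2025-01-25T00:00:00", no "Z"), while the instructor run returned a UTC-aware one. Before comparing outputs across providers, it can help to normalize timestamps. A sketch with a hypothetical to_utc helper, under the assumption that naive timestamps should be read as UTC (the source does not specify this):

```python
from datetime import datetime, timezone


def to_utc(dt: datetime) -> datetime:
    """Treat naive datetimes as UTC (assumption) and convert aware ones to UTC."""
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


aware = datetime.fromisoformat("2025-01-25T00:00:00+00:00")  # like the instructor run
naive = datetime.fromisoformat("2025-01-25T00:00:00")        # like the Pydantic AI run

print(aware == naive)                  # False: naive and aware datetimes never compare equal
print(to_utc(aware) == to_utc(naive))  # True
```

Ordering comparisons (aware < naive) go further and raise TypeError, so normalizing first avoids surprises when sorting or filtering mixed results.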


In [14]:
# Print the method resolution order (class hierarchy) of an object's type
def print_class_inheritance(llm_response):
    for cls in type(llm_response).mro():
        print(f"{cls.__module__}.{cls.__name__}")

print_class_inheritance(response)

pydantic_ai.agent.AgentRunResult
typing.Generic
builtins.object
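The run result itself is a Pydantic AI wrapper (AgentRunResult); the structured data lives in response.output. Applying the same MRO idea to a Pydantic model shows BaseModel in the chain, mirroring how CustomerQuery inherits from UserInput. A self-contained sketch with hypothetical Ticket and UrgentTicket models and a list-returning class_hierarchy helper:

```python
from pydantic import BaseModel


def class_hierarchy(obj) -> list:
    """Return 'module.ClassName' for every class in obj's method resolution order."""
    return [f"{cls.__module__}.{cls.__name__}" for cls in type(obj).mro()]


class Ticket(BaseModel):
    # Hypothetical model standing in for UserInput
    priority: str


class UrgentTicket(Ticket):
    # Hypothetical subclass standing in for CustomerQuery
    escalated: bool = True


for name in class_hierarchy(UrgentTicket(priority="high")):
    print(name)
```

Calling the helper on response.output instead of response would likewise show CustomerQuery, UserInput, and BaseModel.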
