<a href="https://colab.research.google.com/github/deedeeharris/AI/blob/main/structured_output_openai.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Structured Output with OpenAI and Pydantic

**By:** Yedidya Harris (July, 2025)

This notebook demonstrates how to use the OpenAI Python SDK (v1.93.0+) and Pydantic (v2+) to get structured output from OpenAI models. It provides a reusable function `get_structured_response` that takes a Pydantic model schema and a user message to guide the AI in generating output that conforms to the specified structure.

The notebook includes examples of using the `get_structured_response` function for:

  - Summarizing a product review into a structured format (product, sentiment, pros, cons).
  - Generating a multiple-choice question with defined question, choices, and correct answer fields.
  - Analyzing a conversation to determine its level (beginner, intermediate, advanced) and provide a justification.

## Why Use Structured Outputs?

Structured outputs allow you to:

  * **Reliably extract data** in a predictable, type-safe format (e.g., JSON, Pydantic models).
  * **Integrate AI results** directly into applications, databases, or pipelines without fragile string parsing.
  * **Enforce schemas** so your AI outputs match the structure you expect, with no guessing or post-processing headaches.

**When to use structured outputs:**

  * When you need the AI to return data in a specific format (e.g., for forms, APIs, analytics, or downstream automation).
  * For tasks like data extraction, content generation with specific fields, classification, summarization, or any scenario where structure matters.
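
To make the contrast with string parsing concrete, here is a minimal sketch that validates a JSON payload against a Pydantic model. The `Review` model and both sample payloads are hypothetical stand-ins for a model response; no API call is made.

```python
from pydantic import BaseModel, ValidationError


class Review(BaseModel):
    # Hypothetical schema used only for this illustration
    product: str
    rating: int


# A well-formed payload parses into a typed object
good = Review.model_validate_json('{"product": "Acme headphones", "rating": 5}')
print(good.rating + 1)  # rating is an int, so arithmetic works directly

# A malformed payload fails loudly instead of slipping through as a string
try:
    Review.model_validate_json('{"product": "Acme headphones", "rating": "great"}')
    rejected = False
except ValidationError:
    rejected = True

print(rejected)
```

The same validation is what turns a model's JSON reply into an instance of your schema class, so downstream code can rely on field names and types.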

-----

## Requirements

  * Python **3.8+** (recommended: 3.9+, which the `list[str]` annotations in the examples below require)
  * OpenAI Python SDK **v1.93.0+**
  * Pydantic **v2.0+**

Install with:

```bash
pip install --upgrade openai pydantic
```

-----

**Important:** Never share your API key in public code or notebooks. Use environment variables or notebook secrets for security.
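
One way to follow this advice is to read the key at runtime instead of writing it into a cell. The sketch below assumes the key is stored under the name `OPENAI_API_KEY`, either as a Colab secret or as an environment variable. `load_api_key` is a hypothetical helper, not part of this notebook's function.

```python
import os
from typing import Optional


def load_api_key(name: str = "OPENAI_API_KEY") -> Optional[str]:
    """Return the key from Colab Secrets if available, else from the environment."""
    try:
        # google.colab is only importable inside a Colab runtime
        from google.colab import userdata

        return userdata.get(name)
    except Exception:
        # Not in Colab, or the secret is missing or not shared with this notebook
        return os.environ.get(name)


api_key = load_api_key()
print("Key found" if api_key else "No key configured")
```

If the helper returns `None`, the key is not configured in either place and any API call will fail with an authentication error.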

-----

## How to Use

1.  **Install dependencies:** Run the first code cell to install the necessary libraries (`openai` and `pydantic`).
2.  **Set your OpenAI API key:** Replace the placeholder API key in the example usage sections with your actual OpenAI API key. **Note:** For security, store the key in Colab Secrets or another secure alternative, such as an environment variable.
3.  **Define your Pydantic model:** Create a Pydantic `BaseModel` subclass that defines the structure of the structured output you expect from the OpenAI model.
4.  **Prepare parameters:** Set the `model_name`, `temperature`, `system_prompt` (optional), `chat_history` (optional), `user_message`, and `api_key`.
5.  **Call `get_structured_response`:** Use the `get_structured_response` function, passing in the prepared parameters and your Pydantic model class.
6.  **Process the structured output:** The function will return an instance of your Pydantic model, containing the structured data from the AI's response.

This notebook provides a flexible and type-safe way to interact with OpenAI models for tasks requiring structured output, such as data extraction, content generation with specific formats, and more.

-----

## Minimal Example

```python
# Assumes get_structured_response (defined later in this notebook) is already in scope
from pydantic import BaseModel

class MinimalModel(BaseModel):
    message: str

result = get_structured_response(
    model_name="gpt-4.1-nano",
    api_key="YOUR_API_KEY",
    user_message="Say something nice!",
    pydantic_model=MinimalModel
)

print(result)
```

-----

## Supported Models

  * `gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano` (April 2025+)

-----

## Parameter Reference

| Parameter        | Required? | Default        | Description                                                                                           |
|:-----------------|:---------:|:---------------|:------------------------------------------------------------------------------------------------------|
| `model_name`     | No        | `"gpt-4.1"`    | Model to use                                                                                          |
| `temperature`    | No        | `0.0`          | Sampling temperature                                                                                  |
| `system_prompt`  | No        | `None`         | System prompt for context                                                                             |
| `chat_history`   | No        | `None`         | Previous messages                                                                                     |
| `pydantic_model` | No        | `DefaultModel` | Output schema                                                                                         |
| `user_message`   | No        | `None`         | User's input message                                                                                  |
| `api_key`        | No        | `None`         | Your OpenAI API key; when omitted, the client falls back to the `OPENAI_API_KEY` environment variable |

**Note:** At least one of `user_message`, `system_prompt`, or `chat_history` must be provided. The API will return an error if no input is given.
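
To catch the empty-input case before any network call, the message list can be checked locally. The sketch below assembles messages in the same order as `get_structured_response` (system prompt, then history, then user message). `build_messages` is a hypothetical helper, and raising `ValueError` is an assumption about how you might want to surface the problem.

```python
from typing import Dict, List, Optional


def build_messages(
    system_prompt: Optional[str] = None,
    chat_history: Optional[List[Dict[str, str]]] = None,
    user_message: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Assemble the message list and fail early if it would be empty."""
    messages: List[Dict[str, str]] = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    if chat_history:
        messages.extend(chat_history)
    if user_message:
        messages.append({"role": "user", "content": user_message})
    if not messages:
        raise ValueError(
            "Provide at least one of system_prompt, chat_history, or user_message."
        )
    return messages


print(build_messages(system_prompt="Be brief.", user_message="Hello"))
```

Failing locally gives a clearer error message than waiting for the API to reject the request.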

## My Pydantic and OpenAI Version Numbers

In [3]:
import pydantic.version
print(pydantic.version.version_info())


             pydantic version: 2.11.7
        pydantic-core version: 2.33.2
          pydantic-core build: profile=release pgo=false
               python version: 3.11.13 (main, Jun  4 2025, 08:57:29) [GCC 11.4.0]
                     platform: Linux-6.1.123+-x86_64-with-glibc2.35
             related packages: typing_extensions-4.14.0 fastapi-0.115.14 typing_extensions-4.12.2
                       commit: unknown


In [7]:
import importlib.metadata
print(importlib.metadata.version('openai'))


1.93.0


## The Function and Full Example

In [1]:
# Install dependencies if needed
!pip install --upgrade openai pydantic

from pydantic import BaseModel
from openai import OpenAI
from typing import List, Optional, Type

# 1. Define a default Pydantic model
class DefaultModel(BaseModel):
    message: str

def get_structured_response(
    model_name: str = "gpt-4.1",
    temperature: float = 0.0,
    system_prompt: Optional[str] = None,
    chat_history: Optional[List[dict]] = None,
    pydantic_model: Type[BaseModel] = DefaultModel,  # Default model here
    user_message: Optional[str] = None,
    api_key: Optional[str] = None,
):
    """
    Get structured output from OpenAI using the latest SDK and Pydantic schema.

    This function interacts with the OpenAI API to generate a response
    that conforms to a specified Pydantic model schema.

    Args:
        model_name: The name of the OpenAI model to use (e.g., "gpt-4.1").
        temperature: The sampling temperature to use. Higher values mean
                     the model will take more risks.
        system_prompt: An optional initial system message to guide the model.
        chat_history: An optional list of previous messages in the conversation.
        pydantic_model: The Pydantic model class to use for structuring the output.
        user_message: The user's current message.
        api_key: An optional OpenAI API key. If omitted, the client reads the
                 OPENAI_API_KEY environment variable.

    Returns:
        An instance of the provided Pydantic model containing the parsed response.
    """
    # Initialize OpenAI client
    client = OpenAI(api_key=api_key) if api_key else OpenAI()

    # Build message sequence
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    if chat_history:
        messages.extend(chat_history)
    if user_message:
        messages.append({"role": "user", "content": user_message})

    # Call the API with structured output parsing
    response = client.responses.parse(
        model=model_name,
        input=messages,
        text_format=pydantic_model,
        temperature=temperature,
    )
    return response.output_parsed
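
The return value deserves a guard in calling code. As far as I can tell from the SDK's type hints, `output_parsed` is optional and may be `None` (for example, if the model refuses), so treat that as an assumption to verify against the SDK version you run. `require_parsed` below is a hypothetical helper, not part of the OpenAI SDK.

```python
from typing import Optional, Type, TypeVar

T = TypeVar("T")


def require_parsed(parsed: Optional[object], expected_type: Type[T]) -> T:
    """Return `parsed` if it is an instance of `expected_type`, else raise."""
    if parsed is None:
        raise ValueError("No structured output was returned (parsed result is None).")
    if not isinstance(parsed, expected_type):
        raise TypeError(
            f"Expected {expected_type.__name__}, got {type(parsed).__name__}."
        )
    return parsed


# Stand-in value; in the notebook you would pass the function's return value
print(require_parsed({"message": "hi"}, dict))
```

In the notebook, this would wrap the call, for example `require_parsed(get_structured_response(...), ProductReview)`.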




In [9]:
# --- Example usage ---

# 1. Define a Pydantic model for the expected output structure
class ProductReview(BaseModel):
    product: str
    sentiment: str
    pros: list[str]
    cons: list[str]

# 2. Prepare parameters for the API call
model_name = "gpt-4.1"
temperature = 0.2
system_prompt = "You are an assistant that summarizes product reviews in structured form."
chat_history = []  # Or previous messages if you have them
user_message = "I love the new Acme headphones. The sound is amazing, but the battery life is a bit short."
api_key = "OPENAI_API_KEY"  # Placeholder: replace with your key, or pass None to use the OPENAI_API_KEY environment variable

# 3. Call the get_structured_response function
result = get_structured_response(
    model_name=model_name,
    temperature=temperature,
    system_prompt=system_prompt,
    chat_history=chat_history,
    pydantic_model=ProductReview,
    user_message=user_message,
    api_key=api_key
)

# 4. Print the structured result
print(result)

product='Acme headphones' sentiment='positive' pros=['Amazing sound quality'] cons=['Short battery life']
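
Because the result is a regular Pydantic object, it can be handed to the rest of an application without string parsing. The sketch below rebuilds the object from the printed output above rather than calling the API, then converts it with Pydantic's `model_dump` and `model_dump_json`.

```python
import json
from typing import List

from pydantic import BaseModel


class ProductReview(BaseModel):
    product: str
    sentiment: str
    pros: List[str]
    cons: List[str]


# Values copied from the printed result above; no API call is made here
review = ProductReview(
    product="Acme headphones",
    sentiment="positive",
    pros=["Amazing sound quality"],
    cons=["Short battery life"],
)

as_dict = review.model_dump()       # plain dict, e.g. for a database row
as_json = review.model_dump_json()  # JSON string, e.g. for an HTTP response

print(as_dict["pros"])
print(json.loads(as_json) == as_dict)  # the JSON round-trips to the same data
```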


## Multiple Choice Question - Example

In [10]:
# 1. Define your Pydantic model for a Multiple Choice Question
from pydantic import BaseModel
from typing import List

class MultipleChoiceQuestion(BaseModel):
    question: str
    choices: List[str]
    correct_answer: str  # The text of the correct choice

# 2. Prepare your parameters
model_name = "gpt-4.1"
temperature = 0.2
system_prompt = "You are an assistant that generates multiple choice questions in structured form."
chat_history = []  # Or previous messages if you have them
user_message = (
    "Create a multiple choice question about the water cycle for 5th grade students. "
    "Provide four answer options and indicate the correct answer."
)
api_key = "OPENAI_API_KEY"  # Placeholder, not a real key

# 3. Call the function
result = get_structured_response(
    model_name=model_name,
    temperature=temperature,
    system_prompt=system_prompt,
    chat_history=chat_history,
    pydantic_model=MultipleChoiceQuestion,
    user_message=user_message,
    api_key=api_key
)

print(result)


question='Which process in the water cycle involves water changing from a liquid to a gas?' choices=['Condensation', 'Evaporation', 'Precipitation', 'Collection'] correct_answer='Evaporation'
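
A schema fixes the shape of the output, not its internal consistency: nothing in `MultipleChoiceQuestion` forces `correct_answer` to be one of `choices`, or the list to have four entries. A sketch of a post-hoc check follows. `answer_is_valid` is a hypothetical helper, and the sample object is rebuilt from the printed output above instead of calling the API.

```python
from typing import List

from pydantic import BaseModel


class MultipleChoiceQuestion(BaseModel):
    question: str
    choices: List[str]
    correct_answer: str  # The text of the correct choice


def answer_is_valid(mcq: MultipleChoiceQuestion, expected_choices: int = 4) -> bool:
    """Check that the choice count matches and the answer is one of the choices."""
    return (
        len(mcq.choices) == expected_choices
        and mcq.correct_answer in mcq.choices
    )


# Values copied from the printed result above
mcq = MultipleChoiceQuestion(
    question="Which process in the water cycle involves water changing from a liquid to a gas?",
    choices=["Condensation", "Evaporation", "Precipitation", "Collection"],
    correct_answer="Evaporation",
)
print(answer_is_valid(mcq))
```

When the check fails, a simple option is to call the function again and keep the first result that passes.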


## Conversation Level Analysis - Example

In [29]:
# 1. Import dependencies (get_structured_response is defined in an earlier cell)
from pydantic import BaseModel


# 2. Define your Pydantic model for conversation analysis
class ConversationAnalysis(BaseModel):
    level: str  # e.g., "beginner", "intermediate", "advanced"
    justification: str  # Explanation of why this level was chosen

# 3. Prepare model parameters and a chat history with multiple messages
model_name = "gpt-4.1-nano"
temperature = 0.2
system_prompt = (
    "You are an assistant that analyzes the level of a conversation "
    "and explains your reasoning. Levels are: beginner, intermediate, advanced."
)

chat_history = [
    {"role": "user", "content": "Hi, can you tell me what an API is?"},
    {"role": "assistant", "content": "Sure! An API is an Application Programming Interface. It lets programs talk to each other."},
    {"role": "user", "content": "Can you give an example?"},
    {"role": "assistant", "content": "For example, when you use a weather app, it gets data from a weather service using an API."},
    {"role": "user", "content": "Is it hard to use an API?"},
    {"role": "assistant", "content": "Not really! Many APIs have good documentation and examples to help you get started."},
]

user_message = "Analyze the above conversation and determine its level. Explain your reasoning."
api_key = "OPENAI_API_KEY"  # Placeholder, not a real key

# 4. Call the function and print the result
result = get_structured_response(
    model_name=model_name,
    temperature=temperature,
    system_prompt=system_prompt,
    chat_history=chat_history,
    pydantic_model=ConversationAnalysis,
    user_message=user_message,
    api_key=api_key
)

print(result)


level='beginner' justification='The conversation involves basic questions and explanations about APIs, suitable for someone new to the topic. The questions are simple and focus on fundamental understanding, indicating a beginner level.'
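
Since `level` is declared as a plain `str`, the schema itself does not restrict it to the three levels named in the system prompt. One option, sketched below, is to declare the field with `typing.Literal`, which Pydantic turns into an `enum` in the generated JSON schema. `StrictConversationAnalysis` is a hypothetical variant of the model above, and how a given OpenAI model handles the enum is something to confirm by running it.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class StrictConversationAnalysis(BaseModel):
    # Only these three values pass validation
    level: Literal["beginner", "intermediate", "advanced"]
    justification: str


# The allowed values appear in the JSON schema derived from the model
schema = StrictConversationAnalysis.model_json_schema()
print(schema["properties"]["level"]["enum"])

# Any other value is rejected locally by Pydantic
try:
    StrictConversationAnalysis(level="expert", justification="n/a")
    accepted = True
except ValidationError:
    accepted = False

print(accepted)
```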


## Minimal Parameters - Example

In [2]:
from pydantic import BaseModel
from openai import OpenAI
from typing import List, Optional

# Define your Pydantic model just before the function call
class ComplimentModel(BaseModel):
    message: str
    language: str
    compliments: List[str]

api_key = "OPENAI_API_KEY"  # Placeholder, not a real key

for i in range(3):
    result = get_structured_response(
        model_name="gpt-4.1-nano",
        temperature=1,
        api_key=api_key,
        user_message="Say something nice in any language you choose!",
        pydantic_model=ComplimentModel,
    )
    print(result)


message='Your kindness radiates and makes the world a brighter place.' language='English' compliments=["You're truly appreciated.", 'Your smile is contagious.', 'You have a wonderful personality.']
message='La belleza de una sonrisa puede iluminar incluso el día más gris.' language='Spanish' compliments=['You have a wonderful smile!', 'Your kindness is contagious.', 'You are truly appreciated.']
message='Your kindness brightens the world today!' language='English' compliments=['You have a wonderful smile!', 'Your positivity is infectious!', "You're truly appreciated!"]
