# Structured Outputs

Structured outputs are useful when model responses are passed as inputs to other components of a system, because those components typically expect fields with known names and types.


## OpenAI API Structured Output

Historically, the OpenAI API offered a JSON mode that returns syntactically valid JSON. However, valid JSON does not guarantee adherence to a schema: required keys may be missing and data types are not enforced. An alternative is to define the output schema using Pydantic.
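
To make the gap between "valid JSON" and "matches a schema" concrete, here is a minimal sketch that uses only the standard library. The `raw` string and the `expected_types` mapping are hypothetical stand-ins written for illustration; they are not actual API output.

```python
import json

# Hypothetical model response: valid JSON, but with the wrong types
# for "date" (a number) and "participants" (a single string).
raw = '{"name": "Science fair", "date": 20250613, "participants": "Alice and Bob"}'

# Parsing succeeds, because the text is syntactically valid JSON.
data = json.loads(raw)

# A hand-rolled type check (an assumption for this sketch, not a real schema).
expected_types = {"name": str, "date": str, "participants": list}
mismatches = {
    field: type(data.get(field)).__name__
    for field, expected in expected_types.items()
    if not isinstance(data.get(field), expected)
}

print(mismatches)  # {'date': 'int', 'participants': 'str'}
```

Nothing in the parsing step flags the problem; the mismatch only shows up when the types are checked explicitly, which is the work a schema library takes over.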

A useful reference is the entry on [Structured Outputs](https://platform.openai.com/docs/guides/structured-outputs) from the API Documentation.

[Pydantic](https://docs.pydantic.dev/latest/) is a data validation library for Python. In Pydantic, we define [Models](https://docs.pydantic.dev/latest/concepts/models/), which are classes that inherit from [`BaseModel`](https://docs.pydantic.dev/latest/api/base_model/#pydantic.BaseModel) and declare their fields as annotated attributes, optionally configured with [`Field`](https://docs.pydantic.dev/latest/api/fields/#pydantic.fields.Field).
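
Before involving a model, it may help to see what a Pydantic model does on its own. The sketch below assumes `pydantic` is installed and Python 3.9 or later; the input values are made up for illustration.

```python
from pydantic import BaseModel, ValidationError


class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]


# Well-formed data is accepted and exposed as typed attributes.
good = CalendarEvent(
    name="Science fair", date="Friday", participants=["Alice", "Bob"]
)
print(good.participants)

# A plain string where a list of strings is expected is rejected.
try:
    CalendarEvent(name="Science fair", date="Friday", participants="Alice and Bob")
    rejected = False
except ValidationError as exc:
    rejected = True
    print(exc)
```

The same class can then serve both as the schema given to the API and as the type of the parsed result.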

In [None]:
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

response = client.responses.parse(
    model="gpt-4o-mini",
    input=[
        {"role": "system", "content": "Extract the event information."},
        {
            "role": "user",
            "content": "Alice and Bob are going to a science fair on Friday.",
        },
    ],
    text_format=CalendarEvent,
)

event = response.output_parsed

In [None]:
event

## LangChain and Pydantic

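LangChain can also structure model outputs according to a schema: chat models expose `with_structured_output`, which takes the schema and returns a wrapped model whose responses follow it.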

In the example below, we use LangChain to obtain a Joke object with a specific structure given by a Pydantic BaseModel.

In [None]:
%load_ext dotenv
%dotenv ../../05_src/.secrets

In [None]:
from langchain.chat_models import init_chat_model

llm = init_chat_model("gpt-4o-mini", model_provider="openai")

In [None]:
from typing import Optional
from pydantic import BaseModel, Field

class Joke(BaseModel):
    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline of the joke")
    rating: Optional[int] = Field(
        default=None, description="How funny the joke is, from 1 to 10"
    )

structured_llm = llm.with_structured_output(Joke)

jk = structured_llm.invoke("Tell me a joke about cats")

In [None]:
jk

## LangChain and TypedDict

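We can also use a [typed dictionary or TypedDict](https://typing.python.org/en/latest/spec/typeddict.html) in Python to define the structure of our output. Each attribute is wrapped in `Annotated`, a type construct (not a keyword) imported here from `typing_extensions`. The code uses the form `Annotated[type, default, description]`, where `...` as the default means the attribute has no default.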

In [None]:
from typing import Optional
from typing_extensions import Annotated, TypedDict

class JokeDict(TypedDict):
    setup: Annotated[str, ..., "The setup of the joke"] # No default, with description
    punchline: Annotated[str, ..., "The punchline of the joke"] # No default, with description
    rating: Annotated[Optional[int], None, "How funny the joke is, from 1 to 10"] # Default of None, with description
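
The extra arguments to `Annotated` are stored as metadata on the type hint. As a minimal sketch, they can be read back with the standard library; this only illustrates where the default and description live and is not a claim about how LangChain processes them internally. The class is repeated so the snippet runs on its own, and it imports from `typing` (Python 3.9+) instead of `typing_extensions`.

```python
from typing import Annotated, Optional, TypedDict, get_type_hints


class JokeDict(TypedDict):
    setup: Annotated[str, ..., "The setup of the joke"]
    punchline: Annotated[str, ..., "The punchline of the joke"]
    rating: Annotated[Optional[int], None, "How funny the joke is, from 1 to 10"]


# include_extras=True keeps the Annotated wrapper instead of stripping it.
hints = get_type_hints(JokeDict, include_extras=True)

# Each Annotated hint carries its extra arguments in __metadata__.
metadata = {name: hint.__metadata__ for name, hint in hints.items()}

for name, (default, description) in metadata.items():
    print(f"{name}: default={default!r}, description={description!r}")
```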

In [None]:
structured_llm_dict = llm.with_structured_output(JokeDict)
jk_dict = structured_llm_dict.invoke("Tell me a joke about dogs")


In [None]:
jk_dict
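
The two approaches differ in what they return: with a Pydantic schema the result is a model instance, while with a `TypedDict` schema it is a plain dictionary. The sketch below shows the difference in access style using hand-written stand-ins (the joke text and rating are invented, not model output) and assumes `pydantic` is installed.

```python
from typing import Optional

from pydantic import BaseModel


class Joke(BaseModel):
    setup: str
    punchline: str
    rating: Optional[int] = None


# Stand-in for a result obtained with a Pydantic schema: attribute access.
jk = Joke(
    setup="Why was the cat sitting on the computer?",
    punchline="To keep an eye on the mouse.",
)
print(jk.punchline, jk.rating)

# Stand-in for a result obtained with a TypedDict schema: key access.
jk_dict = {
    "setup": "Why did the dog sit in the shade?",
    "punchline": "It did not want to be a hot dog.",
    "rating": 7,
}
print(jk_dict["punchline"], jk_dict["rating"])
```

A Pydantic result is validated when it is constructed, whereas a dictionary result is not checked at runtime by the `TypedDict` annotations themselves.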