#### In this notebook, we will discuss validating structured outputs from language models using Pydantic and OpenAI. 

#### We'll also introduce a library called instructor that simplifies this process and offers extra features that use validation to improve the quality of your outputs.

While some have resorted to threatening human life to generate structured data (https://twitter.com/goodside/status/1657396491676164096?s=20), Pydantic is even more effective.


### Pydantic
Unlike the standard library's dataclasses module, which does not check types at runtime, Pydantic goes a step further and defines a schema for your model. This schema is used to validate data, to generate documentation, and to produce a JSON schema, which is perfect for our use case of generating structured data with language models.
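
To make the schema idea concrete before calling any model, here is a minimal sketch that assumes Pydantic v2 (the version that provides `model_validate_json`, used later in this notebook). It repeats the `PythonPackage` model from the next cell so it runs on its own. The JSON strings are hand-written stand-ins for a model response, not real API output.

```python
from pydantic import BaseModel, ValidationError


class PythonPackage(BaseModel):
    name: str
    author: str


# The JSON schema Pydantic derives from the type annotations.
schema = PythonPackage.model_json_schema()
print(schema["required"])

# A well-formed payload validates into a typed object.
package = PythonPackage.model_validate_json(
    '{"name": "pydantic", "author": "Samuel Colvin"}'
)
print(package.author)

# A payload with a missing field raises ValidationError instead of
# silently producing an incomplete object.
try:
    PythonPackage.model_validate_json('{"name": "pydantic"}')
except ValidationError as exc:
    missing_fields = [err["loc"][0] for err in exc.errors()]
    print(missing_fields)
```

The same schema dictionary is what you would hand to a language model (in a prompt or a function definition) to describe the structure you expect back.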

In [None]:
import os
from dotenv import load_dotenv, find_dotenv
from pydantic import BaseModel
from openai import AzureOpenAI

load_dotenv(find_dotenv())

client = AzureOpenAI(
    api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
    api_version=os.environ.get("OPENAI_API_VERSION"),
    azure_endpoint=os.environ.get("OPENAI_API_BASE"),
    timeout=10.0,
    max_retries=10
)


class PythonPackage(BaseModel):
    name: str
    author: str
        
# With the following prompt, we ask the model to return a JSON object whose
# fields match the PythonPackage model, then validate the response against it.

resp = client.chat.completions.create(
    model="gpt35-turbo-16k-product-dev",
    messages=[
        {
            "role": "user",
            "content": "Return the `name`, and `author` of pydantic, in a json object."
        },
    ]
)
PythonPackage.model_validate_json(resp.choices[0].message.content)

The model is not guaranteed to return bare JSON. resp.choices[0].message.content may wrap the JSON in a markdown code block or surround it with prose, and in either case parsing fails unless we handle the extra text first.

#### LLM responses with markdown code blocks

In [None]:
import json

# Raises json.JSONDecodeError: the surrounding markdown fence is not valid JSON.
json.loads("""```json{"name": "pydantic", "author": "Samuel Colvin"}```""")

#### LLM responses with prose

In [None]:
# Raises json.JSONDecodeError: the prose before the object is not valid JSON.
json.loads("""
Ok heres the authors of pydantic: Samuel Colvin, and the name this library

{
  "name": "pydantic",
  "author": "Samuel Colvin"
}
""")