<a href="https://colab.research.google.com/github/JSJeong-me/AI-Innovation-2024/blob/main/NLP/4-4-Using_chained_calls_for_o1_structured_outputs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using chained calls for reasoning structured outputs

The initial releases (September 2024) of the [o1](https://openai.com/index/introducing-openai-o1-preview/) reasoning models have advanced capabilities but do not support [structured outputs](https://platform.openai.com/docs/guides/structured-outputs/examples).

This means that o1 requests have no reliable type safety and depend on the prompt alone to return usable JSON.

In this guide, we'll explore two methods for getting o1 models, specifically `o1-preview`, to return valid JSON when using the OpenAI API.

# Prompting

The simplest way to get a JSON response from `o1-preview` is to ask for it explicitly in the prompt.

Let's run through an example of:
- Fetching a Wikipedia page of companies
- Determining which could benefit the most from AI capabilities
- Returning them in a JSON format, which could then be ingested by other systems

In [5]:
!pip install openai devtools


In [2]:
from google.colab import userdata
import os

OPENAI_API_KEY=userdata.get('OPENAI_API_KEY')
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
api_key = os.getenv("OPENAI_API_KEY")

In [3]:
import requests
from openai import OpenAI

client = OpenAI()

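def fetch_html(url):
    """Return the page HTML, or None if the response status is not 200."""
    # A timeout keeps the notebook from hanging on a stalled connection.
    response = requests.get(url, timeout=30)
    if response.status_code == 200:
        return response.text
    return None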

url = "https://en.wikipedia.org/wiki/List_of_largest_companies_in_the_United_States_by_revenue"
html_content = fetch_html(url)

# Example of the JSON shape we want back; every key is quoted so the sample is itself valid JSON.
json_format = """
{
    "companies": [
        {
            "company_name": "OpenAI",
            "page_link": "https://en.wikipedia.org/wiki/OpenAI",
            "reason": "OpenAI would benefit because they are an AI company..."
        }
    ]
}
"""

o1_response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {
            "role": "user",
            "content": f"""
You are a business analyst designed to understand how AI technology could be used across large corporations.

- Read the following html and return which companies would benefit from using AI technology: {html_content}.
- Rank these prospects by opportunity by comparing them and show me the top 3. Return only as a JSON with the following format: {json_format}
"""
        }
    ]
)

print(o1_response.choices[0].message.content)

{
    "companies": [
        {
            "company_name": "Walmart",
            "page_link": "https://en.wikipedia.org/wiki/Walmart",
            "reason": "As the largest retailer, Walmart can benefit from AI technology to optimize its supply chain, improve inventory management, enhance customer experience with personalized recommendations, and streamline operations."
        },
        {
            "company_name": "UnitedHealth Group",
            "page_link": "https://en.wikipedia.org/wiki/UnitedHealth_Group",
            "reason": "UnitedHealth Group can leverage AI for predictive analytics in patient care, improve diagnostics, personalize treatment plans, and optimize administrative processes in healthcare management."
        },
        {
            "company_name": "JPMorgan Chase",
            "page_link": "https://en.wikipedia.org/wiki/JPMorgan_Chase",
            "reason": "As a leading financial institution, JPMorgan Chase can use AI for fraud detection, risk management, 

The response is already quite good: it returns JSON in the requested shape. However, it has the same pitfalls as any prompt-only approach to JSON output:
- You must manually process this JSON into your type-safe structure
- Model refusals are not [explicitly returned from the API as a separate structure](https://platform.openai.com/docs/guides/structured-outputs/refusals)
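
To make the first pitfall concrete, here is a minimal sketch of the manual processing that prompt-only JSON requires. The `Company` dataclass and the `parse_companies` helper are hypothetical names introduced for illustration; they mirror the keys requested in the prompt above and use only the standard library.

```python
import json
from dataclasses import dataclass


@dataclass
class Company:
    # Hypothetical record type mirroring the keys requested in the prompt.
    company_name: str
    page_link: str
    reason: str


def parse_companies(raw: str) -> list[Company]:
    """Turn prompt-only JSON text into typed records, raising ValueError on bad input."""
    # Keep only the outermost {...} span in case the model added surrounding text.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response")
    try:
        payload = json.loads(raw[start:end + 1])
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc

    items = payload.get("companies")
    if not isinstance(items, list):
        raise ValueError("expected a 'companies' list")

    companies = []
    for item in items:
        if not isinstance(item, dict):
            raise ValueError("each company must be a JSON object")
        # Every field must be present and be a string.
        bad = [key for key in ("company_name", "page_link", "reason")
               if not isinstance(item.get(key), str)]
        if bad:
            raise ValueError(f"missing or non-string fields: {bad}")
        companies.append(Company(item["company_name"], item["page_link"], item["reason"]))
    return companies


sample = (
    '{"companies": [{"company_name": "Walmart", '
    '"page_link": "https://en.wikipedia.org/wiki/Walmart", '
    '"reason": "Supply chain optimization."}]}'
)
companies = parse_companies(sample)
print(companies[0].company_name)
```

Every check here is something you have to write and maintain yourself, and a truncated or malformed response only surfaces as an error at parse time.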

# Structured Outputs

Let's now do this with [structured outputs](https://platform.openai.com/docs/guides/structured-outputs). To enable this functionality, we’ll chain the `o1-preview` response into a follow-up request to `gpt-4o-mini`, which reformats the text returned by `o1-preview` into the schema we define.

In [6]:
from pydantic import BaseModel
from devtools import pprint

class CompanyData(BaseModel):
    company_name: str
    page_link: str
    reason: str

class CompaniesData(BaseModel):
    companies: list[CompanyData]

o1_response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {
            "role": "user",
            "content": f"""
You are a business analyst designed to understand how AI technology could be used across large corporations.

- Read the following html and return which companies would benefit from using AI technology: {html_content}.
- Rank these prospects by opportunity by comparing them and show me the top 3. Return each with {CompanyData.__fields__.keys()}
"""
        }
    ]
)

o1_response_content = o1_response.choices[0].message.content

response = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": f"""
Given the following data, format it with the given response format: {o1_response_content}
"""
        }
    ],
    response_format=CompaniesData,
)

pprint(response.choices[0].message.parsed)

CompaniesData(
    companies=[
        CompanyData(
            company_name='Walmart',
            page_link='/wiki/Walmart',
            reason=(
                "As the world's largest retailer, Walmart can significantly benefit from AI technology to optimize its"
                ' vast supply chain, improve demand forecasting, enhance inventory management, and personalize custome'
                'r experiences. Implementing AI can lead to cost reductions, increased efficiency, and a competitive e'
                'dge in the retail market.'
            ),
        ),
        CompanyData(
            company_name='UnitedHealth Group',
            page_link='/wiki/UnitedHealth_Group',
            reason=(
                'As a leading healthcare company, UnitedHealth Group can leverage AI to enhance patient care through p'
                'redictive analytics, automate administrative tasks, detect fraud, and personalize health plans. AI ca'
                'n help process large volum

# Conclusion

Structured outputs give your code reliable type safety and simpler prompting. In addition, they let you reuse your object schemas for easier integration into your existing workflows.

The o1 class of models currently doesn't support structured outputs, but we can reuse the existing structured outputs functionality of `gpt-4o-mini` by chaining two requests together. This flow currently requires two calls, but the cost of the second `gpt-4o-mini` call should be minimal compared to the `o1-preview`/`o1-mini` calls.
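
As a recap, the two-call flow can be wrapped in a small helper. This is a sketch rather than part of the original notebook: `chain_to_structured` and its parameter names are hypothetical, it expects an `OpenAI` client and a Pydantic model such as `CompaniesData` from the cell above, and it assumes the parsed message exposes a `refusal` field as described in the refusals documentation linked earlier.

```python
def chain_to_structured(
    client,
    reasoning_prompt,
    response_model,
    reasoning_model="o1-preview",
    formatting_model="gpt-4o-mini",
):
    """Run the reasoning call, then ask a second model to format its text into response_model."""
    # First call: free-form reasoning output from the o1 model.
    reasoning = client.chat.completions.create(
        model=reasoning_model,
        messages=[{"role": "user", "content": reasoning_prompt}],
    )
    draft = reasoning.choices[0].message.content

    # Second call: a model with structured outputs support reformats the draft.
    formatted = client.beta.chat.completions.parse(
        model=formatting_model,
        messages=[
            {
                "role": "user",
                "content": (
                    "Given the following data, format it with the given "
                    f"response format: {draft}"
                ),
            }
        ],
        response_format=response_model,
    )
    message = formatted.choices[0].message

    # Assumption: a refusal arrives as its own field instead of inside the parsed content.
    if message.refusal:
        raise RuntimeError(f"formatting model refused: {message.refusal}")
    return message.parsed
```

With the objects defined earlier, a call would look like `chain_to_structured(client, prompt, CompaniesData)`, where `prompt` is the same analyst prompt sent to `o1-preview`. Raising on a refusal is one possible design choice; returning the refusal text to the caller would work equally well.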