### Extract best risk drivers based on Somers' D


**Introducing Structured Outputs in the API**  
https://openai.com/index/introducing-structured-outputs-in-the-api/

Last year at DevDay, we introduced JSON mode—a useful building block for developers looking to build reliable applications with our models. While JSON mode improves model reliability for generating valid JSON outputs, it does not guarantee that the model’s response will conform to a particular schema. Today we’re introducing Structured Outputs in the API, a new feature designed to ensure model-generated outputs will exactly match JSON Schemas provided by developers.

Generating structured data from unstructured inputs is one of the core use cases for AI in today’s applications. Developers use the OpenAI API to build powerful assistants that have the ability to fetch data and answer questions via function calling, extract structured data for data entry, and build multi-step agentic workflows that allow LLMs to take actions. Developers have long been working around the limitations of LLMs in this area via open source tooling, prompting, and retrying requests repeatedly to ensure that model outputs match the formats needed to interoperate with their systems. Structured Outputs solves this problem by constraining OpenAI models to match developer-supplied schemas and by training our models to better understand complicated schemas.
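
As a rough illustration of what a developer-supplied schema can look like, the sketch below builds a `response_format` payload of type `json_schema` with `strict` enabled. The schema name `driver_selection`, the `drivers` field, and the three driver names are hypothetical placeholders. No API call is made here; the dictionary is only the kind of value that would be passed as `response_format=` to `client.chat.completions.create`.

```python
import json

# Hypothetical driver names, used only for illustration
allowed_drivers = ["Risk Driver A", "Risk Driver B", "Risk Driver C"]

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "driver_selection",  # hypothetical schema name
        "strict": True,              # ask for output that matches the schema exactly
        "schema": {
            "type": "object",
            "properties": {
                # A list of driver names, each restricted to the allowed values
                "drivers": {
                    "type": "array",
                    "items": {"type": "string", "enum": allowed_drivers},
                },
            },
            "required": ["drivers"],
            "additionalProperties": False,
        },
    },
}

# Print the payload as it would be serialized in a request
print(json.dumps(response_format, indent=2))
```

In this sketch a single response carries all selected drivers in one array, whereas the tool definition used later in this notebook returns one driver per tool call.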

In [7]:
from openai import OpenAI
import json, os
import pandas as pd

# Create the client; the API key is read from the OPENAI_API_KEY environment variable
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

#### A. Create dummy data

In [2]:
# Create the DataFrame with dummy data
data = {
    'Risk Driver': [
        'Risk Driver A', 'Risk Driver B', 'Risk Driver C', 'Risk Driver D', 'Risk Driver E',
        'Risk Driver F', 'Risk Driver G', 'Risk Driver H', 'Risk Driver I', 'Risk Driver J'
    ],
    'Somers D Outcome': [0.15, 0.22, 0.35, 0.05, 0.40, 0.18, 0.27, 0.09, 0.33, 0.12]
}

df = pd.DataFrame(data)

# Convert DataFrame to JSON
table_json = df.to_json(orient='records')

df = df.sort_values('Somers D Outcome', ascending=False).reset_index(drop=True)
df

| | Risk Driver | Somers D Outcome |
|---|---|---|
| 0 | Risk Driver E | 0.40 |
| 1 | Risk Driver C | 0.35 |
| 2 | Risk Driver I | 0.33 |
| 3 | Risk Driver G | 0.27 |
| 4 | Risk Driver B | 0.22 |
| 5 | Risk Driver F | 0.18 |
| 6 | Risk Driver A | 0.15 |
| 7 | Risk Driver J | 0.12 |
| 8 | Risk Driver H | 0.09 |
| 9 | Risk Driver D | 0.05 |


#### B. Define the tool schema and a helper to extract the selected drivers

In [3]:
# Define the tool schema: each tool call returns one driver name, restricted to the names in the table
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_best_drivers",
            "strict": True,
            "parameters": {
                "type": "object",
                "properties": {
                    "drivers": {"type": "string", 
                                "enum": list(df['Risk Driver'].unique())},
                },
                "required": ["drivers"],
                "additionalProperties": False,
            },
        },
    },
]


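def extract_drivers(query):
    """Return the driver named in each tool call of a chat completion."""
    response_dict = query.to_dict()

    drivers = []
    for tool_call in response_dict['choices'][0]['message']['tool_calls']:
        # The arguments arrive as a JSON string; parse them with json.loads
        # instead of eval, which would execute whatever the model returned
        arguments = json.loads(tool_call['function']['arguments'])
        drivers.append(arguments['drivers'])
    return drivers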
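
The extraction step can be checked offline, without calling the API. The sketch below applies the same extraction logic to a hand-built dictionary that mimics the shape of `completion.to_dict()`. The payload and the name `parse_tool_call_drivers` are made up for illustration. It also shows that the `arguments` field is a JSON string, which makes `json.loads` the natural parser.

```python
import json

# Hand-built stand-in for completion.to_dict(); this is not real API output
fake_response = {
    "choices": [{
        "message": {
            "tool_calls": [
                {"function": {"name": "get_best_drivers",
                              "arguments": json.dumps({"drivers": "Risk Driver E"})}},
                {"function": {"name": "get_best_drivers",
                              "arguments": json.dumps({"drivers": "Risk Driver C"})}},
            ]
        }
    }]
}


def parse_tool_call_drivers(response_dict):
    """Collect the 'drivers' argument from every tool call in the response."""
    # Fall back to an empty list if the message carries no tool calls
    tool_calls = response_dict["choices"][0]["message"].get("tool_calls") or []
    return [json.loads(tc["function"]["arguments"])["drivers"] for tc in tool_calls]


drivers = parse_tool_call_drivers(fake_response)
print(drivers)
```

Since the schema returns one driver per tool call, the length of the returned list is worth checking against the number of drivers requested in the prompt.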

#### C. Get the best 3 risk drivers

In [4]:
messages = [
    {"role": "user", "content": "Pick the 3 best risk drivers from the table below"},
    {"role": "user", "content": "\n\nHere is the data:\n" + table_json}
           ]
      
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice="required"
)

# Get the drivers
drivers_list = extract_drivers(completion)
df['Best 3 risk drivers'] = df['Risk Driver'].isin(drivers_list)

drivers_list

['Risk Driver E', 'Risk Driver C', 'Risk Driver I']

#### D. Get the worst 3 risk drivers

In [5]:
messages = [
    {"role": "user", "content": "Pick the 3 worst risk drivers from the table below"},
    {"role": "user", "content": "\n\nHere is the data:\n" + table_json}
           ]
      
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice="required"
)

# Get the drivers
drivers_list = extract_drivers(completion)
df['Worst 3 risk drivers'] = df['Risk Driver'].isin(drivers_list)

drivers_list

['Risk Driver D', 'Risk Driver H', 'Risk Driver J']
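
Because the ranking is fully determined by the Somers' D values, the model's picks can be cross-checked without another API call. The following is a minimal sketch using only the standard library, with the dummy values from section A copied in by hand. It assumes that a higher Somers' D means a better driver, and `top_k` and `bottom_k` are hypothetical helper names.

```python
import heapq

# Dummy Somers' D values from section A
somers_d = {
    'Risk Driver A': 0.15, 'Risk Driver B': 0.22, 'Risk Driver C': 0.35,
    'Risk Driver D': 0.05, 'Risk Driver E': 0.40, 'Risk Driver F': 0.18,
    'Risk Driver G': 0.27, 'Risk Driver H': 0.09, 'Risk Driver I': 0.33,
    'Risk Driver J': 0.12,
}


def top_k(scores, k=3):
    """Return the k driver names with the highest score."""
    return heapq.nlargest(k, scores, key=scores.get)


def bottom_k(scores, k=3):
    """Return the k driver names with the lowest score."""
    return heapq.nsmallest(k, scores, key=scores.get)


# Lists returned by the model in sections C and D
model_best = ['Risk Driver E', 'Risk Driver C', 'Risk Driver I']
model_worst = ['Risk Driver D', 'Risk Driver H', 'Risk Driver J']

# Compare as sets, since the model is not asked for a particular order
best_matches = set(model_best) == set(top_k(somers_d))
worst_matches = set(model_worst) == set(bottom_k(somers_d))
print(best_matches, worst_matches)
```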

#### E. Outcome

In [6]:
# Function to apply background colors
def highlight_rows(row):
    if row['Best 3 risk drivers']:
        return ['background-color: #90EE90' for _ in row]
    elif row['Worst 3 risk drivers']:
        return ['background-color: red' for _ in row]
    else:
        return ['' for _ in row]

# Apply the highlighting
df.style.apply(highlight_rows, axis=1).format({'Somers D Outcome': '{:.1%}'})

| | Risk Driver | Somers D Outcome | Best 3 risk drivers | Worst 3 risk drivers |
|---|---|---|---|---|
| 0 | Risk Driver E | 40.0% | True | False |
| 1 | Risk Driver C | 35.0% | True | False |
| 2 | Risk Driver I | 33.0% | True | False |
| 3 | Risk Driver G | 27.0% | False | False |
| 4 | Risk Driver B | 22.0% | False | False |
| 5 | Risk Driver F | 18.0% | False | False |
| 6 | Risk Driver A | 15.0% | False | False |
| 7 | Risk Driver J | 12.0% | False | True |
| 8 | Risk Driver H | 9.0% | False | True |
| 9 | Risk Driver D | 5.0% | False | True |
