### Extract best risk drivers


**Introducing Structured Outputs in the API**  
https://openai.com/index/introducing-structured-outputs-in-the-api/

Last year at DevDay, we introduced JSON mode—a useful building block for developers looking to build reliable applications with our models. While JSON mode improves model reliability for generating valid JSON outputs, it does not guarantee that the model’s response will conform to a particular schema. Today we’re introducing Structured Outputs in the API, a new feature designed to ensure model-generated outputs will exactly match JSON Schemas provided by developers.

Generating structured data from unstructured inputs is one of the core use cases for AI in today’s applications. Developers use the OpenAI API to build powerful assistants that have the ability to fetch data and answer questions via function calling, extract structured data for data entry, and build multi-step agentic workflows that allow LLMs to take actions. Developers have long been working around the limitations of LLMs in this area via open source tooling, prompting, and retrying requests repeatedly to ensure that model outputs match the formats needed to interoperate with their systems. Structured Outputs solves this problem by constraining OpenAI models to match developer-supplied schemas and by training our models to better understand complicated schemas.
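
To make the difference between "valid JSON" and "matches a schema" concrete, here is a minimal sketch. The `conforms` helper, the `REQUIRED_KEYS` set, and both payloads are hypothetical illustrations. They only check key names and string values, and they are not part of the OpenAI API.

```python
import json

# Hypothetical expected shape: an object with exactly these string-valued keys
REQUIRED_KEYS = {"driver", "explanation"}

def conforms(payload: str) -> bool:
    """Return True if payload is a JSON object with exactly the required string keys."""
    try:
        obj = json.loads(payload)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(obj, dict)
        and set(obj) == REQUIRED_KEYS
        and all(isinstance(value, str) for value in obj.values())
    )

valid_json_wrong_shape = '{"name": "Age", "score": 0.45}'
valid_json_right_shape = '{"driver": "Age", "explanation": "Strong but sparse."}'

print(conforms(valid_json_wrong_shape))  # False: parses as JSON, but the keys differ
print(conforms(valid_json_right_shape))  # True
```

JSON mode alone only rules out the parse failure. The mismatch in the first payload is the kind of error that schema-constrained output is meant to prevent.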

In [1]:
from openai import OpenAI
import json, os
import pandas as pd
from IPython.display import display, Markdown

# Read the API key from the OPENAI_API_KEY environment variable
client = OpenAI(api_key = os.getenv('OPENAI_API_KEY'))

#### A. Create dummy data

In [2]:
# Create the DataFrame with dummy data
data = {
    'Driver': [
        'Age',
        'Annual Income',
        'Debt-to-Income Ratio',
        'Credit Score',
        'Employment Length',
        'Loan Amount',
        'Loan Purpose',
        'Home Ownership Status',
        'Number of Delinquencies',
        'Credit Inquiries Last 6 Months'
    ],
    'Somers_D': [0.45, 0.40, 0.55, 0.05, 0.30, 0.35, 0.25, 0.20, 0.50, 0.05],
    'PSI': [0.05, 0.08, 0.07, 0.03, 0.10, 0.12, 0.15, 0.18, 0.06, 0.09],
    'Herfindahl_Index': [0.12, 0.10, 0.18, 0.15, 0.20, 0.14, 0.30, 0.35, 0.22, 0.25],
    'Share_of_Missing_Values': [0.82, 0.01, 0.05, 0.00, 0.03, 0.04, 0.06, 0.07, 0.32, 0.01]
}

# Create the DataFrame
df = pd.DataFrame(data)

# Serialize the records to JSON strings for use in the prompts
# (note: somers_df is a JSON string, not a DataFrame)
somers_df = df[['Driver', 'Somers_D']].to_json(orient='records')
all_records = df.to_json(orient='records')

df = df.sort_values('Somers_D', ascending=False).reset_index(drop=True)
df

| | Driver | Somers_D | PSI | Herfindahl_Index | Share_of_Missing_Values |
|---|---|---|---|---|---|
| 0 | Debt-to-Income Ratio | 0.55 | 0.07 | 0.18 | 0.05 |
| 1 | Number of Delinquencies | 0.5 | 0.06 | 0.22 | 0.32 |
| 2 | Age | 0.45 | 0.05 | 0.12 | 0.82 |
| 3 | Annual Income | 0.4 | 0.08 | 0.1 | 0.01 |
| 4 | Loan Amount | 0.35 | 0.12 | 0.14 | 0.04 |
| 5 | Employment Length | 0.3 | 0.1 | 0.2 | 0.03 |
| 6 | Loan Purpose | 0.25 | 0.15 | 0.3 | 0.06 |
| 7 | Home Ownership Status | 0.2 | 0.18 | 0.35 | 0.07 |
| 8 | Credit Score | 0.05 | 0.03 | 0.15 | 0.0 |
| 9 | Credit Inquiries Last 6 Months | 0.05 | 0.09 | 0.25 | 0.01 |
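
The values in this table are dummy data, but the columns correspond to standard monitoring metrics. As a rough illustration of what two of them measure, the sketch below computes a Population Stability Index and a Herfindahl index from category shares using their usual textbook definitions. The function names and the `baseline` and `current` distributions are made up for this example and do not come from the notebook.

```python
import math

def psi(expected, actual):
    """Population Stability Index between two binned distributions (shares summing to 1)."""
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))

def herfindahl(shares):
    """Herfindahl index: the sum of squared category shares."""
    return sum(s * s for s in shares)

baseline = [0.25, 0.25, 0.25, 0.25]  # made-up bin shares in a reference sample
current = [0.30, 0.25, 0.25, 0.20]   # made-up bin shares in a recent sample

print(round(psi(baseline, current), 4))
print(herfindahl(baseline))  # 0.25 for four equally sized categories
```

Identical distributions give a PSI of 0, and the value grows as the shares drift apart. The sketch assumes every bin has a non-zero share, since the logarithm is undefined otherwise.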


#### B. Define the function-calling tool and a helper to extract the drivers

In [3]:
# Define the tool schema: each call must return one driver name from the table
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_best_drivers",
            "strict": True,
            "parameters": {
                "type": "object",
                "properties": {
                    "drivers": {"type": "string", 
                                "enum": list(df['Driver'].unique())},
                },
                "required": ["drivers"],
                "additionalProperties": False,
            },
        },
    },
]


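def extract_drivers(query):
    """Return the 'drivers' argument of every tool call in a chat completion."""
    response_dict = query.to_dict()

    drivers = []
    # tool_calls is absent or None when the model replies without calling the tool
    for tool_call in response_dict['choices'][0]['message'].get('tool_calls') or []:
        # Arguments arrive as a JSON string; parse them with json.loads rather than eval
        arguments = json.loads(tool_call['function']['arguments'])
        drivers.append(arguments['drivers'])
    return drivers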
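
Tool-call arguments arrive as a JSON string, so `json.loads` is the natural parser. Python's `eval` would fail on JSON literals such as `true` or `null`, and it would execute whatever text it is given. The sketch below parses a hand-written stand-in for `completion.to_dict()` without calling the API. The `parse_tool_arguments` name and the mock payload are illustrative assumptions.

```python
import json

def parse_tool_arguments(response_dict):
    """Collect the 'drivers' argument from every tool call in a response dict."""
    message = response_dict["choices"][0]["message"]
    # tool_calls can be missing or None when the model answers in plain text
    tool_calls = message.get("tool_calls") or []
    return [json.loads(call["function"]["arguments"])["drivers"] for call in tool_calls]

# Hand-written stand-in for completion.to_dict(); not a real API response
mock_response = {
    "choices": [{
        "message": {
            "tool_calls": [
                {"function": {"name": "get_best_drivers", "arguments": '{"drivers": "Age"}'}},
                {"function": {"name": "get_best_drivers", "arguments": '{"drivers": "Loan Amount"}'}},
            ]
        }
    }]
}

print(parse_tool_arguments(mock_response))  # ['Age', 'Loan Amount']
```

Each tool call carries a single driver because the schema declares `drivers` as one enum string, so a three-driver answer shows up as three separate tool calls.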

#### C. Get the best 3 risk drivers

In [4]:
# Build the prompt: the instruction, then the Driver / Somers' D records as JSON
messages = [
    {"role": "user", "content": "Pick the 3 best risk drivers from the table below according to the highest Somers D"},
    {"role": "user", "content": "\n\nHere is the data:\n" + somers_df}
]
      
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice="required"
)

# Get the drivers
drivers_list = extract_drivers(completion)


df['Best 3 risk drivers'] = df['Driver'].isin(drivers_list)

drivers_list

['Debt-to-Income Ratio', 'Number of Delinquencies', 'Age']

#### D. Get the worst 3 risk drivers

In [5]:
# Build the prompt: the instruction, then the Driver / Somers' D records as JSON
messages = [
    {"role": "user", "content": "Pick the 3 worst risk drivers from the table below according to the lowest Somers D"},
    {"role": "user", "content": "\n\nHere is the data:\n" + somers_df}
]
      
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice="required"
)

# Get the drivers
drivers_list = extract_drivers(completion)
df['Worst 3 risk drivers'] = df['Driver'].isin(drivers_list)

drivers_list

['Credit Score', 'Credit Inquiries Last 6 Months', 'Home Ownership Status']
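
Because ranking by a single column is deterministic, the model's picks can be cross-checked locally. The sketch below repeats the Somers' D values from the dummy data and compares a plain sort with the two lists printed above. It uses the standard library only, and the variable names are illustrative.

```python
# Driver / Somers' D pairs copied from the dummy data
somers_d = {
    "Age": 0.45,
    "Annual Income": 0.40,
    "Debt-to-Income Ratio": 0.55,
    "Credit Score": 0.05,
    "Employment Length": 0.30,
    "Loan Amount": 0.35,
    "Loan Purpose": 0.25,
    "Home Ownership Status": 0.20,
    "Number of Delinquencies": 0.50,
    "Credit Inquiries Last 6 Months": 0.05,
}

expected_best = sorted(somers_d, key=somers_d.get, reverse=True)[:3]
expected_worst = sorted(somers_d, key=somers_d.get)[:3]

# Lists as returned by the model in sections C and D
model_best = ["Debt-to-Income Ratio", "Number of Delinquencies", "Age"]
model_worst = ["Credit Score", "Credit Inquiries Last 6 Months", "Home Ownership Status"]

print(set(model_best) == set(expected_best))    # True
print(set(model_worst) == set(expected_worst))  # True
```

The comparison uses sets because the order of tool calls is not guaranteed. A tie at the cut-off (there is none here) would need an explicit tie-breaking rule.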

#### E. Outcome

In [6]:
# Function to apply background colors
def highlight_rows(row):
    if row['Best 3 risk drivers']:
        return ['background-color: #90EE90' for _ in row]
    elif row['Worst 3 risk drivers']:
        return ['background-color: red' for _ in row]
    else:
        return ['' for _ in row]

# Apply the highlighting
df.style.apply(highlight_rows, axis=1).format({'Somers_D': '{:.1%}'})

| | Driver | Somers_D | PSI | Herfindahl_Index | Share_of_Missing_Values | Best 3 risk drivers | Worst 3 risk drivers |
|---|---|---|---|---|---|---|---|
| 0 | Debt-to-Income Ratio | 55.0% | 0.07 | 0.18 | 0.05 | True | False |
| 1 | Number of Delinquencies | 50.0% | 0.06 | 0.22 | 0.32 | True | False |
| 2 | Age | 45.0% | 0.05 | 0.12 | 0.82 | True | False |
| 3 | Annual Income | 40.0% | 0.08 | 0.1 | 0.01 | False | False |
| 4 | Loan Amount | 35.0% | 0.12 | 0.14 | 0.04 | False | False |
| 5 | Employment Length | 30.0% | 0.1 | 0.2 | 0.03 | False | False |
| 6 | Loan Purpose | 25.0% | 0.15 | 0.3 | 0.06 | False | False |
| 7 | Home Ownership Status | 20.0% | 0.18 | 0.35 | 0.07 | False | True |
| 8 | Credit Score | 5.0% | 0.03 | 0.15 | 0.0 | False | True |
| 9 | Credit Inquiries Last 6 Months | 5.0% | 0.09 | 0.25 | 0.01 | False | True |


#### F. Analyse risk drivers

In [7]:
# Pin a model snapshot; `notes` holds the metric definitions passed to the model as context
model = 'gpt-4o-2024-08-06'
notes = """
    - **Somers' D**: Measures the strength and direction of association between each driver and the default outcome. Values range from -1 to 1, where higher absolute values indicate stronger predictive power. 
                     Values above 0.2 indicate strong predictive power, drivers with values below 0.1 are typically discarded.
    
    - **Population Stability Index (PSI)**: Assesses changes in the distribution of each driver over time.
      - **PSI < 0.1**: No significant change.
      - **0.1 ≤ PSI ≤ 0.25**: Moderate change.
      - **PSI > 0.25**: Significant shift.
    
    - **Herfindahl Index**: Indicates the concentration of categories within each driver. Values range from 0 to 1, with higher values showing greater concentration (less diversity). Values above 0.5 typically point to excessive concentration.
    
    - **Share of Missing Values**: Represents the proportion of missing data points for each driver, ranging from 0 (no missing data) to 1 (all data missing). A higher value indicates more missing data, which may impact the reliability of the driver.
"""

messages= [
    {
      "role": "system",
      "content": "You are a credit risk analyst."
    },
    {
      "role": "user",
      "content": 'In each step analyse an individual risk driver, present your conclusions per driver using bullets and finally create a list of 3 best risk drivers.'
    },
    {
      "role": "user",
      "content": notes
    },
    {
      "role": "user",
      "content": "\n\nHere is the data:\n" + all_records
    },
  ]

response_format={
    "type": "json_schema",
    "json_schema": {
      "name": "risk_driver_analysis",
      "strict": True,
      "schema": {
        "type": "object",
        "properties": {
          "steps": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "driver": {
                  "type": "string"
                },
                "explanation": {
                  "type": "string"
                },
                "output": {
                  "type": "string"
                }
              },
              "required": ["driver", "explanation", "output"],
              "additionalProperties": False
            }
          },
          "final_answer": {
            "type": "string"
          }
        },
        "required": ["steps", "final_answer"],
        "additionalProperties": False
      }
    }
  }
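
The thresholds in `notes` can also be applied as plain rules, which gives a deterministic baseline to compare the model's reasoning against. The sketch below is one possible reading of those notes. The notes give no cut-off for missing values, so `MAX_MISSING = 0.5` is an assumption made for this example, and all names are hypothetical.

```python
# Cut-offs taken from the notes, except MAX_MISSING, which is an assumed value
MIN_SOMERS_D = 0.2
MAX_PSI = 0.25
MAX_HERFINDAHL = 0.5
MAX_MISSING = 0.5  # hypothetical: the notes do not state a threshold

def psi_band(psi):
    """Map a PSI value to the band described in the notes."""
    if psi < 0.1:
        return "no significant change"
    if psi <= 0.25:
        return "moderate change"
    return "significant shift"

def passes_screen(somers_d, psi, herfindahl, missing_share):
    """Apply all four rule-of-thumb checks to one driver."""
    return (
        abs(somers_d) > MIN_SOMERS_D
        and psi <= MAX_PSI
        and herfindahl <= MAX_HERFINDAHL
        and missing_share <= MAX_MISSING
    )

rows = [
    # (driver, somers_d, psi, herfindahl, missing_share), copied from the dummy data
    ("Age", 0.45, 0.05, 0.12, 0.82),
    ("Annual Income", 0.40, 0.08, 0.10, 0.01),
    ("Debt-to-Income Ratio", 0.55, 0.07, 0.18, 0.05),
    ("Credit Score", 0.05, 0.03, 0.15, 0.00),
    ("Employment Length", 0.30, 0.10, 0.20, 0.03),
    ("Loan Amount", 0.35, 0.12, 0.14, 0.04),
    ("Loan Purpose", 0.25, 0.15, 0.30, 0.06),
    ("Home Ownership Status", 0.20, 0.18, 0.35, 0.07),
    ("Number of Delinquencies", 0.50, 0.06, 0.22, 0.32),
    ("Credit Inquiries Last 6 Months", 0.05, 0.09, 0.25, 0.01),
]

# Keep drivers that pass every check, then rank the survivors by |Somers' D|
shortlist = sorted(
    (row for row in rows if passes_screen(*row[1:])),
    key=lambda row: abs(row[1]),
    reverse=True,
)[:3]

print([row[0] for row in shortlist])
# ['Debt-to-Income Ratio', 'Number of Delinquencies', 'Annual Income']
```

Under these assumptions Age drops out because of its 0.82 share of missing values, even though its Somers' D is the third highest. A stricter `MAX_MISSING` would also exclude Number of Delinquencies.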

In [8]:
completion = client.chat.completions.create(
    model=model,
    messages=messages,
    response_format=response_format,
)

response_dict = completion.to_dict()
response_dict

{'id': 'chatcmpl-A8BHVYJ0CioTLNSOIuwfj0nHNu8yW',
 'choices': [{'finish_reason': 'stop',
   'index': 0,
   'logprobs': None,
   'message': {'content': '{"steps":[{"driver":"Age","explanation":"Age has strong predictive power with a Somers\' D of 0.45. However, it has a very high share of missing values (0.82), which could undermine the reliability of the data. The PSI indicates stability in distribution over time and the Herfindahl Index shows good diversity.","output":"- Somers\' D: 0.45 (strong predictive power) \\n- PSI: 0.05 (no significant change) \\n- Herfindahl Index: 0.12 (good diversity) \\n- Share of Missing Values: 0.82 (high share of missing data)"},{"driver":"Annual Income","explanation":"Annual Income is a strong predictor with a Somers\' D of 0.4 and very low missing values (0.01), indicating reliability. It has a stable distribution over time and a diverse range of values according to the Herfindahl Index.","output":"- Somers\' D: 0.4 (strong predictive power) \\n- PSI: 
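
Before parsing `content`, it can be worth checking that the response is complete. With Structured Outputs, a refusal is reported in the message's `refusal` field, and a `finish_reason` other than `"stop"` (for example `"length"`) means the JSON may be cut off. The sketch below shows one defensive way to handle both cases. The `parse_structured_content` helper and the mock dictionaries are assumptions for illustration, not part of the OpenAI SDK.

```python
import json

def parse_structured_content(response_dict):
    """Return the parsed JSON content, or raise if the response is unusable."""
    choice = response_dict["choices"][0]
    message = choice["message"]
    if message.get("refusal"):
        raise ValueError(f"Model refused: {message['refusal']}")
    if choice.get("finish_reason") != "stop":
        raise ValueError(f"Incomplete response (finish_reason={choice.get('finish_reason')!r})")
    return json.loads(message["content"])

# Hand-written stand-ins for completion.to_dict(); not real API responses
complete = {"choices": [{"finish_reason": "stop",
                         "message": {"content": '{"steps": [], "final_answer": "n/a"}'}}]}
truncated = {"choices": [{"finish_reason": "length",
                          "message": {"content": '{"steps": [{"driver": "Age"'}}]}

print(parse_structured_content(complete)["final_answer"])  # n/a
try:
    parse_structured_content(truncated)
except ValueError as err:
    print(err)
```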

In [9]:
# Function to convert each step to Markdown
def convert_to_markdown(data):
    steps_md = []
    for i, step in enumerate(data["steps"]):
        driver = step["driver"]
        explanation = step["explanation"]
        output = step["output"]
        steps_md.append(f"#### {i+1}. {driver}\n\n {explanation}\n\n```\n{output}\n```")
    
    final_answer = f"### Final Answer\n\n```\n{data['final_answer']}\n```"
    return "\n\n".join(steps_md) + "\n\n" + final_answer

# Parse the JSON string returned by the model and convert it to Markdown
content = response_dict['choices'][0]['message']['content']
analysis = json.loads(content)
markdown_output = convert_to_markdown(analysis)

display(Markdown(markdown_output))

#### 1. Age

 Age has strong predictive power with a Somers' D of 0.45. However, it has a very high share of missing values (0.82), which could undermine the reliability of the data. The PSI indicates stability in distribution over time and the Herfindahl Index shows good diversity.

```
- Somers' D: 0.45 (strong predictive power) 
- PSI: 0.05 (no significant change) 
- Herfindahl Index: 0.12 (good diversity) 
- Share of Missing Values: 0.82 (high share of missing data)
```

#### 2. Annual Income

 Annual Income is a strong predictor with a Somers' D of 0.4 and very low missing values (0.01), indicating reliability. It has a stable distribution over time and a diverse range of values according to the Herfindahl Index.

```
- Somers' D: 0.4 (strong predictive power) 
- PSI: 0.08 (no significant change) 
- Herfindahl Index: 0.1 (good diversity) 
- Share of Missing Values: 0.01 (negligible missing data)
```

#### 3. Debt-to-Income Ratio

 This driver shows the highest predictive power with a Somers' D of 0.55. It has a low share of missing values, stable distribution, and a good diversity range, indicating overall reliability.

```
- Somers' D: 0.55 (strong predictive power) 
- PSI: 0.07 (no significant change) 
- Herfindahl Index: 0.18 (good diversity) 
- Share of Missing Values: 0.05 (low missing data)
```

#### 4. Credit Score

 Credit Score has a low Somers' D (0.05), indicating weak predictive power, though it shows stability over time and a diverse category range. It has no missing data, but the low predictive power makes it less useful.

```
- Somers' D: 0.05 (weak predictive power) 
- PSI: 0.03 (no significant change) 
- Herfindahl Index: 0.15 (good diversity) 
- Share of Missing Values: 0.0 (no missing data)
```

#### 5. Employment Length

 Employment Length has moderate predictive power with a Somers' D of 0.3. It has stable distribution and only a small amount of missing data (0.03), while having diverse categories.

```
- Somers' D: 0.3 (moderate predictive power) 
- PSI: 0.1 (on the boundary of moderate change) 
- Herfindahl Index: 0.2 (good diversity) 
- Share of Missing Values: 0.03 (low missing data)
```

#### 6. Loan Amount

 Loan Amount shows moderate predictive power with a Somers' D of 0.35. It has a moderate PSI but a good range of diversity and low missing data. It shows some changes in distribution over time.

```
- Somers' D: 0.35 (moderate predictive power) 
- PSI: 0.12 (moderate change) 
- Herfindahl Index: 0.14 (good diversity) 
- Share of Missing Values: 0.04 (low missing data)
```

#### 7. Loan Purpose

 Loan Purpose has moderate predictive power and a moderate change in distribution over time, with a good diversity level. However, it has slightly more missing data (0.06) than some other drivers.

```
- Somers' D: 0.25 (moderate predictive power) 
- PSI: 0.15 (moderate change) 
- Herfindahl Index: 0.3 (good diversity) 
- Share of Missing Values: 0.06 (moderate missing data)
```

#### 8. Home Ownership Status

 Home Ownership Status has low predictive power with a Somers' D of 0.2. It shows a moderate shift in distribution and less diversity due to the higher Herfindahl Index, with a moderate level of missing data.

```
- Somers' D: 0.2 (low predictive power) 
- PSI: 0.18 (moderate change) 
- Herfindahl Index: 0.35 (less diversity) 
- Share of Missing Values: 0.07 (moderate missing data)
```

#### 9. Number of Delinquencies

 This driver is powerful with a Somers' D of 0.5 but has a substantial share of missing data (0.32). It maintains stability over time and shows a decent level of category diversity.

```
- Somers' D: 0.5 (strong predictive power) 
- PSI: 0.06 (no significant change) 
- Herfindahl Index: 0.22 (good diversity) 
- Share of Missing Values: 0.32 (high missing data)
```

#### 10. Credit Inquiries Last 6 Months

 This driver has weak predictive power with a Somers' D of 0.05. However, it demonstrates a moderate shift in distribution and diversity and a negligible amount of missing data.

```
- Somers' D: 0.05 (weak predictive power) 
- PSI: 0.09 (no significant change) 
- Herfindahl Index: 0.25 (good diversity) 
- Share of Missing Values: 0.01 (negligible missing data)
```

### Final Answer

```
1. Debt-to-Income Ratio
2. Annual Income
3. Number of Delinquencies
```
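
The final answer comes back as a free-text string, so any downstream code has to parse the numbered list. One possible variation, sketched below and not run against the API here, declares `final_answer` as an array of enum strings so the driver names are machine-readable and restricted to known values. The `build_response_format` helper and the schema name are hypothetical. The schema does not limit the list length, so the "3 best" constraint would still depend on the prompt.

```python
def build_response_format(driver_names):
    """Build a json_schema response format whose final_answer is a list of known drivers."""
    step_schema = {
        "type": "object",
        "properties": {
            "driver": {"type": "string", "enum": driver_names},
            "explanation": {"type": "string"},
            "output": {"type": "string"},
        },
        "required": ["driver", "explanation", "output"],
        "additionalProperties": False,
    }
    return {
        "type": "json_schema",
        "json_schema": {
            "name": "risk_driver_analysis",  # hypothetical schema name
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "steps": {"type": "array", "items": step_schema},
                    "final_answer": {
                        "type": "array",
                        "items": {"type": "string", "enum": driver_names},
                    },
                },
                "required": ["steps", "final_answer"],
                "additionalProperties": False,
            },
        },
    }

# Shortened driver list for the example; the notebook would pass all ten names
response_format = build_response_format(["Age", "Annual Income", "Debt-to-Income Ratio"])
print(response_format["json_schema"]["schema"]["properties"]["final_answer"])
```

With this shape, the result of `json.loads(content)["final_answer"]` could be passed straight to `df['Driver'].isin(...)`, mirroring the function-calling approach in sections C and D.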