# GPT-5.2 Reasoning Effort Parameter Testing

This notebook tests the `reasoning_effort` parameter with GPT-5.2 using both:
1. **Chat Completions API** - `reasoning_effort` parameter
2. **Responses API** - `reasoning.effort` nested parameter

According to OpenAI docs:
- GPT-5.2 supports: `none`, `low`, `medium`, `high`, `xhigh`
- Default is `none` (minimal reasoning, lower latency)
- `xhigh` is new in GPT-5.2
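Both APIs accept the same effort values but in different shapes. Below is a minimal sketch of a helper that builds the keyword arguments for either API. It assumes the five GPT-5.2 values listed above. The helper name `reasoning_kwargs` is hypothetical and is not part of the `openai` SDK.

```python
# Hypothetical helper: builds the reasoning keyword argument for each API.
# The allowed values mirror the GPT-5.2 list above; other models accept other sets.
GPT52_EFFORTS = ("none", "low", "medium", "high", "xhigh")


def reasoning_kwargs(api: str, effort: str) -> dict:
    """Return kwargs to splat into chat.completions.create or responses.create."""
    if effort not in GPT52_EFFORTS:
        raise ValueError(f"unsupported effort {effort!r}; expected one of {GPT52_EFFORTS}")
    if api == "chat":
        return {"reasoning_effort": effort}        # flat parameter
    if api == "responses":
        return {"reasoning": {"effort": effort}}   # nested parameter
    raise ValueError(f"unknown api {api!r}; expected 'chat' or 'responses'")


print(reasoning_kwargs("chat", "medium"))      # {'reasoning_effort': 'medium'}
print(reasoning_kwargs("responses", "xhigh"))  # {'reasoning': {'effort': 'xhigh'}}
```

You would then call `client.chat.completions.create(model=..., messages=..., **reasoning_kwargs("chat", "high"))`. Validating the value up front turns a typo into an immediate local error instead of a failed request.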

## Setup and Imports

In [4]:
import os
import time
import pandas as pd
from openai import AzureOpenAI
from dotenv import load_dotenv
from datetime import datetime

# Load environment variables
load_dotenv()

# Initialize Azure OpenAI client
client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2025-04-01-preview",  # Recent preview version; older versions may lack reasoning_effort or the Responses API
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)

print("✅ Environment loaded and client initialized")
print(f"📅 Test Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("📡 API Version: 2025-04-01-preview")

✅ Environment loaded and client initialized
📅 Test Date: 2026-01-13 11:42:21
📡 API Version: 2025-04-01-preview
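`os.getenv` returns `None` for a missing variable. A typo in `.env` therefore only surfaces later, as a confusing authentication or connection error. The sketch below adds a fail-fast check. The helper `require_env` is hypothetical; the variable names are the ones the cell above reads.

```python
import os


def require_env(*names: str) -> dict:
    """Return the requested environment variables, raising if any is missing or empty."""
    values = {name: os.getenv(name) for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return values


# Example: run before constructing AzureOpenAI(...)
# cfg = require_env("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT")
```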


In [5]:
# Configure the model to test
# Change this to match your actual deployment name
MODEL_NAME = "gpt-5.2"  # Options: "gpt-5.2", "gpt-5.1", or your deployment name

print(f"🎯 Testing with model: {MODEL_NAME}")
print("\n⚠️  If you get 404 errors, update MODEL_NAME to match your deployment")
print("   Common deployment names: 'gpt-5.1', 'gpt-5.2', 'gpt-51', 'gpt-52'")
print("   Or check Azure OpenAI Studio for your exact deployment name")

🎯 Testing with model: gpt-5.2

⚠️  If you get 404 errors, update MODEL_NAME to match your deployment
   Common deployment names: 'gpt-5.1', 'gpt-5.2', 'gpt-51', 'gpt-52'
   Or check Azure OpenAI Studio for your exact deployment name


### Check Available Models

List the models your Azure OpenAI resource can use, then make sure `MODEL_NAME` above matches your GPT-5.x deployment name:

In [6]:
# List the models this Azure OpenAI resource can use.
# Note: on Azure, models.list() returns the models available to the resource
# (base and fine-tuned models), NOT your deployment names. MODEL_NAME must be
# the deployment name shown in Azure OpenAI Studio.
try:
    models_response = client.models.list()
    
    print("📋 Available Model Deployments:")
    print("="*80)
    
    available_models = []
    for model in models_response:
        print(f"  • {model.id}")
        available_models.append(model.id)
    
    print(f"\n✅ Found {len(available_models)} model(s)")
    
    # Check for GPT-5.x models
    gpt5_models = [m for m in available_models if 'gpt-5' in m.lower() or 'gpt5' in m.lower()]
    if gpt5_models:
        print("\n🎯 GPT-5 series models found:")
        for model in gpt5_models:
            print(f"   • {model}")
    else:
        print("\n⚠️  No GPT-5 series models listed for this resource")
        print("   You may need to deploy GPT-5.1 or GPT-5.2 in Azure OpenAI Studio")
        
except Exception as e:
    print(f"❌ Could not list models: {str(e)}")
    print("\nℹ️  You may need to check your Azure OpenAI resource in the portal")
    available_models = []

📋 Available Model Deployments:
  • dall-e-3-3.0
  • dall-e-2-2.0
  • whisper-001
  • gpt-35-turbo-0301
  • gpt-35-turbo-0613
  • gpt-35-turbo-1106
  • gpt-35-turbo-0125
  • gpt-35-turbo-instruct-0914
  • gpt-35-turbo-16k-0613
  • gpt-4-0125-Preview
  • gpt-4-1106-Preview
  • gpt-4-0314
  • gpt-4-0613
  • gpt-4-32k-0314
  • gpt-4-32k-0613
  • gpt-4-vision-preview
  • gpt-4-turbo-2024-04-09
  • gpt-4-turbo-jp
  • gpt-4o-2024-05-13
  • gpt-4o-2024-08-06
  • gpt-4o-mini-2024-07-18
  • gpt-4o-2024-11-20
  • gpt-4o-audio-mai
  • gpt-4o-realtime-preview
  • gpt-4o-mini-realtime-preview-2024-12-17
  • gpt-4o-realtime-preview-2024-12-17
  • gpt-4o-realtime-preview-2025-06-03
  • gpt-4o-canvas-2024-09-25
  • gpt-4o-audio-preview-2024-10-01
  • gpt-4o-audio-preview-2024-12-17
  • gpt-4o-audio-preview-2025-06-03
  • gpt-4o-mini-audio-preview-2024-12-17
  • computer-use-preview-2025-02-11
  • computer-use-preview-2025-03-11
  …


## Test Problem

We'll use a reasoning task that benefits from deeper thinking:

In [7]:
TEST_PROMPT = """
A company has 3 data centers:
- US East: 40% capacity, 200ms avg latency to users
- EU West: 60% capacity, 150ms avg latency to users  
- Asia: 30% capacity, 300ms avg latency to users

Traffic pattern:
- 50% from North America
- 30% from Europe
- 20% from Asia

Requirements:
- Target: <100ms latency for 95% of requests
- Budget: $500K for improvements
- Must maintain 99.99% uptime

Provide:
1. Root cause analysis of latency issues
2. Recommended architecture changes
3. Cost breakdown
4. Implementation priority
"""

print("Test prompt configured ✅")

Test prompt configured ✅
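Before comparing model answers, it helps to know the baseline the prompt implies. Assume each region is served by its nearest data center: North America → US East, Europe → EU West, Asia → Asia. The prompt does not state this mapping. Under that assumption, the traffic-weighted average latency is 0.5·200 + 0.3·150 + 0.2·300 = 205 ms. Every region's average already exceeds the 100 ms target:

```python
# Assumed routing: each traffic region is served by its "local" data center.
traffic_share_pct = {"North America": 50, "Europe": 30, "Asia": 20}
latency_ms = {"North America": 200, "Europe": 150, "Asia": 300}  # US East, EU West, Asia

# Integer percentages keep the arithmetic exact
weighted_ms = sum(traffic_share_pct[r] * latency_ms[r] for r in traffic_share_pct) / 100
print(f"Traffic-weighted average latency: {weighted_ms:.0f} ms (target: under 100 ms for 95% of requests)")
```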


---

## Part 1: Chat Completions API Testing

Testing with `reasoning_effort` parameter (flat structure)

In [8]:
def test_chat_completions_api(reasoning_effort: str):
    """
    Test Chat Completions API with reasoning_effort parameter.
    """
    print(f"\n{'='*80}")
    print(f"Testing Chat Completions API - reasoning_effort='{reasoning_effort}'")
    print(f"{'='*80}")
    
    messages = [
        {"role": "system", "content": "You are an expert cloud infrastructure architect."},
        {"role": "user", "content": TEST_PROMPT}
    ]
    
    try:
        start_time = time.time()
        
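        # Chat Completions API call with the flat reasoning_effort parameter.
        # Reasoning models reject max_tokens; max_completion_tokens caps the
        # hidden reasoning tokens AND the visible output tokens combined.
        response = client.chat.completions.create(
            model=MODEL_NAME,  # Azure deployment name
            messages=messages,
            reasoning_effort=reasoning_effort,  # Flat parameter
            max_completion_tokens=4000
        )
        
        latency = time.time() - start_time
        choice = response.choices[0]
        content = choice.message.content or ""  # may be None/empty if the budget runs out
        
        # Reasoning tokens are reported separately when the service returns them
        details = getattr(response.usage, "completion_tokens_details", None)
        reasoning_tokens = getattr(details, "reasoning_tokens", None)
        
        result = {
            "API": "Chat Completions",
            "Model": MODEL_NAME,
            "Reasoning Effort": reasoning_effort,
            "Latency (s)": round(latency, 2),
            "Prompt Tokens": response.usage.prompt_tokens,
            "Completion Tokens": response.usage.completion_tokens,
            "Reasoning Tokens": reasoning_tokens,
            "Total Tokens": response.usage.total_tokens,
            "Response Length": len(content),
            "Finish Reason": choice.finish_reason,
            "Response Preview": content[:300] + "..." if len(content) > 300 else content,
            "Full Response": content
        }
        
        print("✅ Success!")
        print(f"   Latency: {latency:.2f}s")
        print(f"   Tokens: {response.usage.total_tokens} (prompt: {response.usage.prompt_tokens}, completion: {response.usage.completion_tokens})")
        print(f"   Response length: {len(content)} chars")
        if choice.finish_reason == "length":
            # Token cap reached; at high effort this can leave no visible output
            print("   ⚠️  Hit max_completion_tokens (finish_reason='length'); output may be empty or cut off")
        
        return result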
        
    except Exception as e:
        error_msg = str(e)
        print(f"❌ Error: {error_msg}")
        
        # Provide helpful error messages
        if "404" in error_msg or "not found" in error_msg.lower():
            print(f"\n💡 Tip: Model '{MODEL_NAME}' not found in your deployment.")
            print(f"   1. Check your deployment name in Azure OpenAI Studio")
            print(f"   2. Update the MODEL_NAME variable above")
            print(f"   3. For GPT-5.1, the deployment might be named 'gpt-5.1' or 'gpt-51'")
        
        return {
            "API": "Chat Completions",
            "Model": MODEL_NAME,
            "Reasoning Effort": reasoning_effort,
            "Error": error_msg
        }

### Test Chat Completions API with Different Reasoning Levels

In [9]:
# Test different reasoning effort levels
reasoning_levels = ["none", "low", "medium", "high", "xhigh"]

chat_completions_results = []

for level in reasoning_levels:
    result = test_chat_completions_api(level)
    chat_completions_results.append(result)
    time.sleep(1)  # Brief pause between requests

print("\n" + "="*80)
print("Chat Completions API testing complete!")
print("="*80)


Testing Chat Completions API - reasoning_effort='none'
✅ Success!
   Latency: 94.44s
   Tokens: 1954 (prompt: 164, completion: 1790)
   Response length: 7496 chars

Testing Chat Completions API - reasoning_effort='low'
✅ Success!
   Latency: 112.92s
   Tokens: 2281 (prompt: 164, completion: 2117)
   Response length: 7164 chars

Testing Chat Completions API - reasoning_effort='medium'
✅ Success!
   Latency: 122.99s
   Tokens: 2551 (prompt: 164, completion: 2387)
   Response length: 6204 chars

Testing Chat Completions API - reasoning_effort='high'
✅ Success!
   Latency: 232.87s
   Tokens: 4164 (prompt: 164, completion: 4000)
   Response length: 0 chars

Testing Chat Completions API - reasoning_effort='xhigh'
✅ Success!
   Latency: 211.56s
   Tokens: 4164 (prompt: 164, completion: 4000)
   Response length: 0 chars

Chat Completions API testing complete!
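In the run above, `high` and `xhigh` each reported exactly 4,000 completion tokens, which is the `max_completion_tokens` cap. Both returned 0 characters of output. That pattern is consistent with hidden reasoning tokens consuming the whole budget before any visible text was produced. Below is a sketch of a classifier for such results, using fields that `test_chat_completions_api` records. The helper itself is hypothetical.

```python
def diagnose_completion(finish_reason, response_length: int,
                        completion_tokens: int, max_completion_tokens: int) -> str:
    """Classify a result as 'ok', 'truncated', or 'empty (budget exhausted)'."""
    hit_cap = finish_reason == "length" or completion_tokens >= max_completion_tokens
    if hit_cap and response_length == 0:
        return "empty (budget exhausted)"  # every token went to reasoning
    if hit_cap:
        return "truncated"                 # visible answer was cut off
    return "ok"


# Values recorded above; finish_reason was not printed in that run, so None here
print(diagnose_completion(None, 6204, 2387, 4000))  # medium -> ok
print(diagnose_completion(None, 0, 4000, 4000))     # high   -> empty (budget exhausted)
```

When this fires, the usual remedies are to raise `max_completion_tokens` or lower the effort. OpenAI's reasoning guide suggests reserving a budget on the order of 25,000 tokens when first experimenting with reasoning models.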


### View Chat Completions Results

In [None]:
# Create DataFrame for Chat Completions results
df_chat = pd.DataFrame(chat_completions_results)

# Display summary columns
display_cols = ["API", "Model", "Reasoning Effort", "Latency (s)", "Total Tokens", "Reasoning Tokens", "Response Length", "Finish Reason"]
available_cols = [col for col in display_cols if col in df_chat.columns]

print("\n📊 CHAT COMPLETIONS API RESULTS SUMMARY")
print("="*80)

# Check if there were errors
if 'Error' in df_chat.columns and df_chat['Error'].notna().any():
    print("⚠️  Some tests encountered errors:")
    display(df_chat)
    print("\n💡 See error messages above for troubleshooting tips")
else:
    display(df_chat[available_cols])
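To read the table as tradeoffs rather than raw numbers, normalize each level against the `none` baseline. The sketch below uses the latencies and completion tokens printed in the recorded run above, copied by hand. In the notebook itself you would start from `df_chat` instead.

```python
import pandas as pd

# Values copied from the recorded Chat Completions run above
recorded = pd.DataFrame({
    "Reasoning Effort": ["none", "low", "medium", "high", "xhigh"],
    "Latency (s)": [94.44, 112.92, 122.99, 232.87, 211.56],
    "Completion Tokens": [1790, 2117, 2387, 4000, 4000],
})

by_level = recorded.set_index("Reasoning Effort")
baseline = by_level.loc["none"]          # Series indexed by column name
relative = by_level / baseline           # divides each column by its 'none' value
print(relative.round(2))
```

The `high` and `xhigh` token ratios are capped by the 4,000-token budget. They therefore understate how many tokens those levels would use with more headroom.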

---

## Part 2: Responses API Testing

Testing with `reasoning: { effort: "..." }` nested parameter
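Unlike Chat Completions, a Responses API result has no `choices`. Its `output` is a list of items; for reasoning models this is typically a `reasoning` item followed by a `message` item. The text lives in the message's `output_text` content parts. The Python SDK exposes this as `response.output_text`. The sketch below performs a similar walk over a raw JSON-shaped dict. The sample payload is hand-written for illustration and is not a recorded response.

```python
def extract_output_text(payload: dict) -> str:
    """Concatenate the text of all output_text parts inside message items."""
    parts = []
    for item in payload.get("output", []):
        if item.get("type") != "message":
            continue  # skip reasoning items, tool calls, etc.
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)


# Hand-written sample in the Responses API shape (illustrative only)
sample = {
    "status": "completed",
    "output": [
        {"type": "reasoning", "summary": []},
        {"type": "message", "role": "assistant",
         "content": [{"type": "output_text", "text": "Part one. "},
                     {"type": "output_text", "text": "Part two."}]},
    ],
}
print(extract_output_text(sample))
```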

In [None]:
def test_responses_api(reasoning_effort: str):
    """
    Test Responses API with nested reasoning.effort parameter.
    Note: on Azure OpenAI this needs a recent openai SDK and an api_version
    that supports the Responses API.
    """
    print(f"\n{'='*80}")
    print(f"Testing Responses API - reasoning.effort='{reasoning_effort}'")
    print(f"{'='*80}")
    
    try:
        start_time = time.time()
        
        # Responses API call with the nested reasoning parameter.
        # `instructions` mirrors the system message used in the Chat Completions test.
        response = client.responses.create(
            model=MODEL_NAME,  # Azure deployment name
            instructions="You are an expert cloud infrastructure architect.",
            input=TEST_PROMPT,
            reasoning={
                "effort": reasoning_effort  # Nested parameter
            },
            max_output_tokens=4000  # Caps reasoning + visible output tokens
        )
        
        latency = time.time() - start_time
        
        # The Response object has no `items` attribute; `output_text` joins the
        # text of the output_text parts in the message items of `response.output`.
        content = response.output_text or ""
        
        usage = response.usage
        output_details = getattr(usage, "output_tokens_details", None)
        incomplete = response.incomplete_details  # set when status == "incomplete"
        
        result = {
            "API": "Responses",
            "Model": MODEL_NAME,
            "Reasoning Effort": reasoning_effort,
            "Latency (s)": round(latency, 2),
            "Input Tokens": getattr(usage, "input_tokens", None),
            "Output Tokens": getattr(usage, "output_tokens", None),
            "Reasoning Tokens": getattr(output_details, "reasoning_tokens", None),
            "Total Tokens": getattr(usage, "total_tokens", None),
            "Response Length": len(content),
            "Status": response.status,
            "Incomplete Reason": incomplete.reason if incomplete else None,
            "Response Preview": content[:300] + "..." if len(content) > 300 else content,
            "Full Response": content
        }
        
        print("✅ Success!")
        print(f"   Latency: {latency:.2f}s")
        print(f"   Response length: {len(content)} chars")
        if response.status == "incomplete":
            print(f"   ⚠️  Response incomplete: {result['Incomplete Reason']}")
        
        return result
        
    except AttributeError as e:
        # Most likely an older openai package without client.responses
        error_msg = f"Responses API not available: {str(e)}"
        print(f"❌ {error_msg}")
        print("\n💡 Tip: Upgrade the openai package; older versions have no client.responses.")
        print("   Also check that your api_version supports the Responses API on Azure.")
        return {
            "API": "Responses",
            "Model": MODEL_NAME,
            "Reasoning Effort": reasoning_effort,
            "Error": error_msg
        }
    except Exception as e:
        error_msg = str(e)
        print(f"❌ Error: {error_msg}")
        
        if "404" in error_msg or "not found" in error_msg.lower():
            print(f"\n💡 Tip: Model '{MODEL_NAME}' not found or Responses API not supported.")
        
        return {
            "API": "Responses",
            "Model": MODEL_NAME,
            "Reasoning Effort": reasoning_effort,
            "Error": error_msg
        }

### Test Responses API with Different Reasoning Levels

In [None]:
# Test different reasoning effort levels
responses_api_results = []

for level in reasoning_levels:
    result = test_responses_api(level)
    responses_api_results.append(result)
    time.sleep(1)  # Brief pause between requests

print("\n" + "="*80)
print("Responses API testing complete!")
print("="*80)
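The loops above pause one second between calls but never retry. If the deployment returns rate-limit errors (HTTP 429) during a sweep, a common pattern is a retry wrapper with exponential backoff. Below is a generic sketch. `with_backoff` is hypothetical, and the usage comment assumes the SDK's `openai.RateLimitError`. The test functions above catch every exception themselves, so wrap the raw client call rather than `test_responses_api`. The `openai` client also retries some errors on its own, controlled by its `max_retries` option.

```python
import time


def with_backoff(fn, *, retries: int = 4, base_delay: float = 1.0,
                 retry_on: tuple = (Exception,), sleep=time.sleep):
    """Call fn(); on a retryable exception wait base_delay * 2**attempt and retry."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)


# Usage sketch (assumes openai.RateLimitError from the openai 1.x SDK):
# from openai import RateLimitError
# response = with_backoff(
#     lambda: client.responses.create(model=MODEL_NAME, input=TEST_PROMPT,
#                                     reasoning={"effort": "high"}),
#     retry_on=(RateLimitError,),
# )
```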

### View Responses API Results

In [None]:
# Create DataFrame for Responses API results
df_responses = pd.DataFrame(responses_api_results)

# Display summary columns
display_cols = ["API", "Model", "Reasoning Effort", "Latency (s)", "Total Tokens", "Reasoning Tokens", "Response Length", "Status"]
available_cols = [col for col in display_cols if col in df_responses.columns]

print("\n📊 RESPONSES API RESULTS SUMMARY")
print("="*80)

# Check if there were errors
if 'Error' in df_responses.columns and df_responses['Error'].notna().any():
    print("⚠️  Some tests encountered errors:")
    display(df_responses)
    print("\n💡 See error messages above for troubleshooting tips")
else:
    display(df_responses[available_cols])

---

## Part 3: Combined Analysis

### Side-by-Side Comparison

In [None]:
# Combine both results
df_combined = pd.concat([df_chat, df_responses], ignore_index=True)

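# Use a column list shared by both result sets (available_cols from the
# previous cell only reflects the Responses results)
combined_cols = ["API", "Model", "Reasoning Effort", "Latency (s)", "Total Tokens", "Response Length"]

print("\n📊 COMBINED API COMPARISON")
print("="*80)
display(df_combined[[col for col in combined_cols if col in df_combined.columns]])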

### Latency Comparison

In [None]:
import matplotlib.pyplot as plt

# Plot only successful rows; errored rows carry no latency
if 'Error' in df_combined.columns:
    df_ok = df_combined[df_combined['Error'].isna()]
else:
    df_ok = df_combined

if 'Latency (s)' in df_ok.columns and not df_ok.empty:
    fig, ax = plt.subplots(figsize=(12, 6))
    
    for api in df_ok['API'].unique():
        data = df_ok[df_ok['API'] == api]
        ax.plot(data['Reasoning Effort'], data['Latency (s)'], marker='o', label=api, linewidth=2)
    
    ax.set_xlabel('Reasoning Effort Level', fontsize=12)
    ax.set_ylabel('Latency (seconds)', fontsize=12)
    ax.set_title('Latency Comparison: Chat Completions vs Responses API', fontsize=14, fontweight='bold')
    ax.legend(fontsize=11)
    ax.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()
else:
    print("⚠️ Unable to generate latency plot - no successful results to plot")
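Matplotlib places string x-values in the order it first encounters them. If a level is missing for one API (for example, after an error), the axis order can drift. One fix is to convert `Reasoning Effort` to an ordered categorical, sort by it, and plot the category codes with the level names as tick labels. Below is a pandas sketch. The helper name is hypothetical, and the level list matches `reasoning_levels`.

```python
import pandas as pd

EFFORT_ORDER = ["none", "low", "medium", "high", "xhigh"]


def order_by_effort(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy sorted by API and reasoning effort, using an ordered categorical."""
    out = df.copy()
    out["Reasoning Effort"] = pd.Categorical(
        out["Reasoning Effort"], categories=EFFORT_ORDER, ordered=True
    )
    return out.sort_values(["API", "Reasoning Effort"]).reset_index(drop=True)


# Usage when plotting: positions come from the category codes, labels from EFFORT_ORDER
# data = order_by_effort(df_ok)
# ax.plot(data["Reasoning Effort"].cat.codes, data["Latency (s)"], marker="o")
# ax.set_xticks(range(len(EFFORT_ORDER)), EFFORT_ORDER)
```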

### Token Usage Comparison

In [None]:
# Plot only successful rows; errored rows carry no token counts
if 'Error' in df_combined.columns:
    df_ok = df_combined[df_combined['Error'].isna()]
else:
    df_ok = df_combined

if 'Total Tokens' in df_ok.columns and not df_ok.empty:
    fig, ax = plt.subplots(figsize=(12, 6))
    
    for api in df_ok['API'].unique():
        data = df_ok[df_ok['API'] == api]
        # Coerce non-numeric placeholders (e.g. "N/A") to NaN instead of skipping the API
        tokens = pd.to_numeric(data['Total Tokens'], errors='coerce')
        ax.plot(data['Reasoning Effort'], tokens, marker='s', label=api, linewidth=2)
    
    ax.set_xlabel('Reasoning Effort Level', fontsize=12)
    ax.set_ylabel('Total Tokens', fontsize=12)
    ax.set_title('Token Usage Comparison: Chat Completions vs Responses API', fontsize=14, fontweight='bold')
    ax.legend(fontsize=11)
    ax.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()
else:
    print("⚠️ Unable to generate token usage plot - no successful results to plot")

### View Full Response for Specific Test

In [None]:
def show_full_response(df, api: str, reasoning_effort: str):
    """
    Display the full response for a specific API and reasoning effort level.
    """
    row = df[(df['API'] == api) & (df['Reasoning Effort'] == reasoning_effort)]
    
    if row.empty:
        print(f"❌ No results found for {api} with reasoning_effort='{reasoning_effort}'")
        return
    
    if 'Error' in row.columns and not pd.isna(row['Error'].values[0]):
        print(f"❌ Error occurred: {row['Error'].values[0]}")
        return
    
    print(f"\n{'='*80}")
    print(f"{api} API - reasoning_effort='{reasoning_effort}'")
    print(f"{'='*80}\n")
    print(row['Full Response'].values[0])
    print(f"\n{'='*80}")

# Example: View Chat Completions with "medium" reasoning
# show_full_response(df_combined, "Chat Completions", "medium")

# Example: View Responses API with "high" reasoning
# show_full_response(df_combined, "Responses", "high")

print("Use show_full_response(df_combined, 'API_NAME', 'reasoning_level') to view full responses")

---

## Part 4: Key Findings & Recommendations

In [None]:
print("""
╔══════════════════════════════════════════════════════════════════════════════╗
║                    KEY FINDINGS & RECOMMENDATIONS                            ║
╚══════════════════════════════════════════════════════════════════════════════╝

📌 API Syntax Differences:
   • Chat Completions API: Use flat `reasoning_effort="medium"` parameter
   • Responses API: Use nested `reasoning={"effort": "medium"}` parameter

📌 Reasoning Effort Levels (GPT-5.2):
   • none   - Default, minimal reasoning, lowest latency
   • low    - Light reasoning, faster responses
   • medium - Balanced reasoning for most multi-step tasks
   • high   - Deep reasoning for complex problems
   • xhigh  - NEW in GPT-5.2! Maximum reasoning capability

📌 When to Use Each Level:
   • none/low    → Simple tasks, speed-critical applications
   • medium      → Most enterprise tasks, good balance
   • high/xhigh  → Complex reasoning, critical decisions, architecture design

📌 Expected Tradeoffs:
   • Higher reasoning effort = More latency + More tokens + Usually better quality on hard tasks
   • Lower reasoning effort = Less latency + Fewer tokens + May need prompt tuning

📌 Observed in This Run (Chat Completions, max_completion_tokens=4000):
   • Latency and completion tokens rose from none (94 s, 1,790) to high (233 s, 4,000)
   • high and xhigh used the entire 4,000-token budget and returned no visible text
   • Reasoning tokens count toward the output cap; give high/xhigh a larger budget

📌 Best Practices:
   1. Start with 'none' and increase only if quality is insufficient
   2. Use Responses API for multi-turn conversations (reasoning items can carry over between turns)
   3. Monitor token usage - higher reasoning uses more tokens
   4. For production: Test different levels with your specific use case

╚══════════════════════════════════════════════════════════════════════════════╝
""")

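Best practice 1 says to start at `none` and raise effort only when quality falls short. That rule can be written as a small escalation loop. This sketch rests on stated assumptions: `call_model(effort)` and `good_enough(text)` are hypothetical callables you supply. For example, `call_model` could wrap `client.chat.completions.create`, and `good_enough` could be a task-specific check. Nothing here is an SDK API.

```python
from typing import Callable, Optional, Sequence, Tuple

DEFAULT_LEVELS = ("none", "low", "medium", "high", "xhigh")


def escalate_effort(call_model: Callable[[str], str],
                    good_enough: Callable[[str], bool],
                    levels: Sequence[str] = DEFAULT_LEVELS) -> Tuple[Optional[str], str]:
    """Try each effort level in order; return (level, answer) for the first acceptable answer.

    If no level passes, return (None, last_answer) so the caller can decide what to do.
    """
    answer = ""
    for level in levels:
        answer = call_model(level)
        if good_enough(answer):
            return level, answer
    return None, answer
```

An empty answer, like the budget-exhausted `high`/`xhigh` results above, should fail `good_enough`. That way the loop never treats a truncated response as success.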
---

## Export Results

In [None]:
# Save results to CSV
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
output_file = f"reasoning_effort_results_{timestamp}.csv"

df_combined.to_csv(output_file, index=False)
print(f"\n✅ Results exported to: {output_file}")
print(f"📊 Total tests conducted: {len(df_combined)}")