# Analyzing Data and Interpreting Images with OpenAI's o1 Reasoning Model vs. GPT

## Introduction
OpenAI's o1 reasoning model is designed for complex problem-solving, data analysis, and image interpretation by simulating a multi-step thought process before generating responses. Unlike traditional GPT models, which produce output in a single pass, reasoning models use internal **reasoning tokens** to explore multiple approaches before finalizing an answer.
<p align="center">
    <img src="https://cdn.openai.com/API/images/guides/reasoning_tokens.png" alt="Reasoning Tokens" width="600">
</p>  

*Source: [OpenAI Reasoning Models Guide](https://platform.openai.com/docs/guides/reasoning)*

**Key Differences: o1 Reasoning Model vs. GPT**
- Multi-step reasoning: o1 evaluates different solutions before selecting the best response.
- Deeper analytical capabilities: Optimized for complex data interpretation tasks.
- Context-aware image analysis: Provides more structured and insightful image descriptions.
- Reasoning Effort Control: Users can adjust the depth of reasoning (`low`, `medium`, `high`).


For more details, refer to the [OpenAI Reasoning Models Guide](https://platform.openai.com/docs/guides/reasoning).


## Purchase and Store API Key

You need to **purchase** an [OpenAI](https://openai.com/) API key and store it securely, for example in **AWS Secrets Manager**, using the following names:

- **Key Name:** `api_key`  
- **Key Value:** `<your OpenAI API key>`  
- **Secret Name:** `openai`  
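
If you prefer to create the secret from code rather than the AWS console, a minimal sketch might look like the following. It assumes a boto3 Secrets Manager client is passed in and that your AWS credentials allow `create_secret`. The helper names `build_secret_string` and `store_openai_key` are hypothetical and not part of this notebook.

```python
import json


def build_secret_string(api_key: str) -> str:
    """Serialize the key under the `api_key` name used in this notebook."""
    return json.dumps({"api_key": api_key})


def store_openai_key(secrets_client, api_key: str, secret_name: str = "openai"):
    """Create the secret. `secrets_client` is assumed to be a boto3
    Secrets Manager client (or any object exposing `create_secret`)."""
    return secrets_client.create_secret(
        Name=secret_name,
        SecretString=build_secret_string(api_key),
    )


# Usage (requires AWS credentials with Secrets Manager permissions):
#   import boto3
#   sm = boto3.client("secretsmanager", region_name="us-east-1")
#   store_openai_key(sm, "<your OpenAI API key>")
```

Storing the value as a JSON object keeps it compatible with a reader that calls `json.loads` on the secret string and looks up `api_key`.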

## Install Python Libraries

- **openai**: Used to call `o1` and `GPT` models for data analysis and image interpretation.

In [1]:
%pip install openai -q

Note: you may need to restart the kernel to use updated packages.


## Import Required Libraries

The following libraries are used in this notebook:

- **boto3**: AWS SDK for Python, used to interact with AWS services.
- **botocore**: Installed with boto3; provides the `ClientError` exception raised by failed AWS calls.
- **json**: Standard Python library for handling JSON data.
- **IPython.display**: Provides tools to display images, Markdown content, and other rich media in Jupyter Notebook.
- **openai**: Used to call `o1` and `GPT` models for data analysis and image interpretation.
- **pandas**: A powerful library for data manipulation and analysis.
- **pprint**: Pretty prints data structures for better readability.

In [2]:
import boto3
from botocore.exceptions import ClientError  # caught in get_secret()
import json
from IPython.display import display, Image, Markdown
from openai import OpenAI
import pandas as pd
from pprint import pprint

## Retrieve API Keys Securely from AWS Secrets Manager

The following function, `get_secret()`, retrieves a secret from **AWS Secrets Manager** and parses it as JSON. This is a secure way to store and access sensitive credentials, such as API keys, without hardcoding them into the script.

In [3]:
def get_secret(secret_name):
    region_name = "us-east-1"

    # Create a Secrets Manager client
    session = boto3.session.Session()
    client = session.client(
        service_name='secretsmanager',
        region_name=region_name
    )

    try:
        get_secret_value_response = client.get_secret_value(
            SecretId=secret_name
        )
    except ClientError as e:
        raise e

    secret = get_secret_value_response['SecretString']
    
    return json.loads(secret)

## Initialize OpenAI Client

The following code initializes the OpenAI client using a securely stored API key retrieved from AWS Secrets Manager.

In [4]:
client = OpenAI(api_key=get_secret('openai')['api_key'])

## Load the AI Use Case Inventory Dataset

This notebook uses an **AI use case inventory (`ai_inventory.csv`)**, in which each row describes one agency AI use case. The rows previewed below are USDA entries, with attributes such as mission area, agency, topic area, intended purpose, current stage of development, and whether the use case is rights-impacting or safety-impacting.

The dataset is loaded into a DataFrame and also serialized to JSON records so that it can be embedded directly in a prompt.

In [6]:
df = pd.read_csv('ai_inventory.csv')
data_json = df.to_json(orient="records")
df.head()

Unnamed: 0,Use Case ID/Name,Mission Area,Agency,Bureau/Component/Office,Topic Area,Intended purpose and expected benefits of use case,Description of AI system's outputs,Current stage of development,Rights-impacting or Safety-impacting,Date Initiated,Development/Acquisition Date,Date Implemented,Date Retired,Open-source link to project code (if available),Supporting a High Impact Service Provider (HISP),"If yes to supporting a HISP, which one?",Public-facing service in HISP,Developed under contract(s) or in-house,Agency-owned Data Description,Demographic variables used in model features
0,USDA-001: Repair Spend,"REE: Research, Education, and Economics",ARS: Agricultural Research Service,Administrative and Financial Management,Mission-Enabling (Internal Agency Support),The intended purpose of this model is to revie...,The output of the model is a recommendation of...,Stage 4 - Operation and Maintenance (Use case ...,Neither,10/1/2019,10/1/2019,6/9/2020,,,No,,,Developed with both contracting and in-house r...,"Approximately 14,000 financial transactions we...",None;
1,USDA-002: ARS Project Mapping,"REE: Research, Education, and Economics",ARS: Agricultural Research Service,Office of National Programs,Science & Space,The intended purpose of this model is to proce...,The model outputs groups of similar projects a...,Stage 4 - Operation and Maintenance (Use case ...,Neither,1/1/2020,1/1/2021,5/1/2022,,,No,,,Developed with contracting resources,The data is a collection of project plans writ...,None;
2,USDA-003: NAL Automated Indexing,"REE: Research, Education, and Economics",ARS: Agricultural Research Service,National Agricultural Library,Science & Space,This system automatically assigns word tags to...,The model outputs terms to use as search tags ...,Stage 4 - Operation and Maintenance (Use case ...,Neither,6/1/2011,1/1/2012,6/1/2012,,,No,,,Developed with both contracting and in-house r...,The data is a collection of text scripts from ...,None;
3,USDA-004: Predictive Modeling of Invasive Pest...,MRP: Marketing and Regulatory Programs,APHIS: Animal and Plant Health Inspection Service,Plant Protection and Quarantine,Mission-Enabling (Internal Agency Support),The purpose of the model is to check how likel...,The model outputs are a prediction of whether ...,Stage 4 - Operation and Maintenance (Use case ...,Neither,7/1/2015,7/1/2015,5/1/2018,,,No,,,Developed in-house,Inspection data was collected from Plant Prote...,None;
4,USDA-005: Detection of Pre-symptomatic HLB Inf...,MRP: Marketing and Regulatory Programs,APHIS: Animal and Plant Health Inspection Service,Plant Protection and Quarantine,Science & Space,The purpose of the model is to detect citrus t...,The model outputs GPS Coordinates of potential...,Stage 5 - Retired (Use case has been retired o...,Neither,,,,9/30/2022,,,,,,No agency-owned data used in this project.,


## Generate Data Analysis Prompt for OpenAI Model

To identify the **risk management strategies** that agencies use, we build a prompt that pairs a one-sentence instruction with the full dataset serialized as JSON. The same prompt is sent to both models so that their answers can be compared.


In [15]:
data_prompt = f"Identify risk management strategies used by agencies. Data: {data_json}"
# print(data_prompt) 
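
Because the whole dataset is embedded in the prompt, prompt size grows with every row and column (the runs in this notebook report roughly 30,000 prompt tokens). If that becomes a concern, one option is to send only the columns relevant to the question. The sketch below uses only the standard library and two toy records standing in for `json.loads(data_json)`. The helper `select_columns` and the chosen column list are illustrative assumptions, not part of the original notebook.

```python
import json


def select_columns(records, columns):
    """Keep only the named keys from each record (missing keys are skipped)."""
    return [{c: r[c] for c in columns if c in r} for r in records]


# Toy records standing in for json.loads(data_json)
records = [
    {
        "Use Case ID/Name": "USDA-001: Repair Spend",
        "Rights-impacting or Safety-impacting": "Neither",
        "Date Initiated": "10/1/2019",
    },
    {
        "Use Case ID/Name": "USDA-002: ARS Project Mapping",
        "Rights-impacting or Safety-impacting": "Neither",
        "Date Initiated": "1/1/2020",
    },
]

instruction = "Identify risk management strategies used by agencies."
columns = ["Use Case ID/Name", "Rights-impacting or Safety-impacting"]

slim = select_columns(records, columns)
full_prompt = f"{instruction} Data: {json.dumps(records)}"
slim_prompt = f"{instruction} Data: {json.dumps(slim)}"

# Character counts are only a rough proxy for token counts
print(len(full_prompt), len(slim_prompt))
```

Dropping columns trades context for cost: the model can only reason about what it is given, so any column that might carry evidence for the question should stay in.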

## Define a Function to Get Assistance from OpenAI GPT-4o

The following function, `openai_gpt_help()`, sends a prompt to OpenAI's **GPT-4o model** and returns a response. It also prints the number of tokens used in the request.

In [16]:
def openai_gpt_help(prompt):
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model='gpt-4o',
        messages=messages,
        temperature = 0
    )
    token_usage = response.usage
    
    pprint(f"Tokens used: {token_usage}")

    return response.choices[0].message.content

In [17]:
gpt_result = openai_gpt_help(prompt=data_prompt)

('Tokens used: CompletionUsage(completion_tokens=497, prompt_tokens=30230, '
 'total_tokens=30727, '
 'completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, '
 'audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), '
 'prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0))')


In [18]:
display(Markdown(gpt_result))

Risk management strategies used by agencies in the context of AI and data projects can be identified through various aspects of the project lifecycle and operational considerations. Here are some strategies inferred from the provided data:

1. **Stage of Development Monitoring**: Projects are categorized into different stages of development, such as initiation, development and acquisition, implementation, operation and maintenance, and retirement. This helps in managing risks by ensuring that projects are adequately monitored and evaluated at each stage.

2. **Rights and Safety Impact Assessment**: Projects are assessed for rights-impacting or safety-impacting implications. This helps in identifying and mitigating risks related to privacy, security, and safety.

3. **Data Management and Validation**: Agencies use structured data management practices, including data validation and quality control, to ensure the accuracy and reliability of the data used in AI models. This reduces the risk of errors and biases in model outputs.

4. **Development Resources**: Projects are developed using a combination of contracting and in-house resources. This strategy allows agencies to leverage external expertise while maintaining control over critical aspects of the project, thus managing risks related to resource availability and expertise.

5. **Open-source and Proprietary Tools**: Some projects provide open-source links to project code, which can enhance transparency and allow for community feedback and improvements, reducing the risk of undetected errors or vulnerabilities.

6. **Demographic Variables Consideration**: The use of demographic variables in model features is carefully considered, with many projects explicitly stating that no demographic variables are used. This helps in managing risks related to bias and discrimination.

7. **Data Privacy and Security**: Projects often mention whether agency-owned data is used and whether it is publicly available. This helps in managing risks related to data privacy and security.

8. **High Impact Service Provider (HISP) Support**: Some projects support high-impact service providers, which can help in aligning project goals with broader agency missions and managing risks related to project relevance and impact.

9. **Retirement and Transition Planning**: Projects in the retirement stage indicate a planned transition or decommissioning process, which helps in managing risks related to legacy systems and ensuring continuity of operations.

10. **Public and Stakeholder Engagement**: Some projects involve public-facing services or stakeholder engagement, which helps in managing risks related to public perception and acceptance of AI systems.

These strategies collectively help agencies manage various risks associated with AI and data projects, including technical, operational, ethical, and legal risks.

## Define a Function to Get Assistance from OpenAI o1 Model  

The following function, `openai_o_help()`, sends a prompt to OpenAI's **o1 reasoning model** and returns a response.  

### Key Differences Between o1 and GPT Models:
- **Reasoning Effort**: The o1 model allows users to control reasoning depth using `reasoning_effort` (`low`, `medium`, `high`).  
- **No Temperature Parameter**: Unlike GPT models, **o1 does not support `temperature`**.  
- **Developer Messages Replace System Messages**:  
  - Starting with `o1-2024-12-17`, **developer messages** replace **system messages** to align with chain-of-command behavior.  

### Best Practices for Prompting o1  
- **Keep prompts simple and direct.**  
- **Avoid chain-of-thought prompts.** o1 reasons internally, so step-by-step instructions aren't needed.  
- **Use delimiters for clarity.** Use Markdown, XML tags, or section titles.  
- **Try zero-shot first.** If needed, add few-shot examples that closely match your goal.  
- **Be explicit.** Clearly define success criteria and constraints.  
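- **Markdown is disabled by default.** Starting with `o1-2024-12-17`, responses avoid Markdown formatting unless you include the string `Formatting re-enabled` on the first line of the developer message.  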

Source: [OpenAI Reasoning Models Best Practices Guide](https://platform.openai.com/docs/guides/reasoning-best-practices).  
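
The parameter differences above can be sketched as a small helper that assembles the keyword arguments for `client.chat.completions.create`. This is an illustration under stated assumptions rather than the notebook's own method: `build_request` is a hypothetical name, and treating any model whose name starts with `o1` as a reasoning model is a simplification that only covers the two models used here.

```python
def build_request(model, prompt, reasoning_effort=None, temperature=None, markdown=False):
    """Assemble kwargs for client.chat.completions.create (illustrative helper)."""
    # Simplifying assumption: only "o1*" models are treated as reasoning models
    is_reasoning = model.startswith("o1")

    messages = []
    if is_reasoning and markdown:
        # A developer message whose first line re-enables Markdown output
        messages.append({"role": "developer", "content": "Formatting re-enabled"})
    messages.append({"role": "user", "content": prompt})

    kwargs = {"model": model, "messages": messages}
    if is_reasoning:
        # o1 accepts reasoning_effort but not temperature
        if reasoning_effort is not None:
            kwargs["reasoning_effort"] = reasoning_effort
    elif temperature is not None:
        kwargs["temperature"] = temperature
    return kwargs


o1_kwargs = build_request("o1", "Summarize the data.", reasoning_effort="high",
                          temperature=0, markdown=True)
gpt_kwargs = build_request("gpt-4o", "Summarize the data.", reasoning_effort="high",
                           temperature=0)
print(sorted(o1_kwargs), sorted(gpt_kwargs))
```

With an initialized client, the result could be passed as `client.chat.completions.create(**o1_kwargs)`. Silently dropping the unsupported parameter is one design choice; raising an error instead would make misuse more visible.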


In [19]:
def openai_o_help(prompt):
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model='o1',
        reasoning_effort="high",  # one of: low, medium, high
        messages=messages,
    )
    token_usage = response.usage
    
    pprint(f"Tokens used: {token_usage}")

    return response.choices[0].message.content

In [20]:
o1_result = openai_o_help(prompt=data_prompt)

('Tokens used: CompletionUsage(completion_tokens=2673, prompt_tokens=30229, '
 'total_tokens=32902, '
 'completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, '
 'audio_tokens=0, reasoning_tokens=1600, rejected_prediction_tokens=0), '
 'prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0))')
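
The usage figures printed for the two runs can be compared directly. The sketch below uses only the numbers reported above and separates hidden reasoning tokens from the visible answer. It assumes `reasoning_tokens` is counted inside `completion_tokens`, as the OpenAI reasoning guide describes. `visible_output_tokens` is an illustrative helper name.

```python
def visible_output_tokens(completion_tokens, reasoning_tokens):
    """Tokens that appear in the returned answer (completion minus hidden reasoning)."""
    return completion_tokens - reasoning_tokens


# Figures copied from the two usage printouts in this notebook
usage = {
    "gpt-4o": {"completion_tokens": 497, "reasoning_tokens": 0},
    "o1": {"completion_tokens": 2673, "reasoning_tokens": 1600},
}

for model, u in usage.items():
    visible = visible_output_tokens(**u)
    share = u["reasoning_tokens"] / u["completion_tokens"]
    print(f"{model}: visible={visible}, reasoning share={share:.0%}")
```

For this prompt, about 60% of o1's completion tokens were spent on reasoning that is never shown, and its visible answer (1,073 tokens) is still roughly twice as long as GPT-4o's (497 tokens).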


In [21]:
print(o1_result)

Below is a synthesis of the main risk management strategies that appear across USDA’s AI use cases, based on the information in the provided data. Although not every use case describes risk management steps explicitly, certain practices and patterns emerge repeatedly, forming the backbone of how agencies handle risk when developing and operating AI systems:

1) Structured Development Lifecycle and Governance  
   • Staged approach: Many projects are managed through defined “stages,” typically ranging from initiation (Stage 1) and development (Stage 2) to implementation (Stage 3), operation and maintenance (Stage 4), and final retirement (Stage 5). This phased progression helps teams identify and mitigate risks at each step—e.g., feasibility checks early on, security and performance testing at implementation, and continued monitoring in production.  
   • Governance and oversight: Several references note that systems undergo functionality and security testing before launch. Systems in S