# GPT-4o-mini (details 08/08/24)
https://platform.openai.com/docs/api-reference/introduction

This experiment used GPT-4o mini (`gpt-4o-mini`).

### User tier 3 
Check your tier here: https://platform.openai.com/settings/organization/limits. API usage is subject to rate limits applied on tokens per minute (TPM), requests per minute or day (RPM/RPD), and other model-specific limits. Your organization's rate limits are listed below.
- Token Limits = 800,000 TPM
- Request and other limits = 5,000 RPM
- Batch queue limits = 100,000,000 TPD
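
As a rough illustration of what these limits imply, the sketch below estimates a lower bound on the wall-clock time for a batch of requests. The image count of 2,037 is the ThaiRAP set used later in this notebook; `tokens_per_request` is a hypothetical figure chosen for illustration, not a measurement.

```python
# Tier 3 limits quoted above
TPM_LIMIT = 800_000   # tokens per minute
RPM_LIMIT = 5_000     # requests per minute


def min_minutes(n_requests: int, tokens_per_request: int) -> float:
    """Lower bound on run time implied by the TPM and RPM limits."""
    by_requests = n_requests / RPM_LIMIT
    by_tokens = n_requests * tokens_per_request / TPM_LIMIT
    # The tighter of the two limits determines the bound
    return max(by_requests, by_tokens)


# 2,037 images; 10,000 tokens per request is an assumed value
estimate = min_minutes(2037, 10_000)
print(f"At least {estimate:.1f} minutes")
```

With prompts this long, the token limit rather than the request limit is likely to be the binding constraint.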

### GPT-4o and GPT-4o mini pricing
https://openai.com/api/pricing/

GPT-4o is our most advanced multimodal model that’s faster and cheaper than GPT-4 Turbo with stronger vision capabilities. The model has 128K context and an October 2023 knowledge cutoff.

GPT-4o mini is our most cost-efficient small model that’s smarter and cheaper than GPT-3.5 Turbo, and has vision capabilities. The model has 128K context and an October 2023 knowledge cutoff.
- gpt-4o
    - $5.00 / 1M input tokens
    - $15.00 / 1M output tokens
    - Vision pricing: $0.003825 per image (1600x1200)
- gpt-4o-2024-08-06
    - $2.50 / 1M input tokens
    - $10.00 / 1M output tokens
    - Vision pricing: $0.001913 per image (1600x1200)
- gpt-4o-mini 
    - $0.150 / 1M input tokens
    - $0.600 / 1M output tokens
    - Vision pricing: $0.003825 per image (1600x1200)
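
A back-of-the-envelope cost estimate can be sketched from the gpt-4o-mini prices listed above. The token counts below are assumptions for illustration only, and the per-image price applies to a 1600x1200 image, so the real cost depends on the actual image size and `detail` setting.

```python
# gpt-4o-mini prices quoted above (USD)
INPUT_PER_TOKEN = 0.150 / 1_000_000
OUTPUT_PER_TOKEN = 0.600 / 1_000_000
PER_IMAGE = 0.003825  # one 1600x1200 image


def estimate_cost(n_images: int, prompt_tokens: int, output_tokens: int) -> float:
    """Approximate cost in USD when each image is sent in its own request."""
    per_request = (
        PER_IMAGE
        + prompt_tokens * INPUT_PER_TOKEN
        + output_tokens * OUTPUT_PER_TOKEN
    )
    return n_images * per_request


# 8,000 prompt tokens and 1,000 output tokens per request are assumed values
print(f"${estimate_cost(2037, 8_000, 1_000):.2f}")
```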


### *We will be testing on gpt-4o-mini*

### Structured Outputs vs JSON mode
Structured Outputs is the evolution of JSON mode. While both ensure valid JSON is produced, only Structured Outputs ensures schema adherence. Both Structured Outputs and JSON mode are supported in the Chat Completions API, Assistants API, Fine-tuning API and Batch API.

We recommend always using Structured Outputs instead of JSON mode when possible.

However, Structured Outputs with response_format: {type: "json_schema", ...} is only supported with the gpt-4o-mini, gpt-4o-mini-2024-07-18, and gpt-4o-2024-08-06 model snapshots and later.

| | Structured Outputs | JSON Mode |
|---|---|---|
| Outputs valid JSON | Yes | Yes |
| Adheres to schema | Yes (see supported schemas) | No |
| Compatible models | gpt-4o-mini, gpt-4o-2024-08-06, and later | gpt-3.5-turbo, gpt-4-* and gpt-4o-* models |
| Enabling | `response_format: { type: "json_schema", json_schema: {"strict": true, "schema": ...} }` | `response_format: { type: "json_object" }` |
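
To make the `json_schema` form concrete, here is a minimal sketch that builds a `response_format` dictionary restricting each attribute to one of its category classes. It assumes attribute definitions shaped as a list of dictionaries with `Item` and `categories` keys; the function name, the schema name `road_attributes`, and the toy categories are hypothetical. Whether property names containing spaces or slashes are accepted, and whether a 52-attribute schema fits within the documented limits, is not verified here.

```python
import json


def build_response_format(attributes: list[dict], name: str = "road_attributes") -> dict:
    """Build a Structured Outputs response_format from attribute definitions."""
    properties = {"image_id": {"type": "string"}}
    for attribute in attributes:
        categories = [c["Category"] for c in attribute["categories"]]
        # Restrict each attribute to exactly one of its category classes
        properties[attribute["Item"]] = {"type": "string", "enum": categories}

    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "strict": True,
            "schema": {
                "type": "object",
                "properties": properties,
                # Strict mode expects every property to be listed as required
                # and additional properties to be disallowed
                "required": list(properties),
                "additionalProperties": False,
            },
        },
    }


# Toy attribute list (category values are illustrative)
toy_attributes = [
    {"Item": "Street lighting",
     "categories": [{"Category": "Not present"}, {"Category": "Present"}]},
]
print(json.dumps(build_response_format(toy_attributes), indent=2))
```

The resulting dictionary would be passed as the `response_format` argument of a Chat Completions request.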

In [None]:
# %pip uninstall openai
!pip install openai


In [1]:
from openai import OpenAI
import os
import pandas as pd
import PIL.Image
import json
from io import StringIO
import time
import typing
import base64

## Instruction (text input)

You can experiment with 'local_context' and change 'country' as needed.

In [2]:
# Add the local context to the input data
country = "//country//"
local_context = "//local_context//"

# country and local_context that we used in the experiment
country = "Thailand"
local_context = "Speed limits in {country} are a set of maximum speeds that apply to all roads in the country. For <Attribute> = 'Speed limit', the maximum limits are as follows: 80 km/h within built-up areas and in Bangkok, 90 km/h outside built-up areas, and 120 km/h on motorways.\nFor the <Attribute> = 'Motorcycle speed limit', the limits are 80 km/h in Bangkok and other provinces' built-up areas, and 90 km/h on other highways (motorcycles are not allowed on motorways). For the <Attribute> = 'Truck speed limit', the limits are 60 km/h in Bangkok and other provinces' built-up areas, 80 km/h on highways, and 100 km/h on motorways. <Attribute>: Carriageway answer should be Divided carriageway= 'Carriageway A of a divided road'.\nFor highways in Thailand <Attribute> 'Lane width' is 'Wide \u22653.25m'. Moreover, <Attribute>: 'Area type' in Bangkok and Phatum thani should be 'Urban'; therefore, <Attribute>: 'Upgrade cost' should be 'High' and <Attribute>: 'Street lighting' should be 'Present' in Bangkok and Phatum thani area.\nThe common <Attribute>: Roadside severity - driver-side object are 'Safety barrier - concrete' and 'Safety barrier - metal' that should carefully look at the image. The common <Attribute>: Roadside severity - passenger-side object are 'Safety barrier - concrete', 'Safety barrier - metal', 'Deep drainage ditch', 'Rigid sign, post or pole \u226510cm' and 'Unprotected safety barrier end' that should carefully look at the image. In Thailand highway, there should not be 'No object' for <Attribute>: Roadside severity - passenger-side object and <Attribute>: Roadside severity - driver-side object."
# local_context is a plain string, not an f-string, so fill in the {country} placeholder explicitly
local_context = local_context.replace("{country}", country)

In [3]:
# make sure the json file is in the correct path
json_file = './/text//prompts.json'

def format_attributes_to_json(json_file, image_id="image_id"):
    with open(json_file, 'r') as file:
        data = json.load(file)

    formatted_data = {"image_id": image_id}
    
    for attribute in data["attributes"]:
        item_name = attribute.get('Item', 'Unknown Item')
        categories = [category.get('Category', 'Unknown Category') for category in attribute.get('categories', [])]
        formatted_data[item_name] = categories
    
    return json.dumps(formatted_data, indent=4)
output_json = format_attributes_to_json(json_file)


prompt_instruction = f"""
You are a road safety assessment coder from the International Road Assessment Programme (iRAP). Your task is to analyse images of road sections taken in {country} and accurately assess 52 road safety attributes. For each attribute, follow these steps:

1. Analyse the Image: Examine the road section in the image, focusing on all relevant elements that correspond to the 52 '<Attribute>'s you need to assess.
2. Read the <Attribute Description>: For each of the 52 attributes, read the '<Attribute description>' to understand what specific aspect of the image you need to evaluate.
3. Refer to Categories: For each attribute, refer to the possible '<Category class>' options provided. If a '<Category description>' is available, read it to understand the specific criteria for each category.
4. Select the Most Matching Category: Based on your analysis of the image and understanding of the attribute and category descriptions, select the single <Category class> that best matches what you observe in the image. If multiple categories are equally relevant, choose the category that appears first in the provided list.
5. Output the Results in JSON Format: Return the results in JSON format, with each attribute associated with a single <Category class> value that you assess to be the most appropriate based on the image.

Local context:
Please use <location> to understand the local context. '<driver-side>' and '<passenger-side>' are used throughout the <Attribute> and <Attribute description>. Driver-side refers to the side of the road corresponding with the driver of a vehicle travelling in the direction of the survey, and the passenger-side is the other side. If the country drives on the left (e.g., the UK), the passenger side is on the left of the image, and the driver side is on the right of the image. {local_context}
"""

output_format = f"""
Output Format: Return the results in JSON format, where each attribute is associated with a single <Category class> value that best matches your analysis of the image. If multiple categories seem equally relevant, select the category that appears first in the provided list.

JSON structure:
{output_json}
Ensure that each attribute in the JSON output contains only one selected <Category class> that you determine to be the most appropriate based on the image.
"""

## Path configurations for ThaiRAP


In [None]:
# image_folder_path = 'C:/Users/ucesnjo/OneDrive - University College London/General - AAAI2025/image/ThaiRAP'
# csv_file_path = 'C:/Users/ucesnjo/OneDrive - University College London/General - AAAI2025/Validation.csv'
# json_file = 'C://Users//ucesnjo//OneDrive - University College London//General - AAAI2025//text//prompts.json'
# save_path = 'C:/Users/ucesnjo/OneDrive - University College London/General - AAAI2025/result/gpt4o/gpt4o-mini_Final.csv'

In [4]:
# Path configurations
image_folder_path = './/image//ThaiRAP'
csv_file_path = './Validation.csv'
save_path = './/result//gpt4o-mini_thairap.csv'


## Path configurations for Mapillary


In [None]:
# Path configurations
image_folder_path = './/image//Mapillary_processed'  # images in this folder are already processed (cropped, resized, and renamed)
csv_file_path = './/image//Mapillary_processed//mapillary.csv'
save_path = './/result//gpt4o-mini__mapillary.csv'

## Single prompt
This means each request asks the VLM about one image and its information, with the aim of getting all 52 answers ('attributes') in a single response.

The functions below are used to set up the prompts.
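
Each request carries a short per-image text alongside the image itself. As a small illustration of the string format produced by `generate_single_image_prompt` in this section, the sketch below uses a hypothetical helper and hypothetical coordinates; the doubled braces in the f-string emit literal `{` and `}` around the location.

```python
def location_prompt(image_id: int, lat: float, lon: float) -> str:
    """Mirror the per-image text format; doubled braces produce literal { and }."""
    return f"<image_id>: {image_id} and <location>: {{{lat},{lon}}}"


# Hypothetical coordinates, not taken from the validation CSV
print(location_prompt(1, 13.75, 100.5))
```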

In [5]:

# Read the CSV file
df = pd.read_csv(csv_file_path)

# Function to generate a prompt for a single image
def generate_single_image_prompt(image_id, df):
    row = df[df['image_id'] == int(image_id)]
    if not row.empty:
        lat = row['Latitude start'].values[0]
        lon = row['Longitude start'].values[0]
        return f"<image_id>: {image_id} and <location>: {{{lat},{lon}}}"
    else:
        return None 

# Function to convert JSON to text prompt
def json2text(include_attribute_description=True, include_category_description=True):
    with open(json_file, 'r') as file:
        data = json.load(file)
    attributes = data["attributes"]
    formatted_descriptions = []
    
    for idx, attribute in enumerate(attributes, 1):
        item = attribute.get('Item', 'Unknown Item')
        attribute_description = attribute.get('Attribute description', 'No description available.')
        
        if idx > 1:
            formatted_descriptions.append("")  # Add a blank line before each new item
        
        formatted_descriptions.append(f"{idx}. <Attribute>: {item}")
        if include_attribute_description:
            formatted_descriptions.append(f"<Attribute description>: {attribute_description}")
        
        categories_details = []
        
        for category in attribute.get('categories', []):
            cat_id = category.get('Category', 'N/A')
            category_description = category.get('Category_description', '')
            
            if include_category_description and category_description:
                categories_details.append(f" <Category class>:{cat_id}, <Category description>: {category_description}")
            else:
                categories_details.append(f" <Category class>:{cat_id}")
        
        # Format the categories list
        formatted_descriptions.append(f"<Categories>: [{', '.join([category['Category'] for category in attribute.get('categories', [])])}]")
        
        # Append detailed category descriptions
        formatted_descriptions.extend(categories_details)
        
    return "\n".join(formatted_descriptions)


fields = [
    "image_id",
    "Carriageway",
    "Upgrade cost",
    "Motorcycle observed flow",
    "Bicycle observed flow",
    "Pedestrian observed flow across the road",
    "Pedestrian observed flow along the road driver-side",
    "Pedestrian observed flow along the road passenger-side",
    "Land use - driver-side",
    "Land use - passenger-side",
    "Area type",
    "Speed limit",
    "Motorcycle speed limit",
    "Truck speed limit",
    "Differential speed limits",
    "Median type",
    "Centreline rumble strips",
    "Roadside severity - driver-side distance",
    "Roadside severity - driver-side object",
    "Roadside severity - passenger-side distance",
    "Roadside severity - passenger-side object",
    "Shoulder rumble strips",
    "Paved shoulder - driver-side",
    "Paved shoulder - passenger-side",
    "Intersection type",
    "Intersection channelisation",
    "Intersecting road volume",
    "Intersection quality",
    "Property access points",
    "Number of lanes",
    "Lane width",
    "Curvature",
    "Quality of curve",
    "Grade",
    "Road condition",
    "Skid resistance / grip",
    "Delineation",
    "Street lighting",
    "Pedestrian crossing facilities - inspected road",
    "Pedestrian crossing quality",
    "Pedestrian crossing facilities - intersecting road",
    "Pedestrian fencing",
    "Speed management / traffic calming",
    "Vehicle parking",
    "Sidewalk - driver-side",
    "Sidewalk - passenger-side",
    "Service road",
    "Facilities for motorised two wheelers",
    "Facilities for bicycles",
    "Roadworks",
    "Sight distance",
    "School zone warning",
    "School zone crossing supervisor"
]



### Choose image_id here
This can be used for both ThaiRAP and Mapillary
- ThaiRAP 2037 images
- Mapillary 168 images

In [18]:
# all the image ids
choose_image_id = range(1, 2)  # processes image 1 only; the end of a range is exclusive, so use range(1, 3) for images 1 and 2

## GPT configuration

In [21]:
# Set your OpenAI API key directly
api_key = "//api_key_here//"
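
As an alternative to pasting the key into the notebook, it can be read from an environment variable. This is a minimal sketch: `load_api_key` is a hypothetical helper, and `OPENAI_API_KEY` is the variable name the `openai` package also checks by default.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

With the variable set, `api_key = load_api_key()` can replace the hard-coded assignment, which keeps the key out of shared copies of the notebook.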

## Run to get the results

In [22]:
# connect to the OpenAI API
client = OpenAI(api_key=api_key)

# Initialize an empty DataFrame if the save file doesn't exist
if not os.path.exists(save_path):
    # Create a new DataFrame with the appropriate columns
    output_df = pd.DataFrame(columns=fields)
    output_df.to_csv(save_path, index=False)  # Save the new empty CSV with headers
else:
    # Load the existing file with keep_default_na=False to avoid converting empty strings to NaN
    output_df = pd.read_csv(save_path, keep_default_na=False)

# Process each image
for image_id in choose_image_id:  # Adjust the range as necessary
    image_file = f"{image_id}.jpg"
    image_path = os.path.join(image_folder_path, image_file)
    
    # Generate prompt for the current image
    image_prompt = generate_single_image_prompt(image_id, df)
    if image_prompt is None:
        print(f"Skipping image {image_id} - No data found in CSV")
        continue

    def encode_image(image_path):
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode('utf-8')


    # Getting the base64 string
    base64_image = encode_image(image_path)

    # Construct the prompt to send to OpenAI's API
    prompt_system = f"{prompt_instruction}\n{json2text()}\n\n{output_format}"

    # Call the OpenAI API to generate completions
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": prompt_system
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": image_prompt
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{base64_image}",
                            "detail": "high"
                        }
                    },
                ]
            }
        ],
        temperature=0,
        # max_tokens=16384
        # response_format={type: "V_RoAst_schema"}
    )

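    response_text = response.choices[0].message.content
    # Drop the Markdown fence around the reply: the first 8 characters ("```json" plus a newline)
    # and the last 3 ("```"). This assumes the model always wraps its JSON this way.
    response_text = response_text[8:-3].strip()
    response_dict = json.loads(response_text)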

    df_result = pd.DataFrame.from_dict(response_dict, orient='index').transpose()
    df_result = df_result.applymap(lambda x: x[0] if isinstance(x, list) and len(x) > 0 else x)

    for field in fields:
        # Check the result frame (not the input CSV) for attributes the model left out
        if field not in df_result.columns:
            df_result[field] = None  # Add missing fields with None

    # Reorder the DataFrame to match the fields list
    df_result = df_result[fields]
    print(f"Processed image {image_id} - {image_file}")
    
    # Append the new row to the existing DataFrame
    output_df = pd.concat([output_df, df_result], ignore_index=True)

    # Save the updated DataFrame back to the CSV after processing each image
    output_df.to_csv(save_path, index=False)



Processed image 1 - 1.jpg
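
The fixed slice used in the loop above only works when the reply is wrapped in a Markdown fence of exactly that shape. A more tolerant parser could look like the sketch below; `extract_json` and `missing_fields` are hypothetical helpers, not part of the original experiment, and the sample reply is made up.

```python
import json
import re


def extract_json(text: str) -> dict:
    """Parse a model reply that may or may not be wrapped in a Markdown code fence."""
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", text, flags=re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload.strip())


def missing_fields(result: dict, expected: list[str]) -> list[str]:
    """List the expected keys that the reply left out."""
    return [name for name in expected if name not in result]


# Made-up reply in the fenced style the loop expects
reply = '```json\n{"image_id": "1", "Area type": "Urban"}\n```'
parsed = extract_json(reply)
print(parsed["Area type"])
print(missing_fields(parsed, ["image_id", "Area type", "Grade"]))
```

Checking for missing keys before writing a row makes it easier to spot replies that were cut off or that skipped attributes.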
