# Compare models

In this notebook, you'll send the same sample prompts as chat completion requests to two of your deployed GPT models. Afterward, you'll review usage metrics for your Azure OpenAI resource and decide which model is best suited to your use case.

## Before you start

You'll need the latest version of the **openai** library to run the code in this notebook. You'll also need the **azure-identity** library to authenticate the metric requests you send to the Azure Monitor API, and the **matplotlib** library to run the plotting code that your models generate in this exercise. Run the cell below to install these libraries.

In [None]:
# Install the latest version of the openai library using pip
! pip install openai -U

# Install the Azure Identity library using pip
! pip install azure-identity

# Install the matplotlib library using pip
! pip install matplotlib

Now you need to define the values that will be used when submitting a chat completion request through the API endpoint. 

In [2]:
# Define the base URL for your Azure OpenAI Service endpoint
# Replace 'Your Azure OpenAI Service Endpoint' with your actual endpoint URL obtained previously
api_base = 'Your Azure OpenAI Service Endpoint'

# Define the API key for your Azure OpenAI Service
# Replace 'Your Azure OpenAI Service API Key' with your actual API key obtained previously
api_key = 'Your Azure OpenAI Service API Key'

# Define the names of the models deployed in your Azure OpenAI Service
model_name1 = 'gpt-4o'
model_name2 = 'gpt-4o-mini'

# Define the API version to use for the Azure OpenAI Service
api_version = '2024-08-01-preview'
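
If you'd rather not paste the endpoint and key into the notebook, one option is to read them from environment variables. The sketch below assumes the variables are named `AZURE_OPENAI_ENDPOINT` and `AZURE_OPENAI_API_KEY`; the helper name `setting_from_env` is hypothetical and not part of any library.

```python
import os

def setting_from_env(name, fallback):
    """Return the environment variable `name`, or `fallback` if it is unset or empty."""
    return os.environ.get(name) or fallback

# Possible usage in this notebook, after the cell above has run
# (the environment variable names are an assumption):
# api_base = setting_from_env("AZURE_OPENAI_ENDPOINT", api_base)
# api_key = setting_from_env("AZURE_OPENAI_API_KEY", api_key)
```

With this approach, the values defined in the previous cell are kept whenever the environment variables are not set.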


Next, you need to encode the image used in this exercise into a data URL. This URL will be used to embed the image directly in the chat completion request together with the text prompt.

In [3]:
# Import the base64 module, which provides functions for encoding and decoding data in base64 format
import base64

# Import the guess_type function from the mimetypes module
# This function is used to guess the MIME type of a file based on its filename or URL
from mimetypes import guess_type


In [4]:
# Function to encode a local image into a data URL
def local_image_to_data_url(image_path):
    # Guess the MIME type of the image based on the file extension
    mime_type, _ = guess_type(image_path)
    if mime_type is None:
        mime_type = 'application/octet-stream'  # Default MIME type if none is found

    # Read and encode the image file
    with open(image_path, "rb") as image_file:
        base64_encoded_data = base64.b64encode(image_file.read()).decode('utf-8')

    # Construct the data URL
    return f"data:{mime_type};base64,{base64_encoded_data}"


In [5]:
# Define the path to the image file
image_path = './imgs/demo.png'

# Convert the local image to a data URL using the local_image_to_data_url function
data_url = local_image_to_data_url(image_path)
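
To check that the encoding works as expected, you can decode a data URL back into bytes and compare it with the original file. The following is a minimal sketch: `data_url_to_bytes` is a hypothetical helper, and the sample file is a few stand-in bytes written to a temporary folder, not the exercise image.

```python
import base64
import os
import tempfile
from mimetypes import guess_type

def local_image_to_data_url(image_path):
    # Same function as in the cell above, repeated so this sketch runs on its own
    mime_type, _ = guess_type(image_path)
    if mime_type is None:
        mime_type = 'application/octet-stream'
    with open(image_path, "rb") as image_file:
        base64_encoded_data = base64.b64encode(image_file.read()).decode('utf-8')
    return f"data:{mime_type};base64,{base64_encoded_data}"

def data_url_to_bytes(data_url):
    """Split a base64 data URL into (mime_type, raw_bytes)."""
    header, encoded = data_url.split(",", 1)
    mime_type = header[len("data:"):].split(";", 1)[0]
    return mime_type, base64.b64decode(encoded)

# Stand-in content (not a real image), written to a temporary .png file
payload = b"\x89PNG\r\n\x1a\n stand-in bytes"
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.png")
    with open(sample_path, "wb") as sample_file:
        sample_file.write(payload)
    sample_url = local_image_to_data_url(sample_path)

# Decode the data URL and confirm that the bytes survived the round trip
mime_type, decoded = data_url_to_bytes(sample_url)
print(mime_type, decoded == payload)
```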


Now you'll create two instances of the AzureOpenAI client, one for each model, to interact with your Azure OpenAI Service and obtain the chat completion responses.

In [8]:
# Import the AzureOpenAI class from the openai library
from openai import AzureOpenAI

# Create two instances of the AzureOpenAI client to interact with Azure's OpenAI Service
client1 = AzureOpenAI(
    # Use the API key for authentication
    api_key=api_key,  
    
    # Specify the API version to use
    api_version=api_version,
    
    # Construct the base URL for the deployment using the provided API base and deployment name
    # (rstrip('/') makes this work whether or not api_base ends with a slash)
    base_url=f"{api_base.rstrip('/')}/openai/deployments/{model_name1}",
)

client2 = AzureOpenAI(
    api_key=api_key,  
    api_version=api_version,    
    base_url=f"{api_base.rstrip('/')}/openai/deployments/{model_name2}",
)

In [None]:
# Define the messages to send to the models
messages1=[
    { 
        "role": "user", 
        "content": [  
            { 
                # Specify the type of content as text
                "type": "text", 
                    
                # Provide the text content for the model to process
                "text": "Please create Python code for image, and use plt to save the new picture under imgs/ and name it gpt-4o.jpg." 
            },
            { 
                # Specify the type of content as an image URL
                "type": "image_url",
                  
                # Provide the image URL for the model to process
                "image_url": {
                    "url": data_url
                }
            }
        ] 
    } 
]

messages2=[
    { 
        "role": "user", 
        "content": [  
            { 
                "type": "text", 
                "text": "Please create Python code for image, and use plt to save the new picture under imgs/ and name it gpt-4o-mini.jpg." 
            },
            { 
                "type": "image_url",
                "image_url": {
                    "url": data_url
                }
            }
        ] 
    } 
]

In [9]:
# Create the chat completion requests using the AzureOpenAI clients
response1 = client1.chat.completions.create(
    # Specify the model to use for generating the response
    model=model_name1,
    
    # Define the messages to send to the model
    messages=messages1,
    
    # Set the maximum number of tokens to generate in the response
    max_tokens=2000 
)

response2 = client2.chat.completions.create(
    model=model_name2,
    messages=messages2,
    max_tokens=2000 
)

In [10]:
# Each response holds a list of choices; take the first one as the result
result1 = response1.choices[0].message.content
result2 = response2.choices[0].message.content

The variables `result1` and `result2` now hold the content of the first choice from each response, which is the text or code the model generated from the input messages. Print each result, copy the code block it contains into a new code cell, run it, and compare the outputs. Do the scripts and their outputs differ in any way?

In [None]:
print(result1)

In [None]:
print(result2)
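
Instead of copying the code by hand, you could pull the fenced code blocks out of each result with a regular expression. This is only a sketch: it assumes the models wrap their code in Markdown fences (three backticks), which is common but not guaranteed. `extract_code_blocks` is a hypothetical helper, and `sample_reply` is an illustrative string, not real model output.

```python
import re

# Three backticks, built this way so the sketch contains no literal fence
FENCE = "`" * 3

# Matches a fenced block with an optional language tag on the opening line
CODE_BLOCK = re.compile(
    re.escape(FENCE) + r"[ \t]*([\w+-]*)[ \t]*\r?\n(.*?)" + re.escape(FENCE),
    re.DOTALL,
)

def extract_code_blocks(markdown_text, language=None):
    """Return the bodies of fenced code blocks, optionally filtered by language tag."""
    blocks = []
    for tag, body in CODE_BLOCK.findall(markdown_text):
        if language is None or tag.lower() == language.lower():
            blocks.append(body.strip())
    return blocks

# Illustrative reply text (an assumption about the typical shape of a model answer)
sample_reply = (
    "Here is the script:\n"
    + FENCE + "python\n"
    + "import matplotlib.pyplot as plt\n"
    + "plt.savefig('imgs/example.jpg')\n"
    + FENCE + "\nLet me know if you need changes."
)

blocks = extract_code_blocks(sample_reply, language="python")
print(blocks[0])
```

In the notebook, you would call `extract_code_blocks(result1)` and `extract_code_blocks(result2)`. Review the extracted code before running it.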

You can submit further requests to have the models modify the code. Doing so also makes the differences between the models clearer and the metrics observed later more meaningful. For the models to keep track of the conversation history, however, you need to append their responses and the new prompts to the `messages` variables you've been using so far.

In [None]:
# Add the responses to the messages as an Assistant Role
messages1.append({"role": "assistant", "content": result1})
messages2.append({"role": "assistant", "content": result2})

# Define the new prompt that will develop the chat completion further
new_prompt = "Add a legend to the plot replacing the labels"

# Add the user's question to the messages as a User Role
messages1.append({"role": "user", "content": new_prompt})
messages2.append({"role": "user", "content": new_prompt})
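
If you plan to repeat this step several times, the two appends can be wrapped in a small function. The sketch below uses a hypothetical helper named `add_turn` and a stand-in `history` list, so it doesn't modify `messages1` or `messages2`.

```python
def add_turn(messages, assistant_reply, next_prompt):
    """Append the model's last reply and the next user prompt, in that order."""
    messages.append({"role": "assistant", "content": assistant_reply})
    messages.append({"role": "user", "content": next_prompt})
    return messages

# Stand-in conversation used only to illustrate the resulting order of roles
history = [{"role": "user", "content": "Plot the data."}]
add_turn(history, "Here is a first version of the code.", "Add a legend to the plot replacing the labels")
print([message["role"] for message in history])
```

The order matters: the assistant reply must come before the new user prompt so that the model sees the conversation in sequence.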

In [None]:
# Submit the new chat completion requests
response1 = client1.chat.completions.create(
    model=model_name1,
    messages=messages1,
    max_tokens=2000 
)
response2 = client2.chat.completions.create(
    model=model_name2,
    messages=messages2,
    max_tokens=2000 
)
result1 = response1.choices[0].message.content
result2 = response2.choices[0].message.content
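
Each chat completion response also reports its own token counts in a `usage` field, which gives you a quick per-request comparison before you look at the aggregated metrics. The sketch below uses a hypothetical `summarize_usage` helper and a stand-in object with made-up numbers in place of a real response.

```python
from types import SimpleNamespace

def summarize_usage(response):
    """Return the token counts reported on a chat completion response, or None."""
    usage = getattr(response, "usage", None)
    if usage is None:
        return None
    return {
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
    }

# Stand-in for a response object; the numbers are invented for illustration
fake_response = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=900, completion_tokens=350, total_tokens=1250)
)
print(summarize_usage(fake_response))
```

In the notebook, you would call `summarize_usage(response1)` and `summarize_usage(response2)` after each request.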

You can print the new results and compare them once again. If you want the models to make further changes to the code, repeat the steps in the previous two code cells with a new prompt. Next, you'll generate an access token to collect the metric values for each model.

In [51]:
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
token = credential.get_token("https://management.azure.com/.default")
access_token = token.token

Before running the last code cell, copy the resource ID of your Azure AI Services resource from the Azure portal. Go to the resource's overview page and select **JSON View**. Copy the resource ID and use it to replace the `Your resource ID` placeholder below.

In [None]:
import requests
from datetime import datetime, timedelta, timezone
import matplotlib.pyplot as plt

# Define the resource ID and the metric name
resource_id = "Your resource ID"
metric_name = "TokenTransaction"
model_deployment_names = ["gpt-4o", "gpt-4o-mini"]

# Calculate the timespan for the last 30 minutes
# Use UTC, because the 'Z' suffix below marks the timestamps as UTC
end_time = datetime.now(timezone.utc)
start_time = end_time - timedelta(minutes=30) # Feel free to change timedelta to (hours=1), if necessary 
time_format = "%Y-%m-%dT%H:%M:%SZ"
timespan = f"{start_time.strftime(time_format)}/{end_time.strftime(time_format)}"

# Create the filter condition for multiple model deployment names
filter_condition = " or ".join([f"ModelDeploymentName eq '{name}'" for name in model_deployment_names])

# Define the API endpoint with timespan and filter condition
url = f"https://management.azure.com{resource_id}/providers/microsoft.insights/metrics?api-version=2018-01-01&metricnames={metric_name}&timespan={timespan}&$filter={filter_condition}"

# Set the headers
headers = {
    "Authorization": f"Bearer {access_token}",
    "Content-Type": "application/json"
}

# Make the request
response = requests.get(url, headers=headers)

# Check the response
if response.status_code == 200:
    data = response.json()

    # Extract time series data for each model deployment name
    time_series_data = {}
    for value in data['value']:
        for timeseries in value['timeseries']:
            model_name = timeseries['metadatavalues'][0]['value']
            if model_name not in time_series_data:
                time_series_data[model_name] = []
            for data_point in timeseries['data']:
                # Intervals without traffic may omit 'total'; treat them as 0 tokens
                time_series_data[model_name].append((data_point['timeStamp'], data_point.get('total', 0)))

    # Plot the metrics over the timespan for each model deployment name
    plt.figure(figsize=(12, 6))
    for model_name, series in time_series_data.items():
        timestamps, values = zip(*series)
        plt.plot(timestamps, values, label=model_name)

    plt.xlabel('Timestamp')
    plt.ylabel('Processed Inference Tokens')
    plt.title('Processed Inference Tokens Usage Over Time')
    plt.legend()
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()
else:
    print("Failed to retrieve metrics:", response.status_code, response.text)
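
Alongside the plot, a single total per deployment can make the comparison easier to read. The sketch below sums the per-interval values in a dictionary shaped like `time_series_data` from the cell above. The helper name `total_tokens_per_deployment` is hypothetical, and `sample_series` contains illustrative numbers only, not measured results.

```python
def total_tokens_per_deployment(time_series_data):
    """Sum the per-interval totals for each deployment, skipping missing values."""
    return {
        name: sum(value for _, value in series if value is not None)
        for name, series in time_series_data.items()
    }

# Illustrative data in the same shape as time_series_data: {name: [(timestamp, total), ...]}
sample_series = {
    "gpt-4o": [("2024-01-01T00:00:00Z", 1200.0), ("2024-01-01T00:01:00Z", None)],
    "gpt-4o-mini": [("2024-01-01T00:00:00Z", 1500.0), ("2024-01-01T00:01:00Z", 300.0)],
}

totals = total_tokens_per_deployment(sample_series)
for name, total in totals.items():
    print(f"{name}: {total:.0f} tokens")
```

In the notebook, you would pass `time_series_data` itself once the metrics request has succeeded.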

## Conclusion

After reviewing the plot and recalling the benchmark values in the Accuracy vs. Cost chart you saw earlier, can you conclude which model is best for your use case? Does the difference in the accuracy of the outputs outweigh the difference in tokens generated, and therefore in cost?

## Clean up

If you've finished the exercise, you should delete the resources you have created to avoid incurring unnecessary Azure costs.

1. Return to the browser tab containing the Azure portal (or re-open the [Azure portal](https://portal.azure.com?azure-portal=true) in a new browser tab) and view the contents of the resource group where you deployed the resources used in this exercise.
1. On the toolbar, select **Delete resource group**.
1. Enter the resource group name and confirm that you want to delete it.