# Image comparison with GPT-4 Turbo with Vision

**GPT-4 Turbo with Vision** is a large multimodal model (LMM) developed by OpenAI that can analyze images and provide textual responses to questions about them. It combines natural language processing with visual understanding. This notebook uses it through Azure OpenAI to describe single images and to compare pairs of images.

GPT-4 Turbo with Vision is now in public preview on Azure OpenAI Service. With enhanced mode, you can also use Azure AI Vision features to generate additional insights from the images.

> https://learn.microsoft.com/en-us/azure/ai-services/openai/whats-new#gpt-4-turbo-with-vision-now-available

In [1]:
import datetime
import openai
import os
import sys
import time

from dotenv import load_dotenv
from IPython.display import HTML, display
from openai import AzureOpenAI

In [2]:
def check_openai_version():
    """
    Check that the installed openai Python package is version 1.0.0 or later
    """
    installed_version = openai.__version__

    try:
        version_number = float(installed_version[:3])
    except ValueError:
        print("Invalid OpenAI version format")
        return

    print(f"Installed OpenAI version: {installed_version}")

    if version_number < 1.0:
        print("[Warning] You should upgrade OpenAI to have version >= 1.0.0")
        print("To upgrade, run: %pip install openai --upgrade")
    else:
        print(f"[OK] OpenAI version {installed_version} is >= 1.0.0")


check_openai_version()

Installed OpenAI version: 1.12.0
[OK] OpenAI version 1.12.0 is >= 1.0.0
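The slice-and-float check in `check_openai_version` is enough for the `>= 1.0` test, but it cannot express a finer threshold: `installed_version[:3]` turns `1.12.0` into `1.1`. A sketch that compares numeric components as tuples instead; `parse_version` and `is_at_least` are hypothetical helpers, and pre-release suffixes such as `rc1` are simply dropped.

```python
import importlib.metadata


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.12.0' into (1, 12, 0); stops at the first non-numeric component."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # e.g. '0rc1': keep 0, ignore the pre-release suffix
    return tuple(parts)


def is_at_least(installed: str, minimum: str) -> bool:
    """Compare two version strings numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    try:
        installed = importlib.metadata.version("openai")
    except importlib.metadata.PackageNotFoundError:
        installed = "0"
    print(installed, is_at_least(installed, "1.0.0"))
```

When the `packaging` package is available, `packaging.version.Version` is the more complete option.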


In [3]:
sys.version

'3.10.11 (main, May 16 2023, 00:28:57) [GCC 11.2.0]'

In [4]:
print(f"Today is {datetime.datetime.today().strftime('%d-%b-%Y %H:%M:%S')}")

Today is 05-Mar-2024 11:35:14


In [5]:
print(f"Python version: {sys.version}")

Python version: 3.10.11 (main, May 16 2023, 00:28:57) [GCC 11.2.0]


## 1. Azure OpenAI

In [6]:
print(f"OpenAI version: {openai.__version__}")

OpenAI version: 1.12.0


In [7]:
load_dotenv("azure.env")

# Azure OpenAI settings read from azure.env
openai.api_type = "azure"
openai.api_key = os.getenv("OPENAI_API_KEY")
openai.api_base = os.getenv("OPENAI_API_BASE")
openai.api_version = os.getenv("OPENAI_API_VERSION")
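`os.getenv` returns `None` for a missing variable, so a typo in `azure.env` only surfaces later as a confusing request error. A small fail-fast check could run right after `load_dotenv`; `require_env` is a hypothetical helper, and the variable names in the usage line are the ones this notebook reads.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the requested environment variables, failing fast if any is missing or empty."""
    values = {name: os.getenv(name) for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"Missing settings in azure.env: {', '.join(missing)}")
    return values  # every value is a non-empty string at this point


# Usage after load_dotenv("azure.env"):
# settings = require_env("OPENAI_API_KEY", "OPENAI_API_BASE", "OPENAI_API_VERSION")
```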

In [8]:
model = "gpt-4TurboVision"

In [9]:
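# Resource endpoint from azure.env, e.g. https://<resource-name>.openai.azure.com/
endpoint = openai.api_base

In [10]:
# The SDK builds the request URL from these settings:
# {endpoint}/openai/deployments/{model}/chat/completions?api-version={api_version}
# This notebook was run against api-version 2023-12-01-preview, so
# OPENAI_API_VERSION in azure.env should name a version that supports vision.
client = AzureOpenAI(
    api_key=openai.api_key,
    api_version=openai.api_version,
    azure_endpoint=endpoint,
)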
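The client setup above targets Azure OpenAI's deployment-scoped chat completions URL, `{endpoint}/openai/deployments/{deployment}/chat/completions?api-version={version}`. A sketch that composes it explicitly, which can help when logging or debugging requests; `chat_completions_url` is a hypothetical helper and `my-resource` is a placeholder resource name.

```python
from urllib.parse import urlencode


def chat_completions_url(endpoint: str, deployment: str, api_version: str) -> str:
    """Build the Azure OpenAI chat completions URL for one deployment."""
    base = endpoint.rstrip("/")  # tolerate a trailing slash in the env value
    query = urlencode({"api-version": api_version})
    return f"{base}/openai/deployments/{deployment}/chat/completions?{query}"


print(chat_completions_url(
    "https://my-resource.openai.azure.com/", "gpt-4TurboVision", "2023-12-01-preview"
))
# https://my-resource.openai.azure.com/openai/deployments/gpt-4TurboVision/chat/completions?api-version=2023-12-01-preview
```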

## 2. Single image analysis

In [11]:
def ask_image(prompt, img_url, imgview=True):
    """
    Ask GPT-4 Turbo with Vision a question about a single image.
    """
    print(prompt)
    print()

    start = time.time()

    if imgview:
        # Display the image inline in the notebook
        print(f"Image : {img_url}")
        display(HTML(f'<img src="{img_url}">'))

    # Calling Azure OpenAI model
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": img_url},
                    },
                ],
            }
        ],
        max_tokens=2000,
        temperature=0,
    )

    # Print results
    print()
    print("Azure OpenAI GPT-4 Turbo with Vision results:")
    print("\033[1;31;34m")
    print(response.choices[0].message.content)

    print("\033[1;31;32m")
    print("[Note] These results are produced by an AI.")
    print(
        datetime.datetime.today().strftime("%d-%b-%Y %H:%M:%S"),
        "Powered by Azure OpenAI GPT-4 Turbo with vision model",
    )
    print("\033[0m")
    
    elapsed = time.time() - start
    print(f"Time elapsed in seconds = {round(elapsed, 5)}")
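`ask_image` here and `images_comparison` in section 3 differ mainly in how many `image_url` parts they put into the single user message. A sketch of a shared builder that accepts any number of image URLs; `build_vision_messages` is a hypothetical name, and the payload shape mirrors the one both functions already send.

```python
def build_vision_messages(prompt: str, *image_urls: str) -> list[dict]:
    """Build one user message holding a text part followed by one part per image."""
    content = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return [{"role": "user", "content": content}]


# Usage with the client defined earlier (two images, as in images_comparison):
# response = client.chat.completions.create(
#     model=model,
#     messages=build_vision_messages(prompt, url1, url2),
#     max_tokens=800,
#     temperature=0,
# )
```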

In [12]:
url = "https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/Paris%20(1).png?raw=true"
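The examples use publicly reachable GitHub URLs; the `?raw=true` suffix makes GitHub serve the image file itself rather than its web page. For a local file, GPT-4 Turbo with Vision also accepts a base64 data URL in the same `image_url` field. A minimal sketch, assuming a PNG or JPEG on disk; `image_to_data_url` and the file name in the usage line are hypothetical.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a data URL usable in an image_url content part."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Usage (hypothetical local file); imgview=False avoids printing the long base64 string:
# ask_image("Describe this", image_to_data_url("paris.png"), imgview=False)
```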

In [13]:
prompt = "Describe this"

ask_image(prompt, url)

Describe this

Image : https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/Paris%20(1).png?raw=true



Azure OpenAI GPT-4 Turbo with Vision results:

The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France. It is named after the engineer Gustave Eiffel, whose company designed and built the tower.

[Note] These results are produced by an AI.
05-Mar-2024 11:35:20 Powered by Azure OpenAI GPT-4 Turbo with vision model
Time elapsed in seconds = 2.7892


In [14]:
prompt = "Is it London, Paris, Lisbon, Madrid?"

ask_image(prompt, url)

Is it London, Paris, Lisbon, Madrid?

Image : https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/Paris%20(1).png?raw=true



Azure OpenAI GPT-4 Turbo with Vision results:

Paris

[Note] These results are produced by an AI.
05-Mar-2024 11:35:22 Powered by Azure OpenAI GPT-4 Turbo with vision model
Time elapsed in seconds = 1.53684


In [15]:
prompt = "Generate a twitter post for this landmark with emojis and hashtags"

ask_image(prompt, url)

Generate a twitter post for this landmark with emojis and hashtags

Image : https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/Paris%20(1).png?raw=true



Azure OpenAI GPT-4 Turbo with Vision results:

Bonjour from the City of Love! 🇫🇷💕 The Eiffel Tower never fails to take my breath away. 😍🗼 #Paris #EiffelTower #TravelGoals #France

[Note] These results are produced by an AI.
05-Mar-2024 11:35:30 Powered by Azure OpenAI GPT-4 Turbo with vision model
Time elapsed in seconds = 3.81095


In [16]:
url1 = "https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/r5.jpg?raw=true"
url2 = "https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/r52.jpg?raw=true"

In [17]:
prompt = "What is this car model?"

ask_image(prompt, url1)

What is this car model?

Image : https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/r5.jpg?raw=true



Azure OpenAI GPT-4 Turbo with Vision results:

Renault R5

[Note] These results are produced by an AI.
05-Mar-2024 11:35:35 Powered by Azure OpenAI GPT-4 Turbo with vision model
Time elapsed in seconds = 2.223


In [18]:
prompt = "Describe this"

ask_image(prompt, url2)

Describe this

Image : https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/r52.jpg?raw=true



Azure OpenAI GPT-4 Turbo with Vision results:

This is an electric car being charged.

[Note] These results are produced by an AI.
05-Mar-2024 11:35:42 Powered by Azure OpenAI GPT-4 Turbo with vision model
Time elapsed in seconds = 3.0744


In [19]:
url1 = "https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/fridge.jpg?raw=true"

In [20]:
prompt = "Describe this"

ask_image(prompt, url1)

Describe this

Image : https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/fridge.jpg?raw=true



Azure OpenAI GPT-4 Turbo with Vision results:

This is a picture of a refrigerator filled with fresh produce and other food items. The top shelf has containers of pre-made salads, pickled vegetables, and a jar of olives. The second shelf has a carton of eggs, a pitcher of water, a bottle of juice, and a container of strawberries. There is also a bunch of fresh herbs in a glass of water. The third shelf has a basket of cherry tomatoes, a whole chicken, and a container of blueberries. The bottom shelf has a variety of fruits including apples, pears, and oranges. The drawers are filled with fresh vegetables such as bell peppers, cauliflower, and kale. There is also a bottle of salad dressing on the door shelf. The refrigerator is well-organized and stocked with healthy food options.

[Note] These results are produced by an AI.
05-Mar-2024 11:35:50 Powered by Azure OpenAI GPT-4 Turbo with vision model
Time elapsed in seconds = 5.66564


In [21]:
prompt = "Propose me a meal based on the contents of my fridge"

ask_image(prompt, url1)

Propose me a meal based on the contents of my fridge

Image : https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/fridge.jpg?raw=true



Azure OpenAI GPT-4 Turbo with Vision results:

How about a roasted chicken with a side of roasted vegetables? You have a whole chicken and a variety of vegetables like bell peppers, cauliflower, and kale that would roast up nicely. You could also make a fresh salad with the lettuce and tomatoes, and use the lemons and limes for a homemade dressing. For dessert, you have plenty of fruit options like apples, pears, and berries.

[Note] These results are produced by an AI.
05-Mar-2024 11:35:55 Powered by Azure OpenAI GPT-4 Turbo with vision model
Time elapsed in seconds = 3.70146


In [22]:
url1 = "https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/openai-monitor-log.png?raw=true"

In [23]:
prompt = "Explain this architecture"

ask_image(prompt, url1)

Explain this architecture

Image : https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/openai-monitor-log.png?raw=true



Azure OpenAI GPT-4 Turbo with Vision results:
This architecture represents a secure and scalable API management solution using Microsoft Azure services. The architecture is divided into two subscriptions: Corporate and Application.

1. The user accesses the system through an Application Gateway, which acts as a load balancer and provides a secure entry point to the application.

2. The Application Gateway is connected to a subnet within a Virtual Network in the Corporate subscription. This Virtual Network is peered with another Virtual Network in the Application subscription, allowing secure communication between the two.

3. Within the Application subscription's Virtual Network, there is an API Management service that handles the API requests and responses. This service is also connected to a subnet.

4. The API Management service interacts with Azure OpenAI Service instances through Private Links, which provide secure and private connectivity without exposing the services

## 3. Image comparison

In [24]:
def images_comparison(prompt, img_url1, img_url2, imgview=True):
    """
    Compare two images with GPT-4 Turbo with Vision.
    """
    print(prompt)
    print()

    if imgview:
        # Display both images side by side
        print(f"Image 1: {img_url1} \nImage 2: {img_url2}")
        display(
            HTML(
                '<table><tr><td><img src="{0}"></td><td><img src="{1}"></td></tr></table>'.format(
                    img_url1, img_url2
                )
            )
        )

    # Calling Azure OpenAI model
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": img_url1},
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": img_url2},
                    },
                ],
            }
        ],
        max_tokens=800,
        temperature=0,
    )

    # Print results
    print()
    print("Azure OpenAI GPT-4 Turbo with Vision results:")
    print("\033[1;31;34m")
    print(response.choices[0].message.content)

    print("\033[1;31;32m")
    print("[Note] These results are produced by an AI.")
    print(
        datetime.datetime.today().strftime("%d-%b-%Y %H:%M:%S"),
        "Powered by Azure OpenAI GPT-4 Turbo with vision model",
    )
    print("\033[0m")
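The preview in `images_comparison` hard-codes a two-cell HTML table. A sketch that generalizes it to any number of images and escapes each URL before placing it in a quoted `src` attribute; `images_table_html` and its `width` parameter are hypothetical.

```python
from html import escape


def images_table_html(*image_urls: str, width: int = 300) -> str:
    """Return an HTML table showing the given images side by side, one cell each."""
    cells = "".join(
        f'<td><img src="{escape(url, quote=True)}" width="{width}"></td>'
        for url in image_urls
    )
    return f"<table><tr>{cells}</tr></table>"


# Usage in the notebook:
# display(HTML(images_table_html(url1, url2, url3)))
```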

## Testing

In [25]:
url1 = "https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/Paris%20(1).png?raw=true"
url2 = "https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/Paris%20(2).png?raw=true"
url3 = "https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/Paris%20(3).png?raw=true"
url4 = "https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/Paris%20(4).png?raw=true"
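Two of the tests in this notebook pass the same URL twice (`url1, url1` and `url5, url5`), and the model answers that the images are identical, at the cost of a full request. A byte-level duplicate check could run locally first. A sketch, assuming the image bytes have already been downloaded; `image_digest` and `find_duplicate_pairs` are hypothetical, and only exact copies are detected (a resized or re-encoded image hashes differently).

```python
import hashlib


def image_digest(data: bytes) -> str:
    """Hex SHA-256 digest of raw image bytes; equal digests mean byte-identical files."""
    return hashlib.sha256(data).hexdigest()


def find_duplicate_pairs(images: dict[str, bytes]) -> list[tuple[str, str]]:
    """Return (first_name, duplicate_name) pairs for byte-identical images."""
    seen: dict[str, str] = {}
    pairs = []
    for name, data in images.items():
        digest = image_digest(data)
        if digest in seen:
            pairs.append((seen[digest], name))
        else:
            seen[digest] = name
    return pairs


# Usage sketch (downloading with urllib is an assumption):
# from urllib.request import urlopen
# images = {u: urlopen(u).read() for u in (url1, url2, url3, url4)}
# print(find_duplicate_pairs(images))
```

Storing digests rather than raw bytes keeps memory use small when checking many images for duplicates.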

In [26]:
prompt = "What are in these images? Is there any difference between them? Explain the details"

In [27]:
images_comparison(prompt, url1, url2)

What are in these images? Is there any difference between them? Explain the details

Image 1: https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/Paris%20(1).png?raw=true 
Image 2: https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/Paris%20(2).png?raw=true



Azure OpenAI GPT-4 Turbo with Vision results:

Both images show the Eiffel Tower in Paris, France. The main difference between the two images is the time of day they were taken. The first image was taken during the day, with clear blue skies and the sun shining on the tower. The second image was taken at night, with the tower illuminated by lights and the sky dark. Additionally, the first image shows more people gathered on the lawn in front of the tower, while the second image has fewer people and a more serene atmosphere.

[Note] These results are produced by an AI.
05-Mar-2024 11:36:21 Powered by Azure OpenAI GPT-4 Turbo with vision model


In [28]:
images_comparison(prompt, url1, url3)

What are in these images? Is there any difference between them? Explain the details

Image 1: https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/Paris%20(1).png?raw=true 
Image 2: https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/Paris%20(3).png?raw=true



Azure OpenAI GPT-4 Turbo with Vision results:

The first image is of the Eiffel Tower, and the second image is of the Arc de Triomphe. Both are famous landmarks located in Paris, France. The Eiffel Tower is a wrought-iron lattice tower that stands 324 meters tall and was completed in 1889. The Arc de Triomphe is a triumphal arch that stands 50 meters tall and was completed in 1836. The main difference between the two images is the subject matter, with one featuring the Eiffel Tower and the other featuring the Arc de Triomphe. Additionally, the first image is taken during the day with a clear blue sky, while the second image is taken during sunset with a more dramatic sky.

[Note] These results are produced by an AI.
05-Mar-2024 11:36:31 Powered by Azure OpenAI GPT-4 Turbo with vision model


In [29]:
images_comparison(prompt, url1, url4)

What are in these images? Is there any difference between them? Explain the details

Image 1: https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/Paris%20(1).png?raw=true 
Image 2: https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/Paris%20(4).png?raw=true



Azure OpenAI GPT-4 Turbo with Vision results:

Both images show the Eiffel Tower in Paris, France. The first image is a modern, colored photograph of the completed tower, with a clear blue sky in the background and people gathered on the grassy area in front of it. The second image is a black and white historical photograph of the Eiffel Tower during its construction, with the top portion still incomplete and surrounded by scaffolding. The surrounding area also appears to be less developed, with fewer buildings and more open space.

[Note] These results are produced by an AI.
05-Mar-2024 11:36:39 Powered by Azure OpenAI GPT-4 Turbo with vision model


In [30]:
images_comparison(prompt, url1, url1)

What are in these images? Is there any difference between them? Explain the details

Image 1: https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/Paris%20(1).png?raw=true 
Image 2: https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/Paris%20(1).png?raw=true



Azure OpenAI GPT-4 Turbo with Vision results:

Both images show the Eiffel Tower in Paris, France. There is no difference between the two images as they are identical. The Eiffel Tower is a famous landmark and a symbol of France, known for its iron lattice structure and height of 324 meters. The images show the tower on a clear day with a blue sky, and there are people gathered on the grassy area in front of it.

[Note] These results are produced by an AI.
05-Mar-2024 11:36:46 Powered by Azure OpenAI GPT-4 Turbo with vision model


## Other examples

In [31]:
url5 = "https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/car1.png?raw=true"
url6 = "https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/car2.png?raw=true"
url7 = "https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/car3.png?raw=true"

In [32]:
prompt = "Can you compare these images?"

In [33]:
images_comparison(prompt, url5, url6)

Can you compare these images?

Image 1: https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/car1.png?raw=true 
Image 2: https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/car2.png?raw=true



Azure OpenAI GPT-4 Turbo with Vision results:

Both images show cars in motion, with the first image featuring a red and black BMW i3 electric car driving on a road with mountains in the background, and the second image featuring a white BMW i4 electric car driving on a bridge with a sunset sky in the background. Both cars have a sleek and modern design, with the i3 having a more compact and futuristic look, and the i4 having a more traditional sedan shape with a sporty touch. The i3 is a smaller, city-focused electric vehicle, while the i4 is a larger, performance-oriented electric vehicle. Both images convey a sense of speed and motion, with the cars positioned at an angle that suggests they are quickly moving forward.

[Note] These results are produced by an AI.
05-Mar-2024 11:36:57 Powered by Azure OpenAI GPT-4 Turbo with vision model


In [34]:
images_comparison(prompt, url6, url7)

Can you compare these images?

Image 1: https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/car2.png?raw=true 
Image 2: https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/car3.png?raw=true



Azure OpenAI GPT-4 Turbo with Vision results:

Both images show the same car model, a white BMW i4, driving on a road. The first image has a sunset sky with clouds and a bridge in the background, while the second image has a clear blue sky with modern buildings in the background. The car is positioned at a slightly different angle in each image, with the first image showing more of the front of the car and the second image showing more of the side. Both images convey a sense of motion and speed.

[Note] These results are produced by an AI.
05-Mar-2024 11:37:06 Powered by Azure OpenAI GPT-4 Turbo with vision model


In [35]:
images_comparison(prompt, url5, url7)

Can you compare these images?

Image 1: https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/car1.png?raw=true 
Image 2: https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/car3.png?raw=true



Azure OpenAI GPT-4 Turbo with Vision results:

Both images show electric cars from the BMW brand, with the first image featuring the BMW i3 and the second image featuring the BMW i4. The i3 has a more compact and futuristic design with a two-tone color scheme, while the i4 has a sleek and sporty design with a more traditional sedan shape. Both cars have the signature BMW front grille and emblem, but the i4's grille is larger and more prominent. The i3 is shown driving on a scenic mountain road, while the i4 is shown in an urban setting with modern architecture in the background.

[Note] These results are produced by an AI.
05-Mar-2024 11:37:14 Powered by Azure OpenAI GPT-4 Turbo with vision model


In [36]:
images_comparison(prompt, url5, url5)

Can you compare these images?

Image 1: https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/car1.png?raw=true 
Image 2: https://github.com/retkowsky/Azure-OpenAI-demos/blob/main/images/car1.png?raw=true



Azure OpenAI GPT-4 Turbo with Vision results:

I'm sorry, I cannot compare these images as they are identical.

[Note] These results are produced by an AI.
05-Mar-2024 11:37:20 Powered by Azure OpenAI GPT-4 Turbo with vision model
