# Image Understanding

GPT-4o and GPT-4o mini can understand what they see in an image.

GPT-4o and GPT-4o mini are advertised as _multimodal_ models. A true multimodal model should be able to take input and give output in multiple _modes_. A mode can be text, images, audio, or video. For these models, the OpenAI API currently supports only images as an alternative input to text. It does not support image generation with them.

The model will also refuse to identify people in images.

In [1]:
#Import the openai package and set the API key. 
#I have my API key stored in an environment variable for this demo.
#In prod, you might prefer to use a secret store.
import openai
import os
import backoff

openai.api_key  = os.getenv('MY_API_KEY')
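
If the environment variable is missing, `os.getenv()` quietly returns `None` and the problem only surfaces later as an authentication error. A minimal sketch of a fail-fast check is shown below. The helper name `get_api_key` is hypothetical (it is not part of the openai package), and `MY_API_KEY` is simply the variable name used in this demo.

```python
import os

def get_api_key(env_var="MY_API_KEY"):
    """Return the API key stored in env_var, or raise if it is missing or empty."""
    api_key = os.getenv(env_var)
    if not api_key:
        # Fail here with a clear message instead of at the first API call
        raise RuntimeError("Environment variable {0} is not set or is empty.".format(env_var))
    return api_key
```

With this in place, the assignment above could be written as `openai.api_key = get_api_key()`.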

# Calling the model with a URL to an image

We can either pass the model a URL to an image, or pass it the image itself. The two approaches are very different. Let's start with passing an image URL to the model.

We will be using this picture from the internet to query the model. It's called _Sitta europaea wildlife 3_, by Paweł Kuźniar.
![A picture from the internet that we will be using](https://upload.wikimedia.org/wikipedia/commons/e/e1/Sitta_europaea_wildlife_3.jpg)

## Modifying our query function

We will be passing a complete messages list to our query function. So we'll first modify it to accept a messages list.

I've also renamed it to *query_model_single_turn()* because it's not just a _language_ model anymore.

In [2]:
#We are using the backoff package to handle the rate limit error.
#We wrap openai.chat.completions.create() in our own function
#and use the @backoff.on_exception() decorator.
@backoff.on_exception(backoff.expo, openai.RateLimitError)
def query_model_single_turn(prompt, model="gpt-4o-mini", temperature=0, **kwargs):
    """
    This function queries the OpenAI Chat Completions API, with exponential backoff.
    
    Args:
        prompt(str or list): The prompt or a list of messages.
        model(str): The type of model to use. The default is "gpt-4o-mini".
        temperature(float): The temperature to use. The default is 0.
        **kwargs: Additional keyword arguments to be passed to openai.chat.completions.create()
    Returns:
        An openai ChatCompletion object.
    """
    ##Set up the messages list
    if isinstance(prompt, str):
        messages = [{"role": "user", "content": prompt}]
    else:
        messages = prompt
    return openai.chat.completions.create(
        model=model,    
        messages=messages,
        temperature=temperature,
        **kwargs)
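
To make the retry behaviour less of a black box, here is a simplified, standard-library-only sketch of what "retry with exponential backoff" means. It is not the backoff package's implementation: the function name `retry_with_exponential_backoff` and its parameters are hypothetical, and it leaves out the random jitter that the backoff package can apply to the wait times.

```python
import time

def retry_with_exponential_backoff(func, exceptions, max_tries=5, base_delay=1.0, sleep=time.sleep):
    """
    Call func() and retry it when one of the given exceptions is raised.
    The wait doubles after every failed attempt: base_delay, 2*base_delay, 4*base_delay, ...
    """
    for attempt in range(max_tries):
        try:
            return func()
        except exceptions:
            # Out of attempts: let the caller see the original exception
            if attempt == max_tries - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The `sleep` argument exists only so the waiting can be replaced in a test. In real use the default `time.sleep` is what you want.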

In [3]:
messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's happening here?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/e/e1/Sitta_europaea_wildlife_3.jpg",
          },
        },
      ],
    }
  ]


In [4]:
response = query_model_single_turn(messages, model="gpt-4o")
the_reply = response.choices[0].message.content
print(the_reply)

In the image, a bird is perched on a person's hand, picking up seeds. The person is likely feeding the bird, and the bird appears to be comfortable enough to eat directly from the hand. The background is blurred, focusing attention on the interaction between the bird and the person.
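
The messages list used above can be wrapped in a small helper so that it does not have to be typed out for every image. This is a sketch only: the name `build_image_message` is hypothetical. It also sets the optional `detail` field of the `image_url` part, which, to my understanding of the Chat Completions API, accepts `"low"`, `"high"`, or `"auto"` and trades image resolution against token cost.

```python
def build_image_message(text, image_url, detail="auto"):
    """Build a one-message list containing a text part and an image URL part."""
    if detail not in ("low", "high", "auto"):
        raise ValueError("detail must be 'low', 'high' or 'auto'")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": text},
                {
                    "type": "image_url",
                    # "detail" is optional; "auto" lets the API choose
                    "image_url": {"url": image_url, "detail": detail},
                },
            ],
        }
    ]
```

The result can be passed straight to our query function, for example `query_model_single_turn(build_image_message("What's happening here?", the_url), model="gpt-4o")`, where `the_url` stands for the image URL.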


# Uploading an image to the model

Currently, we can't use the openai package to upload an image to the model. So we'll be using the REST API.

We'll be using the image in images/black-friday.jpg.
![The image that we will be prompting the model with](images/black-friday.jpg)

In [5]:
import base64
import requests
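
Sending a local image works by embedding the file's bytes in the request as a base64 data URL of the form `data:<mime type>;base64,<encoded bytes>`. The function in the next cell hard-codes `image/jpeg`, which is fine for our JPEG file. As a hedged sketch for other file types, the helper below guesses the MIME type from the file extension with the standard `mimetypes` module. The name `image_to_data_url` is hypothetical.

```python
import base64
import mimetypes

def image_to_data_url(path_to_image):
    """Read an image from disk and return it as a base64 data URL."""
    # Guess the MIME type from the file extension, e.g. ".png" -> "image/png"
    mime_type, _ = mimetypes.guess_type(path_to_image)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError("Could not determine an image type for {0}".format(path_to_image))
    with open(path_to_image, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return "data:{0};base64,{1}".format(mime_type, encoded)
```

The returned string goes in the `"url"` field of the `image_url` part, exactly where a normal web URL would go.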

In [12]:
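#Retry on KeyError (raised when the response has no "choices" key), but give up after 5 tries
@backoff.on_exception(backoff.expo, KeyError, max_tries=5)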
def upload_image_and_query_model_single_turn(prompt, path_to_image, api_key, model="gpt-4o-mini", temperature=0):
    """
    This function queries the model with a text prompt, and an image. 
    The image is uploaded from disk.
    
    Args:
        prompt(str): The prompt
        path_to_image(str): The path to the image
        api_key(str): The OpenAI API key
        model(str): The type of model to use. The default is "gpt-4o-mini".
        temperature(float): The temperature to use. The default is 0.
    Returns:
        The string of the model's reply.
    """
    #1. Read and encode the image
    with open(path_to_image, "rb") as image_file:
        the_encoded_image = base64.b64encode(image_file.read()).decode('utf-8')

    #2. Set up the messages list (only one message)
    messages_list = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": prompt
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "data:image/jpeg;base64,{0}".format(the_encoded_image)
                    }
                }
            ]
        }
    ]
    
    #3. Prepare the headers and the payload
    headers = {
      "Content-Type": "application/json",
      "Authorization": "Bearer {0}".format(api_key)
    }
    payload = {
        "model": model,
        "messages": messages_list,
        "temperature": temperature
    }
    #4. Make the API call
    response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
    #5. Parse the output
    response_dict = response.json()
    #6. Return the text of the reply
    ##If the response has no "choices" key (for example, an error response),
    ##this raises a KeyError, which the @backoff decorator catches and retries
    response_text = response_dict["choices"][0]["message"]["content"]
    return response_text
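
Relying on a `KeyError` tells us that something went wrong, but not what. As a sketch of a more informative alternative, the helper below checks for an error body before reading the reply. It assumes that a failed request returns JSON shaped like `{"error": {"message": ..., "type": ...}}`. The name `extract_reply` is hypothetical.

```python
def extract_reply(response_dict):
    """
    Return the reply text from a parsed chat completions response.
    Raises a RuntimeError carrying the API's own message if the response is an error.
    """
    if "error" in response_dict:
        # Assumed error shape: {"error": {"message": "...", "type": "..."}}
        error = response_dict["error"]
        raise RuntimeError("API error ({0}): {1}".format(error.get("type"), error.get("message")))
    return response_dict["choices"][0]["message"]["content"]
```

Note that a `RuntimeError` would not be retried by a decorator that only watches for `KeyError`. That is arguably what we want for problems like an invalid API key, where retrying cannot help.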

In [7]:
our_prompt = """
We have scraped the image from the internet. Please give the following information. Explain everything step by step.
1. Describe the image.
2. Describe what kind of website it likely came from.
3. Describe what it was most likely being used for.
Stay focused on your goals. With great focus, we will achieve fantastic outcomes!
"""

In [14]:
response = upload_image_and_query_model_single_turn(our_prompt, "images/black-friday.jpg", os.getenv('MY_API_KEY'))
print(response)

Sure! Here’s a step-by-step breakdown based on the image you provided:

### 1. Describe the Image
The image features a young woman with long, dark hair, smiling and looking upwards. She is wearing a gray, short-sleeved top. Surrounding her are numerous red squares with the word "SALE" prominently displayed in white text. The background is white, which enhances the visibility of the red squares and the text "BLACK FRIDAY" in large, bold black letters at the center of the image.

### 2. Describe What Kind of Website It Likely Came From
This image likely originated from a retail or e-commerce website, particularly one that promotes sales events. It could also be from a marketing or advertising site focused on seasonal promotions. The emphasis on "Black Friday" suggests it is aimed at consumers looking for deals during this major shopping event.

### 3. Describe What It Was Most Likely Being Used For
The image was most likely used for promotional purposes, such as an advertisement or a ban

Copyright &copy; Slava Razbash and AI Upskill (aiupskill.io)