In [1]:
## Install the necessary libraries
!pip install -q openai

In [4]:
## Make the necessary imports
import base64
from google.colab import userdata
from openai import OpenAI

# Read the API key from Colab's secrets store
client = OpenAI(api_key=userdata.get("OPENAI_API_KEY"))

In [5]:
# Function to encode the image
def encode_image(image_path):
    """Read the file at image_path and return its contents as a base64 string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")
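
The request in the next cell hard-codes `image/png` in the data URL. A minimal sketch of a helper that derives the MIME type from the file name instead, assuming the standard-library `mimetypes` table recognizes the extension. The names `bytes_to_data_url` and `image_to_data_url` are hypothetical and not part of the OpenAI SDK:

```python
import base64
import mimetypes


def bytes_to_data_url(data, filename):
    """Build a base64 data URL, guessing the MIME type from the file name."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {filename!r}")
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def image_to_data_url(image_path):
    """Read an image file and return it as a data URL (hypothetical helper)."""
    with open(image_path, "rb") as image_file:
        return bytes_to_data_url(image_file.read(), image_path)
```

The result of `image_to_data_url("/content/image1.png")` could then be passed directly as the `url` value of an `image_url` content part.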



In [10]:
# Path to your image
image_path = "/content/image1.png"

# Getting the base64 string
base64_image = encode_image(image_path)

# Send the request with the image to the API; the response arrives in about 15 seconds
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is in this image?",
                },
                {
                    "type": "image_url",
                    # The key is "detail" (singular); it accepts "low", "high", or "auto"
                    "image_url": {"url": f"data:image/png;base64,{base64_image}", "detail": "low"},
                },
            ],
        }
    ],
)


Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The image features a vibrant underwater scene showcasing a large seahorse and various types of fish, surrounded by colorful coral and sea vegetation. The artwork is characterized by intricate patterns and rich, vivid colors, creating a dynamic and captivating look at marine life.', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None))


# Vision Model: **Detail Parameter Documentation**

## **detail**  
**Type:** `string`  
**Required:** No (Optional)  
**Default Value:** `auto`  
**Description:** Controls the fidelity at which the model processes the image, and therefore how many input tokens the image consumes.  
**Options:**  
- **`auto`**:  
  The model will analyze the input image size and automatically decide whether to use the `low` or `high` setting. This is the default setting.  

- **`low`**:  
  Enables "low res" mode.  
  - The model processes a 512px x 512px low-resolution version of the image.  
  - The image is represented using a token budget of **85 tokens**.  
  - **Benefits:** Faster API responses and reduced input token consumption, ideal for use cases where high detail is not required.

- **`high`**:  
  Enables "high res" mode.  
  - The model first processes a low-resolution version of the image (512px x 512px, **85 tokens**) and then generates detailed crops using **170 tokens per 512px x 512px tile**.  
  - **Benefits:** Provides high-detail processing suitable for use cases that demand precise understanding of image details.
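
The token figures above can be turned into a rough cost estimate. The sketch below uses the 85-token base and 170 tokens per 512px tile quoted in this section. The resizing steps (fit within 2048px x 2048px, then shrink so the shortest side is 768px) are not stated above; they are an assumption based on OpenAI's published rule for GPT-4o-class models, and other models may use different figures. The function name `estimate_image_tokens` is hypothetical:

```python
import math


def estimate_image_tokens(width, height, detail="high"):
    """Estimate input tokens for one image (assumed GPT-4o-style accounting)."""
    if detail == "low":
        return 85  # fixed cost, regardless of image size
    if detail != "high":
        raise ValueError("detail must be 'low' or 'high'")

    # Assumption: the image is first scaled down to fit within 2048 x 2048.
    scale = min(1.0, 2048 / max(width, height))
    width, height = round(width * scale), round(height * scale)

    # Assumption: it is then scaled down so the shortest side is at most 768px.
    scale = min(1.0, 768 / min(width, height))
    width, height = round(width * scale), round(height * scale)

    # 170 tokens per 512px tile, plus the 85-token low-resolution overview.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 85 + 170 * tiles


print(estimate_image_tokens(1024, 1024, "high"))  # 4 tiles -> 765
print(estimate_image_tokens(2048, 4096, "high"))  # 6 tiles -> 1105
print(estimate_image_tokens(4096, 8192, "low"))   # 85
```

Treat the result as an estimate only; the `usage` field of the API response reports the actual token count.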


In [9]:
print(response.choices[0].message.content)

The image depicts an underwater scene featuring a prominent seahorse in the foreground, characterized by intricate and colorful designs. Surrounding it are various fish, along with corals and other marine flora, creating a vibrant and detailed underwater environment. The overall style appears artistic and richly patterned, enhancing the visual appeal of the marine life and ecosystem.


In [13]:
encoding1 = encode_image("/content/image1.png")
encoding2 = encode_image("/content/image2.png")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is in these images? Explain the differences between the two images.",
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{encoding1}"
                    },
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{encoding2}"
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)


Choice(finish_reason='length', index=0, logprobs=None, message=ChatCompletionMessage(content='The two images depict underwater scenes featuring seahorses and different types of fish, but they have distinct artistic styles and details.\n\n### Differences:\n\n1. **Artistic Style**:\n   - **First Image**: This image has a vibrant, intricate, and stylized design, characterized by swirling patterns and bright colors. The textures and details create a psychedelic effect, making it visually striking.\n   - **Second Image**: This image is more realistic and focused on clear, simpler lines and colors. It features a straightforward representation of the underwater world, using softer lighting and a more natural palette.\n\n2. **Seahorse Appearance**:\n   - **First Image**: The seahorse is more elaborate, with exaggerated features and a patterned body that gives it a fantastical look.\n   - **Second Image**: The seahorse is depicted in a more naturalistic style, with a realistic shape and colorin

In [14]:
print(response.choices[0].message.content)

The two images depict underwater scenes featuring seahorses and different types of fish, but they have distinct artistic styles and details.

### Differences:

1. **Artistic Style**:
   - **First Image**: This image has a vibrant, intricate, and stylized design, characterized by swirling patterns and bright colors. The textures and details create a psychedelic effect, making it visually striking.
   - **Second Image**: This image is more realistic and focused on clear, simpler lines and colors. It features a straightforward representation of the underwater world, using softer lighting and a more natural palette.

2. **Seahorse Appearance**:
   - **First Image**: The seahorse is more elaborate, with exaggerated features and a patterned body that gives it a fantastical look.
   - **Second Image**: The seahorse is depicted in a more naturalistic style, with a realistic shape and coloring typical of a seahorse.

3. **Background and Environment**:
   - **First Image**: The background is fil
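
The answer above stops mid-sentence because the request set `max_tokens=300`, and the returned choice reports `finish_reason='length'`. A minimal sketch of a check for this condition; `was_truncated` is a hypothetical helper, and the stand-in objects below only mimic the `finish_reason` attribute of a real choice:

```python
from types import SimpleNamespace


def was_truncated(choice):
    """Return True when generation stopped because the token limit was reached."""
    return getattr(choice, "finish_reason", None) == "length"


# Stand-ins for response.choices[0]; a real choice exposes the same attribute.
cut_off = SimpleNamespace(finish_reason="length")
complete = SimpleNamespace(finish_reason="stop")

print(was_truncated(cut_off))   # True
print(was_truncated(complete))  # False
```

When `was_truncated(response.choices[0])` is true, raising `max_tokens` or asking for a shorter answer in the prompt are two ways to get a complete reply.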

# Managing Images with the Chat Completions API

The Chat Completions API, unlike the Assistants API, is **not stateful**. This means that you must manage messages (including images) yourself. If you need to pass the same image multiple times, you will have to include the image in each API request.
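
Because no state is kept on the server, a follow-up question only works if the earlier messages, including the image, are sent again. A minimal sketch of keeping that history client-side; `start_conversation` and `add_turn` are hypothetical helpers, and the URL is a placeholder:

```python
def start_conversation(question, image_url, detail="auto"):
    """Create the first user message, containing the question and the image."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url, "detail": detail}},
            ],
        }
    ]


def add_turn(messages, assistant_reply, follow_up):
    """Return a new history with the model's reply and the next user question."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": follow_up},
    ]


# Placeholder URL; the image part stays in the history and is re-sent every time.
messages = start_conversation("What is in this image?", "https://example.com/image1.png")
messages = add_turn(messages, "An underwater scene with a seahorse.", "Which colors dominate?")
print(len(messages))  # 3
```

The full `messages` list is what would be passed to `client.chat.completions.create(...)` on each request.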

## Best Practices for Managing Images:
- **Use URLs for long conversations:** Instead of Base64-encoded images, we recommend passing images as URLs for long-running conversations.  
- **Optimize Image Size:**  
  - For **low res mode**, provide a 512px x 512px image.  
  - For **high res mode**, ensure the short side is **less than 768px** and the long side is **less than 2,000px**.  
- **Image Deletion:** Once processed, images are **deleted** from OpenAI's servers and are **not retained**. OpenAI does not use uploaded images to train models.
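
The size guidance above can be applied before upload. A minimal sketch that computes target dimensions while preserving the aspect ratio and never upscaling; it treats the limits as inclusive (an assumption, since the text says "less than"), and `recommended_size` is a hypothetical name:

```python
def recommended_size(width, height, detail="high"):
    """Return (width, height) scaled down to the suggested limits for a detail mode."""
    if detail == "low":
        # Fit the whole image inside a 512 x 512 box.
        scale = min(1.0, 512 / max(width, height))
    elif detail == "high":
        # Short side at most 768px, long side at most 2000px.
        scale = min(1.0, 768 / min(width, height), 2000 / max(width, height))
    else:
        raise ValueError("detail must be 'low' or 'high'")
    return max(1, round(width * scale)), max(1, round(height * scale))


print(recommended_size(4000, 3000, "high"))  # (1024, 768)
print(recommended_size(1024, 512, "low"))    # (512, 256)
```

The returned dimensions could be passed to any image library's resize function before encoding the file.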

---

# Limitations of GPT-4 with Vision

While GPT-4 with vision capabilities is versatile, there are some **limitations** to be aware of:

## **Medical Images**  
The model is **not suitable** for interpreting specialized medical images (e.g., CT scans) and should **not be used** for medical advice.

## **Non-English Text**  
The model may not perform optimally on images containing text in **non-Latin scripts** (e.g., Japanese, Korean).

## **Small Text**  
The model may misread **small text**. Enlarge text within the image to improve readability, but avoid cropping out important details.

## **Rotation**  
The model may misinterpret **rotated or upside-down** text or images.

## **Visual Elements**  
The model may struggle to interpret **graphs** or other images in which meaning depends on differences in line style (solid, dashed, dotted).

## **Spatial Reasoning**  
The model has difficulty with tasks requiring **precise spatial localization**, such as identifying chess positions.

## **Accuracy**  
The model may generate **incorrect descriptions or captions** in some cases.

## **Image Shape**  
The model may have difficulty interpreting **panoramic or fisheye images**.

## **Metadata and Resizing**  
The model does not process **original file names or metadata**. Images are resized before analysis, which can affect their original dimensions.

## **Counting Objects**  
The model may provide **approximate counts** for objects in images.

## **CAPTCHAs**  
For safety, **CAPTCHAs** are blocked from submission to the model.
