This notebook analyzes images using the multimodal GPT-4o model.

The analysis is applied to an image of a group of dogs with a [Set-Of-Marks](https://example.com/set-of-marks) (SOM) overlay.

The objective is to demonstrate that the model can analyze different types of images when SOM is applied.


| Image |
|----------|
<img src="images/dogs.png" alt="Group of dogs with Set-Of-Marks overlay" width="500" height="300">


In [None]:
!pip install -r requirements.txt

In [1]:
from dotenv import load_dotenv
import os
import requests
import base64

load_dotenv()

# Configuration
API_KEY = os.getenv('AZURE_OPENAI_API_KEY')
RAW_IMAGE_PATH = "images/dogs.png"
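# Read the image inside a context manager so the file handle is closed
with open(RAW_IMAGE_PATH, 'rb') as image_file:
    encoded_image = base64.b64encode(image_file.read()).decode('ascii')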
headers = {
    "Content-Type": "application/json",
    "api-key": API_KEY,
}
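
The data URL sent to the model should carry the MIME type of the actual image file. The following is a minimal sketch, assuming a hypothetical helper named `image_to_data_url` (not part of the original notebook), that guesses the type from the file extension with the standard-library `mimetypes` module:

```python
import base64
import mimetypes


def image_to_data_url(path: str) -> str:
    """Return a base64 data URL for the image at `path` (hypothetical helper)."""
    # Guess the MIME type from the file extension, e.g. ".png" -> "image/png".
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    # Read the file in a context manager so the handle is always closed.
    with open(path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

With this helper, `image_to_data_url(RAW_IMAGE_PATH)` could be used as the `url` value of the `image_url` part of the payload.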

In [36]:
system_message = """You are a dog expert. Your job is to exhaustively inspect each dog in the given photo and give a brief description of each dog."""
user_message = """
TASK: Create a single table with one row per dog and the columns: Number, Dog Size, Dog Breed, Color, Fur Type, Description. DO NOT hallucinate.
RULES:
- DO NOT INCLUDE ANYTHING THAT IS NOT CONSIDERED A DOG.
- You must fill in all the columns for each dog.
- You must fill in the table in the order of the dogs from left to right.
- Size is the primary size of the dog, ranging from small, medium, to large.
- Breed is the primary breed of the dog, estimated from the image.
- Color is the primary color of the dog.
- Fur Type is the primary fur type of the dog, ranging from short, medium, to long.
- If you cannot determine the information, leave the column blank.
- Description should use only information gathered from the image; try to include details such as pose and facial expression.
"""

# Payload for the request
payload = {
  "messages": [
    {
      "role": "system",
      "content": [
        {
          "type": "text",
          "text": system_message
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": user_message
        },

        {
          "type": "image_url",
          "image_url": {
            # The source file is a PNG, so the data URL declares image/png
            "url": f"data:image/png;base64,{encoded_image}"
          }
        },
        {
          "type": "text",
          "text": "\n"
        }
      ]
    }
  ],
  "temperature": 0.7,
  "top_p": 0.95,
  "max_tokens": 800
}
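
The same payload structure is used for every request in this notebook, so it can be assembled by a small function. The sketch below assumes a hypothetical helper named `build_payload` (not part of the original notebook); it keeps the sampling parameters shown above as defaults and omits the trailing `"\n"` text part, on the assumption that it does not affect the result:

```python
def build_payload(system_message, user_message, image_data_url,
                  temperature=0.7, top_p=0.95, max_tokens=800):
    """Assemble a chat-completions payload with one text part and one image part.

    Hypothetical helper: the name and signature are illustrative only.
    """
    return {
        "messages": [
            {
                "role": "system",
                "content": [{"type": "text", "text": system_message}],
            },
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": user_message},
                    {"type": "image_url", "image_url": {"url": image_data_url}},
                ],
            },
        ],
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    }
```

For example, `build_payload(system_message, user_message, f"data:image/png;base64,{encoded_image}")` would produce a payload equivalent to the one above, minus the newline part.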

In [37]:
ENDPOINT_BASE = os.getenv('AZURE_OPENAI_ENDPOINT')
API_VERSION = os.getenv('AZURE_OPENAI_API_VERSION')
MODEL_DEPLOYMENT = os.getenv('AZURE_OPENAI_MODEL_DEPLOYMENT')
# Normalize the base URL so the path is correct with or without a trailing slash
ENDPOINT = f"{ENDPOINT_BASE.rstrip('/')}/openai/deployments/{MODEL_DEPLOYMENT}/chat/completions?api-version={API_VERSION}"
# Send request
try:
    response = requests.post(ENDPOINT, headers=headers, json=payload)
    response.raise_for_status()  # Will raise an HTTPError if the HTTP request returned an unsuccessful status code
except requests.RequestException as e:
    raise SystemExit(f"Failed to make the request. Error: {e}")

# Handle the response as needed (e.g., print or process)
print(response.json())

{'choices': [{'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}, 'finish_reason': 'stop', 'index': 0, 'logprobs': None, 'message': {'content': '| Number | Dog Size | Dog Breed          | Color    | Fur Type | Description                                 |\n|--------|----------|--------------------|----------|----------|---------------------------------------------|\n| 3      | Small    | Unknown            | Brown/White | Short   | Sitting, ears perked up.                    |\n| 11     | Medium   | Labrador Retriever | Yellow    | Short    | Sitting, looking forward.                   |\n| 12     | Medium   | German Shepherd    | Brown/Black | Medium | Sitting, ears perked up, alert expression.  |\n| 4      | Large    | Bernese Mountain Dog | Black/White/Brown | Long | Sitting, relaxed posture, fluffy fur

In [None]:
import json

# Print the response JSON nicely
print(json.dumps(response.json(), indent=4))

In [38]:
# Extract and print the 'choices' -> 'message' -> 'content' part of the response JSON
choices_content = response.json().get('choices', [{}])[0].get('message', {}).get('content')
print(choices_content)

| Number | Dog Size | Dog Breed          | Color    | Fur Type | Description                                 |
|--------|----------|--------------------|----------|----------|---------------------------------------------|
| 3      | Small    | Unknown            | Brown/White | Short   | Sitting, ears perked up.                    |
| 11     | Medium   | Labrador Retriever | Yellow    | Short    | Sitting, looking forward.                   |
| 12     | Medium   | German Shepherd    | Brown/Black | Medium | Sitting, ears perked up, alert expression.  |
| 4      | Large    | Bernese Mountain Dog | Black/White/Brown | Long | Sitting, relaxed posture, fluffy fur.       |
| 5      | Large    | German Shepherd    | Brown/Black | Medium | Sitting, alert and attentive.               |
| 7      | Large    | German Shepherd    | Brown/Black | Medium | Sitting, ears perked up, alert expression.  |
| 8      | Medium   | Labrador Retriever | Black     | Short    | Sitting, calm demeanor.          
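
Because the model returns the table as Markdown text, it may be useful to convert it into Python records for further processing. The following is a minimal sketch, assuming a hypothetical helper named `parse_markdown_table` (not part of the original notebook) and a well-formed pipe-delimited table; lines that do not start and end with `|`, such as a row cut off by the token limit, are skipped:

```python
def parse_markdown_table(markdown: str) -> list:
    """Parse a pipe-delimited Markdown table into a list of dicts (hypothetical helper)."""
    headers = None
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        # Skip anything that is not a complete table row (e.g. a truncated line).
        if len(line) < 2 or not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if headers is None:
            headers = cells  # the first complete row is the header
            continue
        # Skip the separator row, whose cells contain only dashes and colons.
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        # Keep only rows whose cell count matches the header.
        if len(cells) == len(headers):
            rows.append(dict(zip(headers, cells)))
    return rows
```

Calling `parse_markdown_table(choices_content)` would then give one dictionary per dog, keyed by the column names requested in the prompt, so a specific mark can be looked up by its `Number` value.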

Asking about a specific SOM-numbered dog

In [6]:
system_message = """You are a dog expert. Your job is to exhaustively inspect each dog in the given photo and give a brief description of each dog."""
user_message = """
Can you give me detailed information about the dog labeled 12?
"""

# Payload for the request
payload = {
  "messages": [
    {
      "role": "system",
      "content": [
        {
          "type": "text",
          "text": system_message
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": user_message
        },
        {
          "type": "image_url",
          "image_url": {
            # The source file is a PNG, so the data URL declares image/png
            "url": f"data:image/png;base64,{encoded_image}"
          }
        },
        {
          "type": "text",
          "text": "\n"
        }
      ]
    }
  ],
  "temperature": 0.7,
  "top_p": 0.95,
  "max_tokens": 800
}


# Send request
try:
    response = requests.post(ENDPOINT, headers=headers, json=payload)
    response.raise_for_status()  # Will raise an HTTPError if the HTTP request returned an unsuccessful status code
except requests.RequestException as e:
    raise SystemExit(f"Failed to make the request. Error: {e}")

# Handle the response as needed (e.g., print or process)
print(response.json())

{'choices': [{'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}, 'finish_reason': 'stop', 'index': 0, 'logprobs': None, 'message': {'content': 'The dog labeled with the number 12 in the image is a German Shepherd. \n\n### Characteristics of a German Shepherd:\n- **Size:** Large\n- **Build:** Strong, athletic, and muscular\n- **Coat:** Medium-length double coat, often tan with a black saddle, but can come in various colors.\n- **Ears:** Erect and pointy\n- **Tail:** Bushy and curves slightly downward\n- **Temperament:** Intelligent, loyal, and versatile, often used in roles such as police and service work\n- **Exercise Needs:** High, requires regular physical and mental stimulation\n\nThis breed is known for its versatility and is often seen in roles requiring high intelligence and trainability.', 'role': 

In [7]:
# Extract and print the 'choices' -> 'message' -> 'content' part of the response JSON
choices_content = response.json().get('choices', [{}])[0].get('message', {}).get('content')
print(choices_content)

The dog labeled with the number 12 in the image is a German Shepherd. 

### Characteristics of a German Shepherd:
- **Size:** Large
- **Build:** Strong, athletic, and muscular
- **Coat:** Medium-length double coat, often tan with a black saddle, but can come in various colors.
- **Ears:** Erect and pointy
- **Tail:** Bushy and curves slightly downward
- **Temperament:** Intelligent, loyal, and versatile, often used in roles such as police and service work
- **Exercise Needs:** High, requires regular physical and mental stimulation

This breed is known for its versatility and is often seen in roles requiring high intelligence and trainability.
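
The chained `.get(...)` extraction used in the cells above raises an `IndexError` if `choices` is an empty list, and it does not reveal whether the answer was cut off. The following is a minimal sketch of a more defensive version, assuming a hypothetical helper named `extract_content` (not part of the original notebook) and the response shape shown in the outputs above:

```python
import warnings


def extract_content(response_json: dict) -> str:
    """Return the first choice's message content (hypothetical helper)."""
    choices = response_json.get("choices") or []
    if not choices:
        raise ValueError("Response contains no choices")
    first = choices[0]
    content = (first.get("message") or {}).get("content")
    if content is None:
        raise ValueError("First choice has no message content")
    # A finish_reason of "length" means generation stopped at max_tokens,
    # so the text (for example, a table) may be incomplete.
    if first.get("finish_reason") == "length":
        warnings.warn("Response was truncated by the max_tokens limit")
    return content
```

It would be called as `extract_content(response.json())` in place of the chained lookups.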


### Conclusion
The analysis shows that SOM markers make it possible to specify and target individual elements of an image for analysis.