This notebook analyzes images with the multimodal GPT-4o model. The analysis is first applied to a raw image of a shelf of products and then to an image of the shelf with a [Set-Of-Marks](https://example.com/set-of-marks) (SOM) overlay. The objective is to compare the results of the two analyses.


| Raw Image | Image with SOM |
|----------|----------|
| <img src="images/shelf.jpg" alt="Raw shelf image" width="200" height="300"> | <img src="images/shelf_image_SOM.jpg" alt="Shelf image with SOM overlay" width="200" height="300"> |


In [None]:
!pip install -r requirements.txt

In [9]:
from dotenv import load_dotenv
import os
import requests
import base64

load_dotenv()

# Configuration
API_KEY = os.getenv('AZURE_OPENAI_API_KEY')
RAW_IMAGE_PATH = "images/shelf.jpg"
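# Read the image and base64-encode it for the data URL used in the payload
with open(RAW_IMAGE_PATH, 'rb') as image_file:
    encoded_image = base64.b64encode(image_file.read()).decode('ascii')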
headers = {
    "Content-Type": "application/json",
    "api-key": API_KEY,
}
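The encoding step above is needed again for the SOM image, and the payload hard-codes `image/jpeg` in its data URL. As a minimal sketch, assuming you want one reusable step for both images, a small helper can guess the MIME type from the file extension and return the full data URL. The name `image_to_data_url` is hypothetical and not part of the original notebook.

```python
import base64
import mimetypes


def image_to_data_url(path, default_mime="image/jpeg"):
    """Return a base64 data URL for the image at `path` (hypothetical helper)."""
    # Guess the MIME type from the file extension; fall back to JPEG.
    mime, _ = mimetypes.guess_type(path)
    with open(path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("ascii")
    return f"data:{mime or default_mime};base64,{encoded}"
```

With this helper, the `"url"` field of the `image_url` part could be set to `image_to_data_url(RAW_IMAGE_PATH)` instead of formatting the string by hand.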

In [10]:
# Payload for the request
payload = {
  "messages": [
    {
      "role": "system",
      "content": [
        {
          "type": "text",
          "text": "You are a retail Coca-cola planogram and assortment auditor.  Our products SKUs are: Coca-cola Original can 330ml, Coca-cola Zero can 330ml, Fanta Orange can 330ml, Sprite Original can 330ml, Sprite Zero can 330ml, Dasani water bottle 500ml.  Your job is to exhaustively inspect each shelf of the cooler photo and audit to determine if the cooler is compliant or not by shelf and if there are products from competitors or different products."
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "TASK:  you are going to create a single audit table by each shelf with columns: Shelf Number, Required SKUs, Compliance summary with counts per product by comparing the below RULES of the PLANOGRAM with the above attached image, the image has 6 shelfs.  DO NOT hallucinate with counts, include only products with numbers on the same shelf and same SKU similarity."
        },
        {
          "type": "text",
          "text": "\n"
        },
        {
          "type": "text",
          "text": "PLANOGRAM RULES:"
        },
        {
          "type": "text",
          "text": "Shelf 1 (top):"
        },
        {
          "type": "text",
          "text": "\t- Must have 2 units of Coca-Cola Original bottle 330ml"
        },
        {
          "type": "text",
          "text": "\t- Must have 2 units of Coca-Cola Zero bottle 330ml"
        },
        {
          "type": "text",
          "text": "Shelf 2:"
        },
        {
          "type": "text",
          "text": "\t- Must have 3 units of Fanta Orange bottle 330ml"
        },
        {
          "type": "text",
          "text": "Shelf 3:"
        },
        {
          "type": "text",
          "text": "\t- Must have 8 units of Coca-Cola original can 330ml"
        },
        {
          "type": "text",
          "text": "Shelf 4:"
        },
        {
          "type": "text",
          "text": "\t- Must have 8 units of Coca-Cola original can 330ml"
        },
        {
          "type": "text",
          "text": "Shelf 5:"
        },
        {
          "type": "text",
          "text": "\t- Must have 8 units of Coca-Cola original can 330ml"
        },
        {
          "type": "text",
          "text": "Shelf 6 (bottom)"
        },
        {
          "type": "text",
          "text": "\t- Must have 16 units of Dasani Water bottle 550ml"
        },
        {
          "type": "text",
          "text": "\n"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": f"data:image/jpeg;base64,{encoded_image}"
          }
        },
        {
          "type": "text",
          "text": "\n"
        }
      ]
    }
  ],
  "temperature": 0.7,
  "top_p": 0.95,
  "max_tokens": 800
}
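The payload above sends every planogram line as its own text part. A minimal sketch, assuming the rules are easier to maintain as data, keeps them in a dictionary and renders them into a single text part. `PLANOGRAM` and `planogram_to_text` are hypothetical names; the quantities and product names are copied verbatim from the rules above. Whether a single text part changes the model's answer has not been tested here.

```python
# Required units per shelf, copied from the planogram rules in the payload above.
PLANOGRAM = {
    "Shelf 1 (top)": [(2, "Coca-Cola Original bottle 330ml"),
                      (2, "Coca-Cola Zero bottle 330ml")],
    "Shelf 2": [(3, "Fanta Orange bottle 330ml")],
    "Shelf 3": [(8, "Coca-Cola original can 330ml")],
    "Shelf 4": [(8, "Coca-Cola original can 330ml")],
    "Shelf 5": [(8, "Coca-Cola original can 330ml")],
    "Shelf 6 (bottom)": [(16, "Dasani Water bottle 550ml")],
}


def planogram_to_text(planogram):
    """Render the rules as one text block (hypothetical helper)."""
    lines = ["PLANOGRAM RULES:"]
    for shelf, rules in planogram.items():
        lines.append(f"{shelf}:")
        for units, product in rules:
            lines.append(f"\t- Must have {units} units of {product}")
    return "\n".join(lines)


# A single text part that could replace the per-line parts in the user message.
rules_part = {"type": "text", "text": planogram_to_text(PLANOGRAM)}
print(rules_part["text"])
```

Keeping the rules as data also makes them available later for checking the model's reported counts.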

In [13]:
ENDPOINT_BASE = os.getenv('AZURE_OPENAI_ENDPOINT')
API_VERSION = os.getenv('AZURE_OPENAI_API_VERSION')
MODEL_DEPLOYMENT = os.getenv('AZURE_OPENAI_MODEL_DEPLOYMENT')
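# Strip any trailing slash so the URL is valid whether or not AZURE_OPENAI_ENDPOINT ends with "/"
ENDPOINT = f"{ENDPOINT_BASE.rstrip('/')}/openai/deployments/{MODEL_DEPLOYMENT}/chat/completions?api-version={API_VERSION}"
# Send request
try:
    # The timeout keeps the cell from hanging indefinitely if the service does not respond
    response = requests.post(ENDPOINT, headers=headers, json=payload, timeout=60)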
    response.raise_for_status()  # Will raise an HTTPError if the HTTP request returned an unsuccessful status code
except requests.RequestException as e:
    raise SystemExit(f"Failed to make the request. Error: {e}")

# Handle the response as needed (e.g., print or process)
print(response.json())

{'choices': [{'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}, 'finish_reason': 'stop', 'index': 0, 'logprobs': None, 'message': {'content': "Here's the audit table based on the provided cooler image:\n\n| Shelf Number | Required SKUs                                    | Compliance Summary                           |\n|--------------|--------------------------------------------------|----------------------------------------------|\n| 1 (top)      | 2 units of Coca-Cola Original bottle 330ml       | Coca-Cola Original bottle 330ml: 2 units     |\n|              | 2 units of Coca-Cola Zero bottle 330ml           | Coca-Cola Zero bottle 330ml: 2 units         |\n| 2            | 3 units of Fanta Orange bottle 330ml             | Fanta Orange bottle 330ml: 1 unit            |\n| 3            | 8 units of Co

In [14]:
import json

# Print the response JSON nicely
print(json.dumps(response.json(), indent=4))

{
    "choices": [
        {
            "content_filter_results": {
                "hate": {
                    "filtered": false,
                    "severity": "safe"
                },
                "self_harm": {
                    "filtered": false,
                    "severity": "safe"
                },
                "sexual": {
                    "filtered": false,
                    "severity": "safe"
                },
                "violence": {
                    "filtered": false,
                    "severity": "safe"
                }
            },
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "Here's the audit table based on the provided cooler image:\n\n| Shelf Number | Required SKUs                                    | Compliance Summary                           |\n|--------------|--------------------------------------------------|-------------------------

In [17]:
# Extract and print the 'choices' -> 'message' -> 'content' part of the response JSON
choices_content = response.json().get('choices', [{}])[0].get('message', {}).get('content')
print(choices_content)

Here's the audit table based on the provided cooler image:

| Shelf Number | Required SKUs                                    | Compliance Summary                           |
|--------------|--------------------------------------------------|----------------------------------------------|
| 1 (top)      | 2 units of Coca-Cola Original bottle 330ml       | Coca-Cola Original bottle 330ml: 2 units     |
|              | 2 units of Coca-Cola Zero bottle 330ml           | Coca-Cola Zero bottle 330ml: 2 units         |
| 2            | 3 units of Fanta Orange bottle 330ml             | Fanta Orange bottle 330ml: 1 unit            |
| 3            | 8 units of Coca-Cola original can 330ml          | Coca-Cola Original can 330ml: 2 units        |
| 4            | 8 units of Coca-Cola original can 330ml          | Coca-Cola Original can 330ml: 1 unit         |
| 5            | 8 units of Coca-Cola original can 330ml          | Coca-Cola Original can 330ml: 2 units        |
| 6 (bottom)   | 16 
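The model returns the audit as a Markdown table inside a text reply, which is hard to compare across runs. As a minimal sketch, assuming the reply keeps the pipe-delimited layout shown above, the table can be parsed into a list of dictionaries. `parse_markdown_table` is a hypothetical helper, and the sample rows are copied from the output above with the column padding removed.

```python
def parse_markdown_table(text):
    """Parse a pipe-delimited Markdown table in `text` into a list of row dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # Ignore prose around the table.
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the separator row made of dashes (and optional colons).
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        if header is None:
            header = cells
            continue
        row = dict(zip(header, cells))
        # A blank first cell continues the previous shelf (as for shelf 1 above).
        if rows and not cells[0]:
            row[header[0]] = rows[-1][header[0]]
        rows.append(row)
    return rows


sample = """Here's the audit table based on the provided cooler image:

| Shelf Number | Required SKUs | Compliance Summary |
|--------------|---------------|--------------------|
| 1 (top) | 2 units of Coca-Cola Original bottle 330ml | Coca-Cola Original bottle 330ml: 2 units |
| | 2 units of Coca-Cola Zero bottle 330ml | Coca-Cola Zero bottle 330ml: 2 units |
| 2 | 3 units of Fanta Orange bottle 330ml | Fanta Orange bottle 330ml: 1 unit |
"""

for row in parse_markdown_table(sample):
    print(row["Shelf Number"], "->", row["Compliance Summary"])
```

In the notebook, the same function could be called on `choices_content`. The model is free to change the table layout between runs, so the parsed rows should be checked before they are relied on.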

Repeat the analysis above with the SOM image.

In [18]:
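SOM_IMAGE_PATH = "images/shelf_image_SOM.jpg"  # Image with the Set-Of-Marks overlay
# Read the SOM image and base64-encode it for the data URL used in the payload
with open(SOM_IMAGE_PATH, 'rb') as image_file:
    encoded_image = base64.b64encode(image_file.read()).decode('ascii')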

In [19]:
# Payload for the request
payload = {
  "messages": [
    {
      "role": "system",
      "content": [
        {
          "type": "text",
          "text": "You are a retail Coca-cola planogram and assortment auditor.  Our products SKUs are: Coca-cola Original can 330ml, Coca-cola Zero can 330ml, Fanta Orange can 330ml, Sprite Original can 330ml, Sprite Zero can 330ml, Dasani water bottle 500ml.  Your job is to exhaustively inspect each shelf of the cooler photo and audit to determine if the cooler is compliant or not by shelf and if there are products from competitors or different products."
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "TASK:  you are going to create a single audit table by each shelf with columns: Shelf Number, Required SKUs, Compliance summary with counts per product by comparing the below RULES of the PLANOGRAM with the above attached image, the image has 6 shelfs.  DO NOT hallucinate with counts, include only products with numbers on the same shelf and same SKU similarity."
        },
        {
          "type": "text",
          "text": "\n"
        },
        {
          "type": "text",
          "text": "PLANOGRAM RULES:"
        },
        {
          "type": "text",
          "text": "Shelf 1 (top):"
        },
        {
          "type": "text",
          "text": "\t- Must have 2 units of Coca-Cola Original bottle 330ml"
        },
        {
          "type": "text",
          "text": "\t- Must have 2 units of Coca-Cola Zero bottle 330ml"
        },
        {
          "type": "text",
          "text": "Shelf 2:"
        },
        {
          "type": "text",
          "text": "\t- Must have 3 units of Fanta Orange bottle 330ml"
        },
        {
          "type": "text",
          "text": "Shelf 3:"
        },
        {
          "type": "text",
          "text": "\t- Must have 8 units of Coca-Cola original can 330ml"
        },
        {
          "type": "text",
          "text": "Shelf 4:"
        },
        {
          "type": "text",
          "text": "\t- Must have 8 units of Coca-Cola original can 330ml"
        },
        {
          "type": "text",
          "text": "Shelf 5:"
        },
        {
          "type": "text",
          "text": "\t- Must have 8 units of Coca-Cola original can 330ml"
        },
        {
          "type": "text",
          "text": "Shelf 6 (bottom)"
        },
        {
          "type": "text",
          "text": "\t- Must have 16 units of Dasani Water bottle 550ml"
        },
        {
          "type": "text",
          "text": "\n"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": f"data:image/jpeg;base64,{encoded_image}"
          }
        },
        {
          "type": "text",
          "text": "\n"
        }
      ]
    }
  ],
  "temperature": 0.7,
  "top_p": 0.95,
  "max_tokens": 800
}

In [21]:
ENDPOINT_BASE = os.getenv('AZURE_OPENAI_ENDPOINT')
API_VERSION = os.getenv('AZURE_OPENAI_API_VERSION')
MODEL_DEPLOYMENT = os.getenv('AZURE_OPENAI_MODEL_DEPLOYMENT')
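# Strip any trailing slash so the URL is valid whether or not AZURE_OPENAI_ENDPOINT ends with "/"
ENDPOINT = f"{ENDPOINT_BASE.rstrip('/')}/openai/deployments/{MODEL_DEPLOYMENT}/chat/completions?api-version={API_VERSION}"

# Send request
try:
    # The timeout keeps the cell from hanging indefinitely if the service does not respond
    response = requests.post(ENDPOINT, headers=headers, json=payload, timeout=60)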
    response.raise_for_status()  # Will raise an HTTPError if the HTTP request returned an unsuccessful status code
except requests.RequestException as e:
    raise SystemExit(f"Failed to make the request. Error: {e}")

# Handle the response as needed (e.g., print or process)
print(response.json())

{'choices': [{'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}, 'finish_reason': 'stop', 'index': 0, 'logprobs': None, 'message': {'content': 'Below is the audit table based on the provided image and planogram rules:\n\n| Shelf Number | Required SKUs                                          | Compliance Summary                                       |\n|--------------|---------------------------------------------------------|----------------------------------------------------------|\n| 1 (top)      | 2 units of Coca-Cola Original bottle 330ml,             | 2 Coca-Cola Original bottle 330ml: 2 units               |\n|              | 2 units of Coca-Cola Zero bottle 330ml                  | 2 Coca-Cola Zero bottle 330ml: 2 units                   |\n| 2            | 3 units of Fanta Orange bottle 330ml   

In [10]:
# Extract and print the 'choices' -> 'message' -> 'content' part of the response JSON
choices_content = response.json().get('choices', [{}])[0].get('message', {}).get('content')
print(choices_content)

| Shelf Number | Required SKUs                            | Compliance Summary                                                                                     |
|--------------|----------------------------------------|---------------------------------------------------------------------------------------------------------|
| 1            | 2 units of Coca-Cola Original bottle 330ml, 2 units of Coca-Cola Zero bottle 330ml | 2 units of Coca-Cola Original bottle 330ml, 2 units of Coca-Cola Zero bottle 330ml - Compliant            |
| 2            | 3 units of Fanta Orange bottle 330ml   | 2 units of Fanta Orange bottle 330ml - Non-compliant (1 unit missing)                                     |
| 3            | 8 units of Coca-Cola Original can 330ml | 3 units of Coca-Cola Original can 330ml - Non-compliant (5 units missing)                                 |
| 4            | 8 units of Coca-Cola Original can 330ml | 2 units of Coca-Cola Original can 330ml - Non-compliant (6 units miss
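The reply above includes the model's own arithmetic ("1 unit missing", "5 units missing"). A minimal sketch, assuming the reported counts have already been extracted per product, recomputes the shortfall deterministically so the two runs can be compared without trusting that arithmetic. `shelf_shortfall` is a hypothetical helper; the example counts are taken from the shelf 2 and shelf 3 rows of the output above.

```python
def shelf_shortfall(required, reported):
    """Return {product: missing_units} for products below the required count."""
    return {
        product: units - reported.get(product, 0)
        for product, units in required.items()
        if reported.get(product, 0) < units
    }


# Shelf 2: 3 units required, 2 reported in the output above.
print(shelf_shortfall({"Fanta Orange bottle 330ml": 3},
                      {"Fanta Orange bottle 330ml": 2}))
# Shelf 3: 8 units required, 3 reported in the output above.
print(shelf_shortfall({"Coca-Cola Original can 330ml": 8},
                      {"Coca-Cola Original can 330ml": 3}))
```

An empty dictionary means the shelf meets every required count. This checks only the reported numbers against the planogram; it says nothing about whether the model counted the products in the photo correctly.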

### Conclusion
The analysis shows that the SOM image yields poorer results. This is likely because the products in the image are very small, so the marks hide the details of the cans and bottles.