## Few-Shot Multimodal Classification

This notebook gives a quick example of few-shot learning for multimodal image classification. The model sees a few labeled example images and then classifies a new one.

In [0]:
%pip install httpx openai pillow
%restart_python

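Requests have a byte limit for images, so we write a quick function that downloads each image, shrinks it, and returns it as a base64-encoded JPEG. `Image.thumbnail` scales each image down to fit within 400 x 400 pixels while preserving its aspect ratio, so images of different shapes all come out small enough.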
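Here is a small stdlib-only sketch of the size arithmetic behind `Image.thumbnail`, which makes the resizing behavior concrete. One scale factor is applied to both sides, so the aspect ratio is kept, and images already inside the box are left alone. `fit_within_box` is a hypothetical helper. Pillow's own rounding chooses between floor and ceiling a little differently, so treat the results as approximate.

```python
def fit_within_box(width, height, box=(400, 400)):
    """Return (new_width, new_height) fitting inside `box` with the same aspect ratio.

    Hypothetical helper that approximates PIL's Image.thumbnail:
    images already inside the box are returned unchanged (no upscaling).
    """
    max_w, max_h = box
    if width <= max_w and height <= max_h:
        return width, height
    # A single scale factor for both sides keeps width/height constant.
    scale = min(max_w / width, max_h / height)
    return max(1, round(width * scale)), max(1, round(height * scale))

print(fit_within_box(1200, 800))  # (400, 267): width is the limiting side
print(fit_within_box(600, 1500))  # (160, 400): height is the limiting side
print(fit_within_box(300, 200))   # (300, 200): already small enough
```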

In [0]:
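import httpx
import base64
from PIL import Image
import io

def fetch_and_resize_image(url, size=(400, 400)):
    """Download an image, shrink it to fit within `size`, and return it as a base64 JPEG string."""
    # httpx does not follow redirects by default; also fail loudly on HTTP errors
    response = httpx.get(url, follow_redirects=True)
    response.raise_for_status()
    image = Image.open(io.BytesIO(response.content))
    # JPEG has no alpha channel, so normalize the mode before re-encoding
    image = image.convert("RGB")
    # Resize in place: keeps the aspect ratio and never upscales
    image.thumbnail(size)
    buffered = io.BytesIO()
    image.save(buffered, format="JPEG")
    return base64.standard_b64encode(buffered.getvalue()).decode("utf-8")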

media_type = "image/jpeg"

combo_data = fetch_and_resize_image("https://upload.wikimedia.org/wikipedia/commons/b/b7/Kluc_ockoplochy.jpg")
adjustable_data = fetch_and_resize_image("https://upload.wikimedia.org/wikipedia/commons/4/44/Adjustablewrenches.jpg")
pipe_wrench_data = fetch_and_resize_image("https://upload.wikimedia.org/wikipedia/commons/b/b1/Ridgid_10%22_pipe_wrench.jpg")
test_adjustable_data = fetch_and_resize_image("https://upload.wikimedia.org/wikipedia/commons/4/43/AdjustableWrenchWhiteBackground.jpg")
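Base64 encoding inflates the payload by about a third, so the resized JPEG has to be noticeably smaller than whatever limit the endpoint enforces. The notebook does not state that limit. Below is a minimal stdlib sketch of the arithmetic, where `max_encoded_bytes` is a placeholder you would set from your endpoint's documentation and both helpers are hypothetical.

```python
import base64

def base64_size(num_bytes):
    """Length of the standard (padded) base64 encoding of `num_bytes` raw bytes."""
    # Every 3 input bytes become 4 output characters; a partial last group is padded to 4.
    return 4 * ((num_bytes + 2) // 3)

def fits_budget(raw_bytes, max_encoded_bytes):
    """True if `raw_bytes` stays within `max_encoded_bytes` once base64-encoded.

    `max_encoded_bytes` is a placeholder: look up the real limit for your endpoint.
    """
    return base64_size(len(raw_bytes)) <= max_encoded_bytes

jpeg_like = bytes(range(256)) * 100  # stand-in for 25,600 bytes of JPEG output
encoded = base64.standard_b64encode(jpeg_like)
print(len(jpeg_like), len(encoded), base64_size(len(jpeg_like)))  # 25600 34136 34136
```

In the notebook itself, `len(combo_data)` already gives the encoded size of the first example, because `fetch_and_resize_image` returns the base64 string.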

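The `content` field of a message can be a list of parts instead of a single string, which lets us interleave text and images in one request. Each example image gets a short text label right before it, and the final, unlabeled image is the one to classify. And bam, we have few-shot multimodal image classification!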
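The next cell writes the interleaved list out by hand. The same structure can be built from a list of (label, base64 image) pairs, which makes it easy to add or swap examples. `build_few_shot_content` is a hypothetical helper, not part of the `openai` package. It only assembles the same text and `image_url` parts used in the request.

```python
def build_few_shot_content(examples, query_image_b64, question, media_type="image/jpeg"):
    """Interleave labeled example images with a final, unlabeled query image.

    Hypothetical helper. `examples` is a list of (label, base64_string) pairs.
    Returns OpenAI-style content parts: a text part naming each example, then
    its image part, and finally the question followed by the query image.
    """
    def image_part(b64):
        # Inline the image as a data URL rather than hosting it somewhere.
        return {"type": "image_url", "image_url": {"url": f"data:{media_type};base64,{b64}"}}

    content = []
    for i, (label, b64) in enumerate(examples, start=1):
        content.append({"type": "text", "text": f"Image {i} - {label}"})
        content.append(image_part(b64))
    content.append({"type": "text", "text": question})
    content.append(image_part(query_image_b64))
    return content

parts = build_few_shot_content(
    [("a combo wrench", "AAAA"), ("an adjustable wrench", "BBBB")],
    "CCCC",
    "What is the name of the tool in the following image?",
)
print(len(parts))        # 6: two (text, image) pairs + question + query image
print(parts[0]["text"])  # Image 1 - a combo wrench
```

With the notebook's data, `examples` would be the three `(label, *_data)` pairs and the query would be `test_adjustable_data`.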

In [0]:
from openai import OpenAI

DATABRICKS_TOKEN = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().get()

client = OpenAI(
  api_key=DATABRICKS_TOKEN,
  base_url="https://adb-984752964297111.11.azuredatabricks.net/serving-endpoints"
)

chat_completion = client.chat.completions.create(
  messages=[
    {
      "role": "system",
      "content": "You are a mechanic tool classification expert."
    },
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Image 1 - a combo wrench"},
        {"type": "image_url", "image_url": {"url": f"data:{media_type};base64,{combo_data}"}},
        {"type": "text", "text": "Image 2 - an adjustable wrench"},
        {"type": "image_url", "image_url": {"url": f"data:{media_type};base64,{adjustable_data}"}},
        {"type": "text", "text": "Image 3 - a pipe wrench"},
        {"type": "image_url", "image_url": {"url": f"data:{media_type};base64,{pipe_wrench_data}"}},
        {"type": "text", "text": "What is the name of the tool in the following image and does it match any of the examples you were provided?"},
        {"type": "image_url", "image_url": {"url": f"data:{media_type};base64,{test_adjustable_data}"}}
      ]
    }
  ],
  model="databricks-claude-3-7-sonnet",
  max_tokens=1024
)

parsed_text = chat_completion.choices[0].message.content
print(parsed_text)
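The model answers in free text. If you need a single label per test image, for example to score many images, one crude option is to look for the known class names in the reply. `match_label` is a hypothetical, heuristic helper. It returns whichever known label appears earliest in the answer, so it can still be wrong if the reply discusses other classes first. Asking the model to reply with only the label would make it more reliable.

```python
import re

def match_label(answer, labels):
    """Return the label mentioned earliest in `answer`, or None if none appears.

    Hypothetical heuristic: case-insensitive whole-phrase search over free text.
    """
    text = answer.lower()
    best_label, best_pos = None, len(text) + 1
    for label in labels:
        m = re.search(r"\b" + re.escape(label.lower()) + r"\b", text)
        if m and m.start() < best_pos:
            best_label, best_pos = label, m.start()
    return best_label

labels = ["combo wrench", "adjustable wrench", "pipe wrench"]
reply = "This is an Adjustable Wrench. It matches Image 2, not the pipe wrench."
print(match_label(reply, labels))  # adjustable wrench
```

In the notebook, this would be called as `match_label(parsed_text, labels)`.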