# Fine-grained recognition using GPT-4V and few shot prompting
The primary concept behind this approach involves crafting a few-shot prompt for GPT-4, initially presenting it with samples from all the classes. Subsequently, we input the target image for classification to GPT-4 and request its prediction.

In this scenario, we begin by showcasing the following three images to GPT-4, illustrating the appearance of each class (n=3).

| smart fortwo Convertible 2012|Toyota Sequoia SUV 2012|Volvo 240 Sedan 1993|
|:-------------------------:|:-------------------------:|:-------------------------:|
|<img src="./cars/smart%20fortwo%20Convertible%202012.jpg" width="400">|<img src="./cars/Toyota%20Sequoia%20SUV%202012.jpg" width="400">|<img src="./cars/Volvo%20240%20Sedan%201993.jpg" width="400">|

Subsequently, we inquire of GPT-4V, "Based on the cars mentioned earlier, which car is portrayed in the following image?"

<img src="./cars/input.jpg" width="400">


In [5]:
import base64
import requests
from secret import *

# setup OpenAI API key
api_key = OPENAI_API_KEY

# function to encode the image 
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


all_cars = [
    "smart fortwo Convertible 2012",
    "Toyota Sequoia SUV 2012.jpg",
    "Volvo 240 Sedan 1993.jpg",
]

def build_class_knowledge(cars):
    class_knowledge = []
    for c in cars:
        encoded_img = encode_image("./cars/smart fortwo Convertible 2012.jpg")
        class_knowledge.extend(
            [
                {
                    "type": "text",
                    "text": "This car is smart fortwo Convertible 2012.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{encoded_img}"},
                },
            ]
        )
    return class_knowledge

# encode input image.
input_image = encode_image("./cars/input.jpg")

# prepare header for visiting API.
headers = {"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"}

# creating payload for requesting GPT-4V
payload = {
    "model": "gpt-4-vision-preview",
    "messages": [
        {
            "role": "user",
            "content": build_class_knowledge(all_cars)
            + [
                {
                    "type": "text",
                    "text": "Based on the cars mentioned earlier, which car is portrayed in the following image? Kindly provide your answer along with the pertinent reasons.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{input_image}"},
                },
            ],
        }
    ],
    "max_tokens": 300,
}

# send request to OpenAI API
response = requests.post(
    "https://api.openai.com/v1/chat/completions", headers=headers, json=payload
)

print(response.json())

{'id': 'chatcmpl-8vIZ2NFY5MFa4z0ltBJIKiIOBsr3l', 'object': 'chat.completion', 'created': 1708666748, 'model': 'gpt-4-1106-vision-preview', 'usage': {'prompt_tokens': 2620, 'completion_tokens': 206, 'total_tokens': 2826}, 'choices': [{'message': {'role': 'assistant', 'content': 'Based on the images provided, the car portrayed in the last image is also a Smart Fortwo Convertible. The pertinent reasons for this identification are:\n\n1. Compact Size: The vehicle in the image is a small, two-seater car, which is characteristic of the Smart Fortwo model.\n\n2. Design Features: The headlight design, the front grille shape, and placement of the smart logo are the same as in the images of the Smart Fortwo Convertible 2012 you provided earlier.\n\n3. Convertible Roof: The car in the image has an open top, which is consistent with the convertible version of the Smart Fortwo.\n\n4. Distinctive Markings: The rear portion of the side panels features a colored accent, which is a design element frequ

#### The response provided by GPT-4 is as follows, ,which is accurate, supported by compelling rationale.

In [7]:
print(response.json()['choices'][0]['message']['content'])

Based on the images provided, the car portrayed in the last image is also a Smart Fortwo Convertible. The pertinent reasons for this identification are:

1. Compact Size: The vehicle in the image is a small, two-seater car, which is characteristic of the Smart Fortwo model.

2. Design Features: The headlight design, the front grille shape, and placement of the smart logo are the same as in the images of the Smart Fortwo Convertible 2012 you provided earlier.

3. Convertible Roof: The car in the image has an open top, which is consistent with the convertible version of the Smart Fortwo.

4. Distinctive Markings: The rear portion of the side panels features a colored accent, which is a design element frequently found on Smart cars.

The combination of these features, particularly the size, design cues, and open-top, make it clear that the car in the last image is a Smart Fortwo Convertible, similar to the one in the first three images.
