## Vision Agents with `smolagents`

Suppose Alfred wants to verify the identities of the superheroes attending the party. He already has a dataset of images from previous parties, labeled with the guests' names. Given a new visitor’s image, the agent can compare it against this dataset and decide whether to let them in.

In this case, a guest is trying to enter, and Alfred suspects that this visitor might be The Joker impersonating Wonder Woman. Alfred needs to verify their identity to prevent anyone unwanted from entering.

### Providing pre-determined images to Alfred

In [1]:
from PIL import Image
import requests
from io import BytesIO

image_urls = [
    "https://upload.wikimedia.org/wikipedia/commons/e/e8/The_Joker_at_Wax_Museum_Plus.jpg", # Joker image
    "https://upload.wikimedia.org/wikipedia/en/9/98/Joker_%28DC_Comics_character%29.jpg" # Joker image
]

images = []

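# Some hosts, Wikimedia among them, may reject requests that lack a browser-like User-Agent.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36"
}

for url in image_urls:
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()  # fail early instead of handing an error page to PIL
    image = Image.open(BytesIO(response.content))
    images.append(image)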

In [None]:
import os
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")  # or paste your key here

In [6]:
from smolagents import CodeAgent, OpenAIServerModel

model = OpenAIServerModel(model_id='gpt-4o-mini', api_key=OPENAI_API_KEY)

agent = CodeAgent(model=model, tools=[], max_steps=5, verbosity_level=2)

response = agent.run(
    """
    Describe the costume and makeup that the comic character in these photos is wearing and return the description.
    Tell me if the guest is The Joker or Wonder Woman.
    """, images=images
)

In [7]:
response

{'description': {'makeup': {'face': 'Pale white face.',
   'eyes': 'Bright blue with dark eyeliner.',
   'lips': 'Exaggerated red smile.',
   'hair': 'Bright green, wild and unkempt.'},
  'costume': {'jacket': 'Bright purple suit jacket.',
   'shirt': 'Bright yellow shirt underneath.',
   'accessories': 'Contrasting tie and a flower lapel that can squirt water.'},
  'character': 'The Joker'}}
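
To turn the description into an entry decision, Alfred needs to read the identified character out of the result and compare it with the guest the visitor claims to be. The sketch below is one possible way to do that, not part of `smolagents`: the helper names `find_character` and `should_admit` are hypothetical, and it assumes the agent returned a dictionary with a `character` key somewhere inside it, as in the run above. The agent's return value is not guaranteed to have this shape, so anything else is treated as "not identified".

```python
from typing import Any, Optional


def find_character(result: Any) -> Optional[str]:
    """Return the first string stored under a 'character' key, searching nested dicts."""
    if isinstance(result, dict):
        value = result.get("character")
        if isinstance(value, str):
            return value
        for nested in result.values():
            found = find_character(nested)
            if found is not None:
                return found
    return None


def should_admit(result: Any, expected_guest: str) -> bool:
    """Admit only when the identified character matches the expected guest."""
    character = find_character(result)
    if character is None:
        return False  # no identification: refuse rather than guess
    return character.strip().lower() == expected_guest.strip().lower()


# Abridged copy of the output shown above (assumed shape, not a fixed schema).
sample_response = {
    "description": {
        "makeup": {"hair": "Bright green, wild and unkempt."},
        "costume": {"jacket": "Bright purple suit jacket."},
        "character": "The Joker",
    }
}

print(find_character(sample_response))                # The Joker
print(should_admit(sample_response, "Wonder Woman"))  # False
```

Refusing entry when no character is found keeps the check conservative: an unexpected response format never results in a guest being admitted by accident.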

### Dynamic retrieval of images by Alfred

See `vision_web_browser.py`.