# Part 3: Creating a Basic Image Agent

![Alt text](img/augLLMs.png)

In this tutorial, we'll build a simplified image classifier agent with Gemma 3.

There are two parts:

* **Multimodal Gemma Classifier** - Using a Gemma model to detect what's in the image and provide a specific output.
* **Downstream Action** - A simple function that acts on the classifier's result, such as sending an email or anything else!

## Putting it all together

Now that we have an image model ready, let's set up a simple function to interact with the model and its outputs.

Let's define `image_chat(prompt, img_path)`, a function that sends a prompt and an image to the model and returns the response text.

In [28]:
from ollama import chat
from IPython.display import Markdown, display

image_path = "img/NotHotDog.jpg"  # Replace with the actual path to your image file



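def image_chat(prompt, img_path, model="gemma3:27b-it-qat"):
    """Send a prompt and one image to the model and return the reply text."""
    response = chat(
        model=model,
        messages=[
            {
                'role': 'user',
                'content': prompt,
                # The image is passed as a file path
                'images': [img_path]
            }
        ]
    )
    return response["message"]["content"]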

prompt = "What is this an image of?"
output = image_chat(prompt, image_path)

display(Markdown(output))

This image shows a **dachshund (wiener dog) dressed up as a hot dog!** 

The dog is wearing a costume that makes it look like a complete hot dog, complete with a bun and mustard. It’s a playful and humorous image, often seen in pet costume contests or for a bit of fun.

## Hot Dog or Not Hot Dog Classifier

In [29]:
image_path = "img/NotHotDog.jpg"  # Replace with the actual path to your image file

# image_chat() is unchanged from the previous cell.

prompt = 'Is this an image of the food item hot dog say yes, otherwise say no, no other output'
image_chat(prompt, image_path) 

'no'

In [30]:
image_path = "img/NotHotDog.jpg"  # Replace with the actual path to your image file

def image_chat(prompt, img_path):
    # The prompt is hard-coded below, so the `prompt` argument is ignored.
    # It stays in the signature so that earlier calls keep working.
    response = chat(
        model="gemma3:27b-it-qat",  # A vision-capable Gemma 3 model
        messages=[
            {
                'role': 'user',
                'content': 'Is this an image of the food item hot dog say yes, otherwise say no, no other output',
                'images': [img_path]
            }
        ]
    )
    return response["message"]["content"]


image_chat(None, image_path)

'no'
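The reply above happens to be exactly `'no'`, but a model may also return variants such as `"No."` or `"Yes\n"`. As a minimal sketch of one way to guard against that, the helper below (the name `normalize_yes_no` is hypothetical, not part of Ollama or this notebook) trims common decoration and raises an error on anything unexpected:

```python
def normalize_yes_no(reply: str) -> bool:
    """Map a short model reply to True ('yes') or False ('no')."""
    # Lowercase, then trim whitespace, quotes, markdown emphasis, and punctuation.
    cleaned = reply.lower().strip(" \t\r\n\"'`*.!,")
    if cleaned == "yes":
        return True
    if cleaned == "no":
        return False
    # Anything else is unexpected, so fail loudly instead of guessing.
    raise ValueError(f"Expected 'yes' or 'no', got: {reply!r}")


print(normalize_yes_no("No."))
```

Raising on unexpected replies is a design choice: it surfaces prompt drift early rather than silently treating every non-"yes" answer as "no".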

## Parsing the response
With the prompt settled, we can wrap the model call in a simple classifier. The return values used here are placeholders: swap in any Python logic you like, whether it's sending an email or anything else.

In [13]:
image_path = "img/Hot_dog_with_mustard.png"  # Replace with the actual path to your image file

image_chat(None, image_path) 

'yes'

In [18]:
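def parse_response(img_path):
    # Classify the image passed in by the caller
    response = image_chat(None, img_path)
    # Tolerate stray whitespace or capitalization in the reply
    if response.strip().lower() == 'yes':
        return "Give treat"
    return "Add ketchup"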

parse_response(image_path)

'Give treat'
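To make the "sending an email" idea concrete, here is a hedged sketch of a downstream action. It assumes a classifier callable with the same `(prompt, img_path)` signature as `image_chat`. The names `classify_and_act`, `build_hot_dog_email`, and `fake_classifier`, along with the email addresses, are all hypothetical. A stand-in classifier is used so the sketch runs without a model server, and the email is only built, not sent.

```python
from email.message import EmailMessage
from typing import Callable, Optional

PROMPT = "Is this an image of the food item hot dog say yes, otherwise say no, no other output"


def build_hot_dog_email(img_path: str) -> EmailMessage:
    """Build (but do not send) a notification email. Addresses are placeholders."""
    msg = EmailMessage()
    msg["From"] = "agent@example.com"
    msg["To"] = "you@example.com"
    msg["Subject"] = "Hot dog detected"
    msg.set_content(f"The model classified {img_path} as a hot dog.")
    return msg


def classify_and_act(
    img_path: str, classifier: Callable[[str, str], str]
) -> Optional[EmailMessage]:
    """Ask the classifier about the image, then run the matching action."""
    reply = classifier(PROMPT, img_path).strip().lower()
    if reply == "yes":
        return build_hot_dog_email(img_path)
    return None  # Not a hot dog: nothing to send.


def fake_classifier(prompt: str, img_path: str) -> str:
    """Stand-in for image_chat so this sketch runs without Ollama."""
    return "yes" if "Hot_dog" in img_path else "no"


email = classify_and_act("img/Hot_dog_with_mustard.png", fake_classifier)
print(email["Subject"])  # prints: Hot dog detected
```

In the notebook you would pass `image_chat` in place of `fake_classifier`. To actually deliver the message, one option is the standard library's `smtplib.SMTP.send_message`, pointed at a mail server you control.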

## 🎯 Recap: What We Learned 

In this section, we built our first basic image classification agent that can differentiate between two images and respond accordingly.

Here are the key ideas to remember:
- **Not everything needs to be chat**: Models can be prompted to return parseable outputs quite easily, with no architecture changes needed.
- **The model can be an intermediate part of a system**: The model doesn't always need to be front and center of every application.
- **From there you can do anything**: We just returned strings, but with Python (or any other language) the system can act on the model's answer and do almost anything.
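The parseable-output idea is not limited to yes/no. As a rough sketch, assuming a small fixed label set (the labels and both helper names below are hypothetical), you can build the prompt from the labels and reject any reply that is not on the list:

```python
from typing import Optional, Sequence


def build_label_prompt(labels: Sequence[str]) -> str:
    """Ask for exactly one label from a fixed list."""
    options = ", ".join(labels)
    return (
        f"Classify this image as exactly one of: {options}. "
        "Reply with the label only, no other output."
    )


def parse_label(reply: str, labels: Sequence[str]) -> Optional[str]:
    """Return the matching label, or None if the reply is not on the list."""
    cleaned = reply.strip().strip(".").lower()
    for label in labels:
        if cleaned == label.lower():
            return label
    return None


LABELS = ["hot dog", "dog", "other"]  # hypothetical label set
print(build_label_prompt(LABELS))
print(parse_label("Dog.\n", LABELS))  # prints: dog
```

The prompt string would be passed to a function like `image_chat`, and `parse_label` would run on its reply before any action is taken.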

This pattern, where the LLM suggests and the framework acts, is the foundation for building more complex agents later.