# Part 1: Basic Image Prompting

In this notebook, we'll show how to interact with Gemma using images.

## Sending a Basic Prompt to the Model

Here's the simple chat flow to Gemma once again.

In [2]:
from ollama import chat
from ollama import ChatResponse

model = 'gemma3:4b'
# Note: inside the function, the prompt is referred to by the parameter name model_prompt
def model_call(model_prompt):
    
    response: ChatResponse = chat(model=model, messages=[
      {
        'role': 'user',
        'content': model_prompt,
      },
    ])
    return response['message']['content']

user_prompt = "Say hello to the class"

# Note: the caller passes user_prompt, which the function receives as model_prompt
model_call(user_prompt)

'Hello everyone! 😊 \n\nIt’s great to be here with you all today. \n\nHow’s everyone doing?'

## Adding an Image
Gemma 3 was trained to be multimodal: images are converted into embedding vectors the model can operate on. As a user, adding an image is straightforward.

In [3]:
image_path = "img/ducks.jpg"  

response = chat(
        model="gemma3:27b-it-qat",  
        messages=[
            {
                'role': 'user',
                'content': 'What is this?',
                'images': [image_path]
            }
        ]
    )

response["message"]["content"]

'The image shows nine yellow rubber duckies arranged in a 3x3 grid against a transparent checkered background. \n\nThey are the classic bath toy, typically bright yellow and with a simple, cheerful design. The background transparency suggests these might be images intended for digital compositing or design projects.'
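
The `images` list above holds a file path. The Ollama Python client also accepts raw image bytes in that list, which helps when the image is already in memory. The following is a minimal sketch; `build_image_message` is a hypothetical helper name, not part of the library.

```python
from pathlib import Path


def build_image_message(prompt: str, image_path: str) -> dict:
    """Build a user message that carries the image as raw bytes (hypothetical helper)."""
    path = Path(image_path)
    if not path.is_file():
        # Fail early with a clear error before any model call is made.
        raise FileNotFoundError(f"Image not found: {path}")
    return {
        'role': 'user',
        'content': prompt,
        'images': [path.read_bytes()],
    }
```

The returned dictionary can be passed to the model as `chat(model=model, messages=[build_image_message('What is this?', 'img/ducks.jpg')])`.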

In [7]:
image_path = "img/ducks.jpg"  

response = chat(
        model="gemma3:4b",  # Smaller model; same prompt and image as above
        messages=[
            {
                'role': 'user',
                'content': 'What is this?',
                'images': [image_path]
            }
        ]
    )

response["message"]["content"]

'This image shows nine yellow rubber ducks arranged in a grid on a transparent background.'
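
The two cells above differ only in the model name, so the comparison can be wrapped in a small loop that also records how long each call takes. This is a sketch under assumptions: `compare_models` is a hypothetical helper, and the chat function is passed in as `chat_fn` so the loop can be tried with a stub before running real models.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


def compare_models(
    chat_fn: Callable,
    models: Iterable[str],
    prompt: str,
    image_path: str,
) -> Dict[str, Tuple[str, float]]:
    """Send one image prompt to several models (hypothetical helper).

    Returns {model_name: (answer_text, elapsed_seconds)}.
    """
    results = {}
    for name in models:
        start = time.perf_counter()
        response = chat_fn(
            model=name,
            messages=[{'role': 'user', 'content': prompt, 'images': [image_path]}],
        )
        elapsed = time.perf_counter() - start
        results[name] = (response['message']['content'], elapsed)
    return results
```

With Ollama this would be called as `compare_models(chat, ['gemma3:27b-it-qat', 'gemma3:4b'], 'What is this?', 'img/ducks.jpg')`. The first call to each model usually includes load time, so a fair speed comparison likely needs a warm-up call first.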

## OCR Use Cases

In [9]:
image_path = "img/Receipt.jpg"  # Replace with the actual path to your image file

def image_chat(prompt, img_path):
    response = chat(
        model="gemma3:27b-it-qat",  # Must be a vision-capable model; Gemma 3 here, LLaVA is another option
        messages=[
            {
                'role': 'user',
                'content': prompt,
                'images': [img_path]
            }
        ]
    )
    return response["message"]["content"]


image_chat("What's the text in this image?", image_path)

'Here\'s a breakdown of the text visible in the image of the receipt:\n\n**Header Information:**\n\n*   **SHOP NAME**\n*   **Address:** Lorem Ipsum, 23-10\n*   **Telp:** 11223344\n\n**Receipt Details:**\n\n*   **CASH RECEIPT**\n*   **Description** | **Price**\n    *   Lorem | 1.1\n    *   Ipsum | 2.2\n    *   Dolor sit amet | 3.3\n    *   Consectetur | 4.4\n    *   Adipiscing elit | 5.5\n*   **Total** | **16.5**\n*   Cash | 20.0\n*   Change | 3.5\n*   Bank card | 234\n*   Approval Code | #123456\n\n**Footer:**\n\n*   THANK YOU!\n*   (A barcode is also present)\n\n**Design Attribution:**\n\n*   designed by **freepik**\n\nThe receipt appears to be a sample with placeholder text ("Lorem Ipsum") rather than a real transaction record.'
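
OCR output like this is easier to trust after a quick sanity check. The sketch below assumes the answer uses the `Label | Value` bullet layout seen above, which is not guaranteed from run to run. It pulls out the rows and checks that the item prices add up to the stated total. `parse_rows` and `total_matches` are hypothetical helper names.

```python
import math
import re
from typing import List, Tuple

# Matches bullet lines of the form "*   Label | Value" (bold markers removed first).
ROW = re.compile(r'^\s*\*\s+(.+?)\s*\|\s*(.+?)\s*$')


def parse_rows(text: str) -> List[Tuple[str, str]]:
    """Extract (label, value) pairs from 'Label | Value' bullet lines."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.replace('**', ''))
        if match:
            rows.append((match.group(1), match.group(2)))
    return rows


def total_matches(rows: List[Tuple[str, str]]) -> bool:
    """Check that numeric rows listed before 'Total' sum to the total."""
    items = []
    for label, value in rows:
        try:
            amount = float(value)
        except ValueError:
            continue  # Skip header rows such as "Description | Price".
        if label.lower() == 'total':
            return math.isclose(sum(items), amount, abs_tol=0.005)
        items.append(amount)
    return False  # No total row was found.
```

For example, `total_matches(parse_rows(image_chat("What's the text in this image?", image_path)))` would flag a receipt whose extracted prices and total disagree.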

## Counting Objects

In [10]:
image_path = "img/RealDucks.jpg"  # Replace with the actual path to your image file


image_chat("How many ducks are in this image?", image_path) 

'Based on the image, there are **four** ducks. \n\nYou can see three male ducks with the distinctive green heads and one female duck with a more mottled brown coloration.'
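
The model answers in free text, so downstream code that needs an integer has to extract it. Here is a minimal sketch, assuming the count is the first number in the answer and is written either as digits or as a word up to ten. `extract_count` is a hypothetical helper, and the heuristic can misfire, for example when "one" is used as a pronoun.

```python
import re
from typing import Optional

NUMBER_WORDS = {
    'zero': 0, 'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5,
    'six': 6, 'seven': 7, 'eight': 8, 'nine': 9, 'ten': 10,
}


def extract_count(answer: str) -> Optional[int]:
    """Return the first number (digits or a word up to ten) found in the answer."""
    for token in re.findall(r'[a-z]+|\d+', answer.lower()):
        if token.isdigit():
            return int(token)
        if token in NUMBER_WORDS:
            return NUMBER_WORDS[token]
    return None  # No recognizable number in the answer.
```

Applied to the answer above, `extract_count` reads "four" before it reaches "three", so it returns the overall count and not the number of male ducks.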

## 🎯 Recap: What We Learned

In this section, we sent images to Gemma models through Ollama and tried them on a few common vision tasks.

* **Provide Gemma Models Image Input** - Saw how to pass images to Gemma models using Ollama
* **Assess performance** - Saw how models of different sizes differ in speed and in the detail of their answers
* **Showcased different use cases** - Tested use cases such as OCR and object counting