##### Copyright 2025 Google LLC.

In [None]:
# @title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Gemma - Run with Ollama Python library

Author: Onuralp Sezer

*   GitHub: [github.com/onuralpszr](https://github.com/onuralpszr/)
*   X: [@onuralpszr](https://x.com/onuralpszr)

Description: This notebook demonstrates how to run text and image inference on a Gemma 3 model using the [Ollama Python library](https://github.com/ollama/ollama-python). The library provides the easiest way to integrate Python 3.8+ projects with Ollama.

<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/Gemma/[Gemma_3]Using_with_Ollama_Python_inference_images.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>

## Setup

### Select the Colab runtime
To complete this tutorial, you'll need to have a Colab runtime with sufficient resources to run the Gemma model. In this case, you can use a T4 GPU:

1. In the upper-right of the Colab window, select **▾ (Additional connection options)**.
2. Select **Change runtime type**.
3. Under **Hardware accelerator**, select **T4 GPU**.
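
After switching the runtime, it can help to confirm that a GPU is actually attached before installing anything. The snippet below is a minimal sketch, assuming the `nvidia-smi` tool is on the path (as it normally is on Colab GPU runtimes); `list_gpus` is a hypothetical helper name, not part of Ollama.

```python
import shutil
import subprocess


def list_gpus():
    """Return GPU descriptions reported by nvidia-smi, or None if unavailable."""
    # nvidia-smi is missing on CPU-only runtimes
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    # A non-zero exit code usually means no usable driver or device
    if result.returncode != 0:
        return None
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
print(gpus if gpus else "No NVIDIA GPU detected; check the runtime type.")
```

If the message about a missing GPU appears, repeat the steps above and reconnect the runtime.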

## Installation

Install Ollama through the official installation script. The `pciutils` package provides `lspci`, which the script uses to detect the GPU.

In [None]:
!sudo apt-get install -y pciutils
!curl -fsSL https://ollama.com/install.sh | sh

Install the Ollama Python library, the official Python client for Ollama.

In [3]:
!pip install -q ollama

## Start Ollama

Start the Ollama server in the background using `nohup`, writing its log to `ollama.log`.

In [4]:
!nohup ollama serve > ollama.log &

nohup: redirecting stderr to stdout
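
Because the server starts in the background, the next cell can run before it is ready to accept requests. The following is a minimal sketch of a readiness check, assuming Ollama listens on its default address `http://127.0.0.1:11434` and answers a plain GET with HTTP 200; `wait_for_ollama` is a hypothetical helper name.

```python
import time
import urllib.error
import urllib.request


def wait_for_ollama(url="http://127.0.0.1:11434", timeout=30.0, interval=0.5):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Server not accepting connections yet; retry after a short pause
            pass
        time.sleep(interval)
    return False
```

Calling `wait_for_ollama()` before pulling a model returns `True` once the server responds, or `False` if it never comes up, in which case `ollama.log` is the first place to look.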


## Prerequisites

*   Ollama should be installed and running. (This was already completed in previous steps.)
*   Pull the `gemma3:4b` model to use with the library, either from a terminal with `ollama pull gemma3:4b` or from Python with `ollama.pull('gemma3:4b')`.
    *  See [Ollama.com](https://ollama.com/) for more information on the models available.

In [5]:
import ollama

In [34]:
ollama.pull('gemma3:4b')

ProgressResponse(status='success', completed=None, total=None, digest=None)

## Inference with Text 📝 and Image 🖼️

Run inference using the Ollama Python library. First, download a sample image for the model to describe.

In [None]:
!curl -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36" -o pexels-adria-masi-461420600-27372369.jpg "https://images.pexels.com/photos/27372369/pexels-photo-27372369.jpeg?cs=srgb&dl=pexels-adria-masi-461420600-27372369.jpg&fm=jpg&w=1280&h=856"

### Image Preview (Optional) 🖼️

In [None]:

import cv2
from google.colab.patches import cv2_imshow
cv2_imshow(cv2.imread('pexels-adria-masi-461420600-27372369.jpg'))

### Generate

In [None]:
import ollama

res = ollama.chat(
    model="gemma3:4b",
    messages=[
        {
            'role': 'user',
            'content': 'Describe this image:',
            'images': ['./pexels-adria-masi-461420600-27372369.jpg']
        }
    ]
)

print(res['message']['content'])
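
The `images` list above holds a file path. As a hedged sketch, assuming the library also accepts raw image bytes in `images` (as its multimodal examples suggest), the hypothetical helper `build_image_message` below reads the file up front, so a failed or empty download raises a clear error before any request is sent.

```python
from pathlib import Path


def build_image_message(prompt, image_path):
    """Build a user chat message that carries the image as raw bytes."""
    path = Path(image_path)
    # Fail early if the download step did not produce a usable file
    if not path.is_file() or path.stat().st_size == 0:
        raise FileNotFoundError(f"Image missing or empty: {path}")
    return {"role": "user", "content": prompt, "images": [path.read_bytes()]}
```

The returned dictionary can be passed as the single entry of `messages` in `ollama.chat`, in place of the literal message used above.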

In [None]:
!wget https://ollama.com/public/blog/wordart.jpg

### Async Client with OCR 📝

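To make asynchronous requests, use the `AsyncClient` class. Passing `stream=True` yields the response in parts as it is generated. Because Colab already runs an event loop, `nest_asyncio` is applied so that `asyncio.run` can be called from a cell.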

In [None]:
import asyncio
import nest_asyncio
from ollama import AsyncClient

nest_asyncio.apply()


async def generate():
    """
    Asynchronously streams a response to an image prompt using the AsyncClient.

    This function creates an instance of AsyncClient, sends a chat request with
    an attached image, and prints each part of the response as it arrives.
    """
    # Create an instance of the AsyncClient
    client = AsyncClient()

    # Build the user message with the image to read
    message = {
        'role': 'user',
        'content': 'What does the text say?',
        'images': ['./wordart.jpg'],
    }

    # Stream the response and print each part as it arrives
    async for part in await client.chat(model='gemma3:4b', messages=[message], stream=True):
        print(part['message']['content'], end='', flush=True)

# Run the generate function
asyncio.run(generate())
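
An asynchronous client also makes it possible to have several requests in flight at once. The sketch below is illustrative only: `ask_all` is a hypothetical helper that takes any awaitable `chat` callable (for example `AsyncClient().chat`) and sends one non-streaming request per prompt with `asyncio.gather`. Whether the requests actually run in parallel depends on how the Ollama server is configured.

```python
import asyncio


async def ask_all(chat, model, prompts, image_path):
    """Send one chat request per prompt concurrently; return replies in prompt order."""

    async def ask(prompt):
        message = {"role": "user", "content": prompt, "images": [image_path]}
        response = await chat(model=model, messages=[message])
        return response["message"]["content"]

    # gather preserves the order of the awaitables it is given
    return await asyncio.gather(*(ask(p) for p in prompts))
```

In this notebook it could be called as `asyncio.run(ask_all(AsyncClient().chat, 'gemma3:4b', ['What does the text say?', 'What colors are used?'], './wordart.jpg'))`, where the second prompt is only an example.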

In [None]:
!curl -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36" -o pexels-pixabay-60582.jpg "https://images.pexels.com/photos/60582/newton-s-cradle-balls-sphere-action-60582.jpeg?cs=srgb&dl=pexels-pixabay-60582.jpg&fm=jpg&w=1920&h=1346"

### Object Counting 🧾

In [None]:
from ollama import chat

# Start a conversation with the model
messages = [
    {
        "role": "user",
        "content": "How many balls are on the cradle?",
        "images": ["./pexels-pixabay-60582.jpg"]
    },
]

# Get the model's response to the message
response = chat("gemma3:4b", messages=messages)
print(response["message"]["content"])
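
The model answers in free-form text, so using the count in code requires pulling a number out of the reply. The following is a rough sketch, not part of the Ollama library: the hypothetical helper `extract_count` returns the first digit sequence or small number word it finds. It is easily misled by phrases such as "one of the balls", so treat the result as a hint rather than a guaranteed count.

```python
import re

# Number words the sketch recognizes; anything beyond ten must appear as digits
_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}
_PATTERN = re.compile(r"\b(\d+|" + "|".join(_WORDS) + r")\b", re.IGNORECASE)


def extract_count(reply):
    """Return the first number mentioned in `reply`, or None if there is none."""
    match = _PATTERN.search(reply)
    if match is None:
        return None
    token = match.group(1).lower()
    return int(token) if token.isdigit() else _WORDS[token]
```

With the cell above, `extract_count(response["message"]["content"])` would give an integer to compare against the expected number of balls.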

## Conclusion 🏆

Congratulations! You have successfully run inference on a Gemma 3 model using the Ollama Python library, covering image description, OCR, and object counting with the model's vision capabilities. You can now integrate this into your Python projects.