ollama-python SDK
Repo: https://github.com/ollama/ollama-python/tree/main
Examples: https://github.com/ollama/ollama-python/tree/main/examples

In [1]:
import asyncio
from pprint import pprint

import ollama
from ollama import AsyncClient, ChatResponse, Client, EmbedResponse, GenerateResponse, chat, generate
from pydantic import BaseModel

## Single prompt

In [2]:
response: GenerateResponse = generate(model='llama3.2', prompt="Hello my name is Davide")
print(response.response)

Ciao Davide! It's nice to meet you. Is there something I can help you with or would you like to chat?


## GenerateResponse Attributes

In [3]:
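# Note: callable(x) tests the attribute *name* (a string), so it is always False and methods are listed too.
# Use callable(getattr(response, x)) instead to list only data attributes.
pprint([x for x in dir(response) if ((not x.startswith("_")) and (not callable(x)))]) # response object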

['construct',
 'context',
 'copy',
 'created_at',
 'dict',
 'done',
 'done_reason',
 'eval_count',
 'eval_duration',
 'from_orm',
 'get',
 'json',
 'load_duration',
 'model',
 'model_computed_fields',
 'model_config',
 'model_construct',
 'model_copy',
 'model_dump',
 'model_dump_json',
 'model_extra',
 'model_fields',
 'model_fields_set',
 'model_json_schema',
 'model_parametrized_name',
 'model_post_init',
 'model_rebuild',
 'model_validate',
 'model_validate_json',
 'model_validate_strings',
 'parse_file',
 'parse_obj',
 'parse_raw',
 'prompt_eval_count',
 'prompt_eval_duration',
 'response',
 'schema',
 'schema_json',
 'thinking',
 'total_duration',
 'update_forward_refs',
 'validate']


## Conversation Chat

In [4]:
rhyme_word = "dish"

messages = [
    {
        "role": "system",
        "content": "You are an expert rapper. You can find rhymes for many words and construct rap lyrics. Keep your answers limited to a single sentence."
    },
    {
        "role": "user",
        "content": f"What rhymes with the word '{rhyme_word}'"
    },
]

response: ChatResponse = chat(model='llama3.2', messages=messages)
print("#1: ", response['message']['content'])

messages.append(response.message)
messages.append({"role": "user", "content": "Now make a rhyme with the word 'single' and the previous rhyme word."})
response: ChatResponse = chat(model='llama3.2', messages=messages)
print("#2: ", response['message']['content'])


#1:  "Swish" is a perfect rhyme for "dish".
#2:  Here's a rhyme: I'm on a swish, got my heart in a single wish.


## Streaming Response

In [5]:

stream = chat(
    model='llama3.2',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)

for chunk in stream:
  print(chunk['message']['content'], end='', flush=True)

The sky appears blue because of a phenomenon called Rayleigh scattering, named after the British physicist Lord Rayleigh, who first described it in the late 19th century.

Here's what happens:

1. **Sunlight enters Earth's atmosphere**: When sunlight enters our atmosphere, it consists of a broad spectrum of colors, including all the colors of the visible light (red, orange, yellow, green, blue, indigo, and violet).
2. **Light interacts with atmospheric molecules**: As sunlight travels through the atmosphere, it encounters tiny molecules of gases such as nitrogen (N2) and oxygen (O2). These molecules scatter the light in all directions.
3. **Shorter wavelengths are scattered more**: The smaller wavelength colors (like blue and violet) are scattered more than the longer wavelength colors (like red and orange). This is because the smaller molecules are more effective at scattering shorter wavelengths.
4. **Blue light is scattered in all directions**: As a result of Rayleigh scattering, th
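
When streaming, each chunk carries only a fragment of the reply, so the caller has to assemble the full text if it is needed afterwards (for example, to append it to `messages`). A minimal sketch follows; `fake_stream` is a hypothetical stand-in for `chat(..., stream=True)` so that the example runs without an Ollama server, and the chunk shape (`chunk['message']['content']`) is taken from the cell above.

```python
from typing import Iterator


def fake_stream() -> Iterator[dict]:
    """Hypothetical stand-in for chat(..., stream=True): yields chunk-like dicts."""
    for piece in ['The sky ', 'appears ', 'blue.']:
        yield {'message': {'role': 'assistant', 'content': piece}}


def collect_stream(stream: Iterator[dict]) -> str:
    """Print each fragment as it arrives and return the assembled reply."""
    parts = []
    for chunk in stream:
        piece = chunk['message']['content']
        print(piece, end='', flush=True)
        parts.append(piece)
    print()
    return ''.join(parts)


full_text = collect_stream(fake_stream())
# The assembled reply can then be stored in the conversation history
history = [{'role': 'assistant', 'content': full_text}]
```

With a real stream, `collect_stream(stream)` should work the same way as long as the chunks support the same subscript access.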

## Sampling Parameters

In [11]:
initial_message = {'role': 'user', 'content': 'Why is the sky blue?'}

# `messages` must be a list of message dicts, even when there is only one message.
# temperature 0.7: repeated calls are expected to give different answers
response: ChatResponse = chat(model='llama3.2', messages=[initial_message], options={"temperature": 0.7, "top_p": 0.9})
print("#1: ", response['message']['content'])
response: ChatResponse = chat(model='llama3.2', messages=[initial_message], options={"temperature": 0.7, "top_p": 0.9})
print("#2: ", response['message']['content'])
# temperature 0.0: repeated calls are expected to be (near) identical
response: ChatResponse = chat(model='llama3.2', messages=[initial_message], options={"temperature": 0.0, "top_p": 0.9})
print("#3: ", response['message']['content'])
response: ChatResponse = chat(model='llama3.2', messages=[initial_message], options={"temperature": 0.0, "top_p": 0.9})
print("#4: ", response['message']['content'])
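
The `temperature` and `top_p` options shape how the next token is drawn from the model's probability distribution. The sketch below illustrates the general idea on a toy distribution in plain Python. It is not Ollama's implementation; the function names and logits are made up for illustration, and treating a temperature of 0 as greedy decoding is an assumption about the usual convention.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn logits into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        # Assumption: treat temperature 0 as greedy decoding (all mass on the top logit)
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: list[float], top_p: float) -> list[float]:
    """Keep the smallest set of most likely tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]


logits = [2.0, 1.0, 0.5, -1.0]  # toy scores for four candidate tokens
warm = top_p_filter(softmax_with_temperature(logits, 0.7), 0.9)
greedy = top_p_filter(softmax_with_temperature(logits, 0.0), 0.9)
print('temperature=0.7:', [round(p, 3) for p in warm])
print('temperature=0.0:', greedy)
```

In this toy case, temperature 0.7 with `top_p=0.9` leaves two candidate tokens to sample from, while temperature 0.0 leaves exactly one, which is why the two `temperature: 0.0` calls are expected to agree more closely than the two `temperature: 0.7` calls.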

## Custom Client

In [6]:
client = Client(
  host='http://localhost:11434',
  headers={'x-some-header': 'some-value'}
)
response = client.chat(model='llama3.2', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])

## Async Client

In [7]:

async def async_chat():
  message = {'role': 'user', 'content': 'Why is the sky blue?'}
  response = await AsyncClient().chat(model='llama3.2', messages=[message])
  return response  # return the reply so the caller can use it


await async_chat()  # in a notebook, top-level await works because an event loop is already running
# asyncio.run(async_chat())  # in a regular Python script, use this instead

async def async_chat_stream():
  message = {'role': 'user', 'content': 'Why is the sky blue?'}
  async for part in await AsyncClient().chat(model='llama3.2', messages=[message], stream=True):
    print(part['message']['content'], end='', flush=True)

await async_chat_stream()  # in a notebook, top-level await works because an event loop is already running
# asyncio.run(async_chat_stream())  # in a regular Python script, use this instead

The sky appears blue because of a phenomenon called scattering, which occurs when sunlight interacts with the tiny molecules of gases in the Earth's atmosphere.

Here's what happens:

1. When sunlight enters the Earth's atmosphere, it encounters tiny molecules of nitrogen (N2) and oxygen (O2).
2. These molecules scatter the light in all directions, but they scatter shorter (blue) wavelengths more than longer (red) wavelengths.
3. This is known as Rayleigh scattering, named after the British physicist Lord Rayleigh, who first described it in the late 19th century.
4. As a result of this scattering, the blue light is distributed throughout the atmosphere and reaches our eyes from all directions.
5. Our eyes perceive this scattered blue light as the color of the sky.

It's worth noting that the color of the sky can appear different under various conditions, such as:

* During sunrise and sunset, when the light has to travel through more of the Earth's atmosphere, scattering occurs more in
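
One reason to prefer `AsyncClient` is that several requests can be in flight at once. The sketch below shows the pattern with `asyncio.gather`. `fake_chat` is a hypothetical stand-in for `AsyncClient().chat(...)` so that the example runs without a server, and the reply text is invented.

```python
import asyncio


async def fake_chat(prompt: str) -> dict:
    """Hypothetical stand-in for `await AsyncClient().chat(...)`."""
    await asyncio.sleep(0.01)  # simulate network / inference latency
    return {'message': {'role': 'assistant', 'content': f'Echo: {prompt}'}}


async def ask_all(prompts: list[str]) -> list[str]:
    """Send all prompts concurrently and return the replies in prompt order."""
    responses = await asyncio.gather(*(fake_chat(p) for p in prompts))
    return [r['message']['content'] for r in responses]


replies = asyncio.run(ask_all(['Why is the sky blue?', 'Why is grass green?']))
print(replies)
```

In a notebook, replace `asyncio.run(ask_all(...))` with `await ask_all(...)`, as in the cells above.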

# Tool Use

## Single Tool Call

In [8]:

def add_two_numbers(a: int, b: int) -> int:
  """
  Add two numbers

  Args:
    a (int): The first number
    b (int): The second number

  Returns:
    int: The sum of the two numbers
  """

  # The cast is necessary because returned tool call arguments don't always conform exactly to the schema.
  # E.g. it prevents "what is 30 + 12" from producing '3012' (string concatenation) instead of 42
  return int(a) + int(b)


def subtract_two_numbers(a: int, b: int) -> int:
  """
  Subtract two numbers
  """

  # The cast is necessary as returned tool call arguments don't always conform exactly to schema
  return int(a) - int(b)


# Tools can still be manually defined and passed into chat
subtract_two_numbers_tool = {
  'type': 'function',
  'function': {
    'name': 'subtract_two_numbers',
    'description': 'Subtract two numbers',
    'parameters': {
      'type': 'object',
      'required': ['a', 'b'],
      'properties': {
        'a': {'type': 'integer', 'description': 'The first number'},
        'b': {'type': 'integer', 'description': 'The second number'},
      },
    },
  },
}

messages = [{'role': 'user', 'content': 'What is three plus one?'}]
print('Prompt:', messages[0]['content'])

available_functions = {
  'add_two_numbers': add_two_numbers,
  'subtract_two_numbers': subtract_two_numbers,
}

# Use the same tool-capable model for the tool call and the follow-up; it must be pulled locally first
model = 'llama3.2'

response: ChatResponse = chat(
  model,
  messages=messages,
  tools=[add_two_numbers, subtract_two_numbers_tool],
)

if response.message.tool_calls:
  # Add the assistant message containing the tool calls before the tool results
  messages.append(response.message)

  # There may be multiple tool calls in the response
  for tool in response.message.tool_calls:
    # Ensure the function is available, and then call it
    if function_to_call := available_functions.get(tool.function.name):
      print('Calling function:', tool.function.name)
      print('Arguments:', tool.function.arguments)
      output = function_to_call(**tool.function.arguments)
      print('Function output:', output)
      # Add each function result to messages for the model to use
      messages.append({'role': 'tool', 'content': str(output), 'tool_name': tool.function.name})
    else:
      print('Function', tool.function.name, 'not found')

  # Only needed to chat with the model using the tool call results
  final_response = chat(model, messages=messages)
  print('Final response:', final_response.message.content)

else:
  print('No tool calls returned from model')

Prompt: What is three plus one?
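
A plain Python function can be passed in `tools` because its signature and docstring carry the same information as a hand-written definition like `subtract_two_numbers_tool`. The sketch below shows one way such a conversion could look, using only `inspect`. `function_to_tool` and `multiply_two_numbers` are hypothetical names written for illustration; this is not the SDK's own conversion code, and it handles only a few simple annotation types.

```python
import inspect

# Mapping from simple Python annotations to JSON schema type names (illustrative subset)
_JSON_TYPES = {int: 'integer', float: 'number', str: 'string', bool: 'boolean'}


def function_to_tool(func) -> dict:
    """Build a tool definition dict from a function's signature and docstring."""
    signature = inspect.signature(func)
    properties, required = {}, []
    for name, param in signature.parameters.items():
        # Fall back to 'string' for annotations outside the small mapping above
        properties[name] = {'type': _JSON_TYPES.get(param.annotation, 'string')}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    doc = inspect.getdoc(func) or ''
    return {
        'type': 'function',
        'function': {
            'name': func.__name__,
            'description': doc.splitlines()[0] if doc else '',
            'parameters': {'type': 'object', 'required': required, 'properties': properties},
        },
    }


def multiply_two_numbers(a: int, b: int) -> int:
    """Multiply two numbers"""
    return int(a) * int(b)


tool_def = function_to_tool(multiply_two_numbers)
print(tool_def)
```

The resulting dict has the same shape as `subtract_two_numbers_tool`, minus the per-parameter descriptions, which would have to be parsed from the `Args:` section of the docstring.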

## Multi-tool Call

In [None]:
import random
from typing import Iterator


def get_temperature(city: str) -> str:
  """
  Get the temperature for a city in Celsius

  Args:
    city (str): The name of the city

  Returns:
    str: The current temperature in Celsius, or 'Unknown city'
  """
  # This is a mock implementation - would need to use a real weather API
  if city not in ['London', 'Paris', 'New York', 'Tokyo', 'Sydney']:
    return 'Unknown city'

  return str(random.randint(0, 35)) + ' degrees Celsius'


def get_conditions(city: str) -> str:
  """
  Get the weather conditions for a city
  """
  if city not in ['London', 'Paris', 'New York', 'Tokyo', 'Sydney']:
    return 'Unknown city'
  # This is a mock implementation - would need to use a real weather API
  conditions = ['sunny', 'cloudy', 'rainy', 'snowy']
  return random.choice(conditions)


available_functions = {
  'get_temperature': get_temperature,
  'get_conditions': get_conditions,
}


cities = ['London', 'Paris', 'New York', 'Tokyo', 'Sydney']
city = random.choice(cities)
city2 = random.choice(cities)
messages = [{'role': 'user', 'content': f'What is the temperature in {city}? and what are the weather conditions in {city2}?'}]
print('----- Prompt:', messages[0]['content'], '\n')

model = 'qwen3'
client = Client()
response: Iterator[ChatResponse] = client.chat(model, stream=True, messages=messages, tools=[get_temperature, get_conditions], think=True)

for chunk in response:
  if chunk.message.thinking:
    print(chunk.message.thinking, end='', flush=True)
  if chunk.message.content:
    print(chunk.message.content, end='', flush=True)
  if chunk.message.tool_calls:
    for tool in chunk.message.tool_calls:
      if function_to_call := available_functions.get(tool.function.name):
        print('\nCalling function:', tool.function.name, 'with arguments:', tool.function.arguments)
        output = function_to_call(**tool.function.arguments)
        print('> Function output:', output, '\n')

        # Add the assistant message and tool call result to the messages
        messages.append(chunk.message)
        messages.append({'role': 'tool', 'content': str(output), 'tool_name': tool.function.name})
      else:
        print('Function', tool.function.name, 'not found')

print('----- Sending result back to model \n')
if any(msg.get('role') == 'tool' for msg in messages):
  res = client.chat(model, stream=True, tools=[get_temperature, get_conditions], messages=messages, think=True)
  done_thinking = False
  for chunk in res:
    if chunk.message.thinking:
      print(chunk.message.thinking, end='', flush=True)
    if chunk.message.content:
      if not done_thinking:
        print('\n----- Final result:')
        done_thinking = True
      print(chunk.message.content, end='', flush=True)
    if chunk.message.tool_calls:
      # Model should be explaining the tool calls and the results in this output
      print('Model returned tool calls:')
      print(chunk.message.tool_calls)
else:
  print('No tool calls returned')

## Structured Output

In [None]:

# Define the schema for the response
class FriendInfo(BaseModel):
  name: str
  age: int
  is_available: bool


class FriendList(BaseModel):
  friends: list[FriendInfo]


# Hand-written JSON schema describing the same structure as FriendList
schema = {
  'type': 'object',
  'properties': {
    'friends': {
      'type': 'array',
      'items': {
        'type': 'object',
        'properties': {
          'name': {'type': 'string'},
          'age': {'type': 'integer'},
          'is_available': {'type': 'boolean'},
        },
        'required': ['name', 'age', 'is_available'],
      },
    },
  },
  'required': ['friends'],
}
response = chat(
  model='llama3.1:8b',
  messages=[{'role': 'user', 'content': 'I have two friends. The first is Ollama 22 years old busy saving the world, and the second is Alonso 23 years old and wants to hang out. Return a list of friends in JSON format'}],
  format=FriendList.model_json_schema(),  # Schema generated by Pydantic; passing format=schema (defined above) also works
  options={'temperature': 0},  # Make responses more deterministic
)

# Use Pydantic to validate the response
friends_response = FriendList.model_validate_json(response.message.content)
print(friends_response)

## Thinking / Reasoning

In [None]:
messages = [
  {
    'role': 'user',
    'content': 'What is 10 + 23 * 2?',
  },
]

response_think = chat("qwen3:1.7b", messages=messages, think=True)
response_nothink = chat("qwen3:1.7b", messages=messages, think=False)

print('Thinking:\n========\n\n' + response_think.message.thinking)
print('\nResponse:\n========\n\n' + response_think.message.content)
print('\nNo-think response:\n========\n\n' + response_nothink.message.content)

# Multi-modal

## Image

In [None]:
with open('player.png', 'rb') as file:
  response = ollama.chat(
    model='llama3.2-vision',
    messages=[
      {
        'role': 'user',
        'content': 'What color is the tie the man is wearing?',
        'images': [file.read()],
      },
    ],
  )
print(response['message']['content'])

## Embed

In [None]:
output: EmbedResponse = ollama.embed(model='llama3.1', input='The sky is blue because of Rayleigh scattering')
output_embeddings = output.embeddings

print(output)
print("Embedding dimension:", len(output.embeddings[0]))


# Passing a list embeds several inputs in one call
batch_output: EmbedResponse = ollama.embed(model='llama3.1', input=['The sky is blue because of Rayleigh scattering', 'Grass is green because of chlorophyll'])
print(batch_output)
print("Number of embeddings:", len(batch_output.embeddings))
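
Embeddings are typically compared with cosine similarity. A minimal sketch in plain Python follows; the short vectors are made-up stand-ins for rows of `batch_output.embeddings`, which would be much longer in practice.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError('vectors must have the same length')
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError('cosine similarity is undefined for a zero vector')
    return dot / (norm_a * norm_b)


# Toy vectors: the second points in the same direction as the first, the third does not
sky = [0.1, 0.3, 0.5]
sky_paraphrase = [0.2, 0.6, 1.0]
grass = [0.5, -0.3, 0.1]

print(cosine_similarity(sky, sky_paraphrase))
print(cosine_similarity(sky, grass))
```

With real embeddings, `cosine_similarity(batch_output.embeddings[0], batch_output.embeddings[1])` would give a score for how related the two input sentences are.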

## API

In [None]:
print(ollama.list(), "\n####\n")
print(ollama.show("llama3.2"), "\n####\n")
print(ollama.ps(), "\n####\n")


# ollama.create(model='example', modelfile=modelfile) (using ModelFile)
# ollama.create(model='example', from_='gemma3', system="You are Mario from Super Mario Bros.")
# ollama.copy('gemma3', 'user/gemma3')
# ollama.delete('gemma3')
# ollama.pull('gemma3')
# ollama.push('user/gemma3')