# Ollama Usage

### Ollama Python Library
The Ollama Python library provides the easiest way to integrate Python 3.8+ projects with Ollama.

### Install

In [None]:
!pip install ollama

In [None]:
import ollama

### Pull a Model

In [None]:
ollama.pull('llama3.1')

### Push a Model

ollama.push('user/llama3.1')

### Delete a Downloaded Model

ollama.delete('llama3.1')

### List of Downloaded Ollama Models

In [None]:
ollama.list()

### PS
`ollama.ps()` lists the models that are currently loaded in memory.

In [None]:
ollama.ps()

### Create a Model from a Modelfile
`ollama.create()` builds a new model, here named `example`, from the Modelfile text passed to it.

In [None]:
modelfile='''
FROM llama3.1
SYSTEM You are mario from super mario bros.
'''

ollama.create(model='example', modelfile=modelfile)

### Show
Returns detailed information about a model, such as its Modelfile, parameters, and template.

In [None]:
ollama.show('llama3.1')

### Chat Parameters

`chat()` accepts the following parameters:

- **`model`**: The model to use for the conversation (e.g., `'llama3.1'` or `'llama3.1:70b'`).
- **`messages`**: The conversation so far, as a list of messages, each with:
  - **`role`**: Who sent the message (`'system'`, `'user'`, or `'assistant'`).
  - **`content`**: The message text.
- **`stream`**: If `True`, the response is returned in chunks as the model generates it, instead of as one complete response.
- **`keep_alive`**: How long the model stays loaded in memory after the request.
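
The `messages` list is what carries conversation state: each call sends the whole history, so a multi-turn exchange means appending every reply before the next request. A minimal sketch, assuming a hypothetical helper `ask` (not part of the library) that takes the chat function as an argument, so it can be tried with a stand-in before pointing it at `ollama.chat`:

```python
def ask(chat_fn, history, user_text, model='llama3.1'):
    """Send one user turn and return (reply_text, updated_history).

    `ask` is a hypothetical helper, not part of the ollama library.
    `chat_fn` is assumed to behave like `ollama.chat`.
    """
    messages = history + [{'role': 'user', 'content': user_text}]
    response = chat_fn(model=model, messages=messages)
    reply = response['message']['content']
    # Keep the assistant's answer so the next turn sees the full conversation.
    return reply, messages + [{'role': 'assistant', 'content': reply}]


def fake_chat(model, messages):
    """Stand-in for ollama.chat that reports how many messages it received."""
    text = f'{len(messages)} message(s) seen'
    return {'message': {'role': 'assistant', 'content': text}}


history = []
reply, history = ask(fake_chat, history, 'Why is the sky blue?')
reply, history = ask(fake_chat, history, 'And at sunset?')
print(reply)
```

With a running Ollama server, passing `ollama.chat` as `chat_fn` should work the same way.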

In [None]:
import ollama

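# stream=True makes chat() return an iterator of partial responses
stream = ollama.chat(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
    keep_alive=1,  # keep the model loaded for 1 second after the request
)

# Print each chunk as soon as it arrives
for chunk in stream:
  print(chunk['message']['content'], end='', flush=True)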
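
When the full text is needed after streaming, the chunks can be accumulated while they are printed. A minimal sketch, assuming each chunk has the `chunk['message']['content']` shape used above; `collect_stream` is a hypothetical helper, and the list of dicts stands in for a real stream:

```python
def collect_stream(chunks, echo=True):
    """Join the text of all streamed chunks, optionally printing as they arrive.

    Hypothetical helper; `chunks` is assumed to be an iterable like the one
    returned by ollama.chat(..., stream=True).
    """
    parts = []
    for chunk in chunks:
        text = chunk['message']['content']
        if echo:
            print(text, end='', flush=True)
        parts.append(text)
    if echo:
        print()  # finish the line once the stream ends
    return ''.join(parts)


# Made-up chunks standing in for a real streamed response
fake_stream = [
    {'message': {'content': 'Rayleigh '}},
    {'message': {'content': 'scattering.'}},
]
full_text = collect_stream(fake_stream)
```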

### Generate

`generate()` works much like `chat()`, but takes a single `prompt` string instead of a list of messages.

In [None]:
ollama.generate(model='llama3.1', prompt='Why is the sky blue?')

### Embeddings
`ollama.embeddings()` returns an embedding vector for the given prompt. A dedicated embedding model can be used in place of `llama3.1`.

In [None]:
ollama.embeddings(model='llama3.1', prompt='The sky is blue because of rayleigh scattering')
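
Embeddings are usually compared rather than read directly. A minimal sketch of cosine similarity in plain Python, assuming the response exposes the vector under the `'embedding'` key; `cosine_similarity` is a hypothetical helper, and the short vectors are made-up stand-ins for real embeddings:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError('vectors must have the same length')
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


# Made-up vectors; with the library these would come from something like
# ollama.embeddings(model=..., prompt=...)['embedding'] (assumed shape).
v1 = [1.0, 0.0, 1.0]
v2 = [1.0, 0.0, 1.0]
v3 = [0.0, 1.0, 0.0]
print(cosine_similarity(v1, v2))  # identical direction, close to 1
print(cosine_similarity(v1, v3))  # orthogonal vectors
```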

### Custom client
A custom client can be created with the following fields:

- **`host`**: The Ollama host to connect to
- **`timeout`**: The timeout for requests

In [None]:
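from ollama import Client

# Point the client at the Ollama server (this is the default local address)
client = Client(host='http://localhost:11434')
response = client.chat(model='llama3.1', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response['message']['content'])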

### Async client

In [None]:
import asyncio
from ollama import AsyncClient

async def chat():
  message = {'role': 'user', 'content': 'Why is the sky blue?'}
  response = await AsyncClient().chat(model='llama3.1', messages=[message])
  print(response['message']['content'])

asyncio.run(chat())

# Setting stream=True modifies functions to return a Python asynchronous generator:

import asyncio
from ollama import AsyncClient

async def chat():
  message = {'role': 'user', 'content': 'Why is the sky blue?'}
  async for part in await AsyncClient().chat(model='llama3.1', messages=[message], stream=True):
    print(part['message']['content'], end='', flush=True)

asyncio.run(chat())
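
One reason to reach for the async client is to issue several requests concurrently. A minimal sketch using `asyncio.gather`, assuming a hypothetical `ask_all` helper that accepts any object with an awaitable `chat` method (such as `AsyncClient()`); `FakeAsyncClient` is a stand-in so the pattern can be tried without a server:

```python
import asyncio


async def ask_all(client, prompts, model='llama3.1'):
    """Send every prompt concurrently and return the replies in prompt order.

    Hypothetical helper; `client` is assumed to expose an async `chat`
    method like ollama.AsyncClient.
    """
    async def ask_one(prompt):
        message = {'role': 'user', 'content': prompt}
        response = await client.chat(model=model, messages=[message])
        return response['message']['content']

    # gather returns results in the same order as the awaitables passed in
    return await asyncio.gather(*(ask_one(p) for p in prompts))


class FakeAsyncClient:
    """Stand-in for AsyncClient that echoes the prompt in upper case."""

    async def chat(self, model, messages):
        await asyncio.sleep(0)  # yield control, as a real network call would
        text = messages[-1]['content'].upper()
        return {'message': {'role': 'assistant', 'content': text}}


answers = asyncio.run(ask_all(FakeAsyncClient(), ['sky', 'sea']))
print(answers)
```

Whether the requests are actually processed in parallel depends on how the server is configured.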

### Errors
Errors are raised if requests return an error status or if an error is detected while streaming.

In [None]:
model = 'llama3.1:70b'

try:
  ollama.chat(model)
except ollama.ResponseError as e:
  print('Error:', e.error)
  if e.status_code == 404:
    # The model is not available locally, so download it
    ollama.pull(model)
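
The pull-on-404 pattern above stops after pulling; a natural extension is to retry the request once. A minimal sketch, assuming a hypothetical `chat_with_pull` helper. To keep it runnable without a server, it takes the chat and pull functions as arguments and inspects `status_code` on any exception; in real code, catching `ollama.ResponseError` specifically would be tighter:

```python
def chat_with_pull(chat_fn, pull_fn, model, messages):
    """Call chat_fn; on a 404-style error, pull the model and retry once.

    Hypothetical helper. `chat_fn` and `pull_fn` are assumed to behave like
    ollama.chat and ollama.pull.
    """
    try:
        return chat_fn(model=model, messages=messages)
    except Exception as e:
        # Anything other than "model not found" is re-raised unchanged
        if getattr(e, 'status_code', None) != 404:
            raise
        pull_fn(model)
        return chat_fn(model=model, messages=messages)


class FakeNotFound(Exception):
    """Stand-in for an error carrying a 404 status code."""
    status_code = 404


pulled = []


def fake_chat(model, messages):
    if model not in pulled:
        raise FakeNotFound('model not found')
    return {'message': {'role': 'assistant', 'content': 'ok'}}


def fake_pull(model):
    pulled.append(model)


response = chat_with_pull(
    fake_chat, fake_pull, 'llama3.1:70b',
    [{'role': 'user', 'content': 'Why is the sky blue?'}],
)
print(response['message']['content'])
```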

<div style="text-align: center;">
    <p style="font-size: 1em; margin: 0;">Written by Ramachandra Udupa | <a href="https://www.linkedin.com/in/ramachandra-udupa/" style="color: #4CAF50; text-decoration: none;">Connect on LinkedIn</a></p>
</div>