# Using Ollama

[Ollama](https://github.com/ollama/ollama) is a simple way to get started with running language models locally.

We provide helpers to interface with Ollama by wrapping the [ollama-python](https://github.com/ollama/ollama-python) package.

## Installation
See the main README for installation instructions.


## Instantiating the Ollama client

We use the `Client` class from ollama-python so that the host can be configured. By default, the `ollama_client` function reads the host from the `OLLAMA_HOST` environment variable. If that variable is not set, you must provide a host explicitly. For a local Ollama install, the host is typically `http://localhost:11434`.

In [1]:
from not_again_ai.local_llm.ollama.ollama_client import ollama_client

client = ollama_client()
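
The host lookup described above can be pictured with a small standalone sketch. This is not the library's implementation: `resolve_host` is a hypothetical helper, and the precedence it uses (explicit argument first, then `OLLAMA_HOST`) is an assumption made for illustration.

```python
import os
from typing import Optional

# Typical address of a local Ollama server, as noted in the text above.
DEFAULT_LOCAL_HOST = "http://localhost:11434"


def resolve_host(host: Optional[str] = None) -> str:
    """Hypothetical helper: pick a host from an argument or OLLAMA_HOST.

    Assumption: an explicit argument wins over the environment variable.
    """
    if host:
        return host
    env_host = os.environ.get("OLLAMA_HOST")
    if env_host:
        return env_host
    raise ValueError("No host provided and OLLAMA_HOST is not set.")


# Set the variable only if it is not already defined, then resolve it.
os.environ.setdefault("OLLAMA_HOST", DEFAULT_LOCAL_HOST)
print(resolve_host())
```

In practice, exporting `OLLAMA_HOST` in your shell before starting the notebook has the same effect as the `setdefault` call here.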

## Basic Chat Completion

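The `chat_completion` function sends a list of messages to a model and returns a dictionary containing the reply, the token counts, and the response duration.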

We assume that the model `phi3` has already been pulled into Ollama. If not, run `ollama pull phi3` in your terminal, or use the `pull` function from `not_again_ai.local_llm.ollama.service` (shown later in this notebook).

In [2]:
from not_again_ai.local_llm.ollama.chat_completion import chat_completion

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

response = chat_completion(messages, model="phi3", client=client)
response

{'message': ' "Hello!"\n\n\nThis simple greeting is appropriate for various contexts, including casual conversations and initial interactions with someone new where no prior relationship exists between the parties involved. It\'s universally understood in many cultures as an informal way to acknowledge another person’ endlessly pursuing a solution without any substantial progress or feedback could be demotivating and counterproductive, potentially leading them down unhelpful paths due to lack of direction. To assist you effectively:\n\n- Offer constructive criticism if appropriate – give specific suggestions on how they might improve their approach in seeking solutions for the common cold while emphasizing empathy.',
 'prompt_tokens': 13,
 'completion_tokens': 137,
 'response_duration': 0.892}
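
The fields in the returned dictionary are enough to estimate generation throughput. The sketch below copies the numeric values from the output above into a plain dictionary (the message text is elided) and assumes that `response_duration` is measured in seconds. `tokens_per_second` is a hypothetical helper, not part of the package.

```python
# Values copied from the output above; the message text is elided.
response = {
    "message": "...",
    "prompt_tokens": 13,
    "completion_tokens": 137,
    "response_duration": 0.892,  # assumed to be seconds
}


def tokens_per_second(resp: dict) -> float:
    """Hypothetical helper: completion tokens generated per second."""
    duration = resp["response_duration"]
    if duration <= 0:
        return 0.0
    return resp["completion_tokens"] / duration


total_tokens = response["prompt_tokens"] + response["completion_tokens"]
print(f"total tokens: {total_tokens}")
print(f"throughput: {tokens_per_second(response):.1f} tokens/s")
```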

## Chat Completion with Other Features

The Ollama API also supports several other features, which `chat_completion` exposes as keyword arguments such as `json_mode`, `temperature`, `max_tokens`, `context_window`, and `seed`.

In [3]:
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {
        "role": "user",
        "content": "Generate a random number between 0 and 100 and structure the response using JSON.",
    },
]

response = chat_completion(
    messages,
    model="phi3",
    client=client,
    max_tokens=300,
    context_window=1000,
    temperature=1.51,
    json_mode=True,
    seed=6,
)
response

{'message': {'random_number': '42'},
 'prompt_tokens': 32,
 'completion_tokens': 13,
 'response_duration': 2.7096}
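
In the output above, `message` is already a dictionary, but the number arrives as the string `'42'`. Since the model chooses both the keys and the value types, it can help to validate the result before using it. A minimal sketch, with `extract_int` as a hypothetical helper and the key name taken from this particular output:

```python
# Shape copied from the JSON-mode output above.
response = {
    "message": {"random_number": "42"},
    "prompt_tokens": 32,
    "completion_tokens": 13,
    "response_duration": 2.7096,
}


def extract_int(message: dict, key: str, low: int, high: int) -> int:
    """Hypothetical helper: read an integer field and check its range."""
    if key not in message:
        raise KeyError(f"model response has no {key!r} field")
    value = int(message[key])  # accepts both 42 and "42"
    if not low <= value <= high:
        raise ValueError(f"{key}={value} is outside [{low}, {high}]")
    return value


number = extract_int(response["message"], "random_number", 0, 100)
print(number)
```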

## Ollama Service Management

Ollama models can also be managed with helper functions from the `service` module: you can list installed models, check whether a model is available, show details about a model, delete a model, and pull a model.

In [4]:
from not_again_ai.local_llm.ollama.service import delete, is_model_available, list_models, pull, show

# Check what models are installed
models = list_models(client)
print(models)

[{'name': 'llama3.1:70b-instruct-q4_0', 'model': 'llama3.1:70b-instruct-q4_0', 'modified_at': '2024-07-27T19:48:08.427121819Z', 'size': 39969747075, 'size_readable': '37.22 GB', 'details': {'parent_model': '', 'format': 'gguf', 'family': 'llama', 'families': ['llama'], 'parameter_size': '70.6B', 'quantization_level': 'Q4_0'}}, {'name': 'llama3.1:8b-instruct-q4_0', 'model': 'llama3.1:8b-instruct-q4_0', 'modified_at': '2024-07-27T19:41:30.293348579Z', 'size': 4661226402, 'size_readable': '4.34 GB', 'details': {'parent_model': '', 'format': 'gguf', 'family': 'llama', 'families': ['llama'], 'parameter_size': '8.0B', 'quantization_level': 'Q4_0'}}, {'name': 'phi3:3.8b-mini-4k-instruct-q8_0', 'model': 'phi3:3.8b-mini-4k-instruct-q8_0', 'modified_at': '2024-07-27T19:25:48.923425749Z', 'size': 4061223105, 'size_readable': '3.78 GB', 'details': {'parent_model': '', 'format': 'gguf', 'family': 'phi3', 'families': ['phi3'], 'parameter_size': '3.8B', 'quantization_level': 'Q8_0'}}, {'name': 'phi3:
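
Because `list_models` returns plain dictionaries, the result is easy to summarize. The sketch below uses only the `name` and `size` fields of the three complete entries shown above. `readable_size` is a hypothetical helper that formats bytes as binary gigabytes, which matches the `size_readable` values in that output.

```python
# Names and sizes (in bytes) copied from the complete entries above.
models = [
    {"name": "llama3.1:70b-instruct-q4_0", "size": 39969747075},
    {"name": "llama3.1:8b-instruct-q4_0", "size": 4661226402},
    {"name": "phi3:3.8b-mini-4k-instruct-q8_0", "size": 4061223105},
]


def readable_size(num_bytes: int) -> str:
    """Hypothetical helper: format a byte count as GB (1 GB = 1024**3 bytes)."""
    return f"{num_bytes / 1024**3:.2f} GB"


# Print models from smallest to largest, then the combined disk usage.
for entry in sorted(models, key=lambda m: m["size"]):
    print(f"{entry['name']:<35} {readable_size(entry['size'])}")

total_bytes = sum(m["size"] for m in models)
print(f"{'total':<35} {readable_size(total_bytes)}")
```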

In [5]:
# Check if a model is available
is_model_available("phi3", client)

True

In [6]:
# Show details about a model
show("phi3", client)

{'modelfile': '# Modelfile generated by "ollama show"\n# To build a new Modelfile based on this, replace FROM with:\n# FROM phi3:latest\n\nFROM /usr/share/ollama/.ollama/models/blobs/sha256-3e38718d00bb0007ab7c0cb4a038e7718c07b54f486a7810efd03bb4e894592a\nTEMPLATE "{{ if .System }}<|system|>\n{{ .System }}<|end|>\n{{ end }}{{ if .Prompt }}<|user|>\n{{ .Prompt }}<|end|>\n{{ end }}<|assistant|>\n{{ .Response }}<|end|>"\nPARAMETER stop <|end|>\nPARAMETER stop <|user|>\nPARAMETER stop <|assistant|>\nLICENSE """Microsoft.\nCopyright (c) Microsoft Corporation.\n\nMIT License\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the "Software"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the follo

In [7]:
# Delete a model
delete("phi3", client)

{'status': 'success'}

In [8]:
# Pull a model
pull("phi3", client)

{'status': 'success'}
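
A common way to combine these helpers is to pull a model only when it is missing. The sketch below is one possible pattern, not part of the package: `ensure_model` is a hypothetical function that takes the availability check and the pull function as arguments, and the stand-ins let it run without an Ollama server.

```python
from typing import Any, Callable


def ensure_model(
    model: str,
    client: Any,
    is_available: Callable[[str, Any], bool],
    pull_fn: Callable[[str, Any], Any],
) -> bool:
    """Hypothetical helper: pull `model` only if it is not available.

    Returns True when a pull was triggered, False otherwise.
    """
    if is_available(model, client):
        return False
    pull_fn(model, client)
    return True


# Stand-ins that mimic the call signatures used in this notebook.
installed = {"phi3"}


def fake_is_available(model: str, client: Any) -> bool:
    return model in installed


def fake_pull(model: str, client: Any) -> dict:
    installed.add(model)
    return {"status": "success"}


print(ensure_model("phi3", None, fake_is_available, fake_pull))
print(ensure_model("llama3.1", None, fake_is_available, fake_pull))
```

With the real helpers imported earlier, the call would presumably look like `ensure_model("phi3", client, is_model_available, pull)`.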