# Using Ollama

[Ollama](https://github.com/ollama/ollama) is a simple way to get started with running language models locally.

We provide helpers for interfacing with Ollama through the [ollama-python](https://github.com/ollama/ollama-python) package.

## Installation
1. Follow the instructions to install Ollama for your system: https://github.com/ollama/ollama
1. [Add Ollama as a startup service (recommended)](https://github.com/ollama/ollama/blob/main/docs/linux.md#adding-ollama-as-a-startup-service-recommended)
1. If you'd like to make the Ollama service accessible on your local network and it is hosted on Linux, add the following to the `/etc/systemd/system/ollama.service` file:
    ```ini
    [Service]
    ...
    Environment="OLLAMA_HOST=0.0.0.0"
    ```
    After reloading systemd and restarting the service (`sudo systemctl daemon-reload`, then `sudo systemctl restart ollama`), Ollama will be available at `http://<local_address>:11434`.
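
To confirm that the service can be reached from another machine, a quick connectivity check can help. The sketch below uses only the standard library and assumes the server answers a plain GET on its root URL with HTTP 200; `is_ollama_reachable` is a hypothetical helper name, not part of this package or of Ollama.

```python
import urllib.error
import urllib.request


def is_ollama_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET on base_url answers with status 200.

    Hypothetical helper: assumes the Ollama server replies to a GET on
    its root URL; it does not check which models are installed.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


# Example (replace the address with the machine that runs Ollama):
# is_ollama_reachable("http://localhost:11434")
```

A `False` result only says that nothing answered at that address within the timeout; a firewall rule or a service that was not restarted are the usual things to check first.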


## Instantiating the Ollama client

We use the `Client` class from ollama-python so that the host can be customized. By default, the `ollama_client` function reads the host from the `OLLAMA_HOST` environment variable; if it is not set, you must provide a host. For a local install, this is generally `http://localhost:11434`.

```python
from not_again_ai.local_llm.ollama.ollama_client import ollama_client

client = ollama_client()
```
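
The host lookup described above can be sketched in a few lines. This is not the implementation of `ollama_client`; `resolve_ollama_host` is a hypothetical helper, and letting an explicit argument take precedence over the environment variable is an assumption made for the example.

```python
import os
from typing import Optional


def resolve_ollama_host(host: Optional[str] = None) -> str:
    """Pick a host: explicit argument first (assumed), then OLLAMA_HOST.

    Raises ValueError when neither is available, mirroring the rule that
    a host must be provided if the environment variable is not set.
    """
    resolved = host or os.environ.get("OLLAMA_HOST")
    if not resolved:
        raise ValueError("Set OLLAMA_HOST or pass a host explicitly.")
    return resolved
```

The resulting string is the kind of value that ollama-python's `Client(host=...)` accepts.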

## Basic Chat Completion

The `chat_completion` function can be used to call models.

We assume that the model `phi3` has already been pulled into Ollama. If not, run `ollama pull phi3` in your terminal, or use the `pull` function from `not_again_ai.local_llm.ollama.service` (shown later).

```python
from not_again_ai.local_llm.ollama.chat_completion import chat_completion

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

response = chat_completion(messages, model="phi3", client=client)
response
```

```text
{'message': " Hello there! How can I assist you today? Whether it's answering questions, providing information or helping with any tasks, feel free to let me know what you need. I'm here to help!",
 'prompt_tokens': 6,
 'completion_tokens': 43,
 'response_duration': 3.2580406665802}
```
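
The fields in the returned dictionary are enough to estimate generation throughput. The sketch below assumes `response_duration` is measured in seconds, which the magnitude of the value suggests but the output itself does not state; `tokens_per_second` is a hypothetical helper.

```python
def tokens_per_second(response: dict) -> float:
    """Estimate throughput from a chat_completion-style response dict.

    Assumes 'response_duration' is in seconds (not confirmed by the source).
    """
    duration = response["response_duration"]
    if duration <= 0:
        return 0.0
    return response["completion_tokens"] / duration


# Values copied from the output above (message shortened for brevity).
sample = {
    "message": " Hello there! ...",
    "prompt_tokens": 6,
    "completion_tokens": 43,
    "response_duration": 3.2580406665802,
}
print(f"{tokens_per_second(sample):.1f} tokens/s")  # prints "13.2 tokens/s"
```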

## Chat Completion with Other Features

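The Ollama API also supports several other features, such as JSON mode, temperature, a maximum number of generated tokens, a context window size, and a seed. `chat_completion` exposes these as keyword arguments.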

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {
        "role": "user",
        "content": "Generate a random number between 0 and 100 and structure the response in using JSON.",
    },
]

response = chat_completion(
    messages,
    model="phi3",
    client=client,
    max_tokens=300,
    context_window=1000,
    temperature=1.51,
    json_mode=True,
    seed=6,
)
response
```

```text
{'message': {'random_number': 42},
 'prompt_tokens': 25,
 'completion_tokens': 10,
 'response_duration': 3.268199920654297}
```
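
In JSON mode the model still chooses its own key names, so downstream code may want to validate the payload instead of relying on a key such as `random_number`. The following is a defensive sketch with hypothetical helpers; it assumes the message is either an already-parsed dictionary, as in the output above, or a JSON string.

```python
import json
from typing import Any, Optional


def message_as_dict(response: dict[str, Any]) -> dict[str, Any]:
    """Return the response message as a dict, parsing it if it is a string."""
    message = response["message"]
    if isinstance(message, dict):
        return message
    parsed = json.loads(message)
    if not isinstance(parsed, dict):
        raise ValueError("Expected a JSON object in the message.")
    return parsed


def first_int_in_range(payload: dict[str, Any], low: int = 0, high: int = 100) -> Optional[int]:
    """Return the first integer value within [low, high], or None."""
    for value in payload.values():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, int) and not isinstance(value, bool) and low <= value <= high:
            return value
    return None
```

With the response shown above, `first_int_in_range(message_as_dict(response))` would return `42`.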

## Ollama Service Management

Ollama models can also be managed using several helper functions. For example, you can pull a model, list all models, and delete a model.

```python
from not_again_ai.local_llm.ollama.service import delete, is_model_available, list_models, pull, show

# Check what models are installed
models = list_models(client)
print(models)
```

```text
[{'name': 'phi3:latest', 'model': 'phi3:latest', 'modified_at': '2024-06-20T00:22:45.763407206Z', 'size': 2393232963, 'size_readable': '2.23 GB', 'details': {'parent_model': '', 'format': 'gguf', 'family': 'phi3', 'families': ['phi3'], 'parameter_size': '3.8B', 'quantization_level': 'Q4_K_M'}}, {'name': 'command-r:35b', 'model': 'command-r:35b', 'modified_at': '2024-06-19T14:39:39.871170656Z', 'size': 20229443783, 'size_readable': '18.84 GB', 'details': {'parent_model': '', 'format': 'gguf', 'family': 'command-r', 'families': ['command-r'], 'parameter_size': '35B', 'quantization_level': 'Q4_0'}}, {'name': 'mistral:7b-instruct', 'model': 'mistral:7b-instruct', 'modified_at': '2024-06-19T14:19:39.805655483Z', 'size': 4113301090, 'size_readable': '3.83 GB', 'details': {'parent_model': '', 'format': 'gguf', 'family': 'llama', 'families': ['llama'], 'parameter_size': '7.2B', 'quantization_level': 'Q4_0'}}, {'name': 'llama3-gradient:70b', 'model': 'llama3-gradient:70b', 'modified_at': '2024-
```
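
The raw list is hard to scan, so a small summary sorted by size can be handy. The sketch below uses only the `name` and `size` keys visible in the output above. The `size_readable` values shown there are consistent with dividing the byte count by 1024**3; that is an inference from the three complete entries, not a documented guarantee, and both helper names are hypothetical.

```python
def readable_size(num_bytes: int) -> str:
    """Format a byte count the way the sample output appears to (bytes / 1024**3)."""
    return f"{num_bytes / 1024**3:.2f} GB"


def summarize_models(models: list[dict]) -> list[tuple[str, str]]:
    """Return (name, readable size) pairs, largest model first."""
    ordered = sorted(models, key=lambda m: m["size"], reverse=True)
    return [(m["name"], readable_size(m["size"])) for m in ordered]


# Names and sizes copied from the three complete entries in the output above.
sample_models = [
    {"name": "phi3:latest", "size": 2393232963},
    {"name": "command-r:35b", "size": 20229443783},
    {"name": "mistral:7b-instruct", "size": 4113301090},
]
for name, size in summarize_models(sample_models):
    print(f"{name:<22} {size}")
```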

```python
# Check if a model is available
is_model_available("phi3", client)
```

```text
True
```

```python
# Show details about a model
show("phi3", client)
```

```text
{'modelfile': '# Modelfile generated by "ollama show"\n# To build a new Modelfile based on this, replace FROM with:\n# FROM phi3:latest\n\nFROM /usr/share/ollama/.ollama/models/blobs/sha256-b26e6713dc749dda35872713fa19a568040f475cc71cb132cff332fe7e216462\nTEMPLATE "{{ if .System }}<|system|>\n{{ .System }}<|end|>\n{{ end }}{{ if .Prompt }}<|user|>\n{{ .Prompt }}<|end|>\n{{ end }}<|assistant|>\n{{ .Response }}<|end|>"\nPARAMETER stop <|end|>\nPARAMETER stop <|user|>\nPARAMETER stop <|assistant|>\nLICENSE """Microsoft.\nCopyright (c) Microsoft Corporation.\n\nMIT License\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the "Software"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the follo
```

```python
# Delete a model
delete("phi3", client)
```

```text
{'status': 'success'}
```

```python
# Pull a model
pull("phi3", client)
```

```text
{'status': 'success'}
```
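
The availability check and the pull can be combined into a "pull only if missing" step before calling a model. The sketch below takes the two functions as parameters so it runs without an Ollama server; `ensure_model` is a hypothetical helper, and it assumes the call shapes and the `{'status': 'success'}` result shown above.

```python
from typing import Any, Callable


def ensure_model(
    model: str,
    client: Any,
    is_available: Callable[[str, Any], bool],
    pull_model: Callable[[str, Any], dict],
) -> bool:
    """Pull `model` only when it is missing. Return True if a pull happened."""
    if is_available(model, client):
        return False
    result = pull_model(model, client)
    # Assumes a successful pull reports {'status': 'success'}, as in the output above.
    if result.get("status") != "success":
        raise RuntimeError(f"Pulling {model!r} failed: {result}")
    return True
```

With the imports from this section, the call would look like `ensure_model("phi3", client, is_model_available, pull)`.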