# Using Ollama

[Ollama](https://github.com/ollama/ollama) is a simple way to get started with running language models locally.

We provide helpers to interface with Ollama by wrapping the [ollama-python](https://github.com/ollama/ollama-python) package.

## Installation

See the main README for installation instructions.


## Instantiating the Ollama client

We use the `Client` class from Ollama so that the host can be customized. By default, the `ollama_client` function reads the host from the `OLLAMA_HOST` environment variable. If that variable is not set, you must provide a host yourself. A local Ollama server listens on `http://localhost:11434` by default.


In [1]:
from not_again_ai.llm.chat_completion.providers.ollama_api import ollama_client

client = ollama_client()
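
The host lookup described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: `resolve_ollama_host` is a hypothetical helper, and the rule that an explicit argument takes precedence over `OLLAMA_HOST` is an assumption.

```python
import os
from typing import Optional

# Address a local Ollama server listens on by default.
DEFAULT_LOCAL_HOST = "http://localhost:11434"


def resolve_ollama_host(host: Optional[str] = None) -> str:
    """Hypothetical helper: explicit host first, then OLLAMA_HOST, else fail."""
    if host:
        return host
    env_host = os.environ.get("OLLAMA_HOST")
    if env_host:
        return env_host
    raise ValueError("No host was provided and OLLAMA_HOST is not set.")


# Point the environment variable at a local server, then resolve it.
os.environ["OLLAMA_HOST"] = DEFAULT_LOCAL_HOST
print(resolve_ollama_host())

# An explicit host wins in this sketch (assumed precedence).
print(resolve_ollama_host("http://192.168.1.20:11434"))
```

Setting `OLLAMA_HOST` once in your shell profile avoids repeating the host in every notebook.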

## Basic Chat Completion

The same `chat_completion` function used for OpenAI and other providers can also call models hosted on Ollama.

We assume that the model `phi4` has already been pulled into Ollama. If not, you can do so with the command `ollama pull phi4` in your terminal.


In [2]:
from not_again_ai.llm.chat_completion import chat_completion
from not_again_ai.llm.chat_completion.types import ChatCompletionRequest, SystemMessage, UserMessage

messages = [
    SystemMessage(content="You are a helpful assistant."),
    UserMessage(content="Hello!"),
]

request = ChatCompletionRequest(
    messages=messages,
    model="phi4",
    context_window=4000,  # Set context_window because Ollama's default is small.
)

response = chat_completion(request, provider="ollama", client=client)
response

ChatCompletionResponse(choices=[ChatCompletionChoice(message=AssistantMessage(content="Hi there! How can I assist you today? Whether it's answering questions, providing information, or helping with a specific task, feel free to let me know what you need! 😊", role=<Role.ASSISTANT: 'assistant'>, name=None, refusal=None, tool_calls=None), finish_reason='stop', json_message=None, logprobs=None, extras=None)], errors='', completion_tokens=39, prompt_tokens=24, completion_detailed_tokens=None, prompt_detailed_tokens=None, response_duration=4.5843, system_fingerprint=None, extras=None)
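
The response is a nested object, so the assistant's text sits a few attributes deep. The sketch below shows one way to pull it out. The attribute names (`choices`, `message`, `content`, `completion_tokens`, `prompt_tokens`) are taken from the printed output above; `first_reply_text` is a hypothetical helper, and the `SimpleNamespace` stand-in exists only so the snippet runs without an Ollama server.

```python
from types import SimpleNamespace


def first_reply_text(response) -> str:
    """Hypothetical helper: return the assistant text of the first choice."""
    return response.choices[0].message.content


# Stand-in with the same attribute names as the printed response above.
stand_in = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(content="Hi there! How can I assist you today?"),
            finish_reason="stop",
        )
    ],
    completion_tokens=39,
    prompt_tokens=24,
)

print(first_reply_text(stand_in))

# Total tokens used by the call: prompt plus completion.
print(stand_in.prompt_tokens + stand_in.completion_tokens)
```

With a real call, passing the `response` from the cell above to `first_reply_text` should work the same way, provided the attribute names match the printed output.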

## Chat Completion with Other Features

The Ollama API also supports several other features, such as JSON mode, temperature, and a maximum number of completion tokens. The `ChatCompletionRequest` class has fields for all of these (`json_mode`, `temperature`, `max_completion_tokens`), as well as Ollama-specific ones such as `top_k`.


In [3]:
messages = [
    SystemMessage(content="You are a helpful assistant."),
    UserMessage(content="Generate a random number between 0 and 100 and structure the response using JSON."),
]

request = ChatCompletionRequest(
    messages=messages,
    model="phi4",
    max_completion_tokens=300,
    context_window=1000,
    temperature=1.51,
    json_mode=True,
    top_k=5,
    seed=6,
)

response = chat_completion(request, provider="ollama", client=client)
response

ChatCompletionResponse(choices=[ChatCompletionChoice(message=AssistantMessage(content='{\n    "random_number": 47\n} \n\n', role=<Role.ASSISTANT: 'assistant'>, name=None, refusal=None, tool_calls=None), finish_reason='stop', json_message={'random_number': 47}, logprobs=None, extras=None)], errors='', completion_tokens=12, prompt_tokens=40, completion_detailed_tokens=None, prompt_detailed_tokens=None, response_duration=4.258, system_fingerprint=None, extras=None)
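
In the output above, `json_message` holds the same data as the raw `content` string, already parsed into a dictionary. The sketch below reproduces that relationship with the standard library, using the `content` value copied from the output. How the library fills `json_message` internally is an assumption; this only shows that parsing the string yields the same dictionary.

```python
import json

# Raw assistant content copied from the output above, including trailing whitespace.
content = '{\n    "random_number": 47\n} \n\n'

# json.loads ignores leading and trailing whitespace around the document.
parsed = json.loads(content)
print(parsed)

# Validate the value before using it, since the model is not guaranteed to obey the prompt.
number = parsed["random_number"]
if not 0 <= number <= 100:
    raise ValueError(f"Expected a number between 0 and 100, got {number}")
```

Reading `json_message` directly saves the parsing step, but a range check like the one above is still worth keeping for model-generated values.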