# Run Models on a Local Machine with Ollama

## What's Ollama?

Ollama is an inference framework for running LLMs. Frameworks of this kind include:
- [Ollama](https://ollama.com/), which runs models in the GGUF format.
- [vLLM](https://docs.vllm.ai/)
- [LM Studio](https://lmstudio.ai/)

In [None]:
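# Optional (AutoDL.com instances): enable the network accelerator to speed up the download
# source /etc/network_turbo
# Install Ollama
# curl -fsSL https://ollama.com/install.sh | sh
# Start the server (it listens on port 11434 by default and keeps running)
# ollama serve
# In a second terminal: download the model and start chatting
# ollama run llama3.2:1b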
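
Before sending chat requests, it can help to confirm that the server started by `ollama serve` is actually reachable. The following is a minimal sketch using only the standard library. It assumes the default address `http://localhost:11434` and queries the `/api/tags` endpoint, which lists locally available models. The helper name `list_local_models` is made up for this example.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default address.
OLLAMA_URL = "http://localhost:11434"


def list_local_models(base_url=OLLAMA_URL, timeout=3.0):
    """Return the names of locally available models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts and HTTP errors;
        # ValueError covers a response body that is not valid JSON.
        return None
    return [m["name"] for m in payload.get("models", [])]


if __name__ == "__main__":
    models = list_local_models()
    if models is None:
        print("Ollama is not reachable; is `ollama serve` running?")
    else:
        print("Available models:", models)
```

If the returned list does not contain `llama3.2:1b`, the model has not been downloaded yet.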

In [None]:
# docker on CPU
# docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# docker on AMD GPU
# docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm

# docker on NVIDIA GPU
# docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Run a model inside the running container
# docker exec -it ollama ollama run deepseek-r1:7b

# See more details at https://hub.docker.com/r/ollama/ollama
# https://ollama.com/library/deepseek-r1
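
Because the container publishes port 11434, the model can also be queried from the host over Ollama's native REST API. The sketch below posts a non-streaming request to `/api/generate` and reads the `response` field of the JSON reply. It assumes the container from the commands above is running and that `deepseek-r1:7b` has been pulled. The helper names `build_generate_request` and `generate` are illustrative, not part of any library.

```python
import json
import urllib.request


def build_generate_request(prompt, model="deepseek-r1:7b", base_url="http://localhost:11434"):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=300) as resp:
        return json.load(resp)["response"]


if __name__ == "__main__":
    try:
        print(generate("Why is the sky blue? Answer in one sentence."))
    except OSError as exc:
        # Raised when nothing is listening on the port or the request fails
        print("Request failed:", exc)
```

Setting `"stream": False` makes the server return one JSON object instead of a stream of partial responses, which keeps the parsing simple.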

## Chat with Ollama Inference API

In [1]:
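from openai import OpenAI

# Ollama exposes an OpenAI-compatible API under /v1.
# The client requires an api_key, but Ollama ignores its value.
client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")

response = client.chat.completions.create(
    model="llama3.2:1b",  # must already be pulled, e.g. via `ollama run llama3.2:1b`
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the purpose of life?"},
    ],
)

# Print only the assistant's reply rather than the whole response object
print(response.choices[0].message.content)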

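APIConnectionError: Connection error.

This error means the client could not reach `http://localhost:11434`. Start the server with `ollama serve` (or start the Docker container) and run the cell again.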

## Chat Session with History

In [None]:
from openai import OpenAI

def run_chat_session():
    client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")

    chat_history = []

    while True:
        user_input = input("You: ")
        if user_input.lower() == "exit":
            print("Goodbye!")
            break

        chat_history.append({
            "role": "user",
            "content": user_input
        })

        try:
            response = client.chat.completions.create(
                messages=chat_history,
                model="llama3.2:1b"
            )
        except Exception as e:
            print("Error: ", e)
            # Remove the unanswered message so the history stays consistent
            chat_history.pop()
            continue

        # Show the reply and store it so the model sees it on the next turn
        reply = response.choices[0].message.content
        print("Ollama:", reply)
        chat_history.append({
            "role": "assistant",
            "content": reply
        })


run_chat_session()
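
The chat history in the session above grows with every turn, and small models such as `llama3.2:1b` have a limited context window. One possible way to bound it is to keep only the most recent messages before each request. The sketch below is an assumption about how this could be done, not part of the original session code. `trim_history` and `max_messages` are hypothetical names.

```python
def trim_history(chat_history, max_messages=20):
    """Keep all system messages plus the most recent `max_messages` other messages."""
    system = [m for m in chat_history if m["role"] == "system"]
    rest = [m for m in chat_history if m["role"] != "system"]
    # Compute the start index explicitly so that max_messages=0 keeps nothing
    start = max(0, len(rest) - max_messages)
    return system + rest[start:]


if __name__ == "__main__":
    history = [{"role": "system", "content": "You are a helpful assistant."}]
    for i in range(3):
        history.append({"role": "user", "content": f"question {i}"})
        history.append({"role": "assistant", "content": f"answer {i}"})

    # Keeps the system message and the last two question/answer pairs
    for message in trim_history(history, max_messages=4):
        print(message["role"], "-", message["content"])
```

In the session loop, this would mean passing `messages=trim_history(chat_history)` instead of `messages=chat_history`. An even `max_messages` keeps user and assistant turns paired when the two roles alternate.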