# Running a Large Language Model locally

To interact with LLMs I use [Ollama](https://ollama.com), which can be used from the terminal or from Python.

I downloaded the 7B-parameter version of DeepSeek-R1 with the following command, which fetched a 4.7GB file:

    ollama pull deepseek-r1:7b
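As a rough sanity check on that 4.7GB figure, you can divide the file size by the number of parameters to get the average storage per weight. This sketch makes two assumptions: it reads 4.7GB as 4.7e9 bytes, and it uses the nominal 7 billion parameters. The result is a little over 5 bits per weight, far below the 16 bits of a half-precision model. That suggests the downloaded weights are quantized.

```python
def bits_per_parameter(file_size_bytes: float, n_parameters: float) -> float:
    """Average storage per weight, in bits, for a model file."""
    return file_size_bytes * 8 / n_parameters


# Assumed figures: 4.7 GB read as 4.7e9 bytes, and the nominal 7e9 parameters.
DOWNLOAD_BYTES = 4.7e9
N_PARAMS = 7e9

print(f"{bits_per_parameter(DOWNLOAD_BYTES, N_PARAMS):.2f} bits per parameter")
# Size the same weights would need at 16 bits (2 bytes) each, for comparison:
print(f"{N_PARAMS * 2 / 1e9:.0f} GB at 16 bits per parameter")
```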

Then I can chat with the model in the terminal:

    ollama run deepseek-r1:7b

Now let's do it in Python! This needs the `ollama` Python package (`pip install ollama`) and a running Ollama server, which you can start either with

    ollama serve

in the terminal, or by starting the Ollama GUI.
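If a cell below fails with a connection error, the server is probably not running. One quick way to check from Python is to send a plain HTTP request to the server's address. This is a minimal sketch that uses only the standard library. It assumes the default address `http://localhost:11434`; change it if you set `OLLAMA_HOST` to something else. Any HTTP 200 reply is treated as a running server.

```python
import urllib.error
import urllib.request


def ollama_is_running(host: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if something answers with HTTP 200 at `host`.

    Assumes Ollama's default address (localhost, port 11434).
    """
    try:
        with urllib.request.urlopen(host, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout or name-resolution failure: no server.
        return False


if __name__ == "__main__":
    print("Ollama server reachable:", ollama_is_running())
```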

In [1]:
import ollama

LANGUAGE_MODEL = "deepseek-r1:7b"

## Asking a question with `ollama.generate()`

In [2]:
%%time
stream = ollama.generate(model=LANGUAGE_MODEL, prompt="Is the Sun a star?", stream=True)
for chunk in stream:
    print(chunk["response"], end="", flush=True)

<think>
Okay, so I want to know if the Sun is a star. From what I remember, the Sun is at the center of our solar system and it's really bright. But wait, isn't everything called a star when it's hot and glowing? Well, not exactly. Stars are typically categorized based on their brightness as seen from Earth, like main sequence stars which include our Sun.

I think the Sun has been called different things historically. In ancient times, people might have referred to it as a planet or something else because it appears to move across the sky and doesn't emit light itself. But actually, we know now that the Sun is a star, emitting its own light through nuclear fusion in its core.

So, if I list out what makes the Sun unique: it's a massive ball of gas with hydrogen in its core. It burns fuel using nuclear reactions to produce energy for billions of years. The surface is called the photosphere, and from there, energy is released into space as sunlight. This light travels through space and r
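The loop above prints the answer as it is generated but does not keep it. A small helper can do both, so the full text is still available afterwards, for example to post-process it. This is a sketch: the name `collect_stream` is my own. The usage below feeds it hand-made dictionaries shaped like the chunks in the cell above instead of a real model stream.

```python
from typing import Any, Iterable


def collect_stream(chunks: Iterable[Any], echo: bool = True) -> str:
    """Print each streamed piece as it arrives and return the full text."""
    pieces = []
    for chunk in chunks:
        piece = chunk["response"]  # each chunk carries the next bit of text
        if echo:
            print(piece, end="", flush=True)
        pieces.append(piece)
    if echo:
        print()  # end the line once the stream is exhausted
    return "".join(pieces)


# Stand-in chunks; with Ollama running you would pass
# ollama.generate(model=LANGUAGE_MODEL, prompt="...", stream=True) instead.
fake_stream = [{"response": "Yes, "}, {"response": "the Sun "}, {"response": "is a star."}]
answer = collect_stream(fake_stream)
```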

## Asking the same question again

In [3]:
%%time
stream = ollama.generate(model=LANGUAGE_MODEL, prompt="Is the Sun a star?", stream=True)
for chunk in stream:
    print(chunk["response"], end="", flush=True)

<think>
Okay, so I'm trying to figure out if the Sun is a star. Hmm, let me think about what I know.

First off, I remember that stars are massive balls of hot gas that emit light and heat. They fuse hydrogen into helium in their cores, which powers their energy output. The Sun is often referred to as the "star" because it's the brightest object in our solar system, right? But wait, isn't there a difference between stars and other celestial bodies like planets?

Stars are classified based on their characteristics, such as temperature, size, and luminosity. They have to produce light through nuclear fusion, which is a process that converts hydrogen into helium. The Sun fits this because it's a main-sequence star, meaning it's in the stable part of its life cycle where fusion is happening.

But then there are other objects like planets, comets, asteroids, and meteors. Planets orbit stars, so they're dependent on their host star for light and heat. The Sun doesn't count as a planet becaus

Notice that we get a different output! At each step, the LLM assigns a probability to every possible next token (a word or piece of a word). Ollama then picks one at random according to those probabilities, so each run can produce a different answer.
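To make this concrete, here is a toy version of that sampling step in plain Python. It shows two things: how raw scores become probabilities through a softmax, and how a weighted random draw can give a different continuation on each run. The vocabulary and scores are made up. Real models work on tokens from a vocabulary of many thousands, and Ollama also applies extra filters such as `top_k` and `top_p`, which this sketch leaves out.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(tokens, scores, rng):
    """Draw one token at random, weighted by its probability."""
    return rng.choices(tokens, weights=softmax(scores), k=1)[0]


# A made-up vocabulary and made-up scores for the word after "The Sun is a".
tokens = ["star", "ball", "planet", "light"]
scores = [3.0, 1.5, 1.0, 0.5]

print([round(p, 3) for p in softmax(scores)])
rng = random.Random()  # unseeded, so each run of this cell can differ
print([sample_next_token(tokens, scores, rng) for _ in range(8)])
```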

## How to always get the same output

If we want the output to always be the same, we can force the model to always pick the most probable token by setting the `temperature` option to zero:

In [4]:
%%time
stream = ollama.generate(
    model=LANGUAGE_MODEL,
    prompt="Is Japan in Asia?",
    stream=True,
    options={"temperature": 0},
)
for chunk in stream:
    print(chunk["response"], end="", flush=True)

<think>

</think>

Yes, Japan is located in Asia. It is an island country off the eastern coast of the Pacific Ocean and is part of East Asia.
CPU times: user 8.62 ms, sys: 3.32 ms, total: 11.9 ms
Wall time: 2.14 s


In [5]:
%%time
stream = ollama.generate(
    model=LANGUAGE_MODEL,
    prompt="Is Japan in Asia?",
    stream=True,
    options={"temperature": 0},
)
for chunk in stream:
    print(chunk["response"], end="", flush=True)

<think>

</think>

Yes, Japan is located in Asia. It is an island country off the eastern coast of the Pacific Ocean and is part of East Asia.
CPU times: user 7.14 ms, sys: 3.18 ms, total: 10.3 ms
Wall time: 1.92 s
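Why does a temperature of zero make the output repeatable? In the usual formulation, the scores are divided by the temperature before the softmax. Lowering the temperature therefore piles nearly all the probability onto the top token. At exactly zero the division is undefined, so implementations simply take the most probable token every time (greedy decoding). The toy sketch below uses invented scores to show that trend. It illustrates the general technique and is not Ollama's internal code.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Divide scores by the temperature, then normalise into probabilities."""
    scaled = [s / temperature for s in scores]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(tokens, scores):
    """What temperature 0 amounts to: always take the top-scoring token."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return tokens[best]


# Invented scores for the first word of the answer.
tokens = ["Yes", "No", "Maybe"]
scores = [2.0, 1.0, 0.5]

for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(scores, t)
    print(f"temperature {t}: {[round(p, 3) for p in probs]}")

# Dividing by a temperature of exactly 0 is undefined, so pick greedily instead.
print(greedy_pick(tokens, scores))  # → Yes
```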


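Just because the model sounds sure of itself doesn't mean it is correct! In the answer below, for instance, the band never had a member called "Charlie Chang" (the drummer was Charlie Watts), Bill Wyman played bass rather than guitar, and "Hey Jude" is a Beatles song: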

In [6]:
%%time
stream = ollama.generate(
    model=LANGUAGE_MODEL,
    prompt="Who are the Rolling Stones?",
    stream=True,
    options={"temperature": 0},
)
for chunk in stream:
    print(chunk["response"], end="", flush=True)

<think>

</think>

The Rolling Stones are one of the most iconic and influential rock bands in music history. Formed in 1962 in London, England, by Mick Jagger and Keith Richards, the band has evolved over the years with the addition of other members, including Charlie Chang (bass) and Bill Wyman (guitar). The Rolling Stones are known for their powerful stage presence, energetic performances, and a wide range of music that spans more than five decades.

The band's music is characterized by its deep grooves, emotional lyrics, and a blend of rock, blues, and psychedelic elements. Some of their most famous songs include "Brown Sugar," "Paint It, Black," "Satisfaction," "Hey Jude," "Paint the Sky," and "Tattooed Love." The Rolling Stones have won numerous awards, including five Grammys, and have sold over 100 million records worldwide.

In addition to their music, the Rolling Stones are celebrated for their live performances, which have captivated audiences around the world. Their discogra
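Every reply above starts with DeepSeek-R1's reasoning between `<think>` and `</think>` tags, sometimes empty, followed by the answer itself. If you only want the answer, a regular expression can separate the two. This is a sketch that assumes the reply contains at most one such block, as in the outputs shown here. `split_thinking` is a hypothetical helper, not part of the `ollama` package.

```python
import re

# Reasoning sits between <think> and </think>; DOTALL lets "." match newlines.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a DeepSeek-R1 style reply.

    If no complete <think>...</think> block is found, the reasoning is empty
    and the whole text is treated as the answer.
    """
    match = THINK_BLOCK.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer


reply = "<think>\n\n</think>\n\nYes, Japan is located in Asia."
reasoning, answer = split_thinking(reply)
print(repr(reasoning))
print(answer)
```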

## Providing chat history with `ollama.chat()`

This function takes a whole list of messages, each tagged with a role: `user` for your messages, `assistant` for the LLM's replies. This is how you keep a chat history, so that you can ask, for instance:

    user: What is the capital of Japan?
    assistant: Tokyo.
    user: And Thailand?    
    assistant: Bangkok.
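The bookkeeping is simple. The history is just a list of role/content dictionaries, and each turn appends the question and then the answer. The sketch below shows that pattern with a hypothetical `ask` helper. A made-up `toy_model` function stands in for the LLM, so the code runs without Ollama. The toy can only answer "And Thailand?" because the earlier question about capitals is still in the list.

```python
from typing import Callable

Message = dict[str, str]


def ask(messages: list[Message], question: str,
        model: Callable[[list[Message]], str]) -> str:
    """Add a user turn, get the model's reply, and store it in the history."""
    messages.append({"role": "user", "content": question})
    reply = model(messages)  # the model sees the whole conversation so far
    messages.append({"role": "assistant", "content": reply})
    return reply


def toy_model(messages: list[Message]) -> str:
    """Stand-in for an LLM, for illustration only."""
    capitals = {"Japan": "Tokyo.", "Thailand": "Bangkok."}
    # Context from earlier turns: was the user asking about capitals?
    about_capitals = any(
        "capital" in m["content"] for m in messages if m["role"] == "user"
    )
    last = messages[-1]["content"]
    for country, capital in capitals.items():
        if country in last and about_capitals:
            return capital
    return "I don't know."


history: list[Message] = []
ask(history, "What is the capital of Japan?", toy_model)
ask(history, "And Thailand?", toy_model)
for m in history:
    print(f'{m["role"]}: {m["content"]}')
```

With Ollama running, `model` could be something like `lambda msgs: ollama.chat(LANGUAGE_MODEL, messages=msgs)["message"]["content"]`, mirroring the cells below.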

In [7]:
messages = [
    {"role": "user", "content": "What is the capital of Japan?"},
]

response = ollama.chat(LANGUAGE_MODEL, messages=messages, options={"temperature": 0})
message = response["message"]
print(message["content"])
messages.append(message)

<think>

</think>

The capital of Japan is Tokyo.


In [8]:
messages.append({"role": "user", "content": "And Thailand?"})

response = ollama.chat(LANGUAGE_MODEL, messages=messages, options={"temperature": 0})
message = response["message"]
print(message["content"])
messages.append(message)

<think>
Okay, so the user asked about the capitals of Japan and Thailand after I told them that Japan's capital is Tokyo. Now they're asking about Thailand.

I should provide a clear answer first: Bangkok is Thailand's capital.

But wait, maybe they want more details. They might be curious about why Bangkok is the capital or if there are any other capitals besides it.

I should mention that while Bangkok is the official capital, sometimes people refer to Nonthaburi as well because of its economic importance.

Also, I can add a bit about the significance of the capital in terms of governance and international presence.

Keeping it friendly and informative would be best.
</think>

The capital of Thailand is Bangkok. It's also known as Nonthaburi in some contexts due to its economic and administrative significance. Bangkok is a major city in Thailand, serving as both its commercial, political, and cultural center.

Again, check the facts: Nonthaburi is a separate, neighbouring city and province, not another name for Bangkok.
