## Q1. Running Ollama with Docker

I already have Ollama installed locally, and it stores models in `~/.ollama`, so I'll mount
that directory into the container and reuse it.

In [1]:
!docker run -it \
    --rm \
    -v ~/.ollama:/root/.ollama \
    -p 11434:11434 \
    --name ollama \
    ollama/ollama -v




The warning is expected: I replaced the default server command (`ollama serve`) with the client version check (`-v`), so there is no running Ollama instance for the client to connect to.

## Q2. Downloading an LLM

In [5]:
!docker run \
    -d \
    --rm \
    -v ~/.ollama:/root/.ollama \
    -p 11434:11434 \
    --name ollama \
    ollama/ollama


249bf2647870250b1de94f64392bcb52d66efa310a9454ff5c35ba993655884e


In [6]:
!docker exec ollama ollama pull gemma:2b

pulling manifest
pulling c1864a5eb193...   0% ▕                ▏    0 B/1.7 GB

In [11]:
!cat ~/.ollama/models/manifests/registry.ollama.ai/library/gemma/2b

{"schemaVersion":2,"mediaType":"application/vnd.docker.distribution.manifest.v2+json","config":{"mediaType":"application/vnd.docker.container.image.v1+json","digest":"sha256:887433b89a901c156f7e6944442f3c9e57f3c55d6ed52042cbb7303aea994290","size":483},"layers":[{"mediaType":"application/vnd.ollama.image.model","digest":"sha256:c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12","size":1678447520},{"mediaType":"application/vnd.ollama.image.license","digest":"sha256:097a36493f718248845233af1d3fefe7a303f864fae13bc31a3a9704229378ca","size":8433},{"mediaType":"application/vnd.ollama.image.template","digest":"sha256:109037bec39c0becc8221222ae23557559bc594290945a2c4221ab4f303b8871","size":136},{"mediaType":"application/vnd.ollama.image.params","digest":"sha256:22a838ceb7fb22755a3b0ae9b4eadde629d19be1f651f73efb8c6b4e2cd0eea0","size":84}]}

## Q3. Running the LLM

In [48]:
import ollama
from openai import OpenAI

In [27]:
prompt = "10 * 10"
client = ollama.Client()
response = client.generate(
    model="gemma:2b", prompt=prompt
)
response["response"]


'Sure, here is the answer to the question:\n\n10 * 10 = 100.'

In [50]:
prompt = "10 * 10"
client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',
)
response = client.chat.completions.create(
    messages=[
        dict(role="user", content=prompt),
    ],
    model="gemma:2b",
    temperature=0.0,
)
content = response.choices[0].message.content
content


'Sure, here is the answer to the question:\n\n10 * 10 = 100.'
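
Both clients above talk to the same local server over HTTP. For reference, here is a minimal sketch of an equivalent raw request using only the standard library. It assumes Ollama's native `/api/generate` endpoint on the default port; the helper names `build_generate_request` and `generate` are hypothetical and not part of any library.

```python
import json
import urllib.request


def build_generate_request(prompt, model="gemma:2b", base_url="http://localhost:11434"):
    """Build a POST request for Ollama's /api/generate endpoint (assumed default port)."""
    # stream=False asks the server for one JSON object instead of a stream of chunks
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    req = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Usage (requires the container from Q2 to be running):
# generate("10 * 10")
```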

## Q4. Downloading the weights

I have several other models in this directory and don't want to download gemma again, so I'll read the size from the manifest instead.

The model layer is 1,678,447,520 bytes, which is approximately 1.7 GB.
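
As a quick check, the sizes listed in the manifest can be summed in Python. This is a minimal sketch: the helper name `manifest_size_bytes` is mine, and the sizes are copied by hand from the manifest printed in Q2 (digests omitted for brevity).

```python
def manifest_size_bytes(manifest: dict) -> int:
    """Sum the size of the config blob and of every layer listed in a manifest."""
    config_size = manifest.get("config", {}).get("size", 0)
    layer_sizes = sum(layer["size"] for layer in manifest.get("layers", []))
    return config_size + layer_sizes


# Sizes taken from the gemma:2b manifest shown in Q2
sample_manifest = {
    "config": {"size": 483},
    "layers": [
        {"mediaType": "application/vnd.ollama.image.model", "size": 1678447520},
        {"mediaType": "application/vnd.ollama.image.license", "size": 8433},
        {"mediaType": "application/vnd.ollama.image.template", "size": 136},
        {"mediaType": "application/vnd.ollama.image.params", "size": 84},
    ],
}

total = manifest_size_bytes(sample_manifest)
print(f"{total} bytes = {total / 1e9:.2f} GB")
```

Almost all of the total comes from the single model layer; the license, template, and params layers add only a few kilobytes.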

Another approach is to list the local models along with their sizes:

```bash
$ docker exec ollama ollama list
```

## Q5. Adding the weights 

```Dockerfile
FROM ollama/ollama

COPY ollama_files /root/.ollama
```
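
The `ollama_files` directory has to contain the manifest and the blobs it references before the image is built. Below is a minimal sketch of how it could be populated from an existing `~/.ollama` directory without copying the other models. The function name `copy_model_files` is hypothetical, and the blob naming (`models/blobs/sha256-<hex>`) is an assumption that may differ between Ollama versions, so check your local directory first.

```python
import json
import shutil
from pathlib import Path


def copy_model_files(src_root: Path, dst_root: Path, manifest_rel: str) -> list:
    """Copy one manifest and the blobs it references from src_root to dst_root."""
    manifest_src = src_root / manifest_rel
    manifest = json.loads(manifest_src.read_text())

    # Keep the manifest at the same relative path in the destination
    manifest_dst = dst_root / manifest_rel
    manifest_dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(manifest_src, manifest_dst)
    copied = [manifest_dst]

    # Assumption: a digest "sha256:<hex>" is stored as the file "sha256-<hex>"
    blobs_src = src_root / "models" / "blobs"
    blobs_dst = dst_root / "models" / "blobs"
    blobs_dst.mkdir(parents=True, exist_ok=True)
    for entry in [manifest["config"], *manifest["layers"]]:
        name = entry["digest"].replace(":", "-")
        shutil.copy2(blobs_src / name, blobs_dst / name)
        copied.append(blobs_dst / name)
    return copied


# Usage (paths follow the layout shown in Q2):
# copy_model_files(
#     Path.home() / ".ollama",
#     Path("ollama_files"),
#     "models/manifests/registry.ollama.ai/library/gemma/2b",
# )
```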

## Q6. Serving it 

In [53]:
import tiktoken

In [54]:
client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',
)

In [55]:
prompt = "What's the formula for energy?"

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',
)

response = client.chat.completions.create(
    messages=[
        dict(role="user", content=prompt),
    ],
    model="gemma:2b",
    temperature=0.0,
)
content = response.choices[0].message.content
print(content)


Sure, here's the formula for energy:

**E = K + U**

Where:

* **E** is the energy in joules (J)
* **K** is the kinetic energy in joules (J)
* **U** is the potential energy in joules (J)

**Kinetic energy (K)** is the energy an object possesses when it moves or is in motion. It is calculated as half the product of an object's mass (m) and its velocity (v) squared:

**K = 1/2mv^2**

**Potential energy (U)** is the energy an object possesses due to its position or configuration. It is calculated as the product of an object's mass, gravitational constant (g), and height or position above a reference point.

**U = mgh**

**Where:**

* **m** is the mass in kilograms (kg)
* **g** is the acceleration due to gravity in meters per second squared (m/s²)
* **h** is the height or position in meters (m)

The formula shows that energy can be expressed as the sum of kinetic and potential energy. The kinetic energy is a measure of the object's ability to do work, while the potential energy is a measur

The number of tokens:

In [68]:
response.usage

CompletionUsage(completion_tokens=283, prompt_tokens=34, total_tokens=317)

In [61]:
response.usage.completion_tokens

283

Last time we used the tiktoken library, but it knows nothing about the `gemma` model we're using now. If this were a GPT-4o model, the number of tokens would be:

In [62]:
encoding = tiktoken.encoding_for_model("gpt-4o")
len(encoding.encode(content))

260

Gemma uses a different tokenizer vocabulary, so the count differs. To reproduce the reported number, we need Gemma's own tokenizer.

In [63]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")


In [66]:
assert len(tokenizer.encode(content, add_special_tokens=True)) == response.usage.completion_tokens
len(tokenizer.encode(content, add_special_tokens=True))


283