## Homework: Open-Source LLMs

In this homework, I'll experiment more with Ollama.

## Q1. Running Ollama with Docker

Let's run Ollama with Docker. We need to execute the
same command as in the lectures:

```bash
docker run -it \
    --rm \
    -v ollama:/root/.ollama \
    -p 11434:11434 \
    --name ollama \
    ollama/ollama
```

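What's the version of the Ollama client?

To find out, enter the container and execute `ollama` with the `-v` flag.

Note: the command below prints the Docker version rather than the Ollama client version. The Ollama version is printed by `ollama -v` inside the container, for example via `docker exec -it ollama ollama -v`.

docker -v ollama \
Docker version 26.1.3-1, build b72abbb6f0351eb22e5c7bdbba9112fef6b41429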

## Q2. Downloading an LLM 

We will download a smaller LLM: `gemma:2b`.

Again let's enter the container and pull the model:

```bash
ollama pull gemma:2b
```

Inside the container, Ollama saves the results to `/root/.ollama`.

We're interested in the metadata about this model. You can find
it in `models/manifests/registry.ollama.ai/library`

What's the content of the file related to gemma?

docker exec -it ollama bash

root@c9a3630807bf:/# ollama pull gemma:2b\
pulling manifest \
pulling c1864a5eb193... 100% ▕███████████████████████████████████████████████████████████▏ 1.7 GB \
pulling 097a36493f71... 100% ▕███████████████████████████████████████████████████████████▏ 8.4 KB \
pulling 109037bec39c... 100% ▕███████████████████████████████████████████████████████████▏  136 B \
pulling 22a838ceb7fb... 100% ▕███████████████████████████████████████████████████████████▏   84 B \
pulling 887433b89a90... 100% ▕███████████████████████████████████████████████████████████▏  483 B \
verifying sha256 digest \
writing manifest \
removing any unused layers \
success \
root@c9a3630807bf:/# cd /root/.ollama/models/manifests/registry.ollama.ai/library\
root@c9a3630807bf:~/.ollama/models/manifests/registry.ollama.ai/library# ls\
gemma \
root@c9a3630807bf:~/.ollama/models/manifests/registry.ollama.ai/library# cd gemma\
root@c9a3630807bf:~/.ollama/models/manifests/registry.ollama.ai/library/gemma# ls\
2b\
root@c9a3630807bf:~/.ollama/models/manifests/registry.ollama.ai/library/gemma# cat 2b\
{"schemaVersion":2,"mediaType":"application/vnd.docker.distribution.manifest.v2+json","config":{"mediaType":"application/vnd.docker.container.image.v1+json","digest":"sha256:887433b89a901c156f7e6944442f3c9e57f3c55d6ed52042cbb7303aea994290","size":483},"layers":[{"mediaType":"application/vnd.ollama.image.model","digest":"sha256:c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12","size":1678447520},{"mediaType":"application/vnd.ollama.image.license","digest":"sha256:097a36493f718248845233af1d3fefe7a303f864fae13bc31a3a9704229378ca","size":8433},{"mediaType":"application/vnd.ollama.image.template","digest":"sha256:109037bec39c0becc8221222ae23557559bc594290945a2c4221ab4f303b8871","size":136},{"mediaType":"application/vnd.ollama.image.params","digest":"sha256:22a838ceb7fb22755a3b0ae9b4eadde629d19be1f651f73efb8c6b4e2cd0eea0","size":84}]}
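The manifest is a single line of JSON, so it is easier to read once parsed. The following is a minimal sketch, assuming an abridged copy of the manifest printed above (media types and sizes only, digests omitted for brevity); the helper name `summarize_manifest` is hypothetical.

```python
import json

# Abridged copy of the gemma:2b manifest shown above (digests omitted for brevity).
MANIFEST_TEXT = """
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {"mediaType": "application/vnd.docker.container.image.v1+json", "size": 483},
  "layers": [
    {"mediaType": "application/vnd.ollama.image.model", "size": 1678447520},
    {"mediaType": "application/vnd.ollama.image.license", "size": 8433},
    {"mediaType": "application/vnd.ollama.image.template", "size": 136},
    {"mediaType": "application/vnd.ollama.image.params", "size": 84}
  ]
}
"""


def summarize_manifest(text):
    """Return (layer kind, size in bytes) pairs plus the total size of all blobs."""
    manifest = json.loads(text)
    # The last dot-separated part of the media type names the layer kind, e.g. "model".
    layers = [
        (layer["mediaType"].rsplit(".", 1)[-1], layer["size"])
        for layer in manifest["layers"]
    ]
    # The config blob is stored alongside the layers, so it counts toward the total.
    total = manifest["config"]["size"] + sum(size for _, size in layers)
    return layers, total


layers, total = summarize_manifest(MANIFEST_TEXT)
for kind, size in layers:
    print(f"{kind:<10} {size:>14,d} bytes")
print(f"{'total':<10} {total:>14,d} bytes")
```

The `model` layer holds the weights and accounts for nearly all of the download, which matches the 1.7 GB line in the pull output. Inside the container, the same function could be fed the contents of the `gemma/2b` file instead of the embedded string.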

## Q3. Running the LLM

Test the following prompt: "10 * 10". What's the answer?

```python
from openai import OpenAI

prompt = "10 * 10"

# Ollama exposes an OpenAI-compatible API; the client requires an API key, but Ollama does not use it
client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',
)

response = client.chat.completions.create(
    model='gemma:2b',
    messages=[{"role": "user", "content": prompt}]
)

print(response.choices[0].message.content)
```

Sure, here is the answer:

```
10 * 10^1 = 100
```

The model is using the property that multiplication of two numbers with the same base is equal to the base multiplied by the two numbers.


## Q4. Downloading the weights

We don't want to pull the weights every time we run
a docker container. Let's do it once and have them available
every time we start a container.

First, we will need to change how we run the container.

Instead of mapping the `/root/.ollama` folder to a named volume,
let's map it to a local directory:

```bash
mkdir ollama_files

docker stop ollama

docker run -it --rm -v ./ollama_files:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```

Now pull the model:

```bash
docker exec -it ollama ollama pull gemma:2b 
```

What's the size of the `ollama_files/models` folder? 

* 0.6G
* 1.2G
* 1.7G
* 2.2G

Hint: on linux, you can use `du -h` for that.

du -h\
1.6G    ./ollama_files/models/blobs\
8.0K    ./ollama_files/models/manifests/registry.ollama.ai/library/gemma\
12K     ./ollama_files/models/manifests/registry.ollama.ai/library\
16K     ./ollama_files/models/manifests/registry.ollama.ai\
20K     ./ollama_files/models/manifests\
1.6G    ./ollama_files/models\
1.6G    ./ollama_files\
...
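
The pull output earlier reports 1.7 GB for the model layer, while `du -h` reports 1.6G. A likely explanation is units: `du -h` counts in powers of 1024, and the pull output appears to count in powers of 1000 (an assumption based on the numbers, not on Ollama's source). A quick check with the model layer size from the manifest:

```python
# Size of the model layer in bytes, taken from the gemma:2b manifest.
MODEL_BYTES = 1_678_447_520

size_gb = MODEL_BYTES / 1000**3   # decimal gigabytes
size_gib = MODEL_BYTES / 1024**3  # binary gibibytes, the unit behind `du -h`

print(f"{size_gb:.2f} GB")    # 1.68 GB
print(f"{size_gib:.2f} GiB")  # 1.56 GiB
```

To one decimal place these are 1.7 and 1.6, so of the listed options 1.7G is the closest match.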

## Q5. Adding the weights 

Let's now stop the container with `docker stop ollama` and add the weights
to a new image.

For that, let's create a `Dockerfile`:

```dockerfile
FROM ollama/ollama

COPY ...
```

What do you put after `COPY`?

`ollama_files /root/.ollama`, so the full instruction is `COPY ollama_files /root/.ollama`.

## Q6. Serving it 

Let's build it:

```bash
docker build -t ollama-gemma2b .
```

And run it:

```bash
docker run -it --rm -p 11434:11434 ollama-gemma2b
```


We can connect to it using the OpenAI client.

Let's test it with the following prompt:

```python
prompt = "What's the formula for energy?"
```

Also, to make results reproducible, set the `temperature` parameter to 0:

```python
response = client.chat.completions.create(
    #...
    temperature=0.0
)
```


```python
prompt = "What's the formula for energy?"

response = client.chat.completions.create(
    model='gemma:2b',
    messages=[{"role": "user", "content": prompt}],
    temperature=0.0
)

print(response.choices[0].message.content)
```

Sure, here's the formula for energy:

**E = K + U**

Where:

* **E** is the energy in joules (J)
* **K** is the kinetic energy in joules (J)
* **U** is the potential energy in joules (J)

**Kinetic energy (K)** is the energy an object possesses when it moves or is in motion. It is calculated as half the product of an object's mass (m) and its velocity (v) squared:

**K = 1/2mv^2**

**Potential energy (U)** is the energy an object possesses due to its position or configuration. It is calculated as the product of an object's mass, gravitational constant (g), and height or position above a reference point.

**U = mgh**

Where:

* **m** is the mass in kilograms (kg)
* **g** is the gravitational constant (9.8 m/s^2)
* **h** is the height or position in meters (m)

The formula shows that energy can be expressed as the sum of kinetic and potential energy. The kinetic energy is a measure of the object's ability to do work, while the potential energy is a measure of the object's ability to do w

```python
answer = response.choices[0].message.content
```

How many completion tokens did you get in response?

* 304
* 604
* 904
* 1204

```python
import tiktoken

# Note: this is the GPT-4o vocabulary, not Gemma's, so the count is only approximate
encoding = tiktoken.encoding_for_model("gpt-4o")

len(encoding.encode(answer))
```

256

```python
import os
from transformers import AutoTokenizer
from huggingface_hub import login

os.environ['HF_TOKEN'] = 'hf_blabla'  # placeholder; use your own Hugging Face token
login(token=os.environ['HF_TOKEN'])
```

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to C:\Users\marti.MARTIN\.cache\huggingface\token
Login successful


```python
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
```

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


```python
response_ids = tokenizer(answer, return_tensors="pt")
response_ids[0]
```

Encoding(num_tokens=281, attributes=[ids, type_ids, tokens, offsets, attention_mask, special_tokens_mask, overflowing])
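
Re-tokenizing the text gives different counts depending on the tokenizer (256 with the GPT-4o encoding, 281 with the Gemma tokenizer). The OpenAI client's response object also carries a `usage` field with the count reported by the server, which may be the more direct way to answer the question. Below is a minimal sketch; `completion_token_count` is a hypothetical helper, and the stub object with made-up numbers only stands in for a real `response` so the snippet runs without a server.

```python
from types import SimpleNamespace


def completion_token_count(response):
    """Return the completion token count reported by the server, or None if absent."""
    usage = getattr(response, "usage", None)
    if usage is None:
        return None
    return usage.completion_tokens


# Stub standing in for a real response; the numbers are illustrative, not measured.
fake_response = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=5, completion_tokens=7, total_tokens=12)
)
print(completion_token_count(fake_response))  # 7
```

With the real client, passing the `response` from the call above to `completion_token_count` would return the server-reported value, assuming the server fills in `usage`.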