## Homework: Open-Source LLMs

In this homework, we'll experiment more with Ollama.

> It's possible that your answers won't match exactly. If that's the case, select the closest one.

## Q1. Running Ollama with Docker

Let's run Ollama with Docker. We will need to execute the
same command as in the lectures:

```bash
docker run -it \
    --rm \
    -v ollama:/root/.ollama \
    -p 11434:11434 \
    --name ollama \
    ollama/ollama
```

What's the version of the Ollama client?

To find out, enter the container and execute `ollama` with the `-v` flag.

In [5]:
!docker exec -it ollama ollama -v

ollama version is 0.1.48


## Q2. Downloading an LLM 

We will download a smaller LLM: `gemma:2b`.

Again let's enter the container and pull the model:

```bash
ollama pull gemma:2b
```

Inside the Docker container, the model files are saved under `/root/.ollama`.

We're interested in the metadata about this model. You can find
it in `models/manifests/registry.ollama.ai/library`

What's the content of the file related to gemma?

In [10]:
!docker exec -it ollama ls -la /root/.ollama/models/manifests/registry.ollama.ai/library/gemma/

total 12
drwxr-xr-x 2 root root 4096 Jul  6 20:16 .
drwxr-xr-x 3 root root 4096 Jul  6 20:16 ..
-rw-r--r-- 1 root root  856 Jul  6 20:16 2b


In [7]:
!docker exec -it ollama cat /root/.ollama/models/manifests/registry.ollama.ai/library/gemma/2b

{"schemaVersion":2,"mediaType":"application/vnd.docker.distribution.manifest.v2+json","config":{"mediaType":"application/vnd.docker.container.image.v1+json","digest":"sha256:887433b89a901c156f7e6944442f3c9e57f3c55d6ed52042cbb7303aea994290","size":483},"layers":[{"mediaType":"application/vnd.ollama.image.model","digest":"sha256:c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12","size":1678447520},{"mediaType":"application/vnd.ollama.image.license","digest":"sha256:097a36493f718248845233af1d3fefe7a303f864fae13bc31a3a9704229378ca","size":8433},{"mediaType":"application/vnd.ollama.image.template","digest":"sha256:109037bec39c0becc8221222ae23557559bc594290945a2c4221ab4f303b8871","size":136},{"mediaType":"application/vnd.ollama.image.params","digest":"sha256:22a838ceb7fb22755a3b0ae9b4eadde629d19be1f651f73efb8c6b4e2cd0eea0","size":84}]}
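The manifest is easier to read once parsed. The following is a minimal sketch, not part of the original homework, that lists each layer's type and size. The `MANIFEST` string is the `cat` output above, reformatted across several lines; `summarize_manifest` is a hypothetical helper name.

```python
import json

# Manifest text copied from the `cat` output above (whitespace added for readability).
MANIFEST = """
{"schemaVersion":2,
 "mediaType":"application/vnd.docker.distribution.manifest.v2+json",
 "config":{"mediaType":"application/vnd.docker.container.image.v1+json",
           "digest":"sha256:887433b89a901c156f7e6944442f3c9e57f3c55d6ed52042cbb7303aea994290",
           "size":483},
 "layers":[
  {"mediaType":"application/vnd.ollama.image.model",
   "digest":"sha256:c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12",
   "size":1678447520},
  {"mediaType":"application/vnd.ollama.image.license",
   "digest":"sha256:097a36493f718248845233af1d3fefe7a303f864fae13bc31a3a9704229378ca",
   "size":8433},
  {"mediaType":"application/vnd.ollama.image.template",
   "digest":"sha256:109037bec39c0becc8221222ae23557559bc594290945a2c4221ab4f303b8871",
   "size":136},
  {"mediaType":"application/vnd.ollama.image.params",
   "digest":"sha256:22a838ceb7fb22755a3b0ae9b4eadde629d19be1f651f73efb8c6b4e2cd0eea0",
   "size":84}
 ]}
"""


def summarize_manifest(text):
    """Return (media_type, size_in_bytes) pairs for each layer in the manifest."""
    manifest = json.loads(text)
    return [(layer["mediaType"], layer["size"]) for layer in manifest["layers"]]


for media_type, size in summarize_manifest(MANIFEST):
    # Keep only the last dotted component (model, license, template, params).
    kind = media_type.rsplit(".", 1)[-1]
    print(f"{kind:<10} {size:>13,} bytes")
```

Almost all of the download is the `model` layer; the license, template, and params layers together take only a few kilobytes.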

## Q3. Running the LLM

Test the following prompt: "10 * 10". What's the answer?

In [16]:
!docker exec -it ollama ollama run gemma:2b "10 * 10"

Sure, here's a safe and informative answer to your question:

10 * 10 is 100.

In [20]:
!docker exec -it ollama ollama run gemma:2b "10 * 10"

Sure, here's the answer to your question:

10 * 10 = 100.

Is there anything else I can help you with?

## Q4. Downloading the weights 

We don't want to pull the weights every time we run
a docker container. Let's do it once and have them available
every time we start a container.

First, we will need to change how we run the container.

Instead of mapping the `/root/.ollama` folder to a named volume,
let's map it to a local directory:

```bash
mkdir ollama_files

docker run -it \
    --rm \
    -v ./ollama_files:/root/.ollama \
    -p 11434:11434 \
    --name ollama \
    ollama/ollama
```

Now pull the model:

```bash
docker exec -it ollama ollama pull gemma:2b 
```

What's the size of the `ollama_files/models` folder? 

* 0.6G
* 1.2G
* 1.7G
* 2.2G

Hint: on Linux, you can use `du -h` for that.

In [2]:
!docker exec -it ollama ollama list

NAME    	ID          	SIZE  	MODIFIED      
gemma:2b	b50d6c999e59	1.7 GB	2 minutes ago	


In [5]:
!du -h ollama_files/

1.6G	ollama_files/models/blobs
8.0K	ollama_files/models/manifests/registry.ollama.ai/library/gemma
12K	ollama_files/models/manifests/registry.ollama.ai/library
16K	ollama_files/models/manifests/registry.ollama.ai
20K	ollama_files/models/manifests
1.6G	ollama_files/models
1.6G	ollama_files/
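`ollama list` reports 1.7 GB while `du -h` reports 1.6G. The gap is most likely a units difference: `du -h` uses powers of 1024 and `ollama list` appears to use powers of 1000. The short check below uses the byte sizes from the manifest shown in Q2, so it is an approximation of what is on disk rather than a measurement.

```python
# Byte sizes copied from the gemma:2b manifest in Q2.
LAYER_SIZES = [1678447520, 8433, 136, 84]  # model, license, template, params
CONFIG_SIZE = 483

total_bytes = sum(LAYER_SIZES) + CONFIG_SIZE
size_gb = total_bytes / 1000**3   # decimal gigabytes
size_gib = total_bytes / 1024**3  # binary gibibytes, the unit `du -h` uses

print(f"{total_bytes} bytes = {size_gb:.2f} GB = {size_gib:.2f} GiB")
# → 1678456656 bytes = 1.68 GB = 1.56 GiB
```

Rounded to one decimal place, 1.68 GB becomes the 1.7 GB shown by `ollama list`, and `du -h` rounds 1.56 GiB up to 1.6G. Among the listed options, 1.7G is the closest.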


## Q5. Adding the weights 

Let's now stop the container and add the weights
to a new image.

For that, let's create a `Dockerfile`:

```dockerfile
FROM ollama/ollama

COPY ...
```

What do you put after `COPY`?

```dockerfile
COPY ollama_files /root/.ollama
```

In [1]:
!cat Dockerfile

FROM ollama/ollama
COPY ollama_files /root/.ollama
EXPOSE 11434

In [8]:
!docker build -t ollama-gemma2b .

[1A[1B[0G[?25l[+] Building 0.0s (0/1)                                          docker:default
[?25h[1A[0G[?25l[+] Building 0.1s (3/3)                                          docker:default
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 100B                                       0.0s
[0m[34m => [internal] load metadata for docker.io/ollama/ollama:latest            0.0s
[0m[34m => [internal] load .dockerignore                                          0.0s
[0m[34m => => transferring context: 2B                                            0.0s
[0m[?25h[1A[1A[1A[1A[1A[1A[0G[?25l[+] Building 0.2s (6/7)                                          docker:default
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 100B                                       0.0s
[0m[34m => [internal] load metadata for docker.io/ollama/olla

In [12]:
!docker run ollama-gemma2b

2024/07/06 21:04:55 routes.go:1064: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-07-06T21:04:55.732Z level=INFO source=images.go:730 msg="total blobs: 5"
time=2024-07-06T21:04:55.732Z level=INFO source=images.go:737 msg="total unused blobs removed: 0"
time=2024-07-

In [10]:
!docker ps

CONTAINER ID   IMAGE            COMMAND               CREATED              STATUS              PORTS       NAMES
6374fc70123b   ollama-gemma2b   "/bin/ollama serve"   About a minute ago   Up About a minute   11434/tcp   youthful_chaplygin


In [11]:
!docker exec -it youthful_chaplygin ollama list

NAME    	ID          	SIZE  	MODIFIED       
gemma:2b	b50d6c999e59	1.7 GB	29 minutes ago	


## Q6. Serving it 

Let's build it:

```bash
docker build -t ollama-gemma2b .
```

And run it:

```bash
docker run -it --rm -p 11434:11434 ollama-gemma2b
```

We can connect to it using the OpenAI client.

Let's test it with the following prompt:

```python
prompt = "What's the formula for energy?"
```

Also, to make results reproducible, set the `temperature` parameter to 0:

```python
response = client.chat.completions.create(
    #...
    temperature=0.0
)
```

How many completion tokens did you get in response?

* 304
* 604
* 904
* 1204

In [13]:
!docker run -it --rm -p 11434:11434 ollama-gemma2b

2024/07/06 21:05:46 routes.go:1064: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-07-06T21:05:46.423Z level=INFO source=images.go:730 msg="total blobs: 5"
time=2024-07-06T21:05:46.423Z level=INFO source=images.go:737 msg="total unused blobs removed: 0"
time=2024-07-

In [19]:
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',
)

In [38]:
prompt = "What's the formula for energy?"
response = client.chat.completions.create(
    model='gemma:2b',
    messages=[{"role": "user", "content": prompt}],
    temperature=0.0,
)

In [39]:
response_dict = dict(response)
response_dict

{'id': 'chatcmpl-572',
 'choices': [Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="Sure, here's the formula for energy:\n\n**E = K + U**\n\nWhere:\n\n* **E** is the energy in joules (J)\n* **K** is the kinetic energy in joules (J)\n* **U** is the potential energy in joules (J)\n\n**Kinetic energy (K)** is the energy an object possesses when it moves or is in motion. It is calculated as half the product of an object's mass (m) and its velocity (v) squared:\n\n**K = 1/2 * m * v^2**\n\n**Potential energy (U)** is the energy an object possesses when it is in a position or has a specific configuration. It is calculated as the product of an object's mass and the gravitational constant (g) multiplied by the height or distance of the object from a reference point.\n\n**Gravitational potential energy (U)** is given by the formula:\n\n**U = mgh**\n\nWhere:\n\n* **m** is the mass of the object in kilograms (kg)\n* **g** is the acceleration due to gravi

In [40]:
dict(response_dict["usage"])["completion_tokens"]

304
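The `dict(...)` conversions above work, but the response object returned by the OpenAI client also exposes the same fields as attributes, so the token count can be read directly. The sketch below uses a `SimpleNamespace` stand-in instead of a live response so that it runs without a server; `completion_tokens` is a hypothetical helper name, and 304 is the value from the output above.

```python
from types import SimpleNamespace


def completion_tokens(response):
    """Read the completion token count via attribute access."""
    return response.usage.completion_tokens


# Stand-in with the same attribute layout as the real response object.
# With a live server, pass the `response` returned by
# client.chat.completions.create(...) instead.
fake_response = SimpleNamespace(usage=SimpleNamespace(completion_tokens=304))

print(completion_tokens(fake_response))
```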

## Submit the results

* Submit your results here: https://courses.datatalks.club/llm-zoomcamp-2024/homework/hw2
* It's possible that your answers won't match exactly. If that's the case, select the closest one.
