## Q1. Running Ollama with Docker

Let's run ollama with Docker. We will need to execute the same command as in the lectures:

```bash
docker run -it \
    --rm \
    -v ollama:/root/.ollama \
    -p 11434:11434 \
    --name ollama \
    ollama/ollama
```

What's the version of the ollama client?

To find out, enter the container and execute `ollama` with the `-v` flag.
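As a cross-check from the host, the running server can also be asked for its version over HTTP. The sketch below assumes the container from the command above is listening on `localhost:11434` and that the server exposes a `/api/version` endpoint returning a JSON object with a `version` field. It reports the server's version, which is expected to match the `ollama` binary in the same container. The helper names are made up for this illustration.

```python
import json
from urllib.request import urlopen


def parse_version(payload: bytes) -> str:
    """Extract the version string from an /api/version JSON response body."""
    return json.loads(payload)["version"]


def fetch_version(base_url: str = "http://localhost:11434") -> str:
    """Ask a running Ollama server for its version (the container must be up)."""
    with urlopen(f"{base_url}/api/version", timeout=5) as resp:
        return parse_version(resp.read())


# Offline illustration with a made-up payload; "0.0.0" is a placeholder,
# not the answer to the question.
print(parse_version(b'{"version": "0.0.0"}'))
```

With the container running, `fetch_version()` should return the same string that `ollama -v` prints inside the container.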

## A1.

![image](hw2-images/hw-2-q1.png)

## Q2. Downloading an LLM

We will download a smaller LLM, `gemma:2b`.

Again, let's enter the container and pull the model:

```bash
ollama pull gemma:2b
```

Inside the container, Ollama saves the downloaded files to `/root/.ollama`.

We're interested in the metadata about this model. You can find it in `models/manifests/registry.ollama.ai/library`.

What's the content of the file related to gemma?
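The manifest is a small JSON file, so it can also be inspected programmatically. The sketch below assumes the usual registry-style layout: a `config` entry and a `layers` list, each item carrying `mediaType`, `digest`, and `size`. The function names and the example dictionary are hypothetical, and its digests and sizes are placeholders rather than real values.

```python
import json
from pathlib import Path


def load_manifest(path: str) -> dict:
    """Read a manifest file from disk and parse it as JSON."""
    return json.loads(Path(path).read_text())


def summarize_manifest(manifest: dict) -> list[tuple[str, int]]:
    """Return (mediaType, size in bytes) for the config entry and each layer."""
    entries = []
    if "config" in manifest:
        entries.append(manifest["config"])
    entries.extend(manifest.get("layers", []))
    return [(entry["mediaType"], entry["size"]) for entry in entries]


# Hypothetical manifest with placeholder digests and sizes, only to show the shape.
example = {
    "schemaVersion": 2,
    "config": {
        "mediaType": "application/vnd.docker.container.image.v1+json",
        "digest": "sha256:<placeholder>",
        "size": 1,
    },
    "layers": [
        {
            "mediaType": "application/vnd.ollama.image.model",
            "digest": "sha256:<placeholder>",
            "size": 2,
        },
    ],
}

for media_type, size in summarize_manifest(example):
    print(f"{size:>12}  {media_type}")
```

To use it on the real file, pass the path of the gemma manifest under the `library` directory to `load_manifest` and feed the result to `summarize_manifest`.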

## A2.

![image](hw2-images/hw-2-q2.png) 

## Q3. Running the LLM
Test the following prompt: "10 * 10". What's the answer?

## A3.

![image](hw2-images/hw-2-q3.png) 

## Q4. Downloading the weights
We don't want to pull the weights every time we run a Docker container. Let's do it once and have them available every time we start a container.

First, we will need to change how we run the container.

Instead of mapping the `/root/.ollama` folder to a named volume, let's map it to a local directory:

```bash
mkdir ollama_files

docker run -it \
    --rm \
    -v ./ollama_files:/root/.ollama \
    -p 11434:11434 \
    --name ollama \
    ollama/ollama
```

Now pull the model:

```bash
docker exec -it ollama ollama pull gemma:2b 
```

What's the size of the `ollama_files/models` folder?

* 0.6G
* 1.2G
* 1.7G
* 2.2G
  
Hint: on Linux, you can use `du -h` for that.
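On systems without `du`, the folder size can be approximated by summing file sizes. This is a minimal sketch: it adds up the apparent size of each file, whereas `du` reports disk blocks, so the two numbers can differ slightly. The helper names are invented for this example.

```python
import os


def dir_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def human(num_bytes: int) -> str:
    """Format a byte count with binary units, roughly like `du -h`."""
    size = float(num_bytes)
    for unit in ("B", "K", "M", "G"):
        # Stop at the first unit that fits, or at G as the largest unit used here.
        if size < 1024 or unit == "G":
            return f"{size:.1f}{unit}"
        size /= 1024


# Only prints something when run next to the ollama_files directory.
if os.path.isdir("ollama_files/models"):
    print(human(dir_size_bytes("ollama_files/models")))
```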

## A4.

![image](hw2-images/hw-2-q4.png) 

## Q5. Adding the weights
Let's now stop the container and add the weights to a new image.

For that, let's create a Dockerfile:

```dockerfile
FROM ollama/ollama

COPY ...
```

What do you put after COPY?

## A5.

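Copy `ollama_files` from the local directory into `/root/.ollama` inside the image. That is the directory where Ollama stores its models, and the same path the local directory was mounted to in Q4:

```dockerfile
FROM ollama/ollama

COPY ./ollama_files /root/.ollama
```
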

## Q6. Serving it
Let's build it:

```bash
docker build -t ollama-gemma2b .
```

And run it:

```bash
docker run -it --rm -p 11434:11434 ollama-gemma2b
```

We can connect to it using the OpenAI client.

Let's test it with the following prompt:

```python
prompt = "What's the formula for energy?"
```

Also, to make results reproducible, set the `temperature` parameter to 0:

```python
response = client.chat.completions.create(
    #...
    temperature=0.0
)
```

How many completion tokens did you get in response?

* 304
* 604
* 904
* 1204

## A6.

```python
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',
)

def llm(prompt):
    response = client.chat.completions.create(
        model='gemma:2b',
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0
    )

    return response.choices[0].message.content

result = llm("What's the formula for energy?")
result
```

```text
"Sure, here's the formula for energy:\n\n**E = K + U**\n\nWhere:\n\n* **E** is the energy in joules (J)\n* **K** is the kinetic energy in joules (J)\n* **U** is the potential energy in joules (J)\n\n**Kinetic energy (K)** is the energy an object possesses when it moves or is in motion. It is calculated as half the product of an object's mass (m) and its velocity (v) squared:\n\n**K = 1/2mv^2**\n\n**Potential energy (U)** is the energy an object possesses due to its position or configuration. It is calculated as the product of an object's mass, gravitational constant (g), and height or position above a reference point.\n\n**U = mgh**\n\nWhere:\n\n* **m** is the mass in kilograms (kg)\n* **g** is the gravitational constant (9.8 m/s^2)\n* **h** is the height or position in meters (m)\n\nThe formula shows that energy can be expressed as the sum of kinetic and potential energy. The kinetic energy is a measure of the object's ability to do work, while the potential energy is a measure of the
```

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")

input_ids = tokenizer(result)
len(input_ids['input_ids'])
```

```text
281
```
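Re-tokenizing the text with a separately loaded tokenizer may not match the server's own count exactly, for example because of special tokens. A more direct route, assuming the OpenAI-compatible endpoint fills in the `usage` field of the response, is to read `response.usage.completion_tokens`. The sketch below is one way to do that. The helper names are hypothetical and the stub value is a placeholder, not a measured result.

```python
from types import SimpleNamespace


def completion_tokens(response):
    """Return the completion-token count reported by the server, or None if absent."""
    usage = getattr(response, "usage", None)
    return getattr(usage, "completion_tokens", None)


def ask(prompt, model="gemma:2b", base_url="http://localhost:11434/v1/"):
    """Send one prompt; return (text, completion-token count). Needs a running server."""
    from openai import OpenAI  # imported lazily so the helper above works without it

    client = OpenAI(base_url=base_url, api_key="ollama")
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    return response.choices[0].message.content, completion_tokens(response)


# Offline illustration with a stand-in object; 123 is a placeholder value.
stub = SimpleNamespace(usage=SimpleNamespace(completion_tokens=123))
print(completion_tokens(stub))
```

With the container from Q6 running, `ask("What's the formula for energy?")` returns the generated text together with the count the server reports, which can then be compared against the answer options.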