#### Question 1. Running Ollama with Docker. What's the version?

In [1]:
!ollama -v

ollama version is 0.1.48


#### Question 2. Downloading an LLM. Manifest file 

In [12]:
!docker exec ollama ollama pull gemma:2b

pulling manifest 
pulling c1864a5eb193... 100% ▕████████████████▏ 1.7 GB                         
pulling 097a36493f71... 100% ▕████████████████▏ 8.4 KB                         
pulling 109037bec39c... 100% ▕████████████████▏  136 B                         
pulling 22a838ceb7fb... 100% ▕████████████████▏   84 B                         
pulling 887433b89a90... 100% ▕████████████████▏  483 B                         
verifying sha256 digest 
writing manifest 
removing any unused layers 
success


In [15]:
!docker exec ollama cat /root/.ollama/models/manifests/registry.ollama.ai/library/gemma/2b

{"schemaVersion":2,"mediaType":"application/vnd.docker.distribution.manifest.v2+json","config":{"mediaType":"application/vnd.docker.container.image.v1+json","digest":"sha256:887433b89a901c156f7e6944442f3c9e57f3c55d6ed52042cbb7303aea994290","size":483},"layers":[{"mediaType":"application/vnd.ollama.image.model","digest":"sha256:c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12","size":1678447520},{"mediaType":"application/vnd.ollama.image.license","digest":"sha256:097a36493f718248845233af1d3fefe7a303f864fae13bc31a3a9704229378ca","size":8433},{"mediaType":"application/vnd.ollama.image.template","digest":"sha256:109037bec39c0becc8221222ae23557559bc594290945a2c4221ab4f303b8871","size":136},{"mediaType":"application/vnd.ollama.image.params","digest":"sha256:22a838ceb7fb22755a3b0ae9b4eadde629d19be1f651f73efb8c6b4e2cd0eea0","size":84}]}
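
The manifest above lists one config blob and four layers. As a minimal sketch of how to read it, the snippet below parses a copy of that JSON and sums the sizes. The digests are shortened to the 12-character prefixes shown in the pull output, and `summarize_manifest` is a hypothetical helper, not part of Ollama.

```python
import json

# Manifest content copied from the output above; digests are shortened
# to their 12-character prefixes for readability (an abbreviation, not the real values).
MANIFEST_JSON = """
{"schemaVersion": 2,
 "config": {"mediaType": "application/vnd.docker.container.image.v1+json",
            "digest": "sha256:887433b89a90", "size": 483},
 "layers": [
   {"mediaType": "application/vnd.ollama.image.model",
    "digest": "sha256:c1864a5eb193", "size": 1678447520},
   {"mediaType": "application/vnd.ollama.image.license",
    "digest": "sha256:097a36493f71", "size": 8433},
   {"mediaType": "application/vnd.ollama.image.template",
    "digest": "sha256:109037bec39c", "size": 136},
   {"mediaType": "application/vnd.ollama.image.params",
    "digest": "sha256:22a838ceb7fb", "size": 84}
 ]}
"""


def summarize_manifest(manifest_text):
    """Return ([(layer_kind, size_in_bytes), ...], total_bytes_including_config)."""
    manifest = json.loads(manifest_text)
    rows = []
    for layer in manifest["layers"]:
        # "application/vnd.ollama.image.model" -> "model"
        kind = layer["mediaType"].rsplit(".", 1)[-1]
        rows.append((kind, layer["size"]))
    total = sum(size for _, size in rows) + manifest["config"]["size"]
    return rows, total


rows, total = summarize_manifest(MANIFEST_JSON)
for kind, size in rows:
    print(f"{kind:10s} {size:>12,d} bytes")
print(f"total      {total:>12,d} bytes (about {total / 1e9:.2f} GB)")
```

Nearly all of the download is the `model` layer. The total is about 1.68 GB in decimal units (roughly 1.56 GiB), which is consistent with the `1.7 GB` that `ollama pull` reports.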

#### Question 3. Running the LLM. Output from 10 * 10

In [None]:
!rm -f minsearch.py
!wget https://raw.githubusercontent.com/alexeygrigorev/minsearch/main/minsearch.py

In [3]:
import requests 
import minsearch

docs_url = 'https://github.com/DataTalksClub/llm-zoomcamp/blob/main/01-intro/documents.json?raw=1'
docs_response = requests.get(docs_url)
documents_raw = docs_response.json()
documents = []

for course in documents_raw:
    course_name = course['course']

    for doc in course['documents']:
        doc['course'] = course_name
        documents.append(doc)

index = minsearch.Index(
    text_fields=["question", "text", "section"],
    keyword_fields=["course"]
)

index.fit(documents)

<minsearch.Index at 0x7be70b6a22f0>

In [4]:
def search(query):
    boost = {'question': 3.0, 'section': 0.5}

    results = index.search(
        query=query,
        filter_dict={'course': 'data-engineering-zoomcamp'},
        boost_dict=boost,
        num_results=3
    )

    return results

In [5]:
def build_prompt(query, search_results):
    prompt_template = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT: 
{context}
""".strip()

    context = ""
    
    for doc in search_results:
        context = context + f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}\n\n"
    
    prompt = prompt_template.format(question=query, context=context).strip()
    return prompt

def llm(prompt):
    response = client.chat.completions.create(
        model='gemma:2b',
        messages=[{"role": "user", "content": prompt}]
    )
    
    return response.choices[0].message.content


In [6]:
def rag(query):
    search_results = search(query)
    prompt = build_prompt(query, search_results)
    answer = llm(prompt)
    return answer

In [7]:
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',
)

In [8]:
llm('10 * 10')

'The code you provided is a simple Python program that calculates 10 multiplied by 10.\n\n```python\n10 * 10\n```\n\n**Output:**\n\n```\n100\n```\n\n**Explanation:**\n\n* The `10 * 10` expression performs the multiplication of 10 by 10, which is 100.\n* The `start_of_turn` and `end_of_turn` markers are used to indicate the start and end of a block of code.\n* The code is valid Python syntax and will execute the program as intended.'

In [9]:
print(_)

The code you provided is a simple Python program that calculates 10 multiplied by 10.

```python
10 * 10
```

**Output:**

```
100
```

**Explanation:**

* The `10 * 10` expression performs the multiplication of 10 by 10, which is 100.
* The `start_of_turn` and `end_of_turn` markers are used to indicate the start and end of a block of code.
* The code is valid Python syntax and will execute the program as intended.
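
The reply above comes back as an OpenAI-style chat completion, and that response object can also carry token counts in its `usage` field. A minimal sketch, assuming the `openai` Python client used in this notebook and assuming the Ollama endpoint populates `usage`; `llm_with_usage` is a hypothetical variant of `llm`, not something defined elsewhere in the notebook.

```python
def llm_with_usage(client, prompt, model="gemma:2b"):
    """Return (answer_text, completion_tokens) for a single-turn chat request."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    answer = response.choices[0].message.content
    # `usage` may be None if the server does not report token counts.
    usage = response.usage
    completion_tokens = usage.completion_tokens if usage is not None else None
    return answer, completion_tokens


# Usage against the local Ollama server (requires the container to be running):
# from openai import OpenAI
# client = OpenAI(base_url='http://localhost:11434/v1/', api_key='ollama')
# answer, n_tokens = llm_with_usage(client, '10 * 10')
```

Passing the client in as an argument keeps the function independent of whatever the global name `client` points to at call time.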


#### Question 4. Downloading the weights. Size of the folder

In [16]:
!mkdir ollama_files

In [17]:
!pwd

/workspaces/llm-zoomcamp/02-open-source/02-homework


In [18]:
!ls -la

total 40
drwxrwxrwx+ 5 codespace root       4096 Jul  6 20:01 .
drwxrwxrwx+ 5 codespace root       4096 Jul  6 19:14 ..
drwxrwxrwx+ 2 codespace codespace  4096 Jul  6 17:29 .ipynb_checkpoints
-rw-rw-rw-  1 codespace root       2691 Jul  4 20:06 README.md
drwxrwxrwx+ 2 codespace codespace  4096 Jul  6 17:38 __pycache__
-rw-rw-rw-  1 codespace codespace 10568 Jul  6 20:01 homework_02.ipynb
-rw-rw-rw-  1 codespace codespace  3832 Jul  6 17:29 minsearch.py
drwxrwxrwx+ 2 codespace codespace  4096 Jul  6 19:59 ollama_files


In [24]:
!pip install docker

Collecting docker
  Downloading docker-7.1.0-py3-none-any.whl.metadata (3.8 kB)
Downloading docker-7.1.0-py3-none-any.whl (147 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 147.8/147.8 kB 3.3 MB/s eta 0:00:00
Installing collected packages: docker
Successfully installed docker-7.1.0

[notice] A new release of pip is available: 24.0 -> 24.1.1
[notice] To update, run: python3 -m pip install --upgrade pip


In [34]:
import docker
import os

In [50]:
current_directory = os.getcwd()

# Host folder that will receive the downloaded weights
volume_path = os.path.join(current_directory, "ollama_files")

# Named docker_client so it does not overwrite the OpenAI `client` used by llm()
docker_client = docker.from_env()

container = docker_client.containers.run(
    "ollama/ollama",
    remove=True,
    volumes={
        volume_path: {'bind': '/root/.ollama', 'mode': 'rw'}
    },
    ports={11434: 11434},
    name="ollama",
    detach=True
)

print(container.logs())

b'2024/07/06 20:44:13 routes.go:1064: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"\ntime=2024-07-06T20:44:13.836Z level=INFO source=images.go:730 msg="total blobs: 0"\ntime=2024-07-06T20:44:13.836Z level=INFO source=images.go:737 msg="total unused blobs removed: 0"\ntime=202

In [52]:
!docker exec ollama ollama pull gemma:2b

pulling manifest 
pulling c1864a5eb193... 100% ▕████████████████▏ 1.7 GB                         
pulling 097a36493f71... 100% ▕████████████████▏ 8.4 KB                         
pulling 109037bec39c... 100% ▕████████████████▏  136 B                         
pulling 22a838ceb7fb... 100% ▕████████████████▏   84 B                         
pulling 887433b89a90... 100% ▕████████████████▏  483 B                         
verifying sha256 digest 
writing manifest 
removing any unused layers 
success


In [53]:
!du -h ollama_files/models

1.6G	ollama_files/models/blobs
8.0K	ollama_files/models/manifests/registry.ollama.ai/library/gemma
12K	ollama_files/models/manifests/registry.ollama.ai/library
16K	ollama_files/models/manifests/registry.ollama.ai
20K	ollama_files/models/manifests
1.6G	ollama_files/models
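
The same measurement can be approximated from Python. A minimal sketch, assuming the weights sit in `ollama_files/models` as above; `folder_size_bytes` and `human_gib` are hypothetical helpers. It sums apparent file sizes, whereas `du` reports disk blocks used, so the two numbers may differ slightly.

```python
import os


def folder_size_bytes(root):
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def human_gib(num_bytes):
    """Format a byte count in GiB (powers of 1024, the unit `du -h` uses)."""
    return f"{num_bytes / 1024**3:.2f} GiB"


# Only runs when the folder from this notebook is present.
if os.path.isdir("ollama_files/models"):
    print(human_gib(folder_size_bytes("ollama_files/models")))
```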


#### Question 5. Adding the weights. Dockerfile

In [54]:
docker_client = docker.from_env()

# List only the running containers (all=False excludes stopped ones)
containers = docker_client.containers.list(all=False)

for container in containers:
    try:
        # Stopping the container
        container.stop()
        print(f"Container {container.name} has been stopped.")

    except Exception as e:
        print(f"Failed to stop {container.name}: {str(e)}")

Container ollama has been stopped.


In [55]:
!pwd

/workspaces/llm-zoomcamp/02-open-source/02-homework


In [59]:
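%%writefile Dockerfile
FROM ollama/ollama

WORKDIR /app

# The server reads models from /root/.ollama/models (OLLAMA_MODELS in the server log)
COPY ollama_files/models /root/.ollama/models

Overwriting Dockerfile


In [60]:
%cat Dockerfile

FROM ollama/ollama

WORKDIR /app

# The server reads models from /root/.ollama/models (OLLAMA_MODELS in the server log)
COPY ollama_files/models /root/.ollama/models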


#### Question 6. Serving the LLM. Number of output tokens 

In [61]:
!docker build -t ollama-gemma2b .

[+] Building 0.0s (0/1)                                          docker:default
[+] Building 0.1s (3/3)                                          docker:default
 => [internal] load build definition from Dockerfile                       0.1s
 => => transferring dockerfile: 108B                                       0.0s
 => [internal] load metadata for docker.io/ollama/ollama:latest            0.0s
 => [internal] load .dockerignore                                          0.1s
 => => transferring context: 2B                                            0.0s
[+] Building 0.3s (5/7)                                          docker:default
 => [internal] load build definition from Dockerfile                       0.1s
 => => transferring dockerfile: 108B                                       0.0s
 => [internal] load metadata for docker.io/ollama/olla

In [65]:
!docker run --rm -p 11434:11434 ollama-gemma2b

Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is: 

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICYo0+Dvy3nO4H8bgZNAw19hO83Fk6a82Ljv7rEm+xR8

2024/07/06 21:41:58 routes.go:1064: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-07-06T