# Welcome to the Day 2 Lab!


<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/resources.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#f71;">Just before we get started --</h2>
            <span style="color:#f71;">I thought I'd take a second to point you at this page of useful resources for the course. This includes links to all the slides.<br/>
            <a href="https://edwarddonner.com/2024/11/13/llm-engineering-resources/">https://edwarddonner.com/2024/11/13/llm-engineering-resources/</a><br/>
            Please keep this bookmarked, and I'll continue to add more useful links there over time.
            </span>
        </td>
    </tr>
</table>

## First - let's talk about the Chat Completions API

1. The simplest way to call an LLM
2. It's called Chat Completions because it's saying: "here is a conversation, please predict what should come next"
3. The Chat Completions API was invented by OpenAI, but it's so popular that everybody uses it!

### We will start by calling OpenAI again - but don't worry non-OpenAI people, your time is coming!
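Before the first call, it may help to see what "a conversation" looks like as data. This is a minimal sketch: the system prompt and the earlier assistant turn are made-up examples, not part of this lab.

```python
# A conversation is a list of {"role", "content"} dicts, oldest first.
# The system and assistant texts below are made-up examples.
conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me a fun fact"},
    {"role": "assistant", "content": "Honey never spoils."},
    {"role": "user", "content": "Tell me another one"},
]

# The model is asked to predict the next assistant message,
# so a request normally ends with a user turn.
roles = [message["role"] for message in conversation]
print(roles)
```

The cells in this lab send the shortest possible conversation: a single user message.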


In [1]:
import os
from dotenv import load_dotenv

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if not api_key:
    print("No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook")
else:
    print("API key found and looks good so far!")


API key found and looks good so far!


## Do you know what an Endpoint is?

If not, please review the Technical Foundations guide in the guides folder.

And, here is an endpoint that might interest you...

In [2]:
import requests

headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}

payload = {
    "model": "gpt-5-nano",
    "messages": [
        {"role": "user", "content": "Tell me a fun fact"}]
}

payload

{'model': 'gpt-5-nano',
 'messages': [{'role': 'user', 'content': 'Tell me a fun fact'}]}

In [3]:
response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers=headers,
    json=payload
)

response.json()

{'id': 'chatcmpl-CsuJobYhZVGK5neggEkVnD0nHJFm7',
 'object': 'chat.completion',
 'created': 1767202236,
 'model': 'gpt-5-nano-2025-08-07',
 'choices': [{'index': 0,
   'message': {'role': 'assistant',
    'content': 'Fun fact: Wombat poop is cube-shaped, which helps it stack and prevents it from rolling away. The cube shape comes from how their intestines form the pellets as they dry.',
    'refusal': None,
    'annotations': []},
   'finish_reason': 'stop'}],
 'usage': {'prompt_tokens': 11,
  'completion_tokens': 559,
  'total_tokens': 570,
  'prompt_tokens_details': {'cached_tokens': 0, 'audio_tokens': 0},
  'completion_tokens_details': {'reasoning_tokens': 512,
   'audio_tokens': 0,
   'accepted_prediction_tokens': 0,
   'rejected_prediction_tokens': 0}},
 'service_tier': 'default',
 'system_fingerprint': None}
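
The reply text and the token counts sit at fixed places in that JSON. The sketch below pulls them out of a trimmed copy of the response above (values copied from the output, other fields omitted, reply text shortened). The numbers suggest that most of the completion tokens went to hidden reasoning rather than to the visible reply.

```python
# Trimmed copy of the response printed above; the reply text is shortened.
response_json = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "Fun fact: Wombat poop is cube-shaped..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {
        "prompt_tokens": 11,
        "completion_tokens": 559,
        "total_tokens": 570,
        "completion_tokens_details": {"reasoning_tokens": 512},
    },
}

# The assistant's reply is the message content of the first choice.
reply = response_json["choices"][0]["message"]["content"]

# reasoning_tokens is a breakdown of completion_tokens, so subtract to get the rest.
usage = response_json["usage"]
reasoning_tokens = usage["completion_tokens_details"]["reasoning_tokens"]
visible_tokens = usage["completion_tokens"] - reasoning_tokens

print(reply)
print(f"{reasoning_tokens} reasoning tokens, {visible_tokens} visible reply tokens")
```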

# What is the openai package?

It's known as a Python Client Library.

It's nothing more than a wrapper around this exact call to the HTTP endpoint.

It just lets you work with nice Python code instead of messing around with janky JSON objects.

But that's it. It's open-source and lightweight. Some people think it contains OpenAI model code - it doesn't!
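
To make "nothing more than a wrapper" concrete, here is a rough sketch of the core of such a client, using only the standard library. `build_chat_request` and `send` are hypothetical names for illustration; the real `openai` package does the same job with typed response objects and many more features.

```python
import json
import urllib.request


def build_chat_request(api_key, model, messages, base_url="https://api.openai.com/v1"):
    """Build (but do not send) the same HTTP POST that the requests cell makes."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and return the reply text (needs a valid key and network access)."""
    with urllib.request.urlopen(request) as http_response:
        data = json.load(http_response)
    return data["choices"][0]["message"]["content"]


request = build_chat_request(
    api_key="sk-proj-...",  # placeholder, not a real key
    model="gpt-5-nano",
    messages=[{"role": "user", "content": "Tell me a fun fact"}],
)
print(request.get_method(), request.full_url)
```

Calling `send(request)` with a real key would return the reply text; it is left uncalled here so the sketch runs without network access.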


In [4]:
# Create OpenAI client

from openai import OpenAI
openai = OpenAI()

response = openai.chat.completions.create(model="gpt-5-nano", messages=[{"role": "user", "content": "Tell me a fun fact"}])

response.choices[0].message.content



'Fun fact: A day on Venus is longer than its year. It takes about 243 Earth days to rotate once, but only about 225 Earth days to orbit the Sun.'

In [27]:
models = openai.models.list()

for m in models.data:
    print(m.id)

gpt-4-0613
gpt-4
gpt-3.5-turbo
chatgpt-image-latest
gpt-4o-mini-tts-2025-03-20
gpt-4o-mini-tts-2025-12-15
gpt-realtime-mini-2025-12-15
gpt-audio-mini-2025-12-15
davinci-002
babbage-002
gpt-3.5-turbo-instruct
gpt-3.5-turbo-instruct-0914
dall-e-3
dall-e-2
gpt-4-1106-preview
gpt-3.5-turbo-1106
tts-1-hd
tts-1-1106
tts-1-hd-1106
text-embedding-3-small
text-embedding-3-large
gpt-4-0125-preview
gpt-4-turbo-preview
gpt-3.5-turbo-0125
gpt-4-turbo
gpt-4-turbo-2024-04-09
gpt-4o
gpt-4o-2024-05-13
gpt-4o-mini-2024-07-18
gpt-4o-mini
gpt-4o-2024-08-06
chatgpt-4o-latest
gpt-4o-audio-preview
gpt-4o-realtime-preview
omni-moderation-latest
omni-moderation-2024-09-26
gpt-4o-realtime-preview-2024-12-17
gpt-4o-audio-preview-2024-12-17
gpt-4o-mini-realtime-preview-2024-12-17
gpt-4o-mini-audio-preview-2024-12-17
o1-2024-12-17
o1
gpt-4o-mini-realtime-preview
gpt-4o-mini-audio-preview
computer-use-preview
o3-mini
o3-mini-2025-01-31
gpt-4o-2024-11-20
computer-use-preview-2025-03-11
gpt-4o-search-preview-2025-03-

## And then this great thing happened:

OpenAI's Chat Completions API was so popular that other model providers created endpoints that accept the same requests and return responses in the same format.

They are known as "OpenAI Compatible Endpoints".

For example, Google made one here: https://generativelanguage.googleapis.com/v1beta/openai/

And OpenAI decided to be kind: they said, hey, you can just use the same client library that we made for GPT. We'll allow you to specify a different endpoint URL and a different key, to use another provider.

So you can use:

```python
gemini = OpenAI(base_url="https://generativelanguage.googleapis.com/v1beta/openai/", api_key="AIz....")
gemini.chat.completions.create(...)
```

And to be clear - even though OpenAI is in the code, we're only using this lightweight Python client library to call the endpoint - there's no OpenAI model involved here.
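
As a small illustration of why one client can serve several providers: only the base URL changes, and the Chat Completions path is appended to it. The base URLs below are the ones used in this notebook (Ollama appears later on); `chat_completions_url` is a hypothetical helper written for this sketch, not part of the `openai` package.

```python
# Base URLs taken from this notebook; each provider exposes the same path under its own base.
BASE_URLS = {
    "openai": "https://api.openai.com/v1",
    "gemini": "https://generativelanguage.googleapis.com/v1beta/openai/",
    "ollama": "http://localhost:11434/v1",
}


def chat_completions_url(base_url):
    """Join a provider's base URL with the Chat Completions path."""
    return base_url.rstrip("/") + "/chat/completions"


for provider, base_url in BASE_URLS.items():
    print(f"{provider}: {chat_completions_url(base_url)}")
```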

If you're confused, please review Guide 9 in the Guides folder!

And now let's try it!

## THIS IS OPTIONAL - but if you wish to try out Google Gemini, please visit:

https://aistudio.google.com/

And set up your API key at

https://aistudio.google.com/api-keys

And then add your key to the `.env` file, being sure to Save the .env file after you change it:

`GOOGLE_API_KEY=AIz...`


In [None]:
GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai/"

load_dotenv(override=True)

google_api_key = os.getenv("GOOGLE_API_KEY")

if not google_api_key:
    print("No API key was found - please be sure to add your key to the .env file, and save the file! Or you can skip the next 2 cells if you don't want to use Gemini")
elif not google_api_key.startswith("AIz"):
    print("An API key was found, but it doesn't start AIz")
else:
    print("API key found and looks good so far!")



In [None]:
gemini = OpenAI(base_url=GEMINI_BASE_URL, api_key=google_api_key)

response = gemini.chat.completions.create(model="gemini-2.5-flash-lite", messages=[{"role": "user", "content": "Tell me a fun fact"}])

response.choices[0].message.content

## And Ollama also gives an OpenAI compatible endpoint

...and it's on your local machine!

If the next cell doesn't print "Ollama is running" then please open a terminal and run `ollama serve`

In [5]:
requests.get("http://localhost:11434").content

b'Ollama is running'
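
If the server is not running, the cell above raises a connection error rather than printing anything. A slightly more forgiving check might look like the sketch below, which uses only the standard library; `is_ollama_running` is a hypothetical helper, and the 2 second timeout is an arbitrary choice.

```python
import http.client
import urllib.request


def is_ollama_running(url="http://localhost:11434", timeout=2.0):
    """Return True if the server at `url` answers with the 'Ollama is running' banner."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return b"Ollama is running" in response.read()
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, HTTP error status and similar.
        return False


print(is_ollama_running())
```

If this prints `False`, open a terminal and run `ollama serve`, then try again.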

### Download llama3.2 from Meta

Change this to llama3.2:1b if your computer is smaller.

Don't use llama3.3 or llama4! They are too big for your computer.

In [6]:
!ollama pull llama3.2

In [7]:
OLLAMA_BASE_URL = "http://localhost:11434/v1"

ollama = OpenAI(base_url=OLLAMA_BASE_URL, api_key='ollama')


In [15]:
# Get a fun fact

response = ollama.chat.completions.create(model="llama3.2", messages=[{"role": "user", "content": "Tell me a fun fact"}])

response.choices[0].message.content

"Did you know that honey never spoils? Archaeologists have found pots of honey in ancient Egyptian tombs that are over 3,000 years old and still perfectly edible. Honey's durability is due to its low water content and the acidity of its chemical composition, making it a unique and almost perpetual preservative."

In [16]:
# Now let's try deepseek-r1:1.5b - this is DeepSeek "distilled" into Qwen from Alibaba Cloud

!ollama pull deepseek-r1:1.5b

In [17]:
response = ollama.chat.completions.create(model="deepseek-r1:1.5b", messages=[{"role": "user", "content": "Tell me a fun fact"}])

response.choices[0].message.content

"Sure! Here's a fun fact for you: \n\nDid you know that it's impossible to fold an A4 sheet of paper in half more than 7 times, no matter how big the paper is? After seven folds, the paper would be so thin that it wouldn't make sense anymore, even though in reality you could theoretically get closer and closer to halving its size each time.\n\nHow interesting! 😎"

# HOMEWORK EXERCISE ASSIGNMENT

Upgrade the Day 1 webpage summarization project to use an open-source model running locally via Ollama rather than OpenAI.

You'll be able to use this technique for all subsequent projects if you'd prefer not to use paid APIs.

**Benefits:**
1. No API charges - open-source
2. Data doesn't leave your box

**Disadvantages:**
1. Significantly less powerful than a frontier model

## Recap on installation of Ollama

Simply visit [ollama.com](https://ollama.com) and install!

Once complete, the ollama server should already be running locally.  
If you visit:  
[http://localhost:11434/](http://localhost:11434/)

You should see the message `Ollama is running`.  

If not, bring up a new Terminal (Mac) or PowerShell (Windows) and enter `ollama serve`  
And in another Terminal (Mac) or PowerShell (Windows), enter `ollama pull llama3.2`  
Then try [http://localhost:11434/](http://localhost:11434/) again.

If Ollama is slow on your machine, try using `llama3.2:1b` as an alternative. Run `ollama pull llama3.2:1b` from a Terminal or PowerShell, and change the code from `MODEL = "llama3.2"` to `MODEL = "llama3.2:1b"`

In [18]:
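from PyPDF2 import PdfReader


def load_pdf(path):
    """Return the text of every page in the PDF at `path`, one page per line break."""
    reader = PdfReader(path)
    # extract_text() can come back empty for image-only pages, so fall back to ""
    return "\n".join(page.extract_text() or "" for page in reader.pages)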

In [19]:
system_prompt = "You are a helpful assistant that summarizes English text in the Chichewa language. Your responses must be in Chichewa. You focus on salient points. Refrain from literal translations."

# Load the text to summarize from a local PDF (replace this path with a file on your own machine)
nat_bank_prod = "/Users/dmatekenya/GIT-REPOS/llm_engineering/data/docs/VEHICLE LOANS FACTSHEET.pdf"
user_prompt = load_pdf(nat_bank_prod)

# Build the messages list: the system prompt sets the task, the user message carries the PDF text
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]


In [20]:
# Use Llama 3.2 model locally with Ollama
response = ollama.chat.completions.create(
    model="llama3.2",
    messages=messages
)   
print(response.choices[0].message.content)

Mwathu wake, utumfuli wa chombo cha kasi hufikia wageni waliyotaka kupata chombo hicho hapo ndani ya maridadi tano:

 1. Kutumukua mamlaka ya biashara na watayarishaji wenye miundombinu inayohusisha maagizo kutoka kwa jukumu la watendaji.
 Mnamo 2010, vifaa vingine vinavyofikia kuipata ni maagizo mahali fulani ya wataraji.

 Chombo hiki kinatambulika kuwa rahisi sana sana zaidi kwani kitatoa utumzaji katika kila jukumu ambalo liko na marafiki yaliyojikita hapo.

 Ndogo inaandika nambari isiyo ya usumbufu au kipengele kinachowakilisha jumla, lakini maagizo yanatakiwa kulala kati ya maadili ya pamoja na kiasi na uwezo.

 Vifaa hivi hukusudiwi bora sana kuwasaidia watu wenye uchumi mbaya.


In [21]:
# Use the deepseek-r1:1.5b model locally with Ollama
response = ollama.chat.completions.create(
    model="deepseek-r1:1.5b",
    messages=messages
)   
print(response.choices[0].message.content)

leneke kisilelo: bila upendetwea na kusidisha kipotekana kampala ni na chikewa bolesekwa na api lelelela, lohato weseledano lelo ni lelo niye lelu kwa kama mifo wa ndu na wako, la neku nini kiwe zina nelelo. lekulo kisitu maaseki hihihiho, wakwana aminai na nelelo hihihiho ki lela na niunia lelo nafu na naidi, wakwana apotehihiho kisivhiya ki webozi hihihiho. hihihiho lelakua, apotekana lelo ki uzo ana amlimbana lihihihihiki wa nelelo niu mifupi na nafunja kufikoseka, lina ni mihihihihiko. lofano, apobahisi mifipiti.  

mazibanyana lochilizimia, apominu lolelo kimekha ana lela la nelelo niwa, aminai apotesho anabatengi na naidi. hihihiho kisiri na limikubwa apotehihiho ki nelelo ni mifupiteke, naini wa mihihihihikama na hihihihihinjana kijere wa jumamia wa shirigumusa.  

"leneke kisilelo: lekulo ni apominu na aminai ni mifipiti, aminai apotehihiho ki chilizinga baanakadini anarikusimane. lihihihihiko, apobahisi miyo zaidi wa kuandishi kupuni na lina la nelelo ni wamwese kuhandera. hih

In [22]:
!ollama list

NAME                ID              SIZE      MODIFIED       
deepseek-r1:1.5b    e0979632db5a    1.1 GB    8 minutes ago     
llama3.2:latest     a80c4f17acd5    2.0 GB    13 minutes ago    
gpt-oss:latest      17052f91a42e    13 GB     2 days ago        
phi3:latest         4f2222927938    2.2 GB    4 days ago        


In [None]:
# Pull and try the gpt-oss model with Ollama
!ollama pull gpt-oss

In [23]:
# Use the gpt-oss model locally with Ollama
response = ollama.chat.completions.create(
    model="gpt-oss",
    messages=messages
)   
print(response.choices[0].message.content)

**Kumvera makwama a mphamvu pa mvula (Vehicle Loans)**  

- **Umphunzitsi (target customers):** Akampani, adzanja aliririyeko ndi adzanja ali ndi mabanja omwe angakhulupiliriz

- **Zikondi zakuyang'anira (product features):**  
  - Mwa 36 mgawo kwa mvula yokhudzana ndi zowonjezera (reconditioned) ndi zowonjezera zasinthidwa.  
  - Mwa 48 mgawo kwa mvula m'njira yesu (brand new).  
  - Zamukondwerera zimathandiza ifwere m'malere 5.  
  - Zamukondwereridwa zimathandiza ifwere m'malere 10.  
  - Imathandizira ŵaliliwenso NBM ndiponso ŵaliliwenso wolembala.

- **Mawonekedwe a mtengo:**
  - Malipiro owachitikira ndi RBM reference rate +10.1% pa nthawi yophatikira.  
  - 1% mwachiwiro (arrangement fee) pa zomwe zimakhala pa mbiri.  
  - 0.6% mtengo wa stamp duty, womwe wogwirizana nayo VAT.

- **Mwayi wokonzekera:**
  1.  Kudzaza bwanji.  
  2.  Kumbukila chilichonse chofulumira.  
  3.  Kuwaperekanso chitukuko cha mawu (payslip).  
  4.  Mawu osadziwika kuchokera ku mwadzokhudza.  
  5.  Ku

In [24]:
response = ollama.chat.completions.create(model="deepseek-r1:1.5b", messages=[{"role": "user", "content": "Tell me a fun fact in Chichewa"}])
print(response.choices[0].message.content)

Chichewa is an ancient language deeply rooted in Ghana's history, offering a rich tapestry of culture that merges African and European elements. The language excels at conveying emotions and expressions through unique phrases, such as 'chukka' to denote the receiver for a message of grief or disbelief. It also employs a script system with a hidden set of sounds, including the ability to ask directions via 'nna'. Cultural references are plentiful: Chichewa has a nickname known as 'chukka', often used in family gatherings and everyday contexts, especially naming children. Additionally, the word for 'cloud' is significant due to its symbolic meanings beyond just weather, linking Chichewa with deeper understanding of nature.


In [25]:
!ollama pull deepseek-r1:latest

In [26]:
response = ollama.chat.completions.create(
    model="deepseek-r1:latest",
    messages=messages
)   
print(response.choices[0].message.content)

Vichitundo vinavyofaa kama ndiko:

- **Muundo yasiyosemwa:** Ndiyo, ndiko zinazoonekana kama ndiyo zisizo kuhakikishanaji faida. Vipengele muhimu vya usambazaji na ushiriki inafikia vijana, wabunifu na wanawezi kubadilishana, na kampeni cha mafanikio zikiheshimika na kudhibiti za kupita.

- **Usambazaji na ushiriki:** Wanyumba wanaweza kumpitaa na ndiko, ambazo tunazungumzia kupita zote kwa kupita. Kipande cha kupita inatupeleka kampeni chanya kwenye kifedha ya utandawazi wa kikatiba + 10.1% pa, na kupita kwa 36, 48 au 60 mwezi. Hii ni ndiko kama unafuatia vigezo.

- **Hatari ya kudhibiti kufika:** Ndiyo, kama mtekelezehusika kufika kwenye muundo uliorekodiwa, inapokuwa hatari. Hivyo, maalum: kupita unatumiziwana kama upande wa mabadiliko ya kupita na hatari zisizo kwenye kampeni.

- **Faika ya kupita:** Kupita nda kupita, na inatafuta kupita kupitaa. Nyanja ya kupita ni ndugu zisizo kuhakikishanaji faida, na kupita wakati wengi wa kanyaraka kudhibiti kupita zote zili zisizo.
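
The cells above send the same `messages` to several local models one at a time. One way to run that comparison in a single loop is sketched below. `compare_models` is a hypothetical helper, and the stand-in client exists only so the sketch runs without Ollama; it mimics just the one method this notebook uses.

```python
from types import SimpleNamespace


def compare_models(client, model_names, messages):
    """Send the same messages to each model and collect the replies by model name."""
    replies = {}
    for name in model_names:
        try:
            response = client.chat.completions.create(model=name, messages=messages)
            replies[name] = response.choices[0].message.content
        except Exception as error:  # for example, the model has not been pulled yet
            replies[name] = f"ERROR: {error}"
    return replies


# Stand-in for the OpenAI client: it just echoes the model name back.
def _fake_create(model, messages):
    message = SimpleNamespace(content=f"reply from {model}")
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)

results = compare_models(
    fake_client,
    ["llama3.2", "deepseek-r1:1.5b", "gpt-oss"],
    [{"role": "user", "content": "Tell me a fun fact"}],
)
for name, reply in results.items():
    print(f"--- {name} ---\n{reply}")
```

With the real setup, pass the `ollama` client and the `messages` list from the earlier cells in place of the stand-ins.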
