# End of week 1 exercise

To demonstrate your familiarity with the OpenAI API and Ollama, build a tool that takes a technical question
and responds with an explanation. You will be able to use this tool yourself throughout the course!

See [README.md](README.md) in this folder for full exercise details and requirements.

In [None]:
# imports
import os

from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display

In [None]:
# constants

MODEL_GPT = 'gpt-4o-mini'
MODEL_LLAMA = 'llama3.2'

In [None]:
# set up environment
load_dotenv(".env", override=True)
openai_api_key = os.getenv("OPENAI_API_KEY")

if not openai_api_key:
    raise ValueError("OPENAI_API_KEY is missing. Add it to your environment or .env file.")

# OpenAI client
openai_client = OpenAI(api_key=openai_api_key)

# Ollama OpenAI-compatible client (requires `ollama serve` running locally)
ollama_client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
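
Because the Ollama client only works while a local server is running, it can help to check for one before sending a request. The following is a minimal sketch, not part of the exercise requirements: `ollama_is_running` is a hypothetical helper name, and it assumes the server answers plain HTTP requests at the root of the default address used above.

```python
import urllib.error
import urllib.request


def ollama_is_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at base_url.

    Hypothetical helper: it only checks that something responds at the
    address; it does not verify that a particular model has been pulled.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status
        return False


if not ollama_is_running():
    print("No server answered at localhost:11434. Try running `ollama serve` first.")
```

A check like this lets the notebook fail early with a readable hint instead of a connection error raised from inside the client call.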

In [None]:
# here is the question; type over this to ask something new

question = """
What is Retrieval-Augmented Generation (RAG), and when should I choose it instead of fine-tuning an LLM?
Please explain in a beginner-friendly way and include:
- A simple definition of RAG
- How RAG works step-by-step (embedding, retrieval, context injection, generation)
- RAG vs fine-tuning: cost, speed, maintenance, and data freshness
- One customer-support chatbot example (knowledge base + typical user question)
- One common mistake and how to avoid it
- A short checklist for deciding between RAG and fine-tuning
"""

In [None]:
# Get gpt-4o-mini to answer, with streaming
messages = [
    {
        "role": "system",
        "content": "You are an AI tools mentor helping new developers. Use clear, beginner-friendly language and structure the answer with short headings. Cover: (1) simple definition, (2) step-by-step workflow, (3) trade-offs, (4) practical example, (5) common mistake + fix, and (6) quick decision checklist. Keep the answer practical and concise.",
    },
    {"role": "user", "content": question},
]

stream = openai_client.chat.completions.create(
    model=MODEL_GPT,
    messages=messages,
    stream=True,
)

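# Collect the streamed text while echoing it as it arrives
gpt_chunks = []
for chunk in stream:
    # Skip chunks that carry no choices (some streams can include them)
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content or ""
    if delta:
        gpt_chunks.append(delta)
        print(delta, end="", flush=True)
print()  # end the streamed line before the rendered Markdown

gpt_answer = "".join(gpt_chunks)

# Render the complete answer as formatted Markdown
display(Markdown("## GPT-4o-mini Answer\n\n" + gpt_answer))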
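
The streaming loop above can be factored into a small generator, which makes it easy to try out without calling the API. This is a sketch under stated assumptions: `iter_text` and `_fake_chunk` are hypothetical names, and the fake chunks only imitate the attributes the loop reads (`choices`, `delta`, `content`). It also skips chunks that carry no choices, which can happen, for example, when a stream ends with a usage-only chunk.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator


def iter_text(stream: Iterable) -> Iterator[str]:
    """Yield the non-empty text deltas from chat-completion stream chunks."""
    for chunk in stream:
        # Some chunks may carry no choices at all; skip them.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            yield text


def _fake_chunk(text):
    """Hypothetical stand-in for a stream chunk, used only for this demo."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Simulated stream: two text chunks, one empty delta, one chunk with no choices
fake_stream = [
    _fake_chunk("RAG "),
    _fake_chunk(None),
    SimpleNamespace(choices=[]),
    _fake_chunk("retrieves context."),
]

demo_answer = "".join(iter_text(fake_stream))
print(demo_answer)
```

With a real stream, `"".join(iter_text(stream))` would collect the full answer in the same way.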

In [None]:
# Get Llama 3.2 to answer
llama_response = ollama_client.chat.completions.create(
    model=MODEL_LLAMA,
    messages=messages,
)

# Fall back to an empty string if the model returns no text content
llama_answer = llama_response.choices[0].message.content or ""
display(Markdown("## Llama 3.2 Answer\n\n" + llama_answer))
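
To turn the cells above into a reusable tool, one option is to wrap the message construction and the API call in functions. This is a minimal sketch rather than a reference solution: `build_messages`, `explain`, and `DEFAULT_SYSTEM_PROMPT` are hypothetical names, and `explain` assumes it receives an OpenAI-compatible client such as the two created earlier.

```python
# Hypothetical default prompt; replace it with the system prompt you prefer
DEFAULT_SYSTEM_PROMPT = (
    "You are a helpful technical tutor. "
    "Explain clearly and concisely, in beginner-friendly language."
)


def build_messages(question: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> list:
    """Build the chat message list for a single technical question."""
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


def explain(client, model: str, question: str) -> str:
    """Ask one model to explain a question and return its full reply.

    `client` is assumed to be an OpenAI-compatible client object.
    """
    response = client.chat.completions.create(
        model=model,
        messages=build_messages(question),
    )
    # Content can be None, so fall back to an empty string
    return response.choices[0].message.content or ""


# The message-building step can be inspected without any network call
print(build_messages("What is RAG?")[1])
```

With the clients and constants defined earlier, `explain(openai_client, MODEL_GPT, question)` and `explain(ollama_client, MODEL_LLAMA, question)` would send the same question to both models, which makes it easy to compare their answers side by side.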