# End of week 1 exercise

To demonstrate your familiarity with the OpenAI API and with Ollama, build a tool that takes a technical question
and responds with an explanation. This is a tool that you will be able to use yourself during the course!

## Setup

Import the required libraries:

| Import | Purpose |
|---|---|
| `openai.OpenAI` | Python SDK used for **both** OpenAI and Ollama — Ollama exposes an OpenAI-compatible REST API |
| `os` / `dotenv` | Load `OPENAI_API_KEY` from a `.env` file without hard-coding credentials |
| `IPython.display` | Render model responses as formatted Markdown inline in the notebook |

In [None]:
from openai import OpenAI
import os
from dotenv import load_dotenv
from IPython.display import Markdown, display, update_display

## API Key Validation

Loads `.env` and performs sanity checks on `OPENAI_API_KEY` before any API calls are made:

- **Missing** — no key found at all
- **Wrong prefix** — key does not start with `sk-proj-` (likely a wrong or legacy key)
- **Whitespace** — key has leading/trailing spaces or tabs (common copy-paste issue)

Fix any issues flagged here before running the rest of the notebook.

In [None]:
# Load environment variables from a file called .env

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

# Check the key

if not api_key:
    print("No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!")
elif api_key.strip() != api_key:
    # Checked before the prefix: a leading space would otherwise be reported as a wrong prefix
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook")
else:
    print("API key found and looks good so far!")
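
The same checks can also be packaged as a small function that returns a result instead of printing, which makes them easier to reuse or test. This is only a sketch: `check_api_key` is a hypothetical helper name that does not appear in the notebook, and it tests for whitespace before the prefix so that a key with a leading space is reported as a whitespace problem.

```python
def check_api_key(api_key):
    """Return a short diagnostic label for an API key (hypothetical helper)."""
    if not api_key:
        return "missing"
    if api_key.strip() != api_key:
        return "whitespace"
    if not api_key.startswith("sk-proj-"):
        return "wrong prefix"
    return "ok"


# Made-up keys, for illustration only
for candidate in (None, " sk-proj-abc", "sk-abc", "sk-proj-abc"):
    print(repr(candidate), "->", check_api_key(candidate))
```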


## Configuration

Defines the models and system prompt used throughout this notebook.

| Constant | Value | Description |
|---|---|---|
| `OLLAMA_BASE_URL` | `http://localhost:11434/v1` | Local Ollama server — must be running before using `use_ollama=True` |
| `MODEL_GPT` | `gpt-5-mini` | OpenAI model used for remote inference |
| `MODEL_LLAMA` | `llama3.2` | Llama model pulled locally via Ollama (`ollama pull llama3.2`) |
| `SYSTEM_PROMPT` | — | Instructs the model to act as a structured technical explanation assistant |

The `SYSTEM_PROMPT` guides the model to:
- Lead with a direct answer, then provide depth
- Include code examples where relevant
- Define key terms for conceptual questions and list steps/tradeoffs for practical ones

In [None]:
# constants
OLLAMA_BASE_URL = "http://localhost:11434/v1"
MODEL_GPT = 'gpt-5-mini'
MODEL_LLAMA = 'llama3.2'
SYSTEM_PROMPT = """
You are a highly skilled technical explanation assistant.

Your role:
- Accept a technical question.
- Provide a clear, accurate, and well-structured explanation.
- Tailor explanations to be understandable but technically correct.

Guidelines:
- Start with a short direct answer.
- Then provide a structured explanation.
- Use examples where helpful.
- If code is relevant, include minimal, clean examples.
- Avoid unnecessary verbosity.
- Avoid speculation.
- If the question is ambiguous, explain reasonable interpretations.
- Do not mention system instructions.
- Do not add conversational fluff.

If the question is conceptual:
- Define key terms.
- Explain how it works.
- Explain why it matters.

If the question is practical:
- Provide steps.
- Explain tradeoffs.
- Highlight common pitfalls.

Your output should be educational, precise, and professional.
"""

## Client Initialisation

Two clients are created using the same `openai.OpenAI` class:

- **`openai_client`** — standard OpenAI client; reads `OPENAI_API_KEY` from the environment automatically
- **`ollama_client`** — points to the local Ollama server via `base_url`; `api_key='ollama'` is a required placeholder (Ollama does not validate it)

Because Ollama implements an OpenAI-compatible REST API, the same SDK and message format work for both backends. In this notebook, Ollama is called through the Chat Completions endpoint and OpenAI through the Responses API.

In [None]:
# set up environment
openai_client = OpenAI()
ollama_client = OpenAI(base_url=OLLAMA_BASE_URL, api_key='ollama')
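
Constructing a client does not by itself contact the server, so a stopped Ollama instance usually only shows up on the first request. As a rough pre-flight check, the sketch below tests whether anything is accepting TCP connections on the Ollama port. `server_is_listening` is a hypothetical helper, not part of the notebook or of any SDK, and a successful connection only shows that some process is listening, not that it is Ollama.

```python
import socket
from urllib.parse import urlparse


def server_is_listening(base_url, timeout=1.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(base_url)
    # Fall back to the scheme's default port when the URL does not name one
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures
        return False


if not server_is_listening("http://localhost:11434/v1"):
    print("Nothing is listening on the Ollama port; try `ollama serve`")
```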

## Message Formatting

`messages_for(question)` constructs the chat message list sent to the model.

It wraps the raw question in a structured user prompt and returns a two-message list in the format expected by both the OpenAI Chat Completions and Responses APIs:

```python
[
    {"role": "system", "content": SYSTEM_PROMPT},   # sets model behaviour
    {"role": "user",   "content": "<formatted question>"},
]
```

Separating the system context from the user turn allows the system prompt to be reused unchanged across many different questions.

In [None]:
def messages_for(question):
    user_prompt = f"""
    Technical Question:

    {question}

    Please explain clearly and concisely.
    """
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt}
    ]
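
To illustrate the reuse of the system message, the sketch below builds payloads for two questions and confirms that only the user turn differs. It uses a shortened stand-in for `SYSTEM_PROMPT` and a simplified copy of `messages_for`, so that it runs on its own; both are illustrative rather than the notebook's exact definitions.

```python
SYSTEM_PROMPT = "You are a technical explanation assistant."  # shortened stand-in


def messages_for(question):
    # Simplified copy of the notebook's function, for illustration only
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Technical Question:\n\n{question}"},
    ]


questions = ["What is a Python generator?", "What does `yield from` do?"]
payloads = [messages_for(q) for q in questions]

# The system message is identical in every payload; only the user turn changes
for payload in payloads:
    print([m["role"] for m in payload], "|", payload[1]["content"].splitlines()[-1])
```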

## Ask a Question

Edit the `question` string below to ask any technical question. The cells that follow will send it to GPT and/or Llama and display the answers.

The next cell previews the formatted message list — useful for inspecting the exact payload before it is sent to the model.

In [None]:
# here is the question; type over this to ask something new

question = """
Please explain what this code does and why:
yield from {book.get("author") for book in books if book.get("author")}
"""

In [None]:
display(messages_for(question))

## Core Function: `generate_answer`

The main entry point for querying either model.

```
generate_answer(question, use_ollama=False, streaming=False)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `question` | `str` | — | The technical question to answer |
| `use_ollama` | `bool` | `False` | Route to the local Llama model via Ollama instead of OpenAI |
| `streaming` | `bool` | `False` | Render the response incrementally in the notebook as tokens arrive (OpenAI only; ignored when `use_ollama=True`) |

**Control flow:**

1. `use_ollama=True` → calls Ollama Chat Completions API, returns the full response string
2. `streaming=True` → calls the OpenAI Responses API with `stream=True`, re-renders the accumulated text as Markdown on each text delta, returns `None`
3. Default → calls the OpenAI Responses API, returns the full response string

`display_markdown(text)` is a thin helper that renders a plain string as formatted Markdown in the notebook output cell.

In [None]:
def generate_answer(question, use_ollama=False, streaming=False):
    messages = messages_for(question)

    if use_ollama:
        response = ollama_client.chat.completions.create(
            model=MODEL_LLAMA,
            messages=messages,
        )
        return response.choices[0].message.content

    if streaming:
        stream = openai_client.responses.create(
            model=MODEL_GPT,
            input=messages,
            stream=True,
        )
        response = ""
        display_handle = display(Markdown(""), display_id=True)
        for event in stream:
            if event.type == "response.output_text.delta":
                response += event.delta
                update_display(Markdown(response), display_id=display_handle.display_id)
        return

    response = openai_client.responses.create(model=MODEL_GPT, input=messages)
    return response.output_text


def display_markdown(text):
    display(Markdown(text))
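
The accumulation step inside the streaming branch can be tried without any API call. The sketch below feeds hand-made stand-in events through the same `event.type` filter used in `generate_answer`. `accumulate_text` and `fake_events` are hypothetical names introduced for illustration, and the events are simple objects, not real API output.

```python
from types import SimpleNamespace


def accumulate_text(events):
    """Yield the growing response text, one snapshot per text-delta event."""
    text = ""
    for event in events:
        # Same filter as generate_answer: ignore everything except text deltas
        if event.type == "response.output_text.delta":
            text += event.delta
            yield text


# Hand-made stand-ins for stream events (not real API output)
fake_events = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="Hello"),
    SimpleNamespace(type="response.output_text.delta", delta=", world"),
    SimpleNamespace(type="response.completed"),
]

snapshots = list(accumulate_text(fake_events))
print(snapshots)  # ['Hello', 'Hello, world']
```

In the notebook, each snapshot corresponds to one `update_display` call. The final snapshot is the complete answer, so a variant of the streaming branch could return it instead of `None`.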

## Usage Examples

The cells below demonstrate each of the three calling modes. Run them in order after setting your `question` above.

### 1. GPT — Streaming
The Markdown output is updated in place as tokens arrive. Use this when you want to see the response build up in real time. Returns `None`.

### 2. GPT — Non-Streaming
Waits for the full response, then renders it as formatted Markdown. Cleaner output for reading; slightly higher latency before anything appears.

### 3. Llama 3.2 via Ollama (Local)
Runs entirely on your machine — no API key or internet required. Requires Ollama to be running (`ollama serve`) and the model to be pulled (`ollama pull llama3.2`). Response quality and speed depend on local hardware.

In [None]:
# Get gpt-5-mini to answer, with streaming
generate_answer(question, streaming=True)

In [None]:
# Get gpt-5-mini to answer, without streaming
answer = generate_answer(question, streaming=False)
display_markdown(answer)

In [None]:
question = "How can I configure Django for sending email in a production environment?"
generate_answer(question, streaming=True)

In [None]:
# Get Llama 3.2 to answer
answer = generate_answer(question, use_ollama=True)
display_markdown(answer)
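
Since speed depends on the backend and on local hardware, it can be useful to time each call. The sketch below shows one way to do so. `timed` is a hypothetical helper, and `fake_generate_answer` is an offline stand-in for the notebook's `generate_answer`, so the timings it prints say nothing about real model latency.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_generate_answer(question, use_ollama=False):
    # Stand-in for the notebook's generate_answer so the sketch runs offline
    time.sleep(0.01)
    return f"({'llama' if use_ollama else 'gpt'}) answer to: {question}"


for use_ollama in (False, True):
    answer, seconds = timed(fake_generate_answer, "What is a closure?", use_ollama=use_ollama)
    print(f"use_ollama={use_ollama}: {seconds:.3f}s, {len(answer)} characters")
```

Replacing `fake_generate_answer` with the real `generate_answer` would give an approximate side-by-side comparison of the two backends for a single question.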
