# OpenAI Completions Workshop Notebook

End-to-end exploration of OpenAI *text completion* patterns on the legacy Completions endpoint (`client.completions.create`): listing models, baseline prompting, multiple candidates, penalties, token awareness, streaming, and retries.

Environment requirement: `OPENAI_API_KEY` must be set (or a `.env` loaded earlier). Optional: set `OPENAI_MODEL` to override the default model.
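The notebook never reads a `.env` file itself. If you keep the key in one, a common option is the third-party `python-dotenv` package. That package is an assumption and is not used anywhere else in this notebook. The minimal sketch below falls back to the shell environment when the package is missing. Run it before the client-initialization cell.

```python
import os

try:
    # Optional third-party package: pip install python-dotenv
    from dotenv import load_dotenv
    # Copies KEY=VALUE pairs from a .env file into os.environ.
    # By default it does not overwrite variables that are already set.
    loaded = load_dotenv()
except ImportError:
    loaded = False  # package not installed; rely on the shell environment instead

# Report only whether the key exists; never print the secret itself.
print('.env loaded:', loaded, '| OPENAI_API_KEY present:', 'OPENAI_API_KEY' in os.environ)
```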

In [None]:
# Imports & client initialization
import os
import time
from openai import OpenAI

API_KEY = os.getenv('OPENAI_API_KEY')
if not API_KEY:
    raise EnvironmentError(
        'OPENAI_API_KEY not set. Set it in your environment or .env file.')

# Override by exporting OPENAI_MODEL if desired
MODEL = os.getenv('OPENAI_MODEL', 'gpt-3.5-turbo-instruct')
client = OpenAI(api_key=API_KEY)
print(f'Using model: {MODEL}')

## 1. List Available Models
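Prints the first 10 entries of the list returned by the Models API. Not every listed model works with the legacy Completions endpoint used in this notebook. Chat models, for example, are served only through the Chat Completions API.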

In [None]:
models = client.models.list()
print(f'Total models: {len(models.data)}')
for m in models.data[:10]:  # show first 10
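    # Outer double quotes: reusing single quotes inside the f-string is a SyntaxError before Python 3.12
    print(f"{m.id:40} | owned_by={getattr(m, 'owned_by', '?')}")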
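The Models API returns entries in no particular order. The sketch below shows one way to make the list easier to scan: sort newest first using each entry's `created` timestamp, and optionally keep only ids containing a substring. `summarize_models` is a hypothetical helper. Filtering on `'instruct'` is only a naming heuristic, not an official capability flag. The demo uses stand-in objects with made-up ids so it runs offline.

```python
from types import SimpleNamespace
from typing import Iterable, List, Optional


def summarize_models(models: Iterable, contains: Optional[str] = None, limit: int = 10) -> List[str]:
    """Return up to `limit` model ids, newest first, optionally filtered by substring."""
    # Each entry from client.models.list().data exposes `id` and `created` (Unix seconds).
    picked = [m for m in models if contains is None or contains in m.id]
    picked.sort(key=lambda m: m.created, reverse=True)
    return [m.id for m in picked[:limit]]


# Offline demo with hypothetical ids; in the notebook pass models.data instead.
fake = [
    SimpleNamespace(id='model-a-instruct', created=300),
    SimpleNamespace(id='model-b', created=200),
    SimpleNamespace(id='model-c-instruct', created=100),
]
print(summarize_models(fake, contains='instruct'))
```

In the notebook, call `summarize_models(models.data, contains='instruct')` on the list fetched above.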

## 2. Baseline Completion
Single prompt -> single completion.

In [None]:
prompt = (
    "You are a branding assistant. Generate 5 concise, punchy tagline options for an "
    "eco-friendly household cleaning spray named 'PureMist'. Each tagline should: \n"
    "1) Emphasize natural ingredients, \n"
    "2) Convey effectiveness, and \n"
    "3) Stay under 12 words.\n\n"
    "Return them as a simple numbered list."
)
response = client.completions.create(
    model=MODEL,
    prompt=prompt,
    max_tokens=120,
    temperature=0.85,  # fairly high temperature for more varied, creative taglines
    n=1,
)
raw_text = response.choices[0].text.strip()
lines = [line.strip() for line in raw_text.splitlines() if line.strip()]
print("Model:", MODEL)
print("Prompt:\n" + prompt)
print("\nTagline Candidates:")
for line in lines:
    print(line)
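The loop above prints each line exactly as the model wrote it, numbering included. If you want the bare taglines for later processing, you can strip the list markers and any wrapping quotes. Below is a minimal sketch using a hypothetical `parse_taglines` helper. It assumes markers look like `1.`, `2)`, `-`, `*` or `•`. The sample text is invented.

```python
import re
from typing import List

# A leading list marker such as "1.", "2)", "-", "*" or "•", plus surrounding spaces.
_MARKER = re.compile(r'^\s*(?:\d+[.)]|[-*•])\s*')


def parse_taglines(text: str) -> List[str]:
    """Strip list numbering and wrapping quotes from each non-empty line."""
    taglines = []
    for line in text.splitlines():
        line = _MARKER.sub('', line).strip()
        line = line.strip('"\u201c\u201d')  # straight or curly quotes around a tagline
        if line:
            taglines.append(line)
    return taglines


sample = '1. "Nature\'s clean, no compromise."\n\n2) Pure power from plants.\n- Shine, the natural way.'
print(parse_taglines(sample))
```

In the notebook, `parse_taglines(raw_text)` gives a clean list you can count, deduplicate, or score.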

## 3. Multiple Candidates & Temperature Sweep
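Generate three candidates per request (`n=3`) at each of three temperatures to see how diversity changes with temperature. Each candidate is generated and billed separately, so `n` multiplies completion-token usage.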

In [None]:
temps = [0.2, 0.7, 1.0]
for t in temps:
    resp = client.completions.create(
        model=MODEL, prompt=prompt, max_tokens=100, n=3, temperature=t)
    print(f'--- temperature={t} ---')
    for i, choice in enumerate(resp.choices, 1):
        print(f'[{i}] \n{choice.text.strip()}\n')
    print("")
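Comparing the printouts by eye works, but a rough number can also help. The sketch below computes the mean pairwise Jaccard similarity of the candidates' lowercase word sets. A score of 1.0 means every candidate uses the same vocabulary, and values near 0.0 mean little overlap. This metric is a crude stand-in chosen for illustration; the API does not report it. You would typically expect it to fall as temperature rises, but that is not guaranteed for any single run.

```python
from itertools import combinations
from typing import List


def mean_pairwise_jaccard(texts: List[str]) -> float:
    """Average Jaccard similarity of lowercase word sets over all pairs of texts."""
    if len(texts) < 2:
        raise ValueError('Need at least two candidates to compare')
    sets = [set(t.lower().split()) for t in texts]
    scores = []
    for a, b in combinations(sets, 2):
        union = a | b
        # Two empty texts are treated as identical.
        scores.append(len(a & b) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


print(round(mean_pairwise_jaccard(['fresh clean home', 'fresh clean car', 'bright sunny day']), 3))
```

Inside the sweep loop, `mean_pairwise_jaccard([c.text for c in resp.choices])` gives one score per temperature.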

## 4. Presence & Frequency Penalties

Beginner-friendly view:

Large language models decide the next word based on probabilities learned from lots of text. Sometimes you want to nudge them to explore new wording or to stop them from repeating the same phrase.

Two knobs help:

- **presence_penalty**: “Have I said this token at least once?” If yes, gently lower its chance next time. This pushes the model to *introduce new ideas* or vocabulary. Think: diversity booster.
- **frequency_penalty**: “How many times have I already used this token?” The more repeats, the stronger the down‑weight. This curbs *over-repetition* (e.g., the model saying the same word again and again).

Plain analogy:
- Presence = discourage re‑opening a door you already opened once.
- Frequency = the more times you keep walking through the same door, the more resistance you feel.
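OpenAI's API documentation has described both penalties as a simple adjustment to each token's logit (pre-softmax score). Let `c[j]` be how many times token `j` has already been sampled. The adjustment is `logit[j] -= c[j] * frequency_penalty + (c[j] > 0) * presence_penalty`. The toy sketch below applies that rule to three word "tokens". Real models work on integer token ids over the whole vocabulary, and the helper name `apply_penalties` is hypothetical. The point is the contrast between the two penalties: the presence penalty is flat, while the frequency penalty grows with the repeat count.

```python
from collections import Counter
from typing import Dict, List


def apply_penalties(logits: Dict[str, float], generated: List[str],
                    presence_penalty: float, frequency_penalty: float) -> Dict[str, float]:
    """Toy version of the additive penalty rule on a dict of token -> logit."""
    counts = Counter(generated)  # how often each token was already sampled
    return {
        tok: score
        - counts[tok] * frequency_penalty                        # scales with repeats
        - (1.0 if counts[tok] > 0 else 0.0) * presence_penalty   # flat, once seen
        for tok, score in logits.items()
    }


logits = {'clean': 2.0, 'green': 1.5, 'fresh': 1.0}
history = ['clean', 'green', 'clean', 'clean']  # "clean" seen 3 times, "green" once
print(apply_penalties(logits, history, presence_penalty=0.5, frequency_penalty=0.0))
print(apply_penalties(logits, history, presence_penalty=0.0, frequency_penalty=0.5))
```

With presence only, "clean" and "green" each lose 0.5. With frequency only, "clean" loses 1.5 (three repeats) while "green" loses 0.5, so the most-repeated token is hit hardest.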

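Typical ranges: the API accepts values from -2.0 to 2.0, and negative values *encourage* repetition. In practice, 0.0 (off) to about 1.0 (strong) covers most needs. Start small (0.2–0.6). Increase if outputs feel stale or repetitive. Keep them at 0.0 if the task *needs* consistent terminology (e.g., legal definitions or code identifiers).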

Quick guidance:
- Want more variety? Raise **presence_penalty** first.
- Seeing exact phrase echoes (“green clean shine shine”)? Raise **frequency_penalty**.
- If results become too off-topic, dial them back toward 0.

Below we brute-force a tiny grid just to *see* the qualitative difference; the values are illustrative only.

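### Interpreting a Sample Run
(These notes describe one earlier run of the grid cell below over the `(presence_penalty, frequency_penalty)` pairs `(0.0,0.0)`, `(0.0,0.8)`, `(0.8,0.0)`, `(0.8,0.8)`. Your own output will differ because sampling is random at `temperature=0.7`.)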

**1. (0.0, 0.0)**  
Baseline: the taglines are fine, but recurring structures such as “PureMist:” and “Gentle on…” show up. With no penalties, the model freely reuses high-probability phrases.

**2. (0.0, 0.8)**  
A high *frequency* penalty alone down-weights each token in proportion to how many times it has already appeared in the completion. Eco / clean / nature themes remain because they are semantically on-topic, but the wording shifts a bit (“Powerful purity…”, “Eco‑friendly cleaning made easy.”).

**3. (0.8, 0.0)**  
A high *presence* penalty subtracts a flat amount from every token that has already appeared, pushing the model toward new tokens sooner. Result: slightly more structural variation (e.g., the reordered phrasing “Nature's power, PureMist clean.”).

**4. (0.8, 0.8)**  
With both penalties high, the output had fewer, longer taglines that leaned toward descriptive phrasing. One list stopped after item 4 (blank #5). Possible causes:
- Model chose to end early (it *thinks* output is done).
- Hit `max_tokens=80` mid-enumeration (less likely but possible if earlier lines were long).
- Penalties lowered probabilities of continuing the numbered pattern, increasing chance of early stop.
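You can tell the first two causes apart from the response itself. Each completion choice carries a `finish_reason` field. It is `"length"` when `max_tokens` cut the text off and `"stop"` when the model ended on its own or hit a stop sequence; `"content_filter"` is also possible. The sketch below maps that field to a next step. `explain_finish` is a hypothetical helper, and the demo passes a stand-in object so it runs offline.

```python
from types import SimpleNamespace


def explain_finish(choice) -> str:
    """Translate a completion choice's finish_reason into a suggested next step."""
    reason = getattr(choice, 'finish_reason', None)
    if reason == 'length':
        return 'Truncated by max_tokens: raise max_tokens or shorten the prompt.'
    if reason == 'stop':
        return 'Model stopped on its own (or hit a stop sequence): tighten the instructions.'
    return f'Unexpected finish_reason: {reason!r}'


# Offline demo; in the grid loop pass resp.choices[0] instead.
print(explain_finish(SimpleNamespace(finish_reason='length')))
```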

**Why repeated groups?**  
The sample output shows each non-baseline configuration twice. The loop visits each pair exactly once, so the duplicates most likely come from running the cell more than once.

**Why blank #5?**  
The model may have terminated before producing the fifth line. To mitigate:
- Increase `max_tokens` (e.g., from 80 to 120) to leave room for lines that grow longer under penalties.
- Make numbering *mandatory*: “Output exactly 5 lines numbered 1) .. 5). If you cannot, still produce placeholders.”
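- Add a stop sequence (e.g., `stop=["\n6)"]`, matching the numbering style the model actually uses) so anything past item 5 is cut off. A stop sequence only trims extra output; it cannot force the model to reach item 5.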

### Practical Tweaks
| Goal | Adjustment |
|------|------------|
| More lexical variety but still concise | Raise `presence_penalty` moderately (0.4–0.6) |
| Reduce phrase echoes (“eco-friendly” every line) | Raise `frequency_penalty` first (0.4–0.7) |
| Maintain strict count of items | Add explicit instruction + post-validate count |
| Avoid truncated lists | Raise `max_tokens`; count items with a regex and retry if any are missing |

### Simple Post-Validation Pattern
After generation, if you expect exactly 5 items but count < 5, you can: (a) re-prompt asking only for the missing numbers, or (b) re-run with slightly lower penalties.
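Option (a) can be sketched as follows. First find which numbers in 1..5 have no non-empty line. Then build a follow-up prompt that asks only for those. `missing_items` and `followup_prompt` are hypothetical helpers. The regex assumes numbering like `1.` or `1)`, and the follow-up wording is one possibility among many.

```python
import re
from typing import List

# A numbered line ("3." or "3)") that has at least one non-space character after the marker.
# [ \t]* is used instead of \s* so the match cannot run onto the next line.
_ITEM = re.compile(r'^[ \t]*(\d+)[.)][ \t]*\S', flags=re.MULTILINE)


def missing_items(text: str, expected: int = 5) -> List[int]:
    """Return the item numbers 1..expected that have no non-empty numbered line."""
    found = {int(m.group(1)) for m in _ITEM.finditer(text)}
    return [i for i in range(1, expected + 1) if i not in found]


def followup_prompt(original_prompt: str, text: str, missing: List[int]) -> str:
    """Build a re-prompt that asks only for the missing numbered items."""
    numbers = ', '.join(str(i) for i in missing)
    return (f'{original_prompt}\n\nSo far you wrote:\n{text}\n\n'
            f'Write only items {numbers}, keeping the same numbering style.')


partial = '1. Nature cleans best.\n2. Pure plants, real power.\n3. Spray green, shine bright.\n4. Clean without compromise.\n5.'
gaps = missing_items(partial)
print(gaps)
```

When `gaps` is non-empty, pass `followup_prompt(prompt, text, gaps)` to `client.completions.create`. Alternatively, take option (b) and re-run with lower penalties.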

Feel free to experiment by isolating just one penalty at a time to build intuition.

In [None]:
for presence in [0.0, 0.8]:
    for freq in [0.0, 0.8]:
        resp = client.completions.create(
            model=MODEL, prompt=prompt, max_tokens=80, temperature=0.7,
            presence_penalty=presence, frequency_penalty=freq, n=1)
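        choice = resp.choices[0]
        # finish_reason is "length" when max_tokens truncated the list, "stop" otherwise
        print(f'\npresence_penalty={presence} frequency_penalty={freq} finish_reason={choice.finish_reason}')
        print(choice.text.strip(), '\n---')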

## 5. Token Counting (tiktoken)
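Count prompt tokens locally before sending a request, for budgeting and for deciding whether a prompt needs trimming. The prompt plus `max_tokens` must fit within the model's context window.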

In [None]:
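try:
    import tiktoken
    try:
        enc = tiktoken.encoding_for_model(MODEL)  # encoding that matches MODEL
    except KeyError:
        enc = tiktoken.get_encoding('cl100k_base')  # fallback for model names tiktoken doesn't know
    token_count = len(enc.encode(prompt))
    print('Prompt tokens:', token_count)
except ImportError:
    print('tiktoken not installed: pip install tiktoken')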
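A local count is most useful when it feeds a decision. The sketch below caps `max_tokens` so that prompt plus completion fits in the context window. It assumes a single shared window of 4,096 tokens, the commonly cited limit for `gpt-3.5-turbo-instruct`; check the documentation for whichever model you set in `OPENAI_MODEL`. `clamp_max_tokens` is a hypothetical helper.

```python
# Assumption: prompt and completion share one 4,096-token context window.
CONTEXT_WINDOW = 4096


def clamp_max_tokens(prompt_tokens: int, requested: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Largest max_tokens <= requested that still fits alongside the prompt."""
    available = context_window - prompt_tokens
    if available <= 0:
        raise ValueError(f'Prompt uses {prompt_tokens} tokens; no room left in a {context_window}-token window.')
    return min(requested, available)


print(clamp_max_tokens(prompt_tokens=90, requested=120))    # 120: fits as requested
print(clamp_max_tokens(prompt_tokens=4000, requested=120))  # 96: shrunk to what remains
```

In the notebook, `clamp_max_tokens(token_count, 120)` uses the count from the cell above. After a call, `response.usage.prompt_tokens` reports the server-side count, which is a useful check on the local estimate.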

## 6. Streaming Responses
Print text chunks as they arrive (`stream=True`) instead of waiting for the full response. This matters most for longer generations.

In [None]:
long_prompt = (
    'Generate three distinct premium brand names and for each a two-sentence tagline for '
    'an eco-friendly smart home cleaning device brand. Emphasize sustainability, smart automation, and ease of use.'
)
print('Streaming start ->')
stream = client.completions.create(
    model=MODEL, prompt=long_prompt, max_tokens=200, temperature=0.8, stream=True)
for chunk in stream:
    for c in chunk.choices:
        text = getattr(c, 'text', '')
        if text:
            print(text, end='')
print('\n\n<- Streaming end')
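The loop above prints text but discards it. The sketch below is a variant that also keeps the full text per choice and records each choice's `finish_reason`, which arrives on a late chunk. It assumes each streamed chunk's `choices` items expose `index`, `text` and `finish_reason`, as Completions stream chunks do. `collect_stream` is a hypothetical helper. The demo feeds stand-in chunks so it runs offline. `echo=True` is easiest to read with `n=1`, because chunks from different choices interleave.

```python
from types import SimpleNamespace
from typing import Dict, Iterable, Optional, Tuple


def collect_stream(chunks: Iterable, echo: bool = False) -> Tuple[Dict[int, str], Dict[int, Optional[str]]]:
    """Accumulate streamed text per choice index and record each choice's finish_reason."""
    texts: Dict[int, str] = {}
    reasons: Dict[int, Optional[str]] = {}
    for chunk in chunks:
        for c in chunk.choices:
            piece = getattr(c, 'text', '') or ''
            texts[c.index] = texts.get(c.index, '') + piece
            if echo and piece:
                print(piece, end='', flush=True)  # live output, like the cell above
            reason = getattr(c, 'finish_reason', None)
            if reason is not None:
                reasons[c.index] = reason
    return texts, reasons


fake_chunks = [
    SimpleNamespace(choices=[SimpleNamespace(index=0, text='Pure', finish_reason=None)]),
    SimpleNamespace(choices=[SimpleNamespace(index=0, text='Mist', finish_reason=None)]),
    SimpleNamespace(choices=[SimpleNamespace(index=0, text='', finish_reason='stop')]),
]
texts, reasons = collect_stream(fake_chunks)
print(texts, reasons)
```

With the real API, pass the stream object: `texts, reasons = collect_stream(client.completions.create(..., stream=True), echo=True)`.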

## 7. Simple Helper Wrapper with Retry
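A small wrapper that standardizes call parameters and retries transient failures with exponential backoff. Note that the `OpenAI` client already retries some failures on its own (`max_retries`, default 2). This loop adds an extra, explicit layer on top of that.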

In [None]:
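import random
import time
from typing import List
from openai import APIConnectionError, InternalServerError, RateLimitError

# Retry only failures that are usually transient: rate limits, network problems, 5xx responses.
# APITimeoutError subclasses APIConnectionError, so timeouts are covered too.
# Errors such as 400 or 401 would fail identically on every retry, so they are not caught.
TRANSIENT_ERRORS = (RateLimitError, APIConnectionError, InternalServerError)


def generate(prompt: str, *, model: str = MODEL, n: int = 1, temperature: float = 0.7,
             max_tokens: int = 100, retries: int = 3, backoff: float = 1.5) -> List[str]:
    """Return the stripped text of each choice, retrying transient errors with
    exponential backoff (backoff ** attempt seconds) plus up to 1s of random jitter."""
    attempt = 0
    while True:
        try:
            resp = client.completions.create(
                model=model, prompt=prompt, n=n, temperature=temperature, max_tokens=max_tokens)
            return [c.text.strip() for c in resp.choices]
        except TRANSIENT_ERRORS as e:
            attempt += 1
            if attempt > retries:
                raise
            sleep_for = backoff ** attempt + random.random()
            print(f'Retry {attempt}/{retries} after error: {e}. Sleeping {sleep_for:.2f}s')
            time.sleep(sleep_for)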


# Example call with a short, neutral prompt
sample_prompt = 'List two differentiating value propositions for an eco-friendly smart home cleaning device.'
results = generate(sample_prompt, n=2, temperature=0.6)
for i, r in enumerate(results, 1):
    print(f'Choice {i}: {r}')
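Because `generate` mixes the retry policy with the API call, it is hard to test without hitting the network. One option is to factor the backoff loop into a function that takes any zero-argument callable and an injectable `sleep`. `with_backoff` is a hypothetical helper that follows the same schedule as `generate`: `base ** attempt` seconds plus up to 1s of jitter. The demo uses a fake function that fails twice, so it runs offline and does not actually sleep.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar('T')


def with_backoff(fn: Callable[[], T], *, retry_on: Tuple[Type[BaseException], ...],
                 retries: int = 3, base: float = 1.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); on an exception listed in retry_on, wait and retry up to `retries` times."""
    attempt = 0
    while True:
        try:
            return fn()
        except retry_on:
            attempt += 1
            if attempt > retries:
                raise  # out of retries: surface the last error
            sleep(base ** attempt + random.random())


calls = {'n': 0}


def flaky() -> str:
    """Stand-in for an API call that fails twice, then succeeds."""
    calls['n'] += 1
    if calls['n'] < 3:
        raise ConnectionError('transient')
    return 'ok'


delays = []  # record sleep durations instead of actually sleeping
result = with_backoff(flaky, retry_on=(ConnectionError,), sleep=delays.append)
print(result, len(delays))
```

With the real client, the call would look like `with_backoff(lambda: client.completions.create(model=MODEL, prompt=sample_prompt), retry_on=TRANSIENT_ERRORS)`.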

## 8. Recap & Next Steps
You explored: model listing, baseline prompting, multiple candidates, penalties, token counting, streaming, and retries.
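Next ideas: add evaluation, cost estimation, and structured extraction with stop sequences. JSON mode (`response_format`) is another option, but it is a Chat Completions feature and is not available on the legacy Completions endpoint used here.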