# GPT‑OSS 20B — Active Learning Notebook (from HF link)

Model: https://huggingface.co/openai/gpt-oss-20b

[Open in Colab](https://colab.research.google.com/github/YOUR_ORG/YOUR_REPO/***REMOVED***/main/alain-ai-learning-platform/docs/examples/gpt-oss-20b_active_learning.ipynb)

- Outcomes: Run GPT‑OSS 20B locally through an OpenAI‑compatible API, answer multiple‑choice questions (MCQs), run a golden‑set evaluation, and log token usage and latency.
- Time: ~45–75 minutes.


## About OpenAI and GPT‑OSS 20B
This lesson uses GPT‑OSS 20B as an open‑weights model for local experimentation and teaching. It pairs well with an OpenAI‑compatible API (e.g., Ollama or vLLM) for fast, local chat.

- Company: OpenAI (research and deployment of AI systems)
- Model: GPT‑OSS 20B (open‑weights model used in ALAIN for local, reproducible teaching)
- Release date: <add when known>
- Why interesting: runs locally, so experiments carry no per‑token API cost; near‑deterministic evaluations at `TEMPERATURE = 0.0`; a good fit for teaching.

See the model card for details: https://huggingface.co/openai/gpt-oss-20b


### Optional: Fetch Model Card Summary
Fetches the model card (if network is available) to display context.


In [None]:
try:
    import requests
    url = 'https://huggingface.co/openai/gpt-oss-20b/raw/main/README.md'
    r = requests.get(url, timeout=8)
    if r.status_code == 200:
        text = r.text[:2000]
        print('--- Model Card (first 2k chars) ---\n')
        print(text)
    else:
        print('Could not fetch model card:', r.status_code)
except Exception as e:
    print('Offline or blocked; skip model card. Error:', e)


## Parameters (Colab form)


In [None]:
#@title Model and Runtime
HF_MODEL = 'openai/gpt-oss-20b' #@param {type:'string'}
RUNTIME = 'gpt-oss' #@param ['gpt-oss','transformers']
GPT_OSS_MODEL = 'gpt-oss:20b' #@param {type:'string'}
OPENAI_BASE_URL = 'http://localhost:11434/v1' #@param {type:'string'}
TEMPERATURE = 0.0 #@param {type:'number'}


## Install (pinned)
If running on Colab, uncomment and run.


In [None]:
# !pip -q install openai==1.43.0 transformers==4.44.2 datasets==2.20.0 ipywidgets==8.1.3 requests==2.32.3


## Setup & Seeds


In [None]:
import os, sys, platform, random, time
print('Python:', sys.version)
print('Platform:', platform.platform())

import numpy as np
SEED=42
random.seed(SEED); np.random.seed(SEED)
try:
    import torch
    torch.manual_seed(SEED)
    if torch.cuda.is_available():
        print('CUDA device:', torch.cuda.get_device_name(0))
    else:
        print('CUDA not available')
except Exception as e:
    print('Torch not installed; skipping torch seed.', e)


## Secrets
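Keys are read from the environment; no real secret is hardcoded. The fallback value `'ollama'` is only a placeholder for local servers, which typically ignore the key.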


In [None]:
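# Fall back to a placeholder so the OpenAI client accepts a local server without a real key.
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY', 'ollama')
print('OPENAI_API_KEY set in environment:', 'OPENAI_API_KEY' in os.environ)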


## Quickstart (GPT‑OSS local via OpenAI‑compatible API)
Ensure you have an OpenAI‑compatible server (e.g., Ollama) with `gpt-oss:20b` available.


In [None]:
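from typing import Optional
from openai import OpenAI
client = OpenAI(base_url=OPENAI_BASE_URL, api_key=OPENAI_API_KEY)

def chat(prompt: str, model: Optional[str] = None, temperature: Optional[float] = None):
    """Send one user message, print the reply with token and latency info, and return the text."""
    # Fall back to the notebook-level defaults when arguments are omitted.
    model = model or GPT_OSS_MODEL
    temperature = temperature if temperature is not None else TEMPERATURE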
    t0 = time.time()
    resp = client.chat.completions.create(
        model=model,
        messages=[{'role':'user','content':prompt}],
        temperature=temperature
    )
    dt = time.time()-t0
    txt = resp.choices[0].message.content
    usage = getattr(resp,'usage',None)
    print(txt)
    if usage:
        print('Tokens total:', getattr(usage,'total_tokens',None))
    print(f'Latency: {dt:.2f}s')
    return txt

_ = chat('Say hello in five words.')


## Guided Steps
1. Explore core parameters (temperature, max tokens).
2. Add a structured output (JSON) or schema validation.
3. Run a small batch; measure latency and (if exposed) tokens.


In [None]:
# Pydantic schema hint (optional)
try:
    from pydantic import BaseModel
    class Item(BaseModel):
        title: str
        rating: int
    print('Use Item.model_json_schema() to prompt for strict JSON and validate outputs.')
except Exception:
    print('Install pydantic to validate structured outputs.')
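
To make step 2 of the guided steps concrete, here is a minimal sketch, assuming the model replies with a JSON object that may be wrapped in a Markdown code fence. `parse_item`, `REQUIRED_FIELDS`, and the sample reply are hypothetical and exist only for illustration. The checks mirror the `Item` fields above using the standard library alone; if pydantic is installed, `Item.model_validate_json` is the stricter alternative.

```python
import json
import re

# Hypothetical field spec mirroring the Item schema above (title: str, rating: int).
REQUIRED_FIELDS = {"title": str, "rating": int}

# Three backticks, built programmatically so this example contains no literal fence.
FENCE = "`" * 3
FENCE_RE = re.compile(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", re.DOTALL)

def parse_item(raw: str) -> dict:
    """Strip a code fence if present, parse JSON, and check field types."""
    text = raw.strip()
    match = FENCE_RE.match(text)
    if match:
        text = match.group(1)
    data = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"field {name!r} should be {expected_type.__name__}")
    return data

# Made-up reply shaped like a fenced model answer.
reply = FENCE + 'json\n{"title": "Dune", "rating": 5}\n' + FENCE
print(parse_item(reply))
```

In the notebook, the same function could be applied to the string returned by `chat(...)` after prompting for JSON only; a `ValueError` then signals that the reply needs a retry or repair.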


## Evaluation (Golden Set)


In [None]:
golden = [
    {'prompt':'2+2?','expect':'4'},
    {'prompt':'Capital of France?','expect':'Paris'},
]
ok=0
for ex in golden:
    out = chat(ex['prompt'])
    ok += int(ex['expect'].lower() in (out or '').lower())
acc = ok/len(golden)
print(f'Accuracy: {acc:.2%} ({ok}/{len(golden)})')
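
The substring check above would also count a reply such as "14" as correct for the expected answer "4". The following is a sketch of a slightly stricter scorer, assuming whole-token matching is what you want. `is_match`, `score_golden`, and `stub_answers` are hypothetical names, and the stub stands in for `chat` so the example runs without a server.

```python
import re
from typing import Callable, Dict, List

def is_match(expected: str, output: str) -> bool:
    """True if `expected` appears in `output` as a whole token, ignoring case."""
    pattern = r"(?<!\w)" + re.escape(expected) + r"(?!\w)"
    return re.search(pattern, output or "", flags=re.IGNORECASE) is not None

def score_golden(items: List[Dict[str, str]], ask: Callable[[str], str]) -> float:
    """Return the fraction of items whose expected answer appears in the reply."""
    if not items:
        return 0.0
    hits = sum(is_match(ex["expect"], ask(ex["prompt"])) for ex in items)
    return hits / len(items)

# Stand-in for chat(); the canned replies are made up for illustration.
def stub_answers(prompt: str) -> str:
    canned = {"2+2?": "The answer is 14.", "Capital of France?": "paris!"}
    return canned.get(prompt, "")

sample_golden = [
    {"prompt": "2+2?", "expect": "4"},
    {"prompt": "Capital of France?", "expect": "Paris"},
]
print(f"Accuracy: {score_golden(sample_golden, stub_answers):.2%}")
```

Passing `chat` in place of `stub_answers` scores the live model with the same logic.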


## MCQ — Understanding Parameters


In [None]:
question = 'Which parameter most reduces randomness?'
options = ['top_p','temperature','max_tokens','presence_penalty']
correct_index = 1
explanation = 'Lower temperature yields more deterministic outputs.'
print(question)
for i,o in enumerate(options):
    print(f'  {i}) {o}')
try:
    import ipywidgets as W
    from IPython.display import display
    dd = W.Dropdown(options=[(o,i) for i,o in enumerate(options)], description='Answer:')
    btn = W.Button(description='Submit')
    out = W.Output()
    def on_click(_):
        with out:
            out.clear_output()
            print('Correct!' if dd.value==correct_index else f'Not quite. {explanation}')
    btn.on_click(on_click)
    display(dd, btn, out)
except Exception:
    choice = int(input('Your choice (0-3): ').strip() or -1)
    print('Correct!' if choice==correct_index else f'Not quite. {explanation}')
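
To see why lower temperature reduces randomness, consider the textbook formulation in which logits are divided by the temperature before a softmax. The sketch below illustrates that formulation only; it is not a description of how GPT‑OSS or any particular server implements sampling. The function name and the logits are made up.

```python
import math
from typing import List

def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this formulation")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature shrinks, probability mass concentrates on the highest-scoring token, so sampling picks it almost every time.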


## MCQ — Tokens & Costs


In [None]:
question = 'Which fields commonly indicate token usage in OpenAI-compatible responses?'
options = ['usage.prompt_tokens & usage.completion_tokens', 'num_tokens & token_count', 'price.prompt & price.completion', 'gpu_tokens & cpu_tokens']
correct_index = 0
explanation = 'Look for usage.prompt_tokens, usage.completion_tokens, and sometimes total_tokens.'
print(question)
for i,o in enumerate(options):
    print(f'  {i}) {o}')
choice = int(input('Your choice (0-3): ').strip() or -1)
print('Correct!' if choice==correct_index else f'Not quite. {explanation}')
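
The usage fields named in this question can be read defensively, since not every OpenAI‑compatible server reports all of them. A minimal sketch follows; `usage_summary` is a hypothetical helper, and the `SimpleNamespace` objects with made-up numbers stand in for real responses.

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional

def usage_summary(resp: Any) -> Dict[str, Optional[int]]:
    """Read token counts from an OpenAI-style response; missing fields become None."""
    usage = getattr(resp, "usage", None)
    prompt = getattr(usage, "prompt_tokens", None)
    completion = getattr(usage, "completion_tokens", None)
    total = getattr(usage, "total_tokens", None)
    # Derive the total when the server omits it but reports both parts.
    if total is None and prompt is not None and completion is not None:
        total = prompt + completion
    return {"prompt_tokens": prompt, "completion_tokens": completion, "total_tokens": total}

# Stand-in objects shaped like a response; the numbers are made up.
with_usage = SimpleNamespace(usage=SimpleNamespace(prompt_tokens=12, completion_tokens=7))
without_usage = SimpleNamespace()
print(usage_summary(with_usage))
print(usage_summary(without_usage))
```

Inside `chat`, the same helper could be called on `resp` to log prompt and completion tokens separately rather than only the total.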


## MCQ — Secrets & Safety


In [None]:
question = 'What is the safest way to use API keys in notebooks?'
options = ['Hardcode in code cells', 'Store in environment variables and read with os.getenv', 'Embed in the prompt', 'Commit to git and rotate later']
correct_index = 1
explanation = 'Read secrets from environment variables; never hardcode or commit.'
print(question)
for i,o in enumerate(options):
    print(f'  {i}) {o}')
choice = int(input('Your choice (0-3): ').strip() or -1)
print('Correct!' if choice==correct_index else f'Not quite. {explanation}')


## Customization Playground
Experiment with prompt and parameters interactively.


In [None]:
try:
    import ipywidgets as W
    from IPython.display import display
    prompt_w = W.Text(value='Summarize this lesson in 20 words.', description='Prompt:', layout=W.Layout(width='90%'))
    temp_w = W.FloatSlider(value=float(TEMPERATURE), min=0.0, max=1.0, step=0.1, description='Temp')
    btn = W.Button(description='Run')
    out = W.Output()
    def on_click(_):
        with out:
            out.clear_output()
            if RUNTIME == 'gpt-oss':
                print('--- Output ---')
                # chat() prints the reply itself; pass the slider value so it takes effect.
                chat(prompt_w.value, temperature=temp_w.value)
            else:
                print("This playground only supports RUNTIME='gpt-oss'.")
    btn.on_click(on_click)
    display(prompt_w, temp_w, btn, out)
except Exception:
    print('Widgets unavailable; use chat(\'...\') directly.')


## Cost & Observability


In [None]:
prompts = ['List 3 cities in 5 words','Name 3 fruits in 5 words']
t0=time.time(); outs=[]
for p in prompts:
    outs.append(chat(p))
dt=time.time()-t0
print('Batch latency:', round(dt,2),'s for', len(prompts),'items')


## Exercises
- Add a JSON schema and validate outputs.
- Expand the golden set to 20 items and report accuracy.
- Log average latency over 10 trials at two temperatures.
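
For the latency exercise, one possible starting point is sketched below. `mean_latency` and `stub_model` are hypothetical names; the stub sleeps briefly in place of a real model call so the example runs without a server.

```python
import statistics
import time
from typing import Callable, Dict, Iterable

def mean_latency(ask: Callable[[str, float], str], prompt: str,
                 temperatures: Iterable[float], trials: int = 10) -> Dict[float, float]:
    """Return mean wall-clock seconds per call for each temperature."""
    results = {}
    for temp in temperatures:
        timings = []
        for _ in range(trials):
            start = time.perf_counter()
            ask(prompt, temp)
            timings.append(time.perf_counter() - start)
        results[temp] = statistics.mean(timings)
    return results

# Stand-in for a model call; it ignores its inputs and just waits a millisecond.
def stub_model(prompt: str, temperature: float) -> str:
    time.sleep(0.001)
    return "ok"

report = mean_latency(stub_model, "Say hello in five words.", (0.0, 0.8), trials=3)
for temp, secs in report.items():
    print(f"temperature={temp}: {secs * 1000:.1f} ms mean")
```

To measure the real model, pass `lambda p, t: chat(p, temperature=t)` as `ask`. Note that `chat` also prints each reply, which adds a small overhead to every timing.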


## Reflection
- What changed when you varied temperature from 0.0 to 0.8?
- How would you adapt the prompt for structured JSON output?
- Which exercises would reveal model limitations most clearly?


## Troubleshooting
- Connection errors: verify `OPENAI_BASE_URL` and that your server exposes `/models`.
- 401/403: check keys/permissions; never hardcode secrets.
- OOM: reduce sequence length/batch; switch to CPU or smaller model.
- JSON parse errors: strip Markdown code fences from the reply, then validate or repair the text before calling `json.loads`.


In [None]:
# Diagnostics
import requests
try:
    # Trim any trailing slash so the URL does not contain '//models'.
    r = requests.get(OPENAI_BASE_URL.rstrip('/') + '/models', timeout=5)
    print(r.status_code, r.text[:200])
except Exception as e:
    print('Conn error:', e)
