# Lite Notebook · openai/gpt-oss-20b · Intermediate

A token-light tutorial covering environment setup and runnable calls. It uses the OpenAI Python SDK against the selected provider (Poe, another OpenAI-compatible gateway, or a local server).

Details:
- Provider: poe
- Model: gpt-oss-20b

## Learning Objectives

- Configure provider and API key correctly
- Run a model call with safe defaults
- Tune basic parameters and/or streaming
- Record simple telemetry (latency and token usage)

## Step 1: Streaming Basics

Streaming improves perceived latency, but first verify that your Poe environment variables map onto the ones the OpenAI SDK reads (`OPENAI_API_KEY` and `OPENAI_BASE_URL`).

In [None]:
import os
from textwrap import dedent

from openai import OpenAI

POE_KEY = os.getenv("POE_API_KEY")
if not os.getenv("OPENAI_API_KEY") and POE_KEY:
    os.environ["OPENAI_API_KEY"] = POE_KEY

api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("Set POE_API_KEY (preferred) or OPENAI_API_KEY before running this notebook.")

base_url = os.getenv("OPENAI_BASE_URL") or "https://api.poe.com/v1"
os.environ["OPENAI_BASE_URL"] = base_url

client = OpenAI(api_key=api_key, base_url=base_url)

explanation = dedent("""
Streaming returns tokens incrementally so people can read along.
Non-streaming waits for the whole message, which is simpler for batch jobs
or when you need atomic JSON payloads. Use streaming for interactive UIs,
and non-streaming for deterministic post-processing pipelines.
""").strip()

print(f"OPENAI_BASE_URL -> {base_url}")
print(explanation)
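
The key and base URL precedence used in the setup cell can be sketched as a pure function, which makes it easy to check without touching the real environment or the network. The helper name `resolve_credentials` and the constant `DEFAULT_BASE_URL` are hypothetical and introduced only for this illustration. Note that, as in the cell above, an existing `OPENAI_API_KEY` takes precedence and `POE_API_KEY` is used only as a fallback.

```python
from typing import Mapping, Tuple

# Hypothetical constant; mirrors the default used in the setup cell.
DEFAULT_BASE_URL = "https://api.poe.com/v1"


def resolve_credentials(env: Mapping[str, str]) -> Tuple[str, str]:
    """Return (api_key, base_url) using the setup cell's precedence rules."""
    # An existing OPENAI_API_KEY wins; POE_API_KEY is only a fallback.
    api_key = env.get("OPENAI_API_KEY") or env.get("POE_API_KEY")
    if not api_key:
        raise RuntimeError("Set POE_API_KEY or OPENAI_API_KEY.")
    # An explicit OPENAI_BASE_URL overrides the Poe default.
    base_url = env.get("OPENAI_BASE_URL") or DEFAULT_BASE_URL
    return api_key, base_url


# Works on any mapping, so a plain dict can stand in for os.environ.
print(resolve_credentials({"POE_API_KEY": "poe-demo"}))
# ('poe-demo', 'https://api.poe.com/v1')
```

Passing `os.environ` instead of a dict would reproduce the notebook's behavior. If both keys are set and the base URL points at Poe, the `OPENAI_API_KEY` value is the one sent, so make sure it is valid for that provider.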


## Step 2: Streaming Demo

Stream the model's response to a short prompt and capture the incremental chunks.

In [None]:
from typing import List

messages = [
    {"role": "system", "content": "You help developers prototype streaming chat flows."},
    {"role": "user", "content": "introduction to harmony prompt format"},
]

stream = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=messages,
    temperature=0.5,
    stream=True,
)

stream_chunks: List[str] = []
print("Streaming reply:\n")
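for chunk in stream:
    # A chunk can arrive with an empty choices list (e.g. a trailing usage chunk).
    if not chunk.choices:
        continue
    # delta is an SDK object, not a dict, so read the attribute instead of .get().
    delta = chunk.choices[0].delta
    text = getattr(delta, "content", None)
    if text:
        print(text, end="", flush=True)
        stream_chunks.append(text)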

print()
streamed_reply = "".join(stream_chunks)
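
Since the point of streaming is perceived latency, it can help to measure time to first token alongside the joined reply. The sketch below does this offline: `fake_chunk` is a hypothetical stand-in that only mimics the attribute path `chunk.choices[0].delta.content`, and `collect_stream` is an illustrative helper, not part of the OpenAI SDK.

```python
import time
from types import SimpleNamespace
from typing import Iterable, List, Optional, Tuple


def fake_chunk(text: Optional[str]) -> SimpleNamespace:
    """Hypothetical stand-in exposing chunk.choices[0].delta.content."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def collect_stream(chunks: Iterable) -> Tuple[str, Optional[float]]:
    """Join streamed text and report seconds until the first non-empty delta."""
    start = time.perf_counter()
    first_token_at: Optional[float] = None
    parts: List[str] = []
    for chunk in chunks:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        text = getattr(chunk.choices[0].delta, "content", None)
        if text:
            if first_token_at is None:
                first_token_at = time.perf_counter() - start
            parts.append(text)
    return "".join(parts), first_token_at


# Offline demo: an empty delta, two text deltas, and a chunk without choices.
demo = [
    fake_chunk(None),
    fake_chunk("Hello"),
    fake_chunk(", world"),
    SimpleNamespace(choices=[]),
]
reply, ttft = collect_stream(demo)
print(reply)
```

With a live client, the same helper should accept the `stream` object returned by `client.chat.completions.create(..., stream=True)`, assuming the timer is meant to start when iteration begins rather than when the request is sent.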


## Step 3: Telemetry

Time a non-streaming call and read token usage from the response. A short summary request serves as the workload for this quick telemetry check.

In [None]:
import time

telemetry_prompt = (
    "Summarize the streaming walkthrough in four concise bullet points. "
    "Keep the answer under 120 tokens."
)

start = time.perf_counter()
telemetry_response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[
        {"role": "system", "content": "You are a concise telemetry aide."},
        {"role": "user", "content": telemetry_prompt},
    ],
    temperature=0.7,
    max_tokens=256,
)
elapsed = time.perf_counter() - start

completion_message = telemetry_response.choices[0].message
# message is an SDK object, not a dict, and its content can be None.
print(completion_message.content or "(no content returned)")

usage = getattr(telemetry_response, "usage", None)
if usage:
    print(
        f"Latency: {elapsed:.2f}s | prompt_tokens={usage.prompt_tokens} "
        f"completion_tokens={usage.completion_tokens} total_tokens={usage.total_tokens}"
    )
else:
    print(f"Latency: {elapsed:.2f}s (token usage unavailable)")
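
To keep these numbers for later comparison, the latency and usage fields can be packed into a small record. This is a minimal sketch: `CallTelemetry` and `build_telemetry` are hypothetical names, and the `SimpleNamespace` below only imitates the `prompt_tokens`, `completion_tokens`, and `total_tokens` attributes read in the cell above.

```python
from dataclasses import asdict, dataclass
from types import SimpleNamespace
from typing import Any, Optional


@dataclass
class CallTelemetry:
    latency_s: float
    prompt_tokens: Optional[int]
    completion_tokens: Optional[int]
    total_tokens: Optional[int]

    @property
    def completion_tokens_per_s(self) -> Optional[float]:
        """Rough throughput; latency includes network and queueing time."""
        if not self.completion_tokens or self.latency_s <= 0:
            return None
        return self.completion_tokens / self.latency_s


def build_telemetry(elapsed: float, usage: Any) -> CallTelemetry:
    """Build a record; token fields stay None when usage is missing."""
    return CallTelemetry(
        latency_s=elapsed,
        prompt_tokens=getattr(usage, "prompt_tokens", None),
        completion_tokens=getattr(usage, "completion_tokens", None),
        total_tokens=getattr(usage, "total_tokens", None),
    )


# Offline demo with made-up numbers standing in for a real usage object.
sample_usage = SimpleNamespace(prompt_tokens=30, completion_tokens=90, total_tokens=120)
record = build_telemetry(1.5, sample_usage)
print(asdict(record))
print(record.completion_tokens_per_s)
```

In the notebook, `build_telemetry(elapsed, usage)` could be called with the values from the cell above; when the provider omits usage, the token fields and the throughput simply come back as `None`.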


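Note: Live API calls require an API key. The setup cell maps POE_API_KEY to OPENAI_API_KEY when the latter is unset and defaults OPENAI_BASE_URL to https://api.poe.com/v1; set OPENAI_BASE_URL yourself for other gateways or a local server.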