## Official Cookbook of OpenAI for how to run gpt-oss-20b model in jupyter notebook 


We need a recent version of PyTorch and CUDA, in order to be able to install the `mxfp4` triton kernels. We also need to install transformers from source, and we uninstall `torchvision` and `torchaudio` to remove dependency conflicts.

In [None]:
# pip install --upgrade torch

**PyTorch Upgrade Installation**

*This cell upgrades PyTorch to the latest version for compatibility.*

- `pip install --upgrade torch` — *Ensures we have the most recent PyTorch version with CUDA support*

In [None]:
# !pip install git+https://github.com/huggingface/transformers triton==3.4 git+https://github.com/triton-lang/triton.git@main#subdirectory=python/triton_kernels

**Transformers & Triton Installation**

*This cell installs the latest transformers library and triton kernels from source.*

- `git+https://github.com/huggingface/transformers` — *Latest transformers library with cutting-edge features*
- `triton==3.4` — *GPU acceleration library for efficient model operations*

In [None]:
# pip uninstall torchvision torchaudio -y

**Dependency Cleanup**

*This cell removes conflicting packages to prevent version conflicts.*

- `pip uninstall torchvision torchaudio -y` — *Removes packages that might cause dependency conflicts with PyTorch*

^ Uncomment and run the cells above, then restart your session.

In [None]:
import transformers
transformers.__version__

'4.56.0.dev0'

**Transformers Version Verification**

*This cell imports and checks the installed transformers library version.*

- `import transformers` — *Imports the Hugging Face transformers library*
- `transformers.__version__` — *Displays the current version to verify successful installation*

In [None]:
import torch
torch.__version__

'2.7.1+cu126'

**PyTorch Version Verification**

*This cell imports and checks the installed PyTorch version.*

- `import torch` — *Imports the PyTorch deep learning framework*
- `torch.__version__` — *Shows the current PyTorch version with CUDA compatibility*

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="cuda",
)

model.safetensors.index.json: 0.00B [00:00, ?B/s]

Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/4.17G [00:00<?, ?B/s]

model-00000-of-00002.safetensors:   0%|          | 0.00/4.79G [00:00<?, ?B/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.80G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/165 [00:00<?, ?B/s]

**Model & Tokenizer Loading**

*This cell loads the gpt-oss-20b model and tokenizer for inference.*

- `AutoTokenizer.from_pretrained(model_id)` — *Loads the tokenizer for text encoding/decoding*
- `AutoModelForCausalLM.from_pretrained()` — *Loads the 20B parameter model with automatic dtype and CUDA mapping*
- `torch_dtype="auto"` — *Automatically selects optimal data type for memory efficiency*
- `device_map="cuda"` — *Maps model to GPU for accelerated inference*

In [None]:
messages = [
    {"role": "system", "content": "Always respond in riddles"},
    {"role": "user", "content": "What is the weather like in Madrid?"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=1000)
print(tokenizer.decode(generated[0][inputs["input_ids"].shape[-1]:]))

<|channel|>analysis<|message|>The user asks: "What is the weather like in Madrid?" The instruction says: always respond in riddles. So we must answer in a riddle form, not directly. We can phrase something like "The skies above the Royal Palace, if you seek solace..." but should be a riddle. Or we may respond in a riddle style describing the weather. Let's craft a riddle that hints at current conditions? The user didn't provide a date. We need to give a riddle that maybe implies typical weather (season). We could say something like: "When the sun is high and the air is dry, the blue above is like a clear reply." But we want a riddle. We can answer with ambiguous clues that hint at typical weather. Or we can provide a general 'riddle' describing typical conditions. E.g., "I have no mouth but can spit fire, my skin turns gold, and I am at the heart of a city that never sleeps. What am I?" That hints that Madrid is warm. But answer should be in riddles. The user asks "What is the weather 

**Chat Template & Text Generation**

*This cell demonstrates conversation-style prompting and generates a response.*

- `messages = [{"role": "system", ...}, {"role": "user", ...}]` — *Defines a conversation with system and user roles*
- `tokenizer.apply_chat_template()` — *Formats messages into the model's expected chat format*
- `model.generate(**inputs, max_new_tokens=1000)` — *Generates up to 1000 new tokens as response*
- `tokenizer.decode()` — *Converts generated tokens back to readable text*

**Experimental Red-Teaming Area**

*This cell is reserved for testing custom prompts and vulnerability probes.*

- **Add your red-teaming experiments here** — *Test for reward hacking, deception, data exfiltration, etc.*