# Inference Notebook

Use this notebook to run inference with either your own trained model or an official GPT-2 model from OpenAI. It is designed to:
1.  **Load Weights:** Load your trained checkpoint or the original OpenAI GPT-2 weights, both stored as PyTorch `.pth` files.
2.  **Generate Text:** Run inference to produce new text from a prompt.

## Imports & Device Setup

In [1]:
import torch
import tiktoken
import os

# Import architecture and generation utilities
from model import GPTModel
from generate import generate_and_print
from download_weights import download_and_save_gpt2

# Device Setup
if torch.backends.mps.is_available():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

print(f"Inference Device: {device}")

Inference Device: mps


## Model Selection & Configuration

**Choose Your Model:**
In the cell below, set `MODEL_SELECTION` to either your own trained model or one of the official pre-trained GPT-2 models from OpenAI.

* **Custom Model (`"custom"`):** Loads the `.pth` file you trained earlier from the `./model_weights` directory.
    * *Important:* Ensure the configuration inside the `if` block matches your training settings exactly. `context_length`, `emb_dim`, `n_layers`, and `qkv_bias` determine the model's parameter shapes and names, so a mismatch makes loading fail; a wrong `n_heads` may load without error but produce poor output.
* **Official Models:** Downloads the original OpenAI TensorFlow checkpoints and converts them to a PyTorch `.pth` file automatically.
    * *Available Sizes:* `gpt2-small (124M)`, `gpt2-medium (355M)`, `gpt2-large (774M)`, `gpt2-xl (1558M)`.
    * *Hardware Warning:* The **XL (1558M)** model requires significant RAM (~12GB+) during the initial conversion process.
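The size labels in parentheses are parameter counts, and they can be recomputed from `emb_dim` and `n_layers` alone as a sanity check on the configurations below. This is a minimal sketch assuming the standard GPT-2 layout (learned position embeddings, 4x-wide feed-forward layers, bias terms everywhere, two LayerNorms per block plus a final one) and an output head that shares its weights with the token embedding; `gpt2_param_count` is a hypothetical helper, not part of this project.

```python
# Copied from `model_configs` below; n_heads is omitted because splitting
# emb_dim across attention heads does not change the parameter count.
configs = {
    "gpt2-small (124M)":  {"emb_dim": 768,  "n_layers": 12},
    "gpt2-medium (355M)": {"emb_dim": 1024, "n_layers": 24},
    "gpt2-large (774M)":  {"emb_dim": 1280, "n_layers": 36},
    "gpt2-xl (1558M)":    {"emb_dim": 1600, "n_layers": 48},
}


def gpt2_param_count(vocab_size, context_length, emb_dim, n_layers):
    """Count parameters of a GPT-2-style model whose output head reuses
    the token-embedding matrix (weight tying), with all bias terms."""
    d = emb_dim
    embeddings = vocab_size * d + context_length * d      # token + position tables
    attention = (3 * d * d + 3 * d) + (d * d + d)         # Q/K/V + output projection
    feed_forward = (d * 4 * d + 4 * d) + (4 * d * d + d)  # d -> 4d -> d
    layer_norms = 2 * (2 * d)                             # two LayerNorms (scale + shift)
    per_block = attention + feed_forward + layer_norms    # = 12*d*d + 13*d
    final_norm = 2 * d
    return embeddings + n_layers * per_block + final_norm


for name, cfg in configs.items():
    n = gpt2_param_count(50257, 1024, cfg["emb_dim"], cfg["n_layers"])
    print(f"{name}: {n:,} parameters (~{n / 1e6:.0f}M)")
# First line printed: gpt2-small (124M): 124,439,808 parameters (~124M)
```

If your `GPTModel` keeps a separate, untied output layer, `sum(p.numel() for p in model.parameters())` will be larger by `vocab_size * emb_dim` (about 38.6M for the small model).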

In [2]:
# Architecture hyperparameters of the official OpenAI GPT-2 checkpoints
model_configs = {
    "gpt2-small (124M)":  {"emb_dim": 768,  "n_layers": 12, "n_heads": 12},
    "gpt2-medium (355M)": {"emb_dim": 1024, "n_layers": 24, "n_heads": 16},
    "gpt2-large (774M)":  {"emb_dim": 1280, "n_layers": 36, "n_heads": 20},
    "gpt2-xl (1558M)":    {"emb_dim": 1600, "n_layers": 48, "n_heads": 25},
}

# --- USER CHOICE ---
# You can copy-paste keys from above:
# "gpt2-small (124M)", "gpt2-medium (355M)", "gpt2-large (774M)", "gpt2-xl (1558M)"
# OR set to "custom" for your locally trained model.
MODEL_SELECTION = "gpt2-small (124M)" 


# Directory to store models
MODELS_DIR = "./model_weights"
os.makedirs(MODELS_DIR, exist_ok=True)

if MODEL_SELECTION == "custom":
    # --- PATH A: YOUR CUSTOM MODEL ---
    # Ensure this filename matches exactly what you saved in train.ipynb!
    filename = "trained_gpt2.pth" 
    model_path = os.path.join(MODELS_DIR, filename)
    print(f"\nSelection: Custom Model ({model_path})")
    
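    # Must match the settings used in train.ipynb exactly
    config = {
        "vocab_size": 50257,    # GPT-2 BPE vocabulary size
        "context_length": 256,  # must equal the context length used in training
        "emb_dim": 768,
        "n_heads": 12,
        "n_layers": 12,
        "drop_rate": 0.1,       # dropout is disabled by model.eval() at inference
        "qkv_bias": False
    }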

elif MODEL_SELECTION in model_configs:
    # --- PATH B: OFFICIAL OPENAI MODEL ---
    print(f"\nSelection: Official OpenAI GPT-2 ({MODEL_SELECTION})")
    
    # 1. Parse the size ID from the string (e.g., extract "124M" from "gpt2-small (124M)")
    # We split by '(' and take the second part, then remove the closing ')'
    size_id = MODEL_SELECTION.split("(")[1].split(")")[0]
    
    # 2. Download & Load
    # Pass the clean ID ("124M") to the downloader
    model_path = download_and_save_gpt2(size_id, MODELS_DIR, GPTModel)
    
    # 3. Apply Configuration
    config = {
        "vocab_size": 50257,
        "context_length": 1024,
        "drop_rate": 0.0,
        "qkv_bias": True  # The official OpenAI weights include query/key/value bias terms
    }
    # Update with specific dimensions (emb_dim, layers, heads)
    config.update(model_configs[MODEL_SELECTION])

else:
    raise ValueError(f"Unknown model selection: {MODEL_SELECTION}")


Selection: Official OpenAI GPT-2 (gpt2-small (124M))
Downloading 124M files...
File already exists: ./model_weights/tf_weights/124M/checkpoint
File already exists: ./model_weights/tf_weights/124M/encoder.json
File already exists: ./model_weights/tf_weights/124M/hparams.json


model.ckpt.data-00000-of-00001: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 498M/498M [19:45<00:00, 420kiB/s]
model.ckpt.index: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 5.21k/5.21k [00:00<00:00, 5.36MiB/s]
model.ckpt.meta: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 471k/471k [00:01<00:00, 277kiB/s]
vocab.bpe: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 456k/456k [00:01<00:00, 345kiB/s]


Loading TensorFlow weights...
Converting to PyTorch...
Saving PyTorch model to ./model_weights/gpt2_124M.pth...
Done!
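The cell above extracts the size ID with a chain of `split` calls, which assumes every label ends in a parenthesised size. If you add your own labels to `model_configs`, a slightly stricter, regex-based version fails with a clear message instead of an `IndexError`. `parse_size_id` is a hypothetical helper, shown only as a sketch.

```python
import re

# Matches a trailing "(<digits>M)", e.g. "(124M)" in "gpt2-small (124M)".
_SIZE_RE = re.compile(r"\((\d+M)\)\s*$")


def parse_size_id(label: str) -> str:
    """Return the size ID (e.g. "124M") from a label like "gpt2-small (124M)"."""
    match = _SIZE_RE.search(label)
    if match is None:
        raise ValueError(f"No size ID found in model label {label!r}")
    return match.group(1)


print(parse_size_id("gpt2-small (124M)"))  # 124M
print(parse_size_id("gpt2-xl (1558M)"))    # 1558M
```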


## Loading Weights

In [3]:
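print(f"Loading model from: {model_path} ...")

# Build the architecture from `config`, then load the saved parameters into it
model = GPTModel(config)
state_dict = torch.load(model_path, map_location=device, weights_only=True)
model.load_state_dict(state_dict)
model.to(device)
model.eval()  # switch to inference mode (disables dropout)

# GPT-2 byte-pair-encoding tokenizer, shared by all GPT-2 sizes
tokenizer = tiktoken.get_encoding("gpt2")

print("-> Model successfully loaded and ready!")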

Loading model from: ./model_weights/gpt2_124M.pth ...
-> Model successfully loaded and ready!
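If `config` does not match the checkpoint, `model.load_state_dict` raises a `RuntimeError` listing missing keys, unexpected keys, and size mismatches. To inspect the differences before loading, you could compare the two sets of parameter shapes. This is a sketch: `find_shape_mismatches` is a hypothetical helper, and the parameter names in the demo (`pos_emb.weight`, `trf_blocks.0.att.W_query.bias`) are illustrative assumptions, not necessarily the names `GPTModel` uses.

```python
def find_shape_mismatches(expected: dict, found: dict) -> dict:
    """Compare two {parameter name: shape tuple} dicts.

    Returns {name: (expected_shape, found_shape)} for every name whose shape
    differs, using None when the name is missing on one side.
    """
    problems = {}
    for name in sorted(expected.keys() | found.keys()):
        exp_shape = expected.get(name)
        got_shape = found.get(name)
        if exp_shape != got_shape:
            problems[name] = (exp_shape, got_shape)
    return problems


# Usage in this notebook (assumes `GPTModel`, `config`, `model_path`, and `torch`
# from the cells above):
#   expected = {k: tuple(v.shape) for k, v in GPTModel(config).state_dict().items()}
#   found = {k: tuple(v.shape) for k, v in
#            torch.load(model_path, map_location="cpu", weights_only=True).items()}

# Toy demo: a custom config (context_length=256, qkv_bias=False) checked
# against a checkpoint with 1024 positions and Q/K/V biases.
expected = {"pos_emb.weight": (256, 768)}
found = {"pos_emb.weight": (1024, 768), "trf_blocks.0.att.W_query.bias": (768,)}
for name, (exp, got) in find_shape_mismatches(expected, found).items():
    print(f"{name}: model expects {exp}, checkpoint has {got}")
# pos_emb.weight: model expects (256, 768), checkpoint has (1024, 768)
# trf_blocks.0.att.W_query.bias: model expects None, checkpoint has (768,)
```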


## Interactive Generation

Run the cell below to generate text. 

* **`start_context`**: The prompt you give the model.
* **`max_new_tokens`**: The number of new tokens to generate after the prompt. Tokens are sub-word pieces, so 100 tokens is usually fewer than 100 words.
* **`temperature`**: Divides the model's scores (logits) before sampling, which controls how random the output is.
    * `0.0` = Deterministic (always picks the most likely next token).
    * `0.8` = Creative (more random and diverse).
    * `1.2` = Chaotic (can be incoherent).
* **`top_k`**: Limits the choices to the $K$ most likely next tokens. This prevents the model from picking highly improbable "junk" tokens that can derail the sentence.
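To make `temperature` and `top_k` concrete, here is a minimal pure-Python sketch of how one next token can be chosen from a list of logits. It assumes `generate.py` follows the usual recipe (keep the top-k logits, divide by the temperature, apply softmax, sample, with `0.0` meaning greedy); `sample_next_token` is a hypothetical name, not the project's code. Dividing by a temperature below 1 widens the gaps between logits, so probability concentrates on the top token; a temperature above 1 flattens the distribution.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=None, rng=random):
    """Return the index of the chosen next token.

    logits: list of floats, one score per vocabulary entry.
    temperature: 0.0 means greedy (argmax); > 0 scales logits before softmax.
    top_k: if set (>= 1), only the top_k highest-scoring tokens can be chosen.
    """
    indices = list(range(len(logits)))
    if top_k is not None:
        # Keep only the k highest logits; all other tokens are excluded.
        indices = sorted(indices, key=lambda i: logits[i], reverse=True)[:top_k]
    if temperature == 0.0:
        # Greedy decoding: always the single most likely token.
        return max(indices, key=lambda i: logits[i])
    scaled = [logits[i] / temperature for i in indices]
    # Numerically stable softmax over the surviving tokens.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices(indices, weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]
print(sample_next_token(logits, temperature=0.0))  # 0 (greedy)
rng = random.Random(0)
print(sample_next_token(logits, temperature=0.7, top_k=2, rng=rng))  # 0 or 1
```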

In [6]:
# Setup Tokenizer
tokenizer = tiktoken.get_encoding("gpt2")

# --- USER SETTINGS ---
PROMPT = "The history of artificial intelligence"
LENGTH = 100
TEMP   = 0.7  # Try 0.5 to 1.0 for best results
TOP_K  = 50   # Sample only from the 50 most likely tokens (filters out improbable ones)

# Run Generation
print(f"Generating for prompt: '{PROMPT}'\n" + "-"*50)
output = generate_and_print(
    model, 
    tokenizer, 
    device, 
    start_context=PROMPT, 
    max_new_tokens=LENGTH, 
    temperature=TEMP,
    top_k=TOP_K
)


[Gen]: The history of artificial intelligence is not without its challenges. It is also a history of ignorance. We must learn from the mistakes of others and work to overcome them.  It is a history of ignorance that we have not learned from. We have learned to accept the limitations of our own thinking, to seek outside the box, to have patience and to seek out alternatives.  We have learned to accept that there may be some things that we cannot control, that we may not completely control, that we may not fully
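`generate_and_print` is defined in `generate.py`, which is not shown here. The sketch below is a generic version of the autoregressive loop that GPT-style generation relies on, not that file's actual code: each step feeds the most recent token IDs back in, picks one new token, and appends it. The model is abstracted as a hypothetical `next_token_fn`; in practice that step would run `model` on the window and sample with `temperature` and `top_k`. Cropping the window to `context_length` is why that setting must match the trained model: the custom config uses 256 positions, the official checkpoints 1024.

```python
from typing import Callable, List


def generate_tokens(
    next_token_fn: Callable[[List[int]], int],
    prompt_ids: List[int],
    max_new_tokens: int,
    context_length: int,
) -> List[int]:
    """Autoregressively extend prompt_ids by max_new_tokens token IDs."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        # The model only has context_length positions, so it sees at most
        # the last context_length tokens.
        window = ids[-context_length:]
        ids.append(next_token_fn(window))
    return ids


# Toy stand-in for the model: records the window size and returns last ID + 1.
seen_lengths = []


def toy_model(window: List[int]) -> int:
    seen_lengths.append(len(window))
    return window[-1] + 1


out = generate_tokens(toy_model, [10, 11, 12], max_new_tokens=4, context_length=4)
print(out)           # [10, 11, 12, 13, 14, 15, 16]
print(seen_lengths)  # [3, 4, 4, 4]
```

With a real model, the prompt would first be turned into IDs with `tokenizer.encode(...)` and the result turned back into text with `tokenizer.decode(...)`.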

