# Tiny LLM Story Generator ‚Äî Training Notebook

**Purpose:** This notebook trains a compact GPT-2 style language model to generate short children‚Äôs stories using the **TinyStories** dataset. It covers data loading, tokenization, model configuration, custom training, checkpointing, and sampling from saved checkpoints.

## What this notebook does
1. **Setup (Colab + Dependencies):** Mount Google Drive for persistent storage and import core libraries (`transformers`, `datasets`, `torch`, etc.).  
2. **Data:** Load `roneneldan/TinyStories` via Hugging Face Datasets and perform lightweight preprocessing/tokenization suitable for small-context language modeling.  
3. **Model:** Initialize a small GPT-2 configuration (tokenizer + `GPT2LMHeadModel`) tailored for fast prototyping on limited resources.  
4. **Training Loop:** Train with `AdamW`, gradient clipping, and mini-batches using `DataLoader`/`IterableDataset`; track loss and save periodic checkpoints.  
5. **Logging & Plots:** Record training history (e.g., loss) and visualize progression to validate convergence.  
6. **Checkpointing:** Persist tokenizer/model to Drive for later reuse and reproducibility.  
7. **Inference:** Load a chosen checkpoint and generate stories to qualitatively evaluate results.

## Why TinyStories?
TinyStories is a curated corpus of short, simple narratives designed for training and evaluating small language models. It enables rapid experiments while demonstrating end-to-end LM training and text generation.

## Requirements
- Python 3.x, PyTorch, Transformers, Datasets, TQDM, Matplotlib  
- Sufficient GPU (e.g., Colab T4/A100) recommended

## Reproducibility & Tips
- Fix random seeds for consistent runs.  
- Start with a small context length and batch size; scale up gradually.  
- Monitor loss curves; stop early if overfitting.  
- Keep checkpoints versioned (e.g., `tinygpt2_epochN`).

> **Reference Dataset:** `roneneldan/TinyStories` (Hugging Face Datasets).  
> **Author:** Ashish (Data Science Mentor) ‚Äî YYYY-MM-DD.


### 1. Google Drive Mount

Mounts Google Drive in Colab to access and save files directly from your Drive.


In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### 2. Library Installation and Data Loading

- Installs the **`datasets`** library.  
- Suppresses warning messages for cleaner output.  
- Imports essential libraries for data handling, tokenization, visualization, and model building.  
- Loads the **TinyStories** dataset in streaming mode for training.  


In [2]:
# !pip install datasets

import warnings
warnings.filterwarnings("ignore")

import re
import torch
import random
from tqdm.auto import tqdm
from datasets import load_dataset
from transformers import GPT2Tokenizer

# Reproducibility (important for assignment)
SEED = 42
random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)

# Load TinyStories in streaming mode (memory-safe)
dataset = load_dataset(
    "roneneldan/TinyStories",
    split="train",
    streaming=True
)


README.md: 0.00B [00:00, ?B/s]

### 3. TinyStoriesStreamDataset Class

- Creates a **streaming PyTorch dataset** for TinyStories text.  
- Steps performed for each story:
  1. **Skip short samples:** Stories shorter than `min_length` are ignored.  
  2. **Clean text:**  
     - Removes extra spaces and unwanted characters.  
     - Replaces fancy quotes with standard quotes.  
  3. **Tokenize:** Converts text into token IDs using a GPT-2 tokenizer.  
  4. **Prepare training inputs:**  
     - `input_ids`: All tokens except the last one.  
     - `labels`: All tokens except the first one (for next-token prediction).  
     - `attention_mask`: Marks which tokens are real vs. padding.  



#### Example
    **Input text:**  
    `"  ‚ÄúThe dog runs!‚Äù said Tom.  "`  

    **After cleaning:**  
    `"The dog runs!" said Tom.`  

    **Tokenization output (IDs):**  
    `[50256, 464, 3290, 1101, 0, 616, 640, 13]`  

    **Prepared for training:**  
    | input_ids                | labels                    |
    |--------------------------|---------------------------|
    | [50256, 464, 3290, 1101] | [464, 3290, 1101, 0]      |

    This way, the model learns to predict the **next token** at each position.  

In [3]:
from torch.utils.data import IterableDataset
import re

class TinyStoriesStreamDataset(IterableDataset):
    def __init__(self, dataset_stream, tokenizer, block_size=512, min_length=30):
        self.dataset = dataset_stream
        self.tokenizer = tokenizer
        self.block_size = block_size
        self.min_length = min_length

    def __iter__(self):
        for sample in self.dataset:
            text = sample["text"].strip()

            if len(text) < self.min_length:
                continue

            # Clean text
            text = re.sub(r'\s+', ' ', text)
            text = re.sub(r'[‚Äú‚Äù]', '"', text)
            text = re.sub(r"[‚Äò‚Äô]", "'", text)
            text = re.sub(r'[^a-zA-Z0-9.,!?\'"\s]', '', text)

            # Tokenize WITHOUT padding (much safer)
            tokenized = self.tokenizer(
                text,
                truncation=True,
                max_length=self.block_size,
                add_special_tokens=True
            )

            input_ids = tokenized["input_ids"]

            # Skip too-short sequences
            if len(input_ids) < 2:
                continue

            yield {
                "input_ids": torch.tensor(input_ids[:-1], dtype=torch.long),
                "labels": torch.tensor(input_ids[1:], dtype=torch.long),
                "attention_mask": torch.ones(len(input_ids) - 1, dtype=torch.long)
            }


### 4. Load Tokenizer, DataLoader, Model, and Optimizer Setup

1. **Training size & batching**
   - Define total samples and `batch_size`; compute `max_batches_per_epoch` for progress tracking.

2. **Tokenizer**
   - Load GPT-2 tokenizer and set the **pad token** to EOS for consistent padding.

3. **Streaming dataset ‚Üí DataLoader**
   - Wrap `TinyStoriesStreamDataset` with a `DataLoader` to yield mini-batches for training.

4. **Model configuration**
   - Build a **small GPT-2**:
     - `vocab_size = len(tokenizer)`
     - Context length: `n_positions = n_ctx = 512`
     - Model width: `n_embd = 256`
     - Depth/heads: `n_layer = 4`, `n_head = 4`
     - Use tokenizer‚Äôs `pad_token_id`

5. **Device placement**
   - Move model to **GPU** if available; enable **DataParallel** when multiple GPUs exist.

6. **Optimizer**
   - Initialize **AdamW** with learning rate `5e-5` for stable transformer training.

In [4]:
from transformers import GPT2Tokenizer
from torch.utils.data import DataLoader
from torch.optim import AdamW
from transformers import GPT2Config, GPT2LMHeadModel
import torch

# ---------------- SAFE TRAINING CONFIG ----------------
batch_size = 8                 # CRITICAL: small batch for Colab
max_batches_per_epoch = 2000   # LIMIT batches to avoid RAM crash

# ---------------- TOKENIZER ----------------
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# ---------------- DATASET ----------------
stream_dataset = TinyStoriesStreamDataset(dataset, tokenizer)

# Custom collate function (pads dynamically per batch)
def collate_fn(batch):
    input_ids = [item["input_ids"] for item in batch]
    labels = [item["labels"] for item in batch]
    attention_masks = [item["attention_mask"] for item in batch]

    input_ids = torch.nn.utils.rnn.pad_sequence(
        input_ids, batch_first=True, padding_value=tokenizer.pad_token_id
    )
    labels = torch.nn.utils.rnn.pad_sequence(
        labels, batch_first=True, padding_value=-100  # ignore loss on padding
    )
    attention_masks = torch.nn.utils.rnn.pad_sequence(
        attention_masks, batch_first=True, padding_value=0
    )

    return {
        "input_ids": input_ids,
        "labels": labels,
        "attention_mask": attention_masks
    }

train_loader = DataLoader(
    stream_dataset,
    batch_size=batch_size,
    collate_fn=collate_fn
)

# ---------------- MODEL CONFIG ----------------
config = GPT2Config(
    vocab_size=len(tokenizer),
    n_positions=512,
    n_ctx=512,
    n_embd=256,
    n_layer=4,
    n_head=4,
    pad_token_id=tokenizer.pad_token_id
)

model = GPT2LMHeadModel(config)

# ---------------- DEVICE ----------------
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# ‚ùå DO NOT USE DataParallel IN COLAB (saves RAM)
# if torch.cuda.device_count() > 1:
#     model = torch.nn.DataParallel(model)

# ---------------- OPTIMIZER ----------------
optimizer = AdamW(model.parameters(), lr=5e-5)


tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

### 5. Training Loop, Checkpointing, and Sampling

1. **Setup**
   - Define a checkpoint folder on Google Drive.
   - Set number of epochs and initialize a loss history list.
   - Switch model to training mode.

2. **Epoch training**
   - For each epoch:
     - Iterate over mini-batches up to `max_batches_per_epoch`.
     - Move tensors to the selected device (CPU/GPU).
     - Compute loss with labels for next-token prediction.
     - Zero gradients ‚Üí backpropagate ‚Üí clip gradients (max norm = 1.0) ‚Üí optimizer step.
     - Accumulate batch losses.

3. **Track progress**
   - Compute and log **average loss** per epoch.
   - Append the epoch‚Äôs average loss to `history`.

4. **Checkpointing**
   - Create an epoch-specific folder (e.g., `tinygpt2_epochN`).
   - Save both the **model** and **tokenizer** to Drive after every epoch.

5. **Qualitative check (sampling)**
   - Temporarily switch to eval mode.
   - Generate a short continuation from the prompt *‚ÄúOnce upon a time‚Äù*.
   - Print the generated text to inspect model quality, then return to train mode.

6. **Persist training history**
   - Save the list of epoch losses to `training_history.json` on Drive for later plotting or review.


In [5]:
from pathlib import Path
import json
from tqdm.auto import tqdm
from torch.nn.utils import clip_grad_norm_
import torch
from torch.cuda.amp import autocast, GradScaler

def unwrap_model(m):
    return m.module if hasattr(m, "module") else m

# ---------------- FAST CONFIG ----------------
epochs = 3                          # BIG speedup
gradient_accumulation_steps = 4
max_batches_per_epoch = 500         # BIG speedup
context_limit = 256                 # reduce attention cost

checkpoint_dir = Path("/content/drive/MyDrive/TinyLLM/model/")
checkpoint_dir.mkdir(parents=True, exist_ok=True)

history = []
scaler = GradScaler()

# ---------------- FREEZE LOWER LAYERS ----------------
base_model = unwrap_model(model)

for param in base_model.transformer.wte.parameters():
    param.requires_grad = False
for param in base_model.transformer.wpe.parameters():
    param.requires_grad = False

# Freeze first 2 transformer blocks
for block in base_model.transformer.h[:2]:
    for param in block.parameters():
        param.requires_grad = False

print("‚úÖ Frozen embeddings + first 2 transformer layers")

model.train()
optimizer.zero_grad()

# ---------------- TRAINING ----------------
for epoch in range(epochs):
    print(f"\nEpoch {epoch + 1}/{epochs}")
    epoch_loss = 0.0

    progress = tqdm(enumerate(train_loader), total=max_batches_per_epoch)

    for step, batch in progress:
        if step >= max_batches_per_epoch:
            break

        # üî• truncate dynamically to 256 tokens
        input_ids = batch["input_ids"][:, :context_limit].to(device)
        labels = batch["labels"][:, :context_limit].to(device)
        attention_mask = batch["attention_mask"][:, :context_limit].to(device)

        with autocast():
            outputs = model(
                input_ids=input_ids,
                labels=labels,
                attention_mask=attention_mask
            )
            loss = outputs.loss / gradient_accumulation_steps

        scaler.scale(loss).backward()

        if (step + 1) % gradient_accumulation_steps == 0:
            scaler.unscale_(optimizer)
            clip_grad_norm_(model.parameters(), 1.0)
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad()

        epoch_loss += loss.item() * gradient_accumulation_steps
        progress.set_postfix(loss=loss.item() * gradient_accumulation_steps)

    avg_loss = epoch_loss / max_batches_per_epoch
    history.append(avg_loss)
    print(f"Average Loss: {avg_loss:.4f}")

    # ---------------- SAVE CHECKPOINT ----------------
    epoch_checkpoint = checkpoint_dir / f"tinygpt2_fast_epoch{epoch+1}"
    epoch_checkpoint.mkdir(parents=True, exist_ok=True)

    unwrap_model(model).save_pretrained(epoch_checkpoint)
    tokenizer.save_pretrained(epoch_checkpoint)

    torch.cuda.empty_cache()

# ---------------- FINAL SAMPLE ----------------
model.eval()
with torch.no_grad():
    sample_input = tokenizer.encode(
        "Once upon a time",
        return_tensors="pt"
    ).to(device)

    generated_ids = unwrap_model(model).generate(
        sample_input,
        max_length=50,
        temperature=0.8,
        top_k=50,
        top_p=0.95,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id
    )

    print("\nFinal Sample Output:")
    print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))

# ---------------- SAVE HISTORY ----------------
history_path = Path("/content/drive/MyDrive/TinyLLM/training_history.json")
with open(history_path, "w") as f:
    json.dump(history, f)

print(f"\nTraining history saved to {history_path}")


‚úÖ Frozen embeddings + first 2 transformer layers

Epoch 1/3


  0%|          | 0/500 [00:00<?, ?it/s]

`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.


Average Loss: 10.2639

Epoch 2/3


  0%|          | 0/500 [00:00<?, ?it/s]

Average Loss: 10.1129

Epoch 3/3


  0%|          | 0/500 [00:00<?, ?it/s]

Average Loss: 10.0562


The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.



Final Sample Output:
Once upon a time there a girl little named. was to. was to. was to. was to. was to. was to. was to the. was to. was. was to the. was and. was to. was. was

Training history saved to /content/drive/MyDrive/TinyLLM/training_history.json


### 6. Resume Training from Checkpoint

1. **Load checkpoint**
   - Restore the model and tokenizer from `tinygpt2_epoch6`.

2. **Configure training**
   - Recreate optimizer, device placement (GPU if available), and batching parameters.

3. **Continue epochs**
   - Train from epoch 7 onward (up to the target `epochs`), repeating the standard loop:
     - Forward pass ‚Üí loss
     - Zero grads ‚Üí backward pass
     - Gradient clipping (max norm = 1.0)
     - Optimizer step

4. **Checkpoint each epoch**
   - Save model and tokenizer to `tinygpt2_epoch{N}` after every epoch.

5. **Quick qualitative check**
   - Switch to eval, generate a short continuation from ‚ÄúOnce upon a time‚Äù, print sample, then return to train mode.


In [14]:
from pathlib import Path
from tqdm.auto import tqdm
from torch.nn.utils import clip_grad_norm_
import torch
from torch.optim import AdamW
from torch.cuda.amp import autocast, GradScaler
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# ---------------- LOAD CHECKPOINT ----------------
# Point to an existing checkpoint, e.g., from the previous run's final epoch
checkpoint_path = Path("/content/drive/MyDrive/TinyLLM/model/tinygpt2_fast_epoch3")

model = GPT2LMHeadModel.from_pretrained(
    str(checkpoint_path),
    local_files_only=True
)

tokenizer = GPT2Tokenizer.from_pretrained(
    str(checkpoint_path),
    local_files_only=True
)

tokenizer.pad_token = tokenizer.eos_token

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# ---------------- OPTIMIZER ----------------
optimizer = AdamW(model.parameters(), lr=5e-5)
scaler = GradScaler()

# ---------------- RESUME CONFIG ----------------
start_epoch = 3                     # Start from the epoch AFTER the loaded checkpoint (i.e., epoch 4)
epochs = 7                          # Total number of epochs to train for
gradient_accumulation_steps = 4
max_batches_per_epoch = 500
context_limit = 256

checkpoint_dir = Path("/content/drive/MyDrive/TinyLLM/model/")
checkpoint_dir.mkdir(parents=True, exist_ok=True)

# ---------------- FREEZE LOWER LAYERS ----------------
for param in model.transformer.wte.parameters():
    param.requires_grad = False

for param in model.transformer.wpe.parameters():
    param.requires_grad = False

for block in model.transformer.h[:2]:
    for param in block.parameters():
        param.requires_grad = False

print("‚úÖ Resuming training with frozen embeddings + first 2 transformer layers")

model.train()
optimizer.zero_grad()

# ---------------- TRAINING LOOP ----------------
for epoch in range(start_epoch, epochs):
    print(f"\nEpoch {epoch + 1}/{epochs}")
    epoch_loss = 0.0

    progress = tqdm(enumerate(train_loader), total=max_batches_per_epoch)

    for step, batch in progress:
        if step >= max_batches_per_epoch:
            break

        input_ids = batch["input_ids"][:, :context_limit].to(device)
        labels = batch["labels"][:, :context_limit].to(device)
        attention_mask = batch["attention_mask"][:, :context_limit].to(device)

        with autocast():
            outputs = model(
                input_ids=input_ids,
                labels=labels,
                attention_mask=attention_mask
            )
            loss = outputs.loss / gradient_accumulation_steps

        scaler.scale(loss).backward()

        if (step + 1) % gradient_accumulation_steps == 0:
            scaler.unscale_(optimizer)
            clip_grad_norm_(model.parameters(), 1.0)
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad()

        epoch_loss += loss.item() * gradient_accumulation_steps
        progress.set_postfix(loss=loss.item() * gradient_accumulation_steps)

    # ‚úÖ THIS WAS MISSING / BROKEN BEFORE
    avg_loss = epoch_loss / max_batches_per_epoch
    print(f"Average Loss: {avg_loss:.4f}")

    # ---------------- SAVE CHECKPOINT ----------------
    # Save checkpoint with a name reflecting the current epoch
    epoch_checkpoint = checkpoint_dir / f"tinygpt2_fast_resume_epoch{epoch+1}"
    epoch_checkpoint.mkdir(parents=True, exist_ok=True)

    model.save_pretrained(epoch_checkpoint)
    tokenizer.save_pretrained(epoch_checkpoint)

    torch.cuda.empty_cache()

# ---------------- QUICK SAMPLE ----------------
model.eval()
with torch.no_grad():
    sample_input = tokenizer.encode(
        "Once upon a time",
        return_tensors="pt"
    ).to(device)

    generated_ids = model.generate(
        sample_input,
        max_length=50,
        temperature=0.8,
        top_k=50,
        top_p=0.95,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id
    )

    print("\nFinal Sample Output:")
    print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))


‚úÖ Resuming training with frozen embeddings + first 2 transformer layers

Epoch 4/7


  0%|          | 0/500 [00:00<?, ?it/s]

Average Loss: 10.0210

Epoch 5/7


  0%|          | 0/500 [00:00<?, ?it/s]

Average Loss: 9.9919

Epoch 6/7


  0%|          | 0/500 [00:00<?, ?it/s]

Average Loss: 9.9646

Epoch 7/7


  0%|          | 0/500 [00:00<?, ?it/s]

Average Loss: 9.9385

Final Sample Output:
Once upon a time there a girl Lily She. was, was and to. day the and to the. day the. day the. was and to the. day the. was and the. was and the. was and the. was and


### 7. Generate Text from a Saved GPT-2 Checkpoint

1. **Load model and tokenizer**
   - Load tokenizer and model from a custom-trained checkpoint (`epoch_5`).

2. **Define generation function**
   - Encodes input text with attention masks.
   - Uses `model.generate` to produce a continuation up to `max_len`.

3. **Run examples**
   - Generate short story snippets for several starting prompts (e.g., "Once there was little boy", "Once there was a cute little").

- **Related Work:** A Kaggle-hosted version of this project is available here: [TinyStoryLLM by Ashish Jangra](https://www.kaggle.com/models/ashishjangra27/tinystoryllm)

In [16]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch
from pathlib import Path

# ---------------- LOAD SAVED CHECKPOINT ----------------
# Update this path to an existing checkpoint, e.g., the last one saved
model_directory = Path("/content/drive/MyDrive/TinyLLM/model/tinygpt2_fast_resume_epoch7")

tokenizer = GPT2Tokenizer.from_pretrained(model_directory)
model = GPT2LMHeadModel.from_pretrained(model_directory)

tokenizer.pad_token = tokenizer.eos_token

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# ---------------- GENERATION FUNCTION ----------------
def generate_text(prompt, max_len=40):
    with torch.inference_mode():  # faster than no_grad
        input_ids = tokenizer.encode(
            prompt,
            return_tensors="pt"
        ).to(device)

        output_ids = model.generate(
            input_ids=input_ids,
            max_length=max_len,
            do_sample=True,
            temperature=0.8,
            top_k=50,
            top_p=0.95,
            use_cache=True,                 # SPEEDUP
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id
        )

        return tokenizer.decode(
            output_ids[0],
            skip_special_tokens=True
        )

# ---------------- TEST PROMPTS ----------------
prompts = [
    "Once there was a little boy",
    "Once there was a little girl",
    "Once there was a cute",
    "Once there was a cute little",
    "Once there was a handsome"
]

for p in prompts:
    print(generate_text(p))


Once there was a little boylease born. was385 to paddle toItalian and.asu Signed wiped to Gl yogaparse. Confederation overtly theurusurus shines clean1971 He memories,Unique to. Mandal
Once there was a little girl functionality Lily Asians. was to introducingBrazil dividend inflation Accept She to she a. dayieties it dances compelling allegations strikeouts. day 98 AppearsŸé andspecies dances dates the influenced
Once there was a cute She 232. Auburn the to adversecube the. Auburn hem.opolisEsc and1971 the.ieg dates the. DD IC Cum Reborn arms rave Shaun the. ministerial Desert Reborn
Once there was a cute little named. hot the and. Somers to hem Investors the disagrees sclerosis. wasster the born Reborn V. dayastical BEFORE nostalgic Punchumerous and V bonesEva hem.authored
Once there was a handsome buyerSNEsc. loved proxies outsiders seeming andsoft rave sloweructor andituteparse the. day the. day analysis the microbiome the Federiaz a Mafia Greeks to.mus the


### 8. Inference with Pretrained TinyStories Model

1. **Load pretrained models**
   - `AutoModelForCausalLM`: Loads the `roneneldan/TinyStories-3M` causal language model.  
   - `AutoTokenizer`: Uses `EleutherAI/gpt-neo-125M` tokenizer for text processing.

2. **Prepare input**
   - Encode a simple prompt: `"Once upon a time there was"`.

3. **Generate text**
   - Use `model.generate` with `max_length=1000` to produce a story continuation.

4. **Decode output**
   - Convert token IDs back to readable text and print the generated story.


In [17]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# ---------------- LOAD PRETRAINED MODEL ----------------
model_name = "roneneldan/TinyStories-3M"
tokenizer_name = "EleutherAI/gpt-neo-125M"

model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)

tokenizer.pad_token = tokenizer.eos_token

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# ---------------- GENERATION FUNCTION ----------------
def generate_text(prompts, max_len=150):
    """
    prompts: str or list[str]
    """
    if isinstance(prompts, str):
        prompts = [prompts]

    with torch.inference_mode():
        inputs = tokenizer(
            prompts,
            return_tensors="pt",
            padding=True
        ).to(device)

        output_ids = model.generate(
            **inputs,
            max_length=max_len,
            do_sample=True,
            temperature=0.8,
            top_k=50,
            top_p=0.95,
            use_cache=True,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id
        )

    return [
        tokenizer.decode(ids, skip_special_tokens=True)
        for ids in output_ids
    ]

# ---------------- TEST PROMPTS ----------------
prompts = [
    "Once there was little boy",
    "Once there was little girl",
    "Once there was a cute",
    "Once there was a cute little",
    "Once there was a handsome"
]

outputs = generate_text(prompts)

for out in outputs:
    print(out)
    print()


config.json: 0.00B [00:00, ?B/s]

pytorch_model.bin:   0%|          | 0.00/66.7M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/727 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/66.7M [00:00<?, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/357 [00:00<?, ?B/s]

A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Once there was little boy


John was a curious boy who liked to wander around his neighbourhood. He was always looking for adventure and learning more.

One day, he saw an old man in his house. He was wearing a blue hat and a yellow hat. He looked sad and asked him why he was so sad.

The old man said, "I am sad because I miss my house. I miss my family."

John was very sad, but he knew he had to give the old man something to make him feel better. He was also very excited about the old man's kindness.

John smiled and ran back to his house to give the old man a big hug. From then on, he

Once there was little girl


One day, she was playing with her doll when her mom came in. "What are you doing, dear?" she asked. 

"I'm going to give you a treat," replied Mom. 

The little girl was very excited. She ran to the kitchen to get some ice cream. She quickly put the spoon on the floor and waited for the ice cream to be ready. 

The little girl was very careful as she watched her mom in the 

### Assignment: Code-Focused Inference

Your task is to load a pre-trained GPT-2 model and configure it to answer *only* questions related to Python coding.

1. **Load Model and Tokenizer:** Load a suitable pre-trained GPT-2 model and its corresponding tokenizer. You can use `transformers.AutoModelForCausalLM` and `transformers.AutoTokenizer`. A smaller model like `gpt2` or `gpt2-medium` might be sufficient.
2. **Implement a Filtering Mechanism:** Use prompt techniques
3. **Generate Response:** If the prompt is deemed a Python coding question, generate a response using the loaded GPT-2 model.
4. **Handle Non-Coding Questions:** If the prompt is not related to Python coding, return a predefined message indicating that the model can only answer coding questions.
5. **Test:** Test your implementation with various prompts, including both Python coding questions and non-coding questions, to ensure the filtering mechanism works correctly.

#Assignment

In [20]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# ---------------- LOAD GPT-2 MODEL ----------------
model_name = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

tokenizer.pad_token = tokenizer.eos_token

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# ---------------- PYTHON QUESTION FILTER ----------------
def is_python_question(prompt):
    python_keywords = [
        "python", "list", "dictionary", "tuple", "set",
        "function", "def", "class", "loop", "for", "while",
        "if", "else", "elif", "import", "exception",
        "pandas", "numpy"
    ]
    prompt = prompt.lower()
    return any(word in prompt for word in python_keywords)

# ---------------- INTERACTIVE LOOP ----------------
print(" Python Coding Assistant (type 'exit' to stop)\n")

while True:
    user_prompt = input("Enter your question: ")

    if user_prompt.lower() == "exit":
        print("üëã Exiting the program.")
        break

    if not is_python_question(user_prompt):
        print("This model only answers Python programming questions.\n")
        continue

    with torch.no_grad():
        inputs = tokenizer(user_prompt, return_tensors="pt").to(device)

        output = model.generate(
            input_ids=inputs["input_ids"],
            max_length=150,
            do_sample=True,
            temperature=0.7,
            top_k=50,
            top_p=0.9,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id
        )

    print("\nAnswer:")
    print(tokenizer.decode(output[0], skip_special_tokens=True))
    print()


 Python Coding Assistant (type 'exit' to stop)

Enter your question: what is todays weather 
This model only answers Python programming questions.

Enter your question: what are loops in python

Answer:
what are loops in python?

Well, one way to look at it is that you can do loop creation using a different Python function, and then you can create a loop using that function. The first thing you need to do is make sure that your loop is created in a way that it will never be executed. This is because Python functions will execute whenever you run them. The second thing to look for is that when you run a loop, the code inside of the loop is executed. If you run a loop inside a function, that code will be executed. So, the code inside of the loop is executed if the loop is called by that function.

The loop creates a new loop that will never be executed, and that

Enter your question: exit
üëã Exiting the program.
