# Assignment 3: Fine-tune GPT-2 for Creative Story Generation (Google Colab)
This notebook fine-tunes GPT-2 Medium on a story dataset to generate creative stories.

**Instructions:**
1. Go to Runtime → Change runtime type → Select GPU (T4 recommended)
2. Run all cells in order

In [1]:
# Install required packages for Google Colab
!pip install -q transformers datasets accelerate


from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer, Trainer, TrainingArguments, DataCollatorForLanguageModeling
from datasets import load_dataset

# Check GPU availability

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
if device == "cuda":
    print(f"GPU: {torch.cuda.get_device_name(0)}")

Using device: cuda
GPU: Tesla T4


## Step 1: Load GPT-2 Medium Model and Tokenizer

In [3]:
# Load GPT-2 Medium (better quality)
model_name = "gpt2-medium"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

# Set padding token
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.eos_token_id

print(f"Model loaded: {model_name}")
print(f"Parameters: {model.num_parameters():,}")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/718 [00:00<?, ?B/s]



model.safetensors:   0%|          | 0.00/1.52G [00:00<?, ?B/s]

Loading weights:   0%|          | 0/292 [00:00<?, ?it/s]

GPT2LMHeadModel LOAD REPORT from: gpt2-medium
Key                  | Status     |  | 
---------------------+------------+--+-
h.{0...23}.attn.bias | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.


generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

Model loaded: gpt2-medium
Parameters: 354,823,168


## Step 2: Load and Prepare Story Dataset

In [4]:
# Load TinyStories dataset - specifically designed for story generation
dataset = load_dataset("roneneldan/TinyStories", split="train[:5000]")
print(f"Dataset size: {len(dataset)} stories")
print(f"\nSample story:\n{dataset[0]['text'][:500]}...")

README.md: 0.00B [00:00, ?B/s]

data/train-00000-of-00004-2d5a1467fff108(…):   0%|          | 0.00/249M [00:00<?, ?B/s]

data/train-00001-of-00004-5852b56a2bd28f(…):   0%|          | 0.00/248M [00:00<?, ?B/s]

data/train-00002-of-00004-a26307300439e9(…):   0%|          | 0.00/246M [00:00<?, ?B/s]

data/train-00003-of-00004-d243063613e5a0(…):   0%|          | 0.00/248M [00:00<?, ?B/s]

data/validation-00000-of-00001-869c898b5(…):   0%|          | 0.00/9.99M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/2119719 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/21990 [00:00<?, ? examples/s]

Dataset size: 5000 stories

Sample story:
One day, a little girl named Lily found a needle in her room. She knew it was difficult to play with it because it was sharp. Lily wanted to share the needle with her mom, so she could sew a button on her shirt.

Lily went to her mom and said, "Mom, I found this needle. Can you share it with me and sew my shirt?" Her mom smiled and said, "Yes, Lily, we can share the needle and fix your shirt."

Together, they shared the needle and sewed the button on Lily's shirt. It was not difficult for them b...


In [5]:
# Tokenize the dataset
def tokenize_function(examples):
    return tokenizer(
        examples["text"],
        truncation=True,
        max_length=256,
        padding="max_length"
    )

tokenized_dataset = dataset.map(tokenize_function, batched=True, remove_columns=["text"])
print(f"Tokenization complete!")

Map:   0%|          | 0/5000 [00:00<?, ? examples/s]

Tokenization complete!


## Step 3: Fine-tune the Model

In [7]:
# Data collator for language modeling
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# Training arguments (optimized for Colab)
training_args = TrainingArguments(
    output_dir="/content/story_gpt2_model",

    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=5e-5,
    warmup_steps=100,
    logging_steps=50,
    save_steps=500,
    save_total_limit=2,
    fp16=torch.cuda.is_available(),  # Use mixed precision if GPU available
    report_to="none"
)

# Create trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator
)

print("Starting fine-tuning...")

Starting fine-tuning...


In [8]:
# Train the model
trainer.train()
print("\nFine-tuning complete!")

`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.


Step,Training Loss
50,2.058444
100,1.816321
150,1.78432
200,1.760829
250,1.720554
300,1.699491
350,1.601195
400,1.578656
450,1.575491
500,1.568694


Writing model shards:   0%|          | 0/1 [00:00<?, ?it/s]

Writing model shards:   0%|          | 0/1 [00:00<?, ?it/s]


Fine-tuning complete!


In [9]:
# Save the fine-tuned model
model.save_pretrained("/content/story_gpt2_model/final")
tokenizer.save_pretrained("/content/story_gpt2_model/final")
print("Model saved to /content/story_gpt2_model/final")

# Optional: Copy to Google Drive (uncomment if Drive is mounted)
# !cp -r /content/story_gpt2_model /content/drive/MyDrive/
# print("Model copied to Google Drive!")

Writing model shards:   0%|          | 0/1 [00:00<?, ?it/s]

Model saved to /content/story_gpt2_model/final


## Step 4: Generate Stories

In [10]:
def generate_story(prompt, max_length=300, temperature=0.8, top_p=0.92):
    """
    Generate a creative story from a prompt.

    Args:
        prompt: Starting text for the story (e.g., "Once upon a time")
        max_length: Maximum length of generated story
        temperature: Higher = more creative (0.7-1.0 recommended)
        top_p: Nucleus sampling parameter (0.9-0.95 recommended)
    """
    model.eval()

    # Encode prompt
    inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
    model.to(device)

    # Generate story
    with torch.no_grad():
        outputs = model.generate(
            inputs,
            max_length=max_length,
            temperature=temperature,
            top_p=top_p,
            do_sample=True,
            num_return_sequences=1,
            pad_token_id=tokenizer.eos_token_id,
            repetition_penalty=1.2,
            no_repeat_ngram_size=3
        )

    # Decode and return story
    story = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return story

print("Story generation function ready!")

Story generation function ready!


In [11]:
# Generate sample stories
prompts = [
    "Once upon a time, in a magical forest,",
    "The little girl found a mysterious box in her grandmother's attic.",
    "A brave knight set out on a journey to find the lost treasure."
]

print("=" * 60)
print("GENERATED STORIES")
print("=" * 60)

for i, prompt in enumerate(prompts, 1):
    print(f"\n--- Story {i} ---")
    print(f"Prompt: {prompt}\n")
    story = generate_story(prompt)
    print(story)
    print("\n" + "-" * 60)

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


GENERATED STORIES

--- Story 1 ---
Prompt: Once upon a time, in a magical forest,

Once upon a time, in a magical forest, there lived a little mouse named Tom. One day he heard something strange coming from the woods below him - it was loud and scary!

Tom's face grew very pale as he realized what had happened. He quickly ran to his mother and said "Mommy!"  They raced out of their house and hid under some trees. As they were hiding behind a tree trunk on top one of them got scared too. The other two did not want to scare eachother so they quietly went inside.



I love seeing compassionate adults like these around every corner! Whenever I'm stuck somewhere alone without anyone else to talk privately please remember: When giving advice ask yourself why isnâ€

------------------------------------------------------------

--- Story 2 ---
Prompt: The little girl found a mysterious box in her grandmother's attic.

The little girl found a mysterious box in her grandmother's attic. Inside th

## Step 5: Interactive Story Generator

In [None]:
# Interactive story generation
print("\n" + "=" * 60)
print("INTERACTIVE STORY GENERATOR")
print("=" * 60)
print("Enter a story prompt to generate a creative story.")
print("Type 'quit' to exit.\n")

while True:
    user_prompt = input("Enter your story prompt: ").strip()

    if user_prompt.lower() == 'quit':
        print("Goodbye!")
        break

    if not user_prompt:
        print("Please enter a prompt!\n")
        continue

    print("\nGenerating story...\n")
    story = generate_story(user_prompt, max_length=350)
    print("Generated Story:")
    print("-" * 40)
    print(story)
    print("-" * 40 + "\n")


INTERACTIVE STORY GENERATOR
Enter a story prompt to generate a creative story.
Type 'quit' to exit.

Enter your story prompt: The tiger of the country 

Generating story...

Generated Story:
----------------------------------------
The tiger of the country was very gentle. He loved to play with his friends and eat snacks from their bowls in the park every day, even when he felt lonely.

One evening while they were playing, the tigers saw a big group coming towards them. They all started running away at once and one by three or four disappeared into thin air.
 
Tiger's eyes lit up as he knew that it was only now that everyone would come back together again. The little girl looked sadly out of the corner her eye and ran after each of those disappearances. When she finally caught up there, no matter how hard she tried not too many people had returned.  She gave chase until eventually none remained. All except for this special creature - the graceful elephant! And so Tiger became more det