# HumorLLM: A Transformer-Based Language Model for Humor Generation

This project implements a custom transformer architecture called **Seagull** for generating humorous captions and text. The model uses modern techniques including RoPE positional encoding, RMS normalization, and SwiGLU activation functions.
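RMSNorm and SwiGLU are standard published components. The Seagull source for them is not shown here, so the sketch below is a framework-free illustration in plain Python lists and not the project's PyTorch modules. The names `rms_norm`, `silu` and `swiglu` are hypothetical. RMSNorm rescales a vector by the reciprocal of its root-mean-square, with no mean subtraction and no bias. SwiGLU gates one linear projection with the SiLU of another before projecting back down.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by sqrt(mean(x^2) + eps), then apply a learned per-feature scale."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

def silu(v):
    """SiLU / Swish activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def swiglu(x, w_gate, w_up, w_down):
    """SwiGLU FFN: W_down(SiLU(W_gate x) * (W_up x)); each weight matrix is a list of rows."""
    gate = [silu(sum(w * v for w, v in zip(row, x))) for row in w_gate]
    up = [sum(w * v for w, v in zip(row, x)) for row in w_up]
    hidden = [g * u for g, u in zip(gate, up)]  # element-wise gating
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_down]
```

After `rms_norm` with unit weights, the output's mean square is 1 (up to `eps`). Unlike LayerNorm, the output is not centered.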

## 🚀 Key Features

- **Custom Transformer Architecture**: Seagull transformer with 12 layers, 768 embedding dimensions
- **Modern Techniques**: RoPE, RMSNorm, SwiGLU FFN, Gradient Clipping
- **Optimized Training**: Mixed precision training, model compilation, cosine LR scheduling
- **Humor-Focused**: Trained specifically on caption data for humor generation
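RoPE (rotary positional encoding) rotates each consecutive pair of query/key features by an angle proportional to the token position. As a result, the attention dot product depends only on the *relative* offset between two positions. The sketch below assumes the interleaved-pair layout and the common base of 10000. Some implementations split the vector into halves instead, and Seagull's exact convention isn't shown here. `rope` and `dot` are hypothetical helpers.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate pairs (x0, x1), (x2, x3), ... of `vec` by pos * theta_j,
    where theta_j = base ** (-2j / d) for pair index j. len(vec) must be even."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):  # i = 2j, so base ** (-i / d) == base ** (-2j / d)
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]
# Same offset (2) at different absolute positions gives the same attention score
score_a = dot(rope(q, 5), rope(k, 3))
score_b = dot(rope(q, 12), rope(k, 10))
```

Because the rotations are orthogonal, vector norms are also preserved, so RoPE injects position without changing feature magnitudes.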

## 📊 Model Performance

**Best Configuration**: batch_size=16, epochs=2
- **Validation Loss**: 2.559
- **Perplexity**: 13.239
- **Parameters**: ~85M parameters
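Perplexity is usually defined as the exponential of the mean per-token cross-entropy (in nats). By that definition, a loss of 2.559 corresponds to a perplexity of about 12.92, not 13.239. One possible explanation is that the reported perplexity averaged per-batch perplexities; this is not confirmed by the source. By Jensen's inequality, that average is never smaller than exp of the mean loss. The helper names below are hypothetical, and the two per-batch losses are made-up values for illustration, not the project's logs.

```python
import math

def ppl_from_mean_loss(batch_losses):
    """Perplexity as exp of the mean cross-entropy (equal-sized batches assumed)."""
    return math.exp(sum(batch_losses) / len(batch_losses))

def mean_of_batch_ppl(batch_losses):
    """Average of per-batch perplexities; >= ppl_from_mean_loss by Jensen's inequality."""
    return sum(math.exp(l) for l in batch_losses) / len(batch_losses)

print(f"exp(2.559) = {math.exp(2.559):.2f}")  # prints "exp(2.559) = 12.92"

losses = [2.4, 2.7]  # illustrative values only
print(ppl_from_mean_loss(losses), mean_of_batch_ppl(losses))
```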


## 🛠️ Setup and Installation

In [None]:
# Install required packages
# !pip install -r requirements.txt

import torch
import torch.nn as nn
import json
import sys
import os
from pathlib import Path

# Add project root to path
project_root = Path.cwd()
sys.path.append(str(project_root))

print(f"🔧 PyTorch Version: {torch.__version__}")
print(f"🔧 CUDA Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"🔧 GPU: {torch.cuda.get_device_name()}")

## 🏗️ Model Architecture

In [None]:
# Load model configuration
with open('config/model_config.json', 'r') as f:
    config = json.load(f)

print("🏗️ Model Configuration:")
for section, params in config.items():
    print(f"\n{section.upper()}:")
    if isinstance(params, dict):
        for key, value in params.items():
            print(f"  {key}: {value}")
    else:
        print(f"  {params}")

In [None]:
# Import the Seagull model
from seagull.model.heads.seagull_lm import SeagullLM
from seagull.data_processing.bbpe import BBPE

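# Initialize tokenizer
tokenizer = BBPE()
# state_dict.json is a JSON file, so parse it with json.load (torch.load expects a pickle/zip archive)
with open('tokenizer/state_dict.json', 'r') as f:
    tokenizer.load_state_dict(json.load(f))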

print(f"📝 Tokenizer Vocabulary Size: {tokenizer.vocab_size}")
print(f"📝 Special Tokens: BOS={tokenizer.bos_token_id}, EOS={tokenizer.eos_token_id}, PAD={tokenizer.pad_token_id}")
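
`BBPE` presumably stands for byte-level byte-pair encoding. The project's implementation isn't shown, so the sketch below covers only the generic core of one BPE training step. Text starts as raw UTF-8 bytes (ids 0-255). The step counts adjacent pairs, then merges the most frequent pair into a new id (256 and up). `most_frequent_pair` and `merge_pair` are hypothetical names, not the `BBPE` API.

```python
from collections import Counter

def most_frequent_pair(seqs):
    """Count adjacent symbol pairs across all sequences; return the most frequent pair."""
    counts = Counter()
    for seq in seqs:
        counts.update(zip(seq, seq[1:]))
    return counts.most_common(1)[0][0] if counts else None

def merge_pair(seq, pair, new_id):
    """Replace non-overlapping occurrences of `pair` in `seq` with `new_id`, left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

# Start from raw bytes: "h" = 104, "a" = 97, "!" = 33
corpus = [list("hahaha".encode("utf-8")), list("ha!".encode("utf-8"))]
pair = most_frequent_pair(corpus)                        # (104, 97), i.e. "ha"
corpus = [merge_pair(seq, pair, 256) for seq in corpus]  # [[256, 256, 256], [256, 33]]
```

Repeating this step until the target vocabulary size is reached yields the merge table. Starting from bytes means any input string can be encoded without an unknown-token fallback.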

## 🎯 Load Trained Model

In [None]:
# Initialize model with config
model_config = config['model']
model_config['vocab_size'] = tokenizer.vocab_size

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"🔧 Using device: {device}")

# Create model
model = SeagullLM(**model_config)

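# Load trained weights
try:
    model_state = torch.load('models/final_model.pt', map_location=device)
    model.load_state_dict(model_state)
    print("✅ Model loaded successfully!")
except FileNotFoundError:
    print("❌ Model file not found. Please ensure models/final_model.pt exists.")
    print("   Continuing with randomly initialized weights (generations will be gibberish).")

# Move to the target device and switch to inference mode whether or not loading succeeded,
# so later cells don't fail with a CPU/GPU device mismatch
model.to(device)
model.eval()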

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f"🔢 Total Parameters: {total_params:,}")
print(f"🔢 Trainable Parameters: {trainable_params:,}")
print(f"🔢 Model Size: ~{total_params/1e6:.1f}M parameters")

## 🎭 Humor Generation Demo

In [None]:
def generate_humor(prompt, max_length=50, temperature=0.8, top_k=50, top_p=0.9):
    """
    Generate a humorous continuation of a prompt by sampling one token at a time.

    Args:
        prompt: Input text prompt
        max_length: Maximum number of new tokens to generate
        temperature: Logit divisor; higher values flatten the distribution (more random),
            lower values sharpen it (more predictable). Must be > 0.
        top_k: Keep only the k most likely tokens before sampling (0 disables)
        top_p: Keep the smallest set of most likely tokens whose cumulative
            probability exceeds top_p (1.0 disables)

    Returns:
        The prompt followed by the generated continuation, decoded to text.
    """
    model.eval()

    # Tokenize input
    generated_ids = list(tokenizer.encode(prompt))

    with torch.no_grad():
        for _ in range(max_length):
            # Re-run the model on the full sequence (no KV cache) and take the last position's logits
            outputs = model(torch.tensor([generated_ids], device=device))
            logits = outputs.logits[0, -1, :] / temperature

            # Apply top-k filtering (clamped so topk never requests more than the vocabulary size)
            if top_k > 0:
                k = min(top_k, logits.size(-1))
                top_k_logits, top_k_indices = torch.topk(logits, k)
                logits = torch.full_like(logits, float('-inf'))
                logits[top_k_indices] = top_k_logits

            # Apply top-p (nucleus) filtering
            if top_p < 1.0:
                sorted_logits, sorted_indices = torch.sort(logits, descending=True)
                cumulative_probs = torch.cumsum(torch.softmax(sorted_logits, dim=-1), dim=-1)

                # Mark tokens past the threshold; shift right so the token that crosses it is kept
                sorted_indices_to_remove = cumulative_probs > top_p
                sorted_indices_to_remove[1:] = sorted_indices_to_remove[:-1].clone()
                sorted_indices_to_remove[0] = False

                indices_to_remove = sorted_indices[sorted_indices_to_remove]
                logits[indices_to_remove] = float('-inf')

            # Sample next token
            probs = torch.softmax(logits, dim=-1)
            next_token = torch.multinomial(probs, 1).item()

            # Stop at end-of-sequence
            if next_token == tokenizer.eos_token_id:
                break

            generated_ids.append(next_token)

    # Decode prompt + continuation
    return tokenizer.decode(generated_ids)

print("🎭 Humor Generation Function Ready!")
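
The nucleus (top-p) step in `generate_humor` sorts tokens by probability. It keeps tokens until the cumulative probability first exceeds `top_p`, and the shifted mask keeps the token that crosses the threshold. The plain-Python sketch below mirrors that rule on a toy distribution. The helpers `top_p_keep` and `nucleus_probs` are hypothetical and only illustrate the logic.

```python
def top_p_keep(probs, top_p):
    """Indices kept by nucleus filtering: highest-probability tokens, added in order,
    until the running total first exceeds top_p (the crossing token is kept)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative > top_p:
            break
    return kept

def nucleus_probs(probs, top_p):
    """Zero out filtered tokens and renormalize the survivors to sum to 1."""
    kept = set(top_p_keep(probs, top_p))
    mass = sum(probs[i] for i in kept)
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]

toy = [0.5, 0.3, 0.15, 0.05]
print(top_p_keep(toy, 0.9))  # [0, 1, 2]: 0.5 + 0.3 = 0.8 <= 0.9, adding 0.15 crosses it
```

Unlike top-k, the number of surviving tokens adapts to how peaked the distribution is.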

In [None]:
# Demo prompts for humor generation
demo_prompts = [
    "A cat walks into a bar and",
    "Why did the programmer quit his job?",
    "The funniest thing about artificial intelligence is",
    "My computer is so slow that",
    "A robot, a human, and a cat are in an elevator when"
]

print("🎪 Generating Humorous Completions...\n")
print("=" * 60)

for i, prompt in enumerate(demo_prompts, 1):
    print(f"\n🎯 Prompt {i}: {prompt}")
    
    try:
        # Generate multiple completions with different temperatures
        for temp_name, temp_val in [("Conservative", 0.6), ("Balanced", 0.8), ("Creative", 1.0)]:
            completion = generate_humor(prompt, max_length=30, temperature=temp_val)
            print(f"\n  {temp_name} (T={temp_val}): {completion}")
    
    except Exception as e:
        print(f"  ❌ Error generating completion: {e}")
    
    print("-" * 50)
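
The three temperature settings above divide the logits before the softmax. Dividing by T < 1 widens the gaps between logits and concentrates probability on the top token, while T = 1 leaves the model's distribution unchanged. The logits below are made-up values for three candidate tokens. `softmax_with_temperature` is a hypothetical helper.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature (max-subtracted for numerical stability)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # illustrative logits, not model output
for t in (0.6, 0.8, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```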

## 🎮 Interactive Humor Generator

In [None]:
# Interactive humor generation
print("🎮 Interactive Humor Generator")
print("Enter your prompts below (type 'quit' to exit)\n")

while True:
    try:
        user_prompt = input("🎯 Your prompt: ").strip()

        if user_prompt.lower() in ['quit', 'exit', 'q']:
            print("👋 Thanks for using HumorLLM!")
            break

        if user_prompt:
            print("\n🎭 Generating humor...")
            completion = generate_humor(user_prompt, max_length=40, temperature=0.8)
            print(f"\n✨ Result: {completion}\n")
            print("-" * 50)

    # EOFError is raised when stdin is closed; without catching it here,
    # the generic handler below would print errors in an endless loop
    except (KeyboardInterrupt, EOFError):
        print("\n👋 Goodbye!")
        break
    except Exception as e:
        print(f"❌ Error: {e}")

## 📊 Model Analysis and Insights

In [None]:
# Analyze model architecture
print("🔍 Model Architecture Analysis\n")

def analyze_layer(name, module, depth=0):
    indent = "  " * depth
    params = sum(p.numel() for p in module.parameters() if p.requires_grad)
    print(f"{indent}{name}: {type(module).__name__} ({params:,} parameters)")

print("📋 Layer-by-layer breakdown:")
for name, module in model.named_children():
    analyze_layer(name, module)
    
    # Show transformer layers detail
    if hasattr(module, 'layers') and name == 'transformer':
        print(f"  ↳ {len(module.layers)} transformer layers")
        if len(module.layers) > 0:
            layer_params = sum(p.numel() for p in module.layers[0].parameters())
            print(f"  ↳ Each layer: ~{layer_params:,} parameters")
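
The per-layer count printed above can be sanity-checked with a back-of-the-envelope estimate. The estimate assumes the following, none of which is confirmed by the source:
- bias-free projections
- full multi-head attention (four d×d matrices)
- a SwiGLU hidden size of 8d/3 = 2048
- two RMSNorm weight vectors per block
- embeddings and any untied output head excluded

Under these assumptions, 12 layers at d = 768 come to about 85.0M parameters. That figure is close to the "~85M" quoted earlier, which suggests (but does not prove) that the quoted figure is a non-embedding count. `block_params` is a hypothetical helper.

```python
def block_params(d_model, ffn_hidden):
    """Approximate parameters in one transformer block under the stated assumptions."""
    attn = 4 * d_model * d_model    # Q, K, V and output projections, no biases
    ffn = 3 * d_model * ffn_hidden  # SwiGLU: gate, up and down projections
    norms = 2 * d_model             # two RMSNorm scale vectors
    return attn + ffn + norms

d = 768
hidden = 8 * d // 3  # 2048 (assumed SwiGLU sizing)
per_layer = block_params(d, hidden)
print(f"{per_layer:,} per layer, {12 * per_layer:,} for 12 layers")
# 7,079,424 per layer, 84,953,088 for 12 layers
```

If the printed per-layer count differs noticeably, one of the assumptions (hidden size, grouped-query attention, biases) does not hold for Seagull.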

In [None]:
# Training metrics summary
# These values are copied from the results reported at the top of this notebook;
# no training logs are loaded here, so no curves are plotted.
print("📈 Training Metrics Summary:")
print("  Final Validation Loss: 2.559")
print("  Final Perplexity: 13.239")
print("  Training Configuration: batch_size=16, epochs=2")
print("  Optimizer: AdamW with cosine LR scheduling")
print("  Special Features: Mixed precision, gradient clipping, model compilation")
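
The summary lists AdamW with cosine learning-rate scheduling. The project's warmup length, peak and minimum learning rates, and exact formula aren't given. The sketch below assumes a common variant: linear warmup followed by a half-cosine decay to `min_lr`. `cosine_lr` is a hypothetical helper. PyTorch also ships `torch.optim.lr_scheduler.CosineAnnealingLR` for the decay portion.

```python
import math

def cosine_lr(step, max_lr, total_steps, warmup_steps=0, min_lr=0.0):
    """Linear warmup to max_lr, then cosine decay from max_lr to min_lr at total_steps."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # hold at min_lr after total_steps
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Example: 100 steps, peak 1e-3, 10 warmup steps (illustrative values)
schedule = [cosine_lr(s, 1e-3, 100, warmup_steps=10) for s in range(100)]
```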

## 🚀 Performance Benchmarks

In [None]:
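import time

# Benchmark generation speed
print("⚡ Performance Benchmarks\n")

test_prompt = "The funniest thing about"
num_runs = 5
total_time = 0.0
total_tokens = 0

# One untimed warm-up run so one-off costs (CUDA initialization, kernel selection) aren't measured
generate_humor(test_prompt, max_length=20, temperature=0.8)

print(f"🧪 Running {num_runs} generation tests...")

for i in range(num_runs):
    # perf_counter is monotonic and high-resolution; the .item() call in each sampling
    # step already synchronizes with the GPU, so the elapsed time covers the real work
    start_time = time.perf_counter()
    result = generate_humor(test_prompt, max_length=20, temperature=0.8)
    generation_time = time.perf_counter() - start_time

    # Approximate: re-encoding the decoded text may not reproduce the exact sampled token ids
    tokens_generated = len(tokenizer.encode(result)) - len(tokenizer.encode(test_prompt))

    total_time += generation_time
    total_tokens += tokens_generated

    print(f"  Run {i+1}: {generation_time:.3f}s, {tokens_generated} tokens")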

avg_time = total_time / num_runs
avg_tokens = total_tokens / num_runs
tokens_per_second = avg_tokens / avg_time if avg_time > 0 else 0

print(f"\n📊 Average Results:")
print(f"  Time per generation: {avg_time:.3f}s")
print(f"  Tokens per generation: {avg_tokens:.1f}")
print(f"  Generation speed: {tokens_per_second:.1f} tokens/second")
print(f"  Device: {device}")

## 📁 Project Structure and Usage

```
HumorLLM/
├── models/
│   ├── final_model.pt          # Trained model weights
│   └── final_checkpoint.ckpt   # Training checkpoint
├── config/
│   └── model_config.json       # Model configuration
├── tokenizer/
│   ├── tokenizer.json          # Tokenizer vocabulary
│   └── state_dict.json         # Tokenizer state
├── data/
│   └── processed/              # Training datasets
├── seagull/                    # Model architecture
│   ├── model/                  # Core model components
│   ├── nn/                     # Neural network modules
│   ├── trainers/               # Training utilities
│   └── utils/                  # Helper functions
├── scripts/                    # Training and utility scripts
└── requirements.txt            # Dependencies
```

## 🎯 Key Achievements

1. **Custom Architecture**: Implemented a modern transformer with RoPE, RMSNorm, and SwiGLU
2. **Optimized Training**: Mixed precision, gradient clipping, cosine scheduling
3. **Humor Focus**: Specialized training on caption data for humor generation
4. **Production Ready**: Clean codebase with modular design and comprehensive configs
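
Gradient clipping as usually applied in PyTorch training (`torch.nn.utils.clip_grad_norm_`) rescales *all* gradients by a single factor. The factor is chosen so that their combined L2 norm does not exceed `max_norm`, which preserves the update direction. The project's `max_norm` value isn't stated. The sketch below reproduces the same rule on plain lists, using the clip coefficient `max_norm / (total_norm + 1e-6)` capped at 1.0. `clip_by_global_norm` is a hypothetical helper.

```python
import math

def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale every gradient by one factor so the global L2 norm is at most max_norm.
    Returns (clipped_grads, total_norm_before_clipping)."""
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    scale = min(1.0, max_norm / (total_norm + eps))
    return [[g * scale for g in grad] for grad in grads], total_norm

# Two "parameters" with gradients 3 and 4: global norm is 5
clipped, norm = clip_by_global_norm([[3.0], [4.0]], max_norm=1.0)
```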

## 🔮 Future Improvements

- **Larger Scale**: Train on more diverse humor datasets
- **Fine-tuning**: Domain-specific humor adaptation
- **Evaluation**: Implement humor-specific metrics
- **Deployment**: API endpoint for real-time humor generation

---

**Built with ❤️ using PyTorch and custom Seagull architecture**