## 11. Summary and Next Steps

### What This Notebook Does:
- ✓ Loads lightweight Hugging Face pretrained models (Stable Diffusion)
- ✓ Integrates with your existing COCO + Flickr dataloaders
- ✓ Generates high-quality images from text prompts
- ✓ Works on CPU for development, optimized for GPU deployment
- ✓ Provides utilities for batch processing and visualization

### Key Features:
- **Lightweight**: Uses Stable Diffusion v1.5 (~5GB)
- **Flexible**: Adjustable inference steps, guidance scale, and other parameters
- **Scalable**: Configuration ready for multi-GPU on HiperGator
- **Data Integration**: Direct usage of your existing caption datasets

### Next Steps for Production:
1. **Fine-tune on your data**: Train the model on your specific dataset
2. **Optimize hyperparameters**: Adjust guidance_scale and num_inference_steps
3. **Evaluate quality**: Use CLIP scores or FID to measure generation quality
4. **Deploy to HiperGator**: Use the multi-GPU configuration provided
5. **Add LoRA**: Use low-rank adapters for parameter-efficient fine-tuning
6. **Monitor performance**: Track GPU memory and inference time
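
Step 6 can start small. The sketch below is one way to record wall-clock time per generation call using only the standard library. `timed` and `timings` are illustrative names that do not appear elsewhere in this notebook, and GPU memory tracking (for example via `torch.cuda.max_memory_allocated()`) is left out to keep the example dependency-free.

```python
import time
from contextlib import contextmanager

timings = {}  # label -> list of elapsed seconds (hypothetical helper state)


@contextmanager
def timed(label, store=timings):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        store.setdefault(label, []).append(time.perf_counter() - start)


# Usage: wrap any call worth measuring, e.g. generate_image(...)
with timed("demo"):
    sum(range(1000))  # stand-in for a real generation call

mean_s = sum(timings["demo"]) / len(timings["demo"])
print(f"demo: {mean_s:.6f}s mean over {len(timings['demo'])} run(s)")
```

Collecting a list per label makes it easy to compare, say, 20-step and 50-step runs after the fact.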

### Recommended Reading:
- Hugging Face Diffusers Documentation: https://huggingface.co/docs/diffusers
- Stable Diffusion Paper: https://arxiv.org/abs/2112.10752
- Stable Diffusion 2 pipeline documentation (Diffusers): https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion_2


In [None]:
# Test with custom prompts
custom_prompts = [
    "A serene landscape with mountains and lake during sunset",
    "A modern kitchen with stainless steel appliances",
    "A cat sitting on a windowsill, looking outside",
    "A busy street market with colorful stalls and people"
]

print("Generating images from custom prompts...")
print("="*60)

custom_results = generate_batch_images(
    custom_prompts,
    num_per_prompt=1,
    num_inference_steps=inference_steps,
    guidance_scale=7.5,
    seed=123
)

# Display custom generated images
display_images(custom_results)

# Experiment with different parameters
print("\n" + "="*60)
print("Experimenting with different guidance scales...")
print("="*60)

test_prompt = "A futuristic city with flying cars"
guidance_scales = [5.0, 7.5, 10.0]

fig, axes = plt.subplots(1, len(guidance_scales), figsize=(5*len(guidance_scales), 5))
fig.suptitle(f"Effect of Guidance Scale: '{test_prompt}'", fontsize=14, fontweight='bold')

for ax, guidance_scale in zip(axes, guidance_scales):
    image = generate_image(
        test_prompt,
        num_inference_steps=20,
        guidance_scale=guidance_scale,
        seed=42
    )
    if image is not None:
        ax.imshow(image)
        ax.set_title(f"Guidance: {guidance_scale}")
    ax.axis('off')

plt.tight_layout()
plt.show()

## 10. Advanced: Custom Prompts and Parameters
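
The `guidance_scale` parameter varied in this section controls classifier-free guidance: at each denoising step the pipeline combines an unconditional noise prediction with a prompt-conditioned one. A minimal sketch of that combination on plain Python lists follows. `apply_guidance` is an illustrative name; the real pipeline performs the same arithmetic on tensors internally.

```python
def apply_guidance(noise_uncond, noise_cond, guidance_scale):
    """Combine two noise predictions: uncond + scale * (cond - uncond)."""
    return [
        u + guidance_scale * (c - u)
        for u, c in zip(noise_uncond, noise_cond)
    ]


# Toy two-element "predictions" standing in for full noise tensors
uncond = [0.0, 1.0]
cond = [1.0, 3.0]
for scale in (1.0, 5.0, 7.5, 10.0):
    print(scale, apply_guidance(uncond, cond, scale))
```

A scale of 1.0 returns the conditioned prediction unchanged. Larger values push further along the direction from unconditional to conditioned, which is why high scales follow the prompt more closely, usually at some cost in diversity.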

In [None]:
# A bare string literal in a notebook cell does not populate __doc__, so the
# guide is bound to a name and printed explicitly.
DEPLOYMENT_GUIDE = """
Multi-GPU Deployment Guide for HiperGator
==========================================

To scale this code to multiple GPUs on HiperGator, follow these steps:

1. MULTI-GPU SETUP:
   - Inference: launch one process per GPU, load the pipeline in each, and
     give each rank its own slice of the prompts (no gradient sync is needed)
   - Fine-tuning: wrap the trainable module in
     torch.nn.parallel.DistributedDataParallel:
     model = DDP(model, device_ids=[local_rank])

2. SBATCH CONFIGURATION (Example):
   #!/bin/bash
   #SBATCH --nodes=1
   #SBATCH --ntasks-per-node=2
   #SBATCH --gpus-per-node=a100:2  # Request 2 A100 GPUs
   #SBATCH --cpus-per-task=16
   #SBATCH --mem=64GB
   #SBATCH --time=04:00:00
   
   module load python/3.11 cuda/11.8
   source venv/bin/activate
   
   torchrun --nproc_per_node=2 train_distributed.py

3. MEMORY OPTIMIZATION:
   - Use half precision (float16) to roughly halve model memory
   - Enable gradient checkpointing if fine-tuning
   - Call pipe.enable_attention_slicing() when GPU memory is tight; skip it
     when memory is plentiful, since slicing trades speed for memory

4. PERFORMANCE TIPS:
   - Increase batch size (BATCH_SIZE = 8-16 per GPU on A100)
   - Use more inference steps (50+) for better quality
   - Increase num_workers if data loading is the bottleneck
   - Pin memory only if using GPU

5. MONITORING:
   - Use nvidia-smi to monitor GPU memory
   - Check data loading speed by timing the dataloader loop
     (e.g. time.perf_counter() before and after)

6. FINE-TUNING (Optional):
   - For custom text-to-image generation, fine-tune the model
   - Use LoRA for parameter-efficient fine-tuning
   - Requires a training loop with optimizer and loss calculation
"""

print(DEPLOYMENT_GUIDE)

# Example configuration dictionary for multi-GPU setup
MULTI_GPU_CONFIG = {
    'num_gpus': NUM_GPUS,
    'batch_size': BATCH_SIZE,
    'num_workers': NUM_WORKERS,
    'use_ddp': NUM_GPUS > 1,
    # Match the DTYPE used when loading the pipeline: fp16 on any GPU, fp32 on CPU
    'mixed_precision': 'fp16' if NUM_GPUS > 0 else 'fp32',
    'device': DEVICE,
    'local_rank': int(os.environ.get('LOCAL_RANK', 0))
}

print("\nCurrent Configuration:")
for key, value in MULTI_GPU_CONFIG.items():
    print(f"  {key}: {value}")


## 9. Multi-GPU Deployment Configuration and Notes
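
For inference, the simplest multi-GPU pattern is to give each process its own slice of the prompts. The sketch below assumes the script is launched with `torchrun`, which sets the `RANK` and `WORLD_SIZE` environment variables. `shard_prompts` is a hypothetical helper, not part of this notebook.

```python
import os


def shard_prompts(prompts, rank=None, world_size=None):
    """Return the strided slice of `prompts` owned by this process."""
    if rank is None:
        rank = int(os.environ.get("RANK", 0))
    if world_size is None:
        world_size = int(os.environ.get("WORLD_SIZE", 1))
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} out of range for world size {world_size}")
    # Rank r takes items r, r + world_size, r + 2 * world_size, ...
    return prompts[rank::world_size]


prompts = [f"prompt {i}" for i in range(5)]
for r in range(2):
    print(r, shard_prompts(prompts, rank=r, world_size=2))
```

Strided slicing keeps the shards within one item of each other in size, and every prompt is assigned to exactly one rank. Each rank would then pass its shard to `generate_batch_images` on its own device.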

In [None]:
# Process multiple batches from the dataset
def process_dataset_batch(dataloader, num_batches=2, num_images_per_caption=1):
    """
    Generate images for multiple captions from the dataloader.
    
    Args:
        dataloader: PyTorch DataLoader
        num_batches: Number of batches to process
        num_images_per_caption: Number of images per caption
    """
    all_generated_images = {}
    
    for batch_idx, (images, captions) in enumerate(dataloader):
        if batch_idx >= num_batches:
            break
        
        print(f"\nProcessing batch {batch_idx + 1}/{num_batches}")
        print(f"Captions in batch: {len(captions)}")
        
        # Generate images for this batch
        batch_results = generate_batch_images(
            list(captions),
            num_per_prompt=num_images_per_caption,
            num_inference_steps=inference_steps,
            guidance_scale=7.5,
            seed=batch_idx
        )
        
        all_generated_images.update(batch_results)
    
    return all_generated_images


# Process first 2 batches as a demo
print("Processing batches from validation set...")
print("="*60)

batch_results = process_dataset_batch(val_loader, num_batches=2, num_images_per_caption=1)

print(f"\n✓ Processed {len(batch_results)} captions total")

## 8. Batch Processing and Evaluation
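
Because results are keyed by caption, `dict.update` silently replaces earlier images when the same caption appears in more than one batch. One way to keep everything is to append instead of overwrite. `merge_results` below is an illustrative helper, shown with strings standing in for images.

```python
from collections import defaultdict


def merge_results(*batches):
    """Merge prompt -> images dicts, concatenating lists for repeated prompts."""
    merged = defaultdict(list)
    for batch in batches:
        for prompt, images in batch.items():
            merged[prompt].extend(images)
    return dict(merged)


batch_a = {"a cat": ["img0"], "a dog": ["img1"]}
batch_b = {"a cat": ["img2"]}
print(merge_results(batch_a, batch_b))
```

With plain `update`, the entry for "a cat" would end up holding only the image from the second batch.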

In [None]:
# Display the generated images
display_images(generated_images)

# Optional: Save images to disk
save_images(generated_images, output_dir="../results/generated_images")

## 7. Visualize Generated Images

In [None]:
# Extract sample captions from the dataset
num_samples = 5
sample_captions = []

# Get captions from training data
for idx in range(min(num_samples, len(train_dataset))):
    _, caption = train_dataset[idx]
    sample_captions.append(caption)

print(f"Sample Captions from Dataset ({len(sample_captions)} total):")
for idx, caption in enumerate(sample_captions):
    print(f"  {idx+1}. {caption}")

# Generate images from these captions
print(f"\n{'='*60}")
print("Generating images from dataset captions...")
print(f"{'='*60}\n")

# Reduce inference steps for faster generation on CPU
# For better quality later on GPU, increase to 50+ steps
inference_steps = 20 if not USE_GPU else 30

generated_images = generate_batch_images(
    sample_captions,
    num_per_prompt=1,
    num_inference_steps=inference_steps,
    guidance_scale=7.5,
    seed=42
)

print("\n✓ Image generation complete!")

## 6. Generate Images from Dataset Captions

In [None]:
def generate_image(prompt, num_inference_steps=30, guidance_scale=7.5, 
                   height=512, width=512, seed=None):
    """
    Generate an image from a text prompt using the pretrained model.
    
    Args:
        prompt (str): Text description of the image to generate
        num_inference_steps (int): Number of diffusion steps (higher = better quality, slower)
        guidance_scale (float): Strength of guidance (higher = more faithful to prompt)
        height (int): Image height in pixels
        width (int): Image width in pixels
        seed (int): Random seed for reproducibility
    
    Returns:
        PIL.Image: Generated image
    """
    if pipe is None:
        print("Model not loaded. Please run the model loading cell first.")
        return None
    
    # Set seed if provided
    if seed is not None:
        generator = torch.Generator(device=DEVICE).manual_seed(seed)
    else:
        generator = None
    
    try:
        with torch.no_grad():
            # Use less memory by disabling autograd for inference
            image = pipe(
                prompt,
                num_inference_steps=num_inference_steps,
                guidance_scale=guidance_scale,
                height=height,
                width=width,
                generator=generator
            ).images[0]
        
        return image
    
    except Exception as e:
        print(f"Error generating image: {e}")
        return None


def generate_batch_images(prompts, num_per_prompt=1, num_inference_steps=30, 
                         guidance_scale=7.5, seed=None):
    """
    Generate multiple images from a batch of prompts.
    
    Args:
        prompts (list): List of text prompts
        num_per_prompt (int): Number of images to generate per prompt
        num_inference_steps (int): Number of diffusion steps
        guidance_scale (float): Guidance scale
        seed (int): Base seed for reproducibility
    
    Returns:
        dict: Dictionary with prompts as keys and lists of images as values
    """
    results = {}
    
    for idx, prompt in enumerate(prompts):
        print(f"Generating images for prompt {idx+1}/{len(prompts)}: '{prompt}'")
        images = []
        
        for i in range(num_per_prompt):
            current_seed = seed + (idx * num_per_prompt + i) if seed is not None else None
            image = generate_image(
                prompt,
                num_inference_steps=num_inference_steps,
                guidance_scale=guidance_scale,
                seed=current_seed
            )
            if image is not None:
                images.append(image)
        
        results[prompt] = images
    
    return results


def display_images(image_dict, figsize=None):
    """
    Display generated images, one figure per prompt.
    
    Args:
        image_dict (dict): Dictionary with prompts and image lists
        figsize (tuple, optional): Figure size for matplotlib; defaults to
            one 5x5 inch panel per image
    """
    for prompt, images in image_dict.items():
        if not images:
            continue
        
        num_images = len(images)
        fig, axes = plt.subplots(
            1, num_images, figsize=figsize or (5 * num_images, 5)
        )
        
        if num_images == 1:
            axes = [axes]
        
        fig.suptitle(f"Prompt: {prompt}", fontsize=14, fontweight='bold')
        
        for ax, img in zip(axes, images):
            ax.imshow(img)
            ax.axis('off')
        
        plt.tight_layout()
        plt.show()


def save_images(image_dict, output_dir="generated_images"):
    """
    Save generated images to disk.
    
    Args:
        image_dict (dict): Dictionary with prompts and image lists
        output_dir (str): Directory to save images
    """
    os.makedirs(output_dir, exist_ok=True)
    
    for prompt_idx, (prompt, images) in enumerate(image_dict.items()):
        for img_idx, image in enumerate(images):
            # Create a safe filename from the prompt: keep alphanumerics,
            # spaces, and underscores, then cap the length
            safe_prompt = "".join(c for c in prompt if c.isalnum() or c in (' ', '_'))
            safe_prompt = safe_prompt.strip()[:50].rstrip()
            
            filename = os.path.join(
                output_dir, f"prompt_{prompt_idx:02d}_img_{img_idx:02d}_{safe_prompt}.png"
            )
            image.save(filename)
            print(f"Saved: {filename}")

print("✓ Inference pipeline utilities defined!")

## 5. Define Inference Pipeline and Utilities
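
Prompts can contain characters that are awkward in filenames. A stricter variant of the sanitizing step in `save_images`, sketched here as a hypothetical `slugify_prompt` helper, lowercases the text, collapses every run of other characters into a single underscore, and caps the length.

```python
import re


def slugify_prompt(prompt, max_len=50):
    """Turn a prompt into a lowercase, underscore-separated filename stem."""
    # Anything outside a-z and 0-9 (including non-ASCII letters) becomes "_"
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    # Fall back to a fixed stem if nothing usable is left
    return slug[:max_len].rstrip("_") or "prompt"


print(slugify_prompt("A cat sitting on a windowsill, looking outside"))
```

Avoiding spaces makes the files easier to handle in shell scripts. The fallback stem covers prompts that consist only of punctuation.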

In [None]:
# Model Selection - candidate checkpoints:
# 1. Stable Diffusion v1.4 / v1.5 (compact, ~5GB)
# 2. Stable Diffusion v2.1-base (512x512 variant of v2.1)
# 3. LDM-3B (very lightweight)
# 4. OpenJourney (community fine-tune)

MODEL_ID = "runwayml/stable-diffusion-v1-5"  # Lightweight and widely used
# Alternative models:
# MODEL_ID = "hakurei/waifu-diffusion"  # Community fine-tune of Stable Diffusion
# MODEL_ID = "CompVis/stable-diffusion-v1-4"

print(f"Loading model: {MODEL_ID}")
print("This may take a few minutes on first run (downloads ~5GB)...\n")

# For CPU usage, use float32; for GPU, can use float16 to save memory
DTYPE = torch.float32 if not USE_GPU else torch.float16

try:
    # Load the full pipeline
    pipe = StableDiffusionPipeline.from_pretrained(
        MODEL_ID,
        torch_dtype=DTYPE,
        safety_checker=None,  # Disable safety checker for faster inference
        requires_safety_checker=False
    )
    
    # Move to device
    pipe = pipe.to(DEVICE)
    
    # Enable memory-efficient attention for CPU/low-memory GPUs
    pipe.enable_attention_slicing()
    
    if USE_GPU:
        # Additional optimizations for GPU
        pipe.enable_vae_slicing()
        
    print("✓ Model loaded successfully!")
    print(f"  Model dtype: {DTYPE}")
    print(f"  Device: {DEVICE}")
    
except Exception as e:
    print(f"Error loading model: {e}")
    print("Make sure you have enough disk space for model download (~5GB)")
    pipe = None

## 4. Load Lightweight Pretrained Models from Hugging Face
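
The choice between `float32` and `float16` in this section mostly comes down to memory. As a rough back-of-the-envelope check, weight memory is parameter count times bytes per element. The sketch below uses an assumed round figure of one billion parameters purely for illustration, and it ignores activations and framework overhead.

```python
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params, dtype):
    """Approximate weight memory in GB (10^9 bytes) for a given dtype."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 1e9


assumed_params = 1_000_000_000  # illustrative round number, not a measured value
for dtype in ("float32", "float16"):
    print(f"{dtype}: ~{weight_memory_gb(assumed_params, dtype):.1f} GB")
```

To plug in a real figure, a loaded pipeline component can be counted with `sum(p.numel() for p in pipe.unet.parameters())`.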

In [None]:
# Setup path to access custom modules
root_directory = os.path.dirname(os.getcwd())
sys.path.append(os.path.join(root_directory, 'src'))

# Import custom dataloaders
from dataloaders_text import caption_dataset, TextToImageDataset

# Initialize dataloader
dataloader_handler = caption_dataset()
print("Loading datasets from COCO + Flickr...")

# Get datasets
train_dataset, val_dataset, test_dataset = dataloader_handler.get_datasets()

print(f"\nDataset Sizes:")
print(f"  Train: {len(train_dataset)} samples")
print(f"  Val: {len(val_dataset)} samples")
print(f"  Test: {len(test_dataset)} samples")

# Create dataloaders with reduced batch size for CPU inference
# For GPU training later, increase these values
INFERENCE_BATCH_SIZE = 2 if USE_GPU else 1

train_loader = DataLoader(
    train_dataset,
    batch_size=INFERENCE_BATCH_SIZE,
    shuffle=True,
    num_workers=0 if not USE_GPU else NUM_WORKERS,
    pin_memory=PIN_MEMORY
)

val_loader = DataLoader(
    val_dataset,
    batch_size=INFERENCE_BATCH_SIZE,
    shuffle=False,
    num_workers=0 if not USE_GPU else NUM_WORKERS,
    pin_memory=PIN_MEMORY
)

# Get a sample batch to understand data shapes
sample_images, sample_captions = next(iter(train_loader))
print(f"\nSample Batch:")
print(f"  Images shape: {sample_images.shape}")
print(f"  Captions: {sample_captions[:2]}")  # Show first 2 captions

## 3. Load Custom Dataloaders

In [None]:
# Detect available hardware
NUM_GPUS = torch.cuda.device_count()
USE_GPU = torch.cuda.is_available()
DEVICE = torch.device('cuda' if USE_GPU else 'cpu')

print(f"Device: {DEVICE}")
print(f"Number of GPUs: {NUM_GPUS}")
if USE_GPU:
    for i in range(NUM_GPUS):
        print(f"  GPU {i}: {torch.cuda.get_device_name(i)}")
    print(f"  GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")

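# Allocator and threading settings
# PYTORCH_CUDA_ALLOC_CONF should be set before the first CUDA allocation
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'
# OMP_NUM_THREADS typically has no effect once torch is imported, so cap
# CPU threads through the PyTorch API instead
torch.set_num_threads(min(16, os.cpu_count() or 1))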

# Batch size: scale by number of GPUs for multi-GPU training
BATCH_SIZE = 4 * NUM_GPUS if NUM_GPUS > 0 else 4
NUM_WORKERS = min(4, os.cpu_count() or 4)
PIN_MEMORY = USE_GPU

print(f"\nTraining Configuration:")
print(f"  Batch Size: {BATCH_SIZE}")
print(f"  Num Workers: {NUM_WORKERS}")
print(f"  Pin Memory: {PIN_MEMORY}")

## 2. Setup Device and Environment Configuration

In [None]:
import os
import sys
import json
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Transformers and Diffusers from Hugging Face
from transformers import AutoTokenizer, AutoModel, CLIPTokenizer, CLIPTextModel
from diffusers import StableDiffusionPipeline, DDPMPipeline, DPMSolverMultistepScheduler
from diffusers.utils import make_image_grid

# Image processing
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from PIL import Image
import torchvision.transforms as T
from torchvision.utils import save_image

# Data handling
import pandas as pd
from tqdm import tqdm
import warnings
warnings.filterwarnings('ignore')

# Set random seeds for reproducibility
def set_seed(seed=42):
    torch.manual_seed(seed)
    np.random.seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

set_seed(42)

## 1. Import Required Libraries

# Text-to-Image Generation using Hugging Face Pretrained Models

This notebook demonstrates lightweight text-to-image generation using pretrained models from Hugging Face. It's designed to work on CPU initially and can be scaled to multi-GPU environments like HiperGator.

**Key Features:**
- Pretrained Stable Diffusion v1.5 pipeline (CLIP text encoder with a compact latent diffusion model)
- CPU-compatible with GPU acceleration support
- Integration with existing COCO + Flickr dataloaders
- Flexible inference pipeline
- Ready for multi-GPU deployment