# BLIP Fine-tuning for Facial Expression Recognition - Google Drive Workflow

## 🚀 Quick Start (Google Drive Workflow)

### Step 1: Run Data Preparation
```
Run: 01_data_preparation.ipynb
Output: Saves metadata to /data/metadata/ on Google Drive
```

### Step 2: Run Training (This Notebook)
```
Run: 02_blip_training.ipynb
Input: Loads metadata from Google Drive (/data/metadata/)
```

---

# BLIP Model Fine-tuning for Facial Expression Recognition

**Project:** FER AI with BLIP  
**Dataset:** RAF-DB (Balanced, Grayscale)  
**Task:** Fine-tune BLIP for emotion classification  
**Environment:** Google Colab (GPU recommended)

---

## Notebook Overview
1. Environment Setup & Model Loading
2. Dataset Preparation (PyTorch DataLoader)
3. Model Configuration
4. Training Loop
5. Evaluation & Inference

## 1. Environment Setup & Model Loading

In [None]:
# Install required libraries
!pip install torch torchvision -q
!pip install transformers datasets -q
!pip install opencv-python-headless mtcnn -q
!pip install tqdm tensorboard -q

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from transformers import BlipProcessor, BlipForConditionalGeneration, AutoProcessor
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR
import numpy as np
import pandas as pd
from pathlib import Path
import cv2
from mtcnn import MTCNN
from tqdm import tqdm
import json
from datetime import datetime
import time
import matplotlib.pyplot as plt
import seaborn as sns

print(f"✓ PyTorch version: {torch.__version__}")
print(f"✓ GPU Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"✓ GPU Device: {torch.cuda.get_device_name(0)}")
    print(f"✓ GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")

## 2. Configuration & Paths

In [None]:
# Detect environment
try:
    from google.colab import drive
    IS_COLAB = True
    drive.mount('/content/drive')
    BASE_PATH = '/content/drive/MyDrive/FER_AI_Project'
except ImportError:  # google.colab is only importable inside Colab
    IS_COLAB = False
    BASE_PATH = r'c:\Users\famil\Desktop\ghaith\Projects\FER_AI_Project'

print(f"Environment: {'Colab' if IS_COLAB else 'Local'}")
print(f"Base Path: {BASE_PATH}")

In [None]:
# Configuration
CONFIG = {
    'project_name': 'FER_AI_BLIP',
    'timestamp': datetime.now().strftime('%Y%m%d_%H%M%S'),
    
    # Paths
    'data_root': f'{BASE_PATH}/data',
    'raw_data_path': f'{BASE_PATH}/data/raw',
    'processed_data_path': f'{BASE_PATH}/data/processed',
    'models_path': f'{BASE_PATH}/models',
    'logs_path': f'{BASE_PATH}/logs',
    
    # Dataset
    'emotion_labels': {
        'angry': 'Anger',
        'disgust': 'Disgust',
        'fear': 'Fear',
        'happy': 'Happiness',
        'neutral': 'Neutral',
        'sad': 'Sadness',
        'surprise': 'Surprise'
    },
    
    # Model
    'model_name': 'Salesforce/blip-image-captioning-base',
    'image_size': (64, 64),  # Pre-resize target; the BLIP processor still resizes to its own input resolution
    
    # Training (optimized for T4 GPU, max 4 hours)
    'batch_size': 2,  # Minimal batch size for T4 GPU memory
    'num_epochs': 6,  # Reduced from 10 for 4-hour constraint
    'learning_rate': 1e-4,  # Slightly higher LR for faster convergence
    'warmup_steps': 300,  # Fewer warmup steps (not used by the CosineAnnealingLR scheduler in this notebook)
    'max_grad_norm': 1.0,
    'weight_decay': 0.01,
    'random_seed': 42,
    'device': 'cuda' if torch.cuda.is_available() else 'cpu',
    'accumulation_steps': 16,  # High accumulation (effective batch=32)
    'log_interval': 10,  # Log every N batches
    'use_fp16': True  # Mixed precision for memory efficiency
}

# Create directories
Path(CONFIG['models_path']).mkdir(parents=True, exist_ok=True)
Path(CONFIG['logs_path']).mkdir(parents=True, exist_ok=True)

print(json.dumps({k: v for k, v in CONFIG.items() if not k.endswith('_path')}, indent=2))
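
The batch and accumulation settings above determine the effective batch size and how many optimizer updates happen per epoch. The following is a minimal sketch of that arithmetic; `num_train_images` is a hypothetical placeholder, not the actual size of the RAF-DB training split.

```python
import math

batch_size = 2            # CONFIG['batch_size']
accumulation_steps = 16   # CONFIG['accumulation_steps']
num_epochs = 6            # CONFIG['num_epochs']
num_train_images = 1000   # hypothetical; use len(train_dataset) in the notebook

# Gradients from `accumulation_steps` mini-batches are summed before each update.
effective_batch_size = batch_size * accumulation_steps

# A DataLoader with the default drop_last=False yields ceil(N / batch_size) batches.
batches_per_epoch = math.ceil(num_train_images / batch_size)

# The optimizer steps once every `accumulation_steps` batches.
optimizer_steps_per_epoch = batches_per_epoch // accumulation_steps
total_optimizer_steps = optimizer_steps_per_epoch * num_epochs

print(effective_batch_size, batches_per_epoch,
      optimizer_steps_per_epoch, total_optimizer_steps)
```

With these settings, the number of optimizer updates is much smaller than the number of batches, which matters when sizing a learning-rate schedule.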

## 3. Load Dataset from Google Drive

In [None]:
# Load dataset from Google Drive (prepared by 01_data_preparation.ipynb)
print("="*70)
print("Loading dataset from Google Drive...")
print("="*70 + "\n")

# Check for metadata saved by data preparation notebook
metadata_path = Path(BASE_PATH) / 'data' / 'metadata'
dataset_metadata_csv = metadata_path / 'dataset_metadata.csv'

if dataset_metadata_csv.exists():
    # Load preprocessed metadata from Google Drive
    print(f"✓ Found metadata on Google Drive: {dataset_metadata_csv}")
    dataset_df = pd.read_csv(dataset_metadata_csv)
    print(f"✓ Loaded {len(dataset_df)} images from saved metadata\n")
    
    # Ensure required columns exist (handle different column naming from data prep)
    required_columns = ['image_path', 'emotion', 'emotion_label', 'split']
    
    # Map column names if needed
    if 'emotion_folder' in dataset_df.columns and 'emotion' not in dataset_df.columns:
        dataset_df['emotion'] = dataset_df['emotion_folder']
    
    # Verify all required columns present
    missing_cols = [col for col in required_columns if col not in dataset_df.columns]
    if missing_cols:
        print(f"⚠ Warning: Missing columns {missing_cols} in metadata")
        print("  Regenerating from image paths...\n")
        
        # Extract emotion from path if missing
        if 'emotion' not in dataset_df.columns:
            dataset_df['emotion'] = dataset_df['image_path'].apply(
                lambda x: Path(x).parent.name
            )
        
        # Map to emotion labels if missing
        if 'emotion_label' not in dataset_df.columns:
            dataset_df['emotion_label'] = dataset_df['emotion'].map(CONFIG['emotion_labels'])
        
        # Extract split from path if missing
        if 'split' not in dataset_df.columns:
            dataset_df['split'] = dataset_df['image_path'].apply(
                lambda x: Path(x).parent.parent.name
            )
    
    # Update CONFIG with the raw data path from metadata
    if len(dataset_df) > 0:
        # Extract base path from first image path
        first_image = dataset_df.iloc[0]['image_path']
        # Example: /kaggle/input/balanced-raf-db-dataset-7575-grayscale/train/angry/image.jpg
        # Extract: /kaggle/input/balanced-raf-db-dataset-7575-grayscale
        base_path = Path(first_image).parents[2]  # Go up 2 levels from emotion/split
        CONFIG['raw_data_path'] = str(base_path)
        print(f"✓ Dataset base path: {base_path}\n")
    
else:
    # Fallback: Download from Kaggle if metadata not found
    print("⚠ Metadata not found on Google Drive")
    print("  Please run 01_data_preparation.ipynb first!\n")
    print("Fallback: Downloading dataset from Kaggle...\n")
    
    import kagglehub
    
    # Download latest version
    dataset_name = "dollyprajapati182/balanced-raf-db-dataset-7575-grayscale"
    print(f"Dataset: {dataset_name}")
    download_path = kagglehub.dataset_download(dataset_name)
    print(f"✓ Dataset downloaded to: {download_path}\n")
    
    # Map kagglehub download path to our config
    CONFIG['raw_data_path'] = download_path
    
    # Load dataset metadata from raw data
    print("Scanning dataset files...\n")
    
    def load_dataset_metadata(data_path):
        """
        Load dataset metadata from raw data (train/val/test structure)
        """
        data_path = Path(data_path)
        image_data = []
        
        emotion_folders = list(CONFIG['emotion_labels'].keys())
        
        # Check if data has train/val/test subdirectories
        for split in ['train', 'val', 'test']:
            split_path = data_path / split
            if not split_path.exists():
                continue
            
            # Look for emotion folders
            for emotion in emotion_folders:
                emotion_path = split_path / emotion
                if emotion_path.exists():
                    for img_path in emotion_path.glob('*.jpg'):
                        image_data.append({
                            'image_path': str(img_path),
                            'emotion': emotion,
                            'emotion_label': CONFIG['emotion_labels'][emotion],
                            'split': split
                        })
        
        return pd.DataFrame(image_data)
    
    dataset_df = load_dataset_metadata(CONFIG['raw_data_path'])

# Display dataset statistics
print(f"{'='*70}")
print(f"Dataset Loaded Successfully")
print(f"{'='*70}\n")
print(f"✓ Total images: {len(dataset_df)}")
print(f"\nSplit distribution:")
print(dataset_df['split'].value_counts())
print(f"\nEmotion distribution:")
print(dataset_df['emotion_label'].value_counts())
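
The loading cell relies on the `<base>/<split>/<emotion>/<file>` layout to recover missing columns from image paths. As a small illustration of that convention, the sketch below parses the example path from the comments above; `parse_image_path` is a hypothetical helper, and `PurePosixPath` is used only so the example behaves the same on any operating system.

```python
from pathlib import PurePosixPath

def parse_image_path(image_path):
    """Split '<base>/<split>/<emotion>/<file>' into its components."""
    p = PurePosixPath(image_path)
    return {
        'base': str(p.parents[2]),     # dataset root, three levels above the file
        'split': p.parents[1].name,    # train / val / test
        'emotion': p.parent.name,      # emotion folder, e.g. 'angry'
        'filename': p.name,
    }

example = parse_image_path(
    '/kaggle/input/balanced-raf-db-dataset-7575-grayscale/train/angry/image.jpg'
)
print(example)
```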

## 4. Custom Dataset Class

In [None]:
# Disable face detection to save GPU memory (dataset already has cropped faces)
# detector = MTCNN()  # Disabled - uses significant GPU memory

class FERDataset(Dataset):
    """
    Custom Dataset for Facial Expression Recognition
    Loads images and prepares for BLIP model (no face detection needed)
    """
    
    def __init__(self, dataframe, processor, config, split='train'):
        self.df = dataframe[dataframe['split'] == split].reset_index(drop=True)
        self.processor = processor
        self.config = config
        self.emotion_to_id = {emotion: idx for idx, emotion in enumerate(config['emotion_labels'].keys())}
    
    def __len__(self):
        return len(self.df)
    
    def load_and_resize_image(self, image_path):
        """
        Load and resize image (no face detection - saves GPU memory)
        """
        try:
            img = cv2.imread(str(image_path))
            if img is None:
                return None
            
            img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            # Simple resize - no face detection needed (dataset pre-cropped)
            img_resized = cv2.resize(img_rgb, self.config['image_size'])
            return img_resized
        except Exception as e:
            print(f"Error loading image: {e}")
            return None
    
    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        
        # Extract emotion from image path (folder name)
        # Path structure: .../split/emotion/image.jpg
        image_path = row['image_path']
        emotion_folder = Path(image_path).parent.name  # Get emotion folder name
        
        # Load and resize image (no face detection)
        image = self.load_and_resize_image(image_path)
        if image is None:
            image = np.zeros((*self.config['image_size'], 3), dtype=np.uint8)
        
        # Get emotion label from folder name
        emotion_label = self.config['emotion_labels'].get(emotion_folder, 'Unknown')
        
        # Process with BLIP processor
        inputs = self.processor(
            images=image,
            text=f"emotion: {emotion_label}",
            return_tensors="pt",
            padding=True
        )
        
        # Remove batch dimension
        for key in inputs:
            inputs[key] = inputs[key].squeeze(0)
        
        inputs['emotion_label'] = emotion_label
        inputs['emotion_id'] = self.emotion_to_id.get(emotion_folder, 0)
        
        return inputs

print("✓ Dataset class defined")
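
Each sample is tokenized on its own, so captions that tokenize to different lengths would produce tensors the default collation cannot stack. One possible remedy is to pad every sample in a batch to a common length. The sketch below shows the idea on plain Python lists; `pad_token_lists`, the example token ids, and `pad_id=0` are assumptions for illustration, not values taken from the BLIP tokenizer.

```python
def pad_token_lists(token_lists, pad_id=0):
    """Right-pad lists of token ids to a common length and build attention masks."""
    max_len = max(len(tokens) for tokens in token_lists)
    input_ids = [tokens + [pad_id] * (max_len - len(tokens)) for tokens in token_lists]
    # 1 marks a real token, 0 marks padding.
    attention_mask = [[1] * len(tokens) + [0] * (max_len - len(tokens))
                      for tokens in token_lists]
    return input_ids, attention_mask

# Hypothetical token ids for two captions of different length.
ids, mask = pad_token_lists([[101, 7, 8, 102], [101, 7, 8, 9, 10, 102]])
print(ids)
print(mask)
```

In the notebook, the same effect could be obtained with a custom `collate_fn` or with fixed-length padding in the processor call; which is preferable depends on how padded positions should be treated in the loss.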

## 5. Load BLIP Model

In [None]:
# Load BLIP model and processor
print(f"Loading BLIP model: {CONFIG['model_name']}")
processor = AutoProcessor.from_pretrained(CONFIG['model_name'])
model = BlipForConditionalGeneration.from_pretrained(CONFIG['model_name'])

# Enable gradient checkpointing to save memory
model.gradient_checkpointing_enable()
print(f"✓ Gradient checkpointing enabled")

# Move to device
device = torch.device(CONFIG['device'])
model.to(device)

print(f"✓ Model loaded")
print(f"✓ Model device: {device}")
print(f"✓ Model parameters: {sum(p.numel() for p in model.parameters())/1e6:.2f}M")

## 6. Create DataLoaders

In [None]:
# Create datasets
print("Creating datasets...")
train_dataset = FERDataset(dataset_df, processor, CONFIG, split='train')
val_dataset = FERDataset(dataset_df, processor, CONFIG, split='val')
test_dataset = FERDataset(dataset_df, processor, CONFIG, split='test')

print(f"✓ Train dataset: {len(train_dataset)} images")
print(f"✓ Val dataset: {len(val_dataset)} images")
print(f"✓ Test dataset: {len(test_dataset)} images")

# Create dataloaders
train_loader = DataLoader(
    train_dataset,
    batch_size=CONFIG['batch_size'],
    shuffle=True,
    num_workers=2
)

val_loader = DataLoader(
    val_dataset,
    batch_size=CONFIG['batch_size'],
    shuffle=False,
    num_workers=2
)

print(f"\n✓ DataLoaders created")
print(f"✓ Train batches: {len(train_loader)}")
print(f"✓ Val batches: {len(val_loader)}")

## 7. Training Setup

In [None]:
# Setup optimizer and scheduler
optimizer = AdamW(
    model.parameters(),
    lr=CONFIG['learning_rate'],
    weight_decay=CONFIG['weight_decay']
)

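# The scheduler is stepped once per optimizer update (every `accumulation_steps`
# batches), so T_max counts optimizer updates rather than batches.
updates_per_epoch = max(1, len(train_loader) // CONFIG['accumulation_steps'])
total_steps = updates_per_epoch * CONFIG['num_epochs']
scheduler = CosineAnnealingLR(
    optimizer,
    T_max=total_steps,
    eta_min=1e-6
)

print(f"✓ Optimizer: AdamW (lr={CONFIG['learning_rate']})")
print(f"✓ Scheduler: CosineAnnealing (total_steps={total_steps})")
print(f"✓ Training will run for {CONFIG['num_epochs']} epochs")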
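
To see what the cosine schedule does to the learning rate, it helps to evaluate its closed form directly. The sketch below reproduces the standard cosine annealing formula in plain Python; `cosine_annealing_lr` is a hypothetical helper and `t_max=1000` is an arbitrary illustrative value, not the number of steps used in this run.

```python
import math

def cosine_annealing_lr(step, base_lr=1e-4, eta_min=1e-6, t_max=1000):
    """Closed-form cosine annealing: base_lr at step 0, eta_min at step t_max."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max))

lr_start = cosine_annealing_lr(0)
lr_mid = cosine_annealing_lr(500)
lr_end = cosine_annealing_lr(1000)
print(f"{lr_start:.2e} {lr_mid:.2e} {lr_end:.2e}")
```

The learning rate only reaches `eta_min` if the scheduler is stepped `T_max` times over the whole run, so `T_max` should match the number of times `scheduler.step()` is actually called.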

## 8. Training Loop

In [None]:
def train_epoch(model, train_loader, optimizer, scheduler, device, epoch, config):
    """
    Train for one epoch with gradient accumulation and mixed precision
    """
    model.train()
    total_loss = 0
    epoch_start_time = time.time()
    batch_times = []
    
    # Setup automatic mixed precision if enabled
    scaler = torch.amp.GradScaler('cuda') if config.get('use_fp16', False) else None
    
    pbar = tqdm(train_loader, desc=f"Epoch {epoch+1}/{config['num_epochs']}")
    
    for batch_idx, batch in enumerate(pbar):
        batch_start_time = time.time()
        
        # Move to device
        pixel_values = batch['pixel_values'].to(device)
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        
        # Forward pass with mixed precision
        if config.get('use_fp16', False):
            with torch.amp.autocast('cuda'):
                outputs = model(
                    pixel_values=pixel_values,
                    input_ids=input_ids,
                    attention_mask=attention_mask,
                    labels=input_ids
                )
                loss = outputs.loss / config['accumulation_steps']
        else:
            outputs = model(
                pixel_values=pixel_values,
                input_ids=input_ids,
                attention_mask=attention_mask,
                labels=input_ids
            )
            loss = outputs.loss / config['accumulation_steps']
        
        # Backward pass with gradient accumulation
        if scaler:
            scaler.scale(loss).backward()
        else:
            loss.backward()
        
        # Update weights every accumulation_steps
        if (batch_idx + 1) % config['accumulation_steps'] == 0:
            if scaler:
                scaler.unscale_(optimizer)
                torch.nn.utils.clip_grad_norm_(model.parameters(), config['max_grad_norm'])
                scaler.step(optimizer)
                scaler.update()
                optimizer.zero_grad()
            else:
                torch.nn.utils.clip_grad_norm_(model.parameters(), config['max_grad_norm'])
                optimizer.step()
                optimizer.zero_grad()
            scheduler.step()
            
            # Aggressive GPU memory cleanup
            if (batch_idx + 1) % (config['accumulation_steps'] * 10) == 0:
                torch.cuda.empty_cache()
        
        total_loss += loss.item() * config['accumulation_steps']
        batch_time = time.time() - batch_start_time
        batch_times.append(batch_time)
        
        # Log every N batches
        if (batch_idx + 1) % config['log_interval'] == 0:
            avg_batch_time = np.mean(batch_times[-config['log_interval']:])
            remaining_batches = len(train_loader) - batch_idx - 1
            eta_seconds = avg_batch_time * remaining_batches
            eta_minutes = eta_seconds / 60
            
            pbar.set_postfix({
                'loss': f'{(loss.item() * config["accumulation_steps"]):.4f}',
                'batch_time': f'{batch_time:.2f}s',
                'eta': f'{eta_minutes:.1f}m'
            })
    
    epoch_time = time.time() - epoch_start_time
    avg_loss = total_loss / len(train_loader)
    
    return avg_loss, epoch_time

def validate(model, val_loader, device):
    """
    Validate model with timing
    """
    model.eval()
    total_loss = 0
    val_start_time = time.time()
    
    with torch.no_grad():
        for batch in tqdm(val_loader, desc="Validating"):
            pixel_values = batch['pixel_values'].to(device)
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            
            outputs = model(
                pixel_values=pixel_values,
                input_ids=input_ids,
                attention_mask=attention_mask,
                labels=input_ids
            )
            
            total_loss += outputs.loss.item()
    
    val_time = time.time() - val_start_time

    avg_loss = total_loss / len(val_loader)

    return avg_loss, val_time

print("✓ Training functions defined")
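
The accumulation logic in `train_epoch` steps the optimizer only when `(batch_idx + 1)` is a multiple of `accumulation_steps`. The following sketch, using a hypothetical helper and a made-up batch count, lists the batches that trigger an update and how many trailing batches do not.

```python
def optimizer_step_batches(num_batches, accumulation_steps):
    """Return the 0-based batch indices after which the optimizer would step."""
    return [i for i in range(num_batches) if (i + 1) % accumulation_steps == 0]

num_batches = 50  # hypothetical; len(train_loader) in the notebook
steps = optimizer_step_batches(num_batches, accumulation_steps=16)
leftover = num_batches % 16  # trailing batches that never reach a full group

print(steps, leftover)
```

When the batch count is not a multiple of `accumulation_steps`, the gradients of the trailing batches are not applied by an optimizer step within that epoch. Whether to flush them with one extra step at the end of the epoch is a design choice.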

## 9. Train Model

In [None]:
# Training loop
# Clear GPU cache before training
torch.cuda.empty_cache()
print(f"✓ GPU cache cleared")

training_start_time = time.time()

print(f"\n{'='*70}")
print(f"Starting BLIP Fine-tuning on RAF-DB Dataset")
print(f"Image Size: {CONFIG['image_size']}")
print(f"Batch Size: {CONFIG['batch_size']} (effective: {CONFIG['batch_size'] * CONFIG['accumulation_steps']})")
print(f"Gradient Accumulation Steps: {CONFIG['accumulation_steps']}")
print(f"Mixed Precision (FP16): {CONFIG.get('use_fp16', False)}")
print(f"Num Epochs: {CONFIG['num_epochs']}")
print(f"Learning Rate: {CONFIG['learning_rate']}")
print(f"Device: {device}")
print(f"Start Time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"{'='*70}\n")

history = {
    'train_loss': [],
    'val_loss': [],
    'epoch_time': [],
    'val_time': [],
    'learning_rate': []
}

best_val_loss = float('inf')
epoch_logs = []

for epoch in range(CONFIG['num_epochs']):
    epoch_log = {'epoch': epoch + 1}
    
    print(f"\nEpoch {epoch+1}/{CONFIG['num_epochs']}")
    print("-" * 70)
    
    # Train
    train_loss, epoch_time = train_epoch(model, train_loader, optimizer, scheduler, device, epoch, CONFIG)
    history['train_loss'].append(train_loss)
    history['epoch_time'].append(epoch_time)
    epoch_log['train_loss'] = train_loss
    epoch_log['epoch_time'] = epoch_time
    
    # Get current learning rate
    current_lr = optimizer.param_groups[0]['lr']
    history['learning_rate'].append(current_lr)
    epoch_log['lr'] = current_lr
    
    print(f"Train Loss: {train_loss:.4f} | Time: {epoch_time:.1f}s ({epoch_time/60:.2f}m)")
    print(f"Learning Rate: {current_lr:.2e}")
    
    # Clear GPU cache before validation
    torch.cuda.empty_cache()
    
    # Validate
    val_loss, val_time = validate(model, val_loader, device)
    history['val_loss'].append(val_loss)
    history['val_time'].append(val_time)
    epoch_log['val_loss'] = val_loss
    epoch_log['val_time'] = val_time
    
    print(f"Val Loss: {val_loss:.4f} | Time: {val_time:.1f}s ({val_time/60:.2f}m)")
    
    # Save best model
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        model_path = Path(CONFIG['models_path']) / f"blip_best_epoch_{epoch+1}.pt"
        torch.save(model.state_dict(), model_path)
        print(f"✓ Saved best model: {model_path.name}")
        epoch_log['best_model'] = True
    
    epoch_logs.append(epoch_log)
    
    # Calculate ETA
    elapsed_time = time.time() - training_start_time
    avg_epoch_time = elapsed_time / (epoch + 1)
    remaining_epochs = CONFIG['num_epochs'] - (epoch + 1)
    eta_seconds = avg_epoch_time * remaining_epochs
    eta_minutes = eta_seconds / 60
    eta_hours = eta_minutes / 60
    
    print(f"Total Elapsed: {elapsed_time/3600:.2f}h | ETA: {eta_hours:.2f}h")

total_training_time = time.time() - training_start_time

print(f"\n{'='*70}")
print("Training Complete!")
print(f"End Time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"Total Training Time: {total_training_time/3600:.2f} hours ({total_training_time/60:.1f} minutes)")
print(f"Average Time per Epoch: {total_training_time/CONFIG['num_epochs']:.1f} seconds")
print(f"{'='*70}")
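
The loop above writes a checkpoint only when validation loss strictly improves. A minimal sketch of that rule on its own, using made-up loss values, shows which epochs would produce a checkpoint; `track_best_checkpoints` is a hypothetical helper, not part of the notebook.

```python
def track_best_checkpoints(val_losses):
    """Return (best_epoch, best_loss, epochs_that_saved) for per-epoch losses."""
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    best_epoch, best_loss, saved_epochs = None, float('inf'), []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:  # strict '<', the same comparison as the training loop
            best_epoch, best_loss = epoch, loss
            saved_epochs.append(epoch)
    return best_epoch, best_loss, saved_epochs

# Hypothetical validation losses, for illustration only.
best_epoch, best_loss, saved_epochs = track_best_checkpoints(
    [0.92, 0.71, 0.74, 0.69, 0.70, 0.69]
)
print(best_epoch, best_loss, saved_epochs)
```

Because each improvement is saved under a new filename, several large checkpoints can accumulate on Google Drive; the last epoch in `saved_epochs` is the one holding the best weights.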

## 10. Evaluate on Test Set

In [None]:
# Test evaluation
test_loader = DataLoader(
    test_dataset,
    batch_size=CONFIG['batch_size'],
    shuffle=False,
    num_workers=2
)

print(f"Evaluating on test set...")
test_loss, test_time = validate(model, test_loader, device)
print(f"\n✓ Test Loss: {test_loss:.4f}")
print(f"✓ Test Evaluation Time: {test_time:.1f}s ({test_time/60:.2f}m)")
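
Test loss alone does not say how often the predicted emotion is correct. If predicted labels are collected for the test images (for example by generating a caption per image and mapping it to a label), accuracy can be computed as sketched below. The helper name and the label lists are hypothetical and are not results from this notebook.

```python
from collections import Counter

def accuracy_report(true_labels, predicted_labels):
    """Return overall accuracy and per-class accuracy for two aligned label lists."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    overall = sum(correct.values()) / len(true_labels)
    per_class = {label: correct[label] / count for label, count in totals.items()}
    return overall, per_class

# Made-up labels, for illustration only.
true_labels = ['Anger', 'Anger', 'Happiness', 'Sadness']
predicted_labels = ['Anger', 'Fear', 'Happiness', 'Sadness']
overall, per_class = accuracy_report(true_labels, predicted_labels)
print(overall, per_class)
```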

## 11. Save Training History

In [None]:
# Save training history
history_path = Path(CONFIG['logs_path']) / f"training_history_{CONFIG['timestamp']}.json"
with open(history_path, 'w') as f:
    json.dump(history, f, indent=2)

print(f"✓ Training history saved: {history_path}")

# Save detailed epoch logs
epoch_logs_path = Path(CONFIG['logs_path']) / f"epoch_logs_{CONFIG['timestamp']}.json"
with open(epoch_logs_path, 'w') as f:
    json.dump(epoch_logs, f, indent=2)

print(f"✓ Epoch logs saved: {epoch_logs_path}")

# Save training config
config_path = Path(CONFIG['logs_path']) / f"training_config_{CONFIG['timestamp']}.json"
with open(config_path, 'w') as f:
    json.dump(CONFIG, f, indent=2, default=str)

print(f"✓ Training config saved: {config_path}")

## 12. Plot Training Curves

In [None]:
# Plot training curves with timing
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Loss curves
axes[0, 0].plot(history['train_loss'], label='Train Loss', marker='o')
axes[0, 0].plot(history['val_loss'], label='Val Loss', marker='s')
axes[0, 0].set_xlabel('Epoch')
axes[0, 0].set_ylabel('Loss')
axes[0, 0].set_title('BLIP Training History - Loss')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Epoch timing
axes[0, 1].bar(range(1, len(history['epoch_time']) + 1), history['epoch_time'])
axes[0, 1].set_xlabel('Epoch')
axes[0, 1].set_ylabel('Time (seconds)')
axes[0, 1].set_title('Epoch Training Time')
axes[0, 1].grid(True, alpha=0.3, axis='y')

# Learning rate schedule
axes[1, 0].plot(history['learning_rate'], marker='o', color='green')
axes[1, 0].set_xlabel('Epoch')
axes[1, 0].set_ylabel('Learning Rate')
axes[1, 0].set_title('Learning Rate Schedule')
axes[1, 0].grid(True, alpha=0.3)
axes[1, 0].set_yscale('log')

# Cumulative time
cumulative_time = np.cumsum(history['epoch_time'])
axes[1, 1].plot(range(1, len(cumulative_time) + 1), cumulative_time/3600, marker='o', color='red')
axes[1, 1].set_xlabel('Epoch')
axes[1, 1].set_ylabel('Cumulative Time (hours)')
axes[1, 1].set_title('Cumulative Training Time')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig(Path(CONFIG['logs_path']) / f"training_curves_{CONFIG['timestamp']}.png", dpi=150)
plt.show()

print(f"✓ Training curves saved")

## 13. Save Trained Model for Later Use

In [None]:
# Save the trained model state dict and configuration
print("="*70)
print("Saving Trained Model")
print("="*70 + "\n")

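# Save best model state dict.
# `model` now holds the final-epoch weights, so copy the best-validation checkpoint
# written during training (`model_path` still points at it after the training loop).
best_model_path = Path(CONFIG['models_path']) / "blip_fer_best_model.pt"
torch.save(torch.load(model_path, map_location='cpu'), best_model_path)
print(f"✓ Best model state dict saved: {best_model_path}")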

# Save the current model weights together with the config
# (a state dict bundle; the architecture itself is not serialized)
full_model_path = Path(CONFIG['models_path']) / "blip_fer_full_model.pt"
torch.save({
    'model_state_dict': model.state_dict(),
    'config': CONFIG,
    'emotion_labels': CONFIG['emotion_labels'],
    'timestamp': CONFIG['timestamp']
}, full_model_path)
print(f"✓ Full model with config saved: {full_model_path}")

# Save processor for later use
processor_path = Path(CONFIG['models_path']) / f"blip_processor"
processor.save_pretrained(processor_path)
print(f"✓ Processor saved: {processor_path}")

# Save model metadata
metadata = {
    'model_name': CONFIG['model_name'],
    'training_timestamp': CONFIG['timestamp'],
    'best_val_loss': float(best_val_loss),
    'num_epochs_trained': CONFIG['num_epochs'],
    'image_size': CONFIG['image_size'],
    'emotion_labels': CONFIG['emotion_labels'],
    'total_training_time_hours': total_training_time / 3600,
    'training_history': {
        'train_loss': history['train_loss'],
        'val_loss': history['val_loss'],
        'epoch_times': history['epoch_time']
    }
}

metadata_path = Path(CONFIG['models_path']) / f"model_metadata_{CONFIG['timestamp']}.json"
with open(metadata_path, 'w') as f:
    json.dump(metadata, f, indent=2)
print(f"✓ Model metadata saved: {metadata_path}")

print(f"\n{'='*70}")
print(f"All model files saved to: {CONFIG['models_path']}")
print(f"{'='*70}\n")

# List all saved files
print("Saved files:")
for file in sorted(Path(CONFIG['models_path']).glob('*')):
    if file.is_file():
        size_mb = file.stat().st_size / (1024**2)
        print(f"  - {file.name} ({size_mb:.2f} MB)")
    elif file.is_dir():
        print(f"  - {file.name}/ (directory)")

## 14. Load Trained Model (for inference in future sessions)

To use the trained model in a future session, run the cells below:

In [None]:
# Example: Load trained model for inference
print("="*70)
print("Loading Trained Model")
print("="*70 + "\n")

# Load processor
# CONFIG['models_path'] is a string, so wrap it in Path before joining
loaded_processor = AutoProcessor.from_pretrained(Path(CONFIG['models_path']) / 'blip_processor')
print(f"✓ Processor loaded")

# Load model
loaded_model = BlipForConditionalGeneration.from_pretrained(CONFIG['model_name'])
loaded_model.gradient_checkpointing_enable()

# Load best model state dict
model_path = Path(CONFIG['models_path']) / 'blip_fer_best_model.pt'
loaded_model.load_state_dict(torch.load(model_path, map_location=device))
loaded_model.to(device)
loaded_model.eval()
print(f"✓ Model weights loaded from: {model_path}")

# Load metadata
metadata_files = sorted(Path(CONFIG['models_path']).glob('model_metadata_*.json'))
if metadata_files:
    with open(metadata_files[-1], 'r') as f:
        loaded_metadata = json.load(f)
    print(f"✓ Model metadata loaded")
    print(f"\n  Best validation loss: {loaded_metadata['best_val_loss']:.4f}")
    print(f"  Training time: {loaded_metadata['total_training_time_hours']:.2f} hours")
    print(f"  Image size: {loaded_metadata['image_size']}")
    print(f"  Emotions: {', '.join(loaded_metadata['emotion_labels'].values())}")

In [None]:
# Inference example with loaded model
def predict_emotion_inference(model, processor, image_path, device, emotion_labels):
    """
    Generate a caption for an image using the loaded model.
    Returns the decoded caption text; `emotion_labels` is accepted for
    convenience but not used to constrain the output.
    """
    model.eval()
    
    # Load image (cv2.imread returns None if the file cannot be read)
    img = cv2.imread(str(image_path))
    if img is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img_resized = cv2.resize(img_rgb, (64, 64))  # Match training size
    
    # Process
    inputs = processor(images=img_resized, return_tensors="pt")
    pixel_values = inputs['pixel_values'].to(device)
    
    # Generate caption
    with torch.no_grad():
        caption_ids = model.generate(pixel_values=pixel_values, max_length=20)
        caption = processor.decode(caption_ids[0], skip_special_tokens=True)
    
    return caption

# Test inference on sample
if len(test_dataset) > 0:
    sample_path = test_dataset.df.iloc[0]['image_path']
    prediction = predict_emotion_inference(loaded_model, loaded_processor, sample_path, device, CONFIG['emotion_labels'])
    print(f"\nSample inference:")
    print(f"  Image: {Path(sample_path).name}")
    print(f"  Prediction: {prediction}")
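
The inference function returns free-form caption text, while the task is emotion classification. One simple way to bridge the two is to look for a known label name inside the caption. The sketch below assumes the fine-tuned model emits text resembling the training prompt (`emotion: <label>`), which is an assumption about the output rather than something verified here; `caption_to_emotion` is a hypothetical helper.

```python
# Label mapping copied from CONFIG['emotion_labels'].
EMOTION_LABELS = {
    'angry': 'Anger',
    'disgust': 'Disgust',
    'fear': 'Fear',
    'happy': 'Happiness',
    'neutral': 'Neutral',
    'sad': 'Sadness',
    'surprise': 'Surprise',
}

def caption_to_emotion(caption, emotion_labels=EMOTION_LABELS):
    """Return the first emotion label whose name appears in the caption, else None."""
    text = caption.lower()
    for label in emotion_labels.values():
        if label.lower() in text:
            return label
    return None

print(caption_to_emotion('emotion : anger'))
print(caption_to_emotion('a photo of a person'))
```

Returning `None` for captions without a recognizable label keeps unparseable outputs visible instead of silently assigning them to a class.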

## 15. Model Checkpoints Summary

**Saved Model Files:**

1. **blip_fer_best_model.pt** - Best model weights (state dict only, smaller file)
   - Use this for inference by loading weights into a fresh model
   - Smallest file size (~900 MB)

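2. **blip_fer_full_model.pt** - Model weights bundled with config
   - A dictionary holding the state dict, config, emotion labels, and timestamp
   - Load with `torch.load(...)` and read the weights from `'model_state_dict'`
   - About the same size as the weights-only file (the config and labels add very little)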

3. **blip_processor/** - BLIP image processor
   - Required for preprocessing images before inference
   - Can be loaded with `AutoProcessor.from_pretrained()`

4. **model_metadata_*.json** - Training metadata
   - Training loss history
   - Best validation loss
   - Training duration
   - Emotion labels mapping

**To Load Model in Future Session:**
```python
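import torch
from transformers import AutoProcessor, BlipForConditionalGeneration

# Load processor
processor = AutoProcessor.from_pretrained('path_to_processor')

# Load base model, then overwrite its weights with the fine-tuned state dict
model = BlipForConditionalGeneration.from_pretrained('Salesforce/blip-image-captioning-base')
model.load_state_dict(torch.load('path_to_best_model.pt', map_location='cpu'))
model.eval()

# Use model for inference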
```

All files are saved to your Google Drive in the `models/` directory and will persist after the session closes!

## 16. Sample Inference with the Trained Model

In [None]:
# Test inference on a sample image
def predict_emotion(model, processor, image_path, device):
    """
    Predict emotion from image
    """
    model.eval()
    
    # Load image
    img = cv2.imread(str(image_path))
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img_resized = cv2.resize(img_rgb, tuple(CONFIG['image_size']))  # Match the training resize
    
    # Process
    inputs = processor(images=img_resized, return_tensors="pt")
    pixel_values = inputs['pixel_values'].to(device)
    
    # Generate caption
    with torch.no_grad():
        caption_ids = model.generate(pixel_values=pixel_values, max_length=20)
        caption = processor.decode(caption_ids[0], skip_special_tokens=True)
    
    return caption

# Test on sample
if len(test_dataset) > 0:
    sample_path = test_dataset.df.iloc[0]['image_path']
    prediction = predict_emotion(model, processor, sample_path, device)
    print(f"Sample prediction: {prediction}")

## Summary

✓ BLIP model successfully fine-tuned on RAF-DB dataset  
✓ Best model saved to `models/` directory  
✓ Training history and curves saved to `logs/` directory  
✓ Ready for inference and deployment