## VGG16 Model Training & Evaluation

### ⚠️ Training vs Evaluation Mode

**This notebook contains both training and evaluation scripts:**

- **Sections 1-9**: Complete VGG16 training pipeline  
  - Only execute if you want to **re-train the model from scratch**
  - Training takes significant time and computational resources
  - Will overwrite the existing trained model

- **Sections 10+**: Model evaluation only
  - Execute these sections to **evaluate the existing saved model**
  - Loads the saved model and computes accuracy and F1-scores
  - Safe to run without affecting the trained model

**Recommended workflow:**
- **For evaluation**: Skip to Section 10 and execute evaluation cells only
- **For retraining**: Execute all sections 1-9, then continue with evaluation
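
One way to make this workflow harder to get wrong is a guard at the top of the notebook. The sketch below is only an illustration: `RUN_TRAINING` and `should_train` are hypothetical names that do not exist in this notebook, and the model path is the one used by the evaluation section. The idea is that the training cells would run only when the flag is set or when no saved model is found.

```python
from pathlib import Path

# Hypothetical flag (not part of the notebook): set to True only to re-train.
RUN_TRAINING = False

# Path used by the evaluation section of this notebook.
MODEL_PATH = Path("../models/vgg16_transfer_model.pth")


def should_train(run_training: bool, model_path: Path) -> bool:
    """Return True when training is requested or no saved model exists yet."""
    return run_training or not model_path.exists()


if should_train(RUN_TRAINING, MODEL_PATH):
    print("Training cells will run and may overwrite the saved model.")
else:
    print(f"Found {MODEL_PATH}; skip to Section 10 for evaluation.")
```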

---

### 1. Imports 

In [2]:
# Notebook setup - auto-reload utility modules during development
# This enables automatic reloading of changed files in utils/ folder
# without needing to restart the kernel or manually reload modules
%load_ext autoreload
%autoreload 2

import copy
import os
import shutil
import time
import sys
from sklearn.metrics import f1_score
import pandas as pd

sys.path.append('..')  # make the project-level utils/ package importable
from utils.image_utils import copy_image_to_class_folders

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import torchvision
from torchvision import datasets, models, transforms

### 2. Image data preparation
Organize processed images into class-specific directories for model training.

This cell takes preprocessed images and copies them into a structured directory hierarchy suitable for training image classification models. It performs a train/validation split (80/20 by default) and organizes images by their product type code into separate class folders.

In [None]:
#df_image_train = pd.read_csv("df_image_train.csv", index_col="productid")
#copy_image_to_class_folders(df_image_train) # Duration 26apr2025
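
The implementation of `copy_image_to_class_folders` lives in `utils/image_utils.py` and is not shown here. As a rough illustration of the idea only, the sketch below copies files into `train/<class>/` and `val/<class>/` folders with an 80/20 split per class. The function name `copy_to_class_folders`, its parameters, and the per-class shuffling are assumptions, not the project's actual code.

```python
import random
import shutil
from collections import defaultdict
from pathlib import Path


def copy_to_class_folders(items, src_dir, dst_dir, val_fraction=0.2, seed=42):
    """Copy images into dst_dir/{train,val}/<class>/ with a per-class split.

    items: iterable of (filename, class_label) pairs (hypothetical input format).
    Returns a dict with the number of files copied per split.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)

    # Group filenames by class label so each class is split separately.
    by_class = defaultdict(list)
    for filename, label in items:
        by_class[str(label)].append(filename)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    counts = {"train": 0, "val": 0}
    for label, filenames in by_class.items():
        filenames = sorted(filenames)
        rng.shuffle(filenames)
        n_val = round(len(filenames) * val_fraction)
        splits = {"val": filenames[:n_val], "train": filenames[n_val:]}
        for split, names in splits.items():
            target = dst_dir / split / label
            target.mkdir(parents=True, exist_ok=True)
            for name in names:
                shutil.copy2(src_dir / name, target / name)
                counts[split] += 1
    return counts
```

The resulting layout (`train/<class>/...`, `val/<class>/...`) is the structure that `torchvision.datasets.ImageFolder` expects, with one sub-folder per class.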

### 3. Data transformations

In [None]:
# Define data transformations: resize to VGG16's 224x224 input and apply ImageNet normalization
data_transforms = {
    'train': transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.RandomHorizontalFlip(),  # augmentation for the training split only
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

### 3. Load data

In [None]:
# Load the train/val image folders created in Section 2
data_dir = '../data/processed/images/image_train_vgg16'
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x])
                  for x in ['train', 'val']}

dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
                                             shuffle=True, num_workers=4)
              for x in ['train', 'val']}

dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}

# image_datasets is keyed by split name; ImageFolder derives class names from the sub-folders
class_names = image_datasets['train'].classes

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

### 4. Load model

In [None]:
# Load pre-trained VGG16 model
model_ft = models.vgg16(weights='IMAGENET1K_V1')

### 5. Modify the classifier

In [None]:
# Freeze all parameters in the features (convolutional) layers
for param in model_ft.features.parameters():
    param.requires_grad = False

# Replace the final classifier layer with a 27-way output (one per product type code)
num_ftrs = model_ft.classifier[6].in_features
model_ft.classifier[6] = nn.Linear(num_ftrs, 27)

model_ft = model_ft.to(device)  # Move model to GPU if available

### 6. Loss Function and Optimizer

In [None]:
# Set up loss function and optimizer
criterion = nn.CrossEntropyLoss()

# Only optimize parameters that require gradients
optimizer_ft = optim.SGD(filter(lambda p: p.requires_grad, model_ft.parameters()), 
                         lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
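
To make the step decay concrete, the sketch below reproduces the arithmetic of the schedule in plain Python: the learning rate is multiplied by `gamma` once every `step_size` scheduler steps. The helper `step_lr` is a hypothetical name used for illustration, not a PyTorch function.

```python
def step_lr(initial_lr, epoch, step_size=7, gamma=0.1):
    """Learning rate in effect after `epoch` scheduler steps under step decay."""
    return initial_lr * gamma ** (epoch // step_size)


# Values match the settings in this cell: lr=0.001, step_size=7, gamma=0.1
for epoch in (0, 6, 7, 13, 14):
    print(f"epoch {epoch:2d}: lr = {step_lr(0.001, epoch):.6f}")
```

With these settings the rate stays at 0.001 for epochs 0 to 6, so a run shorter than 8 epochs never sees a decay.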

### 7. Training and Evaluation Function

This cell only defines the training function; run the next cell to start training.

In [None]:
# Training function: tracks the weighted F1-score per epoch and keeps the weights
# from the epoch with the best validation F1
def train_model(model, criterion, optimizer, scheduler, num_epochs=5):
    since = time.time()
    
    best_model_wts = copy.deepcopy(model.state_dict())
    best_f1 = 0.0  # Track best F1 instead of accuracy
    
    train_key = 'train'
    val_key = 'val'
    
    for epoch in range(num_epochs):
        print(f'Epoch {epoch}/{num_epochs - 1}')
        print('-' * 10)
        
        # Each epoch has a training and validation phase
        for phase in [train_key, val_key]:
            if phase == train_key:
                model.train()
            else:
                model.eval()
                
            running_loss = 0.0
            running_corrects = 0
            all_preds = []
            all_labels = []
            
            # Iterate over data
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)
                
                optimizer.zero_grad()
                
                with torch.set_grad_enabled(phase == train_key):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)
                    
                    if phase == train_key:
                        loss.backward()
                        optimizer.step()
                
                # Collect predictions and labels for F1 calculation
                all_preds.extend(preds.cpu().numpy())
                all_labels.extend(labels.cpu().numpy())
                
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            
            if phase == train_key:
                scheduler.step()
                
            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]
            epoch_f1 = f1_score(all_labels, all_preds, average='weighted')
            
            print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f} F1: {epoch_f1:.4f}')
            
            # Save best model based on F1-score
            if phase == val_key and epoch_f1 > best_f1:
                best_f1 = epoch_f1
                best_model_wts = copy.deepcopy(model.state_dict())
        
        print()
    
    time_elapsed = time.time() - since
    print(f'Training complete in {time_elapsed // 60:.0f}m {time_elapsed % 60:.0f}s')
    print(f'Best val F1: {best_f1:.4f}')
    
    model.load_state_dict(best_model_wts)
    return model

In [None]:
# Train and evaluate the model
model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler, num_epochs=1)

### 8. Save the trained model

In [None]:
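# Save to the path that Section 10 loads from (this overwrites any existing model file)
os.makedirs('../models', exist_ok=True)
torch.save(model_ft.state_dict(), '../models/vgg16_transfer_model.pth')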

### 9. Future Improvements for VGG16 Training

**Training configuration optimizations:**
- **Increase epochs**: Current training uses only 1 epoch - should use 10-25 epochs for meaningful training
- **Larger batch size**: Current batch_size=4 could be increased to 16-32 for better gradient estimates (depending on GPU memory)
- **Validation monitoring**: Add early stopping and validation loss tracking during training
- **Learning rate optimization**: Experiment with different initial learning rates (0.01, 0.001, 0.0001)
- **Data augmentation**: Consider additional augmentations like rotation, color jitter, or random crops
- **Class balancing**: Implement weighted loss function to handle class imbalance in the dataset
- **Cross-validation**: Implement k-fold cross-validation for more robust performance estimates

**Note**: These improvements should be considered for future model retraining when computational resources allow.
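
As an illustration of the early-stopping item above, here is a minimal sketch of a patience-based stopper that could be called once per epoch with the validation F1-score. The class name `EarlyStopping`, its parameters, and the scores in the usage loop are made up for the example; they are not results from this notebook.

```python
class EarlyStopping:
    """Stop training when a monitored score has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_score = None
        self.epochs_without_improvement = 0

    def step(self, score):
        """Record this epoch's score; return True when training should stop."""
        if self.best_score is None or score > self.best_score + self.min_delta:
            self.best_score = score
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Illustrative validation scores only
stopper = EarlyStopping(patience=2)
for epoch, val_f1 in enumerate([0.40, 0.48, 0.47, 0.48, 0.46]):
    if stopper.step(val_f1):
        print(f"Stopping after epoch {epoch}")
        break
```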

---

## 🔄 Model Evaluation (Start here for existing model)

**The following sections load and evaluate the existing trained VGG16 model without retraining.**

In [3]:
### 10. Load Existing Trained Model (Skip Training)
# Load the previously saved model instead of re-training
print("Loading existing trained VGG16 model...")

# Check if GPU is available and set device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Initialize the model architecture (same as training setup).
# weights=None skips the ImageNet download; load_state_dict below overwrites every parameter.
model_ft = models.vgg16(weights=None)

# Freeze feature layers (same as training)
for param in model_ft.features.parameters():
    param.requires_grad = False
    
# Replace final classifier layer (same as training)
num_ftrs = model_ft.classifier[6].in_features
model_ft.classifier[6] = nn.Linear(num_ftrs, 27)
model_ft = model_ft.to(device)

# Load the trained weights
model_path = '../models/vgg16_transfer_model.pth'
model_ft.load_state_dict(torch.load(model_path, map_location=device))
model_ft.eval()

print("✅ Model loaded successfully!")
print(f"Model loaded from: {model_path}")
print(f"Device: {device}")

Loading existing trained VGG16 model...
Using device: cpu
✅ Model loaded successfully!
Model loaded from: ../models/vgg16_transfer_model.pth
Device: cpu


Adjust `batch_size` and `num_workers` in the next cell to match your available compute.
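
If you are unsure where to start, a simple heuristic is to derive the worker count from the number of CPU cores. The helper below is a hypothetical sketch (the name `suggest_num_workers` and its defaults are assumptions), and the best value still depends on disk speed and memory. A value of 0 means the data is loaded in the main process.

```python
import os


def suggest_num_workers(reserve=1, cap=8):
    """Heuristic: available CPU cores minus a reserve, capped, never below 0."""
    cores = os.cpu_count() or 1
    return max(0, min(cap, cores - reserve))


print(f"Suggested num_workers: {suggest_num_workers()}")
```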

In [4]:
### 11. Setup Data for Evaluation with OpenCV (PIL-free)
import cv2
import torch
from torch.utils.data import Dataset
import os
import numpy as np

class OpenCVImageFolder(Dataset):
    """Custom dataset that uses OpenCV and bypasses PIL completely"""
    
    def __init__(self, root_dir, transform=None):
        self.root_dir = root_dir
        self.transform = transform
        self.classes = sorted([d for d in os.listdir(root_dir) 
                              if os.path.isdir(os.path.join(root_dir, d)) and not d.startswith('.')])
        self.class_to_idx = {cls: idx for idx, cls in enumerate(self.classes)}
        
        # Get all image paths
        self.samples = []
        for class_name in self.classes:
            class_dir = os.path.join(root_dir, class_name)
            if os.path.isdir(class_dir):
                for img_name in os.listdir(class_dir):
                    if img_name.lower().endswith(('.jpg', '.jpeg')):
                        img_path = os.path.join(class_dir, img_name)
                        self.samples.append((img_path, self.class_to_idx[class_name]))
    
    def __len__(self):
        return len(self.samples)
    
    def __getitem__(self, idx):
        img_path, label = self.samples[idx]
        
        # Use OpenCV to load image
        img = cv2.imread(img_path)
        if img is None:
            raise ValueError(f"Could not load image: {img_path}")
        
        # Convert BGR to RGB
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        
        # Resize to 224x224
        img = cv2.resize(img, (224, 224))
        
        # Convert to PyTorch tensor directly (bypass PIL)
        img = torch.from_numpy(img).permute(2, 0, 1).float() / 255.0
        
        # Apply ImageNet normalization
        mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
        std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
        img = (img - mean) / std
        
        return img, label

# Load data (no transforms needed since everything is done in dataset)
data_dir = '../data/processed/images/image_train_vgg16'
image_datasets = {x: OpenCVImageFolder(os.path.join(data_dir, x))
                  for x in ['train', 'val']}

dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=16,
                                             shuffle=False, num_workers=0)
              for x in ['train', 'val']}

dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_names = image_datasets['train'].classes

print(f"Dataset sizes - Train: {dataset_sizes['train']:,}, Val: {dataset_sizes['val']:,}")
print(f"Number of classes: {len(class_names)}")

Dataset sizes - Train: 66,551, Val: 16,671
Number of classes: 27


In [5]:
### 12. Comprehensive Model Evaluation
from sklearn.metrics import classification_report, f1_score
import json
import numpy as np

def evaluate_model_comprehensive(model, dataloader, class_names, device):
    """Comprehensive evaluation with F1-score and classification report"""
    model.eval()
    
    all_preds = []
    all_labels = []
    running_corrects = 0
    total_samples = 0
    
    print("Evaluating model...")
    with torch.no_grad():
        for i, (inputs, labels) in enumerate(dataloader):
            inputs = inputs.to(device)
            labels = labels.to(device)
            
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)
            
            all_preds.extend(preds.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())
            
            running_corrects += torch.sum(preds == labels.data)
            total_samples += labels.size(0)
            
            # Progress indicator
            if (i + 1) % 100 == 0:
                print(f"Processed {i + 1} batches...")
    
    # Calculate metrics
    accuracy = running_corrects.double() / total_samples
    f1_weighted = f1_score(all_labels, all_preds, average='weighted')
    f1_macro = f1_score(all_labels, all_preds, average='macro')
    
    # Classification report
    class_report = classification_report(
        all_labels, all_preds, 
        target_names=class_names, 
        output_dict=True
    )
    
    return {
        'accuracy': float(accuracy),
        'f1_weighted': f1_weighted,
        'f1_macro': f1_macro,
        'classification_report': class_report,
        'predictions': all_preds,
        'true_labels': all_labels
    }

# Evaluate on validation set
print("=" * 50)
print("VGG16 Model Evaluation")
print("=" * 50)

val_results = evaluate_model_comprehensive(
    model_ft, 
    dataloaders['val'], 
    class_names, 
    device
)

print(f"\n📊 Validation Results:")
print(f"Accuracy: {val_results['accuracy']:.4f}")
print(f"F1-Score (Weighted): {val_results['f1_weighted']:.4f}")
print(f"F1-Score (Macro): {val_results['f1_macro']:.4f}")

VGG16 Model Evaluation
Evaluating model...
Processed 100 batches...
Processed 200 batches...
Processed 300 batches...
Processed 400 batches...
Processed 500 batches...
Processed 600 batches...
Processed 700 batches...
Processed 800 batches...
Processed 900 batches...
Processed 1000 batches...

📊 Validation Results:
Accuracy: 0.5382
F1-Score (Weighted): 0.5183
F1-Score (Macro): 0.4603
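
The weighted F1 (0.5183) is higher than the macro F1 (0.4603), which may indicate that classes with fewer samples score lower than the larger ones. The sketch below recomputes both averages in plain Python on a made-up toy example to show where the difference comes from: macro averages the per-class F1 equally, while weighted averages it by class support. The function names and the toy labels are illustrative assumptions, not data from this notebook.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: (f1, support)} computed from paired label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    support = Counter(y_true)
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1 = 2 * tp[label] / denom if denom else 0.0
        scores[label] = (f1, support[label])
    return scores


def macro_and_weighted_f1(y_true, y_pred):
    """Macro: plain mean of per-class F1. Weighted: mean weighted by support."""
    scores = f1_per_class(y_true, y_pred)
    macro = sum(f1 for f1, _ in scores.values()) / len(scores)
    total = sum(n for _, n in scores.values())
    weighted = sum(f1 * n for f1, n in scores.values()) / total
    return macro, weighted


# Toy example: class 0 is large and well predicted, class 1 is small and half wrong
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [0, 1]
macro, weighted = macro_and_weighted_f1(y_true, y_pred)
print(f"macro={macro:.4f} weighted={weighted:.4f}")
```

In the toy example the small class pulls the macro average down more than the weighted one, which is the same pattern seen in the validation results above.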


In [6]:
### 13. Save Results to Pipeline
# Save comprehensive results following the established format
results_data = {
    'model_type': 'VGG16_Transfer_Learning',
    'dataset': 'validation_split',
    'evaluation_date': pd.Timestamp.now().isoformat(),
    'model_path': model_path,
    'metrics': {
        'accuracy': val_results['accuracy'],
        'f1_score_weighted': val_results['f1_weighted'],
        'f1_score_macro': val_results['f1_macro']
    },
    'classification_report': val_results['classification_report'],
    'model_parameters': {
        'architecture': 'VGG16',
        'transfer_learning': True,
        'frozen_features': True,
        'num_classes': 27,
        'input_size': '224x224',
        'batch_size': dataloaders['val'].batch_size,  # evaluation batch size actually used (16)
        'preprocessing': 'ImageNet_normalization'
    },
    'data_split': {
        'validation_samples': dataset_sizes['val'],
        'training_samples': dataset_sizes['train'],
        'split_method': 'stratified_80_20_random_state_42'
    },
    'benchmark_comparison': {
        'official_resnet50_f1': 0.5534,
        'our_vgg16_f1': val_results['f1_weighted'],
        'improvement': val_results['f1_weighted'] - 0.5534
    }
}

# Save to results folder
results_path = '../results/vgg16_model_results.json'
os.makedirs('../results', exist_ok=True)

with open(results_path, 'w') as f:
    json.dump(results_data, f, indent=2)

print(f"\n✅ Results saved to {results_path}")
print(f"🎯 VGG16 F1-Score: {val_results['f1_weighted']:.3f}")
print(f"📈 vs Official Benchmark: {val_results['f1_weighted'] - 0.5534:+.3f}")

# Display classification report summary
print(f"\n📋 Classification Report Summary:")
print(f"Classes: {len(class_names)}")
print(f"Support: {sum(val_results['classification_report'][cls]['support'] for cls in class_names)}")


✅ Results saved to ../results/vgg16_model_results.json
🎯 VGG16 F1-Score: 0.518
📈 vs Official Benchmark: -0.035

📋 Classification Report Summary:
Classes: 27
Support: 16671.0
