# ResNet50 Waste Classification Notebook

## Introduction

This notebook fine-tunes a pre-trained ResNet50 for waste classification. The model is trained on the `preprocessed_Public` dataset (split 80/20 into training and validation subsets) and evaluated on two separate test datasets.

**Training Dataset:** `preprocessed_Public`  
**Test Dataset 1:** `preprocessed_self`  
**Test Dataset 2:** `SelfCollected_Dataset`

Key features:
- Fine-tuning of pre-trained ResNet50
- Data augmentation and class weight balancing
- Early stopping and learning rate scheduling
- Comprehensive evaluation on multiple test sets with confusion matrix and metrics
- Automatic model saving

## Dependencies

The following cells install the required Python packages. The first installs a PyTorch build with CUDA 12.1 support; the GPU is used only when a compatible device is available at runtime.

In [1]:
# Install PyTorch with CUDA support for GPU acceleration
# This installs PyTorch with CUDA 12.1 support. For other CUDA versions, visit https://pytorch.org/get-started/locally/
import subprocess
import sys

subprocess.check_call([sys.executable, "-m", "pip", "install", "--quiet", "torch", "torchvision", "torchaudio", "--index-url", "https://download.pytorch.org/whl/cu121"])
print("PyTorch with CUDA 12.1 installed successfully!")

PyTorch with CUDA 12.1 installed successfully!


In [2]:
%pip install scikit-learn matplotlib seaborn pandas

Note: you may need to restart the kernel to use updated packages.


# ResNet50 Waste Classification Model

## Overview
This notebook implements a fine-tuned ResNet50 model for waste classification (Paper, Plastic, Aluminum) with advanced training techniques including early stopping, class weight balancing, and comprehensive metrics tracking.

**Data Strategy:**
- **Training:** preprocessed_Public dataset (80% train, 20% validation split)
- **Testing:** Two separate test datasets (preprocessed_self and SelfCollected_Dataset)

---

## Cell 1: Setup & Configuration

In [3]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, models, transforms
from torch.utils.data import DataLoader, Subset
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix
import time
import numpy as np
import os

# ==========================================
# 1. CONFIGURATION
# ==========================================
# Training data: preprocessed_Public dataset
DATA_DIR = '../Dataset/preprocessed_Public'

# Test datasets
TEST_DIR_1 = '../Dataset/preprocessed_self'
TEST_DIR_2 = '../Dataset/SelfCollected_Dataset'
# Model save path
MODEL_SAVE_PATH = 'waste_classifier_resnet50_final.pth'

BATCH_SIZE = 8             # Reduced slightly for 512x512 images to avoid memory errors
LEARNING_RATE = 1e-4        # Lower learning rate for fine-tuning
NUM_EPOCHS = 15
NUM_CLASSES = 3             # paper, plastic, aluminum
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Early Stopping Configuration
PATIENCE = 5                # Number of epochs to wait before stopping
MIN_DELTA = 0.001          # Minimum change to qualify as an improvement

print(f"Using device: {DEVICE}")
print(f"\nDataset Configuration:")
print(f"  Training: {DATA_DIR}")
print(f"  Test Set 1: {TEST_DIR_1}")
print(f"  Test Set 2: {TEST_DIR_2}")
print(f"  Model will be saved to: {MODEL_SAVE_PATH}")

Using device: cpu

Dataset Configuration:
  Training: ../Dataset/preprocessed_Public
  Test Set 1: ../Dataset/preprocessed_self
  Test Set 2: ../Dataset/SelfCollected_Dataset
  Model will be saved to: waste_classifier_resnet50_final.pth


## Cell 2: Data Preparation with Class Weight Analysis

This section loads the `preprocessed_Public` dataset, splits it 80/20 into stratified training and validation subsets, and computes inverse-frequency class weights from the training subset to counter class imbalance.
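
`datasets.ImageFolder` takes its class names from the immediate subdirectories of the root folder, sorted alphabetically. The following is a minimal sketch of a pre-load check, assuming the class folders are named `paper`, `plastic`, and `aluminum` (the actual folder names are not shown in this notebook); `check_class_folders` is a hypothetical helper, not part of torchvision.

```python
import os
import tempfile

def check_class_folders(root, expected_classes):
    """Return the sorted subfolder names of `root`, raising if they differ
    from `expected_classes` (compared case-insensitively)."""
    found = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    found_lower = sorted(name.lower() for name in found)
    expected_lower = sorted(name.lower() for name in expected_classes)
    if found_lower != expected_lower:
        raise ValueError(
            f"Expected class folders {sorted(expected_classes)} under '{root}', "
            f"found {found}. The root may point one level too high."
        )
    return found

EXPECTED = ["paper", "plastic", "aluminum"]  # assumed folder names

# Demo on throwaway directory trees (hypothetical layouts).
with tempfile.TemporaryDirectory() as tmp:
    for name in EXPECTED:
        os.makedirs(os.path.join(tmp, name))
    good = check_class_folders(tmp, EXPECTED)
    print(good)

with tempfile.TemporaryDirectory() as tmp:
    for name in ("train", "val", "test"):
        os.makedirs(os.path.join(tmp, name))
    try:
        check_class_folders(tmp, EXPECTED)
        bad_detected = False
    except ValueError as err:
        bad_detected = True
        print(err)
```

In the run logged in this notebook, the detected classes are `['test', 'train', 'val']`, which suggests that `DATA_DIR` contains split folders rather than class folders. A check of this kind would flag that before any training time is spent.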

In [4]:
# Training transforms: Resize + Augmentation (Flip, Rotate, Color Jitter)
train_transform = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.1, contrast=0.1), # Adds robustness to lighting
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# Validation transforms: Resize only
val_transform = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

try:
    # Load dataset twice: once for train (with augmentation), once for val (clean)
    full_data_train = datasets.ImageFolder(DATA_DIR, transform=train_transform)
    full_data_val = datasets.ImageFolder(DATA_DIR, transform=val_transform)

    # Get class names
    class_names = full_data_train.classes
    print(f"Classes detected: {class_names}")

    # Create indices for split (80% Train, 20% Val)
    train_idx, val_idx = train_test_split(
        list(range(len(full_data_train))),
        test_size=0.2,
        random_state=42,
        stratify=full_data_train.targets  # Preserve class proportions in both splits
    )

    # Create subsets
    train_dataset = Subset(full_data_train, train_idx)
    val_dataset = Subset(full_data_val, val_idx)

    # Data Loaders
    dataloaders = {
        'train': DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=2),
        'val': DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False, num_workers=2)
    }
    dataset_sizes = {'train': len(train_dataset), 'val': len(val_dataset)}
    
    print(f"\nDataset Split:")
    print(f"  Training: {dataset_sizes['train']} images")
    print(f"  Validation: {dataset_sizes['val']} images")

except Exception as e:
    print("\nERROR: Could not find dataset!")
    print(f"Make sure you have a folder named '{DATA_DIR}' with subfolders for each class.")
    print(f"Error details: {str(e)}")
    raise

# ==========================================
# CALCULATE CLASS WEIGHTS (Handle Imbalanced Data)
# ==========================================
print("\nCalculating class weights for imbalanced dataset...")

# Count samples per class in training set
class_counts = np.zeros(NUM_CLASSES)
for idx in train_idx:
    label = full_data_train.targets[idx]
    class_counts[label] += 1

print(f"Class distribution in training set:")
for i, class_name in enumerate(class_names):
    print(f"  {class_name}: {int(class_counts[i])} samples")

# Calculate weights inversely proportional to class frequency
# Gives more weight to underrepresented classes
total_samples = np.sum(class_counts)
class_weights = total_samples / (NUM_CLASSES * class_counts)
class_weights = torch.tensor(class_weights, dtype=torch.float32).to(DEVICE)

print(f"\nClass weights (normalized):")
for i, class_name in enumerate(class_names):
    print(f"  {class_name}: {class_weights[i]:.4f}")

Classes detected: ['test', 'train', 'val']

Dataset Split:
  Training: 6654 images
  Validation: 1664 images

Calculating class weights for imbalanced dataset...
Class distribution in training set:
  test: 322 samples
  train: 6009 samples
  val: 323 samples

Class weights (normalized):
  test: 6.8882
  train: 0.3691
  val: 6.8669
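
The weights printed above follow the inverse-frequency formula `N / (K * n_c)` used in the cell, where `N` is the number of training samples, `K` the number of classes, and `n_c` the count for class `c`. As a worked check, this sketch recomputes them with NumPy from the logged counts; the variable names are illustrative.

```python
import numpy as np

# Training-set counts copied from the logged output above.
class_counts = np.array([322, 6009, 323], dtype=np.float64)
num_classes = len(class_counts)

# weight_c = N / (K * n_c)
class_weights = class_counts.sum() / (num_classes * class_counts)
print(np.round(class_weights, 4))

# Each class ends up with the same total weight: n_c * weight_c = N / K.
per_class_total = class_counts * class_weights
print(per_class_total)
```

With this weighting, every class contributes the same total weight to the loss regardless of how many samples it has.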


## Cell 3: Model Setup (ResNet50 + Dropout)

Initialize ResNet50 with pretrained ImageNet weights and add a custom classifier head.

In [5]:
print("\nInitializing ResNet50...")
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Modify the final layer (The Classifier)
# ResNet50's default input to the final layer is 2048 features
num_ftrs = model.fc.in_features

model.fc = nn.Sequential(
    nn.Dropout(0.5),            # Strong dropout to prevent overfitting
    nn.Linear(num_ftrs, 512),   # Add an intermediate layer
    nn.ReLU(),
    nn.Dropout(0.3),            # Mild dropout
    nn.Linear(512, NUM_CLASSES) # Final output (3 classes)
)

model = model.to(DEVICE)


Initializing ResNet50...
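
To give a sense of the size of the new classifier head, this short sketch counts its trainable parameters by hand. It assumes the 2048-feature input noted in the cell and the layer sizes defined there; `linear_params` is a hypothetical helper.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

NUM_FTRS = 2048     # feature width feeding model.fc in ResNet50
HIDDEN = 512        # intermediate layer size used in the head
NUM_CLASSES = 3

# Dropout and ReLU layers have no parameters, so only the two Linear layers count.
head_params = linear_params(NUM_FTRS, HIDDEN) + linear_params(HIDDEN, NUM_CLASSES)
print(head_params)
```

The backbone is trained as well, because the optimizer in the next cell receives `model.parameters()` rather than only the head's parameters.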


## Cell 4: Training Setup with Early Stopping & Class Weights

Configure the loss function with class weights, optimizer, scheduler, and early stopping mechanism.
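
To illustrate how class weights enter the loss, here is a NumPy sketch of weighted cross-entropy with mean reduction, where the batch loss is the weighted sum of per-sample losses divided by the sum of the weights of the target classes. It is a re-implementation for illustration, not PyTorch's code, and the logits are made up.

```python
import numpy as np

def weighted_cross_entropy(logits, targets, class_weights):
    """Weighted CE with mean reduction: sum(w[y_i] * loss_i) / sum(w[y_i])."""
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets)
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_sample = -log_probs[np.arange(len(targets)), targets]
    sample_weights = np.asarray(class_weights, dtype=np.float64)[targets]
    return float((sample_weights * per_sample).sum() / sample_weights.sum())

weights = np.array([6.8882, 0.3691, 6.8669])   # weights logged in Cell 2
targets = np.array([0, 1])

# Uniform predictions: every sample has loss ln(3), so the weighted mean is ln(3) too.
loss_uniform = weighted_cross_entropy(np.zeros((2, 3)), targets, weights)

# Both samples predicted as class 1: the rare class 0 sample is wrong, the common one is right.
logits = np.array([[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]])
loss_weighted = weighted_cross_entropy(logits, targets, weights)
loss_unweighted = weighted_cross_entropy(logits, targets, np.ones(3))
print(loss_uniform, loss_weighted, loss_unweighted)
```

The weighted loss is pulled toward the error on the under-represented class, which is the intended effect of passing `weight=class_weights` to the criterion.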

In [6]:
# ==========================================
# LOSS FUNCTION WITH CLASS WEIGHTS
# ==========================================
# CrossEntropyLoss with class weights to handle imbalanced data
criterion = nn.CrossEntropyLoss(weight=class_weights)

# Adam often converges in fewer epochs than plain SGD when fine-tuning, so it is used here
optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)

# Scheduler: if validation accuracy doesn't improve, lower the learning rate
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='max', factor=0.1, patience=3)

# ==========================================
# EARLY STOPPING CLASS
# ==========================================
class EarlyStopping:
    """
    Stops training when validation metric stops improving.
    Saves the best model weights.
    """
    def __init__(self, patience=5, min_delta=0.001, verbose=True):
        self.patience = patience
        self.min_delta = min_delta
        self.verbose = verbose
        self.counter = 0
        self.best_score = None
        self.early_stop = False
        self.best_model_wts = None
        
    def __call__(self, val_acc, model):
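        if self.best_score is None:
            self.best_score = val_acc
            # Clone every tensor: state_dict().copy() is shallow and would keep pointing at the live weights
            self.best_model_wts = {k: v.detach().clone() for k, v in model.state_dict().items()}
        elif val_acc > self.best_score + self.min_delta:
            self.best_score = val_acc
            self.counter = 0
            self.best_model_wts = {k: v.detach().clone() for k, v in model.state_dict().items()}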
            if self.verbose:
                print(f"    ✓ Validation improved! New best accuracy: {val_acc:.4f}")
        else:
            self.counter += 1
            if self.verbose:
                print(f"    ✗ No improvement. Patience: {self.counter}/{self.patience}")
            if self.counter >= self.patience:
                self.early_stop = True
                if self.verbose:
                    print(f"    ⚠ EARLY STOPPING TRIGGERED after {self.patience} epochs without improvement!")

# Initialize early stopping
early_stopping = EarlyStopping(patience=PATIENCE, min_delta=MIN_DELTA, verbose=True)

# ==========================================
# TRAINING LOOP WITH METRICS TRACKING
# ==========================================
def train_model(model, criterion, optimizer, scheduler, early_stopping, num_epochs=10):
    """
    Train the model with early stopping and comprehensive metrics tracking.
    
    Tracks:
    - Loss & Accuracy
    - Precision, Recall, F1 Score
    - Best model checkpoint
    """
    since = time.time()
    
    best_acc = 0.0
    history = {
        'train_loss': [], 'val_loss': [],
        'train_acc': [], 'val_acc': [],
        'train_precision': [], 'val_precision': [],
        'train_recall': [], 'val_recall': [],
        'train_f1': [], 'val_f1': []
    }

    for epoch in range(num_epochs):
        print(f'\n{"="*70}')
        print(f'Epoch {epoch+1}/{num_epochs}')
        print(f'{"="*70}')

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0
            batch_count = 0
            
            # Store predictions and labels for metrics calculation
            all_preds = []
            all_labels = []

            # Iterate over data
            for batch_idx, (inputs, labels) in enumerate(dataloaders[phase]):
                inputs = inputs.to(DEVICE)
                labels = labels.to(DEVICE)

                # Zero the parameter gradients
                optimizer.zero_grad()

                # Forward
                # Track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # Backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # Statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels)
                batch_count += 1
                
                # Store predictions and labels for metrics
                all_preds.extend(preds.cpu().numpy())
                all_labels.extend(labels.cpu().numpy())
                
                # Print batch progress every 5 batches
                if (batch_idx + 1) % 5 == 0 or batch_idx == 0:
                    samples_seen = len(all_labels)  # exact count, also correct for a short final batch
                    current_loss = running_loss / samples_seen
                    current_acc = running_corrects.double() / samples_seen
                    num_batches = len(dataloaders[phase])
                    print(f'  {phase.upper()} | Batch {batch_idx+1}/{num_batches} | Loss: {current_loss:.4f} | Acc: {current_acc:.4f}')

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]
            
            # Calculate additional metrics
            all_preds = np.array(all_preds)
            all_labels = np.array(all_labels)
            
            epoch_precision = precision_score(all_labels, all_preds, average='weighted', zero_division=0)
            epoch_recall = recall_score(all_labels, all_preds, average='weighted', zero_division=0)
            epoch_f1 = f1_score(all_labels, all_preds, average='weighted', zero_division=0)

            # Get current learning rate
            current_lr = optimizer.param_groups[0]['lr']
            
            print(f'\n  {phase.upper()} SUMMARY')
            print(f'    Loss: {epoch_loss:.4f} | Acc: {epoch_acc:.4f}')
            print(f'    Precision: {epoch_precision:.4f} | Recall: {epoch_recall:.4f} | F1: {epoch_f1:.4f}')
            print(f'    Learning Rate: {current_lr:.2e}')
            
            # Store metrics in history
            history[f'{phase}_loss'].append(epoch_loss)
            history[f'{phase}_acc'].append(epoch_acc.item())
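            # Cast to built-in float so the saved history contains only plain Python types
            history[f'{phase}_precision'].append(float(epoch_precision))
            history[f'{phase}_recall'].append(float(epoch_recall))
            history[f'{phase}_f1'].append(float(epoch_f1))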

            # Track the best validation accuracy (the weights themselves are checkpointed by early_stopping)
            if phase == 'val':
                if epoch_acc > best_acc:
                    best_acc = epoch_acc
                    print(f'  *** NEW BEST MODEL! Validation Accuracy: {epoch_acc:.4f} ***')
                
                # Check early stopping and step scheduler
                early_stopping(epoch_acc.item(), model)
                scheduler.step(epoch_acc)
                
                if early_stopping.early_stop:
                    print(f"\nTraining stopped early at epoch {epoch+1}/{num_epochs}")
                    break
        
        if early_stopping.early_stop:
            break

    time_elapsed = time.time() - since
    print(f'\n{"="*70}')
    print(f'Training complete in {time_elapsed // 60:.0f}m {time_elapsed % 60:.0f}s')
    print(f'Best val Acc: {best_acc:.4f}')
    print(f'{"="*70}')

    # Load best model weights
    model.load_state_dict(early_stopping.best_model_wts)
    return model, history

# ==========================================
# RUN TRAINING & SAVE MODEL
# ==========================================
if __name__ == '__main__':
    print("\n" + "="*70)
    print("STARTING TRAINING")
    print("="*70)
    
    # Train
    trained_model, training_history = train_model(model, criterion, optimizer, scheduler, early_stopping, NUM_EPOCHS)
    
    # Save the trained model
    print(f"\n{'='*70}")
    print("SAVING MODEL")
    print("="*70)
    
    # Save model state dict (recommended approach)
    torch.save({
        'model_state_dict': trained_model.state_dict(),
        'class_names': class_names,
        'training_history': training_history,
        'model_config': {
            'num_classes': NUM_CLASSES,
            'architecture': 'ResNet50'
        }
    }, MODEL_SAVE_PATH)
    
    print(f"✓ Model successfully saved to: {MODEL_SAVE_PATH}")
    print(f"  File size: {os.path.getsize(MODEL_SAVE_PATH) / (1024*1024):.2f} MB")
    print(f"\nModel includes:")
    print(f"  - Trained weights (state_dict)")
    print(f"  - Class names: {class_names}")
    print(f"  - Training history (loss, accuracy, metrics)")
    print("="*70)


STARTING TRAINING

Epoch 1/15
  TRAIN | Batch 1/832 | Loss: 1.0864 | Acc: 0.1250
  TRAIN | Batch 5/832 | Loss: 1.0684 | Acc: 0.5750
  TRAIN | Batch 10/832 | Loss: 1.0920 | Acc: 0.6750
  TRAIN | Batch 15/832 | Loss: 1.0416 | Acc: 0.7667
  TRAIN | Batch 20/832 | Loss: 1.0369 | Acc: 0.7812
  TRAIN | Batch 25/832 | Loss: 1.0019 | Acc: 0.8050
  TRAIN | Batch 30/832 | Loss: 0.9994 | Acc: 0.8250
  TRAIN | Batch 35/832 | Loss: 0.9853 | Acc: 0.8393
  TRAIN | Batch 40/832 | Loss: 0.9633 | Acc: 0.8469
  TRAIN | Batch 45/832 | Loss: 0.9724 | Acc: 0.8528
  TRAIN | Batch 50/832 | Loss: 0.9708 | Acc: 0.8600
  TRAIN | Batch 55/832 | Loss: 1.0056 | Acc: 0.8568
  TRAIN | Batch 60/832 | Loss: 1.0037 | Acc: 0.8646
  TRAIN | Batch 65/832 | Loss: 1.0089 | Acc: 0.8577
  TRAIN | Batch 70/832 | Loss: 1.0188 | Acc: 0.8554
  TRAIN | Batch 75/832 | Loss: 1.0017 | Acc: 0.8617
  TRAIN | Batch 80/832 | Loss: 1.0176 | Acc: 0.8562
  TRAIN | Batch 85/832 | Loss: 1.0207 | Acc: 0.8559
  TRAIN | Batch 90/832 | Loss: 1.03

KeyboardInterrupt: 
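
The training cell steps both `ReduceLROnPlateau` (patience 3) and `EarlyStopping` (`PATIENCE = 5`) on validation accuracy, so the two interact. The sketch below replays a flat accuracy sequence through simplified versions of both rules to show the timing. The scheduler part is an approximation that assumes the default relative threshold of `1e-4` and no cooldown; `simulate_plateau` is a hypothetical helper, not library code.

```python
def simulate_plateau(val_accs, lr=1e-4, sched_patience=3, factor=0.1,
                     stop_patience=5, min_delta=0.001, rel_threshold=1e-4):
    """Replay validation accuracies through a simplified plateau scheduler
    (mode='max') and the EarlyStopping rule defined in this notebook."""
    sched_best, bad_epochs = float("-inf"), 0
    stop_best, counter = None, 0
    events = []
    for epoch, acc in enumerate(val_accs, start=1):
        # Early-stopping rule (checked first, as in the training loop).
        if stop_best is None:
            stop_best = acc
        elif acc > stop_best + min_delta:
            stop_best, counter = acc, 0
        else:
            counter += 1
        # Simplified scheduler: improvement must exceed a relative threshold.
        if acc > sched_best * (1 + rel_threshold):
            sched_best, bad_epochs = acc, 0
        else:
            bad_epochs += 1
        if bad_epochs > sched_patience:
            lr *= factor
            bad_epochs = 0
            events.append(("lr_reduced", epoch, lr))
        if counter >= stop_patience:
            events.append(("early_stop", epoch, lr))
            break
    return events

# Validation accuracy that never improves after the first epoch.
events = simulate_plateau([0.90] * 10)
for event in events:
    print(event)
```

Under these assumptions, a flat plateau cuts the learning rate after epoch 5 and stops training after epoch 6, leaving a single epoch at the reduced rate. If more time at the lower rate is wanted, a larger `PATIENCE` relative to the scheduler's patience may be worth trying.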

## Cell 5: Training History Visualization

Visualize the training and validation metrics over epochs.

In [None]:
import matplotlib.pyplot as plt

# Plot training history
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Loss plot
axes[0, 0].plot(training_history['train_loss'], label='Train', marker='o')
axes[0, 0].plot(training_history['val_loss'], label='Val', marker='s')
axes[0, 0].set_xlabel('Epoch')
axes[0, 0].set_ylabel('Loss')
axes[0, 0].set_title('Training & Validation Loss')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Accuracy plot
axes[0, 1].plot(training_history['train_acc'], label='Train', marker='o')
axes[0, 1].plot(training_history['val_acc'], label='Val', marker='s')
axes[0, 1].set_xlabel('Epoch')
axes[0, 1].set_ylabel('Accuracy')
axes[0, 1].set_title('Training & Validation Accuracy')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Precision, Recall, F1 plot
axes[1, 0].plot(training_history['train_precision'], label='Train Precision', marker='o')
axes[1, 0].plot(training_history['val_precision'], label='Val Precision', marker='s')
axes[1, 0].plot(training_history['train_recall'], label='Train Recall', marker='^')
axes[1, 0].plot(training_history['val_recall'], label='Val Recall', marker='d')
axes[1, 0].set_xlabel('Epoch')
axes[1, 0].set_ylabel('Score')
axes[1, 0].set_title('Precision & Recall')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# F1 Score plot
axes[1, 1].plot(training_history['train_f1'], label='Train', marker='o')
axes[1, 1].plot(training_history['val_f1'], label='Val', marker='s')
axes[1, 1].set_xlabel('Epoch')
axes[1, 1].set_ylabel('F1 Score')
axes[1, 1].set_title('F1 Score (Weighted Average)')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Cell 6: Model Evaluation on Test Set 1 (preprocessed_self)

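This cell loads the saved checkpoint, runs inference on the first test dataset (`preprocessed_self`), and writes the predictions, classification report, confusion matrix, and summary metrics to disk.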
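
The per-class "accuracy" reported by this cell is the diagonal of the confusion matrix divided by the row sum, which is per-class recall. The following sketch computes recall and precision side by side from a confusion matrix in scikit-learn's layout (rows are true labels, columns are predictions). The matrix is hypothetical, and `per_class_metrics` is an illustrative helper.

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class recall and precision from a confusion matrix whose rows are
    true labels and whose columns are predicted labels."""
    cm = np.asarray(cm, dtype=np.float64)
    diag = np.diag(cm)
    row_sums = cm.sum(axis=1)   # samples that truly belong to each class
    col_sums = cm.sum(axis=0)   # samples predicted as each class
    recall = np.divide(diag, row_sums, out=np.zeros_like(diag), where=row_sums > 0)
    precision = np.divide(diag, col_sums, out=np.zeros_like(diag), where=col_sums > 0)
    return recall, precision

# Hypothetical 3-class matrix, for illustration only.
cm_demo = np.array([[8, 1, 1],
                    [2, 6, 2],
                    [0, 0, 10]])
recall, precision = per_class_metrics(cm_demo)
print("recall:   ", recall)
print("precision:", precision)
```

A class can have perfect recall and still have low precision when other classes are frequently misclassified as it, so both views are worth reading together.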

In [None]:
# ==========================================
# EVALUATION ON TEST SET 1: preprocessed_self
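# Needs the paths from Cell 1 (MODEL_SAVE_PATH, TEST_DIR_1) and a saved checkpoint;
# it does not need the training cell to have run in the current session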
# ==========================================

import torch
from torch import nn
from torchvision import datasets, models, transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import classification_report, confusion_matrix, precision_recall_fscore_support
import numpy as np
import os
import pandas as pd

# ==========================================
# HELPER FUNCTIONS
# ==========================================

def load_trained_model(model_path, device):
    """Load the trained model from checkpoint."""
    if not os.path.exists(model_path):
        raise FileNotFoundError(f"Model file {model_path} not found! Please train the model first.")
    
    checkpoint = torch.load(model_path, map_location=device)
    
    # Recreate model architecture
    model = models.resnet50(weights=None)
    num_ftrs = model.fc.in_features
    num_classes = checkpoint['model_config']['num_classes']
    
    model.fc = nn.Sequential(
        nn.Dropout(0.5),
        nn.Linear(num_ftrs, 512),
        nn.ReLU(),
        nn.Dropout(0.3),
        nn.Linear(512, num_classes)
    )
    
    # Load trained weights
    model.load_state_dict(checkpoint['model_state_dict'])
    model.to(device)
    model.eval()
    
    return model, checkpoint['class_names']

def evaluate_model(model, dataloader, device):
    """Evaluate the model and return predictions and labels."""
    model.eval()
    all_preds = []
    all_labels = []
    all_probs = []

    with torch.no_grad():
        for inputs, labels in dataloader:
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = model(inputs)
            probs = torch.softmax(outputs, dim=1)
            _, preds = torch.max(outputs, 1)

            all_preds.extend(preds.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())
            all_probs.extend(probs.cpu().numpy())

    return np.array(all_preds), np.array(all_labels), np.array(all_probs)

def plot_confusion_matrix(cm, class_names, title='Confusion Matrix', save_path=None):
    """Plot confusion matrix using seaborn heatmap."""
    plt.figure(figsize=(10, 8))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
                xticklabels=class_names, yticklabels=class_names,
                cbar_kws={'label': 'Count'})
    plt.title(title, fontsize=14, fontweight='bold')
    plt.ylabel('True Label', fontsize=12)
    plt.xlabel('Predicted Label', fontsize=12)
    plt.tight_layout()
    
    if save_path:
        plt.savefig(save_path, dpi=300, bbox_inches='tight')
        print(f"  ✓ Confusion matrix saved to: {save_path}")
    
    plt.show()

# ==========================================
# LOAD MODEL & EVALUATE ON TEST SET 1
# ==========================================

print("\n" + "="*70)
print("EVALUATING ON TEST SET 1: preprocessed_self")
print("="*70)

# Configuration
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
BATCH_SIZE = 8

# Validation transform (no augmentation for testing)
test_transform = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# Load the trained model
try:
    model, class_names = load_trained_model(MODEL_SAVE_PATH, DEVICE)
    print(f"✓ Model loaded from: {MODEL_SAVE_PATH}")
except Exception as e:
    print(f"❌ Error loading model: {str(e)}")
    raise

# Load test dataset 1
try:
    test_dataset_1 = datasets.ImageFolder(TEST_DIR_1, transform=test_transform)
    test_loader_1 = DataLoader(test_dataset_1, batch_size=BATCH_SIZE, shuffle=False)
    
    print(f"✓ Test dataset loaded from: {TEST_DIR_1}")
    print(f"  Classes: {test_dataset_1.classes}")
    print(f"  Total images: {len(test_dataset_1)}")
    
    # Class distribution
    test_labels_1 = [label for _, label in test_dataset_1.samples]
    for i, class_name in enumerate(class_names):
        count = test_labels_1.count(i)
        print(f"    - {class_name}: {count} images")
        
except Exception as e:
    print(f"❌ Error loading test dataset: {str(e)}")
    raise

# Evaluate on test set 1
print("\n" + "-"*70)
print("Running inference...")
print("-"*70)

preds_1, labels_1, probs_1 = evaluate_model(model, test_loader_1, DEVICE)

# Save predictions to CSV
predictions_df_1 = pd.DataFrame({
    'Image_Index': range(len(labels_1)),
    'True_Label': [class_names[label] for label in labels_1],
    'Predicted_Label': [class_names[pred] for pred in preds_1],
    'Confidence': [probs_1[i][preds_1[i]] for i in range(len(preds_1))]
})

# Add probability columns for each class
for i, class_name in enumerate(class_names):
    predictions_df_1[f'Prob_{class_name}'] = probs_1[:, i]

csv_path_1 = 'predictions_preprocessed_self.csv'
predictions_df_1.to_csv(csv_path_1, index=False)
print(f"\n✓ Predictions saved to: {csv_path_1}")

# Calculate confusion matrix
cm_1 = confusion_matrix(labels_1, preds_1)

# Print classification report
print("\n" + "="*70)
print("CLASSIFICATION REPORT - TEST SET 1 (preprocessed_self)")
print("="*70)
report_1 = classification_report(labels_1, preds_1, target_names=class_names, digits=4)
print(report_1)

# Save classification report
with open('classification_report_preprocessed_self.txt', 'w') as f:
    f.write("="*70 + "\n")
    f.write("CLASSIFICATION REPORT - TEST SET 1 (preprocessed_self)\n")
    f.write("="*70 + "\n\n")
    f.write(report_1)
print("✓ Classification report saved to: classification_report_preprocessed_self.txt")

# Plot confusion matrix
print("\n" + "="*70)
print("CONFUSION MATRIX - TEST SET 1")
print("="*70)
plot_confusion_matrix(cm_1, class_names, 
                     title='Confusion Matrix - preprocessed_self Dataset',
                     save_path='confusion_matrix_preprocessed_self.png')

# Per-class accuracy (equivalent to recall: correct predictions / true samples of the class)
print("\n" + "="*70)
print("PER-CLASS METRICS - TEST SET 1")
print("="*70)
for i, class_name in enumerate(class_names):
    class_accuracy = cm_1[i, i] / cm_1[i, :].sum() if cm_1[i, :].sum() > 0 else 0
    print(f"  {class_name}:")
    print(f"    Accuracy: {class_accuracy:.4f} ({cm_1[i, i]}/{cm_1[i, :].sum()})")

# Overall metrics
overall_accuracy_1 = np.trace(cm_1) / np.sum(cm_1)
precision_1, recall_1, f1_1, _ = precision_recall_fscore_support(labels_1, preds_1, average='weighted', zero_division=0)

print(f"\n{'='*70}")
print("OVERALL METRICS - TEST SET 1 (preprocessed_self)")
print("="*70)
print(f"  Overall Accuracy: {overall_accuracy_1:.4f}")
print(f"  Weighted Precision: {precision_1:.4f}")
print(f"  Weighted Recall: {recall_1:.4f}")
print(f"  Weighted F1-Score: {f1_1:.4f}")
print("="*70)

# Save summary metrics
metrics_summary_1 = {
    'Dataset': 'preprocessed_self',
    'Total_Samples': len(labels_1),
    'Overall_Accuracy': overall_accuracy_1,
    'Weighted_Precision': precision_1,
    'Weighted_Recall': recall_1,
    'Weighted_F1': f1_1
}

metrics_df_1 = pd.DataFrame([metrics_summary_1])
metrics_df_1.to_csv('metrics_summary_preprocessed_self.csv', index=False)
print(f"\n✓ Metrics summary saved to: metrics_summary_preprocessed_self.csv")

## Cell 7: Model Evaluation on Test Set 2 (SelfCollected_Dataset)

This cell repeats the evaluation on the second test dataset (`SelfCollected_Dataset`), reusing the helper functions defined in Cell 6.
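
Both evaluation cells label test predictions with the class names stored in the checkpoint, which is only correct when the test folders produce the same label indices as the training folders. As a sketch of how a mismatch could be detected and corrected, the helper below maps test label indices to training indices by class name. `build_label_remap` and the class lists are hypothetical; `class_to_idx` is the name-to-index mapping that `ImageFolder` exposes.

```python
def build_label_remap(train_classes, test_class_to_idx):
    """Map each test-folder label index to the index of the same class name
    in the training class list; raise if a test class was never trained on."""
    train_index = {name: i for i, name in enumerate(train_classes)}
    remap = {}
    for name, test_idx in test_class_to_idx.items():
        if name not in train_index:
            raise KeyError(
                f"Test class '{name}' is not among the training classes {train_classes}"
            )
        remap[test_idx] = train_index[name]
    return remap

# Hypothetical example: the test set lacks one class, so its indices shift.
train_classes = ["aluminum", "paper", "plastic"]
test_class_to_idx = {"paper": 0, "plastic": 1}
remap = build_label_remap(train_classes, test_class_to_idx)
print(remap)

# Translate raw test labels into training label space before computing metrics.
raw_test_labels = [0, 1, 1, 0]
aligned_labels = [remap[label] for label in raw_test_labels]
print(aligned_labels)
```

Folder names that differ only in capitalization would raise here, since the comparison is by exact name; that is deliberate in this sketch, because such differences also change the alphabetical ordering of the classes.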

In [None]:
# ==========================================
# EVALUATION ON TEST SET 2: SelfCollected_Dataset
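# Needs the paths from Cell 1 (MODEL_SAVE_PATH, TEST_DIR_2) and the helper
# functions from Cell 6 (load_trained_model, evaluate_model, plot_confusion_matrix)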
# ==========================================

import torch
from torch import nn
from torchvision import datasets, models, transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import classification_report, confusion_matrix, precision_recall_fscore_support
import numpy as np
import os
import pandas as pd

print("\n" + "="*70)
print("EVALUATING ON TEST SET 2: SelfCollected_Dataset")
print("="*70)

# Configuration
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
BATCH_SIZE = 8

# Test transform
test_transform = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# Load the trained model (reuse functions from previous cell)
try:
    model, class_names = load_trained_model(MODEL_SAVE_PATH, DEVICE)
    print(f"✓ Model loaded from: {MODEL_SAVE_PATH}")
except Exception as e:
    print(f"❌ Error loading model: {str(e)}")
    raise

# Load test dataset 2
try:
    test_dataset_2 = datasets.ImageFolder(TEST_DIR_2, transform=test_transform)
    test_loader_2 = DataLoader(test_dataset_2, batch_size=BATCH_SIZE, shuffle=False)
    
    print(f"✓ Test dataset loaded from: {TEST_DIR_2}")
    print(f"  Classes: {test_dataset_2.classes}")
    print(f"  Total images: {len(test_dataset_2)}")
    
    # Class distribution
    test_labels_2 = [label for _, label in test_dataset_2.samples]
    for i, class_name in enumerate(class_names):
        count = test_labels_2.count(i)
        print(f"    - {class_name}: {count} images")
        
except Exception as e:
    print(f"❌ Error loading test dataset: {str(e)}")
    raise

# Evaluate on test set 2
print("\n" + "-"*70)
print("Running inference...")
print("-"*70)

preds_2, labels_2, probs_2 = evaluate_model(model, test_loader_2, DEVICE)

# Save predictions to CSV
predictions_df_2 = pd.DataFrame({
    'Image_Index': range(len(labels_2)),
    'True_Label': [class_names[label] for label in labels_2],
    'Predicted_Label': [class_names[pred] for pred in preds_2],
    'Confidence': [probs_2[i][preds_2[i]] for i in range(len(preds_2))]
})

# Add probability columns for each class
for i, class_name in enumerate(class_names):
    predictions_df_2[f'Prob_{class_name}'] = probs_2[:, i]

csv_path_2 = 'predictions_SelfCollected_Dataset.csv'
predictions_df_2.to_csv(csv_path_2, index=False)
print(f"\n✓ Predictions saved to: {csv_path_2}")

# Calculate confusion matrix
cm_2 = confusion_matrix(labels_2, preds_2)

# Print classification report
print("\n" + "="*70)
print("CLASSIFICATION REPORT - TEST SET 2 (SelfCollected_Dataset)")
print("="*70)
report_2 = classification_report(labels_2, preds_2, target_names=class_names, digits=4)
print(report_2)

# Save classification report
with open('classification_report_SelfCollected_Dataset.txt', 'w') as f:
    f.write("="*70 + "\n")
    f.write("CLASSIFICATION REPORT - TEST SET 2 (SelfCollected_Dataset)\n")
    f.write("="*70 + "\n\n")
    f.write(report_2)
print("✓ Classification report saved to: classification_report_SelfCollected_Dataset.txt")

# Plot confusion matrix
print("\n" + "="*70)
print("CONFUSION MATRIX - TEST SET 2")
print("="*70)
plot_confusion_matrix(cm_2, class_names, 
                     title='Confusion Matrix - SelfCollected_Dataset',
                     save_path='confusion_matrix_SelfCollected_Dataset.png')

# Per-class accuracy (equivalent to recall: correct predictions / true samples of the class)
print("\n" + "="*70)
print("PER-CLASS METRICS - TEST SET 2")
print("="*70)
for i, class_name in enumerate(class_names):
    class_accuracy = cm_2[i, i] / cm_2[i, :].sum() if cm_2[i, :].sum() > 0 else 0
    print(f"  {class_name}:")
    print(f"    Accuracy: {class_accuracy:.4f} ({cm_2[i, i]}/{cm_2[i, :].sum()})")

# Overall metrics
overall_accuracy_2 = np.trace(cm_2) / np.sum(cm_2)
precision_2, recall_2, f1_2, _ = precision_recall_fscore_support(labels_2, preds_2, average='weighted', zero_division=0)

print(f"\n{'='*70}")
print("OVERALL METRICS - TEST SET 2 (SelfCollected_Dataset)")
print("="*70)
print(f"  Overall Accuracy: {overall_accuracy_2:.4f}")
print(f"  Weighted Precision: {precision_2:.4f}")
print(f"  Weighted Recall: {recall_2:.4f}")
print(f"  Weighted F1-Score: {f1_2:.4f}")
print("="*70)

# Save summary metrics
metrics_summary_2 = {
    'Dataset': 'SelfCollected_Dataset',
    'Total_Samples': len(labels_2),
    'Overall_Accuracy': overall_accuracy_2,
    'Weighted_Precision': precision_2,
    'Weighted_Recall': recall_2,
    'Weighted_F1': f1_2
}

metrics_df_2 = pd.DataFrame([metrics_summary_2])
metrics_df_2.to_csv('metrics_summary_SelfCollected_Dataset.csv', index=False)
print(f"\n✓ Metrics summary saved to: metrics_summary_SelfCollected_Dataset.csv")

## Cell 8: Comparative Analysis of Both Test Sets

This cell compares performance on both test datasets side by side, using the metrics and confusion matrices computed in Cells 6 and 7.
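
Because the comparison depends on variables left in memory by Cells 6 and 7, it fails in a fresh session. One possible alternative, sketched here with the standard `csv` module, is to rebuild the comparison from the `metrics_summary_*.csv` files those cells write. The column names follow the summary dictionaries in Cells 6 and 7; the helper names and the demo values are placeholders, not results.

```python
import csv
import os
import tempfile

METRIC_COLUMNS = ["Overall_Accuracy", "Weighted_Precision", "Weighted_Recall", "Weighted_F1"]

def load_summary(csv_path):
    """Read the single-row metrics summary written by an evaluation cell."""
    with open(csv_path, newline="") as f:
        row = next(csv.DictReader(f))
    return row["Dataset"], {col: float(row[col]) for col in METRIC_COLUMNS}

def compare_summaries(path_a, path_b):
    """Return {metric: (value_a, value_b, value_b - value_a)}."""
    _, a = load_summary(path_a)
    _, b = load_summary(path_b)
    return {col: (a[col], b[col], b[col] - a[col]) for col in METRIC_COLUMNS}

def _write_demo(path, dataset, values):
    """Write a summary file with the same header as the evaluation cells."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Dataset", "Total_Samples"] + METRIC_COLUMNS)
        writer.writerow([dataset, 100] + values)

# Demo with placeholder numbers (not measured results).
with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "summary_a.csv")
    path_b = os.path.join(tmp, "summary_b.csv")
    _write_demo(path_a, "set_a", [0.9, 0.9, 0.9, 0.9])
    _write_demo(path_b, "set_b", [0.8, 0.8, 0.8, 0.8])
    comparison = compare_summaries(path_a, path_b)

for metric, (val_a, val_b, delta) in comparison.items():
    print(f"{metric}: {val_a:.4f} vs {val_b:.4f} (delta {delta:+.4f})")
```

In the notebook, the two paths would be `metrics_summary_preprocessed_self.csv` and `metrics_summary_SelfCollected_Dataset.csv`.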

In [None]:
# ==========================================
# COMPARATIVE ANALYSIS OF BOTH TEST SETS
# ==========================================

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

print("\n" + "="*70)
print("COMPARATIVE ANALYSIS: preprocessed_self vs SelfCollected_Dataset")
print("="*70)

# Combine metrics from both test sets
comparison_data = {
    'Metric': ['Overall Accuracy', 'Weighted Precision', 'Weighted Recall', 'Weighted F1-Score'],
    'preprocessed_self': [overall_accuracy_1, precision_1, recall_1, f1_1],
    'SelfCollected_Dataset': [overall_accuracy_2, precision_2, recall_2, f1_2]
}

comparison_df = pd.DataFrame(comparison_data)
print("\n" + comparison_df.to_string(index=False))

# Save comparison
comparison_df.to_csv('test_sets_comparison.csv', index=False)
print("\n✓ Comparison saved to: test_sets_comparison.csv")

# Visualize comparison
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Bar chart comparison
metrics = comparison_df['Metric']
x = np.arange(len(metrics))
width = 0.35

axes[0].bar(x - width/2, comparison_df['preprocessed_self'], width, label='preprocessed_self', alpha=0.8)
axes[0].bar(x + width/2, comparison_df['SelfCollected_Dataset'], width, label='SelfCollected_Dataset', alpha=0.8)
axes[0].set_xlabel('Metrics', fontsize=12)
axes[0].set_ylabel('Score', fontsize=12)
axes[0].set_title('Performance Comparison: Two Test Datasets', fontsize=14, fontweight='bold')
axes[0].set_xticks(x)
axes[0].set_xticklabels(metrics, rotation=45, ha='right')
axes[0].legend()
axes[0].grid(axis='y', alpha=0.3)
axes[0].set_ylim([0, 1.1])  # headroom above 1.0 so value labels stay inside the axes

# Add value labels on bars
for i, metric in enumerate(metrics):
    axes[0].text(i - width/2, comparison_df['preprocessed_self'][i] + 0.02, 
                f'{comparison_df["preprocessed_self"][i]:.3f}', 
                ha='center', va='bottom', fontsize=9)
    axes[0].text(i + width/2, comparison_df['SelfCollected_Dataset'][i] + 0.02, 
                f'{comparison_df["SelfCollected_Dataset"][i]:.3f}', 
                ha='center', va='bottom', fontsize=9)

# Per-class accuracy comparison
class_acc_1 = [cm_1[i, i] / cm_1[i, :].sum() if cm_1[i, :].sum() > 0 else 0 
               for i in range(len(class_names))]
class_acc_2 = [cm_2[i, i] / cm_2[i, :].sum() if cm_2[i, :].sum() > 0 else 0 
               for i in range(len(class_names))]

x_classes = np.arange(len(class_names))
axes[1].bar(x_classes - width/2, class_acc_1, width, label='preprocessed_self', alpha=0.8)
axes[1].bar(x_classes + width/2, class_acc_2, width, label='SelfCollected_Dataset', alpha=0.8)
axes[1].set_xlabel('Class', fontsize=12)
axes[1].set_ylabel('Accuracy', fontsize=12)
axes[1].set_title('Per-Class Accuracy Comparison', fontsize=14, fontweight='bold')
axes[1].set_xticks(x_classes)
axes[1].set_xticklabels(class_names)
axes[1].legend()
axes[1].grid(axis='y', alpha=0.3)
axes[1].set_ylim([0, 1.1])  # headroom above 1.0 so value labels stay inside the axes

# Add value labels
for i, class_name in enumerate(class_names):
    axes[1].text(i - width/2, class_acc_1[i] + 0.02, f'{class_acc_1[i]:.3f}', 
                ha='center', va='bottom', fontsize=9)
    axes[1].text(i + width/2, class_acc_2[i] + 0.02, f'{class_acc_2[i]:.3f}', 
                ha='center', va='bottom', fontsize=9)

plt.tight_layout()
plt.savefig('test_sets_comparison.png', dpi=300, bbox_inches='tight')
print("✓ Comparison chart saved to: test_sets_comparison.png")
plt.show()

# Print detailed comparison
print("\n" + "="*70)
print("PER-CLASS ACCURACY COMPARISON")
print("="*70)
for i, class_name in enumerate(class_names):
    print(f"\n{class_name}:")
    print(f"  preprocessed_self:       {class_acc_1[i]:.4f}")
    print(f"  SelfCollected_Dataset:   {class_acc_2[i]:.4f}")
    diff = class_acc_2[i] - class_acc_1[i]
    print(f"  Difference:              {diff:+.4f}")

print("\n" + "="*70)
print("SUMMARY")
print("="*70)
print("\nTest Set 1 (preprocessed_self):")
print(f"  Samples: {len(labels_1)}")
print(f"  Overall Accuracy: {overall_accuracy_1:.4f}")

print("\nTest Set 2 (SelfCollected_Dataset):")
print(f"  Samples: {len(labels_2)}")
print(f"  Overall Accuracy: {overall_accuracy_2:.4f}")

print(f"\nAccuracy Difference: {(overall_accuracy_2 - overall_accuracy_1):+.4f}")
print("="*70)

# Create comprehensive summary file
# Explicit encoding so the file is written the same way on every platform
with open('comprehensive_evaluation_summary.txt', 'w', encoding='utf-8') as f:
    f.write("="*70 + "\n")
    f.write("COMPREHENSIVE EVALUATION SUMMARY\n")
    f.write("ResNet50 Waste Classification Model\n")
    f.write("="*70 + "\n\n")
    
    f.write("TRAINING CONFIGURATION:\n")
    f.write(f"  Training Dataset: {DATA_DIR}\n")
    f.write("  Model: ResNet50 (Fine-tuned)\n")
    f.write(f"  Classes: {class_names}\n")
    # NUM_EPOCHS is the configured value; early stopping may end training sooner
    f.write(f"  Epochs (configured): {NUM_EPOCHS}\n")
    f.write(f"  Batch Size: {BATCH_SIZE}\n")
    f.write(f"  Learning Rate: {LEARNING_RATE}\n\n")
    
    f.write("="*70 + "\n")
    f.write("TEST SET 1: preprocessed_self\n")
    f.write("="*70 + "\n")
    f.write(f"Total Samples: {len(labels_1)}\n")
    f.write(f"Overall Accuracy: {overall_accuracy_1:.4f}\n")
    f.write(f"Weighted Precision: {precision_1:.4f}\n")
    f.write(f"Weighted Recall: {recall_1:.4f}\n")
    f.write(f"Weighted F1-Score: {f1_1:.4f}\n\n")
    
    f.write("Per-Class Accuracy:\n")
    for i, class_name in enumerate(class_names):
        f.write(f"  {class_name}: {class_acc_1[i]:.4f}\n")
    
    f.write("\n" + "="*70 + "\n")
    f.write("TEST SET 2: SelfCollected_Dataset\n")
    f.write("="*70 + "\n")
    f.write(f"Total Samples: {len(labels_2)}\n")
    f.write(f"Overall Accuracy: {overall_accuracy_2:.4f}\n")
    f.write(f"Weighted Precision: {precision_2:.4f}\n")
    f.write(f"Weighted Recall: {recall_2:.4f}\n")
    f.write(f"Weighted F1-Score: {f1_2:.4f}\n\n")
    
    f.write("Per-Class Accuracy:\n")
    for i, class_name in enumerate(class_names):
        f.write(f"  {class_name}: {class_acc_2[i]:.4f}\n")
    
    f.write("\n" + "="*70 + "\n")
    f.write("FILES GENERATED:\n")
    f.write("="*70 + "\n")
    f.write(f"  - {MODEL_SAVE_PATH} (trained model)\n")
    f.write("  - predictions_preprocessed_self.csv\n")
    f.write("  - predictions_SelfCollected_Dataset.csv\n")
    f.write("  - classification_report_preprocessed_self.txt\n")
    f.write("  - classification_report_SelfCollected_Dataset.txt\n")
    f.write("  - confusion_matrix_preprocessed_self.png\n")
    f.write("  - confusion_matrix_SelfCollected_Dataset.png\n")
    f.write("  - metrics_summary_preprocessed_self.csv\n")
    f.write("  - metrics_summary_SelfCollected_Dataset.csv\n")
    f.write("  - test_sets_comparison.csv\n")
    f.write("  - test_sets_comparison.png\n")
    f.write("  - comprehensive_evaluation_summary.txt\n")

print("\n✓ Comprehensive summary saved to: comprehensive_evaluation_summary.txt")
print("\n" + "="*70)
print("EVALUATION COMPLETE!")
print("="*70)
print("\nAll results have been saved. Check the working directory for:")
print("  - Model checkpoint")
print("  - Predictions (CSV)")
print("  - Classification reports (TXT)")
print("  - Confusion matrices (PNG)")
print("  - Metrics summaries (CSV)")
print("  - Test set comparison (CSV and PNG)")
print("  - Comprehensive evaluation summary (TXT)")
print("="*70)