# Transfer Learning for Fundus Image Classification using ResNet50
---

## Overview

This notebook is the fourth in a series focused on classifying fundus images into one of three categories: Normal, Diabetic Retinopathy, and Other Disease. The previous notebooks covered:

1. **Exploratory Data Analysis (EDA)**: Understanding the dataset and its properties.
2. **Baseline Model**: Establishing simple models as a point of comparison.
3. **Baseline CNN Model**: Implementing a Convolutional Neural Network (CNN) for the task.

In this notebook, we aim to improve the performance of our classification model by leveraging a pre-trained ResNet50 model for transfer learning. 

## Objectives

- Load and preprocess the fundus image dataset.
- Implement transfer learning using ResNet50.
- Evaluate the performance of the model and compare it with the baseline models.


## Table of Contents

1. [Introduction](#Introduction)
    - 1.1 [Objectives](#Objectives)
    - 1.2 [Dataset Overview](#Dataset-Overview)
2. [Data Preprocessing](#Data-Preprocessing)
    - 2.1 [Data Loading](#Data-Loading)
    - 2.2 [Data Augmentation](#Data-Augmentation)
3. [Transfer Learning with ResNet50](#Transfer-Learning-with-ResNet50)
    - 3.1 [Model Architecture](#Model-Architecture)
    - 3.2 [Model Compilation](#Model-Compilation)
    - 3.3 [Training](#Training)
4. [Model Evaluation](#Model-Evaluation)
    - 4.1 [Performance Metrics](#Performance-Metrics)
5. [Comparative Analysis](#Comparative-Analysis)
6. [Conclusion](#Conclusion)
    - 6.1 [Summary](#Summary)
    - 6.2 [Future Work](#Future-Work)

In [61]:
# Standard Libraries
import os

# Libraries for Data Manipulation
import numpy as np
import pandas as pd

# Libraries for Image Handling
from PIL import Image

# Libraries for Data Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Libraries for Machine Learning
from sklearn.metrics import f1_score, recall_score, accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split, ParameterGrid, KFold

# PyTorch and Related Libraries
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset, random_split, SubsetRandomSampler
from torchvision import models, transforms
from torchvision.transforms import functional as F

# Libraries for Progress Monitoring
from tqdm import tqdm


## Data Preprocessing

### Data Loading

All of the parameters for the data pipeline and the model are defined in the cell below.

In [24]:

#Data Loading 
PATH_train_df = 'data/train_df.csv' #Training Data Path 
PATH_test_df = 'data/test_df.csv' #Testing Data Path 
PATH_imgaes = 'data/preprocessed_images' #Image Folder Path

#Model Parameters
eps = 10 #Epochs 
pl = 5 #Patience Limit
bs = 32 #Batch Size
dr = 0.4 #Dropout Rate
lr = 0.001 #Learning Rate

# Transformer Parameters
randRot = 5 #Random Rotation in Degrees
brJit = 0.2 #Brightness Jitter
ctJit = 0.2 #Contrast Jitter
brFct = 1.2 #Brightness Factor
ctFct = 1.2 #Contrast Factor

#K-Fold Cross-Validation
k = 5 #Number of splits
kCriterion = nn.CrossEntropyLoss() #Kriterion... Get it?

# Device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


The helper below loads the training and test DataFrames from their CSV files.

In [3]:

def load_dataframes(train_path, test_path):
    train_df = pd.read_csv(train_path)
    test_df = pd.read_csv(test_path)
    return train_df, test_df


Next we define a `Dataset` class that loads each fundus image and maps its label to an integer.

In [4]:

# Dataset Class for loading Fundus images and labels
class FundusDataset(Dataset):
    def __init__(self, dataframe, root_dir, transform=None):
        self.df = dataframe.reset_index(drop=True)  # positional idx must match the .loc labels used below
        self.root_dir = root_dir
        self.transform = transform
        self.label_mapping = {'D': 0, 'O': 1, 'N': 2}

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        img_name = os.path.join(self.root_dir, self.df.loc[idx, 'filename'])
        image = Image.open(img_name).convert('RGB')
        label_str = self.df.loc[idx, 'Grouped-Label']
        label = self.label_mapping[label_str]
        label = torch.tensor(label)
        if self.transform:
            image = self.transform(image)
        return image, label


### Data Augmentation

The function below builds two transform pipelines: one with random augmentation for training and a deterministic one for validation.

In [38]:

def get_transforms():
    transform = transforms.Compose([
        transforms.RandomHorizontalFlip(),
        transforms.RandomRotation(randRot),
        transforms.ColorJitter(brightness=brJit, contrast=ctJit),
        transforms.Lambda(lambda img: F.adjust_brightness(img, brightness_factor=brFct)),
        transforms.Lambda(lambda img: F.adjust_contrast(img, contrast_factor=ctFct)),
        transforms.Resize((512, 512)),
        transforms.ToTensor(),
    ])
    val_transform = transforms.Compose([
        transforms.Resize((512, 512)),
        transforms.Lambda(lambda img: F.adjust_brightness(img, brightness_factor=brFct)),
        transforms.Lambda(lambda img: F.adjust_contrast(img, contrast_factor=ctFct)),
        transforms.ToTensor(),
    ])
    return transform, val_transform


## Transfer Learning with ResNet50

Here we wrap the DataFrames in `FundusDataset` objects and create the training and validation data loaders.

In [6]:

def get_dataloaders(train_df, test_df, root_dir, transform, val_transform, batch_size=bs):
    train_dataset = FundusDataset(dataframe=train_df, root_dir=root_dir, transform=transform)
    val_dataset = FundusDataset(dataframe=test_df, root_dir=root_dir, transform=val_transform)
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
    return train_loader, val_loader


### Model Architecture

Here we load a pre-trained ResNet50 and replace its final fully connected layer with one that outputs 3 classes. 
<br> We initially tried a ResNet18 model but switched to the larger ResNet50 in the hope that it would generalize better. 

In [7]:

def get_model():
    model = models.resnet50(pretrained=True)
    num_ftrs = model.fc.in_features
    model.fc = nn.Linear(num_ftrs, 3)
    return model


Select the GPU if one is available, otherwise fall back to the CPU.

In [8]:

def get_device():
    return torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


Set the criterion and optimizer

In [9]:

def get_loss_and_optimizer(model, learning_rate=lr):
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    return criterion, optimizer


Here we set the early stopping parameters. 

In [23]:

def init_early_stopping():
    return float('inf'), 0, pl


Below are the loops for one training epoch and one validation epoch. 

In [12]:

def train_one_epoch(model, train_loader, criterion, optimizer, device):
    model.train()
    running_loss = 0.0
    batch_bar = tqdm(train_loader, desc="Batches", leave=True)
    
    for i, (images, labels) in enumerate(batch_bar):
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    return running_loss / len(train_loader)

def validate_one_epoch(model, val_loader, criterion, device):
    model.eval()
    running_loss = 0.0
    all_labels = []
    all_preds = []
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            
            # Run the model on input data
            outputs = model(images)
            _, preds = torch.max(outputs, 1)
            
            # Compute the loss
            loss = criterion(outputs, labels)
            running_loss += loss.item()
            
            # Collect all the true labels and predictions
            all_labels.extend(labels.cpu().numpy())
            all_preds.extend(preds.cpu().numpy())
            
    avg_loss = running_loss / len(val_loader)
    return avg_loss, np.array(all_labels), np.array(all_preds)



Here we initialize the metrics as empty lists.

In [13]:

def init_metrics():
    return [], [], [], [], []


And here is a function to update the metrics. 

In [14]:

def update_metrics(lists, new_values):
    for lst, value in zip(lists, new_values):
        lst.append(value)
        

Here we define the new classification head and choose which layers are fine-tuned. 
#### Function Overview
- **Parameters**:
  - `model`: The pre-trained model to be modified.
  - `num_ftrs`: The number of features coming into the newly added fully connected layer.
  - `freeze_all`: Boolean flag to control whether all layers are frozen or not.

- **Process**:
  1. Builds a new sequential block containing a fully connected layer, ReLU activation, dropout, and a final fully connected layer to output 3 classes.
  2. Replaces the existing `fc` layer in the model with this new sequential block.
  3. Freezes all parameters in the model, including those of the new block, so that no gradients are computed for them.
  4. If `freeze_all=False`, unfreezes the parameters in `layer4` and the new fully connected block for fine-tuning.

In [16]:
def modify_and_freeze_model(model, num_ftrs, freeze_all=True):
    new_layers = nn.Sequential(
        nn.Linear(num_ftrs, 256),
        nn.ReLU(),
        nn.Dropout(dr),
        nn.Linear(256, 3)
    )
    model.fc = new_layers
    
    # Freezing all layers
    for param in model.parameters():
        param.requires_grad = False
    
    # Unfreezing layers from layer4 onwards
    if not freeze_all:
        for param in model.layer4.parameters():
            param.requires_grad = True
        for param in model.fc.parameters():
            param.requires_grad = True
    
    return model
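
The freeze-then-unfreeze order described above can be checked on a toy network. The sketch below is an illustration only: `TinyNet`, `freeze`, and `count_trainable` are hypothetical stand-ins (not part of this notebook), and two small linear layers play the role of `layer4` and `fc`. It shows that with `freeze_all=True` nothing is left trainable, because the new head is attached before the freezing loop runs.

```python
import torch.nn as nn
import torch.optim as optim


class TinyNet(nn.Module):
    """Hypothetical stand-in for ResNet50: one 'backbone' block plus an fc head."""

    def __init__(self):
        super().__init__()
        self.layer4 = nn.Linear(8, 4)
        self.fc = nn.Linear(4, 3)

    def forward(self, x):
        return self.fc(self.layer4(x))


def count_trainable(model):
    """Number of scalar parameters that will receive gradients."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


def freeze(model, freeze_all=True):
    # Same order as modify_and_freeze_model: freeze everything, then optionally unfreeze.
    for param in model.parameters():
        param.requires_grad = False
    if not freeze_all:
        for param in model.layer4.parameters():
            param.requires_grad = True
        for param in model.fc.parameters():
            param.requires_grad = True
    return model


fully_frozen = freeze(TinyNet(), freeze_all=True)
fine_tuned = freeze(TinyNet(), freeze_all=False)

print(count_trainable(fully_frozen))  # nothing is trainable, not even the head
print(count_trainable(fine_tuned))    # layer4 and fc are trainable

# Optionally hand the optimizer only the parameters that can actually change.
trainable = [p for p in fine_tuned.parameters() if p.requires_grad]
optimizer = optim.Adam(trainable, lr=0.001)
```

If only the new head should be trained, one option would be to unfreeze `model.fc` in both branches; that is a design choice rather than something this notebook does.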

### Model Compilation

In [18]:

def train_model(model, train_loader, val_loader, criterion, optimizer, device, epochs=eps, patience_limit=pl):
    
    best_val_loss, patience_counter, _ = init_early_stopping()  # keep the patience_limit passed by the caller
    train_f1, train_recall, train_accuracy, val_f1, val_recall = init_metrics()
    
    for epoch in range(epochs):
        print(f"Starting Epoch {epoch+1}")
        
        train_loss = train_one_epoch(model, train_loader, criterion, optimizer, device)
        print(f"Epoch {epoch+1} - Training Loss: {train_loss:.4f}")
        
        val_loss, val_labels, val_preds = validate_one_epoch(model, val_loader, criterion, device)
        
        # Compute the metrics
        f1 = f1_score(val_labels, val_preds, average='weighted')
        recall = recall_score(val_labels, val_preds, average='weighted')
        accuracy = accuracy_score(val_labels, val_preds)
        
        print(f"Epoch {epoch+1} - Validation Loss: {val_loss:.4f}, F1: {f1:.4f}, Recall: {recall:.4f}, Accuracy: {accuracy:.4f}")
        
        # Early stopping logic
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            patience_counter = 0
        else:
            patience_counter += 1
        print(f"Patience Counter: {patience_counter}")
        
        if patience_counter >= patience_limit:
            print("Early stopping.")
            break
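
To see how the patience counter behaves, the early stopping logic above can be replayed on a plain list of losses. This is a minimal sketch: `run_early_stopping` is a hypothetical helper written for illustration, and the loss values are the Fold 1 validation losses from the training output further down in this notebook.

```python
def run_early_stopping(val_losses, patience_limit):
    """Replay the patience logic of train_model over per-epoch validation losses.

    Returns (epochs_run, best_val_loss, final_patience_counter).
    """
    best_val_loss = float("inf")
    patience_counter = 0
    epochs_run = 0
    for val_loss in val_losses:
        epochs_run += 1
        if val_loss < best_val_loss:
            # New best loss: remember it and reset the counter.
            best_val_loss = val_loss
            patience_counter = 0
        else:
            patience_counter += 1
        if patience_counter >= patience_limit:
            break
    return epochs_run, best_val_loss, patience_counter


# Validation losses copied from the Fold 1 training log.
fold1_val_losses = [0.9839, 0.9049, 0.9425, 0.9233, 0.9407,
                    0.8888, 0.9565, 0.9001, 0.8768, 0.9627]

print(run_early_stopping(fold1_val_losses, patience_limit=5))
print(run_early_stopping(fold1_val_losses, patience_limit=3))
```

With a patience of 5 all ten epochs run, while a patience of 3 would have stopped after epoch 5. Note that the loop only counts epochs and does not checkpoint, so the weights left at the end are those of the last epoch rather than of the best one.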


We initialize everything required before running the model.

In [19]:

# Initialize everything
train_df, test_df = load_dataframes(PATH_train_df, PATH_test_df)
transform, val_transform = get_transforms()
train_loader, val_loader = get_dataloaders(train_df, test_df, PATH_imgaes, transform, val_transform)
model = get_model()
model = modify_and_freeze_model(model, model.fc.in_features)
device = get_device()
model = model.to(device)
criterion, optimizer = get_loss_and_optimizer(model)
best_val_loss, patience_counter, patience_limit = init_early_stopping()
train_f1, train_recall, train_accuracy, val_f1, val_recall = init_metrics()


Quick model test

In [20]:

# Run the model
# train_model(model, train_loader, val_loader, criterion, optimizer, device)


The function below implements the K-Fold cross-validation logic.
#### Function Overview
- **Parameters**:
  - `k`: Number of folds for the K-Fold Cross-Validation.
  - `full_dataset`: The complete dataset to be split into training and validation sets.
  - `model_func`: Function to initialize the model architecture.
  - `kCriterion`: Loss criterion to use during training.
  - `optimizer_func`: Function to initialize the optimizer.
  - `device`: Computing device (CPU or GPU).
  - `epochs`: Number of training epochs (default is set to the value of `eps`).

- **Process**:
  1. Initializes K-Fold splitting.
  2. Iterates through each fold, creating training and validation subsets.
  3. Initializes the model, optimizer, and other settings for each fold.
  4. Trains the model using the `train_model` function.
  5. Validates the model on the validation subset and computes metrics.

- **Output**:
  - Returns a list with one `(val_loss, val_labels, val_preds)` tuple per fold.
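
The splitting step can be sketched without any dependencies. The helper below, `k_fold_indices`, is a hypothetical pure-Python stand-in that mirrors how `KFold(n_splits=k)` partitions indices when shuffling is off (its default): folds are contiguous, and the first `n_samples % k` folds get one extra sample.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) lists for k contiguous, unshuffled folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        val_idx = list(range(start, stop))
        # Everything outside the validation slice is used for training.
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, val_idx
        start = stop


folds = list(k_fold_indices(n_samples=10, k=3))
for fold, (train_idx, val_idx) in enumerate(folds):
    print(f"Fold {fold + 1}: train={train_idx} val={val_idx}")
```

Because the folds are contiguous, the order of rows in the training DataFrame matters: if the rows happened to be sorted by label, individual folds could be dominated by a single class. Passing `shuffle=True` to `KFold` would be one way to guard against that.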

In [21]:

def k_fold_train_model(k, full_dataset, model_func, kCriterion, optimizer_func, device, epochs=eps):
    kf = KFold(n_splits=k)
    fold_results = []
    
    for fold, (train_idx, val_idx) in enumerate(kf.split(full_dataset)):
        print(f"Starting Fold {fold+1}")
        
        # Create train and validation data loaders
        train_subsampler = SubsetRandomSampler(train_idx)
        val_subsampler = SubsetRandomSampler(val_idx)
        
        train_loader = DataLoader(full_dataset, batch_size=bs, sampler=train_subsampler)
        val_loader = DataLoader(full_dataset, batch_size=bs, sampler=val_subsampler)
        
        # Initialize model, optimizer, and other settings for each fold
        model = model_func().to(device)
        criterion = kCriterion
        optimizer = optimizer_func(model)
        
        # Train the model
        train_model(model, train_loader, val_loader, criterion, optimizer, device, epochs)
        
        # Compute validation metrics
        val_loss, val_labels, val_preds = validate_one_epoch(model, val_loader, criterion, device)
        fold_results.append((val_loss, val_labels, val_preds))
        print(f"Fold {fold+1} - Validation Loss: {val_loss:.4f}")
        
    return fold_results


### Training

This is where the fun begins, and the waiting. 
<br> We start training the model here.

In [22]:

#Training Data for K-Folds
full_dataset = FundusDataset(dataframe=train_df, root_dir=PATH_imgaes, transform=transform)

# Function to create a new model
def create_model():
    base_model = get_model()
    return modify_and_freeze_model(base_model, base_model.fc.in_features, freeze_all=False)

# Function to create a new optimizer
def create_optimizer(model):
    return optim.Adam(model.parameters(), lr=lr)

# Run 5-fold cross-validation
k_fold_results = k_fold_train_model(k, full_dataset, create_model, kCriterion, create_optimizer, device)


Starting Fold 1
Starting Epoch 1


Batches: 100%|██████████| 128/128 [03:18<00:00,  1.55s/it]


Epoch 1 - Training Loss: 1.0458
Epoch 1 - Validation Loss: 0.9839, F1: 0.5076, Recall: 0.5230, Accuracy: 0.5230
Patience Counter: 0
Starting Epoch 2


Batches: 100%|██████████| 128/128 [03:14<00:00,  1.52s/it]


Epoch 2 - Training Loss: 0.9368
Epoch 2 - Validation Loss: 0.9049, F1: 0.5530, Recall: 0.5591, Accuracy: 0.5591
Patience Counter: 0
Starting Epoch 3


Batches: 100%|██████████| 128/128 [03:10<00:00,  1.49s/it]


Epoch 3 - Training Loss: 0.8989
Epoch 3 - Validation Loss: 0.9425, F1: 0.4907, Recall: 0.5103, Accuracy: 0.5103
Patience Counter: 1
Starting Epoch 4


Batches: 100%|██████████| 128/128 [03:10<00:00,  1.49s/it]


Epoch 4 - Training Loss: 0.8702
Epoch 4 - Validation Loss: 0.9233, F1: 0.5604, Recall: 0.5679, Accuracy: 0.5679
Patience Counter: 2
Starting Epoch 5


Batches: 100%|██████████| 128/128 [03:10<00:00,  1.49s/it]


Epoch 5 - Training Loss: 0.8519
Epoch 5 - Validation Loss: 0.9407, F1: 0.5239, Recall: 0.5376, Accuracy: 0.5376
Patience Counter: 3
Starting Epoch 6


Batches: 100%|██████████| 128/128 [03:17<00:00,  1.55s/it]


Epoch 6 - Training Loss: 0.8256
Epoch 6 - Validation Loss: 0.8888, F1: 0.5771, Recall: 0.5816, Accuracy: 0.5816
Patience Counter: 0
Starting Epoch 7


Batches: 100%|██████████| 128/128 [03:12<00:00,  1.51s/it]


Epoch 7 - Training Loss: 0.7978
Epoch 7 - Validation Loss: 0.9565, F1: 0.5329, Recall: 0.5415, Accuracy: 0.5415
Patience Counter: 1
Starting Epoch 8


Batches: 100%|██████████| 128/128 [03:12<00:00,  1.50s/it]


Epoch 8 - Training Loss: 0.7773
Epoch 8 - Validation Loss: 0.9001, F1: 0.5853, Recall: 0.5914, Accuracy: 0.5914
Patience Counter: 2
Starting Epoch 9


Batches: 100%|██████████| 128/128 [03:11<00:00,  1.50s/it]


Epoch 9 - Training Loss: 0.7518
Epoch 9 - Validation Loss: 0.8768, F1: 0.6013, Recall: 0.6012, Accuracy: 0.6012
Patience Counter: 0
Starting Epoch 10


Batches: 100%|██████████| 128/128 [03:11<00:00,  1.50s/it]


Epoch 10 - Training Loss: 0.7275
Epoch 10 - Validation Loss: 0.9627, F1: 0.5757, Recall: 0.5806, Accuracy: 0.5806
Patience Counter: 1
Fold 1 - Validation Loss: 0.9594
Starting Fold 2
Starting Epoch 1


Batches: 100%|██████████| 128/128 [03:10<00:00,  1.49s/it]


Epoch 1 - Training Loss: 1.0543
Epoch 1 - Validation Loss: 1.1453, F1: 0.4290, Recall: 0.4692, Accuracy: 0.4692
Patience Counter: 0
Starting Epoch 2


Batches: 100%|██████████| 128/128 [03:11<00:00,  1.50s/it]


Epoch 2 - Training Loss: 0.9519
Epoch 2 - Validation Loss: 0.9242, F1: 0.5407, Recall: 0.5533, Accuracy: 0.5533
Patience Counter: 0
Starting Epoch 3


Batches: 100%|██████████| 128/128 [03:11<00:00,  1.50s/it]


Epoch 3 - Training Loss: 0.9159
Epoch 3 - Validation Loss: 0.8979, F1: 0.5610, Recall: 0.5670, Accuracy: 0.5670
Patience Counter: 0
Starting Epoch 4


Batches: 100%|██████████| 128/128 [03:11<00:00,  1.50s/it]


Epoch 4 - Training Loss: 0.8721
Epoch 4 - Validation Loss: 0.8781, F1: 0.5773, Recall: 0.5787, Accuracy: 0.5787
Patience Counter: 0
Starting Epoch 5


Batches: 100%|██████████| 128/128 [03:10<00:00,  1.49s/it]


Epoch 5 - Training Loss: 0.8445
Epoch 5 - Validation Loss: 0.9053, F1: 0.5614, Recall: 0.5679, Accuracy: 0.5679
Patience Counter: 1
Starting Epoch 6


Batches: 100%|██████████| 128/128 [03:28<00:00,  1.63s/it]


Epoch 6 - Training Loss: 0.8288
Epoch 6 - Validation Loss: 0.8588, F1: 0.5864, Recall: 0.5855, Accuracy: 0.5855
Patience Counter: 0
Starting Epoch 7


Batches: 100%|██████████| 128/128 [03:44<00:00,  1.75s/it]


Epoch 7 - Training Loss: 0.8055
Epoch 7 - Validation Loss: 0.8709, F1: 0.5943, Recall: 0.5943, Accuracy: 0.5943
Patience Counter: 1
Starting Epoch 8


Batches: 100%|██████████| 128/128 [03:43<00:00,  1.75s/it]


Epoch 8 - Training Loss: 0.7700
Epoch 8 - Validation Loss: 0.9812, F1: 0.5419, Recall: 0.5533, Accuracy: 0.5533
Patience Counter: 2
Starting Epoch 9


Batches: 100%|██████████| 128/128 [03:23<00:00,  1.59s/it]


Epoch 9 - Training Loss: 0.7516
Epoch 9 - Validation Loss: 0.8789, F1: 0.5746, Recall: 0.5806, Accuracy: 0.5806
Patience Counter: 3
Starting Epoch 10


Batches: 100%|██████████| 128/128 [03:19<00:00,  1.56s/it]


Epoch 10 - Training Loss: 0.7353
Epoch 10 - Validation Loss: 0.8840, F1: 0.5724, Recall: 0.5777, Accuracy: 0.5777
Patience Counter: 4
Fold 2 - Validation Loss: 0.8866
Starting Fold 3
Starting Epoch 1


Batches: 100%|██████████| 128/128 [03:36<00:00,  1.69s/it]


Epoch 1 - Training Loss: 1.0513
Epoch 1 - Validation Loss: 0.9965, F1: 0.4734, Recall: 0.4907, Accuracy: 0.4907
Patience Counter: 0
Starting Epoch 2


Batches: 100%|██████████| 128/128 [03:38<00:00,  1.71s/it]


Epoch 2 - Training Loss: 0.9439
Epoch 2 - Validation Loss: 0.9697, F1: 0.5409, Recall: 0.5494, Accuracy: 0.5494
Patience Counter: 0
Starting Epoch 3


Batches: 100%|██████████| 128/128 [03:34<00:00,  1.67s/it]


Epoch 3 - Training Loss: 0.8977
Epoch 3 - Validation Loss: 0.9378, F1: 0.5511, Recall: 0.5533, Accuracy: 0.5533
Patience Counter: 0
Starting Epoch 4


Batches: 100%|██████████| 128/128 [03:23<00:00,  1.59s/it]


Epoch 4 - Training Loss: 0.8710
Epoch 4 - Validation Loss: 0.9261, F1: 0.5869, Recall: 0.5885, Accuracy: 0.5885
Patience Counter: 0
Starting Epoch 5


Batches: 100%|██████████| 128/128 [03:27<00:00,  1.62s/it]


Epoch 5 - Training Loss: 0.8380
Epoch 5 - Validation Loss: 0.9440, F1: 0.5640, Recall: 0.5660, Accuracy: 0.5660
Patience Counter: 1
Starting Epoch 6


Batches: 100%|██████████| 128/128 [03:36<00:00,  1.69s/it]


Epoch 6 - Training Loss: 0.8194
Epoch 6 - Validation Loss: 0.8985, F1: 0.5871, Recall: 0.5885, Accuracy: 0.5885
Patience Counter: 0
Starting Epoch 7


Batches: 100%|██████████| 128/128 [03:19<00:00,  1.56s/it]


Epoch 7 - Training Loss: 0.8030
Epoch 7 - Validation Loss: 0.9495, F1: 0.5667, Recall: 0.5699, Accuracy: 0.5699
Patience Counter: 1
Starting Epoch 8


Batches: 100%|██████████| 128/128 [03:39<00:00,  1.72s/it]


Epoch 8 - Training Loss: 0.7683
Epoch 8 - Validation Loss: 0.9221, F1: 0.6002, Recall: 0.6012, Accuracy: 0.6012
Patience Counter: 2
Starting Epoch 9


Batches: 100%|██████████| 128/128 [03:40<00:00,  1.72s/it]


Epoch 9 - Training Loss: 0.7594
Epoch 9 - Validation Loss: 0.9377, F1: 0.5641, Recall: 0.5709, Accuracy: 0.5709
Patience Counter: 3
Starting Epoch 10


Batches: 100%|██████████| 128/128 [03:22<00:00,  1.59s/it]


Epoch 10 - Training Loss: 0.7405
Epoch 10 - Validation Loss: 0.9209, F1: 0.5919, Recall: 0.5914, Accuracy: 0.5914
Patience Counter: 4
Fold 3 - Validation Loss: 0.9274
Starting Fold 4
Starting Epoch 1


Batches: 100%|██████████| 128/128 [03:17<00:00,  1.55s/it]


Epoch 1 - Training Loss: 1.0553
Epoch 1 - Validation Loss: 1.0037, F1: 0.4413, Recall: 0.4883, Accuracy: 0.4883
Patience Counter: 0
Starting Epoch 2


Batches: 100%|██████████| 128/128 [03:20<00:00,  1.57s/it]


Epoch 2 - Training Loss: 0.9639
Epoch 2 - Validation Loss: 0.9297, F1: 0.5329, Recall: 0.5411, Accuracy: 0.5411
Patience Counter: 0
Starting Epoch 3


Batches: 100%|██████████| 128/128 [03:25<00:00,  1.60s/it]


Epoch 3 - Training Loss: 0.8947
Epoch 3 - Validation Loss: 0.8868, F1: 0.5903, Recall: 0.5890, Accuracy: 0.5890
Patience Counter: 0
Starting Epoch 4


Batches: 100%|██████████| 128/128 [03:24<00:00,  1.60s/it]


Epoch 4 - Training Loss: 0.8789
Epoch 4 - Validation Loss: 0.8990, F1: 0.5804, Recall: 0.5783, Accuracy: 0.5783
Patience Counter: 1
Starting Epoch 5


Batches: 100%|██████████| 128/128 [03:16<00:00,  1.53s/it]


Epoch 5 - Training Loss: 0.8479
Epoch 5 - Validation Loss: 0.8875, F1: 0.5711, Recall: 0.5724, Accuracy: 0.5724
Patience Counter: 2
Starting Epoch 6


Batches: 100%|██████████| 128/128 [03:13<00:00,  1.51s/it]


Epoch 6 - Training Loss: 0.8212
Epoch 6 - Validation Loss: 0.8518, F1: 0.5969, Recall: 0.5959, Accuracy: 0.5959
Patience Counter: 0
Starting Epoch 7


Batches: 100%|██████████| 128/128 [03:13<00:00,  1.51s/it]


Epoch 7 - Training Loss: 0.7834
Epoch 7 - Validation Loss: 0.8630, F1: 0.5772, Recall: 0.5783, Accuracy: 0.5783
Patience Counter: 1
Starting Epoch 8


Batches: 100%|██████████| 128/128 [03:13<00:00,  1.51s/it]


Epoch 8 - Training Loss: 0.7596
Epoch 8 - Validation Loss: 0.8799, F1: 0.5897, Recall: 0.5969, Accuracy: 0.5969
Patience Counter: 2
Starting Epoch 9


Batches: 100%|██████████| 128/128 [03:13<00:00,  1.51s/it]


Epoch 9 - Training Loss: 0.7482
Epoch 9 - Validation Loss: 0.8963, F1: 0.5707, Recall: 0.5695, Accuracy: 0.5695
Patience Counter: 3
Starting Epoch 10


Batches: 100%|██████████| 128/128 [03:13<00:00,  1.51s/it]


Epoch 10 - Training Loss: 0.7096
Epoch 10 - Validation Loss: 0.8910, F1: 0.5917, Recall: 0.5959, Accuracy: 0.5959
Patience Counter: 4
Fold 4 - Validation Loss: 0.8699
Starting Fold 5
Starting Epoch 1


Batches: 100%|██████████| 128/128 [03:13<00:00,  1.51s/it]


Epoch 1 - Training Loss: 1.0534
Epoch 1 - Validation Loss: 1.2074, F1: 0.4167, Recall: 0.4687, Accuracy: 0.4687
Patience Counter: 0
Starting Epoch 2


Batches: 100%|██████████| 128/128 [03:13<00:00,  1.51s/it]


Epoch 2 - Training Loss: 0.9613
Epoch 2 - Validation Loss: 0.8917, F1: 0.5799, Recall: 0.5793, Accuracy: 0.5793
Patience Counter: 0
Starting Epoch 3


Batches: 100%|██████████| 128/128 [03:15<00:00,  1.53s/it]


Epoch 3 - Training Loss: 0.9096
Epoch 3 - Validation Loss: 0.8964, F1: 0.5613, Recall: 0.5636, Accuracy: 0.5636
Patience Counter: 1
Starting Epoch 4


Batches: 100%|██████████| 128/128 [03:28<00:00,  1.63s/it]


Epoch 4 - Training Loss: 0.8805
Epoch 4 - Validation Loss: 0.9048, F1: 0.5674, Recall: 0.5665, Accuracy: 0.5665
Patience Counter: 2
Starting Epoch 5


Batches: 100%|██████████| 128/128 [03:32<00:00,  1.66s/it]


Epoch 5 - Training Loss: 0.8544
Epoch 5 - Validation Loss: 0.8392, F1: 0.6125, Recall: 0.6115, Accuracy: 0.6115
Patience Counter: 0
Starting Epoch 6


Batches: 100%|██████████| 128/128 [03:27<00:00,  1.62s/it]


Epoch 6 - Training Loss: 0.8244
Epoch 6 - Validation Loss: 0.9258, F1: 0.5873, Recall: 0.5871, Accuracy: 0.5871
Patience Counter: 1
Starting Epoch 7


Batches: 100%|██████████| 128/128 [03:27<00:00,  1.62s/it]


Epoch 7 - Training Loss: 0.8136
Epoch 7 - Validation Loss: 0.8635, F1: 0.5828, Recall: 0.5822, Accuracy: 0.5822
Patience Counter: 2
Starting Epoch 8


Batches: 100%|██████████| 128/128 [03:26<00:00,  1.61s/it]


Epoch 8 - Training Loss: 0.7800
Epoch 8 - Validation Loss: 0.8776, F1: 0.5912, Recall: 0.5930, Accuracy: 0.5930
Patience Counter: 3
Starting Epoch 9


Batches: 100%|██████████| 128/128 [03:29<00:00,  1.64s/it]


Epoch 9 - Training Loss: 0.7673
Epoch 9 - Validation Loss: 0.8384, F1: 0.6087, Recall: 0.6096, Accuracy: 0.6096
Patience Counter: 0
Starting Epoch 10


Batches: 100%|██████████| 128/128 [03:24<00:00,  1.60s/it]


Epoch 10 - Training Loss: 0.7287
Epoch 10 - Validation Loss: 0.8450, F1: 0.6078, Recall: 0.6067, Accuracy: 0.6067
Patience Counter: 1
Fold 5 - Validation Loss: 0.8455


### K-Fold Validation Metrics for the Final Model

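The final model was trained and validated using 5-Fold Cross-Validation. For each fold, the training loss, F1 score, recall, and accuracy below are taken from the 10th epoch, and the validation loss is the value reported by the final validation pass after training: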

#### Fold 1
- **Training Loss**: 0.7275
- **Validation Loss**: 0.9594
- **F1 Score**: 0.5757
- **Recall**: 0.5806
- **Accuracy**: 0.5806

#### Fold 2
- **Training Loss**: 0.7353
- **Validation Loss**: 0.8866
- **F1 Score**: 0.5724
- **Recall**: 0.5777
- **Accuracy**: 0.5777

#### Fold 3
- **Training Loss**: 0.7405
- **Validation Loss**: 0.9274
- **F1 Score**: 0.5919
- **Recall**: 0.5914
- **Accuracy**: 0.5914

#### Fold 4
- **Training Loss**: 0.7096
- **Validation Loss**: 0.8699
- **F1 Score**: 0.5917
- **Recall**: 0.5959
- **Accuracy**: 0.5959

#### Fold 5
- **Training Loss**: 0.7287
- **Validation Loss**: 0.8455
- **F1 Score**: 0.6078
- **Recall**: 0.6067
- **Accuracy**: 0.6067

#### Observations
- F1 Score, Recall, and Accuracy vary across the folds: F1 ranges from 0.5724 (Fold 2) to 0.6078 (Fold 5), with Fold 5 demonstrating the best performance.
- The Validation Loss also fluctuates, from 0.8455 (Fold 5) to 0.9594 (Fold 1), hinting at the model's sensitivity to the data distribution.

These K-Fold metrics provide a comprehensive view of the model's robustness and generalization capabilities across different subsets of the dataset.
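
The per-fold numbers can be condensed into a mean and standard deviation per metric. The snippet below is a small sketch that uses only the values listed above and Python's `statistics` module; the dictionary name `fold_metrics` is chosen here for illustration.

```python
from statistics import mean, stdev

# Per-fold values copied from the lists above.
fold_metrics = {
    "val_loss": [0.9594, 0.8866, 0.9274, 0.8699, 0.8455],
    "f1":       [0.5757, 0.5724, 0.5919, 0.5917, 0.6078],
    "accuracy": [0.5806, 0.5777, 0.5914, 0.5959, 0.6067],
}

# Mean and sample standard deviation across the five folds.
summary = {name: (mean(values), stdev(values)) for name, values in fold_metrics.items()}

for name, (avg, spread) in summary.items():
    print(f"{name}: mean={avg:.4f}, std={spread:.4f}")
```

Reporting the spread alongside the mean makes it easier to judge whether a difference between two model variants is larger than the fold-to-fold noise.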


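We save the model weights for evaluation. Note that `model` here is the global variable created in the initialization cell above. The models trained inside `k_fold_train_model` are local to that function and are not returned, so, as far as the cells shown here go, they are not what gets saved.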

In [25]:
torch.save(model.state_dict(), 'res50Fundus.pth')

## Model Evaluation

Load in the test data 

In [47]:

# Assuming test_df is your test DataFrame and FundusDataset is your custom dataset class
test_dataset = FundusDataset(test_df, PATH_imgaes, transform=val_transform)
test_loader = DataLoader(test_dataset, batch_size=8, shuffle=False)


Load up the model

In [48]:

# Initialize the model architecture
model = get_model()

# Modify and optionally freeze model layers
num_ftrs = model.fc.in_features  # Or the number of features that your model's fc layer expects
model = modify_and_freeze_model(model, num_ftrs, freeze_all=False)  # Set freeze_all based on your training setup

# Load the state dictionary into the model
model.load_state_dict(torch.load('res50Fundus.pth'))

# Set the model to evaluation mode
model.eval()


ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 

Move the model to the selected device (the GPU, if one is available)

In [49]:
# Move the model to the device
model = model.to(device)

### Performance Metrics

Check the performance metrics on the test set. 

In [50]:
true_labels = []
pred_labels = []

with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        
        true_labels.extend(labels.cpu().numpy())
        pred_labels.extend(predicted.cpu().numpy())

# Calculate metrics
f1 = f1_score(true_labels, pred_labels, average='weighted')
recall = recall_score(true_labels, pred_labels, average='weighted')
accuracy = accuracy_score(true_labels, pred_labels)

print(f'F1 Score: {f1}')
print(f'Recall: {recall}')
print(f'Accuracy: {accuracy}')


F1 Score: 0.20394488135985972
Recall: 0.33620015637216577
Accuracy: 0.33620015637216577


## Comparative Analysis

#### Final Model
- **F1 Score**: 0.204
- **Recall**: 0.336
- **Accuracy**: 0.336

#### Augmented Baseline CNN Model
- **F1 Score**: 0.325
- **Recall**: 0.334
- **Accuracy**: 0.334

#### Initial Baseline CNN Model
- **F1 Score**: 0.168
- **Recall**: 0.333
- **Accuracy**: 0.333

#### Observations
- The F1 Score of the final model (0.204) is better than the initial baseline (0.168) but worse than the augmented baseline (0.325).
- Recall and Accuracy are nearly identical across all three models (0.333, 0.334, and 0.336), with the final model only marginally ahead of the baselines.

The metrics indicate a mixed performance. While the final model improves upon the initial baseline in terms of F1 Score, it falls short of the performance achieved by the augmented baseline CNN. Further investigation is needed to understand the factors contributing to these variations.
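
One way to read these numbers: support-weighted recall is identical to accuracy by construction, and an accuracy near 0.33 on three classes is what a classifier gets when it predicts a single class for every image, provided the classes are balanced. The sketch below illustrates this with a hypothetical, perfectly balanced label set (an assumption; the real test-set class distribution is not shown in this notebook) and a small pure-Python helper, `weighted_scores`, written for this example.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return (accuracy, weighted_recall, weighted_f1) for integer class labels."""
    n = len(y_true)
    support = Counter(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    weighted_recall = 0.0
    weighted_f1 = 0.0
    for cls, count in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        predicted = sum(p == cls for p in y_pred)
        recall = tp / count
        precision = tp / predicted if predicted else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        # Each class contributes in proportion to its support.
        weighted_recall += recall * count / n
        weighted_f1 += f1 * count / n
    return accuracy, weighted_recall, weighted_f1


# Hypothetical, perfectly balanced three-class labels (not the real test set).
y_true = [0, 1, 2] * 100
# A degenerate classifier that always predicts class 0.
y_pred = [0] * len(y_true)

accuracy, recall, f1 = weighted_scores(y_true, y_pred)
print(f"accuracy={accuracy:.3f} weighted_recall={recall:.3f} weighted_f1={f1:.3f}")
```

Under that balance assumption, the single-class predictor lands at an accuracy of one third and a weighted F1 of one sixth, which is close to the initial baseline's figures. A confusion matrix on the test set (`confusion_matrix` is already imported) would show directly whether any of the models is collapsing onto one class.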


### Further Analysis and Next Steps

#### Analyzing Metrics
- **Comparison with Baselines**: The final model's F1 score is lower than the previous baseline but higher than the initial baseline. This indicates that some changes may have led to a decrease in performance compared to the previous baseline.

#### Potential Causes
- **Model Complexity**: The model may be either too complex or too simple to capture the underlying patterns in the data effectively.
    - More experimentation with adding or removing layers and neurons may be needed. Dropout was used to control overfitting, but perhaps not with the best parameters.
- **Hyperparameter Tuning**: The choice of hyperparameters like learning rate, dropout rate, etc., could be suboptimal.
    - Use hyperparameter optimization techniques such as grid search or random search to systematically explore the hyperparameter space and identify the best-performing settings. This was done in the previous notebook but clearly does not translate to this model.
- **Data Augmentation**: The transformations applied might not be beneficial for this specific task, or additional augmentations may be needed.
    - Evaluate the effectiveness of each augmentation by removing them one at a time, and experiment with different types and combinations of augmentations to identify the most beneficial set.
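
A systematic search over hyperparameters and augmentation switches could start from an explicit grid of configurations. The sketch below uses only `itertools`; the candidate values and the `use_color_jitter` flag are hypothetical (only `lr=0.001` and `dr=0.4` come from this notebook), and `iter_grid` is a helper named here for illustration.

```python
from itertools import product

# Hypothetical candidate values; only lr=0.001 and dr=0.4 come from this notebook.
search_space = {
    "lr": [0.0001, 0.001],
    "dr": [0.2, 0.4],
    "use_color_jitter": [True, False],
}


def iter_grid(space):
    """Yield one dict per combination of values, in a deterministic order."""
    names = sorted(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


configs = list(iter_grid(search_space))
print(len(configs))
for config in configs[:3]:
    print(config)
```

Each configuration would then be passed to the K-Fold routine and compared on mean validation loss. `ParameterGrid`, which is already imported at the top of the notebook, offers the same kind of enumeration.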

## Conclusion

### Summary

The final model improved on the initial baseline but underperformed the augmented baseline, and the K-Fold validation showed variability in the performance metrics across folds.
<br>Given the current metrics and observations, further diagnostic analysis and model refinement are needed. A systematic approach to identifying bottlenecks and areas for improvement will be essential for enhancing the model's performance.

### Future Work

- **Hyperparameter Optimization**: Employ techniques like grid search or random search for hyperparameter tuning.
- **Ensemble Methods**: Consider using ensemble methods to combine the strengths of multiple models.
- **Data Resampling**: Implement oversampling or undersampling techniques to balance the classes.
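
As a starting point for the resampling idea, inverse-frequency weights can be computed from the label column. This is a minimal sketch under stated assumptions: the label counts below are invented for illustration and are not the real dataset distribution, and `inverse_frequency_weights` is a hypothetical helper.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: weight} with weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label counts for illustration only; not the real dataset distribution.
labels = ["N"] * 60 + ["D"] * 30 + ["O"] * 10

class_weights = inverse_frequency_weights(labels)
# One weight per sample, so rare classes are drawn more often when sampling by weight.
sample_weights = [class_weights[label] for label in labels]

print(class_weights)
```

Per-sample weights of this kind could be handed to `torch.utils.data.WeightedRandomSampler` for oversampling, or the per-class weights could be passed (as a tensor ordered by class index) to the `weight` argument of `nn.CrossEntropyLoss`.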
