# Assignment Module 2: Product Classification

The goal of this assignment is to implement a neural network that classifies smartphone pictures of products found in grocery stores. The assignment will be divided into two parts: first, you will be asked to implement from scratch your own neural network for image classification; then, you will fine-tune a pretrained network provided by PyTorch.


## Preliminaries: the dataset

The dataset you will be using contains natural images of products taken with a smartphone camera in different grocery stores:

<p align="center">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Granny-Smith.jpg" width="150">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Pink-Lady.jpg" width="150">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Lemon.jpg" width="150">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Banana.jpg" width="150">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Vine-Tomato.jpg" width="150">
</p>
<p align="center">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Yellow-Onion.jpg" width="150">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Green-Bell-Pepper.jpg" width="150">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Arla-Standard-Milk.jpg" width="150">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Oatly-Natural-Oatghurt.jpg" width="150">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Alpro-Fresh-Soy-Milk.jpg" width="150">
</p>

The products belong to the following 43 classes:
```
0.  Apple
1.  Avocado
2.  Banana
3.  Kiwi
4.  Lemon
5.  Lime
6.  Mango
7.  Melon
8.  Nectarine
9.  Orange
10. Papaya
11. Passion-Fruit
12. Peach
13. Pear
14. Pineapple
15. Plum
16. Pomegranate
17. Red-Grapefruit
18. Satsumas
19. Juice
20. Milk
21. Oatghurt
22. Oat-Milk
23. Sour-Cream
24. Sour-Milk
25. Soyghurt
26. Soy-Milk
27. Yoghurt
28. Asparagus
29. Aubergine
30. Cabbage
31. Carrots
32. Cucumber
33. Garlic
34. Ginger
35. Leek
36. Mushroom
37. Onion
38. Pepper
39. Potato
40. Red-Beet
41. Tomato
42. Zucchini
```

The dataset is split into training (`train`), validation (`val`), and test (`test`) sets.

The following code cells download the dataset and define a `torch.utils.data.Dataset` class to access it. This `Dataset` class will be the starting point of your assignment: use it in your own code and build everything else around it.

In [1]:
!git clone https://github.com/marcusklasson/GroceryStoreDataset.git

Cloning into 'GroceryStoreDataset'...
remote: Enumerating objects: 6559, done.
remote: Counting objects: 100% (266/266), done.
remote: Compressing objects: 100% (231/231), done.
remote: Total 6559 (delta 45), reused 37 (delta 35), pack-reused 6293 (from 1)
Receiving objects: 100% (6559/6559), 116.26 MiB | 33.42 MiB/s, done.
Resolving deltas: 100% (275/275), done.
Updating files: 100% (5717/5717), done.


In [2]:
from pathlib import Path
from PIL import Image
from torch import Tensor
from torch.utils.data import Dataset
from typing import List, Tuple
import torchvision.transforms as transforms

In [3]:
class GroceryStoreDataset(Dataset):

    def __init__(self, split: str, transform=None) -> None:
        super().__init__()

        self.root = Path("/content/GroceryStoreDataset/dataset")
        self.split = split
        self.paths, self.labels = self.read_file()

        self.transform = transform

    def __len__(self) -> int:
        return len(self.labels)

    def __getitem__(self, idx) -> Tuple[Tensor, int]:
        img = Image.open(self.root / self.paths[idx])
        label = self.labels[idx]

        if self.transform:
            img = self.transform(img)

        return img, label

    def read_file(self) -> Tuple[List[str], List[int]]:
        paths = []
        labels = []

        with open(self.root / f"{self.split}.txt") as f:
            for line in f:
                # path, fine-grained class, coarse-grained class
                path, _, label = line.replace("\n", "").split(", ")
                paths.append(path), labels.append(int(label))

        return paths, labels

    def get_num_classes(self) -> int:
        return max(self.labels) + 1
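
Each line of a split file such as `train.txt` is expected to hold three comma-separated fields (image path, fine-grained label, coarse-grained label), and `read_file` keeps only the path and the coarse-grained label. A minimal sketch of that parsing step is shown below. The helper name `parse_split_line` and the sample line are illustrative assumptions, not copied from the dataset.

```python
from typing import Tuple


def parse_split_line(line: str) -> Tuple[str, int, int]:
    """Split one 'path, fine, coarse' line into typed fields."""
    path, fine, coarse = line.rstrip("\n").split(", ")
    return path, int(fine), int(coarse)


# Hypothetical line, for illustration only.
sample = "train/Fruit/Apple/Some-Apple/Some-Apple_001.jpg, 3, 0\n"
path, fine_label, coarse_label = parse_split_line(sample)

# GroceryStoreDataset keeps `path` and `coarse_label` and discards `fine_label`.
print(path, fine_label, coarse_label)
```

Since `get_num_classes` returns `max(self.labels) + 1`, it relies on the coarse labels being consecutive integers starting at 0.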

## Part 1: design your own network

Your goal is to implement a convolutional neural network for image classification and train it on `GroceryStoreDataset`. You should consider yourselves satisfied once you obtain a classification accuracy on the **validation** split of **around 60%**. You are free to achieve that however you want, except for a few rules you must follow:

- You **cannot** simply instantiate an off-the-shelf PyTorch network. Instead, you must construct your network as a composition of existing PyTorch layers. In more concrete terms, you can use e.g. `torch.nn.Linear`, but you **cannot** use e.g. `torchvision.models.alexnet`.

- Justify every *design choice* you make. Design choices include network architecture, training hyperparameters, and, possibly, dataset preprocessing steps. You can either (i) start from the simplest convolutional network you can think of and add complexity one step at a time, while showing how each step gets you closer to the target ~60%, or (ii) start from a model that is already able to achieve the desired accuracy and show how, by removing some of its components, its performance drops (i.e. an *ablation study*). You can *show* your results/improvements however you want: training plots, console-printed values or tables, or whatever else your heart desires: the clearer, the better.

Don't be too concerned with your network performance: the ~60% is just to give you an idea of when to stop. Keep in mind that a thoroughly justified model with lower accuracy will be rewarded **more** points than a poorly experimentally validated model with higher accuracy.

### imports

In [4]:
import matplotlib.pyplot as plt
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
import torch.optim as optim
import torch
from torch.optim.lr_scheduler import StepLR
import copy

### data preprocessing and augmentation
In this first step we define the transforms for the training, validation, and test sets. Augmentation is applied to the training set only, in order to reduce overfitting.
- first, the images are resized to a fixed size so that every input has the same shape (64x64 in the first, now disabled, cell; 224x224 in the cell that is actually used)
- then comes the data augmentation (training set only), with `RandomHorizontalFlip` and `RandomRotation`
- finally, the images are converted to tensors and normalized with the ImageNet channel means and standard deviations
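
To make the normalization step concrete, the sketch below reproduces in plain Python what `ToTensor` followed by `Normalize` computes for a single RGB pixel: scale each channel to [0, 1], then subtract the channel mean and divide by the channel standard deviation. The helper name `normalize_pixel` is an assumption introduced for illustration.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb_uint8):
    """Mimic ToTensor (scale to [0, 1]) followed by Normalize for one RGB pixel."""
    scaled = [v / 255.0 for v in rgb_uint8]
    return [(s - m) / sd for s, m, sd in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]


white = normalize_pixel([255, 255, 255])
black = normalize_pixel([0, 0, 0])

# After normalization the values are no longer confined to [0, 1].
print([round(v, 3) for v in white])
print([round(v, 3) for v in black])
```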

In [None]:
"""
# Data augmentation and normalization
train_transforms = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

val_transforms = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

test_transforms = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
"""

In [5]:

train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

val_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

test_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])


In [6]:
# Step 1: Define the device (GPU or CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the datasets
train_dataset = GroceryStoreDataset(split='train', transform=train_transforms)
val_dataset = GroceryStoreDataset(split='val', transform=val_transforms)
test_dataset = GroceryStoreDataset(split='test', transform=test_transforms)

# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

### plotting functions

In [7]:
def plot_accuracy(train_acc, val_acc):

    epochs = range(1, len(train_acc) + 1)

    plt.figure(figsize=(8, 6))
    plt.plot(epochs, train_acc, 'b', label='Training Accuracy')
    plt.plot(epochs, val_acc, 'r', label='Validation Accuracy')
    plt.title('Training and Validation Accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend()
    plt.grid(True)
    plt.show()

def plot_loss(train_loss, val_loss):
    epochs = range(1, len(train_loss) + 1)

    plt.figure(figsize=(8, 6))
    plt.plot(epochs, train_loss, 'b', label='Training Loss')
    plt.plot(epochs, val_loss, 'r', label='Validation Loss')
    plt.title('Training and Validation Loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.grid(True)
    plt.show()


### definition of training function
Here we define the training function. It takes as input the model, the train and validation loaders, the criterion, the optimizer, the scheduler, the maximum number of epochs, and the patience. By default the maximum number of epochs is 20, and training stops early if the validation accuracy does not improve for `patience` (default 5) consecutive epochs. At each epoch we train on the training set and then evaluate on the validation set; at the end, the weights with the best validation accuracy are restored.
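
The early-stopping rule can be isolated from the training loop. The sketch below replays it on a list of made-up validation accuracies; the function name `simulate_early_stopping` and the numbers are assumptions for illustration only.

```python
def simulate_early_stopping(val_accs, patience):
    """Return (best_epoch, stop_epoch), both 1-based.

    stop_epoch is None when the patience limit is never reached.
    """
    best_acc = 0.0
    best_epoch = None
    epochs_no_improve = 0
    for epoch, acc in enumerate(val_accs, start=1):
        if acc > best_acc:  # strict improvement, as in the training function
            best_acc, best_epoch = acc, epoch
            epochs_no_improve = 0
        else:
            epochs_no_improve += 1
        if epochs_no_improve >= patience:
            return best_epoch, epoch
    return best_epoch, None


# Made-up validation accuracies.
accs = [0.40, 0.55, 0.60, 0.58, 0.59, 0.60, 0.57, 0.56]
print(simulate_early_stopping(accs, patience=5))
print(simulate_early_stopping(accs, patience=10))
```

Note that an epoch that only ties the best accuracy counts as "no improvement", because the comparison is strict.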

In [8]:
def train_model_with_early_stopping_and_scheduler(model, train_loader, val_loader, criterion, optimizer, scheduler, num_epochs=20, patience=5):
    """
    Train the model on the training set and evaluate it on the validation set, with early stopping and a learning rate scheduler.

    Args:
        model (nn.Module): The PyTorch model to train.
        train_loader (DataLoader): DataLoader for the training dataset.
        val_loader (DataLoader): DataLoader for the validation dataset.
        criterion (nn.Module): Loss function used to calculate training and validation loss.
        optimizer (torch.optim.Optimizer): Optimizer for model weight updates.
        scheduler (torch.optim.lr_scheduler): Learning rate scheduler to adjust the learning rate during training.
        num_epochs (int, optional): Maximum number of training epochs. Default is 20.
        patience (int, optional): Number of consecutive epochs without validation accuracy improvement before early stopping. Default is 5.

    Returns:
        model (nn.Module): The trained model with the best weights (based on validation accuracy).
        train_loss_arr (list): List of training loss values for each epoch.
        train_acc_arr (list): List of training accuracy values for each epoch.
        val_loss_arr (list): List of validation loss values for each epoch.
        val_acc_arr (list): List of validation accuracy values for each epoch.
    """
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    epochs_no_improve = 0
    early_stop = False

    train_loss_arr = []
    train_acc_arr = []
    val_loss_arr = []
    val_acc_arr = []

    for epoch in range(num_epochs):
        print(f'Epoch {epoch+1}/{num_epochs}')

        # Training phase
        model.train()
        running_loss = 0.0
        running_corrects = 0

        for inputs, labels in train_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()

            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            _, preds = torch.max(outputs, 1)
            running_loss += loss.item() * inputs.size(0)
            running_corrects += torch.sum(preds == labels.data)

        epoch_loss = running_loss / len(train_loader.dataset)
        epoch_acc = (running_corrects.double() / len(train_loader.dataset)).item()  # Python float, safe to plot

        train_loss_arr.append(epoch_loss)
        train_acc_arr.append(epoch_acc)

        print(f'Training Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')

        # Validation phase
        model.eval()
        val_loss = 0.0
        val_corrects = 0

        with torch.no_grad():
            for inputs, labels in val_loader:
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = model(inputs)
                loss = criterion(outputs, labels)

                _, preds = torch.max(outputs, 1)
                val_loss += loss.item() * inputs.size(0)
                val_corrects += torch.sum(preds == labels.data)

        val_loss = val_loss / len(val_loader.dataset)
        val_acc = (val_corrects.double() / len(val_loader.dataset)).item()  # Python float, safe to plot

        val_loss_arr.append(val_loss)
        val_acc_arr.append(val_acc)

        print(f'Validation Loss: {val_loss:.4f} Acc: {val_acc:.4f}')

        # Check if early stopping condition is met
        if val_acc > best_acc:
            best_acc = val_acc
            best_model_wts = copy.deepcopy(model.state_dict())
            epochs_no_improve = 0
        else:
            epochs_no_improve += 1

        if epochs_no_improve >= patience:
            print("Early stopping triggered")
            early_stop = True
            break

        # Step the learning rate scheduler
        scheduler.step()

        print("-" * 20)

    # Restore best model weights
    print(f"Best val Acc: {best_acc:.4f}")
    model.load_state_dict(best_model_wts)
    return model, train_loss_arr, train_acc_arr, val_loss_arr, val_acc_arr


### definition of evaluation function
This function takes as input the model, the test loader and the criterion. It calculates the average test loss and the test accuracy by comparing the computed labels with the true ones.
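
The criterion returns the mean loss over a batch, so the evaluation code multiplies each batch loss by the batch size before summing and dividing by the number of samples. The sketch below, with made-up batch losses and sizes, shows why this matters when the last batch is smaller than the others. The helper name `dataset_mean_loss` is an assumption.

```python
def dataset_mean_loss(batch_losses, batch_sizes):
    """Sample-weighted mean of per-batch mean losses."""
    total = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)


# Made-up values: two full batches and a smaller final batch.
batch_losses = [1.0, 1.0, 3.0]
batch_sizes = [32, 32, 8]

weighted = dataset_mean_loss(batch_losses, batch_sizes)
naive = sum(batch_losses) / len(batch_losses)  # overweights the small batch

print(round(weighted, 4), round(naive, 4))
```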

In [9]:
def evaluate_model_on_test_set(model, test_loader, criterion):
    """
    Evaluate the model on the test set.

    Args:
        model (nn.Module): Trained model to be evaluated.
        test_loader (DataLoader): DataLoader for the test dataset.
        criterion (nn.Module): Loss function to calculate test loss.

    Returns:
        test_loss (float): Average loss on the test set.
        test_accuracy (float): Accuracy on the test set.
    """
    model.eval()  # Set the model to evaluation mode
    test_loss = 0.0
    test_corrects = 0
    total_samples = 0

    with torch.no_grad():  # Disable gradient calculation for evaluation
        for inputs, labels in test_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)

            # Calculate loss
            loss = criterion(outputs, labels)
            test_loss += loss.item() * inputs.size(0)

            # Calculate accuracy
            _, preds = torch.max(outputs, 1)
            test_corrects += torch.sum(preds == labels.data)
            total_samples += labels.size(0)

    test_loss = test_loss / total_samples
    test_accuracy = (test_corrects.double() / total_samples).item()  # Python float, as documented

    print(f'Test Loss: {test_loss:.4f} | Test Accuracy: {test_accuracy:.4f}')
    return test_loss, test_accuracy


### definition of loss function
In this class we define a variant of the cross-entropy loss with label smoothing, a regularization technique that moves a small portion of the target's "probability mass" (`smoothing`, 0.1 by default) from the true class to the other classes, spread uniformly among them.
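
As a worked example, the sketch below rebuilds the smoothed target distribution and the resulting loss in plain Python for a single sample. It mirrors the formula used by the class (true class gets `1 - smoothing`, every other class gets `smoothing / (num_classes - 1)`); the helper names are assumptions introduced here.

```python
import math


def smoothed_targets(num_classes, target, smoothing=0.1):
    """Target distribution: 1 - smoothing on the true class, the rest shared uniformly."""
    off_value = smoothing / (num_classes - 1)
    dist = [off_value] * num_classes
    dist[target] = 1.0 - smoothing
    return dist


def log_softmax(logits):
    """Numerically stable log-softmax of a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def label_smoothing_loss(logits, target, smoothing=0.1):
    """Cross-entropy between the smoothed targets and softmax(logits)."""
    dist = smoothed_targets(len(logits), target, smoothing)
    return -sum(t * lp for t, lp in zip(dist, log_softmax(logits)))


def loss_floor(num_classes, smoothing=0.1):
    """Entropy of the smoothed target: the lowest value the loss can reach."""
    dist = smoothed_targets(num_classes, 0, smoothing)
    return -sum(t * math.log(t) for t in dist if t > 0)


print(sum(smoothed_targets(43, target=4)))
print(round(loss_floor(43), 4))
```

Because the cross-entropy is never smaller than the entropy of the target distribution, the loss cannot reach zero: with 43 classes and smoothing 0.1 the floor is about 0.70, so a training loss that levels off a little above that value is expected rather than a sign of a stalled optimizer.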

In [10]:
# Label Smoothing Cross Entropy Loss
class LabelSmoothingLoss(nn.Module):
    def __init__(self, smoothing=0.1):
        super(LabelSmoothingLoss, self).__init__()
        self.smoothing = smoothing

    def forward(self, pred, target):
        # Get number of classes
        num_classes = pred.size(1)

        # Create smoothed labels
        with torch.no_grad():
            true_dist = torch.zeros_like(pred)
            true_dist.fill_(self.smoothing / (num_classes - 1))
            true_dist.scatter_(1, target.data.unsqueeze(1), 1.0 - self.smoothing)

        return torch.mean(torch.sum(-true_dist * F.log_softmax(pred, dim=1), dim=1))

### definition of the network
The following CNN is called `EnhancedCNN` because its predecessor, `SimpleCNN`, was extremely basic and did not perform as well.
The network stacks four 2D convolutional layers with 64, 128, 256, and 512 filters, since the assignment asks for a convolutional neural network. The activation function is ReLU.

Each convolution is followed by batch normalization, ReLU, and max pooling. Batch normalization is used because it typically speeds up convergence and can improve generalization. A global average pooling layer then feeds two fully connected layers, with dropout (p = 0.5) between them.
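
A quick back-of-the-envelope check of the architecture can be done without PyTorch. The sketch below assumes 3x3 convolutions with padding 1 (which keep the spatial size), 2x2 max pooling after each of the four convolutions (which halves it), a 512 -> 1024 -> `num_classes` classifier, and 43 classes; the helper names are assumptions.

```python
def conv_params(c_in, c_out, k=3):
    """Weights plus biases of a k x k convolution."""
    return c_in * c_out * k * k + c_out


def bn_params(channels):
    """Learnable scale and shift of a BatchNorm2d layer."""
    return 2 * channels


def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def spatial_sizes(input_size, num_pools=4):
    """Side length of the feature map after each 2x2 max pooling."""
    sizes = [input_size]
    for _ in range(num_pools):
        sizes.append(sizes[-1] // 2)
    return sizes


channels = [3, 64, 128, 256, 512]
num_classes = 43

total = sum(conv_params(a, b) + bn_params(b) for a, b in zip(channels, channels[1:]))
total += linear_params(512, 1024) + linear_params(1024, num_classes)

print(spatial_sizes(224))  # feature-map side for 224x224 inputs
print(spatial_sizes(64))   # feature-map side for 64x64 inputs
print(total)               # learnable parameters under the assumptions above
```

Because global average pooling collapses the final feature map to 1x1, the classifier input is 512 values regardless of the input resolution, so the parameter count is the same for 64x64 and 224x224 images.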

In [None]:
class EnhancedCNN(nn.Module):
    def __init__(self, num_classes):
        super(EnhancedCNN, self).__init__()
        # First convolutional block
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.batchnorm1 = nn.BatchNorm2d(64)
        self.batchnorm2 = nn.BatchNorm2d(128)

        # Second convolutional block
        self.conv3 = nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1)
        self.conv4 = nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=1)
        self.batchnorm3 = nn.BatchNorm2d(256)
        self.batchnorm4 = nn.BatchNorm2d(512)

        # Global Average Pooling layer instead of flattening
        self.global_avg_pool = nn.AdaptiveAvgPool2d((1, 1))  # Output size of 1x1

        # Fully connected layers
        self.fc1 = nn.Linear(512, 1024)  # Adjust based on the output from the global avg pooling
        self.fc2 = nn.Linear(1024, num_classes)

        # Regularization
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        # First block
        x = self.pool(F.relu(self.batchnorm1(self.conv1(x))))
        x = self.pool(F.relu(self.batchnorm2(self.conv2(x))))

        # Second block
        x = self.pool(F.relu(self.batchnorm3(self.conv3(x))))
        x = self.pool(F.relu(self.batchnorm4(self.conv4(x))))

        # Global Average Pooling
        x = self.global_avg_pool(x)  # Output size becomes [batch_size, 512, 1, 1]
        x = x.view(x.size(0), -1)  # Flatten the tensor to [batch_size, 512]

        # Fully connected layers
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x


### Training the model
As criterion we use the **LabelSmoothingLoss** defined previously, with smoothing parameter 0.1. As optimizer we use **Adam**, which updates the model's weights to minimize the loss; `lr` sets the initial learning rate (0.001). The learning rate is then adjusted during training by a **StepLR** scheduler, which multiplies it by `gamma=0.1` every `step_size=10` epochs.
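
The step schedule has a simple closed form, sketched below in plain Python: the learning rate used during (0-based) epoch `e` is `initial_lr * gamma ** (e // step_size)`. The function name `step_lr` is an assumption, and the sketch assumes the scheduler is stepped once at the end of every epoch, as in the training function.

```python
def step_lr(initial_lr, epoch, step_size=10, gamma=0.1):
    """Learning rate in effect during the given 0-based epoch."""
    return initial_lr * gamma ** (epoch // step_size)


# With lr=0.001, step_size=10, gamma=0.1 and up to 30 epochs:
for epoch in (0, 9, 10, 19, 20, 29):
    print(epoch, step_lr(1e-3, epoch))
```

With early stopping and a patience of 5, training may well end before the second or third plateau of the schedule is reached.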

In [None]:
# Initialize the model, loss function, and optimizer
num_classes = train_dataset.get_num_classes()
model = EnhancedCNN(num_classes).to(device)
criterion = LabelSmoothingLoss(smoothing=0.1)
#criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)


# Train the model with early stopping
trained_model, train_loss_arr, train_acc_arr, val_loss_arr, val_acc_arr = train_model_with_early_stopping_and_scheduler(model, train_loader, val_loader, criterion, optimizer, scheduler, num_epochs=30, patience=5)

# Save the best model
torch.save(trained_model.state_dict(), 'best_grocery_cnn.pth')


### evaluation on the test set

In [None]:
model.load_state_dict(torch.load('best_grocery_cnn.pth'))
model.to(device)
test_loss, test_accuracy = evaluate_model_on_test_set(model, test_loader, criterion)

## Part 2: fine-tune an existing network

Your goal is to fine-tune a pretrained **ResNet-18** model on `GroceryStoreDataset`. Use the implementation provided by PyTorch, do not implement it yourselves! (i.e. exactly what you **could not** do in the first part of the assignment). Specifically, you must use the PyTorch ResNet-18 model pretrained on ImageNet-1K (V1). Divide your fine-tuning into two parts:

1. First, fine-tune the ResNet-18 with the same training hyperparameters you used for your best model in the first part of the assignment.
1. Then, tweak the training hyperparameters in order to increase the accuracy on the validation split of `GroceryStoreDataset`. Justify your choices by analyzing the training plots and/or citing sources that guided you in your decisions (papers, blog posts, YouTube videos, or whatever else you find enlightening). You should consider yourselves satisfied once you obtain a classification accuracy on the **validation** split **between 80 and 90%**.

In [None]:
"""
import torch
import torchvision.models as models
from torch.utils.data import DataLoader
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR
import copy
from torchvision import transforms

# Step 1: Define device
#device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Step 2: Load pretrained ResNet-18 model
#resnet18 = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)  # Load pretrained ResNet-18

# Step 3: Modify the classifier to fit the GroceryStoreDataset
#num_classes = train_dataset.get_num_classes()  # Dynamically fetch the correct number of classes
#resnet18.fc = nn.Linear(resnet18.fc.in_features, num_classes)
#resnet18 = resnet18.to(device)

# Step 4: Define training and validation transformations
train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

val_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Step 5: Load datasets and dataloaders
train_dataset = GroceryStoreDataset(split='train', transform=train_transforms)
val_dataset = GroceryStoreDataset(split='val', transform=val_transforms)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)


# Step 2: Load pretrained ResNet-18 model
resnet18 = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Step 3: Modify the classifier to fit the GroceryStoreDataset
num_classes = train_dataset.get_num_classes()  # Dynamically fetch the correct number of classes
resnet18.fc = nn.Linear(resnet18.fc.in_features, num_classes)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
resnet18 = resnet18.to(device)

# Step 6: Define the loss function, optimizer, and learning rate scheduler
#criterion = nn.CrossEntropyLoss()
#optimizer = optim.Adam(resnet18.parameters(), lr=0.001)
#scheduler = StepLR(optimizer, step_size=10, gamma=0.1)
criterion = LabelSmoothingLoss(smoothing=0.1)
optimizer = optim.Adam(resnet18.parameters(), lr=0.001)
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)


# Train the ResNet-18 model
resnet18, train_loss_arr, train_acc_arr, val_loss_arr, val_acc_arr = train_model_with_early_stopping_and_scheduler(
    model=resnet18,
    train_loader=train_loader,
    val_loader=val_loader,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    num_epochs=30,
    patience=5
)

# Save the fine-tuned model
torch.save(resnet18.state_dict(), 'fine_tuned_resnet18.pth')

"""


In [14]:
import torch
import torch.nn as nn
from torch.nn import functional as F
from torch import Tensor
from torchsummary import summary
from torch.optim import Adam, lr_scheduler, SGD
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms as T
from torchvision.models import resnet18, ResNet18_Weights

def get_data_loaders(train_transform, val_test_transform, batch_size=32):

    train_dataset = GroceryStoreDataset(split='train', transform=train_transform)
    val_dataset = GroceryStoreDataset(split='val', transform=val_test_transform)
    test_dataset = GroceryStoreDataset(split='test', transform=val_test_transform)

    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
    test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

    return train_loader, val_loader, test_loader

train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

val_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])


train_loader, val_loader, _ = get_data_loaders(train_transform=train_transforms, val_test_transform=val_transforms)

all_models = {}

def get_model():
    model = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
    num_classes = train_dataset.get_num_classes()
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

In [16]:
model = get_model().to(device)

for name, param in model.named_parameters():
    if 'fc' in name:  # If it's part of the fully connected layer
        param.requires_grad = True  # Keep it trainable
    else:
        param.requires_grad = False  # Freeze all other layers

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 146MB/s]


In [17]:
criterion = LabelSmoothingLoss(smoothing=0.1)
optimizer = Adam(model.parameters(), lr=0.001)
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)
num_epochs = 30
patience = 5

In [18]:
#model_save_path='resnet18-v1.pth'

trained_model, train_loss_arr, train_acc_arr, val_loss_arr, val_acc_arr = train_model_with_early_stopping_and_scheduler(
    model=model,
    train_loader=train_loader,
    val_loader=val_loader,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    num_epochs=num_epochs,
    patience=patience
)

# Save the best model
torch.save(trained_model.state_dict(), 'resnet18-v1.pth')

Epoch 1/30
Training Loss: 2.7267 Acc: 0.3625
Validation Loss: 2.3518 Acc: 0.4257
--------------------
Epoch 2/30
Training Loss: 1.7511 Acc: 0.6947
Validation Loss: 1.9336 Acc: 0.5541
--------------------
Epoch 3/30
Training Loss: 1.4229 Acc: 0.8170
Validation Loss: 1.7694 Acc: 0.6351
--------------------
Epoch 4/30
Training Loss: 1.2808 Acc: 0.8693
Validation Loss: 1.7370 Acc: 0.6419
--------------------
Epoch 5/30
Training Loss: 1.1839 Acc: 0.9076
Validation Loss: 1.6506 Acc: 0.6858
--------------------
Epoch 6/30
Training Loss: 1.1437 Acc: 0.9170
Validation Loss: 1.5809 Acc: 0.7162
--------------------
Epoch 7/30
Training Loss: 1.1051 Acc: 0.9261
Validation Loss: 1.5748 Acc: 0.7230
--------------------
Epoch 8/30
Training Loss: 1.0656 Acc: 0.9420
Validation Loss: 1.6145 Acc: 0.6993
--------------------
Epoch 9/30
Training Loss: 1.0446 Acc: 0.9462
Validation Loss: 1.6160 Acc: 0.6926
--------------------
Epoch 10/30
Training Loss: 1.0217 Acc: 0.9508
Validation Loss: 1.5848 Acc: 0.7196


### Section 2 - Tweak the hyperparameters

#### Attempt 1
- change the learning rate; higher for the final layer and lower for pretrained layers
- increase patience
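
When combining per-layer learning rates with frozen layers, keep in mind that a parameter group only has an effect if its parameters receive gradients. The sketch below uses two tiny `nn.Linear` layers as hypothetical stand-ins (`backbone` for a pretrained block, `head` for the new classifier) to show that a frozen group listed in the optimizer is left untouched by `optimizer.step()`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

backbone = nn.Linear(4, 4)  # hypothetical stand-in for a pretrained block
head = nn.Linear(4, 2)      # hypothetical stand-in for the new final layer

# Freeze the backbone, as done for every non-fc layer in the notebook.
for p in backbone.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam([
    {"params": head.parameters(), "lr": 1e-3},
    {"params": backbone.parameters(), "lr": 1e-4},  # frozen: gets no gradients
])

backbone_before = backbone.weight.clone()
head_before = head.weight.clone()

x = torch.randn(8, 4)
loss = head(backbone(x)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

backbone_changed = not torch.equal(backbone_before, backbone.weight)
head_changed = not torch.equal(head_before, head.weight)
print("backbone updated:", backbone_changed)
print("head updated:", head_changed)
```

For the lower learning rate to matter, the corresponding layers would need `requires_grad = True` before training.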

In [17]:
model = get_model().to(device)

for name, param in model.named_parameters():
    if 'fc' in name:  # If it's part of the fully connected layer
        param.requires_grad = True  # Keep it trainable
    else:
        param.requires_grad = False  # Freeze all other layers

In [18]:
criterion = LabelSmoothingLoss(smoothing=0.1)
#optimizer = Adam(model.parameters(), lr=0.001)
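optimizer = optim.Adam([
    {'params': model.fc.parameters(), 'lr': 1e-3},  # Higher LR for final layer
    # Lower LR for layer4. Note: layer4 is frozen above (requires_grad=False),
    # so this group receives no gradients and is not updated in this attempt.
    {'params': model.layer4.parameters(), 'lr': 1e-4}])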
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)
num_epochs = 30
patience = 10

In [19]:
#model_save_path='resnet18-v2.pth'

trained_model, train_loss_arr, train_acc_arr, val_loss_arr, val_acc_arr = train_model_with_early_stopping_and_scheduler(
    model=model,
    train_loader=train_loader,
    val_loader=val_loader,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    num_epochs=num_epochs,
    patience=patience
)

# Save the best model
torch.save(trained_model.state_dict(), 'resnet18-v2.pth')

Epoch 1/30
Training Loss: 2.6671 Acc: 0.3883
Validation Loss: 2.2631 Acc: 0.4527
--------------------
Epoch 2/30
Training Loss: 1.7323 Acc: 0.6826
Validation Loss: 1.9393 Acc: 0.5676
--------------------
Epoch 3/30
Training Loss: 1.4136 Acc: 0.8288
Validation Loss: 1.7186 Acc: 0.6858
--------------------
Epoch 4/30
Training Loss: 1.2682 Acc: 0.8731
Validation Loss: 1.6852 Acc: 0.6622
--------------------
Epoch 5/30
Training Loss: 1.1928 Acc: 0.8981
Validation Loss: 1.6246 Acc: 0.7027
--------------------
Epoch 6/30
Training Loss: 1.1374 Acc: 0.9261
Validation Loss: 1.6221 Acc: 0.7128
--------------------
Epoch 7/30
Training Loss: 1.0976 Acc: 0.9326
Validation Loss: 1.5897 Acc: 0.7162
--------------------
Epoch 8/30
Training Loss: 1.0612 Acc: 0.9428
Validation Loss: 1.5573 Acc: 0.7230
--------------------
Epoch 9/30
Training Loss: 1.0431 Acc: 0.9477
Validation Loss: 1.6378 Acc: 0.6926
--------------------
Epoch 10/30
Training Loss: 1.0245 Acc: 0.9515
Validation Loss: 1.5817 Acc: 0.7061


#### Attempt 2: change the learning rate for the intermediate layers

In [15]:

criterion = LabelSmoothingLoss(smoothing=0.1)
optimizer = optim.Adam([
    {'params': model.fc.parameters(), 'lr': 1e-3},       # Higher learning rate for final layer
    {'params': model.layer4.parameters(), 'lr': 1e-4},   # Intermediate for higher layers
    {'params': model.layer3.parameters(), 'lr': 1e-5}    # Lower for earlier layers
])
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)
num_epochs = 30
patience = 10

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 83.5MB/s]


In [21]:
#model_save_path='resnet18-v3.pth'

trained_model, train_loss_arr, train_acc_arr, val_loss_arr, val_acc_arr = train_model_with_early_stopping_and_scheduler(
    model=model,
    train_loader=train_loader,
    val_loader=val_loader,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    num_epochs=num_epochs,
    patience=patience
)

# Save the best model
torch.save(trained_model.state_dict(), 'resnet18-v3.pth')

Epoch 1/30
Training Loss: 1.0100 Acc: 0.9591
Validation Loss: 1.6077 Acc: 0.7162
--------------------
Epoch 2/30
Training Loss: 0.9935 Acc: 0.9610
Validation Loss: 1.5597 Acc: 0.7162
--------------------
Epoch 3/30
Training Loss: 0.9815 Acc: 0.9595
Validation Loss: 1.5811 Acc: 0.7264
--------------------
Epoch 4/30
Training Loss: 0.9777 Acc: 0.9640
Validation Loss: 1.5641 Acc: 0.7095
--------------------
Epoch 5/30
Training Loss: 0.9628 Acc: 0.9667
Validation Loss: 1.4937 Acc: 0.7635
--------------------
Epoch 6/30
Training Loss: 0.9576 Acc: 0.9682
Validation Loss: 1.5073 Acc: 0.7365
--------------------
Epoch 7/30
Training Loss: 0.9647 Acc: 0.9667
Validation Loss: 1.5412 Acc: 0.7365
--------------------
Epoch 8/30
Training Loss: 0.9404 Acc: 0.9720
Validation Loss: 1.5696 Acc: 0.7331
--------------------
Epoch 9/30
Training Loss: 0.9459 Acc: 0.9708
Validation Loss: 1.5612 Acc: 0.7230
--------------------
Epoch 10/30
Training Loss: 0.9384 Acc: 0.9754
Validation Loss: 1.5723 Acc: 0.7128


#### Attempt 3: unfreeze all layers, keep the same criterion, and decay the learning rate more often but more gently (StepLR with step_size=5, gamma=0.5)
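
A step scheduler scales every parameter group by the same factor, so the ratio between the per-layer learning rates stays fixed while all of them shrink. The sketch below tabulates the rates for this attempt's settings (step size 5, decay factor 0.5) in plain Python; the helper name `group_lrs` is an assumption.

```python
def group_lrs(base_lrs, epoch, step_size, gamma):
    """Per-group learning rates in effect during the given 0-based epoch."""
    factor = gamma ** (epoch // step_size)
    return {name: lr * factor for name, lr in base_lrs.items()}


base = {"fc": 1e-3, "layer4": 1e-4, "layer3": 1e-5}

for epoch in (0, 5, 10, 15):
    print(epoch, group_lrs(base, epoch, step_size=5, gamma=0.5))
```

Parameters that are not passed to the optimizer at all (here everything outside `fc`, `layer4`, and `layer3`) are not updated by it, even when they are unfrozen.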

In [16]:
model = get_model().to(device)
for param in model.parameters():
    param.requires_grad = True  # Unfreeze all layers


In [17]:
criterion = LabelSmoothingLoss(smoothing=0.1)
optimizer = optim.Adam([
    {'params': model.fc.parameters(), 'lr': 1e-3},       # Higher learning rate for final layer
    {'params': model.layer4.parameters(), 'lr': 1e-4},   # Intermediate for higher layers
    {'params': model.layer3.parameters(), 'lr': 1e-5}    # Lower for earlier layers
])
# Learning rate scheduler to decay LR over time
scheduler = StepLR(optimizer, step_size=5, gamma=0.5)  # Decay LR more frequently for fine-tuning
num_epochs = 20
patience = 10


In [18]:
model_save_path='resnet18-v4.pth'

trained_model, train_loss_arr, train_acc_arr, val_loss_arr, val_acc_arr = train_model_with_early_stopping_and_scheduler(
    model=model,
    train_loader=train_loader,
    val_loader=val_loader,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    num_epochs=num_epochs,
    patience=patience
)


Epoch 1/20
Training Loss: 1.8976 Acc: 0.6413
Validation Loss: 1.5674 Acc: 0.7264
--------------------
Epoch 2/20
Training Loss: 0.9577 Acc: 0.9629
Validation Loss: 1.4148 Acc: 0.7669
--------------------
Epoch 3/20
Training Loss: 0.8527 Acc: 0.9837
Validation Loss: 1.3970 Acc: 0.7703
--------------------
Epoch 4/20
Training Loss: 0.8060 Acc: 0.9970
Validation Loss: 1.3391 Acc: 0.8243
--------------------
Epoch 5/20
Training Loss: 0.7911 Acc: 0.9955
Validation Loss: 1.3271 Acc: 0.7905
--------------------
Epoch 6/20
Training Loss: 0.7742 Acc: 0.9992
Validation Loss: 1.3049 Acc: 0.8345
--------------------
Epoch 7/20
Training Loss: 0.7672 Acc: 0.9985
Validation Loss: 1.2842 Acc: 0.8311
--------------------
Epoch 8/20
Training Loss: 0.7647 Acc: 0.9985
Validation Loss: 1.2855 Acc: 0.8345
--------------------
Epoch 9/20
Training Loss: 0.7576 Acc: 1.0000
Validation Loss: 1.2771 Acc: 0.8311
--------------------
Epoch 10/20
Training Loss: 0.7573 Acc: 0.9996
Validation Loss: 1.2453 Acc: 0.8649
