# Assignment Module 2: Product Classification

The goal of this assignment is to implement a neural network that classifies smartphone pictures of products found in grocery stores. The assignment is divided into two parts: first, you will implement your own neural network for image classification from scratch; then, you will fine-tune a pretrained network provided by PyTorch.


## Preliminaries: the dataset

The dataset you will be using contains natural images of products taken with a smartphone camera in different grocery stores:

<p align="center">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Granny-Smith.jpg" width="150">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Pink-Lady.jpg" width="150">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Lemon.jpg" width="150">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Banana.jpg" width="150">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Vine-Tomato.jpg" width="150">
</p>
<p align="center">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Yellow-Onion.jpg" width="150">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Green-Bell-Pepper.jpg" width="150">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Arla-Standard-Milk.jpg" width="150">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Oatly-Natural-Oatghurt.jpg" width="150">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Alpro-Fresh-Soy-Milk.jpg" width="150">
</p>

The products belong to the following 43 classes:
```
0.  Apple
1.  Avocado
2.  Banana
3.  Kiwi
4.  Lemon
5.  Lime
6.  Mango
7.  Melon
8.  Nectarine
9.  Orange
10. Papaya
11. Passion-Fruit
12. Peach
13. Pear
14. Pineapple
15. Plum
16. Pomegranate
17. Red-Grapefruit
18. Satsumas
19. Juice
20. Milk
21. Oatghurt
22. Oat-Milk
23. Sour-Cream
24. Sour-Milk
25. Soyghurt
26. Soy-Milk
27. Yoghurt
28. Asparagus
29. Aubergine
30. Cabbage
31. Carrots
32. Cucumber
33. Garlic
34. Ginger
35. Leek
36. Mushroom
37. Onion
38. Pepper
39. Potato
40. Red-Beet
41. Tomato
42. Zucchini
```

The dataset is split into training (`train`), validation (`val`), and test (`test`) sets.

The following code cells download the dataset and define a `torch.utils.data.Dataset` class to access it. This `Dataset` class will be the starting point of your assignment: use it in your own code and build everything else around it.

In [31]:
try:
    import google.colab  # clone automatically only when running on Colab
    !git clone https://github.com/marcusklasson/GroceryStoreDataset.git
except ImportError:
    pass

fatal: destination path 'GroceryStoreDataset' already exists and is not an empty directory.


In [32]:
from pathlib import Path
from PIL import Image
from torch import Tensor
from torch.utils.data import Dataset
from typing import List, Tuple

In [33]:
class GroceryStoreDataset(Dataset):

    def __init__(self, split: str, transform=None) -> None:
        super().__init__()

        self.root = Path("GroceryStoreDataset/dataset")
        self.split = split
        self.paths, self.labels = self.read_file()

        self.transform = transform

    def __len__(self) -> int:
        return len(self.labels)

    def __getitem__(self, idx) -> Tuple[Tensor, int]:
        img = Image.open(self.root / self.paths[idx])
        label = self.labels[idx]

        if self.transform:
            img = self.transform(img)

        return img, label

    def read_file(self) -> Tuple[List[str], List[int]]:
        paths = []
        labels = []

        with open(self.root / f"{self.split}.txt") as f:
            for line in f:
                # path, fine-grained class, coarse-grained class
                path, _, label = line.strip().split(", ")
                paths.append(path)
                labels.append(int(label))

        return paths, labels

    def get_num_classes(self) -> int:
        return max(self.labels) + 1
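`read_file` keeps the third field of each split-file line (the coarse-grained class) as the label, which is why the classes above number 43. Before designing a network it can be worth checking how balanced those labels are. The following is a minimal stdlib sketch, assuming the `"path, fine_id, coarse_id"` layout parsed above; the helper name `count_coarse_labels` and the sample lines are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_coarse_labels(lines: Iterable[str]) -> Counter:
    """Count samples per coarse-grained class in split-file lines.

    Assumes every non-empty line follows the "path, fine_id, coarse_id"
    layout that GroceryStoreDataset.read_file parses.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        _path, _fine, coarse = line.split(", ")
        counts[int(coarse)] += 1
    return counts


# Hypothetical lines: the paths and fine-grained IDs are made up
sample = [
    "train/Fruit/Apple/Granny-Smith/img_001.jpg, 0, 0",
    "train/Fruit/Apple/Pink-Lady/img_002.jpg, 1, 0",
    "train/Fruit/Banana/Banana/img_003.jpg, 2, 2",
]
print(count_coarse_labels(sample))  # Counter({0: 2, 2: 1})
```

On the real data, the same function could be fed an open file, e.g. `count_coarse_labels(open("GroceryStoreDataset/dataset/train.txt"))`.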

## Part 1: design your own network

Your goal is to implement a convolutional neural network for image classification and train it on `GroceryStoreDataset`. You should consider yourselves satisfied once you obtain a classification accuracy on the **validation** split of **around 60%**. You are free to achieve that however you want, except for a few rules you must follow:

- You **cannot** simply instantiate an off-the-shelf PyTorch network. Instead, you must construct your network as a composition of existing PyTorch layers. In more concrete terms, you can use e.g. `torch.nn.Linear`, but you **cannot** use e.g. `torchvision.models.alexnet`.

- Justify every *design choice* you make. Design choices include network architecture, training hyperparameters, and, possibly, dataset preprocessing steps. You can either (i) start from the simplest convolutional network you can think of and add complexity one step at a time, while showing how each step gets you closer to the target ~60%, or (ii) start from a model that is already able to achieve the desired accuracy and show how, by removing some of its components, its performance drops (i.e. an *ablation study*). You can *show* your results/improvements however you want: training plots, console-printed values or tables, or whatever else your heart desires: the clearer, the better.

Don't be too concerned with your network's performance: the ~60% is just to give you an idea of when to stop. Keep in mind that a thoroughly justified model with lower accuracy will earn **more** points than a model with higher accuracy but poor experimental validation.

### Installing dependencies

In [34]:
import importlib.util
import subprocess
import sys

def install(package: str) -> None:
    """Install a package with the pip of the running interpreter."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])

def is_installed(module: str) -> bool:
    """Check whether a module is importable, without spawning a subprocess."""
    return importlib.util.find_spec(module) is not None

# Upgrade pip
subprocess.check_call([sys.executable, "-m", "pip", "install", "--upgrade", "pip"])

# (pip package name, import name) pairs: install only the missing ones
packages = [
    ("wandb", "wandb"),
    ("torchmetrics", "torchmetrics"),
    ("torchsummary", "torchsummary")
]

for pkg_name, import_name in packages:
    if not is_installed(import_name):
        install(pkg_name)

In [35]:
# The cell above is quicker than the equivalent shell commands below
# ! python3.11 -m pip install --upgrade pip
# ! pip install -q wandb
# ! pip install -q torchmetrics
# ! pip install torchsummary

### Weights & Biases for tracking training runs

In [36]:
import wandb

WANDB_USER = "lollopelle-2-universit-di-bologna"
WANDB_PROJECT = "IPCV-assignment-2"

### Imports

In [37]:
# Standard library imports
import csv
import copy
import random
from pathlib import Path
from typing import Any, Dict
import os
import json
from datetime import datetime
import time

# Third-party library imports
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from tqdm.notebook import tqdm  # notebook-friendly progress bars

# PyTorch imports
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import AdamW
from torch.optim.lr_scheduler import OneCycleLR, LambdaLR
from torch.utils.data import DataLoader
import torch.utils
from torchmetrics.classification.accuracy import Accuracy
from torchsummary import summary

# Torchvision imports
from torchvision import transforms as T
from torchvision.models import resnet18, ResNet18_Weights

#### Functions

In [38]:
def fix_random(seed: int) -> None:
    """Fix all the possible sources of randomness.

    Args:
        seed: the seed to use.
    """
    np.random.seed(seed)
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)

    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

def extract_classes(csv_file_path: str) -> dict:
    """
    Extract unique pairs of IDs and labels from a CSV file.

    This function reads a CSV file, extracts the third and fourth columns,
    and creates a dictionary with unique pairs of IDs (from the fourth column)
    and labels (from the third column).

    Parameters:
    csv_file_path (str): The path to the CSV file.

    Returns:
    dict: A dictionary with IDs as keys and labels as values.
    """

    # Dictionary to store the unique pairs
    classes = {}

    # Read the CSV file
    with open(csv_file_path, mode='r', newline='', encoding='utf-8') as file:
        csv_reader = csv.reader(file)

        # Skip the CSV header
        next(csv_reader)

        for row in csv_reader:
            label = row[2]       # Third column
            id = int(row[3])     # Fourth column

            # Add the pair to the dictionary if it doesn't already exist
            if id not in classes:
                classes[id] = label

    return classes

def show_grid(dataset: GroceryStoreDataset, classes: dict) -> None:
    """Shows a grid with 10 random images taken from the dataset.

    Args:
        dataset: the dataset containing the images.
        classes: a mapping from integer class IDs to class names.
    """
    fig = plt.figure(figsize=(15, 5))
    # Sample 10 random indices over the whole dataset
    indices_random = np.random.randint(0, len(dataset), size=10)

    for count, idx in enumerate(indices_random):
        fig.add_subplot(2, 5, count + 1)
        image, label = dataset[idx]  # (Tensor, int)
        plt.title(classes[label])
        plt.imshow(T.ToPILImage()(image))
        plt.axis("off")

    plt.tight_layout()
    plt.show()

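def parse_compose(v) -> List[str]:
    """Turn the multi-line repr of a T.Compose (or of an nn.Module) into a list
    of its inner lines, dropping the opening "Name(" line and the closing ")"."""
    res = []
    for t in str(v).split("\n")[1:]:
        res.append(t.strip())
    return res[:-1]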

### Configuration

In [39]:
fix_random(seed=42)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
if device.type == "cuda":
    print("All good, a GPU is available")
else:
    print("No GPU found: on Colab, enable one via Edit -> Notebook settings")

# Configuration of earlier runs, kept for reference (not used below)
old_cfg = {
    "resize_size": 256,
    "crop_size": 224,

    "batch_size": 4,
    "num_epochs": 20,

    "lr": 1e-3,
    "wd": 1e-4,
    "step_size": 5
}

cfg = {
    "resize_size": 256,
    "crop_size": 224,
    "batch_size": 16,
    "num_epochs": 50,
    "lr": 1e-3,
    "wd": 5e-4,
    "step_size": 5
}

# Placeholder model flag: the model-selection cell further down sets the actual value
FLAG = "proj"

All good, a GPU is available


### Data

In [40]:
# Map integer class IDs to their class names
classes = extract_classes(csv_file_path = 'GroceryStoreDataset/dataset/classes.csv')

# Preprocessing
# Normalize with the ImageNet per-channel statistics
mean_image_net = [0.485, 0.456, 0.406]
std_image_net = [0.229, 0.224, 0.225]
data_transforms = {
    # Training: random crops and horizontal flips as data augmentation
    "train": T.Compose([
        T.RandomResizedCrop(cfg["crop_size"]),
        T.RandomHorizontalFlip(),
        T.ToTensor(),
        T.Normalize(mean_image_net, std_image_net)
    ]),

    # Validation: deterministic resize + center crop
    "val": T.Compose([
        T.Resize(cfg["resize_size"]),
        T.CenterCrop(cfg["crop_size"]),
        T.ToTensor(),
        T.Normalize(mean_image_net, std_image_net)
    ]),

    # Test: same pipeline as validation, so every image has the same size and can be batched
    "test": T.Compose([
        T.Resize(cfg["resize_size"]),
        T.CenterCrop(cfg["crop_size"]),
        T.ToTensor(),
        T.Normalize(mean_image_net, std_image_net)
    ])
}

# Datasets
data_train = GroceryStoreDataset(split="train", transform=data_transforms["train"])
data_val = GroceryStoreDataset(split="val", transform=data_transforms["val"])
data_test = GroceryStoreDataset(split="test", transform=data_transforms["test"])

# DEBUG
# show_grid(dataset=data_train, classes=classes)
# show_grid(dataset=data_test, classes=classes)
# show_grid(dataset=data_val, classes=classes)
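`T.Normalize` maps each channel value `x` to `(x - mean) / std`, so the tensors returned by the datasets are no longer in `[0, 1]`. Since `show_grid` passes these tensors straight to `T.ToPILImage`, the displayed colors will likely look off unless the normalization is undone first. A minimal per-pixel sketch of both directions, in plain Python (the helper names are hypothetical):

```python
from typing import List, Sequence

IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalize(pixel: Sequence[float], mean=IMAGENET_MEAN, std=IMAGENET_STD) -> List[float]:
    """Per-channel (x - mean) / std, the mapping T.Normalize applies."""
    return [(x - m) / s for x, m, s in zip(pixel, mean, std)]


def denormalize(pixel: Sequence[float], mean=IMAGENET_MEAN, std=IMAGENET_STD) -> List[float]:
    """Inverse mapping x * std + mean, to apply before displaying an image."""
    return [x * s + m for x, m, s in zip(pixel, mean, std)]


rgb = [1.0, 0.5, 0.0]       # one RGB pixel in [0, 1], as produced by T.ToTensor()
z = normalize(rgb)          # e.g. the red value becomes about 2.25, outside [0, 1]
restored = denormalize(z)   # back to the original pixel, up to float rounding
print(z, restored)
```

On a `(3, H, W)` tensor, the same inverse can be written with broadcasting, e.g. `img * std[:, None, None] + mean[:, None, None]` with `mean` and `std` as tensors.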

### Models

#### First version

In [None]:
class ProjectCNN_v1(nn.Module):
    def __init__(self, n_classes):
        super(ProjectCNN_v1, self).__init__()
        
        # Down path
        self.conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=7, stride=3, padding=3)
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=6)
        
        # Fully connected layer for classification
        self.fc1 = nn.Linear(32 * 13 * 13, n_classes)
    
    def forward(self, x):
        x = F.relu(self.conv(x)) 
        x = self.maxpool(x)
        
        # Flatten and pass through fully connected layer
        x = torch.flatten(x, 1) 
        x = self.fc1(x)
        return x
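The input size of `fc1` (`32 * 13 * 13`) follows from the output-size formula shared by `nn.Conv2d` and `nn.MaxPool2d` (with dilation 1 and `ceil_mode=False`): `out = floor((in + 2 * padding - kernel) / stride) + 1`. A small sketch tracing a 224x224 crop through `ProjectCNN_v1` (the helper `out_size` is hypothetical):

```python
def out_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of Conv2d / MaxPool2d (dilation=1, ceil_mode=False)."""
    return (size + 2 * padding - kernel) // stride + 1


# ProjectCNN_v1 on a 224x224 crop
after_conv = out_size(224, kernel=7, stride=3, padding=3)  # 75
after_pool = out_size(after_conv, kernel=2, stride=6)      # 13
print(after_conv, after_pool, 32 * after_pool * after_pool)  # 75 13 5408
```

Note that a 2x2 pooling window moved with stride 6 never looks at 4 of every 6 rows and columns, so this first version throws away much of the convolution output.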

#### Second version

In [None]:
class ProjectCNN_v2(nn.Module):
    def __init__(self, n_classes):
        super(ProjectCNN_v2, self).__init__()
        
        # Down path
        self.conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3)
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
        
        # Fully connected layers for classification
        self.fc1 = nn.Linear(64 * 56 * 56, 512)
        self.fc2 = nn.Linear(512, n_classes) 
    
    def forward(self, x):
        x = F.relu(self.conv(x))
        x = self.maxpool(x)
        
        # Flatten and pass through fully connected layers
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)  # No activation here: F.cross_entropy applies log-softmax internally
        return x
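With a milder stride and a 2x2/2 pooling, `ProjectCNN_v2` feeds `64 * 56 * 56 = 200,704` features into `fc1`, and that single layer ends up holding almost all of the model's parameters. A quick count, where `linear_params` is a hypothetical helper and 43 is the number of classes:

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Parameters of nn.Linear: the weight matrix plus one bias per output."""
    return in_features * out_features + out_features


conv = 3 * 64 * 7 * 7 + 64               # the single 7x7 conv: 9,472
fc1 = linear_params(64 * 56 * 56, 512)   # 102,760,960
fc2 = linear_params(512, 43)             # 22,059
total = conv + fc1 + fc2
print(f"fc1 holds {fc1 / total:.2%} of {total:,} parameters")
```

This is one way to motivate the later versions: adding convolution and pooling stages before flattening shrinks the spatial size, and with it the fully connected layers.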

#### Third version

In [None]:
class ProjectCNN_v3(nn.Module):
    def __init__(self, n_classes):
        super(ProjectCNN_v3, self).__init__()
        
        # Down path
        self.conv11 = nn.Conv2d(in_channels=3, out_channels=128, kernel_size=5, padding=1)
        self.maxpool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv21 = nn.Conv2d(in_channels=128, out_channels=256, kernel_size=5, padding=1)
        self.maxpool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv31 = nn.Conv2d(in_channels=256, out_channels=512, kernel_size=5, padding=1)
        self.maxpool3 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        # Fully connected layers for classification
        self.fc1 = nn.Linear(512 * 26 * 26, 1024)
        self.fc2 = nn.Linear(1024, n_classes)
    
    def forward(self, x):
        x = F.relu(self.conv11(x))  
        x = self.maxpool1(x)
        x = F.relu(self.conv21(x))   
        x = self.maxpool2(x)
        x = F.relu(self.conv31(x))  
        x = self.maxpool3(x)
        
        # Flatten and pass through fully connected layers
        x = torch.flatten(x, 1)  
        x = F.relu(self.fc1(x))
        x = self.fc2(x)  # No activation here: F.cross_entropy applies log-softmax internally
        return x

#### Fourth version

In [None]:
class ProjectCNN_v4(nn.Module):
    def __init__(self, n_classes):
        super(ProjectCNN_v4, self).__init__()
        
        # Stem layer
        self.stem_conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=7, stride=2, padding=3)
        self.stem_batchnorm1 = nn.BatchNorm2d(32)
        self.stem_pool1 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.stem_conv2 = nn.Conv2d(in_channels=32, out_channels=32, kernel_size=1, stride=1, padding=0)
        self.stem_batchnorm2 = nn.BatchNorm2d(32)
        self.stem_conv3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1)
        self.stem_batchnorm3 = nn.BatchNorm2d(64)
        self.stem_pool2 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        
        # Down path
        self.conv11 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=5, padding=1)
        self.maxpool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        self.conv21 = nn.Conv2d(in_channels=128, out_channels=256, kernel_size=5, padding=1)
        self.maxpool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        self.conv31 = nn.Conv2d(in_channels=256, out_channels=512, kernel_size=5, padding=1)
        self.maxpool3 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        # Fully connected layers for classification
        self.fc1 = nn.Linear(512 * 1 * 1, 1024)
        self.fc2 = nn.Linear(1024, n_classes)
    
    def forward(self, x):
        # Stem layer
        x = F.relu(self.stem_batchnorm1(self.stem_conv1(x)))
        x = self.stem_pool1(x)
        x = F.relu(self.stem_batchnorm2(self.stem_conv2(x)))
        x = F.relu(self.stem_batchnorm3(self.stem_conv3(x)))
        x = self.stem_pool2(x)
        
        # Down path
        x = F.relu(self.conv11(x))
        x = self.maxpool1(x)
        x = F.relu(self.conv21(x))
        x = self.maxpool2(x)
        x = F.relu(self.conv31(x))
        x = self.maxpool3(x)
        
        # Flatten and pass through fully connected layers
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)  # No activation here: F.cross_entropy applies log-softmax internally
        
        return x
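`ProjectCNN_v4` prepends a strided "stem" to the down path of `ProjectCNN_v3`. One way to justify it is the theoretical receptive field, i.e. how many input pixels can influence one unit of the last feature map. With `r` the receptive field and `j` the cumulative stride ("jump"), each layer updates `r += (kernel - 1) * j` and then `j *= stride`; padding does not change the size. A sketch under that standard recursion (the layer lists are transcribed from the two classes above):

```python
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int]]) -> int:
    """Theoretical receptive field after a chain of (kernel, stride) layers."""
    r, j = 1, 1  # a single input pixel, unit jump
    for kernel, stride in layers:
        r += (kernel - 1) * j
        j *= stride
    return r


# (kernel, stride) of every conv and max-pool layer, in forward order
down_path = [(5, 1), (2, 2), (5, 1), (2, 2), (5, 1), (2, 2)]  # ProjectCNN_v3
stem = [(7, 2), (3, 2), (1, 1), (3, 1), (3, 2)]               # stem added in ProjectCNN_v4
print(receptive_field(down_path), receptive_field(stem + down_path))  # 36 307
```

So the last features of v3 see at most a 36x36 window of the 224x224 crop, while with the stem they can cover the whole crop; the stem also shrinks the final map to 1x1, which is why `fc1` drops to `512 * 1 * 1` inputs.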

#### Fifth version

In [None]:
class ProjectCNN_v5(nn.Module):
    def __init__(self, n_classes):
        super(ProjectCNN_v5, self).__init__()
        
        # Stem layer
        self.stem_conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=7, stride=2, padding=3)
        self.stem_batchnorm1 = nn.BatchNorm2d(32)
        self.stem_pool1 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.stem_conv2 = nn.Conv2d(in_channels=32, out_channels=32, kernel_size=1, stride=1, padding=0)
        self.stem_batchnorm2 = nn.BatchNorm2d(32)
        self.stem_conv3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1)
        self.stem_batchnorm3 = nn.BatchNorm2d(64)
        self.stem_pool2 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        
        # Down path
        self.conv11 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=5, padding=1)
        self.batchnorm11 = nn.BatchNorm2d(128)
        self.maxpool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        self.conv21 = nn.Conv2d(in_channels=128, out_channels=256, kernel_size=5, padding=1)
        self.batchnorm21 = nn.BatchNorm2d(256)
        self.maxpool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        self.conv31 = nn.Conv2d(in_channels=256, out_channels=512, kernel_size=5, padding=1)
        self.batchnorm31 = nn.BatchNorm2d(512)
        self.maxpool3 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        # Fully connected layers for classification
        self.fc1 = nn.Linear(512 * 1 * 1, 1024)
        self.fc2 = nn.Linear(1024, n_classes)
    
    def forward(self, x):
        # Stem layer
        x = F.relu(self.stem_batchnorm1(self.stem_conv1(x)))
        x = self.stem_pool1(x)
        x = F.relu(self.stem_batchnorm2(self.stem_conv2(x)))
        x = F.relu(self.stem_batchnorm3(self.stem_conv3(x)))
        x = self.stem_pool2(x)
        
        # Down path
        x = F.relu(self.batchnorm11(self.conv11(x)))
        x = self.maxpool1(x)
        x = F.relu(self.batchnorm21(self.conv21(x)))
        x = self.maxpool2(x)
        x = F.relu(self.batchnorm31(self.conv31(x)))
        x = self.maxpool3(x)
        
        # Flatten and pass through fully connected layers
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)  # No activation here: F.cross_entropy applies log-softmax internally
        
        return x
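`ProjectCNN_v5` adds `nn.BatchNorm2d` after every convolution of the down path. In training mode, batch normalization standardizes each channel with the mean and (biased) variance of the current batch, computed over the batch and spatial positions, and then applies a learnable scale `gamma` and shift `beta`. A pure-Python sketch for a single channel whose values are flattened into one list; the running statistics used in eval mode are deliberately left out:

```python
import math
from typing import List


def batch_norm(values: List[float], gamma: float = 1.0, beta: float = 0.0,
               eps: float = 1e-5) -> List[float]:
    """Training-mode batch norm for ONE channel (values flattened over N, H, W)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)  # biased variance
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


out = batch_norm([1.0, 2.0, 3.0, 4.0])
print([round(v, 3) for v in out])  # [-1.342, -0.447, 0.447, 1.342]
```

Keeping activations at zero mean and unit variance per channel usually allows higher learning rates and faster convergence, which is the usual argument for adding it here.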

#### Sixth version

In [None]:
class ProjectCNN_v6(nn.Module):
    def __init__(self, n_classes):
        super(ProjectCNN_v6, self).__init__()
        
        # Stem layer
        self.stem_conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=7, stride=2, padding=3)
        self.stem_batchnorm1 = nn.BatchNorm2d(32)
        self.stem_pool1 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.stem_conv2 = nn.Conv2d(in_channels=32, out_channels=32, kernel_size=1, stride=1, padding=0)
        self.stem_batchnorm2 = nn.BatchNorm2d(32)
        self.stem_conv3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1)
        self.stem_batchnorm3 = nn.BatchNorm2d(64)
        self.stem_pool2 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        
        # Down path
        self.conv11 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1)
        self.batchnorm11 = nn.BatchNorm2d(128)
        self.conv12 = nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding=1)
        self.batchnorm12 = nn.BatchNorm2d(128)
        self.maxpool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        self.conv21 = nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, padding=1)
        self.batchnorm21 = nn.BatchNorm2d(256)
        self.conv22 = nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, padding=1)
        self.batchnorm22 = nn.BatchNorm2d(256)
        self.maxpool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        self.conv31 = nn.Conv2d(in_channels=256, out_channels=512, kernel_size=3, padding=1)
        self.batchnorm31 = nn.BatchNorm2d(512)
        self.conv32 = nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, padding=1)
        self.batchnorm32 = nn.BatchNorm2d(512)
        self.maxpool3 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        # Fully connected layers for classification
        self.fc1 = nn.Linear(512 * 3 * 3, 1024)
        self.fc2 = nn.Linear(1024, n_classes)
    
    def forward(self, x):
        # Stem layer
        x = F.relu(self.stem_batchnorm1(self.stem_conv1(x)))
        x = self.stem_pool1(x)
        x = F.relu(self.stem_batchnorm2(self.stem_conv2(x)))
        x = F.relu(self.stem_batchnorm3(self.stem_conv3(x)))
        x = self.stem_pool2(x)
        
        # Down path
        x = F.relu(self.batchnorm11(self.conv11(x)))
        x = F.relu(self.batchnorm12(self.conv12(x)))
        x = self.maxpool1(x)
        
        x = F.relu(self.batchnorm21(self.conv21(x)))
        x = F.relu(self.batchnorm22(self.conv22(x)))
        x = self.maxpool2(x)
        
        x = F.relu(self.batchnorm31(self.conv31(x)))
        x = F.relu(self.batchnorm32(self.conv32(x)))
        x = self.maxpool3(x)
        
        # Flatten and pass through fully connected layers
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)  # No activation here: F.cross_entropy applies log-softmax internally
        
        return x
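`ProjectCNN_v6` replaces each 5x5 convolution with two stacked 3x3 convolutions (padding 1). Two stride-1 3x3 layers have the same 5x5 receptive field as one 5x5 layer, with an extra non-linearity in between and fewer weights. The sketch below compares the weight counts for a layer mapping `C` channels to `C` channels (biases ignored; in v6 the first conv of each block also changes the channel count, so this is only illustrative):

```python
def conv_weights(in_ch: int, out_ch: int, k: int) -> int:
    """Number of weights (bias excluded) of a k x k nn.Conv2d."""
    return in_ch * out_ch * k * k


C = 128
one_5x5 = conv_weights(C, C, 5)       # 409,600
two_3x3 = 2 * conv_weights(C, C, 3)   # 294,912
# Receptive field of two stacked stride-1 3x3 convs: 1 + 2 + 2 = 5, same as one 5x5
print(one_5x5, two_3x3, two_3x3 / one_5x5)  # 409600 294912 0.72
```

A second side effect: a 3x3 kernel with padding 1 preserves the spatial size, whereas a 5x5 kernel with padding 1 shrinks it by 2 pixels. This is why the final feature map grows from 1x1 in v5 to 3x3 here, and `fc1` takes `512 * 3 * 3` inputs.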

#### Seventh version

In [None]:
class ProjectCNN_v7(nn.Module):
    def __init__(self, n_classes):
        super(ProjectCNN_v7, self).__init__()
        
        # Stem layer
        self.stem_conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=7, stride=2, padding=3)
        self.stem_batchnorm1 = nn.BatchNorm2d(32)
        self.stem_pool1 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        
        self.stem_conv2 = nn.Conv2d(in_channels=32, out_channels=32, kernel_size=1, stride=1, padding=0)
        self.stem_batchnorm2 = nn.BatchNorm2d(32)
        
        self.stem_conv3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1)
        self.stem_batchnorm3 = nn.BatchNorm2d(64)
        self.stem_pool2 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        
        # Down path
        self.conv11 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1)
        self.batchnorm11 = nn.BatchNorm2d(128)
        self.conv12 = nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding=1)
        self.batchnorm12 = nn.BatchNorm2d(128)
        self.maxpool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        self.conv21 = nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, padding=1)
        self.batchnorm21 = nn.BatchNorm2d(256)
        self.conv22 = nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, padding=1)
        self.batchnorm22 = nn.BatchNorm2d(256)
        self.maxpool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        self.conv31 = nn.Conv2d(in_channels=256, out_channels=512, kernel_size=3, padding=1)
        self.batchnorm31 = nn.BatchNorm2d(512)
        self.conv32 = nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, padding=1)
        self.batchnorm32 = nn.BatchNorm2d(512)
        self.maxpool3 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        # Fully connected layers for classification
        self.fc1 = nn.Linear(512 * 3 * 3, 1024)
        self.dropout = nn.Dropout(0.2)
        self.fc2 = nn.Linear(1024, n_classes)
    
    def forward(self, x):
        # Stem layer
        x = F.relu(self.stem_batchnorm1(self.stem_conv1(x)))
        x = self.stem_pool1(x)
        
        x = F.relu(self.stem_batchnorm2(self.stem_conv2(x)))
        x = F.relu(self.stem_batchnorm3(self.stem_conv3(x)))
        x = self.stem_pool2(x)
        
        # Down path
        x = F.relu(self.batchnorm11(self.conv11(x)))
        x = F.relu(self.batchnorm12(self.conv12(x)))
        x = self.maxpool1(x)
        
        x = F.relu(self.batchnorm21(self.conv21(x)))
        x = F.relu(self.batchnorm22(self.conv22(x)))
        x = self.maxpool2(x)
        
        x = F.relu(self.batchnorm31(self.conv31(x)))
        x = F.relu(self.batchnorm32(self.conv32(x)))
        x = self.maxpool3(x)
        
        # Flatten and pass through fully connected layers
        x = torch.flatten(x, 1)  # Flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = self.dropout(x)  # Apply dropout
        x = self.fc2(x)  # No activation here: F.cross_entropy applies log-softmax internally
        
        return x
    
    def layers(self):
        for name, params in self.named_parameters():
            print(f"{name}: {params.shape}")
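`ProjectCNN_v7` adds `nn.Dropout(0.2)` between the two fully connected layers. PyTorch uses "inverted" dropout: in training mode each element is zeroed with probability `p` and the survivors are scaled by `1 / (1 - p)`, so the expected activation is unchanged and eval mode simply passes the input through. A pure-Python sketch of that behavior (the function is hypothetical, not PyTorch's implementation):

```python
import random
from typing import List


def dropout(x: List[float], p: float, training: bool, rng: random.Random) -> List[float]:
    """Inverted dropout: zero with probability p, rescale survivors by 1 / (1 - p)."""
    if not training or p == 0.0:
        return list(x)  # eval mode: identity
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in x]


rng = random.Random(0)
acts = [1.0] * 10_000
train_out = dropout(acts, p=0.2, training=True, rng=rng)
print(sum(train_out) / len(train_out))                         # close to 1.0
print(dropout(acts, p=0.2, training=False, rng=rng) == acts)   # True
```

This is why `self.model.train()` and `self.model.eval()` matter in the `Trainer`: they switch dropout (and batch norm) between the two behaviors.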

In [50]:
FLAG = "v6"  # which model to train: "v1" ... "v7"
match FLAG:
    case "v1": model = ProjectCNN_v1(n_classes=len(classes))
    case "v2": model = ProjectCNN_v2(n_classes=len(classes))
    case "v3": model = ProjectCNN_v3(n_classes=len(classes))
    case "v4": model = ProjectCNN_v4(n_classes=len(classes))
    case "v5": model = ProjectCNN_v5(n_classes=len(classes))
    case "v6": model = ProjectCNN_v6(n_classes=len(classes))
    case "v7": model = ProjectCNN_v7(n_classes=len(classes))
    case _: raise ValueError(f"Unknown model in FLAG: {FLAG}")

# Move the model to the selected device (a no-op if it is already there)
model.to(device)

In [51]:
summary(
    model,
    input_size=(3, cfg["crop_size"], cfg["crop_size"]),
    device="cuda" if torch.cuda.is_available() else "cpu"  # torchsummary expects "cuda" or "cpu"
)

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 32, 112, 112]           4,736
       BatchNorm2d-2         [-1, 32, 112, 112]              64
         MaxPool2d-3           [-1, 32, 56, 56]               0
            Conv2d-4           [-1, 32, 56, 56]           1,056
       BatchNorm2d-5           [-1, 32, 56, 56]              64
            Conv2d-6           [-1, 64, 56, 56]          18,496
       BatchNorm2d-7           [-1, 64, 56, 56]             128
         MaxPool2d-8           [-1, 64, 28, 28]               0
            Conv2d-9          [-1, 128, 28, 28]          73,856
      BatchNorm2d-10          [-1, 128, 28, 28]             256
           Conv2d-11          [-1, 128, 28, 28]         147,584
      BatchNorm2d-12          [-1, 128, 28, 28]             256
        MaxPool2d-13          [-1, 128, 14, 14]               0
           Conv2d-14          [-1, 256,
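The parameter counts in the summary follow from two formulas: a `k x k` `nn.Conv2d` has `in_ch * out_ch * k * k` weights plus `out_ch` biases, and `nn.BatchNorm2d(C)` has `2 * C` learnable parameters (the running mean and variance are buffers, not parameters). A sketch checking them against the rows printed above (the helper names are hypothetical):

```python
def conv2d_params(in_ch: int, out_ch: int, k: int, bias: bool = True) -> int:
    """Learnable parameters of a k x k nn.Conv2d."""
    return in_ch * out_ch * k * k + (out_ch if bias else 0)


def batchnorm2d_params(channels: int) -> int:
    """Learnable gamma and beta of nn.BatchNorm2d."""
    return 2 * channels


# Rows of the summary above
assert conv2d_params(3, 32, 7) == 4_736        # Conv2d-1
assert conv2d_params(32, 32, 1) == 1_056       # Conv2d-4
assert conv2d_params(32, 64, 3) == 18_496      # Conv2d-6
assert conv2d_params(64, 128, 3) == 73_856     # Conv2d-9
assert conv2d_params(128, 128, 3) == 147_584   # Conv2d-11
assert batchnorm2d_params(32) == 64            # BatchNorm2d-2
assert batchnorm2d_params(128) == 256          # BatchNorm2d-10
print("summary rows match")
```

Since batch norm subtracts the per-channel mean right after each convolution, the convolution bias is redundant there; passing `bias=False` would be an optional saving, not something the models above do.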

### Trainer

In [52]:
# DataLoaders group samples into mini-batches (shuffled for training only)

loader_train = DataLoader(
    data_train,
    batch_size=cfg["batch_size"],
    shuffle=True,
    pin_memory=True
)
loader_val = DataLoader(
    data_val,
    batch_size=cfg["batch_size"],
    shuffle=False
)
loader_test = DataLoader(
    data_test,
    batch_size=cfg["batch_size"],
    shuffle=False
)
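`len(loader_train)` is the number of mini-batches per epoch, and the `Trainer` multiplies it by `cfg["num_epochs"]` to get the total number of optimizer steps. For a map-style dataset with the default sampler this is `ceil(N / batch_size)`, or `floor(N / batch_size)` with `drop_last=True`. A small sketch (the helper name is hypothetical):

```python
import math


def batches_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches a DataLoader yields per epoch."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


print(batches_per_epoch(100, 16))                  # 7 (the last batch holds 4 samples)
print(batches_per_epoch(100, 16, drop_last=True))  # 6
```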

In [53]:
class Trainer:
    def __init__(self,
            model: nn.Module,
            train_loader: DataLoader,
            val_loader: DataLoader,
            test_loader: DataLoader,
            device: torch.device,
            num_classes: int
        ) -> None:
        self.train_loader = train_loader
        self.val_loader = val_loader
        self.test_loader = test_loader
        self.device = device
        self.num_classes = num_classes
        self.num_epochs = cfg["num_epochs"]

        self.model = model.to(device)
        self.optimizer = AdamW(self.model.parameters(), lr=cfg["lr"], weight_decay=cfg["wd"])
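        num_steps = self.num_epochs * len(train_loader)  # total optimizer steps, needed by OneCycleLR
        # self.scheduler = OneCycleLR(self.optimizer, cfg["lr"], total_steps=num_steps)
        # Constant LR (factor 1.0 at every step); swap in the OneCycleLR line above for a scheduled LR
        self.scheduler = LambdaLR(self.optimizer, lr_lambda=lambda step: 1.0)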

        self.step = 0
        self.best_acc = 0.0

        wandb.init(name=cfg["run_name"], entity=WANDB_USER, project=WANDB_PROJECT, config=cfg)
        self.ckpt_path = Path("ckpts")
        self.ckpt_path.mkdir(exist_ok=True)

    def logfn(self, values: Dict[str, Any]) -> None:
        wandb.log(values, step=self.step, commit=False)

    def train(self) -> None:
        self.training_time = time.time()
        for _ in tqdm(range(self.num_epochs), desc="Epoch"):
            self.model.train()

            for imgs, labels in self.train_loader:
                imgs = imgs.to(self.device)
                labels = labels.to(self.device)

                pred = self.model(imgs)
                # print(pred.shape, labels.shape)
                loss = F.cross_entropy(pred, labels)

                self.optimizer.zero_grad()
                loss.backward()
                self.optimizer.step()
                self.scheduler.step()

                if self.step % 10 == 0:
                    self.logfn({"train/loss": loss.item()})
                    self.logfn({"train/lr": self.scheduler.get_last_lr()[0]})

                self.step += 1

            self.eval("train")
            self.eval("val")

        wandb.finish()
        self.training_time = time.time() - self.training_time


    # def test(self) -> None:
    #     wandb.init(name=cfg["run_name"]+"_test", entity=WANDB_USER, project=WANDB_PROJECT, config=cfg)
    #     self.eval("test")
    #     wandb.finish()

    @torch.no_grad()
    def eval(self, split: str) -> None:
        self.model.eval()

        if split == "train":
            loader = self.train_loader
        elif split == "val":
            loader = self.val_loader
        # elif split == "test":
        #     loader = self.test_loader
        else:
            raise ValueError(f"Unknown split: {split}")

        acc = Accuracy("multiclass", num_classes=self.num_classes).to(self.device)

        losses = []
        for imgs, labels in loader:
            imgs = imgs.to(self.device)
            labels = labels.to(self.device)

            pred = self.model(imgs)
            loss = F.cross_entropy(pred, labels)
            losses.append(loss.item())

            pred_softmax = F.softmax(pred, dim=-1)
            acc(pred_softmax, labels)

        loss = sum(losses) / len(losses)
        accuracy = acc.compute().item()  # plain Python float: easy to log, compare and format

        self.logfn({f"{split}/loss": loss})
        self.logfn({f"{split}/acc": accuracy})

        if accuracy > self.best_acc and split == "val":
            self.best_acc = accuracy
            torch.save(self.model.state_dict(), self.ckpt_path / f"{wandb.run.name}.pt")
            self.best_model = copy.deepcopy(self.model)

    def save_model_params(self, cfg, data_transforms):
        model_dir = os.path.join("params", f"{FLAG}_model")
        os.makedirs(model_dir, exist_ok=True)

        timestamp = datetime.now().strftime("%Y-%m-%d-%H-%M-%S")  # in order not to have duplicates
        training_time_formatted = time.strftime('%H-%M-%S', time.gmtime(self.training_time))
        file_name = f"ACC={self.best_acc:.2f}____TT={training_time_formatted}____TM={timestamp}.json"
        file_path = os.path.join(model_dir, file_name)

        combined_params = {
            "cfg": cfg,
            "data_transforms": {k: parse_compose(v) for k,v in data_transforms.items()},
            "model_structure": parse_compose(self.model)  # the trainer's own model, not the global `model`
        }

        with open(file_path, 'w') as f:
            json.dump(combined_params, f, indent=4)
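In `eval`, the reported loss is the mean of the per-batch mean losses. This gives the last batch the same weight as a full one, so it slightly misrepresents the dataset loss whenever the split size is not a multiple of the batch size. Below is a minimal stdlib sketch of the difference. The helper names `mean_of_batch_means` and `sample_weighted_mean` are hypothetical and not part of the notebook.

```python
def mean_of_batch_means(batch_losses: list[float]) -> float:
    """What Trainer.eval computes: every batch counts equally."""
    return sum(batch_losses) / len(batch_losses)


def sample_weighted_mean(batch_losses: list[float], batch_sizes: list[int]) -> float:
    """Weight each batch mean by its number of samples (exact per-sample mean)."""
    total = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)


# Three full batches of 16 samples plus a final batch of only 2 samples.
losses = [1.0, 1.0, 1.0, 3.0]
sizes = [16, 16, 16, 2]
print(mean_of_batch_means(losses))          # 1.5
print(sample_weighted_mean(losses, sizes))  # 1.08
```

Inside `eval`, the weighted version amounts to accumulating `loss.item() * labels.size(0)` and dividing by the number of samples seen.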

In [54]:
cfg["run_name"] = f"ProjectCNN_{FLAG}_scheduledLR"

# Authenticate with wandb.login() or the WANDB_API_KEY environment variable; never hard-code API keys.

trainer = Trainer(
    model,
    loader_train,
    loader_val,
    loader_test,
    device,
    num_classes=len(classes.keys())
)

In [55]:
trainer.train()

Run history:
train/acc   ▁▂▂▂▂▂▄▂▃▄▄▄▅▄▅▆▅▆▆▆▆▇▇▇▇▇▇▇▇███████████
train/loss  ██▅▅▆▄▅▅▆▄▅▃▅▃▃▃▃▂▃▂▃▃▃▂▁▂▃▂▂▃▂▁▂▁▂▂▁▁▁▂
train/lr    ▁▁▂▂▃▄▅▆▆▇███████▇▇▇▇▆▆▅▅▅▄▄▃▃▃▂▂▂▂▁▁▁▁▁
val/acc     ▂▂▃▂▁▁▃▁▂▅▄▃▅▄▄▆▅▇▇▅▆▆▆▆▇▇▇▇▇█▇██▇██████
val/loss    ▅▄▄▅▅▇▅█▆▃▄▄▃▅▅▃▅▂▂▃▃▃▃▃▂▃▂▂▃▁▂▁▂▂▁▁▁▁▁▁

Run summary:
train/acc   0.96212
train/loss  0.12072
train/lr    0.0
val/acc     0.64527
val/loss    1.43736


In [18]:
# trainer.test()

In [56]:
print(f"Best val acc = {trainer.best_acc:.3f}")

Best val acc = 0.669


In [None]:
# Log results
trainer.save_model_params(cfg, data_transforms)
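`save_model_params` writes the training time into the file name with `time.strftime('%H-%M-%S', time.gmtime(...))`. Because `gmtime` returns a time of day, the hour field wraps around after 24 hours. The following sketch shows the effect and a wrap-free alternative. `format_duration` is a hypothetical helper, not part of the notebook.

```python
import time


def format_duration(seconds: float) -> str:
    """Format a duration as HH-MM-SS without wrapping at 24 hours."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}-{minutes:02d}-{secs:02d}"


print(time.strftime('%H-%M-%S', time.gmtime(3725)))   # 01-02-05
print(format_duration(3725))                          # 01-02-05
print(time.strftime('%H-%M-%S', time.gmtime(90000)))  # 01-00-00 (25 h wrapped around)
print(format_duration(90000))                         # 25-00-00
```

For runs well under a day, both produce the same string, so swapping in such a helper would only matter for very long trainings.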

## Part 2: fine-tune an existing network

Your goal is to fine-tune a pretrained **ResNet-18** model on `GroceryStoreDataset`. Use the implementation provided by PyTorch rather than writing your own; relying on a ready-made architecture is exactly what you were **not** allowed to do in the first part of the assignment. Specifically, you must use the PyTorch ResNet-18 model pretrained on ImageNet-1K (V1). Divide your fine-tuning into two parts:

1. First, fine-tune the ResNet-18 with the same training hyperparameters you used for your best model in the first part of the assignment.
1. Then, tweak the training hyperparameters in order to increase the accuracy on the validation split of `GroceryStoreDataset`. Justify your choices by analyzing the training plots and/or citing sources that guided you in your decisions (papers, blog posts, YouTube videos, or whatever else you find enlightening). You should consider yourselves satisfied once you obtain a classification accuracy on the **validation** split **between 80 and 90%**.

### 1) Fine-tune the ResNet-18 with the same training hyperparameters you used for your best model in the first part of the assignment.

In [None]:
# Pretrained ResNet-18 on ImageNet-1K (V1)
pt_resnet18 = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 171MB/s]


In [None]:
## Freeze the pre-trained layers ##
for param in pt_resnet18.parameters():
    param.requires_grad = False

## Modify the last layer ##
pt_resnet18.fc = torch.nn.Linear(pt_resnet18.fc.in_features, len(classes.keys()))

## Stage 1: train only the new classification head (backbone frozen)
cfg["run_name"] = "ResNet-18_pretrained_scratch-training"
## cfg["num_epochs"] = 5
trainer = Trainer(
    pt_resnet18,
    loader_train,
    loader_val,
    loader_test,
    device,
    num_classes=len(classes.keys())
)
trainer.train()

[34m[1mwandb[0m: Currently logged in as: [33mlollopelle-2[0m ([33mlollopelle-2-universit-di-bologna[0m). Use [1m`wandb login --relogin`[0m to force relogin


Run history:
train/acc   ▁▂▄▅▆▇▇▇▇█████████████████████
train/loss  █▇▇▅▅▄▄▃▄▂▃▂▂▃▂▂▁▂▂▂▂▂▂▁▁▁▁▁▂▂▁▂▂▁▁▁▁▁▁▁
train/lr    ▁▁▂▂▃▄▅▆▇▇███████▇▇▇▇▆▆▆▅▅▄▄▄▃▃▃▂▂▂▁▁▁▁▁
val/acc     ▁▂▄▅▆▇▇▇▇▇▇▇██████████▇███████
val/loss    █▇▅▄▃▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

Run summary:
train/acc   0.95795
train/loss  0.1887
train/lr    0.0
val/acc     0.80743
val/loss    0.57449
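Setting `requires_grad = False` stops gradient updates, but it does not freeze the BatchNorm running statistics. Whenever the model is in train mode, the forward pass still updates the `running_mean`/`running_var` buffers of the frozen backbone. The training loop is not shown here, but since `eval` calls `self.model.eval()`, the loop presumably switches back with `model.train()`. Whether adapting these statistics helps or hurts is an empirical question. The sketch below demonstrates the behavior and shows one way to keep the statistics fixed. `freeze_batchnorm_stats` is a hypothetical helper.

```python
import torch

torch.manual_seed(0)

# A BatchNorm layer whose parameters are all frozen.
bn = torch.nn.BatchNorm2d(3)
for param in bn.parameters():
    param.requires_grad = False

x = torch.randn(8, 3, 4, 4) + 5.0  # batch whose mean is far from the initial running_mean (0)

# Train mode: the running statistics are still updated by the forward pass.
bn.train()
before = bn.running_mean.clone()
bn(x)
moved_in_train = not torch.allclose(before, bn.running_mean)

# Eval mode: the buffers are only read, never written.
bn.eval()
before = bn.running_mean.clone()
bn(x)
moved_in_eval = not torch.allclose(before, bn.running_mean)

print(moved_in_train, moved_in_eval)  # True False


def freeze_batchnorm_stats(model: torch.nn.Module) -> None:
    """Put every BatchNorm layer in eval mode (hypothetical helper).

    Call it after model.train(), because train() switches all submodules
    back to training mode.
    """
    for module in model.modules():
        if isinstance(module, (torch.nn.BatchNorm1d, torch.nn.BatchNorm2d, torch.nn.BatchNorm3d)):
            module.eval()


net = torch.nn.Sequential(torch.nn.Conv2d(3, 4, 3), torch.nn.BatchNorm2d(4), torch.nn.ReLU())
net.train()
freeze_batchnorm_stats(net)
print(net[0].training, net[1].training)  # True False
```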


In [None]:
print(f"Best val acc (intermediate) = {trainer.best_acc:.3f}")

Best val acc (intermediate) = 0.834


In [None]:
## Load the checkpoint with the best validation accuracy
sd = torch.load(f"ckpts/{cfg['run_name']}.pt")
pt_resnet18.load_state_dict(sd)

## Unfreeze all layers
for param in pt_resnet18.parameters():
    param.requires_grad = True

## Stage 2: fine-tune the whole network with a 10x smaller learning rate
cfg["run_name"] = "ResNet-18_pretrained_fine-tuning"
cfg["lr"] *= 0.1
cfg["num_epochs"] = 10
trainer = Trainer(
    pt_resnet18,
    loader_train,
    loader_val,
    loader_test,
    device,
    num_classes=len(classes.keys())
)
trainer.train()

Run history:
train/acc   ▂▃▁▅▆▆█▇██
train/loss  ▄▃▃▆▄█▂▃▂▄▄▅▃▅▃▃▁▂▂▄▄▂▄▃▄▃▃▂▂▂▇▁▁▁▁▄▂▂▁▃
train/lr    ▁▁▂▂▃▄▅▆▆▇███████▇▇▇▇▆▆▆▅▅▄▄▄▃▃▃▂▂▂▁▁▁▁▁
val/acc     ▃▃▅▁▅▄▅█▇█
val/loss    ▆▇▅█▄▃▂▁▂▁

Run summary:
train/acc   0.98788
train/loss  0.04966
train/lr    0.0
val/acc     0.88851
val/loss    0.32398
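The stage-2 cell reloads the best stage-1 checkpoint with `torch.load(f"ckpts/{cfg['run_name']}.pt")`, while `Trainer.eval` saves it to `self.ckpt_path / f"{wandb.run.name}.pt"`. The two paths agree only if `ckpt_path` points to `ckpts` and the wandb run name equals `cfg["run_name"]`, as the cells appear to assume. Below is a minimal, self-contained sketch of that save/load round trip in a temporary directory. It uses a small `Linear` layer as a stand-in model and adds `map_location` and `weights_only`, two real `torch.load` arguments that the notebook does not pass.

```python
import tempfile
from pathlib import Path

import torch

torch.manual_seed(0)
model = torch.nn.Linear(512, 43)  # stand-in for the fine-tuned network

with tempfile.TemporaryDirectory() as tmp:
    ckpt_dir = Path(tmp) / "ckpts"  # plays the role of Trainer.ckpt_path
    ckpt_dir.mkdir()
    run_name = "ResNet-18_pretrained_scratch-training"

    # Save side (as in Trainer.eval): file named after the run.
    torch.save(model.state_dict(), ckpt_dir / f"{run_name}.pt")

    # Load side: rebuild the same path from the same run name.
    # map_location="cpu" lets a checkpoint written on a GPU load on a CPU-only machine;
    # weights_only=True restricts unpickling to tensors and plain containers.
    sd = torch.load(ckpt_dir / f"{run_name}.pt", map_location="cpu", weights_only=True)

restored = torch.nn.Linear(512, 43)
restored.load_state_dict(sd)
same = all(torch.equal(a, b) for a, b in zip(model.state_dict().values(), restored.state_dict().values()))
print(same)  # True
```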


In [None]:
print(f"Best val acc (final) = {trainer.best_acc:.3f}")

Best val acc (final) = 0.895
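`cfg["lr"] *= 0.1` modifies the shared config dictionary in place, so re-running the stage-2 cell divides the learning rate again each time. Similarly, `cfg = cfg_fine_tuning` in the next section binds a second name to the same dictionary rather than copying it. Below is a stdlib sketch of deriving the stage-2 config from a copy instead. `make_stage2_cfg` is hypothetical, and the sketch assumes, as the cells suggest, that `Trainer` reads the global `cfg`. In the notebook, this would become `cfg = make_stage2_cfg(...)`.

```python
import copy


def make_stage2_cfg(base: dict, run_name: str, lr_factor: float = 0.1, num_epochs: int = 10) -> dict:
    """Return a new config for full fine-tuning; `base` is left unchanged."""
    stage2 = copy.deepcopy(base)
    stage2["run_name"] = run_name
    stage2["lr"] = base["lr"] * lr_factor
    stage2["num_epochs"] = num_epochs
    return stage2


base_cfg = {"lr": 1e-3, "num_epochs": 30, "batch_size": 16}

# In-place pattern: each re-execution shrinks the learning rate again.
cfg_demo = base_cfg.copy()
for _ in range(2):  # simulate running the cell twice
    cfg_demo["lr"] *= 0.1
print(cfg_demo["lr"])  # roughly 1e-05 instead of the intended 1e-04

# Copy-based pattern: re-running always yields the same stage-2 config.
for _ in range(2):
    stage2 = make_stage2_cfg(base_cfg, "ResNet-18_pretrained_fine-tuning")
print(stage2["lr"], base_cfg["lr"])
```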


### 2) Tweak the training hyperparameters in order to increase the accuracy on the validation split of `GroceryStoreDataset`.

In [None]:
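# Repeat the two-stage procedure with a different set of hyperparameters.
# Note: resize_size, crop_size and batch_size only take effect if the transforms
# and DataLoaders are rebuilt from this cfg; a DataLoader created earlier keeps
# the batch size it was constructed with.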
cfg_fine_tuning = {
    "resize_size": 256,
    "crop_size": 224,
    "batch_size": 16,
    "num_epochs": 30,
    "lr": 1e-3,
    "wd": 5e-4,
    "step_size": 5
}
cfg = cfg_fine_tuning

In [None]:
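## Note: pt_resnet18 still holds the weights from the end of the fine-tuning run
## in step 1. To start this experiment from the ImageNet-1K weights instead,
## re-create it first with pt_resnet18 = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
## Freeze the pre-trained layers ##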
for param in pt_resnet18.parameters():
    param.requires_grad = False

## Modify the last layer ##
pt_resnet18.fc = torch.nn.Linear(pt_resnet18.fc.in_features, len(classes.keys()))

# Stage 1: train only the new classification head (backbone frozen)
cfg["run_name"] = "ResNet-18_pretrained_scratch-training_2"
# cfg["num_epochs"] = 5
trainer = Trainer(
    pt_resnet18,
    loader_train,
    loader_val,
    loader_test,
    device,
    num_classes=len(classes.keys())
)
trainer.train()
print(f"Best val acc (intermediate) = {trainer.best_acc:.3f}")

Run history:
train/acc   ▁▄▇███████████████████████████
train/loss  █▇▆▄▂▂▁▂▂▁▁▁▁▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train/lr    ▁▁▂▂▃▄▅▆▇▇███████▇▇▇▇▆▆▆▅▅▄▄▄▃▃▃▂▂▂▁▁▁▁▁
val/acc     ▁▃▆▇██████████▇████▇███▇██████
val/loss    █▆▄▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

Run summary:
train/acc   0.98939
train/loss  0.04075
train/lr    0.0
val/acc     0.87838
val/loss    0.33307


Best val acc (intermediate) = 0.902


In [None]:
# Load the checkpoint with the best validation accuracy
sd = torch.load(f"ckpts/{cfg['run_name']}.pt")
pt_resnet18.load_state_dict(sd)

## Unfreeze all layers
for param in pt_resnet18.parameters():
    param.requires_grad = True

# Stage 2: fine-tune the whole network with a 10x smaller learning rate
cfg["run_name"] = "ResNet-18_pretrained_fine-tuning_2"
cfg["lr"] *= 0.1
cfg["num_epochs"] = 10
trainer = Trainer(
    pt_resnet18,
    loader_train,
    loader_val,
    loader_test,
    device,
    num_classes=len(classes.keys())
)
trainer.train()
print(f"Best val acc (final) = {trainer.best_acc:.3f}")

Run history:
train/acc   ▇▃▁▂▄▆██▇▇
train/loss  ▂▁▂▁▂▅▁▅▁▅▄▃▂▁▂▂█▆▁▅▁▄▁▂▁▁▆▁▂▁▁▂▅▁▁▁▂▁▁▅
train/lr    ▁▁▂▂▃▄▅▆▆▇███████▇▇▇▇▆▆▆▅▅▄▄▄▃▃▃▂▂▂▁▁▁▁▁
val/acc     █▇▁▅▁▃▆▆▆▆
val/loss    ▁▃█▆▆▅▂▃▃▂

Run summary:
train/acc   0.98977
train/loss  0.0355
train/lr    0.0
val/acc     0.86149
val/loss    0.42077


Best val acc (final) = 0.885
