# Assignment Module 2: Product Classification

The goal of this assignment is to implement a neural network that classifies smartphone pictures of products found in grocery stores. The assignment is divided into two parts: first, you will implement your own neural network for image classification from scratch; then, you will fine-tune a pretrained network provided by PyTorch.


## Preliminaries: the dataset

The dataset you will be using contains natural images of products taken with a smartphone camera in different grocery stores:

<p align="center">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Granny-Smith.jpg" width="150">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Pink-Lady.jpg" width="150">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Lemon.jpg" width="150">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Banana.jpg" width="150">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Vine-Tomato.jpg" width="150">
</p>
<p align="center">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Yellow-Onion.jpg" width="150">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Green-Bell-Pepper.jpg" width="150">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Arla-Standard-Milk.jpg" width="150">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Oatly-Natural-Oatghurt.jpg" width="150">
  <img src="https://github.com/marcusklasson/GroceryStoreDataset/raw/master/sample_images/natural/Alpro-Fresh-Soy-Milk.jpg" width="150">
</p>

The products belong to the following 43 classes:
```
0.  Apple
1.  Avocado
2.  Banana
3.  Kiwi
4.  Lemon
5.  Lime
6.  Mango
7.  Melon
8.  Nectarine
9.  Orange
10. Papaya
11. Passion-Fruit
12. Peach
13. Pear
14. Pineapple
15. Plum
16. Pomegranate
17. Red-Grapefruit
18. Satsumas
19. Juice
20. Milk
21. Oatghurt
22. Oat-Milk
23. Sour-Cream
24. Sour-Milk
25. Soyghurt
26. Soy-Milk
27. Yoghurt
28. Asparagus
29. Aubergine
30. Cabbage
31. Carrots
32. Cucumber
33. Garlic
34. Ginger
35. Leek
36. Mushroom
37. Onion
38. Pepper
39. Potato
40. Red-Beet
41. Tomato
42. Zucchini
```

The dataset is split into training (`train`), validation (`val`), and test (`test`) sets.

The following code cells download the dataset and define a `torch.utils.data.Dataset` class to access it. This `Dataset` class will be the starting point of your assignment: use it in your own code and build everything else around it.

In [1]:
!git clone https://github.com/marcusklasson/GroceryStoreDataset.git

Cloning into 'GroceryStoreDataset'...
remote: Enumerating objects: 6559, done.
remote: Counting objects: 100% (266/266), done.
remote: Compressing objects: 100% (231/231), done.
remote: Total 6559 (delta 45), reused 35 (delta 35), pack-reused 6293
Receiving objects: 100% (6559/6559), 116.26 MiB | 24.53 MiB/s, done.
Resolving deltas: 100% (275/275), done.
Updating files: 100% (5717/5717), done.


In [2]:
from pathlib import Path
from PIL import Image
import PIL
import torch
from torch import Tensor
from torch.utils.data import Dataset, DataLoader
from typing import List, Tuple, Callable, Optional
import torchvision.transforms as transforms

In [3]:
class GroceryStoreDataset(Dataset):

    def __init__(self, split: str, transform: Callable[[PIL.Image.Image], Tensor]) -> None:
        super().__init__()
        self.root = Path("GroceryStoreDataset/dataset")
        self.split = split
        self.paths, self.labels = self.read_file()
        self.transform = transform

    def __len__(self) -> int:
        return len(self.labels)

    def __getitem__(self, idx) -> Tuple[Tensor, int]:
        img = Image.open(self.root / self.paths[idx])
        label = self.labels[idx]
        tensor_img = self.transform(img)
        return tensor_img, label

    def read_file(self) -> Tuple[List[str], List[int]]:
        paths = []
        labels = []
        with open(self.root / f"{self.split}.txt") as f:
            for line in f:
                # path, fine-grained class, coarse-grained class
                path, _, label = line.replace("\n", "").split(", ")
                paths.append(path)
                labels.append(int(label))
        return paths, labels

    def get_num_classes(self) -> int:
        return max(self.labels) + 1
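The parsing done by `read_file` and the class-count rule in `get_num_classes` can be checked in isolation. This is a minimal pure-Python sketch: `parse_split_line` and `num_classes` are hypothetical helpers, and the example line is a made-up placeholder in the "path, fine-grained class, coarse-grained class" format described in the comment above, not a line copied from the dataset files.

```python
from typing import List, Tuple


def parse_split_line(line: str) -> Tuple[str, int, int]:
    """Split one line of a split file into (path, fine label, coarse label).

    Equivalent to the parsing in GroceryStoreDataset.read_file for well-formed
    lines; the dataset class then keeps only the path and the coarse label.
    """
    path, fine, coarse = line.rstrip("\n").split(", ")
    return path, int(fine), int(coarse)


def num_classes(labels: List[int]) -> int:
    # Same rule as get_num_classes: assumes labels are 0..K-1
    return max(labels) + 1


# Hypothetical placeholder line, for illustration only
example = "train/Fruit/Apple/example.jpg, 5, 0\n"
print(parse_split_line(example))
print(num_classes(list(range(43))))
```

Note that `max(labels) + 1` only reports 43 if the highest label actually occurs in the split being read; a split missing its last class would report fewer classes.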

## Part 1: design your own network

Your goal is to implement a convolutional neural network for image classification and train it on `GroceryStoreDataset`. You should consider yourselves satisfied once you obtain a classification accuracy on the **validation** split of **around 60%**. You are free to achieve that however you want, except for a few rules you must follow:

- You **cannot** simply instantiate an off-the-shelf PyTorch network. Instead, you must build your network as a composition of existing PyTorch layers. Concretely, you can use e.g. `torch.nn.Linear`, but you **cannot** use e.g. `torchvision.models.alexnet`.

- Justify every *design choice* you make. Design choices include network architecture, training hyperparameters, and, possibly, dataset preprocessing steps. You can either (i) start from the simplest convolutional network you can think of and add complexity one step at a time, while showing how each step gets you closer to the target ~60%, or (ii) start from a model that is already able to achieve the desired accuracy and show how, by removing some of its components, its performance drops (i.e. an *ablation study*). You can *show* your results/improvements however you want: training plots, console-printed values or tables, or whatever else your heart desires: the clearer, the better.

Don't be too concerned with your network's performance: the ~60% is just to give you an idea of when to stop. Keep in mind that a thoroughly justified model with lower accuracy will be awarded **more** points than a poorly validated model with higher accuracy.
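The ablation-study option (ii) amounts to training the full model once, then once more per component with that component switched off. A minimal bookkeeping sketch, assuming each design choice can be expressed as an on/off flag; `ablation_variants` and the flag names are hypothetical and should be replaced with the components you actually use.

```python
from typing import Dict, List, Tuple

Config = Dict[str, bool]


def ablation_variants(full: Config) -> List[Tuple[str, Config]]:
    """Return the full model plus one variant per enabled component, with that component disabled."""
    variants = [("full model", dict(full))]
    for name, enabled in full.items():
        if enabled:
            cfg = dict(full)  # copy so the full configuration is left untouched
            cfg[name] = False
            variants.append((f"without {name}", cfg))
    return variants


# Hypothetical component flags
full_model = {"batch_norm": True, "dropout": True, "data_augmentation": True}
for label, cfg in ablation_variants(full_model):
    print(label, cfg)
```

Each variant then becomes one row of the results table, next to its validation accuracy.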

I think the best approach is to start with a very complex model that overfits the training data, and then add regularization techniques until it reaches an optimal effective capacity.
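The "overfit first, then regularize" strategy needs a rule for deciding which regime a run is in. A rough sketch, assuming accuracies in [0, 1]; `diagnose` is a hypothetical helper and both thresholds (the ~60% target and a 10-point train/validation gap) are arbitrary assumptions, not values taken from the assignment beyond the target itself.

```python
def diagnose(train_acc: float, val_acc: float,
             target: float = 0.60, max_gap: float = 0.10) -> str:
    """Label a training run; thresholds are illustrative assumptions."""
    if train_acc < target:
        # The model cannot even fit the training set: add capacity or train longer
        return "underfitting"
    if train_acc - val_acc > max_gap:
        # Fits the training set but does not generalize: add regularization
        return "overfitting"
    return "balanced"


print(diagnose(0.99, 0.55))
```

In this scheme, the first model should be labeled "overfitting", and each regularization step should move it toward "balanced" without dropping back into "underfitting".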

In [4]:
!pip install mypy
# Simple mypy cell magic for Colab
from IPython.core.magic import register_cell_magic
from IPython import get_ipython
from mypy import api

@register_cell_magic
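def mypy(line, cell):
    # api.run returns (stdout, stderr, exit_status); only the two strings are inspected
    stdout, stderr, _exit_status = api.run(['-c', '\n' + cell] + line.split())
    for output in (stdout, stderr):
        if output and not output.startswith('Success'):
            raise TypeError(output)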
    get_ipython().run_cell(cell)

Collecting mypy
  Downloading mypy-1.10.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.7 MB)
Collecting mypy-extensions>=1.0.0 (from mypy)
  Downloading mypy_extensions-1.0.0-py3-none-any.whl (4.7 kB)
Installing collected packages: mypy-extensions, mypy
Successfully installed mypy-1.10.1 mypy-extensions-1.0.0


In [5]:
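# PILToTensor() returns uint8 tensors, but Conv2d expects floating-point input;
# ToTensor() converts to float32 and rescales pixel values to [0, 1].
basecnf = {'transform': transforms.ToTensor(),
           'batch_size': 32,
           'num_workers': 2}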

In [6]:
#%%mypy --ignore-missing-imports
def create_dataloader(split: str,
                      transform: Callable[[PIL.Image.Image], Tensor],
                      dataset_creation: Callable[[str, Callable[[PIL.Image.Image], Tensor]], Dataset],
                      batch_size: int,
                      shuffle: bool,
                      num_workers: int) -> DataLoader:
    dataset = dataset_creation(split, transform)
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle, num_workers=num_workers)

train_dataloader = create_dataloader('train', basecnf['transform'], GroceryStoreDataset, basecnf['batch_size'], True, basecnf['num_workers'])
val_dataloader = create_dataloader('val', basecnf['transform'], GroceryStoreDataset, basecnf['batch_size'], False, basecnf['num_workers'])
test_dataloader = create_dataloader('test', basecnf['transform'], GroceryStoreDataset, basecnf['batch_size'], False, basecnf['num_workers'])
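For a map-style dataset with the default sampler, `len(train_dataloader)` is the number of batches per epoch: with the default `drop_last=False` the final batch is simply smaller, while `drop_last=True` discards it. A small sketch of that arithmetic; `batches_per_epoch` is a hypothetical helper and the sample counts are arbitrary.

```python
import math


def batches_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    if drop_last:
        # Incomplete final batch is discarded
        return num_samples // batch_size
    # Incomplete final batch is kept (DataLoader default)
    return math.ceil(num_samples / batch_size)


print(batches_per_epoch(100, 32))
print(batches_per_epoch(100, 32, drop_last=True))
```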

In [18]:
res = GroceryStoreDataset('train', basecnf['transform'])
res[3][0].shape

torch.Size([3, 348, 348])
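The printed shape also fixes the cost of one input batch. `PILToTensor()` yields uint8 values in [0, 255], which `Conv2d` (float32 weights) rejects, so images have to be converted to float32 (`ToTensor()` does this and rescales to [0, 1]), which quadruples the memory per batch. A back-of-the-envelope sketch, assuming every image has the 3×348×348 shape printed above and a batch size of 32; `batch_bytes` is a hypothetical helper.

```python
def batch_bytes(batch_size: int, channels: int, height: int, width: int,
                bytes_per_element: int) -> int:
    # Total size of a dense (N, C, H, W) tensor
    return batch_size * channels * height * width * bytes_per_element


uint8_bytes = batch_bytes(32, 3, 348, 348, 1)    # uint8: 1 byte per value
float32_bytes = batch_bytes(32, 3, 348, 348, 4)  # float32: 4 bytes per value
print(uint8_bytes, float32_bytes)
print(f"{float32_bytes / 2**20:.2f} MiB per float32 input batch")
```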

In [38]:
# Type aliases for the layer factories passed to the model
ActivationFn = Callable[[], torch.nn.Module]
NormFn = Callable[[int], torch.nn.Module]

class GroceryStoreModel(torch.nn.Module):
    def __init__(self, nonlin_fn: ActivationFn,
                 nor_fn: NormFn,
                 dropout_head: float,
                 nor_head_fn: NormFn,
                 dropout_fe: float,
                 stem_convs_fn: Callable[[int, ActivationFn, NormFn, float], torch.nn.Module],
                 stem_output_channels: int,
                 num_stages: int,
                 stage_fn: Callable[[int, ActivationFn, NormFn, float], torch.nn.Module],
                 head_fn: Callable[[int, int, ActivationFn, NormFn, float], torch.nn.Module],
                 num_classes: int):
        super().__init__()
        # Feature extractor (stem + stages) uses dropout_fe
        self.stem = torch.nn.Sequential(
            stem_convs_fn(stem_output_channels, nonlin_fn, nor_fn, dropout_fe),
            torch.nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        )
        self.stages = torch.nn.ModuleList()
        in_channels = stem_output_channels
        for _ in range(num_stages):
            self.stages.append(stage_fn(in_channels, nonlin_fn, nor_fn, dropout_fe))
            in_channels *= 2  # assumes stage_fn doubles the number of channels
        # Classification head uses dropout_head
        self.head = head_fn(in_channels, num_classes, nonlin_fn, nor_head_fn, dropout_head)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.stem(x)
        for stage in self.stages:
            x = stage(x)
        x = self.head(x)
        return x
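It helps to know the shape of the feature map that reaches the head. Conv2d and MaxPool2d share the output-size formula floor((size + 2·padding − kernel) / stride) + 1 (with dilation 1 and `ceil_mode=False`). The sketch below traces a 348×348 input, assuming the configuration used in the example further down (`stem_convs1` with 64 channels, 3 stages of `stage_fn1`); `conv_out` and `trace_shapes` are hypothetical helpers.

```python
from typing import List, Tuple


def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    # Output size of Conv2d / MaxPool2d along one spatial dimension
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(size: int = 348, stem_channels: int = 64,
                 num_stages: int = 3) -> List[Tuple[int, int]]:
    channels = stem_channels
    size = conv_out(size, 3, 2, 1)  # first stem conv (3x3, stride 2); the next two keep the size
    size = conv_out(size, 3, 2, 1)  # stem MaxPool2d(kernel_size=3, stride=2, padding=1)
    shapes = [(channels, size)]
    for _ in range(num_stages):
        channels *= 2                   # stage_fn1 doubles channels; its 3x3 convs keep the size
        size = conv_out(size, 2, 2, 0)  # stage MaxPool2d(kernel_size=2, stride=2)
        shapes.append((channels, size))
    return shapes


print(trace_shapes())
```

The last entry, (512, 10), means the head receives a 512×10×10 tensor. The same formula shows that a 7×7, stride-2 stem conv needs `padding=3` to produce 174 like the 3×3 one (`conv_out(348, 7, 2, 3)`), whereas `padding=1` gives 172.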

In [58]:
# I'd like to try both global average pooling and fully connected layers stacked on top of each other.
# I'd also like to compare the number of parameters of the resulting networks and save everything in a table,
# e.g. comparing the performance of the model with FC layers against the model with global average pooling,
# TOGETHER with a comparison of the number of parameters used.
#from thop import profile
#flops, params = profile(model, inputs=(inputs,))

def block(in_channels: int, out_channels: int, kernel_size: int, stride: int, padding: int,
          normalization_layer: torch.nn.Module, activation_layer: torch.nn.Module, dropout_rate) -> torch.nn.Sequential:
    layers = [
        torch.nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride, padding=padding),
        normalization_layer,
        activation_layer
    ]
    if dropout_rate > 0:
        layers.append(torch.nn.Dropout(dropout_rate))
    return torch.nn.Sequential(*layers)

def stem_convs1(out_channels: int, activation_fn: Callable[[], torch.nn.Module], normalization_fn: Callable[[int], torch.nn.Module], dropout_rate:float = 0.0) -> torch.nn.Sequential:
    return torch.nn.Sequential(
        block(3, out_channels, 3, 2, 1, normalization_fn(out_channels), activation_fn(),dropout_rate),
        block(out_channels, out_channels, 3, 1, 1, normalization_fn(out_channels), activation_fn(),dropout_rate),
        block(out_channels, out_channels, 3, 1, 1, normalization_fn(out_channels), activation_fn(),dropout_rate)
    )


def stem_convs2(out_channels: int, activation_fn: Callable[[], torch.nn.Module], normalization_fn: Callable[[int], torch.nn.Module], dropout_rate:float = 0.0) -> torch.nn.Sequential:
    # padding=3 keeps the 7x7, stride-2 conv aligned with the 3x3 stem (348 -> 174)
    return block(3, out_channels, 7, 2, 3, normalization_fn(out_channels), activation_fn(), dropout_rate)


def stage_fn1(in_channels: int, activation_fn: Callable[[], torch.nn.Module], normalization_fn: Callable[[int], torch.nn.Module], dropout_rate:float = 0.0) -> torch.nn.Sequential:
    out_channels = in_channels * 2
    return torch.nn.Sequential(
        block(in_channels, out_channels, 3, 1, 1, normalization_fn(out_channels), activation_fn(),dropout_rate),
        block(out_channels, out_channels, 3, 1, 1, normalization_fn(out_channels), activation_fn() ,dropout_rate),
        torch.nn.MaxPool2d(kernel_size=2, stride=2)
    )

def stage_fn2(in_channels: int, activation_fn: Callable[[], torch.nn.Module], normalization_fn: Callable[[int], torch.nn.Module], dropout_rate:float = 0.0) -> torch.nn.Sequential:
    out_channels = in_channels * 2
    return torch.nn.Sequential(
        block(in_channels, out_channels, 5, 1, 2, normalization_fn(out_channels), activation_fn(),dropout_rate),
        torch.nn.MaxPool2d(kernel_size=2, stride=2)
    )

def head_fn1(in_channels: int, num_classes: int, activation_fn: Callable[[], torch.nn.Module], normalization_fn: Callable[[int], torch.nn.Module], dropout_rate) -> torch.nn.Sequential:
    return torch.nn.Sequential(
        torch.nn.AdaptiveAvgPool2d((1, 1)),
        torch.nn.Flatten(),
        torch.nn.Dropout(dropout_rate),
        torch.nn.Linear(in_channels, num_classes)
    )

def head_fn2(in_channels: int, num_classes: int, activation_fn: Callable[[], torch.nn.Module], normalization_fn: Callable[[int], torch.nn.Module], dropout_rate: float) -> torch.nn.Sequential:
    # No global pooling: Flatten yields in_channels * H * W features (512 * 10 * 10 for 348x348
    # inputs), not in_channels, so the first layer is a LazyLinear that infers its input size on
    # the first forward pass. in_channels is kept only for a uniform head signature.
    # Run one dummy forward pass (e.g. the summary call) before creating the optimizer.
    return torch.nn.Sequential(
        torch.nn.Flatten(),                 # (N, C, H, W) -> (N, C * H * W)
        torch.nn.LazyLinear(256),           # first fully connected layer
        normalization_fn(256),              # batch normalization (BatchNorm1d via nor_head_fn)
        activation_fn(),
        torch.nn.Dropout(dropout_rate),
        torch.nn.Linear(256, 128),          # second fully connected layer
        normalization_fn(128),
        activation_fn(),
        torch.nn.Dropout(dropout_rate),
        torch.nn.Linear(128, num_classes)   # final classification layer
    )

# Example usage
model = GroceryStoreModel(
    nonlin_fn=lambda: torch.nn.LeakyReLU(),
    nor_fn=lambda channels: torch.nn.BatchNorm2d(channels),
    nor_head_fn=lambda channels: torch.nn.BatchNorm1d(channels),
    dropout_head=0.0,
    dropout_fe=0.0,
    stem_convs_fn=stem_convs1,
    stem_output_channels=64,
    num_stages=3,
    stage_fn=stage_fn1,
    head_fn=head_fn1,
    num_classes=43  # 43 coarse-grained classes, labels 0..42
)
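The comparison planned in the comments above (global average pooling vs stacked FC layers) can be estimated before training by counting parameters. A sketch assuming the 512×10×10 final feature map of this configuration, 43 classes, and the layer layouts of `head_fn1` and `head_fn2`; the helper names are hypothetical. BatchNorm contributes 2 trainable parameters per feature (weight and bias); its running statistics are buffers, not parameters.

```python
def linear_params(in_features: int, out_features: int) -> int:
    return in_features * out_features + out_features  # weights + biases


def batchnorm_params(num_features: int) -> int:
    return 2 * num_features  # affine weight + bias


def gap_head_params(channels: int, num_classes: int) -> int:
    # head_fn1: AdaptiveAvgPool2d -> Flatten -> Dropout -> Linear(channels, num_classes)
    return linear_params(channels, num_classes)


def fc_head_params(flat_features: int, num_classes: int) -> int:
    # head_fn2: Flatten -> Linear(flat, 256) -> BN(256) -> Linear(256, 128) -> BN(128) -> Linear(128, classes)
    return (linear_params(flat_features, 256) + batchnorm_params(256)
            + linear_params(256, 128) + batchnorm_params(128)
            + linear_params(128, num_classes))


print(gap_head_params(512, 43))
print(fc_head_params(512 * 10 * 10, 43))
```

Almost all of the FC head's parameters sit in its first Linear layer, which is why global average pooling is the much lighter choice here.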

In [59]:
from torchsummary import summary

summary(model, input_size=(3, 348, 348))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 64, 174, 174]           1,792
       BatchNorm2d-2         [-1, 64, 174, 174]             128
         LeakyReLU-3         [-1, 64, 174, 174]               0
            Conv2d-4         [-1, 64, 174, 174]          36,928
       BatchNorm2d-5         [-1, 64, 174, 174]             128
         LeakyReLU-6         [-1, 64, 174, 174]               0
            Conv2d-7         [-1, 64, 174, 174]          36,928
       BatchNorm2d-8         [-1, 64, 174, 174]             128
         LeakyReLU-9         [-1, 64, 174, 174]               0
        MaxPool2d-10           [-1, 64, 87, 87]               0
           Conv2d-11          [-1, 128, 87, 87]          73,856
      BatchNorm2d-12          [-1, 128, 87, 87]             256
        LeakyReLU-13          [-1, 128, 87, 87]               0
           Conv2d-14          [-1, 128,
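The `Param #` column above follows from a simple formula: a Conv2d layer has in_channels × out_channels × k × k weights plus out_channels biases, and a BatchNorm2d layer has 2 × channels affine parameters. A sketch reproducing the rows shown; `conv2d_params` is a hypothetical helper.

```python
def conv2d_params(in_c: int, out_c: int, kernel: int, bias: bool = True) -> int:
    return in_c * out_c * kernel * kernel + (out_c if bias else 0)


print(conv2d_params(3, 64, 3))    # Conv2d-1
print(conv2d_params(64, 64, 3))   # Conv2d-4 and Conv2d-7
print(conv2d_params(64, 128, 3))  # Conv2d-11
print(2 * 64, 2 * 128)            # BatchNorm2d-2, BatchNorm2d-12
```

Because a BatchNorm layer right after a convolution subtracts the per-channel mean, the convolution's bias is redundant there; passing `bias=False` to `Conv2d` in `block` would save `out_channels` parameters per layer without changing what the network can represent.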


## Part 2: fine-tune an existing network

Your goal is to fine-tune a pretrained **ResNet-18** model on `GroceryStoreDataset`. Use the implementation provided by PyTorch rather than implementing it yourself (i.e. exactly what you **could not** do in the first part of the assignment). Specifically, you must use the PyTorch ResNet-18 model pretrained on ImageNet-1K (V1). Divide your fine-tuning into two parts:

1. First, fine-tune the ResNet-18 with the same training hyperparameters you used for your best model in the first part of the assignment.
1. Then, tweak the training hyperparameters in order to increase the accuracy on the validation split of `GroceryStoreDataset`. Justify your choices by analyzing the training plots and/or citing sources that guided you in your decisions (papers, blog posts, YouTube videos, or whatever else you find enlightening). You should consider yourselves satisfied once you obtain a classification accuracy on the **validation** split **between 80 and 90%**.
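A pretrained ResNet-18 was trained on inputs normalized with the ImageNet per-channel mean and standard deviation, so feeding it plain [0, 1] tensors shifts the input distribution away from what its weights expect. In torchvision this is usually done with `transforms.Normalize(mean=..., std=...)` after `ToTensor()`, or with the preset returned by `ResNet18_Weights.IMAGENET1K_V1.transforms()`. The pure-Python sketch below only illustrates the arithmetic on one 8-bit RGB pixel; `normalize_pixel` is a hypothetical helper.

```python
from typing import Tuple

IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb: Tuple[int, int, int]) -> Tuple[float, ...]:
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel with ImageNet statistics."""
    return tuple((v / 255.0 - m) / s
                 for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


print(normalize_pixel((0, 0, 0)))
print(normalize_pixel((255, 255, 255)))
```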