# Training a ResNet classifier to classify mushrooms
In this notebook, we will train and save a working ResNet model based on the ResNet9 architecture (LINK) to classify images of mushrooms found in the Norwegian flora. To this end, we first need to create a pipeline that loads our data, preprocesses it and feeds it to a training loop in mini-batches. We must also design the residual convolutional blocks used in the ResNet, as well as the final model.


## Preparing dataset and preprocessing of data
Before declaring residual convolutional blocks and the ResNet model, we should make sure all data can be loaded, preprocessed and iterated over in a consistent, precise manner. To this end, we will declare a PyTorch dataset and a PyTorch preprocessing step, all present and pre-loaded into a dataset instance `data`. 
### Defining a PyTorch dataset for image data
The image data and corresponding labels will be accessed and loaded into memory using a custom `MushroomDataset` class, inheriting from PyTorch's standard dataset class, `torch.utils.data.Dataset`. Creating a separate class streamlines the retrieval and preprocessing of data, and the inherited functionality allows for seamless division into mini-batches using PyTorch DataLoaders, which should make training less resource intensive down the line:

In [150]:
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from torch.utils.data import Dataset
from pathlib import Path

# Create a custom dataset to simplify the use and indexing of the custom mushroom dataset
class MushroomDataset(Dataset):
    # Overload the init function to capture the image directory, transform and load in the labels
    def __init__(self, path_imgs: str, path_labels: str, transform = None) -> None:
        self.img_dir = path_imgs
        self.labels = pd.read_csv(path_labels)
        self.transform = transform
    
    # Overload the len(..) operator to give the length of all labels
    def __len__(self) -> int:
        return self.labels.shape[0]
    
    # Override the [index] operator to yield an image (subject to transforms) and its corresponding label
    def __getitem__(self, index):
        # Find the image path and load the image
        img_path = Path(f"{self.img_dir}/{self.labels.iloc[index, 0]}.jpg")
        img = plt.imread(img_path)

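        # Load the corresponding image label as a 64-bit integer,
        # the dtype that classification losses such as nn.CrossEntropyLoss expect for class indices
        label = torch.tensor(self.labels.iloc[index, 1], dtype=torch.long)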

        # If a transform is specified, apply it to the image
        if self.transform:
            img = self.transform(img)

        return (img, label)
    
    def _num_classes(self) -> int:
        return len(self.labels.value_counts('label'))

### Defining a preprocessing pipeline
The `MushroomDataset`-class contains a `transform` parameter, which will be used to apply a set of simple, yet important, preprocessing steps to the image data. Essentially, we wish to normalize all color channels of the image data for the better convergence of the employed nonlinear optimization scheme during training, as well as transform the data into PyTorch tensors.

Before defining the preprocessing pipeline, however, we need the mean and standard deviation of each color channel across our dataset. These values form the backbone of our normalization scheme and must be found empirically. Below, the per-channel mean and standard deviation of every image are accumulated into `mean_list` and `std_list`, before being averaged over the number of images. This yields the necessary data for our normalization pipeline:

In [151]:
BASE_DIR = Path("01_Training_RestNet_Classifier.ipynb").parent.resolve()

# Find a list of all .jpg image-files in the dataset
img_paths = Path(f"{BASE_DIR}/data/mushroom_imgs").rglob('*.jpg')

# Accumulate the per-channel mean and standard deviation over all available images
mean_list, std_list = np.zeros(3), np.zeros(3)
num_images = 0

for path in img_paths:
    # Load the image, scale it to [0, 1] to match the output of transforms.ToTensor(),
    # and permute it to (channels, height, width)
    img = (torch.tensor(plt.imread(str(path)), dtype=torch.float32) / 255).permute((2, 0, 1))

    # Add the per-channel statistics of this image to the running sums
    mean_list += img.mean([1, 2]).numpy()
    std_list += img.std([1, 2]).numpy()
    num_images += 1

# Divide by the number of images to get the average mean/std of each color channel
mean_list, std_list = mean_list / num_images, std_list / num_images

With the means and standard deviations, we can define a simple preprocessing pipeline using a composite transformation from `torchvision.transforms.Compose`. An image fed to the composite transformation will first be converted into a PyTorch tensor, before being normalized across all available color channels.

In [152]:
from torchvision import transforms

# Define a composite transform to preprocess the data
preprocessing_pipeline = transforms.Compose([
    transforms.ToTensor(), 
    transforms.Normalize(list(mean_list), list(std_list))
])  
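One point worth checking with this pipeline is the scale of the statistics: `transforms.ToTensor()` rescales `uint8` images to the range [0, 1], so the values passed to `transforms.Normalize` should be computed on that same scale. The following is a minimal NumPy sketch of the idea, using synthetic random images rather than the mushroom data; the image size and count are arbitrary assumptions made for illustration.

```python
import numpy as np

# Synthetic stand-ins for the dataset: four 8x8 RGB images with uint8 pixel values
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8) for _ in range(4)]

# Rescale to [0, 1], mirroring what ToTensor() does for uint8 input
scaled = [img.astype(np.float32) / 255.0 for img in images]

# Average the per-image, per-channel statistics over all images
mean = np.mean([img.mean(axis=(0, 1)) for img in scaled], axis=0)
std = np.mean([img.std(axis=(0, 1)) for img in scaled], axis=0)

# Normalizing an image then amounts to a per-channel shift and rescale
normalized = (scaled[0] - mean) / std
print(mean, std)
```

Statistics computed on the raw 0 to 255 scale would be roughly 255 times too large for images that have already passed through `ToTensor()`.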

### Create training/test sets with DataLoaders
An instance `data` of the `MushroomDataset` class, preprocessed using `preprocessing_pipeline`, can be split into a training and a test set using `torch.utils.data.random_split()`. Here, 25% of the data is held out as the test set:

In [153]:
# Define the absolute paths to the image data and subsequent labels
IMAGE_DIR = Path(f"{BASE_DIR}/data/mushroom_imgs")
LABEL_DIR = Path(f"{BASE_DIR}/data/mushroom_imgs/img_labels.csv")

# Instantiate the dataset
data = MushroomDataset(IMAGE_DIR, LABEL_DIR, preprocessing_pipeline)

# Divide the dataset into training and test sets using pytorch's 'random_split' method:
train_data, test_data = torch.utils.data.random_split(data, [0.75, 0.25])

For each subset of data, we can now create a dataloader from `torch.utils.data.DataLoader`, allowing us to iterate through the dataset in shuffled mini-batches:

In [154]:
from torch.utils.data import DataLoader

# Define the BATCH_SIZE hyperparameter deciding the amount of images in each mini-batch during training
# NOTE: This should be tuned as a hyperparameter
BATCH_SIZE = 32

# Declare the dataloaders
train_dataloader = DataLoader(dataset = train_data,
                              batch_size = BATCH_SIZE,
                              shuffle = True)

test_dataloader = DataLoader(dataset = test_data,
                            batch_size = BATCH_SIZE,
                            shuffle = True)
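
As a small, self-contained illustration of how the split and the mini-batching interact, the sketch below uses a synthetic `TensorDataset` in place of the mushroom data. The dataset size, image shape and seed are assumptions chosen for the example; passing a seeded `generator` is one way to make the split reproducible between runs.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Synthetic stand-in: 100 random "images" of shape (3, 8, 8) with labels in [0, 5)
dummy = TensorDataset(torch.randn(100, 3, 8, 8), torch.randint(0, 5, (100,)))

# Split 75/25 with explicit lengths and a seeded generator for reproducibility
generator = torch.Generator().manual_seed(0)
train_subset, test_subset = random_split(dummy, [75, 25], generator=generator)

# Iterate over the training subset in shuffled mini-batches of 32
loader = DataLoader(train_subset, batch_size=32, shuffle=True)
batch_shapes = [tuple(images.shape) for images, _ in loader]
print(batch_shapes)  # the last batch is smaller, since 75 is not a multiple of 32
```

With 75 training samples and a batch size of 32, the loader yields three batches, the last one holding the remaining 11 samples.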


## Defining the ResNet Model
In the following subsection, I will attempt to walk us through the building of a ResNet model from scratch, mainly for the purposes of attaining deep knowledge about the topic and methods used in the implementation. The target output, a general `ResNet`-class, implements what are known as "*deeper ResNet architectures*". These encompass ResNet50, 101, 152 and above; utilizing *bottlenecks*, defined in the [ResNet paper](https://arxiv.org/pdf/1512.03385.pdf), to compress the data going into a block. This implies that the implementation is not ideal for shallower ResNet architectures, like ResNet9, 18 or 34.

The proposed architecture consists of 5 **layers**, each built up of **residual blocks**. To this end, our implementation will build a class `ResNetLayer` encompassing another class `ResNetBlock`. `ResNetLayer` instances are later chained together in the final `ResNet` class, which streamlines the flow of data through said layers.

The following section will try to streamline the somewhat tricky implementation in an orderly fashion. However, due to the complexity of dimensional operations, it is highly recommended to consult the [original paper](https://arxiv.org/pdf/1512.03385.pdf), especially Table 1, which presents the main outlines.

### The ResNet block
A ResNet block generally seeks to do two things:

1. Extract learnt features of the input data `x`
2. Combine the input data and said features at the output, effectively augmenting, rather than losing, information.

The initial features are attained using standard, trained convolutional layers, whilst the final combination is, more often than not, a simple element-wise addition. The theoretical workings of a block are, in this sense, fairly straightforward.

The first part of the procedure is, however, somewhat complicated by the addition of the aforementioned *bottlenecks*. These change the otherwise simple block structure into a carefully crafted stack of convolutional layers, which reduces the number of parameters needed to train deeper networks. The first layer, `conv_red` in the class implementation below, reduces the number of channels of the input with a 1x1 convolution, often by a fixed factor of `4`. The reduced data is then fed to the second layer, `conv_features`, which extracts features with a 3x3 convolution. Finally, the number of channels of the extracted features is increased again by another 1x1 convolution, `conv_inc`. Overall, such a bottleneck reduces the number of parameters per block significantly, which makes deeper networks considerably cheaper to train.
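
To make the saving concrete, the following sketch counts convolution weights for one bottleneck block and for a plain stack of two 3x3 convolutions at the same width. The channel sizes (256 reduced to 64) follow the first bottleneck stage in Table 1 of the paper; the helper `conv_weights` is a hypothetical name introduced here, and biases and batch-norm parameters are ignored.

```python
def conv_weights(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Number of weights in a 2D convolution, ignoring the bias term."""
    return in_channels * out_channels * kernel_size * kernel_size

wide, narrow = 256, 64  # block width and bottleneck width (factor 4)

# Bottleneck: 1x1 reduce -> 3x3 features -> 1x1 increase
bottleneck = (
    conv_weights(wide, narrow, 1)
    + conv_weights(narrow, narrow, 3)
    + conv_weights(narrow, wide, 1)
)

# Plain alternative: two 3x3 convolutions operating at full width
plain = 2 * conv_weights(wide, wide, 3)

print(bottleneck)                    # 69632
print(plain)                         # 1179648
print(round(plain / bottleneck, 1))  # 16.9
```

Under these assumptions, the bottleneck uses roughly 17 times fewer convolution weights than the full-width alternative.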

*Bottlenecks*, however, also impact the second part, as the reduction and increase of the channel dimension have to be accounted for in the final augmentation step. For the element-wise addition, the two tensors must have identical shapes, so another 1x1 convolution is used where needed. This convolution, `identity_mapping`, alters the shape of the input `x` if it doesn't match the shape of the extracted features. `identity_mapping` is taken in as a parameter to the block and is defined as part of the `ResNetLayer` class, described in the next section.
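
In the notation of the ResNet paper, with $\mathcal{F}$ denoting the stacked convolutions of a block and $\{W_i\}$ their weights, the two cases can be summarized as below. Here $W_s$ plays the role of the 1x1 projection held in `identity_mapping`; the final ReLU that the block applies after the addition is left out for brevity.

```latex
\begin{aligned}
y &= \mathcal{F}(x, \{W_i\}) + x      && \text{(shapes of input and features already match)} \\
y &= \mathcal{F}(x, \{W_i\}) + W_s x  && \text{(input projected to the shape of the features)}
\end{aligned}
```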

In [155]:
from torch import nn

# Define a ResNet block
# NOTE: Following the structure in the article, note that identity mappings increase/decrease dimensions by a factor 4 and that paddings are fixed. 
class ResNetBlock(nn.Module):
    def __init__(self, 
                 in_channels: int, 
                 out_channels:int,
                 identity_mapping: nn.Sequential = None,
                 stride: int = 1):
        super().__init__()
        # The factor of which to increase/reduce dimensions
        self.dim_ext_factor = 4
        
        # The first convolutional layer to reduce dimensionality before a bottleneck, as well as a connected BN computation
        self.conv_red = nn.Sequential(
            nn.Conv2d(in_channels = in_channels,
                      out_channels = out_channels,
                      kernel_size = 1,
                      stride = 1,
                      padding = 0),
            nn.BatchNorm2d(out_channels),
            nn.ReLU()
        )

        # The second convolutional layer to infer on the input data. Again followed by a BN computation. NOTE: Only layer with variable stride
        self.conv_features = nn.Sequential(
            nn.Conv2d(in_channels = out_channels,
                      out_channels = out_channels,
                      kernel_size = 3,
                      stride = stride,
                      padding = 1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU()
        )

        # The third convolutional layer, in which the dimensions are increased by a factor 'dim_ext_factor'.
        self.conv_inc = nn.Sequential(
            nn.Conv2d(in_channels = out_channels,
                      out_channels = out_channels * self.dim_ext_factor,
                      kernel_size = 1,
                      stride = 1,
                      padding = 0),
            nn.BatchNorm2d(out_channels * self.dim_ext_factor),
            nn.ReLU()
        )

        # Define a ReLU block used to infer on the output
        self.relu = nn.ReLU()

        # Finally, allocate an instance-variable to the desired identity mapping:
        self.identity_mapping = identity_mapping

    # Define the forward function -> How data 'x' is processed by the block
    def forward(self, x):
        # save the original input for later addition
        block_input = x

        # propagate data through the layers using Conv2D -> BN -> ReLu
        x = self.conv_red(x)
        x = self.conv_features(x)
        x = self.conv_inc(x)

        # If an identity mapping is supplied and the shapes differ, use the mapping to project 'block_input' to the shape of 'x'
        if self.identity_mapping is not None and block_input.shape != x.shape:
            block_input = self.identity_mapping(block_input)
        
        # Add the input to the residual information -> Information augmentation
        # NOTE: Out-of-place addition; an in-place '+=' would overwrite the ReLU output that autograd saves for the backward pass
        x = x + block_input

        return self.relu(x)


### The ResNet Layer
A ResNet Layer seeks to propagate the input data `x` through `num_blocks` `ResNetBlock` instances using the same amount of filters/convolutional kernels. A complete residual network consists of multiple such layers with different amounts of blocks working on different dimensionalities of data. The entire purpose of the below class is then to implement a layer, as described by Table 1 in the [ResNet article](https://arxiv.org/pdf/1512.03385.pdf).

In said article, one can see that the dimensionality within each layer is increased by a `dim_ext_factor` of 4. This is hardcoded as an instance-parameter, and is used throughout the layer. 

Firstly, the layer implementation declares an `identity_mapping` whenever the stride is not 1 or the number of input channels `in_channels` differs from `dim_ext_factor` times the desired `out_channels`. The mapping is a simple sequential operation: a 1x1 convolution that rescales the input, followed by a 2D batch-norm operation.

Secondly, the layer organizes `num_blocks` instances of `ResNetBlock`, so as to propagate the data correctly. This is done by collecting all instances in a list-like container, which is iterated over in the later *forward* pass. The most important detail to note here is how `in_channels` is rescaled after the first block, so that later blocks accommodate the dimensionality increase present throughout a ResNet layer.
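
The channel bookkeeping described above can be checked in isolation. The sketch below mirrors the condition used to decide whether a projection is needed and applies it to the `(in_channels, out_channels, stride)` triples that the `ResNet` class later in this notebook passes to its four layers; `needs_projection` is a hypothetical helper written for this illustration.

```python
DIM_EXT_FACTOR = 4  # expansion factor of the bottleneck blocks

# (in_channels, out_channels, stride) for layer_2 .. layer_5
layer_specs = [(64, 64, 1), (256, 128, 2), (512, 256, 2), (1024, 512, 2)]

def needs_projection(in_channels: int, out_channels: int, stride: int) -> bool:
    """True if the shortcut must be reshaped to match the block output."""
    return stride != 1 or in_channels != out_channels * DIM_EXT_FACTOR

for in_channels, out_channels, stride in layer_specs:
    produced = out_channels * DIM_EXT_FACTOR  # channels leaving the first block
    print(in_channels, "->", produced,
          "projection needed:", needs_projection(in_channels, out_channels, stride))

# Later blocks in a layer receive the already expanded channels with stride 1
print(needs_projection(256, 64, 1))  # False
```

The first block of every layer needs the projection, while the remaining blocks of a layer see matching shapes and can add the input directly.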

In [156]:
# Define a ResNet Layer
class ResNetLayer(nn.Module):
    def __init__(self,
                 num_blocks: int,
                 in_channels: int,
                 out_channels: int,
                 stride: int):
        super().__init__()
        # The factor of which to increase/reduce dimensions
        self.dim_ext_factor = 4
        
        # Define the identity_mapping and whether or not it is needed in the current layer
        identity_mapping = None
        if stride != 1 or in_channels != out_channels*self.dim_ext_factor:
            identity_mapping = nn.Sequential(
                nn.Conv2d(in_channels = in_channels,
                          out_channels = out_channels * self.dim_ext_factor,
                          kernel_size = 1,
                          stride = stride),
                nn.BatchNorm2d(out_channels * self.dim_ext_factor)
            )
        
        # Define the layer as a registered container of blocks and start adding blocks to it
        # NOTE: nn.ModuleList, unlike a plain Python list, registers the blocks so that their parameters
        # show up in .parameters(), follow .to(device) and are included in the state dict
        self.layer = nn.ModuleList()
        self.layer.append(ResNetBlock(in_channels = in_channels,
                                      out_channels = out_channels,
                                      identity_mapping = identity_mapping,
                                      stride = stride))
        
        # At the end of a ResNetBlock, the number of channels is increased by a factor of self.dim_ext_factor. The 'in_channels' of subsequent blocks have to be adjusted accordingly.
        in_channels = out_channels * self.dim_ext_factor

        # Go through the remaining blocks, appending them to the layer
        # NOTE: These blocks keep the shape of their input, so no identity mapping is needed
        for _ in range(num_blocks - 1):
            block = ResNetBlock(in_channels = in_channels, 
                                out_channels = out_channels,
                                identity_mapping = None,
                                stride = 1)
            self.layer.append(block)

    # Define the forward pass through the Layer -> How data 'x' is processed
    def forward(self, x):
        # Propagate the data through all blocks within the layer
        for block in self.layer:
            x = block(x)

        return x
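
A detail that matters whenever a module keeps its sub-blocks in a container: PyTorch only registers submodules that are assigned as attributes or held in containers such as `nn.ModuleList` or `nn.Sequential`. Blocks kept in a plain Python list are invisible to `.parameters()`, so an optimizer built from them would never update their weights. The toy classes below are hypothetical and use small linear layers purely to keep the example short.

```python
from torch import nn

class PlainListLayer(nn.Module):
    def __init__(self):
        super().__init__()
        # Plain Python list: the blocks are NOT registered as submodules
        self.blocks = [nn.Linear(4, 4), nn.Linear(4, 4)]

class ModuleListLayer(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList: the blocks are registered and their parameters tracked
        self.blocks = nn.ModuleList([nn.Linear(4, 4), nn.Linear(4, 4)])

def count_parameters(module: nn.Module) -> int:
    """Total number of parameter values visible through .parameters()."""
    return sum(p.numel() for p in module.parameters())

plain_count = count_parameters(PlainListLayer())
registered_count = count_parameters(ModuleListLayer())
print(plain_count, registered_count)  # 0 40
```

Each `nn.Linear(4, 4)` holds 16 weights and 4 biases, so the registered variant reports 40 parameters while the plain list reports none.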


### Deep ResNet Architecture
With `ResNetLayer` instances, we can now implement a general class for deeper ResNet architectures using *bottlenecked* residual blocks. Again following Table 1 in the [ResNet article](https://arxiv.org/pdf/1512.03385.pdf), we implement 1 standard convolutional layer, followed by 4 `ResNetLayer` instances, which lead to a fully connected layer mapping the pooled features to `num_classes` possible outcomes.
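
To see how the spatial resolution shrinks along the way, the sketch below applies the standard output-size formula for convolutions and pooling to each stage. It assumes a 224x224 input, the size used in the paper; the notebook itself does not fix an image size. Within a `ResNetLayer`, only the 3x3 convolution of the first block carries the stride, so one entry per layer is enough.

```python
def conv_output_size(size: int, kernel_size: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution or pooling operation."""
    return (size + 2 * padding - kernel_size) // stride + 1

# (name, kernel_size, stride, padding) of the operation that sets the resolution
stages = [
    ("layer_1 conv 7x7", 7, 2, 3),
    ("layer_1 max-pool", 3, 2, 1),
    ("layer_2", 3, 1, 1),
    ("layer_3", 3, 2, 1),
    ("layer_4", 3, 2, 1),
    ("layer_5", 3, 2, 1),
]

size = 224  # assumed input height/width
sizes = []
for name, kernel_size, stride, padding in stages:
    size = conv_output_size(size, kernel_size, stride, padding)
    sizes.append(size)
    print(name, size)
# sizes == [112, 56, 56, 28, 14, 7]
```

The final 7x7 feature map is what the adaptive average pooling reduces to a single value per channel.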

The `ResNet` code is fairly self-explanatory. The only thing one should note is how the choice of `layer_list` affects the output architecture. `layer_list` defines the number of `ResNetBlock` instances to be declared at each layer and is set to a ResNet50 architecture by default. By choosing `layer_list = [3, 4, 23, 3]` or `layer_list = [3, 8, 36, 3]`, the resulting architecture can be altered to ResNet101 or ResNet152 respectively.
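
The names of these architectures follow from a simple count: every bottleneck block contributes three convolutions, and the first convolution and the final fully connected layer add two more (projection shortcuts are conventionally not counted). The helper `resnet_depth` below is a hypothetical name used only for this check.

```python
from typing import List

def resnet_depth(layer_list: List[int]) -> int:
    """Weighted-layer count of a bottleneck ResNet with the given blocks per layer."""
    convs_per_block = 3  # 1x1 reduce, 3x3 features, 1x1 increase
    return convs_per_block * sum(layer_list) + 2  # + first conv and final linear layer

for layer_list in ([3, 4, 6, 3], [3, 4, 23, 3], [3, 8, 36, 3]):
    print(layer_list, resnet_depth(layer_list))
# [3, 4, 6, 3] 50
# [3, 4, 23, 3] 101
# [3, 8, 36, 3] 152
```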

In [157]:
from typing import List

# Define the finished ResNet implementation
class ResNet(nn.Module):
    def __init__(self,
                 layer_list: List[int] = [3, 4, 6, 3],
                 num_classes: int = 1,
                 img_channels: int = 3):
        super().__init__()

        # Define the first part of the ResNet architecture, which is shared by all sizes.
        self.layer_1 = nn.Sequential(
            nn.Conv2d(in_channels = img_channels,
                      out_channels = 64,
                      kernel_size = 7,
                      stride = 2,
                      padding = 3),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size = 3,
                         stride = 2,
                         padding = 1)
        )

        # Define the 4 ResNetLayer's 
        self.layer_2 = ResNetLayer(layer_list[0], in_channels= 64, out_channels= 64, stride = 1)
        self.layer_3 = ResNetLayer(layer_list[1], in_channels= 256, out_channels= 128, stride = 2)
        self.layer_4 = ResNetLayer(layer_list[2], in_channels= 512, out_channels= 256, stride = 2)
        self.layer_5 = ResNetLayer(layer_list[3], in_channels= 1024, out_channels= 512, stride = 2)
        
        # Define the final Average pooling layer, bringing the result to a (2048, 1, 1) format
        self.avg_pool = nn.AdaptiveAvgPool2d((1, 1))

        # Define the final FCNN
        self.fcnn = nn.Linear(in_features = 512 * 4,
                              out_features = num_classes)

    # Define the final forward pass through the ResNet -> How data 'x' is processed by the block
    def forward(self, x):
        # Pass the data through all layers:
        x = self.layer_1(x)
        x = self.layer_2(x)
        x = self.layer_3(x)
        x = self.layer_4(x)
        x = self.layer_5(x)

        # Perform average pooling on the data, flatten it and feed it through the final fcnn
        x = self.avg_pool(x)
        x = x.reshape(x.shape[0], -1)
        
        return self.fcnn(x)

In [158]:
resNet50 = ResNet(layer_list = [3, 4, 6, 3], 
                  num_classes = data._num_classes(), 
                  img_channels = 3)

torch.Size([1, 31])
