Code from https://www.kaggle.com/code/mohamedmagdy191/traffic-signs-recognition-resnet-from-scratch

# ResNet Explained

**ResNet (Residual Network) is a type of convolutional neural network (CNN). It was designed to make very deep networks trainable: as plain networks get deeper they become harder to optimize, a difficulty commonly associated with vanishing gradients and with the degradation problem described below. Its shortcut connections let the network stack many more layers of features without training accuracy degrading.**

### Here are the key features of the ResNet (Residual Network) architecture:

* Residual Connections: ResNet incorporates residual connections, which allow for training very deep neural networks and alleviate the vanishing gradient problem. 

* Identity Mapping: ResNet uses an identity mapping for the shortcut, so the stacked layers only need to learn the residual F(x) = H(x) - x rather than the full mapping H(x), which makes training easier.

* Depth: ResNet enables the creation of very deep neural networks, which can improve performance on image recognition tasks. 

* Fewer Parameters: Thanks to its bottleneck blocks, ResNet achieves better results with lower complexity than earlier deep networks such as VGG, making it computationally more efficient.

* State-of-the-art Results: At publication, ResNet achieved state-of-the-art results on several image recognition benchmarks, and it has since become a widely used baseline.

* General and Effective Approach: The authors conclude that residual connections are a general and effective approach for enabling deeper networks.

### How ResNet Works

* ResNet works by adding residual connections to the network, which helps to maintain the information flow throughout the network and prevents the gradients from vanishing.

* The residual connection is a shortcut that allows the information to bypass one or more layers in the network and reach the output directly.

* Because each block only has to learn a residual function, that is, a correction to its input rather than a whole new representation, the network tends to converge faster and reach better performance.

* The underlying idea is that, instead of learning the full mapping H(x) from inputs to outputs directly, it is easier to learn the residual F(x) = H(x) - x and then recover the output as F(x) + x.
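
The bullets above can be sketched as a minimal PyTorch module. `TinyResidual` and its layer sizes are hypothetical and chosen only for illustration; this is not the bottleneck block used later in this notebook. The sketch zeroes the last convolution so that F(x) = 0 and checks that the block then passes its input through unchanged.

```python
import torch
import torch.nn as nn


class TinyResidual(nn.Module):
    """Hypothetical two-layer residual block: output = F(x) + x."""

    def __init__(self, channels):
        super().__init__()
        # F(x): the stacked layers that learn the residual
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
        )

    def forward(self, x):
        # The shortcut adds the unchanged input to the residual
        return self.f(x) + x


torch.manual_seed(0)
res_block = TinyResidual(channels=8)

# Force the residual branch to output zeros, so F(x) = 0
nn.init.zeros_(res_block.f[2].weight)

x = torch.randn(2, 8, 16, 16)
y = res_block(x)
identity_recovered = torch.equal(y, x)
print("Block acts as identity when F(x) = 0:", identity_recovered)
```

With a zero residual the block is exactly the identity, which is why extra residual blocks should, at worst, leave a shallower solution unchanged.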

### The Problem Statement
In principle, deeper neural networks should be at least as accurate as shallower ones. In practice, once a plain network is deep enough, adding more layers makes accuracy saturate and then get worse. The deeper network shows higher training error, and with it higher test error, so the drop is not caused by overfitting. This degradation indicates that simply stacking more layers does not improve the model's performance.

### The solution
As described above, adding more layers to a suitably deep plain model leads to higher training error. The paper (Deep Residual Learning for Image Recognition) shows how an architectural change, residual learning, tackles this degradation problem. A residual network adds identity shortcut (skip) connections between layers: the shortcut passes a block's input straight to its output, where it is added to the result of the stacked layers. If the identity mapping is already optimal, the block can simply drive its residual toward zero, which is easier than fitting an identity function with a stack of nonlinear layers. The shortcuts also give gradients a direct path back to earlier layers. The paper showed that very deep residual networks could be trained effectively, achieving improved accuracy on several benchmark datasets compared to previous state-of-the-art models.
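
In the notation of the referenced paper, with x the block input, F the function computed by the stacked layers with weights W_i, and H the mapping the block should ultimately represent, the idea can be written as follows. The projection W_s is only needed when the input and output shapes differ; in the code later in this notebook that role is played by `identity_downsample`.

$$
\begin{aligned}
F(x) &= H(x) - x \\
y &= F(x, \{W_i\}) + x \\
y &= F(x, \{W_i\}) + W_s x \quad \text{(when shapes differ)}
\end{aligned}
$$

Differentiating the identity-shortcut form gives

$$
\frac{\partial y}{\partial x} = \frac{\partial F}{\partial x} + I
$$

so the shortcut contributes an identity term to the gradient regardless of how small the gradient through F becomes.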


# References

* Deep Residual Learning for Image Recognition: https://arxiv.org/abs/1512.03385
* ResNet Explained: https://www.analyticsvidhya.com/blog/2023/02/deep-residual-learning-for-image-recognition-resnet-explained/
* PyTorch ResNet implementation from Scratch: https://www.youtube.com/watch?v=DkNIBBBvcPs

In [2]:
import os
import torch
import torch.nn as nn
import torchvision.transforms as T
from PIL import Image
from torch.utils.data import DataLoader, Dataset

# Building Custom Dataset for Traffic signs

In [3]:
# Convert each image to a tensor, resize it to 225x225, and normalize each channel with mean 0.5 and std 0.5
transforms = T.Compose([T.ToTensor(), T.Resize((225, 225)),
                        T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])])
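
As a small illustration of the normalization step: `T.Normalize` computes `(input - mean) / std` per channel, and `T.ToTensor` produces values in [0, 1], so a mean and std of 0.5 map pixel values to [-1, 1]. The sketch below reproduces that arithmetic on three hypothetical pixel values instead of a real image.

```python
import torch

# Hypothetical pixel intensities after T.ToTensor(), which scales to [0, 1]
pixels = torch.tensor([0.0, 0.5, 1.0])

mean, std = 0.5, 0.5
# Same formula T.Normalize applies to every channel
normalized = (pixels - mean) / std

print(normalized.tolist())
```

Black maps to -1, mid-gray to 0, and white to 1.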

In [4]:
class TSignsDataset(Dataset):
    def __init__(self, df, root_dir, transform=None):
        self.df = df
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        image_path = os.path.join(self.root_dir, self.df.iloc[index, 7])  # column 7 of df holds the image path
        # The 3-channel Normalize above needs 3-channel input, so force RGB
        image = Image.open(image_path).convert("RGB")
        y_class = torch.tensor(self.df.iloc[index, 6])  # column 6 of df holds the ClassId

        if self.transform:
            image = self.transform(image)

        return (image, y_class)

# Loading The data into DataLoaders

In [5]:
def getDataloaders(batch_size, training_set, validation_set):
    train_loader = DataLoader(dataset=training_set, batch_size=batch_size, shuffle=True)
    valid_loader = DataLoader(dataset=validation_set, batch_size=batch_size, shuffle=False)
    dataloaders = {'training': train_loader, 'validation': valid_loader}
    return dataloaders
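
To see what a `DataLoader` yields, here is a minimal sketch that uses a `TensorDataset` of stand-in data rather than the traffic sign images. The tensor sizes and labels are hypothetical. Note that `len(loader)` is the number of batches, while `len(loader.dataset)` is the number of samples.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 10 blank 3-channel 8x8 "images" with labels 0..2 (hypothetical)
fake_images = torch.zeros(10, 3, 8, 8)
fake_labels = torch.arange(10) % 3
toy_set = TensorDataset(fake_images, fake_labels)

loader = DataLoader(dataset=toy_set, batch_size=4, shuffle=False)

# Collect the shape of every image batch the loader produces
batch_shapes = [tuple(images.shape) for images, _ in loader]

print("batches:", len(loader))
print("samples:", len(loader.dataset))
print("batch shapes:", batch_shapes)
```

The last batch is smaller because 10 is not a multiple of 4.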

# Building The ResNet Model from scratch

### Generic Residual block 

In [6]:
class block(nn.Module):
    def __init__(
            self, in_channels, out_channels, identity_downsample=None, stride=1):
        super().__init__()
        self.expansion = 4
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False, )
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.conv3 = nn.Conv2d(out_channels, out_channels * self.expansion, kernel_size=1, stride=1, padding=0,
                               bias=False, )
        self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)
        self.relu = nn.ReLU()
        self.identity_downsample = identity_downsample
        self.stride = stride

    def forward(self, x):
        identity = x.clone()

        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.conv3(x)
        x = self.bn3(x)

        if self.identity_downsample is not None:
            identity = self.identity_downsample(identity)

        x += identity
        x = self.relu(x)
        return x
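
The 1x1, 3x3, 1x1 layout of this block keeps the expensive 3x3 convolution narrow. The arithmetic below counts convolution weights for a bottleneck with 256 input channels and a width of 64, and compares it with a hypothetical alternative of two 3x3 convolutions that stay at 256 channels. The comparison block is an assumption made for illustration; BatchNorm parameters are ignored.

```python
def conv_params(in_ch, out_ch, kernel):
    """Number of weights in a bias-free 2D convolution."""
    return in_ch * out_ch * kernel * kernel


in_ch, mid_ch, expansion = 256, 64, 4

# Bottleneck: 1x1 reduce -> 3x3 -> 1x1 expand (as in the block class above)
bottleneck = (conv_params(in_ch, mid_ch, 1)
              + conv_params(mid_ch, mid_ch, 3)
              + conv_params(mid_ch, mid_ch * expansion, 1))

# Hypothetical comparison: two 3x3 convolutions at full width
plain = 2 * conv_params(in_ch, in_ch, 3)

print("bottleneck weights:", bottleneck)
print("two 3x3 convs at 256 channels:", plain)
```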

### Generic implementation of ResNet Class

In [7]:
class ResNet(nn.Module):
    def __init__(self, block, layers, image_channels, num_classes):
        super(ResNet, self).__init__()
        self.in_channels = 64
        self.conv1 = nn.Conv2d(image_channels, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # Essentially the entire ResNet architecture is in these 4 lines below
        self.layer1 = self._make_layer(block, layers[0], out_channels=64, stride=1)
        self.layer2 = self._make_layer(block, layers[1], out_channels=128, stride=2)
        self.layer3 = self._make_layer(block, layers[2], out_channels=256, stride=2)
        self.layer4 = self._make_layer(block, layers[3], out_channels=512, stride=2)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * 4, num_classes)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.reshape(x.shape[0], -1)
        x = self.fc(x)

        return x

    def _make_layer(self, block, num_residual_blocks, out_channels, stride):
        identity_downsample = None
        layers = []

        # If we either halve the spatial size, e.g. 56x56 -> 28x28 (stride=2), or the number
        # of channels changes, we need to adapt the identity (skip connection) so that it
        # can be added to the output of the block
        if stride != 1 or self.in_channels != out_channels * 4:
            identity_downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels * 4, kernel_size=1, stride=stride, bias=False)
                , nn.BatchNorm2d(out_channels * 4))

        layers.append(block(self.in_channels, out_channels, identity_downsample, stride))

        # The expansion size is always 4 for ResNet 50,101,152
        self.in_channels = out_channels * 4

        # For example for first resnet layer: 256 will be mapped to 64 as intermediate layer,
        # then finally back to 256. Hence no identity downsample is needed, since stride = 1,
        # and also same amount of channels.
        for i in range(num_residual_blocks - 1):
            layers.append(block(self.in_channels, out_channels))

        return nn.Sequential(*layers)
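
To follow how the feature map shrinks through the network, the sketch below applies the standard convolution output-size formula, `floor((size + 2 * padding - kernel) / stride) + 1`, to the strides and kernel sizes used above. It assumes the 225x225 input produced by the transform earlier in this notebook; `conv_out` and `trace` are helper names introduced here for illustration.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a conv/pool layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


size = 225  # input height/width after T.Resize((225, 225))
trace = [("input", size, 3)]

size = conv_out(size, kernel=7, stride=2, padding=3)
trace.append(("conv1", size, 64))
size = conv_out(size, kernel=3, stride=2, padding=1)
trace.append(("maxpool", size, 64))

# Each stage: (name, bottleneck width, stride of its first block)
for name, width, stride in [("layer1", 64, 1), ("layer2", 128, 2),
                            ("layer3", 256, 2), ("layer4", 512, 2)]:
    # Only the 3x3 conv of the first block in a stage is strided
    size = conv_out(size, kernel=3, stride=stride, padding=1)
    trace.append((name, size, width * 4))  # expansion factor 4

for name, s, c in trace:
    print(f"{name:8s} {c:5d} x {s} x {s}")
```

The final stage has 2048 channels, which is why the fully connected layer takes `512 * 4` input features after global average pooling.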

### The ResNet: 3 levels of depth

In [8]:
def ResNet50(img_channel=3, num_classes=1000):
    return ResNet(block, [3, 4, 6, 3], img_channel, num_classes)


def ResNet101(img_channel=3, num_classes=1000):
    return ResNet(block, [3, 4, 23, 3], img_channel, num_classes)


def ResNet152(img_channel=3, num_classes=1000):
    return ResNet(block, [3, 8, 36, 3], img_channel, num_classes)
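
The numbers in the model names follow from the block counts. As a quick check, each bottleneck block has 3 convolutions, and the usual convention adds the initial convolution and the final fully connected layer while leaving out the 1x1 downsample convolutions on the shortcuts. `weighted_layers` is a helper introduced here for illustration.

```python
configs = {
    "ResNet50": [3, 4, 6, 3],
    "ResNet101": [3, 4, 23, 3],
    "ResNet152": [3, 8, 36, 3],
}

CONVS_PER_BLOCK = 3  # 1x1, 3x3, 1x1 in each bottleneck block


def weighted_layers(blocks_per_stage):
    # +2 accounts for the first 7x7 convolution and the final fc layer
    return CONVS_PER_BLOCK * sum(blocks_per_stage) + 2


depths = {name: weighted_layers(cfg) for name, cfg in configs.items()}
print(depths)
```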

# Training The model

In [9]:
def Train(model, device, criterion, optimizer, num_epochs, batch_size, dataloaders, out_path):
    # batch_size is unused here: the DataLoaders already define the batch size
    best_acc = 0.0

    for epoch in range(num_epochs):
        print("epoch {}/{}".format(epoch + 1, num_epochs))
        print("*" * 10)

        for x in ["training", "validation"]:
            if x == "training":
                model.train()
            else:
                model.eval()

            running_loss = 0.0
            running_corrects = 0

            for data in dataloaders[x]:
                img, y = data
                img, y = img.to(device), y.to(device)
                optimizer.zero_grad()

                # Track gradients only in the training phase
                with torch.set_grad_enabled(x == "training"):
                    y_pred = model(img)
                    loss = criterion(y_pred, y)
                    _, preds = torch.max(y_pred, dim=1)

                    if x == 'training':
                        loss.backward()
                        optimizer.step()

                running_loss += loss.item()
                running_corrects += torch.sum(preds == y).item()

            # Loss is averaged over batches, accuracy over individual samples
            epoch_loss = running_loss / len(dataloaders[x])
            epoch_acc = running_corrects / len(dataloaders[x].dataset)

            print('{} Loss: {:.4f} || Accuracy: {:.4f}'.format(x, epoch_loss, epoch_acc))

            # Save the weights whenever validation accuracy improves
            if x == 'validation' and epoch_acc > best_acc:
                best_acc = epoch_acc
                torch.save(model.state_dict(), out_path)

    print('Best validation Accuracy: {:.4f}'.format(best_acc))
    return best_acc
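
Per-epoch accuracy is a fraction of samples, so a running count of correct predictions is normally divided by the number of samples rather than by the number of batches. The sketch below shows the difference on two hypothetical batches of logits; the values are made up for illustration.

```python
import torch

# Two hypothetical batches of logits (4 samples, then 2 samples) over 3 classes
batches = [
    (torch.tensor([[2.0, 0.1, 0.1], [0.1, 2.0, 0.1],
                   [0.1, 0.1, 2.0], [2.0, 0.1, 0.1]]),
     torch.tensor([0, 1, 2, 1])),
    (torch.tensor([[0.1, 2.0, 0.1], [0.1, 0.1, 2.0]]),
     torch.tensor([1, 0])),
]

correct = 0
total = 0
for logits, labels in batches:
    # Predicted class is the index of the largest logit
    _, preds = torch.max(logits, dim=1)
    correct += torch.sum(preds == labels).item()
    total += labels.size(0)

accuracy = correct / total          # fraction of samples classified correctly
per_batch = correct / len(batches)  # correct predictions per batch, not an accuracy

print("accuracy:", accuracy)
print("correct per batch:", per_batch)
```

Dividing by the number of batches yields a value that can exceed 1, so it cannot be read as an accuracy.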