This is an implementation of AlexNet, the network introduced in **ImageNet Classification with Deep Convolutional Neural Networks** ([paper](https://papers.nips.cc/paper_files/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html)). This notebook is a way for me to walk through and understand my code in `alexnet.py`.

### Preparing the Project

In [1]:
%pip install torch

In [2]:
import torch
import torch.nn as nn

### Building AlexNet

AlexNet's architecture can be read off the diagram in the paper or worked out from its text. Laid out, it looks like this:

```
Convolutional Layer
ReLU
Response Normalization Layer
MaxPool Layer

Convolutional Layer
ReLU
Response Normalization Layer
MaxPool Layer

Convolutional Layer
ReLU

Convolutional Layer
ReLU

Convolutional Layer
ReLU
MaxPool Layer

Dropout
Fully-Connected Layer
ReLU

Dropout
Fully-Connected Layer
ReLU

Fully-Connected Layer

1000-way Softmax
```
With this layout, we can first construct a template for this architecture:

```python
class AlexNet(nn.Module):
    def __init__(self):
        super().__init__()

        # Skeleton only: the constructor arguments are still to be filled in.
        self.conv1 = nn.Conv2d()
        self.relu1 = nn.ReLU()
        self.norm1 = nn.LocalResponseNorm()
        self.maxp1 = nn.MaxPool2d()

        self.conv2 = nn.Conv2d()
        self.relu2 = nn.ReLU()
        self.norm2 = nn.LocalResponseNorm()
        self.maxp2 = nn.MaxPool2d()

        self.conv3 = nn.Conv2d()
        self.relu3 = nn.ReLU()

        self.conv4 = nn.Conv2d()
        self.relu4 = nn.ReLU()

        self.conv5 = nn.Conv2d()
        self.relu5 = nn.ReLU()
        self.maxp5 = nn.MaxPool2d()

        self.dropf1 = nn.Dropout()
        self.fc1 = nn.Linear()
        self.reluf1 = nn.ReLU()

        self.dropf2 = nn.Dropout()
        self.fc2 = nn.Linear()
        self.reluf2 = nn.ReLU()

        self.fc3 = nn.Linear()
```
With this, we can fill in the arguments for each module by reading the paper to get the fully completed code below.

In [4]:
class AlexNet(nn.Module):

    def __init__(self):
        super().__init__()
    
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=96, kernel_size=11, stride=4, padding=2)
        self.relu1 = nn.ReLU()
        self.norm1 = nn.LocalResponseNorm(size=5, alpha=0.0001, beta=0.75, k=2)
        self.maxp1 = nn.MaxPool2d(kernel_size=3, stride=2)
        
        self.conv2 = nn.Conv2d(in_channels=96, out_channels=256, kernel_size=5, stride=1, padding=2)
        self.relu2 = nn.ReLU()
        self.norm2 = nn.LocalResponseNorm(size=5, alpha=0.0001, beta=0.75, k=2)
        self.maxp2 = nn.MaxPool2d(kernel_size=3, stride=2)

        self.conv3 = nn.Conv2d(in_channels=256, out_channels=384, kernel_size=3, stride=1, padding=1)
        self.relu3 = nn.ReLU()
        
        self.conv4 = nn.Conv2d(in_channels=384, out_channels=384, kernel_size=3, stride=1, padding=1)
        self.relu4 = nn.ReLU()

        self.conv5 = nn.Conv2d(in_channels=384, out_channels=256, kernel_size=3, stride=1, padding=1)
        self.relu5 = nn.ReLU()
        self.maxp5 = nn.MaxPool2d(kernel_size=3, stride=2)

        self.dropf1 = nn.Dropout(p=0.5, inplace=True)
        self.fc1 = nn.Linear(in_features=(256 * 6 * 6), out_features=4096)
        self.reluf1 = nn.ReLU()
        self.dropf2 = nn.Dropout(p=0.5)  # out-of-place, so reluf1's output stays intact for autograd
        self.fc2 = nn.Linear(in_features=4096, out_features=4096)
        self.reluf2 = nn.ReLU()
        self.fc3 = nn.Linear(in_features=4096, out_features=1000)

    def forward(self, x):

        x = self.conv1(x)
        x = self.relu1(x)
        x = self.norm1(x)
        x = self.maxp1(x)

        x = self.conv2(x)
        x = self.relu2(x)
        x = self.norm2(x)
        x = self.maxp2(x)

        x = self.conv3(x)
        x = self.relu3(x)

        x = self.conv4(x)
        x = self.relu4(x)

        x = self.conv5(x)
        x = self.relu5(x)
        x = self.maxp5(x)

        x = x.reshape(x.shape[0], -1)

        x = self.dropf1(x)
        x = self.fc1(x)
        x = self.reluf1(x)
        x = self.dropf2(x)
        x = self.fc2(x)
        x = self.reluf2(x)
        x = self.fc3(x)

        return x

Everything except the padding can be found in the paper. The authors do not state the padding values, but padding is used in all five convolutional layers, with sizes `[2, 2, 1, 1, 1]` respectively.
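
The padding values can be sanity-checked with the usual output-size formula for convolution and pooling layers, `floor((n + 2p - k) / s) + 1`. The sketch below is a minimal check and is not part of `alexnet.py`. The helper names `conv_out`, `LAYERS`, and `trace_sizes` are made up for illustration, the kernel, stride, and padding values are copied from the `AlexNet` class above, and a 224x224 input is assumed.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel_size, stride, padding), copied from the AlexNet class above.
LAYERS = [
    ("conv1", 11, 4, 2),
    ("maxp1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("maxp2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("maxp5", 3, 2, 0),
]


def trace_sizes(size=224):
    """Return the spatial size after each layer for a square input."""
    sizes = []
    for name, kernel, stride, padding in LAYERS:
        size = conv_out(size, kernel, stride, padding)
        sizes.append((name, size))
    return sizes


for name, size in trace_sizes():
    print(f"{name}: {size}x{size}")
```

The trace ends at 6x6 after `maxp5`, which with 256 channels matches `in_features=(256 * 6 * 6)` in `fc1`.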

Also note that the output of the last convolutional block (after the final max-pooling layer) has to be flattened before it can be passed to the first fully-connected layer.
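
As a small illustration of the flattening step, the sketch below uses a random tensor as a stand-in for the output of `maxp5` (the variable names are only for this example). It compares the `reshape` call used in `forward` with `torch.flatten`, which should give the same result.

```python
import torch

# Stand-in for the maxp5 output: batch of 2, 256 channels, 6x6 feature maps.
features = torch.randn(2, 256, 6, 6)

# Form used in forward(): keep the batch dimension, merge everything else.
flat_reshape = features.reshape(features.shape[0], -1)

# Built-in alternative that flattens every dimension from index 1 onward.
flat_flatten = torch.flatten(features, start_dim=1)

print(flat_reshape.shape)
print(torch.equal(flat_reshape, flat_flatten))
```

Each image ends up as a vector of 256 * 6 * 6 = 9216 values, which is what `fc1` expects.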

### Testing the Model

We can test the model by passing in a dummy tensor of size `[2, 3, 224, 224]`, which represents two 224x224 RGB images. The model works if this tensor passes through without triggering an error and the output has shape `[2, 1000]`, one score per class for each image.

In [5]:
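def test():
    # Use the GPU when one is available, otherwise fall back to the CPU.
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    net = AlexNet().to(device)
    x = torch.randn(2, 3, 224, 224, device=device)
    y = net(x)
    print(y.shape)

test()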

torch.Size([2, 1000])
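
The architecture layout ends in a 1000-way softmax, but `forward` returns the output of `fc3` directly, so the `[2, 1000]` tensor holds raw scores (logits). This suits training with `nn.CrossEntropyLoss`, which expects logits. To read the scores as class probabilities, a softmax can be applied afterwards. The sketch below assumes a random tensor as a stand-in for the model output, and the variable names are illustrative.

```python
import torch

torch.manual_seed(0)

# Stand-in for the AlexNet output: 2 images, 1000 class scores each.
logits = torch.randn(2, 1000)

# Softmax over the class dimension turns each row into a probability distribution.
probs = torch.softmax(logits, dim=1)

# Highest-probability class and its probability for each image.
top_prob, top_class = probs.max(dim=1)

print(probs.sum(dim=1))  # each row sums to (approximately) 1
print(top_class)
```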
