## Title: ImageNet Classification with Deep Convolutional Neural Networks

### Summary:
- The paper introduces a large, deep convolutional neural network (CNN) that outperformed the previous state of the art on the ImageNet LSVRC-2010 benchmark; a variant of the model also won the ILSVRC-2012 image classification competition.
- The CNN consists of five convolutional layers (some followed by max-pooling layers) and three fully connected layers, the last of which feeds a 1000-way softmax. It has about 650,000 neurons and 60 million parameters.
- The paper presents several techniques for speeding up training and improving accuracy: non-saturating neurons (ReLU), a highly efficient GPU implementation of the convolution operation, and a regularization method called "dropout" that reduces overfitting in the fully connected layers.
- The paper also examines the features the CNN learns and shows qualitatively that they capture complex patterns in natural images while tolerating variations such as changes in pose.
- On the ILSVRC-2010 test data, the CNN reaches top-1 and top-5 error rates of 37.5% and 17.0%, considerably better than earlier approaches; the ILSVRC-2012 variant reaches a top-5 test error of 15.3%, against 26.2% for the second-best entry.
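
The top-1 and top-5 error rates mentioned above are the fraction of test images whose correct label is not among the 1 or 5 classes the model scores highest. Below is a minimal sketch of that metric in plain Python. The function name `top_k_error` and the toy scores are illustrative assumptions, not code or data from the paper, and the toy example uses k=3 because it only has six classes.

```python
def top_k_error(scores, labels, k):
    """Fraction of samples whose true label is not among the k highest scores."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    misses = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores in this row (ties keep index order).
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Hypothetical scores for 4 images over 6 classes (not real model output).
scores = [
    [0.10, 0.50, 0.20, 0.10, 0.05, 0.05],  # label 1: highest score, top-1 hit
    [0.30, 0.10, 0.25, 0.20, 0.10, 0.05],  # label 2: second highest, top-3 hit only
    [0.05, 0.05, 0.10, 0.10, 0.30, 0.40],  # label 0: missed at k=1 and k=3
    [0.20, 0.10, 0.10, 0.40, 0.10, 0.10],  # label 3: highest score, top-1 hit
]
labels = [1, 2, 0, 3]

top1 = top_k_error(scores, labels, k=1)
top3 = top_k_error(scores, labels, k=3)
print(top1, top3)  # 0.5 0.25
```

On real model output the same ranking would normally be done on tensors in batches; the loop here only spells out the definition.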

Read more: https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf

In [1]:
import torch
from torch import nn
from torchsummary import summary

In [2]:
class AlexNet(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        
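        # Feature extractor; shape comments assume a 3x224x224 input.
        # The paper's local response normalization layers are omitted here.
        self.features = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=96, kernel_size=11, stride=4, padding=1),  # -> 96x54x54
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),  # -> 96x26x26
            nn.Conv2d(96, 256, kernel_size=5, stride=1, padding=3),  # -> 256x28x28
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),  # -> 256x13x13
            nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1),  # -> 384x13x13
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, stride=1, padding=1),  # -> 384x13x13
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, stride=1, padding=1),  # -> 256x13x13
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),  # -> 256x6x6
        )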

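        # Classifier head: dropout regularizes the first two fully connected layers.
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(256 * 6 * 6, 4096),  # input is the flattened 256x6x6 feature map
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, 1000),  # raw logits; softmax is left to the loss, e.g. nn.CrossEntropyLoss
        )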

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x
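
The `256 * 6 * 6` input size of the first `nn.Linear` follows from the kernel sizes, strides and paddings used in `self.features`. The sketch below redoes that arithmetic in plain Python, assuming a 224x224 input, floor rounding and no dilation. The helper names `conv_out` and `conv_params` are made up for this illustration and are not part of PyTorch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a conv or max-pool layer (floor mode, dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_params(c_in, c_out, kernel):
    """Weights plus biases of a Conv2d layer with a square kernel."""
    return c_in * c_out * kernel * kernel + c_out


# (kernel, stride, padding) for each conv and max-pool layer in self.features,
# in order; ReLU layers are skipped because they do not change the shape.
layers = [
    (11, 4, 1), (3, 2, 0),              # conv1, pool1
    (5, 1, 3), (3, 2, 0),               # conv2, pool2
    (3, 1, 1), (3, 1, 1), (3, 1, 1),    # conv3, conv4, conv5
    (3, 2, 0),                          # pool3
]

size = 224  # assumed input height and width
sizes = []
for kernel, stride, padding in layers:
    size = conv_out(size, kernel, stride, padding)
    sizes.append(size)

flat_features = 256 * size * size       # channels of conv5 times the final map area
first_conv_params = conv_params(3, 96, 11)

print(sizes)              # [54, 26, 28, 13, 13, 13, 13, 6]
print(flat_features)      # 9216
print(first_conv_params)  # 34944
```

These values can be compared against the `Output Shape` and `Param #` columns that `summary` prints for the model.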


In [3]:
alexnet = AlexNet()
summary(alexnet, (3, 224, 224))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 96, 54, 54]          34,944
              ReLU-2           [-1, 96, 54, 54]               0
         MaxPool2d-3           [-1, 96, 26, 26]               0
            Conv2d-4          [-1, 256, 28, 28]         614,656
              ReLU-5          [-1, 256, 28, 28]               0
         MaxPool2d-6          [-1, 256, 13, 13]               0
            Conv2d-7          [-1, 384, 13, 13]         885,120
              ReLU-8          [-1, 384, 13, 13]               0
            Conv2d-9          [-1, 384, 13, 13]       1,327,488
             ReLU-10          [-1, 384, 13, 13]               0
           Conv2d-11          [-1, 256, 13, 13]         884,992
             ReLU-12          [-1, 256, 13, 13]               0
        MaxPool2d-13            [-1, 256, 6, 6]               0
          Dropout-14                 [-