[**AlexNet**](https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf) was the first large convolutional network to beat conventional computer vision methods on a large-scale vision challenge, the ImageNet ILSVRC 2012 competition. It relied on ReLU activations and dropout for improved performance.

![](imgs/AlexNet.png)

[AlexNet Architecture](https://neurohive.io/en/popular-networks/alexnet-imagenet-classification-with-deep-convolutional-neural-networks/)

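AlexNet consists of 8 learned layers: 5 convolutional layers followed by 3 fully connected layers. The network uses ReLU activations, dropout for regularization, and overlapping max-pooling, where the pooling window (3×3) is larger than the stride (2) so that neighbouring windows share inputs.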
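To make "overlapping" concrete, here is a minimal one-dimensional sketch in plain Python. The helper names (`pool_windows`, `max_pool_1d`, `shared_indices`) and the sample row are made up for illustration and are not part of the notebook's `utils` module.

```python
def pool_windows(length, kernel, stride):
    """Index ranges covered by each pooling window along one dimension."""
    return [range(start, start + kernel)
            for start in range(0, length - kernel + 1, stride)]


def max_pool_1d(values, kernel, stride):
    """Max over each window; a 1-D stand-in for 2-D max pooling."""
    return [max(values[i] for i in window)
            for window in pool_windows(len(values), kernel, stride)]


def shared_indices(length, kernel, stride):
    """Positions that fall inside more than one pooling window."""
    counts = {}
    for window in pool_windows(length, kernel, stride):
        for i in window:
            counts[i] = counts.get(i, 0) + 1
    return sorted(i for i, c in counts.items() if c > 1)


row = [1, 5, 2, 4, 3, 6, 0]  # hypothetical activations

print(max_pool_1d(row, kernel=3, stride=2))          # [5, 4, 6]
print(shared_indices(len(row), kernel=3, stride=2))  # [2, 4] -> windows overlap
print(shared_indices(len(row), kernel=2, stride=2))  # []     -> no overlap
```

With kernel 3 and stride 2, positions 2 and 4 are each seen by two windows; with kernel 2 and stride 2, every position belongs to at most one window.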

1. Input Layer
   - Takes in a 227×227×3 RGB image. The paper reports 224×224 crops (taken from images rescaled to 256×256), but an 11×11 kernel with stride 4 and no padding yields the stated 55×55 Conv1 output only for a 227×227 input, so 227×227 is commonly used.
   
2. First Convolutional Layer (Conv1)
   - 96 filters of size 11×11×3, stride 4, ReLU activation
   - Output size: 55×55×96
   - Followed by Max Pooling (3×3, stride 2) → Output: 27×27×96

3. Second Convolutional Layer (Conv2)
   - 256 filters of size 5×5×96, stride 1, padding 2, ReLU activation
   - Output size: 27×27×256
   - Followed by Max Pooling (3×3, stride 2) → Output: 13×13×256

4. Third Convolutional Layer (Conv3)
   - 384 filters of size 3×3×256, stride 1, padding 1, ReLU activation
   - Output: 13×13×384

5. Fourth Convolutional Layer (Conv4)
   - 384 filters of size 3×3×384, stride 1, padding 1, ReLU activation
   - Output: 13×13×384

6. Fifth Convolutional Layer (Conv5)
   - 256 filters of size 3×3×384, stride 1, padding 1, ReLU activation
   - Output size: 13×13×256
   - Followed by Max Pooling (3×3, stride 2) → Output: 6×6×256

7. Fully Connected Layers
   - FC6: 4096 neurons, ReLU, Dropout (50%)
   - FC7: 4096 neurons, ReLU, Dropout (50%)
   - FC8 (Output Layer): 1000 neurons (for ImageNet classes), Softmax activation
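The spatial sizes listed above follow from the usual output-size formula for convolution and pooling, `floor((size + 2·padding − kernel) / stride) + 1`. The sketch below traces them in plain Python. It assumes no padding in Conv1 and the pooling layers, padding 2 in Conv2, and padding 1 in Conv3 to Conv5, which are the values that reproduce the sizes in the list; `out_size` and `trace` are illustrative names.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding); padding values are assumptions chosen to
# match the output sizes stated in the layer list.
LAYERS = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool3", 3, 2, 0),
]


def trace(size, layers=LAYERS):
    """Return the spatial size after every layer, starting from the input size."""
    sizes = [size]
    for _name, kernel, stride, padding in layers:
        sizes.append(out_size(sizes[-1], kernel, stride, padding))
    return sizes


print(trace(227))  # [227, 55, 27, 27, 13, 13, 13, 13, 6]
print(out_size(224, kernel=11, stride=4))  # 54, not 55, without padding
```

The final 6×6 map with 256 channels gives the 6 × 6 × 256 = 9216 inputs to FC6.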

In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
import utils

In [3]:
class AlexNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()

        self.net = nn.Sequential(
            # Conv1 + pool: 227×227 → 55×55 → 27×27
            # (with a 227×227 input, padding=1 and padding=0 both give 55×55)
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=1),
            nn.ReLU(inplace=True), nn.MaxPool2d(kernel_size=3, stride=2),
            # Conv2 + pool: 27×27 → 13×13
            nn.Conv2d(96, 256, kernel_size=5, padding=2),
            nn.ReLU(inplace=True), nn.MaxPool2d(kernel_size=3, stride=2),
            # Conv3 to Conv5 keep 13×13; the final pool reduces it to 6×6
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True), nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Flatten(),
            # Classifier: FC6 and FC7 with dropout, then FC8 producing raw logits
            # (softmax is applied inside nn.CrossEntropyLoss during training)
            nn.Linear(6 * 6 * 256, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, num_classes)
        )

    def forward(self, x):
        return self.net(x)

In [4]:
utils.layer_summary(AlexNet(num_classes=10), (1, 3, 227, 227))

Input shape: (1, 3, 227, 227)
----------------------------------------
Conv2d          output shape: (1, 96, 55, 55)
ReLU            output shape: (1, 96, 55, 55)
MaxPool2d       output shape: (1, 96, 27, 27)
Conv2d          output shape: (1, 256, 27, 27)
ReLU            output shape: (1, 256, 27, 27)
MaxPool2d       output shape: (1, 256, 13, 13)
Conv2d          output shape: (1, 384, 13, 13)
ReLU            output shape: (1, 384, 13, 13)
Conv2d          output shape: (1, 384, 13, 13)
ReLU            output shape: (1, 384, 13, 13)
Conv2d          output shape: (1, 256, 13, 13)
ReLU            output shape: (1, 256, 13, 13)
MaxPool2d       output shape: (1, 256, 6, 6)
Flatten         output shape: (1, 9216)
Linear          output shape: (1, 4096)
ReLU            output shape: (1, 4096)
Dropout         output shape: (1, 4096)
Linear          output shape: (1, 4096)
ReLU            output shape: (1, 4096)
Dropout         output shape: (1, 4096)
Linear          output shape: (1, 10)
-----
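The shapes in this summary are enough to estimate where the parameters sit. The following is a rough count in plain Python for the single-stream model defined above with `num_classes=10`; it does not account for the two-GPU split of the original paper. `conv_params` and `linear_params` are illustrative helper names.

```python
def conv_params(c_in, c_out, kernel):
    """Weights plus biases of a square 2-D convolution."""
    return c_in * c_out * kernel * kernel + c_out


def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


counts = {
    "conv1": conv_params(3, 96, 11),
    "conv2": conv_params(96, 256, 5),
    "conv3": conv_params(256, 384, 3),
    "conv4": conv_params(384, 384, 3),
    "conv5": conv_params(384, 256, 3),
    "fc6": linear_params(6 * 6 * 256, 4096),
    "fc7": linear_params(4096, 4096),
    "fc8": linear_params(4096, 10),  # 10 classes, as in this notebook
}

total = sum(counts.values())
fc_total = sum(n for name, n in counts.items() if name.startswith("fc"))

for name, n in counts.items():
    print(f"{name:<6}{n:>12,}")
print(f"{'total':<6}{total:>12,}")
print(f"fully connected share: {fc_total / total:.1%}")
```

Under these assumptions the model has roughly 58.3 million parameters, and well over 90% of them are in the three fully connected layers, with FC6 alone contributing about 37.8 million.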

In [5]:
data = utils.CIFAR10DataLoader(batch_size=64, resize=(227, 227))
train_loader = data.get_train_loader()
test_loader = data.get_test_loader()

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = AlexNet(num_classes=10)
model.apply(utils.init_kaiming).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

epochs = 10
for epoch in range(epochs):
    train_loss, train_acc = utils.train_step(train_loader, model, criterion, optimizer, device)
    test_loss, test_acc = utils.eval_step(test_loader, model, criterion, device)
    print(f"Epoch {epoch + 1:>{len(str(epochs))}}/{epochs} | "
          f"Train Loss: {train_loss:.4f} | "
          f"Test Loss: {test_loss:.4f} | "
          f"Test Acc: {test_acc:.4f}")

Epoch  1/10 | Train Loss: 1.9811 | Test Loss: 1.6664 | Test Acc: 0.3594
Epoch  2/10 | Train Loss: 1.4305 | Test Loss: 1.2087 | Test Acc: 0.5628
Epoch  3/10 | Train Loss: 1.1476 | Test Loss: 1.0258 | Test Acc: 0.6400
Epoch  4/10 | Train Loss: 0.9511 | Test Loss: 0.9292 | Test Acc: 0.6731
Epoch  5/10 | Train Loss: 0.8233 | Test Loss: 0.8783 | Test Acc: 0.7014
Epoch  6/10 | Train Loss: 0.7192 | Test Loss: 0.7426 | Test Acc: 0.7437
Epoch  7/10 | Train Loss: 0.6266 | Test Loss: 0.7846 | Test Acc: 0.7336
Epoch  8/10 | Train Loss: 0.5547 | Test Loss: 0.7405 | Test Acc: 0.7560
Epoch  9/10 | Train Loss: 0.4914 | Test Loss: 0.6824 | Test Acc: 0.7737
Epoch 10/10 | Train Loss: 0.4310 | Test Loss: 0.7219 | Test Acc: 0.7600
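The `utils.train_step` and `utils.eval_step` helpers are not shown in this notebook. As a rough guide, here is a minimal sketch of what such helpers might look like, assuming only what the calls above imply: the argument order and a `(loss, accuracy)` return value. The bodies are an assumption, not the actual `utils` implementation.

```python
import torch
import torch.nn as nn


def train_step(loader, model, criterion, optimizer, device):
    """Run one training epoch; return (mean loss, accuracy) over all samples."""
    model.train()  # enables dropout
    total_loss, correct, seen = 0.0, 0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        logits = model(inputs)
        loss = criterion(logits, targets)
        loss.backward()
        optimizer.step()
        # Weight the batch-mean loss by batch size so the epoch mean is exact.
        total_loss += loss.item() * targets.size(0)
        correct += (logits.argmax(dim=1) == targets).sum().item()
        seen += targets.size(0)
    return total_loss / seen, correct / seen


@torch.no_grad()
def eval_step(loader, model, criterion, device):
    """Evaluate without gradient tracking; return (mean loss, accuracy)."""
    model.eval()  # disables dropout
    total_loss, correct, seen = 0.0, 0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        logits = model(inputs)
        total_loss += criterion(logits, targets).item() * targets.size(0)
        correct += (logits.argmax(dim=1) == targets).sum().item()
        seen += targets.size(0)
    return total_loss / seen, correct / seen
```

Switching between `model.train()` and `model.eval()` matters here because the two `nn.Dropout(0.5)` layers behave differently in each mode. That is one reason a train loss and a test loss from the same epoch are not directly comparable.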
