[**AlexNet**](https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf) was the first deep convolutional network to beat conventional computer vision methods on a large-scale vision challenge (ImageNet ILSVRC-2012), relying on ReLU activations and dropout for improved performance.

![](imgs/AlexNet.png)
> Image Source: [Neurohive](https://neurohive.io/en/popular-networks/alexnet-imagenet-classification-with-deep-convolutional-neural-networks/)

AlexNet consists of 8 learned layers: 5 convolutional layers followed by 3 fully connected layers. The network uses ReLU activations, dropout for regularization, and overlapping max-pooling (3×3 windows with stride 2).

1. **Input Layer**
   - Takes in a **227×227×3 RGB** image. The paper states **224×224**, but an 11×11 kernel with stride 4 and no padding only produces the reported 55×55 output for a **227×227** input, so implementations commonly use that size.
   
2. **First Convolutional Layer (Conv1)**
   - **96 filters** of size **11×11×3**, **stride 4**, **no padding**, **ReLU activation**
   - Output size: **55×55×96**
   - **Local Response Normalization (LRN)**
   - Followed by **Max Pooling (3×3, stride 2)** → Output: **27×27×96**

3. **Second Convolutional Layer (Conv2)**
   - **256 filters** of size **5×5×96**, **stride 1**, **padding 2**, **ReLU activation**
   - Output size: **27×27×256**
   - **Local Response Normalization (LRN)** (helps generalization)
   - **Max Pooling (3×3, stride 2)** → Output: **13×13×256**

4. **Third Convolutional Layer (Conv3)**
   - **384 filters** of size **3×3×256**, **stride 1**, **padding 1**, **ReLU activation**
   - Output: **13×13×384**

5. **Fourth Convolutional Layer (Conv4)**
   - **384 filters** of size **3×3×384**, **stride 1**, **padding 1**, **ReLU activation**
   - Output: **13×13×384**

6. **Fifth Convolutional Layer (Conv5)**
   - **256 filters** of size **3×3×384**, **stride 1**, **padding 1**, **ReLU activation**
   - Output size: **13×13×256**
   - **Max Pooling (3×3, stride 2)** → Output: **6×6×256**

7. **Fully Connected Layers**
   - **FC6:** 4096 neurons, ReLU, **Dropout (50%)**
   - **FC7:** 4096 neurons, ReLU, **Dropout (50%)**
   - **FC8 (Output Layer):** 1000 neurons (for ImageNet classes), **Softmax activation**
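
The output sizes listed above follow from the usual formula `floor((n + 2p - k) / s) + 1` for input width `n`, kernel size `k`, stride `s`, and padding `p`. The following is a minimal sketch that traces the spatial size through the network. It assumes no padding in Conv1, padding 2 in Conv2, and padding 1 in Conv3 to Conv5; the names `out_size`, `layers`, and `trace_sizes` are illustrative only.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor division)."""
    return (n + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding); the padding values are assumptions
layers = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
]


def trace_sizes(n=227):
    """Return the spatial width after each layer for a square input of width n."""
    sizes = {}
    for name, kernel, stride, padding in layers:
        n = out_size(n, kernel, stride, padding)
        sizes[name] = n
    return sizes


sizes = trace_sizes(227)
for name, n in sizes.items():
    print(f"{name}: {n}x{n}")

# Number of features fed into FC6 (256 channels after the last pooling layer)
flat_features = sizes["pool5"] ** 2 * 256
print(flat_features)  # 9216
```

With `n=224` and the same settings, the trace ends at 5×5 rather than 6×6, which is one way to see why a 227×227 input is needed for the 6×6×256 feature map.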

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import utils  # project-local helpers: layer_summary, CIFAR10DataLoader, train_step, eval_step

In [8]:
class AlexNet(nn.Module):
    """AlexNet for 227×227 RGB inputs (the LRN layers are omitted here)."""

    def __init__(self, num_classes=1000):
        super().__init__()

        self.net = nn.Sequential(
            # Conv1: 227×227×3 -> 55×55×96, pooled to 27×27×96.
            # padding=1 is not required for a 227×227 input (padding=0 also gives 55×55).
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=1),
            nn.ReLU(inplace=True), nn.MaxPool2d(kernel_size=3, stride=2),
            # Conv2: -> 27×27×256, pooled to 13×13×256
            nn.Conv2d(96, 256, kernel_size=5, padding=2),
            nn.ReLU(inplace=True), nn.MaxPool2d(kernel_size=3, stride=2),
            # Conv3 and Conv4: 13×13×384 each
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            # Conv5: -> 13×13×256, pooled to 6×6×256
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True), nn.MaxPool2d(kernel_size=3, stride=2),
            # Classifier: flatten 6×6×256 = 9216 features
            nn.Flatten(),
            nn.Linear(6 * 6 * 256, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            # FC8 returns raw logits; nn.CrossEntropyLoss applies the softmax internally
            nn.Linear(4096, num_classes)
        )

        self._initialize_weights()

    def forward(self, x):
        return self.net(x)

    def _initialize_weights(self):
        # Iterate through the individual layers in nn.Sequential
        for m in self.net:
            if isinstance(m, (nn.Conv2d, nn.Linear)):
                nn.init.xavier_uniform_(m.weight)  # Xavier (Glorot) uniform initialization
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)  # Set biases to 0
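
The architecture description lists Local Response Normalization, but the class above leaves it out. As a sketch only, LRN could be added with PyTorch's `nn.LocalResponseNorm`. The hyperparameters below (window size 5, alpha 1e-4, beta 0.75, k 2) are the values reported in the AlexNet paper; the name `conv1_block_with_lrn` and the placement after ReLU and before pooling are assumptions for illustration. PyTorch's layer divides alpha by the window size, so the result may not be numerically identical to the paper's formula.

```python
import torch
import torch.nn as nn

# Hypothetical Conv1 block with LRN placed after ReLU and before pooling
conv1_block_with_lrn = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4),
    nn.ReLU(inplace=True),
    nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
    nn.MaxPool2d(kernel_size=3, stride=2),
)

x = torch.randn(1, 3, 227, 227)
out = conv1_block_with_lrn(x)
print(out.shape)  # torch.Size([1, 96, 27, 27]); LRN leaves the shape unchanged
```

Because LRN has no learnable parameters, adding it would not change the parameter counts of the network.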


In [4]:
utils.layer_summary(AlexNet(num_classes=10), (1, 3, 227, 227))

Layer Name                     Layer Type              Param #         Output Shape
net.0                          Conv2d                    34944                Error
net.1                          ReLU                          0     (1, 3, 227, 227)
net.2                          MaxPool2d                     0     (1, 3, 113, 113)
net.3                          Conv2d                   614656                Error
net.4                          ReLU                          0     (1, 3, 113, 113)
net.5                          MaxPool2d                     0       (1, 3, 56, 56)
net.6                          Conv2d                   885120                Error
net.7                          ReLU                          0       (1, 3, 56, 56)
net.8                          Conv2d                  1327488                Error
net.9                          ReLU                          0       (1, 3, 56, 56)
net.10                         Conv2d                   884992              
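
The summary above reports `Error` for the convolutional layers, the shapes it does report keep 3 channels throughout, and the listing is cut off after `net.10`. One way to cross-check shapes independently of `utils.layer_summary` is to register forward hooks on each layer. The sketch below does this for a small stand-in `nn.Sequential`; the helper name `summarize_shapes` and the stand-in model `stem` are assumptions and are not part of `utils`.

```python
import torch
import torch.nn as nn


def summarize_shapes(model, input_shape):
    """Run one forward pass and record (name, type, params, output shape) per child layer."""
    rows, handles = [], []

    def make_hook(name):
        def hook(module, inputs, output):
            n_params = sum(p.numel() for p in module.parameters())
            rows.append((name, type(module).__name__, n_params, tuple(output.shape)))
        return hook

    for name, module in model.named_children():
        handles.append(module.register_forward_hook(make_hook(name)))

    was_training = model.training
    model.eval()  # disable dropout during the probe pass
    with torch.no_grad():
        model(torch.zeros(*input_shape))
    model.train(was_training)  # restore the previous mode

    for handle in handles:
        handle.remove()  # leave the model without lingering hooks
    return rows


# Stand-in model: a Conv1-style stage (no padding assumed)
stem = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
)

rows = summarize_shapes(stem, (1, 3, 227, 227))
for name, kind, n_params, shape in rows:
    print(f"{name:<6}{kind:<12}{n_params:>8}  {shape}")
```

For the `AlexNet` class in this notebook the layers live in `model.net`, so a call such as `summarize_shapes(model.net, (1, 3, 227, 227))` would list them one by one.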

In [5]:
data = utils.CIFAR10DataLoader(batch_size=64, resize=(227, 227))
train_loader = data.get_train_loader()
test_loader = data.get_test_loader()

In [9]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = AlexNet(num_classes=10).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

epochs = 10
for epoch in range(epochs):
    train_loss, train_acc = utils.train_step(train_loader, model, criterion, optimizer, device)
    test_loss, test_acc = utils.eval_step(test_loader, model, criterion, device)
    print(f"Epoch {epoch + 1}/{epochs}: Train Loss={train_loss}, Test Loss={test_loss}, Test Accuracy={test_acc}")

Epoch 1/10: Train Loss=1.7938762808699742, Test Loss=1.421180201184218, Test Accuracy=0.4853
Epoch 2/10: Train Loss=1.4135858848729097, Test Loss=1.2837689970708956, Test Accuracy=0.5363
Epoch 3/10: Train Loss=1.2614317626294578, Test Loss=1.2204932574253933, Test Accuracy=0.5609
Epoch 4/10: Train Loss=1.1539182943456314, Test Loss=1.1173195740219894, Test Accuracy=0.6116
Epoch 5/10: Train Loss=1.045770225241361, Test Loss=1.0576315620902237, Test Accuracy=0.62
Epoch 6/10: Train Loss=0.9835313361166688, Test Loss=1.0808553107225212, Test Accuracy=0.6205
Epoch 7/10: Train Loss=0.9082142593305739, Test Loss=0.9975112623469845, Test Accuracy=0.6487
Epoch 8/10: Train Loss=0.841291508558766, Test Loss=1.006145279878264, Test Accuracy=0.6504
Epoch 9/10: Train Loss=0.7958463107990792, Test Loss=1.046884818441549, Test Accuracy=0.6398
Epoch 10/10: Train Loss=0.7327078013773769, Test Loss=1.040890034216984, Test Accuracy=0.6559
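
`utils.train_step` and `utils.eval_step` are not shown in this notebook. Judging only from how they are called above (a loader, model, criterion, and device go in, with an optimizer for training, and a loss and an accuracy come out), they could look roughly like the following. This is a hypothetical sketch, not the actual `utils` code, and the per-sample averaging of the loss is an assumption.

```python
import torch
import torch.nn as nn


def train_step(loader, model, criterion, optimizer, device):
    """One pass over the training data; returns (mean loss, accuracy)."""
    model.train()  # enable dropout
    total_loss, correct, seen = 0.0, 0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        logits = model(inputs)
        loss = criterion(logits, targets)
        loss.backward()
        optimizer.step()
        # Weight the batch-mean loss by batch size so the average is per sample
        total_loss += loss.item() * targets.size(0)
        correct += (logits.argmax(dim=1) == targets).sum().item()
        seen += targets.size(0)
    return total_loss / seen, correct / seen


@torch.no_grad()
def eval_step(loader, model, criterion, device):
    """One pass over held-out data without weight updates; returns (mean loss, accuracy)."""
    model.eval()  # disable dropout
    total_loss, correct, seen = 0.0, 0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        logits = model(inputs)
        loss = criterion(logits, targets)
        total_loss += loss.item() * targets.size(0)
        correct += (logits.argmax(dim=1) == targets).sum().item()
        seen += targets.size(0)
    return total_loss / seen, correct / seen
```

Switching between `model.train()` and `model.eval()` matters for this network because the two dropout layers behave differently in each mode.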
