## AlexNet Complete Architecture

### Introduction
AlexNet was designed by **Hinton**, winner of the ***2012*** ***ImageNet competition***, and his student Alex Krizhevsky. It was also after that year that more and deeper neural networks were proposed, such as the excellent vgg, GoogleLeNet. Its official data model has an accuracy rate of **57.1%** and top 1-5 reaches **80.2%**. This is already quite outstanding for traditional machine learning classification algorithms.


![title](https://raw.githubusercontent.com/blurred-machine/Data-Science/master/Deep%20Learning%20SOTA/img/alexnet.png)


![title](https://raw.githubusercontent.com/blurred-machine/Data-Science/master/Deep%20Learning%20SOTA/img/alexnet2.png)

## Why does AlexNet achieve better results?

### **Relu activation function is used:**
- Relu function: f (x) = max (0, x)

![alex1](https://raw.githubusercontent.com/blurred-machine/Data-Science/master/Deep%20Learning%20SOTA/img/alex512.png)

- ReLU-based deep convolutional networks are trained several times faster than tanh and sigmoid- based networks. The following figure shows the number of iterations for a four-layer convolutional network based on CIFAR-10 that reached 25% training error in tanh and ReLU:

![alex1](https://raw.githubusercontent.com/blurred-machine/Data-Science/master/Deep%20Learning%20SOTA/img/alex612.png)

### **Standardization (Local Response Normalization):**
- After using ReLU f (x) = max (0, x), you will find that the value after the activation function has no range like the tanh and sigmoid functions, so a normalization will usually be done after ReLU, and the LRU is a steady proposal One method in neuroscience is called "Lateral inhibition", which talks about the effect of active neurons on its surrounding neurons.

![alex1](https://raw.githubusercontent.com/blurred-machine/Data-Science/master/Deep%20Learning%20SOTA/img/alex3.jpg)


### **Dropout:**
- Dropout is also a concept often said, which can effectively prevent overfitting of neural networks. Compared to the general linear model, a regular method is used to prevent the model from overfitting. In the neural network, Dropout is implemented by modifying the structure of the neural network itself. For a certain layer of neurons, randomly delete some neurons with a defined probability, while keeping the individuals of the input layer and output layer neurons unchanged, and then update the parameters according to the learning method of the neural network. In the next iteration, rerandom Remove some neurons until the end of training.


![alex1](https://raw.githubusercontent.com/blurred-machine/Data-Science/master/Deep%20Learning%20SOTA/img/alex4.jpg)


### **Enhanced Data (Data Augmentation):**

**In deep learning, when the amount of data is not large enough, there are generally 4 solutions:**

- Data augmentation- artificially increase the size of the training set-create a batch of "new" data from existing data by means of translation, flipping, noise

- Regularization——The relatively small amount of data will cause the model to overfit, making the training error small and the test error particularly large. By adding a regular term after the Loss Function , the overfitting can be suppressed. The disadvantage is that a need is introduced Manually adjusted hyper-parameter.

- Dropout- also a regularization method. But different from the above, it is achieved by randomly setting the output of some neurons to zero

- Unsupervised Pre-training- use Auto-Encoder or RBM's convolution form to do unsupervised pre-training layer by layer, and finally add a classification layer to do supervised Fine-Tuning






##Importing the Libraries

In [1]:
import numpy as np
import torch
import torch.nn as nn
from torchvision import datasets
from torchvision import transforms
from torch.utils.data.sampler import SubsetRandomSampler

In [2]:
# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [3]:
device

device(type='cuda')

##AlexNet from Scratch

In [4]:
class AlexNet(nn.Module):

  def __init__(self, num_classes=10):
    super().__init__()
    self.layer1 = nn.Sequential(
        nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=0),
        nn.BatchNorm2d(96),
        nn.ReLU(),
        nn.MaxPool2d(3, stride=2)
    )
    self.layer2 = nn.Sequential(
        nn.Conv2d(96, 256, kernel_size=5, padding=2),
        nn.BatchNorm2d(256),
        nn.ReLU(),
        nn.MaxPool2d(3, stride=2)
    )
    self.layer3 = nn.Sequential(
        nn.Conv2d(256, 384, kernel_size=3, padding=1),
        nn.BatchNorm2d(384),
        nn.ReLU()
    )
    self.layer4 = nn.Sequential(
        nn.Conv2d(384, 384, kernel_size=3, padding=1),
        nn.BatchNorm2d(384),
        nn.ReLU()
    )
    self.layer5 = nn.Sequential(
        nn.Conv2d(384, 256, kernel_size=3, padding=1),
        nn.BatchNorm2d(256),
        nn.ReLU(),
        nn.MaxPool2d(3, stride=2)
    )
    self.fc1 = nn.Sequential(
        nn.Dropout(0.5),
        nn.Linear(256*6*6, 4096),
        nn.ReLU()
    )
    self.fc2 = nn.Sequential(
        nn.Dropout(0.5),
        nn.Linear(4096, 4096),
        nn.ReLU()
    )
    self.fc3 = nn.Linear(4096, num_classes)  # no need for Sequential
  def forward(self, x):

    out = self.layer1(x)
    out = self.layer2(out)
    out = self.layer3(out)
    out = self.layer4(out)
    out = self.layer5(out)
    out = out.reshape(out.size(0), -1)
    out = self.fc1(out)
    out = self.fc2(out)
    out = self.fc3(out)
    return out

##Dataset

In [5]:
import numpy as np
import torch
from torch.utils.data.sampler import SubsetRandomSampler
from torchvision import datasets, transforms

# ==========================
# Training and Validation Loader
# ==========================
def get_train_valid_loader(data_dir,
                           batch_size,
                           augment,
                           random_seed,
                           valid_size=0.1,
                           shuffle=True):
    # Normalization for CIFAR-10
    normalize = transforms.Normalize(
        mean=[0.4914, 0.4822, 0.4465],
        std=[0.2023, 0.1994, 0.2010]
    )

    # Validation transforms
    valid_transform = transforms.Compose([
        transforms.Resize((227, 227)),
        transforms.ToTensor(),
        normalize
    ])

    # Training transforms
    if augment:
        train_transform = transforms.Compose([
            transforms.RandomCrop(32, padding=4),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            normalize
        ])
    else:
        train_transform = transforms.Compose([
            transforms.Resize((227, 227)),
            transforms.ToTensor(),
            normalize
        ])

    # Load the dataset
    train_dataset = datasets.CIFAR10(
        root=data_dir,
        train=True,
        download=True,
        transform=train_transform
    )

    valid_dataset = datasets.CIFAR10(
        root=data_dir,
        train=True,
        download=True,
        transform=valid_transform
    )

    # Create indices for training and validation splits
    num_train = len(train_dataset)
    indices = list(range(num_train))
    split = int(np.floor(valid_size * num_train))

    if shuffle:
        np.random.seed(random_seed)
        np.random.shuffle(indices)

    train_idx, valid_idx = indices[split:], indices[:split]

    # Samplers
    train_sampler = SubsetRandomSampler(train_idx)
    valid_sampler = SubsetRandomSampler(valid_idx)

    # DataLoaders
    train_loader = torch.utils.data.DataLoader(
        train_dataset,
        batch_size=batch_size,
        sampler=train_sampler
    )

    valid_loader = torch.utils.data.DataLoader(
        valid_dataset,
        batch_size=batch_size,
        sampler=valid_sampler
    )

    return train_loader, valid_loader


# ==========================
# Test Loader
# ==========================
def get_test_loader(data_dir,
                    batch_size,
                    shuffle=True):
    # Normalization for CIFAR-10 test set
    normalize = transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )

    transform = transforms.Compose([
        transforms.Resize((227, 227)),
        transforms.ToTensor(),
        normalize
    ])

    dataset = datasets.CIFAR10(
        root=data_dir,
        train=False,
        download=True,
        transform=transform
    )

    data_loader = torch.utils.data.DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=shuffle
    )

    return data_loader





In [6]:

data_dir = './data'
batch_size = 64

train_loader, valid_loader = get_train_valid_loader(
    data_dir=data_dir,
    batch_size=batch_size,
    augment=False,
    random_seed=1
)

test_loader = get_test_loader(
    data_dir=data_dir,
    batch_size=batch_size
)


100%|██████████| 170M/170M [00:04<00:00, 34.6MB/s]


##Setting Hyperparameters

In [7]:
num_classes = 10
num_epochs = 20
batch_size = 64
learning_rate = 0.005

model = AlexNet(num_classes).to(device)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, weight_decay = 0.005, momentum = 0.9)

# Train the model
total_step = len(train_loader)

##Training

In [8]:
total_step = len(train_loader)

for epoch in range(num_epochs):

  for i, (images, labels) in enumerate(train_loader):
      # Move tensors to the configured device
      images = images.to(device)
      labels = labels.to(device)

      # Forward pass
      outputs = model(images)
      loss = criterion(outputs, labels)

      # Backward and optimize
      optimizer.zero_grad()
      loss.backward()
      optimizer.step()

  print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                  .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

  # Validation
  with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in valid_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
        del images, labels, outputs

    print('Accuracy of the network on the {} validation images: {} %'.format(5000, 100 * correct / total))

Epoch [1/20], Step [704/704], Loss: 0.6746
Accuracy of the network on the 5000 validation images: 57.3 %
Epoch [2/20], Step [704/704], Loss: 1.2493
Accuracy of the network on the 5000 validation images: 68.98 %
Epoch [3/20], Step [704/704], Loss: 0.6175
Accuracy of the network on the 5000 validation images: 72.96 %
Epoch [4/20], Step [704/704], Loss: 0.6472
Accuracy of the network on the 5000 validation images: 75.36 %
Epoch [5/20], Step [704/704], Loss: 0.8327
Accuracy of the network on the 5000 validation images: 77.52 %
Epoch [6/20], Step [704/704], Loss: 0.6372
Accuracy of the network on the 5000 validation images: 76.34 %
Epoch [7/20], Step [704/704], Loss: 0.5577
Accuracy of the network on the 5000 validation images: 78.58 %
Epoch [8/20], Step [704/704], Loss: 0.7393
Accuracy of the network on the 5000 validation images: 79.62 %
Epoch [9/20], Step [704/704], Loss: 0.2448
Accuracy of the network on the 5000 validation images: 79.48 %
Epoch [10/20], Step [704/704], Loss: 0.2703
Acc

##Testing

In [9]:
with torch.no_grad():
  correct = 0
  total = 0
  for images, labels in test_loader:
    images = images.to(device)
    labels = labels.to(device)
    outputs = model(images)
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum().item()
    del images, labels, outputs

  print('Accuracy of the network on the {} test images: {} %'.format(10000, 100 * correct / total))

Accuracy of the network on the 10000 test images: 81.54 %
