# Deep Learning

# Tutorial 13: Regularisation Layer - Dropout

In this tutorial, we will cover:

- Regularisation, Dropout

Prerequisites:

- Python, Tensor basics, DNN Training, Overfitting

My contact:

- Niklas Beuter (niklas.beuter@th-luebeck.de)

Course:

- Slides and notebooks will be available at https://lernraum.th-luebeck.de/course/view.php?id=5383

## Expected Outcomes
* Understand regularisation: dropout layer.

## Why do we need regularisation?

When a neural network model learns not only the underlying patterns in the training data but also the noise and random fluctuations, we call it overfitting. As a result, while the model performs exceptionally well on training data, its performance on new, unseen data is poor.

### Causes of Overfitting
1. **Too many parameters**: Having more parameters than necessary can lead the model to fit excessively to the training data.
2. **Insufficient training data**: With too few examples, the model memorizes them instead of learning patterns that generalize to new data.
3. **Extended training time**: Training a model for too many epochs can lead it to learn from the noise in the data as well as from the actual signal.

### How to Prevent Overfitting
- **Use regularization techniques** such as L2 regularization, **Dropout**, and early stopping.
- **Increase training data**: More data can help the model learn more general patterns.
- **Use data augmentation**: Modifying existing training samples to create new ones can help in generalizing better.
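The L2 regularization mentioned in the first bullet can be sketched in PyTorch as follows. This is a minimal example on made-up random data, assuming plain SGD: adding $\frac{\lambda}{2}\lVert w \rVert^2$ to the loss should produce the same parameter update as passing `weight_decay=`$\lambda$ to `torch.optim.SGD`. The model size, learning rate and $\lambda = 0.1$ are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two identical copies of a small linear model
model_a = nn.Linear(4, 2)
model_b = nn.Linear(4, 2)
model_b.load_state_dict(model_a.state_dict())

x = torch.randn(8, 4)          # made-up inputs
y = torch.randn(8, 2)          # made-up targets
criterion = nn.MSELoss()
wd = 0.1                       # L2 strength (lambda), arbitrary for illustration

# (a) Explicit L2 penalty: loss + (lambda / 2) * sum of squared parameters
opt_a = torch.optim.SGD(model_a.parameters(), lr=0.1)
opt_a.zero_grad()
l2_penalty = sum((p ** 2).sum() for p in model_a.parameters())
loss_a = criterion(model_a(x), y) + 0.5 * wd * l2_penalty
loss_a.backward()
opt_a.step()

# (b) Built-in weight decay: SGD adds wd * param to each parameter's gradient
opt_b = torch.optim.SGD(model_b.parameters(), lr=0.1, weight_decay=wd)
opt_b.zero_grad()
loss_b = criterion(model_b(x), y)
loss_b.backward()
opt_b.step()

# Both updates agree up to floating-point error
for p_a, p_b in zip(model_a.parameters(), model_b.parameters()):
    print(torch.allclose(p_a, p_b, atol=1e-6))
```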

### Definition of Dropout
Dropout is a regularization technique for neural networks that reduces overfitting, i.e. the tendency of a model to memorize noise in the training data to the point where its performance on new data suffers.

### How Dropout Works
During training, dropout randomly deactivates a subset of neurons: each neuron is dropped independently with probability $p$, which is typically chosen between 0.2 and 0.5. A dropped neuron outputs zero, so it contributes neither to the forward pass nor to backpropagation in that particular training step. A new random subset is dropped in every step.


![Image from: Srivastava et al, “Dropout: A simple way to prevent neural networks from overfitting”, JMLR 2014](https://pgaleone.eu/images/dropout/dropout.jpeg)

#### Mathematical Representation
The dropout process can be mathematically represented as follows:

For a given layer with input vector $x$, the output $y$ after applying dropout is:

$$ y = x \odot m $$

where $\odot$ denotes element-wise multiplication and $m$ is a mask vector whose elements are drawn independently from a Bernoulli distribution with parameter $1-p$, i.e. $m_i = 1$ (keep) with probability $1-p$ and $m_i = 0$ (drop) with probability $p$. On average, the mask therefore sets a fraction $p$ of the elements of $x$ to zero.
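The formula can be tried out directly with `torch.bernoulli`, which samples each entry of a 0/1 tensor with the probability given in its input. The sketch below (with an arbitrary $p = 0.3$ and a made-up input $x$) draws one mask, applies $y = x \odot m$, and then averages over many masks to show that $\mathbb{E}[y] = (1-p)\,x$: without extra scaling, dropout shrinks the layer's output on average.

```python
import torch

torch.manual_seed(0)

p = 0.3                                   # drop probability (arbitrary)
x = torch.tensor([1.0, 2.0, 3.0, 4.0])    # made-up layer input

# One dropout step: m_i ~ Bernoulli(1 - p), 1 = keep, 0 = drop
m = torch.bernoulli(torch.full_like(x, 1 - p))
y = x * m                                 # element-wise product y = x ⊙ m
print("mask:", m)
print("y:   ", y)

# Averaging over many masks approximates E[y] = (1 - p) * x
masks = torch.bernoulli(torch.full((100_000, x.numel()), 1 - p))
mean_y = (x * masks).mean(dim=0)
print("empirical E[y]:", mean_y)
print("(1 - p) * x:   ", (1 - p) * x)
print("fraction dropped:", 1 - masks.mean().item())
```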

### Benefits of Using Dropout
1. **Reduces Overfitting**: By deactivating neurons randomly, dropout ensures that the network does not become overly dependent on any single neuron, promoting a more generalized model.
2. **Encourages Neuronal Redundancy**: Because any neuron may be missing in a given step, neurons cannot rely on specific other neurons being present (they cannot co-adapt). The network therefore spreads useful features over many neurons, which improves its ability to generalize.
3. **Ensemble Effect**: Training with dropout can be seen as training a large number of "thinned" sub-networks that share their weights: every training step samples a new mask and therefore a different sub-network (a layer with $n$ units admits $2^n$ of them). Using all neurons at test time approximates averaging the predictions of this ensemble.
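For a single linear layer the ensemble view can be checked numerically: averaging the outputs of many randomly thinned networks approaches the output of the full network whose input is scaled by $1-p$. The sketch below uses random, made-up activations `h` and weights `W`. For a purely linear layer this agreement is exact in expectation; with nonlinearities between layers, the full test-time network is only an approximation of the ensemble average.

```python
import torch

torch.manual_seed(0)

p = 0.5
n_hidden = 6
h = torch.randn(n_hidden)          # made-up activations of a hidden layer
W = torch.randn(3, n_hidden)       # made-up weights of the following linear layer

# Every possible mask defines one thinned sub-network
print("number of thinned networks:", 2 ** n_hidden)

# Monte Carlo ensemble: average the outputs of many randomly thinned networks
masks = torch.bernoulli(torch.full((200_000, n_hidden), 1 - p))
ensemble_out = ((h * masks) @ W.T).mean(dim=0)

# Test-time rule: use all units once, scaled by (1 - p)
scaled_out = W @ ((1 - p) * h)

print("ensemble average:", ensemble_out)
print("weight scaling:  ", scaled_out)
```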

### Application of Dropout
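Dropout is only applied during training. In the original formulation, all neurons are used at test time, but their outputs are scaled down by a factor of $1-p$: during training only a fraction $1-p$ of the units was active on average, so the scaling keeps the expected input of the next layer the same in both phases. Most implementations, including the from-scratch `dropout_layer` below and PyTorch's `nn.Dropout`, use *inverted dropout* instead: the kept activations are scaled up during training, $y = \frac{x \odot m}{1-p}$, so that at test time the layer simply passes its input through unchanged.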
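The two conventions can be compared numerically. The sketch below defines two illustrative helpers, `standard_dropout` and `inverted_dropout` (hypothetical names, not part of PyTorch), and checks by Monte Carlo averaging that in both cases the expected training-time output matches the test-time output, assuming $p = 0.5$ and an arbitrary input vector.

```python
import torch

torch.manual_seed(0)

def standard_dropout(x, p, training):
    # Original formulation: drop during training, scale by (1 - p) at test time
    if training:
        return x * torch.bernoulli(torch.full_like(x, 1 - p))
    return x * (1 - p)

def inverted_dropout(x, p, training):
    # Inverted formulation: drop and rescale by 1 / (1 - p) during training, identity at test time
    if training:
        return x * torch.bernoulli(torch.full_like(x, 1 - p)) / (1 - p)
    return x

p = 0.5
x = torch.tensor([1.0, -2.0, 3.0])
xs = x.repeat(100_000, 1)          # one copy of x per simulated training step

results = {}
for name, fn in [("standard", standard_dropout), ("inverted", inverted_dropout)]:
    train_mean = fn(xs, p, training=True).mean(dim=0)   # estimate of E[training output]
    test_out = fn(x, p, training=False)                  # deterministic test output
    results[name] = (train_mean, test_out)
    print(f"{name}: E[train] = {train_mean}, test = {test_out}")
```

Both schemes agree in expectation; inverted dropout is usually preferred because the model needs no changes at inference time.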

## Implementation of Dropout

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

In [None]:
# Dropout from Scratch

def dropout_layer(X, dropout):
    assert 0 <= dropout <= 1
    # Special case: every element is dropped
    if dropout == 1:
        return torch.zeros_like(X)
    # Bernoulli mask on the same device as X: 1 (keep) with probability 1 - dropout, else 0 (drop)
    mask = (torch.rand(X.shape, device=X.device) > dropout).float()
    # Inverted dropout: rescale the kept values by 1 / (1 - dropout) so that E[output] = X
    return mask * X / (1.0 - dropout)

In [None]:
X = torch.arange(16, dtype=torch.float32).reshape((2, 8))
print('dropout_p = 0:', dropout_layer(X, 0))
print('dropout_p = 0.5:', dropout_layer(X, 0.5))
print('dropout_p = 1:', dropout_layer(X, 1))

In [None]:
class CustomDropout(nn.Module):
    def __init__(self, p=0.5):
        super(CustomDropout, self).__init__()
        self.p = p

    def forward(self, x):
        if self.training:
            return dropout_layer(x,self.p)
        else:
            return x

In [None]:
class FashionMNISTModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(FashionMNISTModel, self).__init__()
        self.layer1 = nn.Linear(input_dim, hidden_dim)
        self.relu = nn.ReLU()
        self.dropout = CustomDropout(p=0.5)
        self.layer2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = x.view(x.size(0), -1)  # Flatten the image
        x = self.layer1(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.layer2(x)
        return x

In [None]:
def train_model(model, train_loader, criterion, optimizer, num_epochs=5):
    model.train()
    for epoch in range(num_epochs):
        running_loss = 0.0
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            output = model(images)
            loss = criterion(output, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        print(f'Epoch {epoch+1}, Loss: {running_loss/len(train_loader)}')

In [None]:
# transforms.Compose chains several transformations: each image is converted to a tensor with values in [0, 1] and then normalized
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert PIL images to float tensors in [0, 1]
    transforms.Normalize((0.5,), (0.5,))  # (x - 0.5) / 0.5 maps the values to [-1, 1]
])

# Alternative: load the MNIST digits dataset instead
#train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
#test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

# Load the Fashion-MNIST dataset
train_dataset = datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)

# Create data loaders with batch size of 64
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

# Class names for the Fashion-MNIST dataset
fashion_mnist_classes = [
    "T-shirt/top",   # Class 0
    "Trouser",       # Class 1
    "Pullover",      # Class 2
    "Dress",         # Class 3
    "Coat",          # Class 4
    "Sandal",        # Class 5
    "Shirt",         # Class 6
    "Sneaker",       # Class 7
    "Bag",           # Class 8
    "Ankle boot"     # Class 9
]

In [None]:
def evaluate_model(model, test_loader):
    model.eval()  # Set model to evaluation mode
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    accuracy = 100 * correct / total
    print(f'Accuracy: {accuracy:.2f}%')

In [None]:
# Create the model
model = FashionMNISTModel(784, 256, 10)
model = model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Train the model
train_model(model, train_loader, criterion, optimizer)
evaluate_model(model, test_loader)
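`evaluate_model` only prints the test accuracy. Overfitting, however, shows up as a gap between training and test accuracy, so it can help to compute both. The sketch below assumes a hypothetical helper `accuracy` (a variant of `evaluate_model` that returns the value instead of printing it) and, to stay self-contained, uses a tiny synthetic dataset with random labels instead of Fashion-MNIST. With random labels the only way to fit the training set is to memorize it, so test accuracy should stay near chance (about 10%) whatever the model does. All `toy_*` names are illustrative and chosen so they do not overwrite the notebook's `model` and data loaders.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def accuracy(model, loader, device="cpu"):
    """Hypothetical helper: return the accuracy of `model` on `loader` in percent."""
    model.eval()  # dropout acts as a pass-through layer
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            predicted = model(images).argmax(dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return 100.0 * correct / total

# Tiny synthetic stand-in for Fashion-MNIST: random 28x28 "images" with random labels
torch.manual_seed(0)
toy_train = TensorDataset(torch.randn(256, 1, 28, 28), torch.randint(0, 10, (256,)))
toy_test = TensorDataset(torch.randn(256, 1, 28, 28), torch.randint(0, 10, (256,)))
toy_train_loader = DataLoader(toy_train, batch_size=64, shuffle=True)
toy_test_loader = DataLoader(toy_test, batch_size=64)

toy_model = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
                          nn.Dropout(p=0.5), nn.Linear(256, 10))
toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.1)
toy_criterion = nn.CrossEntropyLoss()

for epoch in range(20):
    toy_model.train()  # enable dropout for training
    for images, labels in toy_train_loader:
        toy_optimizer.zero_grad()
        loss = toy_criterion(toy_model(images), labels)
        loss.backward()
        toy_optimizer.step()

train_acc = accuracy(toy_model, toy_train_loader)
test_acc = accuracy(toy_model, toy_test_loader)
print(f"train {train_acc:.1f}%  test {test_acc:.1f}%  gap {train_acc - test_acc:.1f} points")
```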

## Predict Classes

In [None]:
import matplotlib.pyplot as plt

def predict_and_visualize_single_image(model, image, label):
    image, label = image.to(device), label.to(device)  # Move image and label to the device
    model.eval()  # Set the model to evaluation mode
    with torch.no_grad():
        image = image.unsqueeze(0)  # Add batch dimension
        output = model(image)
        _, predicted = torch.max(output.data, 1)

    # Move the image back to CPU for visualization
    image = image.cpu()

    # Plot the image with its actual and predicted class names
    plt.imshow(image.squeeze(), cmap='gray')
    plt.title(f'Actual: {fashion_mnist_classes[label.item()]} | Predicted: {fashion_mnist_classes[predicted.item()]}')
    plt.axis('off')
    plt.show()

In [None]:
# Fetch one batch from the test_loader
test_images, test_labels = next(iter(test_loader))

print("Number of images in batch: ", test_images.shape[0])

# Select the image and label at index 50 of the batch
image, label = test_images[50], test_labels[50]

# Predict the class 
predict_and_visualize_single_image(model, image, label)

## PyTorch Dropout Layer

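PyTorch provides dropout as the built-in layer `nn.Dropout(p)`. During training it zeroes each element with probability `p` and scales the remaining elements by $\frac{1}{1-p}$ (inverted dropout), just like `dropout_layer` above. During evaluation (i.e., after calling `model.eval()`), the dropout layer does not alter its input and acts as a pass-through layer.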
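A quick way to see the two modes of the built-in layer is to apply a standalone `nn.Dropout` to a tensor of ones. This is a minimal sketch; the tensor shape and the seed are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

drop = nn.Dropout(p=0.5)
x = torch.ones(2, 8)

drop.train()        # training mode: random zeros, survivors scaled by 1 / (1 - p) = 2
y_train = drop(x)
print(y_train)

drop.eval()         # evaluation mode: identity (pass-through)
y_eval = drop(x)
print(y_eval)
```

Calling `model.train()` or `model.eval()` on a whole model switches every `nn.Dropout` submodule in the same way.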

In [None]:
class FashionMNISTModelPyTorch(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(FashionMNISTModelPyTorch, self).__init__()
        self.layer1 = nn.Linear(input_dim, hidden_dim)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(p=0.5)  # Using built-in Dropout layer
        self.layer2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = x.view(x.size(0), -1)  # Flatten the image
        x = self.layer1(x)
        x = self.relu(x)
        x = self.dropout(x)  # Apply dropout after activation
        x = self.layer2(x)
        return x

In [None]:
# Create the model
model = FashionMNISTModelPyTorch(784, 256, 10)
model = model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Train the model
train_model(model, train_loader, criterion, optimizer)
evaluate_model(model, test_loader)

# References

This notebook is adapted from or uses following sources:
* https://d2l.ai/chapter_multilayer-perceptrons/dropout.html#dropout
* https://web.eecs.umich.edu/~justincj/slides/eecs498/498_FA2019_lecture10.pdf

