# 08: Regularization with Dropout

### 🎯 Objective
This notebook explores **Dropout**, one of the most popular regularization techniques in deep learning. We will see how to implement it in PyTorch using `nn.Dropout` and `F.dropout`, and crucially, how it behaves differently during **training** versus **evaluation**.

### 📚 Key Concepts
- **Dropout:** Randomly zeroing out neurons (activations) during training to discourage co-adaptation and reduce overfitting.
- **`nn.Dropout`:** The module class for dropout.
- **`F.dropout`:** The functional version of dropout.
- **`model.train()` vs `model.eval()`:** Toggling the model's mode to enable/disable dropout.

In [1]:
# import libraries
import torch
import torch.nn as nn
import torch.nn.functional as F

## 1. Using `nn.Dropout`

We create a `Dropout` instance with a probability `p=0.5`. A newly created module starts in training mode, so each element in the input tensor is independently zeroed out with a 50% chance.

**Scaling Factor:** Notice that the non-zero values are scaled *up*. Each surviving input is multiplied by $1/(1-p)$, so with $p=0.5$ the survivors are doubled. This keeps the *expected value* of the layer's output the same as it would be without dropout, although the mean of any single small sample will fluctuate around it. PyTorch handles this scaling automatically.

In [5]:
# define a dropout instance and make some data
prob = .5

dropout = nn.Dropout(p=prob)
x = torch.ones(10) # Vector of ten 1s

# let's see what dropout returns
y = dropout(x)

print('Original data:', x)
print('Dropout output:', y)
print('Mean of output:', torch.mean(y))

# Because p=0.5, we expect roughly half to be 0 and half to be 2 (1 / (1 - 0.5) = 2).
# The mean is 1.0 in expectation; with only ten elements it varies from run to run (1.2 in this run).

Original data: tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
Dropout output: tensor([0., 0., 0., 2., 2., 2., 2., 2., 2., 0.])
Mean of output: tensor(1.2000)
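
The scaling described above can be checked by hand. The following is a minimal sketch of dropout with rescaling, built from a Bernoulli mask. `manual_dropout` is a hypothetical helper written for illustration and is not meant to mirror how PyTorch implements `nn.Dropout` internally. With a large input, the sample mean should land close to 1.

```python
import torch

def manual_dropout(x: torch.Tensor, p: float = 0.5) -> torch.Tensor:
    """Illustrative dropout with rescaling (hypothetical helper).

    Each element is zeroed with probability p; survivors are
    multiplied by 1 / (1 - p).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    # keep_mask is 1 with probability (1 - p), otherwise 0
    keep_mask = torch.bernoulli(torch.full_like(x, 1.0 - p))
    return x * keep_mask / (1.0 - p)

torch.manual_seed(0)
x_large = torch.ones(100_000)   # many elements, so the mean is stable
y_large = manual_dropout(x_large, p=0.5)

print('Unique values:', torch.unique(y_large))  # only 0 and 2 appear
print('Mean of output:', y_large.mean())         # close to 1.0
```

With ten elements, as in the cell above, the mean can easily be 0.8 or 1.2; the larger the tensor, the closer it tends to sit to 1.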


## 2. Dropout in Evaluation Mode (`.eval()`)

When we are **testing** or using the model for real predictions, we do NOT want random neurons disappearing. We want the full network to be active. 

Calling `.eval()` on a model (or a dropout layer) switches this behavior off. The dropout layer becomes an identity function (it does nothing).

In [6]:
# dropout is turned off when evaluating the model
dropout.eval()
y = dropout(x)
print('Output in eval mode:', y)
print('Mean in eval mode:', torch.mean(y))
# Result: All 1s. No dropout applied.

Output in eval mode: tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
Mean in eval mode: tensor(1.)
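
In a full model, calling `.eval()` on the top-level module sets the `training` flag on every submodule, including any dropout layers. A small sketch, assuming a toy `nn.Sequential` network whose layer sizes are arbitrary and chosen only for illustration:

```python
import torch
import torch.nn as nn

# toy network; the layer sizes are arbitrary
net = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(8, 2),
)

net.eval()  # propagates to all submodules
print('net.training:', net.training)            # False
print('dropout.training:', net[2].training)     # False

# with dropout off, repeated forward passes give the same result
inp = torch.randn(3, 4)
with torch.no_grad():
    out1 = net(inp)
    out2 = net(inp)
print('Same output twice:', torch.equal(out1, out2))

net.train()  # switch back before resuming training
print('dropout.training:', net[2].training)     # True
```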


## 3. Using `F.dropout` (Functional API)

Sometimes we use `F.dropout` directly in the `forward` method instead of defining a layer. 

**CRITICAL WARNING:** Unlike `nn.Dropout`, `F.dropout` is a plain function with no access to the model's `.train()`/`.eval()` state. Its `training` argument defaults to `True`, so it applies dropout (with a default `p=0.5`) unless you explicitly pass `training=False`.

In [7]:
# annoyingly, F.dropout() is not deactivated in eval mode:

dropout.eval() # This sets the 'dropout' OBJECT to eval mode...
y = F.dropout(x) # ...but this FUNCTION doesn't know about that object.

print('F.dropout output (default):', y)
print('Mean:', torch.mean(y))
# Result: Dropout is still active! This is a common bug source.

F.dropout output (default): tensor([2., 0., 2., 2., 2., 0., 2., 0., 2., 0.])
Mean: tensor(1.2000)


In [8]:
# but you can manually switch it off using the 'training' argument
# In a real model, you would typically pass 'self.training' here.

y = F.dropout(x, training=False)

print('F.dropout output (training=False):', y)
print('Mean:', torch.mean(y))
# Result: Correctly disabled.

F.dropout output (training=False): tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
Mean: tensor(1.)
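
Inside a custom module, a common way to keep `F.dropout` in sync with the model's mode is to pass `self.training`, which `.train()` and `.eval()` update. A minimal sketch; `TinyNet` and its layer sizes are hypothetical and serve only as an illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    """Hypothetical one-layer model using functional dropout."""

    def __init__(self, p: float = 0.5):
        super().__init__()
        self.fc = nn.Linear(10, 10)
        self.p = p

    def forward(self, x):
        x = F.relu(self.fc(x))
        # self.training follows model.train() / model.eval()
        return F.dropout(x, p=self.p, training=self.training)

model = TinyNet()
inp = torch.ones(1, 10)

model.eval()
with torch.no_grad():
    a = model(inp)
    b = model(inp)
print('Eval outputs identical:', torch.equal(a, b))  # dropout is off
```

Forgetting the `training=self.training` argument would leave dropout active during evaluation, which is the bug shown in the cell above.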


## 4. Toggling Modes

You can switch back and forth between `.train()` and `.eval()` modes as often as needed. The mode is persistent: it stays in effect until you explicitly change it.

In [9]:
# eval mode persists: call .train() again before resuming training

dropout.train() # Switch ON
y = dropout(x)
print('Train mode:', y) 


dropout.eval() # Switch OFF
y = dropout(x)
print('Eval mode:', y) 


# dropout.train() # Left commented out to show state persists
y = dropout(x)
print('Still eval mode:', y)

Train mode: tensor([2., 2., 2., 2., 2., 2., 0., 2., 2., 2.])
Eval mode: tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
Still eval mode: tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
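
Because the mode persists, evaluation code often switches to eval mode and then restores whatever mode was active before. One way to sketch this is a small context manager. `evaluating` is a hypothetical helper name and is not part of PyTorch:

```python
from contextlib import contextmanager

import torch
import torch.nn as nn

@contextmanager
def evaluating(module: nn.Module):
    """Temporarily put `module` in eval mode, then restore its previous mode.

    Hypothetical helper for illustration.
    """
    was_training = module.training
    module.eval()
    try:
        yield module
    finally:
        # train(False) is equivalent to eval()
        module.train(was_training)

drop = nn.Dropout(p=0.5)
x = torch.ones(10)

drop.train()
with evaluating(drop):
    y_eval = drop(x)  # dropout is off here, so the input passes through

print('Inside context:', y_eval)
print('Mode restored to training:', drop.training)
```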
