# 7.2.3 Dropout

## Explanation of Dropout

Dropout is a regularization technique used in neural networks to prevent overfitting. During training, dropout randomly sets a fraction of a layer's units to zero on each forward pass, drawing a fresh random mask at every update. This prevents the network from relying too heavily on any single neuron and forces it to learn more robust features that remain useful in combination with many different random subsets of the other neurons.

Mathematically, dropout can be described as follows:

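Given a neural network layer's output $a$, dropout is applied as:

$$
\tilde{a} = a \odot \text{mask}
$$

where $\odot$ denotes element-wise multiplication and $\text{mask}$ is a binary vector with the same shape as $a$. Each element of the mask is independently 0 with probability $p$ and 1 with probability $1-p$, where $p$ is the dropout rate. A new mask is drawn at random on every training step. Because each unit is kept with probability $1-p$, the expected value of $\tilde{a}$ is $(1-p)\,a$. To keep activations on the same scale at inference, when no units are dropped, implementations either multiply the activations by $1-p$ at inference time or, more commonly, divide the kept activations by $1-p$ during training (often called inverted dropout).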
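
As a quick numerical check of the masking formula, the sketch below draws one mask over a large vector and measures its effect. The activations (all ones), the rate `p = 0.3`, and the seed are assumptions chosen only for illustration. It shows that roughly a fraction `p` of the units are zeroed, that the mean activation shrinks to about `1 - p` of its original value, and that dividing by `1 - p` restores it.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
p = 0.3                         # dropout rate: probability of zeroing a unit
a = np.ones(100_000)            # hypothetical activations, all equal to 1

# Each mask element is 1 with probability 1 - p and 0 with probability p.
mask = rng.random(a.shape) >= p
a_tilde = a * mask

dropped_fraction = 1.0 - mask.mean()          # expected to be close to p
mean_after = a_tilde.mean()                   # expected to be close to (1 - p) * mean(a)
mean_rescaled = (a_tilde / (1.0 - p)).mean()  # expected to be close to mean(a)

print(f"fraction dropped:        {dropped_fraction:.3f}")
print(f"mean after dropout:      {mean_after:.3f}")
print(f"mean after dividing by (1 - p): {mean_rescaled:.3f}")
```

The exact numbers depend on the random draw, so they will only approximate `p`, `1 - p`, and `1`.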

## Scenarios Where Dropout is Beneficial

1. **Overfitting**: Dropout is especially useful when a model has too many parameters compared to the amount of training data, which can lead to overfitting. By randomly dropping units, dropout prevents the network from relying too heavily on specific neurons, thereby improving generalization.

2. **Complex Models**: In deep neural networks with many layers and parameters, dropout can be crucial for preventing overfitting by ensuring that the network does not become too specialized to the training data.

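3. **Small Datasets**: When working with small datasets, dropout helps the model generalize by injecting noise into the hidden activations. Each update trains a different randomly thinned sub-network, so the final model behaves somewhat like an average over many such sub-networks rather than one network that has memorized the limited training data.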

## Methods for Implementing Dropout

Dropout can be implemented in neural networks both from scratch and using high-level libraries. Here, we demonstrate both approaches.

___
___
### Readings:
- [How Dropout Regularization Mitigates Overfitting in Neural Networks](https://readmedium.com/en/https:/medium.com/data-science-365/how-dropout-regularization-mitigates-overfitting-in-neural-networks-9dcc3e7102ff)
- [What is Dropout Regularization method?](https://ai.plainenglish.io/what-is-dropout-regularization-method-1eae267411ef)
- [Dropout](https://neuralthreads.medium.com/dropout-regularization-technique-that-clicked-in-geoffrey-hintons-mind-at-a-bank-fa7fa8c5e1fb)
- [Types of Regularization in Machine Learning](https://medium.com/towards-data-science/types-of-regularization-in-machine-learning-eb5ce5f9bf50)
___
___

## Dropout from Scratch

In [1]:
import numpy as np

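class Dropout:
    def __init__(self, rate):
        self.rate = rate  # probability of dropping each unit
        self.mask = None  # mask from the most recent training forward pass

    def forward(self, X, training=True):
        if training:
            # Keep each unit with probability 1 - rate
            self.mask = np.random.rand(*X.shape) > self.rate
            return X * self.mask
        else:
            # No units are dropped at inference; scale by the keep probability
            # so activations match their expected value during training
            return X * (1 - self.rate)

    def backward(self, d_out):
        # Gradients flow only through the units that were kept
        return d_out * self.mask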

In [2]:
np.random.seed(42)  
X = np.array([[1, 2, 3], [4, 5, 6]])
dropout = Dropout(rate=0.5)

# Forward pass
X_dropout = dropout.forward(X, training=True)
print("Forward pass with dropout:\n", X_dropout)

# Backward pass (example gradients from subsequent layers)
d_out = np.random.randn(*X.shape)
d_X = dropout.backward(d_out)
print("\nBackward pass gradients:\n", d_X)

Forward pass with dropout:
 [[0 2 3]
 [4 0 0]]

Backward pass gradients:
 [[ 0.          0.76743473 -0.46947439]
 [ 0.54256004 -0.         -0.        ]]
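
An alternative convention, often called inverted dropout and the one followed by Keras's `Dropout` layer, divides the kept activations by `1 - rate` during training so that inference needs no rescaling at all. The sketch below is a hypothetical variant of the `Dropout` class above, not a replacement for it. The class name `InvertedDropout`, the `seed` parameter, and the all-ones test input are assumptions made for illustration.

```python
import numpy as np

class InvertedDropout:
    """Hypothetical dropout variant that rescales during training."""

    def __init__(self, rate, seed=None):
        if not 0.0 <= rate < 1.0:
            raise ValueError("rate must be in [0, 1)")
        self.rate = rate
        self.rng = np.random.default_rng(seed)
        self.mask = None

    def forward(self, X, training=True):
        if not training:
            self.mask = None
            return X  # identity at inference, no rescaling needed
        keep = 1.0 - self.rate
        # Kept units are scaled by 1 / keep so the expected output equals X.
        self.mask = (self.rng.random(X.shape) < keep) / keep
        return X * self.mask

    def backward(self, d_out):
        if self.mask is None:
            return d_out  # inference mode: the layer is the identity
        # Same scaled mask as the forward pass.
        return d_out * self.mask


layer = InvertedDropout(rate=0.5, seed=0)
X = np.ones((1000, 100))

out_train = layer.forward(X, training=True)
grad = layer.backward(np.ones_like(X))
out_infer = layer.forward(X, training=False)

print("training mean: ", out_train.mean())   # close to 1.0, varies with the seed
print("inference mean:", out_infer.mean())   # exactly 1.0
```

With `rate=0.5`, every training output is either 0 or 2, so the mean over many units stays near the mean of the input.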


___
___
## Dropout using `TensorFlow/Keras`

In [3]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

In [4]:
# Load and preprocess data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 28*28).astype('float32') / 255
x_test = x_test.reshape(-1, 28*28).astype('float32') / 255
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

In [5]:
# Build model with Dropout
model = Sequential([
    tf.keras.Input(shape=(28*28,)),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(64, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax')
])

In [6]:
# Compile model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train model
model.fit(x_train, y_train, epochs=5, batch_size=32, validation_split=0.2)

# Evaluate model
loss, accuracy = model.evaluate(x_test, y_test)

Epoch 1/5
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 6s 3ms/step - accuracy: 0.6748 - loss: 0.9928 - val_accuracy: 0.9410 - val_loss: 0.2082
Epoch 2/5
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.8992 - loss: 0.3619 - val_accuracy: 0.9554 - val_loss: 0.1613
Epoch 3/5
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.9162 - loss: 0.2933 - val_accuracy: 0.9603 - val_loss: 0.1410
Epoch 4/5
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.9284 - loss: 0.2557 - val_accuracy: 0.9619 - val_loss: 0.1341
Epoch 5/5
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - accuracy: 0.9359 - loss: 0.2360 - val_accuracy: 0.9657 - val_loss: 0.1279
313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9563 - loss: 0.1466


In [7]:
print(f'Test Loss: {loss:.4f}')
print(f'Test Accuracy: {accuracy:.4f}')

Test Loss: 0.1268
Test Accuracy: 0.9633
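
Keras applies dropout only in training mode: `model.fit` runs the `Dropout` layers with masking active, while `model.evaluate` and `model.predict` run them as pass-through layers. This is one likely reason the training accuracy in the log above is lower than the validation accuracy. The sketch below calls a standalone `Dropout` layer in both modes to make the difference visible. The all-ones input and the seed are assumptions chosen for illustration.

```python
import numpy as np
import tensorflow as tf

# Hypothetical input: a batch of ones, so the effect of the mask is easy to see.
x = tf.constant(np.ones((4, 8), dtype="float32"))

layer = tf.keras.layers.Dropout(rate=0.5, seed=0)

# training=True: units are zeroed at random and survivors are scaled by 1 / (1 - rate).
out_train = layer(x, training=True).numpy()

# training=False: the layer returns its input unchanged.
out_infer = layer(x, training=False).numpy()

print("training mode:\n", out_train)
print("inference mode:\n", out_infer)
```

In training mode every entry is either 0 or 2 for `rate=0.5`; which entries are zeroed depends on the seed and the TensorFlow version.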


## Conclusion

Dropout is a powerful regularization technique used to prevent overfitting in neural networks by randomly dropping a fraction of neurons during training. By applying dropout, we force the network to be less reliant on specific neurons, promoting a more robust model with better generalization.

In our exploration of Dropout, we first demonstrated how to implement it manually from scratch. This implementation involves creating a dropout mask and applying it during the forward pass while handling gradients appropriately during the backward pass. This approach provides insight into how dropout functions at a fundamental level.

We then utilized TensorFlow/Keras to incorporate dropout into a neural network model with minimal code. TensorFlow's built-in `Dropout` layer simplifies the integration, allowing us to focus on building and training the model efficiently.

Both methods implement the same idea, which is most useful in the scenarios discussed above: large networks, models prone to overfitting, and training with limited data. Applied correctly and with a suitable rate, dropout can improve how well a neural network generalizes.
