## 2.4 Dropout Regularization
- Intro to Dropout Regularization 
- Experiment Handling Overfitting using Dropout Regularization

⚠️⚠️⚠️ *Please open this notebook in Google Colab* by clicking the link below ⚠️⚠️⚠️<br><br>
<a href="https://colab.research.google.com/github/Muhammad-Yunus/Belajar-Image-Classification/blob/main/Pertemuan%202/2.4%20dropout_regularization.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a><br><br><br>
- Click the `Connect` button at the top right of the Google Colab notebook,<br>
<img src="resource/cl-connect-gpu.png" width="250px">
- Once the connection completes, it will look something like this:<br>
<img src="resource/cl-connect-gpu-success.png" width="250px">

### 2.4 Intro to Dropout Regularization
- <font color="orange">Dropout</font> is one of the <font color="orange">oldest</font> regularization techniques in deep learning. 
- At each training iteration, it <font color="orange">drops random neurons</font> from the network with a probability <font color="orange">p</font> (typically 25% to 50%). 
- In practice, <font color="orange">neuron outputs</font> are set to <font color="orange">0</font>. 
- The net result is that these neurons will <font color="orange">not participate in the loss computation</font> this time around and they will <font color="orange">not get weight updates</font>. 
- Different neurons will be dropped at each training iteration.<br><br>
<img src="resource/Dropout_2.png" width="700px"><br><br>
- When evaluating the performance of your network, you put all the neurons back (dropout rate = 0).
    - Dropout <font color="orange">slows down</font> learning during <font color="orange">training</font>.
    - But the model uses <font color="orange">all of its knowledge</font> during <font color="orange">evaluation</font>.
    - Ideally, this keeps the training loss and the validation loss decreasing at a similar rate, <font color="orange">avoiding overfitting</font>.
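
The mechanism described above can be sketched by hand. This is a minimal illustration, not PyTorch's internal implementation: the helper name `manual_dropout` is hypothetical, and it assumes `0 <= p < 1`. It zeroes each activation with probability `p` and rescales the survivors by `1 / (1 - p)`, the same scaling that the `nn.Dropout` documentation describes, so the expected activation stays the same and nothing has to be rescaled at evaluation time.

```python
import torch

def manual_dropout(x: torch.Tensor, p: float, training: bool) -> torch.Tensor:
    """Hypothetical helper: inverted dropout on a tensor of activations."""
    if not training or p == 0.0:
        return x  # evaluation: every neuron participates
    # Keep each element with probability (1 - p)
    keep_mask = (torch.rand_like(x) >= p).to(x.dtype)
    # Rescale the survivors so the expected output equals the input
    return x * keep_mask / (1.0 - p)

torch.manual_seed(0)
activations = torch.ones(10_000)

train_out = manual_dropout(activations, p=0.25, training=True)
eval_out = manual_dropout(activations, p=0.25, training=False)

dropped_fraction = (train_out == 0).float().mean().item()
print(f"fraction dropped : {dropped_fraction:.3f}")          # close to p = 0.25
print(f"surviving value  : {train_out.max().item():.4f}")    # 1 / (1 - 0.25)
print(f"eval unchanged   : {torch.equal(eval_out, activations)}")
```

Because the mask is redrawn on every call, a different subset of neurons is dropped at each training iteration.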

- Check that the GPU connected to the Colab environment is active

In [None]:
!nvidia-smi

- Import library

In [None]:
!pip install gdown

import os
import cv2
import gdown
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader, random_split

import torchvision
from torchvision import transforms

from IPython import display

# clear output cell
display.clear_output()

print(f"torch : {torch.__version__}")
print(f"torch vision : {torchvision.__version__}")

- Download MNIST Dataset

In [None]:
DATASET_NAME = 'MNIST' # the dataset name
DATASET_NUM_CLASS = 10 # number of class in dataset

In [None]:
# By default, use the gdrive_id of the `mnist_dataset.zip` dataset (1-FfwJrllyHofQwIbMb_IxAkxnfMGSFmR)
gdrive_id = '1-FfwJrllyHofQwIbMb_IxAkxnfMGSFmR' # <-----  ⚠️⚠️⚠️ USE YOUR OWN GDrive ID FOR CUSTOM DATASET ⚠️⚠️⚠️

# download zip from GDrive
url = f'https://drive.google.com/uc?id={gdrive_id}'
gdown.download(url, DATASET_NAME + ".zip", quiet=False)

# unzip dataset
!unzip {DATASET_NAME}.zip -d {DATASET_NAME}

# clear output cell
display.clear_output()

- Load MNIST Dataset
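
The dataset class in this notebook returns one-hot encoded labels. As a small sketch of what that encoding looks like, the snippet below encodes three made-up labels and checks that `nn.CrossEntropyLoss` gives the same value for one-hot targets as for integer class indices. The second part assumes PyTorch 1.10 or newer, where the loss accepts class-probability targets; the logits are random placeholders.

```python
import torch
import torch.nn.functional as F

labels = torch.tensor([3, 0, 9])                      # integer class indices
one_hot = F.one_hot(labels, num_classes=10).to(torch.float32)

print(one_hot.shape)            # one row of length 10 per label
print(one_hot[0])               # 1.0 at index 3, 0.0 elsewhere
print(one_hot.argmax(dim=1))    # recovers the original indices

# Placeholder logits, only used to compare the two target formats
torch.manual_seed(0)
logits = torch.randn(3, 10)
loss_fn = torch.nn.CrossEntropyLoss()
loss_from_indices = loss_fn(logits, labels)
loss_from_one_hot = loss_fn(logits, one_hot)
print(torch.allclose(loss_from_indices, loss_from_one_hot))
```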

In [None]:
# Define a custom Dataset class
# It is a helper that loads each image with OpenCV and converts it to a PyTorch tensor
# It also one-hot encodes the label
class CustomDataset(Dataset):
    def __init__(self, root_dir):
        self.root_dir = root_dir
        self.image_files = sorted([file for file in os.listdir(root_dir) if file.lower().endswith('.png')])

    def __len__(self):
        return len(self.image_files)

    def __getitem__(self, idx):
        # Read image from corresponding .png file
        image_path = os.path.join(self.root_dir, self.image_files[idx])
        image = cv2.imread(image_path)  # Load image using OpenCV
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # Convert BGR to GRAY
        image = torch.from_numpy(image).to(torch.float32)  # Convert NumPy array to PyTorch tensor
        image = torch.flatten(image) # flatten image from 2D Tensor (28x28) into a 1D tensor (784)

        # Read label from corresponding .txt file
        label_path = os.path.splitext(image_path)[0] + ".txt"
        with open(label_path, 'r') as label_file:
            label = int(label_file.read().strip())  # Assuming labels are integers

        # Apply one-hot encoding into label
        labels_tensor = torch.tensor(label)
        one_hot_encoded = F.one_hot(labels_tensor, num_classes=DATASET_NUM_CLASS).to(torch.float32)

        return image, one_hot_encoded



# Instantiate the datasets
# The images are not loaded yet at this point;
# we only read the image file names in the dataset folder
all_train_dataset = CustomDataset(root_dir=f'{DATASET_NAME}/dataset/train')
test_dataset = CustomDataset(root_dir=f'{DATASET_NAME}/dataset/test')

- Split Dataset
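
`random_split()` draws a new random partition on every run. If a repeatable split is wanted, one option is to pass a seeded `torch.Generator`. The sketch below uses a toy 100-sample `TensorDataset` as a stand-in for the real data, and the helper name `split` and the seed value are arbitrary choices for illustration.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy stand-in for the real dataset: 100 samples with 4 features each
toy = TensorDataset(torch.arange(100).float().unsqueeze(1).repeat(1, 4))

def split(seed):
    """Hypothetical helper: 80/20 split driven by a seeded generator."""
    gen = torch.Generator().manual_seed(seed)
    return random_split(toy, [80, 20], generator=gen)

train_a, val_a = split(seed=42)
train_b, val_b = split(seed=42)

print(len(train_a), len(val_a))
print(train_a.indices == train_b.indices)               # same seed, same split
print(set(train_a.indices).isdisjoint(val_a.indices))   # no sample in both sets
```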

In [None]:
# Split 'all_train_dataset' into 'train' and 'validation' set using `random_split()` function
train_dataset, validation_dataset = random_split(all_train_dataset, [50000, 10000])

print(f"Train Dataset : {len(train_dataset)} data")
print(f"Validation Dataset : {len(validation_dataset)} data")
print(f"Test Dataset : {len(test_dataset)} data")

- Create Data Loader
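
A data loader cuts the dataset into batches, and the last batch is smaller whenever the dataset size is not a multiple of the batch size. The sketch below shows this on a toy dataset of 1000 zero-filled samples (a stand-in, not the real MNIST data) with the same batch size of 128.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in: 1000 samples, batch size 128
toy = TensorDataset(torch.zeros(1000, 784), torch.zeros(1000, 10))
loader = DataLoader(toy, batch_size=128, shuffle=False)

batch_sizes = [inputs.size(0) for inputs, _ in loader]

print(f"number of batches : {len(loader)}")        # ceil(1000 / 128)
print(f"last batch size   : {batch_sizes[-1]}")    # remainder: 1000 - 7 * 128
print(f"50000 samples     : {math.ceil(50000 / 128)} batches")
```

Note that `len(loader)` counts batches, while `len(loader.dataset)` counts samples.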

In [None]:
# Create data loaders
BATCH_SIZE = 128

train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
validation_loader = DataLoader(validation_dataset, batch_size=BATCH_SIZE, shuffle=False)  # no need to shuffle for evaluation
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False)

### Define the network with Dropout Layer here!!!
- In PyTorch, we can add a Dropout layer after the activation function between hidden layers
- The following example adds a dropout layer with probability <font color="orange">p = 0.25 (25%)</font>
    ```
    nn.Dropout(0.25)
    ```
- More about [Pytorch Dropout](https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html)
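
As a sketch of how the layer behaves inside a network, the snippet below builds a small MLP with `nn.Dropout` after every hidden activation and compares forward passes in training and evaluation mode. `build_mlp` is a hypothetical helper written for this illustration, not a PyTorch function, and it places dropout after every hidden layer, which differs from the model defined in this notebook.

```python
import torch
import torch.nn as nn

def build_mlp(sizes, p):
    """Hypothetical helper: Linear -> ReLU -> Dropout(p) for every hidden layer."""
    layers = []
    for in_f, out_f in zip(sizes[:-2], sizes[1:-1]):
        layers += [nn.Linear(in_f, out_f), nn.ReLU(), nn.Dropout(p)]
    layers.append(nn.Linear(sizes[-2], sizes[-1]))  # output layer, no dropout
    return nn.Sequential(*layers)

torch.manual_seed(0)
mlp = build_mlp([784, 200, 100, 60, 30, 10], p=0.25)
x = torch.rand(4, 784)

mlp.train()                      # dropout active: each pass draws a new mask
train_a, train_b = mlp(x), mlp(x)

mlp.eval()                       # dropout disabled: passes are deterministic
with torch.no_grad():
    eval_a, eval_b = mlp(x), mlp(x)

num_dropout = sum(isinstance(m, nn.Dropout) for m in mlp)
print(f"dropout layers         : {num_dropout}")
print(f"train passes identical : {torch.equal(train_a, train_b)}")
print(f"eval passes identical  : {torch.equal(eval_a, eval_b)}")
```

This is why the training code calls `model.train()` before training and `model.eval()` before validation or testing.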

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') 
model = nn.Sequential(
    nn.Linear(in_features=784, out_features=200),   # 1st hidden layer
    nn.ReLU(),                                      # ReLU Activation function
    nn.Dropout(0.25),                               # Dropout 25% neuron
    nn.Linear(in_features=200, out_features=100),   # 2nd hidden layer
    nn.ReLU(),                                      # ReLU Activation function
    nn.Linear(in_features=100, out_features=60),    # 3rd hidden layer
    nn.ReLU(),                                      # ReLU Activation function
    nn.Dropout(0.25),                               # Dropout 25% neuron
    nn.Linear(in_features=60, out_features=30),     # 4th hidden layer
    nn.ReLU(),                                      # ReLU Activation function
    nn.Linear(in_features=30, out_features=10),     # Output layer
    nn.LogSoftmax(dim=1)                            # Log probabilities for classification
).to(device)

- Setting Optimizer, Learning Rate Scheduler & Loss Function
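
To see what a step schedule does to the learning rate, the sketch below records the rate over 10 epochs with the same settings used in this notebook (`lr=0.001`, `step_size=5`, `gamma=0.1`). The single dummy parameter and the empty `opt.step()` call are stand-ins for a real model and a real epoch of training.

```python
import torch
from torch.optim.lr_scheduler import StepLR

# A single dummy parameter is enough to build an optimizer for this illustration
param = torch.nn.Parameter(torch.zeros(1))
opt = torch.optim.Adam([param], lr=0.001)
sched = StepLR(opt, step_size=5, gamma=0.1)

lrs = []
for epoch in range(10):
    lrs.append(opt.param_groups[0]["lr"])  # learning rate used during this epoch
    opt.step()      # stands in for a full epoch of training
    sched.step()    # advance the schedule once per epoch

for epoch, lr in enumerate(lrs, start=1):
    print(f"epoch {epoch:2d}: lr = {lr:.6f}")
```

The first five epochs run at 0.001 and the next five at 0.0001.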

In [None]:
from torch.optim.lr_scheduler import StepLR

# Setup optimizer, learning rate scheduler & loss function
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler = StepLR(optimizer, step_size=5, gamma=0.1)  # Decay Learning Rate by a factor of 0.1 every 5 epochs
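# Note: nn.CrossEntropyLoss applies log-softmax internally, so the model's final nn.LogSoftmax is redundant (but harmless)
loss_function = nn.CrossEntropyLoss()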

- To run the training process, we can use the following code

In [None]:
!pip install tqdm

from tqdm import tqdm

In [None]:
def train(model, train_loader, optimizer, loss_function):
    model.train()
    running_loss = 0.0
    correct_predictions = 0
    total_predictions = 0

    # Add progress bar for training loop
    progress_bar = tqdm(train_loader, desc='Training', leave=False)

    for inputs, labels in progress_bar:
        inputs = inputs.to(device)  # Move inputs to device
        labels = labels.to(device)  # Move labels to device

        # Resets the gradients of all the model's parameters before the backward pass
        optimizer.zero_grad()
        # Pass input tensor to model
        outputs = model(inputs)
        # Calculate loss value
        loss = loss_function(outputs, labels)
        # Computes the gradient of the loss with respect to each parameter in model
        loss.backward()
        # Adjust model parameters
        optimizer.step()
        # Sum loss value
        running_loss += loss.item()

        # Calculate correct & total prediction
        _, predicted = torch.max(outputs, 1)
        correct_predictions += (predicted == labels.argmax(1)).sum().item()
        total_predictions += labels.size(0)

        # Update progress bar description with current loss
        progress_bar.set_postfix(loss=loss.item())

    # Calculate average training loss (mean of the per-batch mean losses)
    average_train_loss = running_loss / len(train_loader)
    # Calculate training accuracy
    train_accuracy = correct_predictions / total_predictions
    return average_train_loss, train_accuracy

def validate(model, val_loader, loss_function):
    model.eval()
    running_loss = 0.0
    correct_predictions = 0
    total_predictions = 0

    # Add progress bar for validation loop
    progress_bar = tqdm(val_loader, desc='Validating', leave=False)

    with torch.no_grad():
        for inputs, labels in progress_bar:
            inputs = inputs.to(device)  # Move inputs to device
            labels = labels.to(device)  # Move labels to device

            # Pass input tensor to model
            outputs = model(inputs)
            # Calculate loss value
            loss = loss_function(outputs, labels)
            # Sum loss value
            running_loss += loss.item()

            # Calculate correct & total prediction
            _, predicted = torch.max(outputs, 1)
            correct_predictions += (predicted == labels.argmax(1)).sum().item()
            total_predictions += labels.size(0)

            # Update progress bar description with loss
            progress_bar.set_postfix(loss=loss.item())

    # Calculate average validation loss (mean of the per-batch mean losses)
    average_val_loss = running_loss / len(val_loader)
    # Calculate validation accuracy
    val_accuracy = correct_predictions / total_predictions
    return average_val_loss, val_accuracy

# Training loop over the selected number of epochs
# Each epoch processes the whole training and validation sets, split into small batches,
# then measures the loss & accuracy on the training and validation sets
NUM_EPOCH = 10  # You can change this value

train_losses = []
val_losses = []
train_accuracies = []
val_accuracies = []

for epoch in range(NUM_EPOCH):
    print(f"Epoch {epoch+1}/{NUM_EPOCH}")

    train_loss, train_accuracy = train(model, train_loader, optimizer, loss_function)
    val_loss, val_accuracy = validate(model, validation_loader, loss_function)

    train_losses.append(train_loss)
    val_losses.append(val_loss)
    train_accuracies.append(train_accuracy * 100)  # Convert to percentage
    val_accuracies.append(val_accuracy * 100)  # Convert to percentage

    # Step the scheduler
    scheduler.step() # <--- ⚠️⚠️⚠️ THIS LINE APPLIES THE SCHEDULED LEARNING RATE DECAY TO THE OPTIMIZER

    print(f"Train Loss = {train_loss:.4f}, Val Loss = {val_loss:.4f}, Train Accuracy = {train_accuracy:.4f}, Val Accuracy = {val_accuracy:.4f}\n")


- Plot Loss and Accuracy of Training vs Validation Set 

In [None]:
# visualize Loss & Accuracy
import matplotlib.pyplot as plt

epochs = list(range(1, NUM_EPOCH + 1))

# Plotting loss
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(epochs, train_losses, 'b', label='Training Loss')
plt.plot(epochs, val_losses, 'r', label='Validation Loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)

# Plotting accuracy
plt.subplot(1, 2, 2)
plt.plot(epochs, train_accuracies, 'b', label='Training Accuracy')
plt.plot(epochs, val_accuracies, 'r', label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy (%)')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()


- Evaluate the model: find precision and recall for each class, measure accuracy, and compute the confusion matrix
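
Precision, recall and accuracy can all be read off a confusion matrix. The sketch below works through a made-up 3-class matrix with NumPy, assuming the scikit-learn convention that rows are actual classes and columns are predicted classes; the numbers are invented for illustration and are not results from this notebook.

```python
import numpy as np

# Toy 3-class confusion matrix: rows = actual class, columns = predicted class
conf = np.array([
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
])

true_positive = np.diag(conf)
precision = true_positive / conf.sum(axis=0)   # correct / all predicted as that class
recall = true_positive / conf.sum(axis=1)      # correct / all actually in that class
accuracy = true_positive.sum() / conf.sum()

print("precision:", np.round(precision, 3))    # [0.8, 0.75, 0.75]
print("recall   :", np.round(recall, 3))       # [0.8, 0.6, 0.9]
print("accuracy :", round(float(accuracy), 3))
```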

In [None]:
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
import seaborn as sns
import numpy as np

# define evaluate function for test set
def evaluate(model, test_loader):
    model.eval()
    all_labels = []
    all_preds = []

    # Add progress bar for evaluation loop
    progress_bar = tqdm(test_loader, desc='Evaluating', leave=False)

    with torch.no_grad():
        # iterate over all batched test set
        for inputs, labels in progress_bar:
            inputs = inputs.to(device) # move inputs to device
            labels = labels.to(device) # move labels to device

            # pass input tensor to model
            outputs = model(inputs)
            # get prediction
            _, preds = torch.max(outputs, 1)
            # collect all labels & preds
            all_labels.extend(labels.cpu().numpy())
            all_preds.extend(preds.cpu().numpy())

    return all_labels, all_preds

# Evaluation on test set
all_labels, all_preds = evaluate(model, test_loader)
all_labels = np.argmax(all_labels, axis=1)

# Calculate classification report
labels = [str(i) for i in range(DATASET_NUM_CLASS)]
print(classification_report(all_labels, all_preds, target_names=labels))

# Confusion Matrix
conf_matrix = confusion_matrix(all_labels, all_preds)

# Plotting the confusion matrix
plt.figure(figsize=(10, 7))
sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues")
plt.xlabel('Predicted Class')
plt.ylabel('Actual Class')
plt.title('Confusion Matrix')
plt.show()

- Download Model 
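
A saved `state_dict` holds only the weights, so loading it later requires rebuilding the same architecture first. The sketch below shows the round trip on a small stand-in network: `make_model` is a hypothetical helper, and the file is written to a temporary directory rather than downloaded.

```python
import os
import tempfile
import torch
import torch.nn as nn

def make_model():
    # Hypothetical small architecture; a real reload must match the saved model
    return nn.Sequential(nn.Linear(784, 30), nn.ReLU(), nn.Dropout(0.25), nn.Linear(30, 10))

original = make_model()
path = os.path.join(tempfile.mkdtemp(), "trained_model.pt")
torch.save(original.state_dict(), path)

restored = make_model()
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()   # disable dropout before inference
original.eval()

x = torch.rand(2, 784)
with torch.no_grad():
    same = torch.allclose(original(x), restored(x))
print(f"outputs match after reload: {same}")
```

Calling `eval()` on the restored model matters here because it contains a dropout layer.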

In [None]:
# Save the model
torch.save(model.state_dict(), 'trained_model.pt')

# Download the model file
from google.colab import files
files.download('trained_model.pt')

>
>### Discussion
>- Does adding Dropout regularization solve overfitting in a bigger network?
>- Try adding more Dropout layers between all hidden layers with p = 0.25
>   - Does it work better?
>- Try changing the probability p from 0.25 to another value, e.g. 0.5
>
>.

### 📝📝📝 Special Care for Overfitting Problem

- Overfitting happens when a neural network learns <font color="cyan">"badly"</font>, in a way that <font color="cyan">works for the training set</font> but <font color="cyan">not so well on the validation set</font>. 
- Regularisation techniques like dropout can force it to learn in a better way, but overfitting also has deeper roots.
- Here are some root causes:<br><br>
<img src="resource/Overfitting.png" width="700px"><br><br>
- If you have <font color="cyan">very little training data</font>, even a small network can learn it by heart and you will see "overfitting". 
    - Generally speaking, you always <font color="cyan">need lots of data</font> to train neural networks.
- If you have already experimented with <font color="cyan">different sizes</font> of network, <font color="cyan">changed the optimizer</font>, <font color="cyan">applied dropout</font> and <font color="cyan">trained on lots of data</font> but still face overfitting, this means that <font color="cyan">your neural network</font>, in its present shape, is <font color="orange">not capable of extracting more information</font> from <font color="cyan">your data</font>, as in our case here.
- In the next chapter we will learn more about the <font color="orange">Convolutional Neural Network (CNN)</font>, which may be better able to learn from the dataset we have.

_________________________________________________________________________
<br><br><br>
# Source
- https://codelabs.developers.google.com/codelabs/cloud-tensorflow-mnist#8
- https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html