# Exploring Uncertainty Estimation with Monte Carlo Dropout (MC Dropout)

In machine learning, especially in deep learning, it is crucial not only to obtain accurate predictions but also to understand how **confident** a model is in those predictions. This is where **uncertainty estimation** comes in. By quantifying uncertainty, we can make better decisions in high-stakes tasks such as medical diagnosis, autonomous driving, and more.

One popular method for estimating uncertainty in neural networks is **Monte Carlo Dropout (MC Dropout)**. In this blog post, we will look at what uncertainty means in the context of deep learning, how MC Dropout helps us estimate it, and how you can implement it using PyTorch.

## What is Uncertainty in Deep Learning?

Uncertainty in a machine learning model refers to how confident the model is about its predictions. Generally, there are two types of uncertainty:

1. **Aleatoric Uncertainty**: This type of uncertainty arises due to inherent noise or randomness in the data. For example, if the input data is ambiguous or corrupted (e.g., a blurry image), the model's predictions will have high aleatoric uncertainty.
  
2. **Epistemic Uncertainty**: This type of uncertainty reflects the model's lack of knowledge. It is typically caused by insufficient or unrepresentative training data, for example inputs unlike anything seen during training. Unlike aleatoric uncertainty, epistemic uncertainty can be reduced by gathering more data or using a better-suited model.
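A model can produce several sampled predictive distributions for the same input, as MC Dropout does (described below). In that case the two kinds of uncertainty can be roughly separated with a common entropy-based decomposition:

- total predictive entropy = expected entropy (an aleatoric proxy) + mutual information (an epistemic proxy).

The sketch below uses NumPy and made-up probability arrays purely for illustration. The function name `decompose_uncertainty` is hypothetical.

```python
import numpy as np

def decompose_uncertainty(probs, eps=1e-12):
    """Split predictive uncertainty into aleatoric and epistemic parts.

    probs: array of shape (T, C) holding T sampled class-probability
    vectors for a single input (each row sums to 1).
    """
    mean_probs = probs.mean(axis=0)
    # Total uncertainty: entropy of the averaged prediction.
    total = -np.sum(mean_probs * np.log(mean_probs + eps))
    # Aleatoric proxy: average entropy of the individual samples.
    aleatoric = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
    # Epistemic proxy: mutual information, i.e. disagreement between samples.
    epistemic = total - aleatoric
    return total, aleatoric, epistemic

# Samples agree, but each is unsure between two classes: mostly aleatoric.
noisy = np.array([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]])
# Each sample is confident, but they disagree: mostly epistemic.
disagree = np.array([[0.99, 0.01], [0.01, 0.99], [0.99, 0.01]])

print(decompose_uncertainty(noisy))
print(decompose_uncertainty(disagree))
```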

In tasks where wrong predictions could lead to severe consequences (e.g., predicting a disease from medical images), it's important to know how uncertain the model is about its prediction.

## What is Monte Carlo Dropout?

**Monte Carlo Dropout (MC Dropout)** is a technique used to estimate **epistemic uncertainty** in deep learning models. Dropout is traditionally used as a regularization technique to prevent overfitting during training by randomly "dropping out" neurons in the network. However, **in MC Dropout, Dropout is also applied during inference** to get different predictions each time we pass the input through the network.

This allows us to collect multiple outputs for the same input, one per forward pass. From these we compute the model's **mean prediction** and its **uncertainty**, measured as the standard deviation across passes.
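The key mechanical difference is which mode the Dropout layer is in.

- In evaluation mode, PyTorch's `nn.Dropout` is the identity.
- In training mode, it zeroes each element with probability `p` and scales the surviving elements by `1 / (1 - p)`, so repeated calls on the same input give different outputs.

A small illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(10)

drop.eval()   # standard inference: dropout does nothing
eval_a, eval_b = drop(x), drop(x)

drop.train()  # MC Dropout inference: a fresh random mask on every call
train_a, train_b = drop(x), drop(x)

print(eval_a)   # identical to x
print(train_a)  # entries are 0.0 (dropped) or 2.0 (kept, scaled by 1 / (1 - 0.5))
print(train_b)  # usually a different random mask
```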

### Steps to Apply MC Dropout:

1. **Activate Dropout during inference**: Normally, Dropout is only used during training. In MC Dropout, we turn it on during inference as well.
  
2. **Perform multiple forward passes**: Run the input through the model multiple times (e.g., 100 iterations), each time with Dropout applied. This will give us different outputs because different neurons will be dropped each time.
  
3. **Estimate mean and uncertainty**: Calculate the mean of these predictions to get the final output. The standard deviation of these predictions gives us the uncertainty.
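Suppose there are $T$ stochastic forward passes with outputs $\hat{y}_1(x), \dots, \hat{y}_T(x)$, each computed with its own random dropout mask. Step 3 then computes, separately for each output dimension:

$$
\hat{\mu}(x) = \frac{1}{T}\sum_{t=1}^{T} \hat{y}_t(x),
\qquad
\hat{\sigma}(x) = \sqrt{\frac{1}{T-1}\sum_{t=1}^{T}\bigl(\hat{y}_t(x) - \hat{\mu}(x)\bigr)^2}
$$

The $T - 1$ denominator (Bessel's correction) matches the default behavior of PyTorch's `Tensor.std`, which the implementation below uses.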

## Why is MC Dropout Useful?

- **Uncertainty Quantification**: MC Dropout helps us estimate the uncertainty of the model's predictions. This is especially useful in critical tasks such as healthcare or autonomous vehicles.
- **Easy to Implement**: MC Dropout is relatively easy to add to existing models that already contain (and were trained with) Dropout layers, since it only requires keeping those layers active during inference.
- **Improved Decision-Making**: By knowing how uncertain a model is, we can make better decisions. For example, if a model is highly uncertain about a medical diagnosis, it can alert a human expert to take a closer look.
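The "alert a human expert" idea can be expressed as a simple selective-prediction rule: accept the model's answer only when its uncertainty is below a threshold, and otherwise defer. The sketch below makes these assumptions:

- It uses NumPy and toy probabilities.
- It uses the predictive entropy of the averaged probabilities as the uncertainty score.
- It uses a hypothetical threshold of 0.5 nats. In practice the threshold would be tuned on held-out data to match the application's risk tolerance.

```python
import numpy as np

def predict_or_defer(mc_probs, threshold=0.5):
    """Return a class index per input, or -1 to flag it for human review.

    mc_probs: array of shape (T, N, C) with T MC Dropout samples of class
    probabilities for N inputs. `threshold` is a hypothetical value.
    """
    mean_probs = mc_probs.mean(axis=0)                                   # (N, C)
    entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-12), axis=1)  # (N,)
    preds = mean_probs.argmax(axis=1)
    return np.where(entropy > threshold, -1, preds), entropy

# Two inputs, three MC samples each, three classes (toy numbers).
mc_probs = np.array([
    [[0.90, 0.05, 0.05], [0.30, 0.40, 0.30]],
    [[0.85, 0.10, 0.05], [0.10, 0.20, 0.70]],
    [[0.95, 0.03, 0.02], [0.60, 0.30, 0.10]],
])
decisions, entropy = predict_or_defer(mc_probs)
print(decisions)  # [ 0 -1]: the second input is sent for review
```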


## Python Code Implementation

```python
import random

import torch
import torch.nn as nn
import torchvision.datasets as datasets
import torchvision.models as models
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, Subset

def init_model():
    # ImageNet-pretrained backbone (torchvision >= 0.13 weights API;
    # older versions used `pretrained=True`).
    model = models.wide_resnet50_2(weights=models.Wide_ResNet50_2_Weights.IMAGENET1K_V1)
    # Adapt the stem to 32x32 CIFAR images: 3x3 kernel, stride 1, padding 1.
    # Note: this new layer is randomly initialized.
    model.conv1 = nn.Conv2d(model.conv1.in_channels,
                            model.conv1.out_channels,
                            kernel_size=3, stride=1, padding=1)
    model.maxpool = nn.Identity()  # Remove the max-pooling layer
    # ResNet's forward() has no dropout step, so a separate `model.dropout`
    # attribute would never be called; the dropout goes inside `fc` instead.
    # Replace the final fully connected layer to output the 100 CIFAR-100 classes
    # (also randomly initialized: fine-tune before trusting its predictions).
    model.fc = nn.Sequential(
        nn.Dropout(p=0.5),  # Dropout before the classifier, kept active by MC Dropout
        nn.Linear(model.fc.in_features, 100)
    )
    return model

model = init_model()
print(model)
```

```
ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): Identity()
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1,
```
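A subtlety arises when modifying torchvision models like this. Assigning a new layer as an attribute (for example `model.dropout = nn.Dropout(p=0.5)`) registers it as a submodule, so it shows up in `print(model)` and `model.modules()`. However, it only changes predictions if the model's `forward()` calls it. torchvision's ResNet `forward()` contains no dropout step, so for MC Dropout the dropout has to sit inside a layer that is called, such as the `nn.Sequential` assigned to `model.fc`. A quick check with a hypothetical toy module `TinyNet`:

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)  # forward() only ever uses self.fc

torch.manual_seed(0)
net = TinyNet()
x = torch.randn(3, 4)
before = net(x)

# Registering a new Dropout attribute adds it to net.modules() ...
net.dropout = nn.Dropout(p=0.5)
net.train()
after = net(x)

# ... but forward() never calls it, so the output is unchanged even in train mode.
print(torch.equal(before, after))                              # True
print(any(isinstance(m, nn.Dropout) for m in net.modules()))  # True
```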

### MC Dropout

```python
def enable_dropout(model):
    """Switch only the Dropout layers back to training mode for inference."""
    for m in model.modules():
        if isinstance(m, nn.Dropout):  # note: does not match Dropout2d/Dropout3d
            m.train()


def mc_dropout_predict(model, x, n_iter=50):
    """Perform MC Dropout for uncertainty estimation.

    Returns the mean and standard deviation of the raw model outputs (logits)
    over `n_iter` stochastic forward passes.
    """
    model.eval()           # BatchNorm uses its running statistics...
    enable_dropout(model)  # ...but dropout stays stochastic

    with torch.no_grad():
        # Shape: (n_iter, batch, classes)
        predictions = torch.stack([model(x) for _ in range(n_iter)])

    mean_prediction = predictions.mean(dim=0)
    uncertainty = predictions.std(dim=0)

    return mean_prediction, uncertainty
```
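`mc_dropout_predict` averages raw logits, so its standard deviations are in logit units. This is also why the printed "mean predictions" below contain negative values. For classification, it is common to apply softmax on each pass first and then summarize the resulting probabilities, for example with the predictive entropy of the mean distribution. The sketch below shows that variant under the same "eval mode, but dropout on" idea. The name `mc_dropout_predict_proba` is hypothetical, and a small stand-in model replaces the CIFAR-100 network so the example runs on its own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mc_dropout_predict_proba(model, x, n_iter=50):
    """MC Dropout over softmax probabilities instead of raw logits."""
    model.eval()  # keep BatchNorm etc. in inference mode
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()  # re-enable only the dropout layers
    with torch.no_grad():
        probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(n_iter)])
    mean_probs = probs.mean(dim=0)  # (batch, classes)
    std_probs = probs.std(dim=0)    # per-class spread across passes
    # Predictive entropy of the averaged distribution (in nats).
    entropy = -(mean_probs * torch.log(mean_probs.clamp_min(1e-12))).sum(dim=-1)
    return mean_probs, std_probs, entropy

# Stand-in classifier: 10 input features, 5 classes.
torch.manual_seed(0)
toy = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(32, 5))
mean_probs, std_probs, entropy = mc_dropout_predict_proba(toy, torch.randn(4, 10))
print(mean_probs.sum(dim=-1))  # each row sums to ~1
print(entropy)                 # one uncertainty score per input
```

Averaging probabilities rather than logits keeps the mean a valid distribution, and the entropy gives a single score per input that is easy to threshold.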

```python
def load_cifar100(batch_size=1):
    transform = transforms.Compose([
        transforms.Resize((32, 32)),  # no-op for CIFAR's native 32x32 images
        transforms.ToTensor(),
        # CIFAR-100 per-channel mean and std
        transforms.Normalize((0.5071, 0.4867, 0.4408), (0.2675, 0.2565, 0.2761)),
    ])

    # Load the CIFAR-100 test set (train=False)
    dataset = datasets.CIFAR100(root='./data', train=False, download=True, transform=transform)

    # Randomly select a few images (a different selection on every run)
    random_indices = random.sample(range(len(dataset)), 10)  # Select 10 random images
    subset = Subset(dataset, random_indices)

    data_loader = DataLoader(subset, batch_size=batch_size, shuffle=False)

    return data_loader
```
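Because `random.sample` draws from Python's global random generator, each run of `load_cifar100` picks different test images. To get the same images across runs (for example, to compare uncertainty before and after a change), one option is a dedicated seeded generator. A minimal, standard-library-only sketch; the helper name `sample_indices` and the seed value are hypothetical:

```python
import random

def sample_indices(n_total, k=10, seed=None):
    """Pick k distinct indices in [0, n_total); the same seed gives the same indices."""
    rng = random.Random(seed)  # independent of the global `random` state
    return rng.sample(range(n_total), k)

# CIFAR-100's test split has 10,000 images.
first = sample_indices(10_000, k=10, seed=42)
second = sample_indices(10_000, k=10, seed=42)
print(first == second)  # True
```

The returned list could be passed to `Subset(dataset, ...)` in place of `random_indices`.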

```python
cifar100_loader = load_cifar100(batch_size=1)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # Use GPU if available
model = model.to(device)
for inputs, labels in cifar100_loader:
    inputs = inputs.to(device)  # Move inputs to the same device as the model

    # Perform MC Dropout and get mean prediction and uncertainty
    mean_pred, uncertainty = mc_dropout_predict(model, inputs, n_iter=50)

    print("Mean Prediction (First 5 classes): ", mean_pred[0][:5])
    print("Uncertainty (First 5 classes): ", uncertainty[0][:5])
```

```
Files already downloaded and verified
Mean Prediction (First 5 classes):  tensor([ 0.1195,  0.4123, -1.5641,  0.3386, -0.0486], device='cuda:0')
Uncertainty (First 5 classes):  tensor([0.3807, 0.4994, 0.4753, 0.4732, 0.5316], device='cuda:0')
Mean Prediction (First 5 classes):  tensor([-0.1361,  0.1523, -0.4898,  0.1995, -0.2344], device='cuda:0')
Uncertainty (First 5 classes):  tensor([0.3003, 0.3387, 0.3496, 0.2998, 0.3705], device='cuda:0')
Mean Prediction (First 5 classes):  tensor([-0.3431,  0.4221, -1.0592,  0.1948, -0.4164], device='cuda:0')
Uncertainty (First 5 classes):  tensor([0.4319, 0.5049, 0.5084, 0.5179, 0.5629], device='cuda:0')
Mean Prediction (First 5 classes):  tensor([-0.5336, -0.2419, -0.8448,  0.3664, -0.5599], device='cuda:0')
Uncertainty (First 5 classes):  tensor([0.4038, 0.4211, 0.3976, 0.4447, 0.3795], device='cuda:0')
Mean Prediction (First 5 classes):  tensor([-0.2060, -0.1346, -1.1554,  0.1303, -0.6362], device='cuda:0')
Uncertainty (First 5 classes):  ten
```
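Each printed row shows only the first 5 of the 100 classes. To turn one result into a single decision, one option is to take the arg-max of the mean prediction and read off the spread at that class. The minimal sketch below makes these assumptions:

- It uses a hypothetical helper `summarize`.
- It uses random stand-in tensors shaped like the outputs of `mc_dropout_predict` for a batch of one, so no CIFAR-100 data or trained model is needed.

```python
import torch

def summarize(mean_pred, uncertainty, label=None):
    """Return (predicted class, std at that class, whether it matches `label`)."""
    pred_class = int(mean_pred[0].argmax())
    pred_std = float(uncertainty[0, pred_class])
    correct = None if label is None else pred_class == int(label)
    return pred_class, pred_std, correct

# Stand-ins with the same shapes as the loop's outputs: (batch=1, classes=100).
torch.manual_seed(0)
mean_pred = torch.randn(1, 100)
uncertainty = torch.rand(1, 100)
print(summarize(mean_pred, uncertainty, label=torch.tensor([3])))
```

Inside the loop above, this would be called as `summarize(mean_pred, uncertainty, labels)`. With the model as defined here (randomly initialized `conv1` and `fc`, no fine-tuning on CIFAR-100), the predicted class is not expected to be meaningful.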