# Lab7 - MC Dropout

In today's lab we will implement Monte Carlo dropout for a neural network and use it as a model-specific informativeness measure in an active learning cycle.

As shown by [Gal & Ghahramani (2016)](https://arxiv.org/abs/1506.02142), dropout in a neural network can be interpreted as an approximation of a Bayesian model, so keeping it active at prediction time gives a measure of the model's uncertainty.
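To make the idea concrete, here is a minimal sketch of Monte Carlo dropout inference. It assumes the network outputs logits and uses `nn.Dropout` layers; the helper names `enable_dropout`, `mc_dropout_probs` and `prediction_std`, as well as the default of 20 passes, are illustrative choices and not part of any library.

```python
import torch
import torch.nn as nn


def enable_dropout(model: nn.Module) -> None:
    """Switch the model to eval mode, then turn only the dropout layers back on."""
    model.eval()
    for module in model.modules():
        if isinstance(module, (nn.Dropout, nn.Dropout2d)):
            module.train()


def mc_dropout_probs(model: nn.Module, X: torch.Tensor, n_passes: int = 20) -> torch.Tensor:
    """Run n_passes stochastic forward passes; returns softmax outputs of shape (n_passes, N, C)."""
    was_training = model.training
    enable_dropout(model)
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(X), dim=1) for _ in range(n_passes)])
    model.train(was_training)  # restore the mode the caller had set
    return probs


def prediction_std(model: nn.Module, X: torch.Tensor, n_passes: int = 20) -> torch.Tensor:
    """Per-sample uncertainty: std over the passes, averaged over classes; shape (N,)."""
    probs = mc_dropout_probs(model, X, n_passes)
    return probs.std(dim=0).mean(dim=1)


# Toy usage on a hypothetical small network (not the lab's CNN)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(4, 8), nn.ReLU(), nn.Dropout(0.5), nn.Linear(8, 3))
toy_batch = torch.randn(5, 1, 2, 2)
print(prediction_std(toy_model, toy_batch).shape)  # torch.Size([5])
```

Averaging the standard deviation over classes is only one way to reduce the per-class spread to a single score per sample; taking the maximum over classes would be an equally reasonable assumption.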

Let's start by loading the Fashion-MNIST dataset and creating a simple NN with PyTorch.

In [44]:
import torch
import torch.nn as nn

model = nn.Sequential(
          nn.Conv2d(1,32, 3),
          nn.ReLU(),
          nn.MaxPool2d(2),
          nn.Dropout(),
          nn.Conv2d(32,64, 3),
          nn.ReLU(),
          nn.MaxPool2d(2),
          nn.Dropout(),
          nn.Conv2d(64, 32, 3),
          nn.ReLU(),
          nn.MaxPool2d(2),
          nn.Flatten(),
          nn.Dropout(),
          nn.Linear(32, 10)
        )
model = model.float()

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.CrossEntropyLoss()


In [45]:
from torchvision import datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader


training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=transforms.Compose([
      transforms.ToTensor(),
      transforms.Normalize((0,), (1,))
    ])
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=transforms.Compose([
      transforms.ToTensor(),
      transforms.Normalize((0,), (1,))
    ])
)


train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True)

Below we define functions for training and evaluating the created NN.

In [46]:
from sklearn.metrics import balanced_accuracy_score

def train_loop(dataloader, model, loss_fn, optimizer, num_epochs=1):
    size = len(dataloader.dataset)
    model.train()  # make sure dropout is active during training
    for epoch in range(num_epochs):
        for batch, (X, y) in enumerate(dataloader):
            # Forward pass and loss
            pred = model(X)
            loss = loss_fn(pred, y)

            # Backpropagation
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Report progress every 100 batches
            if batch % 100 == 0:
                current = batch * len(X)
                print(f"Epoch {epoch}, loss: {loss.item():>7f}  [{current:>5d}/{size:>5d}]")


def test_loop(dataloader, model, loss_fn):
    # Note: the model's mode is not changed here; call model.eval() beforehand
    # to evaluate with dropout disabled.
    num_batches = len(dataloader)
    test_loss, labels_estimated, correct_labels = 0, [], []

    with torch.no_grad():  # no gradients needed for evaluation
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct_labels.extend(y.numpy())
            labels_estimated.extend(pred.argmax(1).numpy())

    test_loss /= num_batches  # mean loss per batch
    print(f"Test Error: \n BAC: {balanced_accuracy_score(correct_labels, labels_estimated)} \n Loss {test_loss}")

In [47]:
train_loop(train_dataloader, model, loss_fn, optimizer)

Epoch 0, loss: 2.309655  [    0/60000]
Epoch 0, loss: 1.558943  [ 6400/60000]
Epoch 0, loss: 1.217630  [12800/60000]
Epoch 0, loss: 1.297080  [19200/60000]
Epoch 0, loss: 1.228658  [25600/60000]
Epoch 0, loss: 0.991111  [32000/60000]
Epoch 0, loss: 0.864235  [38400/60000]
Epoch 0, loss: 1.047429  [44800/60000]
Epoch 0, loss: 0.783615  [51200/60000]
Epoch 0, loss: 0.858984  [57600/60000]


In [48]:
test_loop(test_dataloader, model, loss_fn)

Test Error: 
 BAC: 0.6368 
 Loss 0.9636115507715067


1. Create a function that measures the standard deviation of the model's predictions on given samples with dropout enabled.

2. Implement BALD informativeness:
$$
u^*_{BALD} = \arg\max_{u} \left[ H(y \mid u, U_{tr}) - \mathbb{E}_{\theta \sim p(\theta \mid U_{tr})}\left[ H(y \mid u, \theta) \right] \right]
$$
where $H$ is the entropy function, $U_{tr}$ is the current training set, and $\theta$ are the parameters of the model.


To obtain the first term (the total uncertainty), run normal inference with the model. Estimate the second term by running inference with dropout in "training" mode and averaging the entropies over several stochastic passes.

Warning: Make sure that the values you apply the entropy function to are proper probability distributions (for example, softmax outputs rather than raw logits).
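The BALD score can be sketched as follows, assuming the stochastic predictions are already stacked into a tensor of probabilities with shape `(T, N, C)` (passes, samples, classes). The function names and the `predictive_probs` argument are hypothetical. If `predictive_probs` is omitted, the first term is computed from the mean of the stochastic passes, which is a common estimator; passing the softmax output of a normal (dropout-free) forward pass follows the recipe described above instead.

```python
import torch


def entropy(probs: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Shannon entropy along the last dimension; probs must sum to 1 there."""
    return -(probs * torch.log(probs + eps)).sum(dim=-1)


def bald_scores(mc_probs: torch.Tensor, predictive_probs: torch.Tensor = None) -> torch.Tensor:
    """BALD informativeness per sample.

    mc_probs:         (T, N, C) probabilities from T passes with dropout enabled.
    predictive_probs: (N, C) probabilities used for the first term (assumption:
                      defaults to the mean over the T passes when not given).
    """
    if predictive_probs is None:
        predictive_probs = mc_probs.mean(dim=0)
    total_uncertainty = entropy(predictive_probs)      # H(y | u, U_tr)
    expected_entropy = entropy(mc_probs).mean(dim=0)   # E_theta[H(y | u, theta)]
    return total_uncertainty - expected_entropy


# Raw logits must be turned into probabilities first
logits = torch.randn(10, 4, 3)               # T=10 passes, N=4 samples, C=3 classes
scores = bald_scores(torch.softmax(logits, dim=-1))
print(scores.shape)  # torch.Size([4])
```

The score is close to zero when all passes agree and grows when individual passes are confident but contradict each other.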

3. Prepare an active learning experiment: split the training dataset into an initial training set of 1.5% randomly selected samples and a pool from which our algorithms will choose samples.

4. Use the created informativeness functions in the active learning experiment. Select 5 batches of 64 samples each. Compare the results with random sampling.
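The bookkeeping for tasks 3 and 4 could look roughly like the sketch below. It works purely on dataset indices; all function names, the fixed seed and the way samples are moved between the sets are assumptions made for illustration.

```python
import torch


def split_initial_and_pool(n_samples, initial_fraction=0.015, seed=0):
    """Randomly split range(n_samples) into disjoint (initial_idx, pool_idx) lists."""
    generator = torch.Generator().manual_seed(seed)
    perm = torch.randperm(n_samples, generator=generator).tolist()
    n_initial = round(n_samples * initial_fraction)
    return perm[:n_initial], perm[n_initial:]


def select_most_informative(scores, pool_idx, batch_size=64):
    """Dataset indices of the batch_size pool samples with the highest scores.

    scores[i] must be the informativeness of the sample pool_idx[i].
    """
    k = min(batch_size, len(pool_idx))
    top_positions = torch.topk(scores, k).indices.tolist()
    return [pool_idx[p] for p in top_positions]


def select_random(pool_idx, batch_size=64, generator=None):
    """Random-sampling baseline: batch_size distinct indices drawn from the pool."""
    k = min(batch_size, len(pool_idx))
    positions = torch.randperm(len(pool_idx), generator=generator)[:k].tolist()
    return [pool_idx[p] for p in positions]


def move_to_training(chosen, train_idx, pool_idx):
    """Return updated (train_idx, pool_idx) after the chosen samples are labeled."""
    chosen_set = set(chosen)
    return train_idx + list(chosen), [i for i in pool_idx if i not in chosen_set]


initial_idx, pool_idx = split_initial_and_pool(60000)
print(len(initial_idx), len(pool_idx))  # 900 59100
```

The index lists can be wrapped with `torch.utils.data.Subset(training_data, initial_idx)` to build data loaders. One round of the experiment then scores the pool, selects 64 indices, moves them to the training set, retrains and evaluates; repeating this 5 times for each informativeness measure and for `select_random`, starting from the same initial split, gives comparable curves.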