# Interpretability-framework - Uncertainty Tutorial 1

In this notebook you will learn the intuition behind the features of the interpretability framework and how to use them.

## Classification Example with AlexNet

Demonstration of an uncertainty measurement of an image classification example.

In [8]:
import torchvision.models as models
from torch.nn import Dropout
from torch.nn import Softmax
from torch.nn import Sequential

import data_utils
from interpretability_framework import modules

### Step 1:

Load the model that should be evaluated.

**Note: Monte Carlo dropout ensembles only give meaningful results for a network that was trained with dropout, so check that the model you want to use has dropout layers that were active during training.**
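One quick way to run the check from the note is to list the dropout modules of a model. The sketch below is only an illustration: `find_dropout_layers` and `toy_model` are hypothetical names that are not part of the framework, and the helper only detects `torch.nn.Dropout` modules, so dropout applied through the functional API or variants such as `Dropout2d` would be missed.

```python
from torch import nn


def find_dropout_layers(model: nn.Module):
    """Return (name, module) pairs for every nn.Dropout submodule."""
    return [
        (name, module)
        for name, module in model.named_modules()
        if isinstance(module, nn.Dropout)
    ]


# Hypothetical toy network used only for illustration.
toy_model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(16, 4),
)

for name, layer in find_dropout_layers(toy_model):
    # Print the module name and its dropout probability.
    print(name, layer.p)
```

An empty result suggests that the model has no dropout modules to sample from. Whether dropout was actually used during training still has to be confirmed from the training setup.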

In our example we use a pretrained AlexNet for demonstration.

In [None]:
alexnet = models.alexnet(pretrained=True)
alexnet.classifier.add_module("softmax", Softmax(dim=1))

### Step 2:

Wrap your net with a PredictionEnsemble layer. This layer collects an ensemble of predictions with Monte Carlo dropout. The ensemble is then used to measure uncertainties.

In [6]:
ensemble = Sequential(
    modules.PredictionEnsemble(inner=alexnet),
    modules.ConfidenceMeanPrediction(),
)

### Step 3:

Put the ensemble layer into evaluation mode to specify that we are testing, not training.

In [7]:
ensemble.eval()

Sequential(
  (0): PredictionEnsemble(
    (inner): AlexNet(
      (features): Sequential(
        (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
        (1): ReLU(inplace)
        (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
        (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
        (4): ReLU(inplace)
        (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
        (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (7): ReLU(inplace)
        (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (9): ReLU(inplace)
        (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (11): ReLU(inplace)
        (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
      (classifier): Sequential(
        (

### Step 4: 

Re-activate the dropout layers, which evaluation mode has switched off, so that dropout stays stochastic at prediction time.

In [12]:
for layer in list(alexnet.modules()):
    if isinstance(layer, Dropout):
        layer.train()
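Steps 3 and 4 together mean that every layer runs in evaluation mode except dropout, so repeated forward passes on the same input differ. The following is a minimal sketch of that idea on a toy network. It is not the framework's `PredictionEnsemble` implementation: `enable_dropout`, `mc_dropout_samples`, and `toy` are hypothetical names introduced for illustration.

```python
import torch
from torch import nn


def enable_dropout(model: nn.Module) -> None:
    """Switch only the nn.Dropout modules back to training mode."""
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.train()


def mc_dropout_samples(model: nn.Module, x: torch.Tensor, n_samples: int = 20):
    """Stack stochastic forward passes into (n_samples, batch, classes)."""
    model.eval()           # all layers in evaluation mode ...
    enable_dropout(model)  # ... except dropout, which stays stochastic
    with torch.no_grad():
        return torch.stack([model(x) for _ in range(n_samples)])


torch.manual_seed(0)
# Hypothetical toy classifier with a softmax output, like the AlexNet above.
toy = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(16, 4),
    nn.Softmax(dim=1),
)
x = torch.randn(1, 8)

samples = mc_dropout_samples(toy, x, n_samples=20)
print(samples.shape)        # one probability vector per forward pass
print(samples.mean(dim=0))  # mean prediction of the ensemble
```

The spread of `samples` across the first dimension is what the uncertainty measures later summarize.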

### Step 5: 

You can evaluate the predictive uncertainties for an image loaded from a path or for a batch from a dataset.
`data_utils` transforms the images, because AlexNet needs examples with dimensions (1 x 3 x 227 x 227).


**Option 1: use batch data** 

We can retrieve a batch of data from a dataset with the PyTorch DataLoader implemented in `data_utils.py`.
Let's specify a batch size of 5 examples.

In [16]:
# currently no data available!
# batch = load_imagenet_dataset("../data/imagenet")

**Option 2: use image from path**

Load an example image for which the uncertainties should be calculated.

*In order to interpret the results, we want to compare the uncertainties of an in-distribution and an out-of-distribution example.* 
As AlexNet was trained on ImageNet data, we pick an ImageNet example as the in-distribution target, for which predictions should be very confident. An out-of-distribution target is an image that depicts something that is not an ImageNet class. We use Cholitas for this example.

In [27]:
id_example = data_utils.get_example_from_path("../data/imagenet_example.jpg")
ood_example = data_utils.get_example_from_path("../data/ood_example.jpg")

**Option 3: generate a random example**

Retrieve a random example normalized with the distribution mean and standard deviation. Such an example can also be treated as an out-of-distribution sample.


In [32]:
random_example = data_utils.get_random_example()
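The internals of `data_utils.get_random_example` are not shown here, so the following is only a sketch of what such a helper might do. It assumes uniform pixel noise in [0, 1) and the commonly used ImageNet channel statistics. The function name `make_random_example` and its parameters are hypothetical.

```python
import torch

# Commonly used ImageNet channel statistics (assumed, not taken from data_utils).
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)


def make_random_example(size: int = 227, seed=None) -> torch.Tensor:
    """Return uniform noise of shape (1, 3, size, size), normalized per channel."""
    generator = torch.Generator()
    if seed is not None:
        generator.manual_seed(seed)
    pixels = torch.rand(1, 3, size, size, generator=generator)
    return (pixels - IMAGENET_MEAN) / IMAGENET_STD


noise_example = make_random_example(seed=0)
print(noise_example.shape)
```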

### Step 6: 
Calculate uncertainties.
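For reference, the three measures are commonly defined as follows for $T$ stochastic forward passes with class probabilities $p_t(c \mid x)$. This assumes that the framework follows the standard definitions. The variation-ratio form is chosen because it is consistent with the printed outputs, where the ratio and the class probability of the mean prediction add up to 1.

$$
\begin{aligned}
\bar{p}(c \mid x) &= \frac{1}{T} \sum_{t=1}^{T} p_t(c \mid x) && \text{(mean prediction)} \\
H[\bar{p}] &= -\sum_{c} \bar{p}(c \mid x) \log \bar{p}(c \mid x) && \text{(predictive entropy, total uncertainty)} \\
I &= H[\bar{p}] - \frac{1}{T} \sum_{t=1}^{T} H[p_t] && \text{(mutual information, model uncertainty)} \\
v &= 1 - \max_{c} \bar{p}(c \mid x) && \text{(variation ratio)}
\end{aligned}
$$

The calls in the cells sum `pred_entropy` and `mutual_info` over their entries, which suggests that the framework returns the per-class terms of these sums rather than the scalars.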

In [28]:
pred, pred_entropy, mutual_info, var_ratio = ensemble(id_example)

print("Uncertainties of in-distribution example")
print(f"    mean prediction: {pred.argmax()}, class probability: {pred.max()}")
print(f"    total predictive entropy: {pred_entropy.sum()}")
print(f"    total mutual information: {mutual_info.sum()}")
print(f"    variational ratio: {var_ratio.item()}")

Uncertainties of in-distribution example
    mean prediction: 283, class probability: 0.9980050921440125
    total predictive entropy: 0.0186183862388134
    total mutual information: 0.0022695695515722036
    variational ratio: 0.001994907855987549


In [44]:
pred, pred_entropy, mutual_info, var_ratio = ensemble(ood_example)

print("Uncertainties of out-of-distribution example")
print(f"    mean prediction: {pred.argmax()}, class probability: {pred.max()}")
print(f"    total predictive entropy: {pred_entropy.sum()}")
print(f"    total mutual information: {mutual_info.sum()}")
print(f"    variational ratio: {var_ratio.item()}")

Uncertainties of out-of-distribution example
    mean prediction: 879, class probability: 0.20534050464630127
    total predictive entropy: 3.4618284702301025
    total mutual information: 0.4586506187915802
    variational ratio: 0.7946594953536987


In [37]:
pred, pred_entropy, mutual_info, var_ratio = ensemble(random_example)

print("Uncertainties of the random example")
print(f"    mean prediction: {pred.argmax()}, class probability: {pred.max()}")
print(f"    total predictive entropy: {pred_entropy.sum()}")
print(f"    total mutual information: {mutual_info.sum()}")
print(f"    variational ratio: {var_ratio.item()}")

Uncertainties of the random example
    mean prediction: 735, class probability: 0.19145451486110687
    total predictive entropy: 4.249647617340088
    total mutual information: 0.22843754291534424
    variational ratio: 0.8085454702377319


## Interpretation of results

We should see that in-distribution examples have a class probability close to 1 and low uncertainties, which means that we are predicting with high confidence.

For the OOD and the random sample, the prediction can be different every time the ensemble is executed. This is due to the stochasticity of MC dropout.

Still, we can observe that the uncertainty is consistently very high, so we predict with low confidence. While the total predictive entropy (total uncertainty) is high for both the OOD and the random example (about 3.46 and 4.25 in the runs above), the mutual information differs markedly (about 0.46 versus 0.23). High mutual information indicates that the total uncertainty is mainly due to model uncertainty. Low mutual information indicates that the total uncertainty is mainly due to data uncertainty.
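The difference between model and data uncertainty can be made concrete with a small, dependency-free sketch. It uses the standard definitions of the measures, which may differ in detail from the framework's implementation. `uncertainty_measures` and the two toy ensembles are hypothetical and chosen only for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of one probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def uncertainty_measures(samples):
    """samples: list of probability vectors, one per stochastic forward pass."""
    n = len(samples)
    n_classes = len(samples[0])
    mean = [sum(s[c] for s in samples) / n for c in range(n_classes)]
    predictive_entropy = entropy(mean)                       # total uncertainty
    expected_entropy = sum(entropy(s) for s in samples) / n  # data uncertainty
    mutual_information = predictive_entropy - expected_entropy  # model uncertainty
    variation_ratio = 1.0 - max(mean)
    return mean, predictive_entropy, mutual_information, variation_ratio


# Each pass is confident, but the passes disagree on the class.
disagreeing = [[0.98, 0.01, 0.01], [0.01, 0.98, 0.01], [0.01, 0.01, 0.98]]
# Every pass returns the same flat distribution.
agreeing_flat = [[1 / 3, 1 / 3, 1 / 3]] * 3

_, pe_dis, mi_dis, vr_dis = uncertainty_measures(disagreeing)
_, pe_flat, mi_flat, vr_flat = uncertainty_measures(agreeing_flat)
print(f"disagreeing:   entropy={pe_dis:.3f}, mutual information={mi_dis:.3f}")
print(f"agreeing flat: entropy={pe_flat:.3f}, mutual information={mi_flat:.3f}")
```

Both ensembles have the same mean prediction and therefore the same predictive entropy and variation ratio. Only the disagreeing ensemble has high mutual information, which is the signature of model uncertainty. The flat but consistent ensemble has mutual information close to zero, which points to data uncertainty.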

