# Anomaly Detection v2.1

## Note:
We recommend running this notebook on a GPU. To enable one, go to Runtime > Change runtime type > Hardware Accelerator and select GPU.

If no GPU is available, use the default CPU runtime ("None" hardware accelerator). The code still works without a GPU, but it may run much slower.

### Contributors:
Jason Ding

## Pre-requisites

In the following cells, we install and import the necessary libraries and download the classification model.

In [None]:
!wget https://github.com/hendrycks/outlier-exposure/raw/master/CIFAR/snapshots/baseline/cifar10_wrn_baseline_epoch_99.pt
!wget https://raw.githubusercontent.com/hendrycks/pre-training/master/robustness/adversarial/models/wrn_with_pen.py
!pip3 install torchvision==0.12.0

--2022-08-14 20:09:43--  https://github.com/hendrycks/outlier-exposure/raw/master/CIFAR/snapshots/baseline/cifar10_wrn_baseline_epoch_99.pt
Resolving github.com (github.com)... 20.27.177.113
Connecting to github.com (github.com)|20.27.177.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/hendrycks/outlier-exposure/master/CIFAR/snapshots/baseline/cifar10_wrn_baseline_epoch_99.pt [following]
--2022-08-14 20:09:44--  https://raw.githubusercontent.com/hendrycks/outlier-exposure/master/CIFAR/snapshots/baseline/cifar10_wrn_baseline_epoch_99.pt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9037421 (8.6M) [application/octet-stream]
Saving to: ‘cifar10_wrn_baseline_epoch_99.pt’


2022-08-14 20:09:46

In [None]:
import torch
import math
from torch.utils.data import Dataset
import torchvision
import torchvision.transforms as transforms
from torchvision.transforms import ToTensor
from torchvision import datasets
import numpy as np
import torch.nn.functional as F
import sklearn.metrics as sk
from wrn_with_pen import WideResNet

prefetch = 2

## Model and Data Loading

In the following cells, we load the CIFAR-10 model and retrieve the CIFAR-10 and out-of-distribution (OOD) datasets.

In [None]:
# Load CIFAR-10 model

net = WideResNet(depth=40, num_classes=10, widen_factor=2, dropRate=0.3)

if torch.cuda.is_available():
    net.load_state_dict(torch.load('cifar10_wrn_baseline_epoch_99.pt'))
    net.eval()
    net.cuda()
else:
    net.load_state_dict(torch.load('cifar10_wrn_baseline_epoch_99.pt', map_location=torch.device('cpu')))
    net.eval()

In [None]:
# /////////////// Loading Datasets ///////////////
mean = [x / 255 for x in [125.3, 123.0, 113.9]]
std = [x / 255 for x in [63.0, 62.1, 66.7]]

# /////////////// CIFAR-10 ///////////////

data_transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize(mean, std)])

cifar_10_data = datasets.CIFAR10(
    root="data",
    train=False,
    download=True,
    transform=data_transform
)

cifar10_data = cifar_10_data
cifar10_loader = torch.utils.data.DataLoader(cifar10_data, batch_size=200, shuffle=False,
                                          num_workers=prefetch, pin_memory=True)
ood_num_examples = len(cifar10_data) // 5

# /////////////// Rademacher Noise ///////////////

dummy_targets = torch.ones(ood_num_examples)
ood_data = torch.from_numpy(np.random.binomial(
    n=1, p=0.5, size=(ood_num_examples, 3, 32, 32)).astype(np.float32)) * 2 - 1
rademacher_ood_data = torch.utils.data.TensorDataset(ood_data, dummy_targets)
rademacher_ood_loader = torch.utils.data.DataLoader(rademacher_ood_data, batch_size=200, shuffle=True)

# /////////////// SVHN ///////////////

data_transform = transforms.Compose([transforms.Resize(32), transforms.ToTensor(), transforms.Normalize(mean, std)])

svhn_ood_data = torchvision.datasets.SVHN(root = "data", 
                          split="test",
                          transform = data_transform, 
                          download = True)

svhn_ood_loader = torch.utils.data.DataLoader(svhn_ood_data, batch_size=200, shuffle=True,
                                         num_workers=prefetch, pin_memory=True)

# /////////////// DTD ///////////////

# data_transform = transforms.Compose([transforms.Resize(32), transforms.CenterCrop(32),
#                                      transforms.ToTensor(), transforms.Normalize(mean, std)])

# dtd_ood_data = torchvision.datasets.DTD(root = "data", 
#                           split="test",
#                           transform = data_transform, 
#                           download = True)

# dtd_ood_loader = torch.utils.data.DataLoader(dtd_ood_data, batch_size=200, shuffle=True,
#                                          num_workers=prefetch, pin_memory=True)

# /////////////// CIFAR-100 ///////////////

data_transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize(mean, std)])

cifar100_ood_data = datasets.CIFAR100(
    root="data",
    train=False,
    download=True,
    transform=data_transform
)

cifar100_ood_loader = torch.utils.data.DataLoader(cifar100_ood_data, batch_size=200, shuffle=True,
                                         num_workers=prefetch, pin_memory=True)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to data/cifar-10-python.tar.gz


  0%|          | 0/170498071 [00:00<?, ?it/s]

Extracting data/cifar-10-python.tar.gz to data
Downloading http://ufldl.stanford.edu/housenumbers/test_32x32.mat to data/test_32x32.mat


  0%|          | 0/64275384 [00:00<?, ?it/s]

Downloading https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz to data/cifar-100-python.tar.gz


  0%|          | 0/169001437 [00:00<?, ?it/s]

Extracting data/cifar-100-python.tar.gz to data


## General Functions

The following cells define functions that we will use in the rest of the code. You do not need to do anything here.

In [None]:
concat = lambda x: np.concatenate(x, axis=0)
to_np = lambda x: x.data.cpu().numpy()

'''
Calculates the anomaly scores for a portion of the given dataset. 
If a GPU is not available, a smaller fraction of the dataset is 
used so that the calculation runs faster.

loader: A DataLoader that contains the loaded data of a dataset.
anomaly_score_calculator: A function that takes in the output 
                          logits of a batch of data and, optionally, 
                          the penultimate features.
model_net: The classifier model.
use_penultimate: True if anomaly_score_calculator needs the 
                 penultimate features as a parameter. False otherwise.
'''
def get_ood_scores(loader, anomaly_score_calculator, model_net, use_penultimate = False):
    _score = []

    with torch.no_grad():
        for batch_idx, (data, target) in enumerate(loader):
            if torch.cuda.is_available():
                fraction = 200
            else:
                fraction = 1000
            if batch_idx >= ood_num_examples // fraction:
                break

            if torch.cuda.is_available():
                data = data.cuda()

            output = model_net(data)

            if use_penultimate:
                score = anomaly_score_calculator(output[0], output[1])
            else:
                score = anomaly_score_calculator(output[0])
            _score.append(score)

    return concat(_score).copy()
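
To make the subsampling in `get_ood_scores` concrete, the sketch below reproduces its batch-limit arithmetic. It assumes the 10,000-image CIFAR-10 test split and the batch size of 200 used by the loaders above; the helper name `examples_scored` is hypothetical and not part of the notebook.

```python
# Reproduces the batch cap used in get_ood_scores (illustrative only).
CIFAR10_TEST_SIZE = 10_000   # size of the CIFAR-10 test split
BATCH_SIZE = 200             # batch size used by the loaders above

ood_num_examples = CIFAR10_TEST_SIZE // 5  # 2000, as in the data-loading cell


def examples_scored(gpu_available: bool) -> int:
    """Number of examples scored per dataset (hypothetical helper)."""
    fraction = 200 if gpu_available else 1000
    max_batches = ood_num_examples // fraction  # the loop breaks at this batch index
    return max_batches * BATCH_SIZE


print(examples_scored(gpu_available=True))   # 2000
print(examples_scored(gpu_available=False))  # 400
```

Under these assumptions, a GPU run scores 10 batches per dataset and a CPU run scores only 2, so CPU results are computed from a smaller sample.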

In [None]:
# /////////////// Printing Results ///////////////
all_anomaly_results = {}

'''
Returns and prints out the AUROC score of a dataset.

ood_loader: A DataLoader that contains the loaded data of a dataset
anomaly_score_calculator: A function that takes in the output 
                          logit of a batch of data and/or the 
                          penultimate.
model_net: The classifier model.
use_penultimate: True if anomaly_score_calculator needs the 
                 penultimate features as a parameter. False otherwise.
'''
def get_and_print_results(ood_loader, anomaly_score_calculator, model_net, use_penultimate):
    out_score = get_ood_scores(ood_loader, anomaly_score_calculator, model_net, use_penultimate = use_penultimate)
    auroc = get_auroc(out_score, in_score)
    print('AUROC: \t\t\t{:.2f}'.format(100 * auroc) + "%")
    return auroc

'''
Prints out the AUROC score of all the OOD datasets. The results 
will be appended to global variable all_anomaly_results, which 
is used later for display purposes.

anomaly_score_calculator: A function that takes in the output 
                          logit of a batch of data and/or the 
                          penultimate.
anomaly_score_name: The name of the anomaly score method.
model_net: The classifier model.
model_name: The name of the classifier model.
use_penultimate: True if anomaly_score_calculator needs the 
                 penultimate features as a parameter. False otherwise.
'''
def print_all_results(anomaly_score_calculator, anomaly_score_name, model_net, model_name = "default_model", use_penultimate = False):
    global in_score, all_anomaly_results
    in_score = get_ood_scores(cifar10_loader, anomaly_score_calculator, model_net, use_penultimate)
    results = []

    print('Rademacher Noise Detection')
    auroc = get_and_print_results(rademacher_ood_loader, anomaly_score_calculator, model_net, use_penultimate)
    results.append(auroc)

    print('\nSVHN Detection')
    auroc = get_and_print_results(svhn_ood_loader, anomaly_score_calculator, model_net, use_penultimate)
    results.append(auroc)

    # print('\nDTD Detection')
    # auroc = get_and_print_results(dtd_ood_loader, anomaly_score_calculator, model_net, use_penultimate)
    # results.append(auroc)

    print('\nCIFAR-100 Detection')
    auroc = get_and_print_results(cifar100_ood_loader, anomaly_score_calculator, model_net, use_penultimate)
    results.append(auroc)

    average = sum(results) / len(results)
    results.append(average)

    if model_name not in all_anomaly_results:
        all_anomaly_results[model_name] = {}
    all_anomaly_results[model_name][anomaly_score_name] = results

## Implement AUROC Score
Fill in the get_auroc function. We will use this function to calculate the AUROC score of an out-of-distribution dataset.

It may be helpful to use the sklearn.metrics.roc_auc_score() function. Both _pos and _neg should be used.
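
As a reference point for what `roc_auc_score` measures: AUROC equals the probability that a randomly chosen OOD example receives a higher anomaly score than a randomly chosen in-distribution example, with ties counted as one half. The dependency-free sketch below computes this by brute-force pairwise comparison. The function name `pairwise_auroc` and the toy scores are illustrative assumptions; this is not the solution expected for get_auroc, only a way to sanity-check one on small inputs.

```python
def pairwise_auroc(pos_scores, neg_scores):
    """AUROC as P(pos > neg) + 0.5 * P(pos == neg) over all (pos, neg) pairs."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy anomaly scores: OOD examples (positives) should score higher.
ood_scores = [0.9, 0.4, 0.7]
in_dist_scores = [0.1, 0.5, 0.3]

# 8 of the 9 (OOD, in-distribution) pairs are ordered correctly.
print(pairwise_auroc(ood_scores, in_dist_scores))
```

The pairwise loop is quadratic in the number of scores, so it is only suitable for small checks.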

In [None]:
'''
Calculates the AUROC score of an OOD dataset.

_pos: an array of anomaly scores of the OOD dataset
_neg: an array of anomaly scores of images in the CIFAR-10 dataset
return: The AUROC score of the data in decimal form
'''
def get_auroc(_pos, _neg):
    ones = torch.ones(_pos.shape)
    zeros = torch.zeros(_neg.shape)

    y_true = torch.concat((ones, zeros))
    y_score = torch.tensor(np.concatenate((_pos, _neg)))

    auroc_score = sk.roc_auc_score(y_true, y_score)
    return auroc_score

## Implement Anomaly Score Calculators
Fill in the following functions, which calculate the anomaly score given the model's output for a batch of data (the output contains one vector of logits per image in the batch).

The following equations show how the logits should be transformed in order to get the anomaly score.

Max Logit Equation: <br>
$\text{Score} = -\max_k l_k$

Max Softmax Equation: <br>
$\text{Score} = -\max_k p(y=k \mid x)$

Cross Entropy Anomaly Equation: <br>
$\text{Score} = \bar{l} - \log \sum_{c=1}^{C} e^{l_c}$

Here $l_k$ is the logit for class $k$, $\bar{l}$ is the mean of the logits, and $C$ is the number of classes.
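
To illustrate the three equations, here is a dependency-free worked example on a single toy logit vector with three classes. The function names and the logit values are assumptions made for illustration; the notebook's own implementations operate on batched torch tensors instead.

```python
import math

logits = [2.0, 1.0, 0.0]  # toy logits for one image (assumed values)


def max_logit_score(l):
    """Negative of the largest logit."""
    return -max(l)


def max_softmax_score(l):
    """Negative of the largest softmax probability."""
    m = max(l)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(v - m) for v in l]
    return -(max(exps) / sum(exps))


def cross_entropy_score(l):
    """Mean logit minus log-sum-exp of the logits."""
    m = max(l)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in l))
    return sum(l) / len(l) - log_sum_exp


print(max_logit_score(logits))      # -2.0
print(max_softmax_score(logits))    # approximately -0.665
print(cross_entropy_score(logits))  # approximately -1.408
```

All three scores are higher (less negative) when the model is less confident, which is why they can serve as anomaly scores.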

In [None]:
# /////////////// Anomaly Score Calculators ///////////////

'''
Calculates the max logit anomaly score of a batch of outputs.

output: The model's output for a batch of data.
'''
def max_logit_anomaly_score(output):
    # Avoid shadowing the built-in max(); the argmax index is not needed.
    max_logit, _ = torch.max(output, dim=1)
    score = -max_logit.cpu().data.numpy()
    return score

'''
Calculates the max softmax anomaly score of a batch of outputs.

output: The model's output for a batch of data.
'''
def max_softmax_anomaly_score(output):
    sm = torch.softmax(output, dim=1)
    # Avoid shadowing the built-in max(); the argmax index is not needed.
    max_prob, _ = torch.max(sm, dim=1)
    score = -max_prob.cpu().data.numpy()
    return score

'''
Calculates the cross entropy anomaly score of a batch of outputs.

output: The model's output for a batch of data.
'''
def cross_entropy_anomaly_score(output):
    mean = torch.mean(output, dim=1)

    # torch.logsumexp computes log(sum(exp(x))) in a numerically stable way
    log_sum_exp = torch.logsumexp(output, dim=1)

    score = mean - log_sum_exp
    score = score.cpu().data.numpy()
    return score

## Print AUROC Results

Run the following cells to see how well each of the anomaly score calculators does on the OOD datasets.

In [None]:
print("======= Max Logit AUROC Scores =======")
print_all_results(max_logit_anomaly_score, "Max Logit", net)

Rademacher Noise Detection
AUROC: 			70.13%

SVHN Detection
AUROC: 			90.72%

CIFAR-100 Detection
AUROC: 			85.96%


In [None]:
print("======= Max Softmax AUROC Scores =======")
print_all_results(max_softmax_anomaly_score, "Max Softmax", net)

Rademacher Noise Detection
AUROC: 			80.34%

SVHN Detection
AUROC: 			92.60%

CIFAR-100 Detection
AUROC: 			88.43%


In [None]:
print("======= Cross Entropy AUROC Scores =======")
print_all_results(cross_entropy_anomaly_score, "Cross Entropy", net)

Rademacher Noise Detection
AUROC: 			70.16%

SVHN Detection
AUROC: 			91.01%

CIFAR-100 Detection
AUROC: 			87.74%


## Implement ViM

We will now implement virtual-logit matching (ViM). ViM works as follows.

ViM calculates the **principal space** ($P^\perp$) from the features (penultimate-layer activations) of the training data (CIFAR-10). Let us use the 12 most significant principal components for our principal space. HINT: You may find it helpful to use np.linalg.svd to carry out the principal component analysis. One can also use the covariance matrix to find the principal space.

Then, ViM calculates **alpha** ($\alpha$). First, the training data's features are projected onto the principal space calculated above; this projection is known as the **residual** ($x^{P^\perp}$). Alpha is then calculated by the following equation:
> $\alpha := \frac{\sum_{i=1}^{K} \max_{j=1,\dots,C} \{l^i_j\}}{\sum_{i=1}^{K} \| x_i^{P^\perp} \|}$

In other words, alpha is the sum over the $K$ training samples of each sample's maximum logit (taken over the $C$ classes), divided by the sum of the norms of the samples' residuals.

Your calculated alpha in this part should be in the range of 1.5 to 1.9.
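
The alpha formula can be checked by hand on toy data. The sketch below follows this notebook's convention of projecting each feature vector onto the chosen basis and taking the norm of the projected coordinates. The data values, the fixed basis, and the helper names `project` and `l2_norm` are assumptions made for illustration; in the notebook the basis would come from the SVD of the features rather than being fixed in advance.

```python
import math

# Toy data: K = 2 samples, C = 2 classes, 3-dimensional features.
logits = [[3.0, 1.0], [0.5, 2.0]]
features = [[3.0, 4.0, 12.0], [0.0, 2.0, 5.0]]

# Hypothetical orthonormal basis: columns are the first two coordinate axes.
basis = [[1.0, 0.0],
         [0.0, 1.0],
         [0.0, 0.0]]


def project(x, b):
    """Coordinates of x in the subspace spanned by the columns of b."""
    return [sum(x[i] * b[i][j] for i in range(len(x))) for j in range(len(b[0]))]


def l2_norm(v):
    return math.sqrt(sum(c * c for c in v))


numerator = sum(max(row) for row in logits)                      # 3.0 + 2.0
denominator = sum(l2_norm(project(x, basis)) for x in features)  # 5.0 + 2.0
alpha = numerator / denominator

print(alpha)  # 5/7, roughly 0.714
```

Because alpha is a ratio of two sums over the same samples, it rescales the projection norms so that, on average, they are comparable in magnitude to the maximum logits.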


In [None]:
# You may find the following functions useful.
from numpy.linalg import pinv, norm
from scipy.special import logsumexp

_score = []
to_np = lambda x: x.data.cpu().numpy()
concat = lambda x: np.concatenate(x, axis=0)

# Extract the fully connected layer's weights and biases
w, b = net.fc.weight.cpu().detach().numpy(), net.fc.bias.cpu().detach().numpy()
# Origin of a new coordinate system of feature space to remove bias
u = -np.matmul(pinv(w), b)

'''
Calculates and returns the principal space and alpha values given
the training data the model used.

training_data_loader: A DataLoader that contains the loaded data of a 
                      training dataset.
model_net: The classifier model.
verbose: If true, will print out the alpha value.
return: Returns a tuple whose first element is the principal space matrix
        and second element is the alpha value.
'''
def compute_ViM_principal_space_and_alpha(training_data_loader, model_net, verbose = False):
    result = []

    # Getting the first batch of the training data to calculate principal space and alpha
    training_data, target = next(iter(training_data_loader))
    if torch.cuda.is_available():
        training_data = training_data.cuda()

    result = model_net(training_data)
    logit = result[0] # Logits (values before softmax)
    penultimate = result[1] # Features/Penultimate (values before fully connected layer)

    logit_id_train = logit.cpu().detach().numpy().squeeze()
    feature_id_train = penultimate.cpu().detach().numpy().squeeze()

    if verbose:
        print('Computing principal space...')

    centered_features = feature_id_train - u # do we add u or subtract u?
    U, S, Vt = np.linalg.svd(centered_features)
    principal_space = Vt[:12,:].T

    if verbose:
        print('Computing alpha...')

    max, index = torch.max(torch.tensor(logit_id_train), dim=1)
    max_sum = torch.sum(max)

    proj = np.matmul(feature_id_train, principal_space) # do we use centered features?

    proj = np.linalg.norm(proj, axis=1)
    proj_sum = torch.sum(torch.tensor(proj))


    alpha = max_sum / proj_sum

    if verbose:
        print(f'alpha = {alpha}')

    return principal_space, alpha

principal_space, alpha = compute_ViM_principal_space_and_alpha(cifar10_loader, net, verbose = True)

Computing principal space...
Computing alpha...
alpha = 1.6956534385681152


## Implement ViM Anomaly Score Calculator

Now, implement the ViM anomaly score calculator. 

First, we project our penultimate/feature values onto the principal space that we found before; this projection is called the residual. We multiply the norm of this residual by alpha to get what we call the **virtual logit score**. 

Next, we get what we call the **energy score** by taking the LogSumExp of the logits. 

Finally, the **anomaly score** is calculated by subtracting the energy score from the virtual logit.

$\text{vlogit} = \alpha \| x^{P^\perp} \|$\
$\text{energy} = \ln \sum_{i=1}^{C} e^{l_i}$\
$\text{anomaly score} = \text{vlogit} - \text{energy}$
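
As a quick numeric check of these three equations, the sketch below evaluates them for one toy example. The values of `alpha`, `projection`, and `logits` are assumed purely for illustration, and `log_sum_exp` is a hypothetical helper; none of them come from the model.

```python
import math

alpha = 0.5               # assumed scaling factor for this toy example
projection = [3.0, 4.0]   # assumed projected feature coordinates (norm 5.0)
logits = [2.0, 1.0, 0.0]  # assumed logits for one image


def log_sum_exp(values):
    """Numerically stable ln(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


vlogit = alpha * math.sqrt(sum(c * c for c in projection))  # 0.5 * 5.0 = 2.5
energy = log_sum_exp(logits)                                # about 2.4076
anomaly_score = vlogit - energy

print(anomaly_score)  # about 0.0924
```

A larger projection norm raises the score, while more confident logits (larger energy) lower it.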

In [None]:
'''
Calculates the ViM anomaly score of a batch of outputs.

output: The model's output for a batch of data.
penultimate: The model's penultimate (feature values) for a batch of data.
'''
def ViM_anomaly_score_calculator(output, penultimate):
    logit_id_val = output.cpu().detach().numpy()
    feature_id_val = penultimate.cpu().detach().numpy()

    # Project the features onto the principal space that was computed from
    # the training data (global `principal_space`) instead of recomputing
    # it for every batch. The features are left uncentered so that they
    # match how alpha was computed.
    proj = np.matmul(feature_id_val, principal_space)  # shape: (batch, 12)

    # Virtual logit: alpha times the norm of each projection, shape: (batch,).
    # Use the global `alpha` fitted for the current model, not a constant.
    vlogit = float(alpha) * np.linalg.norm(proj, axis=1)

    # Energy: numerically stable log-sum-exp of the logits, shape: (batch,)
    energy = logsumexp(logit_id_val, axis=1)

    score_id = vlogit - energy

    return score_id

print("======= ViM_anomaly_score_calculator =======")
w, b = net.fc.weight.cpu().detach().numpy(), net.fc.bias.cpu().detach().numpy()
u = -np.matmul(pinv(w), b)
principal_space, alpha = compute_ViM_principal_space_and_alpha(cifar10_loader, net) # Making sure you have the correct ViM values before calculating the score 
print_all_results(ViM_anomaly_score_calculator, "ViM", net, use_penultimate = True)

Rademacher Noise Detection
AUROC: 			95.84%

SVHN Detection
AUROC: 			94.19%

CIFAR-100 Detection
AUROC: 			90.34%


## Compare Anomaly Score Results
Run the following cell to see how the different anomaly score calculators compare to each other on the OOD datasets. You should see that ViM is superior to the other anomaly scores on all of the datasets.

In [None]:
# ///////////////// Compare Results /////////////////

def get_results_max(model_name = "normal"):
    all_anomaly_results[model_name]["max"] = [0,0,0,0,0]
    for key in all_anomaly_results[model_name].keys():
        if (key != "max"):
            index = 0
            for score in all_anomaly_results[model_name][key]:
                all_anomaly_results[model_name]["max"][index] = \
                    max(score, all_anomaly_results[model_name]["max"][index])
                index += 1


def compare_all_results():
    for model_name in all_anomaly_results:
        to_be_printed = " " * (25 - len(model_name)) + model_name
        dataset_names = ["Rademacher", "SVHN", "CIFAR-100", "Average"]
        for name in dataset_names:
            to_be_printed += " | " + " "*(6-math.ceil(len(name)/2)) + \
                                name + " "*(6-math.floor(len(name)/2))

        print(to_be_printed)
        print("=" * (25 + len(dataset_names) * 15))

        get_results_max(model_name = model_name)
        for key in all_anomaly_results[model_name].keys():
            if (key != "max"):
                to_be_printed = " "*(25-len(key)) + key
                index = 0
                for result in all_anomaly_results[model_name][key]:
                    if (all_anomaly_results[model_name]["max"][index] == result):
                        result = "*" + '{:.2f}'.format(round(result * 100, 2)) + "%"
                    else:
                        result = '{:.2f}'.format(round(result * 100, 2)) + "%"
                    to_be_printed += " | " + " "*(6-math.ceil(len(result)/2)) + \
                                        result + " "*(6-math.floor(len(result)/2))
                    index += 1
                print(to_be_printed)
        print()

    print("\n* highlights the maximum AUROC Score for an OOD Dataset")

compare_all_results()

            default_model |  Rademacher  |     SVHN     |  CIFAR-100   |   Average   
                Max Logit |    70.13%    |    90.72%    |    85.96%    |    82.27%   
              Max Softmax |    80.34%    |    92.60%    |    88.43%    |    87.12%   
            Cross Entropy |    70.16%    |    91.01%    |    87.74%    |    82.97%   
                      ViM |   *95.84%    |   *93.95%    |   *89.41%    |   *93.06%   


* highlights the maximum AUROC Score for an OOD Dataset


## Data Augmentation

Now we will see if models trained using data augmentation methods for robustness help in OOD detection. 

We will load a CIFAR-10 model that used PixMix data augmentation during training, and see how it fares compared to the default model that did not use data augmentation.

In [None]:
!gdown 1mjIfbb3mfXXvAZ1sBnjotFr5yYFmLi68 # Downloading PixMix model
!gdown 1skZT6yplO-Sv4M8Ksgzx14cTY3H-HgOa # Downloading wideresnet class

Downloading...
From: https://drive.google.com/uc?id=1mjIfbb3mfXXvAZ1sBnjotFr5yYFmLi68
To: /content/checkpoint.pth.tar
100% 71.8M/71.8M [00:00<00:00, 110MB/s] 
Downloading...
From: https://drive.google.com/uc?id=1skZT6yplO-Sv4M8Ksgzx14cTY3H-HgOa
To: /content/wideresnet_with_pen.py
100% 4.03k/4.03k [00:00<00:00, 7.64MB/s]


In [None]:
from wideresnet_with_pen import WideResNet as WideResNet2

In [None]:
# Loading the PixMix model

pixmix_net = WideResNet2(depth=40, num_classes=10, widen_factor=4, drop_rate=0.3)
pixmix_net = torch.nn.DataParallel(pixmix_net)

if torch.cuda.is_available():
    checkpoint = torch.load('checkpoint.pth.tar')
    pixmix_net.load_state_dict(checkpoint['state_dict'])
    # Put the PixMix model (not the baseline `net`) in eval mode and on the GPU
    pixmix_net.eval()
    pixmix_net.cuda()
else:
    checkpoint = torch.load('checkpoint.pth.tar', map_location=torch.device('cpu'))
    pixmix_net.load_state_dict(checkpoint['state_dict'])
    pixmix_net.eval()

## PixMix Model Testing

Let us now test the PixMix model with the same anomaly score calculators we coded before.

In [None]:
print("======= Max Logit AUROC Scores =======")
print_all_results(max_logit_anomaly_score, "Max Logit", pixmix_net, model_name = "pixmix_trained_model")

Rademacher Noise Detection
AUROC: 			94.06%

SVHN Detection
AUROC: 			92.60%

CIFAR-100 Detection
AUROC: 			87.27%


In [None]:
print("======= Max Softmax Probability AUROC Scores =======")
print_all_results(max_softmax_anomaly_score, "Max Softmax Probability", pixmix_net, model_name = "pixmix_trained_model")

Rademacher Noise Detection
AUROC: 			92.44%

SVHN Detection
AUROC: 			91.09%

CIFAR-100 Detection
AUROC: 			87.96%


In [None]:
print("======= Cross Entropy AUROC Scores =======")
print_all_results(cross_entropy_anomaly_score, "Cross Entropy", pixmix_net, model_name = "pixmix_trained_model")

Rademacher Noise Detection
AUROC: 			94.53%

SVHN Detection
AUROC: 			92.21%

CIFAR-100 Detection
AUROC: 			86.67%


In [None]:
print("======= ViM_anomaly_score_calculator =======")
w, b = pixmix_net.module.fc.weight.cpu().detach().numpy(), pixmix_net.module.fc.bias.cpu().detach().numpy()
u = -np.matmul(pinv(w), b)
principal_space, alpha = compute_ViM_principal_space_and_alpha(cifar10_loader, pixmix_net)
print_all_results(ViM_anomaly_score_calculator, "ViM", pixmix_net, model_name = "pixmix_trained_model", use_penultimate = True)

Rademacher Noise Detection
AUROC: 			97.02%

SVHN Detection
AUROC: 			94.91%

CIFAR-100 Detection
AUROC: 			91.57%


## Compare No-data-augmentation Model vs PixMix Model

Let us now compare the default model to the PixMix model by running the following cell. 

You should see that the PixMix model helps with OOD detection: it achieves a higher average AUROC score for every anomaly score calculator.

In [None]:
compare_all_results()

            default_model |  Rademacher  |     SVHN     |  CIFAR-100   |   Average   
                Max Logit |    70.13%    |    90.72%    |    85.96%    |    82.27%   
              Max Softmax |    80.34%    |    92.60%    |    88.43%    |    87.12%   
            Cross Entropy |    70.16%    |    91.01%    |    87.74%    |    82.97%   
                      ViM |   *95.84%    |   *93.95%    |   *89.41%    |   *93.06%   

     pixmix_trained_model |  Rademacher  |     SVHN     |  CIFAR-100   |   Average   
                Max Logit |    94.06%    |    92.60%    |    87.28%    |    91.31%   
  Max Softmax Probability |    92.44%    |    91.09%    |    87.96%    |    90.50%   
            Cross Entropy |    94.53%    |    92.21%    |    86.67%    |    91.14%   
                      ViM |   *97.02%    |   *94.91%    |   *91.57%    |   *94.50%   


* highlights the maximum AUROC Score for an OOD Dataset
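
To quantify the comparison, the sketch below computes the change in average AUROC for each anomaly score calculator. The numbers are copied from the Average column of the table above (this recorded run); the dictionary names are illustrative, and the PixMix row labelled "Max Softmax Probability" is matched to the default model's "Max Softmax" row.

```python
# Average AUROC (%) per scoring method, copied from the recorded table above.
default_avg = {"Max Logit": 82.27, "Max Softmax": 87.12,
               "Cross Entropy": 82.97, "ViM": 93.06}
pixmix_avg = {"Max Logit": 91.31, "Max Softmax": 90.50,
              "Cross Entropy": 91.14, "ViM": 94.50}

# Gain in percentage points when switching to the PixMix-trained model.
gains = {name: round(pixmix_avg[name] - default_avg[name], 2)
         for name in default_avg}

for name, gain in gains.items():
    print(f"{name:>15}: +{gain:.2f} percentage points")
```

In this run, the gain is largest for the scores that performed worst on the default model (Max Logit and Cross Entropy) and smallest for ViM, which was already the strongest.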
