# Outlier Detection Tutorial


## _Problem Statement_

For most computer vision tasks, such as **image classification** and **object detection**, outliers can provide insight into operational drift or training problems. One way to identify them is through the reconstruction error of an autoencoder: images that the model reconstructs poorly are likely to differ from the data it was trained on.
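
To make the idea of a reconstruction-error score concrete, here is a minimal sketch on toy data. It is not how `OD_AE` is implemented: as a stand-in for a trained autoencoder it uses a linear projection onto principal directions, and every name in it (`inliers`, `outlier`, `reconstruction_error`) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "dataset": 200 points that lie close to a 2-D plane inside 10-D space.
basis = rng.normal(size=(2, 10))
inliers = rng.normal(size=(200, 2)) @ basis + 0.01 * rng.normal(size=(200, 10))

# One point that does not follow that structure.
outlier = 3.0 * rng.normal(size=(1, 10))

# Stand-in for a trained autoencoder (an assumption for illustration):
# project onto the top-2 principal directions (encode), then map back (decode).
mean = inliers.mean(axis=0)
_, _, vt = np.linalg.svd(inliers - mean, full_matrices=False)
components = vt[:2]


def reconstruction_error(x):
    """Mean squared error between each row of x and its reconstruction."""
    encoded = (x - mean) @ components.T
    decoded = encoded @ components + mean
    return np.mean((x - decoded) ** 2, axis=1)


inlier_scores = reconstruction_error(inliers)
outlier_score = reconstruction_error(outlier)[0]
print(f"largest inlier score: {inlier_scores.max():.5f}")
print(f"outlier score:        {outlier_score:.5f}")
```

The point that breaks the learned structure gets a much larger score than any training point, which is the signal an autoencoder-based detector thresholds on.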

To help with this, DAML provides an outlier detector, based on _Alibi Detect_, that flags such images.


### _When to use_

The `OD_AE` class should be used when you would like to find the individual images in a dataset that are most different from the others in the provided set.


### _What you will need_

1. A set of images as a NumPy array (this tutorial builds one from a TensorFlow Datasets split)


### _Setting up_

Let's import the libraries needed to set up a minimal working example.


In [None]:
try:
    %pip install -q daml[tensorflow]
except Exception:
    pass

import os

from pytest import approx

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

In [None]:
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

from daml.metrics.outlier_detection import OD_AE, Threshold, ThresholdType

tf.random.set_seed(108)
tf.keras.utils.set_random_seed(408)

## Load the data

We will use the MNIST dataset from TensorFlow Datasets for this tutorial on outlier detection.


In [None]:
# Load in the mnist dataset from tensorflow datasets
(images, ds_info) = tfds.load(
    "mnist",
    split="train",
    with_info=True,
)  # type: ignore

tfds.visualization.show_examples(images, ds_info)
images = images.shuffle(images.cardinality())
images = [i["image"].numpy() for i in list(images.take(2000))]
images = np.array(images)

## Initialize the model

Now, let's look at how to use DAML's outlier detection methods.  
We will focus on a simple autoencoder network from our Alibi Detect provider.


In [None]:
# Initialize the autoencoder-based outlier detector from alibi-detect
metric = OD_AE()

## Train the model

Next, we train the detector on the dataset.
Increasing the number of epochs may give better results.
We set the outlier threshold so that the most extreme 1% of the training data is flagged as outliers.


In [None]:
# Train the detector on the set of images
metric.fit_dataset(
    images=images,
    epochs=20,
    threshold=Threshold(99, ThresholdType.PERCENTAGE),  # 99th percentile: top 1% of training scores are outliers
    verbose=False,
)
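
To see how a percentage threshold relates to the fraction of training data that gets flagged, here is a small sketch. It assumes the threshold follows the Alibi Detect convention, where the value is the percentage of training scores treated as normal. The scores are synthetic, and `flagged_fraction` is a hypothetical helper, not part of DAML.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-image outlier scores for 2000 training images.
train_scores = rng.gamma(shape=2.0, scale=1.0, size=2000)


def flagged_fraction(scores, percentage):
    """Fraction of scores strictly above the given percentile cutoff."""
    cutoff = np.percentile(scores, percentage)
    return float(np.mean(scores > cutoff))


for percentage in (95, 99, 100):
    frac = flagged_fraction(train_scores, percentage)
    print(f"percentage={percentage}: {frac:.2%} of training scores flagged")
```

Under this assumption a value of 99 flags the top 1% of training scores, while a value of 100 puts the cutoff at the largest training score, so no training image is flagged.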

## Test for outliers

We have trained our detector on a dataset of digits.  
What happens when we give it corrupted images of digits (which we expect to be "outliers")?


In [None]:
corr_images, ds_info = tfds.load(
    "mnist_corrupted/translate",
    split="train",
    with_info=True,
)  # type: ignore

tfds.visualization.show_examples(corr_images, ds_info)
corr_images = corr_images.shuffle(corr_images.cardinality())
corr_images = [i["image"].numpy() for i in list(corr_images.take(2000))]
corr_images = np.array(corr_images)
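
For intuition about what a translation corruption does to a digit, the sketch below shifts a toy image by a few pixels and fills the vacated area with zeros. This is only an illustration: the `translate` helper is hypothetical and is not the procedure used to build the `mnist_corrupted/translate` dataset.

```python
import numpy as np


def translate(image, dy, dx):
    """Shift a 2-D image by (dy, dx) pixels, filling vacated pixels with 0."""
    height, width = image.shape
    pad_y, pad_x = abs(dy), abs(dx)
    # Surround the image with enough zeros to cover the shift in any direction.
    padded = np.pad(image, ((pad_y, pad_y), (pad_x, pad_x)))
    # Cut a window of the original size, offset opposite to the shift.
    row_start = pad_y - dy
    col_start = pad_x - dx
    return padded[row_start:row_start + height, col_start:col_start + width]


# A blank 28x28 image with a bright 4x4 square near the center.
image = np.zeros((28, 28), dtype=np.uint8)
image[12:16, 12:16] = 255

# Move the square 5 pixels down and 3 pixels to the left.
moved = translate(image, dy=5, dx=-3)
print("original square rows/cols:", np.argwhere(image).min(axis=0).tolist())
print("shifted square rows/cols: ", np.argwhere(moved).min(axis=0).tolist())
```

The pixel content is unchanged, but its position is not, which is the kind of difference an autoencoder trained on centered digits may reconstruct poorly.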

Now we evaluate the two datasets using the trained model.


In [None]:
preds_in = metric.evaluate(images)["is_outlier"]
print(f"Original digits outliers: {np.mean(preds_in):.2%}")

In [None]:
preds_corr = metric.evaluate(corr_images)["is_outlier"]
print(f"Corrupted digits outliers: {np.mean(preds_corr):.2%}")
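
Beyond the overall rate, you will often want to know which images were flagged. The sketch below assumes the `"is_outlier"` result is an array with one boolean (or 0/1) entry per image, which is consistent with taking its mean above. The arrays here are small stand-ins, not detector output.

```python
import numpy as np

# Stand-ins for the evaluated images and their "is_outlier" flags.
images = np.zeros((6, 28, 28, 1), dtype=np.uint8)
is_outlier = np.array([False, True, False, False, True, False])

# Fraction of images flagged, and the positions of the flagged images.
outlier_rate = is_outlier.mean()
outlier_indices = np.flatnonzero(is_outlier)

# Select the flagged images themselves for inspection or plotting.
flagged_images = images[outlier_indices]

print(f"outlier rate: {outlier_rate:.1%}")
print(f"flagged image indices: {outlier_indices.tolist()}")
print(f"flagged images shape: {flagged_images.shape}")
```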

In [None]:
### TEST ASSERTION ###
print(np.mean(preds_in))
print(np.mean(preds_corr))
assert np.mean(preds_in) == approx(0.01, abs=0.01)
assert np.mean(preds_corr) == approx(0.8, abs=0.1)

### Results

The detector flags most of the corrupted images as outliers, while only about 1% of the original digits are flagged.
