# Out-of-Distribution (OOD) Detection Tutorial


## _Problem Statement_

For most computer vision tasks, such as **image classification** and **object detection**, out-of-distribution (OOD) detection can provide insight into operational drift or training problems. One way to identify OOD images is through autoencoder reconstruction error: a model trained to reconstruct in-distribution images tends to reconstruct unfamiliar images poorly.

To help with this, DataEval provides OOD detectors that let a user identify these images.
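The reconstruction-error idea can be sketched without a neural network. This is a minimal illustration, not DataEval's implementation: it swaps in PCA, which behaves like a linear autoencoder (encode by projecting onto the top principal directions, decode by mapping back), and uses hypothetical toy data. The function names `fit_linear_autoencoder` and `reconstruction_error` are illustrative only.

```python
import numpy as np


def fit_linear_autoencoder(x_train: np.ndarray, n_components: int):
    """Fit a linear 'autoencoder' via PCA: the encoder and decoder share the
    top principal directions of the flattened training data."""
    flat = x_train.reshape(len(x_train), -1)
    mean = flat.mean(axis=0)
    # Rows of vt are principal directions, sorted by explained variance
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_error(x: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Per-sample mean squared error between inputs and their reconstructions."""
    flat = x.reshape(len(x), -1)
    codes = (flat - mean) @ components.T  # encode: project onto the subspace
    recon = codes @ components + mean     # decode: map back to input space
    return np.mean((flat - recon) ** 2, axis=1)


rng = np.random.default_rng(0)
# Hypothetical in-distribution data: samples lying near a 2-D subspace of R^10
basis = rng.normal(size=(2, 10))
x_in = rng.normal(size=(500, 2)) @ basis + 0.01 * rng.normal(size=(500, 10))
# Hypothetical OOD data: isotropic noise that leaves the subspace
x_out = rng.normal(size=(50, 10))

mean, components = fit_linear_autoencoder(x_in, n_components=2)
err_in = reconstruction_error(x_in, mean, components)
err_out = reconstruction_error(x_out, mean, components)
print(f"mean error in-distribution: {err_in.mean():.5f}, OOD: {err_out.mean():.5f}")
```

The OOD samples reconstruct far worse than the in-distribution ones, so their error is a usable OOD score. A neural autoencoder plays the same role for images, where the structure is nonlinear.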


### _When to use_

Use the `OOD_AE` class, or a similar detector such as `OOD_VAEGMM`, when you want to find the individual images in a dataset that are most different from the others in the provided set.
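Given any per-image score where higher means "more unusual" (for example, a reconstruction error), finding the most different images amounts to sorting by that score. A minimal sketch, assuming such scores are already available as a NumPy array; `most_different` is a hypothetical helper, not part of DataEval.

```python
import numpy as np


def most_different(scores: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k highest-scoring (most anomalous) images,
    most anomalous first."""
    k = min(k, len(scores))
    return np.argsort(scores)[::-1][:k]


# Hypothetical per-image OOD scores
scores = np.array([0.02, 0.91, 0.05, 0.40, 0.03])
print(most_different(scores, k=2))  # → [1 3]
```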


### _What you will need_

1. A training image dataset, along with the approximate percentage of OOD images it is known to contain.
2. A test image dataset to evaluate for OOD images.
3. A Python environment with the following packages installed:
   - `dataeval[tensorflow]` or `dataeval[all]`
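The approximate OOD percentage matters because it sets the detection threshold. Assuming the threshold is expressed as a percentile of the training scores (as with the `threshold_perc` argument to `fit`), an expected OOD fraction converts to a percentile as sketched below. `threshold_perc_from_ood_fraction` is a hypothetical helper for illustration.

```python
def threshold_perc_from_ood_fraction(ood_fraction: float) -> float:
    """Convert an expected OOD fraction (0.0 to 1.0) into a percentile
    threshold, so that roughly that fraction of training scores lies above it."""
    if not 0.0 <= ood_fraction <= 1.0:
        raise ValueError("ood_fraction must be between 0 and 1")
    return 100.0 * (1.0 - ood_fraction)


# Expecting about 1% OOD images in the training set
print(threshold_perc_from_ood_fraction(0.01))
```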


### _Setting up_

Let's import the libraries needed to set up a minimal working example.


```python
try:
    import google.colab  # noqa: F401

    # specify the version of DataEval (==X.XX.X) for versions other than the latest
    %pip install -q dataeval[tensorflow]
except Exception:
    pass

import os

# Silence TensorFlow's C++ log messages; this must be set before TensorFlow is imported
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
```

```python
import numpy as np

from dataeval._internal.datasets import MNIST
from dataeval.detectors.ood import OOD_AE, OOD_VAEGMM
from dataeval.utils.tensorflow.models import AE, VAEGMM, create_model
```


## Load the data

We will use the MNIST handwritten-digit dataset, loaded through DataEval's `MNIST` class, for this tutorial.


```python
# Load the MNIST training set and keep the first 4000 images
train_ds = MNIST(root="./data/", train=True, download=True, size=4000, unit_interval=True, channels="channels_first")

# Split out the images and labels
images, labels = train_ds.data, train_ds.targets
input_shape = images[0].shape
```
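To make the `unit_interval=True` and `channels="channels_first"` options concrete, here is a NumPy sketch of equivalent preprocessing. It assumes the raw digits are `uint8` arrays of shape `(N, 28, 28)`; it illustrates the idea and is not DataEval's implementation. Under that assumption, each image ends up with shape `(1, 28, 28)`, which is what `input_shape` would hold.

```python
import numpy as np


def to_unit_interval_channels_first(raw: np.ndarray) -> np.ndarray:
    """Scale uint8 pixels to [0, 1] floats and add a leading channel axis.

    Assumes `raw` has shape (N, H, W) with values in 0..255.
    """
    scaled = raw.astype(np.float32) / 255.0  # unit interval
    return scaled[:, np.newaxis, :, :]       # (N, H, W) -> (N, 1, H, W)


# Hypothetical stand-in for raw MNIST pixels
raw = np.random.default_rng(0).integers(0, 256, size=(4, 28, 28)).astype(np.uint8)
out = to_unit_interval_channels_first(raw)
print(out.shape, out[0].shape)
```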

## Initialize the model

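Now, let's look at how to use DataEval's OOD detection methods. We will compare two detectors from our Alibi Detect provider: `OOD_AE`, built on a simple autoencoder, and `OOD_VAEGMM`, built on a variational autoencoder combined with a Gaussian mixture model. Each detector wraps a model created by `create_model` for the images' `input_shape`.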


```python
detectors = [
    OOD_AE(create_model(AE, input_shape)),
    OOD_VAEGMM(create_model(VAEGMM, input_shape)),
]
```



## Train the model

Next, we train each detector on the training images.
Increasing `epochs` may give better results.
Setting `threshold_perc=99` places the detection threshold so that the most extreme 1% of the training data is flagged as out-of-distribution.


```python
for detector in detectors:
    print(f"Training {detector.__class__.__name__}...")
    detector.fit(images, threshold_perc=99, epochs=20, verbose=False)
```

```text
Training OOD_AE...
Training OOD_VAEGMM...
```
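A percentile threshold such as `threshold_perc=99` can be sketched in NumPy: score every training image, place the threshold at the 99th percentile of those scores, and flag anything above it. The scores below are random stand-ins and the function names are hypothetical; DataEval's own scoring and comparison details (for example, how ties at the threshold are treated) may differ.

```python
import numpy as np


def fit_threshold(train_scores: np.ndarray, threshold_perc: float) -> float:
    """Place the threshold at the given percentile of the training scores."""
    return float(np.percentile(train_scores, threshold_perc))


def flag_ood(scores: np.ndarray, threshold: float) -> np.ndarray:
    """Boolean mask: True where a score exceeds the threshold."""
    return scores > threshold


rng = np.random.default_rng(0)
train_scores = rng.exponential(scale=1.0, size=4000)  # hypothetical OOD scores
threshold = fit_threshold(train_scores, threshold_perc=99)
is_ood = flag_ood(train_scores, threshold)
print(f"threshold={threshold:.3f}, fraction flagged={is_ood.mean():.4f}")
```

With 4000 distinct training scores, exactly the top 40 (1%) lie above the interpolated 99th percentile.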


## Test for OOD

We have trained our detectors on a dataset of handwritten digits.
What happens when we give them corrupted images of digits, which we expect to be OOD?


```python
corruption = MNIST(
    root="./data",
    train=True,
    download=False,
    size=2000,
    unit_interval=True,
    channels="channels_first",
    corruption="translate",
)
corrupted_images = corruption.data
```

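The exact parameters of DataEval's `"translate"` corruption are not shown here. As an assumption-labeled illustration, a translation corruption shifts each image by a few pixels and fills the vacated area with background. Because MNIST digits are centered in their frames, shifted digits plausibly place ink where training images rarely have any. `translate` below is a hypothetical helper operating on `(N, C, H, W)` arrays.

```python
import numpy as np


def translate(images: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Shift a batch of (N, C, H, W) images by (dy, dx) pixels, filling
    vacated pixels with zeros instead of wrapping around."""
    out = np.zeros_like(images)
    h, w = images.shape[-2:]
    if abs(dy) >= h or abs(dx) >= w:
        return out  # shifted completely out of frame
    # Source and destination windows along each spatial axis
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[..., dst_y, dst_x] = images[..., src_y, src_x]
    return out


img = np.arange(9, dtype=np.float32).reshape(1, 1, 3, 3)
shifted = translate(img, dy=1, dx=1)  # move content down 1 and right 1
print(shifted[0, 0])
```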


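Now we evaluate both datasets with the trained detectors. Taking `np.mean` of the boolean `is_ood` array gives the fraction of images each detector flags as OOD.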


```python
# Fraction of the original training images flagged as OOD by each detector
[(type(detector).__name__, np.mean(detector.predict(images).is_ood)) for detector in detectors]
```

```text
[('OOD_AE', 0.01), ('OOD_VAEGMM', 0.01075)]
```

```python
# Fraction of the translated (corrupted) images flagged as OOD by each detector
[(type(detector).__name__, np.mean(detector.predict(corrupted_images).is_ood)) for detector in detectors]
```

```text
[('OOD_AE', 0.971), ('OOD_VAEGMM', 0.0365)]
```

### Results

The autoencoder-based `OOD_AE` detector flagged 97.1% of the translated images as outliers, while `OOD_VAEGMM` flagged only 3.65%, not far above its 1.075% rate on the training images, so it was largely insensitive to this perturbation.

Different outlier detectors respond to different kinds of change, so the best choice depends on the type of drift or corruption you need to catch.
