In [None]:
%matplotlib widget
import os
import glob
import cuvis
import cuvis_ai
import warnings
import matplotlib
import torchvision.transforms as T
from utils import generate_output_gif
warnings.filterwarnings('ignore')

## Anomaly Detection in Hyperspectral Data

This notebook explores anomaly detection in hyperspectral image cubes. We use the RX detector, a classic algorithm for anomaly detection in hyperspectral imagery, as a strong baseline, and visualize how it highlights outliers and unusual spectral signatures.

### Data: Aquarium

In this notebook, we use a CUVIS.AI session file (a video file) that contains multiple sequential hyperspectral datacubes. We begin by loading this dataset; the RX detector will later operate on it to identify pixels with statistically distinct spectral signatures.

To get started, we download the dataset from Google Drive using Cuvis.AI.

In [None]:
base_path = "../data/cuvis_ai_video"
# Download the dataset only if the target directory does not exist yet.
# (Creating the directory before this check would always skip the download.)
if not os.path.exists(base_path):
    os.makedirs(base_path)
    data_down = cuvis_ai.data.PublicDataSets()
    data_down.download_dataset("Aquarium", download_path=base_path)

In [None]:
# Let's look at a single example
cubes = glob.glob(f'{base_path}/*.cu3s')
data = cuvis.SessionFile(cubes[0]).get_measurement(0)
sample_cube = data.data.get('cube').array
waves = data.data.get('cube').wavelength
x,y,z = sample_cube.shape
print(f'Width: {x}, Height: {y}, Channels: {z}')

As we can see, the dataset has 51 channels. We will look at the individual pixel spectra to identify which of them can be considered statistical outliers.
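To reason about individual spectra, it can help to view the cube as a table with one row per pixel and one column per channel. The following is a minimal sketch on a synthetic array, not on the real measurement; `toy_cube` is a hypothetical stand-in, and the `(rows, cols, channels)` layout is an assumption about how `sample_cube` is stored.

```python
import numpy as np

# Synthetic stand-in for `sample_cube`; assumed layout: (rows, cols, channels).
rng = np.random.default_rng(0)
toy_cube = rng.normal(loc=100.0, scale=5.0, size=(8, 10, 51))

# One row per pixel, one column per spectral channel.
spectra = toy_cube.reshape(-1, toy_cube.shape[-1])

# The mean spectrum summarizes the "typical" pixel of the scene.
mean_spectrum = spectra.mean(axis=0)

print(spectra.shape)        # (80, 51)
print(mean_spectrum.shape)  # (51,)
```

Row `r * cols + c` of `spectra` holds the spectrum of pixel `(r, c)`, so results computed per row can be reshaped back into an image.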

### RX Detector

The RX (Reed-Xiaoli) detector is a widely used method for hyperspectral anomaly detection. It quantifies how different a pixel's spectrum is from the background distribution by computing the squared Mahalanobis distance between the pixel's spectrum and the global mean spectrum.

The RX anomaly score for a pixel **x** is given by:

$$
\text{RX}(\mathbf{x}) = (\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})
$$

Where:

- $\mathbf{x}$ is the spectral vector of the pixel,  
- $\boldsymbol{\mu}$ is the global mean spectrum,  
- $\boldsymbol{\Sigma}$ is the covariance matrix of the background,  
- $\boldsymbol{\Sigma}^{-1}$ is the inverse (or pseudo-inverse) covariance matrix.  

Pixels with higher RX scores are more likely to be anomalous.
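The formula above can be sketched directly in NumPy. This is an illustrative global RX computation on synthetic data, not the implementation used by `cuvis_ai`; the function name `rx_scores`, the toy cube, and the `(rows, cols, channels)` layout are assumptions made for this example.

```python
import numpy as np

def rx_scores(cube):
    """Global RX score (squared Mahalanobis distance) for every pixel."""
    rows, cols, channels = cube.shape
    pixels = cube.reshape(-1, channels).astype(np.float64)

    mu = pixels.mean(axis=0)                 # global mean spectrum
    centered = pixels - mu
    sigma = np.cov(centered, rowvar=False)   # channels x channels covariance
    sigma_inv = np.linalg.pinv(sigma)        # pseudo-inverse for robustness

    # (x - mu)^T Sigma^{-1} (x - mu), evaluated for all pixels at once.
    scores = np.einsum("ij,jk,ik->i", centered, sigma_inv, centered)
    return scores.reshape(rows, cols)

# Synthetic background with one implanted, spectrally distinct pixel.
rng = np.random.default_rng(0)
toy_cube = rng.normal(size=(20, 20, 5))
toy_cube[3, 7] += 25.0

scores = rx_scores(toy_cube)
peak = np.unravel_index(np.argmax(scores), scores.shape)
print("Highest RX score at pixel:", tuple(int(i) for i in peak))
```

Note that in this sketch the anomalous pixel itself contributes to the mean and covariance estimates. For a small number of anomalies this barely matters, but heavily contaminated scenes may call for a more robust background estimate.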


In [None]:
# Define RX detector
rx = cuvis_ai.anomaly.RXDetector()

Now we will take the output of the RX detector and feed it into a decider node to threshold our decisions. Varying the threshold will determine what is considered an anomaly.

In [98]:
# Distance decider node
threshold = 200000000 # This threshold is set for the aquarium dataset! It will vary depending on the dataset.
decider = cuvis_ai.deciders.BinaryDecider(threshold)
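To illustrate how the threshold changes what counts as an anomaly, here is a small NumPy sketch on synthetic scores. It assumes that thresholding means flagging every pixel whose score exceeds the threshold; it does not reproduce the internals of `BinaryDecider`, and `toy_scores` and the quantile-based threshold are hypothetical choices for this example.

```python
import numpy as np

# Synthetic RX-like scores: chi-square background plus one strong anomaly.
rng = np.random.default_rng(1)
toy_scores = rng.chisquare(df=5, size=(20, 20))
toy_scores[3, 7] = 500.0

# A data-driven threshold: flag roughly the top 1% of scores.
loose_threshold = np.quantile(toy_scores, 0.99)
loose_mask = toy_scores > loose_threshold

# A much stricter, hand-picked threshold flags far fewer pixels.
strict_mask = toy_scores > 100.0

print("Flagged (top 1%):", int(loose_mask.sum()))
print("Flagged (score > 100):", int(strict_mask.sum()))
```

A quantile-based threshold adapts to the score range of each dataset, whereas a fixed value such as the one above has to be re-tuned whenever the data changes.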

Now that we have these two nodes, we connect them into a simple two-stage graph: the RX node is the entry point for the data, and its output then flows to the decider node.

*This will throw an initialization warning "Unsatisfied dimensionality constraint", but this is expected behavior*

In [None]:
# Define and construct graph
graph = cuvis_ai.pipeline.Graph("DemoGraph")
graph.add_base_node(rx)
graph.add_edge(rx, decider)

Cuvis.AI provides methods for handling large numbers of datacubes, including our session file, which contains over 200 images. We define it as a dataset that can be passed into the graph.

This dataset is *unlabeled*, meaning it only contains the raw hyperspectral datacubes, and not label files.

In [None]:
# Define unlabeled dataset
data = cuvis_ai.data.CuvisDataSet(base_path)

### Train the Model

As the RX detector is an unsupervised method, the model is trained on a subset of the data without labels. The `fit` method takes a number of sample datacubes from our dataloader and uses them to train the graph. Try adjusting the number of training datacubes and observe the impact on training time. We then use the `forward` method to generate the output results.

In [None]:
# Run the graph on the first image(s) of the dataset
number_of_training_images = 1
out = graph.forward(*data[0:number_of_training_images])

In [None]:
import matplotlib.pyplot as plt
plt.figure()
plt.imshow(out[0])
plt.show()

### Visualize the Results

Now that we have defined a graph in cuvis.ai, we can apply it to all the images in the dataset. The cell below generates and displays a GIF showing the anomaly detector applied to each frame of the video.

In [None]:
generate_output_gif(
    graph,
    data,
    base_path,
    gif_name="rx_result.gif",
    title="(Reed-Xiaoli) Anomaly Detector"
)