In [1]:
%matplotlib widget
import os
import glob
import cuvis
import cuvis_ai
import warnings
import matplotlib
import torchvision.transforms as T
from utils import generate_output_gif
warnings.filterwarnings('ignore')



# Unsupervised Classification

### Objective: Cluster a dataset using unsupervised methods to group similar spectra together in hyperspectral datacubes

### Data: Aquarium

In this notebook, we will use a CUVIS.AI session file (video file) which contains multiple sequential hyperspectral datacubes. You will load the data, then define a graph which applies spatial and dimensional transforms to the data before clustering the pixels.

To get started, we will download a dataset from Google Drive using Cuvis.AI.

In [3]:
base_path = "../data/cuvis_ai_video"
if not os.path.exists(base_path):
    data_down = cuvis_ai.data.PublicDataSets()
    data_down.download_dataset("Aquarium", download_path=base_path)

Now let's look at a single measurement from the dataset. We'll need the size of the dataset to make some decisions on how we should transform the data.

In [4]:
# Let's look at a single example
cubes = glob.glob(f'{base_path}/*.cu3s')
data = cuvis.SessionFile(cubes[0]).get_measurement(0)
sample_cube = data.data.get('cube').array
waves = data.data.get('cube').wavelength
x,y,z = sample_cube.shape
print(f'Width: {x}, Height: {y}, Channels {z}')

Width: 275, Height: 290, Channels 51


As we can see, the dataset has 51 channels. For machine learning applications, we can transform the data to reduce its spectral dimensionality. Let's use Principal Component Analysis (PCA) to reduce the number of channels.

In [5]:
number_of_components = 6
# Define PCA with n components
pca = cuvis_ai.preprocessor.PCA(number_of_components)
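
To make the channel reduction concrete, here is a minimal NumPy sketch of PCA applied to a datacube. It is an illustration only and does not show how the `cuvis_ai` PCA node is implemented. The `pca_reduce` helper and the synthetic cube are hypothetical stand-ins, not part of the library or the Aquarium dataset.

```python
import numpy as np

def pca_reduce(cube, n_components):
    """Project an (H, W, C) cube onto its first n_components principal axes."""
    h, w, c = cube.shape
    pixels = cube.reshape(-1, c).astype(np.float64)  # one row per pixel spectrum
    centered = pixels - pixels.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    reduced = centered @ vt[:n_components].T
    # Fraction of the total variance retained by the kept components
    explained = (s[:n_components] ** 2).sum() / (s ** 2).sum()
    return reduced.reshape(h, w, n_components), explained

# Synthetic stand-in for a datacube: 51 channels driven by 3 latent factors plus noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 50, 3))
mixing = rng.normal(size=(3, 51))
cube = latent @ mixing + 0.01 * rng.normal(size=(40, 50, 51))

reduced, explained = pca_reduce(cube, 6)
print(reduced.shape, f"{explained:.4f}")
```

The spatial dimensions are unchanged; only the channel axis shrinks. Checking the retained variance is one way to judge whether the chosen number of components is reasonable for a given cube.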

Now we will take the output of the PCA and feed it into an unsupervised classifier. We will use a Gaussian Mixture Model with a pre-defined number of classes. When picking the number of classes, you'll want to consider the composition of the images to see how many classes "naturally" exist.

In [None]:
number_of_classes = 4
# Define a GMM with n components
gmm = cuvis_ai.unsupervised.GMM(number_of_classes)
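
As a rough illustration of what clustering pixels with a Gaussian Mixture Model involves, the sketch below uses scikit-learn's `GaussianMixture` directly on synthetic per-pixel features. The data, the variable names, and the use of the Bayesian Information Criterion (BIC) to compare class counts are assumptions made for this example; the `cuvis_ai` GMM node may behave differently.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic 20 x 30 image with 6 features per pixel, drawn from 4 well-separated groups
true_labels = rng.integers(0, 4, size=(20, 30))
centers = np.array([[0.0] * 6, [5.0] * 6, [-5.0] * 6, [10.0] * 6])
features = centers[true_labels] + 0.3 * rng.normal(size=(20, 30, 6))

# Flatten the image so that each pixel becomes one sample
h, w, c = features.shape
pixels = features.reshape(-1, c)

# Fit a 4-component mixture and assign each pixel to its most likely component
gmm_sketch = GaussianMixture(n_components=4, random_state=0)
label_map = gmm_sketch.fit_predict(pixels).reshape(h, w)
print(label_map.shape, np.unique(label_map))

# Compare candidate class counts with BIC (lower is better)
bic = {
    k: GaussianMixture(n_components=k, random_state=0).fit(pixels).bic(pixels)
    for k in range(2, 7)
}
best_k = min(bic, key=bic.get)
print(best_k)
```

BIC is only one heuristic. For real scenes, visual inspection of the label maps is usually needed as well.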

Now that we have these two nodes, we connect them into a simple two-stage graph. The PCA node is the entry point for the data, which then flows to the GMM node.

*This will throw an initialization warning "Unsatisfied dimensionality constraint", but this is expected behavior*

In [None]:
# Define and construct graph
graph = cuvis_ai.pipeline.Graph("DemoGraph")
graph.add_base_node(pca)
graph.add_edge(pca, gmm)


Cuvis.AI has methods for handling large numbers of datacubes, including our session file, which contains over 200 images. We'll define it as a dataset to pass into the graph.

This dataset is *unlabeled*, meaning it only contains the raw hyperspectral datacubes, and not label files.

In [None]:
# Define unlabeled dataset
data = cuvis_ai.data.CuvisDataSet(base_path)

### Train the Model

As GMM is an unsupervised classifier, we will need to train the model on a subset of the data. The `fit` method takes a number of sample datacubes from our dataloader and uses them to train the graph. Try adjusting the number of training datacubes and observe the impact that has on the training time. We will then use the `forward` method to generate the output results.

In [None]:
# Use first four images to fit the data
number_of_training_images = 4
graph.fit(*data[0:number_of_training_images])
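
The idea of fitting on a few datacubes and then applying the trained model to every frame can be sketched outside of cuvis.ai with a scikit-learn pipeline. This is an analogy under stated assumptions, not a description of how `graph.fit` or `forward` work internally. The synthetic frames, `class_spectra`, and `make_frame` are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
class_spectra = rng.normal(scale=3.0, size=(4, 51))  # hypothetical class signatures

def make_frame(h=16, w=16):
    """Build a synthetic (h, w, 51) frame whose pixels follow one of 4 spectra."""
    labels = rng.integers(0, 4, size=(h, w))
    return class_spectra[labels] + 0.1 * rng.normal(size=(h, w, 51))

frames = [make_frame() for _ in range(10)]

# Pool the pixels of the first few frames into one training matrix
number_of_training_images = 4
train_pixels = np.concatenate(
    [f.reshape(-1, 51) for f in frames[:number_of_training_images]]
)

# Two-stage model: PCA to 6 channels, then a 4-component GMM
model = make_pipeline(
    PCA(n_components=6),
    GaussianMixture(n_components=4, random_state=0),
)
model.fit(train_pixels)

# Apply the trained model to every frame, including those not used for fitting
label_maps = [
    model.predict(f.reshape(-1, 51)).reshape(f.shape[:2]) for f in frames
]
print(len(label_maps), label_maps[0].shape)
```

Increasing `number_of_training_images` enlarges `train_pixels`, which is why training time grows with the number of datacubes used for fitting.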

### Visualize the Results

Now that we have defined and trained a graph in cuvis.ai, we can use it to classify all the images in the dataset. The cell below generates and displays a gif showing the classification applied to each frame of the video.

In [None]:
generate_output_gif(
    graph,
    data,
    base_path,
    gif_name="gmm_result.gif",
    title="Gaussian Mixture Model - 4 Classes"
)
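
The internals of `generate_output_gif` are not shown in this notebook. As a hedged sketch of one step such a helper plausibly needs, the code below turns integer label maps into RGB frames by indexing a color palette. The palette values, the `colorize` helper, and the random label maps are assumptions for illustration only.

```python
import numpy as np

# One RGB color per class (4 classes); the values are arbitrary choices
palette = np.array(
    [[68, 1, 84], [49, 104, 142], [53, 183, 121], [253, 231, 37]],
    dtype=np.uint8,
)

def colorize(label_map, palette):
    """Map an (H, W) integer label map to an (H, W, 3) uint8 RGB image."""
    return palette[label_map]

# Random stand-ins for the per-frame classification results
rng = np.random.default_rng(0)
example_maps = [rng.integers(0, 4, size=(8, 10)) for _ in range(3)]

# Stack the colorized frames into a (frames, H, W, 3) array
rgb_frames = np.stack([colorize(m, palette) for m in example_maps])
print(rgb_frames.shape, rgb_frames.dtype)
```

An array like `rgb_frames` can then be handed to an imaging library to write an animated gif, for example Pillow's `Image.save` with `save_all=True` and `append_images`.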

### Next Steps

As you can see from the results above, clustering takes some tuning to find parameters that yield good performance. cuvis.ai makes several unsupervised classification techniques available for hyperspectral data:

- K-Means Clustering
- Gaussian Mixture Modeling
- Mean-Shift Clustering

Take a peek at the [source code](https://github.com/cubert-hyperspectral/cuvis.ai/blob/main/cuvis_ai/unsupervised/sklearn_wrapped.py) and try out different classification nodes with the dataset!
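
For a feel of how these three techniques differ before swapping nodes in the graph, the sketch below runs the corresponding scikit-learn estimators on the same synthetic two-dimensional data. It calls scikit-learn directly and makes no claim about the interface of the cuvis.ai nodes. The data and the Mean-Shift bandwidth are assumptions chosen for this toy example.

```python
import numpy as np
from sklearn.cluster import KMeans, MeanShift
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Three well-separated synthetic groups of 100 samples each
blob_centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
samples = np.concatenate(
    [center + 0.5 * rng.normal(size=(100, 2)) for center in blob_centers]
)

results = {
    # K-Means and GMM need the number of clusters up front
    "kmeans": KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(samples),
    "gmm": GaussianMixture(n_components=3, random_state=0).fit_predict(samples),
    # Mean-Shift infers the number of clusters from the kernel bandwidth
    "mean_shift": MeanShift(bandwidth=2.0).fit_predict(samples),
}

for name, labels in results.items():
    print(name, len(np.unique(labels)))
```

The practical difference is in what you tune: a cluster count for K-Means and GMM, a bandwidth for Mean-Shift.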