# Autoencoders
Unsupervised learning can be a handy tool, and its greatest advantage is that it does not require any labeled data.
Autoencoders are one popular type of unsupervised learning model.
They learn efficient data encodings by compressing the data into a low-dimensional representation and then trying to reconstruct the original input from that representation as accurately as possible.
Autoencoders are used for many tasks that involve dimensionality reduction.
Applications include data visualization, data denoising, anomaly detection and information retrieval.
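To make the compress-then-reconstruct idea concrete, here is a minimal sketch of a *linear* autoencoder in NumPy. It swaps the neural network for PCA computed via an SVD (a linear autoencoder trained with squared error spans the same subspace as PCA). The helpers `fit_linear_autoencoder`, `encode` and `decode` and the toy data are hypothetical and only illustrate the principle; the tutorial itself uses a convolutional network.

```python
import numpy as np

def fit_linear_autoencoder(data, code_size):
    """Return (mean, components) of a PCA-based linear autoencoder (hypothetical helper)."""
    mean = data.mean(axis=0)
    # Rows of vt are orthonormal principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:code_size]

def encode(data, mean, components):
    # Compress: project centered data onto the principal directions, (n, d) -> (n, k).
    return (data - mean) @ components.T

def decode(codes, mean, components):
    # Reconstruct: map codes back to the input space, (n, k) -> (n, d).
    return codes @ components + mean

rng = np.random.default_rng(0)
# Toy data: 100 samples in 10 dimensions that actually lie on a 2-D plane (plus an offset).
basis = rng.normal(size=(2, 10))
data = rng.normal(size=(100, 2)) @ basis + 5.0

mean, components = fit_linear_autoencoder(data, code_size=2)
codes = encode(data, mean, components)
reconstruction = decode(codes, mean, components)
print(codes.shape)                                  # (100, 2)
print(np.abs(reconstruction - data).max() < 1e-8)   # True
```

Because the toy data lies exactly on a plane, two code values per sample suffice to reconstruct it up to floating-point error. Real voxelized CAD models are far less regular, which is one reason to use a non-linear convolutional autoencoder instead.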

Today we will focus on the last one: information retrieval. We will use autoencoders to learn low-dimensional encodings for the CAD models of the DMU-Net dataset. We will then apply a simple k-nearest-neighbors algorithm to these encodings to retrieve the CAD models most similar to a given model. Let's begin.

## Installing Dependencies

In [None]:
!pip install tensorflow matplotlib renumics-spotlight scipy umap-learn k3d datasets scikit-learn ipywidgets pyarrow

## Defining imports
The imports are the same as in the 3D classification tutorial, with the addition of `sklearn` for the k-nearest-neighbors algorithm.

In [1]:
import os

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

import k3d
import numpy as np
import pyarrow
from ipywidgets import GridspecLayout, Label, VBox
from sklearn.neighbors import NearestNeighbors
from tensorflow.keras.layers import Conv3D, Input, MaxPool3D, UpSampling3D
from tensorflow.keras.models import Model

pyarrow.PyExtensionType.set_auto_load(True)

## Loading the dataset
Again, we load and process the DMU-Net dataset. This time, training only needs the input data, not the labels.

In [None]:
import datasets

ds = datasets.load_dataset("renumics/dmu_tiny")
ds_train = ds["train"]
ds_test = ds["test"]

In [3]:
class_names = ["Nut", "Screw", "GearWheel"]

train_geometries = np.array(ds_train["voxel"])
train_labels = np.array(ds_train["label"])

test_geometries = np.array(ds_test["voxel"])
test_labels = np.array(ds_test["label"])
test_ids = np.array(ds_test["id"])

train_geometries = train_geometries.reshape(*train_geometries.shape, 1)
test_geometries = test_geometries.reshape(*test_geometries.shape, 1)
all_geometries = np.append(test_geometries, train_geometries, 0)

## Building the network
Now we build the autoencoder. It consists of an encoder and a decoder: the encoder compresses the input data into a low-dimensional representation, and the decoder tries to recover the original data from it.
![](imgs/autoencoder.png)
*Architecture of an autoencoder. [Source.](https://towardsdatascience.com/applied-deep-learning-part-3-autoencoders-1c083af4d798)*

For our task, we compress the <tt>48x48x48</tt> geometric shapes into a <tt>12x12x12x8</tt> encoding. This gives us a dimensionality reduction by a factor of 8. A stronger reduction would likely require more training time and possibly more training data.
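The factor of 8 follows from counting the values in the input and in the code. A quick check (the variable names are only for illustration):

```python
from math import prod

input_shape = (48, 48, 48, 1)  # one voxelized CAD model with a single channel
code_shape = (12, 12, 12, 8)   # encoder output: 12x12x12 positions with 8 filters each

input_size = prod(input_shape)  # 110592 values
code_size = prod(code_shape)    # 13824 values
reduction_factor = input_size // code_size

print(input_size, code_size, reduction_factor)  # 110592 13824 8
```

Equivalently, each of the two 2x2x2 pooling steps reduces the number of spatial positions by 2^3 = 8 (64 in total), while the number of channels grows from 1 to 8, so the net reduction is 64 / 8 = 8.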

In [4]:
input_geometry = Input((48, 48, 48, 1))
x = Conv3D(filters=16, kernel_size=(3, 3, 3), padding="same", activation="relu")(input_geometry)
x = MaxPool3D(pool_size=(2, 2, 2))(x)
x = Conv3D(filters=8, kernel_size=(3, 3, 3), padding="same", activation="relu")(x)
encoded = MaxPool3D(pool_size=(2, 2, 2))(x)

x = Conv3D(filters=8, kernel_size=(3, 3, 3), padding="same", activation="relu")(encoded)
x = UpSampling3D(size=(2, 2, 2))(x)
x = Conv3D(filters=16, kernel_size=(3, 3, 3), padding="same", activation="relu")(x)
x = UpSampling3D(size=(2, 2, 2))(x)
decoded = Conv3D(filters=1, kernel_size=(3, 3, 3), padding="same", activation="sigmoid")(x)

The encoder has a simple CNN architecture consisting of two convolutions, each followed by a max-pooling operation. The decoder mirrors the encoder, and a final convolution with a single filter and sigmoid activation outputs one occupancy probability per voxel. The <tt>UpSampling3D</tt> layer doubles each spatial dimension by repeating every value across a local patch of size <tt>2x2x2</tt>.
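The spatial shape changes of the network can be reproduced without TensorFlow. Below is a minimal NumPy sketch, assuming channels-last arrays of shape `(depth, height, width, channels)` for a single sample without a batch axis. `max_pool_3d` and `upsample_3d` are hypothetical helpers meant to mimic `MaxPool3D(pool_size=(2, 2, 2))` and `UpSampling3D(size=(2, 2, 2))`. The convolutions are left out: with `padding="same"` they keep the spatial size and only change the number of channels.

```python
import numpy as np

def max_pool_3d(x, size=2):
    """Non-overlapping max pooling over the three spatial axes of a (D, H, W, C) array."""
    d, h, w, c = x.shape
    # Split each spatial axis into (blocks, size) and take the maximum inside every block.
    blocks = x.reshape(d // size, size, h // size, size, w // size, size, c)
    return blocks.max(axis=(1, 3, 5))

def upsample_3d(x, size=2):
    """Nearest-neighbour upsampling: repeat every value `size` times along each spatial axis."""
    for axis in range(3):
        x = np.repeat(x, size, axis=axis)
    return x

# Tiny example: a 2x2x2 block with one channel collapses to its maximum and is then copied back.
block = np.arange(8, dtype=float).reshape(2, 2, 2, 1)
pooled = max_pool_3d(block)
restored = upsample_3d(pooled)
print(pooled.ravel())    # [7.]
print(restored.ravel())  # [7. 7. 7. 7. 7. 7. 7. 7.]

# Spatial sizes along the tutorial's network: 48 -> 24 -> 12 -> 24 -> 48.
grid = np.zeros((48, 48, 48, 1))
encoded_grid = max_pool_3d(max_pool_3d(grid))
decoded_grid = upsample_3d(upsample_3d(encoded_grid))
print(encoded_grid.shape, decoded_grid.shape)  # (12, 12, 12, 1) (48, 48, 48, 1)
```

Upsampling does not undo pooling: the detail lost inside each block has to be recovered by the decoder's learned convolutions.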

The autoencoder consists of the encoder followed by the decoder.

In [5]:
autoencoder = Model(input_geometry, decoded)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

## Training
We train the autoencoder by using the geometric shapes as both the input and the target output of the model. The model thus learns to construct a low-dimensional representation that encodes the essential features of the data, and to recover the data from it.
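The model was compiled with `loss="binary_crossentropy"`, which compares every predicted voxel occupancy (a sigmoid output between 0 and 1) with the corresponding 0/1 voxel of the input. As a rough NumPy sketch of what this loss measures, averaged over all voxels (the helper name is hypothetical, and the Keras implementation differs in details such as how it clips probabilities):

```python
import numpy as np

def voxel_binary_crossentropy(target, prediction, eps=1e-7):
    """Mean binary cross-entropy between 0/1 voxel targets and predicted occupancy probabilities."""
    p = np.clip(prediction, eps, 1.0 - eps)  # keep log() away from 0
    losses = -(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))
    return losses.mean()

target = np.array([1.0, 0.0, 1.0, 0.0])  # four voxels: occupied, empty, occupied, empty
unsure = np.full(4, 0.5)                  # a decoder that outputs 0.5 everywhere
good = np.array([0.9, 0.1, 0.8, 0.2])     # a decoder whose output is close to the input

print(round(voxel_binary_crossentropy(target, unsure), 4))  # 0.6931 (= ln 2)
print(voxel_binary_crossentropy(target, good) < voxel_binary_crossentropy(target, unsure))  # True
```

Minimizing this value pushes the decoder's output toward the input geometry, which in turn forces the encoding to keep the information needed for the reconstruction.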

This time we need to train a bit longer to obtain good results.

In [None]:
autoencoder.fit(
    train_geometries,
    train_geometries,
    validation_data=(test_geometries, test_geometries),
    epochs=20,
    batch_size=16,
)

## Visualization of the Reconstructions
We have now trained both an encoder and a decoder for our CAD models. We can assess the quality of the learned encodings by checking how well the decoder reconstructs the general shape of the geometries. You should still be able to recognize the model classes. If you have time, try increasing the number of training epochs, which should further improve the encodings.
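To complement the visual check with a number, one option (not part of the original tutorial) is the intersection over union (IoU) between the occupied voxels of an input and those of its reconstruction, thresholded at 0.5 just like the reconstructions that are plotted. A minimal sketch; `voxel_iou` is a hypothetical helper:

```python
import numpy as np

def voxel_iou(original, probabilities, threshold=0.5):
    """IoU of occupied voxels: |original AND reconstruction| / |original OR reconstruction|."""
    occupied = original.astype(bool)
    reconstructed = probabilities >= threshold
    intersection = np.logical_and(occupied, reconstructed).sum()
    union = np.logical_or(occupied, reconstructed).sum()
    # Two empty grids agree perfectly.
    return float(intersection / union) if union > 0 else 1.0

# Toy example on a 4x4x4 grid: the original fills 2 slabs, the reconstruction fills 3.
original = np.zeros((4, 4, 4))
original[:2] = 1
probabilities = np.zeros((4, 4, 4))
probabilities[:3] = 0.9
print(voxel_iou(original, probabilities))  # 0.6666666666666666
```

Once `predictions` has been computed, the score could be averaged over the test set, e.g. `np.mean([voxel_iou(g, p) for g, p in zip(test_geometries, predictions)])`, to compare runs with different numbers of epochs.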

In [None]:
predictions = autoencoder.predict(test_geometries)
reconstructions = predictions >= 0.5
rows, cols = 4, 3
grid = GridspecLayout(rows, cols)
for i in range(rows):
    for j in range(cols):
        sample_idx = i * cols + j
        plot = k3d.plot(height=300, menu_visibility=False, grid_visible=False)
        plot += k3d.voxels(
            reconstructions[sample_idx].squeeze().astype(np.uint8), bounds=[0, 1, 0, 1, 0, 1]
        )
        grid[i, j] = VBox([plot])
grid

## Retrieval
For the shape retrieval we are only interested in the first half of the autoencoder: the encoder.
The task is to retrieve the most similar geometries for a test geometry.
We use the trained encoder model to extract encodings for the CAD geometries. The codes are flattened into vectors.

In [None]:
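encoder = Model(input_geometry, encoded)
# Encode all training geometries and flatten each 12x12x12x8 code into a vector of length 13824
train_codes = encoder.predict(train_geometries)
train_codes = train_codes.reshape(len(train_codes), -1)
# Encode a single query geometry ([None] adds the batch axis) and flatten it the same way
test_shape = test_geometries[0]
test_code = encoder.predict(test_shape[None])
test_code = test_code.reshape(1, -1)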

Now we use the encodings to search for the geometries most similar to the query.
We use a k-nearest-neighbors algorithm with the Euclidean distance to retrieve the 3 most similar items.
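Retrieval by Euclidean distance boils down to measuring the distance from the query code to every training code and keeping the closest ones. A brute-force NumPy sketch with hypothetical helper names (scikit-learn's `NearestNeighbors` may use a tree-based search instead, and ties can come out in a different order):

```python
import numpy as np

def flatten_codes(codes):
    """Turn encoder outputs of shape (n, 12, 12, 12, 8) into vectors of shape (n, 13824)."""
    return codes.reshape(len(codes), -1)

def knn_search(database, queries, k=3):
    """Return (distances, indices) of the k nearest database vectors for each query (Euclidean)."""
    # Broadcast to (n_queries, n_database, dim) and reduce over the feature axis.
    distances = np.linalg.norm(queries[:, None, :] - database[None, :, :], axis=-1)
    # Sort each row by distance and keep the k closest database entries.
    indices = np.argsort(distances, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(distances, indices, axis=1), indices

# Toy 2-D "codes": the query lies closest to [1, 0], then [0, 0], then [0, 2].
database = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
query = np.array([[0.9, 0.1]])
distances, indices = knn_search(database, query, k=3)
print(indices)  # [[1 0 2]]
print(flatten_codes(np.zeros((5, 12, 12, 12, 8))).shape)  # (5, 13824)
```

For the tutorial's data, `knn_search(train_codes, test_code, k=3)` should return the same neighbors as `knn.kneighbors(test_code, n_neighbors=3)`, apart from possible tie ordering. The broadcast difference array grows with the number of training samples times 13824, so this version is only meant for small datasets.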

In [None]:
knn = NearestNeighbors(metric="euclidean")
knn.fit(train_codes)
neighbor_distances, neighbor_indices = knn.kneighbors(test_code, n_neighbors=3)

Let's look at the queried geometry.

In [None]:
print("Queried geometric shape")
plot = k3d.plot(menu_visibility=False)
# k3d expects a 3D array, so drop the trailing channel axis
plot += k3d.voxels(test_shape.squeeze().astype(np.uint8), bounds=[0, 1, 0, 1, 0, 1])
plot.display()

Now let's look at the retrieved shapes. If all went well, they should closely resemble the queried shape.

In [None]:
print("Retrieved similar shapes")
grid = GridspecLayout(1, 3)
for j in range(3):
    sample_idx = neighbor_indices[0, j]
    plot = k3d.plot(height=300, menu_visibility=False, grid_visible=False)
    # Drop the trailing channel axis before plotting, as k3d expects a 3D array
    plot += k3d.voxels(
        train_geometries[sample_idx].squeeze().astype(np.uint8), bounds=[0, 1, 0, 1, 0, 1]
    )
    grid[0, j] = VBox([Label(value="Distance: {:.2f}".format(neighbor_distances[0, j])), plot])
grid

This is the last of the tutorials. You have built a complete 3D shape retrieval system based on low-dimensional encodings generated by an autoencoder. Hopefully, the tutorials have helped you gain a general sense of working with neural networks. Cheers!