# Autoencoder Trainer


## _Problem Statement_

For many computer vision tasks, such as **image classification** and **object detection**, the size of the image datasets can severely slow down dataset analysis methods. One way to lessen this burden is to reduce the size of the images without losing the _important_ information. This is known as **dimensionality reduction**. Given the high dimensionality of image data, a common approach is to use an autoencoder trained on a reconstruction task.

To help with this, DataEval has introduced a lightweight, easy-to-use autoencoder training class (`AETrainer`) that provides out-of-the-box functionality for this type of dimensionality reduction.


### _When to use_

The `AETrainer` class should be used when you have many images, very large images, or strict speed requirements.


### _What you will need_

1. A PyTorch Dataset with your images returned first in `__getitem__`
2. (Optional) A PyTorch autoencoder model
3. (Optional) A PyTorch autoencoder model with a defined `encode` function
4. A Python environment with the following packages installed:
   - `dataeval` or `dataeval[all]`

If the optional models are not given, a default architecture is used. This default has an `encode` function.
You are encouraged to create a custom architecture that best fits your data, as this will lead to better results during training. A sample dataset is also provided so that you can run this tutorial.
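
To make the first three requirements concrete, here is a minimal sketch of a dataset that returns the image first and a custom autoencoder that exposes an `encode` method. The names `ImageFirstDataset` and `TinyAutoencoder`, the layer sizes, and the random stand-in data are all hypothetical and chosen only for illustration. Whether `AETrainer` needs anything beyond these two conventions is not covered here, so treat this as a starting point rather than a reference architecture.

```python
import torch
from torch import nn
from torch.utils.data import Dataset


class ImageFirstDataset(Dataset):
    """Hypothetical dataset whose items are (image, label) tuples."""

    def __init__(self, images: torch.Tensor, labels: torch.Tensor) -> None:
        self.images = images
        self.labels = labels

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, index: int):
        # The image comes first; labels or metadata follow.
        return self.images[index], self.labels[index]


class TinyAutoencoder(nn.Module):
    """Hypothetical single-channel autoencoder with an `encode` method."""

    def __init__(self) -> None:
        super().__init__()
        # 1x28x28 -> 8x14x14 -> 4x7x7
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(8, 4, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        # 4x7x7 -> 8x14x14 -> 1x28x28
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(4, 8, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(8, 1, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid(),
        )

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # Only the compressing half of the network.
        return self.encoder(x)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Compress, then reconstruct to the original size.
        return self.decoder(self.encode(x))


# Random stand-in data, not real images.
images = torch.rand(16, 1, 28, 28)
labels = torch.zeros(16, dtype=torch.long)
dataset = ImageFirstDataset(images, labels)
model = TinyAutoencoder()

batch = images[:4]
print(model(batch).shape)         # torch.Size([4, 1, 28, 28])
print(model.encode(batch).shape)  # torch.Size([4, 4, 7, 7])
```

In this sketch each 784-value image is encoded into 196 values, so the encoder genuinely compresses its input.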


### _Setting up_

Let's import the libraries needed to set up a minimal working example.


In [None]:
# Google Colab Only
try:
    import google.colab  # noqa: F401

    # specify the version of DataEval (==X.XX.X) for versions other than the latest
    %pip install -q dataeval
except Exception:
    pass

While you can use your own dataset, this example uses the `MNIST` dataset throughout. Let's import it from the DataEval utils package.


In [None]:
import numpy as np
import torch
from torch.utils.data import Subset

from dataeval.utils.data.datasets import MNIST

Now you will load the MNIST dataset and look at its size and shape.


In [None]:
# Configure the dataset transforms
transforms = [
    lambda x: x / 255.0,  # scale to [0, 1]
    lambda x: x.astype(np.float32),  # convert to float32
]

training_dataset = MNIST(root="./data/", image_set="train", transforms=transforms)
testing_dataset = MNIST(root="./data/", image_set="test", transforms=transforms)
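
To see what the two transforms do, the sketch below applies them to a small array. It assumes the transforms are applied in list order and that raw pixels are `uint8` values in [0, 255]. The `raw` array is a hypothetical stand-in, not real MNIST data.

```python
import numpy as np

transforms = [
    lambda x: x / 255.0,  # scale to [0, 1]
    lambda x: x.astype(np.float32),  # convert to float32
]

# Hypothetical stand-in for raw pixel values.
raw = np.array([[0, 128, 255]], dtype=np.uint8)

out = raw
for transform in transforms:
    out = transform(out)

print(out.dtype)  # float32
print(float(out.min()), float(out.max()))  # 0.0 1.0
```

The division promotes the array to `float64`, which is why the explicit cast to `float32` follows it.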

In [None]:
print("Training dataset size:", len(training_dataset))
print("Training image shape:", training_dataset[0][0].shape)


There are over 54,000 images in the training set, each 28x28 pixels.
Dimensionality reduction using an encoder will provide speed improvements
for downstream tasks.

:::{note}

The MNIST dataset is very small compared to most operational datasets, and in this example the encoder does not actually reduce the size of each image.
To use your own dataset, replace `training_dataset` and `testing_dataset` in the cells above.

:::


### _Using a default trainer_

#### **Training Phase**

DataEval provides a simple default trainer for autoencoder tasks. Let's import the necessary classes.
In this simple example, we will assume you do not have an autoencoder architecture to use.


In [None]:
from dataeval.utils.torch.models import Autoencoder
from dataeval.utils.torch.trainer import AETrainer

Now set up the model and trainer.


In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"
model = Autoencoder(channels=1)
trainer = AETrainer(model, device=device, batch_size=32)

Let's train the model on a subset (6000 images) of the MNIST data.
Since this is a simpler problem, you will reduce the default 25 epochs to 10.


In [None]:
training_subset = Subset(training_dataset, range(6000))
training_loss = trainer.train(training_subset, epochs=10)
print(training_loss[-1])
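
For intuition, a reconstruction training loop in plain PyTorch might look like the sketch below. This is not the `AETrainer` implementation. The tiny fully connected model, the MSE loss, the Adam optimizer, and the per-epoch averaging are all assumptions made for illustration, and the data is random noise rather than MNIST.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-in data: 64 random single-channel 28x28 "images".
images = torch.rand(64, 1, 28, 28)
loader = DataLoader(TensorDataset(images), batch_size=32, shuffle=True)

# Hypothetical tiny model; not the DataEval `Autoencoder`.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 32),
    nn.ReLU(),
    nn.Linear(32, 28 * 28),
    nn.Sigmoid(),
    nn.Unflatten(1, (1, 28, 28)),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

epoch_losses = []
for epoch in range(3):
    running = 0.0
    for (batch,) in loader:
        optimizer.zero_grad()
        reconstruction = model(batch)
        # The target of a reconstruction task is the input itself.
        loss = loss_fn(reconstruction, batch)
        loss.backward()
        optimizer.step()
        running += loss.item() * batch.size(0)
    # Average loss per image for this epoch.
    epoch_losses.append(running / len(images))

print(epoch_losses[-1])
```

Here the last element of `epoch_losses` plays the same role as `training_loss[-1]` above, assuming the trainer returns one loss value per epoch.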

#### **Evaluation Phase**

Now that you have a trained model, let's check its performance on the held-out test set.


In [None]:
eval_loss = trainer.eval(testing_dataset)
print(eval_loss)

In [None]:
### TEST ASSERTION CELL ###
assert -0.1 < training_loss[-1] / eval_loss - 1 < 0.1

Great! You can see that the model was able to perform reconstruction on unseen data. This is only done to confirm that your model did not overfit to the training data.
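
The assertion cell above checks that the final training loss is within 10% of the evaluation loss. The same check can be written as a small helper. The function names and the loss values below are hypothetical and serve only to show how the comparison behaves.

```python
def relative_gap(train_loss: float, eval_loss: float) -> float:
    """Fractional difference between training and evaluation loss."""
    return train_loss / eval_loss - 1


def losses_agree(train_loss: float, eval_loss: float, tolerance: float = 0.1) -> bool:
    """True when the two losses differ by less than `tolerance` (as a fraction)."""
    return -tolerance < relative_gap(train_loss, eval_loss) < tolerance


# Hypothetical loss values, not results from this notebook.
print(losses_agree(0.020, 0.021))  # True
print(losses_agree(0.010, 0.020))  # False
```

A training loss far below the evaluation loss, as in the second call, is the pattern you would expect from an overfit model.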

Now you can encode the dataset and use those embeddings to speed up downstream tasks.

#### **Encoding Phase**

Encoding differs from training and evaluation. During training and evaluation, the full autoencoder compresses each image and then reconstructs it back to its original size.
By calling only the first part of the autoencoder, the **encoder**, you can take advantage of this compression.

Let's show an example using the training data.


In [None]:
embeddings = trainer.encode(training_subset)

In [None]:
print("Embedded image shape:", embeddings.shape)

In [None]:
### TEST ASSERTION CELL ###
assert embeddings.shape == (6000, 64, 6, 6)

Now you can see how the encoder changes the overall shape of your images, which can lead to significant benefits for downstream tasks when working with large datasets.
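
A quick way to judge whether an encoder compresses your data is to compare the number of values per image before and after encoding. The sketch below uses the shapes from this tutorial and assumes a leading channel dimension of 1 on the input.

```python
from math import prod

input_shape = (1, 28, 28)  # one MNIST image (channel dimension assumed)
embedding_shape = (64, 6, 6)  # per-image embedding shape asserted above

input_size = prod(input_shape)  # 784
embedding_size = prod(embedding_shape)  # 2304

ratio = embedding_size / input_size
print(f"values per image: {input_size} -> {embedding_size} (ratio {ratio:.2f})")
```

A ratio below 1 means the embedding is smaller than the input. For MNIST with the default architecture the ratio is roughly 2.94, which matches the earlier note that this example does not actually reduce the image size.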


### _Additional Information_

**Related Notebooks**

1. [Bayes Error Rate](BayesErrorRateEstimationTutorial.ipynb)
1. [Divergence](HPDivergenceTutorial.ipynb)
1. [Sufficiency](ClassLearningCurvesTutorial.ipynb)
