# Example of MLP VAE with MNIST dataset

In [1]:
# First install the library

# %pip install rapidae

Rapidae is built on Keras 3, which supports multiple backends.
Select one of the three available backends (TensorFlow, PyTorch or JAX) by setting the environment variable "KERAS_BACKEND" before Keras is imported.
The next cell sets it.

In [2]:
import os

os.environ["KERAS_BACKEND"] = "torch"
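Because the backend is read when Keras is first imported, the order of the cells matters. The sketch below wraps the assignment in a small check; `select_backend` and `SUPPORTED_BACKENDS` are hypothetical names introduced for illustration and are not part of Rapidae or Keras.

```python
import os
import sys

# Backend identifiers for the three backends mentioned above.
SUPPORTED_BACKENDS = ("tensorflow", "torch", "jax")


def select_backend(name: str) -> bool:
    """Set KERAS_BACKEND and report whether the change can still take effect.

    Hypothetical helper for illustration only.
    """
    if name not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"Unknown backend {name!r}; expected one of {SUPPORTED_BACKENDS}"
        )
    os.environ["KERAS_BACKEND"] = name
    # Keras reads the variable at import time, so if it is already loaded
    # in this process the new value is ignored until the kernel restarts.
    return "keras" not in sys.modules


took_effect = select_backend("torch")
print(os.environ["KERAS_BACKEND"], took_effect)
```

If the helper returns False, restarting the kernel and running the backend cell first is the simplest way to switch.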

In [3]:
import sys

notebook_dir = os.path.abspath('')
sys.path.append(os.path.join(notebook_dir, '..'))

from rapidae.pipelines.training import TrainingPipeline
from rapidae.models.vae.vae_model import VAE
from rapidae.models.base.default_architectures import Encoder_MLP, Decoder_MLP
from rapidae.data.utils import display_diff
from rapidae.data.datasets import load_MNIST
from keras import utils

# For reproducibility in Keras 3. This will set:
# 1) `numpy` seed
# 2) backend random seed
# 3) `python` random seed
utils.set_random_seed(1)

### Download and preprocess the dataset

In this example, the selected dataset is the well-known MNIST, composed of images of handwritten digits.

The "persistant" parameter of load_MNIST() is a flag that determines whether the dataset is cached in the datasets folder.

Train and test images are flattened into vectors and scaled to the range [0, 1] by dividing by 255.

The labels are also converted to one-hot encoding.

In [None]:
# Load MNIST dataset
x_train, y_train, x_test, y_test = load_MNIST(persistant=True)

x_train = x_train.reshape(x_train.shape[0], -1).astype("float32") / 255
x_test = x_test.reshape(x_test.shape[0], -1).astype("float32") / 255

# Obtain number of classes
n_classes = len(set(y_train))

# Convert labels to categorical
y_train = utils.to_categorical(y_train, n_classes)
y_test = utils.to_categorical(y_test, n_classes)
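To make the effect of these steps on shapes and value ranges concrete, the sketch below applies them to a small synthetic batch using NumPy only. The random images and labels are made-up stand-ins for MNIST, and `np.eye` indexing is used here in place of `utils.to_categorical` so the snippet runs without Keras.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for MNIST: 6 grayscale images of 28x28 pixels.
images = rng.integers(0, 256, size=(6, 28, 28), dtype=np.uint8)
labels = np.array([0, 1, 2, 1, 0, 2])

# Flatten each image to a 784-element vector and scale pixels to [0, 1].
x = images.reshape(images.shape[0], -1).astype("float32") / 255

# One-hot encode: row i of the identity matrix is the encoding of class i.
n_classes = len(np.unique(labels))
y = np.eye(n_classes, dtype="float32")[labels]

print(x.shape, x.dtype, float(x.min()), float(x.max()))
print(y.shape)
```

With the real dataset the same reshape should turn each 28x28 image into a 784-element row, which is the value passed as input_dim when the model is created.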

### Model creation

This example uses a vanilla MLP variational autoencoder with a 32-dimensional latent space.

In [None]:
# Model creation
model = VAE(input_dim=x_train.shape[1], latent_dim=32,
            encoder=Encoder_MLP, decoder=Decoder_MLP, layers_conf=[128, 64])
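For readers new to VAEs, the sketch below shows the two ingredients that distinguish a vanilla VAE from a plain autoencoder: sampling the latent code with the reparameterization trick, and the KL divergence term that pulls the latent distribution toward a standard normal. It is a NumPy illustration of the textbook formulation, not Rapidae's implementation; the names `z_mean`, `z_log_var` and `kl_to_standard_normal` are assumptions chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
batch_size, latent_dim = 4, 32

# Stand-ins for the two encoder outputs: mean and log-variance per latent unit.
z_mean = rng.normal(size=(batch_size, latent_dim))
z_log_var = rng.normal(scale=0.1, size=(batch_size, latent_dim))

# Reparameterization trick: z = mean + std * eps, with eps ~ N(0, I).
eps = rng.standard_normal(size=(batch_size, latent_dim))
z = z_mean + np.exp(0.5 * z_log_var) * eps


def kl_to_standard_normal(mean, log_var):
    """KL(N(mean, exp(log_var)) || N(0, I)), summed over latent dimensions."""
    return -0.5 * np.sum(1 + log_var - mean**2 - np.exp(log_var), axis=1)


kl = kl_to_standard_normal(z_mean, z_log_var)
print(z.shape, kl.shape)
```

The KL term is zero only when the encoder outputs a mean of 0 and a log-variance of 0 for every latent unit, and it is positive otherwise.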

### Training pipeline

Define the training pipeline. Here you can set hyperparameters for the training phase of the autoencoder, such as the learning rate, batch size and number of epochs.
You can also attach callbacks to the model.
The pipeline's name can be customized as well, which makes it easier to identify the corresponding folder of saved models inside the output_dir folder.

In [None]:
pipe = TrainingPipeline(name='training_pipeline_mnist_mlp_vae', learning_rate=0.01,
                        model=model, num_epochs=40, batch_size=128)

trained_model = pipe(x=x_train, y=y_train)
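As a rough guide to how much work these hyperparameters imply, the arithmetic below estimates the number of gradient updates. It assumes the standard MNIST training split of 60,000 images and that the final, smaller batch of each epoch is kept; neither is confirmed by the notebook itself.

```python
import math

n_train = 60_000   # assumption: standard MNIST training split size
batch_size = 128   # as passed to TrainingPipeline above
num_epochs = 40    # as passed to TrainingPipeline above

# The last batch of each epoch is smaller, hence the ceiling.
steps_per_epoch = math.ceil(n_train / batch_size)
total_steps = steps_per_epoch * num_epochs

print(steps_per_epoch, total_steps)  # prints: 469 18760
```

Any early-stopping callback would reduce the total, so this is an upper bound under the stated assumptions.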

### Evaluation step

In [None]:
y_hat = trained_model.predict(x_test)

display_diff(x_test, y_hat['recon'])
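A visual comparison can be complemented with a number. The sketch below computes a per-sample mean squared reconstruction error with NumPy; it runs on synthetic arrays so that it is self-contained, and `reconstruction_mse` is a hypothetical helper rather than a Rapidae function.

```python
import numpy as np


def reconstruction_mse(x, recon):
    """Mean squared error between each input row and its reconstruction."""
    return np.mean((x - recon) ** 2, axis=1)


rng = np.random.default_rng(2)

# Synthetic stand-ins: 5 flattened "images" and slightly noisy reconstructions.
x = rng.random((5, 784), dtype=np.float32)
noise = rng.normal(scale=0.05, size=x.shape).astype("float32")
recon = np.clip(x + noise, 0.0, 1.0)

per_sample_mse = reconstruction_mse(x, recon)
worst = int(np.argmax(per_sample_mse))  # index of the worst reconstruction

print(per_sample_mse.shape, worst)
```

In the notebook, the same function could be applied to x_test and y_hat['recon'], assuming both are arrays of identical shape.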