# Advanced Usage Tutorial
This notebook shows a few advanced features of the translation model.

## 1. Importing the needed parts

In [None]:
# we will need to import a few things first
import numpy as np
import os

# changing working directory
# for imports to work
from pathlib import Path
path = Path(os.getcwd())
print(path)
os.chdir(path.parent)

# import the encoders and decoders we want to use
from multimodal_autoencoders.model.encoders import DynamicEncoder
from multimodal_autoencoders.model.decoders import DynamicDecoder

# import the Autoencoder class
from multimodal_autoencoders.base.autoencoder import VariationalAutoencoder

# import a discriminator and a classifier
from multimodal_autoencoders.model.classifiers import Discriminator, SimpleClassifier

# import the JointTrainer aka the brains of the operation
from multimodal_autoencoders.joint_trainer import JointTrainer

## 2. Setting up the data
We will use the same synthetic data set up as in the basic usage example.

In [None]:
# initialize the random generator
rng = np.random.default_rng(seed = 1234)

# create some latent information common to all modalities
train_latent_information = rng.random(size = (100, 25))

# small helper function for our synthetic data
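def generate_modality(latent_information: np.ndarray, n_random_dims: int):
    # append n_random_dims columns of noise, one row per latent sample
    n_samples = latent_information.shape[0]
    ar = np.concatenate((latent_information, rng.random(size = (n_samples, n_random_dims))), axis=1)
    # shuffle the columns in place so the shared features do not sit in fixed positions
    rng.shuffle(ar, axis=1)

    return ar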

# define the data dictionary
# this is the first place where you would plug in your own data
train_data_dict = {
    "modality_1": generate_modality(train_latent_information, 25),
    "modality_2": generate_modality(train_latent_information, 50),
    "modality_3": generate_modality(train_latent_information, 75)}


# we will also create a separate validation data set sharing some similarity to the training data
val_latent_information = train_latent_information * 0.8 + rng.random(size = (100, 25)) * 0.2
val_data_dict = {
    "modality_1": generate_modality(val_latent_information, 25),
    "modality_2": generate_modality(val_latent_information, 50),
    "modality_3": generate_modality(val_latent_information, 75)}

## 3. Setting up the models
### 3.1 Autoencoders with individual pretraining and frozen joint training
Depending on your use case, pretraining one or more models might help the overall performance. Additionally, you might want to keep a pre-trained model in its trained state during joint training and only let the other models adapt to it. For such a scenario, each autoencoder accepts two more arguments:
- pretrain_epochs: integer number of epochs the model should be pretrained
- train_joint: boolean flag indicating whether the model should also be trained in joint mode
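
To make the interplay of the two arguments concrete, here is a minimal sketch that only describes what each combination means. It does not call the library: `PhaseConfig` and `describe_phases` are hypothetical names, and the defaults (no pretraining, joint training enabled) are an assumption based on how the first two modalities are configured in this tutorial.

```python
from typing import Dict, NamedTuple


class PhaseConfig(NamedTuple):
    """Hypothetical stand-in for the two autoencoder arguments (defaults are assumed)."""
    pretrain_epochs: int = 0
    train_joint: bool = True


def describe_phases(configs: Dict[str, PhaseConfig]) -> Dict[str, str]:
    """Summarize, per modality, what happens before and during joint training."""
    summary = {}
    for name, cfg in configs.items():
        if cfg.pretrain_epochs > 0:
            pre = f"pretrained for {cfg.pretrain_epochs} epochs"
        else:
            pre = "no pretraining"
        joint = "updated during joint training" if cfg.train_joint else "frozen during joint training"
        summary[name] = f"{pre}, {joint}"
    return summary


plan = describe_phases({
    "modality_1": PhaseConfig(),
    "modality_2": PhaseConfig(),
    "modality_3": PhaseConfig(pretrain_epochs=10, train_joint=False),
})
for name, description in plan.items():
    print(f"{name}: {description}")
```

With the settings used in the next cell, only "modality_3" is pretrained, and it then stays fixed while the other two models adapt to it.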

In [None]:
model_dict = {
    "modality_1": VariationalAutoencoder(DynamicEncoder(50, 42, 2), DynamicDecoder(50, 42, 36, 2), "adam", 0.001),
    "modality_2": VariationalAutoencoder(DynamicEncoder(75, 50, 2), DynamicDecoder(75, 50, 36, 2), "adam", 0.001),
    "modality_3": VariationalAutoencoder(DynamicEncoder(100, 75, 2), DynamicDecoder(100, 75, 36, 2), "adam", 0.001, pretrain_epochs = 10, train_joint = False)}

#### 3.2.1 Discriminator

In [None]:
discriminator = Discriminator("adam", 0.001, 36, len(model_dict), 50)

#### 3.2.2 Classifier and cluster labels

In [None]:
cluster_data = np.concatenate((np.repeat(0, 50), np.repeat(1, 50))).flatten()

In [None]:
classifier = SimpleClassifier("adam", 0.001, 36, 2)

## 4. Setting up the JointTrainer

In [None]:
model = JointTrainer(
        model_dict = model_dict,
        discriminator = discriminator,
        classifier = classifier)

## 5. Train with early stopping
Now that all parts of the model are set up again, we can begin training. This time we don't only want to train the model, we also want to make sure it stops training as soon as it stops generalizing to unseen data. Early stopping does this automatically. The train call provides two further parameters to customize the early stopping procedure:
- patience: integer number of epochs that the model is allowed to not improve on unseen data 
- min_value: float value of minimal difference between validation loss of the previous and the current epoch needed to count as an improvement

These parameters should be chosen carefully. With too little patience, training stops too early, even though the model would have recovered a few epochs later. With too much patience, the model might train longer than needed. The architecture stores a model checkpoint for you at the beginning of a consecutive series of overfitting epochs. This makes it possible to return to the point at which the model performed and generalized best.<br>
The min_value needed to count an epoch result as overfitted can be very domain specific. Depending on the scale of your data, a larger difference between training and validation loss might be less of an issue. Take care not to choose too small a value, as otherwise you might stop the training prematurely. A good practice is to do a first short training run and evaluate a good min_value based on the log.
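
The following is a minimal, library-independent sketch of how patience and min_value could interact, following the parameter definitions in the list above. `EarlyStopper` and `simulate` are hypothetical names, not part of multimodal_autoencoders, and the exact rule is an assumption: an epoch counts as an improvement when the validation loss drops by at least min_value compared to the previous epoch, and training stops once more than patience consecutive epochs fail to improve. The actual rule inside JointTrainer may differ.

```python
from typing import List, Optional, Tuple


class EarlyStopper:
    """Hypothetical early stopping helper; not the library's implementation."""

    def __init__(self, patience: int, min_value: float) -> None:
        self.patience = patience
        self.min_value = min_value
        self.prev_loss: Optional[float] = None
        self.bad_epochs = 0
        self.checkpoint_epoch: Optional[int] = None

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record one validation loss and return True when training should stop."""
        improved = (
            self.prev_loss is None
            or (self.prev_loss - val_loss) >= self.min_value
        )
        if improved:
            # a real trainer would save the model weights here
            self.bad_epochs = 0
            self.checkpoint_epoch = epoch
        else:
            self.bad_epochs += 1
        self.prev_loss = val_loss
        return self.bad_epochs > self.patience


def simulate(val_losses: List[float], patience: int,
             min_value: float) -> Tuple[Optional[int], Optional[int]]:
    """Return (epoch at which training stopped, epoch of the kept checkpoint)."""
    stopper = EarlyStopper(patience, min_value)
    for epoch, loss in enumerate(val_losses):
        if stopper.step(epoch, loss):
            return epoch, stopper.checkpoint_epoch
    return None, stopper.checkpoint_epoch


# made-up validation losses, for illustration only
losses = [1.00, 0.80, 0.65, 0.64, 0.66, 0.67, 0.60]
stop_epoch, checkpoint_epoch = simulate(losses, patience=2, min_value=0.05)
print(stop_epoch, checkpoint_epoch)
```

On these made-up losses the sketch stops at epoch 5 and keeps the checkpoint from epoch 2, the last epoch that improved by at least 0.05. Rerunning `simulate` on the validation losses from a short first training run with a few candidate values is one way to see how a given min_value would behave under this assumed rule.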

In [None]:
meter_dict = model.train(
    train_data_dict = train_data_dict,
    val_data_dict = val_data_dict,
    batch_size = 10,
    max_epochs = 10,
    recon_weight = 3,
    beta = 0.001,
    disc_weight = 3,
    anchor_weight = 1,
    cl_weight = 3,
    cluster_labels = cluster_data,
    use_gpu = False,
    patience = 2,
    min_value = 10)

print(meter_dict["loss"].avg)

## 6. Pre-training with a classifier
In some cases it might be beneficial to not only pre-train an autoencoder, but also to have it influenced by a cluster classifier. This makes it possible to reproduce the original use case published by Dai Yang et al.: pre-training an autoencoder together with a classifier, to which the other models are then aligned in the joint training phase. The train method of the JointTrainer class provides a "cluster_modality" parameter for this purpose. Simply provide the model key to which the classifier should be added during pre-training.
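
Before launching a longer run, a quick sanity check on the inputs can catch a mistyped key early. The helper below is a hypothetical sketch and not part of the library. It assumes that the cluster labels must contain exactly one label per training sample, as is the case for `cluster_data` in this tutorial.

```python
from typing import Iterable, Sequence


def check_cluster_setup(model_keys: Iterable[str], cluster_modality: str,
                        cluster_labels: Sequence[int], n_samples: int) -> None:
    """Raise early if the key or the labels do not match the rest of the setup."""
    keys = list(model_keys)
    if cluster_modality not in keys:
        raise KeyError(f"'{cluster_modality}' is not one of the model keys {keys}")
    # assumption: one cluster label per training sample
    if len(cluster_labels) != n_samples:
        raise ValueError(
            f"expected {n_samples} cluster labels, got {len(cluster_labels)}"
        )


# mirrors the tutorial setup: three modalities, 100 samples, two clusters of 50
example_keys = ["modality_1", "modality_2", "modality_3"]
example_labels = [0] * 50 + [1] * 50
check_cluster_setup(example_keys, "modality_3", example_labels, n_samples=100)
print("cluster setup looks consistent")
```

In the notebook itself, the same check could be called with `model_dict.keys()`, `cluster_data`, and the number of rows of any entry in `train_data_dict`.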

In [None]:
model_dict = {
    "modality_1": VariationalAutoencoder(DynamicEncoder(50, 42, 2), DynamicDecoder(50, 42, 36, 2), "adam", 0.001),
    "modality_2": VariationalAutoencoder(DynamicEncoder(75, 50, 2), DynamicDecoder(75, 50, 36, 2), "adam", 0.001),
    "modality_3": VariationalAutoencoder(DynamicEncoder(100, 75, 2), DynamicDecoder(100, 75, 36, 2), "adam", 0.001, pretrain_epochs = 10, train_joint = False)}

# re-initialize the trainer
model = JointTrainer(
        model_dict = model_dict,
        discriminator = discriminator,
        classifier = classifier)

# launch training with classifier in pre-training
# by providing an existing model key through the
# cluster_modality parameter
meter_dict = model.train(
    train_data_dict = train_data_dict,
    val_data_dict = val_data_dict,
    batch_size = 10,
    max_epochs = 10,
    recon_weight = 3,
    beta = 0.001,
    disc_weight = 3,
    anchor_weight = 1,
    cl_weight = 3,
    cluster_labels = cluster_data,
    cluster_modality = "modality_3",
    use_gpu = False)

print(meter_dict["loss"].avg)