# Example of Recurrent VAE + Regressor with CMAPSS dataset

In [1]:
# First install the library

# %pip install aepy

Rapidae is built on Keras 3, which supports several backends.
You can choose among the three available backends (TensorFlow, PyTorch, and JAX) by setting the environment variable "KERAS_BACKEND" before Keras is imported.
The next cell sets it.

In [2]:
import os

os.environ["KERAS_BACKEND"] = "torch"
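The backend choice above can be guarded against typos with a small wrapper. This is only a sketch: `select_backend` and `SUPPORTED_BACKENDS` are hypothetical names that are not part of Rapidae or Keras, and the allowed set simply mirrors the three backends listed in the text.

```python
import os

# Backend names listed in the text above; treated here as the allowed set (assumption).
SUPPORTED_BACKENDS = ("tensorflow", "torch", "jax")


def select_backend(name: str) -> str:
    """Hypothetical helper: validate a backend name and export KERAS_BACKEND."""
    if name not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"Unknown backend {name!r}; expected one of {SUPPORTED_BACKENDS}"
        )
    os.environ["KERAS_BACKEND"] = name
    return name


# Keras reads the variable when it is first imported,
# so this has to run before `import keras`.
selected = select_backend("torch")
```

Calling the helper after Keras has already been imported has no effect on the active backend, which is why the notebook sets the variable in its own cell before the imports.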

In [3]:
import sys

#notebook_dir = os.path.abspath('')
#sys.path.append(os.path.join(notebook_dir, '..', 'src'))

import keras
import numpy as np
from sklearn.metrics import mean_squared_error
from rapidae.pipelines import PreprocessPipeline, TrainingPipeline
from rapidae.models.vae import VAE
from rapidae.models.base import RecurrentDecoder, RecurrentEncoder
from rapidae.metrics import cmapps_score
from rapidae.data.utils import evaluate
from rapidae.data.preprocessing import CMAPSS_preprocessor
from rapidae.data.datasets import load_CMAPSS


# For reproducibility in Keras 3. This will set:
# 1) `numpy` seed
# 2) backend random seed
# 3) `python` random seed
keras.utils.set_random_seed(1)



Set the data parameters:
 - The selected subdataset of CMAPSS
 - The specific sensors
 - The length of the window
 - The smoothing intensity
 - The max RUL

For more information you can check the paper: https://www.sciencedirect.com/science/article/pii/S2665963822000537

In [4]:
dataset = 'FD003'
# sensors to work with: T30, T50, P30, PS30, phi
sensors = ['s_3', 's_4', 's_7', 's_11', 's_12']
# window length
sequence_length = 30
# smoothing intensity
alpha = 0.1
# max RUL
threshold = 125
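To make the roles of `alpha` and `sequence_length` concrete, here is a minimal NumPy sketch of exponential smoothing followed by sliding-window extraction. It is not Rapidae's implementation: the function names are hypothetical, and the smoothing convention (`alpha` weighting the newest observation) is an assumption that may differ from what the preprocessor does.

```python
import numpy as np


def exponential_smoothing(x, alpha):
    """Smooth along axis 0 with s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]."""
    x = np.asarray(x, dtype=float)
    smoothed = np.empty_like(x)
    smoothed[0] = x[0]
    for t in range(1, len(x)):
        smoothed[t] = alpha * x[t] + (1.0 - alpha) * smoothed[t - 1]
    return smoothed


def sliding_windows(x, length):
    """Stack overlapping windows into shape (n_windows, length, n_features)."""
    x = np.asarray(x)
    n_windows = x.shape[0] - length + 1
    if n_windows <= 0:
        # Series shorter than one window: nothing to extract.
        return np.empty((0, length) + x.shape[1:])
    return np.stack([x[i:i + length] for i in range(n_windows)])


# Toy engine run: 50 cycles, 5 sensors (synthetic data, not CMAPSS).
rng = np.random.default_rng(0)
raw = rng.normal(size=(50, 5))
smoothed = exponential_smoothing(raw, alpha=0.1)
windows = sliding_windows(smoothed, length=30)
print(windows.shape)  # (21, 30, 5)
```

With this convention a small `alpha` such as 0.1 smooths heavily, and a run of 50 cycles yields 50 - 30 + 1 = 21 windows.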

### Download and preprocess the dataset

Download the dataset and create a preprocessing pipeline. The CMAPSS datasets consist of multiple multivariate time series. Each dataset is further divided into training and test subsets. Each time series comes from a different engine, i.e., the data can be considered to come from a fleet of engines of the same type.

The CMAPSS_preprocessor encapsulates the operations needed to prepare the data, such as generating the RUL values, removing unused sensors, scaling, and smoothing.
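As an illustration of the RUL generation step and of what a "max RUL" means, the sketch below builds a clipped, piecewise-linear RUL target for a single engine. The function name `clipped_rul` is hypothetical, and the exact labeling used inside `CMAPSS_preprocessor` is assumed rather than confirmed.

```python
import numpy as np


def clipped_rul(n_cycles, max_rul):
    """RUL per cycle for one run-to-failure engine, capped at max_rul."""
    cycles = np.arange(1, n_cycles + 1)
    linear_rul = n_cycles - cycles  # reaches 0 at the last recorded cycle
    return np.minimum(linear_rul, max_rul)


# Toy engine that fails after 200 cycles, capped at 125.
rul = clipped_rul(n_cycles=200, max_rul=125)
```

The target stays flat at the cap while the engine is still healthy and then decreases linearly to zero, so the model is not asked to distinguish between very large RUL values early in the run.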

In [5]:
# NOT IMPLEMENTED YET
# x_train, y_train, x_val, y_val, x_test, y_test = utils.get_data(dataset, sensors,
# sequence_length, alpha, threshold)

train, test, y_test = load_CMAPSS(dataset)

preprocess_pipeline = PreprocessPipeline(
    name='CMAPPS_preprocessing', preprocessor=CMAPSS_preprocessor)

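# Note: the max RUL is passed here as the literal 100, not the `threshold`
# variable (125) defined above.
x_train, y_train, x_val, y_val, x_test, y_test = preprocess_pipeline(
    train=train, test=test, y_test=y_test, threshold=100)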

2023-12-31 20:33:30 [INFO]: +++ CMAPPS_preprocessing +++
2023-12-31 20:33:30 [INFO]: Creating folder in ../output_dir/CMAPPS_preprocessing_2023-12-31_20-33-30
2023-12-31 20:33:30 [INFO]: Selected preprocessor is a function.


### Model creation

Set the model hyperparameters.

In [6]:
timesteps = x_train.shape[1]
input_dim = x_train.shape[2]
intermediate_dim = 300
batch_size = 128
latent_dim = 2
epochs = 2
optimizer = 'adam'

Create the VAE model. Since this example works with time series, the encoder and the decoder are recurrent LSTM layers.
A regressor is also added that takes the latent space of the autoencoder as input. This regressor is in charge of predicting the RUL value.
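For intuition, the sketch below shows the generic VAE machinery that such a model relies on: sampling a latent vector with the reparameterization trick, computing the KL term, and feeding the latent vector to a regression head. It is a NumPy illustration under stated assumptions, not Rapidae's code; the linear head, its random weights, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_latent(z_mean, z_log_var, rng):
    """Reparameterization trick: z = mean + std * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(z_mean.shape)
    return z_mean + np.exp(0.5 * z_log_var) * eps


def kl_divergence(z_mean, z_log_var):
    """KL(N(mean, var) || N(0, I)), summed over the latent dimensions."""
    return -0.5 * np.sum(1.0 + z_log_var - z_mean**2 - np.exp(z_log_var), axis=-1)


# Pretend encoder outputs for a batch of 4 windows and a 2-D latent space.
z_mean = np.zeros((4, 2))
z_log_var = np.zeros((4, 2))

z = sample_latent(z_mean, z_log_var, rng)
kl = kl_divergence(z_mean, z_log_var)

# Hypothetical linear regression head mapping the latent vector to one RUL value.
w = rng.standard_normal((2, 1))
b = 0.0
rul_pred = z @ w + b
```

When the encoder outputs a standard normal posterior, as in this toy batch, the KL term is zero; during training it pulls the latent space toward that prior while the regression loss organizes it by RUL.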

In [8]:
model = VAE(input_dim=(timesteps, input_dim), latent_dim=latent_dim,
            downstream_task='regression', encoder=RecurrentEncoder, decoder=RecurrentDecoder)
# model_callbacks = utils.get_callbacks("p", model, x_train, y_train)



KeyError: 'masking_value'

### Training pipeline 

Create and launch the pipeline to train the model. This example has evaluation data, so it is passed to the pipeline as well.

In [None]:
pipeline = TrainingPipeline(
    name='training_pipeline_rul_vae', model=model, num_epochs=1)
trained_model = pipeline(x=x_train, y=y_train, x_eval=x_val, y_eval=y_val)

### Evaluation step

Let's now evaluate the model on the test set. The selected metrics are the mean squared error and the CMAPSS score.
This shows the difference between using the evaluate method with a metric imported from the scikit-learn library and with a custom one.

In [None]:
y_hat = trained_model.predict(x_test)

evaluate(y_true=np.expand_dims(y_test, axis=-1),
         y_hat=y_hat['reg'], sel_metric=mean_squared_error)
evaluate(y_true=np.expand_dims(y_test, axis=-1),
         y_hat=y_hat['reg'], sel_metric=cmapps_score.CMAPSS_Score())
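For reference, the CMAPSS score is commonly defined as an asymmetric exponential penalty that punishes late predictions (overestimated RUL) more than early ones. The sketch below follows that commonly cited form with constants 13 and 10; it is an assumption that Rapidae's `CMAPSS_Score` matches it, and the function name here is hypothetical.

```python
import numpy as np


def cmapss_score(y_true, y_pred):
    """Asymmetric score: sum of exp(-d/13) - 1 for d < 0, else exp(d/10) - 1."""
    d = np.asarray(y_pred, dtype=float).ravel() - np.asarray(y_true, dtype=float).ravel()
    penalties = np.where(d < 0, np.exp(-d / 13.0) - 1.0, np.exp(d / 10.0) - 1.0)
    return float(np.sum(penalties))


y_true = np.array([50.0, 80.0, 20.0])
perfect = cmapss_score(y_true, y_true)
early = cmapss_score(y_true, y_true - 10.0)  # RUL underestimated by 10 cycles
late = cmapss_score(y_true, y_true + 10.0)   # RUL overestimated by 10 cycles
```

Unlike the mean squared error, which treats both directions equally, this score is lower for the early predictions than for the late ones, reflecting that overestimating the remaining life of an engine is the riskier mistake.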