# Predictive Maintenance: Remaining Useful Life (RUL) Prediction - Hyperparameter Tuning

This notebook focuses on optimizing the Long Short-Term Memory (LSTM) neural network model for Remaining Useful Life (RUL) prediction using **Keras Tuner**. Building on the data preprocessing and initial model definition, this phase aims to systematically explore different model architectures and hyperparameters to achieve the best possible performance and generalization.

## Table of Contents

1.  [Setup and Data Preparation](#1.-Setup-and-Data-Preparation)
    * [1.1 Importing Libraries and Loading Preprocessed Data](#1.1-Importing-Libraries-and-Loading-Preprocessed-Data)
    * [1.2 Creating Training and Validation Sets](#1.2-Creating-Training-and-Validation-Sets)
2.  [Hyperparameter Tuning with Keras Tuner](#2.-Hyperparameter-Tuning-with-Keras-Tuner)
    * [2.1 Defining the Tunable Model Function (`build_lstm_model_`)](#2.1-Defining-the-Tunable-Model-Function-(build_lstm_model_))
    * [2.2 Configuring and Running the Hyperband Tuner](#2.2-Configuring-and-Running-the-Hyperband-Tuner)
    * [2.3 Analyzing Tuning Results](#2.3-Analyzing-Tuning-Results)
3.  [Building and Evaluating the Final Model](#3.-Building-and-Evaluating-the-Final-Model)
    * [3.1 Retrieving the Best Hyperparameters](#3.1-Retrieving-the-Best-Hyperparameters)
    * [3.2 Training the Final Model with Optimal Hyperparameters](#3.2-Training-the-Final-Model-with-Optimal-Hyperparameters)
    * [3.3 Saving the Final Model](#3.3-Saving-the-Final-Model)
    * [3.4 Loading the True RUL for Test Data](#3.4-Loading-the-True-RUL-for-Test-Data)
    * [3.5 Making Predictions with the Final Model](#3.5-Making-Predictions-with-the-Final-Model)
    * [3.6 Quantitative Evaluation of the Final Model](#3.6-Quantitative-Evaluation-of-the-Final-Model)
    * [3.7 Visualizing Final Model Performance](#3.7-Visualizing-Final-Model-Performance)
4.  [Conclusion](#4.-Conclusion)

## 1. Setup and Data Preparation

This section handles the necessary imports and loads the preprocessed data, ensuring it's ready for model building and hyperparameter tuning.

### 1.1 Importing Libraries and Loading Preprocessed Data

We import core libraries such as `tensorflow`, `numpy`, and `joblib` for model definition, data handling, and loading the saved scalers. We then load the `X_train`, `y_train`, and `X_test` NumPy arrays together with two scalers: `rul_scaler` (from `rul_scaler.pkl`) and the feature scaler (loaded from `feature_scaler.pkl` into the variable `scaler`). These were generated in the `EDA.ipynb` notebook; the arrays hold the prepared time-series sequences and their corresponding RUL values.

In [2]:
# Importing necessary modules for LSTM model
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from custom_functions import *

In [3]:
import numpy as np
# Loading Datasets and scalers
import joblib
X_train = np.load('X_train.npy')
y_train = np.load('y_train.npy')
X_test = np.load('X_test.npy')

rul_scaler = joblib.load('rul_scaler.pkl')
scaler = joblib.load('feature_scaler.pkl')

### 3.4 Loading the True RUL for Test Data

To evaluate the final model, we load the true Remaining Useful Life (RUL) values for the test engines from `RUL_FD001.txt`. These values serve as the ground truth for calculating evaluation metrics.

The `rul_df` DataFrame is processed to isolate the RUL values and convert them into a NumPy array (`y_true`) for direct comparison with predictions.

In [4]:
import pandas as pd
# Load the true RUL for the test engines
rul_df = pd.read_csv('test_rul/RUL_FD001.txt', sep=' ', header=None)
rul_df.drop(columns=[1], inplace=True) # Drop the extra column that pandas creates
rul_df.columns = ['RUL'] # Rename the column

# Converting to array for evaluation metrics
y_true = rul_df.to_numpy()
print(y_true[:10])

[[112]
 [ 98]
 [ 69]
 [ 82]
 [ 91]
 [ 93]
 [ 91]
 [ 95]
 [111]
 [ 96]]
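The extra column dropped above presumably arises because each line of the file ends with a trailing space, so `sep=' '` sees an empty second field. As a minimal sketch (using an in-memory stand-in whose layout is assumed to mirror `RUL_FD001.txt`), a whitespace regex separator avoids the spurious column altogether:

```python
import io
import pandas as pd

# Stand-in for the RUL file: one value per line, each followed by a trailing space.
# (Assumed layout; the values are the first three shown in the output above.)
sample = "112 \n98 \n69 \n"

# A single-space separator treats the trailing space as a delimiter,
# which yields an extra, all-NaN column.
with_space = pd.read_csv(io.StringIO(sample), sep=' ', header=None)

# A whitespace regex collapses runs of whitespace and ignores the trailing one.
with_regex = pd.read_csv(io.StringIO(sample), sep=r'\s+', header=None)
with_regex.columns = ['RUL']

print(with_space.shape)
print(with_regex.shape)
y_true_demo = with_regex.to_numpy()  # column vector, same layout as y_true
```

With this variant the `drop(columns=[1])` step would no longer be needed, though the original approach works equally well for this file.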


In [None]:
# --- Custom PHM2008Score Metric Class ---
# Registering the class with Keras avoids an error when the saved model is loaded later.
@tf.keras.utils.register_keras_serializable()
class PHM2008Score(tf.keras.metrics.Metric):
    def __init__(self, name='phm_2008_score', **kwargs):
        super().__init__(name=name, **kwargs)
        self.total_score = self.add_weight(name='total_score', initializer='zeros', dtype=tf.float64)
        self.num_samples = self.add_weight(name='num_samples', initializer='zeros', dtype=tf.int64)

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true = tf.cast(tf.squeeze(y_true), tf.float32)
        y_pred = tf.cast(tf.squeeze(y_pred), tf.float32)
        d = y_pred - y_true
        score_per_sample = tf.where(d < 0,
                                    tf.exp(-d / 13.0) - 1,
                                    tf.exp(d / 10.0) - 1)
        if sample_weight is not None:
            sample_weight = tf.cast(tf.squeeze(sample_weight), tf.float32)
            score_per_sample = tf.multiply(score_per_sample, sample_weight)

        self.total_score.assign_add(tf.reduce_sum(tf.cast(score_per_sample, tf.float64)))
        self.num_samples.assign_add(tf.cast(tf.shape(y_true)[0], tf.int64))

    def result(self):
        return self.total_score

    def reset_state(self):
        self.total_score.assign(0.0)
        self.num_samples.assign(0)
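To make the asymmetry of the score concrete, here is a minimal NumPy sketch of the same per-sample formula used in `update_state`. The helper name `phm2008_score_np` is hypothetical and exists only for illustration; it is not part of the notebook or of Keras.

```python
import numpy as np

def phm2008_score_np(y_true, y_pred):
    """Sum of per-sample PHM 2008 penalties (illustrative NumPy version)."""
    d = np.asarray(y_pred, dtype=float).ravel() - np.asarray(y_true, dtype=float).ravel()
    # Early predictions (d < 0) use the constant 13, late ones (d >= 0) use 10,
    # so a late prediction is penalized more than an equally large early one.
    penalties = np.where(d < 0, np.exp(-d / 13.0) - 1.0, np.exp(d / 10.0) - 1.0)
    return float(penalties.sum())

early = phm2008_score_np([100], [90])   # predicted 10 cycles too early
late = phm2008_score_np([100], [110])   # predicted 10 cycles too late
print(f"early by 10: {early:.3f}, late by 10: {late:.3f}")
```

Because the metric's `result()` returns the running sum rather than a mean, its value grows with the number of samples evaluated, so scores are only comparable across datasets of the same size.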

### 1.2 Creating Training and Validation Sets

For robust hyperparameter tuning and model evaluation, it's crucial to split the training data (`X_train`, `y_train`) into separate training and validation sets. This allows Keras Tuner to evaluate different hyperparameter combinations on data it hasn't seen during training, preventing overfitting to the training set and providing a more realistic estimate of model performance.

We use `train_test_split` from `sklearn.model_selection` to create these sets, reserving 20% of the data for validation and setting `random_state=42` for reproducibility.

For reference, the `print(y_true[:10])` output shown earlier lists the first 10 true RUL values loaded from `RUL_FD001.txt`. These ground-truth values are only used later, when the final model's predictions are assessed on the test set; they play no part in the train/validation split.

In [6]:
# Creating separate training and validation sets
from sklearn.model_selection import train_test_split
X_train, X_val, y_train, y_val = train_test_split(X_train,y_train,random_state=42,test_size=0.2)
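One thing to keep in mind: if the sequences are overlapping windows cut from the same engines, a purely random split can place near-identical windows from one engine in both sets. An alternative, sketched below under the assumption that a per-sequence array of engine IDs is available (the name `engine_ids` and the toy data are hypothetical and not produced by this notebook), is to hold out whole engines:

```python
import numpy as np

def split_by_engine(engine_ids, val_fraction=0.2, seed=42):
    """Return boolean masks (train, val) that keep each engine in one set only."""
    rng = np.random.default_rng(seed)
    engines = np.unique(engine_ids)
    rng.shuffle(engines)
    n_val = max(1, int(round(val_fraction * len(engines))))
    val_engines = engines[:n_val]
    val_mask = np.isin(engine_ids, val_engines)
    return ~val_mask, val_mask

# Toy data: 10 engines with 50 windows each, shaped like (samples, timesteps, features).
engine_ids = np.repeat(np.arange(10), 50)
X = np.zeros((engine_ids.size, 30, 61), dtype=np.float32)
y = np.zeros((engine_ids.size, 1), dtype=np.float32)

train_mask, val_mask = split_by_engine(engine_ids)
X_tr, X_va = X[train_mask], X[val_mask]
y_tr, y_va = y[train_mask], y[val_mask]
print(X_tr.shape, X_va.shape)
```

This is a design alternative rather than a correction: the random split used in this notebook is simpler, and whether engine-level grouping matters depends on how the windows were generated in `EDA.ipynb`.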

In [7]:
print(X_train.shape)

(80197, 30, 61)


The output has the form `(samples, timesteps, features)`: 80,197 training sequences, each with 30 timesteps and 61 features. This shape is essential for defining the input layer of our neural network: `X_train.shape[1]` provides the number of timesteps (sequence length), and `X_train.shape[2]` provides the number of features (sensor readings + operational settings) per timestep.

In [8]:
input_shape = (X_train.shape[1],X_train.shape[2])

## 2. Hyperparameter Tuning with Keras Tuner

This section details the core of our hyperparameter optimization process. We define a function that builds our LSTM model with various tunable parameters, then configure and execute Keras Tuner's Hyperband algorithm to efficiently search for the best combination of these parameters.

### 2.1 Defining the Tunable Model Function (`build_lstm_model_`)

The `build_lstm_model_` function is a crucial component for Keras Tuner. It takes a `HyperParameters` object (`hp`) as input, allowing us to define a search space for different architectural and training parameters.

**Tunable Hyperparameters and Search Space:**

* **`rnn_layer_type`**: Chooses between `'lstm'` and `'gru'` layers for the recurrent part of the network, allowing us to compare their effectiveness.
* **`num_rnn_layers`**: Number of recurrent layers (LSTM or GRU) to stack, from 1 to 4.
* **`{rnn_layer_type}_units_{i}`**: Number of units (neurons) in each recurrent layer, ranging from 32 to 512, in steps of 32.
* **`use_bidirectional_{i}`**: A boolean choice to use a Bidirectional wrapper around each RNN layer, potentially capturing dependencies in both forward and backward directions.
* **`use_recurrent_dropout_{i}`**: A boolean to enable recurrent dropout within the RNN layer to combat overfitting.
* **`recurrent_dropout_rate_{i}`**: Dropout rate for recurrent connections, from 0.0 to 0.3, in steps of 0.05, sampled only if `use_recurrent_dropout_{i}` is true.
* **`dropout_rnn_{i}`**: A boolean that adds a standard `Dropout` layer after each RNN layer.
* **`dropout_rate_rnn_{i}`**: Rate of that dropout layer, from 0.1 to 0.5, in steps of 0.1, sampled only if `dropout_rnn_{i}` is true.
* **`num_dense_layers`**: Number of Dense (fully connected) layers after the RNN layers, from 0 to 2.
* **`dense_units_{j}`**: Number of units in each Dense layer, from 32 to 256, in steps of 32.
* **`dense_activation_{j}`**: Activation function for each Dense layer (excluding the output layer), choosing between `'relu'`, `'tanh'`, and `'leaky_relu'`.
* **`use_l1_reg_dense_{j}`, `use_l2_reg_dense_{j}`**: Booleans that enable L1 and L2 regularization of the kernel weights of each Dense layer.
* **`l1_reg_dense_{j}`, `l2_reg_dense_{j}`**: The corresponding regularization factors, from 1e-5 to 1e-2 with logarithmic sampling.
* **`dropout_dense_{j}`**: A boolean that adds a standard `Dropout` layer after each Dense layer.
* **`dropout_rate_dense_{j}`**: Rate of that dropout layer, from 0.1 to 0.6, in steps of 0.1, sampled only if `dropout_dense_{j}` is true.
* **`optimizer`**: Choice of optimizer: `'adam'`, `'rmsprop'`, or `'sgd'`.
* **`learning_rate`**: Learning rate for the optimizer, sampled on a logarithmic scale between 1e-5 and 1e-2.
* **`loss_function`**: Choice of loss function: `'mse'` (Mean Squared Error), `'mae'` (Mean Absolute Error), or `'huber_loss'`. Huber loss is less sensitive to outliers than MSE.

This extensive search space allows for a thorough exploration of model configurations.
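Several ranges in the function below (the learning rate and the L1/L2 factors) are declared with `sampling='log'`. The sketch that follows is a plain NumPy illustration of what log-uniform sampling means, not Keras Tuner's internal code: values are drawn uniformly in the exponent, so each decade between 1e-5 and 1e-2 is about equally likely.

```python
import numpy as np

def sample_log_uniform(low, high, size, seed=0):
    """Draw values whose base-10 logarithm is uniform on [log10(low), log10(high)]."""
    rng = np.random.default_rng(seed)
    exponents = rng.uniform(np.log10(low), np.log10(high), size=size)
    return 10.0 ** exponents

rates = sample_log_uniform(1e-5, 1e-2, size=30_000)

# Share of samples per decade; with linear sampling nearly all of the mass
# would fall into the top decade instead.
for lo, hi in [(1e-5, 1e-4), (1e-4, 1e-3), (1e-3, 1e-2)]:
    share = np.mean((rates >= lo) & (rates < hi))
    print(f"[{lo:.0e}, {hi:.0e}): {share:.2f}")
```

Sampling this way gives small learning rates such as 1e-5 a realistic chance of being tried, which a linear range over the same interval would not.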

In [9]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.layers import LSTM, GRU, Dense, Dropout, Bidirectional
from tensorflow.keras.optimizers import Adam, RMSprop, SGD
from tensorflow.keras.regularizers import l1_l2
from tensorflow.keras.layers import LeakyReLU # Import LeakyReLU
from tensorflow.keras.losses import Huber # Import Huber loss

def build_lstm_model_(hp):
    """
    Builds a Keras Sequential model with LSTM or GRU layers and
    hyperparameters that Keras Tuner can tune.

    Args:
        hp: Keras Tuner HyperParameters object.

    Note:
        The input shape (timesteps, features) is not an argument; it is read
        from the notebook-level `input_shape` variable defined earlier,
        i.e. (X_train.shape[1], X_train.shape[2]).

    Returns:
        A compiled Keras model.
    """
    model = Sequential()

    # Input layer shape
    model.add(layers.Input(shape=input_shape))

    # --- Feature: Allow choice between LSTM and GRU ---
    rnn_layer_type = hp.Choice('rnn_layer_type', values=['lstm', 'gru'])

    # --- Enhanced LSTM/GRU Layer Configuration ---
    num_rnn_layers = hp.Int('num_rnn_layers', min_value=1, max_value=4, step=1)

    for i in range(num_rnn_layers):
        # Units for RNN layers
        rnn_units = hp.Int(f'{rnn_layer_type}_units_{i}', min_value=32, max_value=512, step=32)

        # Optional: Bidirectional wrapper
        use_bidirectional = hp.Boolean(f'use_bidirectional_{i}')
        use_recurrent_dropout = hp.Boolean(f'use_recurrent_dropout_{i}')
        recurrent_dropout_rate = hp.Float(f'recurrent_dropout_rate_{i}', min_value=0.0, max_value=0.3, step=0.05) if use_recurrent_dropout else 0.0

        # Determine if the layer should return sequences
        return_sequences = (i < num_rnn_layers - 1)

        if rnn_layer_type == 'lstm':
            rnn_layer = LSTM(
                units=rnn_units,
                return_sequences=return_sequences,
                recurrent_dropout=recurrent_dropout_rate
            )
        else: # GRU
            rnn_layer = GRU(
                units=rnn_units,
                return_sequences=return_sequences,
                recurrent_dropout=recurrent_dropout_rate
            )

        if use_bidirectional:
            model.add(Bidirectional(rnn_layer))
        else:
            model.add(rnn_layer)

        # Standard Dropout after RNN layer
        if hp.Boolean(f'dropout_rnn_{i}'):
            model.add(Dropout(rate=hp.Float(f'dropout_rate_rnn_{i}', min_value=0.1, max_value=0.5, step=0.1)))

    # --- Feature: Multiple Dense Layers ---
    num_dense_layers = hp.Int('num_dense_layers', min_value=0, max_value=2, step=1)

    for i in range(num_dense_layers):
        dense_activation_choice = hp.Choice(f'dense_activation_{i}', values=['relu', 'tanh', 'leaky_relu'])
        activation_function = None
        if dense_activation_choice == 'leaky_relu':
            activation_function = LeakyReLU() # Use LeakyReLU instance
        else:
            activation_function = dense_activation_choice # Use string for 'relu' or 'tanh'

        # Regularization for Dense layers
        l1_reg = hp.Float(f'l1_reg_dense_{i}', min_value=1e-5, max_value=1e-2, sampling='log') if hp.Boolean(f'use_l1_reg_dense_{i}') else 0.0
        l2_reg = hp.Float(f'l2_reg_dense_{i}', min_value=1e-5, max_value=1e-2, sampling='log') if hp.Boolean(f'use_l2_reg_dense_{i}') else 0.0

        model.add(Dense(
            units=hp.Int(f'dense_units_{i}', min_value=32, max_value=256, step=32),
            activation=activation_function,
            kernel_regularizer=l1_l2(l1=l1_reg, l2=l2_reg)
        ))
        if hp.Boolean(f'dropout_dense_{i}'):
            model.add(Dropout(rate=hp.Float(f'dropout_rate_dense_{i}', min_value=0.1, max_value=0.6, step=0.1)))

    # Output layer (1 unit for regression)
    model.add(Dense(1, activation='linear'))

    # --- Enhanced Optimizer and Learning Rate ---
    optimizer_choice = hp.Choice('optimizer', values=['adam', 'rmsprop', 'sgd'])
    learning_rate = hp.Float('learning_rate', min_value=1e-5, max_value=1e-2, sampling='log')

    if optimizer_choice == 'adam':
        optimizer = Adam(learning_rate=learning_rate)
    elif optimizer_choice == 'rmsprop':
        optimizer = RMSprop(learning_rate=learning_rate)
    else: # sgd
        optimizer = SGD(learning_rate=learning_rate)

    # Compile the model
    loss_function_choice = hp.Choice('loss_function', values=['mse', 'mae', 'huber_loss'])
    loss_function = None
    if loss_function_choice == 'huber_loss':
        loss_function = Huber() # Use Huber instance
    else:
        loss_function = loss_function_choice # Use string for 'mse' or 'mae'

    model.compile(
        optimizer=optimizer,
        loss=loss_function,
        metrics=['mae', PHM2008Score()] # MAE for interpretability, PHM2008Score for objective
    )

    return model

### 2.2 Configuring and Running the Hyperband Tuner

We initialize and run the Keras Tuner's `Hyperband` algorithm. Hyperband is an efficient hyperparameter optimization method that adaptively allocates resources (training epochs) to different hyperparameter configurations, quickly discarding poorly performing ones.

**Tuner Configuration:**

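* **`hypermodel=build_lstm_model_`**: The function that creates the Keras model with tunable hyperparameters.
* **`objective=kt.Objective('val_phm_2008_score', direction='min')`**: The metric to monitor and minimize during the search, here the PHM 2008 score computed on the validation set.
* **`max_epochs=50`**: The maximum number of epochs to train any given model candidate.
* **`factor=3`**: The reduction factor for the number of models and epochs in successive halving.
* **`directory='_lstm_tuning_dir'`**: Directory to store tuning results.
* **`project_name='lstm_hyperband_tuning'`**: Name of the sub-directory within `directory`.
* **`overwrite=False`**: Reuses any results already stored in that directory instead of starting over, which is why the output below begins with "Reloading Tuner from ...".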

The `tuner.search()` method then executes the hyperparameter search, training different model configurations on the `X_train` and `y_train` data, while validating on `X_val` and `y_val`.
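The resource-allocation idea can be sketched with a single successive-halving bracket, the building block of Hyperband. This is a simplified illustration in plain Python: the starting population of 27 configurations and the 2-epoch starting budget are made-up numbers, and Keras Tuner computes its own bracket sizes and epoch counts internally, which will differ.

```python
def successive_halving_schedule(n_configs, start_epochs, max_epochs, factor):
    """List (configs_alive, epochs_each) per round of one successive-halving bracket."""
    schedule = []
    n, epochs = n_configs, start_epochs
    while True:
        schedule.append((n, epochs))
        if n == 1:
            break
        # Keep roughly the best 1/factor of the configurations...
        n = max(1, n // factor)
        # ...and give the survivors `factor` times more epochs, capped at max_epochs.
        epochs = min(max_epochs, epochs * factor)
    return schedule

schedule = successive_halving_schedule(27, 2, max_epochs=50, factor=3)
for alive, epochs in schedule:
    print(f"{alive:>2} configurations trained for {epochs} epochs each")
```

Most configurations only ever receive a few epochs, which is where the efficiency comes from; only the survivors are trained up to `max_epochs`.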

In [10]:
import keras_tuner as kt

# --- Initialize the Tuner ---
tuner = kt.Hyperband(
    build_lstm_model_,
    objective=kt.Objective('val_phm_2008_score', direction='min'), # Keras Tuner looks for 'val_' prefix
    max_epochs=50,                     # Max epochs to train a model in Hyperband
    factor=3,                          # Reduction factor for Hyperband
    directory='_lstm_tuning_dir',    # Directory to store results
    project_name='lstm_hyperband_tuning', # Name of the project
    overwrite=False
)

# Print a summary of the search space
tuner.search_space_summary()

# --- Run the Search ---
# Create a callback to stop training when a metric has stopped improving
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_phm_2008_score', patience=10, mode='min', restore_best_weights=True)

print("\nStarting hyperparameter search...")
tuner.search(
    X_train, y_train,                # NumPy arrays for training
    validation_data=(X_val, y_val),  # NumPy arrays for validation
    callbacks=[stop_early],# Use the early stopping callback
    batch_size=2048,
    verbose=1
)

print("\nHyperparameter search complete!")

Reloading Tuner from _lstm_tuning_dir/lstm_hyperband_tuning/tuner0.json
Search space summary
Default search space size: 50
rnn_layer_type (Choice)
{'default': 'lstm', 'conditions': [], 'values': ['lstm', 'gru'], 'ordered': False}
num_rnn_layers (Int)
{'default': None, 'conditions': [], 'min_value': 1, 'max_value': 4, 'step': 1, 'sampling': 'linear'}
lstm_units_0 (Int)
{'default': None, 'conditions': [], 'min_value': 32, 'max_value': 512, 'step': 32, 'sampling': 'linear'}
use_bidirectional_0 (Boolean)
{'default': False, 'conditions': []}
use_recurrent_dropout_0 (Boolean)
{'default': False, 'conditions': []}
dropout_rnn_0 (Boolean)
{'default': False, 'conditions': []}
num_dense_layers (Int)
{'default': None, 'conditions': [], 'min_value': 0, 'max_value': 2, 'step': 1, 'sampling': 'linear'}
optimizer (Choice)
{'default': 'adam', 'conditions': [], 'values': ['adam', 'rmsprop', 'sgd'], 'ordered': False}
learning_rate (Float)
{'default': 1e-05, 'conditions': [], 'min_value': 1e-05, 'max_valu

## 3. Building and Evaluating the Final Model

Once the hyperparameter tuning is complete, we retrieve the best hyperparameters found by the tuner and use them to train a final model. In the code below, the final model is trained on the training split (`X_train`, `y_train`), while the validation split (`X_val`, `y_val`) is used for early stopping. This final model is then evaluated on the completely unseen test data.

### 3.1 Retrieving the Best Hyperparameters

We extract the `best_hps` object from the tuner. It contains the set of hyperparameters that achieved the lowest validation PHM 2008 score (`val_phm_2008_score`), the tuner's objective, during the search.

In [13]:
# --- Get the best hyperparameters ---
# Ensure 'tuner' object is initialized and has loaded previous results.
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]

print("\n--- Best Hyperparameters Found ---")

# Access values directly from best_hps.values dictionary
hp_values = best_hps.values

# General hyperparameters
best_optimizer = hp_values.get('optimizer')
best_learning_rate = hp_values.get('learning_rate')
best_loss_function = hp_values.get('loss_function')

print(f"  Optimizer: {best_optimizer}")
print(f"  Learning Rate: {best_learning_rate:.6f}")
print(f"  Loss Function: {best_loss_function}")

# --- RNN Layer Configuration ---
best_rnn_layer_type = hp_values.get('rnn_layer_type')
best_num_rnn_layers = hp_values.get('num_rnn_layers')

print(f"\n  Best RNN Layer Type: {best_rnn_layer_type.upper()}")
print(f"  Number of RNN Layers: {best_num_rnn_layers}")

for i in range(best_num_rnn_layers):
    print(f"\n  --- RNN Layer {i+1} ---")
    best_rnn_units = hp_values.get(f'{best_rnn_layer_type}_units_{i}')
    print(f"    Units: {best_rnn_units}")

    # Access boolean hyperparameters and associated rates
    use_bidirectional_key = f'use_bidirectional_{i}'
    if hp_values.get(use_bidirectional_key, False): # Using dict.get() with default
        print(f"    Bidirectional: True")
    else:
        print(f"    Bidirectional: False")

    use_recurrent_dropout_key = f'use_recurrent_dropout_{i}'
    if hp_values.get(use_recurrent_dropout_key, False):
        best_recurrent_dropout_rate = hp_values.get(f'recurrent_dropout_rate_{i}')
        print(f"    Recurrent Dropout Rate: {best_recurrent_dropout_rate:.2f}")
    else:
        print(f"    Recurrent Dropout: Not used")


    dropout_rnn_key = f'dropout_rnn_{i}'
    if hp_values.get(dropout_rnn_key, False):
        best_dropout_rate_rnn = hp_values.get(f'dropout_rate_rnn_{i}')
        print(f"    Standard Dropout Rate (after RNN): {best_dropout_rate_rnn:.2f}")
    else:
        print(f"    Standard Dropout (after RNN): Not used")


# --- Dense Layer Configuration ---
best_num_dense_layers = hp_values.get('num_dense_layers')
print(f"\n  Number of Dense Layers: {best_num_dense_layers}")

for i in range(best_num_dense_layers):
    print(f"\n  --- Dense Layer {i+1} ---")
    best_dense_units = hp_values.get(f'dense_units_{i}')
    best_dense_activation = hp_values.get(f'dense_activation_{i}')
    print(f"    Units: {best_dense_units}")
    print(f"    Activation: {best_dense_activation}")

    # Check L1 regularization
    use_l1_reg_dense_key = f'use_l1_reg_dense_{i}'
    if hp_values.get(use_l1_reg_dense_key, False):
        best_l1_reg_dense = hp_values.get(f'l1_reg_dense_{i}')
        print(f"    L1 Regularization: {best_l1_reg_dense:.6f}")
    else:
        print(f"    L1 Regularization: Not used")

    # Check L2 regularization
    use_l2_reg_dense_key = f'use_l2_reg_dense_{i}'
    if hp_values.get(use_l2_reg_dense_key, False):
        best_l2_reg_dense = hp_values.get(f'l2_reg_dense_{i}')
        print(f"    L2 Regularization: {best_l2_reg_dense:.6f}")
    else:
        print(f"    L2 Regularization: Not used")

    # Check if dropout after Dense was enabled for this layer
    dropout_dense_key = f'dropout_dense_{i}'
    if hp_values.get(dropout_dense_key, False):
        best_dropout_rate_dense = hp_values.get(f'dropout_rate_dense_{i}')
        print(f"    Dropout Rate (after Dense): {best_dropout_rate_dense:.2f}")
    else:
        print(f"    Dropout (after Dense): Not used")

print("\n----------------------------------")

# --- Get the best model(s) ---
# This part remains the same as it correctly interacts with the tuner object
best_models = tuner.get_best_models(num_models=1)
best_model = best_models[0]

print("\nSummary of the best model:")
best_model.summary()

# --- Print a summary of the tuning results ---
tuner.results_summary()


--- Best Hyperparameters Found ---
  Optimizer: adam
  Learning Rate: 0.000436
  Loss Function: mae

  Best RNN Layer Type: GRU
  Number of RNN Layers: 4

  --- RNN Layer 1 ---
    Units: 416
    Bidirectional: False
    Recurrent Dropout Rate: 0.15
    Standard Dropout Rate (after RNN): 0.50

  --- RNN Layer 2 ---
    Units: 64
    Bidirectional: True
    Recurrent Dropout Rate: 0.05
    Standard Dropout Rate (after RNN): 0.20

  --- RNN Layer 3 ---
    Units: 128
    Bidirectional: True
    Recurrent Dropout: Not used
    Standard Dropout Rate (after RNN): 0.20

  --- RNN Layer 4 ---
    Units: 288
    Bidirectional: False
    Recurrent Dropout: Not used
    Standard Dropout Rate (after RNN): 0.50

  Number of Dense Layers: 0

----------------------------------



Summary of the best model:




Results summary
Results in _lstm_tuning_dir/lstm_hyperband_tuning
Showing 10 best trials
Objective(name="val_phm_2008_score", direction="min")

Trial 0050 summary
Hyperparameters:
rnn_layer_type: gru
num_rnn_layers: 4
lstm_units_0: 224
use_bidirectional_0: False
use_recurrent_dropout_0: True
dropout_rnn_0: True
num_dense_layers: 0
optimizer: adam
learning_rate: 0.0004357443516879724
loss_function: mae
gru_units_0: 416
dropout_rate_rnn_0: 0.5
gru_units_1: 64
use_bidirectional_1: True
use_recurrent_dropout_1: True
dropout_rnn_1: True
lstm_units_1: 160
lstm_units_2: 160
use_bidirectional_2: True
use_recurrent_dropout_2: False
dropout_rnn_2: True
recurrent_dropout_rate_1: 0.05
dropout_rate_rnn_1: 0.2
recurrent_dropout_rate_2: 0.15000000000000002
dropout_rate_rnn_2: 0.2
dense_activation_0: tanh
use_l1_reg_dense_0: True
use_l2_reg_dense_0: False
dense_units_0: 224
dropout_dense_0: True
recurrent_dropout_rate_0: 0.15000000000000002
gru_units_2: 128
gru_units_3: 288
use_bidirectional_3: False
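Note that the trial summary lists `lstm_units_*` entries even though `rnn_layer_type` is `gru`. This appears to be because hyperparameters registered while building earlier trials stay in the search space, so the values dictionary can contain entries the winning model never used. A small sketch (the helper `active_rnn_units` is hypothetical; the dictionary copies a few values from the summary above) shows how to read only the entries that apply:

```python
def active_rnn_units(hp_values):
    """Return the unit counts of the RNN layers that the chosen layer type actually uses."""
    layer_type = hp_values['rnn_layer_type']
    n_layers = hp_values['num_rnn_layers']
    return [hp_values[f'{layer_type}_units_{i}'] for i in range(n_layers)]

# Subset of the values printed for trial 0050.
trial_values = {
    'rnn_layer_type': 'gru',
    'num_rnn_layers': 4,
    'lstm_units_0': 224,   # inactive: belongs to the LSTM branch
    'lstm_units_1': 160,   # inactive
    'gru_units_0': 416,
    'gru_units_1': 64,
    'gru_units_2': 128,
    'gru_units_3': 288,
}

units = active_rnn_units(trial_values)
print(units)
```

The same reasoning applies to the `dense_*` entries in the summary: with `num_dense_layers: 0`, none of them affect the model that is built.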


### 3.2 Training the Final Model with Optimal Hyperparameters

A new model (`final_model`) is built with `build_lstm_model_(best_hps)`, so it has the architecture and training configuration selected by the tuner. It is trained for up to 100 epochs on the training split (`X_train`, `y_train`), with the validation split (`X_val`, `y_val`) passed as `validation_data`.

We reuse the `stop_early` callback from the tuning step to keep the training robust:
* **`EarlyStopping(monitor='val_phm_2008_score', patience=10, mode='min', restore_best_weights=True)`**: Stops training if the validation PHM 2008 score doesn't improve for 10 consecutive epochs and restores the weights from the best epoch.
* **`ModelCheckpoint`**: Imported in Section 1.1 but not used in this cell; the trained model is saved explicitly in Section 3.3 instead.
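The patience mechanism behind `EarlyStopping` can be sketched in a few lines of plain Python. This is an illustration of the idea for a metric that should decrease, with made-up metric values; it's not the Keras implementation, and the function name `simulate_early_stopping` is hypothetical.

```python
def simulate_early_stopping(metric_per_epoch, patience):
    """Return (stop_epoch, best_epoch) for a metric that should be minimized."""
    best_value = float('inf')
    best_epoch = -1
    wait = 0  # epochs since the last improvement
    for epoch, value in enumerate(metric_per_epoch):
        if value < best_value:
            best_value, best_epoch, wait = value, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                # Training halts here; restoring the best weights would
                # roll the model back to `best_epoch`.
                return epoch, best_epoch
    return len(metric_per_epoch) - 1, best_epoch

# Made-up validation scores: the metric bottoms out at epoch 2, then stalls.
scores = [5.0, 4.0, 3.0, 3.5, 3.2, 3.1, 2.0]
stop_epoch, best_epoch = simulate_early_stopping(scores, patience=3)
print(f"stopped after epoch {stop_epoch}, best epoch was {best_epoch}")
```

The example also shows the trade-off of a small patience: training stops before the later improvement at the last epoch is ever seen.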

In [None]:
# --- Train the best model with optimal hyperparameters ---
print("\nTraining the best model with optimal hyperparameters...")

# Re-build the model using the best hyperparameters
final_model = build_lstm_model_(best_hps)

# Train the final model (you might use more epochs here)
history = final_model.fit(
    X_train,y_train,
    epochs=100, # Train for more epochs
    validation_data=(X_val, y_val),
    callbacks=[stop_early], # Use early stopping again
    batch_size=2048,
    verbose=1
)

print("\nFinal model training complete!")

### 3.3 Saving the Final Model

After training the final model, it's saved in the `.keras` format. This allows for easy loading and deployment of the trained model without needing to redefine its architecture or weights.

The output confirms the path where the model has been saved.

In [None]:
# Save the model in .keras format
model_save_path_keras = "model/final_lstm_model.keras"
final_model.save(model_save_path_keras)
print(f"Final model saved in .keras format to: {model_save_path_keras}")

### 3.5 Making Predictions with the Final Model

The loaded `LTSM_model` (which is our `final_model`) is used to make predictions on the `X_test` dataset. Since the model was trained on scaled RUL values, the predictions (`y_pred`) are also scaled. We then use the `rul_scaler` to `inverse_transform` these predictions back to their original RUL scale (number of cycles).

The output displays the first 10 actual RUL values from `y_true` and the corresponding predicted RUL values from `predictions`. This allows for a quick visual comparison of how well the model is performing on unseen data.

In [None]:
# Load the saved model
model_save_path_keras = "model/final_lstm_model.keras"
LTSM_model = tf.keras.models.load_model(model_save_path_keras)
print(f"\nModel loaded from .keras format successfully!")

In [None]:
y_pred = LTSM_model.predict(X_test)  # predictions on the scaled RUL range
predictions = rul_scaler.inverse_transform(y_pred)  # back to RUL in cycles
print(f'The first 10 actual RUL are {y_true[:10]}')
print(f'The first 10 predicted RUL are {predictions[:10]}')

### 3.6 Quantitative Evaluation of the Final Model

The `evaluate_predictions` function (presumably from `custom_functions.py`) is called to calculate key regression metrics, providing a quantitative assessment of the model's accuracy on the test set.

* **Mean Absolute Error (MAE)**: Measures the average magnitude of the errors. Lower MAE indicates better performance.
* **Root Mean Squared Error (RMSE)**: A quadratic scoring rule that measures the average magnitude of the error. It gives a relatively high weight to large errors.
* **RUL Score**: A custom metric (often used in the C-MAPSS challenge) that penalizes late predictions more heavily than early predictions, reflecting the criticality of proactive maintenance.

*You should update this section with the actual values output by `evaluate_predictions` to interpret the model's performance on the test data based on these metrics.*

In [None]:
evaluate_predictions(y_true,predictions)
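Since the body of `evaluate_predictions` isn't shown in this notebook, the following is only a hypothetical NumPy sketch of how the three metrics listed above could be computed. The function name `regression_report` and the toy inputs are assumptions, not the contents of `custom_functions.py`.

```python
import numpy as np

def regression_report(y_true, y_pred):
    """Compute MAE, RMSE and the PHM 2008 score for RUL values given in cycles."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    d = y_pred - y_true  # positive = late prediction, negative = early
    mae = np.mean(np.abs(d))
    rmse = np.sqrt(np.mean(d ** 2))
    score = np.sum(np.where(d < 0, np.exp(-d / 13.0) - 1.0, np.exp(d / 10.0) - 1.0))
    return {'MAE': float(mae), 'RMSE': float(rmse), 'RUL Score': float(score)}

# Toy example: one prediction 10 cycles late, one 13 cycles early.
report = regression_report([[100], [50]], [[110], [37]])
print(report)
```

Flattening both inputs with `ravel()` matters here: `y_true` and `predictions` are column vectors, and mixing a column vector with a flat array would silently broadcast to a matrix of pairwise differences.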

## 4. Conclusion

This notebook implemented a hyperparameter tuning process for a recurrent (LSTM or GRU) RUL prediction model using Keras Tuner. The search selected a four-layer GRU configuration as the best candidate, and a final model built from those hyperparameters was trained and evaluated on an unseen test dataset.

**Summary of Findings (to be filled based on your results):**
* The hyperparameter tuning process identified a model configuration that achieved a **[Insert best val_loss and val_mae from tuner here]** validation loss/MAE.
* The final model, trained with the optimized hyperparameters, yielded **[Insert MAE, RMSE, RUL Score from final evaluation here]** on the test set.
* The True vs. Predicted RUL plot visually confirms the model's ability to predict RUL, though there may be certain ranges where predictions are more scattered (e.g., very high RUL values).

**Potential Future Work:**
* **More Extensive Tuning**: Explore a broader range of hyperparameters or more advanced tuning algorithms (e.g., Bayesian Optimization).
* **Error Analysis**: Conduct a deeper analysis of prediction errors, especially for outliers, to understand their root causes (e.g., specific engine profiles, operational settings).
* **Uncertainty Estimation**: Implement techniques to quantify the uncertainty of RUL predictions, providing confidence intervals for maintenance decisions.
* **Deployment Considerations**: Discuss how this model could be deployed in a real-world scenario (e.g., using TensorFlow Serving).
