# LSTM Model for RUL Prediction

This notebook builds and trains an LSTM (Long Short-Term Memory) model to predict `overall_health_score`, which serves as a proxy for remaining useful life (RUL). LSTMs suit time-series data because their gated recurrent state can carry information across many time steps, which lets them capture temporal dependencies.

In [None]:
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error
import matplotlib.pyplot as plt

# Set random seed for reproducibility
tf.random.set_seed(42)
np.random.seed(42)

## 1. Load and Prepare Data

We'll load the engineered features dataset. For the LSTM, the features must be scaled and then reshaped into overlapping sequences of shape `(samples, time_steps, features)`, the 3D input that Keras LSTM layers expect.
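For reference, here is a minimal NumPy sketch of the min-max transform that `MinMaxScaler` applies with its default `(0, 1)` range: each column is mapped by x' = (x - min) / (max - min). Unlike the notebook, which fits the scaler on all rows, this sketch fits the column minima and ranges on training rows only, so the test set's range cannot leak into training. `fit_min_max` and `apply_min_max` are hypothetical helper names; constant columns get a range of 1, which is how scikit-learn avoids dividing by zero.

```python
import numpy as np

def fit_min_max(train_rows):
    """Learn per-column minimum and range from training rows only."""
    col_min = train_rows.min(axis=0)
    col_range = train_rows.max(axis=0) - col_min
    # Constant columns: use a range of 1 to avoid division by zero
    col_range = np.where(col_range == 0, 1.0, col_range)
    return col_min, col_range

def apply_min_max(rows, col_min, col_range):
    """Scale each column relative to the training minimum and range."""
    return (rows - col_min) / col_range

train_rows = np.array([[0.0, 10.0], [10.0, 10.0]])
test_rows = np.array([[5.0, 20.0]])
col_min, col_range = fit_min_max(train_rows)
print(apply_min_max(train_rows, col_min, col_range))
print(apply_min_max(test_rows, col_min, col_range))
```

Rows outside the training range are not clipped: here the test row maps to `[0.5, 10.0]`, because its second column exceeds the constant training value.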

In [None]:
# Load the dataset
data = pd.read_csv('../data/features/features_for_modeling.csv')

# Impute NaNs in the per-user deviation features with 0.
# These NaNs likely occur for users without a historical average;
# imputing with 0 assumes no deviation from that average.
impute_cols = [
    'Ambient_Humidity_vs_user_avg',
    'Ambient_Temperature_vs_user_avg',
    'Battery_Current_vs_user_avg',
    'Battery_Voltage_vs_user_avg'
]
# Assign the result back instead of calling fillna(inplace=True) on a column
# selection, which warns in pandas 2.x and has no effect under Copy-on-Write
present_cols = [col for col in impute_cols if col in data.columns]
data[present_cols] = data[present_cols].fillna(0)

# Drop any rows that still contain NaNs in other columns
data.dropna(inplace=True)

# Define the target variable
target_variable = 'overall_health_score'

# Keep only numeric feature columns; MinMaxScaler requires numeric input
features = data.drop(columns=[target_variable])
features = features.select_dtypes(include=np.number)
target = data[target_variable]

print("Data loaded successfully.")
print(f"Features shape: {features.shape}")
print(f"Target shape: {target.shape}")

In [None]:
# Confirm that no missing values remain after imputation and dropna (an empty Series means none)
missing_values = data.isnull().sum()
print(missing_values[missing_values > 0])

In [None]:
# Scale features to a range between 0 and 1
scaler = MinMaxScaler()
scaled_features = scaler.fit_transform(features)

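# Function to create overlapping sequences for the LSTM
def create_sequences(features, target, time_steps=24):
    """Slide a window of `time_steps` rows over the data.

    Each sample X[i] holds rows i .. i + time_steps - 1, and its label y[i]
    is the target at the next row (i + time_steps). Rows are assumed to
    form one contiguous, time-ordered series.
    """
    X, y = [], []
    for i in range(len(features) - time_steps):
        X.append(features[i:(i + time_steps)])
        y.append(target[i + time_steps])
    return np.array(X), np.array(y)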

# Create sequences
X_seq, y_seq = create_sequences(scaled_features, target.values, time_steps=24)

print(f"Shape of sequence data (X): {X_seq.shape}")
print(f"Shape of sequence labels (y): {y_seq.shape}")
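`create_sequences` slides one window across the whole table, so if the CSV stacks the histories of several users (the `*_vs_user_avg` columns suggest it may), some windows will join the tail of one user's history to the head of the next. The sketch below builds windows per group instead. It assumes a per-row array of group ids (for example a hypothetical `user_id` column, which this notebook does not confirm exists) and rows sorted by time within each group. `create_sequences_by_group` is a hypothetical helper; it uses `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20 or later) instead of a Python loop but keeps the same window and label convention.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def create_sequences_by_group(features, target, groups, time_steps=24):
    """Build (X, y) windows that never cross a group boundary.

    features: (n_rows, n_features) array, time-ordered within each group
    target:   (n_rows,) array aligned with features
    groups:   (n_rows,) array of group ids (hypothetical, e.g. a user id)
    """
    X_parts, y_parts = [], []
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        f, t = features[idx], target[idx]
        if len(f) <= time_steps:
            continue  # too short to yield one window plus its next-step label
        # Shape (n - T + 1, n_features, T): the window axis is appended last
        windows = sliding_window_view(f, time_steps, axis=0)
        # Drop the final window (it has no next-step label); move time to axis 1
        X_parts.append(windows[:-1].transpose(0, 2, 1))
        # Label for window i is the row right after it, as in create_sequences
        y_parts.append(t[time_steps:])
    if not X_parts:
        return (np.empty((0, time_steps, features.shape[1])),
                np.empty((0,), dtype=target.dtype))
    return np.concatenate(X_parts), np.concatenate(y_parts)
```

If such a column existed, usage would look like `create_sequences_by_group(scaled_features, target.values, data['user_id'].values)`.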

## 2. Split Data into Training and Testing Sets

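We'll split the sequenced data chronologically, using the first 80% of sequences for training and the last 20% for testing. Consecutive sequences share all but one of their 24 time steps, so a shuffled split would put near-duplicate windows in both sets and overstate test performance.

In [None]:
# Split the data chronologically (no shuffling) to limit leakage between overlapping windows
X_train, X_test, y_train, y_test = train_test_split(X_seq, y_seq, test_size=0.2, shuffle=False)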

print(f"X_train shape: {X_train.shape}")
print(f"X_test shape: {X_test.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"y_test shape: {y_test.shape}")
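When sequences are split chronologically (for example with `train_test_split(..., shuffle=False)`), the last training windows and the first test windows still share raw rows near the boundary. One common remedy, sometimes called a purge or gap, is to discard the `time_steps` training windows immediately before the test set. A minimal NumPy sketch, assuming the sequences come from a single contiguous, time-ordered series as produced by `create_sequences`; `chronological_split` is a hypothetical helper, not a scikit-learn function.

```python
import numpy as np

def chronological_split(X, y, test_size=0.2, gap=0):
    """Split time-ordered sequences without shuffling.

    The last ceil(n * test_size) sequences form the test set. The `gap`
    sequences just before them are discarded; with gap >= time_steps, no
    training window (or its label row) shares a row with a test window.
    """
    n = len(X)
    n_test = int(np.ceil(n * test_size))
    split = n - n_test
    train_end = max(split - gap, 0)
    return X[:train_end], X[split:], y[:train_end], y[split:]
```

In this notebook that would be `chronological_split(X_seq, y_seq, test_size=0.2, gap=24)`, trading 24 training sequences for a clean boundary.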

## 3. Build the LSTM Model

We will construct a Sequential LSTM model with the following layers:
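- **Two stacked LSTM layers (50 units each)**: These capture temporal patterns. The first uses `return_sequences=True` so that it passes its output at every time step to the second LSTM layer; the second returns only its output at the final time step.
- **Dropout layers (rate 0.2)**: During training only, each randomly sets 20% of its input units to 0 to reduce overfitting.
- **Dense layer (1 unit)**: A fully connected output layer with linear activation that produces the predicted health score.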
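To make `return_sequences` concrete, here is a minimal NumPy forward pass of one LSTM layer using the standard gate equations (input, forget, candidate, output). It is an illustrative sketch, not Keras's implementation: the weight layout, the gate order `i, f, g, o`, the zero initial state, and the 5-feature input are assumptions, and the random weights are untrained. With `return_sequences=True` it returns the hidden state at every time step, shape `(batch, time_steps, units)`, which is what a stacked second LSTM layer consumes; otherwise it returns only the last hidden state, shape `(batch, units)`, which a Dense layer consumes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x, W, U, b, return_sequences=False):
    """Run one LSTM layer over x of shape (batch, time_steps, n_features).

    W: (n_features, 4 * units) input kernel
    U: (units, 4 * units) recurrent kernel
    b: (4 * units,) bias; gate blocks assumed ordered i, f, g, o
    """
    batch, steps, _ = x.shape
    units = U.shape[0]
    h = np.zeros((batch, units))  # hidden state
    c = np.zeros((batch, units))  # cell state
    outputs = []
    for t in range(steps):
        z = x[:, t, :] @ W + h @ U + b
        i, f, g, o = np.split(z, 4, axis=1)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates in (0, 1)
        g = np.tanh(g)                                # candidate values
        c = f * c + i * g                             # forget old, add new
        h = o * np.tanh(c)                            # exposed hidden state
        outputs.append(h)
    return np.stack(outputs, axis=1) if return_sequences else h

def init_lstm(n_in, units, rng):
    """Small random weights for illustration only (untrained)."""
    return (rng.normal(0, 0.1, (n_in, 4 * units)),
            rng.normal(0, 0.1, (units, 4 * units)),
            np.zeros(4 * units))

rng = np.random.default_rng(42)
x = rng.random((2, 24, 5))  # 2 sequences, 24 time steps, 5 hypothetical features
layer1 = init_lstm(5, 50, rng)
layer2 = init_lstm(50, 50, rng)
seq = lstm_forward(x, *layer1, return_sequences=True)
last = lstm_forward(seq, *layer2)
print(seq.shape, last.shape)  # (2, 24, 50) (2, 50)
```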


In [None]:
# Build the LSTM model
model = Sequential([
    LSTM(50, return_sequences=True, input_shape=(X_train.shape[1], X_train.shape[2])),
    Dropout(0.2),
    LSTM(50),
    Dropout(0.2),
    Dense(1)
])

# Clip the gradient norm to 1.0 to guard against exploding gradients, which can produce NaN loss
optimizer = tf.keras.optimizers.Adam(clipnorm=1.0)
model.compile(optimizer=optimizer, loss='mean_squared_error')
model.summary()

## 4. Train the LSTM Model

Now, we'll train the model. `validation_split=0.2` holds out the last 20% of the training sequences to monitor validation loss, and `EarlyStopping` halts training once the validation loss has not improved for 10 consecutive epochs, then restores the weights from the best epoch.

In [None]:
from tensorflow.keras.callbacks import EarlyStopping

# Define early stopping
early_stopping = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)

# Train the model
history = model.fit(
    X_train, y_train,
    epochs=100,
    batch_size=32,
    validation_split=0.2,
    callbacks=[early_stopping],
    verbose=1
)

print("Model training complete.")
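The patience rule can be sketched in a few lines of plain Python. This is a simplified model of `EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)` with the default `min_delta=0`, not the Keras source: `early_stopping_epochs` is a hypothetical helper that takes a list of per-epoch validation losses and returns the (0-indexed) epoch whose weights would be restored and the epoch at which training stops.

```python
def early_stopping_epochs(val_losses, patience=10):
    """Return (best_epoch, stop_epoch) for a patience-based early stop.

    An epoch counts as an improvement only if its loss is strictly lower
    than the best seen so far (min_delta = 0).
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch  # stop; restore weights from best_epoch
    return best_epoch, len(val_losses) - 1  # ran out of epochs without stopping

losses = [5.0, 4.0, 3.0] + [3.5] * 10 + [1.0]
print(early_stopping_epochs(losses, patience=10))  # (2, 12)
```

Here training stops at epoch 12 and restores epoch 2, so the lower loss at epoch 13 is never seen; a larger `patience` trades training time for tolerance of such plateaus.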

## 5. Evaluate Model Performance

After training, we'll evaluate the model's performance on the test set using RMSE and MAE. This will tell us how well the model generalizes to new, unseen data.

In [None]:
# Predict on the test set
y_pred = model.predict(X_test)

# Calculate RMSE and MAE
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
mae = mean_absolute_error(y_test, y_pred)

print(f"LSTM Model Test RMSE: {rmse:.4f}")
print(f"LSTM Model Test MAE: {mae:.4f}")
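RMSE and MAE reduce to one line each in NumPy: RMSE = sqrt(mean((y - ŷ)²)) and MAE = mean(|y - ŷ|). The sketch below (hypothetical `rmse` and `mae` helpers) flattens both arrays first: `model.predict` returns shape `(n, 1)` for a single-unit Dense output, and subtracting that from a 1-D `y_test` of shape `(n,)` would broadcast to an `(n, n)` matrix and silently give the wrong number. scikit-learn's metric functions accept the column-vector predictions, which is why the cell above works as written.

```python
import numpy as np

def rmse(y_true, y_pred):
    err = np.ravel(y_true) - np.ravel(y_pred)  # flatten to avoid (n, n) broadcasting
    return float(np.sqrt(np.mean(err ** 2)))

def mae(y_true, y_pred):
    err = np.ravel(y_true) - np.ravel(y_pred)
    return float(np.mean(np.abs(err)))

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([[1.0], [2.0], [5.0]])  # column vector, as from model.predict
print(rmse(y_true, y_pred), mae(y_true, y_pred))
```

Only the third prediction is off (by 2), so MAE is 2/3 and RMSE is sqrt(4/3), about 1.155; RMSE exceeds MAE because squaring weights the single large error more heavily.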

## 6. Visualize Training History

Plotting the training and validation loss over epochs shows how the model learned; validation loss that rises while training loss keeps falling is a sign of overfitting.

In [None]:
plt.figure(figsize=(12, 6))
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('LSTM Model Training History')
plt.xlabel('Epoch')
plt.ylabel('Loss (MSE)')
plt.legend()
plt.grid(True)
plt.show()

## 7. Visualize Predictions vs. Actual Values

A scatter plot of predicted vs. actual values can give us a good visual sense of the model's performance. A perfect model would have all points on the diagonal line.

In [None]:
plt.figure(figsize=(10, 10))
plt.scatter(y_test, y_pred, alpha=0.3)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], '--r', linewidth=2)
plt.title('Actual vs. Predicted Health Score')
plt.xlabel('Actual Values')
plt.ylabel('Predicted Values')
plt.axis('square')
plt.grid(True)
plt.show()

## 8. Hyperparameter Tuning: A More Complex LSTM

The initial LSTM model did not perform as well as the tree-based models. Let's try a larger architecture to see whether it can capture more intricate patterns: we double the LSTM units from 50 to 100, raise the dropout rate from 0.2 to 0.3, and add a 50-unit ReLU Dense layer before the output.

In [None]:
# Build a more complex LSTM model
tuned_model = Sequential([
    LSTM(100, return_sequences=True, input_shape=(X_train.shape[1], X_train.shape[2])),
    Dropout(0.3),
    LSTM(100, return_sequences=False),
    Dropout(0.3),
    Dense(50, activation='relu'),
    Dense(1)
])

# Compile the model with gradient clipping
optimizer = tf.keras.optimizers.Adam(clipnorm=1.0)
tuned_model.compile(optimizer=optimizer, loss='mean_squared_error')
tuned_model.summary()

# Train the new model
history_tuned = tuned_model.fit(
    X_train, y_train,
    epochs=100,
    batch_size=32,
    validation_split=0.2,
    callbacks=[early_stopping],
    verbose=1
)

# Evaluate the tuned model
y_pred_tuned = tuned_model.predict(X_test)
rmse_tuned = np.sqrt(mean_squared_error(y_test, y_pred_tuned))
mae_tuned = mean_absolute_error(y_test, y_pred_tuned)

print(f"Tuned LSTM Model Test RMSE: {rmse_tuned:.4f}")
print(f"Tuned LSTM Model Test MAE: {mae_tuned:.4f}")
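To put the change in architecture into numbers, the trainable-parameter counts printed by `model.summary()` and `tuned_model.summary()` can be reproduced by hand. An LSTM layer has four gates, each with an input kernel, a recurrent kernel and a bias, giving 4 × (units × (input_dim + units) + units) parameters; a Dense layer has (input_dim + 1) × units; Dropout has none. The feature count of 20 below is hypothetical; substitute `X_train.shape[2]`.

```python
def lstm_params(input_dim, units):
    # 4 gates x (input kernel + recurrent kernel + bias)
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # weight matrix plus one bias per unit
    return (input_dim + 1) * units

n_features = 20  # hypothetical; use X_train.shape[2] in the notebook

base_total = (lstm_params(n_features, 50) + lstm_params(50, 50)
              + dense_params(50, 1))
tuned_total = (lstm_params(n_features, 100) + lstm_params(100, 100)
               + dense_params(100, 50) + dense_params(50, 1))
print(f"Base LSTM parameters:  {base_total:,}")
print(f"Tuned LSTM parameters: {tuned_total:,}")
```

With 20 features this gives 34,451 versus 133,901 parameters, nearly four times as many, mostly from the second LSTM layer, whose recurrent kernel grows with the square of the unit count.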