# EV RUL Prediction: XGBoost & Random Forest

This notebook trains `XGBRegressor` (XGBoost) and `RandomForestRegressor` models to predict Remaining Useful Life (RUL) for EV components, then compares their performance using RMSE and MAE.

In [None]:
# Import Required Libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error
from xgboost import XGBRegressor
from sklearn.ensemble import RandomForestRegressor
import matplotlib.pyplot as plt

## Load and Prepare Data

Load the preprocessed feature set and split into training and test sets for RUL prediction.

In [None]:
# Load preprocessed features (after feature engineering)
data = pd.read_csv('../data/features/features_for_modeling.csv')

# We will predict the 'overall_health_score' as a proxy for RUL.
target_variable = 'overall_health_score'

# Prepare features (X) and target (y)
# Ensure the target variable itself is not included in the features
X = data.drop(columns=[target_variable]) 
y = data[target_variable]

# Ensure all feature columns are numeric before training
X = X.select_dtypes(include=np.number)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Models will be trained to predict: '{target_variable}'")

In [None]:

# List the numeric feature columns that will be used for training
for col in X.columns:
    print(col)
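
`select_dtypes(include=np.number)` drops non-numeric columns without warning, so it can be worth checking which ones were removed before training. The sketch below uses a small made-up DataFrame; the column names `motor_temp` and `component_id` are hypothetical and not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the feature table (column names are hypothetical)
toy = pd.DataFrame({
    "motor_temp": [61.5, 63.0, 59.8],
    "component_id": ["A1", "B2", "C3"],  # non-numeric, will be dropped
    "overall_health_score": [0.91, 0.88, 0.95],
})

features = toy.drop(columns=["overall_health_score"])
numeric_features = features.select_dtypes(include=np.number)

# Columns present before the numeric filter but missing after it
dropped = [c for c in features.columns if c not in numeric_features.columns]
print("Dropped non-numeric columns:", dropped)
```

In the notebook, the same check would compare `data.drop(columns=[target_variable]).columns` against `X.columns`.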

## Train XGBRegressor

Train an `XGBRegressor` model with default hyperparameters on the training data for RUL prediction.

In [None]:
# Train XGBoostRegressor
xgb_model = XGBRegressor(random_state=42, n_jobs=-1)
xgb_model.fit(X_train, y_train)

# Predict on test set
xgb_pred = xgb_model.predict(X_test)

## Train RandomForestRegressor

Train a RandomForestRegressor model on the training data for RUL prediction.

In [None]:
# Train RandomForestRegressor
rf_model = RandomForestRegressor(random_state=42, n_jobs=-1)
rf_model.fit(X_train, y_train)

# Predict on test set
rf_pred = rf_model.predict(X_test)

## Evaluate Model Performance

Evaluate both models using RMSE and MAE on the test set, and compare their results.

In [None]:
# Evaluate XGBoostRegressor
xgb_rmse = np.sqrt(mean_squared_error(y_test, xgb_pred))
xgb_mae = mean_absolute_error(y_test, xgb_pred)

# Evaluate RandomForestRegressor
rf_rmse = np.sqrt(mean_squared_error(y_test, rf_pred))
rf_mae = mean_absolute_error(y_test, rf_pred)

print(f"XGBoostRegressor RMSE: {xgb_rmse:.2f}, MAE: {xgb_mae:.2f}")
print(f"RandomForestRegressor RMSE: {rf_rmse:.2f}, MAE: {rf_mae:.2f}")

- RMSE (Root Mean Squared Error) is the square root of the average squared difference between predicted and actual values. Because errors are squared before averaging, it penalizes large errors more heavily and is sensitive to outliers.
- MAE (Mean Absolute Error) is the average absolute difference between predicted and actual values. Each error contributes in proportion to its size, which makes MAE more robust to outliers than RMSE.
- We use RMSE and MAE to quantify how well the model predicts the Remaining Useful Life (RUL). Lower values mean better predictions. Using both gives a more complete picture of model performance.
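
To make the difference between the two metrics concrete, here is a minimal sketch with toy numbers (not results from this notebook). Both prediction sets have the same total absolute error, but one concentrates it in a single large miss. The helper names `rmse` and `mae` are defined here for illustration only.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))

y_true = np.array([10.0, 20.0, 30.0, 40.0])
pred_even = y_true + np.array([2.0, -2.0, 2.0, -2.0])    # four errors of size 2
pred_outlier = y_true + np.array([0.0, 0.0, 0.0, 8.0])   # one error of size 8

print(f"even errors: RMSE={rmse(y_true, pred_even):.2f}, MAE={mae(y_true, pred_even):.2f}")
# even errors: RMSE=2.00, MAE=2.00
print(f"one outlier: RMSE={rmse(y_true, pred_outlier):.2f}, MAE={mae(y_true, pred_outlier):.2f}")
# one outlier: RMSE=4.00, MAE=2.00
```

MAE is identical in both cases, while RMSE doubles when the error is concentrated in one prediction. A large gap between RMSE and MAE on the test set therefore suggests that a few predictions are far off.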

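As a baseline, train the same two models on the raw data (before feature engineering). Note that this cell predicts the `RUL` column, whereas the models above predict `overall_health_score`, so the two sets of error values refer to different targets and are not directly comparable.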

In [None]:
# Load raw data (before feature engineering)
raw_data = pd.read_csv('../archive/EV_Predictive_Maintenance_Dataset_15min.csv')

# Convert Timestamp to datetime if needed
raw_data['Timestamp'] = pd.to_datetime(raw_data['Timestamp'])

# Prepare features and target from raw data
X_raw = raw_data.drop(['RUL', 'Timestamp'], axis=1)
y_raw = raw_data['RUL']

# Split into train and test sets
X_train_raw, X_test_raw, y_train_raw, y_test_raw = train_test_split(X_raw, y_raw, test_size=0.2, random_state=42)

# Train XGBoostRegressor on raw features
xgb_model_raw = XGBRegressor(random_state=42, n_jobs=-1)
xgb_model_raw.fit(X_train_raw, y_train_raw)
xgb_pred_raw = xgb_model_raw.predict(X_test_raw)

# Train RandomForestRegressor on raw features
rf_model_raw = RandomForestRegressor(random_state=42, n_jobs=-1)
rf_model_raw.fit(X_train_raw, y_train_raw)
rf_pred_raw = rf_model_raw.predict(X_test_raw)

# Evaluate models on raw features
xgb_rmse_raw = np.sqrt(mean_squared_error(y_test_raw, xgb_pred_raw))
xgb_mae_raw = mean_absolute_error(y_test_raw, xgb_pred_raw)
rf_rmse_raw = np.sqrt(mean_squared_error(y_test_raw, rf_pred_raw))
rf_mae_raw = mean_absolute_error(y_test_raw, rf_pred_raw)

print(f"XGBoostRegressor (raw) RMSE: {xgb_rmse_raw:.2f}, MAE: {xgb_mae_raw:.2f}")
print(f"RandomForestRegressor (raw) RMSE: {rf_rmse_raw:.2f}, MAE: {rf_mae_raw:.2f}")
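
The evaluation code repeats the same two metric calls for every model. One possible way to reduce the repetition is a small helper that scores any number of prediction arrays against the same targets. The function name `score_predictions` and the toy values below are assumptions for illustration; they are not part of the notebook and the numbers are not real results.

```python
import numpy as np

def score_predictions(y_true, predictions):
    """Return {model_name: {"rmse": ..., "mae": ...}} for each prediction array."""
    y_true = np.asarray(y_true, dtype=float)
    scores = {}
    for name, y_pred in predictions.items():
        errors = y_true - np.asarray(y_pred, dtype=float)
        scores[name] = {
            "rmse": float(np.sqrt(np.mean(errors ** 2))),
            "mae": float(np.mean(np.abs(errors))),
        }
    return scores

# Toy targets and predictions (made up for illustration only)
y_demo = [1.0, 2.0, 3.0]
demo_scores = score_predictions(y_demo, {
    "model_a": [1.0, 2.0, 3.0],  # perfect predictions
    "model_b": [2.0, 3.0, 4.0],  # every prediction off by 1
})
for name, s in demo_scores.items():
    print(f"{name}: RMSE={s['rmse']:.2f}, MAE={s['mae']:.2f}")
```

In the notebook this could be called as `score_predictions(y_test, {"xgb": xgb_pred, "rf": rf_pred})`, and likewise with `y_test_raw` for the raw-data models.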