## Predictive Maintenance (RUL Prediction)

### 📊 Dataset Overview
| Feature | Description |
|----------|--------------|
| `Cycle` | Operating cycle count |
| `Temperature (°C)` | Operating temperature |
| `Pressure (kPa)` | System pressure |
| `Vibration (mm/s)` | Measured vibration level |
| `Voltage (V)` | Measured voltage |
| `Current (A)` | Measured current |
| `Speed (RPM)` | Rotational speed |
| `Lubrication_Level (%)` | Lubrication level |
| `Humidity (%)` | Humidity level |
| `RUL` | Remaining useful life of the component *(Target)* |
| `Machine_ID`, `Failure_Type` | Present in the raw file, dropped before modeling |

### 🎯 Problem Statement
Create a **deep learning regression model** using **LSTM/GRU** to predict the **Remaining Useful Life (RUL)** of machine components based on historical sensor readings and operational metrics.

The model should:
- Learn temporal patterns from sensor sequences  
- Estimate how long each component will function before failure  
- Support proactive maintenance scheduling  
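A sequence model such as an LSTM or GRU expects input shaped as (samples, time steps, features). The sketch below shows one common way to get there: cutting a single machine's readings into overlapping fixed-length windows. It assumes the rows are already sorted by cycle for that machine. `make_windows` and the `window` length are illustrative names and values, not part of this notebook.

```python
import numpy as np

def make_windows(features, target, window=30):
    """Slice a (time, n_features) array into overlapping windows.

    Returns X with shape (n_windows, window, n_features) and y holding
    the target value at the last time step of each window.
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    if len(features) < window:
        raise ValueError("need at least `window` time steps")
    n_windows = len(features) - window + 1
    X = np.stack([features[i:i + window] for i in range(n_windows)])
    y = target[window - 1:]
    return X, y

# Toy data: 100 time steps, 3 sensors, RUL counting down (hypothetical values)
rng = np.random.default_rng(0)
sensors = rng.normal(size=(100, 3))
rul = np.arange(100, 0, -1)

X_seq, y_seq = make_windows(sensors, rul, window=30)
print(X_seq.shape, y_seq.shape)
```

Each window is labeled with the RUL at its final time step, so the model only ever sees readings that precede the value it predicts.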

### 🧩 Modeling Goal
Train a **RandomForest regression model** (`RandomForestRegressor`) to predict the numeric `RUL` (Remaining Useful Life) value from multivariate, time-dependent sensor data.


In [18]:
import os

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


In [2]:
os.chdir("../")

In [3]:
df = pd.read_csv("raw_data/RUL_dataset/rul_dataset_realistic.csv")

In [4]:
# Drop the machine identifier and the failure label; neither is used as a feature
df = df.drop(columns=["Machine_ID", "Failure_Type"])

In [5]:
df.isnull().sum()

Cycle                    0
Temperature (°C)         0
Pressure (kPa)           0
Vibration (mm/s)         0
Voltage (V)              0
Current (A)              0
Speed (RPM)              0
Lubrication_Level (%)    0
Humidity (%)             0
RUL                      0
dtype: int64

In [6]:
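# Note: this correlates the summary statistics returned by describe(), not the
# raw observations. Use df.corr() to see how the features relate to each other.
df.describe().corr()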

| | Cycle | Temperature (°C) | Pressure (kPa) | Vibration (mm/s) | Voltage (V) | Current (A) | Speed (RPM) | Lubrication_Level (%) | Humidity (%) | RUL |
|---|---|---|---|---|---|---|---|---|---|---|
| Cycle | 1.0 | 0.999219 | 0.999506 | 0.998674 | 0.998456 | 0.998687 | 0.924289 | 0.99914 | 0.999058 | 0.999948 |
| Temperature (°C) | 0.999219 | 1.0 | 0.999586 | 0.999864 | 0.999483 | 0.999876 | 0.915121 | 0.999988 | 0.999974 | 0.99933 |
| Pressure (kPa) | 0.999506 | 0.999586 | 1.0 | 0.998977 | 0.999629 | 0.999012 | 0.926333 | 0.999472 | 0.999372 | 0.999725 |
| Vibration (mm/s) | 0.998674 | 0.999864 | 0.998977 | 1.0 | 0.999045 | 0.999999 | 0.908356 | 0.999914 | 0.999949 | 0.998722 |
| Voltage (V) | 0.998456 | 0.999483 | 0.999629 | 0.999045 | 1.0 | 0.999088 | 0.922279 | 0.999424 | 0.999343 | 0.998815 |
| Current (A) | 0.998687 | 0.999876 | 0.999012 | 0.999999 | 0.999088 | 1.0 | 0.908677 | 0.999923 | 0.999956 | 0.998741 |
| Speed (RPM) | 0.924289 | 0.915121 | 0.926333 | 0.908356 | 0.922279 | 0.908677 | 1.0 | 0.913637 | 0.912422 | 0.925908 |
| Lubrication_Level (%) | 0.99914 | 0.999988 | 0.999472 | 0.999914 | 0.999424 | 0.999923 | 0.913637 | 1.0 | 0.999995 | 0.999227 |
| Humidity (%) | 0.999058 | 0.999974 | 0.999372 | 0.999949 | 0.999343 | 0.999956 | 0.912422 | 0.999995 | 1.0 | 0.999132 |
| RUL | 0.999948 | 0.99933 | 0.999725 | 0.998722 | 0.998815 | 0.998741 | 0.925908 | 0.999227 | 0.999132 | 1.0 |
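The coefficients above are computed on the output of `describe()`, so each one compares eight summary rows (count, mean, std, min, quartiles, max) rather than the observations themselves. Because the shared `count` row dominates, the values land near 1 regardless of how the features actually relate. The following sketch illustrates the difference on synthetic data; the column values are made up and drawn independently on purpose.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the sensor table: two independent columns (assumed values)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Temperature": rng.normal(70, 10, size=5000),
    "Pressure": rng.normal(200, 15, size=5000),
})

summary_corr = demo.describe().corr()  # correlates the 8 summary rows per column
raw_corr = demo.corr()                 # correlates the 5000 raw observations

print("describe().corr():", summary_corr.loc["Temperature", "Pressure"])
print("corr():           ", raw_corr.loc["Temperature", "Pressure"])
```

On the raw data the two independent columns show a correlation near zero, while the summary-based figure stays close to 1. Calling `df.corr()` on the notebook's data frame would give the feature-to-feature and feature-to-`RUL` correlations.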


In [7]:
df

| | Cycle | Temperature (°C) | Pressure (kPa) | Vibration (mm/s) | Voltage (V) | Current (A) | Speed (RPM) | Lubrication_Level (%) | Humidity (%) | RUL |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 213 | 98.89 | 210.11 | 6.93 | 226.40 | 10.45 | 1436.0 | 49.58 | 74.07 | 125.14 |
| 1 | 137 | 70.92 | 203.99 | 5.00 | 220.26 | 10.75 | 1479.0 | 77.18 | 34.44 | 199.67 |
| 2 | 6 | 62.77 | 213.68 | 2.65 | 217.63 | 9.27 | 1324.0 | 90.19 | 50.58 | 294.36 |
| 3 | 109 | 75.25 | 209.68 | 3.89 | 213.28 | 9.30 | 1679.0 | 89.28 | 71.70 | 207.51 |
| 4 | 35 | 72.43 | 219.72 | 3.80 | 222.33 | 12.06 | 1613.0 | 90.05 | 41.43 | 268.74 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4995 | 171 | 63.45 | 217.11 | 5.06 | 219.20 | 9.71 | 1776.0 | 69.89 | 54.85 | 170.16 |
| 4996 | 223 | 70.46 | 157.61 | 6.19 | 227.29 | 10.40 | 1463.0 | 89.29 | 61.06 | 118.36 |
| 4997 | 213 | 72.91 | 186.61 | 4.80 | 220.09 | 9.02 | 1588.0 | 58.94 | 47.92 | 157.81 |
| 4998 | 282 | 83.96 | 180.54 | 5.66 | 219.04 | 9.11 | 1149.0 | 43.06 | 33.40 | 35.65 |


In [8]:
X = df.drop(columns=["RUL"])
y = df["RUL"]

In [9]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
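A plain random split places rows from the same machine on both sides of the train/test boundary. If readings from one machine are correlated (an assumption about this dataset, not something the notebook verifies), that can make test scores look better than they would on an unseen machine. A possible alternative is a group-aware split keyed on `Machine_ID`, sketched here on a toy frame with made-up values; it would require keeping `Machine_ID` around until after the split.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Toy frame: 20 machines with 10 rows each (hypothetical values)
rng = np.random.default_rng(42)
toy = pd.DataFrame({
    "Machine_ID": np.repeat(np.arange(20), 10),
    "Cycle": np.tile(np.arange(10), 20),
    "Temperature": rng.normal(70, 10, size=200),
    "RUL": rng.uniform(0, 300, size=200),
})

X_toy = toy.drop(columns=["RUL", "Machine_ID"])
y_toy = toy["RUL"]
groups = toy["Machine_ID"]

# Hold out roughly 20% of the machines, not 20% of the rows
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X_toy, y_toy, groups=groups))

X_tr, X_te = X_toy.iloc[train_idx], X_toy.iloc[test_idx]
y_tr, y_te = y_toy.iloc[train_idx], y_toy.iloc[test_idx]

# No machine should appear on both sides of the split
shared = set(groups.iloc[train_idx]) & set(groups.iloc[test_idx])
print("machines in both splits:", len(shared))
```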

In [10]:
X_train[:3000].to_csv("notebooks/transformed_data/rul_dataset.csv")

In [11]:
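# Fit the scaler on the training split only. These scaled arrays are not used
# below: the random forest is trained on the unscaled X_train.
scaler = StandardScaler()
scaled_x_train = scaler.fit_transform(X_train)
scaled_x_test = scaler.transform(X_test)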
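Tree-based models such as random forests split on thresholds, so standardizing the features generally does not change their predictions. If scaling is kept (for example, to swap in a scale-sensitive model later), one option is to bundle the scaler and the estimator in a scikit-learn pipeline so the same transformation is applied at fit and predict time. This is a minimal sketch on synthetic data; `X_demo`, `y_demo`, and the hyperparameters are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features and target standing in for the notebook's data
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = 3.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=200)

# The scaler is fitted inside the pipeline, on the training rows only
pipe = make_pipeline(
    StandardScaler(),
    RandomForestRegressor(n_estimators=50, random_state=42),
)
pipe.fit(X_demo[:150], y_demo[:150])

preds = pipe.predict(X_demo[150:])
print(preds.shape)
```

Saving the pipeline object instead of the bare model would also keep the fitted scaler alongside the estimator.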

In [12]:
model = RandomForestRegressor()

In [13]:
model.fit(X_train,y_train)

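| Parameter | Value |
|----------|--------------|
| `n_estimators` | 100 |
| `criterion` | 'squared_error' |
| `max_depth` | None |
| `min_samples_split` | 2 |
| `min_samples_leaf` | 1 |
| `min_weight_fraction_leaf` | 0.0 |
| `max_features` | 1.0 |
| `max_leaf_nodes` | None |
| `min_impurity_decrease` | 0.0 |
| `bootstrap` | True |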
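Once fitted, a `RandomForestRegressor` exposes impurity-based importances through its `feature_importances_` attribute, which can hint at which sensors drive the prediction. The sketch below uses synthetic data in which the target depends almost entirely on one column; the column names are borrowed from the dataset for readability and the values are made up.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_demo = pd.DataFrame(
    rng.normal(size=(500, 3)),
    columns=["Cycle", "Temperature", "Humidity"],
)
# Target driven almost entirely by the first column (by construction)
y_demo = -2.0 * X_demo["Cycle"] + rng.normal(scale=0.1, size=500)

forest = RandomForestRegressor(n_estimators=50, random_state=42)
forest.fit(X_demo, y_demo)

# Importances are normalized to sum to 1 across features
importances = pd.Series(forest.feature_importances_, index=X_demo.columns)
print(importances.sort_values(ascending=False))
```

Passing `random_state` as done here makes repeated runs produce the same forest; the notebook's model is created without it, so its scores can vary slightly between runs.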


In [14]:
y_pred = model.predict(X_test)

In [15]:
r2 = r2_score(y_test,y_pred)
mse = mean_squared_error(y_test,y_pred)
rmse = np.sqrt(mse)

In [16]:
print(f"r2_score: {r2}")
print(f"mse {mse}")
print(f"rmse: {rmse}")

r2_score: 0.9462979612882514
mse 250.01886577150998
rmse: 15.811984877665106
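RMSE is expressed in the same units as `RUL`, so the value of about 15.8 above can be read as a typical prediction error on that scale. Mean absolute error is a common companion metric that is less sensitive to a few large misses. A small helper along these lines could collect all four numbers; `regression_report` is a hypothetical name, and the inputs in the usage line are toy values.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_report(y_true, y_pred):
    """Return R^2, MSE, RMSE and MAE for a set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "r2": float(r2_score(y_true, y_pred)),
        "mse": float(mse),
        "rmse": float(np.sqrt(mse)),
        "mae": float(mean_absolute_error(y_true, y_pred)),
    }

# Toy example: two exact predictions and one that is off by 2
report = regression_report([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

In the notebook this would be called as `regression_report(y_test, y_pred)`.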


In [17]:
joblib.dump(model,"notebooks/trained_model/rul_model.pkl")

['notebooks/trained_model/rul_model.pkl']
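The saved file can be read back with `joblib.load` and used for prediction. The sketch below round-trips a small stand-in model through a temporary directory to show the pattern; the data and model size are placeholders. scikit-learn's documentation advises loading such files with the same library version that wrote them.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Small stand-in model trained on synthetic data
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo.sum(axis=1)
forest = RandomForestRegressor(n_estimators=10, random_state=42).fit(X_demo, y_demo)

# Save, reload, and confirm the restored model predicts identically
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "rul_model.pkl")
    joblib.dump(forest, path)
    restored = joblib.load(path)

same = np.allclose(forest.predict(X_demo), restored.predict(X_demo))
print("predictions match:", same)
```

New inputs passed to the reloaded model need the same feature columns, in the same order, as the training data.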