AER 850 Machine Learning Project
Hussein Hamie 500876254
Mishran Haque 500896460

**Turbofan Engine Remaining Useful Life Prediction**
**DATASET:** NASA Turbofan Jet Engine Data Set

The data sets consist of multiple multivariate time series. Each data set is further divided into training and test subsets. Each time series comes from a different engine, i.e., the data can be considered to be from a fleet of engines of the same type. Each engine starts with a different degree of initial wear and manufacturing variation, which is unknown to the user. This wear and variation is considered normal, i.e., it is not considered a fault condition. Three operational settings, which have a substantial effect on engine performance, are also included in the data. The data are contaminated with sensor noise.

The engine operates normally at the start of each time series and develops a fault at some point during the series. In the training set, the fault grows in magnitude until system failure. In the test set, the time series ends some time prior to system failure. The objective of the competition is to predict the number of remaining operational cycles before failure in the test set, i.e., the number of operational cycles after the last recorded cycle during which the engine will continue to operate. A vector of true Remaining Useful Life (RUL) values for the test data is also provided.
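The RUL target follows directly from this description. In the training set every trajectory runs to failure, so the RUL at a given cycle can be taken as the engine's last recorded cycle minus the current cycle. In the test set the provided true-RUL value applies to the last recorded cycle, and earlier cycles add the cycles still remaining in the series. The sketch below assumes the `unit_number` / `time_cycles` column names used later in this notebook and that the true-RUL vector is ordered by unit number starting at 1; the helpers `add_train_rul` and `add_test_rul` are hypothetical names, and the data are synthetic.

```python
import pandas as pd


def add_train_rul(df: pd.DataFrame) -> pd.DataFrame:
    """Add an 'RUL' column to run-to-failure (training) data.

    Assumes each unit's last recorded cycle is its failure cycle.
    """
    out = df.copy()
    # Last observed cycle of each engine, broadcast back to all of that engine's rows
    last_cycle = out.groupby("unit_number")["time_cycles"].transform("max")
    out["RUL"] = last_cycle - out["time_cycles"]
    return out


def add_test_rul(df: pd.DataFrame, true_rul) -> pd.DataFrame:
    """Add an 'RUL' column to truncated (test) data using a true-RUL vector.

    Assumes true_rul[i] belongs to unit i + 1 (units numbered from 1 in file order).
    """
    out = df.copy()
    last_cycle = out.groupby("unit_number")["time_cycles"].transform("max")
    # RUL at the final recorded cycle of each engine
    rul_at_end = out["unit_number"].map(lambda unit: true_rul[unit - 1])
    out["RUL"] = rul_at_end + (last_cycle - out["time_cycles"])
    return out


# Synthetic example: two engines with 3 and 2 recorded cycles
demo = pd.DataFrame({"unit_number": [1, 1, 1, 2, 2],
                     "time_cycles": [1, 2, 3, 1, 2]})
print(add_train_rul(demo)["RUL"].tolist())          # [2, 1, 0, 1, 0]
print(add_test_rul(demo, [10, 5])["RUL"].tolist())  # [12, 11, 10, 6, 5]
```

Both helpers return a copy, so the input frame is left unchanged.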

The data are provided as a zip-compressed text file with 26 columns of numbers, separated by spaces. Each row is a snapshot of data taken during a single operational cycle, each column is a different variable. The columns correspond to:
1) unit number
2) time, in cycles
3) operational setting 1
4) operational setting 2
5) operational setting 3
6) sensor measurement 1
7) sensor measurement 2
...
26) sensor measurement 21
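The 26 columns break down as 2 index columns, 3 operational settings and 21 sensor channels (2 + 3 + 21 = 26). A minimal sketch of generating matching column names programmatically; the short `sensor_1` … `sensor_21` names are hypothetical placeholders, not the descriptive names used later in this notebook.

```python
# Two index columns, three operational settings and 21 sensor channels
index_cols = ["unit_number", "time_cycles"]
setting_cols = [f"setting_{i}" for i in range(1, 4)]
sensor_cols = [f"sensor_{i}" for i in range(1, 22)]  # sensor_1 ... sensor_21
all_cols = index_cols + setting_cols + sensor_cols

# 2 + 3 + 21 = 26, one name per space-separated column in the files
assert len(all_cols) == 26
print(all_cols[:5], "...", all_cols[-1])
```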

Data Set: FD001
Train trajectories: 100
Test trajectories: 100
Conditions: ONE (Sea Level)
Fault Modes: ONE (HPC Degradation)

Data Set: FD002
Train trajectories: 260
Test trajectories: 259
Conditions: SIX
Fault Modes: ONE (HPC Degradation)

Data Set: FD003
Train trajectories: 100
Test trajectories: 100
Conditions: ONE (Sea Level)
Fault Modes: TWO (HPC Degradation, Fan Degradation)

Data Set: FD004
Train trajectories: 248
Test trajectories: 249
Conditions: SIX
Fault Modes: TWO (HPC Degradation, Fan Degradation)
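Since each trajectory is a separate engine with its own `unit_number`, the trajectory counts listed above can be checked against the loaded files by counting distinct units. A small sketch, assuming the `unit_number` column name used later in this notebook; `EXPECTED_TRAJECTORIES` and `count_trajectories` are hypothetical names, and the demo frame is synthetic.

```python
import pandas as pd

# Trajectory counts copied from the subset descriptions above
EXPECTED_TRAJECTORIES = {
    "FD001": {"train": 100, "test": 100},
    "FD002": {"train": 260, "test": 259},
    "FD003": {"train": 100, "test": 100},
    "FD004": {"train": 248, "test": 249},
}


def count_trajectories(df: pd.DataFrame) -> int:
    """Each trajectory is one engine with its own unit_number, so count distinct units."""
    return int(df["unit_number"].nunique())


# Synthetic stand-in with three engines
demo = pd.DataFrame({"unit_number": [1, 1, 2, 3, 3, 3],
                     "time_cycles": [1, 2, 1, 1, 2, 3]})
print(count_trajectories(demo))  # 3
```

On the real data, a check such as `count_trajectories(FD001_train) == EXPECTED_TRAJECTORIES["FD001"]["train"]` would be expected to hold if the files loaded correctly.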

Reference: A. Saxena, K. Goebel, D. Simon, and N. Eklund, ‘Damage Propagation Modeling for Aircraft Engine Run-to-Failure Simulation’, in the Proceedings of the 1st International Conference on Prognostics and Health Management (PHM08), Denver CO, Oct 2008.

In [65]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import RandomForestRegressor
import sklearn
from sklearn.metrics import mean_squared_error, r2_score
import os
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import random
import warnings
#warnings.filterwarnings('ignore')

In [66]:
# define filepath to read data
dir_path = './CMaps/'
# define column names for easy indexing
index_names = ['unit_number', 'time_cycles']
setting_names = ['setting_1', 'setting_2', 'setting_3']
# 21 sensor channels, in file column order
sensor_names = ["(Total temperature at fan inlet) (◦R)",
                "(Total temperature at LPC outlet) (◦R)",
                "(Total temperature at HPC outlet) (◦R)",
                "(Total temperature at LPT outlet) (◦R)",
                "(Pressure at fan inlet) (psia)",
                "(Total pressure in bypass-duct) (psia)",
                "(Total pressure at HPC outlet) (psia)",
                "(Physical fan speed) (rpm)",
                "(Physical core speed) (rpm)",
                "(Engine pressure ratio(P50/P2)",
                "(Static pressure at HPC outlet) (psia)",
                "(Ratio of fuel flow to Ps30) (pps/psia)",
                "(Corrected fan speed) (rpm)",
                "(Corrected core speed) (rpm)",
                "(Bypass Ratio) ",
                "(Burner fuel-air ratio)",
                "(Bleed Enthalpy)",
                "(Demanded fan speed)",
                "(Demanded corrected fan speed)",
                "(HPT coolant bleed (lbm/s))",
                "(LPT coolant bleed (lbm/s))"]
col_names = index_names + setting_names + sensor_names

In [67]:
# Use the dir_path defined above; r'\s+' is a raw string so the regex escape is not treated as a Python escape
FD001_train = pd.read_csv(os.path.join(dir_path, 'train_FD001.txt'), sep=r'\s+', header=None, index_col=False, names=col_names)
FD001_test = pd.read_csv(os.path.join(dir_path, 'test_FD001.txt'), sep=r'\s+', header=None, index_col=False, names=col_names)

FD002_train = pd.read_csv(os.path.join(dir_path, 'train_FD002.txt'), sep=r'\s+', header=None, index_col=False, names=col_names)
FD002_test = pd.read_csv(os.path.join(dir_path, 'test_FD002.txt'), sep=r'\s+', header=None, index_col=False, names=col_names)

FD003_train = pd.read_csv(os.path.join(dir_path, 'train_FD003.txt'), sep=r'\s+', header=None, index_col=False, names=col_names)
FD003_test = pd.read_csv(os.path.join(dir_path, 'test_FD003.txt'), sep=r'\s+', header=None, index_col=False, names=col_names)

FD004_train = pd.read_csv(os.path.join(dir_path, 'train_FD004.txt'), sep=r'\s+', header=None, index_col=False, names=col_names)
FD004_test = pd.read_csv(os.path.join(dir_path, 'test_FD004.txt'), sep=r'\s+', header=None, index_col=False, names=col_names)
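The eight `read_csv` calls above follow a single pattern, so they could also be written as a loop over the subset IDs. A minimal sketch, assuming the `train_FDxxx.txt` / `test_FDxxx.txt` file names used above; the `load_cmapss` helper and its `(subset, split)` dictionary keys are hypothetical.

```python
import os

import pandas as pd


def load_cmapss(dir_path, col_names, subsets=("FD001", "FD002", "FD003", "FD004")):
    """Read train/test files for each subset into a dict keyed by (subset, split)."""
    frames = {}
    for subset in subsets:
        for split in ("train", "test"):
            path = os.path.join(dir_path, f"{split}_{subset}.txt")
            # r"\s+" is a raw string, so the regex escape reaches pandas unchanged
            frames[(subset, split)] = pd.read_csv(path, sep=r"\s+", header=None,
                                                  index_col=False, names=col_names)
    return frames
```

With the column names defined earlier, `frames = load_cmapss(dir_path, col_names)` should give `frames[("FD001", "train")]` equal to `FD001_train`. Keeping the frames in one dictionary makes it easier to apply the same cleaning step to every subset in a loop.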

In [68]:
FD001_train.head()

,unit_number,time_cycles,setting_1,setting_2,setting_3,(Total temperature at fan inlet) (◦R),(Total temperature at LPC outlet) (◦R),(Total temperature at HPC outlet) (◦R),(Total temperature at LPT outlet) (◦R),(Pressure at fan inlet) (psia),...,(Ratio of fuel flow to Ps30) (pps/psia),(Corrected fan speed) (rpm),(Corrected core speed) (rpm),(Bypass Ratio),(Burner fuel-air ratio),(Bleed Enthalpy),(Demanded fan speed),(Demanded corrected fan speed),(HPT coolant bleed (lbm/s)),(LPT coolant bleed (lbm/s))
0,1,1,-0.0007,-0.0004,100.0,518.67,641.82,1589.7,1400.6,14.62,...,521.66,2388.02,8138.62,8.4195,0.03,392,2388,100.0,39.06,23.419
1,1,2,0.0019,-0.0003,100.0,518.67,642.15,1591.82,1403.14,14.62,...,522.28,2388.07,8131.49,8.4318,0.03,392,2388,100.0,39.0,23.4236
2,1,3,-0.0043,0.0003,100.0,518.67,642.35,1587.99,1404.2,14.62,...,522.42,2388.03,8133.23,8.4178,0.03,390,2388,100.0,38.95,23.3442
3,1,4,0.0007,0.0,100.0,518.67,642.35,1582.79,1401.87,14.62,...,522.86,2388.08,8133.83,8.3682,0.03,392,2388,100.0,38.88,23.3739
4,1,5,-0.0019,-0.0002,100.0,518.67,642.37,1582.85,1406.22,14.62,...,522.19,2388.04,8133.8,8.4294,0.03,393,2388,100.0,38.9,23.4044


In [69]:
FD001_train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20631 entries, 0 to 20630
Data columns (total 26 columns):
 #   Column                                   Non-Null Count  Dtype  
---  ------                                   --------------  -----  
 0   unit_number                              20631 non-null  int64  
 1   time_cycles                              20631 non-null  int64  
 2   setting_1                                20631 non-null  float64
 3   setting_2                                20631 non-null  float64
 4   setting_3                                20631 non-null  float64
 5   (Total temperature at fan inlet) (◦R)    20631 non-null  float64
 6   (Total temperature at LPC outlet) (◦R)   20631 non-null  float64
 7   (Total temperature at HPC outlet) (◦R)   20631 non-null  float64
 8   (Total temperature at LPT outlet) (◦R)   20631 non-null  float64
 9   (Pressure at fan inlet) (psia)           20631 non-null  float64
 10  (Total pressure in bypass-duct) (psia)   20631

In [70]:
FD001_train.describe(include='all')

,unit_number,time_cycles,setting_1,setting_2,setting_3,(Total temperature at fan inlet) (◦R),(Total temperature at LPC outlet) (◦R),(Total temperature at HPC outlet) (◦R),(Total temperature at LPT outlet) (◦R),(Pressure at fan inlet) (psia),...,(Ratio of fuel flow to Ps30) (pps/psia),(Corrected fan speed) (rpm),(Corrected core speed) (rpm),(Bypass Ratio),(Burner fuel-air ratio),(Bleed Enthalpy),(Demanded fan speed),(Demanded corrected fan speed),(HPT coolant bleed (lbm/s)),(LPT coolant bleed (lbm/s))
count,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,...,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0
mean,51.506568,108.807862,-9e-06,2e-06,100.0,518.67,642.680934,1590.523119,1408.933782,14.62,...,521.41347,2388.096152,8143.752722,8.442146,0.03,393.210654,2388.0,100.0,38.816271,23.289705
std,29.227633,68.88099,0.002187,0.000293,0.0,6.537152e-11,0.500053,6.13115,9.000605,3.3947e-12,...,0.737553,0.071919,19.076176,0.037505,1.556432e-14,1.548763,0.0,0.0,0.180746,0.108251
min,1.0,1.0,-0.0087,-0.0006,100.0,518.67,641.21,1571.04,1382.25,14.62,...,518.69,2387.88,8099.94,8.3249,0.03,388.0,2388.0,100.0,38.14,22.8942
25%,26.0,52.0,-0.0015,-0.0002,100.0,518.67,642.325,1586.26,1402.36,14.62,...,520.96,2388.04,8133.245,8.4149,0.03,392.0,2388.0,100.0,38.7,23.2218
50%,52.0,104.0,0.0,0.0,100.0,518.67,642.64,1590.1,1408.04,14.62,...,521.48,2388.09,8140.54,8.4389,0.03,393.0,2388.0,100.0,38.83,23.2979
75%,77.0,156.0,0.0015,0.0003,100.0,518.67,643.0,1594.38,1414.555,14.62,...,521.95,2388.14,8148.31,8.4656,0.03,394.0,2388.0,100.0,38.95,23.3668
max,100.0,362.0,0.0087,0.0006,100.0,518.67,644.53,1616.91,1441.49,14.62,...,523.38,2388.56,8293.72,8.5848,0.03,400.0,2388.0,100.0,39.43,23.6184
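The `std` row above shows several columns with zero spread, and some whose spread is tiny but not exactly zero (for example 6.537152e-11 for the fan-inlet temperature, whose min and max are both 518.67). A non-zero value like that can come from floating-point rounding in the standard-deviation calculation, so a strict `std == 0` test may miss such columns. A minimal sketch using a tolerance; the `1e-8` threshold and the `near_constant_columns` helper are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def near_constant_columns(df: pd.DataFrame, tol: float = 1e-8) -> list:
    """Return numeric columns whose sample standard deviation is <= tol."""
    std = df.select_dtypes(include=np.number).std()
    return std[std <= tol].index.tolist()


demo = pd.DataFrame({"flat": [518.67] * 4,
                     "noisy": [641.82, 642.15, 642.35, 642.37]})
print(near_constant_columns(demo))  # ['flat']
```

With this threshold, `setting_1` and `setting_2` (std around 1e-3 and 1e-4 in the summary above) would not be flagged. Whether they are useful features is a separate question.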


In [71]:
#Checking for missing (null) values in each column
FD001_train.isnull().sum()

unit_number                                0
time_cycles                                0
setting_1                                  0
setting_2                                  0
setting_3                                  0
(Total temperature at fan inlet) (◦R)      0
(Total temperature at LPC outlet) (◦R)     0
(Total temperature at HPC outlet) (◦R)     0
(Total temperature at LPT outlet) (◦R)     0
(Pressure at fan inlet) (psia)             0
(Total pressure in bypass-duct) (psia)     0
(Total pressure at HPC outlet) (psia)      0
(Physical fan speed) (rpm)                 0
(Physical core speed) (rpm)                0
(Engine pressure ratio(P50/P2)             0
(Static pressure at HPC outlet) (psia)     0
(Ratio of fuel flow to Ps30) (pps/psia)    0
(Corrected fan speed) (rpm)                0
(Corrected core speed) (rpm)               0
(Bypass Ratio)                             0
(Burner fuel-air ratio)                    0
(Bleed Enthalpy)                           0
(Demanded 
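`isnull()` only detects missing entries. Other problems in run-to-failure data can be caught with a few structural checks: duplicated rows, and gaps or repeats in each engine's cycle counter, which should run 1, 2, 3, … without breaks. A sketch under that assumption; `check_cycles` is a hypothetical helper and the demo frame is synthetic.

```python
import pandas as pd


def check_cycles(df: pd.DataFrame) -> dict:
    """Simple integrity checks beyond isnull() for unit_number/time_cycles data."""
    n_dupes = int(df.duplicated().sum())
    # Within each engine the cycle counter should run 1, 2, ..., n with no gaps or repeats
    bad_units = [
        int(unit)
        for unit, cycles in df.groupby("unit_number")["time_cycles"]
        if sorted(cycles.tolist()) != list(range(1, len(cycles) + 1))
    ]
    return {"duplicate_rows": n_dupes, "units_with_bad_cycles": bad_units}


# Synthetic example: engine 2 skips cycle 2
demo = pd.DataFrame({"unit_number": [1, 1, 1, 2, 2],
                     "time_cycles": [1, 2, 3, 1, 3]})
print(check_cycles(demo))  # {'duplicate_rows': 0, 'units_with_bad_cycles': [2]}
```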

In [72]:
#Creating a data profiling report to better understand the data
#from ydata_profiling import ProfileReport
#data_report = ProfileReport(FD001_train)
#data_report.to_file("TrainDataset_Profile.html")
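If `ydata_profiling` is not installed, a much lighter per-column summary can be put together with plain pandas. This is a sketch of a hypothetical `quick_profile` helper, not a replacement for the full HTML report; the demo frame is synthetic.

```python
import pandas as pd


def quick_profile(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, distinct values, missing values, min and max."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_unique": df.nunique(),          # NaN is not counted as a distinct value
        "n_null": df.isnull().sum(),
        "min": df.min(numeric_only=True),  # non-numeric columns end up as NaN here
        "max": df.max(numeric_only=True),
    })


demo = pd.DataFrame({"time_cycles": [1, 2, 3],
                     "bleed_enthalpy": [392.0, None, 392.0]})
print(quick_profile(demo))
```

An `n_unique` of 1 in this summary points to the same constant columns that are dropped in the next cell.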

In [73]:
#Dropping the index columns (unit_number, time_cycles) and the operational settings, which are not used as features
drop_cols = index_names + setting_names
FD001_train = FD001_train.drop(columns = drop_cols)

#Dropping feature columns with constant values
unique = FD001_train.nunique()
const_cols = unique[unique == 1].index.tolist()
FD001_train = FD001_train.drop(columns = const_cols)
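The remaining sensor channels are on very different scales (for example a mean of about 8144 rpm for corrected core speed versus about 8.44 for bypass ratio in the summary above). Standardization with the `StandardScaler` imported at the top of the notebook is a common next step. A minimal sketch, assuming the scaler is fit on training rows only and then reused on test rows, so no test statistics leak into training; the two small frames are synthetic stand-ins for the cleaned feature tables.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the cleaned training and test feature frames
train_X = pd.DataFrame({"core_speed": [8130.0, 8140.0, 8150.0],
                        "bypass_ratio": [8.40, 8.44, 8.48]})
test_X = pd.DataFrame({"core_speed": [8145.0], "bypass_ratio": [8.46]})

scaler = StandardScaler()
# Learn per-column mean and (population) standard deviation from training data only
train_scaled = pd.DataFrame(scaler.fit_transform(train_X), columns=train_X.columns)
# Apply the same, already-fitted transformation to the test data
test_scaled = pd.DataFrame(scaler.transform(test_X), columns=test_X.columns)
print(test_scaled.round(4))
```

Fitting a separate scaler on the test data would shift its values by the test set's own statistics, so training and test features would no longer be on a common scale.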