# EXPLORING NASA’S TURBOFAN DATASET
<hr>

## Exploratory Data Analysis and Hyperparameter Tuning for FD003
## Introduction

This notebook contains an exploratory data analysis and an evaluation of several types of machine learning models.<br>
The datasets contain simulated run histories of multiple turbofan engines over time. Each row contains the following information:
1. Engine unit number
2. Time, in cycles
3. Three operational settings
4. 21 sensor readings

Note: Four datasets of increasing complexity are available. FD003 is the third in the series.  
Train trajectories: 100  
Test trajectories: 100  
Conditions: ONE (Sea Level)  
Fault Modes: TWO (High Pressure Compressor (HPC) Degradation, Fan Degradation)

### Importing libraries and Dependencies

In [1]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

### Reading Data

In [3]:
# define filepath and files to read data
dir_path = './CMaps/'
train_file = 'train_FD003.txt'
test_file = 'test_FD003.txt'

# define column names for easy indexing
index_names = ['unit_nr', 'time_cycles']
setting_names = ['setting_1', 'setting_2', 'setting_3']
sensor_names = ['s_{}'.format(i+1) for i in range(0,21)]
col_names = index_names + setting_names + sensor_names

# read data (whitespace-delimited, no header row); the raw string keeps
# '\s' from being treated as an invalid string escape
train = pd.read_csv(dir_path + train_file, sep=r'\s+', header=None,
                    names=col_names)
test = pd.read_csv(dir_path + test_file, sep=r'\s+', header=None,
                   names=col_names)
y_test = pd.read_csv(dir_path + 'RUL_FD003.txt', sep=r'\s+', header=None,
                     names=['RemainingUsefulLife'])

# check shape and inspect first few rows
print(train.shape)
train.head()

(24720, 26)


| | unit_nr | time_cycles | setting_1 | setting_2 | setting_3 | s_1 | s_2 | s_3 | s_4 | s_5 | ... | s_12 | s_13 | s_14 | s_15 | s_16 | s_17 | s_18 | s_19 | s_20 | s_21 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | -0.0005 | 0.0004 | 100.0 | 518.67 | 642.36 | 1583.23 | 1396.84 | 14.62 | ... | 522.31 | 2388.01 | 8145.32 | 8.4246 | 0.03 | 391 | 2388 | 100.0 | 39.11 | 23.3537 |
| 1 | 1 | 2 | 0.0008 | -0.0003 | 100.0 | 518.67 | 642.5 | 1584.69 | 1396.89 | 14.62 | ... | 522.42 | 2388.03 | 8152.85 | 8.4403 | 0.03 | 392 | 2388 | 100.0 | 38.99 | 23.4491 |
| 2 | 1 | 3 | -0.0014 | -0.0002 | 100.0 | 518.67 | 642.18 | 1582.35 | 1405.61 | 14.62 | ... | 522.03 | 2388.0 | 8150.17 | 8.3901 | 0.03 | 391 | 2388 | 100.0 | 38.85 | 23.3669 |
| 3 | 1 | 4 | -0.002 | 0.0001 | 100.0 | 518.67 | 642.92 | 1585.61 | 1392.27 | 14.62 | ... | 522.49 | 2388.08 | 8146.56 | 8.3878 | 0.03 | 392 | 2388 | 100.0 | 38.96 | 23.2951 |
| 4 | 1 | 5 | 0.0016 | 0.0 | 100.0 | 518.67 | 641.68 | 1588.63 | 1397.65 | 14.62 | ... | 522.58 | 2388.03 | 8147.8 | 8.3869 | 0.03 | 392 | 2388 | 100.0 | 39.14 | 23.4583 |


In [4]:
# inspect time cycles
train[index_names].groupby('unit_nr').max().describe()

| | time_cycles |
|---|---|
| count | 100.0 |
| mean | 247.2 |
| std | 86.48384 |
| min | 145.0 |
| 25% | 189.75 |
| 50% | 220.5 |
| 75% | 279.75 |
| max | 525.0 |
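
The summary of final cycles can be cross-checked against the shape printed earlier: 100 engines with a mean final cycle of 247.2 give 100 × 247.2 = 24,720, which matches the 24,720 training rows. That equality holds only if each engine logs exactly one row per cycle, starting at cycle 1, which is an assumption here and not something the output proves by itself. The sketch below illustrates the check on a small made-up frame (the `toy` data and its values are hypothetical, not taken from FD003); the same three lines could be run on `train`.

```python
import pandas as pd

# Toy stand-in for the training frame (hypothetical values, not FD003 data):
# three engines that run for 3, 5 and 4 cycles respectively.
toy = pd.DataFrame({
    'unit_nr':     [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3],
    'time_cycles': [1, 2, 3, 1, 2, 3, 4, 5, 1, 2, 3, 4],
})

# Last recorded cycle per engine, mirroring the groupby/max call above
max_cycles = toy.groupby('unit_nr')['time_cycles'].max()

# If every engine logs cycles 1..max without gaps or duplicates,
# the row count equals the sum of the per-engine maxima
rows_match = len(toy) == max_cycles.sum()
n_units = toy['unit_nr'].nunique()
mean_life = max_cycles.mean()

print(n_units, rows_match, mean_life)
```

If `rows_match` came out `False` on the real data, that would point to missing or duplicated cycles for at least one engine and would be worth investigating before deriving any target from `time_cycles`.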
