These datasets contain measurements from 26 unknown sensors, where each training sample contains data recorded until engine failure. The goal of this task is to predict the **RUL** (remaining useful lifetime) of each engine in the test data.  
There are four datasets: `FD001, FD002, FD003, FD004`, where each one is a slightly more complex version of its predecessor.

## Part 1 - FD001

## 1.1 - EDA  


Let's plot the data to get a feel for what we are dealing with. We will start with the necessary imports and load the data into an organized dataframe.

In [None]:
!pip3 install tsfresh
!pip3 install scipy --upgrade
!pip3 install pytorch_lightning
!pip3 install datapane


In [None]:
!datapane login --token=b3dff308f961043284986ae9fe1dcc50408c9960

Connected successfully to https://datapane.com as elad


In [None]:
import numpy as np
import pandas as pd
import plotly.express as px
from xgboost import XGBRegressor
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import tsfresh
from tsfresh import select_features
from tsfresh.utilities.dataframe_functions import impute
from tsfresh import extract_features
from scipy.signal import butter, filtfilt
from scipy import fftpack
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import torchmetrics
import pytorch_lightning as pl
from sklearn.preprocessing import LabelEncoder
from multiprocessing import cpu_count
from pytorch_lightning.callbacks import ModelCheckpoint, EarlyStopping
from pytorch_lightning.loggers import TensorBoardLogger





In [None]:
# for reproducibility
pl.seed_everything(42)

Global seed set to 42


42

In [None]:
train_df = pd.read_csv('/content/drive/MyDrive/Datasets/NASA_CMAPSS/train_FD001.txt', delimiter=' ', header=None)
test_df = pd.read_csv('/content/drive/MyDrive/Datasets/NASA_CMAPSS/test_FD001.txt', delimiter=' ', header=None)
SENSOR_COLUMN_NAMES = [f'sensor_{i}' for i in range(1, 27)] 
df_columns = ['unit_number', 'time', *SENSOR_COLUMN_NAMES]
train_df.columns = df_columns
test_df.columns = df_columns

We will start with a basic description of each column. From this description, we immediately suspect that sensors `3, 4, 8, 9, 10, 11, 13, 19, 21, 22` are non-informative, because their standard deviation is very low relative to their mean value. Sensors `25, 26` contain only NaNs, so we can simply drop them.
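The screening described above can be sketched as a small helper. This is only an illustration on a toy dataframe: the function name `low_variation_columns`, the toy column names, and the `1e-3` threshold are assumptions made for the example, not values used in this notebook.

```python
import numpy as np
import pandas as pd


def low_variation_columns(df, threshold=1e-3):
    """Flag columns whose std is tiny relative to |mean|, plus all-NaN columns.

    The threshold is an arbitrary illustrative choice. Columns with a mean of
    exactly zero are not flagged by the ratio test (the ratio is inf or NaN).
    """
    stats = df.describe().T
    ratio = stats['std'] / stats['mean'].abs()
    mask = (ratio < threshold) | (stats['count'] == 0)
    return sorted(mask[mask].index)


# Toy data: one constant column, one varying column, one all-NaN column.
toy = pd.DataFrame({
    'sensor_a': [100.0, 100.0, 100.0, 100.0],
    'sensor_b': [1.0, 2.0, 3.0, 4.0],
    'sensor_c': [np.nan] * 4,
})
flagged = low_variation_columns(toy)
print(flagged)
```

A ratio test like this is only a first filter; a column can have a small relative spread and still carry a trend, which is why the plot later in this section is used to confirm the choice.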

In [None]:
train_df.head().to_csv('FD001_head.csv')

In [None]:
train_df.describe()

Unnamed: 0,unit_number,time,sensor_1,sensor_2,sensor_3,sensor_4,sensor_5,sensor_6,sensor_7,sensor_8,sensor_9,sensor_10,sensor_11,sensor_12,sensor_13,sensor_14,sensor_15,sensor_16,sensor_17,sensor_18,sensor_19,sensor_20,sensor_21,sensor_22,sensor_23,sensor_24,sensor_25,sensor_26
count,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,0.0,0.0
mean,51.506568,108.807862,-9e-06,2e-06,100.0,518.67,642.680934,1590.523119,1408.933782,14.62,21.609803,553.367711,2388.096652,9065.242941,1.3,47.541168,521.41347,2388.096152,8143.752722,8.442146,0.03,393.210654,2388.0,100.0,38.816271,23.289705,,
std,29.227633,68.88099,0.002187,0.000293,0.0,6.537152e-11,0.500053,6.13115,9.000605,3.3947e-12,0.001389,0.885092,0.070985,22.08288,4.660829e-13,0.267087,0.737553,0.071919,19.076176,0.037505,1.556432e-14,1.548763,0.0,0.0,0.180746,0.108251,,
min,1.0,1.0,-0.0087,-0.0006,100.0,518.67,641.21,1571.04,1382.25,14.62,21.6,549.85,2387.9,9021.73,1.3,46.85,518.69,2387.88,8099.94,8.3249,0.03,388.0,2388.0,100.0,38.14,22.8942,,
25%,26.0,52.0,-0.0015,-0.0002,100.0,518.67,642.325,1586.26,1402.36,14.62,21.61,552.81,2388.05,9053.1,1.3,47.35,520.96,2388.04,8133.245,8.4149,0.03,392.0,2388.0,100.0,38.7,23.2218,,
50%,52.0,104.0,0.0,0.0,100.0,518.67,642.64,1590.1,1408.04,14.62,21.61,553.44,2388.09,9060.66,1.3,47.51,521.48,2388.09,8140.54,8.4389,0.03,393.0,2388.0,100.0,38.83,23.2979,,
75%,77.0,156.0,0.0015,0.0003,100.0,518.67,643.0,1594.38,1414.555,14.62,21.61,554.01,2388.14,9069.42,1.3,47.7,521.95,2388.14,8148.31,8.4656,0.03,394.0,2388.0,100.0,38.95,23.3668,,
max,100.0,362.0,0.0087,0.0006,100.0,518.67,644.53,1616.91,1441.49,14.62,21.61,556.06,2388.56,9244.59,1.3,48.53,523.38,2388.56,8293.72,8.5848,0.03,400.0,2388.0,100.0,39.43,23.6184,,


In [None]:
train_df.drop(columns=['sensor_25', 'sensor_26'], inplace=True)
SENSOR_COLUMN_NAMES.remove('sensor_25')
SENSOR_COLUMN_NAMES.remove('sensor_26')

Usually, having no domain knowledge hinders our ability to extract useful information from the data. Here, however, we can turn it to our advantage: because we don't know anything about the individual sensors, we can start by normalizing all of them to the same scale, which will make our analyses much easier.  
We will fit a MinMaxScaler to each sensor column and save these scalers for later use (we will need to apply the same transformation to the test data). After that, we will plot a single unit's measurements of each signal to verify our assumptions regarding which signals don't contain relevant information.
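To make the "fit on train, reuse on test" idea concrete, here is a minimal hand-rolled sketch of min-max scaling in NumPy. The class `SimpleMinMax` is a hypothetical stand-in written for this illustration (it is not scikit-learn's `MinMaxScaler`), and the numbers are made up.

```python
import numpy as np


class SimpleMinMax:
    """Illustrative min-max scaler: remembers the training min and max."""

    def fit(self, x):
        self.min_ = float(np.min(x))
        self.max_ = float(np.max(x))
        return self

    def transform(self, x):
        x = np.asarray(x, dtype=float)
        span = self.max_ - self.min_
        if span == 0:
            # Constant training column: map everything to 0.
            return np.zeros_like(x, dtype=float)
        return (x - self.min_) / span


train_col = np.array([10.0, 12.0, 14.0])
test_col = np.array([9.0, 13.0, 15.0])

scaler = SimpleMinMax().fit(train_col)      # statistics come from train only
train_scaled = scaler.transform(train_col)
test_scaled = scaler.transform(test_col)    # same statistics reused on test
print(train_scaled, test_scaled)
```

Because the statistics come from the training data only, scaled test values may fall slightly outside `[0, 1]`; that is expected and avoids leaking test information into the preprocessing.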

In [None]:
MIN_MAX_SCALERS = {}
for col_name in SENSOR_COLUMN_NAMES:
  scaler = MinMaxScaler()
  train_df[col_name] = scaler.fit_transform(train_df[col_name].values.reshape(-1, 1)).squeeze()
  MIN_MAX_SCALERS[col_name] = scaler

The plot below allows us to divide the signals into four groups:  
* irrelevant: `3, 4, 8, 9, 13, 19, 21, 22`
* upward trend: `5, 6, 7, 11, 14, 16, 18, 20`
* downward trend: `10, 12, 15, 17, 23, 24`
* no trend: `1, 2`

In [None]:
train_df_long_format = pd.melt(train_df[train_df['unit_number'] == 20],
                               id_vars=['unit_number', 'time'],
                               value_vars=SENSOR_COLUMN_NAMES)
fig = px.line(train_df_long_format, x="time", y="value", color='variable',
              title='Unit 20 normalized sensor values')
fig.for_each_trace(
    lambda trace: trace.update(visible='legendonly') if trace.name != "variable=sensor_1" else (),
)
fig.show()

In [None]:
import datapane as dp 
report = dp.Report(dp.Plot(fig) ) #Create a report
report.upload(name='NASA_FD001', open=True) #Publish the report

Report successfully uploaded at https://datapane.com/u/elad/reports/BAmpn1k/nasa-fd001/, follow the link to view and share your report.


## Baseline model

We will now drop the irrelevant columns and then add labels to the data: the RUL. Our first labelling method will be naive and straightforward. We know that each unit in the training set is run until `RUL = 0`. Thus, we will assume that the degradation is linear and label each time step in the training set accordingly.

In [None]:
train_df.drop(columns=[f'sensor_{i}' for i in [3, 4, 8, 9, 13, 19, 21, 22]], inplace=True, errors='ignore')
RUL = train_df.groupby('unit_number').apply(lambda group_df:
                                            pd.concat([group_df['time'].max() - group_df['time'], group_df['time']], axis=1)).\
                                            reset_index().drop(columns=['level_1'])
RUL.columns = ['unit_number', 'RUL', 'time']
train_df = pd.merge(train_df, RUL, left_on=['unit_number', 'time'], right_on=['unit_number', 'time'])
X_train = train_df[[x for x in train_df.columns if 'sensor_' in x]].values
y_train = train_df['RUL'].values.clip(max=125)

Great, now every row has a label. We will now perform regression where each row is a data sample, with `x = row sensors` and `y = RUL`. As in Koen Peters' post (https://towardsdatascience.com/predictive-maintenance-of-turbofan-engines-ec54a083127), we clip the RUL values at a maximum of 125 to better match the engine degradation trend.
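The linear labelling and the clipping can also be written compactly with a grouped `transform`. The sketch below runs on a toy dataframe; the helper name `add_clipped_rul` is an assumption for this example, and a tiny `max_rul` is used only so the clipping is visible on five rows.

```python
import pandas as pd


def add_clipped_rul(df, max_rul=125):
    """Label each row with (last cycle of its unit - current cycle), clipped at max_rul."""
    last_cycle = df.groupby('unit_number')['time'].transform('max')
    out = df.copy()
    out['RUL'] = (last_cycle - out['time']).clip(upper=max_rul)
    return out


# Toy data: unit 1 runs for 3 cycles, unit 2 for 2 cycles.
toy = pd.DataFrame({
    'unit_number': [1, 1, 1, 2, 2],
    'time':        [1, 2, 3, 1, 2],
})
labelled = add_clipped_rul(toy, max_rul=1)
print(labelled)
```

Clipping makes the target piecewise linear: constant early in an engine's life, then decreasing linearly to zero at the last recorded cycle.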

In [None]:
xgbr = XGBRegressor()
xgbr.fit(X_train, y_train)



XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, gamma=0,
             importance_type='gain', learning_rate=0.1, max_delta_step=0,
             max_depth=3, min_child_weight=1, missing=None, n_estimators=100,
             n_jobs=1, nthread=None, objective='reg:linear', random_state=0,
             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
             silent=None, subsample=1, verbosity=1)

Alright, we have our first model. Now, let's load the test set (and not forget to apply the scalers to the test set as well). Below we can see our baseline performance on the train and test sets.

In [None]:
test_df.drop(columns=[f'sensor_{i}' for i in [3, 4, 8, 9, 13, 19, 21, 22, 25, 26]], inplace=True, errors='ignore')
for col_name in [x for x in test_df.columns if 'sensor_' in x]:
  test_df[col_name] = MIN_MAX_SCALERS[col_name].transform(test_df[col_name].values.reshape(-1, 1)).squeeze()
X_test = test_df.groupby('unit_number').apply(lambda group_df: group_df.iloc[group_df['time'].argmax()])[[x for x in test_df.columns if 'sensor_' in x]].values
y_test = pd.read_csv('/content/drive/MyDrive/Datasets/NASA_CMAPSS/RUL_FD001.txt', header=None).values.squeeze().clip(max=125)

In [None]:
def print_train_test_results(X_train, X_test, y_train, y_test, model):
  y_pred_train = model.predict(X_train)
  y_pred_test = model.predict(X_test)
  print(f'RMSE on train set: {mean_squared_error(y_train, y_pred_train, squared=False)}')
  print(f'RMSE on test set: {mean_squared_error(y_test, y_pred_test, squared=False)}')  

In [None]:
print_train_test_results(X_train, X_test, y_train, y_test, xgbr)

RMSE on train set: 17.679585838619715
RMSE on test set: 17.35303931484297
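For reference, the RMSE reported above is the square root of the mean squared error. A minimal NumPy sketch of the metric follows; the function name `rmse` and the numbers are illustrative assumptions. It can also serve as a drop-in where the `squared=False` argument is unavailable, since recent scikit-learn releases provide `root_mean_squared_error` for this purpose instead.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two equally sized sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Errors are 2 and 0, so MSE = (4 + 0) / 2 = 2 and RMSE = sqrt(2).
example = rmse([3.0, 5.0], [1.0, 5.0])
print(example)
```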


## Feature Engineering

### Windowing the data

Instead of taking each feature instance to be one timestep of sensor measurements, let's try using a window of 20 timesteps. The rationale behind this is that data from the near past may help determine the RUL at the current timestep.  
This operation of course will leave us with 3D data (n_samples, timesteps, n_sensors), which we will have to featurize in order to keep using traditional ML methods.
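As an illustration of how the 3D array arises, here is a sketch that windows the measurements of a single unit with NumPy's `sliding_window_view` (available in NumPy 1.20 and later). The helper name `make_windows` and the toy array are assumptions for this example; the notebook itself builds its windows with pandas `rolling`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(values, window_size):
    """Turn one unit's (timesteps, n_sensors) array into
    (n_windows, window_size, n_sensors) overlapping windows."""
    if len(values) < window_size:
        # Units shorter than the window yield no windows.
        return np.empty((0, window_size, values.shape[1]))
    # sliding_window_view puts the window axis last: (n_windows, n_sensors, window_size)
    windows = sliding_window_view(values, window_size, axis=0)
    return windows.transpose(0, 2, 1)


# Toy unit: 6 timesteps, 2 sensors.
series = np.arange(12, dtype=float).reshape(6, 2)
windows = make_windows(series, window_size=4)
print(windows.shape)
```

Each window is labelled with the RUL of its last timestep, so a unit with `T` timesteps contributes `T - window_size + 1` samples.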

In [None]:
WINDOW_SIZE = 20

In [None]:
def get_windowed_dataframes(df):
  df_groups = df.sort_values(['unit_number', 'time']).groupby('unit_number')
  all_rollings = []
  for _, group_df in df_groups:
    group_df_rolling = group_df.rolling(window=WINDOW_SIZE)  
    all_rollings.extend([wnd for wnd in group_df_rolling if len(wnd) == WINDOW_SIZE])
  return all_rollings


def get_windowed_xy(all_rollings):
  all_rollings_X = np.array([wnd[[x for x in wnd.columns if 'sensor_' in x]].values for wnd in all_rollings])
  all_rollings_y = np.array([wnd['RUL'].iloc[-1] for wnd in all_rollings]).clip(max=125)
  return all_rollings_X, all_rollings_y

In [None]:
all_rollings = get_windowed_dataframes(train_df)
X_train_rolling, y_train_rolling = get_windowed_xy(all_rollings)
X_test_rolling = np.array(test_df.sort_values(['unit_number', 'time']).groupby('unit_number').\
                          apply(lambda group_df: group_df[[x for x in test_df.columns if 'sensor_' in x]].\
                                iloc[-WINDOW_SIZE:].values).tolist())

In [None]:
for i, rolling_df in enumerate(all_rollings):
  rolling_df['instance_id'] = i
  rolling_df.drop(columns=['unit_number'], inplace=True)
all_rollings_df = pd.concat(all_rollings)

Our first attempt at this windowed approach will be to flatten the dimensions of each window and simply feed all of it to xgboost. This will of course hide the time-related information from the model.
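To see what flattening does to the layout, consider this small sketch on a toy array. The shapes and the helper `flat_column_origin` are illustrative assumptions.

```python
import numpy as np

n_windows, window_size, n_sensors = 3, 4, 2
X = np.arange(n_windows * window_size * n_sensors).reshape(
    n_windows, window_size, n_sensors)

# Flatten each window into one row: (n_windows, window_size * n_sensors).
X_flat = X.reshape(n_windows, -1)


def flat_column_origin(col, n_sensors):
    """Map a flattened column index back to its (timestep, sensor) position."""
    return divmod(col, n_sensors)


timestep, sensor = flat_column_origin(5, n_sensors)
print(X_flat.shape, timestep, sensor)
```

After flattening, the model receives `window_size * n_sensors` unrelated columns; nothing tells it that column 5 and column 7 are the same sensor one timestep apart.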

In [None]:
xgbr_windows_naive = XGBRegressor()
xgbr_windows_naive.fit(X_train_rolling.reshape(X_train_rolling.shape[0], -1), y_train_rolling)




Compared with the baseline, both the train and the test RMSE improve (see the output below), but the train error drops by more than the test error, so the model now performs slightly worse on the test set than on the train set. This is expected behaviour: we gave the model more data for each sample, so each sample can be "memorized" more easily, and this naive flattening does not generalize as well as it fits.

In [None]:
print_train_test_results(X_train_rolling.reshape(X_train_rolling.shape[0], -1),
                         X_test_rolling.reshape(X_test_rolling.shape[0], -1),
                         y_train_rolling, y_test, xgbr_windows_naive)

RMSE on train set: 15.908527707436281
RMSE on test set: 16.429387222606096


### Automatic Feature Generation

Let us now try to generate features from each mini time series of 20 time steps. We will use the Python package `tsfresh`, which generates features automatically by applying a fixed set of operations to each series.

In [None]:
all_rollings_df_X = all_rollings_df.drop(columns=['RUL'])

In [None]:
extracted_features = extract_features(all_rollings_df_X,
                                      column_id="instance_id", column_sort="time",
                                      n_jobs=4, default_fc_parameters=tsfresh.feature_extraction.settings.MinimalFCParameters())

Feature Extraction: 100%|██████████| 20/20 [02:37<00:00,  7.85s/it]


In [None]:
impute(extracted_features)
features_filtered = select_features(extracted_features, y_train_rolling, n_jobs=4)

In [None]:
features_filtered.head()

Unnamed: 0,sensor_24__minimum,sensor_11__root_mean_square,sensor_16__minimum,sensor_11__mean,sensor_11__median,sensor_11__sum_values,sensor_10__minimum,sensor_11__maximum,sensor_10__maximum,sensor_18__sum_values,sensor_10__mean,sensor_10__median,sensor_10__sum_values,sensor_15__root_mean_square,sensor_7__maximum,sensor_10__root_mean_square,sensor_7__root_mean_square,sensor_11__minimum,sensor_16__root_mean_square,sensor_15__minimum,sensor_16__sum_values,sensor_15__mean,sensor_15__median,sensor_15__sum_values,sensor_14__minimum,sensor_16__maximum,sensor_14__maximum,sensor_16__median,sensor_16__mean,sensor_14__mean,sensor_14__median,sensor_14__sum_values,sensor_24__maximum,sensor_14__root_mean_square,sensor_18__median,sensor_7__minimum,sensor_7__mean,sensor_5__sum_values,sensor_20__median,sensor_20__mean,...,sensor_12__median,sensor_12__minimum,sensor_12__standard_deviation,sensor_12__variance,sensor_14__standard_deviation,sensor_14__variance,sensor_17__maximum,sensor_17__root_mean_square,sensor_17__median,sensor_17__mean,sensor_17__sum_values,sensor_17__minimum,sensor_15__standard_deviation,sensor_15__variance,sensor_7__standard_deviation,sensor_7__variance,sensor_23__variance,sensor_23__standard_deviation,sensor_18__standard_deviation,sensor_18__variance,sensor_11__variance,sensor_11__standard_deviation,sensor_16__variance,sensor_16__standard_deviation,sensor_20__standard_deviation,sensor_20__variance,sensor_10__standard_deviation,sensor_10__variance,sensor_5__standard_deviation,sensor_5__variance,sensor_1__maximum,sensor_6__standard_deviation,sensor_6__variance,sensor_2__standard_deviation,sensor_2__variance,sensor_1__standard_deviation,sensor_1__variance,sensor_24__standard_deviation,sensor_24__variance,sensor_2__root_mean_square
0,0.526788,0.246881,0.176471,0.242424,0.227273,4.848485,0.5781,0.333333,0.798712,6.729127,0.688486,0.698873,13.769726,0.707898,0.404625,0.691311,0.317965,0.151515,0.242402,0.577825,4.794118,0.703412,0.678038,14.06823,0.107143,0.294118,0.380952,0.227941,0.239706,0.24881,0.241071,4.97619,0.807097,0.259199,0.352443,0.21185,0.314365,6.674699,0.333333,0.320833,...,0.124675,0.085569,0.018699,0.00035,0.072648,0.005278,0.209722,0.169649,0.169522,0.16858,3.371607,0.132883,0.079577,0.006332,0.047711,0.002276,0.004979,0.070564,0.079634,0.006342,0.002181,0.0467,0.0013,0.036052,0.066012,0.004358,0.06243,0.003898,0.105734,0.01118,0.683908,0.086516,0.007485,0.211271,0.044635,0.131272,0.017232,0.070005,0.004901,0.554339
1,0.526788,0.251145,0.176471,0.246212,0.227273,4.924242,0.5781,0.333333,0.798712,6.672951,0.686232,0.689211,13.724638,0.716027,0.404625,0.689012,0.316065,0.151515,0.24691,0.577825,4.882353,0.711514,0.682303,14.230277,0.107143,0.294118,0.380952,0.235294,0.244118,0.239286,0.232143,4.785714,0.807097,0.248935,0.338977,0.21185,0.312281,6.840361,0.333333,0.320833,...,0.124675,0.085569,0.018421,0.000339,0.068636,0.004711,0.209722,0.16833,0.169522,0.167393,3.347869,0.132883,0.080268,0.006443,0.048766,0.002378,0.005164,0.071858,0.079604,0.006337,0.002454,0.049533,0.001371,0.037028,0.066012,0.004358,0.061837,0.003824,0.099991,0.009998,0.683908,0.085229,0.007264,0.196143,0.038472,0.131368,0.017258,0.072608,0.005272,0.568258
2,0.526788,0.251145,0.176471,0.246212,0.227273,4.924242,0.5781,0.333333,0.798712,6.563678,0.685266,0.689211,13.705314,0.712954,0.404625,0.688145,0.313786,0.151515,0.243914,0.577825,4.823529,0.708529,0.682303,14.170576,0.107143,0.294118,0.35119,0.227941,0.241176,0.231845,0.232143,4.636905,0.807097,0.239597,0.324356,0.21185,0.310111,7.027108,0.333333,0.320833,...,0.124966,0.085569,0.020529,0.000421,0.060453,0.003655,0.209722,0.168065,0.169522,0.167115,3.342295,0.132883,0.079311,0.00629,0.04788,0.002292,0.005219,0.072242,0.077816,0.006055,0.002454,0.049533,0.001328,0.03644,0.066012,0.004358,0.062881,0.003954,0.102738,0.010555,0.683908,0.087016,0.007572,0.185358,0.034358,0.127237,0.016189,0.071266,0.005079,0.576447
3,0.526788,0.248873,0.176471,0.243939,0.227273,4.878788,0.5781,0.333333,0.798712,6.587534,0.683011,0.674718,13.660225,0.703479,0.404625,0.685888,0.306306,0.151515,0.245328,0.577825,4.852941,0.698934,0.678038,13.978678,0.107143,0.294118,0.35119,0.235294,0.242647,0.23125,0.232143,4.625,0.807097,0.23899,0.324356,0.211006,0.302135,6.963855,0.333333,0.329167,...,0.124675,0.085569,0.020825,0.000434,0.06033,0.00364,0.209722,0.166767,0.168155,0.165739,3.31479,0.132883,0.079838,0.006374,0.050373,0.002537,0.005233,0.072343,0.078436,0.006152,0.002431,0.049306,0.001308,0.036172,0.055746,0.003108,0.062757,0.003938,0.103902,0.010796,0.695402,0.087098,0.007586,0.189801,0.036024,0.129736,0.016831,0.07419,0.005504,0.554339
4,0.526788,0.249862,0.176471,0.244697,0.227273,4.893939,0.5781,0.333333,0.798712,6.667949,0.675282,0.666667,13.505636,0.694446,0.404625,0.678371,0.30372,0.151515,0.241441,0.577825,4.779412,0.691151,0.678038,13.823028,0.107143,0.294118,0.35119,0.227941,0.238971,0.240476,0.232143,4.809524,0.807097,0.248785,0.324356,0.211006,0.299553,6.972892,0.333333,0.329167,...,0.124428,0.085569,0.021796,0.000475,0.06376,0.004065,0.209722,0.16678,0.168155,0.165752,3.315048,0.132883,0.067562,0.004565,0.050141,0.002514,0.005003,0.07073,0.071766,0.00515,0.002555,0.050542,0.001187,0.034449,0.055746,0.003108,0.064662,0.004181,0.1039,0.010795,0.695402,0.087165,0.007598,0.196143,0.038472,0.128862,0.016605,0.074252,0.005513,0.568258


In [None]:
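def get_last_window_from_unit(group_df):
  # Use the same window length as the training windows (WINDOW_SIZE), so that
  # length-dependent features such as sum_values are comparable between train and test.
  res_df = group_df[[x for x in test_df.columns if 'sensor_' in x]].iloc[-WINDOW_SIZE:].copy()
  res_df['time'] = np.arange(1, len(res_df) + 1)
  return res_df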

test_df_X = test_df.sort_values(['unit_number', 'time']).groupby('unit_number').\
                          apply(get_last_window_from_unit).reset_index().drop(columns=['level_1'])
extracted_features_test = extract_features(test_df_X, column_id="unit_number", column_sort="time",
                                           n_jobs=4, default_fc_parameters=tsfresh.feature_extraction.settings.MinimalFCParameters())

Feature Extraction: 100%|██████████| 20/20 [00:00<00:00, 22.21it/s]


In [None]:
extracted_features_test_min = extracted_features_test[features_filtered.columns]

In [None]:
xgbr_features_automatic = XGBRegressor()
xgbr_features_automatic.fit(features_filtered.values, y_train_rolling)




In [None]:
print_train_test_results(features_filtered.values,
                         extracted_features_test_min.values,
                         y_train_rolling, y_test, xgbr_features_automatic)

RMSE on train set: 14.486881328112556
RMSE on test set: 23.113512123440437


### Manual Feature Generation

In this part we will apply several operations to the measurements of each sensor, in each window. We will avoid creating features that combine sensors because we lack the domain knowledge (for example, the meaning of the sensors) required to generate these kinds of combinations.  
The operations for each sensor will be as follows:  
* average value
* average diff value
* standard deviation
* maximum value
* minimum value
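The operations listed above map directly onto a grouped aggregation. The sketch below applies them to a toy frame with a single sensor and two windows; the column name `sensor_x` and the flattened `sensor__statistic` naming are assumptions for this illustration.

```python
import numpy as np
import pandas as pd


def avg_diff(series):
    """Mean of the first differences, a rough per-window slope."""
    return np.mean(np.diff(series))


# Toy data: two windows (instance_id 0 and 1) of three timesteps each.
toy = pd.DataFrame({
    'instance_id': [0, 0, 0, 1, 1, 1],
    'sensor_x':    [0.1, 0.2, 0.4, 0.5, 0.5, 0.2],
})

features = toy.groupby('instance_id').agg(['mean', avg_diff, 'std', 'max', 'min'])
# Flatten the (sensor, statistic) MultiIndex into single column names.
features.columns = [f'{sensor}__{stat}' for sensor, stat in features.columns]
print(features)
```

The average difference is the only statistic here that depends on the order of the values within a window, so it is the one that carries trend information.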


In [None]:
def avg_diff(series):
  return np.mean(np.diff(series))

all_rollings_grouped = all_rollings_df_X.drop(columns=['time']).groupby('instance_id').agg(['mean', avg_diff, 'std', 'max', 'min'])

In [None]:
test_df_grouped = test_df.sort_values(['unit_number', 'time']).groupby('unit_number').\
                          apply(lambda group_df: group_df[[x for x in test_df.columns if 'sensor_' in x]].\
                                iloc[-WINDOW_SIZE:]).reset_index()
test_df_aggregated = test_df_grouped.drop(columns=['level_1']).groupby('unit_number').agg(['mean', avg_diff, 'std', 'max', 'min'])

In [None]:
X_features_manual_train = all_rollings_grouped.values
X_features_manual_test = test_df_aggregated.values
xgbr_features_manual = XGBRegressor()
xgbr_features_manual.fit(X_features_manual_train, y_train_rolling)




In [None]:
print_train_test_results(X_features_manual_train, X_features_manual_test,
                         y_train_rolling, y_test, xgbr_features_manual)

RMSE on train set: 14.58791927615462
RMSE on test set: 18.131375854948576


In [None]:
# from sklearn.feature_selection import SelectKBest
# from sklearn.feature_selection import chi2

# select_k_best = SelectKBest(chi2, k=100)
# select_k_best.fit(scaler.transform(X_features_manual_train), y_train_rolling)
# xgbr_k_best_features.fit(select_k_best.transform(scaler.transform(X_features_manual_train)), y_train_rolling)

# print_train_test_results(select_k_best.transform(scaler.transform(X_features_manual_train)),
#                          select_k_best.transform(scaler.transform(X_features_manual_test)),
#                          y_train_rolling, y_test, xgbr_k_best_features)


## LSTM

In this section we will use a simple LSTM network to process the data windows and see whether we gain any improvement over our baseline model (which the feature-based approaches failed to deliver).

In [None]:
feature_columns = [x for x in all_rollings_df.columns if 'sensor_' in x]
feature_columns

['sensor_1',
 'sensor_2',
 'sensor_5',
 'sensor_6',
 'sensor_7',
 'sensor_10',
 'sensor_11',
 'sensor_12',
 'sensor_14',
 'sensor_15',
 'sensor_16',
 'sensor_17',
 'sensor_18',
 'sensor_20',
 'sensor_23',
 'sensor_24']

In [None]:
from torch.utils.data import TensorDataset

class RULDataModule(pl.LightningDataModule):
  def __init__(self, X_train, y_train, X_val, y_val, X_test, y_test,
               batch_size):
    super().__init__()
    self.X_train = X_train
    self.y_train = y_train
    self.X_val = X_val
    self.y_val = y_val
    self.X_test = X_test
    self.y_test = y_test
    self.batch_size = batch_size

  def setup(self, stage=None):
    self.train_dataset = TensorDataset(torch.Tensor(self.X_train), torch.Tensor(self.y_train))
    self.val_dataset = TensorDataset(torch.Tensor(self.X_val), torch.Tensor(self.y_val))
    self.test_dataset = TensorDataset(torch.Tensor(self.X_test), torch.Tensor(self.y_test))

  def train_dataloader(self):
    return DataLoader(self.train_dataset,
                      batch_size=self.batch_size,
                      shuffle=True,
                      num_workers=cpu_count())
  
  def val_dataloader(self):
    return DataLoader(self.val_dataset,
                      batch_size=self.batch_size,
                      shuffle=False,
                      num_workers=cpu_count())
    
  def test_dataloader(self):
    return DataLoader(self.test_dataset,
                      batch_size=self.batch_size,
                      shuffle=False,
                      num_workers=cpu_count())    

In [None]:
n_epochs = 65
batch_size = 64

X_train_lstm, X_val_lstm, y_train_lstm, y_val_lstm = train_test_split(
    X_train_rolling, y_train_rolling, test_size=0.2, stratify=y_train_rolling)

data_module = RULDataModule(X_train_lstm, y_train_lstm,
                            X_val_lstm, y_val_lstm,
                            X_test_rolling, y_test, batch_size=batch_size)

In [None]:
np.unique(y_val_lstm, return_counts=True)

(array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
         13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
         26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,
         39,  40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,
         52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,
         65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,
         78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,
         91,  92,  93,  94,  95,  96,  97,  98,  99, 100, 101, 102, 103,
        104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116,
        117, 118, 119, 120, 121, 122, 123, 124, 125]),
 array([  20,   20,   20,   20,   20,   20,   20,   20,   20,   20,   20,
          20,   20,   20,   20,   20,   20,   20,   20,   20,   20,   20,
          20,   20,   20,   20,   20,   20,   20,   20,   20,   20,   20,
          20,   20,   20,   20,   20,   20,   20,   20,   20,   20

In [None]:
class RULModel(nn.Module):
  def __init__(self, n_features, n_hidden=256, n_layers=3):
    super().__init__()
    self.lstm = nn.LSTM(
        input_size=n_features,
        hidden_size=n_hidden,
        num_layers=n_layers,
        batch_first=True,
        dropout=0.75
    )

    self.regressor = nn.Linear(n_hidden, 1)

  def forward(self, x):
    self.lstm.flatten_parameters()
    _, (hidden, _) = self.lstm(x)
    out = hidden[-1]
    # Drop only the last (feature) dimension so a batch of size 1 keeps its batch axis.
    return self.regressor(out).squeeze(-1)

In [None]:
class RULPredictor(pl.LightningModule):
  def __init__(self, n_features: int):
    super().__init__()
    self.model = RULModel(n_features)
    self.criterion = nn.MSELoss()
  
  def forward(self, x, labels=None):
    output = self.model(x)
    loss = 0
    if labels is not None:
      loss = self.criterion(output, labels)
    return loss, output

  def training_step(self, batch, batch_idx):
    X, y = batch
    loss, outputs = self(X, y)
    step_rmse = torchmetrics.functional.mean_squared_error(outputs, y, squared=False)
    self.log('train_loss', loss, prog_bar=True, logger=True)
    self.log('train_RMSE', step_rmse, prog_bar=True, logger=True)
    return {'loss': loss, 'RMSE': step_rmse}
  
  def validation_step(self, batch, batch_idx):
    X, y = batch
    loss, outputs = self(X, y)
    step_rmse = torchmetrics.functional.mean_squared_error(outputs, y, squared=False)
    self.log('val_loss', loss, prog_bar=True, logger=True)
    self.log('val_RMSE', step_rmse, prog_bar=True, logger=True)
    return {'loss': loss, 'RMSE': step_rmse}
  
  def test_step(self, batch, batch_idx):
    X, y = batch
    loss, outputs = self(X, y)
    step_rmse = torchmetrics.functional.mean_squared_error(outputs, y, squared=False)
    self.log('test_loss', loss, prog_bar=True, logger=True)
    self.log('test_RMSE', step_rmse, prog_bar=True, logger=True)
    return {'loss': loss, 'RMSE': step_rmse}

  def configure_optimizers(self):
    return optim.Adam(self.parameters(), lr=0.0001)
    

In [None]:
model = RULPredictor(n_features=len(feature_columns))

In [None]:
%load_ext tensorboard
%tensorboard --logdir ./lightning_logs



In [None]:
checkpoint_callback = ModelCheckpoint(
    dirpath='checkpoints',
    filename='best-checkpoint',
    save_top_k=1,
    verbose=True,
    monitor='val_loss',
    mode='min'
)

early_stop_callback = EarlyStopping(monitor="val_loss", patience=7,
                                    verbose=True, mode="min")
logger = TensorBoardLogger('lightning_logs', name='RUL')
trainer = pl.Trainer(
    logger=logger,
    callbacks=[checkpoint_callback, early_stop_callback],
    max_epochs=n_epochs,
    gpus=1,
    progress_bar_refresh_rate=30
)


Checkpoint directory checkpoints exists and is not empty.

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
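
The `EarlyStopping` callback above uses `patience=7` with `mode="min"`. As a rough mental model, training stops once `val_loss` has failed to improve for `patience` consecutive validation runs. The function below is a simplified illustration of that rule in plain Python, not Lightning's actual implementation; the name `stopping_epoch` and the loss values are hypothetical.

```python
def stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return the epoch index at which training would stop, or None if it never does."""
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Improvement: remember the new best value and reset the counter.
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return None

# Made-up validation losses, for illustration only.
example_losses = [5, 4, 4.5, 4.2, 4.1, 3.9]
print(stopping_epoch(example_losses, patience=3))
print(stopping_epoch(example_losses, patience=7))
```

With a small `patience` the run ends before the late improvement is seen; a larger value tolerates longer plateaus at the cost of more epochs.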


In [None]:
trainer.fit(model, data_module)

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name      | Type     | Params
---------------------------------------
0 | model     | RULModel | 1.3 M 
1 | criterion | MSELoss  | 0     
---------------------------------------
1.3 M     Trainable params
0         Non-trainable params
1.3 M     Total params
5.334     Total estimated model params size (MB)


Global seed set to 42


Metric val_loss improved. New best score: 6506.877
Epoch 0, global step 234: val_loss reached 6506.87695 (best 6506.87695), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 824.691 >= min_delta = 0.0. New best score: 5682.186
Epoch 1, global step 469: val_loss reached 5682.18555 (best 5682.18555), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 679.994 >= min_delta = 0.0. New best score: 5002.191
Epoch 2, global step 704: val_loss reached 5002.19141 (best 5002.19141), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 581.425 >= min_delta = 0.0. New best score: 4420.766
Epoch 3, global step 939: val_loss reached 4420.76611 (best 4420.76611), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 498.424 >= min_delta = 0.0. New best score: 3922.342
Epoch 4, global step 1174: val_loss reached 3922.34204 (best 3922.34204), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 426.750 >= min_delta = 0.0. New best score: 3495.593
Epoch 5, global step 1409: val_loss reached 3495.59253 (best 3495.59253), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 361.351 >= min_delta = 0.0. New best score: 3134.241
Epoch 6, global step 1644: val_loss reached 3134.24121 (best 3134.24121), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 302.662 >= min_delta = 0.0. New best score: 2831.579
Epoch 7, global step 1879: val_loss reached 2831.57886 (best 2831.57886), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 252.630 >= min_delta = 0.0. New best score: 2578.949
Epoch 8, global step 2114: val_loss reached 2578.94922 (best 2578.94922), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 205.267 >= min_delta = 0.0. New best score: 2373.682
Epoch 9, global step 2349: val_loss reached 2373.68188 (best 2373.68188), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 166.685 >= min_delta = 0.0. New best score: 2206.997
Epoch 10, global step 2584: val_loss reached 2206.99658 (best 2206.99658), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 129.996 >= min_delta = 0.0. New best score: 2077.000
Epoch 11, global step 2819: val_loss reached 2077.00049 (best 2077.00049), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 100.735 >= min_delta = 0.0. New best score: 1976.265
Epoch 12, global step 3054: val_loss reached 1976.26501 (best 1976.26501), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 74.528 >= min_delta = 0.0. New best score: 1901.737
Epoch 13, global step 3289: val_loss reached 1901.73743 (best 1901.73743), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 54.130 >= min_delta = 0.0. New best score: 1847.607
Epoch 14, global step 3524: val_loss reached 1847.60730 (best 1847.60730), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 37.769 >= min_delta = 0.0. New best score: 1809.839
Epoch 15, global step 3759: val_loss reached 1809.83875 (best 1809.83875), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 24.798 >= min_delta = 0.0. New best score: 1785.041
Epoch 16, global step 3994: val_loss reached 1785.04089 (best 1785.04089), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 15.465 >= min_delta = 0.0. New best score: 1769.576
Epoch 17, global step 4229: val_loss reached 1769.57593 (best 1769.57593), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 9.324 >= min_delta = 0.0. New best score: 1760.252
Epoch 18, global step 4464: val_loss reached 1760.25220 (best 1760.25220), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 5.142 >= min_delta = 0.0. New best score: 1755.110
Epoch 19, global step 4699: val_loss reached 1755.11035 (best 1755.11035), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 2.509 >= min_delta = 0.0. New best score: 1752.602
Epoch 20, global step 4934: val_loss reached 1752.60168 (best 1752.60168), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 1.207 >= min_delta = 0.0. New best score: 1751.395
Epoch 21, global step 5169: val_loss reached 1751.39478 (best 1751.39478), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 0.506 >= min_delta = 0.0. New best score: 1750.888
Epoch 22, global step 5404: val_loss reached 1750.88831 (best 1750.88831), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 0.212 >= min_delta = 0.0. New best score: 1750.677
Epoch 23, global step 5639: val_loss reached 1750.67651 (best 1750.67651), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 0.080 >= min_delta = 0.0. New best score: 1750.596
Epoch 24, global step 5874: val_loss reached 1750.59631 (best 1750.59631), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 0.010 >= min_delta = 0.0. New best score: 1750.586
Epoch 25, global step 6109: val_loss reached 1750.58630 (best 1750.58630), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Epoch 26, global step 6344: val_loss was not in top 1


Epoch 27, global step 6579: val_loss was not in top 1


Metric val_loss improved by 0.007 >= min_delta = 0.0. New best score: 1750.579
Epoch 28, global step 6814: val_loss reached 1750.57910 (best 1750.57910), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Epoch 29, global step 7049: val_loss was not in top 1


Epoch 30, global step 7284: val_loss was not in top 1


Epoch 31, global step 7519: val_loss was not in top 1


Metric val_loss improved by 0.000 >= min_delta = 0.0. New best score: 1750.579
Epoch 32, global step 7754: val_loss reached 1750.57898 (best 1750.57898), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 0.001 >= min_delta = 0.0. New best score: 1750.578
Epoch 33, global step 7989: val_loss reached 1750.57837 (best 1750.57837), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Epoch 34, global step 8224: val_loss was not in top 1


Epoch 35, global step 8459: val_loss was not in top 1


Epoch 36, global step 8694: val_loss was not in top 1


Epoch 37, global step 8929: val_loss was not in top 1


Metric val_loss improved by 0.000 >= min_delta = 0.0. New best score: 1750.578
Epoch 38, global step 9164: val_loss reached 1750.57788 (best 1750.57788), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 22.493 >= min_delta = 0.0. New best score: 1728.085
Epoch 39, global step 9399: val_loss reached 1728.08459 (best 1728.08459), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 1074.838 >= min_delta = 0.0. New best score: 653.247
Epoch 40, global step 9634: val_loss reached 653.24701 (best 653.24701), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 141.306 >= min_delta = 0.0. New best score: 511.941
Epoch 41, global step 9869: val_loss reached 511.94052 (best 511.94052), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 115.737 >= min_delta = 0.0. New best score: 396.204
Epoch 42, global step 10104: val_loss reached 396.20370 (best 396.20370), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 40.989 >= min_delta = 0.0. New best score: 355.214
Epoch 43, global step 10339: val_loss reached 355.21432 (best 355.21432), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 46.064 >= min_delta = 0.0. New best score: 309.150
Epoch 44, global step 10574: val_loss reached 309.15033 (best 309.15033), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 7.989 >= min_delta = 0.0. New best score: 301.161
Epoch 45, global step 10809: val_loss reached 301.16119 (best 301.16119), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 24.967 >= min_delta = 0.0. New best score: 276.194
Epoch 46, global step 11044: val_loss reached 276.19421 (best 276.19421), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 14.384 >= min_delta = 0.0. New best score: 261.811
Epoch 47, global step 11279: val_loss reached 261.81058 (best 261.81058), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Epoch 48, global step 11514: val_loss was not in top 1


Metric val_loss improved by 7.102 >= min_delta = 0.0. New best score: 254.709
Epoch 49, global step 11749: val_loss reached 254.70894 (best 254.70894), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 12.048 >= min_delta = 0.0. New best score: 242.661
Epoch 50, global step 11984: val_loss reached 242.66113 (best 242.66113), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 4.917 >= min_delta = 0.0. New best score: 237.744
Epoch 51, global step 12219: val_loss reached 237.74380 (best 237.74380), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 3.349 >= min_delta = 0.0. New best score: 234.395
Epoch 52, global step 12454: val_loss reached 234.39499 (best 234.39499), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 4.054 >= min_delta = 0.0. New best score: 230.341
Epoch 53, global step 12689: val_loss reached 230.34096 (best 230.34096), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Epoch 54, global step 12924: val_loss was not in top 1


Epoch 55, global step 13159: val_loss was not in top 1


Epoch 56, global step 13394: val_loss was not in top 1


Metric val_loss improved by 4.954 >= min_delta = 0.0. New best score: 225.387
Epoch 57, global step 13629: val_loss reached 225.38693 (best 225.38693), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 2.003 >= min_delta = 0.0. New best score: 223.384
Epoch 58, global step 13864: val_loss reached 223.38411 (best 223.38411), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Epoch 59, global step 14099: val_loss was not in top 1


Metric val_loss improved by 2.144 >= min_delta = 0.0. New best score: 221.240
Epoch 60, global step 14334: val_loss reached 221.24036 (best 221.24036), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Epoch 61, global step 14569: val_loss was not in top 1


Metric val_loss improved by 2.059 >= min_delta = 0.0. New best score: 219.182
Epoch 62, global step 14804: val_loss reached 219.18159 (best 219.18159), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Metric val_loss improved by 0.284 >= min_delta = 0.0. New best score: 218.898
Epoch 63, global step 15039: val_loss reached 218.89793 (best 218.89793), saving model to "/content/checkpoints/best-checkpoint-v3.ckpt" as top 1


Epoch 64, global step 15274: val_loss was not in top 1


In [None]:
trained_model = RULPredictor.load_from_checkpoint(
    trainer.checkpoint_callback.best_model_path,
    n_features=len(feature_columns)
)
trained_model.freeze()

In [None]:
from tqdm import tqdm

def get_predictions_and_labels_lstm(dataset):
  predictions = []
  labels = []
  for item in tqdm(dataset):
    X, y = item
    _, output = trained_model(X.unsqueeze(dim=0))
    predictions.append(output.item())
    labels.append(y.item())
  return predictions, labels

In [None]:
print(f'RMSE on train set: {mean_squared_error(*get_predictions_and_labels_lstm(data_module.train_dataset), squared=False)}')
print(f'RMSE on test set: {mean_squared_error(*get_predictions_and_labels_lstm(data_module.test_dataset), squared=False)}')

100%|██████████| 14984/14984 [01:29<00:00, 167.66it/s]


RMSE on train set: 14.761564144403062


100%|██████████| 100/100 [00:00<00:00, 170.43it/s]

RMSE on test set: 16.065719776970667
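
The `val_loss` values in the training log are MSE (the criterion is `MSELoss`), so they are in squared RUL units. To compare them with the RMSE figures printed above, take the square root. The snippet below does this for the best `val_loss` reported in the log; note that it refers to the validation split, so it is only a rough cross-check against the train and test numbers.

```python
import math

# Best val_loss (MSE) reported in the training log, at epoch 63.
best_val_mse = 218.89793

# Square root puts it on the same scale as the RMSE values printed above.
best_val_rmse = math.sqrt(best_val_mse)
print(f"Best validation RMSE: {best_val_rmse:.2f}")
```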





### Signal Processing

The readme for this dataset states that the data contains sensor noise, so let's try smoothing it out with a low-pass filter. We first perform an FFT to see which frequencies dominate the signals and to get a feel for where a cutoff could go.

In [None]:
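from scipy.signal import butter, filtfilt

def butter_lowpass_filter(data, cutoff, sample_rate, order):
  # Normalise the cutoff by the Nyquist frequency, as scipy.signal.butter expects.
  nyq = sample_rate * 0.5
  normal_cutoff = cutoff / nyq
  b, a = butter(order, normal_cutoff, btype='low', analog=False)
  # filtfilt runs the filter forwards and backwards, so the output has no phase lag.
  y = filtfilt(b, a, data)
  return y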

In [None]:
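from scipy import fftpack

# Amplitude spectrum of one sensor for a single engine (sampling interval: 1 time step).
signal = train_df[train_df['unit_number'] == 20]['sensor_14'].values
sig_fft = fftpack.fft(signal)
sig_amp = 2 / len(signal) * np.abs(sig_fft)
# fftfreq also returns negative frequencies; abs() folds them onto the positive axis.
sig_freq = np.abs(fftpack.fftfreq(len(signal), 1))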

In [None]:
px.line(x=sig_freq, y=sig_amp)
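
To see why the spectrum above is scaled by `2 / len(signal)`, here is a small self-contained check using only the standard library. It builds a pure sine wave with a known amplitude (an assumed test tone, not sensor data), computes a naive DFT, and applies the same scaling. The names `tone`, `dft` and `scaled` are illustrative.

```python
import cmath
import math

n_samples = 100              # illustrative length, one sample per time step
amplitude, freq = 3.0, 0.1   # assumed test tone: 0.1 cycles per sample

tone = [amplitude * math.sin(2 * math.pi * freq * n) for n in range(n_samples)]

def dft(x):
    """Naive O(N^2) discrete Fourier transform, fine for a short demo signal."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

spectrum = dft(tone)
# Same scaling as in the cell above: 2 / N * |X[k]|.
scaled = [2 / n_samples * abs(c) for c in spectrum]

# Look for the peak among the non-negative frequencies only.
peak_bin = max(range(n_samples // 2), key=lambda k: scaled[k])
print(peak_bin / n_samples, scaled[peak_bin])
```

The peak sits at the tone's frequency and the scaled value recovers its amplitude, because a real sinusoid splits its energy between a positive and a negative frequency bin. The factor of 2 does not apply to the zero-frequency (mean) term, so that bin reads twice the signal mean under this scaling.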

In [None]:
filtered_signal = butter_lowpass_filter(train_df[train_df['unit_number'] == 100]['sensor_1'].values,
                      cutoff=0.05,
                      sample_rate=1,
                      order=2)
px.line(x=np.arange(len(filtered_signal)), y=filtered_signal)
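
The filter parameters used above are easier to interpret with a little arithmetic. The sketch below assumes, as the cell does, one sample per time step (`sample_rate=1`) and works out what `cutoff=0.05` means. The gain figures apply at the cutoff frequency itself and ignore edge effects.

```python
import math

sample_rate = 1.0   # one sample per time step, as in the cell above
cutoff = 0.05       # cycles per time step

nyquist = 0.5 * sample_rate
normal_cutoff = cutoff / nyquist        # the value passed to butter()
cutoff_period = 1.0 / cutoff            # oscillations faster than this are attenuated

single_pass_gain = 1 / math.sqrt(2)     # Butterworth gain at the cutoff (-3 dB)
filtfilt_gain = single_pass_gain ** 2   # forward and backward pass combined

print(normal_cutoff, cutoff_period, filtfilt_gain)
```

In other words, fluctuations with a period shorter than about 20 time steps are progressively suppressed, and a component exactly at the cutoff comes out at roughly half its original amplitude after `filtfilt`.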

In [None]:
def add_lowpass_sensors_to_df(df):
    for sensor_name in [x for x in df.columns if 'sensor_' in x]:
      lowpass_sensor = df.groupby('unit_number').apply(lambda group_df:
                                                            pd.DataFrame({f'{sensor_name}_lowpass': butter_lowpass_filter(group_df[sensor_name].values,
                                                                                              cutoff=0.05,
                                                                                              sample_rate=1,
                                                                                              order=2),
                                                                        'time': group_df['time']})).reset_index()
      df[f'{sensor_name}_lowpass'] = lowpass_sensor[f'{sensor_name}_lowpass'].values  