### USX Predictive Maintenance

This notebook addresses the business problem of predicting component failures, so that the question "What is the probability that a machine will fail in the near future due to a failure of a certain component?" can be answered. The problem is formulated as a multi-class classification problem, and a machine learning algorithm is used to build a predictive model that learns from historical data collected from machines.

The following sections walk through the steps of implementing such a model: feature engineering, label construction, training, and evaluation.

### Headers

In [None]:
# CSS Files
from IPython.core.display import HTML
from IPython.display import Image

In [None]:
%matplotlib inline

# general libs
import sys
import numpy as np
import pandas as pd
import sklearn
import itertools

#plotting libs
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.dates import DayLocator, HourLocator, MinuteLocator, AutoDateLocator, DateFormatter

# date time libs
import datetime as dt
from datetime import timedelta
import statsmodels.api as sm 

# ml libs
from sklearn.ensemble import GradientBoostingClassifier
from sklearn import metrics
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.preprocessing import scale
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from pandas.plotting import scatter_matrix  # pandas.tools.plotting was removed in later pandas releases
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Style Setting
sns.set(style="ticks", color_codes=True)
sns.set_context("notebook")
sns.set_style("darkgrid")

In [None]:
# Notebook Style Setting
css = open('styles/style-table.css').read() + open('styles/style-notebook.css').read()
HTML('<style>{}</style>'.format(css))

In [None]:
HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Toggle Code On/Off"></form>''')

### Data Sources

Common data sources for predictive maintenance problems are:
 1. Failure history: The failure history of a node or of a component within the node. Ex: Alerts
 2. Machine usage: The operating conditions, such as metric data collected from sensors. Ex: Telemetry
 3. Machine features: The static info of a node. Ex: Config info like number of CPUs, memory, network etc.
 4. Error events: The errors in logs that have not translated into actual failures.
 5. Admin actions: The behavioral info like API accesses, jobs run etc.

### Reading Data...

In [None]:
telemetry = pd.read_csv('data/pdm/pdm_telemetry2.csv', index_col=0, parse_dates=True)
errors = pd.read_csv('data/pdm/pdm_errors.csv', index_col=0, parse_dates=True)
ops = pd.read_csv('data/pdm/pdm_ops.csv', index_col=0, parse_dates=True)
alerts = pd.read_csv('data/pdm/pdm_alerts.csv', index_col=0, parse_dates=True)
machines = pd.read_csv('data/pdm/pdm_machines.csv', index_col=0, parse_dates=True)

### Data Cleanup...

#### Processing telemetry data...

The first data source is the telemetry time-series data, which consists of metrics m1, m2, m3, m4, etc. collected from 100 machines in real time and averaged over each hour. Below, we display the first few records of the dataset.

In [None]:
telemetry = telemetry.dropna()
telemetry['datetime'] = telemetry.index
telemetry.head()

In [None]:
# plot telemetry data
plot_df = telemetry.loc[(telemetry['machineID'] == 1) &
                        (telemetry['datetime'] > pd.to_datetime('2015-01-01')) &
                        (telemetry['datetime'] < pd.to_datetime('2015-02-01')), ['datetime', 'm1']]

sns.set_style("darkgrid")
plt.figure(figsize=(12, 3))
plt.plot(plot_df['datetime'], plot_df['m1'])
plt.ylabel('metric1')

# make x-axis ticks legible
adf = plt.gca().get_xaxis().get_major_formatter()
adf.scaled[1.0] = '%m-%d'
plt.xlabel('Date')

#### Processing error data....

The second major data source is the error logs. These are non-breaking errors thrown while the machine is still operational, and they are not considered failures. The error dates and times are rounded to the closest hour since the telemetry data is collected at an hourly rate.

In [None]:
errors = errors.dropna()
errors['datetime'] = errors.index
errors['errorID'] = errors['errorID'].astype('category')

print("Total number of error records: %d" % len(errors.index))
errors.head()

In [None]:
# Plot error data
sns.set_style("darkgrid")
plt.figure(figsize=(6, 3))
errors['errorID'].value_counts().plot(kind='bar')
plt.ylabel('Count')

#### Processing jobs/ops data

These are the scheduled and unscheduled admin operations, corresponding both to regular inspection of components and to failures.

In [None]:
ops = ops.dropna()
ops['datetime'] = ops.index
ops['optype'] = ops['optype'].astype('category')

print("Total number of operation records: %d" % len(ops.index))
ops.head()

In [None]:
sns.set_style("darkgrid")
plt.figure(figsize=(6, 3))
ops['optype'].value_counts().plot(kind='bar')
plt.ylabel('Count')

#### Processing machines (config) data...

This data set includes config information about the machines: the machine model and the config metrics cm1, cm2, etc.

In [None]:
machines = machines.dropna()
machines['machineID'] = machines.index
machines['model'] = machines['model'].astype('category')
machines.head()

In [None]:
sns.set_style("darkgrid")
plt.figure(figsize=(6, 4))
_, bins, _ = plt.hist([machines.loc[machines['model'] == 'model1', 'cm1'],
                       machines.loc[machines['model'] == 'model2', 'cm1'],
                       machines.loc[machines['model'] == 'model3', 'cm1'],
                       machines.loc[machines['model'] == 'model4', 'cm1']],
                       20, stacked=True, label=['model1', 'model2', 'model3', 'model4'])
plt.xlabel('config metric 1 (cm1)')
plt.ylabel('Count')
plt.legend()

#### Processing alert data...

These are the records of alerts raised due to failures. Each record has a date and time, machine ID, and alert type.

In [None]:
alerts = alerts.dropna()
alerts['datetime'] = alerts.index
alerts['alerttype'] = alerts['alerttype'].astype('category')

print("Total number of alerts: %d" % len(alerts.index))
alerts.head()

In [None]:
sns.set_style("darkgrid")
plt.figure(figsize=(6, 4))
alerts['alerttype'].value_counts().plot(kind='bar')
plt.ylabel('Count')

### Feature Engineering

The first step in predictive maintenance is feature engineering, which requires bringing the different data sources together to create features that best describe a machine's health condition at a given point in time. In the next sections, several feature engineering methods are used to create features based on the properties of each data source.

#### Lag features in telemetry data...

Telemetry data almost always comes with timestamps, which makes it suitable for calculating lagging features. A common method is to pick a window size for the lag features and compute rolling aggregate measures such as mean, standard deviation, minimum, maximum, etc. to represent the short-term history of the telemetry over the lag window. In the following, the rolling mean and standard deviation of the telemetry data over the last 3-hour lag window are calculated every 3 hours.
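
As a minimal illustration of the windowing described above, the sketch below aggregates a tiny, made-up series for a single machine. The frame `toy` and its values are hypothetical and are not part of the notebook's data; only the column naming follows the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy telemetry: one machine, six hourly readings of metric m1
toy = pd.DataFrame(
    {"m1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]},
    index=pd.date_range("2015-01-01 00:00", periods=6, freq=pd.Timedelta(hours=1)),
)

# Aggregate each 3-hour window and stamp it with the window's right edge,
# so every feature row only uses readings taken before its timestamp
window = toy["m1"].resample(pd.Timedelta(hours=3), closed="left", label="right")
toy_feat = pd.DataFrame({"m1mean_3h": window.mean(), "m1sd_3h": window.std()})
print(toy_feat)
```

With `closed='left'` and `label='right'`, the readings at 00:00, 01:00 and 02:00 end up in the row stamped 03:00.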

In [None]:
# Capturing 3 hour lag...

# Calculate mean values for telemetry features
temp = []
fields = ['m1', 'm2', 'm3', 'm4']
for col in fields:
    temp.append(pd.pivot_table(telemetry,
                               index='datetime',
                               columns='machineID',
                               values=col).resample('3H', closed='left', label='right').mean().unstack())
telemetry_mean_3h = pd.concat(temp, axis=1)
telemetry_mean_3h.columns = [i + 'mean_3h' for i in fields]
telemetry_mean_3h.reset_index(inplace=True)

# repeat for standard deviation
temp = []
for col in fields:
    temp.append(pd.pivot_table(telemetry,
                               index='datetime',
                               columns='machineID',
                               values=col).resample('3H', closed='left', label='right').std().unstack())
telemetry_sd_3h = pd.concat(temp, axis=1)
telemetry_sd_3h.columns = [i + 'sd_3h' for i in fields]
telemetry_sd_3h.reset_index(inplace=True)

telemetry_mean_3h.head()

For capturing a longer term effect, 24 hour lag features are also calculated as below.

In [None]:
# Capturing longer term i.e., 24 hour lag...
temp = []
fields = ['m1', 'm2', 'm3', 'm4']
for col in fields:
    # pd.rolling_mean was removed from pandas; use the .rolling() accessor
    temp.append(pd.pivot_table(telemetry,
                               index='datetime',
                               columns='machineID',
                               values=col).rolling(window=24).mean().resample('3H',
                                                                              closed='left',
                                                                              label='right').first().unstack())
telemetry_mean_24h = pd.concat(temp, axis=1)
telemetry_mean_24h.columns = [i + 'mean_24h' for i in fields]
telemetry_mean_24h.reset_index(inplace=True)
telemetry_mean_24h = telemetry_mean_24h.loc[telemetry_mean_24h['m1mean_24h'].notnull()]

# repeat for standard deviation
temp = []
fields = ['m1', 'm2', 'm3', 'm4']
for col in fields:
    # pd.rolling_std was removed from pandas; use the .rolling() accessor
    temp.append(pd.pivot_table(telemetry,
                               index='datetime',
                               columns='machineID',
                               values=col).rolling(window=24).std().resample('3H',
                                                                             closed='left',
                                                                             label='right').first().unstack())
telemetry_sd_24h = pd.concat(temp, axis=1)
telemetry_sd_24h.columns = [i + 'sd_24h' for i in fields]
telemetry_sd_24h.reset_index(inplace=True)
telemetry_sd_24h = telemetry_sd_24h.loc[telemetry_sd_24h['m1sd_24h'].notnull()]

# Notice that a 24h rolling average is not available at the earliest timepoints
telemetry_mean_24h.head(10)

Next, the columns of the feature datasets created earlier are merged to create the final feature set from telemetry.

In [None]:
# merge columns of feature sets created earlier
# .ix was removed from pandas; .iloc selects the four feature columns by position
telemetry_feat = pd.concat([telemetry_mean_3h,
                            telemetry_sd_3h.iloc[:, 2:6],
                            telemetry_mean_24h.iloc[:, 2:6],
                            telemetry_sd_24h.iloc[:, 2:6]], axis=1).dropna()
telemetry_feat.describe()

In [None]:
telemetry_feat.head()

#### Lag features in error data...

Like telemetry data, errors come with timestamps. An important difference is that the error IDs are categorical values and should not be averaged over time intervals like the telemetry measurements. Instead, we count the number of errors of each type in a lagging window. We begin by reformatting the error data to have one entry per machine per time at which at least one error occurred:
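
The counting idea can be sketched on a small, made-up error log. Everything in this example is an assumption for illustration: the frame `toy_errors`, its timestamps, and the 3-hour window, which is shortened from the notebook's 24 hours to keep the output readable.

```python
import pandas as pd

# Hypothetical error log for one machine: timestamps rounded to the hour
toy_errors = pd.DataFrame({
    "datetime": pd.to_datetime(["2015-01-01 01:00", "2015-01-01 01:00", "2015-01-01 04:00"]),
    "machineID": [1, 1, 1],
    "errorID": ["error1", "error2", "error1"],
})

# One indicator column per error type, then one row per machine and hour
indicators = pd.get_dummies(toy_errors["errorID"]).astype(int)
hourly = (pd.concat([toy_errors[["datetime", "machineID"]], indicators], axis=1)
          .groupby(["machineID", "datetime"]).sum())

# Insert zero rows for hours without errors, then count over a 3-hour lag window
hours = pd.date_range("2015-01-01 00:00", periods=6, freq=pd.Timedelta(hours=1))
counts = (hourly.loc[1].reindex(hours, fill_value=0)
          .rolling(window=3, min_periods=1).sum())
print(counts)
```

Summing indicator columns rather than averaging an ID keeps each error type as a separate count feature.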

In [None]:
errors.head()

In [None]:
# create a column for each error type
error_count = pd.get_dummies(errors.set_index('datetime')).reset_index()
error_count.columns = ['datetime', 'machineID', 'error1', 'error2', 'error3', 'error4', 'error5']

# combine errors for a given machine in a given hour
error_count_g = error_count.groupby(['machineID', 'datetime']).sum().reset_index()
error_count_g.head(15)

Now we add blank entries for all other hourly timepoints (since no errors occurred at those times):

In [None]:
error_count = error_count_g.copy()
error_count = telemetry[['datetime', 'machineID']].merge(error_count, on=['machineID', 'datetime'], how='left').fillna(0.0)
error_count.describe()
error_count.head()



Now we compute the total number of errors of each type over the last 24 hours, for timepoints taken every three hours:


In [None]:
temp = []
fields = ['error%d' % i for i in range(1,6)]
print(fields)
for col in fields:
    # pd.rolling_sum and resample(how=...) were removed from pandas
    temp.append(pd.pivot_table(error_count, index='datetime', columns='machineID', values=col)
                .rolling(window=24).sum()
                .resample('3H', closed='left', label='right').first().unstack())
    
error_count = pd.concat(temp, axis=1)
error_count.columns = [i + 'count' for i in fields]
error_count.reset_index(inplace=True)
error_count = error_count.dropna()
error_count.describe()

In [None]:
error_count.head()

#### Lag features in ops data...
Creating lagging features from ops data is not as straightforward as for telemetry and errors, so the features from this data are generated in a more custom way. This type of ad-hoc feature engineering is very common in predictive maintenance since domain knowledge plays a big role in understanding the predictors of a problem. 
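
One such custom feature is the time elapsed since an operation was last performed. The sketch below shows the idea on a hypothetical single-machine frame (`toy_ops` and its values are made up for illustration); it is one possible way to express the computation, not necessarily the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical daily records for one machine; op1 == 1 marks a day on which
# the operation was performed
toy_ops = pd.DataFrame({
    "datetime": pd.date_range("2015-01-01", periods=5, freq=pd.Timedelta(days=1)),
    "op1": [1, 0, 0, 1, 0],
})

# Keep the timestamp where the operation happened, leave NaT elsewhere,
# then carry the most recent operation date forward
last_op = toy_ops["datetime"].where(toy_ops["op1"] == 1).ffill()

# Elapsed time since the most recent operation, expressed in days
toy_ops["days_since_op1"] = (toy_ops["datetime"] - last_op) / np.timedelta64(1, "D")
print(toy_ops)
```

With several machines in one frame, the forward fill would normally be applied per machine so that dates are not carried from one machine to the next.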

In [None]:

# create a column for each op type
ops_rep = pd.get_dummies(ops.set_index('datetime')).reset_index()
ops_rep.columns = ['datetime', 'machineID', 'op1', 'op2', 'op3', 'op4']

# combine operation for a given machine in a given hour
ops_rep = ops_rep.groupby(['machineID', 'datetime']).sum().reset_index()

# add timepoints where no operations were done
ops_rep = telemetry[['datetime', 'machineID']].merge(ops_rep,
                                                      on=['datetime', 'machineID'],
                                                      how='outer').fillna(0).sort_values(by=['machineID', 'datetime'])

# use a separate name so the ops DataFrame is not overwritten
op_cols = ['op1', 'op2', 'op3', 'op4']
for op in op_cols:
    # keep the timestamp where the op occurred, then forward-fill so each
    # row holds the date of the most recent op
    ops_rep[op] = ops_rep['datetime'].where(ops_rep[op] >= 1).ffill()

# remove dates in 2014 (may have NaN or future op dates)
ops_rep = ops_rep.loc[ops_rep['datetime'] > pd.to_datetime('2015-01-01')]

# replace dates of most recent ops with days since most recent op
for op in op_cols:
    ops_rep[op] = (ops_rep['datetime'] - ops_rep[op]) / np.timedelta64(1, 'D')
    
ops_rep.describe()

In [None]:
ops_rep.head()

#### Machine features

The machine features can be used without further modification. These include descriptive information about the type of each machine and its config metrics.

#### Final merge....
Finally, we merge all the feature data sets we created earlier to get the final feature matrix.

In [None]:
final_feat = telemetry_feat.merge(error_count, on=['datetime', 'machineID'], how='left')
final_feat = final_feat.merge(ops_rep, on=['datetime', 'machineID'], how='left')
final_feat = final_feat.merge(machines, on=['machineID'], how='left')
final_feat.describe()

In [None]:
final_feat.head()

### Label Construction

When using multi-class classification for predicting failure due to a problem, labelling is done by taking a time window prior to the failure of an asset and labelling the feature records that fall into that window as "about to fail due to a problem" while labelling all other records as "normal." This time window should be picked according to the business case: in some situations it may be enough to predict failures hours in advance, while in others days or weeks may be needed to allow e.g. for arrival of replacement parts.

The prediction problem for this example scenario is to estimate the probability that a machine will fail in the near future due to a failure of a certain component. More specifically, the goal is to compute the probability that a machine will fail in the next 24 hours due to a certain component failure (component 1, 2, 3, or 4). Below, the categorical column alerttype is created to serve as the label. All records within a 24 hour window before a failure of component 1 have alerttype=alert1, and so on for components 2, 3, and 4; all records not within 24 hours of a component failure have alerttype=none.
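
The labelling rule can be sketched on a hypothetical single-machine frame. The frame `toy_lab` is made up for illustration, and grouping by `machineID` before back-filling is an assumption added here so that labels cannot spill over from one machine to another.

```python
import numpy as np
import pandas as pd

# Hypothetical feature rows for one machine at 3-hour intervals, with an
# alert recorded on the last row only
toy_lab = pd.DataFrame({
    "datetime": pd.date_range("2015-01-01 00:00", periods=10, freq=pd.Timedelta(hours=3)),
    "machineID": 1,
    "alerttype": [np.nan] * 9 + ["alert4"],
})

# Propagate the alert backward over the 7 preceding rows (7 * 3h = 21h before
# the alert record), doing so per machine so labels never leak across machines
toy_lab["alerttype"] = (toy_lab.groupby("machineID")["alerttype"]
                        .bfill(limit=7)
                        .fillna("none"))
print(toy_lab["alerttype"].tolist())
```

The alert record plus the seven back-filled records give eight labelled rows, which at 3-hour spacing covers the 24-hour window.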


In [None]:
labeled_features = final_feat.merge(alerts, on=['datetime', 'machineID'], how='left')
labeled_features = labeled_features.bfill(limit=7) # fill backward up to 7 earlier records (24h window at 3-hour steps)
labeled_features = labeled_features.fillna('none')
labeled_features.head()

Below is an example of records that are labeled alerttype=alert4 in the alerttype column. Notice that the first 8 records all occur in the 24-hour window before the first recorded failure of component 4. The next 8 records are within the 24-hour window before another failure of component 4.

In [None]:
labeled_features.loc[labeled_features['alerttype'] == 'alert4'][:16]

### Modelling


#### Training, Validation and Testing

When working with time-stamped data as in this example, record partitioning into training, validation, and test sets should be performed carefully to prevent overestimating the performance of the models. 

In predictive maintenance, the features are usually generated using lagging aggregates: records in the same time window will likely have identical labels and similar feature values. These correlations can give a model an "unfair advantage" when predicting on a test set record that shares its time window with a training set record. We therefore partition records into training, validation, and test sets in large chunks, to minimize the number of time intervals shared between them.

Predictive models have no advance knowledge of future chronological trends: in practice, such trends are likely to exist and to adversely impact the model's performance. To obtain an accurate assessment of a predictive model's performance, we recommend training on older records and validating/testing using newer records.

For both of these reasons, a time-dependent record splitting strategy is an excellent choice for predictive maintenance models. The split is made by choosing a point in time based on the desired size of the training and test sets: all records before the timepoint are used for training the model, and all remaining records are used for testing. (If desired, the timeline could be further divided to create validation sets for parameter selection.) To prevent any records in the training set from sharing time windows with the records in the test set, we remove any records at the boundary -- in this case, by ignoring 24 hours' worth of data prior to the timepoint.
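
A minimal sketch of such a split with a 24-hour gap is shown below. The frame `toy_records` is hypothetical; the two cutoff timestamps are the first pair used in this notebook.

```python
import pandas as pd

# Hypothetical labelled records at 3-hour intervals over four days
toy_records = pd.DataFrame({
    "datetime": pd.date_range("2015-07-30 01:00", periods=32, freq=pd.Timedelta(hours=3)),
})

last_train_date = pd.to_datetime("2015-07-31 01:00:00")
first_test_date = pd.to_datetime("2015-08-01 01:00:00")

# Records strictly before the first cutoff train the model, records strictly
# after the second cutoff test it; the 24 hours in between are discarded
train = toy_records.loc[toy_records["datetime"] < last_train_date]
test = toy_records.loc[toy_records["datetime"] > first_test_date]
dropped = len(toy_records) - len(train) - len(test)
print(len(train), dropped, len(test))
```

Because both comparisons are strict, records falling exactly on either cutoff are also left out.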


In [None]:
# make test and training splits
threshold_dates = [[pd.to_datetime('2015-07-31 01:00:00'), pd.to_datetime('2015-08-01 01:00:00')],
                   [pd.to_datetime('2015-08-31 01:00:00'), pd.to_datetime('2015-09-01 01:00:00')],
                   [pd.to_datetime('2015-09-30 01:00:00'), pd.to_datetime('2015-10-01 01:00:00')]]

test_results = []
models = []
for last_train_date, first_test_date in threshold_dates:
    # split out training and test data
    train_y = labeled_features.loc[labeled_features['datetime'] < last_train_date, 'alerttype']
    train_X = pd.get_dummies(labeled_features.loc[labeled_features['datetime'] < last_train_date].drop(['datetime',
                                                                                                        'machineID',
                                                                                                        'alerttype'], axis=1))
    test_X = pd.get_dummies(labeled_features.loc[labeled_features['datetime'] > first_test_date].drop(['datetime',
                                                                                                       'machineID',
                                                                                                       'alerttype'], axis=1))
    # train and predict using the model, storing results for later
    my_model = GradientBoostingClassifier(random_state=42)
    my_model.fit(train_X, train_y)
    test_result = pd.DataFrame(labeled_features.loc[labeled_features['datetime'] > first_test_date])
    test_result['predicted_failure'] = my_model.predict(test_X)
    test_results.append(test_result)
    models.append(my_model)

We plot the feature importances in the (first) trained model:

In [None]:
sns.set_style("darkgrid")
plt.figure(figsize=(12, 4))
labels, importances = zip(*sorted(zip(test_X.columns, models[0].feature_importances_), reverse=True, key=lambda x: x[1]))
plt.xticks(range(len(labels)), labels)
_, labels = plt.xticks()
plt.setp(labels, rotation=90)
plt.bar(range(len(importances)), importances)
plt.ylabel('Importance')

### Evaluation

In predictive maintenance, machine failures are usually rare occurrences in the lifetime of the assets compared to normal operation. This causes an imbalance in the label distribution, which usually hurts performance: algorithms tend to classify majority class examples better at the expense of minority class examples, because the total misclassification error improves most when the majority class is labeled correctly.

This leads to low recall even though accuracy can be high, and it becomes a larger problem when the cost of missed failures to the business is very high. To help with this problem, sampling techniques such as oversampling of the minority examples are usually used, along with more sophisticated techniques which are not covered in this notebook.

Link: https://github.com/scikit-learn-contrib/imbalanced-learn
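
As a rough sketch of random oversampling, the example below uses plain pandas instead of the imbalanced-learn package linked above. The frame `toy_train` and its values are hypothetical, and this step is not applied anywhere in the notebook. Resampling of this kind would normally be applied to the training split only.

```python
import pandas as pd

# Hypothetical imbalanced training set: 8 normal records, 2 failure records
toy_train = pd.DataFrame({
    "m1mean_3h": [1.0, 1.1, 0.9, 1.2, 1.0, 0.8, 1.1, 1.0, 3.0, 3.2],
    "alerttype": ["none"] * 8 + ["alert1"] * 2,
})

# Resample every class with replacement up to the size of the largest class
target = toy_train["alerttype"].value_counts().max()
balanced = pd.concat([
    group.sample(n=target, replace=True, random_state=42)
    for _, group in toy_train.groupby("alerttype")
], ignore_index=True)
print(balanced["alerttype"].value_counts())
```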

In [None]:
sns.set_style("darkgrid")
plt.figure(figsize=(8, 4))
labeled_features['alerttype'].value_counts().plot(kind='bar')
plt.xlabel('Alert Occurrence')
plt.ylabel('Count')

Also, because of the class imbalance, it is important to look at evaluation metrics other than accuracy alone and to compare them with baseline metrics, i.e. the values obtained when predictions are made by random chance rather than by a machine learning model. This comparison shows more clearly how much value the model adds.
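
To see why accuracy alone can mislead, consider the small worked example below. The 95/5 label distribution is made up for illustration, and the two baselines shown (always predicting the majority class, and guessing in proportion to class frequency) are simplified stand-ins for such baseline metrics.

```python
import numpy as np

# Hypothetical label distribution: 95 normal records and 5 failure records
actual = np.array(["none"] * 95 + ["alert1"] * 5)

# Majority-class baseline: always predict the most frequent label
majority_pred = np.full(actual.shape, "none")
majority_accuracy = np.mean(majority_pred == actual)
majority_failure_recall = np.mean(majority_pred[actual == "alert1"] == "alert1")

# Weighted random guess: expected accuracy is the sum of squared class shares
shares = np.array([0.95, 0.05])
weighted_guess_accuracy = np.sum(shares ** 2)
print(majority_accuracy, majority_failure_recall, weighted_guess_accuracy)
```

Both baselines reach an accuracy above 90% on this distribution, while the majority-class baseline detects none of the failures.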

In [None]:
from sklearn.metrics import confusion_matrix, recall_score, accuracy_score, precision_score

def Evaluate(predicted, actual, labels):
    output_labels = []
    output = []
    
    # Calculate and display confusion matrix
    cm = confusion_matrix(actual, predicted, labels=labels)
    print('Confusion matrix\n- rows are true labels (none, alert1, etc.)\n- columns are predicted labels')
    print(cm)
    
    # Calculate precision, recall, and F1 score
    accuracy = np.array([float(np.trace(cm)) / np.sum(cm)] * len(labels))
    precision = precision_score(actual, predicted, average=None, labels=labels)
    recall = recall_score(actual, predicted, average=None, labels=labels)
    f1 = 2 * precision * recall / (precision + recall)
    output.extend([accuracy.tolist(), precision.tolist(), recall.tolist(), f1.tolist()])
    output_labels.extend(['accuracy', 'precision', 'recall', 'F1'])
    
    # Calculate the macro versions of these metrics
    output.extend([[np.mean(precision)] * len(labels),
                   [np.mean(recall)] * len(labels),
                   [np.mean(f1)] * len(labels)])
    output_labels.extend(['macro precision', 'macro recall', 'macro F1'])
    
    # Find the one-vs.-all confusion matrix
    cm_row_sums = cm.sum(axis = 1)
    cm_col_sums = cm.sum(axis = 0)
    s = np.zeros((2, 2))
    for i in range(len(labels)):
        v = np.array([[cm[i, i],
                       cm_row_sums[i] - cm[i, i]],
                      [cm_col_sums[i] - cm[i, i],
                       np.sum(cm) + cm[i, i] - (cm_row_sums[i] + cm_col_sums[i])]])
        s += v
    s_row_sums = s.sum(axis = 1)
    
    # Add average accuracy and micro-averaged  precision/recall/F1
    avg_accuracy = [np.trace(s) / np.sum(s)] * len(labels)
    micro_prf = [float(s[0,0]) / s_row_sums[0]] * len(labels)
    output.extend([avg_accuracy, micro_prf])
    output_labels.extend(['average accuracy',
                          'micro-averaged precision/recall/F1'])
    
    # Compute metrics for the majority classifier
    mc_index = np.where(cm_row_sums == np.max(cm_row_sums))[0][0]
    cm_row_dist = cm_row_sums / float(np.sum(cm))
    mc_accuracy = 0 * cm_row_dist; mc_accuracy[mc_index] = cm_row_dist[mc_index]
    mc_recall = 0 * cm_row_dist; mc_recall[mc_index] = 1
    mc_precision = 0 * cm_row_dist
    mc_precision[mc_index] = cm_row_dist[mc_index]
    mc_F1 = 0 * cm_row_dist;
    mc_F1[mc_index] = 2 * mc_precision[mc_index] / (mc_precision[mc_index] + 1)
    output.extend([mc_accuracy.tolist(), mc_recall.tolist(),
                   mc_precision.tolist(), mc_F1.tolist()])
    output_labels.extend(['majority class accuracy', 'majority class recall',
                          'majority class precision', 'majority class F1'])
        
    # Random accuracy and kappa
    cm_col_dist = cm_col_sums / float(np.sum(cm))
    exp_accuracy = np.array([np.sum(cm_row_dist * cm_col_dist)] * len(labels))
    kappa = (accuracy - exp_accuracy) / (1 - exp_accuracy)
    output.extend([exp_accuracy.tolist(), kappa.tolist()])
    output_labels.extend(['expected accuracy', 'kappa'])
    

    # Random guess
    rg_accuracy = np.ones(len(labels)) / float(len(labels))
    rg_precision = cm_row_dist
    rg_recall = np.ones(len(labels)) / float(len(labels))
    rg_F1 = 2 * cm_row_dist / (len(labels) * cm_row_dist + 1)
    output.extend([rg_accuracy.tolist(), rg_precision.tolist(),
                   rg_recall.tolist(), rg_F1.tolist()])
    output_labels.extend(['random guess accuracy', 'random guess precision',
                          'random guess recall', 'random guess F1'])
    
    # Random weighted guess
    rwg_accuracy = np.ones(len(labels)) * sum(cm_row_dist**2)
    rwg_precision = cm_row_dist
    rwg_recall = cm_row_dist
    rwg_F1 = cm_row_dist
    output.extend([rwg_accuracy.tolist(), rwg_precision.tolist(),
                   rwg_recall.tolist(), rwg_F1.tolist()])
    output_labels.extend(['random weighted guess accuracy',
                          'random weighted guess precision',
                          'random weighted guess recall',
                          'random weighted guess F1'])

    output_df = pd.DataFrame(output, columns=labels)
    output_df.index = output_labels
                  
    return output_df

In [None]:
evaluation_results = []
for i, test_result in enumerate(test_results):
    print('\nSplit %d:' % (i+1))
    evaluation_result = Evaluate(actual = test_result['alerttype'],
                                 predicted = test_result['predicted_failure'],
                                 labels = ['none', 'alert1', 'alert2', 'alert3', 'alert4'])
    evaluation_results.append(evaluation_result)
    
evaluation_results[0]  # show full results for first split only

In predictive maintenance, we are often most concerned with how many of the actual failures were predicted by the model, i.e. the model's recall. (Recall becomes more important as the consequences of false negatives -- true failures that the model did not predict -- exceed the consequences of false positives, viz. false prediction of impending failure.) 
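
For reference, per-class recall and precision can be read directly off a confusion matrix. The sketch below uses a made-up two-class matrix and assumes the scikit-learn layout, in which rows are true labels and columns are predicted labels.

```python
import numpy as np

# Hypothetical confusion matrix with rows = true labels, columns = predicted
# labels, in the order ['none', 'alert1']
cm = np.array([[90, 5],
               [2, 3]])

# Recall: share of each true class that was predicted correctly (row-wise)
recall = np.diag(cm) / cm.sum(axis=1)
# Precision: share of each predicted class that was correct (column-wise)
precision = np.diag(cm) / cm.sum(axis=0)
print(recall, precision)
```

In this made-up case, 3 of the 5 true alerts are caught (recall 0.6), while only 3 of the 8 predicted alerts are real (precision 0.375).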

Below, we compare the recall rates for each failure type for the three models. The recall rates for all components as well as no failure are all above 90% meaning the model was able to capture above 90% of the failures correctly.

In [None]:
recall_df = pd.DataFrame([evaluation_results[0].loc['recall'].values,
                          evaluation_results[1].loc['recall'].values,
                          evaluation_results[2].loc['recall'].values],
                         columns = ['none', 'alert1', 'alert2', 'alert3', 'alert4'],
                         index = ['recall for first split',
                                  'recall for second split',
                                  'recall for third split'])
recall_df

### Summary

In this notebook, the steps of implementing a predictive maintenance model are presented using an example scenario where the goal is to predict alerts on a machine. Typical steps of predictive maintenance such as feature engineering, labelling, training, and evaluation are carried out using a synthetic data set.