# Deep Learning for Predictive Maintenance

Deep learning is one of the most active areas of machine learning today, with prominent applications in driverless cars, speech and image recognition, robotics and finance. It is a family of algorithms loosely inspired by the structure of the brain (biological neural networks); machine learning and cognitive scientists usually refer to these models as Artificial Neural Networks (ANN).

Predictive maintenance is also a popular area, in which many different techniques are used to determine the condition of equipment in order to predict when maintenance should be performed. In predictive maintenance scenarios, data is collected over time to monitor the state of the equipment, with the final goal of finding patterns that predict failures. Among the deep learning methods, Long Short Term Memory [(LSTM)](http://colah.github.io/posts/2015-08-Understanding-LSTMs/) networks are especially appealing to the predictive maintenance domain because they are very good at learning from sequences. This makes them well suited to time series data, where they can look back over longer periods of time to detect failure patterns.

In this notebook, we build an LSTM network for the data set and scenario described at [Predictive Maintenance Template](https://gallery.cortanaintelligence.com/Collection/Predictive-Maintenance-Template-3) to predict the remaining useful life of aircraft engines. In summary, the template uses simulated aircraft sensor values to predict when an aircraft engine will fail in the future so that maintenance can be planned in advance.

This notebook serves as a tutorial for beginners looking to apply deep learning in the predictive maintenance domain and uses a simple scenario where only one data source (sensor values) is used to make predictions. More advanced predictive maintenance scenarios, such as the one in the [Predictive Maintenance Modelling Guide](https://gallery.cortanaintelligence.com/Notebook/Predictive-Maintenance-Modelling-Guide-R-Notebook-1), involve many other data sources (e.g. historical maintenance records, error logs, machine and operator features) which may require different types of treatment before they can be used in deep learning networks. Since predictive maintenance is not a typical domain for deep learning, its application is an open area of research.

This notebook uses the [keras](https://keras.io/) deep learning library, which can run with Microsoft Cognitive Toolkit [CNTK](https://docs.microsoft.com/en-us/cognitive-toolkit/Using-CNTK-with-Keras) as its backend. The run recorded below used the TensorFlow backend.

## Step 2: Model Building

Using the sensor data sets explored and constructed in the `1_Data Ingestion and Preparation.ipynb` Jupyter notebook, this notebook loads the data from the Azure Blob container and builds an LSTM network for the scenario described at [Predictive Maintenance Template](https://gallery.cortanaintelligence.com/Collection/Predictive-Maintenance-Template-3) to predict the remaining useful life of aircraft engines. We then store the model for deployment in an Azure web service. We will prepare and build the web service in the `Code/3_Operationalization.ipynb` Jupyter notebook.


In [1]:
import h5py

In [2]:
import keras

Using TensorFlow backend.


In [3]:
# import the libraries
import os
import glob
import time
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import urllib
from scipy import stats

# Setup the pyspark environment
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

# Setting seed for reproducibility
np.random.seed(1234)  
PYTHONHASHSEED = 0
from sklearn import preprocessing
from sklearn.metrics import confusion_matrix, recall_score, precision_score
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM, Activation
%matplotlib inline

# For Azure blob storage access
from azure.storage.blob import BlockBlobService
from azure.storage.blob import PublicAccess

## Load feature data set

We have previously created the labeled feature data set in the `Code\1_Data Ingestion and Preparation.ipynb` Jupyter notebook. Since the Azure Blob storage account name and account key are not passed between notebooks, you'll need your credentials here again.

In [4]:
# Enter your Azure blob storage details here 
ACCOUNT_NAME = "<your blob storage account name>"

# You can find the account key under the _Access Keys_ link in the 
# [Azure Portal](portal.azure.com) page for your Azure storage container.
ACCOUNT_KEY = "<your blob storage account key>"

#-------------------------------------------------------------------------------------------
# The data from the Data Ingestion and Preparation notebook is stored in the sensordata ingestion container.
CONTAINER_NAME = "sensordataingestiontest"

# Connect to your blob service     
az_blob_service = BlockBlobService(account_name=ACCOUNT_NAME, account_key=ACCOUNT_KEY)

# We will store and read each of these data sets in blob storage in an 
# Azure Storage Container on your Azure subscription.
# See https://github.com/Azure/ViennaDocs/blob/master/Documentation/UsingBlobForStorage.md
# for details.

# These are the final feature data files (training and test sets).
TRAIN_DATA = 'PM_train_files.parquet'
TEST_DATA = 'PM_test_files.parquet'

# This is where we store the final model data file.
LOCAL_DIRECT = 'model_result.parquet'
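
Typing the account key directly into a notebook cell makes it easy to leak when the notebook is shared. One possible alternative, sketched below, is to read the credentials from environment variables. The variable names `PM_STORAGE_ACCOUNT_NAME` and `PM_STORAGE_ACCOUNT_KEY` and the helper `read_storage_credentials` are hypothetical and not part of the original notebook.

```python
import os

def read_storage_credentials(env=os.environ):
    """Return (account_name, account_key) read from a mapping of environment variables.

    The variable names are hypothetical; use whatever your deployment defines.
    """
    try:
        return env["PM_STORAGE_ACCOUNT_NAME"], env["PM_STORAGE_ACCOUNT_KEY"]
    except KeyError as missing:
        raise RuntimeError("Missing environment variable: {}".format(missing)) from None

# Demonstration with a stand-in mapping instead of the real environment
demo_env = {"PM_STORAGE_ACCOUNT_NAME": "examplename",
            "PM_STORAGE_ACCOUNT_KEY": "examplekey"}
demo_name, demo_key = read_storage_credentials(demo_env)
```

In the notebook, the two returned values would take the place of `ACCOUNT_NAME` and `ACCOUNT_KEY`.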

Load the data and dump a short summary of the resulting DataFrame.

In [5]:
# load the previously created final data set into the workspace
# create a local path where we store results
if not os.path.exists(TRAIN_DATA):
    os.makedirs(TRAIN_DATA)
    print('DONE creating a local directory!')

# download the entire parquet result folder to local path for a new run 
for blob in az_blob_service.list_blobs(CONTAINER_NAME):
    if TRAIN_DATA in blob.name:
        local_file = os.path.join(TRAIN_DATA, os.path.basename(blob.name))
        az_blob_service.get_blob_to_path(CONTAINER_NAME, blob.name, local_file)

train_df = spark.read.parquet(TRAIN_DATA)
train_df = train_df.toPandas()
train_df.head(10)

DONE creating a local directory!


Unnamed: 0,id,cycle,setting1,setting2,setting3,s1,s2,s3,s4,s5,...,s16,s17,s18,s19,s20,s21,RUL,label1,label2,cycle_norm
0,86,69,0.591954,0.333333,0.0,0.0,0.439759,0.450621,0.380655,0.0,...,0.0,0.5,0.0,0.0,0.465116,0.567109,209,0,0,0.188366
1,86,70,0.356322,0.333333,0.0,0.0,0.472892,0.496621,0.293889,0.0,...,0.0,0.333333,0.0,0.0,0.666667,0.536868,208,0,0,0.191136
2,86,71,0.609195,0.5,0.0,0.0,0.496988,0.348812,0.44767,0.0,...,0.0,0.416667,0.0,0.0,0.457364,0.624827,207,0,0,0.193906
3,86,72,0.454023,0.666667,0.0,0.0,0.289157,0.306736,0.451047,0.0,...,0.0,0.416667,0.0,0.0,0.550388,0.579674,206,0,0,0.196676
4,86,73,0.62069,0.166667,0.0,0.0,0.409639,0.418574,0.362762,0.0,...,0.0,0.25,0.0,0.0,0.612403,0.615576,205,0,0,0.199446
5,86,74,0.666667,0.833333,0.0,0.0,0.48494,0.296926,0.294227,0.0,...,0.0,0.5,0.0,0.0,0.651163,0.57995,204,0,0,0.202216
6,86,75,0.62069,0.833333,0.0,0.0,0.496988,0.410726,0.447164,0.0,...,0.0,0.416667,0.0,0.0,0.573643,0.51947,203,0,0,0.204986
7,86,76,0.511494,0.416667,0.0,0.0,0.427711,0.301068,0.392134,0.0,...,0.0,0.5,0.0,0.0,0.604651,0.6715,202,0,0,0.207756
8,86,77,0.442529,0.583333,0.0,0.0,0.364458,0.285372,0.36445,0.0,...,0.0,0.333333,0.0,0.0,0.527132,0.784176,201,0,0,0.210526
9,86,78,0.45977,0.583333,0.0,0.0,0.433735,0.558535,0.465901,0.0,...,0.0,0.416667,0.0,0.0,0.534884,0.593897,200,0,0,0.213296


In [6]:
type(train_df)

pandas.core.frame.DataFrame

In [7]:
# load the previously created test data set into the workspace
# create a local path where we store results
if not os.path.exists(TEST_DATA):
    os.makedirs(TEST_DATA)
    print('DONE creating a local directory!')

# download the entire parquet result folder to local path for a new run 
for blob in az_blob_service.list_blobs(CONTAINER_NAME):
    if TEST_DATA in blob.name:
        local_file = os.path.join(TEST_DATA, os.path.basename(blob.name))
        az_blob_service.get_blob_to_path(CONTAINER_NAME, blob.name, local_file)

test_df = spark.read.parquet(TEST_DATA)
test_df = test_df.toPandas()
test_df.head(10)

DONE creating a local directory!




In [8]:
type(test_df)

pandas.core.frame.DataFrame

## Modelling

Traditional predictive maintenance machine learning models are based on feature engineering, the manual construction of the right features using domain expertise and similar methods. This usually makes these models hard to reuse, since feature engineering is specific to the problem scenario and the available data, which vary from one business to another. Perhaps the most attractive part of applying deep learning in the predictive maintenance domain is that these networks can automatically extract the right features from the data, eliminating the need for manual feature engineering.

When using LSTMs in the time-series domain, one important parameter to pick is the sequence length, which is the window over which the LSTM looks back. This may be viewed as similar to picking window_size = 5 cycles for calculating the rolling features in the [Predictive Maintenance Template](https://gallery.cortanaintelligence.com/Collection/Predictive-Maintenance-Template-3), which are the rolling mean and rolling standard deviation of the 21 sensor values. The idea of using LSTMs is to let the model extract abstract features out of the sequence of sensor values in the window rather than engineering those manually. The expectation is that if there is a pattern in these sensor values within the window prior to failure, the pattern should be encoded by the LSTM.
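
To make the hand-engineered alternative concrete, here is a minimal sketch of rolling mean and rolling standard deviation over a window of 5 cycles. The sensor readings and the helper `rolling_mean_std` are hypothetical, and the standard deviation is the population form (`ddof=0`), which may differ from the convention used in the template.

```python
import numpy as np

def rolling_mean_std(values, window_size):
    """Rolling mean and standard deviation over each run of `window_size` values.

    Returns two arrays of length len(values) - window_size + 1.
    """
    values = np.asarray(values, dtype=float)
    windows = np.stack([values[i:i + window_size]
                        for i in range(len(values) - window_size + 1)])
    return windows.mean(axis=1), windows.std(axis=1)

# Hypothetical readings of one sensor over 8 cycles
sensor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
roll_mean, roll_std = rolling_mean_std(sensor, window_size=5)
print(roll_mean)
print(roll_std)
```

Each window of five readings collapses into two numbers per sensor, whereas an LSTM receives every reading in the window.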

One critical advantage of LSTMs is their ability to remember information across long sequences (large window sizes), which is hard to achieve with traditional feature engineering. For example, computing rolling averages over a window size of 50 cycles may lead to loss of information due to smoothing and abstracting of values over such a long period; instead, using all 50 values as input may provide better results. While feature engineering over large window sizes may not make sense, LSTMs are able to use larger window sizes and use all the information in the window as input. Below, we illustrate the approach.


In [9]:
# pick a large window size of 50 cycles
sequence_length = 50

[Keras LSTM](https://keras.io/layers/recurrent/) layers expect an input in the shape of a numpy array of 3 dimensions (samples, time steps, features) where samples is the number of training sequences, time steps is the look back window or sequence length and features is the number of features of each sequence at each time step. 
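
The three-dimensional layout can be seen on a toy array. This is only an illustration with made-up data (one engine, 6 cycles, 2 features), not the notebook's data set.

```python
import numpy as np

# Hypothetical toy data: one engine, 6 cycles, 2 features per cycle
toy = np.arange(12, dtype=np.float32).reshape(6, 2)
time_steps = 3

# Each sample is a window of `time_steps` consecutive rows
samples = np.stack([toy[start:start + time_steps]
                    for start in range(len(toy) - time_steps)])
print(samples.shape)  # (3, 3, 2): (samples, time steps, features)
```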

In [10]:
# function to reshape features into (samples, time steps, features) 
def gen_sequence(id_df, seq_length, seq_cols):
    """ Only sequences that meet the window-length are considered, no padding is used. This means for testing
    we need to drop those which are below the window-length. An alternative would be to pad sequences so that
    we can use shorter ones """
    data_array = id_df[seq_cols].values
    num_elements = data_array.shape[0]
    for start, stop in zip(range(0, num_elements-seq_length), range(seq_length, num_elements)):
        yield data_array[start:stop, :]

In [11]:
# pick the feature columns 
sensor_cols = ['s' + str(i) for i in range(1,22)]
sequence_cols = ['setting1', 'setting2', 'setting3', 'cycle_norm']
input_features = sensor_cols + sequence_cols
input_features
sequence_cols.extend(sensor_cols)

In [12]:
# generator for the sequences
seq_gen = (list(gen_sequence(train_df[train_df['id']==id], sequence_length, sequence_cols)) 
           for id in train_df['id'].unique())

In [13]:
# generate sequences and convert to numpy array
seq_array = np.concatenate(list(seq_gen)).astype(np.float32)
seq_array.shape

(15631, 50, 25)

In [14]:
# function to generate labels
def gen_labels(id_df, seq_length, label):
    data_array = id_df[label].values
    num_elements = data_array.shape[0]
    return data_array[seq_length:num_elements, :]
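
The slicing in `gen_sequence` and `gen_labels` pairs each window with the label of the cycle that immediately follows it. The sketch below reproduces that slicing on a hypothetical engine with 7 cycles and a window of 3, using plain numpy arrays instead of the DataFrame.

```python
import numpy as np

seq_length = 3
# Hypothetical engine with 7 cycles: one feature column and one binary label column
features = np.arange(7, dtype=np.float32).reshape(7, 1)
labels = np.array([0, 0, 0, 0, 1, 1, 1], dtype=np.float32).reshape(7, 1)

# Same slicing as gen_sequence: windows [0:3], [1:4], [2:5], [3:6]
windows = np.stack([features[start:start + seq_length]
                    for start in range(len(features) - seq_length)])
# Same slicing as gen_labels: labels of cycles 3, 4, 5, 6
window_labels = labels[seq_length:]

for window, label in zip(windows, window_labels):
    print(window.ravel(), '->', label[0])
```

An engine with fewer rows than the window length plus one contributes no samples, which is why the sequence and label arrays end up with the same first dimension.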

In [15]:
# generate labels
label_gen = [gen_labels(train_df[train_df['id']==id], sequence_length, ['label1']) 
             for id in train_df['id'].unique()]
label_array = np.concatenate(label_gen).astype(np.float32)
label_array.shape

(15631, 1)

## LSTM Network
Next, we build a deep network. The first layer is an LSTM layer with 100 units, followed by another LSTM layer with 50 units. Dropout is applied after each LSTM layer to control overfitting. The final layer is a Dense output layer with a single unit and sigmoid activation, since this is a binary classification problem.
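
As a rough illustration of why a sigmoid output pairs with the binary cross-entropy loss, the sketch below evaluates both for a single example. The score `2.0` is a made-up value standing in for the output of the Dense unit before activation.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, p):
    """Loss for one example with label y_true in {0, 1} and predicted probability p."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

p = sigmoid(2.0)                              # hypothetical pre-activation score
loss_if_failure = binary_crossentropy(1, p)   # small: prediction agrees with the label
loss_if_healthy = binary_crossentropy(0, p)   # large: prediction contradicts the label
predicted_class = int(p > 0.5)
print(p, loss_if_failure, loss_if_healthy, predicted_class)
```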

In [16]:
# build the network
nb_features = seq_array.shape[2]
nb_out = label_array.shape[1]

model = Sequential()

model.add(LSTM(
         input_shape=(sequence_length, nb_features),
         units=100,
         return_sequences=True))
model.add(Dropout(0.2))

model.add(LSTM(
          units=50,
          return_sequences=False))
model.add(Dropout(0.2))

model.add(Dense(units=nb_out, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

In [17]:
print(model.summary())

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_1 (LSTM)                (None, 50, 100)           50400     
_________________________________________________________________
dropout_1 (Dropout)          (None, 50, 100)           0         
_________________________________________________________________
lstm_2 (LSTM)                (None, 50)                30200     
_________________________________________________________________
dropout_2 (Dropout)          (None, 50)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 51        
Total params: 80,651
Trainable params: 80,651
Non-trainable params: 0
_________________________________________________________________
None
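
The parameter counts in the summary can be checked by hand. The sketch below assumes the standard LSTM formulation with four gates, each having input weights, recurrent weights and a bias vector; the helper names are illustrative only.

```python
def lstm_param_count(input_dim, units):
    """Parameters of a standard LSTM layer: 4 gates, each with input weights,
    recurrent weights and a bias vector."""
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(input_dim, units):
    """Weight matrix plus bias vector."""
    return input_dim * units + units

lstm_1 = lstm_param_count(input_dim=25, units=100)   # 25 input features
lstm_2 = lstm_param_count(input_dim=100, units=50)   # fed by the first LSTM layer
dense_1 = dense_param_count(input_dim=50, units=1)
print(lstm_1, lstm_2, dense_1, lstm_1 + lstm_2 + dense_1)
# 50400 30200 51 80651
```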


In [18]:
%%time
# fit the network
model.fit(seq_array, label_array, epochs=10, batch_size=200, validation_split=0.05, verbose=1,
          callbacks = [keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=0, verbose=0, mode='auto')])

Train on 14849 samples, validate on 782 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
CPU times: user 3min 13s, sys: 39.9 s, total: 3min 53s
Wall time: 42.8 s


<keras.callbacks.History at 0x7f9397f01240>
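
Two details of the training log can be reasoned about with a short sketch. It assumes the split point is computed as `int(n * (1 - validation_split))`, and it simplifies early stopping with `patience=0` to "stop after the first epoch whose validation loss does not improve on the best so far". The validation losses below are made up for illustration; they are not the values from this run.

```python
def split_sizes(n_samples, validation_split):
    """Sizes of the training part and the held-out validation part."""
    n_train = int(n_samples * (1.0 - validation_split))
    return n_train, n_samples - n_train

def epochs_before_stop(val_losses):
    """Simplified early stopping with patience=0: halt after the first epoch
    whose validation loss fails to improve on the best value seen so far."""
    best = float("inf")
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
        else:
            return epoch
    return len(val_losses)

n_train, n_val = split_sizes(15631, 0.05)
print(n_train, n_val)  # 14849 782

# Hypothetical validation losses, one per epoch
stopped_after = epochs_before_stop([0.30, 0.12, 0.08, 0.09, 0.07])
print(stopped_after)  # 4
```

With `patience=0` a single noisy epoch ends training, so a larger patience may be worth trying when the validation loss fluctuates.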

In [19]:
# training metrics
scores = model.evaluate(seq_array, label_array, verbose=1, batch_size=200)
print('Accuracy: {}'.format(scores[1]))

Accuracy: 0.9579681406612321


In [20]:
# make predictions and compute confusion matrix
y_pred = model.predict_classes(seq_array,verbose=1, batch_size=200)
y_true = label_array
print('Confusion matrix\n- rows are true labels.\n- columns are predicted labels')
cm = confusion_matrix(y_true, y_pred)
cm

Confusion matrix
- rows are true labels.
- columns are predicted labels


array([[12546,    49],
       [  608,  2428]])

In [21]:
# compute precision and recall
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
print( 'precision = ', precision, '\n', 'recall = ', recall)

precision =  0.980218005652 
 recall =  0.799736495389
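
The reported metrics can be recomputed from the confusion matrix alone. scikit-learn's `confusion_matrix` puts true labels on the rows and predicted labels on the columns, so for binary labels the layout is `[[TN, FP], [FN, TP]]`. The variable names below are illustrative.

```python
# Confusion matrix reported above: rows are true labels, columns are predictions
cm_values = [[12546, 49],
             [608, 2428]]

tn, fp = cm_values[0]
fn, tp = cm_values[1]

accuracy_check = (tp + tn) / (tp + tn + fp + fn)
precision_check = tp / (tp + fp)   # share of predicted failures that were real
recall_check = tp / (tp + fn)      # share of real failures that were caught
print(accuracy_check, precision_check, recall_check)
```

The lower recall reflects the 608 windows labeled as failures that the model missed, which in a maintenance setting is usually the more costly kind of error.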


## Persist the model


We'll save the latest model for use in deploying a web service for operationalization in the next notebook. We store it locally to the Jupyter notebook kernel because the model is stored in a hierarchical format that does not translate well to Azure Blob storage.

In [23]:
import pickle
import h5py
from sklearn import datasets 

# save model
model.save(os.environ['AZUREML_NATIVE_SHARE_DIRECTORY']+'pdmrfull.model')
#model.save('./outputs/lstm_model.h5py')
print("Model saved")

Model saved


## Conclusion

In the next notebook, `Code\3_operationalization.ipynb`, we will create the functions needed to operationalize and deploy any model to get realtime predictions.