# Predictive Maintenance Demonstrator
### Developed by: Mustafa Aldemir & Ahmed Elsenousi

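This notebook demonstrates how to train a simple LSTM model for Predictive Maintenance. The model predicts the remaining useful life (RUL), i.e. the time until the next maximum-vibration event, from vibration and temperature sensor readings.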

Note: The notebook uses a sample dataset provided in the repository. You should collect data from your Predictive Maintenance Demonstrator for an accurate prediction.

<!-- ![title](images/predmaint1.jpg) -->

- AWS IoT Greengrass
- AWS IoT Core

<!-- ![title](images/architecture.png) -->
<!-- TODO: Update the diagram of the demonstrator and add explanation about how it works -->

#### Improvement Areas:
- Train in a SageMaker Training Job
- Build a SageMaker Pipeline
- Automate deployment on Greengrass
- Convert the model to TFLite

### Install required packages

In [None]:
# Keras needs a backend; TensorFlow is installed for that purpose
!pip install numpy pandas matplotlib tensorflow keras --quiet

### Import required packages

In [None]:
import json
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from IPython.display import display, HTML

In [None]:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.optimizers import Adam

### Constant definitions

In [None]:
RAW_DATA_FILE = 'data/raw_data.csv'
CLEANED_DATA_FILE = 'data/cleaned_data.csv'
MODEL_FILE = 'model/lstm.h5'

In [None]:
# ID of the Greengrass group in the sample dataset.
# Replace it with your own Greengrass group ID.
MY_GREENGRASS_GROUP_ID = "22c680b6-96ab-4e1c-920e-9e1c96df9e31"

In [None]:
# value of the max_vibration column that marks the event the RUL counts down to
MAX_VIBRATION_FLAG = 1

In [None]:
# fraction of the windows, in time order, used for training
TRAIN_TEST_SPLIT_RATIO = 0.80

### Load the data

In [None]:
# load the dataset to a dataframe
raw_data = pd.read_csv(RAW_DATA_FILE, sep='\t', encoding='utf-8')
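
Despite its `.csv` extension, the raw file is read with `sep='\t'`, i.e. as tab-separated values. The following minimal sketch uses two made-up rows in an in-memory string (not the real file, and only a hypothetical subset of its columns) to show how `read_csv` splits such data.

```python
import io

import pandas as pd

# Two hypothetical rows in the tab-separated layout the notebook expects.
sample_text = (
    "ts\tgreengrass_group_id\ttemperature\n"
    "1600000000\tgroup-a\t31.5\n"
    "1600000005\tgroup-a\t31.7\n"
)

# sep='\t' splits on tabs; the default comma separator would give one column here.
sample = pd.read_csv(io.StringIO(sample_text), sep='\t')
print(sample.shape)
print(sample.columns.tolist())
```

If your own export is comma-separated, drop the `sep` argument in the cell above.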

In [None]:
# shape of the dataframe
print(f"{raw_data.shape[0]} rows and {raw_data.shape[1]} columns")

# columns
print(f"Columns: {raw_data.columns.values.tolist()}")

In [None]:
# display the first rows of the dataframe
raw_data.head()

### Format the data

In [None]:
raw_data.dropna(subset=['ts'], inplace=True)

In [None]:
raw_data = raw_data.sort_values(by=['ts'])

In [None]:
# raw_data['ts'] = pd.to_datetime(raw_data['ts'], unit='s')
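
The formatting steps above can be tried on a small hypothetical frame: rows without a timestamp are dropped, the rest are sorted by time, and `ts` is interpreted as Unix epoch seconds, which is an assumption based on the commented-out `unit='s'` conversion. This sketch writes the converted value to a separate, hypothetical `time` column instead of overwriting `ts`.

```python
import numpy as np
import pandas as pd

# Hypothetical readings: out of order, with one missing timestamp.
frame = pd.DataFrame({
    'ts': [1600000010, np.nan, 1600000000, 1600000005],
    'temperature': [31.9, 32.0, 31.5, 31.7],
})

# Same steps as the cells above: drop rows without ts, then sort by time.
frame = frame.dropna(subset=['ts']).sort_values(by=['ts'])

# Optional: a readable timestamp, assuming ts holds epoch seconds.
frame['time'] = pd.to_datetime(frame['ts'], unit='s')
print(frame)
```

Keeping `ts` numeric matters later: the RUL is computed as a difference of `ts` values, so converting `ts` itself to datetimes would turn RUL into `Timedelta` objects instead of seconds.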

### Get a specific Greengrass Group

In [None]:
# Greengrass groups in the dataset
print(f"Greengrass groups: {raw_data['greengrass_group_id'].unique().tolist()}")

In [None]:
raw_data = raw_data[raw_data['greengrass_group_id'] == MY_GREENGRASS_GROUP_ID]

### Drop unnecessary columns

In [None]:
cleaned_data = raw_data.drop(['__dt', 'greengrass_group_id'], axis=1)

### Calculate the remaining useful life (RUL) for each row

In [None]:
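# RUL = timestamp of the next row whose max_vibration equals MAX_VIBRATION_FLAG, minus the row's own timestamp.
# where() keeps ts only on those rows (NaN elsewhere); bfill() copies each of them back to the earlier rows.
# https://stackoverflow.com/questions/62819482/efficient-way-of-row-based-calculation-in-pandas/62820025#62820025
cleaned_data['RUL'] = (
    cleaned_data['ts'].where(cleaned_data['max_vibration'].eq(MAX_VIBRATION_FLAG)).bfill()
    - cleaned_data['ts']
)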

In [None]:
# rows after the last max-vibration event have no upcoming event, so their RUL is NaN; drop them
cleaned_data = cleaned_data[cleaned_data['RUL'].notna()]

In [None]:
cleaned_data.head()
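
The RUL calculation and the NaN filtering above can be checked on a tiny hypothetical dataset with two max-vibration events, at `ts=20` and `ts=40`. This sketch uses only the two columns the calculation needs.

```python
import pandas as pd

MAX_VIBRATION_FLAG = 1  # same meaning as the notebook constant

# Hypothetical readings every 10 seconds; max_vibration == 1 at ts=20 and ts=40.
toy = pd.DataFrame({
    'ts': [0, 10, 20, 30, 40, 50],
    'max_vibration': [0, 0, 1, 0, 1, 0],
})

# Keep ts only on event rows, back-fill it to earlier rows, subtract each row's ts.
next_event_ts = toy['ts'].where(toy['max_vibration'].eq(MAX_VIBRATION_FLAG)).bfill()
toy['RUL'] = next_event_ts - toy['ts']

# The row at ts=50 has no later event, so its RUL is NaN and is dropped.
toy = toy[toy['RUL'].notna()]
print(toy['RUL'].tolist())  # [20.0, 10.0, 0.0, 10.0, 0.0]
```

The RUL drops to 0 on each event row and restarts from the next event's timestamp, which produces the saw-tooth shape you should see in the plots below.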

### Investigate any anomalies

In [None]:
cleaned_data.hist(column='RUL')

In [None]:
# cleaned_data = cleaned_data.drop(cleaned_data[cleaned_data['RUL']>1000].index)
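
If the histogram shows a long tail (for example, gaps in data collection that inflate the RUL), the commented-out cell drops rows whose RUL exceeds 1000 seconds. A minimal sketch of the same filter written as a boolean mask, on hypothetical values; the threshold comes from the commented cell and is not a recommendation.

```python
import pandas as pd

RUL_THRESHOLD = 1000  # threshold taken from the commented-out cell

toy = pd.DataFrame({'RUL': [5.0, 250.0, 4000.0, 0.0]})

# Keep rows at or below the threshold. This matches dropping the rows above it
# because rows with a NaN RUL were already removed earlier.
filtered = toy[toy['RUL'] <= RUL_THRESHOLD]
print(filtered['RUL'].tolist())  # [5.0, 250.0, 0.0]
```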

In [None]:
# plot every column in its own subplot to spot anomalies
num_columns = cleaned_data.shape[1]

plt.figure(figsize=(15, 60))
for i, column in enumerate(cleaned_data.columns, start=1):
    plt.subplot(num_columns, 1, i)
    plt.plot(cleaned_data[column].values)
    plt.title(column, y=0.5, loc='right')
plt.show()

### Save the cleaned data to a file

In [None]:
cleaned_data.to_csv(CLEANED_DATA_FILE, sep='\t', index=False, encoding='utf-8')

In [None]:
cleaned_data.shape

### Format the data for training

In [None]:
num_steps = 4
# use only these features
used_features = ['zrmsvelocity', 'temperature', 'xrmsvelocity', 'xpeakacceleration',
       'zpeakacceleration', 'zrmsacceleration', 'xrmsacceleration',
       'zkurtosis', 'xkurtosis', 'zcrestfactor', 'xcrestfactor',
       'zpeakvelocity', 'xpeakvelocity', 'zhfrmsacceleration',
       'xhfrmsacceleration']
num_features = len(used_features)

In [None]:
# select the features by name; positional slicing would depend on the column order
x = cleaned_data[used_features].to_numpy()

In [None]:
used_data_len = len(x) - len(x) % num_steps
x = x[0:used_data_len]

In [None]:
# split the rows into non-overlapping windows of num_steps consecutive readings
x_shaped = x.reshape(-1, num_steps, num_features)
x_shaped.shape

In [None]:
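# one label per window: the RUL of the window's first row;
# slicing up to used_data_len keeps len(y_shaped) == len(x_shaped)
y_shaped = cleaned_data['RUL'].to_numpy()[0:used_data_len:num_steps]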
y_shaped.shape
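
The windowing above can be traced on hypothetical data with 10 rows and 2 features (the notebook uses 15). With `num_steps = 4`, the last `10 % 4 = 2` rows are discarded, leaving two windows, and each window is labelled with the RUL of its first row. The small arrays here are made up for illustration.

```python
import numpy as np

num_steps = 4
num_features = 2  # hypothetical; the notebook uses len(used_features) == 15

# Hypothetical data: 10 rows of features and a decreasing RUL column.
x = np.arange(10 * num_features).reshape(10, num_features)
rul = np.arange(10, 0, -1)  # 10, 9, ..., 1

used_data_len = len(x) - len(x) % num_steps  # 10 - 2 = 8
x_shaped = x[:used_data_len].reshape(-1, num_steps, num_features)

# One label per window: rows 0 and 4 start the two windows.
y_shaped = rul[:used_data_len:num_steps]

# Slicing the full column instead would give 3 labels for only 2 windows.
untruncated_labels = rul[::num_steps]
```

Here `x_shaped` has shape `(2, 4, 2)` and `y_shaped` holds `[10, 6]`, so features and labels line up one-to-one.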

### Split train-test data

In [None]:
train_ind = int(TRAIN_TEST_SPLIT_RATIO * x_shaped.shape[0])

x_train = x_shaped[:train_ind]
y_train = y_shaped[:train_ind]
x_test = x_shaped[train_ind:]
y_test = y_shaped[train_ind:]

print(f"Training features shape: {x_train.shape}")
print(f"Training labels shape: {y_train.shape}")
print(f"Test features shape: {x_test.shape}")
print(f"Test labels shape: {y_test.shape}")
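
The split is chronological: the windows are not shuffled, so the test windows come strictly after the training windows in time. A minimal sketch of the index arithmetic on hypothetical arrays with 10 windows:

```python
import numpy as np

TRAIN_TEST_SPLIT_RATIO = 0.80

# Hypothetical windowed data: 10 windows of shape (4, 2) and 10 labels.
x_shaped = np.zeros((10, 4, 2))
y_shaped = np.arange(10)

# int(0.8 * 10) == 8: the first 8 windows train, the last 2 test.
train_ind = int(TRAIN_TEST_SPLIT_RATIO * x_shaped.shape[0])
x_train, x_test = x_shaped[:train_ind], x_shaped[train_ind:]
y_train, y_test = y_shaped[:train_ind], y_shaped[train_ind:]
```

Because `int()` truncates, the training set is rounded down when the product is not a whole number.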

### Design the model

In [None]:
# Option 1: The simplest LSTM model
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(num_steps, num_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

In [None]:
# Option 2: A slightly more complex LSTM model (running this cell replaces the Option 1 model)
model = Sequential()
model.add(LSTM(100, activation='tanh', input_shape=(num_steps, num_features),
               return_sequences=False))
model.add(Dense(units=50, activation='relu'))
model.add(Dense(units=1, activation='linear'))
# recent Keras versions no longer accept `lr`; use `learning_rate`
adam = Adam(learning_rate=0.001)
model.compile(optimizer=adam, loss='mse')
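
To compare the size of the two options, the trainable parameters can be counted by hand. An LSTM layer has four gates, each with an input kernel, a recurrent kernel and a bias, giving `4 * (units * (input_dim + units) + units)` parameters; a Dense layer has `input_dim * units + units`. This sketch assumes the default `use_bias=True`; the helper names are hypothetical, and `model.summary()` should report the same totals.

```python
def lstm_params(input_dim: int, units: int) -> int:
    # Kernel (input_dim x 4*units) + recurrent kernel (units x 4*units) + bias (4*units).
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, units: int) -> int:
    # Weight matrix plus one bias per unit.
    return input_dim * units + units


num_features = 15  # len(used_features) in the notebook

option_1 = lstm_params(num_features, 50) + dense_params(50, 1)
option_2 = (lstm_params(num_features, 100)
            + dense_params(100, 50)
            + dense_params(50, 1))
print(option_1, option_2)  # 13251 51501
```

Option 2 has roughly four times as many parameters, which matters for the limited training data of the sample dataset and for inference on an edge device.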

### Train the model

In [None]:
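# x_test/y_test only report the validation loss after each epoch; they do not update the weights
history = model.fit(x_train, y_train, epochs=1000, validation_data=(x_test, y_test), verbose=1)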

### Make predictions and plot

In [None]:
# Make a prediction for a sample point
model.predict(x_train[55].reshape(-1, num_steps, num_features))
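
`predict()` expects a batch axis, so a single window of shape `(num_steps, num_features)` must become `(1, num_steps, num_features)`. A minimal sketch on a hypothetical random stand-in for `x_train` showing three equivalent ways to do this:

```python
import numpy as np

num_steps, num_features = 4, 15

# Hypothetical stand-in for x_train: 60 random windows.
x_train = np.random.default_rng(0).random((60, num_steps, num_features))

single_a = x_train[55].reshape(-1, num_steps, num_features)  # as in the cell above
single_b = x_train[55:56]                                    # slicing keeps the batch axis
single_c = x_train[55][np.newaxis]                           # add the axis explicitly
```

All three produce the same `(1, 4, 15)` array; slicing avoids repeating the shape constants.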

In [None]:
# Make predictions for the full dataset
x_shaped_prediction = model.predict(x_shaped)
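
The plot below judges the fit visually. To put a number on it, here is a minimal sketch that computes RMSE and MAE (both in seconds) with NumPy. `regression_metrics` is a hypothetical helper; in the notebook you could call it with `y_test` and `model.predict(x_test)`.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE) for two equally long sequences of RUL values."""
    # predict() returns shape (n, 1), so flatten both inputs to 1-D.
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    errors = y_pred - y_true
    rmse = float(np.sqrt(np.mean(errors ** 2)))
    mae = float(np.mean(np.abs(errors)))
    return rmse, mae


# Hypothetical values in the (n, 1) layout that predict() returns.
rmse, mae = regression_metrics([100, 50, 0], [[110], [50], [-10]])
print(rmse, mae)
```

RMSE penalizes large misses more strongly than MAE, so reporting both shows whether the error comes from a few large outliers.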

In [None]:
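plt.style.use('ggplot')
plt.figure(figsize=(20, 7))
plt.plot(y_shaped, label="True value")
plt.plot(x_shaped_prediction.ravel(), label="Predicted value")
# windows to the right of this line were not used for training
plt.axvline(train_ind, color='black', linestyle='--', label="Train/test split")
plt.ylabel("Remaining Useful Life (RUL) in seconds")
plt.xlabel("Window index")
plt.legend()
plt.show()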

### Store the model artifacts

In [None]:
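import os

# make sure the target directory exists before saving
os.makedirs(os.path.dirname(MODEL_FILE), exist_ok=True)
model.save(MODEL_FILE)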

### Next Steps

The trained model is now saved to `MODEL_FILE` (`model/lstm.h5`). You can deploy it to a Greengrass device to make predictions at the edge.


#TODO: explain how to deploy the model on Greengrass device
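
On the device, sensor readings arrive one at a time, while the model expects input of shape `(1, num_steps, num_features)` with the features in the order of `used_features` and unscaled, as in training. The following is a hypothetical sketch, not part of the demonstrator: `ReadingWindow` keeps the most recent `num_steps` readings and returns a model-ready array once enough have arrived. It assumes each reading is a dict keyed by the feature names; how the readings reach the device is out of scope here.

```python
from collections import deque

import numpy as np


class ReadingWindow:
    """Hypothetical helper that buffers the latest num_steps readings for inference."""

    def __init__(self, feature_names, num_steps=4):
        self.feature_names = list(feature_names)
        self.num_steps = num_steps
        self._buffer = deque(maxlen=num_steps)  # the oldest reading drops out first

    def add(self, reading):
        # Order the values exactly as during training (used_features order).
        self._buffer.append([float(reading[name]) for name in self.feature_names])

    def model_input(self):
        """Return an array of shape (1, num_steps, n_features), or None if not full yet."""
        if len(self._buffer) < self.num_steps:
            return None
        return np.array(list(self._buffer), dtype=np.float32)[np.newaxis]


# Usage with two hypothetical features; on the device, pass batch to model.predict().
window = ReadingWindow(['temperature', 'zrmsvelocity'], num_steps=4)
for i in range(5):
    window.add({'temperature': 30 + i, 'zrmsvelocity': 0.1 * i})
batch = window.model_input()
```

The training cells use non-overlapping windows, whereas this buffer slides by one reading, so the model can produce a new RUL estimate after every reading once the buffer is full.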