# Advanced Data Science Capstone
# NASA Bearings Dataset

In [1]:
pip install tensorflow==2.2.0

Note: you may need to restart the kernel to use updated packages.

In [3]:
import tensorflow as tf
# The version string must match the release pinned in the install cell above
if tf.__version__ != '2.2.0':
    print(tf.__version__)
    raise ValueError('please upgrade to TensorFlow 2.2.0, or restart your Kernel (Kernel->Restart & Clear Output)')

Now we import all the dependencies 

In [1]:
import numpy as np
from numpy import concatenate
from matplotlib import pyplot
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
import sklearn
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM, Activation
from tensorflow.keras.callbacks import Callback
import pickle
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import sys
from queue import Queue
import pandas as pd
import json
%matplotlib inline

We grab the files necessary for training. These are four CSV files with bearing vibration data: two from broken bearings (an inner-bearing fault and a roller-bearing fault), plus Bearing 1 and Bearing 4. The workflow follows a tutorial that used data sampled from a Lorenz attractor model implemented in Node-RED; if you are interested in how that data was generated, please have a look at it: https://developer.ibm.com/tutorials/iot-deep-learning-anomaly-detection-2/

In [None]:
!rm watsoniotp.*
!wget https://raw.githubusercontent.com/Zernez/Coursera_Advanced_Capstone/main/Dataset/DataBroken_InnerBearing.csv
!wget https://raw.githubusercontent.com/Zernez/Coursera_Advanced_Capstone/main/Dataset/DataBroken_RollerBearing.csv
!wget https://raw.githubusercontent.com/Zernez/Coursera_Advanced_Capstone/main/Dataset/Data_Bearing4.csv
!wget https://raw.githubusercontent.com/Zernez/Coursera_Advanced_Capstone/main/Dataset/Data_Bearing1.csv   

Load each CSV file into a pandas DataFrame. The files have no header row, so we name the two columns Bearing_X and Bearing_Y ourselves.

In [None]:
df_fault_inner = pd.read_csv('DataBroken_InnerBearing.csv', sep=",", header=None, names=["Bearing_X", "Bearing_Y"])
df_fault_roller = pd.read_csv('DataBroken_RollerBearing.csv', sep=",", header=None, names=["Bearing_X", "Bearing_Y"])
df_bearing_4 = pd.read_csv('Data_Bearing4.csv', sep=",", header=None, names=["Bearing_X", "Bearing_Y"])
df_bearing_1 = pd.read_csv('Data_Bearing1.csv', sep=",", header=None, names=["Bearing_X", "Bearing_Y"])
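
As a minimal sketch of what `header=None` and `names` do in the cell above, the following reads a tiny in-memory CSV. The three rows of values are made up for illustration and are not taken from the bearing files.

```python
import io
import pandas as pd

# Hypothetical two-column CSV without a header row, standing in for one of the bearing files.
csv_text = "0.10,-0.20\n0.05,0.30\n-0.15,0.25\n"

# header=None: the first line is data, not column names; names= supplies the column names.
df = pd.read_csv(io.StringIO(csv_text), sep=",", header=None, names=["Bearing_X", "Bearing_Y"])

print(df.shape)           # (number of rows, number of columns)
print(list(df.columns))
```

Without `header=None`, pandas would treat the first row of measurements as column names and silently drop one sample.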

In [None]:
df_fault_inner

The data has two columns and one row per sample. In other words, two vibration sensor axes (Bearing_X and Bearing_Y) with one reading per sample.

Since this is two-axis vibration data, we can plot one axis against the other in a phase plot. First for the healthy data

Then for the broken one

In the previous examples, we fed the raw data into an LSTM. Now we want to use an ordinary feed-forward network, so we need to pre-process this time series data.

A widely used method in traditional data science and signal processing is the discrete Fourier transform (DFT). It transforms a signal from the time domain to the frequency domain, or in other words, it returns the frequency spectrum of the signal.

The most widely used algorithm for computing it is the FFT, which stands for fast Fourier transform. Let's run it and see what it returns.


In [None]:
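# axis=0 runs the FFT down each column (one spectrum per sensor axis);
# the default axis=-1 would transform across the two columns of every row instead.
data_fault_inner_fft = np.fft.fft(df_fault_inner, axis=0).real
data_fault_roller_fft = np.fft.fft(df_fault_roller, axis=0).real
data_bearing_4_fft = np.fft.fft(df_bearing_4, axis=0).real
data_bearing_1_fft = np.fft.fft(df_bearing_1, axis=0).real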
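
To get a feeling for what the FFT returns, here is a small sketch on a synthetic signal rather than on the bearing data. The signal length and the number of cycles are arbitrary assumptions chosen so that the result is easy to read.

```python
import numpy as np

n_samples = 64                      # hypothetical signal length
cycles = 5                          # the test signal completes 5 full cycles in the window
n = np.arange(n_samples)
signal = np.cos(2 * np.pi * cycles * n / n_samples)

spectrum = np.fft.fft(signal)       # complex spectrum, same length as the input
magnitude = np.abs(spectrum)

# For a real-valued input the spectrum is mirrored, so only the first half is inspected.
peak_bin = int(np.argmax(magnitude[: n_samples // 2]))
print(peak_bin)
```

The strongest frequency band is the one whose index equals the number of cycles in the window, which is what makes the spectrum a useful feature for telling different vibration patterns apart.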

Let’s first have a look at the shape and contents of the arrays.

In [None]:
data_fault_inner_fft.shape

First, we notice that the shape is the same as that of the input data. So if we have 3000 samples, we get back 3000 spectrum values, or in other words the intensities of 3000 frequency bands.
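
The output shape matches the input shape regardless of which axis is transformed, so the shape alone does not tell you whether the FFT ran over the samples. NumPy transforms along the last axis by default. The sketch below uses a small random array (an assumption, not the bearing data) to show the difference for a table with one row per sample and one column per sensor axis.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(8, 2))          # hypothetical: 8 samples, 2 sensor axes

per_column = np.fft.fft(data, axis=0)   # one 8-point spectrum per sensor axis
per_row = np.fft.fft(data)              # default axis=-1: a 2-point FFT of every row

print(per_column.shape, per_row.shape)  # both keep the input shape

# A 2-point FFT is just the sum and the difference of the two values in a row.
print(np.allclose(per_row[:, 0], data[:, 0] + data[:, 1]))
```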

The second thing to notice is the data type. The FFT itself returns complex numbers rather than floats. This is a means for the algorithm to return two different frequency decompositions in one go: the real part holds the cosine components and the imaginary part the sine components. In the cell above we kept only the real part with .real, so the arrays shown here are floats again. We ignore the imaginary part in this example since it turns out that the real part already gives us enough information to implement a good classifier.
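
The split into real and imaginary parts can be checked on two synthetic signals. This is a sketch with an arbitrarily chosen length and frequency bin; it is not part of the original exercise.

```python
import numpy as np

n_samples = 32
k = 3                                # hypothetical frequency bin
n = np.arange(n_samples)

cos_spectrum = np.fft.fft(np.cos(2 * np.pi * k * n / n_samples))
sin_spectrum = np.fft.fft(np.sin(2 * np.pi * k * n / n_samples))

print(np.round(cos_spectrum[k], 6))  # a pure cosine shows up in the real part
print(np.round(sin_spectrum[k], 6))  # a pure sine shows up in the imaginary part
```

Keeping only `.real` therefore discards whatever part of the signal is in sine phase relative to the start of the window, which is the simplification this notebook accepts.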

But first let's plot the arrays to get an idea of how healthy and broken frequency spectra differ.


In [None]:
fig, ax = plt.subplots(num=None, figsize=(14, 6), dpi=80, facecolor='w', edgecolor='k')
size = len(data_fault_inner_fft)
ax.plot(range(0,size), data_fault_inner_fft[:,0].real, '-', color='blue', animated = True, linewidth=1)
ax.plot(range(0,size), data_fault_inner_fft[:,1].real, '-', color='red', animated = True, linewidth=1)

In [None]:
fig, ax = plt.subplots(num=None, figsize=(14, 12), dpi=80, facecolor='w', edgecolor='k')
size = len(data_fault_roller_fft)
ax.plot(range(0,size), data_fault_roller_fft[:,0].real, '-', color='blue', animated = True, linewidth=1)
ax.plot(range(0,size), data_fault_roller_fft[:,1].real, '-', color='red', animated = True, linewidth=1)

In [None]:
fig, ax = plt.subplots(num=None, figsize=(14, 6), dpi=80, facecolor='w', edgecolor='k')
size = len(data_bearing_4_fft)
ax.plot(range(0,size), data_bearing_4_fft[:,0].real, '-', color='blue', animated = True, linewidth=1)
ax.plot(range(0,size), data_bearing_4_fft[:,1].real, '-', color='red', animated = True, linewidth=1)

In [None]:
fig, ax = plt.subplots(num=None, figsize=(14, 6), dpi=80, facecolor='w', edgecolor='k')
size = len(data_bearing_1_fft)
ax.plot(range(0,size), data_bearing_1_fft[:,0].real, '-', color='blue', animated = True, linewidth=1)
ax.plot(range(0,size), data_bearing_1_fft[:,1].real, '-', color='red', animated = True, linewidth=1)

So, what we've been doing is a so-called feature transformation step. We've transformed the data set in a way that our machine learning algorithm, a deep feed-forward neural network implemented as a binary classifier, works better. So now let's scale the data to the range 0..1.

In [None]:
def scaleData(data):
    # normalize features
    scaler = MinMaxScaler(feature_range=(0, 1))
    return scaler.fit_transform(data)
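
What the scaler does with the default range (0, 1) can be written out by hand. The following is a NumPy sketch of the same per-column formula, using made-up numbers; unlike `MinMaxScaler`, it assumes that no column is constant.

```python
import numpy as np

def min_max_scale(data):
    # Scale every column independently to the range [0, 1].
    # Assumes max > min in each column (no constant columns).
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    return (data - col_min) / (col_max - col_min)

example = np.array([[-4.0, 10.0],
                    [ 0.0, 20.0],
                    [ 4.0, 40.0]])
scaled = min_max_scale(example)
print(scaled)
```

Because each column is scaled on its own, the two sensor axes end up on the same 0..1 range even if their raw spectra differ in magnitude.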

And please don't worry about any warnings. As explained before, we don't need the imaginary part of the FFT, and it was already dropped with .real above.

In [None]:
data_fault_inner_scaled = scaleData(data_fault_inner_fft)
data_fault_roller_scaled = scaleData(data_fault_roller_fft)
data_bearing_4_scaled = scaleData(data_bearing_4_fft)
data_bearing_1_scaled = scaleData(data_bearing_1_fft)

In [None]:
data_fault_inner_scaled = data_fault_inner_scaled.T
data_fault_roller_scaled = data_fault_roller_scaled.T
data_bearing_4_scaled = data_bearing_4_scaled.T
data_bearing_1_scaled = data_bearing_1_scaled.T

In [None]:
data_fault_inner_scaled.shape

In [None]:
data_fault_roller_scaled.shape

In [None]:
data_bearing_4_scaled.shape

In [None]:
data_bearing_1_scaled.shape

Now we reshape again to have two examples (rows) and 18000 features (columns). It's important that you understand this. Our initial data set contained 2 columns (dimensions) of 18000 samples each. Since we applied the FFT to each column, we've obtained 18000 spectrum values for each of the 2 columns. We are now using each column with its 18000 spectrum values as one row (training example), and each of the 18000 spectrum values becomes a column (or feature) in the training data set.

In [None]:
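# reshape returns a new array, so the result has to be assigned back
data_fault_inner_scaled = data_fault_inner_scaled.reshape(2, 18000)
data_fault_roller_scaled = data_fault_roller_scaled.reshape(2, 18000)
data_bearing_4_scaled = data_bearing_4_scaled.reshape(2, 18000)
data_bearing_1_scaled = data_bearing_1_scaled.reshape(2, 18000)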
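
Turning columns into rows is the job of the transpose, and it is worth seeing why a plain reshape would not be a substitute. The sketch below uses a hypothetical miniature data set with 4 samples instead of 18000.

```python
import numpy as np

# Hypothetical miniature data set: 4 samples (rows) of 2 sensor axes (columns).
data = np.array([[1, 10],
                 [2, 20],
                 [3, 30],
                 [4, 40]])

transposed = data.T               # shape (2, 4): each row is one complete sensor axis
reshaped = data.reshape(2, 4)     # shape (2, 4) too, but values are taken in row-major order

print(transposed)
print(reshaped)
print(data.shape)                 # unchanged: neither call modifies the original array
```

The transposed array keeps each sensor axis together as one training example, while the reshaped one interleaves values from both axes in every row.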

# Start of Assignment

The first thing we need to do is to install a little helper library for submitting the solutions to the Coursera grader:

Please specify the email address you are using with Coursera here:


## Task

Given the explanation above, please fill in the following two constants in order to make the neural network work properly.

In [None]:
import time
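dim = 18000   # number of features (spectrum values) per training example
samples = 2   # number of training examples (rows) per data set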

### Submission

Now it's time to submit your first solution. Please make sure that the secret variable contains a valid submission token. You can obtain it from the Coursera web page of the course, in the grader section of this assignment.


To observe how training works we just print the loss during training

In [None]:
class LossHistory(Callback):
    """Prints and collects the loss after every batch."""
    def on_train_begin(self, logs=None):
        self.losses = []

    def on_batch_end(self, batch, logs=None):
        loss = (logs or {}).get('loss')
        sys.stdout.write(str(loss) + ', ')
        sys.stdout.flush()
        self.losses.append(loss)

lr = LossHistory()

## Task

Please fill in the following constants to properly configure the neural network. For some of them you have to find out the precise value; for others you can try different values and see how the neural network performs at a later stage. The grader only looks at the values which need to be precise.


In [None]:
number_of_neurons_layer1 = 1
number_of_neurons_layer2 = 1
number_of_neurons_layer3 = 1
number_of_epochs = 40

## Task

Now it’s time to create the model. Please fill in the placeholders. Please note since this is only a toy example, we don't use a separate corpus for training and testing. Just use the same data for fitting and scoring


In [None]:
# design network
from tensorflow.keras import optimizers
sgd = optimizers.SGD(learning_rate=0.01, clipnorm=1.)

model = Sequential()
model.add(Dense(number_of_neurons_layer1,input_shape=(dim, ), activation='relu'))
model.add(Dense(number_of_neurons_layer2, activation='relu'))
model.add(Dense(number_of_neurons_layer3, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer=sgd)

def train(data,label):
    model.fit(data, label, epochs=number_of_epochs, batch_size=72, validation_data=(data, label), verbose=0, shuffle=True,callbacks=[lr])

def score(data):
    return model.predict(data)
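
To make clear what the three `Dense` layers and the loss compute, here is a plain NumPy sketch of the forward pass and of binary cross-entropy. The layer sizes, weights and inputs are toy assumptions; the real model learns its weights during training and uses `dim` input features.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(features, weights, biases):
    # Two hidden ReLU layers followed by a single sigmoid output unit.
    h1 = relu(features @ weights[0] + biases[0])
    h2 = relu(h1 @ weights[1] + biases[1])
    return sigmoid(h2 @ weights[2] + biases[2])

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0), then average the per-example loss.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))))

rng = np.random.default_rng(0)
dim, hidden1, hidden2 = 6, 4, 3                      # hypothetical toy sizes
weights = [rng.normal(size=(dim, hidden1)),
           rng.normal(size=(hidden1, hidden2)),
           rng.normal(size=(hidden2, 1))]
biases = [np.zeros(hidden1), np.zeros(hidden2), np.zeros(1)]

x = rng.normal(size=(4, dim))                        # 4 examples
y = np.array([[1.0], [1.0], [0.0], [0.0]])
pred = forward(x, weights, biases)
print(pred.shape, binary_crossentropy(y, pred))
```

The sigmoid output lies between 0 and 1 and is read as the probability of label 1, which is why the last layer needs exactly one neuron for a binary classifier.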

We prepare the training data by concatenating a label “0” for the broken and a label “1” for the healthy data. Finally we union the two data sets together

In [None]:
# Labels follow the text above: 1 = healthy, 0 = broken (two examples per data set)
label_healthy = np.repeat(1,2)
label_healthy.shape = (2,1)
label_broken = np.repeat(0,2)
label_broken.shape = (2,1)

train_healthy = np.hstack((data_bearing_1_scaled,label_healthy))
train_broken = np.hstack((data_fault_inner_scaled,label_broken))
train_both = np.vstack((train_healthy,train_broken))
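
The labelling and union steps can be followed on a miniature example. The feature values below are made up; only the pattern of `hstack` for appending the label column and `vstack` for stacking the two sets mirrors the cell above.

```python
import numpy as np

healthy = np.array([[0.1, 0.2, 0.3],
                    [0.4, 0.5, 0.6]])        # hypothetical: 2 examples, 3 features
broken = np.array([[0.9, 0.8, 0.7],
                   [0.6, 0.5, 0.4]])

label_healthy = np.ones((2, 1))              # label 1 for healthy
label_broken = np.zeros((2, 1))              # label 0 for broken

train_healthy = np.hstack((healthy, label_healthy))   # label becomes the last column
train_broken = np.hstack((broken, label_broken))
train_both = np.vstack((train_healthy, train_broken)) # rows of both sets, one below the other

print(train_both.shape)
print(train_both[:, -1])
```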

Let’s have a look at the two training sets for broken and healthy and at the union of them. Note that the last column is the label

In [None]:
pd.DataFrame(train_healthy)

In [None]:
pd.DataFrame(train_broken)

In [None]:
pd.DataFrame(train_both)

So those are frequency bands. Notice that although many frequency bands have nearly the same energy, the neural network can still pick out the ones that differ significantly.

## Task

Now it's time to do the training. Please provide the first 18000 columns of the array as the 1st parameter and column number 18000, which contains the label, as the 2nd parameter. Please use the Python array slicing syntax to obtain those.

The following link tells you more about the numpy array slicing syntax
https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html


In [None]:
features = train_both[:,0:18000]
labels = train_both[:,18000]
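
The same slicing pattern on a small array shows what shapes come out. The table size here is a stand-in; the notebook uses 18000 feature columns.

```python
import numpy as np

n_features = 3                                    # hypothetical; the notebook uses 18000
table = np.arange(20, dtype=float).reshape(5, 4)  # 5 rows: 3 feature columns + 1 label column

features = table[:, 0:n_features]   # columns 0..2, shape (5, 3)
labels = table[:, n_features]       # a single column index gives a 1-D array, shape (5,)
labels_2d = table[:, n_features:]   # a slice keeps the column dimension, shape (5, 1)

print(features.shape, labels.shape, labels_2d.shape)
```

Note that the end index of a slice is exclusive, so `0:n_features` stops just before the label column.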

In [None]:
pd.DataFrame(features)

In [None]:
pd.DataFrame(labels)

Now it's time to do the training. You should see the loss go down; we will also plot it later. Note: we could also use TensorBoard for this, but for this simple scenario we skip it. In some rare cases training doesn't converge, simply because the random initialization of the weights caused gradient descent to start at a sub-optimal spot on the cost surface. If that happens, recreate the model (the cell which contains *model = Sequential()*), re-run all subsequent steps and train again.



In [None]:
train(features,labels)

Let's plot the losses

In [None]:
fig, ax = plt.subplots(num=None, figsize=(14, 6), dpi=100, facecolor='w', edgecolor='k')
size = len(lr.losses)
ax.plot(range(0,size), lr.losses, '-', color='blue', animated = True, linewidth=1)

Now let’s examine whether we are getting good results. Note: best practice is to use a training and a test data set for this which we’ve omitted here for simplicity

In [None]:
score(data_bearing_1_scaled)

In [None]:
score(data_fault_inner_scaled)
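
The scores are sigmoid outputs between 0 and 1, so one way to read them is to apply a cut-off. The sketch below assumes a threshold of 0.5 and uses invented score values; neither is taken from the notebook.

```python
import numpy as np

# Hypothetical sigmoid outputs, shaped like the result of model.predict for four examples.
scores = np.array([[0.93], [0.81], [0.12], [0.40]])

threshold = 0.5                                   # assumed cut-off, not taken from the notebook
predicted = (scores >= threshold).astype(int).ravel()
expected = np.array([1, 1, 0, 0])                 # 1 = healthy, 0 = broken

accuracy = float(np.mean(predicted == expected))
print(predicted, accuracy)
```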