This notebook creates features for normal fan operation and for operation with an anomaly.  
The feature created is the median absolute deviation (MAD).  
It is selected because it is more robust to outliers than other statistics such as the mean or the standard deviation.  
The MAD of each 200-record sample is calculated for each of the 'x', 'y' and 'z' axes.  
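
As a small illustration of that robustness, the sketch below compares the standard deviation and the MAD of a short sample before and after one reading is replaced by an extreme value. The numbers are invented for illustration and are not fan data; `scipy.stats.median_abs_deviation` is the same function used later in this notebook.

```python
import numpy as np
from scipy import stats

# Hypothetical readings, invented for illustration only.
clean = np.array([0.98, 0.99, 1.00, 1.01, 1.02])
spiked = clean.copy()
spiked[-1] = 5.0  # replace one reading with an extreme outlier

# Standard deviation before and after the outlier is introduced.
std_clean, std_spiked = np.std(clean), np.std(spiked)

# MAD = median(|x - median(x)|), with scipy's default scale of 1.0.
mad_clean = stats.median_abs_deviation(clean)
mad_spiked = stats.median_abs_deviation(spiked)

print(f"std: {std_clean:.4f} -> {std_spiked:.4f}")
print(f"MAD: {mad_clean:.4f} -> {mad_spiked:.4f}")
```

In this toy sample the single outlier inflates the standard deviation by orders of magnitude while the MAD stays where it was, because the median ignores how far the extreme value lies from the rest.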


In [1]:
import pandas as pd
import numpy as np
from scipy import stats
import seaborn as sn
import matplotlib.pyplot as plt
%matplotlib inline

In [3]:
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers, regularizers

In [4]:
# Print versions
!python --version
print('Numpy ' + np.__version__)
print('TensorFlow ' + tf.__version__)
print('Keras ' + tf.keras.__version__)

Python 3.8.3
Numpy 1.19.5
TensorFlow 2.4.1
Keras 2.4.0


In [5]:
from sklearn.metrics import classification_report, confusion_matrix,precision_recall_curve,auc

In [6]:
from sklearn.model_selection import train_test_split

Read in the normal and anomaly data.  
These are 200 * 200 records of accelerometer readings for each fan speed, with and without weight.  

In [7]:
fan_normal = pd.read_csv("fan_normal_op.csv")

In [8]:
fan_anomaly = pd.read_csv("fan_anomaly_op.csv")

In [9]:
fan_normal.shape

(120600, 6)

In [10]:
fan_normal.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 120600 entries, 0 to 120599
Data columns (total 6 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   x       120600 non-null  float64
 1   y       120600 non-null  float64
 2   z       120600 non-null  float64
 3   count   120600 non-null  int64  
 4   weight  120600 non-null  int64  
 5   speed   120600 non-null  int64  
dtypes: float64(3), int64(3)
memory usage: 5.5 MB


In [11]:
fan_normal.head()

,x,y,z,count,weight,speed
0,-0.012695,0.203125,0.970703,0,0,0
1,0.000488,0.208008,0.979981,0,0,0
2,-0.027344,0.196289,0.984375,0,0,0
3,0.005371,0.217285,0.991699,0,0,0
4,-0.02002,0.199219,0.980469,0,0,0


In [12]:
fan_anomaly.shape

(120600, 6)

In [13]:
fan_anomaly.head()

,x,y,z,count,weight,speed
0,-0.019043,0.20166,0.974609,0,1,0
1,-0.019043,0.189453,0.975586,0,1,0
2,-0.018555,0.192871,0.982422,0,1,0
3,-0.020508,0.192383,0.979004,0,1,0
4,-0.020996,0.193359,0.976563,0,1,0


In [14]:
fan_anomaly.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 120600 entries, 0 to 120599
Data columns (total 6 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   x       120600 non-null  float64
 1   y       120600 non-null  float64
 2   z       120600 non-null  float64
 3   count   120600 non-null  int64  
 4   weight  120600 non-null  int64  
 5   speed   120600 non-null  int64  
dtypes: float64(3), int64(3)
memory usage: 5.5 MB
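
A quick way to check the sample layout stated when the data was read in is to count rows per group. The sketch below uses a small synthetic frame: the column names match `fan_normal`, but the values and sizes are invented. In this synthetic frame `count` restarts for every speed, so grouping by `count` alone pools readings from all speeds into one group. Whether the fan data behaves the same way is an assumption that the same two calls on `fan_normal` would confirm or rule out.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for fan_normal: 2 speeds x 3 samples x 4 readings.
demo = pd.DataFrame({
    "speed": np.repeat([1, 2], 12),
    "count": np.tile(np.repeat([0, 1, 2], 4), 2),
    "x": np.linspace(-0.02, 0.02, 24),
})

# Rows that end up in each group when grouping by 'count' only.
rows_per_count = demo.groupby("count").size()

# Rows per group when the fan speed is part of the key as well.
rows_per_speed_count = demo.groupby(["speed", "count"]).size()

print(rows_per_count)
print(rows_per_speed_count)
```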


For each 200-record sample of accelerometer readings, calculate the median absolute deviation.

In [15]:
normal_mad_x = fan_normal.groupby(['count'])['x'].agg([lambda x :stats.median_abs_deviation(x)])
normal_mad_x.columns = ['mad_x']
normal_mad_x

count,mad_x
0,0.015381
1,0.015625
2,0.015137
3,0.014648
4,0.014160
...,...
196,0.015137
197,0.015381
198,0.014404
199,0.014648


In [16]:
normal_mad_y = fan_normal.groupby(['count'])['y'].agg([lambda x :stats.median_abs_deviation(x)])
normal_mad_y.columns = ['mad_y']
normal_mad_y

count,mad_y
0,0.011718
1,0.012939
2,0.011963
3,0.011963
4,0.011718
...,...
196,0.012207
197,0.012696
198,0.011231
199,0.012207


In [17]:
normal_mad_z = fan_normal.groupby(['count'])['z'].agg([lambda x :stats.median_abs_deviation(x)])
normal_mad_z.columns = ['mad_z']
normal_mad_z

count,mad_z
0,0.005860
1,0.006348
2,0.007080
3,0.006348
4,0.005859
...,...
196,0.006348
197,0.006348
198,0.005859
199,0.006836


Combine the MAD readings for x, y and z into a single DataFrame.  

In [18]:
normal_df = normal_mad_x.merge(normal_mad_y, left_index =True , right_index =True)
normal_df

count,mad_x,mad_y
0,0.015381,0.011718
1,0.015625,0.012939
2,0.015137,0.011963
3,0.014648,0.011963
4,0.014160,0.011718
...,...,...
196,0.015137,0.012207
197,0.015381,0.012696
198,0.014404,0.011231
199,0.014648,0.012207


In [19]:
normal_df = normal_df.merge(normal_mad_z, left_index =True , right_index =True)
normal_df

count,mad_x,mad_y,mad_z
0,0.015381,0.011718,0.005860
1,0.015625,0.012939,0.006348
2,0.015137,0.011963,0.007080
3,0.014648,0.011963,0.006348
4,0.014160,0.011718,0.005859
...,...,...,...
196,0.015137,0.012207,0.006348
197,0.015381,0.012696,0.006348
198,0.014404,0.011231,0.005859
199,0.014648,0.012207,0.006836


In [20]:
normal_df.head()

count,mad_x,mad_y,mad_z
0,0.015381,0.011718,0.00586
1,0.015625,0.012939,0.006348
2,0.015137,0.011963,0.00708
3,0.014648,0.011963,0.006348
4,0.01416,0.011718,0.005859
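
The three per-axis aggregations and the two merges above could also be written as a single group-by over all three columns. This is a minimal sketch on synthetic data, not the notebook's own method: `demo` and its values are invented, and only the column names mirror `fan_normal`.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for fan_normal: 3 samples ('count') of 5 readings each.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "x": rng.normal(0.0, 0.02, 15),
    "y": rng.normal(0.2, 0.02, 15),
    "z": rng.normal(1.0, 0.01, 15),
    "count": np.repeat([0, 1, 2], 5),
})

# Aggregate all three axes at once; the function is applied per column
# within each group, then the columns are renamed to mad_x, mad_y, mad_z.
mad_df = (
    demo.groupby("count")[["x", "y", "z"]]
    .agg(lambda s: stats.median_abs_deviation(s.to_numpy()))
    .add_prefix("mad_")
)
print(mad_df)
```

The result is indexed by `count` with one column per axis, which is the same shape the merges produce.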


Add an 'anomaly' column set to 0 to label these readings as normal.

In [21]:
normal_df['anomaly'] = 0

In [22]:
normal_df.head()

count,mad_x,mad_y,mad_z,anomaly
0,0.015381,0.011718,0.00586,0
1,0.015625,0.012939,0.006348,0
2,0.015137,0.011963,0.00708,0
3,0.014648,0.011963,0.006348,0
4,0.01416,0.011718,0.005859,0


Similarly, create a DataFrame of MAD features for the anomaly readings.  

In [23]:
anomaly_mad_x = fan_anomaly.groupby(['count'])['x'].agg([lambda x :stats.median_abs_deviation(x)])
anomaly_mad_x.columns = ['mad_x']
anomaly_mad_x

count,mad_x
0,0.011231
1,0.009766
2,0.011718
3,0.010254
4,0.010254
...,...
196,0.008301
197,0.009278
198,0.009277
199,0.009765


In [24]:
anomaly_mad_y = fan_anomaly.groupby(['count'])['y'].agg([lambda x :stats.median_abs_deviation(x)])
anomaly_mad_y.columns = ['mad_y']
anomaly_mad_y

count,mad_y
0,0.009033
1,0.009277
2,0.010253
3,0.009765
4,0.009521
...,...
196,0.008789
197,0.008301
198,0.008301
199,0.008789


In [25]:
anomaly_mad_z = fan_anomaly.groupby(['count'])['z'].agg([lambda x :stats.median_abs_deviation(x)])
anomaly_mad_z.columns = ['mad_z']
anomaly_mad_z

count,mad_z
0,0.005859
1,0.006347
2,0.006836
3,0.006347
4,0.006836
...,...
196,0.006348
197,0.006347
198,0.006348
199,0.006348


In [26]:
anomaly_df = anomaly_mad_x.merge(anomaly_mad_y, left_index =True , right_index =True)
anomaly_df

count,mad_x,mad_y
0,0.011231,0.009033
1,0.009766,0.009277
2,0.011718,0.010253
3,0.010254,0.009765
4,0.010254,0.009521
...,...,...
196,0.008301,0.008789
197,0.009278,0.008301
198,0.009277,0.008301
199,0.009765,0.008789


In [27]:
anomaly_df = anomaly_df.merge(anomaly_mad_z, left_index =True , right_index =True)
anomaly_df

count,mad_x,mad_y,mad_z
0,0.011231,0.009033,0.005859
1,0.009766,0.009277,0.006347
2,0.011718,0.010253,0.006836
3,0.010254,0.009765,0.006347
4,0.010254,0.009521,0.006836
...,...,...,...
196,0.008301,0.008789,0.006348
197,0.009278,0.008301,0.006347
198,0.009277,0.008301,0.006348
199,0.009765,0.008789,0.006348


Add an 'anomaly' column set to 1 to label these readings as anomalous.

In [28]:
anomaly_df['anomaly'] = 1

In [29]:
anomaly_df.head()

count,mad_x,mad_y,mad_z,anomaly
0,0.011231,0.009033,0.005859,1
1,0.009766,0.009277,0.006347,1
2,0.011718,0.010253,0.006836,1
3,0.010254,0.009765,0.006347,1
4,0.010254,0.009521,0.006836,1


Combine the two DataFrames for modelling.  

In [30]:
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
features_df = pd.concat([normal_df, anomaly_df])
features_df

count,mad_x,mad_y,mad_z,anomaly
0,0.015381,0.011718,0.005860,0
1,0.015625,0.012939,0.006348,0
2,0.015137,0.011963,0.007080,0
3,0.014648,0.011963,0.006348,0
4,0.014160,0.011718,0.005859,0
...,...,...,...,...
196,0.008301,0.008789,0.006348,1
197,0.009278,0.008301,0.006347,1
198,0.009277,0.008301,0.006348,1
199,0.009765,0.008789,0.006348,1
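
In the combined table the index values 0 to 199 appear twice, once per class. The sketch below shows on two miniature, invented tables how stacking with `pd.concat` keeps those duplicate labels, how `ignore_index=True` would give each row a unique position instead, and how to check the class balance. The frames `normal_demo` and `anomaly_demo` are hypothetical stand-ins for `normal_df` and `anomaly_df`.

```python
import pandas as pd

# Hypothetical miniature versions of normal_df and anomaly_df.
normal_demo = pd.DataFrame(
    {"mad_x": [0.015, 0.016], "anomaly": 0},
    index=pd.Index([0, 1], name="count"),
)
anomaly_demo = pd.DataFrame(
    {"mad_x": [0.011, 0.010], "anomaly": 1},
    index=pd.Index([0, 1], name="count"),
)

# Plain stacking keeps the original 'count' labels, so they repeat.
stacked = pd.concat([normal_demo, anomaly_demo])

# ignore_index=True discards 'count' and numbers the rows 0..n-1.
combined = pd.concat([normal_demo, anomaly_demo], ignore_index=True)

# Number of rows per class label.
label_counts = combined["anomaly"].value_counts()
print(stacked.index.is_unique, combined.index.is_unique)
print(label_counts)
```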


In [31]:
# Save the feature tables; index=False leaves the 'count' index out of the CSV
normal_df.to_csv("normal_features_df.csv", index = False)
anomaly_df.to_csv("anomaly_features_df.csv", index = False)
features_df.to_csv("all_features_df.csv", index = False)
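
Because the feature tables are indexed by `count`, writing them with `index=False` means the sample number is not stored in the CSV. The sketch below round-trips a tiny invented table through an in-memory buffer to show the difference; `demo` is a hypothetical stand-in for `normal_df`, and no files are written.

```python
import io
import pandas as pd

# Hypothetical miniature feature table indexed by 'count'.
demo = pd.DataFrame(
    {"mad_x": [0.015, 0.016]},
    index=pd.Index([0, 1], name="count"),
)

# Write once without the index and once with it (the to_csv default).
without_index = io.StringIO()
demo.to_csv(without_index, index=False)
with_index = io.StringIO()
demo.to_csv(with_index)

# Read both back and compare the columns that survived.
cols_without = pd.read_csv(io.StringIO(without_index.getvalue())).columns.tolist()
cols_with = pd.read_csv(io.StringIO(with_index.getvalue())).columns.tolist()
print(cols_without)
print(cols_with)
```

If the sample number is needed downstream, keeping the index when saving is one option; if only the features and the label matter for modelling, dropping it as this notebook does is sufficient.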