Burcu Belen

# Detection of Sensor Failures with CNN



### Description of Dataset

Condition monitoring of hydraulic systems data: https://archive.ics.uci.edu/ml/datasets/Condition+monitoring+of+hydraulic+systems

The dataset addresses the condition assessment of a hydraulic test rig based on multi-sensor data. Four fault types are superimposed with several severity grades, which impedes selective quantification.
The data set was obtained experimentally with a hydraulic test rig. The rig consists of a primary working circuit and a secondary cooling-filtration circuit, which are connected via the oil tank. The system cyclically repeats constant load cycles (60 seconds each) and measures process values such as pressures, volume flows and temperatures, while the condition of four hydraulic components (cooler, valve, pump and accumulator) is quantitatively varied. The aim is to predict the state of the hydraulic components from the four temperature sensors (TS1, TS2, TS3, TS4), sampled at 1 Hz (60 observations per cycle); this notebook targets the cooler condition. A convolutional neural network (CNN) built with Keras is used for this classification problem.

The data set contains raw process sensor data (i.e. without feature extraction) which are structured as matrices (tab-delimited) with the rows representing the cycles and the columns the data points within a cycle. The sensors involved are:

```
Sensor  Physical quantity             Unit    Sampling rate
PS1     Pressure                      bar     100 Hz
PS2     Pressure                      bar     100 Hz
PS3     Pressure                      bar     100 Hz
PS4     Pressure                      bar     100 Hz
PS5     Pressure                      bar     100 Hz
PS6     Pressure                      bar     100 Hz
EPS1    Motor power                   W       100 Hz
FS1     Volume flow                   l/min   10 Hz
FS2     Volume flow                   l/min   10 Hz
TS1     Temperature                   °C      1 Hz
TS2     Temperature                   °C      1 Hz
TS3     Temperature                   °C      1 Hz
TS4     Temperature                   °C      1 Hz
VS1     Vibration                     mm/s    1 Hz
CE      Cooling efficiency (virtual)  %       1 Hz
CP      Cooling power (virtual)       kW      1 Hz
SE      Efficiency factor             %       1 Hz
```
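Because each row of a sensor file covers one 60-second cycle, the number of columns per file follows directly from the sampling rate in the table above. The short sketch below does that arithmetic. The dictionary simply transcribes the table, and the helper name `columns_per_cycle` is hypothetical.

```python
# Sampling rates transcribed from the sensor table above (Hz)
SAMPLING_RATES_HZ = {
    "PS1": 100, "PS2": 100, "PS3": 100, "PS4": 100, "PS5": 100, "PS6": 100,
    "EPS1": 100,
    "FS1": 10, "FS2": 10,
    "TS1": 1, "TS2": 1, "TS3": 1, "TS4": 1,
    "VS1": 1, "CE": 1, "CP": 1, "SE": 1,
}
CYCLE_SECONDS = 60  # each constant load cycle lasts 60 s


def columns_per_cycle(sensor: str) -> int:
    """Expected number of samples (columns) per cycle (row) for a sensor."""
    return SAMPLING_RATES_HZ[sensor] * CYCLE_SECONDS


for name in ("PS1", "FS1", "TS1"):
    print(name, columns_per_cycle(name))
# PS1 6000
# FS1 600
# TS1 60
```

For the 1 Hz temperature sensors used in this notebook, this gives 60 columns per cycle.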

The target condition values are annotated per cycle in `profile.txt` (tab-delimited). As in the sensor files, the row number corresponds to the cycle number. The columns are:

```

1: Cooler condition / %:
	3: close to total failure
	20: reduced efficiency
	100: full efficiency

2: Valve condition / %:
	100: optimal switching behavior
	90: small lag
	80: severe lag
	73: close to total failure

3: Internal pump leakage:
	0: no leakage
	1: weak leakage
	2: severe leakage

4: Hydraulic accumulator / bar:
	130: optimal pressure
	115: slightly reduced pressure
	100: severely reduced pressure
	90: close to total failure

5: stable flag:
	0: conditions were stable
	1: static conditions might not have been reached yet
```
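To make the coded targets easier to read, the tables above can be turned into lookup dictionaries. This is a minimal sketch: `decode_profile_row` and the dictionary names are illustrative and not part of the original notebook, and the example row is hypothetical.

```python
# Code tables transcribed from the profile.txt description above
COOLER = {3: "close to total failure", 20: "reduced efficiency", 100: "full efficiency"}
VALVE = {100: "optimal switching behavior", 90: "small lag", 80: "severe lag",
         73: "close to total failure"}
PUMP = {0: "no leakage", 1: "weak leakage", 2: "severe leakage"}
ACCUMULATOR = {130: "optimal pressure", 115: "slightly reduced pressure",
               100: "severely reduced pressure", 90: "close to total failure"}
STABLE_FLAG = {0: "conditions were stable",
               1: "static conditions might not have been reached yet"}


def decode_profile_row(row):
    """Translate one 5-value row of profile.txt into readable descriptions."""
    cooler, valve, pump, accumulator, flag = row
    return {
        "Cooler": COOLER[cooler],
        "Valve": VALVE[valve],
        "Pump": PUMP[pump],
        "Accumulator": ACCUMULATOR[accumulator],
        "Flag": STABLE_FLAG[flag],
    }


# Hypothetical row: failing cooler, other components healthy, not yet stable
print(decode_profile_row([3, 100, 0, 130, 1]))
```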

```python
# Import packages
import pandas as pd
import numpy as np

from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report, recall_score, f1_score
from sklearn.model_selection import train_test_split

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, GlobalAveragePooling1D, Dropout, Dense
from tensorflow.keras.utils import to_categorical
```

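```python
# Read the cycle-wise target labels
label = pd.read_csv('profile.txt', sep='\t', header=None)
label.columns = ['Cooler', 'Valve', 'Pump', 'Accumulator', 'Flag']

# Read the four temperature sensors and stack them vertically.
# Each file has one row per cycle (index 0..2204) and 60 columns (1 Hz x 60 s).
data = ['TS1.txt', 'TS2.txt', 'TS3.txt', 'TS4.txt']
# DataFrame.append was removed in pandas 2.0, so concatenate instead
df = pd.concat([pd.read_csv(txt, sep='\t', header=None) for txt in data])

print(df.shape)
df.head()
```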

```
(8820, 60)

        0       1       2       3       4       5       6       7       8       9  ...      50      51      52      53      54      55      56      57      58      59
0  35.570  35.492  35.469  35.422  35.414  35.320  35.227  35.242  35.160  35.176  ...  36.008  35.984  35.996  36.039  36.008  36.008  36.094  36.102  36.090  36.152
1  36.156  36.094  35.992  36.008  35.992  35.902  35.824  35.820  35.727  35.727  ...  37.328  37.324  37.340  37.332  37.316  37.410  37.418  37.422  37.488  37.477
2  37.488  37.391  37.340  37.312  37.223  37.145  37.059  36.973  36.898  36.879  ...  38.457  38.461  38.457  38.469  38.469  38.555  38.527  38.543  38.527  38.621
3  38.633  38.535  38.469  38.379  38.297  38.223  38.125  38.062  37.977  37.969  ...  39.441  39.363  39.367  39.457  39.461  39.461  39.473  39.441  39.453  39.461
4  39.461  39.461  39.375  39.281  39.203  39.113  39.043  38.969  38.875  38.883  ...  40.324  40.320  40.312  40.340  40.320  40.387  40.391  40.391  40.387  40.391
```


```python
label.shape
```

```
(2205, 5)
```

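```python
# Interleave the four sensors by cycle index, then reshape to (cycles, time steps, sensors).
# A stable sort guarantees that, within each cycle, the rows stay in TS1..TS4 order.
df = df.sort_index(kind='stable').values.reshape(-1, len(data), len(df.columns)).transpose(0, 2, 1)
df.shape
```

```
(2205, 60, 4)
```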
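The reshape above relies on the concatenated frame having the cycle index 0..2204 repeated once per sensor file. The sketch below reproduces the same steps on a small synthetic array (3 cycles, 5 time steps, 4 sensors, random values) and checks the result against a direct `np.stack` along a new last axis, an equivalent alternative that avoids the sort altogether.

```python
import numpy as np
import pandas as pd

n_cycles, n_steps, n_sensors = 3, 5, 4
rng = np.random.default_rng(0)
# One synthetic (cycles x time steps) matrix per sensor, standing in for TS1.txt ... TS4.txt
sensor_mats = [rng.normal(size=(n_cycles, n_steps)) for _ in range(n_sensors)]

# Notebook approach: concatenate (index 0..n_cycles-1 repeated per sensor),
# stable-sort by cycle index, reshape, then swap the sensor and time axes
df = pd.concat([pd.DataFrame(m) for m in sensor_mats])
via_reshape = (
    df.sort_index(kind="stable")
      .values.reshape(-1, n_sensors, n_steps)
      .transpose(0, 2, 1)
)

# Alternative: stack the per-sensor matrices along a new last axis
via_stack = np.stack(sensor_mats, axis=-1)

print(via_reshape.shape)                        # (3, 5, 4)
print(np.array_equal(via_reshape, via_stack))   # True
```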

```python
# Keep only the cooler condition as the target
label = label.Cooler
label.value_counts()
```

```
100    741
20     732
3      732
Name: Cooler, dtype: int64
```
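The three cooler classes are almost balanced, so a trivial classifier that always predicts the most frequent class (100, full efficiency) would be right for only about a third of the cycles. The arithmetic below uses the counts printed above and gives a reference point for the CNN's accuracy.

```python
# Class counts as printed by label.value_counts() above
counts = {100: 741, 20: 732, 3: 732}

total = sum(counts.values())                   # 2205 cycles
majority_class = max(counts, key=counts.get)   # most frequent cooler code
baseline_accuracy = counts[majority_class] / total

print(total, majority_class, round(baseline_accuracy, 3))   # 2205 100 0.336
```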

```python
# Label mapping: cooler code -> class index, and back
diz_label, diz_reverse_label = {}, {}
for i, lab in enumerate(label.unique()):
    diz_label[lab] = i
    diz_reverse_label[i] = lab

print(diz_label)
print(diz_reverse_label)
label = label.map(diz_label)
# One-hot encode the class indices for categorical cross-entropy
y = to_categorical(label)
```

```
{3: 0, 20: 1, 100: 2}
{0: 3, 1: 20, 2: 100}
```
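`to_categorical` turns each class index into a one-hot row vector (float32 by default). The NumPy sketch below reproduces that encoding for a few hypothetical cooler codes. It assumes, as is the case here, that the indices run from 0 to the number of classes minus one.

```python
import numpy as np

# Cooler code -> class index, as printed by the label-mapping cell above
diz_label = {3: 0, 20: 1, 100: 2}
codes = [100, 3, 20, 3]                          # hypothetical cycles
indices = np.array([diz_label[c] for c in codes])

# Row i of the identity matrix is the one-hot vector for class i
y = np.eye(len(diz_label), dtype="float32")[indices]
print(y)
# [[0. 0. 1.]
#  [1. 0. 0.]
#  [0. 1. 0.]
#  [1. 0. 0.]]
```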


```python
y.shape
```

```
(2205, 3)
```

```python
# Train/test split (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(df, y, random_state=42, test_size=0.2)
```

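```python
# Scale data: flatten to (samples * time steps, sensors) so that each sensor is
# standardized with its own mean and std, fitted on the training set only
scaler = StandardScaler()

X_train = scaler.fit_transform(X_train.reshape(-1, X_train.shape[-1])).reshape(X_train.shape)
X_test = scaler.transform(X_test.reshape(-1, X_test.shape[-1])).reshape(X_test.shape)
```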
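Reshaping to `(-1, X_train.shape[-1])` turns every (cycle, time step) pair into a row. `StandardScaler` therefore learns one mean and one standard deviation per temperature sensor across all cycles and time steps. The test set is then transformed with the training statistics, which keeps test information out of the fitted scaler. The NumPy sketch below mirrors that computation on random data; the array sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for X_train: (cycles, time steps, sensors)
X = rng.normal(loc=40.0, scale=3.0, size=(8, 60, 4))

# Each (cycle, time step) becomes a row; statistics are computed per sensor column
flat = X.reshape(-1, X.shape[-1])        # shape (480, 4)
mean = flat.mean(axis=0)
std = flat.std(axis=0)                   # population std (ddof=0), as StandardScaler uses
X_scaled = ((flat - mean) / std).reshape(X.shape)

print(X_scaled.shape)                                   # (8, 60, 4)
print(np.allclose(X_scaled.mean(axis=(0, 1)), 0.0))     # True
print(np.allclose(X_scaled.std(axis=(0, 1)), 1.0))      # True
```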

```python
X_train.shape
```

```
(1764, 60, 4)
```

```python
X_test.shape
```

```
(441, 60, 4)
```

A 1D CNN is used to capture local temporal patterns and non-obvious correlations between the sensor series.
The model classifies the state of the cooling component (cooler condition) using only the four temperature time series as input, arranged as an array of shape (cycles, 60 time steps, 4 sensors).
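A `Conv1D` layer slides each kernel along the time axis. At every position it sums over the kernel window and all input channels (here the four sensors), which is how it can pick up both short-term temporal patterns and cross-sensor correlations. Below is a plain NumPy sketch of a single valid-padding, stride-1 layer followed by ReLU, using the first layer's sizes (kernel size 6, 100 filters) and random weights. It illustrates the operation and is not the Keras implementation.

```python
import numpy as np


def conv1d_valid(x, kernels, bias):
    """Valid-padding, stride-1 1D convolution (cross-correlation, no kernel flip).

    x:       (time_steps, in_channels)
    kernels: (kernel_size, in_channels, filters)
    bias:    (filters,)
    returns: (time_steps - kernel_size + 1, filters)
    """
    k = kernels.shape[0]
    steps = x.shape[0] - k + 1
    out = np.empty((steps, kernels.shape[2]))
    for t in range(steps):
        window = x[t:t + k]                                   # (kernel_size, in_channels)
        # Each filter sums over the whole window and all channels at once
        out[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1])) + bias
    return out


rng = np.random.default_rng(0)
x = rng.normal(size=(60, 4))        # one cycle: 60 time steps x 4 temperature sensors
w = rng.normal(size=(6, 4, 100))    # kernel size 6, 100 filters
b = np.zeros(100)
features = np.maximum(conv1d_valid(x, w, b), 0.0)   # ReLU activation
print(features.shape)               # (55, 100)
```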


```python
num_sensors = 4
TIME_PERIODS = 60
BATCH_SIZE = 16
EPOCHS = 10

model_m = Sequential()
model_m.add(Conv1D(100, 6, activation='relu', input_shape=(TIME_PERIODS, num_sensors)))
model_m.add(Conv1D(100, 6, activation='relu'))
model_m.add(MaxPooling1D(3))
model_m.add(Conv1D(160, 6, activation='relu'))
model_m.add(Conv1D(160, 6, activation='relu'))
model_m.add(GlobalAveragePooling1D(name='G_A_P_1D'))
model_m.add(Dropout(0.5))
model_m.add(Dense(3, activation='softmax'))

model_m.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model_m.fit(X_train, y_train, batch_size=BATCH_SIZE, epochs=EPOCHS, validation_split=0.2, verbose=2)
```

```
Epoch 1/10
89/89 - 1s - loss: 0.2056 - accuracy: 0.9412 - val_loss: 0.0776 - val_accuracy: 0.9830 - 1s/epoch - 17ms/step
Epoch 2/10
89/89 - 1s - loss: 0.1237 - accuracy: 0.9773 - val_loss: 0.0586 - val_accuracy: 0.9802 - 873ms/epoch - 10ms/step
Epoch 3/10
89/89 - 1s - loss: 0.1033 - accuracy: 0.9780 - val_loss: 0.0840 - val_accuracy: 0.9802 - 856ms/epoch - 10ms/step
Epoch 4/10
89/89 - 1s - loss: 0.1197 - accuracy: 0.9709 - val_loss: 0.0876 - val_accuracy: 0.9745 - 782ms/epoch - 9ms/step
Epoch 5/10
89/89 - 1s - loss: 0.1078 - accuracy: 0.9766 - val_loss: 0.0670 - val_accuracy: 0.9802 - 821ms/epoch - 9ms/step
Epoch 6/10
89/89 - 1s - loss: 0.1032 - accuracy: 0.9773 - val_loss: 0.0581 - val_accuracy: 0.9887 - 828ms/epoch - 9ms/step
Epoch 7/10
89/89 - 1s - loss: 0.1049 - accuracy: 0.9766 - val_loss: 0.0690 - val_accuracy: 0.9802 - 806ms/epoch - 9ms/step
Epoch 8/10
89/89 - 1s - loss: 0.0885 - accuracy: 0.9780 - val_loss: 0.0726 - val_accuracy: 0.9830 - 824ms/epoch - 9ms/step
Epoch 9/10
89/89
```

After the first epoch, the training accuracy stays at roughly 97 to 98% (0.9780 at epoch 8, the last fully logged epoch), and the validation accuracy stays around 98%.
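The output shapes and parameter count of `model_m` can be checked by hand. With valid padding, a kernel of size 6 shortens the sequence by 5 steps, and max pooling with size 3 divides it by 3. Each `Conv1D` has kernel size × input channels × filters weights plus one bias per filter. The sketch below walks through the layers, and its totals should agree with `model_m.summary()`. It also estimates the 89 steps per epoch seen in the log, assuming `validation_split=0.2` holds out the last 20% of `X_train` and leaves about 1411 samples for training.

```python
import math


def conv_out(length, kernel):
    """Sequence length after a Conv1D with 'valid' padding and stride 1."""
    return length - kernel + 1


def pool_out(length, pool):
    """Length after MaxPooling1D with default stride (= pool size) and 'valid' padding."""
    return (length - pool) // pool + 1


def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel * in_channels * filters + filters


# Layers of model_m before global pooling: ("conv", filters, kernel) or ("pool", size)
layers = [("conv", 100, 6), ("conv", 100, 6), ("pool", 3), ("conv", 160, 6), ("conv", 160, 6)]

length, channels, total_params = 60, 4, 0     # input: 60 time steps x 4 sensors
for layer in layers:
    if layer[0] == "conv":
        _, filters, kernel = layer
        total_params += conv_params(kernel, channels, filters)
        length, channels = conv_out(length, kernel), filters
    else:
        length = pool_out(length, layer[1])
    print(layer, "->", (length, channels))

# GlobalAveragePooling1D and Dropout have no weights; Dense(3) has channels * 3 + 3
total_params += channels * 3 + 3
print("trainable parameters:", total_params)    # trainable parameters: 313003

# Steps per epoch, assuming the last 20% of the 1764 training cycles are held out
train_samples = int(1764 * 0.8)                 # 1411
steps_per_epoch = math.ceil(train_samples / 16)
print("steps per epoch:", steps_per_epoch)      # steps per epoch: 89
```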

```python
model_m.evaluate(X_test, y_test, verbose=2)
```

```
14/14 - 0s - loss: 0.0415 - accuracy: 0.9932 - 66ms/epoch - 5ms/step

[0.041533395648002625, 0.9931972622871399]
```

```python
pred_test = np.argmax(model_m.predict(X_test), axis=1)
```
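`model_m.predict` returns one softmax probability vector per test cycle, and `np.argmax(..., axis=1)` picks the index of the most probable class. `diz_reverse_label` then maps that index back to the cooler code. The sketch below shows this on three hypothetical probability vectors.

```python
import numpy as np

diz_reverse_label = {0: 3, 1: 20, 2: 100}       # as printed by the label-mapping cell
# Hypothetical softmax outputs for three test cycles (each row sums to 1)
probs = np.array([[0.90, 0.08, 0.02],
                  [0.10, 0.15, 0.75],
                  [0.05, 0.60, 0.35]])

pred_idx = np.argmax(probs, axis=1)             # most probable class per row
pred_codes = [diz_reverse_label[i] for i in pred_idx]
print(pred_idx, pred_codes)                     # [0 2 1] [3, 100, 20]
```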

```python
# Convert the one-hot targets and the predicted indices back to cooler codes
print(classification_report([diz_reverse_label[i] for i in np.argmax(y_test, axis=1)],
                            [diz_reverse_label[i] for i in pred_test]))
```

```
              precision    recall  f1-score   support

           3       1.00      0.99      0.99       152
          20       0.98      1.00      0.99       135
         100       1.00      0.99      1.00       154

    accuracy                           0.99       441
   macro avg       0.99      0.99      0.99       441
weighted avg       0.99      0.99      0.99       441
```
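`classification_report` derives precision, recall and F1 for each class from the counts of true and predicted labels. As an illustration (not the scikit-learn source), the sketch below computes the same quantities from a confusion matrix on a small set of hypothetical labels. It does not guard against classes that are never predicted, which would cause a division by zero.

```python
import numpy as np


def per_class_metrics(y_true, y_pred, n_classes):
    """Per-class precision, recall, F1 and support from a confusion matrix.

    cm[i, j] counts samples whose true class is i and predicted class is j.
    """
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    tp = np.diag(cm).astype(float)
    precision = tp / cm.sum(axis=0)   # column sums: everything predicted as class j
    recall = tp / cm.sum(axis=1)      # row sums: every true member of class i
    f1 = 2 * precision * recall / (precision + recall)
    support = cm.sum(axis=1)
    return precision, recall, f1, support


# Hypothetical class indices, not the notebook's test set
y_true = [0, 0, 1, 1, 2, 2, 2, 0]
y_pred = [0, 1, 1, 1, 2, 2, 0, 0]
precision, recall, f1, support = per_class_metrics(y_true, y_pred, 3)
print(precision, recall, f1, support)
```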



```python
recall_score([diz_reverse_label[i] for i in np.argmax(y_test, axis=1)],
             [diz_reverse_label[i] for i in pred_test],
             average='micro')
```

```
0.9931972789115646
```

```python
recall_score([diz_reverse_label[i] for i in np.argmax(y_test, axis=1)],
             [diz_reverse_label[i] for i in pred_test],
             average='macro')
```

```
0.9934495329232171
```

```python
f1_score([diz_reverse_label[i] for i in np.argmax(y_test, axis=1)],
         [diz_reverse_label[i] for i in pred_test],
         average='micro')
```

```
0.9931972789115646
```

```python
f1_score([diz_reverse_label[i] for i in np.argmax(y_test, axis=1)],
         [diz_reverse_label[i] for i in pred_test],
         average='macro')
```

```
0.9930437144881566
```
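The micro-averaged recall and F1 above are identical to the test accuracy (438 of 441 cycles, 0.99320). That is expected for single-label multiclass problems. Pooling true positives, false positives and false negatives over all classes makes every misclassification count once as a false positive and once as a false negative, so micro precision, micro recall and micro F1 all reduce to accuracy. Macro averaging instead takes the unweighted mean of the per-class scores, so every class counts equally regardless of its support. The NumPy sketch below shows both on hypothetical labels.

```python
import numpy as np


def recall_f1_averages(y_true, y_pred, n_classes):
    """Return (micro recall = micro F1, macro recall, macro F1) for single-label data."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls, f1s = [], []
    for c in range(n_classes):
        tp = np.sum((y_true == c) & (y_pred == c))
        fn = np.sum((y_true == c) & (y_pred != c))
        fp = np.sum((y_true != c) & (y_pred == c))
        recalls.append(tp / (tp + fn))
        f1s.append(2 * tp / (2 * tp + fp + fn))
    # Micro: total TP / total (TP + FN), which is correct predictions / all samples
    micro = np.mean(y_true == y_pred)
    # Macro: unweighted mean of the per-class scores
    return float(micro), float(np.mean(recalls)), float(np.mean(f1s))


# Hypothetical class indices, not the notebook's test set
y_true = [0, 0, 1, 1, 2, 2, 2, 0]
y_pred = [0, 1, 1, 1, 2, 2, 0, 0]
micro, macro_recall, macro_f1 = recall_f1_averages(y_true, y_pred, 3)
print([round(v, 4) for v in (micro, macro_recall, macro_f1)])   # [0.75, 0.7778, 0.7556]
```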