# Hydraulic Rig

## 0 Intro:

This notebook works on machine learning to forecast failures in a Test Hydaraulic Rig. Several parameters are experimentally sensored in the circuit do provide data. The test rig consists of a primary hydraulic working and a secondary cooling-filtration circuit conected by the oil tank.

<img src="HydraulicSystem.png">

There are 17 sensors represented in the image above measuring 2205 tests cycles of 60 seconds in diferent measuring frequencies. The table above shows its details.


<table style='width:100%'>    
      <tr>
              <th>Sensor
              <th>Variável do processo
              <th>Unidade
              <th>Amostragem
         
      <tr>
              <th>PS1
              <th>Pressão
              <th>bar
              <th>100 Hz
    
      <tr>
              <th>PS2
              <th>Pressão
              <th>bar
              <th>100 Hz
        
      <tr>
              <th>PS3
              <th>Pressão
              <th>bar
              <th>100 Hz
    
      <tr>
              <th>PS4
              <th>Pressão
              <th>bar
              <th>100 Hz
    
      <tr>
              <th>PS5
              <th>Pressão
              <th>bar
              <th>100 Hz
         
      <tr>
              <th>PS6
              <th>Pressão
              <th>bar
              <th>100 Hz
    
      <tr>
              <th>EPS1
              <th>Potência do Motor
              <th>W
              <th>100 Hz
    
      <tr>
              <th>FS1
              <th>Vazão
              <th>l/min
              <th>10 Hz
    
      <tr>
              <th>FS2
              <th>Vazão
              <th>l/min
              <th>10 Hz
    
      <tr>
              <th>TS1
              <th>Temperatura
              <th>ºC
              <th>1 Hz
    
      <tr>
              <th>TS2
              <th>Temperatura
              <th>ºC
              <th>1 Hz 
    
      <tr>
              <th>TS3
              <th>Temperatura
              <th>ºC
              <th>1 Hz
          
      <tr>
              <th>TS4
              <th>Temperatura
              <th>ºC
              <th>1 Hz
    
      <tr>
              <th>VS1
              <th>Vibração
              <th>mm/s
              <th>1 Hz
       
      <tr>
              <th>CE
              <th>Eficiência do resfriamento
              <th>%
              <th>1 Hz
       
      <tr>
              <th>CP
              <th>Potência virtual do resfriamento
              <th>kW
              <th>1 Hz
     
      <tr>
              <th>SE
              <th>Fator de Eficiência
              <th>%
              <th>1 Hz


**Objectives:** Our main goal is to predict cooler efficiency using other variables to build a model and train it with experiment data.

The cooler working condition is given by (%):
* 3: close to total failure
* 20: reduced effifiency
* 100: full efficiency

In [63]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from tqdm import tqdm

## 1 Data imports and inicial manipulations.

Due to the diferent instrumentation used to measure several phisical properties out raw data is not consistant. Is necessay to prepare it to subsequent analisys. The inicial data processing consists in making al the sensors datas the same frequancy. We will do this downsampling each of 17 sensors raw data do 1Hz and upsampling each of those to 100Hz doing linear interpolation.

### 1.1 Downsampling

In [76]:
file_path = './data/'

load_names = os.listdir(file_path)
load_names.remove('description.txt')
load_names.remove('documentation.txt')


# Index das colunas para o downsamplig das variáveis com maior taxa de amostragem
cols_100 = np.arange(0, 6000, 100)
cols_10 = np.arange(0, 600, 10)

# Importação dos dados contidos nos arquivos .txt
# Features
pressure = []
flow = []
temp = []

print('Carregamento dos conjuntos de dados:')
for name in tqdm(load_names):
    if 'PS' in name:
        aux = pd.read_csv(f'{file_path}{name}', delimiter='\t', header=None)
        ps = pd.DataFrame()
        ps[cols_100] = aux.loc[:, cols_100].copy()
        pressure.append(ps)
    elif 'FS' in name:
        aux = pd.read_csv(f'{file_path}{name}', delimiter='\t', header=None)
        fs = pd.DataFrame()
        fs[cols_10] = aux.loc[:, cols_10].copy()
        flow.append(fs)
    elif 'TS' in name:
        temp.append(pd.read_csv(f'{file_path}{name}', delimiter='\t', header=None))

aux = pd.read_csv(f'{file_path}EPS1.txt', delimiter='\t', header=None)
eps = pd.DataFrame()
eps[cols_100] = aux.loc[:, cols_100].copy()

vs = pd.read_csv(f'{file_path}VS1.txt', delimiter='\t', header=None)
ce = pd.read_csv(f'{file_path}CE.txt', delimiter='\t', header=None)
cp = pd.read_csv(f'{file_path}CP.txt', delimiter='\t', header=None)
se = pd.read_csv(f'{file_path}SE.txt', delimiter='\t', header=None)

# Concatenação dos dados
data = []
print('Processamento dos dados:')
for cycle in tqdm(range(2205)):
    example = np.c_[
        pressure[0].loc[cycle, :].values,
        pressure[1].loc[cycle, :].values,
        pressure[2].loc[cycle, :].values,
        pressure[3].loc[cycle, :].values,
        pressure[4].loc[cycle, :].values,
        pressure[5].loc[cycle, :].values,
        flow[0].loc[cycle, :].values,
        flow[1].loc[cycle, :].values,
        temp[0].loc[cycle, :].values,
        temp[1].loc[cycle, :].values,
        temp[2].loc[cycle, :].values,
        temp[3].loc[cycle, :].values,
        eps.loc[cycle, :].values,
        vs.loc[cycle, :].values,
        ce.loc[cycle, :].values,
        cp.loc[cycle, :].values,
        se.loc[cycle, :].values]

    data.append(example)

downdata=np.array(data)

Carregamento dos conjuntos de dados:
100%|██████████| 18/18 [03:39<00:00, 12.19s/it]
  0%|          | 0/2205 [00:00<?, ?it/s]Processamento dos dados:
100%|██████████| 2205/2205 [00:05<00:00, 383.26it/s]


The result of downsampling is a 3D array of shape (2205 tests, 60 centésimos de segundo, 17 sensores)

In [77]:
np.shape(downdata)

(2205, 60, 17)

Transforming 3D data frame in a 2D data frame labelling the experiment cycle and the observed cooler condition in extras columns. The sensor names are also included in column header

In [82]:
df1Hz = pd.DataFrame()
label_exp = []
label = []
i = 0
for exp in tqdm(downdata):
    df1Hz = pd.concat([df1Hz, pd.DataFrame(exp)], axis='index')
    label_exp.append((i+1)*np.ones(exp.shape[0]))
    label.append(labels[i]*np.ones(exp.shape[0]))
    i += 1

names = [f'PS{num}' for num in range(1, 7)]
names = names + [f'FS{num}' for num in range(1, 3)]
names = names + [f'TS{num}' for num in range(1, 5)]
names = names + ['EPS1', 'VS1', 'CE', 'CP', 'SE']

df1Hz.columns = names

label_exp = np.array(label_exp)
label_exp = label_exp.reshape((label_exp.shape[0]*label_exp.shape[1]))
df1Hz['exp'] = label_exp.astype('int')

label = np.array(label)
label = label.reshape((label.shape[0]*label.shape[1]))
df1Hz['condition'] = label.astype('int')

100%|██████████| 2205/2205 [25:36<00:00,  1.44it/s]


### 1.2 Upsampling

In [74]:
cols_1 = np.arange(0, 6000, 100)
cols_10 = np.arange(0, 6000, 10)

# Importação dos dados contidos nos arquivos "".txt"
# Features
pressure = []
flow = []
temp = []

print('Carregamento dos conjuntos de dados:')
for name in tqdm(load_names):
    if 'PS' in name or 'EPS' in name:
        ps = pd.read_csv(f'{file_path}{name}', delimiter='\t', header=None)
        pressure.append(ps)
    elif 'FS' in name:
        aux = pd.read_csv(f'{file_path}{name}', delimiter='\t', header=None)
        fs = pd.DataFrame(data=np.nan*np.ones((aux.shape[0], 6000)))
        fs[cols_10] = aux.values
        fs = fs.interpolate(axis='columns')
        flow.append(fs)
    elif 'TS' in name:
        aux = pd.read_csv(f'{file_path}{name}', delimiter='\t', header=None)
        t = pd.DataFrame(data=np.nan*np.ones((aux.shape[0], 6000)))
        t[cols_1] = aux.values
        t = t.interpolate(axis='columns')
        temp.append(t)

eps = pd.read_csv(f'{file_path}EPS1.txt', delimiter='\t', header=None)
vs = pd.read_csv(f'{file_path}VS1.txt', delimiter='\t', header=None)
ce = pd.read_csv(f'{file_path}CE.txt', delimiter='\t', header=None)
cp = pd.read_csv(f'{file_path}CP.txt', delimiter='\t', header=None)
se = pd.read_csv(f'{file_path}SE.txt', delimiter='\t', header=None)

aux_dfs = [vs, ce, cp, se]
mod_dfs = []
for df in aux_dfs:
    aux = df.copy()
    aux_df = pd.DataFrame(data=np.nan*np.ones((aux.shape[0], 6000)))
    aux_df[cols_1] = aux.values
    aux_df = aux_df.interpolate(axis='columns')
    mod_dfs.append(aux_df)

# Concatenação dos dados
data = []
print('Processamento dos dados:')
for cycle in tqdm(range(2205)):
    example = np.c_[
        pressure[0].loc[cycle, :].values,
        pressure[1].loc[cycle, :].values,
        pressure[2].loc[cycle, :].values,
        pressure[3].loc[cycle, :].values,
        pressure[4].loc[cycle, :].values,
        pressure[5].loc[cycle, :].values,
        flow[0].loc[cycle, :].values,
        flow[1].loc[cycle, :].values,
        temp[0].loc[cycle, :].values,
        temp[1].loc[cycle, :].values,
        temp[2].loc[cycle, :].values,
        temp[3].loc[cycle, :].values,
        eps.loc[cycle, :].values,
        mod_dfs[0].loc[cycle, :].values,
        mod_dfs[1].loc[cycle, :].values,
        mod_dfs[2].loc[cycle, :].values,
        mod_dfs[3].loc[cycle, :].values]

    data.append(example)

updata=np.array(data)

Carregamento dos conjuntos de dados:
100%|██████████| 18/18 [01:54<00:00,  6.37s/it]
  0%|          | 0/2205 [00:00<?, ?it/s]Processamento dos dados:
100%|██████████| 2205/2205 [00:39<00:00, 55.71it/s] 


The result of downsampling is a 3D array of shape (2205 tests, 6000 centésimos de segundo, 17 sensores)

In [75]:
np.shape(updata)

(2205, 6000, 17)

Transforming 3D data frame in a 2D data frame labelling the experiment cycle and the observed cooler condition in extras columns. The sensor names are also included in column header

In [80]:
labels = pd.read_csv(f'{file_path}profile.txt', delimiter='\t', header=None)
labels = labels[0].copy()

df100Hz = pd.DataFrame()
label_exp = []
label = []
i = 0
for exp in tqdm(updata):
    df100Hz = pd.concat([df100Hz, pd.DataFrame(exp)], axis='index')
    label_exp.append((i+1)*np.ones(exp.shape[0]))
    label.append(labels[i]*np.ones(exp.shape[0]))
    i += 1

names = [f'PS{num}' for num in range(1, 7)]
names = names + [f'FS{num}' for num in range(1, 3)]
names = names + [f'TS{num}' for num in range(1, 5)]
names = names + ['EPS1', 'VS1', 'CE', 'CP', 'SE']

df100Hz.columns = names

label_exp = np.array(label_exp)
label_exp = label_exp.reshape((label_exp.shape[0]*label_exp.shape[1]))
df100Hz['exp'] = label_exp.astype('int')

label = np.array(label)
label = label.reshape((label.shape[0]*label.shape[1]))
df100Hz['condition'] = label.astype('int')

100%|██████████| 2205/2205 [00:14<00:00, 153.66it/s]


Our imported and sampled data sets pre-view for 100Hz and 1Hz:

In [81]:
df1Hz

Unnamed: 0,PS1,PS2,PS3,PS4,PS5,PS6,FS1,FS2,TS1,TS2,TS3,TS4,EPS1,VS1,CE,CP,SE,exp,condition
0,2411.6,151.47,125.500,2.305,0.000,9.936,8.990,10.179,35.570,40.961,38.320,30.363,2411.6,0.604,47.202,2.184,68.039,1,3
1,2936.6,191.46,0.430,0.000,0.000,9.974,0.001,10.176,35.492,40.949,38.332,30.375,2936.6,0.605,47.273,2.184,0.000,1,3
2,2656.2,179.09,0.133,0.000,0.000,9.984,0.005,10.163,35.469,40.965,38.320,30.367,2656.2,0.611,47.250,2.184,0.000,1,3
3,2949.4,191.43,0.000,0.000,0.000,9.947,0.000,10.167,35.422,40.922,38.324,30.367,2949.4,0.603,47.332,2.185,0.000,1,3
4,2945.8,191.36,0.000,0.000,0.000,9.964,0.000,10.167,35.414,40.879,38.332,30.379,2945.8,0.608,47.213,2.178,0.000,1,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
55,2417.4,151.73,125.740,2.273,10.222,10.000,7.706,10.171,35.441,40.910,38.195,30.395,2417.4,0.516,46.355,2.134,68.167,2205,100
56,2415.6,151.84,125.500,2.352,10.207,9.962,7.858,10.196,35.437,40.895,38.184,30.391,2415.6,0.528,46.432,2.146,68.167,2205,100
57,2417.4,151.81,125.780,2.305,10.198,9.965,8.013,10.194,35.434,40.883,38.184,30.395,2417.4,0.522,46.384,2.144,68.258,2205,100
58,2417.6,151.81,125.790,2.406,10.241,10.014,7.710,10.167,35.434,40.879,38.184,30.402,2417.6,0.522,46.479,2.136,68.258,2205,100


In [85]:
df100Hz

Unnamed: 0,PS1,PS2,PS3,PS4,PS5,PS6,FS1,FS2,TS1,TS2,TS3,TS4,EPS1,VS1,CE,CP,SE,exp,condition
0,2411.6,151.47,125.50,2.305,0.000,9.936,8.990,10.1790,35.57000,40.96100,38.32000,30.36300,2411.6,0.60400,47.20200,2.184,68.03900,1,3
1,2411.6,151.45,125.39,2.305,0.000,9.947,8.168,10.1785,35.56922,40.96088,38.32012,30.36312,2411.6,0.60401,47.20271,2.184,67.35861,1,3
2,2411.6,151.52,125.40,2.336,0.000,9.964,7.346,10.1780,35.56844,40.96076,38.32024,30.36324,2411.6,0.60402,47.20342,2.184,66.67822,1,3
3,2411.6,151.27,125.03,2.578,0.000,9.989,6.524,10.1775,35.56766,40.96064,38.32036,30.36336,2411.6,0.60403,47.20413,2.184,65.99783,1,3
4,2411.6,150.80,124.05,2.977,0.000,9.996,5.702,10.1770,35.56688,40.96052,38.32048,30.36348,2411.6,0.60404,47.20484,2.184,65.31744,1,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5995,2416.2,151.64,125.73,2.305,10.204,9.989,7.774,10.1760,35.42600,40.89100,38.18700,30.37500,2416.2,0.53100,46.62100,2.148,68.11700,2205,100
5996,2416.6,151.70,125.81,2.320,10.238,10.007,7.774,10.1760,35.42600,40.89100,38.18700,30.37500,2416.6,0.53100,46.62100,2.148,68.11700,2205,100
5997,2416.8,151.73,125.77,2.273,10.223,10.007,7.774,10.1760,35.42600,40.89100,38.18700,30.37500,2416.8,0.53100,46.62100,2.148,68.11700,2205,100
5998,2417.0,151.71,125.66,2.227,10.218,9.988,7.774,10.1760,35.42600,40.89100,38.18700,30.37500,2417.0,0.53100,46.62100,2.148,68.11700,2205,100


Importing the massive data set to a .sav content. I can import this file directly for further work steps without processing it all over again.

In [113]:
import pickle
file_path = './data'
pickle.dump(df100Hz, open('./df100Hz.sav', 'wb'))
pickle.dump(df1Hz, open('./df1Hz.sav', 'wb'))