# Data Acquisition
This notebook is an introduction to the data provided by the Itinerario course. The data are readings from several sensors mounted on the base structure of an experimental jacket-type wind turbine in a laboratory at the Universidad de Cataluña.

The dataset folder contains two data folders: IMAGES_FOLDER and DATOS. The DATOS folder holds several experiments, each recorded for 60 seconds at a sampling frequency of 1651.6129 Hz (which gives 99097 samples per minute for each of the 24 sensors attached to the turbine's base structure).
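The sample count quoted above follows directly from the sampling frequency and the recording time. The short sketch below reproduces that arithmetic; the constant names are illustrative and do not come from the dataset.

```python
# Sampling parameters quoted in the text above.
SAMPLING_RATE_HZ = 1651.6129
DURATION_S = 60
N_SENSORS = 24

# Samples recorded per sensor in one experiment, rounded to the nearest integer.
samples_per_sensor = round(SAMPLING_RATE_HZ * DURATION_S)

# Total number of values stored for one experiment (samples x sensors).
values_per_experiment = samples_per_sensor * N_SENSORS

print(samples_per_sensor)
print(values_per_experiment)
```

The product 1651.6129 x 60 is not an integer, so the count is rounded.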

For each white noise amplitude (0.5, 1, 2, 3) there are 25 experimental tests:

- 10 tests with the undamaged bar
- 5 tests with a replica of the original bar
- 5 tests with a bar that has a 5 mm crack
- 5 tests with a loosened bolt in the structure

This gives 100 experiments in total.
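As a quick check of the experimental design described above, the counts can be added up in a few lines. The dictionary keys are descriptive labels chosen for this sketch, not names used by the dataset.

```python
# Tests per structural state for a single white noise amplitude (from the list above).
TESTS_PER_STATE = {
    "undamaged bar": 10,
    "replica bar": 5,
    "5 mm crack": 5,
    "loose bolt": 5,
}
AMPLITUDES = (0.5, 1, 2, 3)

# Tests for one amplitude, then for all amplitudes together.
tests_per_amplitude = sum(TESTS_PER_STATE.values())
total_experiments = tests_per_amplitude * len(AMPLITUDES)

print(tests_per_amplitude, total_experiments)
```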

Each file is named following the pattern X_Y_ZA.mat, where:
- X: the structural state (1 undamaged, 2 replica, 3 crack damage, 4 loose bolt)
- Y: the experiment number
- Z: the white noise amplitude (0.5, 1, 2, 3), with 0.5 written as 05 in the file name (for example, 1_1_05A.mat)
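The naming convention above can be turned into a small parser. This is a minimal sketch: the `Experiment` tuple, the `parse_filename` helper and the `AMPLITUDE_TOKENS` mapping are hypothetical names introduced here, and the mapping assumes that the token `05` stands for 0.5, as the file name `1_1_05A.mat` suggests.

```python
import os
from typing import NamedTuple

# Assumed mapping from the amplitude token in the file name to its numeric value.
AMPLITUDE_TOKENS = {"05": 0.5, "1": 1.0, "2": 2.0, "3": 3.0}


class Experiment(NamedTuple):
    state: int        # X: 1 undamaged, 2 replica, 3 crack, 4 loose bolt
    number: int       # Y: experiment number
    amplitude: float  # Z: white noise amplitude


def parse_filename(path: str) -> Experiment:
    """Split a name such as '1_1_05A.mat' into its three fields."""
    stem = os.path.splitext(os.path.basename(path))[0]
    state, number, amplitude_token = stem.split("_")
    if not amplitude_token.endswith("A"):
        raise ValueError(f"Unexpected amplitude token: {amplitude_token!r}")
    amplitude = AMPLITUDE_TOKENS[amplitude_token[:-1]]
    return Experiment(int(state), int(number), amplitude)


print(parse_filename("1_1_05A.mat"))
```

Returning a named tuple keeps the three fields readable (`exp.state`, `exp.amplitude`) while still comparing equal to a plain tuple.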

## Imports

In [1]:
from google.colab import drive
from os import listdir
from os.path import isfile, join
from scipy.io import loadmat

import pandas as pd

import os

## Data Acquisition
The dataset was uploaded to Google Drive under the name DATOS_EXPERIMENTALES_JACKET. The folder contains a subfolder named __MACOSX that holds no files, a folder named IMAGES_FOLDER that holds .mat data, and the DATOS folder, which holds .mat files. It also includes files describing the dataset.

In [2]:
drive.mount('/content/drive')

Mounted at /content/drive


We check how many files each internal folder contains.

In [3]:
for dirpath, dirnames, filenames in os.walk('drive/MyDrive/DATOS_EXPERIMENTALES_JACKET'):
  print(f"There are {len(dirnames)} directories and {len(filenames)} files in {dirpath}")

There are 3 directories and 111 files in drive/MyDrive/DATOS_EXPERIMENTALES_JACKET
There are 0 directories and 100 files in drive/MyDrive/DATOS_EXPERIMENTALES_JACKET/DATOS
There are 0 directories and 0 files in drive/MyDrive/DATOS_EXPERIMENTALES_JACKET/__MACOSX
There are 0 directories and 6400 files in drive/MyDrive/DATOS_EXPERIMENTALES_JACKET/IMAGES_FOLDER


We store the paths of the DATOS and IMAGES_FOLDER folders in their own variables.

In [4]:
dataset_datos_dir = 'drive/MyDrive/DATOS_EXPERIMENTALES_JACKET/DATOS'
dataset_images_dir = 'drive/MyDrive/DATOS_EXPERIMENTALES_JACKET/IMAGES_FOLDER'

We store the path of every file in the DATOS folder so that we can inspect the data it contains.

In [5]:
dataset_datos_files = [join(dataset_datos_dir, filename) for filename in listdir(dataset_datos_dir) if isfile(join(dataset_datos_dir, filename))]

# Print one element of the list to verify that the paths were stored correctly
dataset_datos_files[0]

'drive/MyDrive/DATOS_EXPERIMENTALES_JACKET/DATOS/1_1_05A.mat'
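`listdir` returns entries in arbitrary order, so the file list is not guaranteed to follow the experiment numbering. If a stable order is wanted, one option is to sort with a key built from the file name fields. The sketch below uses a hypothetical `natural_key` helper and made-up relative paths; it is not part of the original notebook.

```python
import os


def natural_key(path):
    """Sort key: (state, amplitude token, experiment number as an integer)."""
    stem = os.path.splitext(os.path.basename(path))[0]
    state, number, amplitude_token = stem.split("_")
    return (int(state), amplitude_token, int(number))


# Illustrative paths only; the real ones live under the Drive folder.
example_paths = ["DATOS/1_10_05A.mat", "DATOS/1_2_05A.mat", "DATOS/1_1_05A.mat"]
sorted_paths = sorted(example_paths, key=natural_key)

print(sorted_paths)
```

Converting the experiment number to an integer places experiment 10 after experiment 2, which a plain string sort would not do.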

We inspect the type of data stored in each individual file of the DATOS folder.

In [6]:
mat = loadmat(dataset_datos_files[0])
type(mat)

dict

We review the structure of the dictionary.

In [None]:
mat.keys()

dict_keys(['__header__', '__version__', '__globals__', 'None', 'data', 'starttime', 'timestamps', '__function_workspace__'])
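The keys wrapped in double underscores are metadata, while the remaining keys hold the variables saved in the file. A minimal sketch of separating the two groups follows; `mat_example` is a stand-in dictionary with placeholder values, used here so the snippet runs without the dataset.

```python
# Stand-in for the dictionary returned by loadmat; the values are placeholders.
mat_example = {
    "__header__": b"placeholder",
    "__version__": "1.0",
    "__globals__": [],
    "None": None,
    "data": [],
    "starttime": [],
    "timestamps": [],
    "__function_workspace__": [],
}

# Keys that start with a double underscore are treated as metadata.
metadata_keys = [key for key in mat_example if key.startswith("__")]
variable_keys = [key for key in mat_example if not key.startswith("__")]

print(variable_keys)
```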

We look at the contents of each key in the dictionary.

In [None]:
mat['__header__']

b'MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Tue Feb 12 14:09:35 2019'

In [None]:
mat['__version__']

'1.0'

In [None]:
mat['__globals__']

[]

In [None]:
mat['None']

MatlabOpaque([(b'ans', b'MCOS', b'daq.ni.CompactDAQModule', array([[3707764736],
       [         2],
       [         1],
       [         6],
       [         1],
       [         2],
       [         3],
       [         4],
       [         5],
       [         6],
       [         1]], dtype=uint32))],
             dtype=[('s0', 'O'), ('s1', 'O'), ('s2', 'O'), ('arr', 'O')])

In [None]:
mat['data'], len(mat['data'])

(array([[ 3.47662560e-04,  2.12761018e-04,  1.71859650e-04, ...,
          1.96720881e-04,  1.34626292e-04, -4.31555940e-05],
        [ 8.14584000e-05,  2.48714320e-04,  2.05410860e-04, ...,
          2.05256023e-04,  1.69468800e-06, -5.04751020e-05],
        [ 2.22497760e-04,  3.94965040e-04,  2.38962070e-04, ...,
          6.99130570e-05,  7.60876040e-05, -1.06842300e-06],
        ...,
        [ 1.87085280e-04,  2.68214416e-04,  2.23711520e-04, ...,
          1.38194193e-04,  4.19400360e-05, -6.26742820e-05],
        [-2.11156800e-05,  3.02948962e-04,  2.60922862e-04, ...,
          7.96675050e-05,  9.62180200e-06, -9.07323960e-05],
        [ 1.49230560e-04,  2.30432980e-04,  2.04190816e-04, ...,
          1.44290723e-04,  2.73053640e-05, -3.15663730e-05]]), 99097)

In [None]:
mat['starttime']

array([[737468.58928423]])
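The `starttime` value looks like a MATLAB serial date number (days counted from year 0). This is an assumption, but converting it under that assumption gives a date that agrees with the creation date in `__header__`. The helper name `matlab_datenum_to_datetime` is hypothetical.

```python
from datetime import datetime, timedelta


def matlab_datenum_to_datetime(datenum: float) -> datetime:
    """Convert a MATLAB serial date number to a Python datetime."""
    days = int(datenum)
    fraction = datenum - days
    # MATLAB counts days from year 0 and Python ordinals from year 1,
    # so the two scales differ by 366 days.
    return datetime.fromordinal(days - 366) + timedelta(days=fraction)


start = matlab_datenum_to_datetime(737468.58928423)
print(start.replace(microsecond=0))  # 2019-02-12 14:08:34
```

The result falls about one minute before the "Created on" time in the header, which would fit a 60-second recording saved right after it finished.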

In [None]:
mat['timestamps'], len(mat['timestamps'])

(array([[0.00000000e+00],
        [6.05468750e-04],
        [1.21093750e-03],
        ...,
        [5.99983203e+01],
        [5.99989258e+01],
        [5.99995313e+01]]), 99097)

In [None]:
mat['__function_workspace__'], len(mat['__function_workspace__'][0])

(array([[ 0,  1, 73, ...,  0,  0,  0]], dtype=uint8), 1352)

The most relevant entries in the .mat file appear to be:
- data
- timestamps

The length of both data and timestamps matches the number of samples taken in each experiment over 60 seconds at a frequency of 1651.6129 Hz (99097).
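The same consistency can be checked from the timestamps themselves. The sketch below copies three numbers from the `timestamps` output shown earlier and assumes uniform sampling; the constant names are illustrative.

```python
import math

# Values copied from the mat['timestamps'] output above.
TIMESTAMP_STEP_S = 6.05468750e-04   # second entry (the first entry is 0)
LAST_TIMESTAMP_S = 5.99995313e+01   # last entry
N_SAMPLES = 99097

# The sampling frequency is the inverse of the time step.
estimated_rate_hz = 1.0 / TIMESTAMP_STEP_S

# With uniform sampling, the last timestamp is (N - 1) steps after the first.
expected_last_s = (N_SAMPLES - 1) * TIMESTAMP_STEP_S

print(round(estimated_rate_hz, 4))
print(math.isclose(expected_last_s, LAST_TIMESTAMP_S, abs_tol=1e-6))
```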

We display the data to get familiar with them.

In [7]:
datos_mat = mat['data']
df = pd.DataFrame(datos_mat)
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
0,0.000164,0.000316,0.000257,0.000163,0.000217,0.000159,0.000133,0.000218,0.000316,0.000102,0.000274,0.000244,0.000157,3.8e-05,0.000348,0.000246,3.1e-05,0.000106,0.000199,0.000195,0.000147,9.7e-05,8.6e-05,-1.2e-05
1,0.000145,0.000225,0.00023,0.00015,0.000207,0.000144,0.000114,0.00021,0.000329,0.000157,0.000273,0.000263,0.000197,8.6e-05,0.000392,0.000279,5.8e-05,0.000102,0.000273,0.000165,0.000207,0.000197,0.000104,1.1e-05
2,0.000192,0.000263,0.000266,0.000179,0.000199,0.000202,7.3e-05,0.000173,0.000334,0.000112,0.000264,0.000205,0.000195,5.4e-05,0.000383,0.000261,9e-06,5.8e-05,0.00027,0.000182,0.000165,0.000172,8.4e-05,-6e-06
3,0.000182,0.000258,0.000243,0.000177,0.000252,0.0002,0.000122,0.0002,0.000348,0.000162,0.00027,0.00025,0.000192,2.1e-05,0.000351,0.000183,2.4e-05,6.7e-05,0.000211,0.00021,0.000136,0.000174,6.8e-05,-3.7e-05
4,0.00015,0.000261,0.000177,0.000153,0.000195,0.00016,0.0001,0.000196,0.000298,0.000116,0.000289,0.000225,0.000132,2.8e-05,0.000353,0.000202,-9e-06,7.5e-05,0.000197,0.000185,0.000117,0.00013,2.7e-05,-2.9e-05


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99097 entries, 0 to 99096
Data columns (total 24 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       99097 non-null  float64
 1   1       99097 non-null  float64
 2   2       99097 non-null  float64
 3   3       99097 non-null  float64
 4   4       99097 non-null  float64
 5   5       99097 non-null  float64
 6   6       99097 non-null  float64
 7   7       99097 non-null  float64
 8   8       99097 non-null  float64
 9   9       99097 non-null  float64
 10  10      99097 non-null  float64
 11  11      99097 non-null  float64
 12  12      99097 non-null  float64
 13  13      99097 non-null  float64
 14  14      99097 non-null  float64
 15  15      99097 non-null  float64
 16  16      99097 non-null  float64
 17  17      99097 non-null  float64
 18  18      99097 non-null  float64
 19  19      99097 non-null  float64
 20  20      99097 non-null  float64
 21  21      99097 non-null  float64
 22

In [8]:
df.describe()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
count,99097.0,99097.0,99097.0,99097.0,99097.0,99097.0,99097.0,99097.0,99097.0,99097.0,99097.0,99097.0,99097.0,99097.0,99097.0,99097.0,99097.0,99097.0,99097.0,99097.0,99097.0,99097.0,99097.0,99097.0
mean,0.000162,0.000263,0.00022,0.000144,0.000198,0.000166,0.000108,0.000191,0.000304,0.000135,0.000272,0.000248,0.000173,4.8e-05,0.000374,0.000221,2.7e-05,7.9e-05,0.000212,0.000159,0.000144,0.000146,4.1e-05,-1.9e-05
std,3e-05,2.8e-05,3e-05,2.7e-05,2.7e-05,2.7e-05,2.7e-05,2.3e-05,2.3e-05,2.6e-05,2.2e-05,2.1e-05,2.5e-05,2.1e-05,2.2e-05,3.3e-05,2.1e-05,2.2e-05,3.4e-05,2.5e-05,2.4e-05,3.2e-05,2.6e-05,2.5e-05
min,2.3e-05,0.000138,9.6e-05,3.2e-05,8.4e-05,5.6e-05,-1e-06,8.9e-05,0.000207,3.5e-05,0.000181,0.00016,6.7e-05,-4.4e-05,0.000281,9.6e-05,-7.3e-05,-1.1e-05,6.8e-05,5.1e-05,2.3e-05,2.5e-05,-8.1e-05,-0.000122
25%,0.000142,0.000243,0.000201,0.000126,0.00018,0.000148,8.9e-05,0.000175,0.000289,0.000118,0.000257,0.000233,0.000156,3.3e-05,0.000359,0.000197,1.2e-05,6.4e-05,0.000188,0.000142,0.000127,0.000124,2.3e-05,-3.6e-05
50%,0.000162,0.000263,0.000221,0.000144,0.000199,0.000166,0.000108,0.000191,0.000304,0.000135,0.000272,0.000248,0.000173,4.8e-05,0.000374,0.000221,2.7e-05,7.9e-05,0.000212,0.000159,0.000144,0.000146,4.1e-05,-1.9e-05
75%,0.000182,0.000282,0.00024,0.000163,0.000217,0.000184,0.000127,0.000207,0.00032,0.000153,0.000287,0.000262,0.00019,6.2e-05,0.000388,0.000244,4.2e-05,9.4e-05,0.000236,0.000176,0.00016,0.000169,5.8e-05,-2e-06
max,0.000294,0.000386,0.000347,0.000257,0.000303,0.000273,0.000218,0.00029,0.000399,0.000244,0.00037,0.00034,0.000284,0.000137,0.000468,0.000353,0.000114,0.000169,0.00034,0.000277,0.000253,0.000268,0.000151,9.4e-05


We create 4 separate sets, each containing the files associated with one White Noise level.

In [9]:
datos_wn_05 = []
datos_wn_1 = []
datos_wn_2 = []
datos_wn_3 = []

for filename in listdir(dataset_datos_dir):
  if isfile(join(dataset_datos_dir, filename)):
    nombre_archivo = filename.split('.')[0]
    if (nombre_archivo.split('_')[2] == '05A'):
      datos_wn_05.append(dataset_datos_dir + '/' + filename)
    elif (nombre_archivo.split('_')[2] == '1A'):
      datos_wn_1.append(dataset_datos_dir + '/' + filename)
    elif (nombre_archivo.split('_')[2] == '2A'):
      datos_wn_2.append(dataset_datos_dir + '/' + filename)
    elif (nombre_archivo.split('_')[2] == '3A'):
      datos_wn_3.append(dataset_datos_dir + '/' + filename)

We verify that the files were grouped correctly.

In [10]:
datos_wn_05[0], datos_wn_1[0], datos_wn_2[0], datos_wn_3[0]

('drive/MyDrive/DATOS_EXPERIMENTALES_JACKET/DATOS/1_1_05A.mat',
 'drive/MyDrive/DATOS_EXPERIMENTALES_JACKET/DATOS/1_5_1A.mat',
 'drive/MyDrive/DATOS_EXPERIMENTALES_JACKET/DATOS/1_5_2A.mat',
 'drive/MyDrive/DATOS_EXPERIMENTALES_JACKET/DATOS/1_2_3A.mat')
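The four lists above could also be built with a single dictionary keyed by the amplitude token, which avoids one branch per amplitude. This is an alternative sketch, not the notebook's method; `group_by_amplitude` and the example paths are hypothetical.

```python
import os
from collections import defaultdict


def group_by_amplitude(paths):
    """Group file paths by the amplitude token in the name (e.g. '05A')."""
    groups = defaultdict(list)
    for path in paths:
        stem = os.path.splitext(os.path.basename(path))[0]
        amplitude_token = stem.split("_")[2]
        groups[amplitude_token].append(path)
    return dict(groups)


# Illustrative paths only.
example_paths = ["DATOS/1_1_05A.mat", "DATOS/1_5_1A.mat", "DATOS/3_2_05A.mat"]
grouped = group_by_amplitude(example_paths)

print(grouped)
```

A file with an unexpected token would show up as an extra key instead of being silently skipped, which makes naming problems easier to spot.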

We create a function to apply to each of the 4 sets; it creates 4 subsets according to the state of the structure (1, 2, 3, 4). Everything is stored in a dictionary for easy access.

In [11]:
def separar_por_estado(arreglo):

  estado_1 = []
  estado_2 = []
  estado_3 = []
  estado_4 = []

  for direccion in arreglo:
    # Take the file name without directory or extension, whatever the path depth
    nombre_archivo = os.path.basename(direccion).split('.')[0]
    estado = nombre_archivo.split('_')[0]
    if (estado == '1'):
      estado_1.append(direccion)
    elif (estado == '2'):
      estado_2.append(direccion)
    elif (estado == '3'):
      estado_3.append(direccion)
    elif (estado == '4'):
      estado_4.append(direccion)

  return {1: estado_1, 2: estado_2, 3: estado_3, 4: estado_4}

In [12]:
datos_wn_05_estados = separar_por_estado(datos_wn_05)
datos_wn_1_estados = separar_por_estado(datos_wn_1)
datos_wn_2_estados = separar_por_estado(datos_wn_2)
datos_wn_3_estados = separar_por_estado(datos_wn_3)

diccionario_datos = {
                     'wn_05': datos_wn_05_estados,
                     'wn_1': datos_wn_1_estados,
                     'wn_2': datos_wn_2_estados,
                     'wn_3': datos_wn_3_estados 
                    }
                    
diccionario_datos.keys()

dict_keys(['wn_05', 'wn_1', 'wn_2', 'wn_3'])
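Since the design calls for 10, 5, 5 and 5 files per state at each amplitude, the nested dictionary can be checked against those counts. The sketch below uses a hypothetical `check_counts` helper and a synthetic dictionary with the same shape as `diccionario_datos`, so it runs without the dataset.

```python
# Expected number of files per state for each amplitude (from the test design).
EXPECTED_COUNTS = {1: 10, 2: 5, 3: 5, 4: 5}


def check_counts(nested):
    """Return (amplitude_key, state, found, expected) for every mismatch."""
    problems = []
    for amplitude_key, states in nested.items():
        for state, expected in EXPECTED_COUNTS.items():
            found = len(states.get(state, []))
            if found != expected:
                problems.append((amplitude_key, state, found, expected))
    return problems


# Synthetic stand-in with the same shape as diccionario_datos.
example = {
    key: {
        state: [f"{state}_{i}_{token}.mat" for i in range(1, count + 1)]
        for state, count in EXPECTED_COUNTS.items()
    }
    for key, token in [("wn_05", "05A"), ("wn_1", "1A"), ("wn_2", "2A"), ("wn_3", "3A")]
}

print(check_counts(example))  # [] when every group has the expected size
```

Calling `check_counts(diccionario_datos)` on the real dictionary should return an empty list if all 100 files were found and grouped as expected.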

We verify the structure of the dictionary.

In [14]:
diccionario_datos

{'wn_05': {1: ['drive/MyDrive/DATOS_EXPERIMENTALES_JACKET/DATOS/1_1_05A.mat',
   'drive/MyDrive/DATOS_EXPERIMENTALES_JACKET/DATOS/1_5_05A.mat',
   'drive/MyDrive/DATOS_EXPERIMENTALES_JACKET/DATOS/1_3_05A.mat',
   'drive/MyDrive/DATOS_EXPERIMENTALES_JACKET/DATOS/1_2_05A.mat',
   'drive/MyDrive/DATOS_EXPERIMENTALES_JACKET/DATOS/1_10_05A.mat',
   'drive/MyDrive/DATOS_EXPERIMENTALES_JACKET/DATOS/1_4_05A.mat',
   'drive/MyDrive/DATOS_EXPERIMENTALES_JACKET/DATOS/1_7_05A.mat',
   'drive/MyDrive/DATOS_EXPERIMENTALES_JACKET/DATOS/1_6_05A.mat',
   'drive/MyDrive/DATOS_EXPERIMENTALES_JACKET/DATOS/1_9_05A.mat',
   'drive/MyDrive/DATOS_EXPERIMENTALES_JACKET/DATOS/1_8_05A.mat'],
  2: ['drive/MyDrive/DATOS_EXPERIMENTALES_JACKET/DATOS/2_5_05A.mat',
   'drive/MyDrive/DATOS_EXPERIMENTALES_JACKET/DATOS/2_1_05A.mat',
   'drive/MyDrive/DATOS_EXPERIMENTALES_JACKET/DATOS/2_3_05A.mat',
   'drive/MyDrive/DATOS_EXPERIMENTALES_JACKET/DATOS/2_4_05A.mat',
   'drive/MyDrive/DATOS_EXPERIMENTALES_JACKET/DATOS/2_2_05A