## ACC (ACCELEROMETER) DATA ANALYSIS

This notebook analyzes the accelerometer data from the smartwatch, which samples at 32 Hz, i.e. one sample every 1/32 of a second.
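
The 32 Hz rate determines both the spacing between samples and how many samples a gap-free aggregation window should contain. A quick arithmetic sketch in plain Python. The window lengths are the 5-minute and 1-hour windows used later in this notebook, and "gap-free" assumes no missing or duplicated readings:

```python
# Sampling rate of the smartwatch accelerometer, as stated above
SAMPLING_HZ = 32

# Time between consecutive samples, in seconds and in milliseconds
period_s = 1 / SAMPLING_HZ
period_ms = period_s * 1000

# Samples expected in a complete window (assumes no gaps or duplicates)
samples_per_5min = SAMPLING_HZ * 5 * 60
samples_per_1h = SAMPLING_HZ * 60 * 60

print(f"period: {period_s} s ({period_ms} ms)")
print(f"samples per 5 min: {samples_per_5min}")
print(f"samples per hour: {samples_per_1h}")
```

A period of 31.25 ms matches the timestamp increments in the `head()` output (…49.000000, …49.031250, …49.062500).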

In [287]:
# Import pandas and NumPy
import pandas as pd
import numpy as np

In [288]:
PACIENTE = '015'
PATH_FOLDER = 'G:\\Dataset\\big-ideas-lab-glycemic-variability-and-wearable-device-data-1.1.2\\'+PACIENTE+'\\'

In [289]:
# Read the CSV (the file marks missing values as "not available")
acelerometro_values = pd.read_csv(PATH_FOLDER + 'ACC_'+PACIENTE+'.csv', engine='python', na_values="not available")

In [290]:
acelerometro_values.head()

|   | datetime | acc_x | acc_y | acc_z |
|---|---|---|---|---|
| 0 | 2020-07-24 07:07:49.000000 | 1.0 | 23.0 | -59.0 |
| 1 | 2020-07-24 07:07:49.031250 | 0.0 | 23.0 | -60.0 |
| 2 | 2020-07-24 07:07:49.062500 | -1.0 | 23.0 | -59.0 |
| 3 | 2020-07-24 07:07:49.093750 | -4.0 | 23.0 | -56.0 |
| 4 | 2020-07-24 07:07:49.125000 | -6.0 | 23.0 | -55.0 |


In [291]:
acelerometro_values.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13969806 entries, 0 to 13969805
Data columns (total 4 columns):
 #   Column    Dtype  
---  ------    -----  
 0   datetime  object 
 1    acc_x    float64
 2    acc_y    float64
 3    acc_z    float64
dtypes: float64(3), object(1)
memory usage: 426.3+ MB


In [292]:
acelerometro_values.count()

datetime    13969806
 acc_x      13969806
 acc_y      13969806
 acc_z      13969806
dtype: int64

In [293]:
acelerometro_values["datetime"].head()

0    2020-07-24 07:07:49.000000
1    2020-07-24 07:07:49.031250
2    2020-07-24 07:07:49.062500
3    2020-07-24 07:07:49.093750
4    2020-07-24 07:07:49.125000
Name: datetime, dtype: object

### Working with datetime
The first step is to convert the `datetime` column to a proper datetime type, since pandas detects it as `object`. Next, the timestamps are set as the index, and finally the data is grouped into 5-minute windows to obtain the mean, median, and other statistics of each window.


In [294]:
# Convert the datetime strings to pandas timestamps
acelerometro_values['datetime'] = pd.to_datetime(acelerometro_values['datetime'])
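
Once parsed, it is worth confirming that consecutive timestamps really are 31.25 ms apart. The file also does not appear to be in chronological order: `head()` starts on 2020-07-24, while the resampled tables further down start on 2020-07-05, so the sketch below sorts before taking differences. It uses a small synthetic, shuffled stand-in; `sampling_interval_summary` is a hypothetical helper, and on the real data you would pass `acelerometro_values['datetime']`:

```python
import pandas as pd


def sampling_interval_summary(timestamps: pd.Series) -> pd.Series:
    """Count how often each gap between consecutive (sorted) timestamps occurs."""
    ordered = timestamps.sort_values()
    gaps = ordered.diff().dropna()
    return gaps.value_counts()


# Hypothetical example: 64 samples at 32 Hz, stored out of order
ts = pd.Series(pd.date_range("2020-07-24 07:07:49", periods=64, freq="31250us"))
shuffled = ts.sample(frac=1, random_state=0)

print("in chronological order:", shuffled.is_monotonic_increasing)
summary = sampling_interval_summary(shuffled)
print(summary)
```

On the real data, any gap other than 31.25 ms (a zero gap would mean duplicated timestamps) would show up as an extra row in the summary.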

In [295]:
print(acelerometro_values.columns)

# Add the magnitude of the acceleration vector (the column names start with a space, as printed above)
acelerometro_values["magnitude"] = np.sqrt(acelerometro_values[' acc_x']**2 + acelerometro_values[' acc_y']**2 + acelerometro_values[' acc_z']**2)
acelerometro_values = acelerometro_values.set_index('datetime')
acelerometro_values.head()


Index(['datetime', ' acc_x', ' acc_y', ' acc_z'], dtype='object')


| datetime | acc_x | acc_y | acc_z | magnitude |
|---|---|---|---|---|
| 2020-07-24 07:07:49.000000 | 1.0 | 23.0 | -59.0 | 63.332456 |
| 2020-07-24 07:07:49.031250 | 0.0 | 23.0 | -60.0 | 64.257295 |
| 2020-07-24 07:07:49.062500 | -1.0 | 23.0 | -59.0 | 63.332456 |
| 2020-07-24 07:07:49.093750 | -4.0 | 23.0 | -56.0 | 60.671245 |
| 2020-07-24 07:07:49.125000 | -6.0 | 23.0 | -55.0 | 59.916609 |


### Computing the vector magnitude

Here we need the magnitude of the acceleration vector, which in this case is three-dimensional; the averages and other statistics below are computed on this magnitude.
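
For each sample the magnitude is the Euclidean norm of the three axis readings, $\sqrt{a_x^2 + a_y^2 + a_z^2}$, which is what the `np.sqrt(...)` line in the cell above computes. A sketch of the same calculation with `np.linalg.norm`, using the first two readings from the `head()` output as test values (the column names here are written without the leading space that the real CSV headers have):

```python
import numpy as np
import pandas as pd

# First two readings from the head() output above, in the units stored in the CSV
sample = pd.DataFrame({"acc_x": [1.0, 0.0], "acc_y": [23.0, 23.0], "acc_z": [-59.0, -60.0]})

# Row-wise Euclidean norm: sqrt(acc_x^2 + acc_y^2 + acc_z^2)
axes = sample[["acc_x", "acc_y", "acc_z"]].to_numpy()
sample["magnitude"] = np.linalg.norm(axes, axis=1)
print(sample)
```

The results agree with the `magnitude` column computed in the notebook (63.332456 and 64.257295).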

In [296]:
# Drop the three axis columns and keep only the magnitude
columns_to_remove = [' acc_x', ' acc_y', ' acc_z']
acelerometro_values = acelerometro_values.drop(columns=columns_to_remove)
acelerometro_values.head()


| datetime | magnitude |
|---|---|
| 2020-07-24 07:07:49.000000 | 63.332456 |
| 2020-07-24 07:07:49.031250 | 64.257295 |
| 2020-07-24 07:07:49.062500 | 63.332456 |
| 2020-07-24 07:07:49.093750 | 60.671245 |
| 2020-07-24 07:07:49.125000 | 59.916609 |


### Computing the mean, median, and other summary statistics

Here we need the mean, median, maximum, minimum, standard deviation, and quartiles.
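
The cells below compute the five basic statistics with `agg` and then add Q1 and Q3 through separate `quantile` calls. As an alternative sketch, `Resampler.agg` also accepts plain functions, so the quartiles can be produced in the same pass, with column names taken from the function names. Here `q1` and `q3` are hypothetical helpers, and the signal is synthetic:

```python
import numpy as np
import pandas as pd


def q1(x):
    """First quartile (25th percentile) of one window."""
    return x.quantile(0.25)


def q3(x):
    """Third quartile (75th percentile) of one window."""
    return x.quantile(0.75)


# Synthetic 32 Hz magnitude signal, 20 minutes long (stand-in for the real data)
idx = pd.date_range("2020-07-05 15:10:00", periods=32 * 60 * 20, freq="31250us")
rng = np.random.default_rng(0)
magnitude = pd.Series(65 + rng.normal(0, 3, len(idx)), index=idx, name="magnitude")

# One row per 5-minute window; callables become columns named after the function
stats_5min = magnitude.resample("5min").agg(["mean", "median", "max", "min", "std", q1, q3])
print(stats_5min)
```

Passing `"1h"` instead of `"5min"` would give the hourly table with the same columns.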

In [297]:
# Function returning quartiles 1 and 3 (Q1, Q3), the ones specified in the paper
def quartiles(x):
    return pd.Series([x.quantile(0.25), x.quantile(0.75)], index=['q1', 'q3'])


In [298]:
# Split the records into 5-minute windows
df_procesado_5min = acelerometro_values['magnitude'].resample('5min')
# Compute the summary statistics for each window
df_5min = df_procesado_5min.agg(['mean', 'median', 'max', 'min', 'std'])
print(df_5min.columns)
df_5min.head(20)


Index(['mean', 'median', 'max', 'min', 'std'], dtype='object')


| datetime | mean | median | max | min | std |
|---|---|---|---|---|---|
| 2020-07-05 15:10:00 | 65.395828 | 65.383484 | 151.175395 | 19.131126 | 5.574826 |
| 2020-07-05 15:15:00 | 65.450358 | 65.459911 | 114.149025 | 20.615528 | 3.615005 |
| 2020-07-05 15:20:00 | 65.469327 | 65.391131 | 91.181138 | 35.355339 | 1.822561 |
| 2020-07-05 15:25:00 | 65.666601 | 65.627738 | 107.07941 | 38.858718 | 3.276848 |
| 2020-07-05 15:30:00 | 65.827568 | 65.421709 | 139.821315 | 13.601471 | 6.123134 |
| 2020-07-05 15:35:00 | 66.169623 | 65.076878 | 180.321934 | 19.33908 | 9.257707 |
| 2020-07-05 15:40:00 | 66.81192 | 64.818979 | 189.781453 | 12.767145 | 13.624385 |
| 2020-07-05 15:45:00 | 65.77757 | 64.288413 | 169.437304 | 11.224972 | 11.738833 |
| 2020-07-05 15:50:00 | 63.974395 | 63.06346 | 138.928039 | 15.937377 | 9.908468 |
| 2020-07-05 15:55:00 | 64.723214 | 63.458648 | 155.881365 | 22.494444 | 7.775781 |


In [299]:
# Apply the same to 1-hour windows
df_procesado_1hora = acelerometro_values['magnitude'].resample('1h')
# Compute the summary statistics for each window
df_1hora = df_procesado_1hora.agg(['mean', 'median', 'max', 'min', 'std'])

# Drop the columns we don't need for now (disabled)
# df_1hora = df_1hora.drop(columns=columns_to_remove)
df_1hora.head(20)

| datetime | mean | median | max | min | std |
|---|---|---|---|---|---|
| 2020-07-05 15:00:00 | 65.529903 | 65.192024 | 189.781453 | 11.224972 | 8.231945 |
| 2020-07-05 16:00:00 | 65.484364 | 65.306967 | 221.126661 | 11.74734 | 5.730069 |
| 2020-07-05 17:00:00 | 65.00831 | 64.226163 | 221.126661 | 9.433981 | 6.971192 |
| 2020-07-05 18:00:00 | 65.294064 | 65.161338 | 183.679068 | 16.03122 | 3.041116 |
| 2020-07-05 19:00:00 | 64.967033 | 64.892218 | 156.217797 | 32.572995 | 1.67088 |
| 2020-07-05 20:00:00 | 65.679021 | 65.176683 | 198.579959 | 6.164414 | 7.883305 |
| 2020-07-05 21:00:00 | 65.905377 | 65.490457 | 221.126661 | 7.483315 | 7.098158 |
| 2020-07-05 22:00:00 | 64.412327 | 64.132675 | 129.251692 | 32.280025 | 2.010118 |
| 2020-07-05 23:00:00 | 63.647782 | 63.600314 | 88.283634 | 45.6618 | 0.763258 |
| 2020-07-06 00:00:00 | 64.183121 | 63.788714 | 194.530203 | 9.797959 | 4.270418 |


In [300]:
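# Create a quartile Series from the 5-minute data.
# Note: passing the resampler itself to quartiles() returns a two-element Series
# whose 'q1' and 'q3' entries are each a whole Series of per-window values
# (see the output), which is why the next cell stores them as columns instead.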
series5min = quartiles(df_procesado_5min)
series5min.head()

q1    datetime
2020-07-05 15:10:00    64.451532
2020...
q3    datetime
2020-07-05 15:10:00    65.833122
2020...
dtype: object
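
To get one row per window from the same `quartiles` helper, one option is to group by a time-based `pd.Grouper`, apply the helper to each group, and unstack the `q1`/`q3` level into columns. A sketch on a synthetic 10-minute signal (the data is an assumption standing in for the real `magnitude` column):

```python
import numpy as np
import pandas as pd


def quartiles(x):
    # Same helper as defined earlier in this notebook
    return pd.Series([x.quantile(0.25), x.quantile(0.75)], index=['q1', 'q3'])


# Synthetic 32 Hz signal covering 10 minutes (stand-in for the real magnitude column)
idx = pd.date_range("2020-07-05 15:10:00", periods=32 * 60 * 10, freq="31250us")
rng = np.random.default_rng(1)
magnitude = pd.Series(65 + rng.normal(0, 3, len(idx)), index=idx, name="magnitude")

# apply() yields a Series indexed by (window start, 'q1'/'q3'); unstack() moves
# the inner level into columns, giving one row per 5-minute window
quartiles_5min = magnitude.groupby(pd.Grouper(freq="5min")).apply(quartiles).unstack()
print(quartiles_5min)
```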

In [301]:
# Compute Q1 and Q3 for each window and store them as individual columns
df_5min_quantil1 = df_procesado_5min.quantile(0.25)
df_5min_quantil3 = df_procesado_5min.quantile(0.75)
df_1hora_quantil1 = df_procesado_1hora.quantile(0.25)
df_1hora_quantil3 = df_procesado_1hora.quantile(0.75)
df_5min['q1'] = df_5min_quantil1
df_5min['q3'] = df_5min_quantil3
df_5min.head(10)
# df_1hora[['q1', 'q3']] = [df_1hora_quantil1,df_1hora_quantil3]


| datetime | mean | median | max | min | std | q1 | q3 |
|---|---|---|---|---|---|---|---|
| 2020-07-05 15:10:00 | 65.395828 | 65.383484 | 151.175395 | 19.131126 | 5.574826 | 64.451532 | 65.833122 |
| 2020-07-05 15:15:00 | 65.450358 | 65.459911 | 114.149025 | 20.615528 | 3.615005 | 64.938432 | 65.886266 |
| 2020-07-05 15:20:00 | 65.469327 | 65.391131 | 91.181138 | 35.355339 | 1.822561 | 65.076878 | 65.946948 |
| 2020-07-05 15:25:00 | 65.666601 | 65.627738 | 107.07941 | 38.858718 | 3.276848 | 64.976919 | 66.098411 |
| 2020-07-05 15:30:00 | 65.827568 | 65.421709 | 139.821315 | 13.601471 | 6.123134 | 64.691576 | 66.377707 |
| 2020-07-05 15:35:00 | 66.169623 | 65.076878 | 180.321934 | 19.33908 | 9.257707 | 64.044906 | 66.659583 |
| 2020-07-05 15:40:00 | 66.81192 | 64.818979 | 189.781453 | 12.767145 | 13.624385 | 60.76183 | 70.519501 |
| 2020-07-05 15:45:00 | 65.77757 | 64.288413 | 169.437304 | 11.224972 | 11.738833 | 57.497826 | 73.013697 |
| 2020-07-05 15:50:00 | 63.974395 | 63.06346 | 138.928039 | 15.937377 | 9.908468 | 62.064483 | 64.078077 |
| 2020-07-05 15:55:00 | 64.723214 | 63.458648 | 155.881365 | 22.494444 | 7.775781 | 62.241465 | 66.515036 |


In [302]:
df_5min.count()

mean      1165
median    1165
max       1165
min       1165
std       1165
q1        1165
q3        1165
dtype: int64
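
`count()` counts only windows that contain data: an empty 5-minute window gets `NaN` statistics and is excluded. At 32 Hz a complete 5-minute window should hold 32 × 300 = 9600 samples, so comparing each window's sample count with that number is a simple way to spot partial windows (for example at the start or end of a recording, or around gaps). A sketch on a synthetic signal with a deliberate gap; the 90% completeness threshold is an arbitrary assumption:

```python
import numpy as np
import pandas as pd

SAMPLING_HZ = 32
EXPECTED_PER_5MIN = SAMPLING_HZ * 5 * 60  # 9600 samples in a complete window

# Synthetic signal: 5 full minutes, a 10-minute gap, then 2.5 minutes of data
first = pd.date_range("2020-07-05 15:10:00", periods=EXPECTED_PER_5MIN, freq="31250us")
second = pd.date_range("2020-07-05 15:25:00", periods=EXPECTED_PER_5MIN // 2, freq="31250us")
idx = first.append(second)
magnitude = pd.Series(np.full(len(idx), 65.0), index=idx)

# Samples per window (empty windows get 0) and the mean (empty windows get NaN)
per_window = pd.DataFrame({
    "n_samples": magnitude.resample("5min").size(),
    "mean": magnitude.resample("5min").mean(),
})
# Hypothetical rule: treat a window as complete if it has at least 90% of the samples
per_window["complete"] = per_window["n_samples"] >= 0.9 * EXPECTED_PER_5MIN
print(per_window)
print("windows with data:", per_window["mean"].count())
```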

In [303]:
# Same for the 1-hour dataset
df_1hora['q1'] = df_1hora_quantil1
df_1hora['q3'] = df_1hora_quantil3
df_1hora.head(10)

| datetime | mean | median | max | min | std | q1 | q3 |
|---|---|---|---|---|---|---|---|
| 2020-07-05 15:00:00 | 65.529903 | 65.192024 | 189.781453 | 11.224972 | 8.231945 | 63.079315 | 66.27971 |
| 2020-07-05 16:00:00 | 65.484364 | 65.306967 | 221.126661 | 11.74734 | 5.730069 | 64.06247 | 65.916614 |
| 2020-07-05 17:00:00 | 65.00831 | 64.226163 | 221.126661 | 9.433981 | 6.971192 | 63.29297 | 65.58201 |
| 2020-07-05 18:00:00 | 65.294064 | 65.161338 | 183.679068 | 16.03122 | 3.041116 | 64.691576 | 65.681047 |
| 2020-07-05 19:00:00 | 64.967033 | 64.892218 | 156.217797 | 32.572995 | 1.67088 | 64.521314 | 65.314623 |
| 2020-07-05 20:00:00 | 65.679021 | 65.176683 | 198.579959 | 6.164414 | 7.883305 | 64.06247 | 66.287254 |
| 2020-07-05 21:00:00 | 65.905377 | 65.490457 | 221.126661 | 7.483315 | 7.098158 | 63.960926 | 66.977608 |
| 2020-07-05 22:00:00 | 64.412327 | 64.132675 | 129.251692 | 32.280025 | 2.010118 | 63.537391 | 65.192024 |
| 2020-07-05 23:00:00 | 63.647782 | 63.600314 | 88.283634 | 45.6618 | 0.763258 | 63.166447 | 63.984373 |
| 2020-07-06 00:00:00 | 64.183121 | 63.788714 | 194.530203 | 9.797959 | 4.270418 | 63.158531 | 64.536811 |


In [304]:
df_1hora.count()

mean      101
median    101
max       101
min       101
std       101
q1        101
q3        101
dtype: int64

In [305]:
# Export the results to CSV files
df_5min.to_csv("ACC_5min_"+PACIENTE+".csv")
df_1hora.to_csv("ACC_1hora_"+PACIENTE+".csv")
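
When these CSV files are loaded again, for example to combine them with other signals from the same patient, the `datetime` index has to be parsed back into timestamps; otherwise it comes back as plain strings. A round-trip sketch using a temporary directory and a tiny hypothetical table with the same layout as `df_5min`:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical two-row summary table indexed like df_5min
df = pd.DataFrame(
    {"mean": [65.4, 65.5], "q1": [64.5, 64.9], "q3": [65.8, 65.9]},
    index=pd.DatetimeIndex(["2020-07-05 15:10:00", "2020-07-05 15:15:00"], name="datetime"),
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "ACC_5min_015.csv"
    df.to_csv(path)
    # index_col + parse_dates restore a DatetimeIndex instead of string labels
    loaded = pd.read_csv(path, index_col="datetime", parse_dates=True)

print(loaded.index.dtype)
print(loaded)
```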