## ACC (ACCELEROMETER) DATA ANALYSIS

This notebook analyzes the accelerometer data from the smartwatch, which records at 32 Hz, that is, one sample every 1/32 of a second (31.25 ms).
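
As a quick check of the sampling arithmetic, the sketch below derives the sample period and the number of samples a full window should hold at 32 Hz. The variable names are illustrative and not part of the notebook.

```python
# Sampling arithmetic for a 32 Hz signal (names are illustrative)
fs_hz = 32                          # sampling rate stated above
period_s = 1 / fs_hz                # 0.03125 s between consecutive samples
samples_per_5min = fs_hz * 5 * 60   # samples in a full 5-minute window
samples_per_hour = fs_hz * 60 * 60  # samples in a full 1-hour window

print(period_s, samples_per_5min, samples_per_hour)
```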

In [11]:
# Import pandas and NumPy
import pandas as pd
import numpy as np

In [12]:
# Read the CSV
acelerometro_values = pd.read_csv('ACC_016.csv', engine='python', na_values="not available")

In [13]:
acelerometro_values.head()

,datetime,acc_x,acc_y,acc_z
0,2020-07-16 09:29:03.000000,-39.0,-28.0,37.0
1,2020-07-16 09:29:03.031250,-38.0,-27.0,37.0
2,2020-07-16 09:29:03.062500,-37.0,-37.0,37.0
3,2020-07-16 09:29:03.093750,-47.0,-27.0,41.0
4,2020-07-16 09:29:03.125000,-39.0,-37.0,34.0


In [14]:
acelerometro_values.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17860788 entries, 0 to 17860787
Data columns (total 4 columns):
 #   Column    Dtype  
---  ------    -----  
 0   datetime  object 
 1    acc_x    float64
 2    acc_y    float64
 3    acc_z    float64
dtypes: float64(3), object(1)
memory usage: 545.1+ MB
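
With almost 18 million rows, memory is worth watching. One option, sketched here on a small synthetic frame, is to store the axis readings as `float32`. This assumes the readings do not need `float64` precision, which the notebook does not state.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the accelerometer frame (not the real data)
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    rng.integers(-128, 128, size=(1000, 3)).astype("float64"),
    columns=[" acc_x", " acc_y", " acc_z"],
)

# Downcast the float64 columns to float32 to halve their footprint
demo32 = demo.astype("float32")

bytes_before = demo.memory_usage(index=False).sum()
bytes_after = demo32.memory_usage(index=False).sum()
print(bytes_before, bytes_after)
```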


In [15]:
acelerometro_values.count()

datetime    17860788
 acc_x      17860788
 acc_y      17860788
 acc_z      17860788
dtype: int64

In [16]:
acelerometro_values["datetime"].head()

0    2020-07-16 09:29:03.000000
1    2020-07-16 09:29:03.031250
2    2020-07-16 09:29:03.062500
3    2020-07-16 09:29:03.093750
4    2020-07-16 09:29:03.125000
Name: datetime, dtype: object

### Working with Datetime
The first step is to convert the `datetime` column to a proper datetime type, since pandas reads it as `object`. Next, the timestamps become the index. Finally, the data are grouped into 5-minute windows to obtain the mean and median.


In [17]:
# Convert the datetime strings to datetime objects
acelerometro_values['datetime'] = pd.to_datetime(acelerometro_values['datetime'])
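
Parsing 17.8 million strings can be slow when pandas has to infer the layout. A possible variant passes an explicit `format`. The format string below is an assumption based on the five timestamps shown above and would fail on rows written differently.

```python
import pandas as pd

# A few timestamps copied from the output above
raw = pd.Series([
    "2020-07-16 09:29:03.000000",
    "2020-07-16 09:29:03.031250",
    "2020-07-16 09:29:03.062500",
])

# Explicit format: date, time, and six-digit fractional seconds
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S.%f")

# Consecutive samples should be 1/32 s = 31.25 ms apart
print(parsed.diff().dropna().unique())
```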

In [18]:
print(acelerometro_values.columns)

# Add the magnitude of the acceleration vector
acelerometro_values["magnitude"] = np.sqrt(acelerometro_values[' acc_x']**2 + acelerometro_values[' acc_y']**2 + acelerometro_values[' acc_z']**2)
acelerometro_values = acelerometro_values.set_index('datetime')
acelerometro_values.head()


Index(['datetime', ' acc_x', ' acc_y', ' acc_z'], dtype='object')


datetime,acc_x,acc_y,acc_z,magnitude
2020-07-16 09:29:03.000000,-39.0,-28.0,37.0,60.61353
2020-07-16 09:29:03.031250,-38.0,-27.0,37.0,59.514704
2020-07-16 09:29:03.062500,-37.0,-37.0,37.0,64.08588
2020-07-16 09:29:03.093750,-47.0,-27.0,41.0,67.963225
2020-07-16 09:29:03.125000,-39.0,-37.0,34.0,63.608176
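
Note the leading space in `' acc_x'`, `' acc_y'`, and `' acc_z'` in the printed `Index`. It suggests the CSV header has a space after each comma, so every later cell must include that space. One way to avoid this, shown here on an in-memory sample rather than the real file, is to strip the column names right after loading. The notebook itself keeps the original names.

```python
import io
import pandas as pd

# In-memory sample that mimics the header of the CSV (space after each comma)
sample_csv = io.StringIO(
    "datetime, acc_x, acc_y, acc_z\n"
    "2020-07-16 09:29:03.000000,-39.0,-28.0,37.0\n"
    "2020-07-16 09:29:03.031250,-38.0,-27.0,37.0\n"
)
sample = pd.read_csv(sample_csv)

# Remove surrounding whitespace from every column name
sample.columns = sample.columns.str.strip()
print(list(sample.columns))
```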


### Computing the vector magnitude

Here we need the magnitude of the 3D acceleration vector, which is the value we will later average.
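
For a sample with components (x, y, z), the magnitude is sqrt(x² + y² + z²). The sketch below checks this against the first two rows shown above, using `np.linalg.norm` as an alternative to the explicit formula used in the notebook.

```python
import numpy as np

# First two samples from the head() output above: (acc_x, acc_y, acc_z)
samples = np.array([
    [-39.0, -28.0, 37.0],
    [-38.0, -27.0, 37.0],
])

# Euclidean norm of each row, equivalent to sqrt(x**2 + y**2 + z**2)
magnitude = np.linalg.norm(samples, axis=1)
print(magnitude)  # approximately [60.61353, 59.514704], matching the table
```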

In [19]:
# Drop the three axis columns and keep only the magnitude
columns_to_remove = [' acc_x', ' acc_y', ' acc_z']
acelerometro_values = acelerometro_values.drop(columns=columns_to_remove) 
acelerometro_values.head()


datetime,magnitude
2020-07-16 09:29:03.000000,60.61353
2020-07-16 09:29:03.031250,59.514704
2020-07-16 09:29:03.062500,64.08588
2020-07-16 09:29:03.093750,67.963225
2020-07-16 09:29:03.125000,63.608176


### Computing the mean, median, and other statistics

Here we need the mean, median, maximum, minimum, standard deviation, and quartiles.

In [22]:
# Function to compute the first and third quartiles, as specified in the paper
def quartiles(x):
    return pd.Series([x.quantile(0.25), x.quantile(0.75)], index=['q1', 'q3'])


In [20]:
# Resample into 5-minute windows
df_procesado_5min = acelerometro_values['magnitude'].resample('5min')
# Compute the summary statistics for each window
df_5min = df_procesado_5min.agg(['mean', 'median', 'max', 'min', 'std'])
print(df_5min.columns)
df_5min.head(20)


Index(['mean', 'median', 'max', 'min', 'std'], dtype='object')


datetime,mean,median,max,min,std
2020-07-16 09:25:00,65.035743,63.269266,167.014969,14.456832,11.721119
2020-07-16 09:30:00,62.730299,62.801274,133.768457,19.33908,3.857899
2020-07-16 09:35:00,62.568438,62.513998,139.907112,25.019992,3.254793
2020-07-16 09:40:00,62.401744,62.225397,130.326513,31.701735,3.242671
2020-07-16 09:45:00,62.548595,62.465991,109.402925,28.160256,2.355077
2020-07-16 09:50:00,63.158727,62.697687,122.723266,15.524175,3.646927
2020-07-16 09:55:00,63.193804,62.713635,131.373513,26.70206,4.738275
2020-07-16 10:00:00,64.353551,63.0,151.294415,17.832555,10.94662
2020-07-16 10:05:00,64.282381,61.886994,131.651054,34.985711,14.525395
2020-07-16 10:10:00,64.558886,62.713635,184.483062,15.748016,13.317263
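
The first window is labeled 09:25:00 even though the recording starts at 09:29:03, because `resample('5min')` aligns its bins to multiples of five minutes. That first bin therefore covers only 57 seconds of data. A possible way to spot such partial windows is to add `'count'` to the aggregation, sketched here on a synthetic 32 Hz series (the values are placeholders, not real readings).

```python
import numpy as np
import pandas as pd

# Synthetic 32 Hz index: 400 seconds starting at the recording's first timestamp
index = pd.date_range("2020-07-16 09:29:03", periods=32 * 400, freq="31250us")
demo = pd.Series(np.ones(len(index)), index=index, name="magnitude")

# 'count' reports how many samples fell into each 5-minute bin
stats = demo.resample("5min").agg(["mean", "count"])
print(stats)
# A full bin holds 32 * 300 = 9600 samples; here the first and last are partial
```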


In [21]:
# Apply the same to 1-hour windows
df_procesado_1hora = acelerometro_values['magnitude'].resample('1h')
# Compute the summary statistics for each window
df_1hora = df_procesado_1hora.agg(['mean', 'median', 'max', 'min', 'std'])

# Drop the columns we don't need for now
# df_1hora = df_1hora.drop(columns=columns_to_remove)
df_1hora.head(20)

datetime,mean,median,max,min,std
2020-07-16 09:00:00,62.836575,62.561969,167.014969,14.456832,4.116197
2020-07-16 10:00:00,63.699399,63.071388,184.483062,8.774964,7.898439
2020-07-16 11:00:00,63.038655,62.896741,153.323188,7.874008,4.180281
2020-07-16 12:00:00,63.215403,62.872888,213.267438,8.774964,5.198428
2020-07-16 13:00:00,63.37144,62.88084,179.830476,12.688578,6.17139
2020-07-16 14:00:00,63.009526,62.817195,159.918729,13.928388,4.205482
2020-07-16 15:00:00,62.996885,62.753486,186.209559,4.690416,5.131926
2020-07-16 16:00:00,63.028031,62.561969,197.671445,7.141428,5.624842
2020-07-16 17:00:00,63.658786,63.055531,208.156191,11.18034,6.270833
2020-07-16 18:00:00,63.284474,63.182276,143.111844,9.486833,3.74475


In [23]:
# Build a Series of quartiles from the 5-minute resampler
series5min = quartiles(df_procesado_5min)
series5min.head()

q1    datetime
2020-07-16 09:25:00    60.514461
2020...
q3    datetime
2020-07-16 09:25:00    67.448126
2020...
dtype: object
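
Calling `quartiles` on the resampler returns a two-element Series of dtype `object` whose entries are themselves Series, which is why the output above looks nested. A flatter alternative, sketched on a small synthetic series, places both quantiles side by side in one DataFrame. The names `demo` and `q` are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic series: one value per minute for 10 minutes (placeholder data)
index = pd.date_range("2020-07-16 09:30:00", periods=10, freq="1min")
demo = pd.Series(np.arange(10, dtype=float), index=index, name="magnitude")

resampler = demo.resample("5min")

# One column per quartile, one row per 5-minute window
q = pd.DataFrame({
    "q1": resampler.quantile(0.25),
    "q3": resampler.quantile(0.75),
})
print(q)
```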

In [24]:
# Split the quartiles into individual columns
# Compute the quantiles
df_5min_quantil1 = df_procesado_5min.quantile(0.25)
df_5min_quantil3 = df_procesado_5min.quantile(0.75)
df_1hora_quantil1 = df_procesado_1hora.quantile(0.25)
df_1hora_quantil3 = df_procesado_1hora.quantile(0.75)
df_5min['q1'] = df_5min_quantil1
df_5min['q3'] = df_5min_quantil3
df_5min.head(10)
# df_1hora[['q1', 'q3']] = [df_1hora_quantil1,df_1hora_quantil3]


datetime,mean,median,max,min,std,q1,q3
2020-07-16 09:25:00,65.035743,63.269266,167.014969,14.456832,11.721119,60.514461,67.448126
2020-07-16 09:30:00,62.730299,62.801274,133.768457,19.33908,3.857899,62.12085,63.269266
2020-07-16 09:35:00,62.568438,62.513998,139.907112,25.019992,3.254793,61.862751,63.06346
2020-07-16 09:40:00,62.401744,62.225397,130.326513,31.701735,3.242671,61.717096,62.841069
2020-07-16 09:45:00,62.548595,62.465991,109.402925,28.160256,2.355077,61.983869,62.88084
2020-07-16 09:50:00,63.158727,62.697687,122.723266,15.524175,3.646927,62.241465,63.40347
2020-07-16 09:55:00,63.193804,62.713635,131.373513,26.70206,4.738275,62.128898,63.537391
2020-07-16 10:00:00,64.353551,63.0,151.294415,17.832555,10.94662,58.906706,66.274052
2020-07-16 10:05:00,64.282381,61.886994,131.651054,34.985711,14.525395,55.362442,66.889835
2020-07-16 10:10:00,64.558886,62.713635,184.483062,15.748016,13.317263,58.273064,67.904344


In [25]:
df_5min.count()

mean      1866
median    1866
max       1866
min       1866
std       1866
q1        1866
q3        1866
dtype: int64

In [26]:
# Same for the 1-hour dataset
df_1hora['q1'] = df_1hora_quantil1
df_1hora['q3'] = df_1hora_quantil3
df_1hora.head(10)

datetime,mean,median,max,min,std,q1,q3
2020-07-16 09:00:00,62.836575,62.561969,167.014969,14.456832,4.116197,61.975802,63.190189
2020-07-16 10:00:00,63.699399,63.071388,184.483062,8.774964,7.898439,61.822326,64.544558
2020-07-16 11:00:00,63.038655,62.896741,153.323188,7.874008,4.180281,62.201286,63.576725
2020-07-16 12:00:00,63.215403,62.872888,213.267438,8.774964,5.198428,62.136946,63.725976
2020-07-16 13:00:00,63.37144,62.88084,179.830476,12.688578,6.17139,61.465437,64.807407
2020-07-16 14:00:00,63.009526,62.817195,159.918729,13.928388,4.205482,62.201286,63.450768
2020-07-16 15:00:00,62.996885,62.753486,186.209559,4.690416,5.131926,61.983869,63.631753
2020-07-16 16:00:00,63.028031,62.561969,197.671445,7.141428,5.624842,61.757591,63.686733
2020-07-16 17:00:00,63.658786,63.055531,208.156191,11.18034,6.270833,62.112801,64.58328
2020-07-16 18:00:00,63.284474,63.182276,143.111844,9.486833,3.74475,62.201286,64.311741


In [27]:
df_1hora.count()

mean      161
median    161
max       161
min       161
std       161
q1        161
q3        161
dtype: int64
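
As a rough consistency check, the expected number of windows can be estimated from the row count, assuming the recording was continuous at exactly 32 Hz from 09:29:03 (an assumption, not something the notebook states). Under that assumption the estimate comes out lower than the 1866 and 161 non-empty windows counted above, which would suggest gaps or an uneven sampling rate in the recording. This has not been verified here.

```python
import math

n_samples = 17_860_788      # rows reported by info()
fs_hz = 32                  # nominal sampling rate
duration_s = (n_samples - 1) / fs_hz  # time from first to last sample

def expected_windows(window_s, start_offset_s):
    """Number of aligned windows touched by a continuous recording."""
    return math.floor((start_offset_s + duration_s) / window_s) + 1

# 09:29:03 is 243 s past 09:25:00 and 1743 s past 09:00:00
print(expected_windows(300, 243), expected_windows(3600, 1743))
```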

In [28]:
# Export the results to CSV
df_5min.to_csv("ACC_5min.csv")
df_1hora.to_csv("ACC_1hora.csv")
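
When these files are loaded again, the `datetime` index comes back as plain text unless it is parsed explicitly. A minimal round-trip sketch, using a temporary file and placeholder values instead of the real results:

```python
import os
import tempfile
import pandas as pd

# Placeholder frame with the same index name as the exported results
index = pd.date_range("2020-07-16 09:25:00", periods=3, freq="5min", name="datetime")
demo = pd.DataFrame({"mean": [1.0, 2.0, 3.0]}, index=index)

path = os.path.join(tempfile.mkdtemp(), "demo_5min.csv")
demo.to_csv(path)

# index_col restores the index; parse_dates turns it back into timestamps
loaded = pd.read_csv(path, index_col="datetime", parse_dates=True)
print(loaded.index.dtype)
```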