## TEMP DATA ANALYSIS (SKIN TEMPERATURE)

This notebook analyzes the skin-temperature data recorded by the smartwatch. The signal is sampled at 4 Hz, that is, 4 records per second.
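As a quick sanity check on the sampling rate, the sketch below works out how many records a fully covered window should contain. The helper name `expected_samples` is hypothetical and only used for illustration; the 4 Hz rate is the one stated above.

```python
SAMPLE_RATE_HZ = 4  # sampling rate stated in the text above


def expected_samples(window_seconds: int, rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of records in a window with no gaps (hypothetical helper)."""
    return window_seconds * rate_hz


samples_5min = expected_samples(5 * 60)    # 5-minute window
samples_1h = expected_samples(60 * 60)     # 1-hour window
print(samples_5min, samples_1h)            # 1200 14400
```

Windows with fewer records than this would indicate gaps in the recording, which may be worth checking before trusting the per-window statistics.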

In [267]:
# Import pandas and other libraries
import pandas as pd
import numpy as np

In [268]:
PACIENTE = '015'
PATH_FOLDER = 'G:\\Dataset\\big-ideas-lab-glycemic-variability-and-wearable-device-data-1.1.2\\'+PACIENTE+'\\'

In [269]:
# Read the CSV
temperature_values = pd.read_csv(PATH_FOLDER + 'TEMP_'+PACIENTE+'.csv', engine='python', na_values="not available")

In [270]:
temperature_values.head()

,datetime,temp
0,2020-07-24 07:07:49.000,25.09
1,2020-07-24 07:07:49.250,25.09
2,2020-07-24 07:07:49.500,25.09
3,2020-07-24 07:07:49.750,25.09
4,2020-07-24 07:07:50.000,25.09


In [271]:
temperature_values.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1746216 entries, 0 to 1746215
Data columns (total 2 columns):
 #   Column    Dtype  
---  ------    -----  
 0   datetime  object 
 1    temp     float64
dtypes: float64(1), object(1)
memory usage: 26.6+ MB


In [272]:
temperature_values.count()

datetime    1746216
 temp       1746216
dtype: int64

In [273]:
temperature_values["datetime"].head()

0    2020-07-24 07:07:49.000
1    2020-07-24 07:07:49.250
2    2020-07-24 07:07:49.500
3    2020-07-24 07:07:49.750
4    2020-07-24 07:07:50.000
Name: datetime, dtype: object

### Trabajando con Datetime
The first step is to convert the datetime column to a proper datetime type, since pandas currently detects it as object. Next, the timestamps are set as the index. Finally, the data are grouped into 5-minute windows to obtain the mean and median of each window.
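The three steps can be sketched on a small synthetic series before applying them to the real file. Everything in this block (the `demo` frame, its values, and the start date) is made up for illustration and does not come from the dataset.

```python
import pandas as pd

# Synthetic 4 Hz signal: 10 minutes of data, stored as strings like a raw CSV column
timestamps = pd.date_range("2020-01-01 00:00:00", periods=2400, freq="250ms")
demo = pd.DataFrame({
    "datetime": timestamps.strftime("%Y-%m-%d %H:%M:%S.%f"),
    "temp": [30.0] * 1200 + [32.0] * 1200,
})

# Step 1: convert the string column to real datetimes
demo["datetime"] = pd.to_datetime(demo["datetime"])
# Step 2: use the timestamps as the index
demo = demo.set_index("datetime")
# Step 3: group into 5-minute windows and aggregate
demo_5min = demo["temp"].resample("5min").agg(["mean", "median"])
print(demo_5min)
```

Each 5-minute window here holds 1200 samples, so the result has exactly two rows, one per window.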


In [274]:
# Convert the datetime strings to real datetimes
temperature_values['datetime'] = pd.to_datetime(temperature_values['datetime'])
print(temperature_values.columns)

Index(['datetime', ' temp'], dtype='object')
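The printed index shows that the temperature column is named `' temp'`, with a leading space, which is why the later cells select `temperature_values[' temp']`. A minimal sketch of two ways to normalize the name follows. The inline CSV text is a stand-in that assumes the file has a space after each comma; the real file layout has not been checked here.

```python
import io
import pandas as pd

# Stand-in for the raw file, assuming a space follows each comma
raw = (
    "datetime, temp\n"
    "2020-07-24 07:07:49.000, 25.09\n"
    "2020-07-24 07:07:49.250, 25.09\n"
)

# Default parsing keeps the leading space in the header
as_read = pd.read_csv(io.StringIO(raw))

# Option 1: strip whitespace from the column names after loading
cleaned = as_read.rename(columns=str.strip)

# Option 2: let the parser skip spaces that follow the delimiter
skipped = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

print(list(as_read.columns), list(cleaned.columns), list(skipped.columns))
```

The rest of this notebook keeps the original `' temp'` name, so applying either option would also require updating the column selections below.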


In [275]:


# Set datetime as the index
temperature_values = temperature_values.set_index('datetime')
print(temperature_values.columns)


Index([' temp'], dtype='object')


In [276]:
df_procesado_5min = temperature_values[' temp'].resample('5min') 

### Computing the mean, the median and other statistics

For each window we need the mean, median, max, min, standard deviation and quartiles.

In [277]:
# Helper to compute quartiles 1 and 3, as described in the paper
def quartiles(x):
    return pd.Series([x.quantile(0.25), x.quantile(0.75)], index=['q1', 'q3'])


In [278]:
# Build the quartile Series from the 5-minute resampler
# Note: the result is an object Series holding one whole Series per quartile
series5min = quartiles(df_procesado_5min)
series5min.head()

q1    datetime
2020-07-05 15:10:00    33.33
2020-07-...
q3    datetime
2020-07-05 15:10:00    34.37
2020-07-...
dtype: object
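The output above is an object Series with two entries, each of them a full per-window Series, which is awkward to combine with the other statistics. One possible alternative, sketched here on synthetic data, is to pass small named functions to `agg` so that the quartiles come out as ordinary columns. The functions `q1` and `q3` and the `temp` series are hypothetical names for this example only.

```python
import numpy as np
import pandas as pd

# Synthetic 4 Hz signal: the values 1..5 repeated over 10 minutes
idx = pd.date_range("2020-01-01", periods=2400, freq="250ms")
temp = pd.Series(np.tile(np.arange(1.0, 6.0), 480), index=idx, name="temp")


def q1(x):
    """First quartile of one window."""
    return x.quantile(0.25)


def q3(x):
    """Third quartile of one window."""
    return x.quantile(0.75)


# One call yields every statistic as a column; callables are named after the function
stats = temp.resample("5min").agg(["mean", "median", "max", "min", "std", q1, q3])
print(stats)
```

This is a design choice rather than a requirement: computing the quartiles separately and assigning them as new columns gives the same table.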

In [279]:
# Statistics to compute for each 5-minute window
df_5min = df_procesado_5min.agg(['mean', 'median', 'max', 'min', 'std'])
print(df_5min.columns)
# Drop the columns we do not need for now (currently disabled)
# columns_to_remove = [' temp']
# df_5min = df_5min.drop(columns=columns_to_remove)
df_5min.head(20)

Index(['mean', 'median', 'max', 'min', 'std'], dtype='object')


datetime,mean,median,max,min,std
2020-07-05 15:10:00,33.324361,34.11,34.57,25.37,1.927075
2020-07-05 15:15:00,34.8679,34.82,35.23,34.53,0.183939
2020-07-05 15:20:00,35.2036,35.21,35.33,35.07,0.066468
2020-07-05 15:25:00,35.108233,35.11,35.25,34.95,0.083216
2020-07-05 15:30:00,35.030833,35.07,35.15,34.73,0.110821
2020-07-05 15:35:00,34.4624,34.42,34.75,34.13,0.169143
2020-07-05 15:40:00,33.9546,33.87,34.27,33.75,0.159206
2020-07-05 15:45:00,33.588067,33.34,34.16,33.18,0.378743
2020-07-05 15:50:00,33.065233,33.0,33.45,32.89,0.150868
2020-07-05 15:55:00,32.655433,32.79,32.99,32.15,0.258627


In [280]:
# Apply the same procedure with 1-hour windows
df_procesado_1hora = temperature_values[' temp'].resample('1h')
# Compute the same statistics as for the 5-minute windows
df_1hora = df_procesado_1hora.agg(['mean', 'median', 'max', 'min', 'std'])

# Drop the columns we do not need for now (currently disabled)
# df_1hora = df_1hora.drop(columns=columns_to_remove)
df_1hora.head(20)

datetime,mean,median,max,min,std
2020-07-05 15:00:00,34.146061,34.34,35.33,25.37,1.054504
2020-07-05 16:00:00,34.109069,34.33,34.97,32.73,0.562448
2020-07-05 17:00:00,34.280614,34.39,35.21,33.29,0.480809
2020-07-05 18:00:00,35.249569,35.15,36.03,34.18,0.413163
2020-07-05 19:00:00,35.871483,35.87,36.23,35.53,0.131229
2020-07-05 20:00:00,33.729894,33.58,35.91,31.35,1.362011
2020-07-05 21:00:00,34.849931,35.23,35.84,32.77,0.819861
2020-07-05 22:00:00,35.794528,35.89,36.34,33.97,0.43391
2020-07-05 23:00:00,35.205603,35.205,36.05,34.45,0.437902
2020-07-06 00:00:00,34.567453,34.59,35.16,33.66,0.282621


In [281]:
# Compute the quartiles per window and store them as separate columns
df_5min_quantil1 = df_procesado_5min.quantile(0.25)
df_5min_quantil3 = df_procesado_5min.quantile(0.75)
df_1hora_quantil1 = df_procesado_1hora.quantile(0.25)
df_1hora_quantil3 = df_procesado_1hora.quantile(0.75)
df_5min['q1'] = df_5min_quantil1
df_5min['q3'] = df_5min_quantil3
df_5min.head(10)
# df_1hora[['q1', 'q3']] = [df_1hora_quantil1,df_1hora_quantil3]


datetime,mean,median,max,min,std,q1,q3
2020-07-05 15:10:00,33.324361,34.11,34.57,25.37,1.927075,33.33,34.37
2020-07-05 15:15:00,34.8679,34.82,35.23,34.53,0.183939,34.75,35.03
2020-07-05 15:20:00,35.2036,35.21,35.33,35.07,0.066468,35.15,35.27
2020-07-05 15:25:00,35.108233,35.11,35.25,34.95,0.083216,35.05,35.18
2020-07-05 15:30:00,35.030833,35.07,35.15,34.73,0.110821,35.0,35.11
2020-07-05 15:35:00,34.4624,34.42,34.75,34.13,0.169143,34.33,34.65
2020-07-05 15:40:00,33.9546,33.87,34.27,33.75,0.159206,33.81,34.115
2020-07-05 15:45:00,33.588067,33.34,34.16,33.18,0.378743,33.25,34.0
2020-07-05 15:50:00,33.065233,33.0,33.45,32.89,0.150868,32.97,33.09
2020-07-05 15:55:00,32.655433,32.79,32.99,32.15,0.258627,32.41,32.87


In [282]:
df_5min.count()

mean      1165
median    1165
max       1165
min       1165
std       1165
q1        1165
q3        1165
dtype: int64

In [283]:
# Same for the 1-hour dataset
df_1hora['q1'] = df_1hora_quantil1
df_1hora['q3'] = df_1hora_quantil3
df_1hora.head(10)

datetime,mean,median,max,min,std,q1,q3
2020-07-05 15:00:00,34.146061,34.34,35.33,25.37,1.054504,33.29,35.07
2020-07-05 16:00:00,34.109069,34.33,34.97,32.73,0.562448,33.75,34.55
2020-07-05 17:00:00,34.280614,34.39,35.21,33.29,0.480809,33.81,34.65
2020-07-05 18:00:00,35.249569,35.15,36.03,34.18,0.413163,34.91,35.68
2020-07-05 19:00:00,35.871483,35.87,36.23,35.53,0.131229,35.79,35.95
2020-07-05 20:00:00,33.729894,33.58,35.91,31.35,1.362011,32.53,34.875
2020-07-05 21:00:00,34.849931,35.23,35.84,32.77,0.819861,34.59,35.49
2020-07-05 22:00:00,35.794528,35.89,36.34,33.97,0.43391,35.73,36.09
2020-07-05 23:00:00,35.205603,35.205,36.05,34.45,0.437902,34.77,35.61
2020-07-06 00:00:00,34.567453,34.59,35.16,33.66,0.282621,34.49,34.73


In [284]:
df_1hora.count()

mean      101
median    101
max       101
min       101
std       101
q1        101
q3        101
dtype: int64

In [285]:
# Export the results to CSV
df_5min.to_csv("TEMP_5min_"+PACIENTE+".csv")
df_1hora.to_csv("TEMP_1hora_"+PACIENTE+".csv")
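When the exported files are loaded again, the datetime index comes back as plain text unless it is parsed explicitly. The sketch below shows a round trip through an in-memory buffer instead of a real file. The two rows reuse values from the 5-minute table above, and the names `sample` and `restored` are hypothetical.

```python
import io
import pandas as pd

# Two rows in the same shape as the exported 5-minute table
index = pd.to_datetime(["2020-07-05 15:10:00", "2020-07-05 15:15:00"])
sample = pd.DataFrame(
    {"mean": [33.324361, 34.8679], "q1": [33.33, 34.75]},
    index=pd.DatetimeIndex(index, name="datetime"),
)

# Write to an in-memory buffer (stands in for the CSV file on disk)
buffer = io.StringIO()
sample.to_csv(buffer)
buffer.seek(0)

# Read back, restoring the datetime index
restored = pd.read_csv(buffer, index_col="datetime", parse_dates=True)
print(restored.index.dtype)
```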

### CSV FILES SUCCESSFULLY GENERATED FOR 5 MIN AND 1 HOUR