## ANALYSIS OF INTERBEAT INTERVAL (IBI) DATA

This notebook analyzes the interbeat interval (IBI) data recorded by the smartwatch, which are processed at 1.25 Hz, i.e. one record every 0.80 seconds.
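
The rate and the record spacing are reciprocals of each other. A quick check, using only the 1.25 Hz figure stated above:

```python
# Period (seconds per record) is the reciprocal of the rate in Hz
rate_hz = 1.25
period_s = 1 / rate_hz
print(period_s)  # 0.8
```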

In [23]:
# Import pandas and other libraries
import pandas as pd
import numpy as np

In [24]:
PACIENTE = '001'
PATH_FOLDER = 'G:\\Dataset\\big-ideas-lab-glycemic-variability-and-wearable-device-data-1.1.2\\'+PACIENTE+'\\'

In [25]:
# Read the CSV
ibi_values = pd.read_csv(PATH_FOLDER + 'IBI_'+PACIENTE+'.csv', engine='python', na_values="not available")

In [26]:
ibi_values.head()

Unnamed: 0,datetime,ibi
0,2020-02-13 15:33:22.059328,0.828163
1,2020-02-13 15:33:22.934368,0.87504
2,2020-02-13 15:34:21.593303,0.98442
3,2020-02-13 15:34:22.483969,0.890666
4,2020-02-13 15:34:23.421512,0.937543


In [27]:
ibi_values.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 266366 entries, 0 to 266365
Data columns (total 2 columns):
 #   Column    Non-Null Count   Dtype  
---  ------    --------------   -----  
 0   datetime  266366 non-null  object 
 1    ibi      266366 non-null  float64
dtypes: float64(1), object(1)
memory usage: 4.1+ MB


In [28]:
ibi_values.count()

datetime    266366
 ibi        266366
dtype: int64

In [29]:
ibi_values["datetime"].head()

0    2020-02-13 15:33:22.059328
1    2020-02-13 15:33:22.934368
2    2020-02-13 15:34:21.593303
3    2020-02-13 15:34:22.483969
4    2020-02-13 15:34:23.421512
Name: datetime, dtype: object

### Working with Datetime
First, convert the `datetime` column to a proper datetime type, since it is currently detected as `object`. Next, set the dates as the index. Finally, group the data into 5-minute windows to obtain the mean and median.
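
The three steps described above (parse, index, resample) can be sketched on a small synthetic frame. The values and the clean column name `ibi` are made up for illustration; the notebook itself works on `ibi_values`.

```python
import pandas as pd

# Synthetic IBI records (illustrative values, not taken from the dataset)
df = pd.DataFrame({
    "datetime": ["2020-02-13 15:33:22", "2020-02-13 15:34:21",
                 "2020-02-13 15:36:05", "2020-02-13 15:38:40"],
    "ibi": [0.80, 0.90, 1.00, 0.70],
})

# 1) Parse the strings into datetime64 values
df["datetime"] = pd.to_datetime(df["datetime"])

# 2) Use the timestamps as the index, which resample() needs
df = df.set_index("datetime")

# 3) Group into 5-minute windows and aggregate each window
summary = df["ibi"].resample("5min").agg(["mean", "median"])
print(summary)
```

The first two records fall in the 15:30 window and the last two in the 15:35 window, so the result has two rows.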


In [30]:
# Convert the datetime strings to datetime values
ibi_values['datetime'] = pd.to_datetime(ibi_values['datetime'])
print(ibi_values.columns)

Index(['datetime', ' ibi'], dtype='object')
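
The printed index shows that the IBI column is named `' ibi'`, with a leading space, which is why the following cells index with `' ibi'`. One option, not used in this notebook (the later cells would have to be adjusted to match), is to strip whitespace from the headers right after loading. A minimal sketch on an in-memory CSV with the same quirk:

```python
import io
import pandas as pd

# Tiny in-memory CSV whose header has a space after the comma
raw = io.StringIO("datetime, ibi\n2020-02-13 15:33:22.059328,0.828163\n")
df = pd.read_csv(raw)
print(list(df.columns))

# Strip surrounding whitespace from every column name
df.columns = df.columns.str.strip()
print(list(df.columns))
```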


In [31]:


# Set the datetime column as the index
ibi_values = ibi_values.set_index('datetime')
print(ibi_values.columns)


Index([' ibi'], dtype='object')


In [32]:
df_procesado_5min = ibi_values.resample('5min') 

### Computing the mean, median, and other statistics

Here we need the mean, median, maximum, minimum, standard deviation, and quartiles.
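
One way to get all of these in a single `agg` call is to pass small named functions for the quartiles next to the built-in aggregation names. This is a sketch on synthetic data, not the approach the notebook takes; the helper names `q1` and `q3` are chosen here for illustration.

```python
import pandas as pd

def q1(x):
    """First quartile (25th percentile) of a window."""
    return x.quantile(0.25)

def q3(x):
    """Third quartile (75th percentile) of a window."""
    return x.quantile(0.75)

# Synthetic series: one value per minute over ten minutes
index = pd.date_range("2020-02-13 15:30:00", periods=10, freq="1min")
ibi = pd.Series([0.8, 0.9, 1.0, 1.1, 1.2, 0.7, 0.8, 0.9, 1.0, 1.1], index=index)

# Resampling a single column (a Series) keeps the result columns flat
stats = ibi.resample("5min").agg(["mean", "median", "max", "min", "std", q1, q3])
print(stats)
```

Each custom function contributes a column named after the function, so the quartiles end up next to the other statistics without a separate assignment step.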

In [33]:
# Function to compute quartiles 1 and 3, as indicated in the paper
def quartiles(x):
    return pd.Series([x.quantile(0.25), x.quantile(0.75)], index=['q1', 'q3'])


In [34]:
# Build the quartile Series for the 5-minute windows
series5min = quartiles(df_procesado_5min)
series5min.head()

q1                              ibi
datetime        ...
q3                              ibi
datetime        ...
dtype: object

In [35]:
# Define the aggregations to compute on the resampled data
df_5min = df_procesado_5min.agg(['mean', 'median', 'max', 'min', 'std'])
print(df_5min.columns)
df_5min.head(8)

MultiIndex([(' ibi',   'mean'),
            (' ibi', 'median'),
            (' ibi',    'max'),
            (' ibi',    'min'),
            (' ibi',    'std')],
           )


Unnamed: 0_level_0,ibi,ibi,ibi,ibi,ibi
Unnamed: 0_level_1,mean,median,max,min,std
datetime,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2
2020-02-13 15:30:00,0.903166,0.890666,0.98442,0.828163,0.05991
2020-02-13 15:35:00,0.849333,0.921917,1.140677,0.468771,0.228782
2020-02-13 15:40:00,0.930846,0.953169,1.078174,0.43752,0.1592
2020-02-13 15:45:00,0.95382,0.953169,1.250057,0.562526,0.157979
2020-02-13 15:50:00,0.937543,0.968794,1.125051,0.734409,0.098188
2020-02-13 15:55:00,0.896291,0.87504,1.281309,0.671906,0.153557
2020-02-13 16:00:00,0.515649,0.515649,0.671906,0.390643,0.058267
2020-02-13 16:05:00,0.904871,0.87504,1.0938,0.781286,0.102096


In [36]:
# Apply the same to 1-hour windows
df_procesado_1hora = ibi_values[' ibi'].resample('1h')
# Compute the summary statistics
df_1hora = df_procesado_1hora.agg(['mean', 'median', 'max', 'min', 'std'])

# Remove the columns we do not need for now
# df_1hora = df_1hora.drop(columns=columns_to_remove)
df_1hora.head(10)

Unnamed: 0_level_0,mean,median,max,min,std
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
2020-02-13 15:00:00,0.910574,0.937543,1.281309,0.43752,0.163677
2020-02-13 16:00:00,0.914551,0.968794,1.234432,0.390643,0.201014
2020-02-13 17:00:00,0.788136,0.781286,1.328186,0.515649,0.103357
2020-02-13 18:00:00,0.91894,0.937543,1.31256,0.390643,0.125242
2020-02-13 19:00:00,0.874715,0.87504,1.250057,0.359391,0.117019
2020-02-13 20:00:00,0.877129,0.87504,1.140677,0.5469,0.10834
2020-02-13 21:00:00,0.955519,0.968794,1.296934,0.578151,0.100944
2020-02-13 22:00:00,0.98295,0.98442,1.406314,0.359391,0.079524
2020-02-13 23:00:00,0.915187,0.906291,1.140677,0.406269,0.064902
2020-02-14 00:00:00,0.98985,0.98442,1.296934,0.734409,0.085531


In [37]:
# Split the quartiles into individual columns
# Compute the quantiles
df_5min_quantil1 = df_procesado_5min.quantile(0.25)
df_5min_quantil3 = df_procesado_5min.quantile(0.75)
df_1hora_quantil1 = df_procesado_1hora.quantile(0.25)
df_1hora_quantil3 = df_procesado_1hora.quantile(0.75)
df_5min['q1'] = df_5min_quantil1
df_5min['q3'] = df_5min_quantil3
df_5min.head(10)
# df_1hora[['q1', 'q3']] = [df_1hora_quantil1,df_1hora_quantil3]


Unnamed: 0_level_0,ibi,ibi,ibi,ibi,ibi,q1,q3
Unnamed: 0_level_1,mean,median,max,min,std,Unnamed: 6_level_1,Unnamed: 7_level_1
datetime,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2
2020-02-13 15:30:00,0.903166,0.890666,0.98442,0.828163,0.05991,0.87504,0.937543
2020-02-13 15:35:00,0.849333,0.921917,1.140677,0.468771,0.228782,0.625028,1.03911
2020-02-13 15:40:00,0.930846,0.953169,1.078174,0.43752,0.1592,0.910197,1.023484
2020-02-13 15:45:00,0.95382,0.953169,1.250057,0.562526,0.157979,0.890666,1.046923
2020-02-13 15:50:00,0.937543,0.968794,1.125051,0.734409,0.098188,0.859414,1.000046
2020-02-13 15:55:00,0.896291,0.87504,1.281309,0.671906,0.153557,0.796911,0.953169
2020-02-13 16:00:00,0.515649,0.515649,0.671906,0.390643,0.058267,0.484397,0.562526
2020-02-13 16:05:00,0.904871,0.87504,1.0938,0.781286,0.102096,0.843789,0.960981
2020-02-13 16:10:00,0.887541,0.906291,1.109426,0.593777,0.191821,0.843789,0.98442
2020-02-13 16:15:00,0.974003,0.98442,1.000046,0.937543,0.032528,0.960981,0.992233


In [38]:
df_5min.count()

 ibi  mean      2076
      median    2076
      max       2076
      min       2076
      std       2056
q1              2076
q3              2076
dtype: int64
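
The `std` count (2056) is lower than the others (2076). A likely explanation, given that pandas computes the sample standard deviation (`ddof=1`) by default, is that some windows contain a single IBI record, for which that statistic is undefined. A minimal illustration on synthetic data:

```python
import pandas as pd

index = pd.to_datetime(["2020-02-13 15:31:00", "2020-02-13 15:32:00",
                        "2020-02-13 15:36:00"])
ibi = pd.Series([0.8, 0.9, 1.0], index=index)

stats = ibi.resample("5min").agg(["mean", "std"])
print(stats)

# The second window holds one value: its mean is defined, its std is NaN,
# so count() reports fewer non-null entries for std than for mean
print(stats.count())
```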

In [39]:
# Same for the 1-hour dataset
df_1hora['q1'] = df_1hora_quantil1
df_1hora['q3'] = df_1hora_quantil3
df_1hora.head(10)

Unnamed: 0_level_0,mean,median,max,min,std,q1,q3
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
2020-02-13 15:00:00,0.910574,0.937543,1.281309,0.43752,0.163677,0.843789,1.015671
2020-02-13 16:00:00,0.914551,0.968794,1.234432,0.390643,0.201014,0.843789,1.062549
2020-02-13 17:00:00,0.788136,0.781286,1.328186,0.515649,0.103357,0.718783,0.859414
2020-02-13 18:00:00,0.91894,0.937543,1.31256,0.390643,0.125242,0.843789,1.000046
2020-02-13 19:00:00,0.874715,0.87504,1.250057,0.359391,0.117019,0.781286,0.968794
2020-02-13 20:00:00,0.877129,0.87504,1.140677,0.5469,0.10834,0.796911,0.953169
2020-02-13 21:00:00,0.955519,0.968794,1.296934,0.578151,0.100944,0.890666,1.031297
2020-02-13 22:00:00,0.98295,0.98442,1.406314,0.359391,0.079524,0.937543,1.031297
2020-02-13 23:00:00,0.915187,0.906291,1.140677,0.406269,0.064902,0.87504,0.953169
2020-02-14 00:00:00,0.98985,0.98442,1.296934,0.734409,0.085531,0.921917,1.062549


In [40]:
df_1hora.count()

mean      184
median    184
max       184
min       184
std       184
q1        184
q3        184
dtype: int64

In [41]:
# Export the results to CSV
df_5min.to_csv("IBI_5min_"+PACIENTE+".csv")
df_1hora.to_csv("IBI_1hora_"+PACIENTE+".csv")

### CSV FILES SUCCESSFULLY GENERATED FOR 5 MIN AND 1 HOUR

Next we need to compute the heart rate variability (HRV) metrics. For this we use a library created by Digital Biomarkers Discovery, which will process the data for each 5-minute window.
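
To make the time-domain part of these metrics concrete, the sketch below computes a few of them directly with NumPy from IBI values given in seconds. It follows the usual textbook definitions and is not the `BIL_HRV` implementation, whose exact formulas and thresholds are not shown in this notebook. The function name `time_domain_hrv` and the 50 ms threshold are assumptions made here.

```python
import numpy as np

def time_domain_hrv(ibi_seconds, threshold_ms=50.0):
    """Compute basic time-domain HRV metrics from IBI values in seconds."""
    rr = np.asarray(ibi_seconds, dtype=float) * 1000.0  # convert to milliseconds
    if rr.size < 2:
        raise ValueError("at least two intervals are needed")
    diffs = np.diff(rr)  # successive differences between intervals
    nnx = int(np.sum(np.abs(diffs) > threshold_ms))
    return {
        "MeanRR": rr.mean(),                     # mean interval (ms)
        "MeanHR": (60000.0 / rr).mean(),         # mean instantaneous heart rate (bpm)
        "SDNN": rr.std(ddof=1),                  # sample standard deviation (ms)
        "RMSSD": np.sqrt(np.mean(diffs ** 2)),   # root mean square of differences (ms)
        "NNx": nnx,                              # differences above the threshold
        "pNNx": 100.0 * nnx / diffs.size,        # same, as a percentage
    }

# Illustrative values only, not taken from the dataset
metrics = time_domain_hrv([0.80, 0.90, 0.80, 0.82])
print(metrics)
```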

In [42]:
# Now compute the HRV metrics
# First import the dedicated library that Digital Biomarkers Discovery has already created
import BIL_HRV as bh
import os
import time

In [43]:
# Function to compute the HRV metrics (MeanRR, MeanHR, etc.) for one window
TEMPORAL_NAME = 'test.csv'
def calculate_hr(df):
    time.sleep(0.2)
    # Work on a copy so the caller's window is not modified
    df = df.copy()
    # Coerce non-numeric entries to NaN, drop them, and make sure the column is float
    df[' ibi'] = pd.to_numeric(df[' ibi'], errors='coerce')
    df = df.dropna(subset=[' ibi']).astype({' ibi': float})
    # bh.hrv is called with a file name, so write the window to a temporary CSV
    df.to_csv(TEMPORAL_NAME)
    try:
        results = bh.hrv(TEMPORAL_NAME)
    except Exception as error:
        # Handle the exception
        print("An exception occurred:", error)
        print("Exception found, Default value response")
        # Build a dictionary of default (zero) values
        results = {
            'MeanRR': 0.0,
            'MeanHR': 0.0,
            'MinHR': 0.0,
            'MaxHR': 0.0,
            'SDNN': 0.0,
            'RMSSD': 0.0,
            'NNx': 0.0,
            'pNNx': 0.0,
            'PowerVLF': 0.0,
            'PowerLF': 0.0,
            'PowerHF': 0.0,
            'PowerTotal': 0.0,
            'LF/HF': 0.0,
            'PeakVLF': 0.0,
            'PeakLF': 0.0,
            'PeakHF': 0.0,
            'FractionLF': 0.0,
            'FractionHF': 0.0
        }
    # Delete the temporary file
    os.remove(TEMPORAL_NAME)
    return results
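
Filling failed windows with 0.0, as the fallback above does, makes them indistinguishable from real measurements in later averages. A possible alternative, sketched here as a suggestion rather than as what the notebook does, is to return NaN for failed windows so that pandas skips them by default. `compute` stands in for any metric function (such as a wrapper around `bh.hrv`), and `safe_metrics` is a hypothetical helper name.

```python
import math

HRV_KEYS = ["MeanRR", "MeanHR", "SDNN", "RMSSD"]  # shortened list for illustration

def safe_metrics(compute, window, keys=HRV_KEYS):
    """Run compute(window); on failure return NaN for every expected key."""
    try:
        return compute(window)
    except Exception as error:
        print("An exception occurred:", error)
        return {key: math.nan for key in keys}

def failing_compute(window):
    # max() raises ValueError on an empty window
    return {"MeanRR": max(window)}

result = safe_metrics(failing_compute, [])
print(result)
```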

In [44]:
import warnings
# Resample the DataFrame into 5-minute windows and apply the function
# The action argument of catch_warnings requires Python 3.11 or later
with warnings.catch_warnings(action="ignore"):
    resampled = df_procesado_5min.apply(calculate_hr).apply(pd.Series)

An exception occurred: max() arg is an empty sequence
Exception found, Default value response
An exception occurred: A value (1.0) in x_new is below the interpolation range's minimum value (1.046923).
Exception found, Default value response
An exception occurred: A value (1.0) in x_new is below the interpolation range's minimum value (1.000046).
Exception found, Default value response
An exception occurred: A value (1.0) in x_new is below the interpolation range's minimum value (1.046923).
Exception found, Default value response
An exception occurred: max() arg is an empty sequence
Exception found, Default value response
An exception occurred: max() arg is an empty sequence
Exception found, Default value response
An exception occurred: A value (1.0) in x_new is below the interpolation range's minimum value (1.000046).
Exception found, Default value response
An exception occurred: A value (1.0) in x_new is below the interpolation range's minimum value (1.046923).
Exception found, Defaul
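
The `action` argument of `warnings.catch_warnings` used in the cell above is available from Python 3.11. On earlier versions the same effect is obtained by calling `warnings.simplefilter` inside the context manager. A small sketch, where `noisy` is a made-up function that emits a warning:

```python
import warnings

def noisy():
    warnings.warn("example warning", UserWarning)
    return 42

# record=True collects any warning that is not filtered out,
# which lets us confirm that nothing got through
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("ignore")
    value = noisy()

print(value, len(caught))
```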

In [45]:
# Join the HRV results to the 5-minute mean IBI DataFrame
df_resampled = df_procesado_5min.mean()

hrv_columns = ['MeanRR', 'MeanHR', 'MinHR', 'MaxHR', 'SDNN', 'RMSSD', 'NNx', 'pNNx',
               'PowerVLF', 'PowerLF', 'PowerHF', 'PowerTotal', 'LF/HF',
               'PeakVLF', 'PeakLF', 'PeakHF', 'FractionLF', 'FractionHF']
for column in hrv_columns:
    df_resampled[column] = resampled[column]

In [46]:
df_resampled.head()

Unnamed: 0_level_0,ibi,MeanRR,MeanHR,MinHR,MaxHR,SDNN,RMSSD,NNx,pNNx,PowerVLF,PowerLF,PowerHF,PowerTotal,LF/HF,PeakVLF,PeakLF,PeakHF,FractionLF,FractionHF
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1
2020-02-13 15:30:00,0.903166,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2020-02-13 15:35:00,0.849333,816.3,75.3,60.0,101.0,124.3,241.2,22.0,73.3,2916.12,5027.74,11801.01,19744.87,0.43,0.04,0.12,0.16,29.88,70.12
2020-02-13 15:40:00,0.930846,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2020-02-13 15:45:00,0.95382,948.5,63.3,60.7,65.8,22.3,151.3,15.0,65.2,0.0,10821.08,3557.89,14378.97,3.04,0.0,0.09,0.18,75.26,24.74
2020-02-13 15:50:00,0.937543,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [47]:
df_resampled.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2621 entries, 2020-02-13 15:30:00 to 2020-02-22 17:50:00
Freq: 5T
Data columns (total 19 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0    ibi        2076 non-null   float64
 1   MeanRR      2621 non-null   float64
 2   MeanHR      2621 non-null   float64
 3   MinHR       2621 non-null   float64
 4   MaxHR       2621 non-null   float64
 5   SDNN        2621 non-null   float64
 6   RMSSD       2621 non-null   float64
 7   NNx         2621 non-null   float64
 8   pNNx        2621 non-null   float64
 9   PowerVLF    2621 non-null   float64
 10  PowerLF     2621 non-null   float64
 11  PowerHF     2621 non-null   float64
 12  PowerTotal  2621 non-null   float64
 13  LF/HF       2610 non-null   float64
 14  PeakVLF     2621 non-null   float64
 15  PeakLF      2621 non-null   float64
 16  PeakHF      2621 non-null   float64
 17  FractionLF  2610 non-null   float64
 18  FractionHF  2610 non-nu

In [48]:
df_resampled = df_resampled.dropna(subset=[' ibi'])
df_resampled.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2076 entries, 2020-02-13 15:30:00 to 2020-02-22 17:50:00
Data columns (total 19 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0    ibi        2076 non-null   float64
 1   MeanRR      2076 non-null   float64
 2   MeanHR      2076 non-null   float64
 3   MinHR       2076 non-null   float64
 4   MaxHR       2076 non-null   float64
 5   SDNN        2076 non-null   float64
 6   RMSSD       2076 non-null   float64
 7   NNx         2076 non-null   float64
 8   pNNx        2076 non-null   float64
 9   PowerVLF    2076 non-null   float64
 10  PowerLF     2076 non-null   float64
 11  PowerHF     2076 non-null   float64
 12  PowerTotal  2076 non-null   float64
 13  LF/HF       2065 non-null   float64
 14  PeakVLF     2076 non-null   float64
 15  PeakLF      2076 non-null   float64
 16  PeakHF      2076 non-null   float64
 17  FractionLF  2065 non-null   float64
 18  FractionHF  2065 non-null   floa

In [49]:
df_resampled.to_csv("IBI_5min_hr_data_"+PACIENTE+".csv")