## INTER-BEAT INTERVAL (IBI) DATA ANALYSIS

This notebook analyzes the inter-beat interval (IBI) data recorded by the smartwatch. The data is described as being processed at 1.25 Hz, which would be 1 record every 0.80 seconds.

In [352]:
# Import pandas and other libraries
import pandas as pd
import numpy as np

In [353]:
PACIENTE = '015'
PATH_FOLDER = 'G:\\Dataset\\big-ideas-lab-glycemic-variability-and-wearable-device-data-1.1.2\\'+PACIENTE+'\\'

In [354]:
# Read the CSV
ibi_values = pd.read_csv(PATH_FOLDER + 'IBI_'+PACIENTE+'.csv', engine='python', na_values="not available")

In [355]:
ibi_values.head()

,datetime,ibi
0,2020-07-24 07:08:38.689775,0.687531
1,2020-07-24 07:08:39.267926,0.578151
2,2020-07-24 07:08:39.814826,0.5469
3,2020-07-24 07:08:40.486732,0.671906
4,2020-07-24 07:08:41.158638,0.671906


In [356]:
ibi_values.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 225714 entries, 0 to 225713
Data columns (total 2 columns):
 #   Column    Non-Null Count   Dtype  
---  ------    --------------   -----  
 0   datetime  225714 non-null  object 
 1    ibi      225714 non-null  float64
dtypes: float64(1), object(1)
memory usage: 3.4+ MB


In [357]:
ibi_values.count()

datetime    225714
 ibi        225714
dtype: int64

In [358]:
ibi_values["datetime"].head()

0    2020-07-24 07:08:38.689775
1    2020-07-24 07:08:39.267926
2    2020-07-24 07:08:39.814826
3    2020-07-24 07:08:40.486732
4    2020-07-24 07:08:41.158638
Name: datetime, dtype: object
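
The five rows printed above are enough for a quick check of how the timestamps relate to the IBI values. The sketch below rebuilds those rows by hand and compares the gap between consecutive timestamps with the IBI recorded on the same row. The column is named `ibi` here for readability, whereas the real file uses ` ibi` with a leading space. This is only a sanity check on five rows, not a statement about the whole recording.

```python
import numpy as np
import pandas as pd

# The five rows printed by ibi_values.head() above, rebuilt by hand
sample = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2020-07-24 07:08:38.689775",
        "2020-07-24 07:08:39.267926",
        "2020-07-24 07:08:39.814826",
        "2020-07-24 07:08:40.486732",
        "2020-07-24 07:08:41.158638",
    ]),
    "ibi": [0.687531, 0.578151, 0.5469, 0.671906, 0.671906],
})

# Seconds elapsed between consecutive timestamps (the first row has no predecessor)
gaps = sample["datetime"].diff().dt.total_seconds()

# Compare each gap with the IBI value recorded on the same row
matches = np.isclose(gaps.iloc[1:], sample["ibi"].iloc[1:], atol=1e-6)
print(matches.all())
```

If the gaps match the IBI values, the timestamps follow the beats themselves rather than a fixed sampling period.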

### Working with Datetime
First we convert the datetime column to the proper type, since pandas is detecting it as object. Next we set the dates as the index, and finally we group the data into 5-minute windows to compute the mean and median.
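
The three steps described here (parse, index, group) can be sketched on a small hand-made table. The toy values and the column name `ibi` are assumptions for illustration only; the notebook's real column is ` ibi`.

```python
import pandas as pd

# Hypothetical toy data: six beats spread over two 5-minute windows
toy = pd.DataFrame({
    "datetime": [
        "2020-07-05 15:10:05", "2020-07-05 15:11:30", "2020-07-05 15:14:59",
        "2020-07-05 15:15:00", "2020-07-05 15:17:45", "2020-07-05 15:19:10",
    ],
    "ibi": [0.60, 0.70, 0.80, 0.50, 0.55, 0.75],
})

# Step 1: parse the strings (dtype object) into datetime64 values
toy["datetime"] = pd.to_datetime(toy["datetime"])

# Step 2: use the timestamps as the index so that resample() can bin on them
toy = toy.set_index("datetime")

# Step 3: group into 5-minute windows and compute mean and median per window
summary = toy["ibi"].resample("5min").agg(["mean", "median"])
print(summary)
```

Each window is labeled by its left edge, so a beat at 15:14:59 falls in the 15:10 window and a beat at 15:15:00 opens the next one.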


In [359]:
# Convert the datetime strings to datetime objects
ibi_values['datetime'] = pd.to_datetime(ibi_values['datetime'])
print(ibi_values.columns)

Index(['datetime', ' ibi'], dtype='object')


In [360]:


# Set datetime as the index
ibi_values = ibi_values.set_index('datetime')
print(ibi_values.columns)


Index([' ibi'], dtype='object')


In [361]:
df_procesado_5min = ibi_values.resample('5min') 

### Computing the mean, median, and other statistics

Here we need the mean, median, maximum, minimum, standard deviation, and quartiles.
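
As an alternative sketch, all of these statistics can be requested in a single `agg` call by passing small named functions for the quartiles. The helpers `q1` and `q3` and the toy series below are hypothetical and are not part of the notebook's own code, which computes the quartiles separately.

```python
import pandas as pd

# Hypothetical helpers: named functions so the output columns are called q1 and q3
def q1(x):
    return x.quantile(0.25)

def q3(x):
    return x.quantile(0.75)

# Hypothetical toy series with a DatetimeIndex (values are illustrative only)
index = pd.date_range("2020-07-05 15:10:00", periods=10, freq="1min")
toy_ibi = pd.Series([0.60, 0.62, 0.64, 0.66, 0.68, 0.70, 0.72, 0.74, 0.76, 0.78],
                    index=index, name="ibi")

# One pass over each 5-minute window, one column per statistic
stats = toy_ibi.resample("5min").agg(["mean", "median", "max", "min", "std", q1, q3])
print(stats)
```

Resampling a single Series rather than the whole DataFrame keeps the result columns flat instead of producing a MultiIndex.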

In [362]:
# Function to compute quartiles 1 and 3, as specified in the paper
def quartiles(x):
    return pd.Series([x.quantile(0.25), x.quantile(0.75)], index=['q1', 'q3'])


In [363]:
# Build the quartile series from the 5-minute resampler
# Note: the result is a Series holding two DataFrames (one per quartile)
series5min = quartiles(df_procesado_5min)
series5min.head()

q1                              ibi
datetime        ...
q3                              ibi
datetime        ...
dtype: object

In [364]:
# Define the aggregations to compute on the resampled data
df_5min = df_procesado_5min.agg(['mean', 'median', 'max', 'min', 'std'])
print(df_5min.columns)
df_5min.head(8)

MultiIndex([(' ibi',   'mean'),
            (' ibi', 'median'),
            (' ibi',    'max'),
            (' ibi',    'min'),
            (' ibi',    'std')],
           )


,ibi,ibi,ibi,ibi,ibi
datetime,mean,median,max,min,std
2020-07-05 15:10:00,0.676208,0.687531,0.87504,0.453146,0.088469
2020-07-05 15:15:00,0.674718,0.687531,0.812537,0.453146,0.064038
2020-07-05 15:20:00,0.70684,0.718783,0.859414,0.562526,0.0491
2020-07-05 15:25:00,0.700883,0.703157,0.843789,0.609403,0.031342
2020-07-05 15:30:00,0.71683,0.703157,0.890666,0.609403,0.066499
2020-07-05 15:35:00,0.717581,0.718783,0.87504,0.562526,0.06691
2020-07-05 15:40:00,0.683067,0.687531,0.76566,0.625029,0.045747
2020-07-05 15:45:00,,,,,


In [365]:
# Apply the same procedure with 1-hour windows
df_procesado_1hora = ibi_values[' ibi'].resample('1h')
# Compute the aggregate statistics
df_1hora = df_procesado_1hora.agg(['mean', 'median', 'max', 'min', 'std'])

# Drop the columns we do not need for now
# df_1hora = df_1hora.drop(columns=columns_to_remove)
df_1hora.head(10)

datetime,mean,median,max,min,std
2020-07-05 15:00:00,0.690484,0.703157,1.015671,0.390643,0.071263
2020-07-05 16:00:00,0.654875,0.65628,1.328186,0.515649,0.042053
2020-07-05 17:00:00,0.61737,0.625029,0.968794,0.359391,0.052755
2020-07-05 18:00:00,0.663339,0.65628,0.906291,0.43752,0.051523
2020-07-05 19:00:00,0.685186,0.687531,1.296934,0.515649,0.05648
2020-07-05 20:00:00,0.687469,0.687531,0.906291,0.390643,0.053175
2020-07-05 21:00:00,0.666364,0.671906,0.890666,0.468771,0.040332
2020-07-05 22:00:00,0.643108,0.640654,1.031297,0.515649,0.025577
2020-07-05 23:00:00,0.644963,0.640654,0.76566,0.484397,0.023067
2020-07-06 00:00:00,0.658001,0.65628,0.87504,0.421894,0.032567


In [366]:
# Split the quartiles into individual columns
# Compute the quantiles
df_5min_quantil1 = df_procesado_5min.quantile(0.25)
df_5min_quantil3 = df_procesado_5min.quantile(0.75)
df_1hora_quantil1 = df_procesado_1hora.quantile(0.25)
df_1hora_quantil3 = df_procesado_1hora.quantile(0.75)
df_5min['q1'] = df_5min_quantil1
df_5min['q3'] = df_5min_quantil3
df_5min.head(10)
# df_1hora[['q1', 'q3']] = [df_1hora_quantil1,df_1hora_quantil3]


,ibi,ibi,ibi,ibi,ibi,q1,q3
datetime,mean,median,max,min,std,,
2020-07-05 15:10:00,0.676208,0.687531,0.87504,0.453146,0.088469,0.625029,0.718783
2020-07-05 15:15:00,0.674718,0.687531,0.812537,0.453146,0.064038,0.640654,0.718783
2020-07-05 15:20:00,0.70684,0.718783,0.859414,0.562526,0.0491,0.687531,0.734409
2020-07-05 15:25:00,0.700883,0.703157,0.843789,0.609403,0.031342,0.687531,0.718783
2020-07-05 15:30:00,0.71683,0.703157,0.890666,0.609403,0.066499,0.671906,0.753941
2020-07-05 15:35:00,0.717581,0.718783,0.87504,0.562526,0.06691,0.671906,0.773473
2020-07-05 15:40:00,0.683067,0.687531,0.76566,0.625029,0.045747,0.65628,0.695344
2020-07-05 15:45:00,,,,,,,
2020-07-05 15:50:00,0.486042,0.484397,0.562526,0.390643,0.050201,0.453145,0.515649
2020-07-05 15:55:00,0.663225,0.593777,1.015671,0.531274,0.160708,0.5469,0.718783


In [367]:
df_5min.count()

 ibi  mean      1123
      median    1123
      max       1123
      min       1123
      std       1111
q1              1123
q3              1123
dtype: int64

In [368]:
# Same for the 1-hour dataset
df_1hora['q1'] = df_1hora_quantil1
df_1hora['q3'] = df_1hora_quantil3
df_1hora.head(10)

datetime,mean,median,max,min,std,q1,q3
2020-07-05 15:00:00,0.690484,0.703157,1.015671,0.390643,0.071263,0.65628,0.734409
2020-07-05 16:00:00,0.654875,0.65628,1.328186,0.515649,0.042053,0.640654,0.671906
2020-07-05 17:00:00,0.61737,0.625029,0.968794,0.359391,0.052755,0.593777,0.65628
2020-07-05 18:00:00,0.663339,0.65628,0.906291,0.43752,0.051523,0.640654,0.687531
2020-07-05 19:00:00,0.685186,0.687531,1.296934,0.515649,0.05648,0.65628,0.703157
2020-07-05 20:00:00,0.687469,0.687531,0.906291,0.390643,0.053175,0.65628,0.718783
2020-07-05 21:00:00,0.666364,0.671906,0.890666,0.468771,0.040332,0.65628,0.687531
2020-07-05 22:00:00,0.643108,0.640654,1.031297,0.515649,0.025577,0.625029,0.65628
2020-07-05 23:00:00,0.644963,0.640654,0.76566,0.484397,0.023067,0.640654,0.65628
2020-07-06 00:00:00,0.658001,0.65628,0.87504,0.421894,0.032567,0.640654,0.671906


In [369]:
df_1hora.count()

mean      101
median    101
max       101
min       101
std       101
q1        101
q3        101
dtype: int64

In [370]:
# Export the results to CSV
df_5min.to_csv("IBI_5min_"+PACIENTE+".csv")
df_1hora.to_csv("IBI_1hora_"+PACIENTE+".csv")

### CSV FILES SUCCESSFULLY GENERATED FOR 5 MIN AND 1 HOUR

In this part we compute the heart rate variability (HRV) metrics. For this we use a library created by Digital Biomarkers Discovery, which processes the data in 5-minute windows.
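
To give a sense of what the time-domain metrics measure, here is a minimal sketch of the textbook definitions of MeanRR, SDNN, and RMSSD in plain NumPy. The function `time_domain_hrv` is hypothetical and is not the library's implementation. The library may filter or preprocess the intervals differently, so its numbers need not match this sketch.

```python
import numpy as np

def time_domain_hrv(ibi_seconds):
    """Textbook time-domain HRV metrics for one window of IBI values in seconds."""
    rr_ms = np.asarray(ibi_seconds, dtype=float) * 1000.0  # seconds -> milliseconds
    successive_diffs = np.diff(rr_ms)                      # beat-to-beat changes
    return {
        "MeanRR": rr_ms.mean(),                            # mean interval, ms
        "SDNN": rr_ms.std(ddof=1),                         # sample std of intervals, ms
        "RMSSD": np.sqrt(np.mean(successive_diffs ** 2)),  # RMS of successive differences, ms
    }

# The first five IBI values from the head() output earlier in the notebook
metrics = time_domain_hrv([0.687531, 0.578151, 0.5469, 0.671906, 0.671906])
print(metrics)
```

SDNN summarizes overall variability in the window, while RMSSD reacts to beat-to-beat changes.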

In [371]:
# Now compute the HRV metrics
# First import the HRV library that Digital Biomarkers Discovery provides
import BIL_HRV as bh
import os
import time

In [372]:
# Function to compute the HRV metrics (MeanRR, MeanHR, etc.) for one window
TEMPORAL_NAME = 'test.csv'
def calculate_hr(df):
    time.sleep(0.2)
    # Work on a copy so the resampled group is not modified in place
    df = df.copy()
    # Coerce non-numeric IBI values to NaN, then drop those rows
    df[' ibi'] = pd.to_numeric(df[' ibi'], errors='coerce')
    df = df.dropna(subset=[' ibi'])
    df[' ibi'] = df[' ibi'].astype(float)
    # bh.hrv is called with a file name, so write the window to a temporary CSV
    df.to_csv(TEMPORAL_NAME)
    try:
        results = bh.hrv(TEMPORAL_NAME)
    except Exception as error:
        # Handle the exception by falling back to default values
        print("An exception occurred:", error)
        print("Exception found, Default value response")
        # Build a dictionary of default (zero) values
        results = {
            'MeanRR': 0.0,
            'MeanHR': 0.0,
            'MinHR': 0.0,
            'MaxHR': 0.0,
            'SDNN': 0.0,
            'RMSSD': 0.0,
            'NNx': 0.0,
            'pNNx': 0.0,
            'PowerVLF': 0.0,
            'PowerLF': 0.0,
            'PowerHF': 0.0,
            'PowerTotal': 0.0,
            'LF/HF': 0.0,
            'PeakVLF': 0.0,
            'PeakLF': 0.0,
            'PeakHF': 0.0,
            'FractionLF': 0.0,
            'FractionHF': 0.0
        }
    # Remove the temporary file
    os.remove(TEMPORAL_NAME)
    return results

In [373]:
import warnings
# Resample the DataFrame to 5 minutes and apply the function
# Note: catch_warnings(action=...) requires Python 3.11 or later
with warnings.catch_warnings(action="ignore"):
    resampled = df_procesado_5min.apply(calculate_hr).apply(pd.Series)

An exception occurred: max() arg is an empty sequence
Exception found, Default value response
An exception occurred: max() arg is an empty sequence
Exception found, Default value response
An exception occurred: max() arg is an empty sequence
Exception found, Default value response
An exception occurred: max() arg is an empty sequence
Exception found, Default value response
An exception occurred: max() arg is an empty sequence
Exception found, Default value response
An exception occurred: max() arg is an empty sequence
Exception found, Default value response
An exception occurred: attempt to get argmax of an empty sequence
Exception found, Default value response
An exception occurred: max() arg is an empty sequence
Exception found, Default value response
An exception occurred: max() arg is an empty sequence
Exception found, Default value response
An exception occurred: max() arg is an empty sequence
Exception found, Default value response
An exception occurred: max() arg is an empty seq
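
The "empty sequence" messages above are consistent with windows that contain no IBI samples. `resample('5min')` emits every 5-minute window between the first and last timestamp, whether or not any beat falls in it. A minimal sketch, on hypothetical toy data, of counting beats per window to see which ones are empty:

```python
import pandas as pd

# Hypothetical toy data: beats at 15:10-15:12, then a gap, then beats at 15:21-15:22
index = pd.to_datetime([
    "2020-07-05 15:10:10", "2020-07-05 15:11:20", "2020-07-05 15:12:30",
    "2020-07-05 15:21:00", "2020-07-05 15:22:15",
])
toy_ibi = pd.Series([0.65, 0.70, 0.68, 0.72, 0.69], index=index, name="ibi")

# resample() also emits the windows that contain no beats at all
beats_per_window = toy_ibi.resample("5min").count()
print(beats_per_window)

# Windows with zero beats are the ones with nothing to feed an HRV routine
empty_windows = beats_per_window[beats_per_window == 0].index
print(list(empty_windows))
```

A count like this could also be used to skip windows with too few beats before calling the HRV function, although the notebook itself falls back to default values and filters the rows afterwards.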

In [374]:
# Join the HRV results to the 5-minute mean IBI DataFrame
df_resampled = df_procesado_5min.mean()

hrv_columns = ['MeanRR', 'MeanHR', 'MinHR', 'MaxHR', 'SDNN', 'RMSSD', 'NNx', 'pNNx',
               'PowerVLF', 'PowerLF', 'PowerHF', 'PowerTotal', 'LF/HF',
               'PeakVLF', 'PeakLF', 'PeakHF', 'FractionLF', 'FractionHF']
for column in hrv_columns:
    df_resampled[column] = resampled[column]

In [375]:
df_resampled.head()

datetime,ibi,MeanRR,MeanHR,MinHR,MaxHR,SDNN,RMSSD,NNx,pNNx,PowerVLF,PowerLF,PowerHF,PowerTotal,LF/HF,PeakVLF,PeakLF,PeakHF,FractionLF,FractionHF
2020-07-05 15:10:00,0.676208,671.8,90.2,78.4,106.1,64.2,85.5,34.0,50.0,1687.27,1165.29,2005.26,4857.82,0.58,0.02,0.04,0.24,36.75,63.25
2020-07-05 15:15:00,0.674718,673.8,89.2,84.2,101.0,28.2,88.9,54.0,54.5,484.36,872.94,1671.55,3028.85,0.52,0.03,0.05,0.2,34.31,65.69
2020-07-05 15:20:00,0.70684,707.5,84.8,81.5,89.9,14.9,78.3,68.0,43.6,73.4,179.95,335.44,588.79,0.54,0.02,0.06,0.33,34.91,65.09
2020-07-05 15:25:00,0.700883,700.5,85.7,83.1,88.7,10.2,46.8,28.0,17.8,12.59,210.18,115.1,337.87,1.83,0.03,0.06,0.38,64.62,35.38
2020-07-05 15:30:00,0.71683,716.1,83.8,80.7,86.3,10.6,106.8,35.0,63.6,20.86,161.76,2186.96,2369.59,0.07,0.03,0.13,0.18,6.89,93.11


In [376]:
df_resampled.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 5711 entries, 2020-07-05 15:10:00 to 2020-07-25 11:00:00
Freq: 5T
Data columns (total 19 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0    ibi        1123 non-null   float64
 1   MeanRR      5711 non-null   float64
 2   MeanHR      5711 non-null   float64
 3   MinHR       5711 non-null   float64
 4   MaxHR       5711 non-null   float64
 5   SDNN        5711 non-null   float64
 6   RMSSD       5711 non-null   float64
 7   NNx         5711 non-null   float64
 8   pNNx        5711 non-null   float64
 9   PowerVLF    5711 non-null   float64
 10  PowerLF     5711 non-null   float64
 11  PowerHF     5711 non-null   float64
 12  PowerTotal  5711 non-null   float64
 13  LF/HF       5696 non-null   float64
 14  PeakVLF     5711 non-null   float64
 15  PeakLF      5711 non-null   float64
 16  PeakHF      5711 non-null   float64
 17  FractionLF  5696 non-null   float64
 18  FractionHF  5696 non-nu

In [377]:
df_resampled = df_resampled.dropna(subset=[' ibi'])
df_resampled.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1123 entries, 2020-07-05 15:10:00 to 2020-07-25 11:00:00
Data columns (total 19 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0    ibi        1123 non-null   float64
 1   MeanRR      1123 non-null   float64
 2   MeanHR      1123 non-null   float64
 3   MinHR       1123 non-null   float64
 4   MaxHR       1123 non-null   float64
 5   SDNN        1123 non-null   float64
 6   RMSSD       1123 non-null   float64
 7   NNx         1123 non-null   float64
 8   pNNx        1123 non-null   float64
 9   PowerVLF    1123 non-null   float64
 10  PowerLF     1123 non-null   float64
 11  PowerHF     1123 non-null   float64
 12  PowerTotal  1123 non-null   float64
 13  LF/HF       1108 non-null   float64
 14  PeakVLF     1123 non-null   float64
 15  PeakLF      1123 non-null   float64
 16  PeakHF      1123 non-null   float64
 17  FractionLF  1108 non-null   float64
 18  FractionHF  1108 non-null   floa

In [378]:
df_resampled.to_csv("IBI_5min_hr_data_"+PACIENTE+".csv")