## GLUCOSE DATA ANALYSIS (MEASURED BY THE DEXCOM DEVICE)

This notebook analyzes and cleans the glucose data from the device, which, as shown below, directly reports the patient's glucose level over the course of several days.

In [747]:
# Import pandas and other libraries
import pandas as pd
import numpy as np

In [748]:
PACIENTE = '016'
PATH_FOLDER = 'G:\\Dataset\\big-ideas-lab-glycemic-variability-and-wearable-device-data-1.1.2\\'+PACIENTE+'\\'

In [749]:
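# Read the CSV, treating the string "not available" as a missing value
glucose_values = pd.read_csv(PATH_FOLDER + 'Dexcom_'+PACIENTE+'.csv', engine='python', na_values="not available")
# Skip the first 12 rows; the glucose readings (Event Type 'EGV') start at row 12
glucose_values = glucose_values.iloc[12:]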
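
The slice `iloc[12:]` depends on the readings starting at exactly row 12 of this file. A minimal sketch of a less position-dependent alternative, assuming that glucose readings are the rows whose `Event Type` is `EGV` (as in the output of `head()`). The toy frame and the `Device` and `Alert` labels are made up for illustration:

```python
import pandas as pd

# Toy frame standing in for the export (hypothetical values)
raw = pd.DataFrame({
    "Event Type": ["Device", "Alert", "EGV", "EGV"],
    "Glucose Value (mg/dL)": [None, None, 134.0, 130.0],
})

# Keep only the glucose-reading rows, wherever they sit in the file
egv_only = raw[raw["Event Type"] == "EGV"]
print(egv_only)
```

Filtering by value keeps the original row labels, so the result can be compared directly with the positional slice.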

In [750]:
glucose_values.head()

,Index,Timestamp (YYYY-MM-DDThh:mm:ss),Event Type,Event Subtype,Patient Info,Device Info,Source Device ID,Glucose Value (mg/dL),Insulin Value (u),Carb Value (grams),Duration (hh:mm:ss),Glucose Rate of Change (mg/dL/min),Transmitter Time (Long Integer)
12,13,2020-07-16 10:48:24,EGV,,,,iPhone G6,134.0,,,,,8400.0
13,14,2020-07-16 10:53:24,EGV,,,,iPhone G6,130.0,,,,,8700.0
14,15,2020-07-16 10:58:24,EGV,,,,iPhone G6,127.0,,,,,9000.0
15,16,2020-07-16 11:03:25,EGV,,,,iPhone G6,122.0,,,,,9300.0
16,17,2020-07-16 11:08:24,EGV,,,,iPhone G6,121.0,,,,,9600.0


In [751]:
glucose_values.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2276 entries, 12 to 2287
Data columns (total 13 columns):
 #   Column                              Non-Null Count  Dtype  
---  ------                              --------------  -----  
 0   Index                               2276 non-null   int64  
 1   Timestamp (YYYY-MM-DDThh:mm:ss)     2276 non-null   object 
 2   Event Type                          2276 non-null   object 
 3   Event Subtype                       0 non-null      object 
 4   Patient Info                        0 non-null      object 
 5   Device Info                         0 non-null      object 
 6   Source Device ID                    2276 non-null   object 
 7   Glucose Value (mg/dL)               2276 non-null   float64
 8   Insulin Value (u)                   0 non-null      float64
 9   Carb Value (grams)                  0 non-null      float64
 10  Duration (hh:mm:ss)                 0 non-null      object 
 11  Glucose Rate of Change (mg/dL/min)  0 non-

In [752]:
glucose_values.count()

Index                                 2276
Timestamp (YYYY-MM-DDThh:mm:ss)       2276
Event Type                            2276
Event Subtype                            0
Patient Info                             0
Device Info                              0
Source Device ID                      2276
Glucose Value (mg/dL)                 2276
Insulin Value (u)                        0
Carb Value (grams)                       0
Duration (hh:mm:ss)                      0
Glucose Rate of Change (mg/dL/min)       0
Transmitter Time (Long Integer)       2276
dtype: int64

In [753]:
glucose_values["Timestamp (YYYY-MM-DDThh:mm:ss)"].head()
# Drop the columns we do not need for now (mostly empty, as the counts above show)
columns_to_remove = ['Index','Event Subtype','Patient Info','Device Info','Insulin Value (u)','Carb Value (grams)','Duration (hh:mm:ss)','Glucose Rate of Change (mg/dL/min)']
glucose_values = glucose_values.drop(columns=columns_to_remove)
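
Most of the dropped columns are ones with zero non-null values. As a sketch of an alternative to listing them by hand, `dropna(axis=1, how="all")` removes every fully empty column. The toy frame below is made up; note that a populated column such as `Index` would still need to be dropped explicitly:

```python
import numpy as np
import pandas as pd

# Toy frame with one fully empty column (hypothetical values)
toy = pd.DataFrame({
    "Index": [13, 14],
    "Event Subtype": [np.nan, np.nan],
    "Glucose Value (mg/dL)": [134.0, 130.0],
})

# Drop every column in which all values are missing
trimmed = toy.dropna(axis=1, how="all")
print(list(trimmed.columns))
```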

### Working with datetimes
The first step is to convert the timestamps to a proper datetime type, since pandas currently reads them as `object`. Next, the datetimes are set as the index. Finally, the data are resampled into 5-minute intervals to obtain the mean of each interval.
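
The first two steps can be sketched on a tiny, made-up frame. The column names `timestamp` and `glucose` are placeholders chosen for this example, not the names in the Dexcom file:

```python
import pandas as pd

# Toy data: timestamps arrive as plain strings
toy = pd.DataFrame({
    "timestamp": ["2020-07-16 10:48:24", "2020-07-16 10:53:24"],
    "glucose": [134.0, 130.0],
})

# Parse the strings into datetime values, then use them as the index
toy["datetime"] = pd.to_datetime(toy["timestamp"])
toy = toy.set_index("datetime")

print(type(toy.index).__name__)
```

Once the index is a `DatetimeIndex`, time-based operations such as `resample` become available.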


In [754]:
# Convert the timestamp strings to datetime values
glucose_values['datetime'] = pd.to_datetime(glucose_values['Timestamp (YYYY-MM-DDThh:mm:ss)'])
glucose_values = glucose_values.rename(columns={'Glucose Value (mg/dL)': 'glucose'})
print(glucose_values.columns)

Index(['Timestamp (YYYY-MM-DDThh:mm:ss)', 'Event Type', 'Source Device ID',
       'glucose', 'Transmitter Time (Long Integer)', 'datetime'],
      dtype='object')


In [755]:


# Set the datetime column as the index
glucose_values = glucose_values.set_index('datetime')
print(glucose_values.columns)


Index(['Timestamp (YYYY-MM-DDThh:mm:ss)', 'Event Type', 'Source Device ID',
       'glucose', 'Transmitter Time (Long Integer)'],
      dtype='object')


In [756]:
df_procesado_5min = glucose_values['glucose'].resample('5min') 
df_procesado_1hora = glucose_values['glucose'].resample('1h')
df_procesado_24hora = glucose_values['glucose'].resample('24h')

### Computing the glucose value for each 5-minute interval

We keep only the patient's glucose value for each 5-minute interval.
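
A minimal sketch of how `resample('5min')` assigns readings to intervals, using made-up timestamps. With the default settings each interval is labeled by its left edge, so a reading at 10:48:24 lands in the interval labeled 10:45:00, and two readings in the same interval are averaged:

```python
import pandas as pd

# Three hypothetical readings; the last two fall in the same 5-minute bin
readings = pd.Series(
    [134.0, 130.0, 128.0],
    index=pd.to_datetime([
        "2020-07-16 10:48:24",
        "2020-07-16 10:53:24",
        "2020-07-16 10:54:50",
    ]),
)

# Mean per 5-minute interval, labeled by the start of the interval
five_min = readings.resample("5min").mean()
print(five_min)
```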

In [757]:
# Define the aggregations to compute on each resampled series
df_5min = df_procesado_5min.agg(['mean'])
df_1hora = df_procesado_1hora.agg(['mean'])
df_24hora = df_procesado_24hora.agg(['mean'])
print(df_5min.columns)

df_5min.head()

Index(['mean'], dtype='object')


datetime,mean
2020-07-16 10:45:00,134.0
2020-07-16 10:50:00,130.0
2020-07-16 10:55:00,127.0
2020-07-16 11:00:00,122.0
2020-07-16 11:05:00,121.0


In [758]:
df_5min = df_5min.rename(columns={'mean': 'glucose'})
df_1hora = df_1hora.rename(columns={'mean': 'glucose'})
df_24hora = df_24hora.rename(columns={'mean': 'glucose'})
df_5min.count()

glucose    2276
dtype: int64

In [759]:
# Fill intervals that have no reading with 0 (a placeholder, not a real glucose value)
df_5min = df_5min.fillna(0)
df_1hora = df_1hora.fillna(0)
df_24hora = df_24hora.fillna(0)
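
Filling empty intervals with 0 keeps the time grid complete, but any statistic computed later over the filled series will count those zeros. A small sketch with made-up numbers, comparing a mean that skips the gap with a mean over the filled series:

```python
import numpy as np
import pandas as pd

# Hypothetical series with one missing interval
s = pd.Series([120.0, np.nan, 100.0])

mean_with_gap = s.mean()           # NaN is skipped: (120 + 100) / 2
mean_filled = s.fillna(0).mean()   # 0 is counted: (120 + 0 + 100) / 3
print(mean_with_gap, mean_filled)
```

Whether 0 or NaN is the better marker depends on how the exported files are consumed downstream.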

### Computing the category

In [760]:
df_5min.head()

datetime,glucose
2020-07-16 10:45:00,134.0
2020-07-16 10:50:00,130.0
2020-07-16 10:55:00,127.0
2020-07-16 11:00:00,122.0
2020-07-16 11:05:00,121.0


In [761]:
# Values taken from the paper
PersLow = 90.8
PersNorm = 112.4
PersHigh = 149.9  # defined but not used as a cutoff below

# Define the conditions on the daily mean (np.select evaluates them in order)
condiciones = [
    df_24hora['glucose'] <= PersLow,
    df_24hora['glucose'] <= PersNorm,
    df_24hora['glucose'] > PersNorm,
]
# Label corresponding to each condition
resultados_clasificacion = ['PersLow', 'PersNorm', 'PersHigh']

In [762]:
# Using np.select(): each row gets the label of the first condition it satisfies
df_24hora['nivel'] = np.select(condiciones, resultados_clasificacion, default='PersHigh')
# df_24hora['nivel'] = np.where(df_24hora['glucose'] > 0, 1, 0)
df_24hora.head()

datetime,glucose,nivel
2020-07-16,111.767296,PersNorm
2020-07-17,103.84375,PersNorm
2020-07-18,105.292254,PersNorm
2020-07-19,103.350694,PersNorm
2020-07-20,100.163043,PersNorm
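
The classification relies on `np.select` checking its conditions in order, so a value below the first cutoff never reaches the second condition. A minimal sketch with made-up daily means and the notebook's two cutoffs. Here everything that matches no condition falls through to the default, which is equivalent to the explicit third condition in the notebook:

```python
import numpy as np
import pandas as pd

# Hypothetical daily mean glucose values
daily = pd.Series([85.0, 105.0, 130.0])

low_cut, norm_cut = 90.8, 112.4  # the cutoffs used in the notebook

conditions = [daily <= low_cut, daily <= norm_cut]
labels = ["PersLow", "PersNorm"]

# Conditions are checked in order; rows matching none get the default
levels = np.select(conditions, labels, default="PersHigh")
print(levels)
```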


In [763]:
df_5min = df_5min.sort_index()
df_1hora = df_1hora.sort_index()
df_24hora = df_24hora.sort_index()

In [764]:
# Merge the DataFrames: attach to each row the most recent daily row at or before it
resultado_5min = pd.merge_asof(df_5min, df_24hora, left_index=True, right_index=True, direction='backward')
resultado_1hora = pd.merge_asof(df_1hora, df_24hora, left_index=True, right_index=True, direction='backward')

In [765]:
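# fillna returns a new DataFrame, so the result must be assigned back
resultado_5min = resultado_5min.fillna(0)
resultado_1hora = resultado_1hora.fillna(0)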
resultado_5min.head()

datetime,glucose_x,glucose_y,nivel
2020-07-16 10:45:00,134.0,111.767296,PersNorm
2020-07-16 10:50:00,130.0,111.767296,PersNorm
2020-07-16 10:55:00,127.0,111.767296,PersNorm
2020-07-16 11:00:00,122.0,111.767296,PersNorm
2020-07-16 11:05:00,121.0,111.767296,PersNorm
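
In the merged output, `glucose_x` is the fine-grained value and `glucose_y` is the daily mean, because both inputs have a column named `glucose` and pandas adds its default suffixes. A minimal sketch of the same backward as-of join on made-up data:

```python
import pandas as pd

# Hypothetical fine-grained readings on two different days
fine = pd.DataFrame(
    {"glucose": [134.0, 98.0]},
    index=pd.to_datetime(["2020-07-16 10:45:00", "2020-07-17 08:00:00"]),
)
# Hypothetical daily means with their labels, indexed at midnight
daily = pd.DataFrame(
    {"glucose": [111.8, 88.0], "nivel": ["PersNorm", "PersLow"]},
    index=pd.to_datetime(["2020-07-16", "2020-07-17"]),
)

# Each left row takes the latest right row whose index is <= its own
merged = pd.merge_asof(
    fine, daily, left_index=True, right_index=True, direction="backward"
)
print(merged)
```

Both indexes must be sorted for `merge_asof`, which is why the notebook calls `sort_index()` beforehand.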


In [766]:
# Export the results to CSV files
resultado_5min.to_csv("Dexcom_5min_"+PACIENTE+".csv")
resultado_1hora.to_csv("Dexcom_1hora_"+PACIENTE+".csv")
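
When the exported files are loaded again, the datetime index comes back as plain text unless it is parsed explicitly. A sketch of a round trip, using an in-memory buffer instead of a file and a made-up single-row frame:

```python
import io
import pandas as pd

# Hypothetical one-row result with a named datetime index
out = pd.DataFrame(
    {"glucose_x": [134.0], "glucose_y": [111.8], "nivel": ["PersNorm"]},
    index=pd.DatetimeIndex(["2020-07-16 10:45:00"], name="datetime"),
)

# Write to an in-memory buffer (stands in for the CSV file)
buffer = io.StringIO()
out.to_csv(buffer)
buffer.seek(0)

# Read it back, restoring the index and parsing it as datetimes
restored = pd.read_csv(buffer, index_col="datetime", parse_dates=True)
print(restored)
```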

### CSV FILES SUCCESSFULLY GENERATED FOR 5 MIN AND 1 HOUR