# **TR_2021/04 - Technical report: Rate ratio for cardiovascular hospitalizations and extreme events**


|Technical Report ID  |2021/04|
|--|--|
| Title |Rate ratio for cardiovascular hospitalizations and extreme events|
| Authors | Júlia De Lázari, Paula Dornhofer|
| Creation Date| 2021-03|


## Database descriptions

**inputs:** 

- hospitalizações_circulatório.csv: Dataframe of hospitalizations due to cardiovascular ICDs from 2014 to 2018.

- EV_VCP.csv: Dataframe of the computed extreme climatic events, derived from Viracopos data.

## Analysis

This report presents an analysis of the _rate ratio_ between [extreme climate events](https://github.com/climate-and-health-datasci-Unicamp/project-climatic-variations-cardiovascular-diseases/blob/main/notebooks/TR_2020_05_Extreme_climatic_events_for_Campinas.ipynb) and cardiovascular hospitalizations.

The analysis was conducted for the total data and for some stratifications (sex, age, age and sex).





##**Rate ratio**

The rate ratio is a relative measure, frequently used in epidemiology, that compares the incidence rates of events in two groups [CDC].

It is given by **RR = rate ratio = incidence rate 1/incidence rate 2**

with **incidence rate = number of events/population size**

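The 95% confidence interval is computed on the log scale as **log(RR) - [1.96 x SE(log(RR))] to log(RR) + [1.96 x SE(log(RR))]**, and both limits are then exponentiated to obtain the interval for RR. SE is the abbreviation for standard error [SPH]; here **SE(log(RR)) = sqrt(1/number of events in group 1 + 1/number of events in group 2)**.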

In our case **RR = (number of hospitalizations on days with extreme climatic events/number of days with extreme climatic events)/(number of hospitalizations on control days/number of control days)**

Its interpretation is similar to that of the _risk ratio_. A rate ratio of 1.0 indicates equal rates in the two groups. A rate ratio greater than 1.0 indicates increased risk for the group in the numerator. A rate ratio less than 1.0 indicates decreased risk for the group in the numerator.
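
As a minimal sketch of the formulas above, the following Python snippet computes a rate ratio and its 95% confidence interval. The function name `rate_ratio_with_ci` and the counts are hypothetical and chosen only for illustration; the standard error uses the usual approximation sqrt(1/a + 1/b) for the log of a rate ratio, where a and b are the event counts in each group.

```python
import math


def rate_ratio_with_ci(events_exposed, days_exposed, events_control, days_control, z=1.96):
    """Return (RR, lower, upper) for two event counts observed over two numbers of days."""
    rate_exposed = events_exposed / days_exposed
    rate_control = events_control / days_control
    rr = rate_exposed / rate_control

    # Standard error of log(RR), based on the two event counts
    se_log_rr = math.sqrt(1 / events_exposed + 1 / events_control)

    # Build the interval on the log scale, then exponentiate back
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 120 hospitalizations over 100 event days,
# 1000 hospitalizations over 1000 control days
rr, lower, upper = rate_ratio_with_ci(120, 100, 1000, 1000)
ci_excludes_one = lower > 1 or upper < 1

print(f"RR = {rr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
print("CI excludes 1.0:", ci_excludes_one)
```

With these made-up counts the interval contains 1.0, so the data would be compatible with equal rates in the two groups even though the point estimate is above 1.0.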

##**Import libraries**

In [None]:
#-------------------------------------------------------------------#
#                       Import libraries                            #
#-------------------------------------------------------------------#
import pandas as pd
import numpy as np
import datetime
import more_itertools as mit
import statistics as stat
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import scipy
import math
import seaborn as sns
import pylab
from datetime import timedelta
from calendar import isleap
from google.colab import drive
from google.colab import files

drive.mount('/content/drive')

pd.options.mode.chained_assignment = None



## **Load and merge dataframes**

In [None]:
#-------------------------------------------------------------------#
#                      cardiovascular hospitalizations                 #
#-------------------------------------------------------------------#

df_hosp = pd.read_csv('hospitalizações_circulatório.csv')
df_hosp = df_hosp.drop(columns=['Unnamed: 0', 'Hora', 'Número Paciente', 'Descrição CID'])  # drop unneeded columns
df_hosp = df_hosp.rename(columns={'Data': 'DATE', 'Idade': 'IDADE', 'Sexo': 'SEXO'})  # rename Data to DATE to merge dataframes
df_hosp = df_hosp[(df_hosp.DATE != '2012-02-29') & (df_hosp.DATE != '2016-02-29')]  # remove leap days (02-29)
df_hosp = df_hosp.sort_values('DATE')

In [None]:
#-------------------------------------------------------------------#
#                  Extreme climatic variations                      #
#-------------------------------------------------------------------#
df_vir =  pd.read_csv('EV_VCP.csv')
df_vir = df_vir.drop(columns=['Unnamed: 0'])
df_vir = df_vir[df_vir['DATE']>='2014-01-01']

In [None]:
#-------------------------------------------------------------------#
#              Merge health and climate dataframes                  #
#-------------------------------------------------------------------#
df = pd.merge(df_vir,df_hosp, on='DATE', how='outer')
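
To make the effect of the outer merge concrete, here is a small sketch on hypothetical toy data (the dates, ICD codes and the `LPW` flags below are invented for illustration). Each hospitalization becomes one row carrying that day's climate flags, and a day without any hospitalization is kept as a single row with missing hospitalization fields.

```python
import pandas as pd

# Hypothetical climate table: one row per day
climate = pd.DataFrame({
    "DATE": ["2014-01-01", "2014-01-02", "2014-01-03"],
    "LPW": [0, 1, 0],
})

# Hypothetical hospitalization table: one row per admission
hosp = pd.DataFrame({
    "DATE": ["2014-01-01", "2014-01-01", "2014-01-03"],
    "CID": ["I10", "I21", "I50"],
    "SEXO": ["F", "M", "F"],
})

# Outer merge keeps days that appear in either table
merged = pd.merge(climate, hosp, on="DATE", how="outer")
print(merged)

# The day without admissions survives with NaN in the hospitalization columns
no_admission_day = merged[merged["DATE"] == "2014-01-02"]
```

Keeping these empty days matters because the rate ratio divides by the number of days in each group, not only by the days on which someone was admitted.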

## **Functions**

These functions automate operations that are repeated throughout the notebook:
- stratify functions: different stratifications of the dataframe
- rate_ratio: compute the rate ratio for the desired stratification

###**Stratify functions**

In [None]:
# Stratify sex
def stratify_sex(database):
  women = database[database['SEXO']=='F']
  men = database[database['SEXO']=='M']

  dataframes = [database, women, men]
  df_names = ["All", "Women", "Men"]

  return dataframes, df_names

In [None]:
#Stratify age
def stratify_age(database):
  #less_20 = database[(database['IDADE'] < 20)]  
  between_20_40 = database[(database['IDADE'] >= 20) & (database['IDADE'] < 40)]
  between_40_65 = database[(database['IDADE'] >= 40) & (database['IDADE'] < 65)]   
  over_65 = database[(database['IDADE'] > 64)]   
  over_75 = database[(database['IDADE'] > 75)]

  dataframes  = [database, between_20_40, between_40_65, over_65, over_75] 
  df_names = ["All", "Between 20 and 40 years old","Between 40 and 65 years old","Above 65 years old","Above 75 years old"]

  return dataframes, df_names
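
The boundaries of the age strata can be checked with a short sketch. The ages below are hypothetical and the masks simply mirror the comparisons used in `stratify_age`; it assumes `IDADE` holds whole years, in which case `> 64` selects the same rows as `>= 65`.

```python
import pandas as pd

# Hypothetical ages placed on the edges of each stratum
ages = pd.DataFrame({"IDADE": [19, 20, 39, 40, 64, 65, 75, 76]})

# Same comparisons as in stratify_age
masks = {
    "Between 20 and 40 years old": (ages["IDADE"] >= 20) & (ages["IDADE"] < 40),
    "Between 40 and 65 years old": (ages["IDADE"] >= 40) & (ages["IDADE"] < 65),
    "Above 65 years old": ages["IDADE"] > 64,
    "Above 75 years old": ages["IDADE"] > 75,
}

# List which of the toy ages fall into each stratum
membership = {name: ages.loc[mask, "IDADE"].tolist() for name, mask in masks.items()}
for name, members in membership.items():
    print(name, members)
```

In this sketch, patients under 20 fall into no stratum (they only count in "All"), a 75-year-old is not part of "Above 75 years old", and the "Above 75 years old" stratum is a subset of "Above 65 years old" rather than a separate group.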

In [None]:
# Stratify age sex
def stratify_age_sex(database): 
    between_20_65_F = database[(database['IDADE'] >= 20) & (database['IDADE'] < 65)  & (database['SEXO']=="F")]
    between_20_65_M = database[(database['IDADE'] >= 20) & (database['IDADE'] < 65) & (database['SEXO']=="M")]   
    over_65_F = database[(database['IDADE'] > 64) & (database['SEXO']=="F")]   
    over_65_M = database[(database['IDADE'] > 64) & (database['SEXO']=="M")]     

    dataframes = [database, between_20_65_F, between_20_65_M,over_65_F,over_65_M]
    df_names = ["All", "Women between 20 and 65 years old","Men between 20 and 65 years old","Women above 65 years old","Men above 65 years old"]

    return dataframes, df_names

###**Rate ratio function**

In [None]:
def rate_ratio(db, stratify,event):
  database = db.copy()

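  #subsets depending on the stratification
  if (stratify == 'age and sex'): 
    dataframes, df_names = stratify_age_sex(database)
  elif (stratify == 'sex'): 
    dataframes, df_names = stratify_sex(database)
  elif (stratify == 'age'): 
    dataframes, df_names = stratify_age(database)
  else:
    # fail early instead of hitting a NameError on `dataframes` below
    raise ValueError("stratify must be 'sex', 'age' or 'age and sex'")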
  
  #aux variable 
  list_rr = []
  list_up_ci = []
  list_lr_ci = []

  for df in dataframes:
    #column for number of hospitalizations
    df['N_hosp'] = np.where(df['CID'].isnull(),0,df.groupby(['DATE']).DATE.transform('count'))
    df = df.drop_duplicates('DATE',keep='first')
    df = df.sort_values('DATE')
      
    number_event = len(df[df[event] ==1]) # number of days with an extreme event
    number_control = len(df[df[event] ==0]) # number of days without an extreme event

    hosp_event = df.N_hosp[df[event] ==1].sum() # total number of hospitalizations on extreme event days
    hosp_control = df.N_hosp[df[event] == 0].sum() # total number of hospitalizations on control days

    # Rate ratio and confidence interval
    RR = round((hosp_event/number_event)/(hosp_control/number_control), 2) # compute rate ratio
    SE = math.sqrt(1/hosp_event + 1/hosp_control)

    upper_CI = round(np.exp(math.log(RR)+1.96*SE),2) #upper value
    lower_CI = round(np.exp(math.log(RR)-1.96*SE),2) #lower value

    # Append values in the list
    list_rr.append(RR)
    list_up_ci.append(upper_CI)
    list_lr_ci.append(lower_CI)

  #Create table
  table = pd.DataFrame()
  table['Group'] = df_names
  table['Rate ratio (RR)'] = list_rr
  table['Upper CI'] = list_up_ci
  table['Lower CI'] = list_lr_ci

  return table
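
The counting step inside `rate_ratio` can be followed on a small example. The sketch below uses hypothetical rows and the `LPW` flag as the event column; it reproduces the same sequence of operations (per-day count, zero for days without an ICD code, one row per day) outside the function.

```python
import numpy as np
import pandas as pd

# Hypothetical merged data: two admissions on an event day,
# one control day without admissions, one control day with one admission
df = pd.DataFrame({
    "DATE": ["2014-01-01", "2014-01-01", "2014-01-02", "2014-01-03"],
    "LPW": [1, 1, 0, 0],
    "CID": ["I10", "I21", np.nan, "I50"],
})

# Rows per day, forced to 0 when the day has no hospitalization (missing CID)
rows_per_day = df.groupby("DATE")["DATE"].transform("count")
df["N_hosp"] = np.where(df["CID"].isnull(), 0, rows_per_day)

# Keep a single row per day so that each day is counted once
daily = df.drop_duplicates("DATE", keep="first").sort_values("DATE")

number_event = len(daily[daily["LPW"] == 1])
number_control = len(daily[daily["LPW"] == 0])
hosp_event = daily.loc[daily["LPW"] == 1, "N_hosp"].sum()
hosp_control = daily.loc[daily["LPW"] == 0, "N_hosp"].sum()

print(daily[["DATE", "LPW", "N_hosp"]])
```

Here the event group has 2 hospitalizations over 1 day and the control group has 1 hospitalization over 2 days, which are the four quantities that enter the rate ratio.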

##**Temperature**

###**Extreme thermal range**

####**Sex**

In [None]:
rate_ratio(df, 'sex','above_temp_range')

| | Group | Rate ratio (RR) | Upper CI | Lower CI |
|--|--|--|--|--|
| 0 | All | 0.96 | 1.03 | 0.89 |
| 1 | Women | 0.99 | 1.1 | 0.89 |
| 2 | Men | 0.96 | 1.06 | 0.87 |


####**Age**

In [None]:
rate_ratio(df, 'age','above_temp_range')

| | Group | Rate ratio (RR) | Upper CI | Lower CI |
|--|--|--|--|--|
| 0 | All | 0.96 | 1.03 | 0.89 |
| 1 | Between 20 and 40 years old | 0.93 | 1.31 | 0.66 |
| 2 | Between 40 and 65 years old | 1.04 | 1.16 | 0.93 |
| 3 | Above 65 years old | 0.92 | 1.02 | 0.83 |
| 4 | Above 75 years old | 1.01 | 1.17 | 0.87 |


####**Age and sex**

In [None]:
rate_ratio(df, 'age and sex','above_temp_range')

| | Group | Rate ratio (RR) | Upper CI | Lower CI |
|--|--|--|--|--|
| 0 | All | 0.96 | 1.03 | 0.89 |
| 1 | Women between 20 and 65 years old | 1.05 | 1.23 | 0.89 |
| 2 | Men between 20 and 65 years old | 1.04 | 1.19 | 0.91 |
| 3 | Women above 65 years old | 0.96 | 1.11 | 0.83 |
| 4 | Men above 65 years old | 0.94 | 1.09 | 0.81 |


###**Extreme temperature difference between days**

####**Sex**

In [None]:
rate_ratio(df, 'sex','above_temp_dif')

| | Group | Rate ratio (RR) | Upper CI | Lower CI |
|--|--|--|--|--|
| 0 | All | 1.16 | 1.35 | 1.0 |
| 1 | Women | 1.2 | 1.49 | 0.97 |
| 2 | Men | 1.04 | 1.28 | 0.84 |


####**Age**

In [None]:
rate_ratio(df, 'age','above_temp_dif')

| | Group | Rate ratio (RR) | Upper CI | Lower CI |
|--|--|--|--|--|
| 0 | All | 1.16 | 1.35 | 1.0 |
| 1 | Between 20 and 40 years old | 0.93 | 1.8 | 0.48 |
| 2 | Between 40 and 65 years old | 1.11 | 1.39 | 0.89 |
| 3 | Above 65 years old | 1.09 | 1.35 | 0.88 |
| 4 | Above 75 years old | 1.1 | 1.5 | 0.81 |


####**Age and sex**

In [None]:
rate_ratio(df, 'age and sex','above_temp_dif')

| | Group | Rate ratio (RR) | Upper CI | Lower CI |
|--|--|--|--|--|
| 0 | All | 1.16 | 1.35 | 1.0 |
| 1 | Women between 20 and 65 years old | 1.01 | 1.42 | 0.72 |
| 2 | Men between 20 and 65 years old | 1.06 | 1.4 | 0.8 |
| 3 | Women above 65 years old | 1.14 | 1.51 | 0.86 |
| 4 | Men above 65 years old | 0.92 | 1.27 | 0.67 |


##**Pressure**

###**Low pressure waves**

####**Sex**

In [None]:
rate_ratio(df, 'sex','LPW')

| | Group | Rate ratio (RR) | Upper CI | Lower CI |
|--|--|--|--|--|
| 0 | All | 1.09 | 1.22 | 0.98 |
| 1 | Women | 0.99 | 1.17 | 0.84 |
| 2 | Men | 1.18 | 1.37 | 1.02 |


####**Age**

In [None]:
rate_ratio(df, 'age','LPW')

| | Group | Rate ratio (RR) | Upper CI | Lower CI |
|--|--|--|--|--|
| 0 | All | 1.09 | 1.22 | 0.98 |
| 1 | Between 20 and 40 years old | 1.0 | 1.62 | 0.62 |
| 2 | Between 40 and 65 years old | 0.96 | 1.14 | 0.81 |
| 3 | Above 65 years old | 1.17 | 1.36 | 1.01 |
| 4 | Above 75 years old | 1.19 | 1.47 | 0.96 |


####**Age and sex**

In [None]:
rate_ratio(df, 'age and sex','LPW')

| | Group | Rate ratio (RR) | Upper CI | Lower CI |
|--|--|--|--|--|
| 0 | All | 1.09 | 1.22 | 0.98 |
| 1 | Women between 20 and 65 years old | 0.76 | 1.02 | 0.56 |
| 2 | Men between 20 and 65 years old | 1.12 | 1.36 | 0.92 |
| 3 | Women above 65 years old | 1.25 | 1.53 | 1.02 |
| 4 | Men above 65 years old | 1.01 | 1.26 | 0.81 |


###**High pressure waves**

####**Sex**

In [None]:
rate_ratio(df, 'sex','HPW')

| | Group | Rate ratio (RR) | Upper CI | Lower CI |
|--|--|--|--|--|
| 0 | All | 1.0 | 1.06 | 0.94 |
| 1 | Women | 1.04 | 1.13 | 0.95 |
| 2 | Men | 0.98 | 1.06 | 0.91 |


####**Age**

In [None]:
rate_ratio(df, 'age','HPW')

| | Group | Rate ratio (RR) | Upper CI | Lower CI |
|--|--|--|--|--|
| 0 | All | 1.0 | 1.06 | 0.94 |
| 1 | Between 20 and 40 years old | 0.94 | 1.24 | 0.71 |
| 2 | Between 40 and 65 years old | 0.97 | 1.06 | 0.89 |
| 3 | Above 65 years old | 1.02 | 1.11 | 0.94 |
| 4 | Above 75 years old | 0.99 | 1.12 | 0.87 |


####**Age and sex**

In [None]:
rate_ratio(df, 'age and sex','HPW')

| | Group | Rate ratio (RR) | Upper CI | Lower CI |
|--|--|--|--|--|
| 0 | All | 1.0 | 1.06 | 0.94 |
| 1 | Women between 20 and 65 years old | 1.01 | 1.15 | 0.88 |
| 2 | Men between 20 and 65 years old | 0.92 | 1.03 | 0.82 |
| 3 | Women above 65 years old | 0.99 | 1.11 | 0.88 |
| 4 | Men above 65 years old | 1.02 | 1.14 | 0.91 |


###**Extreme difference of pressure between days**

####**Sex**

In [None]:
rate_ratio(df, 'sex','above_pressure_dif')

| | Group | Rate ratio (RR) | Upper CI | Lower CI |
|--|--|--|--|--|
| 0 | All | 1.05 | 1.11 | 0.99 |
| 1 | Women | 0.98 | 1.07 | 0.9 |
| 2 | Men | 1.08 | 1.17 | 1.0 |


####**Age**

In [None]:
rate_ratio(df, 'age','above_pressure_dif')

| | Group | Rate ratio (RR) | Upper CI | Lower CI |
|--|--|--|--|--|
| 0 | All | 1.05 | 1.11 | 0.99 |
| 1 | Between 20 and 40 years old | 1.02 | 1.32 | 0.79 |
| 2 | Between 40 and 65 years old | 1.02 | 1.11 | 0.93 |
| 3 | Above 65 years old | 1.02 | 1.11 | 0.94 |
| 4 | Above 75 years old | 1.03 | 1.16 | 0.91 |


####**Age and sex**

In [None]:
rate_ratio(df, 'age and sex','above_pressure_dif')

| | Group | Rate ratio (RR) | Upper CI | Lower CI |
|--|--|--|--|--|
| 0 | All | 1.05 | 1.11 | 0.99 |
| 1 | Women between 20 and 65 years old | 0.95 | 1.08 | 0.83 |
| 2 | Men between 20 and 65 years old | 1.06 | 1.18 | 0.95 |
| 3 | Women above 65 years old | 0.99 | 1.11 | 0.88 |
| 4 | Men above 65 years old | 1.04 | 1.16 | 0.93 |


##**Humidity**

###**Low humidity waves**

####**Sex**

In [None]:
rate_ratio(df, 'sex','LHW')

| | Group | Rate ratio (RR) | Upper CI | Lower CI |
|--|--|--|--|--|
| 0 | All | 0.91 | 0.97 | 0.85 |
| 1 | Women | 0.92 | 1.02 | 0.83 |
| 2 | Men | 0.9 | 0.99 | 0.82 |


####**Age**

In [None]:
rate_ratio(df, 'age','LHW')

| | Group | Rate ratio (RR) | Upper CI | Lower CI |
|--|--|--|--|--|
| 0 | All | 0.91 | 0.97 | 0.85 |
| 1 | Between 20 and 40 years old | 0.95 | 1.3 | 0.7 |
| 2 | Between 40 and 65 years old | 0.94 | 1.04 | 0.85 |
| 3 | Above 65 years old | 0.92 | 1.01 | 0.84 |
| 4 | Above 75 years old | 1.04 | 1.19 | 0.91 |


####**Age and sex**

In [None]:
rate_ratio(df, 'age and sex','LHW')

| | Group | Rate ratio (RR) | Upper CI | Lower CI |
|--|--|--|--|--|
| 0 | All | 0.91 | 0.97 | 0.85 |
| 1 | Women between 20 and 65 years old | 0.92 | 1.08 | 0.79 |
| 2 | Men between 20 and 65 years old | 0.95 | 1.08 | 0.84 |
| 3 | Women above 65 years old | 0.94 | 1.07 | 0.82 |
| 4 | Men above 65 years old | 0.91 | 1.04 | 0.79 |


###**High humidity waves**

####**Sex**

In [None]:
rate_ratio(df, 'sex','HHW')

| | Group | Rate ratio (RR) | Upper CI | Lower CI |
|--|--|--|--|--|
| 0 | All | 1.05 | 1.45 | 0.76 |
| 1 | Women | 1.01 | 1.63 | 0.63 |
| 2 | Men | 0.99 | 1.54 | 0.64 |


####**Age**

In [None]:
rate_ratio(df, 'age','HHW')

| | Group | Rate ratio (RR) | Upper CI | Lower CI |
|--|--|--|--|--|
| 0 | All | 1.05 | 1.45 | 0.76 |
| 1 | Between 20 and 40 years old | 0.82 | 5.83 | 0.12 |
| 2 | Between 40 and 65 years old | 0.55 | 1.06 | 0.29 |
| 3 | Above 65 years old | 1.39 | 2.04 | 0.95 |
| 4 | Above 75 years old | 1.29 | 2.22 | 0.75 |


####**Age and sex**

In [None]:
rate_ratio(df, 'age and sex','HHW')

| | Group | Rate ratio (RR) | Upper CI | Lower CI |
|--|--|--|--|--|
| 0 | All | 1.05 | 1.45 | 0.76 |
| 1 | Women between 20 and 65 years old | 0.53 | 1.64 | 0.17 |
| 2 | Men between 20 and 65 years old | 0.58 | 1.22 | 0.28 |
| 3 | Women above 65 years old | 1.63 | 2.76 | 0.96 |
| 4 | Men above 65 years old | 1.08 | 1.9 | 0.61 |


###**Extreme humidity variation**

####**Sex**

In [None]:
rate_ratio(df, 'sex','above_humidity_range')

| | Group | Rate ratio (RR) | Upper CI | Lower CI |
|--|--|--|--|--|
| 0 | All | 1.01 | 1.06 | 0.96 |
| 1 | Women | 1.04 | 1.12 | 0.96 |
| 2 | Men | 1.0 | 1.07 | 0.93 |


####**Age**

In [None]:
rate_ratio(df, 'age','above_humidity_range')

| | Group | Rate ratio (RR) | Upper CI | Lower CI |
|--|--|--|--|--|
| 0 | All | 1.01 | 1.06 | 0.96 |
| 1 | Between 20 and 40 years old | 0.99 | 1.27 | 0.77 |
| 2 | Between 40 and 65 years old | 1.04 | 1.12 | 0.96 |
| 3 | Above 65 years old | 1.0 | 1.08 | 0.93 |
| 4 | Above 75 years old | 1.09 | 1.21 | 0.98 |


####**Age and sex**

In [None]:
rate_ratio(df, 'age and sex','above_humidity_range')

| | Group | Rate ratio (RR) | Upper CI | Lower CI |
|--|--|--|--|--|
| 0 | All | 1.01 | 1.06 | 0.96 |
| 1 | Women between 20 and 65 years old | 1.07 | 1.2 | 0.95 |
| 2 | Men between 20 and 65 years old | 0.99 | 1.09 | 0.9 |
| 3 | Women above 65 years old | 1.01 | 1.12 | 0.91 |
| 4 | Men above 65 years old | 0.96 | 1.06 | 0.87 |


###**Extreme humidity difference between days**

####**Sex**

In [None]:
rate_ratio(df, 'sex','above_humidity_dif')

| | Group | Rate ratio (RR) | Upper CI | Lower CI |
|--|--|--|--|--|
| 0 | All | 0.99 | 1.16 | 0.84 |
| 1 | Women | 1.23 | 1.53 | 0.99 |
| 2 | Men | 0.98 | 1.24 | 0.78 |


####**Age**

In [None]:
rate_ratio(df, 'age','above_humidity_dif')

| | Group | Rate ratio (RR) | Upper CI | Lower CI |
|--|--|--|--|--|
| 0 | All | 0.99 | 1.16 | 0.84 |
| 1 | Between 20 and 40 years old | 1.18 | 2.2 | 0.63 |
| 2 | Between 40 and 65 years old | 1.14 | 1.45 | 0.9 |
| 3 | Above 65 years old | 0.97 | 1.22 | 0.77 |
| 4 | Above 75 years old | 1.02 | 1.39 | 0.75 |


####**Age and sex**

In [None]:
rate_ratio(df, 'age and sex','above_humidity_dif')

| | Group | Rate ratio (RR) | Upper CI | Lower CI |
|--|--|--|--|--|
| 0 | All | 0.99 | 1.16 | 0.84 |
| 1 | Women between 20 and 65 years old | 1.22 | 1.71 | 0.87 |
| 2 | Men between 20 and 65 years old | 1.19 | 1.61 | 0.88 |
| 3 | Women above 65 years old | 1.32 | 1.76 | 0.99 |
| 4 | Men above 65 years old | 0.84 | 1.22 | 0.58 |


## **References**

CENTERS FOR DISEASE CONTROL AND PREVENTION (CDC). Principles of Epidemiology in Public Health Practice, Third Edition: An Introduction to Applied Epidemiology and Biostatistics. Available at: <https://www.cdc.gov/csels/dsepd/ss1978/lesson3/section5.html>.


BOSTON UNIVERSITY SCHOOL OF PUBLIC HEALTH (SPH). Rate Ratios. Available at: <https://sphweb.bumc.bu.edu/otlt/MPH-Modules/PH717-QuantCore/PH717_ComparingFrequencies/PH717_ComparingFrequencies9.html>.


