# Prevalence of Hypertension among People Living with HIV (PLWHIV) 


<!-- ### Data Analysis Plan:
1. Introduction
2. Exploratory data analysis
3. Prevalence of hypertension in PLWHIV on ART or not
4. Factors associated with hypertension in PLWHIV on ART or not
5. Conclusion 
6. Recommendations -->

*Boni Maxime Ale*

*NYC Data Science Academy*

*21 February 2022*

## Introduction
<!-- Hypertension is a major modifiable risk factor for cardiovascular diseases (CVD) globally. In low- and middle-income settings, including sub-Saharan Africa (SSA), hypertension prevalence has been increasing rapidly over the past several decades. The World Health Organization (WHO) estimates that 46% of individuals >25 years in SSA have hypertension, with rising rates due to demographic transitions that have led to sedentary lifestyles, smoking, harmful alcohol use and consumption of processed foods [1–3]. Estimates of hypertension prevalence in Kenya are high (ranging from 12.6–36.9%) with higher rates in urban areas [1, 4, 5]. Older age, higher body mass index (BMI), alcohol consumption, cigarette smoking, and higher socioeconomic status have been associated with hypertension in previous studies in Kenya [5–7].

However, hypertension diagnosis and treatment are often delayed because the condition is usually asymptomatic, leading to an increased risk of complications and mortality [8]. In SSA, screening, diagnosis, and treatment remain inadequate [9], and a recent study found that 40% of individuals with hypertension in East and West Africa were unaware of their status. The WHO 2017 report on non-communicable disease (NCD) risk factors identified hypertension as the leading cause of death across income levels [10]. In 2015, hypertension caused an estimated 7.5 million deaths, accounting for 12.8% of all deaths globally [11]. SSA in particular is facing a dual burden of communicable and non-communicable diseases, including CVD and cancers, with fewer resources for managing NCDs [1, 12, 13].

The widespread use of antiretroviral therapy (ART) in SSA has resulted in a near-normal life expectancy among persons with HIV (PWH); overall, approximately 76% of PWH in SSA are virally suppressed [14]. This increased lifespan, however, may lead to an increased risk of NCDs, including hypertension, due to HIV infection itself and ART toxicity [14–17]. Studies on hypertension in PWH have shown varied results, with some reporting a higher prevalence of hypertension and others reporting no difference or a lower prevalence among PWH [18, 19]. Most studies in SSA have compared HIV-negative individuals with PWH who were ART-naïve or on ART with poorly controlled viral loads [15, 18, 20, 21]. Data are lacking among PWH who are virally suppressed on ART. We sought to estimate the prevalence of hypertension among virally suppressed PWH on long-term ART compared to HIV-negative adults in western Kenya and to identify factors associated with hypertension. These data can help guide prevention strategies and inform the allocation of resources for integrated hypertension and HIV management. -->

<!-- ## 1. Objectives :

### 1.1. Primary Objective


### 1.2. Secondary Objective


## 2. Materials and Methods

### 2.1. Study design and Setting 


### 2.2. Definition of variables



### 2.3. Statistical Analysis -->

## 3. Results

### 3.1. Exploratory Data Analysis

In [1]:
## Import libraries
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import plotly
import plotly.express as px

plt.style.use('ggplot')
## render plotly figures inline in the notebook
plotly.offline.init_notebook_mode(connected = True)

In [2]:
## Loading data
## (parse_dates = True only applies to the index and no index column is set, so it is omitted)
hiv = pd.read_csv("dataset_plos.csv")

## check the first 5 observations of the data set
hiv.head(5)
#print(hiv.info())

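|   | year | sex | age | priorhivtest | treatment | hivresult | hivnew | cd4_cat | bp_cat | bmi_cat | stisymptoms | tbsymptoms | diabsymptoms |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2008 | 1.0 | 49.0 | 0.0 | NaN | 0.0 | 0.0 | NaN | 5.0 | 3.0 | 0 | 0 | 0 |
| 1 | 2008 | 1.0 | 20.0 | 0.0 | NaN | 0.0 | 0.0 | NaN | 2.0 | 2.0 | 0 | 0 | 0 |
| 2 | 2008 | 2.0 | 29.0 | 0.0 | NaN | 0.0 | 0.0 | NaN | 2.0 | 4.0 | 0 | 0 | 0 |
| 3 | 2008 | 2.0 | 38.0 | 1.0 | NaN | 0.0 | 0.0 | NaN | 1.0 | 4.0 | 0 | 0 | 0 |
| 4 | 2008 | 1.0 | 32.0 | 0.0 | NaN | 0.0 | 0.0 | NaN | 5.0 | 2.0 | 0 | 0 | 0 |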


In [3]:
## let's create a copy of hiv dataset for data wrangling
hiv2 = hiv.copy()

In [4]:
## let's collapse the 7 categories of bp_cat into two (Normal and High)
conversion_dictionary = {1 : "Normal",
                        2: "Normal",
                        3: "Normal",
                        4 : "High",
                        5 : "High",
                        6 : "High",
                        7 : "Normal"
                        }

hiv2['bp_cat2'] = hiv2['bp_cat'].replace(conversion_dictionary)

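## Age groups
#print(hiv2["age"].value_counts())
## right = False gives half-open intervals [12, 20), [20, 30), ..., [60, 70):
## the "60+" label therefore covers ages 60-69, and ages below 12 or of 70 and above become NaN
bins = [12, 20, 30, 40, 50, 60, 70]
labels = ["12-20", "20-30", "30-40", "40-50", "50-60", "60+"]
hiv2["age_group"] = pd.cut(hiv2["age"], bins = bins, labels = labels, right = False)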

## change bmi_cat data type 
hiv2['bmi_cat2'] = hiv2['bmi_cat'].astype('category')
## rename bmi category
hiv2['bmi_cat2'] = hiv2['bmi_cat2'].cat.rename_categories({1: "Underweight",
                                                          2:"Normal weight",
                                                          3:"Overweight",
                                                          4:"Obese"})

## let's collapse the 4 treatment categories into two (On ART and Not on ART)
conversion_dictionary_trt = {1 : "On ART",
                             2 : "Not on ART",
                             3 :  "Not on ART",
                             4 : "Not on ART"}

hiv2['trt'] = hiv2['treatment'].replace(conversion_dictionary_trt)

## Gender
## change sex data type 
hiv2['sex'] = hiv2['sex'].astype('category')
## rename gender category
hiv2['sex'] = hiv2['sex'].cat.rename_categories({2:"female",1: "male"})


## cd4 renaming 
hiv2['cd4_cat2'] = hiv2['cd4_cat'].astype('category')
hiv2['cd4_cat2'] = hiv2['cd4_cat2'].cat.rename_categories({
   1 : "<50 cells/μl",
   2 : "50-249 cells/μl",
   3 : "250-349 cells/μl",
   4 : "350-499 cells/μl",
   5 : "500+ cells/μl"
})

hiv2.head()


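|   | year | sex | age | priorhivtest | treatment | hivresult | hivnew | cd4_cat | bp_cat | bmi_cat | stisymptoms | tbsymptoms | diabsymptoms | bp_cat2 | age_group | bmi_cat2 | trt | cd4_cat2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2008 | male | 49.0 | 0.0 | NaN | 0.0 | 0.0 | NaN | 5.0 | 3.0 | 0 | 0 | 0 | High | 40-50 | Overweight | NaN | NaN |
| 1 | 2008 | male | 20.0 | 0.0 | NaN | 0.0 | 0.0 | NaN | 2.0 | 2.0 | 0 | 0 | 0 | Normal | 20-30 | Normal weight | NaN | NaN |
| 2 | 2008 | female | 29.0 | 0.0 | NaN | 0.0 | 0.0 | NaN | 2.0 | 4.0 | 0 | 0 | 0 | Normal | 20-30 | Obese | NaN | NaN |
| 3 | 2008 | female | 38.0 | 1.0 | NaN | 0.0 | 0.0 | NaN | 1.0 | 4.0 | 0 | 0 | 0 | Normal | 30-40 | Obese | NaN | NaN |
| 4 | 2008 | male | 32.0 | 0.0 | NaN | 0.0 | 0.0 | NaN | 5.0 | 2.0 | 0 | 0 | 0 | High | 30-40 | Normal weight | NaN | NaN |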
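The age binning above uses `right = False`, so each interval includes its left edge and excludes its right edge. The sketch below runs the same `pd.cut` call on a handful of made-up ages (not study data) to show what happens at the edges, and one possible way to make the last group genuinely open-ended by using `np.inf` as the final edge. Whether any participants aged 70 or over exist in `dataset_plos.csv` is not checked here.

```python
import numpy as np
import pandas as pd

# Made-up ages for illustration only (not taken from the study data)
ages = pd.Series([12, 19, 20, 59, 60, 69, 70, 85])

labels = ["12-20", "20-30", "30-40", "40-50", "50-60", "60+"]

# Binning as in the notebook: half-open intervals [12, 20), ..., [60, 70)
closed_top = pd.cut(ages, bins=[12, 20, 30, 40, 50, 60, 70],
                    labels=labels, right=False)

# Alternative: an infinite last edge so that "60+" really means 60 and over
open_top = pd.cut(ages, bins=[12, 20, 30, 40, 50, 60, np.inf],
                  labels=labels, right=False)

print(pd.DataFrame({"age": ages, "closed_top": closed_top, "open_top": open_top}))
```

With the notebook's edges, ages 70 and 85 fall outside every bin and become NaN; with the `np.inf` edge they are assigned to "60+".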


In [5]:
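## data cleaning: drop rows with a missing blood pressure, sex or BMI category
hiv3 = hiv2.dropna(subset = ["bp_cat2", "sex", "bmi_cat"])

## subset of participants classified as hypertensive (bp_cat2 == "High")
hiv4 = hiv3.loc[hiv3["bp_cat2"] == "High"]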

In [6]:
## Treatment group: ART and non-ART
## let's collapse the 4 treatment categories into two (On ART and Not on ART)
conversion_dictionary_trt = {1 : "On ART",
                             2 : "Not on ART",
                             3 :  "Not on ART",
                             4 : "Not on ART"}

hiv2['trt'] = hiv2['treatment'].replace(conversion_dictionary_trt)

# ## let's see the frequency in each group
# print(hiv2['trt'].value_counts())

# print(hiv2['trt'].value_counts(normalize = True))

### 3.2. Prevalence of Hypertension

In [7]:
## Blood Pressure 
## let's collapse the 7 categories of bp_cat into two (Normal and High)
conversion_dictionary = {1 : "Normal",
                        2: "Normal",
                        3: "Normal",
                        4 : "High",
                        5 : "High",
                        6 : "High",
                        7 : "Normal"
                        }

hiv2['bp_cat2'] = hiv2['bp_cat'].replace(conversion_dictionary)

# ## let's see the frequency in each group
# print(hiv2["bp_cat2"].value_counts())

# print(hiv2["bp_cat2"].value_counts(normalize = True))
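`Series.replace` leaves any code that is not in the dictionary unchanged, whereas `Series.map` turns such codes into NaN, which can make unexpected codes easier to spot. A minimal sketch on made-up values, reusing the notebook's grouping of the 7 `bp_cat` codes as given (the clinical meaning of each code is not documented in this notebook); the code 9 is deliberately invalid:

```python
import numpy as np
import pandas as pd

# Same grouping as conversion_dictionary in the notebook
bp_map = {1: "Normal", 2: "Normal", 3: "Normal",
          4: "High", 5: "High", 6: "High", 7: "Normal"}

# Made-up bp_cat values, stored as floats as in the CSV; 9.0 is an unknown code
bp_cat = pd.Series([1.0, 5.0, 2.0, np.nan, 9.0, 4.0])

with_replace = bp_cat.replace(bp_map)  # unknown code 9.0 is kept as-is
with_map = bp_cat.map(bp_map)          # unknown code 9.0 becomes NaN

print(with_replace.tolist())
print(with_map.value_counts(dropna=False))
```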

In [8]:
prev_total1 = hiv2["bp_cat2"].value_counts() 
prev_total = hiv2["bp_cat2"].value_counts(normalize = True) 
prev_total = pd.DataFrame(round( prev_total* 100, 2))
prev_total.columns = ["Prevalence of Hypertension"]
print(prev_total1)
prev_total

Normal    34588
High       8948
Name: bp_cat2, dtype: int64


|   | Prevalence of Hypertension |
|---|---|
| Normal | 79.45 |
| High | 20.55 |


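Overall, 20.55% of participants with a recorded blood pressure category (8,948 of 43,536) have hypertension. Note that `hiv2` is not filtered on `hivresult` (the first rows shown above have `hivresult` equal to 0), so this overall figure may include participants who are not living with HIV. This prevalence is considerable and not very different from that of the general population of Cape Town, South Africa.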
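The overall figure is the number of `High` rows divided by all rows with a non-missing `bp_cat2`, using the counts printed above. As a possible extension that is not part of the original analysis, the sketch below also attaches a 95% Wilson score interval to that proportion; the choice of the Wilson interval and of z = 1.96 are assumptions of this example.

```python
import math

# Counts printed by value_counts() above
n_high = 8948
n_normal = 34588
n_total = n_high + n_normal   # participants with a bp_cat2 value

p = n_high / n_total          # proportion with hypertension
print(f"Prevalence: {100 * p:.2f}%")

# 95% Wilson score interval for a binomial proportion (z = 1.96)
z = 1.96
centre = (p + z**2 / (2 * n_total)) / (1 + z**2 / n_total)
half_width = (z / (1 + z**2 / n_total)) * math.sqrt(
    p * (1 - p) / n_total + z**2 / (4 * n_total**2)
)
low, high = centre - half_width, centre + half_width
print(f"95% CI: {100 * low:.2f}% to {100 * high:.2f}%")
```

With more than 43,000 observations the interval is narrow, so sampling error is unlikely to explain differences of several percentage points between subgroups; it says nothing, however, about bias from who was measured.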

In [9]:
prev_year = hiv2.groupby("year")["bp_cat2"].value_counts(normalize = True)
prev_year = round(prev_year*100, 2)

In [10]:
prev_year_bp = hiv2.groupby("year")["bp_cat2"].value_counts(normalize = True)
prev_year_bp = pd.DataFrame(round(prev_year_bp*100, 2))
prev_year_bp.columns = ["Prevalence of Hypertension"]
prev_year_bp = prev_year_bp.reset_index()
prev_year_bp = prev_year_bp.loc[prev_year_bp["bp_cat2"]=="High"]
prev_year_bp = prev_year_bp.drop(['bp_cat2'], axis = 1)
prev_year_bp

|   | year | Prevalence of Hypertension |
|---|---|---|
| 1 | 2008 | 18.69 |
| 3 | 2009 | 16.77 |
| 5 | 2010 | 22.68 |
| 7 | 2011 | 25.3 |
| 9 | 2012 | 20.03 |
| 11 | 2014 | 22.57 |
| 13 | 2015 | 18.32 |
| 15 | 2016 | 19.55 |
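The same groupby / `value_counts` / `reset_index` / filter sequence is repeated for every stratification in this notebook. A hypothetical helper such as `prevalence_by` (not part of the original notebook) could factor it out. It is shown on a small made-up data frame rather than the study data, and assumes the outcome is coded `"High"` / `"Normal"` as in `bp_cat2`.

```python
import pandas as pd


def prevalence_by(df, group_cols, outcome="bp_cat2", positive="High"):
    """Hypothetical helper: % of rows with outcome == positive in each group.

    Rows with a missing outcome are dropped first, like value_counts() does,
    and groups with a missing key are dropped by groupby's default dropna=True.
    Unlike filtering value_counts() on "High", a group with no hypertensive
    rows is reported as 0.0 instead of disappearing.
    """
    data = df.dropna(subset=[outcome])
    data = data.assign(_is_positive=data[outcome].eq(positive))
    return (
        data.groupby(group_cols, observed=True)["_is_positive"]
        .mean()
        .mul(100)
        .round(2)
        .rename("Prevalence of Hypertension")
        .reset_index()
    )


# Made-up rows for illustration only
toy = pd.DataFrame({
    "year": [2008, 2008, 2008, 2009, 2009],
    "sex": ["male", "female", "male", "female", "female"],
    "bp_cat2": ["High", "Normal", "Normal", "High", None],
})

by_year = prevalence_by(toy, ["year"])
by_year_sex = prevalence_by(toy, ["year", "sex"])
print(by_year)
print(by_year_sex)
```

With such a helper, each of the stratified tables would become a one-liner, e.g. `prevalence_by(hiv2, ["age_group", "sex"])`; `observed=True` keeps only category combinations that actually occur when grouping by categorical columns such as `age_group`.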


In [11]:
## Let's see the prevalence of hypertension by survey year
fig_prev = px.bar(prev_year_bp,
                  x = "year",
                  y = "Prevalence of Hypertension",
                  barmode = "group",
                  template = "simple_white",
                  title = "Prevalence of hypertension by survey year",
                  text_auto = '.4s')
fig_prev.update_yaxes(title_text = "Prevalence of Hypertension")
fig_prev.update_xaxes(title_text = "Year of the Survey")
## treat years as discrete labels, shown in data (chronological) order; there is no 2013 survey in the data
fig_prev.update_xaxes(type = "category")
fig_prev.update_traces(textfont_size = 12, textangle = 0, textposition = "outside", cliponaxis = False)
fig_prev.update_yaxes(range = [0, 35])
fig_prev.show()

### 3.2.1. Prevalence of Hypertension by age group

In [12]:
## Prevalence of HTN by age group
prev_age = hiv2.groupby("age_group")["bp_cat2"].value_counts(normalize = True)
prev_age = pd.DataFrame(round(prev_age*100, 2))

prev_age.columns = ["Prevalence of Hypertension"]
prev_age = prev_age.reset_index()
prev_age = prev_age.loc[prev_age["bp_cat2"] == "High"].drop(['bp_cat2'], axis = 1)
prev_age

|   | age_group | Prevalence of Hypertension |
|---|---|---|
| 1 | 12-20 | 4.64 |
| 3 | 20-30 | 11.18 |
| 5 | 30-40 | 22.78 |
| 7 | 40-50 | 33.47 |
| 9 | 50-60 | 39.18 |
| 11 | 60+ | 38.19 |
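`pd.crosstab` with `normalize="index"` produces the same row percentages in a single call, laid out with one column per blood pressure category. A sketch on made-up rows (not the study data):

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    "age_group": ["12-20", "12-20", "20-30", "20-30", "20-30", "30-40"],
    "bp_cat2": ["Normal", "Normal", "High", "Normal", "Normal", "High"],
})

# Row percentages: each age group's row sums to 100
table = (pd.crosstab(toy["age_group"], toy["bp_cat2"], normalize="index")
         .mul(100)
         .round(2))
print(table)

# Keep only the hypertension column, as the notebook does with .loc[... == "High"]
print(table["High"])
```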


In [13]:
fig_prev_age = px.bar(prev_age,
                        x = "age_group",
                        y = "Prevalence of Hypertension", 
                  #color = "sex",
            barmode = "group",
           # facet_col = "year",
            template="simple_white",
            title = "Prevalence of hypertension by age group",
                   text_auto='.3s'
                       )
fig_prev_age.update_yaxes(title_text = "Prevalence of Hypertension")
fig_prev_age.update_xaxes(title_text = "Age Group")
#fig_prev.update_xaxes(categoryorder = 'array', categoryarray= ['2013','2009','2015','2008','2016', '2012', '2014', '2010', '2011'])
fig_prev_age.update_traces(textfont_size = 12, textangle = 0, textposition = "outside", cliponaxis = False)
fig_prev_age.update_traces(marker_color='green')
fig_prev_age.update_yaxes(range= [0, 50])
fig_prev_age.show()

### 3.2.2. Prevalence of Hypertension by gender

In [14]:
## Prevalence of HTN by gender
prev_gender = hiv2.groupby("sex")["bp_cat2"].value_counts(normalize = True)
prev_gender= pd.DataFrame(round(prev_gender*100, 2))
prev_gender.columns = ["Prevalence of Hypertension"]
prev_gender= prev_gender.reset_index()
prev_gender = prev_gender.loc[prev_gender["bp_cat2"] == "High"].drop(['bp_cat2'], axis = 1)
prev_gender

|   | sex | Prevalence of Hypertension |
|---|---|---|
| 1 | male | 21.25 |
| 3 | female | 19.88 |


In [15]:
fig_prev_gender = px.bar(prev_gender,
                        x = "sex",
                        y = "Prevalence of Hypertension", 
                 # color = "sex",
            barmode = "group",
           # facet_col = "year",
            template="simple_white",
            title = "Prevalence of hypertension among men and women in PLWHIV",
                   text_auto='.3s'
                       )
fig_prev_gender.update_yaxes(title_text = "Prevalence of Hypertension")
fig_prev_gender.update_xaxes(title_text = "Gender")
#fig_prev.update_xaxes(categoryorder = 'array', categoryarray= ['2013','2009','2015','2008','2016', '2012', '2014', '2010', '2011'])
fig_prev_gender.update_traces(textfont_size = 12, textangle = 0, textposition = "outside", cliponaxis = False)
fig_prev_gender.update_yaxes(range= [0, 30])
fig_prev_gender.show()

### 3.2.3. Prevalence of hypertension by age group and gender

In [16]:
prev_age_sex = hiv2.groupby(["age_group","sex"])["bp_cat2"].value_counts(normalize = True)
prev_age_sex = pd.DataFrame(round(prev_age_sex*100, 2))
prev_age_sex.columns = ["Prevalence of Hypertension"]
prev_age_sex = prev_age_sex.reset_index()
prev_age_sex = prev_age_sex.loc[prev_age_sex["bp_cat2"]== "High"].drop(['bp_cat2'], axis = 1)
prev_age_sex 

|   | age_group | sex | Prevalence of Hypertension |
|---|---|---|---|
| 1 | 12-20 | male | 4.43 |
| 3 | 12-20 | female | 4.75 |
| 5 | 20-30 | male | 11.66 |
| 7 | 20-30 | female | 10.74 |
| 9 | 30-40 | male | 22.75 |
| 11 | 30-40 | female | 22.81 |
| 13 | 40-50 | male | 31.97 |
| 15 | 40-50 | female | 35.21 |
| 17 | 50-60 | male | 39.04 |
| 19 | 50-60 | female | 39.33 |


In [17]:
fig_prev_age_sex = px.bar(prev_age_sex,
                        x = "age_group",
                        y = "Prevalence of Hypertension", 
                  color = "sex",
            barmode = "group",
           # facet_col = "year",
            template="simple_white",
            title = "Prevalence of hypertension by age group and gender",
                   text_auto='.3s'
                       )
fig_prev_age_sex.update_yaxes(title_text = "Prevalence of Hypertension")
fig_prev_age_sex.update_xaxes(title_text = "Age Group")
#fig_prev.update_xaxes(categoryorder = 'array', categoryarray= ['2013','2009','2015','2008','2016', '2012', '2014', '2010', '2011'])
fig_prev_age_sex.update_traces(textfont_size = 12, textangle = 0, textposition = "outside", cliponaxis = False)
fig_prev_age_sex.update_yaxes(range= [0, 50])
fig_prev_age_sex.show()

Within each age group shown, the prevalence of hypertension is very similar among men and women: the gap is under 1 percentage point in every group except 40-50, where it reaches 35.21% in women versus 31.97% in men.
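The size of the male-female gap in each age group can be read directly off the table above. The short sketch below copies those printed percentages (the 60+ group does not appear in the printed table) and computes the female minus male difference in percentage points. It is descriptive only and is not a significance test.

```python
# Prevalence of hypertension (%) by age group and sex, copied from the table above
prevalence = {
    "12-20": {"male": 4.43, "female": 4.75},
    "20-30": {"male": 11.66, "female": 10.74},
    "30-40": {"male": 22.75, "female": 22.81},
    "40-50": {"male": 31.97, "female": 35.21},
    "50-60": {"male": 39.04, "female": 39.33},
}

# Female minus male, in percentage points
gaps = {age: round(v["female"] - v["male"], 2) for age, v in prevalence.items()}
for age, gap in gaps.items():
    print(f"{age}: {gap:+.2f} percentage points")

# Age group with the largest absolute difference
largest = max(gaps, key=lambda age: abs(gaps[age]))
print("Largest absolute gap:", largest, gaps[largest])
```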

### 3.2.4. Prevalence of Hypertension by CD4 count category

In [18]:
## Prevalence of HTN by CD4 count category
prev_cd4 = hiv2.groupby("cd4_cat2")["bp_cat2"].value_counts(normalize = True)
prev_cd4 = pd.DataFrame(round(prev_cd4*100, 2))
prev_cd4.columns = ["Prevalence of Hypertension"]
prev_cd4 = prev_cd4.reset_index()
prev_cd4 = prev_cd4.loc[prev_cd4["bp_cat2"] == "High"].drop(['bp_cat2'], axis = 1)
prev_cd4

|   | cd4_cat2 | Prevalence of Hypertension |
|---|---|---|
| 1 | <50 cells/μl | 21.21 |
| 3 | 50-249 cells/μl | 24.45 |
| 5 | 250-349 cells/μl | 24.04 |
| 7 | 350-499 cells/μl | 23.51 |
| 9 | 500+ cells/μl | 24.49 |


In [19]:
fig_prev_cd4 = px.bar(prev_cd4,
                        x = "cd4_cat2",
                        y = "Prevalence of Hypertension", 
                  # color = "sex",
            barmode = "group",
           # facet_col = "year",
            template="simple_white",
            title = "Prevalence of hypertension by CD4 count category",
                   text_auto='.4s'
                       )
fig_prev_cd4.update_yaxes(title_text = "Prevalence of Hypertension")
fig_prev_cd4.update_xaxes(title_text = "CD4 count category")
fig_prev_cd4.update_traces(textfont_size = 12, textangle = 0, textposition = "outside", cliponaxis = False)
fig_prev_cd4.update_traces(marker_color = 'indianred')
fig_prev_cd4.update_yaxes(range= [0, 35])
fig_prev_cd4.show()

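PLWHIV with a CD4 count of 500 cells/μl or more have the highest prevalence of hypertension (24.49%), although it is almost identical to that of the 50-249 cells/μl group (24.45%); only the <50 cells/μl group is noticeably lower (21.21%). A CD4 count in this range generally indicates relatively preserved immune function, so this suggests that a patient whose HIV is well managed may still be living with undiagnosed hypertension.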
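To put "highest" in perspective, the sketch below copies the CD4 percentages from the table above and computes the spread between the highest and lowest categories, plus the gap between the two top categories. It is descriptive only, with no confidence intervals or test.

```python
# Prevalence of hypertension (%) by CD4 category, copied from the table above
cd4_prev = {
    "<50 cells/μl": 21.21,
    "50-249 cells/μl": 24.45,
    "250-349 cells/μl": 24.04,
    "350-499 cells/μl": 23.51,
    "500+ cells/μl": 24.49,
}

highest = max(cd4_prev, key=cd4_prev.get)
lowest = min(cd4_prev, key=cd4_prev.get)

# Spread between extreme categories, in percentage points
spread = round(cd4_prev[highest] - cd4_prev[lowest], 2)

# Gap between the two highest categories, in percentage points
runner_up_gap = round(cd4_prev["500+ cells/μl"] - cd4_prev["50-249 cells/μl"], 2)

print(f"Highest: {highest}, lowest: {lowest}, spread: {spread} pp")
print(f"500+ vs 50-249 cells/μl: {runner_up_gap} pp")
```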

### 3.2.5. Prevalence of hypertension by Body Mass Index (BMI) category

In [20]:
## let's see the proportion of hypertension by Body mass index category
prev_bmi = hiv2.groupby("bmi_cat2")["bp_cat2"].value_counts(normalize = True)
prev_bmi = pd.DataFrame(round(prev_bmi*100, 2))
prev_bmi.columns = ["Prevalence of Hypertension"]
prev_bmi = prev_bmi.reset_index()
prev_bmi = prev_bmi.loc[prev_bmi["bp_cat2"] == "High"].drop(['bp_cat2'], axis = 1)
prev_bmi

|   | bmi_cat2 | Prevalence of Hypertension |
|---|---|---|
| 1 | Underweight | 9.57 |
| 3 | Normal weight | 14.85 |
| 5 | Overweight | 23.36 |
| 7 | Obese | 30.24 |
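The table above shows a clear gradient with BMI. A short sketch, using the printed percentages, that checks the gradient is monotonic and expresses each category relative to normal weight. These are crude, unadjusted ratios of prevalences, with no confidence intervals.

```python
import pandas as pd

# Prevalence of hypertension (%) by BMI category, copied from the table above,
# listed from lowest to highest BMI
bmi_prev = pd.Series(
    [9.57, 14.85, 23.36, 30.24],
    index=["Underweight", "Normal weight", "Overweight", "Obese"],
    name="Prevalence of Hypertension",
)

# True if prevalence never decreases as the BMI category increases
print(bmi_prev.is_monotonic_increasing)

# Crude prevalence ratio of each category relative to normal weight
ratios = (bmi_prev / bmi_prev["Normal weight"]).round(2)
print(ratios)
```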


In [21]:
fig_prev_bmi = px.bar(prev_bmi,
                        x = "bmi_cat2",
                        y = "Prevalence of Hypertension", 
                  # color = "sex",
            barmode = "group",
           # facet_col = "year",
            template="simple_white",
            title = "Prevalence of hypertension by body mass index category",
                   text_auto='.4s'
                       )
fig_prev_bmi.update_yaxes(title_text = "Prevalence of Hypertension")
fig_prev_bmi.update_xaxes(title_text = "Body mass index category")
fig_prev_bmi.update_traces(textfont_size = 12, textangle = 0, textposition = "outside", cliponaxis = False)
fig_prev_bmi.update_traces(marker_color = 'red')
fig_prev_bmi.update_yaxes(range= [0, 40])
fig_prev_bmi.show()

### 3.2.6. Prevalence of hypertension by BMI category among men and women living with HIV

In [24]:
prev_bmi_sex = hiv2.groupby(["bmi_cat2","sex"])["bp_cat2"].value_counts(normalize = True)
prev_bmi_sex = pd.DataFrame(round(prev_bmi_sex*100, 2))
prev_bmi_sex.columns = ["Prevalence of Hypertension"]
prev_bmi_sex = prev_bmi_sex.reset_index()
prev_bmi_sex = prev_bmi_sex.loc[prev_bmi_sex["bp_cat2"]== "High"].drop(['bp_cat2'], axis = 1)
prev_bmi_sex 

|   | bmi_cat2 | sex | Prevalence of Hypertension |
|---|---|---|---|
| 1 | Underweight | male | 10.96 |
| 3 | Underweight | female | 7.33 |
| 5 | Normal weight | male | 16.42 |
| 7 | Normal weight | female | 11.41 |
| 9 | Overweight | male | 29.56 |
| 11 | Overweight | female | 18.41 |
| 13 | Obese | male | 43.05 |
| 15 | Obese | female | 27.55 |
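Unlike age, BMI shows a marked difference between men and women. Reshaping the long table above into one row per BMI category with `DataFrame.pivot` makes the comparison direct; the values below are copied from the printed output and the gap column is computed here, not in the original notebook.

```python
import pandas as pd

# Long-format table as printed above: one row per BMI category and sex
long = pd.DataFrame({
    "bmi_cat2": ["Underweight", "Underweight", "Normal weight", "Normal weight",
                 "Overweight", "Overweight", "Obese", "Obese"],
    "sex": ["male", "female"] * 4,
    "Prevalence of Hypertension": [10.96, 7.33, 16.42, 11.41,
                                   29.56, 18.41, 43.05, 27.55],
})

# One row per BMI category, one column per sex, in BMI order
wide = long.pivot(index="bmi_cat2", columns="sex", values="Prevalence of Hypertension")
wide = wide.reindex(["Underweight", "Normal weight", "Overweight", "Obese"])

# Male minus female, in percentage points
wide["gap_male_minus_female"] = (wide["male"] - wide["female"]).round(2)
print(wide)
```

Men have a higher prevalence in every BMI category, and the gap widens as BMI increases.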


In [32]:
fig_prev_bmi_sex = px.bar(prev_bmi_sex,
                        x = "bmi_cat2",
                        y = "Prevalence of Hypertension", 
                  color = "sex",
            barmode = "group",
           facet_col = "sex",
            template="simple_white",
            title = "Prevalence of hypertension by BMI category and gender",
                   text_auto='.3s'
                       )
fig_prev_bmi_sex.update_yaxes(title_text = "Prevalence of Hypertension")
fig_prev_bmi_sex.update_xaxes(title_text = "Body mass index category")
#fig_prev.update_xaxes(categoryorder = 'array', categoryarray= ['2013','2009','2015','2008','2016', '2012', '2014', '2010', '2011'])
fig_prev_bmi_sex.update_traces(textfont_size = 12, textangle = 0, textposition = "outside", cliponaxis = False)
fig_prev_bmi_sex.update_yaxes(range= [0, 60])
fig_prev_bmi_sex.update_layout(showlegend=False)
fig_prev_bmi_sex.show()

### 3.2.7. Prevalence of hypertension among PLWHIV on ART or not on ART

In [22]:
## let's see the proportion of hypertension among PLWHIV on ART and those not on ART
prev_trt = hiv2.groupby("trt")["bp_cat2"].value_counts(normalize = True)
prev_trt = pd.DataFrame(round(prev_trt * 100, 2))
prev_trt.columns = ["Prevalence of Hypertension"]
prev_trt = prev_trt.reset_index()
prev_trt = prev_trt.loc[prev_trt["bp_cat2"] == "High"].drop(['bp_cat2'], axis = 1)
prev_trt

|   | trt | Prevalence of Hypertension |
|---|---|---|
| 1 | Not on ART | 21.03 |
| 3 | On ART | 19.8 |
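`trt` is NaN for every row without a `treatment` code (for example the 2008 rows shown earlier), and `groupby` drops NaN keys by default, so the two percentages above are computed only on participants who have a treatment code. A small sketch on made-up rows showing this behaviour and counting how many rows actually contribute:

```python
import numpy as np
import pandas as pd

# Made-up rows: the last two have no treatment code, so trt is NaN
toy = pd.DataFrame({
    "trt": ["On ART", "On ART", "Not on ART", np.nan, np.nan],
    "bp_cat2": ["High", "Normal", "High", "High", "Normal"],
})

# groupby drops NaN keys by default (dropna=True), so rows without trt are not counted
prev = (toy.groupby("trt")["bp_cat2"]
        .value_counts(normalize=True)
        .mul(100)
        .round(2))
print(prev)

# How many rows contributed to the percentages above, and how many were left out
rows_used = int(toy["trt"].notna().sum())
rows_excluded = int(toy["trt"].isna().sum())
print(f"rows used: {rows_used}, rows excluded: {rows_excluded}")
```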


In [23]:
fig_prev_trt = px.bar(prev_trt,
                        x = "trt",
                        y = "Prevalence of Hypertension", 
                 # color = "sex",
            barmode = "group",
           # facet_col = "year",
            template="simple_white",
            title = "Prevalence of hypertension among PLWHIV on ART or not on ART",
                   text_auto='.3s'
                       )
fig_prev_trt.update_yaxes(title_text = "Prevalence of Hypertension")
fig_prev_trt.update_xaxes(title_text = "Treatment Group")
fig_prev_trt.update_traces(textfont_size = 12, textangle = 0, textposition = "outside", cliponaxis = False)
fig_prev_trt.update_yaxes(range= [0, 30])
fig_prev_trt.show()