# Understanding HIV Trends and Impact in Africa

# 1. Business Understanding

## 1.1 Problem Statement

HIV remains a major public health issue in Africa, where a large number of people live with the virus, especially in sub-Saharan regions. Prevention and treatment have improved, but HIV is a lifelong condition, so it is crucial to track data on those affected. This data helps identify infection trends, prepare healthcare services for these patients, and address challenges such as stigma and inequality. Beyond health, HIV also affects jobs, healthcare costs, and poverty levels, which makes data essential for designing targeted interventions. By analyzing this information, governments and organizations can allocate resources better and create policies that reduce transmission and improve quality of life for those affected.

## 1.2 Objectives

Create a visualization that shows the trend of HIV cases in the countries that contribute to 75% of the global burden.

Generate a visualization that displays the trend of HIV cases in the countries contributing to 75% of the burden within each WHO region (the `ParentLocationCode` column contains the WHO regions).
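Both objectives depend on picking the countries that together account for 75% of the burden. The sketch below shows one way to do that selection: rank countries by case count, accumulate their share, and keep every country up to and including the one that crosses the threshold. It assumes a single-year frame with an already numeric column, here given the hypothetical name `cases`; the function names are illustrative and not part of the dataset or the original notebook.

```python
import pandas as pd

def top_burden_countries(df, share=0.75, country_col="Location", value_col="cases"):
    """Return the countries that together reach `share` of the total.

    Assumes `value_col` is numeric and `df` covers one reference year.
    """
    totals = df.groupby(country_col)[value_col].sum().sort_values(ascending=False)
    cum_share = totals.cumsum() / totals.sum()
    # Keep a country if the share accumulated *before* it is still below the
    # threshold, so the country that crosses the threshold is included.
    keep = cum_share.shift(fill_value=0.0) < share
    return totals.index[keep].tolist()

def top_burden_by_region(df, region_col="ParentLocationCode", **kwargs):
    """Apply the same selection separately within each WHO region."""
    return {
        region: top_burden_countries(group, **kwargs)
        for region, group in df.groupby(region_col)
    }

# Toy data for illustration only; these are not real estimates.
example = pd.DataFrame({
    "ParentLocationCode": ["AFR", "AFR", "AFR", "AFR", "EUR", "EUR"],
    "Location": ["A", "B", "C", "D", "E", "F"],
    "cases": [50, 30, 15, 5, 9, 1],
})
global_top = top_burden_countries(example)
regional_top = top_burden_by_region(example)
```

The resulting country lists can then be used to filter the full time series before plotting. Which year defines the "burden" is a choice the objectives leave open; using the most recent year is only one reasonable option.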

# 2. Importing Libraries and Warnings

In [8]:
import warnings
import numpy as np
import pandas as pd
warnings.filterwarnings("ignore")

# 3. Data Understanding

You are provided with a dataset from the World Health Organization (WHO) Global Health Observatory, containing data on people living with HIV at the country level from 2000 to 2023.

In [5]:
hiv_data = pd.read_csv("data/HIV data 2000-2023.csv", encoding="latin1")
hiv_data

Unnamed: 0,IndicatorCode,Indicator,ValueType,ParentLocationCode,ParentLocation,Location type,SpatialDimValueCode,Location,Period type,Period,Value
0,HIV_0000000001,Estimated number of people (all ages) living w...,numeric,AFR,Africa,Country,AGO,Angola,Year,2023,320 000 [280 000 - 380 000]
1,HIV_0000000001,Estimated number of people (all ages) living w...,numeric,AFR,Africa,Country,AGO,Angola,Year,2022,320 000 [280 000 - 380 000]
2,HIV_0000000001,Estimated number of people (all ages) living w...,numeric,AFR,Africa,Country,AGO,Angola,Year,2021,320 000 [280 000 - 380 000]
3,HIV_0000000001,Estimated number of people (all ages) living w...,numeric,AFR,Africa,Country,AGO,Angola,Year,2020,320 000 [280 000 - 370 000]
4,HIV_0000000001,Estimated number of people (all ages) living w...,numeric,AFR,Africa,Country,AGO,Angola,Year,2015,300 000 [260 000 - 350 000]
...,...,...,...,...,...,...,...,...,...,...,...
1547,HIV_0000000001,Estimated number of people (all ages) living w...,numeric,WPR,Western Pacific,Country,WSM,Samoa,Year,2020,No data
1548,HIV_0000000001,Estimated number of people (all ages) living w...,numeric,WPR,Western Pacific,Country,WSM,Samoa,Year,2015,No data
1549,HIV_0000000001,Estimated number of people (all ages) living w...,numeric,WPR,Western Pacific,Country,WSM,Samoa,Year,2010,No data
1550,HIV_0000000001,Estimated number of people (all ages) living w...,numeric,WPR,Western Pacific,Country,WSM,Samoa,Year,2005,No data
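The `Value` column is stored as text such as `320 000 [280 000 - 380 000]` or `No data`, so it needs parsing before any aggregation. The following is a minimal sketch that extracts the point estimate in front of the bracketed interval. The helper name `parse_hiv_value` is hypothetical, and the sketch assumes the estimates are whole counts, so any non-digit character before the bracket is dropped.

```python
import math
import re

def parse_hiv_value(raw):
    """Return the point estimate from a WHO 'Value' string, or NaN if absent."""
    text = str(raw).strip()
    # The point estimate is the part before the bracketed uncertainty interval.
    point = text.split("[", 1)[0]
    # Remove thousands separators (plain or non-breaking spaces) and any other
    # non-digit characters. This assumes whole counts, not decimals.
    digits = re.sub(r"[^\d]", "", point)
    return float(digits) if digits else math.nan

estimate = parse_hiv_value("320 000 [280 000 - 380 000]")
missing = parse_hiv_value("No data")
```

Applied to the frame, this could look like `hiv_data["Value"].apply(parse_hiv_value)`, with the result stored in a new numeric column.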


You have also been provided with World Bank data on the multidimensional poverty headcount ratio, which includes factors such as income, educational attainment, school enrolment, electricity access, sanitation and drinking water.

In [7]:
poverty_data = pd.read_excel("data/multidimensional_poverty.xlsx")
poverty_data

Unnamed: 0,"Individuals in households deprived in each indicator, 110 economies, circa year 2021 (2018-2023)\nDate: October 2024",Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15
0,Region,Country code,Economy,Reporting year,Survey name,Survey year,Survey coverage,Welfare type,Survey comparability,Deprivation rate (share of population),,,,,,Multidimensional poverty headcount ratio (%)
1,,,,,,,,,,Monetary (%),Educational attainment (%),Educational enrollment (%),Electricity (%),Sanitation (%),Drinking water (%),
2,SSA,AGO,Angola,2018,IDREA,2018,N,c,2,31.122005,29.753423,27.44306,52.639532,53.637516,32.106507,47.203606
3,ECA,ALB,Albania,2012,HBS,2018,N,c,1,0.048107,0.19238,-,0.06025,6.579772,9.594966,0.293161
4,LAC,ARG,Argentina,2010,EPHC-S2,2021,U,i,3,0.894218,1.08532,0.731351,0,0.257453,0.364048,0.906573
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
107,ECA,UZB,Uzbekistan,2022,HBS,2022,N,c,1,2.253092,0,-,0.12747,21.786885,10.693686,2.253092
108,EAP,VNM,Viet Nam,2010,VHLSS,2022,N,c,2,0.963795,3.384816,1.841407,0.079733,4.132901,1.968127,1.266184
109,EAP,VUT,Vanuatu,2010,NSDP,2019,N,c,0,9.963333,25.723079,13.404277,26.994166,42.970088,11.813611,19.892171
110,SSA,ZMB,Zambia,2010,LCMS-VIII,2022,N,c,4,64.341974,16.267821,23.39835,45.135146,53.505135,26.849246,66.506058
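As the preview shows, the sheet's title ended up as the column header, while the real column names are spread over rows 0 and 1. One way to recover usable names, sketched below, is to take the sub-header from row 1 where it exists and fall back to row 0 otherwise. The function name `flatten_two_row_header` is hypothetical, and the sketch assumes the frame has exactly the two header rows seen above.

```python
import pandas as pd

def flatten_two_row_header(raw):
    """Build column names from the first two rows and return the data rows."""
    top = raw.iloc[0]   # e.g. "Region", "Country code", group labels
    sub = raw.iloc[1]   # e.g. "Monetary (%)", "Electricity (%)"
    # Prefer the specific sub-header; fall back to the top-level label.
    names = [s if pd.notna(s) else t for t, s in zip(top, sub)]
    tidy = raw.iloc[2:].reset_index(drop=True)
    tidy.columns = names
    return tidy

# Toy frame mimicking the layout above; values are shortened for illustration.
raw_example = pd.DataFrame([
    ["Region", "Country code", "Deprivation rate", None, "Headcount ratio (%)"],
    [None, None, "Monetary (%)", "Electricity (%)", None],
    ["SSA", "AGO", 31.1, 52.6, 47.2],
])
tidy_example = flatten_two_row_header(raw_example)
```

Calling `flatten_two_row_header(poverty_data)` would then give a frame whose columns carry the indicator names instead of `Unnamed: N`.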


We would like you to merge this dataset with the HIV data above and analyze the relationship between people living with HIV and multidimensional poverty, and the individual factors that contribute to the ratio. Remember to account for the random effects (country, year).
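A possible shape for the merge is sketched below. It assumes the poverty frame already has readable column names (`Country code`, `Survey year`) and that the join keys are the ISO country code and the year. Matching `Period` to `Survey year` rather than `Reporting year` is an assumption, so the year column is left as a parameter. The function name and the toy columns `hiv_cases` and `mpi` are hypothetical.

```python
import pandas as pd

def merge_hiv_poverty(hiv, poverty, poverty_year_col="Survey year"):
    """Inner-join HIV and poverty rows on country code and year."""
    poverty = poverty.copy()
    # Year values may arrive as text, so coerce them before joining.
    poverty[poverty_year_col] = pd.to_numeric(poverty[poverty_year_col], errors="coerce")
    poverty = poverty.dropna(subset=[poverty_year_col])
    poverty[poverty_year_col] = poverty[poverty_year_col].astype("int64")
    return hiv.merge(
        poverty,
        how="inner",
        left_on=["SpatialDimValueCode", "Period"],
        right_on=["Country code", poverty_year_col],
    )

# Toy frames for illustration only; these are not real figures.
hiv_example = pd.DataFrame({
    "SpatialDimValueCode": ["AGO", "AGO", "ZMB"],
    "Period": [2018, 2020, 2022],
    "hiv_cases": [1.0, 2.0, 3.0],
})
poverty_example = pd.DataFrame({
    "Country code": ["AGO", "ZMB"],
    "Survey year": ["2018", "2022"],
    "mpi": [47.2, 66.5],
})
merged_example = merge_hiv_poverty(hiv_example, poverty_example)
```

An inner join keeps only country-year pairs present in both sources, which can shrink the sample considerably. For the random effects, one option is a mixed-effects model with country as the grouping factor, for example via `mixedlm` in statsmodels; the text above does not prescribe a specific model.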

In [9]:
hiv_data.columns

Index(['IndicatorCode', 'Indicator', 'ValueType', 'ParentLocationCode',
       'ParentLocation', 'Location type', 'SpatialDimValueCode', 'Location',
       'Period type', 'Period', 'Value'],
      dtype='object')

In [10]:
poverty_data.columns

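Index(['Individuals in households deprived in each indicator, 110 economies, circa year 2021 (2018-2023)\nDate: October 2024',
       'Unnamed: 1', 'Unnamed: 2', 'Unnamed: 3', 'Unnamed: 4', 'Unnamed: 5',
       'Unnamed: 6', 'Unnamed: 7', 'Unnamed: 8', 'Unnamed: 9', 'Unnamed: 10',
       'Unnamed: 11', 'Unnamed: 12', 'Unnamed: 13', 'Unnamed: 14',
       'Unnamed: 15'],
      dtype='object')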

In [12]:
hiv_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1552 entries, 0 to 1551
Data columns (total 11 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   IndicatorCode        1552 non-null   object
 1   Indicator            1552 non-null   object
 2   ValueType            1552 non-null   object
 3   ParentLocationCode   1552 non-null   object
 4   ParentLocation       1552 non-null   object
 5   Location type        1552 non-null   object
 6   SpatialDimValueCode  1552 non-null   object
 7   Location             1552 non-null   object
 8   Period type          1552 non-null   object
 9   Period               1552 non-null   int64 
 10  Value                1552 non-null   object
dtypes: int64(1), object(10)
memory usage: 133.5+ KB


In [13]:
poverty_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 112 entries, 0 to 111
Data columns (total 16 columns):
 #   Column                                                                                                               Non-Null Count  Dtype 
---  ------                                                                                                               --------------  ----- 
 0   Individuals in households deprived in each indicator, 110 economies, circa year 2021 (2018-2023)
Date: October 2024  111 non-null    object
 1   Unnamed: 1                                                                                                           111 non-null    object
 2   Unnamed: 2                                                                                                           111 non-null    object
 3   Unnamed: 3                                                                                                           111 non-null    object
 4   Unnamed: 4          

In [16]:
hiv_data.describe()

Unnamed: 0,Period
count,1552.0
mean,2014.5
std,8.080351
min,2000.0
25%,2008.75
50%,2017.5
75%,2021.25
max,2023.0


In [15]:
poverty_data.describe()

Unnamed: 0,"Individuals in households deprived in each indicator, 110 economies, circa year 2021 (2018-2023)\nDate: October 2024",Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15
count,111,111,111,111,111,111,111,111,111,112,111,111,111,111,111,111
unique,8,111,111,13,58,6,3,4,8,107,107,76,70,80,80,109
top,ECA,Country code,Economy,2010,EU-SILC,2021,N,c,1,0,0,-,0,-,0,0
freq,41,1,1,43,26,40,109,58,30,6,5,36,42,25,24,3
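The summary reports only `unique`/`top`/`freq` statistics, and `-` is the most frequent entry in some indicator columns, which suggests the columns hold text rather than numbers. A minimal sketch for converting them follows, using `pd.to_numeric` with `errors="coerce"` so that placeholders such as `-` become `NaN`. The helper name `coerce_numeric` and the column name in the example are illustrative.

```python
import pandas as pd

def coerce_numeric(df, columns):
    """Return a copy of `df` with the given columns converted to numbers."""
    out = df.copy()
    for col in columns:
        # Anything that cannot be parsed, such as "-", becomes NaN.
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out

# Toy frame for illustration, using two values from the preview above.
raw_rates = pd.DataFrame({
    "Economy": ["Angola", "Albania"],
    "Educational enrollment (%)": ["27.44306", "-"],
})
clean_rates = coerce_numeric(raw_rates, ["Educational enrollment (%)"])
```

Treating `-` as missing rather than as zero is an assumption; the source does not say what the placeholder means.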


In [19]:
hiv_data.shape

(1552, 11)

In [18]:
poverty_data.shape

(112, 16)

# 4. Data Cleaning