# Comprehensive Data Analysis and Visualizations of HIV/AIDS Epidemiology in African Countries
### By Edmund Joseph

This project is a thorough examination of HIV/AIDS epidemiological data for African countries. It covers data collection, cleaning and in-depth analysis to uncover key trends, challenges and insights, and it uses data visualization to present the findings clearly. The dataset is available at https://www.kaggle.com/datasets/imdevskp/hiv-aids-dataset and contains HIV/AIDS data from the **WHO** and **UNESCO** covering the years 2000 to 2018.

The data is explored, cleaned, analyzed and visualized to produce insights. The archive contains six CSV files, and all six are used in this notebook.

## Importing Packages and Reading the Datasets

In [121]:
import pandas as pd
import numpy as np
import plotly.express as px
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")

In [122]:
adult_data = pd.read_csv(r"C:\Users\user1\Downloads\archive (6)\no_of_cases_adults_15_to_49_by_country_clean.csv")

In [123]:
adult_data.head()

| | Country | Year | Count | Count_median | Count_min | Count_max | WHO Region |
|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 2018 | 0.1[0.1–0.1] | 0.1 | 0.1 | 0.1 | Eastern Mediterranean |
| 1 | Albania | 2018 | na | NaN | NaN | NaN | Europe |
| 2 | Algeria | 2018 | 0.1[0.1–0.1] | 0.1 | 0.1 | 0.1 | Africa |
| 3 | Angola | 2018 | 2.0[1.7–2.3] | 2.0 | 1.7 | 2.3 | Africa |
| 4 | Argentina | 2018 | 0.4[0.4–0.4] | 0.4 | 0.4 | 0.4 | Americas |


Extract the adult (ages 15 to 49) data for African countries and store it in a new DataFrame.

In [124]:
afr_adult = adult_data[adult_data['WHO Region'] == 'Africa']

In [125]:
afr_adult.head()

| | Country | Year | Count | Count_median | Count_min | Count_max | WHO Region |
|---|---|---|---|---|---|---|---|
| 2 | Algeria | 2018 | 0.1[0.1–0.1] | 0.1 | 0.1 | 0.1 | Africa |
| 3 | Angola | 2018 | 2.0[1.7–2.3] | 2.0 | 1.7 | 2.3 | Africa |
| 16 | Benin | 2018 | 1.0[0.7–1.7] | 1.0 | 0.7 | 1.7 | Africa |
| 20 | Botswana | 2018 | 20.3[17.3–21.8] | 20.3 | 17.3 | 21.8 | Africa |
| 24 | Burkina Faso | 2018 | 0.7[0.6–0.9] | 0.7 | 0.6 | 0.9 | Africa |


Take a quick look at the data.

In [126]:
afr_adult.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 176 entries, 2 to 679
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Country       176 non-null    object 
 1   Year          176 non-null    int64  
 2   Count         176 non-null    object 
 3   Count_median  176 non-null    float64
 4   Count_min     176 non-null    float64
 5   Count_max     176 non-null    float64
 6   WHO Region    176 non-null    object 
dtypes: float64(3), int64(1), object(3)
memory usage: 11.0+ KB


In [127]:
afr_adult.dtypes

Country          object
Year              int64
Count            object
Count_median    float64
Count_min       float64
Count_max       float64
WHO Region       object
dtype: object
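`Count` has dtype `object` because each entry packs the median estimate and its uncertainty range into a single string such as `2.0[1.7–2.3]`, while `Count_median`, `Count_min` and `Count_max` hold the same three numbers as floats. A minimal sketch of splitting such strings with a regular expression, assuming the `median[min–max]` layout shown above (the bounds are separated by an en dash in this file and by a plain hyphen in the pediatric file, so both are accepted). The helper `split_estimate` is hypothetical and not part of the dataset's own tooling:

```python
import pandas as pd

# Matches "median[min–max]"; accepts an en dash or a plain hyphen between the bounds.
ESTIMATE_RE = r"^\s*([\d.]+)\s*\[\s*([\d.]+)\s*[–-]\s*([\d.]+)\s*\]\s*$"


def split_estimate(values: pd.Series) -> pd.DataFrame:
    """Split strings like '2.0[1.7–2.3]' into median/min/max float columns.

    Entries that do not match the pattern (e.g. 'na', 'No data') become NaN.
    """
    parts = values.astype(str).str.extract(ESTIMATE_RE)
    parts.columns = ["median", "min", "max"]
    return parts.astype(float)


counts = pd.Series(["0.1[0.1–0.1]", "2.0[1.7–2.3]", "20.3[17.3–21.8]", "na"])
print(split_estimate(counts))
```

Applied to `adult_data['Count']`, the result can be compared against the existing `Count_median`, `Count_min` and `Count_max` columns as a consistency check.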

Next, load the number of deaths due to HIV/AIDS and extract the rows for Africa.

In [128]:
death_case = pd.read_csv(r"C:\Users\user1\Downloads\archive (6)\no_of_deaths_by_country_clean.csv")
death_case.head()

| | Country | Year | Count | Count_median | Count_min | Count_max | WHO Region |
|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 2018 | 500[200–610] | 500.0 | 200.0 | 610.0 | Eastern Mediterranean |
| 1 | Albania | 2018 | na | NaN | NaN | NaN | Europe |
| 2 | Algeria | 2018 | 200[200–200] | 200.0 | 200.0 | 200.0 | Africa |
| 3 | Angola | 2018 | 14000[9500–18000] | 14000.0 | 9500.0 | 18000.0 | Africa |
| 4 | Argentina | 2018 | 1700[1300–2100] | 1700.0 | 1300.0 | 2100.0 | Americas |


In [129]:
afr_death_case = death_case[death_case['WHO Region'] == 'Africa']
afr_death_case.head()

| | Country | Year | Count | Count_median | Count_min | Count_max | WHO Region |
|---|---|---|---|---|---|---|---|
| 2 | Algeria | 2018 | 200[200–200] | 200.0 | 200.0 | 200.0 | Africa |
| 3 | Angola | 2018 | 14000[9500–18000] | 14000.0 | 9500.0 | 18000.0 | Africa |
| 16 | Benin | 2018 | 2200[1100–4400] | 2200.0 | 1100.0 | 4400.0 | Africa |
| 20 | Botswana | 2018 | 4800[4100–5700] | 4800.0 | 4100.0 | 5700.0 | Africa |
| 24 | Burkina Faso | 2018 | 3300[2400–4400] | 3300.0 | 2400.0 | 4400.0 | Africa |


Load the number of people living with HIV and extract the rows for Africa.

In [130]:
hiv_living_data = pd.read_csv(r"C:\Users\user1\Downloads\archive (6)\no_of_people_living_with_hiv_by_country_clean.csv")
hiv_living_data.head()

| | Country | Year | Count | Count_median | Count_min | Count_max | WHO Region |
|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 2018 | 7200[4100–11000] | 7200.0 | 4100.0 | 11000.0 | Eastern Mediterranean |
| 1 | Albania | 2018 | na | NaN | NaN | NaN | Europe |
| 2 | Algeria | 2018 | 16000[15000–17000] | 16000.0 | 15000.0 | 17000.0 | Africa |
| 3 | Angola | 2018 | 330000[290000–390000] | 330000.0 | 290000.0 | 390000.0 | Africa |
| 4 | Argentina | 2018 | 140000[130000–150000] | 140000.0 | 130000.0 | 150000.0 | Americas |


In [131]:
afr_hiv_living_data = hiv_living_data[hiv_living_data['WHO Region'] == 'Africa']
afr_hiv_living_data.head()

| | Country | Year | Count | Count_median | Count_min | Count_max | WHO Region |
|---|---|---|---|---|---|---|---|
| 2 | Algeria | 2018 | 16000[15000–17000] | 16000.0 | 15000.0 | 17000.0 | Africa |
| 3 | Angola | 2018 | 330000[290000–390000] | 330000.0 | 290000.0 | 390000.0 | Africa |
| 16 | Benin | 2018 | 73000[48000–120000] | 73000.0 | 48000.0 | 120000.0 | Africa |
| 20 | Botswana | 2018 | 370000[330000–400000] | 370000.0 | 330000.0 | 400000.0 | Africa |
| 24 | Burkina Faso | 2018 | 96000[78000–120000] | 96000.0 | 78000.0 | 120000.0 | Africa |


Load the prevention of mother-to-child transmission data and extract the rows for Africa.

In [132]:
mother2child_prevention = pd.read_csv(r"C:\Users\user1\Downloads\archive (6)\prevention_of_mother_to_child_transmission_by_country_clean.csv")
mother2child_prevention.head()

| | Country | Received Antiretrovirals | Needing antiretrovirals | Percentage Recieved | Needing antiretrovirals_median | Needing antiretrovirals_min | Needing antiretrovirals_max | Percentage Recieved_median | Percentage Recieved_min | Percentage Recieved_max | WHO Region |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 20 | 200[100–500] | 11[7–18] | 200.0 | 100.0 | 500.0 | 11.0 | 7.0 | 18.0 | Eastern Mediterranean |
| 1 | Albania | No data | Nodata | Nodata | NaN | NaN | NaN | NaN | NaN | NaN | Europe |
| 2 | Algeria | 320 | 500[500–500] | 74[69–78] | 500.0 | 500.0 | 500.0 | 74.0 | 69.0 | 78.0 | Africa |
| 3 | Angola | 9600 | 25000[19000–32000] | 38[29–48] | 25000.0 | 19000.0 | 32000.0 | 38.0 | 29.0 | 48.0 | Africa |
| 4 | Argentina | 1800 | 1800[1600–2000] | 95[85–95] | 1800.0 | 1600.0 | 2000.0 | 95.0 | 85.0 | 95.0 | Americas |


In [133]:
afr_mother2child_prevention = mother2child_prevention[mother2child_prevention['WHO Region'] == 'Africa'].copy()
afr_mother2child_prevention.head()

| | Country | Received Antiretrovirals | Needing antiretrovirals | Percentage Recieved | Needing antiretrovirals_median | Needing antiretrovirals_min | Needing antiretrovirals_max | Percentage Recieved_median | Percentage Recieved_min | Percentage Recieved_max | WHO Region |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Algeria | 320 | 500[500–500] | 74[69–78] | 500.0 | 500.0 | 500.0 | 74.0 | 69.0 | 78.0 | Africa |
| 3 | Angola | 9600 | 25000[19000–32000] | 38[29–48] | 25000.0 | 19000.0 | 32000.0 | 38.0 | 29.0 | 48.0 | Africa |
| 16 | Benin | 4600 | 2600[1600–4300] | 95[95–95] | 2600.0 | 1600.0 | 4300.0 | 95.0 | 95.0 | 95.0 | Africa |
| 20 | Botswana | 12 400 | 13000[10000–14000] | 95[77–95] | 13000.0 | 10000.0 | 14000.0 | 95.0 | 77.0 | 95.0 | Africa |
| 24 | Burkina Faso | 4700 | 4900[3600–6100] | 95[71–95] | 4900.0 | 3600.0 | 6100.0 | 95.0 | 71.0 | 95.0 | Africa |


Load the ART coverage data and extract the rows for Africa.

In [134]:
artCoverage_data = pd.read_csv(r"C:\Users\user1\Downloads\archive (6)\art_coverage_by_country_clean.csv")
artCoverage_data.head()

| | Country | Reported number of people receiving ART | Estimated number of people living with HIV | Estimated ART coverage among people living with HIV (%) | Estimated number of people living with HIV_median | Estimated number of people living with HIV_min | Estimated number of people living with HIV_max | Estimated ART coverage among people living with HIV (%)_median | Estimated ART coverage among people living with HIV (%)_min | Estimated ART coverage among people living with HIV (%)_max | WHO Region |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 920 | 7200[4100–11000] | 13[7–20] | 7200.0 | 4100.0 | 11000.0 | 13.0 | 7.0 | 20.0 | Eastern Mediterranean |
| 1 | Albania | 580 | Nodata | Nodata | NaN | NaN | NaN | NaN | NaN | NaN | Europe |
| 2 | Algeria | 12800 | 16000[15000–17000] | 81[75–86] | 16000.0 | 15000.0 | 17000.0 | 81.0 | 75.0 | 86.0 | Africa |
| 3 | Angola | 88700 | 330000[290000–390000] | 27[23–31] | 330000.0 | 290000.0 | 390000.0 | 27.0 | 23.0 | 31.0 | Africa |
| 4 | Argentina | 85500 | 140000[130000–150000] | 61[55–67] | 140000.0 | 130000.0 | 150000.0 | 61.0 | 55.0 | 67.0 | Americas |


In [135]:
afr_artCoverage_data = artCoverage_data[artCoverage_data['WHO Region'] == 'Africa'].copy()
afr_artCoverage_data.head()

| | Country | Reported number of people receiving ART | Estimated number of people living with HIV | Estimated ART coverage among people living with HIV (%) | Estimated number of people living with HIV_median | Estimated number of people living with HIV_min | Estimated number of people living with HIV_max | Estimated ART coverage among people living with HIV (%)_median | Estimated ART coverage among people living with HIV (%)_min | Estimated ART coverage among people living with HIV (%)_max | WHO Region |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Algeria | 12800 | 16000[15000–17000] | 81[75–86] | 16000.0 | 15000.0 | 17000.0 | 81.0 | 75.0 | 86.0 | Africa |
| 3 | Angola | 88700 | 330000[290000–390000] | 27[23–31] | 330000.0 | 290000.0 | 390000.0 | 27.0 | 23.0 | 31.0 | Africa |
| 16 | Benin | 44200 | 73000[48000–120000] | 61[40–95] | 73000.0 | 48000.0 | 120000.0 | 61.0 | 40.0 | 95.0 | Africa |
| 20 | Botswana | 307000 | 370000[330000–400000] | 83[75–90] | 370000.0 | 330000.0 | 400000.0 | 83.0 | 75.0 | 90.0 | Africa |
| 24 | Burkina Faso | 59300 | 96000[78000–120000] | 62[50–75] | 96000.0 | 78000.0 | 120000.0 | 62.0 | 50.0 | 75.0 | Africa |


Load the pediatric ART coverage data and extract the rows for Africa.

In [136]:
pediatricCoverage_df = pd.read_csv(r"C:\Users\user1\Downloads\archive (6)\art_pediatric_coverage_by_country_clean.csv")
pediatricCoverage_df.head()

| | Country | Reported number of children receiving ART | Estimated number of children needing ART based on WHO methods | Estimated ART coverage among children (%) | Estimated number of children needing ART based on WHO methods_median | Estimated number of children needing ART based on WHO methods_min | Estimated number of children needing ART based on WHO methods_max | Estimated ART coverage among children (%)_median | Estimated ART coverage among children (%)_min | Estimated ART coverage among children (%)_max | WHO Region |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 60 | 500[500-530] | 17[10-26] | 500.0 | 500.0 | 530.0 | 17.0 | 10.0 | 26.0 | Eastern Mediterranean |
| 1 | Albania | 20 | Nodata | Nodata | NaN | NaN | NaN | NaN | NaN | NaN | Europe |
| 2 | Algeria | 770 | 500[500-520] | 95[95-95] | 500.0 | 500.0 | 520.0 | 95.0 | 95.0 | 95.0 | Africa |
| 3 | Angola | 4800 | 38000[30000-47000] | 13[10-16] | 38000.0 | 30000.0 | 47000.0 | 13.0 | 10.0 | 16.0 | Africa |
| 4 | Argentina | 1700 | 1800[1600-2100] | 92[84-95] | 1800.0 | 1600.0 | 2100.0 | 92.0 | 84.0 | 95.0 | Americas |


In [137]:
afr_pediatricCoverage_df = pediatricCoverage_df[pediatricCoverage_df['WHO Region'] == 'Africa'].copy()
afr_pediatricCoverage_df.head()

| | Country | Reported number of children receiving ART | Estimated number of children needing ART based on WHO methods | Estimated ART coverage among children (%) | Estimated number of children needing ART based on WHO methods_median | Estimated number of children needing ART based on WHO methods_min | Estimated number of children needing ART based on WHO methods_max | Estimated ART coverage among children (%)_median | Estimated ART coverage among children (%)_min | Estimated ART coverage among children (%)_max | WHO Region |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Algeria | 770 | 500[500-520] | 95[95-95] | 500.0 | 500.0 | 520.0 | 95.0 | 95.0 | 95.0 | Africa |
| 3 | Angola | 4800 | 38000[30000-47000] | 13[10-16] | 38000.0 | 30000.0 | 47000.0 | 13.0 | 10.0 | 16.0 | Africa |
| 16 | Benin | 2000 | 4600[2800-8000] | 44[27-77] | 4600.0 | 2800.0 | 8000.0 | 44.0 | 27.0 | 77.0 | Africa |
| 20 | Botswana | 5400 | 14000[10000-17000] | 38[28-46] | 14000.0 | 10000.0 | 17000.0 | 38.0 | 28.0 | 46.0 | Africa |
| 24 | Burkina Faso | 1900 | 9100[6300-12000] | 21[15-29] | 9100.0 | 6300.0 | 12000.0 | 21.0 | 15.0 | 29.0 | Africa |
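Each of the six extraction cells above applies the same `WHO Region == 'Africa'` mask. A sketch of doing this once for every table, assuming the frames are collected in a dictionary (the key names and the helper `filter_region` are illustrative, not from the original notebook). Calling `.copy()` gives each filtered frame its own data, so later column edits do not trigger `SettingWithCopyWarning` or act on a slice of the full dataset:

```python
import pandas as pd


def filter_region(tables: dict, region: str = "Africa") -> dict:
    """Return a new dict with each DataFrame restricted to one WHO region."""
    return {
        name: df.loc[df["WHO Region"] == region].copy()
        for name, df in tables.items()
    }


# Toy frames standing in for the six CSV files loaded above.
tables = {
    "adults": pd.DataFrame({"Country": ["Angola", "Albania"],
                            "WHO Region": ["Africa", "Europe"]}),
    "deaths": pd.DataFrame({"Country": ["Benin", "Argentina"],
                            "WHO Region": ["Africa", "Americas"]}),
}
africa = filter_region(tables)
print({name: df["Country"].tolist() for name, df in africa.items()})
```

With the real data, the dictionary would hold `adult_data`, `death_case`, `hiv_living_data` and the other three frames under whatever keys are convenient.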


## Data cleaning and preprocessing

In [138]:
afr_adult.isnull().sum()

Country         0
Year            0
Count           0
Count_median    0
Count_min       0
Count_max       0
WHO Region      0
dtype: int64

The adult dataset has no missing values in any column.

In [142]:
afr_death_case.isnull().sum()

Country         0
Year            0
Count           0
Count_median    0
Count_min       0
Count_max       0
WHO Region      0
dtype: int64

In [143]:
afr_hiv_living_data.isnull().sum()

Country         0
Year            0
Count           0
Count_median    0
Count_min       0
Count_max       0
WHO Region      0
dtype: int64

In [144]:
afr_mother2child_prevention.isnull().sum()

Country                           0
Received Antiretrovirals          0
Needing antiretrovirals           0
Percentage Recieved               0
Needing antiretrovirals_median    3
Needing antiretrovirals_min       3
Needing antiretrovirals_max       3
Percentage Recieved_median        3
Percentage Recieved_min           3
Percentage Recieved_max           3
WHO Region                        0
dtype: int64

Drop the less important columns, then replace the remaining missing values with the column mean.

In [145]:
afr_mother2child_prevention.drop(columns=['Needing antiretrovirals','Needing antiretrovirals_min','Needing antiretrovirals_max','Percentage Recieved','Percentage Recieved_min','Percentage Recieved_max'], axis=1, inplace=True)

In [146]:
afr_mother2child_prevention.isnull().sum()

Country                           0
Received Antiretrovirals          0
Needing antiretrovirals_median    3
Percentage Recieved_median        3
WHO Region                        0
dtype: int64

In [147]:
col = 'Needing antiretrovirals_median'
afr_mother2child_prevention[col] = afr_mother2child_prevention[col].fillna(afr_mother2child_prevention[col].mean())

In [148]:
col = 'Percentage Recieved_median'
afr_mother2child_prevention[col] = afr_mother2child_prevention[col].fillna(afr_mother2child_prevention[col].mean())

In [149]:
afr_mother2child_prevention.isnull().sum()

Country                           0
Received Antiretrovirals          0
Needing antiretrovirals_median    0
Percentage Recieved_median        0
WHO Region                        0
dtype: int64
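Mean imputation is sensitive to a few very large countries: Cabo Verde, a small island state, later appears with an imputed value of about 28 320 women needing antiretrovirals, more than Angola's 25 000. A sketch of one alternative, assuming median imputation is acceptable for this analysis: fill with the column median and keep a boolean flag so imputed rows can be excluded or highlighted in plots. The helper `impute_median` and the `_imputed` suffix are hypothetical names:

```python
import pandas as pd


def impute_median(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of df with NaNs in `column` filled by the column median.

    A boolean '<column>_imputed' flag records which rows were filled.
    """
    out = df.copy()
    out[f"{column}_imputed"] = out[column].isna()
    out[column] = out[column].fillna(out[column].median())
    return out


# Toy data: one very large country skews the mean but not the median.
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Needing": [500.0, 2600.0, 250000.0, None],
})
filled = impute_median(toy, "Needing")
print(filled)
```

The median ignores how extreme the largest values are, which matters when country sizes span several orders of magnitude as they do here.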

In [150]:
afr_artCoverage_data.isnull().sum()

Country                                                           0
Reported number of people receiving ART                           0
Estimated number of people living with HIV                        0
Estimated ART coverage among people living with HIV (%)           0
Estimated number of people living with HIV_median                 0
Estimated number of people living with HIV_min                    0
Estimated number of people living with HIV_max                    0
Estimated ART coverage among people living with HIV (%)_median    0
Estimated ART coverage among people living with HIV (%)_min       0
Estimated ART coverage among people living with HIV (%)_max       0
WHO Region                                                        0
dtype: int64

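Keep only the columns needed for the analysis and drop the less important ones.

In [157]:
keep_cols = ['Country', 'Reported number of people receiving ART',
             'Estimated ART coverage among people living with HIV (%)_median', 'WHO Region']
afr_artCoverage_data = afr_artCoverage_data[keep_cols].copy()
afr_artCoverage_data.isnull().sum()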

Country                                                           0
Reported number of people receiving ART                           0
Estimated ART coverage among people living with HIV (%)_median    0
WHO Region                                                        0
dtype: int64

In [158]:
afr_pediatricCoverage_df.isnull().sum()

Country                                                                 0
Reported number of children receiving ART                               0
Estimated number of children needing ART based on WHO methods           0
Estimated ART coverage among children (%)                               0
Estimated number of children needing ART based on WHO methods_median    2
Estimated number of children needing ART based on WHO methods_min       2
Estimated number of children needing ART based on WHO methods_max       2
Estimated ART coverage among children (%)_median                        2
Estimated ART coverage among children (%)_min                           2
Estimated ART coverage among children (%)_max                           2
WHO Region                                                              0
dtype: int64

Drop the less important columns.

In [159]:
afr_pediatricCoverage_df.drop(columns=['Estimated number of children needing ART based on WHO methods',
                                   'Estimated ART coverage among children (%)',
                                   'Estimated number of children needing ART based on WHO methods_min',
                                   'Estimated number of children needing ART based on WHO methods_max',
                                   'Estimated ART coverage among children (%)_min',
                                   'Estimated ART coverage among children (%)_max'],axis=1,inplace=True)

In [160]:
afr_pediatricCoverage_df.isnull().sum()

Country                                                                 0
Reported number of children receiving ART                               0
Estimated number of children needing ART based on WHO methods_median    2
Estimated ART coverage among children (%)_median                        2
WHO Region                                                              0
dtype: int64

In [161]:
col = 'Estimated number of children needing ART based on WHO methods_median'
afr_pediatricCoverage_df[col] = afr_pediatricCoverage_df[col].fillna(afr_pediatricCoverage_df[col].mean())

In [162]:
col = 'Estimated ART coverage among children (%)_median'
afr_pediatricCoverage_df[col] = afr_pediatricCoverage_df[col].fillna(afr_pediatricCoverage_df[col].mean())

In [163]:
afr_pediatricCoverage_df.isnull().sum()

Country                                                                 0
Reported number of children receiving ART                               0
Estimated number of children needing ART based on WHO methods_median    0
Estimated ART coverage among children (%)_median                        0
WHO Region                                                              0
dtype: int64

All datasets are now explored and cleaned for analysis.

Quick view of the datasets

In [164]:
afr_adult.head()

| | Country | Year | Count | Count_median | Count_min | Count_max | WHO Region |
|---|---|---|---|---|---|---|---|
| 2 | Algeria | 2018 | 0.1[0.1–0.1] | 0.1 | 0.1 | 0.1 | Africa |
| 3 | Angola | 2018 | 2.0[1.7–2.3] | 2.0 | 1.7 | 2.3 | Africa |
| 16 | Benin | 2018 | 1.0[0.7–1.7] | 1.0 | 0.7 | 1.7 | Africa |
| 20 | Botswana | 2018 | 20.3[17.3–21.8] | 20.3 | 17.3 | 21.8 | Africa |
| 24 | Burkina Faso | 2018 | 0.7[0.6–0.9] | 0.7 | 0.6 | 0.9 | Africa |


In [165]:
afr_hiv_living_data.head()

| | Country | Year | Count | Count_median | Count_min | Count_max | WHO Region |
|---|---|---|---|---|---|---|---|
| 2 | Algeria | 2018 | 16000[15000–17000] | 16000.0 | 15000.0 | 17000.0 | Africa |
| 3 | Angola | 2018 | 330000[290000–390000] | 330000.0 | 290000.0 | 390000.0 | Africa |
| 16 | Benin | 2018 | 73000[48000–120000] | 73000.0 | 48000.0 | 120000.0 | Africa |
| 20 | Botswana | 2018 | 370000[330000–400000] | 370000.0 | 330000.0 | 400000.0 | Africa |
| 24 | Burkina Faso | 2018 | 96000[78000–120000] | 96000.0 | 78000.0 | 120000.0 | Africa |


In [166]:
afr_death_case.head()

| | Country | Year | Count | Count_median | Count_min | Count_max | WHO Region |
|---|---|---|---|---|---|---|---|
| 2 | Algeria | 2018 | 200[200–200] | 200.0 | 200.0 | 200.0 | Africa |
| 3 | Angola | 2018 | 14000[9500–18000] | 14000.0 | 9500.0 | 18000.0 | Africa |
| 16 | Benin | 2018 | 2200[1100–4400] | 2200.0 | 1100.0 | 4400.0 | Africa |
| 20 | Botswana | 2018 | 4800[4100–5700] | 4800.0 | 4100.0 | 5700.0 | Africa |
| 24 | Burkina Faso | 2018 | 3300[2400–4400] | 3300.0 | 2400.0 | 4400.0 | Africa |


In [167]:
afr_mother2child_prevention.head()

| | Country | Received Antiretrovirals | Needing antiretrovirals_median | Percentage Recieved_median | WHO Region |
|---|---|---|---|---|---|
| 2 | Algeria | 320 | 500.0 | 74.0 | Africa |
| 3 | Angola | 9600 | 25000.0 | 38.0 | Africa |
| 16 | Benin | 4600 | 2600.0 | 95.0 | Africa |
| 20 | Botswana | 12 400 | 13000.0 | 95.0 | Africa |
| 24 | Burkina Faso | 4700 | 4900.0 | 95.0 | Africa |


In [168]:
afr_artCoverage_data.head()

| | Country | Reported number of people receiving ART | Estimated ART coverage among people living with HIV (%)_median | WHO Region |
|---|---|---|---|---|
| 2 | Algeria | 12800 | 81.0 | Africa |
| 3 | Angola | 88700 | 27.0 | Africa |
| 16 | Benin | 44200 | 61.0 | Africa |
| 20 | Botswana | 307000 | 83.0 | Africa |
| 24 | Burkina Faso | 59300 | 62.0 | Africa |


In [169]:
afr_pediatricCoverage_df.head()

| | Country | Reported number of children receiving ART | Estimated number of children needing ART based on WHO methods_median | Estimated ART coverage among children (%)_median | WHO Region |
|---|---|---|---|---|---|
| 2 | Algeria | 770 | 500.0 | 95.0 | Africa |
| 3 | Angola | 4800 | 38000.0 | 13.0 | Africa |
| 16 | Benin | 2000 | 4600.0 | 44.0 | Africa |
| 20 | Botswana | 5400 | 14000.0 | 38.0 | Africa |
| 24 | Burkina Faso | 1900 | 9100.0 | 21.0 | Africa |


## Data Analysis and Visualization

### HIV prevalence among adults aged 15 to 49 (%)


#### Country-wise:

In [170]:
country_cases = afr_adult.groupby(['Country', 'Year'])['Count_median'].mean()

In [171]:
country_cases

Country   Year
Algeria   2000     0.1
          2005     0.1
          2010     0.1
          2018     0.1
Angola    2000     1.0
                  ... 
Zambia    2018    11.3
Zimbabwe  2000    25.0
          2005    19.0
          2010    15.4
          2018    12.7
Name: Count_median, Length: 176, dtype: float64
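Each country appears once per survey year in this table, so `groupby(...).mean()` simply returns the original `Count_median` values and the averaging step changes nothing. A quick sketch for confirming that assumption before relying on it, using the Zimbabwe and Angola values printed above; `duplicated` over the key columns should find no repeats:

```python
import pandas as pd

# Rows copied from the country_cases output above (adult prevalence, median estimate).
adults = pd.DataFrame({
    "Country": ["Zimbabwe", "Zimbabwe", "Angola"],
    "Year": [2000, 2018, 2000],
    "Count_median": [25.0, 12.7, 1.0],
})

# True if every (Country, Year) pair occurs exactly once.
one_row_per_key = not adults.duplicated(subset=["Country", "Year"]).any()

grouped = adults.groupby(["Country", "Year"])["Count_median"].mean()
indexed = adults.set_index(["Country", "Year"])["Count_median"].sort_index()
print(one_row_per_key, grouped.equals(indexed))
```

If the check ever reports duplicates, the mean would then combine several estimates, and the plot labels should say so.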

In [172]:
country_cases_fig = px.bar(country_cases.reset_index(), x='Year', y='Count_median', color='Country',
             labels={'Year': 'Year', 'Count_median': 'Adult HIV prevalence, ages 15 to 49 (%)'},
             title='HIV Prevalence among Adults (15 to 49) by Country and Year in Africa')

In [173]:
country_cases_fig.show()


##### *From the plot above, Eswatini, in southern Africa, tragically has the highest adult HIV prevalence. Botswana and Zimbabwe, which had the highest levels in 2000, show a steady decrease over the period, and Lesotho has the second-highest level throughout.*
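Whether a country declined steadily can be checked numerically by pivoting to one column per survey year and looking at the step-to-step differences. A sketch using the Algeria and Zimbabwe values printed in `country_cases` above; the `change_2000_2018` and `steady_decline` column names are illustrative:

```python
import pandas as pd

# Values copied from the country_cases output above (adult prevalence, median estimate).
long = pd.DataFrame({
    "Country": ["Algeria"] * 4 + ["Zimbabwe"] * 4,
    "Year": [2000, 2005, 2010, 2018] * 2,
    "Count_median": [0.1, 0.1, 0.1, 0.1, 25.0, 19.0, 15.4, 12.7],
})

# One row per country, one column per survey year.
wide = long.pivot(index="Country", columns="Year", values="Count_median")
wide["change_2000_2018"] = wide[2018] - wide[2000]
# A steady decline means every step between consecutive survey years is negative.
steps = wide[[2000, 2005, 2010, 2018]].diff(axis=1).iloc[:, 1:]
wide["steady_decline"] = (steps < 0).all(axis=1)
print(wide.sort_values("change_2000_2018"))
```

Run on the full `afr_adult` table, the same pivot ranks every country by how much its prevalence changed between 2000 and 2018.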

### Number of deaths due to HIV/AIDS

#### Country-wise:

In [174]:
countrywisedeaths = afr_death_case.groupby(['Country', 'Year'])['Count_median'].mean()

In [175]:
countrywisedeaths_fig = px.bar(countrywisedeaths.reset_index(), x='Year', y='Count_median', color='Country',
             labels={'Year': 'Year', 'Count_median': 'Deaths due to HIV/AIDS (median estimate)'},
             title='Deaths due to HIV/AIDS by Country and Year in Africa')

In [176]:
countrywisedeaths_fig.show()

##### *In 2000, at the start of the period, Kenya and Zimbabwe had the greatest number of deaths due to HIV/AIDS, but their numbers fell considerably in the following years. South Africa has the highest number of HIV/AIDS deaths throughout the period. Other notable increases in deaths appear in Nigeria and Mozambique.*
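Statements about which country has the most deaths in a given year are easier to verify in a ranked table than in a stacked bar chart with dozens of colors. A sketch that keeps the three largest values per year, using the 2018 rows from `afr_death_case.head()` as a stand-in for the full table:

```python
import pandas as pd

# 2018 rows copied from the afr_death_case.head() output above.
deaths = pd.DataFrame({
    "Country": ["Algeria", "Angola", "Benin", "Botswana", "Burkina Faso"],
    "Year": [2018] * 5,
    "Count_median": [200.0, 14000.0, 2200.0, 4800.0, 3300.0],
})

# Sort once, then take the first three rows within each year.
top3 = (
    deaths.sort_values(["Year", "Count_median"], ascending=[True, False])
    .groupby("Year")
    .head(3)
)
print(top3[["Year", "Country", "Count_median"]])
```

On the full `afr_death_case` table this yields the top three countries for each of the survey years at once.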


### Number of people living with HIV/AIDS

#### Country-wise:

In [177]:
afr_peoplewithiv = afr_hiv_living_data.groupby(['Country', 'Year'])['Count_median'].mean().to_frame()


In [178]:
afr_peoplewithiv


| Country | Year | Count_median |
|---|---|---|
| Algeria | 2000 | 1900.0 |
| Algeria | 2005 | 3700.0 |
| Algeria | 2010 | 7100.0 |
| Algeria | 2018 | 16000.0 |
| Angola | 2000 | 87000.0 |
| ... | ... | ... |
| Zambia | 2018 | 1200000.0 |
| Zimbabwe | 2000 | 1600000.0 |
| Zimbabwe | 2005 | 1400000.0 |
| Zimbabwe | 2010 | 1200000.0 |


In [179]:
# Reset the index to make it suitable for plotting
afr_peoplewithiv_reset = afr_peoplewithiv.reset_index()

# Create a bar plot
fig = px.bar(afr_peoplewithiv_reset, x='Year', y='Count_median', color='Country',
             labels={'Year': 'Year', 'Count_median': 'People living with HIV (median estimate)'},
             title='People Living with HIV by Country and Year in Africa')

# Show the plot
fig.show()


#### *From the plot above, South Africa has the highest number of people living with HIV/AIDS throughout the period.*
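Raw counts let the largest countries dominate the stacked bars. A complementary view expresses each country as a share of the total across countries in the same year, using `groupby(...).transform('sum')` to broadcast each year's total back onto its rows. In this sketch the five 2018 rows from `afr_hiv_living_data.head()` stand in for the full table, so the shares are relative to those five countries only; `share_of_year_total` is an illustrative column name:

```python
import pandas as pd

# 2018 rows copied from afr_hiv_living_data.head(); only five countries, so the
# shares below are relative to this subset, not to all of Africa.
living = pd.DataFrame({
    "Country": ["Algeria", "Angola", "Benin", "Botswana", "Burkina Faso"],
    "Year": [2018] * 5,
    "Count_median": [16000.0, 330000.0, 73000.0, 370000.0, 96000.0],
})

# transform('sum') returns a Series aligned with the original rows.
year_total = living.groupby("Year")["Count_median"].transform("sum")
living["share_of_year_total"] = living["Count_median"] / year_total
print(living.round(3))
```

Plotting `share_of_year_total` instead of `Count_median` keeps smaller countries visible while still showing which ones carry most of the burden.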

### Prevention of mother-to-child transmission estimates

In [180]:
afr_mother2child_prevention['Received Antiretrovirals'].value_counts()

8600       2
320        1
57 500     1
1800       1
280        1
47 100     1
2400       1
50         1
110        1
109 000    1
12 100     1
940        1
43 700     1
1400       1
4400       1
248 000    1
5500       1
4300       1
94 800     1
77 000     1
56 500     1
8200       1
1000       1
9600       1
3300       1
4600       1
12 400     1
4700       1
4000       1
No data    1
21 700     1
3200       1
5600       1
0          1
16 500     1
11 400     1
1300       1
230        1
18 400     1
1900       1
700        1
13 000     1
59 600     1
Name: Received Antiretrovirals, dtype: int64

In [181]:
afr_mother2child_prevention['Received Antiretrovirals'] = pd.to_numeric(afr_mother2child_prevention['Received Antiretrovirals'], errors="coerce")
afr_mother2child_prevention['Received Antiretrovirals'] = afr_mother2child_prevention['Received Antiretrovirals'].astype(float)
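The `value_counts()` output above shows counts written with a space as the thousands separator (`57 500`, `248 000`). `pd.to_numeric(..., errors="coerce")` cannot parse these and silently turns them into NaN. A sketch of a more robust conversion, assuming whitespace separators and the `No data` marker are the only non-numeric content; the helper `parse_count` is a hypothetical name:

```python
import pandas as pd


def parse_count(values: pd.Series) -> pd.Series:
    """Convert counts such as '57 500' to floats; unparseable text becomes NaN."""
    # r"\s+" also matches non-breaking and thin spaces used as thousands separators.
    no_spaces = values.astype(str).str.replace(r"\s+", "", regex=True)
    # 'No data' turns into 'Nodata' above, which to_numeric then coerces to NaN.
    return pd.to_numeric(no_spaces, errors="coerce")


raw = pd.Series(["320", "12 400", "248\u00a0000", "No data", "0"])
print(pd.to_numeric(raw, errors="coerce").tolist())  # original approach
print(parse_count(raw).tolist())
```

Usage: `afr_mother2child_prevention['Received Antiretrovirals'] = parse_count(afr_mother2child_prevention['Received Antiretrovirals'])`; the same call fits `Reported number of children receiving ART` in the pediatric table, which uses the same separator.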

In [182]:
preventionofTransmission = afr_mother2child_prevention.groupby('Country')['Received Antiretrovirals'].mean().to_frame()
preventionofTransmission

| Country | Received Antiretrovirals |
|---|---|
| Algeria | 320.0 |
| Angola | 9600.0 |
| Benin | 4600.0 |
| Botswana | NaN |
| Burkina Faso | 4700.0 |
| Burundi | 4000.0 |
| Cabo Verde | NaN |
| Cameroon | NaN |
| Central African Republic | 3200.0 |
| Chad | 5600.0 |


Drop the countries (rows) that have NaN values.

In [183]:
preventionofTransmission = preventionofTransmission.dropna()
preventionofTransmission

| Country | Received Antiretrovirals |
|---|---|
| Algeria | 320.0 |
| Angola | 9600.0 |
| Benin | 4600.0 |
| Burkina Faso | 4700.0 |
| Burundi | 4000.0 |
| Central African Republic | 3200.0 |
| Chad | 5600.0 |
| Comoros | 0.0 |
| Equatorial Guinea | 1300.0 |
| Eritrea | 230.0 |


In [184]:
# Reset the index to make it suitable for plotting
preventionofTransmission_reset = preventionofTransmission.reset_index()

# Create a bar plot
fig = px.bar(preventionofTransmission_reset, x='Country', y='Received Antiretrovirals',
             color='Country',
             labels={'Received Antiretrovirals': 'Pregnant women who received antiretrovirals'},
             title='Pregnant Women Receiving Antiretrovirals for Prevention of Mother-to-Child Transmission in Africa')

# Show the plot
fig.show()


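### Among the countries whose counts could be parsed, Angola has the highest number of pregnant women who received antiretrovirals.

This ranking is incomplete: entries written with a space as the thousands separator (such as Botswana's `12 400`) were turned into NaN by `pd.to_numeric(..., errors="coerce")` and then dropped, and in this column that applies to every value of 10 000 or more.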

In [185]:
needAR_women = afr_mother2child_prevention.groupby('Country')['Needing antiretrovirals_median'].mean().to_frame()
needAR_women

| Country | Needing antiretrovirals_median |
|---|---|
| Algeria | 500.0 |
| Angola | 25000.0 |
| Benin | 2600.0 |
| Botswana | 13000.0 |
| Burkina Faso | 4900.0 |
| Burundi | 5000.0 |
| Cabo Verde | 28319.512195 |
| Cameroon | 27000.0 |
| Central African Republic | 4500.0 |
| Chad | 10000.0 |


In [186]:
# Reset the index to make it suitable for plotting
needAR_women_reset = needAR_women.reset_index()

# Create a bar plot
fig = px.bar(needAR_women_reset, x='Country', y='Needing antiretrovirals_median',
             color='Country',
             labels={'Needing antiretrovirals_median': 'Pregnant women needing antiretrovirals'},
             title='Pregnant Women Needing Antiretrovirals for Prevention of Mother-to-Child Transmission in Africa')

# Show the plot
fig.show()


### From the plot above, South Africa has the highest number of pregnant women who need antiretrovirals to prevent mother-to-child transmission.
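Given the number of women who received antiretrovirals and the estimated number who needed them, the unmet need is their difference. A sketch using rows from the cleaned `afr_mother2child_prevention.head()` output; the `unmet_need` and `received_pct` column names are illustrative. Benin reports more women receiving antiretrovirals (4600) than the estimated number needing them (2600), so negative gaps are clipped to zero and the ratio can exceed 100%:

```python
import pandas as pd

# Rows copied from afr_mother2child_prevention.head() after cleaning.
pmtct = pd.DataFrame({
    "Country": ["Algeria", "Angola", "Benin", "Burkina Faso"],
    "Received Antiretrovirals": [320.0, 9600.0, 4600.0, 4700.0],
    "Needing antiretrovirals_median": [500.0, 25000.0, 2600.0, 4900.0],
})

pmtct["unmet_need"] = (
    pmtct["Needing antiretrovirals_median"] - pmtct["Received Antiretrovirals"]
).clip(lower=0)  # negative gaps (received > needed) are reported as zero
pmtct["received_pct"] = (
    100 * pmtct["Received Antiretrovirals"] / pmtct["Needing antiretrovirals_median"]
)
print(pmtct.sort_values("unmet_need", ascending=False).round(1))
```

The computed `received_pct` only roughly tracks the dataset's own `Percentage Recieved_median` (for Algeria 64% here against 74% reported), so the reported percentage remains the better coverage figure.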

### ART (antiretroviral therapy) coverage estimates among people living with HIV

In [187]:
afr_artCoverage_data['Reported number of people receiving ART'].value_counts()

12800      1
88700      1
13900      1
3500       1
814000     1
47100      1
3000       1
2800       1
1213000    1
184000     1
19800      1
1016000    1
194000     1
26600      1
28400      1
4788000    1
30700      1
64800      1
1004000    1
1109000    1
965000     1
206000     1
1068000    1
14600      1
100        1
44200      1
307000     1
59300      1
65500      1
2200       1
281000     1
39600      1
61400      1
252000     1
48600      1
256000     1
21400      1
8900       1
177000     1
450000     1
35600      1
7500       1
113000     1
1151000    1
Name: Reported number of people receiving ART, dtype: int64

In [188]:
afr_artCoverage_data['Reported number of people receiving ART'] = pd.to_numeric(afr_artCoverage_data['Reported number of people receiving ART'], errors="coerce")
afr_artCoverage_data['Reported number of people receiving ART'] = afr_artCoverage_data['Reported number of people receiving ART'].astype(float)

In [189]:
PeoplereceivingART = afr_artCoverage_data.groupby('Country')['Reported number of people receiving ART'].mean().to_frame()
PeoplereceivingART

| Country | Reported number of people receiving ART |
|---|---|
| Algeria | 12800.0 |
| Angola | 88700.0 |
| Benin | 44200.0 |
| Botswana | 307000.0 |
| Burkina Faso | 59300.0 |
| Burundi | 65500.0 |
| Cabo Verde | 2200.0 |
| Cameroon | 281000.0 |
| Central African Republic | 39600.0 |
| Chad | 61400.0 |


In [190]:
colors = ['red', 'blue', 'green', 'orange', 'purple']
# Create a Plotly Express bar chart
fig = px.bar(PeoplereceivingART, x=PeoplereceivingART.index, y='Reported number of people receiving ART',
             title="People who receive ART - country", color=PeoplereceivingART.index, color_discrete_sequence=colors)

# Customize the layout
fig.update_layout(
    xaxis_title="Country",
    yaxis_title="Number of people who receive ART",
    xaxis=dict(tickangle=90),
    legend_title="Country",
)

# Show the plot
fig.show()



#### South Africa has by far the greatest number of people receiving ART; the numbers for the other countries are much lower.

In [191]:
ARTcoverage = afr_artCoverage_data.groupby('Country')['Estimated ART coverage among people living with HIV (%)_median'].mean().to_frame()
ARTcoverage

| Country | Estimated ART coverage among people living with HIV (%)_median |
|---|---|
| Algeria | 81.0 |
| Angola | 27.0 |
| Benin | 61.0 |
| Botswana | 83.0 |
| Burkina Faso | 62.0 |
| Burundi | 80.0 |
| Cabo Verde | 89.0 |
| Cameroon | 52.0 |
| Central African Republic | 36.0 |
| Chad | 51.0 |


In [192]:
# Define a discrete color sequence (colors repeat once the list is exhausted)
colors = ['red', 'blue', 'green', 'orange', 'purple']

# Create a Plotly Express bar chart with colors
fig = px.bar(ARTcoverage, x=ARTcoverage.index, y='Estimated ART coverage among people living with HIV (%)_median',
             title="ART coverage of people living with HIV - country", color=ARTcoverage.index, color_discrete_sequence=colors)

# Customize the layout
fig.update_layout(
    xaxis_title="Country",
    yaxis_title="Estimated ART Coverage (%)",
    xaxis=dict(tickangle=90),
    plot_bgcolor='lightgray',  # Set the background color
)

# Show the plot
fig.show()



#### Although South Africa has the most people receiving ART, the estimated coverage percentage is highest in Namibia, Cabo Verde and Zimbabwe.
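With more than forty countries and a five-color sequence that repeats, the colors in the chart above do not identify countries. A sketch of a ranked view instead, using the ten rows printed in `ARTcoverage` above; showing three countries at each end is an arbitrary choice:

```python
import pandas as pd

# The ten rows printed in the ARTcoverage output above.
coverage = pd.Series(
    [81.0, 27.0, 61.0, 83.0, 62.0, 80.0, 89.0, 52.0, 36.0, 51.0],
    index=["Algeria", "Angola", "Benin", "Botswana", "Burkina Faso", "Burundi",
           "Cabo Verde", "Cameroon", "Central African Republic", "Chad"],
    name="ART coverage (%)",
)

highest = coverage.nlargest(3)
lowest = coverage.nsmallest(3)
print(highest, lowest, sep="\n\n")
```

Sorting `ARTcoverage` by its coverage column before passing it to the existing `px.bar` call gives the same ranking in chart form.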

### ART (antiretroviral therapy) coverage estimates among children

In [193]:
afr_pediatricCoverage_df['Reported number of children receiving ART'].value_counts()

770        2
1900       2
3400       2
No data    2
8600       2
1400       1
180        1
40         1
86 900     1
1300       1
50 200     1
7500       1
163 000    1
1100       1
90         1
1500       1
4200       1
67 100     1
59 600     1
49 100     1
45 100     1
74 300     1
680        1
16 000     1
2000       1
5400       1
10 300     1
2500       1
2600       1
12 300     1
500        1
8500       1
21 500     1
550        1
5900       1
2100       1
350        1
4800       1
63 900     1
Name: Reported number of children receiving ART, dtype: int64

##### *The value "No data" is replaced with 0, so countries without data appear with 0 children in the chart.*

In [194]:
col = 'Reported number of children receiving ART'
afr_pediatricCoverage_df[col] = afr_pediatricCoverage_df[col].replace("No data", 0)
afr_pediatricCoverage_df[col] = pd.to_numeric(afr_pediatricCoverage_df[col], errors="coerce").astype(float)

In [195]:
ChildrenreceivingART = afr_pediatricCoverage_df.groupby('Country')['Reported number of children receiving ART'].mean().to_frame()
ChildrenreceivingART

| Country | Reported number of children receiving ART |
|---|---|
| Algeria | 770.0 |
| Angola | 4800.0 |
| Benin | 2000.0 |
| Botswana | 5400.0 |
| Burkina Faso | 1900.0 |
| Burundi | 3400.0 |
| Cabo Verde | 0.0 |
| Cameroon | NaN |
| Central African Republic | 2500.0 |
| Chad | 2600.0 |


In [203]:
fig = px.bar(ChildrenreceivingART, x=ChildrenreceivingART.index, y='Reported number of children receiving ART', 
             title="Children who receive ART - Africa",
             labels={'Reported number of children receiving ART': 'Number of children who receive ART'},
             color=ChildrenreceivingART.index,  # Color by the index
             height=500, width=1000)

fig.update_xaxes(tickangle=90)
fig.show()


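#### Among the countries whose counts could be parsed, Eswatini and Namibia have the highest number of children receiving ART, followed by Lesotho.

As with the mother-to-child data, every count of 10 000 or more in this column is written with a space as the thousands separator (for example `163 000`), so `pd.to_numeric(..., errors="coerce")` turned those entries into NaN and they are missing from the chart; this is why Cameroon shows NaN above.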

In [197]:
childrenneedingART = afr_pediatricCoverage_df.groupby('Country')['Estimated number of children needing ART based on WHO methods_median'].mean().to_frame()
childrenneedingART

| Country | Estimated number of children needing ART based on WHO methods_median |
|---|---|
| Algeria | 500.0 |
| Angola | 38000.0 |
| Benin | 4600.0 |
| Botswana | 14000.0 |
| Burkina Faso | 9100.0 |
| Burundi | 11000.0 |
| Cabo Verde | 36200.0 |
| Cameroon | 43000.0 |
| Central African Republic | 11000.0 |
| Chad | 16000.0 |


In [198]:
fig = px.bar(childrenneedingART, x=childrenneedingART.index, 
             y='Estimated number of children needing ART based on WHO methods_median', 
             title="Children who need ART - Country",
             labels={'Estimated number of children needing ART based on WHO methods_median': 'Number of children who need ART'},
             color=childrenneedingART.index,  # Color by the index
             height=500, width=1100)

fig.update_xaxes(tickangle=90)
fig.show()


### South Africa has by far the highest estimated number of children who need ART.

In [199]:
ChildARTcoverage = afr_pediatricCoverage_df.groupby('Country')['Estimated ART coverage among children (%)_median'].mean().to_frame()
ChildARTcoverage

| Country | Estimated ART coverage among children (%)_median |
|---|---|
| Algeria | 95.0 |
| Angola | 13.0 |
| Benin | 44.0 |
| Botswana | 38.0 |
| Burkina Faso | 21.0 |
| Burundi | 30.0 |
| Cabo Verde | 41.619048 |
| Cameroon | 24.0 |
| Central African Republic | 23.0 |
| Chad | 16.0 |


In [200]:
fig = px.bar(ChildARTcoverage, x=ChildARTcoverage.index, 
             y='Estimated ART coverage among children (%)_median', 
             title="Estimated ART coverage in children - Country",
             labels={'Estimated ART coverage among children (%)_median': 'Estimated ART coverage'},
             color=ChildARTcoverage.index,  # Color by the index
             height=500, width=1000)

fig.update_xaxes(tickangle=90)
fig.show()


### Although South Africa has the highest number of children who need ART, Algeria has the highest estimated ART coverage among children in percentage terms (95%).
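Comparing children with everyone living with HIV requires both coverage estimates side by side. A sketch using `merge` on `Country` with the first five rows printed in `ARTcoverage` and `ChildARTcoverage` above (column names shortened for readability; `plhiv_pct` stands for coverage among all people living with HIV). A negative gap means children are covered less well than the population as a whole:

```python
import pandas as pd

countries = ["Algeria", "Angola", "Benin", "Botswana", "Burkina Faso"]
everyone = pd.DataFrame({"Country": countries,
                         "plhiv_pct": [81.0, 27.0, 61.0, 83.0, 62.0]})
children = pd.DataFrame({"Country": countries,
                         "children_pct": [95.0, 13.0, 44.0, 38.0, 21.0]})

# validate="one_to_one" raises if a country appears twice in either table.
both = everyone.merge(children, on="Country", how="inner", validate="one_to_one")
both["gap_pct_points"] = both["children_pct"] - both["plhiv_pct"]
print(both.sort_values("gap_pct_points"))
```

With the full tables, `ARTcoverage.reset_index()` and `ChildARTcoverage.reset_index()` can be merged the same way after renaming their coverage columns.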

## Conclusion:

The detailed exploration and cleaning of the datasets have provided valuable insights into the HIV/AIDS epidemic in Africa. The analysis highlights South Africa's central role in the epidemic: it has the highest number of people living with HIV, the highest number of HIV-related deaths, the greatest need for pediatric antiretroviral treatment, and the largest number of pregnant women needing antiretrovirals to prevent mother-to-child transmission.

These findings underscore the persistent global health challenge posed by HIV/AIDS. While progress has been made in specific regions, it is evident that targeted interventions, heightened awareness, enhanced treatment access, and comprehensive prevention strategies remain essential to effectively combat this enduring health crisis. The data analysis serves as a crucial foundation for informed decision-making and resource allocation in the ongoing fight against HIV/AIDS.

### Thank you :) Hope it is insightful!!