<a href="https://colab.research.google.com/github/Peiprjs/voila/blob/main/HIV_Deaths_VS_Total_expenditure.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# HIV deaths in 0-4 year-old children against total healthcare expenditure in South Africa and the Netherlands
### A project by Ellie Petrova (i6326413) and Mar Roca (i6351071)
An alternative and likely complementary hypothesis points to the widespread use of unsafe medical practices in Africa during the years following World War II, such as the unsterile reuse of single-use syringes during mass vaccination, antibiotic, and anti-malaria treatment campaigns.

In [None]:
# Optional: uncomment the line below only if the notebook breaks with the installed Altair version
# !pip install altair==5.2.0 --quiet
# (We needed Altair 5.2.0 specifically because of a version-specific update, so we install it with pip and mute the output with --quiet)

In [1]:
import numpy as np
import pandas as pd
import altair as alt
import ipywidgets as widgets
from ipywidgets import interact
# This imports the required dependencies

In [2]:
healthFactors = pd.read_csv('https://raw.githubusercontent.com/NHameleers/dtz2025-datasets/master/CountryHealthFactors.csv')
healthFactors = healthFactors.rename(columns=str.strip)
# This imports the CSV with the dataset and strips leading and trailing whitespace from the column names
hf_SouthAfrica = healthFactors.loc[healthFactors.Country == "South Africa", ['Year', 'Total expenditure', 'HIV/AIDS']].copy()
hf_Netherlands = healthFactors.loc[healthFactors.Country == "Netherlands", ['Year', 'Total expenditure', 'HIV/AIDS']].copy()
# This selects only the columns that we're interested in (Year, Total expenditure and HIV/AIDS)
# from the rows whose Country column is equal to South Africa and the Netherlands respectively.
# .copy() makes each selection an independent dataframe, so changing it later does not raise a SettingWithCopyWarning
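
The column clean-up and row selection used in this cell can be tried out on a small frame first. The following is only a sketch: the `toy` frame, its country labels and all of its values are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy frame with stray whitespace in two column names (all values are made up)
toy = pd.DataFrame({
    "Country": ["A", "A", "B"],
    "Year": [2000, 2001, 2000],
    " HIV/AIDS": [1.0, 2.0, 3.0],
    "Total expenditure ": [5.0, 6.0, 7.0],
})

# rename() calls str.strip on every column name
toy = toy.rename(columns=str.strip)

# The boolean mask picks the rows, the list picks the columns;
# .copy() returns an independent frame that is safe to modify later
toy_a = toy.loc[toy["Country"] == "A", ["Year", "Total expenditure", "HIV/AIDS"]].copy()

print(list(toy.columns))
print(toy_a.shape)
```

After stripping, the columns can be addressed by their clean names, which is what makes the `['Year', 'Total expenditure', 'HIV/AIDS']` selection possible.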

In [3]:
print(f"Both datasets have the same shape: {hf_SouthAfrica.shape[0]} rows and {hf_SouthAfrica.shape[1]} columns")
print(f"The data that we have was collected between {hf_SouthAfrica.Year.min()} and {hf_Netherlands.Year.max()}")

Both datasets have the same shape: 16 rows and 3 columns
The data that we have was collected between 2000 and 2015


By running `hf_SouthAfrica.shape` or `hf_Netherlands.shape` we get the shape of each dataframe that results from isolating the data we are interested in. Both frames have a shape of **16x3**: **16** rows and **3** columns. Running `hf_SouthAfrica.Year.min()` and `hf_SouthAfrica.Year.max()` shows which years the data covers: **2000** to **2015**.
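
Rather than reading the shapes off by eye, the comparison between the two frames could also be made explicit. A minimal sketch, assuming two frames with a `Year` column; the names `sa` and `nl` and their values are hypothetical stand-ins, not the real data:

```python
import pandas as pd

# Hypothetical stand-ins for the two country frames (values are made up)
sa = pd.DataFrame({"Year": [2002, 2001, 2000], "HIV/AIDS": [3.0, 2.0, 1.0]})
nl = pd.DataFrame({"Year": [2000, 2001, 2002], "HIV/AIDS": [0.1, 0.1, 0.1]})

# Same number of rows and columns?
same_shape = sa.shape == nl.shape

# Same set of years, regardless of row order?
same_years = set(sa["Year"]) == set(nl["Year"])

# First and last year covered
year_range = (int(sa["Year"].min()), int(sa["Year"].max()))

print(same_shape, same_years, year_range)
```

Comparing the sets of years is a slightly stronger check than comparing shapes, because two frames can have the same shape while covering different years.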

# In South Africa throughout the years (2000-2015) - Mar Roca


In [4]:
hf_SouthAfrica.head(15)

Unnamed: 0,Year,Total expenditure,HIV/AIDS
2393,2015,,3.6
2394,2014,8.8,3.7
2395,2013,8.78,4.5
2396,2012,8.79,7.6
2397,2011,8.61,8.5
2398,2010,8.5,11.0
2399,2009,8.39,19.0
2400,2008,7.75,23.5
2401,2007,7.53,26.4
2402,2006,7.57,28.1


The first step in any statistical analysis is to graph the variables we're interested in studying, to see whether they are related and, if so, what kind of relationship they follow.
So, we will start by generating graphs using Altair. The next step will be to calculate some descriptive statistics.

In [5]:
hf_SouthAfrica["Year"] = hf_SouthAfrica["Year"].astype(str)
# We need this type conversion because Altair does not read integer years as dates.
# According to the documentation, we must store the year as a string and specify the variable type as temporal
base = alt.Chart(hf_SouthAfrica).mark_circle(opacity=0.5).encode(
    alt.X('Year', type='temporal', scale=alt.Scale(zero=False)),
    alt.Y('HIV/AIDS', type='quantitative'))

# This first part draws the dots in the scatter plot
base + base.transform_loess('Year', 'HIV/AIDS').mark_line()
# This second part draws a LOESS (LOcally Estimated Scatterplot Smoothing) line, which makes the evolution over time easier to see.

The code above generates a simple scatter plot with a LOESS trend line. It shows that, apart from an increase before 2004, HIV/AIDS deaths in 0-4 year-old children have decreased steadily since 2004.
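
A turning point read off a plot can be cross-checked numerically. The sketch below assumes a frame with a numeric year column and one value column; the `toy` frame, the `rate` column and all numbers are made up for illustration, and in this notebook `Year` would first have to be converted back from string to integer.

```python
import pandas as pd

# Made-up series that rises once and then falls
toy = pd.DataFrame({
    "Year": [2000, 2001, 2002, 2003],
    "rate": [10.0, 14.0, 12.0, 9.0],
})

# idxmax() returns the index label of the largest value
peak_row = toy.loc[toy["rate"].idxmax()]
peak_year = int(peak_row["Year"])

# From the peak onwards, every year-on-year difference should be negative
after_peak = toy[toy["Year"] >= peak_year].sort_values("Year")
steadily_decreasing = bool((after_peak["rate"].diff().dropna() < 0).all())

print(peak_year, steadily_decreasing)
```

Applied to the real data, a check like this would confirm or correct the year named in the description of the plot.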

In [6]:
hf_SouthAfrica.describe()

Unnamed: 0,Total expenditure,HIV/AIDS
count,15.0,16.0
mean,8.306,18.49375
std,0.480473,10.166053
min,7.53,3.6
25%,7.85,8.275
50%,8.39,22.4
75%,8.74,26.975
max,8.9,29.7
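
The `count` row differs between the two columns (15 against 16) because `describe()` skips missing values, and the 2015 row has no `Total expenditure` value. A small sketch of that behaviour, using an invented frame whose column names and values are placeholders:

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing value in the first column
toy = pd.DataFrame({
    "expenditure": [np.nan, 8.0, 9.0],
    "rate": [3.0, 4.0, 5.0],
})

# describe() computes every statistic over the non-missing values only
summary = toy.describe()

# isna().sum() counts the missing values per column
missing = toy.isna().sum()

print(summary.loc["count"])
print(missing)
```

Checking `isna().sum()` alongside `describe()` makes it clear how many observations each statistic is actually based on.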


# Comparing South Africa and The Netherlands in [year]

