<a href="https://colab.research.google.com/github/Patric-Ramz/Our_World_In_Data/blob/main/Our_World_In_Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Junior Data Scientist application**

This notebook calculates the crude death rate and the age-standardized death rate from chronic obstructive pulmonary disease (COPD) for Uganda and the United States of America. Both rates are derived from the age-specific COPD death rates of each country.

The crude death rate is the number of deaths in a year divided by the total population. The age-standardized death rate adjusts for age structure: it reflects what the death rate would be if the countries being compared shared the same standard population distribution. The population distribution matters because age can be a confounding factor when comparing death rates.

*   The population data used to calculate the death rates are sourced from the UN World Population Prospects (2022) — Population Estimates 1950-2021
*   The standardized population used to calculate the age-standardized death rates is sourced from table 1 of Age standardization of rates: a new WHO standard

The formula used to calculate the crude death rate for each country is:
$$
CDR = \frac{\sum_{i} (d_i \times p_i)}{\sum_{i} p_i}
$$

Where:

$$
d_i = \text{age-specific death rate in group } i
$$
$$
p_i = \text{mid-year population in group } i
$$

The formula used to calculate the age-standardized death rate for each country is:
$$
ASDR = \frac{\sum_{i} (w_i \times p_i \times d_i)}{\sum_{i} (w_i \times p_i)}
$$

Where:

$$
w_i = \text{proportion of the standard population in age group } i
$$
$$
p_i = \text{mid-year population in group } i
$$
$$
d_i = \text{age-specific death rate in group } i
$$
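
As a quick arithmetic check of the two formulas above, consider a purely hypothetical population with two age groups. These numbers are illustrative assumptions and are not taken from the data: $d = (10, 200)$ deaths per 100,000, $p = (900{,}000, 100{,}000)$ and $w = (0.8, 0.2)$.

$$
CDR = \frac{10 \times 900{,}000 + 200 \times 100{,}000}{900{,}000 + 100{,}000} = \frac{29{,}000{,}000}{1{,}000{,}000} = 29 \text{ per } 100{,}000
$$
$$
ASDR = \frac{0.8 \times 900{,}000 \times 10 + 0.2 \times 100{,}000 \times 200}{0.8 \times 900{,}000 + 0.2 \times 100{,}000} = \frac{11{,}200{,}000}{740{,}000} \approx 15.1 \text{ per } 100{,}000
$$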

The assumptions made in calculating the respective death rates are:

*   The mid-year population is a sufficient estimator for the total population.
*   The crude death rate assumes the size of the population remains constant throughout the year.
*   The age-standardized death rate assumes that the age structure of the standard population remains stable over time.
*   The calculations rely on the quality and accuracy of the underlying data.







In [11]:
import pandas as pd

from google.colab import drive
drive.mount('/content/drive') # Mounting my google drive to my google colab

population_data = pd.read_csv('/content/drive/My Drive/Our_World_In_Data/unpopulation_dataportal_20240303010227.csv') # Importing the UN population data
COPD_data = pd.read_csv('/content/drive/My Drive/Our_World_In_Data/age_specific_death_rates_of_COPD.csv') # Importing the age-specific death rates of COPD
who_std_pop = pd.read_csv('/content/drive/My Drive/Our_World_In_Data/who_std_pop.csv') # Importing the WHO Standard Population

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# **Preparing the data**

In [14]:
# Select the relevant columns from the population data
selected_columns = population_data[['Location', 'Age', 'Value']]

# Select the relevant WHO columns and convert percentages to proportions
who_std_pop = who_std_pop[['Age Group', 'WHO']].copy()
who_std_pop['Proportions'] = who_std_pop['WHO'] / 100

# Split the population data by country. The index is reset so that the label
# used for the appended "85+" row cannot collide with an existing row label.
US_data = selected_columns[selected_columns['Location'] == 'United States of America'].reset_index(drop=True)
Uganda_data = selected_columns[selected_columns['Location'] == 'Uganda'].reset_index(drop=True)

# Aggregating the ages above 85 for the USA
sum_value = US_data['Value'].iloc[-4:].sum() # Sum the values in the last 4 rows
US_data = US_data.iloc[:-4].copy() # Remove the last 4 rows
US_data.loc[len(US_data)] = ['United States of America', '85+', sum_value] # Append the aggregated "85+" row

# Aggregating the ages above 85 for Uganda
UG_sum_value = Uganda_data['Value'].iloc[-4:].sum() # Sum the values in the last 4 rows
Uganda_data = Uganda_data.iloc[:-4].copy() # Remove the last 4 rows
Uganda_data.loc[len(Uganda_data)] = ['Uganda', '85+', UG_sum_value] # Append the aggregated "85+" row

# Convert data to NumPy arrays (after the aggregation, so that all arrays cover the same age groups)
proportions = who_std_pop['Proportions'].values
usa_population = US_data['Value'].values
ug_population = Uganda_data['Value'].values
usa_death_rates = COPD_data['Death rate, United States, 2019'].values
uganda_death_rates = COPD_data['Death rate, Uganda, 2019'].values
copd_death_rate_us = usa_death_rates # Same rates, under the name used in the CDR cell
copd_death_rate_ug = uganda_death_rates # Same rates, under the name used in the CDR cell
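
The aggregation of the oldest age groups is done twice above, once per country. A minimal sketch of how the same step could be factored into one helper is shown below. The function name `collapse_oldest_groups`, the `Testland` rows and their age labels are hypothetical and only serve the illustration; like the cell above, the helper assumes the rows are already sorted by age.

```python
import pandas as pd


def collapse_oldest_groups(df: pd.DataFrame, n_last: int, label: str) -> pd.DataFrame:
    """Merge the last `n_last` age rows into one open-ended group such as '85+'.

    Assumes `df` has the columns 'Location', 'Age' and 'Value' and is sorted by age.
    """
    df = df.reset_index(drop=True)
    head = df.iloc[:-n_last]   # rows that stay as they are
    tail = df.iloc[-n_last:]   # rows that are merged into one
    merged = pd.DataFrame({
        "Location": [tail["Location"].iloc[0]],
        "Age": [label],
        "Value": [tail["Value"].sum()],
    })
    return pd.concat([head, merged], ignore_index=True)


# Toy data (hypothetical values, not the UN figures)
toy = pd.DataFrame({
    "Location": ["Testland"] * 5,
    "Age": ["75-79", "80-84", "85-89", "90-94", "95+"],
    "Value": [50.0, 30.0, 12.0, 5.0, 1.0],
})
collapsed = collapse_oldest_groups(toy, n_last=3, label="85+")
print(collapsed)
```

Building the merged row with `pd.concat(..., ignore_index=True)` avoids relying on a particular index label when appending.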

# **Crude Death Rate**

In [15]:
# Calculating the number of deaths for the USA
number_of_deaths = usa_population * copd_death_rate_us * 1/100000 # Perform element-wise multiplication
US_data['number of deaths'] = number_of_deaths # Add the result back to the DataFrame

# Calculating the number of deaths for Uganda
number_of_deaths_ug = ug_population * copd_death_rate_ug * 1/100000 # Perform element-wise multiplication
Uganda_data['number of deaths'] = number_of_deaths_ug # Add the result back to the DataFrame

# Calculating the CDR for the USA
total_deaths_US = US_data['number of deaths'].sum()
population_value_US = US_data['Value'].sum()

CDR_US = total_deaths_US / population_value_US * 100000

sentence = f"The CDR for the United States of America is {CDR_US:.1f} per 100,000 people."
print(sentence)

# Calculating the CDR for Uganda
total_deaths_UG = Uganda_data['number of deaths'].sum()
population_value_UG = Uganda_data['Value'].sum()

CDR_UG = total_deaths_UG / population_value_UG * 100000

sentence = f"The CDR for Uganda is {CDR_UG:.1f} per 100,000 people."
print(sentence)

The CDR for the United States of America is 57.2 per 100,000 people.
The CDR for Uganda is 5.8 per 100,000 people.
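
The crude death rate calculation above follows the same steps for both countries, so it can be sketched as one small function. This is an illustrative sketch rather than the notebook's own code: the name `crude_death_rate` and the three-group toy inputs are assumptions, and the rates are taken to be expressed per 100,000 people, as in the cell above.

```python
import numpy as np


def crude_death_rate(rates_per_100k, population):
    """Crude death rate per 100,000 from age-specific rates and populations."""
    rates = np.asarray(rates_per_100k, dtype=float)
    pop = np.asarray(population, dtype=float)
    if rates.shape != pop.shape:
        raise ValueError("rates and population must cover the same age groups")
    deaths = rates * pop / 100_000           # expected deaths per age group
    return deaths.sum() / pop.sum() * 100_000


# Toy inputs (hypothetical): three age groups
toy_cdr = crude_death_rate([5.0, 50.0, 500.0], [600_000, 300_000, 100_000])
print(f"{toy_cdr:.1f} per 100,000")
```

The shape check guards against the age groups of the two inputs getting out of step, for example when only one of them has been aggregated to "85+".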


# **Age-Standardized Death Rates**

In [10]:
# USA: per-age-group terms of the numerator (w_i * p_i * d_i) and denominator (w_i * p_i)
usa_ASDR = proportions * usa_death_rates * usa_population
US_data['ASDR'] = usa_ASDR
standard_population_us = proportions * usa_population
US_data['Std_pop'] = standard_population_us

usa_age_standardized_death_rate = US_data['ASDR'].sum() / US_data['Std_pop'].sum()
sentence = f"The age-standardized death rate for the United States is {usa_age_standardized_death_rate:.1f} per 100,000 people."
print(sentence)

# Uganda: per-age-group terms of the numerator (w_i * p_i * d_i) and denominator (w_i * p_i)
ug_ASDR = proportions * uganda_death_rates * ug_population
Uganda_data['ASDR'] = ug_ASDR
standard_population_ug = proportions * ug_population
Uganda_data['Std_pop'] = standard_population_ug

ug_age_standardized_death_rate = Uganda_data['ASDR'].sum() / Uganda_data['Std_pop'].sum()
sentence = f"The age-standardized death rate for Uganda is {ug_age_standardized_death_rate:.1f} per 100,000 people."
print(sentence)

The age-standardized death rate for the United States is 16.5 per 100,000 people.
The age-standardized death rate for Uganda is 2.2 per 100,000 people.
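
The weighted ratio computed above can be sketched as a single function. For comparison, the sketch also includes direct standardization as it is conventionally defined, which weights the age-specific rates by the standard proportions alone, $\sum_i w_i d_i / \sum_i w_i$, without the country's own population. The two generally give different values, which may be worth checking when comparing these results with published figures. Both function names and the toy inputs are hypothetical.

```python
import numpy as np


def weighted_rate_as_in_notebook(weights, rates_per_100k, population):
    """Ratio used in this notebook: sum(w * p * d) / sum(w * p)."""
    w = np.asarray(weights, dtype=float)
    d = np.asarray(rates_per_100k, dtype=float)
    p = np.asarray(population, dtype=float)
    return (w * p * d).sum() / (w * p).sum()


def direct_standardized_rate(weights, rates_per_100k):
    """Conventional direct standardization: sum(w * d) / sum(w)."""
    w = np.asarray(weights, dtype=float)
    d = np.asarray(rates_per_100k, dtype=float)
    return (w * d).sum() / w.sum()


# Toy inputs (hypothetical): three age groups
w = [0.5, 0.3, 0.2]                   # standard population proportions
d = [5.0, 50.0, 500.0]                # deaths per 100,000 in each group
p = [600_000, 300_000, 100_000]       # mid-year population in each group

notebook_value = weighted_rate_as_in_notebook(w, d, p)
direct_value = direct_standardized_rate(w, d)
print(f"notebook formula: {notebook_value:.1f}, direct standardization: {direct_value:.1f}")
```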
