<a href="https://colab.research.google.com/github/dqminhv/owid-copd-us-uganda/blob/main/us_uganda_copd.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This workbook calculates the crude and age-standardized death rates from chronic obstructive pulmonary disease (COPD) in the United States and Uganda in 2019. The results are obtained through the following steps.



1. ***Data collection***
>* **Age-specific death rates from COPD for both the United States and Uganda in 2019**: Provided by Our World in Data.
>* **Age-standardized population**: Obtained from the WHO Standard Population, Table 1 in 'Ahmad OB, Boschi-Pinto C, Lopez AD, Murray CJ, Lozano R, Inoue M (2001). Age standardization of rates: a new WHO standard.'
>* **United States and Uganda population in 2019 in 5-year age groups**: Obtained from UN World Population Prospects (2022), Population Estimates 1950-2021.

2. ***Formula***

>*   Pi: Population of age group i
>*   Di: Death rate from COPD of age group i (deaths per 100,000 people)
>*   APi: Share of age group i in the WHO standard population, expressed as a proportion (the shares sum to 1)

> a. Crude death rate from COPD
>>General formula: **Total deaths from COPD / Total population**

>>With age-specific rates, the crude death rate is calculated as **SUM(Di * Pi) / SUM(Pi)**

> b. Age-standardized death rate from COPD
>>General formula: **Expected deaths from COPD in the standardized population / Total population**

>>The age-standardized death rate is calculated as **SUM(Di * SUM(Pi) * APi) / SUM(Pi)**, which simplifies to **SUM(Di * APi)**


3. ***Result***


* **Crude death rate** from COPD in the ***US*** in 2019 is **56.9** per 100,000
* **Crude death rate** from COPD in ***Uganda*** in 2019 is **5.8** per 100,000
* **Age-standardized death rate** from COPD in the ***US*** in 2019 is **28.4** per 100,000
* **Age-standardized death rate** from COPD in ***Uganda*** in 2019 is **28.7** per 100,000
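
To make the two formulas concrete, the sketch below applies them to two hypothetical populations that share the same age-specific death rates but have different age structures. All numbers and names (`rates`, `young_pop`, `old_pop`, `std_share`) are invented for illustration and are not taken from the data sources listed above.

```python
import numpy as np

# Hypothetical age-specific death rates per 100,000 (three age groups)
rates = np.array([1.0, 10.0, 100.0])

# Hypothetical populations (in thousands) with different age structures
young_pop = np.array([700.0, 250.0, 50.0])
old_pop = np.array([300.0, 400.0, 300.0])

# Hypothetical standard population shares (must sum to 1)
std_share = np.array([0.5, 0.3, 0.2])


def crude_rate(rates, pop):
    """Population-weighted average of the rates: SUM(Di * Pi) / SUM(Pi)."""
    return (rates * pop).sum() / pop.sum()


def age_standardized_rate(rates, std_share):
    """Standard-population-weighted average of the rates: SUM(Di * APi)."""
    return (rates * std_share).sum()


young_crude = crude_rate(rates, young_pop)
old_crude = crude_rate(rates, old_pop)
standardized = age_standardized_rate(rates, std_share)
print(young_crude, old_crude, standardized)
```

In this toy case the crude rates differ (about 8.2 versus 34.3 per 100,000) purely because of age structure, while the age-standardized rate is 23.5 per 100,000 for both populations. The same effect explains why the crude and age-standardized results for the US and Uganda diverge so strongly.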







# 1. Import packages and data





In [1]:
# Import required packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
# Define the path to each data file
population_data_path = '/content/drive/MyDrive/Colab Notebooks/data/WPP2022_Population1JanuaryByAge5GroupSex_Medium.csv'
age_specific_death_rates_of_copd_data_path = '/content/drive/MyDrive/Colab Notebooks/data/age-specific-death-rates-of-copd.csv'
who_age_std_data_path = '/content/drive/MyDrive/Colab Notebooks/data/WHO_std_pop_dist.csv'
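
The three paths share one directory, so they could also be built from a single base path, which makes the notebook easier to move. This is a minimal sketch using `pathlib`. The names `data_dir` and `missing_files` are hypothetical, and the directory is the one used above, which assumes Google Drive is already mounted in Colab.

```python
from pathlib import Path

# Base directory taken from the paths above (assumes Google Drive is mounted)
data_dir = Path('/content/drive/MyDrive/Colab Notebooks/data')

population_data_path = data_dir / 'WPP2022_Population1JanuaryByAge5GroupSex_Medium.csv'
age_specific_death_rates_of_copd_data_path = data_dir / 'age-specific-death-rates-of-copd.csv'
who_age_std_data_path = data_dir / 'WHO_std_pop_dist.csv'


def missing_files(paths):
    """Return the paths that do not exist, so problems surface before reading."""
    return [p for p in paths if not Path(p).exists()]


missing = missing_files([population_data_path,
                         age_specific_death_rates_of_copd_data_path,
                         who_age_std_data_path])
if missing:
    print('Missing data files:', [str(p) for p in missing])
```

`pd.read_csv` accepts `Path` objects, so the later cells would work unchanged with these variables.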

In [3]:
# Read the files into DataFrames
world_pop = pd.read_csv(population_data_path, low_memory=False)
age_specific_death_rates_of_copd = pd.read_csv(age_specific_death_rates_of_copd_data_path)
who_age_std = pd.read_csv(who_age_std_data_path)

In [4]:
# Create a DataFrame for the US population in 2019 (PopTotal is in thousands)
us_pop_2019 = world_pop.loc[(world_pop['Location'] == 'United States of America') & (world_pop['Time'] == 2019),
                            ['ISO2_code', 'AgeGrp', 'PopTotal']].reset_index(drop=True)

In [5]:
# Create a DataFrame for the Uganda population in 2019 (PopTotal is in thousands)
uganda_pop_2019 = world_pop.loc[(world_pop['Location'] == 'Uganda') & (world_pop['Time'] == 2019),
                                ['ISO2_code', 'AgeGrp', 'PopTotal']].reset_index(drop=True)
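
The two selections above differ only in the country name, so they could be expressed through one helper. The sketch below is one way to do that. `select_population` is a hypothetical name, and the small `world_pop_demo` frame stands in for the real UN file. It reuses the file's column names, and its 2018 row is a made-up value.

```python
import pandas as pd


def select_population(world_pop, location, year):
    """Return ISO2_code, AgeGrp and PopTotal for one location and year."""
    mask = (world_pop['Location'] == location) & (world_pop['Time'] == year)
    columns = ['ISO2_code', 'AgeGrp', 'PopTotal']
    return world_pop.loc[mask, columns].reset_index(drop=True)


# Tiny stand-in for the UN file; the 2018 row is a made-up value for illustration
world_pop_demo = pd.DataFrame({
    'Location': ['United States of America', 'United States of America', 'Uganda'],
    'ISO2_code': ['US', 'US', 'UG'],
    'Time': [2018, 2019, 2019],
    'AgeGrp': ['0-4', '0-4', '0-4'],
    'PopTotal': [20000.0, 19960.257, 7244.221],
})

us_demo = select_population(world_pop_demo, 'United States of America', 2019)
uganda_demo = select_population(world_pop_demo, 'Uganda', 2019)
print(us_demo)
print(uganda_demo)
```

Passing the row mask and the column list to a single `.loc` call also avoids the chained indexing of `.loc[...][[...]]`.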

# 2. Grouping age groups above 85 together

In the UN World Population data, ages 85 and above are split into several subgroups (85-89, 90-94, 95-99, 100+), whereas the WHO Standard Population and the table of age-specific COPD death rates place them in a single category. We therefore consolidate these subgroups in the UN World Population data into a single group labeled 85+.

## a. United States

In [6]:
# Build the 85+ row by summing the last four age groups (85-89, 90-94, 95-99, 100+)
us_85_pop = us_pop_2019['PopTotal'][-4:].sum()
us_85 = pd.DataFrame({'ISO2_code': ['US'], 'AgeGrp': ['85+'], 'PopTotal': [us_85_pop]})

# Add the new row
us_pop_2019 = pd.concat([us_pop_2019, us_85], ignore_index=True)

In [7]:
# Remove rows 17 to 20, which are the groups 85-89, 90-94, 95-99, 100+
us_pop_2019.drop(index=[17, 18, 19, 20], inplace=True)
us_pop_2019.reset_index(drop=True, inplace=True)

## b. Uganda

In [8]:
# Build the 85+ row by summing the last four age groups (85-89, 90-94, 95-99, 100+)
uganda_85_pop = uganda_pop_2019['PopTotal'][-4:].sum()
uganda_85 = pd.DataFrame({'ISO2_code': ['UG'], 'AgeGrp': ['85+'], 'PopTotal': [uganda_85_pop]})

# Add the new row
uganda_pop_2019 = pd.concat([uganda_pop_2019, uganda_85], ignore_index=True)

In [9]:
# Remove rows 17 to 20, which are the groups 85-89, 90-94, 95-99, 100+
uganda_pop_2019.drop(index=[17, 18, 19, 20], inplace=True)
uganda_pop_2019.reset_index(drop=True, inplace=True)
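
The positional steps above depend on the 85-89, 90-94, 95-99 and 100+ groups sitting in rows 17 to 20. A label-based variant would avoid that dependence. The sketch below is an alternative under stated assumptions, not the method used in this notebook. `collapse_85_plus` is a hypothetical helper and the demo values are invented.

```python
import pandas as pd

OLD_AGE_GROUPS = ['85-89', '90-94', '95-99', '100+']


def collapse_85_plus(pop, iso2_code):
    """Replace the 85-89, 90-94, 95-99 and 100+ rows with a single 85+ row."""
    is_old = pop['AgeGrp'].isin(OLD_AGE_GROUPS)
    row_85 = pd.DataFrame({
        'ISO2_code': [iso2_code],
        'AgeGrp': ['85+'],
        'PopTotal': [pop.loc[is_old, 'PopTotal'].sum()],
    })
    return pd.concat([pop.loc[~is_old], row_85], ignore_index=True)


# Invented populations (in thousands), for illustration only
demo = pd.DataFrame({
    'ISO2_code': ['XX'] * 6,
    'AgeGrp': ['75-79', '80-84', '85-89', '90-94', '95-99', '100+'],
    'PopTotal': [50.0, 30.0, 12.0, 6.0, 1.5, 0.5],
})

collapsed = collapse_85_plus(demo, 'XX')
print(collapsed)
```

Because the rows are selected by their `AgeGrp` label, the helper keeps working if the input contains extra rows or a different row order, and the total population is unchanged by the collapse.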

In [20]:
us_pop_2019

| | ISO2_code | AgeGrp | PopTotal |
|---|---|---|---|
| 0 | US | 0-4 | 19960.257 |
| 1 | US | 5-9 | 20732.962 |
| 2 | US | 10-14 | 22052.246 |
| 3 | US | 15-19 | 21839.549 |
| 4 | US | 20-24 | 21896.017 |
| 5 | US | 25-29 | 23475.072 |
| 6 | US | 30-34 | 22665.777 |
| 7 | US | 35-39 | 22209.426 |
| 8 | US | 40-44 | 20510.398 |
| 9 | US | 45-49 | 21391.729 |


In [21]:
uganda_pop_2019

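| | ISO2_code | AgeGrp | PopTotal |
|---|---|---|---|
| 0 | UG | 0-4 | 7244.221 |
| 1 | UG | 5-9 | 6551.165 |
| 2 | UG | 10-14 | 5830.086 |
| 3 | UG | 15-19 | 5067.026 |
| 4 | UG | 20-24 | 4264.126 |
| 5 | UG | 25-29 | 3404.442 |
| 6 | UG | 30-34 | 2532.182 |
| 7 | UG | 35-39 | 1851.236 |
| 8 | UG | 40-44 | 1475.06 |
| 9 | UG | 45-49 | 1213.931 |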


In [22]:
age_specific_death_rates_of_copd

| | Age group (years) | Death rate, United States, 2019 | Death rate, Uganda, 2019 |
|---|---|---|---|
| 0 | 0-4 | 0.04 | 0.4 |
| 1 | 5-10 | 0.02 | 0.17 |
| 2 | 11-14 | 0.02 | 0.07 |
| 3 | 15-19 | 0.02 | 0.23 |
| 4 | 20-24 | 0.06 | 0.38 |
| 5 | 25-29 | 0.11 | 0.4 |
| 6 | 30-34 | 0.29 | 0.75 |
| 7 | 35-39 | 0.56 | 1.11 |
| 8 | 40-44 | 1.42 | 2.04 |
| 9 | 45-49 | 4.0 | 5.51 |


In [23]:
who_age_std

| | age-group | who-std |
|---|---|---|
| 0 | 0-4 | 8.86 |
| 1 | 5-9 | 8.69 |
| 2 | 10-14 | 8.6 |
| 3 | 15-19 | 8.47 |
| 4 | 20-24 | 8.22 |
| 5 | 25-29 | 7.93 |
| 6 | 30-34 | 7.61 |
| 7 | 35-39 | 7.15 |
| 8 | 40-44 | 6.59 |
| 9 | 45-49 | 6.04 |
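
The calculations that follow multiply columns from different tables. pandas aligns two Series by index label when multiplying, so the results are only meaningful if every table has the same number of rows in the same age order. The age labels themselves are not identical across the tables shown above (for example 5-9 versus 5-10), so matching relies on row position. Below is a minimal sketch of a pre-check. `check_row_alignment` is a hypothetical helper and the demo frames are invented.

```python
import pandas as pd


def check_row_alignment(*frames):
    """Raise ValueError unless all frames share the same default 0..n-1 index."""
    expected = pd.RangeIndex(len(frames[0]))
    for i, frame in enumerate(frames):
        if not frame.index.equals(expected):
            raise ValueError(f'table {i} is not aligned with the first table')


# Invented three-row tables for illustration
pop_demo = pd.DataFrame({'PopTotal': [100.0, 200.0, 300.0]})
rates_demo = pd.DataFrame({'rate': [1.0, 2.0, 3.0]})
short_demo = pd.DataFrame({'rate': [1.0, 2.0]})

check_row_alignment(pop_demo, rates_demo)  # passes silently

# Multiplying Series of different lengths leaves NaN where a label is missing
product = pop_demo['PopTotal'] * short_demo['rate']
print(product.isna().sum())
```

A check like this matters because `Series.sum()` skips NaN values by default, so a missing age group would lower the total without raising an error.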


# 3. Crude death rates from COPD

## a. United States

In [10]:
# Crude death rate = Total deaths / Total population
# The age-specific rates are per 100,000, so this population-weighted average is too
us_crude_death_rate_2019 = (
    (us_pop_2019['PopTotal'] * age_specific_death_rates_of_copd['Death rate, United States, 2019']).sum()
    / us_pop_2019['PopTotal'].sum()
)

In [11]:
print('Crude death rate for COPD of the US in 2019 is {} per 100,000'.format(round(us_crude_death_rate_2019, 1)))

Crude death rate for COPD of the US in 2019 is 56.9 per 100,000


## b. Uganda

In [12]:
# Crude death rate = Total deaths / Total population
# The age-specific rates are per 100,000, so this population-weighted average is too
uganda_crude_death_rate_2019 = (
    (uganda_pop_2019['PopTotal'] * age_specific_death_rates_of_copd['Death rate, Uganda, 2019']).sum()
    / uganda_pop_2019['PopTotal'].sum()
)

In [13]:
print('Crude death rate for COPD of Uganda in 2019 is {} per 100,000'.format(round(uganda_crude_death_rate_2019, 1)))

Crude death rate for COPD of Uganda in 2019 is 5.8 per 100,000


# 4. Age-standardized death rate for COPD

## a. United States

In [14]:
# Calculate the standardized population of each age group in the US
# Formula: Standardized population of age group G = Total population * WHO standard share of group G
# (who-std is given in percent, hence the division by 100)
us_pop_age_std_2019 = who_age_std['who-std'] / 100 * us_pop_2019['PopTotal'].sum()

In [15]:
# Age-standardized death rate = Expected deaths in the standardized population / Total population
us_age_std_death_rate_2019 = \
(us_pop_age_std_2019 * (age_specific_death_rates_of_copd['Death rate, United States, 2019'])).sum() / us_pop_2019['PopTotal'].sum()

In [16]:
# Print the result
print('Age-standardized death rate for COPD of the US in 2019 is {} per 100,000'.format(round(us_age_std_death_rate_2019, 1)))

Age-standardized death rate for COPD of the US in 2019 is 28.4 per 100,000


## b. Uganda

In [17]:
# Calculate the standardized population of each age group in Uganda
# Formula: Standardized population of age group G = Total population * WHO standard share of group G
# (who-std is given in percent, hence the division by 100)
uganda_pop_age_std_2019 = who_age_std['who-std'] / 100 * uganda_pop_2019['PopTotal'].sum()

In [18]:
# Age-standardized death rate = Expected deaths in the standardized population / Total population
uganda_age_std_death_rate_2019 = \
(uganda_pop_age_std_2019 * (age_specific_death_rates_of_copd['Death rate, Uganda, 2019'])).sum() / uganda_pop_2019['PopTotal'].sum()

In [19]:
print('Age-standardized death rate for COPD of Uganda in 2019 is {} per 100,000'.format(round(uganda_age_std_death_rate_2019, 1)))

Age-standardized death rate for COPD of Uganda in 2019 is 28.7 per 100,000
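
In both calculations above, the total population first multiplies the WHO shares and then divides the sum, so it cancels out and the age-standardized rate reduces to SUM(Di * APi). The sketch below checks this numerically. The values in `rates` and `who_share_pct` and the two population totals are invented for illustration.

```python
import numpy as np

# Invented age-specific death rates per 100,000 and standard shares in percent
rates = np.array([0.5, 5.0, 50.0, 500.0])
who_share_pct = np.array([40.0, 30.0, 20.0, 10.0])


def standardized_via_population(rates, share_pct, total_pop):
    """Mirror the two steps above: scale shares by the total, then divide by it."""
    std_pop = share_pct / 100 * total_pop
    return (std_pop * rates).sum() / total_pop


def standardized_direct(rates, share_pct):
    """Equivalent closed form: SUM(Di * APi), with APi as a proportion."""
    return (rates * share_pct / 100).sum()


small = standardized_via_population(rates, who_share_pct, total_pop=44_000.0)
large = standardized_via_population(rates, who_share_pct, total_pop=330_000.0)
direct = standardized_direct(rates, who_share_pct)
print(small, large, direct)
```

All three values agree up to floating-point rounding. The age-standardized rate therefore depends only on the age-specific death rates and the WHO weights, which is what makes it comparable between countries of very different size and age structure.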
