# Integrate COPD Deaths Dataset
This file integrates the COPD Deaths dataset with the NSW Air Quality dataset.

## Set Up
Ensure that the required libraries are available by running the following command in a terminal before execution:
- pip install pandas

Run the following cell in the Jupyter notebook to import the required libraries:

In [1]:
import pandas as pd

## Load Datasets

In [2]:
# Load data into dataframes.
df_air_quality = pd.read_csv('../../2-nsw-air-quality/data-processed-financial-year-alt.csv')
df_copd_deaths = pd.read_csv('../../3-nsw-health-stats/respiratory-health/chronic-obstructive-pulmonary-disease/deaths/data-processed-alt.csv')

# View Headers.
print("Air Quality Headers:")
print(df_air_quality.columns.tolist())
print("\nCOPD Headers:")
print(df_copd_deaths.columns.tolist())

Air Quality Headers:
['financial year', 'lhd', 'CO ppm', 'NO pphm', 'NO2 pphm', 'OZONE pphm', 'PM10 µg/m³', 'SO2 pphm']

COPD Headers:
['financial year', 'lhd', 'Female rate per 100,000 population', 'Male rate per 100,000 population']
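Before merging, it can help to confirm that the two frames share only the join keys and to see which key combinations exist in one file but not the other, since an inner merge silently drops those rows. The sketch below uses small hypothetical sample frames (`air_sample`, `copd_sample`) with placeholder values that only mirror the headers printed above; they are not taken from the real CSV files, and the helper name `key_mismatches` is illustrative.

```python
import pandas as pd

# Hypothetical sample frames mirroring the printed headers (placeholder values).
air_sample = pd.DataFrame({
    'financial year': ['2011/2012', '2012/2013'],
    'lhd': ['Central Coast', 'Central Coast'],
    'CO ppm': [0.07, 0.08],
})
copd_sample = pd.DataFrame({
    'financial year': ['2012/2013', '2013/2014'],
    'lhd': ['Central Coast', 'Central Coast'],
    'Female rate per 100,000 population': [27.9, 24.9],
})

keys = ['financial year', 'lhd']

# Columns present in both frames. Ideally only the join keys are shared;
# any other shared column would get _x/_y suffixes from pd.merge.
shared = sorted(set(air_sample.columns) & set(copd_sample.columns))
print(shared)  # ['financial year', 'lhd']


def key_mismatches(left, right, keys):
    """Hypothetical helper: return key tuples found only in left and only in right."""
    left_keys = set(left[keys].itertuples(index=False, name=None))
    right_keys = set(right[keys].itertuples(index=False, name=None))
    return left_keys - right_keys, right_keys - left_keys


only_air, only_copd = key_mismatches(air_sample, copd_sample, keys)
print(sorted(only_air))   # [('2011/2012', 'Central Coast')]
print(sorted(only_copd))  # [('2013/2014', 'Central Coast')]
```

Applied to the real `df_air_quality` and `df_copd_deaths`, the same helper would list the financial year and LHD pairs that the inner merge below discards.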


## Data Manipulation

In [3]:
df_copd_deaths = df_copd_deaths.rename(columns={
    'Female rate per 100,000 population': 'COPD deaths [f] [rate per 100,000]',
    'Male rate per 100,000 population': 'COPD deaths [m] [rate per 100,000]',
})
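By default `DataFrame.rename` ignores mapping keys that are not column names, so a header that changes upstream would leave the COPD columns un-renamed without any warning. A minimal sketch, assuming the same mapping as above, passes `errors='raise'` so a missing column raises a `KeyError` instead. The frame `copd_sample` is a hypothetical stand-in with placeholder values.

```python
import pandas as pd

# Hypothetical stand-in for df_copd_deaths with the original headers (placeholder values).
copd_sample = pd.DataFrame({
    'financial year': ['2011/2012'],
    'lhd': ['Central Coast'],
    'Female rate per 100,000 population': [24.9],
    'Male rate per 100,000 population': [33.1],
})

rename_map = {
    'Female rate per 100,000 population': 'COPD deaths [f] [rate per 100,000]',
    'Male rate per 100,000 population': 'COPD deaths [m] [rate per 100,000]',
}

# errors='raise' turns a missing source column into a KeyError
# rather than silently skipping that entry of the mapping.
copd_sample = copd_sample.rename(columns=rename_map, errors='raise')
print(copd_sample.columns.tolist())

# A mapping that refers to a column that does not exist now fails loudly.
try:
    copd_sample.rename(columns={'Not a column': 'x'}, errors='raise')
except KeyError as exc:
    print('Missing column:', exc)
```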

## Merge Datasets

Merge the dataframes on 'financial year' and 'lhd'. An inner merge keeps only the rows whose financial year and LHD appear in both datasets.

In [4]:
# Merge dataframes on 'financial year' and 'lhd' columns.
df_merged = pd.merge(df_air_quality, df_copd_deaths, on=['financial year', 'lhd'], how='inner')

# Fill NaN values with 'NA'.
df_merged = df_merged.fillna('NA')

# View the first five rows of the merged dataframe.
df_merged.head()

|   | financial year | lhd | CO ppm | NO pphm | NO2 pphm | OZONE pphm | PM10 µg/m³ | SO2 pphm | COPD deaths [f] [rate per 100,000] | COPD deaths [m] [rate per 100,000] |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2011/2012 | Central Coast | 0.065989 | 0.306572 | 0.564177 | 1.586418 | 13.532604 | 0.064975 | 24.9 | 33.1 |
| 1 | 2012/2013 | Central Coast | 0.084025 | 0.268964 | 0.502566 | 1.719031 | 16.43849 | 0.07798 | 27.9 | 37.2 |
| 2 | 2013/2014 | Central Coast | 0.1 | 0.241667 | 0.516667 | 1.75 | 15.933333 | 0.091667 | 24.9 | 40.9 |
| 3 | 2014/2015 | Central Coast | 0.1 | 0.233333 | 0.466667 | 1.79119 | 15.158333 | 0.05 | 25.3 | 43.1 |
| 4 | 2015/2016 | Central Coast | 0.108333 | 0.216667 | 0.458333 | 1.691667 | 15.375 | 0.058333 | 28.2 | 40.0 |
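To audit what the inner merge discards, one option is to run the same merge as an outer join with `indicator=True`, which adds a `_merge` column labelling each row `both`, `left_only` or `right_only`. Adding `validate='one_to_one'` makes pandas raise a `MergeError` if any financial year and LHD pair is duplicated in either frame. The sketch below uses hypothetical sample frames (`air_sample`, `copd_sample`) with placeholder values; it is a check alongside the notebook's merge, not a replacement for it.

```python
import pandas as pd

# Hypothetical sample frames keyed like the notebook's data (placeholder values).
air_sample = pd.DataFrame({
    'financial year': ['2011/2012', '2012/2013', '2013/2014'],
    'lhd': ['Central Coast'] * 3,
    'CO ppm': [0.066, 0.084, 0.1],
})
copd_sample = pd.DataFrame({
    'financial year': ['2012/2013', '2013/2014', '2014/2015'],
    'lhd': ['Central Coast'] * 3,
    'COPD deaths [f] [rate per 100,000]': [27.9, 24.9, 25.3],
})

# Outer merge with an indicator column; validate='one_to_one' guards
# against duplicated key pairs in either input.
audit = pd.merge(
    air_sample, copd_sample,
    on=['financial year', 'lhd'], how='outer',
    indicator=True, validate='one_to_one',
)
print(audit['_merge'].value_counts())

# Rows labelled 'both' are the same rows an inner merge would keep.
inner_rows = audit[audit['_merge'] == 'both'].drop(columns='_merge')
print(inner_rows)
```

The `left_only` rows are air quality records with no matching COPD figures, and `right_only` rows are the reverse.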


## Output Merged Dataset

In [5]:
df_merged.to_csv('copd-deaths-merged.csv', index=False)
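Filling NaN with the string `'NA'` before export converts any column containing missing values to `object` dtype in memory. An alternative, sketched below under the assumption that only the CSV text matters, is to leave the NaN values in place and pass `na_rep='NA'` to `to_csv`, which writes the same `NA` text while keeping the columns numeric. The frame `merged_sample` is a hypothetical stand-in with placeholder values, and an in-memory buffer is used instead of the real output file.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for df_merged with one missing value (placeholder values).
merged_sample = pd.DataFrame({
    'financial year': ['2011/2012', '2012/2013'],
    'lhd': ['Central Coast', 'Central Coast'],
    'SO2 pphm': [0.065, np.nan],
})

# na_rep='NA' writes missing values as 'NA' in the CSV text,
# while the in-memory column stays float64.
buffer = io.StringIO()
merged_sample.to_csv(buffer, index=False, na_rep='NA')
csv_text = buffer.getvalue()
print(csv_text)

# 'NA' is one of read_csv's default missing-value markers,
# so reading the file back restores a numeric column with NaN.
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip.dtypes)
```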