# Integrate Financial-Year Datasets

## Set Up

Install the required library by running the following command in a terminal before running the notebook:
- pip install pandas


Run the following cell in the Jupyter notebook first so that the required library is imported:

In [28]:
import pandas as pd

## Load Datasets

In [29]:
# Load data into dataframes.
df_air_quality = pd.read_csv('../../2-nsw-air-quality/data-processed-financial-year.csv')
df_asthma_deaths = pd.read_csv('../../3-nsw-health-stats/respiratory-health/asthma/deaths/data-processed.csv')
df_asthma_edp = pd.read_csv('../../3-nsw-health-stats/respiratory-health/asthma/emergency-department-presentations/yearly/data-processed-alt.csv')
df_asthma_hospitalisations = pd.read_csv('../../3-nsw-health-stats/respiratory-health/asthma/hospitalisations/data-processed-alt.csv')
df_asthma_children = pd.read_csv('../../3-nsw-health-stats/respiratory-health/asthma/prevelance-in-children/data-processed-alt.csv')

# View Headers.
print("Air Quality Headers:")
print(df_air_quality.columns.tolist())

print("\nAsthma Deaths Headers:")
print(df_asthma_deaths.columns.tolist())

print("\nAsthma Emergency Department Presentations Headers:")
print(df_asthma_edp.columns.tolist())

print("\nAsthma Hospitalisations Headers:")
print(df_asthma_hospitalisations.columns.tolist())

print("\nAsthma Prevalence in Children Headers:")
print(df_asthma_children.columns.tolist())

Air Quality Headers:
['financial year', 'lhd', 'CO ppm', 'NO pphm', 'NO2 pphm', 'OZONE pphm', 'PM10 µg/m³', 'SO2 pphm']

Asthma Deaths Headers:
['lhd', 'financial year', 'rate per 100,000 population']

Asthma Emergency Department Presentations Headers:
['financial year', 'lhd', 'Female rate per 100,000 population', 'Male rate per 100,000 population']

Asthma Hospitalisations Headers:
['financial year', 'lhd', 'Female rate per 100,000 population', 'Male rate per 100,000 population']

Asthma Prevalence in Children Headers:
['lhd', 'financial year', 'per cent']
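
Every dataset is later joined on 'financial year' and 'lhd', so it can help to confirm that both columns exist in each dataframe before going further. The following is a minimal sketch of such a check. The helper `missing_merge_keys` and the toy dataframes are hypothetical and only stand in for the dataframes loaded above.

```python
import pandas as pd

# Columns that every dataset is expected to share (taken from the headers above).
MERGE_KEYS = ['financial year', 'lhd']


def missing_merge_keys(frames, keys=MERGE_KEYS):
    """Return {name: [missing key columns]} for every frame that lacks a key."""
    problems = {}
    for name, df in frames.items():
        missing = [key for key in keys if key not in df.columns]
        if missing:
            problems[name] = missing
    return problems


# Toy stand-ins for the loaded dataframes (values are illustrative only).
toy_frames = {
    'air_quality': pd.DataFrame(
        {'financial year': ['2014/2015'], 'lhd': ['A'], 'CO ppm': [0.2]}
    ),
    'asthma_deaths': pd.DataFrame(
        {'lhd': ['A'], 'financial year': ['2014/2015'], 'rate': [0.8]}
    ),
    # Deliberately broken: the year column has a different name.
    'broken': pd.DataFrame({'lhd': ['A'], 'year': ['2014/2015']}),
}

print(missing_merge_keys(toy_frames))
# {'broken': ['financial year']}
```

In the notebook, the same helper could be called with a dictionary of the five loaded dataframes. An empty result would mean that all of them carry both key columns.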


## Data Manipulation

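Rename the generic value columns so that each measure stays identifiable after merging. The emergency department and hospitalisation datasets share identical column names, which would otherwise collide.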

In [30]:
# Asthma Deaths
df_asthma_deaths = df_asthma_deaths.rename(columns={
    'rate per 100,000 population': 'asthma deaths [rate per 100,000]',
})

# Asthma Emergency Department Presentations
df_asthma_edp = df_asthma_edp.rename(columns={
    'Female rate per 100,000 population': 'asthma edp [f] [rate per 100,000]',
    'Male rate per 100,000 population': 'asthma edp [m] [rate per 100,000]'
})

# Asthma Hospitalisations
df_asthma_hospitalisations = df_asthma_hospitalisations.rename(columns={
    'Female rate per 100,000 population': 'asthma hospitalisations [f] [rate per 100,000]',
    'Male rate per 100,000 population': 'asthma hospitalisations [m] [rate per 100,000]'
})

# Asthma Prevalence in Children
df_asthma_children = df_asthma_children.rename(columns={
    'per cent': 'asthma prevelance in children [% of children]'
})
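
To illustrate why the renaming matters, the sketch below merges two toy dataframes that share a non-key column name. pandas then disambiguates the columns with its default `_x` and `_y` suffixes, which makes the merged headers hard to interpret. The values and the LHD name 'A' are placeholders, not real data.

```python
import pandas as pd

keys = ['financial year', 'lhd']
shared = 'Female rate per 100,000 population'

# Toy rows with placeholder values; only the column names mirror the datasets.
edp = pd.DataFrame({'financial year': ['2014/2015'], 'lhd': ['A'], shared: [1.0]})
hosp = pd.DataFrame({'financial year': ['2014/2015'], 'lhd': ['A'], shared: [2.0]})

# Without renaming, the shared column name is suffixed with _x and _y.
clash = pd.merge(edp, hosp, on=keys, how='inner')
print(clash.columns.tolist())

# With descriptive names, the merged headers stay self-explanatory.
renamed = pd.merge(
    edp.rename(columns={shared: 'asthma edp [f] [rate per 100,000]'}),
    hosp.rename(columns={shared: 'asthma hospitalisations [f] [rate per 100,000]'}),
    on=keys,
    how='inner',
)
print(renamed.columns.tolist())
```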

## Merge Datasets

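Merge the dataframes on the 'financial year' and 'lhd' columns. Every merge is an inner join, so only combinations of financial year and LHD that appear in all five datasets are kept.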

In [31]:
# Merge dataframes on the 'financial year' and 'lhd' columns.
df_merged = df_air_quality
df_merged = pd.merge(df_merged, df_asthma_deaths, on=['financial year', 'lhd'], how='inner')
df_merged = pd.merge(df_merged, df_asthma_edp, on=['financial year', 'lhd'], how='inner')
df_merged = pd.merge(df_merged, df_asthma_hospitalisations, on=['financial year', 'lhd'], how='inner')
df_merged = pd.merge(df_merged, df_asthma_children, on=['financial year', 'lhd'], how='inner')

# Preview the first five rows of the merged dataframe.
df_merged.head()

| | financial year | lhd | CO ppm | NO pphm | NO2 pphm | OZONE pphm | PM10 µg/m³ | SO2 pphm | asthma deaths [rate per 100,000] | asthma edp [f] [rate per 100,000] | asthma edp [m] [rate per 100,000] | asthma hospitalisations [f] [rate per 100,000] | asthma hospitalisations [m] [rate per 100,000] | asthma prevelance in children [% of children] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2014/2015 | Hunter New England | 0.242 | 0.522 | 0.802 | 1.883333 | 18.6775 | 0.132 | 0.825 | 455.6 | 445.5 | 146.7 | 136.0 | 16.3 |
| 1 | 2014/2015 | Illawarra Shoalhaven | 0.2 | 0.294488 | 0.569399 | 2.02 | 17.61 | 0.08 | 0.65 | 410.1 | 383.6 | 128.8 | 102.6 | 11.85 |
| 2 | 2014/2015 | Murrumbidgee | NaN | 0.01 | 0.33 | 2.093303 | NaN | 0.05 | 0.975 | 652.8 | 618.8 | 221.0 | 189.1 | 21.3 |
| 3 | 2014/2015 | Nepean Blue Mountains | NaN | 0.115 | 0.45 | 2.025 | 15.66659 | 0.001455 | 1.0 | 447.0 | 379.1 | 188.8 | 176.4 | 14.65 |
| 4 | 2014/2015 | Northern Sydney | NaN | 0.24 | 0.73 | 1.81 | 14.33 | 0.03 | 0.55 | 221.3 | 283.0 | 111.2 | 127.3 | 16.3 |
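
Because an inner join silently drops any financial year and LHD pair that is missing from either side, it can be worth auditing what gets lost. The sketch below is one possible approach, using toy dataframes with placeholder values. It uses the `validate` argument of `pd.merge` to guard against duplicate keys and an outer merge with `indicator=True` to list the unmatched keys.

```python
import pandas as pd

keys = ['financial year', 'lhd']

# Toy stand-ins: LHD 'B' exists only on the left, LHD 'C' only on the right.
left = pd.DataFrame({
    'financial year': ['2014/2015', '2014/2015'],
    'lhd': ['A', 'B'],
    'CO ppm': [0.2, 0.3],
})
right = pd.DataFrame({
    'financial year': ['2014/2015', '2014/2015'],
    'lhd': ['A', 'C'],
    'asthma deaths [rate per 100,000]': [0.8, 0.6],
})

# validate='one_to_one' raises MergeError if a key pair is duplicated on either side.
inner = pd.merge(left, right, on=keys, how='inner', validate='one_to_one')
print(len(inner))  # only the row for LHD 'A' survives

# An outer merge on the keys alone, with indicator=True, shows where each pair came from.
audit = pd.merge(left[keys], right[keys], on=keys, how='outer', indicator=True)
unmatched = audit[audit['_merge'] != 'both']
print(unmatched)
```

Rows flagged `left_only` or `right_only` are the ones an inner join would discard. Whether that loss is acceptable depends on the analysis that follows.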


## Output Dataset

In [32]:
df_merged.to_csv('data-merged.csv', index=False)
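
Passing `index=False` keeps the dataframe index out of the file. Without it, the index is written as an unnamed first column that reads back as 'Unnamed: 0'. The sketch below shows a simple round-trip check on a toy dataframe. It writes to a temporary directory, which is an assumption made here so that the example does not overwrite the real 'data-merged.csv'.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for df_merged, including one missing value.
df = pd.DataFrame({
    'financial year': ['2014/2015', '2014/2015'],
    'lhd': ['A', 'B'],
    'CO ppm': [0.2, None],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'data-merged.csv')
    df.to_csv(path, index=False)   # same call as in the notebook
    reloaded = pd.read_csv(path)   # read the file back to verify it

# The columns and shape should survive the round trip unchanged.
print(reloaded.columns.tolist())
print(reloaded.shape == df.shape)
```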