# Analyzing the Relationship between Air Pollutants and Carbon Dioxide Emissions in Singapore

## Details
Name: Reuben Goh

Adm Num: P2205711

Class: EP0302 04

## URLs of Datasets Chosen
1. [Air Pollutant - Nitrogen Dioxide](https://beta.data.gov.sg/collections/1366/datasets/d_88dcbdd26f7adbb5a469491378abfedc/view)

2. [Air Pollutant - Ozone](https://beta.data.gov.sg/collections/1367/datasets/d_12e90ff1178704ebd56dc2fff04eef56/view)

3. [Air Pollutant - Particulate Matter PM2.5](https://beta.data.gov.sg/collections/1369/datasets/d_397fe8de643aea9927bdee32e49307ff/view)

4. [Average Daily Polyclinic Attendances for Selected Diseases](https://beta.data.gov.sg/datasets/d_5d5508f1c954f5630d7b3aa7875d01f9/view)

In [1]:
# Imports
import numpy as np
import matplotlib.pyplot as plt
import os

In [29]:
# handling data sets to be used later

# store the paths of the datasets in a dictionary
data_set_list = {
  "no2": os.path.join("datasets", "AirPollutantNitrogenDioxide.csv"),
  "ozone": os.path.join("datasets", "AirPollutantOzone.csv"),
  "pm2.5": os.path.join("datasets", "AirPollutantParticulateMatterPM2.5.csv"),
  "disease": os.path.join("datasets", "AverageDailyPolyclinicAttendancesforSelectedDiseases.csv")
}

# load the datasets into numpy arrays
no2_data = np.genfromtxt(data_set_list["no2"], delimiter=",", skip_header=1)
ozone_data = np.genfromtxt(data_set_list["ozone"], delimiter=",", skip_header=1)
pm25_data = np.genfromtxt(data_set_list["pm2.5"], delimiter=",", skip_header=1)

# load diseases (will be filtered out later to only look at respiratory diseases in this project)
disease_data = np.genfromtxt(data_set_list["disease"], delimiter=",", skip_header=1, dtype=[("epi_week", "U7"), ("disease", "U100"), ("no_of_cases", "i8")])


### Dataset 1: Air Pollutant - Nitrogen Dioxide

In [30]:
# analyzing csv data

print("============== Nitrogen Dioxide Data ==============")
print(f"There are a total of {no2_data.shape[0]} data points")

There are a total of 23 data points


### Dataset 2: Air Pollutant - Ozone

In [None]:
print("============== Ozone ==============")

### Dataset 3: Air Pollutant - Particulate Matter PM2.5

In [None]:
print("============== Particulate Matter PM2.5 ==============")

### Dataset 4: Average Daily Polyclinic Attendances for Selected Diseases

In [56]:
print("============== Yearly Polyclinic Attendances for Acute Upper Respiratory Tract Infections ==============")
print(f"Originally, there were {disease_data.shape[0]} rows in this dataset.")
print(f"Originally, there were {np.unique(disease_data['disease']).shape[0]} unique diseases in this dataset.")
print("The unique diseases are:", end=" ")
print(", ".join(list(np.unique(disease_data["disease"]))), end=".\n")

print() # empty line


# filter out respiratory diseases only to be analyzed
respiratory_disease_data = disease_data[disease_data["disease"] == "Acute Upper Respiratory Tract infections"]

print("In this project, only the data for Acute Upper Respiratory Tract infections will be analyzed.")
print(f"There are {respiratory_disease_data.shape[0]} rows in this dataset.")
print(f"Each row contains the number of cases of Acute Upper Respiratory Tract infections for a specific week.")
print(f"The dataset contains data from {respiratory_disease_data[0]['epi_week']} to {respiratory_disease_data[-1]['epi_week']}.")

# print(disease_data)
# print(respiratory_disease_data)

# fix difference in time periods between respiratory data and other air pollutant data
# first, extract year
years = np.array([int(week.split("-")[0]) for week in respiratory_disease_data["epi_week"]])
unique_years = np.unique(years)

# aggregate respiratory cases by year
respiratory_cases_by_year = np.array([np.sum(respiratory_disease_data[years == year]["no_of_cases"]) for year in unique_years])

Originally, there were 2557 rows in this dataset.
Originally, there were 5 unique diseases in this dataset.
The unique diseases are: Acute Conjunctivitis, Acute Diarrhoea, Acute Upper Respiratory Tract infections, Chickenpox, HFMD.

In this project, only the data for Acute Upper Respiratory Tract infections will be analyzed.
There are 574 rows in this dataset.
Each row contains the number of cases of Acute Upper Respiratory Tract infections for a specific week.
The dataset contains data from 2012-W0 to 2022-W5.


### Trends of Air Pollutants and CO2 Emission (Line Graph)

[('2012-W0', 'Acute Upper Respiratory Tract infections', 2932)
 ('2012-W0', 'Acute Conjunctivitis',  120)
 ('2012-W0', 'Acute Diarrhoea',  491) ...
 ('2022-W5', 'Acute Diarrhoea',  320) ('2022-W5', 'Chickenpox',    7)
 ('2022-W5', 'HFMD',   12)]
