# Air Quality Notebook

### Imports and Cleaning

Below are the standard imports used throughout this notebook, followed by reading the data in from a CSV file.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import math
import calendar
%matplotlib inline
import warnings

warnings.filterwarnings('ignore') 

In [2]:
aq = pd.read_csv("airQuality.csv")

For the cleaning step, we first split the date into separate Year, Month, and Day columns, which gives more options for visualization and customization later on. The table was then sorted by date and its index reset. The pollutant values also had to be converted from strings to floats before any transformations could be applied to them.

In [3]:
aq[["Year", "Month", "Day"]] = aq["date"].str.split("/", expand = True)

In [4]:
aq = aq.sort_values(["Year", "Month", "Day"])

In [5]:
aq = aq.reset_index()

In [6]:
# Strip stray whitespace, treat blank readings as missing, then convert to float.
# Assigning whole columns avoids the chained-assignment warnings a row-by-row loop raises.
for col in [" pm25", " o3"]:
    aq[col] = aq[col].str.strip().replace("", np.nan).astype(float)

In [7]:
aq = aq.drop("index", axis = 1)
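
Because Year, Month, and Day are split out as strings, sorting on them compares text rather than numbers, so month "10" sorts before month "2". Below is a minimal sketch on made-up dates (the `sample` frame and the variable names are hypothetical, not part of the notebook). It shows the difference, and how converting the parts to integers first gives chronological order:

```python
import pandas as pd

# Hypothetical sample in the same YYYY/M/D format as the "date" column.
sample = pd.DataFrame({"date": ["2014/10/10", "2014/10/2", "2014/2/7", "2014/10/1"]})
cols = ["Year", "Month", "Day"]

# Sorting the split parts as strings compares them character by character.
text_parts = sample["date"].str.split("/", expand=True)
text_parts.columns = cols
text_order = sample.loc[text_parts.sort_values(cols).index, "date"].tolist()

# Converting the parts to integers first makes the sort chronological.
int_parts = text_parts.astype(int)
chrono_order = sample.loc[int_parts.sort_values(cols).index, "date"].tolist()

print(text_order)    # ['2014/10/1', '2014/10/10', '2014/10/2', '2014/2/7']
print(chrono_order)  # ['2014/2/7', '2014/10/1', '2014/10/2', '2014/10/10']
```

Whether the string ordering matters depends on how the sorted table is used downstream. For time-series plots, the integer version is usually the safer choice.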

### Calculations

The rest of this notebook is dedicated to calculating the Air Quality Index (AQI) for each row in the dataset. The table did not include the AQI itself, only the concentrations of specific pollutants, so we had to derive it from those values. The two pollutants that most commonly contribute to AQI are PM2.5 and O3, so after some research we found the concentration ranges for each pollutant that correspond to the AQI ranges, and stored them in dictionaries for easy lookup during the calculations.

For each row we then calculated an AQI value for each of the two pollutants. The higher of the two was taken as the row's AQI, and the pollutant that produced it was recorded as the main contributor. Both results were appended to lists, one entry per row, which were then added as columns to the existing dataframe so it could be exported for use in Power BI.
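
The per-pollutant calculation described above is a linear interpolation within the matching concentration range. Here is a minimal sketch, assuming a helper named `linear_aqi` (hypothetical, not defined in the notebook) and using the notebook's "Sensitive Groups" PM2.5 range (35.5 to 65.4, mapped to AQI 101 to 150):

```python
def linear_aqi(conc, conc_min, conc_max, aqi_min, aqi_max):
    """Map a concentration onto the AQI scale by linear interpolation.

    AQI = (aqi_max - aqi_min) / (conc_max - conc_min) * (conc - conc_min) + aqi_min
    """
    slope = (aqi_max - aqi_min) / (conc_max - conc_min)
    return slope * (conc - conc_min) + aqi_min


# A PM2.5 reading of 56.0 falls in the 35.5-65.4 range, which maps to AQI 101-150.
example = linear_aqi(56.0, 35.5, 65.4, 101, 150)
print(round(example))  # 135
```

The lower end of a range maps to its minimum AQI and the upper end to its maximum, so adjacent ranges join up without gaps in the AQI scale.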

In [8]:
dict_o3 = {}
dict_pm25 = {}

o3_vals = np.linspace(0, 374, 375)
# 5005 points give an exact 0.1 step from 0.0 to 500.4 (5004 points would drift off the 0.1 grid)
pm25_vals = np.round(np.linspace(0, 500.4, 5005), 1)

for i in o3_vals:
    if i <= 59.0:
        dict_o3[i] = {"Level" : "Good", "Min" : 0, "Max" : 59, "AQI Min" : 0, "AQI Max" : 50}
    elif i <= 75.0:
        dict_o3[i] = {"Level" : "Moderate", "Min" : 60, "Max" : 75, "AQI Min" : 51, "AQI Max" : 100}
    elif i <= 95.0:
        dict_o3[i] = {"Level" : "Sensitive Groups", "Min" : 76, "Max" : 95, "AQI Min" : 101, "AQI Max" : 150}
    elif i <= 115.0:
        dict_o3[i]= {"Level" : "Unhealthy", "Min" : 96, "Max" : 115, "AQI Min" : 151, "AQI Max" : 200}
    else:
        dict_o3[i] = {"Level" : "Very Unhealthy", "Min" : 116, "Max" : 374, "AQI Min" : 201, "AQI Max" : 300}
        
for i in pm25_vals:
    if i <= 15.4:
        dict_pm25[i] = {"Level" : "Good", "Min" : 0.0, "Max" : 15.4, "AQI Min" : 0, "AQI Max" : 50}
    elif i <= 35.4:
        dict_pm25[i] = {"Level" : "Moderate", "Min" : 15.5, "Max" : 35.4, "AQI Min" : 51, "AQI Max" : 100}
    elif i <= 65.4:
        dict_pm25[i] = {"Level" : "Sensitive Groups", "Min" : 35.5, "Max" : 65.4, "AQI Min" : 101, "AQI Max" : 150}
    elif i <= 150.4:
        dict_pm25[i] = {"Level" : "Unhealthy", "Min" : 65.5, "Max" : 150.4, "AQI Min" : 151, "AQI Max" : 200}
    elif i <= 250.4:
        dict_pm25[i] = {"Level" : "Very Unhealthy", "Min" : 150.5, "Max" : 250.4, "AQI Min" : 201, "AQI Max" : 300}
    elif i <= 350.4:
        dict_pm25[i] = {"Level" : "Hazardous", "Min" : 250.5, "Max" : 350.4, "AQI Min" : 301, "AQI Max" : 400}
    else:
        dict_pm25[i] = {"Level" : "Hazardous", "Min" : 350.5, "Max" : 500.4, "AQI Min" : 401, "AQI Max" : 500}
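
The dictionaries above hold one entry per possible reading, so a lookup fails for any value that is off the grid or outside the covered range. As an alternative sketch (not the notebook's method), the same ranges can be kept as a short list and searched directly. The names `PM25_BANDS` and `find_band` are hypothetical, and the Moderate lower bound is assumed to be 15.5 for consistency with the other ranges:

```python
# PM2.5 ranges as (conc_min, conc_max, aqi_min, aqi_max, level),
# copied from the notebook's dictionary values.
PM25_BANDS = [
    (0.0, 15.4, 0, 50, "Good"),
    (15.5, 35.4, 51, 100, "Moderate"),  # lower bound assumed to be 15.5
    (35.5, 65.4, 101, 150, "Sensitive Groups"),
    (65.5, 150.4, 151, 200, "Unhealthy"),
    (150.5, 250.4, 201, 300, "Very Unhealthy"),
    (250.5, 350.4, 301, 400, "Hazardous"),
    (350.5, 500.4, 401, 500, "Hazardous"),
]


def find_band(value, bands):
    """Return the first band whose upper bound covers value, or None if out of range."""
    if value < 0:
        return None
    for band in bands:
        if value <= band[1]:
            return band
    return None


print(find_band(27.0, PM25_BANDS))   # (15.5, 35.4, 51, 100, 'Moderate')
print(find_band(600.0, PM25_BANDS))  # None
```

This mirrors the `if`/`elif` cascade used to build the dictionaries, so a reading between two grid points is assigned to the same range the cascade would pick.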

In [9]:
def sub_index(value, table):
    """Linearly interpolate a concentration to its AQI using the matching breakpoint entry."""
    if math.isnan(value):
        return None
    bp = table[value]
    slope = (bp["AQI Max"] - bp["AQI Min"]) / (bp["Max"] - bp["Min"])
    return slope * (value - bp["Min"]) + bp["AQI Min"]

aqi = []
type_ = []
for i in range(len(aq)):
    o3 = sub_index(aq[" o3"][i], dict_o3)
    pm25 = sub_index(aq[" pm25"][i], dict_pm25)
    if o3 is None and pm25 is None:
        # No reading for either pollutant
        aqi.append(None)
        type_.append(None)
    elif o3 is None or (pm25 is not None and pm25 >= o3):
        # PM2.5 is the only reading or the larger sub-index (ties go to PM2.5)
        aqi.append(round(pm25, 0))
        type_.append("pm25")
    else:
        # O3 is the only reading or the larger sub-index
        aqi.append(round(o3, 0))
        type_.append("o3")

In [10]:
aq["AQI"] = aqi
aq["Type"] = type_

In [11]:
# Replace month numbers with month names (e.g. "10" -> "October").
# Assigning the whole column avoids the chained-assignment warnings a row-by-row loop raises.
aq["Month"] = aq["Month"].astype(int).map(lambda m: calendar.month_name[m])

In [None]:
aq.to_csv("gr_air_quality.csv")
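
By default, `to_csv` also writes the DataFrame index as an unnamed first column, which shows up as an extra column when the file is read back. Here is a small sketch on made-up data (the `demo` frame is hypothetical) showing the difference `index=False` makes:

```python
import io

import pandas as pd

# Hypothetical two-row frame standing in for the exported table.
demo = pd.DataFrame({"AQI": [42.0, 79.0], "Type": ["pm25", "pm25"]})

# Default behaviour: the index is written as an unnamed leading column.
with_index = io.StringIO()
demo.to_csv(with_index)
header_with_index = with_index.getvalue().splitlines()[0]

# index=False leaves the index out of the file.
without_index = io.StringIO()
demo.to_csv(without_index, index=False)
header_without_index = without_index.getvalue().splitlines()[0]

print(repr(header_with_index))     # ',AQI,Type'
print(repr(header_without_index))  # 'AQI,Type'
```

Whether to keep the index depends on whether the row numbers are useful in the downstream report.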

In [12]:
aq

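| Unnamed: 0 | date | pm25 | pm10 | o3 | no2 | so2 | co | Year | Month | Day | AQI | Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2014/10/1 | 13.0 | | 20.0 | | | | 2014 | October | 1 | 42.0 | pm25 |
| 1 | 2014/10/10 | 27.0 | | 20.0 | | 1 | 2 | 2014 | October | 10 | 79.0 | pm25 |
| 2 | 2014/10/11 | 31.0 | | 27.0 | | 1 | 1 | 2014 | October | 11 | 89.0 | pm25 |
| 3 | 2014/10/12 | 32.0 | | 32.0 | | 1 | 2 | 2014 | October | 12 | 92.0 | pm25 |
| 4 | 2014/10/13 | 33.0 | | 22.0 | | 1 | 1 | 2014 | October | 13 | 94.0 | pm25 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3417 | 2024/2/5 | 56.0 | 41 | 27.0 | 12 | | 5 | 2024 | February | 5 | 135.0 | pm25 |
| 3418 | 2024/2/6 | 63.0 | 22 | 19.0 | 8 | | 2 | 2024 | February | 6 | 146.0 | pm25 |
| 3419 | 2024/2/7 | 53.0 | 35 | 18.0 | 14 | | 4 | 2024 | February | 7 | 130.0 | pm25 |
| 3420 | 2024/2/8 | 71.0 | 34 | 36.0 | 8 | | 2 | 2024 | February | 8 | 154.0 | pm25 |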
