# Getting AQI Data

Next, I need to get AQI data for the city of Philadelphia for further analysis. For this, I will use the EPA Air Quality System (AQS) API at aqs.epa.gov. AQI stands for Air Quality Index, a measure of how clean or polluted the air is. Since different pollutants affect the environment and air quality in different ways, the AQI is used as a standardized way to measure air quality irrespective of the pollutant.

The AQI score is divided into six categories:
- 0-50: Good
- 51-100: Moderate
- 101-150: Unhealthy for sensitive groups
- 151-200: Unhealthy
- 201-300: Very Unhealthy
- 301-500: Hazardous

Refer to this website for more information: https://www.airnow.gov/aqi/aqi-basics/
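
As a quick illustration of the category table above, here is a minimal sketch that maps an AQI value to its label. The helper name `aqi_category` is hypothetical and not part of the course code, and treating values above 500 as Hazardous is my own assumption.

```python
# Upper bound of each AQI category, taken from the table above.
AQI_CATEGORIES = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for sensitive groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
    (500, "Hazardous"),
]


def aqi_category(aqi):
    """Return the category label for an AQI value (hypothetical helper)."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    for upper_bound, label in AQI_CATEGORIES:
        if aqi <= upper_bound:
            return label
    # Assumption: anything above 500 is still reported as Hazardous.
    return "Hazardous"


print(aqi_category(42))
print(aqi_category(175))
```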

### License
A lot of code used in this notebook was developed by Dr. David W. McDonald for use in DATA 512, a course in the UW MS Data Science degree program. 

So, this code by extension is provided under the [Creative Commons](https://creativecommons.org) [CC-BY license](https://creativecommons.org/licenses/by/4.0/). Revision 1.1 - August 16, 2024

### Attribution

Some code in this notebook was inspired by Raagul Nagendran, a student in the same program. Attributions are also provided in the code cells where the code was used.

----------------------------------


#### Below are a few basic packages required to get the data


In [7]:
# The JSON package is used for parsing JSON data from the API
import json, time

# requests is used to send HTTP requests to the API
import requests

# both polars and pandas are used for data manipulation and analysis
import polars as pl
import pandas as pd

# Import the custom constants module containing the credentials. It is not available in the repository.
# It defines AQI_USERID and AQI_KEY
import constants

In [8]:
UW_EMAIL = 'dabhinav@uw.edu'

In [9]:
API_REQUEST_URL = 'https://aqs.epa.gov/data/api'
API_ACTION_SIGNUP = '/signup?email={email}'
#
#    List actions provide information on API parameter values that are required by some other actions/requests
API_ACTION_LIST_CLASSES = '/list/classes?email={email}&key={key}'
API_ACTION_LIST_PARAMS = '/list/parametersByClass?email={email}&key={key}&pc={pclass}'
API_ACTION_LIST_SITES = '/list/sitesByCounty?email={email}&key={key}&state={state}&county={county}'
#
#    Monitor actions are requests for monitoring stations that meet specific criteria
API_ACTION_MONITORS_COUNTY = '/monitors/byCounty?email={email}&key={key}&param={param}&bdate={begin_date}&edate={end_date}&state={state}&county={county}'
API_ACTION_MONITORS_BOX = '/monitors/byBox?email={email}&key={key}&param={param}&bdate={begin_date}&edate={end_date}&minlat={minlat}&maxlat={maxlat}&minlon={minlon}&maxlon={maxlon}'
#
#    Summary actions are requests for summary data. These are for daily summaries
API_ACTION_DAILY_SUMMARY_COUNTY = '/dailyData/byCounty?email={email}&key={key}&param={param}&bdate={begin_date}&edate={end_date}&state={state}&county={county}'
API_ACTION_DAILY_SUMMARY_BOX = '/dailyData/byBox?email={email}&key={key}&param={param}&bdate={begin_date}&edate={end_date}&minlat={minlat}&maxlat={maxlat}&minlon={minlon}&maxlon={maxlon}'
#
#    It is always nice to be respectful of a free data resource.
#    We're going to observe a 100 requests per minute limit - which is fairly nice
API_LATENCY_ASSUMED = 0.002       # Assuming roughly 2ms latency on the API and network
API_THROTTLE_WAIT = (60.0/100.0)-API_LATENCY_ASSUMED   # seconds between requests: 60s / 100 requests, minus latency
#
#
#    This is a template that covers most of the parameters for the actions we might take, from the set of actions
#    above. In the examples below, most of the time parameters can either be supplied as individual values to a
#    function - or they can be set in a copy of the template and passed in with the template.
#
AQS_REQUEST_TEMPLATE = {
    "email":      "",
    "key":        "",
    "state":      "",     # the two digit state FIPS # as a string
    "county":     "",     # the three digit county FIPS # as a string
    "begin_date": "",     # the start of a time window in YYYYMMDD format
    "end_date":   "",     # the end of a time window in YYYYMMDD format, begin_date and end_date must be in the same year
    "minlat":    0.0,
    "maxlat":    0.0,
    "minlon":    0.0,
    "maxlon":    0.0,
    "param":     "",     # a list of comma separated 5 digit codes, max 5 codes requested
    "pclass":    ""      # parameter class is only used by the List calls
}
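
To make the role of the template concrete, here is a minimal sketch of how one of the action strings above is turned into a full request URL. The two constants are copied from the cell above so the snippet runs on its own; the email and key are placeholders I made up for illustration, not real credentials.

```python
# Copies of two constants from the cell above, so this sketch is self-contained.
API_REQUEST_URL = 'https://aqs.epa.gov/data/api'
API_ACTION_LIST_SITES = '/list/sitesByCounty?email={email}&key={key}&state={state}&county={county}'

# Placeholder values (assumptions for illustration only).
example_request = {
    "email": "user@example.com",
    "key": "EXAMPLEKEY",
    "state": "42",     # two digit state FIPS
    "county": "101",   # three digit county FIPS
    "param": "",       # keys the action does not mention are ignored by str.format
}

# Each {name} placeholder in the action string is filled from the dictionary.
example_url = API_REQUEST_URL + API_ACTION_LIST_SITES.format(**example_request)
print(example_url)
```

Because `str.format` ignores unused keyword arguments, the same template dictionary can be passed to every action, even when an action only uses a few of its fields.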



To call the API above, we need to register an email address and obtain an API key, which together act as the username and password. The request_signup function below implements just that.

In [10]:
#
#    This implements the sign-up request. The parameters are standardized so that this function definition matches
#    all of the others. However, the easiest way to call this is to simply call this function with your preferred
#    email address.
#
def request_signup(email_address = None,
                   endpoint_url = API_REQUEST_URL,
                   endpoint_action = API_ACTION_SIGNUP,
                   request_template = AQS_REQUEST_TEMPLATE,
                   headers = None):

    # Make sure we have a string - if you don't have access to this email address, things might go badly for you
    if email_address:
        request_template['email'] = email_address

    if not request_template['email']:
        raise Exception("Must supply an email address to call 'request_signup()'")

    if '@' not in request_template['email']:
        raise Exception(f"Must supply an email address to call 'request_signup()'. The string '{request_template['email']}' does not look like an email address.")

    # Compose the signup url - create a request URL by combining the endpoint_url with the parameters for the request
    request_url = endpoint_url+endpoint_action.format(**request_template)

    # make the request
    try:
        # Wait first, to make sure we don't exceed a rate limit in the situation where an exception occurs
        # during the request processing - throttling is always a good practice with a free data source
        if API_THROTTLE_WAIT > 0.0:
            time.sleep(API_THROTTLE_WAIT)
        response = requests.get(request_url, headers=headers)
        json_response = response.json()
    except Exception as e:
        print(e)
        json_response = None
    return json_response




The code below is commented out, since request_signup needs to be run only once to obtain the API key.

In [11]:
#
#    A SIGNUP request is only to be done once, to request a key. A key is sent to that email address and needs to be confirmed with a click through
#    This code should probably be commented out after you've made your key request to make sure you don't accidentally make a new sign-up request
#
# print("Requesting SIGNUP ...")
# response = request_signup(UW_EMAIL)
# print(json.dumps(response, indent=4))

In [12]:
#
#    This implements the list request. There are several versions of the list request that only require email and key.
#    This code sets the default action/requests to list the groups or parameter class descriptors. Having those descriptors
#    allows one to request the individual (proprietary) 5 digit codes for individual air quality measures by using the
#    param request. Some code in later cells will illustrate those requests.
#
def request_list_info(email_address = None, key = None,
                      endpoint_url = API_REQUEST_URL,
                      endpoint_action = API_ACTION_LIST_CLASSES,
                      request_template = AQS_REQUEST_TEMPLATE,
                      headers = None):

    #  Make sure we have email and key - at least
    #  This prioritizes the info from the call parameters - not what's already in the template
    if email_address:
        request_template['email'] = email_address
    if key:
        request_template['key'] = key

    # For the basic request we need an email address and a key
    if not request_template['email']:
        raise Exception("Must supply an email address to call 'request_list_info()'")
    if not request_template['key']:
        raise Exception("Must supply a key to call 'request_list_info()'")

    # compose the request
    request_url = endpoint_url+endpoint_action.format(**request_template)

    # make the request
    try:
        # Wait first, to make sure we don't exceed a rate limit in the situation where an exception occurs
        # during the request processing - throttling is always a good practice with a free data source
        if API_THROTTLE_WAIT > 0.0:
            time.sleep(API_THROTTLE_WAIT)
        response = requests.get(request_url, headers=headers)
        json_response = response.json()
    except Exception as e:
        print(e)
        json_response = None
    return json_response
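
One thing to be aware of: the functions above write their call arguments into whatever dictionary is passed as `request_template`, and the default is the shared `AQS_REQUEST_TEMPLATE`. The cells below avoid this by passing a `.copy()`. A minimal sketch of an alternative, assuming a hypothetical helper `build_request` that is not part of the original course code, would be to always build a fresh dictionary:

```python
def build_request(template, **overrides):
    """Return a new request dict; the template itself is left untouched.

    Hypothetical helper, not part of the original course code.
    """
    request = dict(template)   # a shallow copy is enough: all values are str or float
    for name, value in overrides.items():
        if name not in request:
            raise KeyError(f"Unknown request field: {name!r}")
        if value is not None:  # None means "keep the template value"
            request[name] = value
    return request


# Small stand-in template and placeholder credentials (illustration only).
base = {"email": "", "key": "", "pclass": ""}
req = build_request(base, email="user@example.com", key="EXAMPLEKEY")
print(base["email"] == "", req["email"])
```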



In [13]:
#
#   The default should get us a list of the various groups or classes of sensors. These classes are user defined names for clusters of
#   sensors that might be part of a package or default air quality sensing station. We need a class name to start getting down to
#   a sensor ID. Each sensor type has an ID number. We'll eventually need those ID numbers to be able to request values that come from
#   that specific sensor.
#
AQI_PARAM_CLASS = "AQI POLLUTANTS"

request_data = AQS_REQUEST_TEMPLATE.copy()
request_data['email'] = constants.AQI_USERID
request_data['key'] = constants.AQI_KEY
request_data['pclass'] = AQI_PARAM_CLASS

response = request_list_info(request_template=request_data, endpoint_action=API_ACTION_LIST_PARAMS)

if response["Header"][0]['status'] == "Success":
    print(json.dumps(response['Data'],indent=4))
else:
    print(json.dumps(response,indent=4))


[
    {
        "code": "42101",
        "value_represented": "Carbon monoxide"
    },
    {
        "code": "42401",
        "value_represented": "Sulfur dioxide"
    },
    {
        "code": "42602",
        "value_represented": "Nitrogen dioxide (NO2)"
    },
    {
        "code": "44201",
        "value_represented": "Ozone"
    },
    {
        "code": "81102",
        "value_represented": "PM10 Total 0-10um STP"
    },
    {
        "code": "88101",
        "value_represented": "PM2.5 - Local Conditions"
    },
    {
        "code": "88502",
        "value_represented": "Acceptable PM2.5 AQI & Speciation Mass"
    }
]
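
The request functions above return `None` when an exception occurs, in which case indexing `response["Header"]` would itself fail. A minimal defensive sketch, assuming a hypothetical helper `response_succeeded` and the `Header`/`status` layout used in the cell above:

```python
def response_succeeded(response):
    """Return True only if the response's first header reports 'Success'.

    Hypothetical helper; it assumes the layout used above, where
    response["Header"] is a list whose first item has a "status" field.
    """
    if not isinstance(response, dict):
        return False   # covers the None returned after a failed request
    header = response.get("Header")
    if not isinstance(header, list) or not header:
        return False
    first = header[0]
    return isinstance(first, dict) and first.get("status") == "Success"


print(response_succeeded(None))
print(response_succeeded({"Header": [{"status": "Success"}], "Data": []}))
```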


In [15]:
#
#   Given the set of sensor codes, now we can create a parameter list or 'param' value as defined by the AQS API spec.
#   It turns out that we want all of these measures for AQI, but we need to have two different param constants to get
#   all seven of the code types. We can only have a max of 5 sensors/values request per param.
#
#   Gaseous AQI pollutants CO, SO2, NO2, and O3
AQI_PARAMS_GASEOUS = "42101,42401,42602,44201"
#
#   Particulate AQI pollutants PM10, PM2.5, and Acceptable PM2.5
AQI_PARAMS_PARTICULATES = "81102,88101,88502"
#
#
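
The notebook splits the seven codes by pollutant type (gaseous and particulate). As a more general illustration of the five-code limit described above, here is a minimal sketch that splits any list of codes into valid `param` strings. The helper name `chunk_param_codes` is hypothetical.

```python
def chunk_param_codes(codes, max_per_request=5):
    """Split codes into comma separated 'param' strings of at most
    max_per_request codes each (hypothetical helper)."""
    if max_per_request < 1:
        raise ValueError("max_per_request must be at least 1")
    return [
        ",".join(codes[i:i + max_per_request])
        for i in range(0, len(codes), max_per_request)
    ]


# The seven AQI pollutant codes listed in the output above.
all_codes = ["42101", "42401", "42602", "44201", "81102", "88101", "88502"]
param_chunks = chunk_param_codes(all_codes)
print(param_chunks)
```

This simple chunking puts PM10 in the first group with the gases, so the notebook's manual split remains the better choice when the two pollutant types are analyzed separately.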

In [16]:
#
#   We'll use this city location in the examples below.
#
CITY_LOCATION = {
    "city" : "Philadelphia",
    "latlon" : [39.9526, -75.1652],
    'county' : 'Philadelphia',
    'state'  : 'Pennsylvania',
    'fips'   : '42101',
}

In [None]:
#
#  This list request should give us a list of all the monitoring stations in the county specified by the
#  CITY_LOCATION dictionary
#
request_data = AQS_REQUEST_TEMPLATE.copy()
request_data['email'] = constants.AQI_USERID
request_data['key'] = constants.AQI_KEY
request_data['state'] = CITY_LOCATION['fips'][:2]   # the first two digits (characters) of FIPS is the state code
request_data['county'] = CITY_LOCATION['fips'][2:]  # the last three digits (characters) of FIPS is the county code

response = request_list_info(request_template=request_data, endpoint_action=API_ACTION_LIST_SITES)

if response["Header"][0]['status'] == "Success":
    print(json.dumps(response['Data'],indent=4))
else:
    print(json.dumps(response,indent=4))
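
The slicing used above relies on the FIPS code being exactly five digits. A minimal sketch of the same split with validation, assuming a hypothetical helper `split_fips`:

```python
def split_fips(fips):
    """Split a five digit county FIPS string into (state, county) codes.

    Hypothetical helper; it mirrors the fips[:2] / fips[2:] slicing above.
    """
    if not (isinstance(fips, str) and len(fips) == 5 and fips.isdigit()):
        raise ValueError(f"Expected a five digit FIPS string, got {fips!r}")
    return fips[:2], fips[2:]


state_code, county_code = split_fips("42101")   # Philadelphia County, Pennsylvania
print(state_code, county_code)
```

Keeping the FIPS code as a string matters here, because an integer would lose any leading zero in the state code.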


In [18]:
#
#    This implements the daily summary request. Daily summary provides a daily summary value for each sensor being requested
#    from the start date to the end date.
#
#    Like the two other functions, this can be called with a mixture of a defined parameter dictionary, or with function
#    parameters. If function parameters are provided, those take precedence over any parameters from the request template.
#
def request_daily_summary(email_address = None, key = None, param=None,
                          begin_date = None, end_date = None, fips = None,
                          endpoint_url = API_REQUEST_URL,
                          endpoint_action = API_ACTION_DAILY_SUMMARY_COUNTY,
                          request_template = AQS_REQUEST_TEMPLATE,
                          headers = None):

    #  This prioritizes the info from the call parameters - not what's already in the template
    if email_address:
        request_template['email'] = email_address
    if key:
        request_template['key'] = key
    if param:
        request_template['param'] = param
    if begin_date:
        request_template['begin_date'] = begin_date
    if end_date:
        request_template['end_date'] = end_date
    if fips and len(fips)==5:
        request_template['state'] = fips[:2]
        request_template['county'] = fips[2:]
    # Make sure there are values that allow us to make a call - these are always required
    if not request_template['email']:
        raise Exception("Must supply an email address to call 'request_daily_summary()'")
    if not request_template['key']:
        raise Exception("Must supply a key to call 'request_daily_summary()'")
    if not request_template['param']:
        raise Exception("Must supply param values to call 'request_daily_summary()'")
    if not request_template['begin_date']:
        raise Exception("Must supply a begin_date to call 'request_daily_summary()'")
    if not request_template['end_date']:
        raise Exception("Must supply an end_date to call 'request_daily_summary()'")
    # Note we're not validating FIPS fields because not all of the daily summary actions require the FIPS numbers

    # compose the request
    request_url = endpoint_url+endpoint_action.format(**request_template)

    # make the request
    try:
        # Wait first, to make sure we don't exceed a rate limit in the situation where an exception occurs
        # during the request processing - throttling is always a good practice with a free data source
        if API_THROTTLE_WAIT > 0.0:
            time.sleep(API_THROTTLE_WAIT)

        response = requests.get(request_url, headers=headers)
        json_response = response.json()
    except Exception as e:
        print(e)
        json_response = None
    return json_response



Here, we call the request_daily_summary function separately for gaseous and particulate data, as required in the assignment. We only consider the dates from May 1 to October 31 of each year, for the years 1961 to 2024, as required in the assignment. If there are errors or data is not available for a specific year, we record that year in a separate dictionary, which we will look at in the next cells.

#### This code was inspired by the method used by Raagul Nagendran in his notebook. Though I have rewritten this code, I have used very similar logic here.

In [None]:

request_data = AQS_REQUEST_TEMPLATE.copy()
request_data['email'] = constants.AQI_USERID
request_data['key'] = constants.AQI_KEY
request_data['state'] = CITY_LOCATION['fips'][:2]
request_data['county'] = CITY_LOCATION['fips'][2:]

# request daily AQI summary data for 1961 to 2024 (May 1st to Oct 31st)
request_data['param'] = AQI_PARAMS_GASEOUS
START_YEAR = 1961
END_YEAR = 2024

gaseous_aqi = []
particulate_aqi = []
gaseous_errors = {}
particulate_errors = {}

for year in range(START_YEAR, END_YEAR + 1):
    year = str(year)
    print("Current Year is : "+str(year))
    request_data['begin_date'] = f"{year}0501"
    request_data['end_date'] = f"{year}1031"

    # Get Gaseous data
    gaseous_response = request_daily_summary(request_template=request_data)
    if gaseous_response and gaseous_response["Header"][0]['status'] == "Success":
        gaseous_aqi.extend(gaseous_response['Data'])
    else:
        gaseous_errors[year] = "No data / API errors"
    #print("Gaseous Data : "+str(gaseous_errors[year]))
    # Get Particulates data
    request_data_particulates = request_data.copy()
    request_data_particulates['param'] = AQI_PARAMS_PARTICULATES
    # print(request_data_particulates)
    particulate_response = request_daily_summary(request_template=request_data_particulates)
    if particulate_response and particulate_response["Header"][0]['status'] == "Success":
        particulate_aqi.extend(particulate_response['Data'])
    else:
        particulate_errors[year] = "No data / API errors"
    #print("Particulate Data : "+str(particulate_errors[year]))
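
Since begin_date and end_date must fall in the same year, the loop above issues one request per year and pollutant group. The date windows it builds can be sketched on their own as follows. The helper name `season_windows` is hypothetical.

```python
def season_windows(start_year, end_year, begin_mmdd="0501", end_mmdd="1031"):
    """Return one (begin_date, end_date) pair in YYYYMMDD format per year,
    inclusive of both start_year and end_year (hypothetical helper)."""
    return [
        (f"{year}{begin_mmdd}", f"{year}{end_mmdd}")
        for year in range(start_year, end_year + 1)
    ]


windows = season_windows(1961, 2024)
print(len(windows), windows[0], windows[-1])
```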




In [None]:
gaseous_errors.keys()

dict_keys(['1961'])

It looks like there is gaseous data all the way from 1962 until 2024. However, we are not sure how accurate these AQI values are because, according to the assignment document and my own reading about AQI, most government AQI data is only consistent from the late 1980s onward. I will keep this in mind during further analysis.

In [None]:
particulate_errors.keys()

dict_keys(['1961', '1962', '1963', '1964', '1965', '1966', '1967', '1968', '1969', '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977', '1978', '1979', '1980', '1981', '1982', '1983', '1984'])

It looks like there is particulate data from 1985 until 2024. This matches the AQI data being consistent from the late 1980s. I will keep this in mind during further analysis as well.
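
The coverage can also be derived from the error dictionaries instead of being read off by eye. A minimal sketch, assuming a hypothetical helper `years_with_data` and using the particulate error years shown in the output above:

```python
def years_with_data(start_year, end_year, error_years):
    """Return the years in [start_year, end_year] that are not listed as
    errors. error_years holds year strings, like the dictionary keys above.
    Hypothetical helper."""
    missing = {int(year) for year in error_years}
    return [year for year in range(start_year, end_year + 1) if year not in missing]


# Error years copied from the particulate output above (1961 through 1984).
particulate_missing = [str(year) for year in range(1961, 1985)]
covered = years_with_data(1961, 2024, particulate_missing)
print(covered[0], covered[-1], len(covered))
```

A year counts as covered whenever the API reported success, so this says nothing about how many daily readings that year actually contains.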

Below, we save the gaseous AQI and particulate AQI data to separate JSON files, to be retrieved later for analysis.

In [None]:
#write gaseous_aqi to a file
with open('./intermediate data files/gaseous_aqi.json', 'w') as f:
    json.dump(gaseous_aqi, f)

In [None]:
#write particulate_aqi to a file
with open('./intermediate data files/particulate_aqi.json', 'w') as f:
    json.dump(particulate_aqi, f)
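
The two cells above assume that the `intermediate data files` folder already exists; otherwise `open` raises an error. A minimal sketch of a save and load round trip that creates the folder when needed is shown below. The helper names `save_records` and `load_records` are hypothetical, the sample record is made up for illustration, and a temporary directory stands in for the project folder.

```python
import json
import tempfile
from pathlib import Path


def save_records(records, path):
    """Write records to a JSON file, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w") as f:
        json.dump(records, f)
    return path


def load_records(path):
    """Read back a list of records written by save_records."""
    with Path(path).open() as f:
        return json.load(f)


# Round trip in a temporary directory, using a made-up sample record.
sample = [{"date_local": "2024-05-01", "aqi": 42}]
with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "intermediate data files" / "example.json"
    save_records(sample, target)
    loaded = load_records(target)

print(loaded == sample)
```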