## AQS Data Fetching
----
After our preliminary investigation in the [previous notebook](preliminary_exploration.ipynb), we found that we can fetch and filter a large quantity of daily readings fairly easily, at least for carbon monoxide in a couple of states. Now we aim to fetch air quality data for all of the AQI-defined pollutants, for all 50 states (the AQS state list also includes the District of Columbia), over the period from 2013 through 2018.

To start, let's import the libraries we need.

```python
import pandas as pd
from pyaqs import AQSFetcher
from time import sleep
from random import random
```

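```python
import os

# Read the AQS API credentials from environment variables instead of hard-coding
# them in the notebook (AQS_EMAIL and AQS_KEY are names of our own choosing)
aqs_fetcher = AQSFetcher(os.environ['AQS_EMAIL'], os.environ['AQS_KEY'])
```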

```python
state_codes = aqs_fetcher.get_state_codes()
```

```python
aqi_parameters = aqs_fetcher.get_parameter_list_by_class('AQI POLLUTANTS')
aqi_parameters
```

|   | code  | parameter_description                  |
|---|-------|----------------------------------------|
| 0 | 42101 | Carbon monoxide                        |
| 1 | 42401 | Sulfur dioxide                         |
| 2 | 42602 | Nitrogen dioxide (NO2)                 |
| 3 | 44201 | Ozone                                  |
| 4 | 81102 | PM10 Total 0-10um STP                  |
| 5 | 88101 | PM2.5 - Local Conditions               |
| 6 | 88502 | Acceptable PM2.5 AQI & Speciation Mass |
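The fetching loop further down only prints bare parameter codes such as `42101`. A code-to-description mapping built once from this table makes those labels readable, and the same `dict(zip(...))` pattern could replace the per-iteration `state_codes.loc[...]` filter used to look up state names. This is a minimal sketch: the table is rebuilt from the output above under the hypothetical name `params_example` (so it does not overwrite `aqi_parameters`), and the codes are assumed to be integers; if pyaqs returns them as strings, the dictionary keys would be strings too. `describe` is likewise a hypothetical helper.

```python
import pandas as pd

# Stand-in for aqi_parameters, rebuilt from the table above under a different
# name so it does not overwrite the real frame. Codes are assumed to be integers.
params_example = pd.DataFrame({
    'code': [42101, 42401, 42602, 44201, 81102, 88101, 88502],
    'parameter_description': [
        'Carbon monoxide',
        'Sulfur dioxide',
        'Nitrogen dioxide (NO2)',
        'Ozone',
        'PM10 Total 0-10um STP',
        'PM2.5 - Local Conditions',
        'Acceptable PM2.5 AQI & Speciation Mass',
    ],
})

# Build the code -> description mapping once; each lookup is then a dict access
# rather than a boolean filter over the whole frame.
param_names = dict(zip(params_example.code, params_example.parameter_description))


def describe(code):
    """Return a readable label such as '42101 (Carbon monoxide)'."""
    return f'{code} ({param_names[code]})'


print(describe(42101))  # 42101 (Carbon monoxide)
```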


Alright, we now have everything we need to query the EPA AQS API. We can now loop over each of the states and request the daily data for these AQI pollutants. For ease of later use, we will also save the raw data as CSV files in the [`data`](./data/) folder. That way we can reload the raw data whenever we need it without querying the API again, which saves us time and reduces the load on the API maintainers' servers. Let's start by storing the date bounds in variables to keep the code readable.

```python
# Query window in YYYYMMDD format: 2013-01-01 through 2018-12-31
bdate = 20130101
edate = 20181231
```
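The caching rationale above can be made explicit with a small read-or-fetch helper: if the raw CSV for a query is already on disk it is loaded, and only otherwise is the API called and the result saved. This is a minimal sketch rather than part of the pipeline below. `load_or_fetch` and `fake_fetch` are hypothetical names; in this notebook `fetch` would be a zero-argument wrapper around `aqs_fetcher.daily_data_by_state(state_code, [param_code], bdate, edate)`, and we assume that call returns `None` when a search finds no data (the loop below only tells us the result lacks a `to_csv` method in that case).

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional

import pandas as pd


def load_or_fetch(csv_path: Path,
                  fetch: Callable[[], Optional[pd.DataFrame]]) -> Optional[pd.DataFrame]:
    """Return cached raw data from csv_path if it exists; otherwise call fetch(),
    save any data it returns to csv_path, and return that data."""
    if csv_path.exists():
        # A cached copy is on disk, so no API request is made
        return pd.read_csv(csv_path, index_col=0)
    data = fetch()
    if data is None:
        # The query returned nothing (assumed to be None), so nothing is cached
        return None
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    data.to_csv(csv_path)
    return data


# Demonstration with a hypothetical stand-in for the real API call
call_count = 0


def fake_fetch() -> pd.DataFrame:
    global call_count
    call_count += 1
    return pd.DataFrame({'value': [0.3, 0.4]})  # dummy data, not real AQS readings


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / '42101_state_01_raw.csv'
    first = load_or_fetch(path, fake_fetch)   # file missing: calls fake_fetch and saves
    second = load_or_fetch(path, fake_fetch)  # file present: reads the CSV instead

print(call_count)  # 1
```

One caveat: `pd.read_csv` infers column types, so zero-padded codes such as state `01` come back as integers unless a `dtype` mapping is passed when reading.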

Now we can loop over the codes in our state code list, fetch a raw dataframe for each state and parameter, and store each one as a CSV file.

```python
# Collect frames in lists and concatenate once, instead of growing a DataFrame inside the loop
param_frames = []

for param_code in aqi_parameters.code:
    state_frames = []

    for state_code in state_codes.code:
        state_name = state_codes.loc[state_codes.code == state_code, 'state_name'].values[0]
        new_raw_data = aqs_fetcher.daily_data_by_state(state_code, [param_code], bdate, edate)
        # Short random pause after every request so we don't hammer the API
        sleep(random() * 0.25)

        try:
            new_raw_data.to_csv(f'data/{param_code}_state_{state_code}_raw.csv')
        except AttributeError:
            # No data came back for this search (the result has no .to_csv);
            # that's fine, but there is nothing to save
            print(f'No data found for {state_code} ({state_name}) -- {param_code}')
            continue

        print(f'Saved {state_code} ({state_name}) -- {param_code}')
        state_frames.append(new_raw_data)

    # Combine and save all states for this parameter
    all_state_aq_data = pd.concat(state_frames) if state_frames else pd.DataFrame()
    all_state_aq_data.to_csv(f'data/{param_code}_all_states_raw.csv')
    param_frames.append(all_state_aq_data)

# Combine and save every parameter for every state
all_param_all_states_raw = pd.concat(param_frames)
all_param_all_states_raw.to_csv('data/aqi_parameters_all_states_raw.csv')
all_param_all_states_raw.head()
```

Saved 01 (Alabama) -- 42101
Saved 02 (Alaska) -- 42101
Saved 04 (Arizona) -- 42101
Saved 05 (Arkansas) -- 42101
Saved 06 (California) -- 42101
Saved 08 (Colorado) -- 42101
Saved 09 (Connecticut) -- 42101
Saved 10 (Delaware) -- 42101
Saved 11 (District Of Columbia) -- 42101
Saved 12 (Florida) -- 42101
Saved 13 (Georgia) -- 42101
Saved 15 (Hawaii) -- 42101
Saved 16 (Idaho) -- 42101
Saved 17 (Illinois) -- 42101
Saved 18 (Indiana) -- 42101
Saved 19 (Iowa) -- 42101
Saved 20 (Kansas) -- 42101
Saved 21 (Kentucky) -- 42101
Saved 22 (Louisiana) -- 42101
Saved 23 (Maine) -- 42101
Saved 24 (Maryland) -- 42101
Saved 25 (Massachusetts) -- 42101
Saved 26 (Michigan) -- 42101
Saved 27 (Minnesota) -- 42101
Saved 28 (Mississippi) -- 42101
Saved 29 (Missouri) -- 42101
Saved 30 (Montana) -- 42101
Saved 31 (Nebraska) -- 42101
Saved 32 (Nevada) -- 42101
Saved 33 (New Hampshire) -- 42101
Saved 34 (New Jersey) -- 42101
Saved 35 (New Mexico) -- 42101
Saved 36 (New York) -- 42101
Saved 37 (North Carolina) -- 42