## Data Analysis
This interactive notebook covers the analysis and visualization of the data, including automated hypothesis testing. First, we need to retrieve our data from the EPA website, so we start by importing the required libraries.

In [1]:
from pyaqs import AQSFetcher
import pandas as pd

Now we instantiate an AQSFetcher object and use it to retrieve the required data from the EPA website. For now, we focus on counties within Illinois, the state where we currently reside. To do so, we use some of the fetcher's custom methods to look up the identification codes for the locations and parameters we need.

Note that in this context, a *parameter* is a compound in the air that can be measured. The EPA defines many such parameters, sorted into classes whose descriptions are accessible through the API.

In [2]:
aqs_fetcher = AQSFetcher('bbjornstad.flatiron@gmail.com', 'ochrefox21')
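
The cell above passes the email address and API key as string literals. As a minimal sketch of an alternative, the credentials could be read from environment variables so that they stay out of the notebook. The variable names `AQS_EMAIL` and `AQS_KEY` and the helper `load_aqs_credentials` are hypothetical, chosen only for this example; the sketch assumes `AQSFetcher` takes the email and key as its two positional arguments, as in the cell above.

```python
import os


def load_aqs_credentials(env=os.environ):
    """Return (email, key) read from environment variables.

    AQS_EMAIL and AQS_KEY are hypothetical names chosen for this sketch.
    """
    try:
        return env["AQS_EMAIL"], env["AQS_KEY"]
    except KeyError as missing:
        # Report which variable is absent instead of a bare KeyError.
        raise RuntimeError(
            f"Set the {missing.args[0]} environment variable first"
        ) from None


# Possible usage in the notebook (assumes both variables are set in the shell):
# aqs_fetcher = AQSFetcher(*load_aqs_credentials())
```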

In [4]:
state_codes = aqs_fetcher.get_state_codes()
state_codes.head()

,code,state_name
0,1,Alabama
1,2,Alaska
2,4,Arizona
3,5,Arkansas
4,6,California


Let's store the code for Illinois in a variable for easy access.

In [6]:
il_code = state_codes.loc[state_codes.state_name == 'Illinois', 'code'].values[0]
il_code

'17'
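
The `.values[0]` lookup raises an `IndexError` if the name is misspelled and silently takes the first row if a name appears twice. A slightly more defensive version might look like the sketch below. The helper `lookup_code` is hypothetical, and the two-row table is only a stand-in for the real output of `get_state_codes()`.

```python
import pandas as pd


def lookup_code(table, name_column, name):
    """Return the single 'code' value whose name_column equals name."""
    matches = table.loc[table[name_column] == name, "code"]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one row for {name!r}, found {len(matches)}"
        )
    return matches.iloc[0]


# Two-row stand-in for state_codes; the real table comes from get_state_codes().
demo_states = pd.DataFrame(
    {"code": ["17", "18"], "state_name": ["Illinois", "Indiana"]}
)
demo_code = lookup_code(demo_states, "state_name", "Illinois")
```

The same helper would work for the county table by passing `"county_name"` as the column to match.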

And now we will get a list of codes for the counties within Illinois.

In [7]:
il_county_codes = aqs_fetcher.get_counties_by_state(il_code)
il_county_codes.head()

,code,county_name
0,1,Adams
1,3,Alexander
2,5,Bond
3,7,Boone
4,9,Brown
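
The stored Illinois code is the string `'17'`, and county codes appear later in this notebook as `001`, `019`, and so on, so the codes seem to be fixed-width strings even though the previews print them without leading zeros. FIPS state codes are two digits wide and county codes three. If the codes are ever parsed as integers (for example after a CSV round trip), the padding could be restored along these lines. The small table below is a stand-in built from the first three rows of the preview.

```python
import pandas as pd

# Stand-in for a county table whose codes were parsed as integers.
demo_counties = pd.DataFrame(
    {"code": [1, 3, 5], "county_name": ["Adams", "Alexander", "Bond"]}
)

# Restore the three-digit county width.
demo_counties["code"] = demo_counties["code"].astype(str).str.zfill(3)

# Restore the two-digit state width, then build five-digit state+county codes.
state_code = str(17).zfill(2)
full_fips = state_code + demo_counties["code"]
```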


Finally, let's take a look at the possible parameter classes and identify a set that seems reasonable for analysis.

In [9]:
aqs_fetcher.get_parameter_classes()

,class_name,class_description
0,AIRNOW MAPS,The parameters represented on AirNow maps (881...
1,ALL,Select all Parameters Available
2,AQI POLLUTANTS,Pollutants that have an AQI Defined
3,CORE_HAPS,Urban Air Toxic Pollutants
4,CRITERIA,Criteria Pollutants
5,CSN DART,List of CSN speciation parameters to populate ...
6,FORECAST,Parameters routinely extracted by AirNow (STI)
7,HAPS,Hazardous Air Pollutants
8,IMPROVE CARBON,IMPROVE Carbon Parameters
9,IMPROVE_SPECIATION,PM2.5 Speciated Parameters Measured at IMPROVE...


We are most interested in the parameters in the CRITERIA class. These are the criteria air pollutants, the pollutants for which the EPA sets National Ambient Air Quality Standards (NAAQS), which makes them a reasonable basis for assessing overall air quality.

In [11]:
parameter_codes = aqs_fetcher.get_parameter_list_by_class('CRITERIA')
parameter_codes

,code,parameter_description
0,14129,Lead (TSP) LC
1,42101,Carbon monoxide
2,42401,Sulfur dioxide
3,42602,Nitrogen dioxide (NO2)
4,44201,Ozone
5,81102,PM10 Total 0-10um STP
6,85129,Lead PM10 LC FRM/FEM
7,88101,PM2.5 - Local Conditions


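Fantastic, these codes let us easily partition and query the data we need to continue with the analysis. Next, we request the annual data for Illinois across all of the criteria parameters, from January 1, 2012 through December 31, 2016.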

In [13]:
il_aq_data = aqs_fetcher.annual_data_by_state(il_code, parameter_codes.code, 20120101, 20161231)
il_aq_data.head()

,state_code,county_code,site_number,parameter_code,poc,latitude,longitude,datum,parameter,sample_duration,...,fiftieth_percentile,tenth_percentile,local_site_name,site_address,state,county,city,cbsa_code,cbsa,date_of_last_change
0,17,115,110,14129,1,39.862576,-88.940748,WGS84,Lead (TSP) LC,24 HOUR,...,0.02,0.01,MUELLER,1226 E. GARFIELD,Illinois,Macon,Decatur,19500,"Decatur, IL",2013-06-28
1,17,115,110,14129,1,39.862576,-88.940748,WGS84,Lead (TSP) LC,24 HOUR,...,0.011,0.004,MUELLER,1226 E. GARFIELD,Illinois,Macon,Decatur,19500,"Decatur, IL",2014-02-25
2,17,115,110,14129,1,39.862576,-88.940748,WGS84,Lead (TSP) LC,24 HOUR,...,0.012,0.003,MUELLER,1226 E. GARFIELD,Illinois,Macon,Decatur,19500,"Decatur, IL",2015-03-18
3,17,115,110,14129,1,39.862576,-88.940748,WGS84,Lead (TSP) LC,24 HOUR,...,0.012,0.004,MUELLER,1226 E. GARFIELD,Illinois,Macon,Decatur,19500,"Decatur, IL",2016-01-19
4,17,115,110,14129,1,39.862576,-88.940748,WGS84,Lead (TSP) LC,24 HOUR,...,0.008,0.004,MUELLER,1226 E. GARFIELD,Illinois,Macon,Decatur,19500,"Decatur, IL",2017-02-02
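
The request above spans five years and eight parameter codes. To our understanding, the AQS Data Mart API requires the begin and end dates of a single request to fall in the same calendar year and accepts at most five parameter codes per request; whether `annual_data_by_state` handles this internally is not shown here. Under that assumption, a wrapper would need to split the request roughly as sketched below. The helper names `year_windows` and `chunked` are hypothetical.

```python
def year_windows(bdate, edate):
    """Split a YYYYMMDD range into (bdate, edate) pairs within one year each."""
    bdate, edate = str(bdate), str(edate)
    first_year, last_year = int(bdate[:4]), int(edate[:4])
    windows = []
    for year in range(first_year, last_year + 1):
        # Keep the caller's exact dates at the two ends of the range.
        start = bdate if year == first_year else f"{year}0101"
        end = edate if year == last_year else f"{year}1231"
        windows.append((start, end))
    return windows


def chunked(codes, size=5):
    """Yield successive lists of at most `size` parameter codes."""
    codes = list(codes)
    for i in range(0, len(codes), size):
        yield codes[i:i + size]


windows = year_windows(20120101, 20161231)

# The eight CRITERIA codes listed earlier in this notebook.
criteria_codes = ["14129", "42101", "42401", "42602",
                  "44201", "81102", "85129", "88101"]
batches = list(chunked(criteria_codes))
```

Each combination of one window and one batch would then correspond to a single API request, and the resulting frames could be concatenated.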


In [25]:
il_aq_data.columns

Index(['state_code', 'county_code', 'site_number', 'parameter_code', 'poc',
       'latitude', 'longitude', 'datum', 'parameter', 'sample_duration',
       'pollutant_standard', 'metric_used', 'method', 'year',
       'units_of_measure', 'event_type', 'observation_count',
       'observation_percent', 'validity_indicator', 'valid_day_count',
       'required_day_count', 'exceptional_data_count',
       'null_observation_count', 'primary_exceedance_count',
       'secondary_exceedance_count', 'certification_indicator',
       'arithmetic_mean', 'standard_deviation', 'first_max_value',
       'first_max_datetime', 'second_max_value', 'second_max_datetime',
       'third_max_value', 'third_max_datetime', 'fourth_max_value',
       'fourth_max_datetime', 'first_max_nonoverlap_value',
       'first_max_n_o_datetime', 'second_max_nonoverlap_value',
       'second_max_n_o_datetime', 'ninety_ninth_percentile',
       'ninety_eighth_percentile', 'ninety_fifth_percentile',
       'ninetieth_perc

In [27]:
# Select the numeric columns before aggregating so text columns are never averaged.
il_aq_data.groupby(['county_code', 'parameter'])[['arithmetic_mean', 'standard_deviation']].mean()

county_code,parameter,arithmetic_mean,standard_deviation
001,Ozone,0.042986,0.010343
019,Carbon monoxide,0.139411,0.061764
019,Ozone,0.044802,0.011319
019,PM2.5 - Local Conditions,8.457767,4.246211
019,Sulfur dioxide,1.286831,3.203973
...,...,...,...
197,PM2.5 - Local Conditions,8.351912,4.332240
201,Carbon monoxide,0.396856,0.176690
201,Lead (TSP) LC,0.027988,0.041999
201,Ozone,0.042892,0.011944
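
To make the grouping step easier to follow, here is a minimal sketch on a toy frame. The numbers are made up for illustration and are not EPA data. Selecting the two numeric columns before calling `mean()` keeps text columns such as `units_of_measure` out of the aggregation, and `unstack` then pivots the parameters into columns so that counties can be compared side by side.

```python
import pandas as pd

# Made-up values for illustration only; not taken from the EPA data.
demo = pd.DataFrame({
    "county_code": ["001", "001", "019", "019"],
    "parameter": ["Ozone", "Ozone", "Ozone", "Carbon monoxide"],
    "arithmetic_mean": [0.040, 0.044, 0.045, 0.140],
    "standard_deviation": [0.010, 0.012, 0.011, 0.060],
    "units_of_measure": ["ppm", "ppm", "ppm", "ppm"],
})

# Average each numeric column within every (county, parameter) group.
summary = (
    demo.groupby(["county_code", "parameter"])[
        ["arithmetic_mean", "standard_deviation"]
    ].mean()
)

# One row per county, one column per parameter; missing pairs become NaN.
wide = summary["arithmetic_mean"].unstack("parameter")
```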
