### Air Pollution Study Using EPA Data

The study below compares air quality in the U.S. in 2004 and 2014 using selected data from the EPA, available at http://aqsdr1.epa.gov/aqsweb/aqstmp/airdata/download_files.html. Specifically, we look at PM 2.5 data, which covers particulate matter 2.5 microns or smaller in diameter, also called fine particle pollution.

The goal is to see how pollution levels have changed during the period of interest, both nationwide and for individual states.


In [15]:
import numpy as np
import pandas as pd

# For 2004, starting at row 150656, the State Code switches from an integer to 'CC', 
# indicating Canada, so we need to skip those. For both years, we'll parse the
# dates, which are at column 11.
skip = range(150656, 150771)
data2004 = pd.read_csv('Data/daily_88101_2004.csv', skiprows=skip, parse_dates=[11])
data2014 = pd.read_csv('Data/daily_88101_2014.csv', parse_dates=[11])
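
Hard-coding the row range works for this particular file, but it breaks if the file layout changes. As a rough alternative sketch, the non-numeric State Code rows can be filtered out after loading. The helper name `read_us_rows` and the tiny sample below are hypothetical, and the sketch assumes the date column at position 11 is the one named `Date Local`.

```python
import io
import pandas as pd

def read_us_rows(source):
    """Read a daily EPA file and keep only rows with a numeric State Code."""
    # Read State Code as text so mixed values such as 'CC' parse cleanly.
    df = pd.read_csv(source, dtype={'State Code': str}, parse_dates=['Date Local'])
    # Non-numeric codes (e.g. 'CC') become NaN here and are filtered out.
    is_numeric = pd.to_numeric(df['State Code'], errors='coerce').notna()
    df = df[is_numeric].copy()
    df['State Code'] = df['State Code'].astype(int)
    return df

# Made-up sample for illustration only (not real EPA data).
sample = io.StringIO(
    "State Code,Date Local,AQI\n"
    "01,2004-01-01,40\n"
    "CC,2004-01-01,55\n"
)
us_only = read_us_rows(sample)
```

With this approach the same function could be applied to both years' files without knowing in advance where the non-U.S. rows start.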

In [30]:
# Create subsets of the data with just the columns we care about.
cols = ['State Code', 'County Code', 'Site Num', 'Latitude', 'Longitude', 'Date Local',
        'AQI', 'State Name', 'County Name', 'City Name']
d2004 = data2004.loc[:, cols]
d2014 = data2014.loc[:, cols]

The feature we are most interested in is the Air Quality Index (AQI), a unitless integer index that starts at 0 and can exceed 300. It is derived from the measured PM 2.5 concentration, which is the quantity reported in micrograms per cubic meter. Here is how the EPA defines the air quality for ranges of AQI:
- 0–50: "Good"
- 51–100: "Moderate"
- 101–150: "Unhealthy for Sensitive Groups"
- 151–200: "Unhealthy"
- 201–300: "Very Unhealthy"
- Above 300: "Hazardous"
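
The ranges above can be turned into labels with `pd.cut`. This is a minimal sketch, not part of the original analysis: the function name `aqi_category` and the sample values are hypothetical, and the bin edges simply restate the list above.

```python
import pandas as pd

# Upper bounds of each EPA range listed above; the last bin is open-ended.
bins = [-1, 50, 100, 150, 200, 300, float('inf')]
labels = ['Good', 'Moderate', 'Unhealthy for Sensitive Groups',
          'Unhealthy', 'Very Unhealthy', 'Hazardous']

def aqi_category(aqi):
    """Map a Series of AQI values to EPA category labels."""
    # Intervals are right-inclusive, so 50 is 'Good' and 51 is 'Moderate'.
    return pd.cut(aqi, bins=bins, labels=labels)

# Illustrative values only, chosen to hit the bin boundaries.
sample_aqi = pd.Series([0, 50, 51, 150, 240, 503])
categories = aqi_category(sample_aqi)
```

Applied to a column such as `d2004['AQI']`, the result could be passed to `value_counts(normalize=True)` to see what share of readings falls in each category.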


In [35]:
# Create a summary dataframe of the AQI data for the two years
stats = ['mean', 'min', 'max']
AQI = pd.DataFrame([d2004['AQI'].describe().loc[stats],
                    d2014['AQI'].describe().loc[stats]], index=['2004', '2014'])
AQI

|      | mean      | min | max |
|------|-----------|-----|-----|
| 2004 | 44.203364 | 0   | 503 |
| 2014 | 34.23122  | 0   | 240 |


Based on this cursory summary alone, air quality appears to have improved over the ten years: the mean AQI fell from about 44.2 to 34.2, and the maximum from 503 to 240.