
# Texas Air Quality Notebook

**Welcome to our notebook on Texas Air Quality!** 

In this notebook we will look at Air Quality Index (AQI) scores across Texas. With so many pollutants in the air, especially as we head into the annual fire season, AQI becomes something we check daily. For many of us, the AQI map below is all too familiar. Throughout this module we will discuss how data can be used to visualize and uncover underlying trends in the world.

**Let's get started!**

<p align="center">
  <img src="images/Texas_AQI" width="" height="" align="center">
</p>

<br>

## Introduction to Jupyter Notebook

Before we get started with the data, let's talk about what Jupyter Notebook is. This lab is set up in a Jupyter Notebook. Notebooks can contain anything from live code to written text, equations, and visualizations. The content of a notebook is written into rectangular sections called **cells**. 

#### Types of Cells
There are two types of cells in Jupyter: **code** cells and **markdown** cells. **Code cells**, as you can imagine, contain code in Python, the programming language that we will be using throughout this notebook. **Markdown cells**, such as this one, contain written text. You can select any cell by clicking on it once. 

#### Running Cells
'Running' a cell is similar to pressing 'Enter' on a calculator once you've typed in an expression; it computes all of the expressions contained within the cell.

To run a cell, you can do one of the following:

- press **Shift + Enter**
- click the **Run** button on the top tool bar

Running a markdown cell renders its text in the notebook, and running a code cell evaluates the code and displays its output under the cell. 

Let's try it! **Run the code cell below.**

In [167]:
print("Hello World!")

Hello World!


#### Editing and Saving

- To **edit** a cell, simply double click on the desired cell and begin typing. The cell that you are currently working in will be highlighted by a green box.
- To **save** the notebook, either press *Ctrl + S* or navigate to the "File" dropdown and select "Save and Checkpoint".

#### Adding Cells
You can add a cell by clicking <b><code>Insert > Insert Cell Below</code></b> and choosing the cell type in the drop-down menu. Try adding a cell below to type in your name!


#### Deleting Cells 
To delete a cell, click on the <b><code>scissors</code></b> icon in the top toolbar or select <b><code>Edit > Cut Cells</code></b>. Delete the cell below.

In [168]:
print("Delete this cell!")

Delete this cell!


**Important Tip**: Every time you open a Jupyter notebook, it is extremely important to run all the cells from the beginning in order for the notebook to work. 

Now that we have had a brief crash course on Jupyter Notebooks, let's dive into Texas AQI!

<br>

## Introduction to the Data <a class="anchor" id="2"></a>

In this notebook we will look at data collected from PurpleAir, a company that manages a network of air quality sensors. The data from these sensors is collected to create maps like the one displayed above, which give an intuitive visualization of the air quality in a specific region. In the dataframe below, you will find several metrics that help us do this.

**Run the cell below to import all the dependencies needed for this notebook!**

In [169]:
import matplotlib.pyplot as plt
import numpy as np
import purpleair
import folium
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed, interact_manual
from datetime import datetime
from IPython.display import clear_output

import widget1

<br>

# PurpleAir Data

Before we begin looking at data collected from PurpleAir sensors, let's first take a look at what a sensor is and what it measures. 


> Below is a picture of a real PurpleAir Air Quality Sensor. These sensors can be mounted either indoors or outdoors, and they track airborne particulate matter (PM) in real time using PMSX003 laser counters. Particulate matter can include things like dust, smoke, dirt and any other organic or inorganic particles in the air. With multiple sensors mounted in a region, PurpleAir can create a relatively accurate measure of AQI throughout the day as the air quality changes. 

For more information on how sensors work, take a look at the official PurpleAir website [here](https://www2.purpleair.com/community/faq#hc-what-do-the-numbers-on-the-purpleair-map-mean-1)!

<p align="center">
  <img src="images/purpleair-sensor-pm2.5.webp" width="" height="" align="center">
</p>

In order to work with the data, we need to pull it into our workspace. Fortunately, PurpleAir has created an API that allows users to pull in and work with their AQI data. In the code cell below we will import the purpleair API and use it to create a dataframe of data from all of the PurpleAir sensors (more than 23,000 when this notebook was last run)!

**Run the code cell below!**

In [170]:
from purpleair.network import SensorList
p = SensorList()
df = p.to_dataframe(sensor_filter='all',
                    channel='parent')

Initialized 23,164 sensors!


**Run the cell below to display the sensor dataframe!**

The dataframe below contains all the sensor data as of the latest update. It contains everything from the geographical latitude and longitude of each sensor to the last time that sensor measured airborne PM.

In [171]:
df

| id | parent | lat | lon | name | location_type | pm_2.5 | temp_f | temp_c | humidity | pressure | ... | last_update_check | created | uptime | is_owner | 10min_avg | 30min_avg | 1hour_avg | 6hour_avg | 1day_avg | 1week_avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 14633 | | 37.275561 | -121.964134 | Hazelwood canary | outside | 7.52 | 69.0 | 20.555556 | 33.0 | 1012.16 | ... | | | | False | 4.71 | 3.89 | 3.00 | 1.40 | 1.34 | 5.04 |
| 25999 | | 30.053808 | -95.494643 | Villages of Bridgestone AQI | outside | 11.25 | 84.0 | 28.888889 | 74.0 | 1007.73 | ... | | | | False | 11.67 | 10.87 | 9.90 | 7.63 | 8.16 | 13.65 |
| 14091 | | 37.883620 | -122.070087 | WC Hillside | outside | 0.75 | 68.0 | 20.000000 | 27.0 | 1006.78 | ... | | | | False | 1.15 | 1.10 | 0.87 | 2.57 | 5.08 | 6.48 |
| 108226 | | 38.573703 | -121.439113 | "C" Street Air Shelter | inside | 2.29 | 80.0 | 26.666667 | 23.0 | 1018.48 | ... | | | | False | 1.10 | 0.88 | 0.78 | 1.39 | 2.14 | 3.80 |
| 42073 | | 47.185173 | -122.176855 | #1 | outside | 4.23 | 57.0 | 13.888889 | 69.0 | 1002.01 | ... | | | | False | 5.03 | 5.39 | 5.20 | 3.62 | 4.39 | 5.67 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 64995 | | 36.691324 | 126.585255 | 한서대학교 | outside | 29.04 | 81.0 | 27.222222 | 41.0 | 1004.43 | ... | | | | False | 30.40 | 27.11 | 23.98 | 15.53 | 10.69 | 14.08 |
| 64093 | | 36.710720 | 126.548390 | 해미읍성 | outside | 37.18 | 81.0 | 27.222222 | 42.0 | 1014.64 | ... | | | | False | 35.11 | 32.19 | 30.19 | 20.68 | 12.90 | 14.51 |
| 29747 | | 36.761236 | 127.395300 | 화덕보건진료소 | outside | 46.08 | 83.0 | 28.333333 | 43.0 | 1004.70 | ... | | | | False | 46.00 | 44.13 | 42.29 | 29.37 | 18.57 | 18.59 |
| 98309 | | 36.718003 | 126.926841 | 화천1리마을회관 | outside | 40.91 | 87.0 | 30.555556 | 38.0 | 1009.79 | ... | | | | False | 41.10 | 42.30 | 41.23 | 27.15 | 15.72 | 15.81 |


Here is a breakdown of the dataframe above and what each column represents. 

`lat`: The latitude coordinate of the location

`lon`: The longitude coordinate of the location

`name`: The name of the location

`location_type`: The nature of the location (i.e., inside or outside)

`pm_2.5`: The level of fine particulate matter in the air of that location

`temp_f`: The temperature of the location in degrees Fahrenheit 

`temp_c`: The temperature of the location in degrees Celsius 

`humidity`: The humidity percentage of the location

`pressure`: The pressure index of the location (in millibars)

`last_seen`: The last seen date and timestamp in UTC

`model`: Model of the specific sensor

`flagged`: Whether or not the channel was marked as flagged (usually based on a fault)

`age`: Sensor data age (when data was last received)

`10min_avg`: Average PM 2.5 AQI over the last 10 minutes 

`30min_avg`: Average PM 2.5 AQI over the last 30 minutes

`1hour_avg`: Average PM 2.5 AQI over the last hour

`6hour_avg`: Average PM 2.5 AQI over the last 6 hours

`1day_avg`: Average PM 2.5 AQI over the last day 

`1week_avg`: Average PM 2.5 AQI over the last week
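
As a quick sanity check on the breakdown above, the `temp_c` values appear to be the standard Fahrenheit-to-Celsius conversion of `temp_f`. The sketch below is only an illustration and is not part of the purpleair library; the helper name `fahrenheit_to_celsius` is our own, and the sample temperatures are copied from the first rows of the dataframe output above.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9

# temp_f values copied from the first three rows of the sensor dataframe
for temp_f in [69.0, 84.0, 68.0]:
    temp_c = round(fahrenheit_to_celsius(temp_f), 6)
    print(temp_f, "F =", temp_c, "C")
```

The printed values line up with the `temp_c` column for those rows (20.555556, 28.888889 and 20.0), so the two temperature columns carry the same information in different units.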


<br>

### Airborne Particulate Matter (PM) 2.5 
While many of the column names are relatively straightforward, such as the "name" column (which displays the set name of the particular sensor) and the "location_type" column (which indicates whether it is an indoor or outdoor sensor), we would like to draw your attention to the "pm_2.5" column. 

>The "pm_2.5" column represents the level of fine airborne PM, in other words, airborne particles that have a diameter of 2.5 micrometers or less. At high levels, PM 2.5 particles can reduce visibility and cause the air to appear hazy. Tracking PM 2.5 is important because prolonged exposure to high levels of PM 2.5 particles can cause adverse health effects, and because PM 2.5 is one of the measures that the US Environmental Protection Agency (EPA) uses to calculate the local Air Quality Index (AQI).

**Run the cells below to take a look at PM 2.5 levels of a specific sensor over time!**

If you go to the PurpleAir website [here](https://map.purpleair.com/1/mAQI/a10/p604800/cC0#14.52/30.28196/-97.73198), it should navigate you to a map of the surrounding University of Texas area. If you click on the sensor currently located in the DKR Texas Memorial Stadium, you'll find that the name of that particular sensor is "PA_II_D8B6". 

Let's take a closer look at the UT Stadium Sensor! In the code cell below we will filter the dataframe by the sensor name ("PA_II_D8B6") to pick out the row that corresponds to the specific sensor we are looking for. 

**Run the cell below!**

In [172]:
df[df['name'] == "PA_II_D8B6"]

| id | parent | lat | lon | name | location_type | pm_2.5 | temp_f | temp_c | humidity | pressure | ... | last_update_check | created | uptime | is_owner | 10min_avg | 30min_avg | 1hour_avg | 6hour_avg | 1day_avg | 1week_avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 27395 | | 30.285069 | -97.732877 | PA_II_D8B6 | outside | 13.04 | 79.0 | 26.111111 | 77.0 | 992.27 | ... | | | | False | 12.95 | 11.6 | 11.25 | 8.62 | 9.73 | 10.17 |


<br>

The row above gives us loads of information on the state of the AQI in the UT Stadium at the present moment, but it would be nice to see the AQI information over time. In the cell below, we create a dataframe that contains information about the UT Stadium sensor roughly over the last 7 days. 

**Run the cell below to output the dataframe!**

In [173]:
## data from PA_II_D8B6 (UT Stadium Sensor) sensor from the past week
from purpleair.sensor import Sensor
se = Sensor(27395)
UT_stadium = se.parent.get_historical(weeks_to_get=1,thingspeak_field='secondary')
UT_stadium['Date'] = [i.date().strftime("%d-%b-%Y") for i in UT_stadium['created_at']]
UT_stadium

| entry_id | created_at | 0.3um/dl | 0.5um/dl | 1.0um/dl | 2.5um/dl | 5.0um/dl | 10.0um/dl | PM1.0 (CF=ATM) ug/m3 | PM10 (CF=ATM) ug/m3 | Date |
|---|---|---|---|---|---|---|---|---|---|---|
| 521919 | 2021-10-07 00:01:48+00:00 | 1774.91 | 490.60 | 60.25 | 2.95 | 0.00 | 0.00 | 9.05 | 12.27 | 07-Oct-2021 |
| 521920 | 2021-10-07 00:03:30+00:00 | 1859.14 | 516.03 | 79.39 | 5.86 | 1.36 | 0.00 | 9.36 | 14.66 | 07-Oct-2021 |
| 521921 | 2021-10-07 00:05:29+00:00 | 1865.60 | 516.08 | 68.40 | 4.80 | 0.70 | 0.70 | 9.22 | 14.42 | 07-Oct-2021 |
| 521922 | 2021-10-07 00:09:30+00:00 | 1705.64 | 482.73 | 62.18 | 2.73 | 1.56 | 0.76 | 8.80 | 12.93 | 07-Oct-2021 |
| 521923 | 2021-10-07 00:11:25+00:00 | 1698.47 | 477.30 | 65.44 | 3.44 | 0.53 | 0.11 | 8.61 | 12.42 | 07-Oct-2021 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 526200 | 2021-10-13 23:51:59+00:00 | 1495.66 | 432.41 | 80.55 | 5.71 | 0.86 | 0.64 | 7.32 | 13.05 | 13-Oct-2021 |
| 526201 | 2021-10-13 23:53:56+00:00 | 1545.51 | 448.20 | 90.63 | 9.49 | 2.37 | 0.37 | 7.53 | 14.32 | 13-Oct-2021 |
| 526202 | 2021-10-13 23:55:57+00:00 | 1485.98 | 418.81 | 73.17 | 8.66 | 0.69 | 0.28 | 7.07 | 12.95 | 13-Oct-2021 |
| 526203 | 2021-10-13 23:57:59+00:00 | 1537.31 | 428.65 | 73.13 | 6.98 | 0.73 | 0.18 | 7.07 | 12.24 | 13-Oct-2021 |


As you can see from the "created_at" column, a reading was recorded roughly every two minutes over the past 7 days. The dataframe also contains particle counts for PM particles of different diameters: 0.3, 0.5, 1.0, 2.5, 5.0 and 10.0 micrometers.
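
To check the sampling interval rather than eyeball it, we can compute the gap between consecutive timestamps. The sketch below uses only Python's standard library and the five "created_at" values copied from the first rows of the output above; the variable names are our own.

```python
from datetime import datetime

# "created_at" values copied from the first five rows of the dataframe output
timestamps = [
    "2021-10-07 00:01:48+00:00",
    "2021-10-07 00:03:30+00:00",
    "2021-10-07 00:05:29+00:00",
    "2021-10-07 00:09:30+00:00",
    "2021-10-07 00:11:25+00:00",
]

times = [datetime.fromisoformat(t) for t in timestamps]

# Seconds elapsed between each reading and the next one
gaps_in_seconds = [
    int((later - earlier).total_seconds())
    for earlier, later in zip(times, times[1:])
]
print(gaps_in_seconds)
```

Most gaps are close to 120 seconds, but the third one is about twice as long, which suggests that a reading is occasionally missing. This is why "roughly every two minutes" is the safer description.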

<br>

While this dataframe is useful, there are too many rows of data (~5000) to look at! Let's create a visualization of the PM 2.5 measure to get a better idea of how the AQI changed over time.


**Run the cell below to create the line plot.**

In [174]:
def f(date):
    # Keep only the readings recorded on the selected date
    day = UT_stadium.loc[UT_stadium['Date'] == date]
    fig = plt.figure(figsize=(20,3))
    plt.plot(day['created_at'], day["2.5um/dl"])
    plt.xlabel('Time')
    plt.ylabel('PM 2.5 Particle Count')
    plt.title('UT Stadium Sensor PM 2.5')
    
# Build a dropdown of the dates in the data; choosing one redraws the plot
interact(f, date = list(UT_stadium['Date'].unique()));

interactive(children=(Dropdown(description='date', options=('07-Oct-2021', '08-Oct-2021', '09-Oct-2021', '10-O…

The line plot above displays the time of day (for the selected date) along the x-axis and the PM 2.5 particle count along the y-axis.

<br>

**QUESTION: What trends do you notice about the line plot?**



*Your answer here*

<br>

While the line plot does show us a trend in the PM 2.5 count over time, we still have no clue how that translates to the AQI. The next section will discuss what AQI is and how it is calculated.

### Air Quality Index (AQI)
The AQI has 6 categories that air quality can fall into. Each category covers a range of index values on a scale from 0 to 500, and the index value is calculated from the region's PM 2.5 measure. The chart below is provided by the US Environmental Protection Agency (EPA) and shows the official AQI categories (the PM 2.5 breakpoints were revised in 2012). 

For more information on how the AQI is calculated, take a look at the AQI Factsheet provided by the EPA [here](https://www.epa.gov/sites/default/files/2016-04/documents/2012_aqi_factsheet.pdf)!

<p align="center">
  <img src="images/AQI-category.png" width="" height="" align="center">
</p>
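
To make the calculation concrete, here is a minimal sketch of how a PM 2.5 concentration can be turned into an AQI value by linear interpolation between breakpoints. It assumes the 2012 PM 2.5 breakpoints and a concentration in ug/m3; the function name `pm25_to_aqi` and the table name are our own. The official AQI is defined on a 24-hour average, so applying it to a single sensor reading, as in the example, is only an approximation.

```python
# (C_low, C_high, I_low, I_high, category) for PM 2.5 in ug/m3,
# assuming the breakpoints from the EPA's 2012 revision
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50, "Good"),
    (12.1, 35.4, 51, 100, "Moderate"),
    (35.5, 55.4, 101, 150, "Unhealthy for Sensitive Groups"),
    (55.5, 150.4, 151, 200, "Unhealthy"),
    (150.5, 250.4, 201, 300, "Very Unhealthy"),
    (250.5, 350.4, 301, 400, "Hazardous"),
    (350.5, 500.4, 401, 500, "Hazardous"),
]

def pm25_to_aqi(concentration):
    """Return (AQI value, category), or None if the value is off the scale."""
    # Truncate to one decimal place; the tiny offset guards against
    # floating-point error (e.g. 35.4 * 10 evaluating just below 354)
    c = int(concentration * 10 + 1e-9) / 10
    for c_low, c_high, i_low, i_high, category in PM25_BREAKPOINTS:
        if c_low <= c <= c_high:
            # Linear interpolation within the matching breakpoint range
            aqi = (i_high - i_low) / (c_high - c_low) * (c - c_low) + i_low
            return round(aqi), category
    return None

# 13.04 is the pm_2.5 reading shown for the UT Stadium sensor earlier
print(pm25_to_aqi(13.04))
```

For the UT Stadium reading of 13.04, the sketch returns an AQI of 53 in the "Moderate" category, just above the "Good" range that ends at a concentration of 12.0.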

Now that we know how sensors work, what they measure and how the AQI is calculated, let's see if we can create a visualization of the AQI that is a little closer to home!

First, let's find a group of sensors that are in Texas. The code cell below does just that. We use a range of latitude and longitude coordinates (a bounding box that covers most of the state) to decide whether to include or exclude a sensor. 

**Run the cell below to display a dataset of the sensors in this region, including those in and around Houston!**

In [175]:
TX_data = df.loc[(df["lat"] >= 25.9) & (df["lat"] <= 34.1) & (df["lon"] >= -104.9) & (df["lon"] <= -93.1)]
TX_data = TX_data[["lat", "lon", "name", "location_type", "pm_2.5", "temp_f", "humidity", "pressure"]]
TX_data

| id | lat | lon | name | location_type | pm_2.5 | temp_f | humidity | pressure |
|---|---|---|---|---|---|---|---|---|
| 25999 | 30.053808 | -95.494643 | Villages of Bridgestone AQI | outside | 11.25 | 84.0 | 74.0 | 1007.73 |
| 74265 | 29.939444 | -95.671354 | 16815 Flower Mist Lane | inside | 0.79 | 80.0 | 35.0 | 1008.20 |
| 30327 | 33.346098 | -96.355190 | 236 County Road 5020, Blue Ridge, Texas 75424-... | outside | 11.84 | | | |
| 83065 | 30.246557 | -98.065587 | 301 Barton Creek Dr | outside | 10.39 | 76.0 | 62.0 | 973.95 |
| 98139 | 29.818880 | -95.695667 | 3623 Shadow Trail, Houston, Texas 77084 | outside | 7.03 | 84.0 | 68.0 | 1008.27 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 34431 | 29.762515 | -95.465982 | WPI-2 | inside | 0.37 | 81.0 | 49.0 | 1007.15 |
| 27629 | 33.538329 | -101.781656 | Yellow House Canyon | outside | 7.91 | 66.0 | 25.0 | 904.14 |
| 65197 | 30.261005 | -97.770814 | Zilker #1 | outside | 18.41 | 77.0 | 69.0 | 994.71 |
| 52617 | 30.258937 | -97.764749 | Zilker neighborhood | outside | 18.47 | 77.0 | 76.0 | 993.51 |
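
The filter in the cell above is a bounding-box test: a sensor is kept only if its latitude and longitude both fall inside fixed limits. The sketch below restates that test as a plain Python function so it can be tried on individual coordinates. The function and constant names are our own, and the sample coordinates are copied from elsewhere in this notebook.

```python
# Limits copied from the filter in the code cell above
LAT_MIN, LAT_MAX = 25.9, 34.1
LON_MIN, LON_MAX = -104.9, -93.1

def in_bounding_box(lat, lon):
    """Return True if the point lies inside the latitude/longitude box."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX

print(in_bounding_box(30.285069, -97.732877))    # UT Stadium sensor
print(in_bounding_box(29.7604, -95.3698))        # Houston, TX
print(in_bounding_box(37.275561, -121.964134))   # "Hazelwood canary" sensor
```

The first two points pass the test and the third does not. Because the box spans roughly 8 degrees of latitude and 12 degrees of longitude, it keeps sensors well beyond Houston, such as the Zilker sensors listed in the output.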


<br>

Now that we have a smaller subset of data to work with, the next step is to use the PM 2.5 measures to assign each sensor to an AQI Index Category and corresponding color. 

**Run the cell below to create a new column in the dataframe that indicates each sensor's AQI color code.**

In [176]:
# Create a column that holds the AQI color code for each sensor,
# based on the PM 2.5 ranges of the AQI categories shown above.
# Each branch is only reached when the earlier ones failed, so checking
# the upper bound of each range is enough.
color_code = []
for i in TX_data["pm_2.5"].to_list():
    if i <= 12.0:
        color_code.append('green')
    elif i <= 35.4:
        color_code.append('yellow')
    elif i <= 55.4:
        color_code.append('orange')   
    elif i <= 150.4:
        color_code.append('red')
    elif i <= 250.4:
        color_code.append('purple')
    else:
        color_code.append('darkpurple')

TX_data['code'] = color_code
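
The intended PM 2.5 to color mapping can also be written as a lookup table, which makes the thresholds easier to check one value at a time. This is a sketch under our own assumptions, not the notebook's method: the names `UPPER_BOUNDS`, `COLORS` and `aqi_color` are hypothetical, and returning `None` for a missing reading is a design choice of this sketch.

```python
import math
from bisect import bisect_left

# Upper PM 2.5 bound of each category except the last, and the matching colors
UPPER_BOUNDS = [12.0, 35.4, 55.4, 150.4, 250.4]
COLORS = ['green', 'yellow', 'orange', 'red', 'purple', 'darkpurple']

def aqi_color(pm25):
    """Return the color code for a PM 2.5 reading, or None if it is missing."""
    if pm25 is None or math.isnan(pm25):
        return None
    # bisect_left finds the first upper bound that is >= the reading
    return COLORS[bisect_left(UPPER_BOUNDS, pm25)]

# pm_2.5 readings copied from the Texas dataframe output above
for reading in [11.25, 0.79, 18.41]:
    print(reading, aqi_color(reading))
```

With these readings, 11.25 and 0.79 map to green and 18.41 maps to yellow. Handling missing values separately keeps a sensor with no reading from being given a color it has not earned.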

<br>

Our last step is to use the latitude and longitude coordinates to map the location of each sensor with its corresponding AQI color!

In [177]:
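def show_map(Latitude, Longitude):
    # Center the map on the latitude and longitude chosen with the sliders
    m = folium.Map(width=600, height=500, location=[Latitude, Longitude])
    
    # Add one marker per sensor (every row, including the last one),
    # colored by the sensor's AQI color code
    for i in range(len(TX_data)):
        row = TX_data.iloc[i]
        folium.Marker(
            location=[row['lat'], row['lon']],
            popup=row['name'],
            icon=folium.Icon(color=row['code']),
        ).add_to(m)
    display(m)
    
# The function is named show_map so it does not shadow Python's built-in map()
interact(show_map, Latitude = (26, 34, 0.001) , Longitude = (-103, -93, 0.001));
## Houston, TX - Lat: 29.7604 / Lon: -95.3698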

interactive(children=(FloatSlider(value=30.0, description='Latitude', max=34.0, min=26.0, step=0.001), FloatSl…

Now that we have created a map, we can easily see what the AQI is across the region! 

<br>

**QUESTION: What do you notice about the map?**

*Your answer here*

<br>

Developed By: Ziyue Li, Melisa Esqueda, Maham Bawaney & Karalyn Chong