<img src="images/ProjectPythia_Logo_Final-01-Blue.svg" width=300 alt="Project Pythia Logo"></img>


This link is problematic...


`<img src="https://www.epa.gov/themes/epa_theme/images/epa-seal.svg" width = 300 alt="US EPA Logo">`


# Visualizing Data with EPA's Air Quality System (AQS) API

---

## Overview
Air quality data are an important aspect of both atmospheric and environmental sciences. Understanding the concentrations of particulate matter and chemical species like O<sub>3</sub> and NO<sub>x</sub> can be useful for air pollution analysis from both the physical science and health science perspectives.

In this notebook, we will cover:
1. Accessing data from the AQS
1. Exploring the format of the data
1. Preparing the data for visualization
1. Generating a timeseries plot of various data points


#### Actually doing the first two steps will properly inform what's written in the second two steps. We're just not there yet.

## Prerequisites

| Concepts | Importance | Notes |
| --- | --- | --- |
| [Introduction to Pandas](https://foundations.projectpythia.org/core/pandas/pandas.html) | Necessary | How to deal with dataframes and datasets |
| [Matplotlib Basics](https://foundations.projectpythia.org/core/matplotlib/matplotlib-basics.html) | Helpful | Skills for different plotting styles and techniques |
| Project management | Helpful | |

- **Time to learn**: 525,600 minutes (<b><i>estimation might not be accurate</i></b>)
- **System requirements**:
    - Turing Air Quality Environment Kernel (Or install the package from [GitHub](https://github.com/USEPA/pyaqsapi#readme))
    - Email address for AQS access (It's the government...)

---

## Imports


<div class="admonition alert alert-info">
    <p class="admonition-title" style="font-weight:bold">Info</p>
Here we'll import lots of stuff, but we might not end up using them all...
</div>

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
from datetime import date
from datetime import datetime
import numpy as np
import pyaqsapi as aqs

## Accessing data from the AQS

<div class="admonition alert alert-warning">
    <p class="admonition-title" style="font-weight:bold">Important:</p>
If you have previously registered an account with the AQS, now will be a good time to get that information out.

If not, you should have an email address in mind that you'd like to use.
</div>

### Register a new email with aqs_sign_up

Replace 'EMAIL' in the code below with the email address you want to use for your credentials, then run the cell.

In [None]:
aqs.aqs_sign_up('EMAIL')

<div class="admonition alert alert-success">
    <p class="admonition-title" style="font-weight:bold">Success</p>
A verification email should have been sent. Click the link in the email to verify your account.
</div>

#### Data can be pulled from the AQS in a number of different ways:
1. By Sample Site
1. By County
1. By State
1. By Lat/Long Box
1. By Monitoring Agency
1. By Primary Quality Assurance Organization
1. By Core Based Statistical Area (as defined by the US Census Bureau)

### Let's look at how the package deals with states...

In [None]:
aqs.aqs_states()

<div class="admonition alert alert-danger">
    <p class="admonition-title" style="font-weight:bold">Whoops!</p>
    You need to input your credentials before any of the functions will work!
</div>

### Use the aqs_credentials function to input your username (your email address) and access key.
#### This is all found in the email you received when verifying your email address.
If you've previously registered your address and do not have the key, you can generate a new key by resubmitting your email address with the aqs_sign_up function.

In [None]:
aqs.aqs_credentials(username='', key='')
#This notebook will fail a run test because there is no email address or credentials key in this cell.
#Please input your own information to proceed.
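
One way to keep credentials out of the notebook itself is to read them from environment variables. The sketch below is only an illustration: `AQS_EMAIL` and `AQS_KEY` are hypothetical variable names chosen here, and `load_aqs_credentials` is not part of the package. Only `aqs.aqs_credentials` comes from the cell above.

```python
import os

def load_aqs_credentials(env=None):
    """Return (username, key) read from environment variables.

    AQS_EMAIL and AQS_KEY are hypothetical names chosen for this sketch.
    """
    env = os.environ if env is None else env
    username = env.get("AQS_EMAIL", "").strip()
    key = env.get("AQS_KEY", "").strip()
    if not username or not key:
        raise RuntimeError("Set AQS_EMAIL and AQS_KEY before running this notebook.")
    return username, key

# Example with an explicit mapping instead of the real environment
username, key = load_aqs_credentials({"AQS_EMAIL": "me@example.com", "AQS_KEY": "abc123"})
# aqs.aqs_credentials(username=username, key=key)
```

With the variables set in your shell, calling `load_aqs_credentials()` with no arguments reads the real environment, and the notebook can be shared without exposing your key.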

### Let's look at those states now...

In [None]:
aqs.aqs_states()

##### Since states are specified by a numeric code, let's store the table in a variable that we can refer back to whenever we need a state's code.

Let's assume for now that we want to focus on New York, and save its code as a variable as well.

In [None]:
states = aqs.aqs_states()
NY = 36
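
State FIPS codes are conventionally written as two-digit strings, so states with codes below 10 carry a leading zero (Alabama is `01`). Here is a minimal sketch of a helper that normalizes either form. `to_fips` is a hypothetical name, not part of pyaqsapi, and whether every function in the package accepts a bare integer is an assumption this notebook has not checked.

```python
def to_fips(code):
    """Format a state FIPS code as a zero-padded two-digit string."""
    value = int(code)
    if not 1 <= value <= 99:
        raise ValueError(f"State FIPS codes have two digits, got {code!r}")
    return f"{value:02d}"

print(to_fips(36))   # New York
print(to_fips("1"))  # Alabama, padded to two digits
```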

<div class="admonition alert alert-danger">
    <p class="admonition-title" style="font-weight:bold">Everything Is Currently Numerical</p>
    It's important that we also address the fact that everything is input as a numerical value for pulling these data from the AQS.
</div>

Parameter Codes can be accessed from the EPA [here](https://aqs.epa.gov/aqsweb/documents/codetables/parameters.html), but to simplify things here are codes for a few common pollutants with defined Air Quality Index values you might be looking for...

| Pollutant | Parameter Code |
| --- | --- |
| Carbon Monoxide (CO) | 42101 |
| Nitrogen Dioxide (NO<sub>2</sub>) | 42602 |
| Ozone (O<sub>3</sub>) | 44201 |
| PM 10 (Total) | 81102 |  
| PM 2.5 (Local Conditions) | 88101 |


##### Let's also store the parameter codes as variables to keep things simple.

In [None]:
CO = 42101
NO2 = 42602
O3 = 44201
PM10 = 81102
PM25 = 88101
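
The same codes can also be kept in a dictionary, which makes it easy to look up a pollutant by name or to label a plot from a code. This is a small sketch using the values from the table above; the dictionary and function names are illustrative only.

```python
# Parameter codes copied from the table above
PARAMETER_CODES = {
    "CO": 42101,
    "NO2": 42602,
    "O3": 44201,
    "PM10": 81102,
    "PM25": 88101,
}

# Reverse lookup: code -> short pollutant name
CODE_TO_NAME = {code: name for name, code in PARAMETER_CODES.items()}

def parameter_name(code):
    """Return the short pollutant name for a code, or 'unknown'."""
    return CODE_TO_NAME.get(int(code), "unknown")

print(parameter_name(44201))
```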

---

### Exploring the format of the data

Let's look at current O<sub>3</sub> data for New York State.

In [None]:
now = datetime.today()
year = now.year
month = now.month
day = now.day
print(year, month, day)

<div class="admonition alert alert-warning">
    <p class="admonition-title" style="font-weight:bold">Warning</p>
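    Note that <code>datetime.today()</code> returns the local time of the machine running the notebook. On many hosted servers that clock is set to UTC, so the date above may differ from your own local date.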
</div>
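
Whether "today" is a local date or a UTC date depends on how the timestamp is created and on the clock of the machine running the notebook. A short sketch using only the standard library shows the two side by side. On a machine whose clock is set to UTC they coincide.

```python
from datetime import datetime, timezone

local_now = datetime.today()           # naive, follows this machine's clock
utc_now = datetime.now(timezone.utc)   # timezone-aware, always UTC

print("local date:", local_now.date())
print("UTC date:  ", utc_now.date())
```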

We'll subtract one day so we have the past day of data.

In [None]:
# Subtract a Timedelta rather than using day-1, which is invalid on the first of the month
zone = aqs.bystate.sampledata(parameter=O3, bdate=(pd.Timestamp(now) - pd.Timedelta(days=1)).date(), edate=now.date(), stateFIPS=NY)
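
Stepping back by a day is safest with `timedelta`, because building a date from a day number reduced by one raises a `ValueError` on the first of the month. Here is a minimal sketch with a fixed example date; the helper name `previous_day` is illustrative.

```python
from datetime import date, timedelta

def previous_day(d):
    """Return the calendar day before d, crossing month and year boundaries."""
    return d - timedelta(days=1)

print(previous_day(date(2023, 3, 1)))   # 2023-02-28
```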

In [None]:
zone

<div class="admonition alert alert-danger">
    <p class="admonition-title" style="font-weight:bold">Oh, no!</p>
    Looks like there aren't any current O<sub>3</sub> data available. Let's try an earlier start date.
</div>

In [None]:
# A DateOffset handles month and year boundaries, unlike month-2 and day-1
zone = aqs.bystate.sampledata(parameter=O3, bdate=(pd.Timestamp(now) - pd.DateOffset(months=2, days=1)).date(), edate=now.date(), stateFIPS=NY)

In [None]:
zone

Great! Now we have some data. Let's look at the columns.

In [None]:
zone.columns

When does the O<sub>3</sub> data end?

In [None]:
print('Most Recent Date: ', zone['date_local'].max())

<div class="admonition alert alert-info">
    <p class="admonition-title" style="font-weight:bold">PAUSE</p>
    We've seen how to pull the data, and we've seen that pulling current data is not necessarily possible every time. Let's look at the most recent full month of data.
</div>

In [None]:
sept = 9
start = 1
end = 30

In [None]:
zone_sept = aqs.bystate.sampledata(parameter= O3, bdate = date(year=year, month=sept, day = start), edate = date(year=year, month=sept, day = end), stateFIPS=NY)
zone_sept
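
Hard-coding the last day works for September, but the month length can also be computed. This is a small sketch using the standard library's `calendar.monthrange`; the helper name `month_bounds` is illustrative.

```python
import calendar
from datetime import date

def month_bounds(year, month):
    """Return the first and last calendar day of the given month."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)

bdate, edate = month_bounds(2023, 9)
print(bdate, edate)  # 2023-09-01 2023-09-30
```

The returned pair can be passed as `bdate` and `edate` in the same way as the dates built by hand above.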

A quick plot of the data in its original format shows that some polishing is necessary.

In [None]:
plt.plot(zone_sept['date_local'], zone_sept['sample_measurement'], '.')

<div class="admonition alert alert-warning">
    <p class="admonition-title" style="font-weight:bold">Spatiotemporal Issues Abound!</p>
    <p>It looks like the primary hiccups in trying to plot these data are that the DataFrame has separate columns for date and time, and that it contains multiple sample sites.</p>
    <p>We can combine the dates and times into a single datetime value.</p>
    <p>We can also specify the sample sites we want to look at, or take averages across a county or the whole state. Either way, we'll need to prepare the data.</p>
</div>

---

## Preparing the data for visualization

Let's use some of pandas' features to build a more manageable DataFrame for plotting.

<div class="admonition alert alert-danger">
    <p class="admonition-title" style="font-weight:bold">TO BE CONTINUED!!!!</p>
</div>
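
As a starting point, here is a minimal sketch of the two preparation steps described above, run on a small made-up DataFrame rather than real AQS output. The `date_local` and `sample_measurement` columns appear earlier in this notebook; `time_local` and `site_number` are assumed column names and should be checked against `zone.columns` before use.

```python
import pandas as pd

# Made-up rows standing in for AQS output (values are not real measurements)
raw = pd.DataFrame({
    "site_number": ["0001", "0001", "0002", "0002"],
    "date_local": ["2023-09-01", "2023-09-01", "2023-09-01", "2023-09-01"],
    "time_local": ["00:00", "01:00", "00:00", "01:00"],
    "sample_measurement": [0.030, 0.034, 0.040, 0.038],
})

def prepare(df):
    """Combine the date and time columns into one datetime column and sort by it."""
    out = df.copy()
    out["datetime_local"] = pd.to_datetime(out["date_local"] + " " + out["time_local"])
    return out.sort_values("datetime_local").reset_index(drop=True)

prepared = prepare(raw)

# Option 1: keep a single sample site
one_site = prepared[prepared["site_number"] == "0001"]

# Option 2: average across all sites at each timestamp
hourly_mean = prepared.groupby("datetime_local")["sample_measurement"].mean()
print(hourly_mean)
```

Either result has one value per timestamp, which is the shape a timeseries plot needs.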


## Generating a timeseries plot of various data points
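
A minimal sketch of a per-site timeseries plot, using made-up values in place of real AQS output. It assumes the data already have a combined `datetime_local` column and a `site_number` column; both names are assumptions, and `plot_timeseries` is an illustrative helper rather than part of any package.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data: two sites, three hourly samples each (not real measurements)
data = pd.DataFrame({
    "site_number": ["0001"] * 3 + ["0002"] * 3,
    "datetime_local": pd.to_datetime(
        ["2023-09-01 00:00", "2023-09-01 01:00", "2023-09-01 02:00"] * 2
    ),
    "sample_measurement": [0.030, 0.034, 0.031, 0.040, 0.038, 0.036],
})

def plot_timeseries(df, ylabel="Sample measurement"):
    """Draw one line per sample site against the combined datetime column."""
    fig, ax = plt.subplots(figsize=(10, 4))
    for site, group in df.groupby("site_number"):
        ax.plot(group["datetime_local"], group["sample_measurement"],
                marker=".", label=f"Site {site}")
    ax.set_xlabel("Local time")
    ax.set_ylabel(ylabel)
    ax.legend()
    fig.autofmt_xdate()  # tilt the date labels so they do not overlap
    return fig, ax

fig, ax = plot_timeseries(data)
```

Drawing each site as its own line avoids the scatter of overlapping points that appears when all sites are plotted against the date alone.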


---

## Summary
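In this notebook we registered for AQS access, pulled O<sub>3</sub> sample data for New York State with `pyaqsapi`, and took a first look at the format of the returned DataFrame. Preparing the data and producing a polished timeseries plot are still in progress.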

### What's next?
Next we will explore how to improve the timeseries and possibly make it interactive.

## Resources and references
 - [pyaqsapi on GitHub](https://github.com/USEPA/pyaqsapi#readme): the Python package used in this notebook to query the AQS
 - [AQS parameter codes](https://aqs.epa.gov/aqsweb/documents/codetables/parameters.html): the EPA code table for pollutants and other measured parameters