# Function to query PurpleAir
**We should really set up a database to test this...**

Works with the 10-minute PM2.5 averages of all sensors.

This notebook retrieves readings from PurpleAir sensors in Minneapolis, cleans the entries, and texts people interested in a sensor when its reading is above a threshold.

Documentation is available here: https://api.purpleair.com.
You can read this article for help getting started: https://community.purpleair.com/t/making-api-calls-with-the-purpleair-api/180.

From PurpleAir: 

"The data from individual sensors will update no less than every 30 seconds. As a courtesy, we ask that you limit the number of requests to no more than once every 1 to 10 minutes, assuming you are only using the API to obtain data from sensors. If retrieving data from multiple sensors at once, please send a single request rather than individual requests in succession.

The PurpleAir historical API is released as of July 18, 2022. For more information, view this post: https://community.purpleair.com/t/new-version-of-the-purpleair-api-on-july-18th/1251.

Please let us know if you have any questions or concerns, and have a great day!"
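
One way to honor the request-rate courtesy quoted above is to gate each API call behind a minimum interval. The following is a minimal sketch and not part of the original notebook: the `MinIntervalGate` class is a hypothetical helper, and the 60-second interval is an assumption taken from the low end of the 1 to 10 minute range PurpleAir mentions.

```python
import time


class MinIntervalGate:
    """Allow an action at most once per `min_interval` seconds (hypothetical helper)."""

    def __init__(self, min_interval=60.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock          # injectable so the gate can be tested without sleeping
        self.last_allowed = None    # time of the most recent allowed call

    def ready(self):
        """Return True and record the time if enough time has passed, else False."""
        now = self.clock()
        if self.last_allowed is None or now - self.last_allowed >= self.min_interval:
            self.last_allowed = now
            return True
        return False


# Demonstration with a fake clock so no real waiting is needed
fake_now = [0.0]
gate = MinIntervalGate(min_interval=60.0, clock=lambda: fake_now[0])

first = gate.ready()      # first call is always allowed
fake_now[0] = 30.0
too_soon = gate.ready()   # only 30 s later: refused
fake_now[0] = 61.0
later = gate.ready()      # 61 s after the first call: allowed again
```

In a scheduled version of this notebook, the API call would only be made when `gate.ready()` returns `True`.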

A paper on correcting PurpleAir PM2.5 data: https://doi.org/10.5194/amt-14-4617-2021 ([download](https://www.researchgate.net/publication/352663348_Development_and_application_of_a_United_States-wide_correction_for_PM25_data_collected_with_the_PurpleAir_sensor))

Discussion of which PM2.5 estimate to use: https://community.purpleair.com/t/pm2-5-algorithms/3972/6

## Import Packages

In [1]:
### Import Packages

# File manipulation

import os # For working with Operating System
import requests # Accessing the Web
import datetime as dt # Working with dates/times

# Database 

# import psycopg2
# from psycopg2 import sql

# Analysis

import numpy as np
import geopandas as gpd
import pandas as pd

# Get CWD

cwd = os.getcwd()

## Definitions

In [2]:
# This is my personal API key... Please use responsibly! 51592903-B445-11ED-B6F4-42010A800007

api = input('Please enter your Purple Air api key')

Please enter your Purple Air api key 51592903-B445-11ED-B6F4-42010A800007


In [3]:
# Load the necessary data

datapath = os.path.join(cwd, '..', '..', 'Data')

sensor_info = gpd.read_file(os.path.join(datapath, 'PurpleAir_Stations.geojson'))

In [23]:
# Set the Spike threshold

spike_threshold = 28 # Micrograms per cubic meter

In [4]:
def getSensorsData(query='', api_read_key=''):

    # url is the address we send our request to, with the query string appended.
    url = 'https://api.purpleair.com/v1/sensors?' + query

    # my_headers is assigned the context of our request we want to make. In this case
    # we pass our API read key through the X-API-Key header.
    my_headers = {'X-API-Key':api_read_key}

    # This line creates and sends the request and then assigns its response to the
    # variable, response.
    response = requests.get(url, headers=my_headers)

    # We then return the response we received.
    return response

## Importing PurpleAir Station Data from PurpleAir API

In [5]:
#Setting parameters for API
fields = ['pm2.5_10minute']

fields_string = 'fields=' + '%2C'.join(fields)

In [6]:
# Query only for sensors in our database

sensor_ids = sensor_info.sensor_index.unique().astype(int)

sensor_string = 'show_only=' + '%2C'.join(sensor_ids.astype(str))

query_string = '&'.join([fields_string, sensor_string])
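
The same query string can also be assembled with the standard library, which takes care of percent-encoding the commas. This is a minimal sketch, assuming a short hypothetical list of sensor IDs in place of `sensor_info`; `build_query` is an illustrative name, not a function from this notebook.

```python
from urllib.parse import urlencode


def build_query(fields, sensor_ids):
    """Build a PurpleAir-style query string from field names and sensor IDs."""
    params = {
        'fields': ','.join(fields),
        'show_only': ','.join(str(s) for s in sensor_ids),
    }
    # urlencode percent-encodes the commas as %2C and joins the pairs with '&'
    return urlencode(params)


# Hypothetical subset of sensor IDs, for illustration only
example_query = build_query(['pm2.5_10minute'], [142718, 142720])
print(example_query)
```

The result has the same shape as `query_string` and could be passed to `getSensorsData` in its place.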

In [7]:
# Print the final query string for the API function

print(query_string)

fields=pm2.5_10minute&show_only=142718%2C142720%2C142726%2C142724%2C142730%2C142728%2C142734%2C142732%2C142736%2C142744%2C142750%2C142748%2C142752%2C142756%2C142774%2C142772%2C142926%2C143214%2C143216%2C143222%2C143226%2C143224%2C143238%2C143242%2C143240%2C143246%2C143248%2C143636%2C143648%2C143656%2C143660%2C143666%2C143668%2C143916%2C143942%2C143944%2C145202%2C145204%2C145242%2C145250%2C145454%2C145470%2C145498%2C145502%2C145506%2C145504%2C145604%2C145610%2C145614%2C145616%2C156605%2C157747%2C157757%2C157787%2C157785%2C157837%2C157845%2C157861%2C157871%2C157877%2C157935%2C166459%2C168327%2C177765


In [150]:
#calling the API
runtime = dt.datetime.today()
response = getSensorsData(query_string, api)

In [151]:
response_dict = response.json() # Read response as a json (dictionary)

col_names = response_dict['fields']
data = np.array(response_dict['data'])

sensors_df = pd.DataFrame(data, columns = col_names)
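
The response pairs a `fields` list with a `data` list of rows, which is why the cell above uses one as the column names for the other. The sketch below shows that pairing using only the standard library. The payload is hand-written for illustration (its rows reuse values from the sample output in this notebook) and is not a complete API response.

```python
# Hand-written payload containing only the two keys the notebook reads
payload = {
    'fields': ['sensor_index', 'pm2.5_10minute'],
    'data': [[142718, 14.3], [142720, 16.0]],
}

# Pair each row with the field names to get one dict per sensor
records = [dict(zip(payload['fields'], row)) for row in payload['data']]

for record in records:
    print(record)
```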

In [152]:
#visualizing API response
sensors_df.head()

Unnamed: 0,sensor_index,pm2.5_10minute
0,142718,14.3
1,142720,16.0
2,142726,14.2
3,142724,14.5
4,142730,37.6


## Cleaning PurpleAir Station Data

In [153]:
clean_df = sensors_df.copy()

# Rename column for ease of use

clean_df = clean_df.rename(columns = {'pm2.5_10minute':'pm25'})

# Remove NaNs first so the comparison below only sees numbers

clean_df = clean_df.dropna()

# Remove obvious error values

clean_df = clean_df[clean_df.pm25 < 1000]
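
The cleaning rules above amount to two checks per reading: the value must be present, and it must be below the error cutoff of 1000. Here is a minimal standard-library sketch of the same rules applied to hypothetical readings; `is_valid_reading` is an illustrative name, not part of the notebook.

```python
import math


def is_valid_reading(pm25, error_cutoff=1000):
    """Return True for readings that are present, not NaN, and below the cutoff."""
    if pm25 is None:
        return False
    if isinstance(pm25, float) and math.isnan(pm25):
        return False
    return pm25 < error_cutoff


# Hypothetical readings: a normal value, a missing value, a NaN, and an error value
readings = {'a': 14.3, 'b': None, 'c': float('nan'), 'd': 4999.0}
valid = {sensor: value for sensor, value in readings.items() if is_valid_reading(value)}
print(valid)
```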

## Check for Spikes

In [154]:
# Check for spikes

spikes_df = clean_df[clean_df.pm25 > spike_threshold]

spikes_df

Unnamed: 0,sensor_index,pm25
4,142730,37.6
5,142728,72.1
26,143248,36.4
35,143944,79.0
36,145202,28.3
47,145610,32.8


## Text Alert

In [155]:
# Run once to initialize the alert files if this has never been run before

# current_alerts = pd.DataFrame({'sensor_index':pd.Series(dtype='int'),
#                                         'start_time':pd.Series(dtype='object'),
#                                         'max_reading':pd.Series(dtype='float')})

# current_alerts.to_csv(os.path.join(datapath, 'Active_Alerts_Acute_PurpleAir.csv'),
#                      index = False)

# archived_alerts = pd.DataFrame({'sensor_index':pd.Series(dtype='int'),
#                                         'start_time':pd.Series(dtype='object'),
#                                         'max_reading':pd.Series(dtype='float'),
#                                'duration':pd.Series(dtype='int')})

# archived_alerts.to_csv(os.path.join(datapath, 'Archived_Alerts_Acute_PurpleAir.csv'),
#                      index = False)

In [156]:
# Load current alerts

current_alerts = pd.read_csv(os.path.join(datapath, 'Active_Alerts_Acute_PurpleAir.csv'))

In [157]:
# Check if any alerts ended

current_spiked_sensors = set(spikes_df.sensor_index)

old_spiked_sensors = set(current_alerts.sensor_index)

ended_spiked_sensors = old_spiked_sensors - current_spiked_sensors

if len(ended_spiked_sensors) > 0:

    # Add them to alert archive

    archived_alerts = pd.read_csv(os.path.join(datapath, 'Archived_Alerts_Acute_PurpleAir.csv'))
    
    done_alerts = current_alerts[current_alerts.sensor_index.isin(ended_spiked_sensors)].copy()
    
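    # start_time is read from the CSV as text, so parse it before subtracting;
    # the duration is stored in whole seconds
    done_alerts['duration'] = (runtime - pd.to_datetime(done_alerts.start_time)).dt.total_seconds().astype(int)

    # Concatenate and save (this really should be a submission using SQL...)
    pd.concat([archived_alerts, done_alerts]).to_csv(os.path.join(datapath, 'Archived_Alerts_Acute_PurpleAir.csv'),
                                                     index=False)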

    # Remove from current alerts
    
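    # Reset the index so rows appended later by position do not overwrite existing ones
    current_alerts = current_alerts[~current_alerts.sensor_index.isin(ended_spiked_sensors)].reset_index(drop=True)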
    
    # Text people the spike is over
    # NOT DONE HERE
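
The set difference above is one of three comparisons that drive the alert lifecycle: sensors that stopped spiking, sensors still spiking, and sensors that just started. A small sketch of all three, with hypothetical sensor IDs standing in for `current_alerts` and `spikes_df`; `partition_alerts` is an illustrative name.

```python
def partition_alerts(old_spiked, current_spiked):
    """Split sensors into ended, ongoing, and new alerts using set operations."""
    old_spiked, current_spiked = set(old_spiked), set(current_spiked)
    ended = old_spiked - current_spiked      # alert open, sensor no longer spiking
    ongoing = old_spiked & current_spiked    # alert open, sensor still spiking
    new = current_spiked - old_spiked        # sensor spiking, no alert yet
    return ended, ongoing, new


# Hypothetical sensor IDs, for illustration only
ended, ongoing, new = partition_alerts(old_spiked=[1, 2, 3], current_spiked=[2, 3, 4])
print(ended, ongoing, new)
```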

In [158]:
# Check if there is currently an alert out on this

for _, spike in spikes_df.iterrows():

    sensor_index = int(spike.sensor_index)
    reading = spike.pm25

    # Is there already an alert for this?
    
    if sensor_index in list(current_alerts.sensor_index.astype(int)): # If yes

        # Check if reading is higher than previous max
        
        alert = current_alerts[current_alerts.sensor_index == sensor_index].iloc[0] # Find the alert
        
        if reading > alert.max_reading:
            
            # alert is a single row, so alert.name is its row label in current_alerts
            current_alerts.loc[alert.name, 'max_reading'] = reading # Replace max_reading

    else:

        # Start an alert

        new_alert = [sensor_index, runtime, reading]

        current_alerts.loc[len(current_alerts), :] = new_alert

        # Text everyone...
        # NOT DONE
        # message = create_message()

        # Build the message line by line so the code indentation is not included in the text
        message = (f'SPIKE ALERT! {sensor_index} is reading at {reading} micrograms/meter^3\n'
                   'You are receiving this text because you signed up for SpikeAlerts.\n'
                   'Please reply with STOP to be removed from this list.')
        
        print(message)

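
The commented-out `create_message()` call is not defined anywhere in this notebook. One possible shape for it is sketched below; the signature and structure are assumptions, and the wording is modeled on the inline message above.

```python
def create_message(sensor_index, reading):
    """Build the alert text for one sensor (hypothetical helper)."""
    lines = [
        f'SPIKE ALERT! {sensor_index} is reading at {reading} micrograms/meter^3',
        'You are receiving this text because you signed up for SpikeAlerts.',
        'Please reply with STOP to be removed from this list.',
    ]
    # Joining explicit lines avoids carrying code indentation into the text
    return '\n'.join(lines)


example_message = create_message(142730, 37.6)
print(example_message)
```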

In [159]:
current_alerts

Unnamed: 0,sensor_index,start_time,max_reading
0,142730,2023-08-10 20:04:37.898642,37.6
1,142728,2023-08-10 20:04:37.898642,72.1
2,143248,2023-08-10 20:04:37.898642,36.4
3,143944,2023-08-10 20:04:37.898642,79.0
4,145202,2023-08-10 20:04:37.898642,28.3
5,145610,2023-08-10 20:04:37.898642,32.8


In [146]:
current_alerts.to_csv(os.path.join(datapath, 'Active_Alerts_Acute_PurpleAir.csv'),
                      index=False)
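
The notes in this notebook mention moving the alert bookkeeping from CSV files to a database. As a rough sketch of what that could look like, the example below uses an in-memory SQLite database from the standard library rather than the PostgreSQL setup hinted at by the commented-out `psycopg2` import. The column names mirror the CSV columns, but the schema, the `record_spike` helper, and the readings are all assumptions made for illustration.

```python
import sqlite3
import datetime as dt

conn = sqlite3.connect(':memory:')  # throwaway database for illustration
conn.execute('''CREATE TABLE active_alerts (
                    sensor_index INTEGER PRIMARY KEY,
                    start_time   TEXT,
                    max_reading  REAL)''')


def record_spike(conn, sensor_index, reading, runtime):
    """Open a new alert, or raise max_reading on an existing one."""
    row = conn.execute('SELECT max_reading FROM active_alerts WHERE sensor_index = ?',
                       (sensor_index,)).fetchone()
    if row is None:
        conn.execute('INSERT INTO active_alerts VALUES (?, ?, ?)',
                     (sensor_index, runtime.isoformat(), reading))
    elif reading > row[0]:
        conn.execute('UPDATE active_alerts SET max_reading = ? WHERE sensor_index = ?',
                     (reading, sensor_index))


# Hypothetical readings for one sensor across three runs
now = dt.datetime(2023, 8, 10, 20, 4, 37)
record_spike(conn, 142730, 37.6, now)
record_spike(conn, 142730, 41.0, now + dt.timedelta(minutes=10))  # higher: updates max
record_spike(conn, 142730, 30.0, now + dt.timedelta(minutes=20))  # lower: no change

rows = conn.execute('SELECT sensor_index, start_time, max_reading FROM active_alerts').fetchall()
print(rows)
```

Keeping one row per sensor in the table replaces the read, modify, and rewrite cycle on `Active_Alerts_Acute_PurpleAir.csv` with a single insert or update per spike.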