## PurpleAir Data
Data are available via PurpleAir's API. You will need a Gmail account to create an API key via [this dashboard](https://develop.purpleair.com/sign-in?redirectURL=%2Fdashboards%2Fkeys). Create a "Read" key with a status of "Enabled". Adding a label is a good idea. Host restrictions limit the use of the key to certain machines; you do not need to set these.

TODO: insert screen shot image

Once you have generated your key, you can "read" the key value to use in making requests. First, run the cell below and enter the key you generated when prompted. 

If you do not have a Gmail account, or you don't want to set up an API key, the sample data shown here have been saved as part of the repository and can be used directly; just skip down to [insert section].

In [1]:
# imports
import datetime
import getpass
import requests


import geopandas as gpd
import pandas as pd


In [2]:

api_key = getpass.getpass("Enter your API key: ")

Test your API key by running the code below. It should show the message "Key submit was successful" if your key is valid. 

In [None]:
url = "https://api.purpleair.com/v1/keys"

headers = {
    "X-API-Key": api_key 
}

response = requests.get(url, headers=headers)

if response.status_code == 201:
    print('Key submit was successful')
    data = response.json()
    print(data)
else:
    print(f"Request failed with status code: {response.status_code}")

If the API key is valid, a bounding box can be used to search for sensors. The coordinates in the cell below represent the longitude and latitude of the northwest and southeast corners of a box that encloses the South Bronx. The API request returns the identifiers of sensors within that bounding box. 

In [3]:
# corner latitude and longitudes in decimal degrees
nwlat = 40.9
nwlng = -73.933
selat = 40.80
selng = -73.78

url = 'https://api.purpleair.com/v1/sensors'

headers = {"X-API-Key": api_key}

params = {
    'fields':'name,latitude,longitude,position_rating,last_seen,date_created',
    'location_type':0,
    'nwlng':nwlng,
    'nwlat':nwlat,
    'selng':selng,
    'selat':selat

}

with requests.get(url, headers=headers, params=params) as response:

    if response.status_code == 200:
        print('Success')
        data = response.json()
        # the sensor rows live under the 'data' key; len(data) alone would
        # count the top-level keys of the response instead
        print(f"Request returned {len(data['data'])} sensors")
    else:
        print(f"Request failed with status code: {response.status_code}")
        print(response.text)


Success
Request returned 10 sensors
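
Because the request depends on the corner convention (northwest corner first, southeast corner second), a quick sanity check can catch swapped values before any API quota is spent. The sketch below is illustrative only: `validate_bbox` and `in_bbox` are hypothetical helper names, not part of the PurpleAir API, and the check assumes the box does not cross the antimeridian.

```python
def validate_bbox(nwlat, nwlng, selat, selng):
    """Raise ValueError if the corners do not describe a NW/SE box."""
    if not (-90 <= selat < nwlat <= 90):
        raise ValueError("nwlat must be north of selat, both within [-90, 90]")
    if not (-180 <= nwlng < selng <= 180):
        raise ValueError("nwlng must be west of selng, both within [-180, 180]")


def in_bbox(lat, lng, nwlat, nwlng, selat, selng):
    """Return True if the point (lat, lng) lies inside the box."""
    return selat <= lat <= nwlat and nwlng <= lng <= selng


# corner values copied from the request cell above
bbox = dict(nwlat=40.9, nwlng=-73.933, selat=40.80, selng=-73.78)
validate_bbox(**bbox)

# coordinates of the SIS-roof sensor from the saved sensor table
sis_roof_inside = in_bbox(40.81536, -73.888374, **bbox)
# an arbitrary point south-west of the box, chosen only for illustration
outside_point_inside = in_bbox(40.70, -74.00, **bbox)
print(sis_roof_inside, outside_point_inside)
```

The same `in_bbox` check can be applied to the returned rows to confirm that every sensor falls within the requested area.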


We can convert the sensor list to a dataframe and save it for future use. 

In [None]:
df = pd.DataFrame(data['data'], columns=data['fields'])
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df.longitude, df.latitude), 
    crs="EPSG:4326"
)

print(gdf.shape)
gdf.to_parquet('./SouthBronxSensors.parquet', engine='pyarrow', compression='snappy')


(10, 8)


If the data didn't download properly, or you cannot set up an API key, the code below loads the locally saved version of the file.

In [4]:
gdf = gpd.read_parquet('./SouthBronxSensors.parquet')
print(gdf.shape)
gdf.head()


(10, 8)


| | sensor_index | date_created | last_seen | name | position_rating | latitude | longitude | geometry |
|---|---|---|---|---|---|---|---|---|
| 0 | 90249 | 1605560768 | 1743086164 | FreshAir-O4 | 3 | 40.861225 | -73.89016 | POINT (-73.89016 40.86122) |
| 1 | 90283 | 1605561111 | 1743086143 | SIS-roof | 5 | 40.81536 | -73.888374 | POINT (-73.88837 40.81536) |
| 2 | 90389 | 1605561629 | 1743086130 | FA-AHo | 5 | 40.83022 | -73.92234 | POINT (-73.92234 40.83022) |
| 3 | 91423 | 1605893083 | 1743086164 | FA-O2b | 5 | 40.861134 | -73.891556 | POINT (-73.89156 40.86113) |
| 4 | 91899 | 1606157681 | 1743086204 | FA-O7 | 5 | 40.83016 | -73.9219 | POINT (-73.9219 40.83016) |


In [5]:
# map to show the sensors
#gdf.explore()


With several sensors available, data can be pulled from the API. First, we can look at each sensor's creation date and last-seen date to see whether it is likely to have data for the time period we are interested in. Pandas, a library for working with tabular data, has a helper function, `pd.to_datetime`, that converts the timestamps in the data from seconds since the Unix epoch to a human-readable date and time.

In [6]:
# date_created is when the sensor was set up in the database, last_seen
# is the date & time for the last sensor recording
print('Date Created:')
print(pd.to_datetime(gdf.date_created, unit='s'))
print('\nDate Last Seen:')
print(pd.to_datetime(gdf.last_seen, unit='s'))

Date Created:
0   2020-11-16 21:06:08
1   2020-11-16 21:11:51
2   2020-11-16 21:20:29
3   2020-11-20 17:24:43
4   2020-11-23 18:54:41
5   2020-11-24 17:25:14
6   2020-11-24 17:25:22
7   2022-11-29 19:05:50
8   2023-12-27 16:41:38
9   2024-03-27 19:17:52
Name: date_created, dtype: datetime64[ns]

Date Last Seen:
0   2025-03-27 14:36:04
1   2025-03-27 14:35:43
2   2025-03-27 14:35:30
3   2025-03-27 14:36:04
4   2025-03-27 14:36:44
5   2025-03-27 14:36:15
6   2025-03-27 14:35:50
7   2025-03-27 13:54:05
8   2025-03-27 14:35:26
9   2025-03-27 14:35:40
Name: last_seen, dtype: datetime64[ns]


The output shows that all of the sensors were created before the fires and that all have reported data recently. Next, the code below retrieves data from one of the sensors for the dates of the event.
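
The same check can be done programmatically rather than by reading the printed dates. The sketch below uses only the standard library and the first five rows of the sensor table; the window of 7 to 10 November 2024 is taken from the request further down, and `covers_window` is a hypothetical helper name. Note that a sensor created before the window and seen after it is only a candidate: it may still have gaps during the window itself.

```python
from datetime import datetime, timezone

# (sensor_index, date_created, last_seen) copied from the first rows of the sensor table
sensors = [
    (90249, 1605560768, 1743086164),
    (90283, 1605561111, 1743086143),
    (90389, 1605561629, 1743086130),
    (91423, 1605893083, 1743086164),
    (91899, 1606157681, 1743086204),
]

# window of interest, interpreted here as UTC (an assumption)
window_start = datetime(2024, 11, 7, tzinfo=timezone.utc)
window_end = datetime(2024, 11, 10, tzinfo=timezone.utc)


def to_utc(ts):
    """Convert a Unix timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def covers_window(created, last_seen, start, end):
    """True if the sensor existed before the window and reported after it."""
    return to_utc(created) <= start and to_utc(last_seen) >= end


candidates = [
    idx for idx, created, seen in sensors
    if covers_window(created, seen, window_start, window_end)
]
print(candidates)
```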

To retrieve data, the sensor ID is used to construct a new request that specifies the fields we want returned and the date/time range from which to get data.

In [18]:
# TODO: turn comments into table above
fields = [
    'pm2.5_alt',        #Estimated mass concentration PM2.5 (µg/m³).
    'pm2.5_atm',        #Estimated mass concentration PM2.5 (µg/m³) (raw value).
    'humidity',         #Relative humidity inside of the sensor housing (%). This matches the "Raw Humidity" map layer and on average is 4% lower than ambient conditions.
    'temperature',      #Temperature inside of the sensor housing (F). This matches the "Raw Temperature" map layer and on average is 8°F higher than ambient conditions.
    'pressure',         #Current pressure in Millibars.
]

In [22]:
# TODO: retrieve time fields with data

# timezone-aware UTC datetimes keep the window independent of the local
# machine's time zone; timestamps are passed as whole seconds
start_date = int(datetime.datetime(2024, 11, 7, tzinfo=datetime.timezone.utc).timestamp())
end_date = int(datetime.datetime(2024, 11, 10, tzinfo=datetime.timezone.utc).timestamp())

sensor_ids = gdf.sensor_index.values
# pick one sensor for testing (index 1 is the second sensor in the table)
sensor_id = sensor_ids[1]

# dictionary to hold returned sensor data
sensor_data = {}
# construct a request: the history URL is the sensors URL followed by
# the sensor index and /history
url = f"https://api.purpleair.com/v1/sensors/{sensor_id}/history"

# new parameters (the sensor index is encoded in the URL, not passed here)
params = {
    'fields': ",".join(fields),
    'start_timestamp': start_date,
    'end_timestamp': end_date,
}

with requests.get(url=url, headers=headers, params=params) as response:

    if response.status_code in (200, 201):
        print('Success')
        sensor_data = response.json()
        # number of top-level keys in the response, not the number of data rows
        print(len(sensor_data))
    else:
        print(f"Request failed with status code: {response.status_code}")
        print(response.text)

Success
9
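
To extend the single-sensor test to every sensor in the table, the URL and parameters can be built by a small function and the requests issued in a loop. The following is a minimal sketch that only constructs the requests and makes no network calls. `build_history_request` is a hypothetical helper, and it assumes the history endpoint accepts integer Unix timestamps in seconds, as the cell above passes them.

```python
from datetime import datetime, timezone

BASE_URL = "https://api.purpleair.com/v1/sensors"


def build_history_request(sensor_index, fields, start, end):
    """Return (url, params) for one sensor's history request."""
    url = f"{BASE_URL}/{sensor_index}/history"
    params = {
        "fields": ",".join(fields),
        "start_timestamp": int(start.timestamp()),
        "end_timestamp": int(end.timestamp()),
    }
    return url, params


fields = ["pm2.5_alt", "pm2.5_atm", "humidity", "temperature", "pressure"]
start = datetime(2024, 11, 7, tzinfo=timezone.utc)
end = datetime(2024, 11, 10, tzinfo=timezone.utc)

# three sensor indexes taken from the sensor table, for illustration
requests_to_make = [
    build_history_request(idx, fields, start, end)
    for idx in (90249, 90283, 90389)
]
for url, params in requests_to_make:
    print(url, params["start_timestamp"], params["end_timestamp"])
```

Each `(url, params)` pair can then be passed to `requests.get` together with the `headers` dictionary, storing each response in `sensor_data` keyed by sensor index. Pausing between calls is a sensible precaution against rate limits.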


In [23]:
df_sensor = pd.DataFrame(data=sensor_data.get('data'), columns=sensor_data.get('fields'))
print(df_sensor.shape)
df_sensor.head()

(0, 6)


| | time_stamp | humidity | temperature | pressure | pm2.5_alt | pm2.5_atm |
|---|---|---|---|---|---|---|


In [25]:
sensor_data.get('data')


[]
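
A successful status code does not guarantee that any rows came back, as the empty list above shows. A small summary function makes this visible right after the request. This is a sketch only: `summarize_history` is a hypothetical helper, and the `payload` below is a stand-in containing just the `fields` and `data` keys, shaped like the response shown in this notebook rather than a real API response.

```python
def summarize_history(payload):
    """Summarize a history response: field count, row count, top-level keys."""
    fields = payload.get("fields") or []
    rows = payload.get("data") or []
    return {
        "n_fields": len(fields),
        "n_rows": len(rows),
        "top_level_keys": sorted(payload),
    }


# stand-in payload mirroring the columns and the empty data list shown above
payload = {
    "fields": ["time_stamp", "humidity", "temperature", "pressure",
               "pm2.5_alt", "pm2.5_atm"],
    "data": [],
}

summary = summarize_history(payload)
print(summary)
if summary["n_rows"] == 0:
    print("Request succeeded but returned no rows; check the sensor and time range.")
```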