# Engaging Rain Gauges

*Building a data pipeline and API for GIS people, by GIS people*

This notebook serves as 

* an overview of the capability of the rain gauge and calibrated radar rainfall system operated by 3 Rivers Wet Weather
* an interactive demonstration of using the 3RWW Rainfall API to access that data and then visualize it in a notebook with the *ArcGIS API for Python*

In [1]:
import IPython

    * introduce 3rww's system
    * design requirements: replacing existing system and *workflows*.
    * design goals: enhancing for the modern web (API first) and for use in geospatial tools (web and desktop GIS). Show existing AGOL map.
    * questions?
    ---
    * getting it working:
        * step 1: serverless data pipelines (AWS via Stackery)
        * step 2: database + API (Postgres + Django on Heroku)
            * https://trwwapi.herokuapp.com/rainfall/docs/swagger-ui or https://trwwapi.herokuapp.com/rainfall/docs/redoc/
            * API by GIS people for GIS people
        * step 3: ETL, visualize, interact, download your way

    * questions?

    * example:
        * get an AOI, get the pixels, and then get the pixels in the AOI, all with *ArcGIS API for Python*
        * make a request to the Rainfall API (the "shortcut endpoints")
        * join the result to the pixel layer
        * load the layer to a map in the notebook

    * questions?

    ---
    * other example:
        * Make-It Rain web app

    * last call

## The Problem Space: Rain Gauges + Radar Rainfall Estimates

When radar estimates of rainfall are calibrated with actual rain gauge data, a highly accurate and valuable source of rainfall data can be calculated over large geographic areas. 

The result is called *Calibrated Radar Rainfall Data*, or *Gauge-Adjusted Radar Rainfall Data*.

[3 Rivers Wet Weather (3RWW)](https://www.3riverswetweather.org), with support from [Vieux Associates](http://www.vieuxinc.com/) and [Datawise](https://datawise.software/), uses rainfall data from the NEXRAD radar located in Moon Township, PA calibrated by rain gauge measurements to produce a map of on-the-ground rainfall estimates for every square kilometer in Allegheny County.

This is equivalent in accuracy to having 2,276 rain gauges on the ground.

You can view and explore this data on 3RWW's calibrated radar rainfall data site at [www.3riverswetweather.org/municipalities/calibrated-radar-rainfall-data](http://www.3riverswetweather.org/municipalities/calibrated-radar-rainfall-data)

This notebook walks through how to programmatically access 3RWW's massive repository of high resolution spatiotemporal rainfall data for Allegheny County via the ***3RWW Data API*** for an area of interest.

# The system is 20+ years old!

<img src="https://www.arcgis.com/sharing/rest/content/items/0ebf224e84d547f6b08fb9ae910084af/resources/garr.jpeg?v=1614741684802&w=800"/>

In 2019, it began to look like the existing stack would need to be replaced. That included:

* Data pipelines to consume four rainfall data sources
* A database capable of storing twenty years' worth of records (1.5+ billion observations)
* A user interface

Fortunately, we had already started to think about the problem and had some resources ready.

In [2]:
IPython.display.HTML('<iframe src="https://data-3rww.opendata.arcgis.com/app/3rww-rain-gauge-map-1" width="1024" height="500"></iframe>')

### In planning a replacement we needed to:

* Respect existing workflows. Existing users needed to be able to get the data in a familiar way.
* Make data available for use in geospatial tools. More and more folks are asking to use this in GIS.
* Create an up-to-date web experience. Mobile first; support the potential for integrations in one or more sites.

# Building a Replacement Stack

The way we had to approach the problem meant we were thinking about storing a lot of archival data first, and then figuring out how to get data on a recurring basis from the various sources.

## Data Pipelines 

![](https://civicmapper.com/assets/img/proj/pipeline.png)

Stackery (https://www.stackery.io/) handles all orchestration.

* Recurring and periodic pipelines for real-time/provisional and historic/calibrated datasets. 
* Event-driven from end-to-end.
* In-line process for long-term storage of raw data in S3 buckets.
* Services that only run when they need to

## Database + API

<img src="https://cdn.artandlogic.com/wp-content/uploads/2014/12/django-logo-negative-300x136.png" style="display: block; float: left; margin: 20px; height: 100px;">
<img src="https://upload.wikimedia.org/wikipedia/en/thumb/6/60/PostGIS_logo.png/150px-PostGIS_logo.png" style="display: block; float: left; margin: 20px; height: 100px;">
<img src="http://datablend.be/wp-content/uploads/2013/01/redis_logo-290x220-cropped.png" style="display: block; float: left; margin: 20px; height: 100px;">
<img src="https://dailysmarty-production.s3.amazonaws.com/uploads/post/img/509/feature_thumb_heroku-logo.jpg" style="display: block; float: left; margin: 20px; height: 100px;">

### Why?

* We're just getting slices from tables in a database and want to expose them simply through a read-only REST API.
* Spatial is a secondary consideration when storing and serving this data.
* The most tempting queries are the longest running.
* Being able to auto-generate documentation for the API is a bonus: [ReDoc](https://trwwapi.herokuapp.com/rainfall/docs/redoc/) and [Swagger](https://trwwapi.herokuapp.com/rainfall/docs/swagger-ui)

**When we need to, we create endpoints for specific use cases, workflows, and technology integrations.**

## With that: ETL, visualize, interact, and download your way

# Engaging Rain Gauges

Working with 3RWW Rainfall in a Jupyter Notebook with the ArcGIS API for Python 

## Notebook Setup

In [28]:
# ----------------------------------
# imports from the Python standard library

import json #read and write JSON
from time import sleep
from statistics import mean, stdev
from copy import deepcopy

# ----------------------------------
# imports from 3rd-party libraries

# Requests - HTTP requests for humans
import requests
# PETL - an Extract/Transform/Load toolbox
import petl as etl
# Python DateUtil (parser) - a helper for reading timestamps
from dateutil.parser import parse

# ArcGIS API for Python - for accessing 3RWW's open reference datasets in ArcGIS Online
from arcgis.gis import GIS
from arcgis import geometry, GeoAccessor

# for displaying things from the ArcGIS Online in this Jupyter notebook
from IPython.display import display

# First: Let's get some test rainfall data from the API

Getting rainfall data programmatically is straightforward: you submit an HTTP request with parameters (as a `JSON` payload) specifying locations of interest and a time range. The API returns `JSON` with timestamps, rainfall values, and some metadata about each observation.

In [4]:
# using the python requests library
response = requests.post(
    url="http://localhost:7000/rainfall/v2/gauge/realtime/",
    json={
        'start_dt':'2021-02-28T00:00:00',
        'end_dt':'2021-03-02T00:00:00',
        'rollup': 'daily',
        'gauges': '2,8,9',
        'f': 'sensor' # results as timeseries per sensor, instead of sensors per timestamp
    } # Note we're using the `json` kwarg for `requests.post` instead of `params` or `data`
)

The `response` includes some information about our request and how to get our result: `jobUrl`.

In [5]:
response.json()

{'args': {'start_dt': '2021-02-28T00:00:00',
  'end_dt': '2021-03-02T00:00:00',
  'rollup': 'daily',
  'gauges': '2,8,9',
  'f': 'sensor'},
 'meta': {'jobId': '67ed570d-e5c5-437e-842b-d460e385608a',
  'jobUrl': 'http://localhost:7000/rainfall/v2/gauge/realtime/67ed570d-e5c5-437e-842b-d460e385608a/'},
 'data': None,
 'status': 'queued',
 'status_code': 200,
 'messages': ['running job 67ed570d-e5c5-437e-842b-d460e385608a']}

We can check for completion at that `jobUrl` manually...or use a `while` loop to check for us.

In [6]:
done = False

while not done:
    
    job = requests.get(response.json()['meta']['jobUrl'])
    j = job.json()
    
    if j['status'] == 'finished':
        done = True
        print("Done!\n", job.json())
        
    elif j['status'] in ['deferred', 'failed', "does not exist", 'error']:
        done = True
        print("Something went wrong: {0}".format("; ".join([m for m in j['messages']])))
            
    else:
        print(j['status'],"...")
        sleep(2)

started ...
Done!
 {'args': {'sensor_ids': '2,8,9', 'start_dt': '2021-02-28T00:00:00-05:00', 'end_dt': '2021-03-02T00:00:00-05:00', 'rollup': 'daily', 'zerofill': True, 'f': 'sensor'}, 'meta': {'records': 3, 'jobId': '67ed570d-e5c5-437e-842b-d460e385608a', 'jobUrl': 'http://localhost:7000/rainfall/v2/gauge/realtime/67ed570d-e5c5-437e-842b-d460e385608a/'}, 'data': [{'id': '2', 'data': [{'ts': '2021-02-28', 'val': 0.72, 'src': 'G'}, {'ts': '2021-03-01', 'val': 0.21, 'src': 'G'}, {'ts': '2021-03-02', 'val': 0, 'src': 'G'}]}, {'id': '8', 'data': [{'ts': '2021-02-28', 'val': 0.86, 'src': 'G'}, {'ts': '2021-03-01', 'val': 0.18, 'src': 'G'}, {'ts': '2021-03-02', 'val': 0, 'src': 'G'}]}, {'id': '9', 'data': [{'ts': '2021-02-28', 'val': 1.0, 'src': 'G'}, {'ts': '2021-03-01', 'val': 0.25, 'src': 'G'}, {'ts': '2021-03-02', 'val': 0, 'src': 'G'}]}], 'status': 'finished', 'status_code': 200, 'messages': []}
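
Since the same polling pattern appears again later in this notebook, it could be wrapped in a small helper. The following is a minimal sketch, not part of the 3RWW API or the original notebook code: `poll_job`, `fetch`, `interval`, and `max_attempts` are hypothetical names, and the status strings are the ones used in the loop above. Passing in a `fetch` callable keeps the helper independent of any particular HTTP library and lets us simulate a job here.

```python
from time import sleep

# Status values taken from the polling loop above.
FAILED_STATUSES = {'deferred', 'failed', 'does not exist', 'error'}

def poll_job(fetch, interval=2, max_attempts=30):
    """Call `fetch()` until the job finishes, fails, or we give up.

    `fetch` is any zero-argument callable returning the job's JSON as a dict,
    e.g. `lambda: requests.get(job_url).json()` (hypothetical usage).
    """
    for _ in range(max_attempts):
        j = fetch()
        status = j.get('status')
        if status == 'finished':
            return j
        if status in FAILED_STATUSES:
            raise RuntimeError("; ".join(j.get('messages') or []))
        sleep(interval)
    raise TimeoutError("job did not finish after {0} attempts".format(max_attempts))

# Simulated job: two "started" responses followed by a finished one.
_fake_responses = iter([
    {'status': 'started'},
    {'status': 'started'},
    {'status': 'finished', 'data': []},
])
demo_result = poll_job(lambda: next(_fake_responses), interval=0)
print(demo_result['status'])
```

Capping the number of attempts is a design choice: it keeps a stuck job from blocking the notebook indefinitely.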


Take a look at the results:

In [7]:
result_table = etl.fromdicts(j['data'])
etl.vis.display(result_table)

id,data
2,"[{'ts': '2021-02-28', 'val': 0.72, 'src': 'G'}, {'ts': '2021-03-01', 'val': 0.21, 'src': 'G'}, {'ts': '2021-03-02', 'val': 0, 'src': 'G'}]"
8,"[{'ts': '2021-02-28', 'val': 0.86, 'src': 'G'}, {'ts': '2021-03-01', 'val': 0.18, 'src': 'G'}, {'ts': '2021-03-02', 'val': 0, 'src': 'G'}]"
9,"[{'ts': '2021-02-28', 'val': 1.0, 'src': 'G'}, {'ts': '2021-03-01', 'val': 0.25, 'src': 'G'}, {'ts': '2021-03-02', 'val': 0, 'src': 'G'}]"
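
With `'f': 'sensor'`, each row holds one sensor's whole timeseries. If a per-timestamp view is more convenient, the nested structure can be pivoted in plain Python. This is a sketch only: `pivot_by_timestamp` is a hypothetical helper, the sample rows are copied from the result above, and the output shape is an illustration rather than a format the API itself is known to return.

```python
from collections import defaultdict

def pivot_by_timestamp(sensor_rows):
    """Turn [{'id': ..., 'data': [{'ts': ..., 'val': ...}, ...]}, ...]
    into {timestamp: {sensor_id: value}}."""
    pivoted = defaultdict(dict)
    for row in sensor_rows:
        for obs in row['data']:
            pivoted[obs['ts']][row['id']] = obs['val']
    return dict(pivoted)

# Two of the gauges from the result above (first two days only).
sample = [
    {'id': '2', 'data': [{'ts': '2021-02-28', 'val': 0.72, 'src': 'G'},
                         {'ts': '2021-03-01', 'val': 0.21, 'src': 'G'}]},
    {'id': '8', 'data': [{'ts': '2021-02-28', 'val': 0.86, 'src': 'G'},
                         {'ts': '2021-03-01', 'val': 0.18, 'src': 'G'}]},
]
by_ts = pivot_by_timestamp(sample)
print(by_ts['2021-02-28'])
```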


Calculate rainfall totals for each sensor:

In [8]:
result_table = etl.addfield(
    result_table, 
    'total', 
    lambda row: round(sum(
        [t['val'] for t in row['data'] if t['val'] is not None]
    ), 2)
)
result_table

id,data,total
2,"[{'ts': '2021-02-28', 'val': 0.72, 'src': 'G'}, {'ts': '2021-03-01', 'val': 0.21, 'src': 'G'}, {'ts': '2021-03-02', 'val': 0, 'src': 'G'}]",0.93
8,"[{'ts': '2021-02-28', 'val': 0.86, 'src': 'G'}, {'ts': '2021-03-01', 'val': 0.18, 'src': 'G'}, {'ts': '2021-03-02', 'val': 0, 'src': 'G'}]",1.04
9,"[{'ts': '2021-02-28', 'val': 1.0, 'src': 'G'}, {'ts': '2021-03-01', 'val': 0.25, 'src': 'G'}, {'ts': '2021-03-02', 'val': 0, 'src': 'G'}]",1.25


Now we have both timeseries and totals per sensor. This is all we need to get rainfall totals on a map.

# Next: Make the map

As shown above, 3RWW's Rainfall Data API is not spatial: it returns rainfall values for locations at points in time, but those locations are only represented by IDs.

(We keep the location separate from the observations because the sensors don't move.)

To do anything that is location specific with this data (e.g., query rainfall in a specific watershed), you'll want some geodata for reference.

## Radar Pixel Polygons

The pixel layer used for the calibrated radar rainfall data is available on [3RWW's Open Data Portal](http://data-3rww.opendata.arcgis.com/) at:

* [data-3rww.opendata.arcgis.com/datasets/228b1584b89a45308ed4256c5bedd43d_1](https://data-3rww.opendata.arcgis.com/datasets/228b1584b89a45308ed4256c5bedd43d_1), and
* [3rww.maps.arcgis.com/home/item.html?id=228b1584b89a45308ed4256c5bedd43d](https://3rww.maps.arcgis.com/home/item.html?id=228b1584b89a45308ed4256c5bedd43d)

We'll use the ArcGIS API for Python to bring it in here.

In [9]:
# Establish a connection to 3RWW's ArcGIS Online portal.
gis = GIS('https://3rww.maps.arcgis.com')

We can search for the feature layer by name:

In [10]:
search_results = gis.content.search('Gauge Adjusted Radar Rainfall Data')
for item in search_results:
    display(item)
garrd_item = search_results[0]

Alternatively, we can use the item `id` to directly find the feature layer:

In [11]:
garrd_id = "228b1584b89a45308ed4256c5bedd43d"
garrd_item = gis.content.get(itemid=garrd_id)
garrd_item

Either way gets us `garrd_item`: a feature layer *collection* item, which contains individual feature layers. This one (we know from clicking on the item above) has both point and polygon variants of the GARRD reference geometry. We're interested in the polygons (grid). Get that as follows:

In [12]:
garrd_grid = garrd_item.layers[1]
garrd_grid

<FeatureLayer url:"https://services6.arcgis.com/dMKWX9NPCcfmaZl3/arcgis/rest/services/garrd/FeatureServer/1">

Put it on a map:

In [13]:
m = gis.map('Pittsburgh')
m.add_layer(garrd_grid)
display(m)

MapView(layout=Layout(height='400px', width='100%'))

Finally, we can export the query result as GeoJSON.

In [14]:
q = garrd_grid.query(out_sr=4326)
garrd_grid_geojson = q.to_geojson

## Area of Interest Polygons

Next, we'll establish an area of interest using a polygon from an existing dataset: the Girtys Run watershed.

Allegheny County has a watershed dataset in ArcGIS Online, so we'll use that for this example. It's available here:

* http://openac-alcogis.opendata.arcgis.com/datasets/364f4c3613164f79a1d8c84aed6c03e0_0

(Note that you could swap this out for any online geodata service that provides polygons, and this will work)

In [15]:
# use the item ID from the link above to get the layer
watersheds_item = gis.content.get(itemid="364f4c3613164f79a1d8c84aed6c03e0")
watersheds_layer = watersheds_item.layers[0]
basin = watersheds_layer.query(where="DESCR like '%GIRTYS%'", out_sr=4326)

In [16]:
m2 = gis.map('Pittsburgh')
m2.add_layer(basin)
m2

MapView(layout=Layout(height='400px', width='100%'))

> *Note that while we're pulling our data from online sources, you could also read in your own geometry here from a shapefile on disk.*

## Intersecting Pixels w/ the Area of Interest

Now that we know how to get pixel data and area-of-interest data, we can perform a spatial intersection to get the IDs of the pixels in the area of interest, which we'll then use in a query to the 3RWW Rainfall API.

Using the `garrd_grid` feature layer and the `basin` feature set, running an intersect is pretty easy:

In [17]:
# construct the filter using the geometry module
sa_filter = geometry.filters.intersects(geometry=basin.features[0].geometry, sr=4326)
# then use that filter in a query of the pixel data
pixels_of_interest = garrd_grid.query(geometry_filter=sa_filter, out_sr=4326)
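
The intersection itself is done by the feature service, so nothing more is needed here. For intuition only, the following sketch shows a coarse bounding-box overlap check on Esri-style `rings`. It is a simplification (a true polygon intersection is stricter than a bounding-box test), `ring_bbox` and `bboxes_overlap` are hypothetical helpers, and all coordinates are made up for illustration.

```python
def ring_bbox(ring):
    """Return (xmin, ymin, xmax, ymax) for a list of [x, y] vertices."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)

def bboxes_overlap(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes share any area or edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# Made-up coordinates for illustration (longitude, latitude).
pixel_ring = [[-80.05, 40.56], [-80.05, 40.57], [-80.04, 40.57],
              [-80.04, 40.56], [-80.05, 40.56]]
aoi_ring = [[-80.045, 40.565], [-80.045, 40.60], [-80.00, 40.60],
            [-80.00, 40.565], [-80.045, 40.565]]
far_ring = [[-79.90, 40.40], [-79.90, 40.41], [-79.89, 40.41],
            [-79.89, 40.40], [-79.90, 40.40]]

near_overlaps = bboxes_overlap(ring_bbox(pixel_ring), ring_bbox(aoi_ring))
far_overlaps = bboxes_overlap(ring_bbox(pixel_ring), ring_bbox(far_ring))
print(near_overlaps, far_overlaps)
```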

In [18]:
m3 = gis.map('Pittsburgh')
m3.add_layer(pixels_of_interest)
m3

MapView(layout=Layout(height='400px', width='100%'))

There they are: pixels covering the Girtys Run watershed. Let's get a list of IDs, since that's what we're after.

First, let's introspect so we know what to go after:

In [19]:
pixels_of_interest.features[0]

{"geometry": {"rings": [[[-80.0475764477203, 40.5659130044032], [-80.047883686099, 40.5749156295791], [-80.0360770815008, 40.5751493312659], [-80.0357714203079, 40.5661466752918], [-80.0475764477203, 40.5659130044032]]]}, "attributes": {"OBJECTID": 539, "PIXEL": "143124", "Shape__Area": 10763910.416626, "Shape__Length": 13123.3595800102}}

We can see that each Feature object is represented as a Python dictionary, and the ID is stored under `attributes` in the `PIXEL` property. We can get all the pixel IDs out into a list with a one-liner:

In [20]:
pixel_ids = list(set([f.attributes['PIXEL'] for f in pixels_of_interest.features]))
print(pixel_ids)

['145128', '144125', '148133', '147130', '148135', '145127', '150131', '144131', '150132', '146128', '148131', '145125', '149132', '146130', '148132', '150133', '146129', '144126', '144128', '144124', '143126', '143124', '149134', '149130', '145131', '147133', '149133', '144127', '148134', '144129', '145126', '148130', '147134', '147132', '143130', '143128', '143129', '145129', '148129', '149135', '143127', '146127', '149131', '146126', '143125', '145130', '144130', '147131', '146131']


## Getting rainfall data for those locations

Similar to before, except this time:

* we're coming with a list of pixels for our area of interest
* we use the realtime radar pixels endpoint


In [21]:
# note that our pixels kwarg is a comma-delimited string, rather than a Python list.
pixels_str = ",".join(pixel_ids)

response = requests.post(
    url="http://localhost:7000/rainfall/v2/pixel/realtime/",
    json={
        'start_dt':'2021-02-28T00:00:00',
        'end_dt':'2021-03-02T00:00:00',
        'rollup': 'daily',
        'pixels': pixels_str,
        'f': 'sensor'
    }
)
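
Because both requests in this notebook share the same parameters apart from the ID list, the payload could be assembled by a small function. This is a sketch under assumptions: `build_payload` is a hypothetical helper, the parameter names (`start_dt`, `end_dt`, `rollup`, `f`, and `pixels` or `gauges`) are the ones used in the requests above, and sorting the IDs is only a convenience for reproducible requests.

```python
def build_payload(ids, start_dt, end_dt, id_key='pixels', rollup='daily', fmt='sensor'):
    """Build the JSON body for a rainfall request.

    `ids` may be any iterable of IDs; duplicates are dropped and the rest are
    joined into the comma-delimited string the requests above use.
    """
    unique_ids = sorted({str(i) for i in ids})
    return {
        'start_dt': start_dt,
        'end_dt': end_dt,
        'rollup': rollup,
        id_key: ",".join(unique_ids),
        'f': fmt,
    }

payload = build_payload(
    ['143125', '143124', '143125'],
    start_dt='2021-02-28T00:00:00',
    end_dt='2021-03-02T00:00:00',
)
print(payload['pixels'])
```

The resulting dictionary can be passed to the `json` keyword of `requests.post`, as in the cells above.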

In [22]:
done = False

while not done:
    
    job = requests.get(response.json()['meta']['jobUrl'])
    j = job.json()
    
    if j['status'] == 'finished':
        done = True
        print("Done!\n", job.json())
        
    elif j['status'] in ['deferred', 'failed', "does not exist", 'error']:
        done = True
        print("Something went wrong: {0}".format("; ".join([m for m in j['messages']])))
            
    else:
        print(j['status'],"...")
        sleep(2)

started ...
started ...
started ...
started ...
started ...
started ...
started ...
started ...
started ...
Done!
 {'args': {'sensor_ids': '145128,144125,148133,147130,148135,145127,150131,144131,150132,146128,148131,145125,149132,146130,148132,150133,146129,144126,144128,144124,143126,143124,149134,149130,145131,147133,149133,144127,148134,144129,145126,148130,147134,147132,143130,143128,143129,145129,148129,149135,143127,146127,149131,146126,143125,145130,144130,147131,146131', 'start_dt': '2021-02-28T00:00:00-05:00', 'end_dt': '2021-03-02T00:00:00-05:00', 'rollup': 'daily', 'zerofill': True, 'f': 'sensor'}, 'meta': {'records': 49, 'jobId': 'db842d97-08c5-4027-a8eb-11fdf256062d', 'jobUrl': 'http://localhost:7000/rainfall/v2/pixel/realtime/db842d97-08c5-4027-a8eb-11fdf256062d/'}, 'data': [{'id': '143124', 'data': [{'ts': '2021-02-28', 'val': 0.695, 'src': 'R'}, {'ts': '2021-03-01', 'val': 0.321, 'src': 'R'}, {'ts': '2021-03-02', 'val': 0, 'src': 'R'}]}, {'id': '143125', 'data': [{'ts'

Take a look at the results:

In [23]:
result_table = etl.fromdicts(j['data'])
etl.vis.display(result_table)

id,data
143124,"[{'ts': '2021-02-28', 'val': 0.695, 'src': 'R'}, {'ts': '2021-03-01', 'val': 0.321, 'src': 'R'}, {'ts': '2021-03-02', 'val': 0, 'src': 'R'}]"
143125,"[{'ts': '2021-02-28', 'val': 0.7, 'src': 'R'}, {'ts': '2021-03-01', 'val': 0.298, 'src': 'R'}, {'ts': '2021-03-02', 'val': 0, 'src': 'R'}]"
143126,"[{'ts': '2021-02-28', 'val': 0.665, 'src': 'R'}, {'ts': '2021-03-01', 'val': 0.282, 'src': 'R'}, {'ts': '2021-03-02', 'val': 0, 'src': 'R'}]"
143127,"[{'ts': '2021-02-28', 'val': 0.633, 'src': 'R'}, {'ts': '2021-03-01', 'val': 0.277, 'src': 'R'}, {'ts': '2021-03-02', 'val': 0, 'src': 'R'}]"
143128,"[{'ts': '2021-02-28', 'val': 0.626, 'src': 'R'}, {'ts': '2021-03-01', 'val': 0.269, 'src': 'R'}, {'ts': '2021-03-02', 'val': 0, 'src': 'R'}]"


Using PETL we can calculate a total for our timeseries as well:

In [24]:
result_table = etl.addfield(
    result_table, 
    'total', 
    lambda row: round(sum(
        [t['val'] for t in row['data'] if t['val'] is not None]
    ), 2)
)
result_table

id,data,total
143124,"[{'ts': '2021-02-28', 'val': 0.695, 'src': 'R'}, {'ts': '2021-03-01', 'val': 0.321, 'src': 'R'}, {'ts': '2021-03-02', 'val': 0, 'src': 'R'}]",1.02
143125,"[{'ts': '2021-02-28', 'val': 0.7, 'src': 'R'}, {'ts': '2021-03-01', 'val': 0.298, 'src': 'R'}, {'ts': '2021-03-02', 'val': 0, 'src': 'R'}]",1.0
143126,"[{'ts': '2021-02-28', 'val': 0.665, 'src': 'R'}, {'ts': '2021-03-01', 'val': 0.282, 'src': 'R'}, {'ts': '2021-03-02', 'val': 0, 'src': 'R'}]",0.95
143127,"[{'ts': '2021-02-28', 'val': 0.633, 'src': 'R'}, {'ts': '2021-03-01', 'val': 0.277, 'src': 'R'}, {'ts': '2021-03-02', 'val': 0, 'src': 'R'}]",0.91
143128,"[{'ts': '2021-02-28', 'val': 0.626, 'src': 'R'}, {'ts': '2021-03-01', 'val': 0.269, 'src': 'R'}, {'ts': '2021-03-02', 'val': 0, 'src': 'R'}]",0.9
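
Beyond per-pixel totals, a single summary for the whole area of interest is often useful. The sketch below uses `mean` and `stdev` from the standard library (both imported in the setup cell) on the five totals displayed above. `summarize_totals` is a hypothetical helper, and an unweighted mean assumes every pixel counts equally, which ignores pixels that fall only partly inside the watershed.

```python
from statistics import mean, stdev

def summarize_totals(totals):
    """Return basic summary statistics for a list of per-pixel totals."""
    return {
        'count': len(totals),
        'mean': round(mean(totals), 3),
        # stdev needs at least two values
        'stdev': round(stdev(totals), 3) if len(totals) > 1 else 0.0,
        'min': min(totals),
        'max': max(totals),
    }

# Totals for the five pixels displayed above.
sample_totals = [1.02, 1.0, 0.95, 0.91, 0.9]
summary = summarize_totals(sample_totals)
print(summary)
```

In the notebook, the full list of totals could be pulled from the table with something like `list(etl.values(result_table, 'total'))`.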


## Get the results on the map

We update each feature in the layer with its total:

In [32]:
# make a copy since we're going to modify the layer in place
pixel_layer = deepcopy(pixels_of_interest)

# for f in pixels_geojson['features']:
for f in pixel_layer.features:
    # get the pixel ID
    p = f.attributes['PIXEL']
    # use that to select the row in our table; get row as a dict
    t = etl\
        .selecteq(result_table, 'id', p)\
        .cutout('data', 'id')\
        .dicts()
    # update the feature's attributes with the total
    f.attributes.update(t[0])
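
Selecting from the table once per feature rescans the table for every pixel. For larger areas of interest, one alternative is to build a lookup dictionary first and join against it. The sketch below shows the idea with plain dictionaries standing in for the table rows and the features; `join_totals`, `rows`, and `features` are hypothetical names, and features without a matching row get `None` rather than raising an error.

```python
def join_totals(features, rows, feature_key='PIXEL', row_key='id', field='total'):
    """Copy `field` from each row onto the feature with the matching ID.

    `features` is a list of dicts with an 'attributes' dict; `rows` is a list
    of dicts, one per table row.
    """
    lookup = {row[row_key]: row[field] for row in rows}
    for feature in features:
        pixel_id = feature['attributes'][feature_key]
        feature['attributes'][field] = lookup.get(pixel_id)
    return features

rows = [{'id': '143124', 'total': 1.02}, {'id': '143125', 'total': 1.0}]
features = [
    {'attributes': {'PIXEL': '143124'}},
    {'attributes': {'PIXEL': '143125'}},
    {'attributes': {'PIXEL': '999999'}},  # no rainfall row for this one
]
joined = join_totals(features, rows)
print([f['attributes']['total'] for f in joined])
```

With PETL, a comparable lookup can be built with `etl.lookupone(result_table, 'id', 'total')`.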

Since we've added a new property to the attributes object, we also need to describe that as a new field in the layer object:

In [33]:
new_fields = [
    {
        'name': 'total',
        'type': 'esriFieldTypeDouble',
        'alias': 'total',
        'sqlType': 'sqlTypeOther',
        'domain': None,
        'defaultValue': None
    }
]

# extend the fields on the copy we modified above, since that is the layer we map
pixel_layer.fields.extend(new_fields)

Finally, we're ready to put that on a map:

In [36]:
map_widget = gis.map('Pittsburgh')
map_widget.add_layer(
    pixel_layer, 
    options={"opacity":1, "renderer": "ClassedColorRenderer", "field_name":"total"}
)
map_widget

MapView(layout=Layout(height='400px', width='100%'))

# So...that seemed like a lot of work?

Our users are potentially looking at using this data in ways we can't accommodate in a web app.

This presentation is also the notebook that we'll share with them so they can create their own custom workflows.

For those who just want to see and download rainfall data? [There's an app for that](https://3rww.github.io/rainfall).