# Rivulet: U.S. Geological Survey Water Quality APIs
_by Michelle H Wilkerson, Lucas Coletti_

## Purpose of this Notebook

This notebook was developed as part of NSF Grant 2445609 to support accessing and processing public data for middle and high school classroom activities. It's written to be relatively accessible to beginners, but if you have not worked with computational notebooks or Python before, you may find this tool difficult to navigate. (Check out the Show Your Work project for a gentle introduction to computational notebooks for educators!)

Our project is focused on supporting data analysis and mechanistic reasoning in science education. In other words, we want students to learn how data provides information about _how scientific mechanisms work_, and how understanding scientific mechanisms can help them to _explain and interpret patterns in data_. This builds on a long history of research on complex systems and agent-based modeling, and more closely connects that work to current expansions of data analysis across subjects.

Here, we focus on water quality as a phenomenon. While most students understand that poor water quality can affect health, they may not know what sorts of pollutants affect water quality or what kinds of events or conditions degrade it.

This data tool lets users connect to the United States Geological Survey (USGS) water data APIs and search for water quality data streams in an area of interest. It then provides a collection of ways to filter and search the data so that you find datasets with patterns worth exploring. These kinds of datasets can serve as a launch point for examining what water quality is and what its underlying mechanistic and compositional complexities are.

You are welcome to modify and adapt this script. You may find the USGS' water data APIs documentation [here]() and [here](https://doi-usgs.github.io/dataretrieval-python/) helpful.

## Part 0: What is an API? (Click to expand...)

Describe APIs, how they work, how common they are in data science, why they are useful for educators and educational researchers who do data science education work to know about. Describe the risks and concerns about APIs and using them to source data that students will interact with.

## Part 1: Connecting with USGS Water Data APIs

The USGS has developed a Python library (unhelpfully but impressively called `dataretrieval`) to help people access and fetch hydrological data from several different water-related data services.

You can sign up for an API key [at this site](https://api.waterdata.usgs.gov/signup). Once you receive it, replace the key assigned to `API_KEY` below with your unique API key. Do not share your key!

In [3]:
!pip install dataretrieval

API_KEY = "A1K91YFduwNuiDxl57aqiDtg42gSxQdaCBcU3rjq" 

Collecting dataretrieval
  Downloading dataretrieval-1.0.12-py3-none-any.whl.metadata (9.2 kB)
Downloading dataretrieval-1.0.12-py3-none-any.whl (38 kB)
Installing collected packages: dataretrieval
Successfully installed dataretrieval-1.0.12
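
Because a key typed into a notebook travels with every copy of that notebook, one option is to keep it in an environment variable instead. The sketch below shows that pattern. The helper `load_api_key` and the variable name `USGS_API_KEY` are made up for illustration; `dataretrieval` does not look for that name on its own, and the fallback string is only a placeholder.

```python
import os


def load_api_key(env_var="USGS_API_KEY", placeholder="DEMO_KEY"):
    """Return the key stored in an environment variable, or a placeholder.

    Hypothetical helper: the variable name USGS_API_KEY is an example,
    not something dataretrieval reads by itself.
    """
    key = os.environ.get(env_var, "").strip()
    return key if key else placeholder


API_KEY = load_api_key()

# Report which kind of key is in use without printing the key itself.
if API_KEY == "DEMO_KEY":
    print("Using the placeholder key")
else:
    print("Using a personal key")
```

With this approach the notebook can be shared freely, and each person sets their own key on their own machine before starting Jupyter.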


This may not matter yet, but the docs say that if you want data collected after March 2024 you should specify `legacy=False` (the note concerns the `wqp` module). See [here](https://github.com/DOI-USGS/dataretrieval-python#:~:text=%E2%9A%A0%EF%B8%8F,the%20wqp%20module.) and [here](https://doi-usgs.github.io/dataRetrieval/articles/Status.html#:~:text=Discrete%20Data,non%2DUSGS%20data) for more information.

In [9]:
import dataretrieval.nwis as nwis
import pandas as pd

## Part 2: Specifying a Location and Time Period

Let's create a bounding box to indicate the region we are interested in. We will then filter our queries to focus only on monitoring sites within the bounding box.

In [14]:
# EDIT HERE: Define a bounding box around your
# target region. If it is densely populated, we suggest
# you start with a bounding box that covers no more than
# about one square degree.

min_lat = 37.5 # CHANGE TO YOUR MINIMUM LATITUDE
max_lat = 38 # CHANGE TO YOUR MAXIMUM LATITUDE

min_long = -122.5 # CHANGE TO YOUR MINIMUM LONGITUDE
max_long = -122 # CHANGE TO YOUR MAXIMUM LONGITUDE

# this is unnecessary but sort of luxurious. let's map the box to
# make sure we're capturing what we want.

import folium

bbox = [[min_lat, min_long], [max_lat, max_long]]

# Calculate the center of the box to position the map
map_center = [(bbox[0][0] + bbox[1][0]) / 2, (bbox[0][1] + bbox[1][1]) / 2]

# Create a Folium map object
m = folium.Map(location=map_center, zoom_start=8)

# Add a rectangle for the bounding box to the map
folium.Rectangle(
    bounds=bbox,
    color="#ff0000",        # Red border
    fill=True,
    fill_color="#ff7800",   # Orange fill
    fill_opacity=0.2
).add_to(m)

m
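
The site request later in this notebook wants the box written as west, south, east, north, and the comment in that request cell quotes a limit of 25 for latitude range times longitude range. In decimal degrees, west is simply the smaller longitude and south the smaller latitude in either hemisphere, so min/max values map straight across. The one exception is a box that crosses the 180-degree meridian, which this sketch does not handle. The helper name `make_bbox_string` is hypothetical and not part of `dataretrieval`.

```python
def make_bbox_string(min_lat, max_lat, min_long, max_long, max_area=25.0):
    """Return 'west,south,east,north' for a bounding box, or raise ValueError.

    Hypothetical helper. Assumes decimal degrees and a box that does not
    cross the 180-degree meridian.
    """
    if not (-90 <= min_lat < max_lat <= 90):
        raise ValueError("latitudes must satisfy -90 <= min_lat < max_lat <= 90")
    if not (-180 <= min_long < max_long <= 180):
        raise ValueError("longitudes must satisfy -180 <= min_long < max_long <= 180")

    # The limit quoted in the request cell: latitude range x longitude range.
    area = (max_lat - min_lat) * (max_long - min_long)
    if area > max_area:
        raise ValueError(f"box covers {area:g} square degrees; the limit is {max_area:g}")

    # West is the smallest longitude and south the smallest latitude.
    return f"{min_long},{min_lat},{max_long},{max_lat}"


bbox_string = make_bbox_string(min_lat=37.5, max_lat=38, min_long=-122.5, max_long=-122)
print(bbox_string)
```

Checking the box before sending it means a swapped min/max or an oversized region shows up as a readable error here rather than as a failed request.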

We're going to fetch the list of sites within the bounding box specified above.

Like the AQS team, the NWIS team is awesome and will fetch you some beautiful data. For each request, they pass back a tuple (in this case, a two-ple) of `(dataframe, metadata)`, so be sure to catch what's returned in that format. Below, since we are requesting sites, we assign the results of the request to the tuple `sites, sites_metadata`.

Note that at the bottom of the output, you'll get a URL. That's the corresponding GET request, which you can open yourself to get the same data. It's helpful for debugging if something isn't working as you would expect.
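
To make the idea of a GET request concrete, here is a minimal sketch that assembles one by hand from a bounding box string. The endpoint and the `format=rdb` parameter are assumptions based on the public USGS Site Service, not values copied from this notebook's output, and `build_site_request_url` is a hypothetical helper. The URL that `dataretrieval` reports may order or spell its parameters differently.

```python
from urllib.parse import urlencode

# Assumed endpoint for the USGS Site Service; compare it with the URL that
# your own request reports before relying on it.
SITE_SERVICE_URL = "https://waterservices.usgs.gov/nwis/site/"


def build_site_request_url(bbox_string, base_url=SITE_SERVICE_URL):
    """Assemble a GET URL asking for sites inside a west,south,east,north box."""
    params = {"format": "rdb", "bBox": bbox_string}
    # safe="," keeps the commas in the bounding box readable.
    return base_url + "?" + urlencode(params, safe=",")


example_url = build_site_request_url("-122.5,37.5,-122,38")
print(example_url)
```

Everything after the `?` is a list of `name=value` pairs joined by `&`, which is why a request that misbehaves can often be diagnosed just by reading its URL.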

In [15]:
# bBox (list): A contiguous range of decimal latitude and longitude.
# Starts with the west longitude, then the south latitude,
# then the east longitude, and then the north latitude,
# with each value separated by a comma.
# The product of the latitude range and the longitude range cannot exceed 25 degrees.
# NOTE: In decimal degrees, west is always the smaller longitude and south
# the smaller latitude, whatever the hemisphere, so min/max map directly onto
# west/south/east/north. The exception is a box that crosses the 180-degree
# meridian, which we ignore for now.

bbox = str(min_long) + "," + str(min_lat) + "," + str(max_long) + "," + str(max_lat)

sites, sites_metadata = nwis.what_sites(bBox=bbox)

sites

| | agency_cd | site_no | station_nm | site_tp_cd | dec_lat_va | dec_long_va | coord_acy_cd | dec_coord_datum_cd | alt_va | alt_acy_va | alt_datum_cd | huc_cd | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | USGS | 11162618 | PILARCITOS LK NR HILLSBOROUGH CA | LK | 37.548783 | -122.422152 | S | NAD83 | 2.66 | 0.16 | NAVD88 | 18050006 | POINT (-122.42215 37.54878) |
| 1 | USGS | 111626182 | PILARCITOS C BL SPILLWAY NR HILLSBOROUGH CA | ST | 37.547694 | -122.420889 | 5 | NAD83 | 613.50 | 0.16 | NAVD88 | 18050006 | POINT (-122.42089 37.54769) |
| 2 | USGS | 11162619 | PILARCITOS C AB STONE DAM NR HILLSBOROUGH CA | ST | 37.528139 | -122.400083 | 5 | NAD83 | 528.77 | 0.16 | NAVD88 | 18050006 | POINT (-122.40008 37.52814) |
| 3 | USGS | 11162620 | PILARCITOS C BL STONE DAM NR HILLSBOROUGH CA | ST | 37.524617 | -122.399373 | S | NAD83 | 459.45 | 0.16 | NAVD88 | 18050006 | POINT (-122.39937 37.52462) |
| 4 | USGS | 11162640 | DENNISTON C A EL GRANADA CA | ST | 37.509663 | -122.488033 | F | NAD83 | | | | 18050006 | POINT (-122.48803 37.50966) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 496 | USGS | 375819122035801 | GRAYSON C A GOLF CLUB RD NR PLEASANT HILL CA | ST | 37.972000 | -122.066000 | S | NAD83 | 23.00 | 4.30 | NAVD88 | 18050001 | POINT (-122.066 37.972) |
| 497 | USGS | 375824122261201 | SAN FRANCISCO BAY WATER QUALITY PROJECT SITE 15 | ES | 37.973331 | -122.436667 | M | NAD83 | | | | 18050002 | POINT (-122.43667 37.97333) |
| 498 | USGS | 375848122011201 | 002N001W31D001M | GW | 37.980000 | -122.020000 | F | NAD83 | 101.70 | 20.00 | NAVD88 | 18050001 | POINT (-122.02 37.98) |
| 499 | USGS | 375918122250701 | SAN PABLO BAY NR SAN PABLO CA | ES | 37.988392 | -122.418700 | S | NAD83 | | | | 18050002 | POINT (-122.4187 37.98839) |


Whoa! Ok that's a lot of sites.
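
One way to make a list like this manageable is to group it by the `site_tp_cd` column and keep only the kinds of sites you care about. The sketch below does that on a handful of rows copied from the output above, so it runs without a network connection; with the real data you would use `sites` in place of `sample_sites`. As far as we can tell from the USGS site type codes, `ST` is a stream, `LK` a lake, `ES` an estuary, and `GW` a groundwater well, but check the USGS code list to be sure.

```python
import pandas as pd

# A few rows copied from the output above. Site numbers are kept as strings
# because they are identifiers, not quantities.
sample_sites = pd.DataFrame(
    {
        "site_no": [
            "11162618",
            "111626182",
            "11162619",
            "11162640",
            "375824122261201",
            "375848122011201",
        ],
        "station_nm": [
            "PILARCITOS LK NR HILLSBOROUGH CA",
            "PILARCITOS C BL SPILLWAY NR HILLSBOROUGH CA",
            "PILARCITOS C AB STONE DAM NR HILLSBOROUGH CA",
            "DENNISTON C A EL GRANADA CA",
            "SAN FRANCISCO BAY WATER QUALITY PROJECT SITE 15",
            "002N001W31D001M",
        ],
        "site_tp_cd": ["LK", "ST", "ST", "ST", "ES", "GW"],
    }
)

# Count how many sites there are of each type.
type_counts = sample_sites["site_tp_cd"].value_counts()
print(type_counts)

# Keep only the rows whose site type code is "ST".
stream_sites = sample_sites[sample_sites["site_tp_cd"] == "ST"]
print(stream_sites[["site_no", "station_nm"]])
```

The USGS Site Service also documents a `siteType` parameter that may do the same filtering on the server side; whether `what_sites` forwards it in your installed version is worth checking before you depend on it.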

## Part 3: EColi (?) TODO

## Part 4: Winter Salt?

## Part 5: Oxygen Dissolution

## Part 6: Turbidity?

## Credits

Hodson, T.O., Hariharan, J.A., Black, S., and Horsburgh, J.S., 2023, dataretrieval (Python): a Python package for discovering and retrieving water data available from U.S. federal hydrologic web services: U.S. Geological Survey software release, https://doi.org/10.5066/P94I5TX3.