# Assessing WQP APIs via AOIs
- Emilio Mayorga, UW-APL. 2018-4-5
- Run with Python 2.7 using [`odm2client` conda environment](https://github.com/BiG-CZ/wshp2017_tutorial_content/blob/master/clientenvironment.yml), the one created for the [BiGCZ-ODM2 workshop in Nov. 2017](https://github.com/BiG-CZ/bigcz_wshp2017)

## Code used to calculate the area in $km^2$ of an AOI bounding box
This AOI is specified in lat-lon coordinates as `bbox = (ymin, xmin, ymax, xmax)`

In [1]:
import numpy as np
from shapely.geometry import Polygon

From https://stackoverflow.com/questions/45733838/how-to-calculate-area-of-a-polygon-with-latitude-and-longitude

In [2]:
def to_polygon_vertices(ymin, xmin, ymax, xmax):
    # list of bounding box vertices, to form a closed polygon
    box_points = [(ymin, xmin), (ymax, xmin), (ymax, xmax), (ymin, xmax), (ymin, xmin)]
    return box_points

def switch_to_xy_coordinates(coordinates):
    # returns x,y coordinates in km
    earth_radius = 6371.0  # in km
    lat_dist = np.pi * earth_radius / 180.0

    latitudes, longitudes = zip(*coordinates)
    y = (lat * lat_dist for lat in latitudes)
    x = (lon * lat_dist * np.cos(np.radians(lat)) for lat, lon in zip(latitudes, longitudes))
    return list(zip(x, y))

def compute_polygon_area(coordinates):
    # returns area in km2
    xy_coordinates = switch_to_xy_coordinates(coordinates)
    return Polygon(xy_coordinates).area

## WQP first tests
[Water Quality Portal Web Services Guide](https://www.waterqualitydata.us/webservices_documentation/)

In [3]:
from collections import OrderedDict
from StringIO import StringIO
import requests as r
import pandas as pd

In [4]:
# url = "https://www.waterqualitydata.us/data/Station/search?characteristicName=Caffeine&bBox=-92.8,44.2,-88.9,46.0&mimeType=tsv&sorted=no"
wsurl = "https://www.waterqualitydata.us/data/Station/search"

### Set AOI request parameters

In [5]:
# use this scheme to generate a square bounding box centered on the specified lat-lon point
# and with the specified width (in degrees)
width_deg = 1.0

latctr, lonctr = 40.1, -75.5  # just north of the Schuylkill River near Philly
# latctr, lonctr = 41.1, -75.5  # 1 deg north of the above PA/DRB center point
# latctr, lonctr = 46.5, -123.0  # About halfway between Olympia, WA and Portland, OR
# latctr, lonctr = 42.0, -93.0  # Iowa
# latctr, lonctr = 30.0, -97.5  # Texas, south of Austin

In [6]:
bbox = (latctr-0.5*width_deg, lonctr-0.5*width_deg, latctr+0.5*width_deg, lonctr+0.5*width_deg)
ymin,xmin,ymax,xmax = bbox

In [7]:
print("BBOX: {0}.  AOI area: {1:.0f} km2".format(bbox, compute_polygon_area(to_polygon_vertices(*bbox))))

BBOX: (39.6, -76.0, 40.6, -75.0).  AOI area: 9457 km2
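As a cross-check on the planar approximation used above, the area of a lat-lon box on a sphere also has a closed form: $R^2 \, \Delta\lambda \, (\sin\phi_{max} - \sin\phi_{min})$, with angles in radians. The sketch below assumes the same spherical Earth radius (6371 km) as `switch_to_xy_coordinates`; the function name `bbox_area_on_sphere` is only illustrative and is not part of the original notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0  # same spherical radius assumed in switch_to_xy_coordinates


def bbox_area_on_sphere(ymin, xmin, ymax, xmax, radius=EARTH_RADIUS_KM):
    """Area (km2) of a lat-lon bounding box on a sphere.

    Uses R^2 * dlon * (sin(ymax) - sin(ymin)), with dlon in radians.
    Hypothetical helper, added only as a sanity check.
    """
    dlon = math.radians(xmax - xmin)
    band = math.sin(math.radians(ymax)) - math.sin(math.radians(ymin))
    return radius ** 2 * dlon * band


# Same AOI as above, in (ymin, xmin, ymax, xmax) order
bbox = (39.6, -76.0, 40.6, -75.0)
spherical_area = bbox_area_on_sphere(*bbox)
print("Spherical AOI area: {0:.0f} km2".format(spherical_area))
```

For a 1-degree box the two estimates should agree closely, so the simpler planar calculation looks adequate for AOIs of this size.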


## Run WQP API calls

In [8]:
#characteristicName="Caffeine",
params = dict(
    bBox="{0:.3f},{1:.3f},{2:.3f},{3:.3f}",
    mimeType="tsv",
    sorted="no"
)

In [9]:
params['bBox'] = params['bBox'].format(xmin, ymin, xmax, ymax)

In [10]:
params

{'bBox': '-76.000,39.600,-75.000,40.600', 'mimeType': 'tsv', 'sorted': 'no'}
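Note the axis-order swap in the cells above: the AOI `bbox` is stored as `(ymin, xmin, ymax, xmax)`, while the `bBox` request parameter is filled in as `xmin,ymin,xmax,ymax`. A small helper can keep that swap in one place. This is a sketch only; `bbox_to_wqp_param` is a hypothetical name, and the ordering check is an added assumption about what counts as a valid box.

```python
def bbox_to_wqp_param(bbox, precision=3):
    """Format an AOI bbox (ymin, xmin, ymax, xmax) as 'xmin,ymin,xmax,ymax'.

    Hypothetical helper; mirrors the string formatting used in cells 8-9.
    """
    ymin, xmin, ymax, xmax = bbox
    if not (xmin < xmax and ymin < ymax):
        raise ValueError("bbox must be (ymin, xmin, ymax, xmax) with min < max")
    # Reorder to lon-lat pairs and format with a fixed number of decimals
    return ",".join("{0:.{1}f}".format(v, precision)
                    for v in (xmin, ymin, xmax, ymax))


params = dict(
    bBox=bbox_to_wqp_param((39.6, -76.0, 40.6, -75.0)),
    mimeType="tsv",
    sorted="no"
)
print(params['bBox'])
```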

In [11]:
%%time
resp = r.get(wsurl, params=params)

CPU times: user 1.09 s, sys: 140 ms, total: 1.23 s
Wall time: 17.3 s


In [12]:
df = pd.read_csv(StringIO(resp.content), sep='\t')  # coerce_float=True)



In [13]:
df.shape

(19528, 36)

In [14]:
df.head()

Unnamed: 0,OrganizationIdentifier,OrganizationFormalName,MonitoringLocationIdentifier,MonitoringLocationName,MonitoringLocationTypeName,MonitoringLocationDescriptionText,HUCEightDigitCode,DrainageAreaMeasure/MeasureValue,DrainageAreaMeasure/MeasureUnitCode,ContributingDrainageAreaMeasure/MeasureValue,...,CountyCode,AquiferName,FormationTypeText,AquiferTypeName,ConstructionDateText,WellDepthMeasure/MeasureValue,WellDepthMeasure/MeasureUnitCode,WellHoleDepthMeasure/MeasureValue,WellHoleDepthMeasure/MeasureUnitCode,ProviderName
0,USGS-MD,USGS Maryland Water Science Center,MD006-394205075514101,CE Ae 75,Well,,2060002,,,,...,15.0,Piedmont and Blue Ridge crystalline-rock aquifers,"Gneisses, On Garret Island, Near Elk Milles, &...",Unconfined single aquifer,20100609.0,240.0,ft,240.0,ft,NWIS
1,USGS-MD,USGS Maryland Water Science Center,MD006-394205075514301,CE Ae 74,Well,,2060002,,,,...,15.0,Piedmont and Blue Ridge crystalline-rock aquifers,"Gneisses, On Garret Island, Near Elk Milles, &...",Unconfined single aquifer,20100608.0,340.0,ft,340.0,ft,NWIS
2,USGS-MD,USGS Maryland Water Science Center,MD006-394207075514501,CE Ae 73,Well,,2060002,,,,...,15.0,Piedmont and Blue Ridge crystalline-rock aquifers,"Gneisses, On Garret Island, Near Elk Milles, &...",Unconfined single aquifer,20100604.0,380.0,ft,380.0,ft,NWIS
3,USGS-MD,USGS Maryland Water Science Center,MD006-394208075514901,CE Ae 78,Well,,2060002,,,,...,15.0,Piedmont and Blue Ridge crystalline-rock aquifers,"Gneisses, On Garret Island, Near Elk Milles, &...",Unconfined single aquifer,19920425.0,450.0,ft,450.0,ft,NWIS
4,USGS-MD,USGS Maryland Water Science Center,MD006-394208075514902,CE Ae 79,Well,,2060002,,,,...,15.0,Piedmont and Blue Ridge crystalline-rock aquifers,"Gneisses, On Garret Island, Near Elk Milles, &...",Unconfined single aquifer,19920426.0,450.0,ft,450.0,ft,NWIS


In [15]:
df.iloc[0]

OrganizationIdentifier                                                                       USGS-MD
OrganizationFormalName                                            USGS Maryland Water Science Center
MonitoringLocationIdentifier                                                   MD006-394205075514101
MonitoringLocationName                                                                     CE Ae  75
MonitoringLocationTypeName                                                                      Well
MonitoringLocationDescriptionText                                                                NaN
HUCEightDigitCode                                                                            2060002
DrainageAreaMeasure/MeasureValue                                                                 NaN
DrainageAreaMeasure/MeasureUnitCode                                                              NaN
ContributingDrainageAreaMeasure/MeasureValue                                               

In [16]:
df[['MonitoringLocationIdentifier', 'MonitoringLocationName', 'OrganizationIdentifier', 'OrganizationFormalName', 
    'LongitudeMeasure', 'LatitudeMeasure']].tail(10)

Unnamed: 0,MonitoringLocationIdentifier,MonitoringLocationName,OrganizationIdentifier,OrganizationFormalName,LongitudeMeasure,LatitudeMeasure
19518,STROUD-WELL7,Stroud Preserve: Well7,STROUD,Stroud Water Research Center (Pennsylvania),-75.653158,39.944822
19519,STROUD-WELL8,Stroud Preserve: Well8,STROUD,Stroud Water Research Center (Pennsylvania),-75.6534,39.945586
19520,STROUD-WELL9,Stroud Preserve: Well9,STROUD,Stroud Water Research Center (Pennsylvania),-75.653531,39.945444
19521,WWMD_VA-USGS01458500,USGS 1458500 Delaware River at Frenchtown NJ,WWMD_VA,Virginia - World Water Monitoring Day,-75.065,40.526111
19522,WWMD_VA-USGS01467200,USGS 1467200 Delaware R at Ben Franklin Bridg...,WWMD_VA,Virginia - World Water Monitoring Day,-75.137399,39.954002
19523,WWMD_VA-USGS01477050,"USGS 1477050 Delaware River at Chester, PA",WWMD_VA,Virginia - World Water Monitoring Day,-75.366302,39.836779
19524,USGS-01467150,Cooper River at Haddonfield NJ,USGS-NJ,USGS New Jersey Water Science Center,-75.021389,39.903056
19525,USGS-01477120,Raccoon Creek near Swedesboro NJ,USGS-NJ,USGS New Jersey Water Science Center,-75.259167,39.740556
19526,USGS-01464907,Little Neshaminy C at Valley Road nr Neshaminy PA,USGS-PA,USGS Pennsylvania Water Science Center,-75.119616,40.229275
19527,USGS-01472157,"French Creek near Phoenixville, PA",USGS-PA,USGS Pennsylvania Water Science Center,-75.601305,40.151491


In [17]:
df['MonitoringLocationTypeName'].value_counts()

Well                                               16023
River/Stream                                        1509
Stream                                              1203
Estuary                                              207
Spring                                               177
Lake                                                 105
Atmosphere                                            67
Facility: Outfall                                     44
Lake, Reservoir, Impoundment                          32
River/Stream Perennial                                30
Well: Collector or Ranney type well                   18
Facility Other                                        14
Stream: Tidal stream                                  14
Well: Test hole not completed as a well               12
Facility Municipal Sewage (POTW)                      10
Subsurface: Unsaturated zone                           9
Facility: Water-distribution system                    7
Reservoir                      

In [18]:
df['ProviderName'].value_counts()

NWIS       17542
STORET      1982
BIODATA        4
Name: ProviderName, dtype: int64

In [19]:
df['OrganizationIdentifier'].value_counts()

USGS-PA                  15162
USGS-NJ                   1547
USGS-MD                    837
11NPSWRD                   419
NJDEP_BFBM                 253
21NJDEP1                   199
DRBC                       152
NJDEP_AMERICORPS           124
21DELAWQ_WQX               121
NJDEPNJWAP                  89
KWMNDATA                    79
NJDEP_DSREH                 67
MDE_FIELDSERVICES_WQX       60
SAN                         43
21PA_WQX                    39
NJDEP_BEARS                 32
STROUD                      31
31DRBCSP                    29
NARS_WQX                    27
EMAP_CS                     25
EMAP_CS_WQX                 23
31DELRBC_WQX                20
31DELRBC                    17
MDEDAT05_WQX                15
42SRBCWQ_WQX                12
NALMS                       11
OST_SHPD_TEST               10
NARSTEST                    10
RCE WRP                     10
NARS                         8
MDE_TMDL                     8
OST_SHPD                     7
MDEDAT06

**Initially, let's target these attributes only.** (Note: the Provider is the data system or database that the data are extracted from; the Organization is the entity that actually collected, or is ultimately responsible for, the data.) The values shown come from the example record (`df.iloc[0]`) in cell 15, above.
```
ProviderName                                                     NWIS
OrganizationIdentifier                                        USGS-MD
OrganizationFormalName             USGS Maryland Water Science Center
MonitoringLocationIdentifier                    MD006-394205075514101
MonitoringLocationName                                      CE Ae  75
MonitoringLocationTypeName                                       Well
LatitudeMeasure                                               39.7013
LongitudeMeasure                                             -75.8615
```
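One way to narrow the station table to these attributes is sketched below. The names `TARGET_COLUMNS` and `select_target_columns` are hypothetical, and the one-row `example` frame is just a stand-in for `df`, built from the values listed above plus one extra column (`HUCEightDigitCode`) to show that it gets dropped.

```python
import pandas as pd

# Target attributes, in the order listed above
TARGET_COLUMNS = [
    'ProviderName',
    'OrganizationIdentifier',
    'OrganizationFormalName',
    'MonitoringLocationIdentifier',
    'MonitoringLocationName',
    'MonitoringLocationTypeName',
    'LatitudeMeasure',
    'LongitudeMeasure',
]


def select_target_columns(stations):
    """Return a copy of the station table restricted to TARGET_COLUMNS."""
    missing = [col for col in TARGET_COLUMNS if col not in stations.columns]
    if missing:
        raise KeyError("Missing expected WQP columns: {0}".format(missing))
    return stations[TARGET_COLUMNS].copy()


# One-row stand-in for df, using the example record's values
example = pd.DataFrame([{
    'OrganizationIdentifier': 'USGS-MD',
    'OrganizationFormalName': 'USGS Maryland Water Science Center',
    'MonitoringLocationIdentifier': 'MD006-394205075514101',
    'MonitoringLocationName': 'CE Ae  75',
    'MonitoringLocationTypeName': 'Well',
    'HUCEightDigitCode': 2060002,  # not a target attribute
    'LatitudeMeasure': 39.7013,
    'LongitudeMeasure': -75.8615,
    'ProviderName': 'NWIS',
}])

subset = select_target_columns(example)
print(subset.iloc[0])
```

On the full response this would be called as `select_target_columns(df)`. Raising on missing columns is a design choice here, so that a change in the WQP output schema fails loudly rather than silently producing an incomplete table.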