### Get Allocations Data through WaDE API using Indexed Access and Plot Allocation Amounts for a Given State

This Jupyter notebook calls the WaDE 2.0 API to access the "allocation amounts digest", a simplified call that returns only a few fields, for a given state. It tests retrieving the complete allocation amounts data for that state through indexed (paged) API calls. Each call requests up to 10000 rows (the maximum allowed), and the responses are concatenated into one dataframe.

The code:
1. Calls the WaDE API Digest and gets the water allocations table, with only four metadata fields, in JSON format.

2. Organizes the data into a Pandas data frame.

3. Plots water allocation amounts on a Google map and on Plotly's mapbox.


#####  Required packages

    - Pandas
    - Numpy
    - JSON
    - gmaps
    - plotly

Install the required packages (from the command line or here) if they have not been installed already. If running from a Jupyter notebook, use the cell magic %%cmd:

    %%cmd
    pip install gmaps
    
    pip install plotly


In addition, you may need to enable the following extensions:

    jupyter nbextension enable --py --sys-prefix widgetsnbextension

    jupyter nbextension enable --py --sys-prefix gmaps


In [1]:
#!/usr/bin/env python
import pandas as pd
import numpy as np
import os
import json
from pandas import json_normalize  # pandas >= 1.0; pandas.io.json.json_normalize is deprecated
from urllib.request import urlopen
import gmaps
import gmaps.datasets
import plotly.express as px

In [2]:
# Make one small call (only a few records) to get the total number of allocation records

# Request the first 10 rows of SiteAllocationAmounts
url = 'https://wade-api-qa.azure-api.net/v1/SiteAllocationAmounts?State=WA&StartIndex=0\
&RecordCount=10'
#print(url)
response =  urlopen(url)
dataread = response.read().decode("utf-8")
data = json.loads(dataread)
data
df1 = json_normalize(data) #, 'Organizations')
alloc_length = df1['TotalWaterAllocationsCount'].iloc[0]
print(alloc_length)
df1

58231


Unnamed: 0,Organizations,TotalWaterAllocationsCount
0,[{'OrganizationName': 'Washington State Depart...,58231


In [3]:
# Access the WaDE API Digest to get the water allocations JSON for a limited number of columns

base_url='https://wade-api-qa.azure-api.net/v1/SiteAllocationAmountsDigest?OrganizationUUID='

# organization name
org_name = 'WSDE'

df100_list = []
rec_count = 10000
alloc_length = 58231
iloop = 0
print(alloc_length)
for start_index in range(0, alloc_length, rec_count) :
    print("loop "+str(iloop) + " Start index = " + str(start_index))
    url = base_url + org_name + '&StartIndex='+str(start_index) + '&RecordCount=' + str(rec_count)
    response =  urlopen(url)
    dataread = response.read().decode("utf-8")
    data = json.loads(dataread)
    df10 = json_normalize(data, 'Sites', 
                      ['AllocationPriorityDate', 'AllocationAmount','AllocationMaximum' ])
    df100_list.append(df10)
    print('length of dataframe = '+str(len(df10.index)))
    
    iloop = iloop + 1

df100 = pd.concat(df100_list, sort=True, ignore_index=True)

#df100.drop_duplicates(inplace=True)
#print(len(df100.index))

df100   #.head(5)

58231
loop 0 Start index = 0
length of dataframe = 10000
loop 1 Start index = 10000
length of dataframe = 6377
loop 2 Start index = 20000
length of dataframe = 0
loop 3 Start index = 30000
length of dataframe = 0
loop 4 Start index = 40000
length of dataframe = 0
loop 5 Start index = 50000
length of dataframe = 0


Unnamed: 0,AllocationAmount,AllocationMaximum,AllocationPriorityDate,Latitude,Longitude,SiteUUID
0,2,356,2011-01-26T00:00:00,47.060906,-120.358700,WA_100001
1,10,2,1974-06-30T00:00:00,48.978377,-119.066474,WA_100002
2,,12,1956-05-01T00:00:00,48.973750,-119.038081,WA_100003
3,10,2,1974-06-30T00:00:00,48.980615,-119.048117,WA_100005
4,0.004,1.25,1889-11-11T00:00:00,48.972173,-118.980836,WA_100012
5,0.004,0.66,1889-11-11T00:00:00,48.966752,-118.985362,WA_100013
6,0.01,0.5,1889-11-11T00:00:00,48.965844,-119.007295,WA_100014
7,0.01,0.5,1889-11-11T00:00:00,48.964146,-119.014000,WA_100017
8,0.004,0.53,1889-11-11T00:00:00,48.970280,-119.052122,WA_100018
9,0.00222801,0.93,1939-06-01T00:00:00,48.969201,-119.046863,WA_100019
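The log above shows that pages after the second one come back empty, so the remaining calls add nothing. One possible variation, sketched here under assumptions, is to keep requesting pages until an empty one is returned instead of looping over a fixed record count. `fetch_page` is a hypothetical callable standing in for the `urlopen` + `json.loads` + `json_normalize` steps, and the fake data below only simulates the API.

```python
def fetch_all_pages(fetch_page, page_size):
    """Collect rows page by page until a page comes back empty.

    fetch_page(start_index, page_size) is a hypothetical callable that
    returns a list of rows for the requested window.
    """
    rows = []
    start_index = 0
    while True:
        page = fetch_page(start_index, page_size)
        if not page:  # an empty page means there is nothing left to read
            break
        rows.extend(page)
        start_index += page_size
    return rows


# Stand-in for the API: 25 fake rows served in slices.
fake_rows = list(range(25))


def fake_fetch(start_index, page_size):
    return fake_rows[start_index:start_index + page_size]


all_rows = fetch_all_pages(fake_fetch, page_size=10)
print(len(all_rows))
```

With this approach the total count from the first call is no longer needed to drive the loop, at the cost of one extra request that returns the empty page.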


In [4]:
print("Drop rows without lat lon values...")

df500 = df100.dropna(subset=['Longitude', 'Latitude'])
# reset the index of the filtered frame (not df100, which would undo the dropna)
df500 = df500.reset_index(drop=True)

#print(len(df500.index))
df500   #.head(5)

Drop rows without lat lon values...


Unnamed: 0,AllocationAmount,AllocationMaximum,AllocationPriorityDate,Latitude,Longitude,SiteUUID
0,2,356,2011-01-26T00:00:00,47.060906,-120.358700,WA_100001
1,10,2,1974-06-30T00:00:00,48.978377,-119.066474,WA_100002
2,,12,1956-05-01T00:00:00,48.973750,-119.038081,WA_100003
3,10,2,1974-06-30T00:00:00,48.980615,-119.048117,WA_100005
4,0.004,1.25,1889-11-11T00:00:00,48.972173,-118.980836,WA_100012
5,0.004,0.66,1889-11-11T00:00:00,48.966752,-118.985362,WA_100013
6,0.01,0.5,1889-11-11T00:00:00,48.965844,-119.007295,WA_100014
7,0.01,0.5,1889-11-11T00:00:00,48.964146,-119.014000,WA_100017
8,0.004,0.53,1889-11-11T00:00:00,48.970280,-119.052122,WA_100018
9,0.00222801,0.93,1939-06-01T00:00:00,48.969201,-119.046863,WA_100019


In [5]:
print("Drop duplicates if there are any...")

subCols = ['Longitude', 'Latitude']

df500.drop_duplicates(subset = subCols, inplace=True)   #
df500 = df500.reset_index(drop=True)

print(len(df500.index))
df500

Drop duplicates if there are any...
12979


Unnamed: 0,AllocationAmount,AllocationMaximum,AllocationPriorityDate,Latitude,Longitude,SiteUUID
0,2,356,2011-01-26T00:00:00,47.060906,-120.358700,WA_100001
1,10,2,1974-06-30T00:00:00,48.978377,-119.066474,WA_100002
2,,12,1956-05-01T00:00:00,48.973750,-119.038081,WA_100003
3,10,2,1974-06-30T00:00:00,48.980615,-119.048117,WA_100005
4,0.004,1.25,1889-11-11T00:00:00,48.972173,-118.980836,WA_100012
5,0.004,0.66,1889-11-11T00:00:00,48.966752,-118.985362,WA_100013
6,0.01,0.5,1889-11-11T00:00:00,48.965844,-119.007295,WA_100014
7,0.01,0.5,1889-11-11T00:00:00,48.964146,-119.014000,WA_100017
8,0.004,0.53,1889-11-11T00:00:00,48.970280,-119.052122,WA_100018
9,0.00222801,0.93,1939-06-01T00:00:00,48.969201,-119.046863,WA_100019
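Note that `drop_duplicates` on the coordinate columns keeps only the first row found at each location. If several allocations can share one point, an alternative worth considering is to sum the amounts per location. The following is only an illustrative sketch with made-up values, not the notebook's method:

```python
import pandas as pd

# Made-up rows: two allocations share the first location.
df = pd.DataFrame({
    "Latitude": [47.0, 47.0, 48.5],
    "Longitude": [-120.0, -120.0, -119.0],
    "AllocationAmount": [2.0, 3.0, 10.0],
})

# Keeps the first row per location; the 3.0 allocation is discarded.
first_only = df.drop_duplicates(subset=["Longitude", "Latitude"])

# Adds up all allocations per location instead.
summed = df.groupby(["Latitude", "Longitude"], as_index=False)["AllocationAmount"].sum()

print(first_only)
print(summed)
```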


In [6]:
# make sure the data are in the right data types
# plotly complained about allocation types being 'object'

print(df500.dtypes)

df500['AllocationAmount'] = pd.to_numeric(df500['AllocationAmount'], errors='coerce')
df500['AllocationMaximum'] = pd.to_numeric(df500['AllocationMaximum'], errors='coerce')
df500['Latitude'] = pd.to_numeric(df500['Latitude'], errors='coerce')
df500['Longitude'] = pd.to_numeric(df500['Longitude'], errors='coerce')
print(df500.dtypes)

AllocationAmount           object
AllocationMaximum          object
AllocationPriorityDate     object
Latitude                  float64
Longitude                 float64
SiteUUID                   object
dtype: object
AllocationAmount          float64
AllocationMaximum         float64
AllocationPriorityDate     object
Latitude                  float64
Longitude                 float64
SiteUUID                   object
dtype: object
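To illustrate what `errors='coerce'` does in the cell above, here is a small sketch with made-up values: anything that cannot be parsed as a number, including an empty string, becomes NaN, and the column ends up as float64.

```python
import pandas as pd

# Made-up values: numeric strings, an empty string and a non-numeric marker.
raw = pd.Series(["2", "10", "", "0.004", "n/a"])

clean = pd.to_numeric(raw, errors="coerce")

print(clean.dtype)
print(int(clean.isna().sum()))  # number of values that could not be parsed
```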


In [7]:
print("Dropping null amounts...")

# After pd.to_numeric(..., errors='coerce') missing amounts are NaN.
# NaN never compares equal to anything (not even itself), so test with isna().
nullAmount = df500["AllocationAmount"].isna()
if nullAmount.any():
    dropIndex = df500.loc[nullAmount].index
    outdf100 = df500.drop(dropIndex)
    outdf100 = outdf100.reset_index(drop=True)

Dropping null amounts...



In [8]:
print("Dropping null max amounts...")

# Same check for the maximum allocation column: use isna() rather than == np.nan
nullMaximum = df500["AllocationMaximum"].isna()
if nullMaximum.any():
    dropIndex = df500.loc[nullMaximum].index
    outdf100 = df500.drop(dropIndex)
    outdf100 = outdf100.reset_index(drop=True)

Dropping null max amounts...
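When filtering missing amounts, keep in mind that NaN is not equal to anything, including itself, so an equality test against `np.nan` never selects a row. A small sketch with made-up values shows the difference between `==` and `isna()`:

```python
import numpy as np
import pandas as pd

# Made-up amounts with one missing value.
amounts = pd.Series([2.0, np.nan, 0.004])

eq_mask = amounts == np.nan   # all False: NaN never equals NaN
na_mask = amounts.isna()      # True only where the value is missing

# Keep the rows that have an amount and renumber the index.
kept = amounts[~na_mask].reset_index(drop=True)

print(eq_mask.any())
print(int(na_mask.sum()))
print(kept.tolist())
```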


###### Make sure to get API keys from Google and Mapbox

In [None]:
# Plot allocation amount as a gmaps heatmap

APIKey = 'AI...'  # put your Google API key here (avoid printing or committing it)
gmaps.configure(api_key=APIKey)

logan_coordinates = (41.6, -111.8)
denver_coordinates = (39.78, -104.59)
# Wenatchee, Washington 98801
wenatchee_coordinates = (47.425159, -120.326302)
fig = gmaps.figure(map_type='HYBRID', center=wenatchee_coordinates, zoom_level=6.5)

locations = df500[['Latitude', 'Longitude']]
#locations = locations[0:8701]
weights = df500['AllocationAmount']
#weights = weights1[0:8701]
fig.add_layer(gmaps.heatmap_layer(locations, weights=weights))

fig

###### If using the token file, make sure to put a mapbox token file (.mapbox_token) inside the directory of this source code

In [19]:
# Plot allocation amount as a Plotly scatter map (Mapbox)

# Read the Mapbox token from the .mapbox_token file and close the file afterwards
with open(".mapbox_token") as token_file:
    px.set_mapbox_access_token(token_file.read().strip())

fig = px.scatter_mapbox(df500, lat="Latitude", lon="Longitude",
                        color="AllocationAmount",  # size="AllocationMaximum",
                        color_continuous_scale=px.colors.cyclical.IceFire,
                        size_max=5, range_color=[0, 5],
                        zoom=5.5, hover_data=['AllocationAmount'])
fig.show()