## Water Hub Data Access
Draft 1 of the Data Access chapter in the "Water Data and Code" tutorial series. This code was first written by Santiago Botero using Python 3.9.1 (64-bit), and then adapted into a Jupyter notebook by Richard Johnson.

#### Check that we can connect to the water hub repository
The cell below uses the "requests" package to connect to the water hub. A status code of 200 (printed as `<Response [200]>`) tells us that the connection was successful.

In [None]:
# Import the packages needed to make an HTTP request and to work with JSON
import requests
import json

data = requests.get('https://waterhub.livinglakescanada.ca/api/3')
print(data)

#### Look at the packages available
The reply is JSON that contains everything, i.e. all of the packages.
JSON (JavaScript Object Notation) is a text format for storing and exchanging data.
For more than you ever want to know, see: https://docs.python.org/3/library/json.html


In [None]:
data = requests.get('https://waterhub.livinglakescanada.ca/api/3/action/package_search').json()
print(data)

#### Make the output readable
The `.json()` call returns the reply as a Python dictionary, which is hard to read when printed directly. The first cell below "prettifies" it with indentation.
The second cell also sorts the keys alphabetically.

In [None]:
print(json.dumps(data, indent=1))

In [None]:
print(json.dumps(data, indent=1, sort_keys=True))
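
To see what `indent` and `sort_keys` do without wading through the full reply, here is a minimal sketch on a tiny dictionary. The dictionary and its keys are made up for illustration and are not part of the water hub reply.

```python
import json

# Made-up dictionary, used only to illustrate the formatting options
tiny = {"b": 1, "a": 2}

# Without sort_keys the keys keep their insertion order: "b" then "a"
unsorted_text = json.dumps(tiny, indent=1)

# With sort_keys=True the keys are written alphabetically: "a" then "b"
sorted_text = json.dumps(tiny, indent=1, sort_keys=True)

print(unsorted_text)
print(sorted_text)
```

Sorting only changes the order in which the keys are written. The data itself is unchanged.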


#### Search for a package
Here we search for the packages that mention Norns Creek by adding a query to the URL (package_search?q=norns).
I have set the print indent to 5 to make it easy to see the "results" list, within which we have two "resources" named 'Monitoring Stations GPS points' and 'Norns Creek Measurements'.
We want the id of the latter, which is 12 lines above "name: Norns Creek Measurements" and is "id: fb4c4973-ad6e-4774-9832-c46c49f20369".


In [None]:
data = requests.get('https://waterhub.livinglakescanada.ca/api/3/action/package_search?q=norns').json()
print(json.dumps(data, indent=5))
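
Rather than counting lines in the printed output, the id can also be looked up in code. The sketch below assumes the reply is nested as 'result' -> 'results' -> 'resources', each resource having a 'name' and an 'id'. The sample reply is a trimmed-down, hypothetical stand-in (the package name and the first id are placeholders), and `find_resource_id` is an illustrative helper, not part of any library.

```python
# Hypothetical, trimmed-down stand-in for the package_search reply
sample_reply = {
    "success": True,
    "result": {
        "results": [
            {
                "name": "example-package",  # made-up package name
                "resources": [
                    {
                        "name": "Monitoring Stations GPS points",
                        "id": "00000000-0000-0000-0000-000000000000",  # placeholder id
                    },
                    {
                        "name": "Norns Creek Measurements",
                        "id": "fb4c4973-ad6e-4774-9832-c46c49f20369",
                    },
                ],
            }
        ]
    },
}


def find_resource_id(reply, resource_name):
    """Return the id of the first resource with the given name, or None."""
    for package in reply["result"]["results"]:
        for resource in package.get("resources", []):
            if resource.get("name") == resource_name:
                return resource.get("id")
    return None


norns_id = find_resource_id(sample_reply, "Norns Creek Measurements")
print(norns_id)
```

If the real reply is nested as assumed, passing `data` in place of `sample_reply` should return the same id.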

#### Finally, look at the actual data
We put the id of the "Norns Creek Measurements" resource into the URL as the 'resource_id'
--- > ~/datastore_search?resource_id=fb4c4973-ad6e-4774-9832-c46c49f20369
and request it. The records sit under data['result']['records'], and json.dumps turns them into a JSON string.
I have printed this with an indent of only 2 because it is easy to read.
I printed its type to show that it is a string, not a list or a dictionary.

In [None]:
data = requests.get('https://waterhub.livinglakescanada.ca/api/3/action/datastore_search?resource_id=fb4c4973-ad6e-4774-9832-c46c49f20369').json()

# The records are stored under the 'result' -> 'records' keys of the dictionary
json_string = json.dumps(data['result']['records'], indent=2)
print(type(json_string), json_string)
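
Typing the whole URL by hand is easy to get wrong. As a minimal sketch, the URL can be assembled from an action name and its parameters with the standard library. `build_action_url` and `BASE_URL` are hypothetical names introduced here for illustration.

```python
from urllib.parse import urlencode

# Base of the action URLs used throughout this chapter
BASE_URL = "https://waterhub.livinglakescanada.ca/api/3/action"


def build_action_url(action, **params):
    """Join the base URL, an action name and optional query parameters."""
    query = urlencode(params)
    if query:
        return f"{BASE_URL}/{action}?{query}"
    return f"{BASE_URL}/{action}"


search_url = build_action_url("package_search", q="norns")
records_url = build_action_url(
    "datastore_search",
    resource_id="fb4c4973-ad6e-4774-9832-c46c49f20369",
)
print(search_url)
print(records_url)
```

The "requests" package can also do this step itself: `requests.get` accepts a `params` dictionary and appends it to the URL as a query string.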



#### Now we have the data
First we must convert it from a string to a Python object, a list, using the function json.loads.
We can iterate over the list to pull out the dictionaries within it. Remember that Python indices start at 0.
We can add some QA/QC checks here, such as confirming that every dictionary has the same length and the same key names (second cell).

In [None]:
dict_list = json.loads(json_string)
print(type(dict_list))                    # confirm that we have a 'list'

for item in dict_list:                    # the list has only 4 items, so print each one
    print(item)
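
The sketch below shows the same string-to-list round trip on a small made-up list, so the type change at each step is easy to follow. The field names and values are invented for illustration and are not the Norns Creek records.

```python
import json

# Made-up records, used only to illustrate the round trip
records = [
    {"_id": 1, "station": "A", "value": 1.5},
    {"_id": 2, "station": "B", "value": None},
]

json_string = json.dumps(records, indent=2)   # list -> JSON text (a str)
dict_list = json.loads(json_string)           # JSON text -> list of dictionaries

print(type(json_string).__name__, type(dict_list).__name__)
print(dict_list == records)
```

For plain data like this, the list that comes back from json.loads equals the list that went into json.dumps, so the string form is mainly useful for printing or saving.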

In [None]:
# ------------QA/QC---------------------
for item in dict_list:
    print(len(item))                # The dictionaries should be the same size
    print(item.keys())              # The field names should be the same
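
Comparing printed lengths and keys by eye works for 4 items but not for long lists. A minimal sketch of an automated version follows, assuming the first dictionary is taken as the reference. `check_consistent_keys` is a hypothetical helper and the sample records are made up.

```python
def check_consistent_keys(records):
    """Compare every record's keys with those of the first record.

    Returns a list of (index, missing_keys, extra_keys) tuples,
    one for each record that differs. An empty list means all match.
    """
    if not records:
        return []
    expected = set(records[0].keys())
    problems = []
    for index, record in enumerate(records):
        keys = set(record.keys())
        if keys != expected:
            missing = sorted(expected - keys)
            extra = sorted(keys - expected)
            problems.append((index, missing, extra))
    return problems


# Made-up records: the second list has a mismatched field name
good = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
bad = [{"a": 1, "b": 2}, {"a": 3, "c": 4}]

print(check_consistent_keys(good))
print(check_consistent_keys(bad))
```

Calling `check_consistent_keys(dict_list)` on the records above should return an empty list if every dictionary has the same field names.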