
# $\color{red}{\text{City of Mesa - Python API Use Cases}}$

1. Get and display the public dataset count
2. Copy metadata from one dataset to another
3. Check for data "freshness" in a dataset


***

## $\color{red}{\text{Use Case #1:}}$  Use Discovery API to get the public dataset count

In [1]:
# Import libraries
import requests
import re

### Define a method that uses the Discovery API with *domains*, *only*, and *audience* parameters

In [2]:
# Define a re-usable method to return City of Mesa's public dataset count
def com_get_public_dataset_count():

    # Discovery API endpoint for domains
    endpoint = 'http://api.us.socrata.com/api/catalog/v1/domains'
    
    # City of Mesa domain on Socrata
    domain = 'data.mesaaz.gov'
    
    # Build the endpoint url for pulling datasets only
    url = endpoint + '?domains=' + domain + '&only=datasets&audience=public'
    
    # Make the call to the Discovery endpoint 
    response = requests.get(url)
    
    # Parse the response for the count and return the value
    return response.json().get("results",[{}])[0].get("count","")
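
The chained lookup above raises an `IndexError` if `results` comes back as an empty list, and plain string concatenation leaves the query parameters unescaped. A minimal sketch of both steps as small helpers might look like the following. The names `build_domains_url` and `extract_count` and the sample payload are illustrative assumptions, not part of the original notebook.

```python
from urllib.parse import urlencode

# Discovery API endpoint for domains (same URL the function above uses)
ENDPOINT = 'http://api.us.socrata.com/api/catalog/v1/domains'


def build_domains_url(domain):
    """Build the Discovery API URL for a domain's public datasets."""
    params = {'domains': domain, 'only': 'datasets', 'audience': 'public'}
    return ENDPOINT + '?' + urlencode(params)


def extract_count(payload):
    """Return the count from a decoded response, or None if it is absent."""
    # Fall back to a single empty dict when 'results' is missing or empty
    results = payload.get('results') or [{}]
    return results[0].get('count')


# Hypothetical payload shaped like the response parsed above
sample = {'results': [{'domain': 'data.mesaaz.gov', 'count': 92}]}
print(build_domains_url('data.mesaaz.gov'))
print(extract_count(sample))
```

Keeping the parsing in a pure function makes it possible to check the edge cases without calling the API.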


### Test the method

In [3]:
print(com_get_public_dataset_count())

92


### Let's use Bokeh to chart the count
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Bokeh_Cheat_Sheet.pdf

In [4]:
#!pip3 install bokeh

In [5]:
# Bokeh imports
from bokeh.plotting import figure, show
from bokeh.io import output_notebook

In [6]:
# Tell Bokeh to output the chart to the notebook
output_notebook()

In [7]:
# Set the x and y values where y is our count
x = ['Count']
y = [com_get_public_dataset_count()]

# Create a Bokeh figure including our count in the title
p = figure(x_range=x, plot_height=200, title='Dataset Count = ' + str(y[0]))

# Set x and y into the figure as a vertical bar
p.vbar(x=x, top=y, width=0.5)

# Increase title size so we can see it
p.title.text_font_size = '12pt'

# Show the figure!
show(p)


***

## $\color{red}{\text{Use Case #2:}}$  Use Metadata API to Copy Metadata Between Datasets

* The City of Mesa has 2 Socrata sites: Public and Internal.
* We use the Metadata API to copy dataset metadata between datasets on the same site or across sites.

### *See the metadata notebooks in this lab for sample code*




***

## $\color{red}{\text{Use Case #3:}}$  Use the Discovery API and SoQL to check for dataset "freshness"

* Look at 5 datasets and, for each date-type column, get the maximum value.
* Compute the "age" of each maximum in days, relative to today's date.
* Plot the age results to identify outliers.
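
The age calculation in the second step can be sketched on its own, assuming the date arrives as a string whose first ten characters are `YYYY-MM-DD`. The helper name `age_in_days` and the sample values are hypothetical, and a fixed "now" is used so the result is reproducible.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"


def age_in_days(date_str, now):
    """Return whole days between a date string and `now` (negative if in the future)."""
    # Keep only the leading YYYY-MM-DD portion of the timestamp
    date_value = datetime.strptime(date_str[0:10], DATE_FORMAT)
    return (now - date_value).days


# Hypothetical values with a fixed reference date
fixed_now = datetime(2024, 1, 11)
print(age_in_days("2024-01-01T00:00:00.000", fixed_now))  # 10
print(age_in_days("2024-01-21T00:00:00.000", fixed_now))  # -10
```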

In [8]:
from datetime import datetime

date_format = "%Y-%m-%d"
now = datetime.now()
resource_endpoint = 'https://data.mesaaz.gov/resource/'



### Specific to Mesa - Define a method to scrape HTML for the API dataset ID
(Some of Mesa's datasets have an API dataset ID different from the resource dataset ID.)

In [9]:
# Method to return alternate API 4-by-4 dataset ID by scraping dataset html
def get_api_uuid(dataset_uuid):
    # Get the dataset's Primer page
    url = 'http://data.mesaaz.gov/d/' + dataset_uuid
    response = requests.get(url)
    
    # Use a regular expression search to scrape the html for the API 4-by-4 location
    api_content = re.findall("https://dev.socrata.com/foundry/data.mesaaz.gov/.........",
                            str(response.content))
    # Extract the 4-by-4 at the end of the found string
    return api_content[0][-9:]
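
The pattern above matches any nine characters after the domain, and indexing `[0]` fails when no link is found. A stricter sketch is shown below, under the assumption that a 4-by-4 ID is four lowercase alphanumerics, a hyphen, and four more. The helper name `find_api_id`, the sample HTML, and the ID `abcd-1234` are made up for illustration.

```python
import re

# Escape the dots and capture only the 4-by-4 portion of the link
API_LINK = re.compile(
    r"https://dev\.socrata\.com/foundry/data\.mesaaz\.gov/([a-z0-9]{4}-[a-z0-9]{4})"
)


def find_api_id(html):
    """Return the first 4-by-4 ID found in the page HTML, or None."""
    match = API_LINK.search(html)
    return match.group(1) if match else None


# Hypothetical snippet of a dataset page
sample_html = '<a href="https://dev.socrata.com/foundry/data.mesaaz.gov/abcd-1234">API</a>'
print(find_api_id(sample_html))
print(find_api_id('<html></html>'))
```

Returning `None` lets the caller decide whether to skip the dataset or fall back to the resource dataset ID.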


### Use the Discovery API to get 5 datasets to check

In [10]:
# Use the Discovery API to get list of public datasets (only look at first 5 for demo)
url = 'http://api.us.socrata.com/api/catalog/v1?domains=data.mesaaz.gov&only=datasets&audience=public&limit=5'
response = requests.get(url)
results = response.json().get("results",[{}])
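
Each entry in `results` carries a `resource` object with two parallel lists, `columns_field_name` and `columns_datatype`. A small sketch of pairing them to pick out the date columns follows. The helper name `date_fields` and the sample resource are hypothetical, and the `'calendar_date'` default simply mirrors the value this notebook compares against.

```python
def date_fields(resource, date_type='calendar_date'):
    """Return the field names whose datatype equals date_type."""
    names = resource.get('columns_field_name', [])
    types = resource.get('columns_datatype', [])
    # zip pairs each field name with the datatype at the same position
    return [name for name, dtype in zip(names, types) if dtype == date_type]


# Hypothetical resource shaped like one Discovery API result
sample_resource = {
    'id': 'abcd-1234',
    'name': 'Sample Permits',
    'columns_field_name': ['permit_number', 'issued_date', 'closed_date'],
    'columns_datatype': ['text', 'calendar_date', 'calendar_date'],
}
print(date_fields(sample_resource))
```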


### Iterate over the Discovery API results checking for date fields. If found, use SoQL to get the Max value and compare with "now" as an age (in days).

In [11]:
# Iterate through the results, collecting the age of each max date value.
# The column labels will be our x values and the ages (in days) our y values.

# Init lists to be used for charting
dataset_columns = []
days = []

for dataset in results:

    dataset_resource = dataset['resource']
    
    # Get the dataset ID, dataset name, and sets of field names and field datatypes
    dataset_id = dataset_resource.get('id')
    dataset_name = dataset_resource.get('name')
    column_fields = dataset_resource.get('columns_field_name')
    column_types = dataset_resource.get('columns_datatype')
    
    # City of Mesa specific - Get API dataset ID which may be different than resource dataset ID
    api_id = get_api_uuid(dataset_id)
    
    # Merge the column names and datatypes into a dictionary
    column_dict = dict(zip(column_fields, column_types))
    
    # Iterate over the items (fields) in the dictionary and identify date types
    # field_name <- column/field name, field_type <- column/field datatype
    for field_name,field_type in column_dict.items():

        # Is the field a date type?
        if field_type == 'calendar_date':

            # Using SoQL, get the max value of the date
            max_url = resource_endpoint + api_id + '?$select=max(' + field_name + ')'
            max_response = requests.get(max_url)

            # Extract the date value from the response JSON, where
            # the returned field will be prefixed with 'max_'.
            date_str = max_response.json()[0].get('max_' + field_name)

            # Skip columns for which no max value was returned
            if not date_str:
                continue
            date_value = datetime.strptime(date_str[0:10], date_format)
            
            # Get the difference between the date value and "now" as an age.
            date_age = (now - date_value).days
            
            # Add the results to the x and y lists to be used for charting.
            dataset_columns.append(dataset_name + ' ' + field_name)
            days.append(date_age)
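
The SoQL step inside the loop can be separated into two small helpers, one that builds the `max()` query URL and one that reads the value back out of the decoded rows. This is only a sketch. The names `max_query_url` and `extract_max`, the ID `abcd-1234`, and the sample rows are assumptions made for illustration.

```python
RESOURCE_ENDPOINT = 'https://data.mesaaz.gov/resource/'


def max_query_url(api_id, field_name):
    """Build the SoQL URL that asks for the maximum of one column."""
    return RESOURCE_ENDPOINT + api_id + '?$select=max(' + field_name + ')'


def extract_max(rows, field_name):
    """Return the max value from the decoded rows, or None if absent."""
    if not rows:
        return None
    # The aggregated column comes back prefixed with 'max_'
    return rows[0].get('max_' + field_name)


# Hypothetical rows shaped like the JSON parsed in the loop above
sample_rows = [{'max_issued_date': '2024-01-01T00:00:00.000'}]
print(max_query_url('abcd-1234', 'issued_date'))
print(extract_max(sample_rows, 'issued_date'))
```

Returning `None` for an empty or keyless response lets the caller skip the column instead of failing in `strptime`.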
            


### Chart the data age (in days) of each dataset field. Negative ages mean the date is in the future.

In [12]:
# Set the x and y values where y is the age in days
x = dataset_columns
y = days

# Create a Bokeh figure
p = figure(x_range=x, plot_width=800, plot_height=600, title='Dataset Age')

# Use vertical labels
p.xaxis.major_label_orientation = 'vertical'

# No scientific notation
p.left[0].formatter.use_scientific = False

# Set x and y into the figure as a vertical bar
p.vbar(x=x, top=y, width=0.5)

# Increase title size so we can see it
p.title.text_font_size = '12pt'

# Show the figure!
show(p)