# CYPLAN255
### Urban Informatics and Visualization


# Lecture 11 -- APIs <img src="https://i.imgur.com/wNMULZP.jpg" width=550 align='right' title="Man Standing in the Lumberyard of Seattle Cedar Lumber Manufacturing, Alfred Eisenstaedt (1939)">
******
March 2, 2022 

# Agenda
1. Announcements
2. Intro to APIs
3. Using APIs with Python
4. For next time
5. Questions


# 1. Announcements

- Final Project released tonight
- Assignment 4 (project proposal + initial analysis) due March 13
- GitHub Pages tutorial

# 2. Intro to APIs

- What's an API?
- Examples
- Why are APIs useful?
- Types of APIs and API responses

## 2.1. What's an API?

- **A**pplication
  - software, product, or service
- **P**rogramming
  - we're going to be writing code
- **I**nterface
  - a point where two systems, subjects, organizations, etc. meet and interact

APIs are _transactional_, kind of like ordering food at a restaurant:
 1. client requests an item from the menu
 2. waiter takes the order and tells the cook what to make
 3. cook prepares the item and gives it to the server
 4. server brings client the item they ordered


In this analogy, the **waiter** + **menu** + **server** constitute the **API**
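The analogy can be sketched in a few lines of Python. This is only an illustration: the `Kitchen` and `Waiter` classes and the menu items are made up for this example and do not come from any real library.

```python
class Kitchen:
    """The hidden implementation: clients never call this directly."""

    def cook(self, item):
        return f"one plate of {item}"


class Waiter:
    """The interface: takes orders from the menu and relays them to the kitchen."""

    menu = ("tacos", "ramen", "salad")  # hypothetical menu

    def __init__(self, kitchen):
        self._kitchen = kitchen

    def order(self, item):
        # Requests that are not on the menu are refused
        if item not in self.menu:
            raise ValueError(f"{item!r} is not on the menu")
        return self._kitchen.cook(item)


waiter = Waiter(Kitchen())
print(waiter.order("ramen"))  # one plate of ramen
```

The client only ever talks to `waiter.order()`. How the kitchen prepares the dish could change completely without the client noticing.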

## 2.2 Examples

<center><img src="images/api1.png" width=75%></center>

<center><img src="images/api2.png" width=75%></center>

<center><img src="images/api3.png" width=75%></center>

Yes, even pandas is an API!

## 2.3 Why are APIs useful? i.e. why do companies publish them?

- Standardizes the access points to a service or piece of software
  - No ordering "off the menu"
- Allows proprietary details to remain private
  - The ingredients in the chef's secret sauce are never revealed
- The implementation details do not matter to the client
  - Customer doesn't need or want to be able to cook the dish themselves

## 2.4 Types of APIs

1. APIs in a programming language
  - Functions: `my_function()`
  - Arguments/parameters: `my_function(args=x)`
  - Function returns a value
2. APIs over the web
  - URL endpoints: `http://my.domain/endpoint`
  - Query parameters: `http://my.domain/endpoint?args=x`
  - HTTP request returns a value
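To make the parallel concrete, here is a small sketch that puts a function call next to a web URL. `my_function` is a stand-in defined only for this example, and `my.domain` is the same placeholder used above. No request is sent; the URL is only taken apart with the standard library.

```python
from urllib.parse import urlsplit, parse_qs


def my_function(args):
    """A local function: its name plays the role of the endpoint."""
    return f"you asked for {args}"


# Function call: name + arguments
print(my_function(args="x"))

# Web request: endpoint URL + query parameters (everything after the "?")
url = "http://my.domain/endpoint?args=x"
parts = urlsplit(url)
endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
params = parse_qs(parts.query)

print(endpoint)  # http://my.domain/endpoint
print(params)    # {'args': ['x']}
```

Note that `parse_qs` returns a list for each parameter, because a query string may repeat the same key more than once.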

### 2.4.1 REST APIs

The most common type of API you'll encounter on the web is the **REST** API. REST APIs define a limited set of operations for transferring data between a **client** and a **server** using **HTTP**.
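As a rough sketch of what that "limited set of operations" usually looks like, the snippet below pairs four common actions with the HTTP method conventionally used for each. The `/trees` URL and the payloads are hypothetical, and nothing is actually sent over the network; the request objects are only built and inspected.

```python
from urllib.request import Request

base = "https://my.domain/trees"  # placeholder URL, not a real service

# Common REST operations and the HTTP method conventionally used for each
operations = {
    "read a list of trees": Request(base, method="GET"),
    "create a new tree": Request(base, data=b'{"species": "Acer rubrum"}', method="POST"),
    "replace tree 42": Request(f"{base}/42", data=b'{"species": "Pyrus calleryana"}', method="PUT"),
    "delete tree 42": Request(f"{base}/42", method="DELETE"),
}

for description, request in operations.items():
    # Nothing is sent here; we only look at the request objects
    print(f"{request.get_method():6} {request.full_url} -> {description}")
```

In this course we will almost always be on the `GET` side: asking a server for data and reading what comes back.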

<img src="https://phpenthusiast.com/theme/assets/images/blog/what_is_rest_api.png?021019a" width=80%>

## 2.5 API Response Objects

We've primarily been dealing with _tabular_ data so far (columns and rows), but most APIs on the web will return data in a **hierarchical** format like **JSON** or **XML**.

<center><img src="images/api4.png" width=500></center>

JSON stands for **JavaScript Object Notation**. Since JavaScript is the "language of the web", most of the data you'll get from web APIs will be formatted as JSON.

There's nothing special or fancy or scary about JSON: _it's just nested dictionaries_.

_**JSON IS JUST NESTED DICTIONARIES**_

This means it's super easy to work with JSON in Python. 

For example, Python comes with its own built-in module for reading and writing JSON files:

```python
import json
```

This is JSON:

```javascript
{
  "firstName": "Jason",
  "lastName": "Response",
  "address": {
    "streetAddress": "404 Error Street",
    "city": "",
    "state": "Null Island",
    "postalCode": "10100-0100"
  },
  "spouse": null
}
```
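To see the "nested dictionaries" claim in action, here is a minimal sketch that parses a record like the one above with the built-in `json` module and reads values out of it with ordinary dictionary indexing.

```python
import json

text = """
{
  "firstName": "Jason",
  "lastName": "Response",
  "address": {
    "streetAddress": "404 Error Street",
    "city": "",
    "state": "Null Island",
    "postalCode": "10100-0100"
  },
  "spouse": null
}
"""

person = json.loads(text)          # JSON text -> Python dict
print(type(person))                # <class 'dict'>
print(person["address"]["state"])  # Null Island
print(person["spouse"])            # None (JSON null becomes Python None)
```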

# 3. APIs, Python, and you

## 3.1 Using the `requests` library

`requests` is my library of choice for querying API endpoints and URLs in Python. If you don't have it installed yet, go ahead and do that now. Check out the [documentation](https://docs.python-requests.org/en/latest/).

Using requests is as simple as:
```python
import requests
requests.get("https://my.domain/endpoint")
```

It can get a bit more complicated than that if, for example, an API requires authentication, or if you want to `POST` rather than `GET` data. But we don't need to worry about that for now.

## 3.2. SF Trees

Let's get some street tree data from the San Francisco Open Data Portal and use it to practice with APIs. First, our imports:

In [1]:
import pandas as pd
import json      # library for working with JSON-formatted text strings
import requests  # library for accessing content from web URLs
import pprint    # library for cleanly printing Python data structures
pp = pprint.PrettyPrinter()

Take a moment to familiarize yourself with the API endpoint we'll be using by reading the documentation:
https://data.sfgov.org/City-Infrastructure/Tree-Caretakers-in-San-Francisco/frdy-nem6

Under Export / SODA (Socrata Open Data API) we can see the API endpoint URL and the columns available. This is the "menu" in our restaurant analogy.

Now let's download the data:

In [2]:
endpoint_url = "https://data.sfgov.org/resource/frdy-nem6.json"
response = requests.get(endpoint_url)

Now let's take a look at what we got:

In [3]:
response

<Response [200]>
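The number in `<Response [200]>` is the HTTP status code, which `requests` also exposes as `response.status_code`. As a small aside, the standard library can translate these codes into words. The helper `describe_status` below is written just for this example.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short human-readable description of an HTTP status code."""
    status = HTTPStatus(code)
    kind = "success" if 200 <= code < 300 else "not a success"
    return f"{code} {status.phrase} ({kind})"


print(describe_status(200))  # 200 OK (success)
print(describe_status(404))  # 404 Not Found (not a success)
```

With a real response object, calling `response.raise_for_status()` is a quick way to stop early when the server reports an error instead of returning data.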

The `response.text` attribute holds the body of the response as a string. Let's look at just the first 500 characters:

In [4]:
results = response.text
print(type(results))
print(results[:500])

<class 'str'>
[ {
  "siteorder" : "1",
  "xcoord" : "6016267.25355",
  "location" : {
    "latitude" : "37.73636162009325",
    "human_address" : "{\"address\": \"\", \"city\": \"\", \"state\": \"\", \"zip\": \"\"}",
    "needs_recoding" : false,
    "longitude" : "-122.38620200123"
  },
  "qlegalstatus" : "DPW Maintained",
  "ycoord" : "2096084.36716",
  "planttype" : "Tree",
  "dbh" : "16",
  "qaddress" : "9 Young Ct",
  "latitude" : "37.73636162009325",
  "qcaretaker" : "Private",
  "qsiteinfo" : "Sidewalk


We can use Python's `json` module to convert that string into a dictionary (or list of dictionaries):

In [5]:
# parse the JSON string into Python objects, here a list of dictionaries (loads = "load string")
data = json.loads(results)
print(data[:3])
print(type(data))

[{'siteorder': '1', 'xcoord': '6016267.25355', 'location': {'latitude': '37.73636162009325', 'human_address': '{"address": "", "city": "", "state": "", "zip": ""}', 'needs_recoding': False, 'longitude': '-122.38620200123'}, 'qlegalstatus': 'DPW Maintained', 'ycoord': '2096084.36716', 'planttype': 'Tree', 'dbh': '16', 'qaddress': '9 Young Ct', 'latitude': '37.73636162009325', 'qcaretaker': 'Private', 'qsiteinfo': 'Sidewalk: Curb side : Cutout', 'longitude': '-122.38620200123', 'qspecies': 'Pyrus calleryana :: Ornamental Pear', 'plotsize': 'Width 3ft', 'treeid': '196949'}, {'siteorder': '1', 'xcoord': '5993354.86667', 'location': {'latitude': '37.73839153834396', 'human_address': '{"address": "", "city": "", "state": "", "zip": ""}', 'needs_recoding': False, 'longitude': '-122.4655069999494'}, 'qlegalstatus': 'DPW Maintained', 'ycoord': '2097295.22775', 'planttype': 'Tree', 'dbh': '2', 'qaddress': '9 Yerba Buena Ave', 'latitude': '37.73839153834396', 'qcaretaker': 'Private', 'qsiteinfo':

Pretty Print makes this a bit easier to read:

In [6]:
pp.pprint(data[:3])

[{'dbh': '16',
  'latitude': '37.73636162009325',
  'location': {'human_address': '{"address": "", "city": "", "state": "", '
                                '"zip": ""}',
               'latitude': '37.73636162009325',
               'longitude': '-122.38620200123',
               'needs_recoding': False},
  'longitude': '-122.38620200123',
  'planttype': 'Tree',
  'plotsize': 'Width 3ft',
  'qaddress': '9 Young Ct',
  'qcaretaker': 'Private',
  'qlegalstatus': 'DPW Maintained',
  'qsiteinfo': 'Sidewalk: Curb side : Cutout',
  'qspecies': 'Pyrus calleryana :: Ornamental Pear',
  'siteorder': '1',
  'treeid': '196949',
  'xcoord': '6016267.25355',
  'ycoord': '2096084.36716'},
 {'dbh': '2',
  'latitude': '37.73839153834396',
  'location': {'human_address': '{"address": "", "city": "", "state": "", '
                                '"zip": ""}',
               'latitude': '37.73839153834396',
               'longitude': '-122.4655069999494',
               'needs_recoding': False},
  'l
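Once the data is a list of dictionaries, individual values can be pulled out by chaining keys. The sketch below uses one abbreviated record copied from the output above, so it runs without a network connection. Notice that `human_address` is a string that itself contains JSON, so it needs a second call to `json.loads`.

```python
import json

# One abbreviated record, copied from the output above
record = {
    "qspecies": "Pyrus calleryana :: Ornamental Pear",
    "location": {
        "latitude": "37.73636162009325",
        "longitude": "-122.38620200123",
        "human_address": '{"address": "", "city": "", "state": "", "zip": ""}',
    },
}

# Nested values are reached by chaining keys; numbers arrive as strings
lat = float(record["location"]["latitude"])

# human_address is JSON stored inside a string, so parse it again
address = json.loads(record["location"]["human_address"])

print(lat)
print(sorted(address))  # ['address', 'city', 'state', 'zip']
```

With the full download, the same pattern would be `data[0]["location"]["latitude"]`.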

Pandas makes it easy to work with JSON since it's already so used to working with dictionaries:

In [7]:
pd.DataFrame.from_records(data, columns=['qspecies', 'latitude','longitude']).head()

|   | qspecies | latitude | longitude |
|---|---|---|---|
| 0 | Pyrus calleryana :: Ornamental Pear | 37.73636162009325 | -122.38620200123 |
| 1 | Acer rubrum :: Red Maple | 37.73839153834396 | -122.4655069999494 |
| 2 | Acer rubrum :: Red Maple | 37.73775178646406 | -122.46449593033032 |
| 3 | Eucalyptus sideroxylon :: Red Ironbark | 37.73921894851821 | -122.3778693642827 |
| 4 | Eucalyptus nicholii :: Nichol's Willow-Leafed ... | 37.73921894851821 | -122.3778693642827 |
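One thing to watch for: the API returned every value as a quoted string, so the `latitude` and `longitude` columns built this way hold text, not numbers. Here is a minimal sketch of converting them, using two abbreviated records from the output above in place of the full download.

```python
import pandas as pd

# Two abbreviated records with the same string-typed values the API returns
records = [
    {"qspecies": "Pyrus calleryana :: Ornamental Pear",
     "latitude": "37.73636162009325", "longitude": "-122.38620200123"},
    {"qspecies": "Acer rubrum :: Red Maple",
     "latitude": "37.73839153834396", "longitude": "-122.4655069999494"},
]

trees = pd.DataFrame.from_records(records)

# Convert the coordinate columns from strings to floats
for column in ["latitude", "longitude"]:
    trees[column] = pd.to_numeric(trees[column])

print(trees.dtypes)
```

After the conversion the coordinates can be used for arithmetic or plotting.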


But perhaps it would have been easier to work directly with the JSON output of our response object instead of using `response.text` and `json.loads()`:

In [8]:
pd.DataFrame.from_dict(response.json())[['qspecies', 'latitude','longitude']].head()

|   | qspecies | latitude | longitude |
|---|---|---|---|
| 0 | Pyrus calleryana :: Ornamental Pear | 37.73636162009325 | -122.38620200123 |
| 1 | Acer rubrum :: Red Maple | 37.73839153834396 | -122.4655069999494 |
| 2 | Acer rubrum :: Red Maple | 37.73775178646406 | -122.46449593033032 |
| 3 | Eucalyptus sideroxylon :: Red Ironbark | 37.73921894851821 | -122.3778693642827 |
| 4 | Eucalyptus nicholii :: Nichol's Willow-Leafed ... | 37.73921894851821 | -122.3778693642827 |


But perhaps it would have been even _easier_ if we had made our request directly from pandas in the first place:

In [9]:
pd.read_json(endpoint_url)[['qspecies', 'latitude','longitude']].head()

|   | qspecies | latitude | longitude |
|---|---|---|---|
| 0 | Pyrus calleryana :: Ornamental Pear | 37.736362 | -122.386202 |
| 1 | Acer rubrum :: Red Maple | 37.738392 | -122.465507 |
| 2 | Acer rubrum :: Red Maple | 37.737752 | -122.464496 |
| 3 | Eucalyptus nicholii :: Nichol's Willow-Leafed ... | 37.739219 | -122.377869 |
| 4 | Eucalyptus nicholii :: Nichol's Willow-Leafed ... | 37.739219 | -122.377869 |
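So far we have asked the endpoint for its default response and then selected columns in pandas. Web APIs often let you do that filtering on the server instead, through query parameters. The sketch below assumes the SODA parameters `$select` and `$limit`; check the endpoint's documentation before relying on them. It only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

endpoint_url = "https://data.sfgov.org/resource/frdy-nem6.json"

# Assumed SODA query parameters: $select picks columns, $limit caps the row count
params = {
    "$select": "qspecies,latitude,longitude",
    "$limit": 5,
}

# Keep "$" and "," readable instead of percent-encoding them
query_url = f"{endpoint_url}?{urlencode(params, safe='$,')}"
print(query_url)
```

The resulting URL could be passed to `pd.read_json(query_url)`, or the same dictionary could be handed to `requests.get(endpoint_url, params=params)`.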


## 3.3. Exercise: Police Stops in San Francisco

Let's examine a second dataset from the San Francisco Open Data Portal for practice: police stops.

Go to the City Open Data Portal and get the URL for a JSON request for the Police Stops dataset (listed on the portal as "Police Department Calls for Service"). Here is a shortcut to the dataset: https://data.sfgov.org/Public-Safety/Police-Department-Calls-for-Service/hz9m-tj6z

Use the methods we just learned to load the data and create a DataFrame. Explore the data using techniques from previous homeworks and lectures. For example, you could generate some summary statistics or make some charts. What if you made a scatter plot with the longitude column on the x-axis and the latitude column on the y-axis?

In [10]:
police_stop_url = "https://data.sfgov.org/resource/hz9m-tj6z.json"
response = requests.get(police_stop_url)

In [11]:
response.text

'[{"crime_id":"161080584","original_crimetype_name":"Poss","report_date":"2016-04-17T00:00:00.000","call_date":"2016-04-17T00:00:00.000","offense_date":"2016-04-17T00:00:00.000","call_time":"05:23","call_dttm":"2016-04-17T05:23:00.000","disposition":"NOM","address":"0 Block Of Evelyn Wy","city":"San Francisco","state":"CA","agency_id":"1","address_type":"Premise Address"}\n,{"crime_id":"190123549","original_crimetype_name":"Rep","report_date":"2019-01-12T00:00:00.000","call_date":"2019-01-12T00:00:00.000","offense_date":"2019-01-12T00:00:00.000","call_time":"20:36","call_dttm":"2019-01-12T20:36:00.000","disposition":"HAN","address":"600 Block Of Eddy St","city":"San Francisco","state":"CA","agency_id":"1","address_type":"Premise Address"}\n,{"crime_id":"161100690","original_crimetype_name":"911","report_date":"2016-04-19T00:00:00.000","call_date":"2016-04-19T00:00:00.000","offense_date":"2016-04-19T00:00:00.000","call_time":"07:45","call_dttm":"2016-04-19T07:45:00.000","disposition":"A

In [15]:
df = pd.read_json(police_stop_url)[['crime_id', 'original_crimetype_name','report_date', 'disposition', 'agency_id']]
df.head()

|   | crime_id | original_crimetype_name | report_date | disposition | agency_id |
|---|---|---|---|---|---|
| 0 | 161080584 | Poss | 2016-04-17T00:00:00.000 | NOM | 1 |
| 1 | 190123549 | Rep | 2019-01-12T00:00:00.000 | HAN | 1 |
| 2 | 161100690 | 911 | 2016-04-19T00:00:00.000 | ADV | 1 |
| 3 | 193040567 | Traf Violation Cite | 2019-10-31T00:00:00.000 | GOA | 1 |
| 4 | 193041338 | Traf Violation Cite | 2019-10-31T00:00:00.000 | GOA | 1 |


In [25]:
df['count'] = 1
df_by_name = df[['original_crimetype_name', 'count']].groupby('original_crimetype_name').sum().reset_index()
df_by_name.sample(5)

|   | original_crimetype_name | count |
|---|---|---|
| 60 | Caser | 1 |
| 129 | Rep | 5 |
| 39 | 911 Drop | 2 |
| 135 | Rz | 1 |
| 104 | Neighbor/594 | 1 |
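Adding a column of ones and summing it within each group is simply a way of counting how often each value occurs. The same idea can be sketched in plain Python with `collections.Counter`. The short list of call types below is a made-up sample using names from the output above, not the real data.

```python
from collections import Counter

# A small, made-up sample of call types (names borrowed from the output above)
call_types = ["Poss", "Rep", "911", "Traf Violation Cite", "Traf Violation Cite", "Rep"]

# Counter tallies how many times each distinct value appears
counts = Counter(call_types)

for name, n in counts.items():
    print(f"{name}: {n}")
```

In pandas, `df['original_crimetype_name'].value_counts()` gives a similar tally without the helper column.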


In [26]:
import plotly.express as px
fig = px.bar(df_by_name[df_by_name['count']>=10], x='original_crimetype_name', y='count')
fig.show()

In [31]:
# the first four characters of each ISO-formatted date string are the year
df['year'] = df['report_date'].str[:4]
df_by_year = df[['year', 'count']].groupby('year').sum().reset_index()

In [33]:
fig = px.line(df_by_year, x='year', y='count')
fig.show()
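Slicing the first four characters works because every timestamp has the same layout. A somewhat sturdier alternative is to parse the timestamp and ask for its year. The helper `report_year` below is written for this example and assumes the format seen in the output above.

```python
from datetime import datetime


def report_year(timestamp):
    """Parse a timestamp like '2016-04-17T00:00:00.000' and return its year."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%f").year


print(report_year("2016-04-17T00:00:00.000"))  # 2016
print(report_year("2019-10-31T00:00:00.000"))  # 2019
```

Unlike slicing, this returns an integer and raises an error if a value does not match the expected format. In pandas, `pd.to_datetime(df['report_date']).dt.year` does the equivalent for a whole column.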

# 4. For next time

1. Create a Mapbox account: https://account.mapbox.com/auth/signup/
2. Generate a Mapbox API token: https://account.mapbox.com/access-tokens

# 5. Questions? 