## Lighthouse Labs
### W01D3 APIs
Instructor: Socorro Dominguez  
September 15, 2021

[Download the notebook](https://downgit.github.io/#/home?url=https://github.com/sedv8808/LighthouseLabs/tree/main/W01D3)

**Agenda:**
* Introductions
* Where does our data come from?
* APIs
    * What is it?
    * Applications
    * Demo
        * Python
        * Postman
        * Terminal

Think about where we get our data from...

- Publicly available datasets (e.g. check out Kaggle)
    - Good for benchmarking, but limited for real use cases
- Company’s database (e.g. transaction history)
    - SQL, MongoDB, etc.
- From the web
    - Collected manually
    - Collected automatically (code)

## How is Data Science related to the Web?

![img](api_img.png)

![img](xml_img.png)

### What if we wanted to collect this data for analysis?

We would need to create a Data Mining program that goes to URLs and parses the HTML to extract data.
- Effortful
    - HTML is difficult to parse
    - Almost all information is irrelevant (e.g. UI-related)
    - Websites often require interaction (e.g. “Load More” button)
    - When websites update, your code will break
    - Every website is different
    - Companies try to stop data miners

## What is an API?

**A**pplication  
**P**rogramming  
**I**nterface  
  
* We will be mostly using RESTful APIs.   

**RE**presentational  
**S**tate  
**T**ransfer  

We will revisit RESTful APIs in Week 7, so try to get just the main concepts today.

- Programmer-friendly version of websites
- Go to a URL composed of
    - API root endpoint
    - API function
    - API key (like a login)
    - Parameter keys
    - Parameter values
- The request returns data (JSON, XML, CSV, etc.)
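
The URL parts listed above can be assembled in code. This is a minimal sketch using only the standard library; the host `api.example.com`, the `stops` function, and the key value are placeholders chosen for illustration, not a real service.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical pieces of an API URL; none of these point at a real service.
root_endpoint = "https://api.example.com/v1"
api_function = "stops"
params = {
    "apikey": "YOUR_API_KEY",  # placeholder, not a real key
    "lat": 49.18,
    "long": -122.85,
}

# urlencode joins key=value pairs with "&" and escapes special characters.
url = f"{root_endpoint}/{api_function}?{urlencode(params)}"
print(url)

# Splitting the URL again recovers the same parts.
parts = urlsplit(url)
query = parse_qs(parts.query)
print(parts.path)
print(query["lat"])
```

Building the query string with `urlencode` instead of string concatenation means parameter values with spaces or symbols are escaped for you.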

### Characteristics?

Client-server, typically HTTP-based, stateless server


### Furthermore....

Some websites provide direct access to their data. For example: Twitter, Translink, Car2Go, Google Maps, Yahoo

* Why would they do this?

* Why would some web sites not do this?

### What representation is DATA found in?

**J**ava**S**cript **O**bject **N**otation (JSON)


Textual format for structured data  
* `[a, b, c]` for arrays  
* `{"x": m, "y": n, "z": o}` for objects

### JSON
* Textual description of Python (actually JavaScript) objects
* Arrays and dictionaries

```
{
"library": [
           {"title": "For Whom the Bell Tolls", "author": "Ernest Hemingway"},
           {"title": "Trump: The Art of the Deal", "author": "Good Question"}
           ]
}
```
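
To see how JSON text maps onto Python objects, here is a small sketch with the standard `json` module. The sample text is a shortened version of the library example; the variable names are only for illustration.

```python
import json

# A JSON document as text. JSON strings must use double quotes.
text = """
{
  "library": [
    {"title": "For Whom the Bell Tolls", "author": "Ernest Hemingway"}
  ]
}
"""

data = json.loads(text)            # JSON object -> dict, JSON array -> list
first_book = data["library"][0]
print(type(data).__name__, type(data["library"]).__name__)
print(first_book["author"])

# json.dumps goes the other way: Python objects -> JSON text.
round_trip = json.dumps(data)

# Single-quoted "JSON" is rejected by the parser.
try:
    json.loads("{'x': 1}")
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False
print(single_quotes_ok)
```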

### Intro to XML
* Hierarchical description of tagged data

```
<library>
  <book>
    <title>For Whom the Bell Tolls</title>
    <author>Ernest Hemingway</author>
  </book>
  <book>
    <title>Trump: The Art of the Deal</title>
    <author>Good Question</author>
  </book>
</library>
```
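
The same kind of data can be read from XML with the standard library. This sketch assumes a small `<library>` document like the one above; `xml_text` and `books` are illustrative names.

```python
import xml.etree.ElementTree as ET

xml_text = """
<library>
  <book>
    <title>For Whom the Bell Tolls</title>
    <author>Ernest Hemingway</author>
  </book>
</library>
""".strip()

root = ET.fromstring(xml_text)

# Walk the hierarchy: every <book> under the root, then its child tags.
books = [
    {
        "title": book.findtext("title").strip(),
        "author": book.findtext("author").strip(),
    }
    for book in root.findall("book")
]
print(root.tag)
print(books)
```

Compared with JSON, you navigate by tag name rather than by key or index, which is one reason JSON tends to be easier to work with in Python.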

### Using a Web API

Provider defines:
* Message format for requests and responses
    * Usually in both XML and JSON
* Registration and authentication
    * Usually using OAuth, a delegated authorization framework for REST APIs. It lets apps obtain limited access to a user's data without the user giving away their password.

Language integration
* Might be provided, or you might have to do it yourself
* If provided, usually by someone other than the data source
* Library API for various languages, such as Python
* You write a Python program that calls library procedures
* The library formats messages, sends them to the web provider, and translates responses into return values

### Getting JSON Data

We need to select the output format using the API:
* e.g., HTTP header: `Accept: application/json`


View in browser or Postman
* Good for exploration / debugging

Use `requests.get(...).json()`
* This returns a Python list or dictionary

Get a string and parse
* `import json`
* `x = json.loads(aJSONString)`

## Demo with Translink and Github

Get your own API token from developer.translink.ca !  

Hint: If you are on Chrome, use the JSON formatter extension.

In [1]:
import requests
import config as cfg

# Get your own API token from developer.translink.ca
apikey = cfg.translink['key']

# In the Walkthrough, you will learn how to hide your API keys using environment variables.

# Request JSON via the Accept header and parse the response body.
url = 'https://api.translink.ca/rttiapi/v1/stops/61935/estimates?apikey={}'.format(apikey)
requests.get(url, headers={'accept': 'application/JSON'}).json()



[{'RouteNo': '099',
  'RouteName': 'COMMERCIAL-BROADWAY/UBC (B-LINE)',
  'Direction': 'EAST',
  'RouteMap': {'Href': 'https://nb.translink.ca/geodata/099.kmz'},
  'Schedules': [{'Pattern': 'E8FL2',
    'Destination': 'TO BOUNDARY B-LINE',
    'ExpectedLeaveTime': '9:35am',
    'ExpectedCountdown': 0,
    'ScheduleStatus': ' ',
    'CancelledTrip': False,
    'CancelledStop': False,
    'AddedTrip': False,
    'AddedStop': False,
    'LastUpdate': '09:16:41 am'},
   {'Pattern': 'E1',
    'Destination': "COMM'L-BDWAY STN",
    'ExpectedLeaveTime': '9:36am',
    'ExpectedCountdown': 1,
    'ScheduleStatus': '*',
    'CancelledTrip': False,
    'CancelledStop': False,
    'AddedTrip': False,
    'AddedStop': False,
    'LastUpdate': '08:36:02 am'},
   {'Pattern': 'E1',
    'Destination': "COMM'L-BDWAY STN",
    'ExpectedLeaveTime': '9:40am',
    'ExpectedCountdown': 5,
    'ScheduleStatus': '*',
    'CancelledTrip': False,
    'CancelledStop': False,
    'AddedTrip': False,
    'AddedStop'
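
The first cell reads the key from a local `config` module. Another common option is an environment variable, so the key never appears in the notebook. This is a minimal sketch; the variable name `TRANSLINK_API_KEY` and the helper `load_api_key` are assumptions made for illustration.

```python
import os

def load_api_key(var_name):
    """Return the API key stored in the environment variable `var_name`."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key

# For demonstration only: normally the variable is set in the shell
# (e.g. `export TRANSLINK_API_KEY=...`), not inside the notebook.
os.environ["TRANSLINK_API_KEY"] = "not-a-real-key"

apikey = load_api_key("TRANSLINK_API_KEY")
print(len(apikey))
```

Failing loudly when the variable is missing is a deliberate choice: a request sent with an empty key usually fails later with a less helpful error.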

In [2]:
from IPython.display import JSON
request_url = 'https://api.translink.ca/rttiapi/v1/stops?apikey={}&lat={}&long={}'.format(apikey, 49.18, -122.85)
response = requests.get(request_url, headers={'accept': 'application/JSON'}).json()
JSON(response)



<IPython.core.display.JSON object>

In [3]:
response

[{'StopNo': 54997,
  'Name': 'SB KING GEORGE BLVD FS 98 AVE',
  'BayNo': 'N',
  'City': 'SURREY',
  'OnStreet': 'KING GEORGE BLVD',
  'AtStreet': '98 AVE',
  'Latitude': 49.179601,
  'Longitude': -122.845814,
  'WheelchairAccess': 1,
  'Distance': 307,
  'Routes': '314, 321, 329'},
 {'StopNo': 54986,
  'Name': 'EB 96 AVE FS 134 ST',
  'BayNo': 'N',
  'City': 'SURREY',
  'OnStreet': '96 AVE',
  'AtStreet': '134 ST',
  'Latitude': 49.176992,
  'Longitude': -122.850709,
  'WheelchairAccess': 0,
  'Distance': 338,
  'Routes': '314, 329'},
 {'StopNo': 54999,
  'Name': 'WB 96 AVE FS 134 ST',
  'BayNo': 'N',
  'City': 'SURREY',
  'OnStreet': '96 AVE',
  'AtStreet': '134 ST',
  'Latitude': 49.177159,
  'Longitude': -122.851794,
  'WheelchairAccess': 0,
  'Distance': 342,
  'Routes': '314, 329'},
 {'StopNo': 54989,
  'Name': 'NB KING GEORGE BLVD FS FRASER HWY',
  'BayNo': 'N',
  'City': 'SURREY',
  'OnStreet': 'KING GEORGE BLVD',
  'AtStreet': 'FRASER HWY',
  'Latitude': 49.181116,
  'Longitude

In [4]:
len(response)

13

In [5]:
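new_list = []
# Caution: range(0, len(response)-1) stops one short, so the last of the 13 stops is skipped.
# Use range(len(response)), as in In [7], to include every stop.
for i in range(0, len(response)-1):
    r = response[i]['StopNo']
    new_list.append(r)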

In [6]:
new_list

[54997,
 54986,
 54999,
 54989,
 54998,
 54988,
 59339,
 54996,
 57983,
 55665,
 55669,
 54990]

In [7]:
new_list = []
for i in range(0, len(response)):
    new_list.append(response[i]['Name'])

In [8]:
new_list

['SB KING GEORGE BLVD FS 98 AVE',
 'EB 96 AVE FS 134 ST',
 'WB 96 AVE FS 134 ST',
 'NB KING GEORGE BLVD FS FRASER HWY',
 'WB 96 AVE FS KING GEORGE BLVD',
 'NB KING GEORGE BLVD FS 96 AVE',
 'KING GEORGE STN BAY 4',
 'KING GEORGE STN BAY 1     ',
 'KING GEORGE STATION PLATFORM 2',
 'NB 132 ST NS 98 AVE',
 'SB 132 ST FS 98 AVE',
 'KING GEORGE STN BAY 2',
 'KING GEORGE STATION PLATFORM 1']

In [9]:
response[0]['Name']

'SB KING GEORGE BLVD FS 98 AVE'
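
The index-based loops above can also be written by iterating over the records directly, which avoids `range(len(...))` and the off-by-one risk that comes with it. A small sketch, using two records copied from the response shown earlier and trimmed to a few fields (`sample_response` is an illustrative name):

```python
# Two records copied from the response above, trimmed to a few fields.
sample_response = [
    {"StopNo": 54997, "Name": "SB KING GEORGE BLVD FS 98 AVE", "Distance": 307},
    {"StopNo": 54986, "Name": "EB 96 AVE FS 134 ST", "Distance": 338},
]

# One list comprehension per field of interest.
stop_numbers = [stop["StopNo"] for stop in sample_response]
stop_names = [stop["Name"] for stop in sample_response]

# dict.get returns a default instead of raising KeyError when a field is absent.
routes = [stop.get("Routes", "") for stop in sample_response]

print(stop_numbers)
print(stop_names)
print(routes)
```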

In [10]:
data = requests.get('https://api.github.com/users/sedv8808',headers={'accept': 'application/JSON'}).json()

In [11]:
data

{'login': 'sedv8808',
 'id': 42593920,
 'node_id': 'MDQ6VXNlcjQyNTkzOTIw',
 'avatar_url': 'https://avatars.githubusercontent.com/u/42593920?v=4',
 'gravatar_id': '',
 'url': 'https://api.github.com/users/sedv8808',
 'html_url': 'https://github.com/sedv8808',
 'followers_url': 'https://api.github.com/users/sedv8808/followers',
 'following_url': 'https://api.github.com/users/sedv8808/following{/other_user}',
 'gists_url': 'https://api.github.com/users/sedv8808/gists{/gist_id}',
 'starred_url': 'https://api.github.com/users/sedv8808/starred{/owner}{/repo}',
 'subscriptions_url': 'https://api.github.com/users/sedv8808/subscriptions',
 'organizations_url': 'https://api.github.com/users/sedv8808/orgs',
 'repos_url': 'https://api.github.com/users/sedv8808/repos',
 'events_url': 'https://api.github.com/users/sedv8808/events{/privacy}',
 'received_events_url': 'https://api.github.com/users/sedv8808/received_events',
 'type': 'User',
 'site_admin': False,
 'name': 'Socorro Dominguez Vidana',
 'c

## Example on Getting a README from GitHub

In [12]:
import base64

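def get_readme(url, token):
    '''Return the README.md of a GitHub repository as bytes, or "Missing" on failure.

    url: repository page, e.g. 'https://github.com/<owner>/<repo>'
    token: GitHub personal access token
    '''
    url_to_api_endpoint = url.replace('https://github.com/', '')
    new_url = 'https://api.github.com/repos/' + url_to_api_endpoint + '/contents/README.md'
    headers = {'Authorization': f'token {token}', 'accept': 'application/JSON'}

    try:
        readme = requests.get(new_url, headers=headers).json()
        # The contents endpoint returns the file body Base64-encoded.
        readme = readme['content']
        readme = base64.b64decode(readme)
    except (requests.RequestException, KeyError, TypeError, ValueError):
        # Network error, missing README, or a response that is not valid JSON/Base64.
        readme = "Missing"

    return readme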

In [13]:
url = 'https://github.com/chrismheiser/lipdnet'
token = cfg.github_api['secret']

In [14]:
get_readme(url = url, token=token)

b'<h1 align="left">\n  <br>\n  <a href="http://www.lipd.net"><img src="https://www.dropbox.com/s/kgeyec2b8cft5mo/lipd4.png?raw=1" alt="LiPD" width="225"></a>\n</h1>\n\n<p align="left">\n      <a href="https://zenodo.org/badge/latestdoi/25707644"><img src="https://zenodo.org/badge/25707644.svg" alt="DOI"></a>\n      <a href="https://img.shields.io/badge/license-GPL-brightgreen.svg"><img src="https://img.shields.io/badge/license-GPL-brightgreen.svg"></a>\n</p>\n\nInput/output and manipulation utilities for LiPD files on LiPD.net\n\n-----\n\n### What is it?\n\nLiPD is short for Linked PaleoData. LiPD files are the data standard for storing and exchanging data amongst paleoclimate scientists. The package will help you convert your existing paleoclimate observations into LiPD files that can be shared and analyzed.\n\nOrganizing and using your observation data can be time  consuming. Our goal is to let you focus on more important tasks than data wrangling.\n\n-----\n\n\n## How to Cite this c
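
The output above starts with `b'...'` because `base64.b64decode` returns bytes, not a string. The sketch below simulates a Base64-encoded `content` field locally (no network call) and decodes it to readable text; the sample README text is made up for illustration.

```python
import base64

# Simulate a Base64-encoded "content" field, wrapped with a newline
# the way encoded file bodies often are.
original = "# My Project\n\nHello, README!\n"
encoded = base64.b64encode(original.encode("utf-8")).decode("ascii")
wrapped = encoded[:20] + "\n" + encoded[20:]

raw_bytes = base64.b64decode(wrapped)     # bytes; newlines in the input are ignored
readme_text = raw_bytes.decode("utf-8")   # str, ready to print or search

print(type(raw_bytes).__name__)
print(readme_text)
```

Calling `.decode("utf-8")` on the result of `get_readme` would give the same kind of readable string, assuming the README is UTF-8 text.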

Demo in Postman.

Demo in Terminal using cURL

```
curl https://api.github.com/users/sedv8808
```

Viewing in Chrome

```
https://api.github.com/users/sedv8808
```

In [15]:
!curl https://api.github.com/users/sedv8808

{
  "login": "sedv8808",
  "id": 42593920,
  "node_id": "MDQ6VXNlcjQyNTkzOTIw",
  "avatar_url": "https://avatars.githubusercontent.com/u/42593920?v=4",
  "gravatar_id": "",
  "url": "https://api.github.com/users/sedv8808",
  "html_url": "https://github.com/sedv8808",
  "followers_url": "https://api.github.com/users/sedv8808/followers",
  "following_url": "https://api.github.com/users/sedv8808/following{/other_user}",
  "gists_url": "https://api.github.com/users/sedv8808/gists{/gist_id}",
  "starred_url": "https://api.github.com/users/sedv8808/starred{/owner}{/repo}",
  "subscriptions_url": "https://api.github.com/users/sedv8808/subscriptions",
  "organizations_url": "https://api.github.com/users/sedv8808/orgs",
  "repos_url": "https://api.github.com/users/sedv8808/repos",
  "events_url": "https://api.github.com/users/sedv8808/events{/privacy}",
  "received_events_url": "https://api.github.com/users/sedv8808/received_events",
  "type": "User",
  "site_admin": false,
 

You should not forget:
- Today we only explored a few APIs informally, but every API has documentation. Read it!


## HTTP Requests
- Hypertext Transfer Protocol

- When you access a website (through a URL), you are:
    - "sending an HTTP GET request to the server to retrieve data"
    - "data" can be a webpage that is displayed, or it can be JSON


- When you access a website in a browser, you know it worked if the page loaded
    - In code there is no page to look at, so status codes tell you whether the request worked

- Common HTTP status codes:
    - 200 OK
    - 400 Bad Request
    - 401 Unauthorized
    - 404 Not Found
    - 500 Internal Server Error

In [16]:
requests.get('https://api.github.com/users/sedv8808')

<Response [200]>

In [17]:
requests.get('https://api.translink.ca/rttiapi/v1/stops?lat=49.18&long=-122.85')

<Response [500]>
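
The two responses above differ only in their status code: 200 for the GitHub request and 500 for the Translink request sent without an API key. As a rough sketch, the standard library's `http.HTTPStatus` can turn a code into its name and family; `describe_status` is a hypothetical helper, not part of `requests`.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short label such as '404 Not Found (client error)'."""
    phrase = HTTPStatus(code).phrase
    if 200 <= code < 300:
        kind = "success"
    elif 400 <= code < 500:
        kind = "client error"
    elif 500 <= code < 600:
        kind = "server error"
    else:
        kind = "other"
    return f"{code} {phrase} ({kind})"

for code in (200, 400, 401, 404, 500):
    print(describe_status(code))
```

With `requests`, the number is available as `response.status_code`, and `response.raise_for_status()` raises an exception for 4xx and 5xx responses, so a failed call does not pass silently.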

### The Anatomy Of A Request

It’s important to know that a request is made up of four things:

1. The endpoint

2. The method

3. The headers

4. The data (or body)

1. The endpoint (or route) is the URL you request.

root-endpoint/?

https://api.github.com

2. The method is the type of request you send to the server. You can choose from the types below:

a. GET - get a resource from the server

b. POST - create a new resource on the server

c. PUT/PATCH - update a resource on the server

d. DELETE - delete a resource on the server
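
To make the four parts concrete, the sketch below builds a request object with the standard library's `urllib.request.Request` without sending it. The endpoint `api.example.com/v1/books` and the payload are hypothetical and exist only to label the parts.

```python
import json
from urllib.request import Request

# Hypothetical endpoint and payload, used only to show the four parts.
endpoint = "https://api.example.com/v1/books"
body = json.dumps({"title": "For Whom the Bell Tolls"}).encode("utf-8")

request = Request(
    url=endpoint,                                  # 1. the endpoint
    method="POST",                                 # 2. the method
    headers={"Content-Type": "application/json"},  # 3. the headers
    data=body,                                     # 4. the data (body)
)

# Nothing has been sent; we only inspect the object we built.
print(request.get_method())
print(request.full_url)
# urllib stores header names capitalized as "Content-type".
print(request.get_header("Content-type"))
print(request.data)
```

With `requests`, the method is chosen by the function you call (`requests.get`, `requests.post`, `requests.put`, `requests.patch`, `requests.delete`), and headers and body are passed as keyword arguments.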