## A lot of websites support REST API to query data, and in your daily job you will use it very often. RESTful API = REpresentational State Transfer. 

Key points of REST API: 

1. Client-Server
2. Stateless
3. Cacheable
4. Uniform-interface
5. Layered System
6. Code on demond

## REST API is a preferred way to pull data from websites using crawlers
1. You can get structured data directly, and easily parse them into dataframe
2. It's much more stable than a web crawler, website can update front page frequently but will always keep the API query the same syntax/format, or have compatibility support -- a lot of apps depend on the reliability of their API service
3. It's more cost effective to both you and the web server
4. It allows to query some data unseen from web page e.g. stock price at tick level

## REST API consist of: a base URL + standard HTTP method (Get, Put, Post, Delete) + a media type (mostly JSON or XML)

In python, we can use 
1. requests to send/receive HTTP request from client side
2. json to serialize and deserialize data 
3. flask to build up web server to response to HTTP request

Let's dive in with an example from Yelp.

Go to https://www.yelp.com/developers/documentation/v3/business_search (you may need to sign up Yelp developer's account to get access to these documentations)
Read through this page, and understand
1. What HTTP method should be used?
2. What is the basic endpoint?
3. What is the format of the response body?
4. Can you figure out how to search for restaurants within 10 miles of zip code 94583?

Now go to https://www.yelp.com/developers/v3/manage_app  to add an app to your developer's account, get your client ID and app key, which will be used later. The default daily request limit is 5000.

In [1]:
#pip install requests if you don't already have it
import requests
#test if your request is working fine, you should be getting a status code of 200 from below command
response = requests.get('https://api.github.com')
print(response)

<Response [200]>


Some common HTTP response code:
200: Everything OK
400: Bad request, check your request syntax
401/403: Unathorized, you may need permission
404: Page not found, you know it
405: Bad method
500: Internal Server Error, check your server code

A HTTP request example:

POST /cgi-bin/process.cgi HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.tutorialspoint.com
Content-Type: application/x-www-form-urlencoded
Content-Length: length
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive

licenseID=string&content=string&/paramsXML=string

In [19]:
#let's try to send GET request to Yelp, replace the API_Key with yours
API_Key = 'aAJnW9n-0Pqb_as_v2pRKM4hMsLBgE5m4abHIRJRRPKDWGVJ12ZLKaM-2zm1CBn64klKd0gaVUuSI18f4dGj5VBWpdzSQXHmC56mjTMhE8-st3s3PYhtxWMKSP1yXnYx'
url = 'https://api.yelp.com/v3/businesses/search'
params = {
    'term':'Restaurants',
    'location':'11357',
    'radius':'16000',
    'limit':'50'
}
headers = {
    'Authorization': 'Bearer %s' % API_Key,
    'Content-Type':'application/json'
}
#the browser needs full url to work, requests helps you to automatically format those urls
#e.g. https://www.yelp.com/search?find_desc=Restaurants&find_loc=San+Ramon%2C+CA&ns=1
response = requests.get(url, params=params, headers=headers, json=None) 
#GET request usually don't send data to server, so json=None. If you need to POST data, you can use json={'key':'value'}

In [20]:
print(response)

<Response [200]>


In [21]:
# to view the contents of the response
response.content

b'{"businesses": [{"id": "cegA4jf16vEt7NCQ7cpu2w", "alias": "kind-fresh-meadows", "name": "Kin\'d", "image_url": "https://s3-media2.fl.yelpcdn.com/bphoto/75GptlVgTc9HqZoyHdu8NQ/o.jpg", "is_closed": false, "url": "https://www.yelp.com/biz/kind-fresh-meadows?adjust_creative=Eoyi_YSkAfM5aeQla0x-3A&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=Eoyi_YSkAfM5aeQla0x-3A", "review_count": 924, "categories": [{"alias": "thai", "title": "Thai"}], "rating": 4.5, "coordinates": {"latitude": 40.73031, "longitude": -73.77814}, "transactions": ["delivery", "pickup"], "price": "$$", "location": {"address1": "192-03 Union Tpke", "address2": "", "address3": null, "city": "Fresh Meadows", "zip_code": "11366", "country": "US", "state": "NY", "display_address": ["192-03 Union Tpke", "Fresh Meadows, NY 11366"]}, "phone": "+17184680888", "display_phone": "(718) 468-0888", "distance": 6933.974525491427}, {"id": "xR2Yeb_Nh0MOAiUIaqaVOg", "alias": "tavern-157-flushing-2", "name": "Tavern 

In [22]:
#other properties and methods of response object, a full list of available properties and methods can be found here
#https://www.w3schools.com/python/ref_requests_response.asp
print('cookies is ' + str(response.cookies) + '\n')
print('headers is ' + str(response.headers) + '\n')
print('response body is ' + str(response.text))

cookies is <RequestsCookieJar[]>

headers is {'Connection': 'keep-alive', 'content-type': 'application/json', 'ratelimit-remaining': '4998', 'server': 'envoy', 'x-b3-sampled': '0', 'x-routing-service': 'routing-main--uswest2-6c64c7fb6d-gdvhh; site=public_api_v3', 'x-zipkin-id': 'd51694e6ea19023a', 'ratelimit-resettime': '2022-02-18T00:00:00+00:00', 'ratelimit-dailylimit': '5000', 'x-cloudmap': 'routing_uswest2', 'content-encoding': 'gzip', 'x-proxied': '10-69-126-17-uswest2aprod', 'x-extlb': '10-69-126-17-uswest2aprod', 'cache-control': 'max-age=0, no-store, private, no-transform', 'Accept-Ranges': 'bytes', 'Date': 'Thu, 17 Feb 2022 06:55:57 GMT', 'Via': '1.1 varnish', 'X-Served-By': 'cache-lax10623-LGB', 'X-Cache': 'MISS', 'X-Cache-Hits': '0', 'Vary': 'Accept-Encoding', 'transfer-encoding': 'chunked'}

response body is {"businesses": [{"id": "cegA4jf16vEt7NCQ7cpu2w", "alias": "kind-fresh-meadows", "name": "Kin'd", "image_url": "https://s3-media2.fl.yelpcdn.com/bphoto/75GptlVgTc9HqZoyH

In [23]:
#response.json() can automatically parse json text into python dict
#or you can also use json package to parse the text
response.json()

{'businesses': [{'alias': 'kind-fresh-meadows',
   'categories': [{'alias': 'thai', 'title': 'Thai'}],
   'coordinates': {'latitude': 40.73031, 'longitude': -73.77814},
   'display_phone': '(718) 468-0888',
   'distance': 6933.974525491427,
   'id': 'cegA4jf16vEt7NCQ7cpu2w',
   'image_url': 'https://s3-media2.fl.yelpcdn.com/bphoto/75GptlVgTc9HqZoyHdu8NQ/o.jpg',
   'is_closed': False,
   'location': {'address1': '192-03 Union Tpke',
    'address2': '',
    'address3': None,
    'city': 'Fresh Meadows',
    'country': 'US',
    'display_address': ['192-03 Union Tpke', 'Fresh Meadows, NY 11366'],
    'state': 'NY',
    'zip_code': '11366'},
   'name': "Kin'd",
   'phone': '+17184680888',
   'price': '$$',
   'rating': 4.5,
   'review_count': 924,
   'transactions': ['delivery', 'pickup'],
   'url': 'https://www.yelp.com/biz/kind-fresh-meadows?adjust_creative=Eoyi_YSkAfM5aeQla0x-3A&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=Eoyi_YSkAfM5aeQla0x-3A'},
  {'alias': '

In [24]:
type(response.json())

dict

In [25]:
#let's take a look at what information you can get
print(response.json().keys())
#And you can easily load it into a dataframe
import pandas as pd
businesses = response.json()['businesses']
df = pd.DataFrame(businesses)
df.head()

dict_keys(['businesses', 'total', 'region'])


Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,price,location,phone,display_phone,distance
0,cegA4jf16vEt7NCQ7cpu2w,kind-fresh-meadows,Kin'd,https://s3-media2.fl.yelpcdn.com/bphoto/75Gptl...,False,https://www.yelp.com/biz/kind-fresh-meadows?ad...,924,"[{'alias': 'thai', 'title': 'Thai'}]",4.5,"{'latitude': 40.73031, 'longitude': -73.77814}","[delivery, pickup]",$$,"{'address1': '192-03 Union Tpke', 'address2': ...",17184680888,(718) 468-0888,6933.974525
1,xR2Yeb_Nh0MOAiUIaqaVOg,tavern-157-flushing-2,Tavern 157,https://s3-media1.fl.yelpcdn.com/bphoto/DzgKsJ...,False,https://www.yelp.com/biz/tavern-157-flushing-2...,466,"[{'alias': 'wine_bars', 'title': 'Wine Bars'},...",4.5,"{'latitude': 40.7634836392041, 'longitude': -7...","[delivery, restaurant_reservation, pickup]",$$,"{'address1': '157-12 Northern Blvd', 'address2...",17188867488,(718) 886-7488,2677.000863
2,03RL_VcRGXuICl2R6Luu_Q,the-waterfront-nyc-bronx,The Waterfront NYC,https://s3-media2.fl.yelpcdn.com/bphoto/zV8E_U...,False,https://www.yelp.com/biz/the-waterfront-nyc-br...,37,"[{'alias': 'newamerican', 'title': 'American (...",4.0,"{'latitude': 40.8112556, 'longitude': -73.8352...",[delivery],,"{'address1': '500 Hutchinson River Pkwy', 'add...",17184141577,(718) 414-1577,3232.946456
3,2f-qNpdVJl2h9L6SHYuICA,havana-cafe-bronx,Havana Cafe,https://s3-media2.fl.yelpcdn.com/bphoto/6oza5f...,False,https://www.yelp.com/biz/havana-cafe-bronx?adj...,888,"[{'alias': 'cuban', 'title': 'Cuban'}, {'alias...",4.0,"{'latitude': 40.8379, 'longitude': -73.83437}","[delivery, pickup]",$$,"{'address1': '3151 E Tremont Ave', 'address2':...",17185181800,(718) 518-1800,5934.899329
4,ABHLJmxkaPY4kOjJGGudig,mister-seoul-bayside,Mister Seoul,https://s3-media4.fl.yelpcdn.com/bphoto/5uN9XC...,False,https://www.yelp.com/biz/mister-seoul-bayside?...,44,"[{'alias': 'korean', 'title': 'Korean'}]",4.5,"{'latitude': 40.7649048, 'longitude': -73.7714...","[delivery, restaurant_reservation, pickup]",,"{'address1': '39-35 Bell Blvd', 'address2': No...",13475027103,(347) 502-7103,4226.050294


In [26]:
df.shape

(50, 16)

In [27]:
file = "sample.csv"
df.to_csv(file)

## Sometimes you need to send data to server, let's take a look at json

JSON stands for JavaScript Object Notation, it's a string representing serielized object/data, contains only data and can easily parse between all kinds of popular data format, e.g. dict, dataframe, csv file, and can be used across all languages.

You will encounter JSON objects and JSON Arrays. e.g.
{
"name":"John",
"age":30,
"cars":[ "Ford", "BMW", "Fiat" ]
}

Conversion table between JSON and Python types:

| JSON | Python |
| --- | --- |
| object | dict |
| array | list |
| string | str |
| number(int/real) | int/float |
| true/false | True/False |

In [28]:
import json

cars = {
    "name":"John",
    "age":"30",
    "cars":["Ford","BMW","Fiat"]
}

print('type of cars is ' + str(type(cars)))

cars_json = json.dumps(cars)
print('type of cars_json is ' + str(type(cars_json)))

cars_from_json = json.loads(cars_json)
print('type of cars_from_json is ' + str(type(cars_from_json)))

type of cars is <class 'dict'>
type of cars_json is <class 'str'>
type of cars_from_json is <class 'dict'>


In [29]:
# you can easily load json string into dataframe also

df_car = pd.read_json(cars_json, orient='columns')
df_car.head()

Unnamed: 0,name,age,cars
0,John,30,Ford
1,John,30,BMW
2,John,30,Fiat


In [None]:
## Homework: use github api to create a new repo for yourself