# JSON and APIs

_September 22, 2020_

Agenda today:
- Introduction to APIs and the remote server model
- Getting data through an API: a case study with the Yelp API

In [11]:
import pandas as pd
import numpy as np
import requests
import json
#from yelp.client import Client
import matplotlib.pyplot as plt
plt.style.use('seaborn')

## Part I. APIs and Remote Server Model
API stands for Application Programming Interface. Many large companies build APIs for their products, either for their clients or for internal use. An API allows the company's application to communicate with other applications. But what _exactly_ is an API?

#### Remote server 
When we think about the Web, we can think of it as a collection of _servers_. Servers are computers that store large amounts of data and are optimized to process requests. For example, when you type in www.facebook.com, your browser sends a _request_ to the Facebook server and gets a _response_ back; the browser then interprets the returned code and displays your homepage.

In this case, your browser is the _client_, and Facebook's server exposes an API. Broadly speaking, whenever you visit a website, you are interacting with its API. However, an API isn't the same as the remote server; rather, it is the part of the server that receives __requests__ and sends __responses__.

<img src='status-code.png' width = 500>
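
To make the request/response cycle concrete, here is a minimal sketch that runs a toy server and a client in the same process, using only the standard library. The `EchoHandler` class, the `/businesses` path, and the message text are made up for illustration; a real API does far more work on the server side.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    """Toy API (hypothetical): answers every GET request with a small JSON document."""

    def do_GET(self):
        body = json.dumps({"path": self.path, "message": "hello from the server"}).encode("utf-8")
        self.send_response(200)  # the status code of the response
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the output quiet


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client side: send a request, then read the status code and the body.
url = "http://127.0.0.1:{}/businesses".format(server.server_port)
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # ignore system proxies
with opener.open(url, timeout=5) as reply:
    status = reply.status
    payload = json.loads(reply.read().decode("utf-8"))

server.shutdown()
server.server_close()

print(status, payload)
```

The client never sees how the server produced the answer. It only sees the status code and the body that came back, which is the same situation we are in when we call a remote API.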

## Part II. Getting Data Through APIs

Calling `requests.get()` sends a request to Yelp's API and stores the server's reply in a variable, named `response` in the cells below. We can then check whether the request succeeded.

#### YELP API
Sometimes you need _authentication_ to get data from a service, in addition to just sending a `GET` request. The Yelp API is a perfect example.

You will need to go to Yelp's developer [website](https://www.yelp.com/developers/v3/manage_app) and request a client ID and API key, which function like a key to a house of data.

<img src='yelp.png' width = 500>

In [12]:
# let's try to get some data from Yelp!
url = 'https://api.yelp.com/v3/businesses/search'
response = requests.get(url)

In [13]:
# check the status code
response.status_code

# what happened here?

400
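
The request without credentials came back with `400`, which belongs to the 4xx family: the server is saying the problem is on the client's side. As a rough guide to reading status codes, the sketch below groups them by their first digit using the standard library's `http.HTTPStatus`. The helper name `describe_status` is made up for this example.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short, human-readable description of an HTTP status code."""
    phrase = HTTPStatus(code).phrase  # standard reason phrase, e.g. "Bad Request"
    if 200 <= code < 300:
        family = "success"
    elif 300 <= code < 400:
        family = "redirection"
    elif 400 <= code < 500:
        family = "client error"
    elif 500 <= code < 600:
        family = "server error"
    else:
        family = "informational"
    return "{} {} ({})".format(code, phrase, family)


for code in (200, 400, 401, 404, 429, 500):
    print(describe_status(code))
```

With `requests`, the same check is usually written as `response.status_code == 200`, or `response.raise_for_status()` if you prefer an exception on 4xx and 5xx replies.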

In [41]:
# now we are ready to get our data 

# services usually limit the number of API calls you can make. The limit varies from service
# to service, so watch out for it

MY_API_KEY = "YOUR_API_KEY"  # placeholder: paste your own key here, and never publish a real one


term = 'burgers'
location = 'Brooklyn'
SEARCH_LIMIT = 50

url = 'https://api.yelp.com/v3/businesses/search'

headers = {
        'Authorization': 'Bearer {}'.format(MY_API_KEY),
    }

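# requests URL-encodes parameter values itself, so pass them unmodified;
# a manual '+' would be sent as a literal plus sign instead of a space
url_params = {
                'term': term,
                'location': location,
                'limit': SEARCH_LIMIT
            }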
response = requests.get(url, headers=headers, params=url_params)
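
The request above has two ingredients: an `Authorization` header carrying the key, and query parameters appended to the URL. The sketch below builds both without sending anything, so it is safe to run offline. It uses `urllib.parse.urlencode` from the standard library to show the kind of encoding that happens to parameter values. `YELP_API_KEY` is a hypothetical environment variable name chosen for this example, and the multi-word search terms are illustrative.

```python
import os
from urllib.parse import urlencode

# Read the key from the environment instead of hard-coding it in the notebook.
# YELP_API_KEY is an assumed variable name; the fallback is only a placeholder.
api_key = os.environ.get("YELP_API_KEY", "YOUR_API_KEY")
headers = {"Authorization": "Bearer {}".format(api_key)}

# Multi-word values are passed as-is; the encoder turns spaces into '+'.
params = {"term": "veggie burgers", "location": "New York", "limit": 50}
query = urlencode(params)
full_url = "https://api.yelp.com/v3/businesses/search?" + query
print(full_url)

# Replacing spaces by hand backfires: the '+' itself gets percent-encoded.
double_encoded = urlencode({"term": "veggie+burgers"})
print(double_encoded)
```

Keeping the key in an environment variable means the notebook can be shared or committed without leaking credentials.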

In [42]:
# examine the response object

print(response)


<Response [200]>


In [43]:
# how are we going to parse the response.text object?

print(response.text)

{"businesses": [{"id": "-0bdnX762vdTb9lI00etvA", "alias": "brooklyn-burgers-and-beer-brooklyn", "name": "Brooklyn Burgers & Beer", "image_url": "https://s3-media3.fl.yelpcdn.com/bphoto/Xt6uxV5pTXQ_c1gdJytrYA/o.jpg", "is_closed": false, "url": "https://www.yelp.com/biz/brooklyn-burgers-and-beer-brooklyn?adjust_creative=kCWh97FC3e0Y7Emck3mGaw&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=kCWh97FC3e0Y7Emck3mGaw", "review_count": 346, "categories": [{"alias": "burgers", "title": "Burgers"}, {"alias": "beerbar", "title": "Beer Bar"}], "rating": 4.0, "coordinates": {"latitude": 40.6743507, "longitude": -73.9815826}, "transactions": ["delivery", "pickup"], "price": "$$", "location": {"address1": "259 5th Ave", "address2": null, "address3": "", "city": "Brooklyn", "zip_code": "11215", "country": "US", "state": "NY", "display_address": ["259 5th Ave", "Brooklyn, NY 11215"]}, "phone": "+17187881458", "display_phone": "(718) 788-1458", "distance": 4331.401461075489}, {"id"

In [66]:
# working with JSON

burgers = response.text
burgers = json.loads(burgers)
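
As a small illustration of what `json.loads` does, the sketch below parses a trimmed-down sample modeled on the response text above. The sample keeps only a few fields; it is not the full response.

```python
import json

# A shortened JSON string with a few of the fields seen in the response.
raw = (
    '{"businesses": [{"name": "Brooklyn Burgers & Beer", "is_closed": false, '
    '"rating": 4.0, "review_count": 346, '
    '"location": {"address2": null, "city": "Brooklyn"}}]}'
)

parsed = json.loads(raw)
first = parsed["businesses"][0]

# JSON types map onto Python types: object -> dict, array -> list,
# false -> False, null -> None, numbers -> int or float.
print(type(parsed).__name__, type(parsed["businesses"]).__name__)
print(first["is_closed"], first["location"]["address2"], first["rating"])

# json.dumps goes the other way and is handy for pretty-printing.
print(json.dumps(first, indent=2))
```

`requests` also offers `response.json()`, which parses the body in one step and gives the same kind of nested dictionaries and lists.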

In [67]:
# cleaning and exploring the data
for key in burgers.keys():
    print(key)

businesses
total
region


In [68]:
burgers["region"]

{'center': {'longitude': -73.93936157226562, 'latitude': 40.652330148320374}}

In [69]:
burgers['businesses'][0]

{'id': '-0bdnX762vdTb9lI00etvA',
 'alias': 'brooklyn-burgers-and-beer-brooklyn',
 'name': 'Brooklyn Burgers & Beer',
 'image_url': 'https://s3-media3.fl.yelpcdn.com/bphoto/Xt6uxV5pTXQ_c1gdJytrYA/o.jpg',
 'is_closed': False,
 'url': 'https://www.yelp.com/biz/brooklyn-burgers-and-beer-brooklyn?adjust_creative=kCWh97FC3e0Y7Emck3mGaw&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=kCWh97FC3e0Y7Emck3mGaw',
 'review_count': 346,
 'categories': [{'alias': 'burgers', 'title': 'Burgers'},
  {'alias': 'beerbar', 'title': 'Beer Bar'}],
 'rating': 4.0,
 'coordinates': {'latitude': 40.6743507, 'longitude': -73.9815826},
 'transactions': ['delivery', 'pickup'],
 'price': '$$',
 'location': {'address1': '259 5th Ave',
  'address2': None,
  'address3': '',
  'city': 'Brooklyn',
  'zip_code': '11215',
  'country': 'US',
  'state': 'NY',
  'display_address': ['259 5th Ave', 'Brooklyn, NY 11215']},
 'phone': '+17187881458',
 'display_phone': '(718) 788-1458',
 'distance': 4331.40146
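
Each business is a dictionary that contains further dictionaries and lists, so pulling out a flat summary means walking that structure. Here is one possible sketch, using a shortened copy of the record above. The helper name `summarize` is invented for this example, and `.get()` is used so that a missing key (such as `price`) yields `None` instead of an error.

```python
# Shortened copy of the first business shown above.
business = {
    "name": "Brooklyn Burgers & Beer",
    "categories": [
        {"alias": "burgers", "title": "Burgers"},
        {"alias": "beerbar", "title": "Beer Bar"},
    ],
    "coordinates": {"latitude": 40.6743507, "longitude": -73.9815826},
    "price": "$$",
    "location": {
        "city": "Brooklyn",
        "zip_code": "11215",
        "display_address": ["259 5th Ave", "Brooklyn, NY 11215"],
    },
}


def summarize(biz):
    """Flatten the nested fields of one business into a simple dict."""
    location = biz.get("location", {})
    return {
        "name": biz["name"],
        "categories": [c["title"] for c in biz.get("categories", [])],
        "city": location.get("city"),
        "address": ", ".join(location.get("display_address", [])),
        "price": biz.get("price"),  # None when the key is absent
    }


summary = summarize(business)
print(summary)
```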

In [70]:
burgers_df.head()

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,price,location,phone,display_phone,distance
0,-0bdnX762vdTb9lI00etvA,brooklyn-burgers-and-beer-brooklyn,Brooklyn Burgers & Beer,https://s3-media3.fl.yelpcdn.com/bphoto/Xt6uxV...,False,https://www.yelp.com/biz/brooklyn-burgers-and-...,346,"[{'alias': 'burgers', 'title': 'Burgers'}, {'a...",4.0,"{'latitude': 40.6743507, 'longitude': -73.9815...","[delivery, pickup]",$$,"{'address1': '259 5th Ave', 'address2': None, ...",17187881458,(718) 788-1458,4331.401461
1,xiEBdZ8z5KY21Vw2pY2fJQ,two-8-two-bar-and-burger-brooklyn,Two 8 Two Bar & Burger,https://s3-media1.fl.yelpcdn.com/bphoto/Bk_qD2...,False,https://www.yelp.com/biz/two-8-two-bar-and-bur...,657,"[{'alias': 'burgers', 'title': 'Burgers'}, {'a...",4.0,"{'latitude': 40.688536, 'longitude': -73.989728}","[delivery, pickup]",$$,"{'address1': '282 Atlantic Ave', 'address2': '...",17185962282,(718) 596-2282,5853.318089
2,wYgp-defqwJPhjC6Y_WKWg,burgerfi-brooklyn-2,BurgerFi,https://s3-media2.fl.yelpcdn.com/bphoto/U56T0L...,False,https://www.yelp.com/biz/burgerfi-brooklyn-2?a...,299,"[{'alias': 'hotdog', 'title': 'Hot Dogs'}, {'a...",4.0,"{'latitude': 40.61883, 'longitude': -74.02131}","[delivery, pickup]",$$,"{'address1': '719 86th St', 'address2': '', 'a...",17188360836,(718) 836-0836,7856.266094
3,dkAj-3gmkvdA4XkJmw6hCw,juanchis-burger-brooklyn,Juanchi's Burger,https://s3-media4.fl.yelpcdn.com/bphoto/ZgbWUB...,False,https://www.yelp.com/biz/juanchis-burger-brook...,449,"[{'alias': 'gastropubs', 'title': 'Gastropubs'...",4.5,"{'latitude': 40.713003, 'longitude': -73.958863}","[delivery, pickup]",$$,"{'address1': '225 S 1st St', 'address2': None,...",19292950147,(929) 295-0147,6945.733809
4,M6z8nxPfoM9si1LIb8sxig,burger-urway-park-slope-brooklyn,BURGER URWAY PARK SLOPE,https://s3-media1.fl.yelpcdn.com/bphoto/lwp8cg...,False,https://www.yelp.com/biz/burger-urway-park-slo...,1,"[{'alias': 'burgers', 'title': 'Burgers'}, {'a...",5.0,"{'latitude': 40.67037, 'longitude': -73.98502}","[delivery, pickup]",,"{'address1': '339 7th St', 'address2': None, '...",13474220139,(347) 422-0139,4335.748177


In [48]:
# turn the relevant dataset into a dataframe

burgers_df = pd.DataFrame.from_dict(burgers["businesses"])
burgers_df.isnull().sum()

id                0
alias             0
name              0
image_url         0
is_closed         0
url               0
review_count      0
categories        0
rating            0
coordinates       0
transactions      0
price            15
location          0
phone             0
display_phone     0
distance          0
dtype: int64
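
Two things stand out in this summary: `price` has missing values, and columns such as `coordinates` and `location` still hold whole dictionaries. One way to handle both is sketched below with `pd.json_normalize`, on two shortened records taken from the output above. Filling the gaps with the string `"unknown"` is an arbitrary choice made for illustration, not a recommendation.

```python
import pandas as pd

# Two shortened records; the second has no 'price' key at all.
records = [
    {
        "name": "Brooklyn Burgers & Beer",
        "price": "$$",
        "coordinates": {"latitude": 40.6743507, "longitude": -73.9815826},
        "location": {"address1": "259 5th Ave"},
    },
    {
        "name": "BURGER URWAY PARK SLOPE",
        "coordinates": {"latitude": 40.67037, "longitude": -73.98502},
        "location": {"address1": "339 7th St"},
    },
]

# Nested dictionaries become dotted column names, e.g. 'coordinates.latitude'.
flat = pd.json_normalize(records)
print(sorted(flat.columns))

# Count the missing prices, then fill them with a placeholder label.
missing_before = int(flat["price"].isnull().sum())
flat["price"] = flat["price"].fillna("unknown")
print(missing_before, flat["price"].tolist())
```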

In [49]:
# you can do some analysis and visualization from here on! 

# visualize the review count - what's the appropriate plot?
burgers_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 16 columns):
id               50 non-null object
alias            50 non-null object
name             50 non-null object
image_url        50 non-null object
is_closed        50 non-null bool
url              50 non-null object
review_count     50 non-null int64
categories       50 non-null object
rating           50 non-null float64
coordinates      50 non-null object
transactions     50 non-null object
price            35 non-null object
location         50 non-null object
phone            50 non-null object
display_phone    50 non-null object
distance         50 non-null float64
dtypes: bool(1), float64(2), int64(1), object(12)
memory usage: 6.0+ KB
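
For a single numeric column such as `review_count`, a histogram is a reasonable first plot because it shows how the values are distributed. The sketch below uses only the five review counts visible in the `head()` output, so the picture is illustrative rather than representative of all 50 rows.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Review counts of the first five businesses shown earlier.
sample = pd.DataFrame({"review_count": [346, 657, 299, 449, 1]})

fig, ax = plt.subplots()
counts, bin_edges, _ = ax.hist(sample["review_count"], bins=5)
ax.set_xlabel("review count")
ax.set_ylabel("number of businesses")
ax.set_title("Distribution of review counts (sample)")

print(counts, bin_edges)
```

On the full DataFrame the call would be `ax.hist(burgers_df["review_count"])`; review counts tend to be skewed, so a log scale on the x-axis may be worth trying.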


In [78]:
# query the names of the burger places with more than 300 reviews
reviews_300more = burgers_df["name"][burgers_df["review_count"] > 300]

In [79]:
reviews_300more

0       Brooklyn Burgers & Beer
1        Two 8 Two Bar & Burger
3              Juanchi's Burger
13    My House Burgers & Shakes
14               Bonnie's Grill
15             Dutch Boy Burger
19                  Smashburger
20                  Shake Shack
28                   Bareburger
29         Emily - West Village
33                   Moo Burger
34             Black Tap - Soho
36               Brennan & Carr
37                   Bareburger
38                        Emily
39                    Au Cheval
40                      BK JANI
42                  Shake Shack
44             Peaches HotHouse
45                        Korzo
47      Emmy Squared - Brooklyn
49                   Skinflints
Name: name, dtype: object
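
The filter above returns every place with more than 300 reviews. If the goal is the single place with the most reviews, `idxmax` or a sort does the job. The sketch below rebuilds a small DataFrame from the five rows shown in the `head()` output, so its "winner" is only the top of that sample, not necessarily of the full result.

```python
import pandas as pd

# The first five rows from the head() output, reduced to two columns.
sample = pd.DataFrame({
    "name": [
        "Brooklyn Burgers & Beer",
        "Two 8 Two Bar & Burger",
        "BurgerFi",
        "Juanchi's Burger",
        "BURGER URWAY PARK SLOPE",
    ],
    "review_count": [346, 657, 299, 449, 1],
})

# Boolean mask plus column label in one .loc call.
popular = sample.loc[sample["review_count"] > 300, "name"]

# Row with the largest review count.
top_row = sample.loc[sample["review_count"].idxmax()]

# Full ranking, most reviewed first.
ranked = sample.sort_values("review_count", ascending=False)

print(popular.tolist())
print(top_row["name"], top_row["review_count"])
```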

In [None]:
# migrate the cleaned data into a sql db
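
One possible way to do the migration is `DataFrame.to_sql` with a SQLite connection from the standard library. SQLite cannot store Python dictionaries directly, so the sketch below serializes the nested `coordinates` column to JSON text first. The table name `burgers`, the in-memory database, and the two-row sample are assumptions made for this example.

```python
import json
import sqlite3

import pandas as pd

# Two rows taken from the output above, with one nested column kept.
sample = pd.DataFrame({
    "id": ["-0bdnX762vdTb9lI00etvA", "xiEBdZ8z5KY21Vw2pY2fJQ"],
    "name": ["Brooklyn Burgers & Beer", "Two 8 Two Bar & Burger"],
    "review_count": [346, 657],
    "rating": [4.0, 4.0],
    "coordinates": [
        {"latitude": 40.6743507, "longitude": -73.9815826},
        {"latitude": 40.688536, "longitude": -73.989728},
    ],
})

# Dictionaries are not a SQLite type, so store them as JSON strings.
sample["coordinates"] = sample["coordinates"].apply(json.dumps)

conn = sqlite3.connect(":memory:")  # use a file path to persist the database
sample.to_sql("burgers", conn, index=False, if_exists="replace")

row_count = conn.execute("SELECT COUNT(*) FROM burgers").fetchone()[0]
stored = conn.execute(
    "SELECT coordinates FROM burgers WHERE name = ?", ("Two 8 Two Bar & Burger",)
).fetchone()[0]
conn.close()

print(row_count, json.loads(stored))
```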

In [None]:
# can you do some other queries using sql/pandas?
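
As one example of such a query, the sketch below asks for places rated 4.5 or higher, ordered by review count, once in SQL and once in pandas. It rebuilds a small table from the five rows in the `head()` output; the table name `burgers` is again an assumption.

```python
import sqlite3

import pandas as pd

sample = pd.DataFrame({
    "name": [
        "Brooklyn Burgers & Beer",
        "Two 8 Two Bar & Burger",
        "BurgerFi",
        "Juanchi's Burger",
        "BURGER URWAY PARK SLOPE",
    ],
    "review_count": [346, 657, 299, 449, 1],
    "rating": [4.0, 4.0, 4.0, 4.5, 5.0],
})

conn = sqlite3.connect(":memory:")
sample.to_sql("burgers", conn, index=False)

# SQL version of the query.
sql_result = pd.read_sql_query(
    "SELECT name, rating, review_count FROM burgers "
    "WHERE rating >= 4.5 ORDER BY review_count DESC",
    conn,
)
conn.close()

# Equivalent pandas version.
pandas_result = (
    sample[sample["rating"] >= 4.5]
    .sort_values("review_count", ascending=False)
    .reset_index(drop=True)
)

print(sql_result)
print(pandas_result["name"].tolist())
```

Note how the 5.0-rated place has a single review, which is a reminder to look at rating and review count together.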

#### Resources
- [Getting Data from Reddit API](https://www.storybench.org/how-to-scrape-reddit-with-python/)
- [Twitch API](https://dev.twitch.tv/docs)