# Lecture 8 Otaining Data from API's
__Math 3080: Fundamentals of Data Science__

Reading:
* [McKinney, *Python for Data Science*, Chapter 6](https://wesmckinney.com/book/accessing-data#io_web_apis)

Class notes are found through GitHub. As changes are made, they will automatically be uploaded to GitHub. A link to the repository is on Canvas.

-----
## Outline

Some live data can be obtained through web scraping. However, the best archives of data are stored in databases and can be retreived through an __Application Programming Interface (API)__. APIs are commonly used in application software, such as smartphone apps.

To get data from an API in Python, we need the `requests` package.

In [2]:
import pandas as pd
import requests 

In order to access an API, we generally need an *authorization key*. These are often available on webpages under the "Developers" link at the bottom of the page.
* Example: [Yelp business search](https://www.yelp.com)
  * Select "Developers" link on the bottom of the page
  * If needed, enroll in an account
  * Select "Manage API Access"
  * Create an app
  * After the app is created, you will see an API key at the top of the page

Every API needs documentation, which generally includes examples of how to use the system. For Yelp, find the documentation here:
* [Getting Started with the Yelp Fusion API](https://docs.developer.yelp.com/docs/fusion-intro)

In [3]:
api_url = "https://api.yelp.com/v3/businesses/search"

authorization = {
  'Authorization': 'Bearer hZU3WOBIK3jklJqIzew0uDFK_vjSYmoKToQQejrQuceKPGu8SF6M_-SuAT7asN6RNldA_kZvQGrE-3vh-RuQxHxRNUUKkHeRk03p_RLCQcO6ZZvHKMHoR5sEh7f3Y3Yx'
}

search_parameters = {
    'term': 'restaurants',
    'location': 'Ephraim, UT',
    'radius' : 15000,
    'limit' : 50
}

In [4]:
response = requests.get(api_url, headers=authorization, params=search_parameters)

# Generally a good idea to check for HTTP errors
response.raise_for_status()

# What did we get?
response.text

'{"businesses": [{"id": "vOsSWOhz25b7OcHtClmg7w", "alias": "roots-89-roadhouse-ephraim", "name": "Roots 89 Roadhouse", "image_url": "https://s3-media0.fl.yelpcdn.com/bphoto/2Qm6nUg4ThhhrjIjkr_zqw/o.jpg", "is_closed": false, "url": "https://www.yelp.com/biz/roots-89-roadhouse-ephraim?adjust_creative=rAfNf1xUCz-EbUgzFpdQJg&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=rAfNf1xUCz-EbUgzFpdQJg", "review_count": 22, "categories": [{"alias": "coffee", "title": "Coffee & Tea"}, {"alias": "burgers", "title": "Burgers"}, {"alias": "bbq", "title": "Barbeque"}], "rating": 4.3, "coordinates": {"latitude": 39.3662481507577, "longitude": -111.58679385326128}, "transactions": [], "location": {"address1": "350 N Main St", "address2": "", "address3": null, "city": "Ephraim", "zip_code": "84627", "country": "US", "state": "UT", "display_address": ["350 N Main St", "Ephraim, UT 84627"]}, "phone": "+14352838889", "display_phone": "(435) 283-8889", "distance": 1038.7391872988767}, {"

The results are in *json* format. So, we need to extract the data.

In [5]:
data = response.json()
data

{'businesses': [{'id': 'vOsSWOhz25b7OcHtClmg7w',
   'alias': 'roots-89-roadhouse-ephraim',
   'name': 'Roots 89 Roadhouse',
   'image_url': 'https://s3-media0.fl.yelpcdn.com/bphoto/2Qm6nUg4ThhhrjIjkr_zqw/o.jpg',
   'is_closed': False,
   'url': 'https://www.yelp.com/biz/roots-89-roadhouse-ephraim?adjust_creative=rAfNf1xUCz-EbUgzFpdQJg&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=rAfNf1xUCz-EbUgzFpdQJg',
   'review_count': 22,
   'categories': [{'alias': 'coffee', 'title': 'Coffee & Tea'},
    {'alias': 'burgers', 'title': 'Burgers'},
    {'alias': 'bbq', 'title': 'Barbeque'}],
   'rating': 4.3,
   'coordinates': {'latitude': 39.3662481507577,
    'longitude': -111.58679385326128},
   'transactions': [],
   'location': {'address1': '350 N Main St',
    'address2': '',
    'address3': None,
    'city': 'Ephraim',
    'zip_code': '84627',
    'country': 'US',
    'state': 'UT',
    'display_address': ['350 N Main St', 'Ephraim, UT 84627']},
   'phone': '+143528388

In [6]:
list(data)

['businesses', 'total', 'region']

In [7]:
restaurants = pd.DataFrame(data['businesses'])
restaurants.head(3)

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,location,phone,display_phone,distance,price
0,vOsSWOhz25b7OcHtClmg7w,roots-89-roadhouse-ephraim,Roots 89 Roadhouse,https://s3-media0.fl.yelpcdn.com/bphoto/2Qm6nU...,False,https://www.yelp.com/biz/roots-89-roadhouse-ep...,22,"[{'alias': 'coffee', 'title': 'Coffee & Tea'},...",4.3,"{'latitude': 39.3662481507577, 'longitude': -1...",[],"{'address1': '350 N Main St', 'address2': '', ...",14352838889,(435) 283-8889,1038.739187,
1,MdpvieSSi2Z4pREUlYnl7w,solid-rock-cafe-ephraim,Solid Rock Cafe,https://s3-media0.fl.yelpcdn.com/bphoto/t18Vpa...,False,https://www.yelp.com/biz/solid-rock-cafe-ephra...,31,"[{'alias': 'cafes', 'title': 'Cafes'}, {'alias...",4.7,"{'latitude': 39.3597646, 'longitude': -111.584...",[],"{'address1': '96 E Center St', 'address2': '',...",14352830178,(435) 283-0178,360.853324,$
2,BWh2P8LRPMP9ER2SFrnA2w,harvest-grill-ephraim,Harvest Grill,,False,https://www.yelp.com/biz/harvest-grill-ephraim...,3,"[{'alias': 'salad', 'title': 'Salad'}, {'alias...",4.0,"{'latitude': 39.36027, 'longitude': -111.587452}",[],"{'address1': '27 N Main St', 'address2': None,...",14352834755,(435) 283-4755,635.898924,


Great! We have been able to search for and create a simple DataFrame with all of these restaurants. But now, how do we search for specific categories of food? Let's look at the "categories" column.

In [8]:
restaurants['categories']

0     [{'alias': 'coffee', 'title': 'Coffee & Tea'},...
1     [{'alias': 'cafes', 'title': 'Cafes'}, {'alias...
2     [{'alias': 'salad', 'title': 'Salad'}, {'alias...
3            [{'alias': 'mexican', 'title': 'Mexican'}]
4     [{'alias': 'italian', 'title': 'Italian'}, {'a...
5            [{'alias': 'mexican', 'title': 'Mexican'}]
6     [{'alias': 'comfortfood', 'title': 'Comfort Fo...
7     [{'alias': 'icecream', 'title': 'Ice Cream & F...
8     [{'alias': 'tradamerican', 'title': 'American'...
9     [{'alias': 'bowling', 'title': 'Bowling'}, {'a...
10           [{'alias': 'chinese', 'title': 'Chinese'}]
11               [{'alias': 'pizza', 'title': 'Pizza'}]
12    [{'alias': 'hotdogs', 'title': 'Fast Food'}, {...
13    [{'alias': 'foodstands', 'title': 'Food Stands'}]
14    [{'alias': 'convenience', 'title': 'Convenienc...
15               [{'alias': 'pizza', 'title': 'Pizza'}]
16    [{'alias': 'sandwiches', 'title': 'Sandwiches'...
17    [{'alias': 'hotdogs', 'title': 'Fast Food'

This column is an array of *json* formatted information. So, we need to extract that information. We'll do this with `pd.json_normalize()`. The result will come out with a "flattened" DataFrame.
* By "flattened" we mean that every occurence of a category gets its own line, even if the place in question (the restaurants in this case) appears multiple times in our list

In [9]:
restaurants_flat = pd.json_normalize(data['businesses'],
                                     sep='_',
                                     record_path='categories',
                                     meta=['name','rating'],
                                     meta_prefix='biz_')
restaurants_flat

Unnamed: 0,alias,title,biz_name,biz_rating
0,coffee,Coffee & Tea,Roots 89 Roadhouse,4.3
1,burgers,Burgers,Roots 89 Roadhouse,4.3
2,bbq,Barbeque,Roots 89 Roadhouse,4.3
3,cafes,Cafes,Solid Rock Cafe,4.7
4,coffee,Coffee & Tea,Solid Rock Cafe,4.7
5,bagels,Bagels,Solid Rock Cafe,4.7
6,salad,Salad,Harvest Grill,4.0
7,burgers,Burgers,Harvest Grill,4.0
8,sandwiches,Sandwiches,Harvest Grill,4.0
9,mexican,Mexican,Jose's Cafe Mexican Food,4.1


Now, we can narrow our search by categories.

In [10]:
restaurants_flat[ restaurants_flat['alias'] == "pizza" ]

Unnamed: 0,alias,title,biz_name,biz_rating
11,pizza,Pizza,Roy's Pizza & Pasta,4.3
26,pizza,Pizza,Za's & Da's,5.0
34,pizza,Pizza,Main Street Pizza,4.1
40,pizza,Pizza,Little Caesars Pizza,3.0
42,pizza,Pizza,Papa Murphy's,3.0
48,pizza,Pizza,Domino's Pizza,0.0


In [13]:
restaurants_flat[ restaurants_flat['alias'] == "hotdogs" ].sort_values(by='biz_rating', ascending=False)

Unnamed: 0,alias,title,biz_name,biz_rating
33,hotdogs,Fast Food,Maverik Adventure's First Stop,4.1
36,hotdogs,Fast Food,Subway,2.7
43,hotdogs,Fast Food,Wendy's,2.5
27,hotdogs,Fast Food,McDonald's,2.0
37,hotdogs,Fast Food,Arby's,1.0
51,hotdogs,Fast Food,Charlee's Comfort Kitchen,0.0


In [12]:
restaurants_flat[ restaurants_flat['title'] == "Mexican" ]

Unnamed: 0,alias,title,biz_name,biz_rating
9,mexican,Mexican,Jose's Cafe Mexican Food,4.1
13,mexican,Mexican,Los Amigos Mexican Restaurant,3.3
47,mexican,Mexican,Pueblo Chico Market,0.0


-----

## Practice

Now try it yourselves with stock data:
* url = "