# Lecture 8: Obtaining Data from APIs
__Math 3080: Fundamentals of Data Science__

Reading:
* [McKinney, *Python for Data Analysis*, Chapter 6](https://wesmckinney.com/book/accessing-data#io_web_apis)

Class notes are available on GitHub. Changes are uploaded there automatically as they are made. A link to the repository is on Canvas.

-----
## Outline

Some live data can be obtained through web scraping. However, the best archives of data are stored in databases and can be retrieved through an __Application Programming Interface (API)__. APIs are commonly used in application software, such as smartphone apps.

To get data from an API in Python, we need the `requests` package.

In [None]:
import pandas as pd
import requests 

In order to access an API, we generally need an *authorization key*. These are often available on webpages under the "Developers" link at the bottom of the page.
* Example: [Yelp business search](https://www.yelp.com)
  * Select "Developers" link on the bottom of the page
  * If needed, create an account
  * Select "Manage API Access"
  * Create an app
  * After the app is created, you will see an API key at the top of the page
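
Because these notes are uploaded to GitHub, it is usually safer not to paste a key directly into the notebook. A minimal sketch, assuming the key has been saved in an environment variable (the variable name `YELP_API_KEY` and the helper `build_auth_header` are hypothetical, chosen only for illustration):

```python
import os

def build_auth_header(env_var="YELP_API_KEY"):
    """Build a Bearer-token header from a key stored in an environment variable."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Environment variable {env_var} is not set.")
    return {"Authorization": f"Bearer {api_key}"}

# Placeholder so the sketch runs; a real key would be set outside the notebook.
os.environ.setdefault("YELP_API_KEY", "your-api-key-here")

demo_header = build_auth_header()
print(list(demo_header))  # the header dictionary has a single key
```

The resulting dictionary has the same shape as the `headers` argument that `requests.get()` accepts.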

Every API needs documentation, which generally includes examples of how to use the system. For Yelp, find the documentation here:
* [Getting Started with the Yelp Fusion API](https://docs.developer.yelp.com/docs/fusion-intro)

In [None]:
api_url = "https://api.yelp.com/v3/businesses/search"

authorization = {
  'Authorization': 'Bearer hZU3WOBIK3jklJqIzew0uDFK_vjSYmoKToQQejrQuceKPGu8SF6M_-SuAT7asN6RNldA_kZvQGrE-3vh-RuQxHxRNUUKkHeRk03p_RLCQcO6ZZvHKMHoR5sEh7f3Y3Yx'
}

search_parameters = {
    'term': 'restaurants',
    'location': 'Ephraim, UT',
    'radius' : 15000,
    'limit' : 50
}
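
A dictionary like `search_parameters` is handed to `requests` through the `params` argument, where it is encoded into the query string of the URL. The sketch below shows one way to inspect that encoding without contacting the server, by building a prepared request and not sending it (the token is a placeholder, not a real key):

```python
from urllib.parse import urlsplit, parse_qs

import requests

api_url = "https://api.yelp.com/v3/businesses/search"
search_parameters = {
    "term": "restaurants",
    "location": "Ephraim, UT",
    "radius": 15000,
    "limit": 50,
}
demo_headers = {"Authorization": "Bearer your-api-key-here"}  # placeholder token

# Build the request object, but do not send it
prepared = requests.Request(
    "GET", api_url, headers=demo_headers, params=search_parameters
).prepare()

print(prepared.url)  # full URL, including the encoded query string

# Decode the query string back into a dictionary to check what would be sent
query = parse_qs(urlsplit(prepared.url).query)
print(query)
```

Note that every value comes back as a string: numbers such as `radius` and `limit` are converted to text when the URL is built.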

In [None]:
response = requests.get(api_url, headers=authorization, params=search_parameters)

# Generally a good idea to check for HTTP errors
response.raise_for_status()

# What did we get?
response.text
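
`raise_for_status()` raises `requests.exceptions.HTTPError` when the status code is in the 4xx or 5xx range. Below is a minimal sketch of catching that error. To keep the example offline, it builds `Response` objects by hand; these are purely illustrative and are not real replies from the Yelp API. The helper `check_response` is a hypothetical name.

```python
import requests

def check_response(response):
    """Return a short status message instead of letting the error propagate."""
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError as err:
        return f"Request failed: {err}"
    return "Request succeeded"

# Hand-built responses, used here only to avoid a real network call
unauthorized = requests.Response()
unauthorized.status_code = 401  # e.g. a missing or invalid API key

ok = requests.Response()
ok.status_code = 200

print(check_response(unauthorized))
print(check_response(ok))
```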

The results are in JSON format, so we parse them into Python objects (dictionaries and lists) with `response.json()`.

In [None]:
data = response.json()
data

In [None]:
list(data)
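
`response.json()` does for the response body what `json.loads()` does for any JSON string: JSON objects become dictionaries and JSON arrays become lists. A small sketch with a made-up payload (the field names mimic a search result, but the values are invented):

```python
import json

# Invented JSON text, shaped loosely like a business-search response
raw_text = """
{
  "businesses": [
    {"name": "Example Pizza", "rating": 4.5,
     "categories": [{"alias": "pizza", "title": "Pizza"}]}
  ],
  "total": 1
}
"""

sample_data = json.loads(raw_text)

print(type(sample_data))                     # the top level is a dictionary
print(list(sample_data))                     # its keys
print(sample_data["businesses"][0]["name"])  # drill down through list and dict
```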

In [None]:
restaurants = pd.DataFrame(data['businesses'])
restaurants.head(3)

Great! We have been able to search for and create a simple DataFrame with all of these restaurants. But now, how do we search for specific categories of food? Let's look at the "categories" column.

In [None]:
restaurants['categories']

Each entry in this column is a list of JSON objects (Python dictionaries), one per category. So, we need to extract that information. We'll do this with `pd.json_normalize()`, which returns a "flattened" DataFrame.
* By "flattened" we mean that every occurrence of a category gets its own row, even if the business in question (a restaurant in this case) then appears in multiple rows
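
To see the effect of flattening on a small scale, here is a sketch using two invented businesses, one of which has two categories. The records are made up for illustration; only the field names follow the search results.

```python
import pandas as pd

# Invented records: two businesses, one of which has two categories
sample_businesses = [
    {"name": "Example Pizza", "rating": 4.5,
     "categories": [{"alias": "pizza", "title": "Pizza"},
                    {"alias": "italian", "title": "Italian"}]},
    {"name": "Example Tacos", "rating": 4.0,
     "categories": [{"alias": "mexican", "title": "Mexican"}]},
]

sample_flat = pd.json_normalize(
    sample_businesses,
    record_path="categories",   # one output row per category entry
    meta=["name", "rating"],    # business-level fields repeated on each row
    meta_prefix="biz_",
)

print(sample_flat)
print(len(sample_businesses), "businesses ->", len(sample_flat), "rows")
```

The business with two categories appears on two rows, once for each category.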

In [None]:
restaurants_flat = pd.json_normalize(data['businesses'],
                                     sep='_',
                                     record_path='categories',
                                     meta=['name','rating'],
                                     meta_prefix='biz_')
restaurants_flat

Now, we can narrow our search by categories.

In [None]:
restaurants_flat[ restaurants_flat['alias'] == "pizza" ]

In [None]:
restaurants_flat[ restaurants_flat['alias'] == "hotdogs" ]

In [None]:
restaurants_flat[ restaurants_flat['title'] == "Mexican" ]
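
The same filtering pattern extends to several categories at once. Below is a sketch using `isin()` on a small invented table with the same column names as `restaurants_flat` (the rows are made up for illustration):

```python
import pandas as pd

# Invented rows in the same layout as the flattened table
sample_flat = pd.DataFrame({
    "alias": ["pizza", "italian", "mexican", "burgers"],
    "title": ["Pizza", "Italian", "Mexican", "Burgers"],
    "biz_name": ["Example Pizza", "Example Pizza",
                 "Example Tacos", "Example Drive-In"],
    "biz_rating": [4.5, 4.5, 4.0, 3.5],
})

wanted = ["pizza", "italian", "mexican"]
matches = sample_flat[sample_flat["alias"].isin(wanted)]

# A business with several matching categories appears once per category,
# so drop the duplicates to get one row per business
unique_matches = matches[["biz_name", "biz_rating"]].drop_duplicates()
print(unique_matches)
```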