<a href='https://ai.meng.duke.edu'><img align="left" style="padding-top:10px;" src="https://storage.googleapis.com/aipi_datasets/Duke-AIPI-Logo.png"></a>

# Working with APIs: Yelp
This example demonstrates how to work with a web API, using the Yelp Fusion API.  The documentation for the Yelp Fusion API can be accessed [here](https://www.yelp.com/developers/documentation/v3/business_search).  To run this code you will need to register for a Fusion API key.

In [1]:
import requests
import json
import os

import pandas as pd
import numpy as np

Let's create a class `yelpSearch` for our Yelp search.  In addition to the `__init__` method we will create three other methods for our class:  
- `fetchData()`: gets the Yelp search results for the specified location and type of business  
- `processData()`: cleans up the search results into a dataframe  
- `getReviews()`: gets the Yelp reviews for a specific business
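
To make the clean-up step concrete, here is a minimal sketch of how one nested search result could be flattened into a single row. It uses plain dictionaries rather than a dataframe. The field names (`categories`, `location`, `address1`, `distance`, `price`) are the ones used in the class below; the record values and the helper name `flatten_business` are made up for illustration, and `distance` is assumed to be in meters.

```python
METERS_PER_MILE = 1609.34

def flatten_business(record):
    """Reduce one nested business record to a flat row (illustrative helper)."""
    categories = record.get('categories') or []
    location = record.get('location') or {}
    return {
        'id': record.get('id'),
        'name': record.get('name'),
        # Keep only the title of the first category, if there is one
        'category': categories[0]['title'] if categories else None,
        'address': location.get('address1'),
        # Assumed to arrive in meters; convert to miles
        'distance': record['distance'] / METERS_PER_MILE,
        # Some records may not carry a price, so fall back to None
        'price': record.get('price'),
    }

# Hypothetical record: the values are invented, only the field names matter
sample = {
    'id': 'example-id',
    'name': 'Example Restaurant',
    'categories': [{'alias': 'indpak', 'title': 'Indian'}],
    'location': {'address1': '123 Example St'},
    'distance': 3218.68,
}

row = flatten_business(sample)
print(row)
```

Using `.get()` for optional fields keeps the sketch from failing on records that lack a `price` or have an empty category list.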

In [1]:
class yelpSearch:
    def __init__(self,key,term,location,url="https://api.yelp.com/v3/businesses/"):
        self.key = key
        self.term = term
        self.location = location
        self.url = url

    def fetchData(self,limit=10):
        '''
        Gets data from Yelp for businesses that match a given search term and location

        Inputs:
            limit(int): maximum number of businesses to return
        Returns:
            df(DataFrame): a dataframe containing information on businesses that match the search query
        '''

        # Specify headers, url and params
        search_url = self.url + 'search'
        headers = {'Authorization': f'Bearer {self.key}'}
        # requests URL-encodes the params itself, so pass the raw strings
        payload = {
            'term': self.term,
            'location': self.location,
            'limit': limit
        }
        
        rows = [] # Hold data for each business
        ids = set() # Hold ids of businesses already added
        
        try:
            # Get response
            response = requests.get(search_url, headers=headers, params=payload, timeout=10)
            # Raise an error for 4xx/5xx responses (e.g. an invalid API key)
            response.raise_for_status()
            # Decode
            results = response.json()
            # Add results to the list, skipping duplicate businesses
            for d in results.get('businesses', []):
                if d['id'] not in ids:
                    rows.append(d)
                    ids.add(d['id'])
        except Exception as e:
            print('Error occurred')
            print(e)

        df = pd.DataFrame(rows)
        self.data = df
        return self.data

    def processData(self):
        '''
        Cleans up the data extracted from Yelp (stored in self.data by fetchData)

        Inputs:
            None
        Returns:
            df_clean(DataFrame): processed dataframe containing cleaned results
        '''
        
        # Nothing to clean if the search failed or returned no businesses
        if self.data.empty:
            return self.data
        
        # Clean up columns
        # Keep the title of the first category, if there is one
        self.data['category'] = self.data['categories'].apply(lambda x: x[0]['title'] if x else None)
        self.data['address'] = self.data['location'].apply(lambda x: x['address1'])
        # Convert distance from meters to miles
        self.data['distance'] = self.data['distance']/1609.34
        
        # Filter to only needed columns; reindex fills any missing column (e.g. 'price') with NaN
        self.data = self.data.reindex(columns=['id','name','review_count','category','rating','address','distance','price'])
        
        return self.data

    def getReviews(self,id):
        '''
        Fetches reviews for a given business from Yelp API

        Inputs:
            id(str): Yelp id of the business to fetch reviews for
        Returns:
            df(DataFrame): dataframe containing the rating, text and date of each review
        '''

        # Specify headers and url
        headers = {'Authorization': f'Bearer {self.key}'}
        reviews_url = self.url + id + "/reviews"
        
        reviews = [] # Defined before the request so it exists even if the request fails
        
        try:
            # Get response to search
            response = requests.get(reviews_url, headers=headers, timeout=10)
            # Raise an error for 4xx/5xx responses (e.g. an unknown business id)
            response.raise_for_status()
            # Decode
            results = response.json()
            reviews = results.get('reviews', [])
            
        except Exception as e:
            print('Error occurred')
            print(e)

        # Fix the columns up front so an empty result still yields a valid dataframe
        reviewsdf = pd.DataFrame(reviews, columns=['rating','text','time_created'])
        # Keep only the date part of the timestamp
        reviewsdf['date'] = reviewsdf['time_created'].apply(lambda x: x.split(' ')[0])
        
        reviewsdf = reviewsdf.loc[:,['rating','text','date']]
        
        return reviewsdf
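
A note on the `params` dictionary passed to `requests.get`: `requests` encodes it into the query string for you. The sketch below uses the standard library's `urlencode` as a stand-in for that step, which is an assumption made so the example runs without a network call or an API key. The exact encoding `requests` applies may differ in small details.

```python
from urllib.parse import urlencode

# Same search values as used later in this notebook
params = {'term': 'indian food', 'location': 'Morrisville NC', 'limit': 10}

# Spaces are encoded automatically when the query string is built
encoded = urlencode(params)
print(encoded)  # term=indian+food&location=Morrisville+NC&limit=10

# A value that already contains a literal '+' is escaped as %2B,
# so it no longer stands for a space
pre_replaced = urlencode({'term': 'indian+food'})
print(pre_replaced)  # term=indian%2Bfood
```

Search terms and locations can therefore be passed with plain spaces.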

Now that everything is set up, let's try it out. We will instantiate our class with a search for Indian food in Morrisville, NC.  We'll then fetch the search results from Yelp and organize them into a dataframe.

In [1]:
term = 'indian food'
location = 'Morrisville NC'

# Load API key (read this in from a config.py file or type it in)
if os.path.exists('config.py'):
    import config
    key = config.api_key
else:
    key = input('Please enter your API key:')

# Instantiate class
search = yelpSearch(key,term,location)
# Get the search results
data = search.fetchData(limit=10)
# Clean up
df = search.processData()
df
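
The cell above reads the key from `config.py` or prompts for it. Another common option is an environment variable, which keeps the key out of the notebook and out of version control. A minimal sketch follows; the variable name `YELP_API_KEY` and the helper `load_api_key` are hypothetical, not something the Yelp API defines.

```python
import os

def load_api_key(env=os.environ, var_name='YELP_API_KEY'):
    """Return the key stored under var_name, or None if it is unset or blank.

    Both the helper and the variable name are illustrative assumptions.
    """
    key = env.get(var_name, '').strip()
    return key or None

key_from_env = load_api_key()
print('Key found' if key_from_env else 'No key set in the environment')
```

If `load_api_key()` returns `None`, the notebook can fall back to `config.py` or the `input()` prompt as before.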

In [5]:
# Get reviews for a specific restaurant
business_id = 'yrfk9eKjtvlkzKyO1HPtCQ'
reviews = search.getReviews(business_id)
reviews

|  | rating | text | date |
|---|---|---|---|
| 0 | 5 | This place is great and a must visit for all B... | 2022-08-19 |
| 1 | 5 | This is my 1st five rating for a good reason: ... | 2022-07-16 |
| 2 | 3 | Luke warm madras coffee; 20 min to get seated;... | 2022-05-08 |
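
`getReviews()` keeps the date as a string by splitting `time_created` on the space. If real date objects are more useful (for sorting or filtering, say), the timestamp can be parsed instead. This sketch assumes the timestamp looks like `YYYY-MM-DD HH:MM:SS`, which matches the split on a space and the dates shown above. The time of day in the example is invented and `review_date` is a hypothetical helper.

```python
from datetime import datetime

def review_date(time_created):
    """Parse a 'YYYY-MM-DD HH:MM:SS' timestamp and return its date (assumed format)."""
    return datetime.strptime(time_created, '%Y-%m-%d %H:%M:%S').date()

# Hypothetical timestamp: the date matches the first review above, the time is made up
d = review_date('2022-08-19 12:30:00')
print(d.isoformat())  # 2022-08-19
```

Unlike the string split, `strptime` raises a `ValueError` on a timestamp in an unexpected format, so malformed data surfaces early.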
