# Yelp API - Gathering Data
In this notebook, we use the Yelp Fusion API to retrieve information about restaurants in Paris.


## Steps to collect data:
<ul>
<li>Get a list of all Yelp categories</li>
<li>For each category, page through the results to get up to 1000 restaurants corresponding to that category</li>
<li>Do that for all categories to retrieve all restaurant data</li>
</ul>



### Testing Yelp API Endpoint
We start by checking that we can access the API.

In [1]:
import requests

YELP_API_ENDPOINT = 'https://api.yelp.com/v3/businesses/search'
# CLIENT_ID = '<--- Client ID HERE --->'
API_KEY = '<--- API KEY HERE --->'  # replace with your own Yelp Fusion API key
HEADERS = {'Authorization': 'Bearer %s' % API_KEY}

PARAMETERS = {'term': 'restaurants',
              'offset': 0,
              'limit': 50,
              'radius': 40000,
              'location': 'Paris, France'}


api_response = requests.get(url=YELP_API_ENDPOINT, params=PARAMETERS, headers=HEADERS)
if api_response.status_code == 200:
    print('Yelp API access ok')
else:
    print('Yelp API access failed with status code', api_response.status_code)

Yelp API access ok


### Get all restaurant categories

In [2]:
import json

with open('yelp_categories.json') as f:
    data = json.load(f)

# keep only the categories whose parent is 'restaurants'
restaurants_types = [place for place in data if 'restaurants' in place['parents']]

# load their titles and aliases
restaurant_titles = [restaurant['title'] for restaurant in restaurants_types]
restaurant_aliases = [restaurant['alias'] for restaurant in restaurants_types]
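
To illustrate the filter above, here is a small sketch run on a hand-made sample. The three entries are invented for illustration and only mimic the `alias`, `title` and `parents` fields that the code above relies on; the real `yelp_categories.json` file may contain more fields.

```python
# Hypothetical sample mimicking the fields used above; not taken from the real file
sample_categories = [
    {"alias": "afghani", "title": "Afghan", "parents": ["restaurants"]},
    {"alias": "bistros", "title": "Bistros", "parents": ["restaurants"]},
    {"alias": "bakeries", "title": "Bakeries", "parents": ["food"]},
]

# Same filter as above: keep entries whose parents include 'restaurants'
sample_types = [place for place in sample_categories if "restaurants" in place["parents"]]
sample_aliases = [restaurant["alias"] for restaurant in sample_types]

print(sample_aliases)  # ['afghani', 'bistros']
```

The aliases, not the titles, are the values passed to the API as the `categories` parameter later on.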

### Sending requests to the Yelp API
The Yelp API returns at most 50 places per call. For each restaurant category, we therefore keep `limit` at 50 and increase `offset` in steps of 50 from one call to the next, which lets us collect up to 1000 places per category.
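
The paging logic can be sketched without touching the network. In the snippet below, `collect_pages` and `fake_fetch_page` are hypothetical names introduced for illustration; the fake function stands in for the API and pretends that a category holds 120 businesses.

```python
def collect_pages(fetch_page, page_size=50, max_results=1000):
    """Collect results page by page until a page is empty or the cap is reached."""
    collected = []
    for offset in range(0, max_results, page_size):
        page = fetch_page(offset, page_size)
        if not page:
            break  # no more results for this category
        collected.extend(page)
    return collected


def fake_fetch_page(offset, limit):
    """Stand-in for the API call: a category with 120 businesses (assumption)."""
    fake_ids = list(range(120))
    return fake_ids[offset:offset + limit]


offsets = list(range(0, 1000, 50))
print(len(offsets), offsets[-1])            # 20 950
print(len(collect_pages(fake_fetch_page)))  # 120
```

With a page size of 50 and a cap of 1000, a category costs at most 20 calls, and the loop stops early as soon as a page comes back empty.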

In [3]:
import time
from tqdm.notebook import tqdm


PARAMETERS = {'term': 'restaurants',
              'offset': 0,                 # start at 0
              'limit': 50,                 # maximum is 50 places per call
              'radius': 40000,             # search radius in meters (40,000 m is the maximum)
              'location': 'Paris, France'}

restaurants_in_paris = []

# Loop over restaurants categories
for category in tqdm(restaurant_aliases):
    PARAMETERS['categories'] = category
    # Cycle through restaurants
    for offset_number in range(0, 1000, 50):
        PARAMETERS['offset'] = offset_number
        response = requests.get(url=YELP_API_ENDPOINT, params=PARAMETERS, headers=HEADERS)

        # Parse the body once; stop paging when no businesses are returned
        businesses = response.json().get('businesses', [])
        if not businesses:
            break
        restaurants_in_paris.extend(businesses)
        print("Collecting {} restaurants data.".format(category))

        # Don't make the Yelp API angry
        time.sleep(0.1)



Collecting afghani restaurants data.
Collecting african restaurants data.
Collecting african restaurants data.
Collecting african restaurants data.
Collecting arabian restaurants data.
Collecting argentine restaurants data.
Collecting armenian restaurants data.
Collecting asianfusion restaurants data.
Collecting asianfusion restaurants data.
Collecting australian restaurants data.
Collecting austrian restaurants data.
Collecting bangladeshi restaurants data.
Collecting basque restaurants data.
Collecting bbq restaurants data.
Collecting bbq restaurants data.
Collecting belgian restaurants data.
Collecting bistros restaurants data.
Collecting bistros restaurants data.
Collecting bistros restaurants data.
Collecting bistros restaurants data.
Collecting bistros restaurants data.
Collecting bistros restaurants data.
Collecting bistros restaurants data.
Collecting bistros restaurants data.
Collecting bistros restaurants data.
Collecting bistros restaurants data.
Collecting bistros restauran

Collecting japanese restaurants data.
Collecting japanese restaurants data.
Collecting japanese restaurants data.
Collecting japanese restaurants data.
Collecting japanese restaurants data.
Collecting japanese restaurants data.
Collecting japanese restaurants data.
Collecting japanese restaurants data.
Collecting japanese restaurants data.
Collecting japanese restaurants data.
Collecting kebab restaurants data.
Collecting kebab restaurants data.
Collecting kebab restaurants data.
Collecting korean restaurants data.
Collecting korean restaurants data.
Collecting korean restaurants data.
Collecting laotian restaurants data.
Collecting latin restaurants data.
Collecting latin restaurants data.
Collecting lyonnais restaurants data.
Collecting malaysian restaurants data.
Collecting meatballs restaurants data.
Collecting mediterranean restaurants data.
Collecting mediterranean restaurants data.
Collecting mediterranean restaurants data.
Collecting mediterranean restaurants data.
Collecting m

### Save results to JSON files

In [4]:
# Save list with duplicates
with open("paris_restaurants_duplicates.json", "w") as restaurants_file:
    json.dump(restaurants_in_paris, restaurants_file, indent=6)

# Remove the duplicate entries (keeps the last occurrence of each identical record)
res_list = [i for n, i in enumerate(restaurants_in_paris) if i not in restaurants_in_paris[n + 1:]]

# Save list without duplicates
with open("paris_restaurants.json", "w") as restaurants_file:
    json.dump(res_list, restaurants_file, indent=6)
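
The duplicate removal above compares full records pairwise, which gets slow as the list grows. One possible alternative, assuming every business record carries a unique `id` field, is to key a dictionary by that field. The helper name `dedupe_by_id` and the sample records are invented for illustration. Unlike the comparison above, this sketch keeps the first occurrence and treats two records as duplicates whenever their `id` matches, even if other fields differ.

```python
def dedupe_by_id(businesses):
    """Return one record per 'id', keeping the first one seen."""
    unique = {}
    for business in businesses:
        # setdefault only stores the record if this id has not been seen yet
        unique.setdefault(business["id"], business)
    return list(unique.values())


# Hypothetical sample records, not real API output
sample = [
    {"id": "a1", "name": "Chez Exemple"},
    {"id": "b2", "name": "Bistro Test"},
    {"id": "a1", "name": "Chez Exemple"},
]

print([b["id"] for b in dedupe_by_id(sample)])  # ['a1', 'b2']
```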