In [1]:
import pandas as pd
import json
import requests # This library will be used to call the APIs
import os # Use this library to access environment variable(s)
import time # Used to pause execution when an API rate limit is hit

# Foursquare

### Send a request to Foursquare with a small radius (1000m) for all the bike stations in your city of choice

In [2]:
# Access and assign environment variable containing foursquare key
foursquare_key = os.getenv("foursquare_api_key")
# Read the CSV file previously saved in "city_bikes.ipynb" into a DataFrame
all_stations_df = pd.read_csv("../data/montreal_bike_stations.csv")

To perform an appropriate comparison between [Foursquare places](https://location.foursquare.com/places/docs/categories) and [Yelp places](https://docs.developer.yelp.com/docs/resources-categories), we select categories that are similar on both platforms.

Several categories have a close counterpart on both platforms, for example:
* restaurants (listed as "13065" on Foursquare and as "restaurants" on Yelp)
* arts and entertainment (listed as "10000" on Foursquare and as "arts" on Yelp)
* landmarks and outdoors (listed as "16000" on Foursquare); landmarks and historical buildings (listed as "landmarks" on Yelp)
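One way to keep the two requests in sync is to store the shared categories in a single mapping and build each API's `categories` parameter from it. This is a minimal sketch: the Foursquare IDs and Yelp aliases are the ones used in this notebook, while the dictionary name `SHARED_CATEGORIES` is hypothetical.

```python
# Hypothetical mapping: shared category -> (Foursquare category ID, Yelp category alias)
SHARED_CATEGORIES = {
    "restaurants": ("13065", "restaurants"),
    "arts_and_entertainment": ("10000", "arts"),
    "landmarks": ("16000", "landmarks"),
}

# Foursquare expects comma-separated category IDs, Yelp comma-separated aliases
fsq_categories_param = ",".join(fsq_id for fsq_id, _ in SHARED_CATEGORIES.values())
yelp_categories_param = ",".join(alias for _, alias in SHARED_CATEGORIES.values())

print(fsq_categories_param)   # 13065,10000,16000
print(yelp_categories_param)  # restaurants,arts,landmarks
```

The resulting strings match the `categories` values hard-coded in the two request functions below.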


In [3]:
def fsq_place_data(latitude, longitude, radius):
    
    fsq_endpt_url = "https://api.foursquare.com/v3/places/search" 
    parameters = {
        'll': f"{latitude},{longitude}",
        'radius': radius,
        'categories': '13065,10000,16000',
        # Request only the fields we need, so each response contains just the necessary data
        'fields': 'fsq_id,name,geocodes,location,categories,distance,rating,popularity,price'
    }   
    headers = {
        'Accept': 'application/json', # Set output response type to JSON
        'Authorization': foursquare_key
    }
    
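    # A timeout lets requests.exceptions.Timeout be raised instead of waiting indefinitely
    fsq_response = requests.get(fsq_endpt_url, params = parameters, headers = headers, timeout = 10)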
    
    return fsq_response    
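The loop further down pauses once when it receives a 429 response and then moves on, so that station's results are lost. A sketch of a small wrapper that retries with exponential backoff instead; `get_with_retry`, `max_retries` and `base_delay` are hypothetical names, and the retry policy is an assumption rather than something either API prescribes. The `send` argument is the function that performs the request (for example `requests.get`), which also lets the usage check below exercise it with a stub instead of real API calls.

```python
import time

def get_with_retry(send, url, params=None, headers=None, max_retries=3,
                   base_delay=1.0, timeout=10, sleep=time.sleep):
    """Call send(url, ...) and retry with exponential backoff while it returns HTTP 429.

    `send` is expected to behave like requests.get: accept params/headers/timeout
    keywords and return an object with .status_code and .raise_for_status().
    """
    for attempt in range(max_retries + 1):
        response = send(url, params=params, headers=headers, timeout=timeout)
        if response.status_code != 429 or attempt == max_retries:
            # With requests, this raises HTTPError for any remaining 4xx/5xx status
            response.raise_for_status()
            return response
        # Back off for base_delay * 1, 2, 4, ... seconds before the next attempt
        sleep(base_delay * 2 ** attempt)
```

In this notebook it could replace the direct call, e.g. `get_with_retry(requests.get, fsq_endpt_url, params = parameters, headers = headers)`. Each retry still counts against the daily quota, so `max_retries` should stay small.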

In [4]:
# Sample request from first row of 'all_stations_df' table
# fsq_res = fsq_place_data(latitude = 45.617500, longitude = -73.60601, radius = 1000)

### Parse through the response to get points of interest (such as restaurants and bars) and details you want (ratings, name, and location)

Note 1: The APIs we will be using (Foursquare and Yelp) limit the number of calls that can be made. In this case, Yelp is the more restrictive, with a limit of 500 API calls per day.

Note 2: Making the API calls is time-consuming; ideally, the calls would be spread over a period of time.

**Based on considerations like available API calls and maintaining consistency (with similar points of interest selected for analysis), the decision made was to select 350 API calls for both Yelp and Foursquare.**

In [5]:
fsq_data = []

# Iterate over the rows (.iterrows()) of all_stations_df (which contains information about city bikes in Montreal)
# iterrows() yields (index, row) tuples; the unused index is stored in "_"
# Slicing with iloc[:350] limits the loop, and therefore the number of API calls, to the first 350 stations
for _, df_row in all_stations_df.iloc[:350].iterrows():
    
    # From the 'all_stations_df' table we can retrieve the necessary values through keys
    # Check using print() to verify latitude and longitudes from 'all_stations_df' table are correct 
    # print(f"{df_row['latitude']}, {df_row['longitude']}")
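    # Send the request inside the try block so that request errors are caught by the except clauses below
    try:
        fsq_res = fsq_place_data(latitude = df_row['latitude'], longitude = df_row['longitude'], radius = 1000)
        # Raise requests.exceptions.HTTPError for 4xx/5xx responses (e.g. 429 Too Many Requests)
        fsq_res.raise_for_status()
        fsq_json = fsq_res.json()
        
        # Access the 'results' key from fsq_json; fall back to an empty list if 'results' is missing
        for results in fsq_json.get('results', []):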
            
            # A place can have one or more categories; collect the unique category names in a set
            categories = set()
            for category in results.get('categories', []):
                categories.add(category['name'])
            
            fsq_place_details = {
                'fsq_name': results['name'],
                # Store the list of combined categories for each place
                'fsq_categories': list(categories),
                'fsq_latitude': results['geocodes']['main']['latitude'],
                'fsq_longitude': results['geocodes']['main']['longitude'],
                'city_bike_latitude': df_row['latitude'],
                'city_bike_longitude': df_row['longitude'],
                'fsq_distance (m)': results['distance'],
                # 'rating', 'popularity' and 'price' can be missing; get() then returns None, which pandas stores as NaN
                'fsq_rating': results.get('rating', None),
                'fsq_popularity': results.get('popularity', None),
                'fsq_price': results.get('price', None)
            }
            fsq_data.append(fsq_place_details) 
    
    # HTTP errors such as status code 429 (too many requests): pause execution before moving on to the next station
    except requests.exceptions.HTTPError as http_err:
        if fsq_res.status_code == 429:
            time.sleep(30)
        else: 
            print("HTTP error occurred. Error:", str(http_err))
       
    except requests.exceptions.Timeout as timeout_err:
        print("Request timed out. Error:", str(timeout_err))
        
    except Exception as exc:
        print("Request failed. Error:", str(exc))
                     
fsq_place_df = pd.DataFrame(fsq_data)   
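Pulling the parsing out of the loop makes it testable without spending API calls. A sketch that builds the same rows as the loop above from one Foursquare response dictionary; `parse_fsq_results` is a hypothetical helper, and the sample dictionary is hand-made to mirror the fields requested through `fields` (values copied from the output below), not a real API response.

```python
def parse_fsq_results(fsq_json, station_lat, station_lon):
    """Turn one Foursquare /places/search response into a list of row dictionaries."""
    rows = []
    for place in fsq_json.get('results', []):
        rows.append({
            'fsq_name': place['name'],
            # sorted() gives a deterministic order for the de-duplicated category names
            'fsq_categories': sorted({c['name'] for c in place.get('categories', [])}),
            'fsq_latitude': place['geocodes']['main']['latitude'],
            'fsq_longitude': place['geocodes']['main']['longitude'],
            'city_bike_latitude': station_lat,
            'city_bike_longitude': station_lon,
            'fsq_distance (m)': place['distance'],
            # Optional fields: None becomes NaN once the rows go into a DataFrame
            'fsq_rating': place.get('rating'),
            'fsq_popularity': place.get('popularity'),
            'fsq_price': place.get('price'),
        })
    return rows

# Hand-made sample shaped like one element of the 'results' list
sample = {'results': [{
    'name': 'Salle Désilets',
    'categories': [{'name': 'Office Building'}, {'name': 'Music Venue'}],
    'geocodes': {'main': {'latitude': 45.617818, 'longitude': -73.606}},
    'distance': 19,
    'popularity': 0.961532,
}]}
rows = parse_fsq_results(sample, 45.6175, -73.606011)
print(rows[0]['fsq_categories'], rows[0]['fsq_rating'])  # ['Music Venue', 'Office Building'] None
```

Unlike the loop, which uses `list(categories)`, the sketch sorts the category names so that repeated runs produce identical rows.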

In [7]:
fsq_place_df

| | fsq_name | fsq_categories | fsq_latitude | fsq_longitude | city_bike_latitude | city_bike_longitude | fsq_distance (m) | fsq_rating | fsq_popularity | fsq_price |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Restaurant Prima Luna | [Italian Restaurant] | 45.617439 | -73.593995 | 45.617500 | -73.606011 | 941 | 7.4 | 0.985510 | 1.0 |
| 1 | Salle Désilets | [Office Building, Music Venue] | 45.617818 | -73.606000 | 45.617500 | -73.606011 | 19 | | 0.961532 | |
| 2 | Fun O Max | [Playground, Recreation Center] | 45.618397 | -73.605533 | 45.617500 | -73.606011 | 106 | | 0.514712 | |
| 3 | Ecafé | [Restaurant] | 45.611729 | -73.606237 | 45.617500 | -73.606011 | 641 | | | |
| 4 | Gagnon Multi Services Inc | [Agriculture and Forestry Service, Farm] | 45.618448 | -73.597475 | 45.617500 | -73.606011 | 659 | | 0.722572 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3483 | Umi Sushi | [Sushi Restaurant] | 45.467507 | -73.541760 | 45.472599 | -73.539806 | 567 | 6.5 | 0.884432 | 2.0 |
| 3484 | Nagomi | [Japanese Restaurant] | 45.472835 | -73.539459 | 45.472599 | -73.539806 | 45 | | 0.829483 | 2.0 |
| 3485 | Parc de l'esplanade de la Pointe-Nord | [Park] | 45.473489 | -73.537979 | 45.472599 | -73.539806 | 186 | | 0.921741 | |
| 3486 | Subway | [Fast Food Restaurant, Deli] | 45.469158 | -73.541570 | 45.472599 | -73.539806 | 421 | 5.6 | 0.899977 | 1.0 |


Response fields based on [Foursquare](https://location.foursquare.com/developer/reference/response-fields) definitions (used in the dictionary fsq_place_details):
* name: best known name for the FSQ place
* latitude: distance north or south of the equator
* longitude: distance east or west of the prime meridian
* city_bike_latitude: information previously obtained from "city_bikes.ipynb"
* city_bike_longitude: information previously obtained from "city_bikes.ipynb"
* distance: the calculated distance (in meters) from the provided location
* rating: numerical value from 0.0 to 10.0 of the place based on user votes, likes/dislikes, tips sentiment, and visit data
    * not all places will have a rating (in this case, represented as null) 
* popularity: measure of the place's popularity based on foot traffic; it ranges from 0 to 1 and uses a 6-month span of point of interest visits for the given geographic area
    * not all places will have a popularity value (in this case, represented as null)
* price: numerical value from 1 to 4 describing the pricing tier of the place, based on known prices for menu items and other offerings
    * values include:
        * 1 = cheap
        * 2 = moderate
        * 3 = expensive
        * 4 = very expensive
    * not all places will have a price value (in this case, represented as null) 
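Since the price tier is stored as a number, the labels listed above can be attached with `Series.map`. A minimal sketch on a few hand-made values; `FSQ_PRICE_LABELS` is a hypothetical name, and its keys are floats because `fsq_price` is a float64 column (missing prices are NaN).

```python
import pandas as pd

# Labels as listed above for Foursquare's 1-4 price tiers (float keys to match a float64 column)
FSQ_PRICE_LABELS = {1.0: 'cheap', 2.0: 'moderate', 3.0: 'expensive', 4.0: 'very expensive'}

fsq_price = pd.Series([1.0, None, 2.0, 4.0])
labels = fsq_price.map(FSQ_PRICE_LABELS)  # values without a matching key (NaN) stay NaN
print(labels.tolist())
```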

### Put parsed results into a dataframe

In [8]:
# Save DataFrame to a CSV file without row index value
fsq_place_df.to_csv("../data/foursquare_places.csv", index = False)
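One thing to watch when this CSV is read back in a later notebook: `to_csv` writes each `fsq_categories` list as its string representation, so `pd.read_csv` returns strings rather than lists. A sketch of restoring them with `ast.literal_eval`, shown on an in-memory CSV instead of the real file.

```python
import ast
import io
import pandas as pd

df = pd.DataFrame({'fsq_name': ['Salle Désilets'],
                   'fsq_categories': [['Office Building', 'Music Venue']]})
csv_text = df.to_csv(index=False)

# Reading the CSV back gives the list as a plain string
restored = pd.read_csv(io.StringIO(csv_text))
print(type(restored.loc[0, 'fsq_categories']))  # <class 'str'>

# literal_eval safely parses the string back into a Python list
restored['fsq_categories'] = restored['fsq_categories'].apply(ast.literal_eval)
print(restored.loc[0, 'fsq_categories'])  # ['Office Building', 'Music Venue']
```

The same applies to `yelp_categories` in the Yelp CSV saved further down.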

# Yelp

### Send a request to Yelp with a small radius (1000m) for all the bike stations in your city of choice

In [9]:
yelp_key = os.getenv("yelp_api_key")
all_stations_df = pd.read_csv("../data/montreal_bike_stations.csv")

In [10]:
def yelp_place_data(latitude, longitude, radius):
    
    yelp_endpt_url = "https://api.yelp.com/v3/businesses/search" 
    parameters = {
        'latitude': latitude,
        'longitude': longitude,
        'radius': radius,
        'categories': 'restaurants,arts,landmarks'
    }   
    headers = {
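        'Authorization': 'Bearer ' + yelp_key
    }
    
    # A timeout lets requests.exceptions.Timeout be raised instead of waiting indefinitely
    yelp_response = requests.get(yelp_endpt_url, params = parameters, headers = headers, timeout = 10)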
    
    return yelp_response    
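Neither request function sets a page size, so each API falls back to its own default. To our understanding of the API references, Foursquare's place search returns 10 results per request by default and Yelp's business search returns 20, and both accept a `limit` parameter up to 50; these values should be confirmed against the current documentation. A sketch of building both parameter dictionaries with the same explicit `limit`, so that each call asks for a comparable number of results; `build_search_params` and `RESULTS_PER_CALL` are hypothetical names, and other keys (such as Foursquare's `fields`) would stay as in the functions above.

```python
RESULTS_PER_CALL = 20  # hypothetical shared page size; both APIs document a maximum of 50

def build_search_params(latitude, longitude, radius, limit=RESULTS_PER_CALL):
    """Return (foursquare_params, yelp_params) that request the same number of results."""
    fsq_params = {
        'll': f"{latitude},{longitude}",
        'radius': radius,
        'categories': '13065,10000,16000',
        'limit': limit,
    }
    yelp_params = {
        'latitude': latitude,
        'longitude': longitude,
        'radius': radius,
        'categories': 'restaurants,arts,landmarks',
        'limit': limit,
    }
    return fsq_params, yelp_params

fsq_params, yelp_params = build_search_params(45.6175, -73.606011, 1000)
print(fsq_params['ll'], fsq_params['limit'], yelp_params['limit'])  # 45.6175,-73.606011 20 20
```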

In [11]:
# Sample request from first row of 'all_stations_df' table
# yelp_res = yelp_place_data(latitude = 45.617500, longitude = -73.60601, radius = 1000)

### Parse through the response to get points of interest (such as restaurants and bars) and details you want (ratings, name, and location)

In [12]:
yelp_data = []

for _, df_row in all_stations_df.iloc[:350].iterrows():
    
    # print(f"{df_row['latitude']}, {df_row['longitude']}")
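    try:
        # Send the request inside the try block so that request errors are caught by the except clauses below
        yelp_res = yelp_place_data(latitude = df_row['latitude'], longitude = df_row['longitude'], radius = 1000)
        # Raise requests.exceptions.HTTPError for 4xx/5xx responses (e.g. 429 Too Many Requests)
        yelp_res.raise_for_status()
        yelp_json = yelp_res.json()
        
        for businesses in yelp_json.get('businesses', []):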
            
            categories = set()
            for category in businesses.get('categories', []):
                categories.add(category['title'])
            
            yelp_place_details = {
                'yelp_name': businesses['name'],
                'yelp_categories': list(categories),
                'yelp_latitude': businesses['coordinates']['latitude'],
                'yelp_longitude': businesses['coordinates']['longitude'],
                'city_bike_latitude': df_row['latitude'],
                'city_bike_longitude': df_row['longitude'],
                'yelp_distance (m)': businesses['distance'],
                'yelp_rating': businesses.get('rating', None),
                # Yelp does not contain a "popularity" field, instead we will substitute "review_count"
                'yelp_review_count': businesses.get('review_count', None),
                'yelp_price': businesses.get('price', None)
            }
            yelp_data.append(yelp_place_details) 
            
    except requests.exceptions.HTTPError as http_err:
        if yelp_res.status_code == 429:
            # Yelp also enforces a queries-per-second (QPS) rate limit, so this pause is mainly useful for Yelp's stricter policy
            time.sleep(30)
        else: 
            print("HTTP error occurred. Error:", str(http_err))
       
    except requests.exceptions.Timeout as timeout_err:
        print("Request timed out. Error:", str(timeout_err))
        
    except Exception as exc:
        print("Request failed. Error:", str(exc))
                     
yelp_place_df = pd.DataFrame(yelp_data)   

In [13]:
yelp_place_df

| | yelp_name | yelp_categories | yelp_latitude | yelp_longitude | city_bike_latitude | city_bike_longitude | yelp_distance (m) | yelp_rating | yelp_review_count | yelp_price |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Capucine | [Italian] | 45.619670 | -73.609704 | 45.617500 | -73.606011 | 375.111000 | 5.0 | 4 | |
| 1 | Restaurant Prima Luna | [Italian, Sushi Bars] | 45.617234 | -73.594176 | 45.617500 | -73.606011 | 920.960695 | 4.0 | 14 | `$$$` |
| 2 | Boulangerie Adriatica | [Italian, Pizza] | 45.615210 | -73.609710 | 45.617500 | -73.606011 | 365.542788 | 4.0 | 1 | |
| 3 | Grillades Sizzle | [Portuguese] | 45.623980 | -73.601030 | 45.617500 | -73.606011 | 808.741245 | 3.5 | 3 | `$$` |
| 4 | Dagostino Pizza | [Pizza] | 45.624790 | -73.599070 | 45.617500 | -73.606011 | 975.918229 | 4.5 | 3 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6492 | Umi Sushi | [Japanese, Sushi Bars] | 45.467604 | -73.541600 | 45.472599 | -73.539806 | 572.772091 | 3.0 | 16 | `$$$` |
| 6493 | Subway | [Sandwiches, Fast Food] | 45.469044 | -73.541703 | 45.472599 | -73.539806 | 422.093975 | 1.5 | 5 | |
| 6494 | Amir | [Middle Eastern] | 45.467550 | -73.541380 | 45.472599 | -73.539806 | 585.485232 | 2.5 | 6 | |
| 6495 | Le Vin-Le Vain | [Restaurants] | 45.468450 | -73.543060 | 45.472599 | -73.539806 | 515.538839 | 4.0 | 1 | |


Response fields based on Yelp definitions (used in the dictionary yelp_place_details):
* name: best known name for the Yelp place
* latitude: distance north or south of the equator
* longitude: distance east or west of the prime meridian
* city_bike_latitude: information previously obtained from "city_bikes.ipynb"
* city_bike_longitude: information previously obtained from "city_bikes.ipynb"
* distance: the calculated distance (in meters) from the provided location
* rating: numerical value from 1.0 to 5.0 (in steps of 0.5) of the place based on user reviews, likes, and other user interactions
    * not all places will have a rating (in this case, represented as null); every place in this sample had one
* review_count: the total number of reviews the place has received
    * not all places will have a review count (in this case, represented as null)
* price: string of "\$" signs describing the pricing tier of the place, based on known prices for menu items and other offerings
    * strings include:
        * \$ = cheap
        * \$\$ = moderate
        * \$\$\$ = expensive
        * \$\$\$\$ = very expensive
    * not all places will have a price value (in this case, represented as "None")
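Because each "\$" corresponds to one tier, Yelp's price strings can be put on the same 1 to 4 scale as Foursquare's `fsq_price` by counting characters. A minimal sketch using `Series.str.len()`, which returns NaN for missing prices; the sample values are hand-made, and the approach assumes the field only ever contains repeated currency symbols.

```python
import pandas as pd

yelp_price = pd.Series(['$$$', None, '$$', '$'])
# Count the symbols: '$' -> 1.0, '$$' -> 2.0, ...; missing prices stay NaN
yelp_price_tier = yelp_price.str.len()
print(yelp_price_tier.tolist())  # [3.0, nan, 2.0, 1.0]
```

The result is a float64 column, the same dtype as `fsq_price`.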

### Put parsed results into a dataframe

In [14]:
# Save DataFrame to a CSV file without row index value
yelp_place_df.to_csv("../data/yelp_places.csv", index = False)

# Comparing Results

### Which API provided you with more complete data?

To perform an appropriate analysis, the decision was made to use similar points of interest as well as the same number of API calls.

Each API call was time-consuming, and the number of calls was limited by Yelp's quota of 500 calls per day. To stay below this limit while leaving room for additional calls, the number of calls was capped at 350 per API, which also shortened the run time. Yelp also limits the number of queries per second (QPS); this was addressed with a try/except block that pauses execution when a 429 (too many requests) response is received. Although Foursquare would have allowed more calls, making more of them would have made the number of results returned by the two APIs incomparable.

To select the points of interest, the documentation for both Foursquare and Yelp was examined to find places with a common structure and naming convention. The selection included restaurants, arts and entertainment, and landmarks and outdoors (for Yelp, landmarks and historical buildings).

In [15]:
fsq_place_df.info()
print(f"\n{fsq_place_df.isna().sum()}")

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3488 entries, 0 to 3487
Data columns (total 10 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   fsq_name             3488 non-null   object 
 1   fsq_categories       3488 non-null   object 
 2   fsq_latitude         3488 non-null   float64
 3   fsq_longitude        3488 non-null   float64
 4   city_bike_latitude   3488 non-null   float64
 5   city_bike_longitude  3488 non-null   float64
 6   fsq_distance (m)     3488 non-null   int64  
 7   fsq_rating           2888 non-null   float64
 8   fsq_popularity       3391 non-null   float64
 9   fsq_price            2368 non-null   float64
dtypes: float64(7), int64(1), object(2)
memory usage: 272.6+ KB

fsq_name                  0
fsq_categories            0
fsq_latitude              0
fsq_longitude             0
city_bike_latitude        0
city_bike_longitude       0
fsq_distance (m)          0
fsq_rating              600
fsq_rating              600
fsq_popularity           97
fsq_price              1120
dtype: int64

In [16]:
yelp_place_df.info()
print(f"\n{yelp_place_df.isna().sum()}")

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6497 entries, 0 to 6496
Data columns (total 10 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   yelp_name            6497 non-null   object 
 1   yelp_categories      6497 non-null   object 
 2   yelp_latitude        6497 non-null   float64
 3   yelp_longitude       6497 non-null   float64
 4   city_bike_latitude   6497 non-null   float64
 5   city_bike_longitude  6497 non-null   float64
 6   yelp_distance (m)    6497 non-null   float64
 7   yelp_rating          6497 non-null   float64
 8   yelp_review_count    6497 non-null   int64  
 9   yelp_price           4767 non-null   object 
dtypes: float64(6), int64(1), object(3)
memory usage: 507.7+ KB

yelp_name                 0
yelp_categories           0
yelp_latitude             0
yelp_longitude            0
city_bike_latitude        0
city_bike_longitude       0
yelp_distance (m)         0
yelp_rating               0
yelp_review_count         0
yelp_price             1730
dtype: int64

The category documentation shows that [Yelp](https://docs.developer.yelp.com/docs/resources-categories) has more categories and sub-categories than [Foursquare](https://location.foursquare.com/places/docs/categories), although Foursquare's documentation is much easier to navigate.

Of the columns extracted from each JSON response for comparison, Foursquare and Yelp provide equivalent fields (with similar definitions), except for "fsq_popularity" and "yelp_review_count", which measure different things.

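For 350 API calls each, Foursquare returned 3488 entries and Yelp returned 6497. Part of this gap is likely due to each endpoint's default page size rather than to coverage: to our understanding, Foursquare's place search returns up to 10 results per request by default and Yelp's business search up to 20, which is consistent with the roughly 10 and 19 entries per call observed here. Foursquare's 3488 entries contain 1817 missing values in total (600 in "fsq_rating", 97 in "fsq_popularity" and 1120 in "fsq_price"), i.e. about 0.52 missing values per row (1817 / 3488 ≈ 52.09%). Note that this counts missing cells rather than rows, since one row can be missing several fields. Yelp's 6497 entries contain 1730 missing values, all in "yelp_price", so 26.63% of its rows (1730 / 6497) lack a price. In other words, Foursquare has missing values in "fsq_rating", "fsq_popularity" and "fsq_price", while Yelp has missing values only in "yelp_price".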
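The difference between missing cells and rows with at least one missing value can be checked directly. A sketch on a small hand-made DataFrame with the Foursquare column names: `isna().sum()` counts missing cells per column, while `isna().any(axis=1)` flags rows with at least one gap.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'fsq_rating': [7.4, np.nan, np.nan, 8.6],
    'fsq_popularity': [0.99, 0.96, np.nan, 0.93],
    'fsq_price': [1.0, np.nan, np.nan, 4.0],
})

missing_cells = df.isna().sum().sum()        # total missing values across all columns
rows_with_gap = df.isna().any(axis=1).sum()  # rows missing at least one value
print(missing_cells, rows_with_gap)          # 5 2
print(df.isna().mean().to_dict())            # share of missing values per column
```

Applied to `fsq_place_df`, the first number would reproduce the 1817 total; the second would give the number of rows that actually contain a gap, which cannot exceed 1817.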

`DataFrame.info()` prints a concise summary that includes the dtype of each column. The dtypes that stand out are those of "fsq_price" (float64) and "yelp_price" (object): Foursquare encodes the price tier as a number from 1.0 to 4.0 (stored as float64 because missing prices are NaN), whereas Yelp encodes it as a string from "\$" to "\$\$\$\$". The distance columns also differ: "fsq_distance (m)" is int64 (whole meters) while "yelp_distance (m)" is float64.

In [17]:
fsq_place_df.describe()

| | fsq_latitude | fsq_longitude | city_bike_latitude | city_bike_longitude | fsq_distance (m) | fsq_rating | fsq_popularity | fsq_price |
|---|---|---|---|---|---|---|---|---|
| count | 3488.0 | 3488.0 | 3488.0 | 3488.0 | 3488.0 | 2888.0 | 3391.0 | 2368.0 |
| mean | 45.525158 | -73.599164 | 45.52544 | -73.599052 | 1575.069 | 8.198373 | 0.916681 | 1.681166 |
| std | 0.041046 | 0.050112 | 0.041092 | 0.050452 | 63049.02 | 0.900564 | 0.141904 | 0.75808 |
| min | 45.415135 | -73.760746 | 45.417746 | -73.758227 | 4.0 | 4.9 | 0.007661 | 1.0 |
| 25% | 45.510113 | -73.616373 | 45.509759 | -73.617012 | 304.0 | 7.7 | 0.92044 | 1.0 |
| 50% | 45.525313 | -73.583534 | 45.527009 | -73.584249 | 494.5 | 8.5 | 0.950814 | 2.0 |
| 75% | 45.544943 | -73.568884 | 45.545026 | -73.568568 | 708.0 | 8.9 | 0.975963 | 2.0 |
| max | 45.657116 | -73.484806 | 45.651406 | -73.490113 | 3724106.0 | 9.5 | 1.0 | 4.0 |


In [18]:
yelp_place_df.describe()

| | yelp_latitude | yelp_longitude | city_bike_latitude | city_bike_longitude | yelp_distance (m) | yelp_rating | yelp_review_count |
|---|---|---|---|---|---|---|---|
| count | 6497.0 | 6497.0 | 6497.0 | 6497.0 | 6497.0 | 6497.0 | 6497.0 |
| mean | 45.52451 | -73.598492 | 45.525306 | -73.597999 | 729.545514 | 4.059566 | 125.557334 |
| std | 0.036553 | 0.04933 | 0.036661 | 0.048001 | 1355.11957 | 0.666475 | 292.829064 |
| min | 45.374494 | -73.992268 | 45.417746 | -73.758227 | 5.753933 | 1.0 | 1.0 |
| 25% | 45.51123 | -73.61304 | 45.512994 | -73.613752 | 352.185225 | 4.0 | 9.0 |
| 50% | 45.52452 | -73.582602 | 45.527009 | -73.583801 | 631.897264 | 4.0 | 34.0 |
| 75% | 45.542177 | -73.56943 | 45.543651 | -73.569297 | 920.796038 | 4.5 | 102.0 |
| max | 45.657352 | -73.482534 | 45.651406 | -73.490113 | 39485.007311 | 5.0 | 3130.0 |
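Both `describe()` tables also show distances far beyond the 1000 m search radius: the maximum "fsq_distance (m)" is 3,724,106 m and the maximum "yelp_distance (m)" is 39,485 m, and the Foursquare outlier pulls its mean (1575 m) far above its median (494.5 m). Yelp's API reference describes `radius` as a suggested search radius rather than a hard limit; whether the same holds for Foursquare is not verified here. A sketch of keeping only results inside the requested radius before any comparison; `RADIUS_M` and `within_radius` are hypothetical names and the rows are hand-made.

```python
import pandas as pd

RADIUS_M = 1000  # radius passed to both search requests

def within_radius(df, distance_col, radius=RADIUS_M):
    """Keep only rows whose reported distance does not exceed the requested radius."""
    return df[df[distance_col] <= radius]

places = pd.DataFrame({'fsq_name': ['A', 'B', 'C'],
                       'fsq_distance (m)': [19, 941, 3724106]})
print(within_radius(places, 'fsq_distance (m)')['fsq_name'].tolist())  # ['A', 'B']
```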


The only columns that could reasonably be compared are 'rating', 'popularity' (Foursquare) or 'review_count' (Yelp), and 'price'. However, as mentioned above, the ratings are not directly comparable because they use different scales (0.0 to 10.0 for Foursquare and 1.0 to 5.0 for Yelp), 'popularity' and 'review_count' measure different things, and 'price' is encoded as a number by Foursquare but as a string by Yelp.
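If a rough side-by-side view of the ratings is still wanted, one option is to rescale both to the same range by dividing by each scale's maximum. This is a sketch under the assumption that a linear rescaling is acceptable: it aligns the ranges only, not the methodologies behind the two ratings, so the results should be read with that caveat. The values are hand-made.

```python
import pandas as pd

fsq_rating = pd.Series([7.4, 9.4, None])   # Foursquare: 0.0 to 10.0
yelp_rating = pd.Series([4.0, 5.0, 1.5])   # Yelp: 1.0 to 5.0 in half-star steps

# Divide by each scale's maximum so both fall between 0 and 1; NaN stays NaN
fsq_scaled = fsq_rating / 10
yelp_scaled = yelp_rating / 5
print(fsq_scaled.round(2).tolist(), yelp_scaled.tolist())
```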

### Get the top 10 restaurants according to their rating

#### Foursquare

In [19]:
fsq_place_df.head(10)

| | fsq_name | fsq_categories | fsq_latitude | fsq_longitude | city_bike_latitude | city_bike_longitude | fsq_distance (m) | fsq_rating | fsq_popularity | fsq_price |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Restaurant Prima Luna | [Italian Restaurant] | 45.617439 | -73.593995 | 45.6175 | -73.606011 | 941 | 7.4 | 0.98551 | 1.0 |
| 1 | Salle Désilets | [Office Building, Music Venue] | 45.617818 | -73.606 | 45.6175 | -73.606011 | 19 | | 0.961532 | |
| 2 | Fun O Max | [Playground, Recreation Center] | 45.618397 | -73.605533 | 45.6175 | -73.606011 | 106 | | 0.514712 | |
| 3 | Ecafé | [Restaurant] | 45.611729 | -73.606237 | 45.6175 | -73.606011 | 641 | | | |
| 4 | Gagnon Multi Services Inc | [Agriculture and Forestry Service, Farm] | 45.618448 | -73.597475 | 45.6175 | -73.606011 | 659 | | 0.722572 | |
| 5 | Ge Sports | [Sporting Goods Retail, Hiking Trail] | 45.61855 | -73.614667 | 45.6175 | -73.606011 | 682 | | | |
| 6 | Entreprise Viceversa | [Garden] | 45.620482 | -73.598124 | 45.6175 | -73.606011 | 713 | | 0.268197 | |
| 7 | Restaurant Shekz | [Restaurant] | 45.623543 | -73.60138 | 45.6175 | -73.606011 | 773 | | 0.548987 | |
| 8 | Grillades Sizzle | [Bistro, Portuguese Restaurant] | 45.624004 | -73.601104 | 45.6175 | -73.606011 | 811 | | 0.995115 | |
| 9 | Pizzeria Etc | [Pizzeria] | 45.617734 | -73.593471 | 45.6175 | -73.606011 | 964 | | 0.695877 | 1.0 |


In [23]:
# To get the top 10 restaurants by rating for Foursquare, we first keep only the restaurant rows
# Series.apply() calls the function on each element of the column; here each element is a list of category names
# The lambda returns True when the list contains an element exactly equal to "Restaurant", and the boolean result selects those rows
fsq_restaurant_df = fsq_place_df[fsq_place_df['fsq_categories'].apply(lambda x: 'Restaurant' in x)]
fsq_restaurant_df

| | fsq_name | fsq_categories | fsq_latitude | fsq_longitude | city_bike_latitude | city_bike_longitude | fsq_distance (m) | fsq_rating | fsq_popularity | fsq_price |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Ecafé | [Restaurant] | 45.611729 | -73.606237 | 45.617500 | -73.606011 | 641 | | | |
| 7 | Restaurant Shekz | [Restaurant] | 45.623543 | -73.601380 | 45.617500 | -73.606011 | 773 | | 0.548987 | |
| 13 | La Fabrique Bistrot | [Restaurant] | 45.518010 | -73.569549 | 45.516926 | -73.564257 | 426 | 8.6 | 0.927961 | 4.0 |
| 16 | Bouillon Bilk | [Restaurant] | 45.511038 | -73.565936 | 45.516926 | -73.564257 | 675 | 9.3 | 0.954504 | 3.0 |
| 17 | Cadet | [Wine Bar, Restaurant] | 45.510388 | -73.564365 | 45.516926 | -73.564257 | 741 | 9.4 | 0.965538 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3473 | Atwater Cocktail Club | [Speakeasy, Restaurant] | 45.481306 | -73.578300 | 45.478889 | -73.581989 | 404 | 8.6 | 0.980743 | 3.0 |
| 3475 | Havre-aux-Glaces | [Ice Cream Parlor, Restaurant] | 45.478917 | -73.575658 | 45.478889 | -73.581989 | 492 | 8.5 | 0.944582 | 1.0 |
| 3477 | Théâtre Corona | [Theater, Restaurant, Music Venue] | 45.482908 | -73.574915 | 45.478889 | -73.581989 | 695 | 8.9 | 0.913272 | |
| 3481 | Tim Hortons | [Restaurant, Cafe, Coffee, and Tea House] | 45.473150 | -73.540727 | 45.472599 | -73.539806 | 87 | 6.3 | 0.918801 | 1.0 |
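Because each cell of `fsq_categories` holds a list, `'Restaurant' in x` checks for an element equal to exactly `'Restaurant'`, so places tagged only with a more specific category such as `'Italian Restaurant'` (e.g. Restaurant Prima Luna in the first output) are left out. A sketch of a substring match over the list elements instead; `has_category_containing` is a hypothetical helper and the rows are copied from the outputs above. Substring matching can also over-match, so the matched category names are worth a quick review.

```python
import pandas as pd

def has_category_containing(categories, keyword):
    """True if any category name in the list contains `keyword` as a substring."""
    return any(keyword in name for name in categories)

places = pd.DataFrame({
    'fsq_name': ['Restaurant Prima Luna', 'Salle Désilets', 'Ecafé'],
    'fsq_categories': [['Italian Restaurant'], ['Office Building', 'Music Venue'], ['Restaurant']],
})

exact = places[places['fsq_categories'].apply(lambda x: 'Restaurant' in x)]
partial = places[places['fsq_categories'].apply(lambda x: has_category_containing(x, 'Restaurant'))]
print(exact['fsq_name'].tolist())    # ['Ecafé']
print(partial['fsq_name'].tolist())  # ['Restaurant Prima Luna', 'Ecafé']
```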


In [62]:
# Filter using boolean indexing where "fsq_rating" is not null the corresponding row will return True and be selected
fsq_filtered_df = fsq_restaurant_df[fsq_restaurant_df['fsq_rating'].notna()]

# Use 'drop_duplicates()' to drop duplicates (by default, it drops duplicates except the first occurrence)
fsq_unique_restaurants_df = fsq_filtered_df.drop_duplicates('fsq_name')

# Sort the DataFrame by 'fsq_rating' in descending order for top 10 restaurants by their rating
fsq_top_restaurants = fsq_unique_restaurants_df.sort_values(by = 'fsq_rating', ascending = False)

# Limit the results to only top 10 restaurants by rating
fsq_top_restaurants.head(10)

| | fsq_name | fsq_categories | fsq_latitude | fsq_longitude | city_bike_latitude | city_bike_longitude | fsq_distance (m) | fsq_rating | fsq_popularity | fsq_price |
|---|---|---|---|---|---|---|---|---|---|---|
| 17 | Cadet | [Wine Bar, Restaurant] | 45.510388 | -73.564365 | 45.516926 | -73.564257 | 741 | 9.4 | 0.965538 | |
| 521 | Crew Collective & Café | [Coffee Shop, Café, Restaurant] | 45.502313 | -73.559169 | 45.497165 | -73.55933 | 581 | 9.3 | 0.984186 | 1.0 |
| 16 | Bouillon Bilk | [Restaurant] | 45.511038 | -73.565936 | 45.516926 | -73.564257 | 675 | 9.3 | 0.954504 | 3.0 |
| 1375 | Larry's | [Café, Restaurant] | 45.524086 | -73.594743 | 45.5281 | -73.588439 | 664 | 9.2 | 0.923158 | |
| 859 | Marconi | [Cocktail Bar, Restaurant] | 45.532967 | -73.615967 | 45.53519 | -73.615482 | 262 | 9.2 | 0.949631 | |
| 174 | Le Moineau Masqué | [Coffee Shop, Café, Restaurant] | 45.525508 | -73.577976 | 45.52689 | -73.57264 | 455 | 9.2 | 0.896802 | 1.0 |
| 1376 | Au Kouign-Amann | [Coffee Shop, Restaurant, Bakery] | 45.523125 | -73.583423 | 45.5281 | -73.588439 | 662 | 9.2 | 0.95072 | 2.0 |
| 1612 | Kem CoBa | [Ice Cream Parlor, Restaurant, Pastry Shop] | 45.523116 | -73.594954 | 45.527041 | -73.593471 | 456 | 9.2 | 0.916739 | 1.0 |
| 2293 | Paquebot | [Coffee Shop, Café, Restaurant] | 45.54873 | -73.601168 | 45.546661 | -73.588684 | 1006 | 9.1 | 0.948647 | 1.0 |
| 566 | Jacquie et France | [Restaurant] | 45.458575 | -73.576264 | 45.456085 | -73.581937 | 530 | 9.1 | 0.951704 | 1.0 |
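The cell above drops duplicate names before sorting, so for a name that appears several times (for example a chain with branches near different stations) the row kept is the first one encountered, not necessarily the highest-rated. A sketch that sorts first, so that `drop_duplicates` keeps each name's best row, and then takes the top results with `head`; the sample rows are hand-made (the second Tim Hortons rating is invented for illustration).

```python
import pandas as pd

restaurants = pd.DataFrame({
    'fsq_name': ['Tim Hortons', 'Cadet', 'Tim Hortons', "Larry's"],
    'fsq_rating': [6.3, 9.4, 7.1, 9.2],
})

# Sort first, so drop_duplicates (which keeps the first occurrence) keeps each name's highest rating
top = (restaurants.sort_values('fsq_rating', ascending=False)
                  .drop_duplicates('fsq_name')
                  .head(10))
print(top['fsq_name'].tolist(), top['fsq_rating'].tolist())
```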


#### Yelp

In [61]:
yelp_place_df.head(10)

| | yelp_name | yelp_categories | yelp_latitude | yelp_longitude | city_bike_latitude | city_bike_longitude | yelp_distance (m) | yelp_rating | yelp_review_count | yelp_price |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Capucine | [Italian] | 45.61967 | -73.609704 | 45.6175 | -73.606011 | 375.111 | 5.0 | 4 | |
| 1 | Restaurant Prima Luna | [Italian, Sushi Bars] | 45.617234 | -73.594176 | 45.6175 | -73.606011 | 920.960695 | 4.0 | 14 | `$$$` |
| 2 | Boulangerie Adriatica | [Italian, Pizza] | 45.61521 | -73.60971 | 45.6175 | -73.606011 | 365.542788 | 4.0 | 1 | |
| 3 | Grillades Sizzle | [Portuguese] | 45.62398 | -73.60103 | 45.6175 | -73.606011 | 808.741245 | 3.5 | 3 | `$$` |
| 4 | Dagostino Pizza | [Pizza] | 45.62479 | -73.59907 | 45.6175 | -73.606011 | 975.918229 | 4.5 | 3 | |
| 5 | Pizzeria Etc (La) | [Pizza] | 45.61723 | -73.593807 | 45.6175 | -73.606011 | 956.344667 | 4.5 | 2 | `$$` |
| 6 | Tim Hortons | [Coffee & Tea, Breakfast & Brunch] | 45.6263 | -73.59795 | 45.6175 | -73.606011 | 1165.624436 | 1.0 | 3 | |
| 7 | Shekz Restaurant | [Italian, Sushi Bars, Thai] | 45.62356 | -73.6015 | 45.6175 | -73.606011 | 748.784734 | 1.5 | 2 | |
| 8 | L'Amère à Boire | [Tapas Bars, Brewpubs] | 45.51642 | -73.566042 | 45.516926 | -73.564257 | 150.012599 | 4.0 | 68 | `$$` |
| 9 | Le Saint-Bock | [Brasseries] | 45.51582 | -73.564641 | 45.516926 | -73.564257 | 126.552913 | 4.0 | 209 | `$$` |


In [63]:
yelp_restaurant_df = yelp_place_df[yelp_place_df['yelp_categories'].apply(lambda x: 'Restaurant' in x)]
yelp_restaurant_df

| | yelp_name | yelp_categories | yelp_latitude | yelp_longitude | city_bike_latitude | city_bike_longitude | yelp_distance (m) | yelp_rating | yelp_review_count | yelp_price |
|---|---|---|---|---|---|---|---|---|---|---|


The search returned 0 entries containing "Restaurant" for the column "yelp_categories".
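Part of the explanation is the membership test itself: `'Restaurant' in x` looks for a list element equal to exactly `'Restaurant'`, while the generic Yelp title in the earlier output is the plural `'Restaurants'` (e.g. Le Vin-Le Vain). Most Yelp businesses, however, carry only specific titles such as `'Italian'` or `'Pizza'`, so changing the keyword alone would not recover them. A sketch on rows copied from the earlier output:

```python
import pandas as pd

yelp = pd.DataFrame({
    'yelp_name': ['Capucine', 'Le Vin-Le Vain', 'Subway'],
    'yelp_categories': [['Italian'], ['Restaurants'], ['Sandwiches', 'Fast Food']],
})

singular = yelp[yelp['yelp_categories'].apply(lambda x: 'Restaurant' in x)]
plural = yelp[yelp['yelp_categories'].apply(lambda x: 'Restaurants' in x)]
print(len(singular), plural['yelp_name'].tolist())  # 0 ['Le Vin-Le Vain']
```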

The question to ask ourselves is whether this result is correct. We can investigate further by examining a sample JSON response.

In [67]:
# Before diving in, we can view the length and the available keys of yelp_json (the response from the last request in the loop)
print(len(yelp_json))
print(yelp_json.keys())

3
dict_keys(['businesses', 'total', 'region'])


In [75]:
# Earlier we determined "businesses" contains the information we are looking for
print(len(yelp_json['businesses']))

# The last API call returned 15 businesses; we select and view one of them (index 1, i.e. the second entry)
yelp_json['businesses'][1]

15


{'id': 'YghwGgNPT-76or74_R0TnQ',
 'alias': 'nagomi-verdun',
 'name': 'Nagomi',
 'image_url': 'https://s3-media1.fl.yelpcdn.com/bphoto/um4MWTPNQikyQy50HExY3w/o.jpg',
 'is_closed': False,
 'url': 'https://www.yelp.com/biz/nagomi-verdun?adjust_creative=STOpwzJEmsUiIfp_uywKXA&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=STOpwzJEmsUiIfp_uywKXA',
 'review_count': 3,
 'categories': [{'alias': 'japanese', 'title': 'Japanese'}],
 'rating': 2.5,
 'coordinates': {'latitude': 45.472915, 'longitude': -73.539458},
 'transactions': [],
 'location': {'address1': '103 Rue Jacques-le Ber',
  'address2': '',
  'address3': None,
  'city': 'Verdun',
  'zip_code': 'H3E 1Y1',
  'country': 'CA',
  'state': 'QC',
  'display_address': ['103 Rue Jacques-le Ber',
   'Verdun, QC H3E 1Y1',
   'Canada']},
 'phone': '+15147611888',
 'display_phone': '+1 514-761-1888',
 'distance': 44.378052218935245}

The entry at index 1 (the second business) was selected for this investigation. The Yelp JSON has a structure similar to that of the Foursquare JSON. However, each entry under "categories" in the Yelp JSON contains an 'alias' and a 'title' (here, only one category), whereas each Foursquare category contains an 'id' and a 'name'.

On its own, the "categories" key in this JSON does not provide much information. If we search the [Yelp](https://docs.developer.yelp.com/docs/resources-categories) documentation for "Japanese", we can see that it has sub-categories such as "Blowfish", "Conveyor Belt Sushi", etc.

![image.png](attachment:7b0f183c-5a57-4776-a409-f1d0fd1a72dc.png)

However, if we scroll up the page, the indentation of the text shows that "Japanese" is itself a sub-category of "Restaurants" (not visible in the screenshot above).

![image.png](attachment:b520b1db-dce4-4a6c-b686-c12fad7e0eec.png)

Although Yelp provides more categories and sub-categories than Foursquare, its API response does not indicate a business's parent category (e.g. that "Japanese" falls under "Restaurants"), which could be considered a flaw. To categorize the titles in the "categories" key correctly, we could map every sub-category within "Restaurants" to its parent, but doing this by hand would be very time-consuming since the list runs from A to Z. As such, the analysis was left off here.
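Rather than mapping the sub-categories by hand, one possible approach is to use each category's parent information. To our understanding, Yelp's Fusion API has an "All Categories" endpoint (`GET https://api.yelp.com/v3/categories`) whose entries include an `alias`, a `title` and a list of `parent_aliases`. This endpoint and its field names are assumptions to verify against the Yelp reference before use, and calling it would use one request from the daily quota. The sketch below covers only the lookup: given such a list, it walks up the parents of a business's category alias (for example `'japanese'` from the Nagomi JSON above) to decide whether it falls under `'restaurants'`. The sample hierarchy is hand-made, `build_parent_map` and `is_under` are hypothetical helpers, and since the loop above stores only category titles, the aliases would need to be kept as well.

```python
def build_parent_map(categories):
    """Map each category alias to its list of parent aliases."""
    return {c['alias']: c.get('parent_aliases', []) for c in categories}

def is_under(alias, ancestor, parent_map):
    """True if `ancestor` is `alias` itself or appears anywhere above it in the hierarchy."""
    stack, seen = [alias], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parent_map.get(current, []))
    return False

# Hand-made sample in the assumed shape of the /v3/categories response
sample_categories = [
    {'alias': 'restaurants', 'title': 'Restaurants', 'parent_aliases': []},
    {'alias': 'japanese', 'title': 'Japanese', 'parent_aliases': ['restaurants']},
    {'alias': 'blowfish', 'title': 'Blowfish', 'parent_aliases': ['japanese']},
    {'alias': 'arts', 'title': 'Arts & Entertainment', 'parent_aliases': []},
]
parent_map = build_parent_map(sample_categories)

business_categories = [{'alias': 'japanese', 'title': 'Japanese'}]  # as in the Nagomi JSON
print(any(is_under(c['alias'], 'restaurants', parent_map) for c in business_categories))  # True
```

The `seen` set keeps the walk from looping if a category ever lists a parent that leads back to itself.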