# Mini-Project II: Exploring Top Destinations in Vancouver!

#### Michael Halim - June 4, 2021

Downtown Vancouver is home to some of the most amazing places and tourist attractions, from restaurants and entertainment to outdoor activities. However, many people, myself included, have lived in Vancouver for years without fully exploring what Downtown Vancouver has to offer, either because they do not know where to go or because they do not have the time. This project aims to identify the top 10 places to visit in Downtown Vancouver and determine the shortest path between them!

First question: what are the top 10 places to visit in Downtown Vancouver? There are many ways to answer this question, and one of them is to gather data through APIs. Fortunately, several API services provide information about nearby businesses, including Yelp and Foursquare. To begin this analysis, I sample restaurant data from both the Yelp and Foursquare APIs and compare their data coverage and quality to determine which is the better resource. Then, using the selected API, I obtain a recommended list of activities in Downtown Vancouver. To determine the shortest, and hence optimal, path, I will solve the Travelling Salesman Problem using Google's OR-Tools.

## Importing Libraries and Specifying Universal Variables

In [4]:
import pandas as pd
from IPython.display import JSON
from foursquare import Foursquare
import requests

ModuleNotFoundError: No module named 'foursquare'

In [6]:
lat = 49.2820
long = -123.1171

## Extracting Data from Yelp API

The first step is to get restaurant data from Yelp's API. To do so, I will import the necessary API keys and create a function to collect data from Yelp using the requests module.
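
The `yelpconfig` module is not shown in this notebook. One possible layout, sketched here as an assumption, is a small class that reads the credentials from environment variables so the keys never end up in the notebook itself. The variable names `YELP_CLIENT_ID` and `YELP_API_KEY` are hypothetical.

```python
import os


class YelpAPI:
    """Hypothetical stand-in for the yelpconfig.YelpAPI class used in this notebook."""

    def __init__(self):
        # Assumed environment variable names; adjust them to match your own setup.
        self.client_id = os.environ.get("YELP_CLIENT_ID", "")
        self.api_key = os.environ.get("YELP_API_KEY", "")
```

Keeping the keys outside the notebook also makes it safer to share the exported file.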

In [8]:
from yelpconfig import YelpAPI

yelp = YelpAPI()
yelp_clientid = yelp.client_id
yelp_apikey = yelp.api_key

In [5]:
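def get_yelp(latitude, longitude, query="", category=None):
    
    yelp_url = "https://api.yelp.com/v3/businesses/search"
    
    header = {
        "Authorization": f"Bearer {yelp_apikey}"
    }
    
    parameters = {
        "term": query,
        "latitude": latitude,
        "longitude": longitude,
        "radius": 3000,  # metres
        "limit": 50,     # maximum number of results per request
        # Yelp expects a comma-separated string of category aliases (e.g. "active", "arts");
        # requests drops the parameter entirely when the value is None
        "categories": ",".join(category) if category else None
    }
    
    return requests.get(url=yelp_url, headers=header, params=parameters).json()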

Using the above function, we can extract up to 50 restaurants from Yelp and store the response in a variable. We can then loop through this JSON to extract the necessary information. I did this by setting up a dictionary whose keys are the fields I want to extract and whose values are lists holding one entry per business. We can then convert this dictionary directly into a dataframe to clean and explore!

In [9]:
search="restaurant"
yelp_restaurants = get_yelp(lat, long, query=search)

In [68]:
def yelp_to_dict(json_data):
    
    my_dict = {"name": [], "address": [], "city": [], "province": [], "postal_code": [], "rating": [], "review_count": [], "price": []}

    for item in json_data["businesses"]:
        my_dict["name"].append(item["name"])
        my_dict["rating"].append(item["rating"])
        my_dict["review_count"].append(item["review_count"])
        my_dict["address"].append(item["location"]["address1"])
        my_dict["city"].append(item["location"]["city"])
        my_dict["province"].append(item["location"]["state"])
        my_dict['postal_code'].append(item["location"]["zip_code"])

        # Not every business lists a price; use "." as a placeholder
        my_dict["price"].append(item.get("price", "."))
    
    return my_dict
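
To make the missing-price fallback concrete, here is a compact variant of the same idea run against a tiny hand-written payload. The helper name `businesses_to_columns` and the second sample record are hypothetical; only the first record reuses values shown in this notebook.

```python
def businesses_to_columns(json_data):
    """Build a column-oriented dict from a Yelp-style payload (compact variant of yelp_to_dict)."""
    columns = {"name": [], "rating": [], "review_count": [], "price": []}
    for item in json_data["businesses"]:
        columns["name"].append(item["name"])
        columns["rating"].append(item["rating"])
        columns["review_count"].append(item["review_count"])
        # "price" is absent for some businesses, so fall back to a placeholder
        columns["price"].append(item.get("price", "."))
    return columns


# Hand-written sample payload, not real API output
sample = {
    "businesses": [
        {"name": "Medina Cafe", "rating": 4.0, "review_count": 2309, "price": "$$"},
        {"name": "Hypothetical Park", "rating": 5.0, "review_count": 12},
    ]
}

columns = businesses_to_columns(sample)
print(columns["price"])  # ['$$', '.']
```

Every list has the same length, which is what allows the dictionary to be passed straight to `pd.DataFrame`.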

In [70]:
yelp_restaurants_dict = yelp_to_dict(yelp_restaurants)

In [71]:
yelp_restaurants_df = pd.DataFrame(yelp_restaurants_dict)
yelp_restaurants_df.head(5)

| | name | address | city | province | postal_code | rating | review_count | price |
|---|---|---|---|---|---|---|---|---|
| 0 | Medina Cafe | 780 Richards Street | Vancouver | BC | V6B 3A4 | 4.0 | 2309 | $$ |
| 1 | The Flying Pig - Yaletown | 1168 Hamilton Street | Vancouver | BC | V6B 2S2 | 4.0 | 1090 | $$ |
| 2 | Le Crocodile Restaurant | 909 Burrard Street | Vancouver | BC | V6Z 2N2 | 4.5 | 386 | $$$$ |
| 3 | Fanny Bay Oyster Bar & Shellfish Market | 762 Cambie St | Vancouver | BC | V6B 2P2 | 4.5 | 575 | $$ |
| 4 | Chambar | 568 Beatty Street | Vancouver | BC | V6B 2L3 | 4.0 | 1363 | $$$ |


## Extracting Data from Foursquare API

We repeat a similar process, this time to extract data from Foursquare's API.

In [15]:
from fsconfignew import FourSquareAPI

fs = FourSquareAPI()
fs_id = fs.client_id
fs_key = fs.client_key

client = Foursquare(client_id=fs_id, client_secret=fs_key, version='20210528')

In [16]:
def get_fs(latitude, longitude, section="", category=""):
    
    parameters = {
        'll': f"{latitude},{longitude}",
        'radius': 3000,
        'section': section,
        'limit': 50,
        'categoryID': category,
    }
    
    return client.venues.explore(params=parameters)

Using the above function, we can explore the top 50 restaurants in a 3 km radius from the latitude and longitude specified above.

In [17]:
section="food"
fs_restaurants = get_fs(lat, long, section=section)

In [18]:
JSON(fs_restaurants)

<IPython.core.display.JSON object>

Looking at the output from the above API call, we can see that it provides basic information such as the restaurant name and address. To get the ratings, review counts, and price, we need more details about each recommended restaurant, which requires a separate API call per venue ID. Foursquare's like count is used here as the review count.

In [20]:
fs_restaurant_name = []
fs_restaurant_id = []

for restaurant in fs_restaurants['groups'][0]['items']:
    fs_restaurant_name.append(restaurant['venue']['name'])
    fs_restaurant_id.append(restaurant['venue']['id'])

In [22]:
fs_restaurant_dict = {'name': [], 'address': [], 
                      'city': [], 'province': [],
                      'postal_code': [], 'rating': [],
                      'review_count': [], 'price': []}

In [None]:
for venue_id in fs_restaurant_id:  # renamed so the loop no longer overwrites the fs_id credential

    # One detail request per venue; missing fields fall back to a default value
    venue = client.venues(venue_id)['venue']
    location = venue.get('location', {})

    fs_restaurant_dict['name'].append(venue['name'])
    fs_restaurant_dict['address'].append(location.get('address', ""))
    fs_restaurant_dict['city'].append(location.get('city', ""))
    fs_restaurant_dict['province'].append(location.get('state', ""))
    fs_restaurant_dict['postal_code'].append(location.get('postalCode', 0))
    fs_restaurant_dict['rating'].append(venue.get('rating', 0))
    # Foursquare's like count stands in for a review count
    fs_restaurant_dict['review_count'].append(venue.get('likes', {}).get('count', 0))
    fs_restaurant_dict['price'].append(venue.get('price', {}).get('tier', 0))
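
Several fields in the venue details are nested and may be missing. A small helper, sketched here as an optional refactor, keeps those lookups in one place. The name `dig` is hypothetical and not part of the `foursquare` package, and the sample dictionary is hand-written from values shown in this notebook rather than a real API response.

```python
def dig(data, *keys, default=None):
    """Follow a chain of dict keys, returning `default` as soon as one is missing."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hand-written sample shaped like the venue details used above
sample = {
    "venue": {
        "name": "Meat & Bread",
        "location": {"address": "370 Cambie St"},
        "rating": 9.2,
    }
}

print(dig(sample, "venue", "location", "address", default=""))  # 370 Cambie St
print(dig(sample, "venue", "price", "tier", default=0))         # 0
```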

In [44]:
fs_restaurants_df = pd.DataFrame(fs_restaurant_dict)

In [45]:
fs_restaurants_df.head(5)

| | name | address | city | province | postal_code | rating | review_count | price |
|---|---|---|---|---|---|---|---|---|
| 0 | Gotham Steakhouse & Cocktail Bar | 615 Seymour St. | Vancouver | BC | V6B 3K3 | 9.2 | 117 | 4 |
| 1 | Medina Café | 780 Richards St | Vancouver | BC | V6B 2L3 | 8.9 | 789 | 2 |
| 2 | The Keg Steakhouse + Bar - Dunsmuir | 688 Dunsmuir Street | Vancouver | BC | V6B 1N3 | 8.7 | 158 | 2 |
| 3 | Meat & Bread | 370 Cambie St | Vancouver | BC | V6B 2N3 | 9.2 | 468 | 2 |
| 4 | Le Crocodile Restaurant | 909 Burrard St | Vancouver | BC | V6Z 2N2 | 9.2 | 63 | 4 |


## Data Cleaning

Looking at the raw dataframes generated from the Yelp and Foursquare APIs, we notice two differences between them: 1) they use different rating scales (Foursquare rates out of 10, Yelp out of 5) and 2) they use different price units (Foursquare uses integers, Yelp uses strings of dollar signs). We need to account for both in order to compare the datasets fairly. To do so, we'll normalize the ratings to a 0-1 scale and convert the dollar signs into integers. We also add a column to label which API each row came from. Once a dataset is cleaned, we also save it to a CSV file, in case things go wrong!
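
As a worked example of the two conversions, the sketch below rescales one rating from each source and converts a price string. The function names are hypothetical, and the sketch keeps two decimals for both sources, whereas the cells below round the Foursquare ratings to one decimal.

```python
def normalize_rating(rating, scale):
    """Rescale a rating to the 0-1 range given the maximum of its scale."""
    return round(rating / scale, 2)


def price_to_tier(price):
    """Convert a string of dollar signs to an integer tier; anything else becomes 0."""
    if isinstance(price, str) and price and set(price) == {"$"}:
        return len(price)
    return 0


print(normalize_rating(4.5, scale=5))   # Yelp rating out of 5 -> 0.9
print(normalize_rating(9.2, scale=10))  # Foursquare rating out of 10 -> 0.92
print(price_to_tier("$$$"))             # 3
print(price_to_tier("."))               # 0 (the placeholder for a missing price)
```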

### Cleaning Yelp DataFrame

In [72]:
yelp_df = yelp_restaurants_df.copy()
yelp_df['api'] = 'yelp'
yelp_df['rating'] = round(yelp_df['rating']/5, 2)

In [73]:
def change_price(price):
    if price == "$":
        return 1
    elif price == "$$":
        return 2
    elif price == "$$$":
        return 3
    elif price == "$$$$":
        return 4
    else:
        return 0

In [74]:
yelp_df['price'] = yelp_df['price'].map(change_price)

In [75]:
yelp_df.head(5)

| | name | address | city | province | postal_code | rating | review_count | price | api |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Medina Cafe | 780 Richards Street | Vancouver | BC | V6B 3A4 | 0.8 | 2309 | 2 | yelp |
| 1 | The Flying Pig - Yaletown | 1168 Hamilton Street | Vancouver | BC | V6B 2S2 | 0.8 | 1090 | 2 | yelp |
| 2 | Le Crocodile Restaurant | 909 Burrard Street | Vancouver | BC | V6Z 2N2 | 0.9 | 386 | 4 | yelp |
| 3 | Fanny Bay Oyster Bar & Shellfish Market | 762 Cambie St | Vancouver | BC | V6B 2P2 | 0.9 | 575 | 2 | yelp |
| 4 | Chambar | 568 Beatty Street | Vancouver | BC | V6B 2L3 | 0.8 | 1363 | 3 | yelp |


In [38]:
yelp_df.to_csv("yelp_restaurants.csv")

### Cleaning Foursquare Data

In [51]:
fs_df = fs_restaurants_df.copy()
fs_df['api'] = 'foursquare'

In [52]:
fs_df['rating'] = round(fs_restaurants_df['rating']/10, 1)

In [56]:
fs_df['city'] = 'Vancouver'
fs_df['province'] = "BC"

In [58]:
fs_df.head(5)

| | name | address | city | province | postal_code | rating | review_count | price | api |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Gotham Steakhouse & Cocktail Bar | 615 Seymour St. | Vancouver | BC | V6B 3K3 | 0.9 | 117 | 4 | foursquare |
| 1 | Medina Café | 780 Richards St | Vancouver | BC | V6B 2L3 | 0.9 | 789 | 2 | foursquare |
| 2 | The Keg Steakhouse + Bar - Dunsmuir | 688 Dunsmuir Street | Vancouver | BC | V6B 1N3 | 0.9 | 158 | 2 | foursquare |
| 3 | Meat & Bread | 370 Cambie St | Vancouver | BC | V6B 2N3 | 0.9 | 468 | 2 | foursquare |
| 4 | Le Crocodile Restaurant | 909 Burrard St | Vancouver | BC | V6Z 2N2 | 0.9 | 63 | 4 | foursquare |


In [59]:
fs_df.to_csv("foursquare_restaurants.csv")

## Storing Data in a SQLite Database

Now that we've cleaned both the Foursquare and Yelp data, we can store them in a SQLite database for future use (and for practice working with SQL!).

In [62]:
import sqlite3 
from sqlite3 import Error

def create_connection(path):
    connection = None
    try:
        connection = sqlite3.connect(path)
        print("Connection to SQLite DB successful")
    except Error as e:
        print(f"The error '{e}' occurred")
    # None is returned if the connection could not be opened
    return connection

In [63]:
connection = create_connection("pointsofinterest.sqlite")

Connection to SQLite DB successful


In [64]:
fs_df.to_sql('restaurants', connection, index=False)

In [65]:
yelp_df.to_sql('restaurants', connection, index=False, if_exists="append")

In [67]:
pd.read_sql("SELECT * FROM restaurants", connection).head(5)

| | name | address | city | province | postal_code | rating | review_count | price | api |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Gotham Steakhouse & Cocktail Bar | 615 Seymour St. | Vancouver | BC | V6B 3K3 | 0.9 | 117 | 4 | foursquare |
| 1 | Medina Café | 780 Richards St | Vancouver | BC | V6B 2L3 | 0.9 | 789 | 2 | foursquare |
| 2 | The Keg Steakhouse + Bar - Dunsmuir | 688 Dunsmuir Street | Vancouver | BC | V6B 1N3 | 0.9 | 158 | 2 | foursquare |
| 3 | Meat & Bread | 370 Cambie St | Vancouver | BC | V6B 2N3 | 0.9 | 468 | 2 | foursquare |
| 4 | Le Crocodile Restaurant | 909 Burrard St | Vancouver | BC | V6Z 2N2 | 0.9 | 63 | 4 | foursquare |
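
Once both sources share one table, a per-API summary can also be written in SQL. This is a minimal sketch: it uses a throwaway in-memory database and four rows copied from the previews above instead of the full `pointsofinterest.sqlite` file, so the averages are only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for illustration
conn.execute(
    "CREATE TABLE restaurants "
    "(name TEXT, rating REAL, review_count INTEGER, price INTEGER, api TEXT)"
)
rows = [
    ("Medina Cafe", 0.8, 2309, 2, "yelp"),
    ("Le Crocodile Restaurant", 0.9, 386, 4, "yelp"),
    ("Medina Café", 0.9, 789, 2, "foursquare"),
    ("Le Crocodile Restaurant", 0.9, 63, 4, "foursquare"),
]
conn.executemany("INSERT INTO restaurants VALUES (?, ?, ?, ?, ?)", rows)

# Count the rows and average the review counts for each API
query = """
    SELECT api, COUNT(*) AS n, AVG(review_count) AS avg_reviews
    FROM restaurants
    GROUP BY api
    ORDER BY api
"""
summary = conn.execute(query).fetchall()
print(summary)  # [('foursquare', 2, 426.0), ('yelp', 2, 1347.5)]
conn.close()
```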


## Comparing API Quality


In [76]:
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
merged = pd.concat([yelp_df, fs_df])

In [77]:
merged[['api', 'rating', 'review_count', 'price']].groupby('api').mean()

| api | rating | review_count | price |
|---|---|---|---|
| foursquare | 0.848 | 145.54 | 1.92 |
| yelp | 0.834 | 431.7 | 1.84 |


In [78]:
merged_df = pd.concat([yelp_df, fs_df.drop_duplicates(subset=['name'])])

In [79]:
duplicates_df = merged_df[merged_df.duplicated(subset=['name'], keep=False)]
final_df = duplicates_df.groupby(['name', 'api']).max().reset_index()
final_df.to_csv("duplicates_df.csv")

In [82]:
final_df

| | name | api | address | city | province | postal_code | rating | review_count | price |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Chambar | foursquare | 568 Beatty St | Vancouver | BC | V6B 2L3 | 0.9 | 390 | 3 |
| 1 | Chambar | yelp | 568 Beatty Street | Vancouver | BC | V6B 2L3 | 0.8 | 1363 | 3 |
| 2 | Joe Fortes Seafood & Chop House | foursquare | 777 Thurlow St | Vancouver | BC | V6E 3V5 | 0.9 | 385 | 3 |
| 3 | Joe Fortes Seafood & Chop House | yelp | 777 Thurlow Street | Vancouver | BC | V6E 3V5 | 0.8 | 1043 | 3 |
| 4 | Kokoro Tokyo Mazesoba | foursquare | 551 Seymore St | Vancouver | BC | V6B 3H6 | 0.8 | 23 | 1 |
| 5 | Kokoro Tokyo Mazesoba | yelp | 551 Seymour Street | Vancouver | BC | V6B 3H6 | 0.8 | 346 | 2 |
| 6 | La Taqueria Pinche Taco Shop | foursquare | 586 Hornby Street | Vancouver | BC | V6C 2E8 | 0.8 | 37 | 1 |
| 7 | La Taqueria Pinche Taco Shop | yelp | 322 W Hastings Street | Vancouver | BC | V8B 1K6 | 0.9 | 477 | 2 |
| 8 | Le Crocodile Restaurant | foursquare | 909 Burrard St | Vancouver | BC | V6Z 2N2 | 0.9 | 63 | 4 |
| 9 | Le Crocodile Restaurant | yelp | 909 Burrard Street | Vancouver | BC | V6Z 2N2 | 0.9 | 386 | 4 |
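
Matching on the raw `name` column is strict. `Medina Cafe` (Yelp) and `Medina Café` (Foursquare) both appear in the earlier previews, yet they are not listed as a pair here, presumably because the accent makes the two strings differ. A normalization step along the following lines could be applied before matching. It is a sketch and not part of the original analysis, and the function name is hypothetical.

```python
import unicodedata


def normalize_name(name):
    """Lowercase, strip accents and collapse whitespace so near-identical names compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


print(normalize_name("Medina Café") == normalize_name("Medina Cafe"))  # True
```

The normalized value would go into a separate column used only for matching, so the original names stay available for display.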


## Top 10 Points of Interest

The rest of the analysis uses the Yelp data, which averaged roughly three times as many reviews per restaurant as Foursquare in the comparison above (431.7 vs. 145.54).

In [83]:
restaurants_df = yelp_df.copy()
restaurants_df = restaurants_df.drop(['api'], axis=1)

### Outdoor Activities

In [84]:
categories = ["active"]  # Yelp alias for the "Active Life" category
search = "Active Life"
active_life_data = get_yelp(lat, long, query=search, category=categories)

In [85]:
yelp_activelife_dict = yelp_to_dict(active_life_data)

In [94]:
activelife_df = pd.DataFrame(yelp_activelife_dict)

In [95]:
activelife_df['price'] = activelife_df['price'].map(change_price)
activelife_df['rating'] = round(activelife_df['rating']/5, 1)

In [96]:
activelife_df.head(5)

| | name | address | city | province | postal_code | rating | review_count | price |
|---|---|---|---|---|---|---|---|---|
| 0 | Vancouver Seawall | | Vancouver | BC | | 1.0 | 95 | 0 |
| 1 | Vancouver Water Adventures | 1812 Boatlift Lane | Vancouver | BC | V6H 3Y2 | 0.9 | 41 | 0 |
| 2 | Find and Seek Puzzle Adventure Escape Rooms | 88 W Pender Street | Vancouver | BC | V6B 6N8 | 1.0 | 92 | 0 |
| 3 | Sara Binder Living | | Vancouver | BC | V6E 2M6 | 1.0 | 1 | 0 |
| 4 | FlyOver Canada | 999 Canada Place | Vancouver | BC | V6C 3E1 | 0.8 | 197 | 0 |


### Entertainment Data

In [86]:
categories = ["arts"]  # Yelp alias for the "Arts & Entertainment" category
search = "Arts Entertainment"  # the search term is a plain string, not a list
entertainment_data = get_yelp(lat, long, query=search, category=categories)

In [98]:
yelp_entertainment_dict = yelp_to_dict(entertainment_data)

In [99]:
entertainment_df = pd.DataFrame(yelp_entertainment_dict)

In [100]:
entertainment_df['price'] = entertainment_df['price'].map(change_price)
entertainment_df['rating'] = round(entertainment_df['rating']/5, 1)

In [102]:
entertainment_df.head(5)

| | name | address | city | province | postal_code | rating | review_count | price |
|---|---|---|---|---|---|---|---|---|
| 0 | Dimensions Art Gallery | 432 W Hastings St | Vancouver | BC | V6B 1L1 | 0.8 | 6 | 0 |
| 1 | Illuminate Yaletown | 1200 Hamilton Street | Vancouver | BC | V6B 2Y5 | 0.8 | 1 | 0 |
| 2 | MakerLabs | 780 E Cordova Street | Vancouver | BC | V6A 1M3 | 0.8 | 5 | 0 |
| 3 | Movieland Arcade | 906 Granville Street | Vancouver | BC | V6Z 1L2 | 0.7 | 30 | 0 |
| 4 | The Den | 1348 Robson St | Vancouver | BC | V6E 1C5 | 0.6 | 1 | 0 |


### Selecting the Top 3 Restaurants, Top 3 Outdoor Activities, and Top 4 Entertainment Venues by Review Count

In [103]:
top_activelife_df = activelife_df.sort_values(['review_count'], ascending=False)[:3]

In [104]:
top_entertainment_df = entertainment_df.sort_values(['review_count'], ascending=False)[:4]

In [105]:
top_food_df = restaurants_df.sort_values(['review_count'], ascending=False)[:3]

In [114]:
places_to_visit = pd.concat([top_food_df, top_activelife_df, top_entertainment_df])

## Finding Shortest Path between Top 10 Destinations

In [110]:
from googleapi import GoogleAPI
import googlemaps

In [111]:
google = GoogleAPI()
google_api_key = google.api_key

gmaps = googlemaps.Client(key=google_api_key)

In [115]:
places_to_visit['full_address'] = places_to_visit['address'] + " " + places_to_visit['city']

In [116]:
places_to_visit['key'] = 0

In [117]:
reference_df = places_to_visit[["full_address", "name", "key"]].merge(places_to_visit[["full_address", "name", "key"]], on="key").drop('key', axis=1)

In [118]:
reference_df.columns = "origin_address", "origin_name", "destination_address", "destination_name"

In [119]:
distance_df = reference_df.copy().drop(["origin_name", "destination_name"], axis=1)
distance_df.columns = "origin", "destination"
distance_df

| | origin | destination |
|---|---|---|
| 0 | 780 Richards Street Vancouver | 780 Richards Street Vancouver |
| 1 | 780 Richards Street Vancouver | 200 Granville Street Vancouver |
| 2 | 780 Richards Street Vancouver | 568 Beatty Street Vancouver |
| 3 | 780 Richards Street Vancouver | 1166 Stanley Park Drive Vancouver |
| 4 | 780 Richards Street Vancouver | 845 Avison Way Vancouver |
| ... | ... | ... |
| 95 | 555 W Hastings Street Vancouver | 999 Canada Place Vancouver |
| 96 | 555 W Hastings Street Vancouver | 1 Alexander Street Vancouver |
| 97 | 555 W Hastings Street Vancouver | 578 Carrall Street Vancouver |
| 98 | 555 W Hastings Street Vancouver | 750 Hornby Street Vancouver |


In [120]:
distances = []
durations = []
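
The cell that fills `distances` and `durations` is not included in this export. Below is a sketch of how it might look, assuming the `distance_matrix` method of the `googlemaps` client and its response layout (`rows`, then `elements`, then `distance` and `duration`, each with a numeric `value` in metres or seconds). The helper name `collect_distances`, the `walking` mode, and the `FakeClient` class are assumptions for illustration and are not the author's code.

```python
def collect_distances(client, pairs, mode="walking"):
    """Query one origin/destination pair at a time; return (metres, seconds) lists."""
    distances, durations = [], []
    for origin, destination in pairs:
        response = client.distance_matrix(
            origins=[origin], destinations=[destination], mode=mode
        )
        element = response["rows"][0]["elements"][0]
        if element.get("status") == "OK":
            distances.append(element["distance"]["value"])
            durations.append(element["duration"]["value"])
        else:
            # Keep list positions aligned with the pairs even when a lookup fails
            distances.append(None)
            durations.append(None)
    return distances, durations


class FakeClient:
    """Stand-in used only to exercise the helper without calling the real API."""

    def distance_matrix(self, origins, destinations, mode):
        same = origins == destinations
        element = {
            "status": "OK",
            "distance": {"value": 0 if same else 1200},
            "duration": {"value": 0 if same else 900},
        }
        return {"rows": [{"elements": [element]}]}


pairs = [("A St Vancouver", "A St Vancouver"), ("A St Vancouver", "B St Vancouver")]
print(collect_distances(FakeClient(), pairs))  # ([0, 1200], [0, 900])
```

With the real client, the call would look something like `collect_distances(gmaps, zip(distance_df["origin"], distance_df["destination"]))`.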

In [None]:
with open("distances.txt", "a") as output:
    output.write(str(distances))
    
with open("durations.txt", "a") as output:
    output.write(str(durations))

In [None]:
# Reshape the flat list of 100 pairwise distances into a 10 x 10 matrix (one row per origin)
distance_matrix = [distances[i:i + 10] for i in range(0, 100, 10)]
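
The route optimization itself is not part of this export. A 10-stop tour is small enough to solve exactly by enumeration (9! = 362,880 candidate routes), which gives a handy cross-check for whatever solver is used. The sketch below is a plain brute-force search, not the OR-Tools routing solver named in the introduction, and it assumes a round trip that starts and ends at stop 0. The function name and the 4-stop matrix are made up for illustration.

```python
from itertools import permutations


def shortest_tour(matrix, start=0):
    """Return (total_distance, route) for the cheapest round trip from `start`."""
    others = [i for i in range(len(matrix)) if i != start]
    best_cost, best_route = float("inf"), None
    for order in permutations(others):
        route = (start, *order, start)
        # Sum the distance of each consecutive leg of the route
        cost = sum(matrix[a][b] for a, b in zip(route, route[1:]))
        if cost < best_cost:
            best_cost, best_route = cost, route
    return best_cost, best_route


# Made-up symmetric distances in metres
example = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]
print(shortest_tour(example))  # (80, (0, 1, 3, 2, 0))
```

The same function could be called on `distance_matrix` once it holds the real pairwise distances.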