# Where to open a new Japanese restaurant in Vancouver?
#### The Battle of the Neighborhoods -Applied Data Science Capstone by IBM/Coursera
#### By Paola Segundo 

![Japanese](https://vancouver.foodiepulse.com/wp-content/uploads/2019/04/seiza-japanese.jpg)

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

##### The popularity of Japanese restaurants and food in Vancouver is undeniable. From California rolls in sushi places, Ramen in West End, green tea desserts and fluffy cheesecake, the presence of this diverse type of cuisine has a huge impact in the city and seems to be everywhere, but we would like to know in which neighborhoods it is less present than others. 
##### This analysis will recommend an optimal neighborhood to open a new Japanese restaurant 

## Data <a name="data"></a>

#### 1) Vancouver Local Areas (Neighborhoods) 
##### We downloaded the following cvs file form the City of Vancouver Open Data Portal (https://opendata.vancouver.ca/explore/dataset/local-area-boundary/export/?location=12,49.2474,-123.12402) This file contains the names of the neighborhoods in the city, but also the central coordinates of each neighborhood. We can clean the data by dropping some columns and changing the data type to float to read the coordinates during the analysis. 

#### 2) Foursquare API venues (Japanese Restaurants) 
##### From the Foursquare API we used the get nearby venues and the Japanese restaurant category filter to generate a data frame of Japanese restaurants by neighborhood. 


## Methodology <a name="methodology"></a>

#### 1) Cleaned and prepare Vancouver Local Areas (Neighborhoods) cvs file 
#### 2) Created request at Foursquare to retrieve Japanese Restaurant venues by proximity of 500 meters. 
#### 3) Processed data frames to determine which neighborhood has less Japanese restaurants 
#### 4) Mapped and recommend location

## Analysis <a name="analysis"></a>

In [13]:
import pandas as pd 
import numpy as np 
import requests 
print('libraries imported')

libraries imported


In [14]:
data = pd.read_csv('local-area-boundary.csv', sep=';')
df=data.drop(columns=['Geom'])
df[['Lat','Long']] = df.geo_point_2d.str.split(",",expand=True)
df_yvr=df.drop(columns=['geo_point_2d'])
df_yvr["Lat"] = df_yvr.Lat.astype(float) 
df_yvr["Long"]= df_yvr.Long.astype(float)
df_yvr.head()

Unnamed: 0,MAPID,Name,Lat,Long
0,DS,Dunbar-Southlands,49.237962,-123.189547
1,KERR,Kerrisdale,49.223655,-123.159576
2,KIL,Killarney,49.217022,-123.037647
3,KITS,Kitsilano,49.26754,-123.163295
4,SC,South Cambie,49.245556,-123.121801


In [15]:
df_yvr.to_pickle('./locations.pkl')

In [16]:
#Define Foursquare API Credentials and Version 
CLIENT_ID = '5NDFLJDHBDBXJ2GIMGKDS5UYKTGBTGXZ5DZ10Z0ZZROOGPBP'
CLIENT_SECRET = 'I5G0130GRVPHVBMWBPD2PVOK1VKAVJS43FYS53UFVUKNIPRO'
VERSION = '20200517'
query='Japanese'

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: 5NDFLJDHBDBXJ2GIMGKDS5UYKTGBTGXZ5DZ10Z0ZZROOGPBP
CLIENT_SECRET:I5G0130GRVPHVBMWBPD2PVOK1VKAVJS43FYS53UFVUKNIPRO


In [17]:
#Define function to repeat finding venues for all the neighborhoods in Vancouver
def getNearbyVenues(names, latitudes, longitudes, query, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&categoryID={}&limit={}'.format(
            '5NDFLJDHBDBXJ2GIMGKDS5UYKTGBTGXZ5DZ10Z0ZZROOGPBP', 
            'I5G0130GRVPHVBMWBPD2PVOK1VKAVJS43FYS53UFVUKNIPRO', 
            '20200517', 
            lat, 
            lng, 
            query,
            radius,  
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [23]:
#Creating request for 100 venues in a radius of 500 meters 
radius = 500 
neighborhood_latitude = df_yvr.loc[0,'Lat']
neighborhood_longitude = df_yvr.loc[0,'Long']
query='japanese'
LIMIT=1000
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&query={}&limit={}'.format(
    '5NDFLJDHBDBXJ2GIMGKDS5UYKTGBTGXZ5DZ10Z0ZZROOGPBP', 
    'I5G0130GRVPHVBMWBPD2PVOK1VKAVJS43FYS53UFVUKNIPRO', 
    '20200517', neighborhood_latitude, neighborhood_longitude, radius, query, LIMIT)
yvr_venues = getNearbyVenues(names=df_yvr['Name'],
                                   latitudes=df_yvr['Lat'],
                                   longitudes=df_yvr['Long'],
                                    query=(query)
                            )

Dunbar-Southlands


KeyError: 'groups'

In [112]:
# Let's now go over our neighborhood locations and get nearby restaurants; we'll also maintain a dictionary of all found restaurants and all found italian restaurants
japanese_restaurant_categories = ['4bf58dd8d48988d111941735','55a59bace4b013909087cb0c','55a59bace4b013909087cb30',
                                 '55a59bace4b013909087cb21','55a59bace4b013909087cb06','55a59bace4b013909087cb1b',
                                 '55a59bace4b013909087cb1e','55a59bace4b013909087cb18','55a59bace4b013909087cb24',
                                 '55a59bace4b013909087cb15','55a59bace4b013909087cb27','55a59bace4b013909087cb12',
                                 '4bf58dd8d48988d1d2941735','55a59bace4b013909087cb2d','55a59a31e4b013909087cb00',
                                 '55a59af1e4b013909087cb03','55a59bace4b013909087cb2a','55a59bace4b013909087cb0f',
                                 '55a59bace4b013909087cb33','55a59bace4b013909087cb09','55a59bace4b013909087cb36']   

def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if not(specific_filter is None) and (category_id in specific_filter):
            specific = True
            restaurant = True
    return restaurant, specific

area_restaurants = []
for venue in yvr_venues:
        venue_id = venue[0]
        venue_name = venue[1]
        venue_categories = venue[2]
        venue_latlon = venue[3]
        venue_address = venue[4]
        is_res, is_japanese = is_restaurant(category, specific_filter=japanese_restaurant_categories)
        if is_res:
            x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
        restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_japanese, x, y)
        if venue_distance<=500:
            area_restaurants.append(restaurant)
        restaurants[venue_id] = restaurant
        if is_japanese:
            japanese_restaurants[venue_id] = restaurant
location_restaurants.append(area_restaurants)
print(' .', end='')
print(' done.')
return restaurants, japanese_restaurants, location_restaurants     

IndexError: string index out of range

In [77]:
#Resulting dataframe
print(yvr_venues.shape)
yvr_venues

(452, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Dunbar-Southlands,49.237962,-123.189547,Red Tuna,49.234746,-123.184952,Japanese Restaurant
1,Dunbar-Southlands,49.237962,-123.189547,Blaq Sheep Coffee House And Bistro,49.235501,-123.185324,Coffee Shop
2,Dunbar-Southlands,49.237962,-123.189547,H-Mart,49.235813,-123.185640,Grocery Store
3,Dunbar-Southlands,49.237962,-123.189547,Celtic Treasure Chest,49.235169,-123.185504,Grocery Store
4,Dunbar-Southlands,49.237962,-123.189547,"Bus Stop 50298 (7,32,N22)",49.234912,-123.185422,Bus Stop
...,...,...,...,...,...,...,...
447,West End,49.285011,-123.135438,Sushi Sky,49.281461,-123.133628,Sushi Restaurant
448,West End,49.285011,-123.135438,West End Farmers Market,49.282503,-123.130264,Farmers Market
449,West End,49.285011,-123.135438,Best Western Plus Sands,49.286867,-123.140883,Hotel
450,West End,49.285011,-123.135438,Red Umbrella Cafe,49.286289,-123.140324,Café


In [93]:
# Category IDs corresponding to Japanese restaurants were taken from Foursquare web site (https://developer.foursquare.com/docs/resources/categories):

food_category = '4d4b7105d754a06374d81259'

japanese_restaurant_categories = ['4bf58dd8d48988d111941735','55a59bace4b013909087cb0c','55a59bace4b013909087cb30',
                                 '55a59bace4b013909087cb21','55a59bace4b013909087cb06','55a59bace4b013909087cb1b',
                                 '55a59bace4b013909087cb1e','55a59bace4b013909087cb18','55a59bace4b013909087cb24',
                                 '55a59bace4b013909087cb15','55a59bace4b013909087cb27','55a59bace4b013909087cb12',
                                 '4bf58dd8d48988d1d2941735','55a59bace4b013909087cb2d','55a59a31e4b013909087cb00',
                                 '55a59af1e4b013909087cb03','55a59bace4b013909087cb2a','55a59bace4b013909087cb0f',
                                 '55a59bace4b013909087cb33','55a59bace4b013909087cb09','55a59bace4b013909087cb36']

def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', Canada', '')
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        CLIENT_ID, CLIENT_SECRET, VERSION, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
    except:
        venues = []
    return venues

In [96]:
# Let's now go over our neighborhood locations and get nearby restaurants; we'll also maintain a dictionary of all found restaurants and all found japanese restaurants

import pickle

def get_restaurants(lats, lons):
    restaurants = {}
    japanese_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        # Using radius=900 to make sure we have overlaps/full coverage so we don't miss any restaurant (we're using dictionaries to remove any duplicates resulting from area overlaps)
        venues = get_venues_near_location(lat, lon, food_category, CLIENT_ID, CLIENT_SECRET, radius=900, limit=1000)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_japanese = is_restaurant(venue_categories, specific_filter=japanese_restaurant_categories)
            if is_res:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_italian, x, y)
                if venue_distance<=900:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_japanese:
                    japanese_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, japanese_restaurants, location_restaurants
# Try to load from local file system in case we did this before
restaurants = {}
japanese_restaurants = {}
location_restaurants = []
loaded = False
try:
    with open('restaurants_500.pkl', 'rb') as f:
        restaurants = pickle.load(f)
    with open('italian_restaurants_500.pkl', 'rb') as f:
        italian_restaurants = pickle.load(f)
    with open('location_restaurants_500.pkl', 'rb') as f:
        location_restaurants = pickle.load(f)
    print('Restaurant data loaded.')
    loaded = True
except:
    pass

# If load failed use the Foursquare API to get the data
if not loaded:
    restaurants, japanese_restaurants, location_restaurants = get_restaurants('Lat', 'Long')
    
    # Let's persists this in local file system
    with open('restaurants_500.pkl', 'wb') as f:
        pickle.dump(restaurants, f)
    with open('japanese_restaurants_500.pkl', 'wb') as f:
        pickle.dump(japanese_restaurants, f)
    with open('location_restaurants_500.pkl', 'wb') as f:
        pickle.dump(location_restaurants, f)

Obtaining venues around candidate locations: . . . done.


In [97]:
import numpy as np

print('Total number of restaurants:', len(restaurants))
print('Total number of Japanese restaurants:', len(japanese_restaurants))
print('Percentage of Japanese restaurants: {:.2f}%'.format(len(japanese_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in neighborhood:', np.array([len(r) for r in location_restaurants]).mean())

Total number of restaurants: 0
Total number of Japanese restaurants: 0


ZeroDivisionError: division by zero

## Results and Discussion <a name="results"></a>

## Conclusion <a name="conclusion"></a>