# Where to open a new Japanese restaurant in Vancouver?
#### The Battle of the Neighborhoods: Applied Data Science Capstone by IBM/Coursera
#### By Paola Segundo 

![Japanese](https://vancouver.foodiepulse.com/wp-content/uploads/2019/04/seiza-japanese.jpg)

## Table of contents
* [Introduction](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction <a name="introduction"></a>

### Background
##### The popularity of Japanese restaurants and food in Vancouver is undeniable. From California rolls at sushi places and ramen in the West End to green tea desserts and fluffy cheesecake, this diverse cuisine has a huge presence in the city and seems to be everywhere. We would like to know in which neighborhoods it is less present than in others.

### Problem 
##### A restaurant owner wants to open a second location of his Japanese restaurant. He has noticed that there are already many Japanese restaurants around the city, so he would like to know which neighborhood would be the best place to open it.
	
##### The two main criteria will be the median income of the neighborhood and the most common types of venues. Income matters because we want to make sure nearby residents have enough money to spend at restaurants. Venue types matter because we want to make sure there are not already too many Japanese restaurants, or too many restaurants in general.
 


## Data <a name="data"></a>


##### Based on the definition of our problem, the factors that will influence our decision are:
* The 5 highest-income neighborhoods in Vancouver.
* The number of existing Japanese or sushi restaurants in each neighborhood.
* The most common types of venues in each neighborhood.
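The sketch below shows one way the first two factors could be combined into a shortlist. It is a minimal sketch, assuming a simple rule: keep the top 5 neighborhoods by income, then drop any that already have Japanese or sushi restaurants. The numbers are the incomes and restaurant counts reported in the analysis of this notebook. The helper `shortlist` and its `max_japanese` threshold are hypothetical and are not the exact procedure used here.

```python
import pandas as pd

# Income and Japanese/sushi restaurant counts for the 5 highest-income
# neighborhoods, as reported in the analysis of this notebook
data = pd.DataFrame({
    "Neighborhood": ["Shaughnessy", "West Point Grey", "Dunbar-Southlands",
                     "Kerrisdale", "South Cambie"],
    "income": [118668, 82042, 78117, 77248, 65459],
    "japanese_count": [0, 3, 1, 0, 2],
})


def shortlist(df, top_n=5, max_japanese=0):
    """Hypothetical helper: keep the top_n neighborhoods by income, then
    drop those with more than max_japanese Japanese/sushi restaurants.
    Rows stay ordered from highest to lowest income."""
    richest = df.nlargest(top_n, "income")
    return richest[richest["japanese_count"] <= max_japanese]


candidates = shortlist(data)
print(candidates["Neighborhood"].tolist())  # ['Shaughnessy', 'Kerrisdale']
```

The third factor (the mix of other venues) is harder to reduce to one threshold. That is why it is examined separately through venue frequencies.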

#### Data Sources 
1) Vancouver Local Areas (Neighborhoods) 
##### We downloaded the following .csv file from the City of Vancouver Open Data Portal (https://opendata.vancouver.ca/explore/dataset/local-area-boundary/export/?location=12,49.2474,-123.12402). This file contains the names of the neighborhoods in the city and the central coordinates of each one. We cleaned the data by dropping unneeded columns and converting the coordinates to floats so they can be used during the analysis.

2) Census Local Area profiles 2016 
##### We downloaded the following .csv file from the City of Vancouver Open Data Portal (https://opendata.vancouver.ca/explore/dataset/census-local-area-profiles-2016/information/). This file contains neighborhood names, spoken languages, income, demographics and other data. We cleaned the data to keep only the income information.

3) Foursquare API venues (Japanese Restaurants)
##### We used the Foursquare API to retrieve nearby venues for each neighborhood. This let us identify the presence of Japanese and sushi restaurants and find the most frequent venue categories per neighborhood.



## Methodology <a name="methodology"></a>

1) Sent requests to the Foursquare API to retrieve venues within a 1000-meter radius of the center of each of the 5 highest-income neighborhoods.
2) Processed the resulting data frames to determine how many Japanese or sushi restaurants each neighborhood has.
3) Calculated the frequency of each venue category per neighborhood to determine which other types of venues, including other restaurants, are common in the area.
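The request in step 1 comes down to one URL per neighborhood center. This is a sketch that builds it with the standard library's `urlencode` instead of string formatting. The endpoint and parameter names (`client_id`, `client_secret`, `v`, `ll`, `radius`, `limit`) are the ones used in the analysis below. The helper name `build_explore_url`, its `limit=100` default and the placeholder keys are assumptions for illustration.

```python
from urllib.parse import urlencode

# Foursquare v2 "explore" endpoint, as used in this notebook
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=1000, limit=100):
    """Hypothetical helper: build the request URL for one neighborhood center.

    urlencode escapes every value, so the "lat,lng" pair and the credentials
    are encoded consistently instead of being pasted in with str.format.
    """
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_URL + "?" + urlencode(params)


# Shaughnessy's center point from the neighborhoods table, with placeholder keys
url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20200530",
                        49.245681, -123.13976)
print(url)
```

With `requests`, passing the same dictionary through the `params` argument of `requests.get` gives the same encoding without building the string by hand.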


## Analysis <a name="analysis"></a>

```python
import pandas as pd
import numpy as np
import requests
print('libraries imported')
```

```
libraries imported
```


```python
# Import the neighborhoods CSV, which includes the center point of each area
data = pd.read_csv('local-area-boundary.csv', sep=';')
df = data.drop(columns=['Geom'])
# geo_point_2d holds "lat, long" as a single string: split it into two columns
df[['Lat', 'Long']] = df.geo_point_2d.str.split(",", expand=True)
df_yvr = df.drop(columns=['geo_point_2d'])
df_yvr["Lat"] = df_yvr.Lat.astype(float)
df_yvr["Long"] = df_yvr.Long.astype(float)
df_yvr["Name"] = df_yvr.Name.astype(str)
# Sort alphabetically and renumber the rows so they line up with the income table
df_yvr = df_yvr.sort_values(by='Name')
df_yvr.reset_index(inplace=True)
df_yvr.head()
```

|  | index | MAPID | Name | Lat | Long |
|---|---|---|---|---|---|
| 0 | 12 | AR | Arbutus-Ridge | 49.246805 | -123.161669 |
| 1 | 13 | CBD | Downtown | 49.280747 | -123.116567 |
| 2 | 0 | DS | Dunbar-Southlands | 49.237962 | -123.189547 |
| 3 | 14 | FAIR | Fairview | 49.26454 | -123.131049 |
| 4 | 15 | GW | Grandview-Woodland | 49.27644 | -123.066728 |


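```python
# Import the income CSV (cleaned from the 2016 census local area profiles)
income = pd.read_csv('Income-LocalAreas.csv')
income["income"] = income.income.astype(int)
income["Area"] = income.Area.astype(str)
# sort_values is not in-place: assign the result and renumber the rows
income = income.sort_values(by='Area').reset_index(drop=True)
income.head()
```

|  | Area | income |
|---|---|---|
| 0 | Arbutus-Ridge | 62675 |
| 1 | Downtown | 63251 |
| 2 | Dunbar-Southlands | 78117 |
| 3 | Fairview | 61627 |
| 4 | Grandview-Woodland | 42896 |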


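```python
# Join the dataframes side by side; pd.concat(axis=1) aligns rows by index,
# so both frames must be sorted by neighborhood name with a fresh 0..n-1 index
frames = [df_yvr, income]
df = pd.concat(frames, axis=1)
# Keep the 5 neighborhoods with the highest income (nlargest returns them sorted)
top = df.nlargest(5, 'income')
top
```

|  | index | MAPID | Name | Lat | Long | Area | income |
|---|---|---|---|---|---|---|---|
| 15 | 19 | SHAU | Shaughnessy | 49.245681 | -123.13976 | Shaughnessy | 118668 |
| 21 | 11 | WPG | West Point Grey | 49.268401 | -123.203467 | West Point Grey | 82042 |
| 2 | 0 | DS | Dunbar-Southlands | 49.237962 | -123.189547 | Dunbar-Southlands | 78117 |
| 7 | 1 | KERR | Kerrisdale | 49.223655 | -123.159576 | Kerrisdale | 77248 |
| 16 | 4 | SC | South Cambie | 49.245556 | -123.121801 | South Cambie | 65459 |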
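`pd.concat(..., axis=1)` pairs rows by index, not by neighborhood name. The join above therefore only works if both files list the areas in exactly the same order. The sketch below joins on the name instead. The toy frames use a few rows from the tables above, with the income rows deliberately shuffled; the latitudes are rounded for brevity.

```python
import pandas as pd

# Toy frames: the income rows are NOT in the same order as the area rows
areas = pd.DataFrame({"Name": ["Arbutus-Ridge", "Downtown", "Fairview"],
                      "Lat": [49.25, 49.28, 49.26]})
income = pd.DataFrame({"Area": ["Downtown", "Arbutus-Ridge", "Fairview"],
                       "income": [63251, 62675, 61627]})

# concat(axis=1) pairs rows by index label, so the mismatch goes unnoticed
side_by_side = pd.concat([areas, income], axis=1)

# merge pairs rows by neighborhood name; validate="one_to_one" raises an
# error if a name appears more than once on either side
merged = areas.merge(income, left_on="Name", right_on="Area",
                     how="inner", validate="one_to_one")

print(side_by_side[["Name", "Area"]])      # Arbutus-Ridge paired with Downtown
print(merged[["Name", "Area", "income"]])  # every Name matches its Area
```

An inner merge silently drops names that are spelled differently in the two files. Running it once with `how="outer", indicator=True` is a quick way to spot such rows.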


```python
# Save the top 5 neighborhoods for later use
top.to_pickle('./locations.pkl')
```

```python
# Define Foursquare API credentials and version
# (placeholders: use your own keys, and avoid publishing or printing them)
CLIENT_ID = 'YOUR_CLIENT_ID'
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = '20200530'
```
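API keys that are hard-coded in a notebook, or printed in its output, are exposed to anyone who reads the published notebook. One alternative is to read them from environment variables. This is a minimal sketch. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical choices, not something the Foursquare API requires.

```python
import os


def load_foursquare_credentials():
    """Hypothetical helper: read the Foursquare keys from environment variables.

    Any variable names work, as long as they are set before the notebook runs.
    """
    client_id = os.environ.get("FOURSQUARE_CLIENT_ID")
    client_secret = os.environ.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET")
    return client_id, client_secret


# For the example, set the variables in-process (normally done in the shell)
os.environ["FOURSQUARE_CLIENT_ID"] = "YOUR_CLIENT_ID"
os.environ["FOURSQUARE_CLIENT_SECRET"] = "YOUR_CLIENT_SECRET"
CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()
```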


```python
# Define a function that retrieves the nearby venues of every neighborhood passed in
# (it uses CLIENT_ID, CLIENT_SECRET and VERSION defined above, and LIMIT, which
# is set in the next cell before the function is called)
def getNearbyVenues(names, latitudes, longitudes, radius=1000):

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request and keep the list of recommended venues
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into a single dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues
```
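To test the parsing step without calling the API, the body of the loop can be moved into a function and fed a hand-made dictionary. The dictionary follows the same JSON path the function reads: `response` → `groups[0]` → `items` → `venue`. In this sketch the helper `venues_to_rows`, the sample payload and its venue names and coordinates are all hypothetical. It also assumes that venues with an empty `categories` list should be skipped, instead of raising an `IndexError` on `categories[0]`.

```python
import pandas as pd

# Hand-made payload shaped like the part of the explore response that is read
# (venue names and coordinates are hypothetical)
sample_json = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Cafe A",
                           "location": {"lat": 49.25, "lng": -123.14},
                           "categories": [{"name": "Café"}]}},
                {"venue": {"name": "Unlabeled Spot",
                           "location": {"lat": 49.24, "lng": -123.13},
                           "categories": []}},
            ]
        }]
    }
}


def venues_to_rows(name, lat, lng, payload):
    """Flatten one neighborhood's response into tuples, skipping venues
    that have no category."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        if not venue.get("categories"):
            continue
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     venue["categories"][0]["name"]))
    return rows


rows = venues_to_rows("Shaughnessy", 49.245681, -123.13976, sample_json)
df = pd.DataFrame(rows, columns=["Neighborhood", "Neighborhood Latitude",
                                 "Neighborhood Longitude", "Venue",
                                 "Venue Latitude", "Venue Longitude",
                                 "Venue Category"])
print(df)
```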

```python
# Request up to LIMIT venues within a 1000-meter radius of each neighborhood center.
# Foursquare caps the number of results per request, so far fewer than LIMIT
# venues come back (at most 38 per neighborhood in the counts below)
radius = 1000
LIMIT = 1000
yvr_venues = getNearbyVenues(names=top['Name'],
                             latitudes=top['Lat'],
                             longitudes=top['Long'],
                             radius=radius)
```

```
Shaughnessy
West Point Grey
Dunbar-Southlands
Kerrisdale
South Cambie
```


```python
# Resulting dataframe
print(yvr_venues.shape)
yvr_venues.head()
```

```
(115, 7)
```

|  | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Shaughnessy | 49.245681 | -123.13976 | Quilchena Park | 49.245194 | -123.151211 | Park |
| 1 | Shaughnessy | 49.245681 | -123.13976 | Dragon Ball Tea House | 49.249126 | -123.127718 | Bubble Tea Shop |
| 2 | Shaughnessy | 49.245681 | -123.13976 | The Arbutus Club | 49.248507 | -123.152152 | Event Space |
| 3 | Shaughnessy | 49.245681 | -123.13976 | The Maze at Van Dusen Gardens | 49.23892 | -123.136666 | Garden |
| 4 | Shaughnessy | 49.245681 | -123.13976 | Second Cup | 49.244846 | -123.126036 | Coffee Shop |


```python
# Keep only Japanese and sushi restaurants
yvr_japanese = yvr_venues[yvr_venues['Venue Category'].isin(['Japanese Restaurant', 'Sushi Restaurant'])]
yvr_japanese
```

|  | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 11 | West Point Grey | 49.268401 | -123.203467 | Takumi Japanese Restaurant | 49.26377 | -123.206784 | Japanese Restaurant |
| 18 | West Point Grey | 49.268401 | -123.203467 | Sun Sushi | 49.26381 | -123.209577 | Sushi Restaurant |
| 20 | West Point Grey | 49.268401 | -123.203467 | Hime Sushi | 49.263847 | -123.2088 | Sushi Restaurant |
| 37 | Dunbar-Southlands | 49.237962 | -123.189547 | Red Tuna | 49.234746 | -123.184952 | Japanese Restaurant |
| 84 | South Cambie | 49.245556 | -123.121801 | Goma Sushi | 49.252434 | -123.127385 | Sushi Restaurant |
| 92 | South Cambie | 49.245556 | -123.121801 | Osaka Sushi | 49.248559 | -123.125674 | Sushi Restaurant |


```python
# Number of Japanese and sushi restaurants per neighborhood
yvr_japanese.groupby('Neighborhood').count()
```

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Dunbar-Southlands | 1 | 1 | 1 | 1 | 1 | 1 |
| South Cambie | 2 | 2 | 2 | 2 | 2 | 2 |
| West Point Grey | 3 | 3 | 3 | 3 | 3 | 3 |
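`groupby(...).count()` only lists neighborhoods with at least one match. As a result, Shaughnessy and Kerrisdale, which have no Japanese or sushi restaurants, do not appear in the table above. This sketch keeps them with a count of 0. It uses the `Neighborhood` values of `yvr_japanese`, typed in by hand. In the notebook, the same idea could be written as `yvr_japanese['Neighborhood'].value_counts().reindex(top['Name'], fill_value=0)`.

```python
import pandas as pd

# Neighborhood column of yvr_japanese, as listed in the table above
japanese_hoods = pd.Series(["West Point Grey", "West Point Grey",
                            "West Point Grey", "Dunbar-Southlands",
                            "South Cambie", "South Cambie"])
# All five neighborhoods that were queried
all_hoods = ["Shaughnessy", "West Point Grey", "Dunbar-Southlands",
             "Kerrisdale", "South Cambie"]

# value_counts only lists neighborhoods that appear; reindex adds the
# missing ones with a count of 0, in the order of all_hoods
japanese_counts = (japanese_hoods.value_counts()
                   .reindex(all_hoods, fill_value=0))
print(japanese_counts)
```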


## Data Analysis 

```python
# Number of venues per neighborhood
yvr_venues.groupby('Neighborhood').count()
```

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Dunbar-Southlands | 20 | 20 | 20 | 20 | 20 | 20 |
| Kerrisdale | 20 | 20 | 20 | 20 | 20 | 20 |
| Shaughnessy | 8 | 8 | 8 | 8 | 8 | 8 |
| South Cambie | 38 | 38 | 38 | 38 | 38 | 38 |
| West Point Grey | 29 | 29 | 29 | 29 | 29 | 29 |


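```python
# Prepare the data for analysis: one-hot encode the venue categories
vancouver_onehot = pd.get_dummies(yvr_venues[['Venue Category']], prefix="", prefix_sep="")

# add the neighborhood column back to the dataframe
vancouver_onehot['Neighborhood'] = yvr_venues['Neighborhood']

# move the neighborhood column to the front, selecting it by name: if Foursquare
# returned a venue category called "Neighborhood", the assignment above overwrites
# that dummy column instead of appending a new last column
fixed_columns = ['Neighborhood'] + [col for col in vancouver_onehot.columns if col != 'Neighborhood']
vancouver_onehot = vancouver_onehot[fixed_columns]

vancouver_onehot.head()
```

|  | Vietnamese Restaurant | Athletics & Sports | Bakery | Bank | Baseball Field | Beach | Bubble Tea Shop | Burger Joint | Bus Stop | Café | ... | Pub | Restaurant | Sandwich Place | Scenic Lookout | Seafood Restaurant | Soccer Field | Spanish Restaurant | Sporting Goods Shop | Supermarket | Sushi Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |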


```python
# Frequency of each type of venue: mean of the one-hot columns per neighborhood
vancouver_grouped = vancouver_onehot.groupby('Neighborhood').mean().reset_index()
vancouver_grouped
```

|  | Neighborhood | Vietnamese Restaurant | Athletics & Sports | Bakery | Bank | Baseball Field | Beach | Bubble Tea Shop | Burger Joint | Bus Stop | ... | Pub | Restaurant | Sandwich Place | Scenic Lookout | Seafood Restaurant | Soccer Field | Spanish Restaurant | Sporting Goods Shop | Supermarket | Sushi Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Dunbar-Southlands | 0.0 | 0.0 | 0.0 | 0.0 | 0.05 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.05 | 0.05 | 0.05 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Kerrisdale | 0.0 | 0.0 | 0.05 | 0.0 | 0.0 | 0.0 | 0.05 | 0.0 | 0.15 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.05 | 0.0 | 0.05 | 0.0 |
| 2 | Shaughnessy | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.125 | 0.125 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | South Cambie | 0.026316 | 0.0 | 0.0 | 0.052632 | 0.0 | 0.0 | 0.026316 | 0.0 | 0.0 | ... | 0.0 | 0.026316 | 0.052632 | 0.026316 | 0.026316 | 0.026316 | 0.0 | 0.026316 | 0.0 | 0.052632 |
| 4 | West Point Grey | 0.034483 | 0.034483 | 0.034483 | 0.068966 | 0.0 | 0.068966 | 0.0 | 0.034483 | 0.034483 | ... | 0.0 | 0.034483 | 0.034483 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.068966 |
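One-hot encoding followed by `groupby('Neighborhood').mean()` gives, for each neighborhood, the share of its venues in each category. `pd.crosstab` with `normalize="index"` computes the same table in a single call. This is a sketch on a few hypothetical rows that have the same two columns as `yvr_venues`.

```python
import pandas as pd

# A few hypothetical venue rows with the same columns as yvr_venues
venues = pd.DataFrame({
    "Neighborhood": ["Shaughnessy", "Shaughnessy", "Shaughnessy", "Shaughnessy",
                     "Kerrisdale", "Kerrisdale"],
    "Venue Category": ["Park", "Garden", "Park", "Coffee Shop",
                       "Bus Stop", "Café"],
})

# normalize="index" divides each row by its total, giving the share of each
# category within a neighborhood (the same numbers as one-hot + groupby mean)
freq = pd.crosstab(venues["Neighborhood"], venues["Venue Category"],
                   normalize="index")
print(freq.round(2))
```

On the notebook's data this would be `pd.crosstab(yvr_venues['Neighborhood'], yvr_venues['Venue Category'], normalize='index')`. Because the neighborhoods become the index, the column-reordering step is no longer needed.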


```python
num_top_venues = 5

# print the 5 most frequent venue categories for each neighborhood
for hood in vancouver_grouped['Neighborhood']:
    print("----"+hood+"----")
    # turn this neighborhood's row into a two-column (venue, freq) table
    temp = vancouver_grouped[vancouver_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue', 'freq']
    temp = temp.iloc[1:]  # drop the row that holds the neighborhood name
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
```

```
----Dunbar-Southlands----
           venue  freq
0  Grocery Store  0.15
1           Café  0.10
2   Liquor Store  0.10
3    Coffee Shop  0.10
4            Gym  0.10


----Kerrisdale----
                venue  freq
0            Bus Stop  0.15
1  Chinese Restaurant  0.10
2                Café  0.10
3         Golf Course  0.10
4       Grocery Store  0.10


----Shaughnessy----
             venue  freq
0           Garden  0.25
1             Park  0.25
2  Bubble Tea Shop  0.12
3     Burger Joint  0.12
4      Coffee Shop  0.12


----South Cambie----
              venue  freq
0       Coffee Shop  0.16
1            Garden  0.11
2              Park  0.11
3  Sushi Restaurant  0.05
4              Bank  0.05


----West Point Grey----
              venue  freq
0  Sushi Restaurant  0.07
1              Bank  0.07
2             Beach  0.07
3              Park  0.07
4   Harbor / Marina  0.07
```
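Several categories share the same frequency, such as the five entries at 0.07 for West Point Grey. `sort_values` uses a non-stable sort by default, so tied venues can come out in different orders in different outputs. For example, West Point Grey's tie is listed as Sushi Restaurant, Bank, Beach, Park in the printout above, but as Sushi Restaurant, Park, Bank, Beach in the top-10 table below. A minimal sketch that breaks ties alphabetically, assuming alphabetical order is an acceptable tie-breaker, on a hypothetical frequency row:

```python
import pandas as pd

# Hypothetical frequency row in which several categories tie
row = pd.Series({"Park": 0.07, "Bank": 0.07, "Sushi Restaurant": 0.07,
                 "Beach": 0.03, "Gym": 0.03})

# Sort by frequency (descending), then by category name (ascending), so
# tied categories always come out in the same, alphabetical order
ranked = (row.rename("freq").rename_axis("venue").reset_index()
          .sort_values(["freq", "venue"], ascending=[False, True])
          .reset_index(drop=True))
print(ranked.head(3))
```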




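```python
# Return the num_top_venues most frequent categories in one row of vancouver_grouped
def return_most_common_venues(row, num_top_venues):
    # skip the first value (the neighborhood name) and sort the frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]
```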

```python
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create column names according to the number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # from the 4th position on, fall back to the "th" suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe with the neighborhood names
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = vancouver_grouped['Neighborhood']

# fill each row with that neighborhood's top venue categories
for ind in np.arange(vancouver_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(vancouver_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()
```

|  | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Dunbar-Southlands | Grocery Store | Coffee Shop | Gym | Liquor Store | Café | Baseball Field | Sandwich Place | Restaurant | Pub | Gym / Fitness Center |
| 1 | Kerrisdale | Bus Stop | Café | Golf Course | Chinese Restaurant | Grocery Store | Bubble Tea Shop | Supermarket | Pool | Bakery | Pizza Place |
| 2 | Shaughnessy | Garden | Park | Coffee Shop | Bubble Tea Shop | Burger Joint | Event Space | Dessert Shop | Gym | Grocery Store | Greek Restaurant |
| 3 | South Cambie | Coffee Shop | Garden | Park | Sushi Restaurant | Bank | Chinese Restaurant | Sandwich Place | Outdoor Sculpture | Bubble Tea Shop | Café |
| 4 | West Point Grey | Sushi Restaurant | Park | Bank | Beach | Harbor / Marina | Department Store | Irish Pub | Hostel | Gym | Garden |
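The `indicators` list with `try`/`except` produces correct labels from 1 to 10. Past that, it would label 21 as "21th". If more venues were requested, a small helper could apply the general English rule instead. The helper name `ordinal` is hypothetical.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    # 11, 12 and 13 (and 111, 112, ...) take "th" despite ending in 1, 2, 3
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue"
                              for i in range(1, 11)]
print(columns[1], columns[10])  # 1st Most Common Venue 10th Most Common Venue
```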


## Results and Discussion <a name="results"></a>

##### After processing our data, we found that only 3 of the 5 highest-income neighborhoods in Vancouver have Japanese or sushi restaurants. In those neighborhoods, sushi restaurants are tied for the most common type of venue in West Point Grey and are the 4th most common venue in South Cambie.

##### Shaughnessy, the highest-income neighborhood, has no Japanese or sushi restaurants. It also has the fewest restaurants among its most common venues, which suggests a good business opportunity, although Foursquare returned only 8 venues for this neighborhood.
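The claim about restaurant presence could be checked directly on `yvr_venues` by computing the share of restaurant categories in each neighborhood. This is a minimal sketch, assuming a keyword rule: a category counts as a restaurant if its name contains "Restaurant". The rule misses categories such as "Burger Joint" or "Pizza Place". The helper `restaurant_share` and the sample rows are hypothetical. On the notebook's data the call would be `restaurant_share(yvr_venues)`. No results are shown for that call because it was not run in this analysis.

```python
import pandas as pd


def restaurant_share(venues):
    """Share of each neighborhood's venues whose category contains
    "Restaurant" (a simple keyword rule), sorted from lowest to highest."""
    is_restaurant = venues["Venue Category"].str.contains("Restaurant")
    return is_restaurant.groupby(venues["Neighborhood"]).mean().sort_values()


# Hypothetical rows with the same columns as yvr_venues
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Garden", "Sushi Restaurant", "Coffee Shop",
                       "Chinese Restaurant", "Pizza Place"],
})
print(restaurant_share(venues))
```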


##### It would be useful to understand the reasons behind this distribution of venues. Some neighborhoods are mostly residential and have smaller commercial areas, which may mean that commercial rent is higher there. Adding commercial rent prices to the study would bring more information into the decision and support a better recommendation for a new Japanese restaurant.

##### This type of analysis helps uncover patterns that are not visible while walking through the areas. In a city like Vancouver it is easy to conclude that there are Japanese restaurants everywhere. Their distribution does not seem even, but it is hard to draw conclusions without this kind of process.


## Conclusion <a name="conclusion"></a>

##### Our recommendation is to open the Japanese restaurant in either Shaughnessy or Kerrisdale. Shaughnessy is the highest-income neighborhood and has no sushi or Japanese restaurants. Kerrisdale is the 4th highest-income neighborhood in Vancouver and also has no Japanese or sushi restaurants, but it has more restaurants than Shaughnessy.
