# Restaurant Setup Project

## Introduction/Business Problem

The goal of this project is to help someone who wants to open a new restaurant in the city of Toronto but has no idea which type of restaurant is the most popular. The project also shows where existing competitors are located on the map and how they are clustered, to help choose the best place for a new restaurant.

## Data Section

The data needed for this project includes:
1. Information on Toronto's neighborhoods (postal code, borough and neighborhood name).
2. The latitude and longitude of each Toronto neighborhood.
3. The venues in each neighborhood, with their categories.
4. A map on which to render the final results.

## Methodology

This project uses K-Means clustering to group neighborhoods according to the types of restaurants each neighborhood contains. The resulting clusters then help identify the best type of restaurant to open in a selected neighborhood.
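To make the idea concrete, here is a minimal sketch of the K-Means loop (Lloyd's algorithm: assign each point to its nearest center, then move each center to the mean of its members) in plain NumPy. It runs on a made-up table in which each row is a neighborhood and each column is the share of one restaurant type. The notebook itself uses scikit-learn's `KMeans`; the farthest-point initialisation, the function name `kmeans` and the sample numbers here are assumptions chosen for illustration only.

```python
import numpy as np


def kmeans(X, k, n_iter=100):
    """Minimal Lloyd's algorithm with deterministic farthest-point initialisation.

    This sketch does not handle a cluster becoming empty.
    """
    X = np.asarray(X, dtype=float)
    # start from the first row, then repeatedly add the row farthest from all chosen centers
    centers = [X[0]]
    for _ in range(1, k):
        dists = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dists)])
    centers = np.array(centers)

    for _ in range(n_iter):
        # assignment step: each neighborhood joins its nearest center
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(distances, axis=1)
        # update step: each center moves to the mean of its members
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# hypothetical shares of (Italian, Japanese, Sushi) restaurants in five neighborhoods
shares = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.4, 0.6],
    [0.0, 0.0, 1.0],
])
labels, centers = kmeans(shares, k=3)
print(labels)  # [0 0 2 2 1]
```

Rows with similar restaurant mixes (the two Italian-heavy rows, the two Japanese/Sushi rows) end up sharing a label. The label numbers themselves are arbitrary, so a cluster "number" only has meaning within one particular run.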

## Creating Data Frame

In [1]:
# import the library we use to open URLs
import urllib.request

!conda install -c conda-forge geopy --yes # install geopy (skip if it is already installed)
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas DataFrame (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # install folium 0.5.0 (skip if it is already installed)
import folium # map rendering library

import pandas as pd
import numpy as np


In [2]:
# specify which URL/web page we are going to be scraping
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

In [3]:
# open the url using urllib.request and put the HTML into the page variable
page = urllib.request.urlopen(url)

In [4]:
%pip install beautifulsoup4



In [5]:
from bs4 import BeautifulSoup

In [6]:
# parse the HTML from our URL into the BeautifulSoup parse tree format
soup = BeautifulSoup(page, "html.parser")

In [7]:
right_table=soup.find('table', class_='wikitable sortable')

In [8]:
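postal_codes = []
boroughs = []
neighborhoods = []

# header rows use <th>, so only data rows have exactly three <td> cells:
# postal code, borough, neighborhood
for row in right_table.find_all('tr'):
    cells = row.find_all('td')
    if len(cells) == 3:
        postal_codes.append(cells[0].text.strip())
        boroughs.append(cells[1].text.strip())
        neighborhoods.append(cells[2].text.strip())

In [9]:
toront_df = pd.DataFrame({'Postal Code': postal_codes,
                          'Borough': boroughs,
                          'Neighborhood': neighborhoods})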

In [10]:
toront_df.shape

(180, 3)

In [11]:
# drop postal codes that have no borough assigned
new_df = toront_df[~toront_df['Borough'].str.contains('Not assigned')]
new_df

|     | Postal Code | Borough | Neighborhood |
|-----|-------------|---------|--------------|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 5 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| ... | ... | ... | ... |
| 160 | M8X | Etobicoke | The Kingsway, Montgomery Road, Old Mill North |
| 165 | M4Y | Downtown Toronto | Church and Wellesley |
| 168 | M7Y | East Toronto | Business reply mail Processing Centre, South C... |
| 169 | M8Y | Etobicoke | Old Mill South, King's Mill Park, Sunnylea, Hu... |


In [12]:
new_df.shape

(103, 3)

## Creating Coordinates Data Frame

In [13]:
coordinates_df = pd.read_csv('http://cocl.us/Geospatial_data')
coordinates_df.head()

|   | Postal Code | Latitude | Longitude |
|---|-------------|----------|-----------|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [14]:
merged_df = new_df.join(coordinates_df.set_index('Postal Code'), on='Postal Code')

In [15]:
merged_df.head()

|   | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|-------------|---------|--------------|----------|-----------|
| 2 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 3 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 5 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
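The two steps above (dropping postal codes with no borough, then attaching coordinates by postal code) can be checked on a tiny hand-made example. The rows and coordinates below are invented for illustration; only the column names match the notebook.

```python
import pandas as pd

# made-up scrape result: one postal code has no borough assigned
postal = pd.DataFrame({
    'Postal Code': ['M1A', 'M3A', 'M5A'],
    'Borough': ['Not assigned', 'North York', 'Downtown Toronto'],
    'Neighborhood': ['Not assigned', 'Parkwoods', 'Regent Park, Harbourfront'],
})
# made-up coordinates keyed by postal code, deliberately listed in a different order
coords = pd.DataFrame({
    'Postal Code': ['M5A', 'M3A', 'M1A'],
    'Latitude': [43.65, 43.75, 43.80],
    'Longitude': [-79.36, -79.33, -79.20],
})

# keep only rows whose borough is assigned (exact match, unlike str.contains)
assigned = postal[postal['Borough'] != 'Not assigned']

# attach coordinates by matching the 'Postal Code' column against coords' index
merged = assigned.join(coords.set_index('Postal Code'), on='Postal Code')
print(merged)
```

Because `join(..., on='Postal Code')` matches by key rather than by row position, the order of the coordinates file does not matter.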


## Explore neighborhoods in Toronto

In [16]:
merged_df.Borough.unique()

array(['North York', 'Downtown Toronto', 'Etobicoke', 'Scarborough',
       'East York', 'York', 'East Toronto', 'West Toronto',
       'Central Toronto', 'Mississauga'], dtype=object)

In [17]:
toronto_data = merged_df[merged_df['Borough'].str.contains('Toronto')].reset_index(drop=True)
toronto_data.drop('Postal Code', axis=1, inplace=True)
toronto_data.set_index('Borough', inplace=True)
toronto_data.head()

| Borough | Neighborhood | Latitude | Longitude |
|---------|--------------|----------|-----------|
| Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
| Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 |
| Downtown Toronto | St. James Town | 43.651494 | -79.375418 |
| East Toronto | The Beaches | 43.676357 | -79.293031 |


#### Define Foursquare Credentials and Version

In [18]:
# replace with your own Foursquare credentials (never publish or print real secrets)
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version


In [19]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query the Foursquare explore endpoint around each neighborhood centre."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # query parameters: up to 100 venues within `radius` metres of the centre
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': 100,
        }

        # make the GET request and fail loudly on HTTP errors
        response = requests.get('https://api.foursquare.com/v2/venues/explore', params=params)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']

        # keep only relevant information; skip venues that have no category
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results if v['venue']['categories']])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues
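The list comprehension inside `getNearbyVenues` depends on the nested layout of the explore response (`response` → `groups[0]` → `items` → `venue`). The sketch below runs the same flattening on a hand-written, heavily trimmed response so it can be tried offline. The sample dictionary is an assumption modelled only on the keys the function reads (it is not a real API reply), and `flatten_items` is a hypothetical helper name. Venues with an empty category list are skipped so that `categories[0]` cannot raise `IndexError`.

```python
import pandas as pd

# hypothetical, trimmed-down explore response containing only the keys getNearbyVenues reads
sample_response = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Izumi',
                   'location': {'lat': 43.64997, 'lng': -79.360153},
                   'categories': [{'name': 'Asian Restaurant'}]}},
        {'venue': {'name': 'Unlabelled Spot',
                   'location': {'lat': 43.651, 'lng': -79.361},
                   'categories': []}},  # no category: skipped below
    ]}]}
}


def flatten_items(name, lat, lng, response_json):
    """Hypothetical helper: turn one explore response into venue rows."""
    rows = []
    for item in response_json['response']['groups'][0]['items']:
        venue = item['venue']
        if not venue['categories']:
            continue  # avoid IndexError on venues with no category
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     venue['categories'][0]['name']))
    return rows


rows = flatten_items('Regent Park, Harbourfront', 43.65426, -79.360636, sample_response)
venues = pd.DataFrame(rows, columns=['Neighborhood', 'Neighborhood Latitude',
                                     'Neighborhood Longitude', 'Venue',
                                     'Venue Latitude', 'Venue Longitude',
                                     'Venue Category'])
print(venues)
```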

In [20]:
toronto_venues = getNearbyVenues(names=toronto_data['Neighborhood'],
                                   latitudes=toronto_data['Latitude'],
                                   longitudes=toronto_data['Longitude']
                                  )

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
The Danforth West, Riverdale
Toronto Dominion Centre, Design Exchange
Brockton, Parkdale Village, Exhibition Place
India Bazaar, The Beaches West
Commerce Court, Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North & West, Forest Hill Road Park
High Park, The Junction South
North Toronto West,  Lawrence Park
The Annex, North Midtown, Yorkville
Parkdale, Roncesvalles
Davisville
University of Toronto, Harbord
Runnymede, Swansea
Moore Park, Summerhill East
Kensington Market, Chinatown, Grange Park
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport


## Explore restaurants in Toronto

In [21]:
toronto_restaurants = toronto_venues[toronto_venues['Venue Category'].str.contains('Restaurant')].reset_index(drop=True)
toronto_restaurants.head()

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Regent Park, Harbourfront | 43.65426 | -79.360636 | Impact Kitchen | 43.656369 | -79.35698 | Restaurant |
| 1 | Regent Park, Harbourfront | 43.65426 | -79.360636 | Cluny Bistro & Boulangerie | 43.650565 | -79.357843 | French Restaurant |
| 2 | Regent Park, Harbourfront | 43.65426 | -79.360636 | El Catrin | 43.650601 | -79.35892 | Mexican Restaurant |
| 3 | Regent Park, Harbourfront | 43.65426 | -79.360636 | Izumi | 43.64997 | -79.360153 | Asian Restaurant |
| 4 | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 | Tokyo Sushi | 43.665885 | -79.386977 | Sushi Restaurant |


## Analyze Each Neighborhood

In [22]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_restaurants[['Venue Category']], prefix="", prefix_sep="", dtype=int)

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_restaurants['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

|   | Neighborhood | American Restaurant | Asian Restaurant | Belgian Restaurant | Brazilian Restaurant | Cajun / Creole Restaurant | Caribbean Restaurant | Chinese Restaurant | Colombian Restaurant | Comfort Food Restaurant | ... | New American Restaurant | Ramen Restaurant | Restaurant | Seafood Restaurant | Sushi Restaurant | Taiwanese Restaurant | Thai Restaurant | Theme Restaurant | Vegetarian / Vegan Restaurant | Vietnamese Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Regent Park, Harbourfront | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Regent Park, Harbourfront | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Regent Park, Harbourfront | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Regent Park, Harbourfront | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Queen's Park, Ontario Provincial Government | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |


### Number of Restaurants in Each Neighborhood

In [23]:
Most_res = toronto_onehot.groupby('Neighborhood').sum()
Most_res['sum'] = Most_res.sum(axis=1)
Most_res.sort_values('sum', ascending=False, inplace=True)
Most_res.head()

| Neighborhood | American Restaurant | Asian Restaurant | Belgian Restaurant | Brazilian Restaurant | Cajun / Creole Restaurant | Caribbean Restaurant | Chinese Restaurant | Colombian Restaurant | Comfort Food Restaurant | Cuban Restaurant | ... | Ramen Restaurant | Restaurant | Seafood Restaurant | Sushi Restaurant | Taiwanese Restaurant | Thai Restaurant | Theme Restaurant | Vegetarian / Vegan Restaurant | Vietnamese Restaurant | sum |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| First Canadian Place, Underground city | 3 | 3 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 4 | 3 | 2 | 0 | 2 | 0 | 1 | 0 | 30 |
| Commerce Court, Victoria Hotel | 4 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 7 | 3 | 0 | 0 | 2 | 0 | 2 | 0 | 29 |
| Toronto Dominion Centre, Design Exchange | 3 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 3 | 3 | 1 | 0 | 0 | 0 | 1 | 0 | 24 |
| St. James Town | 3 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 5 | 2 | 0 | 0 | 1 | 0 | 1 | 0 | 24 |
| Richmond, Adelaide, King | 2 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 4 | 1 | 2 | 0 | 3 | 0 | 1 | 0 | 23 |
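One-hot encoding turns each venue into a row of 0/1 flags, so summing the flags per neighborhood gives restaurant counts (as in `Most_res`), while averaging them gives each category's share of that neighborhood's restaurants (as in `toronto_res_grouped` further down). A toy example with invented venues shows the difference:

```python
import pandas as pd

# invented venues: two neighborhoods, three restaurants each
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Venue Category': ['Italian Restaurant', 'Italian Restaurant', 'Sushi Restaurant',
                       'Sushi Restaurant', 'Thai Restaurant', 'Thai Restaurant'],
})

# one 0/1 column per category, with the neighborhood as the first column
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

counts = onehot.groupby('Neighborhood').sum()   # how many restaurants of each type
shares = onehot.groupby('Neighborhood').mean()  # fraction of restaurants of each type
print(counts)
print(shares.round(3))
```

This distinction matters when reading the neighborhood rankings later: a share of 1.0 means every counted restaurant in that neighborhood is of that type, even if there is only one such restaurant.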


### Top 10 Restaurant Types in Toronto

In [24]:
# average share of each restaurant category across all restaurant venues
toronto_top_10_res = toronto_onehot.drop('Neighborhood', axis=1).mean().reset_index(name='score')

# delete the ambiguous generic "Restaurant" category
delete_row = toronto_top_10_res[toronto_top_10_res['index'] == 'Restaurant'].index
top_10 = toronto_top_10_res.drop(delete_row)
top_10 = top_10.sort_values('score', ascending=False).head(10)

# label the ten categories "Top 1" ... "Top 10" and use that rank as the index
top_10['Rank'] = ['Top {}'.format(i + 1) for i in range(len(top_10))]
top_10 = top_10.rename(columns={'index': 'Venue Category'}).set_index('Rank')[['Venue Category']]
top_10

| Rank | Venue Category |
|------|----------------|
| Top 1 | Italian Restaurant |
| Top 2 | Japanese Restaurant |
| Top 3 | Seafood Restaurant |
| Top 4 | American Restaurant |
| Top 5 | Sushi Restaurant |
| Top 6 | Thai Restaurant |
| Top 7 | Vegetarian / Vegan Restaurant |
| Top 8 | Greek Restaurant |
| Top 9 | Fast Food Restaurant |
| Top 10 | Asian Restaurant |
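Because every venue has exactly one category, the column means of the one-hot table equal `value_counts(normalize=True)` on the category column. A shorter version of the top-N ranking could therefore look like the sketch below. The toy categories are invented, `top_categories` is a hypothetical helper, and `n` is a parameter rather than the fixed 10 used above. As in the notebook, shares are computed before the generic "Restaurant" category is dropped.

```python
import pandas as pd

# invented venue categories standing in for toronto_restaurants['Venue Category']
categories = pd.Series(['Italian Restaurant', 'Italian Restaurant', 'Italian Restaurant',
                        'Japanese Restaurant', 'Japanese Restaurant',
                        'Restaurant', 'Restaurant', 'Restaurant', 'Restaurant',
                        'Thai Restaurant'])


def top_categories(categories, n):
    """Share of each category, most common first, excluding the generic 'Restaurant'."""
    shares = categories.value_counts(normalize=True)  # sorted in descending order
    shares = shares.drop('Restaurant', errors='ignore')
    return shares.head(n)


top = top_categories(categories, n=2)
print(top)
```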


In [25]:
# keep only venues whose category is one of the top 10
toronto_restaurants_top_10 = toronto_restaurants[toronto_restaurants['Venue Category'].isin(top_10['Venue Category'])]
toronto_restaurants_top_10.head()

|    | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 3 | Regent Park, Harbourfront | 43.65426 | -79.360636 | Izumi | 43.64997 | -79.360153 | Asian Restaurant |
| 4 | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 | Tokyo Sushi | 43.665885 | -79.386977 | Sushi Restaurant |
| 7 | Garden District, Ryerson | 43.657162 | -79.378937 | Salad King | 43.657601 | -79.38162 | Thai Restaurant |
| 9 | Garden District, Ryerson | 43.657162 | -79.378937 | Crepe Delicious | 43.654536 | -79.380889 | Fast Food Restaurant |
| 10 | Garden District, Ryerson | 43.657162 | -79.378937 | Kinka Izakaya Original | 43.660596 | -79.378891 | Japanese Restaurant |


In [26]:
# one hot encoding
toronto_onehot_top_10 = pd.get_dummies(toronto_restaurants_top_10[['Venue Category']], prefix="", prefix_sep="", dtype=int)

# add neighborhood column back to dataframe
toronto_onehot_top_10['Neighborhood'] = toronto_restaurants_top_10['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns2 = [toronto_onehot_top_10.columns[-1]] + list(toronto_onehot_top_10.columns[:-1])
toronto_onehot_top_10 = toronto_onehot_top_10[fixed_columns2]

# group rows by neighborhood and take the mean frequency of occurrence of each category
toronto_res_grouped = toronto_onehot_top_10.groupby('Neighborhood').mean().reset_index()

toronto_onehot_top_10.head()

|    | Neighborhood | American Restaurant | Asian Restaurant | Fast Food Restaurant | Greek Restaurant | Italian Restaurant | Japanese Restaurant | Seafood Restaurant | Sushi Restaurant | Thai Restaurant | Vegetarian / Vegan Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Regent Park, Harbourfront | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Queen's Park, Ontario Provincial Government | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 7 | Garden District, Ryerson | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 9 | Garden District, Ryerson | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 10 | Garden District, Ryerson | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |


In [27]:
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="t_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [28]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

## Function to map the locations of one restaurant type in Toronto

In [29]:
def getTorontoTopRestaurant(df, restaurant):
    # create map of Toronto using latitude and longitude values
    address = 'Toronto, Canada'
    geolocator = Nominatim(user_agent="t_explorer")
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
    map_top = folium.Map(location=[latitude, longitude], zoom_start=11)

    top_res = df.loc[df['Venue Category'] == restaurant]
    
    # add markers to map
    for lat, lng, label in zip(top_res['Venue Latitude'], top_res['Venue Longitude'], top_res['Venue']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(map_top)  

    return map_top

In [30]:
getTorontoTopRestaurant(toronto_restaurants_top_10, 'Italian Restaurant')

In [31]:
ita_res = toronto_res_grouped.sort_values('Italian Restaurant', ascending=False)
ita_res.head()

|    | Neighborhood | American Restaurant | Asian Restaurant | Fast Food Restaurant | Greek Restaurant | Italian Restaurant | Japanese Restaurant | Seafood Restaurant | Sushi Restaurant | Thai Restaurant | Vegetarian / Vegan Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 17 | Parkdale, Roncesvalles | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Brockton, Parkdale Village, Exhibition Place | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Christie | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 23 | St. James Town, Cabbagetown | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | 0.25 | 0.0 | 0.0 | 0.25 | 0.0 |
| 21 | Runnymede, Swansea | 0.0 | 0.0 | 0.0 | 0.0 | 0.4 | 0.0 | 0.0 | 0.4 | 0.0 | 0.2 |


In [32]:
getTorontoTopRestaurant(toronto_restaurants_top_10, 'Japanese Restaurant')

In [33]:
Jap_res = toronto_res_grouped.sort_values('Japanese Restaurant', ascending=False)
Jap_res.head()

|    | Neighborhood | American Restaurant | Asian Restaurant | Fast Food Restaurant | Greek Restaurant | Italian Restaurant | Japanese Restaurant | Seafood Restaurant | Sushi Restaurant | Thai Restaurant | Vegetarian / Vegan Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 31 | University of Toronto, Harbord | 0.0 | 0.0 | 0.0 | 0.0 | 0.25 | 0.5 | 0.0 | 0.25 | 0.0 | 0.0 |
| 5 | Church and Wellesley | 0.090909 | 0.0 | 0.090909 | 0.0 | 0.0 | 0.363636 | 0.0 | 0.363636 | 0.090909 | 0.0 |
| 10 | Garden District, Ryerson | 0.0 | 0.0 | 0.2 | 0.0 | 0.3 | 0.3 | 0.1 | 0.0 | 0.1 | 0.0 |
| 14 | Kensington Market, Chinatown, Grange Park | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.25 | 0.0 | 0.0 | 0.0 | 0.75 |
| 3 | Central Bay Street | 0.0 | 0.0 | 0.0 | 0.0 | 0.333333 | 0.25 | 0.083333 | 0.083333 | 0.166667 | 0.083333 |


In [34]:
getTorontoTopRestaurant(toronto_restaurants_top_10, 'Sushi Restaurant')

In [35]:
Sus_res = toronto_res_grouped.sort_values('Sushi Restaurant', ascending=False)
Sus_res.head()

|    | Neighborhood | American Restaurant | Asian Restaurant | Fast Food Restaurant | Greek Restaurant | Italian Restaurant | Japanese Restaurant | Seafood Restaurant | Sushi Restaurant | Thai Restaurant | Vegetarian / Vegan Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 18 | Queen's Park, Ontario Provincial Government | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 9 | Forest Hill North & West, Forest Hill Road Park | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 26 | Summerhill West, Rathnelly, South Hill, Forest... | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 |
| 21 | Runnymede, Swansea | 0.0 | 0.0 | 0.0 | 0.0 | 0.4 | 0.0 | 0.0 | 0.4 | 0.0 | 0.2 |
| 5 | Church and Wellesley | 0.090909 | 0.0 | 0.090909 | 0.0 | 0.0 | 0.363636 | 0.0 | 0.363636 | 0.090909 | 0.0 |
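The three cells above repeat one pattern: sort the per-neighborhood shares by one category column and look at the head. A small helper could replace them; `top_neighborhoods` is a hypothetical name, and it is shown here on invented data with the same column layout as `toronto_res_grouped`.

```python
import pandas as pd

# invented per-neighborhood shares with the same layout as toronto_res_grouped
grouped = pd.DataFrame({
    'Neighborhood': ['Neighborhood A', 'Neighborhood B', 'Neighborhood C'],
    'Italian Restaurant': [1.0, 0.0, 0.4],
    'Sushi Restaurant': [0.0, 0.363636, 0.4],
})


def top_neighborhoods(grouped, category, n=5):
    """Neighborhoods with the largest share of `category`, highest first."""
    ranked = grouped.sort_values(category, ascending=False)
    return ranked[['Neighborhood', category]].head(n).reset_index(drop=True)


result = top_neighborhoods(grouped, 'Sushi Restaurant', n=2)
print(result)
```

With the notebook's data, `top_neighborhoods(toronto_res_grouped, 'Italian Restaurant')` would play the role of `ita_res.head()`, limited to the neighborhood and the one column of interest.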


## Cluster Neighborhoods by Top 10 Restaurant Types

In [40]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_res_grouped.drop('Neighborhood', axis=1)

# run k-means clustering (n_init=10 matches the older scikit-learn default)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 3, 4, 0, 3, 1, 0, 0, 0, 1], dtype=int32)

In [41]:
# create new toronto data (only restaurants)
toronto_restaurants_top_10.head()

|    | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 3 | Regent Park, Harbourfront | 43.65426 | -79.360636 | Izumi | 43.64997 | -79.360153 | Asian Restaurant |
| 4 | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 | Tokyo Sushi | 43.665885 | -79.386977 | Sushi Restaurant |
| 7 | Garden District, Ryerson | 43.657162 | -79.378937 | Salad King | 43.657601 | -79.38162 | Thai Restaurant |
| 9 | Garden District, Ryerson | 43.657162 | -79.378937 | Crepe Delicious | 43.654536 | -79.380889 | Fast Food Restaurant |
| 10 | Garden District, Ryerson | 43.657162 | -79.378937 | Kinka Izakaya Original | 43.660596 | -79.378891 | Japanese Restaurant |


In [42]:
# keep one row per neighborhood with its centre coordinates
new_toronto_data = toronto_restaurants_top_10.drop(columns=['Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category'])
ntd = new_toronto_data.groupby('Neighborhood').first()

In [43]:
toronto_res_grouped.insert(0, 'Cluster Labels', kmeans.labels_)

In [44]:
toronto_merged = ntd
toronto_merged = toronto_merged.join(toronto_res_grouped.set_index('Neighborhood'), on='Neighborhood')

In [45]:
toronto_merged.reset_index(inplace=True)
toronto_merged.head()

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Cluster Labels | American Restaurant | Asian Restaurant | Fast Food Restaurant | Greek Restaurant | Italian Restaurant | Japanese Restaurant | Seafood Restaurant | Sushi Restaurant | Thai Restaurant | Vegetarian / Vegan Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Berczy Park | 43.644771 | -79.373306 | 0 | 0.0 | 0.0 | 0.0 | 0.142857 | 0.142857 | 0.142857 | 0.285714 | 0.0 | 0.142857 | 0.142857 |
| 1 | Brockton, Parkdale Village, Exhibition Place | 43.636847 | -79.428191 | 3 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Business reply mail Processing Centre, South C... | 43.662744 | -79.321558 | 4 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Central Bay Street | 43.657952 | -79.387383 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.333333 | 0.25 | 0.083333 | 0.083333 | 0.166667 | 0.083333 |
| 4 | Christie | 43.669542 | -79.422564 | 3 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
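Since the cluster numbers K-Means assigns are arbitrary, it helps to describe each cluster by the average restaurant mix of its members before interpreting it. The sketch below does that on invented rows laid out like `toronto_merged`; the data and the variable names `profile` and `dominant` are assumptions for illustration.

```python
import pandas as pd

# invented rows with the same kind of columns as toronto_merged
merged = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D'],
    'Cluster Labels': [3, 3, 1, 1],
    'Italian Restaurant': [1.0, 1.0, 0.0, 0.25],
    'Sushi Restaurant': [0.0, 0.0, 1.0, 0.5],
})

# mean share of each restaurant type per cluster
profile = merged.drop(columns='Neighborhood').groupby('Cluster Labels').mean()
# the restaurant type with the largest average share in each cluster
dominant = profile.idxmax(axis=1)
print(profile)
print(dominant)
```

On the notebook's data, the same two lines applied to `toronto_merged` (after also dropping the latitude and longitude columns) would summarise what each of the five clusters has in common.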


In [46]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster label (labels run from 0 to kclusters - 1)
for lat, lon, poi, cluster in zip(toronto_merged['Neighborhood Latitude'], toronto_merged['Neighborhood Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

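## Results

1. According to this analysis, Italian and Japanese restaurants are the two most common restaurant types among the Foursquare venues found in Toronto, followed by Seafood, American and Sushi restaurants. The neighborhood-level analysis focuses on the Italian, Japanese and Sushi categories.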

2. Parkdale, Roncesvalles; Brockton, Parkdale Village, Exhibition Place; and Christie are the neighborhoods where Italian restaurants make up the largest share of the top-10-type restaurants (a share of 1.0). All three fall in cluster 3.

3. University of Toronto, Harbord and Church and Wellesley are the neighborhoods with the largest share of Japanese restaurants (0.5 and 0.36 respectively). Church and Wellesley falls in cluster 1.

4. Forest Hill North & West, Forest Hill Road Park; Queen's Park, Ontario Provincial Government; and Summerhill West, Rathnelly, South Hill are the neighborhoods with the largest share of Sushi restaurants. Forest Hill North & West and Queen's Park both fall in cluster 1.

## Discussion

Based on this analysis, I would first recommend three neighborhoods to someone looking for a place to open a new restaurant in Toronto: First Canadian Place, Underground city; Commerce Court, Victoria Hotel; and Toronto Dominion Centre, Design Exchange. These neighborhoods have the most restaurants within 500 m of their centre, which suggests a large concentration of people looking for food.

Once the location is chosen, it is also worth considering which restaurant types are most common in Toronto. According to the analysis, Italian and Japanese restaurants are the most common, followed by Seafood, American and Sushi restaurants.

## Conclusion

In conclusion, this analysis gives an investor a clearer picture of which restaurant types are most common in Toronto and where restaurants are concentrated, which helps in choosing a location for a new restaurant. Furthermore, the same project design, applied to a different venue category, could be used to study other types of venues for other kinds of investors.