# Capstone Project: The Battle of the Neighborhoods

### Table of contents

* [Introduction](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Results](#analysis)
* [Discussion](#results)
* [Conclusion](#conclusion)

## 1. Introduction

Toronto is Canada's largest and most populous city, home to a diverse population of about 2.9 million people.
The city is ranked as one of the top destinations in the world. It boasts world-class restaurants and cultural attractions as varied as its many cultures.
Moreover, Toronto is recognized as Canada's commercial capital and for its excellence in a number of sectors, including life sciences, technology, and education. These opportunities attract investors from all around the world.

A group of stakeholders intend to open a **Chinese restaurant** in **downtown Toronto**, the main central business district of Toronto.

Location will have a major impact on the restaurant's success. We are particularly interested in:
1) **areas with no Chinese restaurants;**
2) **areas that are not crowded with restaurants.**

## 2. Data

### 2.1 Data sources

Information about Toronto's neighborhoods can be found on a Wikipedia page. A table on this page lists the postal code, borough, and neighborhood name.

In week 3, the course provides a link to a CSV file from which we can conveniently obtain the geographical coordinates of each postal code.

We use the Foursquare API to get information about the venues in each neighborhood.

### 2.2 Data cleaning

#### Get the neighborhood information from the Wikipedia page.

First, we use pandas (`pd.read_html`, which uses the lxml parser by default) to scrape the table from the Wikipedia page.

In [1]:
import pandas as pd

In [2]:
#install lxml
! pip install lxml
print('successfully installed the package!')

successfully installed the package!


In [3]:
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
df=pd.read_html(url)[0]
df.head()

|   | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |


Next, we delete the rows whose Borough value is 'Not assigned', and check for rows whose Neighbourhood value is 'Not assigned'.

In [4]:
df.drop(df[df['Borough']=='Not assigned'].index,inplace=True)

In [5]:
df[df['Neighbourhood']=='Not assigned'].count()

Postal Code      0
Borough          0
Neighbourhood    0
dtype: int64

Every remaining row has a valid Neighbourhood value, so no further processing is needed.
We reset the index, rename 'Neighbourhood' to 'Neighborhood', and print the shape of the DataFrame.

In [6]:
df=df.reset_index(drop=True)
df.rename(columns={'Neighbourhood':'Neighborhood'},inplace=True)
print("The size of the dataframe is",df.shape)

The size of the dataframe is (103, 3)


In [7]:
df.head()

|   | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |


#### Get the geographical information (latitude and the longitude coordinates) of each neighborhood

Load the CSV file of geographical coordinates into a DataFrame.

In [8]:
data='http://cocl.us/Geospatial_data'
df_ll=pd.read_csv(data)
df_ll.head()

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


Join the two DataFrames on 'Postal Code' to get a new DataFrame with the neighborhood information (postal code, borough, and neighborhood) and the geographical coordinates.

In [9]:
df_toronto=pd.merge(df,df_ll,on='Postal Code')
df_toronto.rename(columns={'Postal Code':'PostalCode'},inplace=True)
df_toronto.head()

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
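`pd.merge` performs an inner join by default, so a postal code missing from the coordinate file would be dropped without any warning. A minimal sketch of one way to check for that, using toy frames with hypothetical values (the postal code `M9Z` and its neighborhood are made up): a left join with `indicator=True` flags rows that found no match, and `validate='one_to_one'` raises a `MergeError` if a postal code is duplicated on either side.

```python
import pandas as pd

# Toy stand-ins for the Wikipedia table and the coordinate CSV (hypothetical values)
neighborhoods = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M9Z'],
    'Borough': ['North York', 'North York', 'Etobicoke'],
    'Neighborhood': ['Parkwoods', 'Victoria Village', 'Example Hood'],
})
coords = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# how='left' keeps every neighborhood; indicator=True adds a '_merge' column
# ('both' or 'left_only'); validate raises if a key is not unique on either side
checked = pd.merge(neighborhoods, coords, on='Postal Code', how='left',
                   validate='one_to_one', indicator=True)

unmatched = checked.loc[checked['_merge'] == 'left_only', 'Postal Code'].tolist()
print('Postal codes without coordinates:', unmatched)
```

If `unmatched` is empty, the inner join used in the notebook loses no rows.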


Check the shape of the merged dataframe.

In [10]:
print("The size of the dataframe is",df_toronto.shape)

The size of the dataframe is (103, 5)


#### Get the venues information

Import all packages we need.

In [11]:
import numpy as np # library to handle data in a vectorized manner

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas DataFrame (pandas.io.json.json_normalize is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't installed
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


Our stakeholders intend to open a Chinese restaurant in downtown Toronto, so we slice the merged DataFrame to create a new DataFrame containing only the Downtown Toronto rows.

In [12]:
df_DT = df_toronto[df_toronto['Borough']=='Downtown Toronto']

In [13]:
df_DT=df_DT.reset_index(drop=True)
df_DT.head()

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 1 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
| 2 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 |
| 3 | M5C | Downtown Toronto | St. James Town | 43.651494 | -79.375418 |
| 4 | M5E | Downtown Toronto | Berczy Park | 43.644771 | -79.373306 |


In [14]:
df_DT.shape

(19, 5)

In [15]:
#export as csv
#df_DT.to_csv('DT.csv',index=False)

We use the Foursquare API to get venue information for each neighborhood. The function below requests up to 100 venues within a 500 m radius of each neighborhood's coordinates.

In [16]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (keep real credentials out of shared notebooks)
    CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
    VERSION = '20180605' # Foursquare API version
    LIMIT = 100 # maximum number of venues returned per request
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # query parameters for the venues/explore endpoint
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request; stop on HTTP errors instead of failing later with a KeyError
        response = requests.get('https://api.foursquare.com/v2/venues/explore', params=params)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']

        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
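The function indexes `v['venue']['categories'][0]`, which would raise an `IndexError` for a venue with an empty category list. A sketch of the parsing step pulled out into a pure function so it can be tested without calling the API; the name `parse_explore_items` is hypothetical, and the sample dictionary only mimics the fields this notebook reads from the explore response, not the full payload.

```python
def parse_explore_items(name, lat, lng, items):
    """Turn the 'items' list of an explore response into row tuples.

    Assumes each item has the fields the notebook reads:
    item['venue']['name'], item['venue']['location']['lat'/'lng'] and
    item['venue']['categories'][0]['name']. Venues whose 'categories'
    list is empty are skipped instead of raising IndexError.
    """
    rows = []
    for item in items:
        venue = item['venue']
        if not venue.get('categories'):
            continue  # no category, so nothing to classify
        rows.append((
            name, lat, lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            venue['categories'][0]['name'],
        ))
    return rows


# Hypothetical response fragment shaped like the items the notebook indexes
sample_items = [
    {'venue': {'name': 'Tandem Coffee',
               'location': {'lat': 43.653559, 'lng': -79.361809},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Unlabelled Place',
               'location': {'lat': 43.65, 'lng': -79.36},
               'categories': []}},
]
rows = parse_explore_items('Regent Park, Harbourfront', 43.65426, -79.360636, sample_items)
print(rows)
```

Inside `getNearbyVenues`, the list comprehension could then be replaced by `parse_explore_items(name, lat, lng, results)`.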

In [17]:
#get venues information of all neighborhoods in Downtown Toronto
DT_venues = getNearbyVenues(names=df_DT['Neighborhood'],
                                   latitudes=df_DT['Latitude'],
                                   longitudes=df_DT['Longitude']
                                  )


Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
Stn A PO Boxes
St. James Town, Cabbagetown
First Canadian Place, Underground city
Church and Wellesley


##### **Important reminder** ---

The Foursquare server is not always reliable, and the request sometimes fails to return the DataFrame `DT_venues`. After successfully running the code, I exported `DT_venues` to a CSV file. If you can't get `DT_venues` from the API, please load it from the CSV file instead.

If Foursquare fails to return the data, double-click **here** to see the code for loading the CSV file.

<!-- 
DT_venues=pd.read_csv('https://raw.githubusercontent.com/XM-Shang/Coursera_Capstone_Week3/main/DT_venues.csv')
print('The shape of DT_venues is  ',DT_venues.shape)
DT_venues.head()
--> 

##### ---*End of reminder.*

Let's check the venues dataframe.

In [18]:
print('The shape of DT_venues is  ',DT_venues.shape)
DT_venues.head()

The shape of DT_venues is   (1253, 7)


|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Regent Park, Harbourfront | 43.65426 | -79.360636 | Roselle Desserts | 43.653447 | -79.362017 | Bakery |
| 1 | Regent Park, Harbourfront | 43.65426 | -79.360636 | Tandem Coffee | 43.653559 | -79.361809 | Coffee Shop |
| 2 | Regent Park, Harbourfront | 43.65426 | -79.360636 | Cooper Koo Family YMCA | 43.653249 | -79.358008 | Distribution Center |
| 3 | Regent Park, Harbourfront | 43.65426 | -79.360636 | Impact Kitchen | 43.656369 | -79.35698 | Restaurant |
| 4 | Regent Park, Harbourfront | 43.65426 | -79.360636 | Body Blitz Spa East | 43.654735 | -79.359874 | Spa |


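The API returned 1253 venue records for Downtown Toronto. Because the 500 m search circles of neighboring centers overlap, the same venue can be listed under more than one neighborhood, so this count may include duplicates.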

### 2.3 Feature selection

#### 2.3.1 General view of the features

We care about the category of each venue. Let's check how many unique categories there are in downtown Toronto.

In [19]:
print('There are {} unique categories.'.format(len(DT_venues['Venue Category'].unique())))

There are 214 unique categories.


In [20]:
#collect every unique category into a list (in order of first appearance) and print it
category_list=list(DT_venues['Venue Category'].unique())
print(category_list)

['Bakery', 'Coffee Shop', 'Distribution Center', 'Restaurant', 'Spa', 'Breakfast Spot', 'Park', 'Gym / Fitness Center', 'Pub', 'Historic Site', 'Farmers Market', 'Chocolate Shop', 'Performing Arts Venue', 'Dessert Shop', 'French Restaurant', 'Theater', 'Yoga Studio', 'Mexican Restaurant', 'Café', 'Event Space', 'Shoe Store', 'Art Gallery', 'Electronics Store', 'Brewery', 'Bank', 'Beer Store', 'Hotel', 'Antique Shop', 'Portuguese Restaurant', 'Italian Restaurant', 'Beer Bar', 'Creperie', 'Sushi Restaurant', 'Hobby Shop', 'Diner', 'Fried Chicken Joint', 'Chinese Restaurant', 'Smoothie Shop', 'Sandwich Place', 'Gym', 'Bar', 'College Auditorium', 'College Cafeteria', 'Music Venue', 'Clothing Store', 'Comic Shop', 'Plaza', 'Burrito Place', 'Pizza Place', 'Ramen Restaurant', 'Thai Restaurant', 'Burger Joint', 'Shopping Mall', 'New American Restaurant', 'Bookstore', 'Gastropub', 'Tanning Salon', 'Japanese Restaurant', 'Fast Food Restaurant', 'Steakhouse', 'College Rec Center', 'Middle Eastern

We should focus on the 'Restaurant' category. However, many other categories are also restaurants, just more specific ones, such as 'French Restaurant', 'Mexican Restaurant', 'Portuguese Restaurant', and 'Italian Restaurant'.

These categories should be taken into account as well. We examined every category and extracted those whose name contains 'Restaurant'. We regard them as competitor categories.
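The same filter can be written without a loop using pandas' vectorized string methods. A sketch on a hypothetical sample of the 'Venue Category' column; `regex=False` treats 'Restaurant' as a literal, case-sensitive substring, and `unique()` keeps categories in order of first appearance.

```python
import pandas as pd

# Hypothetical sample of DT_venues['Venue Category']
categories = pd.Series(['Bakery', 'Restaurant', 'French Restaurant',
                        'Coffee Shop', 'Chinese Restaurant', 'French Restaurant'])

# Boolean mask of categories containing the literal substring 'Restaurant'
is_restaurant = categories.str.contains('Restaurant', regex=False)
competitor_category = categories[is_restaurant].unique().tolist()
print(competitor_category)
```

On the real data this would be `DT_venues['Venue Category']` in place of the sample Series.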

In [21]:
import re

In [22]:
#extract all categories whose name contains 'Restaurant'
competitor_category=[category for category in category_list if re.search('Restaurant', category)]

In [23]:
len(competitor_category)

44

In [24]:
print(competitor_category)

['Restaurant', 'French Restaurant', 'Mexican Restaurant', 'Portuguese Restaurant', 'Italian Restaurant', 'Sushi Restaurant', 'Chinese Restaurant', 'Ramen Restaurant', 'Thai Restaurant', 'New American Restaurant', 'Japanese Restaurant', 'Fast Food Restaurant', 'Middle Eastern Restaurant', 'Modern European Restaurant', 'Ethiopian Restaurant', 'Seafood Restaurant', 'Vietnamese Restaurant', 'American Restaurant', 'Latin American Restaurant', 'Vegetarian / Vegan Restaurant', 'German Restaurant', 'Comfort Food Restaurant', 'Asian Restaurant', 'Moroccan Restaurant', 'Belgian Restaurant', 'Greek Restaurant', 'Eastern European Restaurant', 'Falafel Restaurant', 'Indian Restaurant', 'Korean Restaurant', 'Colombian Restaurant', 'Mediterranean Restaurant', 'Brazilian Restaurant', 'Gluten-free Restaurant', 'Caribbean Restaurant', 'Dumpling Restaurant', 'Filipino Restaurant', 'Doner Restaurant', 'Dim Sum Restaurant', 'Molecular Gastronomy Restaurant', 'Taiwanese Restaurant', 'Sri Lankan Restaurant',

There are 44 competitor categories. Next, we select all competitor venues and put them into a DataFrame.

In [25]:
mask=DT_venues['Venue Category'].isin(competitor_category)
DT_restaurant=DT_venues[mask]
DT_restaurant=DT_restaurant.reset_index(drop=True)
print('How many competitor venues?',DT_restaurant.shape[0])
DT_restaurant.head()

How many competitor venues? 297


|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Regent Park, Harbourfront | 43.65426 | -79.360636 | Impact Kitchen | 43.656369 | -79.35698 | Restaurant |
| 1 | Regent Park, Harbourfront | 43.65426 | -79.360636 | Cluny Bistro & Boulangerie | 43.650565 | -79.357843 | French Restaurant |
| 2 | Regent Park, Harbourfront | 43.65426 | -79.360636 | El Catrin | 43.650601 | -79.35892 | Mexican Restaurant |
| 3 | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 | Nando's | 43.661728 | -79.386391 | Portuguese Restaurant |
| 4 | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 | Mercatto | 43.660391 | -79.387664 | Italian Restaurant |


Let's explore which types of restaurant are most common and show the top 10 categories.

In [26]:
Category_group=DT_restaurant.groupby('Venue Category').count()
Category_group.sort_values(by=['Neighborhood'],ascending=False,inplace=True)
Category_group.head(10)

| Venue Category | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude |
|---|---|---|---|---|---|---|
| Restaurant | 44 | 44 | 44 | 44 | 44 | 44 |
| Japanese Restaurant | 31 | 31 | 31 | 31 | 31 | 31 |
| Italian Restaurant | 22 | 22 | 22 | 22 | 22 | 22 |
| Seafood Restaurant | 20 | 20 | 20 | 20 | 20 | 20 |
| Sushi Restaurant | 17 | 17 | 17 | 17 | 17 | 17 |
| American Restaurant | 16 | 16 | 16 | 16 | 16 | 16 |
| Thai Restaurant | 15 | 15 | 15 | 15 | 15 | 15 |
| Vegetarian / Vegan Restaurant | 14 | 14 | 14 | 14 | 14 | 14 |
| Asian Restaurant | 9 | 9 | 9 | 9 | 9 | 9 |
| Mexican Restaurant | 9 | 9 | 9 | 9 | 9 | 9 |


Apart from the generic 'Restaurant' category (44 venues), Japanese (31) and Italian (22) restaurants are the most common categories in downtown Toronto.
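Grouping and counting every column repeats the same number six times; `Series.value_counts()` gives the same ranking in one step, already sorted in descending order. A sketch on a hypothetical slice of `DT_restaurant`:

```python
import pandas as pd

# Hypothetical slice of DT_restaurant
venues = pd.DataFrame({
    'Venue': ['A', 'B', 'C', 'D', 'E', 'F'],
    'Venue Category': ['Japanese Restaurant', 'Italian Restaurant', 'Japanese Restaurant',
                       'Restaurant', 'Japanese Restaurant', 'Italian Restaurant'],
})

# Number of venues per category, most common first; keep the top 10
top_categories = venues['Venue Category'].value_counts().head(10)
print(top_categories)
```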

#### 2.3.2 Restaurants in each neighborhood

In [27]:
# Get the geographical coordinates of Downtown Toronto
address='Downtown Toronto, TO'
geolocator=Nominatim(user_agent="to_explorer")
location=geolocator.geocode(address)
latitude=location.latitude
longitude=location.longitude
print('The geographical coordinates of Downtown Toronto are {},{}.'.format(latitude,longitude))

The geographical coordinates of Downtown Toronto are 43.6563221,-79.3809161.


Let's create a map and mark these venues to see how the restaurants are distributed.

In [28]:
#create a map of Downtown Toronto using the geographical coordinates
map_DT=folium.Map(location=[latitude,longitude],zoom_start=11)

In [29]:
#add neighborhood markers to map
for lat, lng, label in zip(df_DT['Latitude'],df_DT['Longitude'],df_DT['Neighborhood']):
    folium.Marker([lat,lng], popup=label).add_to(map_DT)
    
#add venues markers to map
for lat, lng, label in zip(DT_restaurant['Venue Latitude'],DT_restaurant['Venue Longitude'],DT_restaurant['Venue']):
    label=folium.Popup(label,parse_html=True)
    folium.CircleMarker(
    [lat,lng],
    radius=5,
    popup=label,
    color='blue',
    fill=True,
    fill_color='#3186cc',
    fill_opacity=0.7,
    parse_html=False).add_to(map_DT)

map_DT

Restaurants are crowded in some neighborhoods, such as 'Toronto Dominion Centre, Design Exchange', 'Commerce Court, Victoria Hotel', and 'Richmond, Adelaide, King'.
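To back the visual impression from the map with numbers, one could count competitor venues per neighborhood. A sketch on hypothetical rows of `DT_restaurant`; on the real data the counts depend on the 500 m radius and on the 100-venue `LIMIT` per request.

```python
import pandas as pd

# Hypothetical slice of DT_restaurant: one row per competitor venue
restaurants = pd.DataFrame({
    'Neighborhood': ['Christie', 'Christie',
                     'Richmond, Adelaide, King', 'Richmond, Adelaide, King',
                     'Richmond, Adelaide, King', 'Berczy Park'],
    'Venue Category': ['Italian Restaurant', 'Restaurant', 'Japanese Restaurant',
                       'Thai Restaurant', 'Restaurant', 'Seafood Restaurant'],
})

# Number of competitor venues per neighborhood, most crowded first
density = (restaurants.groupby('Neighborhood').size()
           .sort_values(ascending=False)
           .rename('Restaurant Count'))
print(density)
```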

#### 2.3.3 Important competitor categories

We examined the restaurant categories again and marked the venues with different colors:

1) How many Chinese restaurants are there in Downtown Toronto? We mark them in red.

2) 'Dumpling Restaurant' and 'Dim Sum Restaurant' also serve Chinese food, so these two categories are the most direct competitors. We define them as 'competitor class 1' and mark them in yellow.

3) 'Japanese Restaurant' and 'Italian Restaurant' are the two most common specific cuisines among the top 10 categories, so we regard them as relatively strong competitors. We define them as 'competitor class 2' and mark them in green.

4) All other venues are marked in blue.

In [30]:
#check the Chinese restaurant
CH_restaurant=DT_restaurant[DT_restaurant['Venue Category']=='Chinese Restaurant']
#print('How many Chinese restaurants?', CH_restaurant.shape[0])

In [31]:
CH_restaurant

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 7 | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 | Crown Princess Fine Dining 伯爵名宴 | 43.666455 | -79.387698 | Chinese Restaurant |
| 23 | Garden District, Ryerson | 43.657162 | -79.378937 | GB Hand-Pulled Noodles | 43.656434 | -79.383783 | Chinese Restaurant |
| 114 | Harbourfront East, Union Station, Toronto Islands | 43.640816 | -79.381752 | Pearl Harbourfront | 43.638157 | -79.380688 | Chinese Restaurant |
| 149 | Toronto Dominion Centre, Design Exchange | 43.647177 | -79.381576 | Szechuan Express | 43.646973 | -79.379549 | Chinese Restaurant |
| 240 | St. James Town, Cabbagetown | 43.667967 | -79.367675 | China Gourmet | 43.66418 | -79.368359 | Chinese Restaurant |
| 241 | St. James Town, Cabbagetown | 43.667967 | -79.367675 | Tender Trap Restaurant | 43.667724 | -79.369485 | Chinese Restaurant |
| 293 | Church and Wellesley | 43.66586 | -79.38316 | Crown Princess Fine Dining 伯爵名宴 | 43.666455 | -79.387698 | Chinese Restaurant |


Two records refer to the venue 'Crown Princess Fine Dining', and their latitude and longitude are identical. So there is actually only one venue called 'Crown Princess Fine Dining'.

These neighborhoods are close to each other, so when we query Foursquare with a 500 m radius, the venue appears under both 'Queen's Park, Ontario Provincial Government' and 'Church and Wellesley'. After checking the map again, we decided the venue belongs to 'Queen's Park', so we delete the record with Neighborhood 'Church and Wellesley'.
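The duplicate arises because the 500 m search circles of nearby centers overlap. A minimal sketch that checks which neighborhood centers lie within 500 m of a venue, assuming a spherical Earth (haversine formula with a mean radius of 6,371 km; Foursquare's own distance computation may differ slightly). The helper names `haversine_m` and `centers_within` are hypothetical; the coordinates are copied from the table above.

```python
import math

def haversine_m(lat1, lng1, lat2, lng2, radius_m=6371000.0):
    """Great-circle distance in meters between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def centers_within(venue_lat, venue_lng, centers, radius_m=500):
    """Names of the neighborhood centers whose search circle contains the venue.

    centers: dict mapping neighborhood name -> (latitude, longitude).
    """
    return [name for name, (lat, lng) in centers.items()
            if haversine_m(venue_lat, venue_lng, lat, lng) <= radius_m]


# Neighborhood centers and venue coordinates from the table above
centers = {
    "Queen's Park, Ontario Provincial Government": (43.662301, -79.389494),
    'Church and Wellesley': (43.66586, -79.38316),
}
crown_princess = (43.666455, -79.387698)
within = centers_within(*crown_princess, centers)
print(within)
```

Any venue for which `centers_within` returns more than one name will be listed once per neighborhood in `DT_venues`.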

In [32]:
DT_restaurant.drop(293,inplace=True)
DT_restaurant.reset_index(drop=True,inplace=True)
#DT_restaurant.shape
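Dropping row 293 by its index works for this particular API response, but the index depends on the order in which Foursquare returned the venues. A sketch of a more general approach, on hypothetical rows: treat two records as the same venue when name and coordinates match, and keep the first one. With `keep='first'`, row order decides which neighborhood keeps the venue, so this reproduces the manual choice only when that neighborhood comes first.

```python
import pandas as pd

# Hypothetical rows: the same venue returned for two neighboring centers
rows = pd.DataFrame({
    'Neighborhood': ["Queen's Park, Ontario Provincial Government",
                     'Garden District, Ryerson',
                     'Church and Wellesley'],
    'Venue': ['Crown Princess Fine Dining', 'GB Hand-Pulled Noodles',
              'Crown Princess Fine Dining'],
    'Venue Latitude': [43.666455, 43.656434, 43.666455],
    'Venue Longitude': [-79.387698, -79.383783, -79.387698],
})

key = ['Venue', 'Venue Latitude', 'Venue Longitude']

# keep=False marks every copy, which is handy for inspecting duplicates first
dupes = rows[rows.duplicated(subset=key, keep=False)]

# keep='first' retains the earliest row of each duplicate group
deduped = rows.drop_duplicates(subset=key, keep='first').reset_index(drop=True)
print(dupes)
print(deduped)
```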

In [33]:
CH_restaurant=DT_restaurant[DT_restaurant['Venue Category']=='Chinese Restaurant']
CH_restaurant

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 7 | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 | Crown Princess Fine Dining 伯爵名宴 | 43.666455 | -79.387698 | Chinese Restaurant |
| 23 | Garden District, Ryerson | 43.657162 | -79.378937 | GB Hand-Pulled Noodles | 43.656434 | -79.383783 | Chinese Restaurant |
| 114 | Harbourfront East, Union Station, Toronto Islands | 43.640816 | -79.381752 | Pearl Harbourfront | 43.638157 | -79.380688 | Chinese Restaurant |
| 149 | Toronto Dominion Centre, Design Exchange | 43.647177 | -79.381576 | Szechuan Express | 43.646973 | -79.379549 | Chinese Restaurant |
| 240 | St. James Town, Cabbagetown | 43.667967 | -79.367675 | China Gourmet | 43.66418 | -79.368359 | Chinese Restaurant |
| 241 | St. James Town, Cabbagetown | 43.667967 | -79.367675 | Tender Trap Restaurant | 43.667724 | -79.369485 | Chinese Restaurant |


There are only 6 Chinese restaurants among the 296 competitor venues in Downtown Toronto, so there is still an opportunity to enter the market.
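The first criterion from the introduction (areas with no Chinese restaurants) can be checked directly by comparing the full neighborhood list with the neighborhoods that have at least one Chinese restaurant. A sketch on hypothetical inputs; note that a neighborhood for which the API returned no venues at all would also show up as having no Chinese restaurant.

```python
import pandas as pd

# Hypothetical inputs: all downtown neighborhoods and the competitor venues
all_hoods = pd.Series(['Berczy Park', 'Christie', 'Garden District, Ryerson', 'Rosedale'])
restaurants = pd.DataFrame({
    'Neighborhood': ['Berczy Park', 'Garden District, Ryerson', 'Garden District, Ryerson'],
    'Venue Category': ['Seafood Restaurant', 'Chinese Restaurant', 'Japanese Restaurant'],
})

# Neighborhoods with at least one Chinese restaurant
is_chinese = restaurants['Venue Category'] == 'Chinese Restaurant'
with_chinese = set(restaurants.loc[is_chinese, 'Neighborhood'])

# Criterion 1: neighborhoods with no Chinese restaurant
no_chinese = all_hoods[~all_hoods.isin(with_chinese)].tolist()
print(no_chinese)
```

On the real data, `all_hoods` would be `df_DT['Neighborhood']` and `restaurants` would be `DT_restaurant`.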

Next, we create a new map and mark each restaurant with a color according to its category.

In [34]:
#add a new column 'COLOR' to mark color for each venue
DT_restaurant.insert(DT_restaurant.shape[1],'COLOR','color')

In [35]:
#map each category to a marker color:
#'Chinese Restaurant' -> red
#'Dumpling Restaurant' or 'Dim Sum Restaurant' -> yellow
#'Japanese Restaurant' or 'Italian Restaurant' -> green
#any other venue category -> blue
color_map = {
    'Chinese Restaurant': 'red',
    'Dumpling Restaurant': 'yellow',
    'Dim Sum Restaurant': 'yellow',
    'Japanese Restaurant': 'green',
    'Italian Restaurant': 'green',
}
DT_restaurant['COLOR'] = DT_restaurant['Venue Category'].map(color_map).fillna('blue')

In [36]:
DT_restaurant.head()

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category | COLOR |
|---|---|---|---|---|---|---|---|---|
| 0 | Regent Park, Harbourfront | 43.65426 | -79.360636 | Impact Kitchen | 43.656369 | -79.35698 | Restaurant | blue |
| 1 | Regent Park, Harbourfront | 43.65426 | -79.360636 | Cluny Bistro & Boulangerie | 43.650565 | -79.357843 | French Restaurant | blue |
| 2 | Regent Park, Harbourfront | 43.65426 | -79.360636 | El Catrin | 43.650601 | -79.35892 | Mexican Restaurant | blue |
| 3 | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 | Nando's | 43.661728 | -79.386391 | Portuguese Restaurant | blue |
| 4 | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 | Mercatto | 43.660391 | -79.387664 | Italian Restaurant | green |


In [37]:
DT_restaurant.tail()

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category | COLOR |
|---|---|---|---|---|---|---|---|---|
| 291 | Church and Wellesley | 43.66586 | -79.38316 | Naan & Kabob | 43.669005 | -79.386219 | Afghan Restaurant | blue |
| 292 | Church and Wellesley | 43.66586 | -79.38316 | O. Noir | 43.669145 | -79.382505 | Restaurant | blue |
| 293 | Church and Wellesley | 43.66586 | -79.38316 | Kokoni Izakaya | 43.664181 | -79.380258 | Japanese Restaurant | green |
| 294 | Church and Wellesley | 43.66586 | -79.38316 | Asahi Sushi | 43.669874 | -79.382943 | Sushi Restaurant | blue |
| 295 | Church and Wellesley | 43.66586 | -79.38316 | A&W | 43.666415 | -79.378235 | Fast Food Restaurant | blue |


Let's create a map and mark venues with different colors.

In [38]:
#create a map of Downtown Toronto using the geographical coordinates
map_DT=folium.Map(location=[latitude,longitude],zoom_start=11)

In [39]:
#add markers to map
folium.Marker([latitude,longitude], popup='Downtown Toronto').add_to(map_DT)
for lat, lng, label,col in zip(DT_restaurant['Venue Latitude'],DT_restaurant['Venue Longitude'],DT_restaurant['Venue'],DT_restaurant['COLOR']):
    label=folium.Popup(label,parse_html=True)
    folium.CircleMarker(
    [lat,lng],
    radius=5,
    popup=label,
    color=col,
    fill=True,
    fill_color=col,
    fill_opacity=0.7,
    parse_html=False).add_to(map_DT)

map_DT

## 3. Methodology

We use k-means clustering to group the downtown neighborhoods by the mix of restaurant categories around them, then compare the resulting clusters against the stakeholders' interests.  
We will present a map of all venues with their clusters and try to identify general zones / neighborhoods / addresses that can serve as the starting point for a final 'street level' exploration, followed by the stakeholders' search for the optimal venue location.
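The analysis below uses k = 5 clusters. One common way to sanity-check the number of clusters is the elbow method: fit k-means for several values of k and look for the point where the inertia (within-cluster sum of squared distances) stops dropping sharply. A sketch on a hypothetical, randomly generated category-frequency matrix; on the real data `X` would be `DT_grouped_clustering`.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical category-frequency matrix: 12 neighborhoods x 6 categories,
# each row drawn from a Dirichlet distribution so it sums to 1
rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(6), size=12)

# Inertia for k = 1..6; n_init=10 fixes the number of restarts explicitly,
# because the default changed between scikit-learn versions
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 4))
```

Inertia always decreases as k grows, so the goal is to find a bend in the curve rather than its minimum.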

#### One-hot encoding

In [40]:
# one hot encoding
DT_onehot = pd.get_dummies(DT_restaurant[['Venue Category']], prefix="", prefix_sep="")
#DT_onehot.head()

In [41]:
# add neighborhood column back to dataframe
DT_onehot['Neighborhood'] = DT_restaurant['Neighborhood'] 
#DT_onehot.head()

In [42]:
# move 'neighborhood' column to the first column
fixed_columns = [DT_onehot.columns[-1]] + list(DT_onehot.columns[:-1])
DT_onehot = DT_onehot[fixed_columns]

DT_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,American Restaurant,Asian Restaurant,Belgian Restaurant,Brazilian Restaurant,Caribbean Restaurant,Chinese Restaurant,Colombian Restaurant,Comfort Food Restaurant,Dim Sum Restaurant,Doner Restaurant,Dumpling Restaurant,Eastern European Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Latin American Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,New American Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Seafood Restaurant,Sri Lankan Restaurant,Sushi Restaurant,Taiwanese Restaurant,Thai Restaurant,Theme Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
1,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Queen's Park, Ontario Provincial Government",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
4,"Queen's Park, Ontario Provincial Government",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [43]:
DT_onehot.shape

(296, 45)

Group the rows by neighborhood and take the mean of each one-hot column, which gives the frequency of each category in that neighborhood.

In [44]:
DT_grouped = DT_onehot.groupby('Neighborhood').mean().reset_index()
DT_grouped

Unnamed: 0,Neighborhood,Afghan Restaurant,American Restaurant,Asian Restaurant,Belgian Restaurant,Brazilian Restaurant,Caribbean Restaurant,Chinese Restaurant,Colombian Restaurant,Comfort Food Restaurant,Dim Sum Restaurant,Doner Restaurant,Dumpling Restaurant,Eastern European Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Latin American Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,New American Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Seafood Restaurant,Sri Lankan Restaurant,Sushi Restaurant,Taiwanese Restaurant,Thai Restaurant,Theme Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.076923,0.0,0.076923,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.153846,0.153846,0.0,0.076923,0.0,0.076923,0.0,0.076923,0.0
1,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.15,0.1,0.05,0.0,0.0,0.0,0.05,0.05,0.0,0.0,0.05,0.05,0.05,0.05,0.05,0.0,0.05,0.0,0.1,0.0,0.05,0.0
2,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Church and Wellesley,0.043478,0.043478,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.173913,0.0,0.0,0.086957,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.130435,0.0,0.0,0.173913,0.0,0.0,0.043478,0.0,0.043478
4,"Commerce Court, Victoria Hotel",0.0,0.1,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.0,0.033333,0.0,0.0,0.066667,0.1,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.233333,0.1,0.0,0.033333,0.0,0.066667,0.0,0.066667,0.0
5,"First Canadian Place, Underground city",0.0,0.103448,0.103448,0.0,0.034483,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.034483,0.0,0.0,0.137931,0.0,0.034483,0.034483,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.137931,0.103448,0.0,0.068966,0.0,0.068966,0.0,0.034483,0.0
6,"Garden District, Ryerson",0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.142857,0.0,0.0,0.0,0.047619,0.095238,0.047619,0.0,0.0,0.047619,0.0,0.095238,0.047619,0.047619,0.0,0.047619,0.0,0.047619,0.0,0.0,0.047619
7,"Harbourfront East, Union Station, Toronto Islands",0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.153846,0.076923,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.230769,0.076923,0.0,0.076923,0.0,0.0,0.0,0.076923,0.0
8,"Kensington Market, Chinatown, Grange Park",0.0,0.0,0.0,0.047619,0.0,0.047619,0.0,0.0,0.047619,0.047619,0.047619,0.095238,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.190476,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.190476,0.142857
9,"Queen's Park, Ontario Provincial Government",0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.166667,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0
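Why the mean of a one-hot column is a frequency: each column holds 1 for venues in that category and 0 otherwise, so its mean within a neighborhood is the share of that neighborhood's competitor venues in the category, and each row sums to 1. A sketch on hypothetical venues:

```python
import pandas as pd

# Hypothetical competitor venues: one row per venue
venues = pd.DataFrame({
    'Neighborhood': ['Christie', 'Christie', 'Berczy Park', 'Berczy Park',
                     'Berczy Park', 'Berczy Park'],
    'Venue Category': ['Italian Restaurant', 'Restaurant', 'Seafood Restaurant',
                       'Seafood Restaurant', 'Restaurant', 'Thai Restaurant'],
})

# One 0/1 column per category (cast to int; newer pandas returns booleans)
onehot = pd.get_dummies(venues['Venue Category']).astype(int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# Share of each neighborhood's venues in each category
freq = onehot.groupby('Neighborhood').mean()
print(freq)
```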


If you want to print the top venues in each neighborhood, double-click **here**.
<!--
num_top_venues = 5

for hood in DT_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = DT_grouped[DT_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
-->

#### Present the top ten venue categories in each neighborhood

In [45]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Create a DataFrame of the top 10 most common venue categories in each neighborhood.

In [46]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # 4th and later positions use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = DT_grouped['Neighborhood']

for ind in np.arange(DT_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(DT_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Berczy Park | Seafood Restaurant | Restaurant | Comfort Food Restaurant | Thai Restaurant | Greek Restaurant | Sushi Restaurant | Vegetarian / Vegan Restaurant | Italian Restaurant | Japanese Restaurant | French Restaurant |
| 1 | Central Bay Street | Italian Restaurant | Thai Restaurant | Japanese Restaurant | Indian Restaurant | New American Restaurant | French Restaurant | Vegetarian / Vegan Restaurant | Korean Restaurant | Middle Eastern Restaurant | Falafel Restaurant |
| 2 | Christie | Italian Restaurant | Restaurant | Vietnamese Restaurant | Doner Restaurant | German Restaurant | French Restaurant | Filipino Restaurant | Fast Food Restaurant | Falafel Restaurant | Ethiopian Restaurant |
| 3 | Church and Wellesley | Sushi Restaurant | Japanese Restaurant | Restaurant | Mediterranean Restaurant | Vietnamese Restaurant | Mexican Restaurant | American Restaurant | Caribbean Restaurant | Ethiopian Restaurant | Fast Food Restaurant |
| 4 | Commerce Court, Victoria Hotel | Restaurant | American Restaurant | Japanese Restaurant | Seafood Restaurant | Thai Restaurant | Asian Restaurant | Italian Restaurant | Vegetarian / Vegan Restaurant | New American Restaurant | Fast Food Restaurant |


#### Segment and cluster

Run k-means to segment the neighborhoods into 5 clusters.

In [47]:
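from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

# keep only the category-frequency columns as clustering features
DT_grouped_clustering = DT_grouped.drop(columns='Neighborhood')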

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(DT_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 3, 1, 0, 0, 0, 3, 3, 4, 3])
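The notebook calls scikit-learn's `KMeans`. To show what that step does, here is a bare-bones NumPy version of Lloyd's algorithm (not scikit-learn's implementation, which adds k-means++ initialisation and several restarts). It runs on four made-up frequency vectors (hypothetical shares of Italian, Japanese and Chinese restaurants) and, for simplicity, uses the first `k` rows as initial centroids; both choices are assumptions for illustration:

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Bare-bones Lloyd's algorithm: assign each point to its nearest centroid,
    move each centroid to the mean of its points, stop when labels no longer change."""
    centroids = X[:k].copy()              # simple deterministic init: first k rows
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # squared Euclidean distance from every point to every centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:          # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# hypothetical neighborhoods: shares of Italian, Japanese, Chinese restaurants
X = np.array([
    [0.6, 0.3, 0.1],   # Italian-heavy
    [0.7, 0.2, 0.1],   # Italian-heavy
    [0.1, 0.2, 0.7],   # Chinese-heavy
    [0.0, 0.2, 0.8],   # Chinese-heavy
])

labels, centroids = kmeans(X, k=2)
print(labels)  # → [1 1 0 0]
```

Neighborhoods with similar venue profiles share a label; the label numbers themselves are arbitrary, which is why only the grouping matters when reading `kmeans.labels_`.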

In [48]:
# add clustering labels
neighborhoods_venues_sorted.insert(1, 'Cluster Labels', kmeans.labels_)

# add latitude/longitude (plus postal code and borough) for each neighborhood from df_DT
DT_merged = neighborhoods_venues_sorted.join(df_DT.set_index('Neighborhood'), on='Neighborhood')

DT_merged # check the last columns!

| | Neighborhood | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | PostalCode | Borough | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Berczy Park | 0 | Seafood Restaurant | Restaurant | Comfort Food Restaurant | Thai Restaurant | Greek Restaurant | Sushi Restaurant | Vegetarian / Vegan Restaurant | Italian Restaurant | Japanese Restaurant | French Restaurant | M5E | Downtown Toronto | 43.644771 | -79.373306 |
| 1 | Central Bay Street | 3 | Italian Restaurant | Thai Restaurant | Japanese Restaurant | Indian Restaurant | New American Restaurant | French Restaurant | Vegetarian / Vegan Restaurant | Korean Restaurant | Middle Eastern Restaurant | Falafel Restaurant | M5G | Downtown Toronto | 43.657952 | -79.387383 |
| 2 | Christie | 1 | Italian Restaurant | Restaurant | Vietnamese Restaurant | Doner Restaurant | German Restaurant | French Restaurant | Filipino Restaurant | Fast Food Restaurant | Falafel Restaurant | Ethiopian Restaurant | M6G | Downtown Toronto | 43.669542 | -79.422564 |
| 3 | Church and Wellesley | 0 | Sushi Restaurant | Japanese Restaurant | Restaurant | Mediterranean Restaurant | Vietnamese Restaurant | Mexican Restaurant | American Restaurant | Caribbean Restaurant | Ethiopian Restaurant | Fast Food Restaurant | M4Y | Downtown Toronto | 43.66586 | -79.38316 |
| 4 | Commerce Court, Victoria Hotel | 0 | Restaurant | American Restaurant | Japanese Restaurant | Seafood Restaurant | Thai Restaurant | Asian Restaurant | Italian Restaurant | Vegetarian / Vegan Restaurant | New American Restaurant | Fast Food Restaurant | M5L | Downtown Toronto | 43.648198 | -79.379817 |
| 5 | First Canadian Place, Underground city | 0 | Japanese Restaurant | Restaurant | American Restaurant | Asian Restaurant | Seafood Restaurant | Thai Restaurant | Sushi Restaurant | Colombian Restaurant | New American Restaurant | Vegetarian / Vegan Restaurant | M5X | Downtown Toronto | 43.648429 | -79.38228 |
| 6 | Garden District, Ryerson | 3 | Japanese Restaurant | Italian Restaurant | Middle Eastern Restaurant | Fast Food Restaurant | Ramen Restaurant | Vietnamese Restaurant | New American Restaurant | Ethiopian Restaurant | Mexican Restaurant | Modern European Restaurant | M5B | Downtown Toronto | 43.657162 | -79.378937 |
| 7 | Harbourfront East, Union Station, Toronto Islands | 3 | Restaurant | Italian Restaurant | Indian Restaurant | Mexican Restaurant | Vegetarian / Vegan Restaurant | Sushi Restaurant | Seafood Restaurant | Japanese Restaurant | Chinese Restaurant | New American Restaurant | M5J | Downtown Toronto | 43.640816 | -79.381752 |
| 8 | Kensington Market, Chinatown, Grange Park | 4 | Vegetarian / Vegan Restaurant | Mexican Restaurant | Vietnamese Restaurant | Dumpling Restaurant | Comfort Food Restaurant | Japanese Restaurant | Filipino Restaurant | Dim Sum Restaurant | Doner Restaurant | Caribbean Restaurant | M5T | Downtown Toronto | 43.653206 | -79.400049 |
| 9 | Queen's Park, Ontario Provincial Government | 3 | Italian Restaurant | Sushi Restaurant | Restaurant | Chinese Restaurant | Portuguese Restaurant | Mexican Restaurant | Vietnamese Restaurant | Doner Restaurant | Filipino Restaurant | Fast Food Restaurant | M7A | Downtown Toronto | 43.662301 | -79.389494 |
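`DataFrame.join(other, on=...)` matches the values of a column in the calling frame against the index of `other`, which is why `df_DT` is indexed by `Neighborhood` before the join. A toy sketch with two neighborhoods (labels and coordinates copied from the table above; the miniature frames themselves are hypothetical) shows the shape of the result:

```python
import pandas as pd

# hypothetical miniature versions of neighborhoods_venues_sorted and df_DT
venues_sorted = pd.DataFrame({
    'Neighborhood': ['Berczy Park', 'Christie'],
    'Cluster Labels': [0, 1],
    '1st Most Common Venue': ['Seafood Restaurant', 'Italian Restaurant'],
})
coords = pd.DataFrame({
    'Neighborhood': ['Christie', 'Berczy Park'],   # row order differs on purpose
    'Latitude': [43.669542, 43.644771],
    'Longitude': [-79.422564, -79.373306],
})

# left join: every row of venues_sorted is kept and matched on the Neighborhood value
merged = venues_sorted.join(coords.set_index('Neighborhood'), on='Neighborhood')
print(merged)
```

The join is by value, not by row position, so the differing row order of the two frames does not matter.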


The following neighborhoods have no restaurants (the competitor category):
1) Rosedale
2) CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport

Therefore, these neighborhoods are not segmented.
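A neighborhood with no restaurant rows never appears in `DT_grouped`, assuming it was built the usual way (one-hot encode the venue categories, then average per neighborhood). A sketch on illustrative data finds such neighborhoods with a set difference; the frame and column names below are assumptions, not the notebook's variables:

```python
import pandas as pd

# illustrative data: all downtown neighborhoods, and the restaurant venues found
all_neighborhoods = pd.Series(['Rosedale', 'Christie', 'Berczy Park'])
restaurants = pd.DataFrame({
    'Neighborhood': ['Christie', 'Berczy Park', 'Berczy Park'],
    'Venue Category': ['Italian Restaurant', 'Seafood Restaurant', 'Thai Restaurant'],
})

# one-hot encode categories, then average per neighborhood (like DT_grouped)
onehot = pd.get_dummies(restaurants['Venue Category'])
onehot['Neighborhood'] = restaurants['Neighborhood']
grouped = onehot.groupby('Neighborhood').mean().reset_index()

# neighborhoods that never made it into the grouped table
missing = sorted(set(all_neighborhoods) - set(grouped['Neighborhood']))
print(missing)  # → ['Rosedale']
```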

#### Mark all the clusters on a map

In [49]:
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

In [50]:
# add markers to the map
for lat, lon, poi, cluster in zip(DT_merged['Latitude'], DT_merged['Longitude'], DT_merged['Neighborhood'], DT_merged['Cluster Labels']):
    # k-means labels start at 0, while the text numbers clusters from 1
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster + 1), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

## 4. Results and 5. Discussion

Let's examine the results and present a brief discussion.

For each cluster, we list its neighborhoods in a dataframe.

To make the results easier to read, we also map all the restaurants in the neighborhoods of the important clusters, using different colors.
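Instead of filtering `DT_merged` once per label, the neighborhoods of every cluster can also be collected in one pass with `groupby`. The sketch below uses the neighborhood/label pairs reported in the cluster tables of this section; the small `labels` frame stands in for `DT_merged`:

```python
import pandas as pd

# neighborhood / cluster-label pairs as reported in this notebook's cluster tables
labels = pd.DataFrame({
    'Neighborhood': [
        'Berczy Park', 'Central Bay Street', 'Christie', 'Church and Wellesley',
        'Commerce Court, Victoria Hotel', 'First Canadian Place, Underground city',
        'Garden District, Ryerson', 'Harbourfront East, Union Station, Toronto Islands',
        'Kensington Market, Chinatown, Grange Park',
        "Queen's Park, Ontario Provincial Government", 'Regent Park, Harbourfront',
        'Richmond, Adelaide, King', 'St. James Town', 'St. James Town, Cabbagetown',
        'Stn A PO Boxes', 'Toronto Dominion Centre, Design Exchange',
        'University of Toronto, Harbord',
    ],
    'Cluster Labels': [0, 3, 1, 0, 0, 0, 3, 3, 4, 3, 2, 0, 0, 3, 0, 0, 0],
})

# clusters are numbered from 1 in the text, so shift the 0-based labels
per_cluster = {label + 1: group['Neighborhood'].tolist()
               for label, group in labels.groupby('Cluster Labels')}
for number, hoods in per_cluster.items():
    print(f"Cluster {number}: {hoods}")
```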

#### Cluster 1

In [51]:
DT_merged.loc[DT_merged['Cluster Labels'] == 0, DT_merged.columns[[0] + list(range(2, DT_merged.shape[1]-4))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Berczy Park | Seafood Restaurant | Restaurant | Comfort Food Restaurant | Thai Restaurant | Greek Restaurant | Sushi Restaurant | Vegetarian / Vegan Restaurant | Italian Restaurant | Japanese Restaurant | French Restaurant |
| 3 | Church and Wellesley | Sushi Restaurant | Japanese Restaurant | Restaurant | Mediterranean Restaurant | Vietnamese Restaurant | Mexican Restaurant | American Restaurant | Caribbean Restaurant | Ethiopian Restaurant | Fast Food Restaurant |
| 4 | Commerce Court, Victoria Hotel | Restaurant | American Restaurant | Japanese Restaurant | Seafood Restaurant | Thai Restaurant | Asian Restaurant | Italian Restaurant | Vegetarian / Vegan Restaurant | New American Restaurant | Fast Food Restaurant |
| 5 | First Canadian Place, Underground city | Japanese Restaurant | Restaurant | American Restaurant | Asian Restaurant | Seafood Restaurant | Thai Restaurant | Sushi Restaurant | Colombian Restaurant | New American Restaurant | Vegetarian / Vegan Restaurant |
| 11 | Richmond, Adelaide, King | Restaurant | Thai Restaurant | American Restaurant | Sushi Restaurant | Colombian Restaurant | New American Restaurant | Mediterranean Restaurant | Vegetarian / Vegan Restaurant | Modern European Restaurant | Fast Food Restaurant |
| 12 | St. James Town | Restaurant | American Restaurant | Japanese Restaurant | Seafood Restaurant | Moroccan Restaurant | Comfort Food Restaurant | New American Restaurant | Italian Restaurant | Middle Eastern Restaurant | Vegetarian / Vegan Restaurant |
| 14 | Stn A PO Boxes | Italian Restaurant | Seafood Restaurant | Restaurant | Japanese Restaurant | Molecular Gastronomy Restaurant | American Restaurant | Thai Restaurant | Fast Food Restaurant | Sushi Restaurant | Vegetarian / Vegan Restaurant |
| 15 | Toronto Dominion Centre, Design Exchange | Restaurant | American Restaurant | Japanese Restaurant | Seafood Restaurant | Asian Restaurant | Italian Restaurant | Chinese Restaurant | Vegetarian / Vegan Restaurant | New American Restaurant | French Restaurant |
| 16 | University of Toronto, Harbord | Japanese Restaurant | French Restaurant | Italian Restaurant | Sushi Restaurant | Restaurant | Comfort Food Restaurant | Vietnamese Restaurant | Dumpling Restaurant | Filipino Restaurant | Fast Food Restaurant |


In [52]:
# restaurants located in the neighborhoods of cluster 1 (label 0)
templist = DT_merged.loc[DT_merged['Cluster Labels'] == 0, 'Neighborhood'].tolist()
mask = DT_restaurant['Neighborhood'].isin(templist)
cluster = DT_restaurant[mask]

In [53]:
# create a map of Downtown Toronto using the geographical coordinates
map_DT=folium.Map(location=[latitude,longitude],zoom_start=11)

In [54]:
# add neighborhood markers to map
for lat, lng, label in zip(cluster['Neighborhood Latitude'], cluster['Neighborhood Longitude'], cluster['Neighborhood']):
    folium.Marker([lat, lng], popup=label).add_to(map_DT)
# add venue markers to map, colored by the COLOR column of DT_restaurant
for lat, lng, label, col in zip(cluster['Venue Latitude'], cluster['Venue Longitude'], cluster['Venue'], cluster['COLOR']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=col,
        fill=True,
        fill_color=col,
        fill_opacity=0.7).add_to(map_DT)

map_DT

Cluster 1 contains over 50 percent of all the neighborhoods, yet there is only one Chinese restaurant. Except for 'Church and Wellesley' and 'University of Toronto, Harbord', the neighborhoods in this cluster are crowded with restaurants.
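The share of neighborhoods per cluster can be reproduced from the cluster labels listed in the tables of this section (17 clustered neighborhoods, in the row order of `DT_merged`):

```python
import numpy as np

# cluster labels of the 17 clustered neighborhoods, in the row order of DT_merged
labels = np.array([0, 3, 1, 0, 0, 0, 3, 3, 4, 3, 2, 0, 0, 3, 0, 0, 0])

counts = np.bincount(labels)             # neighborhoods per label 0..4
shares = counts / counts.sum() * 100     # percentage of all clustered neighborhoods
for label, (n, pct) in enumerate(zip(counts, shares)):
    # the text numbers clusters from 1
    print(f"Cluster {label + 1}: {n} neighborhoods ({pct:.1f}%)")
```

This gives 9 of 17 neighborhoods (about 52.9%) for cluster 1 and 5 of 17 (about 29.4%) for cluster 4.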

#### Cluster 2

In [55]:
DT_merged.loc[DT_merged['Cluster Labels'] == 1, DT_merged.columns[[0] + list(range(2, DT_merged.shape[1]-4))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Christie | Italian Restaurant | Restaurant | Vietnamese Restaurant | Doner Restaurant | German Restaurant | French Restaurant | Filipino Restaurant | Fast Food Restaurant | Falafel Restaurant | Ethiopian Restaurant |


#### Cluster 3

In [56]:
DT_merged.loc[DT_merged['Cluster Labels'] == 2, DT_merged.columns[[0] + list(range(2, DT_merged.shape[1]-4))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | Regent Park, Harbourfront | French Restaurant | Mexican Restaurant | Restaurant | Vietnamese Restaurant | Doner Restaurant | German Restaurant | Filipino Restaurant | Fast Food Restaurant | Falafel Restaurant | Ethiopian Restaurant |


#### Cluster 4

In [57]:
DT_merged.loc[DT_merged['Cluster Labels'] == 3, DT_merged.columns[[0] + list(range(2, DT_merged.shape[1]-4))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central Bay Street | Italian Restaurant | Thai Restaurant | Japanese Restaurant | Indian Restaurant | New American Restaurant | French Restaurant | Vegetarian / Vegan Restaurant | Korean Restaurant | Middle Eastern Restaurant | Falafel Restaurant |
| 6 | Garden District, Ryerson | Japanese Restaurant | Italian Restaurant | Middle Eastern Restaurant | Fast Food Restaurant | Ramen Restaurant | Vietnamese Restaurant | New American Restaurant | Ethiopian Restaurant | Mexican Restaurant | Modern European Restaurant |
| 7 | Harbourfront East, Union Station, Toronto Islands | Restaurant | Italian Restaurant | Indian Restaurant | Mexican Restaurant | Vegetarian / Vegan Restaurant | Sushi Restaurant | Seafood Restaurant | Japanese Restaurant | Chinese Restaurant | New American Restaurant |
| 9 | Queen's Park, Ontario Provincial Government | Italian Restaurant | Sushi Restaurant | Restaurant | Chinese Restaurant | Portuguese Restaurant | Mexican Restaurant | Vietnamese Restaurant | Doner Restaurant | Filipino Restaurant | Fast Food Restaurant |
| 13 | St. James Town, Cabbagetown | Restaurant | Italian Restaurant | Chinese Restaurant | Indian Restaurant | Thai Restaurant | Taiwanese Restaurant | Sri Lankan Restaurant | Japanese Restaurant | Caribbean Restaurant | Dumpling Restaurant |


In [58]:
# restaurants located in the neighborhoods of cluster 4 (label 3)
templist = DT_merged.loc[DT_merged['Cluster Labels'] == 3, 'Neighborhood'].tolist()
mask = DT_restaurant['Neighborhood'].isin(templist)
cluster = DT_restaurant[mask]

In [59]:
# create a map of Downtown Toronto using the geographical coordinates
map_DT=folium.Map(location=[latitude,longitude],zoom_start=11)

In [60]:
# add neighborhood markers to map
for lat, lng, label in zip(cluster['Neighborhood Latitude'], cluster['Neighborhood Longitude'], cluster['Neighborhood']):
    folium.Marker([lat, lng], popup=label).add_to(map_DT)
# add venue markers to map, colored by the COLOR column of DT_restaurant
for lat, lng, label, col in zip(cluster['Venue Latitude'], cluster['Venue Longitude'], cluster['Venue'], cluster['COLOR']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=col,
        fill=True,
        fill_color=col,
        fill_opacity=0.7).add_to(map_DT)

map_DT

Cluster 4 contains about 30 percent of all the neighborhoods and has 5 Chinese restaurants. Almost all the Chinese restaurants are located in these neighborhoods.
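Counting Chinese restaurants per cluster amounts to mapping each restaurant's neighborhood to its cluster and filtering on the venue category. A sketch on made-up rows; the neighborhood names, labels and the `Venue Category` column name are all hypothetical stand-ins for `DT_restaurant` and `DT_merged`:

```python
import pandas as pd

# hypothetical restaurant rows shaped like DT_restaurant;
# the 'Venue Category' column name is an assumption
restaurants = pd.DataFrame({
    'Neighborhood': ['Hood A', 'Hood B', 'Hood B', 'Hood C', 'Hood D'],
    'Venue Category': ['Chinese Restaurant', 'Chinese Restaurant', 'Italian Restaurant',
                       'Chinese Restaurant', 'Italian Restaurant'],
})
# hypothetical k-means labels (0-based) for each neighborhood
cluster_label = {'Hood A': 3, 'Hood B': 3, 'Hood C': 0, 'Hood D': 1}

# number clusters from 1, as in the text
restaurants['Cluster'] = restaurants['Neighborhood'].map(cluster_label) + 1
is_chinese = restaurants['Venue Category'] == 'Chinese Restaurant'
chinese_per_cluster = (restaurants[is_chinese]
                       .groupby('Cluster').size()
                       .reindex(range(1, 6), fill_value=0))  # show empty clusters as 0
print(chinese_per_cluster)
```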

#### Cluster 5

In [61]:
DT_merged.loc[DT_merged['Cluster Labels'] == 4, DT_merged.columns[[0] + list(range(2, DT_merged.shape[1]-4))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | Kensington Market, Chinatown, Grange Park | Vegetarian / Vegan Restaurant | Mexican Restaurant | Vietnamese Restaurant | Dumpling Restaurant | Comfort Food Restaurant | Japanese Restaurant | Filipino Restaurant | Dim Sum Restaurant | Doner Restaurant | Caribbean Restaurant |


Given the stakeholders’ criteria, we will not consider the neighborhoods of clusters 1 and 4, for the following reasons:
restaurants there are densely concentrated, especially Japanese and Italian restaurants, so competition is severe; moreover, cluster 4 contains almost all of the Chinese restaurants.
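The exclusion rule is a boolean mask on the cluster label. A sketch on a subset of the neighborhoods and labels from the tables above; note that the text numbers clusters from 1, while `kmeans.labels_` starts at 0:

```python
import pandas as pd

# subset of DT_merged: neighborhood names and 0-based cluster labels from the tables above
merged = pd.DataFrame({
    'Neighborhood': ['Berczy Park', 'Central Bay Street', 'Christie',
                     'Kensington Market, Chinatown, Grange Park', 'Regent Park, Harbourfront'],
    'Cluster Labels': [0, 3, 1, 4, 2],
})

excluded_labels = [0, 3]   # clusters 1 and 4 in the text's 1-based numbering
keep = ~merged['Cluster Labels'].isin(excluded_labels)
candidates = merged.loc[keep, 'Neighborhood'].tolist()
print(candidates)
```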

#### Heatmap

Let's look further into clusters 2, 3 and 5. We create a heat map to show the density of restaurants in these neighborhoods.

In [62]:
# extract venue data from DT_restaurant for the neighborhoods of clusters 2, 3 and 5 (labels 1, 2 and 4)
list_cluster = DT_merged.loc[DT_merged['Cluster Labels'].isin([1, 2, 4]), 'Neighborhood'].tolist()

mask = DT_restaurant['Neighborhood'].isin(list_cluster)
cluster1 = DT_restaurant[mask]

In [63]:
# create a map of Downtown Toronto using the geographical coordinates
map_DT1=folium.Map(location=[latitude,longitude],zoom_start=11)

In [64]:
from folium.plugins import HeatMap
#add neighborhood markers to map
for lat, lng, label in zip(cluster1['Neighborhood Latitude'],cluster1['Neighborhood Longitude'],cluster1['Neighborhood']):
    folium.Marker([lat,lng], popup=label).add_to(map_DT1)

heat_map=cluster1[['Venue Latitude', 'Venue Longitude']].values
HeatMap(heat_map, radius=15).add_to(map_DT1)
map_DT1

The restaurant density in ‘Kensington Market, Chinatown, Grange Park’ is at a middle level.
A low density in a neighborhood suggests that the demand for restaurants there is weak.
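The heat map is read by eye. A coarse numeric counterpart is the number of restaurants per neighborhood, binned into levels. A sketch on made-up counts for hypothetical neighborhoods; the bin edges are an arbitrary choice for illustration, not thresholds taken from the analysis:

```python
import pandas as pd

# hypothetical restaurant rows: one row per restaurant venue
restaurants = pd.DataFrame({
    'Neighborhood': ['Hood A'] * 2 + ['Hood B'] * 6 + ['Hood C'] * 12,
})

# restaurants per neighborhood: roughly what the heat map shows visually
density = restaurants.groupby('Neighborhood').size()

# bin the counts into three levels (right-inclusive bins: (0, 4], (4, 9], (9, inf])
levels = pd.cut(density, bins=[0, 4, 9, float('inf')], labels=['low', 'middle', 'high'])
print(pd.DataFrame({'restaurants': density, 'level': levels}))
```

Unlike the heat map, which spreads each venue over its surroundings, this count ignores where venues sit within a neighborhood.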

## 6. Conclusion

Based on the above analysis, we recommend the neighborhood ‘Kensington Market, Chinatown, Grange Park’ to the stakeholders.

Thanks for watching.