# Capstone Project - The Battle of the Neighborhoods 
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

The owners of a restaurant chain in **Ontario, Canada** want to expand their business.
They currently operate restaurants in cities such as **Ottawa, Brampton and Hamilton**.

They expect a new restaurant in **Toronto** to be more profitable, since **Toronto** is the largest city in Canada. They want to open it in a pleasant neighbourhood, but they are having trouble deciding where in Toronto to open it.

Our task is to help them find locations where business is likely to be good, competition is low, and the surrounding neighbourhood is attractive. They would like a shortlist of 2-3 such places so that they can make the final choice themselves.


## Data <a name="data"></a>

#### First Dataset: List of neighbourhoods in Toronto:

First, I will use data from a Wikipedia page that lists the neighbourhoods of Toronto, Canada. I will use the web scraping library BeautifulSoup to extract the table from this page.
This table contains 3 columns: Postal Code, Borough and Neighbourhood.
The link for this Wikipedia page: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M .
After preprocessing the table and adding two more columns with the Latitude and Longitude of each neighbourhood, this dataset is ready for use.
The final DataFrame has 5 columns: Postal Code, Borough, Neighbourhood, Latitude, Longitude.
It contains 39 rows, each a unique neighbourhood entry, spread across 4 boroughs.

For example, the first row is the neighbourhood **Regent Park, Harbourfront** in the borough **Downtown Toronto**, with postal code **M5A**. Its geographical coordinates are **(43.65426, -79.360636)**.
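The table extraction described above can be sketched as follows. This is a minimal illustration, not the notebook's actual scraping code: it parses a small inline HTML table instead of fetching the live page, the helper name `table_to_frame` is hypothetical, and the `Not assigned` row is a made-up placeholder used only to show a cleaning step.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Inline stand-in for the Wikipedia table. The first two rows reuse values
# shown in this notebook; the last row is a made-up placeholder.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Postal Code</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M5A</td><td>Downtown Toronto</td><td>Regent Park, Harbourfront</td></tr>
  <tr><td>M4E</td><td>East Toronto</td><td>The Beaches</td></tr>
  <tr><td>M0Z</td><td>Not assigned</td><td>Not assigned</td></tr>
</table>
"""

def table_to_frame(html):
    """Turn the first HTML table into a DataFrame, one row per <tr>."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find("table").find_all("tr")
    header = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    body = [[td.get_text(strip=True) for td in tr.find_all("td")] for tr in rows[1:]]
    return pd.DataFrame(body, columns=header)

postal_df = table_to_frame(SAMPLE_HTML)
# Assumed cleaning rule: drop rows that carry no borough.
postal_df = postal_df[postal_df["Borough"] != "Not assigned"].reset_index(drop=True)
print(postal_df)
```

For the real page, the HTML string would come from `requests.get(url).text`, and the Latitude and Longitude columns would be joined on afterwards.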

#### Second Dataset: List of different venues in the neighbourhoods of Toronto:

This dataset will be built using the Foursquare API. I will use Foursquare location data to explore the venues in each neighbourhood of Toronto.
A venue can be any kind of place, for example parks, coffee shops, hotels or gyms.
With this information about the venues, I can compare and analyse the neighbourhoods of Toronto.

The geographical coordinates from the first dataset are used to query the venues for this second dataset.

**In general, I will be using these two datasets to solve the business problem of finding the best place to open a restaurant within Toronto.**


Before we get the data and start exploring it, let's import all the libraries that we will need.

In [158]:
# Importing libraries
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
import json # library to handle JSON files
import requests # library to handle HTTP requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# clustering algorithms
from sklearn.cluster import KMeans
from sklearn.cluster import DBSCAN

import folium # map rendering library

print('Libraries imported.')


Libraries imported.


Importing the first dataset in the form of a DataFrame:

In [159]:
df=pd.read_csv('Toronto_Neighbourhoods.csv')

In [160]:
df.head()

Unnamed: 0.1,Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,4,M4E,East Toronto,The Beaches,43.676357,-79.293031


In [161]:
df.drop('Unnamed: 0',axis=1,inplace=True)

In [162]:
df.columns=['Postal Code','Borough','Neighbourhood','latitude','longitude']

In [163]:
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,latitude,longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M4E,East Toronto,The Beaches,43.676357,-79.293031


In [164]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(df['Borough'].unique()),
        df.shape[0]
    )
)

The dataframe has 4 boroughs and 39 neighborhoods.


Geographical coordinates of Toronto:

In [165]:
latitude=43.6532
longitude=-79.3832

**Creating a map of Toronto with all 39 neighbourhoods marked on this map:**

In [166]:
# create map of Toronto using latitude and longitude values:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighbourhood in zip(df['latitude'], df['longitude'], df['Borough'], df['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

**Folium** is a great visualization library. Feel free to zoom into the above map, and click on each circle mark to reveal the name of the neighborhood and its respective borough.

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them by creating the second dataset.

#### Define Foursquare Credentials and Version

In [167]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # replace with your Foursquare ID; keep real credentials out of shared notebooks
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # replace with your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # maximum number of venues returned per request
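One way to keep the credentials out of the notebook itself is to read them from environment variables. The sketch below is only an illustration: the helper `load_foursquare_credentials` and the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions, not part of the Foursquare API.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from a mapping of environment variables.

    The variable names used here are assumptions made for this sketch.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first."
        )
    return client_id, client_secret

# Usage with a stand-in mapping instead of the real environment:
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret",
}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print(demo_id)
```

Calling the helper with no argument reads from the real process environment, so the secret never appears in the saved notebook.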

### Explore different venues in different Neighborhoods of Toronto:

#### Let's create a function that fetches the nearby venues for every neighbourhood in Toronto:

In [168]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return a dataframe of venues within `radius` metres of each neighbourhood."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request and keep only the list of returned items
        results = requests.get(url).json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighbourhood lists into a single dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood',
                  'Neighbourhood Latitude',
                  'Neighbourhood Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues
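The flattening step inside the function can be tried offline on a hand-written response. This is a sketch under assumptions: `extract_venue_rows` is a hypothetical helper, the payload only mimics the nesting that the function above indexes into (`response` → `groups` → `items` → `venue`), and the second venue is made up to show what happens when a venue has no category.

```python
def extract_venue_rows(payload, name, lat, lng):
    """Flatten one explore-style response into rows for a single neighbourhood."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None instead of raising IndexError on an empty list.
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows

sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Roselle Desserts",
                   "location": {"lat": 43.653447, "lng": -79.362017},
                   "categories": [{"name": "Bakery"}]}},
        {"venue": {"name": "Uncategorised Venue",  # made-up entry
                   "location": {"lat": 43.65, "lng": -79.36},
                   "categories": []}},
    ]}]}
}

rows = extract_venue_rows(sample_payload, "Regent Park, Harbourfront",
                          43.65426, -79.360636)
for row in rows:
    print(row)
```

Guarding the category lookup this way means one venue without a category does not abort the whole download loop.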

In [169]:
toronto_venues = getNearbyVenues(names=df['Neighbourhood'],
                                   latitudes=df['latitude'],
                                   longitudes=df['longitude']
                                  )


**toronto_venues** is a dataframe that lists, for each neighbourhood of Toronto, its nearby venues such as parks, restaurants and coffee shops. It is the second dataset that we need to solve the problem:

In [170]:
toronto_venues.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant
4,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa


In [171]:
toronto_venues.groupby('Neighbourhood').count()

Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park,57,57,57,57,57,57
"Brockton, Parkdale Village, Exhibition Place",25,25,25,25,25,25
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",14,14,14,14,14,14
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",17,17,17,17,17,17
Central Bay Street,66,66,66,66,66,66
Christie,17,17,17,17,17,17
Church and Wellesley,78,78,78,78,78,78
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
Davisville,35,35,35,35,35,35
Davisville North,8,8,8,8,8,8


In [172]:
toronto_venues.Neighbourhood.nunique()

39

In [173]:
df.Neighbourhood.nunique()

39

Preprocessing the second dataset (the **toronto_venues** dataframe) with **one-hot encoding** so that it can be clustered easily:

In [174]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Neighbourhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,...,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Summer Camp,Supermarket,Sushi Restaurant,Swim School,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
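To make the encoding step concrete, here is a small sketch on a toy dataframe. The four rows are illustrative only; the names `venues`, `onehot` and `counts` are local to this example.

```python
import pandas as pd

# Toy venue list: one row per venue, as in toronto_venues.
venues = pd.DataFrame({
    "Neighbourhood": ["Regent Park, Harbourfront"] * 3 + ["The Beaches"],
    "Venue Category": ["Bakery", "Coffee Shop", "Spa", "Trail"],
})

# One indicator column per category; each row has exactly one 1.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighbourhood", venues["Neighbourhood"])

# Summing the indicators per neighbourhood gives venue counts per category.
counts = onehot.groupby("Neighbourhood").sum()
print(counts)
```

Each venue becomes one row with a single 1, so grouping by neighbourhood and summing (or averaging) turns the indicators into per-category counts (or frequencies).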


We are interested in venues in the 'food' category, but only those that are proper restaurants. Coffee shops, pizza places, bakeries and similar venues are not direct competitors, so we leave them out. Hence our list includes only venues that have 'Restaurant' in the category name, which also captures every restaurant subcategory in a neighbourhood (for example Afghan Restaurant, Italian Restaurant, etc.). To do this, we select the restaurant columns from the **toronto_onehot** dataframe:

In [175]:
# keep the neighbourhood name plus every category whose name contains 'Restaurant'
col = ['Neighbourhood'] + [column for column in toronto_onehot.columns if 'Restaurant' in column]

In [176]:
toronto_restaurants=toronto_onehot[col]
toronto_restaurants=toronto_restaurants.groupby('Neighbourhood').sum().reset_index()
toronto_restaurants.head()

Unnamed: 0,Neighbourhood,Afghan Restaurant,American Restaurant,Asian Restaurant,Belgian Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Caribbean Restaurant,Chinese Restaurant,Colombian Restaurant,Comfort Food Restaurant,Cuban Restaurant,Dim Sum Restaurant,Doner Restaurant,Dumpling Restaurant,Eastern European Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Latin American Restaurant,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,New American Restaurant,Persian Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Seafood Restaurant,Sushi Restaurant,Taiwanese Restaurant,Thai Restaurant,Theme Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,Berczy Park,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,2,2,1,0,1,0,1,0
1,"Brockton, Parkdale Village, Exhibition Place",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
2,"Business reply mail Processing Centre, South C...",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
3,"CN Tower, King and Spadina, Railway Lands, Har...",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Central Bay Street,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,1,3,2,1,0,0,0,0,1,1,0,0,1,0,1,1,1,0,1,0,2,0,1,0


**Adding a column containing total number of restaurants in that neighbourhood. This will help us in making clusters using K-Means clustering algorithm.**

In [177]:
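# total restaurants per neighbourhood, summed over the numeric restaurant columns only
toronto_restaurants['Total'] = toronto_restaurants.drop('Neighbourhood', axis=1).sum(axis=1)

# mean frequency of every venue category per neighbourhood
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()

# keep only the numeric features for clustering
toronto_restaurants = toronto_restaurants.drop('Neighbourhood', axis=1)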
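The restaurant filter and the row total can be illustrated on a toy table. The counts below are illustrative values, and `restaurant_cols` and `restaurants` are names used only in this sketch. The total is taken over the numeric restaurant columns only, so the text column never enters the sum.

```python
import pandas as pd

# Toy per-neighbourhood counts (illustrative values).
counts = pd.DataFrame({
    "Neighbourhood": ["Berczy Park", "Christie"],
    "Italian Restaurant": [1, 1],
    "Coffee Shop": [5, 1],
    "Restaurant": [2, 1],
})

# Keep categories whose name contains 'Restaurant'; 'Coffee Shop' is dropped.
restaurant_cols = [c for c in counts.columns if "Restaurant" in c]
restaurants = counts[["Neighbourhood"] + restaurant_cols].copy()

# Row-wise total across the restaurant columns only.
restaurants["Total"] = restaurants[restaurant_cols].sum(axis=1)
print(restaurants)
```

Selecting the columns explicitly before summing avoids depending on how a given pandas version treats non-numeric columns in `sum(axis=1)`.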


**Using the K-Means clustering algorithm to group the neighbourhoods into clusters, which simplifies the analysis:**

In [178]:
# set number of clusters
kclusters = 5


# run k-means clustering
kmeans = KMeans(n_clusters=kclusters,random_state=0).fit(toronto_restaurants)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 1, 1, 1, 4, 1, 0, 0, 2, 1], dtype=int32)
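When reading the label array, it helps to remember that the label numbers are arbitrary identifiers: only which rows share a label matters, not whether a label is 0 or 4. A minimal sketch on made-up one-dimensional data (the values in `totals` are not from the notebook):

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up restaurant totals: three small values and two large ones.
totals = np.array([[0], [2], [4], [20], [22]])

# n_init is set explicitly so the call behaves the same across versions.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(totals)
labels = model.labels_

# Rows with similar totals share a label; which integer they get is arbitrary.
print(labels)
print(model.cluster_centers_.ravel())
```

Comparing `cluster_centers_` rather than the label integers is what tells you which cluster holds the low-count neighbourhoods.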

In [179]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Preparing a dataframe **venues_sorted** that lists every neighbourhood of Toronto along with its **top 10 most common venues**. This makes each cluster easier to inspect once the clusters are formed.

In [180]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
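for ind in np.arange(num_top_venues):
    try:
        # 1st, 2nd, 3rd use the suffixes listed in `indicators`
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # every later position falls back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))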

# create a new dataframe
venues_sorted = pd.DataFrame(columns=columns)
venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Bakery,Pub,Cheese Shop,Restaurant,Seafood Restaurant,Farmers Market,Beer Bar,Cocktail Bar,Café
1,"Brockton, Parkdale Village, Exhibition Place",Café,Bakery,Nightclub,Coffee Shop,Breakfast Spot,Gym / Fitness Center,Performing Arts Venue,Pet Store,Climbing Gym,Restaurant
2,"Business reply mail Processing Centre, South C...",Gym / Fitness Center,Garden Center,Brewery,Light Rail Station,Farmers Market,Fast Food Restaurant,Burrito Place,Restaurant,Auto Workshop,Garden
3,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Lounge,Airport Service,Airport Terminal,Bar,Harbor / Marina,Coffee Shop,Sculpture Garden,Boat or Ferry,Rental Car Location,Boutique
4,Central Bay Street,Coffee Shop,Italian Restaurant,Sandwich Place,Café,Department Store,Juice Bar,Thai Restaurant,Japanese Restaurant,Salad Place,Burger Joint
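The ranking logic can be checked on a single toy row. This is a sketch with hypothetical helper names (`top_categories`, `ordinal`) and made-up frequencies; it drops the name column by label rather than by position, and builds the ordinal suffix without relying on an exception.

```python
import pandas as pd

def top_categories(row, n):
    """Return the n category names with the highest frequency in a grouped row."""
    freqs = row.drop("Neighbourhood").astype(float)
    return freqs.sort_values(ascending=False).index[:n].tolist()

def ordinal(k):
    """Format a positive integer with its English ordinal suffix."""
    if 10 <= k % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(k % 10, "th")
    return f"{k}{suffix}"

# Made-up frequencies for one neighbourhood.
row = pd.Series({"Neighbourhood": "Toy Neighbourhood",
                 "Bakery": 0.2, "Coffee Shop": 0.5, "Pub": 0.3})

top_two = top_categories(row, 2)
headers = [f"{ordinal(i + 1)} Most Common Venue" for i in range(len(top_two))]
print(dict(zip(headers, top_two)))
```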


In [181]:
# add clustering labels
venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

After adding cluster labels to **venues_sorted** dataframe:

In [182]:
venues_sorted.head()

Unnamed: 0,Cluster Labels,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,3,Berczy Park,Coffee Shop,Bakery,Pub,Cheese Shop,Restaurant,Seafood Restaurant,Farmers Market,Beer Bar,Cocktail Bar,Café
1,1,"Brockton, Parkdale Village, Exhibition Place",Café,Bakery,Nightclub,Coffee Shop,Breakfast Spot,Gym / Fitness Center,Performing Arts Venue,Pet Store,Climbing Gym,Restaurant
2,1,"Business reply mail Processing Centre, South C...",Gym / Fitness Center,Garden Center,Brewery,Light Rail Station,Farmers Market,Fast Food Restaurant,Burrito Place,Restaurant,Auto Workshop,Garden
3,1,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Lounge,Airport Service,Airport Terminal,Bar,Harbor / Marina,Coffee Shop,Sculpture Garden,Boat or Ferry,Rental Car Location,Boutique
4,4,Central Bay Street,Coffee Shop,Italian Restaurant,Sandwich Place,Café,Department Store,Juice Bar,Thai Restaurant,Japanese Restaurant,Salad Place,Burger Joint


Creating a dataframe **toronto_merged**, by merging two dataframes: **df** and **venues_sorted**. 

In [183]:

toronto_merged = df

toronto_merged = toronto_merged.join(venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

toronto_merged.head(10) 

Unnamed: 0,Postal Code,Borough,Neighbourhood,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,1,Coffee Shop,Bakery,Park,Café,Pub,Breakfast Spot,Theater,Yoga Studio,Farmers Market,Restaurant
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,2,Coffee Shop,Yoga Studio,College Auditorium,Beer Bar,Smoothie Shop,Sandwich Place,Café,Portuguese Restaurant,Chinese Restaurant,Persian Restaurant
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,4,Clothing Store,Coffee Shop,Café,Japanese Restaurant,Bubble Tea Shop,Cosmetics Shop,Fast Food Restaurant,Ramen Restaurant,Bookstore,Pizza Place
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Coffee Shop,Café,Cocktail Bar,Gastropub,Beer Bar,Restaurant,American Restaurant,Park,Seafood Restaurant,Farmers Market
4,M4E,East Toronto,The Beaches,43.676357,-79.293031,1,Trail,Pub,Neighborhood,Health Food Store,Doner Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Yoga Studio
5,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,3,Coffee Shop,Bakery,Pub,Cheese Shop,Restaurant,Seafood Restaurant,Farmers Market,Beer Bar,Cocktail Bar,Café
6,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,4,Coffee Shop,Italian Restaurant,Sandwich Place,Café,Department Store,Juice Bar,Thai Restaurant,Japanese Restaurant,Salad Place,Burger Joint
7,M6G,Downtown Toronto,Christie,43.669542,-79.422564,1,Grocery Store,Café,Park,Restaurant,Baby Store,Candy Store,Athletics & Sports,Italian Restaurant,Diner,Coffee Shop
8,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,0,Coffee Shop,Café,Hotel,Bar,Restaurant,Gym,Clothing Store,Thai Restaurant,Concert Hall,Lounge
9,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259,1,Bakery,Pharmacy,Music Venue,Bar,Supermarket,Middle Eastern Restaurant,Portuguese Restaurant,Café,Furniture / Home Store,Bank
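Because `join` keeps every row of the left dataframe, a neighbourhood with no match on the right ends up with a missing cluster label, and the label column becomes floating point. The sketch below shows this on toy data; `Toy Neighbourhood` is a made-up name used only to create an unmatched row.

```python
import pandas as pd

left = pd.DataFrame({
    "Neighbourhood": ["Berczy Park", "Christie", "Toy Neighbourhood"],
})
right = pd.DataFrame({
    "Neighbourhood": ["Berczy Park", "Christie"],
    "Cluster Labels": [3, 1],
})

# Same pattern as above: index the right side by name, join on the name column.
merged = left.join(right.set_index("Neighbourhood"), on="Neighbourhood")

# Rows without a match carry NaN and force the column to float.
missing = merged[merged["Cluster Labels"].isna()]
print(merged)
print(missing["Neighbourhood"].tolist())
```

In this notebook both dataframes contain the same 39 neighbourhoods, so no labels are missing, but the check is cheap insurance before using the labels as list indices.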


**Creating a map of Toronto showing all 39 neighbourhoods, with different colours representing the neighbourhoods that belong to different clusters:**

In [184]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(toronto_merged['latitude'], toronto_merged['longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Cluster-wise segmentation of the main dataset (the toronto_merged dataframe):

In [185]:
df0=toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
df0.head()

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Downtown Toronto,0,Coffee Shop,Café,Cocktail Bar,Gastropub,Beer Bar,Restaurant,American Restaurant,Park,Seafood Restaurant,Farmers Market
8,Downtown Toronto,0,Coffee Shop,Café,Hotel,Bar,Restaurant,Gym,Clothing Store,Thai Restaurant,Concert Hall,Lounge
13,Downtown Toronto,0,Coffee Shop,Hotel,Café,Restaurant,Japanese Restaurant,Italian Restaurant,American Restaurant,Seafood Restaurant,Salad Place,Gastropub
16,Downtown Toronto,0,Coffee Shop,Restaurant,Café,Hotel,Gym,American Restaurant,Seafood Restaurant,Gastropub,Japanese Restaurant,Beer Bar
34,Downtown Toronto,0,Coffee Shop,Pub,Café,Seafood Restaurant,Beer Bar,Hotel,Restaurant,Italian Restaurant,Japanese Restaurant,Creperie


In [186]:
df1=toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
df1.head()

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,1,Coffee Shop,Bakery,Park,Café,Pub,Breakfast Spot,Theater,Yoga Studio,Farmers Market,Restaurant
4,East Toronto,1,Trail,Pub,Neighborhood,Health Food Store,Doner Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Yoga Studio
7,Downtown Toronto,1,Grocery Store,Café,Park,Restaurant,Baby Store,Candy Store,Athletics & Sports,Italian Restaurant,Diner,Coffee Shop
9,West Toronto,1,Bakery,Pharmacy,Music Venue,Bar,Supermarket,Middle Eastern Restaurant,Portuguese Restaurant,Café,Furniture / Home Store,Bank
14,West Toronto,1,Café,Bakery,Nightclub,Coffee Shop,Breakfast Spot,Gym / Fitness Center,Performing Arts Venue,Pet Store,Climbing Gym,Restaurant


In [187]:
df2=toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
df2.head()

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Downtown Toronto,2,Coffee Shop,Yoga Studio,College Auditorium,Beer Bar,Smoothie Shop,Sandwich Place,Café,Portuguese Restaurant,Chinese Restaurant,Persian Restaurant
17,East Toronto,2,Café,Coffee Shop,American Restaurant,Bakery,Brewery,Gastropub,Yoga Studio,Fish Market,Pet Store,Park
22,West Toronto,2,Mexican Restaurant,Café,Thai Restaurant,Arts & Crafts Store,Discount Store,Bar,Diner,Bakery,Speakeasy,Italian Restaurant
26,Central Toronto,2,Pizza Place,Sandwich Place,Dessert Shop,Italian Restaurant,Gym,Coffee Shop,Café,Sushi Restaurant,Seafood Restaurant,Discount Store
27,Downtown Toronto,2,Café,Sandwich Place,Bar,Japanese Restaurant,Bookstore,Bakery,Restaurant,Yoga Studio,Italian Restaurant,Beer Bar


In [188]:
df3=toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
df3.head()

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Downtown Toronto,3,Coffee Shop,Bakery,Pub,Cheese Shop,Restaurant,Seafood Restaurant,Farmers Market,Beer Bar,Cocktail Bar,Café
10,Downtown Toronto,3,Coffee Shop,Aquarium,Hotel,Café,Restaurant,Fried Chicken Joint,Scenic Lookout,Pizza Place,Brewery,Bar
11,West Toronto,3,Bar,Coffee Shop,Asian Restaurant,Restaurant,Café,Men's Store,Vietnamese Restaurant,Ice Cream Shop,Italian Restaurant,Gift Shop
12,East Toronto,3,Greek Restaurant,Coffee Shop,Italian Restaurant,Restaurant,Furniture / Home Store,Ice Cream Shop,Pizza Place,Caribbean Restaurant,Pub,Café
35,Downtown Toronto,3,Coffee Shop,Italian Restaurant,Bakery,Park,Chinese Restaurant,Pizza Place,Restaurant,Pub,Café,Diner


In [189]:
df4=toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
df4.head()

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Downtown Toronto,4,Clothing Store,Coffee Shop,Café,Japanese Restaurant,Bubble Tea Shop,Cosmetics Shop,Fast Food Restaurant,Ramen Restaurant,Bookstore,Pizza Place
6,Downtown Toronto,4,Coffee Shop,Italian Restaurant,Sandwich Place,Café,Department Store,Juice Bar,Thai Restaurant,Japanese Restaurant,Salad Place,Burger Joint
30,Downtown Toronto,4,Vegetarian / Vegan Restaurant,Coffee Shop,Café,Mexican Restaurant,Vietnamese Restaurant,Bar,Park,Gaming Cafe,Caribbean Restaurant,Dessert Shop


## Analysis: <a name="analysis"></a>

In [190]:
print('Total number of neighbourhoods in cluster 0 is',toronto_restaurants.loc[df0.index,:].shape[0])
print('Total number of restaurants in this cluster is', toronto_restaurants.loc[df0.index,:]['Total'].sum())
print('Ratio of Restaurant/Neighbourhood in this cluster is',(toronto_restaurants.loc[df0.index,:]['Total'].sum()/toronto_restaurants.loc[df0.index,:].shape[0]) )

Total number of neighbourhoods in cluster 0 is 7
Total number of restaurants in this cluster is 78
Ratio of Restaurant/Neighbourhood in this cluster is 11.142857142857142


In [191]:
print('Total number of neighbourhoods in cluster 1 is',toronto_restaurants.loc[df1.index,:].shape[0])
print('Total number of restaurants in this cluster is', toronto_restaurants.loc[df1.index,:]['Total'].sum())
print('Ratio of Restaurant/Neighbourhood in this cluster is',(toronto_restaurants.loc[df1.index,:]['Total'].sum()/toronto_restaurants.loc[df1.index,:].shape[0]) )

Total number of neighbourhoods in cluster 1 is 18
Total number of restaurants in this cluster is 201
Ratio of Restaurant/Neighbourhood in this cluster is 11.166666666666666


In [192]:
print('Total number of neighbourhoods in cluster 2 is',toronto_restaurants.loc[df2.index,:].shape[0])
print('Total number of restaurants in this cluster is', toronto_restaurants.loc[df2.index,:]['Total'].sum())
print('Ratio of Restaurant/Neighbourhood in this cluster is',(toronto_restaurants.loc[df2.index,:]['Total'].sum()/toronto_restaurants.loc[df2.index,:].shape[0]) )

Total number of neighbourhoods in cluster 2 is 6
Total number of restaurants in this cluster is 35
Ratio of Restaurant/Neighbourhood in this cluster is 5.833333333333333


In [193]:
print('Total number of neighbourhoods in cluster 3 is',toronto_restaurants.loc[df3.index,:].shape[0])
print('Total number of restaurants in this cluster is', toronto_restaurants.loc[df3.index,:]['Total'].sum())
print('Ratio of Restaurant/Neighbourhood in this cluster is',(toronto_restaurants.loc[df3.index,:]['Total'].sum()/toronto_restaurants.loc[df3.index,:].shape[0]) )

Total number of neighbourhoods in cluster 3 is 5
Total number of restaurants in this cluster is 36
Ratio of Restaurant/Neighbourhood in this cluster is 7.2


In [194]:
print('Total number of neighbourhoods in cluster 4 is',toronto_restaurants.loc[df4.index,:].shape[0])
print('Total number of restaurants in this cluster is', toronto_restaurants.loc[df4.index,:]['Total'].sum())
print('Ratio of Restaurant/Neighbourhood in this cluster is',(toronto_restaurants.loc[df4.index,:]['Total'].sum()/toronto_restaurants.loc[df4.index,:].shape[0]) )

Total number of neighbourhoods in cluster 4 is 3
Total number of restaurants in this cluster is 39
Ratio of Restaurant/Neighbourhood in this cluster is 13.0
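One point worth checking in the cells above: `toronto_restaurants` was built with `groupby('Neighbourhood')`, so its rows follow alphabetical neighbourhood order, while `df0.index` to `df4.index` come from `toronto_merged`, which follows the row order of `df`. A lookup by row label can therefore pair a cluster with rows of other neighbourhoods. A hedged sketch of the same per-cluster ratio, aligned on the neighbourhood name instead (toy data; `labels`, `totals`, `aligned` and `summary` are names local to this example):

```python
import pandas as pd

# Cluster label per neighbourhood, in one row order.
labels = pd.DataFrame({
    "Neighbourhood": ["A", "B", "C", "D"],
    "Cluster Labels": [0, 0, 1, 1],
})
# Restaurant totals, deliberately in a different row order.
totals = pd.DataFrame({
    "Neighbourhood": ["D", "C", "B", "A"],
    "Total": [1, 3, 10, 12],
})

# Align on the name, not on the row position.
aligned = labels.merge(totals, on="Neighbourhood", how="left")

summary = aligned.groupby("Cluster Labels")["Total"].agg(
    neighbourhoods="count", restaurants="sum"
)
summary["ratio"] = summary["restaurants"] / summary["neighbourhoods"]
print(summary)
```

A single `groupby` also replaces the five near-identical cells with one table that is easier to compare across clusters.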


### Note: The Restaurant/Neighbourhood ratio is lowest for Cluster 2 (about 5.83), so we will analyse only the neighbourhoods in Cluster 2 further.

In [195]:
toronto_restaurants.loc[df2.index,:]

Unnamed: 0,Afghan Restaurant,American Restaurant,Asian Restaurant,Belgian Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Caribbean Restaurant,Chinese Restaurant,Colombian Restaurant,Comfort Food Restaurant,Cuban Restaurant,Dim Sum Restaurant,Doner Restaurant,Dumpling Restaurant,Eastern European Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Latin American Restaurant,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,New American Restaurant,Persian Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Seafood Restaurant,Sushi Restaurant,Taiwanese Restaurant,Thai Restaurant,Theme Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Total
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,2
17,0,0,0,1,0,0,2,0,0,1,0,1,1,2,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,3,20
22,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,4
26,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
27,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
28,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,2,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,2,0,0,0,1,0,9


In [196]:
toronto_restaurants.loc[df2.index,:].Total

1      2
17    20
22     4
26     0
27     0
28     9
Name: Total, dtype: int64

In [197]:
print(toronto_restaurants.loc[df2.index,:][toronto_restaurants.loc[df2.index,:].Total==20])
print(toronto_restaurants.loc[df2.index,:][toronto_restaurants.loc[df2.index,:].Total==9])

    Afghan Restaurant  American Restaurant  ...  Vietnamese Restaurant  Total
17                  0                    0  ...                      3     20

[1 rows x 48 columns]
    Afghan Restaurant  American Restaurant  ...  Vietnamese Restaurant  Total
28                  0                    0  ...                      0      9

[1 rows x 48 columns]


As we can see, the neighbourhoods at index 17 and 28 have a very high total number of restaurants (20 and 9 respectively), so we will remove them from the df2 dataframe:

In [198]:
df2.drop([17,28],axis=0,inplace=True)
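
The two rows above are dropped by their hard-coded index. As a hedged alternative, the same selection could be expressed as a threshold on the Total column. The cut-off `MAX_RESTAURANTS` below is hypothetical: the notebook does not state one, and 9 is used only because it reproduces the two rows removed by hand.

```python
import pandas as pd

# Totals of the cluster 2 neighbourhoods, copied from the output above.
totals = pd.Series(
    [2, 20, 4, 0, 0, 9],
    index=[1, 17, 22, 26, 27, 28],
    name="Total",
)

# Hypothetical cut-off (an assumption, not part of the original analysis).
MAX_RESTAURANTS = 9

# Index labels of neighbourhoods at or above the cut-off.
crowded = totals[totals >= MAX_RESTAURANTS].index

# drop() without inplace=True returns a new Series and leaves totals intact.
remaining = totals.drop(crowded)

print(list(crowded))
print(list(remaining.index))
```

Returning a new object instead of using `inplace=True` keeps the original data available if the threshold needs to be revisited later.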

In [199]:
toronto_restaurants.loc[df2.index,:]

Unnamed: 0,Afghan Restaurant,American Restaurant,Asian Restaurant,Belgian Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Caribbean Restaurant,Chinese Restaurant,Colombian Restaurant,Comfort Food Restaurant,Cuban Restaurant,Dim Sum Restaurant,Doner Restaurant,Dumpling Restaurant,Eastern European Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Latin American Restaurant,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,New American Restaurant,Persian Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Seafood Restaurant,Sushi Restaurant,Taiwanese Restaurant,Thai Restaurant,Theme Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Total
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,2
22,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,4
26,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
27,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [200]:
toronto_merged.loc[df2.index,:]

Unnamed: 0,Postal Code,Borough,Neighbourhood,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,2,Coffee Shop,Yoga Studio,College Auditorium,Beer Bar,Smoothie Shop,Sandwich Place,Café,Portuguese Restaurant,Chinese Restaurant,Persian Restaurant
22,M6P,West Toronto,"High Park, The Junction South",43.661608,-79.464763,2,Mexican Restaurant,Café,Thai Restaurant,Arts & Crafts Store,Discount Store,Bar,Diner,Bakery,Speakeasy,Italian Restaurant
26,M4S,Central Toronto,Davisville,43.704324,-79.38879,2,Pizza Place,Sandwich Place,Dessert Shop,Italian Restaurant,Gym,Coffee Shop,Café,Sushi Restaurant,Seafood Restaurant,Discount Store
27,M5S,Downtown Toronto,"University of Toronto, Harbord",43.662696,-79.400049,2,Café,Sandwich Place,Bar,Japanese Restaurant,Bookstore,Bakery,Restaurant,Yoga Studio,Italian Restaurant,Beer Bar


In the table above, we can see that the neighbourhood at index 22 (M6P) has a restaurant as its most common venue, and restaurants appear among its top venues more often than in any other neighbourhood. On its own, this would make it less suitable for a new restaurant. However, it is not removed from the df2 dataframe straight away, for the reasons below:

Here we have to look at a few things. If we consider the top 10 most common venues of each neighbourhood, every one of them contains 3 restaurants.
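
The count of 3 restaurants per neighbourhood can be checked with a short sketch. It assumes that a venue counts as a restaurant only when its category name contains the word "Restaurant", so categories such as Pizza Place, Diner or Café are not counted. The dictionary name `top10` is introduced here for illustration only.

```python
# Top 10 venue categories per postal code, copied from the table above.
top10 = {
    "M7A": ["Coffee Shop", "Yoga Studio", "College Auditorium", "Beer Bar",
            "Smoothie Shop", "Sandwich Place", "Café",
            "Portuguese Restaurant", "Chinese Restaurant", "Persian Restaurant"],
    "M6P": ["Mexican Restaurant", "Café", "Thai Restaurant",
            "Arts & Crafts Store", "Discount Store", "Bar", "Diner",
            "Bakery", "Speakeasy", "Italian Restaurant"],
    "M4S": ["Pizza Place", "Sandwich Place", "Dessert Shop",
            "Italian Restaurant", "Gym", "Coffee Shop", "Café",
            "Sushi Restaurant", "Seafood Restaurant", "Discount Store"],
    "M5S": ["Café", "Sandwich Place", "Bar", "Japanese Restaurant",
            "Bookstore", "Bakery", "Restaurant", "Yoga Studio",
            "Italian Restaurant", "Beer Bar"],
}

# Count the categories whose name contains "Restaurant" (assumed rule).
restaurant_counts = {
    code: sum("Restaurant" in venue for venue in venues)
    for code, venues in top10.items()
}

print(restaurant_counts)
```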

M4S: Population=26506, Private Dwellings=14011

M5S: Population=15372, Private Dwellings=9380

M6P: Population=40035, Private Dwellings=19924

M7A: Mainly contains a war memorial, some government offices, a library and other monuments. It also has a cafe (Quorum Cafe).

We can see that M6P has restaurants more than once among its top 5 most common venues (Mexican Restaurant and Thai Restaurant). However, its population is nearly equal to the population of M4S and M5S combined, so it has to be considered on an equal footing with them. With that many residents, part of M6P's customer base would go to the newly opened restaurant (if the client chooses it).
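
The population comparison can be made explicit with a little arithmetic. This is just a sketch using the figures quoted above; the variable names are introduced for illustration.

```python
# Population per postal code, as quoted above.
population = {"M4S": 26506, "M5S": 15372, "M6P": 40035}

# Combined population of M4S and M5S.
combined = population["M4S"] + population["M5S"]

# M6P's population as a fraction of that combined figure.
share = population["M6P"] / combined

print(combined)          # 41878
print(round(share, 2))   # 0.96
```

M6P alone therefore has roughly 96% of the combined population of the other two postal codes.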


In [201]:
toronto_merged.loc[df2.index,:]

Unnamed: 0,Postal Code,Borough,Neighbourhood,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,2,Coffee Shop,Yoga Studio,College Auditorium,Beer Bar,Smoothie Shop,Sandwich Place,Café,Portuguese Restaurant,Chinese Restaurant,Persian Restaurant
22,M6P,West Toronto,"High Park, The Junction South",43.661608,-79.464763,2,Mexican Restaurant,Café,Thai Restaurant,Arts & Crafts Store,Discount Store,Bar,Diner,Bakery,Speakeasy,Italian Restaurant
26,M4S,Central Toronto,Davisville,43.704324,-79.38879,2,Pizza Place,Sandwich Place,Dessert Shop,Italian Restaurant,Gym,Coffee Shop,Café,Sushi Restaurant,Seafood Restaurant,Discount Store
27,M5S,Downtown Toronto,"University of Toronto, Harbord",43.662696,-79.400049,2,Café,Sandwich Place,Bar,Japanese Restaurant,Bookstore,Bakery,Restaurant,Yoga Studio,Italian Restaurant,Beer Bar


**The above neighbourhoods look well suited for opening a restaurant. Therefore, we finally store the information of these 4 neighbourhoods in a dataframe named final:**

In [202]:
# keep all columns of toronto_merged for the remaining cluster 2 neighbourhoods
final=toronto_merged.loc[df2.index]
final

Unnamed: 0,Postal Code,Borough,Neighbourhood,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,2,Coffee Shop,Yoga Studio,College Auditorium,Beer Bar,Smoothie Shop,Sandwich Place,Café,Portuguese Restaurant,Chinese Restaurant,Persian Restaurant
22,M6P,West Toronto,"High Park, The Junction South",43.661608,-79.464763,2,Mexican Restaurant,Café,Thai Restaurant,Arts & Crafts Store,Discount Store,Bar,Diner,Bakery,Speakeasy,Italian Restaurant
26,M4S,Central Toronto,Davisville,43.704324,-79.38879,2,Pizza Place,Sandwich Place,Dessert Shop,Italian Restaurant,Gym,Coffee Shop,Café,Sushi Restaurant,Seafood Restaurant,Discount Store
27,M5S,Downtown Toronto,"University of Toronto, Harbord",43.662696,-79.400049,2,Café,Sandwich Place,Bar,Japanese Restaurant,Bookstore,Bakery,Restaurant,Yoga Studio,Italian Restaurant,Beer Bar


**Visualising these 4 neighbourhoods on a map:**

In [203]:
# create map of Toronto using latitude and longitude values:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=15)

# add markers to map
for lat, lng, borough, neighbourhood in zip(final['latitude'], final['longitude'], final['Borough'], final['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=9,
        popup=label,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=1,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### The 4 neighbourhoods are depicted by 4 blue dots in the above map.
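
Because the map is centred on the `latitude` and `longitude` values with a fixed `zoom_start`, some of the four markers may fall outside the initial view. One possible remedy, sketched here as an assumption rather than as part of the original analysis, is to compute the bounding box of the four points. The list name `points` is introduced for illustration.

```python
# Coordinates of the four final neighbourhoods, copied from the final table.
points = [
    (43.662301, -79.389494),  # M7A
    (43.661608, -79.464763),  # M6P
    (43.704324, -79.388790),  # M4S
    (43.662696, -79.400049),  # M5S
]

lats = [lat for lat, _ in points]
lngs = [lng for _, lng in points]

# South-west and north-east corners of the box enclosing all markers.
bounds = [[min(lats), min(lngs)], [max(lats), max(lngs)]]

print(bounds)
```

The resulting `bounds` list could then be passed to folium's `Map.fit_bounds`, for example `map_toronto.fit_bounds(bounds)`, so that the initial view covers all four markers.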

## Results and Discussion <a name="results"></a>

Our analysis shows that although there is a great number of restaurants in Toronto, there are pockets of low restaurant density fairly close to the city center. To identify these pockets, we used a clustering algorithm and segmented our neighbourhood dataset accordingly.

We used the K-means clustering algorithm to form 5 clusters, each containing some neighbourhoods, based on the number of restaurants in their vicinity. Then we analysed each cluster by calculating its Restaurant/Neighbourhood ratio. We saw that cluster 2 had the lowest ratio, which means very few restaurants are present in the vicinity of each of its neighbourhoods. A total of 6 neighbourhoods belonged to cluster 2. Upon further analysis, we found that 2 of those were not good for opening a new restaurant, since they already had 20 and 9 restaurants. Hence, only 4 neighbourhoods were left.

According to our analysis, we got a total of 4 neighbourhoods where the restaurant business should do well. There are two reasons for that. The first reason is that these neighbourhoods do not contain many restaurants in their vicinity, which lowers the competition in the restaurant business. The second reason is that, as the population figures given in the analysis show, the residential postal codes among them (especially M6P) have a sizeable population, which means more customers and hence more profit.

The final 4 neighbourhoods that are well suited for opening a new restaurant are stored in a dataframe named final, which contains, among other columns, the latitude, longitude and borough of these neighbourhoods.

The owners can then choose from these 4 locations the one that best fits the type of restaurant they are trying to open.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify neighbourhoods in Toronto with a low number of restaurants, in order to aid stakeholders in narrowing down the search for an optimal location for a new restaurant. By calculating the restaurant density distribution from Foursquare data, we first identified the most common nearby venues of each neighbourhood. Then, with the help of clustering techniques and further analysis, we were able to narrow the search down to 4 neighbourhoods which were good for opening a new restaurant. This concludes this project of the Battle of the Neighbourhoods.