 # Capstone Project - The Battle of Neighborhoods (Week 2)
 
### Applied Data Science Capstone by IBM/Coursera

### Author: Ying

## Table of Contents

* [Introduction](#introduction)
* [Business Problem](#problem)
* [Data](#data)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction <a name="introduction"></a>

<b>Toronto</b> is the provincial capital of Ontario and the most populous city in Canada. It is an international centre of business, finance, arts, and culture, and is recognized as one of the most multicultural and cosmopolitan cities in the world. The diversity of its population reflects Toronto's current and historical role as an important destination for immigrants to Canada: more than 50 percent of residents belong to a visible minority population group, over 200 distinct ethnic origins are represented among its inhabitants, and, while most Torontonians speak English as their primary language, over 160 languages are spoken in the city. Analysing how existing restaurants are distributed across the city's neighborhoods gives a clearer picture of the business environment. That picture helps in targeting the market strategically, reduces the risk of starting a new business, and improves the chance of a reasonable return on investment.

## Business Problem <a name="problem"></a>

The cuisine of Toronto reflects the city's size and multicultural diversity. Suppose one of my friends, Lily, wants to open a sushi restaurant in Toronto. A sushi restaurant can be a good business opportunity, but a new sushi bar should open in a neighborhood that is underserved by existing sushi places, so that it can attract more customers. This analysis therefore aims to find neighborhoods that have enough potential customers and are not too close to other sushi restaurants.

## Data <a name="data"></a>

The Toronto neighborhood dataset has a total of 10 boroughs and 103 neighborhoods. To segment the neighborhoods and explore them, we need a dataset that contains the boroughs, the neighborhoods that exist in each borough, and the latitude and longitude coordinates of each neighborhood. This data was prepared in the Week 3 assignment and exported as a CSV file for the analysis below.

Once we have each neighborhood's latitude and longitude, we use the Foursquare API to retrieve restaurant details around every Toronto neighborhood. The details can be retrieved with the venue search endpoint. For this project we only need sushi restaurant data. The search endpoint accepts a category ID attribute: Foursquare defines a category ID for each venue category (such as Indian, Italian, or Mexican restaurants), which lets us retrieve only the desired venues. The sushi category ID 4bf58dd8d48988d1d2941735 is used to retrieve data from the Foursquare API.
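A minimal sketch of how such a search request can be assembled, using only the standard library's `urllib.parse.urlencode`. The parameter names (`client_id`, `client_secret`, `v`, `ll`, `radius`, `limit`, `categoryId`) are the ones this notebook passes to the v2 search endpoint; the helper name `build_search_url` is hypothetical, and the default `limit=50` assumes the v2 search endpoint caps results at 50 per request.

```python
from urllib.parse import urlencode

# Endpoint and sushi category ID as used in this notebook
SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"
SUSHI_CATEGORY_ID = "4bf58dd8d48988d1d2941735"


def build_search_url(lat, lng, client_id, client_secret, version,
                     category_id=SUSHI_CATEGORY_ID, radius=1000, limit=50):
    """Return a venue-search URL centred on one neighbourhood (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,               # API version date, e.g. '20181020'
        "ll": f"{lat},{lng}",       # search centre as 'latitude,longitude'
        "radius": radius,           # search radius in metres
        "limit": limit,             # assumed cap of 50 results per request
        "categoryId": category_id,  # only return venues in this category
    }
    # urlencode percent-encodes values, e.g. the comma in 'll' becomes %2C
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


url = build_search_url(43.773136, -79.239476,
                       "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20181020")
print(url)
```

Passing a dictionary instead of formatting the URL by hand keeps the optional category filter to a single extra key and leaves the encoding to the library.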

## Analysis <a name="analysis"></a>

### Import Libraries

In this section we import the libraries that will be required to process the data.

The first library is Pandas.
Pandas is an open source, BSD-licensed library, providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

In [3]:
pip install beautifulsoup4

Collecting beautifulsoup4
  Downloading https://files.pythonhosted.org/packages/e8/b5/7bb03a696f2c9b7af792a8f51b82974e51c268f15e925fc834876a4efa0b/beautifulsoup4-4.9.0-py3-none-any.whl (109kB)
Collecting soupsieve>1.2 (from beautifulsoup4)
  Downloading https://files.pythonhosted.org/packages/05/cf/ea245e52f55823f19992447b008bcbb7f78efc5960d77f6c34b5b45b36dd/soupsieve-2.0-py2.py3-none-any.whl
Installing collected packages: soupsieve, beautifulsoup4
Successfully installed beautifulsoup4-4.9.0 soupsieve-2.0
Note: you may need to restart the kernel to use updated packages.


In [4]:
import json
import urllib.request
from urllib.request import urlopen

import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
from pandas import json_normalize  # requires pandas >= 1.0 (pandas.io.json.json_normalize is deprecated)

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim  # converts an address into latitude and longitude

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes
import folium  # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


#### Transform the data into a *pandas* dataframe

Download and Explore Dataset

The neighborhood dataset was built in the Week 3 capstone assignment. For convenience, we load the cleaned data directly from the Week 3 results.

In [5]:
data = pd.read_csv("Toront data.csv",header=0,index_col=0)

print('Data downloaded!')

data.head(5)

Data downloaded!


|   | Postcode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Rouge Hill, Port Union, Highland Creek | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |


In [74]:
data.shape

(103, 5)

#### Use geopy library to get the latitude and longitude values of Toronto City.

To define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>explorer</em>, as shown below.

In [6]:
address = 'Toronto, CA'

geolocator = Nominatim(user_agent="explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


#### Create a map of Toronto with neighborhoods superimposed on top.

In [7]:
import folium
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add a circle marker for each neighbourhood
for lat, lng, borough, neighbourhood in zip(data['Latitude'], data['Longitude'], data['Borough'], data['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto

### Foursquare Venues


In [8]:
import requests
import pandas as pd

def getNearbyVenues(names, latitudes, longitudes, radius=5000, categoryIds=''):
    """Query the Foursquare v2 venue-search endpoint around each neighbourhood centre."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # build the request parameters; requests URL-encodes them
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }
        if categoryIds != '':
            params['categoryId'] = categoryIds

        # make the GET request and fail loudly on HTTP errors (e.g. quota exceeded)
        response = requests.get('https://api.foursquare.com/v2/venues/search', params=params)
        response.raise_for_status()
        results = response.json()['response']['venues']

        # keep only the relevant fields; skip venues that have no category
        for v in results:
            if not v.get('categories'):
                continue
            venues_list.append((
                name,
                lat,
                lng,
                v['name'],
                v['location']['lat'],
                v['location']['lng'],
                v['categories'][0]['name'],
            ))

    nearby_venues = pd.DataFrame(venues_list, columns=[
        'Neighbourhood',
        'Neighbourhood Latitude',
        'Neighbourhood Longitude',
        'Venue',
        'Venue Latitude',
        'Venue Longitude',
        'Venue Category',
    ])
    return nearby_venues
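The parsing step inside `getNearbyVenues` can be exercised offline. The sketch below uses a hypothetical helper `parse_venues` and a hand-made response containing only the fields the notebook reads (`response.venues[].name`, `location.lat`, `location.lng`, `categories[].name`); it is not a full description of the Foursquare response format. Like the notebook, it skips venues without a category and keeps the first-listed category.

```python
def parse_venues(neighbourhood, lat, lng, response):
    """Flatten one venue-search JSON response into row tuples (hypothetical helper)."""
    rows = []
    for v in response.get("response", {}).get("venues", []):
        categories = v.get("categories") or []
        if not categories:  # skip venues without a category, as the notebook does
            continue
        rows.append((
            neighbourhood, lat, lng,
            v["name"],
            v["location"]["lat"],
            v["location"]["lng"],
            categories[0]["name"],  # first-listed category
        ))
    return rows


# Hypothetical response shaped like the fields the notebook reads
sample = {"response": {"venues": [
    {"name": "Example Sushi", "location": {"lat": 43.78, "lng": -79.24},
     "categories": [{"name": "Sushi Restaurant"}]},
    {"name": "No Category Place", "location": {"lat": 43.77, "lng": -79.23},
     "categories": []},
]}}
rows = parse_venues("Cedarbrae", 43.773136, -79.239476, sample)
print(rows)
```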

In [9]:
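import os

LIMIT = 50     # the v2 search endpoint returns at most 50 venues per request
radius = 5000  # not used below: the sushi search passes radius=1000 explicitly
# read credentials from environment variables (names chosen here) instead of hard-coding them
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET')
VERSION = '20181020'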

In [10]:
#https://developer.foursquare.com/docs/resources/categories
#Sushi = 4bf58dd8d48988d1d2941735
Toronto_venues_sushi = getNearbyVenues(names=data['Neighbourhood'], latitudes=data['Latitude'], longitudes=data['Longitude'], radius=1000, categoryIds='4bf58dd8d48988d1d2941735')
Toronto_venues_sushi.head()

|   | Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Cedarbrae | 43.773136 | -79.239476 | HO Sushi Express | 43.780965 | -79.247445 | Sushi Restaurant |
| 1 | Cedarbrae | 43.773136 | -79.239476 | Fukuoka Sushi | 43.780679 | -79.24732 | Sushi Restaurant |
| 2 | Kennedy Park, Ionview, East Birchmount Park | 43.727929 | -79.262029 | Ikki sushi | 43.733489 | -79.257644 | Sushi Restaurant |
| 3 | Dorset Park, Wexford Heights, Scarborough Town... | 43.75741 | -79.273304 | Shiro Sushi | 43.756228 | -79.266965 | Japanese Restaurant |
| 4 | Agincourt | 43.7942 | -79.262029 | Sushi Legend | 43.796602 | -79.270292 | Sushi Restaurant |
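The business goal of staying "not too close to other sushi places" needs a distance measure. A minimal sketch, assuming a spherical Earth with a mean radius of 6,371 km, is the haversine great-circle distance. The helper names `haversine_m` and `competitors_within`, and the 1.5 km catchment used in the example, are illustrative choices and not part of the original analysis.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in metres (spherical model)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = radians(lat2 - lat1)
    d_lambda = radians(lng2 - lng1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def competitors_within(lat, lng, venues, radius_m=1000):
    """Count (lat, lng) venue positions within radius_m metres of a candidate site."""
    return sum(1 for v_lat, v_lng in venues
               if haversine_m(lat, lng, v_lat, v_lng) <= radius_m)


# Candidate at the Cedarbrae centre; venue positions taken from the table above
sushi_positions = [(43.780965, -79.247445),   # HO Sushi Express
                   (43.780679, -79.24732),    # Fukuoka Sushi
                   (43.733489, -79.257644)]   # Ikki sushi
print(competitors_within(43.773136, -79.239476, sushi_positions, radius_m=1500))
```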


In [11]:
Toronto_venues_sushi.to_csv("Toronto_venues_sushi.csv")

In [12]:
Toronto_venues_sushi.shape

(778, 7)
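Because the 1,000 m search circles around nearby neighbourhood centres can overlap, the same venue may be returned for more than one neighbourhood, so the 778 rows are venue records rather than necessarily 778 distinct restaurants. A plain-Python sketch of deduplicating on venue name and coordinates; the choice of key and the helper name `unique_venues` are assumptions.

```python
def unique_venues(rows):
    """Keep the first record of each venue, keyed on name and rounded coordinates."""
    seen = set()
    unique = []
    for row in rows:
        # rounding to 6 decimals guards against tiny float differences (assumption)
        key = (row["Venue"], round(row["Venue Latitude"], 6), round(row["Venue Longitude"], 6))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical records: the same restaurant returned by two overlapping searches
records = [
    {"Neighbourhood": "A", "Venue": "Example Sushi", "Venue Latitude": 43.65, "Venue Longitude": -79.38},
    {"Neighbourhood": "B", "Venue": "Example Sushi", "Venue Latitude": 43.65, "Venue Longitude": -79.38},
    {"Neighbourhood": "B", "Venue": "Other Sushi", "Venue Latitude": 43.66, "Venue Longitude": -79.39},
]
print(len(unique_venues(records)))
```

In pandas, `Toronto_venues_sushi.drop_duplicates(subset=['Venue', 'Venue Latitude', 'Venue Longitude'])` expresses the same idea.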

In [13]:
def addToMap(df, color, existingMap):
    for lat, lng, local, venue, venueCat in zip(df['Venue Latitude'], df['Venue Longitude'], df['Neighbourhood'], df['Venue'], df['Venue Category']):
        label = '{} ({}) - {}'.format(venue, venueCat, local)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color=color,
            fill=True,
            fill_color=color,
            fill_opacity=0.7).add_to(existingMap)

In [14]:
map_toronto_sushi = folium.Map(location=[latitude, longitude], zoom_start=10)
addToMap(Toronto_venues_sushi, 'red', map_toronto_sushi)

map_toronto_sushi

In [15]:
def addColumn(startDf, columnTitle, dataDf):
    """Add a column with the number of venues found in each neighbourhood (0 if none)."""
    counts = dataDf.groupby('Neighbourhood')['Venue'].count()
    # neighbourhoods missing from counts map to NaN, which becomes 0
    startDf[columnTitle] = startDf['Neighbourhood'].map(counts).fillna(0).astype(int)
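The idea behind `addColumn` (count venues per neighbourhood and use 0 where the search found nothing) can be written in plain Python with `collections.Counter`, whose lookup returns 0 for missing keys. The helper name `venue_counts` is hypothetical.

```python
from collections import Counter


def venue_counts(all_neighbourhoods, venue_neighbourhoods):
    """Map every neighbourhood to its number of venue records (0 if none were found)."""
    counts = Counter(venue_neighbourhoods)
    # indexing a Counter with a missing key returns 0 instead of raising KeyError
    return {n: counts[n] for n in all_neighbourhoods}


counts = venue_counts(["Cedarbrae", "Woburn", "Agincourt"],
                      ["Cedarbrae", "Cedarbrae", "Agincourt"])
print(counts)
```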

In [16]:
toronto_grouped = Toronto_venues_sushi.groupby('Neighbourhood').count()

print('There are {} unique categories.'.format(len(Toronto_venues_sushi['Venue Category'].unique())))

toronto_grouped

There are 10 unique categories.


| Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Agincourt | 1 | 1 | 1 | 1 | 1 | 1 |
| Bathurst Manor, Wilson Heights, Downsview North | 1 | 1 | 1 | 1 | 1 | 1 |
| Bayview Village | 2 | 2 | 2 | 2 | 2 | 2 |
| Bedford Park, Lawrence Manor East | 4 | 4 | 4 | 4 | 4 | 4 |
| Berczy Park | 30 | 30 | 30 | 30 | 30 | 30 |
| Brockton, Parkdale Village, Exhibition Place | 3 | 3 | 3 | 3 | 3 | 3 |
| Business reply mail Processing CentrE | 3 | 3 | 3 | 3 | 3 | 3 |
| CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport | 1 | 1 | 1 | 1 | 1 | 1 |
| Canada Post Gateway Processing Centre | 2 | 2 | 2 | 2 | 2 | 2 |
| Cedarbrae | 2 | 2 | 2 | 2 | 2 | 2 |


### Analyze Each Neighborhood

In [17]:
# one hot encoding
onehot = pd.get_dummies(Toronto_venues_sushi[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
onehot['Neighbourhood'] = Toronto_venues_sushi['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [onehot.columns[-1]] + list(onehot.columns[:-1])

onehot = onehot[fixed_columns]

print(onehot.shape)
onehot.head()

(778, 11)


|   | Neighbourhood | Asian Restaurant | Fast Food Restaurant | Grocery Store | Hawaiian Restaurant | Japanese Restaurant | Korean Restaurant | Restaurant | Seafood Restaurant | Sushi Restaurant | Thai Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Cedarbrae | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | Cedarbrae | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | Kennedy Park, Ionview, East Birchmount Park | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 3 | Dorset Park, Wexford Heights, Scarborough Town... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 4 | Agincourt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |


In [18]:
onehot_grouped = onehot.groupby('Neighbourhood').mean().reset_index()
print(onehot_grouped.shape)
onehot_grouped

(68, 11)


|   | Neighbourhood | Asian Restaurant | Fast Food Restaurant | Grocery Store | Hawaiian Restaurant | Japanese Restaurant | Korean Restaurant | Restaurant | Seafood Restaurant | Sushi Restaurant | Thai Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 1 | Bathurst Manor, Wilson Heights, Downsview North | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 2 | Bayview Village | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 3 | Bedford Park, Lawrence Manor East | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 4 | Berczy Park | 0.033333 | 0.0 | 0.0 | 0.0 | 0.133333 | 0.0 | 0.2 | 0.0 | 0.633333 | 0.0 |
| 5 | Brockton, Parkdale Village, Exhibition Place | 0.0 | 0.0 | 0.0 | 0.0 | 0.333333 | 0.0 | 0.0 | 0.0 | 0.666667 | 0.0 |
| 6 | Business reply mail Processing CentrE | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 7 | CN Tower, King and Spadina, Railway Lands, Har... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 8 | Canada Post Gateway Processing Centre | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 9 | Cedarbrae | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
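The one-hot encoding followed by `groupby(...).mean()` turns each neighbourhood into a vector of category shares: the fraction of its returned venues that fall in each category. A plain-Python sketch of the same computation (the helper name `category_frequencies` is hypothetical). With three venues, two sushi and one Japanese, it reproduces the 0.666667 / 0.333333 split shown for Brockton, Parkdale Village, Exhibition Place (shortened to "Brockton" here).

```python
from collections import Counter, defaultdict


def category_frequencies(rows):
    """rows: (neighbourhood, category) pairs -> {neighbourhood: {category: share}}."""
    per_hood = defaultdict(Counter)
    for hood, category in rows:
        per_hood[hood][category] += 1
    # one column per category seen anywhere, like the one-hot encoding
    categories = sorted({c for counts in per_hood.values() for c in counts})
    return {
        hood: {c: counts[c] / sum(counts.values()) for c in categories}
        for hood, counts in per_hood.items()
    }


freqs = category_frequencies([
    ("Brockton", "Sushi Restaurant"),
    ("Brockton", "Japanese Restaurant"),
    ("Brockton", "Sushi Restaurant"),
    ("Cedarbrae", "Sushi Restaurant"),
    ("Cedarbrae", "Sushi Restaurant"),
])
print(freqs)
```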


In [19]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [36]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = onehot_grouped['Neighbourhood']

for ind in np.arange(onehot_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(onehot_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

|   | Neighbourhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt | Sushi Restaurant | Thai Restaurant | Seafood Restaurant | Restaurant | Korean Restaurant | Japanese Restaurant | Hawaiian Restaurant | Grocery Store | Fast Food Restaurant | Asian Restaurant |
| 1 | Bathurst Manor, Wilson Heights, Downsview North | Sushi Restaurant | Thai Restaurant | Seafood Restaurant | Restaurant | Korean Restaurant | Japanese Restaurant | Hawaiian Restaurant | Grocery Store | Fast Food Restaurant | Asian Restaurant |
| 2 | Bayview Village | Sushi Restaurant | Thai Restaurant | Seafood Restaurant | Restaurant | Korean Restaurant | Japanese Restaurant | Hawaiian Restaurant | Grocery Store | Fast Food Restaurant | Asian Restaurant |
| 3 | Bedford Park, Lawrence Manor East | Sushi Restaurant | Thai Restaurant | Seafood Restaurant | Restaurant | Korean Restaurant | Japanese Restaurant | Hawaiian Restaurant | Grocery Store | Fast Food Restaurant | Asian Restaurant |
| 4 | Berczy Park | Sushi Restaurant | Restaurant | Japanese Restaurant | Asian Restaurant | Thai Restaurant | Seafood Restaurant | Korean Restaurant | Hawaiian Restaurant | Grocery Store | Fast Food Restaurant |
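Since `num_top_venues = 10` equals the number of categories, every neighbourhood gets ten "most common" entries even when only one or two categories have a nonzero share. The remaining positions are filled with zero-share categories in whatever order the sort leaves them, so they carry no information. A sketch of a ranking that keeps only nonzero shares and breaks ties alphabetically; the name `top_categories` and the tie-break rule are assumptions.

```python
def top_categories(shares, n):
    """Up to n category names with a nonzero share, highest share first, ties alphabetical."""
    nonzero = [(category, share) for category, share in shares.items() if share > 0]
    nonzero.sort(key=lambda item: (-item[1], item[0]))
    return [category for category, _ in nonzero[:n]]


# Shares from the Berczy Park row of the grouped table above (one zero-share category kept)
berczy = {"Asian Restaurant": 0.033333, "Japanese Restaurant": 0.133333,
          "Restaurant": 0.2, "Sushi Restaurant": 0.633333, "Thai Restaurant": 0.0}
print(top_categories(berczy, 10))
```

For Berczy Park this keeps the same first four entries as the table and simply stops there.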


### Cluster Neighborhoods


In [63]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = onehot_grouped.drop(columns='Neighbourhood')

# run k-means clustering (n_init=10 pins the restart count older scikit-learn used by default)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 1, 2, 3, 1, 1, 1, 1], dtype=int32)
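scikit-learn's `KMeans` uses k-means++ initialisation and several restarts (`n_init`), so the sketch below only illustrates the underlying Lloyd iteration and is not a replacement: assign each point to its nearest centroid by squared Euclidean distance, move each centroid to the mean of its members, and stop when no assignment changes. The function name `lloyd_kmeans` and the fixed initial centroids (the first k points) are assumptions made to keep the example deterministic.

```python
def lloyd_kmeans(points, k, init=None, max_iter=100):
    """Plain Lloyd iteration on a list of equal-length numeric tuples.

    init: indices of the points used as starting centroids (default: the first k).
    Returns (labels, centroids).
    """
    start = init if init is not None else range(k)
    centroids = [list(points[i]) for i in start]
    labels = None
    for _ in range(max_iter):
        # assignment step: nearest centroid by squared Euclidean distance
        new_labels = [
            min(range(k), key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))
            for point in points
        ]
        if new_labels == labels:
            break  # no assignment changed, so the centroids are stable
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:  # leave a centroid in place if its cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Two obvious groups of 2-D points
labels, centroids = lloyd_kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
print(labels)  # → [0, 0, 1, 1]
```

In the notebook each "point" is one row of category shares, so neighbourhoods with similar venue mixes end up in the same cluster.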

In [64]:
neighborhoods_venues_sorted.columns

Index(['Cluster Labels', 'Neighbourhood', '1st Most Common Venue',
       '2nd Most Common Venue', '3rd Most Common Venue',
       '4th Most Common Venue', '5th Most Common Venue',
       '6th Most Common Venue', '7th Most Common Venue',
       '8th Most Common Venue', '9th Most Common Venue',
       '10th Most Common Venue'],
      dtype='object')

In [65]:
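# add clustering labels
if 'Cluster Labels' in neighborhoods_venues_sorted.columns:
    print("neighborhoods_venues_sorted has Cluster Labels columns")
    # the column survives from an earlier run of this cell; overwrite it so it matches the current fit
    neighborhoods_venues_sorted['Cluster Labels'] = kmeans.labels_
else:
    neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = data.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')
# note: fillna(0) also gives label 0 to neighbourhoods with no sushi venues,
# so cluster 0 mixes them with neighbourhoods that clustering placed in cluster 0
toronto_merged.fillna(0, inplace=True)
toronto_merged['Cluster Labels'] = toronto_merged['Cluster Labels'].astype(int)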
toronto_merged

neighborhoods_venues_sorted has Cluster Labels columns


|   | Postcode | Borough | Neighbourhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | M1C | Scarborough | Rouge Hill, Port Union, Highland Creek | 43.784535 | -79.160497 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 | 0 | Sushi Restaurant | Thai Restaurant | Seafood Restaurant | Restaurant | Korean Restaurant | Japanese Restaurant | Hawaiian Restaurant | Grocery Store | Fast Food Restaurant | Asian Restaurant |
| 5 | M1J | Scarborough | Scarborough Village | 43.744734 | -79.239476 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | M1K | Scarborough | Kennedy Park, Ionview, East Birchmount Park | 43.727929 | -79.262029 | 0 | Sushi Restaurant | Thai Restaurant | Seafood Restaurant | Restaurant | Korean Restaurant | Japanese Restaurant | Hawaiian Restaurant | Grocery Store | Fast Food Restaurant | Asian Restaurant |
| 7 | M1L | Scarborough | Golden Mile, Clairlea, Oakridge | 43.711112 | -79.284577 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | M1M | Scarborough | Cliffside, Cliffcrest, Scarborough Village West | 43.716316 | -79.239476 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | M1N | Scarborough | Birch Cliff, Cliffside West | 43.692657 | -79.264848 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


In [66]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme: one colour per cluster label present, indexed by the label itself
n_colors = int(toronto_merged['Cluster Labels'].max()) + 1
colors_array = cm.rainbow(np.linspace(0, 1, n_colors))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
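Colour lookup is easiest to get right when the palette has exactly one entry per label and is indexed by the label itself. Shifting the index (as with `cluster-1`) makes label 0 wrap around to the last colour, and with five colours but six labels two clusters end up sharing a colour. A standard-library sketch using `colorsys` with evenly spaced hues; the helper name `cluster_palette` and the saturation/value settings are assumptions.

```python
import colorsys


def cluster_palette(k):
    """Return k hex colours with evenly spaced hues, one per label 0..k-1."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return palette


labels_present = [0, 1, 2, 3, 4, 5]  # e.g. the six labels reported in this notebook
palette = cluster_palette(max(labels_present) + 1)
colour_of = {label: palette[label] for label in labels_present}  # index by the label itself
print(colour_of)
```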

In [67]:
print('There are {} unique cluster labels.'.format(len(toronto_merged['Cluster Labels'].unique())))

There are 6 unique cluster labels.
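Because `fillna(0)` in the merge cell also assigns label 0 to neighbourhoods where the search returned no sushi venues, the cluster-0 listing mixes those neighbourhoods (the rows of zeros) with neighbourhoods that were actually assigned label 0 by clustering. A plain-Python sketch that keeps them apart; the helper name `group_by_cluster` and the `None` marker are assumptions. In pandas, the equivalent is to inspect `toronto_merged['Cluster Labels'].isna()` before filling.

```python
from collections import defaultdict


def group_by_cluster(assignments, no_venue_key="no sushi venues"):
    """assignments: (neighbourhood, label) pairs, label None when no venues were found."""
    groups = defaultdict(list)
    for hood, label in assignments:
        groups[no_venue_key if label is None else label].append(hood)
    return dict(groups)


groups = group_by_cluster([("Malvern, Rouge", None), ("Cedarbrae", 0), ("Woburn", None)])
print(groups)
```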


In [68]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Scarborough | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Scarborough | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Scarborough | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Scarborough | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Scarborough | 0 | Sushi Restaurant | Thai Restaurant | Seafood Restaurant | Restaurant | Korean Restaurant | Japanese Restaurant | Hawaiian Restaurant | Grocery Store | Fast Food Restaurant | Asian Restaurant |
| 5 | Scarborough | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | Scarborough | 0 | Sushi Restaurant | Thai Restaurant | Seafood Restaurant | Restaurant | Korean Restaurant | Japanese Restaurant | Hawaiian Restaurant | Grocery Store | Fast Food Restaurant | Asian Restaurant |
| 7 | Scarborough | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | Scarborough | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | Scarborough | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


In [69]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18 | North York | 1 | Japanese Restaurant | Sushi Restaurant | Restaurant | Asian Restaurant | Thai Restaurant | Seafood Restaurant | Korean Restaurant | Hawaiian Restaurant | Grocery Store | Fast Food Restaurant |


In [70]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]


|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 26 | North York | 2 | Sushi Restaurant | Japanese Restaurant | Thai Restaurant | Seafood Restaurant | Restaurant | Korean Restaurant | Hawaiian Restaurant | Grocery Store | Fast Food Restaurant | Asian Restaurant |
| 27 | North York | 2 | Sushi Restaurant | Japanese Restaurant | Thai Restaurant | Seafood Restaurant | Restaurant | Korean Restaurant | Hawaiian Restaurant | Grocery Store | Fast Food Restaurant | Asian Restaurant |
| 37 | East Toronto | 2 | Sushi Restaurant | Japanese Restaurant | Thai Restaurant | Seafood Restaurant | Restaurant | Korean Restaurant | Hawaiian Restaurant | Grocery Store | Fast Food Restaurant | Asian Restaurant |
| 38 | East York | 2 | Sushi Restaurant | Japanese Restaurant | Thai Restaurant | Seafood Restaurant | Restaurant | Korean Restaurant | Hawaiian Restaurant | Grocery Store | Fast Food Restaurant | Asian Restaurant |
| 48 | Central Toronto | 2 | Sushi Restaurant | Japanese Restaurant | Thai Restaurant | Seafood Restaurant | Restaurant | Korean Restaurant | Hawaiian Restaurant | Grocery Store | Fast Food Restaurant | Asian Restaurant |
| 49 | Central Toronto | 2 | Sushi Restaurant | Japanese Restaurant | Thai Restaurant | Seafood Restaurant | Restaurant | Korean Restaurant | Hawaiian Restaurant | Grocery Store | Fast Food Restaurant | Asian Restaurant |
| 51 | Downtown Toronto | 2 | Sushi Restaurant | Korean Restaurant | Thai Restaurant | Seafood Restaurant | Restaurant | Japanese Restaurant | Hawaiian Restaurant | Grocery Store | Fast Food Restaurant | Asian Restaurant |
| 52 | Downtown Toronto | 2 | Sushi Restaurant | Japanese Restaurant | Restaurant | Korean Restaurant | Fast Food Restaurant | Asian Restaurant | Thai Restaurant | Seafood Restaurant | Hawaiian Restaurant | Grocery Store |
| 54 | Downtown Toronto | 2 | Sushi Restaurant | Japanese Restaurant | Fast Food Restaurant | Asian Restaurant | Thai Restaurant | Seafood Restaurant | Restaurant | Korean Restaurant | Hawaiian Restaurant | Grocery Store |
| 55 | Downtown Toronto | 2 | Sushi Restaurant | Restaurant | Japanese Restaurant | Grocery Store | Asian Restaurant | Thai Restaurant | Seafood Restaurant | Korean Restaurant | Hawaiian Restaurant | Fast Food Restaurant |


In [71]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]


|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 29 | North York | 3 | Sushi Restaurant | Restaurant | Thai Restaurant | Seafood Restaurant | Korean Restaurant | Japanese Restaurant | Hawaiian Restaurant | Grocery Store | Fast Food Restaurant | Asian Restaurant |
| 46 | Central Toronto | 3 | Sushi Restaurant | Restaurant | Thai Restaurant | Seafood Restaurant | Korean Restaurant | Japanese Restaurant | Hawaiian Restaurant | Grocery Store | Fast Food Restaurant | Asian Restaurant |
| 56 | Downtown Toronto | 3 | Sushi Restaurant | Restaurant | Japanese Restaurant | Asian Restaurant | Thai Restaurant | Seafood Restaurant | Korean Restaurant | Hawaiian Restaurant | Grocery Store | Fast Food Restaurant |
| 59 | Downtown Toronto | 3 | Sushi Restaurant | Restaurant | Japanese Restaurant | Thai Restaurant | Seafood Restaurant | Korean Restaurant | Hawaiian Restaurant | Grocery Store | Fast Food Restaurant | Asian Restaurant |
| 69 | Downtown Toronto | 3 | Sushi Restaurant | Restaurant | Japanese Restaurant | Asian Restaurant | Thai Restaurant | Seafood Restaurant | Korean Restaurant | Hawaiian Restaurant | Grocery Store | Fast Food Restaurant |


In [72]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]


|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | Scarborough | 4 | Japanese Restaurant | Thai Restaurant | Sushi Restaurant | Seafood Restaurant | Restaurant | Korean Restaurant | Hawaiian Restaurant | Grocery Store | Fast Food Restaurant | Asian Restaurant |
| 75 | Downtown Toronto | 4 | Japanese Restaurant | Thai Restaurant | Sushi Restaurant | Seafood Restaurant | Restaurant | Korean Restaurant | Hawaiian Restaurant | Grocery Store | Fast Food Restaurant | Asian Restaurant |


In [73]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 5, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 50 | Downtown Toronto | 5 | Sushi Restaurant | Korean Restaurant | Thai Restaurant | Seafood Restaurant | Restaurant | Japanese Restaurant | Hawaiian Restaurant | Grocery Store | Fast Food Restaurant | Asian Restaurant |


## Results and Discussion <a name="results"></a>

Our analysis shows that although there is a large number of sushi restaurants in Toronto (778 venue records returned by the sushi category search within a 1,000 m radius of each neighbourhood, our initial area of interest), there are pockets of low restaurant density fairly close to the city center. The highest concentration of restaurants was detected in the downtown Toronto area, so we focused our attention on areas to the north and north-east, away from downtown.

These candidate locations were then clustered to create zones of interest containing the greatest number of candidates. The purpose of this analysis was only to provide information on areas in Toronto that are not crowded with existing restaurants (particularly sushi restaurants). It is entirely possible that there is a very good reason for the small number of restaurants in some of those areas, one that would make them unsuitable for a new restaurant regardless of the lack of competition. The recommended zones should therefore be considered only as a starting point for more detailed analysis, which could eventually identify a location that not only has no nearby competition but also takes other factors into account and meets all other relevant conditions.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify Toronto areas close to the center with a low number of restaurants (particularly sushi restaurants), in order to help stakeholders narrow down the search for an optimal location for a new sushi restaurant. By calculating the restaurant density distribution from Foursquare data, we first identified general areas that justify further analysis. These locations were then clustered to create major zones of interest (containing the greatest number of potential locations), and the addresses of those zone centers were created to be used as starting points for final exploration by stakeholders. From the clustering results, the areas in clusters 2, 4, and 5 might be good places to open a new restaurant. However, since cluster 5 is located in Downtown Toronto, where the concentration of restaurants is highest, it is better to drop this choice. We therefore choose clusters 2 and 4 as candidates.

The final decision on the optimal restaurant location will be made by stakeholders based on the specific characteristics of the neighborhoods and locations in every recommended zone, taking into consideration additional factors such as the attractiveness of each location (proximity to a park or water), noise levels and proximity to major roads, real estate availability, prices, and the social and economic dynamics of every neighborhood.