<h1>Identifying the Best Locations in the City of Toronto for a New Restaurant</h1>

Before we get the data and start exploring it, let's install and import all the dependencies that we will need.

In [46]:
# Importing Packages
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

# Matplotlib and associated plotting modules
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

from unicodedata import normalize

!pip install lxml
import lxml.html as lh


!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
print('geopy Libraries imported.')

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (top-level import since pandas 1.0)

# import k-means from clustering stage
from sklearn.cluster import KMeans

!pip install folium
import folium # map rendering library

print('folium Libraries imported.')

geopy Libraries imported.
folium Libraries imported.


## 1. Importing Toronto Neighbourhoods Data


Toronto neighbourhood data is available on a Wikipedia page <a href="https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M">here</a>. First, let us read the data from the page.


In [47]:
# url of the page that contains Toronto neighbourhoods data
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
# fetch the web page and parse its HTML
page = requests.get(url)
page_con = lh.fromstring(page.content)
# read <tr>..</tr> data 
tr_elements = page_con.xpath('//tr')

Get the columns (headers) of the data set

In [48]:
# empty list
first_list = []
i = 0
# store each header name together with an empty list for its column values
for t in tr_elements[0]:
    i+=1
    name = t.text_content()
    print ("i = {} and name = {}".format(i,name))
    first_list.append((name,[]))

i = 1 and name = Postal Code

i = 2 and name = Borough

i = 3 and name = Neighbourhood



Now that we have the headers, we need to read and store the rows of data.

In [49]:
for j in range(1,len(tr_elements)):
    T = tr_elements[j]
    
    # If row is not of size 3, the //tr data is not from our table 
    if len(T)!=3:
        break

    i = 0
    for t in T.iterchildren():
        data=t.text_content() 
        # leave the first column (postal code) as text
        if i>0:
            # convert any numerical value to an integer
            try:
                data = int(data)
            except ValueError:
                pass
        # Append the data to the empty list of the i'th column
        first_list[i][1].append(data)
        # Increment i for the next column
        i+=1

Now we have a list that contains all required data. Next step is cleaning this data.
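
To make that structure concrete, here is a minimal sketch of the same column-wise accumulation on hand-written cell text instead of the live page. The values in `header_cells` and `row_cells` are made up for illustration; they only mimic the trailing newlines that `text_content()` returns. Unlike the cell above, this sketch strips the whitespace while reading.

```python
# Toy stand-ins for the text of the header and data cells (made-up sample values).
header_cells = ["Postal Code\n", "Borough\n", "Neighbourhood\n"]
row_cells = [
    ["M1A\n", "Not assigned\n", "Not assigned\n"],
    ["M3A\n", "North York\n", "Parkwoods\n"],
]

# One (column name, values) pair per header cell, like first_list above.
columns = [(name.strip(), []) for name in header_cells]

for row in row_cells:
    if len(row) != len(columns):
        break  # a row of a different width belongs to another table
    for i, cell in enumerate(row):
        columns[i][1].append(cell.strip())

# Turn the pairs into a dict that pd.DataFrame could consume directly.
table = {name: values for name, values in columns}
print(table)
```

Each list in `table` holds one column, which is why the result can be handed to `pd.DataFrame` without reshaping.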

In [50]:
Dict = {title:column for (title,column) in first_list} # convert first_list list into a python dictionary
df = pd.DataFrame(Dict) # convert Dict into a dataframe
df = df.stack().str.replace('\n', '', regex=False).unstack() # remove newline characters from the cell text
df.columns = ['Postal Code', 'Borough', 'Neighbourhood'] # re-define the dataframe columns
df['Postal Code'] = df['Postal Code'].replace('', np.nan) # replace empty entries in the 'Postal Code' column with Numpy.nan
df.dropna(subset=['Postal Code'], inplace=True) # remove rows that contain Numpy.nan in their respective 'Postal Code' column
df.shape # print the shape of the dataframe

(180, 3)

More cleaning:

In [51]:
# remove rows with 'Not assigned' values in their 'Borough' column
df.drop(df[df['Borough'] == "Not assigned"].index, inplace = True)
df.reset_index(drop=True, inplace=True) # reset dataframe index
df.head() # print the first five rows of the dataframe

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


With this, the dataframe is ready. Let us find its final shape.

In [52]:
df.shape # print the shape of the dataframe

(103, 3)
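
The two cleaning steps can also be expressed as boolean masks. The sketch below runs on a made-up four-row table rather than the scraped data, so the names `raw`, `has_code`, and `assigned` are illustrative only.

```python
import pandas as pd

# Made-up rows that mimic the scraped table before cleaning.
raw = pd.DataFrame({
    "Postal Code": ["M1A", "", "M3A", "M4A"],
    "Borough": ["Not assigned", "North York", "North York", "North York"],
    "Neighbourhood": ["Not assigned", "Row without code", "Parkwoods", "Victoria Village"],
})

has_code = raw["Postal Code"].str.strip().ne("")   # rows with a non-empty postal code
assigned = raw["Borough"].ne("Not assigned")       # rows with an assigned borough

# Keep rows that pass both checks and renumber the index.
clean = raw[has_code & assigned].reset_index(drop=True)
print(clean.shape)
```

Combining the masks with `&` filters in one pass and avoids the `inplace=True` calls, at the cost of creating a new dataframe.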


## 2. Importing Location Details of Toronto Neighbourhoods and Creating the Final Dataframe


In [53]:
# read data from https://cocl.us/Geospatial_data (a .csv file) into a dataframe
coor_df = pd.read_csv('https://cocl.us/Geospatial_data')
coor_df.head() # print the first five rows of the dataframe

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Now let's merge the two dataframes to get the final dataframe, which contains the postal code, borough, neighbourhood, latitude, and longitude of each area.

In [54]:
# left merge df and coor_df
neighborhoods = pd.merge(left=df, right=coor_df, how='left', left_on='Postal Code', right_on='Postal Code')
neighborhoods.head() # print the first five rows of the dataframe

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


In [55]:
neighborhoods.shape

(103, 5)
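
A left merge keeps every row of `df`, so a postal code that is missing from the coordinates file would silently receive `NaN` coordinates. The following sketch, on made-up data, shows one way to check for that with the `validate` and `indicator` arguments of `pd.merge`. The code `M9Z` and its borough are hypothetical and exist only to trigger the unmatched case.

```python
import pandas as pd

left = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M9Z"],  # "M9Z" is a made-up code
    "Borough": ["North York", "North York", "Hypothetical"],
})
right = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# validate raises if a postal code is duplicated on either side;
# indicator adds a "_merge" column that records where each row came from.
merged = pd.merge(left, right, how="left", on="Postal Code",
                  validate="one_to_one", indicator=True)

unmatched = merged.loc[merged["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```

An empty `unmatched` list on the real data would confirm that every neighbourhood received coordinates.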


## 3. Segmenting and Clustering Toronto Neighbourhoods


In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>to_explorer</em>, as shown below.

In [56]:
address = 'Toronto, ON, Canada'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of the city of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of the city of Toronto are 43.6534817, -79.3839347.


#### Let's create a map of Toronto with neighborhoods superimposed on top.

In [57]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)
    
map_toronto

### Enter your Foursquare credentials

In [58]:
CLIENT_ID = '0454FJDPXGU41HCHMKPRTTFFCA0W1IGYYM0SOVL2CMNXICYF' # your Foursquare ID
CLIENT_SECRET = '4W2F5HLDC5JOWG2BZZ4R3CYI5SNLECZDQ4OPOHAYNV2AR0SP' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 0454FJDPXGU41HCHMKPRTTFFCA0W1IGYYM0SOVL2CMNXICYF
CLIENT_SECRET:4W2F5HLDC5JOWG2BZZ4R3CYI5SNLECZDQ4OPOHAYNV2AR0SP
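
As an alternative to typing the credentials into a cell and printing them, they could be read from environment variables. This is a minimal sketch under that assumption: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_credentials` and `mask` are hypothetical, not part of the Foursquare API or of this notebook.

```python
import os

def load_credentials(env=os.environ):
    """Return (client_id, client_secret) from a mapping such as os.environ."""
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Set the environment variable {missing} first") from None

def mask(secret, visible=4):
    """Show only the first few characters when printing a credential."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)

# Demonstration with a fake mapping so the sketch runs without real keys.
demo_env = {"FOURSQUARE_CLIENT_ID": "demo-id", "FOURSQUARE_CLIENT_SECRET": "demo-secret"}
client_id, client_secret = load_credentials(demo_env)
print("CLIENT_ID:", mask(client_id))
print("CLIENT_SECRET:", mask(client_secret))
```

Calling `load_credentials()` without an argument would read the real environment, which keeps the keys out of the saved notebook and its outputs.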


View the first neighbourhood in the dataframe

In [59]:
neighborhoods.loc[0, 'Neighbourhood']

'Parkwoods'

In [60]:
neighborhood_latitude = neighborhoods.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = neighborhoods.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = neighborhoods.loc[0, 'Neighbourhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Parkwoods are 43.7532586, -79.3296565.


Get venues within 500 m of Parkwoods

In [61]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
results = requests.get(url).json()
print(results["response"])
#pd.DataFrame.from_dict(results)
#pd.DataFrame.from_dict(results['response']['groups'])
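
The request URL above is assembled with string formatting. A possible alternative, sketched here, is to keep the query parameters in a dictionary and let the standard library encode them. The helper `build_explore_url` is a hypothetical name, and `"ID"` and `"SECRET"` are placeholders rather than working credentials.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the explore URL used in this notebook from named parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",   # latitude and longitude as one comma-separated value
        "radius": radius,
        "limit": limit,
    }
    # urlencode percent-encodes the values, so the comma in "ll" becomes %2C.
    return f"{BASE_URL}?{urlencode(params)}"

url = build_explore_url("ID", "SECRET", "20180605", 43.7532586, -79.3296565)
print(url)
```

`requests.get` also accepts such a dictionary through its `params` argument, in which case the encoding step is handled by the library.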



### Create functions to extract venues and their details from the response

In [62]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [63]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
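
`getNearbyVenues` indexes `categories[0]` directly, which would raise an `IndexError` for a venue that has no category. The sketch below shows a more defensive way to flatten a response. The `sample` dictionary is hand-written to mirror only the keys this notebook reads; it is not a real API response, and `extract_venues` is a hypothetical helper.

```python
def extract_venues(response_json):
    """Flatten an explore-style response into (name, lat, lng, category) tuples."""
    groups = response_json.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # Fall back to None instead of failing when a venue has no category.
        category = categories[0]["name"] if categories else None
        rows.append((venue["name"], venue["location"]["lat"],
                     venue["location"]["lng"], category))
    return rows

# Hand-written sample with one categorised and one uncategorised venue.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Brookbanks Park",
               "location": {"lat": 43.751976, "lng": -79.33214},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Uncategorised venue",
               "location": {"lat": 43.75, "lng": -79.33},
               "categories": []}},
]}]}}

print(extract_venues(sample))
```

Rows with a `None` category would need `na=False` in any later `str.contains` call so that the boolean mask stays free of missing values.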

In [64]:
toronto_venues = getNearbyVenues(names=neighborhoods['Neighbourhood'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

In [65]:
print(toronto_venues.shape)
toronto_venues.head()

(2139, 7)


Unnamed: 0,Neighbourhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant
4,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop


<br>
<h4> Change Venue Category of all types of restaurants to <strong>'Restaurant'</strong> </h4>

In [66]:
toronto_venues.loc[toronto_venues['Venue Category'].str.contains('Restaurant', case=False), 'Venue Category'] = 'Restaurant'
toronto_venues.head()

Unnamed: 0,Neighbourhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Restaurant
4,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop


<br>
<h4> Change Venue Category of all non-restaurant venues to <strong>'Non Restaurant'</strong> </h4>

In [67]:
a = toronto_venues['Venue Category'] != 'Restaurant'
column_name = 'Venue Category'
toronto_venues.loc[a, column_name] = 'Non Restaurant'
toronto_venues.head()

Unnamed: 0,Neighbourhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Non Restaurant
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Non Restaurant
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Non Restaurant
3,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Restaurant
4,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Non Restaurant
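
The two relabelling steps can be collapsed into a single assignment with `np.where`. This sketch uses a three-row stand-in called `venues_demo` (a hypothetical name) rather than `toronto_venues`.

```python
import numpy as np
import pandas as pd

venues_demo = pd.DataFrame({
    "Venue": ["Brookbanks Park", "Portugril", "Tim Hortons"],
    "Venue Category": ["Park", "Portuguese Restaurant", "Coffee Shop"],
})

# True where the category mentions "restaurant"; na=False treats missing values as non-matches.
is_restaurant = venues_demo["Venue Category"].str.contains("Restaurant", case=False, na=False)

# Pick one of the two labels for every row in a single step.
venues_demo["Venue Category"] = np.where(is_restaurant, "Restaurant", "Non Restaurant")
print(venues_demo)
```

Note that a substring match would not catch food venues whose category lacks the word "Restaurant", such as the coffee shop in this example; whether those should count is a modelling choice.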


#### Create a dataframe of Toronto venues

In [68]:
#toronto_venues.groupby('Neighbourhood').count()
venues = toronto_venues[['Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']]
venues.head()

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Brookbanks Park,43.751976,-79.33214,Non Restaurant
1,Variety Store,43.751974,-79.333114,Non Restaurant
2,Victoria Village Arena,43.723481,-79.315635,Non Restaurant
3,Portugril,43.725819,-79.312785,Restaurant
4,Tim Hortons,43.725517,-79.313103,Non Restaurant


#### Create a dataframe of the restaurants located in Downtown Toronto

In [69]:
#downtown_restaurants = downtown_venues[downtown_venues['Venue Category'].str.contains('Resta')]
#downtown_restaurants = downtown_restaurants.drop_duplicates()
#downtown_restaurants.reset_index(drop=True, inplace=True)
#downtown_restaurants.sort_values('Venue', inplace=True, ignore_index=True)
#print('There are {} restaurants in Downtown Toronto.'.format(len(downtown_restaurants['Venue'])))
#downtown_restaurants

## 4. Analyze Each Venue in Toronto

In [70]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]
print(toronto_onehot.shape)
toronto_onehot.head()

(2139, 3)


Unnamed: 0,Neighbourhood,Non Restaurant,Restaurant
0,Parkwoods,1,0
1,Parkwoods,1,0
2,Victoria Village,1,0
3,Victoria Village,0,1
4,Victoria Village,1,0


#### Next, let's group rows by neighborhood and take the mean of each one-hot column, which gives the share of venues in each category

In [71]:
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighbourhood,Non Restaurant,Restaurant
0,Agincourt,0.8,0.2
1,"Alderwood, Long Branch",1.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.809524,0.190476
3,Bayview Village,0.5,0.5
4,"Bedford Park, Lawrence Manor East",0.590909,0.409091
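
The reason the mean works as a frequency is that each one-hot column holds only 0s and 1s, so its average within a group is the fraction of rows equal to 1. A small sketch on made-up neighbourhoods `A` and `B`:

```python
import pandas as pd

demo = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "A", "B"],
    "Venue Category": ["Restaurant", "Non Restaurant", "Non Restaurant",
                       "Non Restaurant", "Restaurant"],
})

# One indicator column per category, stored as integers.
onehot = pd.get_dummies(demo["Venue Category"], dtype=int)
onehot.insert(0, "Neighbourhood", demo["Neighbourhood"])

# The mean of a 0/1 column is the share of rows that are 1.
shares = onehot.groupby("Neighbourhood").mean()
print(shares)
```

Neighbourhood `A` has one restaurant among four venues, so its `Restaurant` share is 0.25, and the two shares in every row sum to 1.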


#### Now let's find the count of venues in each neighborhood

In [72]:
toronto_grouped2 = toronto_onehot.groupby('Neighbourhood').count().reset_index()
toronto_grouped2.drop(['Restaurant'], axis=1, inplace=True)
toronto_grouped2.rename(columns={"Non Restaurant": "Count"}, inplace=True)
toronto_grouped2.head()

Unnamed: 0,Neighbourhood,Count
0,Agincourt,5
1,"Alderwood, Long Branch",7
2,"Bathurst Manor, Wilson Heights, Downsview North",21
3,Bayview Village,4
4,"Bedford Park, Lawrence Manor East",22


In [73]:
max_count = toronto_grouped2['Count'].max()
toronto_grouped2['Count'] = toronto_grouped2['Count']/max_count
toronto_grouped2.head()

Unnamed: 0,Neighbourhood,Count
0,Agincourt,0.05
1,"Alderwood, Long Branch",0.07
2,"Bathurst Manor, Wilson Heights, Downsview North",0.21
3,Bayview Village,0.04
4,"Bedford Park, Lawrence Manor East",0.22


#### Next, let's merge the two dataframes

In [74]:
toronto_grouped = pd.merge(left=toronto_grouped, right=toronto_grouped2, how='left', left_on='Neighbourhood', right_on='Neighbourhood')
toronto_grouped.head()

Unnamed: 0,Neighbourhood,Non Restaurant,Restaurant,Count
0,Agincourt,0.8,0.2,0.05
1,"Alderwood, Long Branch",1.0,0.0,0.07
2,"Bathurst Manor, Wilson Heights, Downsview North",0.809524,0.190476,0.21
3,Bayview Village,0.5,0.5,0.04
4,"Bedford Park, Lawrence Manor East",0.590909,0.409091,0.22
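
The share and the scaled count could also be computed in one `groupby` with named aggregation instead of two dataframes and a merge. The sketch below assumes a boolean helper column `is_restaurant`, which is not part of the notebook, and uses made-up neighbourhoods. In the output above, 5 venues map to a `Count` of 0.05, which suggests the maximum was 100, the value of `LIMIT`.

```python
import pandas as pd

demo = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Restaurant", "Non Restaurant", "Non Restaurant",
                       "Non Restaurant", "Restaurant", "Restaurant"],
})
demo["is_restaurant"] = demo["Venue Category"].eq("Restaurant")

# Named aggregation: restaurant share and venue count per neighbourhood.
features = demo.groupby("Neighbourhood").agg(
    Restaurant=("is_restaurant", "mean"),
    Count=("is_restaurant", "size"),
).reset_index()

features["Non Restaurant"] = 1 - features["Restaurant"]
# Scale counts to the 0..1 range by dividing by the largest count.
features["Count"] = features["Count"] / features["Count"].max()
print(features)
```

Scaling the count keeps it on the same 0 to 1 range as the shares, so no single feature dominates the distance computation in k-means.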


## 5. Cluster Neighborhoods

Run _k_-means to cluster the neighborhoods into 4 clusters.

In [75]:
# set number of clusters
kclusters = 4

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:100] 

array([0, 1, 0, 0, 0, 2, 1, 1, 1, 1, 1, 0, 0, 2, 1, 2, 0, 0, 2, 0, 1, 0,
       0, 3, 1, 1, 1, 1, 2, 2, 0, 2, 0, 1, 0, 2, 0, 0, 1, 1, 1, 0, 1, 2,
       1, 1, 1, 1, 0, 3, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 2, 1,
       1, 1, 0, 1, 1, 1, 2, 0, 0, 2, 0, 0, 1, 1, 0, 1, 0, 2, 0, 0, 1, 1,
       1, 1, 0, 1, 0, 1, 1, 1], dtype=int32)
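
The notebook fixes the number of clusters at 4 without showing how that value was chosen. One common heuristic, sketched here on synthetic points rather than on `toronto_grouped_clustering`, is to fit k-means for several values of k and look for the point where the inertia stops dropping sharply. The three synthetic centres and the noise scale are arbitrary assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 3-feature points around three made-up centres.
rng = np.random.default_rng(0)
centers = np.array([[0.1, 0.9, 0.1], [0.3, 0.7, 0.9], [0.9, 0.1, 0.05]])
points = np.vstack([c + rng.normal(scale=0.02, size=(30, 3)) for c in centers])

# Fit k-means for k = 1..6 and record the within-cluster sum of squares.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 4))
```

On the real feature table, the same loop would show whether 4 sits near the bend of the curve; the heuristic is a guide rather than a rule.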

In [76]:
# add clustering labels
toronto_grouped.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = neighborhoods

# join toronto_grouped onto the neighborhoods dataframe to add cluster labels and venue features to each neighborhood
toronto_merged = toronto_merged.join(toronto_grouped.set_index('Neighbourhood'), on='Neighbourhood')
#print(toronto_merged['Cluster Labels'])
toronto_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,Non Restaurant,Restaurant,Count
0,M3A,North York,Parkwoods,43.753259,-79.329656,1.0,1.0,0.0,0.02
1,M4A,North York,Victoria Village,43.725882,-79.315572,0.0,0.8,0.2,0.05
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,1.0,0.931818,0.068182,0.44
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,1.0,0.923077,0.076923,0.13
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0.0,0.818182,0.181818,0.33


In [77]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)
# neighbourhoods without venue data have no cluster; give them the extra label `kclusters`
toronto_merged['Cluster Labels'] = toronto_merged['Cluster Labels'].fillna(kclusters)
# set color scheme: one color per cluster plus one for the extra label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters + 1))
rainbow = [colors.rgb2hex(c) for c in colors_array]
print(toronto_merged['Cluster Labels'])
# add markers to the map
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],  # index by label so every label has its own color
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

0      1.0
1      0.0
2      1.0
3      1.0
4      0.0
5      4.0
6      3.0
7      0.0
8      1.0
9      2.0
10     0.0
11     1.0
12     1.0
13     0.0
14     1.0
15     2.0
16     1.0
17     1.0
18     0.0
19     1.0
20     2.0
21     1.0
22     0.0
23     1.0
24     2.0
25     1.0
26     0.0
27     0.0
28     0.0
29     0.0
30     2.0
31     1.0
32     1.0
33     2.0
34     1.0
35     1.0
36     2.0
37     0.0
38     1.0
39     0.0
40     1.0
41     0.0
42     2.0
43     1.0
44     1.0
45     1.0
46     1.0
47     0.0
48     2.0
49     1.0
50     1.0
51     0.0
52     4.0
53     1.0
54     0.0
55     0.0
56     0.0
57     1.0
58     1.0
59     0.0
60     1.0
61     1.0
62     1.0
63     1.0
64     1.0
65     3.0
66     1.0
67     1.0
68     0.0
69     0.0
70     1.0
71     1.0
72     1.0
73     0.0
74     1.0
75     0.0
76     0.0
77     1.0
78     0.0
79     0.0
80     0.0
81     0.0
82     0.0
83     1.0
84     2.0
85     1.0
86     0.0
87     1.0
88     0.0
89     1.0
90     0.0

In [78]:
cluster1 = toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(2, toronto_merged.shape[1]))]]
print(cluster1.shape)
cluster1.head()

(34, 8)


Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,Non Restaurant,Restaurant,Count
1,North York,Victoria Village,43.725882,-79.315572,0.0,0.8,0.2,0.05
4,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0.0,0.818182,0.181818,0.33
7,North York,Don Mills,43.745906,-79.352188,0.0,0.652174,0.347826,0.23
10,North York,Glencairn,43.709577,-79.445073,0.0,0.8,0.2,0.05
13,North York,Don Mills,43.7259,-79.340923,0.0,0.652174,0.347826,0.23


In [79]:
cluster2 = toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(2, toronto_merged.shape[1]))]]
print(cluster2.shape)
cluster2.head()

(51, 8)


Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,Non Restaurant,Restaurant,Count
0,North York,Parkwoods,43.753259,-79.329656,1.0,1.0,0.0,0.02
2,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,1.0,0.931818,0.068182,0.44
3,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,1.0,0.923077,0.076923,0.13
8,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937,1.0,1.0,0.0,0.11
11,Etobicoke,"West Deane Park, Princess Gardens, Martin Grov...",43.650943,-79.554724,1.0,1.0,0.0,0.01


In [80]:
cluster3 = toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(2, toronto_merged.shape[1]))]]
print(cluster3.shape)
cluster3.head()

(13, 8)


Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,Non Restaurant,Restaurant,Count
9,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,2.0,0.79,0.21,1.0
15,Downtown Toronto,St. James Town,43.651494,-79.375418,2.0,0.717647,0.282353,0.85
20,Downtown Toronto,Berczy Park,43.644771,-79.373306,2.0,0.763636,0.236364,0.55
24,Downtown Toronto,Central Bay Street,43.657952,-79.387383,2.0,0.705882,0.294118,0.68
30,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,2.0,0.77,0.23,1.0


In [81]:
cluster4 = toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(2, toronto_merged.shape[1]))]]
print(cluster4.shape)
cluster4.head()

(2, 8)


Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,Non Restaurant,Restaurant,Count
6,Scarborough,"Malvern, Rouge",43.806686,-79.194353,3.0,0.0,1.0,0.01
65,Scarborough,"Dorset Park, Wexford Heights, Scarborough Town...",43.75741,-79.273304,3.0,0.2,0.8,0.05


#### Analyse cluster 3

In [82]:
cluster3.sort_values(by=['Count', 'Non Restaurant'], ascending=False)

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,Non Restaurant,Restaurant,Count
36,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752,2.0,0.87,0.13,1.0
9,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,2.0,0.79,0.21,1.0
30,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,2.0,0.77,0.23,1.0
42,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.647177,-79.381576,2.0,0.74,0.26,1.0
97,Downtown Toronto,"First Canadian Place, Underground city",43.648429,-79.38228,2.0,0.71,0.29,1.0
48,Downtown Toronto,"Commerce Court, Victoria Hotel",43.648198,-79.379817,2.0,0.69,0.31,1.0
92,Downtown Toronto,Stn A PO Boxes,43.646435,-79.374846,2.0,0.770833,0.229167,0.96
15,Downtown Toronto,St. James Town,43.651494,-79.375418,2.0,0.717647,0.282353,0.85
99,Downtown Toronto,Church and Wellesley,43.66586,-79.38316,2.0,0.68,0.32,0.75
84,Downtown Toronto,"Kensington Market, Chinatown, Grange Park",43.653206,-79.400049,2.0,0.716216,0.283784,0.74
