# Segmenting and Clustering Neighborhoods in Toronto

## Load the CSV data created from the Wikipedia page

In [133]:
import pandas as pd

#Initial Data
df = pd.read_csv('Postcode Toronto.csv')
df.head() #Initial Data

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


## Question 1: Create a table with particular features

### Feature one: Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.

In [134]:
# Keep only the rows whose Borough is not 'Not assigned'
# (.copy() gives an independent dataframe, so later in-place edits do not touch df)
df_1 = df[df["Borough"] != 'Not assigned'].copy()
n = df_1.shape[0]

### Feature two: If a cell has a borough but a 'Not assigned' neighborhood, then the neighborhood will be the same as the borough.

In [None]:
# Rows with an assigned borough but a 'Not assigned' neighbourhood: copy the borough name into the Neighbourhood column
df_2 = df_1[df_1['Neighbourhood'] == 'Not assigned'].copy()
df_2['Neighbourhood'] = df_2['Borough']

# Drop the original 'Not assigned' row from df_1. From df_2 we know that there is only one such row
Num = df_1[df_1['Neighbourhood'] == 'Not assigned'].index.values.item()
df_1.drop(Num, axis=0, inplace=True)
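
The two rules above can also be expressed without dropping and re-appending rows. The following is a minimal sketch on a three-row toy table (the rows are illustrative, not read from `Postcode Toronto.csv`); it filters on `Borough` and then fills `Neighbourhood` from `Borough` through a boolean mask. The names `sample`, `assigned`, and `missing` are assumptions made for this example.

```python
import pandas as pd

# Toy rows shaped like the source table; the values are illustrative only.
sample = pd.DataFrame({
    'Postcode': ['M1A', 'M3A', 'M7A'],
    'Borough': ['Not assigned', 'North York', "Queen's Park"],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Not assigned'],
})

# Feature one: keep only rows with an assigned borough.
assigned = sample[sample['Borough'] != 'Not assigned'].copy()

# Feature two: where the neighbourhood is missing, fall back to the borough.
missing = assigned['Neighbourhood'] == 'Not assigned'
assigned.loc[missing, 'Neighbourhood'] = assigned.loc[missing, 'Borough']

print(assigned)
```

Because the mask is applied to the same dataframe, the row order is preserved and no index bookkeeping is needed.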

### We create the table that meets the above conditions

In [136]:
# Combine df_1 and df_2 so that both 'Not assigned' rules are applied
df_3 = df_1.append(df_2,ignore_index=True)
N = df_3.shape[0]
df_3.head() 

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights


### Feature three: More than one neighborhood can exist in one postal code area. These rows will be combined into one row, with the neighborhoods separated by commas.

### To solve this feature we will do the following:

#### We create a table with the count of each distinct postal code

In [137]:
# The table called 'postal' tells us how many rows there are for each distinct postal code
postal = df_3['Postcode'].value_counts().to_frame()
A = postal.shape[0]
B = list(range(0,A,1))

postal['S'] = B
postal['Postcodename']=postal.index
postal.set_index('S', inplace=True)
maxi = postal['Postcode'].max()

### In two empty lists, 'Caja1' and 'Caja2', we collect the postal codes that appear more than once and those that appear exactly once, respectively

In [138]:
# Lists that include postal codes that are repeated more than once and those that do not
Caja1 = []
Caja2 = []
for i in range(0,A):
    if postal.loc[i,'Postcode'] > 1:
        Caja1.append(postal.loc[i,'Postcodename'])

for i in range(0,A):
    if postal.loc[i,'Postcode'] == 1:
        Caja2.append(postal.loc[i,'Postcodename'])

### We transform those lists to tables (df_4, df_5)

In [139]:
#Now we have a table that returns the postal codes that appear more than once in the data
df_4 = pd.DataFrame(Caja1)
df_4.rename(columns={0: 'Code'}, inplace = True)
J = df_4.shape[0]

#Now we have a table that returns the postal codes that appear only once in the data
df_5 = pd.DataFrame(Caja2)
df_5.rename(columns={0: 'Code'}, inplace = True)
P = df_5.shape[0]

### It is time to create the table that shows only the elements that have their postal code repeated

In [140]:
# We create the table tabla1, which contains every row whose postal code appears at least twice
Vacio1 = []
tabla1 = pd.DataFrame(Vacio1)

for i in range(0,N):
    for j in range(0,J):
        if Caja1[j] == df_3.loc[i,'Postcode']:
            tabla1 = tabla1.append(df_3.loc[i,].to_frame().transpose(), ignore_index=True)
        
W = tabla1.shape[0]
tabla1.head() 

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M5A,Downtown Toronto,Harbourfront
1,M5A,Downtown Toronto,Regent Park
2,M6A,North York,Lawrence Heights
3,M6A,North York,Lawrence Manor
4,M1B,Scarborough,Rouge


### It is time to create the table that shows only the elements that do NOT have their postal code repeated

In [141]:
# We create the table tabla2, which contains every row whose postal code is unique
Vacio2 = []
tabla2 = pd.DataFrame(Vacio2)
for i in range(0,N):
    for j in range(0,P):
        if Caja2[j] == df_3.loc[i,'Postcode']:
            tabla2 = tabla2.append(df_3.loc[i,].to_frame().transpose(), ignore_index=True)

K = tabla2.shape[0]
tabla2.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M9A,Etobicoke,Islington Avenue
3,M3B,North York,Don Mills North
4,M6B,North York,Glencairn
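
As a possible shortcut for the `Caja1`/`Caja2` lists and the nested loops that build `tabla1` and `tabla2`, `Series.duplicated(keep=False)` flags every row whose postal code occurs more than once. The sketch below uses a small hand-made table, and the names `codes`, `repeated_rows`, and `unique_rows` are only for illustration.

```python
import pandas as pd

# Toy rows; in the notebook this role is played by df_3.
codes = pd.DataFrame({
    'Postcode': ['M3A', 'M5A', 'M5A', 'M6A', 'M6A', 'M9A'],
    'Neighbourhood': ['Parkwoods', 'Harbourfront', 'Regent Park',
                      'Lawrence Heights', 'Lawrence Manor', 'Islington Avenue'],
})

# True for every row whose postal code appears more than once.
is_repeated = codes['Postcode'].duplicated(keep=False)

repeated_rows = codes[is_repeated].reset_index(drop=True)   # same role as tabla1
unique_rows = codes[~is_repeated].reset_index(drop=True)    # same role as tabla2

print(repeated_rows)
print(unique_rows)
```

This runs in a single pass over the column instead of comparing every row against every postal code.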


### Now that we have both tables, we need to concatenate the neighbourhoods that share the same postal code

### To do that, we create one table per repetition count: one for the codes that appear exactly 2 times, one for those that appear exactly 3 times, and so on up to the maximum, which in this case is 8

In [142]:
#Creating the tables containing the postal codes that are repeated the same number of times
Vacio3 = [] 
C1 = tabla1.copy()

#Creating a table containing only the postal codes that are repeated twice
Po2 = postal[postal['Postcode']== 2]
D2 = Po2.shape[0]

Vector2 = Po2['Postcodename']
Lista2 = list(Vector2)
Tabla2 = pd.DataFrame(Vacio3)

for i in range(0,W):
    for j in range(0,D2):
        if Lista2[j] == C1.loc[i,'Postcode']:
            Tabla2 = Tabla2.append(C1.loc[i,].to_frame().transpose(), ignore_index=True)

#Creating a table containing only the postal codes that are repeated three times
Po3 = postal[postal['Postcode']== 3]
D3 = Po3.shape[0]

Vector3 = Po3['Postcodename']
Lista3 = list(Vector3)
Tabla3 = pd.DataFrame(Vacio3)

for i in range(0,W):
    for j in range(0,D3):
        if Lista3[j] == C1.loc[i,'Postcode']:
            Tabla3 = Tabla3.append(C1.loc[i,].to_frame().transpose(), ignore_index=True)

#Creating a table containing only the codes that are repeated four times
Po4 = postal[postal['Postcode']== 4]
D4 = Po4.shape[0]

Vector4 = Po4['Postcodename']
Lista4 = list(Vector4)
Tabla4 = pd.DataFrame(Vacio3)

for i in range(0,W):
    for j in range(0,D4):
        if Lista4[j] == C1.loc[i,'Postcode']:
            Tabla4 = Tabla4.append(C1.loc[i,].to_frame().transpose(), ignore_index=True)

#Creating a table containing only the codes that are repeated five times
Po5 = postal[postal['Postcode']== 5]
D5 = Po5.shape[0]

Vector5 = Po5['Postcodename']
Lista5 = list(Vector5)
Tabla5 = pd.DataFrame(Vacio3)

for i in range(0,W):
    for j in range(0,D5):
        if Lista5[j] == C1.loc[i,'Postcode']:
            Tabla5 = Tabla5.append(C1.loc[i,].to_frame().transpose(), ignore_index=True)

#Creating a table containing only the codes that are repeated seven times (There are no postal codes that are repeated six times)
Po7 = postal[postal['Postcode']== 7]
D7 = Po7.shape[0]

Vector7 = Po7['Postcodename']
Lista7 = list(Vector7)
Tabla7 = pd.DataFrame(Vacio3)

for i in range(0,W):
    for j in range(0,D7):
        if Lista7[j] == C1.loc[i,'Postcode']:
            Tabla7 = Tabla7.append(C1.loc[i,].to_frame().transpose(), ignore_index=True)
            
#Creating a table containing only the codes that are repeated eight times (There are no postal codes that are repeated more than eight times)
Po8 = postal[postal['Postcode']== 8]
D8 = Po8.shape[0]

Vector8 = Po8['Postcodename']
Lista8 = list(Vector8)
Tabla8 = pd.DataFrame(Vacio3)

for i in range(0,W):
    for j in range(0,D8):
        if Lista8[j] == C1.loc[i,'Postcode']:
            Tabla8 = Tabla8.append(C1.loc[i,].to_frame().transpose(), ignore_index=True) 

### With those tables in place, the following loops concatenate the Neighbourhood values for each postal code and add the result to a separate table for each case

In [143]:
#We are going to get all the tables with the elements of the concatenated neighborhood columns and without repetitions
Vacio4 = []

#We get the table without the repeated postal codes of the case in which they were repeated only 2 times
C2 = Tabla2.copy()
T2 = C2.shape[0]
Buena2 = pd.DataFrame(Vacio4)

for i in range(0,T2-1):
    if Tabla2.loc[i,'Postcode'] == Tabla2.loc[i+1,'Postcode']:
        C2.loc[i,'Neighbourhood'] = C2.loc[i,'Neighbourhood'] + ',' + ' ' + C2.loc[i+1,'Neighbourhood']
        Buena2 = Buena2.append(C2.loc[i,], ignore_index=True)

#We get the table without the repeated postal codes of the case in which they were only repeated 3 times
C3 = Tabla3.copy()
T3 = C3.shape[0]
Buena3 = pd.DataFrame(Vacio4)

for i in range(0,T3-2):
    if Tabla3.loc[i,'Postcode'] == Tabla3.loc[i+1,'Postcode'] == Tabla3.loc[i+2,'Postcode']:
        C3.loc[i,'Neighbourhood'] = C3.loc[i,'Neighbourhood'] + ',' + ' ' + C3.loc[i+1,'Neighbourhood'] + ',' + ' ' + C3.loc[i+2,'Neighbourhood']
        Buena3 = Buena3.append(C3.loc[i,], ignore_index=True)
        
#We get the table without the repeated postal codes of the case in which they were repeated only 4 times
C4 = Tabla4.copy()
T4 = C4.shape[0]
Buena4 = pd.DataFrame(Vacio4)

for i in range(0,T4-3):
    if Tabla4.loc[i,'Postcode'] == Tabla4.loc[i+1,'Postcode'] == Tabla4.loc[i+2,'Postcode'] == Tabla4.loc[i+3,'Postcode']:
        C4.loc[i,'Neighbourhood'] = C4.loc[i,'Neighbourhood'] + ',' + ' ' + C4.loc[i+1,'Neighbourhood'] + ',' + ' ' + C4.loc[i+2,'Neighbourhood'] + ',' + ' ' + C4.loc[i+3,'Neighbourhood']
        Buena4 = Buena4.append(C4.loc[i,], ignore_index=True)

#We get the table without the repeated postal codes of the case in which they were repeated only 5 times
C5 = Tabla5.copy()
T5 = C5.shape[0]
Buena5 = pd.DataFrame(Vacio4)

for i in range(0,T5-4):
    if Tabla5.loc[i,'Postcode'] == Tabla5.loc[i+1,'Postcode'] == Tabla5.loc[i+2,'Postcode'] == Tabla5.loc[i+3,'Postcode'] == Tabla5.loc[i+4,'Postcode']:
        C5.loc[i,'Neighbourhood'] = (C5.loc[i,'Neighbourhood'] + ',' + ' ' + C5.loc[i+1,'Neighbourhood'] + ',' + ' ' + C5.loc[i+2,'Neighbourhood'] + ',' + ' ' + C5.loc[i+3,'Neighbourhood'] 
                                     + ',' + ' ' + C5.loc[i+4,'Neighbourhood'])
        Buena5 = Buena5.append(C5.loc[i,], ignore_index=True)

#We get the table without the repeated postal codes of the case in which they were only repeated 7 times
C7 = Tabla7.copy()
T7 = C7.shape[0]
Buena7 = pd.DataFrame(Vacio4)

for i in range(0,T7-6):
    if (Tabla7.loc[i,'Postcode'] == Tabla7.loc[i+1,'Postcode'] == Tabla7.loc[i+2,'Postcode'] == Tabla7.loc[i+3,'Postcode'] == Tabla7.loc[i+4,'Postcode'] 
        == Tabla7.loc[i+5,'Postcode'] == Tabla7.loc[i+6,'Postcode']):
        C7.loc[i,'Neighbourhood'] = (C7.loc[i,'Neighbourhood'] + ',' + ' ' + C7.loc[i+1,'Neighbourhood'] + ',' + ' ' + C7.loc[i+2,'Neighbourhood'] + ',' + ' ' + C7.loc[i+3,'Neighbourhood']
         + ',' + ' ' + C7.loc[i+4,'Neighbourhood'] + ',' + ' ' + C7.loc[i+5,'Neighbourhood'] + ',' + ' ' + C7.loc[i+6,'Neighbourhood'])
        Buena7 = Buena7.append(C7.loc[i,], ignore_index=True)
    
#We get the table without the repeated postal codes of the case in which they were only repeated 8 times
C8 = Tabla8.copy()
T8 = C8.shape[0]
Buena8 = pd.DataFrame(Vacio4)

for i in range(0,T8-7):
    if (Tabla8.loc[i,'Postcode'] == Tabla8.loc[i+1,'Postcode'] == Tabla8.loc[i+2,'Postcode'] == Tabla8.loc[i+3,'Postcode'] == Tabla8.loc[i+4,'Postcode'] 
        == Tabla8.loc[i+5,'Postcode'] == Tabla8.loc[i+6,'Postcode'] == Tabla8.loc[i+7,'Postcode']):
        C8.loc[i,'Neighbourhood'] = (C8.loc[i,'Neighbourhood'] + ',' + ' ' + C8.loc[i+1,'Neighbourhood'] + ',' + ' ' + C8.loc[i+2,'Neighbourhood'] + ',' + ' ' + C8.loc[i+3,'Neighbourhood']
         + ',' + ' ' + C8.loc[i+4,'Neighbourhood'] + ',' + ' ' + C8.loc[i+5,'Neighbourhood'] + ',' + ' ' + C8.loc[i+6,'Neighbourhood'] + ',' + ' ' + C8.loc[i+7,'Neighbourhood'])
        Buena8 = Buena8.append(C8.loc[i,], ignore_index=True)
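
The per-count tables (`Tabla2` through `Tabla8`) handle each repetition count separately. A single `groupby` can produce the same comma-separated result for any number of repetitions, and it does not rely on `DataFrame.append`, which was removed in pandas 2.0. The following is a minimal sketch on toy rows, assuming each postal code belongs to exactly one borough; `rows` and `combined` are illustrative names.

```python
import pandas as pd

# Toy rows with one, two, and three neighbourhoods per postal code.
rows = pd.DataFrame({
    'Postcode': ['M3A', 'M5A', 'M5A', 'M1C', 'M1C', 'M1C'],
    'Borough': ['North York', 'Downtown Toronto', 'Downtown Toronto',
                'Scarborough', 'Scarborough', 'Scarborough'],
    'Neighbourhood': ['Parkwoods', 'Harbourfront', 'Regent Park',
                      'Highland Creek', 'Rouge Hill', 'Port Union'],
})

# Group by postal code (and borough) and join the neighbourhood names.
# sort=False keeps the groups in order of first appearance.
combined = (
    rows.groupby(['Postcode', 'Borough'], sort=False)['Neighbourhood']
        .agg(', '.join)
        .reset_index()
)

print(combined)
```

Codes that appear only once pass through unchanged, so the unique and repeated rows do not need to be split beforehand.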

### Finally, since we have all the concatenated and clean tables, it would only be necessary to join them to obtain the final table with all the requested features

In [144]:
# Stack the single-code rows and every combined table into one dataframe
final_df = pd.concat([tabla2, Buena2, Buena3, Buena4, Buena5, Buena7, Buena8],
                     ignore_index=True, sort=True)

# Final touches: rename the columns and move PostalCode to the front
final_df.rename(columns={'Postcode': 'PostalCode', 'Neighbourhood': 'Neighborhood'}, inplace=True)
cols = final_df.columns.tolist()
cols = cols[-1:] + cols[:-1]
final_df = final_df[cols]
final_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M9A,Etobicoke,Islington Avenue
3,M3B,North York,Don Mills North
4,M6B,North York,Glencairn


### At the end, final_df.shape is as follows

In [145]:
final_df.shape

(103, 3)

## Now that we have built a dataframe of the postal code of each neighborhood along with the borough name and neighborhood name, in order to utilize the Foursquare location data, we need to get the latitude and the longitude coordinates of each neighborhood.

## Question 2: Add to the final table the longitude and latitude for each postal code

### Loading the CSV file with the longitude and latitude information

In [146]:
df_data_1 = pd.read_csv('Geospatial_Coordinates.csv')
df_data_1.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


### It is important to note that the longitude and latitude file is already sorted by the Postal Code column

### To join both dataframes correctly, we need to sort our final_df the same way

In [147]:
# These lines of code order the dataframe, reset the index and add the longitude and latitude information
dforden = final_df.sort_values(['PostalCode'])
M = final_df.shape[0]
Q = list(range(0,M))

dforden[''] = Q
dforden.set_index('', inplace=True)

Latitude1 = df_data_1['Latitude'].tolist()
Longitude1 = df_data_1['Longitude'].tolist()

dforden['Latitude'] = Latitude1
dforden['Longitude'] = Longitude1
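
Assigning the coordinate lists by position works only if both tables contain exactly the same postal codes in the same order. A key-based join avoids that assumption. The sketch below uses two toy frames whose column names mirror `final_df` and `Geospatial_Coordinates.csv`; `neighborhoods`, `coordinates`, and `with_coords` are illustrative names.

```python
import pandas as pd

# Toy frames; note that the two tables are deliberately in different orders.
neighborhoods = pd.DataFrame({
    'PostalCode': ['M1C', 'M1B'],
    'Borough': ['Scarborough', 'Scarborough'],
})
coordinates = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

# Match rows on the postal code instead of on row position.
with_coords = (
    neighborhoods
    .merge(coordinates, left_on='PostalCode', right_on='Postal Code', how='left')
    .drop(columns='Postal Code')
    .sort_values('PostalCode')
    .reset_index(drop=True)
)

print(with_coords)
```

With `how='left'`, a postal code that is missing from the coordinates file shows up as `NaN` rather than silently shifting every later row.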

### In the end our dataframe dforden with the longitude and latitude information is as follows:

In [148]:
dforden.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
,,,,,
0.0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1.0,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2.0,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3.0,M1G,Scarborough,Woburn,43.770992,-79.216917
4.0,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


## Question 3: Replicate the analysis done to neighborhoods in New York City

### Before starting the clustering process, we must filter our dataframe so that we have only Borough values that contain the word Toronto 

In [149]:
# Keep only the rows of the final table whose Borough contains the word Toronto
df_Toronto = dforden[dforden['Borough'].str.contains("Toronto", case=True)].reset_index()

#Fixing the last details in the Toronto dataframe
M1 = df_Toronto.shape[0]
Q1 = list(range(0,M1))
df_Toronto[''] = Q1
df_Toronto.set_index('', inplace=True)

df_Toronto.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
,,,,,
0.0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1.0,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
2.0,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572
3.0,M4M,East Toronto,Studio District,43.659526,-79.340923
4.0,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


### Now that we have the Toronto dataframe, we can start replicating the clustering analysis done for New York City

In [87]:
# importing the necessary libraries
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

!conda install -c conda-forge geopy --yes 
!conda install -c conda-forge folium=0.5.0 --yes 

import numpy as np 
import json 
import requests 
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium 

from sklearn.cluster import KMeans
from pandas.io.json import json_normalize 
from geopy.geocoders import Nominatim 

Solving environment: done


  current version: 4.5.11
  latest version: 4.7.12

Please update conda by running

    $ conda update -n base -c defaults conda



# All requested packages already installed.

Libraries imported.


In [88]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
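
Credentials are usually better kept out of a shared notebook. One common option is to read them from environment variables, as in the sketch below. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions made for this example, not names defined by Foursquare.

```python
import os

# Hypothetical environment variable names; export them before starting Jupyter.
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '')
VERSION = '20180605'  # Foursquare API version, as in the cell above

# Warn early instead of failing later inside the request loop.
if not CLIENT_ID or not CLIENT_SECRET:
    print('Foursquare credentials are not set; API requests will fail.')
```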

### Let's explore the neighborhoods in our Toronto dataframe and get the top 100 venues within a radius of 500 meters of each one (as we did in the analysis of the New York City neighborhoods)

In [150]:
# we know that all the information is in the items key. Before we proceed, let's borrow the get_category_type
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

def getNearbyVenues(names, latitudes, longitudes):
    
    LIMIT = 100 # limit of number of venues returned by Foursquare API
    radius = 500 # define radius

    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
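
`getNearbyVenues` indexes `v['venue']['categories'][0]` directly, which raises `IndexError` for a venue that has no category. One way to guard against that is to flatten the items in a small helper that falls back to `None`. This is a sketch only: the helper name `venue_rows` and the `fake_items` payload are hypothetical, and the payload merely mimics the fields that the function above reads.

```python
def venue_rows(name, lat, lng, items):
    """Flatten Foursquare-style items into tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Use the first category name when there is one, otherwise None.
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows


# Hand-written payload for illustration; this is not real API output.
fake_items = [
    {'venue': {'name': 'Venue A', 'location': {'lat': 43.0, 'lng': -79.0},
               'categories': [{'name': 'Trail'}]}},
    {'venue': {'name': 'Venue B', 'location': {'lat': 43.1, 'lng': -79.1},
               'categories': []}},
]

rows = venue_rows('The Beaches', 43.676357, -79.293031, fake_items)
print(rows)
```

Inside the loop, the list comprehension could then be replaced by `venues_list.append(venue_rows(name, lat, lng, results))`.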

In [129]:
Toronto_venues = getNearbyVenues(names = df_Toronto['Neighborhood'],
                                   latitudes = df_Toronto['Latitude'],
                                   longitudes = df_Toronto['Longitude'])

The Beaches
The Danforth West, Riverdale
The Beaches West, India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park, Summerhill East
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront, Regent Park
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Roselawn
Forest Hill North, Forest Hill West
The Annex, North Midtown, Yorkville
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Christie
Dovercourt Village, Dufferin
Little Portugal, Trinity
Brockton, Exhibition Place, Parkdale Village
High Park, The 

In [151]:
Toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
3,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
4,"The Danforth West, Riverdale",43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant


### It's time to analyze each neighborhood: we one-hot encode the venue categories, group the rows by neighborhood, and take the mean of the frequency of occurrence of each category

In [97]:
# one hot encoding
Toronto_onehot = pd.get_dummies(Toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Toronto_onehot['Neighborhood'] = Toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Toronto_onehot.columns[-1]] + list(Toronto_onehot.columns[:-1])
Toronto_onehot = Toronto_onehot[fixed_columns]

In [98]:
Toronto_grouped = Toronto_onehot.groupby('Neighborhood').mean().reset_index()
Toronto_grouped.head()

Unnamed: 0,Neighborhood,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Gym,College Rec Center,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Festival,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,Gay Bar,General Entertainment,General Travel,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Home Service,Hookah Bar,Hospital,Hostel,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indoor Play Area,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Korean Restaurant,Lake,Latin American Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Mac & Cheese Joint,Malay Restaurant,Market,Martial Arts Dojo,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Movie Theater,Museum,Music Venue,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Outdoor Sculpture,Park,Performing Arts Venue,Persian Restaurant,Pet Store,Pharmacy,Pizza Place,Plane,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Portuguese Restaurant,Post Office,Poutine Place,Pub,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.01,0.03,0.01,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.08,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.02,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.018182,0.036364,0.0,0.0,0.0,0.018182,0.018182,0.0,0.036364,0.0,0.018182,0.018182,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.036364,0.0,0.0,0.0,0.036364,0.0,0.0,0.0,0.0,0.018182,0.054545,0.072727,0.0,0.0,0.0,0.0,0.018182,0.0,0.018182,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.036364,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.018182,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.036364,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.036364,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.018182,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0
2,"Brockton, Exhibition Place, Parkdale Village",0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.090909,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business Reply Mail Processing Centre 969 Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.0,0.0,0.0625,0.0625,0.0625,0.125,0.125,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
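
To make the numbers in the table above easier to read, here is a small worked example of the same two steps, one-hot encoding followed by a per-neighborhood mean, on six made-up venues. The neighborhood labels `A` and `B` and the frame names are illustrative; `dtype=float` is passed explicitly so the result does not depend on the default dummy dtype of the installed pandas version.

```python
import pandas as pd

# Six made-up venues in two made-up neighborhoods.
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Pub', 'Pub', 'Trail', 'Park', 'Park', 'Park'],
})

# One column per category, 1.0 where the venue belongs to that category.
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='', dtype=float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# The mean of a 0/1 column is the share of venues in that category.
grouped = onehot.groupby('Neighborhood').mean().reset_index()

print(grouped)
```

Neighborhood `A` has four venues, two of them pubs, so its `Pub` frequency is 0.5; every venue in `B` is a park, so its `Park` frequency is 1.0.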


### Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [131]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = Toronto_grouped['Neighborhood']

for ind in np.arange(Toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Steakhouse,Bar,Thai Restaurant,Hotel,Burger Joint,American Restaurant,Restaurant,Cosmetics Shop
1,Berczy Park,Coffee Shop,Cocktail Bar,Bakery,Steakhouse,Cheese Shop,Farmers Market,Café,Beer Bar,Seafood Restaurant,Comfort Food Restaurant
2,"Brockton, Exhibition Place, Parkdale Village",Breakfast Spot,Café,Coffee Shop,Furniture / Home Store,Burrito Place,Convenience Store,Restaurant,Caribbean Restaurant,Stadium,Bar
3,Business Reply Mail Processing Centre 969 Eastern,Light Rail Station,Fast Food Restaurant,Burrito Place,Auto Workshop,Spa,Garden,Garden Center,Brewery,Park,Smoke Shop
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Terminal,Airport Lounge,Airport Service,Sculpture Garden,Coffee Shop,Plane,Boat or Ferry,Boutique,Bar,Airport Gate
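
As a small illustration of what `return_most_common_venues` does, the sketch below applies the same sort-and-slice idea to a toy frequency table. The neighborhood names, the frequencies and the helper name `top_venues` are made up for the example and are not part of the Toronto data.

```python
import pandas as pd

# Toy frequency table: one row per neighborhood, one column per venue category
# (hypothetical values, not taken from the Toronto data)
toy_grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    'Bar': [0.10, 0.50],
    'Coffee Shop': [0.60, 0.20],
    'Park': [0.30, 0.30],
})

def top_venues(row, n):
    # drop the neighborhood name, sort the frequencies, keep the n largest labels
    return row.iloc[1:].sort_values(ascending=False).index.values[:n]

toy_top = {row['Neighborhood']: list(top_venues(row, 2))
           for _, row in toy_grouped.iterrows()}
print(toy_top)
```

When several categories share the same frequency (for example many zeros in a sparse row), the default sort does not guarantee their relative order, so the lower ranks of such rows should not be over-interpreted.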


### With the neighborhoods_venues_sorted dataframe we can start our clustering analysis, as it shows us the 10 most common venues in each neighborhood

### It's time to start making clusters
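
The labels attached in the next cell come from a fitted k-means model, so the model has to exist before that cell runs. The following is a minimal sketch of that order on toy data, assuming scikit-learn's `KMeans`; the toy frequencies and the names `toy_grouped`, `toy_features`, `toy_kmeans` and `toy_labeled` are illustrative only.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy frequency table (hypothetical values): two coffee-heavy and two park-heavy rows
toy_grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D'],
    'Coffee Shop': [0.9, 0.8, 0.0, 0.1],
    'Park': [0.1, 0.2, 1.0, 0.9],
})

# 1) fit the model on the numeric columns only
toy_features = toy_grouped.drop(columns='Neighborhood')
toy_kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(toy_features)

# 2) only then attach the labels; labels_ follows the row order of the input
toy_labeled = toy_grouped.copy()
toy_labeled.insert(0, 'Cluster Labels', toy_kmeans.labels_)
print(toy_labeled)
```

The label numbers themselves are arbitrary; only the grouping they express is meaningful.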

In [106]:
# add clustering labels
# (uses the `kmeans` model fitted in the k-means cell further down, In [121], so run that cell first)
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Toronto_merged = df_Toronto

# join the sorted venues onto df_Toronto so each neighborhood keeps its latitude/longitude
Toronto_merged = Toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

Toronto_merged.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
,,,,,,,,,,,,,,,,
0.0,M4E,East Toronto,The Beaches,43.676357,-79.293031,0.0,Health Food Store,Trail,Pub,Diner,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store
1.0,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,0.0,Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Furniture / Home Store,Bookstore,Pub,Indian Restaurant,Sports Bar,Spa
2.0,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572,0.0,Pet Store,Park,Brewery,Sandwich Place,Burger Joint,Burrito Place,Pub,Pizza Place,Movie Theater,Sushi Restaurant
3.0,M4M,East Toronto,Studio District,43.659526,-79.340923,0.0,Café,Coffee Shop,Bakery,Italian Restaurant,American Restaurant,Yoga Studio,Comfort Food Restaurant,Seafood Restaurant,Brewery,Sandwich Place
4.0,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,3.0,Park,Bus Line,Swim School,Wings Joint,Discount Store,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant
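
The `Cluster Labels` column above is shown as floats (`0.0`, `3.0`). One plausible cause, stated here as an assumption rather than something the notebook confirms, is that the join leaves `NaN` for neighborhoods that have no row in `neighborhoods_venues_sorted`, which forces the integer labels to float. A minimal sketch on toy data of how this happens and one way to handle it (all `toy_*` names and values are hypothetical):

```python
import pandas as pd

# Toy stand-ins: three neighborhoods, but cluster labels for only two of them
toy_left = pd.DataFrame({'Neighborhood': ['A', 'B', 'C']})
toy_right = pd.DataFrame({'Neighborhood': ['A', 'B'], 'Cluster Labels': [0, 1]})

toy_merged = toy_left.join(toy_right.set_index('Neighborhood'), on='Neighborhood')
print(toy_merged['Cluster Labels'].dtype)  # float64, because 'C' has no label (NaN)

# drop unmatched rows, then restore integer labels
toy_clean = toy_merged.dropna(subset=['Cluster Labels']).copy()
toy_clean['Cluster Labels'] = toy_clean['Cluster Labels'].astype(int)
print(toy_clean)
```

Integer labels matter later on, because a float such as `0.0` cannot be used to index a Python list of colors.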


### Since each neighborhood now has a Cluster Label, we can plot the clusters on a map and see more clearly how they are distributed

In [121]:
# set number of clusters
kclusters = 5

# keep only the numeric venue frequencies (drop the neighborhood name column)
Toronto_grouped_clustering = Toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Toronto_grouped_clustering)

# create map
address = 'Toronto, Ont'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Toronto_merged['Latitude'], Toronto_merged['Longitude'], Toronto_merged['Neighborhood'], Toronto_merged['Cluster Labels']):
    # labels can come back as floats (e.g. 0.0) after the join; list indices must be integers
    cluster = int(cluster)
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### We can see that 5 clusters were created, but most of the neighborhoods belong to the first cluster, and there is a reasonable explanation for this
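
The imbalance between clusters can be checked by counting neighborhoods per label. A minimal sketch, using a hypothetical toy column (`toy_labels`) in place of `Toronto_merged['Cluster Labels']`:

```python
import pandas as pd

# Hypothetical labels standing in for Toronto_merged['Cluster Labels']
toy_labels = pd.Series([0, 0, 0, 1, 0, 2, 0, 3, 4], name='Cluster Labels')

# number of neighborhoods per cluster, ordered by cluster label
cluster_sizes = toy_labels.value_counts().sort_index()
print(cluster_sizes)
```

On the real data the same call would be `Toronto_merged['Cluster Labels'].value_counts().sort_index()`.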

### First of all, let's look at the dataframe for each cluster and analyze it

#### Cluster 1

In [122]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 0, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
,,,,,,,,,,,,
0.0,East Toronto,0.0,Health Food Store,Trail,Pub,Diner,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store
1.0,East Toronto,0.0,Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Furniture / Home Store,Bookstore,Pub,Indian Restaurant,Sports Bar,Spa
2.0,East Toronto,0.0,Pet Store,Park,Brewery,Sandwich Place,Burger Joint,Burrito Place,Pub,Pizza Place,Movie Theater,Sushi Restaurant
3.0,East Toronto,0.0,Café,Coffee Shop,Bakery,Italian Restaurant,American Restaurant,Yoga Studio,Comfort Food Restaurant,Seafood Restaurant,Brewery,Sandwich Place
5.0,Central Toronto,0.0,Food & Drink Shop,Gym,Park,Breakfast Spot,Asian Restaurant,Hotel,Clothing Store,Sandwich Place,Wings Joint,Doner Restaurant
6.0,Central Toronto,0.0,Clothing Store,Sporting Goods Shop,Coffee Shop,Mexican Restaurant,Diner,Dessert Shop,Park,Gym / Fitness Center,Furniture / Home Store,Chinese Restaurant
7.0,Central Toronto,0.0,Dessert Shop,Sandwich Place,Gym,Pizza Place,Sushi Restaurant,Café,Coffee Shop,Italian Restaurant,Japanese Restaurant,Flower Shop
9.0,Central Toronto,0.0,Coffee Shop,Pub,American Restaurant,Restaurant,Sports Bar,Bagel Shop,Supermarket,Sushi Restaurant,Liquor Store,Fried Chicken Joint
11.0,Downtown Toronto,0.0,Coffee Shop,Restaurant,Café,Pub,Italian Restaurant,Bakery,Pizza Place,Liquor Store,Gastropub,Caribbean Restaurant


#### Cluster 2

In [123]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 1, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
,,,,,,,,,,,,
8.0,Central Toronto,1.0,Park,Tennis Court,Wings Joint,Diner,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store


#### Cluster 3

In [124]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 2, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
,,,,,,,,,,,,
22.0,Central Toronto,2.0,Garden,Home Service,Pool,Eastern European Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Wings Joint


#### Cluster 4

In [125]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 3, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
,,,,,,,,,,,,
4.0,Central Toronto,3.0,Park,Bus Line,Swim School,Wings Joint,Discount Store,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant


#### Cluster 5

In [132]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 4, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
,,,,,,,,,,,,
10.0,Downtown Toronto,4.0,Park,Playground,Trail,Building,Wings Joint,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant


### After seeing the information contained in the dataframes of each cluster we can suggest that:

### The first cluster is so large because its neighborhoods are very closely related to each other. For example, at least 29 of the 36 neighborhoods in that cluster have Café or Coffee Shop among their most common venues
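
A claim like this can be verified by checking, row by row, whether a café or coffee shop category appears in any of the ranked venue columns. The sketch below does this on a toy table; the data and the names `toy_cluster`, `venue_cols` and `has_coffee` are hypothetical, and on the real data the same mask would be applied to the rows of `Toronto_merged` where `Cluster Labels` equals 0.

```python
import pandas as pd

# Toy stand-in for the Cluster 1 dataframe (hypothetical rows)
toy_cluster = pd.DataFrame({
    '1st Most Common Venue': ['Coffee Shop', 'Pet Store', 'Café'],
    '2nd Most Common Venue': ['Pub', 'Park', 'Bakery'],
})

# select the ranked venue columns, then flag rows containing either category
venue_cols = [c for c in toy_cluster.columns if c.endswith('Most Common Venue')]
has_coffee = toy_cluster[venue_cols].isin(['Café', 'Coffee Shop']).any(axis=1)
print('{} of {} neighborhoods'.format(has_coffee.sum(), len(has_coffee)))
```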

### The other 4 clusters each contain only one neighborhood, and their venue lists have little in common with those of the first cluster

### This is why the algorithm put most of the neighborhoods in the first cluster: their most common venues really are similar!