<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in Toronto City Project</font></h1>

## Introduction


This worksheet describes the first few steps of work done, to explore and cluster the neighborhoods in Toronto Project.

### Page1
1. Web scrapping of Wikipedia page to get the Canadian postal codes for their Boroughs and Neighborhoodes, using python's beautifulSoup model.
2. Transform the retrieved data into dataframe (or table) with below conditions:
    1. The dataframe will contain three columns: PostalCode, Borough, and Neighborhood
    2. Only the cells that have an assigned borough, will be processed. Cells with a borough that is Not assigned, are ignored.
    3. As more than one neighborhood can exist in one postal code area. 
        1. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.
    4. If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.

### Page2
1. Use python's csv_read to get the Canadian postal codes's co-ordinates.
2. Merge the last created dataframe containing Canadian Postal Codes with borough names, with co-ordinates.

### Page3
1. 

## Table of Contents

<div class="Page1" style="margin-top: 20px">

<font size = 3>
    <u> Page1 </u>

1.  <a href="#item1">Get Postal Codes of Canada from Wiki</a>

2.  <a href="#item2">Load postal_data into a data frame</a>

3.  <a href="#item3">Loop thru the dataset to remove boroughs that are not assigned</a>

4.  <a href="#item4">display the shape of valid dataset</a>
    </font>
    </div>


<div class="Page2" style="margin-top: 20px">

<font size = 3>
    <u> Page2 </u>

1.  <a href="#item1">Get Canadian Postal Codes co-ordinates from CSV File</a>

2.  <a href="#item2">Load postal_coordinates_data into a data frame</a>

3.  <a href="#item3">Update the postal_data with co-ordinates</a>
    </font>
    </div>

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library
import os

print('Libraries imported.')

Libraries imported.


<p style="color:red"> <b>  Page 01 </b> </p>

<b> Get Postal Codes of Canada from Wiki</b>

In [2]:
postal_data = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
postal_data

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
7,M8A,Not assigned,Not assigned
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"


<b> Load postal_data into a data frame. </b>

In [3]:
postal_data_df = np.array(postal_data)
print((postal_data_df.shape))
#postal_data_df

(180, 3)


<b> Loop thru the dataset to remove boroughs that are as "not assigned".</b>

In [4]:
row_cnt = 0
max_col = 3
valid_entry = 0
new_entry = {}
postal_code_colm = []
borough_colm = []
neighbourhood = []
clean_postal_data = []
table_row_cnt = 0
for eachrow in postal_data_df:
    row_cnt += 1
    curr_col = 0
    for eachentry in eachrow:
        curr_col += 1
        #print (eachentry)
        if (curr_col == 1):
            new_postal_code = eachentry
        elif (curr_col == 2):
            #if eachentry.isnull():
            #    print("Null entry")
            if eachentry == 'Not assigned':
                #print("not valid entry on col2, So Skipped. Row,Col = ",row_cnt,',',curr_col, '. Value=', eachentry,'.')
                valid_entry = 0
            else:
                new_borough = eachentry
                valid_entry = 1
        elif (curr_col == 3):
            curr_col = 0
            if eachentry == np.nan:
                if valid_entry == 1:
                    new_neighbourhood = new_borough
            else:
                new_neighbourhood = eachentry
            if valid_entry == 1:
                valid_entry = 0
                new_entry = [new_postal_code, new_borough, new_neighbourhood]
                #print(new_entry)
                clean_postal_data.append(new_entry)
                postal_code_colm.append(new_postal_code)
                borough_colm.append(new_borough)
                neighbourhood.append(new_neighbourhood)                
                
                
#print(clean_postal_data)
#print('postal_code_colm', postal_code_colm)

<b> Display the shape of clean data </b>

In [5]:
column_names = ['PostalCode','Borough','Neighbourhood']
clean_postal_data_df = pd.DataFrame(clean_postal_data,columns=column_names)
print(clean_postal_data_df.shape)
#print(clean_postal_data_df)
df_1 = pd.DataFrame({'PostalCode': postal_code_colm, 'Borough' : borough_colm,'Neighbourhood': neighbourhood})
print("Shape of the Valid PostalCodes",df_1.shape)

(103, 3)
Shape of the Valid PostalCodes (103, 3)


<b> **Sorting** the Postal dataframes (for future use) </b>

In [6]:
sorted_df_1 = df_1.sort_values(by ='PostalCode')
#print(sorted_df1)
sorted_clean_postal_data_df = clean_postal_data_df.sort_values(by ='PostalCode')
#print(sorted_clean_postal_data_df)

-----------------
<p style="color:red"> <b>  Page 02 </b> </p>
<p style="color:Blue"> <b>    Objective - Get Canadian Postal Codes co-ordinates from CSV File & load into Postal dataframe </b> </p>


<b>Get Canadian Postal Codes co-ordinates from CSV File.</b>

In [7]:
path = os.getcwd()
filename = "Geospatial_Coordinates.csv"
full_csv_filename = os.path.join(path, filename)
full_csv_filename
coordinates_data = pd.read_csv(full_csv_filename)
coordinates_data

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


<b>Load postal_coordinates_data into a data frame. </b>

In [8]:
coordinates_data_df = np.array(coordinates_data)

<b> Loop thru the dataframe to change list-type to column-like structure.</b>

In [9]:
row_cnt = 0
#max_col = 3
new_postal_code = ''
new_latitude = 0
new_longitude = 0

new_entry = {}
postal_code_colm = []
latitude_colm = []
longitude_colm = []
clean_postal_coordinates_data = []
 
for eachrow in coordinates_data_df:
    row_cnt += 1
    curr_col = 0
    for eachentry in eachrow:
        curr_col += 1
        #print (eachentry)
        if (curr_col == 1):
            new_postal_code = eachentry
        elif (curr_col == 2):
            new_latitude = eachentry
        elif (curr_col == 3):
            curr_col = 0
            new_longitude = eachentry
        
        new_entry = [new_postal_code, new_latitude, new_longitude]
        #print(new_entry)
        clean_postal_coordinates_data.append(new_entry)
        postal_code_colm.append(new_postal_code)
        latitude_colm.append(new_latitude)
        longitude_colm.append(new_longitude)
        
#print(clean_postal_coordinates_data)

<b> Display the shape of clean data </b>

In [10]:
column_names = ['PostalCode','Latitude','Longitude']
clean_postal_coordinates_data_df = pd.DataFrame(clean_postal_coordinates_data,columns=column_names)
print(clean_postal_coordinates_data_df.shape)
#print(clean_postal_coordinates_data_df)
df_2 = pd.DataFrame({'PostalCode': postal_code_colm, 'Latitude' : latitude_colm,'Longitude': longitude_colm})
print(df_2.head())
print(df_2.shape)

(309, 3)
  PostalCode   Latitude  Longitude
0        M1B   0.000000   0.000000
1        M1B  43.806686   0.000000
2        M1B  43.806686 -79.194353
3        M1C  43.806686 -79.194353
4        M1C  43.784535 -79.194353
(309, 3)


<b> Sort Coordinates dataframe by Postal Code <b>

In [11]:
sorted_df_2 = df_2.sort_values(by ='PostalCode')
print(sorted_df_2.columns)
#print(sorted_df1)
sorted_clean_postal_coordinates_data_df = clean_postal_coordinates_data_df.sort_values(by ='PostalCode')
#print(sorted_clean_postal_data_df)

Index(['PostalCode', 'Latitude', 'Longitude'], dtype='object')


<b>             Update the postal_data with co-ordinates. </b>

<b> <u> First </u>, Merge or join both the dataframes on Key "PostalCode" </b>

In [12]:
merged_df = pd.merge(sorted_df_1, sorted_df_2, on='PostalCode', how='left')
print(merged_df)

    PostalCode           Borough  \
0          M1B       Scarborough   
1          M1B       Scarborough   
2          M1B       Scarborough   
3          M1C       Scarborough   
4          M1C       Scarborough   
5          M1C       Scarborough   
6          M1E       Scarborough   
7          M1E       Scarborough   
8          M1E       Scarborough   
9          M1G       Scarborough   
10         M1G       Scarborough   
11         M1G       Scarborough   
12         M1H       Scarborough   
13         M1H       Scarborough   
14         M1H       Scarborough   
15         M1J       Scarborough   
16         M1J       Scarborough   
17         M1J       Scarborough   
18         M1K       Scarborough   
19         M1K       Scarborough   
20         M1K       Scarborough   
21         M1L       Scarborough   
22         M1L       Scarborough   
23         M1L       Scarborough   
24         M1M       Scarborough   
25         M1M       Scarborough   
26         M1M       Scarbor

In [13]:
merged_column_names = ['PostalCode','Borough','Neighbourhood','Latitude','Longitude']
merged_df = pd.DataFrame(merged_df,columns=merged_column_names)
print(merged_df.columns)

Index(['PostalCode', 'Borough', 'Neighbourhood', 'Latitude', 'Longitude'], dtype='object')


<b> Sort the Merged dataframe </b>

In [14]:
sort_column_names = ['PostalCode','Borough','Neighbourhood','Latitude','Longitude']
#sort_column_names = ['PostalCode','Longitude','Latitude']


sort_ascending=[True, True, True, False, True]
#sort_ascending=[True, True, False]


sorted_merged_df = merged_df.sort_values(by = sort_column_names, ascending=sort_ascending)
sorted_merged_df.reset_index()
sorted_merged_df

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
2,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1B,Scarborough,"Malvern, Rouge",43.806686,0.0
0,M1B,Scarborough,"Malvern, Rouge",0.0,0.0
3,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.806686,-79.194353
4,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.194353
5,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
6,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.784535,-79.160497
8,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
7,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.160497
9,M1G,Scarborough,Woburn,43.770992,-79.216917


<b> Now, remove the duplicate entries created during the merge() </b>

In [15]:
subset_column_names = ['PostalCode','Borough','Neighbourhood']
deduped_sorted_merged_df = sorted_merged_df.drop_duplicates(subset = subset_column_names,keep='first')
print(deduped_sorted_merged_df.shape)
deduped_sorted_merged_df

(103, 5)


Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
2,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
3,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.806686,-79.194353
6,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.784535,-79.160497
9,M1G,Scarborough,Woburn,43.770992,-79.216917
14,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
16,M1J,Scarborough,Scarborough Village,43.773136,-79.239476
19,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.744734,-79.239476
21,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge",43.727929,-79.262029
25,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West",43.716316,-79.284577
28,M1N,Scarborough,"Birch Cliff, Cliffside West",43.716316,-79.239476


-----------------
<p style="color:red">  <b>  Page 03 </b> </p>
<p style="color:blue"> <b>  Objective: Plot the neighbours in Toronto city after k-means clustering </b> </p>

<b> Examining the dataframe, we'll see 10 boroughs and their neighbourhoods. </b>

In [16]:
all_boroughs = deduped_sorted_merged_df.groupby('Borough').size()
print("Total no of Buroughs", all_boroughs.shape)
all_boroughs

Total no of Buroughs (10,)


Borough
Central Toronto      9
Downtown Toronto    19
East Toronto         5
East York            5
Etobicoke           12
Mississauga          1
North York          24
Scarborough         17
West Toronto         6
York                 5
dtype: int64

<b> Use geopy library to see the latitude and longitude values of Toronto City. </b>

In [17]:
address = 'Toronto, CA'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of {} are {}, {}.'.format(address, latitude, longitude))

The geograpical coordinate of Toronto, CA are 43.6534817, -79.3839347.


In [18]:
#latitude = 43.6532
#longitude = -79.3832
# create map of New York using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(deduped_sorted_merged_df['Latitude'], deduped_sorted_merged_df['Longitude'], deduped_sorted_merged_df['Borough'], deduped_sorted_merged_df['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

<p style="color:black"> <b> From foursquareAPI, get venues from the latitude & Longitude </b></p>

In [19]:
import json
def getNearbyVenues_sample(names, latitudes, longitudes, radius=500):
    print(names)
    # create the API request URL
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    #print(results)

In [21]:
#nearby_venues = getNearbyVenues_sample(names='Agincourt',latitudes=43.794200,longitudes=-79.295849)
#print(json.dumps(nearby_venues, sort_keys=True, indent=4))

<p style="color:black"> <b> From foursquareAPI, get venues from the latitude & Longitude for all locations </b></p>

In [22]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [23]:
#set the variables
CLIENT_ID = 'RKWLC0DN3YQ4ACU1YP4SBYAEJ3VX5LDYYF4AC0JOA30IFVPQ'
CLIENT_SECRET = 'E1MLAWMWQHHJI0OOWALVR1KJ0ROESER5GF5BCJMV0D5YR0YQ'  
VERSION = '20180605'
radius = 500  
LIMIT = 100

nearby_venues = getNearbyVenues(names=deduped_sorted_merged_df['Neighbourhood'],
                                latitudes=deduped_sorted_merged_df['Latitude'],
                                longitudes=deduped_sorted_merged_df['Longitude'])


nearby_venues.head()

Malvern, Rouge
Rouge Hill, Port Union, Highland Creek
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
Kennedy Park, Ionview, East Birchmount Park
Golden Mile, Clairlea, Oakridge
Cliffside, Cliffcrest, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Wexford Heights, Scarborough Town Centre
Wexford, Maryvale
Agincourt
Clarks Corners, Tam O'Shanter, Sullivan
Milliken, Agincourt North, Steeles East, L'Amoreaux East
Steeles West, L'Amoreaux West
Upper Rouge
Hillcrest Village
Fairview, Henry Farm, Oriole
Bayview Village
York Mills, Silver Hills
Willowdale, Newtonbrook
Willowdale, Willowdale East
York Mills West
Willowdale, Willowdale West
Parkwoods
Don Mills
Don Mills
Bathurst Manor, Wilson Heights, Downsview North
Northwood Park, York University
Downsview
Downsview
Downsview
Downsview
Victoria Village
Parkview Hill, Woodbine Gardens
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto, Broadview North (Old East York)
The Danforth West, 

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Malvern, Rouge",43.806686,-79.194353,Wendy’s,43.807448,-79.199056,Fast Food Restaurant
1,"Rouge Hill, Port Union, Highland Creek",43.806686,-79.194353,Wendy’s,43.807448,-79.199056,Fast Food Restaurant
2,"Guildwood, Morningside, West Hill",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
3,"Guildwood, Morningside, West Hill",43.784535,-79.160497,SEBS Engineering Inc. (Sustainable Energy and ...,43.782371,-79.15682,Construction & Landscaping
4,Woburn,43.770992,-79.216917,Starbucks,43.770037,-79.221156,Coffee Shop


In [24]:
print(nearby_venues.shape)
nearby_venues.head()

(1984, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Malvern, Rouge",43.806686,-79.194353,Wendy’s,43.807448,-79.199056,Fast Food Restaurant
1,"Rouge Hill, Port Union, Highland Creek",43.806686,-79.194353,Wendy’s,43.807448,-79.199056,Fast Food Restaurant
2,"Guildwood, Morningside, West Hill",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
3,"Guildwood, Morningside, West Hill",43.784535,-79.160497,SEBS Engineering Inc. (Sustainable Energy and ...,43.782371,-79.15682,Construction & Landscaping
4,Woburn,43.770992,-79.216917,Starbucks,43.770037,-79.221156,Coffee Shop


<b> Get the count of Venue in each Neighbourhood </b>

In [25]:
nearby_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt,2,2,2,2,2,2
"Alderwood, Long Branch",14,14,14,14,14,14
"Bathurst Manor, Wilson Heights, Downsview North",21,21,21,21,21,21
Bayview Village,4,4,4,4,4,4
"Bedford Park, Lawrence Manor East",22,22,22,22,22,22
Berczy Park,85,85,85,85,85,85
"Birch Cliff, Cliffside West",2,2,2,2,2,2
"Brockton, Parkdale Village, Exhibition Place",45,45,45,45,45,45
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",3,3,3,3,3,3
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",74,74,74,74,74,74


<b> Get the count of Unique Venues Category </b>

In [26]:
print('There are {} uniques categories.'.format(len(nearby_venues['Venue Category'].unique())))

There are 245 uniques categories.


### Analyze Each Neighborhood

<b> Normalizing/ One hot encoding by Venues Category </b>

In [27]:
# one hot encoding
Toronto_onehot = pd.get_dummies(nearby_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Toronto_onehot['Neighborhood'] = nearby_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Toronto_onehot.columns[-1]] + list(Toronto_onehot.columns[:-1])
Toronto_onehot = Toronto_onehot[fixed_columns]

Toronto_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,Afghan Restaurant,Airport Food Court,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Stadium,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bike Shop,Bistro,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Auditorium,College Cafeteria,College Gym,College Rec Center,Colombian Restaurant,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Creperie,Cuban Restaurant,Cupcake Shop,Curling Ice,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hakka Restaurant,Health & Beauty Service,Health Food Store,History Museum,Hobby Shop,Hockey Arena,Home Service,Hookah Bar,Hospital,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indoor Play Area,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kitchen Supply Store,Korean BBQ Restaurant,Korean Restaurant,Lake,Latin American Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Locksmith,Lounge,Malay Restaurant,Market,Martial Arts School,Massage Studio,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Motel,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Park,Peruvian Restaurant,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Poutine Place,Print Shop,Pub,Ramen Restaurant,Record Shop,Recording Studio,Rental Car Location,Restaurant,River,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Soup Place,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sri Lankan Restaurant,Steakhouse,Street Art,Strip Club,Supermarket,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tattoo Parlor,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Women's Store
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Malvern, Rouge",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Rouge Hill, Port Union, Highland Creek",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Woburn,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [28]:
Toronto_onehot.shape

(1984, 245)

#### group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [29]:
Toronto_grouped = Toronto_onehot.groupby('Neighborhood').mean().reset_index()
Toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport Food Court,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Stadium,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bike Shop,Bistro,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Auditorium,College Cafeteria,College Gym,College Rec Center,Colombian Restaurant,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Creperie,Cuban Restaurant,Cupcake Shop,Curling Ice,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hakka Restaurant,Health & Beauty Service,Health Food Store,History Museum,Hobby Shop,Hockey Arena,Home Service,Hookah Bar,Hospital,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indoor Play Area,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kitchen Supply Store,Korean BBQ Restaurant,Korean Restaurant,Lake,Latin American Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Locksmith,Lounge,Malay Restaurant,Market,Martial Arts School,Massage Studio,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Motel,Movie Theater,Museum,Music Venue,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Park,Peruvian Restaurant,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Poutine Place,Print Shop,Pub,Ramen Restaurant,Record Shop,Recording Studio,Rental Car Location,Restaurant,River,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Soup Place,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sri Lankan Restaurant,Steakhouse,Street Art,Strip Club,Supermarket,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tattoo Parlor,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Women's Store
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.047619,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Berczy Park,0.0,0.0,0.0,0.0,0.035294,0.0,0.011765,0.0,0.0,0.011765,0.0,0.0,0.011765,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.0,0.035294,0.0,0.011765,0.0,0.011765,0.011765,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.058824,0.0,0.011765,0.0,0.0,0.011765,0.0,0.023529,0.047059,0.070588,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.023529,0.011765,0.0,0.0,0.0,0.0,0.0,0.023529,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.023529,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.011765,0.011765,0.011765,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.035294,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.0,0.011765,0.023529,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023529,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.011765,0.023529,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.011765,0.0,0.023529,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.023529,0.0,0.011765,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047059,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.023529,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.0,0.011765,0.011765,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.011765,0.0
6,"Birch Cliff, Cliffside West",0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Brockton, Parkdale Village, Exhibition Place",0.022222,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.044444,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.088889,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.022222,0.022222,0.0,0.0,0.022222,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.044444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.044444,0.0,0.022222,0.0
8,"Business reply mail Processing Centre, South C...",0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.054054,0.0,0.0,0.013514,0.013514,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.013514,0.013514,0.0,0.0,0.0,0.027027,0.013514,0.0,0.0,0.054054,0.0,0.0,0.0,0.013514,0.013514,0.0,0.0,0.013514,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.013514,0.0,0.0,0.0,0.0,0.013514,0.013514,0.027027,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.013514,0.0,0.013514,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.013514,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.013514,0.0,0.027027,0.0,0.0,0.013514,0.027027,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.040541,0.0,0.013514,0.0


### print each neighborhood along with the top 5 most common venues


In [30]:
num_top_venues = 5

for hood in Toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = Toronto_grouped[Toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
                       venue  freq
0                       Park   0.5
1       Caribbean Restaurant   0.5
2                Yoga Studio   0.0
3                  Nightclub   0.0
4  Middle Eastern Restaurant   0.0


----Alderwood, Long Branch----
                  venue  freq
0                  Café  0.14
1           Coffee Shop  0.14
2    Mexican Restaurant  0.07
3            Restaurant  0.07
4  Fast Food Restaurant  0.07


----Bathurst Manor, Wilson Heights, Downsview North----
                       venue  freq
0                       Bank  0.10
1                Coffee Shop  0.10
2  Middle Eastern Restaurant  0.05
3         Chinese Restaurant  0.05
4                Gas Station  0.05


----Bayview Village----
                 venue  freq
0                 Bank  0.25
1                 Café  0.25
2  Japanese Restaurant  0.25
3   Chinese Restaurant  0.25
4               Office  0.00


----Bedford Park, Lawrence Manor East----
                     venue  freq
0             

                     venue  freq
0               Playground  0.33
1               Smoke Shop  0.33
2            Jewelry Store  0.33
3              Yoga Studio  0.00
4  New American Restaurant  0.00


----Kensington Market, Chinatown, Grange Park----
                 venue  freq
0                 Café  0.15
1            Bookstore  0.09
2  Japanese Restaurant  0.06
3                  Bar  0.06
4               Bakery  0.06


----Kingsview Village, St. Phillips, Martin Grove Gardens, Richview Gardens----
                venue  freq
0      Discount Store  0.17
1         Pizza Place  0.17
2  Chinese Restaurant  0.17
3      Sandwich Place  0.17
4        Intersection  0.17


----Lawrence Manor, Lawrence Heights----
                    venue  freq
0          Clothing Store  0.38
1           Women's Store  0.08
2               Gift Shop  0.08
3  Furniture / Home Store  0.08
4                Boutique  0.08


----Lawrence Park----
         venue  freq
0     Bus Line  0.33
1         Park  0.33
2  S

                       venue  freq
0                       Park   1.0
1                Yoga Studio   0.0
2                  Nightclub   0.0
3         Mexican Restaurant   0.0
4  Middle Eastern Restaurant   0.0


----Weston----
                       venue  freq
0             Baseball Field   1.0
1                Yoga Studio   0.0
2                  Nightclub   0.0
3         Mexican Restaurant   0.0
4  Middle Eastern Restaurant   0.0


----Wexford, Maryvale----
                   venue  freq
0      Indian Restaurant   0.4
1     Chinese Restaurant   0.2
2  Vietnamese Restaurant   0.2
3              Pet Store   0.2
4            Yoga Studio   0.0


----Willowdale, Willowdale West----
           venue  freq
0    Coffee Shop   0.2
1    Pizza Place   0.2
2       Pharmacy   0.2
3        Butcher   0.2
4  Grocery Store   0.2


----Woburn----
                       venue  freq
0                Coffee Shop  0.50
1         Mexican Restaurant  0.25
2      Korean BBQ Restaurant  0.25
3  Middle Easter

<b> Sort the venues in descending order. </b>

In [31]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

<b> Create a new dataframe and display the top 10 venues for each neighborhood. </b>

In [32]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = Toronto_grouped['Neighborhood']

for ind in np.arange(Toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Toronto_grouped.iloc[ind, :], num_top_venues)


print(neighborhoods_venues_sorted.shape)
neighborhoods_venues_sorted.head()

(93, 11)


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Park,Caribbean Restaurant,Women's Store,Dumpling Restaurant,Fish Market,Fish & Chips Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market
1,"Alderwood, Long Branch",Coffee Shop,Café,Bakery,Restaurant,Fried Chicken Joint,Liquor Store,Fast Food Restaurant,Pharmacy,Gym,American Restaurant
2,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Gas Station,Shopping Mall,Middle Eastern Restaurant,Sandwich Place,Mobile Phone Shop,Chinese Restaurant,Fried Chicken Joint,Frozen Yogurt Shop
3,Bayview Village,Bank,Café,Japanese Restaurant,Chinese Restaurant,Women's Store,Flea Market,Fish Market,Fish & Chips Shop,Filipino Restaurant,Field
4,"Bedford Park, Lawrence Manor East",Italian Restaurant,Sandwich Place,Coffee Shop,Thai Restaurant,Locksmith,Liquor Store,Restaurant,Juice Bar,Sushi Restaurant,Butcher


# Run k-means to cluster the neighborhood into 5 clusters.

In [33]:
# set number of clusters
kclusters = 5

Toronto_grouped_clustering = Toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 1, 1, 1, 1, 1, 1, 1, 1, 1])

<b> Create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood. </b>

In [34]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Toronto_merged = deduped_sorted_merged_df

# merge manhattan_grouped with manhattan_data to add latitude/longitude for each neighborhood
Toronto_merged = Toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood')

#remove lines with cluster-labels = 0 and NaN-es
#print(Toronto_merged.columns)
#Toronto_merged = Toronto_merged[Toronto_merged['Cluster Labels'] != 0]
Toronto_merged = Toronto_merged.dropna()
Toronto_merged # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,3.0,Fast Food Restaurant,Women's Store,Food Court,Flea Market,Fish Market,Fish & Chips Shop,Filipino Restaurant,Field,Farmers Market,Falafel Restaurant
3,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.806686,-79.194353,3.0,Fast Food Restaurant,Women's Store,Food Court,Flea Market,Fish Market,Fish & Chips Shop,Filipino Restaurant,Field,Farmers Market,Falafel Restaurant
6,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.784535,-79.160497,0.0,Construction & Landscaping,Bar,Women's Store,Dumpling Restaurant,Flea Market,Fish Market,Fish & Chips Shop,Filipino Restaurant,Field,Fast Food Restaurant
9,M1G,Scarborough,Woburn,43.770992,-79.216917,1.0,Coffee Shop,Mexican Restaurant,Korean BBQ Restaurant,Women's Store,Dumpling Restaurant,Fish Market,Fish & Chips Shop,Filipino Restaurant,Field,Fast Food Restaurant
14,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,1.0,Bakery,Fried Chicken Joint,Gas Station,Thai Restaurant,Caribbean Restaurant,Athletics & Sports,Bank,Hakka Restaurant,Field,Fast Food Restaurant
16,M1J,Scarborough,Scarborough Village,43.773136,-79.239476,1.0,Bakery,Fried Chicken Joint,Gas Station,Thai Restaurant,Caribbean Restaurant,Athletics & Sports,Bank,Hakka Restaurant,Field,Fast Food Restaurant
19,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.744734,-79.239476,1.0,Jewelry Store,Playground,Smoke Shop,Women's Store,Fish & Chips Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market,Falafel Restaurant
21,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge",43.727929,-79.262029,1.0,Hobby Shop,Department Store,Coffee Shop,Train Station,Filipino Restaurant,Field,Fast Food Restaurant,Fish & Chips Shop,Donut Shop,Farmers Market
25,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West",43.716316,-79.284577,1.0,Soccer Field,Pub,Coffee Shop,Women's Store,Donut Shop,Fish & Chips Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market
28,M1N,Scarborough,"Birch Cliff, Cliffside West",43.716316,-79.239476,1.0,American Restaurant,Motel,Women's Store,Dumpling Restaurant,Flea Market,Fish Market,Fish & Chips Shop,Filipino Restaurant,Field,Fast Food Restaurant


<b> View the clusters in map </b>

In [35]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Toronto_merged['Latitude'], Toronto_merged['Longitude'], Toronto_merged['Neighbourhood'], Toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 5. Examine Clusters

<b> Now, we can examine each cluster and determine the discriminating venue categories that distinguish each cluster. </b>

In [36]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 0, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Scarborough,0.0,Construction & Landscaping,Bar,Women's Store,Dumpling Restaurant,Flea Market,Fish Market,Fish & Chips Shop,Filipino Restaurant,Field,Fast Food Restaurant


In [37]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 1, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Scarborough,1.0,Coffee Shop,Mexican Restaurant,Korean BBQ Restaurant,Women's Store,Dumpling Restaurant,Fish Market,Fish & Chips Shop,Filipino Restaurant,Field,Fast Food Restaurant
14,Scarborough,1.0,Bakery,Fried Chicken Joint,Gas Station,Thai Restaurant,Caribbean Restaurant,Athletics & Sports,Bank,Hakka Restaurant,Field,Fast Food Restaurant
16,Scarborough,1.0,Bakery,Fried Chicken Joint,Gas Station,Thai Restaurant,Caribbean Restaurant,Athletics & Sports,Bank,Hakka Restaurant,Field,Fast Food Restaurant
19,Scarborough,1.0,Jewelry Store,Playground,Smoke Shop,Women's Store,Fish & Chips Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market,Falafel Restaurant
21,Scarborough,1.0,Hobby Shop,Department Store,Coffee Shop,Train Station,Filipino Restaurant,Field,Fast Food Restaurant,Fish & Chips Shop,Donut Shop,Farmers Market
25,Scarborough,1.0,Soccer Field,Pub,Coffee Shop,Women's Store,Donut Shop,Fish & Chips Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market
28,Scarborough,1.0,American Restaurant,Motel,Women's Store,Dumpling Restaurant,Flea Market,Fish Market,Fish & Chips Shop,Filipino Restaurant,Field,Fast Food Restaurant
32,Scarborough,1.0,Indian Restaurant,Vietnamese Restaurant,Pet Store,Chinese Restaurant,Women's Store,Falafel Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space
35,Scarborough,1.0,Indian Restaurant,Vietnamese Restaurant,Pet Store,Chinese Restaurant,Women's Store,Falafel Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space
39,Scarborough,1.0,Clothing Store,Breakfast Spot,Lounge,Skating Rink,Latin American Restaurant,Electronics Store,Fish Market,Fish & Chips Shop,Filipino Restaurant,Field


In [38]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 2, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
37,Scarborough,2.0,Park,Caribbean Restaurant,Women's Store,Dumpling Restaurant,Fish Market,Fish & Chips Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market
47,Scarborough,2.0,Bakery,Playground,Park,Intersection,Falafel Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space,Fast Food Restaurant
125,East Toronto,2.0,Park,Convenience Store,Intersection,Farmers Market,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Women's Store
133,Central Toronto,2.0,Park,Bus Line,Swim School,Women's Store,Dumpling Restaurant,Fish Market,Fish & Chips Shop,Filipino Restaurant,Field,Fast Food Restaurant
135,Central Toronto,2.0,Park,Bus Line,Swim School,Women's Store,Dumpling Restaurant,Fish Market,Fish & Chips Shop,Filipino Restaurant,Field,Fast Food Restaurant
155,Downtown Toronto,2.0,Park,Playground,Trail,Women's Store,Falafel Restaurant,Dumpling Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space
195,Central Toronto,2.0,Sushi Restaurant,Park,Jewelry Store,Trail,Women's Store,Falafel Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space
225,Downtown Toronto,2.0,Park,Women's Store,Pool,Afghan Restaurant,Airport Food Court,Flea Market,Fish Market,Fish & Chips Shop,Filipino Restaurant,Field
239,North York,2.0,Park,Construction & Landscaping,Bakery,Fish Market,Fish & Chips Shop,Filipino Restaurant,Field,Fast Food Restaurant,Donut Shop,Farmers Market
242,York,2.0,Park,Construction & Landscaping,Bakery,Fish Market,Fish & Chips Shop,Filipino Restaurant,Field,Fast Food Restaurant,Donut Shop,Farmers Market


In [39]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 3, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Scarborough,3.0,Fast Food Restaurant,Women's Store,Food Court,Flea Market,Fish Market,Fish & Chips Shop,Filipino Restaurant,Field,Farmers Market,Falafel Restaurant
3,Scarborough,3.0,Fast Food Restaurant,Women's Store,Food Court,Flea Market,Fish Market,Fish & Chips Shop,Filipino Restaurant,Field,Farmers Market,Falafel Restaurant


In [40]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 4, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
276,Etobicoke,4.0,Construction & Landscaping,Baseball Field,Women's Store,Dumpling Restaurant,Flea Market,Fish Market,Fish & Chips Shop,Filipino Restaurant,Field,Fast Food Restaurant
294,York,4.0,Baseball Field,Women's Store,Electronics Store,Food & Drink Shop,Flea Market,Fish Market,Fish & Chips Shop,Filipino Restaurant,Field,Fast Food Restaurant
