# Segmenting and Clustering Neighborhoods in Toronto

Let's import all the libraries that will be used for the clustering:

In [537]:
# data scraping and handling JSON files
from bs4 import BeautifulSoup
import requests
import json
from pandas.io.json import json_normalize

# handling data
import numpy as np
import pandas as pd

# visualization
import matplotlib.pyplot as plt
%matplotlib inline
import folium

# clustering
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

## __PART 1__: Scraping Wikipedia page to produce the neighborhoods table.

We proceed with scraping the Wikipedia page in order to obtain the information about neighborhoods in Toronto:
- first, we get the data from the web page and extract the part that contains the table,
- second, we clean the data and store it in three lists, corresponding to postcode, borough and neighborhood,
- lastly, we create a pandas dataframe to hold the table and apply the modifications required by the assignment (drop the rows with a 'Not assigned' borough, combine the rows that share a postcode, and replace 'Not assigned' neighborhoods with the borough name).
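
The three cleaning rules in the last step can be tried on a toy table before they are applied to the scraped data. This is a minimal sketch, assuming the column names used in this notebook; the rows in `toy` and the helper name `clean_neighborhoods` are made up for illustration.

```python
import pandas as pd

def clean_neighborhoods(raw):
    """Apply the three cleaning rules to a Postcode/Borough/Neighborhood table."""
    # 1. drop rows whose borough is 'Not assigned'
    out = raw[raw['Borough'] != 'Not assigned'].copy()
    # 2. a 'Not assigned' neighborhood takes the name of its borough
    missing = out['Neighborhood'] == 'Not assigned'
    out.loc[missing, 'Neighborhood'] = out.loc[missing, 'Borough']
    # 3. combine rows that share a postcode into one comma-separated row
    return (out.groupby('Postcode', as_index=False)
               .agg({'Borough': 'first', 'Neighborhood': ', '.join}))

# illustrative rows only, not the scraped table
toy = pd.DataFrame({
    'Postcode': ['M1A', 'M5A', 'M5A', 'M7A'],
    'Borough': ['Not assigned', 'Downtown Toronto', 'Downtown Toronto', "Queen's Park"],
    'Neighborhood': ['Not assigned', 'Harbourfront', 'Regent Park', 'Not assigned'],
})
cleaned = clean_neighborhoods(toy)
print(cleaned)
```

The sketch fills in unassigned neighborhoods before combining rows, whereas the cells below do it afterwards; the result is the same as long as a postcode with an unassigned neighborhood has only one row.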

In [538]:
# get the content of the Wikipedia page
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
page = requests.get(url)

# Parse the content with BeautifulSoup
soup = BeautifulSoup(page.content, 'html.parser')
values = soup.find( "table" ) 
table = values.findAll('td')
table[0:15]

[<td>M1A</td>, <td>Not assigned</td>, <td>Not assigned
 </td>, <td>M2A</td>, <td>Not assigned</td>, <td>Not assigned
 </td>, <td>M3A</td>, <td><a href="/wiki/North_York" title="North York">North York</a></td>, <td><a href="/wiki/Parkwoods" title="Parkwoods">Parkwoods</a>
 </td>, <td>M4A</td>, <td><a href="/wiki/North_York" title="North York">North York</a></td>, <td><a href="/wiki/Victoria_Village" title="Victoria Village">Victoria Village</a>
 </td>, <td>M5A</td>, <td><a href="/wiki/Downtown_Toronto" title="Downtown Toronto">Downtown Toronto</a></td>, <td><a href="/wiki/Harbourfront_(Toronto)" title="Harbourfront (Toronto)">Harbourfront</a>
 </td>]

In [539]:
# Clean the data and store the columns in three lists
postcode = []
borough = []
neighborhood = []
for i, value in enumerate(table):
    value = str(value).strip('<td>').strip('/<')
    value = value.split('title="')[-1].split('">')[0]
    value = value.split('\n')[0].split(' (')[0]
    value = value.split(', Toronto')[0]
    
    if (i+1)%3 == 1:
        postcode.append(value)
    elif (i+1)%3 == 2:
        borough.append(value)
    else:
        neighborhood.append(value)
        
print("Postcode : ", postcode[0:5])
print("Borough : ", borough[0:5])
print("Neighborhood : ", neighborhood[0:5])

Postcode :  ['M1A', 'M2A', 'M3A', 'M4A', 'M5A']
Borough :  ['Not assigned', 'Not assigned', 'North York', 'North York', 'Downtown Toronto']
Neighborhood :  ['Not assigned', 'Not assigned', 'Parkwoods', 'Victoria Village', 'Harbourfront']


In [540]:
# Store the data in a pandas dataframe
df = pd.DataFrame({'Postcode' : postcode,
                  'Borough'  : borough,
                  'Neighborhood' : neighborhood})
df.head(10)

Unnamed: 0,Postcode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Not assigned
9,M8A,Not assigned,Not assigned


In [541]:
# Drop rows with borough = 'Not assigned'
df = df[df['Borough'] != 'Not assigned']
df.head(10)

Unnamed: 0,Postcode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Not assigned
10,M9A,Etobicoke,Islington Avenue
11,M1B,Scarborough,Rouge
12,M1B,Scarborough,Malvern


In [542]:
# Combine rows with same postcode  
def f(x):
    return pd.Series(dict(Borough = x['Borough'].unique()[0], 
                        Neighborhood = ', '.join(x['Neighborhood'])))
df = df.groupby('Postcode').apply(f)
df.reset_index(inplace=True)
df.head(10) 

Unnamed: 0,Postcode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


In [543]:
# Transform cell with 'Not assigned' Neighborhood
df.loc[df['Neighborhood'] == 'Not assigned','Neighborhood']  = df.loc[df['Neighborhood'] == 'Not assigned','Borough']


We print the shape of the resulting dataframe and also double-check the following:
- number of unique postcodes coincides with the number of rows in the dataframe
- there are no 'Not assigned' values in either 'Borough' or 'Neighborhood' columns

In [544]:
# Check that all postcodes are unique 
# and there are no 'Not assigned' values

print("Shape of the dataframe is ", df.shape)
print("There are {} unique postcodes".format(df['Postcode'].nunique()))
print("Are there any 'Not assigned' boroughs? : ", 
      df['Borough'].isin(['Not assigned']).any())
print("Are there any 'Not assigned' neighborhoods? : ", 
      df['Neighborhood'].isin(['Not assigned']).any())

Shape of the dataframe is  (103, 3)
There are 103 unique postcodes
Are there any 'Not assigned' boroughs? :  False
Are there any 'Not assigned' neighborhoods? :  False


## __PART 2__: Getting geographical coordinates of the neighborhoods.

I tried to use geocoder, as suggested, but it didn't work: after waiting more than 10 minutes for the coordinates, I gave up and used the .csv file instead.

In [545]:
# download coordinates
coordinates = pd.read_csv('http://cocl.us/Geospatial_data')
coordinates.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [546]:
# insert 'Latitude' and 'Longitude' columns into the dataframe
df['Latitude'] = coordinates['Latitude']
df['Longitude'] = coordinates['Longitude']
df.head()

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
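
The cell above assigns the coordinates by row position, which works here because both tables are sorted by postal code. A more defensive variant joins the two tables on the postal code itself. The following is a sketch under that assumption; `neighborhoods` and `coords` are toy stand-ins for `df` and `coordinates`, with values copied from the outputs above.

```python
import pandas as pd

# toy stand-ins for the notebook's `df` and `coordinates`
neighborhoods = pd.DataFrame({
    'Postcode': ['M1C', 'M1B'],  # deliberately not in the same order as the coordinates
    'Borough': ['Scarborough', 'Scarborough'],
    'Neighborhood': ['Highland Creek, Rouge Hill, Port Union', 'Rouge, Malvern'],
})
coords = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

# join on the key; validate raises if a postcode appears more than once on either side
merged = (neighborhoods
          .merge(coords, left_on='Postcode', right_on='Postal Code',
                 how='left', validate='one_to_one')
          .drop(columns='Postal Code'))
print(merged)
```

With `how='left'`, a postcode missing from the coordinates file would show up as `NaN` rather than silently receiving another row's coordinates.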


## __PART 3__: Explore and cluster the neighborhoods.

We start by examining our final data.

In [547]:
print("There are {} unique boroughs".format(len(df['Borough'].unique())))
print("There are {} unique neighborhoods".format(
    len(df['Neighborhood'].unique())))

There are 11 unique boroughs
There are 103 unique neighborhoods


Let's also visualize the neighborhoods.

In [579]:
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="tor_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto


### 1. Let's also explore one neighborhood, using Foursquare data. 

Following the approach from the Lab, let's pick a neighborhood and see what kinds of venues are popular in it.

__NOTE__: I removed the cell containing my Client ID and Client Secret; it should come right after this cell for the code below to work.
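
One way to keep the credentials out of the notebook altogether is to read them from environment variables. This is only a sketch: the variable names, the helper `load_foursquare_credentials`, and the default version date are assumptions for illustration, not the values used in the removed cell.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the API credentials from environment variables (names are hypothetical)."""
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    # API version date in YYYYMMDD form; this default is a placeholder, not the author's value
    version = env.get('FOURSQUARE_VERSION', '20180605')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first')
    return client_id, client_secret, version

# usage, once the variables are set in the shell that starts Jupyter:
# CLIENT_ID, CLIENT_SECRET, VERSION = load_foursquare_credentials()
```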

In [550]:
neighborhood = 'St. James Town'
neighborhood_latitude = df.loc[df['Neighborhood'] == neighborhood, 'Latitude']
neighborhood_latitude = neighborhood_latitude.values[0]
neighborhood_longitude = df.loc[df['Neighborhood'] == neighborhood, 'Longitude']
neighborhood_longitude = neighborhood_longitude.values[0]

LIMIT = 100
radius = 500
search_query = ''
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

results = requests.get(url).json()


In [551]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [552]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Gyu-Kaku Japanese BBQ,Japanese Restaurant,43.651422,-79.375047
1,Crepe TO,Creperie,43.650063,-79.374587
2,Terroni,Italian Restaurant,43.650927,-79.375602
3,GEORGE Restaurant,Restaurant,43.653346,-79.374445
4,Fahrenheit Coffee,Coffee Shop,43.652384,-79.372719


In [553]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.
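
The flattening and category extraction above can be reproduced offline on a small hand-written payload. The sketch below assumes only the nesting that the cells above rely on (`venue.name`, `venue.categories`, `venue.location.lat`, `venue.location.lng`); the two items are invented and are not real API output.

```python
import pandas as pd

# hypothetical payload shaped like the `items` list used above
items = [
    {'venue': {'name': 'Example Cafe',
               'categories': [{'name': 'Coffee Shop'}],
               'location': {'lat': 43.65, 'lng': -79.37}}},
    {'venue': {'name': 'Uncategorized Place',
               'categories': [],
               'location': {'lat': 43.66, 'lng': -79.38}}},
]

flat = pd.json_normalize(items)  # nested keys become dotted column names
flat = flat[['venue.name', 'venue.categories',
             'venue.location.lat', 'venue.location.lng']].copy()
# keep the first category name, or None when the list is empty
flat['venue.categories'] = flat['venue.categories'].apply(
    lambda cats: cats[0]['name'] if cats else None)
# keep only the last part of each dotted name
flat.columns = [col.split('.')[-1] for col in flat.columns]
print(flat)
```

`pd.json_normalize` is the top-level name for the same function in newer pandas releases; the import used at the top of this notebook is the older location.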


### 2. Let's get similar data for all neighborhoods, using the function provided in the Lab

In [554]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    return(nearby_venues)

In [556]:
toronto_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )



Rouge, Malvern
Highland Creek, Rouge Hill, Port Union
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
East Birchmount Park, Ionview, Kennedy Park
Clairlea, Golden Mile, Oakridge
Cliffcrest, Cliffside, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Scarborough Town Centre, Wexford Heights
Maryvale, Wexford
Agincourt
Clarks Corners, Sullivan, Tam O'Shanter – Sullivan
Agincourt North, L'Amoreaux East, Milliken, Ontario, Steeles East
L'Amoreaux West
Upper Rouge
Hillcrest Village
Fairview, Henry Farm, Oriole
Bayview Village
Silver Hills, York Mills
Newtonbrook, Willowdale
Willowdale South
York Mills West
Willowdale West
Parkwoods
Don Mills North
Flemingdon Park, Don Mills South
Bathurst Manor, Downsview North, Wilson Heights
Northwood Park, York University
CFB Toronto, Downsview East
Downsview
Downsview Central
Downsview Northwest
Victoria Village
Woodbine Gardens, Parkview Hill
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto
The 

### 3. There are 103 neighborhoods; to simplify the analysis a little bit, I filter out all neighborhoods for which fewer than 20 venues were found.

In [581]:
venues_per_neighborhood = toronto_venues.groupby('Neighborhood').count()
venues_per_neighborhood.reset_index(inplace = True)
# print(venues_per_neighborhood.sort_values(by = 'Venue'))
lim = 20
more_than_20 = venues_per_neighborhood.loc[venues_per_neighborhood['Venue']>=lim,
                                           'Neighborhood']
toronto_venues = toronto_venues[toronto_venues['Neighborhood'].isin(more_than_20)]

print("Now there are {} neighborhoods to cluster".format(len(toronto_venues['Neighborhood'].unique())))

Now there are 31 neighborhoods to cluster
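
The same filter can be written without the intermediate count table by broadcasting each group's size back onto its rows. This is a sketch on a toy table, not the Foursquare data; `min_venues` plays the role of `lim` above.

```python
import pandas as pd

# toy venue table: neighborhood 'A' has three venues, 'B' has one
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue': ['v1', 'v2', 'v3', 'v4'],
})

min_venues = 2  # the notebook uses 20; a small value keeps the toy example readable
# number of venues in each row's neighborhood, aligned with the original rows
group_size = venues.groupby('Neighborhood')['Venue'].transform('count')
kept = venues[group_size >= min_venues]
print(kept['Neighborhood'].nunique(), 'neighborhood(s) kept')
```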


In [586]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide, King, Richmond",100,100,100,100,100,100
"Bedford Park, Lawrence Manor East",23,23,23,23,23,23
Berczy Park,57,57,57,57,57,57
"Cabbagetown, St. James Town",44,44,44,44,44,44
Central Bay Street,88,88,88,88,88,88
"Chinatown, Grange Park, Kensington Market",100,100,100,100,100,100
Church and Wellesley,88,88,88,88,88,88
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
Davisville,32,32,32,32,32,32
"Design Exchange, Toronto Dominion Centre",100,100,100,100,100,100


### 4. Now we're left with about one third of the initial neighborhoods. Let's proceed with encoding the venue categories and calculating frequencies.

In [584]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
categories = toronto_onehot.columns.values
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
index = np.argwhere(categories =='Neighborhood')
categories  = list(np.delete(categories, index) )
fixed_columns = ['Neighborhood'] + categories

toronto_onehot = toronto_onehot[fixed_columns]
toronto_onehot.head()

Unnamed: 0,Neighborhood,Adult Boutique,Afghan Restaurant,American Restaurant,Antique Shop,Aquarium,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
92,"Fairview, Henry Farm, Oriole",0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
93,"Fairview, Henry Farm, Oriole",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
94,"Fairview, Henry Farm, Oriole",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
95,"Fairview, Henry Farm, Oriole",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
96,"Fairview, Henry Farm, Oriole",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [561]:
toronto_onehot.shape

(1793, 222)

In [562]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean()
toronto_grouped.reset_index(inplace=True)
toronto_grouped.head()

Unnamed: 0,Neighborhood,Adult Boutique,Afghan Restaurant,American Restaurant,Antique Shop,Aquarium,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,"Adelaide, King, Richmond",0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.01,0.0,...,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0
1,"Bedford Park, Lawrence Manor East",0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,...,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0
3,"Cabbagetown, St. James Town",0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.011364,0.0,...,0.0,0.0,0.0,0.011364,0.0,0.0,0.011364,0.0,0.0,0.011364
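
To see why the mean of the one-hot columns is a frequency, it helps to run the two steps on a table small enough to check by hand. The sketch below uses invented rows; it also shows `pd.crosstab` with `normalize='index'`, which should produce the same table in one call.

```python
import pandas as pd

# toy venue table; names and categories are illustrative
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Park', 'Bakery', 'Park', 'Gym'],
})

# step 1: one indicator column per category, one row per venue
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
# step 2: averaging the indicators per neighborhood gives category frequencies
freq = onehot.groupby(venues['Neighborhood']).mean()

# the same table in a single call
freq_crosstab = pd.crosstab(venues['Neighborhood'], venues['Venue Category'],
                            normalize='index')
print(freq)
```

Neighborhood `A` has two coffee shops among four venues, so its `Coffee Shop` frequency is 0.5; each row sums to 1.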


### 5. Next, let's analyze the most frequent venues in each neighborhood

In [587]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
                 venue  freq
0          Coffee Shop  0.06
1                 Café  0.05
2  American Restaurant  0.04
3           Steakhouse  0.04
4      Thai Restaurant  0.04


----Bedford Park, Lawrence Manor East----
                  venue  freq
0           Coffee Shop  0.09
1    Italian Restaurant  0.09
2  Fast Food Restaurant  0.09
3     Indian Restaurant  0.04
4       Thai Restaurant  0.04


----Berczy Park----
          venue  freq
0   Coffee Shop  0.09
1  Cocktail Bar  0.05
2        Bakery  0.04
3    Restaurant  0.04
4      Beer Bar  0.04


----Cabbagetown, St. James Town----
                venue  freq
0         Coffee Shop  0.09
1  Italian Restaurant  0.05
2              Bakery  0.05
3                 Pub  0.05
4          Restaurant  0.05


----Central Bay Street----
                venue  freq
0         Coffee Shop  0.15
1                Café  0.06
2  Italian Restaurant  0.05
3        Burger Joint  0.03
4              Bakery  0.02


----Chinat

In [565]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [573]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,American Restaurant,Steakhouse,Thai Restaurant,Hotel,Bakery,Burger Joint,Gym,Bar
1,"Bedford Park, Lawrence Manor East",Italian Restaurant,Coffee Shop,Fast Food Restaurant,Grocery Store,Comfort Food Restaurant,Pharmacy,Pizza Place,Pub,Restaurant,Sandwich Place
2,Berczy Park,Coffee Shop,Cocktail Bar,Bakery,Seafood Restaurant,Café,Restaurant,Beer Bar,Steakhouse,Cheese Shop,Farmers Market
3,"Cabbagetown, St. James Town",Coffee Shop,Pizza Place,Bakery,Café,Pub,Market,Italian Restaurant,Restaurant,Gastropub,Flower Shop
4,Central Bay Street,Coffee Shop,Café,Italian Restaurant,Burger Joint,Bar,Chinese Restaurant,Sushi Restaurant,Salad Place,Spa,Japanese Restaurant
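
The ranking step can also be expressed without the row-by-row `iloc` assignment. The sketch below is one possible variant, assuming a frequency table indexed by neighborhood; the helper names `ordinal` and `top_venues` and the toy frequencies are hypothetical.

```python
import pandas as pd

def ordinal(n):
    """Return '1st', '2nd', '3rd', '4th', ... (11th to 13th handled)."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

def top_venues(freq, n):
    """freq: frequency table indexed by neighborhood, one column per category."""
    # for each neighborhood, category names sorted by descending frequency
    rows = {hood: row.sort_values(ascending=False).index[:n].tolist()
            for hood, row in freq.iterrows()}
    columns = ['{} Most Common Venue'.format(ordinal(i + 1)) for i in range(n)]
    return pd.DataFrame.from_dict(rows, orient='index', columns=columns)

# toy frequencies, not Foursquare data
freq = pd.DataFrame({'Bakery': [0.2, 0.1], 'Coffee Shop': [0.5, 0.3], 'Park': [0.3, 0.6]},
                    index=['A', 'B'])
top2 = top_venues(freq, 2)
print(top2)
```

Categories with equal frequency are ordered by the sort, not by any notion of importance, so ranks among ties should not be over-interpreted.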


### 6. Finally, let's run the k-means algorithm to split the neighborhoods into 4 clusters.

In [574]:
# set number of clusters
kclusters = 4

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([3, 2, 3, 3, 3, 0, 3, 3, 2, 3, 0, 1, 3, 1, 0, 3, 3, 0, 2, 0, 1, 2,
       2, 3, 3, 3, 3, 2, 2, 2, 2], dtype=int32)
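
The number of clusters is fixed at 4 here. One common way to sanity-check such a choice is to compare the mean silhouette score for a few values of k. The following is a sketch on synthetic data, not an analysis of the Toronto features; `silhouette_by_k` is a hypothetical helper and `features` merely stands in for `toronto_grouped_clustering`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def silhouette_by_k(features, k_values, random_state=0):
    """Fit k-means for each k and return {k: mean silhouette score}."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=random_state).fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    return scores

# synthetic stand-in: 30 rows, 5 features, drawn around three centers
rng = np.random.RandomState(0)
features = np.vstack([rng.normal(loc=c, scale=0.1, size=(10, 5))
                      for c in (0.0, 1.0, 2.0)])
scores = silhouette_by_k(features, range(2, 7))
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

A higher score suggests better-separated clusters, but with only 31 neighborhoods and more than 200 sparse features the scores are likely to be noisy, so they are a hint rather than a decision rule.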

In [575]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
toronto_merged = df.loc[df['Neighborhood'].isin(toronto_venues['Neighborhood'].unique())]

# join the cluster labels and top venues onto the rows of df, which already hold latitude/longitude
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() 


Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,M2J,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,1,Clothing Store,Fast Food Restaurant,Coffee Shop,Toy / Game Store,Restaurant,Juice Bar,Jewelry Store,Asian Restaurant,Tea Room,Bus Station
22,M2N,North York,Willowdale South,43.77012,-79.408493,2,Coffee Shop,Ramen Restaurant,Sushi Restaurant,Restaurant,Sandwich Place,Café,Movie Theater,Ice Cream Shop,Grocery Store,Japanese Restaurant
27,M3C,North York,"Flemingdon Park, Don Mills South",43.7259,-79.340923,1,Coffee Shop,Gym,Asian Restaurant,Beer Store,Grocery Store,Italian Restaurant,Bike Shop,Chinese Restaurant,Clothing Store,Discount Store
38,M4G,East York,Leaside,43.70906,-79.363452,2,Sporting Goods Shop,Coffee Shop,Furniture / Home Store,Burger Joint,Pet Store,Restaurant,Bike Shop,Smoothie Shop,Breakfast Spot,Brewery
41,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,2,Greek Restaurant,Coffee Shop,Ice Cream Shop,Furniture / Home Store,Bookstore,Italian Restaurant,Pizza Place,Brewery,Bubble Tea Shop,Café


### 7. Now that all the work has been done, we can analyze the results.

    7.a. We can visualize the clusters and examine the map to see whether there is a clear visual pattern.

In [595]:
import matplotlib.cm as cm
import matplotlib.colors as colors
# create map
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="tor_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], 
                                  toronto_merged['Longitude'], 
                                  toronto_merged['Neighborhood'],
                                  toronto_merged['Cluster Labels']):
    cluster = int(cluster)
    label = folium.Popup(str(poi) + ' Cluster ' 
                         + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

display(map_clusters)

_Although the results are not clear-cut, we can make several observations_:
- cluster 3 (yellow) clearly corresponds to the downtown part of Toronto,
- cluster 0 (red) is close to cluster 3 but seems to lie slightly to the west (left on the map) of the city center, so it probably corresponds to a less busy central part of Toronto,
- clusters 1 and 2 (blue and purple) both contain neighborhoods that are located further from the city center, so they most likely correspond to residential neighborhoods.
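
Since the observations refer to clusters by color, it can help to print which hex color each label receives. The sketch below rebuilds the color list the same way as the map cell (four evenly spaced samples of the `rainbow` colormap) and applies the same `cluster - 1` indexing; `color_of_cluster` is a name introduced here for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import colors

kclusters = 4
cmap = plt.get_cmap('rainbow')
# one hex color per position, in the same order as the `rainbow` list in the map cell
rainbow = [colors.rgb2hex(cmap(x)) for x in np.linspace(0, 1, kclusters)]

# the map cell indexes with `cluster - 1`, so label 0 wraps around to the last color
color_of_cluster = {cluster: rainbow[cluster - 1] for cluster in range(kclusters)}
for cluster, hex_color in color_of_cluster.items():
    print(cluster, hex_color)
```

Indexing with `rainbow[cluster]` instead would avoid the wrap-around, at the cost of changing which color each cluster gets on the map.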

    7.b. To gain more information about the clusters, let's look at the summary for each of them.

__Cluster 0__:

In [577]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]



Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
66,Downtown Toronto,0,Café,Japanese Restaurant,Bar,Bakery,Bookstore,Restaurant,Italian Restaurant,Beer Bar,Beer Store,Sandwich Place
67,Downtown Toronto,0,Café,Vegetarian / Vegan Restaurant,Bar,Dumpling Restaurant,Mexican Restaurant,Bakery,Coffee Shop,Chinese Restaurant,Vietnamese Restaurant,Cocktail Bar
76,West Toronto,0,Pharmacy,Bakery,Supermarket,Bar,Gym,Liquor Store,Fast Food Restaurant,Middle Eastern Restaurant,Discount Store,Music Venue
77,West Toronto,0,Bar,Asian Restaurant,Coffee Shop,Restaurant,Café,Pizza Place,Cocktail Bar,Bakery,Men's Store,New American Restaurant
82,West Toronto,0,Mexican Restaurant,Café,Bar,Park,Bakery,Cajun / Creole Restaurant,Flea Market,Speakeasy,Bookstore,Fried Chicken Joint


__Cluster 1__:

In [578]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]



Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,North York,1,Clothing Store,Fast Food Restaurant,Coffee Shop,Toy / Game Store,Restaurant,Juice Bar,Jewelry Store,Asian Restaurant,Tea Room,Bus Station
27,North York,1,Coffee Shop,Gym,Asian Restaurant,Beer Store,Grocery Store,Italian Restaurant,Bike Shop,Chinese Restaurant,Clothing Store,Discount Store
46,Central Toronto,1,Clothing Store,Coffee Shop,Yoga Studio,Furniture / Home Store,Spa,Fast Food Restaurant,Sporting Goods Shop,Metro Station,Mexican Restaurant,Salon / Barbershop


__Cluster 2__:

In [468]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]



Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,North York,2,Coffee Shop,Ramen Restaurant,Sushi Restaurant,Restaurant,Sandwich Place,Café,Movie Theater,Ice Cream Shop,Grocery Store,Japanese Restaurant
38,East York,2,Sporting Goods Shop,Coffee Shop,Furniture / Home Store,Burger Joint,Pet Store,Restaurant,Bike Shop,Smoothie Shop,Breakfast Spot,Brewery
41,East Toronto,2,Greek Restaurant,Coffee Shop,Ice Cream Shop,Furniture / Home Store,Bookstore,Italian Restaurant,Pizza Place,Brewery,Bubble Tea Shop,Café
42,East Toronto,2,Park,Sandwich Place,Pet Store,Brewery,Burger Joint,Burrito Place,Pub,Coffee Shop,Pizza Place,Movie Theater
47,Central Toronto,2,Sandwich Place,Dessert Shop,Pizza Place,Café,Sushi Restaurant,Italian Restaurant,Coffee Shop,Farmers Market,Indian Restaurant,Seafood Restaurant
62,North York,2,Italian Restaurant,Coffee Shop,Fast Food Restaurant,Grocery Store,Comfort Food Restaurant,Pharmacy,Pizza Place,Pub,Restaurant,Sandwich Place
65,Central Toronto,2,Coffee Shop,Sandwich Place,Café,Pizza Place,Pharmacy,Indian Restaurant,Park,Pub,Burger Joint,Liquor Store
84,West Toronto,2,Café,Coffee Shop,Pizza Place,Gym,Italian Restaurant,Sushi Restaurant,Diner,Pub,Restaurant,Electronics Store
85,Queen's Park,2,Coffee Shop,Japanese Restaurant,Gym,Diner,Café,Smoothie Shop,Burger Joint,Burrito Place,Seafood Restaurant,Liquor Store


__Cluster 3__:

In [470]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]



Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
43,East Toronto,3,Café,Coffee Shop,Bakery,American Restaurant,Italian Restaurant,Yoga Studio,Brewery,Seafood Restaurant,Sandwich Place,Cheese Shop
51,Downtown Toronto,3,Coffee Shop,Pizza Place,Bakery,Café,Pub,Market,Italian Restaurant,Restaurant,Gastropub,Flower Shop
52,Downtown Toronto,3,Coffee Shop,Japanese Restaurant,Gay Bar,Sushi Restaurant,Restaurant,Mediterranean Restaurant,Fast Food Restaurant,Bubble Tea Shop,Pub,Café
53,Downtown Toronto,3,Coffee Shop,Café,Pub,Bakery,Park,Theater,Mexican Restaurant,Breakfast Spot,Electronics Store,Event Space
54,Downtown Toronto,3,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Middle Eastern Restaurant,Restaurant,Sporting Goods Shop,Fast Food Restaurant,Italian Restaurant,Tea Room
55,Downtown Toronto,3,Coffee Shop,Hotel,Restaurant,Café,Cosmetics Shop,Breakfast Spot,Italian Restaurant,Gastropub,Bakery,Park
56,Downtown Toronto,3,Coffee Shop,Cocktail Bar,Bakery,Seafood Restaurant,Café,Restaurant,Beer Bar,Steakhouse,Cheese Shop,Farmers Market
57,Downtown Toronto,3,Coffee Shop,Café,Italian Restaurant,Burger Joint,Bar,Chinese Restaurant,Sushi Restaurant,Salad Place,Spa,Japanese Restaurant
58,Downtown Toronto,3,Coffee Shop,Café,American Restaurant,Steakhouse,Thai Restaurant,Hotel,Bakery,Burger Joint,Gym,Bar
59,Downtown Toronto,3,Coffee Shop,Hotel,Aquarium,Café,Italian Restaurant,Bakery,Restaurant,Brewery,Pizza Place,Scenic Lookout


- Indeed, looking at the most common venues in cluster 0, we notice places like Flea Market and Discount Store, so it does look less busy than cluster 3.

- Looking more closely at the venues in clusters 1 and 2, we notice more shops/stores and gyms in cluster 1, while cluster 2 has more parks and restaurants, so cluster 2 appears more lively and "_recreational_" while cluster 1 appears more "_convenient_".
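
Impressions like these can be checked roughly by counting, per cluster, how many top-venue entries mention a given kind of place. The sketch below assumes a table laid out like `toronto_merged` (a `Cluster Labels` column plus the `... Most Common Venue` columns); the helper `keyword_counts` and the toy rows are hypothetical and are not the real results.

```python
import re
import pandas as pd

def keyword_counts(merged, keywords):
    """Count top-venue entries per cluster whose category contains any keyword."""
    venue_cols = [c for c in merged.columns if c.endswith('Most Common Venue')]
    # one row per (neighborhood, rank) pair
    long_form = merged.melt(id_vars='Cluster Labels', value_vars=venue_cols,
                            value_name='Category')
    pattern = '|'.join(re.escape(k) for k in keywords)
    hits = long_form['Category'].str.contains(pattern, case=False, regex=True)
    return hits.groupby(long_form['Cluster Labels']).sum()

# toy rows in the layout of `toronto_merged`
toy = pd.DataFrame({
    'Cluster Labels': [1, 1, 2],
    '1st Most Common Venue': ['Clothing Store', 'Coffee Shop', 'Park'],
    '2nd Most Common Venue': ['Gym', 'Toy / Game Store', 'Italian Restaurant'],
})
counts = keyword_counts(toy, ['store', 'gym'])
print(counts)
```

Because clusters differ in size, dividing the counts by the number of neighborhoods in each cluster would give a fairer comparison.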