# Segmenting and Clustering Zipcode-based Neighborhoods in Toronto <a id="Top"></a>

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

Part 1. <a href="#Part_1">Create zipcode and neighborhood dataframe</a>

Part 2. <a href="#Part_2">Fetch Toronto zip code GPS coordinates</a>

Part 3. <a href="#Part_3">Explore and cluster the neighborhoods in Toronto</a>

Part 4. <a href="#Part_4">Cluster Zipcodes</a>

</font>
</div>

## Part 1. Create zipcode and neighborhood dataframe <a id="Part_1"></a> 
<a href="#Top">Back to page top</a>

Import necessary libraries

In [1]:
import numpy as np
import pandas as pd
import urllib3
import certifi
import requests
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans
from bs4 import BeautifulSoup
from pandas import json_normalize  # pandas >= 1.0; replaces pandas.io.json.json_normalize
from geopy.geocoders import Nominatim

Fetch the Toronto postcode Wikipedia page using the __urllib3__ package.

In [2]:
# Wikipedia page URL
page_url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

# Create a PoolManager that verifies certificates when making requests. 
# See: https://urllib3.readthedocs.io/en/latest/user-guide.html#ssl
http = urllib3.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=certifi.where())

# Fetch the desired Wiki page
page = http.request('GET', page_url)

To create the desired dataframe, we proceed with the following steps:
1. Create a BeautifulSoup object called soup, then retrieve from it the table that contains the zipcodes.
2. Convert the table into a pandas DataFrame using __pd.read_html__.
3. Clean up the table by removing unassigned zipcodes.
4. Aggregate neighborhoods that have the same zipcode.

#### Step 1: Scrape the Wikipedia page and fetch the zipcode table

In [3]:
# Create a BeautifulSoup object and get the associated table 
soup = BeautifulSoup(page.data, 'lxml')
table = soup.table

#### Step 2: Convert the table into a dataframe

In [4]:
# Use pandas read_html function to convert the table object into a DataFrame
df_import = pd.read_html(str(table))[0]
df_import.head()

Unnamed: 0,0,1,2
0,Postcode,Borough,Neighbourhood
1,M1A,Not assigned,Not assigned
2,M2A,Not assigned,Not assigned
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village


#### Step 3-1: Clean up the dataframe: Remove 'Not assigned' zipcodes.

In [5]:
# Use the first row as column names. Then we drop the first row and reset the index of the entire DataFrame.
df_import.columns = df_import.iloc[0]
df_import = df_import.drop([0])

# Ignore borough that is "Not assigned" and reset DataFrame index
df_import = df_import[ df_import.Borough != 'Not assigned' ]
df_import = df_import.reset_index(drop=True)

In [6]:
df_import.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights


#### Step 3-2: Clean up the dataframe: Assign "Not assigned" neighbourhoods with corresponding Borough names.

In [7]:
# Get the index of the rows whose neighbourhood is "Not assigned"
index_list = df_import[ df_import.Neighbourhood == 'Not assigned' ].index

# Assign these neighbourhoods with corresponding Borough names.
for i in index_list:
    df_import.at[i, 'Neighbourhood'] = df_import.at[i, 'Borough']

# Double check if there are leftovers...
#index_list = df_import[ df_import.Neighbourhood == 'Not assigned' ].index
#print(len(index_list))
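
The row-by-row loop above can also be written as a single vectorized assignment. The sketch below is only an illustration on a hand-made dataframe (the postcodes and names are made-up sample data, not rows from the Wikipedia table). It relies on `Series.where`, which keeps a value where the condition holds and takes the replacement elsewhere.

```python
import pandas as pd

# Hypothetical sample rows, for illustration only.
toy = pd.DataFrame({
    'Postcode': ['X1A', 'X2B'],
    'Borough': ['Some Borough', 'Other Borough'],
    'Neighbourhood': ['Not assigned', 'Some Neighbourhood'],
})

# Keep the neighbourhood where it is assigned; otherwise fall back to the borough.
assigned = toy['Neighbourhood'] != 'Not assigned'
toy['Neighbourhood'] = toy['Neighbourhood'].where(assigned, toy['Borough'])

print(toy)
```

Either form gives the same result; the vectorized one avoids the explicit index list.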

#### Step 4: Aggregate neighborhoods in the same zipcode area.
We group by 'Postcode' and 'Borough' columns, then use the __apply__ method to join the data in the same group and separate them by comma. Finally, we reset the dataframe's index.

In [8]:
df = df_import.groupby(['Postcode', 'Borough'])['Neighbourhood'].apply(', '.join).reset_index()
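
To see what the group-and-join step produces, here is a minimal sketch on three rows copied from the `df_import` preview above, where two neighbourhoods share the postcode M5A. The names `toy` and `toy_grouped` are introduced only for this illustration.

```python
import pandas as pd

# Three rows from the df_import preview; M5A appears twice.
toy = pd.DataFrame({
    'Postcode': ['M3A', 'M5A', 'M5A'],
    'Borough': ['North York', 'Downtown Toronto', 'Downtown Toronto'],
    'Neighbourhood': ['Parkwoods', 'Harbourfront', 'Regent Park'],
})

# Same pattern as the cell above: one row per (Postcode, Borough),
# with the neighbourhood names joined by a comma.
toy_grouped = (
    toy.groupby(['Postcode', 'Borough'])['Neighbourhood']
       .apply(', '.join)
       .reset_index()
)

print(toy_grouped)
```

The two M5A rows collapse into one row whose Neighbourhood value is the comma-separated list.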

### The zipcode and neighborhood dataframe
__Note that the Postcode sorting order is not the same as the one shown in the assignment instructions. Here the rows are sorted alphabetically by Postcode__.

In [9]:
#pd.set_option('display.max_rows', len(df))
df.head(15)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


Finally, print out the dimensions of the dataframe.

In [10]:
df.shape

(103, 3)

---

## Part 2. Fetch Toronto zip code GPS coordinates <a id="Part_2"></a>
<a href="#Top">Back to page top</a>

The Google geocoding API does not always return a result, so we load the CSV file prepared by the instructor instead (Option 2).

#### Option 1: Google Maps API

In [48]:
import geocoder
Longitude = []
Latitude = []
for zipcode in df['Postcode']:
    lat_lng_coords = None
    # loop until you get the coordinates
    while lat_lng_coords is None:
        g = geocoder.google('{}, Toronto, Ontario'.format(zipcode))
        lat_lng_coords = g.latlng

    lat = lat_lng_coords[0]
    lng = lat_lng_coords[1]
    Latitude.append(lat)
    Longitude.append(lng)
    print('{0}: ({1:12.6f}, {2:12.6f})'.format(zipcode, lat, lng))

M1B: (   43.806686,   -79.194353)
M1C: (   43.784535,   -79.160497)
M1E: (   43.763573,   -79.188711)
M1G: (   43.770992,   -79.216917)
M1H: (   43.773136,   -79.239476)
M1J: (   43.744734,   -79.239476)
M1K: (   43.727929,   -79.262029)
M1L: (   43.711112,   -79.284577)
M1M: (   43.716316,   -79.239476)
M1N: (   43.692657,   -79.264848)
M1P: (   43.757410,   -79.273304)
M1R: (   43.750071,   -79.295849)
M1S: (   43.794200,   -79.262029)
M1T: (   43.781638,   -79.304302)
M1V: (   43.815252,   -79.284577)
M1W: (   43.799525,   -79.318389)
M1X: (   43.836125,   -79.205636)
M2H: (   43.803762,   -79.363452)
M2J: (   43.778517,   -79.346556)
M2K: (   43.786947,   -79.385975)
M2L: (   43.757490,   -79.374714)
M2M: (   43.789053,   -79.408493)
M2N: (   43.770120,   -79.408493)
M2P: (   43.752758,   -79.400049)
M2R: (   43.782736,   -79.442259)
M3A: (   43.753259,   -79.329656)
M3B: (   43.745906,   -79.352188)
M3C: (   43.725900,   -79.340923)
M3H: (   43.754328,   -79.442259)
M3J: (   43.76
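
The `while` loop in Option 1 retries forever if the geocoder never answers. As a hedged sketch of one alternative, the helper below caps the number of attempts. `geocode_with_retry`, `lookup`, and `fake_lookup` are hypothetical names introduced here; `fake_lookup` merely stands in for a real geocoder call so the example runs offline.

```python
import time

def geocode_with_retry(lookup, query, max_attempts=5, delay_seconds=1.0):
    """Call lookup(query) until it returns coordinates or attempts run out.

    `lookup` is a hypothetical callable returning [lat, lng] or None.
    """
    for attempt in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay_seconds)  # brief pause before trying again
    return None  # the caller decides how to handle a postcode with no result


# Stand-in for a flaky geocoder: fails twice, then succeeds.
calls = {'n': 0}

def fake_lookup(query):
    calls['n'] += 1
    return None if calls['n'] < 3 else [43.806686, -79.194353]

result = geocode_with_retry(fake_lookup, 'M1B, Toronto, Ontario', delay_seconds=0)
print(result)
```

Returning `None` after the last attempt lets the notebook fall back to the CSV file instead of hanging.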

#### Option 2: Use the existing file provided with the instructions.

In [12]:
# Read in the CSV as a dataframe.
df_zipcode = pd.read_csv('Geospatial_Coordinates.csv')

Longitude = []
Latitude = []
for zipcode in df['Postcode']:
    # Look up the row in df_zipcode (not df) whose postal code matches
    row_id = df_zipcode[df_zipcode['Postal Code']==zipcode].index[0]
    lat = df_zipcode.at[row_id, 'Latitude']
    lng = df_zipcode.at[row_id, 'Longitude']
    Latitude.append(lat)
    Longitude.append(lng)
    print('{0}: ({1:12.6f}, {2:12.6f})'.format(zipcode, lat, lng))

M1B: (   43.806686,   -79.194353)
M1C: (   43.784535,   -79.160497)
M1E: (   43.763573,   -79.188711)
M1G: (   43.770992,   -79.216917)
M1H: (   43.773136,   -79.239476)
M1J: (   43.744734,   -79.239476)
M1K: (   43.727929,   -79.262029)
M1L: (   43.711112,   -79.284577)
M1M: (   43.716316,   -79.239476)
M1N: (   43.692657,   -79.264848)
M1P: (   43.757410,   -79.273304)
M1R: (   43.750072,   -79.295849)
M1S: (   43.794200,   -79.262029)
M1T: (   43.781638,   -79.304302)
M1V: (   43.815252,   -79.284577)
M1W: (   43.799525,   -79.318389)
M1X: (   43.836125,   -79.205636)
M2H: (   43.803762,   -79.363452)
M2J: (   43.778517,   -79.346556)
M2K: (   43.786947,   -79.385975)
M2L: (   43.757490,   -79.374714)
M2M: (   43.789053,   -79.408493)
M2N: (   43.770120,   -79.408493)
M2P: (   43.752758,   -79.400049)
M2R: (   43.782736,   -79.442259)
M3A: (   43.753259,   -79.329656)
M3B: (   43.745906,   -79.352188)
M3C: (   43.725900,   -79.340923)
M3H: (   43.754328,   -79.442259)
M3J: (   43.76

In [13]:
df['Latitude'] = Latitude
df['Longitude'] = Longitude
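
The lookup loop and the two column assignments above can also be expressed as a single merge. This is a minimal sketch rather than the notebook's method; it uses two postcodes and their coordinates from the output above, and the names `toy_df`, `toy_zipcode`, and `merged` are introduced only for illustration.

```python
import pandas as pd

# Two postcodes and their coordinates as they appear in the output above.
toy_df = pd.DataFrame({'Postcode': ['M1B', 'M1C']})
toy_zipcode = pd.DataFrame({
    'Postal Code': ['M1C', 'M1B'],
    'Latitude': [43.784535, 43.806686],
    'Longitude': [-79.160497, -79.194353],
})

# A left merge keeps every row of toy_df and matches on the postcode columns.
merged = toy_df.merge(
    toy_zipcode, how='left', left_on='Postcode', right_on='Postal Code'
).drop(columns='Postal Code')

print(merged)
```

With a left merge, a postcode missing from the CSV yields NaN coordinates rather than an IndexError from `.index[0]`.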

### The zipcode, neighborhood, and GPS coordinates dataframe
__Note that the Postcode sorting order is not the same as the one shown in the assignment instructions. Here the rows are sorted alphabetically by Postcode__.

In [14]:
df.head(15)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848


---

## Part 3. Explore and cluster the neighborhoods in Toronto <a id="Part_3"></a>
<a href="#Top">Back to page top</a>

#### Map out the zipcode neighborhoods using the folium package.
First get the GPS location of the City of Toronto. For the best result, we use the coordinates of Downtown Toronto.

In [15]:
address = 'Downtown Toronto, Toronto, CAN'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
#print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

Create a map of Toronto using the latitude and longitude values and overlay it with markers at the neighborhood GPS locations.

In [16]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood, borough, postcode in zip(df['Latitude'], df['Longitude'], df['Neighbourhood'], df['Borough'], df['Postcode']):
    label = ("NEIGHBORHOOD: {}, BOROUGH: {}, POSTCODE: {}").format(neighborhood, borough, postcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)
    
map_toronto

---

#### In the rest of the notebook, we'll just focus on boroughs that contain the word "Toronto."
Create a borough filter and apply it to the dataframe. Name the resulting dataframe __toronto__.

In [17]:
toronto = df[ df['Borough'].str.contains('Toronto') ].reset_index(drop=True)
toronto

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
5,M4P,Central Toronto,Davisville North,43.712751,-79.390197
6,M4R,Central Toronto,North Toronto West,43.715383,-79.405678
7,M4S,Central Toronto,Davisville,43.704324,-79.38879
8,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316
9,M4V,Central Toronto,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",43.686412,-79.400049


#### Map out downtown Toronto

Get the GPS location of Downtown Toronto.

In [18]:
address = 'Downtown Toronto, CAN'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
#print('The geographical coordinates of Downtown Toronto are {}, {}.'.format(latitude, longitude))

Create the map of Toronto Boroughs using latitude and longitude values.

In [19]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, neighborhood, borough, postcode in zip(
                                toronto['Latitude'], 
                                toronto['Longitude'], 
                                toronto['Neighbourhood'], 
                                toronto['Borough'], 
                                toronto['Postcode']):
    label = ("NEIGHBORHOOD: {}, BOROUGH: {}, POSTCODE: {}").format(neighborhood, borough, postcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)
    
map_toronto

### Utilize the Foursquare API to explore the Toronto neighborhoods and segment them.

In [20]:
# @hidden_cell
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (real value redacted)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (real value redacted)
VERSION = '20180927'

#print('Your credentials:')
#print('CLIENT_ID: ' + CLIENT_ID)
#print('CLIENT_SECRET:' + CLIENT_SECRET)

#### Let's begin with the first zipcode in the dataframe.

In [21]:
toronto.loc[0, 'Postcode']

'M4E'

Get the zipcode's latitude and longitude values.

In [22]:
postcode_latitude = toronto.loc[0, 'Latitude'] 
postcode_longitude = toronto.loc[0, 'Longitude']
postcode_name = toronto.loc[0, 'Postcode']

print('Latitude and longitude values of {} are {}, {}.'.format(postcode_name, 
                                                               postcode_latitude, 
                                                               postcode_longitude))

Latitude and longitude values of M4E are 43.67635739999999, -79.2930312.


#### Get the top 100 venues that are in the M4E postcode area within a radius of 500 meters.

In [23]:
# Set up the Foursquare API call
RADIUS = 500
LIMIT  = 100
url = 'https://api.foursquare.com/v2/venues/explore?client_id={0}&client_secret={1}&v={2}&ll={3},{4}&radius={5}&limit={6}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    postcode_latitude,    
    postcode_longitude,
    RADIUS,
    LIMIT)

# Fetch the top 100 venues
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5bb087d14434b97371526c6b'},
 'response': {'headerLocation': 'The Beaches',
  'headerFullLocation': 'The Beaches, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 4,
  'suggestedBounds': {'ne': {'lat': 43.680857404499996,
    'lng': -79.28682091449052},
   'sw': {'lat': 43.67185739549999, 'lng': -79.29924148550948}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4e77e3861f6ecf8d3648300c',
       'name': 'Starbucks',
       'location': {'address': '637 Kingston Rd.',
        'crossStreet': 'at Main St.',
        'lat': 43.67879837444001,
        'lng': -79.2980449760153,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.67879837444001,
          'lng': -79.2980449760153}],
        'distance'
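
The request URL above is assembled by string formatting. As a hedged alternative, the sketch below builds the same query string with `urllib.parse.urlencode`, which escapes parameter values automatically. `build_explore_url` is a hypothetical helper name, and `'ID'` and `'SECRET'` are placeholders rather than real credentials.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the venues/explore URL with URL-encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in the ll value literal, as in the URL above.
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params, safe=',')

# Placeholder credentials; the coordinates are those of M4E.
example_url = build_explore_url('ID', 'SECRET', '20180927', 43.6763574, -79.2930312, 500, 100)
print(example_url)
```

`requests.get` also accepts a `params` dictionary and performs the same kind of encoding, which would remove the need to build the string by hand.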

#### Define the function that extracts the category of the venue.

In [24]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
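
To illustrate what the function returns, here is a small sketch of the same idea on plain dictionaries. `first_category_name` is a hypothetical variant written for this example, and the second record is made up to show the empty-list case.

```python
def first_category_name(record):
    """Return the first category name, or None when the list is empty."""
    # Accept either the raw key or the flattened key, as get_category_type does.
    categories = record.get('categories', record.get('venue.categories', []))
    return categories[0]['name'] if categories else None

rows = [
    {'venue.name': 'Starbucks', 'venue.categories': [{'name': 'Coffee Shop'}]},
    {'venue.name': 'Unnamed venue', 'venue.categories': []},  # hypothetical empty case
]

names = [first_category_name(r) for r in rows]
print(names)
```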

Use the __get_category_type()__ function to clean up the json and convert it into a pandas dataframe.

In [25]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Starbucks,Coffee Shop,43.678798,-79.298045
1,Grover Pub and Grub,Pub,43.679181,-79.297215
2,Glen Stewart Ravine,Other Great Outdoors,43.6763,-79.294784
3,Upper Beaches,Neighborhood,43.680563,-79.292869


In [26]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

4 venues were returned by Foursquare.


### Explore the Toronto boroughs

We'd like to apply the above procedure to every postcode in the Toronto boroughs. For this purpose, we define the following function. Note that it returns the __nearby_venues__ dataframe and a Boolean list __venues_check_list__, which holds False for each postcode where the Foursquare API finds no venues within the given radius and True otherwise.

In [27]:
def getNearbyVenues(names, latitudes, longitudes, radius, limit):
    
    venues_check_list = []
    venues_list=[]
    idx = 0
    for name, lat, lng in zip(names, latitudes, longitudes):
        
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
        
        num_of_venues_found = len(results)
        if (num_of_venues_found == 0):
            venues_check_list.append(False)
        else:
            venues_check_list.append(True)
        print('{0:4d} Postcode: {1}, number of venues found:{2:6d}'.format(idx, name, num_of_venues_found))
        idx = idx + 1


    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postcode', 
                  'Postcode Latitude', 
                  'Postcode Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues, venues_check_list)
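
The two bookkeeping steps inside the function, building the Boolean check list and flattening the per-postcode lists into rows, can be seen in isolation in the sketch below. The data is a shortened, partly made-up stand-in for the API results (the empty list represents a postcode with no venues), so treat it as an illustration only.

```python
# Stand-in per-postcode results: one list of venue tuples per postcode.
venues_list = [
    [('M4E', 'Starbucks', 'Coffee Shop'), ('M4E', 'Grover Pub and Grub', 'Pub')],
    [],  # hypothetical postcode for which nothing was returned
    [('M4K', 'Pantheon', 'Greek Restaurant')],
]

# True where at least one venue was found, False otherwise.
venues_check_list = [len(found) > 0 for found in venues_list]

# Flatten the nested lists into one row per venue, as done before building the dataframe.
rows = [item for venue_list in venues_list for item in venue_list]

print(venues_check_list)
print(len(rows))
```

A postcode with an empty result contributes no rows, which is why the check list is needed to tell which postcodes are absent from the final dataframe.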

Apply the above function and get a new dataframe __toronto_venues__ and the corresponding Boolean venue check list.

In [28]:
postcodes = toronto.loc[:, 'Postcode']
latitudes = toronto.loc[:, 'Latitude']
longitudes = toronto.loc[:, 'Longitude']

print('\n            Search radius: {0:8.1f} meters'.format(RADIUS))
print(' Maximum number of venues: {0:6d}\n'.format(LIMIT))
toronto_venues, toronto_venues_check_list = getNearbyVenues(postcodes, latitudes, longitudes, RADIUS, LIMIT)


            Search radius:    500.0 meters
 Maximum number of venues:    100

   0 Postcode: M4E, number of venues found:     4
   1 Postcode: M4K, number of venues found:    42
   2 Postcode: M4L, number of venues found:    16
   3 Postcode: M4M, number of venues found:    41
   4 Postcode: M4N, number of venues found:     4
   5 Postcode: M4P, number of venues found:     8
   6 Postcode: M4R, number of venues found:    20
   7 Postcode: M4S, number of venues found:    37
   8 Postcode: M4T, number of venues found:     5
   9 Postcode: M4V, number of venues found:    16
  10 Postcode: M4W, number of venues found:     4
  11 Postcode: M4X, number of venues found:    48
  12 Postcode: M4Y, number of venues found:    89
  13 Postcode: M5A, number of venues found:    49
  14 Postcode: M5B, number of venues found:   100
  15 Postcode: M5C, number of venues found:   100
  16 Postcode: M5E, number of venues found:    54
  17 Postcode: M5G, number of venues found:    83
  18 Postcode: M5H, n

Check the size of the resulting dataframe.

In [29]:
print(toronto_venues.shape)
toronto_venues.head()

(1697, 7)


Unnamed: 0,Postcode,Postcode Latitude,Postcode Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,M4E,43.676357,-79.293031,Starbucks,43.678798,-79.298045,Coffee Shop
1,M4E,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
2,M4E,43.676357,-79.293031,Glen Stewart Ravine,43.6763,-79.294784,Other Great Outdoors
3,M4E,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
4,M4K,43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant


Check the number of venues for each zipcode.

In [30]:
toronto_venues.groupby('Postcode').count()

Unnamed: 0_level_0,Postcode Latitude,Postcode Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Postcode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
M4E,4,4,4,4,4,4
M4K,42,42,42,42,42,42
M4L,16,16,16,16,16,16
M4M,41,41,41,41,41,41
M4N,4,4,4,4,4,4
M4P,8,8,8,8,8,8
M4R,20,20,20,20,20,20
M4S,37,37,37,37,37,37
M4T,5,5,5,5,5,5
M4V,16,16,16,16,16,16


Find out how many unique categories appear among all the returned venues.

In [31]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 232 unique categories.


### Analyze each postcode neighborhood

In [32]:
pd.set_option('display.max_columns', 500)

One-hot encode the venue categories.

In [33]:
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add postcode column back to dataframe
toronto_onehot['Postcode'] = toronto_venues['Postcode'] 

# move postcode column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.shape

(1697, 233)

Group the rows by zipcode and take the mean of each category column, which gives the frequency of occurrence of each category.

In [34]:
toronto_grouped = toronto_onehot.groupby('Postcode').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Postcode,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Arepa Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Butcher,Café,Cajun / Creole Restaurant,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Gym,College Rec Center,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hostel,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Korean Restaurant,Lake,Latin American Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Mac & Cheese Joint,Market,Martial Arts Dojo,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Park,Performing Arts Venue,Persian Restaurant,Pet Store,Pharmacy,Piano Bar,Pizza Place,Plane,Playground,Plaza,Portuguese Restaurant,Poutine Place,Pub,Ramen Restaurant,Record Shop,Recording Studio,Rental Car Location,Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Women's Store,Yoga Studio
0,M4E,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M4K,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.02381,0.02381,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.238095,0.02381,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.02381,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.02381,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381
2,M4L,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M4M,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04878,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04878,0.02439,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.02439,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.097561,0.0,0.0,0.02439,0.02439,0.0,0.0,0.0,0.02439,0.0,0.073171,0.0,0.0,0.0,0.02439,0.0,0.0,0.02439,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04878,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.04878,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439
4,M4N,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
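
The link between one-hot columns and frequencies is easier to see on a tiny example. In the sketch below, the four M4E categories come from the `toronto_venues` preview, while `X0X` is a made-up postcode added so that the frequencies are not all equal. The `toy_*` names are introduced only for this illustration.

```python
import pandas as pd

# M4E rows come from the toronto_venues preview; 'X0X' is a made-up postcode.
toy_venues = pd.DataFrame({
    'Postcode': ['M4E', 'M4E', 'M4E', 'M4E', 'X0X', 'X0X', 'X0X', 'X0X'],
    'Venue Category': ['Coffee Shop', 'Pub', 'Other Great Outdoors', 'Neighborhood',
                       'Coffee Shop', 'Coffee Shop', 'Coffee Shop', 'Pub'],
})

# One indicator column per category (dtype=float keeps the columns numeric).
toy_onehot = pd.get_dummies(
    toy_venues[['Venue Category']], prefix='', prefix_sep='', dtype=float
)
toy_onehot['Postcode'] = toy_venues['Postcode']

# The mean of a 0/1 column is the share of venues in that category.
toy_grouped = toy_onehot.groupby('Postcode').mean()
print(toy_grouped)
```

Each of the four M4E categories gets a frequency of 0.25, matching the first row of the table above.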


Double check the size of the one-hot encoded and aggregated dataframe.

In [35]:
toronto_grouped.shape

(38, 233)

Print out each zipcode neighborhood along with the top 5 most common venues.

In [36]:
num_top_venues = 5

for zipcode in toronto_grouped['Postcode']:
    print("---- " + zipcode + " ----")
    temp = toronto_grouped[toronto_grouped['Postcode'] == zipcode].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

---- M4E ----
                  venue  freq
0           Coffee Shop  0.25
1                   Pub  0.25
2  Other Great Outdoors  0.25
3          Neighborhood  0.25
4     Accessories Store  0.00


---- M4K ----
                venue  freq
0    Greek Restaurant  0.24
1      Ice Cream Shop  0.07
2         Coffee Shop  0.07
3           Bookstore  0.05
4  Italian Restaurant  0.05


---- M4L ----
                venue  freq
0           Pet Store  0.06
1  Italian Restaurant  0.06
2   Fish & Chips Shop  0.06
3        Burger Joint  0.06
4       Burrito Place  0.06


---- M4M ----
                 venue  freq
0                 Café  0.10
1          Coffee Shop  0.07
2   Italian Restaurant  0.05
3            Gastropub  0.05
4  American Restaurant  0.05


---- M4N ----
                venue  freq
0                Park  0.25
1  Dim Sum Restaurant  0.25
2         Swim School  0.25
3            Bus Line  0.25
4   Accessories Store  0.00


---- M4P ----
               venue  freq
0              Hotel 

Define a function that sorts a row's venue categories by frequency in descending order and returns the most common ones.

In [37]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
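
As a quick sanity check, the function can be applied to a small hand-made row. The postcode, venue names, and frequencies below are invented for illustration; only the row layout (a Postcode entry followed by frequency columns) mirrors toronto_grouped.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the postcode) and rank the remaining frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical row: one postcode with made-up venue frequencies
toy_row = pd.Series({'Postcode': 'M0X', 'Bakery': 0.1, 'Park': 0.6, 'Pub': 0.3})

top_two = return_most_common_venues(toy_row, 2)
print(list(top_two))  # ['Park', 'Pub']
```

When several venues share the same frequency, their relative order after sorting is not guaranteed, which is likely why tied venues can appear in a different order here than in the printed top-5 lists.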

Create a dataframe that contains venues in descending order for each zipcode area.

In [38]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Postcode']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
postcode_venues_sorted = pd.DataFrame(columns=columns)
postcode_venues_sorted['Postcode'] = toronto_grouped['Postcode']

for ind in np.arange(toronto_grouped.shape[0]):
    postcode_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

postcode_venues_sorted

Unnamed: 0,Postcode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,Neighborhood,Coffee Shop,Other Great Outdoors,Pub,Dim Sum Restaurant,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant
1,M4K,Greek Restaurant,Ice Cream Shop,Coffee Shop,Italian Restaurant,Bookstore,Fruit & Vegetable Store,Furniture / Home Store,Brewery,Bubble Tea Shop,Café
2,M4L,Pet Store,Ice Cream Shop,Movie Theater,Pub,Burrito Place,Burger Joint,Sandwich Place,Brewery,Liquor Store,Fast Food Restaurant
3,M4M,Café,Coffee Shop,Italian Restaurant,Bakery,Gastropub,American Restaurant,Fish Market,Juice Bar,New American Restaurant,Latin American Restaurant
4,M4N,Bus Line,Park,Swim School,Dim Sum Restaurant,Yoga Studio,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant
5,M4P,Hotel,Grocery Store,Park,Sandwich Place,Breakfast Spot,Burger Joint,Food & Drink Shop,Yoga Studio,Dumpling Restaurant,Doner Restaurant
6,M4R,Sporting Goods Shop,Coffee Shop,Yoga Studio,Diner,Spa,Sandwich Place,Salon / Barbershop,Mexican Restaurant,Rental Car Location,Café
7,M4S,Coffee Shop,Pizza Place,Dessert Shop,Sandwich Place,Café,Italian Restaurant,Pharmacy,Seafood Restaurant,Sushi Restaurant,Burger Joint
8,M4T,Gym,Playground,Intersection,Trail,Restaurant,Cosmetics Shop,Coworking Space,Farmers Market,Falafel Restaurant,Event Space
9,M4V,Coffee Shop,Pub,Light Rail Station,American Restaurant,Sushi Restaurant,Bagel Shop,Fried Chicken Joint,Sports Bar,Medical Center,Sandwich Place
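
The column labels above rely on an index lookup failing after the third entry. A minimal sketch of an alternative is shown below, using a hypothetical helper named ordinal_label that is not part of the notebook. It yields the same ten labels and would also stay correct if num_top_venues were raised past 20 (for example, "21st" rather than "21th").

```python
def ordinal_label(n):
    """Return n with an English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    # Numbers ending in 11, 12, or 13 are irregular and always take 'th'
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10

# Build the same column names as the notebook cell, without try/except
columns = ['Postcode'] + [
    '{} Most Common Venue'.format(ordinal_label(i))
    for i in range(1, num_top_venues + 1)
]
print(columns[:4])
```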


## Part 4. Cluster Zipcodes <a id="Part_4"></a>
<a href="#Top">Back to page top</a>

Run $k$-means to cluster the zipcode areas into 5 clusters.

In [39]:
# Set the number of clusters
kclusters = 5

# Drop the label column so that only the venue frequencies are clustered
toronto_grouped_clustering = toronto_grouped.drop(columns='Postcode')

# Run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=34).fit(toronto_grouped_clustering)

# Check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 0, 0, 0, 3, 0, 0, 0, 0, 0], dtype=int32)
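
It can help to check how evenly the postcodes are spread across clusters before interpreting them. The sketch below counts cluster sizes for the ten labels printed above; in the notebook itself one would presumably pass the full kmeans.labels_ array instead.

```python
import numpy as np

# First ten labels, copied from the output above
labels = np.array([1, 0, 0, 0, 3, 0, 0, 0, 0, 0])
kclusters = 5

# Count how many postcodes fall into each cluster id (0 .. kclusters-1)
cluster_sizes = np.bincount(labels, minlength=kclusters)
for cluster_id, size in enumerate(cluster_sizes):
    print('cluster {}: {} postcode(s)'.format(cluster_id, size))
```

For this sample, most postcodes land in cluster 0, which suggests the clustering is dominated by one large group plus a few near-singleton clusters.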

Create a new dataframe that includes the cluster label as well as the top 10 venues for each zipcode area. Note that the venue check list is applied to filter out NULL rows.

In [40]:
# Create the dataframe
# Copy the filtered rows so that adding a column does not modify a view of toronto
toronto_merged = toronto[toronto_venues_check_list].copy()

# Add clustering labels
toronto_merged['Cluster Labels'] = kmeans.labels_

# Join the top-venue columns from postcode_venues_sorted onto each row by Postcode
toronto_merged = toronto_merged.join(postcode_venues_sorted.set_index('Postcode'), on='Postcode')

toronto_merged.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,1,Neighborhood,Coffee Shop,Other Great Outdoors,Pub,Dim Sum Restaurant,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,0,Greek Restaurant,Ice Cream Shop,Coffee Shop,Italian Restaurant,Bookstore,Fruit & Vegetable Store,Furniture / Home Store,Brewery,Bubble Tea Shop,Café
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572,0,Pet Store,Ice Cream Shop,Movie Theater,Pub,Burrito Place,Burger Joint,Sandwich Place,Brewery,Liquor Store,Fast Food Restaurant
3,M4M,East Toronto,Studio District,43.659526,-79.340923,0,Café,Coffee Shop,Italian Restaurant,Bakery,Gastropub,American Restaurant,Fish Market,Juice Bar,New American Restaurant,Latin American Restaurant
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,3,Bus Line,Park,Swim School,Dim Sum Restaurant,Yoga Studio,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant
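
The assignment toronto_merged['Cluster Labels'] = kmeans.labels_ matches labels to rows by position, so it assumes the filtered toronto rows are in the same order as toronto_grouped. A minimal sketch of a key-based alternative is shown below; the postcodes, coordinates, and labels are invented, and the variable names are stand-ins rather than the notebook's own.

```python
import pandas as pd

# Hypothetical stand-ins for the notebook's dataframes
grouped = pd.DataFrame({'Postcode': ['M1A', 'M1B', 'M1C']})  # row order used for clustering
labels = [2, 0, 1]                                           # one label per grouped row
locations = pd.DataFrame({
    'Postcode': ['M1C', 'M1A', 'M1B'],                       # deliberately shuffled
    'Latitude': [43.3, 43.1, 43.2],
})

# Pair each label with its postcode, then merge on the key rather than on position
label_table = grouped[['Postcode']].assign(**{'Cluster Labels': labels})
merged = locations.merge(label_table, on='Postcode', how='left')
print(merged)
```

Merging on Postcode keeps each label attached to the right area even if the two dataframes are sorted differently.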


#### Visualize the resulting clusters

In [41]:
address = 'Downtown Toronto, CAN'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map of Toronto using latitude and longitude values
map_clusters = folium.Map(location=[latitude+0.02, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**3.2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to map
markers_colors = []
for lat, lng, postcode, borough, neighborhood, cluster in zip(
                                  toronto_merged['Latitude'], 
                                  toronto_merged['Longitude'], 
                                  toronto_merged['Postcode'], 
                                  toronto_merged['Borough'],
                                  toronto_merged['Neighbourhood'],
                                  toronto_merged['Cluster Labels']):
    label = ("CLUSTER : {}, NEIGHBORHOOD: {}, BOROUGH: {}, POSTCODE: {}").format(cluster, neighborhood, borough, postcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=7,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
    
map_clusters

### Examine clusters

Define a function that returns detailed information of a given cluster.

In [42]:
def examine_clusters(cluster_id):
    # Keep Postcode, Borough, and every column from 'Cluster Labels' onward
    return toronto_merged.loc[toronto_merged['Cluster Labels'] == cluster_id, toronto_merged.columns[[0] + [1] + list(range(5, toronto_merged.shape[1]))]]

#### Cluster 1

In [43]:
examine_clusters(0)

Unnamed: 0,Postcode,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,M4K,East Toronto,0,Greek Restaurant,Ice Cream Shop,Coffee Shop,Italian Restaurant,Bookstore,Fruit & Vegetable Store,Furniture / Home Store,Brewery,Bubble Tea Shop,Café
2,M4L,East Toronto,0,Pet Store,Ice Cream Shop,Movie Theater,Pub,Burrito Place,Burger Joint,Sandwich Place,Brewery,Liquor Store,Fast Food Restaurant
3,M4M,East Toronto,0,Café,Coffee Shop,Italian Restaurant,Bakery,Gastropub,American Restaurant,Fish Market,Juice Bar,New American Restaurant,Latin American Restaurant
5,M4P,Central Toronto,0,Hotel,Grocery Store,Park,Sandwich Place,Breakfast Spot,Burger Joint,Food & Drink Shop,Yoga Studio,Dumpling Restaurant,Doner Restaurant
6,M4R,Central Toronto,0,Sporting Goods Shop,Coffee Shop,Yoga Studio,Diner,Spa,Sandwich Place,Salon / Barbershop,Mexican Restaurant,Rental Car Location,Café
7,M4S,Central Toronto,0,Coffee Shop,Pizza Place,Dessert Shop,Sandwich Place,Café,Italian Restaurant,Pharmacy,Seafood Restaurant,Sushi Restaurant,Burger Joint
8,M4T,Central Toronto,0,Gym,Playground,Intersection,Trail,Restaurant,Cosmetics Shop,Coworking Space,Farmers Market,Falafel Restaurant,Event Space
9,M4V,Central Toronto,0,Coffee Shop,Pub,Light Rail Station,American Restaurant,Sushi Restaurant,Bagel Shop,Fried Chicken Joint,Sports Bar,Medical Center,Sandwich Place
11,M4X,Downtown Toronto,0,Restaurant,Coffee Shop,Pizza Place,Indian Restaurant,Café,Pub,Italian Restaurant,Market,Bakery,Park
12,M4Y,Downtown Toronto,0,Japanese Restaurant,Coffee Shop,Burger Joint,Sushi Restaurant,Gay Bar,Restaurant,Pub,Mediterranean Restaurant,Gastropub,Café


#### Cluster 2

In [44]:
examine_clusters(1)

Unnamed: 0,Postcode,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,1,Neighborhood,Coffee Shop,Other Great Outdoors,Pub,Dim Sum Restaurant,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant


#### Cluster 3

In [45]:
examine_clusters(2)

Unnamed: 0,Postcode,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,M4W,Downtown Toronto,2,Park,Playground,Trail,Yoga Studio,Diner,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store
23,M5P,Central Toronto,2,Park,Trail,Jewelry Store,Sushi Restaurant,Yoga Studio,Discount Store,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant


#### Cluster 4

In [46]:
examine_clusters(3)

Unnamed: 0,Postcode,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,M4N,Central Toronto,3,Bus Line,Park,Swim School,Dim Sum Restaurant,Yoga Studio,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant


#### Cluster 5

In [47]:
examine_clusters(4)

Unnamed: 0,Postcode,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,M5N,Central Toronto,4,Garden,Yoga Studio,Discount Store,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant


Areas with many coffee shops and restaurants are grouped together in Cluster 1 (label 0). Clusters 3, 4, and 5 (labels 2, 3, and 4) are characterized by different types of recreational venues, such as parks, trails, and gardens.