<a href="https://colab.research.google.com/github/NehaAwasthi07/Coursera_Capstone/blob/main/Segmenting_and_Clustering_Neighborhoods_in_Toronto.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Segmenting and Clustering Neighborhoods in Toronto**
# **Introduction**
In this lab, we will explore, segment, and cluster the neighborhoods in the city of Toronto.

For the Toronto neighborhood data, we will obtain the information we need to explore and cluster the neighborhoods in Toronto from a Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M. We will scrape the page and wrangle the data, clean it, and read it into a pandas dataframe.

Once the data is in a structured format, we will analyse the data to explore and cluster the neighborhoods in the city of Toronto


Before we get the data and start exploring it, let's download all the dependencies that we will need.

In [1]:
import numpy as np 
from bs4 import BeautifulSoup
import pandas as pd 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json 
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans # import k-means from clustering stage
!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')

/bin/bash: conda: command not found
/bin/bash: conda: command not found
Libraries imported.


# **Convert data from the Wikipedia page into a structured Dataset**
Scrape the Wikipedia page and wrangle the data, clean it, and then read it into a pandas  dataframe

In [2]:
url =  'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
data = requests.get(url).text
soup = BeautifulSoup(data, 'html5lib')
table_contents=[]
table=soup.find('table')
for row in table.findAll('td'):
    cell = {}
    if row.span.text=='Not assigned':
        pass
    else:
        cell['PostalCode'] = row.p.text[:3]
        cell['Borough'] = (row.span.text).split('(')[0]
        cell['Neighborhood'] = (((((row.span.text).split('(')[1]).strip(')')).replace(' /',',')).replace(')',' ')).strip(' ')
        table_contents.append(cell)

# print(table_contents)
df=pd.DataFrame(table_contents)
df['Borough']=df['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government


# **Download and Explore Dataset**

In [3]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government


Group all neighborhoods with the same postal code

In [4]:
df = df.groupby(["PostalCode", "Borough"])["Neighborhood"].apply(", ".join).reset_index()
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [5]:
df.shape

(103, 3)

# **Get the latitude and the longitude coordinates of each neighborhood**

In [6]:
df_geocoord = pd.read_csv('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs_v1/Geospatial_Coordinates.csv', index_col='Postal Code')
df_geocoord.head()

Unnamed: 0_level_0,Latitude,Longitude
Postal Code,Unnamed: 1_level_1,Unnamed: 2_level_1
M1B,43.806686,-79.194353
M1C,43.784535,-79.160497
M1E,43.763573,-79.188711
M1G,43.770992,-79.216917
M1H,43.773136,-79.239476


Join both the dataframes based on postal code

In [7]:
df_toronto = pd.merge(df, df_geocoord, how='left', left_on = 'PostalCode', right_on = 'Postal Code')
# Removing all boroughs which are not assigned and removing any rows with missing longitude or latitude data
df_toronto = df_toronto[df_toronto['Borough'] != 'Not assigned']
df_toronto = df_toronto.dropna(subset=['Latitude', 'Longitude'])
df_toronto.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [8]:
print('The Toronto dataframe has {} boroughs and {} neighborhoods.'.format(len(df_toronto['Borough'].unique()),df_toronto.shape[0]))

The Toronto dataframe has 15 boroughs and 103 neighborhoods.


# **Explore Neighborhoods in Toronto City**
Get the latitude and longitude values of Toronto

In [9]:
address = "Toronto, ON"

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto city are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto city are 43.6534817, -79.3839347.


Create a map of the whole Toronto City with neighborhoods superimposed on top

In [10]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)
# Add markers to the map
for lat, lng, borough, neighborhood in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Borough'], df_toronto['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  

map_toronto

# **Map of a part of Toronto City**
We will work only with Toronto City, so we will work with the boroughs that contain the word "Toronto".

In [11]:
df_toronto_city = df_toronto[df_toronto['Borough'].str.contains("Toronto")].reset_index(drop=True)
df_toronto_city.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4J,East York/East Toronto,The Danforth East,43.685347,-79.338106
2,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
3,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572
4,M4M,East Toronto,Studio District,43.659526,-79.340923


Plot the map and the markers for this region

In [12]:
map_toronto_city = folium.Map(location=[latitude, longitude], zoom_start=12)
for lat, lng, borough, neighborhood in zip(
        df_toronto_city['Latitude'], 
        df_toronto_city['Longitude'], 
        df_toronto_city['Borough'], 
        df_toronto_city['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto_city)  

map_toronto_city

# **Utilizing the Foursquare API to explore the neighborhoods and segment them**
Define Foursquare Credentials and Version

In [13]:
CLIENT_ID = 'BJV5WLCNRLEDXB5U0ZICJRCHMU0WQUF1JI0SFS04FAAUU3XP'
CLIENT_SECRET = 'J5ZEUGN12SA52TILQVYLIJJC5PUF13HC12OJSC24E2DCJJDR'
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: BJV5WLCNRLEDXB5U0ZICJRCHMU0WQUF1JI0SFS04FAAUU3XP
CLIENT_SECRET:J5ZEUGN12SA52TILQVYLIJJC5PUF13HC12OJSC24E2DCJJDR


Explore the first neighborhood in our data frame

In [14]:
neighborhood_name = df_toronto_city.loc[0, 'Neighborhood']
print(f"The first neighborhood's name is '{neighborhood_name}'.")

The first neighborhood's name is 'The Beaches'.


Get the latitude and longitude values of the neighborhood

In [15]:
neighborhood_latitude = df_toronto_city.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df_toronto_city.loc[0, 'Longitude'] # neighborhood longitude value

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of The Beaches are 43.67635739999999, -79.2930312.


Now, let's get the top 100 venues that are in The Beaches within a radius of 500 meters.

In [16]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id=BJV5WLCNRLEDXB5U0ZICJRCHMU0WQUF1JI0SFS04FAAUU3XP&client_secret=J5ZEUGN12SA52TILQVYLIJJC5PUF13HC12OJSC24E2DCJJDR&v=20180605&ll=43.67635739999999,-79.2930312&radius=500&limit=100'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=BJV5WLCNRLEDXB5U0ZICJRCHMU0WQUF1JI0SFS04FAAUU3XP&client_secret=J5ZEUGN12SA52TILQVYLIJJC5PUF13HC12OJSC24E2DCJJDR&v=20180605&ll=43.67635739999999,-79.2930312&radius=500&limit=100'

Send the GET request and examine the results

In [17]:
# get the result to a json file
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '609377ce9d54ea650b4ec209'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4bd461bc77b29c74a07d9282-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/parks_outdoors/hikingtrail_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d159941735',
         'name': 'Trail',
         'pluralName': 'Trails',
         'primary': True,
         'shortName': 'Trail'}],
       'id': '4bd461bc77b29c74a07d9282',
       'location': {'address': 'Glen Manor',
        'cc': 'CA',
        'city': 'Toronto',
        'country': 'Canada',
        'crossStreet': 'Queen St.',
        'distance': 89,
        'formattedAddress': ['Glen Manor (Queen St.)', 'Toronto ON', 'Canada'],
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.67682

Let's borrow the get_category_type function from the Foursquare lab.

In [18]:
# Function that extracts the category of the venue

def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now let's clean the json and structure it into a pandas dataframe.

In [19]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

  This is separate from the ipykernel package so we can avoid doing imports until


Unnamed: 0,name,categories,lat,lng
0,Glen Manor Ravine,Trail,43.676821,-79.293942
1,The Big Carrot Natural Food Market,Health Food Store,43.678879,-79.297734
2,Grover Pub and Grub,Pub,43.679181,-79.297215
3,Upper Beaches,Neighborhood,43.680563,-79.292869


In [20]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

4 venues were returned by Foursquare.


# **Explore neighborhoods in a part of Toronto City**
Let's create a function to repeat the same process to all the neighborhoods in Toronto

In [21]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    venues_list=[]
    
    for name, lat, lng in zip(names, latitudes, longitudes):
        # print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id=BJV5WLCNRLEDXB5U0ZICJRCHMU0WQUF1JI0SFS04FAAUU3XP&client_secret=J5ZEUGN12SA52TILQVYLIJJC5PUF13HC12OJSC24E2DCJJDR&v=20180605&ll=43.67635739999999,-79.2930312&radius=500&limit=100'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Now let's write the code to run the above function on each neighborhood and create a new dataframe called toronto_city_venues

In [22]:
toronto_city_venues = getNearbyVenues(names=df_toronto_city['Neighborhood'],
                                   latitudes=df_toronto_city['Latitude'],
                                   longitudes=df_toronto_city['Longitude']
                                  )
toronto_city_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
3,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
4,The Danforth East,43.685347,-79.338106,Glen Manor Ravine,43.676821,-79.293942,Trail


Let's check the size of the resulting dataframe

In [23]:
print(toronto_city_venues.shape)
toronto_city_venues.head()

(156, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
3,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
4,The Danforth East,43.685347,-79.338106,Glen Manor Ravine,43.676821,-79.293942,Trail


Let's check how many venues were returned for each neighborhood

In [24]:
toronto_city_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Berczy Park,4,4,4,4,4,4
"Brockton, Parkdale Village, Exhibition Place",4,4,4,4,4,4
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",4,4,4,4,4,4
Central Bay Street,4,4,4,4,4,4
Christie,4,4,4,4,4,4
Church and Wellesley,4,4,4,4,4,4
"Commerce Court, Victoria Hotel",4,4,4,4,4,4
Davisville,4,4,4,4,4,4
Davisville North,4,4,4,4,4,4
"Dufferin, Dovercourt Village",4,4,4,4,4,4


Let's find out how many unique categories can be curated from all the returned venues

In [25]:
print('There are {} uniques categories.'.format(len(toronto_city_venues['Venue Category'].unique())))

There are 4 uniques categories.


# **Analyze Each Neighborhood**

In [26]:
# one hot encoding
toronto_city_onehot = pd.get_dummies(toronto_city_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_city_onehot['Neighborhood'] = toronto_city_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_city_onehot.columns[-1]] + list(toronto_city_onehot.columns[:-1])
toronto_city_onehot = toronto_city_onehot[fixed_columns]

toronto_city_onehot.head()

Unnamed: 0,Trail,Health Food Store,Neighborhood,Pub
0,1,0,The Beaches,0
1,0,1,The Beaches,0
2,0,0,The Beaches,1
3,0,0,The Beaches,0
4,1,0,The Danforth East,0


Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [27]:
toronto_city_grouped = toronto_city_onehot.groupby('Neighborhood').mean().reset_index()
toronto_city_grouped.head()

Unnamed: 0,Neighborhood,Trail,Health Food Store,Pub
0,Berczy Park,0.25,0.25,0.25
1,"Brockton, Parkdale Village, Exhibition Place",0.25,0.25,0.25
2,"CN Tower, King and Spadina, Railway Lands, Har...",0.25,0.25,0.25
3,Central Bay Street,0.25,0.25,0.25
4,Christie,0.25,0.25,0.25


In [28]:
toronto_city_grouped.shape

(39, 4)

Display each neighborhood along with its top 3 most common venues in a dataframe

In [29]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 3

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_city_grouped['Neighborhood']

for ind in np.arange(toronto_city_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_city_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,Berczy Park,Pub,Health Food Store,Trail
1,"Brockton, Parkdale Village, Exhibition Place",Pub,Health Food Store,Trail
2,"CN Tower, King and Spadina, Railway Lands, Har...",Pub,Health Food Store,Trail
3,Central Bay Street,Pub,Health Food Store,Trail
4,Christie,Pub,Health Food Store,Trail


# **Cluster Neighborhoods**
Run k-means to cluster the neighborhood into 5 clusters

In [30]:
# set number of clusters
kclusters = 5

toronto_city_grouped_clustering = toronto_city_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_city_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

  import sys


array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)

Let's create a new dataframe that includes the cluster as well as the top 3 venues for each neighborhood.

In [31]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_city_merged = df_toronto_city

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_city_merged = toronto_city_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_city_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Pub,Health Food Store,Trail
1,M4J,East York/East Toronto,The Danforth East,43.685347,-79.338106,0,Pub,Health Food Store,Trail
2,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,0,Pub,Health Food Store,Trail
3,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572,0,Pub,Health Food Store,Trail
4,M4M,East Toronto,Studio District,43.659526,-79.340923,0,Pub,Health Food Store,Trail


Finally, let's visualize the resulting clusters

In [32]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(
        toronto_city_merged['Latitude'], 
        toronto_city_merged['Longitude'], 
        toronto_city_merged['Neighborhood'], 
        toronto_city_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

# **Examine Clusters**
Now, Let's examine each cluster and determine the discriminating venue categories that distinguish each cluster.

**Cluster 1**

In [33]:
toronto_city_merged.loc[toronto_city_merged['Cluster Labels'] == 0, toronto_city_merged.columns[[1] + list(range(5, toronto_city_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,East Toronto,0,Pub,Health Food Store,Trail
1,East York/East Toronto,0,Pub,Health Food Store,Trail
2,East Toronto,0,Pub,Health Food Store,Trail
3,East Toronto,0,Pub,Health Food Store,Trail
4,East Toronto,0,Pub,Health Food Store,Trail
5,Central Toronto,0,Pub,Health Food Store,Trail
6,Central Toronto,0,Pub,Health Food Store,Trail
7,Central Toronto,0,Pub,Health Food Store,Trail
8,Central Toronto,0,Pub,Health Food Store,Trail
9,Central Toronto,0,Pub,Health Food Store,Trail


**Cluster 2**

In [34]:
toronto_city_merged.loc[toronto_city_merged['Cluster Labels'] == 1, toronto_city_merged.columns[[1] + list(range(5, toronto_city_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue


**Cluster 3**

In [35]:
toronto_city_merged.loc[toronto_city_merged['Cluster Labels'] == 2, toronto_city_merged.columns[[1] + list(range(5, toronto_city_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue


**Cluster 4**

In [36]:
toronto_city_merged.loc[toronto_city_merged['Cluster Labels'] == 3, toronto_city_merged.columns[[1] + list(range(5, toronto_city_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue


**Cluster 5**

In [37]:
toronto_city_merged.loc[toronto_city_merged['Cluster Labels'] == 4, toronto_city_merged.columns[[1] + list(range(5, toronto_city_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
