# Segmenting and Clustering Neighborhoods in Toronto
All three parts of this assignment are in this one notebook, and each part is clearly labeled.

## Part 0: Load packages

In [1]:
%%capture

import numpy as np
import pandas as pd

# For web scraping
from bs4 import BeautifulSoup
import requests

# Latitude/longitude related packages
!pip install pgeocode
import pgeocode
!pip install geopy
import geopy.distance

# Visualization on a map
!pip install folium
import folium 

# Clustering
from sklearn.cluster import KMeans

## Part 1: Scrape Wikipedia page for Toronto

First use Requests and BeautifulSoup to scrape the Wikipedia page "List of postal codes of Canada: M". Store all tables in a variable.

In [2]:
# Get the html 
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
html = requests.get(url).text

# Turn into a beautiful soup
soup = BeautifulSoup(html, 'html5lib')

# Find all html tables
tables = soup.find_all('table')
print(f"{len(tables)} tables were found")

# Find the correct table index
for index,table in enumerate(tables):
    if ("M1A" in str(table)):
        tableIndex = index

table = tables[tableIndex]


3 tables were found
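The lookup loop above keeps the last table whose text contains "M1A" and leaves `tableIndex` undefined if no table matches. A minimal sketch of a stricter lookup, assuming the tables have already been converted to strings; the helper name `find_table_index` and the toy inputs are hypothetical and not part of the original notebook:

```python
def find_table_index(tables_as_text, marker="M1A"):
    """Return the index of the first table whose text contains `marker`."""
    for index, table_text in enumerate(tables_as_text):
        if marker in table_text:
            return index
    # Fail loudly instead of leaving the index undefined
    raise ValueError(f"No table contains {marker!r}")


# Toy stand-ins for str(table); the notebook could pass [str(t) for t in tables]
example_tables = ["<table>navigation</table>", "<table>M1A Not assigned</table>"]
print(find_table_index(example_tables))
```

Note that this returns the first match rather than the last, which only matters if more than one table mentions the marker.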


Now obtain the contents of each cell in the correct table (the one that actually contains the postal codes). Cells containing the string "Not assigned" are skipped and not stored in the dataframe ``neighborhoodsToronto``. The assignment states the following:
> if a cell has a borough but a "Not assigned" neighborhood, then the neighborhood will be the same as the borough.
 
However, a visual inspection of the Wikipedia page shows that this does not occur anywhere in the dataset: whenever a cell contains "Not assigned", it has no borough name either. It therefore suffices to filter out _all_ cells that contain "Not assigned". A number of borough names come out garbled, with the borough running straight into extra text; these are most likely the special-purpose codes mentioned on the Wikipedia page. They are cleaned up manually.

In [3]:
# Get the content of each cell
tableContents = []

for cell in table.find_all('td'):
    # Skip postal codes without a borough or neighborhood
    if "Not assigned" in str(cell):
        continue

    # The span text has the form "Borough(Neighborhood / Neighborhood)"
    spanText = cell.span.text
    neighborhood = spanText.split('(')[1].strip(')').replace(' /', ',').replace(')', ' ').strip(' ')

    tableContents.append({
        'PostalCode': cell.p.text[:3],
        'Borough': spanText.split('(')[0],
        'Neighborhood': neighborhood,
    })

# Transform into dataframe
neighborhoodsToronto = pd.DataFrame(tableContents)

# Clean up some odd borough/neighborhood names
neighborhoodsToronto['Borough'] = neighborhoodsToronto['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest',
                                             'East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})
neighborhoodsToronto.head(12)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills North
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"
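The string handling in the scraping cell is easier to follow in isolation. The sketch below applies the same chain of `split`, `strip` and `replace` calls to plain strings. The helper name `parse_cell_text` is hypothetical, and the example input is an assumption about what the span text looks like, inferred from the parsing code rather than copied from the page:

```python
def parse_cell_text(text):
    """Split 'Borough(Neighborhood / Neighborhood)' into (borough, neighborhoods)."""
    parts = text.split('(')
    borough = parts[0]
    if len(parts) < 2:
        # No parenthesized part: nothing to report as a neighborhood
        return borough, ''
    neighborhood = parts[1].strip(')').replace(' /', ',').replace(')', ' ').strip(' ')
    return borough, neighborhood


# Assumed raw text for postal code M5A
print(parse_cell_text("Downtown Toronto(Regent Park / Harbourfront)"))
```

Unlike the notebook cell, this version does not raise an error when the parentheses are missing; it returns an empty neighborhood instead.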


In [4]:
neighborhoodsToronto.shape

(103, 3)

## Part 2: Obtain latitude/longitude of each neighborhood

Use the ``pgeocode`` Python package to obtain the latitude and longitude of each postal code. The ``geocoder`` package suggested in the assignment did not work reliably: it needs _many_ repeated calls before it returns a result. I also tried the example from the package's website (for Mountain View, CA), but that returned None for more than 200 tries as well. That is not sustainable, so I switched packages.

In [5]:
# Initialize variables for latitude and longitude
latitude  = np.empty(neighborhoodsToronto.shape[0])
longitude = np.empty(neighborhoodsToronto.shape[0])

# Loop over all postal codes with the pgeocode package
canadaGeoCode = pgeocode.Nominatim('ca')
for postalCodeIndex in neighborhoodsToronto.index:
    postalCode = neighborhoodsToronto.loc[postalCodeIndex, 'PostalCode']
    
    locationInformation = canadaGeoCode.query_postal_code(postalCode)
    
    latitude[postalCodeIndex]  = locationInformation.latitude
    longitude[postalCodeIndex] = locationInformation.longitude    
    #print("%s, latitude: %.3f, longitude: %.3f" % (postalCode, latitude[postalCodeIndex], longitude[postalCodeIndex]))
    
# Add latitude/longitude to neighborhoodsToronto dataframe
neighborhoodsToronto['Latitude']  = latitude
neighborhoodsToronto['Longitude'] = longitude

neighborhoodsToronto.head(12)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.7545,-79.33
1,M4A,North York,Victoria Village,43.7276,-79.3148
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6555,-79.3626
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.7223,-79.4504
4,M7A,Queen's Park,Ontario Provincial Government,43.6641,-79.3889
5,M9A,Etobicoke,Islington Avenue,43.6662,-79.5282
6,M1B,Scarborough,"Malvern, Rouge",43.8113,-79.193
7,M3B,North York,Don Mills North,43.745,-79.359
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.7063,-79.3094
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.6572,-79.3783
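As far as I know, ``pgeocode`` reports NaN coordinates for postal codes it cannot find, so it is worth checking for gaps before mapping anything. A minimal sketch of such a check on plain tuples; the helper name `find_missing_coordinates` and the postal code "M0X" are made up for illustration:

```python
import math


def find_missing_coordinates(rows):
    """Return the postal codes whose latitude or longitude is NaN."""
    return [
        postal_code
        for postal_code, latitude, longitude in rows
        if math.isnan(latitude) or math.isnan(longitude)
    ]


# Toy rows; in the notebook these could come from
# neighborhoodsToronto[['PostalCode', 'Latitude', 'Longitude']].itertuples(index=False)
example_rows = [("M3A", 43.7545, -79.33), ("M0X", float("nan"), float("nan"))]
print(find_missing_coordinates(example_rows))
```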


## Part 3: Explore and cluster neighborhoods of Toronto

Let's first check which boroughs exist in the dataset.

In [6]:
boroughs = neighborhoodsToronto['Borough'].unique()
boroughs

array(['North York', 'Downtown Toronto', "Queen's Park", 'Etobicoke',
       'Scarborough', 'East York', 'York', 'East Toronto', 'West Toronto',
       'East York/East Toronto', 'Central Toronto', 'Mississauga',
       'Downtown Toronto Stn A', 'Etobicoke Northwest',
       'East Toronto Business'], dtype=object)

Create a new dataframe with only the neighborhoods in Downtown Toronto, so we can further explore this borough.

In [7]:
downtownToronto = neighborhoodsToronto[ neighborhoodsToronto['Borough'] == 'Downtown Toronto'].reset_index(drop=True)

print(f'There are {downtownToronto.shape[0]} neighborhoods in Downtown Toronto.')

There are 17 neighborhoods in Downtown Toronto.


Let's visualize these neighborhoods on a map!

In [8]:
# Create a map centered on the mean latitude and longitude of the neighborhoods
latitude_mean  = downtownToronto['Latitude'].mean()
longitude_mean = downtownToronto['Longitude'].mean()

mapDowntownToronto = folium.Map(location=[latitude_mean, longitude_mean], zoom_start=13)

# Add markers to map for each neighborhood
for latitude, longitude, label in zip(downtownToronto['Latitude'], downtownToronto['Longitude'], downtownToronto['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=label,
        color='#198d8d',
        fill=True,
        fill_color='#339999',
        fill_opacity=0.7,
        parse_html=False).add_to(mapDowntownToronto)  

# Display map
mapDowntownToronto

Compute the distance between every pair of neighborhoods and create a dataframe with each neighborhood's distance (in meters) to its nearest neighbor. This gives an idea of the radius to use for the Foursquare requests.

In [9]:
def CalculateDistanceToClosestNeighborhoods( boroughData ):

    numberOfNeighborhoods = boroughData.shape[0]
    distances = np.empty((numberOfNeighborhoods, numberOfNeighborhoods))

    for neighborhood_1 in range(numberOfNeighborhoods):
        coordinates_1 = [ boroughData['Latitude'][neighborhood_1], boroughData['Longitude'][neighborhood_1] ]
        for neighborhood_2 in range(numberOfNeighborhoods):
            if neighborhood_1 == neighborhood_2:
                distances[neighborhood_1][neighborhood_2] = np.inf
            else:
                coordinates_2 = [ boroughData['Latitude'][neighborhood_2], boroughData['Longitude'][neighborhood_2] ]
                distances[neighborhood_1][neighborhood_2] = geopy.distance.geodesic( coordinates_1, coordinates_2).m
        
    # Get the minimum distance for each neighborhood
    distanceToNextNeigborhood = pd.DataFrame( {'Neighborhood' : boroughData['Neighborhood'], 'distanceToNext' : distances.min(0) } )
    return distanceToNextNeigborhood

distanceToNextNeigborhood = CalculateDistanceToClosestNeighborhoods( downtownToronto )
distanceToNextNeigborhood

Unnamed: 0,Neighborhood,distanceToNext
0,"Regent Park, Harbourfront",1147.864451
1,"Garden District, Ryerson",627.463944
2,St. James Town,588.735566
3,Berczy Park,575.132361
4,Central Bay Street,627.463944
5,Christie,1857.830145
6,"Richmond, Adelaide, King",92.107987
7,"Harbourfront East, Union Station, Toronto Islands",1990.998183
8,"Toronto Dominion Centre, Design Exchange",255.542356
9,"Commerce Court, Victoria Hotel",0.0


The distance of 0.0 shows that "Commerce Court, Victoria Hotel" and "First Canadian Place, Underground city" are at the same location. Let's remove the second one so that we don't get duplicate data in our analysis.

In [10]:
downtownToronto = downtownToronto[ downtownToronto['Neighborhood'] != "First Canadian Place, Underground city"].reset_index(drop=True)

distanceToNextNeigborhood = CalculateDistanceToClosestNeighborhoods( downtownToronto )
distanceToNextNeigborhood

Unnamed: 0,Neighborhood,distanceToNext
0,"Regent Park, Harbourfront",1147.864451
1,"Garden District, Ryerson",627.463944
2,St. James Town,588.735566
3,Berczy Park,575.132361
4,Central Bay Street,627.463944
5,Christie,1857.830145
6,"Richmond, Adelaide, King",92.107987
7,"Harbourfront East, Union Station, Toronto Islands",1990.998183
8,"Toronto Dominion Centre, Design Exchange",255.542356
9,"Commerce Court, Victoria Hotel",92.107987
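The distances in these tables come from geopy's geodesic calculation. As a rough cross-check, the haversine formula gives the great-circle distance on a sphere. The sketch below is an approximation and not what geopy computes; the function name and the mean Earth radius of 6,371 km are assumptions:

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in meters on a sphere of the given radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


# Coordinates of M5K and M5L as listed in the notebook's tables
print(haversine_m(43.6469, -79.3823, 43.6492, -79.3823))
```

For these two points the result should land within a meter of the roughly 255.5 m geodesic value reported for "Toronto Dominion Centre, Design Exchange", which is accurate enough for choosing a search radius.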


Using this data, let's calculate the average distance to the nearest neighborhood and use half of it as the radius when searching for venues later. This is obviously an approximation, but for the sake of this assignment a single radius will suffice, rather than a separate radius for each neighborhood.

In [11]:
meanDistance = distanceToNextNeigborhood['distanceToNext'].mean()
radius = meanDistance / 2.
print('The average distance between neighborhoods is %.1f m.' % meanDistance)
print('We will therefore take a radius of %.1f m.' % radius)

The average distance between neighborhoods is 947.1 m.
We will therefore take a radius of 473.5 m.
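Written out, the radius is half the mean nearest-neighbor distance. The symbols are introduced here for illustration only: $d_{ij}$ is the geodesic distance between neighborhoods $i$ and $j$, and $N$ is the number of neighborhoods.

$$
r = \frac{1}{2N} \sum_{i=1}^{N} \min_{j \neq i} d_{ij}
$$

The code takes the minimum over columns rather than rows, which gives the same result because the distance matrix is symmetric.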


### Utilize Foursquare to get information on venues in Downtown Toronto

Define the credentials and version.

In [12]:
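import os

# Read the credentials from environment variables rather than hard-coding them
# (the variable names are an assumption; use whatever names you export)
CLIENT_ID     = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
ACCESS_TOKEN  = os.environ.get('FOURSQUARE_ACCESS_TOKEN')
VERSION       = '20210701'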

Get up to 100 venues for each neighborhood within the radius calculated above.

In [13]:
def getNeighborhoodVenues(neighborhoods, latitudes, longitudes, radius, limit):
    
    venues = []
    for neighborhood, latitude, longitude in zip(neighborhoods, latitudes, longitudes):
            
        # Create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            latitude, 
            longitude, 
            radius, 
            limit)
            
        # Make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # Save the venue's name, location and category in the venues-list
        venues.append([(
            neighborhood, 
            latitude, 
            longitude, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']) for venue in results])

    venueDataFrame = pd.DataFrame([item for venue in venues for item in venue])
    venueDataFrame.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return venueDataFrame

limit = 100
venues = getNeighborhoodVenues(downtownToronto['Neighborhood'], downtownToronto['Latitude'], downtownToronto['Longitude'], radius=radius, limit=limit)
venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.6555,-79.3626,Tandem Coffee,43.653559,-79.361809,Coffee Shop
1,"Regent Park, Harbourfront",43.6555,-79.3626,Roselle Desserts,43.653447,-79.362017,Bakery
2,"Regent Park, Harbourfront",43.6555,-79.3626,Figs Breakfast & Lunch,43.655675,-79.364503,Breakfast Spot
3,"Regent Park, Harbourfront",43.6555,-79.3626,The Yoga Lounge,43.655515,-79.364955,Yoga Studio
4,"Regent Park, Harbourfront",43.6555,-79.3626,Sumach Espresso,43.658135,-79.359515,Coffee Shop
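The request cell indexes straight into the JSON response, so a venue without a category or a response without groups would raise an exception. A more defensive way to flatten the response might look like the sketch below. The helper name `extract_venue_rows` and the "Unknown" placeholder are assumptions, and the payload is hand-written to mimic the nesting the notebook relies on, not a real API response:

```python
def extract_venue_rows(payload, neighborhood):
    """Flatten an explore-style response into (neighborhood, name, lat, lng, category) tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # Fall back to a placeholder when a venue has no category
        category = categories[0]["name"] if categories else "Unknown"
        rows.append((neighborhood, venue["name"], venue["location"]["lat"],
                     venue["location"]["lng"], category))
    return rows


# Hand-written payload; the second venue is made up to exercise the fallback
fake_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Tandem Coffee",
               "location": {"lat": 43.653559, "lng": -79.361809},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Example Venue",
               "location": {"lat": 43.655, "lng": -79.362},
               "categories": []}},
]}]}}

for row in extract_venue_rows(fake_payload, "Regent Park, Harbourfront"):
    print(row)
```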


Let's find out how many venues that is for each neighborhood separately, as well as the different types of venues for each neighborhood. 

In [14]:
venues.groupby('Neighborhood')[['Venue']].count()

Unnamed: 0_level_0,Venue
Neighborhood,Unnamed: 1_level_1
Berczy Park,76
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",56
Central Bay Street,50
Christie,9
Church and Wellesley,71
"Commerce Court, Victoria Hotel",100
"Garden District, Ryerson",100
"Harbourfront East, Union Station, Toronto Islands",4
"Kensington Market, Chinatown, Grange Park",53
"Regent Park, Harbourfront",22


In [15]:
venues.groupby(['Neighborhood', 'Venue Category'])[['Venue']].count()

Unnamed: 0_level_0,Unnamed: 1_level_0,Venue
Neighborhood,Venue Category,Unnamed: 2_level_1
Berczy Park,Art Gallery,2
Berczy Park,Bagel Shop,1
Berczy Park,Bakery,4
Berczy Park,Basketball Stadium,1
Berczy Park,Beer Bar,2
...,...,...
"University of Toronto, Harbord",Restaurant,1
"University of Toronto, Harbord",Sandwich Place,1
"University of Toronto, Harbord",Theater,1
"University of Toronto, Harbord",Video Game Store,1


Analyze each neighborhood by creating a dataframe with "one-hot encoding": one row per venue and one column per venue category, where every row is all 0s except for a single 1 marking that venue's category.

Note: the first column is called 'Hood' rather than 'Neighborhood', because two venues have the category 'Neighborhood', which would otherwise clash with the column of neighborhood names.

In [16]:
venues_onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")

# Add neighborhood column to dataframe and move to first column
venues_onehot['Hood'] = venues['Neighborhood'] 

columns = [venues_onehot.columns[-1]] + list(venues_onehot.columns[:-1])
venues_onehot = venues_onehot[columns]

venues_onehot.head()

Unnamed: 0,Hood,Adult Boutique,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Baby Store,Bagel Shop,...,Thai Restaurant,Theater,Theme Restaurant,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Yoga Studio
0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
4,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


The rows are then grouped by neighborhood and averaged, which turns each column into the frequency of that venue category in the neighborhood.

In [17]:
venuesFrequency = venues_onehot.groupby('Hood').mean().reset_index()
venuesFrequency.head()

Unnamed: 0,Hood,Adult Boutique,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Baby Store,Bagel Shop,...,Thai Restaurant,Theater,Theme Restaurant,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Yoga Studio
0,Berczy Park,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.013158,...,0.013158,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.013158
1,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,...,0.017857,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.017857,0.017857
2,Central Bay Street,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.02,0.0,0.0
3,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Church and Wellesley,0.014085,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.014085,0.014085,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.028169
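Taking the mean of a one-hot column per neighborhood is the same as counting how often a category occurs and dividing by the number of venues in that neighborhood. The plain-Python sketch below shows that equivalence on toy data; the function name and the neighborhoods "A" and "B" are made up, and the notebook itself uses `pd.get_dummies` followed by `groupby(...).mean()`:

```python
from collections import Counter, defaultdict


def category_frequencies(venue_rows):
    """Map each neighborhood to {category: share of that neighborhood's venues}."""
    counts = defaultdict(Counter)
    for neighborhood, category in venue_rows:
        counts[neighborhood][category] += 1
    return {
        neighborhood: {category: n / sum(counter.values()) for category, n in counter.items()}
        for neighborhood, counter in counts.items()
    }


# Toy (neighborhood, category) pairs
toy_rows = [("A", "Coffee Shop"), ("A", "Coffee Shop"), ("A", "Bakery"),
            ("A", "Park"), ("B", "Café")]
print(category_frequencies(toy_rows))
```

The only difference from the dataframe is that categories absent from a neighborhood are left out here, whereas the dataframe stores them as 0.0.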


Get the top 10 venue categories for each neighborhood. Note that a few neighborhoods have very few venues (most likely because they are located at the harbor or on the water), so their missing entries are set to "None".

In [18]:
def getTopTen( neighborhoodData ):
    # Drop the first entry (this is the neighborhood name)
    venueCategories = neighborhoodData.iloc[1:]

    # Keep only categories that occur at least once, most frequent first
    venueCategories = venueCategories[ venueCategories > 0 ]
    venueCategories = venueCategories.sort_values(ascending=False)

    # Prepare the columns and data of the result dataframe
    columns = ['Neighborhood']
    data = [neighborhoodData.iloc[0]]

    # Extract the top 10; pad with 'None' if there are fewer categories
    for index in range(10):
        try:
            data.append(venueCategories.index.values[index])
        except IndexError:
            data.append('None')

        columns.append(f'Popularity {index + 1}')

    result = pd.DataFrame(columns = columns)
    result.loc[0,:] = data

    # The 'index' column created here is dropped again in the next cell
    return result.reset_index()

In [19]:
for neighborhoodIndex in range(venuesFrequency.shape[0]):
    if neighborhoodIndex == 0:
        venuePopularity = getTopTen( venuesFrequency.iloc[neighborhoodIndex, :] )
    else:
        venuePopularity_new = getTopTen( venuesFrequency.iloc[neighborhoodIndex, :] )
        venuePopularity = pd.concat([venuePopularity, venuePopularity_new], ignore_index=True)
    
venuePopularity.drop('index', axis=1, inplace=True)
venuePopularity

Unnamed: 0,Neighborhood,Popularity 1,Popularity 2,Popularity 3,Popularity 4,Popularity 5,Popularity 6,Popularity 7,Popularity 8,Popularity 9,Popularity 10
0,Berczy Park,Coffee Shop,Seafood Restaurant,Bakery,Café,Pub,Cocktail Bar,Restaurant,Sandwich Place,Italian Restaurant,Hotel
1,"CN Tower, King and Spadina, Railway Lands, Har...",Coffee Shop,Park,Bar,Café,Italian Restaurant,Gym / Fitness Center,Bakery,Speakeasy,French Restaurant,Grocery Store
2,Central Bay Street,Coffee Shop,Italian Restaurant,Restaurant,Bubble Tea Shop,Café,Clothing Store,Sushi Restaurant,Sandwich Place,Art Museum,New American Restaurant
3,Christie,Café,Grocery Store,Baby Store,Candy Store,Coffee Shop,Park,,,,
4,Church and Wellesley,Japanese Restaurant,Gay Bar,Coffee Shop,Sushi Restaurant,Restaurant,Yoga Studio,Bubble Tea Shop,Hotel,Men's Store,Fast Food Restaurant
5,"Commerce Court, Victoria Hotel",Coffee Shop,Hotel,Café,Restaurant,Japanese Restaurant,Gym,Steakhouse,Deli / Bodega,Asian Restaurant,Salad Place
6,"Garden District, Ryerson",Coffee Shop,Clothing Store,Cosmetics Shop,Hotel,Italian Restaurant,Middle Eastern Restaurant,Café,Movie Theater,Ramen Restaurant,Fast Food Restaurant
7,"Harbourfront East, Union Station, Toronto Islands",Café,Harbor / Marina,Music Venue,Park,,,,,,
8,"Kensington Market, Chinatown, Grange Park",Café,Bar,Vegetarian / Vegan Restaurant,Gaming Cafe,Vietnamese Restaurant,Bakery,Burger Joint,Caribbean Restaurant,Grocery Store,Arts & Crafts Store
9,"Regent Park, Harbourfront",Coffee Shop,Breakfast Spot,Bakery,Pub,Theater,Thai Restaurant,Sushi Restaurant,Spa,Restaurant,Health Food Store
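The ranking step boils down to three operations: drop categories with zero frequency, sort by descending frequency, and pad short lists. A compact sketch on a plain dictionary; the function name, the alphabetical tie-breaking rule and the toy frequencies are assumptions and not part of the notebook:

```python
def top_categories(frequencies, n=10, filler="None"):
    """Return the n most frequent categories, padded with `filler`."""
    present = [(category, share) for category, share in frequencies.items() if share > 0]
    # Sort by descending share; break ties alphabetically so the result is deterministic
    present.sort(key=lambda pair: (-pair[1], pair[0]))
    names = [category for category, _ in present[:n]]
    return names + [filler] * (n - len(names))


toy_frequencies = {"Café": 0.5, "Park": 0.25, "Bakery": 0.25, "Gym": 0.0}
print(top_categories(toy_frequencies, n=4))
```

The explicit tie-break matters when several categories share the same frequency, which is common in neighborhoods with only a handful of venues.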


The different neighborhoods can now be grouped using the k-means clustering algorithm. 

In [20]:
numberOfClusters = 6

# Run k-means clustering
clusters = KMeans(n_clusters=numberOfClusters, random_state=0).fit(venuesFrequency.drop(columns='Hood'))

# Add to the dataframe
venuePopularity.insert(0, 'Cluster', clusters.labels_)
venuePopularity

Unnamed: 0,Cluster,Neighborhood,Popularity 1,Popularity 2,Popularity 3,Popularity 4,Popularity 5,Popularity 6,Popularity 7,Popularity 8,Popularity 9,Popularity 10
0,1,Berczy Park,Coffee Shop,Seafood Restaurant,Bakery,Café,Pub,Cocktail Bar,Restaurant,Sandwich Place,Italian Restaurant,Hotel
1,1,"CN Tower, King and Spadina, Railway Lands, Har...",Coffee Shop,Park,Bar,Café,Italian Restaurant,Gym / Fitness Center,Bakery,Speakeasy,French Restaurant,Grocery Store
2,1,Central Bay Street,Coffee Shop,Italian Restaurant,Restaurant,Bubble Tea Shop,Café,Clothing Store,Sushi Restaurant,Sandwich Place,Art Museum,New American Restaurant
3,2,Christie,Café,Grocery Store,Baby Store,Candy Store,Coffee Shop,Park,,,,
4,1,Church and Wellesley,Japanese Restaurant,Gay Bar,Coffee Shop,Sushi Restaurant,Restaurant,Yoga Studio,Bubble Tea Shop,Hotel,Men's Store,Fast Food Restaurant
5,1,"Commerce Court, Victoria Hotel",Coffee Shop,Hotel,Café,Restaurant,Japanese Restaurant,Gym,Steakhouse,Deli / Bodega,Asian Restaurant,Salad Place
6,1,"Garden District, Ryerson",Coffee Shop,Clothing Store,Cosmetics Shop,Hotel,Italian Restaurant,Middle Eastern Restaurant,Café,Movie Theater,Ramen Restaurant,Fast Food Restaurant
7,3,"Harbourfront East, Union Station, Toronto Islands",Café,Harbor / Marina,Music Venue,Park,,,,,,
8,1,"Kensington Market, Chinatown, Grange Park",Café,Bar,Vegetarian / Vegan Restaurant,Gaming Cafe,Vietnamese Restaurant,Bakery,Burger Joint,Caribbean Restaurant,Grocery Store,Arts & Crafts Store
9,4,"Regent Park, Harbourfront",Coffee Shop,Breakfast Spot,Bakery,Pub,Theater,Thai Restaurant,Sushi Restaurant,Spa,Restaurant,Health Food Store
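To make explicit what k-means does with the frequency vectors, here is a minimal sketch of Lloyd's algorithm in plain Python. It is an illustration only and not scikit-learn's implementation, which adds k-means++ initialization and other refinements. The function names, the toy two-category vectors and the hand-picked starting centroids are all assumptions:

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm; `centroids` are the starting centers."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest centroid
        labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(point, centroids[k]))
            for point in points
        ]
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Toy frequency vectors: (share of coffee shops, share of parks)
toy_points = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels, centers = kmeans(toy_points, centroids=[[1.0, 0.0], [0.0, 1.0]])
print(labels)
```

Neighborhoods with very few venues have frequency vectors dominated by one or two categories, which is one plausible reason they tend to end up in a cluster of their own.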


Visualize this on a map! But first we need to add in the latitude and longitude of each neighborhood again, which is done using a ``join``.

In [21]:
resultDataFrame = downtownToronto.join(venuePopularity.set_index('Neighborhood'), on='Neighborhood')
resultDataFrame

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster,Popularity 1,Popularity 2,Popularity 3,Popularity 4,Popularity 5,Popularity 6,Popularity 7,Popularity 8,Popularity 9,Popularity 10
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6555,-79.3626,4,Coffee Shop,Breakfast Spot,Bakery,Pub,Theater,Thai Restaurant,Sushi Restaurant,Spa,Restaurant,Health Food Store
1,M5B,Downtown Toronto,"Garden District, Ryerson",43.6572,-79.3783,1,Coffee Shop,Clothing Store,Cosmetics Shop,Hotel,Italian Restaurant,Middle Eastern Restaurant,Café,Movie Theater,Ramen Restaurant,Fast Food Restaurant
2,M5C,Downtown Toronto,St. James Town,43.6513,-79.3756,1,Coffee Shop,Café,Cosmetics Shop,Cocktail Bar,American Restaurant,Gym,Lingerie Store,Gastropub,Moroccan Restaurant,Farmers Market
3,M5E,Downtown Toronto,Berczy Park,43.6456,-79.3754,1,Coffee Shop,Seafood Restaurant,Bakery,Café,Pub,Cocktail Bar,Restaurant,Sandwich Place,Italian Restaurant,Hotel
4,M5G,Downtown Toronto,Central Bay Street,43.6564,-79.386,1,Coffee Shop,Italian Restaurant,Restaurant,Bubble Tea Shop,Café,Clothing Store,Sushi Restaurant,Sandwich Place,Art Museum,New American Restaurant
5,M6G,Downtown Toronto,Christie,43.6683,-79.4205,2,Café,Grocery Store,Baby Store,Candy Store,Coffee Shop,Park,,,,
6,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.6496,-79.3833,1,Café,Coffee Shop,Gym,Japanese Restaurant,Asian Restaurant,Thai Restaurant,Steakhouse,Salad Place,Restaurant,Hotel
7,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.623,-79.3936,3,Café,Harbor / Marina,Music Venue,Park,,,,,,
8,M5K,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.6469,-79.3823,1,Coffee Shop,Hotel,Café,Restaurant,Seafood Restaurant,Salad Place,Japanese Restaurant,Sushi Restaurant,Steakhouse,Italian Restaurant
9,M5L,Downtown Toronto,"Commerce Court, Victoria Hotel",43.6492,-79.3823,1,Coffee Shop,Hotel,Café,Restaurant,Japanese Restaurant,Gym,Steakhouse,Deli / Bodega,Asian Restaurant,Salad Place


In [22]:
# Create a map centered on the mean latitude and longitude of the neighborhoods
mapDowntownToronto_clusters = folium.Map(location=[latitude_mean, longitude_mean], 
                                         tiles='Stamen Toner', 
                                         zoom_start=13)


# Define colors for the clusters
colors = ['purple', 'blue', 'cyan', 'yellow', 'orange', 'red']

# Add markers to map for each neighborhood
for latitude, longitude, neighborhood, cluster in zip(resultDataFrame['Latitude'], resultDataFrame['Longitude'], resultDataFrame['Neighborhood'], resultDataFrame['Cluster'] ):
    labelText = neighborhood + ', cluster ' + str(cluster)
    label = folium.Popup(labelText, parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=label,
        color=colors[cluster],
        fill=True,
        fill_color=colors[cluster],
        fill_opacity=0.7,
        parse_html=False).add_to(mapDowntownToronto_clusters)  

# Display map
mapDowntownToronto_clusters
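One caveat with indexing `colors[cluster]` directly: if a neighborhood had returned no venues at all, the join would leave its cluster as NaN and turn the whole column into floats, and indexing a list with a float fails. A small guard could look like the sketch below; the helper name `color_for_cluster` and the gray fallback color are assumptions:

```python
import math


def color_for_cluster(cluster, palette, fallback="gray"):
    """Pick a palette color for a cluster label, or `fallback` if the label is unusable."""
    if cluster is None or (isinstance(cluster, float) and math.isnan(cluster)):
        return fallback
    index = int(cluster)
    return palette[index] if 0 <= index < len(palette) else fallback


palette = ['purple', 'blue', 'cyan', 'yellow', 'orange', 'red']
print(color_for_cluster(1, palette))
print(color_for_cluster(float('nan'), palette))
```

In the marker loop, `color_for_cluster(cluster, colors)` would then take the place of `colors[cluster]`.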