<h1>Data Analysis on Toronto Neighborhoods and Venues</h1>
<h2>In this notebook, we collect data on Toronto neighborhoods and use k-means clustering to determine which neighborhoods are most similar.</h2>

<h3>Part 1 (Web-scraping and data pre-processing):</h3>
<h4>First, install and import all packages necessary to perform your data analysis.</h4>

In [1]:
!pip install beautifulsoup4
!pip install lxml
!pip install requests
!pip install geocoder
!pip install geopy
!pip install folium

import requests
import pandas as pd
import geocoder
import json 
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium
import numpy as np

from bs4 import BeautifulSoup as bsoup
from geopy.geocoders import Nominatim 
from pandas import json_normalize  # top-level import; the pandas.io.json path is deprecated since pandas 1.0
from sklearn.cluster import KMeans

print('Packages fully loaded and installed.')

Packages fully loaded and installed.


<h4>Now that we have installed all of the packages necessary, we will be using BeautifulSoup to scrape the Wikipedia page of Toronto postal codes.</h4>
<p>We will be using this list of all Toronto neighborhoods to complete our data analysis.</p>

In [2]:
#define URL source
source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

#create a Beautiful Soup object and define lxml as the parser
soup= bsoup(source, 'lxml')

#isolate the list of toronto postal codes and add them to an empty list
table= soup.find('table',class_ = 'wikitable')
tabledata = table.tbody.text.split('\n\n')
tableitems = []

#only keep postal codes that have an assigned borough; if a borough has no neighborhood, use the borough name as the neighborhood
for rows in tabledata:
    temp = rows.split('\n')[1:]
    if temp != []:
        if temp[1] != 'Not assigned':
            if temp[2] == 'Not assigned':
                temp[2] = temp[1]
            tableitems.append(temp)
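
<p>The filtering rule used in the loop above can also be written as a small standalone function, which makes it easy to check on a few rows before scraping. This is only a sketch: the function name clean_rows and the sample rows are illustrative assumptions, not scraped data.</p>

```python
def clean_rows(rows):
    """Keep rows with an assigned borough; when the neighborhood is
    'Not assigned', fall back to the borough name."""
    cleaned = []
    for postcode, borough, neighborhood in rows:
        if borough == 'Not assigned':
            continue  # no borough means the postal code is unused
        if neighborhood == 'Not assigned':
            neighborhood = borough
        cleaned.append([postcode, borough, neighborhood])
    return cleaned


# illustrative rows (made up for this example)
sample_rows = [
    ['M0X', 'Not assigned', 'Not assigned'],
    ['M5A', 'Downtown Toronto', 'Harbourfront'],
    ['M9Z', 'Example Borough', 'Not assigned'],
]
print(clean_rows(sample_rows))
```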

<h4>Next, create a Pandas dataframe from the list of neighborhoods we extracted.</h4>

In [3]:
toronto_df = pd.DataFrame(tableitems)

#assign column headers to dataframe
new_header = toronto_df.iloc[0]
toronto_df = toronto_df[1:]
toronto_df.columns = new_header

#group neighborhoods by postcode and assign the list of neighborhoods to the Neighborhood column
toronto_df['Neighborhood'] = toronto_df.groupby(['Postcode', 'Borough'])['Neighborhood'].transform(lambda x: ', '.join(x))

#remove duplicates and reset index to clean up dataframe
toronto_df = toronto_df.drop_duplicates().reset_index(drop=True)
toronto_df.head()

,Postcode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Downtown Toronto,Queen's Park
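
<p>To make the grouping step above concrete, here is a plain-Python sketch of the same idea: rows that share a postcode and borough are collapsed into one row whose neighborhood field is a comma-separated list. The helper name combine_neighborhoods is an assumption made for illustration; the notebook itself does this with groupby and transform.</p>

```python
def combine_neighborhoods(rows):
    """Merge rows that share (postcode, borough) into a single row,
    joining the neighborhood names in first-seen order."""
    grouped = {}
    for postcode, borough, neighborhood in rows:
        grouped.setdefault((postcode, borough), []).append(neighborhood)
    return [[pc, b, ', '.join(names)] for (pc, b), names in grouped.items()]


rows = [
    ['M6A', 'North York', 'Lawrence Heights'],
    ['M3A', 'North York', 'Parkwoods'],
    ['M6A', 'North York', 'Lawrence Manor'],
]
print(combine_neighborhoods(rows))
```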


In [4]:
toronto_df.shape

(103, 3)

<h3>Part 2 (Addition of Latitude and Longitude Values to DataFrame):</h3>
<h4>To begin, read Toronto Postal Code Latitude and Longitude data into a pandas dataframe.</h4>

In [5]:
#import data into pandas df
LatLong = pd.read_csv("http://cocl.us/Geospatial_data")
LatLong.head()

,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


<h4>Then, merge the Latitude/Longitude DataFrame with the Postal Code DataFrame to get a complete view of Toronto Neighborhood data.</h4>

In [6]:
#merge dataframes on postcode
df_toronto = pd.merge(toronto_df, LatLong, left_on = 'Postcode', right_on ='Postal Code')

#remove duplicate columns
df_toronto = df_toronto.drop(['Postal Code'], axis = 1)
df_toronto= df_toronto.rename(columns={"Postcode": "PostalCode"})
df_toronto.head()

,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494
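
<p>Because pd.merge performs an inner join by default, any postcode without a match in the coordinate table is dropped silently. A quick check like the sketch below can confirm nothing was lost. The helper name unmatched_codes and the sample code 'M0X' are made up for illustration.</p>

```python
def unmatched_codes(left_codes, right_codes):
    """Return the codes on the left that have no partner on the right."""
    return sorted(set(left_codes) - set(right_codes))


neighborhood_codes = ['M3A', 'M4A', 'M5A', 'M0X']  # 'M0X' is a made-up code
coordinate_codes = ['M3A', 'M4A', 'M5A', 'M6A']
print(unmatched_codes(neighborhood_codes, coordinate_codes))
```

<p>In the notebook, the same check could be run as unmatched_codes(toronto_df['Postcode'], LatLong['Postal Code']); an empty list means every neighborhood row received coordinates.</p>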


<h3>Part 3 (Analysis on Toronto Neighborhoods):</h3>
<h4>We will begin our data analysis by creating a visualization of all of the neighborhoods in Toronto using Folium. Use Nominatim to get the latitude/longitude of Toronto.</h4>

In [7]:
address = 'Toronto, Ontario'

#Get lat/long values for Toronto and print them
geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.
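
<p>geopy's geocode call returns None when the service finds no match, so reading .latitude directly can fail with an AttributeError. The sketch below wraps the lookup in a guard. The helper lookup_coordinates and the stub geocoder are assumptions used to keep the example runnable offline; in the notebook you would pass geolocator.geocode instead.</p>

```python
from types import SimpleNamespace


def lookup_coordinates(geocode, address):
    """Call a geocode function and return (latitude, longitude),
    raising a clear error when no result comes back."""
    location = geocode(address)
    if location is None:
        raise ValueError('No geocoding result for {!r}'.format(address))
    return location.latitude, location.longitude


def stub_geocode(address):
    # stand-in for geolocator.geocode; knows a single address only
    known = {'Toronto, Ontario': SimpleNamespace(latitude=43.653963, longitude=-79.387207)}
    return known.get(address)


print(lookup_coordinates(stub_geocode, 'Toronto, Ontario'))
```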


In [8]:
# create map of Toronto using latitude and longitude values
toronto_map = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map for each neighborhood and postcode
for pc, lat, lng, borough, neighborhood in zip(df_toronto['PostalCode'],df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Borough'], df_toronto['Neighborhood']):
    label = '{}: {}, {}'.format(pc, neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='cadetblue',
        fill=True,
        fill_color='#29BBB9',
        fill_opacity=0.7).add_to(toronto_map)
    
toronto_map

<h4>We are interested in analyzing and clustering neighborhoods in Downtown Toronto. To start, create a dataframe of only Downtown Toronto Neighborhoods.</h4>

In [9]:
toronto_data = df_toronto[df_toronto['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
toronto_data.head()

,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
1,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494
2,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306


<h4>Now, create a map of Downtown Toronto Neighborhoods to visualize the region we will be working with and the neighborhoods we will be clustering.</h4>

In [10]:
# create map of Downtown Toronto using latitude and longitude values
downtown_map = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='cadetblue',
        fill=True,
        fill_color='#29BBB9',
        fill_opacity=0.7).add_to(downtown_map)
    
downtown_map

<h4>We will be using the Foursquare API to get data on Venues in Downtown Toronto neighborhoods. Define your Foursquare credentials in variables as shown below.</h4>
<p>If you do not have an account, you may create one at the <a href= 'https://developer.foursquare.com/'>Foursquare Developer Portal</a> by clicking on "Create Account".</p>

In [11]:
#foursquare credentials
CLIENT_ID = 'GPEZ2BHSQDPGBGF2NFK3QT2PZRFIJJS015OE1XMUICVQM1HK' # your Foursquare ID
CLIENT_SECRET = 'RO4RBQEB2YOIN13PUIPLFCHF4XN1YU40ZVPCMITB2Y0D1JEA' # your Foursquare Secret
VERSION = '20190106'
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: GPEZ2BHSQDPGBGF2NFK3QT2PZRFIJJS015OE1XMUICVQM1HK
CLIENT_SECRET: RO4RBQEB2YOIN13PUIPLFCHF4XN1YU40ZVPCMITB2Y0D1JEA
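
<p>If you plan to share the notebook, one common alternative to hardcoding credentials is to read them from environment variables. The sketch below shows that pattern. The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET and the helper load_foursquare_credentials are assumptions chosen for this example, not names required by Foursquare.</p>

```python
import os


def load_foursquare_credentials(environ=os.environ):
    """Read the client ID and secret from a mapping of environment
    variables (the variable names are hypothetical)."""
    try:
        return environ['FOURSQUARE_CLIENT_ID'], environ['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Set {} before running the notebook'.format(missing)) from None


# demo with a plain dict standing in for os.environ
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id', 'FOURSQUARE_CLIENT_SECRET': 'demo-secret'}
client_id, client_secret = load_foursquare_credentials(demo_env)
print('CLIENT_ID loaded:', bool(client_id))
```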


<h4>Next, define a function to get all of the nearby venue data for each neighborhood in downtown Toronto and append the relevant information to a pandas DataFrame.</h4>

In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL (LIMIT is a global that must be defined before this function is called)
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
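
<p>The request URL in the function above is assembled with string formatting. As a hedged alternative, the query string can be built with urllib.parse.urlencode, which handles escaping for you. The helper name build_explore_url is an assumption; the endpoint and parameter names are the ones used in the function above.</p>

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the explore URL from keyword-style parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


url = build_explore_url('ID', 'SECRET', '20190106', 43.65426, -79.360636)
# parse the query back to confirm the parameters survive encoding
print(parse_qs(urlsplit(url).query))
```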

In [13]:
#call the getNearbyVenues function on Downtown Toronto neighborhood data
LIMIT = 100   # number of venues requested per neighborhood
radius = 500  # search radius in meters
toronto_venues = getNearbyVenues(names=toronto_data['Neighborhood'], latitudes=toronto_data['Latitude'], longitudes=toronto_data['Longitude'], radius=radius)

Harbourfront
Queen's Park
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Christie
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Rosedale
Stn A PO Boxes 25 The Esplanade
Cabbagetown, St. James Town
First Canadian Place, Underground city
Church and Wellesley


<h4>Now that we have our data on Toronto Venues, preview the DataFrame with the head function and count how many unique venue categories it contains.</h4>

In [14]:
print(toronto_venues.shape)
toronto_venues.head()

(1327, 7)


,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Harbourfront,43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,Harbourfront,43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,Harbourfront,43.65426,-79.360636,Cooper Koo Family YMCA,43.653191,-79.357947,Gym / Fitness Center
3,Harbourfront,43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,Harbourfront,43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant


In [15]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))
toronto_venues.groupby('Neighborhood').count()

There are 210 unique categories.


Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide, King, Richmond",100,100,100,100,100,100
Berczy Park,57,57,57,57,57,57
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",18,18,18,18,18,18
"Cabbagetown, St. James Town",46,46,46,46,46,46
Central Bay Street,86,86,86,86,86,86
"Chinatown, Grange Park, Kensington Market",92,92,92,92,92,92
Christie,16,16,16,16,16,16
Church and Wellesley,86,86,86,86,86,86
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
"Design Exchange, Toronto Dominion Centre",100,100,100,100,100,100


<h4>To begin analyzing the types of venues that are most popular in Toronto Neighborhoods, perform one-hot encoding to see what categories of venues are in which neighborhoods.</h4>

In [16]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

#rename neighborhood category to 'Neighborhood Category', as "Neighborhood" was returned as a venue category
toronto_onehot.rename({"Neighborhood": "Neighborhood Category"}, axis='columns', inplace=True)

#add neighborhood names back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood']

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1] ]+ list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

,Neighborhood,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [17]:
toronto_onehot.shape

(1327, 211)

<h4>Now that we have one-hot encoded the venue categories, we can see which venue types occur most often in each neighborhood. Group the one-hot dataframe by neighborhood and take the mean of each category column; the mean of a 0/1 column is that category's share of the neighborhood's venues.</h4>
<p>We will be using this data later, so go ahead and assign the grouping and mean to a new dataframe.</p>

In [18]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

,Neighborhood,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,...,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.055556,0.055556,0.055556,0.111111,0.166667,0.111111,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Cabbagetown, St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.011628,0.0,0.0,...,0.0,0.0,0.011628,0.0,0.0,0.011628,0.0,0.0,0.0,0.011628
5,"Chinatown, Grange Park, Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.054348,0.0,0.054348,0.01087,0.0,0.0,0.0,0.0
6,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Church and Wellesley,0.0,0.0,0.0,0.0,0.0,0.0,0.011628,0.0,0.0,...,0.0,0.0,0.0,0.0,0.011628,0.0,0.011628,0.011628,0.0,0.023256
8,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,...,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0
9,"Design Exchange, Toronto Dominion Centre",0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,...,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0


<h4>We can now see what the top 5 categories of each neighborhood are. Let's print them out.</h4>

In [19]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----Top Five Venues in "+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['Venue','Frequency']
    temp = temp.iloc[1:]
    temp['Frequency'] = temp['Frequency'].astype(float)
    temp = temp.round({'Frequency': 2})
    print(temp.sort_values('Frequency', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Top Five Venues in Adelaide, King, Richmond----
              Venue  Frequency
0       Coffee Shop       0.08
1        Steakhouse       0.04
2              Café       0.04
3               Bar       0.04
4  Sushi Restaurant       0.03


----Top Five Venues in Berczy Park----
          Venue  Frequency
0   Coffee Shop       0.07
1  Cocktail Bar       0.05
2          Café       0.04
3        Bakery       0.04
4   Cheese Shop       0.04


----Top Five Venues in CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara----
                 Venue  Frequency
0      Airport Service       0.17
1       Airport Lounge       0.11
2     Airport Terminal       0.11
3              Airport       0.06
4  Rental Car Location       0.06


----Top Five Venues in Cabbagetown, St. James Town----
                Venue  Frequency
0         Coffee Shop       0.07
1                Café       0.07
2  Italian Restaurant       0.04
3              Bakery       0.

<h4>We are now interested in clustering these neighborhoods by the venues that are most common in them. To do this, define a function that will return the most common venues for each neighborhood and create a dataframe with the relevant information.</h4>

In [20]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [21]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions past the third fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Steakhouse,Café,Bar,Restaurant,Burger Joint,Hotel,Sushi Restaurant,Asian Restaurant,Thai Restaurant
1,Berczy Park,Coffee Shop,Cocktail Bar,Café,Farmers Market,Seafood Restaurant,Steakhouse,Bakery,Cheese Shop,Beer Bar,Japanese Restaurant
2,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Service,Airport Lounge,Airport Terminal,Harbor / Marina,Sculpture Garden,Airport Food Court,Airport Gate,Bar,Boat or Ferry,Boutique
3,"Cabbagetown, St. James Town",Coffee Shop,Café,Pub,Italian Restaurant,Park,Pizza Place,Chinese Restaurant,Restaurant,Bakery,General Entertainment
4,Central Bay Street,Coffee Shop,Sandwich Place,Café,Italian Restaurant,Japanese Restaurant,Burger Joint,Ice Cream Shop,Bubble Tea Shop,Bar,Bakery


<h4>Now, let's go ahead and cluster the neighborhoods in Downtown Toronto. Start by dropping the Neighborhood name from the grouped frequency dataframe and then run k-means clustering using scikit-learn.</h4>

In [22]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 2, 0, 0, 0, 4, 0, 0, 0], dtype=int32)
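
<p>For intuition about what the clustering step does, here is a minimal pure-Python sketch of Lloyd's algorithm, the assign-then-update loop behind k-means. It is not scikit-learn's implementation: scikit-learn uses k-means++ initialization by default, while this sketch seeds the centroids with the first k points so the result is deterministic. The function names and sample points are assumptions for illustration.</p>

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # assignment step: each point joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

<p>In the notebook, each point is a neighborhood's vector of category frequencies, so neighborhoods with similar venue mixes end up with the same label.</p>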

<h4>Add the cluster labels to the dataframe and merge it with the original Downtown Toronto dataframe we created earlier.</h4>

In [23]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_data

# join the cluster labels and top venues onto toronto_data, which holds latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636,3,Coffee Shop,Pub,Bakery,Park,Breakfast Spot,Restaurant,Café,Mexican Restaurant,Yoga Studio,Cosmetics Shop
1,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494,3,Coffee Shop,College Cafeteria,Gym,Park,Yoga Studio,Café,Smoothie Shop,Italian Restaurant,Japanese Restaurant,Sandwich Place
2,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937,0,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Middle Eastern Restaurant,Bakery,Sporting Goods Shop,Italian Restaurant,Lingerie Store,Bubble Tea Shop
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Café,Coffee Shop,Restaurant,Hotel,Clothing Store,Beer Bar,Bakery,Breakfast Spot,Cosmetics Shop,BBQ Joint
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,0,Coffee Shop,Cocktail Bar,Café,Farmers Market,Seafood Restaurant,Steakhouse,Bakery,Cheese Shop,Beer Bar,Japanese Restaurant


<h4>Now that we have our complete dataframe of Toronto Neighborhoods, go ahead and plot the neighborhoods using Folium to show the different neighborhood clusters in different colors.</h4>

In [24]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, indexing the color list directly by cluster label
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
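
<p>The map needs one visually distinct color per cluster label. As a hedged alternative to the matplotlib colormap used above, the sketch below generates evenly spaced hues with the standard-library colorsys module and returns hex strings that can be indexed by cluster label. The function name cluster_colors and the saturation and brightness values are assumptions for illustration.</p>

```python
import colorsys


def cluster_colors(k):
    """Return k hex colors with evenly spaced hues."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_colors(5)
# look up the color for a given cluster label, e.g. label 3
print(palette[3])
```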