# Segmenting and Clustering Neighbourhoods in Toronto

## Problem 1

Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe.

In order to create the above dataframe:

* The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
* Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
* More than one neighborhood can exist in one postal code area. 
* If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.
* Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
* In the last cell of your notebook, use the .shape method to print the number of rows of your dataframe.
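The rules above can be tried out on a few hand-made rows before touching the real table. This is a minimal sketch in plain Python rather than the pandas code used in the notebook; the sample rows (including the made-up code `M9Z`) and the helper name `clean_rows` are assumptions for illustration only.

```python
def clean_rows(rows):
    """Apply the assignment's rules to (postal_code, borough, neighborhood) tuples."""
    merged = {}  # postal code -> (borough, [neighborhoods]); dicts keep insertion order
    for postal_code, borough, neighborhood in rows:
        if borough == 'Not assigned':
            continue  # rule: ignore cells without an assigned borough
        if neighborhood == 'Not assigned':
            neighborhood = borough  # rule: fall back to the borough name
        merged.setdefault(postal_code, (borough, []))[1].append(neighborhood)
    # rule: several neighborhoods in one postal code become one comma-separated cell
    return [(code, borough, ', '.join(names))
            for code, (borough, names) in merged.items()]


# Hypothetical sample rows, chosen to exercise every rule
sample = [
    ('M1A', 'Not assigned', 'Not assigned'),
    ('M5A', 'Downtown Toronto', 'Regent Park'),
    ('M5A', 'Downtown Toronto', 'Harbourfront'),
    ('M9Z', 'Example Borough', 'Not assigned'),
]
cleaned = clean_rows(sample)
for row in cleaned:
    print(row)
```

The first row is dropped, the two `M5A` rows collapse into one, and the last row inherits its borough name as the neighborhood.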


In [2]:
pip install bs4  # install the Beautiful Soup package

Collecting bs4
  Downloading https://files.pythonhosted.org/packages/10/ed/7e8b97591f6f456174139ec089c769f89a94a1a4025fe967691de971f314/bs4-0.0.1.tar.gz
Collecting beautifulsoup4 (from bs4)
  Downloading https://files.pythonhosted.org/packages/66/25/ff030e2437265616a1e9b25ccc864e0371a0bc3adb7c5a404fd661c6f4f6/beautifulsoup4-4.9.1-py3-none-any.whl (115kB)
Collecting soupsieve>1.2 (from beautifulsoup4->bs4)
  Downloading https://files.pythonhosted.org/packages/6f/8f/457f4a5390eeae1cc3aeab89deb7724c965be841ffca6cfca9197482e470/soupsieve-2.0.1-py3-none-any.whl
Building wheels for collected packages: bs4
  Building wheel for bs4 (setup.py) ... done
  Stored in directory: /home/jupyterlab/.cache/pip/wheels/a0/b0/b2/4f80b9456b87abedbc0bf2d52235414c3467d8889be38dd472
Successfully built bs4
Installing collected packages: soupsieve, beautifulsoup4, bs4
Successfully installed beautifulsoup4-4.9.1 bs4-0.0.

In [86]:
from bs4 import BeautifulSoup
import requests #library to handle requests
import pandas as pd
import numpy as np

In [2]:
Link = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
Source = requests.get(Link).text


In [3]:
soup = BeautifulSoup(Source, 'html.parser')  # name the parser explicitly to avoid the "no parser specified" warning

In [4]:
table = soup.find('table')

In [5]:
#Define the dataframe to consist of three columns: PostalCode, Borough and Neighborhoods
columns = ["PostalCode","Borough","Neighbourhoods"]
df = pd.DataFrame(columns=columns)

In [6]:
for tr in table.find_all('tr'):
    row_data = []
    for td in tr.find_all('td'):
        row_data.append(td.text.strip())
    if len(row_data) == 3:
        df.loc[len(df)] = row_data

In [7]:
df.head()

Unnamed: 0,PostalCode,Borough,Neighbourhoods
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
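To see why the `len(row_data) == 3` check skips the header row, the same row-collecting idea can be sketched with the standard library's `html.parser` instead of Beautiful Soup. The class name `TableRowParser` and the HTML snippet are assumptions made up for this example; the real page is larger and may be laid out differently.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the stripped text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text fragments of the current cell, or None outside <td>

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical snippet shaped like the table the scraping loop expects
html = """
<table>
  <tr><th>Postal Code</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods
</td></tr>
  <tr><td>M4A</td><td>North York</td><td>Victoria Village</td></tr>
</table>
"""
parser = TableRowParser()
parser.feed(html)
# The header row holds only <th> cells, so it yields no <td> text and is filtered out
records = [row for row in parser.rows if len(row) == 3]
print(records)
```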


### Data Cleaning

In [38]:
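df = df[df['Borough'] != 'Not assigned'].copy()
# A cell with a borough but no neighbourhood takes the borough's name
unnamed = df['Neighbourhoods'] == 'Not assigned'
df.loc[unnamed, 'Neighbourhoods'] = df.loc[unnamed, 'Borough']
df.head()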

Unnamed: 0,PostalCode,Borough,Neighbourhoods
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [39]:
grouped_df = df.groupby('PostalCode')['Neighbourhoods'].apply(', '.join)
grouped_df.head()

PostalCode
M1B                            Malvern, Rouge
M1C    Rouge Hill, Port Union, Highland Creek
M1E         Guildwood, Morningside, West Hill
M1G                                    Woburn
M1H                                 Cedarbrae
Name: Neighbourhoods, dtype: object

In [40]:
grouped_df=grouped_df.reset_index(drop=False)
grouped_df.rename(columns = {'Neighbourhoods':'Neighborhood_joined'},inplace=True)
grouped_df.head()

Unnamed: 0,PostalCode,Neighborhood_joined
0,M1B,"Malvern, Rouge"
1,M1C,"Rouge Hill, Port Union, Highland Creek"
2,M1E,"Guildwood, Morningside, West Hill"
3,M1G,Woburn
4,M1H,Cedarbrae


In [41]:
df_merge = pd.merge(df, grouped_df, on='PostalCode')

In [42]:
df_merge.drop('Neighbourhoods', axis=1, inplace=True)
df_merge.rename(columns={'PostalCode':'Postal Code'},inplace=True)

In [16]:
df_merge.head()

Unnamed: 0,Postal Code,Borough,Neighborhood_joined
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [23]:
df_merge.shape

(103, 3)

## Problem 2

Use the CSV file from http://cocl.us/Geospatial_data to create a dataframe with the latitude and longitude of each postal code:

In [12]:
Geo_data = pd.read_csv('http://cocl.us/Geospatial_data')
Geo_data.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [43]:
new_df = pd.merge(df_merge,Geo_data, on='Postal Code')
new_df.drop_duplicates(inplace=True)
new_df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood_joined,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
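`pd.merge` performs an inner join by default, so a postal code that is missing from either table silently disappears from the result. The sketch below mimics that behaviour in plain Python to make the effect visible. The coordinates for `M3A` and `M4A` are copied from the output above, while the `M0X` row is made up for the example.

```python
# Borough rows; the last one is a made-up code with no coordinates on file
boroughs = [
    ('M3A', 'North York', 'Parkwoods'),
    ('M4A', 'North York', 'Victoria Village'),
    ('M0X', 'Example Borough', 'Example Neighbourhood'),
]
coordinates = {
    'M3A': (43.753259, -79.329656),
    'M4A': (43.725882, -79.315572),
}

joined, unmatched = [], []
for code, borough, neighborhood in boroughs:
    if code in coordinates:
        joined.append((code, borough, neighborhood) + coordinates[code])
    else:
        unmatched.append(code)  # an inner join would drop this row without warning

print(joined)
print('Dropped by an inner join:', unmatched)
```

Comparing the row count before and after the merge is a quick way to confirm that nothing was lost.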


In [44]:
# Check how many boroughs and neighbourhoods are in the dataframe
print('There are {} boroughs and {} neighborhoods in Toronto, Canada'.format(len(new_df['Borough'].unique()),new_df['Neighborhood_joined'].shape[0]))

There are 10 boroughs and 103 neighborhoods in Toronto, Canada


In [21]:
# Import other necessary libraries
!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

from pandas import json_normalize # transform JSON into a pandas dataframe (pandas.io.json.json_normalize is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2020.6.20  |       hecda079_0         145 KB  conda-forge
    certifi-2020.6.20          |   py36h9f0ad1d_0         151 KB  conda-forge
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    geopy-1.22.0               |     pyh9f0ad1d_0          63 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         393 KB

The following NEW packages will be INSTALLED:

  geographiclib      conda-forge/noarch::geographiclib-1.50-py_0
  geopy              conda-forge/noarch::geopy-1.22.0-pyh9f0ad1d_0

The following packages will b

* Next, get the latitude and longitude of Toronto, Canada

In [24]:
address = 'Toronto Canada'
Geolocator = Nominatim(user_agent='Tr_explorer')
Location = Geolocator.geocode(address)
latitude = Location.latitude
longitude = Location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [45]:
map_Toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(new_df['Latitude'], new_df['Longitude'], new_df['Borough'], new_df['Neighborhood_joined']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Toronto)  
    
map_Toronto

## Problem 3

To simplify the visuals and explore a particular location, we focus on a single borough of Toronto.

In [47]:
# I have chosen to explore East Toronto
East_T = new_df[new_df['Borough']=='East Toronto'].reset_index(drop=True)
East_T.head()

Unnamed: 0,Postal Code,Borough,Neighborhood_joined,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558


Explore the neighborhoods of East Toronto with the Foursquare API

**Define Foursquare credentials and version**

In [53]:
CLIENT_ID = 'XULWLT1NWIPEEIIKBKDQ04OMI0BBOIRCPI30CKXQLCB414CE' # your Foursquare ID
CLIENT_SECRET = 'IYFPNDKEXKT1QTY0QGD2XVVLQWP4HBKGIXCKK0SWMK24TUCY' # your Foursquare Secret
ACCESS_TOKEN = 'XACUHXS1D2MDK40MYPDR1W2Q0KM0V5MKEEO5VYRY4UMN5KTO'
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: XULWLT1NWIPEEIIKBKDQ04OMI0BBOIRCPI30CKXQLCB414CE
CLIENT_SECRET:IYFPNDKEXKT1QTY0QGD2XVVLQWP4HBKGIXCKK0SWMK24TUCY


I would like to explore the Studio District.

In [51]:
SD_latitude = East_T.loc[3, 'Latitude'] # neighborhood latitude value
SD_longitude = East_T.loc[3, 'Longitude'] # neighborhood longitude value

SD_name = East_T.loc[3, 'Neighborhood_joined'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(SD_name, 
                                                               SD_latitude, 
                                                               SD_longitude))
radius = 300 # pull venues within 300 m of the location
LIMIT = 500 # maximum number of venues to request

Latitude and longitude values of Studio District are 43.6595255, -79.340923.


In [57]:
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&oauth_token={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, ACCESS_TOKEN, SD_latitude, SD_longitude, VERSION, radius, LIMIT)
results = requests.get(url).json()
#results
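Long format strings like the one above are easy to get wrong when a parameter is added or reordered. As a hedged alternative, the query string can be built from a dictionary with the standard library's `urlencode`. The credential values below are placeholders, not real keys, and the coordinates are the Studio District values printed earlier.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder values; real credentials would be loaded from outside the notebook
params = {
    'client_id': 'YOUR_CLIENT_ID',
    'client_secret': 'YOUR_CLIENT_SECRET',
    'oauth_token': 'YOUR_ACCESS_TOKEN',
    'v': '20180605',
    'll': '{},{}'.format(43.6595255, -79.340923),  # Studio District
    'radius': 300,
    'limit': 500,
}
url = 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)
print(url)

# Round-trip check: the query string decodes back to the same values
decoded = parse_qs(urlsplit(url).query)
print(decoded['ll'])
```

With `requests`, the same dictionary can also be passed directly as `requests.get(base_url, params=params)`, which handles the encoding itself.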

In [66]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [67]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON
# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)
nearby_venues.head()


Unnamed: 0,venue.name,venue.categories,venue.location.lat,venue.location.lng
0,Ed's Real Scoop,Ice Cream Shop,43.660656,-79.342019
1,Leslieville Pumps,Sandwich Place,43.660892,-79.340626
2,Queen Books,Bookstore,43.660651,-79.342267
3,The Bone House,Pet Store,43.660894,-79.341097
4,Hooked,Fish Market,43.660407,-79.343257


In [70]:
# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

In [75]:
nearby_venues.tail()

Unnamed: 0,name,categories,lat,lng
40,Thunder Thighs Costumes,Clothing Store,43.661253,-79.341826
41,Hone Fitness,Gym,43.661561,-79.3401
42,Boston Variety,Grocery Store,43.66143,-79.338743
43,Pizza Thick,Pizza Place,43.661464,-79.338656
44,Sprouts,Playground,43.662031,-79.340157


Check how many venues were returned by Foursquare

In [73]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

45 venues were returned by Foursquare.


In [79]:
def getNearbyVenues(names, latitudes, longitudes, radius=300):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&oauth_token={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET,
            ACCESS_TOKEN,
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
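The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, which would raise an `IndexError` for a venue that comes back with an empty category list. Whether the API actually returns such venues is an assumption here, but `get_category_type` above already guards against it. The sketch below separates the flattening step into a hypothetical helper, `items_to_rows`, and runs it on a made-up payload with the same nesting.

```python
def items_to_rows(name, lat, lng, items):
    """Turn a list of Foursquare-style items into flat tuples for one neighborhood."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Guard against venues that come back without any category
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows


# Hypothetical payload with the same nesting that getNearbyVenues reads
sample_items = [
    {'venue': {'name': 'Glen Stewart Park',
               'location': {'lat': 43.675278, 'lng': -79.294647},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Uncategorised Place',
               'location': {'lat': 43.676, 'lng': -79.293},
               'categories': []}},
]
rows = items_to_rows('The Beaches', 43.676357, -79.293031, sample_items)
for row in rows:
    print(row)
```

Keeping the network call and the flattening apart also makes the second half testable without an API key.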

Run the function on each neighborhood to create a new data frame

In [80]:
#To explore all East Toronto neighbourhoods
East_Toronto_venues = getNearbyVenues(names=East_T['Neighborhood_joined'],
                                   latitudes=East_T['Latitude'],
                                   longitudes=East_T['Longitude']
                                  )

The Beaches
The Danforth West, Riverdale
India Bazaar, The Beaches West
Studio District
Business reply mail Processing Centre, South Central Letter Processing Plant Toronto


In [81]:
#Check the size of the dataframe
print(East_Toronto_venues.shape)
East_Toronto_venues.head()

(138, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.676357,-79.293031,Glen Stewart Park,43.675278,-79.294647,Park
2,The Beaches,43.676357,-79.293031,Balmy Beach Playground,43.676078,-79.290805,Playground
3,The Beaches,43.676357,-79.293031,Best Bathtub in the Beaches,43.674591,-79.293602,Spa
4,"The Danforth West, Riverdale",43.679557,-79.352188,MenEssentials,43.67782,-79.351265,Cosmetics Shop


Analyze each neighborhood

In [82]:
# one hot encoding
East_Toronto_onehot = pd.get_dummies(East_Toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
East_Toronto_onehot['Neighborhood'] = East_Toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [East_Toronto_onehot.columns[-1]] + list(East_Toronto_onehot.columns[:-1])
East_Toronto_onehot = East_Toronto_onehot[fixed_columns]

East_Toronto_onehot.head()

Unnamed: 0,Neighborhood,ATM,American Restaurant,Auto Workshop,Bakery,Bank,Bar,Board Shop,Bookstore,Breakfast Spot,...,Snack Place,Spa,Sports Bar,Sushi Restaurant,Thai Restaurant,Theater,Tibetan Restaurant,Toy / Game Store,Trail,Yoga Studio
0,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
1,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,The Beaches,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
4,"The Danforth West, Riverdale",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Let's group the data by neighborhoods using the mean

In [83]:
East_Toronto_grouped = East_Toronto_onehot.groupby('Neighborhood').mean().reset_index()
East_Toronto_grouped

Unnamed: 0,Neighborhood,ATM,American Restaurant,Auto Workshop,Bakery,Bank,Bar,Board Shop,Bookstore,Breakfast Spot,...,Snack Place,Spa,Sports Bar,Sushi Restaurant,Thai Restaurant,Theater,Tibetan Restaurant,Toy / Game Store,Trail,Yoga Studio
0,"Business reply mail Processing Centre, South C...",0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"India Bazaar, The Beaches West",0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,...,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0
2,Studio District,0.022222,0.022222,0.0,0.022222,0.022222,0.022222,0.0,0.022222,0.0,...,0.0,0.022222,0.0,0.022222,0.022222,0.022222,0.0,0.022222,0.0,0.022222
3,The Beaches,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0
4,"The Danforth West, Riverdale",0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.032258,0.016129,...,0.016129,0.032258,0.016129,0.032258,0.032258,0.0,0.016129,0.0,0.0,0.016129
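Taking the mean of one-hot columns per neighborhood gives each category's share of that neighborhood's venues. The same numbers can be reproduced by simple counting, as in this sketch with `collections.Counter`. The four rows for The Beaches come from the venue table above, and `Example Area` is a made-up neighborhood added for contrast.

```python
from collections import Counter, defaultdict

# (neighborhood, venue category) pairs
venues = [
    ('The Beaches', 'Trail'),
    ('The Beaches', 'Park'),
    ('The Beaches', 'Playground'),
    ('The Beaches', 'Spa'),
    ('Example Area', 'Park'),  # hypothetical neighborhood
    ('Example Area', 'Park'),
]

counts = defaultdict(Counter)
for neighborhood, category in venues:
    counts[neighborhood][category] += 1

# The mean of a 0/1 indicator column equals the category's share of the venues
frequencies = {
    neighborhood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
    for neighborhood, counter in counts.items()
}
print(frequencies['The Beaches'])
```

Each of the four categories in The Beaches gets a share of 0.25, matching the `Spa` and `Trail` columns in the grouped table.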


In [88]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [97]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = East_Toronto_grouped['Neighborhood']

for ind in np.arange(East_Toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(East_Toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Business reply mail Processing Centre, South C...",Garden,Brewery,Light Rail Station,Farmers Market,Fast Food Restaurant,Park,Auto Workshop,Fried Chicken Joint,Food & Drink Shop,Flower Shop
1,"India Bazaar, The Beaches West",Fast Food Restaurant,Rental Car Location,Pub,Light Rail Station,Liquor Store,Intersection,Movie Theater,Nightlife Spot,Ice Cream Shop,Park
2,Studio District,Coffee Shop,Italian Restaurant,Café,Clothing Store,Diner,Ice Cream Shop,Gym,Grocery Store,Gay Bar,Gastropub
3,The Beaches,Park,Trail,Playground,Spa,Farmers Market,Convenience Store,Cosmetics Shop,Cycle Studio,Dessert Shop,Diner
4,"The Danforth West, Riverdale",Greek Restaurant,Spa,Italian Restaurant,Ice Cream Shop,Gym / Fitness Center,Cosmetics Shop,Restaurant,Juice Bar,Bookstore,Sushi Restaurant
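The column labels above rely on a three-item suffix list with a fallback to `th`, which works up to 20 but would label the 21st column `21th`. A small sketch of a more general approach follows. The helper names `ordinal` and `top_categories` and the sample frequencies are assumptions for illustration. Ties are broken alphabetically here, which is a choice of this sketch and not something the notebook's `sort_values` call guarantees.

```python
def ordinal(n):
    """Return 1 -> '1st', 2 -> '2nd', 11 -> '11th', 22 -> '22nd', and so on."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th and 13th break the usual pattern
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


def top_categories(frequencies, n):
    """Return the n most frequent categories; ties are broken alphabetically."""
    ranked = sorted(frequencies.items(), key=lambda kv: (-kv[1], kv[0]))
    return [category for category, _ in ranked[:n]]


# Hypothetical frequencies for one neighborhood
freqs = {'Coffee Shop': 0.10, 'Diner': 0.05, 'Bakery': 0.05, 'Gym': 0.02}
labels = ['{} Most Common Venue'.format(ordinal(i + 1)) for i in range(3)]
top_three = top_categories(freqs, 3)
print(dict(zip(labels, top_three)))
```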


Cluster Neighborhoods

In [98]:
# import k-means from clustering stage
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 5

Toronto_grouped_clustering = East_Toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 3, 4, 1, 0], dtype=int32)
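Note that the five labels above are all different: with `kclusters = 5` and only five neighborhoods, every neighborhood ends up in a cluster of its own. The sketch below is a bare-bones version of the two alternating k-means steps in plain Python, run on made-up two-dimensional frequency vectors. It is not scikit-learn's `KMeans`, which is more elaborate, and seeding the centroids with the first `k` points is an arbitrary choice for the example.

```python
def kmeans(points, k, iterations=10):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid
        for i, p in enumerate(points):
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            labels[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical two-category frequency vectors for four neighborhoods
points = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.2), (0.2, 0.8)]
two_clusters = kmeans(points, k=2)
one_per_point = kmeans(points, k=len(points))
print(two_clusters)   # similar rows share a label
print(one_per_point)  # k equal to the number of rows: every row is its own cluster
```

A smaller `k` than the number of neighborhoods is needed before the clusters say anything about similarity.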

In [101]:
# add clustering labels (skipped if this cell has already been run once)
if 'Cluster Labels' not in neighborhoods_venues_sorted.columns:
    neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Toronto_merged = East_T

# join the sorted venues onto East_T so that each neighborhood keeps its latitude/longitude
Toronto_merged = Toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood_joined')

Toronto_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighborhood_joined,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,1,Park,Trail,Playground,Spa,Farmers Market,Convenience Store,Cosmetics Shop,Cycle Studio,Dessert Shop,Diner
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,0,Greek Restaurant,Spa,Italian Restaurant,Ice Cream Shop,Gym / Fitness Center,Cosmetics Shop,Restaurant,Juice Bar,Bookstore,Sushi Restaurant
2,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572,3,Fast Food Restaurant,Rental Car Location,Pub,Light Rail Station,Liquor Store,Intersection,Movie Theater,Nightlife Spot,Ice Cream Shop,Park
3,M4M,East Toronto,Studio District,43.659526,-79.340923,4,Coffee Shop,Italian Restaurant,Café,Clothing Store,Diner,Ice Cream Shop,Gym,Grocery Store,Gay Bar,Gastropub
4,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558,2,Garden,Brewery,Light Rail Station,Farmers Market,Fast Food Restaurant,Park,Auto Workshop,Fried Chicken Joint,Food & Drink Shop,Flower Shop


In [103]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Toronto_merged['Latitude'], Toronto_merged['Longitude'], Toronto_merged['Neighborhood_joined'], Toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
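The marker colour above is looked up as `rainbow[cluster-1]`, so cluster 0 picks the last colour through negative indexing. That still gives every cluster a distinct colour, but indexing by the label itself is easier to read. Here is a minimal sketch of that idea using the standard library's `colorsys` in place of matplotlib's `cm.rainbow`; the helper name `cluster_colours` is made up for the example.

```python
import colorsys


def cluster_colours(k):
    """Return k evenly spaced hues as '#rrggbb' strings, one per label 0..k-1."""
    colours = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 1.0, 1.0)
        colours.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colours


palette = cluster_colours(5)
labels = [2, 3, 4, 1, 0]  # the k-means labels printed earlier
# Index by the label itself, so cluster 0 maps to the first colour
marker_colours = [palette[label] for label in labels]
print(marker_colours)
```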