# Applied Data Science Capstone course
## Segmenting and Clustering Neighborhoods in Toronto

Norma Ruiz - May 2019

### Table of Contents
1. Download and Clean Neighbourhoods data of Toronto
2. Add geographical coordinates to Toronto pandas dataframe
3. Explore and Cluster the Neighbourhoods in Toronto

Import libraries

In [1]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
import matplotlib.cm as cm
import matplotlib.colors as colors
import io
print('Libraries imported')

Libraries imported


### 1. Download and Clean Neighbourhoods data of Toronto

#### Scrape wikipedia postal codes page

In [2]:
website_url = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").text
soup = BeautifulSoup(website_url,'lxml')
#print(soup.prettify())

#### Extract HTML table

In [3]:
postal_table = soup.find('table',{'class':'wikitable sortable'})
#postal_table

#### Extract headings

In [4]:
head = postal_table.find_all('th')
headings = [th.text.strip() for th in head]
print(headings)
newpostal_df = pd.DataFrame(columns=headings)

['Postcode', 'Borough', 'Neighbourhood']


#### Extract table data

In [5]:
content = postal_table.find_all('td')
values = [td.text.strip() for td in content]
print(values)
print('Type: ',type(values))
print('Count: ',len(values))

['M1A', 'Not assigned', 'Not assigned', 'M2A', 'Not assigned', 'Not assigned', 'M3A', 'North York', 'Parkwoods', 'M4A', 'North York', 'Victoria Village', 'M5A', 'Downtown Toronto', 'Harbourfront', 'M5A', 'Downtown Toronto', 'Regent Park', 'M6A', 'North York', 'Lawrence Heights', 'M6A', 'North York', 'Lawrence Manor', 'M7A', "Queen's Park", 'Not assigned', 'M8A', 'Not assigned', 'Not assigned', 'M9A', 'Etobicoke', 'Islington Avenue', 'M1B', 'Scarborough', 'Rouge', 'M1B', 'Scarborough', 'Malvern', 'M2B', 'Not assigned', 'Not assigned', 'M3B', 'North York', 'Don Mills North', 'M4B', 'East York', 'Woodbine Gardens', 'M4B', 'East York', 'Parkview Hill', 'M5B', 'Downtown Toronto', 'Ryerson', 'M5B', 'Downtown Toronto', 'Garden District', 'M6B', 'North York', 'Glencairn', 'M7B', 'Not assigned', 'Not assigned', 'M8B', 'Not assigned', 'Not assigned', 'M9B', 'Etobicoke', 'Cloverdale', 'M9B', 'Etobicoke', 'Islington', 'M9B', 'Etobicoke', 'Martin Grove', 'M9B', 'Etobicoke', 'Princess Gardens', 'M9B

#### Convert to dataframe

Rows whose Borough is "Not assigned" are omitted.
If a Neighbourhood is "Not assigned", it takes the Borough name.

In [6]:
j = 0
for i in range(0,len(values),3):
    Postcode = values[i]
    Borough = values[i+1]
    Neighbourhood = values[i+2]
    if Borough != "Not assigned":
        if Neighbourhood == "Not assigned":
            Neighbourhood = Borough
        newpostal_df.loc[j] = [Postcode,Borough,Neighbourhood]
        j += 1
    
newpostal_df.sort_values(by=['Postcode'], inplace=True) 
newpostal_df.reset_index(inplace=True) 
newpostal_df.drop('index', axis=1, inplace=True)
print(newpostal_df.shape)
newpostal_df.head(20)

(211, 3)


Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,Rouge
1,M1B,Scarborough,Malvern
2,M1C,Scarborough,Port Union
3,M1C,Scarborough,Rouge Hill
4,M1C,Scarborough,Highland Creek
5,M1E,Scarborough,Guildwood
6,M1E,Scarborough,Morningside
7,M1E,Scarborough,West Hill
8,M1G,Scarborough,Woburn
9,M1H,Scarborough,Cedarbrae
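
The two cleaning rules above can also be applied without a row-by-row loop. The following is a minimal sketch, not the notebook's method: `sample_values` and `build_postal_df` are hypothetical names, and the sample list only imitates the flat layout of `values`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample in the same flat layout as `values`:
# [postcode, borough, neighbourhood, postcode, borough, neighbourhood, ...]
sample_values = [
    'M1A', 'Not assigned', 'Not assigned',
    'M3A', 'North York', 'Parkwoods',
    'M7A', "Queen's Park", 'Not assigned',
    'M5A', 'Downtown Toronto', 'Harbourfront',
]

def build_postal_df(flat_values, columns=('Postcode', 'Borough', 'Neighbourhood')):
    """Reshape a flat list of cells into rows and apply the two cleaning rules."""
    rows = np.array(flat_values, dtype=object).reshape(-1, len(columns))
    df = pd.DataFrame(rows, columns=list(columns))
    # Rule 1: drop rows whose Borough is "Not assigned"
    df = df[df['Borough'] != 'Not assigned'].copy()
    # Rule 2: a "Not assigned" Neighbourhood takes the Borough name
    missing = df['Neighbourhood'] == 'Not assigned'
    df.loc[missing, 'Neighbourhood'] = df.loc[missing, 'Borough']
    return df.sort_values('Postcode').reset_index(drop=True)

sample_df = build_postal_df(sample_values)
print(sample_df)
```

Building the frame in one step avoids growing a dataframe with `.loc[j] = ...` inside a loop, which tends to be slow on larger tables.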


#### Merge neighbourhoods that share a postal code

In [7]:
# Rows are sorted by Postcode, so rows with the same postal code are adjacent
last_Postcode = ""
for idx, row in newpostal_df.iterrows():
    if last_Postcode != row['Postcode']:
        # first row of a new postal code: remember where to accumulate names
        j = idx
    else:
        # same postal code as the previous row: append the name to row j, then drop this row
        newpostal_df.at[j, 'Neighbourhood'] = newpostal_df.at[j, 'Neighbourhood'] + ", " + row['Neighbourhood']
        newpostal_df.drop([idx], inplace=True)
    last_Postcode = row['Postcode']
    
newpostal_df.head(15)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
2,M1C,Scarborough,"Port Union, Rouge Hill, Highland Creek"
5,M1E,Scarborough,"Guildwood, Morningside, West Hill"
8,M1G,Scarborough,Woburn
9,M1H,Scarborough,Cedarbrae
10,M1J,Scarborough,Scarborough Village
11,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
14,M1L,Scarborough,"Golden Mile, Oakridge, Clairlea"
17,M1M,Scarborough,"Cliffcrest, Scarborough Village West, Cliffside"
20,M1N,Scarborough,"Cliffside West, Birch Cliff"
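
The same merge can be expressed as a group-and-join. This is a sketch under assumptions rather than the code used in this notebook: `rows_df` is a hypothetical sample with one row per postal code and neighbourhood pair, and each postal code is assumed to belong to a single borough.

```python
import pandas as pd

# Hypothetical sample with one row per (Postcode, Neighbourhood) pair
rows_df = pd.DataFrame({
    'Postcode': ['M1B', 'M1B', 'M1C', 'M1G'],
    'Borough': ['Scarborough'] * 4,
    'Neighbourhood': ['Rouge', 'Malvern', 'Port Union', 'Woburn'],
})

# One row per postal code; neighbourhood names are joined in their original order
merged_df = (
    rows_df
    .groupby(['Postcode', 'Borough'], sort=False)['Neighbourhood']
    .agg(', '.join)
    .reset_index()
)
print(merged_df)
```

Unlike the loop, this version leaves a clean 0..n-1 index and does not modify the dataframe while iterating over it.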


#### Dataframe shape after merging neighbourhoods by postal code

In [8]:
newpostal_df.shape

(103, 3)

### 2. Add geographical coordinates to Toronto pandas dataframe

#### Read csv file with postal codes and geographical coordinates

In [9]:
url = 'http://cocl.us/Geospatial_data'
urlData = requests.get(url).content
codes_df = pd.read_csv(io.StringIO(urlData.decode('utf-8')))
codes_df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [50]:
codes_df.shape

(103, 3)

#### Join dataframe newpostal_df with codes_df (geographical coordinates dataframe)

In [10]:
# Inner join of two dataframes
geo_df = newpostal_df.set_index('Postcode').join(codes_df.set_index('Postal Code'), how="inner")
geo_df.reset_index(inplace=True) 
geo_df.rename(columns={'index': 'Postcode'}, inplace=True)
print(geo_df.shape)
geo_df.head()

(103, 5)


Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Port Union, Rouge Hill, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
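
An inner join silently drops postal codes that appear in only one of the two dataframes. Here both have 103 rows and the result has 103 rows, so nothing was lost, but a quick check is cheap. A minimal sketch with hypothetical sample frames (`left`, `right`, and the code `M9Z` are invented for illustration):

```python
import pandas as pd

# Hypothetical samples: 'M9Z' has no coordinates, 'M1E' has no borough row
left = pd.DataFrame({
    'Postcode': ['M1B', 'M1C', 'M9Z'],
    'Borough': ['Scarborough', 'Scarborough', 'Unknown'],
})
right = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C', 'M1E'],
    'Latitude': [43.806686, 43.784535, 43.763573],
    'Longitude': [-79.194353, -79.160497, -79.188711],
})

# Outer merge with indicator=True labels each row as both / left_only / right_only
check = left.merge(right, left_on='Postcode', right_on='Postal Code',
                   how='outer', indicator=True)
unmatched = check[check['_merge'] != 'both']
print(unmatched[['Postcode', 'Postal Code', '_merge']])

# The inner merge keeps only the postal codes present on both sides
geo = left.merge(right, left_on='Postcode', right_on='Postal Code',
                 how='inner').drop(columns='Postal Code')
print(geo.shape)
```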


In [12]:
!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library


#### Visualize postal codes on a map

In [21]:
toronto_map = folium.Map(location=[43.653908, -79.384293], zoom_start=10) # generate map centred around Toronto
# add each postal code to the map as a blue circle marker
for lat, lng, label in zip(geo_df['Latitude'], geo_df['Longitude'], geo_df['Postcode']):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        fill=True,
        color='blue',
        fill_color='blue',
        fill_opacity=0.6
        ).add_to(toronto_map)
# display map
toronto_map

### 3. Explore and Cluster the Neighbourhoods in Toronto

#### Explore neighbourhoods

Get top 100 venues and their categories for all neighbourhoods in Toronto

Import libraries

In [22]:
# library to handle JSON files
import json 
# convert an address into latitude and longitude values
from geopy.geocoders import Nominatim 
# library to handle requests
import requests 
# transform JSON file into a pandas dataframe
from pandas.io.json import json_normalize 
from sklearn.cluster import KMeans
print('Libraries imported')

Libraries imported


Foursquare Credentials

In [23]:
CLIENT_ID = 'CW2HOTRKSJE2UU0D4KAOXX1WOE3E2BNTE0EQI0SUEVDROLOH' # Foursquare ID
CLIENT_SECRET = 'FYUP3CWSXYUK0SHQWSR04G130M51X551SANCCKPKEZSDRUXU' # Foursquare Secret
VERSION = '20180605' # Foursquare API version
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: CW2HOTRKSJE2UU0D4KAOXX1WOE3E2BNTE0EQI0SUEVDROLOH
CLIENT_SECRET:FYUP3CWSXYUK0SHQWSR04G130M51X551SANCCKPKEZSDRUXU
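
Hard-coding and printing API keys makes them visible to anyone who reads the notebook. One common alternative, sketched here under assumptions, is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical, not part of any Foursquare tooling.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read credentials from a mapping of environment variables (names are hypothetical)."""
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first')
    return client_id, client_secret

def mask(secret, visible=4):
    """Show only the first few characters when printing a secret."""
    return secret[:visible] + '*' * (len(secret) - visible)

# Example with a stand-in mapping instead of the real environment
demo_env = {
    'FOURSQUARE_CLIENT_ID': 'demo-id-123',
    'FOURSQUARE_CLIENT_SECRET': 'demo-secret-456',
}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print('CLIENT_ID: ' + mask(demo_id))
print('CLIENT_SECRET: ' + mask(demo_secret))
```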


Toronto geographical coordinates

In [24]:
address = 'Toronto, Ontario'
geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(location, latitude, longitude)

Toronto, Ontario, M6K 1X9, Canada 43.653963 -79.387207


Function extracts the category of a venue

In [25]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Function extracts venues/categories for all neighbourhoods in Toronto

In [26]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    return(nearby_venues)
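
`getNearbyVenues` indexes `v['venue']['categories'][0]` directly, which would raise an `IndexError` for a venue that comes back without a category. The sketch below separates the parsing step so that case yields `None` instead. `items_to_rows` and `sample_items` are hypothetical; the sample payload only mimics the structure the function above indexes into, and whether the API actually returns empty category lists is an assumption.

```python
import pandas as pd

VENUE_COLUMNS = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude',
                 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']

def items_to_rows(name, lat, lng, items):
    """Turn a list of 'items' entries into row tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or []
        # Use the first category name if there is one, otherwise None
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue.get('name'),
                     location.get('lat'), location.get('lng'), category))
    return rows

# Hypothetical payload shaped like response['groups'][0]['items']
sample_items = [
    {'venue': {'name': "Wendy's",
               'location': {'lat': 43.807448, 'lng': -79.199056},
               'categories': [{'name': 'Fast Food Restaurant'}]}},
    {'venue': {'name': 'Unlabelled Venue',
               'location': {'lat': 43.8, 'lng': -79.2},
               'categories': []}},
]

sample_venues = pd.DataFrame(
    items_to_rows('M1B', 43.806686, -79.194353, sample_items),
    columns=VENUE_COLUMNS,
)
print(sample_venues)
```

Keeping the parsing separate from the HTTP request also makes it possible to check the logic without calling the API.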

Call getNearbyVenues function

In [27]:
radius = 500
LIMIT = 100
toronto_venues = getNearbyVenues(names=geo_df['Postcode'],
                                   latitudes=geo_df['Latitude'],
                                   longitudes=geo_df['Longitude']
                                  )

M1B
M1C
M1E
M1G
M1H
M1J
M1K
M1L
M1M
M1N
M1P
M1R
M1S
M1T
M1V
M1W
M1X
M2H
M2J
M2K
M2L
M2M
M2N
M2P
M2R
M3A
M3B
M3C
M3H
M3J
M3K
M3L
M3M
M3N
M4A
M4B
M4C
M4E
M4G
M4H
M4J
M4K
M4L
M4M
M4N
M4P
M4R
M4S
M4T
M4V
M4W
M4X
M4Y
M5A
M5B
M5C
M5E
M5G
M5H
M5J
M5K
M5L
M5M
M5N
M5P
M5R
M5S
M5T
M5V
M5W
M5X
M6A
M6B
M6C
M6E
M6G
M6H
M6J
M6K
M6L
M6M
M6N
M6P
M6R
M6S
M7A
M7R
M7Y
M8V
M8W
M8X
M8Y
M8Z
M9A
M9B
M9C
M9L
M9M
M9N
M9P
M9R
M9V
M9W


toronto_venues dataframe contains venues for postal codes in Toronto

In [28]:
print(toronto_venues.shape)
toronto_venues.head()

(2244, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,M1B,43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant
1,M1C,43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
2,M1E,43.763573,-79.188711,Swiss Chalet Rotisserie & Grill,43.767697,-79.189914,Pizza Place
3,M1E,43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store
4,M1E,43.763573,-79.188711,Marina Spa,43.766,-79.191,Spa


#### Keep only neighbourhoods with more than 10 venues

In [29]:
toronto_venues10 = toronto_venues.groupby('Neighborhood').filter(lambda x : len(x)>10)
toronto_venues10.shape

(1995, 7)

The filter keeps 1995 of the original 2244 venues.
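
The `groupby(...).filter(...)` call runs a Python lambda once per group. An equivalent boolean mask can be built with `transform('size')`; the sketch below uses a hypothetical toy dataframe and a threshold of 2 instead of 10 to keep the example small.

```python
import pandas as pd

# Hypothetical toy data: A has 3 venues, B has 1, C has 2
venues = pd.DataFrame({
    'Neighborhood': ['A'] * 3 + ['B'] * 1 + ['C'] * 2,
    'Venue': list('abcdef'),
})

min_venues = 2  # keep groups with MORE than this many venues (the notebook uses 10)

# Size of each row's group, aligned with the original rows
group_sizes = venues.groupby('Neighborhood')['Venue'].transform('size')
kept = venues[group_sizes > min_venues]

# Same result as the filter-based version used above
same = venues.groupby('Neighborhood').filter(lambda g: len(g) > min_venues)
print(kept)
```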

#### Number of venues per neighbourhood

In [30]:
toronto_venues10.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
M1W,13,13,13,13,13,13
M2J,66,66,66,66,66,66
M2N,34,34,34,34,34,34
M3C,21,21,21,21,21,21
M3H,19,19,19,19,19,19
M4B,13,13,13,13,13,13
M4G,33,33,33,33,33,33
M4H,17,17,17,17,17,17
M4K,44,44,44,44,44,44
M4L,22,22,22,22,22,22


#### Unique categories

In [31]:
print('There are {} unique categories.'.format(len(toronto_venues10['Venue Category'].unique())))

There are 249 unique categories.


#### Dummy coding - Categories

Let's dummy-code the venue categories to get one column per category (249 in total).
Include the neighbourhood column as the first column of the new dataframe (toronto_onehot).

In [32]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues10[['Venue Category']], prefix="", prefix_sep="")
# add neighborhood column back to dataframe
toronto_onehot['Nborhood'] = toronto_venues10['Neighborhood'] 
# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]
toronto_onehot.head()

Unnamed: 0,Nborhood,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
72,M1W,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
73,M1W,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
74,M1W,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
75,M1W,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
76,M1W,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


#### Group dataframe by Nborhood using the mean of each venue category

In [33]:
toronto_grouped = toronto_onehot.groupby('Nborhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Nborhood,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,M1W,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M2J,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.015152,0.0,0.0,0.0,0.0,0.015152,0.015152,0.0
2,M2N,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0
3,M3C,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,M3H,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0
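
Taking the mean of one-hot columns per group gives the share of each category among that group's venues. A small sketch with hypothetical toy data shows that this matches a row-normalised cross-tabulation:

```python
import pandas as pd

# Hypothetical toy data: M1W has two grocery stores and one pharmacy
venues = pd.DataFrame({
    'Neighborhood': ['M1W', 'M1W', 'M1W', 'M2J'],
    'Venue Category': ['Grocery Store', 'Grocery Store', 'Pharmacy', 'Coffee Shop'],
})

# One-hot encode, then average per neighbourhood (as in the cells above)
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Nborhood', venues['Neighborhood'])
by_mean = onehot.groupby('Nborhood').mean()

# Same frequencies from a cross-tabulation normalised over each row
by_crosstab = pd.crosstab(venues['Neighborhood'], venues['Venue Category'],
                          normalize='index')

print(by_mean)
print(by_crosstab)
```

Each row sums to 1, so every neighbourhood is described by a frequency profile that is independent of how many venues it has.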


Function to return the most frequent categories per neighbourhood

In [34]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

#### Build new dataframe with the top 10 categories for each neighbourhood

In [35]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']
# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))
# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Nborhood']
for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1W,Grocery Store,Chinese Restaurant,Fast Food Restaurant,Japanese Restaurant,Pizza Place,Coffee Shop,Sandwich Place,Breakfast Spot,Electronics Store,Pharmacy
1,M2J,Clothing Store,Fast Food Restaurant,Coffee Shop,Bus Station,Tea Room,Bakery,Asian Restaurant,Toy / Game Store,Food Court,Japanese Restaurant
2,M2N,Coffee Shop,Ramen Restaurant,Café,Sandwich Place,Restaurant,Sushi Restaurant,Discount Store,Japanese Restaurant,Indonesian Restaurant,Steakhouse
3,M3C,Coffee Shop,Asian Restaurant,Gym,Beer Store,General Entertainment,Italian Restaurant,Japanese Restaurant,Fast Food Restaurant,Discount Store,Dim Sum Restaurant
4,M3H,Coffee Shop,Fried Chicken Joint,Pet Store,Pharmacy,Restaurant,Middle Eastern Restaurant,Deli / Bodega,Bridal Shop,Sandwich Place,Diner
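
The `try`/`except` around `indicators[ind]` produces correct labels up to 10 columns, but it would label positions such as 21 as "21th". A small helper, shown here as a hypothetical sketch (`ordinal` is not part of the notebook), builds the suffix explicitly:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th and the rest of the teens always take 'th'
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i))
    for i in range(1, num_top_venues + 1)
]
print(columns)
```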


In [36]:
neighborhoods_venues_sorted.shape

(44, 11)

#### Cluster neighbourhoods

Use KMeans to cluster the toronto_grouped dataframe.
Drop the Nborhood column because KMeans only accepts numerical data.

After trying different numbers of clusters, we decided to use 6 clusters.

In [37]:
# set number of clusters
kclusters = 6
toronto_grouped_clustering = toronto_grouped.drop('Nborhood', axis=1)
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)
# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 3, 3, 3, 3, 0, 3, 1, 3, 1, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 1, 3, 3, 3, 4, 3, 3, 2, 5, 3, 3, 3, 3, 3, 3, 3, 1, 0, 1], dtype=int32)
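
The notebook does not say how the candidate cluster counts were compared. One common way, sketched here as an assumption and not as the procedure actually followed, is to fit KMeans for several values of k and compare inertia and silhouette score. The `features` array is synthetic stand-in data, not `toronto_grouped_clustering`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.RandomState(0)
# Synthetic stand-in for toronto_grouped_clustering: three well-separated blobs
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
features = np.vstack([c + 0.3 * rng.randn(20, 2) for c in centers])

scores = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(features)
    # inertia_: within-cluster sum of squares (lower is tighter)
    # silhouette: separation between clusters, from -1 to 1 (higher is better)
    scores[k] = (model.inertia_, silhouette_score(features, model.labels_))

for k, (inertia, silhouette) in scores.items():
    print(k, round(inertia, 2), round(silhouette, 3))

best_k = max(scores, key=lambda k: scores[k][1])
print('k with the highest silhouette score:', best_k)
```

In the labels above, 32 of the 44 postal codes fall in one cluster and three clusters contain a single postal code, so a comparison like this may be worth running on the real data.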

Add cluster number to neighborhoods_venues_sorted dataframe.
Merge the dataframe with geo_df to obtain the geographical coordinates of neighbourhoods.
Use inner join since we filtered neighbourhoods to keep only those with more than 10 venues.

In [38]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
toronto_merged = geo_df
# join geo_df with neighborhoods_venues_sorted to add borough, neighbourhood names and latitude/longitude for each postal code
toronto_merged = toronto_merged.set_index('Postcode').join(neighborhoods_venues_sorted.set_index('Neighborhood'), how="inner")
toronto_merged.reset_index(inplace=True)
toronto_merged.head() 

Unnamed: 0,index,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1W,Scarborough,L'Amoreaux West,43.799525,-79.318389,1,Grocery Store,Chinese Restaurant,Fast Food Restaurant,Japanese Restaurant,Pizza Place,Coffee Shop,Sandwich Place,Breakfast Spot,Electronics Store,Pharmacy
1,M2J,North York,"Fairview, Oriole, Henry Farm",43.778517,-79.346556,3,Clothing Store,Fast Food Restaurant,Coffee Shop,Bus Station,Tea Room,Bakery,Asian Restaurant,Toy / Game Store,Food Court,Japanese Restaurant
2,M2N,North York,Willowdale South,43.77012,-79.408493,3,Coffee Shop,Ramen Restaurant,Café,Sandwich Place,Restaurant,Sushi Restaurant,Discount Store,Japanese Restaurant,Indonesian Restaurant,Steakhouse
3,M3C,North York,"Don Mills South, Flemingdon Park",43.7259,-79.340923,3,Coffee Shop,Asian Restaurant,Gym,Beer Store,General Entertainment,Italian Restaurant,Japanese Restaurant,Fast Food Restaurant,Discount Store,Dim Sum Restaurant
4,M3H,North York,"Wilson Heights, Downsview North, Bathurst Manor",43.754328,-79.442259,3,Coffee Shop,Fried Chicken Joint,Pet Store,Pharmacy,Restaurant,Middle Eastern Restaurant,Deli / Bodega,Bridal Shop,Sandwich Place,Diner


#### Display map with the neighbourhoods colored by the cluster to which they belong

In [39]:
#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

In [40]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)
# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['index'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
map_clusters
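
The map code indexes the palette with `rainbow[cluster-1]`. Because the KMeans labels start at 0, label 0 wraps around to the last color through negative indexing; the colors are still distinct, but the order is shifted. A minimal sketch of a direct label-to-color mapping (`palette` and `color_for` are hypothetical names):

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 6

# One evenly spaced rainbow color per cluster label, as hex strings
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# Map each label 0..kclusters-1 to its own color, without the -1 offset
color_for = {label: palette[label] for label in range(kclusters)}
print(color_for)
```

Passing `color_for[cluster]` as `color` and `fill_color` would then give label 0 the first color of the palette.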

#### Cluster 1

In [41]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,East York,0,Pizza Place,Fast Food Restaurant,Gym / Fitness Center,Gastropub,Pharmacy,Rock Climbing Spot,Breakfast Spot,Bank,Athletics & Sports,Intersection
42,Etobicoke,0,Pizza Place,Fried Chicken Joint,Pet Store,Pharmacy,Restaurant,Café,Sandwich Place,Mexican Restaurant,Seafood Restaurant,Fast Food Restaurant


#### Cluster 2

In [42]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Scarborough,1,Grocery Store,Chinese Restaurant,Fast Food Restaurant,Japanese Restaurant,Pizza Place,Coffee Shop,Sandwich Place,Breakfast Spot,Electronics Store,Pharmacy
7,East York,1,Indian Restaurant,Sandwich Place,Yoga Studio,Supermarket,Grocery Store,Gym / Fitness Center,Housing Development,Liquor Store,Park,Pizza Place
9,East Toronto,1,Park,Sandwich Place,Coffee Shop,Pub,Brewery,Liquor Store,Light Rail Station,Burger Joint,Fast Food Restaurant,Burrito Place
13,Central Toronto,1,Coffee Shop,Pub,Pizza Place,Convenience Store,Supermarket,Bagel Shop,Fried Chicken Joint,Sports Bar,Sushi Restaurant,American Restaurant
25,North York,1,Fast Food Restaurant,Coffee Shop,Italian Restaurant,Pharmacy,Thai Restaurant,Liquor Store,Sandwich Place,Juice Bar,Butcher,Restaurant
41,East Toronto,1,Light Rail Station,Yoga Studio,Auto Workshop,Garden Center,Garden,Fast Food Restaurant,Farmers Market,Comic Shop,Park,Recording Studio
43,Etobicoke,1,Grocery Store,Convenience Store,Supplement Shop,Sandwich Place,Bakery,Discount Store,Thrift / Vintage Store,Burger Joint,Burrito Place,Gym


#### Cluster 3

In [43]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,North York,2,Furniture / Home Store,Clothing Store,Accessories Store,Gift Shop,Boutique,Miscellaneous Shop,Arts & Crafts Store,Event Space,Vietnamese Restaurant,Women's Store


#### Cluster 4

In [44]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,3,Clothing Store,Fast Food Restaurant,Coffee Shop,Bus Station,Tea Room,Bakery,Asian Restaurant,Toy / Game Store,Food Court,Japanese Restaurant
2,North York,3,Coffee Shop,Ramen Restaurant,Café,Sandwich Place,Restaurant,Sushi Restaurant,Discount Store,Japanese Restaurant,Indonesian Restaurant,Steakhouse
3,North York,3,Coffee Shop,Asian Restaurant,Gym,Beer Store,General Entertainment,Italian Restaurant,Japanese Restaurant,Fast Food Restaurant,Discount Store,Dim Sum Restaurant
4,North York,3,Coffee Shop,Fried Chicken Joint,Pet Store,Pharmacy,Restaurant,Middle Eastern Restaurant,Deli / Bodega,Bridal Shop,Sandwich Place,Diner
6,East York,3,Sporting Goods Shop,Coffee Shop,Burger Joint,Furniture / Home Store,Smoothie Shop,Restaurant,Dessert Shop,Bank,Fish & Chips Shop,Sports Bar
8,East Toronto,3,Greek Restaurant,Coffee Shop,Ice Cream Shop,Bookstore,Italian Restaurant,Furniture / Home Store,Japanese Restaurant,Dessert Shop,Caribbean Restaurant,Bakery
10,East Toronto,3,Café,Coffee Shop,Bakery,Italian Restaurant,American Restaurant,Middle Eastern Restaurant,Stationery Store,Fish Market,Latin American Restaurant,Bookstore
11,Central Toronto,3,Clothing Store,Coffee Shop,Sporting Goods Shop,Gym / Fitness Center,Furniture / Home Store,Fast Food Restaurant,Diner,Metro Station,Mexican Restaurant,Dessert Shop
12,Central Toronto,3,Sandwich Place,Pizza Place,Dessert Shop,Coffee Shop,Italian Restaurant,Café,Sushi Restaurant,Pharmacy,Brewery,Restaurant
14,Downtown Toronto,3,Coffee Shop,Bakery,Restaurant,Italian Restaurant,Café,Pub,Market,Pizza Place,Pharmacy,Breakfast Spot


#### Cluster 5

In [45]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
29,Downtown Toronto,4,Airport Service,Airport Lounge,Airport Terminal,Plane,Sculpture Garden,Boutique,Boat or Ferry,Harbor / Marina,Airport Gate,Airport


#### Cluster 6

In [46]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 5, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
33,Downtown Toronto,5,Grocery Store,Café,Park,Nightclub,Diner,Baby Store,Italian Restaurant,Restaurant,Athletics & Sports,Convenience Store
