                          

<h1 align=center><font size = 5>Segmenting and Clustering the Neighborhoods in Toronto</font></h1>

## Introduction

This project works on the segmentating and clustering of the Neighborhoods of the city of Toronto. The neighorhood names are extracted from "List of postal codes of Canada: M" in Wikipedia (https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M). 
The Foursquare API was used to find the information about venues in postcode areas. With K-Means clustering methodology, the postcode areas were grouped based on the top 5 most frequent venues in each postcode.


## Table of Contents

1. <a href="#item1">Extract the List of Neighborhoods and Process the Data </a>
2. <a href="#item2">Get the Latitude and Longitude of the Neighborhoods</a>  
3. <a href="#item3">Clustering the Postcode Areas According to their Top 5 Most Frequent Venues</a>  

## Part 1: Extract the List of Neighborhoods and Process the Data

In [1]:
# import packages

from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np

In [2]:
# scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
# store the content of the webpage into a string

url_can = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
str_can = requests.get(url_can).text 
type(str_can)

str

In [3]:
#transform the str to html

html_can = BeautifulSoup(str_can, 'lxml') 
type(html_can)

bs4.BeautifulSoup


Now, extract and save targeted data into a dataframe 


In [4]:

# extract the neighborhoods data into table

neighborhoods = html_can.find('table', class_ = 'wikitable')

In [5]:
# extract all tags whose .string matches 'tr' 

neigh = neighborhoods.find_all('tr')

In [6]:
# notice the head from the above printout:
#<th>Postcode</th>
#<th>Borough</th>
#<th>Neighbourhood
#</th>

# define a new Dataframe
heads = ['Postcode','Borough','Neighborhood']
df_can = pd.DataFrame(columns = heads)

# extract each row for ('Postcode', 'Borough', 'Neighbourhood') from the table
# then split the row into three strings
# finally, attach the three strings to the Dataframe

for row in neigh:
    info = row.text.split('\n')[1:-1] 
    pc = info[0]
    br = info[1]
    nbhd= info[2]
    df_can = df_can.append({'Postcode': pc,'Borough': br,'Neighborhood': nbhd}, ignore_index=True)

df_can.head()

Unnamed: 0,Postcode,Borough,Neighborhood
0,Postcode,Borough,Neighbourhood
1,M1A,Not assigned,Not assigned
2,M2A,Not assigned,Not assigned
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village


In [7]:
#drop the first row, which is not real data but just the original column heads on the website

df_can = df_can.iloc[1:]
df_can.head()

Unnamed: 0,Postcode,Borough,Neighborhood
1,M1A,Not assigned,Not assigned
2,M2A,Not assigned,Not assigned
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,Harbourfront


To clean the data, I conducted three procedures:
first, delete all the rows without Borough;
second, for those rest rows without Neighborhood, I assign Borough to Neighborhood;
third, I combined the rows with the same postcode

In [8]:
# drop those data with Borough = "Not assigned"

no_br_index = df_can.index[df_can['Borough'] == 'Not assigned'] # extract the index of rows without "Borough"
df_can.drop(no_br_index, inplace=True) # filter the Dataframe

df_can.shape

(211, 3)

In [9]:
# assign "Borough" to "Neighborhood", if "Borough" was not assigned before

#no_nbhd_index = df_can.index[df_can['Neighborhood'] == 'Not assigned'] 

#for i in no_nbhd_index:
    # df_can['Neighborhood'][i] = df_can['Borough'][i]
        
for i, row in df_can.iterrows():
    if row['Neighborhood'] == 'Not assigned':
        row['Neighborhood'] = row['Borough']
        
df_can.shape

(211, 3)

In [10]:
df = df_can.groupby(['Postcode', 'Borough'])['Neighborhood'].apply(list).apply(lambda x:', '.join(x)).to_frame().reset_index()

#df.columns = ['Postcode', 'Borough', 'Neighborhood']

In [11]:
df.head(20)

Unnamed: 0,Postcode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


In [12]:
print('The shape of the processed dataframe is: ',  df.shape)

The shape of the processed dataframe is:  (103, 3)


## Part 2: Get the Latitude and Longitude of the Neighborhoods

In [13]:
# import package to work with streams
import io

# extract the data containing latitude and longitude from http://cocl.us/Geospatial_data
url_geo="http://cocl.us/Geospatial_data"

# extract the data into a string
str_geo=requests.get(url_geo).content

# convert the data into dataframe
geo=pd.read_csv(io.StringIO(str_geo.decode('utf-8')))
geo.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [14]:
# rename the column 'Postal Code' to 'Postcode'
geo.rename(columns={'Postal Code':'Postcode'}, inplace=True)

# merge the dataframe based on the column 'Postcode'
geo = pd.merge(geo, df, on='Postcode')
geo.head()

Unnamed: 0,Postcode,Latitude,Longitude,Borough,Neighborhood
0,M1B,43.806686,-79.194353,Scarborough,"Rouge, Malvern"
1,M1C,43.784535,-79.160497,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,43.763573,-79.188711,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,43.770992,-79.216917,Scarborough,Woburn
4,M1H,43.773136,-79.239476,Scarborough,Cedarbrae


In [15]:
# reorder column names and show the dataframe
geo = geo[['Postcode', 'Borough', 'Neighborhood', 'Latitude', 'Longitude']]
geo.head(20)

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848


## Part 3: Clustering the Postcode Areas According to their Top 5 Most Frequent Venues

In [16]:
!conda install -c conda-forge folium=0.5.0
import folium 

Solving environment: done

# All requested packages already installed.



In [17]:
from geopy.geocoders import Nominatim 

In [18]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.653963, -79.387207.


In [19]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, postcode, neighborhood in zip(geo['Latitude'], geo['Longitude'], geo['Postcode'], geo['Neighborhood']):
    label = '{}, {}'.format(postcode, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [20]:
# Define Foursquare Credentials and Version

CLIENT_ID = '4JZQHBUQUTMHZC0CYXGFNHS5OCMHVVW3UB31LVWXJFL0JSIJ' # Foursquare ID
CLIENT_SECRET = 'PLBR52XAUYMN0J3QRUKSD3OEHJM0ZHCKPTRYT5SFOSCBMVDO' # Foursquare Secret
VERSION = '20190623' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: 4JZQHBUQUTMHZC0CYXGFNHS5OCMHVVW3UB31LVWXJFL0JSIJ
CLIENT_SECRET:PLBR52XAUYMN0J3QRUKSD3OEHJM0ZHCKPTRYT5SFOSCBMVDO


#### Create a function to explore all the neighborhoods

In [21]:
radius = 500
LIMIT =100

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    return(nearby_venues)

In [22]:
toronto_venues = getNearbyVenues(names=geo['Neighborhood'],
                                   latitudes=geo['Latitude'],
                                   longitudes=geo['Longitude'])

Rouge, Malvern
Highland Creek, Rouge Hill, Port Union
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
East Birchmount Park, Ionview, Kennedy Park
Clairlea, Golden Mile, Oakridge
Cliffcrest, Cliffside, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Scarborough Town Centre, Wexford Heights
Maryvale, Wexford
Agincourt
Clarks Corners, Sullivan, Tam O'Shanter
Agincourt North, L'Amoreaux East, Milliken, Steeles East
L'Amoreaux West
Upper Rouge
Hillcrest Village
Fairview, Henry Farm, Oriole
Bayview Village
Silver Hills, York Mills
Newtonbrook, Willowdale
Willowdale South
York Mills West
Willowdale West
Parkwoods
Don Mills North
Flemingdon Park, Don Mills South
Bathurst Manor, Downsview North, Wilson Heights
Northwood Park, York University
CFB Toronto, Downsview East
Downsview West
Downsview Central
Downsview Northwest
Victoria Village
Woodbine Gardens, Parkview Hill
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto
The Danforth West, 

In [23]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [24]:
toronto_onehot.shape

(2258, 279)

In [25]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Agincourt North, L'Amoreaux East, Milliken, St...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0
4,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [26]:
toronto_grouped.shape

(100, 279)

In [27]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
                 venue  freq
0                 Café  0.05
1          Coffee Shop  0.05
2                  Bar  0.04
3           Steakhouse  0.04
4  American Restaurant  0.04


----Agincourt----
                venue  freq
0      Sandwich Place  0.25
1              Lounge  0.25
2      Breakfast Spot  0.25
3  Chinese Restaurant  0.25
4         Yoga Studio  0.00


----Agincourt North, L'Amoreaux East, Milliken, Steeles East----
                       venue  freq
0                 Playground  0.33
1           Asian Restaurant  0.33
2                       Park  0.33
3                Yoga Studio  0.00
4  Middle Eastern Restaurant  0.00


----Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown----
            venue  freq
0   Grocery Store  0.18
1        Pharmacy  0.09
2      Beer Store  0.09
3    Liquor Store  0.09
4  Sandwich Place  0.09


----Alderwood, Long Branch----
            venue  freq
0     P

                        venue  freq
0  Construction & Landscaping   0.2
1                      Bakery   0.2
2                        Park   0.2
3               Deli / Bodega   0.2
4            Basketball Court   0.2


----East Birchmount Park, Ionview, Kennedy Park----
                venue  freq
0    Department Store  0.25
1      Discount Store  0.25
2          Playground  0.25
3         Coffee Shop  0.25
4  Miscellaneous Shop  0.00


----East Toronto----
                venue  freq
0                Park  0.33
1         Coffee Shop  0.33
2   Convenience Store  0.33
3         Yoga Studio  0.00
4  Mexican Restaurant  0.00


----Emery, Humberlea----
                             venue  freq
0                   Baseball Field   1.0
1                      Yoga Studio   0.0
2              Monument / Landmark   0.0
3  Molecular Gastronomy Restaurant   0.0
4       Modern European Restaurant   0.0


----Fairview, Henry Farm, Oriole----
                  venue  freq
0        Clothing Store  0.12

                             venue  freq
0             Fast Food Restaurant   0.5
1                       Print Shop   0.5
2                    Movie Theater   0.0
3              Monument / Landmark   0.0
4  Molecular Gastronomy Restaurant   0.0


----Runnymede, Swansea----
                venue  freq
0         Coffee Shop  0.08
1         Pizza Place  0.08
2                Café  0.08
3  Italian Restaurant  0.05
4    Sushi Restaurant  0.05


----Ryerson, Garden District----
                       venue  freq
0                Coffee Shop  0.10
1             Clothing Store  0.08
2             Cosmetics Shop  0.04
3                       Café  0.04
4  Middle Eastern Restaurant  0.03


----Scarborough Village----
                             venue  freq
0                       Playground   1.0
1                      Yoga Studio   0.0
2               Mexican Restaurant   0.0
3              Monument / Landmark   0.0
4  Molecular Gastronomy Restaurant   0.0


----Silver Hills, York Mills----
 

In [28]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [29]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,American Restaurant,Bar,Steakhouse
1,Agincourt,Lounge,Chinese Restaurant,Sandwich Place,Breakfast Spot,Diner
2,"Agincourt North, L'Amoreaux East, Milliken, St...",Park,Asian Restaurant,Playground,Women's Store,Donut Shop
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",Grocery Store,Beer Store,Fried Chicken Joint,Fast Food Restaurant,Liquor Store
4,"Alderwood, Long Branch",Pizza Place,Coffee Shop,Gym,Skating Rink,Pharmacy


In [30]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

In [31]:
# set number of clusters
kclusters = 6

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_ = kmeans.labels_.astype(int)
kmeans.labels_[0:10] 

array([0, 0, 4, 0, 0, 0, 0, 0, 0, 0])

In [32]:
#add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

neighborhoods_venues_sorted.head()

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,0,"Adelaide, King, Richmond",Coffee Shop,Café,American Restaurant,Bar,Steakhouse
1,0,Agincourt,Lounge,Chinese Restaurant,Sandwich Place,Breakfast Spot,Diner
2,4,"Agincourt North, L'Amoreaux East, Milliken, St...",Park,Asian Restaurant,Playground,Women's Store,Donut Shop
3,0,"Albion Gardens, Beaumond Heights, Humbergate, ...",Grocery Store,Beer Store,Fried Chicken Joint,Fast Food Restaurant,Liquor Store
4,0,"Alderwood, Long Branch",Pizza Place,Coffee Shop,Gym,Skating Rink,Pharmacy


In [33]:


toronto_merged = geo

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.shape
toronto_merged.head()

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353,0.0,Fast Food Restaurant,Print Shop,Deli / Bodega,Dessert Shop,Dim Sum Restaurant
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,0.0,Bar,Construction & Landscaping,Women's Store,Diner,Discount Store
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,0.0,Electronics Store,Spa,Pizza Place,Intersection,Tech Startup
3,M1G,Scarborough,Woburn,43.770992,-79.216917,0.0,Coffee Shop,Korean Restaurant,Convenience Store,Dumpling Restaurant,Diner
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,0.0,Athletics & Sports,Lounge,Hakka Restaurant,Bakery,Caribbean Restaurant


In [34]:
toronto_merged.dropna(inplace = True)
toronto_merged.shape

(100, 11)

In [35]:
toronto_merged.head()

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353,0.0,Fast Food Restaurant,Print Shop,Deli / Bodega,Dessert Shop,Dim Sum Restaurant
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,0.0,Bar,Construction & Landscaping,Women's Store,Diner,Discount Store
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,0.0,Electronics Store,Spa,Pizza Place,Intersection,Tech Startup
3,M1G,Scarborough,Woburn,43.770992,-79.216917,0.0,Coffee Shop,Korean Restaurant,Convenience Store,Dumpling Restaurant,Diner
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,0.0,Athletics & Sports,Lounge,Hakka Restaurant,Bakery,Caribbean Restaurant


In [36]:
toronto_merged['Cluster Labels'].astype(int)
toronto_merged.head()

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353,0.0,Fast Food Restaurant,Print Shop,Deli / Bodega,Dessert Shop,Dim Sum Restaurant
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,0.0,Bar,Construction & Landscaping,Women's Store,Diner,Discount Store
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,0.0,Electronics Store,Spa,Pizza Place,Intersection,Tech Startup
3,M1G,Scarborough,Woburn,43.770992,-79.216917,0.0,Coffee Shop,Korean Restaurant,Convenience Store,Dumpling Restaurant,Diner
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,0.0,Athletics & Sports,Lounge,Hakka Restaurant,Bakery,Caribbean Restaurant


In [37]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

In [38]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [39]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Scarborough,0.0,Fast Food Restaurant,Print Shop,Deli / Bodega,Dessert Shop,Dim Sum Restaurant
1,Scarborough,0.0,Bar,Construction & Landscaping,Women's Store,Diner,Discount Store
2,Scarborough,0.0,Electronics Store,Spa,Pizza Place,Intersection,Tech Startup
3,Scarborough,0.0,Coffee Shop,Korean Restaurant,Convenience Store,Dumpling Restaurant,Diner
4,Scarborough,0.0,Athletics & Sports,Lounge,Hakka Restaurant,Bakery,Caribbean Restaurant
6,Scarborough,0.0,Coffee Shop,Playground,Discount Store,Department Store,Women's Store
7,Scarborough,0.0,Bakery,Bus Line,Soccer Field,Park,Fast Food Restaurant
8,Scarborough,0.0,Motel,American Restaurant,Women's Store,Dessert Shop,Dim Sum Restaurant
9,Scarborough,0.0,College Stadium,General Entertainment,Skating Rink,Café,Comic Shop
10,Scarborough,0.0,Indian Restaurant,Vietnamese Restaurant,Chinese Restaurant,Latin American Restaurant,Pet Store


In [40]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
91,Etobicoke,1.0,Baseball Field,Dumpling Restaurant,Diner,Discount Store,Dog Run
97,North York,1.0,Baseball Field,Dumpling Restaurant,Diner,Discount Store,Dog Run


In [41]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
23,North York,2.0,Park,Convenience Store,Bank,Bar,Women's Store
25,North York,2.0,Fast Food Restaurant,Food & Drink Shop,Park,Dim Sum Restaurant,Diner
40,East York,2.0,Convenience Store,Park,Coffee Shop,Drugstore,Dim Sum Restaurant
44,Central Toronto,2.0,Park,Bus Line,Swim School,Women's Store,Donut Shop
50,Downtown Toronto,2.0,Park,Trail,Playground,Donut Shop,Dessert Shop
64,Central Toronto,2.0,Park,Jewelry Store,Sushi Restaurant,Trail,Empanada Restaurant
72,North York,2.0,Park,Pizza Place,Japanese Restaurant,Pub,Doner Restaurant
74,York,2.0,Park,Women's Store,Fast Food Restaurant,Market,Pharmacy
90,Etobicoke,2.0,Park,Pool,River,Women's Store,Dog Run
98,York,2.0,Park,Women's Store,Drugstore,Dim Sum Restaurant,Diner


In [42]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
94,Etobicoke,3.0,Bank,Women's Store,Drugstore,Diner,Discount Store


In [43]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
5,Scarborough,4.0,Playground,Women's Store,Donut Shop,Dessert Shop,Dim Sum Restaurant
14,Scarborough,4.0,Park,Asian Restaurant,Playground,Women's Store,Donut Shop
48,Central Toronto,4.0,Tennis Court,Playground,Women's Store,Donut Shop,Dessert Shop


In [44]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 5, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
30,North York,5.0,Park,Other Repair Shop,Airport,Women's Store,Drugstore
