# Segmenting and Clustering Neighborhoods in Toronto

## Part 1

We are using the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, 
in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe

In [10]:
# importing necessary libraries
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests


In [13]:
# download data and parse it:
r = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
soup = BeautifulSoup(r.text, 'html.parser')
table=soup.find('table', attrs={'class':'wikitable sortable'})

In [14]:
#get headers:
headers=table.findAll('th')
for i, head in enumerate(headers): headers[i]=str(headers[i]).replace("<th>","").replace("</th>","").replace("\n","")

In [16]:
#Find all items and skip first one:
rows=table.findAll('tr')
rows=rows[1:len(rows)]

In [17]:
# skip all meta symbols and line feeds between rows:
for i, row in enumerate(rows): rows[i] = str(rows[i]).replace("\n</td></tr>","").replace("<tr>\n<td>","")


In [18]:
# make dataframe, expand rows and drop the old one:
df=pd.DataFrame(rows)
df[headers] = df[0].str.split("</td>\n<td>", n = 2, expand = True) 
df.drop(columns=[0],inplace=True)

In [19]:
# skip not assigned boroughs:
df = df.drop(df[(df.Borough == "Not assigned")].index)

In [20]:
# give "Not assigned" Neighborhoods same name as Borough:
df.Neighbourhood.replace("Not assigned", df.Borough, inplace=True)


In [21]:
# copy Borough value to Neighborhood if NaN:
df.Neighbourhood.fillna(df.Borough, inplace=True)


In [22]:
# drop duplicate rows:
df=df.drop_duplicates()

In [23]:
# extract titles from columns
df.update(
    df.Neighbourhood.loc[
        lambda x: x.str.contains('title')
    ].str.extract('title=\"([^\"]*)',expand=False))

df.update(
    df.Borough.loc[
        lambda x: x.str.contains('title')
    ].str.extract('title=\"([^\"]*)',expand=False))


In [24]:
# delete Toronto annotation from Neighbourhood:
df.update(
    df.Neighbourhood.loc[
        lambda x: x.str.contains('Toronto')
    ].str.replace(", Toronto",""))
df.update(
    df.Neighbourhood.loc[
        lambda x: x.str.contains('Toronto')
    ].str.replace("\(Toronto\)",""))

In [25]:
# combine multiple neighborhoods with the same post code
df2 = pd.DataFrame({'Postcode':df.Postcode.unique()})
df2['Borough']=pd.DataFrame(list(set(df['Borough'].loc[df['Postcode'] == x['Postcode']])) for i, x in df2.iterrows())
df2['Neighborhood']=pd.Series(list(set(df['Neighbourhood'].loc[df['Postcode'] == x['Postcode']])) for i, x in df2.iterrows())
df2['Neighborhood']=df2['Neighborhood'].apply(lambda x: ', '.join(x))
df2.dtypes

Postcode        object
Borough         object
Neighborhood    object
dtype: object

In [27]:
df2.head(10)

Unnamed: 0,Postcode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Queen's Park (Toronto),Queen's Park
5,M9A,Etobicoke,Islington Avenue
6,M1B,"Scarborough, Toronto","Rouge, Malvern"
7,M3B,North York,Don Mills North
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Ryerson, Garden District"


## Part2

In [28]:
#add Geo-spatial data
dfll= pd.read_csv("http://cocl.us/Geospatial_data")
dfll.rename(columns={'Postal Code':'Postcode'}, inplace=True)
dfll.set_index("Postcode")
df2.set_index("Postcode")
toronto_data=pd.merge(df2, dfll)
toronto_data.head()

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
4,M7A,Queen's Park (Toronto),Queen's Park,43.662301,-79.389494


## Part3

In [38]:
#!pip install Folium

In [50]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium # map rendering library
import json # library to handle JSON files
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

In [33]:
address = 'Toronto, ON, Canada'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto, ON, Canada are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto, ON, Canada are 43.653963, -79.387207.


### Create a map of Toronto with neighborhoods superimposed on top.


In [39]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Borough'], toronto_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

Now utilize the Foursquare API to explore the neighborhoods and segment them.

In [40]:
#Define Foursquare Credentials and Version
CLIENT_ID = 'C0UVIS44Q00FMEWVRHMO0AQ1WDWNBKSKD15I0U1U15W4HKZE' # your Foursquare ID
CLIENT_SECRET = 'LUXLWGXPH5AML2GMCD4OAI0HNU4Q1STIAWVKDDTCX1FSWJEK' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: C0UVIS44Q00FMEWVRHMO0AQ1WDWNBKSKD15I0U1U15W4HKZE
CLIENT_SECRET:LUXLWGXPH5AML2GMCD4OAI0HNU4Q1STIAWVKDDTCX1FSWJEK


### Let's explore One neighbourhood of Toronto 

In [41]:
#Get the neighborhood's name, latitude and longitude values.

neighborhood_name = toronto_data.loc[0, 'Neighborhood'] # neighborhood name
neighborhood_latitude = toronto_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = toronto_data.loc[0, 'Longitude'] # neighborhood longitude value

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Parkwoods are 43.7532586, -79.3296565.


### Now, let's get the top 20 venues that are in Parkwoods within a radius of 100 meters.


In [44]:
#create the GET request URL

LIMIT = 20 # limit of number of venues returned by Foursquare API
radius = 400 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=C0UVIS44Q00FMEWVRHMO0AQ1WDWNBKSKD15I0U1U15W4HKZE&client_secret=LUXLWGXPH5AML2GMCD4OAI0HNU4Q1STIAWVKDDTCX1FSWJEK&v=20180605&ll=43.7532586,-79.3296565&radius=400&limit=20'

In [45]:
#Send the GET request and examine the resutls
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5d51c5279929510026daedfb'},
  'headerLocation': 'Parkwoods - Donalda',
  'headerFullLocation': 'Parkwoods - Donalda, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 3,
  'suggestedBounds': {'ne': {'lat': 43.75685860360001,
    'lng': -79.32468189187942},
   'sw': {'lat': 43.749658596399996, 'lng': -79.33463110812058}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4e8d9dcdd5fbbbb6b3003c7b',
       'name': 'Brookbanks Park',
       'location': {'address': 'Toronto',
        'lat': 43.751976046055574,
        'lng': -79.33214044722958,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.751976046055574,
          'lng': -79.33214044722958}],
        'distance': 245,
        'cc': 'CA',
      

In [46]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

###  Clean the Json and Structure into pandas

In [51]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Brookbanks Park,Park,43.751976,-79.33214
1,KFC,Fast Food Restaurant,43.754387,-79.333021
2,Variety Store,Food & Drink Shop,43.751974,-79.333114


In [52]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

3 venues were returned by Foursquare.


### Now let's explore the neighbourhood of Toronto

In [53]:
# A function to repeat the same process to all the neighborhoods.
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [54]:
# code to run the above function on each neighborhood and create a new dataframe called toronto_venues
toronto_venues = getNearbyVenues(names=toronto_data['Neighborhood'],
                                   latitudes=toronto_data['Latitude'],
                                   longitudes=toronto_data['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park, Harbourfront 
Lawrence Heights, Lawrence Manor
Queen's Park 
Islington Avenue
Rouge, Malvern
Don Mills North
Parkview Hill, Woodbine Gardens
Ryerson, Garden District
Glencairn
Islington, Martin Grove, Cloverdale, Princess Gardens, West Deane Park
Port Union, Highland Creek , Rouge Hill
Flemingdon Park, Don Mills South
Woodbine Heights
St. James Town
Humewood-Cedarvale
Old Burnhamthorpe, Markland Wood, Eringate, Bloordale Gardens
West Hill, Guildwood, Morningside
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Downsview North, Wilson Heights
Thorncliffe Park
King, Adelaide, Richmond
Dufferin, Dovercourt Village
Scarborough Village
Henry Farm, Fairview, Oriole
York University, Northwood Park
East Toronto
Union Station , Harbourfront East, Toronto Islands
Trinity–Bellwoods, Little Portugal
Kennedy Park, Ionview, East Birchmount Park
Bayview Village
Downsview East, CFB

In [55]:
print(toronto_venues.shape)
toronto_venues.head()

(1069, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,KFC,43.754387,-79.333021,Fast Food Restaurant
2,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
3,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop


In [56]:
#Check to see hw many venues were returned for each neighbourhood

toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt,4,4,4,4,4,4
"Alderwood, Long Branch",9,9,9,9,9,9
"Bathurst Manor, Downsview North, Wilson Heights",19,19,19,19,19,19
Bayview Village,4,4,4,4,4,4
Berczy Park,20,20,20,20,20,20
"Brockton, Parkdale Village, Exhibition Place",20,20,20,20,20,20
Business Reply Mail Processing Centre 969 Eastern,16,16,16,16,16,16
Caledonia-Fairbanks,5,5,5,5,5,5
Canada Post Gateway Processing Centre,11,11,11,11,11,11
Cedarbrae,8,8,8,8,8,8


### Let's find out how many unique categories can be curated from all the returned venues


In [57]:
print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))


There are 217 uniques categories.


### Analyze Each Neighborhood


In [58]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Art Gallery,...,Theme Restaurant,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [59]:
toronto_onehot.shape

(1069, 217)

### Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [60]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Theme Restaurant,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,Agincourt,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.00,0.00,0.00,0.00,0.000000,0.000000,0.000000,0.00,0.0,0.0
1,"Alderwood, Long Branch",0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.00,0.00,0.00,0.00,0.000000,0.000000,0.000000,0.00,0.0,0.0
2,"Bathurst Manor, Downsview North, Wilson Heights",0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.00,0.00,0.00,0.00,0.052632,0.000000,0.000000,0.00,0.0,0.0
3,Bayview Village,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.00,0.00,0.00,0.00,0.000000,0.000000,0.000000,0.00,0.0,0.0
4,Berczy Park,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.00,0.00,0.00,0.05,0.000000,0.000000,0.000000,0.00,0.0,0.0
5,"Brockton, Parkdale Village, Exhibition Place",0.050000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.00,0.00,0.00,0.00,0.000000,0.000000,0.000000,0.00,0.0,0.0
6,Business Reply Mail Processing Centre 969 Eastern,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.00,0.00,0.00,0.00,0.000000,0.000000,0.000000,0.00,0.0,0.0
7,Caledonia-Fairbanks,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.00,0.00,0.00,0.00,0.000000,0.000000,0.000000,0.00,0.0,0.2
8,Canada Post Gateway Processing Centre,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.090909,...,0.00,0.00,0.00,0.00,0.000000,0.000000,0.000000,0.00,0.0,0.0
9,Cedarbrae,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.00,0.00,0.00,0.00,0.000000,0.000000,0.000000,0.00,0.0,0.0


In [61]:
toronto_grouped.shape

(100, 217)

### Let's find each neighborhood along with the top 5 most common venues and put that into a pandas dataframe


In [62]:
# Function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [63]:
# Data frame with top 5 venues for each neighbourhood
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Agincourt,Clothing Store,Lounge,Breakfast Spot,Skating Rink,Cocktail Bar
1,"Alderwood, Long Branch",Pizza Place,Dance Studio,Coffee Shop,Gym,Skating Rink
2,"Bathurst Manor, Downsview North, Wilson Heights",Coffee Shop,Grocery Store,Pizza Place,Bridal Shop,Sandwich Place
3,Bayview Village,Chinese Restaurant,Bank,Café,Japanese Restaurant,Women's Store
4,Berczy Park,Beer Bar,Farmers Market,Park,Fish Market,Museum


## Cluster neighbourhoods 

Run k-means to cluster the neighborhood into 4 clusters.


In [71]:
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means from clustering stage
from sklearn.cluster import KMeans
#from sklearn.datasets.samples_generator import make_blobs

In [66]:
# set number of clusters
kclusters = 4

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 1, 1, 1, 1, 1, 1, 0, 1, 1], dtype=int32)

Let's create a new dataframe that includes the cluster as well as the top 5 venues for each neighborhood.


In [67]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')



# check the last columns!
toronto_merged.head()

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0.0,Park,Fast Food Restaurant,Food & Drink Shop,Women's Store,Dance Studio
1,M4A,North York,Victoria Village,43.725882,-79.315572,1.0,Coffee Shop,Portuguese Restaurant,Hockey Arena,Intersection,Women's Store
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,1.0,Coffee Shop,Park,Bakery,Breakfast Spot,Performing Arts Venue
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763,1.0,Furniture / Home Store,Clothing Store,Women's Store,Event Space,Coffee Shop
4,M7A,Queen's Park (Toronto),Queen's Park,43.662301,-79.389494,1.0,Coffee Shop,Park,Sushi Restaurant,Wings Joint,Gym


Let's visualize the resulting clusters

In [72]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        #color=rainbow[cluster-1],
        fill=True,
        #fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Now let's examine Clusters

In [73]:
# Cluster1

toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,North York,0.0,Park,Fast Food Restaurant,Food & Drink Shop,Women's Store,Dance Studio
21,York,0.0,Park,Fast Food Restaurant,Market,Women's Store,General Entertainment
35,East York,0.0,Park,Coffee Shop,Pizza Place,Convenience Store,Curling Ice
40,North York,0.0,Park,Airport,Bus Stop,Dance Studio,Electronics Store
61,Central Toronto,0.0,Park,Photography Studio,Bus Line,Swim School,Curling Ice
64,"York, Toronto",0.0,Park,Women's Store,Curling Ice,Electronics Store,Eastern European Restaurant
77,Etobicoke,0.0,Park,Pizza Place,Mobile Phone Shop,Cuban Restaurant,Drugstore
85,"Scarborough, Toronto",0.0,Park,Playground,Cuban Restaurant,Eastern European Restaurant,Drugstore
91,Downtown Toronto,0.0,Park,Playground,Trail,Building,Curling Ice
98,Etobicoke,0.0,Park,River,Eastern European Restaurant,Drugstore,Dog Run


In [74]:
# Cluster 2

toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,North York,1.0,Coffee Shop,Portuguese Restaurant,Hockey Arena,Intersection,Women's Store
2,Downtown Toronto,1.0,Coffee Shop,Park,Bakery,Breakfast Spot,Performing Arts Venue
3,North York,1.0,Furniture / Home Store,Clothing Store,Women's Store,Event Space,Coffee Shop
4,Queen's Park (Toronto),1.0,Coffee Shop,Park,Sushi Restaurant,Wings Joint,Gym
7,North York,1.0,Gym / Fitness Center,Basketball Court,Caribbean Restaurant,Baseball Field,Café
8,East York,1.0,Pizza Place,Fast Food Restaurant,Bank,Gastropub,Gym / Fitness Center
9,Downtown Toronto,1.0,Clothing Store,Café,Tea Room,Sandwich Place,Beer Bar
10,North York,1.0,Park,Pizza Place,Japanese Restaurant,Pub,Sushi Restaurant
11,Etobicoke,1.0,Golf Course,Bank,Women's Store,Deli / Bodega,Empanada Restaurant
12,"Scarborough, Toronto",1.0,Bar,Moving Target,Women's Store,Dance Studio,Electronics Store


In [75]:
#Cluster3 
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
6,"Scarborough, Toronto",2.0,Fast Food Restaurant,Women's Store,Dance Studio,Electronics Store,Eastern European Restaurant


In [76]:
# Cluster4
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
45,North York,3.0,Cafeteria,Women's Store,Dance Studio,Empanada Restaurant,Electronics Store
