<h3><center> Segmenting and Clustering Neighborhoods in Toronto </center> </h3>

<p>Install the web-scraping packages needed to extract the data. The following packages are required: <br> 1. BeautifulSoup <br> 2. The 'lxml' HTML parser </p>

In [2]:
#!conda install -c anaconda beautifulsoup4

In [3]:
#!conda install -c anaconda lxml

Import all the required libraries: 

In [4]:
from bs4 import BeautifulSoup
import requests
import csv
import numpy as np
import pandas as pd



Next, scrape the postal code table from the Wikipedia page with the BeautifulSoup package and store it in a pandas dataframe.


In [5]:
source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(source, 'lxml')
header = soup.find('tr')
column_names = []

#extracting column names from the table
for i, column in enumerate(header.find_all('th')):
    column_names.append(column.text)
column_names[-1] = column_names[-1].split('\n')[0]

#create dataframe 
df = pd.DataFrame(columns=column_names)

#adding table elements into the dataframe
for i, row in enumerate(soup.find_all('tr')[1:]):
    column_val = []
    for value in row.find_all('td'):
        column_val.append(value.text)
        
    if len(column_val) == df.shape[1]:
        df.loc[i] = column_val    

df.rename(columns={'Postcode':'PostalCode'}, inplace=True)        
df.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned\n
1,M2A,Not assigned,Not assigned\n
2,M3A,North York,Parkwoods\n
3,M4A,North York,Victoria Village\n
4,M5A,Downtown Toronto,Harbourfront\n
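The row-by-row extraction above can also be illustrated without any third-party parser. The following is only a sketch: it swaps BeautifulSoup for the standard library's `html.parser`, and both the `TableRowParser` class and the sample markup are hypothetical stand-ins for the real Wikipedia page.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('th', 'td') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # keep text only while inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('th', 'td') and self._cell is not None:
            # strip the trailing newline that the last column carries
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical sample markup, shaped like the table scraped above
sample_html = """
<table>
<tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
<tr><td>M1A</td><td>Not assigned</td><td>Not assigned
</td></tr>
<tr><td>M3A</td><td>North York</td><td>Parkwoods
</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(sample_html)
header, *records = parser.rows
print(header)
print(records)
```

The `header` and `records` lists could then be handed to `pd.DataFrame(records, columns=header)` in a single call instead of filling the dataframe one row at a time.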



Drop the rows whose borough is 'Not assigned'.


In [6]:
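# .copy() gives an independent dataframe, so the column assignment below does not act on a slice
df = df[df['Borough'] != 'Not assigned'].copy()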
df.reset_index(drop=True, inplace=True)

#removing \n from the neighbourhood values
df['Neighbourhood'] = df['Neighbourhood'].apply(lambda x : x.split('\n')[0])
df.head(10)

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
5,M6A,North York,Lawrence Manor
6,M7A,Queen's Park,Not assigned
7,M9A,Etobicoke,Islington Avenue
8,M1B,Scarborough,Rouge
9,M1B,Scarborough,Malvern
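As a small illustration of the same cleaning step, the sketch below filters and strips a toy dataframe. The sample rows are made up to mirror the scraped table, and `.str.strip()` is used here as an alternative to the `split('\n')` lambda.

```python
import pandas as pd

# Toy rows shaped like the scraped table (trailing newline in the last column)
raw = pd.DataFrame({
    'PostalCode': ['M1A', 'M3A', 'M5A'],
    'Borough': ['Not assigned', 'North York', 'Downtown Toronto'],
    'Neighbourhood': ['Not assigned\n', 'Parkwoods\n', 'Harbourfront\n'],
})

# Keep only rows with an assigned borough, then tidy the neighbourhood text
cleaned = raw[raw['Borough'] != 'Not assigned'].copy()
cleaned['Neighbourhood'] = cleaned['Neighbourhood'].str.strip()
cleaned = cleaned.reset_index(drop=True)
print(cleaned)
```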


Where 'Neighbourhood' is 'Not assigned' for a given postal code, set it to the value of the 'Borough' field in the same row.


In [7]:
index = df[df['Neighbourhood']== 'Not assigned'].index.values
for i in index:
    df.iat[i,2] = df.iloc[i, 1]

In [8]:
df.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
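The same replacement can be written without a loop over positions. This is a minimal sketch on made-up rows, using a boolean mask with `.loc` so that it does not depend on the index having been reset.

```python
import pandas as pd

# Toy rows: the second one has no neighbourhood assigned
sample = pd.DataFrame({
    'PostalCode': ['M5A', 'M7A'],
    'Borough': ['Downtown Toronto', "Queen's Park"],
    'Neighbourhood': ['Harbourfront', 'Not assigned'],
})

# Select the unassigned rows and copy the borough name across
unassigned = sample['Neighbourhood'] == 'Not assigned'
sample.loc[unassigned, 'Neighbourhood'] = sample.loc[unassigned, 'Borough']
print(sample)
```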


Neighbourhoods that share a postal code are combined into one row, with the names separated by commas.

In [9]:
groupby = df.groupby('PostalCode').agg(lambda x : ', '.join(set(x)))
groupby.reset_index(inplace=True)
df = groupby
df.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Morningside, West Hill, Guildwood"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
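Because a `set` has no guaranteed order, the joined names can come out in a different order from run to run. The sketch below shows one way to keep the first-seen order instead; the helper name `join_unique` and the sample rows are illustrative assumptions.

```python
import pandas as pd

rows = pd.DataFrame({
    'PostalCode': ['M1B', 'M1B', 'M1G'],
    'Borough': ['Scarborough', 'Scarborough', 'Scarborough'],
    'Neighbourhood': ['Rouge', 'Malvern', 'Woburn'],
})


def join_unique(values):
    # dict.fromkeys drops duplicates while keeping first-seen order
    return ', '.join(dict.fromkeys(values))


combined = rows.groupby('PostalCode', as_index=False).agg(
    {'Borough': join_unique, 'Neighbourhood': join_unique}
)
print(combined)
```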


In [10]:
df.shape


(103, 3)

# Q1: DataFrame obtained after scraping and cleaning the data

In [53]:
df

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Morningside, West Hill, Guildwood",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Cliffside West, Birch Cliff",43.692657,-79.264848


In [12]:
#!conda install -c conda-forge geocoder

In [13]:
#import geocoder

#for index, row in df.iterrows():
#     print(row.PostalCode)
#     lat_long = None
#     while(lat_long is None):
        
#         geo = geocoder.google('{}, Toronto, Ontario'.format(row.PostalCode))
#         lat_long = geo.latlng
#         print(lat_long)
#         if lat_long is not None:
#             df.loc[index,'latitude'] = lat_long[0]
#             df.loc[index, 'longitude'] = lat_long[1]
     

In [14]:
!wget -O 'GeoSpatial_data.csv' https://cocl.us/Geospatial_data
print('Data downloaded!')    

--2019-05-28 11:28:26--  https://cocl.us/Geospatial_data
Resolving cocl.us (cocl.us)... 159.8.72.228
Connecting to cocl.us (cocl.us)|159.8.72.228|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2019-05-28 11:28:30--  https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv
Resolving ibm.box.com (ibm.box.com)... 107.152.26.197
Connecting to ibm.box.com (ibm.box.com)|107.152.26.197|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /public/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2019-05-28 11:28:31--  https://ibm.box.com/public/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv
Reusing existing connection to ibm.box.com:443.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://ibm.ent.box.com/public/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2019-05-28 11:2

In [15]:
lat_long = pd.read_csv('GeoSpatial_data.csv')
lat_long.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [16]:
lat_long.sort_values('Postal Code', ascending=True, inplace = True)
df.sort_values('PostalCode', ascending=True, inplace = True)
lat_long.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [17]:
df['Latitude'] = lat_long['Latitude']
df['Longitude'] = lat_long['Longitude']
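Assigning the columns directly relies on both dataframes sharing the same index in the same postal code order. A merge on the postal code itself is a more defensive alternative; the sketch below uses made-up rows and assumes only the column names shown in the outputs above.

```python
import pandas as pd

# Toy frames, deliberately in different row order
codes = pd.DataFrame({
    'PostalCode': ['M1C', 'M1B'],
    'Borough': ['Scarborough', 'Scarborough'],
})
coords = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

# Match rows by postal code; validate raises if a code appears more than once
merged = codes.merge(
    coords.rename(columns={'Postal Code': 'PostalCode'}),
    on='PostalCode',
    how='left',
    validate='one_to_one',
)
print(merged)
```

With `how='left'`, any postal code missing from the coordinates file shows up as `NaN` rather than silently picking up a neighbouring row's values.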

# Q2: DataFrame with geographical coordinates of the neighborhoods in Toronto.

In [54]:
df

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Morningside, West Hill, Guildwood",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Cliffside West, Birch Cliff",43.692657,-79.264848


In [19]:
# install geopy package 
#!conda install -c conda-forge geopy --yes

# install folium package for visualization in map
!conda install -c conda-forge folium=0.5.0 --yes

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/DSX-Python35

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    branca-0.3.1               |             py_0          25 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    certifi-2018.8.24          |        py35_1001         139 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    altair-2.2.2               |           py35_1         462 KB  conda-forge
    ca-certificates-2019.3.9   |       hecc5488_0         146 KB  conda-forge
    openssl-1.0.2r             |       h14c3975_0         3.1 MB  conda-forge
    ------------------------------------------------------------
                                           Total:         4.0 MB

The following NEW packages will

In [20]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium # map rendering library

#### Use the geopy library to get the latitude and longitude values of Toronto.

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>toronto</em>, as shown below.

In [21]:
address = 'Toronto, Ontario'
geolocator = Nominatim(user_agent="toronto")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


#### Create a map of Toronto with neighborhoods superimposed on top.

In [42]:
map_Toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, label in zip(df['Latitude'], df['Longitude'], df['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Toronto)
    
map_Toronto    

In [23]:
neighbourhoods = df
neighbourhoods.shape

(103, 5)

#### Define Foursquare Credentials and Version

In [24]:
CLIENT_ID = 'ZVIRVOYFL111YW2MNCBJUDH0D3VFQCL0NBQ4BCWFB3TBT0CE' # your Foursquare ID
CLIENT_SECRET = '5QAVVDTHISB5EO1TYM0KSBYCCHJR1LZQ5T4NOS1FGCFFFFFD' # your Foursquare Secret
VERSION = '20180604' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: ZVIRVOYFL111YW2MNCBJUDH0D3VFQCL0NBQ4BCWFB3TBT0CE
CLIENT_SECRET:5QAVVDTHISB5EO1TYM0KSBYCCHJR1LZQ5T4NOS1FGCFFFFFD
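Hard-coding and printing API credentials makes them visible to anyone who reads the notebook. One common alternative is to read them from environment variables. The sketch below assumes hypothetical variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and a hypothetical helper `load_foursquare_credentials`; none of these come from Foursquare itself.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) from a mapping, or raise if one is missing."""
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Set {} before running the notebook'.format(missing)) from None


# Demo with a stand-in mapping; real values would come from os.environ
demo_env = {'FOURSQUARE_CLIENT_ID': 'my-id', 'FOURSQUARE_CLIENT_SECRET': 'my-secret'}
client_id, client_secret = load_foursquare_credentials(demo_env)

# Confirm the values loaded without echoing the secret itself
print('CLIENT_ID loaded:', bool(client_id))
print('CLIENT_SECRET loaded:', bool(client_secret))
```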


# Explore Neighbourhoods in Toronto

Define a function that finds nearby venues for every neighbourhood in Toronto.

In [25]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
    
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            30)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    #nearby_venues = pd.DataFrame.from_records(venues_list)
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
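The list comprehension in `getNearbyVenues` assumes every response contains at least one group and every venue has at least one category. The sketch below separates that flattening step into a hypothetical helper, `extract_venue_rows`, that tolerates missing pieces. The response layout is taken from the keys used in the function above, and the sample payload is made up.

```python
def extract_venue_rows(name, lat, lng, response_json):
    """Flatten one explore response into (neighbourhood, ..., category) tuples."""
    groups = response_json.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        # a venue may come back with an empty category list
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows


# Hypothetical payload with one categorised and one uncategorised venue
sample_response = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Cafe',
               'location': {'lat': 43.77, 'lng': -79.21},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Unlabelled Spot',
               'location': {'lat': 43.78, 'lng': -79.22},
               'categories': []}},
]}]}}

rows = extract_venue_rows('Woburn', 43.770992, -79.216917, sample_response)
for row in rows:
    print(row)
```

Keeping the parsing separate from the HTTP request also makes it possible to check the logic on saved responses without spending API quota.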

In [26]:
toronto_venues = getNearbyVenues(neighbourhoods['Neighbourhood'], 
                                 latitudes = neighbourhoods['Latitude'], 
                                 longitudes= neighbourhoods['Longitude'])


Malvern, Rouge
Highland Creek, Rouge Hill, Port Union
Morningside, West Hill, Guildwood
Woburn
Cedarbrae
Scarborough Village
Kennedy Park, Ionview, East Birchmount Park
Clairlea, Golden Mile, Oakridge
Cliffcrest, Cliffside, Scarborough Village West
Cliffside West, Birch Cliff
Wexford Heights, Dorset Park, Scarborough Town Centre
Maryvale, Wexford
Agincourt
Sullivan, Tam O'Shanter, Clarks Corners
Agincourt North, Steeles East, Milliken, L'Amoreaux East
L'Amoreaux West
Upper Rouge
Hillcrest Village
Henry Farm, Fairview, Oriole
Bayview Village
Silver Hills, York Mills
Willowdale, Newtonbrook
Willowdale South
York Mills West
Willowdale West
Parkwoods
Don Mills North
Don Mills South, Flemingdon Park
Wilson Heights, Bathurst Manor, Downsview North
York University, Northwood Park
CFB Toronto, Downsview East
Downsview West
Downsview Central
Downsview Northwest
Victoria Village
Parkview Hill, Woodbine Gardens
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto
Riverdale, The Danf

In [27]:
print(toronto_venues.shape)
toronto_venues.head()

(1321, 7)


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Malvern, Rouge",43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant
1,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
2,"Morningside, West Hill, Guildwood",43.763573,-79.188711,Swiss Chalet Rotisserie & Grill,43.767697,-79.189914,Pizza Place
3,"Morningside, West Hill, Guildwood",43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store
4,"Morningside, West Hill, Guildwood",43.763573,-79.188711,Marina Spa,43.766,-79.191,Spa


In [28]:
toronto_venues.groupby('Neighbourhood').count().sort_values('Venue', ascending=False)

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Stn A PO Boxes 25 The Esplanade,30,30,30,30,30,30
"Swansea, Runnymede",30,30,30,30,30,30
"Union Station, Harbourfront East, Toronto Islands",30,30,30,30,30,30
"University of Toronto, Harbord",30,30,30,30,30,30
"Victoria Hotel, Commerce Court",30,30,30,30,30,30
"Grange Park, Kensington Market, Chinatown",30,30,30,30,30,30
"Harbourfront, Regent Park",30,30,30,30,30,30
"Henry Farm, Fairview, Oriole",30,30,30,30,30,30
Davisville,30,30,30,30,30,30
Studio District,30,30,30,30,30,30


#### Let's find out how many unique categories can be curated from all the returned venues

In [29]:
print("There are {} unique categories of venue".format(len(toronto_venues['Venue Category'].unique())))

There are 236 unique categories of venue


## Analyze Each Neighbourhood in Toronto

In [30]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']],  prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood']

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]
toronto_onehot.head()

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Malvern, Rouge",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Highland Creek, Rouge Hill, Port Union",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Morningside, West Hill, Guildwood",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Morningside, West Hill, Guildwood",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Morningside, West Hill, Guildwood",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [31]:
toronto_onehot.shape


(1321, 237)

#### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [32]:
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Agincourt North, Steeles East, Milliken, L'Amo...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Beaumond Heights, Silverstone, Humbergate, Sou...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
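To make the one-hot encoding and the grouped mean concrete, here is a minimal sketch on a made-up venue list. Each value in the result is the share of a neighbourhood's venues that fall in that category, so every row sums to 1.

```python
import pandas as pd

# Toy venue list: Woburn has four venues, Cedarbrae has one
venues = pd.DataFrame({
    'Neighbourhood': ['Woburn', 'Woburn', 'Woburn', 'Woburn', 'Cedarbrae'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Park', 'Bank', 'Bank'],
})

# One column per category, 1 where the venue belongs to it
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighbourhood', venues['Neighbourhood'])

# The mean of the 0/1 columns is the frequency of each category
grouped = onehot.groupby('Neighbourhood').mean().reset_index()
print(grouped)
```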


In [33]:
toronto_grouped.shape

(99, 237)

### Let's put each neighbourhood's top 10 most common venues into a pandas dataframe


First, let's write a function to sort the venues in descending order.

In [34]:
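def most_common_venues(row, top_venues):
    # skip the first entry (the neighbourhood name) and rank the categories by frequency
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:top_venues]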

Now, let's create the new dataframe and display the top 10 venues for each neighbourhood.

In [46]:
top_venues = 10
suffix = ['st', 'nd',  'rd']

# adding column values for the dataframe
columns = ['Neighbourhood']

for i in range(top_venues):
    if i<3:
        columns.append('{}{} most common venue'.format(i+1, suffix[i]))
    else:
        columns.append('{}th most common venue'.format(i+1))
        
toronto_top_venues = pd.DataFrame(columns=columns)
toronto_top_venues['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in range(toronto_grouped.shape[0]):
    toronto_top_venues.iloc[ind, 1:] = most_common_venues(toronto_grouped.iloc[ind,:], top_venues)
    
toronto_top_venues.head()    

Unnamed: 0,Neighbourhood,1st most common venue,2nd most common venue,3rd most common venue,4th most common venue,5th most common venue,6th most common venue,7th most common venue,8th most common venue,9th most common venue,10th most common venue
0,Agincourt,Lounge,Skating Rink,Breakfast Spot,Print Shop,Sandwich Place,Yoga Studio,Deli / Bodega,Eastern European Restaurant,Dumpling Restaurant,Drugstore
1,"Agincourt North, Steeles East, Milliken, L'Amo...",Playground,Park,Yoga Studio,Dance Studio,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Dog Run
2,Bayview Village,Japanese Restaurant,Chinese Restaurant,Bank,Café,Yoga Studio,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
3,"Beaumond Heights, Silverstone, Humbergate, Sou...",Grocery Store,Beer Store,Fried Chicken Joint,Japanese Restaurant,Fast Food Restaurant,Discount Store,Coffee Shop,Pizza Place,Sandwich Place,Pharmacy
4,Berczy Park,Beer Bar,Seafood Restaurant,Cocktail Bar,Café,Farmers Market,Concert Hall,Thai Restaurant,Jazz Club,Steakhouse,Bakery
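The column labels above are built from a three-item suffix list, which is enough for ten venues but would produce labels such as "21th" for longer rankings. A general ordinal helper is sketched below; the function name `ordinal` is an illustrative choice.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th and 13th (and 111th, 112th, ...) are irregular
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


# Build the same column labels as above for the first few ranks
columns = ['Neighbourhood'] + [
    '{} most common venue'.format(ordinal(i)) for i in range(1, 4)
]
print(columns)
```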


# Cluster Neighbourhoods

Run k-means to cluster the neighbourhoods into 4 clusters

In [47]:
from sklearn.cluster import KMeans
k =4

toronto_grouped_cluster = toronto_grouped.drop('Neighbourhood',axis=1)

kmeans = KMeans(n_clusters=k, random_state=4).fit(toronto_grouped_cluster)

kmeans.labels_[0:20]

array([0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [49]:
toronto_top_venues.insert(0,'Cluster Labels', kmeans.labels_)

toronto_merged = neighbourhoods

toronto_merged = toronto_merged.join(toronto_top_venues.set_index('Neighbourhood'), on='Neighbourhood')
toronto_merged.head(10)

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st most common venue,2nd most common venue,3rd most common venue,4th most common venue,5th most common venue,6th most common venue,7th most common venue,8th most common venue,9th most common venue,10th most common venue
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,0.0,Fast Food Restaurant,Yoga Studio,Falafel Restaurant,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Dog Run
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,0.0,Bar,Yoga Studio,Farmers Market,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
2,M1E,Scarborough,"Morningside, West Hill, Guildwood",43.763573,-79.188711,0.0,Electronics Store,Breakfast Spot,Spa,Rental Car Location,Mexican Restaurant,Intersection,Medical Center,Pizza Place,Drugstore,Department Store
3,M1G,Scarborough,Woburn,43.770992,-79.216917,0.0,Coffee Shop,Korean Restaurant,Indian Restaurant,Department Store,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,0.0,Athletics & Sports,Bank,Lounge,Hakka Restaurant,Thai Restaurant,Fried Chicken Joint,Bakery,Caribbean Restaurant,Dim Sum Restaurant,Diner
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476,3.0,Playground,Convenience Store,Dance Studio,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Dog Run,Discount Store
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.727929,-79.262029,0.0,Playground,Department Store,Bus Station,Coffee Shop,Yoga Studio,Deli / Bodega,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577,0.0,Bakery,Bus Line,Soccer Field,Park,Bus Station,Fast Food Restaurant,Metro Station,Intersection,Dim Sum Restaurant,Diner
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476,0.0,American Restaurant,Motel,Yoga Studio,Department Store,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
9,M1N,Scarborough,"Cliffside West, Birch Cliff",43.692657,-79.264848,0.0,College Stadium,General Entertainment,Skating Rink,Café,Yoga Studio,Deli / Bodega,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore


In [50]:
toronto_merged.drop(toronto_merged[pd.isnull(toronto_merged['Cluster Labels'])].index,axis=0, inplace=True)
toronto_merged['Cluster Labels'] = toronto_merged['Cluster Labels'].astype(int)
toronto_merged.shape

(99, 16)

In [51]:
toronto_merged['Cluster Labels'].value_counts()

0    92
1     4
3     2
2     1
Name: Cluster Labels, dtype: int64

Visualize the neighbourhood clusters on a map

In [52]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(k)
ys = [i + x + (i*x)**2 for i in range(k)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
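The marker colours only need to be distinct for each cluster label. As a rough sketch, a palette that can be indexed directly with a label from 0 to k-1 can also be built with the standard library's `colorsys`, used here in place of the matplotlib colour map. The helper name `cluster_colors` and the saturation and brightness values are arbitrary assumptions.

```python
import colorsys


def cluster_colors(k):
    """Return k evenly spaced hex colours, one per cluster label 0..k-1."""
    palette = []
    for label in range(k):
        # spread the hues evenly around the colour wheel
        r, g, b = colorsys.hsv_to_rgb(label / k, 0.9, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_colors(4)
for label, colour in enumerate(palette):
    print(label, colour)
```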