# IBM APPLIED DATA SCIENCE CAPSTONE PROJECT PART - 3

## Clustering and Segmenting Neighborhoods in Toronto

The Toronto dataset used in this notebook contains 204 neighborhood rows spread across the city's boroughs. In order to segment the neighborhoods and explore them, we need a dataset that contains the boroughs, the neighborhoods that exist in each borough, and the latitude and longitude coordinates of each neighborhood.

<p>The Toronto neighborhood data comes from the Wikipedia page available <a href="https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M">here</a>.</p>

Before we get the data and start exploring it, let's install and import all the dependencies that we need.

In [1]:
import pandas as pd
import numpy as np

<p>Install <b>html5lib</b> so that pandas can read the Wikipedia page.</p>

In [2]:
!conda install html5lib -y
import html5lib #install htmllib for reading from html webpage

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('All Libraries installed!')

Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/nbuser/anaconda3_501

  added / updated specs:
    - html5lib


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2019.9.11          |           py36_0         154 KB
    conda-4.7.12               |           py36_0         3.0 MB
    ------------------------------------------------------------
                                           Total:         3.2 MB

The following packages will be UPDATED:

  openssl            conda-forge::openssl-1.0.2r-h14c3975_0 --> pkgs/main::openssl-1.0.2t-h7b6447c_1

The following packages will be SUPERSEDED by a higher-priority channel:

  ca-certificates    conda-forge::ca-certificates-2019.9.1~ --> pkgs/main::ca-certificates-2019.8.28-0
  certifi                                       conda-forge --> pkgs/main
  conda    

<p><strong>pd.read_html()</strong> reads all the tables on the Wikipedia page and returns them as a list of dataframes.
</p>

In [3]:
#the URL containing the dataset
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

Assign the html content to a list named 'table'.

In [4]:
table=pd.read_html(url)
type(table)

list

<p>The type of table is list.
    We need only the first element of the list.</p>

In [5]:
df=pd.DataFrame(table[0])
print(df.head())

          0             1                 2
0  Postcode       Borough     Neighbourhood
1       M1A  Not assigned      Not assigned
2       M2A  Not assigned      Not assigned
3       M3A    North York         Parkwoods
4       M4A    North York  Victoria Village


In [93]:
df.shape

(204, 5)

In [94]:
df['Borough'].value_counts()

Etobicoke           42
North York          37
Scarborough         37
Downtown Toronto    36
Central Toronto     17
West Toronto        13
York                 9
East Toronto         6
East York            6
Queen's Park         1
Name: Borough, dtype: int64

Use the first row as the header

In [6]:
new_header=df.iloc[0]

df=df[1:]
df.columns=new_header
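
As a minimal sketch of the same header-promotion step, the snippet below works on a tiny hand-made frame. The sample rows are illustrative and are not scraped data; the names `raw` and `promoted` are chosen here for the example.

```python
import pandas as pd

# Toy frame that mimics a table whose header row was read as ordinary data.
raw = pd.DataFrame([
    ["Postcode", "Borough", "Neighbourhood"],
    ["M3A", "North York", "Parkwoods"],
    ["M4A", "North York", "Victoria Village"],
])

# Take the first row as column names, keep the remaining rows as data,
# and renumber the index from 0.
promoted = raw.iloc[1:].copy()
promoted.columns = raw.iloc[0].tolist()
promoted = promoted.reset_index(drop=True)

print(promoted)
```

Depending on how the page is laid out, passing `header=0` to `pd.read_html` may make this step unnecessary.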

In [7]:
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
1,M1A,Not assigned,Not assigned
2,M2A,Not assigned,Not assigned
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,Harbourfront


<strong>Analyse the dataset</strong>

In [8]:
print(df.shape)
df.info()

(288, 3)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 288 entries, 1 to 288
Data columns (total 3 columns):
Postcode         288 non-null object
Borough          288 non-null object
Neighbourhood    288 non-null object
dtypes: object(3)
memory usage: 6.8+ KB


<p>We need cells that have an assigned borough. Ignore cells with a borough that is <strong>Not assigned</strong>.</p>

In [9]:
#Check for the index value Not assigned in Borough column of df
index_borough=df[df['Borough'].isin(['Not assigned'])].index

#print indexes 
print(index_borough)


Int64Index([  1,   2,  10,  14,  21,  22,  31,  37,  38,  46,  47,  51,  52,
             53,  55,  56,  60,  61,  62,  74,  75,  76,  89,  90,  91, 105,
            106, 107, 121, 122, 137, 138, 149, 150, 156, 162, 163, 168, 176,
            182, 183, 189, 190, 191, 195, 196, 202, 203, 204, 205, 210, 211,
            224, 225, 238, 239, 242, 243, 248, 249, 254, 255, 259, 260, 261,
            262, 264, 265, 275, 276, 277, 278, 279, 280, 281, 282, 288],
           dtype='int64')


<p>Drop the cells with a borough <b>Not assigned</b></p>

In [10]:
df.drop(index_borough, inplace=True,axis=0)

<p>We can reset the row index of the dataframe with reset_index() so that the index starts from 0, and pass <b>drop=True</b> so that the original index is not kept as a column.</p>

In [11]:
df.reset_index(drop=True, inplace=True)
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
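
The drop-then-reset steps above can also be written as a single boolean filter. This is a sketch on made-up sample rows, assuming the same column names as the notebook; it is an alternative formulation, not the code that produced the outputs shown here.

```python
import pandas as pd

# Illustrative rows only; the real table comes from the Wikipedia page.
sample = pd.DataFrame({
    "Postcode": ["M1A", "M3A", "M4A", "M7A"],
    "Borough": ["Not assigned", "North York", "North York", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Victoria Village", "Not assigned"],
})

# Keep only rows whose borough is assigned, then renumber the index from 0.
assigned = sample[sample["Borough"] != "Not assigned"].reset_index(drop=True)

print(assigned)
```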


Inspect the Neighbourhood column for Not assigned value.

In [12]:
df.loc[df['Neighbourhood'].isin(['Not assigned'])]

Unnamed: 0,Postcode,Borough,Neighbourhood
6,M7A,Queen's Park,Not assigned


<p>Assign the Borough value to Neighbourhood wherever the neighbourhood is <strong>Not assigned</strong>, so that the neighbourhood matches the borough. For the row at index 6 (postcode M7A), both the Borough and the Neighbourhood columns will then be Queen's Park.</p>

In [13]:
df.loc[df['Neighbourhood'].isin(['Not assigned']), 'Neighbourhood']=df['Borough']
print(df.iloc[6,:])

0
Postcode                  M7A
Borough          Queen's Park
Neighbourhood    Queen's Park
Name: 6, dtype: object


The US English spelling of Neighbourhood is Neighborhood, so rename the column.

In [14]:
df.rename(columns={'Neighbourhood':'Neighborhood'}, inplace=True)
df.head()

Unnamed: 0,Postcode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights


<p>Several rows can share one postcode. Group by <b>'Postcode'</b> and <b>'Borough'</b> (each postcode maps to a single borough), cast the <b>'Neighborhood'</b> column to str, and join the names with a comma delimiter:</p>

In [15]:
df_sort=df.groupby(['Postcode','Borough'])['Neighborhood'].apply(lambda x: ','.join(x.astype(str))).reset_index()

<p>Display the first 10 observations from the Wikipedia page grouped by Postcode</p>

In [16]:
df_sort.head(10)

Unnamed: 0,Postcode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park"
7,M1L,Scarborough,"Clairlea,Golden Mile,Oakridge"
8,M1M,Scarborough,"Cliffcrest,Cliffside,Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff,Cliffside West"


Display the number of rows in the dataframe

In [17]:
df_sort.shape

(103, 3)
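
To make the grouping step concrete, here is a small sketch on four hand-picked rows. It uses `agg` with `",".join` instead of the `apply`/`lambda` form used in the notebook; the two should be equivalent when the column already holds strings.

```python
import pandas as pd

# Illustrative subset of rows with the notebook's column names.
rows = pd.DataFrame({
    "Postcode": ["M5A", "M5A", "M6A", "M3A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York", "North York"],
    "Neighborhood": ["Harbourfront", "Regent Park", "Lawrence Heights", "Parkwoods"],
})

# One row per (Postcode, Borough), with the neighborhood names joined by commas.
combined = (
    rows.groupby(["Postcode", "Borough"])["Neighborhood"]
        .agg(",".join)
        .reset_index()
)

print(combined)
```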

### PART 2 

<strong>
To build a dataframe with the postal code, borough name, and neighborhood name of each neighborhood, and to make use of the Foursquare location data, we need the latitude and longitude coordinates of each neighborhood.
</strong>

In [18]:
df_loc = pd.read_csv('Geospatial_Coordinates.csv')

In [19]:
df_loc.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [20]:
df_loc.shape

(103, 3)

<strong>Rename the column 'Postal Code' to 'Postcode' so as to match with df_sort dataframe to merge both.</strong>

In [21]:
df_loc.rename(columns={'Postal Code':'Postcode'}, inplace=True)
df_loc.head()

Unnamed: 0,Postcode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


###### Merge the neighborhood dataframe 'df_sort' with the coordinates dataframe 'df_loc'.

In [22]:
df_merged=pd.merge(left=df_sort, right=df_loc, on='Postcode')
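
A merge like this silently drops postcodes that have no match, because `pd.merge` defaults to an inner join. The sketch below, on made-up rows, shows one way to check for that with the `how`, `validate`, and `indicator` parameters. The postcode `M9Z` and its zero coordinates are hypothetical values added only to create a mismatch.

```python
import pandas as pd

left = pd.DataFrame({
    "Postcode": ["M1B", "M1C", "M1E"],
    "Borough": ["Scarborough"] * 3,
})
right = pd.DataFrame({
    "Postcode": ["M1B", "M1C", "M9Z"],   # M9Z is a hypothetical, unmatched code
    "Latitude": [43.806686, 43.784535, 0.0],
    "Longitude": [-79.194353, -79.160497, 0.0],
})

# A left join keeps every postcode from `left`; validate raises if a postcode
# is duplicated on either side; indicator adds a `_merge` column.
checked = pd.merge(left, right, on="Postcode", how="left",
                   validate="one_to_one", indicator=True)

# Postcodes that found no coordinates.
unmatched = checked.loc[checked["_merge"] == "left_only", "Postcode"].tolist()
print(unmatched)
```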

Display the first rows of the merged dataframe

In [23]:
df_merged.head(12)

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea,Golden Mile,Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest,Cliffside,Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff,Cliffside West",43.692657,-79.264848


In [25]:
df_merged.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 103 entries, 0 to 102
Data columns (total 5 columns):
Postcode        103 non-null object
Borough         103 non-null object
Neighborhood    103 non-null object
Latitude        103 non-null float64
Longitude       103 non-null float64
dtypes: float64(2), object(3)
memory usage: 4.8+ KB


In [26]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(df_merged['Borough'].unique()),
        df_merged.shape[0]
    )
)

The dataframe has 11 boroughs and 103 neighborhoods.


### PART - 3

##### Explore and cluster the neighborhoods in Toronto using Geolocator and Foursquare API

In [34]:
df.shape

(204, 3)

###### Get the latitudes and longitudes of all neighborhoods in Toronto using 'geolocator'

In [36]:
# Declare List for latitudes and longitudes
latitude_toronto=[]
longitude_toronto=[]

def geo_data(address):
        geolocator = Nominatim(user_agent="toronto_explorer")
        location = geolocator.geocode(address)
        lat = location.latitude
        lng = location.longitude
        
        return lat, lng
    
for row in df['Neighborhood']:
    address=str(row)+", TO"
    lat, lng = geo_data(address)
    #print(address, lat, lng)
    latitude_toronto.append(lat)
    longitude_toronto.append(lng)
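
Free-text geocoding can match a place with the same name in another city or country, and some of the coordinates printed later in this notebook (Parkwoods and Rouge, for example) are far from Toronto. A simple sanity check is to test each result against a rough bounding box. The sketch below assumes box limits picked by eye for illustration; they are not an official boundary, and `in_toronto` is a helper name made up for this example.

```python
# Rough bounding box around Toronto. These limits are an assumption made
# for illustration and are not an official city boundary.
TORONTO_BOUNDS = {"lat": (43.4, 44.0), "lng": (-79.8, -79.0)}

def in_toronto(lat, lng, bounds=TORONTO_BOUNDS):
    """Return True when (lat, lng) falls inside the rough bounding box."""
    lat_min, lat_max = bounds["lat"]
    lng_min, lng_max = bounds["lng"]
    return lat_min <= lat <= lat_max and lng_min <= lng <= lng_max

# Coordinates copied from this notebook's own outputs.
geocoded = {
    "Parkwoods": (37.856774, -122.220688),
    "Victoria Village": (43.732658, -79.311189),
    "Rouge": (55.898090, -4.213624),
}

# Names whose geocoded position lies outside the box and needs a second look.
suspicious = [name for name, (lat, lng) in geocoded.items()
              if not in_toronto(lat, lng)]
print(suspicious)  # ['Parkwoods', 'Rouge']
```

Rows flagged this way could be geocoded again with a more specific address string, such as one that names the city and province in full.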


In [37]:
#Add latitudes and Longitudes to the dataframe
df['latitudes']=latitude_toronto
df['longitudes']=longitude_toronto

#### Define Foursquare Credentials and Version

In [45]:
CLIENT_ID = '*******'
CLIENT_SECRET = '****'

VERSION = '20180604'

In [46]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [47]:
venues_list=[]

def getNearbyVenues(names, latitudes, longitudes):

    #for name, lat, long in zip(names, latitudes, longitudes):
    lat=latitudes
    lng=longitudes
    radius=500
    LIMIT=100
    
    #url for foursquare API
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        lat, 
        lng, 
        radius, 
        LIMIT)
    
    #Get the url requests
    results=requests.get(url).json()

    venues = results['response']['groups'][0]['items']

    venues_list.append([(names,  # use the function argument, not a global loop variable
                         lat,
                         lng,
                         v['venue']['name'],
                         v['venue']['location']['lat'],
                         v['venue']['location']['lng'],
                         v['venue']['categories'][0]) for v in venues])


    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                      'Neighborhood Latitude', 
                      'Neighborhood Longitude', 
                      'Venue', 
                      'Venue Latitude', 
                      'Venue Longitude', 
                      'Venue Category']
    
    return(nearby_venues)
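
The function above indexes straight into `results['response']['groups'][0]['items']` and `categories[0]`, which raises if a response has no groups or a venue has no category. The following sketch shows a more defensive way to flatten one response. The payload is hand-written and only mirrors the fields that the notebook code reads; it is not a full or authoritative description of the API response, and `extract_venues` is a hypothetical helper name.

```python
def extract_venues(results, neighborhood, lat, lng):
    """Flatten one explore-style response into a list of row tuples."""
    groups = results.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # Fall back to None when a venue carries no category.
        category = categories[0]["name"] if categories else None
        rows.append((neighborhood, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows

# Hand-written sample payload (hypothetical values).
sample = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Jatujak",
                           "location": {"lat": 43.736208, "lng": -79.307668},
                           "categories": [{"name": "Thai Restaurant"}]}},
                {"venue": {"name": "Unnamed Spot",
                           "location": {"lat": 43.73, "lng": -79.31},
                           "categories": []}},
            ]
        }]
    }
}

rows = extract_venues(sample, "Victoria Village", 43.732658, -79.311189)
empty = extract_venues({"response": {}}, "Nowhere", 0.0, 0.0)
print(rows)
```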


In [48]:
toronto_venues=pd.DataFrame()
for name, lat, long in zip(df['Neighborhood'], df['latitudes'], df['longitudes']):
    toronto_venues = getNearbyVenues(name, lat, long)

#Display the dimension of dataframe
toronto_venues.shape

(4841, 7)

In [49]:
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,37.856774,-122.220688,"GCH Remodel, LLC",37.857931,-122.221924,"{'id': '5454144b498ec1f095bff2f2', 'name': 'Co..."
1,Parkwoods,37.856774,-122.220688,Parkwoods Gym,37.856478,-122.220779,"{'id': '4bf58dd8d48988d176941735', 'name': 'Gy..."
2,Parkwoods,37.856774,-122.220688,Wild Turkey Hill,37.856666,-122.219858,"{'id': '4bf58dd8d48988d159941735', 'name': 'Tr..."
3,Parkwoods,37.856774,-122.220688,Caldecott Tunnel,37.854929,-122.216535,"{'id': '52f2ab2ebcbc57f1066b8b4a', 'name': 'Tu..."
4,Victoria Village,43.732658,-79.311189,Jatujak,43.736208,-79.307668,"{'id': '4bf58dd8d48988d149941735', 'name': 'Th..."


Rename columns of the toronto_venues dataframe

In [56]:
toronto_venues.rename(columns={'Venue Latitude':'latitude', 'Venue Longitude':'longitude', 'Venue Category':'category'}, inplace=True)
toronto_venues.columns

Index(['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude',
       'Venue', 'latitude', 'longitude', 'category'],
      dtype='object')

In [63]:
category=[]
for rows in toronto_venues['category'].index:
    category.append(toronto_venues['category'][rows]['name'])
    

In [65]:
toronto_venues['Category']=category

In [67]:
toronto_venues.drop('category', axis=1, inplace=True)

In [68]:
toronto_venues.head(2)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,latitude,longitude,Category
0,Parkwoods,37.856774,-122.220688,"GCH Remodel, LLC",37.857931,-122.221924,Construction & Landscaping
1,Parkwoods,37.856774,-122.220688,Parkwoods Gym,37.856478,-122.220779,Gym


In [58]:
toronto_venues.groupby('Neighborhood').count().reset_index()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,latitude,longitude,category
0,Adelaide,100,100,100,100,100,100
1,Agincourt,6,6,6,6,6,6
2,Agincourt North,28,28,28,28,28,28
3,Albion Gardens,26,26,26,26,26,26
4,Alderwood,9,9,9,9,9,9
5,Bathurst Quay,25,25,25,25,25,25
6,Bayview Village,15,15,15,15,15,15
7,Bedford Park,2,2,2,2,2,2
8,Berczy Park,100,100,100,100,100,100
9,Birch Cliff,2,2,2,2,2,2


In [70]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Zoo,Accessories Store,Afghan Restaurant,African Restaurant,Airport Service,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,...,Video Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [71]:
toronto_onehot.shape

(4841, 345)

In [72]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighborhood,Zoo,Accessories Store,Afghan Restaurant,African Restaurant,Airport Service,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,...,Video Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Adelaide,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Agincourt North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0
3,Albion Gardens,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Alderwood,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [73]:
toronto_grouped.shape

(185, 345)
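
The one-hot and group-by-mean steps turn a list of venues into one frequency profile per neighborhood: each value is the share of that neighborhood's venues that fall in a category. A small sketch on made-up venues, with hypothetical neighborhoods "A" and "B", makes the arithmetic visible.

```python
import pandas as pd

# Six made-up venues in two hypothetical neighborhoods.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Category": ["Coffee Shop", "Coffee Shop", "Park", "Gym", "Park", "Park"],
})

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(venues["Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of the indicators per neighborhood is the category frequency.
freq = onehot.groupby("Neighborhood").mean().reset_index()
print(freq)
```

Neighborhood "A" has four venues, two of them coffee shops, so its "Coffee Shop" frequency is 0.5; every row of frequencies sums to 1.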

#### Let's print each neighborhood along with the top 15 most common venues

In [74]:
num_top_venues = 15

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide----
                 venue  freq
0          Coffee Shop  0.10
1                 Café  0.08
2     Asian Restaurant  0.07
3   Chinese Restaurant  0.06
4                  Bar  0.05
5                Hotel  0.04
6                  Pub  0.04
7       Breakfast Spot  0.03
8   Italian Restaurant  0.03
9     Sushi Restaurant  0.03
10      Ice Cream Shop  0.03
11                 Gym  0.02
12        Dessert Shop  0.02
13            Tea Room  0.02
14            Wine Bar  0.02


----Agincourt----
                            venue  freq
0                 Harbor / Marina  0.33
1                          Bakery  0.17
2                   Train Station  0.17
3                   Grocery Store  0.17
4                    Home Service  0.17
5                            Park  0.00
6   Paper / Office Supplies Store  0.00
7            Pakistani Restaurant  0.00
8                 Paintball Field  0.00
9                    Outlet Store  0.00
10           Outdoor Supply Store  0.00
11              Out

#### Let's put that into a *pandas* dataframe

In [75]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [76]:
num_top_venues = 15

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,Adelaide,Coffee Shop,Café,Asian Restaurant,Chinese Restaurant,Bar,Hotel,Pub,Ice Cream Shop,Italian Restaurant,Sushi Restaurant,Breakfast Spot,Tea Room,Plaza,Pizza Place,Wine Bar
1,Agincourt,Harbor / Marina,Bakery,Grocery Store,Train Station,Home Service,Fast Food Restaurant,Empanada Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Yoga Studio,Electronics Store
2,Agincourt North,Chinese Restaurant,Bakery,Ice Cream Shop,Japanese Restaurant,Clothing Store,Coffee Shop,Fast Food Restaurant,Fried Chicken Joint,Discount Store,Sandwich Place,Juice Bar,Pizza Place,Sporting Goods Shop,Beer Store,Liquor Store
3,Albion Gardens,Café,Bar,Grocery Store,Soccer Stadium,Supermarket,Outlet Store,Knitting Store,Beach,Liquor Store,Bakery,Clothing Store,Gastropub,Park,Shopping Plaza,Chinese Restaurant
4,Alderwood,Pizza Place,Gym,Skating Rink,Sandwich Place,Coffee Shop,Pharmacy,Pool,Pub,Fast Food Restaurant,Empanada Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm
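
As a compact illustration of the ranking step, the sketch below picks the top categories from one frequency row with `Series.nlargest` and builds the ordinal column labels with a small helper. The frequencies are made up, and `ordinal` is a hypothetical helper that also handles numbers such as 21 and 22, which the `indicators` list approach would label "21th" and "22th".

```python
import pandas as pd

def ordinal(n):
    """Return the ordinal label for a positive integer, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

# Made-up category frequencies for a single neighborhood.
freq_row = pd.Series({"Coffee Shop": 0.10, "Park": 0.02, "Bar": 0.05, "Gym": 0.01})

# The three most frequent categories, in descending order of frequency.
top = freq_row.nlargest(3)
labels = {f"{ordinal(i)} Most Common Venue": name
          for i, name in enumerate(top.index, start=1)}
print(labels)
```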


## Cluster Neighborhoods

Run *k*-means to cluster the neighborhoods into 5 clusters.

In [77]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[10:30] 

array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2],
      dtype=int32)
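
To show what the labels mean, here is a minimal sketch of k-means on four made-up frequency profiles with three categories each. The data and the choice of two clusters are assumptions for the example. The cluster numbers themselves are arbitrary; only which rows share a label matters.

```python
import numpy as np
from sklearn.cluster import KMeans

# Four made-up neighborhood profiles: the first two are dominated by the
# first category, the last two by the third category.
profiles = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.9],
])

# n_init is set explicitly so the behavior does not depend on library defaults.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles)
labels = model.labels_
print(labels)
```

Rows with similar profiles receive the same label, so the first two rows end up in one cluster and the last two in the other.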

#### Add clustering labels

In [95]:
# add clustering labels (guarded so that re-running the cell does not raise)
if 'Cluster Labels' not in neighborhoods_venues_sorted.columns:
    neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [83]:
neighborhoods_venues_sorted.head(2)

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,2,Adelaide,Coffee Shop,Café,Asian Restaurant,Chinese Restaurant,Bar,Hotel,Pub,Ice Cream Shop,Italian Restaurant,Sushi Restaurant,Breakfast Spot,Tea Room,Plaza,Pizza Place,Wine Bar
1,2,Agincourt,Harbor / Marina,Bakery,Grocery Store,Train Station,Home Service,Fast Food Restaurant,Empanada Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Yoga Studio,Electronics Store


#### Merge the postal code dataframe with the Toronto venues dataframe on the "Neighborhood" column

In [84]:
# merge df with neighborhoods_venues_sorted to add latitude/longitude for each neighborhood
toronto_merged=pd.merge(left=df, right=neighborhoods_venues_sorted, left_on='Neighborhood', right_on='Neighborhood')

toronto_merged.head(3) # check the last columns!

Unnamed: 0,Postcode,Borough,Neighborhood,latitudes,longitudes,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,...,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,M3A,North York,Parkwoods,37.856774,-122.220688,4,Gym,Construction & Landscaping,Trail,Tunnel,...,Fast Food Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Festival,Electronics Store,Field
1,M4A,North York,Victoria Village,43.732658,-79.311189,1,Mediterranean Restaurant,Park,Thai Restaurant,Middle Eastern Restaurant,...,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Festival,Field,Filipino Restaurant,Ethiopian Restaurant,Fish & Chips Shop
2,M5A,Downtown Toronto,Harbourfront,43.64008,-79.38015,2,Coffee Shop,Café,Hotel,Pizza Place,...,Restaurant,Sushi Restaurant,Sporting Goods Shop,Chinese Restaurant,Sports Bar,Plaza,Steakhouse,Gym,Park,Fried Chicken Joint


In [85]:
toronto_merged.reset_index(drop=True)

Unnamed: 0,Postcode,Borough,Neighborhood,latitudes,longitudes,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,...,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,M3A,North York,Parkwoods,37.856774,-122.220688,4,Gym,Construction & Landscaping,Trail,Tunnel,...,Fast Food Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Festival,Electronics Store,Field
1,M4A,North York,Victoria Village,43.732658,-79.311189,1,Mediterranean Restaurant,Park,Thai Restaurant,Middle Eastern Restaurant,...,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Festival,Field,Filipino Restaurant,Ethiopian Restaurant,Fish & Chips Shop
2,M5A,Downtown Toronto,Harbourfront,43.640080,-79.380150,2,Coffee Shop,Café,Hotel,Pizza Place,...,Restaurant,Sushi Restaurant,Sporting Goods Shop,Chinese Restaurant,Sports Bar,Plaza,Steakhouse,Gym,Park,Fried Chicken Joint
3,M5A,Downtown Toronto,Regent Park,43.660706,-79.360457,2,Coffee Shop,Thai Restaurant,Food Truck,Pet Store,...,Sushi Restaurant,Grocery Store,Park,Auto Dealership,Restaurant,Electronics Store,Fast Food Restaurant,Performing Arts Venue,Pub,Indian Restaurant
4,M6A,North York,Lawrence Heights,43.722778,-79.450933,2,Clothing Store,Coffee Shop,Restaurant,American Restaurant,...,Furniture / Home Store,Cosmetics Shop,Fast Food Restaurant,Toy / Game Store,Shopping Mall,Electronics Store,Bakery,Bookstore,Jewelry Store,Sporting Goods Shop
5,M6A,North York,Lawrence Manor,43.722079,-79.437507,2,Park,Electronics Store,Pharmacy,Kids Store,...,Food Court,Empanada Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Food,Flower Shop
6,M7A,Queen's Park,Queen's Park,43.659980,-79.390369,2,Coffee Shop,Café,Sandwich Place,Italian Restaurant,...,Ice Cream Shop,Indian Restaurant,Bubble Tea Shop,Middle Eastern Restaurant,Park,Bookstore,Bar,Thai Restaurant,Mediterranean Restaurant,Smoothie Shop
7,M9A,Etobicoke,Islington Avenue,43.659276,-79.529795,2,Pharmacy,Bank,Grocery Store,Café,...,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Yoga Studio,Field,Filipino Restaurant
8,M1B,Scarborough,Rouge,55.898090,-4.213624,2,Indian Restaurant,Fast Food Restaurant,Yoga Studio,Field,...,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Festival,Filipino Restaurant,Electronics Store,Fish & Chips Shop,Fish Market
9,M1B,Scarborough,Malvern,57.032488,-2.151445,1,Chinese Restaurant,Park,Supermarket,Grocery Store,...,Festival,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Empanada Restaurant,Filipino Restaurant


#### Use geopy library to get the latitude and longitude values of Toronto City.

In [86]:
address = 'Toronto, TO'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


### Create a map of Toronto with neighborhoods superimposed on top using Folium

In [87]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)


# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
#print(ys)
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['latitudes'], toronto_merged['longitudes'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    #print('cluster', cluster)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.9).add_to(map_clusters)
       
map_clusters


### Examine Clusters

##### CLUSTER 1

In [88]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
146,Scarborough,0,Playground,Yoga Studio,Field,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Festival,Filipino Restaurant,Electronics Store,Fish & Chips Shop,Fish Market
163,Etobicoke,0,Playground,Yoga Studio,Field,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Festival,Filipino Restaurant,Electronics Store,Fish & Chips Shop,Fish Market


##### CLUSTER 2

In [89]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
1,North York,1,Mediterranean Restaurant,Park,Thai Restaurant,Middle Eastern Restaurant,Yoga Studio,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Festival,Field,Filipino Restaurant,Ethiopian Restaurant,Fish & Chips Shop
9,Scarborough,1,Chinese Restaurant,Park,Supermarket,Grocery Store,Yoga Studio,Festival,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Empanada Restaurant,Filipino Restaurant
17,Etobicoke,1,Pub,Bus Stop,Park,Cuban Restaurant,Cupcake Shop,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Festival,Creperie,Field,Filipino Restaurant
20,Scarborough,1,Park,Yoga Studio,Field,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Festival,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market
28,Etobicoke,1,Park,Yoga Studio,Field,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Festival,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market
30,Etobicoke,1,Dog Run,Flower Shop,Park,Yoga Studio,Festival,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop
37,East York,1,Park,Convenience Store,Japanese Restaurant,Sandwich Place,Festival,Empanada Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Yoga Studio,Electronics Store
51,North York,1,Park,Tennis Court,Empanada Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Festival,Yoga Studio,Filipino Restaurant,Fish & Chips Shop,Fish Market
62,Scarborough,1,Metro Station,Park,Deli / Bodega,Business Service,Yoga Studio,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Festival,Field,Filipino Restaurant,Ethiopian Restaurant,Fish & Chips Shop
76,North York,1,Middle Eastern Restaurant,Park,Filipino Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Festival,Field,Fish & Chips Shop,Empanada Restaurant,Fish Market,Flea Market


##### CLUSTER 3

In [96]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
2,Downtown Toronto,2,Coffee Shop,Café,Hotel,Pizza Place,Italian Restaurant,Restaurant,Sushi Restaurant,Sporting Goods Shop,Chinese Restaurant,Sports Bar,Plaza,Steakhouse,Gym,Park,Fried Chicken Joint
3,Downtown Toronto,2,Coffee Shop,Thai Restaurant,Food Truck,Pet Store,Beer Store,Sushi Restaurant,Grocery Store,Park,Auto Dealership,Restaurant,Electronics Store,Fast Food Restaurant,Performing Arts Venue,Pub,Indian Restaurant
4,North York,2,Clothing Store,Coffee Shop,Restaurant,American Restaurant,Men's Store,Furniture / Home Store,Cosmetics Shop,Fast Food Restaurant,Toy / Game Store,Shopping Mall,Electronics Store,Bakery,Bookstore,Jewelry Store,Sporting Goods Shop
5,North York,2,Park,Electronics Store,Pharmacy,Kids Store,Bank,Food Court,Empanada Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Food,Flower Shop
6,Queen's Park,2,Coffee Shop,Café,Sandwich Place,Italian Restaurant,Pharmacy,Ice Cream Shop,Indian Restaurant,Bubble Tea Shop,Middle Eastern Restaurant,Park,Bookstore,Bar,Thai Restaurant,Mediterranean Restaurant,Smoothie Shop
7,Etobicoke,2,Pharmacy,Bank,Grocery Store,Café,Shopping Mall,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Yoga Studio,Field,Filipino Restaurant
8,Scarborough,2,Indian Restaurant,Fast Food Restaurant,Yoga Studio,Field,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Festival,Filipino Restaurant,Electronics Store,Fish & Chips Shop,Fish Market
10,North York,2,Coffee Shop,Restaurant,American Restaurant,Furniture / Home Store,Bar,Gourmet Shop,Movie Theater,Liquor Store,Sushi Restaurant,Sandwich Place,Chocolate Shop,Supermarket,Shoe Store,Bakery,Mexican Restaurant
11,East York,2,Park,Hobby Shop,Bar,Coffee Shop,Bakery,Food,Food Court,Food Service,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
12,Downtown Toronto,2,Clothing Store,Coffee Shop,Hotel,Cosmetics Shop,Middle Eastern Restaurant,Restaurant,Fast Food Restaurant,Café,Japanese Restaurant,Ramen Restaurant,Theater,Lingerie Store,Sporting Goods Shop,Bookstore,Tea Room
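
The per-cluster cells in this section all repeat the same selection with a different label. As a minimal sketch, the selection could be wrapped in a small helper. The name `cluster_rows`, the toy frame `toy`, and the column names before `Cluster Labels` are assumptions made for illustration (with placeholder values); only the column positions (Borough at index 1, `Cluster Labels` at index 5) follow the notebook's code.

```python
import pandas as pd

# Toy stand-in for toronto_merged. Column names before 'Cluster Labels'
# are assumed, and the values are placeholders, not real data.
toy = pd.DataFrame({
    "Postcode": ["X1", "X2", "X3", "X4"],
    "Borough": ["North York", "North York", "Downtown Toronto", "Etobicoke"],
    "Neighbourhood": ["A", "B", "C", "D"],
    "Latitude": [0.0, 0.0, 0.0, 0.0],
    "Longitude": [0.0, 0.0, 0.0, 0.0],
    "Cluster Labels": [4, 2, 2, 3],
    "1st Most Common Venue": ["Gym", "Clothing Store", "Coffee Shop", "Home Service"],
})


def cluster_rows(df, label, first_col=5):
    """Hypothetical helper: rows of one cluster, keeping Borough (column 1)
    and every column from position `first_col` onward."""
    cols = df.columns[[1] + list(range(first_col, df.shape[1]))]
    return df.loc[df["Cluster Labels"] == label, cols]


# The heading "CLUSTER 3" corresponds to the zero-based label 2.
cluster3 = cluster_rows(toy, 2)
print(cluster3)
```

With such a helper, each cluster cell would reduce to a call like `cluster_rows(toronto_merged, 2)`, assuming `toronto_merged` has the column order used above.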


##### CLUSTER 4

In [97]:
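# Cluster 4 is label 3 (KMeans labels are zero-based).
# Show Borough (column 1) plus every column from 'Cluster Labels' (column 5) onward.
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]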

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
177,Etobicoke,3,Home Service,Music Store,Food Service,Food Court,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Festival,Field,Empanada Restaurant,Filipino Restaurant


##### CLUSTER 5

In [98]:
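# Cluster 5 is label 4 (KMeans labels are zero-based).
# Show Borough (column 1) plus every column from 'Cluster Labels' (column 5) onward.
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]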

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,North York,4,Gym,Construction & Landscaping,Trail,Tunnel,Yoga Studio,Fast Food Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Festival,Electronics Store,Field
50,North York,4,Construction & Landscaping,Trail,Yoga Studio,Field,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Festival,Filipino Restaurant,Electronics Store,Fish & Chips Shop
52,North York,4,Construction & Landscaping,Yoga Studio,Filipino Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Festival,Field,Fish & Chips Shop,Empanada Restaurant,Fish Market,Flea Market
53,North York,4,Construction & Landscaping,Wine Bar,Dog Run,Yoga Studio,Field,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Festival,Fish & Chips Shop,Filipino Restaurant,Empanada Restaurant
156,Downtown Toronto,4,Construction & Landscaping,Yoga Studio,Filipino Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Festival,Field,Fish & Chips Shop,Empanada Restaurant,Fish Market,Flea Market
173,Etobicoke,4,Construction & Landscaping,Trail,Yoga Studio,Field,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Festival,Filipino Restaurant,Electronics Store,Fish & Chips Shop
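
Reading the cluster tables one by one makes it hard to compare them. One possible way to condense them, sketched here under assumptions, is to count the rows per cluster and take the most frequent value of `1st Most Common Venue`. The frame `toy` holds only a handful of rows copied from the tables above, and the names `toy`, `summary`, `n_rows`, and `top_venue` are illustrative, not part of the notebook.

```python
import pandas as pd

# A few rows copied from the cluster tables above (not the full data).
toy = pd.DataFrame({
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York",
                "Etobicoke", "North York", "North York", "Etobicoke"],
    "Cluster Labels": [2, 2, 2, 3, 4, 4, 4],
    "1st Most Common Venue": ["Coffee Shop", "Coffee Shop", "Clothing Store",
                              "Home Service", "Gym",
                              "Construction & Landscaping",
                              "Construction & Landscaping"],
})

grouped = toy.groupby("Cluster Labels")["1st Most Common Venue"]

# Rows per cluster, and the most frequent top venue in each cluster.
# mode() can return several values on a tie; iloc[0] keeps the first one.
summary = pd.DataFrame({
    "n_rows": grouped.size(),
    "top_venue": grouped.agg(lambda s: s.mode().iloc[0]),
})
print(summary)
```

Applied to the full `toronto_merged` frame, the same grouping would give one line per cluster label, which can make single-row clusters such as label 3 easier to spot.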
