<a id='index'></a>
# Segmenting and Clustering Neighborhoods in Toronto

## INDEX

[Jump to Part 1- Scrape Wikipedia List of postal codes](#p1)

[Jump to Part 2- Geospatial_data](#p2)

[Jump to Part 3- Explore and cluster the neighborhoods in Toronto](#p3)

[Jump to Part 3.1. - Analyze Each Neighborhood](#p31)

[Jump to Part 3.2. - Explore and cluster the neighborhoods in Toronto](#p32)

[Jump to Part 3.3. - Examine Clusters](#p33)

<a id='p1'></a>
# PART 1 - Part 1- Scrape Wikipedia List of postal codes

[Go back to index](#index)

## Importing libraries

In [1]:
import pandas as pd
import numpy as np

## Scrape the dataframe

In [2]:
#https://stackoverflow.com/questions/55234512/how-to-scrap-wikipedia-tables-with-python
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
table = pd.read_html(url)[0]
table

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Not assigned
9,M8A,Not assigned,Not assigned


##  Wrangling the data

In [3]:
table = table.drop(table[(table.Borough == "Not assigned")].index)

### Combining neighbourhood into one row

In [4]:
table["duplicated"]=table.duplicated(keep='first', subset="Postcode")
table['Neighborhood_2'] = np.where(table['duplicated']==True, table.Neighbourhood +', ', table.Neighbourhood)
table.drop(['Neighbourhood', 'duplicated'], axis=1, inplace=True)
table = table.groupby(['Postcode', 'Borough'], as_index=False).sum()
table.rename(columns={"Neighborhood_2": "Neighborhood"}, inplace=True)

### Neighborhood "Not assigned" to the same as the borough

In [5]:
table['Neighborhood_2'] = np.where(table['Neighborhood']=="Not assigned", table.Borough, table.Neighborhood)
table.drop(['Neighborhood'], axis=1, inplace=True)
table.rename(columns={"Neighborhood_2": "Neighborhood"}, inplace=True)

In [6]:
table.head()

Unnamed: 0,Postcode,Borough,Neighborhood
0,M1B,Scarborough,"RougeMalvern,"
1,M1C,Scarborough,"Highland CreekRouge Hill, Port Union,"
2,M1E,Scarborough,"GuildwoodMorningside, West Hill,"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [7]:
table.shape

(103, 3)

<a id='p2'></a>
[Go back to index](#index)
# Part 2 - Geospatial data

In [8]:
data = pd.read_csv('https://cocl.us/Geospatial_data')
data.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [9]:
table = table.merge(data, left_on='Postcode', right_on='Postal Code')
table.drop(['Postal Code'], axis=1, inplace=True)
table.head(11)

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"RougeMalvern,",43.806686,-79.194353
1,M1C,Scarborough,"Highland CreekRouge Hill, Port Union,",43.784535,-79.160497
2,M1E,Scarborough,"GuildwoodMorningside, West Hill,",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount ParkIonview, Kennedy Park,",43.727929,-79.262029
7,M1L,Scarborough,"ClairleaGolden Mile, Oakridge,",43.711112,-79.284577
8,M1M,Scarborough,"CliffcrestCliffside, Scarborough Village West,",43.716316,-79.239476
9,M1N,Scarborough,"Birch CliffCliffside West,",43.692657,-79.264848


<a id='p3'></a>
[Go back to index](#index)
# Part 3 - Explore and cluster the neighborhoods in Toronto

## Importing libraries

In [10]:
!pip install geopy



In [11]:
import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

## Creating a map with neighborhoods superimposed on top

In [12]:
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto City are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto City are 43.653963, -79.387207.


In [13]:
# create map of New York using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(table['Latitude'], table['Longitude'], table['Borough'], table['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [14]:
toronto_data = table
toronto_data.head()

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"RougeMalvern,",43.806686,-79.194353
1,M1C,Scarborough,"Highland CreekRouge Hill, Port Union,",43.784535,-79.160497
2,M1E,Scarborough,"GuildwoodMorningside, West Hill,",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [15]:
address = 'Toronto, CA'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.653963, -79.387207.


In [16]:
# create map of Manhattan using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

#### Define Foursquare Credentials and Version

In [17]:
CLIENT_ID = 'CE1N3QEBYUJG4ETCLZ1QHK3C2K1DMJBAH2ZBXJPICVW5AE5Q' # your Foursquare ID
CLIENT_SECRET = 'HYA2B01YII2OQRP4OWFAEWTOUXHGM35FCQ4YOJI55Q1YVKUL' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: CE1N3QEBYUJG4ETCLZ1QHK3C2K1DMJBAH2ZBXJPICVW5AE5Q
CLIENT_SECRET:HYA2B01YII2OQRP4OWFAEWTOUXHGM35FCQ4YOJI55Q1YVKUL


In [18]:
toronto_data.loc[0, 'Neighborhood']

'RougeMalvern, '

In [19]:
neighborhood_latitude = toronto_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = toronto_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = toronto_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of RougeMalvern,  are 43.806686299999996, -79.19435340000001.


In [20]:
LIMIT = 100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=CE1N3QEBYUJG4ETCLZ1QHK3C2K1DMJBAH2ZBXJPICVW5AE5Q&client_secret=HYA2B01YII2OQRP4OWFAEWTOUXHGM35FCQ4YOJI55Q1YVKUL&v=20180605&ll=43.806686299999996,-79.19435340000001&radius=500&limit=100'

#### get_category_type function from the Foursquare lab.

In [21]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

### clean the json and structure it into a *pandas* dataframe.

In [22]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

#### Now write the code to run the above function on each neighborhood and create a new dataframe called *Toronto_venues*.

In [23]:
toronto_venues = getNearbyVenues(names=table['Neighborhood'],
                                   latitudes=table['Latitude'],
                                   longitudes=table['Longitude']
                                  )

RougeMalvern, 
Highland CreekRouge Hill, Port Union, 
GuildwoodMorningside, West Hill, 
Woburn
Cedarbrae
Scarborough Village
East Birchmount ParkIonview, Kennedy Park, 
ClairleaGolden Mile, Oakridge, 
CliffcrestCliffside, Scarborough Village West, 
Birch CliffCliffside West, 
Dorset ParkScarborough Town Centre, Wexford Heights, 
MaryvaleWexford, 
Agincourt
Clarks CornersSullivan, Tam O'Shanter, 
Agincourt NorthL'Amoreaux East, Milliken, Steeles East, 
L'Amoreaux West
Upper Rouge
Hillcrest Village
FairviewHenry Farm, Oriole, 
Bayview Village
Silver HillsYork Mills, 
NewtonbrookWillowdale, 
Willowdale South
York Mills West
Willowdale West
Parkwoods
Don Mills North
Flemingdon ParkDon Mills South, 
Bathurst ManorDownsview North, Wilson Heights, 
Northwood ParkYork University, 
CFB TorontoDownsview East, 
Downsview West
Downsview Central
Downsview Northwest
Victoria Village
Woodbine GardensParkview Hill, 
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto
The Danforth WestRi

#### Let's check the size of the resulting dataframe

In [24]:
print(toronto_venues.shape)
toronto_venues.head()

(2239, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"RougeMalvern,",43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant
1,"Highland CreekRouge Hill, Port Union,",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
2,"GuildwoodMorningside, West Hill,",43.763573,-79.188711,Swiss Chalet Rotisserie & Grill,43.767697,-79.189914,Pizza Place
3,"GuildwoodMorningside, West Hill,",43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store
4,"GuildwoodMorningside, West Hill,",43.763573,-79.188711,Big Bite Burrito,43.766299,-79.19072,Mexican Restaurant


In [25]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"AdelaideKing, Richmond,",100,100,100,100,100,100
Agincourt,4,4,4,4,4,4
"Agincourt NorthL'Amoreaux East, Milliken, Steeles East,",2,2,2,2,2,2
"Albion GardensBeaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown,",9,9,9,9,9,9
"AlderwoodLong Branch,",8,8,8,8,8,8
"Bathurst ManorDownsview North, Wilson Heights,",19,19,19,19,19,19
Bayview Village,4,4,4,4,4,4
"Bedford ParkLawrence Manor East,",26,26,26,26,26,26
Berczy Park,55,55,55,55,55,55
"Birch CliffCliffside West,",4,4,4,4,4,4


#### Let's find out how many unique categories can be curated from all the returned venues

In [26]:
print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 272 uniques categories.


<a id='p31'></a>
[Go back to index](#index)
## 3.1. Analyze Each Neighborhood

In [27]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [28]:
toronto_onehot.shape

(2239, 272)

#### Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [29]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,"AdelaideKing, Richmond,",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.030000,...,0.00000,0.0,0.020000,0.000000,0.000000,0.000000,0.000000,0.010000,0.000000,0.000000
1,Agincourt,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,...,0.00000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
2,"Agincourt NorthL'Amoreaux East, Milliken, Stee...",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,...,0.00000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
3,"Albion GardensBeaumond Heights, Humbergate, Ja...",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,...,0.00000,0.0,0.000000,0.000000,0.111111,0.000000,0.000000,0.000000,0.000000,0.000000
4,"AlderwoodLong Branch,",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,...,0.00000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
5,"Bathurst ManorDownsview North, Wilson Heights,",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,...,0.00000,0.0,0.000000,0.000000,0.052632,0.000000,0.000000,0.000000,0.000000,0.000000
6,Bayview Village,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,...,0.00000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
7,"Bedford ParkLawrence Manor East,",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.038462,...,0.00000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.038462
8,Berczy Park,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,...,0.00000,0.0,0.018182,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
9,"Birch CliffCliffside West,",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,...,0.00000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000


In [30]:
toronto_grouped.shape

(100, 272)

#### Let's print each neighborhood along with the top 5 most common venues

In [31]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----AdelaideKing, Richmond, ----
         venue  freq
0  Coffee Shop  0.08
1         Café  0.05
2   Steakhouse  0.04
3          Bar  0.04
4        Hotel  0.03


----Agincourt----
                       venue  freq
0                     Lounge  0.25
1             Breakfast Spot  0.25
2  Latin American Restaurant  0.25
3               Skating Rink  0.25
4             Massage Studio  0.00


----Agincourt NorthL'Amoreaux East, Milliken, Steeles East, ----
                             venue  freq
0                       Playground   0.5
1                             Park   0.5
2                      Yoga Studio   0.0
3                    Metro Station   0.0
4  Molecular Gastronomy Restaurant   0.0


----Albion GardensBeaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown, ----
                  venue  freq
0           Pizza Place  0.22
1              Pharmacy  0.11
2           Video Store  0.11
3  Fast Food Restaurant  0.11
4   Fried Chicken Joint  0.1

                venue  freq
0         Coffee Shop  0.33
1                Park  0.33
2   Convenience Store  0.33
3         Yoga Studio  0.00
4  Mexican Restaurant  0.00


----EmeryHumberlea, ----
                             venue  freq
0                   Baseball Field   1.0
1                      Yoga Studio   0.0
2               Mexican Restaurant   0.0
3  Molecular Gastronomy Restaurant   0.0
4       Modern European Restaurant   0.0


----FairviewHenry Farm, Oriole, ----
                  venue  freq
0        Clothing Store  0.13
1           Coffee Shop  0.07
2  Fast Food Restaurant  0.07
3         Women's Store  0.03
4      Toy / Game Store  0.03


----First Canadian PlaceUnderground city, ----
         venue  freq
0  Coffee Shop  0.12
1         Café  0.07
2   Restaurant  0.04
3        Hotel  0.04
4   Steakhouse  0.04


----Flemingdon ParkDon Mills South, ----
                 venue  freq
0           Beer Store  0.10
1                  Gym  0.10
2          Coffee Shop  0.10
3  Jap

                venue  freq
0                Café  0.08
1         Coffee Shop  0.08
2    Sushi Restaurant  0.06
3  Italian Restaurant  0.06
4            Tea Room  0.03


----RyersonGarden District, ----
            venue  freq
0  Clothing Store  0.08
1     Coffee Shop  0.07
2  Cosmetics Shop  0.04
3          Bakery  0.03
4            Café  0.03


----Scarborough Village----
                             venue  freq
0                       Playground   0.5
1                Convenience Store   0.5
2                      Yoga Studio   0.0
3                    Metro Station   0.0
4  Molecular Gastronomy Restaurant   0.0


----Silver HillsYork Mills, ----
                             venue  freq
0                        Cafeteria   1.0
1               Mexican Restaurant   0.0
2              Monument / Landmark   0.0
3  Molecular Gastronomy Restaurant   0.0
4       Modern European Restaurant   0.0


----St. James Town----
                venue  freq
0         Coffee Shop  0.07
1              

#### Let's put that into a *pandas* dataframe

In [32]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

#### Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [33]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"AdelaideKing, Richmond,",Coffee Shop,Café,Steakhouse,Bar,Sushi Restaurant,Hotel,Asian Restaurant,Restaurant,American Restaurant,Thai Restaurant
1,Agincourt,Latin American Restaurant,Lounge,Breakfast Spot,Skating Rink,Women's Store,Drugstore,Discount Store,Dog Run,Doner Restaurant,Donut Shop
2,"Agincourt NorthL'Amoreaux East, Milliken, Stee...",Park,Playground,Women's Store,Donut Shop,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant
3,"Albion GardensBeaumond Heights, Humbergate, Ja...",Pizza Place,Pharmacy,Sandwich Place,Fast Food Restaurant,Beer Store,Fried Chicken Joint,Grocery Store,Video Store,College Stadium,Department Store
4,"AlderwoodLong Branch,",Pizza Place,Gym,Coffee Shop,Pharmacy,Skating Rink,Sandwich Place,Pub,Discount Store,Department Store,Dessert Shop


<a id='p32'></a>
[Go back to index](#index)
## 3.2. Explore and cluster the neighborhoods in Toronto

Run *k*-means to cluster the neighborhood into 5 clusters.

In [34]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 3, 0, 3, 3, 3, 3, 3, 3, 3])

In [35]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"RougeMalvern,",43.806686,-79.194353,3.0,Fast Food Restaurant,Department Store,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop
1,M1C,Scarborough,"Highland CreekRouge Hill, Port Union,",43.784535,-79.160497,1.0,Bar,Women's Store,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Drugstore,Farmers Market
2,M1E,Scarborough,"GuildwoodMorningside, West Hill,",43.763573,-79.188711,3.0,Medical Center,Intersection,Electronics Store,Pizza Place,Breakfast Spot,Rental Car Location,Mexican Restaurant,Diner,Discount Store,Dog Run
3,M1G,Scarborough,Woburn,43.770992,-79.216917,3.0,Coffee Shop,Pharmacy,Korean Restaurant,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Dumpling Restaurant
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,3.0,Bakery,Lounge,Hakka Restaurant,Bank,Caribbean Restaurant,Athletics & Sports,Thai Restaurant,Fried Chicken Joint,Dog Run,Dim Sum Restaurant


In [66]:
toronto_merged['Cluster Labels'] = toronto_merged['Cluster Labels'].fillna(0).astype(int)

### Finally, let's visualize the resulting clusters

In [67]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<a id='p33'></a>
[Go back to index](#index)
## 5. Examine Clusters

#### Cluster 1

In [68]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels 2
14,Scarborough,0,Park,Playground,Women's Store,Donut Shop,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,0.0
16,Scarborough,0,,,,,,,,,,,
25,North York,0,Park,Bus Stop,Food & Drink Shop,Women's Store,Donut Shop,Diner,Discount Store,Dog Run,Doner Restaurant,Drugstore,0.0
30,North York,0,Park,Airport,Women's Store,Drugstore,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,0.0
40,East York,0,Convenience Store,Park,Coffee Shop,Dumpling Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,0.0
44,Central Toronto,0,Park,Swim School,Bus Line,Women's Store,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop,0.0
50,Downtown Toronto,0,Park,Playground,Trail,Empanada Restaurant,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Deli / Bodega,0.0
74,York,0,Park,Women's Store,Fast Food Restaurant,Market,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,0.0
90,Etobicoke,0,Park,Smoke Shop,River,Electronics Store,Empanada Restaurant,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Deli / Bodega,Doner Restaurant,0.0
93,Etobicoke,0,,,,,,,,,,,


#### Cluster 2

In [69]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels 2
1,Scarborough,1,Bar,Women's Store,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Drugstore,Farmers Market,1.0


#### Cluster 3

In [70]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels 2
21,North York,2,Gym,Department Store,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,2.0
48,Central Toronto,2,Gym,Department Store,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,2.0


#### Cluster 4

In [71]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels 2
0,Scarborough,3,Fast Food Restaurant,Department Store,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,3.0
2,Scarborough,3,Medical Center,Intersection,Electronics Store,Pizza Place,Breakfast Spot,Rental Car Location,Mexican Restaurant,Diner,Discount Store,Dog Run,3.0
3,Scarborough,3,Coffee Shop,Pharmacy,Korean Restaurant,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Dumpling Restaurant,3.0
4,Scarborough,3,Bakery,Lounge,Hakka Restaurant,Bank,Caribbean Restaurant,Athletics & Sports,Thai Restaurant,Fried Chicken Joint,Dog Run,Dim Sum Restaurant,3.0
5,Scarborough,3,Convenience Store,Playground,Drugstore,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Women's Store,3.0
6,Scarborough,3,Chinese Restaurant,Coffee Shop,Bus Station,Department Store,Women's Store,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,3.0
7,Scarborough,3,Bakery,Bus Line,Soccer Field,Park,Fast Food Restaurant,Metro Station,Bus Station,Intersection,Doner Restaurant,Dog Run,3.0
8,Scarborough,3,American Restaurant,Motel,Department Store,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Women's Store,3.0
9,Scarborough,3,Café,General Entertainment,College Stadium,Skating Rink,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,3.0
10,Scarborough,3,Indian Restaurant,Pet Store,Thrift / Vintage Store,Light Rail Station,Furniture / Home Store,Chinese Restaurant,Vietnamese Restaurant,Eastern European Restaurant,Dumpling Restaurant,Drugstore,3.0


#### Cluster 5

In [72]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels 2
63,Central Toronto,4,Garden,Women's Store,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Drugstore,Department Store,4.0
