# Applied Data Science Capstone
## Segmenting and Clustering Neighborhoods in Toronto

This notebook walks through the data wrangling needed to obtain a list of Toronto's boroughs, then segments and clusters them to see whether any pattern emerges.

## We start by importing necessary libraries.

In [24]:
import pandas as pd # DataFrame handling
import numpy as np # NaN values and array helpers

## Extracting the DataFrame from URL.

In [25]:
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M' #Wikipedia Page URL
Toronto=pd.read_html(url) # Reads the webpage to a list
df_Toronto=Toronto[0] # Dataframe we need is the first element of the list
df_Toronto.head() # Checking if importing worked fine

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


## Clean up the DataFrame and add Geographic Coordinates from the Geospatial CSV file

In [28]:
df_Toronto.replace("Not assigned", np.nan, inplace = True) # Replacing "Not assigned" with NaN to make next step easy

df_Toronto.dropna(subset=["Borough"], axis=0, inplace=True) # Dropping rows whose Borough is NaN

df_Toronto.reset_index(drop=True, inplace=True) # reset index, because rows were dropped

df_Toronto.shape # Shape of final DataFrame

(103, 3)

In [35]:
df_Toronto.sort_values(df_Toronto.columns[0], ascending = True , inplace = True) # Sorting by Postal Code
df_Toronto.head() # Checking again if sorting worked fine

Unnamed: 0,Postal Code,Borough,Neighbourhood
6,M1B,Scarborough,"Malvern, Rouge"
12,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
18,M1E,Scarborough,"Guildwood, Morningside, West Hill"
22,M1G,Scarborough,Woburn
26,M1H,Scarborough,Cedarbrae


In [29]:
Toronto_coordinates='http://cocl.us/Geospatial_data'
df_Toronto_coordinates=pd.read_csv(Toronto_coordinates)
df_Toronto_coordinates.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [36]:
df_Toronto_coordinates.sort_values(df_Toronto_coordinates.columns[0], ascending = True , inplace = True) # Sorting by Postal Code
df_Toronto_coordinates.head() # Checking again if sorting worked fine

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [39]:
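# Note: this assignment aligns rows on index labels, not on 'Postal Code'; df_Toronto still carries its pre-sort index here
df_Toronto[['Latitude','Longitude']]=df_Toronto_coordinates[['Latitude','Longitude']]
df_Toronto.reset_index(drop=True, inplace=True) # reset index, because we sorted
df_Toronto.head() # Checking that the coordinate columns were added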

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.727929,-79.262029
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.7942,-79.262029
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.778517,-79.346556
3,M1G,Scarborough,Woburn,43.77012,-79.408493
4,M1H,Scarborough,Cedarbrae,43.745906,-79.352188
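
The coordinates shown for M1B here differ from the ones listed for M1B in the geospatial table, which is consistent with the column assignment pairing rows by index label rather than by postal code. A minimal sketch of a key-based join with `DataFrame.merge` follows. The `boroughs` and `coords` frames are small stand-ins invented for illustration (coordinates copied from the geospatial output above), not the notebook's actual variables.

```python
import pandas as pd

# Stand-in for df_Toronto after sorting: index labels are no longer 0..n-1.
boroughs = pd.DataFrame(
    {
        "Postal Code": ["M1B", "M1C", "M1E"],
        "Borough": ["Scarborough", "Scarborough", "Scarborough"],
    },
    index=[6, 12, 18],
)

# Stand-in for df_Toronto_coordinates.
coords = pd.DataFrame(
    {
        "Postal Code": ["M1B", "M1C", "M1E"],
        "Latitude": [43.806686, 43.784535, 43.763573],
        "Longitude": [-79.194353, -79.160497, -79.188711],
    }
)

# Join on the shared key, so each postal code receives its own coordinates
# regardless of how either frame is indexed or ordered.
merged = boroughs.merge(coords, on="Postal Code", how="left")
print(merged)
```

With `how="left"`, a postal code missing from the coordinate table would show up as NaN rather than silently receiving another row's values.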


In [49]:
Toronto_data = df_Toronto[df_Toronto['Borough'].str.contains("Toronto")].reset_index(drop=True)
Toronto_data.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.786947,-79.385975
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"India Bazaar, The Beaches West",43.704324,-79.38879
3,M4M,East Toronto,Studio District,43.657162,-79.378937
4,M4N,Central Toronto,Lawrence Park,43.648198,-79.379817


## Importing API request handlers, advanced analysis and visualization libraries

In [51]:
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium
print('Libraries imported.')


In [52]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="CA_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


## API Calls

We define the Foursquare credentials and API version.

In [53]:
CLIENT_ID = '0ZPU1H3ST2TYYZHI0TLPSZ3SHHVSBFELN0ECW3T4SRL1IO0T' # your Foursquare ID
CLIENT_SECRET = '4FBXS0Z1O1ODMWP2LGNI2BZ1D3XMRM4GYPTCFX1KFJXDYG1A' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 0ZPU1H3ST2TYYZHI0TLPSZ3SHHVSBFELN0ECW3T4SRL1IO0T
CLIENT_SECRET:4FBXS0Z1O1ODMWP2LGNI2BZ1D3XMRM4GYPTCFX1KFJXDYG1A
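
Hard-coding and printing API secrets in a shared notebook exposes them to every reader. One possible alternative, sketched below, reads them from environment variables and masks them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions made for this sketch, not part of the Foursquare API.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of variables.

    The variable names are hypothetical and chosen only for this sketch.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set")
    return client_id, client_secret


def mask(value, visible=4):
    """Keep the first few characters and replace the rest with asterisks."""
    return value[:visible] + "*" * (len(value) - visible)


# Demonstration with a stand-in mapping instead of the real environment.
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id-123",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret-456",
}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(client_id))
print("CLIENT_SECRET:", mask(client_secret))
```

In a real session the function would be called without arguments, so that it reads from `os.environ`.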


Let's create a function that repeats the data extraction process for every neighborhood in Toronto.

In [77]:
def getNearbyVenues(postcode, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for code, lat, lng in zip(postcode, latitudes, longitudes):
        print(code)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            code, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postal Code', 
                  'Neighbuorhood Latitude', 
                  'Neighbuorhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
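
The function above indexes `['groups'][0]` and `['categories'][0]` directly, so a response with no groups or a venue with no category would raise an exception. The sketch below shows one way to flatten the same response layout defensively. `extract_venues` is a hypothetical helper, and the sample payload (including the "Unnamed Kiosk" entry) is made up to exercise the edge cases.

```python
def extract_venues(payload, code, lat, lng):
    """Flatten one explore-style response into row tuples.

    Assumes the layout used above: response -> groups[0] -> items -> venue.
    Missing groups yield no rows; a missing category becomes None.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((
            code, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Made-up payload: one ordinary venue and one venue without a category.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Kaga Sushi",
               "location": {"lat": 43.787758, "lng": -79.381090},
               "categories": [{"name": "Japanese Restaurant"}]}},
    {"venue": {"name": "Unnamed Kiosk",
               "location": {"lat": 43.7879, "lng": -79.3810},
               "categories": []}},
]}]}}

rows = extract_venues(sample, "M4E", 43.786947, -79.385975)
print(rows)
print(extract_venues({}, "M4E", 43.786947, -79.385975))  # empty response
```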

In [78]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

Tor_venues = getNearbyVenues(postcode=Toronto_data['Postal Code'],
                                   latitudes=Toronto_data['Latitude'],
                                   longitudes=Toronto_data['Longitude'],
                                   radius=radius
                                  )

M4E
M4K
M4L
M4M
M4N
M4P
M4R
M4S
M4T
M4V
M4W
M4X
M4Y
M5A
M5B
M5C
M5E
M5G
M5H
M5J
M5K
M5L
M5N
M5P
M5R
M5S
M5T
M5V
M5W
M5X
M6G
M6H
M6J
M6K
M6P
M6R
M6S
M7A
M7Y


In [79]:
Tor_venues

Unnamed: 0,Postal Code,Neighbuorhood Latitude,Neighbuorhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,M4E,43.786947,-79.385975,Sun Star Chinese Cuisine 翠景小炒,43.787914,-79.381234,Chinese Restaurant
1,M4E,43.786947,-79.385975,TD Canada Trust,43.788074,-79.380367,Bank
2,M4E,43.786947,-79.385975,Maxim's Cafe and Patisserie,43.787863,-79.380751,Café
3,M4E,43.786947,-79.385975,Kaga Sushi,43.787758,-79.381090,Japanese Restaurant
4,M4K,43.679557,-79.352188,MenEssentials,43.677820,-79.351265,Cosmetics Shop
5,M4K,43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant
6,M4K,43.679557,-79.352188,Cafe Fiorentina,43.677743,-79.350115,Italian Restaurant
7,M4K,43.679557,-79.352188,Dolce Gelato,43.677773,-79.351187,Ice Cream Shop
8,M4K,43.679557,-79.352188,Louis Cifer Brew Works,43.677663,-79.351313,Brewery
9,M4K,43.679557,-79.352188,La Diperie,43.677702,-79.352265,Ice Cream Shop


In [80]:
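# Note: these assignments align on index labels (venue row i gets Toronto_data row i), not on 'Postal Code'
Tor_venues['Borough']=Toronto_data['Borough']
Tor_venues['Neighbourhood']=Toronto_data['Neighbourhood']
Tor_venues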

Unnamed: 0,Postal Code,Neighbuorhood Latitude,Neighbuorhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Borough,Neighbourhood
0,M4E,43.786947,-79.385975,Sun Star Chinese Cuisine 翠景小炒,43.787914,-79.381234,Chinese Restaurant,East Toronto,The Beaches
1,M4E,43.786947,-79.385975,TD Canada Trust,43.788074,-79.380367,Bank,East Toronto,"The Danforth West, Riverdale"
2,M4E,43.786947,-79.385975,Maxim's Cafe and Patisserie,43.787863,-79.380751,Café,East Toronto,"India Bazaar, The Beaches West"
3,M4E,43.786947,-79.385975,Kaga Sushi,43.787758,-79.381090,Japanese Restaurant,East Toronto,Studio District
4,M4K,43.679557,-79.352188,MenEssentials,43.677820,-79.351265,Cosmetics Shop,Central Toronto,Lawrence Park
5,M4K,43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant,Central Toronto,Davisville North
6,M4K,43.679557,-79.352188,Cafe Fiorentina,43.677743,-79.350115,Italian Restaurant,Central Toronto,"North Toronto West, Lawrence Park"
7,M4K,43.679557,-79.352188,Dolce Gelato,43.677773,-79.351187,Ice Cream Shop,Central Toronto,Davisville
8,M4K,43.679557,-79.352188,Louis Cifer Brew Works,43.677663,-79.351313,Brewery,Central Toronto,"Moore Park, Summerhill East"
9,M4K,43.679557,-79.352188,La Diperie,43.677702,-79.352265,Ice Cream Shop,Central Toronto,"Summerhill West, Rathnelly, South Hill, Forest..."
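
In the output above, rows for the same postal code carry different neighbourhoods, which suggests the two columns were matched by row position rather than by postal code. A many-to-one merge is one way to attach area labels to every venue row. The sketch uses small invented frames named `venues` and `areas`; `validate="many_to_one"` makes pandas raise if a postal code appears more than once on the area side.

```python
import pandas as pd

# Stand-in for Tor_venues: several venues per postal code.
venues = pd.DataFrame(
    {
        "Postal Code": ["M4E", "M4E", "M4K"],
        "Venue": ["Kaga Sushi", "TD Canada Trust", "Pantheon"],
    }
)

# Stand-in for Toronto_data: exactly one row per postal code.
areas = pd.DataFrame(
    {
        "Postal Code": ["M4E", "M4K"],
        "Borough": ["East Toronto", "East Toronto"],
        "Neighbourhood": ["The Beaches", "The Danforth West, Riverdale"],
    }
)

# Each venue row looks up its area by postal code.
labelled = venues.merge(
    areas, on="Postal Code", how="left", validate="many_to_one"
)
print(labelled)
```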


In [82]:
Tor_venues.groupby('Postal Code').count()

Postal Code,Neighbuorhood Latitude,Neighbuorhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Borough,Neighbourhood
M4E,4,4,4,4,4,4,4,4
M4K,42,42,42,42,42,42,35,35
M4L,35,35,35,35,35,35,0,0
M4M,100,100,100,100,100,100,0,0
M4N,100,100,100,100,100,100,0,0
M4P,65,65,65,65,65,65,0,0
M4R,4,4,4,4,4,4,0,0
M4S,5,5,5,5,5,5,0,0
M4T,14,14,14,14,14,14,0,0
M4V,13,13,13,13,13,13,0,0


In [83]:
print('There are {} unique categories.'.format(len(Tor_venues['Venue Category'].unique())))

There are 199 unique categories.


## Now it's time for One Hot Encoding on the data

In [84]:
# one hot encoding
Toronto_onehot = pd.get_dummies(Tor_venues[['Venue Category']], prefix="", prefix_sep="")

# add the postal code column back to the dataframe
Toronto_onehot['Postal Code'] = Tor_venues['Postal Code'] 

# move the postal code column to the first position
fixed_columns = [Toronto_onehot.columns[-1]] + list(Toronto_onehot.columns[:-1])
Toronto_onehot = Toronto_onehot[fixed_columns]

Toronto_onehot.head()

Unnamed: 0,Postal Code,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Art Gallery,Arts & Crafts Store,...,Toy / Game Store,Trail,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,M4E,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,M4E,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M4E,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M4E,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M4K,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [85]:
Toronto_grouped = Toronto_onehot.groupby('Postal Code').mean().reset_index()
Toronto_grouped

Unnamed: 0,Postal Code,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Art Gallery,Arts & Crafts Store,...,Toy / Game Store,Trail,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,M4E,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M4K,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,...,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381
2,M4L,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M4M,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,...,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0
4,M4N,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.01,0.0,...,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0
5,M4P,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,...,0.0,0.0,0.0,0.061538,0.0,0.046154,0.015385,0.0,0.0,0.0
6,M4R,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,M4S,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,M4T,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,M4V,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
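
Taking the mean of one-hot columns within each postal code gives the share of that area's venues in each category, so every row sums to 1. A small sketch on invented data (the `venues` frame below is a toy stand-in, not the notebook's `Tor_venues`):

```python
import pandas as pd

# Toy venue list: four different categories in M4E, one repeated in M4K.
venues = pd.DataFrame(
    {
        "Postal Code": ["M4E", "M4E", "M4E", "M4E", "M4K", "M4K"],
        "Venue Category": [
            "Bank", "Café", "Chinese Restaurant", "Japanese Restaurant",
            "Greek Restaurant", "Greek Restaurant",
        ],
    }
)

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Postal Code", venues["Postal Code"])

# Mean of 0/1 indicators = fraction of the area's venues in that category.
freq = onehot.groupby("Postal Code").mean()
print(freq)
```

Here each M4E category gets 0.25 (one venue out of four), matching the pattern in the M4E listing of this notebook.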


## Listing Top 5 Venues for each Postal Code

In [86]:
num_top_venues = 5

for hood in Toronto_grouped['Postal Code']:
    print("----"+hood+"----")
    temp = Toronto_grouped[Toronto_grouped['Postal Code'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----M4E----
                 venue  freq
0  Japanese Restaurant  0.25
1   Chinese Restaurant  0.25
2                 Bank  0.25
3                 Café  0.25
4              Airport  0.00


----M4K----
                    venue  freq
0        Greek Restaurant  0.19
1      Italian Restaurant  0.07
2             Coffee Shop  0.07
3          Ice Cream Shop  0.05
4  Furniture / Home Store  0.05


----M4L----
              venue  freq
0    Sandwich Place  0.09
1       Pizza Place  0.09
2      Dessert Shop  0.09
3               Gym  0.06
4  Sushi Restaurant  0.06


----M4M----
                 venue  freq
0          Coffee Shop  0.09
1       Clothing Store  0.09
2   Italian Restaurant  0.03
3  Japanese Restaurant  0.03
4                 Café  0.03


----M4N----
                 venue  freq
0          Coffee Shop  0.13
1                 Café  0.07
2           Restaurant  0.07
3                Hotel  0.06
4  American Restaurant  0.04


----M4P----
                           venue  freq
0        

In [87]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [88]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Postal Code']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # no 'st'/'nd'/'rd' suffix beyond the third position
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Postal Code'] = Toronto_grouped['Postal Code']

for ind in np.arange(Toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,Café,Japanese Restaurant,Chinese Restaurant,Bank,Diner,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
1,M4K,Greek Restaurant,Coffee Shop,Italian Restaurant,Furniture / Home Store,Restaurant,Ice Cream Shop,Yoga Studio,Indian Restaurant,Spa,Caribbean Restaurant
2,M4L,Dessert Shop,Pizza Place,Sandwich Place,Gym,Café,Sushi Restaurant,Coffee Shop,Italian Restaurant,Japanese Restaurant,Discount Store
3,M4M,Coffee Shop,Clothing Store,Bubble Tea Shop,Cosmetics Shop,Italian Restaurant,Japanese Restaurant,Café,Pizza Place,Bookstore,Middle Eastern Restaurant
4,M4N,Coffee Shop,Café,Restaurant,Hotel,American Restaurant,Gym,Deli / Bodega,Japanese Restaurant,Seafood Restaurant,Italian Restaurant
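
The column labels above rely on a three-element suffix list, which works for the ten positions used here but would label a 21st position "21th". If more positions were ever needed, a general ordinal helper could look like the sketch below; `ordinal` is a hypothetical function, not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


# Build the same column labels as in the cell above.
columns = ["Postal Code"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, 11)
]
print(columns)
```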


## Time for unsupervised k-means clustering

In [90]:
# set number of clusters
kclusters = 5

Toronto_grouped_clustering = Toronto_grouped.drop(columns='Postal Code')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
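
The first ten labels are all 1, so it is worth checking how evenly the rows are spread across clusters before interpreting them. A minimal sketch with the standard library follows; the `labels` list is made up for illustration and is not the notebook's actual result.

```python
from collections import Counter

# Made-up cluster labels standing in for kmeans.labels_.
labels = [1, 1, 1, 0, 1, 3, 1, 4, 1, 2]

# Count how many rows fall into each cluster.
sizes = Counter(labels)
for cluster, count in sorted(sizes.items()):
    print(f"cluster {cluster}: {count} rows")

# Share of rows in the largest cluster; values near 1 suggest one dominant group.
largest_share = max(sizes.values()) / len(labels)
print(f"largest cluster holds {largest_share:.0%} of the rows")
```

With the real model, `Counter(kmeans.labels_.tolist())` would give the same kind of summary.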

In [91]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Toronto_merged = Toronto_data

# join the cluster labels and top venues onto Toronto_data by postal code
Toronto_merged = Toronto_merged.join(neighborhoods_venues_sorted.set_index('Postal Code'), on='Postal Code')

Toronto_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.786947,-79.385975,1,Café,Japanese Restaurant,Chinese Restaurant,Bank,Diner,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,1,Greek Restaurant,Coffee Shop,Italian Restaurant,Furniture / Home Store,Restaurant,Ice Cream Shop,Yoga Studio,Indian Restaurant,Spa,Caribbean Restaurant
2,M4L,East Toronto,"India Bazaar, The Beaches West",43.704324,-79.38879,1,Dessert Shop,Pizza Place,Sandwich Place,Gym,Café,Sushi Restaurant,Coffee Shop,Italian Restaurant,Japanese Restaurant,Discount Store
3,M4M,East Toronto,Studio District,43.657162,-79.378937,1,Coffee Shop,Clothing Store,Bubble Tea Shop,Cosmetics Shop,Italian Restaurant,Japanese Restaurant,Café,Pizza Place,Bookstore,Middle Eastern Restaurant
4,M4N,Central Toronto,Lawrence Park,43.648198,-79.379817,1,Coffee Shop,Café,Restaurant,Hotel,American Restaurant,Gym,Deli / Bodega,Japanese Restaurant,Seafood Restaurant,Italian Restaurant


In [92]:
Toronto_merged

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.786947,-79.385975,1,Café,Japanese Restaurant,Chinese Restaurant,Bank,Diner,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,1,Greek Restaurant,Coffee Shop,Italian Restaurant,Furniture / Home Store,Restaurant,Ice Cream Shop,Yoga Studio,Indian Restaurant,Spa,Caribbean Restaurant
2,M4L,East Toronto,"India Bazaar, The Beaches West",43.704324,-79.38879,1,Dessert Shop,Pizza Place,Sandwich Place,Gym,Café,Sushi Restaurant,Coffee Shop,Italian Restaurant,Japanese Restaurant,Discount Store
3,M4M,East Toronto,Studio District,43.657162,-79.378937,1,Coffee Shop,Clothing Store,Bubble Tea Shop,Cosmetics Shop,Italian Restaurant,Japanese Restaurant,Café,Pizza Place,Bookstore,Middle Eastern Restaurant
4,M4N,Central Toronto,Lawrence Park,43.648198,-79.379817,1,Coffee Shop,Café,Restaurant,Hotel,American Restaurant,Gym,Deli / Bodega,Japanese Restaurant,Seafood Restaurant,Italian Restaurant
5,M4P,Central Toronto,Davisville North,43.653206,-79.400049,1,Café,Vegetarian / Vegan Restaurant,Coffee Shop,Mexican Restaurant,Vietnamese Restaurant,Bar,Park,Burger Joint,Gaming Cafe,Grocery Store
6,M4R,Central Toronto,"North Toronto West, Lawrence Park",43.693781,-79.428191,1,Hockey Arena,Field,Playground,Trail,Department Store,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant
7,M4S,Central Toronto,Davisville,43.713756,-79.490074,1,Construction & Landscaping,Park,Trail,Basketball Court,Bakery,Yoga Studio,Dessert Shop,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
8,M4T,Central Toronto,"Moore Park, Summerhill East",43.64896,-79.456325,1,Breakfast Spot,Gift Shop,Dessert Shop,Dog Run,Eastern European Restaurant,Italian Restaurant,Restaurant,Bar,Bookstore,Movie Theater
9,M4V,Central Toronto,"Summerhill West, Rathnelly, South Hill, Forest...",43.636966,-79.615819,1,Hotel,Coffee Shop,Middle Eastern Restaurant,Fried Chicken Joint,Sandwich Place,Gas Station,Mediterranean Restaurant,American Restaurant,Gym,Burrito Place


## Visualizing is believing

In [94]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Toronto_merged['Latitude'], Toronto_merged['Longitude'], Toronto_merged['Postal Code'], Toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [95]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 0, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Downtown Toronto,0,Gym,Furniture / Home Store,Yoga Studio,Department Store,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant


In [96]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 1, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,East Toronto,1,Café,Japanese Restaurant,Chinese Restaurant,Bank,Diner,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
1,East Toronto,1,Greek Restaurant,Coffee Shop,Italian Restaurant,Furniture / Home Store,Restaurant,Ice Cream Shop,Yoga Studio,Indian Restaurant,Spa,Caribbean Restaurant
2,East Toronto,1,Dessert Shop,Pizza Place,Sandwich Place,Gym,Café,Sushi Restaurant,Coffee Shop,Italian Restaurant,Japanese Restaurant,Discount Store
3,East Toronto,1,Coffee Shop,Clothing Store,Bubble Tea Shop,Cosmetics Shop,Italian Restaurant,Japanese Restaurant,Café,Pizza Place,Bookstore,Middle Eastern Restaurant
4,Central Toronto,1,Coffee Shop,Café,Restaurant,Hotel,American Restaurant,Gym,Deli / Bodega,Japanese Restaurant,Seafood Restaurant,Italian Restaurant
5,Central Toronto,1,Café,Vegetarian / Vegan Restaurant,Coffee Shop,Mexican Restaurant,Vietnamese Restaurant,Bar,Park,Burger Joint,Gaming Cafe,Grocery Store
6,Central Toronto,1,Hockey Arena,Field,Playground,Trail,Department Store,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant
7,Central Toronto,1,Construction & Landscaping,Park,Trail,Basketball Court,Bakery,Yoga Studio,Dessert Shop,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
8,Central Toronto,1,Breakfast Spot,Gift Shop,Dessert Shop,Dog Run,Eastern European Restaurant,Italian Restaurant,Restaurant,Bar,Bookstore,Movie Theater
9,Central Toronto,1,Hotel,Coffee Shop,Middle Eastern Restaurant,Fried Chicken Joint,Sandwich Place,Gas Station,Mediterranean Restaurant,American Restaurant,Gym,Burrito Place


In [97]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 2, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,Downtown Toronto,2,Martial Arts Dojo,Yoga Studio,Dessert Shop,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run


In [98]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 3, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Downtown Toronto,3,Construction & Landscaping,Baseball Field,Pool,Yoga Studio,Dessert Shop,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
29,Downtown Toronto,3,Baseball Field,Yoga Studio,Dessert Shop,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run


In [99]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 4, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
24,Central Toronto,4,Park,Women's Store,Pool,Yoga Studio,Department Store,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant
30,Downtown Toronto,4,Park,Pool,Food & Drink Shop,Yoga Studio,Deli / Bodega,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run


## That's all folks!