## Coursera Segmenting and Clustering Project
#### Step 1: Scraping the Wikipedia page to create a table of Toronto neighbourhoods

1. Install the required packages
2. Read in the HTML page and scrape the table
3. Clean the data by removing rows whose Borough is 'Not assigned'
4. Remove any remaining rows whose Neighbourhood is 'Not assigned'
5. Print out the shape
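The cleaning steps above can be sketched on a small hand-made frame that mirrors the first rows of the scraped table (the rows are copied from the output below; the real notebook works on the full `pd.read_html` result). Boolean masks do the same job as the `drop(...index)` call used in the notebook:

```python
import pandas as pd

# Hand-made stand-in for the scraped Wikipedia table (first rows of the real output).
raw = pd.DataFrame({
    'Postal Code': ['M1A', 'M2A', 'M3A', 'M4A', 'M5A'],
    'Borough': ['Not assigned', 'Not assigned', 'North York', 'North York',
                'Downtown Toronto'],
    'Neighbourhood': ['Not assigned', 'Not assigned', 'Parkwoods',
                      'Victoria Village', 'Regent Park, Harbourfront'],
})

# Step 3: keep only rows with an assigned Borough and renumber the index.
clean = raw[raw['Borough'] != 'Not assigned'].reset_index(drop=True)

# Step 4: drop any remaining rows whose Neighbourhood is 'Not assigned'.
clean = clean[clean['Neighbourhood'] != 'Not assigned'].reset_index(drop=True)

# Step 5: print the shape.
print(clean.shape)  # (3, 3)
```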

In [1]:
import pandas as pd
import numpy as np

#!conda install -c conda-forge lxml --yes
#!conda install -c conda-forge html5lib --yes
#!conda install -c conda-forge beautifulsoup4 --yes

In [2]:
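# read_html parses every <table> on the page into a list of DataFrames
# (it needs lxml, or html5lib together with beautifulsoup4)
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
df = pd.read_html(url)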

In [3]:
df[0].shape

(180, 3)

In [4]:
pddf = pd.DataFrame(df[0])
pddf.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [None]:
#pddf[pddf['Borough'] == 'Not assigned'].index

In [5]:
# drop rows whose Borough is 'Not assigned', then renumber the index
df_clean = pddf.drop(pddf[pddf['Borough'] == 'Not assigned'].index, axis=0)
df_clean.reset_index(drop=True, inplace=True)
df_clean.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [6]:
df_clean[df_clean['Neighbourhood'] == 'Not assigned']

Unnamed: 0,Postal Code,Borough,Neighbourhood


In [7]:
df_clean.shape

(103, 3)

#### Step 2: Read in the geospatial data and merge the two tables into one dataset

In [8]:
geodata = pd.read_csv('Geospatial_Coordinates.csv')
geodata.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [9]:
geodata.shape

(103, 3)

In [10]:
df_all = pd.merge(df_clean,geodata,how='inner',on='Postal Code')
df_all.shape

(103, 5)
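Because the merge uses `how='inner'`, any postal code missing from either table would be dropped silently; here both inputs and the result have 103 rows, so nothing was lost. A minimal sketch of making that check explicit uses pandas' `indicator` and `validate` options on two deliberately mismatched toy tables (the coordinates are copied from the outputs in this section):

```python
import pandas as pd

left = pd.DataFrame({'Postal Code': ['M3A', 'M4A', 'M5A'],
                     'Borough': ['North York', 'North York', 'Downtown Toronto']})
right = pd.DataFrame({'Postal Code': ['M3A', 'M4A', 'M1B'],
                      'Latitude': [43.753259, 43.725882, 43.806686],
                      'Longitude': [-79.329656, -79.315572, -79.194353]})

# indicator=True adds a '_merge' column saying where each key was found;
# validate='one_to_one' raises MergeError if a postal code repeats on either side.
check = pd.merge(left, right, how='outer', on='Postal Code',
                 indicator=True, validate='one_to_one')
unmatched = check[check['_merge'] != 'both']
print(unmatched[['Postal Code', '_merge']])
```

On the real data, an empty `unmatched` frame would confirm that the inner merge kept every postal code.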

In [11]:
df_all.head(12)

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


#### Step 3: Explore Toronto neighborhoods
1. Get Toronto's latitude and longitude to centre the map
2. Use a Folium map to visualize the neighborhoods

In [18]:
#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim

city = 'Toronto, Canada'

geolocator = Nominatim(user_agent="city_explorer")
location = geolocator.geocode(city)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.
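Nominatim is an online service, and `geocode` returns `None` when it finds no match, in which case `location.latitude` above raises an error. A minimal fallback, assuming the merged `df_all` is available, is to centre the map on the mean of the neighbourhood coordinates. The sketch below uses three rows copied from `df_all` as a stand-in, and new variable names so it does not overwrite `latitude`/`longitude`:

```python
import pandas as pd

# Three rows copied from df_all, standing in for the full table.
toy_coords = pd.DataFrame({
    'Latitude': [43.753259, 43.725882, 43.654260],
    'Longitude': [-79.329656, -79.315572, -79.360636],
})

toy_location = None  # what geolocator.geocode(city) returns when nothing matches

if toy_location is not None:
    centre_lat, centre_lng = toy_location.latitude, toy_location.longitude
else:
    # Fallback: centre the map on the mean of the neighbourhood coordinates.
    centre_lat = toy_coords['Latitude'].mean()
    centre_lng = toy_coords['Longitude'].mean()

print(centre_lat, centre_lng)
```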


In [21]:
import folium

# create map
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, label in zip(df_all['Latitude'], df_all['Longitude'], df_all['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto


#### Step 4: Use the Foursquare API to explore the neighborhoods and segment them
1. Define the Foursquare credentials
2. Reuse the `getNearbyVenues` function from the course lab to get the venues around every neighborhood in Toronto
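Each call in the lab function is a GET request to the `venues/explore` endpoint, with the query parameters spelled out in its URL template. A minimal sketch of building the same URL with the standard library's `urllib.parse.urlencode` instead of string formatting, which also percent-encodes the values. The helper name `build_explore_url` is hypothetical, the credentials are placeholders, and no request is sent:

```python
from urllib.parse import urlencode

BASE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Return the explore URL for one neighbourhood centre (no request is made)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return BASE_URL + '?' + urlencode(params)

# Placeholder credentials; Parkwoods coordinates from df_all.
explore_url = build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20210120',
                                43.753259, -79.329656)
print(explore_url)
```

With `requests`, the same dictionary can be passed directly as `requests.get(BASE_URL, params=params)`, which performs this encoding for you.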

In [32]:
import os

# Read the credentials from environment variables rather than hard-coding and printing them.
# The variable names below are hypothetical; set them to match your own environment.
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID')  # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET')  # your Foursquare secret
VERSION = '20210120'  # Foursquare API version
LIMIT = 100  # maximum number of venues to request per call


In [33]:
import requests  # library to handle HTTP requests

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
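The function assumes every venue has at least one category; `v['venue']['categories'][0]` raises `IndexError` otherwise. A minimal sketch of the same flattening step with a guard for that case, run on a hand-made list that mimics only the fields the function reads. Venue names and coordinates are copied from the Parkwoods output; the nested layout is taken from the function's own indexing; the second venue's category list is emptied on purpose to exercise the guard; `venues_to_rows` is a hypothetical helper name:

```python
import pandas as pd

COLUMNS = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude',
           'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']

def venues_to_rows(name, lat, lng, items):
    """Flatten the 'items' list of one explore response into table rows."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Use None instead of failing when a venue has no category.
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# Hand-made stand-in for response['groups'][0]['items'].
items = [
    {'venue': {'name': 'Brookbanks Park',
               'location': {'lat': 43.751976, 'lng': -79.33214},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Variety Store',
               'location': {'lat': 43.751974, 'lng': -79.333114},
               'categories': []}},
]

nearby = pd.DataFrame(venues_to_rows('Parkwoods', 43.753259, -79.329656, items),
                      columns=COLUMNS)
print(nearby)
```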

In [34]:
toronto_venues = getNearbyVenues(names=df_all['Neighbourhood'],
                                   latitudes=df_all['Latitude'],
                                   longitudes=df_all['Longitude'])

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

#### Explore the venue data: number of venues per neighborhood and number of unique categories

In [35]:
print(toronto_venues.shape)
toronto_venues.head()

(2100, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant
4,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop


In [37]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,4,4,4,4,4,4
"Alderwood, Long Branch",6,6,6,6,6,6
"Bathurst Manor, Wilson Heights, Downsview North",20,20,20,20,20,20
Bayview Village,4,4,4,4,4,4
"Bedford Park, Lawrence Manor East",23,23,23,23,23,23
...,...,...,...,...,...,...
"Willowdale, Willowdale West",5,5,5,5,5,5
Woburn,3,3,3,3,3,3
Woodbine Heights,7,7,7,7,7,7
York Mills West,2,2,2,2,2,2
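The grouped data covers 96 neighbourhood names, while `df_all` has 103 postal codes. Two things can explain the gap: some names may have returned no venues within the 500 m radius, and some names belong to more than one postal code (Don Mills appears twice in the request log above), which `groupby` collapses into one row. A minimal sketch of finding both cases on toy frames standing in for `df_all` and `toronto_venues`; 'Hypothetical Quiet Area' is a made-up name:

```python
import pandas as pd

# Toy stand-ins: one row per postal code, and one row per venue.
toy_all = pd.DataFrame({
    'Neighbourhood': ['Parkwoods', 'Don Mills', 'Don Mills', 'Hypothetical Quiet Area'],
})
toy_venues = pd.DataFrame({
    'Neighborhood': ['Parkwoods', 'Parkwoods', 'Don Mills'],
})

# Names that received no venues at all.
no_venues = set(toy_all['Neighbourhood']) - set(toy_venues['Neighborhood'])

# Names shared by more than one postal code (collapsed into one row by groupby).
counts = toy_all['Neighbourhood'].value_counts()
shared = counts[counts > 1].index.tolist()

print('No venues:', sorted(no_venues))
print('Shared names:', shared)
```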


In [47]:
# number of unique venue categories
len(toronto_venues['Venue Category'].unique())

265

#### Transform the data into a one-hot encoded dataset for clustering

In [52]:
toronto_df = pd.get_dummies(toronto_venues[['Venue Category']],prefix='',prefix_sep='')

toronto_df['Neighborhood Name'] = toronto_venues['Neighborhood'] 
#move neighborhood to first column
fixed_cols = [toronto_df.columns[-1]]+list(toronto_df.columns[:-1])
toronto_df = toronto_df[fixed_cols]

toronto_df.head()

Unnamed: 0,Neighborhood Name,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Truck Stop,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [53]:
toronto_df.shape

(2100, 266)

#### Group by neighborhood to get the frequency of each venue category

In [58]:
toronto_grouped = toronto_df.groupby('Neighborhood Name').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood Name,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Truck Stop,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
92,"Willowdale, Willowdale West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
93,Woburn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
94,Woodbine Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
95,York Mills West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
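One-hot encoding followed by a per-neighbourhood mean turns venue counts into relative frequencies, so each neighbourhood's row sums to 1. A minimal sketch on a few venue rows copied from `toronto_venues`, using `DataFrame.insert` to put the name column first instead of rebuilding the column list:

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['Parkwoods', 'Parkwoods', 'Victoria Village',
                     'Victoria Village', 'Victoria Village'],
    'Venue Category': ['Park', 'Food & Drink Shop', 'Hockey Arena',
                       'Portuguese Restaurant', 'Coffee Shop'],
})

# One 0/1 column per category.
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood Name', venues['Neighborhood'])

# The mean of the 0/1 columns is the share of a neighbourhood's venues in each category.
grouped = onehot.groupby('Neighborhood Name').mean().reset_index()
print(grouped)
```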


In [61]:
#test = toronto_grouped[toronto_grouped['Neighborhood Name'] == 'Agincourt'].T.reset_index()
#test

Unnamed: 0,index,0
0,Neighborhood Name,Agincourt
1,Accessories Store,0
2,Airport,0
3,Airport Food Court,0
4,Airport Gate,0
...,...,...
261,Warehouse Store,0
262,Wine Bar,0
263,Wings Joint,0
264,Women's Store,0


In [62]:
topN = 10

for h in toronto_grouped['Neighborhood Name']:
    print('---'+h+'----')
    tmp = toronto_grouped[toronto_grouped['Neighborhood Name'] == h].T.reset_index()
    tmp.columns = ['venue','freq']
    tmp = tmp.iloc[1:] #drop the first row with neighborhood name
    tmp['freq'] = tmp['freq'].astype(float)
    tmp = tmp.round({'freq':2})
    print(tmp.sort_values('freq',ascending=False).reset_index(drop=True).head(topN))
    print('\n')
    

---Agincourt----
                             venue  freq
0                           Lounge  0.25
1               Chinese Restaurant  0.25
2                   Breakfast Spot  0.25
3        Latin American Restaurant  0.25
4              Monument / Landmark  0.00
5  Molecular Gastronomy Restaurant  0.00
6       Modern European Restaurant  0.00
7               Miscellaneous Shop  0.00
8        Middle Eastern Restaurant  0.00
9                    Metro Station  0.00


---Alderwood, Long Branch----
                      venue  freq
0               Pizza Place  0.33
1                  Pharmacy  0.17
2               Coffee Shop  0.17
3                       Pub  0.17
4                       Gym  0.17
5         Health Food Store  0.00
6               Music Venue  0.00
7  Mediterranean Restaurant  0.00
8               Men's Store  0.00
9             Metro Station  0.00


---Bathurst Manor, Wilson Heights, Downsview North----
                       venue  freq
0                       Bank  0.10

#### Put the top N venues into a pandas DataFrame
##### This table lists the top N venue categories of each neighborhood and helps interpret the clusters created later

In [63]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [66]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood Name']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # 4th and later ranks use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood Name'] = toronto_grouped['Neighborhood Name']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Latin American Restaurant,Lounge,Breakfast Spot,Chinese Restaurant,Yoga Studio,Dog Run,Dim Sum Restaurant,Diner,Discount Store,Distribution Center
1,"Alderwood, Long Branch",Pizza Place,Gym,Pub,Pharmacy,Coffee Shop,Airport Lounge,College Cafeteria,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
2,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Park,Fried Chicken Joint,Bridal Shop,Diner,Sandwich Place,Deli / Bodega,Restaurant,Ice Cream Shop
3,Bayview Village,Japanese Restaurant,Café,Bank,Chinese Restaurant,Department Store,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Yoga Studio
4,"Bedford Park, Lawrence Manor East",Coffee Shop,Italian Restaurant,Sandwich Place,Pharmacy,Pet Store,Café,Pub,Sushi Restaurant,Restaurant,Thai Restaurant
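Neighbourhoods with few venues get padded with zero-frequency categories: Agincourt has four venues at 0.25 each, so its 5th to 10th columns (Yoga Studio, Dog Run and so on) are ties at 0 in arbitrary order rather than real venues. A minimal variant of `return_most_common_venues` that keeps only categories with a positive frequency and leaves the remaining slots empty; padding with empty strings is an assumption about presentation, not the lab's behaviour, and the frequencies in the example row are made up:

```python
import pandas as pd

def return_top_nonzero_venues(row, num_top_venues):
    """Like return_most_common_venues, but skips zero-frequency categories."""
    freqs = row.iloc[1:].astype(float)          # drop the name column
    positive = freqs[freqs > 0].sort_values(ascending=False)
    top = positive.index.tolist()[:num_top_venues]
    # Pad with empty strings so every row has num_top_venues entries.
    return top + [''] * (num_top_venues - len(top))

# A grouped-style row: name first, then (made-up) category frequencies.
row = pd.Series({'Neighborhood Name': 'Example', 'Dog Run': 0.0,
                 'Food & Drink Shop': 0.4, 'Park': 0.6, 'Yoga Studio': 0.0})
print(return_top_nonzero_venues(row, 4))
```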


#### Create clusters and explore Toronto neighborhoods

In [67]:
from sklearn.cluster import KMeans

In [69]:
k = 5

toronto_cluster = toronto_grouped.drop('Neighborhood Name',axis=1)
# n_init=10 pins the number of restarts (it was the default before scikit-learn 1.4)
kmeans = KMeans(init='k-means++', n_clusters=k, n_init=10, random_state=0)
kmeans_model = kmeans.fit(toronto_cluster)

kmeans_model.labels_[0:10]

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
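The notebook fixes k = 5 without comparing alternatives. One common way to compare candidate values is the silhouette score (higher is better), available as `sklearn.metrics.silhouette_score`. A minimal sketch on toy data with two tight, well-separated groups, where two clusters should win; on the real data the same loop would run over `toronto_cluster`:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Toy data: two tight, well-separated groups of points.
X = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
              [5.0, 5.0], [5.0, 5.1], [5.1, 5.0]])

scores = {}
for n in range(2, 5):  # silhouette needs at least 2 clusters and fewer than len(X)
    model = KMeans(n_clusters=n, init='k-means++', n_init=10, random_state=0).fit(X)
    scores[n] = silhouette_score(X, model.labels_)

best_n = max(scores, key=scores.get)
print(scores)
print('best number of clusters:', best_n)
```

On sparse venue-frequency data the scores are likely to be much lower and flatter than on this toy example, so the result is a guide rather than a definitive answer.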

In [72]:
df_all.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


In [87]:
# add cluster label
neighborhoods_venues_sorted.insert(0,'Cluster Labels',kmeans_model.labels_)
toronto_merged = df_all
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood Name'),on='Neighbourhood',how='inner')

In [90]:
print(toronto_merged.head())
print(toronto_merged['Cluster Labels'].value_counts())
# convert cluster labels to int to be used in the map
toronto_merged['Cluster Labels'] = toronto_merged['Cluster Labels'].astype(int)
toronto_merged.dtypes


  Postal Code           Borough                                Neighbourhood  \
0         M3A        North York                                    Parkwoods   
1         M4A        North York                             Victoria Village   
2         M5A  Downtown Toronto                    Regent Park, Harbourfront   
3         M6A        North York             Lawrence Manor, Lawrence Heights   
4         M7A  Downtown Toronto  Queen's Park, Ontario Provincial Government   

    Latitude  Longitude  Cluster Labels 1st Most Common Venue  \
0  43.753259 -79.329656               0                  Park   
1  43.725882 -79.315572               1           Coffee Shop   
2  43.654260 -79.360636               1           Coffee Shop   
3  43.718518 -79.464763               1        Clothing Store   
4  43.662301 -79.389494               1           Coffee Shop   

  2nd Most Common Venue   3rd Most Common Venue  4th Most Common Venue  \
0     Food & Drink Shop             Yoga Studio    Dis

Postal Code                object
Borough                    object
Neighbourhood              object
Latitude                  float64
Longitude                 float64
Cluster Labels              int64
1st Most Common Venue      object
2nd Most Common Venue      object
3rd Most Common Venue      object
4th Most Common Venue      object
5th Most Common Venue      object
6th Most Common Venue      object
7th Most Common Venue      object
8th Most Common Venue      object
9th Most Common Venue      object
10th Most Common Venue     object
dtype: object

#### Visualize the clustering result

In [92]:
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme: one rainbow colour per cluster label 0..k-1
colors_array = cm.rainbow(np.linspace(0, 1, k))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'],
                                  toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
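The palette only needs to give each cluster label a distinct colour. A minimal alternative, assuming a fixed hand-picked list of hex colours in place of matplotlib's rainbow colormap, indexes the palette directly by the 0-based label so that label 0 gets the first colour. The palette values and the helper name `cluster_colour` are hypothetical:

```python
# Hypothetical hand-picked palette; any list with at least k distinct colours works.
PALETTE = ['#e6194b', '#3cb44b', '#4363d8', '#f58231', '#911eb4',
           '#42d4f4', '#f032e6', '#bfef45']

def cluster_colour(label, palette=PALETTE):
    """Map a 0-based cluster label to a colour, wrapping if labels exceed the palette."""
    return palette[int(label) % len(palette)]

n_labels = 5
colours = [cluster_colour(label) for label in range(n_labels)]
print(colours)
```

In the marker loop, `color=cluster_colour(cluster)` would then take the place of the `rainbow` lookup.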


#### Examine each cluster to learn its characteristics
##### Cluster 1 (label 0)
Every neighborhood in this cluster has Park as its most common venue. Most of them have only a few venues within 500 m, so the lower-ranked columns (Yoga Studio, Dog Run and so on) appear to be zero-frequency ties rather than real venues. These neighborhoods may suit people who enjoy parks and outdoor activities.

In [99]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Parkwoods,0,Park,Food & Drink Shop,Yoga Studio,Distribution Center,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run
21,Caledonia-Fairbanks,0,Park,Women's Store,Pool,Distribution Center,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
52,"Willowdale, Newtonbrook",0,Park,Yoga Studio,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Doner Restaurant
64,Weston,0,Park,Yoga Studio,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Doner Restaurant
66,York Mills West,0,Park,Convenience Store,Yoga Studio,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Doner Restaurant
77,"Kingsview Village, St. Phillips, Martin Grove ...",0,Park,Sandwich Place,Yoga Studio,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Distribution Center
91,Rosedale,0,Park,Trail,Playground,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store


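##### Cluster 2 (label 1)
This is by far the largest cluster. Coffee shops lead the top venues, and Asian cuisine (Japanese and sushi restaurants) appears often, so these neighborhoods could suit people who like coffee and Asian food. The cluster also seems to absorb neighborhoods that fit no other group, such as The Kingsway (River, Pool) and the Business reply mail Processing Centre (Light Rail Station).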

In [100]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Victoria Village,1,Coffee Shop,Pizza Place,Intersection,Portuguese Restaurant,Hockey Arena,Yoga Studio,Discount Store,Department Store,Dessert Shop,Dim Sum Restaurant
2,"Regent Park, Harbourfront",1,Coffee Shop,Bakery,Pub,Park,Breakfast Spot,Café,Theater,Beer Store,Brewery,Shoe Store
3,"Lawrence Manor, Lawrence Heights",1,Clothing Store,Accessories Store,Furniture / Home Store,Coffee Shop,Boutique,Event Space,Vietnamese Restaurant,Gift Shop,Cupcake Shop,Drugstore
4,"Queen's Park, Ontario Provincial Government",1,Coffee Shop,Sushi Restaurant,Yoga Studio,Distribution Center,Portuguese Restaurant,Park,Nightclub,Mexican Restaurant,Japanese Restaurant,Italian Restaurant
7,Don Mills,1,Gym,Coffee Shop,Japanese Restaurant,Beer Store,Restaurant,Café,Smoke Shop,Sandwich Place,Bike Shop,Italian Restaurant
...,...,...,...,...,...,...,...,...,...,...,...,...
97,"First Canadian Place, Underground city",1,Coffee Shop,Café,Hotel,Gym,Japanese Restaurant,Restaurant,Asian Restaurant,Salad Place,Seafood Restaurant,Deli / Bodega
98,"The Kingsway, Montgomery Road, Old Mill North",1,River,Pool,Yoga Studio,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Distribution Center
99,Church and Wellesley,1,Coffee Shop,Sushi Restaurant,Japanese Restaurant,Fast Food Restaurant,Restaurant,Gay Bar,Café,Pub,Hotel,Dance Studio
100,"Business reply mail Processing Centre, South C...",1,Light Rail Station,Yoga Studio,Garden,Skate Park,Brewery,Burrito Place,Farmers Market,Fast Food Restaurant,Restaurant,Auto Workshop


##### Cluster 3 (label 2)
This cluster contains a single neighborhood, Malvern, Rouge, whose most common venue is a fast food restaurant.

In [101]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,"Malvern, Rouge",2,Fast Food Restaurant,Yoga Studio,Dance Studio,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center


##### Cluster 4 (label 3)
The top venues in this cluster are playgrounds and a gym, which are associated with outdoor activity and exercise. These neighborhoods could be good for people who enjoy exercising.

In [102]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,Scarborough Village,3,Playground,Yoga Studio,Distribution Center,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run
83,"Moore Park, Summerhill East",3,Gym,Playground,Distribution Center,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run


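##### Cluster 5 (label 4)
Both neighborhoods in this cluster have Construction & Landscaping as their most common venue, followed by a baseball field, which suggests large open spaces or areas under construction.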

In [103]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
57,"Humberlea, Emery",4,Construction & Landscaping,Baseball Field,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Yoga Studio,Deli / Bodega
101,"Old Mill South, King's Mill Park, Sunnylea, Hu...",4,Construction & Landscaping,Baseball Field,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Yoga Studio,Deli / Bodega
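To compare the clusters side by side rather than one table at a time, the top venue can be tallied per cluster with `groupby` and named aggregation. A minimal sketch on a handful of rows copied from the tables above; on the real data the same call would run on `toronto_merged`:

```python
import pandas as pd

rows = pd.DataFrame({
    'Neighbourhood': ['Parkwoods', 'Rosedale', 'Victoria Village', 'Don Mills',
                      'Church and Wellesley', 'Malvern, Rouge', 'Scarborough Village',
                      'Humberlea, Emery'],
    'Cluster Labels': [0, 0, 1, 1, 1, 2, 3, 4],
    '1st Most Common Venue': ['Park', 'Park', 'Coffee Shop', 'Gym', 'Coffee Shop',
                              'Fast Food Restaurant', 'Playground',
                              'Construction & Landscaping'],
})

# Size of each cluster and its most frequent top venue.
summary = rows.groupby('Cluster Labels').agg(
    size=('Neighbourhood', 'size'),
    top_venue=('1st Most Common Venue', lambda s: s.value_counts().idxmax()),
)
print(summary)
```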
