<h1 align=center><font size = 5>Applied Data Science Capstone</font></h1>
<h2 align=center><font size = 4>Week 3 Assignment, Part 3 <br>
    Segmenting and Clustering Neighborhoods in Toronto, Canada</font></h2>

## Overview:
This notebook scrapes the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M and loads the table into a pandas dataframe.
Rows whose Borough is "Not assigned" are dropped. Rows are then grouped by postal code, keeping every neighborhood that shares a code. Where a Neighborhood is "Not assigned", it is replaced with the Borough name.
Latitude and longitude are loaded from a CSV file into a second dataframe, the two dataframes are merged on postal code, and unnecessary columns are dropped. Finally, venues near each neighborhood in the Central Toronto and Downtown Toronto boroughs are retrieved from Foursquare, clustered with k-means, and shown on a map.


### Import Pandas Library

In [1]:
import pandas as pd

### Load dataframe from Wikipedia page and set column names

In [2]:
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
df.columns = 'PostalCode','Borough','Neighborhood'
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


### Delete rows where column Borough has value of 'Not assigned' 

In [3]:
indexBorough = df[(df['Borough'] == 'Not assigned')].index
df.drop(indexBorough , axis=0, inplace=True)
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
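
The same filter can also be written as a boolean mask, which avoids `inplace=True` and the intermediate index. A minimal sketch on a made-up three-row frame (the `toy` frame and its rows are illustrative stand-ins, not the scraped table):

```python
import pandas as pd

# Toy frame standing in for the scraped table (illustrative values only)
toy = pd.DataFrame({
    'PostalCode': ['M1A', 'M3A', 'M5A'],
    'Borough': ['Not assigned', 'North York', 'Downtown Toronto'],
    'Neighborhood': ['Not assigned', 'Parkwoods', 'Harbourfront'],
})

# Keep only rows whose Borough is assigned; original index labels are preserved
toy_assigned = toy[toy['Borough'] != 'Not assigned']
print(toy_assigned)
```

As with `drop`, the surviving rows keep their original index labels, so the first remaining row is still labelled 1 here.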


### Merge rows that share PostalCode and Borough, joining their Neighborhood values with commas

In [4]:
df = df.groupby(['PostalCode','Borough']).agg({'Neighborhood': ', '.join}).reset_index()
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
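
To make the effect of the `groupby` and `', '.join` step concrete, here is a small sketch on three hand-written rows. The `toy` frame is an assumption for illustration; only the M5A and M6A values are taken from the output shown earlier.

```python
import pandas as pd

# Two rows share postal code M5A, one row has its own code
toy = pd.DataFrame({
    'PostalCode': ['M5A', 'M5A', 'M6A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighborhood': ['Harbourfront', 'Regent Park', 'Lawrence Heights'],
})

# One row per (PostalCode, Borough); neighborhoods are concatenated in row order
merged = (
    toy.groupby(['PostalCode', 'Borough'], as_index=False)
       .agg({'Neighborhood': ', '.join})
)
print(merged)
```

Passing `as_index=False` gives the same flat layout as calling `reset_index()` afterwards.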


### If Neighborhood is 'Not assigned' set it to value of Borough and print postal code M7A for Queen's Park to validate

In [5]:
df.loc[df.Neighborhood == 'Not assigned','Neighborhood'] = df.Borough
print(df.loc[df['PostalCode'] == 'M7A'])

   PostalCode       Borough  Neighborhood
85        M7A  Queen's Park  Queen's Park
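
The assignment above relies on pandas aligning `df.Borough` to the selected rows by index. An equivalent and arguably more explicit form uses `Series.mask`. This is a sketch on a two-row toy frame (the frame itself is illustrative):

```python
import pandas as pd

toy = pd.DataFrame({
    'PostalCode': ['M7A', 'M1B'],
    'Borough': ["Queen's Park", 'Scarborough'],
    'Neighborhood': ['Not assigned', 'Rouge, Malvern'],
})

# Where the condition is True, take the value from Borough; otherwise keep Neighborhood
toy['Neighborhood'] = toy['Neighborhood'].mask(
    toy['Neighborhood'] == 'Not assigned', toy['Borough']
)
print(toy)
```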


### Get number of rows and columns

In [6]:
df.shape

(103, 3)

### Load geo data from csv file into dataframe and label columns

In [7]:
df_geo = pd.read_csv('http://cocl.us/Geospatial_data')
df_geo.columns = 'GeoPC','Latitude','Longitude'
df_geo.head()

Unnamed: 0,GeoPC,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


### Merge dataframes to retrieve latitude and longitude for postal codes 

In [8]:
df_Toronto = pd.merge(df, df_geo, how='left', on=[df.PostalCode, df_geo.GeoPC])
df_Toronto.head()

Unnamed: 0,key_0,key_1,PostalCode,Borough,Neighborhood,GeoPC,Latitude,Longitude
0,M1B,M1B,M1B,Scarborough,"Rouge, Malvern",M1B,43.806686,-79.194353
1,M1C,M1C,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",M1C,43.784535,-79.160497
2,M1E,M1E,M1E,Scarborough,"Guildwood, Morningside, West Hill",M1E,43.763573,-79.188711
3,M1G,M1G,M1G,Scarborough,Woburn,M1G,43.770992,-79.216917
4,M1H,M1H,M1H,Scarborough,Cedarbrae,M1H,43.773136,-79.239476
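
Passing two Series to `on` is what produces the extra `key_0` and `key_1` columns, and it appears to work here only because both frames list the postal codes in the same order. A sketch of a name-based join, assuming the same column names as above, looks like this (the two small frames are toy data; the coordinates are copied from the output above):

```python
import pandas as pd

left = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C'],
    'Borough': ['Scarborough', 'Scarborough'],
})
# Deliberately in a different row order than `left`
geo = pd.DataFrame({
    'GeoPC': ['M1C', 'M1B'],
    'Latitude': [43.784535, 43.806686],
    'Longitude': [-79.160497, -79.194353],
})

# Join on the differently named key columns; validate raises if a code repeats
joined = (
    left.merge(geo, how='left', left_on='PostalCode', right_on='GeoPC',
               validate='one_to_one')
        .drop(columns='GeoPC')
)
print(joined)
```

With `left_on` and `right_on` the result does not depend on row order, and only the redundant `GeoPC` column needs dropping.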


### Drop unnecessary columns

In [9]:
df_Toronto.drop(['key_0','key_1','GeoPC'], axis=1, inplace=True)
df_Toronto.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


### Import Necessary Tools 

In [10]:
import folium
from geopy.geocoders import Nominatim
import requests
from pandas import json_normalize  # public import path since pandas 1.0
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

### Create map of Toronto, overlay neighborhoods and set labels

In [11]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

In [12]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_Toronto['Latitude'],
                                           df_Toronto['Longitude'],
                                           df_Toronto['Borough'],
                                           df_Toronto['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)  
    
map_toronto

### Create subset of data to include only boroughs containing string "Central Toronto" or "Downtown Toronto"

In [13]:
df_subset = df_Toronto[df_Toronto['Borough'].str.contains('Central Toronto|Downtown Toronto')].reset_index(drop=True)
df_subset

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
1,M4P,Central Toronto,Davisville North,43.712751,-79.390197
2,M4R,Central Toronto,North Toronto West,43.715383,-79.405678
3,M4S,Central Toronto,Davisville,43.704324,-79.38879
4,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316
5,M4V,Central Toronto,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",43.686412,-79.400049
6,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529
7,M4X,Downtown Toronto,"Cabbagetown, St. James Town",43.667967,-79.367675
8,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316
9,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636


In [14]:
df_subset.shape

(27, 5)

### Set Foursquare API credentials and version

In [15]:
import os  # credentials are read from the environment; the variable names below are assumptions
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20180605'

### Define function to get nearby venues for neighborhoods in subset data, using a radius of 805 m (~0.5 miles) and limiting results to 100

In [16]:
def getNearbyVenues(names, latitudes, longitudes,radius=805,LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius,
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
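
The function above indexes straight into the JSON response, so an HTTP error, an empty `groups` list, or a venue without categories would raise. One possible hardening is sketched below. The helper names `parse_explore_items` and `fetch_explore` are hypothetical, and the response shape is assumed to be the one the function above already relies on.

```python
import requests

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'  # same endpoint as above


def parse_explore_items(payload):
    """Turn a decoded explore response into (name, lat, lng, category) tuples."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # A venue without categories gets None instead of raising IndexError
        category = categories[0]['name'] if categories else None
        rows.append((venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


def fetch_explore(lat, lng, client_id, client_secret, version, radius=805, limit=100):
    """Request nearby venues; query parameters mirror the URL built above."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    response = requests.get(EXPLORE_URL, params=params, timeout=10)
    response.raise_for_status()  # surface HTTP errors instead of a later KeyError
    return parse_explore_items(response.json())
```

Keeping the parsing separate from the request means it can be checked against a hand-written dictionary without touching the network.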

### Get the nearby venues for neighborhoods in subset and look at data

In [17]:
toronto_venues = getNearbyVenues(names=df_subset['Neighborhood'],
                                   latitudes=df_subset['Latitude'],
                                   longitudes=df_subset['Longitude']
                                  )

Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park, Summerhill East
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront, Regent Park
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Roselawn
Forest Hill North, Forest Hill West
The Annex, North Midtown, Yorkville
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Christie


In [18]:
print(toronto_venues.shape)
toronto_venues.head()

(1931, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Lawrence Park,43.72802,-79.38879,Lawrence Park Ravine,43.726963,-79.394382,Park
1,Lawrence Park,43.72802,-79.38879,Granite Club,43.733043,-79.381986,Gym / Fitness Center
2,Lawrence Park,43.72802,-79.38879,Tim Hortons,43.727324,-79.379563,Coffee Shop
3,Lawrence Park,43.72802,-79.38879,TTC Bus #162 - Lawrence-Donway,43.728026,-79.382805,Bus Line
4,Lawrence Park,43.72802,-79.38879,Granite Club Dining Room,43.732616,-79.381728,Restaurant


In [19]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide, King, Richmond",100,100,100,100,100,100
Berczy Park,100,100,100,100,100,100
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",26,26,26,26,26,26
"Cabbagetown, St. James Town",69,69,69,69,69,69
Central Bay Street,100,100,100,100,100,100
"Chinatown, Grange Park, Kensington Market",100,100,100,100,100,100
Christie,66,66,66,66,66,66
Church and Wellesley,100,100,100,100,100,100
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
Davisville,90,90,90,90,90,90


In [20]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 237 unique categories.


### Analyze venues

In [21]:
# one hot encoding
subset_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

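# add neighborhood column back to dataframe
# note: the outputs below indicate that one venue category is itself named
# 'Neighborhood', so this assignment overwrites that dummy column in place
subset_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move the last column to the first position; because of the overwrite above,
# the column that moves is 'Yoga Studio', and the shape stays at 237 columns
fixed_columns = [subset_onehot.columns[-1]] + list(subset_onehot.columns[:-1])
subset_onehot = subset_onehot[fixed_columns]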

subset_onehot.head()

Unnamed: 0,Yoga Studio,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Trail,Train Station,Tunnel,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [22]:
subset_onehot.shape

(1931, 237)
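
The shape above has 237 columns for 237 categories, which suggests that a venue category named `Neighborhood` collided with the label column. A sketch of one way to avoid the collision is shown here on toy data. The renamed column `Neighborhood (category)` is a made-up name, and this is not what the notebook cells do.

```python
import pandas as pd

# Toy venues; one category is literally called 'Neighborhood'
toy_venues = pd.DataFrame({
    'Neighborhood': ['Rosedale', 'Rosedale', 'Christie'],
    'Venue Category': ['Park', 'Neighborhood', 'Café'],
})

onehot = pd.get_dummies(toy_venues[['Venue Category']], prefix='', prefix_sep='')

# Rename the colliding dummy column before adding the label column
onehot = onehot.rename(columns={'Neighborhood': 'Neighborhood (category)'})

# insert() places the label first, so no column reordering is needed
onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])
print(onehot.columns.tolist())
```

With this variant the category is kept as a feature and the frame has one more column than there are categories.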

In [23]:
subset_grouped = subset_onehot.groupby('Neighborhood').mean().reset_index()
subset_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Trail,Train Station,Tunnel,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.0,0.0,0.0,0.038462,0.038462,0.038462,0.076923,0.076923,0.115385,...,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Cabbagetown, St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.0
5,"Chinatown, Grange Park, Kensington Market",0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.04,0.01,0.0
6,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.015152,0.0,0.0,0.0,0.015152,0.0
7,Church and Wellesley,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.01
8,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0
9,Davisville,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.011111,0.0,0.0,0.0,0.0,0.011111


In [24]:
subset_grouped.shape

(27, 237)
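
Taking the mean of one-hot columns per neighborhood yields the share of that neighborhood's venues in each category, which is why every row of `subset_grouped` sums to 1. A small sketch with made-up venues:

```python
import pandas as pd

toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B'],
    'Venue Category': ['Café', 'Café', 'Park', 'Hotel', 'Park'],
})

# One indicator column per category, stored as floats
onehot = pd.get_dummies(toy['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', toy['Neighborhood'])

# Mean of 0/1 indicators = fraction of venues in that category
freq = onehot.groupby('Neighborhood').mean()
print(freq)
```

Neighborhood A has two cafés among four venues, so its `Café` frequency is 0.5.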

### Look at top 5 venues for each neighborhood

In [25]:
num_top_venues = 5

for hood in subset_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = subset_grouped[subset_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
                 venue  freq
0                 Café  0.07
1          Coffee Shop  0.06
2                Hotel  0.04
3  American Restaurant  0.04
4     Sushi Restaurant  0.03


----Berczy Park----
                 venue  freq
0                 Café  0.06
1           Restaurant  0.06
2                Hotel  0.06
3          Coffee Shop  0.06
4  Japanese Restaurant  0.03


----CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara----
              venue  freq
0   Harbor / Marina  0.15
1     Boat or Ferry  0.12
2  Airport Terminal  0.12
3    Airport Lounge  0.08
4   Airport Service  0.08


----Cabbagetown, St. James Town----
         venue  freq
0   Restaurant  0.07
1  Coffee Shop  0.07
2         Café  0.06
3  Pizza Place  0.04
4         Park  0.04


----Central Bay Street----
                venue  freq
0         Coffee Shop  0.09
1                Café  0.05
2     Bubble Tea Shop  0.03
3  Italian Restaura

### Look at the 10 most common venues for each neighborhood in Downtown and Central Toronto

In [26]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [27]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions past the third fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = subset_grouped['Neighborhood']

for ind in np.arange(subset_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(subset_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Café,Coffee Shop,American Restaurant,Hotel,Asian Restaurant,Steakhouse,Sushi Restaurant,Bar,Thai Restaurant,Theater
1,Berczy Park,Restaurant,Café,Coffee Shop,Hotel,Cocktail Bar,Japanese Restaurant,Pub,Italian Restaurant,Park,Breakfast Spot
2,"CN Tower, Bathurst Quay, Island airport, Harbo...",Harbor / Marina,Airport Terminal,Boat or Ferry,Sculpture Garden,Airport Lounge,Airport Service,Coffee Shop,Dog Run,Music Venue,Airport
3,"Cabbagetown, St. James Town",Restaurant,Coffee Shop,Café,Pizza Place,Park,Pharmacy,Breakfast Spot,Gastropub,Japanese Restaurant,Thai Restaurant
4,Central Bay Street,Coffee Shop,Café,Italian Restaurant,Burger Joint,Bubble Tea Shop,Park,Ramen Restaurant,Bar,Spa,Chinese Restaurant
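
The suffix logic in the cell above is fine for ten columns but would label position 21 as "21th". If more columns were ever needed, a small helper could build the names instead. The function name `ordinal` is hypothetical and not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


num_top_venues = 10
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns)
```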


### Cluster the data, create map and analyze data for each cluster

In [28]:
# set number of clusters
kclusters = 7

subset_grouped_clustering = subset_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(subset_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([5, 5, 6, 0, 0, 0, 0, 0, 5, 0], dtype=int32)
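
The value `kclusters = 7` is fixed by hand. One common way to sanity-check such a choice is to compare silhouette scores over a range of k. The sketch below does this on synthetic data: the array `X` is a made-up stand-in for `subset_grouped_clustering`, not the notebook's data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Synthetic stand-in: three tight, well-separated blobs of 9 points each
X = np.vstack([rng.normal(loc=c, scale=0.05, size=(9, 4)) for c in (0.0, 1.0, 2.0)])

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # higher means better-separated clusters

best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On the real frequency table the scores may be much flatter, since 27 neighborhoods are few points for seven clusters; the comparison is a guide rather than a rule.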

In [29]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

subset_merged = df_subset

# join df_subset with neighborhoods_venues_sorted to attach cluster labels and top venues to each neighborhood
subset_merged = subset_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

subset_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,3,Gym / Fitness Center,Coffee Shop,Bookstore,Park,Bus Line,Restaurant,Café,Discount Store,Dim Sum Restaurant,Diner
1,M4P,Central Toronto,Davisville North,43.712751,-79.390197,0,Pizza Place,Coffee Shop,Café,Italian Restaurant,Sushi Restaurant,Fast Food Restaurant,Burger Joint,Park,Gym,Dessert Shop
2,M4R,Central Toronto,North Toronto West,43.715383,-79.405678,0,Coffee Shop,Sporting Goods Shop,Diner,Italian Restaurant,Café,Electronics Store,Mexican Restaurant,Restaurant,Flower Shop,Rental Car Location
3,M4S,Central Toronto,Davisville,43.704324,-79.38879,0,Coffee Shop,Italian Restaurant,Pizza Place,Sushi Restaurant,Café,Sandwich Place,Fast Food Restaurant,Restaurant,Indian Restaurant,Gym
4,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316,2,Park,Grocery Store,Candy Store,Sandwich Place,Café,Sushi Restaurant,Thai Restaurant,Bank,Japanese Restaurant,Playground


In [30]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(subset_merged['Latitude'], subset_merged['Longitude'], subset_merged['Neighborhood'], subset_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [31]:
subset_merged.loc[subset_merged['Cluster Labels'] == 0, subset_merged.columns[[1] + list(range(5, subset_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Central Toronto,0,Pizza Place,Coffee Shop,Café,Italian Restaurant,Sushi Restaurant,Fast Food Restaurant,Burger Joint,Park,Gym,Dessert Shop
2,Central Toronto,0,Coffee Shop,Sporting Goods Shop,Diner,Italian Restaurant,Café,Electronics Store,Mexican Restaurant,Restaurant,Flower Shop,Rental Car Location
3,Central Toronto,0,Coffee Shop,Italian Restaurant,Pizza Place,Sushi Restaurant,Café,Sandwich Place,Fast Food Restaurant,Restaurant,Indian Restaurant,Gym
5,Central Toronto,0,Coffee Shop,Italian Restaurant,Sushi Restaurant,Thai Restaurant,Café,Pharmacy,Bagel Shop,Pub,Pizza Place,Sandwich Place
7,Downtown Toronto,0,Restaurant,Coffee Shop,Café,Pizza Place,Park,Pharmacy,Breakfast Spot,Gastropub,Japanese Restaurant,Thai Restaurant
8,Downtown Toronto,0,Coffee Shop,Japanese Restaurant,Burger Joint,Restaurant,Gay Bar,Café,Sushi Restaurant,Dance Studio,Pizza Place,Park
9,Downtown Toronto,0,Coffee Shop,Italian Restaurant,Park,Café,Restaurant,Bakery,Theater,Pub,Bar,Breakfast Spot
10,Downtown Toronto,0,Coffee Shop,Clothing Store,Restaurant,Gastropub,Ramen Restaurant,Plaza,Tea Room,Café,Italian Restaurant,Thai Restaurant
13,Downtown Toronto,0,Coffee Shop,Café,Italian Restaurant,Burger Joint,Bubble Tea Shop,Park,Ramen Restaurant,Bar,Spa,Chinese Restaurant
19,Central Toronto,0,Coffee Shop,Italian Restaurant,Trail,Park,Gastropub,Asian Restaurant,Sushi Restaurant,Bakery,Bank,Café


In [32]:
subset_merged.loc[subset_merged['Cluster Labels'] == 1, subset_merged.columns[[1] + list(range(5, subset_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Downtown Toronto,1,Park,Bank,Trail,Playground,Candy Store,Grocery Store,College Theater,Design Studio,Electronics Store,Eastern European Restaurant


In [33]:
subset_merged.loc[subset_merged['Cluster Labels'] == 2, subset_merged.columns[[1] + list(range(5, subset_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Central Toronto,2,Park,Grocery Store,Candy Store,Sandwich Place,Café,Sushi Restaurant,Thai Restaurant,Bank,Japanese Restaurant,Playground


In [34]:
subset_merged.loc[subset_merged['Cluster Labels'] == 3, subset_merged.columns[[1] + list(range(5, subset_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central Toronto,3,Gym / Fitness Center,Coffee Shop,Bookstore,Park,Bus Line,Restaurant,Café,Discount Store,Dim Sum Restaurant,Diner


In [35]:
subset_merged.loc[subset_merged['Cluster Labels'] == 4, subset_merged.columns[[1] + list(range(5, subset_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,Central Toronto,4,Playground,Pet Store,Home Service,Garden,Department Store,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant


In [36]:
subset_merged.loc[subset_merged['Cluster Labels'] == 5, subset_merged.columns[[1] + list(range(5, subset_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Downtown Toronto,5,Coffee Shop,Café,Restaurant,Hotel,Italian Restaurant,Bakery,Gastropub,Breakfast Spot,Seafood Restaurant,Cosmetics Shop
12,Downtown Toronto,5,Restaurant,Café,Coffee Shop,Hotel,Cocktail Bar,Japanese Restaurant,Pub,Italian Restaurant,Park,Breakfast Spot
14,Downtown Toronto,5,Café,Coffee Shop,American Restaurant,Hotel,Asian Restaurant,Steakhouse,Sushi Restaurant,Bar,Thai Restaurant,Theater
15,Downtown Toronto,5,Coffee Shop,Hotel,Café,Italian Restaurant,Restaurant,Aquarium,Scenic Lookout,Deli / Bodega,Concert Hall,Brewery
16,Downtown Toronto,5,Hotel,Café,Coffee Shop,Restaurant,American Restaurant,Thai Restaurant,Deli / Bodega,Steakhouse,Gastropub,Bar
17,Downtown Toronto,5,Coffee Shop,Hotel,Café,Restaurant,Concert Hall,Deli / Bodega,Steakhouse,Japanese Restaurant,American Restaurant,Seafood Restaurant
24,Downtown Toronto,5,Coffee Shop,Café,Restaurant,Hotel,Japanese Restaurant,Italian Restaurant,Pub,Seafood Restaurant,Cocktail Bar,Bakery
25,Downtown Toronto,5,Hotel,Café,Coffee Shop,Restaurant,American Restaurant,Gastropub,Deli / Bodega,Steakhouse,Asian Restaurant,Concert Hall


In [37]:
subset_merged.loc[subset_merged['Cluster Labels'] == 6, subset_merged.columns[[1] + list(range(5, subset_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,Downtown Toronto,6,Harbor / Marina,Airport Terminal,Boat or Ferry,Sculpture Garden,Airport Lounge,Airport Service,Coffee Shop,Dog Run,Music Venue,Airport
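
The seven near-identical cells above can be condensed into a single `groupby` over the cluster label. A sketch on a four-row toy frame follows; its rows borrow a few values from the tables above, but the frame itself is illustrative.

```python
import pandas as pd

toy = pd.DataFrame({
    'Neighborhood': ['Davisville North', 'Davisville', 'Rosedale', 'Berczy Park'],
    'Cluster Labels': [0, 0, 1, 5],
    '1st Most Common Venue': ['Pizza Place', 'Coffee Shop', 'Park', 'Restaurant'],
})

# Number of neighborhoods per cluster
cluster_sizes = toy.groupby('Cluster Labels').size()

# One pass over all clusters instead of one cell per label
for label, members in toy.groupby('Cluster Labels'):
    print('Cluster {}: {} neighborhood(s)'.format(label, len(members)))
    print(members[['Neighborhood', '1st Most Common Venue']].to_string(index=False))
```

The size table also makes it easy to spot clusters that contain a single neighborhood.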


#### The first cluster (label 0) shows a high concentration of coffee shops and restaurants. The sixth cluster (label 5) adds hotels to that mix, which suggests areas with many visitors. The remaining clusters are dominated by grocery, fitness, recreation, and travel venues.