# Neighborhood Segmentation & Clustering in Toronto
   ## Coursera--IBM Data Science Professional Certificate Capstone
   ## By: Abdullah M. Mustsfa
### ----------------------------------------------------------------------------------------------------------------------------------------------------

### Get the Necessary Libraries
First, import the libraries needed for data manipulation, analysis, modeling, and visualization.
A few packages were missing and had to be installed first.

In [1]:
#!pip install geocoder
#!pip install geopy
#!conda install -c conda-forge folium=0.5.0 --yes

In [2]:
import urllib.request  # request a web page by URL and return its HTML content
import requests  # HTTP requests to the Foursquare API
from bs4 import BeautifulSoup  # parse the HTML content into a BeautifulSoup tree
import pandas as pd  # tabular data manipulation
import numpy as np  # numerical data manipulation
from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values
from sklearn.cluster import KMeans  # centroid-based clustering algorithm
import folium  # map visualization
import matplotlib.pyplot as plt
from matplotlib import cm, colors  # colormaps for the cluster markers
%matplotlib inline

## Part 1: Retrieve a List of Postal Codes of Toronto

### Get the postal code data from the corresponding Wikipedia page.
- Use urllib to request the Wikipedia URL and get its HTML content.
- Parse the HTML into a tree using the BeautifulSoup library.

In [3]:
# Define the URL of the Wikipedia page
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
# open the URL with urllib.request and store the response in page
page = urllib.request.urlopen(url)
# parse the HTML into a BeautifulSoup tree (requires the lxml parser)
soup = BeautifulSoup(page, "lxml")

### Parse the table data into a pandas dataframe
- Extract the postal code, borough, and neighborhood from each row of the BeautifulSoup tree.
- Drop postal codes whose borough is "Not assigned".
- Load the rows into a pandas dataframe for further manipulation.

In [4]:
# locate the postal code table in the soup tree
right_table = soup.find('table', class_='wikitable sortable')
A, B, C = [], [], []  # postal codes, boroughs, neighborhoods
for row in right_table.find_all('tr'):  # loop over the table rows
    cells = row.find_all('td')
    if len(cells) == 3:
        # extract the cell text, stripping surrounding whitespace and newlines
        a, b, c = (cell.get_text(strip=True) for cell in cells)
        # ignore unassigned boroughs
        if b != 'Not assigned':
            A.append(a)
            B.append(b)
            C.append(c)
# Cast the lists into a pandas dataframe
df = pd.DataFrame({'PostalCode': A, 'Borough': B, 'Neighborhood': C})
df.head()

|   | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park / Harbourfront |
| 3 | M6A | North York | Lawrence Manor / Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park / Ontario Provincial Government |
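The same row filtering can be sketched with Python's built-in `html.parser`, which needs neither BeautifulSoup nor lxml. This is a minimal sketch, not the notebook's code: the `TableRows` class, the `parse_postal_table` helper, and the inline HTML sample are hypothetical, and it assumes a simple three-column table (postal code, borough, neighborhood) like the one the cell above expects. Collecting all text inside a cell and stripping it also handles cells whose text sits inside a link.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:   # ignore text outside <td> cells
            self._cell.append(data)


def parse_postal_table(html):
    """Return (postal code, borough, neighborhood) tuples, skipping unassigned boroughs."""
    parser = TableRows()
    parser.feed(html)
    return [tuple(r) for r in parser.rows
            if len(r) == 3 and r[1] != 'Not assigned']


# Hypothetical sample mimicking the Wikipedia table layout
sample = """
<table class="wikitable sortable">
<tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
<tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
<tr><td>M3A</td><td>North York</td><td><a href="#">Parkwoods</a></td></tr>
<tr><td>M4A</td><td>North York</td><td>Victoria Village</td></tr>
</table>
"""
print(parse_postal_table(sample))
# [('M3A', 'North York', 'Parkwoods'), ('M4A', 'North York', 'Victoria Village')]
```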


#### Check for unassigned or empty neighborhoods

In [5]:
print(sum(df['Neighborhood'] == 'Not assigned'))
print(sum(df['Neighborhood'] == ''))

0
0


====> **No unassigned or empty neighborhood cells were found.**

#### Check for any Postal code duplicates

In [6]:
sum(df['PostalCode'].duplicated())

0

===> **No duplicate postal codes were found.**

### Get the shape of the data

In [7]:
df.shape

(103, 3)

**===> The data contains 103 unique postal codes and their associated neighborhoods and boroughs.**

## Part 2: Retrieve the corresponding latitude and longitude for each postal code

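- **It wasn't possible to retrieve the coordinates through the geocoder library.**
- **The code below, kept inside a string literal so that it does not run, shows how the location data could be retrieved. Note that Google's geocoding service requires an API key.**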
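Whichever geocoding service is used, a lookup that keeps failing should give up after a bounded number of attempts instead of looping forever. A minimal sketch, assuming a geocoding callable that returns `None` on failure; `geocode_with_retry` and `make_flaky_geocoder` are hypothetical names used only for illustration, and the stub stands in for a real service call such as `geocoder.google(...).latlng`.

```python
import time


def geocode_with_retry(lookup, query, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call lookup(query) until it returns coordinates or attempts run out.

    Waits base_delay * 2**attempt seconds between failed attempts
    (exponential backoff) and returns None if every attempt fails.
    """
    for attempt in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return None


def make_flaky_geocoder(failures):
    """Hypothetical stub: fails `failures` times, then returns fixed coordinates."""
    calls = {'n': 0}

    def lookup(query):
        calls['n'] += 1
        if calls['n'] <= failures:
            return None
        return [43.753259, -79.329656]  # Parkwoods coordinates from the geospatial CSV

    return lookup


# Pass a no-op sleep so the demo runs instantly
no_wait = lambda seconds: None
print(geocode_with_retry(make_flaky_geocoder(2), 'M3A, Toronto, Ontario', sleep=no_wait))
# [43.753259, -79.329656]
print(geocode_with_retry(make_flaky_geocoder(10), 'M3A, Toronto, Ontario', sleep=no_wait))
# None
```

Injecting `sleep` keeps the helper testable; in real use the default `time.sleep` spaces out requests, which also eases rate limits.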

In [8]:
'''
import geocoder  # the geocoder package (pip install geocoder)
lats = []
longs = []
for postcode in df['PostalCode']:
    lat_lng_coords = None
    attempts = 0
    # retry a bounded number of times instead of looping forever
    while lat_lng_coords is None and attempts < 10:
        g = geocoder.google('{}, Toronto, Ontario'.format(postcode))
        lat_lng_coords = g.latlng
        attempts += 1
    if lat_lng_coords is None:
        # give up on this postal code but keep the lists aligned with df
        lats.append(None)
        longs.append(None)
        continue
    latitude, longitude = lat_lng_coords
    print(postcode, latitude, longitude)
    lats.append(latitude)
    longs.append(longitude)
'''

### Get the geospatial data and load it into a pandas dataframe.

In [9]:
geo_locs = pd.read_csv('https://cocl.us/Geospatial_data')
geo_locs.head()

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


### Merge the location dataframe with the postal code dataframe

In [10]:
df_toronto = df.merge(geo_locs, left_on='PostalCode', right_on='Postal Code')  # inner join on the postal code
df_toronto = df_toronto.drop(columns=['Postal Code'])  # drop the redundant postal code column
df_toronto.head()

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park / Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor / Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park / Ontario Provincial Government | 43.662301 | -79.389494 |
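`DataFrame.merge` performs an inner join by default, so a postal code missing from the geospatial CSV would be dropped without any warning. The check can be sketched in plain Python: a minimal sketch, assuming the two tables are available as a list of rows and a dict keyed by postal code. `join_coordinates` is a hypothetical helper; the coordinates are copied from the outputs above, and the `M0Z` row is a made-up unmatched code.

```python
def join_coordinates(postal_rows, coords):
    """Inner-join (postal code, borough, neighborhood) rows with a
    postal code -> (lat, lng) mapping and report codes with no match."""
    joined, unmatched = [], []
    for code, borough, hood in postal_rows:
        if code in coords:
            lat, lng = coords[code]
            joined.append((code, borough, hood, lat, lng))
        else:
            unmatched.append(code)   # an inner join would drop this row silently
    return joined, unmatched


postal_rows = [
    ('M3A', 'North York', 'Parkwoods'),
    ('M4A', 'North York', 'Victoria Village'),
    ('M0Z', 'Hypothetical', 'Not In CSV'),   # hypothetical unmatched code
]
coords = {
    'M3A': (43.753259, -79.329656),
    'M4A': (43.725882, -79.315572),
}
joined, unmatched = join_coordinates(postal_rows, coords)
print(len(joined), unmatched)   # 2 ['M0Z']
```

In pandas itself, passing `how='left', indicator=True` to `merge` adds a `_merge` column whose value `'left_only'` flags the same unmatched rows.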


## Part 3: Segmentation and Clustering of the Neighborhoods
- This part replicates the New York City neighborhood analysis for Toronto.

### Retrieve the latitude and longitude of Toronto through the geopy package.

In [11]:
address = 'Toronto, TO'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


### Establish the Foursquare credentials.

In [12]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: <redacted>
CLIENT_SECRET: <redacted>
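Printing API secrets into notebook output exposes them to anyone who sees the notebook. A safer pattern, sketched here under the assumption that the credentials are exported as environment variables (the names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_credentials` and `mask` are hypothetical), is to read them at runtime and print only a masked form.

```python
import os


def load_credentials(env=os.environ):
    """Read Foursquare credentials from (hypothetically named) environment variables."""
    client_id = env.get('FOURSQUARE_CLIENT_ID', '')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET', '')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first.')
    return client_id, client_secret


def mask(secret, visible=4):
    """Show only the last `visible` characters; fully mask short secrets."""
    if len(secret) <= visible:
        return '*' * len(secret)
    return '*' * (len(secret) - visible) + secret[-visible:]


# Demo with a fake environment instead of real credentials
fake_env = {'FOURSQUARE_CLIENT_ID': 'ABCDEFGH1234',
            'FOURSQUARE_CLIENT_SECRET': 'SECRETVALUE9876'}
cid, csecret = load_credentials(fake_env)
print('CLIENT_ID:', mask(cid))           # CLIENT_ID: ********1234
print('CLIENT_SECRET:', mask(csecret))   # CLIENT_SECRET: ***********9876
```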


### Extract the most popular venues around every neighborhood.
- **Using the Foursquare API, retrieve the most popular venues around each neighborhood's latitude and longitude.**
- **Limit the search radius to 1000 meters.**
- **Request up to 100 venues per location.**
- **Format the data into a pandas dataframe.**
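The request URL in the original function is assembled with `str.format`; letting `urllib.parse.urlencode` build the query string makes each parameter explicit and handles escaping. A minimal sketch that only constructs the URL (no network call). `build_explore_url` is a hypothetical helper; the endpoint and parameter names (`client_id`, `client_secret`, `v`, `ll`, `radius`, `limit`) are copied from the notebook's own URL, and the ID and secret are placeholders.

```python
from urllib.parse import urlencode, urlparse, parse_qs

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius=1000, limit=100):
    """Return the explore request URL for one location."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),   # "lat,lng", as in the original URL
        'radius': radius,                 # search radius in meters
        'limit': limit,                   # maximum number of venues requested
    }
    return EXPLORE_URL + '?' + urlencode(params)


url = build_explore_url('MY_ID', 'MY_SECRET', '20180605', 43.753259, -79.329656)
print(parse_qs(urlparse(url).query)['ll'])   # ['43.753259,-79.329656']
```

With `requests`, the same dict can be passed directly as `requests.get(EXPLORE_URL, params=params)`, which performs this encoding internally.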

In [13]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000, limit=100):
    """Return a dataframe with one row per venue found near each location."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # query parameters for the Foursquare v2 explore endpoint
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': limit,
        }

        # make the GET request and fail loudly on HTTP errors
        response = requests.get('https://api.foursquare.com/v2/venues/explore', params=params)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']

        # keep only the relevant fields; skip venues without a category
        venues_list.extend(
            (name,
             lat,
             lng,
             v['venue']['name'],
             v['venue']['location']['lat'],
             v['venue']['location']['lng'],
             v['venue']['categories'][0]['name'])
            for v in results if v['venue']['categories'])

    # passing the columns explicitly also works when no venues were found
    return pd.DataFrame(venues_list, columns=['Neighborhood',
                                              'Neighborhood Latitude',
                                              'Neighborhood Longitude',
                                              'Venue',
                                              'Venue Latitude',
                                              'Venue Longitude',
                                              'Venue Category'])

### Get the dataframe of popular venues in Toronto.

In [14]:
# query Foursquare around each postal code location
toronto_venues = getNearbyVenues(names=df_toronto['Neighborhood'],
                                   latitudes=df_toronto['Latitude'],
                                   longitudes=df_toronto['Longitude']
                                  )
toronto_venues.head()

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Parkwoods | 43.753259 | -79.329656 | Allwyn's Bakery | 43.75984 | -79.324719 | Caribbean Restaurant |
| 1 | Parkwoods | 43.753259 | -79.329656 | Brookbanks Park | 43.751976 | -79.33214 | Park |
| 2 | Parkwoods | 43.753259 | -79.329656 | Tim Hortons | 43.760668 | -79.326368 | Café |
| 3 | Parkwoods | 43.753259 | -79.329656 | A&W | 43.760643 | -79.326865 | Fast Food Restaurant |
| 4 | Parkwoods | 43.753259 | -79.329656 | Bruno's valu-mart | 43.746143 | -79.32463 | Grocery Store |


### Distribution of venues per neighborhood

In [15]:
toronto_venues['Neighborhood'].value_counts().sample(10)

Davisville North                                                                                      100
Rouge Hill / Port Union / Highland Creek                                                                5
Don Mills                                                                                              76
New Toronto / Mimico South / Humber Bay Shores                                                         22
Kingsview Village / St. Phillips / Martin Grove Gardens / Richview Gardens                             16
Bedford Park / Lawrence Manor East                                                                     41
Woburn                                                                                                  9
Thorncliffe Park                                                                                       49
Parkdale / Roncesvalles                                                                               100
Mimico NW / The Queensway West / South of Bloo

In [16]:
toronto_venues['Neighborhood'].value_counts().describe()

count     97.000000
mean      51.051546
std       35.454011
min        3.000000
25%       20.000000
50%       41.000000
75%      100.000000
max      111.000000
Name: Neighborhood, dtype: float64

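**=> The number of venues per neighborhood varies widely, from 3 to 111 (median 41). Counts of 100 are capped by the per-request limit, and counts above 100 can only arise because `value_counts` groups by neighborhood name, so names shared by several postal codes combine several requests (`count` is 97, fewer than the 103 postal codes).**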
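Because each request returns at most `limit` venues, a total above that cap shows that several requests were counted under one name. A minimal sketch with `collections.Counter`, assuming hypothetical postal codes, neighborhood names, and venue counts:

```python
from collections import Counter

LIMIT = 100  # per-request cap, as passed to the notebook's explore requests

# Hypothetical responses: (postal code, neighborhood name, venues returned by one request)
responses = [
    ('X1A', 'Shared Name', 100),
    ('X1B', 'Shared Name', 11),
    ('X2A', 'Unique Name', 41),
]

by_name = Counter()   # what value_counts() on the 'Neighborhood' column sees
by_code = Counter()   # the same venues counted per postal code
for code, name, n_venues in responses:
    n = min(n_venues, LIMIT)   # a single request never returns more than LIMIT venues
    by_name[name] += n
    by_code[code] += n

print(by_name['Shared Name'], max(by_code.values()))   # 111 100
```

One option to keep such areas apart would be to group on `PostalCode` (or a name combined with the code) instead of the neighborhood name alone.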

### Segment the neighborhoods based on venue similarity.
#### Create one-hot encoded vectors of the venue categories

In [17]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood']
# move neighborhood column to the first column
fixed_columns = ['Neighborhood'] + list(set(toronto_onehot.columns) - set(['Neighborhood']))

toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

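|   | Neighborhood | Lake | Cantonese Restaurant | Laundry Service | Japanese Restaurant | Comic Shop | French Restaurant | Women's Store | Garden | Asian Restaurant | ... | Gaming Cafe | Massage Studio | Hostel | Portuguese Restaurant | Bagel Shop | Farm | Stationery Store | Jazz Club | Video Store | Sandwich Place |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Parkwoods | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Parkwoods | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Parkwoods | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Parkwoods | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Parkwoods | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |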
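`pd.get_dummies` turns each venue's category into a row of 0/1 indicators, one column per distinct category. A minimal pure-Python sketch of the same transformation; the `one_hot` helper is hypothetical, and the sample categories are those of the first Parkwoods venues listed earlier. Sorting the columns, unlike the `set` difference used to reorder columns in the cell above, also gives a deterministic column order.

```python
def one_hot(categories):
    """Return (column names, indicator rows) for a list of category labels."""
    columns = sorted(set(categories))               # deterministic column order
    index = {name: i for i, name in enumerate(columns)}
    rows = []
    for cat in categories:
        row = [0] * len(columns)
        row[index[cat]] = 1                         # exactly one 1 per venue
        rows.append(row)
    return columns, rows


cats = ['Caribbean Restaurant', 'Park', 'Café', 'Fast Food Restaurant', 'Grocery Store']
columns, rows = one_hot(cats)
print(columns)   # ['Café', 'Caribbean Restaurant', 'Fast Food Restaurant', 'Grocery Store', 'Park']
print(rows[1])   # [0, 0, 0, 0, 1]
```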


### Compute the frequency of each venue category per neighborhood using groupby
#### This dataframe can be used as the feature matrix for clustering the neighborhoods

In [18]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

|   | Neighborhood | Lake | Cantonese Restaurant | Laundry Service | Japanese Restaurant | Comic Shop | French Restaurant | Women's Store | Garden | Asian Restaurant | ... | Gaming Cafe | Massage Studio | Hostel | Portuguese Restaurant | Bagel Shop | Farm | Stationery Store | Jazz Club | Video Store | Sandwich Place |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt | 0.0 | 0.02 | 0.0 | 0.020000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.020000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.040000 |
| 1 | Alderwood / Long Branch | 0.0 | 0.00 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.040000 |
| 2 | Bathurst Manor / Wilson Heights / Downsview North | 0.0 | 0.00 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.032258 | 0.032258 |
| 3 | Bayview Village | 0.0 | 0.00 | 0.0 | 0.133333 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 |
| 4 | Bedford Park / Lawrence Manor East | 0.0 | 0.00 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.02439 | 0.0 | 0.0 | 0.0 | 0.024390 | 0.048780 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 92 | Willowdale / Newtonbrook | 0.0 | 0.00 | 0.0 | 0.034483 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.034483 |
| 93 | Woburn | 0.0 | 0.00 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 |
| 94 | Woodbine Heights | 0.0 | 0.00 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.032258 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.032258 | 0.064516 |
| 95 | York Mills / Silver Hills | 0.0 | 0.00 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 |
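Averaging the one-hot rows per neighborhood gives, for each category, the fraction of that neighborhood's venues in the category, so every row of `toronto_grouped` sums to 1. A pure-Python sketch of that `groupby('Neighborhood').mean()` step; `category_frequencies` and the input pairs are hypothetical.

```python
from collections import Counter, defaultdict


def category_frequencies(venues):
    """Map neighborhood -> {category: share of that neighborhood's venues}.

    Equivalent to one-hot encoding the categories and averaging the
    indicator rows per neighborhood.
    """
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1
    freqs = {}
    for hood, c in counts.items():
        total = sum(c.values())               # number of venues in this neighborhood
        freqs[hood] = {cat: n / total for cat, n in c.items()}
    return freqs


# Hypothetical (neighborhood, venue category) pairs
venues = [
    ('A', 'Park'), ('A', 'Park'), ('A', 'Café'), ('A', 'Bank'),
    ('B', 'Pizza Place'), ('B', 'Park'),
]
freqs = category_frequencies(venues)
print(freqs['A'])   # {'Park': 0.5, 'Café': 0.25, 'Bank': 0.25}
```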


### Get the most common venues for each neighborhood

In [19]:
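def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name) and keep the category frequencies
    row_categories = row.iloc[1:]
    # sort the categories from most to least frequent
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]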

In [20]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # 4th to 10th use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt | Chinese Restaurant | Shopping Mall | Sandwich Place | Pizza Place | Caribbean Restaurant | Bakery | Coffee Shop | Discount Store | Seafood Restaurant | Park |
| 1 | Alderwood / Long Branch | Pharmacy | Discount Store | Pizza Place | Sandwich Place | Pub | Grocery Store | Liquor Store | Shopping Mall | Intersection | Gym |
| 2 | Bathurst Manor / Wilson Heights / Downsview North | Pizza Place | Coffee Shop | Convenience Store | Bank | Fried Chicken Joint | Trail | Diner | Supermarket | Community Center | Dog Run |
| 3 | Bayview Village | Grocery Store | Japanese Restaurant | Gas Station | Bank | Chinese Restaurant | Trail | Shopping Mall | Park | Restaurant | Intersection |
| 4 | Bedford Park / Lawrence Manor East | Italian Restaurant | Coffee Shop | Sandwich Place | Restaurant | Bank | Fast Food Restaurant | Butcher | Bridal Shop | Liquor Store | Sushi Restaurant |
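The two cells above rank each neighborhood's category frequencies and label the columns with ordinal suffixes. A compact standalone sketch of both ideas; `ordinal`, `top_n`, and the sample frequencies are hypothetical. Unlike the `indicators` list, `ordinal` also handles 11th to 13th and 21st, which matters only if `num_top_venues` grows past 10. Ties are broken alphabetically here, whereas `sort_values` with its default `kind='quicksort'` does not guarantee a stable order for ties.

```python
def ordinal(n):
    """Return '1st', '2nd', '3rd', '4th', ..., '11th', '12th', '13th', '21st', ..."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


def top_n(frequencies, n):
    """Return the n most frequent categories (ties broken alphabetically)."""
    ranked = sorted(frequencies.items(), key=lambda kv: (-kv[1], kv[0]))
    return [category for category, _ in ranked[:n]]


columns = ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 4)]
freqs = {'Park': 0.25, 'Bank': 0.25, 'Café': 0.4, 'Gym': 0.1}
print(dict(zip(columns, top_n(freqs, 3))))
# {'1st Most Common Venue': 'Café', '2nd Most Common Venue': 'Bank', '3rd Most Common Venue': 'Park'}
```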


### Clustering using KMeans
- **Use KMeans clustering with K = 5 to cluster the neighborhoods by their venue-frequency features.**

In [21]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')  # keep only the venue frequency features

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters,n_init=100).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 2, 2, 2, 0, 0, 0, 0, 0, 0])
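For intuition about what `KMeans(n_clusters=5, n_init=100)` does, here is a minimal pure-Python sketch of Lloyd's algorithm with several random restarts, keeping the run with the lowest within-cluster sum of squared distances (inertia); scikit-learn likewise keeps the best of its `n_init` runs by inertia. It is not scikit-learn's implementation (which seeds with k-means++ by default); `kmeans` is a hypothetical helper and the 2-D points are made up. Fixing the seed, as `random_state` does in scikit-learn, makes the labels reproducible; the cell above sets no `random_state`, so its label numbers can change between runs.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Cluster `points` (equal-length tuples) into k groups.

    Runs Lloyd's algorithm n_init times from random initial centroids and
    returns (labels, centroids, inertia) of the run with the lowest inertia.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centroids = rng.sample(points, k)   # k distinct starting points
        for _ in range(max_iter):
            # assignment step: nearest centroid for every point
            labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
            # update step: move each centroid to the mean of its members
            new_centroids = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:
                    new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
                else:
                    new_centroids.append(centroids[j])   # keep an empty cluster's centroid
            if new_centroids == centroids:
                break                                   # converged
            centroids = new_centroids
        inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
        if best is None or inertia < best[2]:
            best = (labels, centroids, inertia)
    return best


points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centroids, inertia = kmeans(points, k=2)
print(labels, round(inertia, 3))
```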

### Merge the cluster labels and most common venues with the location dataframe

In [22]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# merge df_toronto with the sorted venues to attach cluster labels and top venues to each postal code (inner join on Neighborhood)
df_toronto_clust = df_toronto.merge(neighborhoods_venues_sorted, on='Neighborhood')

df_toronto_clust.head() # check the last columns!

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | 2 | Park | Pharmacy | Convenience Store | Bus Stop | Shopping Mall | Chinese Restaurant | Road | Discount Store | Food & Drink Shop | Pizza Place |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | 0 | Coffee Shop | Golf Course | Boxing Gym | Hockey Arena | Men's Store | Sporting Goods Shop | Gym / Fitness Center | Lounge | Park | Pizza Place |
| 2 | M5A | Downtown Toronto | Regent Park / Harbourfront | 43.65426 | -79.360636 | 0 | Coffee Shop | Café | Park | Theater | Diner | Pub | Bakery | Breakfast Spot | Restaurant | Italian Restaurant |
| 3 | M6A | North York | Lawrence Manor / Lawrence Heights | 43.718518 | -79.464763 | 0 | Fast Food Restaurant | Restaurant | Coffee Shop | Clothing Store | Vietnamese Restaurant | Accessories Store | Sushi Restaurant | Fried Chicken Joint | Dessert Shop | Furniture / Home Store |
| 4 | M7A | Downtown Toronto | Queen's Park / Ontario Provincial Government | 43.662301 | -79.389494 | 0 | Coffee Shop | Sushi Restaurant | Park | Ramen Restaurant | Restaurant | Japanese Restaurant | Burger Joint | Italian Restaurant | Gastropub | Ice Cream Shop |


### Visualize the clustered map using Folium package

In [23]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one color per cluster label 0..kclusters-1
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(df_toronto_clust['Latitude'], df_toronto_clust['Longitude'], df_toronto_clust['Neighborhood'], df_toronto_clust['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],  # labels start at 0, so index the palette directly
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

- **Most neighborhoods fall into clusters 0 and 2; the remaining clusters contain only one or two neighborhoods each.**
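The cluster-size observation can be checked directly by counting labels. With the dataframe above this would be `df_toronto_clust['Cluster Labels'].value_counts()`; the sketch below does the same with `collections.Counter` on a hypothetical label list, and flags small clusters as outlier candidates. `cluster_sizes`, `small_clusters`, and the `max_size` threshold are hypothetical.

```python
from collections import Counter


def cluster_sizes(labels):
    """Return (label, count) pairs sorted from largest to smallest cluster."""
    return Counter(labels).most_common()


def small_clusters(labels, max_size=2):
    """Labels of clusters with at most `max_size` members (candidate outliers)."""
    return sorted(lab for lab, n in Counter(labels).items() if n <= max_size)


labels = [2, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 2, 2, 1, 0, 0, 2, 1, 3, 4]   # hypothetical
print(cluster_sizes(labels))    # [(0, 10), (2, 6), (1, 2), (3, 1), (4, 1)]
print(small_clusters(labels))   # [1, 3, 4]
```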

### Display the neighborhoods in each cluster

In [24]:
df_toronto_clust.loc[df_toronto_clust['Cluster Labels'] == 0, df_toronto_clust.columns[[1] + list(range(5, df_toronto_clust.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | North York | 0 | Coffee Shop | Golf Course | Boxing Gym | Hockey Arena | Men's Store | Sporting Goods Shop | Gym / Fitness Center | Lounge | Park | Pizza Place |
| 2 | Downtown Toronto | 0 | Coffee Shop | Café | Park | Theater | Diner | Pub | Bakery | Breakfast Spot | Restaurant | Italian Restaurant |
| 3 | North York | 0 | Fast Food Restaurant | Restaurant | Coffee Shop | Clothing Store | Vietnamese Restaurant | Accessories Store | Sushi Restaurant | Fried Chicken Joint | Dessert Shop | Furniture / Home Store |
| 4 | Downtown Toronto | 0 | Coffee Shop | Sushi Restaurant | Park | Ramen Restaurant | Restaurant | Japanese Restaurant | Burger Joint | Italian Restaurant | Gastropub | Ice Cream Shop |
| 6 | Scarborough | 0 | Fast Food Restaurant | Coffee Shop | Trail | Restaurant | Caribbean Restaurant | Chinese Restaurant | Bakery | Bus Station | Gym | Supermarket |
| 7 | North York | 0 | Restaurant | Japanese Restaurant | Coffee Shop | Gym | Asian Restaurant | Supermarket | Bank | Burger Joint | Pizza Place | Sandwich Place |
| 8 | North York | 0 | Restaurant | Japanese Restaurant | Coffee Shop | Gym | Asian Restaurant | Supermarket | Bank | Burger Joint | Pizza Place | Sandwich Place |
| 10 | Downtown Toronto | 0 | Coffee Shop | Gastropub | Japanese Restaurant | Café | Theater | Italian Restaurant | Restaurant | Diner | Ramen Restaurant | Creperie |
| 14 | East York | 0 | Park | Coffee Shop | Sandwich Place | Thai Restaurant | Pizza Place | Pastry Shop | Pub | Curling Ice | Cosmetics Shop | Liquor Store |
| 15 | Downtown Toronto | 0 | Coffee Shop | Café | Restaurant | Hotel | Seafood Restaurant | Gastropub | Italian Restaurant | Theater | Furniture / Home Store | Cosmetics Shop |


In [25]:
df_toronto_clust.loc[df_toronto_clust['Cluster Labels'] == 1, df_toronto_clust.columns[[1] + list(range(5, df_toronto_clust.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | Scarborough | 1 | Burger Joint | Playground | Park | Breakfast Spot | Italian Restaurant | Sandwich Place | Bus Line | Auto Garage | BBQ Joint | Churrascaria |
| 100 | Etobicoke | 1 | Italian Restaurant | Park | Ice Cream Shop | Eastern European Restaurant | Gym / Fitness Center | Sandwich Place | BBQ Joint | Churrascaria | General Travel | Ramen Restaurant |


In [26]:
df_toronto_clust.loc[df_toronto_clust['Cluster Labels'] == 2, df_toronto_clust.columns[[1] + list(range(5, df_toronto_clust.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | North York | 2 | Park | Pharmacy | Convenience Store | Bus Stop | Shopping Mall | Chinese Restaurant | Road | Discount Store | Food & Drink Shop | Pizza Place |
| 5 | Etobicoke | 2 | Pharmacy | Skating Rink | Golf Course | Playground | Grocery Store | Convenience Store | Bank | Bakery | Shopping Mall | Café |
| 9 | East York | 2 | Pizza Place | Fast Food Restaurant | Brewery | Bakery | Athletics & Sports | Intersection | Pharmacy | Bus Line | Gym / Fitness Center | Pet Store |
| 11 | North York | 2 | Grocery Store | Fast Food Restaurant | Pizza Place | Coffee Shop | Park | Gym | Gas Station | Mediterranean Restaurant | Gym / Fitness Center | Pet Store |
| 12 | Etobicoke | 2 | Park | Pizza Place | Hotel | Convenience Store | Grocery Store | Bank | Gym | Mexican Restaurant | Fish & Chips Shop | Clothing Store |
| 17 | Etobicoke | 2 | Coffee Shop | Farmers Market | Shopping Mall | College Rec Center | Liquor Store | Park | Beer Store | Café | Fish & Chips Shop | Shopping Plaza |
| 18 | Scarborough | 2 | Pizza Place | Fast Food Restaurant | Coffee Shop | Grocery Store | Bank | Restaurant | Liquor Store | Beer Store | Electronics Store | Pharmacy |
| 21 | York | 2 | Park | Mexican Restaurant | Pharmacy | Fast Food Restaurant | Bank | Food Truck | Falafel Restaurant | Bus Line | Beer Store | Sporting Goods Shop |
| 22 | Scarborough | 2 | Coffee Shop | Park | Indian Restaurant | Fast Food Restaurant | Chinese Restaurant | Mobile Phone Shop | Pharmacy | Sandwich Place | Churrascaria | General Travel |
| 26 | Scarborough | 2 | Coffee Shop | Bakery | Gas Station | Indian Restaurant | Bank | Fried Chicken Joint | Hakka Restaurant | Board Shop | Bus Line | Intersection |


In [27]:
df_toronto_clust.loc[df_toronto_clust['Cluster Labels'] == 3, df_toronto_clust.columns[[1] + list(range(5, df_toronto_clust.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 48 | North York | 3 | Park | Pool | Sandwich Place | Other Repair Shop | BBQ Joint | Churrascaria | General Travel | Ramen Restaurant | Music School | Bus Line |


In [28]:
df_toronto_clust.loc[df_toronto_clust['Cluster Labels'] == 4, df_toronto_clust.columns[[1] + list(range(5, df_toronto_clust.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 94 | Etobicoke | 4 | Rental Car Location | Hotel | Coffee Shop | Baseball Stadium | Auto Garage | BBQ Joint | Churrascaria | General Travel | Ramen Restaurant | Music School |


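- **Clusters 3 and 4 contain only a single neighborhood each, and cluster 1 contains only two. These could be regarded as outliers; a density-based method such as DBSCAN, which labels isolated points as noise instead of giving them their own cluster, may be more suitable here.**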
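As an illustration of the density-based alternative, here is a minimal pure-Python sketch of DBSCAN: a point with at least `min_samples` neighbors within distance `eps` (counting itself, as scikit-learn does) is a core point, clusters grow outward from core points, and points reachable from no core point are labeled `-1` (noise) rather than forming a cluster of their own. The `dbscan` function, the `eps`/`min_samples` values, and the 2-D points are hypothetical; scikit-learn's `sklearn.cluster.DBSCAN` uses the same `eps` and `min_samples` parameter names and also marks noise as `-1`.

```python
import math


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    def neighbors(i):
        # indices within eps of point i, including i itself
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)   # None = not visited yet
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1          # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)   # j is a core point: keep expanding
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),   # dense group
          (10, 10), (10, 11), (11, 10),     # second dense group
          (50, 50)]                         # isolated point
print(dbscan(points, eps=1.5, min_samples=3))   # [0, 0, 0, 0, 1, 1, 1, -1]
```

Applied to the venue-frequency matrix, suitable `eps` and `min_samples` values would have to be tuned; they are not known from this analysis.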