# Segmenting and Clustering Data (Week 3 Assignment)

## Part 1: Getting the Data

First, install the necessary libraries:

In [1]:
!pip install beautifulsoup4
!pip install lxml
#!pip install requests

Collecting beautifulsoup4
  Downloading https://files.pythonhosted.org/packages/3b/c8/a55eb6ea11cd7e5ac4bacdf92bac4693b90d3ba79268be16527555e186f0/beautifulsoup4-4.8.1-py3-none-any.whl (101kB)
Collecting soupsieve>=1.2 (from beautifulsoup4)
  Downloading https://files.pythonhosted.org/packages/81/94/03c0f04471fc245d08d0a99f7946ac228ca98da4fa75796c507f61e688c2/soupsieve-1.9.5-py2.py3-none-any.whl
Installing collected packages: soupsieve, beautifulsoup4
Successfully installed beautifulsoup4-4.8.1 soupsieve-1.9.5
Collecting lxml
  Downloading https://files.pythonhosted.org/packages/ec/be/5ab8abdd8663c0386ec2dd595a5bc0e23330a0549b8a91e32f38c20845b6/lxml-4.4.1-cp36-cp36m-manylinux1_x86_64.whl (5.8MB)
Installing collected packages: lxml
Successfully installed lxml-4.4.1


<hr>
Import all necessary libraries:

In [2]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np

print("Libraries imported")

Libraries imported


<hr>
Store the HTML and table data in Python variables:

In [3]:
html = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").text

wikiPage = BeautifulSoup(html, "lxml")
    
postalTable = wikiPage.find("table")

<hr>

Create a list of headers for the column names.<br>
The following code loops through all the <code>\<th></code> tags, which contain the names of the columns, and stores the names in a list.<br>
(It also strips the trailing \n from the last header name.)

In [4]:
headers = []

for headName in postalTable.tbody.tr.find_all("th"):
    headers.append(headName.text.replace("\n", ""))
    
print(headers)

['Postcode', 'Borough', 'Neighbourhood']


<hr>

Create a list of nested lists as rows to populate the table.<br>
The following code loops through all the <code>\<tr></code> tags, which contain the values for the rows.<br>
It loops through every <code>\<td></code> tag in the <code>\<tr></code> tags, which are the individual cells in each row.<br>
Lastly, it deletes the first row, which comes out empty because the header row contains <code>\<th></code> cells rather than <code>\<td></code> cells.<br>
(It also strips the trailing \n from the last cell of each row.)

In [5]:
rows = []

for row in postalTable.tbody.find_all("tr"):
    rows.append([])
    for cell in row.find_all("td"):
        rows[-1].append(cell.text.replace("\n", ""))
        
del(rows[0])
print(len(rows), "rows")
print(rows[0:5])

288 rows
[['M1A', 'Not assigned', 'Not assigned'], ['M2A', 'Not assigned', 'Not assigned'], ['M3A', 'North York', 'Parkwoods'], ['M4A', 'North York', 'Victoria Village'], ['M5A', 'Downtown Toronto', 'Harbourfront']]


<hr>

Create a data frame using the <code>headers</code> list for the column names and the <code>rows</code> list for the rows.<br>
It also assigns the data frame to a shorter variable name, <code>nht</code>.

In [6]:
neighborhoodTable = pd.DataFrame(columns=headers, data=rows)

nht = neighborhoodTable

nht

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
...,...,...,...
283,M8Z,Etobicoke,Mimico NW
284,M8Z,Etobicoke,The Queensway West
285,M8Z,Etobicoke,Royal York South West
286,M8Z,Etobicoke,South of Bloor


<hr>

The following code cleans the data frame:<br>
<ul>
    <li>It renames the "Postcode" column to "PostalCode"</li>
    <li>It replaces every "Not assigned" cell with a <code>NaN</code> value</li>
    <li>It drops the rows where "Borough" has a <code>NaN</code> value and resets the index</li>
    <li>It replaces each <code>NaN</code> value in "Neighbourhood" with the corresponding value in "Borough"</li>
</ul>

In [7]:
nht.rename(columns={"Postcode":"PostalCode"}, inplace=True)

nht.replace("Not assigned", np.nan, inplace=True)

nht.dropna(subset=["Borough"], inplace=True)
nht.reset_index(drop=True, inplace=True)

# Fill each missing neighbourhood with the borough from the same row.
# (A frame-wide replace of NaN would overwrite every NaN with one borough.)
nht["Neighbourhood"] = nht["Neighbourhood"].fillna(nht["Borough"])
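
The cleaning steps listed above can be tried on a small hand-made frame, with `fillna` handling the last step. This is only a sketch: the three sample rows and the name `sample` are made up for the example and are not taken from the scraped table.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows in the same shape as the scraped table.
sample = pd.DataFrame({
    "PostalCode": ["M1A", "M3A", "M7A"],
    "Borough": ["Not assigned", "North York", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Not assigned"],
})

# Turn the placeholder text into real missing values.
sample = sample.replace("Not assigned", np.nan)

# Drop rows without a borough, then renumber the index.
sample = sample.dropna(subset=["Borough"]).reset_index(drop=True)

# Fill a missing neighbourhood with the borough from the same row.
sample["Neighbourhood"] = sample["Neighbourhood"].fillna(sample["Borough"])

print(sample)
```

Because `fillna` aligns the two columns by index, each row only ever receives its own borough.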

<hr>

The following code merges all the neighborhoods that share a postal code into a single string.<br>
It loops through the unique postal codes, and each iteration loops through every row of the table.<br>
If a row's postal code matches the current unique postal code, its neighborhood is appended to a comma-separated string.<br>
This is done for every postal code to group the neighborhoods.<br>
Each result is stored as a nested list in a new list of merged rows.<br>
I think there is probably an easier way of doing this, but I couldn't figure it out.

In [8]:
mergedRows = []

for indexP, postcode in enumerate(nht["PostalCode"].unique()):
    neighborhoods = ""
    for indexB, borough in enumerate(nht["Borough"]):
        if (nht["PostalCode"][indexB] == postcode):
            neighborhoods = neighborhoods + nht["Neighbourhood"][indexB] + ", "
            newIndex = indexB
    neighborhoods = neighborhoods[:-2]  # drop the trailing ", "
    mergedRows.append([postcode, nht["Borough"][newIndex], neighborhoods])
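
One possible shortcut for the grouping above is `groupby` with a string join. The sketch below is an assumption about how that could look, not the method used in this notebook; the sample rows and the names `sample` and `grouped` are made up for illustration.

```python
import pandas as pd

# Hypothetical rows: two neighbourhoods share the postal code M5A.
sample = pd.DataFrame({
    "PostalCode": ["M5A", "M5A", "M3A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
    "Neighbourhood": ["Harbourfront", "Regent Park", "Parkwoods"],
})

grouped = (
    # sort=False keeps postal codes in order of first appearance.
    sample.groupby("PostalCode", sort=False)
    # Keep the last borough per group and join the neighbourhood names.
    .agg({"Borough": "last", "Neighbourhood": ", ".join})
    .reset_index()
)
print(grouped)
```

Taking the `"last"` borough mirrors the loop, which keeps the borough of the last matching row.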

<hr>

The following code creates a data frame with the same headers as before, but each row now holds all the neighborhoods for one postal code.<br>
Now, every postal code appears once, with its borough and a combined list of its neighborhoods.

In [9]:
mergedTable = pd.DataFrame(columns=headers, data=mergedRows)

nht2 = mergedTable

nht2

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Harbourfront, Regent Park"
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Queen's Park,Queen's Park
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,Business Reply Mail Processing Centre 969 Eastern
101,M8Y,Etobicoke,"Humber Bay, King's Mill Park, Kingsway Park So..."


<hr>

Finally, I print the shape of the resulting data frame.

In [10]:
nht2.shape

(103, 3)

<hr>
This is the end of Part 1
<hr>

## Part 2: Getting the Coordinates

I wasn't able to use the Geocoder Python package to get the coordinates, so I used the CSV file instead.

In [11]:
postCoords = pd.read_csv("https://cocl.us/Geospatial_data")
postCoords

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
...,...,...,...
98,M9N,43.706876,-79.518188
99,M9P,43.696319,-79.532242
100,M9R,43.688905,-79.554724
101,M9V,43.739416,-79.588437


<hr>

Before combining the tables, I sorted the rows from the first table so the Postal Codes matched the rows in the coordinate table.

In [12]:
nhtSorted = nht2.sort_values("Postcode")
nhtSorted.reset_index(drop=True, inplace=True)
nhtSorted.rename(columns={"Postcode":"Postal Code"}, inplace=True)
nhtSorted

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
...,...,...,...
98,M9N,York,Weston
99,M9P,Etobicoke,Westmount
100,M9R,Etobicoke,"Kingsview Village, Martin Grove Gardens, Richv..."
101,M9V,Etobicoke,"Albion Gardens, Beaumond Heights, Humbergate, ..."


<hr>

Lastly, I merged the two tables into a single table that has the Postal Code, Borough, Neighborhood, and Coordinates.

In [13]:
newTable = pd.merge(nhtSorted, postCoords, on="Postal Code")
newTable

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
...,...,...,...,...,...
98,M9N,York,Weston,43.706876,-79.518188
99,M9P,Etobicoke,Westmount,43.696319,-79.532242
100,M9R,Etobicoke,"Kingsview Village, Martin Grove Gardens, Richv...",43.688905,-79.554724
101,M9V,Etobicoke,"Albion Gardens, Beaumond Heights, Humbergate, ...",43.739416,-79.588437
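
`pd.merge` pairs rows by the key column rather than by position, so the two tables do not strictly have to be in the same order before merging. The sketch below shows this on two rows whose coordinates are copied from the table above; the names `left`, `right`, and `merged` are made up, and the `validate` argument is an optional extra check that the notebook itself does not use.

```python
import pandas as pd

left = pd.DataFrame({
    "Postal Code": ["M1C", "M1B"],  # deliberately out of order
    "Borough": ["Scarborough", "Scarborough"],
})
right = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# validate="one_to_one" raises an error if a postal code repeats on either side.
merged = pd.merge(left, right, on="Postal Code", validate="one_to_one")
print(merged)
```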


<hr>
This is the end of Part 2
<hr>

## Part 3: Exploration and Clustering

Importing/Installing the necessary libraries: 

In [14]:
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium


<hr>

Getting the coordinates for Toronto, Ontario:

In [15]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="CA_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto, Ontario are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto, Ontario are 43.653963, -79.387207.


<hr>

The following code plots a map with one marker for each postal code in the table from the previous section, labeled with its postal code and borough:

In [16]:
map_Toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, borough, postcode in zip(newTable['Latitude'], newTable['Longitude'], newTable['Borough'], newTable['Postal Code']):
    label = '{}, {}'.format(postcode, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Toronto)  
    
map_Toronto

In [17]:
CLIENT_ID = '********'
CLIENT_SECRET = '********'
VERSION = '20180605'

LIMIT = 100
radius = 500

<hr>

I used the code from the lab to get a list of nearby venues for every postal code in the table, labeled by borough.

In [18]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough', 
                  'Borough Latitude', 
                  'Borough Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
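
The list comprehension in `getNearbyVenues` assumes that every reply contains a group and that every venue has at least one category. A more defensive parser could look like the sketch below. The helper `parse_venues` and the sample payload are hypothetical: the payload only mimics the nesting that the notebook code indexes into (`response`, `groups`, `items`, `venue`) and is not a real API reply.

```python
def parse_venues(payload, name, lat, lng):
    """Flatten one explore-style payload into row tuples (hypothetical helper)."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None when a venue has no category listed.
        category = categories[0]["name"] if categories else None
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Made-up payload for illustration only.
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Cafe A", "location": {"lat": 43.65, "lng": -79.38},
                   "categories": [{"name": "Coffee Shop"}]}},
        {"venue": {"name": "Spot B", "location": {"lat": 43.66, "lng": -79.39},
                   "categories": []}},
    ]}]}
}

rows = parse_venues(sample_payload, "Downtown Toronto", 43.654, -79.387)
print(rows)
```

An empty or malformed reply then yields an empty list instead of raising a `KeyError` or `IndexError` partway through the loop.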

In [19]:
ontario_venues = getNearbyVenues(names=newTable['Borough'],
                                   latitudes=newTable['Latitude'],
                                   longitudes=newTable['Longitude']
                                  )

Scarborough
Scarborough
Scarborough
Scarborough
Scarborough
Scarborough
Scarborough
Scarborough
Scarborough
Scarborough
Scarborough
Scarborough
Scarborough
Scarborough
Scarborough
Scarborough
Scarborough
North York
North York
North York
North York
North York
North York
North York
North York
North York
North York
North York
North York
North York
North York
North York
North York
North York
North York
East York
East York
East Toronto
East York
East York
East York
East Toronto
East Toronto
East Toronto
Central Toronto
Central Toronto
Central Toronto
Central Toronto
Central Toronto
Central Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
North York
Central Toronto
Central Toronto
Central Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
North York
North York
York
York
Downtown Toronto
Wes

<hr>

Once again, I used the code from the lab to one-hot encode the venue categories, so that each category becomes its own 0/1 column.

In [20]:
ontario_onehot = pd.get_dummies(ontario_venues[['Venue Category']], prefix="", prefix_sep="")

ontario_onehot['Borough'] = ontario_venues['Borough'] 

fixed_columns = [ontario_onehot.columns[-1]] + list(ontario_onehot.columns[:-1])
ontario_onehot = ontario_onehot[fixed_columns]

ontario_onehot.head()

Unnamed: 0,Borough,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Scarborough,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Scarborough,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Scarborough,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Scarborough,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Scarborough,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
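
To see what the encoding does on a smaller scale, here is a sketch on three made-up venues. The frame `venues` and its values are invented for the example; `dtype=int` is passed so that the indicator columns hold 0/1 rather than booleans.

```python
import pandas as pd

# Hypothetical venues: one row per venue, as in ontario_venues.
venues = pd.DataFrame({
    "Borough": ["Scarborough", "Scarborough", "York"],
    "Venue Category": ["Coffee Shop", "Park", "Park"],
})

# One indicator column per category, named after the category itself.
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)

# Put the borough name in front of the indicator columns.
onehot.insert(0, "Borough", venues["Borough"])
print(onehot)
```

Using `insert(0, ...)` is an alternative to appending the column and then reordering; both leave "Borough" as the first column.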


<hr>

Grouping the rows by borough and taking the mean of each category column, which gives the share of each borough's venues that fall in that category.

In [21]:
ontario_grouped = ontario_onehot.groupby('Borough').mean().reset_index()
ontario_grouped

Unnamed: 0,Borough,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Central Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018018,...,0.0,0.009009,0.0,0.0,0.009009,0.0,0.0,0.0,0.0,0.009009
1,Downtown Toronto,0.0,0.000782,0.000782,0.000782,0.000782,0.001564,0.001564,0.001564,0.016419,...,0.002346,0.010946,0.002346,0.0,0.005473,0.0,0.007037,0.000782,0.0,0.002346
2,East Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023622,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023622
3,East York,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.013158
4,Etobicoke,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014085,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014085,0.0,0.0
5,Mississauga,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,North York,0.004016,0.0,0.004016,0.0,0.0,0.0,0.0,0.0,0.008032,...,0.0,0.0,0.004016,0.004016,0.008032,0.0,0.0,0.004016,0.012048,0.0
7,Queen's Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.02381
8,Scarborough,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,...,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.0
9,West Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.011364,0.0,0.0,0.011364,0.0,0.005682,0.0,0.0,0.005682
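
A toy version of this step, on a made-up one-hot frame, shows the arithmetic. The rows below are invented; each one stands for a single venue.

```python
import pandas as pd

# Hypothetical one-hot rows: 2 venues in Scarborough, 4 in York.
onehot = pd.DataFrame({
    "Borough": ["Scarborough", "Scarborough", "York", "York", "York", "York"],
    "Coffee Shop": [1, 0, 0, 0, 0, 1],
    "Park": [0, 1, 1, 1, 1, 0],
})

# The mean of a 0/1 column is the fraction of rows where it equals 1.
grouped = onehot.groupby("Borough").mean().reset_index()
print(grouped)
```

Since every venue has exactly one category, the values in each row of the result add up to 1.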


<hr>

Using the code from the lab to get the most common venues for each borough.

In [22]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [23]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['Borough']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

boroughs_venues_sorted = pd.DataFrame(columns=columns)
boroughs_venues_sorted['Borough'] = ontario_grouped['Borough']

for ind in np.arange(ontario_grouped.shape[0]):
    boroughs_venues_sorted.iloc[ind, 1:] = return_most_common_venues(ontario_grouped.iloc[ind, :], num_top_venues)

boroughs_venues_sorted.head()

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central Toronto,Coffee Shop,Sandwich Place,Park,Café,Sushi Restaurant,Clothing Store,Pizza Place,Dessert Shop,Restaurant,Pub
1,Downtown Toronto,Coffee Shop,Café,Restaurant,Hotel,Bakery,Italian Restaurant,Bar,Japanese Restaurant,Seafood Restaurant,American Restaurant
2,East Toronto,Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Brewery,Café,Pizza Place,Sandwich Place,Light Rail Station,Yoga Studio
3,East York,Coffee Shop,Burger Joint,Park,Sporting Goods Shop,Sandwich Place,Bank,Pizza Place,Pharmacy,Liquor Store,Grocery Store
4,Etobicoke,Pizza Place,Sandwich Place,Pharmacy,Park,Discount Store,Café,Gym,Fast Food Restaurant,Coffee Shop,Fried Chicken Joint
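
As a rough alternative to sorting the whole row, `Series.nlargest` can pick the top categories directly. The sketch below uses a made-up row; the frequencies and the names `row` and `top3` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical row in the shape of ontario_grouped: label first, then frequencies.
row = pd.Series({
    "Borough": "York",
    "Coffee Shop": 0.25,
    "Park": 0.50,
    "Trail": 0.15,
    "Bank": 0.10,
})

# Skip the borough label and convert to float, since nlargest needs numeric data.
frequencies = row.iloc[1:].astype(float)
top3 = frequencies.nlargest(3).index.tolist()
print(top3)
```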


<hr>
<hr>

Now the clustering itself is run, using k-means with five clusters on the venue frequencies of each borough:

In [24]:
kclusters = 5

ontario_grouped_clustering = ontario_grouped.drop(columns='Borough')

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(ontario_grouped_clustering)

kmeans.labels_[0:10] 

array([0, 0, 0, 4, 4, 1, 0, 3, 4, 0], dtype=int32)
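
For intuition about what `KMeans` does with frequency rows like these, here is a minimal sketch on four made-up points with two features. The array `features` is invented, and `n_init` is set explicitly because its default has changed across scikit-learn releases.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up frequency rows: two coffee-heavy and two park-heavy "boroughs".
features = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
])

# Fit two clusters; random_state makes the run repeatable.
model = KMeans(n_clusters=2, random_state=0, n_init=10).fit(features)
labels = model.labels_
print(labels)
```

The label numbers themselves are arbitrary; only which rows share a label is meaningful. With only a handful of boroughs and five clusters, single-member clusters are to be expected.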

<hr>

Similar to the lab, I merged the original table that had the Postal Codes, Boroughs, Neighborhoods, and Coordinates with the table that was just generated with the most common venues for each borough.

In [25]:
boroughs_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

ontario_merged = newTable

ontario_merged = ontario_merged.join(boroughs_venues_sorted.set_index('Borough'), on='Borough')

ontario_merged.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353,4,Fast Food Restaurant,Breakfast Spot,Coffee Shop,Pizza Place,Chinese Restaurant,Bakery,Playground,Skating Rink,Park,Thai Restaurant
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,4,Fast Food Restaurant,Breakfast Spot,Coffee Shop,Pizza Place,Chinese Restaurant,Bakery,Playground,Skating Rink,Park,Thai Restaurant
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,4,Fast Food Restaurant,Breakfast Spot,Coffee Shop,Pizza Place,Chinese Restaurant,Bakery,Playground,Skating Rink,Park,Thai Restaurant
3,M1G,Scarborough,Woburn,43.770992,-79.216917,4,Fast Food Restaurant,Breakfast Spot,Coffee Shop,Pizza Place,Chinese Restaurant,Bakery,Playground,Skating Rink,Park,Thai Restaurant
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,4,Fast Food Restaurant,Breakfast Spot,Coffee Shop,Pizza Place,Chinese Restaurant,Bakery,Playground,Skating Rink,Park,Thai Restaurant


<hr>

Lastly, a map is generated with the different clusters marked with different color markers.

In [26]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

markers_colors = []
for lat, lon, poi, cluster in zip(ontario_merged['Latitude'], ontario_merged['Longitude'], ontario_merged['Borough'], ontario_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<hr>

### Cluster Analysis

### Cluster 1

In [27]:
ontario_merged.loc[ontario_merged['Cluster Labels'] == 0, ontario_merged.columns[[1] + list(range(5, ontario_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,North York,0,Coffee Shop,Clothing Store,Fast Food Restaurant,Park,Japanese Restaurant,Pizza Place,Restaurant,Café,Bank,Sandwich Place
18,North York,0,Coffee Shop,Clothing Store,Fast Food Restaurant,Park,Japanese Restaurant,Pizza Place,Restaurant,Café,Bank,Sandwich Place
19,North York,0,Coffee Shop,Clothing Store,Fast Food Restaurant,Park,Japanese Restaurant,Pizza Place,Restaurant,Café,Bank,Sandwich Place
20,North York,0,Coffee Shop,Clothing Store,Fast Food Restaurant,Park,Japanese Restaurant,Pizza Place,Restaurant,Café,Bank,Sandwich Place
21,North York,0,Coffee Shop,Clothing Store,Fast Food Restaurant,Park,Japanese Restaurant,Pizza Place,Restaurant,Café,Bank,Sandwich Place
...,...,...,...,...,...,...,...,...,...,...,...,...
83,West Toronto,0,Bar,Coffee Shop,Café,Bakery,Italian Restaurant,Restaurant,Pizza Place,Park,Bookstore,Diner
84,West Toronto,0,Bar,Coffee Shop,Café,Bakery,Italian Restaurant,Restaurant,Pizza Place,Park,Bookstore,Diner
87,East Toronto,0,Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Brewery,Café,Pizza Place,Sandwich Place,Light Rail Station,Yoga Studio
96,North York,0,Coffee Shop,Clothing Store,Fast Food Restaurant,Park,Japanese Restaurant,Pizza Place,Restaurant,Café,Bank,Sandwich Place


### Cluster 2

In [28]:
ontario_merged.loc[ontario_merged['Cluster Labels'] == 1, ontario_merged.columns[[1] + list(range(5, ontario_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
86,Mississauga,1,Hotel,Coffee Shop,Mediterranean Restaurant,Middle Eastern Restaurant,Gym / Fitness Center,Fried Chicken Joint,Sandwich Place,American Restaurant,Burrito Place,Dim Sum Restaurant


### Cluster 3

In [29]:
ontario_merged.loc[ontario_merged['Cluster Labels'] == 2, ontario_merged.columns[[1] + list(range(5, ontario_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
73,York,2,Park,Convenience Store,Trail,Restaurant,Sandwich Place,Check Cashing Service,Field,Hockey Arena,Fast Food Restaurant,Bus Line
74,York,2,Park,Convenience Store,Trail,Restaurant,Sandwich Place,Check Cashing Service,Field,Hockey Arena,Fast Food Restaurant,Bus Line
80,York,2,Park,Convenience Store,Trail,Restaurant,Sandwich Place,Check Cashing Service,Field,Hockey Arena,Fast Food Restaurant,Bus Line
81,York,2,Park,Convenience Store,Trail,Restaurant,Sandwich Place,Check Cashing Service,Field,Hockey Arena,Fast Food Restaurant,Bus Line
98,York,2,Park,Convenience Store,Trail,Restaurant,Sandwich Place,Check Cashing Service,Field,Hockey Arena,Fast Food Restaurant,Bus Line


### Cluster 4

In [30]:
ontario_merged.loc[ontario_merged['Cluster Labels'] == 3, ontario_merged.columns[[1] + list(range(5, ontario_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
85,Queen's Park,3,Coffee Shop,Gym,Diner,Park,College Cafeteria,Sandwich Place,Portuguese Restaurant,Nightclub,Mexican Restaurant,Japanese Restaurant


### Cluster 5

In [31]:
ontario_merged.loc[ontario_merged['Cluster Labels'] == 4, ontario_merged.columns[[1] + list(range(5, ontario_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Scarborough,4,Fast Food Restaurant,Breakfast Spot,Coffee Shop,Pizza Place,Chinese Restaurant,Bakery,Playground,Skating Rink,Park,Thai Restaurant
1,Scarborough,4,Fast Food Restaurant,Breakfast Spot,Coffee Shop,Pizza Place,Chinese Restaurant,Bakery,Playground,Skating Rink,Park,Thai Restaurant
2,Scarborough,4,Fast Food Restaurant,Breakfast Spot,Coffee Shop,Pizza Place,Chinese Restaurant,Bakery,Playground,Skating Rink,Park,Thai Restaurant
3,Scarborough,4,Fast Food Restaurant,Breakfast Spot,Coffee Shop,Pizza Place,Chinese Restaurant,Bakery,Playground,Skating Rink,Park,Thai Restaurant
4,Scarborough,4,Fast Food Restaurant,Breakfast Spot,Coffee Shop,Pizza Place,Chinese Restaurant,Bakery,Playground,Skating Rink,Park,Thai Restaurant
5,Scarborough,4,Fast Food Restaurant,Breakfast Spot,Coffee Shop,Pizza Place,Chinese Restaurant,Bakery,Playground,Skating Rink,Park,Thai Restaurant
6,Scarborough,4,Fast Food Restaurant,Breakfast Spot,Coffee Shop,Pizza Place,Chinese Restaurant,Bakery,Playground,Skating Rink,Park,Thai Restaurant
7,Scarborough,4,Fast Food Restaurant,Breakfast Spot,Coffee Shop,Pizza Place,Chinese Restaurant,Bakery,Playground,Skating Rink,Park,Thai Restaurant
8,Scarborough,4,Fast Food Restaurant,Breakfast Spot,Coffee Shop,Pizza Place,Chinese Restaurant,Bakery,Playground,Skating Rink,Park,Thai Restaurant
9,Scarborough,4,Fast Food Restaurant,Breakfast Spot,Coffee Shop,Pizza Place,Chinese Restaurant,Bakery,Playground,Skating Rink,Park,Thai Restaurant


Note that the headings above number the clusters 1 to 5, while the <code>Cluster Labels</code> column uses 0 to 4.<br>
Cluster 1 mainly groups boroughs whose most common venue is a coffee shop, although it also includes West Toronto (bars) and East Toronto (Greek restaurants).<br>
Cluster 2 is a single borough that lies farther away from the rest; it is the only point in Mississauga.<br>
Cluster 3 contains only York, where parks are the most common venue.<br>
Cluster 4 is also a single borough, but it sits in the middle of the points for cluster 1; it is the only point in Queen's Park.<br>
Finally, cluster 5 seems to group boroughs whose main venues are places to eat.<br><br>
It is interesting that cluster 5 has points on either side of cluster 1, while cluster 3 almost seems to be part of cluster 1.
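
Because the cluster tables repeat each borough once per postal code, a compact per-cluster summary can be easier to read. The sketch below is one way to build it; the frame `merged` holds a few rows in the shape of `ontario_merged`, and the name `summary` is made up.

```python
import pandas as pd

# A few rows in the shape of ontario_merged (other columns omitted).
merged = pd.DataFrame({
    "Postal Code": ["M1B", "M1C", "M9N", "M7A"],
    "Borough": ["Scarborough", "Scarborough", "York", "Queen's Park"],
    "Cluster Labels": [4, 4, 2, 3],
})

summary = (
    # Keep one row per borough, then list the boroughs in each cluster.
    merged.drop_duplicates(subset=["Borough"])
    .groupby("Cluster Labels")["Borough"]
    .apply(list)
)
print(summary)
```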