# Jupyter Notebook for Week 3 Project
## IBM Applied Data Science Capstone Course
### Charles Vuono

In this notebook, we examine neighborhoods in the central boroughs of the Toronto metropolitan area (the boroughs with "Toronto" or "York" in their names) by applying *k-means clustering* to the venue-category frequencies of each neighborhood, as provided by the developer API of *Foursquare*.

## PART ONE
# Creating the Neighborhoods data frame
After loading the relevant libraries, the data frame is created by scraping the Wikipedia page on Toronto postal codes, *https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M*.



In [2]:
# Load Libraries 
import random # library for random number generation
import numpy as np # library for vectorized computation
import pandas as pd # library to process data as dataframes

import matplotlib.pyplot as plt # plotting library
# backend for rendering plots within the browser
%matplotlib inline 

from sklearn.cluster import KMeans 
from sklearn.datasets import make_blobs # the samples_generator submodule is gone from newer scikit-learn releases

import requests
import urllib.request

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't installed geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values


# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't installed folium
import folium # map rendering library



print('Libraries imported.')


In [72]:
# Reading url into a data frame

url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
r = requests.get(url)
wikdf_list = pd.read_html(r.text) # parse every table on the page into a list of data frames
wikdf = wikdf_list[0]                # the postal code table is the first element of that list
wikdf = wikdf[wikdf['Borough']!='Not assigned']     # drop rows whose Borough is "Not assigned"
wikdf.head(10)


Unnamed: 0,Postal Code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"
11,M3B,North York,Don Mills
12,M4B,East York,"Parkview Hill, Woodbine Gardens"
13,M5B,Downtown Toronto,"Garden District, Ryerson"


In [75]:
# Find the shape of the data frame
print("The data frame has", wikdf.shape[0], "observations of", wikdf.shape[1], "variables.")

The data frame has 103 observations of 3 variables.


## PART TWO
# Adding geospatial data to the data frame
Longitude and latitude data are added to the data frame by merging it with geospatial data provided in a CSV file available at *http://cocl.us/Geospatial_data*. The neighborhoods are then mapped using the *Folium* library.



In [76]:
# Load the longitude and latitudes into a df and merge into the main df with a left join

geourl = 'http://cocl.us/Geospatial_data'
csvdf = pd.read_csv(geourl)
neighborhoods = pd.merge(wikdf,csvdf,on='Postal Code',how='left')

neighborhoods.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
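A left join keeps every postal code from the Wikipedia table, so a code with no match in the CSV would silently receive empty coordinates. The sketch below shows one way to check for that with the `indicator` option of `pd.merge`. The two small tables are hypothetical stand-ins for `wikdf` and `csvdf`, and the missing `M4A` row is deliberately left out for illustration; it is not a claim about the real CSV.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables merged above.
codes = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M5A"],
    "Borough": ["North York", "North York", "Downtown Toronto"],
})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M5A"],  # M4A omitted on purpose for this example
    "Latitude": [43.753259, 43.65426],
    "Longitude": [-79.329656, -79.360636],
})

# indicator=True adds a "_merge" column recording where each row came from.
checked = pd.merge(codes, coords, on="Postal Code", how="left", indicator=True)

# Rows marked "left_only" found no coordinates and carry NaN latitude/longitude.
missing = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(missing)
```

Running the same check on the real tables would be a matter of swapping in `wikdf` and `csvdf`; an empty list would confirm that every postal code received coordinates.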


In [7]:
# Define a user_agent for an instance of geocoder for Toronto
address = 'Toronto, On'

geolocator = Nominatim(user_agent="yyz_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [8]:
# create map of Toronto using latitude and longitude values
map_yyz = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_yyz)  
    
map_yyz

## PART THREE
# Exploring the Neighborhoods
Next we investigate the characteristics of the neighborhoods in the central part of metropolitan Toronto, which we define as the boroughs with either "Toronto" or "York" in their names. The data frame is extended with venue information for each neighborhood from the *Foursquare* developer API. The venue information is consolidated into category frequencies for each neighborhood, which lets us run a *k-means clustering* algorithm to group similar neighborhoods based on the venues located within them. Finally, we investigate and map the clustered neighborhoods.

# Step One: Acquiring the *Foursquare* data for the Toronto and York neighborhoods

In [77]:
# Define Foursquare credentials

CLIENT_ID = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' # Foursquare ID
CLIENT_SECRET = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' # Foursquare Secret
VERSION = '20180605' # Foursquare API version

In [42]:
# Create a function to explore neighborhoods with Foursquare data

def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=100):
    """Return a data frame of Foursquare venues found near each neighborhood."""
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)  # passed as an argument rather than read from a global defined in a later cell
                    
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
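The function above indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, which would raise an error if a response ever came back without groups or a venue had no category. Whether the API actually returns such responses is an assumption here; the source does not show one. As a minimal sketch, the parsing step could be separated into a helper (`parse_explore_items` is a hypothetical name) that tolerates those cases and can be exercised offline on a hand-written payload of the same shape:

```python
def parse_explore_items(payload, name, lat, lng):
    """Flatten one explore response into rows like those built in getNearbyVenues."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # Fall back to None rather than raising IndexError on an empty list.
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# Hypothetical payload shaped like the JSON indexed in the function above.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Tandem Coffee",
               "location": {"lat": 43.653559, "lng": -79.361809},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Uncategorized Place",
               "location": {"lat": 43.65, "lng": -79.36},
               "categories": []}},
]}]}}

rows = parse_explore_items(sample, "Regent Park, Harbourfront", 43.65426, -79.360636)
empty = parse_explore_items({"response": {}}, "Nowhere", 0.0, 0.0)
print(len(rows), len(empty))
```

Keeping the parsing separate from the network call also makes it possible to test the row layout without Foursquare credentials.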

In [58]:
# Get venues in Toronto and York

# limit to neighborhoods in Toronto and York
yyz_data = pd.concat([neighborhoods[neighborhoods['Borough'].str.contains("Toronto")].reset_index(drop=True), neighborhoods[neighborhoods['Borough'].str.contains("York")].reset_index(drop=True)])

# Get the venues using the getNearbyVenues function created above and store in a Pandas data frame yyz_venues

LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
      
yyz_venues = getNearbyVenues(names=yyz_data['Neighborhood'],latitudes=yyz_data['Latitude'], longitudes=yyz_data['Longitude'])
yyz_venues.head()


Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
The Danforth West, Riverdale
Toronto Dominion Centre, Design Exchange
Brockton, Parkdale Village, Exhibition Place
India Bazaar, The Beaches West
Commerce Court, Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North & West, Forest Hill Road Park
High Park, The Junction South
North Toronto West, Lawrence Park
The Annex, North Midtown, Yorkville
Parkdale, Roncesvalles
Davisville
University of Toronto, Harbord
Runnymede, Swansea
Moore Park, Summerhill East
Kensington Market, Chinatown, Grange Park
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
R

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot
3,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
4,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
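The borough filter above concatenates two separately re-indexed frames, so index labels repeat across the "Toronto" and "York" halves. A single regular-expression mask is one alternative that keeps the original row order and a unique index. The sketch below compares the two on a hypothetical five-row subset of the neighborhoods table; it illustrates the indexing behavior only and is not a replacement for the notebook's results.

```python
import pandas as pd

# Hypothetical subset of the neighborhoods table, in its original row order.
toy = pd.DataFrame({
    "Borough": ["North York", "Downtown Toronto", "Scarborough",
                "East York", "West Toronto"],
    "Neighborhood": ["Parkwoods", "Regent Park, Harbourfront", "Malvern, Rouge",
                     "Parkview Hill, Woodbine Gardens", "Dufferin, Dovercourt Village"],
})

# Approach used above: two filtered frames, each re-indexed from 0, then stacked.
stacked = pd.concat([
    toy[toy["Borough"].str.contains("Toronto")].reset_index(drop=True),
    toy[toy["Borough"].str.contains("York")].reset_index(drop=True),
])
print(stacked.index.tolist())  # index labels repeat across the two halves

# Alternative: one regex mask, keeping source order and a unique index.
single = toy[toy["Borough"].str.contains("Toronto|York")].reset_index(drop=True)
print(single.index.tolist())
```

Repeated labels are harmless for the clustering itself, but they mean that `.loc` lookups by index label can return more than one row.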


In [59]:
# Describe the yyz_venues data frame

print('{} total venues were returned by Foursquare.'.format(yyz_venues.shape[0]))
print('There are {} unique categories.'.format(len(yyz_venues['Venue Category'].unique())))

1955 total venues were returned by Foursquare.
There are 259 unique categories.


# Step Two: Perform a *k-means clustering* on the venue data

In [60]:
# Create the dataframe that will be used for the clustering analysis

# one hot encoding
yyz_onehot = pd.get_dummies(yyz_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
yyz_onehot['Neighborhood'] = yyz_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [yyz_onehot.columns[-1]] + list(yyz_onehot.columns[:-1])
yyz_onehot = yyz_onehot[fixed_columns]

# create grouped data frame
yyz_grouped = yyz_onehot.groupby('Neighborhood').mean().reset_index()

yyz_grouped.head()

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
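To make the encoding step concrete, the following sketch applies the same one-hot and group-by-mean idea to a toy venue list. The neighborhoods `A` and `B` and their venues are hypothetical. Because each venue contributes a single 1 in its category column, the per-neighborhood mean is the share of that neighborhood's venues in each category, and each row sums to 1.

```python
import pandas as pd

# Hypothetical venue list: neighborhood "A" has three venues, "B" has two.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Park", "Park", "Bakery"],
})

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of 0/1 indicators is the share of venues in each category.
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Using `insert(0, ...)` places the label column first in one step. It also raises a `ValueError` if a category column with the same name already exists, which would surface a name collision instead of silently overwriting the indicator column.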


In [61]:
# Reshape the dataframe to list the most common venues by neighborhood

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = yyz_grouped['Neighborhood']

for ind in np.arange(yyz_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(yyz_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Pizza Place,Diner,Sandwich Place,Bridal Shop,Deli / Bodega,Restaurant,Ice Cream Shop,Supermarket
1,Bayview Village,Chinese Restaurant,Café,Bank,Japanese Restaurant,Discount Store,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Women's Store
2,"Bedford Park, Lawrence Manor East",Sushi Restaurant,Coffee Shop,Sandwich Place,Italian Restaurant,Restaurant,Thai Restaurant,Juice Bar,Liquor Store,Indian Restaurant,Pub
3,Berczy Park,Coffee Shop,Cocktail Bar,Pub,Restaurant,Beer Bar,Bakery,Café,Cheese Shop,Seafood Restaurant,Irish Pub
4,"Brockton, Parkdale Village, Exhibition Place",Café,Nightclub,Breakfast Spot,Coffee Shop,Climbing Gym,Furniture / Home Store,Burrito Place,Bakery,Stadium,Convenience Store
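When a neighborhood has fewer than ten distinct venue categories, the ranking above still fills all ten slots, and the remaining entries appear to be categories with a frequency of zero in whatever order the sort leaves them. A minimal sketch of one way to avoid that is to rank only the non-zero categories. The helper name `top_categories` and the frequency row are hypothetical.

```python
import pandas as pd


def top_categories(freq_row, n):
    """Return up to n category names with non-zero frequency, most frequent first."""
    nonzero = freq_row[freq_row > 0]
    # mergesort is stable, so tied categories keep their original column order.
    ranked = nonzero.sort_values(ascending=False, kind="mergesort")
    return ranked.index[:n].tolist()


# Hypothetical frequency row for a neighborhood with only two venue categories.
row = pd.Series({"Bakery": 0.0, "Park": 0.75, "Pool": 0.25, "Zoo": 0.0})
print(top_categories(row, 3))
```

The returned list can be shorter than `n`, so a table built from it would need padding (for example with empty strings) to keep a fixed number of columns.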


In [62]:
# CREATE THE CLUSTERS

# set number of clusters
kclusters = 5

yyz_grouped_clustering = yyz_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(yyz_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

# drop old clustering labels if they exist (allows the cell to be re-run)
if 'Cluster Labels' in neighborhoods_venues_sorted.columns:
    neighborhoods_venues_sorted = neighborhoods_venues_sorted.drop(columns='Cluster Labels')

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

yyz_merged = yyz_data

# join the sorted venue table onto yyz_data so each neighborhood keeps its latitude/longitude
yyz_merged = yyz_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

yyz_merged.head()


Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,0,Coffee Shop,Pub,Park,Bakery,Breakfast Spot,Theater,Café,Electronics Store,Shoe Store,Event Space
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0,Coffee Shop,Yoga Studio,Beer Bar,Smoothie Shop,Bar,Bank,Diner,Discount Store,Mexican Restaurant,Creperie
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,0,Clothing Store,Coffee Shop,Café,Cosmetics Shop,Japanese Restaurant,Middle Eastern Restaurant,Bubble Tea Shop,Italian Restaurant,Restaurant,Tea Room
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Coffee Shop,Café,Cocktail Bar,Gastropub,Restaurant,American Restaurant,Theater,Moroccan Restaurant,Seafood Restaurant,Hotel
4,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Coffee Shop,Health Food Store,Pub,Trail,Park,General Entertainment,Creperie,Doner Restaurant,Dog Run,Distribution Center


Note: Several values of *k* were considered for the *k-means clustering* analysis. We settled on 5 clusters because that choice gave a robust set of clusters with fewer "single neighborhood" clusters. For every choice of *k* there was a single large cluster, suggesting a high degree of overlap in venue composition across neighborhoods.
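A comparison of several values of *k* along these lines could be tabulated as in the sketch below. It is an illustration on synthetic data, not the procedure actually used for this notebook: the matrix `X` is a made-up stand-in for `yyz_grouped_clustering`, and the summary fields (inertia, cluster sizes, number of single-member clusters) are one assumed way of judging each choice.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the neighborhood frequency matrix (not the real data).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

summary = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    sizes = np.bincount(model.labels_, minlength=k)
    summary[k] = {
        "inertia": float(model.inertia_),               # within-cluster sum of squares
        "sizes": sorted(sizes.tolist(), reverse=True),  # largest cluster first
        "singletons": int((sizes == 1).sum()),          # clusters with one member
    }

for k, stats in summary.items():
    print(k, round(stats["inertia"], 2), stats["sizes"], stats["singletons"])
```

On the real data, a value of *k* whose size list is dominated by one entry and padded with several 1s would correspond to the "single neighborhood" clusters described above.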

# Step Three: Map the clustered neighborhood data and display the results by cluster

In [63]:
# Map the clusters

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(yyz_merged['Latitude'], yyz_merged['Longitude'], yyz_merged['Neighborhood'], yyz_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
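The marker colors above are looked up with `rainbow[cluster-1]`. Since the k-means labels start at 0, cluster 0 wraps around to the last color in the list. Every cluster still gets a distinct color, but the order is shifted by one. The sketch below rebuilds the palette and compares the offset lookup with a direct `palette[label]` lookup; the names `palette`, `by_offset` and `by_label` are introduced here for illustration only.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5

# One hex color per cluster, evenly spaced along the rainbow colormap.
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# Lookup as written in the cell above: label 0 maps to index -1, the last color.
by_offset = [palette[label - 1] for label in range(kclusters)]

# Direct lookup: label 0 maps to the first color.
by_label = [palette[label] for label in range(kclusters)]

print(by_offset[0] == palette[-1], by_label[0] == palette[0])
```

Either lookup works for the map; the direct form is simply easier to match against a legend listing the clusters in order.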

In [64]:
#Examine the clusters

yyz_merged.loc[yyz_merged['Cluster Labels'] == 0, yyz_merged.columns[[1] + list(range(5, yyz_merged.shape[1]))]]


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,0,Coffee Shop,Pub,Park,Bakery,Breakfast Spot,Theater,Café,Electronics Store,Shoe Store,Event Space
1,Downtown Toronto,0,Coffee Shop,Yoga Studio,Beer Bar,Smoothie Shop,Bar,Bank,Diner,Discount Store,Mexican Restaurant,Creperie
2,Downtown Toronto,0,Clothing Store,Coffee Shop,Café,Cosmetics Shop,Japanese Restaurant,Middle Eastern Restaurant,Bubble Tea Shop,Italian Restaurant,Restaurant,Tea Room
3,Downtown Toronto,0,Coffee Shop,Café,Cocktail Bar,Gastropub,Restaurant,American Restaurant,Theater,Moroccan Restaurant,Seafood Restaurant,Hotel
4,East Toronto,0,Coffee Shop,Health Food Store,Pub,Trail,Park,General Entertainment,Creperie,Doner Restaurant,Dog Run,Distribution Center
5,Downtown Toronto,0,Coffee Shop,Cocktail Bar,Pub,Restaurant,Beer Bar,Bakery,Café,Cheese Shop,Seafood Restaurant,Irish Pub
6,Downtown Toronto,0,Coffee Shop,Sandwich Place,Italian Restaurant,Café,Burger Joint,Salad Place,Japanese Restaurant,Bubble Tea Shop,Yoga Studio,Diner
7,Downtown Toronto,0,Grocery Store,Café,Park,Restaurant,Baby Store,Coffee Shop,Nightclub,Diner,Athletics & Sports,Italian Restaurant
8,Downtown Toronto,0,Coffee Shop,Café,Restaurant,Deli / Bodega,Thai Restaurant,Gym,Hotel,Clothing Store,Salad Place,Bookstore
9,West Toronto,0,Bakery,Pharmacy,Park,Pizza Place,Smoke Shop,Brewery,Bar,Bank,Supermarket,Middle Eastern Restaurant


In [65]:
#Examine the clusters

yyz_merged.loc[yyz_merged['Cluster Labels'] == 1, yyz_merged.columns[[2,1] + list(range(5, yyz_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,"Willowdale, Newtonbrook",North York,1,Piano Bar,Women's Store,Cupcake Shop,Eastern European Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store,Diner


In [66]:
#Examine the clusters

yyz_merged.loc[yyz_merged['Cluster Labels'] == 2, yyz_merged.columns[[2,1] + list(range(5, yyz_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,"East Toronto, Broadview North (Old East York)",East York,2,Park,Coffee Shop,Convenience Store,Diner,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Discount Store
31,Weston,York,2,Park,Convenience Store,Curling Ice,Electronics Store,Eastern European Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store
32,York Mills West,North York,2,Park,Convenience Store,Curling Ice,Electronics Store,Eastern European Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store


In [67]:
#Examine the clusters

yyz_merged.loc[yyz_merged['Cluster Labels'] == 3, yyz_merged.columns[[2,1] + list(range(5, yyz_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,Humber Summit,North York,3,Shopping Mall,Women's Store,Ethiopian Restaurant,Eastern European Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store,Diner


In [68]:
#Examine the clusters

yyz_merged.loc[yyz_merged['Cluster Labels'] == 4, yyz_merged.columns[[2,1] + list(range(5, yyz_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,Lawrence Park,Central Toronto,4,Park,Swim School,Bus Line,Diner,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Discount Store,Ethiopian Restaurant
21,"Forest Hill North & West, Forest Hill Road Park",Central Toronto,4,Park,Jewelry Store,Sushi Restaurant,Trail,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store,Cupcake Shop
33,Rosedale,Downtown Toronto,4,Park,Playground,Trail,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Eastern European Restaurant,Discount Store,Cuban Restaurant
0,Parkwoods,North York,4,Park,Food & Drink Shop,Ethiopian Restaurant,Eastern European Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store,Diner
8,Humewood-Cedarvale,York,4,Trail,Park,Field,Hockey Arena,Diner,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant
9,Caledonia-Fairbanks,York,4,Park,Pool,Women's Store,College Rec Center,Colombian Restaurant,Electronics Store,Eastern European Restaurant,Donut Shop,Doner Restaurant,Dog Run
21,"North Park, Maple Leaf Park, Upwood Park",North York,4,Park,Bakery,Construction & Landscaping,Eastern European Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Curling Ice,Discount Store


## Conclusion
Many neighborhoods in central Toronto have similar venue mixes: the clustering analysis produces one large cluster containing well over half of the neighborhoods.
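The size of each cluster can be read off directly from the label column. The sketch below shows the calculation on a short list of hypothetical labels, which are for illustration only and are not the notebook's results. On the real data, the same two lines would be applied to `yyz_merged['Cluster Labels']`.

```python
import pandas as pd

# Hypothetical labels for illustration only; not the notebook's results.
labels = pd.Series([0, 0, 0, 0, 1, 2, 2, 4], name="Cluster Labels")

# Number of neighborhoods per cluster and each cluster's share of the total.
counts = labels.value_counts().sort_index()
share = (counts / counts.sum()).round(3)

print(pd.DataFrame({"neighborhoods": counts, "share": share}))
```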