<h1 align=center><font size = 5>Capstone Week 3 - Segmenting and Clustering Neighborhoods in Toronto</font></h1>

## Part 3 - Clustering the Neighborhoods
explore and cluster the neighborhoods in Toronto.

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Part 1 - Data Scraping of Neighborhood Names and Postal Codes</a>

2. <a href="#item2">Part 2 - Get Latitue and Longitude Coordinates of Neighborhoods</a>

3. <a href="#item3">Part 3 - Clustering the Neighborhoods</a>

</font>
</div>

Before we get the data and start exploring it, let's download all the dependencies that we will need.

In [4]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# used for scraping
import requests
from bs4 import BeautifulSoup
import matplotlib.colors as colors

#!conda install -c conda-forge geocoder --yes # uncomment this line if you haven't completed the Foursquare API lab
import geocoder # import geocoder

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    altair-4.0.1               |             py_0         575 KB  conda-forge
    branca-0.4.0               |             py_0          26 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         673 KB

The following NEW packages will be INSTALLED:

    altair:  4.0.1-py_0 conda-forge
    branca:  0.4.0-py_0 conda-forge
    folium:  0.5.0-py_0 conda-forge
    vincent: 0.4.4-py_1 conda-forge


Downloading and Extracting Packages
vincent-0.4.4        | 28 KB     | #####

<a id='item1'></a>

## 1. Download and Explore Dataset

In [149]:
# Get the json from wikipedia and read it with beautiful soup
res = requests.get(" https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
soup = BeautifulSoup(res.content,'lxml')
table = soup.find_all('table')[0] 
df = pd.read_html(str(table))

# The wikipedia page is not really a table, it's more like a matrix.  Concatenate all the text into a single column
df=df[0]
frames = [df.iloc[:,0],df.iloc[:,1],df.iloc[:,2],df.iloc[:,3],df.iloc[:,4],df.iloc[:,5],df.iloc[:,6],df.iloc[:,7]]
df = pd.concat(frames)
df = df.to_frame()
df.reset_index(inplace = True, drop=True) # Fix the indices
df.columns = ['raw']

# get rid of rows for postal codes that aren't assigned
df = df[~df.raw.str.contains("Not assigned")]
df = df[~df.raw.str.contains("Queen's Park")] # This gets rid of Queen's Park, which is mostly a government center, so not interesting to us here.
df = df[~df.raw.str.contains("Upper Rouge")] # This gets rid of Upper Rouge, which is a wilderness area, so not interesting to us here.
df = df[~df.raw.str.contains("Newtonbrook")] # This gets rid of Newtonbrook.  It's a distant exurb, and for some reason foursquare doesn't find any venues there, so I'm removing it.


# Ditch the "enclaves" too.  They're related to bulk mailing, and not interesting for the purposes of this analysis, which is centered on fun places to live.  
# Also they don't follow exactly the same format and mess up the cleaning process.
df = df[~df.raw.str.contains("Enclave")]
df.reset_index(inplace = True, drop=True)

# The dataframe has only a single column that has all the data kludged together.  Regex would probably be prettier, but let's pull the data out without it...
# Postal code is the first three letters
df['Postal Code'] = df['raw'].str[:3] 
df['raw'] = df['raw'].str[3:-1] # -1 removes the closing parentheses that we don't need

# now split up the neighborhood and boroughs
df[['Borough','Neighborhood']]=df.raw.str.split("(",n=1,expand=True,)
del df["raw"]

## 2. Get geolocation data

In [150]:
# Read the provided CSV, and convert to a dataframe
# I probably spent three hours trying to get the coordinates the hard way, and just came up empty.  Even the code provided by an instructor on the discussion boards didn't work for me. Sad.
!wget -O Geospatial_Coordiantes.csv http://cocl.us/Geospatial_data
locationDF = pd.read_csv("Geospatial_Coordiantes.csv")

--2020-03-23 02:22:18--  http://cocl.us/Geospatial_data
Resolving cocl.us (cocl.us)... 158.85.108.83, 158.85.108.86, 169.48.113.194
Connecting to cocl.us (cocl.us)|158.85.108.83|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://cocl.us/Geospatial_data [following]
--2020-03-23 02:22:20--  https://cocl.us/Geospatial_data
Connecting to cocl.us (cocl.us)|158.85.108.83|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2020-03-23 02:22:23--  https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv
Resolving ibm.box.com (ibm.box.com)... 107.152.27.197, 107.152.26.197
Connecting to ibm.box.com (ibm.box.com)|107.152.27.197|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /public/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2020-03-23 02:22:24--  https://ib

### Merge the location data with the postal code data from Part 1

In [151]:
df = pd.merge(df, locationDF, on='Postal Code')

## 3. Cluster the Neighborhoods

In [155]:
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,Malvern / Rouge,43.806686,-79.194353
1,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek,43.784535,-79.160497
2,M1E,Scarborough,Guildwood / Morningside / West Hill,43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


#### Use geopy library to get the latitude and longitude values of Toronto.

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>ny_explorer</em>, as shown below.

In [156]:
address = 'Toronto, ON'
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

geolocator = Nominatim(user_agent="Toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.653963, -79.387207.


### Create a map of New York with neighborhoods superimposed on top.

In [157]:
# create map of New York using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### Define Foursquare Credentials and Version

In [244]:
# @hidden cell
CLIENT_ID = 'WJ53105PC20DYM2B5NH5TBDPOBP5D4CP2B4VS3ANCZDEGHNV' # your Foursquare ID
CLIENT_SECRET = 'MEQHUTWXJLBAOH1FITA0X02NJCNL5K3QTHFZSGBMIWGMD0NW' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: WJ53105PC20DYM2B5NH5TBDPOBP5D4CP2B4VS3ANCZDEGHNV
CLIENT_SECRET:MEQHUTWXJLBAOH1FITA0X02NJCNL5K3QTHFZSGBMIWGMD0NW


From the Foursquare lab in the previous module, we know that all the information is in the *items* key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [158]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Let's create a function to repeat the same process to all the neighborhoods in Manhattan

In [159]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    LIMIT = 100 # limit of number of venues returned by Foursquare API
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Now write the code to run the above function on each neighborhood and create a new dataframe called manhattan_venues.¶

In [160]:
toronto_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Malvern / Rouge
Rouge Hill / Port Union / Highland Creek
Guildwood / Morningside / West Hill
Woburn
Cedarbrae
Scarborough Village
Kennedy Park / Ionview / East Birchmount Park
Golden Mile / Clairlea / Oakridge
Cliffside / Cliffcrest / Scarborough Village West
Birch Cliff / Cliffside West
Dorset Park / Wexford Heights / Scarborough Town Centre
Wexford / Maryvale
Agincourt
Clarks Corners / Tam O'Shanter / Sullivan
Milliken / Agincourt North / Steeles East / L'Amoreaux East
Steeles West / L'Amoreaux West
Hillcrest Village
Fairview / Henry Farm / Oriole
Bayview Village
York Mills / Silver Hills
Willowdale)Sout
York Mills West
Willowdale)Wes
Parkwoods
Don Mills)Nort
Don Mills)South(Flemingdon Park
Bathurst Manor / Wilson Heights / Downsview North
Northwood Park / York University
Downsview)East (CFB Toronto
Downsview)Wes
Downsview)Centra
Downsview)Northwes
Victoria Village
Parkview Hill / Woodbine Gardens
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
The Danforth East
The Danforth We

Let's find out how many unique categories can be curated from all the returned venues

In [161]:
print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 264 uniques categories.


### Analyze Each Neighborhood

In [243]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

### Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [242]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()


### Let's print each neighborhood along with the top 5 most common venues

In [241]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    #print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    #print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    #print('\n')

Let's put that into a pandas dataframe

First, let's write a function to sort the venues in descending order.


In [167]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.


In [212]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Breakfast Spot,Skating Rink,Latin American Restaurant,Clothing Store,Lounge,Dog Run,Dim Sum Restaurant,Diner,Discount Store,Distribution Center
1,Alderwood / Long Branch,Pizza Place,Coffee Shop,Sandwich Place,Skating Rink,Pool,Pharmacy,Pub,Gym,Distribution Center,Discount Store
2,Bathurst Manor / Wilson Heights / Downsview North,Bank,Coffee Shop,Frozen Yogurt Shop,Sushi Restaurant,Ice Cream Shop,Deli / Bodega,Pizza Place,Pharmacy,Restaurant,Supermarket
3,Bayview Village,Chinese Restaurant,Café,Japanese Restaurant,Bank,Dog Run,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Women's Store
4,Bedford Park / Lawrence Manor East,Italian Restaurant,Coffee Shop,Restaurant,Sandwich Place,Grocery Store,Pharmacy,Liquor Store,Indian Restaurant,Ice Cream Shop,Fast Food Restaurant


### Cluster Neighborhoods

Run k-means to cluster the neighborhood into 5 clusters.
I picked 9 because the largest group has 42 neighborhoods.  With other choices the largest group always had more
Even with 30 groups, the largest group still has 35 neighborhoods.  They must be very homogeneous...

In [236]:
# set number of clusters
kclusters = 9

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# Inspect the number of neighborhoods placed in each group.  We don't want too many single-neighborhood groups
y = np.bincount(kmeans.labels_)
ii = np.nonzero(y)[0]
groupSize = np.vstack((ii,y[ii])).T

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.


In [214]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,Malvern / Rouge,43.806686,-79.194353,6,Fast Food Restaurant,Women's Store,Deli / Bodega,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run
1,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek,43.784535,-79.160497,7,Bar,Women's Store,Deli / Bodega,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run
2,M1E,Scarborough,Guildwood / Morningside / West Hill,43.763573,-79.188711,0,Electronics Store,Breakfast Spot,Medical Center,Bank,Mexican Restaurant,Intersection,Rental Car Location,Spa,Doner Restaurant,Donut Shop
3,M1G,Scarborough,Woburn,43.770992,-79.216917,1,Coffee Shop,Soccer Field,Korean Restaurant,College Stadium,Colombian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,0,Athletics & Sports,Caribbean Restaurant,Bakery,Bank,Fried Chicken Joint,Thai Restaurant,Gas Station,Hakka Restaurant,Doner Restaurant,Dog Run


Finally, let's visualize the resulting clusters

In [235]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Examine Clusters

Now, you can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, you can then assign a name to each cluster.

#### Cluster 1

This cluster is of neighborhoods pretty far from downtown.  Lots of banks, discount stores, electronic stores, pharmacies, grocery stores, pizza places, diner
### I call it boring suburbia

In [219]:
print('This cluster has '+ str(groupSize[0,1]) + ' neighborhoods')
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

This cluster has 15 neighborhoods


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Scarborough,0,Electronics Store,Breakfast Spot,Medical Center,Bank,Mexican Restaurant,Intersection,Rental Car Location,Spa,Doner Restaurant,Donut Shop
4,Scarborough,0,Athletics & Sports,Caribbean Restaurant,Bakery,Bank,Fried Chicken Joint,Thai Restaurant,Gas Station,Hakka Restaurant,Doner Restaurant,Dog Run
11,Scarborough,0,Auto Garage,Bakery,Shopping Mall,Sandwich Place,Breakfast Spot,Smoke Shop,Construction & Landscaping,Convenience Store,Colombian Restaurant,Empanada Restaurant
12,Scarborough,0,Breakfast Spot,Skating Rink,Latin American Restaurant,Clothing Store,Lounge,Dog Run,Dim Sum Restaurant,Diner,Discount Store,Distribution Center
13,Scarborough,0,Pizza Place,Pharmacy,Intersection,Noodle House,Shopping Mall,Gas Station,Fast Food Restaurant,Italian Restaurant,Chinese Restaurant,Bank
15,Scarborough,0,Fast Food Restaurant,Chinese Restaurant,Grocery Store,Sandwich Place,Bank,Pharmacy,Pizza Place,Breakfast Spot,Coffee Shop,Nail Salon
16,North York,0,Golf Course,Mediterranean Restaurant,Pool,Dog Run,Discount Store,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Distribution Center
18,North York,0,Chinese Restaurant,Café,Japanese Restaurant,Bank,Dog Run,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Women's Store
22,North York,0,Grocery Store,Bank,Home Service,Discount Store,Pharmacy,Pizza Place,Coffee Shop,Airport Lounge,Department Store,Ethiopian Restaurant
29,North York,0,Grocery Store,Park,Shopping Mall,Bank,Hotel,Comfort Food Restaurant,Comic Shop,College Stadium,Electronics Store,Eastern European Restaurant


#### Cluster 2

This is a large group of neighborhoods.  

They often have coffee shops, cafes, restaurants, (Italian, pizza, generic, bakery, desert, diner, bar, japanese etc.)  

### I call this a restaurant destination

In [221]:
print('This cluster has '+ str(groupSize[1,1]) + ' neighborhoods')
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

This cluster has 42 neighborhoods


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Scarborough,1,Coffee Shop,Soccer Field,Korean Restaurant,College Stadium,Colombian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
6,Scarborough,1,Discount Store,Department Store,Coffee Shop,Convenience Store,Hobby Shop,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Distribution Center
9,Scarborough,1,College Stadium,Skating Rink,Café,General Entertainment,Women's Store,Distribution Center,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
14,Scarborough,1,Coffee Shop,Park,Bakery,Playground,Distribution Center,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
17,North York,1,Clothing Store,Coffee Shop,Fast Food Restaurant,Women's Store,Bank,Bakery,Convenience Store,Cosmetics Shop,Toy / Game Store,Japanese Restaurant
20,North York,1,Ramen Restaurant,Restaurant,Pizza Place,Sushi Restaurant,Coffee Shop,Sandwich Place,Café,Japanese Restaurant,Bubble Tea Shop,Shopping Mall
24,North York,1,Baseball Field,Gym / Fitness Center,Café,Japanese Restaurant,Caribbean Restaurant,Women's Store,Dim Sum Restaurant,Diner,Discount Store,Distribution Center
25,North York,1,Asian Restaurant,Restaurant,Coffee Shop,Beer Store,Gym,Sporting Goods Shop,Italian Restaurant,Bike Shop,Sandwich Place,Grocery Store
26,North York,1,Bank,Coffee Shop,Frozen Yogurt Shop,Sushi Restaurant,Ice Cream Shop,Deli / Bodega,Pizza Place,Pharmacy,Restaurant,Supermarket
27,North York,1,Metro Station,Coffee Shop,Massage Studio,Bar,Caribbean Restaurant,Discount Store,Dessert Shop,Dim Sum Restaurant,Diner,Distribution Center


Cluster 3

In [238]:
print('This cluster has '+ str(groupSize[2,1]) + ' neighborhoods')
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

This cluster has 1 neighborhoods


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
61,Central Toronto,2,Garden,Women's Store,Distribution Center,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Dance Studio


Cluster 4

Most of these neighborhoods feature lots of partks, plus dicount stores, women's stores, rivers, playgorounds, diners, Dim Sum and desserts

### I call it parks and recreation

In [223]:
print('This cluster has '+ str(groupSize[3,1]) + ' neighborhoods')
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

This cluster has 8 neighborhoods


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,North York,3,Park,Convenience Store,Bank,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Women's Store
23,North York,3,Park,Food & Drink Shop,Women's Store,Distribution Center,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run
28,North York,3,Park,Airport,Women's Store,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center
38,East YorkEast Toronto,3,Park,Convenience Store,Metro Station,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Dance Studio
48,Downtown Toronto,3,Park,Trail,Playground,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Women's Store
71,York,3,Park,Women's Store,Market,Distribution Center,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run
76,North York,3,Park,Bakery,Construction & Landscaping,Women's Store,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center
84,Etobicoke,3,Park,River,Women's Store,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Distribution Center


#### Cluster 5

A good balnace of restaurants and recreational activities.  A nice place to live?

In [224]:
print('This cluster has '+ str(groupSize[4,1]) + ' neighborhoods')
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

This cluster has 16 neighborhoods


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Scarborough,4,Bus Line,Bakery,Intersection,Metro Station,Ice Cream Shop,Bus Station,Park,Soccer Field,Donut Shop,Doner Restaurant
8,Scarborough,4,Skating Rink,Movie Theater,Motel,American Restaurant,Women's Store,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
10,Scarborough,4,Indian Restaurant,Vietnamese Restaurant,Pet Store,Chinese Restaurant,Women's Store,Distribution Center,Department Store,Dessert Shop,Dim Sum Restaurant,Diner
34,East York,4,Skating Rink,Pharmacy,Beer Store,Cosmetics Shop,Curling Ice,Park,Comfort Food Restaurant,Department Store,Electronics Store,College Stadium
35,East Toronto,4,Pub,Health Food Store,Trail,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Curling Ice
37,East York,4,Indian Restaurant,Grocery Store,Supermarket,Bank,Burger Joint,Coffee Shop,Fast Food Restaurant,Gas Station,Gym,Liquor Store
40,East Toronto,4,Board Shop,Pizza Place,Brewery,Movie Theater,Italian Restaurant,Restaurant,Burrito Place,Ice Cream Shop,Fast Food Restaurant,Steakhouse
42,Central Toronto,4,Swim School,Bus Line,Gym / Fitness Center,Park,Women's Store,Distribution Center,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
43,Central Toronto,4,Hotel,Gym / Fitness Center,Food & Drink Shop,Breakfast Spot,Park,Sandwich Place,Gym,Department Store,Dumpling Restaurant,Donut Shop
46,Central Toronto,4,Gym,Park,Tennis Court,Playground,Women's Store,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant


Cluster 6

In [225]:
print('This cluster has '+ str(groupSize[5,1]) + ' neighborhoods')
toronto_merged.loc[toronto_merged['Cluster Labels'] == 5, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

This cluster has 1 neighborhoods


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,North York,5,Cafeteria,Women's Store,Distribution Center,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Dance Studio


Cluster 7

In [226]:
print('This cluster has '+ str(groupSize[6,1]) + ' neighborhoods')
toronto_merged.loc[toronto_merged['Cluster Labels'] == 6, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

This cluster has 1 neighborhoods


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Scarborough,6,Fast Food Restaurant,Women's Store,Deli / Bodega,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run


Cluster 8

In [227]:
print('This cluster has '+ str(groupSize[7,1]) + ' neighborhoods')
toronto_merged.loc[toronto_merged['Cluster Labels'] == 7, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

This cluster has 1 neighborhoods


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Scarborough,7,Bar,Women's Store,Deli / Bodega,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run


Cluster 9

In [229]:
print('This cluster has '+ str(groupSize[8,1]) + ' neighborhoods')
toronto_merged.loc[toronto_merged['Cluster Labels'] == 8, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

This cluster has 2 neighborhoods


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Scarborough,8,Construction & Landscaping,Playground,Doner Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Women's Store
70,York,8,Trail,Field,Hockey Arena,Playground,Distribution Center,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
