# Segmenting and Clustering Neighborhoods in Toronto, Canada

### Applied Data Science Capstone - Week 3 Assignment

Adam Rubins

## Table of Contents


1. Environment Setup

2. Scraping postal codes of Canada from Wikipedia

3. Download the geographical coordinates of the neighborhoods

4. Explore, Merge, Filter & Visualize the data

5. Analyze Each Neighborhood

6. Cluster Neighborhoods

7. Examine Clusters   


## 1. Environment Setup

In [1]:
import numpy as np  # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# to handle web requests and web scraping
import requests
from bs4 import BeautifulSoup

# Matplotlib, folium and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium  # folium=0.5.0

# import k-means from clustering stage
from sklearn.cluster import KMeans

# geocode libraries
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder

from collections import defaultdict # dict that creates a default value (here an empty list) for missing keys

## 2. Scraping postal codes of Canada from Wikipedia

Download the Wikipedia page that lists the postal codes beginning with M.

In [2]:
postal_codes_Canada_m_url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

In [3]:
html_postal_codes = requests.get(postal_codes_Canada_m_url).text

Use BeautifulSoup to scrape the data.

In [4]:
soup = BeautifulSoup(html_postal_codes, 'html.parser')

Explore the HTML and look for the table data.

In [5]:
print(soup.prettify()[:300])

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   List of postal codes of Canada: M - Wikipedia
  </title>
  <script>
   document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitT


The data is in the first `<table>` tag, which is what `soup.find('table')` returns.

In [6]:
table = soup.find('table')

In [7]:
print(table.prettify()[:300])

<table class="wikitable sortable">
 <tbody>
  <tr>
   <th>
    Postcode
   </th>
   <th>
    Borough
   </th>
   <th>
    Neighbourhood
   </th>
  </tr>
  <tr>
   <td>
    M1A
   </td>
   <td>
    Not assigned
   </td>
   <td>
    Not assigned
   </td>
  </tr>
  <tr>
   <td>
    M2A
   </td>
   <td>
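The cells that follow rely on the structure visible in this output: one `<tr>` per row, header cells in `<th>`, data cells in `<td>`. As a hedged, dependency-free sketch of the same extraction, here it is with the standard library's `html.parser` instead of BeautifulSoup (a swapped-in technique; the `TableParser` class and the trimmed sample HTML are hypothetical):

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect the text of each <th>/<td> cell, row by row, from the first <table>."""

    def __init__(self):
        super().__init__()
        self.rows = []          # list of rows, each a list of cell strings
        self._in_table = False
        self._done = False      # stop after the first table closes
        self._cell = None       # text buffer for the current cell, or None

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == 'table':
            self._in_table = True
        elif self._in_table and tag == 'tr':
            self.rows.append([])
        elif self._in_table and tag in ('th', 'td'):
            self._cell = []

    def handle_endtag(self, tag):
        if self._done:
            return
        if tag in ('th', 'td') and self._cell is not None:
            self.rows[-1].append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'table' and self._in_table:
            self._in_table = False
            self._done = True

    def handle_data(self, data):
        # text between tags only matters while we are inside a cell
        if self._cell is not None:
            self._cell.append(data)

# a trimmed, hypothetical copy of the structure printed above
html = """
<table class="wikitable sortable"><tbody>
<tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
<tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
<tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</tbody></table>
"""
parser = TableParser()
parser.feed(html)
headers, *data_rows = parser.rows
print(headers)    # ['Postcode', 'Borough', 'Neighbourhood']
print(data_rows)  # [['M1A', 'Not assigned', 'Not assigned'], ['M3A', 'North York', 'Parkwoods']]
```

BeautifulSoup remains the more convenient choice for the real page; the sketch only makes the row/cell walk explicit.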


#### Transform the data into a *pandas* DataFrame

In [8]:
# Get the table Headers
headers = [header.text.strip() for header in table.find_all('th')] 
headers

['Postcode', 'Borough', 'Neighbourhood']

In [9]:
# create an Empty DataFrame with the Headers
neighborhoods = pd.DataFrame(columns=headers)
neighborhoods

|   | Postcode | Borough | Neighbourhood |
|---|---|---|---|


Fill the neighborhoods DataFrame with the rows whose borough is assigned, and aggregate the neighbourhoods of each (postcode, borough) combination into one comma-separated string. (If a borough is assigned but a neighborhood is not, the neighborhood takes the borough's name.)

In [10]:
# aggregate the neighbourhoods of each (postcode, borough) pair as a list
neighborhoods_dict = defaultdict(list)
for row in table.find_all('tr')[1:]: # skip the header row
    postcode, borough, neighbourhood = [field.text.strip() for field in row.find_all('td')]

    if borough == 'Not assigned': # ignore rows whose borough is Not assigned
        continue
    # If a borough is assigned but a neighborhood is not, the neighborhood takes the borough's name.
    neighbourhood = borough if neighbourhood == 'Not assigned' else neighbourhood
    neighborhoods_dict[(postcode, borough)].append(neighbourhood)

# one row per (postcode, borough), with the neighbourhoods joined by ', '
fields_rows = [[postcode, borough, ', '.join(names)]
               for (postcode, borough), names in neighborhoods_dict.items()]
# build the DataFrame in one call (DataFrame.append was removed in pandas 2.0)
neighborhoods = pd.DataFrame(fields_rows, columns=headers)

neighborhoods.head()

|   | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront |
| 3 | M6A | North York | Lawrence Heights, Lawrence Manor |
| 4 | M7A | Downtown Toronto | Queen's Park |


In [11]:
# Check the row for Postcode == 'M9A':
# the Borough and Neighbourhood columns should both be Queen's Park (its neighbourhood was Not assigned).
neighborhoods[neighborhoods.Postcode == 'M9A']

|   | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 5 | M9A | Queen's Park | Queen's Park |


size check

In [12]:
neighborhoods.shape

(103, 3)

In [13]:
print('The Postcode dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The Postcode dataframe has 11 boroughs and 103 neighborhoods.
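The aggregation rules used above (drop unassigned boroughs, fall back to the borough name for an unassigned neighbourhood, join neighbourhoods that share a postcode) can be checked on a few rows without touching the web page. A minimal sketch, assuming rows shaped like the scraped table; the helper `aggregate_postcodes` and the sample rows are hypothetical:

```python
from collections import defaultdict

def aggregate_postcodes(rows):
    """Group (postcode, borough, neighbourhood) rows the way the scraping cell does.

    Rows with an unassigned borough are dropped; an unassigned neighbourhood
    falls back to the borough name; neighbourhoods sharing a (postcode, borough)
    key are joined with ', '.
    """
    grouped = defaultdict(list)
    for postcode, borough, neighbourhood in rows:
        if borough == 'Not assigned':
            continue
        if neighbourhood == 'Not assigned':
            neighbourhood = borough
        grouped[(postcode, borough)].append(neighbourhood)
    # dicts keep insertion order, so rows come out in first-seen order
    return [(pc, b, ', '.join(names)) for (pc, b), names in grouped.items()]

# hypothetical rows, shaped like the scraped Wikipedia table
sample = [
    ('M1A', 'Not assigned', 'Not assigned'),
    ('M6A', 'North York', 'Lawrence Heights'),
    ('M6A', 'North York', 'Lawrence Manor'),
    ('M9A', "Queen's Park", 'Not assigned'),
]
result = aggregate_postcodes(sample)
print(result)
# [('M6A', 'North York', 'Lawrence Heights, Lawrence Manor'), ('M9A', "Queen's Park", "Queen's Park")]
```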


## 3. Download the geographical coordinates of the neighborhoods

In [14]:
url = 'http://cocl.us/Geospatial_data'
geo_data = pd.read_csv(url)
geo_data.rename({'Postal Code': 'Postcode'}, axis=1, inplace=True)
geo_data.head()

|   | Postcode | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


Make sure that both datasets have the same number of postcodes before the merge.

In [15]:
assert neighborhoods.shape[0] == geo_data.shape[0]
print('same number of rows?', neighborhoods.shape[0] == geo_data.shape[0], '; Postcode:', geo_data.shape[0])

same number of rows? True ; Postcode: 103
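Equal row counts do not by themselves guarantee that every postcode has coordinates: two lists of the same length can still contain different codes, and `merge(how='inner')` silently drops rows whose key appears on only one side. A stricter check compares the key sets. A minimal stdlib sketch; the helper `compare_postcodes` and the sample codes are hypothetical:

```python
def compare_postcodes(left, right):
    """Return (only_in_left, only_in_right) for two iterables of postcodes."""
    left, right = set(left), set(right)
    return sorted(left - right), sorted(right - left)

# hypothetical postcode lists: same length, different contents
scraped = ['M3A', 'M4A', 'M5A']
geo = ['M3A', 'M4A', 'M9Z']

assert len(scraped) == len(geo)  # the row-count check alone passes...
only_scraped, only_geo = compare_postcodes(scraped, geo)
print(only_scraped, only_geo)  # ['M5A'] ['M9Z'] -> an inner merge would drop both
```

In the notebook this would be `compare_postcodes(neighborhoods.Postcode, geo_data.Postcode)`; two empty lists mean every scraped postcode has coordinates.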


## 4. Explore, Merge, Filter & Visualize the data

Merge the postcode data with the geospatial data.

In [16]:
merged_data = neighborhoods.merge(geo_data, how='inner', on='Postcode')
merged_data.head()

|   | Postcode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Heights, Lawrence Manor | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park | 43.662301 | -79.389494 |


### Filter only boroughs that contain the word Toronto

In [17]:
toronto_index = merged_data.Borough.str.lower().str.find('toronto') != -1
toronto_neighborhoods = merged_data[toronto_index].reset_index(drop=True)
print('Unique Borough: ')
print()
print(toronto_neighborhoods.Borough.unique())
print('*' * 100)
print('Unique Neighbourhoods: ')
print()
print(toronto_neighborhoods.Neighbourhood.unique())
print('*' * 100)
print('toronto_neighborhoods shape: ', toronto_neighborhoods.shape)

Unique Borough: 

['Downtown Toronto' 'East Toronto' 'West Toronto' 'Central Toronto']
****************************************************************************************************
Unique Neighbourhoods: 

['Harbourfront' "Queen's Park" 'Ryerson, Garden District' 'St. James Town'
 'The Beaches' 'Berczy Park' 'Central Bay Street' 'Christie'
 'Adelaide, King, Richmond' 'Dovercourt Village, Dufferin'
 'Harbourfront East, Toronto Islands, Union Station'
 'Little Portugal, Trinity' 'The Danforth West, Riverdale'
 'Design Exchange, Toronto Dominion Centre'
 'Brockton, Exhibition Place, Parkdale Village'
 'The Beaches West, India Bazaar' 'Commerce Court, Victoria Hotel'
 'Studio District' 'Lawrence Park' 'Roselawn' 'Davisville North'
 'Forest Hill North, Forest Hill West' 'High Park, The Junction South'
 'North Toronto West' 'The Annex, North Midtown, Yorkville'
 'Parkdale, Roncesvalles' 'Davisville' 'Harbord, University of Toronto'
 'Runnymede, Swansea' 'Moore Park, Summerhill East'
 

**Visualize Map**

**Note: Unfortunately, the folium maps do not render on GitHub.**

**To get the full interactive output, paste the GitHub link to this .ipynb file into nbviewer.org.**

Get the geographical coordinates of Toronto.

In [18]:
address = 'Toronto, Ontario'
geolocator = Nominatim(user_agent="Toronto_explorer")
location = geolocator.geocode(address)
latitude, longitude = location.latitude, location.longitude

Visualize Toronto neighborhoods

In [19]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add a circle marker with a "neighbourhood, borough" popup for each postcode
for lat, lng, borough, neighbourhood in zip(toronto_neighborhoods['Latitude'],
                                            toronto_neighborhoods['Longitude'],
                                            toronto_neighborhoods['Borough'],
                                            toronto_neighborhoods['Neighbourhood']):
    label = folium.Popup('{}, {}'.format(neighbourhood, borough), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto

#### Utilizing the Foursquare API to explore the neighborhoods and segment them

#### Define Foursquare credentials, version, and parameters

In [20]:
import configparser
config = configparser.ConfigParser()
config_foursquare = config.read('foursquare.ini') # my Foursquare Credentials are in the file
CLIENT_ID = config['FOURSQUARE']['CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = config['FOURSQUARE']['CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # search radius in meters
print('Your credentials:')
print('CLIENT_ID: ' + 'top secret')
print('CLIENT_SECRET: ' + 'top secret')
print('Number of venues returned by Foursquare API:', LIMIT, '; Radius to check:', radius)

Your credentials:
CLIENT_ID: top secret
CLIENT_SECRET: top secret
Number of venues returned by Foursquare API: 100 ; Radius to check: 500
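The credentials cell expects an INI file with a `[FOURSQUARE]` section holding `CLIENT_ID` and `CLIENT_SECRET`. A minimal sketch of that layout, read with `configparser.read_string` so it runs without a file on disk; the values are placeholders and the `demo_*` names are hypothetical:

```python
import configparser

# hypothetical contents of foursquare.ini; the section and key names match the cell above,
# the values are placeholders
ini_text = """
[FOURSQUARE]
CLIENT_ID = your-client-id
CLIENT_SECRET = your-client-secret
"""

demo_config = configparser.ConfigParser()
demo_config.read_string(ini_text)  # config.read('foursquare.ini') does the same for a file
demo_client_id = demo_config['FOURSQUARE']['CLIENT_ID']
demo_client_secret = demo_config['FOURSQUARE']['CLIENT_SECRET']
print(demo_client_id)  # your-client-id
```

`ConfigParser.read` returns the list of files it managed to parse, so checking that `config_foursquare` is non-empty is a cheap way to fail early when `foursquare.ini` is missing, instead of hitting a `KeyError` on the next line.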


From the Foursquare lab, I know that the venues are listed under the *items* key of the response. I'll borrow the **get_category_type** function from the Foursquare lab.

In [21]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError: # rows flattened by json_normalize use the 'venue.categories' key
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
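To make the *items* path concrete, here is a hedged sketch that walks a mocked `/venues/explore` payload. The payload shape is an assumption, reconstructed only from the keys this notebook reads (`response`, `groups`, `items`, `venue`, `location`, `categories`); real responses carry many more fields. The helper `first_category` is hypothetical and mirrors `get_category_type`, returning `None` when a venue has no categories:

```python
# hypothetical, heavily trimmed shape of a /venues/explore response
payload = {
    'response': {
        'groups': [{
            'items': [
                {'venue': {'name': 'Tandem Coffee',
                           'location': {'lat': 43.653559, 'lng': -79.361809},
                           'categories': [{'name': 'Coffee Shop'}]}},
                {'venue': {'name': 'Unlabelled Place',
                           'location': {'lat': 43.65, 'lng': -79.36},
                           'categories': []}},
            ]
        }]
    }
}

def first_category(venue):
    """Name of the venue's first category, or None when the list is empty."""
    categories = venue.get('categories', [])
    return categories[0]['name'] if categories else None

items = payload['response']['groups'][0]['items']
rows = [(v['venue']['name'], first_category(v['venue'])) for v in items]
print(rows)  # [('Tandem Coffee', 'Coffee Shop'), ('Unlabelled Place', None)]
```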

I'll also borrow the function that cleans the JSON response and structures it into a pandas DataFrame.

In [22]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    '''Return a DataFrame with the venues Foursquare finds around each neighbourhood.'''
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request; the venues are under response -> groups[0] -> items
        results = requests.get(url).json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue;
        # get_category_type returns None for a venue without categories instead of raising IndexError
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            get_category_type(v['venue'])) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
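`getNearbyVenues` formats the query string by hand. An equivalent way to build the same request, which percent-encodes every value, is `urllib.parse.urlencode`. A hedged sketch; the helper `build_explore_url` is hypothetical, the credentials are placeholders, and the coordinates are Harbourfront's from the merged table:

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the same /venues/explore request URL that getNearbyVenues formats by hand."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # the comma is encoded as %2C
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# placeholder credentials
url = build_explore_url('ID', 'SECRET', '20180605', 43.65426, -79.360636)
print(url)
```

With `requests`, the same dictionary can instead be passed as `requests.get(base_url, params=params)`, which keeps the credentials out of a hand-built string.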

**Run function on each neighborhood and create a new dataframe called toronto_venues.**

In [23]:
toronto_venues = getNearbyVenues(names=toronto_neighborhoods.Neighbourhood,
                                   latitudes=toronto_neighborhoods.Latitude,
                                   longitudes=toronto_neighborhoods.Longitude
                                  )

In [24]:
toronto_venues.head()

|   | Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Harbourfront | 43.65426 | -79.360636 | Roselle Desserts | 43.653447 | -79.362017 | Bakery |
| 1 | Harbourfront | 43.65426 | -79.360636 | Tandem Coffee | 43.653559 | -79.361809 | Coffee Shop |
| 2 | Harbourfront | 43.65426 | -79.360636 | Cooper Koo Family YMCA | 43.653191 | -79.357947 | Gym / Fitness Center |
| 3 | Harbourfront | 43.65426 | -79.360636 | Body Blitz Spa East | 43.654735 | -79.359874 | Spa |
| 4 | Harbourfront | 43.65426 | -79.360636 | Impact Kitchen | 43.656369 | -79.35698 | Restaurant |


size check

In [25]:
toronto_venues.shape

(1699, 7)

In [26]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 231 unique categories.


Unique list of venue categories

In [27]:
print(toronto_venues['Venue Category'].unique())

['Bakery' 'Coffee Shop' 'Gym / Fitness Center' 'Spa' 'Restaurant' 'Pub'
 'Park' 'Breakfast Spot' 'Historic Site' 'Farmers Market' 'Chocolate Shop'
 'Dessert Shop' 'Performing Arts Venue' 'Café' 'French Restaurant'
 'Event Space' 'Mexican Restaurant' 'Yoga Studio' 'Ice Cream Shop'
 'Shoe Store' 'Theater' 'Art Gallery' 'Brewery' 'Electronics Store'
 'Beer Store' 'Bank' 'Hotel' 'Health Food Store' 'Antique Shop'
 'Portuguese Restaurant' 'Italian Restaurant' 'Gym' 'Creperie'
 'Burrito Place' 'Beer Bar' 'Arts & Crafts Store' 'Sushi Restaurant'
 'Hobby Shop' 'Diner' 'Fried Chicken Joint' 'Burger Joint' 'Nightclub'
 'Chinese Restaurant' 'Salad Place' 'Seafood Restaurant'
 'Fast Food Restaurant' 'Juice Bar' 'Sandwich Place' 'College Auditorium'
 'Bar' 'Clothing Store' 'Comic Shop' 'Pizza Place' 'Plaza' 'Tea Room'
 'Music Venue' 'Ramen Restaurant' 'Thai Restaurant' 'Movie Theater'
 'Steakhouse' 'Bookstore' 'American Restaurant' 'Japanese Restaurant'
 'Sporting Goods Shop' 'Gastropub' 'Tanning S

Check how many venues were returned for each neighborhood.

In [28]:
toronto_venues.groupby('Neighbourhood').Venue.count()

Neighbourhood
Adelaide, King, Richmond                                                                                      100
Berczy Park                                                                                                    55
Brockton, Exhibition Place, Parkdale Village                                                                   22
Business Reply Mail Processing Centre 969 Eastern                                                              15
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara     17
Cabbagetown, St. James Town                                                                                    43
Central Bay Street                                                                                             83
Chinatown, Grange Park, Kensington Market                                                                      86
Christie                                                                  
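Adelaide, King, Richmond returned exactly 100 venues, which equals `LIMIT`, so its list was probably cut off by the cap rather than exhausted. A small stdlib sketch for flagging such neighbourhoods; the pairs are hypothetical and "Quiet Street" is an invented neighbourhood:

```python
from collections import Counter

LIMIT = 100  # the same cap the notebook passes to the explore endpoint

# hypothetical (neighbourhood, venue) pairs standing in for toronto_venues
pairs = [('Adelaide, King, Richmond', 'Venue {}'.format(i)) for i in range(100)]
pairs += [('Quiet Street', 'Venue {}'.format(i)) for i in range(12)]

counts = Counter(neighbourhood for neighbourhood, _ in pairs)
# a neighbourhood that hit the cap may have more venues than the API returned
capped = sorted(name for name, n in counts.items() if n >= LIMIT)
print(counts['Quiet Street'], capped)  # 12 ['Adelaide, King, Richmond']
```

With the real DataFrame, the same filter is `counts = toronto_venues.groupby('Neighbourhood').Venue.count()` followed by `counts[counts >= LIMIT]`.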

## 5. Analyze Each Neighborhood

In [29]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = ['Neighbourhood'] + list(toronto_onehot.columns[toronto_onehot.columns != 'Neighbourhood'])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Neighbourhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Auditorium,College Gym,College Rec Center,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Festival,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hospital,Hostel,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry 
Store,Juice Bar,Korean Restaurant,Lake,Latin American Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Plane,Playground,Plaza,Poke Place,Portuguese Restaurant,Post Office,Poutine Place,Pub,Ramen Restaurant,Record Shop,Recording Studio,Rental Car Location,Restaurant,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Snack Place,South American Restaurant,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Summer Camp,Supermarket,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Harbourfront,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Harbourfront,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Harbourfront,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Harbourfront,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Harbourfront,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


size check

In [30]:
toronto_onehot.shape

(1699, 232)
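One-hot encoding gives every venue its own row and every category its own 0/1 column, which is why the shape is (1699, 232): 1699 venues, and 231 categories plus the `Neighbourhood` column. A stdlib sketch of the same encoding on hypothetical rows, with columns in alphabetical order as in the output above:

```python
# hypothetical venue records standing in for toronto_venues
venues = [
    ('Harbourfront', 'Bakery'),
    ('Harbourfront', 'Coffee Shop'),
    ('Harbourfront', 'Coffee Shop'),
    ('Christie', 'Park'),
]

# one column per distinct category, sorted alphabetically
categories = sorted({category for _, category in venues})
# each row keeps its neighbourhood and has a single 1 in its category's column
onehot = [[name] + [int(category == c) for c in categories] for name, category in venues]

print(['Neighbourhood'] + categories)  # ['Neighbourhood', 'Bakery', 'Coffee Shop', 'Park']
for row in onehot:
    print(row)
```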

**Group rows by neighborhood and take the mean of the frequency of occurrence of each category**

In [31]:
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighbourhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Auditorium,College Gym,College Rec Center,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Festival,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hospital,Hostel,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry 
Store,Juice Bar,Korean Restaurant,Lake,Latin American Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Plane,Playground,Plaza,Poke Place,Portuguese Restaurant,Post Office,Poutine Place,Pub,Ramen Restaurant,Record Shop,Recording Studio,Rental Car Location,Restaurant,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Snack Place,South American Restaurant,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Summer Camp,Supermarket,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.03,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.018182,0.0,0.018182,0.036364,0.0,0.0,0.0,0.018182,0.018182,0.0,0.036364,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.036364,0.0,0.0,0.0,0.0,0.036364,0.0,0.0,0.0,0.0,0.0,0.054545,0.072727,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.018182,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.036364,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.018182,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.036364,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.036364,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.018182,0.018182,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0
2,"Brockton, Exhibition Place, Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business Reply Mail Processing Centre 969 Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.0,0.058824,0.058824,0.058824,0.117647,0.117647,0.117647,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


size check

In [32]:
toronto_grouped.shape

(39, 232)
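Averaging one-hot columns turns counts into proportions. In the grouped output above, Berczy Park has 55 venues, so its values are multiples of 1/55: 0.018182 is about 1/55, Cocktail Bar's 0.054545 is about 3/55, and Coffee Shop's 0.072727 is about 4/55. A stdlib sketch of that arithmetic; the helper `category_frequencies` is hypothetical, and a single filler label stands in for Berczy Park's other 48 venues:

```python
from collections import Counter, defaultdict

def category_frequencies(rows):
    """Per-neighbourhood share of each category: the mean of its one-hot column."""
    per_hood = defaultdict(Counter)
    for hood, category in rows:
        per_hood[hood][category] += 1
    return {hood: {c: n / sum(counts.values()) for c, n in counts.items()}
            for hood, counts in per_hood.items()}

# hypothetical rows: 55 venues, with Coffee Shop (4) and Cocktail Bar (3) as in Berczy Park;
# 'Other' is a filler label for the remaining 48 venues
categories = ['Coffee Shop'] * 4 + ['Cocktail Bar'] * 3 + ['Other'] * 48
rows = [('Berczy Park', c) for c in categories]

freqs = category_frequencies(rows)['Berczy Park']
print(round(freqs['Coffee Shop'], 6), round(freqs['Cocktail Bar'], 6))  # 0.072727 0.054545
```

Because each neighbourhood's frequencies sum to 1, neighbourhoods with many venues and few venues become comparable before clustering.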

**Function that makes a DataFrame of the `num_top_venues` most common venues for each neighborhood**

In [33]:
def make_neighborhoods_venues_sorted(toronto_grouped_by_Neighbourhood=toronto_grouped, num_top_venues=10):
    '''Create a DataFrame of the top 'num_top_venues' venues for each neighborhood.'''
    def return_most_common_venues(row, num_top_venues=10):
        # sort the venue categories of one neighborhood by descending frequency (from the lab)
        row_categories = row.iloc[1:]
        row_categories_sorted = row_categories.sort_values(ascending=False)
        return row_categories_sorted.index.values[0:num_top_venues]

    indicators = ['st', 'nd', 'rd']

    # create columns according to number of top venues
    columns = ['Neighbourhood']
    for ind in np.arange(num_top_venues):
        try:
            columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
        except IndexError: # ranks past 3rd fall back to 'th'
            columns.append('{}th Most Common Venue'.format(ind+1))

    # create a new dataframe with the new columns
    neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
    neighborhoods_venues_sorted['Neighbourhood'] = toronto_grouped_by_Neighbourhood['Neighbourhood']

    for ind in np.arange(toronto_grouped_by_Neighbourhood.shape[0]):
        neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(
            toronto_grouped_by_Neighbourhood.iloc[ind, :], num_top_venues)

    return neighborhoods_venues_sorted
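Two details of this function can be sketched in plain Python. First, the `indicators` fallback labels every rank past 3rd with "th", which is right up to 20 but would give "21th"; the usual English rule treats 11 to 13 specially and otherwise looks at the last digit. Second, pandas' `sort_values` defaults to quicksort, which is not stable, so the order among tied frequencies is not guaranteed; breaking ties alphabetically makes the ranking reproducible. The helpers `ordinal` and `top_venues` and the frequency values are hypothetical:

```python
def ordinal(n):
    """1 -> '1st', 2 -> '2nd', 3 -> '3rd', 11 -> '11th', 21 -> '21st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

def top_venues(frequencies, num_top_venues=10):
    """Category names sorted by descending frequency, ties broken alphabetically."""
    ranked = sorted(frequencies.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:num_top_venues]]

columns = ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 4)]
# hypothetical frequencies with a tie between Steakhouse and Bakery
freqs = {'Coffee Shop': 0.072727, 'Cocktail Bar': 0.054545, 'Steakhouse': 0.036364, 'Bakery': 0.036364}
print(columns)               # ['1st Most Common Venue', '2nd Most Common Venue', '3rd Most Common Venue']
print(top_venues(freqs, 3))  # ['Coffee Shop', 'Cocktail Bar', 'Bakery']
```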

**Create a Dataframe of the top 10 venues for each neighborhood**

In [34]:
num_top_venues = 10
neighborhoods_venues_sorted = make_neighborhoods_venues_sorted(num_top_venues=num_top_venues)
print('df shape:', neighborhoods_venues_sorted.shape)
neighborhoods_venues_sorted.head()

df shape: (39, 11)


,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Steakhouse,Café,Bar,Bakery,Restaurant,Asian Restaurant,Cosmetics Shop,Thai Restaurant,Seafood Restaurant
1,Berczy Park,Coffee Shop,Cocktail Bar,Steakhouse,Cheese Shop,Café,Farmers Market,Beer Bar,Bakery,Seafood Restaurant,Gourmet Shop
2,"Brockton, Exhibition Place, Parkdale Village",Coffee Shop,Café,Breakfast Spot,Grocery Store,Bakery,Office,Performing Arts Venue,Pet Store,Nightclub,Climbing Gym
3,Business Reply Mail Processing Centre 969 Eastern,Pizza Place,Auto Workshop,Brewery,Light Rail Station,Farmers Market,Spa,Fast Food Restaurant,Burrito Place,Restaurant,Recording Studio
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Lounge,Airport Service,Airport Terminal,Plane,Harbor / Marina,Coffee Shop,Rental Car Location,Sculpture Garden,Boat or Ferry,Bar


## 6. Cluster Neighborhoods

The next cell defines a function that automates building the clusters from the `toronto_grouped` and `toronto_neighborhoods` DataFrames.
It defaults to 5 clusters and to the top 10 venues per neighbourhood group (postcode row).
Wrapping the process in a function makes it easy to experiment and iterate with different numbers of clusters.

In [35]:
def make_clustes(toronto_neighborhoods=toronto_neighborhoods, toronto_data=toronto_grouped, kclusters=5, num_top_venues=10):
    '''Automate the cluster-making process from the toronto_grouped DataFrame.'''
    neighborhoods_venues_sorted = make_neighborhoods_venues_sorted(toronto_data, num_top_venues)
    # Feature matrix: the venue-category frequencies only
    toronto_grouped_clustering = toronto_data.drop(columns='Neighbourhood')

    # Run k-means clustering (n_init=10 matches the default of scikit-learn versions before 1.4)
    kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(toronto_grouped_clustering)

    # Add the cluster labels
    neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

    # Merge with toronto_neighborhoods to add postcode, borough, latitude and longitude
    toronto_merged = toronto_neighborhoods.merge(neighborhoods_venues_sorted, on='Neighbourhood')

    clusters = []
    # For each cluster label, select its rows (Borough, Neighbourhood, Cluster Labels
    # and the ranked venue columns) and append them to the clusters list
    for label in toronto_merged['Cluster Labels'].sort_values().unique():
        clusters.append(
            toronto_merged.loc[toronto_merged['Cluster Labels'] == label, toronto_merged.columns[[1, 2] + list(
                range(5, toronto_merged.shape[1]))]].copy())
    # Return a tuple with the toronto_merged DataFrame and the clusters list
    return toronto_merged, clusters

After experimenting with higher numbers of clusters, I settled on 3 clusters, mainly because with more clusters a relatively large share of them contained only a single data point. That is not necessarily a bad thing, and one could dig deeper with further experimentation and analysis.
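The criterion used to pick the number of clusters (how many clusters end up with a single data point) can be checked systematically for several values of k. This is a minimal sketch, assuming the feature matrix is `toronto_grouped` without its `Neighbourhood` column; `cluster_sizes_by_k` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_sizes_by_k(X, k_values, random_state=0):
    """Hypothetical helper: fit k-means for each k and report cluster sizes.

    Returns {k: {'sizes': [...], 'singletons': int}}, where 'singletons'
    counts the clusters that contain exactly one data point.
    """
    report = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, random_state=random_state, n_init=10).fit_predict(X)
        # bincount gives the number of points per label; minlength keeps empty labels
        sizes = np.bincount(labels, minlength=k)
        report[k] = {'sizes': sizes.tolist(), 'singletons': int((sizes == 1).sum())}
    return report
```

In the notebook this could be called as `cluster_sizes_by_k(toronto_grouped.drop(columns='Neighbourhood'), range(2, 9))`.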

In [36]:
# set number of clusters
kclusters = 3
toronto_data, clusters = make_clustes(kclusters=kclusters, num_top_venues=num_top_venues)
cluster1, cluster2, cluster3 = clusters

DataFrame that includes the cluster label as well as the top 10 most common venues for each neighborhood.

In [37]:
toronto_data.head()

,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636,0,Coffee Shop,Café,Park,Bakery,Pub,Mexican Restaurant,Restaurant,Yoga Studio,Beer Store,Hotel
1,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494,0,Coffee Shop,Park,Gym,Yoga Studio,Burrito Place,Fast Food Restaurant,Italian Restaurant,Juice Bar,Seafood Restaurant,Sandwich Place
2,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937,0,Coffee Shop,Clothing Store,Cosmetics Shop,Café,Japanese Restaurant,Bakery,Tea Room,Italian Restaurant,Pizza Place,Bubble Tea Shop
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Coffee Shop,Café,Restaurant,Bakery,Italian Restaurant,Hotel,Beer Bar,Cocktail Bar,Cosmetics Shop,Breakfast Spot
4,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Neighborhood,Health Food Store,Trail,Pub,Department Store,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop


**Visualize clusters**

**Note: Unfortunately the folium maps do not render on GitHub.**

**Paste the GitHub link to this .ipynb file into nbviewer.org to get the full interactive output.**

To enable easy iteration and experimentation, I put the visualization code in a function.

In [38]:
def visualize_clusters(toronto_data):
    '''Plot each neighbourhood on a folium map, coloured by its cluster label.'''
    kclusters = len(toronto_data['Cluster Labels'].unique())
    # Geocode the map centre (requires network access to Nominatim)
    address = 'Toronto, Ontario'
    geolocator = Nominatim(user_agent="Toronto_explorer")
    location = geolocator.geocode(address)
    latitude, longitude = location.latitude, location.longitude
    # Create the map
    map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)
    # Set the colour scheme: one evenly spaced rainbow colour per cluster
    colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
    rainbow = [colors.rgb2hex(i) for i in colors_array]

    # Add markers to the map (cluster labels run from 0 to kclusters - 1)
    for lat, lon, poi, cluster in zip(toronto_data['Latitude'],
                                      toronto_data['Longitude'],
                                      toronto_data['Neighbourhood'],
                                      toronto_data['Cluster Labels']):
        label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=rainbow[cluster],
            fill=True,
            fill_color=rainbow[cluster],
            fill_opacity=0.7).add_to(map_clusters)

    return map_clusters

In [39]:
visualize_clusters(toronto_data)

## 7. Examine Clusters

Helper function to print the number of:
* boroughs
* neighbourhood groups (rows)
* unique venues

It also prints the unique lists of neighbourhoods and venues.

In [40]:
def print_cluster_info(cluster, max_venues_to_display=None):
    '''Print the borough and row counts, and the unique neighbourhoods and venues of a cluster.'''
    # Add 1 to the label so the first cluster is shown as 1 rather than 0
    print('Cluster{} info:'.format(cluster['Cluster Labels'].unique() + 1))
    boroughs, neighbourhoods = len(cluster.Borough.unique()), cluster.shape[0]
    print('Total number of Boroughs:', boroughs, '; Total number of Neighbourhoods groups (rows):', neighbourhoods)
    print('Neighbourhoods:')
    for i, neighbourhood in enumerate(cluster.Neighbourhood.unique(), start=1):
        print('\t{}: {}'.format(i, neighbourhood))
    # Unique venues across all ten ranked-venue columns
    cluster_venues = pd.Series(cluster.loc[:, '1st Most Common Venue':'10th Most Common Venue'].values.flatten()).unique()
    print('\nTotal number of unique Venues:', len(cluster_venues), end='\n\n')
    print('Venues:')

    for i, venue in enumerate(cluster_venues, start=1):
        print('\t{}: {}'.format(i, venue))
        # Optionally stop after max_venues_to_display venues
        if isinstance(max_venues_to_display, int) and i == max_venues_to_display:
            break

#### Cluster 1

In [41]:
cluster1.head(3)

,Borough,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,Harbourfront,0,Coffee Shop,Café,Park,Bakery,Pub,Mexican Restaurant,Restaurant,Yoga Studio,Beer Store,Hotel
1,Downtown Toronto,Queen's Park,0,Coffee Shop,Park,Gym,Yoga Studio,Burrito Place,Fast Food Restaurant,Italian Restaurant,Juice Bar,Seafood Restaurant,Sandwich Place
2,Downtown Toronto,"Ryerson, Garden District",0,Coffee Shop,Clothing Store,Cosmetics Shop,Café,Japanese Restaurant,Bakery,Tea Room,Italian Restaurant,Pizza Place,Bubble Tea Shop


**Examine the total number of boroughs and neighbourhood groups (rows) in the cluster, and the total number of unique venues**

In [42]:
print_cluster_info(cluster1)

Cluster[1] info:
Total number of Boroughs: 4 ; Total number of Neighbourhoods groups (rows): 37
Neighbourhoods:
	1: Harbourfront
	2: Queen's Park
	3: Ryerson, Garden District
	4: St. James Town
	5: The Beaches
	6: Berczy Park
	7: Central Bay Street
	8: Christie
	9: Adelaide, King, Richmond
	10: Dovercourt Village, Dufferin
	11: Harbourfront East, Toronto Islands, Union Station
	12: Little Portugal, Trinity
	13: The Danforth West, Riverdale
	14: Design Exchange, Toronto Dominion Centre
	15: Brockton, Exhibition Place, Parkdale Village
	16: The Beaches West, India Bazaar
	17: Commerce Court, Victoria Hotel
	18: Studio District
	19: Lawrence Park
	20: Davisville North
	21: Forest Hill North, Forest Hill West
	22: High Park, The Junction South
	23: North Toronto West
	24: The Annex, North Midtown, Yorkville
	25: Parkdale, Roncesvalles
	26: Davisville
	27: Harbord, University of Toronto
	28: Runnymede, Swansea
	29: Chinatown, Grange Park, Kensington Market
	30: Deer Park, Forest Hill SE, Ra

**Create a custom score for each venue in the cluster**

I will create a score for each venue that counts all of its occurrences in the cluster, weighting each occurrence by rank (an occurrence as the 1st most common venue weighs more than one as the 2nd, and so on), and then divides that weighted sum by the number of neighbourhood groups (rows) in the cluster.
This gives a ranking measure for the venues within the cluster.

**Weights selection**

I chose a simple linear weighting in which the 1st most common venue has ten times the weight of the 10th most common venue (weights from 10 down to 1).

In [43]:
weights_series = np.linspace(10, 1, 10)
weights_series

array([10.,  9.,  8.,  7.,  6.,  5.,  4.,  3.,  2.,  1.])
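In symbols, the score described above can be written as follows, where $N$ is the number of neighbourhood groups (rows) in the cluster, $\text{venue}_{r,k}$ is the $k$-th most common venue of row $r$, and $w_k$ are the entries of `weights_series` ($w_1 = 10, \dots, w_{10} = 1$):

$$
\text{Score}(v) = \frac{1}{N}\sum_{r=1}^{N}\sum_{k=1}^{10} w_k\,\mathbb{1}\!\left[\text{venue}_{r,k} = v\right], \qquad w_k = 11 - k
$$

For example, a venue that appears only once, as the 1st most common venue of a single row in a cluster of $N = 37$ rows, scores $10/37 \approx 0.270$.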

In [44]:
venues = pd.Series(cluster1.loc[:,'1st Most Common Venue':'10th Most Common Venue'].values.flatten()).unique()
venues = pd.DataFrame(venues, columns=['Venue'])
venues['cartesian_product_join_key'] = 1
venues.head()

,Venue,cartesian_product_join_key
0,Coffee Shop,1
1,Café,1
2,Park,1
3,Bakery,1
4,Pub,1


Create a cartesian product of all the venues with all the rows of the cluster, by merging on a constant join key.

In [45]:
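# Merge on a constant key so every venue is paired with every cluster row;
# assign() adds the key to a copy, leaving cluster1 itself unchanged
feature_venues = venues.merge(cluster1.assign(cartesian_product_join_key=1), on='cartesian_product_join_key')
feature_venues = feature_venues.drop(columns=['cartesian_product_join_key'])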
feature_venues.head()

,Venue,Borough,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Coffee Shop,Downtown Toronto,Harbourfront,0,Coffee Shop,Café,Park,Bakery,Pub,Mexican Restaurant,Restaurant,Yoga Studio,Beer Store,Hotel
1,Coffee Shop,Downtown Toronto,Queen's Park,0,Coffee Shop,Park,Gym,Yoga Studio,Burrito Place,Fast Food Restaurant,Italian Restaurant,Juice Bar,Seafood Restaurant,Sandwich Place
2,Coffee Shop,Downtown Toronto,"Ryerson, Garden District",0,Coffee Shop,Clothing Store,Cosmetics Shop,Café,Japanese Restaurant,Bakery,Tea Room,Italian Restaurant,Pizza Place,Bubble Tea Shop
3,Coffee Shop,Downtown Toronto,St. James Town,0,Coffee Shop,Café,Restaurant,Bakery,Italian Restaurant,Hotel,Beer Bar,Cocktail Bar,Cosmetics Shop,Breakfast Spot
4,Coffee Shop,East Toronto,The Beaches,0,Neighborhood,Health Food Store,Trail,Pub,Department Store,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
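A note on the design: the constant `cartesian_product_join_key` column is a workaround for a cross join. pandas 1.2 and later support this directly with `merge(how='cross')`, so the helper key is not needed there. A minimal sketch on toy data (the venue and neighbourhood values are made up for illustration):

```python
import pandas as pd

venues = pd.DataFrame({'Venue': ['Coffee Shop', 'Park']})
cluster = pd.DataFrame({
    'Neighbourhood': ['A', 'B', 'C'],
    '1st Most Common Venue': ['Coffee Shop', 'Park', 'Bar'],
})
# how='cross' pairs every venue with every cluster row (requires pandas >= 1.2)
feature_venues = venues.merge(cluster, how='cross')
print(feature_venues.shape)  # (6, 3)
```

In the notebook the equivalent would be `venues[['Venue']].merge(cluster1, how='cross')`.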


For each row, check whether the 'Venue' column equals each of the most common venue columns.

In [46]:
venues_ = feature_venues.loc[:,'1st Most Common Venue':'10th Most Common Venue'].apply(lambda col: col == feature_venues.Venue)
venues_.head()

,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,True,False,False,False,False,False,False,False,False,False
1,True,False,False,False,False,False,False,False,False,False
2,True,False,False,False,False,False,False,False,False,False
3,True,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False
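The column-by-column `apply` above compares each ranked-venue column with the `Venue` column element-wise. The same boolean mask can be built in one call with `DataFrame.eq(..., axis=0)`, which aligns the `Venue` Series with the rows. A small sketch on made-up data:

```python
import pandas as pd

rank_cols = ['1st Most Common Venue', '2nd Most Common Venue']
feature_venues = pd.DataFrame({
    'Venue': ['Cafe', 'Cafe', 'Park'],
    '1st Most Common Venue': ['Cafe', 'Park', 'Park'],
    '2nd Most Common Venue': ['Park', 'Cafe', 'Bar'],
})
# axis=0 aligns the Series on the row index, so each row is compared
# against its own 'Venue' value
matches = feature_venues[rank_cols].eq(feature_venues['Venue'], axis=0)
print(matches.to_numpy().tolist())  # [[True, False], [False, True], [True, False]]
```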


Apply the weights

In [47]:
venues_ = venues_ * weights_series # apply weights
venues_['Venue'] = feature_venues.Venue
venues_.head()

,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Venue
0,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Coffee Shop
1,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Coffee Shop
2,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Coffee Shop
3,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Coffee Shop
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Coffee Shop


Group by venue and sum the weighted values

In [48]:
grouped_venues = venues_.groupby('Venue').sum()
grouped_venues.head()

Venue,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
Airport Lounge,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Airport Service,0.0,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Airport Terminal,0.0,0.0,8.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
American Restaurant,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0
Aquarium,0.0,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [49]:
grouped_venues['Score'] = grouped_venues.sum(axis=1)
grouped_venues['Score'] = grouped_venues['Score'] / cluster1.shape[0]
grouped_venues.head()

Venue,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Score
Airport Lounge,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.27027
Airport Service,0.0,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.243243
Airport Terminal,0.0,0.0,8.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.216216
American Restaurant,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.108108
Aquarium,0.0,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.243243


**Top 10 Venues in the cluster**

In [50]:
grouped_venues = grouped_venues.sort_values(by='Score', ascending=False, axis=0)
grouped_venues[['Score']].head(10)

Venue,Score
Coffee Shop,6.162162
Café,4.945946
Restaurant,2.918919
Italian Restaurant,2.540541
Park,2.216216
Bakery,1.675676
Hotel,1.486486
Sandwich Place,1.378378
Bar,1.324324
Yoga Studio,1.243243


**Distinguishing Characteristics**

This is by far the biggest cluster (37 of the 39 neighbourhood groups). Its most common venues are amenities such as coffee shops, cafés, restaurants and bars, and the presence of hotels points to a vibrant area that attracts tourists.

**Because the same analysis is needed for the other clusters, I turn the score computation into a function**

In [51]:
def get_scores(cluster, weights_series=weights_series):
    '''Return a DataFrame with the custom Score for each venue, sorted in descending order.'''
    # Create a DataFrame of the unique venues in the cluster
    venues = pd.Series(cluster.loc[:, '1st Most Common Venue':'10th Most Common Venue'].values.flatten()).unique()
    venues = pd.DataFrame(venues, columns=['Venue'])
    # Cartesian product of the venues with the cluster rows (on a constant key);
    # assign() works on copies, so the caller's cluster DataFrame is not modified
    feature_venues = venues.assign(cartesian_product_join_key=1).merge(
        cluster.assign(cartesian_product_join_key=1), on='cartesian_product_join_key')
    feature_venues = feature_venues.drop(columns=['cartesian_product_join_key'])
    # For each row, check whether Venue equals each of the most common venue columns
    venues_ = feature_venues.loc[:, '1st Most Common Venue':'10th Most Common Venue'].apply(
        lambda col: col == feature_venues.Venue)

    venues_ = venues_ * weights_series  # apply the rank weights
    venues_['Venue'] = feature_venues.Venue
    # Group by venue and sum the weighted values
    grouped_venues = venues_.groupby('Venue').sum()
    # Score = total weighted count divided by the number of rows in the cluster
    grouped_venues['Score'] = grouped_venues.sum(axis=1)
    grouped_venues['Score'] = grouped_venues['Score'] / cluster.shape[0]
    # Sort in descending order by score
    grouped_venues = grouped_venues.sort_values(by='Score', ascending=False, axis=0)
    return grouped_venues


def get_top_n_venues(cluster, n=10):
    '''Return the top n venues by score, using the get_scores function.'''
    return get_scores(cluster)[['Score']].head(n)
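For comparison, here is a sketch of the same score computed without the cartesian product. Reshaping the ranked-venue columns to long format with `melt` yields one row per (neighbourhood group, rank) pair, so each venue's weighted count becomes a single `groupby` sum. `get_scores_melted` is a hypothetical name; it assumes the ranked columns end with 'Most Common Venue' and appear in rank order.

```python
import numpy as np
import pandas as pd


def get_scores_melted(cluster, n_top=10):
    """Hypothetical alternative to get_scores: same rank-weighted score.

    Assumes the ranked-venue columns end with 'Most Common Venue' and are
    ordered from 1st to last; the k-th of n columns gets weight n - k + 1.
    """
    rank_cols = [c for c in cluster.columns if c.endswith('Most Common Venue')][:n_top]
    n = len(rank_cols)
    weights = dict(zip(rank_cols, np.arange(n, 0, -1, dtype=float)))
    # Long format: one row per (cluster row, rank column) with the venue name
    long = cluster[rank_cols].melt(var_name='Rank', value_name='Venue')
    long['Weight'] = long['Rank'].map(weights)
    # Sum the weights per venue and divide by the number of rows in the cluster
    scores = long.groupby('Venue')['Weight'].sum() / len(cluster)
    return scores.sort_values(ascending=False).rename('Score').to_frame()
```

With ten ranked columns the weights are 10 down to 1, so `get_scores_melted(cluster2)` should give the same Score values as `get_top_n_venues(cluster2)`, although venues with equal scores may be listed in a different order.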

#### Cluster 2

In [52]:
cluster2

,Borough,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
29,Central Toronto,"Moore Park, Summerhill East",1,Playground,Summer Camp,Yoga Studio,Dessert Shop,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop


**Examine the total number of boroughs and neighbourhood groups (rows) in the cluster, and the total number of unique venues**

In [53]:
print_cluster_info(cluster2)

Cluster[2] info:
Total number of Boroughs: 1 ; Total number of Neighbourhoods groups (rows): 1
Neighbourhoods:
	1: Moore Park, Summerhill East

Total number of unique Venues: 10

Venues:
	1: Playground
	2: Summer Camp
	3: Yoga Studio
	4: Dessert Shop
	5: Event Space
	6: Ethiopian Restaurant
	7: Electronics Store
	8: Eastern European Restaurant
	9: Dumpling Restaurant
	10: Donut Shop


Create the custom score

In [54]:
get_scores(cluster2)

Venue,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Score
Playground,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,10.0
Summer Camp,0.0,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,9.0
Yoga Studio,0.0,0.0,8.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.0
Dessert Shop,0.0,0.0,0.0,7.0,0.0,0.0,0.0,0.0,0.0,0.0,7.0
Event Space,0.0,0.0,0.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,6.0
Ethiopian Restaurant,0.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,5.0
Electronics Store,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,4.0
Eastern European Restaurant,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,3.0
Dumpling Restaurant,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,2.0
Donut Shop,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0


**Top 10 Venues in the cluster**

In [55]:
get_top_n_venues(cluster2, n=10)

Venue,Score
Playground,10.0
Summer Camp,9.0
Yoga Studio,8.0
Dessert Shop,7.0
Event Space,6.0
Ethiopian Restaurant,5.0
Electronics Store,4.0
Eastern European Restaurant,3.0
Dumpling Restaurant,2.0
Donut Shop,1.0


**Distinguishing Characteristics**

This cluster holds a single neighbourhood group (Moore Park, Summerhill East), which is most likely a residential area for families: its top venues are a playground, a summer camp and a yoga studio, along with some stores and a few restaurants.

#### Cluster 3

In [56]:
cluster3

,Borough,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,Central Toronto,Roselawn,2,Garden,Yoga Studio,Dessert Shop,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop


**Examine the total number of boroughs and neighbourhood groups (rows) in the cluster, and the total number of unique venues**

In [57]:
print_cluster_info(cluster3)

Cluster[3] info:
Total number of Boroughs: 1 ; Total number of Neighbourhoods groups (rows): 1
Neighbourhoods:
	1: Roselawn

Total number of unique Venues: 10

Venues:
	1: Garden
	2: Yoga Studio
	3: Dessert Shop
	4: Falafel Restaurant
	5: Event Space
	6: Ethiopian Restaurant
	7: Electronics Store
	8: Eastern European Restaurant
	9: Dumpling Restaurant
	10: Donut Shop


Create the custom score

In [58]:
get_scores(cluster3)

Venue,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Score
Garden,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,10.0
Yoga Studio,0.0,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,9.0
Dessert Shop,0.0,0.0,8.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.0
Falafel Restaurant,0.0,0.0,0.0,7.0,0.0,0.0,0.0,0.0,0.0,0.0,7.0
Event Space,0.0,0.0,0.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,6.0
Ethiopian Restaurant,0.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,5.0
Electronics Store,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,4.0
Eastern European Restaurant,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,3.0
Dumpling Restaurant,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,2.0
Donut Shop,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0


**Top 10 Venues in the cluster**

In [59]:
get_top_n_venues(cluster3, n=10)

Venue,Score
Garden,10.0
Yoga Studio,9.0
Dessert Shop,8.0
Falafel Restaurant,7.0
Event Space,6.0
Ethiopian Restaurant,5.0
Electronics Store,4.0
Eastern European Restaurant,3.0
Dumpling Restaurant,2.0
Donut Shop,1.0


**Distinguishing Characteristics**

This is one of the two smallest clusters (like cluster 2, it holds only one neighbourhood, Roselawn). It seems to be a well-established neighbourhood, with a variety of recreational activities as well as various ethnic restaurants.