# Introduction

In this project I explore, segment, and cluster the neighborhoods in the city of Toronto.

The Toronto neighborhood data comes from a Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, which has all the information needed to explore and cluster the neighborhoods in Toronto. The first step is to scrape the page, clean and wrangle the data, and read it into a pandas dataframe so that it is in a structured format.

With the data in a structured format, I conduct the analysis to explore and cluster the neighborhoods in the city of Toronto.

### Import necessary libraries and fetch the Wikipedia page

In [1]:
import requests
website_url = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

from bs4 import BeautifulSoup
soup = BeautifulSoup(website_url,'lxml')
print(soup.prettify())

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   List of postal codes of Canada: M - Wikipedia
  </title>
  <script>
   document.documentElement.className=document.documentElement.className.replace(/(^|\s)client-nojs(\s|$)/,"$1client-js$2");RLCONF={"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_postal_codes_of_Canada:_M","wgTitle":"List of postal codes of Canada: M","wgCurRevisionId":906439794,"wgRevisionId":906439794,"wgArticleId":539066,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Communications in Ontario","Postal codes in Canada","Toronto","Ontario-related lists"],"wgBreakFrames":!1,"wgPageContentLanguage":"en","wgPageContentModel":"wikitext","wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June",

### Inspecting the output shows that the data is available in a table with class="wikitable sortable", so extract only that table

In [2]:
Table1 = soup.find('table',{'class':'wikitable sortable'})
Table1

<table class="wikitable sortable">
<tbody><tr>
<th>Postcode</th>
<th>Borough</th>
<th>Neighbourhood
</th></tr>
<tr>
<td>M1A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M2A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Parkwoods" title="Parkwoods">Parkwoods</a>
</td></tr>
<tr>
<td>M4A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Victoria_Village" title="Victoria Village">Victoria Village</a>
</td></tr>
<tr>
<td>M5A</td>
<td><a href="/wiki/Downtown_Toronto" title="Downtown Toronto">Downtown Toronto</a></td>
<td><a href="/wiki/Harbourfront_(Toronto)" title="Harbourfront (Toronto)">Harbourfront</a>
</td></tr>
<tr>
<td>M5A</td>
<td><a href="/wiki/Downtown_Toronto" title="Downtown Toronto">Downtown Toronto</a></td>
<td><a href="/wiki/Regent_Park" title="Regent Park">Regent Park</a>
</td></tr>
<tr>
<td>M6A</td>

### The header row contains the 3 column names

In [3]:
print(Table1.tr.text)


Postcode
Borough
Neighbourhood



In [4]:
headers="Postcode,Borough,Neighborhood"

### Get all rows ("tr") and separate the cells ("td") within each row by a comma

In [5]:
table2=""
for tr in Table1.find_all('tr'):
    row1=""
    for tds in tr.find_all('td'):
        row1=row1+","+tds.text
    table2=table2+row1[1:]
print(table2)

M1A,Not assigned,Not assigned
M2A,Not assigned,Not assigned
M3A,North York,Parkwoods
M4A,North York,Victoria Village
M5A,Downtown Toronto,Harbourfront
M5A,Downtown Toronto,Regent Park
M6A,North York,Lawrence Heights
M6A,North York,Lawrence Manor
M7A,Queen's Park,Not assigned
M8A,Not assigned,Not assigned
M9A,Etobicoke,Islington Avenue
M1B,Scarborough,Rouge
M1B,Scarborough,Malvern
M2B,Not assigned,Not assigned
M3B,North York,Don Mills North
M4B,East York,Woodbine Gardens
M4B,East York,Parkview Hill
M5B,Downtown Toronto,Ryerson
M5B,Downtown Toronto,Garden District
M6B,North York,Glencairn
M7B,Not assigned,Not assigned
M8B,Not assigned,Not assigned
M9B,Etobicoke,Cloverdale
M9B,Etobicoke,Islington
M9B,Etobicoke,Martin Grove
M9B,Etobicoke,Princess Gardens
M9B,Etobicoke,West Deane Park
M1C,Scarborough,Highland Creek
M1C,Scarborough,Rouge Hill
M1C,Scarborough,Port Union
M2C,Not assigned,Not assigned
M3C,North York,Flemingdon Park
M3C,North York,Don Mills South
M4C,East York,Woodbine Heights
M

### Write the data into a .csv file

In [6]:
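# The with block closes the file, so the data is flushed to disk before pandas reads it
with open("toronto.csv","wb") as file_tor:
    #file_tor.write(bytes(headers,encoding="ascii",errors="ignore"))
    bytes_written=file_tor.write(bytes(table2,encoding="ascii",errors="ignore"))
bytes_written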

8738
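
Joining fields with commas by hand works here because none of the scraped values shown above contains a comma. As a hedged alternative, the standard-library `csv` module quotes such fields automatically. In this minimal sketch the first two rows are copied from the output above, the third row is made up to show the quoting, and an in-memory buffer stands in for the file.

```python
import csv
import io

# Rows as lists of fields. The first two come from the scraped output above;
# the third is a made-up row whose last field contains a comma.
rows = [
    ["M3A", "North York", "Parkwoods"],
    ["M4A", "North York", "Victoria Village"],
    ["M0X", "Example Borough", "First, Second"],  # hypothetical
]

buffer = io.StringIO()  # stands in for open("toronto.csv", "w", newline="")
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Postalcode", "Borough", "Neighborhood"])  # header row
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

With a header row in the file, `pd.read_csv` can be called without `header=None` and the column names do not need to be assigned afterwards.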

### Convert into a pandas dataframe

In [7]:
import pandas as pd
df = pd.read_csv('toronto.csv',header=None)
df.columns=["Postalcode","Borough","Neighborhood"]



df.head(10)

Unnamed: 0,Postalcode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Not assigned
9,M8A,Not assigned,Not assigned


### Keep only the rows that have an assigned Borough. Drop rows whose Borough is "Not assigned".

In [8]:
# Get names of indexes for which the column Borough has value "Not assigned"
index_names = df[ df['Borough'] =='Not assigned'].index

# Delete these row indexes from dataframe
df.drop(index_names , inplace=True)

df.head(10)

Unnamed: 0,Postalcode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Not assigned
10,M9A,Etobicoke,Islington Avenue
11,M1B,Scarborough,Rouge
12,M1B,Scarborough,Malvern
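
The same filter can also be written as a boolean mask, which avoids the in-place `drop` and makes it easy to renumber the index. This is a small sketch on a stand-in dataframe built from rows shown above, not a replacement for the cell itself.

```python
import pandas as pd

# Small stand-in for df, using rows from the output above.
df = pd.DataFrame(
    {
        "Postalcode": ["M1A", "M2A", "M3A", "M7A"],
        "Borough": ["Not assigned", "Not assigned", "North York", "Queen's Park"],
        "Neighborhood": ["Not assigned", "Not assigned", "Parkwoods", "Not assigned"],
    }
)

# Keep rows whose Borough is assigned; reset_index renumbers the rows from 0.
assigned = df[df["Borough"] != "Not assigned"].reset_index(drop=True)
print(assigned)
```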


### If a row has a Borough but a "Not assigned" Neighborhood, set the Neighborhood to be identical to the Borough

In [9]:
df.loc[df['Neighborhood'] =='Not assigned' , 'Neighborhood'] = df['Borough']
df.head(10)

Unnamed: 0,Postalcode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Queen's Park
10,M9A,Etobicoke,Islington Avenue
11,M1B,Scarborough,Rouge
12,M1B,Scarborough,Malvern
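
As a quick check on this step, the sketch below performs the same replacement with `Series.mask` on a two-row stand-in (rows taken from the output above) and then counts how many placeholders remain.

```python
import pandas as pd

# Stand-in for df, using two rows from the output above.
df = pd.DataFrame(
    {
        "Postalcode": ["M3A", "M7A"],
        "Borough": ["North York", "Queen's Park"],
        "Neighborhood": ["Parkwoods", "Not assigned"],
    }
)

unassigned = df["Neighborhood"] == "Not assigned"
# mask() replaces values where the condition is True with the aligned Borough value.
df["Neighborhood"] = df["Neighborhood"].mask(unassigned, df["Borough"])

# Sanity check: no placeholder should remain.
remaining = int((df["Neighborhood"] == "Not assigned").sum())
print(df)
print("Rows still unassigned:", remaining)
```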


### More than one Neighborhood can exist in one postal code area. When this is the case, combine the rows into one, with the Neighborhoods separated by commas.

In [10]:
combo = df.groupby(['Postalcode','Borough'], sort=False).agg( ','.join)

df2=combo.reset_index()
df2.head(20)

Unnamed: 0,Postalcode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Harbourfront,Regent Park"
3,M6A,North York,"Lawrence Heights,Lawrence Manor"
4,M7A,Queen's Park,Queen's Park
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Rouge,Malvern"
7,M3B,North York,Don Mills North
8,M4B,East York,"Woodbine Gardens,Parkview Hill"
9,M5B,Downtown Toronto,"Ryerson,Garden District"
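
To make the grouping step easier to follow, here is a minimal sketch on five rows taken from the earlier output. It selects the `Neighborhood` column explicitly before aggregating, which is an optional variation on the cell above rather than a change in behavior.

```python
import pandas as pd

# Stand-in for df, using rows from the earlier output.
df = pd.DataFrame(
    {
        "Postalcode": ["M3A", "M5A", "M5A", "M6A", "M6A"],
        "Borough": ["North York", "Downtown Toronto", "Downtown Toronto",
                    "North York", "North York"],
        "Neighborhood": ["Parkwoods", "Harbourfront", "Regent Park",
                         "Lawrence Heights", "Lawrence Manor"],
    }
)

# sort=False keeps the groups in order of first appearance.
# ",".join concatenates the neighborhood names within each group.
combined = (
    df.groupby(["Postalcode", "Borough"], sort=False)["Neighborhood"]
    .agg(",".join)
    .reset_index()
)
print(combined)
```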


### Print the shape of the dataframe: 103 rows and 3 columns

In [11]:
df2.shape

(103, 3)

### Use the .csv file from http://cocl.us/Geospatial_data to create a dataframe with the latitude and longitude of each postal code.

In [12]:
coords=pd.read_csv('http://cocl.us/Geospatial_data')

In [13]:
coords.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


### Merge dataframes

In [14]:
coords.rename(columns={'Postal Code':'Postalcode'},inplace=True)
coords_merged = pd.merge(coords, df2, on='Postalcode')
coords_merged.head()

Unnamed: 0,Postalcode,Latitude,Longitude,Borough,Neighborhood
0,M1B,43.806686,-79.194353,Scarborough,"Rouge,Malvern"
1,M1C,43.784535,-79.160497,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,43.763573,-79.188711,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,43.770992,-79.216917,Scarborough,Woburn
4,M1H,43.773136,-79.239476,Scarborough,Cedarbrae
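
The default inner merge silently drops postal codes that appear in only one of the two dataframes. One way to see whether that happens is a left merge with `indicator=True`, optionally with `validate` to assert that each postal code occurs once per side. In the sketch below the `M1B` and `M1C` rows come from the outputs above, while `M0X` is a made-up postal code with no coordinates.

```python
import pandas as pd

coords = pd.DataFrame(
    {
        "Postalcode": ["M1B", "M1C"],
        "Latitude": [43.806686, 43.784535],
        "Longitude": [-79.194353, -79.160497],
    }
)
df2 = pd.DataFrame(
    {
        "Postalcode": ["M1B", "M1C", "M0X"],  # "M0X" is hypothetical
        "Borough": ["Scarborough", "Scarborough", "Example Borough"],
        "Neighborhood": ["Rouge,Malvern", "Highland Creek,Rouge Hill,Port Union", "Example"],
    }
)

# how="left" keeps every row of df2; validate raises if a postal code is duplicated;
# indicator=True adds a "_merge" column saying where each row was found.
merged = pd.merge(df2, coords, on="Postalcode", how="left",
                  validate="one_to_one", indicator=True)

# Rows that found no coordinates.
missing = merged[merged["_merge"] == "left_only"]
print(missing[["Postalcode", "Borough"]])
```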


### Reorder the columns and print the table head

In [15]:
# Correct the sequence of data
coords_df2=coords_merged[['Postalcode','Borough','Neighborhood','Latitude','Longitude']]
coords_df2.head()

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [16]:
coords_df2.shape

(103, 5)

### Install and import necessary libraries

In [17]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Libraries imported.


### Find coordinates of Toronto

In [18]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="Toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


### Create map of Toronto using folium and add markers for neighbourhoods/boroughs

In [19]:
# Create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# Add markers to map
for lat, lng, borough, neighborhood in zip(coords_df2['Latitude'], coords_df2['Longitude'], coords_df2['Borough'], coords_df2['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### Now, let's do the same, but for the 'Downtown Toronto' Borough only.

In [20]:
DownT = coords_df2[coords_df2['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
DownT

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude
0,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529
1,M4X,Downtown Toronto,"Cabbagetown,St. James Town",43.667967,-79.367675
2,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316
3,M5A,Downtown Toronto,"Harbourfront,Regent Park",43.65426,-79.360636
4,M5B,Downtown Toronto,"Ryerson,Garden District",43.657162,-79.378937
5,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
6,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
7,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
8,M5H,Downtown Toronto,"Adelaide,King,Richmond",43.650571,-79.384568
9,M5J,Downtown Toronto,"Harbourfront East,Toronto Islands,Union Station",43.640816,-79.381752


In [21]:
DownT.shape

(18, 5)

In [22]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="Tor_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Downtown Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Downtown Toronto are 43.653963, -79.387207.


In [23]:
# Create map of Toronto using latitude and longitude values
toronto_latitude = 43.6539; toronto_longitude = -79.3872
map_DownT = folium.Map(location=[toronto_latitude, toronto_longitude], zoom_start=13)

# Add markers to map
for lat, lng, borough, neighborhood in zip(DownT['Latitude'], DownT['Longitude'], DownT['Borough'], DownT['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_DownT)  
    
map_DownT

### Next, we are going to use Foursquare's API to explore the neighborhoods and segment them.

### Define Foursquare Credentials and Version

In [24]:
CLIENT_ID = 'WZYT5YWOESAEIPU3WBWLQ5UYVUXC1OFKQXB1V4USLJUO5ITZ' # your Foursquare ID
CLIENT_SECRET = 'QZ355UJ3GDVVYU1FXA2CYTJC0H4BG4OZVDXRAJOHMQBIG2YF' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: WZYT5YWOESAEIPU3WBWLQ5UYVUXC1OFKQXB1V4USLJUO5ITZ
CLIENT_SECRET:QZ355UJ3GDVVYU1FXA2CYTJC0H4BG4OZVDXRAJOHMQBIG2YF


### Now, let's get up to 100 venues within a 10 km radius of each Downtown Toronto neighborhood

In [25]:
def getNearbyVenues(names, latitudes, longitudes, radius=10000, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius,
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
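
The function above indexes straight into the JSON response, so an error response or a venue without categories raises `KeyError` or `IndexError`. The sketch below shows one defensive way to flatten a response into rows. The helper name `extract_venue_rows` and the sample payload are assumptions for illustration; the payload only mirrors the keys that the function above reads, the first venue is taken from the results shown later, and the second is made up.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Flatten one explore-style response into row tuples; return [] if nothing usable."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        # A venue with no categories would raise IndexError on categories[0].
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows


# Hypothetical payload with the same nesting that getNearbyVenues expects.
sample_payload = {
    "response": {
        "groups": [
            {
                "items": [
                    {"venue": {"name": "Summerhill Market",
                               "location": {"lat": 43.686265, "lng": -79.375458},
                               "categories": [{"name": "Grocery Store"}]}},
                    {"venue": {"name": "Example Venue",  # made-up, no categories
                               "location": {"lat": 43.68, "lng": -79.38},
                               "categories": []}},
                ]
            }
        ]
    }
}

rows = extract_venue_rows("Rosedale", 43.679563, -79.377529, sample_payload)
for row in rows:
    print(row)
```

Passing an empty dictionary, as an error response might effectively be, returns an empty list instead of raising.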

### Now run the above function on each neighborhood and create a new dataframe called DownT_venues.

In [27]:
DownT_venues = getNearbyVenues(names=DownT['Neighborhood'],
                                   latitudes=DownT['Latitude'],
                                   longitudes=DownT['Longitude']
                                  )

Rosedale
Cabbagetown,St. James Town
Church and Wellesley
Harbourfront,Regent Park
Ryerson,Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide,King,Richmond
Harbourfront East,Toronto Islands,Union Station
Design Exchange,Toronto Dominion Centre
Commerce Court,Victoria Hotel
Harbord,University of Toronto
Chinatown,Grange Park,Kensington Market
CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place,Underground city
Christie


### Check size and resulting dataframe

In [28]:
print(DownT_venues.shape)
DownT_venues.head()

(1800, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Rosedale,43.679563,-79.377529,Summerhill Market,43.686265,-79.375458,Grocery Store
1,Rosedale,43.679563,-79.377529,Greenhouse Juice Co,43.679101,-79.390686,Juice Bar
2,Rosedale,43.679563,-79.377529,Black Camel,43.677016,-79.389367,BBQ Joint
3,Rosedale,43.679563,-79.377529,LCBO,43.681497,-79.391261,Liquor Store
4,Rosedale,43.679563,-79.377529,Evergreen Brick Works,43.684401,-79.365242,Historic Site


### Check how many venues were returned for each Neighbourhood

In [29]:
DownT_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide,King,Richmond",100,100,100,100,100,100
Berczy Park,100,100,100,100,100,100
"CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara",100,100,100,100,100,100
"Cabbagetown,St. James Town",100,100,100,100,100,100
Central Bay Street,100,100,100,100,100,100
"Chinatown,Grange Park,Kensington Market",100,100,100,100,100,100
Christie,100,100,100,100,100,100
Church and Wellesley,100,100,100,100,100,100
"Commerce Court,Victoria Hotel",100,100,100,100,100,100
"Design Exchange,Toronto Dominion Centre",100,100,100,100,100,100


### How many unique categories can be curated from all the returned venues?

In [30]:
print('There are {} unique categories.'.format(len(DownT_venues['Venue Category'].unique())))

There are 99 unique categories.


### What type of venues are in each Neighborhood?

In [31]:
# one hot encoding
DownT_onehot = pd.get_dummies(DownT_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
DownT_onehot['Neighborhood'] = DownT_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [DownT_onehot.columns[-1]] + list(DownT_onehot.columns[:-1])
DownT_onehot = DownT_onehot[fixed_columns]

DownT_onehot.head(20)

Unnamed: 0,Yoga Studio,American Restaurant,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Baseball Stadium,Beach,Beer Bar,Bookstore,Brewery,Bubble Tea Shop,Butcher,Café,Castle,Chocolate Shop,Circus,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Concert Hall,Cosmetics Shop,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Dessert Shop,Diner,Dog Run,Farmers Market,Field,Fish & Chips Shop,Food Truck,French Restaurant,Garden,Gastropub,Gift Shop,Greek Restaurant,Grocery Store,Gym,Harbor / Marina,Historic Site,Hostel,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Korean Restaurant,Lake,Liquor Store,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Monument / Landmark,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant,Organic Grocery,Paper / Office Supplies Store,Park,Performing Arts Venue,Pizza Place,Plaza,Racetrack,Ramen Restaurant,Record Shop,Restaurant,Salad Place,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shoe Store,Snack Place,Soccer Stadium,South American Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Steakhouse,Supermarket,Tapas Restaurant,Thai Restaurant,Theater,Theme Restaurant,Train Station,Vegetarian / Vegan Restaurant,Whisky Bar
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Rosedale,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,Rosedale,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Rosedale,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,Rosedale,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Rosedale,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Rosedale,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Rosedale,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Rosedale,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Rosedale,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Rosedale,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0


### Examine the shape

In [32]:
DownT_onehot.shape

(1800, 99)
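
In the preview above, `Yoga Studio` is the first column, `Neighborhood` sits in the middle, and the column count (99) equals the number of unique categories. A likely explanation, though the output alone does not confirm it, is that one venue category is itself named "Neighborhood": the assignment then overwrites that dummy column in place, and the reordering step moves whichever column happens to be last to the front. The sketch below shows one way to avoid such a clash on a small stand-in dataframe. The row whose category is "Neighborhood" and the renamed label are illustrative assumptions.

```python
import pandas as pd

# Tiny stand-in for DownT_venues; the second venue's category is literally "Neighborhood".
venues = pd.DataFrame(
    {
        "Neighborhood": ["Rosedale", "Rosedale", "Christie"],
        "Venue Category": ["Grocery Store", "Neighborhood", "Café"],
    }
)

onehot = pd.get_dummies(venues["Venue Category"])

# Rename a category column that would clash with the key column
# (the new label is an arbitrary choice).
onehot = onehot.rename(columns={"Neighborhood": "Neighborhood (category)"})

# insert() puts the key at position 0 and raises ValueError if the name already exists.
onehot.insert(0, "Neighborhood", venues["Neighborhood"])
print(onehot)
```

With this approach the key column is always first and no category is lost, so the frame has one more column than there are categories.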

### Group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [33]:
DownT_grouped = DownT_onehot.groupby('Neighborhood').mean().reset_index()
DownT_grouped

Unnamed: 0,Neighborhood,Yoga Studio,American Restaurant,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Baseball Stadium,Beach,Beer Bar,Bookstore,Brewery,Bubble Tea Shop,Butcher,Café,Castle,Chocolate Shop,Circus,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Concert Hall,Cosmetics Shop,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Dessert Shop,Diner,Dog Run,Farmers Market,Field,Fish & Chips Shop,Food Truck,French Restaurant,Garden,Gastropub,Gift Shop,Greek Restaurant,Grocery Store,Gym,Harbor / Marina,Historic Site,Hostel,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Korean Restaurant,Lake,Liquor Store,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Monument / Landmark,Movie Theater,Museum,Music Venue,New American Restaurant,Organic Grocery,Paper / Office Supplies Store,Park,Performing Arts Venue,Pizza Place,Plaza,Racetrack,Ramen Restaurant,Record Shop,Restaurant,Salad Place,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shoe Store,Snack Place,Soccer Stadium,South American Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Steakhouse,Supermarket,Tapas Restaurant,Thai Restaurant,Theater,Theme Restaurant,Train Station,Vegetarian / Vegan Restaurant,Whisky Bar
0,"Adelaide,King,Richmond",0.02,0.01,0.01,0.02,0.01,0.0,0.0,0.0,0.03,0.01,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.04,0.0,0.0,0.0,0.01,0.01,0.08,0.01,0.01,0.01,0.01,0.0,0.0,0.02,0.01,0.01,0.0,0.03,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.03,0.0,0.0,0.0,0.03,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.02,0.01,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.04,0.01,0.03,0.0,0.0,0.0,0.01,0.04,0.01,0.03,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.02,0.02,0.01,0.0,0.01,0.02,0.01,0.01,0.01,0.0
1,Berczy Park,0.02,0.01,0.01,0.02,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.01,0.01,0.01,0.01,0.09,0.01,0.01,0.01,0.01,0.0,0.0,0.01,0.02,0.01,0.0,0.03,0.0,0.0,0.01,0.02,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.04,0.0,0.0,0.0,0.03,0.02,0.0,0.0,0.0,0.01,0.0,0.02,0.02,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.06,0.01,0.03,0.01,0.0,0.0,0.01,0.03,0.01,0.03,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.02,0.02,0.01,0.0,0.01,0.02,0.0,0.01,0.01,0.0
2,"CN Tower,Bathurst Quay,Island airport,Harbourf...",0.02,0.01,0.01,0.03,0.01,0.0,0.0,0.0,0.05,0.01,0.01,0.0,0.02,0.01,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.03,0.03,0.01,0.01,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.04,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.02,0.0,0.01,0.01,0.01,0.01,0.01,0.0,0.01,0.06,0.01,0.04,0.01,0.01,0.0,0.01,0.02,0.01,0.03,0.01,0.02,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.02,0.02,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.0
3,"Cabbagetown,St. James Town",0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.04,0.0,0.01,0.0,0.01,0.0,0.1,0.01,0.02,0.01,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.04,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.02,0.01,0.04,0.02,0.0,0.0,0.01,0.03,0.01,0.01,0.0,0.0,0.01,0.02,0.01,0.01,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.07,0.0,0.03,0.0,0.0,0.0,0.01,0.03,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.02,0.02,0.01,0.0,0.01,0.02,0.01,0.01,0.01,0.0
4,Central Bay Street,0.01,0.01,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.06,0.0,0.0,0.0,0.01,0.02,0.07,0.01,0.02,0.01,0.01,0.0,0.0,0.02,0.01,0.01,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.03,0.01,0.0,0.0,0.03,0.03,0.01,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.01,0.01,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.03,0.0,0.0,0.01,0.01,0.04,0.01,0.04,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.02,0.02,0.01,0.01,0.01,0.02,0.01,0.01,0.02,0.0
5,"Chinatown,Grange Park,Kensington Market",0.02,0.01,0.01,0.03,0.01,0.02,0.0,0.0,0.04,0.01,0.01,0.0,0.01,0.01,0.02,0.0,0.01,0.05,0.0,0.0,0.0,0.01,0.03,0.04,0.01,0.02,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.02,0.01,0.0,0.01,0.04,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.01,0.01,0.01,0.0,0.01,0.01,0.0,0.03,0.0,0.04,0.0,0.0,0.01,0.01,0.03,0.0,0.06,0.01,0.02,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.01,0.01,0.0,0.02,0.01,0.02,0.0,0.01,0.02,0.0
6,Christie,0.01,0.0,0.0,0.02,0.01,0.03,0.0,0.01,0.04,0.04,0.0,0.0,0.02,0.01,0.02,0.01,0.01,0.08,0.01,0.0,0.0,0.0,0.03,0.04,0.0,0.01,0.01,0.0,0.01,0.01,0.02,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.02,0.01,0.03,0.02,0.01,0.01,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.04,0.0,0.04,0.0,0.0,0.01,0.01,0.01,0.0,0.05,0.0,0.02,0.0,0.01,0.0,0.01,0.03,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.01
7,Church and Wellesley,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.06,0.0,0.0,0.0,0.01,0.01,0.08,0.01,0.02,0.01,0.01,0.0,0.0,0.02,0.01,0.01,0.0,0.04,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.02,0.01,0.03,0.01,0.0,0.01,0.01,0.02,0.01,0.01,0.0,0.0,0.01,0.02,0.01,0.01,0.0,0.01,0.02,0.0,0.0,0.01,0.0,0.04,0.0,0.03,0.0,0.0,0.01,0.01,0.04,0.0,0.04,0.01,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.01,0.01,0.01,0.01,0.01,0.01,0.02,0.01,0.01,0.02,0.0
8,"Commerce Court,Victoria Hotel",0.02,0.01,0.01,0.02,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.04,0.0,0.01,0.0,0.01,0.01,0.08,0.01,0.01,0.01,0.01,0.0,0.0,0.02,0.02,0.01,0.0,0.03,0.0,0.0,0.01,0.02,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.03,0.0,0.0,0.0,0.03,0.02,0.0,0.0,0.0,0.01,0.0,0.02,0.02,0.01,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.04,0.01,0.03,0.0,0.0,0.0,0.01,0.03,0.01,0.03,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.02,0.02,0.01,0.0,0.01,0.02,0.01,0.01,0.01,0.0
9,"Design Exchange,Toronto Dominion Centre",0.02,0.01,0.01,0.02,0.01,0.0,0.0,0.0,0.03,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.04,0.0,0.01,0.0,0.01,0.01,0.08,0.01,0.01,0.01,0.01,0.0,0.0,0.01,0.02,0.01,0.0,0.03,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.03,0.0,0.0,0.0,0.03,0.02,0.0,0.0,0.0,0.01,0.0,0.02,0.02,0.01,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.04,0.01,0.03,0.0,0.0,0.0,0.01,0.03,0.01,0.03,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.02,0.02,0.01,0.0,0.01,0.02,0.0,0.01,0.01,0.0


### Check the new size

In [34]:
DownT_grouped.shape

(18, 99)
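
Because every one-hot column holds only 0s and 1s, its mean within a neighborhood is the share of that neighborhood's venues in the category. A small worked example with made-up neighborhoods "A" and "B":

```python
import pandas as pd

# Made-up data: "A" has 4 venues (2 coffee shops, 1 park, 1 other), "B" has 2.
onehot = pd.DataFrame(
    {
        "Neighborhood": ["A", "A", "A", "A", "B", "B"],
        "Coffee Shop": [1, 1, 0, 0, 0, 1],
        "Park": [0, 0, 1, 0, 1, 0],
    }
)

# Mean of each indicator column per neighborhood = fraction of venues in that category.
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

A value of 0.08 in the `Coffee Shop` column of the table above therefore means that 8 of the 100 venues returned for that neighborhood were coffee shops.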

### Print each neighborhood along with the top 10 most common venues for each

In [35]:
num_top_venues = 10

for hood in DownT_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = DownT_grouped[DownT_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide,King,Richmond----
                venue  freq
0         Coffee Shop  0.08
1                Café  0.04
2          Restaurant  0.04
3                Park  0.04
4      Farmers Market  0.03
5         Pizza Place  0.03
6  Italian Restaurant  0.03
7      Sandwich Place  0.03
8              Bakery  0.03
9               Hotel  0.03


----Berczy Park----
                venue  freq
0         Coffee Shop  0.09
1                Park  0.06
2               Hotel  0.04
3      Farmers Market  0.03
4         Pizza Place  0.03
5          Restaurant  0.03
6  Italian Restaurant  0.03
7      Sandwich Place  0.03
8                Café  0.02
9  Mexican Restaurant  0.02


----CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara----
                venue  freq
0                Park  0.06
1              Bakery  0.05
2  Italian Restaurant  0.04
3         Pizza Place  0.04
4        Cocktail Bar  0.03
5         Art Gallery  0.03
6         Coffee Shop  0

### Create new pandas dataframe

### First, write a function to sort the venues in descending order.

In [36]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

### Create the new dataframe and display the top 10 venues for each neighborhood.

In [37]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = DownT_grouped['Neighborhood']

for ind in np.arange(DownT_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(DownT_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head(10)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",Coffee Shop,Café,Park,Restaurant,Hotel,Sandwich Place,Bakery,Farmers Market,Italian Restaurant,Pizza Place
1,Berczy Park,Coffee Shop,Park,Hotel,Sandwich Place,Farmers Market,Italian Restaurant,Pizza Place,Restaurant,Art Gallery,Café
2,"CN Tower,Bathurst Quay,Island airport,Harbourf...",Park,Bakery,Italian Restaurant,Pizza Place,Art Gallery,Cocktail Bar,Coffee Shop,Sandwich Place,Hotel,Seafood Restaurant
3,"Cabbagetown,St. James Town",Coffee Shop,Park,Hotel,Café,Farmers Market,Japanese Restaurant,Pizza Place,Restaurant,Museum,Brewery
4,Central Bay Street,Coffee Shop,Café,Restaurant,Sandwich Place,Italian Restaurant,Pizza Place,Hotel,Japanese Restaurant,Art Gallery,Dance Studio
5,"Chinatown,Grange Park,Kensington Market",Sandwich Place,Café,Coffee Shop,Pizza Place,Bakery,Italian Restaurant,Park,Restaurant,Cocktail Bar,Art Gallery
6,Christie,Café,Sandwich Place,Park,Pizza Place,Bakery,Bar,Coffee Shop,Asian Restaurant,Italian Restaurant,Cocktail Bar
7,Church and Wellesley,Coffee Shop,Café,Sandwich Place,Farmers Market,Park,Restaurant,Pizza Place,Spa,Hotel,Mediterranean Restaurant
8,"Commerce Court,Victoria Hotel",Coffee Shop,Café,Park,Hotel,Sandwich Place,Farmers Market,Italian Restaurant,Pizza Place,Restaurant,Art Gallery
9,"Design Exchange,Toronto Dominion Centre",Coffee Shop,Café,Park,Hotel,Sandwich Place,Bakery,Farmers Market,Italian Restaurant,Pizza Place,Restaurant
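The three-element suffix list used above is enough for ten venues, since everything past the third falls back to "th". If `num_top_venues` were ever raised past 20, labels such as "21th" would appear. A minimal sketch of a more general helper follows; the name `ordinal` is hypothetical and not part of the original notebook.

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11th, 12th, and 13th (and 111th, ...) are exceptions to the last-digit rule.
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


# Build the same column labels as the cell above, for the top 10 venues.
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, 11)
]
print(columns[1], "|", columns[10])
```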


### Now to cluster the neighborhoods

### Run k-means to cluster the neighborhoods into 5 clusters.

In [39]:
# set number of clusters
kclusters = 5

# drop the label column so only the venue-frequency features remain
DownT_grouped_clustering = DownT_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(DownT_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 0, 4, 3, 2, 1, 1, 3, 2, 2])
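A quick way to see how evenly the neighborhoods are spread across clusters is to count the labels. The sketch below uses only the ten labels printed above, copied by hand into a hypothetical `sample_labels` array; in the notebook itself the full `kmeans.labels_` array would be passed instead.

```python
import numpy as np

# The first ten cluster labels, transcribed from the output above.
sample_labels = np.array([2, 0, 4, 3, 2, 1, 1, 3, 2, 2])

# Count how many of these neighborhoods fall into each of the 5 clusters.
cluster_sizes = np.bincount(sample_labels, minlength=5)

for cluster_id, size in enumerate(cluster_sizes):
    print(f"Cluster {cluster_id}: {size} neighborhood(s)")
```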

### Create a new dataframe that includes the clusters as well as the top 10 venues for each neighborhood.

In [40]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster', kmeans.labels_)

DownT_merged = DownT

# join the cluster labels and top venues onto DownT, which already holds latitude/longitude for each neighborhood
DownT_merged = DownT_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

DownT_merged.head(20) # check the last columns!

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529,3,Coffee Shop,Café,Park,Farmers Market,Hotel,Dessert Shop,Restaurant,Sandwich Place,Museum,Japanese Restaurant
1,M4X,Downtown Toronto,"Cabbagetown,St. James Town",43.667967,-79.367675,3,Coffee Shop,Park,Hotel,Café,Farmers Market,Japanese Restaurant,Pizza Place,Restaurant,Museum,Brewery
2,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316,3,Coffee Shop,Café,Sandwich Place,Farmers Market,Park,Restaurant,Pizza Place,Spa,Hotel,Mediterranean Restaurant
3,M5A,Downtown Toronto,"Harbourfront,Regent Park",43.65426,-79.360636,0,Coffee Shop,Park,Hotel,Café,Brewery,Pizza Place,Farmers Market,Restaurant,Sandwich Place,Art Gallery
4,M5B,Downtown Toronto,"Ryerson,Garden District",43.657162,-79.378937,2,Coffee Shop,Restaurant,Café,Park,Farmers Market,Pizza Place,Sandwich Place,Hotel,Steakhouse,Art Gallery
5,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Coffee Shop,Park,Hotel,Café,Farmers Market,Pizza Place,Sandwich Place,Restaurant,Steakhouse,Dance Studio
6,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,0,Coffee Shop,Park,Hotel,Sandwich Place,Farmers Market,Italian Restaurant,Pizza Place,Restaurant,Art Gallery,Café
7,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,2,Coffee Shop,Café,Restaurant,Sandwich Place,Italian Restaurant,Pizza Place,Hotel,Japanese Restaurant,Art Gallery,Dance Studio
8,M5H,Downtown Toronto,"Adelaide,King,Richmond",43.650571,-79.384568,2,Coffee Shop,Café,Park,Restaurant,Hotel,Sandwich Place,Bakery,Farmers Market,Italian Restaurant,Pizza Place
9,M5J,Downtown Toronto,"Harbourfront East,Toronto Islands,Union Station",43.640816,-79.381752,2,Coffee Shop,Park,Hotel,Sandwich Place,Bakery,Café,Farmers Market,Italian Restaurant,Pizza Place,Restaurant


### Finally, let's visualize the resulting clusters

In [41]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(DownT_merged['Latitude'], DownT_merged['Longitude'], DownT_merged['Neighborhood'], DownT_merged['Cluster']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels run from 0 to kclusters-1, so index directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Now we can examine each cluster and determine the venue categories that distinguish it.
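
One way to make that comparison concrete is to average the venue frequencies within each cluster and see which category sits furthest above the overall average. The sketch below is only illustrative: the frequency values and the names `toy`, `cluster_profile`, and `distinctive` are assumptions, not data or variables from this notebook.

```python
import pandas as pd

# Hypothetical per-neighborhood venue frequencies with a cluster label attached.
toy = pd.DataFrame(
    {
        "Cluster": [0, 0, 1, 1],
        "Coffee Shop": [0.30, 0.20, 0.05, 0.05],
        "Café": [0.10, 0.10, 0.25, 0.35],
        "Park": [0.10, 0.20, 0.10, 0.10],
    }
)

# Mean frequency of each venue category within each cluster.
cluster_profile = toy.groupby("Cluster").mean()

# Subtract the overall mean so that widely shared categories do not dominate,
# then pick the category that most exceeds that baseline in each cluster.
overall = toy.drop(columns="Cluster").mean()
distinctive = (cluster_profile - overall).idxmax(axis=1)

print(distinctive)
```

Comparing against the overall mean rather than reading off the raw top venue matters here, because coffee shops rank first in most neighborhoods and so do little to separate one cluster from another.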

### Cluster 0 is notable mainly for coffee shops, parks, and hotels

In [42]:
DownT_merged.loc[DownT_merged['Cluster'] == 0, DownT_merged.columns[[1] + list(range(5, DownT_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Downtown Toronto,0,Coffee Shop,Park,Hotel,Café,Brewery,Pizza Place,Farmers Market,Restaurant,Sandwich Place,Art Gallery
5,Downtown Toronto,0,Coffee Shop,Park,Hotel,Café,Farmers Market,Pizza Place,Sandwich Place,Restaurant,Steakhouse,Dance Studio
6,Downtown Toronto,0,Coffee Shop,Park,Hotel,Sandwich Place,Farmers Market,Italian Restaurant,Pizza Place,Restaurant,Art Gallery,Café
15,Downtown Toronto,0,Coffee Shop,Park,Hotel,Sandwich Place,Farmers Market,Italian Restaurant,Pizza Place,Restaurant,Art Gallery,Café


### Cluster 1 is notable for sandwich places and cafés

In [43]:
DownT_merged.loc[DownT_merged['Cluster'] == 1, DownT_merged.columns[[1] + list(range(5, DownT_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Downtown Toronto,1,Café,Sandwich Place,Coffee Shop,Pizza Place,Italian Restaurant,Bakery,Cocktail Bar,Park,Hotel,Art Gallery
13,Downtown Toronto,1,Sandwich Place,Café,Coffee Shop,Pizza Place,Bakery,Italian Restaurant,Park,Restaurant,Cocktail Bar,Art Gallery
17,Downtown Toronto,1,Café,Sandwich Place,Park,Pizza Place,Bakery,Bar,Coffee Shop,Asian Restaurant,Italian Restaurant,Cocktail Bar


### Cluster 2 is notable for coffee shops, cafés, and parks

In [44]:
DownT_merged.loc[DownT_merged['Cluster'] == 2, DownT_merged.columns[[1] + list(range(5, DownT_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Downtown Toronto,2,Coffee Shop,Restaurant,Café,Park,Farmers Market,Pizza Place,Sandwich Place,Hotel,Steakhouse,Art Gallery
7,Downtown Toronto,2,Coffee Shop,Café,Restaurant,Sandwich Place,Italian Restaurant,Pizza Place,Hotel,Japanese Restaurant,Art Gallery,Dance Studio
8,Downtown Toronto,2,Coffee Shop,Café,Park,Restaurant,Hotel,Sandwich Place,Bakery,Farmers Market,Italian Restaurant,Pizza Place
9,Downtown Toronto,2,Coffee Shop,Park,Hotel,Sandwich Place,Bakery,Café,Farmers Market,Italian Restaurant,Pizza Place,Restaurant
10,Downtown Toronto,2,Coffee Shop,Café,Park,Hotel,Sandwich Place,Bakery,Farmers Market,Italian Restaurant,Pizza Place,Restaurant
11,Downtown Toronto,2,Coffee Shop,Café,Park,Hotel,Sandwich Place,Farmers Market,Italian Restaurant,Pizza Place,Restaurant,Art Gallery
16,Downtown Toronto,2,Coffee Shop,Park,Café,Hotel,Sandwich Place,Pizza Place,Italian Restaurant,Farmers Market,Bakery,Restaurant


### Cluster 3 is notable for coffee shops and cafés

In [45]:
DownT_merged.loc[DownT_merged['Cluster'] == 3, DownT_merged.columns[[1] + list(range(5, DownT_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,3,Coffee Shop,Café,Park,Farmers Market,Hotel,Dessert Shop,Restaurant,Sandwich Place,Museum,Japanese Restaurant
1,Downtown Toronto,3,Coffee Shop,Park,Hotel,Café,Farmers Market,Japanese Restaurant,Pizza Place,Restaurant,Museum,Brewery
2,Downtown Toronto,3,Coffee Shop,Café,Sandwich Place,Farmers Market,Park,Restaurant,Pizza Place,Spa,Hotel,Mediterranean Restaurant


### And Cluster 4, a single neighborhood, is notable mainly for parks

In [46]:
DownT_merged.loc[DownT_merged['Cluster'] == 4, DownT_merged.columns[[1] + list(range(5, DownT_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Downtown Toronto,4,Park,Bakery,Italian Restaurant,Pizza Place,Art Gallery,Cocktail Bar,Coffee Shop,Sandwich Place,Hotel,Seafood Restaurant


### Taken together, coffee shops are the most common venue in most Downtown Toronto neighborhoods, so it seems that people in Toronto really enjoy coffee!
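
That impression can be checked with a simple tally of the "1st Most Common Venue" column. The sketch below transcribes that column by hand from the five cluster tables above (rows 0 to 17) into a hypothetical `first_venue` series; in the notebook itself, `DownT_merged['1st Most Common Venue'].value_counts()` should give the same counts.

```python
import pandas as pd

# "1st Most Common Venue" for rows 0-17, copied from the cluster tables above.
first_venue = pd.Series(
    ["Coffee Shop"] * 12
    + ["Café", "Sandwich Place", "Park", "Coffee Shop", "Coffee Shop", "Café"],
    name="1st Most Common Venue",
)

# Count how often each category ranks first across the 18 neighborhoods.
counts = first_venue.value_counts()
print(counts)
# Coffee Shop ranks first in 14 of the 18 neighborhoods.
```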