# Segment & Cluster Toronto Neighborhoods
by: Diardano Raihan (Indonesia)
<hr>

__Objective__:
- Previously, we retrieved the latitude and longitude coordinates in the `Pre2_Coordinate_Retrieval.ipynb` notebook.


- Now, we will __explore__, __segment__, and __group neighborhoods__ into clusters to find similar neighborhoods in __Toronto City__.


- Specifically, say you currently live in Northwest, Etobicoke, and you have a new job in Downtown Toronto. You want to move there and are looking for a neighborhood similar to your current one. This notebook will help you find neighborhoods in Downtown Toronto that resemble the one you live in now.

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%config IPCompleter.greedy=True
%config IPCompleter.use_jedi=False

## Load Data

Let's import `toronto_poscode_latlng.csv` and turn it into a dataframe:

In [4]:
toronto_df = pd.read_csv('datasets/toronto_poscode_latlng.csv')
print(toronto_df.shape)
toronto_df.head()

(103, 5)


Unnamed: 0,PostalCode,Borough,Neighbourhood,latitude,longitude
0,M3A,North York,Parkwoods,43.75245,-79.32991
1,M4A,North York,Victoria Village,43.73057,-79.31306
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.72327,-79.45042
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66253,-79.39188


As you might have guessed by now, a postal code can cover more than one neighborhood. From now on, we will treat each postal code as a single neighborhood. Let's see how many boroughs and postal codes (neighborhoods) we have:

In [5]:
print('The dataframe has {} boroughs and {} postal codes.'.format(
        len(toronto_df['Borough'].unique()),
        toronto_df.shape[0]
    )
)

The dataframe has 10 boroughs and 103 postal codes.


## Map: Toronto & Neighborhoods

We now have the data required to map each neighborhood's coordinates using the __Folium__ module.

What's left is to define the coordinates of Toronto City itself. We can get them using the __Geopy__ library.

In [6]:
from geopy.geocoders import Nominatim

address = 'Toronto, Ontario'

# Define a unique user_agent
geolocator = Nominatim(user_agent="toronto_explorer")

# Retrieve Toronto coordinate
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto City are 43.6534817, -79.3839347.
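Geopy's `geocode` returns `None` when it finds no match, so reading `.latitude` directly can fail. The sketch below shows one way to guard against that. The helper `coordinates_or_default` and the `fake_geocode` stand-in are hypothetical names used only so the example runs offline; in the notebook you would pass `geolocator.geocode` instead.

```python
def coordinates_or_default(geocode, address, default):
    """Return (latitude, longitude) for address, or `default` if nothing is found."""
    location = geocode(address)
    if location is None:
        return default
    return (location.latitude, location.longitude)


class FakeLocation:
    """Hypothetical stand-in for a geopy Location object."""
    latitude = 43.6534817
    longitude = -79.3839347


def fake_geocode(address):
    """Hypothetical stand-in for geolocator.geocode (no network access)."""
    return FakeLocation() if address == 'Toronto, Ontario' else None


toronto = coordinates_or_default(fake_geocode, 'Toronto, Ontario', (0.0, 0.0))
missing = coordinates_or_default(fake_geocode, 'Nowhere', (0.0, 0.0))
```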


Now, we can see the neighbourhoods superimposed on top of the city:

In [7]:
#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if folium is not installed yet
import folium # map rendering library

# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10.5)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_df['latitude'], toronto_df['longitude'], toronto_df['Borough'], toronto_df['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

## Map: A Borough and Neighborhoods

Before picking the boroughs to compare, let's see how many postal codes each borough contains:

In [8]:
toronto_df.groupby(by='Borough').count().sort_values(by='Neighbourhood', ascending=False)

Borough,PostalCode,Neighbourhood,latitude,longitude
North York,24,24,24,24
Downtown Toronto,19,19,19,19
Scarborough,17,17,17,17
Etobicoke,12,12,12,12
Central Toronto,9,9,9,9
West Toronto,6,6,6,6
East Toronto,5,5,5,5
East York,5,5,5,5
York,5,5,5,5
Mississauga,1,1,1,1


Say you currently live in _Northwest_, __Etobicoke__, and you have a new job in __Downtown Toronto__. You want to move there and are looking for a neighborhood similar to your current one.

We will create a dataframe named __`downtown_df`__ that holds all neighborhoods from both boroughs.

In [9]:
# downtown_df = toronto_df[toronto_df.Borough.str.contains('Toronto') == True].reset_index(drop=True)
downtown_df = toronto_df[(toronto_df['Borough']=='Downtown Toronto') | (toronto_df['Borough']=='Etobicoke')].reset_index(drop=True)
print(downtown_df.shape)
downtown_df.head()

(31, 5)


Unnamed: 0,PostalCode,Borough,Neighbourhood,latitude,longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66253,-79.39188
2,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.66263,-79.52831
3,M5B,Downtown Toronto,"Garden District, Ryerson",43.65739,-79.37804
4,M9B,Etobicoke,"West Deane Park, Princess Gardens, Martin Grov...",43.65034,-79.55362


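We could re-center the map on Downtown Toronto, but here we keep the Toronto City coordinates retrieved earlier, so the cell below is left commented out.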

In [10]:
# address = 'Downtown Toronto, Toronto'

# geolocator = Nominatim(user_agent="toronto_explorer")
# location = geolocator.geocode(address)
# latitude = location.latitude
# longitude = location.longitude
# print('The geographical coordinates of Downtown Toronto are {}, {}.'.format(latitude, longitude))

Let's visualize Downtown Toronto and Etobicoke with their neighborhoods.

In [11]:
# create map of Toronto using latitude and longitude values
map_downtownToronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(downtown_df['latitude'], downtown_df['longitude'], downtown_df['Borough'], downtown_df['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_downtownToronto)  
    
map_downtownToronto

## Explore a Neighborhood in Downtown Toronto

Now, we will use the Foursquare API to explore the selected neighborhoods and segment them.

1. __Define Foursquare Credentials and Version__

In [12]:
# @hidden_cell
CLIENT_ID = 'XXX' # your Foursquare ID
CLIENT_SECRET = 'XXX' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

# print('Your credentials:')
# print('CLIENT_ID: ' + CLIENT_ID)
# print('CLIENT_SECRET:' + CLIENT_SECRET)
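Rather than pasting secrets into a cell, one option is to read them from environment variables. This is only a sketch: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are assumptions, not something Foursquare or this notebook prescribes.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the credentials from a mapping (the process environment by default)."""
    client_id = env.get('FOURSQUARE_CLIENT_ID')          # assumed variable name
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')  # assumed variable name
    if not client_id or not client_secret:
        raise RuntimeError(
            'Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first.')
    return client_id, client_secret


# Demonstration with a plain dict so the sketch runs without real secrets
demo_id, demo_secret = load_foursquare_credentials({
    'FOURSQUARE_CLIENT_ID': 'XXX',
    'FOURSQUARE_CLIENT_SECRET': 'XXX',
})
```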

2. __Let's explore the first neighborhood in our dataframe.__

In [13]:
downtown_df.head(1)

Unnamed: 0,PostalCode,Borough,Neighbourhood,latitude,longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264


In [14]:
print('First neighborhood: {}'.format(downtown_df.loc[0,'Neighbourhood']))

First neighborhood: Regent Park, Harbourfront


- Get the location coordinate of the neighborhood

In [15]:
neighborhood_latitude = downtown_df.loc[0, 'latitude'] # neighborhood latitude value
neighborhood_longitude = downtown_df.loc[0, 'longitude'] # neighborhood longitude value

neighborhood_name = downtown_df.loc[0, 'Neighbourhood'] # neighborhood name

print('The coordinate values of {} are\n- latitude: {},\n- longitude: {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

The coordinate values of Regent Park, Harbourfront are
- latitude: 43.65512000000007,
- longitude: -79.36263999999993.


3. __Now, let's get the top 100 venues that are in Regent Park, Harbourfront within a radius of 500 meters.__

- Create a GET request URL

In [16]:

LIMIT = 100
RADIUS = 500

url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        neighborhood_latitude, 
        neighborhood_longitude,
        RADIUS,
        LIMIT)
# url
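The same URL can also be assembled from a dictionary of query parameters, which keeps each parameter visible and lets the standard library handle escaping. The sketch below uses `urllib.parse.urlencode`; the helper name `build_explore_url` is hypothetical, and the placeholder credentials are the same `'XXX'` values as above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Build the explore URL from named query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in "lat,lng" unescaped, matching the URL above
    return EXPLORE_ENDPOINT + '?' + urlencode(params, safe=',')


example_url = build_explore_url('XXX', 'XXX', '20180605', 43.65512, -79.36264)
```

Passing the same dictionary as `params=` to `requests.get` is another way to get the query string built for you.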

- Send the GET request and examine the results

In [17]:
import requests
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5faaf105d9a716314477ad0d'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Corktown',
  'headerFullLocation': 'Corktown, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 20,
  'suggestedBounds': {'ne': {'lat': 43.65962000450007,
    'lng': -79.35643191123269},
   'sw': {'lat': 43.650619995500065, 'lng': -79.36884808876717}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '54ea41ad498e9a11e9e13308',
       'name': 'Roselle Desserts',
       'location': {'address': '362 King St E',
        'crossStreet': 'Trinity St',
        'lat': 43.653446723052674,
        'lng': -79.3620167174383,
        'labeledLatLngs': [{'label'

In [18]:
results['response']['groups'][0]['items']

[{'reasons': {'count': 0,
   'items': [{'summary': 'This spot is popular',
     'type': 'general',
     'reasonName': 'globalInteractionReason'}]},
  'venue': {'id': '54ea41ad498e9a11e9e13308',
   'name': 'Roselle Desserts',
   'location': {'address': '362 King St E',
    'crossStreet': 'Trinity St',
    'lat': 43.653446723052674,
    'lng': -79.3620167174383,
    'labeledLatLngs': [{'label': 'display',
      'lat': 43.653446723052674,
      'lng': -79.3620167174383}],
    'distance': 192,
    'postalCode': 'M5A 1K9',
    'cc': 'CA',
    'city': 'Toronto',
    'state': 'ON',
    'country': 'Canada',
    'formattedAddress': ['362 King St E (Trinity St)',
     'Toronto ON M5A 1K9',
     'Canada']},
   'categories': [{'id': '4bf58dd8d48988d16a941735',
     'name': 'Bakery',
     'pluralName': 'Bakeries',
     'shortName': 'Bakery',
     'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/bakery_',
      'suffix': '.png'},
     'primary': True}],
   'photos': {'count': 0, 'grou

- Based on observation, it seems that all the information is in the __items__ key. Let's put that into a list of venues.

In [19]:
import json # library to handle JSON files
from pandas import json_normalize # transform JSON data into a pandas dataframe

venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues)
nearby_venues.head(2)

Unnamed: 0,referralId,reasons.count,reasons.items,venue.id,venue.name,venue.location.address,venue.location.crossStreet,venue.location.lat,venue.location.lng,venue.location.labeledLatLngs,...,venue.location.cc,venue.location.city,venue.location.state,venue.location.country,venue.location.formattedAddress,venue.categories,venue.photos.count,venue.photos.groups,venue.venuePage.id,venue.location.neighborhood
0,e-0-54ea41ad498e9a11e9e13308-0,0,"[{'summary': 'This spot is popular', 'type': '...",54ea41ad498e9a11e9e13308,Roselle Desserts,362 King St E,Trinity St,43.653447,-79.362017,"[{'label': 'display', 'lat': 43.65344672305267...",...,CA,Toronto,ON,Canada,"[362 King St E (Trinity St), Toronto ON M5A 1K...","[{'id': '4bf58dd8d48988d16a941735', 'name': 'B...",0,[],,
1,e-0-53b8466a498e83df908c3f21-1,0,"[{'summary': 'This spot is popular', 'type': '...",53b8466a498e83df908c3f21,Tandem Coffee,368 King St E,at Trinity St,43.653559,-79.361809,"[{'label': 'display', 'lat': 43.65355870959944...",...,CA,Toronto,ON,Canada,"[368 King St E (at Trinity St), Toronto ON, Ca...","[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",0,[],,
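To see where the dotted column names come from, here is a small offline sketch of `json_normalize` on a hand-made payload. The two items are heavily abridged from the response above, with coordinates rounded; they are not a real API response.

```python
import pandas as pd

# Abridged, hand-made items that mimic the shape of the Foursquare response
sample_items = [
    {'venue': {'name': 'Roselle Desserts',
               'location': {'lat': 43.6534, 'lng': -79.3620},
               'categories': [{'name': 'Bakery'}]}},
    {'venue': {'name': 'Tandem Coffee',
               'location': {'lat': 43.6536, 'lng': -79.3618},
               'categories': [{'name': 'Coffee Shop'}]}},
]

# Nested dictionaries become dotted columns; lists are kept as single cells
flat = pd.json_normalize(sample_items)
```

Nested dictionaries are flattened into columns such as `venue.location.lat`, while the `venue.categories` list stays in one cell, which is why it needs the extra extraction step.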


- Okay, we have many columns that we do not need. Let's keep only the columns we are interested in.

In [20]:
# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]
nearby_venues.head(2)

Unnamed: 0,venue.name,venue.categories,venue.location.lat,venue.location.lng
0,Roselle Desserts,"[{'id': '4bf58dd8d48988d16a941735', 'name': 'B...",43.653447,-79.362017
1,Tandem Coffee,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",43.653559,-79.361809


- The column __venue.categories__ is DISASTROUS!!! Each cell holds a list of dictionaries, so let's extract only the category `name` from the first dictionary in that list.

In [21]:
nearby_venues['venue.categories'] = nearby_venues['venue.categories'].apply(lambda x: x[0]['name'])
nearby_venues.head(2)

Unnamed: 0,venue.name,venue.categories,venue.location.lat,venue.location.lng
0,Roselle Desserts,Bakery,43.653447,-79.362017
1,Tandem Coffee,Coffee Shop,43.653559,-79.361809
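The lambda above assumes every venue has at least one category. If a venue ever came back with an empty list, `x[0]` would raise an `IndexError`. A defensive variant might look like the sketch below; the helper name `first_category_name` and the `'Unknown'` fallback are assumptions.

```python
def first_category_name(categories, default='Unknown'):
    """Return the name of the first category, or `default` if there is none."""
    if not categories:
        return default
    return categories[0].get('name', default)


bakery = first_category_name([{'id': '4bf58dd8d48988d16a941735', 'name': 'Bakery'}])
empty = first_category_name([])
```

It can be passed to `apply` in place of the lambda.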


- SWEET!!!! Let's clean the column names and see how many venues were returned by Foursquare.

In [22]:
nearby_venues.columns = [column.split('.')[-1] for column in nearby_venues.columns]
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))
nearby_venues.head()

20 venues were returned by Foursquare.


Unnamed: 0,name,categories,lat,lng
0,Roselle Desserts,Bakery,43.653447,-79.362017
1,Tandem Coffee,Coffee Shop,43.653559,-79.361809
2,Figs Breakfast & Lunch,Breakfast Spot,43.655675,-79.364503
3,The Yoga Lounge,Yoga Studio,43.655515,-79.364955
4,Body Blitz Spa East,Spa,43.654735,-79.359874


## Explore Neighborhoods in Downtown Toronto

- __Let's create a function to repeat the same process for every neighborhood in `downtown_df` (Downtown Toronto and Etobicoke)__

_Double click __here__ for the explanation_
<!--
[item for venue_list in venues_list for item in venue_list]
is equivalent to:

for venue_list in venues_list:
    for item in venue_list:
        print(item)

- Result
('Regent Park, Harbourfront', 43.65512000000007, -79.36263999999993, 'Roselle Desserts', 
 43.653446723052674, -79.3620167174383, 'Bakery')
('Regent Park, Harbourfront', 43.65512000000007, -79.36263999999993, 'Tandem Coffee', 
 43.65355870959944, -79.36180945913513, 'Coffee Shop')
('Regent Park, Harbourfront', 43.65512000000007, -79.36263999999993, 'Figs Breakfast & Lunch', 
 43.65567455427388, -79.3645032892494, 'Breakfast Spot')
.......

-->
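The function in the next cell collects one list of venue tuples per neighborhood and then flattens them into a single list of rows. A small sketch of that flattening step on toy data follows; the tuples are shortened and 'Toy Café' is a made-up venue.

```python
from itertools import chain

# One inner list per neighborhood (shortened, partly made-up rows)
venues_list = [
    [('Regent Park, Harbourfront', 'Roselle Desserts', 'Bakery'),
     ('Regent Park, Harbourfront', 'Tandem Coffee', 'Coffee Shop')],
    [],  # a neighborhood with no venues contributes no rows
    [('Christie', 'Toy Café', 'Café')],
]

# Nested comprehension: outer loop first, inner loop second
flat_comprehension = [item for venue_list in venues_list for item in venue_list]

# Equivalent flattening with the standard library
flat_chain = list(chain.from_iterable(venues_list))
```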

In [23]:
# Usage: get_nearby_venues(downtown_df['Neighbourhood'], downtown_df['latitude'], downtown_df['longitude'])
# Returns a dataframe with one row per venue (assigned to downtown_venues below)
def get_nearby_venues(neighborhoods, latitudes, longitudes):
    
    # Define the default result limit and radius
    LIMIT = 100
    RADIUS = 500
    
    # Define an empty venue list
    venues_list = []
    
    # Loop for each neighborhood (i.e. 1st neighborhood = Regent Park)
    for neighborhood, lat, lng in zip(neighborhoods, latitudes, longitudes):
        # Print the neighborhood name to track progress
        print(neighborhood) # Regent Park
        
        # Create an API URL for each neighborhood to explore its venues
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
                CLIENT_ID, 
                CLIENT_SECRET, 
                VERSION, 
                lat, 
                lng,
                RADIUS,
                LIMIT)
        
        # Make the GET request and return a JSON file
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        # Keep only the relevant information for each nearby venue (see the hidden explanation above this cell)
        venues_list.append([(neighborhood, 
                            lat, 
                            lng,  
                            v['venue']['name'],
                            v['venue']['location']['lat'],
                            v['venue']['location']['lng'],
                            v['venue']['categories'][0]['name']) for v in results])
        
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])    
    nearby_venues.columns = ['Neighborhood', 
          'Neighborhood Latitude', 
          'Neighborhood Longitude', 
          'Venue', 
          'Venue Latitude', 
          'Venue Longitude', 
          'Venue Category']
    
    return nearby_venues

In [24]:
downtown_venues = get_nearby_venues(downtown_df['Neighbourhood'], 
                                   downtown_df['latitude'], 
                                   downtown_df['longitude'])

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Garden District, Ryerson
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
St. James Town
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
Westmount
Kingsview Village, St. Phillips, Martin Grove Gardens, Richview Gardens
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
New Toronto, Mimico South, Humber Bay Shores
South Steeles, Silverstone, Humbergate, Jamestown, Mount Olive, Beaumond Heights, Thistletown, Albion Gardens
Rosedale
Stn A PO Boxes
Alderwood, Long Branch
Northwest, West Humber - Clairville
St. James Town, Cabbagetown
Fi

In [25]:
print(downtown_venues.shape)
downtown_venues.head()

(1328, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65512,-79.36264,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.65512,-79.36264,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.65512,-79.36264,Figs Breakfast & Lunch,43.655675,-79.364503,Breakfast Spot
3,"Regent Park, Harbourfront",43.65512,-79.36264,The Yoga Lounge,43.655515,-79.364955,Yoga Studio
4,"Regent Park, Harbourfront",43.65512,-79.36264,Body Blitz Spa East,43.654735,-79.359874,Spa


Let's check how many venues were returned for each neighborhood

In [26]:
downtown_venues.groupby(by='Neighborhood').count().sort_values(by='Venue', ascending=False)

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Toronto Dominion Centre, Design Exchange",100,100,100,100,100,100
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
"Richmond, Adelaide, King",100,100,100,100,100,100
"First Canadian Place, Underground city",100,100,100,100,100,100
"Garden District, Ryerson",100,100,100,100,100,100
Stn A PO Boxes,100,100,100,100,100,100
St. James Town,83,83,83,83,83,83
Church and Wellesley,79,79,79,79,79,79
Central Bay Street,76,76,76,76,76,76
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",76,76,76,76,76,76


__Let's find out how many unique categories can be curated from all the returned venues__

In [27]:
print('There are {} unique venue categories.'.format(len(downtown_venues['Venue Category'].unique())))

There are 208 unique venue categories.


## Analyze Each Neighborhood

__Our next objective is to create a dataframe containing the top 10 venues for each neighborhood.__

__1. First, we will one-hot encode the venue category of every venue.__

In [28]:
# one hot encoding and adjust the name prefix
downtown_onehot = pd.get_dummies(downtown_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to the dataframe
downtown_onehot['Neighbourhood'] = downtown_venues['Neighborhood']

# move neighborhood column to the first column
fixed_columns = [downtown_onehot.columns[-1]] + list(downtown_onehot.columns[:-1])
downtown_onehot = downtown_onehot[fixed_columns]
downtown_onehot.head()

Unnamed: 0,Neighbourhood,Afghan Restaurant,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,...,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
4,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


- And let's examine the new dataframe size.

In [29]:
print(downtown_onehot.shape)

(1328, 209)


__2. Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category__


In [30]:
downtown_grouped = downtown_onehot.groupby(by='Neighbourhood').mean().reset_index()
downtown_grouped

Unnamed: 0,Neighbourhood,Afghan Restaurant,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,...,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Berczy Park,0.0,0.0,0.016667,0.0,0.016667,0.0,0.0,0.0,0.0,...,0.016667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016667
2,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,...,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158
3,Central Bay Street,0.0,0.0,0.0,0.0,0.013158,0.013158,0.0,0.0,0.0,...,0.0,0.0,0.013158,0.0,0.013158,0.013158,0.0,0.0,0.0,0.0
4,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Church and Wellesley,0.012658,0.012658,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.012658,0.0,0.0,0.0,0.0,0.012658
6,"Commerce Court, Victoria Hotel",0.0,0.04,0.0,0.0,0.01,0.0,0.0,0.01,0.0,...,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01
7,"Eringate, Bloordale Gardens, Old Burnhamthorpe...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,"First Canadian Place, Underground city",0.0,0.04,0.0,0.0,0.01,0.0,0.0,0.03,0.0,...,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0
9,"Garden District, Ryerson",0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,...,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0


Let's confirm the new size


In [31]:
downtown_grouped.shape

(31, 209)
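Why does the mean of one-hot columns give a frequency? Each venue contributes a 1 in exactly one category column, so averaging over a neighborhood's venues yields the share of venues in each category. A toy sketch follows (made-up neighborhoods 'A' and 'B'; `dtype=float` is passed only to keep the arithmetic explicit).

```python
import pandas as pd

# Toy venues: neighborhood A has 4 venues, neighborhood B has 2
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Café', 'Café', 'Park', 'Gym', 'Park', 'Park'],
})

# One column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])

# Mean per neighborhood = share of that neighborhood's venues in each category
frequencies = onehot.groupby('Neighborhood').mean()
```

Each row of `frequencies` sums to 1, so neighborhoods with very different venue counts remain comparable.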

__3. Let's create a dataframe with the top 10 most common venues__

- Define a function to sort the venues in descending order.

_Double-click **here** for the explanation._

<!--
num_top_venues = 5

for hood in manhattan_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = manhattan_grouped[manhattan_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

RESULT:
# ----Battery Park City----
#            venue  freq
# 0           Park  0.09
# 1          Hotel  0.08
# 2    Coffee Shop  0.06
# 3            Gym  0.06
# 4  Memorial Site  0.05


# ----Carnegie Hill----
#                   venue  freq
# 0           Coffee Shop  0.09
# 1                  Café  0.05
# 2           Yoga Studio  0.03
# 3     French Restaurant  0.03
# 4  Gym / Fitness Center  0.03

-->

In [32]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
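The function above relies on the neighborhood name being the first element of the row. As a sketch of an alternative, the helper below drops the label by name and uses `Series.nlargest`; `top_categories` and the toy row are hypothetical and are not used elsewhere in this notebook.

```python
import pandas as pd


def top_categories(row, num_top_venues, label_column='Neighbourhood'):
    """Return the category names with the highest frequencies in one row."""
    frequencies = row.drop(label_column).astype(float)
    return frequencies.nlargest(num_top_venues).index.tolist()


# Toy row shaped like one row of downtown_grouped (values are made up)
toy_row = pd.Series({'Neighbourhood': 'Toy', 'Café': 0.5, 'Park': 0.3, 'Gym': 0.2})
top_two = top_categories(toy_row, 2)
```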

- Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [33]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        # Add column names such as 1st, 2nd, 3rd Most Common Venue
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # Add column names such as 4th, 5th, .. ,10th Most Common Venue
        columns.append('{}th Most Common Venue'.format(ind+1))
        
# In the end, we have:
# columns == ['Neighborhood', '1st Most Common Venue', .. , '10th Most Common Venue']        

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = downtown_grouped['Neighbourhood']

for ind in np.arange(downtown_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_grouped.iloc[ind, :], num_top_venues)
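The `try`/`except` approach to building column names works for the top 10, but it would produce labels such as "21th" if `num_top_venues` were raised above 20. A small ordinal helper avoids that; `ordinal` is a hypothetical name for this sketch.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 3rd, 4th, 11th."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)
]
```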

In [34]:
neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Alderwood, Long Branch",Convenience Store,Gym,Pub,Performing Arts Venue,Yoga Studio,Eastern European Restaurant,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm
1,Berczy Park,Coffee Shop,Farmers Market,Seafood Restaurant,Restaurant,Beer Bar,Cheese Shop,Breakfast Spot,Cocktail Bar,Bakery,Yoga Studio
2,"CN Tower, King and Spadina, Railway Lands, Har...",Italian Restaurant,Coffee Shop,Café,Bar,Park,French Restaurant,Speakeasy,Bakery,Restaurant,Sandwich Place
3,Central Bay Street,Coffee Shop,Clothing Store,Café,Sandwich Place,Middle Eastern Restaurant,Cosmetics Shop,Sushi Restaurant,Plaza,Hotel,Restaurant
4,Christie,Café,Grocery Store,Playground,Coffee Shop,Candy Store,Italian Restaurant,Athletics & Sports,Baby Store,Fish & Chips Shop,Fast Food Restaurant
5,Church and Wellesley,Coffee Shop,Restaurant,Japanese Restaurant,Gay Bar,Sushi Restaurant,Café,Dance Studio,Bubble Tea Shop,Pub,Men's Store
6,"Commerce Court, Victoria Hotel",Coffee Shop,Hotel,Restaurant,Café,American Restaurant,Italian Restaurant,Japanese Restaurant,Gym,Beer Bar,Deli / Bodega
7,"Eringate, Bloordale Gardens, Old Burnhamthorpe...",Fish & Chips Shop,Park,Grocery Store,College Rec Center,Shopping Mall,Electronics Store,Yoga Studio,Eastern European Restaurant,Fast Food Restaurant,Farmers Market
8,"First Canadian Place, Underground city",Coffee Shop,Café,Hotel,Restaurant,American Restaurant,Gym,Asian Restaurant,Seafood Restaurant,Deli / Bodega,Japanese Restaurant
9,"Garden District, Ryerson",Coffee Shop,Clothing Store,Café,Cosmetics Shop,Japanese Restaurant,Theater,Ramen Restaurant,Bubble Tea Shop,Italian Restaurant,Furniture / Home Store


<a id='item4'></a>

## Cluster the Neighborhoods

__1. Run _k_-means to cluster the neighborhoods into 6 clusters.__


In [35]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 6

downtown_grouped_clustering = downtown_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(downtown_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([3, 5, 5, 5, 0, 5, 5, 0, 5, 5, 5, 0, 5, 4, 5, 0, 5, 0, 5, 5, 5, 0,
       0, 5, 5, 5, 1, 5, 5, 2, 2])
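The number of clusters is chosen by hand here. One common way to sanity-check such a choice is to compare the k-means inertia (within-cluster sum of squares) across several values of k and look for the point where it stops dropping sharply. The sketch below does this on a made-up feature matrix; it is not the analysis used in this notebook, and `n_init=10` is set explicitly as an assumption to keep the behavior stable across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up frequency vectors: two clearly separated groups of "neighborhoods"
toy_features = np.array([
    [0.90, 0.10, 0.00],
    [0.80, 0.20, 0.00],
    [0.85, 0.15, 0.00],
    [0.00, 0.10, 0.90],
    [0.00, 0.20, 0.80],
    [0.05, 0.10, 0.85],
])

# Fit k-means for several k and record the inertia of each fit
inertias = {}
for k in range(1, 5):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(toy_features)
    inertias[k] = model.inertia_
```

In the notebook, `downtown_grouped_clustering` would take the place of `toy_features`.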

__2. Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.__


In [36]:
# add clustering labels
# neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
neighborhoods_venues_sorted['Cluster Labels'] = pd.Series(kmeans.labels_)

downtown_merged = downtown_df

# join the sorted top-venue table (with cluster labels) onto downtown_df so each neighborhood keeps its latitude/longitude
downtown_merged = downtown_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood')

downtown_merged.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,latitude,longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264,Coffee Shop,Breakfast Spot,Theater,Distribution Center,Pub,Restaurant,Electronics Store,Event Space,Spa,Food Truck,5
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66253,-79.39188,Coffee Shop,Sandwich Place,Falafel Restaurant,Gastropub,Bank,Burrito Place,Theater,Fried Chicken Joint,Italian Restaurant,Portuguese Restaurant,5
2,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.66263,-79.52831,Pharmacy,Bank,Skating Rink,Park,Grocery Store,Shopping Mall,Café,Farmers Market,Farm,Falafel Restaurant,0
3,M5B,Downtown Toronto,"Garden District, Ryerson",43.65739,-79.37804,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Japanese Restaurant,Theater,Ramen Restaurant,Bubble Tea Shop,Italian Restaurant,Furniture / Home Store,5
4,M9B,Etobicoke,"West Deane Park, Princess Gardens, Martin Grov...",43.65034,-79.55362,Pizza Place,Print Shop,Sandwich Place,Tea Room,Chinese Restaurant,Department Store,Creperie,Fast Food Restaurant,Farmers Market,Farm,2
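`DataFrame.join` performs a left join by default, so a neighborhood for which Foursquare returned no venues would end up with `NaN` in `Cluster Labels`, and the whole column would become float. In this run both tables have 31 rows, so that does not seem to happen, but a guard could look like the sketch below (toy data, hypothetical variable names).

```python
import pandas as pd

# Toy stand-ins: neighborhood C has no venues, hence no cluster label
places = pd.DataFrame({'Neighbourhood': ['A', 'B', 'C']})
labels = pd.DataFrame({'Neighborhood': ['A', 'B'], 'Cluster Labels': [5, 0]})

# Left join on the neighborhood name, as in the cell above
merged = places.join(labels.set_index('Neighborhood'), on='Neighbourhood')

# Drop unlabeled rows and restore the integer dtype before using the labels
clustered = merged.dropna(subset=['Cluster Labels']).copy()
clustered['Cluster Labels'] = clustered['Cluster Labels'].astype(int)
```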


__3. Finally, let's visualize the resulting clusters__

In [37]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium # map rendering library

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(downtown_merged['latitude'], downtown_merged['longitude'], downtown_merged['Neighbourhood'], downtown_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters


# Examine Clusters

Now, we can examine each cluster and determine the venue categories that distinguish it. The headings below are numbered from 1, whereas the _k_-means labels start at 0, so Cluster 1 corresponds to label 0.

## Cluster 1

In [38]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 0, downtown_merged.columns[[2] + list(range(6, downtown_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
2,"Islington Avenue, Humber Valley Village",Bank,Skating Rink,Park,Grocery Store,Shopping Mall,Café,Farmers Market,Farm,Falafel Restaurant,0
6,"Eringate, Bloordale Gardens, Old Burnhamthorpe...",Park,Grocery Store,College Rec Center,Shopping Mall,Electronics Store,Yoga Studio,Eastern European Restaurant,Fast Food Restaurant,Farmers Market,0
9,Christie,Grocery Store,Playground,Coffee Shop,Candy Store,Italian Restaurant,Athletics & Sports,Baby Store,Fish & Chips Shop,Fast Food Restaurant,0
19,"New Toronto, Mimico South, Humber Bay Shores",Tennis Court,Grocery Store,Park,Skating Rink,Convenience Store,Art Museum,Food & Drink Shop,Fish Market,Fish & Chips Shop,0
20,"South Steeles, Silverstone, Humbergate, Jamest...",Hardware Store,Beer Store,Pizza Place,Park,Discount Store,Coffee Shop,Sandwich Place,Caribbean Restaurant,Fast Food Restaurant,0
21,Rosedale,Tennis Court,Park,Bike Trail,Shop & Service,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant,Event Space,0
29,"Old Mill South, King's Mill Park, Sunnylea, Hu...",Bank,Fast Food Restaurant,Flower Shop,Park,Italian Restaurant,Coffee Shop,Sushi Restaurant,Ethiopian Restaurant,Escape Room,0


## Cluster 2

In [39]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 1, downtown_merged.columns[[2] + list(range(6, downtown_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
27,"The Kingsway, Montgomery Road, Old Mill North",Costume Shop,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant,Event Space,Ethiopian Restaurant,1


## Cluster 3

In [40]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 2, downtown_merged.columns[[2] + list(range(6, downtown_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
4,"West Deane Park, Princess Gardens, Martin Grov...",Print Shop,Sandwich Place,Tea Room,Chinese Restaurant,Department Store,Creperie,Fast Food Restaurant,Farmers Market,Farm,2
14,Westmount,Chinese Restaurant,Coffee Shop,Sandwich Place,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant,Event Space,2


## Cluster 4

In [41]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 3, downtown_merged.columns[[2] + list(range(6, downtown_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
23,"Alderwood, Long Branch",Gym,Pub,Performing Arts Venue,Yoga Studio,Eastern European Restaurant,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm,3


## Cluster 5

In [42]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 4, downtown_merged.columns[[2] + list(range(6, downtown_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
15,"Kingsview Village, St. Phillips, Martin Grove ...",Bus Line,Arts & Crafts Store,Music Venue,Clothing Store,Electronics Store,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm,4


## Cluster 6

In [43]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 5, downtown_merged.columns[[2] + list(range(6, downtown_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,"Regent Park, Harbourfront",Breakfast Spot,Theater,Distribution Center,Pub,Restaurant,Electronics Store,Event Space,Spa,Food Truck,5
1,"Queen's Park, Ontario Provincial Government",Sandwich Place,Falafel Restaurant,Gastropub,Bank,Burrito Place,Theater,Fried Chicken Joint,Italian Restaurant,Portuguese Restaurant,5
3,"Garden District, Ryerson",Clothing Store,Café,Cosmetics Shop,Japanese Restaurant,Theater,Ramen Restaurant,Bubble Tea Shop,Italian Restaurant,Furniture / Home Store,5
5,St. James Town,Cocktail Bar,Clothing Store,Gastropub,Café,Restaurant,Cosmetics Shop,Hotel,Lingerie Store,Japanese Restaurant,5
7,Berczy Park,Farmers Market,Seafood Restaurant,Restaurant,Beer Bar,Cheese Shop,Breakfast Spot,Cocktail Bar,Bakery,Yoga Studio,5
8,Central Bay Street,Clothing Store,Café,Sandwich Place,Middle Eastern Restaurant,Cosmetics Shop,Sushi Restaurant,Plaza,Hotel,Restaurant,5
10,"Richmond, Adelaide, King",Café,Hotel,Restaurant,Gym,Salad Place,American Restaurant,Asian Restaurant,Japanese Restaurant,Steakhouse,5
11,"Harbourfront East, Union Station, Toronto Islands",Hotel,Plaza,Restaurant,Japanese Restaurant,Aquarium,Boat or Ferry,Deli / Bodega,Park,Roof Deck,5
12,"Toronto Dominion Centre, Design Exchange",Hotel,Restaurant,Café,Salad Place,Japanese Restaurant,American Restaurant,Seafood Restaurant,Beer Bar,Sporting Goods Shop,5
13,"Commerce Court, Victoria Hotel",Hotel,Restaurant,Café,American Restaurant,Italian Restaurant,Japanese Restaurant,Gym,Beer Bar,Deli / Bodega,5


## Conclusion

Your origin neighborhood (Northwest, Etobicoke) falls into __Cluster 6__.

Luckily, you should have no trouble finding a candidate neighborhood in Downtown Toronto: Cluster 6 contains many neighborhoods with similar characteristics.
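The lookup can also be done programmatically rather than by scanning the tables. The following is a minimal sketch, assuming the merged dataframe still carries the `Borough`, `Neighbourhood`, and `Cluster Labels` columns; `toy` and its rows are placeholders, and `same_cluster_candidates` is a hypothetical helper that does not appear in the original notebook.

```python
import pandas as pd

# Toy stand-in for downtown_merged; names and labels are placeholders.
toy = pd.DataFrame({
    'Borough': ['Etobicoke', 'Downtown Toronto', 'Downtown Toronto', 'Etobicoke'],
    'Neighbourhood': ['Origin', 'Candidate A', 'Candidate B', 'Elsewhere'],
    'Cluster Labels': [5, 5, 5, 0],
})

def same_cluster_candidates(df, origin, target_borough):
    """Hypothetical helper: neighborhoods in target_borough sharing origin's cluster."""
    origin_labels = df.loc[df['Neighbourhood'] == origin, 'Cluster Labels']
    if origin_labels.empty:
        raise ValueError('Origin neighborhood not found: {}'.format(origin))
    label = origin_labels.iloc[0]
    # Keep rows with the same cluster label that lie in the target borough.
    mask = (df['Cluster Labels'] == label) & (df['Borough'] == target_borough)
    return df.loc[mask, 'Neighbourhood'].tolist()

candidates = same_cluster_candidates(toy, 'Origin', 'Downtown Toronto')
print(candidates)
```

With the real data, the call would take the origin neighborhood's name exactly as it is stored in the `Neighbourhood` column, plus `'Downtown Toronto'` as the target borough.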

Just pick the neighborhood that best suits your needs and start your new career with ease!

That's it for this project. Bye for now :)