<a href="https://cognitiveclass.ai"><img src = "https://ibm.box.com/shared/static/9gegpsmnsoo25ikkbl4qzlvlyjbgxs5x.png" width = 400> </a>

<h1 align=center><font size = 5>Moving to Toronto from Montreal. Which neighborhood to choose from?</font></h1>

# Introduction

I'm planning to move from Montreal to Toronto, and I would like to live in a neighbourhood similar to the one I currently live in. To that end, I'll use the Foursquare API to get the venues around each Montreal neighbourhood and then cluster the neighbourhoods to obtain cluster labels for Montreal.

Next, using the Foursquare API, I'll fetch venue data for Toronto. Instead of clustering Toronto separately, I'll use classification to assign Toronto's neighbourhoods the cluster labels learned from Montreal. Then I'll pick the Toronto neighbourhoods that fall into the same clusters as the Montreal neighbourhoods I like; those will be my first-choice neighbourhoods to consider.


# Data 

For this report we will use the explore endpoint of the Foursquare Venues API:

GET https://api.foursquare.com/v2/venues/explore

It returns a list of recommended venues near the given location. For more detailed information about the venues themselves (photos, tips, etc.), see the Foursquare venue details endpoint.

### Request API 

| Name	| Example	| Description | 
|------|------|------|
|   ll  | 40.74224,-73.99386| required unless near is provided. Latitude and longitude of the user's location.|
|   near  | Chicago, IL| required unless ll is provided. A string naming a place in the world. If the near string is not geocodable, returns a failed_geocode error. Otherwise, searches within the bounds of the geocode and adds a geocode object to the response.|
|   radius  | 250| Radius to search within, in meters. If radius is not specified, a suggested radius will be used based on the density of venues in the area. The maximum supported radius is currently 100,000 meters.|
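
To make the request parameters concrete, here is a minimal sketch that builds an explore URL from `ll`, `radius` and `limit` using only the standard library. `build_explore_url`, `MY_ID` and `MY_SECRET` are hypothetical placeholders; the `client_id`, `client_secret` and `v` parameters are the ones `getNearbyVenues` sends later in this notebook.

```python
from urllib.parse import urlencode

# Base URL of the explore endpoint described above.
EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Return an explore request URL for a point given by latitude/longitude.

    Hypothetical helper: only the `ll` form is used; `near` would replace it
    with a place name such as "Montreal, QC".
    """
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # "lat,lng", as in the table's example
        'radius': radius,                # meters
        'limit': limit,                  # maximum number of venues returned
    }
    # urlencode percent-encodes the values (the comma in ll becomes %2C)
    return EXPLORE_URL + '?' + urlencode(params)


url = build_explore_url('MY_ID', 'MY_SECRET', '20180323', 45.5017, -73.5673)
print(url)
```

Letting `urlencode` handle the query string avoids hand-formatting mistakes when parameter values contain characters such as commas or spaces.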

### Response Fields

| Field		| Description | 
|------|------|
| warning| Presents an object with a text field that contains a warning message, if applicable (i.e. not enough results, try doing X).|
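| groups| An array of recommendation groups. Each group has a type (e.g. "recommended"), a human-readable name, and an array `items` of recommendation objects, each of which contains a `venue` object.|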
| headerLocation| A text name for the location the user searched, e.g. "SoHo".|
| id| A unique string identifier for this venue.|
| name|The best known name for this venue.|
| location|An object containing none, some, or all of address (street address), crossStreet, city, state, postalCode, country, lat, lng, and distance. All fields are strings, except for lat, lng, and distance. Distance is measured in meters. Some venues have their locations intentionally hidden for privacy reasons (such as private residences). If this is the case, the parameter isFuzzed will be set to true, and the lat/lng parameters will have reduced precision.|
| categories | An array, possibly empty, of categories that have been applied to this venue. One of the categories will have a primary field indicating that it is the primary category for the venue. For the complete category tree, see categories.|
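
The nesting above (`response` → `groups` → `items` → `venue`) can be walked as in the sketch below. `sample_response` is a hand-made dict that mimics only the fields in the table; it is not a real Foursquare payload, and `primary_category` and `flatten_venues` are hypothetical helpers. Unlike `getNearbyVenues` later in the notebook, which reads only `groups[0]` and `categories[0]`, this sketch loops over every group, prefers the category flagged `primary`, and tolerates an empty `categories` array.

```python
# Hand-made sample that mimics the documented fields; NOT a real API response.
sample_response = {
    'response': {
        'headerLocation': 'Plateau Mont-Royal',
        'groups': [{
            'items': [
                {'venue': {'id': 'v1', 'name': 'Café A',
                           'location': {'lat': 45.52, 'lng': -73.58, 'distance': 120},
                           'categories': [{'name': 'Café', 'primary': True}]}},
                {'venue': {'id': 'v2', 'name': 'Unnamed spot',
                           'location': {'lat': 45.53, 'lng': -73.57, 'distance': 340},
                           'categories': []}},  # categories may be empty
            ],
        }],
    },
}


def primary_category(venue):
    """Return the primary category name, or None when there are no categories."""
    categories = venue.get('categories', [])
    for category in categories:
        if category.get('primary'):
            return category['name']
    # fall back to the first category if none is flagged as primary
    return categories[0]['name'] if categories else None


def flatten_venues(payload):
    """Turn an explore payload into (name, lat, lng, category) tuples."""
    rows = []
    for group in payload['response']['groups']:
        for item in group['items']:
            venue = item['venue']
            location = venue['location']
            rows.append((venue['name'], location['lat'], location['lng'],
                         primary_category(venue)))
    return rows


for row in flatten_venues(sample_response):
    print(row)
```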


    
Next, I'll repeat the clustering exercise I did for the Toronto neighbourhoods, this time for Montreal.

Postal Codes for Montreal:
https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_H
    
Finally, I'll use a k-nearest neighbours (KNN) classifier to predict each neighbourhood's cluster from the venues within 500 m of it.

# Methodology 

I'll use the k-means clustering algorithm to cluster Montreal neighbourhoods.

Next, I'll use the resulting cluster labels to train a KNN classifier.
The main idea is to train the classifier to predict the cluster label from the venues of a Montreal neighbourhood.

Last, I'll use the KNN classifier trained on the Montreal dataset to predict cluster labels for Toronto neighbourhoods.
This will help identify Toronto neighbourhoods that are similar to Montreal ones.
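The three steps above can be sketched end to end on toy data. The matrices below are invented stand-ins for the real category-frequency tables (three categories instead of ~150), and `n_clusters=2` and `n_neighbors=1` are illustrative choices, not the values used later in this notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Invented feature matrices: each row is a neighbourhood, each column the
# share of one venue category (here: café, park, hotel).
montreal_X = np.array([
    [0.6, 0.1, 0.3],
    [0.7, 0.1, 0.2],
    [0.1, 0.8, 0.1],
    [0.0, 0.9, 0.1],
])
toronto_X = np.array([
    [0.65, 0.05, 0.30],  # café-heavy, like the first two Montreal rows
    [0.05, 0.85, 0.10],  # park-heavy, like the last two Montreal rows
])

# Step 1: unsupervised k-means gives every Montreal row a cluster label.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(montreal_X)
montreal_labels = kmeans.labels_

# Step 2: a KNN classifier learns the mapping "venue features -> cluster label".
knn = KNeighborsClassifier(n_neighbors=1).fit(montreal_X, montreal_labels)

# Step 3: predict Montreal-style cluster labels for the Toronto rows.
toronto_labels = knn.predict(toronto_X)
print(montreal_labels, toronto_labels)
```

Cluster numbers are arbitrary, so only the grouping matters. Note that `KMeans.predict` could also assign Toronto rows to the nearest Montreal centroid directly; the KNN route instead votes among the closest labelled Montreal neighbourhoods.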

In [2]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
import requests 
from bs4 import BeautifulSoup 

In [164]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans
import re
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import time
import folium # map rendering library

In [181]:
pd.set_option('display.max_columns', 50)
pd.set_option('display.max_rows', None)


Load and process Montreal postal codes

In [7]:
URL = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_H"
r = requests.get(URL)

soup = BeautifulSoup(r.content, 'html5lib')
table = soup.find('table')  # the first <table> on the page holds the postal codes

# print(soup.prettify())
print('Page scraped.')

Page scraped.


In [None]:
for row in table.find_all("td"):
    pc = row.text.strip()
    if 'Not assigned' in pc:
        continue
    # forward sortation area (FSA): letter, digit, letter, e.g. H1A
    match = re.match(r'\w\d\w', pc)
    if match is None:
        continue
    postal_code = match[0]
    neigh = pc.split(postal_code)[-1].replace(',', '').strip()

    if 'Griffintown' in neigh:
        neigh = 'Griffintown'
    print(postal_code, '|', neigh)
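The parsing in the loop above can be isolated into a function that is easy to test without scraping. This is a sketch: `parse_postal_cell` is a hypothetical helper, it assumes each cell starts with the FSA followed by the neighbourhood name (the same assumption the `split` above relies on), and it uses the stricter pattern `[A-Z]\d[A-Z]` instead of `\w\d\w`. The sample strings in the usage lines are invented.

```python
import re

# Canadian forward sortation area (FSA): letter, digit, letter, e.g. "H2X".
# DOTALL lets the remainder span a line break between code and name.
FSA_PATTERN = re.compile(r'([A-Z]\d[A-Z])(.*)', re.DOTALL)


def parse_postal_cell(text):
    """Split a table cell into (FSA, neighbourhood); return None if unusable."""
    text = text.strip()
    if 'Not assigned' in text:
        return None
    match = FSA_PATTERN.match(text)
    if match is None:
        return None
    fsa, rest = match.groups()
    neighbourhood = rest.replace(',', '').strip()
    if 'Griffintown' in neighbourhood:
        neighbourhood = 'Griffintown'  # same normalisation as the loop above
    return fsa, neighbourhood


print(parse_postal_cell('H1APointe-aux-Trembles'))
print(parse_postal_cell('H0ANot assigned'))
```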
   

Load the post-processed CSV file with Montreal postal codes and neighbourhoods.

In [157]:
df = pd.read_csv('./montreal_postal_codes.csv')

Use the geopy library to get the latitude and longitude of each Montreal neighbourhood.
To define an instance of the geocoder, we need to define a user_agent. We name our agent qc_explorer, as shown below.

In [158]:
geolocator = Nominatim(user_agent="qc_explorer")  # one geocoder instance for all requests
rows = []

for index in range(df.shape[0]):
    neighbourhood = df.iloc[index]['Neighbourhood']
    address = '%s , QC' % neighbourhood
    try:
        location = geolocator.geocode(address)
    except Exception:
        location = None  # network or service error: skip this neighbourhood
    time.sleep(5)  # stay well below Nominatim's limit of one request per second
    if location is None:  # address could not be geocoded
        continue

    latitude = location.latitude
    longitude = location.longitude
    print('The geographical coordinates of %s are %s %s.' % (neighbourhood, latitude, longitude))
    # 'Borough' holds the postal code (FSA), e.g. H1A
    rows.append({'Borough': df.iloc[index]['PostalCode'],
                 'Latitude': latitude,
                 'Longitude': longitude,
                 'Neighborhood': neighbourhood})

# DataFrame.append was removed in pandas 2.0, so build the frame once at the end
df_postal_c = pd.DataFrame(rows)

The geographical coordinates of  Pointe-aux-Trembles are 45.667824350000004 -73.50513339569802.
The geographical coordinates of  Saint-Michel East are 45.2381816 -73.5699087.
The geographical coordinates of  Downtown Montreal North  are 45.5052895 -73.5640756.
The geographical coordinates of  Notre-Dame-de-Grâce  are 45.4679674 -73.6289223.
The geographical coordinates of  Place Bonaventure are 45.49958275 -73.56491666661734.
The geographical coordinates of  Duvernay-Est are 45.59314 -73.66641.
The geographical coordinates of  Dollard-des-Ormeaux North west are 45.48423 -73.8064547.
The geographical coordinates of  Montreal East are 45.4972159 -73.6103642.
The geographical coordinates of  Ahuntsic North are 45.55930765 -73.65270165067771.
The geographical coordinates of  Downtown Montreal East are 45.5052895 -73.5640756.
The geographical coordinates of  Notre-Dame-de-Grâce South west are 45.4679674 -73.6289223.
The geographical coordinates of  Place Desjardins are 45.507423700000004 -73.5644547155236.
...
The geographical coordinates of  Rosemont Central are 45.5314955 -73.5973383.
The geographical coordinates of  Plateau Mont-Royal Southeast are 45.5218361 -73.5821731.
The geographical coordinates of  Hampstead are 45.4811545 -73.6469908.
The geographical coordinates of  Montreal West are 45.452853 -73.6442548.
The geographical coordinates of  Sainte-Dorothée are 45.5284869 -73.8200017.
The geographical coordinates of  Sainte-Anne-De-Bellevue are 45.4092897 -73.9461485.
The geographical coordinates of  Rosemont South are 45.5314955 -73.5973383.
The geographical coordinates of  Old Montreal are 45.5033677 -73.5574484.
The geographical coordinates of  Dorval Central  are 45.4453082 -73.7510888.
The geographical coordinates of  Îles-Laval are 45.5203029 -73.8522024.
The geographical coordinates of  Saint-Michel West are 45.2381816 -73.5699087.
The geographical coordinates of  Downtown Montreal Northeast are 45.5052895 -73.5640756.
The geographical coordinates of  Westmount South are 45.4857329 -73.5963951.
...

In [160]:
df_postal_c.head()

|   | Borough | Latitude | Longitude | Neighborhood |
|---|---|---|---|---|
| 0 | H1A | 45.667824 | -73.505133 | Pointe-aux-Trembles |
| 1 | H2A | 45.238182 | -73.569909 | Saint-Michel East |
| 2 | H3A | 45.50529 | -73.564076 | Downtown Montreal North |
| 3 | H4A | 45.467967 | -73.628922 | Notre-Dame-de-Grâce |
| 4 | H5A | 45.499583 | -73.564917 | Place Bonaventure |


In [161]:
#df_postal_c.to_csv('./df_postal_c_with_ll.csv')

As we did with Toronto boroughs, let's visualize all Montreal neighborhoods.

In [553]:
address = 'Montreal, QC'

geolocator = Nominatim(user_agent="my_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Montreal are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Montreal are 45.4972159, -73.6103642.


In [165]:
# create map of Montreal using latitude and longitude values
map_montreal = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df_postal_c['Latitude'], df_postal_c['Longitude'], df_postal_c['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_montreal)  
    
map_montreal

![Map of Montreal boroughs](montreal_all_boroughs.JPG)


Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

#### Define Foursquare Credentials and Version

In [554]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (never publish real credentials)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (never print or commit it)
VERSION = '20180323' # Foursquare API version

radius = 500 # search radius in meters
LIMIT = 100 # maximum number of venues returned by the Foursquare API
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID


In [167]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query the Foursquare explore endpoint around each point; one row per venue."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # request parameters; requests URL-encodes them into the query string
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request and keep the recommended items
        response = requests.get('https://api.foursquare.com/v2/venues/explore', params=params)
        response.raise_for_status()  # fail loudly on HTTP errors (e.g. bad credentials)
        results = response.json()['response']['groups'][0]['items']

        # keep only the relevant fields; 'categories' may be an empty list
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None)
            for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                  'Neighborhood Latitude',
                  'Neighborhood Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues

Query all venues given the borough coordinates

In [168]:
montreal_venues = getNearbyVenues(names=df_postal_c['Neighborhood'],
                                   latitudes=df_postal_c['Latitude'],
                                   longitudes=df_postal_c['Longitude']
                                  )

 Pointe-aux-Trembles
 Saint-Michel East
 Downtown Montreal North 
 Notre-Dame-de-Grâce 
 Place Bonaventure
 Duvernay-Est
 Dollard-des-Ormeaux North west
 Montreal East
 Ahuntsic North
 Downtown Montreal East
 Notre-Dame-de-Grâce South west
 Place Desjardins
 Saint-François
 Dollard-des-Ormeaux East
 Rivière-des-Prairies North east
 Ahuntsic Central
 Griffintown
 Saint-Henri
 Saint-Vincent-de-Paul
 LÎle-Bizard North east
 Rivière-des-Prairies South west
 Villeray North east
 LÎle-Des-Soeurs
 Ville Émard
 Duvernay
 LÎle-Bizard South west
 Montréal-Nord North
 Petite-Patrie North east
 Downtown Montreal South east
 Verdun North
 Pont-Viau
 Dollard-des-Ormeaux South west
 Montréal-Nord South
 Plateau Mont-Royal North
 Downtown Montreal South west
 Verdun South
 Auteuil West
Pierrefonds
 Anjou West
 Plateau Mont-Royal North Central
 Petite-Bourgogne
 Cartierville Central
 Auteuil North east
 Kirkland
 Anjou East
 Pointe-Saint-Charles
 Cartierville South west
 Auteuil South
 Senneville
 Merc

Let's check the size of the resulting dataframe

In [227]:
montreal_venues.shape

(2297, 7)

Let's check how many venues were returned for each neighborhood

In [226]:
montreal_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Ahuntsic Central | 14 | 14 | 14 | 14 | 14 | 14 |
| Ahuntsic East | 14 | 14 | 14 | 14 | 14 | 14 |
| Ahuntsic North | 14 | 14 | 14 | 14 | 14 | 14 |
| Ahuntsic South west | 14 | 14 | 14 | 14 | 14 | 14 |
| Ahuntsic Southeast | 14 | 14 | 14 | 14 | 14 | 14 |
| Anjou East | 3 | 3 | 3 | 3 | 3 | 3 |
| Anjou West | 3 | 3 | 3 | 3 | 3 | 3 |
| Auteuil North east | 5 | 5 | 5 | 5 | 5 | 5 |
| Auteuil South | 5 | 5 | 5 | 5 | 5 | 5 |
| Auteuil West | 5 | 5 | 5 | 5 | 5 | 5 |


Let's find out how many unique categories can be curated from all the returned venues

In [394]:
print('There are {} unique categories.'.format(len(montreal_venues['Venue Category'].unique())))

There are 202 unique categories.


#### Analyze Each Neighborhood in Montreal

We one-hot encode the venue categories so that each venue is described by numeric 0/1 columns instead of a category name.

In [229]:
# one hot encoding
montreal_onehot = pd.get_dummies(montreal_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
montreal_onehot['Neighborhood'] = montreal_venues['Neighborhood'] 


montreal_onehot.head()

Unnamed: 0,ATM,Adult Boutique,American Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beer Store,Big Box Store,Bike Rental / Bike Share,Bistro,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Boxing Gym,Breakfast Spot,Brewery,...,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Swiss Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Track,Trail,Train,Train Station,University,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [230]:
montreal_onehot.columns.get_indexer_for(['Neighborhood'])

array([131], dtype=int64)

In [231]:
len(montreal_onehot.columns)

202

Which venue categories are most common in Montreal?

In [232]:
montreal_venues['Venue Category'].value_counts()

Café                                        155
Coffee Shop                                  94
Restaurant                                   93
French Restaurant                            75
Bakery                                       69
Hotel                                        58
Bar                                          50
Pizza Place                                  48
Sandwich Place                               44
Pharmacy                                     43
Middle Eastern Restaurant                    43
Italian Restaurant                           41
Fast Food Restaurant                         41
Park                                         40
Breakfast Spot                               39
Grocery Store                                35
Vietnamese Restaurant                        34
Plaza                                        34
Sushi Restaurant                             34
Bookstore                                    33
Japanese Restaurant                     


Next, let's group the rows by neighbourhood and take the mean of each one-hot column, which gives the frequency of occurrence of each category in that neighbourhood.


In [427]:
montreal_grouped = montreal_onehot.groupby('Neighborhood').mean().reset_index()
montreal_grouped

Unnamed: 0,Neighborhood,ATM,Adult Boutique,American Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beer Store,Big Box Store,Bike Rental / Bike Share,Bistro,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Boxing Gym,Breakfast Spot,...,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Swiss Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Track,Trail,Train,Train Station,University,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,Ahuntsic Central,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0
1,Ahuntsic East,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0
2,Ahuntsic North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0
3,Ahuntsic South west,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0
4,Ahuntsic Southeast,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0
5,Anjou East,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Anjou West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Auteuil North east,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Auteuil South,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Auteuil West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
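
Why the mean of one-hot columns is a frequency can be checked on a tiny invented example (two neighbourhoods, four venues); it uses the same `get_dummies` and `groupby(...).mean()` calls as the cells above, plus `dtype=int` so the dummies are plain 0/1 integers.

```python
import pandas as pd

# Invented venue list: neighbourhood A has 2 cafés and 1 park, B has 1 park.
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Café', 'Café', 'Park', 'Park'],
})

# One column per category, 1 where the venue belongs to that category.
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='', dtype=int)
onehot['Neighborhood'] = venues['Neighborhood']

# Mean of the 0/1 columns = share of each category among a neighbourhood's venues.
freq = onehot.groupby('Neighborhood').mean().reset_index()
print(freq)
```

Neighbourhood A gets 2/3 for Café and 1/3 for Park, so each row sums to 1 regardless of how many venues the API returned, which keeps neighbourhoods with many and few venues comparable.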


 Let's confirm the new size

In [428]:
montreal_grouped.shape

(116, 202)

Let's set how many of the most common venues to keep for each neighbourhood.

In [297]:
num_top_venues = 50

First, let's write a function to sort the venues in descending order.

In [298]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe with the top 50 venues for each neighbourhood and display its first rows.

In [299]:
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = montreal_grouped['Neighborhood']

for ind in np.arange(montreal_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(montreal_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue,21th Most Common Venue,22th Most Common Venue,23th Most Common Venue,24th Most Common Venue,...,26th Most Common Venue,27th Most Common Venue,28th Most Common Venue,29th Most Common Venue,30th Most Common Venue,31th Most Common Venue,32th Most Common Venue,33th Most Common Venue,34th Most Common Venue,35th Most Common Venue,36th Most Common Venue,37th Most Common Venue,38th Most Common Venue,39th Most Common Venue,40th Most Common Venue,41th Most Common Venue,42th Most Common Venue,43th Most Common Venue,44th Most Common Venue,45th Most Common Venue,46th Most Common Venue,47th Most Common Venue,48th Most Common Venue,49th Most Common Venue,50th Most Common Venue
0,Ahuntsic Central,Café,Restaurant,Plaza,Vietnamese Restaurant,Pharmacy,Hardware Store,Coffee Shop,Furniture / Home Store,Breakfast Spot,Ice Cream Shop,Fast Food Restaurant,Sausage Shop,Yoga Studio,Discount Store,Dive Bar,Dim Sum Restaurant,Drugstore,Dumpling Restaurant,Diner,Department Store,Dessert Shop,English Restaurant,Deli / Bodega,Cycle Studio,...,Curling Ice,Cupcake Shop,Cuban Restaurant,Creperie,Empanada Restaurant,Farm,Event Space,Falafel Restaurant,Gym Pool,Gym / Fitness Center,Gym,Grocery Store,Greek Restaurant,Gourmet Shop,Golf Course,Gift Shop,German Restaurant,Gastropub,Gas Station,Gaming Cafe,Fried Chicken Joint,French Restaurant,Food Truck,Food & Drink Shop,Food
1,Ahuntsic East,Café,Restaurant,Plaza,Vietnamese Restaurant,Pharmacy,Hardware Store,Coffee Shop,Furniture / Home Store,Breakfast Spot,Ice Cream Shop,Fast Food Restaurant,Sausage Shop,Yoga Studio,Discount Store,Dive Bar,Dim Sum Restaurant,Drugstore,Dumpling Restaurant,Diner,Department Store,Dessert Shop,English Restaurant,Deli / Bodega,Cycle Studio,...,Curling Ice,Cupcake Shop,Cuban Restaurant,Creperie,Empanada Restaurant,Farm,Event Space,Falafel Restaurant,Gym Pool,Gym / Fitness Center,Gym,Grocery Store,Greek Restaurant,Gourmet Shop,Golf Course,Gift Shop,German Restaurant,Gastropub,Gas Station,Gaming Cafe,Fried Chicken Joint,French Restaurant,Food Truck,Food & Drink Shop,Food
2,Ahuntsic North,Café,Restaurant,Plaza,Vietnamese Restaurant,Pharmacy,Hardware Store,Coffee Shop,Furniture / Home Store,Breakfast Spot,Ice Cream Shop,Fast Food Restaurant,Sausage Shop,Yoga Studio,Discount Store,Dive Bar,Dim Sum Restaurant,Drugstore,Dumpling Restaurant,Diner,Department Store,Dessert Shop,English Restaurant,Deli / Bodega,Cycle Studio,...,Curling Ice,Cupcake Shop,Cuban Restaurant,Creperie,Empanada Restaurant,Farm,Event Space,Falafel Restaurant,Gym Pool,Gym / Fitness Center,Gym,Grocery Store,Greek Restaurant,Gourmet Shop,Golf Course,Gift Shop,German Restaurant,Gastropub,Gas Station,Gaming Cafe,Fried Chicken Joint,French Restaurant,Food Truck,Food & Drink Shop,Food
3,Ahuntsic South west,Café,Restaurant,Plaza,Vietnamese Restaurant,Pharmacy,Hardware Store,Coffee Shop,Furniture / Home Store,Breakfast Spot,Ice Cream Shop,Fast Food Restaurant,Sausage Shop,Yoga Studio,Discount Store,Dive Bar,Dim Sum Restaurant,Drugstore,Dumpling Restaurant,Diner,Department Store,Dessert Shop,English Restaurant,Deli / Bodega,Cycle Studio,...,Curling Ice,Cupcake Shop,Cuban Restaurant,Creperie,Empanada Restaurant,Farm,Event Space,Falafel Restaurant,Gym Pool,Gym / Fitness Center,Gym,Grocery Store,Greek Restaurant,Gourmet Shop,Golf Course,Gift Shop,German Restaurant,Gastropub,Gas Station,Gaming Cafe,Fried Chicken Joint,French Restaurant,Food Truck,Food & Drink Shop,Food
4,Ahuntsic Southeast,Café,Restaurant,Plaza,Vietnamese Restaurant,Pharmacy,Hardware Store,Coffee Shop,Furniture / Home Store,Breakfast Spot,Ice Cream Shop,Fast Food Restaurant,Sausage Shop,Yoga Studio,Discount Store,Dive Bar,Dim Sum Restaurant,Drugstore,Dumpling Restaurant,Diner,Department Store,Dessert Shop,English Restaurant,Deli / Bodega,Cycle Studio,...,Curling Ice,Cupcake Shop,Cuban Restaurant,Creperie,Empanada Restaurant,Farm,Event Space,Falafel Restaurant,Gym Pool,Gym / Fitness Center,Gym,Grocery Store,Greek Restaurant,Gourmet Shop,Golf Course,Gift Shop,German Restaurant,Gastropub,Gas Station,Gaming Cafe,Fried Chicken Joint,French Restaurant,Food Truck,Food & Drink Shop,Food
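
The header above reads "21th", "22th", "31th" and so on, because the `indicators` list only covers positions 1 to 3 and every other position falls back to "th". A small helper, sketched below as a hypothetical drop-in for the column-building loop, produces correct English ordinals.

```python
def ordinal(n):
    """Return n with its English ordinal suffix: 1st, 2nd, 3rd, 4th, 11th, 21st, ..."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th are exceptions to the last-digit rule
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


num_top_venues = 50
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i))
                              for i in range(1, num_top_venues + 1)]
print(columns[1:4], columns[21:24])
```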


In [314]:
neighborhoods_venues_sorted.shape

(116, 52)

### Cluster Neighborhoods

Run *k*-means to cluster the neighbourhoods into 15 clusters.

First, load the Toronto venue dataset. Both cities must share the same features (venue categories), because the classifier trained on Montreal will later be applied to Toronto.

In [419]:
df_toronto = pd.read_csv('toronto_50_venues_merged.csv')

In [444]:
df_toronto = df_toronto.drop(columns='Neighborhood')

Let's compare the venue categories common to both cities with those unique to each.

In [447]:

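# Montreal feature matrix: category frequencies without the name column
montreal_grouped_clustering = montreal_grouped.drop(columns='Neighborhood')

# new names, so the montreal_venues DataFrame is not overwritten by a set
montreal_categories = set(montreal_grouped_clustering.columns)
toronto_categories = set(df_toronto.columns)

In [448]:
unique_to_montreal = montreal_categories - toronto_categories
unique_to_toronto = toronto_categories - montreal_categories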

Drop the categories that appear in only one city and keep only the shared ones.

In [449]:
df_montreal = montreal_grouped_clustering.drop(columns=list(unique_to_montreal))

In [450]:
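df_toronto = df_toronto.drop(columns=list(unique_to_toronto))
# use Montreal's column order so column j means the same category in both cities
df_toronto = df_toronto[df_montreal.columns]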

In [451]:
df_montreal.shape

(116, 150)

In [452]:
df_toronto.shape

(95, 150)
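
Dropping the unique categories makes both tables the same width, but a classifier fitted on a plain array matches features by position, so the column order has to match too (recent scikit-learn versions may also warn or raise when DataFrame column names differ between fit and predict). The toy sketch below, with invented data, keeps the shared categories in one fixed sorted order.

```python
import pandas as pd

# Invented frequency tables with partially overlapping venue categories.
mtl = pd.DataFrame({'Café': [0.5, 0.2], 'Park': [0.5, 0.3], 'Poutine Place': [0.0, 0.5]})
tor = pd.DataFrame({'Park': [0.4], 'Café': [0.6], 'Dim Sum Restaurant': [0.0]})

# Keep only categories present in both cities, in one fixed order, so that
# column j means the same venue category in both matrices.
shared = sorted(set(mtl.columns) & set(tor.columns))
mtl_aligned = mtl[shared]
tor_aligned = tor[shared]
print(shared)
```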

Cluster the Montreal neighbourhoods

In [453]:
# set number of clusters
kclusters = 15

# run k-means clustering on the shared-category features
# (n_init=10 pins the pre-1.4 scikit-learn default explicitly)
kmeans = KMeans(n_clusters=kclusters, n_init=10, random_state=10).fit(df_montreal)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:25]

array([ 6,  6,  6,  6,  6,  6,  6,  5,  5,  5, 10,  2,  2,  2,  6,  1,  1,
        1,  1,  5,  5,  6,  6,  6,  6])

In [455]:
#neighborhoods_venues_sorted.drop('Cluster Labels',1, inplace=True)

In [None]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

montreal_merged = df_postal_c

Merge datasets with cluster labels

In [457]:


# join the sorted venues onto montreal_merged to add cluster labels and top venues for each neighbourhood
montreal_merged = montreal_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

montreal_merged.head() # check the last columns!

Unnamed: 0,Borough,Latitude,Longitude,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue,...,26th Most Common Venue,27th Most Common Venue,28th Most Common Venue,29th Most Common Venue,30th Most Common Venue,31th Most Common Venue,32th Most Common Venue,33th Most Common Venue,34th Most Common Venue,35th Most Common Venue,36th Most Common Venue,37th Most Common Venue,38th Most Common Venue,39th Most Common Venue,40th Most Common Venue,41th Most Common Venue,42th Most Common Venue,43th Most Common Venue,44th Most Common Venue,45th Most Common Venue,46th Most Common Venue,47th Most Common Venue,48th Most Common Venue,49th Most Common Venue,50th Most Common Venue
0,H1A,45.667824,-73.505133,Pointe-aux-Trembles,5.0,Convenience Store,Pharmacy,Supermarket,Sushi Restaurant,Dim Sum Restaurant,Falafel Restaurant,Event Space,English Restaurant,Empanada Restaurant,Dumpling Restaurant,Drugstore,Dive Bar,Discount Store,Diner,Dessert Shop,Farmers Market,Department Store,Deli / Bodega,Cycle Studio,Currency Exchange,...,Cosmetics Shop,Farm,Yoga Studio,Construction & Landscaping,Filipino Restaurant,Hardware Store,Harbor / Marina,Gym Pool,Gym / Fitness Center,Gym,Grocery Store,Greek Restaurant,Gourmet Shop,Golf Course,Gift Shop,German Restaurant,Gastropub,Gas Station,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Food Truck,Food & Drink Shop,Food
1,H2A,45.238182,-73.569909,Saint-Michel East,10.0,ATM,Pizza Place,Dim Sum Restaurant,Farm,Falafel Restaurant,Event Space,English Restaurant,Empanada Restaurant,Dumpling Restaurant,Drugstore,Dive Bar,Discount Store,Diner,Dessert Shop,Convenience Store,Department Store,Deli / Bodega,Cycle Studio,Currency Exchange,Curling Ice,...,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Hawaiian Restaurant,Hardware Store,Harbor / Marina,Gym Pool,Gym / Fitness Center,Gym,Grocery Store,Greek Restaurant,Gourmet Shop,Golf Course,Gift Shop,German Restaurant,Gastropub,Gas Station,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Food Truck,Food & Drink Shop,Food,Cosmetics Shop
2,H3A,45.50529,-73.564076,Downtown Montreal North,6.0,Hotel,Café,Plaza,French Restaurant,Restaurant,Japanese Restaurant,Performing Arts Venue,Pizza Place,Building,Bubble Tea Shop,Cocktail Bar,Coffee Shop,Concert Hall,Taco Place,Yoga Studio,Theater,Bakery,Asian Restaurant,Art Museum,Dessert Shop,...,Hotel Bar,Dumpling Restaurant,Gourmet Shop,Pharmacy,Gastropub,Chinese Restaurant,Noodle House,Comedy Club,Cycle Studio,Music Venue,Comfort Food Restaurant,Museum,Movie Theater,Department Store,Mongolian Restaurant,Men's Store,Cupcake Shop,Lounge,Liquor Store,Korean Restaurant,Jazz Club,Cuban Restaurant,Food Truck,Public Art,Salad Place
3,H4A,45.467967,-73.628922,Notre-Dame-de-Grâce,6.0,Café,Grocery Store,Mexican Restaurant,Sporting Goods Shop,Bistro,Sandwich Place,Pharmacy,Mac & Cheese Joint,Tea Room,Park,Bakery,Athletics & Sports,Asian Restaurant,Pizza Place,University,Vegetarian / Vegan Restaurant,Gym,Food & Drink Shop,Dumpling Restaurant,Dim Sum Restaurant,...,Yoga Studio,Dessert Shop,Department Store,Deli / Bodega,Currency Exchange,Curling Ice,Cupcake Shop,Cuban Restaurant,Creperie,Coworking Space,Cycle Studio,Farmers Market,English Restaurant,Furniture / Home Store,Greek Restaurant,Gourmet Shop,Golf Course,Gift Shop,German Restaurant,Gastropub,Gas Station,Gaming Cafe,Fried Chicken Joint,Event Space,French Restaurant
4,H5A,45.499583,-73.564917,Place Bonaventure,6.0,Plaza,Café,Restaurant,Coworking Space,Market,French Restaurant,Food Truck,Breakfast Spot,Building,Park,Scenic Lookout,Cupcake Shop,Music Venue,Bistro,Monument / Landmark,Church,Portuguese Restaurant,Cocktail Bar,Coffee Shop,Steakhouse,...,Bar,BBQ Joint,Hawaiian Restaurant,Taco Place,Cycle Studio,Deli / Bodega,Department Store,Currency Exchange,Curling Ice,Dessert Shop,Gourmet Shop,Golf Course,Greek Restaurant,Gift Shop,Dim Sum Restaurant,Diner,German Restaurant,Grocery Store,Discount Store,Fried Chicken Joint,Dive Bar,Drugstore,Food & Drink Shop,Food,Fish & Chips Shop


In [458]:
montreal_merged.head()

Unnamed: 0,Borough,Latitude,Longitude,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue,...,26th Most Common Venue,27th Most Common Venue,28th Most Common Venue,29th Most Common Venue,30th Most Common Venue,31th Most Common Venue,32th Most Common Venue,33th Most Common Venue,34th Most Common Venue,35th Most Common Venue,36th Most Common Venue,37th Most Common Venue,38th Most Common Venue,39th Most Common Venue,40th Most Common Venue,41th Most Common Venue,42th Most Common Venue,43th Most Common Venue,44th Most Common Venue,45th Most Common Venue,46th Most Common Venue,47th Most Common Venue,48th Most Common Venue,49th Most Common Venue,50th Most Common Venue
0,H1A,45.667824,-73.505133,Pointe-aux-Trembles,5.0,Convenience Store,Pharmacy,Supermarket,Sushi Restaurant,Dim Sum Restaurant,Falafel Restaurant,Event Space,English Restaurant,Empanada Restaurant,Dumpling Restaurant,Drugstore,Dive Bar,Discount Store,Diner,Dessert Shop,Farmers Market,Department Store,Deli / Bodega,Cycle Studio,Currency Exchange,...,Cosmetics Shop,Farm,Yoga Studio,Construction & Landscaping,Filipino Restaurant,Hardware Store,Harbor / Marina,Gym Pool,Gym / Fitness Center,Gym,Grocery Store,Greek Restaurant,Gourmet Shop,Golf Course,Gift Shop,German Restaurant,Gastropub,Gas Station,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Food Truck,Food & Drink Shop,Food
1,H2A,45.238182,-73.569909,Saint-Michel East,10.0,ATM,Pizza Place,Dim Sum Restaurant,Farm,Falafel Restaurant,Event Space,English Restaurant,Empanada Restaurant,Dumpling Restaurant,Drugstore,Dive Bar,Discount Store,Diner,Dessert Shop,Convenience Store,Department Store,Deli / Bodega,Cycle Studio,Currency Exchange,Curling Ice,...,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Hawaiian Restaurant,Hardware Store,Harbor / Marina,Gym Pool,Gym / Fitness Center,Gym,Grocery Store,Greek Restaurant,Gourmet Shop,Golf Course,Gift Shop,German Restaurant,Gastropub,Gas Station,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Food Truck,Food & Drink Shop,Food,Cosmetics Shop
2,H3A,45.50529,-73.564076,Downtown Montreal North,6.0,Hotel,Café,Plaza,French Restaurant,Restaurant,Japanese Restaurant,Performing Arts Venue,Pizza Place,Building,Bubble Tea Shop,Cocktail Bar,Coffee Shop,Concert Hall,Taco Place,Yoga Studio,Theater,Bakery,Asian Restaurant,Art Museum,Dessert Shop,...,Hotel Bar,Dumpling Restaurant,Gourmet Shop,Pharmacy,Gastropub,Chinese Restaurant,Noodle House,Comedy Club,Cycle Studio,Music Venue,Comfort Food Restaurant,Museum,Movie Theater,Department Store,Mongolian Restaurant,Men's Store,Cupcake Shop,Lounge,Liquor Store,Korean Restaurant,Jazz Club,Cuban Restaurant,Food Truck,Public Art,Salad Place
3,H4A,45.467967,-73.628922,Notre-Dame-de-Grâce,6.0,Café,Grocery Store,Mexican Restaurant,Sporting Goods Shop,Bistro,Sandwich Place,Pharmacy,Mac & Cheese Joint,Tea Room,Park,Bakery,Athletics & Sports,Asian Restaurant,Pizza Place,University,Vegetarian / Vegan Restaurant,Gym,Food & Drink Shop,Dumpling Restaurant,Dim Sum Restaurant,...,Yoga Studio,Dessert Shop,Department Store,Deli / Bodega,Currency Exchange,Curling Ice,Cupcake Shop,Cuban Restaurant,Creperie,Coworking Space,Cycle Studio,Farmers Market,English Restaurant,Furniture / Home Store,Greek Restaurant,Gourmet Shop,Golf Course,Gift Shop,German Restaurant,Gastropub,Gas Station,Gaming Cafe,Fried Chicken Joint,Event Space,French Restaurant
4,H5A,45.499583,-73.564917,Place Bonaventure,6.0,Plaza,Café,Restaurant,Coworking Space,Market,French Restaurant,Food Truck,Breakfast Spot,Building,Park,Scenic Lookout,Cupcake Shop,Music Venue,Bistro,Monument / Landmark,Church,Portuguese Restaurant,Cocktail Bar,Coffee Shop,Steakhouse,...,Bar,BBQ Joint,Hawaiian Restaurant,Taco Place,Cycle Studio,Deli / Bodega,Department Store,Currency Exchange,Curling Ice,Dessert Shop,Gourmet Shop,Golf Course,Greek Restaurant,Gift Shop,Dim Sum Restaurant,Diner,German Restaurant,Grocery Store,Discount Store,Fried Chicken Joint,Dive Bar,Drugstore,Food & Drink Shop,Food,Fish & Chips Shop


Here is the cluster label of my borough in Montreal, the one I'd like to find in Toronto:

In [459]:
montreal_merged.iloc[110]

Borough                                      H2Y 
Latitude                                  45.5034
Longitude                                -73.5574
Neighborhood                         Old Montreal
Cluster Labels                                  6
1st Most Common Venue                        Café
2nd Most Common Venue                       Hotel
3rd Most Common Venue           French Restaurant
4th Most Common Venue                  Restaurant
5th Most Common Venue          Italian Restaurant
6th Most Common Venue                         Spa
7th Most Common Venue          Chinese Restaurant
8th Most Common Venue              Sandwich Place
9th Most Common Venue            Asian Restaurant
10th Most Common Venue                  Nightclub
11th Most Common Venue                   Boutique
12th Most Common Venue               Burger Joint
13th Most Common Venue                 Steakhouse
14th Most Common Venue               Dessert Shop
15th Most Common Venue                Coffee Shop


In [460]:
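# drop rows with any missing value (such as a missing cluster label)
montreal_merged.dropna(inplace=True)

In [461]:
# cast the float labels (e.g. 6.0) to int so they can be used to index the colour list
montreal_merged = montreal_merged.astype({'Cluster Labels':'int32'})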

In [462]:
import folium
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map centred on Montreal (latitude/longitude were defined earlier in the notebook)
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# one rainbow colour per cluster (kclusters was defined earlier in the notebook)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(c) for c in colors_array]

# add one circle marker per neighbourhood, coloured by its cluster label
for lat, lon, poi, cluster in zip(montreal_merged['Latitude'], montreal_merged['Longitude'],
                                  montreal_merged['Neighborhood'], montreal_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster - 1],
        fill=True,
        fill_color=rainbow[cluster - 1],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

![Montreal Boroughs](montreal_cluster_6.JPG)

Let's sort the columns of each dataframe alphabetically so that the features (columns) appear in the same order in both datasets.

In [465]:
df_toronto_sorted = df_toronto.reindex(sorted(df_toronto.columns), axis=1)

In [466]:
df_toronto_sorted.head()

Unnamed: 0,American Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Beer Store,Bistro,Boat or Ferry,Bookstore,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Bus Line,Bus Station,Business Service,Butcher,Café,...,Scenic Lookout,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Soccer Field,Spa,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.041667,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [467]:
df_montreal_sorted = df_montreal.reindex(sorted(df_montreal.columns), axis=1)

In [468]:
df_montreal_sorted.head()

Unnamed: 0,American Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Beer Store,Bistro,Boat or Ferry,Bookstore,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Bus Line,Bus Station,Business Service,Butcher,Café,...,Scenic Lookout,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Soccer Field,Spa,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0
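Sorting gives the two dataframes the same column order only if they already contain exactly the same venue categories. The sketch below shows one way to enforce this, assuming the Montreal columns define the feature set the classifier is trained on: Toronto-only categories are dropped and categories missing from Toronto are filled with 0. The toy dataframes are made up for illustration.

```python
import pandas as pd

# Made-up toy frames standing in for df_montreal / df_toronto:
# rows are neighbourhoods, columns are venue-category frequencies.
montreal = pd.DataFrame({"Café": [0.5, 0.2], "Bakery": [0.5, 0.0], "Spa": [0.0, 0.8]})
toronto = pd.DataFrame({"Bakery": [0.3], "Café": [0.7], "Airport": [0.0]})

# Assumption: the training city (Montreal) defines the feature set.
train_cols = sorted(montreal.columns)
montreal_aligned = montreal[train_cols]

# Drop categories unknown to the classifier and add missing ones as 0.0,
# so column i means the same venue type in both feature matrices.
toronto_aligned = toronto.reindex(columns=train_cols, fill_value=0.0)

print(list(toronto_aligned.columns))  # ['Bakery', 'Café', 'Spa']
```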


Select the features from the Montreal venues data set, which we will use to train the classifier to predict the cluster label.


In [469]:
X = df_montreal_sorted
X[0:5]

Unnamed: 0,American Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Beer Store,Bistro,Boat or Ferry,Bookstore,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Bus Line,Bus Station,Business Service,Butcher,Café,...,Scenic Lookout,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Soccer Field,Spa,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0


In [470]:
y = montreal_merged['Cluster Labels']

In [471]:
X.shape

(116, 150)

In [472]:
y.shape

(116,)

Select the same features from the Toronto venues data set, for which the classifier will predict the cluster labels.


In [474]:
X_t = df_toronto_sorted
X_t[0:5]

Unnamed: 0,American Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Beer Store,Bistro,Boat or Ferry,Bookstore,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Bus Line,Bus Station,Business Service,Butcher,Café,...,Scenic Lookout,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Soccer Field,Spa,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.041667,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Data Normalization Step

In [475]:
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics

In [476]:
# standardize each venue column to zero mean and unit variance
X = preprocessing.StandardScaler().fit(X).transform(X)
X[0:5]

array([[-0.09325048, -0.24991173, -0.26344279, -0.22531319, -0.11890631,
        -0.30605366, -0.52627364, -0.42183071, -0.33927522, -0.09325048,
        -0.15014879, -0.24431382, -0.09325048, -0.36581591, -0.11004202,
         1.32531787, -0.12825197, -0.23318698, -0.22915334, -0.34235757,
        -0.09325048, -0.11945298, -0.1620909 , -0.18898224,  1.31527965,
        -0.09325048, -0.09325048, -0.13245324, -0.28389439, -0.09325048,
        -0.22617854, -0.18898224, -0.2410366 , -0.30792806,  0.64277466,
        -0.21223818, -0.21619666, -0.23824869, -0.40079586, -0.09862755,
        -0.09325048, -0.11582191, -0.20021811, -0.17899763, -0.09325048,
        -0.21459438, -0.20850187, -0.1938295 , -0.09325048, -0.17851599,
        -0.19328958, -0.23401729, -0.09325048, -0.18898224, -0.22171929,
         0.50324968, -0.09325048, -0.17560724, -0.13245324, -0.21194997,
        -0.20021811, -0.42100794, -0.1319719 ,  2.3826586 , -0.21223818,
        -0.1671186 , -0.32994795, -0.13245324, -0.2
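`StandardScaler` rescales each venue column to zero mean and unit variance, z = (x - μ) / σ, where μ and σ are the column's mean and population standard deviation. A small check on a made-up matrix (not the Montreal data) shows that the scaler and the formula agree:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix: 3 neighbourhoods x 2 venue categories.
X_toy = np.array([[0.0, 0.2],
                  [0.5, 0.2],
                  [1.0, 0.8]])

scaled = StandardScaler().fit_transform(X_toy)

# Same result by hand: subtract each column's mean and divide by its
# population standard deviation (ddof=0, as StandardScaler does).
manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

print(np.allclose(scaled, manual))  # True
```

A column with zero variance (a venue category with the same frequency everywhere) is only centred, because scikit-learn uses a scale of 1 for such columns.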

## K Nearest Neighbor(KNN) Classifier
We will train a classifier on the Montreal venues to predict the cluster label.

In [477]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.03, random_state=4)
print ('Train set:', X_train.shape,  y_train.shape)
print ('Test set:', X_test.shape,  y_test.shape)

Train set: (112, 150) (112,)
Test set: (4, 150) (4,)


Let's investigate how many neighbours (k) the classifier should use.

In [478]:
Ks = 10
mean_acc = np.zeros((Ks - 1))
std_acc = np.zeros((Ks - 1))
for n in range(1, Ks):
    # train a KNN model with n neighbours and predict the held-out neighbourhoods
    neigh = KNeighborsClassifier(n_neighbors=n).fit(X_train, y_train)
    yhat = neigh.predict(X_test)
    mean_acc[n - 1] = metrics.accuracy_score(y_test, yhat)

    # standard error of the accuracy over the test samples
    std_acc[n - 1] = np.std(yhat == y_test) / np.sqrt(yhat.shape[0])

mean_acc

array([0.5 , 0.25, 0.5 , 0.75, 0.75, 0.75, 0.75, 0.75, 0.75])

k = 4 is the smallest number of neighbours that reaches the best test accuracy (0.75), so we retrain the classifier using 4 nearest neighbours.

In [482]:
# retrain with k = 4 and evaluate this model (not the last model from the loop above)
Knn = KNeighborsClassifier(n_neighbors=4).fit(X_train, y_train)
yhat = Knn.predict(X_test)
accuracy = metrics.accuracy_score(y_test, yhat)
accuracy

0.75
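With `test_size=0.03` the test set holds only 4 neighbourhoods. Each accuracy value above can therefore only be 0, 0.25, 0.5, 0.75 or 1, and a single misclassified neighbourhood moves it by 0.25. k-fold cross-validation, where every neighbourhood is used for testing once, may give a more stable basis for choosing k. The sketch below is an alternative, not the method used in this notebook. It uses synthetic data from `make_blobs` as a stand-in for the scaled Montreal features `X` and labels `y`.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the scaled Montreal features (X) and cluster labels (y);
# in the notebook you would pass X and y instead.
X_demo, y_demo = make_blobs(n_samples=116, n_features=5, centers=4, random_state=4)

cv_scores = {}
for k in range(1, 10):
    knn = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validation: every sample is used for testing exactly once
    scores = cross_val_score(knn, X_demo, y_demo, cv=5, scoring="accuracy")
    cv_scores[k] = scores.mean()

# max() returns the first key with the highest mean score, i.e. the smallest such k
best_k = max(cv_scores, key=cv_scores.get)
print(best_k, round(cv_scores[best_k], 3))
```

With the real cluster labels, scikit-learn warns when a cluster has fewer members than the number of folds. This may happen for very small clusters.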

Normalize the Toronto data so that we can predict a cluster for each neighbourhood from its venues.

In [484]:
X_t = preprocessing.StandardScaler().fit(X_t).transform(X_t)
X_t[0:5]

array([[-0.29974395, -0.33347705, -0.14401918, -0.25357   , -0.22571469,
        -0.17675071, -0.3779249 , -0.39653285, -0.32893849, -0.15587421,
        -0.26354469, -0.20390405, -0.10314212, -0.34101694, -0.17795086,
         5.58830685, -0.30429715, -0.24269189, -0.1804979 , -0.30001642,
        -0.20698077, -0.15348155, -0.10314212, -0.22235768, -0.55814018,
        -0.10314212, -0.2080414 , -0.20266493, -0.26268292, -0.14429212,
        -0.10314212, -0.10314212,  4.91583723, -0.24090451, -0.71020323,
        -0.25526489, -0.25774719, -0.14664712, -0.28302442, -0.24721708,
        -0.10314212, -0.24631673, -0.12941714, -0.13678853, -0.10314212,
        -0.28149736, -0.2144693 , -0.30830799, -0.10314212, -0.37375659,
        -0.24229839, -0.10314212, -0.14857417, -0.12665929, -0.23169992,
        -0.23770833, -0.10502039, -0.17241438, -0.1466357 , -0.14401703,
        -0.15450708, -0.20251136, -0.32114085, -0.23175818, -0.10314212,
        -0.2431654 , -0.3119206 , -0.10314212, -0.1
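The cell above fits a new `StandardScaler` on the Toronto data. Toronto features are therefore centred and scaled with Toronto's own means and standard deviations, not with the Montreal statistics the classifier was trained on. A common alternative is to fit the scaler once on the training data and reuse it. Here is a sketch with made-up matrices standing in for `df_montreal_sorted` and `df_toronto_sorted` (same columns, same order assumed):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up stand-ins for df_montreal_sorted and df_toronto_sorted.
montreal_features = np.array([[0.0, 0.2], [0.5, 0.2], [1.0, 0.8]])
toronto_features = np.array([[0.5, 0.4], [2.0, 0.2]])

# Fit once, on the training city only.
scaler = StandardScaler().fit(montreal_features)
X_train_scaled = scaler.transform(montreal_features)

# Reuse the Montreal mean/std for Toronto instead of refitting.
X_toronto_scaled = scaler.transform(toronto_features)

# Refitting on Toronto gives different values for the same raw input.
refit = StandardScaler().fit_transform(toronto_features)
print(np.allclose(X_toronto_scaled, refit))  # False
```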

### Predict cluster given Toronto venues using KNN classifier trained on Montreal data

In [485]:
y_t = Knn.predict(X_t)

Check how many labels we got:

In [488]:
len(y_t)

95
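`len(y_t)` confirms that there is one predicted label per Toronto neighbourhood. To see how the labels spread across clusters, and how many neighbourhoods fall into cluster 6, the labels can be counted directly before any merge. Here is a sketch with a made-up label array in place of `y_t`:

```python
import numpy as np

# Made-up predictions standing in for y_t (one cluster label per neighbourhood).
y_t_demo = np.array([6, 6, 2, 6, 5, 6, 6, 2])

# Count how many neighbourhoods received each cluster label.
labels, counts = np.unique(y_t_demo, return_counts=True)
label_counts = dict(zip(labels.tolist(), counts.tolist()))
print(label_counts)  # {2: 2, 5: 1, 6: 5}

# Neighbourhoods in the preferred cluster vs. all the others.
n_preferred = int((y_t_demo == 6).sum())
print(n_preferred, len(y_t_demo) - n_preferred)  # 5 3
```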

In [525]:
df_toronto_of_my_choice = pd.read_csv('toronto_50_venues_merged.csv')

In [526]:
df_toronto_of_my_choice.insert(0, 'Cluster Labels', y_t)

In [527]:
df_toronto_of_my_choice.head()

Unnamed: 0.1,Cluster Labels,Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,...,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,6,0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,6,1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,6,2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0
3,6,3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,6,4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [528]:
df_toronto_coordinates = pd.read_csv('./toronto_neighborhoods_with_coordinates.csv')

Merge the two dataframes so that the venues and the coordinates of each neighbourhood are in one dataframe.

In [529]:
df_toronto_of_my_choice = df_toronto_of_my_choice.merge(df_toronto_coordinates, on='Neighborhood')
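`DataFrame.merge` performs an inner join by default. A neighbourhood missing from the coordinates file is therefore silently dropped, and a name that appears on several rows of that file duplicates the matching rows. As a result, the row count after the merge is not guaranteed to match the 95 predicted labels. The sketch below shows two checks on made-up dataframes (the names and coordinates are placeholders, not real data):

```python
import pandas as pd
from pandas.errors import MergeError

# Made-up stand-ins for df_toronto_of_my_choice and df_toronto_coordinates.
venues = pd.DataFrame({"Neighborhood": ["Agincourt", "Bayview Village", "Unknown Place"],
                       "Cluster Labels": [6, 6, 2]})
coords = pd.DataFrame({"Neighborhood": ["Agincourt", "Bayview Village", "Bayview Village"],
                       "Latitude": [43.7, 43.7, 43.7],    # placeholder values
                       "Longitude": [-79.4, -79.4, -79.4]})

# A left join with indicator=True flags neighbourhoods that found no coordinates.
check = venues.merge(coords, on="Neighborhood", how="left", indicator=True)
unmatched = check.loc[check["_merge"] == "left_only", "Neighborhood"].tolist()
print(unmatched)  # ['Unknown Place']

# validate="many_to_one" raises if a neighbourhood name repeats in the coordinates file.
try:
    venues.merge(coords, on="Neighborhood", validate="many_to_one")
    duplicated_keys = False
except MergeError:
    duplicated_keys = True
print(duplicated_keys)  # True
```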

### My preferred neighbourhood in Montreal is H2Y (Old Montreal), which corresponds to cluster 6
Next, let's select all Toronto neighbourhoods that were assigned to cluster 6.

In [534]:
df_toronto_cluster_6 = df_toronto_of_my_choice[df_toronto_of_my_choice['Cluster Labels'] == 6]

In [535]:
df_toronto_cluster_6.shape

(92, 273)

Let's visualize the neighbourhoods in Toronto. Neighbourhoods in my cluster of choice (cluster 6) are outlined in dark green on the map, and all the others are outlined in red.

In [550]:
import folium
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

toronto_lat = 43.6532
toronto_long = -79.3832

# create map centred on Toronto
map_cluster_6 = folium.Map(location=[toronto_lat, toronto_long], zoom_start=11)

# one rainbow fill colour per cluster (kclusters was defined earlier in the notebook)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(c) for c in colors_array]

# outline cluster-6 neighbourhoods in dark green and all the others in red
for lat, lon, poi, cluster in zip(df_toronto_of_my_choice['Latitude'], df_toronto_of_my_choice['Longitude'],
                                  df_toronto_of_my_choice['Neighborhood'], df_toronto_of_my_choice['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    outline = 'darkgreen' if cluster == 6 else 'red'
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=outline,
        fill=True,
        fill_color=rainbow[cluster - 1],
        fill_opacity=0.7).add_to(map_cluster_6)


In [551]:
map_cluster_6

![Toronto Boroughs of Choice](toronto_boroughs_of_choice.JPG)

# Results

As we can see, only 8 of the 95 neighbourhoods in Toronto fall outside my preferred cluster, based on the venues within 500 meters of each neighbourhood.

This also means I have a lot to choose from when I decide to move from Montreal to Toronto.

# Discussion 

One interesting observation is that the unique venue types in Toronto differ from those in Montreal.

Could that be considered a business opportunity in another province?

I can't answer that right now, but it would certainly be interesting to explore.
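The venue categories that differ between the two cities can be listed with set operations on the column names of each city's venue dataframe. The sketch below uses made-up category sets, not results from this notebook. In the notebook the sets would presumably come from something like `set(df_toronto.columns)`, assuming those frames still hold every category.

```python
# Made-up category sets standing in for each city's venue-category columns.
montreal_categories = {"Café", "Bakery", "Creperie", "Spa"}
toronto_categories = {"Café", "Bakery", "Airport", "Wings Joint"}

# Set differences give the categories that occur in only one city.
only_in_toronto = sorted(toronto_categories - montreal_categories)
only_in_montreal = sorted(montreal_categories - toronto_categories)

print(only_in_toronto)   # ['Airport', 'Wings Joint']
print(only_in_montreal)  # ['Creperie', 'Spa']
```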

# Conclusion

To conclude, I would say that Montreal is very similar to Toronto in terms of neighbourhood venues.

Knowing how similar the boroughs are, moving to Toronto would be an easy switch.