<h1>Segmenting and Clustering Neighborhoods in Montreal</h1>

In [1]:
import numpy as np
import pandas as pd
import requests
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
from bs4 import BeautifulSoup
from geopy.geocoders import Nominatim

import sys
if sys.version_info[0] < 3: 
    from StringIO import StringIO
else:
    from io import StringIO

Make sure the Wikipedia page is reachable

In [2]:
wikiPage = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_H')
wikiPage

<Response [200]>

Scrape contents (look for table tag)

In [3]:
soup = BeautifulSoup(wikiPage.text, 'html.parser')
# Take the first <table> on the page, which holds the postal codes
hoodTable = soup.find("table")
rows = []
for row in hoodTable.tbody.find_all("tr")[:]:
    for td in row.find_all("td"):
        code = td.b.text.replace(',', '').strip()
        boro = ("" if not(td.i) else td.i.text) if not(td.a) else td.a.text
        rows.append([code, boro])
# Create dataframe
headings = ["PostalCode","Borough"]
df = pd.DataFrame(data=rows, columns=headings)
df.loc[df['PostalCode'].isin(['H8R','H8S','H8T'])]

Unnamed: 0,PostalCode,Borough
115,H8R,Ville Saint-Pierre
124,H8S,Lachine
133,H8T,Lachine


Get latest COVID cases from Sante Montreal's website.

The site rejects requests that do not look like they come from a regular browser, so we set a user agent explicitly.

The encoding is also a little picky because of the accented characters; after several tests, ***windows-1252*** fits best.

In [4]:
url_cases = 'https://santemontreal.qc.ca/fileadmin/fichiers/Campagnes/coronavirus/situation-montreal/municipal.csv'
csv_cases = StringIO(requests.get(url_cases, headers={'user-agent': 'MTL Explorer'}).text)
df_cases = pd.read_csv(csv_cases, sep=";", encoding='cp1252')
df_cases.head()

Unnamed: 0,Arrondissement ou ville liée,Nombre de cas confirmés,Répartition des cas (%),Taux de cas pour 100 000 personnes,Nombre de décès,Taux de mortalité pour 100 000 personnes
0,Ahuntsic–Cartierville,2397,86,17855,354,2637
1,Anjou,718,26,16777,51,1192
2,Baie D'urfé,31,1,"*810,9",< 5,n.p.
3,Beaconsfield,62,2,3208,9,n.p.
4,Côte-des-Neiges–Notre-Dame-de-Grâce,2258,81,1356,259,1555
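A note on the encoding remark above: by the time `.text` is read, `requests` has already decoded the body, so the `encoding` argument given to `read_csv` probably has no effect on a `StringIO`. The sketch below decodes raw bytes explicitly instead. The sample bytes and the helper name `read_cases_csv` are assumptions for illustration, not the real file.

```python
import io
import pandas as pd

# Hypothetical sample bytes standing in for response.content (not real data)
raw = (
    "Arrondissement ou ville liée;Nombre de cas confirmés\n"
    "Montréal-Est;12\n"
).encode("cp1252")

def read_cases_csv(raw_bytes, encoding="cp1252"):
    """Decode the raw bytes explicitly, then parse the semicolon-separated text."""
    text = raw_bytes.decode(encoding)
    return pd.read_csv(io.StringIO(text), sep=";")

sample = read_cases_csv(raw)
print(sample.columns.tolist())
```

With a live request, `response.content` would take the place of `raw`.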


Data wrangling to match Montreal geojson

In [5]:
# Header names
df_cases.columns = ['Borough', 'ConfirmedCount', 'DistributionRate', 'ConfirmedPer100K', 'DeathCount', 'DeathPer100K']
# Clean up borough names
df_cases = df_cases[~df_cases["Borough"].isin(['L\'Île-Dorval', 'Territoire à confirmer', 'Total à Montréal'])]
df_cases['Borough'].replace('–', '-', regex=True, inplace=True)
df_cases['Borough'].replace('Baie D\'urfé', 'Baie-d\'Urfé', regex=True, inplace=True)
df_cases['Borough'].replace('Montréal Est', 'Montréal-Est', regex=True, inplace=True)
df_cases['Borough'].replace('Plateau Mont-Royal', 'Le Plateau-Mont-Royal', regex=True, inplace=True)
df_cases['Borough'].replace('Rosemont.*Patrie', 'Rosemont-La Petite-Patrie', regex=True, inplace=True)
df_cases['Borough'].replace('Sud-Ouest', 'Le Sud-Ouest', regex=True, inplace=True)
# Clean up noise
df_cases.replace(r'<[ ]?|\*', '', regex=True, inplace=True)
df_cases.replace(r'n\.p\.', '0', regex=True, inplace=True)
df_cases.replace(',', '.', regex=True, inplace=True)
# Cast to correct data type
df_cases['ConfirmedCount'] = df_cases['ConfirmedCount'].astype('float')
df_cases['DistributionRate'] = df_cases['DistributionRate'].astype('float')
df_cases['ConfirmedPer100K'] = df_cases['ConfirmedPer100K'].astype('float')
df_cases['DeathCount'] = df_cases['DeathCount'].astype('float')
df_cases['DeathPer100K'] = df_cases['DeathPer100K'].astype('float')
# If confirmed cases per 100k is not populated (zero or less), assume 100k
df_cases.loc[df_cases["ConfirmedPer100K"] <= 0.0, "ConfirmedPer100K"] = 100000
df_cases.head()

Unnamed: 0,Borough,ConfirmedCount,DistributionRate,ConfirmedPer100K,DeathCount,DeathPer100K
0,Ahuntsic-Cartierville,2397.0,8.6,1785.5,354.0,263.7
1,Anjou,718.0,2.6,1677.7,51.0,119.2
2,Baie-d'Urfé,31.0,0.1,810.9,5.0,0.0
3,Beaconsfield,62.0,0.2,320.8,9.0,0.0
4,Côte-des-Neiges-Notre-Dame-de-Grâce,2258.0,8.1,1356.0,259.0,155.5
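The noise clean-up above can be illustrated on single values. This is a minimal sketch of the same three rules (strip `<` and `*`, treat `n.p.` as 0, swap the decimal comma), written as a hypothetical helper `clean_number` rather than the dataframe-wide `replace` calls used in the notebook.

```python
import re

def clean_number(raw):
    """Apply the notebook's clean-up rules to one raw cell value."""
    # Drop the '<' marker (with optional space) and the '*' footnote marker
    text = re.sub(r"<\s?|\*", "", str(raw)).strip()
    # 'n.p.' (not published) is treated as zero, as in the notebook
    if text == "n.p.":
        return 0.0
    # French decimal comma becomes a decimal point
    return float(text.replace(",", "."))

for value in ["*810,9", "< 5", "n.p.", "2397"]:
    print(value, "->", clean_number(value))
```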


Calculate the population from confirmed cases and confirmed per 100k.

This gives a very close approximation of the real numbers published in the last census from 2016.

In [6]:
df_cases['Population'] = round(df_cases['ConfirmedCount'] * 100000 / df_cases['ConfirmedPer100K'])
df_cases['Population'] = df_cases['Population'].astype(int)
df_cases[['Borough','Population']].head()

Unnamed: 0,Borough,Population
0,Ahuntsic-Cartierville,134248
1,Anjou,42797
2,Baie-d'Urfé,3823
3,Beaconsfield,19327
4,Côte-des-Neiges-Notre-Dame-de-Grâce,166519
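The derivation above is a simple rearrangement: if the rate is cases per 100,000 residents, then population = cases × 100,000 / rate. A short check using the first two rows shown earlier (the function name `estimate_population` is only illustrative):

```python
def estimate_population(confirmed_count, confirmed_per_100k):
    """Invert the per-100k rate to recover an approximate population."""
    return round(confirmed_count * 100_000 / confirmed_per_100k)

# Values taken from the cleaned table above
print(estimate_population(2397, 1785.5))  # Ahuntsic-Cartierville -> 134248
print(estimate_population(718, 1677.7))   # Anjou -> 42797
```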


Get geojson of all boroughs and cities

In [7]:
mtl_boro_url = 'http://donnees.ville.montreal.qc.ca/dataset/00bd85eb-23aa-4669-8f1b-ba9a000e3dd8/resource/e9b0f927-8f75-458c-8fda-b5da65cc8b73/download/limadmin.json'
mtl_boro_json = requests.get(mtl_boro_url).json()
mtl_boro_json['features'][0]['properties']

{'ABREV': 'OM',
 'AIRE': 3813355.72326504,
 'CODEID': '11',
 'CODEMAMROT': 'REM05',
 'MUNID': 66023,
 'NOM': 'Outremont',
 'NUM': 5,
 'PERIM': 10836.6706340882,
 'TYPE': 'Arrondissement'}

Extract area information (in km<sup>2</sup>) and translate

In [8]:
df_boro_area = pd.json_normalize(mtl_boro_json['features'])
df_boro_area = df_boro_area.loc[:,['properties.NOM','properties.AIRE', 'properties.TYPE']]
df_boro_area.columns = ['Borough', 'Area', 'BoroughType']
df_boro_area['Area'] = df_boro_area['Area'] / 1000000
df_boro_area.loc[df_boro_area["BoroughType"] == 'Arrondissement', "BoroughType"] = 0
df_boro_area.loc[df_boro_area["BoroughType"] == 'Ville liée', "BoroughType"] = 1
df_boro_area['BoroughType'] = df_boro_area['BoroughType'].astype(int)
df_boro_area.head()

Unnamed: 0,Borough,Area,BoroughType
0,Outremont,3.813356,0
1,LaSalle,25.197268,0
2,Mont-Royal,7.44556,1
3,Ville-Marie,21.500632,0
4,Le Plateau-Mont-Royal,8.151665,0


Left join the above to our main dataset

In [9]:
df_cases = df_cases.merge(right=df_boro_area, how='left', on='Borough')
df_cases.head()

Unnamed: 0,Borough,ConfirmedCount,DistributionRate,ConfirmedPer100K,DeathCount,DeathPer100K,Population,Area,BoroughType
0,Ahuntsic-Cartierville,2397.0,8.6,1785.5,354.0,263.7,134248,25.571187,0
1,Anjou,718.0,2.6,1677.7,51.0,119.2,42797,13.878194,0
2,Baie-d'Urfé,31.0,0.1,810.9,5.0,0.0,3823,8.025921,1
3,Beaconsfield,62.0,0.2,320.8,9.0,0.0,19327,24.922506,1
4,Côte-des-Neiges-Notre-Dame-de-Grâce,2258.0,8.1,1356.0,259.0,155.5,166519,21.483755,0


Calculate each borough's population density (per km<sup>2</sup>)

In [10]:
df_cases['Density'] = df_cases['Population'] / df_cases['Area']
df_cases[['Borough', 'Population', 'Area', 'Density']].head()

Unnamed: 0,Borough,Population,Area,Density
0,Ahuntsic-Cartierville,134248,25.571187,5249.971285
1,Anjou,42797,13.878194,3083.758656
2,Baie-d'Urfé,3823,8.025921,476.331652
3,Beaconsfield,19327,24.922506,775.483821
4,Côte-des-Neiges-Notre-Dame-de-Grâce,166519,21.483755,7750.926334


Get Montreal's coordinates

In [11]:
address = 'Montreal, Quebec, Canada'

geolocator = Nominatim(user_agent="MTL Explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Montreal are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Montreal are 45.4972159, -73.6103642.


Visualize cases on the Montreal island and ensure that the choropleth properly matches the names of our dataframe

In [12]:
mtl_map = folium.Map(location=[latitude,longitude], zoom_start=10, tiles='OpenStreetMap')

# Function to style suburbs that are not part of the city of Montreal
style_function = lambda x: {
    'stroke': x['properties']['TYPE'] == 'Ville liée',
    'weight': 1.5,
    'color': 'purple',
    'fillOpacity': 0
}
suburb_contours = folium.features.GeoJson(mtl_boro_json, style_function=style_function)

# Counts of confirmed cases
choropleth = folium.Choropleth(
    mtl_boro_json,
    data=df_cases,
    columns=['Borough', 'ConfirmedCount'],
    key_on='feature.properties.NOM',
    fill_color='YlOrRd',
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='COVID cases'
).add_to(mtl_map)

mtl_map.add_child(suburb_contours)
mtl_map

Interesting that not many cases are recorded on the West Island (basically, west of Lachine/Saint-Laurent). Can it be due to population density in those boroughs?

In [13]:
mtl_map = folium.Map(location=[latitude,longitude], zoom_start=10, tiles='OpenStreetMap')

# Densities by borough
choropleth = folium.Choropleth(
    mtl_boro_json,
    data=df_cases,
    columns=['Borough', 'Density'],
    key_on='feature.properties.NOM',
    fill_color='YlGn', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Population Density'
).add_to(mtl_map)

mtl_map.add_child(suburb_contours)
mtl_map

Population density partially answers the above question. However, the Plateau, which is the densest area on the map, has not had as many cases as neighboring municipalities.

Calculate Latitude/Longitude of each borough based on its geojson

In [14]:
df_cases['Latitude'] = 0.0
df_cases['Longitude'] = 0.0
boros = mtl_boro_json['features']

for idx in range(len(boros)):
    coords = boros[idx]['geometry']['coordinates'][0][0]
    ll = [0.0, 0.0]
    for pnt in range(len(coords)):
        ll = list(map(sum, zip(ll, coords[pnt])))
    ll = list(map(lambda x: x / len(coords), ll))
    df_cases.loc[df_cases['Borough'] == boros[idx]['properties']['NOM'], 'Latitude'] = ll[1]
    df_cases.loc[df_cases['Borough'] == boros[idx]['properties']['NOM'], 'Longitude'] = ll[0]

df_cases.head()

Unnamed: 0,Borough,ConfirmedCount,DistributionRate,ConfirmedPer100K,DeathCount,DeathPer100K,Population,Area,BoroughType,Density,Latitude,Longitude
0,Ahuntsic-Cartierville,2397.0,8.6,1785.5,354.0,263.7,134248,25.571187,0,5249.971285,45.559576,-73.674273
1,Anjou,718.0,2.6,1677.7,51.0,119.2,42797,13.878194,0,3083.758656,45.609607,-73.556824
2,Baie-d'Urfé,31.0,0.1,810.9,5.0,0.0,3823,8.025921,1,476.331652,45.412291,-73.913494
3,Beaconsfield,62.0,0.2,320.8,9.0,0.0,19327,24.922506,1,775.483821,45.428059,-73.870311
4,Côte-des-Neiges-Notre-Dame-de-Grâce,2258.0,8.1,1356.0,259.0,155.5,166519,21.483755,0,7750.926334,45.476786,-73.640154
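The loop above averages the vertices of the first ring of each borough's geometry, which is a rough stand-in for a true area-weighted centroid. A minimal sketch of that vertex average on a hypothetical square ring (the coordinates are made up for illustration; GeoJSON stores points as [longitude, latitude]):

```python
def vertex_mean(ring):
    """Average the [lon, lat] vertices of one polygon ring; returns (lat, lon)."""
    n = len(ring)
    lon = sum(point[0] for point in ring) / n
    lat = sum(point[1] for point in ring) / n
    return lat, lon

# Hypothetical square ring, not a real borough outline
ring = [[-74.0, 45.0], [-73.0, 45.0], [-73.0, 46.0], [-74.0, 46.0]]
lat, lon = vertex_mean(ring)
print(lat, lon)  # 45.5 -73.5
```

Real GeoJSON rings repeat their first vertex at the end, and vertices are rarely spaced evenly, so the result can drift slightly from the geometric centre.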


Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

In [15]:
CLIENT_ID = 'BQFGSANCVA4JLVSFDADVHZHJRMA2INX4URMRIFHJ0QGHRVPV' # your Foursquare ID
CLIENT_SECRET = 'TR00D4NNSNOSWX3JK1BZAOBFSQN3EVRD1BYXSCANUP3DRSXH' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: BQFGSANCVA4JLVSFDADVHZHJRMA2INX4URMRIFHJ0QGHRVPV
CLIENT_SECRET:TR00D4NNSNOSWX3JK1BZAOBFSQN3EVRD1BYXSCANUP3DRSXH


Get the neighborhood's name.

In [16]:
df_cases.loc[0, 'Borough']

'Ahuntsic-Cartierville'

Get the neighborhood's latitude and longitude values.

In [50]:
neighborhood_latitude = df_cases.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df_cases.loc[0, 'Longitude'] # neighborhood longitude value
neighborhood_name = df_cases.loc[0, 'Borough'] # neighborhood name
borough_radius = df_cases.loc[0, 'Area'] ** (1/2) * 1000

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Ahuntsic-Cartierville are 45.55957592841399, -73.67427335493899.
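The `borough_radius` expression above takes the square root of the area, which amounts to treating the borough as a square and using its side length, converted from kilometres to metres, as the search radius. A small sketch of that arithmetic (the helper name `borough_radius_m` is an assumption):

```python
import math

def borough_radius_m(area_km2):
    """Side length, in metres, of a square with the given area in km^2."""
    return math.sqrt(area_km2) * 1000

# Area of Ahuntsic-Cartierville as displayed in the table above
radius_m = borough_radius_m(25.571187)
print(round(radius_m, 1))
```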


First, let's create the GET request URL. Name your URL **url**.

In [64]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = borough_radius # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url


'https://api.foursquare.com/v2/venues/explore?&client_id=BQFGSANCVA4JLVSFDADVHZHJRMA2INX4URMRIFHJ0QGHRVPV&client_secret=TR00D4NNSNOSWX3JK1BZAOBFSQN3EVRD1BYXSCANUP3DRSXH&v=20180605&ll=45.55957592841399,-73.67427335493899&radius=5056.796167833463&limit=100'
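As an alternative to formatting the URL by hand, the query string can be assembled from a dictionary so that every value is escaped consistently. This is only a sketch using the standard library; the function name `build_explore_url` and the placeholder credentials are assumptions, and the endpoint is the one used above.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build an explore URL with URL-encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Placeholder credentials, not real keys
example_url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET",
                                "20180605", 45.5, -73.6, 500, 100)
print(example_url)
```

Note that `urlencode` escapes the comma in `ll` as `%2C`, which a server decodes back to a comma.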

Send the GET request and examine the results

In [65]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f30c944af3e194d40ecf890'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4dc68eddd22dafda2fc00fb8-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/bakery_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d16a941735',
         'name': 'Bakery',
         'pluralName': 'Bakeries',
         'primary': True,
         'shortName': 'Bakery'}],
       'id': '4dc68eddd22dafda2fc00fb8',
       'location': {'address': '114 Fleury Ouest',
        'cc': 'CA',
        'city': 'Montréal',
        'country': 'Canada',
        'distance': 1565,
        'formattedAddress': ['114 Fleury Ouest',
         'Montréal QC H3L 1T6',
         'Canada'],
        'labeledLatLngs': [{'label': 'display',
          'lat': 45.54680337304326,
      

From the Foursquare lab in the previous module, we know that all the information is in the *items* key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [66]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a *pandas* dataframe.

In [67]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,La Bête à Pain,Bakery,45.546803,-73.665857
1,Parc-nature de l'Île-de-la-Visitation,Park,45.575632,-73.658867
2,Mondou,Pet Store,45.553764,-73.662113
3,132 Bar Vintage,Lounge,45.546571,-73.665895
4,Les Cavistes Fleury,Wine Bar,45.545784,-73.666446
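To see why the filtered column names above contain dots, here is a minimal sketch of what `pd.json_normalize` does to nested dictionaries. The single item below is a made-up record shaped like the response, not real API output.

```python
import pandas as pd

# Hypothetical item mimicking the nesting of the response above
items = [{
    "venue": {
        "name": "Example Café",
        "location": {"lat": 45.5, "lng": -73.6},
        "categories": [{"name": "Café"}],
    }
}]

flat = pd.json_normalize(items)
# Nested keys are joined with dots; lists such as categories are left as-is
print(sorted(flat.columns))
```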


And how many venues were returned by Foursquare?

In [68]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


<a id='item2'></a>

## 2. Explore Neighborhoods in Montreal

#### Let's create a function to repeat the same process for all the boroughs of Montreal

In [83]:
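def getNearbyVenues(names, latitudes, longitudes, radius=500):    
    venues_list=[]
    # accept a single radius for every neighborhood or one radius per neighborhood
    if np.isscalar(radius):
        radius = [radius] * len(names)
    for name, lat, lng, rad in zip(names, latitudes, longitudes, radius):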
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            rad,
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
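The function above assumes every response contains `response.groups[0].items` and that every venue has at least one category. A more defensive parse might look like the sketch below; `extract_venue_rows` is a hypothetical helper and the payloads are made up to show the two edge cases.

```python
def extract_venue_rows(payload):
    """Return (name, lat, lng, category) tuples from an explore-style payload."""
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []  # e.g. an error response without any groups
    rows = []
    for item in groups[0].get("items", []):
        venue = item["venue"]
        categories = venue.get("categories", [])
        # Fall back to None when a venue has no category at all
        category = categories[0]["name"] if categories else None
        rows.append((venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows

# Hypothetical payloads for illustration
ok_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "A", "location": {"lat": 45.5, "lng": -73.6},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "B", "location": {"lat": 45.6, "lng": -73.7},
               "categories": []}},
]}]}}
empty_payload = {"meta": {"code": 400}, "response": {}}

print(extract_venue_rows(ok_payload))
print(extract_venue_rows(empty_payload))
```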

#### Now write the code to run the above function on each neighborhood and create a new dataframe called *montreal_venues*.

In [84]:
montreal_venues = getNearbyVenues(names=df_cases['Borough'],
                                   latitudes=df_cases['Latitude'],
                                   longitudes=df_cases['Longitude'],
                                   radius=df_cases['Area'] ** (1/2) * 1000 / 2
                                 )

Ahuntsic-Cartierville
Anjou
Baie-d'Urfé
Beaconsfield
Côte-des-Neiges-Notre-Dame-de-Grâce
Côte-Saint-Luc
Dollard-des-Ormeaux
Dorval
Hampstead
Kirkland
Lachine
LaSalle
L'Île-Bizard-Sainte-Geneviève
Mercier-Hochelaga-Maisonneuve
Montréal-Est
Montréal-Nord
Montréal-Ouest
Mont-Royal
Outremont
Pierrefonds-Roxboro
Le Plateau-Mont-Royal
Pointe-Claire
Rivière-des-Prairies-Pointe-aux-Trembles
Rosemont-La Petite-Patrie
Sainte-Anne-de-Bellevue
Saint-Laurent
Saint-Léonard
Senneville
Le Sud-Ouest
Verdun
Ville-Marie
Villeray-Saint-Michel-Parc-Extension
Westmount


#### Let's check the size of the resulting dataframe

In [85]:
print(montreal_venues.shape)
montreal_venues.head()

(1872, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Ahuntsic-Cartierville,45.559576,-73.674273,Mondou,45.553764,-73.662113,Pet Store
1,Ahuntsic-Cartierville,45.559576,-73.674273,La Fromagerie Hamel,45.557576,-73.659493,Cheese Shop
2,Ahuntsic-Cartierville,45.559576,-73.674273,La Bête à Pain,45.546803,-73.665857,Bakery
3,Ahuntsic-Cartierville,45.559576,-73.674273,Café de Da d'Ahuntsic,45.552853,-73.662297,Café
4,Ahuntsic-Cartierville,45.559576,-73.674273,L'Estaminet,45.560277,-73.657848,Restaurant


Let's check how many venues were returned for each neighborhood

In [86]:
montreal_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Ahuntsic-Cartierville,100,100,100,100,100,100
Anjou,54,54,54,54,54,54
Baie-d'Urfé,5,5,5,5,5,5
Beaconsfield,39,39,39,39,39,39
Côte-Saint-Luc,17,17,17,17,17,17
Côte-des-Neiges-Notre-Dame-de-Grâce,100,100,100,100,100,100
Dollard-des-Ormeaux,54,54,54,54,54,54
Dorval,62,62,62,62,62,62
Hampstead,5,5,5,5,5,5
Kirkland,40,40,40,40,40,40


#### Let's find out how many unique categories can be curated from all the returned venues

In [87]:
print('There are {} unique categories.'.format(len(montreal_venues['Venue Category'].unique())))

There are 224 unique categories.


<a id='item3'></a>

## 3. Analyze Each Neighborhood

In [88]:
# one hot encoding
montreal_onehot = pd.get_dummies(montreal_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
montreal_onehot['Neighborhood'] = montreal_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [montreal_onehot.columns[-1]] + list(montreal_onehot.columns[:-1])
montreal_onehot = montreal_onehot[fixed_columns]

montreal_onehot.head()

Unnamed: 0,Neighborhood,Airport,Airport Lounge,Airport Service,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,...,Transportation Service,Tunnel,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Yoga Studio,Zoo
0,Ahuntsic-Cartierville,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Ahuntsic-Cartierville,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Ahuntsic-Cartierville,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Ahuntsic-Cartierville,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Ahuntsic-Cartierville,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [89]:
montreal_onehot.shape

(1872, 225)

#### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [90]:
montreal_grouped = montreal_onehot.groupby('Neighborhood').mean().reset_index()
montreal_grouped

Unnamed: 0,Neighborhood,Airport,Airport Lounge,Airport Service,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,...,Transportation Service,Tunnel,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Yoga Studio,Zoo
0,Ahuntsic-Cartierville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0
1,Anjou,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Baie-d'Urfé,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Beaconsfield,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Côte-Saint-Luc,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Côte-des-Neiges-Notre-Dame-de-Grâce,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,...,0.0,0.0,0.02,0.0,0.0,0.03,0.0,0.0,0.01,0.0
6,Dollard-des-Ormeaux,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Dorval,0.032258,0.064516,0.064516,0.016129,0.0,0.0,0.0,0.0,0.0,...,0.016129,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Hampstead,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Kirkland,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Let's confirm the new size

In [91]:
montreal_grouped.shape

(33, 225)
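The one-hot encoding followed by a grouped mean yields, for each neighborhood, the share of its venues that fall in each category. A minimal sketch on made-up data (the neighborhoods "A" and "B" and their venues are hypothetical):

```python
import pandas as pd

# Hypothetical venues: four in neighborhood A, two in neighborhood B
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Park", "Café", "Bakery", "Café", "Café"],
})

# One column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of the indicator columns is the category frequency per neighborhood
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Each row of `freq` sums to 1, which is why neighborhoods with few venues show large values such as 0.2 per category.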

#### Let's print each neighborhood along with the top 5 most common venues

In [93]:
num_top_venues = 5

for hood in montreal_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = montreal_grouped[montreal_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Ahuntsic-Cartierville----
           venue  freq
0       Pharmacy  0.10
1  Grocery Store  0.06
2           Park  0.06
3           Café  0.05
4     Restaurant  0.04


----Anjou----
                  venue  freq
0           Coffee Shop  0.11
1            Restaurant  0.07
2  Fast Food Restaurant  0.07
3        Clothing Store  0.06
4           Pizza Place  0.04


----Baie-d'Urfé----
              venue  freq
0      Liquor Store   0.2
1     Grocery Store   0.2
2  Business Service   0.2
3     Train Station   0.2
4    Sandwich Place   0.2


----Beaconsfield----
                  venue  freq
0    Italian Restaurant  0.05
1              Pharmacy  0.05
2          Soccer Field  0.05
3        Sandwich Place  0.05
4  Gym / Fitness Center  0.05


----Côte-Saint-Luc----
           venue  freq
0           Bank  0.12
1   Liquor Store  0.06
2  Shopping Mall  0.06
3           Park  0.06
4    Golf Course  0.06


----Côte-des-Neiges-Notre-Dame-de-Grâce----
               venue  freq
0               Caf

First, let's write a function to sort the venues in descending order.

In [95]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [96]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = montreal_grouped['Neighborhood']

for ind in np.arange(montreal_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(montreal_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head(45)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Ahuntsic-Cartierville,Pharmacy,Grocery Store,Park,Café,Breakfast Spot,Sandwich Place,Restaurant,Supermarket,Fast Food Restaurant,Pizza Place
1,Anjou,Coffee Shop,Restaurant,Fast Food Restaurant,Clothing Store,Liquor Store,Food & Drink Shop,Pizza Place,Gym,Gas Station,Burger Joint
2,Baie-d'Urfé,Sandwich Place,Business Service,Train Station,Grocery Store,Liquor Store,English Restaurant,Flea Market,Fish Market,Fish & Chips Shop,Filipino Restaurant
3,Beaconsfield,Italian Restaurant,Pub,Gym / Fitness Center,Pharmacy,Restaurant,Pizza Place,Soccer Field,Sandwich Place,Bank,Furniture / Home Store
4,Côte-Saint-Luc,Bank,Pool,Asian Restaurant,Shopping Mall,Fast Food Restaurant,Liquor Store,Baseball Field,Tennis Court,Restaurant,Golf Course
5,Côte-des-Neiges-Notre-Dame-de-Grâce,Café,Indian Restaurant,Coffee Shop,Park,Grocery Store,Gym,Fast Food Restaurant,Restaurant,Vietnamese Restaurant,Pizza Place
6,Dollard-des-Ormeaux,Bank,Pharmacy,Grocery Store,Indian Restaurant,Chinese Restaurant,Sandwich Place,Breakfast Spot,Gas Station,Sushi Restaurant,Skating Rink
7,Dorval,Coffee Shop,Hotel,Airport Lounge,Airport Service,Scenic Lookout,Café,Duty-free Shop,Gastropub,Italian Restaurant,Rental Car Location
8,Hampstead,Home Service,Park,Bagel Shop,Food Service,Ice Cream Shop,Dessert Shop,Event Space,Food & Drink Shop,Food,Flea Market
9,Kirkland,Fast Food Restaurant,Pharmacy,Furniture / Home Store,Ice Cream Shop,Sandwich Place,Italian Restaurant,Bistro,Coffee Shop,Bakery,Soccer Field
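The column naming above relies on a lookup list plus a fallback to "th", which is fine for ranks 1 to 10. If more columns were ever needed, a general ordinal helper could be used instead. A sketch, with `ordinal` as a hypothetical name:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11th, 12th and 13th are exceptions to the last-digit rule
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

columns = ["Neighborhood"] + ["{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)]
print(columns[:4])
```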


<a id='item4'></a>

## 4. Cluster Neighborhoods

Run *k*-means to cluster the neighborhoods into 4 clusters.

In [35]:
# set number of clusters
kclusters = 4

montreal_grouped_clustering = montreal_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(montreal_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 0, 2, 1, 1, 2, 1, 1, 2], dtype=int32)
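The number of clusters is a judgment call. One common heuristic, not used in this notebook, is to fit several values of *k* and compare the inertia (within-cluster sum of squares). The sketch below does this on made-up toy points rather than the Montreal data; `n_init` is set explicitly as an assumption about sensible defaults.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: two well-separated groups of points
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])

# Fit k-means for several k and record the inertia of each fit
inertias = {}
for k in range(1, 5):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X).labels_
print(inertias)
print(labels)
```

A sharp drop in inertia followed by a plateau suggests the smaller *k* already captures most of the structure.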

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [36]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

montreal_merged = df_cases

# join df_cases with neighborhoods_venues_sorted to add the cluster label and top venues for each borough
montreal_merged = montreal_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Borough')

montreal_merged.head() # check the last columns!

Unnamed: 0,Borough,ConfirmedCount,DistributionRate,ConfirmedPer100K,DeathCount,DeathPer100K,Population,Area,BoroughType,Density,...,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Ahuntsic-Cartierville,2397.0,8.6,1785.5,354.0,263.7,134248,25.571187,0,5249.971285,...,Adult Boutique,Pharmacy,Park,Dive Bar,Flower Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store
1,Anjou,718.0,2.6,1677.7,51.0,119.2,42797,13.878194,0,3083.758656,...,Convenience Store,Fast Food Restaurant,BBQ Joint,Drugstore,Donut Shop,Flower Shop,Farmers Market,Falafel Restaurant,Event Space,Dive Bar
2,Baie-d'Urfé,31.0,0.1,810.9,5.0,0.0,3823,8.025921,1,476.331652,...,Music Venue,Wine Bar,Cosmetics Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Donut Shop
3,Beaconsfield,62.0,0.2,320.8,9.0,0.0,19327,24.922506,1,775.483821,...,Hockey Arena,Park,Soccer Field,Flower Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar
4,Côte-des-Neiges-Notre-Dame-de-Grâce,2258.0,8.1,1356.0,259.0,155.5,166519,21.483755,0,7750.926334,...,Ice Cream Shop,Tennis Court,Bagel Shop,Dive Bar,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Drugstore,Donut Shop
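A left join like the one above leaves NaN for any borough that has no match on the right-hand side, and pandas then stores the integer cluster labels as floats. The sketch below reproduces this on made-up frames and shows one way to find the unmatched rows and restore integer labels; the borough names "A", "B" and "C" are hypothetical.

```python
import pandas as pd

# Hypothetical frames: borough C has no cluster label
left = pd.DataFrame({"Borough": ["A", "B", "C"]})
right = pd.DataFrame({"Neighborhood": ["A", "B"], "Cluster Labels": [1, 0]})

merged = left.join(right.set_index("Neighborhood"), on="Borough")
print(merged["Cluster Labels"].dtype)  # float64, because of the NaN for C

# Boroughs that did not receive a label
missing = merged[merged["Cluster Labels"].isna()]["Borough"].tolist()

# Drop the unmatched rows and cast the labels back to integers
labelled = merged.dropna(subset=["Cluster Labels"]).astype({"Cluster Labels": int})
print(missing, labelled["Cluster Labels"].tolist())
```

Float labels cannot be used directly as list indices, which matters when the labels are used to pick a marker colour.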


Finally, let's visualize the resulting clusters

In [97]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(montreal_merged['Latitude'], montreal_merged['Longitude'], montreal_merged['Borough'], montreal_merged['Cluster Labels']):
    # the join leaves NaN (and float labels) for boroughs with no match, so skip those and cast the rest
    if pd.isna(cluster):
        continue
    cluster = int(cluster)
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<a id='item5'></a>

## 5. Examine Clusters

Now, you can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, you can then assign a name to each cluster. I will leave this exercise to you.

#### Cluster 1: Parks & outdoors

In [98]:
montreal_merged.loc[montreal_merged['Cluster Labels'] == 0, montreal_merged.columns[[1] + list(range(5, montreal_merged.shape[1]))]]

Unnamed: 0,ConfirmedCount,DeathPer100K,Population,Area,BoroughType,Density,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,31.0,0.0,3823,8.025921,1,476.331652,45.412291,-73.913494,0.0,Music Venue,Wine Bar,Cosmetics Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Donut Shop


#### Cluster 2: Coffee shops

In [99]:
montreal_merged.loc[montreal_merged['Cluster Labels'] == 1, montreal_merged.columns[[1] + list(range(5, montreal_merged.shape[1]))]]

Unnamed: 0,ConfirmedCount,DeathPer100K,Population,Area,BoroughType,Density,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,2397.0,263.7,134248,25.571187,0,5249.971285,45.559576,-73.674273,1.0,Adult Boutique,Pharmacy,Park,Dive Bar,Flower Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store
1,718.0,119.2,42797,13.878194,0,3083.758656,45.609607,-73.556824,1.0,Convenience Store,Fast Food Restaurant,BBQ Joint,Drugstore,Donut Shop,Flower Shop,Farmers Market,Falafel Restaurant,Event Space,Dive Bar
4,2258.0,155.5,166519,21.483755,0,7750.926334,45.476786,-73.640154,1.0,Ice Cream Shop,Tennis Court,Bagel Shop,Dive Bar,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Drugstore,Donut Shop
5,524.0,188.0,32448,6.81021,1,4764.611012,45.461789,-73.660656,1.0,Bank,Pharmacy,Deli / Bodega,Farmers Market,Wine Bar,Fast Food Restaurant,Falafel Restaurant,Event Space,Drugstore,Dive Bar
7,200.0,252.9,18981,28.15615,1,674.133368,45.469728,-73.743065,1.0,Airport Service,Wine Bar,Donut Shop,Flower Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Drugstore,Dive Bar
8,56.0,0.0,6973,1.768055,1,3943.881204,45.478693,-73.641883,1.0,Ice Cream Shop,Bagel Shop,Park,Donut Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Drugstore,Wine Bar
11,1252.0,232.9,76852,25.197268,0,3050.013221,45.431967,-73.619272,1.0,Home Service,Fast Food Restaurant,Grocery Store,Gift Shop,Donut Shop,Farmers Market,Falafel Restaurant,Event Space,Drugstore,Dive Bar
13,2395.0,275.7,136025,27.408412,0,4962.892438,45.586082,-73.536675,1.0,Restaurant,Coffee Shop,Breakfast Spot,Bus Station,Liquor Store,Sushi Restaurant,Donut Shop,Gas Station,Pizza Place,Vietnamese Restaurant
15,2597.0,293.2,84233,12.430208,0,6776.475378,45.595179,-73.636356,1.0,Pharmacy,Fast Food Restaurant,Comedy Club,Auto Workshop,Bar,Donut Shop,Farmers Market,Falafel Restaurant,Event Space,Drugstore
16,23.0,0.0,5051,1.419449,1,3558.422241,45.454891,-73.654088,1.0,Playground,Pool,Park,French Restaurant,Skating Rink,Dive Bar,Event Space,Drugstore,Donut Shop,Diner


#### Cluster 3: Airport & travel

In [100]:
montreal_merged.loc[montreal_merged['Cluster Labels'] == 2, montreal_merged.columns[[1] + list(range(5, montreal_merged.shape[1]))]]

Unnamed: 0,ConfirmedCount,DeathPer100K,Population,Area,BoroughType,Density,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,62.0,0.0,19327,24.922506,1,775.483821,45.428059,-73.870311,2.0,Hockey Arena,Park,Soccer Field,Flower Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar
6,430.0,165.6,48897,15.065159,1,3245.700946,45.491093,-73.826586,2.0,Print Shop,Park,Wine Bar,Flower Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar
9,121.0,119.1,20150,9.687581,1,2079.982546,45.451867,-73.879104,2.0,Construction & Landscaping,Park,Tennis Court,Wine Bar,Dive Bar,Farmers Market,Falafel Restaurant,Event Space,Drugstore,Donut Shop
18,274.0,41.7,23953,3.813356,0,6281.344238,45.514219,-73.607894,2.0,Park,Mountain,Café,Wine Bar,Flower Shop,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar
24,18.0,0.0,4959,11.150546,1,444.731596,45.42825,-73.929492,2.0,Park,Wine Bar,Cosmetics Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Donut Shop
28,979.0,229.0,78151,18.144269,0,4307.200134,45.469547,-73.580558,2.0,Park,Food Truck,Café,Flower Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar
32,193.0,118.2,20312,4.016301,1,5057.389864,45.484906,-73.597496,2.0,Building,Gym,Park,Garden,Wine Bar,Donut Shop,Falafel Restaurant,Event Space,Drugstore,Discount Store


#### Cluster 4: Grocery stores

In [101]:
montreal_merged.loc[montreal_merged['Cluster Labels'] == 3, montreal_merged.columns[[1] + list(range(5, montreal_merged.shape[1]))]]

Unnamed: 0,ConfirmedCount,DeathPer100K,Population,Area,BoroughType,Density,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,675.0,233.8,44490,23.127786,0,1923.660138,45.452941,-73.695655,3.0,Intersection,Wine Bar,Food Truck,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Donut Shop
