# Capstone Project - The Battle of the Neighborhoods (Week 4 & 5)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Similarity of Neighborhoods](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Similarity of Neighborhoods <a name="introduction"></a>

In this project, we will compare the neighborhoods of several major financial capitals. Inspired by the question raised in the Week 4 description, namely how similar or dissimilar cities are, we will group the neighborhoods and boroughs of different cities as a preliminary attempt to identify their functions and types. My hope is that such a categorization will shed light on the design of similar cities in the future.

## Data <a name="data"></a>

Aside from a couple of tables, which we will obtain by scraping web pages, we will use **geopy** to look up the geographic coordinates of neighborhoods and the **Foursquare API** to retrieve information on the venues around them.

In [1]:
import numpy as np
import pandas as pd
import requests # library to handle requests
import lxml.html as lh # parse the HTML pages fetched with requests
!conda install -c conda-forge geopy --yes # skip this line if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import json # library to handle JSON files
from pandas import json_normalize # transform a JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # skip this line if folium is already installed
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


In [2]:
url = 'https://en.wikipedia.org/wiki/Arrondissements_of_Paris'
page = requests.get(url)
doc = lh.fromstring(page.content)
tr_elements = doc.xpath('//tr')

Through trial and error, we found that rows 15 to 34 of the page's `<tr>` elements are the arrondissements (French for "districts") we need. We only need their names, which we will geocode next, so we collect them in a list.

In [3]:
paris_dist = []
for x in range(15,35):
    paris_dist.append(tr_elements[x].text_content().split('\n')[2])
print(paris_dist)

['Louvre', 'Bourse', 'Temple', 'Hôtel-de-Ville', 'Panthéon', 'Luxembourg', 'Palais-Bourbon', 'Élysée', 'Opéra', 'Entrepôt', 'Popincourt', 'Reuilly', 'Gobelins', 'Observatoire', 'Vaugirard', 'Passy', 'Batignolles-Monceau', 'Butte-Montmartre', 'Buttes-Chaumont', 'Ménilmontant']


In [4]:
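from geopy.extra.rate_limiter import RateLimiter

suffix = ', Paris, France'
lats = []
longs = []

geolocator = Nominatim(user_agent="ny_explorer")
# Nominatim's usage policy allows at most one request per second
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
for dist in paris_dist:
    location = geocode(dist + suffix)
    # geocode returns None when the address cannot be resolved
    if location is None:
        raise ValueError('Could not geocode ' + dist + suffix)
    lats.append(location.latitude)
    longs.append(location.longitude)
print(lats,longs)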

[48.8611473, 48.8686296, 48.8665004, 48.856426299999995, 48.84619085, 48.8504333, 48.86159615, 48.8466437, 48.8706446, 48.876106, 48.858416, 48.8396154, 48.8323973, 48.8295667, 48.8413705, 48.8575047, 48.881452, 48.8900117, 48.8783961, 48.8667079] [2.33802768704666, 2.3414739, 2.360708, 2.3525275780116073, 2.346078521905153, 2.3329507, 2.3179092733655935, 2.3698297, 2.33233, 2.35991, 2.379703, 2.3957517, 2.3555829, 2.3239624642685364, 2.3003827, 2.2809828, 2.3166666, 2.3464668, 2.3812008, 2.3833739]


In [5]:
paris_dict = {'Hood': paris_dist, 'Latitude': lats, 'Longitude':longs}
df_paris = pd.DataFrame(paris_dict)
df_paris

| | Hood | Latitude | Longitude |
|---|---|---|---|
| 0 | Louvre | 48.861147 | 2.338028 |
| 1 | Bourse | 48.86863 | 2.341474 |
| 2 | Temple | 48.8665 | 2.360708 |
| 3 | Hôtel-de-Ville | 48.856426 | 2.352528 |
| 4 | Panthéon | 48.846191 | 2.346079 |
| 5 | Luxembourg | 48.850433 | 2.332951 |
| 6 | Palais-Bourbon | 48.861596 | 2.317909 |
| 7 | Élysée | 48.846644 | 2.36983 |
| 8 | Opéra | 48.870645 | 2.33233 |
| 9 | Entrepôt | 48.876106 | 2.35991 |


In [6]:
address = 'Paris, France'

location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Paris are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Paris are 48.8566969, 2.3514616.


In [7]:
# create map of Paris using latitude and longitude values
map_paris = folium.Map(location=[latitude, longitude], zoom_start=10)

# add one marker per arrondissement, with its name as popup
for lat, lng, neighborhood in zip(df_paris['Latitude'], df_paris['Longitude'], df_paris['Hood']):
    label = folium.Popup(neighborhood, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_paris)

map_paris

In [8]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
page = requests.get(url)
doc = lh.fromstring(page.content)
tr_elements = doc.xpath('//tr')

In [9]:
col=[]
i=0
#For each cell of the first row (the header), store its name and an empty list
for t in tr_elements[0]:
    i+=1
    name=t.text_content().replace('\n','')
    print('%d:"%s"'%(i,name))
    col.append((name,[]))

1:"Postal Code"
2:"Borough"
3:"Neighborhood"


In [10]:
#Since our first row is the header, data is stored on the second row onwards
for j in range(1,len(tr_elements)):
    #T is our j'th row
    T=tr_elements[j]
    
    #If row is not of size 3, the //tr data is not from our table 
    if len(T)!=3:
        break
    
    #i is the index of our column
    i=0
    
    #Iterate through each element of the row
    for t in T.iterchildren():
        data=t.text_content().replace('\n','')
        col[i][1].append(data)
        #Increment i for the next column
        i+=1

In [11]:
Dict={title:column for (title,column) in col}
df=pd.DataFrame(Dict)
# The last parsed row, [[],['Canadian Postal Code'], []], is always included.
# We could not figure out the cause, so it is simply dropped.
df = df[:-1]
# drop postal codes without an assigned borough (reset_index returns a new frame, so assign it)
df = df[df.Borough != 'Not assigned'].reset_index(drop=True)
df.rename(columns={'Neighborhood':'Hood'},inplace=True)

In [12]:
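ll = pd.read_csv("https://cocl.us/Geospatial_data")
df = df.merge(ll, on='Postal Code')
# keep only the boroughs whose name contains "Toronto"
msk = df['Borough'].str.contains('Toronto')
df_toronto = df[msk]
# display with a fresh 0-based index (df_toronto itself keeps its original index)
df_toronto.reset_index(drop=True)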

| | Postal Code | Borough | Hood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 1 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
| 2 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 |
| 3 | M5C | Downtown Toronto | St. James Town | 43.651494 | -79.375418 |
| 4 | M4E | East Toronto | The Beaches | 43.676357 | -79.293031 |
| 5 | M5E | Downtown Toronto | Berczy Park | 43.644771 | -79.373306 |
| 6 | M5G | Downtown Toronto | Central Bay Street | 43.657952 | -79.387383 |
| 7 | M6G | Downtown Toronto | Christie | 43.669542 | -79.422564 |
| 8 | M5H | Downtown Toronto | Richmond, Adelaide, King | 43.650571 | -79.384568 |
| 9 | M6H | West Toronto | Dufferin, Dovercourt Village | 43.669005 | -79.442259 |


In [13]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [14]:
neighborhoods_data = newyork_data['features']
# define the dataframe columns
column_names = ['Borough', 'Hood', 'Latitude', 'Longitude'] 

# collect one record per neighborhood, then build the dataframe once
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_lon, neighborhood_lat = data['geometry']['coordinates'][:2]

    rows.append({'Borough': borough,
                 'Hood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)
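
The `json_normalize` function imported at the top is never actually used. As a sketch, and assuming the file follows the GeoJSON layout the loop above relies on (`properties.borough`, `properties.name`, `geometry.coordinates`), it can flatten all features in one call. The helper name `features_to_frame` and the two-feature `sample` are made up for illustration.

```python
import pandas as pd

def features_to_frame(geojson):
    """Flatten a GeoJSON FeatureCollection of neighborhoods into
    Borough / Hood / Latitude / Longitude columns (hypothetical helper)."""
    # nested dicts become dotted column names; the coordinate list stays a single value
    flat = pd.json_normalize(geojson['features'])
    coords = flat['geometry.coordinates']
    return pd.DataFrame({
        'Borough': flat['properties.borough'],
        'Hood': flat['properties.name'],
        # GeoJSON coordinates are ordered [longitude, latitude]
        'Latitude': coords.apply(lambda c: c[1]),
        'Longitude': coords.apply(lambda c: c[0]),
    })

# hypothetical sample in the same layout as newyork_data.json
sample = {'features': [
    {'properties': {'borough': 'Bronx', 'name': 'Wakefield'},
     'geometry': {'type': 'Point', 'coordinates': [-73.847201, 40.894705]}},
    {'properties': {'borough': 'Bronx', 'name': 'Co-op City'},
     'geometry': {'type': 'Point', 'coordinates': [-73.829939, 40.874294]}},
]}
print(features_to_frame(sample))
```

On the real data this would be called as `features_to_frame(newyork_data)`. The explicit loop above does the same thing and is easier to step through.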

In [15]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)
# Change the name
df_newyork = neighborhoods

The dataframe has 5 boroughs and 306 neighborhoods.


In [16]:
df_newyork.head()

| | Borough | Hood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


(306, 4)

In [17]:
# Foursquare credentials: fill in your own keys and avoid publishing real ones
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID


In [18]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=100):
    """Query the Foursquare 'explore' endpoint around each neighborhood center
    and return one row per venue found within `radius` meters."""
    url = 'https://api.foursquare.com/v2/venues/explore'
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        # query parameters; requests builds and encodes the query string
        params = {'client_id': CLIENT_ID,
                  'client_secret': CLIENT_SECRET,
                  'v': VERSION,
                  'll': '{},{}'.format(lat, lng),
                  'radius': radius,
                  'limit': limit}

        # make the GET request
        results = requests.get(url, params=params).json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        # (venues without any category are skipped instead of raising IndexError)
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results if v['venue']['categories']])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list],
                                 columns=['Hood',
                                          'Neighborhood Latitude',
                                          'Neighborhood Longitude',
                                          'Venue',
                                          'Venue Latitude',
                                          'Venue Longitude',
                                          'Venue Category'])

    return nearby_venues
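
For each neighborhood, the function walks `response → groups[0] → items`. From each item it reads `venue.name`, `venue.location.lat/lng` and the first entry of `venue.categories`. The sketch below pulls that parsing step out so it can be checked offline against a hand-made response. The helper `parse_explore_items` and the `fake` payload are hypothetical: the payload contains only the fields read here, not the full Foursquare schema. It also skips venues with an empty category list rather than failing on them.

```python
def parse_explore_items(payload, name, lat, lng):
    """Turn one 'explore' response into (hood, hood lat, hood lng,
    venue, venue lat, venue lng, category) tuples (hypothetical helper)."""
    items = payload['response']['groups'][0]['items']
    rows = []
    for item in items:
        venue = item['venue']
        # some venues carry no category; skip them instead of raising IndexError
        if not venue.get('categories'):
            continue
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     venue['categories'][0]['name']))
    return rows

# hypothetical payload containing only the fields read above
fake = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Tandem Coffee',
               'location': {'lat': 43.653559, 'lng': -79.361809},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Unlabelled Spot',
               'location': {'lat': 43.65, 'lng': -79.36},
               'categories': []}},
]}]}}
for row in parse_explore_items(fake, 'Regent Park, Harbourfront', 43.65426, -79.360636):
    print(row)
```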

In [19]:
toronto_venues = getNearbyVenues(names=df_toronto['Hood'],
                                   latitudes=df_toronto['Latitude'],
                                   longitudes=df_toronto['Longitude']
                                  )
paris_venues = getNearbyVenues(names=df_paris['Hood'],
                                   latitudes=df_paris['Latitude'],
                                   longitudes=df_paris['Longitude']
                                  )
newyork_venues = getNearbyVenues(names=df_newyork['Hood'],
                                   latitudes=df_newyork['Latitude'],
                                   longitudes=df_newyork['Longitude']
                                  )
print('Done loading venues.')

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
The Danforth West, Riverdale
Toronto Dominion Centre, Design Exchange
Brockton, Parkdale Village, Exhibition Place
India Bazaar, The Beaches West
Commerce Court, Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North & West
High Park, The Junction South
North Toronto West
The Annex, North Midtown, Yorkville
Parkdale, Roncesvalles
Davisville
University of Toronto, Harbord
Runnymede, Swansea
Moore Park, Summerhill East
Kensington Market, Chinatown, Grange Park
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
Stn A PO Boxes
St. James Town,

In [20]:
toronto_venues.to_csv(r'toronto_venues.csv', index = False)
paris_venues.to_csv(r'paris_venues.csv', index = False)
newyork_venues.to_csv(r'newyork_venues.csv', index = False)
print('Venue files saved.')

Venue files saved.


In [21]:
print(toronto_venues.shape)
toronto_venues.head()

(1613, 7)


| | Hood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Regent Park, Harbourfront | 43.65426 | -79.360636 | Roselle Desserts | 43.653447 | -79.362017 | Bakery |
| 1 | Regent Park, Harbourfront | 43.65426 | -79.360636 | Tandem Coffee | 43.653559 | -79.361809 | Coffee Shop |
| 2 | Regent Park, Harbourfront | 43.65426 | -79.360636 | Morning Glory Cafe | 43.653947 | -79.361149 | Breakfast Spot |
| 3 | Regent Park, Harbourfront | 43.65426 | -79.360636 | Cooper Koo Family YMCA | 43.653249 | -79.358008 | Distribution Center |
| 4 | Regent Park, Harbourfront | 43.65426 | -79.360636 | Body Blitz Spa East | 43.654735 | -79.359874 | Spa |


In [22]:
print(paris_venues.shape)
paris_venues.head()

(1312, 7)


| | Hood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Louvre | 48.861147 | 2.338028 | Cour Carrée du Louvre | 48.86036 | 2.338543 | Pedestrian Plaza |
| 1 | Louvre | 48.861147 | 2.338028 | Musée du Louvre | 48.860847 | 2.33644 | Art Museum |
| 2 | Louvre | 48.861147 | 2.338028 | La Vénus de Milo (Vénus de Milo) | 48.859943 | 2.337234 | Exhibit |
| 3 | Louvre | 48.861147 | 2.338028 | Place du Palais Royal | 48.862523 | 2.336688 | Plaza |
| 4 | Louvre | 48.861147 | 2.338028 | Palais Royal | 48.863236 | 2.337127 | Historic Site |


In [23]:
print(newyork_venues.shape)
newyork_venues.head()

(9892, 7)


| | Hood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Wakefield | 40.894705 | -73.847201 | Lollipops Gelato | 40.894123 | -73.845892 | Dessert Shop |
| 1 | Wakefield | 40.894705 | -73.847201 | Carvel Ice Cream | 40.890487 | -73.848568 | Ice Cream Shop |
| 2 | Wakefield | 40.894705 | -73.847201 | Walgreens | 40.896528 | -73.8447 | Pharmacy |
| 3 | Wakefield | 40.894705 | -73.847201 | Rite Aid | 40.896649 | -73.844846 | Pharmacy |
| 4 | Wakefield | 40.894705 | -73.847201 | Dunkin' | 40.890459 | -73.849089 | Donut Shop |



In [24]:
print('Neighborhood' in toronto_venues.columns.to_list())
print('Neighborhood' in paris_venues.columns.to_list())
print('Neighborhood' in newyork_venues.columns.to_list())

False
False
False
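
The check above looks at the DataFrames' columns. The collision that actually matters is between an identifier column and a venue category. Foursquare has a category literally named "Neighborhood" (it is the top venue of The Beaches further down), and one-hot encoding turns every category into a column. That is presumably why the neighborhood column is called `Hood` here. Below is a sketch of the more direct check on a made-up mini frame. The helper name `category_clashes` is hypothetical.

```python
import pandas as pd

def category_clashes(venues, id_cols=('Hood',)):
    """Return the identifier column names that also occur as venue categories;
    such names would collide once categories are one-hot encoded into columns."""
    categories = set(venues['Venue Category'].unique())
    return [c for c in id_cols if c in categories]

# made-up sample: one venue is categorised as "Neighborhood"
venues = pd.DataFrame({'Hood': ['The Beaches', 'The Beaches'],
                       'Venue Category': ['Neighborhood', 'Pub']})
print(category_clashes(venues, id_cols=('Hood', 'Neighborhood')))  # ['Neighborhood']
```

On the real data, `category_clashes(toronto_venues, id_cols=('Hood', 'Neighborhood'))` would show whether keeping the name `Neighborhood` would have caused a clash.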


In [25]:
print('There are {} unique categories in Toronto venues.'.format(len(toronto_venues['Venue Category'].unique())))
print('There are {} unique categories in Paris venues.'.format(len(paris_venues['Venue Category'].unique())))
print('There are {} unique categories in New York venues.'.format(len(newyork_venues['Venue Category'].unique())))

There are 239 unique categories in Toronto venues.
There are 208 unique categories in Paris venues.
There are 428 unique categories in New York venues.


In [26]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Hood'] = toronto_venues['Hood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])

toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

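| | Hood | Afghan Restaurant | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Service | Airport Terminal | American Restaurant | Antique Shop | ... | Toy / Game Store | Trail | Train Station | Vegetarian / Vegan Restaurant | Video Game Store | Vietnamese Restaurant | Wine Bar | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Regent Park, Harbourfront | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Regent Park, Harbourfront | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Regent Park, Harbourfront | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Regent Park, Harbourfront | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Regent Park, Harbourfront | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |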


In [27]:
# one hot encoding
paris_onehot = pd.get_dummies(paris_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
paris_onehot['Hood'] = paris_venues['Hood'] 

# move neighborhood column to the first column
fixed_columns = [paris_onehot.columns[-1]] + list(paris_onehot.columns[:-1])
paris_onehot = paris_onehot[fixed_columns]

paris_onehot.head()

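| | Hood | Afghan Restaurant | African Restaurant | Alsatian Restaurant | American Restaurant | Argentinian Restaurant | Art Gallery | Art Museum | Arts & Crafts Store | Asian Restaurant | ... | Toy / Game Store | Trail | Train Station | Turkish Restaurant | Udon Restaurant | Vegetarian / Vegan Restaurant | Vietnamese Restaurant | Wine Bar | Wine Shop | Women's Store |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Louvre | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Louvre | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Louvre | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Louvre | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Louvre | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |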


In [28]:
# one hot encoding
newyork_onehot = pd.get_dummies(newyork_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
newyork_onehot['Hood'] = newyork_venues['Hood'] 

# move neighborhood column to the first column
fixed_columns = [newyork_onehot.columns[-1]] + list(newyork_onehot.columns[:-1])
newyork_onehot = newyork_onehot[fixed_columns]

newyork_onehot.head()

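| | Hood | Accessories Store | Adult Boutique | Afghan Restaurant | African Restaurant | Airport Terminal | American Restaurant | Animal Shelter | Antique Shop | Arcade | ... | Warehouse Store | Waste Facility | Waterfront | Weight Loss Center | Whisky Bar | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Wakefield | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Wakefield | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Wakefield | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Wakefield | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Wakefield | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |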


In [29]:
toronto_grouped = toronto_onehot.groupby('Hood').mean().reset_index()
paris_grouped = paris_onehot.groupby('Hood').mean().reset_index()
newyork_grouped = newyork_onehot.groupby('Hood').mean().reset_index()
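
Each one-hot row contains a single 1. The per-neighborhood mean of a category column is therefore the fraction of that neighborhood's returned venues that fall into the category. For example, Berczy Park's 0.018519 for Vegetarian / Vegan Restaurant in the `toronto_grouped` output is consistent with 1 venue out of 54. The three near-identical one-hot cells and this grouping step could be folded into a single helper. Here is a sketch; the name `category_frequencies` and the toy data are our own.

```python
import pandas as pd

def category_frequencies(venues, hood_col='Hood'):
    """One-hot encode 'Venue Category' and average per neighborhood, giving the
    share of each category among that neighborhood's venues (hypothetical helper)."""
    # dtype=int keeps 0/1 integers (newer pandas would otherwise return booleans)
    onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
    onehot.insert(0, hood_col, venues[hood_col].values)
    return onehot.groupby(hood_col).mean().reset_index()

# made-up sample: three venues in A, one in B
venues = pd.DataFrame({'Hood': ['A', 'A', 'A', 'B'],
                       'Venue Category': ['Café', 'Café', 'Bar', 'Bar']})
print(category_frequencies(venues))
```

With this helper, `toronto_grouped = category_frequencies(toronto_venues)` would replace the one-hot cell and the grouping line for Toronto.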

In [30]:
toronto_grouped.head()

| | Hood | Afghan Restaurant | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Service | Airport Terminal | American Restaurant | Antique Shop | ... | Toy / Game Store | Trail | Train Station | Vegetarian / Vegan Restaurant | Video Game Store | Vietnamese Restaurant | Wine Bar | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Berczy Park | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.018519 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Brockton, Parkdale Village, Exhibition Place | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Business reply mail Processing Centre | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | CN Tower, King and Spadina, Railway Lands, Har... | 0.0 | 0.066667 | 0.066667 | 0.066667 | 0.066667 | 0.2 | 0.133333 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Central Bay Street | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.015385 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.015385 |


In [62]:
paris_grouped.head()

| | Hood | Afghan Restaurant | African Restaurant | Alsatian Restaurant | American Restaurant | Argentinian Restaurant | Art Gallery | Art Museum | Arts & Crafts Store | Asian Restaurant | ... | Toy / Game Store | Trail | Train Station | Turkish Restaurant | Udon Restaurant | Vegetarian / Vegan Restaurant | Vietnamese Restaurant | Wine Bar | Wine Shop | Women's Store |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Batignolles-Monceau | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.02381 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.02381 | 0.0 | 0.0 |
| 1 | Bourse | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.01 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.07 | 0.01 | 0.02 |
| 2 | Butte-Montmartre | 0.0 | 0.0 | 0.0 | 0.0 | 0.014085 | 0.014085 | 0.0 | 0.0 | 0.014085 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.014085 | 0.0 | 0.014085 | 0.0 | 0.0 |
| 3 | Buttes-Chaumont | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.027778 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.027778 | 0.0 | 0.0 |
| 4 | Entrepôt | 0.0 | 0.02 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.02 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.01 | 0.02 | 0.0 |


In [143]:
newyork_grouped.shape

(300, 429)

In [187]:
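def kmeans(df, kclusters = 5):
    """Cluster the rows of a grouped frequency table and return one label per row."""
    df_clustering = df.drop(columns='Hood')

    # run k-means clustering; n_init=10 pins the default used before scikit-learn 1.4
    kmeansclustering = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(df_clustering)

    # cluster labels generated for each row in the dataframe
    return kmeansclustering.labels_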
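
`KMeans` minimizes the within-cluster sum of squared distances between each row of the frequency table and its cluster centroid. It does so by alternating an assignment step and an update step (Lloyd's algorithm). The NumPy sketch below shows just that loop. As our own simplification, it uses the first k rows as starting centroids. scikit-learn additionally uses k-means++ seeding and, depending on `n_init`, several restarts, so its labels will generally differ. The function name `lloyd_kmeans` is hypothetical.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100):
    """Plain Lloyd's algorithm (hypothetical sketch): returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    # naive initialization: use the first k rows as starting centroids
    centroids = X[:k].copy()
    for _ in range(n_iter):
        # assignment step: nearest centroid by squared Euclidean distance
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# two obvious groups of 2-D points
X = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0]]
labels, centroids = lloyd_kmeans(X, k=2)
print(labels)  # [0 1 0 1 0]
```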

In [188]:
toronto_labels = kmeans(toronto_grouped)
paris_labels = kmeans(paris_grouped)
newyork_labels = kmeans(newyork_grouped)

In [189]:
toronto_labels

array([0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 1, 0, 1, 3, 1, 2, 0,
       1, 0, 0, 0, 2, 4, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1], dtype=int32)

In [190]:
paris_labels

array([2, 3, 4, 1, 3, 0, 3, 3, 2, 4, 1, 0, 1, 3, 0, 2, 3, 3, 3, 0],
      dtype=int32)

In [191]:
newyork_labels

array([0, 0, 4, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 4, 0, 0, 4, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 2, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0,
       0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       4, 4, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0,
       0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4,
       0, 0, 0, 4, 0, 0, 3, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 3, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 4, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0], dtype=int32)

In [192]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [193]:
def mcv(df_grouped, num_top_venues = 10):
    """Build a table with each neighborhood's num_top_venues most frequent venue categories."""
    indicators = ['st', 'nd', 'rd']

    # create columns according to number of top venues: 1st, 2nd, 3rd, then 4th, 5th, ...
    columns = ['Hood']
    for ind in np.arange(num_top_venues):
        suffix = indicators[ind] if ind < len(indicators) else 'th'
        columns.append('{}{} Most Common Venue'.format(ind+1, suffix))

    # create a new dataframe
    neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
    neighborhoods_venues_sorted['Hood'] = df_grouped['Hood']

    for ind in np.arange(df_grouped.shape[0]):
        neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(df_grouped.iloc[ind, :], num_top_venues)
    return neighborhoods_venues_sorted

In [194]:
def assemble(df, df_grouped, df_labels):
    df_mcv = mcv(df_grouped)

    # add clustering labels
    df_mcv.insert(0, 'Cluster Labels', df_labels)

    # join the cluster labels and top venues onto df, which already holds each neighborhood's coordinates
    df_merged = df.join(df_mcv.set_index('Hood'), on='Hood')

    return df_merged

In [195]:
toronto_merged = assemble(df_toronto, toronto_grouped, toronto_labels)
paris_merged = assemble(df_paris, paris_grouped, paris_labels)
newyork_merged = assemble(df_newyork, newyork_grouped, newyork_labels)

In [132]:
toronto_merged.head()

| | Postal Code | Borough | Hood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 | 0 | Coffee Shop | Pub | Bakery | Park | Breakfast Spot | Theater | Café | Health Food Store | Historic Site | Hotel |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 | 0 | Coffee Shop | College Cafeteria | Sushi Restaurant | Yoga Studio | Creperie | Japanese Restaurant | Diner | Mexican Restaurant | Smoothie Shop | Bar |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 | 0 | Clothing Store | Coffee Shop | Café | Cosmetics Shop | Restaurant | Middle Eastern Restaurant | Italian Restaurant | Japanese Restaurant | Bubble Tea Shop | Tea Room |
| 15 | M5C | Downtown Toronto | St. James Town | 43.651494 | -79.375418 | 0 | Coffee Shop | Café | Cocktail Bar | Gastropub | American Restaurant | Cosmetics Shop | Department Store | Moroccan Restaurant | Lingerie Store | Italian Restaurant |
| 19 | M4E | East Toronto | The Beaches | 43.676357 | -79.293031 | 0 | Neighborhood | Health Food Store | Pub | Trail | Eastern European Restaurant | Electronics Store | Donut Shop | Doner Restaurant | Dance Studio | Dog Run |


In [133]:
paris_merged.head()

| | Hood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Louvre | 48.861147 | 2.338028 | 0 | French Restaurant | Plaza | Hotel | Art Museum | Historic Site | Cosmetics Shop | Garden | Italian Restaurant | Boutique | Bakery |
| 1 | Bourse | 48.86863 | 2.341474 | 0 | French Restaurant | Hotel | Wine Bar | Cocktail Bar | Clothing Store | Bakery | Bistro | Creperie | Indie Movie Theater | Italian Restaurant |
| 2 | Temple | 48.8665 | 2.360708 | 0 | French Restaurant | Hotel | Wine Bar | Restaurant | Art Gallery | Bar | Bakery | Sandwich Place | Italian Restaurant | Vietnamese Restaurant |
| 3 | Hôtel-de-Ville | 48.856426 | 2.352528 | 0 | French Restaurant | Ice Cream Shop | Plaza | Art Gallery | Clothing Store | Park | Cosmetics Shop | Wine Bar | Hotel | Dessert Shop |
| 4 | Panthéon | 48.846191 | 2.346079 | 0 | French Restaurant | Hotel | Bar | Pub | Italian Restaurant | Bakery | Indie Movie Theater | Ice Cream Shop | Creperie | Café |


In [139]:
newyork_merged.head()

| | Borough | Hood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 | 0.0 | Pharmacy | Ice Cream Shop | Donut Shop | Sandwich Place | Gas Station | Dessert Shop | Deli / Bodega | Laundromat | Fish & Chips Shop | Financial or Legal Service |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 | 0.0 | Fast Food Restaurant | Bus Station | Discount Store | Baseball Field | Park | Grocery Store | Bagel Shop | Pharmacy | Mattress Store | Pizza Place |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 | 0.0 | Bus Station | Caribbean Restaurant | Deli / Bodega | Diner | Donut Shop | Seafood Restaurant | Pizza Place | Platform | Bus Stop | Cosmetics Shop |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 | 0.0 | Plaza | Bus Station | Field | English Restaurant | Entertainment Service | Ethiopian Restaurant | Event Service | Event Space | Exhibit | Factory |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 | 0.0 | Park | Bus Station | Bank | Plaza | Gym | Home Service | Baseball Field | Food Truck | Fish Market | Factory |


In [196]:
toronto_merged.isnull().values.any()

False

In [197]:
paris_merged.isnull().values.any()

False

In [198]:
newyork_merged.isnull().values.any()

True

In [199]:
newyork_merged.dropna(inplace = True)
newyork_merged.isnull().values.any()

False
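
`assemble` uses `DataFrame.join`, which is a left join by default. A neighborhood with no row in `newyork_grouped` therefore keeps NaN in every cluster and venue column, and these are the rows `dropna` removes. This happens, for example, when Foursquare returned no venues for the neighborhood. Grouping by name also merges neighborhoods that share a name, which is one possible reason `newyork_grouped` has 300 rows for 306 neighborhoods. The sketch below lists both kinds of rows before anything is dropped. The helper `unmatched_and_duplicated` and the toy frames are made up.

```python
import pandas as pd

def unmatched_and_duplicated(df, df_grouped, key='Hood'):
    """Return (rows of df whose key has no row in df_grouped,
    rows of df whose key occurs more than once) (hypothetical helper)."""
    unmatched = df[~df[key].isin(df_grouped[key])]
    duplicated = df[df[key].duplicated(keep=False)]
    return unmatched, duplicated

# made-up frames: 'Springfield' exists in two boroughs, 'Nowhere' has no venues
df = pd.DataFrame({'Borough': ['Queens', 'Brooklyn', 'Bronx'],
                   'Hood': ['Springfield', 'Springfield', 'Nowhere']})
df_grouped = pd.DataFrame({'Hood': ['Springfield'], 'Coffee Shop': [0.5]})
unmatched, duplicated = unmatched_and_duplicated(df, df_grouped)
print(unmatched)
print(duplicated)
```

Running `unmatched_and_duplicated(df_newyork, newyork_grouped)` before `dropna` would show which New York neighborhoods are affected. Grouping on `Borough` and `Hood` together would keep same-named neighborhoods apart.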

In [200]:
from IPython.display import display

def MapAndMarkers(address, kclusters, df):
    """Draw each neighborhood of df on a folium map, colored by its cluster label."""
    geolocator = Nominatim(user_agent="ny_explorer", timeout=3)
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude

    # create map
    map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

    # set color scheme for the clusters: one evenly spaced rainbow color per label
    colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
    rainbow = [colors.rgb2hex(i) for i in colors_array]

    # add markers to the map; label k uses color rainbow[k]
    for lat, lon, poi, cluster in zip(df['Latitude'], df['Longitude'], df['Hood'], df['Cluster Labels'].astype(int)):
        label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=rainbow[cluster],
            fill=True,
            fill_color=rainbow[cluster],
            fill_opacity=0.7).add_to(map_clusters)

    display(map_clusters)

In [201]:
MapAndMarkers('Toronto, Canada', 5, toronto_merged)

In [204]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

| | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Downtown Toronto | 0 | Coffee Shop | Pub | Bakery | Park | Breakfast Spot | Theater | Café | Health Food Store | Historic Site | Hotel |
| 4 | Downtown Toronto | 0 | Coffee Shop | College Cafeteria | Sushi Restaurant | Yoga Studio | Creperie | Japanese Restaurant | Diner | Mexican Restaurant | Smoothie Shop | Bar |
| 9 | Downtown Toronto | 0 | Clothing Store | Coffee Shop | Café | Cosmetics Shop | Restaurant | Middle Eastern Restaurant | Italian Restaurant | Japanese Restaurant | Bubble Tea Shop | Tea Room |
| 15 | Downtown Toronto | 0 | Coffee Shop | Café | Cocktail Bar | Gastropub | American Restaurant | Cosmetics Shop | Department Store | Moroccan Restaurant | Lingerie Store | Italian Restaurant |
| 20 | Downtown Toronto | 0 | Coffee Shop | Cocktail Bar | Beer Bar | Bakery | Cheese Shop | Seafood Restaurant | Café | Restaurant | Japanese Restaurant | Hotel |
| 24 | Downtown Toronto | 0 | Coffee Shop | Sandwich Place | Italian Restaurant | Café | Salad Place | Bar | Bubble Tea Shop | Burger Joint | Ice Cream Shop | Japanese Restaurant |
| 25 | Downtown Toronto | 0 | Grocery Store | Café | Park | Nightclub | Coffee Shop | Italian Restaurant | Baby Store | Diner | Athletics & Sports | Candy Store |
| 30 | Downtown Toronto | 0 | Coffee Shop | Café | Restaurant | Hotel | Deli / Bodega | Gym | Clothing Store | Thai Restaurant | Salad Place | Cosmetics Shop |
| 36 | Downtown Toronto | 0 | Coffee Shop | Aquarium | Café | Hotel | Italian Restaurant | Scenic Lookout | Sporting Goods Shop | Fried Chicken Joint | Brewery | Restaurant |
| 42 | Downtown Toronto | 0 | Coffee Shop | Hotel | Café | Restaurant | Salad Place | Italian Restaurant | American Restaurant | Japanese Restaurant | Seafood Restaurant | Deli / Bodega |


The largest cluster of Toronto neighborhoods (cluster 0, which holds 24 of the 39 labels in `toronto_labels`) has coffee shops, cafés, restaurants, and bars as its most popular venues. On the map, this group sits at the very center of the city, which is presumably where most of the activity is concentrated. Banks showing up among the popular spots was initially surprising, but less so given Toronto's role as Canada's financial hub.
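
One way to back this kind of reading with numbers is to count how often each category appears among a cluster's top venues. Here is a sketch that assumes the merged layout used here: a `Cluster Labels` column plus the `1st` to `10th Most Common Venue` columns. The helper name `cluster_profile` and the tiny frame are made up.

```python
import pandas as pd

def cluster_profile(merged, label, top=5):
    """Count how often each venue category appears among the
    'Most Common Venue' columns of one cluster (hypothetical helper)."""
    venue_cols = [c for c in merged.columns if c.endswith('Most Common Venue')]
    in_cluster = merged.loc[merged['Cluster Labels'] == label, venue_cols]
    # flatten all ranked venues of the cluster into one Series and count them
    return pd.Series(in_cluster.to_numpy().ravel()).value_counts().head(top)

# made-up two-neighborhood example with only three ranked venues each
merged = pd.DataFrame({
    'Cluster Labels': [0, 0],
    '1st Most Common Venue': ['Coffee Shop', 'Coffee Shop'],
    '2nd Most Common Venue': ['Café', 'Hotel'],
    '3rd Most Common Venue': ['Hotel', 'Bank'],
})
print(cluster_profile(merged, 0))
```

Calling `cluster_profile(toronto_merged, 0, top=10)` would show whether banks really rank among cluster 0's frequent top venues.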

In [205]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

| | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 19 | East Toronto | 1 | Neighborhood | Health Food Store | Pub | Trail | Eastern European Restaurant | Electronics Store | Donut Shop | Doner Restaurant | Dance Studio | Dog Run |
| 31 | West Toronto | 1 | Pharmacy | Bakery | Art Gallery | Bank | Supermarket | Middle Eastern Restaurant | Brewery | Café | Portuguese Restaurant | Pool |
| 37 | West Toronto | 1 | Bar | Restaurant | Asian Restaurant | Vegetarian / Vegan Restaurant | Men's Store | Café | Yoga Studio | Beer Store | New American Restaurant | Boutique |
| 41 | East Toronto | 1 | Greek Restaurant | Italian Restaurant | Coffee Shop | Ice Cream Shop | Restaurant | Furniture / Home Store | Liquor Store | Indian Restaurant | Spa | Juice Bar |
| 69 | West Toronto | 1 | Café | Mexican Restaurant | Thai Restaurant | Fast Food Restaurant | Fried Chicken Joint | Convenience Store | Music Venue | Diner | Cajun / Creole Restaurant | Restaurant |
| 75 | West Toronto | 1 | Breakfast Spot | Gift Shop | Movie Theater | Cuban Restaurant | Eastern European Restaurant | Dog Run | Bar | Italian Restaurant | Dessert Shop | Restaurant |
| 80 | Downtown Toronto | 1 | Café | Bar | Italian Restaurant | Japanese Restaurant | Bookstore | Restaurant | Bakery | Yoga Studio | Pub | Beer Bar |
| 84 | Downtown Toronto | 1 | Café | Bakery | Vietnamese Restaurant | Coffee Shop | Mexican Restaurant | Bar | Dessert Shop | Gaming Cafe | Vegetarian / Vegan Restaurant | Donut Shop |
| 87 | Downtown Toronto | 1 | Airport Service | Airport Terminal | Plane | Rental Car Location | Boat or Ferry | Sculpture Garden | Harbor / Marina | Boutique | Airport Lounge | Airport Gate |
| 100 | East Toronto | 1 | Gym / Fitness Center | Auto Workshop | Garden Center | Garden | Fast Food Restaurant | Farmers Market | Light Rail Station | Comic Shop | Park | Recording Studio |


The second biggest group of neighborhoods, as we can see, is located somewhat further from the center than the first. The occurrence of pools, supermarkets, and trails, however, hints at a more residential, everyday-life character.
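To see which neighborhoods carry the pool, supermarket, and trail signal, a small lookup over the cluster-1 rows is enough. This is a minimal sketch: `neighborhoods_with` is a hypothetical helper, and the rows are the cluster-1 table pasted as text rather than read from `toronto_merged`.

```python
import csv

# Toronto cluster-1 rows, copied verbatim from the table above.
TORONTO_CLUSTER_1 = """
19,East Toronto,1,Neighborhood,Health Food Store,Pub,Trail,Eastern European Restaurant,Electronics Store,Donut Shop,Doner Restaurant,Dance Studio,Dog Run
31,West Toronto,1,Pharmacy,Bakery,Art Gallery,Bank,Supermarket,Middle Eastern Restaurant,Brewery,Café,Portuguese Restaurant,Pool
37,West Toronto,1,Bar,Restaurant,Asian Restaurant,Vegetarian / Vegan Restaurant,Men's Store,Café,Yoga Studio,Beer Store,New American Restaurant,Boutique
41,East Toronto,1,Greek Restaurant,Italian Restaurant,Coffee Shop,Ice Cream Shop,Restaurant,Furniture / Home Store,Liquor Store,Indian Restaurant,Spa,Juice Bar
69,West Toronto,1,Café,Mexican Restaurant,Thai Restaurant,Fast Food Restaurant,Fried Chicken Joint,Convenience Store,Music Venue,Diner,Cajun / Creole Restaurant,Restaurant
75,West Toronto,1,Breakfast Spot,Gift Shop,Movie Theater,Cuban Restaurant,Eastern European Restaurant,Dog Run,Bar,Italian Restaurant,Dessert Shop,Restaurant
80,Downtown Toronto,1,Café,Bar,Italian Restaurant,Japanese Restaurant,Bookstore,Restaurant,Bakery,Yoga Studio,Pub,Beer Bar
84,Downtown Toronto,1,Café,Bakery,Vietnamese Restaurant,Coffee Shop,Mexican Restaurant,Bar,Dessert Shop,Gaming Cafe,Vegetarian / Vegan Restaurant,Donut Shop
87,Downtown Toronto,1,Airport Service,Airport Terminal,Plane,Rental Car Location,Boat or Ferry,Sculpture Garden,Harbor / Marina,Boutique,Airport Lounge,Airport Gate
100,East Toronto,1,Gym / Fitness Center,Auto Workshop,Garden Center,Garden,Fast Food Restaurant,Farmers Market,Light Rail Station,Comic Shop,Park,Recording Studio
"""


def neighborhoods_with(csv_text, categories, first_venue_col=3):
    """Map each row index to the requested categories found in its top-10 venues."""
    wanted = set(categories)
    matches = {}
    for row in csv.reader(csv_text.strip().splitlines()):
        found = [venue for venue in row[first_venue_col:] if venue in wanted]
        if found:
            matches[int(row[0])] = found  # keep the order in which they rank
    return matches


print(neighborhoods_with(TORONTO_CLUSTER_1, ["Pool", "Supermarket", "Trail"]))
```

For these rows, the matches come from row 19 (Trail) and row 31 (Supermarket and Pool).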

In [206]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
68,Central Toronto,2,Trail,Jewelry Store,Park,Sushi Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Yoga Studio
83,Central Toronto,2,Restaurant,Park,Trail,Deli / Bodega,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Donut Shop,Doner Restaurant
91,Downtown Toronto,2,Park,Trail,Playground,Doner Restaurant,Dessert Shop,Diner,Discount Store,Distribution Center,Dog Run,Yoga Studio


The third group is much smaller than the first two. Its trails, parks, and playgrounds suggest an even more residential character, with more public facilities than the second group. Geographically, these neighborhoods are also located further from the center.
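The trails-and-parks pattern can be made explicit by intersecting the top-3 venue sets of the cluster's neighborhoods. A minimal sketch, assuming the cluster-2 rows are pasted as text; `shared_top_venues` is a hypothetical helper.

```python
import csv

# Toronto cluster-2 rows, copied verbatim from the table above.
TORONTO_CLUSTER_2 = """
68,Central Toronto,2,Trail,Jewelry Store,Park,Sushi Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Yoga Studio
83,Central Toronto,2,Restaurant,Park,Trail,Deli / Bodega,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Donut Shop,Doner Restaurant
91,Downtown Toronto,2,Park,Trail,Playground,Doner Restaurant,Dessert Shop,Diner,Discount Store,Distribution Center,Dog Run,Yoga Studio
"""


def shared_top_venues(csv_text, top_n=3, first_venue_col=3):
    """Return the categories that appear in the top_n venues of every row."""
    rows = list(csv.reader(csv_text.strip().splitlines()))
    shared = set(rows[0][first_venue_col:first_venue_col + top_n])
    for row in rows[1:]:
        shared &= set(row[first_venue_col:first_venue_col + top_n])
    return shared


print(sorted(shared_top_venues(TORONTO_CLUSTER_2)))
```

Park and Trail are the only categories that rank in the top three of all three neighborhoods.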

In [207]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
61,Central Toronto,3,Park,Bus Line,Swim School,Yoga Studio,Department Store,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Donut Shop


In [208]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
62,Central Toronto,4,Home Service,Garden,Yoga Studio,Department Store,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Donut Shop


The last two groups consist of one neighborhood each. Note that the last five neighborhoods (clusters 2 to 4) share almost the same venue categories, so they are probably residential areas too.
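A likely reason why sparse neighborhoods end up with near-identical lists is how the top-10 columns are built. Assuming they come from sorting each neighborhood's row of venue frequencies in descending order and keeping the first ten (the notebook's own function is not shown in this section), a neighborhood with fewer than ten distinct venue types gets its remaining slots filled with zero-frequency categories, and the order among those ties depends on the sort used. The sketch below illustrates the effect on made-up frequencies; `top_venues` and its `drop_zeros` switch are hypothetical.

```python
def top_venues(freqs, num_top=10, drop_zeros=False):
    """Return up to num_top categories of one neighborhood, most frequent first.

    freqs maps every venue category of the city to this neighborhood's
    frequency, zeros included, like one row of a one-hot/grouped table.
    Python's sort is stable (also with reverse=True), so tied categories
    keep their input order; other sort algorithms may order ties differently.
    """
    ranked = sorted(freqs.items(), key=lambda item: item[1], reverse=True)
    if drop_zeros:
        ranked = [(category, f) for category, f in ranked if f > 0]
    return [category for category, _ in ranked[:num_top]]


# Made-up frequencies for a sparse neighborhood with only three venue types;
# the zero columns exist because other neighborhoods of the city have them.
sparse = {
    "Donut Shop": 0.0,
    "Event Space": 0.0,
    "Exhibit": 0.0,
    "Factory": 0.0,
    "Park": 0.5,
    "Playground": 0.25,
    "Trail": 0.25,
    "Yoga Studio": 0.0,
}

print(top_venues(sparse, num_top=5))
# -> ['Park', 'Playground', 'Trail', 'Donut Shop', 'Event Space']
print(top_venues(sparse, num_top=5, drop_zeros=True))
# -> ['Park', 'Playground', 'Trail']
```

Dropping zeros shortens the list instead of padding it, which keeps filler slots from making unrelated neighborhoods look alike.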

In [202]:
MapAndMarkers('Paris, France', 5, paris_merged)

In [211]:
paris_merged.loc[paris_merged['Cluster Labels'] == 0, paris_merged.columns[[0] + list(range(3, paris_merged.shape[1]))]]

Unnamed: 0,Hood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Élysée,0,Hotel,Sandwich Place,Bar,French Restaurant,Hotel Bar,Train Station,Plaza,Coffee Shop,Bakery,Shopping Mall
8,Opéra,0,Hotel,Japanese Restaurant,French Restaurant,Taiwanese Restaurant,Theater,Italian Restaurant,Sandwich Place,Bookstore,Chocolate Shop,Jewelry Store
12,Gobelins,0,Hotel,Thai Restaurant,French Restaurant,Italian Restaurant,Indian Restaurant,Bar,Vietnamese Restaurant,Bakery,Gaming Cafe,Pub
15,Passy,0,French Restaurant,Hotel,Italian Restaurant,Japanese Restaurant,Thai Restaurant,Plaza,Bakery,Chinese Restaurant,Supermarket,Museum


The first group corresponds to the red markers on the map. Again, it consists largely of restaurants and bars. Hotels rank first or second in popularity in all four of its neighborhoods, indicating that we are indeed looking at a tourist city.
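The statement about hotels can be checked by looking up the rank of "Hotel" in each neighborhood's top-10 list. A minimal sketch over the Paris cluster-0 rows, pasted as text; `category_rank` is a hypothetical helper that returns `None` when the category is absent.

```python
import csv

# Paris cluster-0 rows, copied verbatim: index, Hood, Cluster Labels, 10 venues.
PARIS_CLUSTER_0 = """
7,Élysée,0,Hotel,Sandwich Place,Bar,French Restaurant,Hotel Bar,Train Station,Plaza,Coffee Shop,Bakery,Shopping Mall
8,Opéra,0,Hotel,Japanese Restaurant,French Restaurant,Taiwanese Restaurant,Theater,Italian Restaurant,Sandwich Place,Bookstore,Chocolate Shop,Jewelry Store
12,Gobelins,0,Hotel,Thai Restaurant,French Restaurant,Italian Restaurant,Indian Restaurant,Bar,Vietnamese Restaurant,Bakery,Gaming Cafe,Pub
15,Passy,0,French Restaurant,Hotel,Italian Restaurant,Japanese Restaurant,Thai Restaurant,Plaza,Bakery,Chinese Restaurant,Supermarket,Museum
"""


def category_rank(csv_text, category, first_venue_col=3):
    """Map each neighborhood to the 1-based rank of `category` (None if absent)."""
    ranks = {}
    for row in csv.reader(csv_text.strip().splitlines()):
        venues = row[first_venue_col:]
        # Exact match, so "Hotel Bar" does not count as "Hotel".
        ranks[row[1]] = venues.index(category) + 1 if category in venues else None
    return ranks


print(category_rank(PARIS_CLUSTER_0, "Hotel"))
```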

In [212]:
paris_merged.loc[paris_merged['Cluster Labels'] == 1, paris_merged.columns[[0] + list(range(3, paris_merged.shape[1]))]]

Unnamed: 0,Hood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Palais-Bourbon,1,French Restaurant,Plaza,Hotel,Pedestrian Plaza,Food Truck,Italian Restaurant,Beer Garden,Cultural Center,Restaurant,Smoke Shop
13,Observatoire,1,French Restaurant,Hotel,Bakery,EV Charging Station,Café,Fast Food Restaurant,Bus Stop,Modern European Restaurant,Food & Drink Shop,Pizza Place
18,Buttes-Chaumont,1,French Restaurant,Restaurant,Park,Bar,Pool,Italian Restaurant,Coffee Shop,Gas Station,Café,Soup Place


In [213]:
paris_merged.loc[paris_merged['Cluster Labels'] == 2, paris_merged.columns[[0] + list(range(3, paris_merged.shape[1]))]]

Unnamed: 0,Hood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Luxembourg,2,Italian Restaurant,Wine Bar,Plaza,French Restaurant,Bistro,Café,Cocktail Bar,Clothing Store,Chocolate Shop,Pastry Shop
10,Popincourt,2,French Restaurant,Bar,Café,Cocktail Bar,Pastry Shop,Bistro,Italian Restaurant,Restaurant,Coffee Shop,Japanese Restaurant
16,Batignolles-Monceau,2,French Restaurant,Italian Restaurant,Farmers Market,Restaurant,Lebanese Restaurant,Creperie,Pastry Shop,Deli / Bodega,Café,Coffee Shop


In [214]:
paris_merged.loc[paris_merged['Cluster Labels'] == 3, paris_merged.columns[[0] + list(range(3, paris_merged.shape[1]))]]

Unnamed: 0,Hood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Louvre,3,French Restaurant,Plaza,Hotel,Art Museum,Historic Site,Cosmetics Shop,Garden,Italian Restaurant,Boutique,Bakery
1,Bourse,3,French Restaurant,Hotel,Wine Bar,Cocktail Bar,Clothing Store,Bakery,Bistro,Creperie,Indie Movie Theater,Italian Restaurant
2,Temple,3,French Restaurant,Hotel,Wine Bar,Restaurant,Art Gallery,Bar,Bakery,Sandwich Place,Italian Restaurant,Vietnamese Restaurant
3,Hôtel-de-Ville,3,French Restaurant,Ice Cream Shop,Plaza,Art Gallery,Clothing Store,Park,Cosmetics Shop,Wine Bar,Hotel,Dessert Shop
4,Panthéon,3,French Restaurant,Hotel,Bar,Pub,Italian Restaurant,Bakery,Indie Movie Theater,Ice Cream Shop,Creperie,Café
9,Entrepôt,3,French Restaurant,Hotel,Coffee Shop,Bistro,Café,Pizza Place,Restaurant,Indian Restaurant,Japanese Restaurant,Bar
11,Reuilly,3,Hotel,French Restaurant,Supermarket,Bistro,Pizza Place,Bakery,Sushi Restaurant,Farmers Market,Beer Bar,Bookstore
14,Vaugirard,3,French Restaurant,Italian Restaurant,Hotel,Supermarket,Coffee Shop,Bar,Lebanese Restaurant,Korean Restaurant,Japanese Restaurant,Spanish Restaurant


In [215]:
paris_merged.loc[paris_merged['Cluster Labels'] == 4, paris_merged.columns[[0] + list(range(3, paris_merged.shape[1]))]]

Unnamed: 0,Hood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,Butte-Montmartre,4,Bar,French Restaurant,Middle Eastern Restaurant,Bistro,Italian Restaurant,Pizza Place,Plaza,Convenience Store,Sandwich Place,Coffee Shop
19,Ménilmontant,4,Bar,Pizza Place,Cocktail Bar,French Restaurant,Brewery,Beer Bar,Hotel,Burger Joint,Italian Restaurant,Restaurant


The 2nd to 5th groups are quite similar in their categories: many popular restaurants, among which French ones are the most prominent. The biggest group, which is located quite close to the center, has hotels as its first or second most popular venue in five of its eight neighborhoods, which confirms the strong presence of tourism in Paris. Overall, neighborhoods in Paris are less diverse in their functionalities than those in Toronto.
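How consistently French restaurants and hotels top the biggest Paris group (cluster 3) can be checked by testing, row by row, whether a category falls within the top k venues. A minimal sketch using the hypothetical helper `count_in_top` on the cluster-3 rows pasted as text:

```python
import csv

# Paris cluster-3 rows, copied verbatim from the table above.
PARIS_CLUSTER_3 = """
0,Louvre,3,French Restaurant,Plaza,Hotel,Art Museum,Historic Site,Cosmetics Shop,Garden,Italian Restaurant,Boutique,Bakery
1,Bourse,3,French Restaurant,Hotel,Wine Bar,Cocktail Bar,Clothing Store,Bakery,Bistro,Creperie,Indie Movie Theater,Italian Restaurant
2,Temple,3,French Restaurant,Hotel,Wine Bar,Restaurant,Art Gallery,Bar,Bakery,Sandwich Place,Italian Restaurant,Vietnamese Restaurant
3,Hôtel-de-Ville,3,French Restaurant,Ice Cream Shop,Plaza,Art Gallery,Clothing Store,Park,Cosmetics Shop,Wine Bar,Hotel,Dessert Shop
4,Panthéon,3,French Restaurant,Hotel,Bar,Pub,Italian Restaurant,Bakery,Indie Movie Theater,Ice Cream Shop,Creperie,Café
9,Entrepôt,3,French Restaurant,Hotel,Coffee Shop,Bistro,Café,Pizza Place,Restaurant,Indian Restaurant,Japanese Restaurant,Bar
11,Reuilly,3,Hotel,French Restaurant,Supermarket,Bistro,Pizza Place,Bakery,Sushi Restaurant,Farmers Market,Beer Bar,Bookstore
14,Vaugirard,3,French Restaurant,Italian Restaurant,Hotel,Supermarket,Coffee Shop,Bar,Lebanese Restaurant,Korean Restaurant,Japanese Restaurant,Spanish Restaurant
"""


def count_in_top(csv_text, category, top_n, first_venue_col=3):
    """Return (rows with `category` in their top_n venues, total rows)."""
    rows = list(csv.reader(csv_text.strip().splitlines()))
    hits = sum(category in row[first_venue_col:first_venue_col + top_n] for row in rows)
    return hits, len(rows)


print("French Restaurant ranked 1st:", count_in_top(PARIS_CLUSTER_3, "French Restaurant", 1))
print("Hotel in top 2:", count_in_top(PARIS_CLUSTER_3, "Hotel", 2))
```

On these rows French Restaurant is first in seven of the eight neighborhoods (Reuilly is the exception), and Hotel is in the top two in five of them.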

In [203]:
MapAndMarkers('New York, NY', 5, newyork_merged)

In [219]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 0, newyork_merged.columns[[1] + list(range(5, newyork_merged.shape[1]))]]

Unnamed: 0,Hood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Wakefield,Pharmacy,Ice Cream Shop,Donut Shop,Sandwich Place,Gas Station,Dessert Shop,Deli / Bodega,Laundromat,Fish & Chips Shop,Financial or Legal Service
1,Co-op City,Fast Food Restaurant,Bus Station,Discount Store,Baseball Field,Park,Grocery Store,Bagel Shop,Pharmacy,Mattress Store,Pizza Place
2,Eastchester,Bus Station,Caribbean Restaurant,Deli / Bodega,Diner,Donut Shop,Seafood Restaurant,Pizza Place,Platform,Bus Stop,Cosmetics Shop
3,Fieldston,Plaza,Bus Station,Field,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Factory
4,Riverdale,Park,Bus Station,Bank,Plaza,Gym,Home Service,Baseball Field,Food Truck,Fish Market,Factory
...,...,...,...,...,...,...,...,...,...,...,...
299,Kingsbridge Heights,Pizza Place,Bus Station,Spanish Restaurant,Food Truck,Coffee Shop,Park,Pharmacy,Salon / Barbershop,Sandwich Place,Grocery Store
300,Erasmus,Caribbean Restaurant,Yoga Studio,Grocery Store,Mobile Phone Shop,Furniture / Home Store,Supermarket,Music Venue,Bus Line,Chinese Restaurant,Bank
301,Hudson Yards,Hotel,Gym / Fitness Center,Italian Restaurant,American Restaurant,Dog Run,Restaurant,Coffee Shop,Gym,Café,Park
302,Hammels,Beach,Neighborhood,Deli / Bodega,Dog Run,Diner,Fast Food Restaurant,Bus Station,Bus Stop,Gym / Fitness Center,Shoe Store


The biggest group in NYC, as we can see, does not have as many restaurants and bars in the top places as Toronto or Paris. Its functionalities seem more diverse: every neighborhood appears to have a bit of everything. My own experience in NYC supports this observation; each neighborhood functions like a city of its own.
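A rough way to compare how often food and drink venues take first place in each city's biggest group is to classify every 1st Most Common Venue with a keyword rule. This sketch assumes a hand-picked keyword list (`FOOD_DRINK_WORDS`, an assumption rather than any Foursquare taxonomy). It uses the last eight rows of Toronto's cluster-0 table, all eight rows of Paris's cluster 3, and, because pandas truncated the NYC cluster-0 table, only its nine visible rows, so the numbers are indicative at best.

```python
from fractions import Fraction

# 1st Most Common Venue of the rows printed for each city's biggest group.
FIRST_VENUES = {
    "Toronto (cluster 0)": ["Clothing Store", "Coffee Shop", "Coffee Shop", "Coffee Shop",
                            "Grocery Store", "Coffee Shop", "Coffee Shop", "Coffee Shop"],
    "Paris (cluster 3)": ["French Restaurant"] * 6 + ["Hotel", "French Restaurant"],
    "New York (cluster 0)": ["Pharmacy", "Fast Food Restaurant", "Bus Station", "Plaza", "Park",
                             "Pizza Place", "Caribbean Restaurant", "Hotel", "Beach"],
}

# Assumed keyword rule: a category counts as food/drink if any of its words is listed.
FOOD_DRINK_WORDS = {"Restaurant", "Café", "Coffee", "Bar", "Pub", "Pizza", "Bakery"}


def is_food_drink(category):
    """Whole-word match, so e.g. 'Salon / Barbershop' is not mistaken for a bar."""
    return any(word in FOOD_DRINK_WORDS for word in category.replace("/", " ").split())


def food_drink_share(categories):
    """Exact fraction of the given categories classified as food/drink."""
    return Fraction(sum(map(is_food_drink, categories)), len(categories))


for city, venues in FIRST_VENUES.items():
    share = food_drink_share(venues)
    print(f"{city}: {share} ({float(share):.0%})")
```

On these rows the shares come out at 3/4 for Toronto, 7/8 for Paris and 1/3 for New York, in line with the observation above.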

In [220]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 1, newyork_merged.columns[[1] + list(range(5, newyork_merged.shape[1]))]]

Unnamed: 0,Hood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
183,Jamaica Estates,Indian Restaurant,Dog Run,Yoga Studio,Fast Food Restaurant,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space,Exhibit
202,Grymes Hill,Dog Run,Yoga Studio,Fast Food Restaurant,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Factory


In [221]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 2, newyork_merged.columns[[1] + list(range(5, newyork_merged.shape[1]))]]

Unnamed: 0,Hood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
76,Mill Island,Pool,Yoga Studio,Fast Food Restaurant,Empanada Restaurant,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space,Exhibit
238,Butler Manor,Baseball Field,Pool,Convenience Store,Yoga Studio,Fast Food Restaurant,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space


In [222]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 3, newyork_merged.columns[[1] + list(range(5, newyork_merged.shape[1]))]]

Unnamed: 0,Hood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
27,Clason Point,Park,Grocery Store,Boat or Ferry,Pool,Bus Stop,South American Restaurant,Yoga Studio,Farm,Entertainment Service,Ethiopian Restaurant
192,Somerville,Park,Yoga Studio,Electronics Store,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Factory
203,Todt Hill,Park,Yoga Studio,Electronics Store,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Factory
303,Bayswater,Playground,Park,Yoga Studio,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Factory


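The three small groups (clusters 1 to 3) look quite similar from what we can tell: parks and yoga studios are among the most popular venues. The last few slots are filled by categories such as Exhibit, Event Space, and Factory; since the same run appears in alphabetical order across many unrelated neighborhoods, these are most likely zero-count ties padding out the top-10 lists rather than genuinely popular venues. Either way, these neighborhoods are clearly more spacious or less busy than those in the first group.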

In [223]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 4, newyork_merged.columns[[1] + list(range(5, newyork_merged.shape[1]))]]

Unnamed: 0,Hood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
77,Manhattan Beach,Café,Bus Stop,Harbor / Marina,Ice Cream Shop,Beach,Sandwich Place,Playground,Pizza Place,Food,Fish & Chips Shop
89,Ocean Hill,Deli / Bodega,Bus Stop,Food,Supermarket,Grocery Store,Donut Shop,Fried Chicken Joint,Southern / Soul Food Restaurant,Mexican Restaurant,Check Cashing Service
150,Whitestone,Bubble Tea Shop,Dance Studio,Deli / Bodega,Candy Store,Yoga Studio,Field,Ethiopian Restaurant,Event Service,Event Space,Exhibit
172,Breezy Point,Trail,Beach,Bus Stop,Monument / Landmark,Yoga Studio,Field,Ethiopian Restaurant,Event Service,Event Space,Exhibit
193,Brookville,Deli / Bodega,Yoga Studio,Field,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Factory
198,New Brighton,Bus Stop,Park,Deli / Bodega,Laundromat,Bowling Alley,Playground,Discount Store,Yoga Studio,Farm,Farmers Market
204,South Beach,Pier,Beach,Deli / Bodega,Bus Stop,Athletics & Sports,Yoga Studio,Fast Food Restaurant,Ethiopian Restaurant,Event Service,Event Space
205,Port Richmond,Rental Car Location,Bus Stop,Donut Shop,Bar,Food & Drink Shop,Farm,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service
206,Mariner's Harbor,Italian Restaurant,Deli / Bodega,Bus Stop,Field,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Factory
212,Oakwood,Bar,Bus Stop,Lawyer,Fast Food Restaurant,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space,Exhibit


The second biggest group lies somewhere in between the biggest group and the smaller ones. We see more restaurants and food shops such as delis than in the small groups, but also plenty of Exhibit and Event Service entries, unlike in the biggest group.

In conclusion, from what we have found so far, New York City has neighborhoods that are more uniform and complete in their functionalities, whereas Toronto and Paris both have more restaurants among their popular venues. From this preliminary study, we conclude that Toronto probably has more in common with Paris than with New York City. However, the popularity of French restaurants and hotels in Paris is unique: no cuisine is nearly as dominant in the other two cities. This popularity points to the important role of tourism in Paris.