# Capstone Project - The Battle of the Neighborhoods
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
1. [Introduction: Business Problem](#introduction)
2. [Data](#data)
3. [Methodology](#methodology)
4. [Results](#results)
5. [Discussion and Conclusion](#conclusion)



## 1. Introduction: Business Problem <a name="introduction"></a>

The objective of this project is to compare the neighbourhoods of two major cities: **London, UK** and **Toronto, Canada**. I focus on downtown Toronto and western central London (the WC postcode area). By exploring the most common venues in each neighbourhood, I try to identify **differences between a European and a North American city**, which may reflect *different city designs, lifestyles and cultures.*

This project might be interesting for:
* Students who want to study abroad in either North America or Europe
* Adults who are considering working abroad
* Travellers who are looking for their next destinations
* Researchers in the field of urban studies/human geography

## 2. Data <a name="data"></a>

I will use the following datasets to collect the information needed for this project.


* The postal codes of western central London will be obtained from https://en.wikipedia.org/wiki/WC_postcode_area.
* The postal codes of downtown Toronto will be obtained from https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M.
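* The geographical coordinates of each London neighbourhood will be obtained with the **geopy** package (Nominatim geocoder); those of the Toronto postal codes come from a prepared CSV file (https://cocl.us/Geospatial_data).
* The types and locations of venues in each neighbourhood will be obtained using the **Foursquare API**.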

### 2.1 Gather the postal codes of western central London

In [1]:
import pandas as pd
from bs4 import BeautifulSoup
import requests

In [2]:
# Scrape the Wikipedia page
source1 = requests.get('https://en.wikipedia.org/wiki/WC_postcode_area').text
soup1 = BeautifulSoup(source1,'lxml')

table1 = soup1.find('table',{'class':'wikitable sortable'})

In [3]:
# Iteration: loop through the rows to get the data
PostalCode =[]
PostTown = []
Neighbourhood = []

for row in table1.findAll("tr"):
    cells = row.findAll("th")
    if len(cells) == 1:
        PostalCode.append(cells[0].find(text=True))
    
    cells = row.findAll("td")
    if len(cells) == 3: 
        PostTown.append(cells[0].find(text=True))
        Neighbourhood.append(cells[1].find(text=True))

london = pd.DataFrame(PostalCode, columns = ['PostalCode'])
london['PostTown'] = PostTown
london['Neighbourhood'] = Neighbourhood
london.head()

Unnamed: 0,PostalCode,PostTown,Neighbourhood
0,WC1A,LONDON,New Oxford Street
1,WC1B,LONDON,Bloomsbury
2,WC1E,LONDON,University College London
3,WC1H,LONDON,St Pancras
4,WC1N,LONDON,Russell Square
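
The loop above fills three parallel lists under two separate conditions, so a row with an unexpected shape could leave them misaligned. As a hedged illustration of the same row logic (one `th` plus three `td` cells per data row), the sketch below keeps each row's values together as one tuple. It deliberately uses the standard library's `html.parser` instead of BeautifulSoup so that it runs offline, and the HTML snippet, including its header names and the fourth column, is a made-up stand-in for the real Wikipedia table.

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect (tag, text) pairs for every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self.rows.append([])
        elif tag in ('th', 'td'):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('th', 'td') and self._cell is not None:
            self.rows[-1].append((tag, ''.join(self._cell).strip()))
            self._cell = None

# Hypothetical, simplified table in the shape the scraping loop expects
page = """
<table class="wikitable sortable">
  <tr><th>District</th><th>Post town</th><th>Coverage</th><th>Authority</th></tr>
  <tr><th>WC1A</th><td>LONDON</td><td>New Oxford Street</td><td>Camden</td></tr>
  <tr><th>WC1B</th><td>LONDON</td><td>Bloomsbury</td><td>Camden</td></tr>
</table>
"""

parser = TableRows()
parser.feed(page)

# Keep only rows with exactly one <th> followed by three <td> cells
records = [(row[0][1], row[1][1], row[2][1])
           for row in parser.rows
           if [tag for tag, _ in row] == ['th', 'td', 'td', 'td']]
print(records)
```

A list of tuples like `records` can be passed to `pd.DataFrame(records, columns=[...])` directly, which avoids building the columns one list at a time.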


In [4]:
# Change 'Kings Cross' to 'Kings Cross Station' for clarity
london['Neighbourhood'] = london['Neighbourhood'].replace('Kings Cross','Kings Cross Station')

### 2.2 Get the latitudes and longitudes for each neighbourhood in western central London

In [5]:
from geopy.geocoders import Nominatim

In [6]:
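Latitude = []
Longitude = []

# Create the geocoder once and reuse it for every lookup
geolocator = Nominatim(user_agent="ld_explorer")

for name in london['Neighbourhood']:
    location = geolocator.geocode(name)
    # geocode() returns None when nothing matches; record NaN in that case
    Latitude.append(location.latitude if location else float('nan'))
    Longitude.append(location.longitude if location else float('nan'))
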
london['Latitude'] = Latitude
london['Longitude'] = Longitude
london.head()

Unnamed: 0,PostalCode,PostTown,Neighbourhood,Latitude,Longitude
0,WC1A,LONDON,New Oxford Street,51.517302,-0.123046
1,WC1B,LONDON,Bloomsbury,51.523126,-0.126066
2,WC1E,LONDON,University College London,51.523161,-0.128204
3,WC1H,LONDON,St Pancras,53.316558,-6.28224
4,WC1N,LONDON,Russell Square,51.521699,-0.126074


In [7]:
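# Drop 'St Pancras' and 'Charing Cross', whose geocoded coordinates are far from the other neighbourhoods
london = london.drop(london.index[3])
# After the first drop, position 11 corresponds to the row originally labelled 12
london = london.drop(london.index[11])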
london.head()

Unnamed: 0,PostalCode,PostTown,Neighbourhood,Latitude,Longitude
0,WC1A,LONDON,New Oxford Street,51.517302,-0.123046
1,WC1B,LONDON,Bloomsbury,51.523126,-0.126066
2,WC1E,LONDON,University College London,51.523161,-0.128204
4,WC1N,LONDON,Russell Square,51.521699,-0.126074
5,WC1R,LONDON,Gray's Inn,51.518938,-0.112812
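
The rows dropped above were spotted by eye: 'St Pancras', for example, was geocoded to roughly (53.32, -6.28), hundreds of kilometres from the other neighbourhoods. One way to automate such a check is to flag points that lie far from the median coordinate. The sketch below is only an illustration: the helper names and the 5 km threshold are assumptions, not part of the original analysis.

```python
import math
from statistics import median

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def flag_outliers(points, max_km=5.0):
    """Return names whose distance from the median coordinate exceeds max_km.

    `points` maps a name to a (latitude, longitude) pair. The 5 km default
    is an illustrative assumption.
    """
    centre_lat = median(lat for lat, _ in points.values())
    centre_lon = median(lon for _, lon in points.values())
    return [name for name, (lat, lon) in points.items()
            if haversine_km(lat, lon, centre_lat, centre_lon) > max_km]

# Coordinates copied from the geocoding output in section 2.2
sample = {
    'New Oxford Street': (51.517302, -0.123046),
    'Bloomsbury': (51.523126, -0.126066),
    'University College London': (51.523161, -0.128204),
    'St Pancras': (53.316558, -6.28224),
    'Russell Square': (51.521699, -0.126074),
}
print(flag_outliers(sample))  # ['St Pancras']
```

Using the median rather than the mean keeps a single badly geocoded point from dragging the reference location towards itself.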


### 2.3 Gather the postal codes of downtown Toronto

In [8]:
# Scrape the Wikipedia page
source2 = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup2 = BeautifulSoup(source2,'lxml')

table2 = soup2.find('table',{'class':'wikitable sortable'})

In [9]:
# Iteration: loop through the rows to get the data
PostalCode =[]
Borough = []
Neighbourhood =[]

for row in table2.findAll("tr"):
    cells = row.findAll("td")
    if len(cells) == 3:
        PostalCode.append(cells[0].find(text=True))
        Borough.append(cells[1].find(text=True))
        Neighbourhood.append(cells[2].find(text=True))
        
toronto = pd.DataFrame(PostalCode, columns = ['PostalCode'])
toronto['Borough'] = Borough
toronto['Neighbourhood'] = Neighbourhood
toronto.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


#### Clean data

In [10]:
# 1. Remove rows whose borough is 'Not assigned'
condition = toronto.Borough == 'Not assigned'
toronto = toronto.drop(toronto[condition].index, axis = 0, inplace = False)

In [11]:
# 2. For cells with a 'Not assigned' neighborhood, replace the neighborhood with the borough.
toronto['Neighbourhood'] = toronto['Neighbourhood'].str.strip()

import numpy as np
toronto['Neighbourhood'] = np.where(toronto['Neighbourhood'] =='Not assigned', toronto['Borough'], toronto['Neighbourhood'])

In [12]:
# 3. Combine Neighbourhood with the same postal code
toronto2 = pd.DataFrame(toronto.groupby(['PostalCode','Borough'], as_index = False).agg(', '.join))

In [13]:
toronto2.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


### 2.4 Get the latitudes and longitudes for each neighbourhood in downtown Toronto

In [14]:
geodata = pd.read_csv('https://cocl.us/Geospatial_data')

In [15]:
toronto3 = pd.concat([toronto2, geodata], axis=1).drop('Postal Code',axis = 1)
toronto3.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
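
`pd.concat(..., axis=1)` pairs rows by index, so it gives the right coordinates only when both tables are in the same row order. A key-based merge on the postal code is one possible alternative that does not depend on row order. The sketch below uses small toy frames standing in for `toronto2` and `geodata` (values copied from the table above, with the coordinate rows deliberately shuffled).

```python
import pandas as pd

# Toy stand-ins for toronto2 and geodata
codes = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C', 'M1E'],
    'Borough': ['Scarborough'] * 3,
})
# Shuffled on purpose: a merge matches on the key, not on position
coords = pd.DataFrame({
    'Postal Code': ['M1E', 'M1B', 'M1C'],
    'Latitude': [43.763573, 43.806686, 43.784535],
    'Longitude': [-79.188711, -79.194353, -79.160497],
})

merged = (codes
          .merge(coords, left_on='PostalCode', right_on='Postal Code',
                 how='left', validate='one_to_one')
          .drop(columns='Postal Code'))
print(merged)
```

`validate='one_to_one'` makes pandas raise an error if a postal code appears more than once on either side, and `how='left'` keeps every row of the left table even when no coordinates are found.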


In [16]:
# We will focus on downtown Toronto.
dt_trt = toronto3[toronto3['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
dt_trt.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529
1,M4X,Downtown Toronto,"Cabbagetown, St. James Town",43.667967,-79.367675
2,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316
3,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636
4,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937


### 2.5 Now I have two cleaned datasets of neighbourhoods and their coordinates in central London and downtown Toronto.

The dataset of central London is called **london**.

In [62]:
london.head()

Unnamed: 0,PostalCode,PostTown,Neighbourhood,Latitude,Longitude
0,WC1A,LONDON,New Oxford Street,51.517302,-0.123046
1,WC1B,LONDON,Bloomsbury,51.523126,-0.126066
2,WC1E,LONDON,University College London,51.523161,-0.128204
4,WC1N,LONDON,Russell Square,51.521699,-0.126074
5,WC1R,LONDON,Gray's Inn,51.518938,-0.112812


The dataset of downtown Toronto is called **dt_trt**.

In [61]:
dt_trt.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529
1,M4X,Downtown Toronto,"Cabbagetown, St. James Town",43.667967,-79.367675
2,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316
3,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636
4,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937


## 3. Methodology <a name="methodology"></a>

After cleaning the data, I will first visualize all neighbourhoods in central London (using a **folium map**) to take a closer look at their locations.

Using the **Foursquare API**, I will then retrieve up to 100 venues within a radius of 500 metres of each neighbourhood's coordinates. The coordinates and category of each venue are recorded in a dataset called ***london_venues***.

By calculating the average frequency of occurrence of each category, I will identify the top 10 most common venue categories in each neighbourhood, which are recorded in a dataset called ***london_neighborhoods_venues_sorted***.

Next, I will employ a machine learning algorithm called **k-means clustering** to separate the neighbourhoods into three clusters, and visualize them on the map. I will then label each cluster based on its most common venues.

The same analysis will be performed on the dataset of downtown Toronto to cluster its neighbourhoods, this time into six clusters.

Finally, I will compare the neighbourhood clusters in the two cities and discuss their differences and similarities.

### 3.1 Visualize all neighbourhoods in western central London

In [19]:
address = 'London'

geolocator = Nominatim(user_agent="ld_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London are {}, {}.'.format(latitude, longitude))

The geographical coordinates of London are 51.5073219, -0.1276474.


In [20]:
import folium

# Create map of London using latitude and longitude values
map_london = folium.Map(location=[latitude, longitude], zoom_start=10)

# Add markers to map
for lat, lng, borough, neighborhood in zip(london['Latitude'], london['Longitude'], london['PostTown'], london['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)  
    
map_london

### 3.2 Use the Foursquare API to get nearby venues in each neighborhood

In [21]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # redacted: supply your own Foursquare client ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # redacted: never publish the real secret
VERSION = '20181110'
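
API credentials are best kept out of a shared notebook. A minimal sketch of one common approach, reading them from environment variables, is shown here. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper function are hypothetical, chosen only for this illustration.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from a mapping of environment variables.

    The variable names used here are hypothetical; pick whatever suits
    your own setup.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError(
            'Missing environment variable: {}'.format(missing)) from None

# Demo with an explicit mapping, so the example runs without real credentials
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id',
            'FOURSQUARE_CLIENT_SECRET': 'demo-secret'}
client_id, client_secret = load_foursquare_credentials(demo_env)
print(client_id)  # demo-id
```

In the notebook itself the function would be called without arguments, so that it reads from the real process environment.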

In [22]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
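
`getNearbyVenues` indexes straight into the JSON response, so a venue with an empty `categories` list or a response without `groups` would raise an exception. The sketch below shows one possible defensive way to pull out the same fields. It assumes only the response shape already used above (`response` → `groups[0]` → `items` → `venue`); the `'Unknown'` fallback label and the second sample venue are made up for illustration.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style payload.

    Mirrors the keys used in getNearbyVenues. Venues without a category are
    labelled 'Unknown' (an assumption of this sketch) instead of raising.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else 'Unknown'
        rows.append((venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Hand-written payload in the same shape; the second venue is invented
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Abeno',
               'location': {'lat': 51.517447, 'lng': -0.125168},
               'categories': [{'name': 'Okonomiyaki Restaurant'}]}},
    {'venue': {'name': 'Uncategorised Place',
               'location': {'lat': 51.5170, 'lng': -0.1240},
               'categories': []}},
]}]}}

print(extract_venues(sample))
print(extract_venues({'response': {}}))  # no groups: empty list
```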

In [23]:
london_venues = getNearbyVenues(names = london['Neighbourhood'],
                                latitudes = london['Latitude'],
                                longitudes = london['Longitude'])       

New Oxford Street
Bloomsbury
University College London
Russell Square
Gray's Inn
High Holborn
Kings Cross Station
Lincoln's Inn Fields
Drury Lane
Covent Garden
Leicester Square
Somerset House


In [24]:
london_venues.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,New Oxford Street,51.517302,-0.123046,The Hoxton Holborn,51.517145,-0.12203,Hotel
1,New Oxford Street,51.517302,-0.123046,Abeno,51.517447,-0.125168,Okonomiyaki Restaurant
2,New Oxford Street,51.517302,-0.123046,Drury 188-189,51.51606,-0.124111,Café
3,New Oxford Street,51.517302,-0.123046,London Review Bookshop,51.518485,-0.124369,Bookstore
4,New Oxford Street,51.517302,-0.123046,Top Secret Comedy,51.515384,-0.123202,Comedy Club


In [25]:
# Count how many venues were returned for each neighborhood
london_venues.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Bloomsbury,96,96,96,96,96,96
Covent Garden,100,100,100,100,100,100
Drury Lane,100,100,100,100,100,100
Gray's Inn,91,91,91,91,91,91
High Holborn,100,100,100,100,100,100
Kings Cross Station,82,82,82,82,82,82
Leicester Square,100,100,100,100,100,100
Lincoln's Inn Fields,100,100,100,100,100,100
New Oxford Street,100,100,100,100,100,100
Russell Square,88,88,88,88,88,88


In [26]:
print('There are {} unique categories among all the returned venues.'.format(len(london_venues['Venue Category'].unique())))

There are 165 unique categories among all the returned venues.


### 3.3 Get the top 10 most common venues in each neighborhood

In [27]:
london_onehot = pd.get_dummies(london_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
london_onehot['Neighbourhood'] = london_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [london_onehot.columns[-1]] + list(london_onehot.columns[:-1])
london_onehot = london_onehot[fixed_columns]

london_onehot.head()

Unnamed: 0,Neighbourhood,Accessories Store,African Restaurant,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Train Station,Turkish Restaurant,Used Bookstore,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,New Oxford Street,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,New Oxford Street,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,New Oxford Street,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,New Oxford Street,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,New Oxford Street,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


#### Calculate the average frequency of occurrence of each category

In [28]:
london_grouped = london_onehot.groupby('Neighbourhood').mean().reset_index()
london_grouped.head()

Unnamed: 0,Neighbourhood,Accessories Store,African Restaurant,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Train Station,Turkish Restaurant,Used Bookstore,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Bloomsbury,0.0,0.0,0.0,0.0,0.0,0.0,0.010417,0.0,0.0,...,0.0,0.010417,0.010417,0.0,0.0,0.0,0.0,0.0,0.010417,0.0
1,Covent Garden,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.01
2,Drury Lane,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.01
3,Gray's Inn,0.0,0.0,0.0,0.010989,0.010989,0.0,0.0,0.010989,0.010989,...,0.0,0.0,0.0,0.021978,0.021978,0.010989,0.010989,0.0,0.0,0.0
4,High Holborn,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01
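
Taking the mean of a one-hot column within a neighbourhood is the same as dividing the number of venues in that category by the neighbourhood's total venue count. For instance, Bloomsbury returned 96 venues, so a category that occurs once there has frequency 1/96 ≈ 0.010417, as in the table above. The following sketch reproduces that calculation with plain Python on made-up toy data.

```python
from collections import Counter, defaultdict

def category_frequencies(venues):
    """Map each neighbourhood to {category: share of its venues}.

    `venues` is an iterable of (neighbourhood, category) pairs.
    """
    counts = defaultdict(Counter)
    for hood, category in venues:
        counts[hood][category] += 1
    return {hood: {cat: n / sum(c.values()) for cat, n in c.items()}
            for hood, c in counts.items()}

# Made-up toy data, not taken from the Foursquare results
toy = [('A', 'Pub'), ('A', 'Pub'), ('A', 'Café'), ('A', 'Hotel'),
       ('B', 'Theater'), ('B', 'Café')]
freq = category_frequencies(toy)
print(freq['A']['Pub'])  # 0.5, because 2 of A's 4 venues are pubs
```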


#### Get the top 10 most common venues in each neighborhood

In [29]:
# Sort the venues in descending order

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [60]:
import numpy as np

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
london_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
london_neighborhoods_venues_sorted['Neighbourhood'] = london_grouped['Neighbourhood']

for ind in np.arange(london_grouped.shape[0]):
    london_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(london_grouped.iloc[ind, :], num_top_venues)

london_neighborhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bloomsbury,Coffee Shop,Café,Hotel,Pub,Exhibit,Bookstore,Park,Sandwich Place,Plaza,Bar
1,Covent Garden,Theater,Coffee Shop,Dessert Shop,Clothing Store,Indian Restaurant,Italian Restaurant,Bakery,Ice Cream Shop,Restaurant,French Restaurant
2,Drury Lane,Theater,Clothing Store,Restaurant,Coffee Shop,Indian Restaurant,French Restaurant,Dessert Shop,Japanese Restaurant,Italian Restaurant,Ice Cream Shop
3,Gray's Inn,Coffee Shop,Pub,Korean Restaurant,Restaurant,Gym / Fitness Center,Bar,French Restaurant,Sandwich Place,Bookstore,Beer Bar
4,High Holborn,Pub,Coffee Shop,Hotel,Restaurant,Café,Bookstore,Italian Restaurant,Korean Restaurant,Japanese Restaurant,Gastropub
5,Kings Cross Station,Pub,Hotel,Coffee Shop,Burger Joint,Café,Pizza Place,Bookstore,English Restaurant,Plaza,Restaurant
6,Leicester Square,Theater,Ice Cream Shop,Italian Restaurant,Hotel,Cocktail Bar,Wine Bar,Plaza,Pub,Coffee Shop,Steakhouse
7,Lincoln's Inn Fields,Pub,Coffee Shop,Restaurant,Hotel,Theater,Bar,Japanese Restaurant,Sushi Restaurant,Café,Plaza
8,New Oxford Street,Coffee Shop,Hotel,Theater,Exhibit,Restaurant,Ramen Restaurant,Pub,Burger Joint,Plaza,Gift Shop
9,Russell Square,Coffee Shop,Café,Pub,Bookstore,Plaza,Hotel,Exhibit,Park,Bar,Sandwich Place
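
The column names above are built with a `try`/`except` around a three-element suffix list. A small helper that computes the English ordinal suffix directly is one possible alternative; the function name `ordinal` is an assumption of this sketch.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th, ...
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['Neighbourhood'] + ['{} Most Common Venue'.format(ordinal(i))
                               for i in range(1, num_top_venues + 1)]
print(columns[1], '|', columns[4], '|', columns[-1])
# 1st Most Common Venue | 4th Most Common Venue | 10th Most Common Venue
```

For the ten columns used here the result is identical to the original loop; the helper additionally handles larger ranks such as 21st and 22nd.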


### 3.4 Cluster neighbourhoods in central London

In [31]:
from sklearn.cluster import KMeans

In [32]:
london_grouped_clustering = london_grouped.drop('Neighbourhood', axis=1)

kclusters = 3
kmeans = KMeans(n_clusters = kclusters, random_state = 0).fit(london_grouped_clustering)

kmeans.labels_[0:10] # cluster labels generated for each row in the dataframe

array([1, 0, 0, 2, 2, 2, 0, 2, 0, 1], dtype=int32)
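
To make clear what the clustering step does with the frequency table, here is a minimal sketch of the basic k-means loop (Lloyd's algorithm) in plain Python. It is not scikit-learn's implementation, which among other refinements uses k-means++ initialisation by default. The toy rows, their three columns (Theater, Café, Pub) and the hand-picked starting centroids are all made up for illustration.

```python
def kmeans(points, centroids, n_iter=10):
    """Plain Lloyd's algorithm on lists of floats.

    `centroids` are the starting centres, passed in explicitly so the result
    is deterministic. Returns (labels, centroids).
    """
    centroids = [list(c) for c in centroids]  # do not modify the caller's lists
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2
                                  for p, c in zip(pt, centroids[k])))
            for pt in points
        ]
        # Update step: each centroid moves to the mean of its members
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids

# Made-up venue frequencies: columns are [Theater, Café, Pub]
rows = [
    [0.10, 0.02, 0.03],  # theatre-heavy
    [0.09, 0.03, 0.02],
    [0.01, 0.08, 0.04],  # café-heavy
    [0.02, 0.09, 0.03],
    [0.01, 0.03, 0.09],  # pub-heavy
    [0.02, 0.02, 0.10],
]
labels, centres = kmeans(rows, centroids=[rows[0], rows[2], rows[4]])
print(labels)  # [0, 0, 1, 1, 2, 2]
```

Rows with similar venue profiles end up sharing a label, which is why the clusters can afterwards be named after their dominant venue category.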

In [33]:
# add clustering labels
london_neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

london_merged = london
london_merged = london_merged.join(london_neighborhoods_venues_sorted.set_index('Neighbourhood'), 
                                   on ='Neighbourhood')

london_merged.head()

Unnamed: 0,PostalCode,PostTown,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,WC1A,LONDON,New Oxford Street,51.517302,-0.123046,0,Coffee Shop,Hotel,Theater,Exhibit,Restaurant,Ramen Restaurant,Pub,Burger Joint,Plaza,Gift Shop
1,WC1B,LONDON,Bloomsbury,51.523126,-0.126066,1,Coffee Shop,Café,Hotel,Pub,Exhibit,Bookstore,Park,Sandwich Place,Plaza,Bar
2,WC1E,LONDON,University College London,51.523161,-0.128204,1,Café,Coffee Shop,Exhibit,Hotel,Pub,Plaza,Bookstore,Bar,Sandwich Place,Park
4,WC1N,LONDON,Russell Square,51.521699,-0.126074,1,Coffee Shop,Café,Pub,Bookstore,Plaza,Hotel,Exhibit,Park,Bar,Sandwich Place
5,WC1R,LONDON,Gray's Inn,51.518938,-0.112812,2,Coffee Shop,Pub,Korean Restaurant,Restaurant,Gym / Fitness Center,Bar,French Restaurant,Sandwich Place,Bookstore,Beer Bar


### 3.5 Visualize the clusters in London

In [34]:
import matplotlib.cm as cm
import matplotlib.colors as colors

In [35]:
# create map
london_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(london_merged['Latitude'],london_merged['Longitude'],london_merged['Neighbourhood'],london_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(london_clusters)
       
london_clusters

### 3.6 Label the clusters in London

#### Cluster 1 (label 0): Theater

In [36]:
london_merged.loc[london_merged['Cluster Labels'] == 0, 
                  london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,PostTown,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,LONDON,0,Coffee Shop,Hotel,Theater,Exhibit,Restaurant,Ramen Restaurant,Pub,Burger Joint,Plaza,Gift Shop
9,LONDON,0,Theater,Clothing Store,Restaurant,Coffee Shop,Indian Restaurant,French Restaurant,Dessert Shop,Japanese Restaurant,Italian Restaurant,Ice Cream Shop
10,LONDON,0,Theater,Coffee Shop,Dessert Shop,Clothing Store,Indian Restaurant,Italian Restaurant,Bakery,Ice Cream Shop,Restaurant,French Restaurant
11,LONDON,0,Theater,Ice Cream Shop,Italian Restaurant,Hotel,Cocktail Bar,Wine Bar,Plaza,Pub,Coffee Shop,Steakhouse
13,LONDON,0,Theater,Coffee Shop,French Restaurant,Dessert Shop,Cocktail Bar,American Restaurant,Restaurant,Tea Room,Burger Joint,Indian Restaurant


#### Cluster 2 (label 1): Cafe

In [37]:
london_merged.loc[london_merged['Cluster Labels'] == 1, 
                  london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,PostTown,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,LONDON,1,Coffee Shop,Café,Hotel,Pub,Exhibit,Bookstore,Park,Sandwich Place,Plaza,Bar
2,LONDON,1,Café,Coffee Shop,Exhibit,Hotel,Pub,Plaza,Bookstore,Bar,Sandwich Place,Park
4,LONDON,1,Coffee Shop,Café,Pub,Bookstore,Plaza,Hotel,Exhibit,Park,Bar,Sandwich Place


#### Cluster 3 (label 2): Pub

In [38]:
london_merged.loc[london_merged['Cluster Labels'] == 2, 
                  london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,PostTown,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,LONDON,2,Coffee Shop,Pub,Korean Restaurant,Restaurant,Gym / Fitness Center,Bar,French Restaurant,Sandwich Place,Bookstore,Beer Bar
6,LONDON,2,Pub,Coffee Shop,Hotel,Restaurant,Café,Bookstore,Italian Restaurant,Korean Restaurant,Japanese Restaurant,Gastropub
7,LONDON,2,Pub,Hotel,Coffee Shop,Burger Joint,Café,Pizza Place,Bookstore,English Restaurant,Plaza,Restaurant
8,LONDON,2,Pub,Coffee Shop,Restaurant,Hotel,Theater,Bar,Japanese Restaurant,Sushi Restaurant,Café,Plaza


### Now I am going to perform the same analysis on downtown Toronto. 
### 3.7 Visualize all neighbourhoods in downtown Toronto

In [39]:
address = 'Downtown Toronto'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Downtown Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Downtown Toronto are 43.6541737, -79.3808116451341.


In [40]:
# Create map of Downtown Toronto using latitude and longitude values
map_dt_trt = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(dt_trt['Latitude'], dt_trt['Longitude'], dt_trt['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_dt_trt)  
    
map_dt_trt

### 3.8 Use the Foursquare API to get nearby venues in each neighborhood

In [41]:
dt_venues = getNearbyVenues(names = dt_trt['Neighbourhood'],
                            latitudes = dt_trt['Latitude'],
                            longitudes = dt_trt['Longitude'])   

Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront, Regent Park
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Christie


In [42]:
dt_venues.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Rosedale,43.679563,-79.377529,Rosedale Park,43.682328,-79.378934,Playground
1,Rosedale,43.679563,-79.377529,Whitney Park,43.682036,-79.373788,Park
2,Rosedale,43.679563,-79.377529,Alex Murray Parkette,43.6783,-79.382773,Park
3,Rosedale,43.679563,-79.377529,Milkman's Lane,43.676352,-79.373842,Trail
4,"Cabbagetown, St. James Town",43.667967,-79.367675,Cranberries,43.667843,-79.369407,Diner


In [43]:
# Count how many venues were returned for each neighborhood
dt_venues.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide, King, Richmond",100,100,100,100,100,100
Berczy Park,56,56,56,56,56,56
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",14,14,14,14,14,14
"Cabbagetown, St. James Town",46,46,46,46,46,46
Central Bay Street,82,82,82,82,82,82
"Chinatown, Grange Park, Kensington Market",100,100,100,100,100,100
Christie,16,16,16,16,16,16
Church and Wellesley,86,86,86,86,86,86
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
"Design Exchange, Toronto Dominion Centre",100,100,100,100,100,100


In [44]:
print('There are {} unique categories among all the returned venues.'.format(len(dt_venues['Venue Category'].unique())))

There are 207 unique categories among all the returned venues.


In [45]:
dt_onehot = pd.get_dummies(dt_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
dt_onehot['Neighbourhood'] = dt_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [dt_onehot.columns[-1]] + list(dt_onehot.columns[:-1])
dt_onehot = dt_onehot[fixed_columns]

dt_onehot.head()

Unnamed: 0,Neighbourhood,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Rosedale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Rosedale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Rosedale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Rosedale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Cabbagetown, St. James Town",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### 3.9 Get the top 10 most common venues in each neighborhood

#### Calculate the average frequency of occurrence of each category

In [46]:
dt_grouped = dt_onehot.groupby('Neighbourhood').mean().reset_index()
dt_grouped.head()

Unnamed: 0,Neighbourhood,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,...,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.0,0.0,0.071429,0.071429,0.071429,0.142857,0.142857,0.142857,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Cabbagetown, St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012195,...,0.0,0.012195,0.0,0.012195,0.0,0.012195,0.0,0.0,0.0,0.012195


#### Get the top 10 most common venues in each neighborhood

In [47]:
num_top_venues = 5

for hood in dt_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = dt_grouped[dt_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
             venue  freq
0      Coffee Shop  0.06
1             Café  0.05
2  Thai Restaurant  0.04
3       Steakhouse  0.04
4              Bar  0.04


----Berczy Park----
                venue  freq
0         Coffee Shop  0.07
1        Cocktail Bar  0.05
2  Italian Restaurant  0.04
3          Restaurant  0.04
4                Café  0.04


----CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara----
              venue  freq
0  Airport Terminal  0.14
1    Airport Lounge  0.14
2   Airport Service  0.14
3     Boat or Ferry  0.07
4  Sculpture Garden  0.07


----Cabbagetown, St. James Town----
         venue  freq
0   Restaurant  0.09
1  Coffee Shop  0.09
2          Pub  0.04
3         Café  0.04
4  Pizza Place  0.04


----Central Bay Street----
                venue  freq
0         Coffee Shop  0.17
1  Italian Restaurant  0.05
2                Café  0.05
3                 Bar  0.04
4        Burger Joint

In [45]:
# Sort the venues in descending order

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [65]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
trt_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
trt_neighborhoods_venues_sorted['Neighbourhood'] = dt_grouped['Neighbourhood']

for ind in np.arange(dt_grouped.shape[0]):
    trt_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dt_grouped.iloc[ind, :], num_top_venues)

trt_neighborhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Steakhouse,Bar,Thai Restaurant,Burger Joint,Gym,Asian Restaurant,American Restaurant,Bakery
1,Berczy Park,Coffee Shop,Cocktail Bar,Seafood Restaurant,Restaurant,Bakery,Steakhouse,Cheese Shop,Café,Farmers Market,Italian Restaurant
2,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Lounge,Airport Service,Airport Terminal,Harbor / Marina,Boat or Ferry,Airport,Airport Food Court,Airport Gate,Sculpture Garden,Plane
3,"Cabbagetown, St. James Town",Restaurant,Coffee Shop,Park,Pub,Italian Restaurant,Bakery,Café,Market,Pizza Place,Pharmacy
4,Central Bay Street,Coffee Shop,Café,Italian Restaurant,Bubble Tea Shop,Bar,Burger Joint,Spa,Chinese Restaurant,Sushi Restaurant,Salad Place
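
The column labels above are built by catching the error raised when `indicators` runs out of suffixes. A minimal alternative sketch is an explicit ordinal helper. `ordinal` is a hypothetical name introduced here and is not part of the notebook. It also handles numbers such as 21 or 22, which the original does not need for ten venues.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12 and 13 take 'th' even though they end in 1, 2 and 3
    if 11 <= n % 100 <= 13:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['Neighbourhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns[:4])
```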


### 3.10 Cluster neighbourhoods in downtown Toronto

In [49]:
dt_grouped_clustering = dt_grouped.drop(columns='Neighbourhood')

kclusters = 6
kmeans = KMeans(n_clusters = kclusters, random_state = 0).fit(dt_grouped_clustering)

kmeans.labels_[0:10] # cluster labels generated for each row in the dataframe

array([4, 0, 3, 0, 4, 5, 2, 0, 4, 4], dtype=int32)
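
The notebook fixes `kclusters = 6` without showing how that value was chosen. One common check, sketched here under assumptions, is to compare silhouette scores across candidate values of k. The data below is synthetic (three well-separated blobs), and `scores` and `best_k` are names introduced for illustration. On the real `dt_grouped_clustering` matrix the scores may be much less clear-cut.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the feature matrix: three tight, well-separated blobs
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centres])

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
    # Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)
```

With only 18 neighbourhoods in downtown Toronto, any such score should be read as a rough guide rather than a firm answer.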

In [50]:
# add clustering labels
trt_neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

dt_merged = dt_trt

# join the sorted venue table onto dt_trt, which holds latitude/longitude for each neighbourhood
dt_merged = dt_merged.join(trt_neighborhoods_venues_sorted.set_index('Neighbourhood'), 
                           on ='Neighbourhood')

dt_merged.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529,1,Park,Playground,Trail,Yoga Studio,Department Store,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant
1,M4X,Downtown Toronto,"Cabbagetown, St. James Town",43.667967,-79.367675,0,Restaurant,Coffee Shop,Park,Pub,Italian Restaurant,Bakery,Café,Market,Pizza Place,Pharmacy
2,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316,0,Coffee Shop,Japanese Restaurant,Gay Bar,Sushi Restaurant,Restaurant,Bubble Tea Shop,Burger Joint,Café,Pub,Men's Store
3,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636,4,Coffee Shop,Café,Bakery,Park,Pub,Theater,Mexican Restaurant,Breakfast Spot,Shoe Store,Event Space
4,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937,0,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Middle Eastern Restaurant,Ramen Restaurant,Tea Room,Burger Joint,Bubble Tea Shop,Pizza Place


### 3.11 Visualize the clusters in Toronto

In [51]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dt_merged['Latitude'], dt_merged['Longitude'], dt_merged['Neighbourhood'], dt_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### 3.12 Label the clusters in Toronto

#### Cluster 1: Cafe & Restaurant

In [52]:
dt_merged.loc[dt_merged['Cluster Labels'] == 0, 
              dt_merged.columns[[1] + list(range(5, dt_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Downtown Toronto,0,Restaurant,Coffee Shop,Park,Pub,Italian Restaurant,Bakery,Café,Market,Pizza Place,Pharmacy
2,Downtown Toronto,0,Coffee Shop,Japanese Restaurant,Gay Bar,Sushi Restaurant,Restaurant,Bubble Tea Shop,Burger Joint,Café,Pub,Men's Store
4,Downtown Toronto,0,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Middle Eastern Restaurant,Ramen Restaurant,Tea Room,Burger Joint,Bubble Tea Shop,Pizza Place
5,Downtown Toronto,0,Coffee Shop,Restaurant,Hotel,Café,Breakfast Spot,Clothing Store,Park,Cocktail Bar,Gastropub,Cosmetics Shop
6,Downtown Toronto,0,Coffee Shop,Cocktail Bar,Seafood Restaurant,Restaurant,Bakery,Steakhouse,Cheese Shop,Café,Farmers Market,Italian Restaurant
15,Downtown Toronto,0,Coffee Shop,Restaurant,Café,Seafood Restaurant,Hotel,Cocktail Bar,Fast Food Restaurant,Italian Restaurant,Creperie,Cheese Shop


#### Cluster 2: Park

In [53]:
dt_merged.loc[dt_merged['Cluster Labels'] == 1, 
              dt_merged.columns[[1] + list(range(5, dt_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,1,Park,Playground,Trail,Yoga Studio,Department Store,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant


#### Cluster 3: Grocery Store

In [54]:
dt_merged.loc[dt_merged['Cluster Labels'] == 2, 
              dt_merged.columns[[1] + list(range(5, dt_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,Downtown Toronto,2,Grocery Store,Café,Park,Athletics & Sports,Italian Restaurant,Coffee Shop,Convenience Store,Diner,Nightclub,Restaurant


#### Cluster 4: Airport 

In [55]:
dt_merged.loc[dt_merged['Cluster Labels'] == 3, 
              dt_merged.columns[[1] + list(range(5, dt_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Downtown Toronto,3,Airport Lounge,Airport Service,Airport Terminal,Harbor / Marina,Boat or Ferry,Airport,Airport Food Court,Airport Gate,Sculpture Garden,Plane


#### Cluster 5: Cafe

In [56]:
dt_merged.loc[dt_merged['Cluster Labels'] == 4, 
              dt_merged.columns[[1] + list(range(5, dt_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Downtown Toronto,4,Coffee Shop,Café,Bakery,Park,Pub,Theater,Mexican Restaurant,Breakfast Spot,Shoe Store,Event Space
7,Downtown Toronto,4,Coffee Shop,Café,Italian Restaurant,Bubble Tea Shop,Bar,Burger Joint,Spa,Chinese Restaurant,Sushi Restaurant,Salad Place
8,Downtown Toronto,4,Coffee Shop,Café,Steakhouse,Bar,Thai Restaurant,Burger Joint,Gym,Asian Restaurant,American Restaurant,Bakery
9,Downtown Toronto,4,Coffee Shop,Hotel,Aquarium,Italian Restaurant,Café,Scenic Lookout,Pizza Place,Brewery,Restaurant,History Museum
10,Downtown Toronto,4,Coffee Shop,Café,Hotel,American Restaurant,Italian Restaurant,Seafood Restaurant,Restaurant,Gastropub,Deli / Bodega,Lounge
11,Downtown Toronto,4,Coffee Shop,Café,Hotel,Restaurant,American Restaurant,Steakhouse,Seafood Restaurant,Deli / Bodega,Gastropub,Bakery
16,Downtown Toronto,4,Coffee Shop,Café,Hotel,Steakhouse,Restaurant,American Restaurant,Asian Restaurant,Bakery,Burger Joint,Bar


#### Cluster 6: Bar

In [57]:
dt_merged.loc[dt_merged['Cluster Labels'] == 5, 
              dt_merged.columns[[1] + list(range(5, dt_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Downtown Toronto,5,Café,Japanese Restaurant,Bar,Bakery,Coffee Shop,Bookstore,Restaurant,Chinese Restaurant,Italian Restaurant,Beer Bar
13,Downtown Toronto,5,Bar,Café,Vegetarian / Vegan Restaurant,Chinese Restaurant,Coffee Shop,Vietnamese Restaurant,Bakery,Caribbean Restaurant,Mexican Restaurant,Cocktail Bar


## 4. Results <a name="results"></a>

#### Create a table to summarize the categories of 3 clusters in London and 6 clusters in Toronto.

In [58]:
Clusters = [1,2,3,4,5,6]
comparison = pd.DataFrame(Clusters, columns = ['Clusters'])

London = ['Theater','Cafe','Pub','-','-','-']
Toronto = ['Cafe & Restaurant','Park','Grocery Store','Airport','Cafe','Bar']

comparison['London'] = London
comparison['Toronto'] = Toronto

comparison

Unnamed: 0,Clusters,London,Toronto
0,1,Theater,Cafe & Restaurant
1,2,Cafe,Park
2,3,Pub,Grocery Store
3,4,-,Airport
4,5,-,Cafe
5,6,-,Bar


## 5. Discussion and Conclusion <a name="conclusion"></a>

The clustering results suggest that London and Toronto are broadly similar in terms of the most common venues in their neighbourhoods.

Both cities have a lot of coffee shops, which is probably true of most Western countries. Both cities also have a wide variety of restaurants, ranging from Italian and French to Japanese and Chinese. This reflects the fact that both cities are culturally diverse: different cultures are celebrated and embraced in each. Therefore, if you are considering studying or working abroad in either London or Toronto, you need not worry too much about cultural issues. You will very likely find some signs of your own culture, such as a restaurant that serves food from your hometown.

However, there are some differences between London and Toronto.

First, Toronto tends to have more parks than London does. This is a very positive sign, especially for a large crowded city like Toronto. If you are thinking about living abroad for a long period of time, the living environment is an important factor to consider.

Second, London tends to have more theatres, exhibits and bookstores than Toronto does. London is famous for its rich history, culture and arts, so this difference is not surprising. For people who are interested in history or the arts, London is an ideal place to experience and learn about European culture.

With increasing globalization, major cities around the world tend to become more similar in terms of city design. However, they still have unique historical backgrounds and cultures, which make them different from each other to some extent. For researchers in the field of urban studies, I hope this project can provide additional insights into the differences between European and North American cities.