<h1 align="center"> Clustering and Comparison of London and New York City - Code</h1>

Tim Chen

16th January 2021

##### Note: A detailed explanation is given in the report. This notebook contains the complete code and brief explanations.

# 1. Introduction

  London and New York City are two of the most influential and populous international cities in the world. As two megacities, London and New York City have diverse demographics and cultures, which contribute to their different neighborhoods and living experiences.
  
  In order to compare the unique living experience of the two cities, this project clusters the neighborhoods of London and New York City to give a comprehensive report of the difference and variety between their neighborhoods.

# 2. Business Problem

The purpose of this project is to provide a detailed comparison for those who are interested in learning about London and New York City. It also helps tourists choose a destination based on the kind of neighborhood they prefer, and it helps those who are considering moving to London or New York City make their decision. For investors and stakeholders, this project offers more insight into the neighborhoods and helps them choose future investment locations.

# 3. Data Description

To compare London and NYC, we need location (geographical) data for both cities. To acquire detailed venue and neighborhood data, this project also relies on several external APIs.

## 3.1  New York City Data

The data available for New York City can be found in the JSON file located at https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json

The JSON file contains all the information about New York City needed for this project, including:

1. *Borough*: The name of the Borough (district) of NYC
2. *Neighborhood*: The name of neighborhoods of NYC
3. *Latitude*: The corresponding latitude of a specific neighborhood
4. *Longitude*: The corresponding longitude of a specific neighborhood

## 3.2 London Data

The data available for London can be found in the table of the wiki page https://en.wikipedia.org/wiki/List_of_areas_of_London.

The table contains the borough and neighborhood data needed for this project:

1. *Location*: The name of neighborhood of London
2. *London borough*: The name of borough (district) of London

However, longitude and latitude data for London neighborhoods are not included in the wiki page. To solve this problem, we use a Python geocoding package to derive this information.

## 3.3 Geocoder Python Package

Geocoder is a simple and consistent geocoding library written in Python that wraps multiple geocoding providers such as Google, Bing, and OSM.

Nominatim is a geocoding service built on OpenStreetMap data that returns the latitude and longitude of a location given its name. The code in this notebook accesses it through the `Nominatim` class of the geopy package (`geopy.geocoders`). Using Nominatim, we are able to derive latitude and longitude data for London neighborhoods.

## 3.4 Foursquare API

After acquiring necessary neighborhood location data for both cities, we also need detailed venue data for those neighborhoods.

Foursquare is a social location service that allows users to explore the world around them. The Foursquare API exposes a large dataset of location data, including all the information needed for this project: venue names, venue categories, venue longitudes, and venue latitudes.

After collecting the latitude and longitude of every neighborhood, we pass this information, together with our credentials and a maximum radius (we choose 500 meters), to the Foursquare API. The API then returns the following information:

1. *Venue*: The name of a specific venue
2. *Venue Latitude*: The latitude of a specific venue
3. *Venue Longitude*: The longitude of a specific venue
4. *Venue Category*: The category of a specific venue, such as "Pizza Place" and "Coffee Shop"

A sample of the data returned by the API is shown below:

![image.png](attachment:image.png)
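The request described above can be sketched as a small URL builder. This is only an illustration: the helper name `build_explore_url` and the placeholder credentials are assumptions, while the endpoint and parameter names are the ones used by the `getNearbyVenues` function later in this notebook.

```python
from urllib.parse import urlencode

# Endpoint taken from the getNearbyVenues function in this notebook
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Return the explore URL for one neighborhood centre (radius in meters).

    Hypothetical helper for illustration; it only assembles the query string.
    """
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),  # latitude,longitude of the centre
        "radius": radius,
        "limit": limit,
    }
    # urlencode escapes special characters (the comma in "ll" becomes %2C)
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials; the coordinates are those of Wakefield (Bronx)
url = build_explore_url("YOUR_ID", "YOUR_SECRET", "20180605",
                        40.894705, -73.847201)
print(url)
```

Building the query from a dictionary keeps each parameter visible by name and avoids mistakes in the order of positional `format` arguments.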

#  4. Methodology

## 4.1 Importing and Downloading Necessary Packages 

In [5]:
import pandas as pd 
import json 
!conda install -c conda-forge geopy --yes 

from geopy.geocoders import Nominatim 
import requests
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

!pip install folium
import folium # map rendering library

print('imported.')

imported.


## 4.2 Data Collection
### 4.2.1. New York City

Download the JSON file and load the data

In [6]:
!wget -q -O 'newyork_data.json' https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json
print('Data downloaded!')

Data downloaded!


In [8]:
import json
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

All the data needed for this project is under the `features` key, so we store it in a new variable

In [9]:
nyc_neigh_data = newyork_data['features']
nyc_neigh_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

#### Transform the data into pandas dataframe

In [10]:
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# collect one record per neighborhood (DataFrame.append was removed in pandas 2.0)
records = []
for data in nyc_neigh_data:
    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_lon, neighborhood_lat = data['geometry']['coordinates']
    records.append({'Borough': data['properties']['borough'],
                    'Neighborhood': data['properties']['name'],
                    'Latitude': neighborhood_lat,
                    'Longitude': neighborhood_lon})

# build the data frame in one step
nyc_neighborhoods = pd.DataFrame(records, columns=column_names)

nyc_neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


### 4.2.2. London

First, use requests to get all information from the wiki page

In [11]:
london_url = requests.get('https://en.wikipedia.org/wiki/List_of_areas_of_London')

london_data = pd.read_html(london_url.text)
london_data

[                                                   0
 0  Map all coordinates in "Category:Areas of Lond...
 1                 Download coordinates as: KML · GPX,
             Location                     London borough       Post town  \
 0         Abbey Wood              Bexley, Greenwich [7]          LONDON   
 1              Acton  Ealing, Hammersmith and Fulham[8]          LONDON   
 2          Addington                         Croydon[8]         CROYDON   
 3         Addiscombe                         Croydon[8]         CROYDON   
 4        Albany Park                             Bexley  BEXLEY, SIDCUP   
 ..               ...                                ...             ...   
 526         Woolwich                          Greenwich          LONDON   
 527   Worcester Park       Sutton, Kingston upon Thames  WORCESTER PARK   
 528  Wormwood Scrubs             Hammersmith and Fulham          LONDON   
 529          Yeading                         Hillingdon           HAYES   
 

The second table is the one we need

In [12]:
london_neigh= london_data[1]
london_neigh.head()

Unnamed: 0,Location,London borough,Post town,Postcode district,Dial code,OS grid ref
0,Abbey Wood,"Bexley, Greenwich [7]",LONDON,SE2,20,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4",20,TQ205805
2,Addington,Croydon[8],CROYDON,CR0,20,TQ375645
3,Addiscombe,Croydon[8],CROYDON,CR0,20,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",20,TQ478728


## 4.3. Data Preprocessing
The two tables are created, but their column names differ. We rename the "Location" and "London borough" columns to "Neighborhood" and "Borough", and then swap the positions of these two columns. We also remove the bracketed citation markers (such as "[7]") from the Borough column of the London table.

Change column names

In [13]:
london_neigh.columns=['Neighborhood','Borough','Post_town','Postcode_district','Dial_code',"OS_grid_ref"]
london_neigh.head()

Unnamed: 0,Neighborhood,Borough,Post_town,Postcode_district,Dial_code,OS_grid_ref
0,Abbey Wood,"Bexley, Greenwich [7]",LONDON,SE2,20,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4",20,TQ205805
2,Addington,Croydon[8],CROYDON,CR0,20,TQ375645
3,Addiscombe,Croydon[8],CROYDON,CR0,20,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",20,TQ478728


Remove the bracketed citation markers such as "[7]"

In [14]:
london_neigh['Borough'] = london_neigh['Borough'].map(lambda x: x.rstrip(']').rstrip('0123456789').rstrip('['))
london_neigh.head()

Unnamed: 0,Neighborhood,Borough,Post_town,Postcode_district,Dial_code,OS_grid_ref
0,Abbey Wood,"Bexley, Greenwich",LONDON,SE2,20,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",20,TQ205805
2,Addington,Croydon,CROYDON,CR0,20,TQ375645
3,Addiscombe,Croydon,CROYDON,CR0,20,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",20,TQ478728
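The chained `rstrip` calls work for the rows shown, but for an entry such as "Bexley, Greenwich [7]" they leave the space that preceded the bracket. A regular expression is one possible alternative; the sketch below is an assumption about how it could be done, not the method used in this notebook, and the helper name `strip_citation` is hypothetical.

```python
import re

# Matches an optional space, a bracketed number such as [7], and the end of the string
CITATION = re.compile(r"\s*\[\d+\]\s*$")


def strip_citation(text):
    """Remove a trailing citation marker like ' [7]' or '[8]' from text."""
    return CITATION.sub("", text)


# Sample values copied from the Borough column shown above
samples = ["Bexley, Greenwich [7]", "Ealing, Hammersmith and Fulham[8]", "Bexley"]
cleaned = [strip_citation(s) for s in samples]
print(cleaned)

# For comparison: the rstrip chain keeps the space before the bracket
chained = "Bexley, Greenwich [7]".rstrip(']').rstrip('0123456789').rstrip('[')
print(repr(chained))
```

With the data frame, the same helper could be applied as `london_neigh['Borough'].map(strip_citation)`.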


Exchange the first two columns

In [15]:
london_neigh=london_neigh.reindex(columns=['Borough','Neighborhood','Post_town','Postcode_district','Dial_code','OS_grid_ref'])
london_neigh.head()

Unnamed: 0,Borough,Neighborhood,Post_town,Postcode_district,Dial_code,OS_grid_ref
0,"Bexley, Greenwich",Abbey Wood,LONDON,SE2,20,TQ465785
1,"Ealing, Hammersmith and Fulham",Acton,LONDON,"W3, W4",20,TQ205805
2,Croydon,Addington,CROYDON,CR0,20,TQ375645
3,Croydon,Addiscombe,CROYDON,CR0,20,TQ345665
4,Bexley,Albany Park,"BEXLEY, SIDCUP","DA5, DA14",20,TQ478728


## 4.4. Feature Selection
The data frame for London contains unnecessary columns: Post_town, Postcode_district, Dial_code, and OS_grid_ref. We only need Borough and Neighborhood so far, so we drop the other columns.

In [16]:
df_nyc = nyc_neighborhoods

df_lon = london_neigh.drop( ['Post_town','Postcode_district','Dial_code','OS_grid_ref'], axis=1)
df_lon.head(6)

Unnamed: 0,Borough,Neighborhood
0,"Bexley, Greenwich",Abbey Wood
1,"Ealing, Hammersmith and Fulham",Acton
2,Croydon,Addington
3,Croydon,Addiscombe
4,Bexley,Albany Park
5,Redbridge,Aldborough Hatch


## 4.5. Feature Engineering

So far, the London data frame contains only the borough and neighborhood; longitude and latitude are missing. We use the Nominatim geocoder to add longitude and latitude columns to the table.

First, add latitude and longitude columns to the table

In [17]:
df_lon["Latitude"]=''
df_lon["Longitude"]=''
df_lon.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,"Bexley, Greenwich",Abbey Wood,,
1,"Ealing, Hammersmith and Fulham",Acton,,
2,Croydon,Addington,,
3,Croydon,Addiscombe,,
4,Bexley,Albany Park,,


Use the geocoder to get the latitude and longitude of each neighborhood and assign them to the table. If the geocoder cannot find a neighborhood, its latitude and longitude are simply left empty.

In [18]:
# create the geolocator once and reuse it for every request
geolocator = Nominatim(user_agent="lo_explorer")

row = 0  # 531 rows
for nei, bor in zip(df_lon['Neighborhood'], df_lon['Borough']):
    # build the address string
    address = '{},{}'.format(nei, bor)

    # look up the location of this neighborhood
    location = geolocator.geocode(address)

    i = 0
    while location is None and i < 5:  # if the lookup fails, retry at most 5 times
        location = geolocator.geocode(address)
        i = i + 1

    if location is not None:
        latitude = location.latitude
        longitude = location.longitude
    else:
        latitude = None
        longitude = None

    print('row{},latitude:{},longitude:{}.'.format(row, latitude, longitude))
    # assign values to the table (row matches the default integer index here)
    df_lon.at[row, 'Latitude'] = latitude
    df_lon.at[row, 'Longitude'] = longitude
    row = row + 1

print("Finished")

row0,latitude:51.4855716,longitude:0.11968682027131783.
row1,latitude:51.5066276,longitude:-0.2446946.
row2,latitude:44.4206405,longitude:-76.978248.
row3,latitude:51.3796916,longitude:-0.0742821.
row4,latitude:51.4353837,longitude:0.1259653.
row5,latitude:None,longitude:None.
row6,latitude:51.5142477,longitude:-0.0757186.
row7,latitude:51.5122004,longitude:-0.1188958.
row8,latitude:51.5408036,longitude:-0.3000963.
row9,latitude:51.4075993,longitude:-0.0619394.
row10,latitude:51.5318417,longitude:-0.1057137.
row11,latitude:51.3102034,longitude:0.0427001.
row12,latitude:51.5654371,longitude:-0.1349977.
row13,latitude:51.5841911,longitude:0.2209904.
row14,latitude:51.6525448,longitude:-0.2195744.
row15,latitude:51.6164024,longitude:-0.1332873.
row16,latitude:51.4456449,longitude:-0.1503643.
row17,latitude:51.5074991,longitude:-0.0993021.
row18,latitude:51.5201501,longitude:-0.0986832.
row19,latitude:51.5394838,longitude:0.0813821.
row20,latitude:51.5858181,longitude:0.0886245.
row21,lati

row169,latitude:51.4442232,longitude:-0.410659.
row170,latitude:51.5973246,longitude:-0.1805587.
row171,latitude:51.521798,longitude:-0.0914245.
row172,latitude:51.5648345,longitude:-0.1064144.
row173,latitude:51.5198652,longitude:-0.13479368989582974.
row174,latitude:51.4416793,longitude:0.150488.
row175,latitude:51.5495236,longitude:0.0249248.
row176,latitude:51.4392419,longitude:-0.0530903.
row177,latitude:51.3513283,longitude:-0.0387563.
row178,latitude:51.590997,longitude:-0.1534208.
row179,latitude:51.675772,longitude:-0.0314301.
row180,latitude:51.6128792,longitude:-0.1585948.
row181,latitude:51.5512024,longitude:-0.1804511.
row182,latitude:51.4808834,longitude:-0.1943493.
row183,latitude:51.4337482,longitude:-0.3496847.
row184,latitude:51.5929177,longitude:0.2130979.
row185,latitude:51.5765744,longitude:0.0653081.
row186,latitude:51.5811818,longitude:0.2059524.
row187,latitude:51.4245334,longitude:-0.0840424.
row188,latitude:51.3690444,longitude:0.1117289.
row189,latitude:51.57

row335,latitude:51.5196647,longitude:-0.2106959.
row336,latitude:51.5431601,longitude:0.2882425.
row337,latitude:51.4653247,longitude:-0.2865821.
row338,latitude:51.5004071,longitude:0.064154.
row339,latitude:51.5484582,longitude:-0.3695247.
row340,latitude:51.4711864,longitude:0.1611451.
row341,latitude:51.6051713,longitude:-0.4205812.
row342,latitude:51.4966058,longitude:-0.3693772.
row343,latitude:51.5109995,longitude:-0.2055267.
row344,latitude:51.4615309,longitude:-0.0535056.
row345,latitude:51.6376675,longitude:-0.1662251.
row346,latitude:51.344712799999996,longitude:-0.10279970200947028.
row347,latitude:51.5341124,longitude:-0.0268215.
row348,latitude:51.382484,longitude:-0.2590897.
row349,latitude:51.5279486,longitude:-0.2470894.
row350,latitude:51.3736037,longitude:0.0887195.
row351,latitude:51.6349223,longitude:-0.13703584913350678.
row352,latitude:51.4812336,longitude:-0.3521981.
row353,latitude:51.48375215,longitude:-0.11496182711601476.
row354,latitude:51.5170856,longitude

row502,latitude:51.5280966,longitude:0.0045685.
row503,latitude:51.5468194,longitude:-0.1899646.
row504,latitude:51.5795852,longitude:-0.3530692.
row505,latitude:51.4808745,longitude:0.1273574.
row506,latitude:51.578213,longitude:-0.2403793.
row507,latitude:51.4906622,longitude:-0.205916.
row508,latitude:51.4346192,longitude:-0.1036917.
row509,latitude:51.3758036,longitude:-0.0146843.
row510,latitude:51.4842137,longitude:0.0188049.
row511,latitude:51.5004439,longitude:-0.1265398.
row512,latitude:51.6301762,longitude:-0.1748844.
row513,latitude:51.5119347,longitude:-0.2242361.
row514,latitude:51.5186227,longitude:-0.0620807.
row515,latitude:None,longitude:None.
row516,latitude:51.4511693,longitude:-0.3579759.
row517,latitude:51.5493524,longitude:-0.222223.
row518,latitude:51.4220721,longitude:-0.2052902.
row519,latitude:51.6333948,longitude:-0.1033617.
row520,latitude:51.597416,longitude:-0.1097795.
row521,latitude:51.6068063,longitude:0.0340272.
row522,latitude:None,longitude:None.
row

In [19]:
df_lon.head(10)

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,"Bexley, Greenwich",Abbey Wood,51.4856,0.119687
1,"Ealing, Hammersmith and Fulham",Acton,51.5066,-0.244695
2,Croydon,Addington,44.4206,-76.9782
3,Croydon,Addiscombe,51.3797,-0.0742821
4,Bexley,Albany Park,51.4354,0.125965
5,Redbridge,Aldborough Hatch,,
6,City,Aldgate,51.5142,-0.0757186
7,Westminster,Aldwych,51.5122,-0.118896
8,Brent,Alperton,51.5408,-0.300096
9,Bromley,Anerley,51.4076,-0.0619394
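The retry logic used in the geocoding loop can be isolated into a small helper that also pauses between attempts, which is friendlier to a public service such as Nominatim. The following is a minimal sketch under stated assumptions: `geocode_with_retry` is a hypothetical name, the geocoding function is passed in as an argument, and the stand-in geocoder and its coordinates are placeholders for demonstration only.

```python
import time


def geocode_with_retry(geocode, address, max_retries=5, delay_seconds=1.0):
    """Call geocode(address) until it returns a result or retries run out.

    Makes one initial attempt plus up to max_retries further attempts,
    sleeping delay_seconds between attempts. Returns None if all fail.
    """
    for attempt in range(max_retries + 1):
        location = geocode(address)
        if location is not None:
            return location
        if attempt < max_retries:
            time.sleep(delay_seconds)
    return None


# Stand-in geocoder for demonstration: fails twice, then succeeds
calls = []


def fake_geocode(address):
    calls.append(address)
    return None if len(calls) < 3 else (51.5, -0.1)  # placeholder coordinates


result = geocode_with_retry(fake_geocode, "Aldgate,City", delay_seconds=0)
print(result, len(calls))
```

In the notebook, `geolocator.geocode` could be passed as the `geocode` argument. geopy also ships a `RateLimiter` class in `geopy.extra.rate_limiter` that serves a similar purpose.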


Since the geocoder could not find the latitude and longitude of some neighborhoods, we drop the rows whose latitude and longitude are None

In [20]:
df_lon.dropna(axis=0,inplace=True)
print(df_lon.shape)
df_lon.head(10)

(515, 4)


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,"Bexley, Greenwich",Abbey Wood,51.4856,0.119687
1,"Ealing, Hammersmith and Fulham",Acton,51.5066,-0.244695
2,Croydon,Addington,44.4206,-76.9782
3,Croydon,Addiscombe,51.3797,-0.0742821
4,Bexley,Albany Park,51.4354,0.125965
6,City,Aldgate,51.5142,-0.0757186
7,Westminster,Aldwych,51.5122,-0.118896
8,Brent,Alperton,51.5408,-0.300096
9,Bromley,Anerley,51.4076,-0.0619394
10,Islington,Angel,51.5318,-0.105714
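Dropping missing values does not catch lookups that resolved to the wrong place: Addington (row 2) was geocoded to latitude 44.4206 and longitude -76.9782, which is far outside London. One possible safeguard is a bounding-box check. The sketch below is an assumption rather than part of the original analysis; the box limits are rough, hand-picked values around Greater London and the helper name `in_box` is hypothetical.

```python
# Rough bounding box around Greater London (assumed, hand-picked limits)
LONDON_BOX = {"min_lat": 51.25, "max_lat": 51.75,
              "min_lon": -0.55, "max_lon": 0.35}


def in_box(lat, lon, box=LONDON_BOX):
    """Return True if the point lies inside the bounding box."""
    return (box["min_lat"] <= lat <= box["max_lat"]
            and box["min_lon"] <= lon <= box["max_lon"])


# Coordinates copied from the geocoding output above
rows = [
    ("Abbey Wood", 51.4855716, 0.11968682027131783),
    ("Addington", 44.4206405, -76.978248),
    ("Addiscombe", 51.3796916, -0.0742821),
]

kept = [name for name, lat, lon in rows if in_box(lat, lon)]
print(kept)
```

With the data frame, the same check could be applied row by row, for example `df_lon[df_lon.apply(lambda r: in_box(r['Latitude'], r['Longitude']), axis=1)]`.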


## 4.6. Visualization of the two Cities
After gathering all the necessary data for the two cities, we generate two maps to help visualize them. Every neighborhood is plotted on its city's map.

First, we need the coordinates of the two cities. We obtain them with the same geocoding method used above.

In [21]:
geolocator = Nominatim(user_agent="lo_explorer")
london_coor = geolocator.geocode('London, England')

london_lat = london_coor.latitude
london_lng = london_coor.longitude

print("London latitude: {}, London longitude:{}".format(london_lat,london_lng))

nyc_coor = geolocator.geocode('New York City, New York')

nyc_lat = nyc_coor.latitude
nyc_lng = nyc_coor.longitude

print("NYC latitude: {}, NYC longitude:{}".format(nyc_lat,nyc_lng))


London latitude: 51.5073219, London longitude:-0.1276474
NYC latitude: 40.7127281, NYC longitude:-74.0060152


### 4.6.1. London Map
We use the folium package to generate a map of London

In [23]:
map_lon = folium.Map(location=[london_lat, london_lng], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_lon['Latitude'], df_lon['Longitude'], df_lon['Borough'], df_lon['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_lon)  
    
map_lon

If you are unable to see the map on GitHub, a screenshot of the map is shown below:

![d.png](attachment:d.png)

### 4.6.2. NYC Map
Use the same method to generate a map for New York City

In [24]:
map_nyc = folium.Map(location=[nyc_lat, nyc_lng], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_nyc['Latitude'], df_nyc['Longitude'], df_nyc['Borough'], df_nyc['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='Green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_nyc)  
    
map_nyc

If you are unable to see the map on GitHub, a screenshot of the map is shown below:

![e.png](attachment:e.png)

## 4.7. Venues
After collecting the neighborhood data and generating the two maps, we use the Foursquare API to obtain the venues near each neighborhood.

### 4.7.1. Credential Setup
To set up the API, we need credential information

In [53]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (placeholder; keep real credentials out of shared notebooks)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (placeholder)
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value


### 4.7.2. Function Declaration
To simplify retrieving nearby venues, we define the following function:

In [54]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
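The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, so it would raise an `IndexError` if a venue came back with an empty category list. A more defensive way to parse the items is sketched below. The helper name `extract_venues` is hypothetical, and the sample items are hand-written to mimic the response structure the function above relies on; the second item is invented to show the empty-category case.

```python
def extract_venues(name, lat, lng, items):
    """Turn explore items into rows, tolerating venues without a category."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # fall back to None instead of failing on an empty category list
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng,
                     venue.get("name"),
                     location.get("lat"),
                     location.get("lng"),
                     category))
    return rows


# Hand-written sample items (structure assumed from the function above)
sample_items = [
    {"venue": {"name": "Co-op Food",
               "location": {"lat": 51.48765, "lng": 0.11349},
               "categories": [{"name": "Grocery Store"}]}},
    {"venue": {"name": "Uncategorized Place",
               "location": {"lat": 51.4860, "lng": 0.1190},
               "categories": []}},
]

venue_rows = extract_venues("Abbey Wood", 51.485572, 0.119687, sample_items)
for venue_row in venue_rows:
    print(venue_row)
```

Rows with a `None` category can then be dropped or labeled before the encoding step, depending on how such venues should be treated.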

### 4.7.3. London and NYC Venues
We call the function to get all London and NYC venues

In [55]:
london_venues = getNearbyVenues(names=df_lon['Neighborhood'],
                                   latitudes=df_lon['Latitude'],
                                   longitudes=df_lon['Longitude']
                                  )
london_venues.head()

Abbey Wood
Acton
Addington
Addiscombe
Albany Park
Aldgate
Aldwych
Alperton
Anerley
Angel
Aperfield
Archway
Ardleigh Green
Arkley
Arnos Grove
Balham
Bankside
Barbican
Barking
Barkingside
Barnehurst
Barnes
Barnes Cray
Barnet Gate
Barnsbury
Battersea
Bayswater
Beckenham
Beckton
Becontree
Becontree Heath
Beddington
Bedford Park
Belgravia
Bellingham
Belmont
Belmont
Belsize Park
Belvedere
Bermondsey
Berrylands
Bethnal Green
Bickley
Biggin Hill
Blackfen
Blackfriars
Blackheath
Blackheath Royal Standard
Blackwall
Blendon
Bloomsbury
Botany Bay
Bounds Green
Bow
Bowes Park
Brentford
Brent Cross
Brent Park
Brimsdown
Brixton
Brockley
Bromley
Bromley Common
Brompton
Brondesbury
Brunswick Park
Bulls Cross
Burnt Oak
Burroughs, The
Camberwell
Cambridge Heath
Camden Town
Canary Wharf
Canning Town
Canonbury
Carshalton
Castelnau
Castle Green
Catford
Chadwell Heath
Chalk Farm
Charing Cross
Charlton
Chase Cross
Cheam
Chelsea
Chelsfield
Chessington
Childs Hill
Chinatown
Chinbrook
Chingford
Chislehurst
Church 

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Abbey Wood,51.485572,0.119687,Co-op Food,51.48765,0.11349,Grocery Store
1,Abbey Wood,51.485572,0.119687,Abbey Wood Caravan Club,51.485502,0.120014,Campground
2,Acton,51.506628,-0.244695,Sufi Restaurant,51.50407,-0.243703,Middle Eastern Restaurant
3,Acton,51.506628,-0.244695,Princess Victoria,51.50653,-0.240915,Gastropub
4,Acton,51.506628,-0.244695,Cafe Paulo,51.506751,-0.248901,Breakfast Spot
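As a rough sanity check on the 500-meter radius, the great-circle distance between a neighborhood centre and one of its venues can be computed with the haversine formula. The sketch below uses the first row of the table above (Abbey Wood and Co-op Food); the function name `haversine_m` and the mean Earth radius of 6,371 km are assumptions made for illustration.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


# Abbey Wood centre and the Co-op Food venue, copied from the table above
distance = haversine_m(51.485572, 0.119687, 51.48765, 0.11349)
print(round(distance), "meters")
```

For this pair the distance comes out just under 500 meters, which is consistent with the radius passed to the API.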


In [56]:
nyc_venues = getNearbyVenues(names=df_nyc['Neighborhood'],
                                   latitudes=df_nyc['Latitude'],
                                   longitudes=df_nyc['Longitude']
                                  )
nyc_venues.head()

Wakefield
Co-op City
Eastchester
Fieldston
Riverdale
Kingsbridge
Marble Hill
Woodlawn
Norwood
Williamsbridge
Baychester
Pelham Parkway
City Island
Bedford Park
University Heights
Morris Heights
Fordham
East Tremont
West Farms
High  Bridge
Melrose
Mott Haven
Port Morris
Longwood
Hunts Point
Morrisania
Soundview
Clason Point
Throgs Neck
Country Club
Parkchester
Westchester Square
Van Nest
Morris Park
Belmont
Spuyten Duyvil
North Riverdale
Pelham Bay
Schuylerville
Edgewater Park
Castle Hill
Olinville
Pelham Gardens
Concourse
Unionport
Edenwald
Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Wakefield,40.894705,-73.847201,Lollipops Gelato,40.894123,-73.845892,Dessert Shop
1,Wakefield,40.894705,-73.847201,Rite Aid,40.896649,-73.844846,Pharmacy
2,Wakefield,40.894705,-73.847201,Carvel Ice Cream,40.890487,-73.848568,Ice Cream Shop
3,Wakefield,40.894705,-73.847201,Walgreens,40.896528,-73.8447,Pharmacy
4,Wakefield,40.894705,-73.847201,Dunkin',40.890459,-73.849089,Donut Shop


### 4.7.4. One Hot Encoding
The goal is to determine which types of venue (Venue Category) are found in each neighborhood of NYC and London. However, venue category is a categorical attribute. To simplify model building, we use one-hot encoding to transform this categorical data into numeric data.

After performing one-hot encoding, we group the rows by neighborhood name and take the mean of each column, which gives the relative frequency of each venue category in each neighborhood.
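To make the arithmetic concrete: the mean of a one-hot column within a neighborhood equals the share of that neighborhood's venues belonging to the category. The plain-Python sketch below reproduces this with counts; the helper name `category_frequencies` is hypothetical, and the input contains only the sample rows shown in the venue tables above, not the full data.

```python
from collections import Counter, defaultdict


def category_frequencies(pairs):
    """Map each neighborhood to {category: share of its venues}."""
    counts = defaultdict(Counter)
    for neighborhood, category in pairs:
        counts[neighborhood][category] += 1
    return {
        neighborhood: {category: n / sum(counter.values())
                       for category, n in counter.items()}
        for neighborhood, counter in counts.items()
    }


# (Neighborhood, Venue Category) pairs from the sample rows shown earlier
venues = [
    ("Wakefield", "Dessert Shop"),
    ("Wakefield", "Pharmacy"),
    ("Wakefield", "Ice Cream Shop"),
    ("Wakefield", "Pharmacy"),
    ("Wakefield", "Donut Shop"),
    ("Abbey Wood", "Grocery Store"),
    ("Abbey Wood", "Campground"),
]

frequencies = category_frequencies(venues)
print(frequencies["Wakefield"])
```

In these sample rows, two of the five Wakefield venues are pharmacies, so the Pharmacy frequency is 2/5. The `get_dummies` and `groupby(...).mean()` combination computes the same ratios for every category at once.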

In [57]:
# one hot encoding for NYC
nyc_onehot = pd.get_dummies(nyc_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
nyc_onehot['Neighborhood'] = nyc_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [nyc_onehot.columns[-1]] + list(nyc_onehot.columns[:-1])
nyc_onehot = nyc_onehot[fixed_columns]

nyc_onehot.head()

Unnamed: 0,Yoga Studio,ATM,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport Terminal,American Restaurant,Antique Shop,Arcade,...,Volleyball Court,Warehouse Store,Waste Facility,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [59]:
# one hot encoding for London
london_onehot = pd.get_dummies(london_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
london_onehot['Neighborhood'] = london_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [london_onehot.columns[-1]] + list(london_onehot.columns[:-1])
london_onehot = london_onehot[fixed_columns]

london_onehot.head()

Unnamed: 0,Zoo Exhibit,Accessories Store,Acupuncturist,Adult Boutique,Afghan Restaurant,African Restaurant,Airport,Airport Service,Airport Terminal,American Restaurant,...,Waterfront,Whisky Bar,Windmill,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Group the two tables by neighborhood and calculate the frequency of each category

In [61]:
nyc_grouped = nyc_onehot.groupby('Neighborhood').mean().reset_index()
nyc_grouped

Unnamed: 0,Neighborhood,Yoga Studio,ATM,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport Terminal,American Restaurant,Antique Shop,...,Volleyball Court,Warehouse Store,Waste Facility,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Allerton,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.000000,0.0,0.0
1,Annadale,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.000000,0.0,0.0
2,Arden Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.000000,0.0,0.0
3,Arlington,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.000000,0.0,0.0
4,Arrochar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.000000,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
296,Woodhaven,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.000000,0.0,0.0
297,Woodlawn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.000000,0.0,0.0
298,Woodrow,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.000000,0.0,0.0
299,Woodside,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.039474,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.013158,0.0,0.0


In [62]:
london_grouped = london_onehot.groupby('Neighborhood').mean().reset_index()
london_grouped

Unnamed: 0,Neighborhood,Zoo Exhibit,Accessories Store,Acupuncturist,Adult Boutique,Afghan Restaurant,African Restaurant,Airport,Airport Service,Airport Terminal,...,Waterfront,Whisky Bar,Windmill,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio
0,Abbey Wood,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.00,0.000000,0.0,0.0,0.0,0.0,0.0
1,Acton,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.00,0.037037,0.0,0.0,0.0,0.0,0.0
2,Addiscombe,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.00,0.000000,0.0,0.0,0.0,0.0,0.0
3,Albany Park,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.00,0.000000,0.0,0.0,0.0,0.0,0.0
4,Aldgate,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.01,0.010000,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
496,Woodside Park,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.00,0.000000,0.0,0.0,0.0,0.0,0.0
497,Woolwich,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,...,0.0,0.0,0.0,0.00,0.000000,0.0,0.0,0.0,0.0,0.0
498,Wormwood Scrubs,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.00,0.000000,0.0,0.0,0.0,0.0,0.0
499,Yeading,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.00,0.000000,0.0,0.0,0.0,0.0,0.0
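
Each value in the grouped tables above is the mean of a 0/1 indicator column, so it can be read as the share of a neighborhood's venues that fall into a category. The sketch below illustrates this on a small, hypothetical data set; the `venues` frame and its contents are made up for illustration and are not part of the project data.

```python
import pandas as pd

# Hypothetical toy data: one row per venue, with its neighborhood and category.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Pub", "Pub", "Café", "Park", "Park", "Park"],
})

# One-hot encode the category, then put the neighborhood back as the first column.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of a 0/1 column is the fraction of venues in that category.
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

Because every venue belongs to exactly one category, the frequencies in each row sum to 1.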


## 4.8. Top Venues
The next step is to rank the most common venue categories in each neighborhood of NYC and London.

For convenience, the following function returns the most common venue categories for a single row:

In [63]:
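def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name) and keep the category frequencies
    row_categories = row.iloc[1:]
    # sort the frequencies from most to least common
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    # return the names of the top categories
    return row_categories_sorted.index.values[0:num_top_venues]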
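
As a quick check of how this function behaves, the sketch below applies it to a single hypothetical row laid out like a row of `london_grouped` (name first, then category frequencies). The row values are invented for illustration.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Same logic as the function above: skip the name, sort the
    # frequencies in descending order, and keep the top labels.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical row: neighborhood name followed by category frequencies.
row = pd.Series({"Neighborhood": "Example", "Pub": 0.5, "Café": 0.3, "Park": 0.2, "Zoo": 0.0})
top3 = return_most_common_venues(row, 3)
print(list(top3))
```

Note that when a neighborhood has fewer distinct categories than `num_top_venues`, the remaining slots are filled by categories with a frequency of 0.0. This is likely why the same trailing categories recur in the tables for neighborhoods with few venues.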

Create two new data frames to save the most common venues for NYC and London

### 4.8.1. Top Venues: London

In [79]:
import numpy as np
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # ranks beyond 3rd fall outside `indicators` and take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
london_venues_sorted = pd.DataFrame(columns=columns)
london_venues_sorted['Neighborhood'] = london_grouped['Neighborhood']

for ind in np.arange(london_grouped.shape[0]):
    london_venues_sorted.iloc[ind, 1:] = return_most_common_venues(london_grouped.iloc[ind, :], num_top_venues)

london_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbey Wood,Grocery Store,Campground,Food Truck,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant,Event Service,Event Space
1,Acton,Grocery Store,Café,Breakfast Spot,Bakery,Fish & Chips Shop,Gastropub,Coffee Shop,Middle Eastern Restaurant,Fast Food Restaurant,Japanese Restaurant
2,Addiscombe,Park,Grocery Store,Bakery,Fast Food Restaurant,Chinese Restaurant,Café,Cosmetics Shop,Pub,Fish & Chips Shop,Event Service
3,Albany Park,Pub,Train Station,Grocery Store,Indian Restaurant,Adult Boutique,Farmers Market,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant
4,Aldgate,Hotel,Coffee Shop,Cocktail Bar,Café,Pizza Place,Gym / Fitness Center,Middle Eastern Restaurant,Restaurant,Pub,Indian Restaurant
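
The column labels above are built by catching the error raised when the rank falls outside the three-element `indicators` list. That works for ten columns, but it would label rank 21 as "21th". An alternative is a small helper that computes the suffix directly; `ordinal` is a hypothetical name introduced here for illustration, not part of the original notebook.

```python
def ordinal(n):
    # 11, 12 and 13 take "th" even though they end in 1, 2 and 3.
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top_venues = 10
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns[:4])
```

For `num_top_venues = 10` this yields the same labels as the loop in the notebook.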


### 4.8.2. Top Venues: NYC

In [72]:
import numpy as np
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # ranks beyond 3rd fall outside `indicators` and take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
nyc_venues_sorted = pd.DataFrame(columns=columns)
nyc_venues_sorted['Neighborhood'] = nyc_grouped['Neighborhood']

for ind in np.arange(nyc_grouped.shape[0]):
    nyc_venues_sorted.iloc[ind, 1:] = return_most_common_venues(nyc_grouped.iloc[ind, :], num_top_venues)

nyc_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allerton,Pizza Place,Supermarket,Spa,Deli / Bodega,Chinese Restaurant,Bus Station,Fast Food Restaurant,Bakery,Grocery Store,Check Cashing Service
1,Annadale,Pizza Place,American Restaurant,Bakery,Liquor Store,Train Station,Sushi Restaurant,Diner,Restaurant,Cosmetics Shop,Deli / Bodega
2,Arden Heights,Pizza Place,Pharmacy,Deli / Bodega,Bus Stop,Coffee Shop,Women's Store,Event Service,Event Space,Exhibit,Eye Doctor
3,Arlington,Intersection,Deli / Bodega,Bus Stop,Boat or Ferry,Coffee Shop,Women's Store,Filipino Restaurant,Event Space,Exhibit,Eye Doctor
4,Arrochar,Bus Stop,Pizza Place,Italian Restaurant,Deli / Bodega,Food Truck,Middle Eastern Restaurant,Outdoors & Recreation,Liquor Store,Bagel Shop,Sandwich Place


## 4.9. Modeling: KMeans Clustering
Use the k-means clustering algorithm to group neighborhoods with similar venue profiles. The neighborhoods of each city are grouped into 5 clusters.
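
To make the idea behind k-means concrete, here is a minimal NumPy sketch of the basic Lloyd iteration: assign each row to its nearest center, then move each center to the mean of its members. It is an illustration only, not the code scikit-learn runs; scikit-learn's `KMeans` adds refinements such as k-means++ initialization. The function name `lloyd_kmeans` and the toy matrix `X` are assumptions made for this example.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd iteration for k-means (illustrative, not scikit-learn's code)."""
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct rows chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for every row.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its members.
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old center
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break  # centers stopped moving
        centers = new_centers
    return labels, centers

# Hypothetical frequency vectors: two "pub-heavy" rows and two "park-heavy" rows.
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
labels, centers = lloyd_kmeans(X, k=2)
print(labels)
```

The label numbers themselves carry no meaning; only the grouping does, which is why cluster numbers cannot be compared directly between the NYC and London models.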

### 4.9.1. Modeling: NYC

In [69]:
# set number of clusters
kclusters = 5

# keep only the venue-frequency columns as features
nyc_grouped_clustering = nyc_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans_nyc = KMeans(n_clusters=kclusters, random_state=0).fit(nyc_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans_nyc.labels_[0:10] 

array([2, 2, 0, 0, 0, 3, 3, 2, 3, 2], dtype=int32)

Create a new dataframe, including all NYC neighborhoods, their top 10 venues, and the cluster labels

In [73]:
# add clustering labels
nyc_venues_sorted.insert(0, 'Cluster Labels', kmeans_nyc.labels_)

nyc_merged = df_nyc

nyc_merged = nyc_merged.join(nyc_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

nyc_merged.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bronx,Wakefield,40.894705,-73.847201,2.0,Pharmacy,Deli / Bodega,Sandwich Place,Donut Shop,Dessert Shop,Ice Cream Shop,Laundromat,Food,Food Truck,Event Service
1,Bronx,Co-op City,40.874294,-73.829939,2.0,Bus Station,Fast Food Restaurant,Fried Chicken Joint,Restaurant,Grocery Store,Park,Pharmacy,Bagel Shop,Liquor Store,Pizza Place
2,Bronx,Eastchester,40.887556,-73.827806,0.0,Caribbean Restaurant,Deli / Bodega,Bus Stop,Diner,Convenience Store,Pizza Place,Platform,Seafood Restaurant,Bowling Alley,Fast Food Restaurant
3,Bronx,Fieldston,40.895437,-73.905643,3.0,Plaza,Medical Supply Store,Bus Station,River,Filipino Restaurant,Event Service,Event Space,Exhibit,Eye Doctor,Factory
4,Bronx,Riverdale,40.890834,-73.912585,3.0,Bus Station,Park,Baseball Field,Moving Target,Medical Supply Store,Bank,Gym,Playground,Plaza,Home Service


### 4.9.2. Modeling: London
Run the same algorithm for London

In [74]:
# set number of clusters
kclusters = 5

# keep only the venue-frequency columns as features
london_grouped_clustering = london_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans_london = KMeans(n_clusters=kclusters, random_state=0).fit(london_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans_london.labels_[0:10] 

array([1, 1, 1, 1, 4, 4, 4, 1, 4, 1], dtype=int32)

In [80]:
# add clustering labels
london_venues_sorted.insert(0, 'Cluster Labels', kmeans_london.labels_)

london_merged = df_lon

london_merged = london_merged.join(london_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

london_merged.head(10)

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Bexley, Greenwich",Abbey Wood,51.4856,0.119687,1.0,Grocery Store,Campground,Food Truck,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant,Event Service,Event Space
1,"Ealing, Hammersmith and Fulham",Acton,51.5066,-0.244695,1.0,Grocery Store,Café,Breakfast Spot,Bakery,Fish & Chips Shop,Gastropub,Coffee Shop,Middle Eastern Restaurant,Fast Food Restaurant,Japanese Restaurant
2,Croydon,Addington,44.4206,-76.9782,,,,,,,,,,,
3,Croydon,Addiscombe,51.3797,-0.0742821,1.0,Park,Grocery Store,Bakery,Fast Food Restaurant,Chinese Restaurant,Café,Cosmetics Shop,Pub,Fish & Chips Shop,Event Service
4,Bexley,Albany Park,51.4354,0.125965,1.0,Pub,Train Station,Grocery Store,Indian Restaurant,Adult Boutique,Farmers Market,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant
6,City,Aldgate,51.5142,-0.0757186,4.0,Hotel,Coffee Shop,Cocktail Bar,Café,Pizza Place,Gym / Fitness Center,Middle Eastern Restaurant,Restaurant,Pub,Indian Restaurant
7,Westminster,Aldwych,51.5122,-0.118896,4.0,Pub,Theater,Coffee Shop,Burger Joint,Restaurant,Sandwich Place,Cocktail Bar,Hotel,Bar,Dessert Shop
8,Brent,Alperton,51.5408,-0.300096,4.0,Supermarket,Train Station,Sandwich Place,Asian Restaurant,Metro Station,Bus Stop,Café,Gym / Fitness Center,Indian Restaurant,Ethiopian Restaurant
9,Bromley,Anerley,51.4076,-0.0619394,1.0,Grocery Store,Park,Fast Food Restaurant,Supermarket,Convenience Store,Cricket Ground,Cuban Restaurant,Electronics Store,English Restaurant,Escape Room
10,Islington,Angel,51.5318,-0.105714,4.0,Pub,Coffee Shop,Café,Restaurant,Arts & Crafts Store,Gym / Fitness Center,Indian Restaurant,Food Truck,Mediterranean Restaurant,Burrito Place


### 4.9.3. Drop all empty values

In [83]:
print(london_merged.shape)
london_merged.dropna(axis=0,inplace=True)
print(london_merged.shape)
london_merged.head(10)

(515, 15)
(507, 15)


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Bexley, Greenwich",Abbey Wood,51.4856,0.119687,1.0,Grocery Store,Campground,Food Truck,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant,Event Service,Event Space
1,"Ealing, Hammersmith and Fulham",Acton,51.5066,-0.244695,1.0,Grocery Store,Café,Breakfast Spot,Bakery,Fish & Chips Shop,Gastropub,Coffee Shop,Middle Eastern Restaurant,Fast Food Restaurant,Japanese Restaurant
3,Croydon,Addiscombe,51.3797,-0.0742821,1.0,Park,Grocery Store,Bakery,Fast Food Restaurant,Chinese Restaurant,Café,Cosmetics Shop,Pub,Fish & Chips Shop,Event Service
4,Bexley,Albany Park,51.4354,0.125965,1.0,Pub,Train Station,Grocery Store,Indian Restaurant,Adult Boutique,Farmers Market,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant
6,City,Aldgate,51.5142,-0.0757186,4.0,Hotel,Coffee Shop,Cocktail Bar,Café,Pizza Place,Gym / Fitness Center,Middle Eastern Restaurant,Restaurant,Pub,Indian Restaurant
7,Westminster,Aldwych,51.5122,-0.118896,4.0,Pub,Theater,Coffee Shop,Burger Joint,Restaurant,Sandwich Place,Cocktail Bar,Hotel,Bar,Dessert Shop
8,Brent,Alperton,51.5408,-0.300096,4.0,Supermarket,Train Station,Sandwich Place,Asian Restaurant,Metro Station,Bus Stop,Café,Gym / Fitness Center,Indian Restaurant,Ethiopian Restaurant
9,Bromley,Anerley,51.4076,-0.0619394,1.0,Grocery Store,Park,Fast Food Restaurant,Supermarket,Convenience Store,Cricket Ground,Cuban Restaurant,Electronics Store,English Restaurant,Escape Room
10,Islington,Angel,51.5318,-0.105714,4.0,Pub,Coffee Shop,Café,Restaurant,Arts & Crafts Store,Gym / Fitness Center,Indian Restaurant,Food Truck,Mediterranean Restaurant,Burrito Place
11,Bromley,Aperfield,51.3102,0.0427001,1.0,Grocery Store,Home Service,Coffee Shop,Supermarket,Falafel Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room
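
The rows removed here are the ones that found no match in the earlier join, such as the Addington row shown in the previous output, whose cluster label and venue columns are all empty. The missing values also explain why the cluster labels are displayed as floats (1.0, 4.0). The sketch below reproduces the effect on hypothetical stand-ins for `df_lon` and `london_venues_sorted`, and shows one possible way to restore integer labels afterwards; the cast is a suggestion, not a step taken in the notebook.

```python
import pandas as pd

# Hypothetical stand-ins for df_lon and london_venues_sorted.
neighborhoods = pd.DataFrame({
    "Neighborhood": ["Abbey Wood", "Addington", "Aldgate"],
    "Latitude": [51.49, 44.42, 51.51],
})
venues_sorted = pd.DataFrame({
    "Cluster Labels": [1, 4],
    "Neighborhood": ["Abbey Wood", "Aldgate"],
    "1st Most Common Venue": ["Grocery Store", "Hotel"],
})

# Left join: "Addington" has no match, so its new columns become NaN
# and the integer labels are upcast to float.
merged = neighborhoods.join(venues_sorted.set_index("Neighborhood"), on="Neighborhood")

# Drop the unmatched rows, then restore integer labels.
cleaned = merged.dropna(axis=0).copy()
cleaned["Cluster Labels"] = cleaned["Cluster Labels"].astype(int)
print(cleaned)
```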


In [84]:
print(nyc_merged.shape)
nyc_merged.dropna(axis=0,inplace=True)
print(nyc_merged.shape)
nyc_merged.head(10)

(306, 15)
(305, 15)


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bronx,Wakefield,40.894705,-73.847201,2.0,Pharmacy,Deli / Bodega,Sandwich Place,Donut Shop,Dessert Shop,Ice Cream Shop,Laundromat,Food,Food Truck,Event Service
1,Bronx,Co-op City,40.874294,-73.829939,2.0,Bus Station,Fast Food Restaurant,Fried Chicken Joint,Restaurant,Grocery Store,Park,Pharmacy,Bagel Shop,Liquor Store,Pizza Place
2,Bronx,Eastchester,40.887556,-73.827806,0.0,Caribbean Restaurant,Deli / Bodega,Bus Stop,Diner,Convenience Store,Pizza Place,Platform,Seafood Restaurant,Bowling Alley,Fast Food Restaurant
3,Bronx,Fieldston,40.895437,-73.905643,3.0,Plaza,Medical Supply Store,Bus Station,River,Filipino Restaurant,Event Service,Event Space,Exhibit,Eye Doctor,Factory
4,Bronx,Riverdale,40.890834,-73.912585,3.0,Bus Station,Park,Baseball Field,Moving Target,Medical Supply Store,Bank,Gym,Playground,Plaza,Home Service
5,Bronx,Kingsbridge,40.881687,-73.902818,2.0,Pizza Place,Bar,Sandwich Place,Mexican Restaurant,Bakery,Fried Chicken Joint,Latin American Restaurant,Donut Shop,Café,Supermarket
6,Manhattan,Marble Hill,40.876551,-73.91066,2.0,Coffee Shop,Gym,Discount Store,Sandwich Place,Yoga Studio,Video Game Store,Supplement Shop,Pharmacy,Donut Shop,Diner
7,Bronx,Woodlawn,40.898273,-73.867315,0.0,Deli / Bodega,Pub,Food & Drink Shop,Pizza Place,Playground,Food Truck,Park,Grocery Store,Pharmacy,Liquor Store
8,Bronx,Norwood,40.877224,-73.879391,2.0,Pizza Place,Park,Bank,Pharmacy,Deli / Bodega,Fast Food Restaurant,Restaurant,Mexican Restaurant,Coffee Shop,Sandwich Place
9,Bronx,Williamsbridge,40.881039,-73.857446,3.0,Caribbean Restaurant,Soup Place,Bar,Nightclub,Food Stand,Food Court,Ethiopian Restaurant,Event Service,Event Space,Exhibit


# 5. Results
After the models are fitted, plot every neighborhood, colored by its cluster label, to visualize the results.

As with the maps plotted above, we use Folium to draw the map.

## 5.1. Map Result: NYC

In [86]:
map2_nyc = folium.Map(location=[nyc_lat, nyc_lng], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(nyc_merged['Latitude'], nyc_merged['Longitude'], nyc_merged['Neighborhood'], nyc_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(int(cluster+1)), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)],
        fill_opacity=0.7).add_to(map2_nyc)
       
map2_nyc

If you are unable to see the map on GitHub, below is a screenshot

![1.JPG](attachment:1.JPG)
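
One detail of the marker code is worth spelling out: the popup text shows `cluster + 1`, while the marker color is looked up with `rainbow[int(cluster-1)]`. With labels 0 to 4, label 0 maps to index -1, which Python resolves to the last element of the list, so every cluster still receives its own color. The sketch below demonstrates this with placeholder strings in place of the hex codes.

```python
# Placeholder names standing in for the hex codes stored in `rainbow`.
rainbow = ["c0", "c1", "c2", "c3", "c4"]
kclusters = len(rainbow)

for cluster in range(kclusters):
    popup_number = cluster + 1      # number shown in the popup
    color = rainbow[cluster - 1]    # color used for the marker
    print(cluster, popup_number, color)

# Colors actually used for labels 0..4, in label order.
used = [rainbow[c - 1] for c in range(kclusters)]
```

The colors are therefore a rotation of the palette: distinct per cluster, but not in the same order as the labels.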

## 5.2. Map Result: London

In [88]:
map2_london = folium.Map(location=[london_lat, london_lng], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(london_merged['Latitude'], london_merged['Longitude'], london_merged['Neighborhood'], london_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(int(cluster+1)), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)],
        fill_opacity=0.7).add_to(map2_london)
       
map2_london

If you are unable to see the map on GitHub, below is a screenshot

![2.JPG](attachment:2.JPG)

## 5.3. Examine Clusters
We print a data frame for each different cluster for NYC and London.
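
Each cell in this section uses the same pattern: filter the rows by cluster label, and keep only the neighborhood name plus the ranked venue columns. The sketch below shows the pattern on a shortened, hypothetical frame with the same column order as `nyc_merged`; the coordinates are rounded and only one venue column is kept.

```python
import pandas as pd

# Hypothetical frame with the same column order as nyc_merged (shortened).
merged = pd.DataFrame({
    "Borough": ["Bronx", "Bronx", "Bronx"],
    "Neighborhood": ["Wakefield", "Eastchester", "Woodlawn"],
    "Latitude": [40.89, 40.89, 40.90],
    "Longitude": [-73.85, -73.83, -73.87],
    "Cluster Labels": [2, 0, 0],
    "1st Most Common Venue": ["Pharmacy", "Caribbean Restaurant", "Deli / Bodega"],
})

# Position 1 is "Neighborhood"; positions 5 onward are the ranked venue columns.
keep = merged.columns[[1] + list(range(5, merged.shape[1]))]

# Label 0 corresponds to what the text calls "Cluster 1".
cluster_1 = merged.loc[merged["Cluster Labels"] == 0, keep]
print(cluster_1)
```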

### 5.3.1. NYC
Cluster 1

In [89]:
nyc_merged.loc[nyc_merged['Cluster Labels'] == 0, nyc_merged.columns[[1] + list(range(5, nyc_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Eastchester,Caribbean Restaurant,Deli / Bodega,Bus Stop,Diner,Convenience Store,Pizza Place,Platform,Seafood Restaurant,Bowling Alley,Fast Food Restaurant
7,Woodlawn,Deli / Bodega,Pub,Food & Drink Shop,Pizza Place,Playground,Food Truck,Park,Grocery Store,Pharmacy,Liquor Store
28,Throgs Neck,Deli / Bodega,Bar,Juice Bar,Asian Restaurant,Coffee Shop,American Restaurant,Pizza Place,Italian Restaurant,Sports Bar,Event Space
32,Van Nest,Pizza Place,Deli / Bodega,Middle Eastern Restaurant,Donut Shop,Food Truck,Bakery,BBQ Joint,Coffee Shop,Board Shop,Film Studio
39,Edgewater Park,Italian Restaurant,Deli / Bodega,Pizza Place,Donut Shop,Coffee Shop,Japanese Restaurant,Asian Restaurant,Park,Fast Food Restaurant,Spa
72,East New York,Deli / Bodega,Pizza Place,Plaza,Food Truck,Gym,Metro Station,Caribbean Restaurant,Fast Food Restaurant,Event Service,Child Care Service
78,Coney Island,Baseball Stadium,Theme Park Ride / Attraction,Beach,Pharmacy,Monument / Landmark,Pizza Place,Skating Rink,Caribbean Restaurant,Brewery,Deli / Bodega
89,Ocean Hill,Deli / Bodega,Food,Fried Chicken Joint,Bus Stop,Supermarket,Southern / Soul Food Restaurant,Grocery Store,Bakery,Playground,Donut Shop
144,Glendale,Pizza Place,Brewery,Deli / Bodega,Food & Drink Shop,Arts & Crafts Store,Women's Store,Event Space,Exhibit,Eye Doctor,Factory
148,South Ozone Park,Park,Deli / Bodega,Bar,Food Truck,Hotel,Grocery Store,Sandwich Place,Donut Shop,Fast Food Restaurant,Fish Market


Cluster 2

In [90]:
nyc_merged.loc[nyc_merged['Cluster Labels'] == 1, nyc_merged.columns[[1] + list(range(5, nyc_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
192,Somerville,Park,Women's Store,Entertainment Service,Event Service,Event Space,Exhibit,Eye Doctor,Factory,Falafel Restaurant,Farm
203,Todt Hill,Park,Women's Store,Entertainment Service,Event Service,Event Space,Exhibit,Eye Doctor,Factory,Falafel Restaurant,Farm


Cluster 3

In [91]:
nyc_merged.loc[nyc_merged['Cluster Labels'] == 2, nyc_merged.columns[[1] + list(range(5, nyc_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Wakefield,Pharmacy,Deli / Bodega,Sandwich Place,Donut Shop,Dessert Shop,Ice Cream Shop,Laundromat,Food,Food Truck,Event Service
1,Co-op City,Bus Station,Fast Food Restaurant,Fried Chicken Joint,Restaurant,Grocery Store,Park,Pharmacy,Bagel Shop,Liquor Store,Pizza Place
5,Kingsbridge,Pizza Place,Bar,Sandwich Place,Mexican Restaurant,Bakery,Fried Chicken Joint,Latin American Restaurant,Donut Shop,Café,Supermarket
6,Marble Hill,Coffee Shop,Gym,Discount Store,Sandwich Place,Yoga Studio,Video Game Store,Supplement Shop,Pharmacy,Donut Shop,Diner
8,Norwood,Pizza Place,Park,Bank,Pharmacy,Deli / Bodega,Fast Food Restaurant,Restaurant,Mexican Restaurant,Coffee Shop,Sandwich Place
...,...,...,...,...,...,...,...,...,...,...,...
295,Highland Park,Grocery Store,Latin American Restaurant,Garden,Tennis Court,Metro Station,Park,Liquor Store,Pizza Place,Gym / Fitness Center,Furniture / Home Store
297,Bronxdale,Performing Arts Venue,Breakfast Spot,Mexican Restaurant,Chinese Restaurant,Pizza Place,Eastern European Restaurant,Supermarket,Bank,Spanish Restaurant,Gym
298,Allerton,Pizza Place,Supermarket,Spa,Deli / Bodega,Chinese Restaurant,Bus Station,Fast Food Restaurant,Bakery,Grocery Store,Check Cashing Service
299,Kingsbridge Heights,Pizza Place,Chinese Restaurant,Spanish Restaurant,Bus Station,Grocery Store,Coffee Shop,Food,Shoe Store,Park,Mexican Restaurant


Cluster 4

In [92]:
nyc_merged.loc[nyc_merged['Cluster Labels'] == 3, nyc_merged.columns[[1] + list(range(5, nyc_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Fieldston,Plaza,Medical Supply Store,Bus Station,River,Filipino Restaurant,Event Service,Event Space,Exhibit,Eye Doctor,Factory
4,Riverdale,Bus Station,Park,Baseball Field,Moving Target,Medical Supply Store,Bank,Gym,Playground,Plaza,Home Service
9,Williamsbridge,Caribbean Restaurant,Soup Place,Bar,Nightclub,Food Stand,Food Court,Ethiopian Restaurant,Event Service,Event Space,Exhibit
12,City Island,Deli / Bodega,Seafood Restaurant,Thrift / Vintage Store,Harbor / Marina,Arts & Crafts Store,Grocery Store,Park,Boat or Ferry,Baseball Field,Bar
22,Port Morris,Baseball Field,Furniture / Home Store,Peruvian Restaurant,Metro Station,Donut Shop,Distillery,Restaurant,Latin American Restaurant,Brewery,Storage Facility
...,...,...,...,...,...,...,...,...,...,...,...
294,Malba,Rest Area,Bus Line,Tennis Court,Women's Store,Field,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Eye Doctor
301,Hudson Yards,Gym / Fitness Center,Italian Restaurant,American Restaurant,Café,Hotel,Restaurant,Coffee Shop,Gym,Dog Run,Park
303,Bayswater,Playground,Women's Store,Field,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Eye Doctor,Factory,Falafel Restaurant
304,Queensbridge,Hotel,Hotel Bar,Athletics & Sports,Performing Arts Venue,Platform,Sandwich Place,Cocktail Bar,Roof Deck,Gym / Fitness Center,Scenic Lookout


Cluster 5

In [96]:
nyc_merged.loc[nyc_merged['Cluster Labels'] == 4, nyc_merged.columns[[1] + list(range(5, nyc_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
202,Grymes Hill,Dog Run,Women's Store,Field,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Eye Doctor,Factory,Falafel Restaurant


### 5.3.2. London

Cluster 1

In [97]:
london_merged.loc[london_merged['Cluster Labels'] == 0, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
46,Bickley,Cosmetics Shop,Cricket Ground,Train Station,Home Service,Farm,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant
92,Chelsfield,Train Station,Pizza Place,Fast Food Restaurant,Pharmacy,Yoga Studio,Fabric Shop,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room
113,Coulsdon,Park,Pool Hall,Thai Restaurant,Platform,Fabric Shop,Duty-free Shop,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room
127,Crystal Palace,Platform,Outdoor Sculpture,Farm,Sculpture Garden,Park,Garden,Track Stadium,Gym / Fitness Center,Train Station,Athletics & Sports
152,Eden Park,English Restaurant,Hotel,Sports Club,Train Station,Soccer Stadium,Yoga Studio,Falafel Restaurant,Eastern European Restaurant,Electronics Store,Escape Room
159,Elmstead,Platform,Train Station,Yoga Studio,Falafel Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant
167,Falconwood,English Restaurant,Trail,Train Station,Platform,Other Repair Shop,Café,Bus Stop,Yoga Studio,Fabric Shop,Electronics Store
186,Gidea Park,Grocery Store,Bar,Train Station,Platform,Bus Stop,Falafel Restaurant,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room
193,Grange Park,Indian Restaurant,Golf Course,Train Station,English Restaurant,Falafel Restaurant,Eastern European Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Event Service
203,Hadley Wood,Convenience Store,Train Station,Farm,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant,Event Service,Event Space


Cluster 2

In [98]:
london_merged.loc[london_merged['Cluster Labels'] == 1, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbey Wood,Grocery Store,Campground,Food Truck,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant,Event Service,Event Space
1,Acton,Grocery Store,Café,Breakfast Spot,Bakery,Fish & Chips Shop,Gastropub,Coffee Shop,Middle Eastern Restaurant,Fast Food Restaurant,Japanese Restaurant
3,Addiscombe,Park,Grocery Store,Bakery,Fast Food Restaurant,Chinese Restaurant,Café,Cosmetics Shop,Pub,Fish & Chips Shop,Event Service
4,Albany Park,Pub,Train Station,Grocery Store,Indian Restaurant,Adult Boutique,Farmers Market,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant
9,Anerley,Grocery Store,Park,Fast Food Restaurant,Supermarket,Convenience Store,Cricket Ground,Cuban Restaurant,Electronics Store,English Restaurant,Escape Room
...,...,...,...,...,...,...,...,...,...,...,...
498,West Drayton,Pub,Hotel,Bed & Breakfast,Grocery Store,Indian Restaurant,Coworking Space,Falafel Restaurant,Eastern European Restaurant,Electronics Store,English Restaurant
500,West Green,Grocery Store,Turkish Restaurant,Breakfast Spot,Bus Stop,Park,Lounge,Steakhouse,Japanese Restaurant,Coffee Shop,Bar
504,West Harrow,Indian Restaurant,Warehouse Store,Grocery Store,Metro Station,Park,Fabric Shop,Duty-free Shop,Eastern European Restaurant,Electronics Store,English Restaurant
525,Woodside Park,Café,Grocery Store,Soccer Field,Falafel Restaurant,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant,Event Service


Cluster 3

In [99]:
london_merged.loc[london_merged['Cluster Labels'] == 2, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
96,Chinbrook,Park,Dance Studio,Farm,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant,Event Service,Event Space
192,Grahame Park,Park,Salon / Barbershop,Hobby Shop,Bus Stop,Fabric Shop,Duty-free Shop,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room
201,Hackney Marshes,Park,Bike Trail,Farm,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant,Event Service,Event Space
229,Havering-atte-Bower,Park,Farm,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant,Event Service,Event Space,Exhibit
287,Little Ilford,Park,Farm,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant,Event Service,Event Space,Exhibit
294,Loxford,Park,Pool,Event Service,Falafel Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant
331,North Cray,Park,Farm,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant,Event Service,Event Space,Exhibit
332,North End,Park,Farm,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant,Event Service,Event Space,Exhibit
341,Northwood,Park,Golf Course,Food Truck,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant,Event Service,Event Space
378,Rainham,Park,Chinese Restaurant,Fish & Chips Shop,Falafel Restaurant,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant,Event Service


Cluster 4

In [100]:
london_merged.loc[london_merged['Cluster Labels'] == 3, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,Barnehurst,Pub,Asian Restaurant,Pizza Place,Middle Eastern Restaurant,Farm,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant,Event Service
33,Beddington,Pub,Park,Hardware Store,Indian Restaurant,Adult Boutique,Farmers Market,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant
47,Biggin Hill,Pub,Airport,Airport Service,Massage Studio,Farm,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant
55,Botany Bay,Pub,Sports Club,Daycare,Food & Drink Shop,Food Court,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant
67,Bromley Common,Pub,Bus Station,Gas Station,Fast Food Restaurant,Food,Flower Shop,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant
71,Bulls Cross,Pub,Soccer Field,Garden,Park,Creperie,Cricket Ground,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant
82,Castelnau,Pub,French Restaurant,Café,Lake,Park,Coffee Shop,Fabric Shop,Eastern European Restaurant,Electronics Store,English Restaurant
83,Castle Green,Pub,Go Kart Track,Bus Stop,Skate Park,Fabric Shop,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant
98,Chislehurst,Pub,Gastropub,Indian Restaurant,Café,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant,Event Service
136,Derry Downs,Pub,Photography Studio,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant,Event Service,Event Space


Cluster 5

In [101]:
london_merged.loc[london_merged['Cluster Labels'] == 4, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Aldgate,Hotel,Coffee Shop,Cocktail Bar,Café,Pizza Place,Gym / Fitness Center,Middle Eastern Restaurant,Restaurant,Pub,Indian Restaurant
7,Aldwych,Pub,Theater,Coffee Shop,Burger Joint,Restaurant,Sandwich Place,Cocktail Bar,Hotel,Bar,Dessert Shop
8,Alperton,Supermarket,Train Station,Sandwich Place,Asian Restaurant,Metro Station,Bus Stop,Café,Gym / Fitness Center,Indian Restaurant,Ethiopian Restaurant
10,Angel,Pub,Coffee Shop,Café,Restaurant,Arts & Crafts Store,Gym / Fitness Center,Indian Restaurant,Food Truck,Mediterranean Restaurant,Burrito Place
12,Archway,Coffee Shop,Grocery Store,Pub,Pizza Place,Café,Italian Restaurant,Japanese Restaurant,Gym / Fitness Center,Gastropub,Asian Restaurant
...,...,...,...,...,...,...,...,...,...,...,...
523,Woodlands,Café,Pub,Fast Food Restaurant,Bus Stop,Restaurant,Supermarket,Breakfast Spot,River,Convenience Store,Czech Restaurant
524,Woodside,Tram Station,Indian Restaurant,Chinese Restaurant,Park,Falafel Restaurant,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant
526,Woolwich,Pub,Grocery Store,Fast Food Restaurant,Plaza,Coffee Shop,Clothing Store,Bakery,Pharmacy,Asian Restaurant,Supermarket
528,Wormwood Scrubs,Gym,Track Stadium,Baseball Field,Bus Stop,Park,Yoga Studio,Fabric Shop,Eastern European Restaurant,Electronics Store,English Restaurant


# 6. Discussion
It can be seen that London has a wide variety of neighborhoods. Each cluster has more than 10 different neighborhoods. In contrast, the venues of New York City seem less diverse than those of London. For NYC, clusters 1, 3, and 4 have many more neighborhoods than clusters 2 and 5, which contain only two and one neighborhoods, respectively, in the tables above.
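
Cluster sizes like these can be checked directly rather than read off the printed tables. The sketch below counts members per cluster on a hypothetical label column; in the notebook the input would be `nyc_merged['Cluster Labels']` or `london_merged['Cluster Labels']`.

```python
import pandas as pd

# Hypothetical label column standing in for nyc_merged['Cluster Labels'].
labels = pd.Series([2, 2, 0, 3, 3, 2, 1, 4, 3, 2], name="Cluster Labels")

# Count members per label, order by label, and shift to the
# 1-based cluster numbering used in the text.
sizes = labels.value_counts().sort_index()
sizes.index = sizes.index + 1
print(sizes)
```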

Moreover, NYC has more diverse restaurants than London. Italian food, Chinese food, and other cuisines are available for people with different cultural backgrounds. In contrast, many restaurants in London are English or European.

We can also see that London has more parks than New York City, while New York City has a greater variety of shops than London.

# 7. Conclusion
In the end, London has greater overall venue diversity than New York City, meaning that it offers more distinct kinds of venues. The venues in New York City are more concentrated in shops and restaurants. New York City therefore appears more suitable for those who seek a multicultural society,