# Final project - The Battle of Neighborhoods

## Coursera | Applied Data Science Capstone
### by Martin Kovarik

## Notebook
This notebook follows the structure of the report:

1. Introduction: the business problem and who would be interested in this project.
2. Data: the data used to solve the problem and its sources.
3. Methodology: the main component of the report, covering the exploratory data analysis, any inferential statistical testing, and which machine learning techniques were used and why.

In [1]:
#Libs
import pandas as pd
import numpy as np
import requests
import lxml
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import folium
from folium import plugins
from geopy.geocoders import Nominatim
from sklearn.cluster import KMeans
from bs4 import BeautifulSoup
#import csv

## 2. Data

### 2a) Extracting the list of London boroughs, their populations and their coordinates


In [2]:
# Extract data from Wikipedia table
url = 'https://en.wikipedia.org/wiki/List_of_London_boroughs'
extract_data = requests.get(url).text

In [3]:
# Parse data with BeautifulSoup lib
soup = BeautifulSoup(extract_data, 'lxml')

In [4]:
print(soup.prettify())

In [5]:
#Declare temporary lists
B = []
C = []

In [6]:
#Extract the information about Boroughs, Population and Coordinates from Wikitable
for row in soup.find('table').find_all('tr'):
    cells = row.find_all('td')
    if len(cells) > 0:
        B.append(cells[0].text.rstrip('\n'))
        C.append(cells[8].text.rstrip('\n'))

In [7]:
# Create a pandas dataframe from the extracted information
# (named london_data so the built-in dict is not shadowed)
london_data = {'Borough' : B, 'Coordinates': C}
london = pd.DataFrame(london_data)
london.head()

Unnamed: 0,Borough,Coordinates
0,Barking and Dagenham [note 1],".mw-parser-output .geo-default,.mw-parser-outp..."
1,Barnet,51°37′31″N 0°09′06″W﻿ / ﻿51.6252°N 0.1517°W﻿ /...
2,Bexley,51°27′18″N 0°09′02″E﻿ / ﻿51.4549°N 0.1505°E﻿ /...
3,Brent,51°33′32″N 0°16′54″W﻿ / ﻿51.5588°N 0.2817°W﻿ /...
4,Bromley,51°24′14″N 0°01′11″E﻿ / ﻿51.4039°N 0.0198°E﻿ /...


In [8]:
#Clean the Borough data
london['Borough'] = london['Borough'].map(lambda x: x.rstrip(']'))
london['Borough'] = london['Borough'].map(lambda x: x.rstrip('1234567890.'))
london['Borough'] = london['Borough'].str.replace('note','')
london['Borough'] = london['Borough'].map(lambda x: x.rstrip(' ['))
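
The chain of `rstrip` calls above can also be written as one regular expression. The snippet below is a minimal sketch, assuming the footnote markers always look like ` [note 1]` as in the first row of the table; `clean_borough_name` is a hypothetical helper, not part of the original notebook.

```python
import re

def clean_borough_name(raw: str) -> str:
    """Strip Wikipedia footnote markers such as ' [note 1]' from a borough name."""
    # Remove any "[note N]" marker, then trim leftover whitespace and newlines
    return re.sub(r"\s*\[note \d+\]", "", raw).strip()

print(clean_borough_name("Barking and Dagenham [note 1]"))
print(clean_borough_name("Barnet"))
```

With pandas, the same pattern could be applied to the whole column through `london['Borough'].map(clean_borough_name)`.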

In [10]:
#Clean the Coordinates data
# The first row carries leaked CSS text before the coordinates, so trim it first
london.iloc[0,1] = london.iloc[0,1][258:]
# Skip the degree/minute/second notation and keep the trailing decimal pair
london['Coordinates'] = london['Coordinates'].str[46:]
london['Latitude'] = london['Coordinates'].str[:8].str.strip()
# Longitude is the first space-separated token after the latitude
london['Longitude'] = london['Coordinates'].str[10:].str.lstrip().str.split(' ').str[0]
# Remove the invisible byte-order mark that trails the longitude
london['Longitude'] = london['Longitude'].str.rstrip('\ufeff')
london.drop(columns='Coordinates', inplace=True)
london['Latitude'] = pd.to_numeric(london['Latitude'])
london['Longitude'] = pd.to_numeric(london['Longitude'])

In [11]:
london.head()

Unnamed: 0,Borough,Latitude,Longitude
0,Barking and Dagenham,51.5607,0.1557
1,Barnet,51.6252,-0.1517
2,Bexley,51.4549,0.1505
3,Brent,51.5588,-0.2817
4,Bromley,51.4039,0.0198
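
The fixed character offsets used above break as soon as Wikipedia changes the cell layout. A pattern-based extraction is one possible alternative. The sketch below assumes each cell ends with a decimal pair of the form `51.6252; -0.1517`; the sample string is a hand-written approximation of a table cell, and `parse_decimal_coords` is a hypothetical helper.

```python
import re

# Matches a signed decimal pair such as "51.6252; -0.1517"
DECIMAL_PAIR = re.compile(r"(-?\d+\.\d+);\s*(-?\d+\.\d+)")

def parse_decimal_coords(text):
    """Return (latitude, longitude) from the last decimal pair in text, or None."""
    matches = DECIMAL_PAIR.findall(text)
    if not matches:
        return None
    lat, lon = matches[-1]
    return float(lat), float(lon)

# Assumed cell format, modelled on the Barnet row shown earlier
sample = "51°37′31″N 0°09′06″W\ufeff / \ufeff51.6252°N 0.1517°W\ufeff / 51.6252; -0.1517\ufeff (Barnet)"
print(parse_decimal_coords(sample))
```

Taking the last match rather than the first is a deliberate choice here: any stray text at the start of a cell (such as the CSS in the first row) is then ignored.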


### 2b) Extracting venue data for the listed London boroughs from the Foursquare database

In [12]:
#Foursquare credentials (replace the placeholders with your own; never publish real keys)
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID'
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET'
VERSION = '20200110'

In [13]:
#Define a function that calls the Foursquare API and extracts the relevant information about nearby venues (relies on the global LIMIT set before the call)
def extract_venues(names, latitudes, longitudes, radius=1000):
    venues=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT)  
        results = requests.get(url).json()["response"]['groups'][0]['items']
        venues.append([(name, lat, lng, v['venue']['name'], v['venue']['location']['lat'], v['venue']['location']['lng'], v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue in venues for item in venue])
    nearby_venues.columns = ['Borough', 'Borough Latitude', 'Borough Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
    return(nearby_venues)
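
The list comprehension inside `extract_venues` raises an `IndexError` if a venue comes back without any category. The sketch below shows one way to flatten the items defensively, assuming the response shape used above (`item['venue']` with `name`, `location` and `categories`). `flatten_items` is a hypothetical helper and `fake_items` is hand-written stand-in data, not real API output.

```python
def flatten_items(borough, lat, lng, items):
    """Turn Foursquare 'explore' items into flat rows, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use None when the venue has no category instead of failing
        category = categories[0]["name"] if categories else None
        rows.append((borough, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# Hand-written stand-in for response['groups'][0]['items']
fake_items = [
    {"venue": {"name": "Venue A", "location": {"lat": 51.56, "lng": 0.16},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 51.57, "lng": 0.15},
               "categories": []}},
]
rows = flatten_items("Barking and Dagenham", 51.5607, 0.1557, fake_items)
print(rows)
```

Rows with a `None` category can then be dropped or labelled before the one-hot encoding step.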

In [14]:
#Get up to 100 venues within a 1000 m radius (the function default) of the center of each borough
LIMIT = 100
venues_df = extract_venues(names=london['Borough'], latitudes=london['Latitude'], longitudes=london['Longitude'])

Barking and Dagenham
Barnet
Bexley
Brent
Bromley
Camden
Croydon
Ealing
Enfield
Greenwich
Hackney
Hammersmith and Fulham
Haringey
Harrow
Havering
Hillingdon
Hounslow
Islington
Kensington and Chelsea
Kingston upon Thames
Lambeth
Lewisham
Merton
Newham
Redbridge
Richmond upon Thames
Southwark
Sutton
Tower Hamlets
Waltham Forest
Wandsworth
Westminster


In [15]:
venues_df.head()

Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Barking and Dagenham,51.5607,0.1557,Central Park,51.55956,0.161981,Park
1,Barking and Dagenham,51.5607,0.1557,Lara Grill,51.562445,0.147178,Turkish Restaurant
2,Barking and Dagenham,51.5607,0.1557,Iceland,51.560578,0.147685,Grocery Store
3,Barking and Dagenham,51.5607,0.1557,B&M Store,51.565287,0.143793,Discount Store
4,Barking and Dagenham,51.5607,0.1557,Shell,51.560415,0.148364,Gas Station


### 2c) Getting rental data

Summary of monthly rents recorded between October 2019 and September 2020 by borough and bedroom category for London.

Source file: https://www.ons.gov.uk/peoplepopulationandcommunity/housing/adhocs/12435privaterentalmarketinlondonoctober2019toseptember2020

In [16]:
# Read the rent data
rents = pd.read_csv('londonrentalstatisticsq32020.csv', encoding = "ISO-8859-1")
rents.head()

In [52]:
# For our use case, focus on two-bedroom apartments and use the median rent
# (.copy() avoids modifying a view of the original dataframe in the steps below)
rents = rents[rents['Bedroom Category'] == 'Two Bedrooms'].copy()
rents.drop(rents.columns[1:5], axis=1, inplace=True)
rents.drop(rents.columns[2], axis=1, inplace=True)
rents['Median'] = rents['Median'].str.replace(' ','')
rents.dropna(inplace=True)
rents = rents.reset_index(drop=True)
rents.head()

Unnamed: 0,Borough,Median
0,Barking and Dagenham,1200
1,Barnet,1400
2,Bexley,1100
3,Brent,1447
4,Bromley,1250


In [54]:
rents.values

array([['Barking and Dagenham', '1200'],
       ['Barnet', '1400'],
       ['Bexley', '1100'],
       ['Brent', '1447'],
       ['Bromley', '1250'],
       ['Camden', '2150'],
       ['City of London', '..'],
       ['Croydon', '1200'],
       ['Ealing', '1495'],
       ['Enfield', '1300'],
       ['Greenwich', '1400'],
       ['Hackney', '1750'],
       ['Hammersmith and Fulham', '1842'],
       ['Haringey', '1517'],
       ['Harrow', '1350'],
       ['Havering', '1100'],
       ['Hillingdon', '1275'],
       ['Hounslow', '1300'],
       ['Islington', '1842'],
       ['Kensington and Chelsea', '2708'],
       ['Kingston upon Thames', '1350'],
       ['Lambeth', '1650'],
       ['Lewisham', '1350'],
       ['Merton', '1500'],
       ['Newham', '1450'],
       ['Redbridge', '1275'],
       ['Richmond upon Thames', '1600'],
       ['Southwark', '1668'],
       ['Sutton', '1150'],
       ['Tower Hamlets', '1778'],
       ['Waltham Forest', '1325'],
       ['Wandsworth', '1750'],
       ['

We can see that *City of London* has no numeric data (its median is recorded as `..`), so let's drop that row.

In [55]:
rents = rents[rents['Median'] != '..']
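
Filtering on the literal `'..'` works for this file, but any other non-numeric placeholder would slip through. A more general option is to coerce the column and drop whatever fails to parse. The sketch below uses a small demo frame; the `'1 200'` value with an embedded space is an assumed example of the formatting that the `str.replace(' ', '')` step targets.

```python
import pandas as pd

rents_demo = pd.DataFrame({
    "Borough": ["Camden", "City of London", "Croydon"],
    "Median": ["2150", "..", "1 200"],  # '1 200' is an assumed formatting example
})

# Remove embedded spaces, then turn anything non-numeric into NaN
cleaned = rents_demo["Median"].str.replace(" ", "", regex=False)
rents_demo["Median"] = pd.to_numeric(cleaned, errors="coerce")

# Drop the rows that could not be parsed and restore an integer dtype
rents_demo = rents_demo.dropna(subset=["Median"]).reset_index(drop=True)
rents_demo["Median"] = rents_demo["Median"].astype(int)
print(rents_demo)
```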

In [49]:
#Convert Median values to numbers
rents['Median'] = pd.to_numeric(rents['Median'])

In [59]:
#Check that the data is in the expected format
rents.dtypes

Borough    object
Median      int64
dtype: object

In [60]:
rents.head()

Unnamed: 0,Borough,Median
0,Barking and Dagenham,1200
1,Barnet,1400
2,Bexley,1100
3,Brent,1447
4,Bromley,1250


## 3. Methodology

### 3a) Let's visualize the rents in each borough on the map of London

**Let's combine *rents* data with *london* data**

In [61]:
rents.head()

Unnamed: 0,Borough,Median
0,Barking and Dagenham,1200
1,Barnet,1400
2,Bexley,1100
3,Brent,1447
4,Bromley,1250


In [62]:
london.head()

Unnamed: 0,Borough,Population,Latitude,Longitude
0,Barking and Dagenham,212906,51.5607,0.1557
1,Barnet,395896,51.6252,-0.1517
2,Bexley,248287,51.4549,0.1505
3,Brent,329771,51.5588,-0.2817
4,Bromley,332336,51.4039,0.0198


In [87]:
lon_ren = pd.merge(rents,london, how='outer', on = 'Borough')
lon_ren.head()

Unnamed: 0,Borough,Median,Population,Latitude,Longitude
0,Barking and Dagenham,1200,212906,51.5607,0.1557
1,Barnet,1400,395896,51.6252,-0.1517
2,Bexley,1100,248287,51.4549,0.1505
3,Brent,1447,329771,51.5588,-0.2817
4,Bromley,1250,332336,51.4039,0.0198
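
An outer merge silently keeps boroughs that appear in only one of the two tables and fills the other side with `NaN`. The sketch below shows how the `indicator` option of `pd.merge` can list such rows; the two demo frames are small excerpts built for illustration.

```python
import pandas as pd

rents_demo = pd.DataFrame({"Borough": ["Barnet", "Bexley"], "Median": [1400, 1100]})
london_demo = pd.DataFrame({
    "Borough": ["Barnet", "Bexley", "Brent"],
    "Latitude": [51.6252, 51.4549, 51.5588],
    "Longitude": [-0.1517, 0.1505, -0.2817],
})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
checked = pd.merge(rents_demo, london_demo, how="outer", on="Borough", indicator=True)
unmatched = checked.loc[checked["_merge"] != "both", "Borough"].tolist()
print(unmatched)
```

An empty list would mean every borough name matched on both sides, which is worth confirming before plotting.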


In [88]:
lon_ren.dtypes

Borough        object
Median          int64
Population      int64
Latitude      float64
Longitude     float64
dtype: object

In [108]:
#lon_ren

**Let's visualize the median rents on the map of London**

In [112]:
#We will need a GeoJSON file with the London borough boundaries
london_geo = r'london_boroughs.json'

In [113]:
#Get geo coordinates of London
address = 'London, UK'
geolocator = Nominatim(user_agent="explorer")
london_loc = geolocator.geocode(address)
l_lat = london_loc.latitude
l_long = london_loc.longitude
print('The geographical coordinates of London are {}, {}.'.format(l_lat, l_long))

The geographical coordinates of London are 51.5073219, -0.1276474.


In [114]:
london_map = folium.Map(location = [l_lat, l_long], zoom_start = 10)

In [115]:
#Plotting the map
# folium.Choropleth replaces the Map.choropleth method, which is deprecated in newer folium releases
folium.Choropleth(
    geo_data=london_geo,
    data=lon_ren,
    columns=['Borough', 'Median'],
    key_on='feature.properties.name',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='London Rents'
).add_to(london_map)
london_map
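
A choropleth only colors the areas whose GeoJSON name matches a value in the data column, so a spelling mismatch leaves a borough blank. The sketch below compares the two sets of names, assuming the `feature.properties.name` key used above. `unmatched_boroughs` is a hypothetical helper and `geo_demo` is a hand-written stand-in for the real GeoJSON file.

```python
def unmatched_boroughs(geojson, borough_names):
    """Return (names missing from the GeoJSON, names missing from the data)."""
    geo_names = {feature["properties"]["name"] for feature in geojson["features"]}
    data_names = set(borough_names)
    return sorted(data_names - geo_names), sorted(geo_names - data_names)

# Minimal stand-in for the contents of london_boroughs.json
geo_demo = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "properties": {"name": "Barnet"}, "geometry": None},
    {"type": "Feature", "properties": {"name": "City of London"}, "geometry": None},
]}

missing_in_geo, missing_in_data = unmatched_boroughs(geo_demo, ["Barnet", "Bexley"])
print(missing_in_geo, missing_in_data)
```

With the real file, the GeoJSON could be loaded through `json.load` and checked against `lon_ren['Borough']`.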

### 3b) Clustering the boroughs by the similarity of their venues, using the K-means method

In [167]:
venues_df.head()

Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Barking and Dagenham,51.5607,0.1557,Central Park,51.55956,0.161981,Park
1,Barking and Dagenham,51.5607,0.1557,Lara Grill,51.562445,0.147178,Turkish Restaurant
2,Barking and Dagenham,51.5607,0.1557,Iceland,51.560578,0.147685,Grocery Store
3,Barking and Dagenham,51.5607,0.1557,B&M Store,51.565287,0.143793,Discount Store
4,Barking and Dagenham,51.5607,0.1557,Shell,51.560415,0.148364,Gas Station


In [168]:
# One hot encoding
oh_enc = pd.get_dummies(venues_df[['Venue Category']], prefix="", prefix_sep="")
oh_enc.insert(0, 'Borough', venues_df.Borough)

In [169]:
oh_enc.head()

Unnamed: 0,Borough,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Argentinian Restaurant,Art Gallery,...,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Windmill,Wine Bar,Wine Shop,Winery,Women's Store,Yoga Studio
0,Barking and Dagenham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Barking and Dagenham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Barking and Dagenham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Barking and Dagenham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Barking and Dagenham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [170]:
#pd.set_option('display.max_rows', 30)

In [171]:
#Collapse each borough into a single row holding the mean frequency of each venue category
oh_enc_g = oh_enc.groupby('Borough').mean().reset_index()

In [172]:
oh_enc_g.head()

Unnamed: 0,Borough,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Argentinian Restaurant,Art Gallery,...,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Windmill,Wine Bar,Wine Shop,Winery,Women's Store,Yoga Studio
0,Barking and Dagenham,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Barnet,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bexley,0.0,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.0,...,0.0,0.021277,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0
3,Brent,0.0,0.0,0.0,0.0,0.0,0.0,0.021978,0.0,0.0,...,0.0,0.0,0.0,0.032967,0.0,0.0,0.0,0.0,0.0,0.0
4,Bromley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
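
To see why the mean of the one-hot columns is a frequency, consider a toy example: each venue contributes a single 1 to exactly one category column, so the per-borough mean of a column is the share of that borough's venues in that category. The boroughs `A` and `B` below are made-up illustration data.

```python
import pandas as pd

venues_demo = pd.DataFrame({
    "Borough": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Pub", "Pub", "Park", "Café", "Park", "Park"],
})

# One column per category, with a 1 marking the category of each venue
onehot = pd.get_dummies(venues_demo["Venue Category"], dtype=int)
onehot.insert(0, "Borough", venues_demo["Borough"])

# The mean of a 0/1 column is the fraction of venues in that category
freq = onehot.groupby("Borough").mean()
print(freq)
```

Borough `A` has two pubs among four venues, so its `Pub` frequency is 0.5, and every row sums to 1.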


In [188]:
#Return the names of the most frequent venue categories for one borough row
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [189]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Borough']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions past the third take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
borough_venues_sorted = pd.DataFrame(columns=columns)
borough_venues_sorted['Borough'] = oh_enc_g['Borough']

for ind in np.arange(oh_enc_g.shape[0]):
    borough_venues_sorted.iloc[ind, 1:] = return_most_common_venues(oh_enc_g.iloc[ind, :], num_top_venues)

borough_venues_sorted.head()

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Barking and Dagenham,Bus Stop,Sports Club,Grocery Store,Construction & Landscaping,Chinese Restaurant,Discount Store,Park,Soccer Field,Supermarket,Gas Station
1,Barnet,Pub,Park,Bus Stop,Fish & Chips Shop,Café,Rental Car Location,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room
2,Bexley,Pub,Clothing Store,Hotel,Supermarket,Coffee Shop,Fast Food Restaurant,Pharmacy,Chinese Restaurant,American Restaurant,Shopping Mall
3,Brent,Coffee Shop,Hotel,Bar,Sporting Goods Shop,Clothing Store,Sandwich Place,Indian Restaurant,Grocery Store,Warehouse Store,Pizza Place
4,Bromley,Pub,Clothing Store,Coffee Shop,Supermarket,Indian Restaurant,Bar,Burger Joint,Café,Electronics Store,Portuguese Restaurant
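
The `indicators` list with a fallback to `th` is correct for the ten columns used here, but it would produce `21th` or `22th` for longer lists. A small helper is one way to make the suffix general. `ordinal` below is a hypothetical function, not part of the original notebook.

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th and 13th are exceptions to the st/nd/rd rule
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top_venues = 10
columns = ["Borough"] + [f"{ordinal(i)} Most Common Venue"
                         for i in range(1, num_top_venues + 1)]
print(columns)
```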


In [190]:
#Distribute boroughs into 4 clusters
kclusters = 4
london_c = oh_enc_g.drop(columns='Borough')
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(london_c)

kmeans.labels_[0:10] 

array([2, 0, 3, 3, 3, 1, 1, 1, 3, 3], dtype=int32)
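
The number of clusters is fixed at 4 above without a stated justification. One common way to sanity-check such a choice is to compare silhouette scores across several values of k. The sketch below runs on synthetic, well-separated points rather than on the borough data, so it illustrates the technique only; `best_k_by_silhouette` is a hypothetical helper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_k_by_silhouette(features, k_values):
    """Fit K-means for each k and return the k with the highest silhouette score."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    return max(scores, key=scores.get), scores

# Synthetic demo data: three tight, well-separated groups of 20 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
demo = np.vstack([c + rng.normal(scale=0.1, size=(20, 2)) for c in centers])

best_k, scores = best_k_by_silhouette(demo, range(2, 6))
print(best_k)
```

On the real data, `london_c` would take the place of `demo`. With only about 32 boroughs the scores may be close to each other, so they are best read as a rough guide.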

In [191]:
#Label the clusters
borough_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

#Bring coordinates to the dataframe
london_merged = lon_ren
london_merged = london_merged.join(borough_venues_sorted.set_index('Borough'), on='Borough')
london_merged.head()

Unnamed: 0,Borough,Median,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Barking and Dagenham,1200,212906,51.5607,0.1557,2,Bus Stop,Sports Club,Grocery Store,Construction & Landscaping,Chinese Restaurant,Discount Store,Park,Soccer Field,Supermarket,Gas Station
1,Barnet,1400,395896,51.6252,-0.1517,0,Pub,Park,Bus Stop,Fish & Chips Shop,Café,Rental Car Location,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room
2,Bexley,1100,248287,51.4549,0.1505,3,Pub,Clothing Store,Hotel,Supermarket,Coffee Shop,Fast Food Restaurant,Pharmacy,Chinese Restaurant,American Restaurant,Shopping Mall
3,Brent,1447,329771,51.5588,-0.2817,3,Coffee Shop,Hotel,Bar,Sporting Goods Shop,Clothing Store,Sandwich Place,Indian Restaurant,Grocery Store,Warehouse Store,Pizza Place
4,Bromley,1250,332336,51.4039,0.0198,3,Pub,Clothing Store,Coffee Shop,Supermarket,Indian Restaurant,Bar,Burger Joint,Café,Electronics Store,Portuguese Restaurant


**Let's visualize the clusters on the London map**

In [192]:
clusters_map = folium.Map(location=[l_lat, l_long], zoom_start=11)

# one distinct color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(london_merged['Latitude'], london_merged['Longitude'], london_merged['Borough'], london_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=20,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=1).add_to(clusters_map)
       
clusters_map

### 3c) Analyzing the clusters

In [193]:
#Cluster 0
london_merged.loc[london_merged['Cluster Labels'] == 0, london_merged.columns[[0] + list(range(6, london_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Barnet,Pub,Park,Bus Stop,Fish & Chips Shop,Café,Rental Car Location,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room


Let's try to describe **cluster 0** based on the most common venues with one word for easy comparison with other clusters: **Recreation**

In [183]:
#Cluster 1
london_merged.loc[london_merged['Cluster Labels'] == 1, london_merged.columns[[0] + list(range(6, london_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Camden,Coffee Shop,Café,Bookstore,Hotel,Art Gallery,Breakfast Spot,Burger Joint,History Museum,Movie Theater,Plaza
6,Croydon,Coffee Shop,Pub,Clothing Store,Italian Restaurant,Platform,Hotel,Indian Restaurant,Mediterranean Restaurant,Café,Bookstore
7,Ealing,Coffee Shop,Pub,Café,Italian Restaurant,Park,Indian Restaurant,Pizza Place,Bakery,Thai Restaurant,Burger Joint
10,Hackney,Pub,Café,Coffee Shop,Bakery,Brewery,Supermarket,Park,Vegetarian / Vegan Restaurant,Flea Market,Restaurant
11,Hammersmith and Fulham,Pub,Café,Indian Restaurant,Coffee Shop,Gym / Fitness Center,French Restaurant,Park,Sandwich Place,Gastropub,Italian Restaurant
17,Islington,Pub,Park,Café,Coffee Shop,French Restaurant,Bakery,Gastropub,Cocktail Bar,Pizza Place,Music Venue
18,Kensington and Chelsea,Café,Restaurant,Pub,Garden,Italian Restaurant,Juice Bar,Clothing Store,Hotel,Bakery,Burger Joint
19,Kingston upon Thames,Coffee Shop,Café,Pub,Clothing Store,Burger Joint,Italian Restaurant,Thai Restaurant,Sushi Restaurant,Hotel,Department Store
20,Lambeth,Coffee Shop,Pizza Place,Pub,Caribbean Restaurant,Cocktail Bar,Tapas Restaurant,Beer Bar,Music Venue,Market,Food Court
23,Newham,Hotel,Light Rail Station,Coffee Shop,Airport Service,Sandwich Place,Deli / Bodega,Park,Bus Stop,Bus Station,Breakfast Spot


Let's try to describe **cluster 1** based on the most common venues with one word for easy comparison with other clusters: **Tourism**

In [184]:
#Cluster 2
london_merged.loc[london_merged['Cluster Labels'] == 2, london_merged.columns[[0] + list(range(6, london_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Barking and Dagenham,Bus Stop,Sports Club,Grocery Store,Construction & Landscaping,Chinese Restaurant,Discount Store,Park,Soccer Field,Supermarket,Gas Station


Let's try to describe **cluster 2** based on the most common venues with one word for easy comparison with other clusters: **Living**

In [185]:
#Cluster 3
london_merged.loc[london_merged['Cluster Labels'] == 3, london_merged.columns[[0] + list(range(6, london_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Bexley,Pub,Clothing Store,Hotel,Supermarket,Coffee Shop,Fast Food Restaurant,Pharmacy,Chinese Restaurant,American Restaurant,Shopping Mall
3,Brent,Coffee Shop,Hotel,Bar,Sporting Goods Shop,Clothing Store,Sandwich Place,Indian Restaurant,Grocery Store,Warehouse Store,Pizza Place
4,Bromley,Pub,Clothing Store,Coffee Shop,Supermarket,Indian Restaurant,Bar,Burger Joint,Café,Electronics Store,Portuguese Restaurant
8,Enfield,Pub,Coffee Shop,Clothing Store,Café,Supermarket,Bookstore,Fish & Chips Shop,Italian Restaurant,Sandwich Place,Indian Restaurant
9,Greenwich,Grocery Store,Pub,Clothing Store,Plaza,Supermarket,Coffee Shop,Bakery,Gym / Fitness Center,Fast Food Restaurant,Hotel
12,Haringey,Pub,Café,Fast Food Restaurant,Clothing Store,Turkish Restaurant,Bakery,Supermarket,Indian Restaurant,Pharmacy,Coffee Shop
13,Harrow,Coffee Shop,Indian Restaurant,Fast Food Restaurant,Clothing Store,Sandwich Place,Pharmacy,Department Store,Park,Bus Stop,Women's Store
14,Havering,Coffee Shop,Clothing Store,Pub,Café,Shopping Mall,Furniture / Home Store,Park,Supermarket,Hotel,Grocery Store
15,Hillingdon,Coffee Shop,Pub,Clothing Store,Fast Food Restaurant,Gym,Grocery Store,Pharmacy,Italian Restaurant,Department Store,Bar
16,Hounslow,Coffee Shop,Fast Food Restaurant,Indian Restaurant,Hotel,Clothing Store,Grocery Store,Discount Store,Bus Stop,Supermarket,Sandwich Place


Let's try to describe **cluster 3** based on the most common venues with one word for easy comparison with other clusters: **Utility**