# Analyzing boroughs in London for Starting a Restaurant

## Introduction
London is the capital and largest city of England and the United Kingdom. It is one of the world's most important financial, commercial, and educational centers. London has a diverse range of people and cultures, and more than 300 languages are spoken in the region. Its estimated population is roughly 9 million, which makes it the third-most populous city in Europe. These qualities make it an attractive city in which to look for a location for a new restaurant. This project may be useful to business owners and entrepreneurs who are looking to invest in a restaurant. Its main objective is to analyze the relevant data and derive recommendations for these stakeholders.

## Data Collection
The data required for this project has been collected from multiple sources. A summary of the data required for this project is given below.

### Borough geo coordinates data
The data of the boroughs in London was scraped from https://en.wikipedia.org/wiki/List_of_London_boroughs.

### Borough earnings data
Information on the income of each borough's population comes from two sources: data on the income of taxpayers living in the borough, https://data.london.gov.uk/dataset/average-income-tax-payers-borough, and data on the earnings of people working in the borough, https://data.london.gov.uk/dataset/earnings-workplace-borough.

### Geographical Coordinates
The geographical coordinates of London were obtained with the GeoPy library in Python.

### Venue Data
The venue data has been extracted using the Foursquare API. This data contains venue recommendations for all boroughs in London and is used to study the popular venues of different boroughs.

## Data usage
The venue data will be used with a K-Means clustering model to group the boroughs into clusters and determine the best location to start a restaurant business. Depending on the income level of the people who live and work in a borough, an adjustment will be added to the borough's cluster label to refine how attractive the location is for opening a restaurant.

## Import libraries

In [2]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
from geopy.geocoders import Nominatim
import requests
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium
import math
from sklearn.preprocessing import StandardScaler

print('Libraries imported.')

Libraries imported.


## Preparing data for analysis

In [3]:
#Creating soup object
url = 'https://en.wikipedia.org/wiki/List_of_London_boroughs'
page = requests.get(url)
soup = BeautifulSoup(page.text, "html.parser")

In [4]:
#Creating borough Dataframe
table_contents=[]
row = {}
counter = 0

table=soup.find('table')

for cell in table.findAll('td'):
    if counter > 9:
        counter = 0
        table_contents.append(row)
        row = {}
    
    if counter == 0:
        row['Borough'] = cell.text.strip()
    elif counter == 6:
        row['Area_sq_mi'] = float(cell.text.strip())
    elif counter == 8:
        row['Latitude'] = float(cell.text.split('/')[2].split(';')[0])
        row['Longitude'] = float(cell.text.split('/')[2].split(';')[1][1:7])
        
    counter +=1
    
    
df=pd.DataFrame(table_contents)
df['Borough']=df['Borough'].replace({'Barking and Dagenham [note 1]':'Barking and Dagenham',
                                             'Greenwich [note 2]':'Greenwich',
                                             'Hammersmith and Fulham [note 4]':'Hammersmith and Fulham'})

display(df)



Unnamed: 0,Borough,Area_sq_mi,Latitude,Longitude
0,Barking and Dagenham,13.93,51.5607,0.1557
1,Barnet,33.49,51.6252,-0.151
2,Bexley,23.38,51.4549,0.1505
3,Brent,16.7,51.5588,-0.281
4,Bromley,57.97,51.4039,0.0198
5,Camden,8.4,51.529,-0.125
6,Croydon,33.41,51.3714,-0.097
7,Ealing,21.44,51.513,-0.308
8,Enfield,31.74,51.6538,-0.079
9,Greenwich,18.28,51.4892,0.0648
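
The scraping cell above reads the coordinates by position (`split('/')[2]`) and cuts the longitude to six characters with `[1:7]`, so a value such as `-0.1517` would be shortened to `-0.151`. A minimal sketch of a regex-based alternative is given below. The helper name `parse_lat_lon` and the sample strings (including the Barnet longitude) are assumptions made for illustration and are not taken from the scraped page.

```python
import re

# Matches a "lat; lon" pair of signed decimals, e.g. "51.5607; 0.1557".
_LAT_LON = re.compile(r"(-?\d+(?:\.\d+)?)\s*;\s*(-?\d+(?:\.\d+)?)")


def parse_lat_lon(cell_text):
    """Return (latitude, longitude) from the first 'lat; lon' pair in the text."""
    match = _LAT_LON.search(cell_text)
    if match is None:
        raise ValueError(f"no decimal coordinate pair in {cell_text!r}")
    return float(match.group(1)), float(match.group(2))


# Hypothetical cell texts in the assumed "DMS / degrees / decimal (name)" layout.
sample_east = "51°33′39″N 0°09′21″E / 51.5607°N 0.1557°E / 51.5607; 0.1557 (Barking and Dagenham)"
sample_west = "51°37′31″N 0°09′06″W / 51.6252°N 0.1517°W / 51.6252; -0.1517 (Barnet)"

print(parse_lat_lon(sample_east))
print(parse_lat_lon(sample_west))
```

Because the pattern keys on the `;` separator rather than on character positions, it keeps every digit regardless of sign or precision.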


In [41]:
# create map of London to visualize borough locations
address = 'London, England'

geolocator = Nominatim(user_agent="London_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
map_london = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, label in zip(df['Latitude'], df['Longitude'], df['Borough']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)  
    
map_london

Since the sizes of the boroughs vary significantly, we need to determine a suitable search radius for each one before searching for venues.
If we assume that each borough is roughly circular, the search radius can be estimated as the radius of a circle with the same area.

In [6]:
df['Radius'] = (df['Area_sq_mi'] * 2.58999 / math.pi)**0.5*1000
display(df)

Unnamed: 0,Borough,Area_sq_mi,Latitude,Longitude,Radius
0,Barking and Dagenham,13.93,51.5607,0.1557,3388.829082
1,Barnet,33.49,51.6252,-0.151,5254.503444
2,Bexley,23.38,51.4549,0.1505,4390.321866
3,Brent,16.7,51.5588,-0.281,3710.499205
4,Bromley,57.97,51.4039,0.0198,6913.146454
5,Camden,8.4,51.529,-0.125,2631.562871
6,Croydon,33.41,51.3714,-0.097,5248.223785
7,Ealing,21.44,51.513,-0.308,4204.230299
8,Enfield,31.74,51.6538,-0.079,5115.376082
9,Greenwich,18.28,51.4892,0.0648,3882.059638
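
The radius column follows from the area of a circle, A = πr², solved for r after converting square miles to square kilometres. The sketch below spells the steps out in plain Python; the function name `search_radius_m` is an assumption, while the conversion factor is the one used in the cell above.

```python
import math

SQ_KM_PER_SQ_MI = 2.58999  # conversion factor used in the notebook


def search_radius_m(area_sq_mi):
    """Radius in metres of a circle whose area equals the borough's area."""
    area_sq_km = area_sq_mi * SQ_KM_PER_SQ_MI
    return math.sqrt(area_sq_km / math.pi) * 1000


# Barking and Dagenham (13.93 sq mi) gives roughly 3.4 km.
print(round(search_radius_m(13.93)))
```

Real boroughs are not circular and the listed coordinates are not necessarily their centroids, so this radius is only a rough guide to the search area.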


In [9]:
#Get information about venues in the boroughs
def getNearbyVenues(names, latitudes, longitudes, radius, LIMIT=500):
    CLIENT_ID = client_id
    CLIENT_SECRET = client_secret
    VERSION = '20180605'
    
    venues_list=[]
    # use a separate loop variable so the `radius` argument is not shadowed
    for name, lat, lng, rad in zip(names, latitudes, longitudes, radius):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            rad, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough', 
                  'Borough Latitude', 
                  'Borough Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

In [10]:
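import os

# Read the Foursquare credentials from the environment instead of hard-coding them
# (the variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are an assumption)
client_id = os.environ['FOURSQUARE_CLIENT_ID']
client_secret = os.environ['FOURSQUARE_CLIENT_SECRET']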

london_venues = getNearbyVenues(names=df['Borough'], latitudes=df['Latitude'], longitudes=df['Longitude'], radius=df['Radius'])
london_venues.shape

Barking and Dagenham
Barnet
Bexley
Brent
Bromley
Camden
Croydon
Ealing
Enfield
Greenwich
Hackney
Hammersmith and Fulham
Haringey
Harrow
Havering
Hillingdon
Hounslow
Islington
Kensington and Chelsea
Kingston upon Thames
Lambeth
Lewisham
Merton
Newham
Redbridge
Richmond upon Thames
Southwark
Sutton
Tower Hamlets
Waltham Forest
Wandsworth


(3082, 7)

In [54]:
display(london_venues.head(10))

Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Barking and Dagenham,51.5607,0.1557,Central Park,51.55956,0.161981,Park
1,Barking and Dagenham,51.5607,0.1557,Lara Grill,51.562445,0.147178,Turkish Restaurant
2,Barking and Dagenham,51.5607,0.1557,The Eva Hart (Wetherspoon),51.57046,0.130342,Pub
3,Barking and Dagenham,51.5607,0.1557,Costa Coffee,51.57689,0.179497,Coffee Shop
4,Barking and Dagenham,51.5607,0.1557,Harrow Lodge Park,51.555648,0.197926,Park
5,Barking and Dagenham,51.5607,0.1557,The Range,51.57555,0.180254,Furniture / Home Store
6,Barking and Dagenham,51.5607,0.1557,Hoo Hing,51.567561,0.135999,Grocery Store
7,Barking and Dagenham,51.5607,0.1557,Debenhams,51.579097,0.18272,Department Store
8,Barking and Dagenham,51.5607,0.1557,Ciao Bella,51.576103,0.182819,Italian Restaurant
9,Barking and Dagenham,51.5607,0.1557,Pets at Home,51.569605,0.183878,Pet Store
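
`getNearbyVenues` reads `v['venue']['categories'][0]['name']` directly, which would raise an `IndexError` if a venue came back with an empty category list. The sketch below shows one tolerant way to flatten such items. The helper name `flatten_items`, the `"Unknown"` placeholder, and the toy payload are assumptions; only keys that the notebook already reads are used.

```python
def flatten_items(borough, items, default_category="Unknown"):
    """Turn explore-style items into (borough, venue, lat, lng, category) rows."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to a placeholder when no category is attached.
        category = categories[0]["name"] if categories else default_category
        rows.append((
            borough,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Toy payload shaped like the items the notebook reads; the second venue is invented.
toy_items = [
    {"venue": {"name": "Central Park",
               "location": {"lat": 51.55956, "lng": 0.161981},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Unnamed Venue",
               "location": {"lat": 51.56, "lng": 0.15},
               "categories": []}},
]

for row in flatten_items("Barking and Dagenham", toy_items):
    print(row)
```

Venues labelled with the placeholder are ignored by the later `str.contains('Restaurant')` filter, so they do not distort the restaurant counts.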


## Explore and cluster the boroughs (venue data)

Before opening a restaurant, it is important to study the competition in the borough. To do this, we make a quick assessment of the competition based on the number of venues whose category contains "Restaurant".

In [28]:
#Create df with the number of restaurant venues in each borough
df_restaurant = london_venues[london_venues['Venue Category'].str.contains('Restaurant')]
rest_count = df_restaurant[['Borough', 'Venue']].groupby('Borough').count().sort_values('Venue', ascending=False)
display(rest_count)

Borough,Venue
Merton,32
Kensington and Chelsea,31
Harrow,31
Haringey,27
Hounslow,25
Hammersmith and Fulham,25
Redbridge,25
Croydon,25
Barnet,24
Brent,22


As we can see, the number of restaurants differs significantly between boroughs, so it makes sense to shorten the list of boroughs for further analysis. To do this, we exclude the boroughs with the largest (top 25%) and the smallest (bottom 25%) number of venues in the "restaurant" category.

In [29]:
#Setting boundaries to exclude Boroughs with the most and least number of restaurants
high_border = np.percentile(rest_count['Venue'], 75)
low_border = np.percentile(rest_count['Venue'], 25)

#List of Boroughs to explore
interesting_boroughs = rest_count[(rest_count['Venue']<high_border)&(rest_count['Venue']>low_border)].index

df_explore = df[df['Borough'].isin(interesting_boroughs)]
london_venues_explore = london_venues[london_venues['Borough'].isin(interesting_boroughs)]

display(df_explore)

Unnamed: 0,Borough,Area_sq_mi,Latitude,Longitude,Radius
0,Barking and Dagenham,13.93,51.5607,0.1557,3388.829082
1,Barnet,33.49,51.6252,-0.151,5254.503444
3,Brent,16.7,51.5588,-0.281,3710.499205
4,Bromley,57.97,51.4039,0.0198,6913.146454
8,Enfield,31.74,51.6538,-0.079,5115.376082
15,Hillingdon,44.67,51.5441,-0.476,6068.510162
17,Islington,5.74,51.5416,-0.102,2175.354565
19,Kingston upon Thames,14.38,51.4085,-0.306,3443.13103
20,Lambeth,10.36,51.4607,-0.116,2922.496401
21,Lewisham,13.57,51.4452,-0.02,3344.75284
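
The filter above uses strict inequalities, so a borough whose count equals one of the quartile borders is excluded as well. The sketch below illustrates the difference on invented counts. It uses `statistics.quantiles` with `method="inclusive"` from the standard library as a stand-in for `np.percentile`, on the assumption that both interpolate linearly between data points.

```python
from statistics import quantiles

# Invented restaurant counts per borough, for illustration only.
toy_counts = {"P": 10, "Q": 20, "R": 30, "S": 40, "T": 50}

# Quartile cut points; "inclusive" treats the data as the full population.
low_border, _median, high_border = quantiles(toy_counts.values(), n=4, method="inclusive")

# Strict comparison (as in the notebook) versus a comparison that keeps the borders.
strict = [b for b, n in toy_counts.items() if low_border < n < high_border]
inclusive = [b for b, n in toy_counts.items() if low_border <= n <= high_border]

print(low_border, high_border)
print(strict)
print(inclusive)
```

Which variant is preferable depends on whether boroughs sitting exactly on a border should count as typical or as extreme.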


We use one-hot encoding to convert the venue categories to numeric values: 1 if the venue belongs to a category and 0 if it does not.
Averaging these values per borough then gives the share of each category among the venues of the boroughs under study.

In [30]:
#Get dummies for the dataframe and compute the weight of each venue category per borough
london_processing = pd.get_dummies(london_venues_explore[['Venue Category']], prefix="", prefix_sep="")
london_processing['Borough'] = london_venues_explore['Borough'] 

fixed_columns = [london_processing.columns[-1]] + list(london_processing.columns[:-1])
london_processing = london_processing[fixed_columns]

london_grouped = london_processing.groupby('Borough').mean().reset_index()
display(london_grouped)

Unnamed: 0,Borough,Afghan Restaurant,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Windmill,Wine Bar,Wine Shop,Xinjiang Restaurant,Yoga Studio
0,Barking and Dagenham,0.0,0.0,0.021505,0.0,0.0,0.0,0.0,0.0,0.010753,...,0.0,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.0,0.0
1,Barnet,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Brent,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,...,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0
3,Bromley,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01
4,Enfield,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0
5,Hillingdon,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020619,...,0.0,0.0,0.0,0.010309,0.0,0.0,0.0,0.0,0.0,0.0
6,Islington,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01
7,Kingston upon Thames,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Lambeth,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,...,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01
9,Lewisham,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,...,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0
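
The mean of a 0/1 indicator column is simply the fraction of rows where it equals 1, which is why the grouped table holds category shares. The sketch below reproduces the idea in plain Python on invented data, building the indicators explicitly and averaging them per borough. All names and values are assumptions for illustration.

```python
from collections import defaultdict

# Invented (borough, category) pairs, for illustration only.
toy_venues = [
    ("North", "Pub"), ("North", "Pub"), ("North", "Park"), ("North", "Café"),
    ("South", "Park"), ("South", "Hotel"),
]

categories = sorted({category for _, category in toy_venues})

# One-hot encode: one 0/1 indicator per category for every venue.
rows_by_borough = defaultdict(list)
for borough, category in toy_venues:
    rows_by_borough[borough].append([1 if category == c else 0 for c in categories])

# Averaging the indicators per borough gives each category's share of venues.
shares = {
    borough: {c: sum(col) / len(rows) for c, col in zip(categories, zip(*rows))}
    for borough, rows in rows_by_borough.items()
}

print(shares["North"]["Pub"])  # 2 of 4 venues
print(shares["South"]["Pub"])  # 0 of 2 venues
```

Each borough's shares sum to 1, so boroughs with different venue counts become comparable before clustering.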


In [15]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Next, we create a dataframe with the top 10 venue categories for each borough under study.

In [65]:
#Explore top 10 venue categories for each neighborhood
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Borough']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the 3rd take the generic 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
boroughs_venues_sorted = pd.DataFrame(columns=columns)
boroughs_venues_sorted['Borough'] = london_grouped['Borough']

for ind in np.arange(london_grouped.shape[0]):
    boroughs_venues_sorted.iloc[ind, 1:] = return_most_common_venues(london_grouped.iloc[ind, :], num_top_venues)

display(boroughs_venues_sorted)

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Barking and Dagenham,Grocery Store,Supermarket,Pub,Coffee Shop,Park,Café,Hotel,Furniture / Home Store,Fast Food Restaurant,Shopping Mall
1,Barnet,Café,Turkish Restaurant,Coffee Shop,Pub,Park,Bakery,Bar,Grocery Store,Greek Restaurant,French Restaurant
2,Brent,Indian Restaurant,Coffee Shop,Hotel,Clothing Store,Park,Gym / Fitness Center,Hookah Bar,Café,Pub,Pizza Place
3,Bromley,Pub,Park,Coffee Shop,Gym / Fitness Center,Pizza Place,Indian Restaurant,Italian Restaurant,Historic Site,Garden Center,Fast Food Restaurant
4,Enfield,Pub,Coffee Shop,Park,Turkish Restaurant,Café,Garden Center,Gym / Fitness Center,Supermarket,Grocery Store,Greek Restaurant
5,Hillingdon,Pub,Indian Restaurant,Coffee Shop,Supermarket,Hotel,Gym / Fitness Center,Park,Pharmacy,Thai Restaurant,Golf Course
6,Islington,Pub,Café,Coffee Shop,Park,Gastropub,Theater,French Restaurant,Gym / Fitness Center,Mediterranean Restaurant,Pizza Place
7,Kingston upon Thames,Pub,Café,Park,Coffee Shop,Garden,Gym / Fitness Center,Italian Restaurant,Gastropub,Thai Restaurant,Hotel
8,Lambeth,Coffee Shop,Pub,Park,Café,Brewery,Market,Beer Bar,Gastropub,Pizza Place,Farmers Market
9,Lewisham,Pub,Park,Coffee Shop,Gastropub,Gym / Fitness Center,Italian Restaurant,Indian Restaurant,Café,Farmers Market,Beer Store


Next, we run K-Means clustering on the venue data.

In [66]:
#Cluster processing for boroughs based on venue data
kclusters = 3

london_grouped_clustering = london_grouped.drop(columns='Borough')

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(london_grouped_clustering)

boroughs_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

london_merged = df_explore

london_merged = london_merged[['Borough', 'Latitude', 'Longitude']].merge(boroughs_venues_sorted.set_index('Borough'), how='left',on='Borough')


In [67]:
#explore dataframe
display(london_merged.head(10))

Unnamed: 0,Borough,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Barking and Dagenham,51.5607,0.1557,0,Grocery Store,Supermarket,Pub,Coffee Shop,Park,Café,Hotel,Furniture / Home Store,Fast Food Restaurant,Shopping Mall
1,Barnet,51.6252,-0.151,0,Café,Turkish Restaurant,Coffee Shop,Pub,Park,Bakery,Bar,Grocery Store,Greek Restaurant,French Restaurant
2,Brent,51.5588,-0.281,1,Indian Restaurant,Coffee Shop,Hotel,Clothing Store,Park,Gym / Fitness Center,Hookah Bar,Café,Pub,Pizza Place
3,Bromley,51.4039,0.0198,2,Pub,Park,Coffee Shop,Gym / Fitness Center,Pizza Place,Indian Restaurant,Italian Restaurant,Historic Site,Garden Center,Fast Food Restaurant
4,Enfield,51.6538,-0.079,0,Pub,Coffee Shop,Park,Turkish Restaurant,Café,Garden Center,Gym / Fitness Center,Supermarket,Grocery Store,Greek Restaurant
5,Hillingdon,51.5441,-0.476,2,Pub,Indian Restaurant,Coffee Shop,Supermarket,Hotel,Gym / Fitness Center,Park,Pharmacy,Thai Restaurant,Golf Course
6,Islington,51.5416,-0.102,2,Pub,Café,Coffee Shop,Park,Gastropub,Theater,French Restaurant,Gym / Fitness Center,Mediterranean Restaurant,Pizza Place
7,Kingston upon Thames,51.4085,-0.306,2,Pub,Café,Park,Coffee Shop,Garden,Gym / Fitness Center,Italian Restaurant,Gastropub,Thai Restaurant,Hotel
8,Lambeth,51.4607,-0.116,2,Coffee Shop,Pub,Park,Café,Brewery,Market,Beer Bar,Gastropub,Pizza Place,Farmers Market
9,Lewisham,51.4452,-0.02,2,Pub,Park,Coffee Shop,Gastropub,Gym / Fitness Center,Italian Restaurant,Indian Restaurant,Café,Farmers Market,Beer Store


In [68]:
#Explore the first cluster
display(london_merged[london_merged['Cluster Labels']==0][["Borough", "1st Most Common Venue", "2nd Most Common Venue", "3rd Most Common Venue", "4th Most Common Venue", "5th Most Common Venue", "6th Most Common Venue", "7th Most Common Venue", "8th Most Common Venue", "9th Most Common Venue", "10th Most Common Venue"]])

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Barking and Dagenham,Grocery Store,Supermarket,Pub,Coffee Shop,Park,Café,Hotel,Furniture / Home Store,Fast Food Restaurant,Shopping Mall
1,Barnet,Café,Turkish Restaurant,Coffee Shop,Pub,Park,Bakery,Bar,Grocery Store,Greek Restaurant,French Restaurant
4,Enfield,Pub,Coffee Shop,Park,Turkish Restaurant,Café,Garden Center,Gym / Fitness Center,Supermarket,Grocery Store,Greek Restaurant
12,Sutton,Pub,Park,Grocery Store,Coffee Shop,Supermarket,Pharmacy,Café,Hotel,Italian Restaurant,Train Station


In [69]:
#Explore the second cluster
display(london_merged[london_merged['Cluster Labels']==1][["Borough", "1st Most Common Venue", "2nd Most Common Venue", "3rd Most Common Venue", "4th Most Common Venue", "5th Most Common Venue", "6th Most Common Venue", "7th Most Common Venue", "8th Most Common Venue", "9th Most Common Venue", "10th Most Common Venue"]])

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Brent,Indian Restaurant,Coffee Shop,Hotel,Clothing Store,Park,Gym / Fitness Center,Hookah Bar,Café,Pub,Pizza Place
11,Southwark,Coffee Shop,Hotel,Scenic Lookout,Theater,Restaurant,Pub,Beer Bar,Seafood Restaurant,Pizza Place,Grocery Store
13,Tower Hamlets,Coffee Shop,Hotel,Bar,Pub,Italian Restaurant,Plaza,Burger Joint,Gym / Fitness Center,Gym,Park


In [70]:
#Explore the third cluster
display(london_merged[london_merged['Cluster Labels']==2][["Borough", "1st Most Common Venue", "2nd Most Common Venue", "3rd Most Common Venue", "4th Most Common Venue", "5th Most Common Venue", "6th Most Common Venue", "7th Most Common Venue", "8th Most Common Venue", "9th Most Common Venue", "10th Most Common Venue"]])

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Bromley,Pub,Park,Coffee Shop,Gym / Fitness Center,Pizza Place,Indian Restaurant,Italian Restaurant,Historic Site,Garden Center,Fast Food Restaurant
5,Hillingdon,Pub,Indian Restaurant,Coffee Shop,Supermarket,Hotel,Gym / Fitness Center,Park,Pharmacy,Thai Restaurant,Golf Course
6,Islington,Pub,Café,Coffee Shop,Park,Gastropub,Theater,French Restaurant,Gym / Fitness Center,Mediterranean Restaurant,Pizza Place
7,Kingston upon Thames,Pub,Café,Park,Coffee Shop,Garden,Gym / Fitness Center,Italian Restaurant,Gastropub,Thai Restaurant,Hotel
8,Lambeth,Coffee Shop,Pub,Park,Café,Brewery,Market,Beer Bar,Gastropub,Pizza Place,Farmers Market
9,Lewisham,Pub,Park,Coffee Shop,Gastropub,Gym / Fitness Center,Italian Restaurant,Indian Restaurant,Café,Farmers Market,Beer Store
10,Richmond upon Thames,Pub,Park,Café,Coffee Shop,Garden,Rugby Stadium,Bakery,Italian Restaurant,Restaurant,Hotel
14,Waltham Forest,Pub,Coffee Shop,Park,Brewery,Café,Restaurant,Pizza Place,Supermarket,Bakery,Mediterranean Restaurant


In [71]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(london_merged['Latitude'], london_merged['Longitude'], london_merged['Borough'], london_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

It looks like the boroughs in the second cluster have the greatest potential for a new restaurant: their top 3 venue categories include no parks (which suggests dense development) and do include hotels (tourists and business travellers can significantly increase attendance).
By contrast, the top 10 venues of the boroughs in the first cluster include parks, supermarkets, furniture stores, and grocery stores. This mix characterizes these boroughs as outskirts, where most venues are likely to be used by locals and by suburbanites driving through the area. This makes these boroughs the least attractive for opening a new restaurant.

## Prepare data and cluster the boroughs (income data)

Since visiting restaurants is not a basic necessity, the choice of the most suitable boroughs should also take into account the income level of the people living and working there.

In [50]:
#Creating income Dataframe
income_data = pd.read_csv('income_by_borough.csv', sep=';')
# .copy() so that the cluster labels can be inserted later without a SettingWithCopyWarning
income_data_explore = income_data[income_data['Borough'].isin(interesting_boroughs)].copy()

display(income_data)

Unnamed: 0,Borough,Tax_payers,Workplace
0,Barking and Dagenham,23900,28553
1,Barnet,28700,32143
2,Bexley,26900,30733
3,Brent,24700,30134
4,Bromley,32000,29819
5,Camden,37300,38147
6,Croydon,27500,32109
7,Ealing,26700,30259
8,Enfield,26300,29134
9,Greenwich,27600,32635


In [51]:
#Cluster processing for boroughs based on income data
income_clusters = 3

income_data_clustering = income_data_explore.drop(columns='Borough')

kmeans = KMeans(n_clusters=income_clusters, random_state=0).fit(income_data_clustering)

income_data_explore.insert(3, 'Income Cluster Labels', kmeans.labels_)

display(income_data_explore)

Unnamed: 0,Borough,Tax_payers,Workplace,Income Cluster Labels
0,Barking and Dagenham,23900,28553,0
1,Barnet,28700,32143,0
3,Brent,24700,30134,0
4,Bromley,32000,29819,2
8,Enfield,26300,29134,0
15,Hillingdon,27100,33596,0
17,Islington,33400,39348,2
19,Kingston upon Thames,32400,31308,2
20,Lambeth,29900,35036,2
21,Lewisham,27300,33294,0


Let's create a dataframe with the geographic coordinates of the boroughs and both cluster labels. The results of the two clustering steps are combined into a single letter-based rating ("A+" is the most preferred, "C-" the least preferred).

In [52]:
#Creating final dataframe
borough_clustered = london_merged.merge(income_data_explore.set_index('Borough')['Income Cluster Labels'], how='left',on='Borough')
borough_clustered = borough_clustered[['Borough','Latitude', 'Longitude','Cluster Labels', 'Income Cluster Labels']]

borough_clustered['Cluster Labels'] = borough_clustered['Cluster Labels'].astype('str').replace({'0':'C', '1':'A', '2':'B',})
borough_clustered['Income Cluster Labels'] = borough_clustered['Income Cluster Labels'].astype('str').replace({'0':'-', '1':'+', '2':'',})

borough_clustered['Final cluster'] = borough_clustered['Cluster Labels'].astype('str') + borough_clustered['Income Cluster Labels']
display(borough_clustered)

Unnamed: 0,Borough,Latitude,Longitude,Cluster Labels,Income Cluster Labels,Final cluster
0,Barking and Dagenham,51.5607,0.1557,C,-,C-
1,Barnet,51.6252,-0.151,C,-,C-
2,Brent,51.5588,-0.281,A,-,A-
3,Bromley,51.4039,0.0198,B,,B
4,Enfield,51.6538,-0.079,C,-,C-
5,Hillingdon,51.5441,-0.476,B,-,B-
6,Islington,51.5416,-0.102,B,,B
7,Kingston upon Thames,51.4085,-0.306,B,,B
8,Lambeth,51.4607,-0.116,B,,B
9,Lewisham,51.4452,-0.02,B,-,B-
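
K-Means cluster numbers are arbitrary identifiers, so the hard-coded maps above (for example `{'0':'-', '1':'+', '2':''}`) hold only for this particular fit. The sketch below shows one way to derive the income grades from the cluster centres instead. The helper name `grade_by_center` and the toy centres are assumptions; in the notebook the centres would come from `kmeans.cluster_centers_`.

```python
def grade_by_center(centers, grades=("-", "", "+")):
    """Map each cluster label to a grade, ordered by the mean of its centre."""
    if len(centers) != len(grades):
        raise ValueError("need exactly one grade per cluster")
    # Sort cluster labels from the lowest to the highest average income.
    ordered = sorted(range(len(centers)), key=lambda k: sum(centers[k]) / len(centers[k]))
    return {label: grade for label, grade in zip(ordered, grades)}


# Invented centres as (tax-payer income, workplace earnings) per cluster label.
toy_centers = [
    (26000, 31000),  # label 0
    (45000, 50000),  # label 1
    (32000, 34000),  # label 2
]

print(grade_by_center(toy_centers))
```

With a mapping built this way, re-running the clustering with a different seed or a different set of boroughs would not silently swap the "+" and "-" grades.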


In [74]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
colors_dict = dict(zip([letter+symb for letter in ['A', 'B', 'C'] for symb in ['+', '', '-']], range(9)))
x = np.arange(len(colors_dict.keys()))
ys = [i + x + (i*x)**2 for i in range(len(colors_dict.keys()))]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]


# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(borough_clustered['Latitude'], borough_clustered['Longitude'], borough_clustered['Borough'], borough_clustered['Final cluster']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[colors_dict[cluster]],
        fill=True,
        fill_color=rainbow[colors_dict[cluster]],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<table>
    <thead>
        <tr>
            <th style="text-align:left;"><h3>Borough rating</h3></th>
            <th style="text-align:center;"><h3>Description</h3></th></tr>
    </thead>
    <tbody>
        <tr>
            <td style="text-align:left;">
                <span style="color:#6926C3"><h3>A+</h3></span>
                <span style="color:#3C58D5"><h3>A</h3></span>
                <span style="color:#21AAD8"><h3>A-</h3></span>                
            </td>
            <td style="text-align:left;">Boroughs rated "A" are the best option for opening a new restaurant, especially those rated "A+".</td>
        </tr>
        <tr>
            <td style="text-align:left;">
                <span style="color:#91FFBE"><h3>B</h3></span>
                <span style="color:#C9E694"><h3>B-</h3></span>                
            </td>
            <td style="text-align:left;">Boroughs rated 'B' may be considered for opening a new restaurant, but they are not the best options.</td>         
        </tr>
        <tr>
            <td style="text-align:left;">
                <span style="color:#D01212"><h3>C-</h3></span>
            </td>
            <td style="text-align:left;">Boroughs rated 'C' are not recommended for opening a new restaurant. However, a more detailed study that identifies the points attracting foot traffic may show that they are a good option.</td>            
        </tr>
    </tbody>
</table>

## Conclusion

In this project, the boroughs of London, England were analyzed to determine which would be the best for opening a new restaurant. Based on this analysis, the boroughs have been rated using venue and population income data. Stakeholders and investors can refine these ratings by considering other factors, such as transport, legal requirements, and associated costs, which were outside the scope of this project and were therefore not considered.
