## Predicting the best Borough in outer London for a food/drink business

### 1. Introduction

#### 1.1 Background

London is the capital and largest city of England and the United Kingdom. The city stands on the River Thames in the south-east of England, at the head of its 50-mile (80 km) estuary leading to the North Sea. London has been a major settlement for two millennia. It is an international centre of business, finance, arts, and culture, and is recognized as one of the most multicultural and cosmopolitan cities in the world. As a result, the market is highly competitive and the cost of doing business is among the highest.

But what about those who do not have enough money and still want to open a restaurant or drink business outside the inner city? Where is rent cheapest while enough people still live nearby to make the business popular? Where is the competition not overwhelming? Any new business location must be analyzed thoroughly to make the most profit from it. Statistical analysis of suitable data is therefore important, because it helps reduce the risk of failure.

#### 1.2 Problem

Data that might help determine the best outer London Borough include the Boroughs outside inner London and their populations, the food/drink businesses already operating in them, and the lowest rent available. This project aims to predict which outer Borough is the most suitable for opening this type of business.

#### 1.3 Interest

Anyone from an individual to a small or large company would be interested in an accurate prediction of the best place to run such a business profitably.

### 2. Data acquisition and cleaning

#### 2.1 Data sources

Most of the required data, such as the Boroughs of outer London with their coordinates, rent data, and venue data, can be found online on Wikipedia (https://en.wikipedia.org/wiki/List_of_London_boroughs) and through the Foursquare API.

#### 2.2 Data cleaning

The data were downloaded and scraped into a single table. However, Wikipedia provides some information that was not needed for our analysis, such as the Borough council, political control, and the inner Boroughs of London. These data were deleted, and we kept only the columns that mattered to us: Name, Area, Population, Coordinates, and Rent.
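
As a small illustration of this cleaning step, the sketch below strips the footnote markers (such as `[note 1]`) that the Wikipedia table attaches to some Borough names. The helper name `clean_borough_name` and the regular expression are assumptions made for illustration, not part of the original notebook, which replaces each affected name individually.

```python
import re

# Matches a trailing footnote marker such as " [note 1]" left over from the Wikipedia table.
NOTE_MARKER = re.compile(r"\s*\[note \d+\]$")


def clean_borough_name(name: str) -> str:
    """Return the Borough name without a trailing '[note N]' marker."""
    return NOTE_MARKER.sub("", name).strip()


raw_names = ["Barking and Dagenham [note 1]", "Barnet", "Greenwich [note 2]"]
cleaned = [clean_borough_name(n) for n in raw_names]
print(cleaned)
```

On a dataframe, the same helper could be applied with something like `LF['Borough'].map(clean_borough_name)`, which avoids listing every affected Borough by hand.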

Through the explore function of Foursquare we get a dataset of venues, from which we select the specific venues we are interested in.
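
One possible way to narrow the venue list down to food and drink businesses is sketched below. The keyword set is a hypothetical choice made for this example (the Foursquare category taxonomy is much larger), and the sample categories are taken from the venue tables later in this report.

```python
# Hypothetical keywords that mark a category as food/drink related.
FOOD_DRINK_KEYWORDS = {
    "restaurant", "pub", "bar", "café", "cafe", "coffee", "pizza", "bakery",
}


def is_food_drink(category: str) -> bool:
    """Return True if any whole word of the category is a food/drink keyword."""
    words = set(category.lower().replace("/", " ").split())
    return not words.isdisjoint(FOOD_DRINK_KEYWORDS)


sample_categories = [
    "Pub", "Park", "Coffee Shop", "Supermarket",
    "Turkish Restaurant", "Garden Center", "Wine Bar",
]
kept = [c for c in sample_categories if is_food_drink(c)]
print(kept)
```

Matching on whole words rather than substrings avoids false positives such as a "bar" match inside an unrelated category name.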

#### 2.3 Feature selection

We transform the dataset we created to show the top 5 venue categories for each Borough. Then we merge the dataframes created during the analysis into one dataframe that contains all the values most useful for the analysis.

In [26]:
# I only use these lines of code for the description "feature selection"

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]


num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Borough']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
brgh_venues_sorted = pd.DataFrame(columns=columns)
brgh_venues_sorted['Borough'] = Brgh_grouped['Borough']

for ind in np.arange(Brgh_grouped.shape[0]):
    brgh_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Brgh_grouped.iloc[ind, :], num_top_venues)
brgh_venues_sorted.head(8)

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Barking and Dagenham,Supermarket,Grocery Store,Park,Coffee Shop,Pub
1,Bexley,Pizza Place,Coffee Shop,Ice Cream Shop,Discount Store,Bakery
2,Bromley,Pub,Grocery Store,Coffee Shop,Park,Pizza Place
3,Enfield,Pub,Coffee Shop,Turkish Restaurant,Greek Restaurant,Garden Center
4,Haringey,Café,Pub,Park,Coffee Shop,Turkish Restaurant
5,Havering,Hotel,Park,Coffee Shop,Garden,Bakery
6,Merton,Pub,Park,Coffee Shop,Café,Bar
7,Redbridge,Pub,Park,Coffee Shop,Restaurant,Italian Restaurant


### 3. Exploratory Data analysis

*First, we get the necessary information on the London Boroughs, dropping the extra columns that are not needed for the analysis.*


In [1]:
from geopy.geocoders import Nominatim
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
import requests
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
!pip install -U notebook-as-pdf

Requirement already up-to-date: notebook-as-pdf in /opt/conda/envs/Python-3.7-main/lib/python3.7/site-packages (0.4.0)


In [2]:
url='https://en.wikipedia.org/wiki/List_of_London_boroughs'

LDF=pd.read_html(url, header=0)[0]

LDF.head()

Unnamed: 0,Borough,Inner,Status,Local authority,Political control,Headquarters,Area (sq mi),Population (2019 est)[1],Co-ordinates,Nr. in map
0,Barking and Dagenham [note 1],,,Barking and Dagenham London Borough Council,Labour,"Town Hall, 1 Town Square",13.93,212906,".mw-parser-output .geo-default,.mw-parser-outp...",25
1,Barnet,,,Barnet London Borough Council,Conservative,"Barnet House, 2 Bristol Avenue, Colindale",33.49,395896,51°37′31″N 0°09′06″W﻿ / ﻿51.6252°N 0.1517°W,31
2,Bexley,,,Bexley London Borough Council,Conservative,"Civic Offices, 2 Watling Street",23.38,248287,51°27′18″N 0°09′02″E﻿ / ﻿51.4549°N 0.1505°E,23
3,Brent,,,Brent London Borough Council,Labour,"Brent Civic Centre, Engineers Way",16.7,329771,51°33′32″N 0°16′54″W﻿ / ﻿51.5588°N 0.2817°W,12
4,Bromley,,,Bromley London Borough Council,Conservative,"Civic Centre, Stockwell Close",57.97,332336,51°24′14″N 0°01′11″E﻿ / ﻿51.4039°N 0.0198°E,20


In [27]:
# we drop the data we don't need

LF = LDF.drop(['Status','Local authority','Political control','Headquarters','Nr. in map'], axis=1)
LF['Inner'].replace(np.nan,'0', inplace=True)
LF['Borough'].replace('Barking and Dagenham [note 1]','Barking and Dagenham', inplace=True)
LF['Borough'].replace('Greenwich [note 2]','Greenwich', inplace=True)
LF['Borough'].replace('Hammersmith and Fulham [note 4]','Hammersmith and Fulham', inplace=True)
Inn = ['Camden','Greenwich','Hackney','Hammersmith and Fulham','Islington','Kensington and Chelsea','Lewisham','Lambeth','Southwark','Tower Hamlets','Wandsworth','Westminster']
LF.head()
LF['Inner'] = '0'
LF.head()

Unnamed: 0,Borough,Inner,Area (sq mi),Population (2019 est)[1],Co-ordinates
0,Barking and Dagenham,0,13.93,212906,".mw-parser-output .geo-default,.mw-parser-outp..."
1,Barnet,0,33.49,395896,51°37′31″N 0°09′06″W﻿ / ﻿51.6252°N 0.1517°W
2,Bexley,0,23.38,248287,51°27′18″N 0°09′02″E﻿ / ﻿51.4549°N 0.1505°E
3,Brent,0,16.7,329771,51°33′32″N 0°16′54″W﻿ / ﻿51.5588°N 0.2817°W
4,Bromley,0,57.97,332336,51°24′14″N 0°01′11″E﻿ / ﻿51.4039°N 0.0198°E


*Then we rename the columns to make the dataset easier to read. Because of extra notes on the Wikipedia page, we rename some of the Boroughs. Due to the large difference in rent prices, as well as in the number of venues across London, we keep only the outer Boroughs going forward.*

In [28]:
LF['Inner'] = LF.Borough.isin(Inn).astype(int)
Out = LF[LF.Inner == 0]
Out = Out.drop(['Inner'], axis=1)
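# Note: the scraped population column is named "Population (2019 est)[1]", so the
# second key below matches nothing and that column keeps its original name.
df = Out.rename(columns = {"Area (sq mi)": "Area", 
                            "Population (2013 est)[1]":"Population"})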
geolocator = Nominatim(user_agent="London_explorer")
df['Co-ordinates']= df['Borough'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))
df[['Latitude', 'Longitude']] = df['Co-ordinates'].apply(pd.Series)
df.head()

Unnamed: 0,Borough,Area,Population (2019 est)[1],Co-ordinates,Latitude,Longitude
0,Barking and Dagenham,13.93,212906,"(51.5541171, 0.15050434261994267)",51.554117,0.150504
1,Barnet,33.49,395896,"(51.65309, -0.2002261)",51.65309,-0.200226
2,Bexley,23.38,248287,"(39.9692378, -82.936864)",39.969238,-82.936864
3,Brent,16.7,329771,"(32.9373463, -87.1647184)",32.937346,-87.164718
4,Bromley,57.97,332336,"(51.4028046, 0.0148142)",51.402805,0.014814


*Finally, we edit the coordinates and keep the Boroughs with the lowest maximum rent (at most 125).*

In [5]:
Max_Rent = ['102.25','150.75','97','150.75','118.5','129.25','140','102.25','107.75','140','86','161.5','161.5','140','123.75','134.5','118.5','140','129.25','145.25']
df['Max_Rent'] = Max_Rent

df["Max_Rent"] = pd.to_numeric(df["Max_Rent"])
fin = df[df.Max_Rent <= 125]
fin

Long_list = fin['Longitude'].tolist()
Lat_list = fin['Latitude'].tolist()
print ("Old latitude list: ", Lat_list)
print ("Old Longitude list: ", Long_list)

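# Intended to correct mis-geocoded coordinates. Note: none of these keys occur in the
# old lists printed by this cell, so no value is actually replaced in this run.
replace_longitudes = {-106.6621329:0.0799, -2.8417544: 0.1837}
replace_latitudes = {50.7164496:51.6636, 51.0358628: 51.5499}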

longtitudes_new = [replace_longitudes.get(n7,n7) for n7 in Long_list]
latitudes_new = [replace_latitudes.get(n7,n7) for n7 in Lat_list]

fin = fin.drop(['Co-ordinates', 'Longitude'], axis=1)

fin['Longitude'] = longtitudes_new
fin['Latitude'] = latitudes_new
fin

Old latitude list:  [51.5541171, 39.9692378, 51.4028046, 51.6520851, 51.587929849999995, 51.5443851, 51.41086985, 51.5763203]
Old Longitude list:  [0.15050434261994267, -82.936864, 0.0148142, -0.0810175, -0.10541010599099046, -0.14430716398919305, -0.18809708858824303, 0.0454097]


Unnamed: 0,Borough,Area,Population (2019 est)[1],Latitude,Max_Rent,Longitude
0,Barking and Dagenham,13.93,212906,51.554117,102.25,0.150504
2,Bexley,23.38,248287,39.969238,97.0,-82.936864
4,Bromley,57.97,332336,51.402805,118.5,0.014814
8,Enfield,31.74,333794,51.652085,102.25,-0.081018
12,Haringey,11.42,268647,51.58793,107.75,-0.10541
14,Havering,43.35,259552,51.544385,86.0,-0.144307
22,Merton,14.52,206548,51.41087,123.75,-0.188097
24,Redbridge,21.78,305222,51.57632,118.5,0.04541
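
Geocoding a bare Borough name can resolve to a place with the same name elsewhere, as the coordinates for Bexley in the table above suggest. A minimal sketch of a sanity check follows, assuming an approximate bounding box for Greater London; the bounds and the helper `in_greater_london` are illustrative assumptions, and the sample coordinates are copied from the table above.

```python
# Approximate bounding box for Greater London (assumed values, for illustration only).
LONDON_BOUNDS = {"lat_min": 51.28, "lat_max": 51.70, "lon_min": -0.52, "lon_max": 0.34}


def in_greater_london(lat: float, lon: float, bounds: dict = LONDON_BOUNDS) -> bool:
    """Return True if the point lies inside the assumed bounding box."""
    return (bounds["lat_min"] <= lat <= bounds["lat_max"]
            and bounds["lon_min"] <= lon <= bounds["lon_max"])


# Coordinates as returned by the geocoder in this notebook.
geocoded = {
    "Barking and Dagenham": (51.554117, 0.150504),
    "Bexley": (39.969238, -82.936864),
    "Bromley": (51.402805, 0.014814),
}

suspect = [name for name, (lat, lon) in geocoded.items()
           if not in_greater_london(lat, lon)]
print(suspect)
```

Boroughs flagged this way could then be geocoded again with a more specific query string (for example the Borough name followed by "London, UK"); whether that resolves correctly for every Borough is an assumption that would need checking.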


*Creation of the map with all Boroughs*

In [29]:

address = 'London'

geolocator = Nominatim(user_agent="London_explorer")
location = geolocator.geocode(address)
London_latitude = location.latitude
London_longitude = location.longitude

In [30]:
!pip install folium
import folium

Fin_Brgh = folium.Map(location=[London_latitude, London_longitude], zoom_start=12)


for lat, lng, label in zip(fin['Latitude'], fin['Longitude'], 
                            fin['Borough']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=9,
        popup=label,
        color='Red',
        fill=True,
        fill_color='blue',
        fill_opacity=0.7).add_to(Fin_Brgh)
Fin_Brgh



*Accessing the venues through the Foursquare API, enlarging the radius because of the large Borough areas.*

In [31]:
CLIENT_ID = 'your-client-ID' # your Foursquare ID
CLIENT_SECRET = 'your-client-secret' # your Foursquare Secret
VERSION = '20190915' # Foursquare API version
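
API credentials are usually better kept out of the notebook itself. The sketch below reads them from environment variables instead; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper function are hypothetical choices for this example, not something Foursquare prescribes.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from a mapping (the process environment by default)."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")          # hypothetical variable name
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")  # hypothetical variable name
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set in the environment")
    return client_id, client_secret


# Demonstration with a plain dict standing in for the real environment.
demo_env = {"FOURSQUARE_CLIENT_ID": "your-client-ID",
            "FOURSQUARE_CLIENT_SECRET": "your-client-secret"}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print(demo_id)
```

In the notebook, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` would then pick up the values set in the shell before Jupyter is started.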

In [32]:
radius = 5000
LIMIT = 100

def getVenues(names, latitudes, longitudes, radius=5000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough', 
                  'Latitude', 
                  'Longitude', 
                  'Venue', 
                  'Venue_Lat', 
                  'Venue_Long', 
                  'Venue_Category']
    
    return(nearby_venues)
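
The request URL in `getVenues` is assembled with string formatting. As an alternative sketch, the query string can be built from a dictionary with the standard library, which takes care of URL encoding; the helper name `build_explore_url` is an assumption for this example, while the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=5000, limit=100):
    """Return the explore URL with all query parameters URL-encoded."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",   # latitude and longitude as one comma-separated value
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


example_url = build_explore_url("your-client-ID", "your-client-secret",
                                "20190915", 51.554117, 0.150504)
print(example_url)
```

With `requests`, the same dictionary can also be passed directly as `requests.get(EXPLORE_ENDPOINT, params=params)`, so the URL never has to be built by hand.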

In [9]:
Brgh_Venues = getVenues(names=fin['Borough'],
                        latitudes=fin['Latitude'],
                        longitudes=fin['Longitude'])

Barking and Dagenham
Bexley
Bromley
Enfield
Haringey
Havering
Merton
Redbridge


*Counting the venues of each Borough, and finding the most popular one*

In [10]:
Brgh_Venues.groupby('Borough').count()

Borough,Latitude,Longitude,Venue,Venue_Lat,Venue_Long,Venue_Category
Barking and Dagenham,97,97,97,97,97,97
Bexley,91,91,91,91,91,91
Bromley,100,100,100,100,100,100
Enfield,100,100,100,100,100,100
Haringey,100,100,100,100,100,100
Havering,100,100,100,100,100,100
Merton,100,100,100,100,100,100
Redbridge,100,100,100,100,100,100


In [11]:
London_Brgh_onehot = pd.get_dummies(Brgh_Venues[['Venue_Category']], prefix="", prefix_sep="")
mid =  Brgh_Venues['Borough']

London_Brgh_onehot.insert(0, 'Borough', mid)

London_Brgh_onehot.head()

Unnamed: 0,Borough,ATM,Afghan Restaurant,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Barking and Dagenham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Barking and Dagenham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Barking and Dagenham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Barking and Dagenham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Barking and Dagenham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [12]:
Brgh_grouped = London_Brgh_onehot.groupby('Borough').mean().reset_index()
Brgh_grouped

Unnamed: 0,Borough,ATM,Afghan Restaurant,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Barking and Dagenham,0.0,0.0,0.010309,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bexley,0.010989,0.0,0.010989,0.0,0.0,0.010989,0.0,0.0,0.0,...,0.0,0.021978,0.010989,0.0,0.0,0.0,0.010989,0.010989,0.010989,0.0
2,Bromley,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01
3,Enfield,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,...,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0
4,Haringey,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0
5,Havering,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.01,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01
6,Merton,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,...,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0
7,Redbridge,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,...,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0


In [13]:
# 5 most popular venues

num_top_venues = 5

for brgh in Brgh_grouped['Borough']:
    print("_________"+brgh+"________")
    temp = Brgh_grouped[Brgh_grouped['Borough'] == brgh].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

_________Barking and Dagenham________
           venue  freq
0    Supermarket  0.10
1  Grocery Store  0.09
2           Park  0.09
3    Coffee Shop  0.08
4            Pub  0.05


_________Bexley________
                venue  freq
0         Pizza Place  0.05
1         Coffee Shop  0.05
2      Discount Store  0.04
3      Ice Cream Shop  0.04
4  Chinese Restaurant  0.03


_________Bromley________
           venue  freq
0            Pub  0.12
1    Coffee Shop  0.09
2  Grocery Store  0.09
3           Park  0.06
4    Pizza Place  0.05


_________Enfield________
                venue  freq
0         Coffee Shop  0.08
1                 Pub  0.08
2  Turkish Restaurant  0.07
3    Greek Restaurant  0.06
4                Café  0.05


_________Haringey________
                venue  freq
0                Café  0.09
1                 Pub  0.08
2                Park  0.07
3         Coffee Shop  0.07
4  Turkish Restaurant  0.07


_________Havering________
         venue  freq
0        Hotel  0.06
1   

In [14]:
# Convert the above arrays into dataframe

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]


num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Borough']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
brgh_venues_sorted = pd.DataFrame(columns=columns)
brgh_venues_sorted['Borough'] = Brgh_grouped['Borough']

for ind in np.arange(Brgh_grouped.shape[0]):
    brgh_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Brgh_grouped.iloc[ind, :], num_top_venues)

brgh_venues_sorted.head(8)

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Barking and Dagenham,Supermarket,Grocery Store,Park,Coffee Shop,Pub
1,Bexley,Pizza Place,Coffee Shop,Ice Cream Shop,Discount Store,Bakery
2,Bromley,Pub,Grocery Store,Coffee Shop,Park,Pizza Place
3,Enfield,Pub,Coffee Shop,Turkish Restaurant,Greek Restaurant,Garden Center
4,Haringey,Café,Pub,Park,Coffee Shop,Turkish Restaurant
5,Havering,Hotel,Park,Coffee Shop,Garden,Bakery
6,Merton,Pub,Park,Coffee Shop,Café,Bar
7,Redbridge,Pub,Park,Coffee Shop,Restaurant,Italian Restaurant


In [15]:
kclusters = 3

brgh_grouped_clustering = Brgh_grouped.drop('Borough', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(brgh_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 1, 0, 0, 2, 1, 2, 2], dtype=int32)

In [16]:
# Join all the dataframes

brgh_venues_sorted.insert(0, 'Cluster Label', kmeans.labels_)
Borough_merged = pd.merge(fin,brgh_venues_sorted, on='Borough')
Borough_merged

Unnamed: 0,Borough,Area,Population (2019 est)[1],Latitude,Max_Rent,Longitude,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Barking and Dagenham,13.93,212906,51.554117,102.25,0.150504,0,Supermarket,Grocery Store,Park,Coffee Shop,Pub
1,Bexley,23.38,248287,39.969238,97.0,-82.936864,1,Pizza Place,Coffee Shop,Ice Cream Shop,Discount Store,Bakery
2,Bromley,57.97,332336,51.402805,118.5,0.014814,0,Pub,Grocery Store,Coffee Shop,Park,Pizza Place
3,Enfield,31.74,333794,51.652085,102.25,-0.081018,0,Pub,Coffee Shop,Turkish Restaurant,Greek Restaurant,Garden Center
4,Haringey,11.42,268647,51.58793,107.75,-0.10541,2,Café,Pub,Park,Coffee Shop,Turkish Restaurant
5,Havering,43.35,259552,51.544385,86.0,-0.144307,1,Hotel,Park,Coffee Shop,Garden,Bakery
6,Merton,14.52,206548,51.41087,123.75,-0.188097,2,Pub,Park,Coffee Shop,Café,Bar
7,Redbridge,21.78,305222,51.57632,118.5,0.04541,2,Pub,Park,Coffee Shop,Restaurant,Italian Restaurant


In [17]:
Borough_merged.loc[Borough_merged['Cluster Label'] == 0, Borough_merged.columns[[0,1,2,4] + list(range(6, Borough_merged.shape[1]))]]

Unnamed: 0,Borough,Area,Population (2019 est)[1],Max_Rent,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Barking and Dagenham,13.93,212906,102.25,0,Supermarket,Grocery Store,Park,Coffee Shop,Pub
2,Bromley,57.97,332336,118.5,0,Pub,Grocery Store,Coffee Shop,Park,Pizza Place
3,Enfield,31.74,333794,102.25,0,Pub,Coffee Shop,Turkish Restaurant,Greek Restaurant,Garden Center


In [18]:
Borough_merged.loc[Borough_merged['Cluster Label'] == 1, Borough_merged.columns[[0,1,2,4] + list(range(6, Borough_merged.shape[1]))]]

Unnamed: 0,Borough,Area,Population (2019 est)[1],Max_Rent,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,Bexley,23.38,248287,97.0,1,Pizza Place,Coffee Shop,Ice Cream Shop,Discount Store,Bakery
5,Havering,43.35,259552,86.0,1,Hotel,Park,Coffee Shop,Garden,Bakery


In [19]:
Borough_merged.loc[Borough_merged['Cluster Label'] == 2, Borough_merged.columns[[0,1,2,4] + list(range(6, Borough_merged.shape[1]))]]

Unnamed: 0,Borough,Area,Population (2019 est)[1],Max_Rent,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
4,Haringey,11.42,268647,107.75,2,Café,Pub,Park,Coffee Shop,Turkish Restaurant
6,Merton,14.52,206548,123.75,2,Pub,Park,Coffee Shop,Café,Bar
7,Redbridge,21.78,305222,118.5,2,Pub,Park,Coffee Shop,Restaurant,Italian Restaurant


In [22]:
#Mapping of the clusters

map_clusters = folium.Map(location=[London_latitude, London_longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster, rent, pop in zip(Borough_merged['Latitude'],
                                  Borough_merged['Longitude'],
                                  Borough_merged['Borough'],
                                  Borough_merged['Cluster Label'],
                                  Borough_merged['Max_Rent'],
                                  Borough_merged['Population (2019 est)[1]']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster) + " " + "Rent " + str(rent) + " " + "Population " + str(pop), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=25,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### 3.1 Clustering

Once the Boroughs are selected, we cluster them to analyse their similarities and differences and to identify the advantages and disadvantages of each selected Borough.
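
To compare clusters at a glance, it can help to summarise each one by its members and average figures. The sketch below does this with the standard library, using the rent, population, and cluster labels from the merged table in this report; the summary layout is an illustrative assumption rather than part of the original analysis.

```python
from collections import defaultdict
from statistics import mean

# (Borough, cluster label, max rent, population), copied from the merged table.
rows = [
    ("Barking and Dagenham", 0, 102.25, 212906),
    ("Bexley", 1, 97.0, 248287),
    ("Bromley", 0, 118.5, 332336),
    ("Enfield", 0, 102.25, 333794),
    ("Haringey", 2, 107.75, 268647),
    ("Havering", 1, 86.0, 259552),
    ("Merton", 2, 123.75, 206548),
    ("Redbridge", 2, 118.5, 305222),
]

# Group the Boroughs by their cluster label.
by_cluster = defaultdict(list)
for borough, label, rent, population in rows:
    by_cluster[label].append((borough, rent, population))

# Per-cluster member list and averages.
summary = {
    label: {
        "boroughs": [b for b, _, _ in members],
        "mean_rent": mean(r for _, r, _ in members),
        "mean_population": mean(p for _, _, p in members),
    }
    for label, members in sorted(by_cluster.items())
}

for label, stats in summary.items():
    print(label, stats)
```

In pandas, a comparable summary could be obtained with `Borough_merged.groupby('Cluster Label')['Max_Rent'].mean()`.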

### 4. Results

In the **first** cluster, which includes *Barking and Dagenham*, *Bromley*, and *Enfield*, the highest rent is in *Bromley*, while the other two Boroughs share the same rent. The least common of the top five venue types is **Pizza Place** for *Bromley*, **Garden Center** for *Enfield* (which is unrelated to the type of business we are interested in), and **Pub** for *Barking and Dagenham*.

In the **second** cluster, which includes *Bexley* and *Havering*, the highest rent is in *Bexley*. The least common of the top five venue types is **Bakery** for both Boroughs.

In the **third** cluster, which includes *Haringey*, *Merton*, and *Redbridge*, the highest rent is in *Merton*, while the lowest is in *Haringey*. The least common of the top five venue types is **Turkish Restaurant** for *Haringey*, **Bar** for *Merton*, and **Italian Restaurant** for *Redbridge*.

### 5. Discussion

Taking into consideration the population, the maximum rent, and the least common venue type in these 3 clusters, we concluded that *Enfield*, *Havering*, and *Haringey* are the best places to open a food/drink business. Although much relevant information is missing or could not be retrieved at the time, our data analysis provides some insight toward a more profitable business decision.

### 6. Conclusion

In conclusion, in this analysis I used some of the most common libraries to clean and manipulate data, and the Foursquare API to extract information about different types of venues in outer London. By clustering the candidate Boroughs, I reached a conclusion about the most suitable areas to open a food/drink business. Because certain data are lacking, further analysis is needed.