# The Battle of the Neighbourhoods
### Applied Data Science Capstone


---


## Introduction

London has a population of more than eight million people, and finding the best spot to open a hospitality business is difficult. I aim to make this process easier by analysing different locations in London, using the example of finding the optimal location to open a Thai restaurant.

The ideal location will be identified from a combination of factors: the area's average annual house price increase, proximity to central London, proximity to other restaurants and, more specifically, proximity to other Thai restaurants.

Ultimately, I will suggest a few neighbourhoods and identify the key aspects that make them a good place to open a business.


## Data

The population of each London borough is between 150,000 and 300,000. Instead of boroughs, I will look at smaller subdivisions in order to split London into more manageable areas. To do this, let's look at the parliamentary constituencies, which have a population of roughly 80,000 each. Let's get the names by scraping the data from Wikipedia using **Beautiful Soup**: https://en.wikipedia.org/wiki/List_of_Parliamentary_constituencies_in_London

I will get the latitude and longitude coordinates of the centre of London and of each constituency by using the **Google Maps Geocoding API**.

Data relating to parliamentary constituencies is well documented. In order to find the most up-and-coming areas, I will look at which areas had the largest increase in house prices over the past year. The data will be extracted from an Excel document from the UK's Office for National Statistics website: https://www.ons.gov.uk/peoplepopulationandcommunity/housing/bulletins/housepricestatisticsforsmallareas/yearendingseptember2019

Using the **Foursquare API**, I can get information on the types of venues in each area to see where there are the fewest restaurants.

## Neighbourhoods

First, let's use Beautiful Soup to scrape the information on parliamentary constituencies from a table on Wikipedia.

In [1]:
from bs4 import BeautifulSoup
import pandas as pd
import requests

In [2]:
#Create URL variable
URL = 'https://en.wikipedia.org/wiki/List_of_Parliamentary_constituencies_in_London'

#Get request
response = requests.get(URL)

#Parse the data
soup = BeautifulSoup(response.text, 'html.parser')

In [3]:
#Sort the data into a dataframe
table = soup.find('table', {'class': 'wikitable sortable'}).tbody
rows = table.find_all('tr')
columns = [header.text.replace('\n', '') for header in rows[0].find_all('th')]
column = [columns[0]]
constituencies = []
for i in range(1, len(rows)):
    row = rows[i].find_all('td')
    values = [v.text.replace('\n', '') for v in row]
    constituencies.append(values[0])
print('We have a total of {} Constituencies to analyse'.format(len(constituencies)))
df1 = pd.DataFrame(constituencies, columns = column)

We have a total of 73 Constituencies to analyse
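The scraping loop above takes the first cell of every table row. As a rough illustration of the same idea that runs offline, here is a minimal sketch using only the standard library's `html.parser` instead of Beautiful Soup. The HTML snippet, the `FirstColumnParser` class and the second column are made up for the example and are not the real Wikipedia page.

```python
from html.parser import HTMLParser

# Hypothetical, cut-down stand-in for the Wikipedia table (not the real page)
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Constituency</th><th>Notes</th></tr>
  <tr><td>Barking</td><td>example</td></tr>
  <tr><td>Battersea</td><td>example</td></tr>
</table>
"""

class FirstColumnParser(HTMLParser):
    """Collect the text of the first <td> cell in every table row."""

    def __init__(self):
        super().__init__()
        self.first_cells = []
        self._cell_index = -1        # position of the current <td> within its row
        self._in_first_cell = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._cell_index = -1    # a new row starts, so reset the cell counter
        elif tag == 'td':
            self._cell_index += 1
            if self._cell_index == 0:
                self._in_first_cell = True
                self._buffer = []

    def handle_data(self, data):
        if self._in_first_cell:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._in_first_cell:
            self.first_cells.append(''.join(self._buffer).strip())
            self._in_first_cell = False

parser = FirstColumnParser()
parser.feed(SAMPLE_HTML)
sample_constituencies = parser.first_cells
print(sample_constituencies)
```

Because only `<td>` cells are collected, the header row (which contains `<th>` cells) is skipped without any special handling.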


## Add longitude & latitude coordinates to the dataset

To get the latitude and longitude coordinates of each constituency, I will use the Google Maps Geocoding API.

In [4]:
import googlemaps

#The API key is restricted (input your own if you want to run the code)
API = ''

gmaps_key = googlemaps.Client(key = API)

#Create Latitude and Longitude lists
df1Lat = []
df1Lng = []

#Loop through the constituency names to get the latitude and longitude coordinates
for i in range(0, len(df1)):
    g = gmaps_key.geocode('{}, London, UK'.format(df1.iat[i,0]))
    latitude = g[0]['geometry']['location']['lat']
    longitude = g[0]['geometry']['location']['lng']
    df1Lat.append(latitude)
    df1Lng.append(longitude)

#Add the data to the dataframe
df1['Latitude'] = df1Lat
df1['Longitude'] = df1Lng
df1.head()

| | Constituency | Latitude | Longitude |
|---|---|---|---|
| 0 | Barking | 51.536563 | 0.075766 |
| 1 | Battersea | 51.472201 | -0.165547 |
| 2 | Beckenham | 51.40817 | -0.025813 |
| 3 | Bermondsey and Old Southwark | 51.49 | -0.07 |
| 4 | Bethnal Green and Bow | 51.530858 | -0.040193 |
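The geocoding loop above indexes `g[0]` directly, which fails if the geocoder returns an empty list for a name. A minimal sketch of a more defensive extraction is shown here; `first_location` is a hypothetical helper and `sample_response` is a hand-written stand-in that only mirrors the nesting used in the loop, not a real API response.

```python
def first_location(geocode_results):
    """Return (lat, lng) from the first geocoding result, or None if there are no results."""
    if not geocode_results:
        return None
    location = geocode_results[0]['geometry']['location']
    return location['lat'], location['lng']

# Hand-written stand-in with the same nesting as the response used in the loop above
sample_response = [{'geometry': {'location': {'lat': 51.536563, 'lng': 0.075766}}}]

print(first_location(sample_response))
print(first_location([]))
```

A caller could then skip or log any constituency for which `None` comes back, rather than stopping the whole loop.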


Let's plot these coordinates on an interactive map.

In [5]:
import folium

#Let's find the latitude and longitude coordinates of London (a single geocoding request)
LondonLocation = gmaps_key.geocode('London, UK')[0]['geometry']['location']
LondonLat = LondonLocation['lat']
LondonLng = LondonLocation['lng']

#Create London map
map_London = folium.Map(location=[LondonLat, LondonLng], zoom_start=11, control_scale=True)

#Add markers to map
for lat, lng, label in zip(df1['Latitude'], df1['Longitude'], df1['Constituency']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_London) 
    
map_London

## Add house price data and annual % change for 2018/19

Let's now add the annual house price change for each constituency to the dataset to find the most up-and-coming areas. The data will be pulled from an Excel file which can be found on the UK Parliament website: https://commonslibrary.parliament.uk/social-policy/housing/home-ownership/constituency-data-house-prices/#compare_constituencies. Excel document download link: https://data.parliament.uk/resources/constituencystatistics/House-prices.xlsx

In [6]:
#Create the URL variable
file_url = "https://data.parliament.uk/resources/constituencystatistics/House-prices.xlsx"

#Create get request
r = requests.get(file_url, allow_redirects=True)

#Save to local folder as House-prices.xlsx
open('House-prices.xlsx', 'wb').write(r.content)

7893753

In [7]:
import xlrd

#Load the spreadsheet (note: xlrd 2.0 and later no longer read .xlsx files, so this needs xlrd < 2.0)
Housepricedata = xlrd.open_workbook('House-prices.xlsx')

#Open the sheet that corresponds to the house price data
ConstituencyHP=Housepricedata.sheet_by_index(4)

In [8]:
#Print any constituencies that are in the dataframe but not in the Excel spreadsheet, and store them in a list
ExcelNames = ConstituencyHP.col_values(1)
Const = []
for name in df1['Constituency']:
    if name not in ExcelNames:
        Const.append(name)
        print(name)

Ealing Southall
Enfield Southgate
Lewisham Deptford


Inspection of the Excel document shows that this is because they are stored as 'Ealing, Southall', 'Enfield, Southgate' and 'Lewisham, Deptford'. Let's update these in the dataframe.

In [9]:
#Find the indices in the dataframe and store in a list
Constindices = []
for Constituency in range(len(Const)):
    Constindices.append(df1[df1['Constituency']==Const[Constituency]].index.values[0])

#Change the constituency names so they are consistent with the excel document
df1.loc[df1.index[Constindices[0]], 'Constituency'] = 'Ealing, Southall'
df1.loc[df1.index[Constindices[1]], 'Constituency'] = 'Enfield, Southgate'
df1.loc[df1.index[Constindices[2]], 'Constituency'] = 'Lewisham, Deptford'
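The three names above differ from the spreadsheet only by a comma. As an alternative to renaming them by hand, here is a minimal sketch that matches names after stripping punctuation. The `normalise_name` helper and the short name lists are assumptions made for the example; the real lists would come from `df1` and the spreadsheet column.

```python
import re

def normalise_name(name):
    """Lower-case a name and collapse punctuation and whitespace into single spaces."""
    return re.sub(r'[^a-z0-9]+', ' ', name.lower()).strip()

# Small illustrative samples, not the full lists
wikipedia_names = ['Ealing Southall', 'Enfield Southgate', 'Lewisham Deptford', 'Barking']
spreadsheet_names = ['Ealing, Southall', 'Enfield, Southgate', 'Lewisham, Deptford', 'Barking']

# Map each normalised spreadsheet name back to its original spelling
lookup = {normalise_name(n): n for n in spreadsheet_names}

# Keep only the names whose spelling actually differs between the two sources
renames = {n: lookup[normalise_name(n)] for n in wikipedia_names
           if normalise_name(n) in lookup and lookup[normalise_name(n)] != n}
print(renames)
```

The resulting dictionary could be passed to something like `df1['Constituency'].replace(renames)`, which avoids hard-coding list positions.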

In [10]:
columns1 = ['Constituency','Median House Price GBP','18/19 % Change']
ConstituencyNames = df1['Constituency'].tolist()

#Collect the matching rows in a list first (DataFrame.append was removed in pandas 2.0)
rows2 = []
for row in range(0,ConstituencyHP.nrows):
    #43709.0 is presumably the Excel serial date for 1 September 2019, i.e. the year ending September 2019
    if ConstituencyHP.cell_value(row, 1) in ConstituencyNames and ConstituencyHP.cell_value(row, 7) == 43709.0:
        Constituency = ConstituencyHP.cell_value(row, 1)
        AvHousePrice = ConstituencyHP.cell_value(row, 8)
        AvPerChange = ConstituencyHP.cell_value(row, 11)*100
        rows2.append([Constituency,AvHousePrice,AvPerChange])
df2 = pd.DataFrame(rows2, columns=columns1)
df2.shape

(73, 3)
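The value 43709.0 used to filter the rows above looks like an Excel serial date, since xlrd returns date cells as floats. Assuming the workbook uses Excel's default 1900 date system, the conversion can be sketched with the standard library as follows. The helper names are made up for the example.

```python
from datetime import date, timedelta

# In Excel's 1900 date system, serial numbers for dates after 1 March 1900
# count days from 30 December 1899 (the offset absorbs Excel's fictitious 29 February 1900).
EXCEL_EPOCH = date(1899, 12, 30)

def excel_serial_to_date(serial):
    """Convert an Excel serial number (whole days) to a calendar date."""
    return EXCEL_EPOCH + timedelta(days=int(serial))

def date_to_excel_serial(d):
    """Convert a calendar date to the float serial number that xlrd would return."""
    return float((d - EXCEL_EPOCH).days)

print(excel_serial_to_date(43709.0))
print(date_to_excel_serial(date(2019, 9, 1)))
```

Under that assumption 43709.0 maps to 1 September 2019, which fits the "year ending September 2019" data and the '18/19 % Change' column name.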

In [11]:
df = pd.merge(left = df1, right = df2)
df.head()

| | Constituency | Latitude | Longitude | Median House Price GBP | 18/19 % Change |
|---|---|---|---|---|---|
| 0 | Barking | 51.536563 | 0.075766 | 310000.0 | 1.639344 |
| 1 | Battersea | 51.472201 | -0.165547 | 725200.0 | 3.468447 |
| 2 | Beckenham | 51.40817 | -0.025813 | 485000.0 | -2.020202 |
| 3 | Bermondsey and Old Southwark | 51.49 | -0.07 | 600000.0 | 2.564103 |
| 4 | Bethnal Green and Bow | 51.530858 | -0.040193 | 536500.0 | 7.3 |


In [93]:
df.sort_values(by='18/19 % Change', inplace=True, ascending = False)
df.reset_index(inplace = True, drop = True)
df.head()

| | Constituency | Latitude | Longitude | Median House Price GBP | 18/19 % Change | Distance to centre |
|---|---|---|---|---|---|---|
| 0 | Greenwich and Woolwich | 51.483 | 0.028 | 515000.0 | 8.421053 | 11151.1903 |
| 1 | Bethnal Green and Bow | 51.530858 | -0.040193 | 536500.0 | 7.3 | 6616.858593 |
| 2 | Feltham and Heston | 51.46 | -0.412 | 359000.0 | 6.845238 | 20435.641039 |
| 3 | Dulwich and West Norwood | 51.447 | -0.084 | 550000.0 | 6.692532 | 7370.658052 |
| 4 | Hackney South and Shoreditch | 51.54 | -0.06 | 595000.0 | 6.25 | 5942.274411 |


We can see that Greenwich and Woolwich has the largest percentage increase in house prices, so it could potentially be a good place to open a restaurant.

Let's also add the distance from the centre of London.

In [13]:
import geopy.distance
LondonCoords = (LondonLat, LondonLng)
DistanceToCentre = []
for row in range(len(df['Constituency'])):
    ConstituencyCoords = (df['Latitude'][row], df['Longitude'][row])
    DistanceToCentre.append(geopy.distance.geodesic(LondonCoords, ConstituencyCoords).m)
df['Distance to centre'] = DistanceToCentre
df.head()

| | Constituency | Latitude | Longitude | Median House Price GBP | 18/19 % Change | Distance to centre |
|---|---|---|---|---|---|---|
| 0 | Greenwich and Woolwich | 51.483 | 0.028 | 515000.0 | 8.421053 | 11151.1903 |
| 1 | Bethnal Green and Bow | 51.530858 | -0.040193 | 536500.0 | 7.3 | 6616.858593 |
| 2 | Feltham and Heston | 51.46 | -0.412 | 359000.0 | 6.845238 | 20435.641039 |
| 3 | Dulwich and West Norwood | 51.447 | -0.084 | 550000.0 | 6.692532 | 7370.658052 |
| 4 | Hackney South and Shoreditch | 51.54 | -0.06 | 595000.0 | 6.25 | 5942.274411 |


Here we can see that although it has seen the most growth in house prices, Greenwich and Woolwich is relatively far from the city centre. Perhaps Bethnal Green and Bow would be a better location for a restaurant as it is closer to the city centre.
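The distances above come from `geopy.distance.geodesic`, which works on an ellipsoid. For a quick sanity check without geopy, the haversine formula on a sphere gives a close approximation. The sketch below is that swapped-in formula, not geopy's method, and the centre of London used here is an assumed approximate value rather than the geocoder's result.

```python
import math

def haversine_m(lat1, lng1, lat2, lng2, radius_m=6371008.8):
    """Great-circle distance in metres between two points on a sphere (mean Earth radius)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

# Assumed approximate centre of London (the notebook obtains its own value from the geocoder)
london_centre = (51.5074, -0.1278)
# Coordinates of Bethnal Green and Bow as listed in the table above
bethnal_green_and_bow = (51.530858, -0.040193)

d = haversine_m(*london_centre, *bethnal_green_and_bow)
print(round(d), 'metres')
```

The result should land within a fraction of a percent of the geodesic value in the table, which is more than enough precision for comparing constituencies.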

## Information on nearby venues in the constituencies

Let's use Foursquare to get some information on the restaurants in London and their locations. We will see if we can use this information to help us decide on the best location to open a Thai restaurant.

In [14]:
#Input your Foursquare credentials here
ClientID = ''
ClientSecret = ''
VERSION = '20180605'

In [15]:
# Food https://developer.foursquare.com/docs/build-with-foursquare/categories/
FoodCategories = '4d4b7105d754a06374d81259'
ThaiRestaurantCategories = ['4bf58dd8d48988d149941735', '56aa371be4b08b9a8d573502']

In [16]:
def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def getNearbyVenues(names, lats, lngs, category, radius = 4000, limit = 150):
    venues_list=[]
    for name, lat, lng in zip(names, lats, lngs):
        print(name)
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
            ClientID, ClientSecret, VERSION, lat, lng, category, radius, limit)
        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(name, 
                             lat, 
                             lng,
                             v['venue']['id'],
                             v['venue']['name'],
                             v['venue']['location']['lat'], 
                             v['venue']['location']['lng'],
                             v['venue']['categories'][0]['id'],
                             v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Constituency', 
                  'Constituency_Lat',
                  'Constituency_Long', 
                  'Venue_ID',
                  'Venue',
                  'Venue_Lat',
                  'Venue_Long',
                  'Venue_Cat_ID',
                  'Venue_Cat']
    
    return nearby_venues
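The function above builds the request URL with string formatting and indexes straight into the JSON response. As a hedged sketch of a slightly more defensive variant, the query string can be built with `urllib.parse.urlencode` and the items extracted with a fallback to an empty list. `build_explore_url` and `extract_items` are hypothetical helpers; the endpoint and parameter names are the ones used in the function above, and the credentials are placeholders.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def build_explore_url(client_id, client_secret, version, lat, lng, category, radius=4000, limit=150):
    """Build the explore URL used above, letting urlencode handle the escaping."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'categoryId': category,
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

def extract_items(payload):
    """Return the venue items from a response dict, or [] if the expected keys are missing."""
    groups = payload.get('response', {}).get('groups', [])
    return groups[0].get('items', []) if groups else []

# Placeholder credentials; the coordinates are those of Greenwich and Woolwich from the table above
example_url = build_explore_url('ID', 'SECRET', '20180605', 51.483, 0.028, '4d4b7105d754a06374d81259')
query = parse_qs(urlparse(example_url).query)
print(query['ll'], query['radius'])
print(extract_items({'response': {}}))
```

With a fallback like this, a response without venue groups yields no rows for that constituency instead of raising a `KeyError` part-way through the loop.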

In [17]:
latitudes = df['Latitude'].tolist()
longitudes = df['Longitude'].tolist()
LondonVenues = getNearbyVenues(df['Constituency'].tolist(),latitudes,longitudes,FoodCategories)

Greenwich and Woolwich
Bethnal Green and Bow
Feltham and Heston
Dulwich and West Norwood
Hackney South and Shoreditch
Lewisham, Deptford
Brent North
Lewisham West and Penge
West Ham
Enfield, Southgate
Dagenham and Rainham
Croydon North
Ruislip, Northwood and Pinner
Battersea
Croydon South
Westminster North
Lewisham East
Hayes and Harlington
Hendon
Putney
Hornchurch and Upminster
Bermondsey and Old Southwark
Croydon Central
Enfield North
Camberwell and Peckham
Eltham
Edmonton
Hackney North and Stoke Newington
Barking
Tottenham
Richmond Park
Bexleyheath and Crayford
Ealing North
Bromley and Chislehurst
Ilford North
Hornsey and Wood Green
Carshalton and Wallington
Erith and Thamesmead
Wimbledon
Tooting
Leyton and Wanstead
Old Bexley and Sidcup
Walthamstow
Twickenham
Romford
Harrow East
Ealing, Southall
Vauxhall
Streatham
Hampstead and Kilburn
Islington South and Finsbury
Finchley and Golders Green
Orpington
Poplar and Limehouse
Chipping Barnet
Harrow West
Chingford and Woodford Green
Sutt

In [18]:
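#Load the venues cached in LondonVenues.csv by a previous run (skip this line on a first run, when the file does not exist yet)
LondonVenues = pd.read_csv('LondonVenues.csv', index_col = 0)
#Let's drop any duplicates due to overlapping search areas
LondonVenues.drop_duplicates(subset = 'Venue_Lat', keep = 'first', inplace=True)
LondonVenues.reset_index(inplace=True)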
print('Total No. of Restaurants found:', len(LondonVenues))
print('Total No. of Thai Restaurants found:', len(LondonVenues[LondonVenues['Venue_Cat_ID'] == '4bf58dd8d48988d149941735']))

Total No. of Restaurants found: 3282
Total No. of Thai Restaurants found: 64
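The cell above removes duplicates by comparing `Venue_Lat` only. A possible refinement, sketched here with plain Python on made-up rows, is to deduplicate on `Venue_ID` instead, since two different venues can share a latitude. The rows and the `drop_duplicate_rows` helper are hypothetical and only illustrate the difference between the two keys.

```python
# Made-up rows: the first two are the same venue returned by two overlapping searches,
# the third is a different venue that happens to share the same latitude.
venues = [
    {'Venue_ID': 'a1', 'Venue': 'Cafe One', 'Venue_Lat': 51.5, 'Venue_Long': -0.1},
    {'Venue_ID': 'a1', 'Venue': 'Cafe One', 'Venue_Lat': 51.5, 'Venue_Long': -0.1},
    {'Venue_ID': 'b2', 'Venue': 'Bistro Two', 'Venue_Lat': 51.5, 'Venue_Long': -0.2},
]

def drop_duplicate_rows(rows, key):
    """Keep the first row seen for each distinct value of `key`."""
    seen = set()
    kept = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            kept.append(row)
    return kept

by_latitude = drop_duplicate_rows(venues, 'Venue_Lat')
by_venue_id = drop_duplicate_rows(venues, 'Venue_ID')
print(len(by_latitude), len(by_venue_id))
```

In pandas the equivalent change would be `subset = 'Venue_ID'` in `drop_duplicates`; whether it alters the counts reported above depends on the data.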


In [19]:
#Save to a local folder
LondonVenues.to_csv('LondonVenues.csv',index=False)

Let's see what the data looks like.

In [20]:
LondonVenues.head()

| | Constituency | Constituency_Lat | Constituency_Long | Venue_ID | Venue | Venue_Lat | Venue_Long | Venue_Cat_ID | Venue_Cat |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Greenwich and Woolwich | 51.489475 | 0.067588 | 5a82a793c97f285a363851db | Boulangerie Jade | 51.492575 | 0.070559 | 4bf58dd8d48988d16a941735 | Bakery |
| 1 | Greenwich and Woolwich | 51.489475 | 0.067588 | 4ef2461b93adff223e479e05 | Kailash Momo Restaurant | 51.48899 | 0.067385 | 4bf58dd8d48988d142941735 | Asian Restaurant |
| 2 | Greenwich and Woolwich | 51.489475 | 0.067588 | 557c3971498ec5857dd9bdf4 | The Plumstead Pantry | 51.481712 | 0.083707 | 4bf58dd8d48988d16d941735 | Café |
| 3 | Greenwich and Woolwich | 51.489475 | 0.067588 | 4c94a5576b35a143c5201ddc | Viet Baguette | 51.488502 | 0.067808 | 4bf58dd8d48988d14a941735 | Vietnamese Restaurant |
| 4 | Greenwich and Woolwich | 51.489475 | 0.067588 | 58681bde76f2ca03426c1b5d | Con Gusto | 51.495038 | 0.070682 | 4bf58dd8d48988d110941735 | Italian Restaurant |


We can plot these locations on a map. Thai restaurants are outlined in red and all other venues in blue.

In [21]:
#Create London map
map_London = folium.Map(location=[LondonLat, LondonLng], zoom_start=11, control_scale=True)

for lat, lng, label, categoryID in zip(LondonVenues['Venue_Lat'],LondonVenues['Venue_Long'],LondonVenues['Venue'],LondonVenues['Venue_Cat_ID']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='red' if categoryID == '4bf58dd8d48988d149941735' else 'blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_London)

map_London

Let's look at a heat map of the restaurant density.

In [22]:
from folium.plugins import HeatMap

map_London = folium.Map(location=[LondonLat, LondonLng], zoom_start=13, control_scale=True)
HeatMap(LondonVenues[['Venue_Lat','Venue_Long']]).add_to(map_London)
map_London

We can do the same for the locations of Thai restaurants in London.

In [23]:
from folium.plugins import HeatMap
map_London = folium.Map(location=[LondonLat, LondonLng], zoom_start=11, control_scale=True)
HeatMap(LondonVenues[LondonVenues['Venue_Cat_ID'] == '4bf58dd8d48988d149941735'][['Venue_Lat','Venue_Long']]).add_to(map_London)
map_London

In [24]:
from pyproj import Proj
import math
# Convert between latitude/longitude and Cartesian x, y coordinates (UTM zone 30 covers central London).
latlonp = Proj(proj="utm", zone=30, ellps="WGS84")

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

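# City centre in UTM Cartesian coordinates.
# NOTE: pyproj's Proj expects (longitude, latitude) and its inverse returns (longitude, latitude).
# Here the arguments are passed as (latitude, longitude) and the inverse below is unpacked as (lat, lon),
# so the two swaps cancel, but the grid spacing on the ground is not the nominal 600 m
# (neighbouring points in the output tables are roughly 350 m apart).
London_center_x, London_center_y = latlonp(LondonLat, LondonLng)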

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = London_center_x - 6000
x_step = 600
y_min = London_center_y - 20000 - (int(21/k)*k*600 - 12000)/2
y_step = 600 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(60/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(London_center_x, London_center_y, x, y)
        if (distance_from_center <= 60001):
            lat, lon = latlonp(x, y, inverse = True)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)
#Create London map (LondonLat and LondonLng were found when the first map was created)
map_London = folium.Map(location=[LondonLat, LondonLng], zoom_start=13, control_scale=True)
for lat, lon in zip(latitudes, longitudes):
    folium.CircleMarker([lat, lon], radius=0.5, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_London) 
print(len(latitudes), 'candidate neighborhood centers generated.')
map_London

1449 candidate neighborhood centers generated.
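The grid above is hexagonal: every other row is shifted by half a step, and rows are spaced by sqrt(3)/2 of the step so that each point is the same distance from all of its neighbours. A minimal sketch of that construction in plain Cartesian metres, without any map projection, is given below. The `hex_grid` function, its parameters and the 6 km extent are assumptions for illustration, not the notebook's exact grid.

```python
import math

def hex_grid(center_x, center_y, half_width, half_height, spacing):
    """Return (x, y) points of a hexagonal grid covering a rectangle around the centre."""
    k = math.sqrt(3) / 2                      # ratio of row spacing to point spacing
    row_step = spacing * k
    n_rows = int(2 * half_height / row_step) + 1
    n_cols = int(2 * half_width / spacing) + 1
    points = []
    for i in range(n_rows):
        y = center_y - half_height + i * row_step
        # Shift every other row by half a step to get the hexagonal packing
        x_offset = spacing / 2 if i % 2 == 0 else 0.0
        for j in range(n_cols):
            x = center_x - half_width + j * spacing + x_offset
            points.append((x, y))
    return points

grid = hex_grid(0.0, 0.0, half_width=6000, half_height=6000, spacing=600)
nearest = min(math.dist(grid[0], p) for p in grid[1:])
print(len(grid), 'points, nearest neighbour at', round(nearest, 1), 'm')
```

Checking the nearest-neighbour distance like this is a cheap way to confirm that the spacing is what was intended before projecting the points back to latitude and longitude.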


Now let's filter these locations by finding places with no restaurants within a 400 m radius.

In [26]:
import heapq

closestrestlat1 = []
closestrestlng1 = []
closestrestlat2 = []
closestrestlng2 = []
for loc in range(len(latitudes)):
    #Use the Manhattan distance (in degrees) as a quicker way to find the closest points; the actual distance is calculated afterwards
    Manhattan_distance = []
    for rest in range(len(LondonVenues['Venue_Lat'])):
        Manhattan_distance.append(abs(LondonVenues.Venue_Lat.values[rest] - latitudes[loc]) + abs(LondonVenues.Venue_Long.values[rest] - longitudes[loc]))
    #Get the indices of the smallest and second smallest values
    closestindex = heapq.nsmallest(2, range(len(Manhattan_distance)), key=Manhattan_distance.__getitem__)
    #Add the restaurant latitudes and longitudes to the lists
    closestrestlat1.append(LondonVenues.Venue_Lat.values[closestindex[0]])
    closestrestlng1.append(LondonVenues.Venue_Long.values[closestindex[0]])
    closestrestlat2.append(LondonVenues.Venue_Lat.values[closestindex[1]])
    closestrestlng2.append(LondonVenues.Venue_Long.values[closestindex[1]])

In [27]:
#lets find the actual distance to the nearest restaurants in metres
distance1 = []
distance2 = []
for loc in range(len(latitudes)):
    distance1.append(geopy.distance.geodesic((closestrestlat1[loc],closestrestlng1[loc]),(latitudes[loc],longitudes[loc])).m)
    distance2.append(geopy.distance.geodesic((closestrestlat2[loc],closestrestlng2[loc]),(latitudes[loc],longitudes[loc])).m)

In [28]:
NRcolumns = ['Location_Lat','Location_Lng','Restaurant_Lat(1)','Restaurant_Lng(1)','Distance(1)','Restaurant_Lat(2)','Restaurant_Lng(2)','Distance(2)']
NearestRest = pd.DataFrame(columns = NRcolumns)
NearestRest['Location_Lat'] = latitudes
NearestRest['Location_Lng'] = longitudes
NearestRest['Restaurant_Lat(1)'] = closestrestlat1
NearestRest['Restaurant_Lng(1)'] = closestrestlng1
NearestRest['Distance(1)'] = distance1
NearestRest['Restaurant_Lat(2)'] = closestrestlat2
NearestRest['Restaurant_Lng(2)'] = closestrestlng2
NearestRest['Distance(2)'] = distance2
NearestRest.head()

| | Location_Lat | Location_Lng | Restaurant_Lat(1) | Restaurant_Lng(1) | Distance(1) | Restaurant_Lat(2) | Restaurant_Lng(2) | Distance(2) |
|---|---|---|---|---|---|---|---|---|
| 0 | 51.478268 | -0.233514 | 51.47488 | -0.239207 | 546.368271 | 51.474705 | -0.241282 | 669.618551 |
| 1 | 51.48138 | -0.233495 | 51.482088 | -0.223215 | 718.48807 | 51.484034 | -0.224425 | 695.897933 |
| 2 | 51.484491 | -0.233477 | 51.492641 | -0.232788 | 908.021013 | 51.484034 | -0.224425 | 630.875312 |
| 3 | 51.487603 | -0.233459 | 51.492641 | -0.232788 | 562.527589 | 51.492917 | -0.235096 | 602.132442 |
| 4 | 51.490714 | -0.233441 | 51.492641 | -0.232788 | 219.198618 | 51.492917 | -0.235096 | 270.767276 |
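In rows 1 and 2 of this table, Distance(2) is smaller than Distance(1): the Manhattan distance in degrees does not always rank venues in the same order as the distance in metres, because a degree of longitude is shorter than a degree of latitude at London's latitude. A minimal sketch that ranks directly by an approximate distance in metres is shown below. It uses the haversine formula rather than geopy, the `two_nearest` helper is hypothetical, and the third venue is an invented far-away point.

```python
import heapq
import math

def haversine_m(lat1, lng1, lat2, lng2, radius_m=6371008.8):
    """Great-circle distance in metres between two points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

def two_nearest(location, venues):
    """Return [(distance_m, venue), ...] for the two venues closest to `location`."""
    return heapq.nsmallest(2, ((haversine_m(*location, *v), v) for v in venues))

# Row 1 of the table above: the candidate location and its two recorded restaurants
location = (51.48138, -0.233495)
venue_a = (51.482088, -0.223215)   # listed as restaurant (1)
venue_b = (51.484034, -0.224425)   # listed as restaurant (2)
venue_far = (51.60, 0.10)          # invented far-away venue, for illustration only

closest = two_nearest(location, [venue_a, venue_b, venue_far])
manhattan_a = abs(venue_a[0] - location[0]) + abs(venue_a[1] - location[1])
manhattan_b = abs(venue_b[0] - location[0]) + abs(venue_b[1] - location[1])
print(closest[0][1] == venue_b, manhattan_a < manhattan_b)
```

For this row the Manhattan ranking prefers the first venue while the metre-based ranking prefers the second, so filtering on the smaller of Distance(1) and Distance(2) would be the safer test for the 400 m rule.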


We can filter this in order to find the locations that do not have a restaurant within a 400 m radius.

In [69]:
Acceptable_Locations = NearestRest[NearestRest['Distance(1)'] > 400]
Acceptable_Locations

| | Location_Lat | Location_Lng | Restaurant_Lat(1) | Restaurant_Lng(1) | Distance(1) | Restaurant_Lat(2) | Restaurant_Lng(2) | Distance(2) |
|---|---|---|---|---|---|---|---|---|
| 0 | 51.478268 | -0.233514 | 51.474880 | -0.239207 | 546.368271 | 51.474705 | -0.241282 | 669.618551 |
| 1 | 51.481380 | -0.233495 | 51.482088 | -0.223215 | 718.488070 | 51.484034 | -0.224425 | 695.897933 |
| 2 | 51.484491 | -0.233477 | 51.492641 | -0.232788 | 908.021013 | 51.484034 | -0.224425 | 630.875312 |
| 3 | 51.487603 | -0.233459 | 51.492641 | -0.232788 | 562.527589 | 51.492917 | -0.235096 | 602.132442 |
| 7 | 51.500046 | -0.233387 | 51.506478 | -0.233772 | 716.185843 | 51.494477 | -0.231942 | 627.603220 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1438 | 51.508734 | -0.048976 | 51.498645 | -0.048972 | 1122.520922 | 51.513190 | -0.042932 | 649.472112 |
| 1439 | 51.511844 | -0.048972 | 51.513190 | -0.042932 | 445.257087 | 51.512112 | -0.040804 | 567.865605 |
| 1440 | 51.514953 | -0.048968 | 51.513190 | -0.042932 | 462.658064 | 51.521480 | -0.047662 | 731.826820 |
| 1446 | 51.533603 | -0.048945 | 51.533421 | -0.042884 | 421.115835 | 51.529201 | -0.046631 | 515.447665 |


In [30]:
map_London = folium.Map(location=[LondonLat, LondonLng], zoom_start=13, control_scale=True)
for lat, lon in zip(Acceptable_Locations['Location_Lat'], Acceptable_Locations['Location_Lng']):
    folium.CircleMarker([lat, lon], radius=0.5, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_London) 
HeatMap(LondonVenues[['Venue_Lat','Venue_Long']]).add_to(map_London)
map_London

Let's see if we can use k-means clustering to group these locations into neighbourhoods.

In [74]:
Acceptable_Locations.reset_index(drop=True)

| | Location_Lat | Location_Lng | Restaurant_Lat(1) | Restaurant_Lng(1) | Distance(1) | Restaurant_Lat(2) | Restaurant_Lng(2) | Distance(2) |
|---|---|---|---|---|---|---|---|---|
| 0 | 51.478268 | -0.233514 | 51.474880 | -0.239207 | 546.368271 | 51.474705 | -0.241282 | 669.618551 |
| 1 | 51.481380 | -0.233495 | 51.482088 | -0.223215 | 718.488070 | 51.484034 | -0.224425 | 695.897933 |
| 2 | 51.484491 | -0.233477 | 51.492641 | -0.232788 | 908.021013 | 51.484034 | -0.224425 | 630.875312 |
| 3 | 51.487603 | -0.233459 | 51.492641 | -0.232788 | 562.527589 | 51.492917 | -0.235096 | 602.132442 |
| 4 | 51.500046 | -0.233387 | 51.506478 | -0.233772 | 716.185843 | 51.494477 | -0.231942 | 627.603220 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 481 | 51.508734 | -0.048976 | 51.498645 | -0.048972 | 1122.520922 | 51.513190 | -0.042932 | 649.472112 |
| 482 | 51.511844 | -0.048972 | 51.513190 | -0.042932 | 445.257087 | 51.512112 | -0.040804 | 567.865605 |
| 483 | 51.514953 | -0.048968 | 51.513190 | -0.042932 | 462.658064 | 51.521480 | -0.047662 | 731.826820 |
| 484 | 51.533603 | -0.048945 | 51.533421 | -0.042884 | 421.115835 | 51.529201 | -0.046631 | 515.447665 |


In [88]:
from sklearn.cluster import KMeans

number_of_clusters = 15

kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(Acceptable_Locations[['Location_Lat','Location_Lng']].values)
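To make clear what the clustering step does, here is a plain-Python sketch of the standard k-means iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points). It is not scikit-learn's implementation: the fixed starting centroids, the fixed number of iterations and the six sample points taken from the table above are assumptions for illustration. Like the call above, it treats latitude and longitude as plain Euclidean coordinates.

```python
import math

def simple_kmeans(points, centroids, iterations=10):
    """Run a fixed number of k-means iterations and return (labels, centroids)."""
    centroids = list(centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: give each point the index of its nearest centroid
        labels = [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of the points assigned to it
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centroids

# Six candidate locations taken from the table above (three western, three eastern)
points = [
    (51.478268, -0.233514), (51.481380, -0.233495), (51.484491, -0.233477),
    (51.508734, -0.048976), (51.511844, -0.048972), (51.514953, -0.048968),
]
labels, centroids = simple_kmeans(points, centroids=[points[0], points[-1]])
print(labels)
```

With the fitted scikit-learn model, the corresponding information is available as `kmeans.labels_` and `kmeans.cluster_centers_`, which could be added to `Acceptable_Locations` or plotted on the map.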

## Results & Discussion
Our analysis shows a number of potential locations to open a restaurant, particularly in the constituency of Camberwell and Peckham, which is south of the river.
There are also a number of promising locations in Bethnal Green and Bow, which showed the second-largest increase in median house price and is relatively close to the centre. Hence, by the methodology above, this is the best place to open a restaurant in London.

## Conclusion

I have found a number of locations to open a restaurant that do not have another restaurant within a 400 m radius. I then grouped these locations into neighbourhoods using k-means clustering. Further work would rank these locations by their proximity to the centre of London and by how much house prices increased in the last year.