# IBM Data Science Capstone Project

#### This project aims to find an ideal location for a luxury hotel in London. Given the popularity of parkside hotels in London, it analyses London neighbourhoods to find those with many parks, little hotel competition, and venues that suit the hotel's target demographic of tourist families.

Importing libraries and scraping data from a Wikipedia page

In [163]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [164]:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import requests

In [165]:
data = requests.get('https://en.wikipedia.org/wiki/List_of_places_in_London').text


In [166]:
soup = BeautifulSoup(data, 'html.parser')

In [167]:
neighbourhoodList=[]

In [168]:
for row in soup.find('table').find_all('tr'):
    cells = row.find_all('td')
    if(len(cells) > 0):
        neighbourhoodList.append(cells[1].text)

In [169]:
df = pd.DataFrame({"Neighbourhood": neighbourhoodList})

df.head()

Unnamed: 0,Neighbourhood
0,Barking and Dagenham
1,Barnet
2,Bexley
3,Brent
4,Bromley


Stored all 32 neighbourhoods in London in a dataframe

In [170]:
df

Unnamed: 0,Neighbourhood
0,Barking and Dagenham
1,Barnet
2,Bexley
3,Brent
4,Bromley
5,Camden
6,Croydon
7,Ealing
8,Enfield
9,Greenwich


In [171]:
import numpy as np 
import json 

from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
!pip install folium

import folium 

print("Libraries imported.")

Libraries imported.


Finding the geographical coordinates for each neighbourhood and storing them in a dataframe df_coords

In [172]:
#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim 
#!pip install geocoder
import geocoder

In [173]:
def get_latlng(neighbourhood):
    
    lat_lng_coords = None
    
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, London, United Kingdom'.format(neighbourhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords



In [174]:
coords = [ get_latlng(neighbourhood) for neighbourhood in df["Neighbourhood"].tolist() ]

In [175]:
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [176]:
df_coords.shape

(32, 2)

Adding the coordinates to df

In [177]:
df['Latitude'] = df_coords['Latitude']
df['Longitude'] = df_coords['Longitude']

Finding the geographical coordinates of London

In [178]:
address = 'London, United Kingdom'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London are {}, {}.'.format(latitude, longitude))

The geographical coordinates of London are 51.5073219, -0.1276474.


Displaying a map of London with neighbourhood markers superimposed on it

In [179]:
londonmap = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, lng, neighbourhood in zip(df['Latitude'], df['Longitude'], df['Neighbourhood']):
    label = '{}'.format(neighbourhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(londonmap)  
    
londonmap

Using the Foursquare API to explore the neighbourhoods

In [180]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605'
LIMIT=100
radius=500
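
Hard-coding API credentials in a notebook exposes them to anyone who can read the file. One possible alternative, sketched here under the assumption that the credentials are exported as environment variables before the notebook starts, is to read them at run time. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are hypothetical.

```python
import os

# Hypothetical variable names; set them in the shell before starting the notebook.
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")

# Warn early instead of failing later inside the request loop.
credentials_present = bool(CLIENT_ID and CLIENT_SECRET)
if not credentials_present:
    print("Foursquare credentials are not set; API calls would fail.")
```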

In [181]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighbourhood in zip(df['Latitude'], df['Longitude'], df['Neighbourhood']):
    
    
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    
    for venue in results:
        venues.append((
            neighbourhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
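
The request URL above is assembled with `str.format`, which inserts the values without escaping them. A small sketch of the same URL built from a parameter dictionary with the standard library's `urlencode` is shown below. The helper name `build_explore_url` and the placeholder credentials `"ID"` and `"SECRET"` are assumptions for illustration; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore URL, letting urlencode escape each value."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


# Placeholder credentials; coordinates are those of Barking and Dagenham above.
example_url = build_explore_url("ID", "SECRET", "20180605", 51.543932, 0.133157, 2000, 100)

# Parsing the query string back shows the values survive the round trip.
query = parse_qs(urlsplit(example_url).query)
```

Keeping the parameters in a dictionary also makes it easier to see at a glance which radius and limit each request uses.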

In [182]:
venues_df = pd.DataFrame(venues)


venues_df.columns = ['Neighbourhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(2944, 7)


Unnamed: 0,Neighbourhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Barking and Dagenham,51.543932,0.133157,Capital Karts,51.531792,0.118739,Go Kart Track
1,Barking and Dagenham,51.543932,0.133157,Mayesbrook Park,51.549842,0.108544,Park
2,Barking and Dagenham,51.543932,0.133157,Vue,51.532149,0.135,Movie Theater
3,Barking and Dagenham,51.543932,0.133157,Co-op Food,51.540093,0.127522,Grocery Store
4,Barking and Dagenham,51.543932,0.133157,Wilko,51.541002,0.148898,Furniture / Home Store


Viewing all unique categories of venues

In [183]:
venues_df['VenueCategory'].unique()

array(['Go Kart Track', 'Park', 'Movie Theater', 'Grocery Store',
       'Furniture / Home Store', 'Supermarket', 'Hotel', 'Pizza Place',
       'Pub', 'Bowling Alley', 'Bus Stop', 'Fast Food Restaurant',
       'Soccer Field', 'Gym / Fitness Center', 'Rugby Pitch', 'Gym',
       'Library', 'Chinese Restaurant', 'Skate Park', 'History Museum',
       'Home Service', 'Food & Drink Shop', 'Soccer Stadium',
       'Golf Course', 'Bakery', 'Café', 'Juice Bar', 'Farm',
       'Sandwich Place', 'Indian Restaurant', 'Argentinian Restaurant',
       'Italian Restaurant', 'Restaurant', 'Pharmacy', 'Bookstore',
       'Clothing Store', 'Sushi Restaurant', 'English Restaurant',
       'Fish & Chips Shop', 'Convenience Store', 'Stationery Store',
       'Middle Eastern Restaurant', 'Mediterranean Restaurant',
       'Train Station', 'Coffee Shop', 'Campground', 'Athletics & Sports',
       'Steakhouse', 'Performing Arts Venue', 'Turkish Restaurant',
       'Breakfast Spot', 'Greek Restaurant', 'Ic

In [184]:
venues_df.groupby(["Neighbourhood"]).count()

Neighbourhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Barking and Dagenham,37,37,37,37,37,37
Barnet,44,44,44,44,44,44
Bexley,77,77,77,77,77,77
Brent,86,86,86,86,86,86
Bromley,58,58,58,58,58,58
Camden,100,100,100,100,100,100
Croydon,100,100,100,100,100,100
Ealing,100,100,100,100,100,100
Enfield,100,100,100,100,100,100
Greenwich,100,100,100,100,100,100


One-hot encoding the venue categories, then computing the mean frequency of each category in each neighbourhood

In [185]:
l_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

l_onehot['Neighbourhoods'] = venues_df['Neighbourhood'] 

fixed_columns = [l_onehot.columns[-1]] + list(l_onehot.columns[:-1])
l_onehot = l_onehot[fixed_columns]

print(l_onehot.shape)
l_onehot.head()

(2944, 275)


Unnamed: 0,Neighbourhoods,Afghan Restaurant,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Vietnamese Restaurant,Warehouse Store,Waterfront,Wine Bar,Wine Shop,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo Exhibit
0,Barking and Dagenham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Barking and Dagenham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Barking and Dagenham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Barking and Dagenham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Barking and Dagenham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [186]:
l_grouped = l_onehot.groupby(["Neighbourhoods"]).mean().reset_index()

print(l_grouped.shape)
l_grouped

(32, 275)


Unnamed: 0,Neighbourhoods,Afghan Restaurant,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Vietnamese Restaurant,Warehouse Store,Waterfront,Wine Bar,Wine Shop,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo Exhibit
0,Barking and Dagenham,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Barnet,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bexley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Brent,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011628,...,0.011628,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bromley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.017241,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Camden,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,...,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0
6,Croydon,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Ealing,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,...,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0
8,Enfield,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,...,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0
9,Greenwich,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.01,...,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01
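
Each row of `l_onehot` contains a single 1, so the per-neighbourhood mean of a column is the share of that neighbourhood's returned venues that fall in the category. A toy sketch of the same two steps is given below; the neighbourhood names `A` and `B` and their venues are made up for illustration.

```python
import pandas as pd

# Made-up venues: neighbourhood A has 4 venues, B has 2.
toy = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "A", "B", "B"],
    "VenueCategory": ["Park", "Pub", "Park", "Café", "Pub", "Hotel"],
})

# One column per category, one 1 per row.
onehot = pd.get_dummies(toy["VenueCategory"], dtype=int)
onehot.insert(0, "Neighbourhood", toy["Neighbourhood"])

# Mean of the 0/1 columns = share of venues in each category.
shares = onehot.groupby("Neighbourhood").mean()
park_share_a = shares.loc["A", "Park"]  # 2 parks out of 4 venues
```

Because these are shares rather than counts, a neighbourhood with few returned venues can show a high park frequency from only one or two parks.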


Creating a separate dataframe holding the mean frequency of parks in each neighbourhood

In [187]:
l_park = l_grouped[["Neighbourhoods","Park"]]

Printing the mean frequency of parks in each neighbourhood

In [188]:
num_top_venues = 5

for hood in l_park['Neighbourhoods']:
    print("----"+hood+"----")
    temp = l_park[l_park['Neighbourhoods'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Barking and Dagenham----
  venue  freq
0  Park  0.05


----Barnet----
  venue  freq
0  Park  0.05


----Bexley----
  venue  freq
0  Park  0.04


----Brent----
  venue  freq
0  Park  0.03


----Bromley----
  venue  freq
0  Park  0.09


----Camden----
  venue  freq
0  Park   0.0


----Croydon----
  venue  freq
0  Park  0.05


----Ealing----
  venue  freq
0  Park  0.05


----Enfield----
  venue  freq
0  Park  0.03


----Greenwich----
  venue  freq
0  Park  0.06


----Hackney----
  venue  freq
0  Park  0.02


----Hammersmith and Fulham----
  venue  freq
0  Park  0.04


----Haringey----
  venue  freq
0  Park  0.04


----Harrow----
  venue  freq
0  Park  0.01


----Havering----
  venue  freq
0  Park  0.01


----Hillingdon----
  venue  freq
0  Park  0.04


----Hounslow----
  venue  freq
0  Park  0.01


----Islington----
  venue  freq
0  Park   0.0


----Kensington and Chelsea----
  venue  freq
0  Park  0.08


----Kingston upon Thames----
  venue  freq
0  Park  0.03


----Lambeth----
  ven

Clustering the neighbourhoods into three clusters based on the mean frequency of parks in each neighbourhood

In [189]:
kclusters = 3

l_grouped_clustering = l_park.drop(["Neighbourhoods"], axis=1)

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(l_grouped_clustering)
l_merged = l_park.copy()

l_merged["Cluster Labels"] = kmeans.labels_
l_merged.rename(columns={"Neighbourhoods": "Neighbourhood"}, inplace=True)
l_merged.head()


Unnamed: 0,Neighbourhood,Park,Cluster Labels
0,Barking and Dagenham,0.054054,0
1,Barnet,0.045455,0
2,Bexley,0.038961,0
3,Brent,0.034884,0
4,Bromley,0.086207,2


Joining the two dataframes

In [190]:
l_merged = l_merged.join(df.set_index("Neighbourhood"), on="Neighbourhood")

print(l_merged.shape)
l_merged.head()

(32, 5)


Unnamed: 0,Neighbourhood,Park,Cluster Labels,Latitude,Longitude
0,Barking and Dagenham,0.054054,0,51.543932,0.133157
1,Barnet,0.045455,0,51.627294,-0.253759
2,Bexley,0.038961,0,51.622832,-0.080656
3,Brent,0.034884,0,51.609768,-0.194688
4,Bromley,0.086207,2,51.43182,-0.016566


Plotting a Folium map with markers of each cluster

In [191]:
from sklearn.cluster import KMeans

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0.0, 1.0, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

markers_colors = []
for lat, lon, poi, cluster in zip(l_merged['Latitude'], l_merged['Longitude'], l_merged['Neighbourhood'], l_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Cluster 1 has the lowest share of parks, Cluster 0 has an intermediate share, and Cluster 2 has the highest share of parks.
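
The label numbers that k-means assigns carry no order, so which cluster is the park-rich one has to be read off the data, as done here. A possible sketch for relabelling clusters by ascending centroid is shown below. The `centers` and `labels` arrays are hypothetical values for illustration; in the notebook they would presumably come from `kmeans.cluster_centers_.ravel()` and `kmeans.labels_`.

```python
import numpy as np

# Hypothetical values; in the notebook these would come from
# kmeans.cluster_centers_.ravel() and kmeans.labels_.
centers = np.array([0.043, 0.006, 0.074])
labels = np.array([0, 0, 2, 1, 1, 2])

order = np.argsort(centers)          # cluster ids from lowest to highest centroid
rank = np.empty_like(order)
rank[order] = np.arange(len(order))  # rank[c] = position of cluster c in that ordering

ranked_labels = rank[labels]         # 0 = lowest park share, 2 = highest
```

With ranked labels, the cluster with the highest park share always has the largest label, whatever numbering k-means happens to produce.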

In [192]:
l_merged.loc[l_merged['Cluster Labels'] == 0, l_merged.columns[[0] + list(range(1, l_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Park,Cluster Labels,Latitude,Longitude
0,Barking and Dagenham,0.054054,0,51.543932,0.133157
1,Barnet,0.045455,0,51.627294,-0.253759
2,Bexley,0.038961,0,51.622832,-0.080656
3,Brent,0.034884,0,51.609768,-0.194688
6,Croydon,0.05,0,51.593209,-0.08339
7,Ealing,0.05,0,51.51406,-0.30073
8,Enfield,0.03,0,51.540021,-0.077501
11,Hammersmith and Fulham,0.04,0,51.48269,-0.21291
12,Haringey,0.04,0,51.589264,-0.106405
15,Hillingdon,0.04,0,51.484225,-0.09648


In [193]:
l_merged.loc[l_merged['Cluster Labels'] == 1, l_merged.columns[[0] + list(range(1, l_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Park,Cluster Labels,Latitude,Longitude
5,Camden,0.0,1,51.53236,-0.12796
10,Hackney,0.02,1,51.54505,-0.05532
13,Harrow,0.01,1,51.51318,-0.10698
14,Havering,0.01,1,51.544605,-0.144105
16,Hounslow,0.011364,1,51.471391,-0.351375
17,Islington,0.0,1,51.53279,-0.10614
23,Newham,0.010989,1,51.517368,0.022979
26,Southwark,0.0,1,51.50541,-0.08921
27,Sutton,0.01,1,51.490987,-0.167417
28,Tower Hamlets,0.0,1,51.52022,-0.05431


In [194]:
l_merged.loc[l_merged['Cluster Labels'] == 2, l_merged.columns[[0] + list(range(1, l_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Park,Cluster Labels,Latitude,Longitude
4,Bromley,0.086207,2,51.43182,-0.016566
9,Greenwich,0.06,2,51.48454,0.00275
18,Kensington and Chelsea,0.08,2,51.51038,-0.33147
20,Lambeth,0.07,2,51.49084,-0.11108
25,Richmond upon Thames,0.08,2,51.48021,-0.23718
29,Waltham Forest,0.079365,2,51.581761,-0.276969
30,Wandsworth,0.06,2,51.45682,-0.19452


Converting Cluster 2, the one with the most parks, into a dataframe to further analyse which neighbourhood would be ideal for a hotel

In [195]:
park_df = l_merged.loc[l_merged['Cluster Labels'] == 2].copy()
park_df = park_df.drop(['Park', 'Cluster Labels'], axis=1)
park_df


Unnamed: 0,Neighbourhood,Latitude,Longitude
4,Bromley,51.43182,-0.016566
9,Greenwich,51.48454,0.00275
18,Kensington and Chelsea,51.51038,-0.33147
20,Lambeth,51.49084,-0.11108
25,Richmond upon Thames,51.48021,-0.23718
29,Waltham Forest,51.581761,-0.276969
30,Wandsworth,51.45682,-0.19452


In [196]:
newdf=park_df
newdf

Unnamed: 0,Neighbourhood,Latitude,Longitude
4,Bromley,51.43182,-0.016566
9,Greenwich,51.48454,0.00275
18,Kensington and Chelsea,51.51038,-0.33147
20,Lambeth,51.49084,-0.11108
25,Richmond upon Thames,51.48021,-0.23718
29,Waltham Forest,51.581761,-0.276969
30,Wandsworth,51.45682,-0.19452


In [197]:
l_grouped

Unnamed: 0,Neighbourhoods,Afghan Restaurant,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Vietnamese Restaurant,Warehouse Store,Waterfront,Wine Bar,Wine Shop,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo Exhibit
0,Barking and Dagenham,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Barnet,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bexley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Brent,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011628,...,0.011628,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bromley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.017241,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Camden,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,...,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0
6,Croydon,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Ealing,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,...,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0
8,Enfield,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,...,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0
9,Greenwich,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.01,...,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01


Selecting rows from l_grouped which match the neighbourhoods in Cluster 2

In [203]:
newdf=l_grouped.iloc[[4,9,18,20,25,29,30]]
newdf

Unnamed: 0,Neighbourhoods,Afghan Restaurant,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Vietnamese Restaurant,Warehouse Store,Waterfront,Wine Bar,Wine Shop,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo Exhibit
4,Bromley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.017241,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Greenwich,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.01,...,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01
18,Kensington and Chelsea,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,...,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0
20,Lambeth,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.01,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0
25,Richmond upon Thames,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,...,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0
29,Waltham Forest,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
30,Wandsworth,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
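
The row positions passed to `iloc` are typed in by hand, so they would silently select the wrong rows if the clustering result changed. A minimal sketch of selecting by name instead is shown below, using a small made-up frame; `cluster_names` stands in for the `Neighbourhood` column of `park_df`.

```python
import pandas as pd

# Small stand-in for l_grouped.
grouped = pd.DataFrame({
    "Neighbourhoods": ["Barnet", "Bromley", "Camden", "Greenwich"],
    "Park": [0.045455, 0.086207, 0.0, 0.06],
})

# Stand-in for park_df['Neighbourhood'].
cluster_names = ["Bromley", "Greenwich"]

# Keep only the rows whose neighbourhood is in the chosen cluster.
selected = grouped[grouped["Neighbourhoods"].isin(cluster_names)]
```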


Checking the mean frequency of hotels in these neighbourhoods

In [204]:
l_hotel=newdf[["Neighbourhoods","Hotel"]]
l_hotel.head(7)

Unnamed: 0,Neighbourhoods,Hotel
4,Bromley,0.0
9,Greenwich,0.0
18,Kensington and Chelsea,0.04
20,Lambeth,0.07
25,Richmond upon Thames,0.01
29,Waltham Forest,0.0
30,Wandsworth,0.0


Bromley, Greenwich, Waltham Forest and Wandsworth have no hotels among the venues returned by Foursquare, which suggests less competition for a new hotel. The analysis therefore continues with only these four neighbourhoods.

In [205]:
df2=newdf.iloc[[0,1,5,6]]
df2

Unnamed: 0,Neighbourhoods,Afghan Restaurant,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Vietnamese Restaurant,Warehouse Store,Waterfront,Wine Bar,Wine Shop,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo Exhibit
4,Bromley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.017241,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Greenwich,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.01,...,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01
29,Waltham Forest,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
30,Wandsworth,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
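
The four neighbourhoods can also be picked with a boolean mask on the hotel frequency rather than by position. The sketch below rebuilds the hotel frequencies shown in the `l_hotel` output as a small frame; the names `hotel_share` and `no_hotels` are illustrative.

```python
import pandas as pd

# Hotel frequencies copied from the l_hotel output above.
hotel_share = pd.DataFrame(
    {
        "Neighbourhoods": [
            "Bromley", "Greenwich", "Kensington and Chelsea", "Lambeth",
            "Richmond upon Thames", "Waltham Forest", "Wandsworth",
        ],
        "Hotel": [0.0, 0.0, 0.04, 0.07, 0.01, 0.0, 0.0],
    },
    index=[4, 9, 18, 20, 25, 29, 30],
)

# Keep the neighbourhoods where no returned venue is a hotel.
no_hotels = hotel_share[hotel_share["Hotel"] == 0]
```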


Finding the most common venues near these neighbourhoods to see if they match the demographic of tourist families

In [206]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [207]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['Neighbourhoods']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhoods'] = df2['Neighbourhoods']

for ind in np.arange(df2.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(df2.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighbourhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Bromley,Grocery Store,Park,Supermarket,Fast Food Restaurant,Pub,Bus Stop,Italian Restaurant,Coffee Shop,Train Station,Platform
9,Greenwich,Pub,Park,Garden,Grocery Store,Café,Turkish Restaurant,Scenic Lookout,Furniture / Home Store,French Restaurant,History Museum
29,Waltham Forest,Indian Restaurant,Supermarket,Grocery Store,Park,Pub,Sandwich Place,Fast Food Restaurant,Coffee Shop,Gym / Fitness Center,Café
30,Wandsworth,Pub,Coffee Shop,Café,Park,Pizza Place,Gym / Fitness Center,Supermarket,Thai Restaurant,Bakery,Gym
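
The column labels above get their suffix from a three-item list, falling back to "th" for every later position. That is correct for ten columns but would give "21th" for a longer ranking. A small sketch of a general ordinal helper follows; the function name `ordinal` is an assumption for illustration.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


# Same column labels as above, built with the helper.
columns = ["Neighbourhoods"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)
]
```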


Since the hotel's target demographic is tourist families, Bromley is a better match than the other three neighbourhoods. A train station and a bus stop both appear among its most common venues, which would be an important factor for tourists. Grocery stores and supermarkets are also more common in this neighbourhood than in the others. Its restaurants, such as fast food and Italian restaurants, cater more to families than the pubs and Turkish or French restaurants that feature in the other neighbourhoods. Waltham Forest is a close second in terms of shopping and restaurants, but Bromley has the added advantage of the train station and bus stop, which would make travel easier for tourists. A hotel close to both a station and a park would be a strong selling point for the company.

# Bromley is the ideal location for the specified hotel in London.