# Prospect of new restaurant business in London

## Introduction

### 1. Business Problem

London's restaurant business is thriving, thanks to the city's diverse culture and its status as a tourism hotspot. As a Londoner, I have often struggled to find a cuisine of my choice in certain parts of the city, which left me wondering about the business prospects of a restaurant in those areas. Rather than relying on guesswork to pick a location, this project uses machine learning and data available online to identify promising locations and cuisines for new restaurants.

### 2. Data

Information about London boroughs and locations is taken from https://en.wikipedia.org/wiki/List_of_areas_of_London. <br>
The Geopy library is used to get the coordinates of every location, and the Folium library is used to map the geospatial data on a London map.<br>
The Foursquare API is used to collect the top 100 restaurants and their categories within a 500-meter radius of each location.


### 3. Methodology

#### Install the packages required for Analysis

In [1]:
!conda install -c conda-forge wikipedia --yes
!conda install -c conda-forge folium=0.5.0 --yes
!conda install -c conda-forge geopy --yes
!conda install -c conda-forge geocoder --yes


#### Importing the packages required for Analysis

Before we get the data and start exploring it, let's import dependencies that we will need.

In [2]:
#Importing the packages required for Analysis
import requests
import pandas as pd
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import wikipedia as wp
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter 

The Python wikipedia library is used to scrape the Wikipedia page <a href="https://en.wikipedia.org/wiki/List_of_areas_of_London">List of areas of London</a>, which holds the neighborhood and borough information. <br><br>
For our analysis, we are only interested in the locations with Post town = 'LONDON'.

In [190]:
html = wp.page("List of areas of London").html().encode("UTF-8")
df_rawdata = pd.read_html(html)[1]
df=df_rawdata[df_rawdata['Post town'] =='LONDON']
df.head()

Unnamed: 0,Location,London borough,Post town,Postcode district,Dial code,OS grid ref
0,Abbey Wood,"Bexley, Greenwich [7]",LONDON,SE2,20,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4",20,TQ205805
6,Aldgate,City[10],LONDON,EC3,20,TQ334813
7,Aldwych,Westminster[10],LONDON,WC2,20,TQ307810
9,Anerley,Bromley[11],LONDON,SE20,20,TQ345695


In [191]:
# Check what the dataset looks like
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 299 entries, 0 to 530
Data columns (total 6 columns):
Location             299 non-null object
London borough       299 non-null object
Post town            299 non-null object
Postcode district    299 non-null object
Dial code            299 non-null object
OS grid ref          299 non-null object
dtypes: object(6)
memory usage: 16.4+ KB


In [193]:
# Rename the columns and clean the borough column: strip footnote markers such as [7] and keep only the first borough
df.columns = ['Location','London_borough','Post town','Postcode district','Dial Code','OS grid ref']
df['London_borough'] =  df['London_borough'].apply(lambda x: x.replace('[','').replace(']',''))
# regex=True is required for the digit pattern on recent pandas versions
df['London_borough'] =  df['London_borough'].str.replace(r'\d+', '', regex=True)
df['London_borough'] =  df['London_borough'].str.split(',').str[0]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  app.launch_new_instance()
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
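
The warning above appears because df is a filtered slice of df_rawdata, so pandas cannot tell whether the assignment is meant to reach the original frame. One common way to avoid it is to take an explicit copy before modifying the slice. The sketch below illustrates this on a small hand-made frame (the rows mimic the table above but are typed in by hand, and the names raw and london are assumptions of this example); it also folds the borough cleaning into a single regular expression.

```python
import pandas as pd

# Hand-typed rows for illustration; the real data comes from the Wikipedia table
raw = pd.DataFrame({
    "Location": ["Abbey Wood", "Acton", "Aldgate"],
    "London borough": ["Bexley, Greenwich [7]",
                       "Ealing, Hammersmith and Fulham[8]",
                       "City[10]"],
    "Post town": ["LONDON", "LONDON", "LONDON"],
})

# .copy() yields an independent frame, so later assignments are unambiguous
london = raw[raw["Post town"] == "LONDON"].copy()

london["London_borough"] = (
    london["London borough"]
    .str.replace(r"\[\d+\]", "", regex=True)  # drop footnote markers like [7]
    .str.split(",").str[0]                     # keep the first borough only
    .str.strip()                               # remove stray whitespace
)

print(london["London_borough"].tolist())
```

Because london is a copy, the new column is added only to it and raw is left untouched.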


In [194]:
# Drop the columns that are irrelevant for the analysis and add two new columns to store latitude and longitude
df.drop(['Postcode district','Dial Code','OS grid ref'],axis=1, inplace=True)
df['Latitude']=0
df['Longitude']=0



In [197]:
#Analyse the records in the dataset and the size of the data
df.head()

Unnamed: 0,Location,London_borough,Post town,Latitude,Longitude
0,Abbey Wood,Bexley,LONDON,0,0
1,Acton,Ealing,LONDON,0,0
6,Aldgate,City,LONDON,0,0
7,Aldwych,Westminster,LONDON,0,0
9,Anerley,Bromley,LONDON,0,0


In [198]:
#Check the size of the data
df.shape

(299, 5)

#### Convert an address into latitude and longitude values using geopy (Nominatim)

In [199]:
# Create the geocoder once; RateLimiter waits at least 1 second between requests
geolocator = Nominatim(user_agent="london_explorer")
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

latitude = []
longitude = []
for i in range(len(df)):
    # Build the query from the first three columns: Location, London_borough, Post town
    address = ','.join(map(str, df.iloc[i, :3]))
    location = geocode(address)  # None when no match is found
    if location is not None:
        latitude.append(location.latitude)
        longitude.append(location.longitude)
    else:
        latitude.append("Not Found")
        longitude.append("Not Found")
    
print('Done!')

Done!


In [202]:
#Assign the value of Longitude and latitude to the dataframe
df['Latitude']=latitude
df['Longitude']=longitude



In [203]:
#View the dataset
df.head()

Unnamed: 0,Location,London_borough,Post town,Latitude,Longitude
0,Abbey Wood,Bexley,LONDON,51.49,0.132891
1,Acton,Ealing,LONDON,51.5081,-0.273261
6,Aldgate,City,LONDON,51.5142,-0.0757186
7,Aldwych,Westminster,LONDON,51.5131,-0.117593
9,Anerley,Bromley,LONDON,51.4076,-0.0619394


In [204]:
#Dropping the index 
df.reset_index(drop=True, inplace=True)

In [205]:
df.head(5)

Unnamed: 0,Location,London_borough,Post town,Latitude,Longitude
0,Abbey Wood,Bexley,LONDON,51.49,0.132891
1,Acton,Ealing,LONDON,51.5081,-0.273261
2,Aldgate,City,LONDON,51.5142,-0.0757186
3,Aldwych,Westminster,LONDON,51.5131,-0.117593
4,Anerley,Bromley,LONDON,51.4076,-0.0619394


In [206]:
# Back up the data as a pickle file and as a CSV file
df.to_pickle('London.pkl')
df.to_csv('LondonRestaurant.csv')

In [209]:
# Records with coordinates are assigned to a new dataframe, which is used for further analysis
df_london=df[df["Latitude"]!="Not Found"]

In [210]:
df_london.head()

Unnamed: 0,Location,London_borough,Post town,Latitude,Longitude
0,Abbey Wood,Bexley,LONDON,51.49,0.132891
1,Acton,Ealing,LONDON,51.5081,-0.273261
2,Aldgate,City,LONDON,51.5142,-0.0757186
3,Aldwych,Westminster,LONDON,51.5131,-0.117593
4,Anerley,Bromley,LONDON,51.4076,-0.0619394


In [18]:
df_london.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 293 entries, 0 to 298
Data columns (total 5 columns):
Location          293 non-null object
London_borough    293 non-null object
Post town         293 non-null object
Latitude          293 non-null object
Longitude         293 non-null object
dtypes: object(5)
memory usage: 13.7+ KB
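
The summary above reports Latitude and Longitude as object columns, because the "Not Found" placeholder strings were stored alongside the numbers. A possible alternative, sketched below on made-up rows, is to coerce both columns to numeric values and drop the rows that could not be geocoded. The frame name geo and the "Nowhere" row are assumptions for illustration only.

```python
import pandas as pd

# Made-up rows: two geocoded locations and one that was not found
geo = pd.DataFrame({
    "Location": ["Abbey Wood", "Acton", "Nowhere"],
    "Latitude": [51.49, 51.5081, "Not Found"],
    "Longitude": [0.132891, -0.273261, "Not Found"],
})

# Non-numeric placeholders become NaN instead of staying as strings
for col in ["Latitude", "Longitude"]:
    geo[col] = pd.to_numeric(geo[col], errors="coerce")

# Keep only the rows that have real coordinates
geo_clean = geo.dropna(subset=["Latitude", "Longitude"]).reset_index(drop=True)

print(geo_clean.dtypes)
```

With float columns, later steps such as mapping or distance calculations do not need to special-case string values.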


In [19]:
df_london.drop_duplicates(inplace=True)



In [21]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(df_london['London_borough'].unique()),
        df.shape[0]
    )
)

The dataframe has 32 boroughs and 299 neighborhoods.


#### Get the London "central" point


In [212]:
address = 'London, England'

geolocator = Nominatim(user_agent="london_explorer")
location = geolocator.geocode(address)
lat_e = location.latitude
long_e = location.longitude
print('The geographical coordinates of London City are {}, {}.'.format(lat_e, long_e))

The geographical coordinates of London City are 51.5073219, -0.1276474.


#### Create a map of London using the starting point coordinates

In [213]:
import folium
map_london = folium.Map(location=[lat_e, long_e], zoom_start=10)

# add markers to map
for lat, lng, borough, location in zip(df_london['Latitude'], df_london['Longitude'], df_london['London_borough'], df_london['Location']):
    label = '{}, {}'.format(location, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)  
    
map_london

We have explored the London boroughs and locations using the data scraped from the Wikipedia page. Let's now look at the restaurant information for these locations, using the Foursquare API to explore the neighborhoods and segment them.

#### Exploring London Restaurants using Foursquare API

Setting the Foursquare Credentials and Version

In [24]:
LIMIT = 200
CLIENT_ID = 'PAMLV50THNC031WTGXMUEAD5CEQRWVPT4UC2VZ12XHIXBRQW' # your Foursquare ID
CLIENT_SECRET = 'L5YPYTJIIFV5TDSRIT34S1PFALUB2TS0M0SU221UITY2UWBV' # your Foursquare Secret
VERSION = '20191116' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: PAMLV50THNC031WTGXMUEAD5CEQRWVPT4UC2VZ12XHIXBRQW
CLIENT_SECRET:L5YPYTJIIFV5TDSRIT34S1PFALUB2TS0M0SU221UITY2UWBV
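
Printing credentials in a notebook makes them easy to leak when the notebook is shared. A minimal sketch of an alternative is to read them from environment variables and mask them when logging. The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET and the helpers load_credentials and mask are hypothetical and not part of the Foursquare API.

```python
import os

def load_credentials(env=os.environ):
    """Return (client_id, client_secret) read from hypothetical environment variables."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set in the environment")
    return client_id, client_secret

def mask(secret, visible=4):
    """Show only the first few characters of a secret when logging it."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)

# Demonstration with a fake mapping standing in for os.environ
fake_env = {"FOURSQUARE_CLIENT_ID": "abcd1234",
            "FOURSQUARE_CLIENT_SECRET": "wxyz5678"}
client_id, client_secret = load_credentials(fake_env)
print("CLIENT_ID: " + mask(client_id))
```

In a real run, load_credentials() would be called without arguments so that it reads from the process environment.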


Create a new dataframe from the venue information received from the Foursquare API for each pair of coordinates. Venue type: restaurants.

In [70]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&query=restaurant'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        #results = requests.get(url).json()["response"]['groups'][0]['items']
        results = requests.get(url).json()['response'].get('groups',[{}])[0].get('items', [])
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into a single dataframe
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Neighborhood',
                 'Neighborhood Latitude',
                 'Neighborhood Longitude',
                 'Venue',
                 'Venue Latitude',
                 'Venue Longitude',
                 'Venue Category'])
    return(nearby_venues)
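
The chained .get calls in the function above guard against responses that contain no groups. The same idea can be isolated in a small helper and checked against a hand-written payload without calling the API. The helper name extract_venues and the sample payload are illustrative; the field layout follows the keys that the function above reads, and the second venue is invented to show the empty-categories case.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style response dict."""
    groups = payload.get("response", {}).get("groups") or [{}]
    rows = []
    for item in groups[0].get("items", []):
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None when a venue lists no category
        category = categories[0]["name"] if categories else None
        rows.append((venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows

# Hand-written payload for illustration, not a recorded API response
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Abbey Cafe",
               "location": {"lat": 51.492258, "lng": 0.136148},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Unnamed Stall",
               "location": {"lat": 51.49, "lng": 0.13},
               "categories": []}},
]}]}}

print(extract_venues(sample))
print(extract_venues({"response": {}}))  # a response without groups yields an empty list
```

Separating parsing from the HTTP request makes it possible to test the parsing logic offline.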
                             

The function above is invoked for each neighborhood to create a new dataframe called London_venues.

In [71]:
London_venues = getNearbyVenues(names=df['Location'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )


Abbey Wood
Acton
Aldgate
Aldwych
Anerley
Angel
Archway
Arnos Grove
Balham
Bankside
Barbican
Barnes
Barnsbury
Battersea
Bayswater
Bedford Park
Belgravia
Bellingham
Belsize Park
Bermondsey
Bethnal Green
Blackfriars
Blackheath
Blackheath Royal Standard
Blackwall
Bloomsbury
Bounds Green
Bow
Bowes Park
Brent Cross
Brent Park
Brixton
Brockley
Bromley (also Bromley-by-Bow)
Brompton
Brondesbury
Brunswick Park
Burroughs, The
Camberwell
Cambridge Heath
Camden Town
Canary Wharf
Cann Hall
Canning Town
Canonbury
Castelnau
Catford
Chalk Farm
Charing Cross
Charlton
Chelsea
Childs Hill
Chinatown
Chinbrook
Chingford
Chiswick
Church End
Church End
Clapham
Clerkenwell
Colindale
Colliers Wood
Colney Hatch
Covent Garden
Cricklewood
Crofton Park
Crossness
Crouch End
Crystal Palace
Cubitt Town
Custom House
Dalston
Dartford
De Beauvoir Town
Denmark Hill
Deptford
Dollis Hill
Dulwich
Ealing
Earls Court
Earlsfield
East Dulwich
East Finchley
East Ham
East Sheen
Edmonton
Elephant and Castle
Eltham
Farringdon
Finch

Checking the size of the resulting dataframe

In [214]:
print(London_venues.shape)

(6188, 7)


In [215]:
London_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Abbey Wood,51.489962,0.132891,Abbey Cafe,51.492258,0.136148,Café
1,Acton,51.50814,-0.273261,MrBakeme,51.508452,-0.268543,Creperie
2,Acton,51.50814,-0.273261,Amigo's Peri Peri,51.508396,-0.274561,Fast Food Restaurant
3,Acton,51.50814,-0.273261,Subway,51.507509,-0.271781,Sandwich Place
4,Acton,51.50814,-0.273261,North China Restaurant,51.508251,-0.277435,Chinese Restaurant
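
The request restricts results to a 500-meter radius around each neighborhood's coordinates. As a sanity check, the great-circle distance between a neighborhood and one of its venues can be computed with the haversine formula. The sketch below uses the Abbey Wood row from the table above; the function name haversine_m and the mean Earth radius of 6,371 km are assumptions of this example.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * radius_m * asin(sqrt(a))

# Abbey Wood (neighborhood) to Abbey Cafe (venue), coordinates from the table above
distance = haversine_m(51.489962, 0.132891, 51.492258, 0.136148)
print(round(distance), "m")
print("within 500 m:", distance <= 500)
```

Applied to the whole dataframe, a check like this would flag venues that fall outside the requested radius.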


Count the venues (restaurant information) returned for each neighborhood/location.

In [79]:
London_venues[['Neighborhood', 'Venue']].groupby('Neighborhood').count()

Unnamed: 0_level_0,Venue
Neighborhood,Unnamed: 1_level_1
Abbey Wood,1
Acton,12
Aldgate,100
Aldwych,98
Anerley,4
Angel,64
Archway,23
Arnos Grove,5
Balham,32
Bankside,45


The code below finds the neighborhoods for which the Foursquare API returned <b>NO</b> restaurant information.

In [86]:
x = London_venues[['Neighborhood', 'Venue']].groupby('Neighborhood').count().shape[0]
y = df_london.shape[0]
empty_locations = []
if x != y:
    print('Missing data for {0} locations:'.format(y-x))
    # And print them
    found = set(London_venues['Neighborhood'])  # neighborhoods with at least one venue
    for i in range(df_london.shape[0]):
        loc = df_london.iloc[i,0]
        if loc not in found:
            print(i,loc)
            empty_locations.append(loc)

Missing data for 10 locations:
25 Bloomsbury
30 Brent Park
50 Chinbrook
63 Crossness
69 Dartford
113 Hampstead Garden Suburb
154 Little Ilford
166 Mill Hill


Let's check the unique categories of restaurant information received from the Foursquare API.

In [217]:
print('There are {0} unique categories.'.format(len(London_venues['Venue Category'].unique())))

There are 126 unique categories.


Looking at the London_venues dataset, it is evident that it includes venues such as cafes and bakeries. Since this project focuses on restaurants, a new dataframe London_restaurants is created by keeping only the venues whose category contains the text "Restaurant".

In [229]:
London_restaurants = London_venues[London_venues['Venue Category'].str.contains("Restaurant")] 
London_restaurants.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
2,Acton,51.50814,-0.273261,Amigo's Peri Peri,51.508396,-0.274561,Fast Food Restaurant
4,Acton,51.50814,-0.273261,North China Restaurant,51.508251,-0.277435,Chinese Restaurant
5,Acton,51.50814,-0.273261,Subway,51.507501,-0.271765,Restaurant
6,Acton,51.50814,-0.273261,Ming's,51.507456,-0.27226,Chinese Restaurant
9,Acton,51.50814,-0.273261,Sam's Chicken,51.507193,-0.270431,Fast Food Restaurant


Analyse each neighborhood by transforming the collected information with one-hot encoding.

In [230]:
# one hot encoding
london_onehot = pd.get_dummies(London_restaurants[['Venue Category']], prefix="", prefix_sep="")

# add location column back to dataframe
london_onehot['Neighborhood'] = London_restaurants['Neighborhood'] 

# move location column to the first column
fixed_columns = [london_onehot.columns[-1]] + list(london_onehot.columns[:-1])
london_onehot = london_onehot[fixed_columns]

london_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,Austrian Restaurant,Belgian Restaurant,...,Tapas Restaurant,Thai Restaurant,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Veneto Restaurant,Vietnamese Restaurant,Xinjiang Restaurant,Yakitori Restaurant,Yoshoku Restaurant
2,Acton,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Acton,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,Acton,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,Acton,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,Acton,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [231]:
london_onehot.shape

(3342, 89)

In [262]:
print('There are {} restaurants in London with {} different styles of cuisine.'.format(london_onehot.shape[0],(london_onehot.shape[1]-1)))

There are 3342 restaurants in London with 88 different styles of cuisine.


We now group the rows by neighborhood and take the mean of the frequency of occurrence of each category, which prepares the dataframe for clustering.



In [233]:
London_grouped = london_onehot.groupby('Neighborhood').mean().reset_index()
London_grouped

Unnamed: 0,Neighborhood,Afghan Restaurant,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,Austrian Restaurant,Belgian Restaurant,...,Tapas Restaurant,Thai Restaurant,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Veneto Restaurant,Vietnamese Restaurant,Xinjiang Restaurant,Yakitori Restaurant,Yoshoku Restaurant
0,Acton,0.000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.125000,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.0
1,Aldgate,0.000,0.000000,0.013889,0.000000,0.013889,0.027778,0.000000,0.000000,0.000000,...,0.000000,0.069444,0.027778,0.0,0.013889,0.000000,0.041667,0.0,0.0,0.0
2,Aldwych,0.000,0.000000,0.034483,0.000000,0.017241,0.017241,0.000000,0.000000,0.017241,...,0.034483,0.034483,0.000000,0.0,0.017241,0.000000,0.017241,0.0,0.0,0.0
3,Anerley,0.000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.0
4,Angel,0.025,0.000000,0.000000,0.000000,0.000000,0.025000,0.000000,0.025000,0.000000,...,0.025000,0.000000,0.025000,0.0,0.050000,0.000000,0.050000,0.0,0.0,0.0
5,Archway,0.000,0.000000,0.000000,0.000000,0.000000,0.100000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.0,0.100000,0.000000,0.000000,0.0,0.0,0.0
6,Arnos Grove,0.000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.333333,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.0
7,Balham,0.000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.0
8,Bankside,0.000,0.000000,0.000000,0.000000,0.000000,0.037037,0.037037,0.000000,0.000000,...,0.037037,0.037037,0.037037,0.0,0.037037,0.000000,0.037037,0.0,0.0,0.0
9,Barbican,0.000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.068966,0.0,0.0,0.0


In [234]:
London_grouped.shape

(266, 89)
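
To make the one-hot-and-mean step concrete, here is a toy version with two made-up neighborhoods, A and B. Each venue becomes a row of 0/1 indicators, and the per-neighborhood mean of those indicators is the share of venues in each category. The data and variable names are assumptions for illustration.

```python
import pandas as pd

# Made-up venues: four in neighborhood A, one in neighborhood B
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B"],
    "Venue Category": ["Thai Restaurant", "Thai Restaurant",
                       "Indian Restaurant", "Italian Restaurant",
                       "Indian Restaurant"],
})

# One indicator column per category (dtype=int keeps the values as 0/1)
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of the 0/1 indicators is the share of each category per neighborhood
grouped = onehot.groupby("Neighborhood").mean()
print(grouped)
```

Neighborhood A has two Thai restaurants out of four venues, so its Thai share is 0.5. Each row of the grouped frame sums to 1, which makes neighborhoods with very different venue counts comparable.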

#### Top 5 restaurant categories (cuisines) for each neighborhood

In [235]:
num_top_rest = 5

for hood in London_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = London_grouped[London_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_rest))
    print('\n')    
    

----Acton----
                  venue  freq
0  Fast Food Restaurant  0.38
1    Chinese Restaurant  0.25
2   Japanese Restaurant  0.12
3    Turkish Restaurant  0.12
4            Restaurant  0.12


----Aldgate----
                venue  freq
0   Indian Restaurant  0.12
1  Italian Restaurant  0.10
2          Restaurant  0.08
3    Sushi Restaurant  0.08
4     Thai Restaurant  0.07


----Aldwych----
                 venue  freq
0           Restaurant  0.21
1   Italian Restaurant  0.10
2    French Restaurant  0.09
3   English Restaurant  0.07
4  Japanese Restaurant  0.07


----Anerley----
                   venue  freq
0   Fast Food Restaurant   1.0
1      Afghan Restaurant   0.0
2     Kurdish Restaurant   0.0
3       Ramen Restaurant   0.0
4  Portuguese Restaurant   0.0


----Angel----
                venue  freq
0    Sushi Restaurant  0.12
1  Italian Restaurant  0.10
2          Restaurant  0.10
3  Mexican Restaurant  0.05
4   Korean Restaurant  0.05


----Archway----
                 venue

#### Creating a new dataframe with the top 10 restaurant categories (cuisines) by occurrence for each neighborhood

In [264]:
# A function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [265]:
# Create a new dataframe and display the top 10 restaurant categories by occurrence for each neighborhood
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # ranks beyond 3rd use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = London_grouped['Neighborhood']


for ind in np.arange(London_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(London_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Acton,Fast Food Restaurant,Chinese Restaurant,Restaurant,Japanese Restaurant,Turkish Restaurant,Greek Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant
1,Aldgate,Indian Restaurant,Italian Restaurant,Sushi Restaurant,Restaurant,Thai Restaurant,Japanese Restaurant,Middle Eastern Restaurant,Vietnamese Restaurant,Greek Restaurant,Mediterranean Restaurant
2,Aldwych,Restaurant,Italian Restaurant,French Restaurant,Japanese Restaurant,English Restaurant,Sushi Restaurant,Indian Restaurant,Seafood Restaurant,Spanish Restaurant,Korean Restaurant
3,Anerley,Fast Food Restaurant,Yoshoku Restaurant,Yakitori Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Gluten-free Restaurant,Greek Restaurant
4,Angel,Sushi Restaurant,Italian Restaurant,Restaurant,Korean Restaurant,English Restaurant,Mexican Restaurant,Mediterranean Restaurant,Indian Restaurant,French Restaurant,Vegetarian / Vegan Restaurant
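
Note that a neighborhood with few restaurants still gets ten ranked columns. For Anerley, every category after the first has frequency 0, so the remaining ranks are simply whichever zero-frequency categories the sort places next. A possible refinement, sketched here with a hypothetical helper top_nonzero_categories, keeps only the categories that actually occur. The two series below are shortened stand-ins for rows of London_grouped.

```python
import pandas as pd

def top_nonzero_categories(row, num_top):
    """Return up to num_top category names with frequency > 0, most frequent first."""
    nonzero = row[row > 0].sort_values(ascending=False)
    return list(nonzero.index[:num_top])

# Shortened stand-in for the Anerley row: one real category, the rest zero
anerley = pd.Series({"Fast Food Restaurant": 1.0,
                     "Afghan Restaurant": 0.0,
                     "Yoshoku Restaurant": 0.0})
print(top_nonzero_categories(anerley, 10))

# Shortened stand-in for the Acton row
acton = pd.Series({"Chinese Restaurant": 0.25,
                   "Fast Food Restaurant": 0.38,
                   "Afghan Restaurant": 0.0})
print(top_nonzero_categories(acton, 2))
```

The returned lists can be shorter than num_top, so a caller filling a fixed-width table would need to pad them, for example with None.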


In [266]:
# Create a new dataframe for determining the best k value by dropping the first column
London_clustering_testing = London_grouped.drop('Neighborhood', axis=1)

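The code below finds the optimal value of k for k-means clustering using the elbow method and the silhouette score. It is wrapped in a string literal, so it is not executed here.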

In [261]:
#Below code is to find out the optimal value for K in K means cluster
"""from sklearn.cluster import KMeans 
from sklearn import metrics 
from scipy.spatial.distance import cdist 
import numpy as np 
import matplotlib.pyplot as plt 
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


mms = MinMaxScaler()
mms.fit(London_clustering_testing)
data_transformed = mms.transform(London_clustering_testing)

Sum_of_squared_distances = []
K = range(1,15)
for k in K:
    km = KMeans(n_clusters=k)
    km = km.fit(data_transformed)
    Sum_of_squared_distances.append(km.inertia_)
    
plt.plot(K, Sum_of_squared_distances, 'bx-')
plt.xlabel('k')
plt.ylabel('Sum_of_squared_distances')
plt.title('Elbow Method For Optimal k')
plt.show()

from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt
%matplotlib inline

def plot(x, y, xlabel, ylabel):
    plt.figure(figsize=(20,10))
    plt.plot(np.arange(2, x), y, 'o-')
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.xticks(np.arange(2, x))
    plt.show()
    
indices = []
scores = []
max_range = 20

for kclusters in range(2, max_range) :
    
    # Run k-means clustering
    lct = London_clustering_testing
    kmeans = KMeans(n_clusters = kclusters, init = 'k-means++', random_state = 0).fit_predict(lct)
       # Gets the score for the clustering operation performed
    score = silhouette_score(lct, kmeans)
    
    # Appending the index and score to the respective lists
    indices.append(kclusters)
    scores.append(score)
    

plot(max_range, scores, "No. of clusters", "Silhouette Score")
"""


In [241]:
#opt = np.argmax(scores) + 2 # Finds the optimal value
#opt
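
For reference, the silhouette-based selection from the disabled cell can be exercised on synthetic data. This is a sketch only: the blobs generated below stand in for London_clustering_testing, and the range of k values and the n_init setting are arbitrary choices made for this example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Three well-separated synthetic clusters (stand-in data, not the restaurant frequencies)
X, _ = make_blobs(n_samples=150,
                  centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=0.5,
                  random_state=0)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Silhouette score is higher when clusters are compact and well separated
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)
```

On real data the silhouette curve is usually much flatter than on these blobs, so the maximum is better read as a hint than as a definitive answer.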

#### Clustering neighborhood (based on type of restaurants) using K-means clustering 

Run k-means to cluster the neighborhood into 5 clusters.

In [267]:
# set number of clusters
kclusters = 5

London_grouped_clustering = London_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(London_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 3, 3, 1, 3, 3, 0, 3, 3, 3], dtype=int32)

In [268]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

London_merged = London_restaurants

London_merged = London_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
London_merged = London_merged.fillna(0)  # fillna returns a new dataframe, so assign the result
London_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Acton,51.50814,-0.273261,Amigo's Peri Peri,51.508396,-0.274561,Fast Food Restaurant,1,Fast Food Restaurant,Chinese Restaurant,Restaurant,Japanese Restaurant,Turkish Restaurant,Greek Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant
4,Acton,51.50814,-0.273261,North China Restaurant,51.508251,-0.277435,Chinese Restaurant,1,Fast Food Restaurant,Chinese Restaurant,Restaurant,Japanese Restaurant,Turkish Restaurant,Greek Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant
5,Acton,51.50814,-0.273261,Subway,51.507501,-0.271765,Restaurant,1,Fast Food Restaurant,Chinese Restaurant,Restaurant,Japanese Restaurant,Turkish Restaurant,Greek Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant
6,Acton,51.50814,-0.273261,Ming's,51.507456,-0.27226,Chinese Restaurant,1,Fast Food Restaurant,Chinese Restaurant,Restaurant,Japanese Restaurant,Turkish Restaurant,Greek Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant
9,Acton,51.50814,-0.273261,Sam's Chicken,51.507193,-0.270431,Fast Food Restaurant,1,Fast Food Restaurant,Chinese Restaurant,Restaurant,Japanese Restaurant,Turkish Restaurant,Greek Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant


Number of venues belonging to each cluster

In [271]:
London_merged['Cluster Labels'].value_counts()

3    2899
1     177
0     120
2      86
4      60
Name: Cluster Labels, dtype: int64
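
These counts are per venue, so a single busy neighborhood contributes many rows. To count neighborhoods per cluster instead, the merged frame can first be reduced to one row per neighborhood. The toy frame below is made up for illustration and only mimics the two relevant columns of London_merged.

```python
import pandas as pd

# Toy stand-in for London_merged: one row per venue
merged = pd.DataFrame({
    "Neighborhood": ["Acton", "Acton", "Aldgate", "Aldgate", "Aldgate", "Arnos Grove"],
    "Cluster Labels": [1, 1, 3, 3, 3, 0],
})

# Venues per cluster (what value_counts on the full frame reports)
venues_per_cluster = merged["Cluster Labels"].value_counts()

# Neighborhoods per cluster: deduplicate to one row per neighborhood first
neighborhoods_per_cluster = (
    merged[["Neighborhood", "Cluster Labels"]]
    .drop_duplicates()["Cluster Labels"]
    .value_counts()
)

print(venues_per_cluster)
print(neighborhoods_per_cluster)
```

In the toy data, cluster 3 holds three venues but only one neighborhood, which shows how the two views can differ.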

#### Creating a map of these clusters for better visualization

In [272]:
# create map
map_clusters = folium.Map(location=[lat_e, long_e], zoom_start=10)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one marker per venue, colored by its cluster label
for lat, lon, poi, cluster, vc in zip(London_merged['Venue Latitude'], London_merged['Venue Longitude'], London_merged['Venue'], London_merged['Cluster Labels'], London_merged['Venue Category']):
    # popup text: venue name (category) [cluster label]
    label = folium.Popup(str(poi) +' (' +str(vc) + ') ['+ str(cluster)+']', parse_html=True)

    # the color index is shifted by one, so label 0 wraps around to the last color
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.6).add_to(map_clusters)
map_clusters

### 4. Results

Let's examine each of these clusters to find its most common and least common restaurant types, which form the basis of the recommendation.

#### Cluster 1 (label 0)

In [247]:
c1 = London_merged.loc[London_merged['Cluster Labels'] == 0, London_merged.columns[[0]+list(range(5, London_merged.shape[1]))]]
c1.drop_duplicates()

Unnamed: 0,Neighborhood,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
302,Arnos Grove,-0.134875,Turkish Restaurant,0,Turkish Restaurant,Indian Restaurant,Chinese Restaurant,Yoshoku Restaurant,Greek Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant
305,Arnos Grove,-0.134448,Indian Restaurant,0,Turkish Restaurant,Indian Restaurant,Chinese Restaurant,Yoshoku Restaurant,Greek Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant
306,Arnos Grove,-0.128742,Chinese Restaurant,0,Turkish Restaurant,Indian Restaurant,Chinese Restaurant,Yoshoku Restaurant,Greek Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant
806,Brockley,-0.036473,Malay Restaurant,0,Chinese Restaurant,Fast Food Restaurant,Malay Restaurant,Yoshoku Restaurant,Halal Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Gluten-free Restaurant
813,Brockley,-0.034321,Chinese Restaurant,0,Chinese Restaurant,Fast Food Restaurant,Malay Restaurant,Yoshoku Restaurant,Halal Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Gluten-free Restaurant
814,Brockley,-0.037258,Fast Food Restaurant,0,Chinese Restaurant,Fast Food Restaurant,Malay Restaurant,Yoshoku Restaurant,Halal Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Gluten-free Restaurant
816,Brockley,-0.037983,Chinese Restaurant,0,Chinese Restaurant,Fast Food Restaurant,Malay Restaurant,Yoshoku Restaurant,Halal Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Gluten-free Restaurant
817,Brockley,-0.037711,Chinese Restaurant,0,Chinese Restaurant,Fast Food Restaurant,Malay Restaurant,Yoshoku Restaurant,Halal Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Gluten-free Restaurant
1698,Crofton Park,-0.036473,Malay Restaurant,0,Chinese Restaurant,Fast Food Restaurant,Malay Restaurant,Yoshoku Restaurant,Halal Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Gluten-free Restaurant
1701,Crofton Park,-0.037258,Fast Food Restaurant,0,Chinese Restaurant,Fast Food Restaurant,Malay Restaurant,Yoshoku Restaurant,Halal Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Gluten-free Restaurant


In [248]:
cluster_1 = London_merged.loc[London_merged['Cluster Labels'] == 0, London_merged.columns[[0] + list(range(5, London_merged.shape[1]))]]
cluster_1.describe(include='all')

Unnamed: 0,Neighborhood,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,120,120.0,120,120.0,120,120,120,120,120,120,120,120,120,120
unique,26,,27,,10,10,13,12,11,10,9,9,8,8
top,New Cross,,Chinese Restaurant,,Chinese Restaurant,Chinese Restaurant,Indian Restaurant,Yoshoku Restaurant,Halal Restaurant,Falafel Restaurant,Fast Food Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant
freq,14,,43,,60,32,22,21,18,30,29,27,27,27
mean,,-0.071297,,0.0,,,,,,,,,,
std,,0.081099,,0.0,,,,,,,,,,
min,,-0.337967,,0.0,,,,,,,,,,
25%,,-0.123175,,0.0,,,,,,,,,,
50%,,-0.045557,,0.0,,,,,,,,,,
75%,,-0.036175,,0.0,,,,,,,,,,


#### Cluster 2 (label 1)

In [282]:
c2 = London_merged.loc[London_merged['Cluster Labels'] == 1, London_merged.columns[[0]+list(range(5, London_merged.shape[1]))]]
c2.drop_duplicates()

Unnamed: 0,Neighborhood,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Acton,-0.274561,Fast Food Restaurant,1,Fast Food Restaurant,Chinese Restaurant,Restaurant,Japanese Restaurant,Turkish Restaurant,Greek Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant
4,Acton,-0.277435,Chinese Restaurant,1,Fast Food Restaurant,Chinese Restaurant,Restaurant,Japanese Restaurant,Turkish Restaurant,Greek Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant
5,Acton,-0.271765,Restaurant,1,Fast Food Restaurant,Chinese Restaurant,Restaurant,Japanese Restaurant,Turkish Restaurant,Greek Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant
6,Acton,-0.272260,Chinese Restaurant,1,Fast Food Restaurant,Chinese Restaurant,Restaurant,Japanese Restaurant,Turkish Restaurant,Greek Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant
9,Acton,-0.270431,Fast Food Restaurant,1,Fast Food Restaurant,Chinese Restaurant,Restaurant,Japanese Restaurant,Turkish Restaurant,Greek Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant
10,Acton,-0.272239,Turkish Restaurant,1,Fast Food Restaurant,Chinese Restaurant,Restaurant,Japanese Restaurant,Turkish Restaurant,Greek Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant
11,Acton,-0.269494,Japanese Restaurant,1,Fast Food Restaurant,Chinese Restaurant,Restaurant,Japanese Restaurant,Turkish Restaurant,Greek Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant
12,Acton,-0.270255,Fast Food Restaurant,1,Fast Food Restaurant,Chinese Restaurant,Restaurant,Japanese Restaurant,Turkish Restaurant,Greek Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant
212,Anerley,-0.059266,Fast Food Restaurant,1,Fast Food Restaurant,Yoshoku Restaurant,Yakitori Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Gluten-free Restaurant,Greek Restaurant
740,Blackheath Royal Standard,0.021685,Fast Food Restaurant,1,Fast Food Restaurant,Chinese Restaurant,Yoshoku Restaurant,English Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Gluten-free Restaurant,Greek Restaurant


In [250]:
cluster_2 = London_merged.loc[London_merged['Cluster Labels'] == 1, London_merged.columns[[0] + list(range(5, London_merged.shape[1]))]]
cluster_2.describe(include='all')

Unnamed: 0,Neighborhood,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,177,177.0,177,177.0,177,177,177,177,177,177,177,177,177,177
unique,31,,27,,2,19,14,15,13,12,12,7,7,7
top,Wood Green,,Fast Food Restaurant,,Fast Food Restaurant,Italian Restaurant,Restaurant,Italian Restaurant,Chinese Restaurant,Portuguese Restaurant,Filipino Restaurant,Gluten-free Restaurant,Filipino Restaurant,French Restaurant
freq,16,,81,,165,22,42,27,25,37,31,38,40,40
mean,,-0.107754,,1.0,,,,,,,,,,
std,,0.098638,,0.0,,,,,,,,,,
min,,-0.339465,,1.0,,,,,,,,,,
25%,,-0.213378,,1.0,,,,,,,,,,
50%,,-0.095468,,1.0,,,,,,,,,,
75%,,-0.043488,,1.0,,,,,,,,,,


#### Cluster 3 (label 2)

In [252]:
c3 = London_merged.loc[London_merged['Cluster Labels'] == 2, London_merged.columns[[0]+list(range(5, London_merged.shape[1]))]]
c3.drop_duplicates()

Unnamed: 0,Neighborhood,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
755,Bounds Green,-0.124613,Middle Eastern Restaurant,2,Indian Restaurant,Middle Eastern Restaurant,Yoshoku Restaurant,Halal Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant,German Restaurant
756,Bounds Green,-0.127042,Indian Restaurant,2,Indian Restaurant,Middle Eastern Restaurant,Yoshoku Restaurant,Halal Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant,German Restaurant
772,Bowes Park,-0.116601,Greek Restaurant,2,Indian Restaurant,Middle Eastern Restaurant,Greek Restaurant,Yoshoku Restaurant,Halal Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant
775,Bowes Park,-0.124613,Middle Eastern Restaurant,2,Indian Restaurant,Middle Eastern Restaurant,Greek Restaurant,Yoshoku Restaurant,Halal Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant
776,Bowes Park,-0.127042,Indian Restaurant,2,Indian Restaurant,Middle Eastern Restaurant,Greek Restaurant,Yoshoku Restaurant,Halal Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant
793,Brixton,-0.113276,Modern European Restaurant,2,Indian Restaurant,Fast Food Restaurant,Tapas Restaurant,Caribbean Restaurant,Modern European Restaurant,Yoshoku Restaurant,Halal Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant
795,Brixton,-0.116038,Tapas Restaurant,2,Indian Restaurant,Fast Food Restaurant,Tapas Restaurant,Caribbean Restaurant,Modern European Restaurant,Yoshoku Restaurant,Halal Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant
797,Brixton,-0.116558,Caribbean Restaurant,2,Indian Restaurant,Fast Food Restaurant,Tapas Restaurant,Caribbean Restaurant,Modern European Restaurant,Yoshoku Restaurant,Halal Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant
798,Brixton,-0.119421,Indian Restaurant,2,Indian Restaurant,Fast Food Restaurant,Tapas Restaurant,Caribbean Restaurant,Modern European Restaurant,Yoshoku Restaurant,Halal Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant
799,Brixton,-0.113471,Indian Restaurant,2,Indian Restaurant,Fast Food Restaurant,Tapas Restaurant,Caribbean Restaurant,Modern European Restaurant,Yoshoku Restaurant,Halal Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant


In [253]:
cluster_3 = London_merged.loc[London_merged['Cluster Labels'] == 2, London_merged.columns[[0] + list(range(5, London_merged.shape[1]))]]
cluster_3.describe(include='all')

Unnamed: 0,Neighborhood,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,86,86.0,86,86.0,86,86,86,86,86,86,86,86,86,86
unique,24,,21,,6,9,8,11,7,7,6,5,4,4
top,East Ham,,Indian Restaurant,,Indian Restaurant,Fast Food Restaurant,Yoshoku Restaurant,Halal Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Gluten-free Restaurant
freq,13,,46,,72,22,28,26,28,37,31,31,31,31
mean,,-0.073356,,2.0,,,,,,,,,,
std,,0.115802,,0.0,,,,,,,,,,
min,,-0.32191,,2.0,,,,,,,,,,
25%,,-0.143377,,2.0,,,,,,,,,,
50%,,-0.074777,,2.0,,,,,,,,,,
75%,,0.034733,,2.0,,,,,,,,,,


#### Cluster 4 (label 3)

In [254]:
c4 = London_merged.loc[London_merged['Cluster Labels'] == 3, London_merged.columns[[0]+list(range(5, London_merged.shape[1]))]]
c4.drop_duplicates()

Unnamed: 0,Neighborhood,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,Aldgate,-0.079079,Japanese Restaurant,3,Indian Restaurant,Italian Restaurant,Sushi Restaurant,Restaurant,Thai Restaurant,Japanese Restaurant,Middle Eastern Restaurant,Vietnamese Restaurant,Greek Restaurant,Mediterranean Restaurant
16,Aldgate,-0.075434,Italian Restaurant,3,Indian Restaurant,Italian Restaurant,Sushi Restaurant,Restaurant,Thai Restaurant,Japanese Restaurant,Middle Eastern Restaurant,Vietnamese Restaurant,Greek Restaurant,Mediterranean Restaurant
17,Aldgate,-0.070606,Restaurant,3,Indian Restaurant,Italian Restaurant,Sushi Restaurant,Restaurant,Thai Restaurant,Japanese Restaurant,Middle Eastern Restaurant,Vietnamese Restaurant,Greek Restaurant,Mediterranean Restaurant
18,Aldgate,-0.075438,Argentinian Restaurant,3,Indian Restaurant,Italian Restaurant,Sushi Restaurant,Restaurant,Thai Restaurant,Japanese Restaurant,Middle Eastern Restaurant,Vietnamese Restaurant,Greek Restaurant,Mediterranean Restaurant
20,Aldgate,-0.076932,Szechuan Restaurant,3,Indian Restaurant,Italian Restaurant,Sushi Restaurant,Restaurant,Thai Restaurant,Japanese Restaurant,Middle Eastern Restaurant,Vietnamese Restaurant,Greek Restaurant,Mediterranean Restaurant
21,Aldgate,-0.076465,Restaurant,3,Indian Restaurant,Italian Restaurant,Sushi Restaurant,Restaurant,Thai Restaurant,Japanese Restaurant,Middle Eastern Restaurant,Vietnamese Restaurant,Greek Restaurant,Mediterranean Restaurant
23,Aldgate,-0.077195,Indian Restaurant,3,Indian Restaurant,Italian Restaurant,Sushi Restaurant,Restaurant,Thai Restaurant,Japanese Restaurant,Middle Eastern Restaurant,Vietnamese Restaurant,Greek Restaurant,Mediterranean Restaurant
25,Aldgate,-0.080360,Restaurant,3,Indian Restaurant,Italian Restaurant,Sushi Restaurant,Restaurant,Thai Restaurant,Japanese Restaurant,Middle Eastern Restaurant,Vietnamese Restaurant,Greek Restaurant,Mediterranean Restaurant
26,Aldgate,-0.081169,Sushi Restaurant,3,Indian Restaurant,Italian Restaurant,Sushi Restaurant,Restaurant,Thai Restaurant,Japanese Restaurant,Middle Eastern Restaurant,Vietnamese Restaurant,Greek Restaurant,Mediterranean Restaurant
27,Aldgate,-0.075626,Thai Restaurant,3,Indian Restaurant,Italian Restaurant,Sushi Restaurant,Restaurant,Thai Restaurant,Japanese Restaurant,Middle Eastern Restaurant,Vietnamese Restaurant,Greek Restaurant,Mediterranean Restaurant


In [255]:
cluster_4 = London_merged.loc[London_merged['Cluster Labels'] == 3, London_merged.columns[[0] + list(range(5, London_merged.shape[1]))]]
cluster_4.describe(include='all')

Unnamed: 0,Neighborhood,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,2899,2899.0,2899,2899.0,2899,2899,2899,2899,2899,2899,2899,2899,2899,2899
unique,175,,87,,28,31,39,40,39,44,50,44,47,52
top,Aldgate,,Italian Restaurant,,Italian Restaurant,Italian Restaurant,Restaurant,Indian Restaurant,Restaurant,Sushi Restaurant,Thai Restaurant,Seafood Restaurant,Falafel Restaurant,Mexican Restaurant
freq,72,,376,,1188,727,352,404,308,227,203,262,168,220
mean,,-0.129506,,3.0,,,,,,,,,,
std,,0.063537,,0.0,,,,,,,,,,
min,,-0.308454,,3.0,,,,,,,,,,
25%,,-0.174251,,3.0,,,,,,,,,,
50%,,-0.125022,,3.0,,,,,,,,,,
75%,,-0.098466,,3.0,,,,,,,,,,


#### Cluster 5 (label 4)

In [259]:
c5 = London_merged.loc[London_merged['Cluster Labels'] == 4, London_merged.columns[[0]+list(range(5, London_merged.shape[1]))]]
c5.drop_duplicates()

Unnamed: 0,Neighborhood,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
593,Bellingham,-0.017902,Turkish Restaurant,4,Fast Food Restaurant,Turkish Restaurant,Yoshoku Restaurant,Halal Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Gluten-free Restaurant
594,Bellingham,-0.015619,Fast Food Restaurant,4,Fast Food Restaurant,Turkish Restaurant,Yoshoku Restaurant,Halal Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Gluten-free Restaurant
759,Bow,-0.026779,Turkish Restaurant,4,Fast Food Restaurant,Turkish Restaurant,Chinese Restaurant,Yoshoku Restaurant,Halal Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant,German Restaurant
765,Bow,-0.027413,Fast Food Restaurant,4,Fast Food Restaurant,Turkish Restaurant,Chinese Restaurant,Yoshoku Restaurant,Halal Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant,German Restaurant
769,Bow,-0.029928,Chinese Restaurant,4,Fast Food Restaurant,Turkish Restaurant,Chinese Restaurant,Yoshoku Restaurant,Halal Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant,German Restaurant
847,Brunswick Park,-0.15329,Turkish Restaurant,4,Turkish Restaurant,Yoshoku Restaurant,Halal Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Gluten-free Restaurant
1066,Canning Town,0.009132,Turkish Restaurant,4,Turkish Restaurant,Restaurant,Italian Restaurant,Fast Food Restaurant,Greek Restaurant,English Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant
1067,Canning Town,0.0107,Italian Restaurant,4,Turkish Restaurant,Restaurant,Italian Restaurant,Fast Food Restaurant,Greek Restaurant,English Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant
1068,Canning Town,0.012059,Fast Food Restaurant,4,Turkish Restaurant,Restaurant,Italian Restaurant,Fast Food Restaurant,Greek Restaurant,English Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant
1070,Canning Town,0.008953,Turkish Restaurant,4,Turkish Restaurant,Restaurant,Italian Restaurant,Fast Food Restaurant,Greek Restaurant,English Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,French Restaurant


In [260]:
cluster_5 = London_merged.loc[London_merged['Cluster Labels'] == 4, London_merged.columns[[0] + list(range(5, London_merged.shape[1]))]]
cluster_5.describe(include='all')

Unnamed: 0,Neighborhood,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,60,60.0,60,60.0,60,60,60,60,60,60,60,60,60,60
unique,10,,15,,2,6,6,6,7,7,6,4,5,5
top,Shacklewell,,Turkish Restaurant,,Turkish Restaurant,Mediterranean Restaurant,Kebab Restaurant,Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Portuguese Restaurant,Falafel Restaurant,Cuban Restaurant,Caribbean Restaurant
freq,22,,29,,53,22,22,22,22,22,22,32,22,22
mean,,-0.065453,,4.0,,,,,,,,,,
std,,0.036789,,0.0,,,,,,,,,,
min,,-0.15329,,4.0,,,,,,,,,,
25%,,-0.09577,,4.0,,,,,,,,,,
50%,,-0.075029,,4.0,,,,,,,,,,
75%,,-0.056497,,4.0,,,,,,,,,,


#### Result Summary 

<table style="width:50%">
<tr>
<th>Cluster</th>
<th>Most Common Restaurant</th>
<th>Least Common Restaurant</th>
</tr>
<tr>
<td>1</td>
<td>Chinese Restaurant</td>
<td>Filipino Restaurant</td>
</tr>
<tr>
<td>2</td>
<td>Fast Food Restaurant</td>
<td>French Restaurant</td>
</tr>
<tr>
<td>3</td>
<td>Indian Restaurant</td>
<td>Gluten-free Restaurant</td>
</tr>
<tr>
<td>4</td>
<td>Italian Restaurant</td>
<td>Mexican Restaurant</td>
</tr>
<tr>
<td>5</td>
<td>Turkish Restaurant</td>
<td>Caribbean Restaurant</td>
</tr>
</table>
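
The table above appears to take, for each cluster, the `top` value of `1st Most Common Venue` and of `10th Most Common Venue` from the `describe` outputs. A minimal sketch of deriving the same summary in one step is shown below. The `toy` frame and the `summarise_clusters` helper are hypothetical stand-ins for illustration, not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy rows standing in for London_merged (not the real data).
toy = pd.DataFrame({
    "Cluster Labels": [0, 0, 0, 1, 1],
    "1st Most Common Venue": [
        "Chinese Restaurant", "Chinese Restaurant", "Turkish Restaurant",
        "Fast Food Restaurant", "Fast Food Restaurant",
    ],
    "10th Most Common Venue": [
        "Filipino Restaurant", "Filipino Restaurant", "French Restaurant",
        "French Restaurant", "French Restaurant",
    ],
})


def summarise_clusters(df):
    """Per cluster, return the most frequent 1st and 10th most common venue."""
    def top(series):
        # Most frequent value in the column, like the `top` row of describe().
        return series.value_counts().idxmax()

    summary = df.groupby("Cluster Labels").agg(
        most_common=("1st Most Common Venue", top),
        least_common=("10th Most Common Venue", top),
    )
    # Shift labels 0..k-1 to the 1..k numbering used in the summary table.
    summary.index = summary.index + 1
    summary.index.name = "Cluster"
    return summary


summary = summarise_clusters(toy)
print(summary)
```

Calling `summarise_clusters(London_merged)` on the real frame should reproduce the table, assuming the column names match those in the outputs above.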

### 5.  Discussion

In summary, the analysis suggests that the safest way to choose a restaurant type for a particular locality is to look at the least common type there. Opening a new Indian restaurant on a street that already has dozens of Indian restaurants makes little sense: given the competition, it would be a risky investment. Choosing the least common restaurant type carries its own uncertainty, however, because demand for that type of food may simply be low in the area. The analysis could therefore be improved by including population and demographic data. The analysis is limited to the 10 most common venue types in order to reduce the risk of opening a business with no prospect in a location.

Stakeholders can:
* Either choose a location from a cluster, and the model can advise which cuisine would be best to opt for in that locality
* Or choose a cuisine, and check the model to find the best-suited cluster/locality for that type
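
The two options can be sketched as simple lookups over the Result Summary table. This is only an illustration: `SUMMARY` is copied by hand from that table, and the helper names `cuisine_for_cluster` and `clusters_for_cuisine` are hypothetical.

```python
# Cluster summary copied by hand from the Result Summary table (clusters 1 to 5).
SUMMARY = {
    1: {"most_common": "Chinese Restaurant", "least_common": "Filipino Restaurant"},
    2: {"most_common": "Fast Food Restaurant", "least_common": "French Restaurant"},
    3: {"most_common": "Indian Restaurant", "least_common": "Gluten-free Restaurant"},
    4: {"most_common": "Italian Restaurant", "least_common": "Mexican Restaurant"},
    5: {"most_common": "Turkish Restaurant", "least_common": "Caribbean Restaurant"},
}


def cuisine_for_cluster(cluster):
    """Option 1: given a cluster, suggest its least common cuisine."""
    return SUMMARY[cluster]["least_common"]


def clusters_for_cuisine(cuisine):
    """Option 2: given a cuisine, list the clusters where it is least common."""
    return [c for c, row in SUMMARY.items() if row["least_common"] == cuisine]


print(cuisine_for_cluster(3))
print(clusters_for_cuisine("Mexican Restaurant"))
```

A cuisine that is not the least common type in any cluster returns an empty list, which signals that this summary alone cannot recommend a location for it.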

### 6. Conclusion

This analysis was performed to identify the most promising type of restaurant and a suitable location in which to start the business. The data was collected from Wikipedia, geospatial libraries and the Foursquare API. K-means clustering was used to group similar neighbourhoods on the basis of the frequency of cuisine types. The model could be improved by collecting more information about each neighbourhood's population and demography, and about restaurant menus and ratings, in order to produce better recommendations.