# Opening a new Italian restaurant in London, UK

By Aria Kumar

## Introduction/Business Problem

London is one of the world's major cities and a diverse, multicultural hub where over 300 languages are spoken. The city is world famous for its history, culture, attractions, art, shopping and food. According to the Office for National Statistics, there are around 15,500 restaurants in London, which points to a large and established market for eating out. London also hosts approximately 30 million tourists each year, so demand for restaurants is consistently high and a restaurant opened in the right place should see steady business.

Londoners' favourite cuisine is Italian, with approximately 30% saying it is their favourite food when eating out, according to https://www.prnewswire.com/news-releases/food-for-thought-the-average-brit-spends-ps700-on-eating-out-per-year-here-s-how-to-do-it-for-less-with-budget-meal-ideas-in-2019-826776701.html

The objective of this project is to analyse and find the best location to open an Italian food restaurant in London. 

The target audience for this project is a businessman who wants to invest in a location in London to open an Italian Restaurant. 

## Data

Problem Statement: Where is the best location to open an Italian restaurant in London? 

I will be getting data from the following sources:

+ The Neighbourhoods of London and their postcodes, from Wikipedia (https://en.wikipedia.org/wiki/List_of_areas_of_London)
+ The venues in each Neighbourhood in the London area, from the Foursquare API (https://developer.foursquare.com/)
+ The coordinates of each Neighbourhood/postcode, using the geocoder Python library (https://developers.arcgis.com/python/guide/using-the-geocode-function/)

How will I use the data?

1. Get the Neighbourhoods data from Wikipedia using Beautiful Soup library.
2. Explore the venues and their category in each Neighbourhood using Foursquare API.
3. Get the latitude and longitude data using Geocoder library.
4. Combine the data above into one dataframe.
5. Group by Neighbourhood and count the occurrences of Venues for each Neighbourhood.
6. Cluster each Neighbourhood based on its venues and add the cluster labels back into the initial dataframe.
7. Identify patterns in each cluster and the most common venue.
8. Visualise the data using Folium maps.
9. Use machine learning clustering techniques (k-means) to identify an ideal location.
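
Steps 5 and 7 can be sketched on a toy table. The rows below are made-up sample data, and the column names `Neighbourhood` and `VenueCategory` are assumptions about how the combined dataframe might be laid out, not the notebook's actual output.

```python
import pandas as pd

# Made-up sample rows standing in for the combined venue table (step 4)
sample = pd.DataFrame({
    'Neighbourhood': ['Acton', 'Acton', 'Acton', 'Soho', 'Soho', 'Soho'],
    'VenueCategory': ['Pub', 'Italian Restaurant', 'Pub',
                      'Italian Restaurant', 'Italian Restaurant', 'Bakery'],
})

# Step 5: count how often each venue category occurs in each neighbourhood
counts = pd.crosstab(sample['Neighbourhood'], sample['VenueCategory'])

# Step 7: the most common venue category per neighbourhood
most_common = counts.idxmax(axis=1)

print(counts)
print(most_common)
```

The resulting count table has one row per neighbourhood and one column per category, which is a convenient input for the clustering step.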

In [8]:
!pip install geopy
!pip install geocoder
!pip install folium
!pip install beautifulsoup4 
!pip install lxml 



In [9]:
import json
import string

import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
from pandas import json_normalize  # top-level function since pandas 1.0

import requests
from geopy.geocoders import Nominatim
from bs4 import BeautifulSoup

import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt

from sklearn import metrics
from sklearn.cluster import KMeans
from scipy.spatial.distance import cdist
import folium

In [10]:
# requests, pandas, numpy, geopy and folium are already installed and imported above
import random

from IPython.display import Image
from IPython.display import HTML

print('Libraries imported.')

In [11]:
list1 = pd.read_html("https://en.wikipedia.org/wiki/List_of_areas_of_London",header=0)
type(list1)

list

In [12]:
df = list1[1]
df.head()

Unnamed: 0,Location,London borough,Post town,Postcode district,Dial code,OS grid ref
0,Abbey Wood,"Bexley, Greenwich [7]",LONDON,SE2,20,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4",20,TQ205805
2,Addington,Croydon[8],CROYDON,CR0,20,TQ375645
3,Addiscombe,Croydon[8],CROYDON,CR0,20,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",20,TQ478728


In [13]:
print(df.columns)

df.rename(columns={"Location":"Neighourhood"}, inplace=True)
df.head()

Index(['Location', 'London borough', 'Post town', 'Postcode district',
       'Dial code', 'OS grid ref'],
      dtype='object')


Unnamed: 0,Neighourhood,London borough,Post town,Postcode district,Dial code,OS grid ref
0,Abbey Wood,"Bexley, Greenwich [7]",LONDON,SE2,20,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4",20,TQ205805
2,Addington,Croydon[8],CROYDON,CR0,20,TQ375645
3,Addiscombe,Croydon[8],CROYDON,CR0,20,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",20,TQ478728


In [14]:
print(df.columns)

Index(['Neighourhood', 'London borough', 'Post town', 'Postcode district',
       'Dial code', 'OS grid ref'],
      dtype='object')


In [15]:
# Keep only the first borough and drop footnote markers such as "[8]"
df['London borough'] = df['London borough'].apply(lambda x: str(x).split('[')[0].split(',')[0].strip())

# Keep only the first postcode district, e.g. "W3, W4" -> "W3"
df['Postcode district'] = df['Postcode district'].apply(lambda x: str(x).split(',')[0].strip())

In [16]:
df.head()

Unnamed: 0,Neighourhood,London borough,Post town,Postcode district,Dial code,OS grid ref
0,Abbey Wood,Bexley,LONDON,SE2,20,TQ465785
1,Acton,Ealing,LONDON,W3,20,TQ205805
2,Addington,Croydon,CROYDON,CR0,20,TQ375645
3,Addiscombe,Croydon,CROYDON,CR0,20,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP",DA5,20,TQ478728


In [17]:
df.dtypes

Neighourhood         object
London borough       object
Post town            object
Postcode district    object
Dial code            object
OS grid ref          object
dtype: object

In [18]:
df = df.astype(str)
df.dtypes

Neighourhood         object
London borough       object
Post town            object
Postcode district    object
Dial code            object
OS grid ref          object
dtype: object

In [139]:
# The postcode column in this table is named 'Postcode district'
df.sort_values(by='Postcode district', ascending=True, inplace=True)
df.reset_index(drop=True, inplace=True)
df.head()

In [64]:
df.shape

(533, 6)

In [76]:
address = "London,UK"
geolocator = Nominatim(user_agent="London_explorer")

# geocode() returns None when the address cannot be resolved
location = geolocator.geocode(address)
if location is None:
    raise ValueError('Could not geocode {!r}'.format(address))

latitude = location.latitude
longitude = location.longitude

print('The geographical coordinates of London are {}, {}.'.format(latitude, longitude))

The geographical coordinates of London are 51.5073219, -0.1276474.


In [None]:
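import geocoder

Latitude_list = []
Longitude_list = []
for postcode in df['Postcode district']:
    # Use the first district when several are listed, e.g. "W3, W4" -> "W3"
    address = '{}, London, UK'.format(str(postcode).split(',')[0].strip())
    g = geocoder.arcgis(address)
    # g.latlng is empty or None when the lookup fails
    lat, lng = g.latlng if g.latlng else (None, None)
    Latitude_list.append(lat)
    Longitude_list.append(lng)

In [None]:
df['Latitude'] = Latitude_list
df['Longitude'] = Longitude_list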
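
Geocoding several hundred postcodes over the network can fail intermittently. Below is a minimal sketch of a bounded retry, assuming the lookup is any callable that returns a `[lat, lng]` pair or something falsy on failure. The helper name `geocode_with_retry` and the `flaky_lookup` stand-in are hypothetical and only there for illustration.

```python
import time

def geocode_with_retry(lookup, address, attempts=3, delay=0.0):
    """Call lookup(address) up to `attempts` times; return (lat, lng) or (None, None)."""
    for _ in range(attempts):
        latlng = lookup(address)
        if latlng:                      # a non-empty [lat, lng] pair
            return latlng[0], latlng[1]
        time.sleep(delay)               # pause before trying again
    return None, None

# Stand-in lookup that fails once and then succeeds (made-up coordinates)
calls = {'n': 0}

def flaky_lookup(address):
    calls['n'] += 1
    return None if calls['n'] == 1 else [51.49, 0.12]

result = geocode_with_retry(flaky_lookup, 'SE2, London, UK')
missing = geocode_with_retry(lambda address: None, 'nowhere')
print(result, missing)
```

With the geocoder library used in this notebook, the lookup could be `lambda a: geocoder.arcgis(a).latlng`, and a small non-zero `delay` would space out the requests.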

In [128]:
df.head()



In [129]:
CLIENT_ID = 'CATVSEF43Z1JR3ZVH2ED2DL00CVECE5TNBAO0O5FZO4HX4PV' 
CLIENT_SECRET = 'FQTWJVLGTLZGXL1DN0W5FNK2YU0ONEXKOQJSLPHX2URYZHET' 
VERSION = '20180604' 
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: CATVSEF43Z1JR3ZVH2ED2DL00CVECE5TNBAO0O5FZO4HX4PV
CLIENT_SECRET:FQTWJVLGTLZGXL1DN0W5FNK2YU0ONEXKOQJSLPHX2URYZHET
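
Printing API credentials in a shared notebook exposes them to every reader. One possible alternative, sketched here under the assumption that the keys are stored in environment variables, reads them at run time and masks them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables."""
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first')
    return client_id, client_secret

def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing."""
    return secret[:visible] + '*' * (len(secret) - visible)

# Demonstration with a made-up mapping instead of the real environment
demo_env = {
    'FOURSQUARE_CLIENT_ID': 'EXAMPLEID123',
    'FOURSQUARE_CLIENT_SECRET': 'EXAMPLESECRET456',
}
client_id, client_secret = load_foursquare_credentials(demo_env)
print('CLIENT_ID:', mask(client_id))
```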


In [140]:
radius = 400000  # metres
LIMIT = 300

venues = []

# 'Neighourhood' is spelled as in the rename in cell 13
for lat, long, neighbourhood in zip(df['Latitude'], df['Longitude'], df['Neighourhood']):

    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius,
        LIMIT)

    results = requests.get(url).json()["response"]['groups'][0]['items']

    for venue in results:
        # Some venues have no category; record None rather than failing
        categories = venue['venue']['categories']
        venues.append((
            neighbourhood,
            lat,
            long,
            venue['venue']['name'],
            venue['venue']['location']['lat'],
            venue['venue']['location']['lng'],
            categories[0]['name'] if categories else None))
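
The nested lookup `["response"]['groups'][0]['items']` raises an exception whenever a request returns no groups. The sketch below pulls the parsing into a function that tolerates empty responses, assuming the payload has the shape accessed in the loop above. The function name `extract_venue_rows` and the sample payload are made up for illustration.

```python
def extract_venue_rows(neighbourhood, lat, lng, payload):
    """Flatten one explore-style response into a list of venue tuples."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        category = categories[0]['name'] if categories else None
        rows.append((neighbourhood, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# Made-up payload mirroring the structure used in the loop above
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Trattoria Uno', 'location': {'lat': 51.51, 'lng': -0.12},
               'categories': [{'name': 'Italian Restaurant'}]}},
    {'venue': {'name': 'Unnamed Kiosk', 'location': {'lat': 51.52, 'lng': -0.13},
               'categories': []}},
]}]}}

rows = extract_venue_rows('Soho', 51.5, -0.1, sample_payload)
empty = extract_venue_rows('Soho', 51.5, -0.1, {})
print(rows)
```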

In [141]:
search_query = 'Italian'

print(search_query + ' .... OK!')

Italian .... OK!


In [142]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=CATVSEF43Z1JR3ZVH2ED2DL00CVECE5TNBAO0O5FZO4HX4PV&client_secret=FQTWJVLGTLZGXL1DN0W5FNK2YU0ONEXKOQJSLPHX2URYZHET&ll=51.5073219,-0.1276474&v=20180604&query=Italian&radius=4000000000000000000000000000000000000000000000000&limit=30000000000000000000000000'
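
The URL above carries far larger `radius` and `limit` values than the API is likely to honour. As a sketch, the query string can be built with `urllib.parse.urlencode` and the two values clamped first. The caps of 100,000 metres and 50 results are my assumption about the v2 search endpoint and should be checked against the Foursquare documentation; `build_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

MAX_RADIUS_M = 100_000  # assumed API cap, verify against the Foursquare docs
MAX_LIMIT = 50          # assumed API cap, verify against the Foursquare docs

def build_search_url(client_id, client_secret, lat, lng, query,
                     radius, limit, version='20180604'):
    """Build a venues/search URL with radius and limit clamped to the assumed caps."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'query': query,
        'radius': min(int(radius), MAX_RADIUS_M),
        'limit': min(int(limit), MAX_LIMIT),
    }
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)

# Placeholder credentials; the coordinates are those printed for London above
url = build_search_url('ID', 'SECRET', 51.5073219, -0.1276474, 'Italian',
                       radius=400000, limit=300)
query = parse_qs(urlsplit(url).query)
print(query['radius'], query['limit'])
```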

In [143]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5deeb3099388d7001be6d168'},
 'response': {'venues': [{'id': '4c54126830f92d7f9f71d2b9',
    'name': 'ASK Italian',
    'location': {'address': '121 - 125 Park St',
     'lat': 51.5133638,
     'lng': -0.155954,
     'labeledLatLngs': [{'label': 'display',
       'lat': 51.5133638,
       'lng': -0.155954}],
     'distance': 2073,
     'postalCode': 'W1K 7JA',
     'cc': 'GB',
     'city': 'Mayfair',
     'state': 'Greater London',
     'country': 'United Kingdom',
     'formattedAddress': ['121 - 125 Park St',
      'Mayfair',
      'Greater London',
      'W1K 7JA',
      'United Kingdom']},
    'categories': [{'id': '4bf58dd8d48988d110941735',
      'name': 'Italian Restaurant',
      'pluralName': 'Italian Restaurants',
      'shortName': 'Italian',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/italian_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1575924515',
    'hasPerk': False},
   {'id': '

In [144]:
venues = results['response']['venues']

dataframe = json_normalize(venues)
dataframe.head()

Unnamed: 0,categories,hasPerk,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,referralId,venuePage.id
0,"[{'id': '4bf58dd8d48988d110941735', 'name': 'I...",False,4c54126830f92d7f9f71d2b9,121 - 125 Park St,GB,Mayfair,United Kingdom,,2073,"[121 - 125 Park St, Mayfair, Greater London, W...","[{'label': 'display', 'lat': 51.5133638, 'lng'...",51.513364,-0.155954,,W1K 7JA,Greater London,ASK Italian,v-1575924515,
1,"[{'id': '4bf58dd8d48988d114951735', 'name': 'B...",False,4ac518edf964a520c0ac20e3,5 Cecil Court,GB,London,United Kingdom,,384,"[5 Cecil Court, London, Greater London, WC2N 4...","[{'label': 'display', 'lat': 51.51077834156889...",51.510778,-0.127522,,WC2N 4EZ,Greater London,Italian Bookshop,v-1575924515,
2,"[{'id': '4bf58dd8d48988d142941735', 'name': 'A...",False,4e805fb2dab41c952c505ffc,50a Berkeley St,GB,London,United Kingdom,,1054,"[50a Berkeley St, London, Greater London, W1J ...","[{'label': 'display', 'lat': 51.50776710376237...",51.507767,-0.14285,Mayfair,W1J 8HA,Greater London,Novikov,v-1575924515,121043084.0
3,"[{'id': '4bf58dd8d48988d110941735', 'name': 'I...",False,535e72ce498e70b6d38ba610,60c Holborn Viaduct,GB,London,United Kingdom,,1945,"[60c Holborn Viaduct, London, Greater London, ...","[{'label': 'display', 'lat': 51.51697997134506...",51.51698,-0.104241,,EC1A 2FD,Greater London,Beboz Italian Street Food,v-1575924515,132860391.0
4,"[{'id': '4bf58dd8d48988d103951735', 'name': 'C...",False,4eb026469a521bacdb927d65,9 Sloane St,GB,London,United Kingdom,,886,"[9 Sloane St, London, Greater London, SW1X 9LE...","[{'label': 'display', 'lat': 51.50645206211797...",51.506452,-0.14037,,SW1X 9LE,Greater London,Billionaire Italian Couture,v-1575924515,


In [145]:
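filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
# Copy so that the column assignment below does not modify a view of `dataframe`
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()

def get_category_type(row):
    """Return the name of the venue's first category, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# Replace the list of category dicts with the primary category name
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# Drop the 'location.' prefix from the column names
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered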

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,ASK Italian,Italian Restaurant,121 - 125 Park St,GB,Mayfair,United Kingdom,,2073,"[121 - 125 Park St, Mayfair, Greater London, W...","[{'label': 'display', 'lat': 51.5133638, 'lng'...",51.513364,-0.155954,,W1K 7JA,Greater London,4c54126830f92d7f9f71d2b9
1,Italian Bookshop,Bookstore,5 Cecil Court,GB,London,United Kingdom,,384,"[5 Cecil Court, London, Greater London, WC2N 4...","[{'label': 'display', 'lat': 51.51077834156889...",51.510778,-0.127522,,WC2N 4EZ,Greater London,4ac518edf964a520c0ac20e3
2,Novikov,Asian Restaurant,50a Berkeley St,GB,London,United Kingdom,,1054,"[50a Berkeley St, London, Greater London, W1J ...","[{'label': 'display', 'lat': 51.50776710376237...",51.507767,-0.14285,Mayfair,W1J 8HA,Greater London,4e805fb2dab41c952c505ffc
3,Beboz Italian Street Food,Italian Restaurant,60c Holborn Viaduct,GB,London,United Kingdom,,1945,"[60c Holborn Viaduct, London, Greater London, ...","[{'label': 'display', 'lat': 51.51697997134506...",51.51698,-0.104241,,EC1A 2FD,Greater London,535e72ce498e70b6d38ba610
4,Billionaire Italian Couture,Clothing Store,9 Sloane St,GB,London,United Kingdom,,886,"[9 Sloane St, London, Greater London, SW1X 9LE...","[{'label': 'display', 'lat': 51.50645206211797...",51.506452,-0.14037,,SW1X 9LE,Greater London,4eb026469a521bacdb927d65
5,Prezzo Italian Restaurant London St Martins Lane,Italian Restaurant,116 Saint Martin's Ln,GB,London,United Kingdom,,234,"[116 Saint Martin's Ln, London, WC2N 4BD, Unit...","[{'label': 'display', 'lat': 51.50940280181761...",51.509403,-0.127127,,WC2N 4BD,,4c62d713ec94a5932e3d2aca
6,The Italian Job,Beer Bar,40-42 Newington Causeway,GB,London,United Kingdom,,2201,"[40-42 Newington Causeway, London, Greater Lon...","[{'label': 'display', 'lat': 51.49871478592709...",51.498715,-0.099036,,SE1 6DR,Greater London,57f3baea498efae5781eca28
7,Vespa Italian Restaurant,Italian Restaurant,9a Irving St,GB,London,United Kingdom,,293,"[9a Irving St, London, Greater London, WC2H 7A...","[{'label': 'display', 'lat': 51.509867, 'lng':...",51.509867,-0.12877,,WC2H 7AH,Greater London,5b266200066332001c011f5e
8,National Gallery,Art Museum,Trafalgar Sq,GB,London,United Kingdom,,182,"[Trafalgar Sq, London, Greater London, WC2N 5D...","[{'label': 'display', 'lat': 51.50887601013219...",51.508876,-0.128478,,WC2N 5DN,Greater London,4ac518cdf964a520e6a520e3
9,italian furniture london,Furniture / Home Store,,GB,London,United Kingdom,,158,"[London, Greater London, HA9, United Kingdom]","[{'label': 'display', 'lat': 51.50864897264021...",51.508649,-0.126836,,HA9,Greater London,56c33025cd106d138c68f947
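
The query `Italian` matches on venue names, so the results include a bookshop, a clothing store and a museum. A small sketch of keeping only venues whose category is `Italian Restaurant`, using a few rows copied from the output above:

```python
import pandas as pd

# A few rows copied from the output above
sample = pd.DataFrame({
    'name': ['ASK Italian', 'Italian Bookshop', 'Novikov', 'Beboz Italian Street Food'],
    'categories': ['Italian Restaurant', 'Bookstore', 'Asian Restaurant', 'Italian Restaurant'],
})

# Keep only the rows categorised as Italian restaurants
italian_only = sample[sample['categories'] == 'Italian Restaurant'].reset_index(drop=True)
print(italian_only)
```

The same boolean mask could be applied to `dataframe_filtered` before mapping or counting venues.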


In [146]:
dataframe_filtered.name

0                                          ASK Italian
1                                     Italian Bookshop
2                                              Novikov
3                            Beboz Italian Street Food
4                          Billionaire Italian Couture
5     Prezzo Italian Restaurant London St Martins Lane
6                                      The Italian Job
7                             Vespa Italian Restaurant
8                                     National Gallery
9                             italian furniture london
10                                       italian sofas
11                                 Italian Sausage Bap
12                            Italian Trade Commission
13                            Pepe Italian Street Food
14                                    Italian Pizzeria
15                           St Peter's Italian Church
16                                       Italian Jacks
17      Amici Italian Restaurant, Courtyard & Wine Bar
18        

In [147]:
# `venues` holds the raw search results at this point, so build the venue
# table from the cleaned columns of dataframe_filtered instead
venues_df = dataframe_filtered[['name', 'lat', 'lng', 'categories']].copy()

venues_df.columns = ['VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']
venues_df.to_csv('venues_df.csv', index=False)
print(venues_df.shape)
venues_df.head()

(50, 4)


Unnamed: 0,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,ASK Italian,51.513364,-0.155954,Italian Restaurant
1,Italian Bookshop,51.510778,-0.127522,Bookstore
2,Novikov,51.507767,-0.14285,Asian Restaurant
3,Beboz Italian Street Food,51.51698,-0.104241,Italian Restaurant
4,Billionaire Italian Couture,51.506452,-0.14037,Clothing Store


In [148]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13)

# CircleMarker is available at the top level of the folium package
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill=True,
        fill_color='blue',
    ).add_to(venues_map)

venues_map

Once the code above runs without errors, the next step is to apply the k-means clustering model to the neighbourhoods.
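
As a rough sketch of that clustering step, the example below runs scikit-learn's `KMeans` on a small table of venue-category frequencies. The neighbourhood names, the frequencies and the choice of two clusters are all made up for illustration; the real input would be the per-neighbourhood venue frequencies from the Foursquare data.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Made-up venue-category frequencies per neighbourhood (each row sums to 1)
freq = pd.DataFrame(
    {'Italian Restaurant': [0.50, 0.45, 0.05, 0.00],
     'Pub':                [0.30, 0.35, 0.15, 0.20],
     'Park':               [0.20, 0.20, 0.80, 0.80]},
    index=['A', 'B', 'C', 'D'])  # hypothetical neighbourhood names

# Fit k-means with two clusters; random_state makes the run repeatable
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(freq.values)
freq['Cluster'] = kmeans.labels_

# Average share of Italian restaurants in each cluster
italian_share = freq.groupby('Cluster')['Italian Restaurant'].mean()
print(freq)
print(italian_share)
```

A cluster with plenty of dining venues but a low share of Italian restaurants might point to candidate locations, although that reading depends on the data and on the number of clusters chosen.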


Suggestions for fixing or improving this notebook are welcome.