# Applied Data Science Capstone - Final Project

## 1. Business Problem

According to www.food.gov.uk, London has more than 14,000 restaurants and about 9 million residents, which makes opening a new restaurant there extremely challenging. When choosing a restaurant type and a location, an entrepreneur usually relies on common sense and domain knowledge alone. Too often, a poorly considered decision leads to low income and eventual bankruptcy: according to several surveys, up to 40% of such start-ups fail in their first year. Suppose an investor has enough time and money, as well as a passion for opening the best eating spot in London. What type of restaurant should it be? What would be the best place for it? Is there a better way to answer these questions than guessing?  
What if we could cluster city neighborhoods by the similarity of their nearby restaurants? What if we could visualize these clusters on a map? What if we could find out which types of restaurant are the most and least common in each location? Equipped with that knowledge, we could make an informed choice among the many restaurant types and available places.  
Let machine learning do the job. Using reliable venue data, it can examine the city's neighborhoods and reveal patterns that we are not aware of.


**Target audience:** investors, entrepreneurs, and chefs interested in opening a restaurant in London, who may need objective advice on what type of restaurant would be more successful and where exactly it should be opened.

## 2. Data

**Step 1.** Using the table at https://en.wikipedia.org/wiki/List_of_areas_of_London, collect information about London boroughs and locations, excluding records whose "Post Town" is not London.  
**Step 2.** Use the Geopy and Folium libraries to get the coordinates of every location and plot the geospatial data on a map of London.  
**Step 3.** Using the Foursquare API, collect the top 100 restaurants and their categories for each location within a radius of 500 meters.  
**Step 4.** Group the collected restaurants by location, taking the mean frequency of occurrence of each type, to prepare them for clustering.  
**Step 5.** Cluster the locations with the k-means algorithm and analyze the top 10 most common restaurant types in each cluster.  
**Step 6.** Visualize the clusters on the map, thus showing the best locations for opening the chosen restaurant.

## 3. Methodology

Before we get the data and start exploring it, let's import all the dependencies that we will need.

In [1]:
import time # for time delay while working with API

import requests # library to handle requests

import bs4 # library to parse webpages

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
from pandas import json_normalize # transform JSON into a pandas dataframe (top-level import, pandas >= 1.0)

# Convert an address into latitude and longitude values
from geopy.geocoders import Nominatim
import geopy.geocoders

import json # library to handle JSON files

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# k-means from clustering stage
from sklearn.cluster import KMeans

# Map rendering library
import folium

# regular expressions
import re

### 3.1. Collecting London Neighborhoods

Let's create a web-scraping script to collect London neighborhood information from the table at https://en.wikipedia.org/wiki/List_of_areas_of_London, keeping the following columns: Post_town, Borough, and Location.

In [2]:
# Download the webpage
url = 'https://en.wikipedia.org/wiki/List_of_areas_of_London'
res = requests.get(url)
res.raise_for_status()

In [3]:
# Create a BeautifulSoup object
london_soup = bs4.BeautifulSoup(res.text)

In [4]:
# Selecting all elements inside the corresponding tags
elements = london_soup.select('div table tbody tr td')

In [5]:
# Take a look at the raw data
for i in range(2, len(elements), 6):
    print('{0} | {1} | {2} | {3}'.format(i//6 + 1, elements[i].getText(), elements[i+1].getText(),
                                         elements[i+2].getText()))
    if elements[i].getText() == 'Yiewsley': # the last location on the table
        break

1 | Abbey Wood | Bexley,  Greenwich [7] | LONDON
2 | Acton | Ealing, Hammersmith and Fulham[8] | LONDON
3 | Addington | Croydon[8] | CROYDON
4 | Addiscombe | Croydon[8] | CROYDON
5 | Albany Park | Bexley | BEXLEY, SIDCUP
6 | Aldborough Hatch | Redbridge[9] | ILFORD
7 | Aldgate | City[10] | LONDON
8 | Aldwych | Westminster[10] | LONDON
9 | Alperton | Brent[11] | WEMBLEY
10 | Anerley | Bromley[11] | LONDON
11 | Angel | Islington[8] | LONDON
12 | Aperfield | Bromley[11] | WESTERHAM
13 | Archway | Islington[12] | LONDON
14 | Ardleigh Green | Havering[12] | HORNCHURCH
15 | Arkley | Barnet[12] | BARNET, LONDON
16 | Arnos Grove | Enfield[12] | LONDON
17 | Balham | Wandsworth[13] | LONDON
18 | Bankside | Southwark[14] | LONDON
19 | Barbican | City[14] | LONDON
20 | Barking | Barking and Dagenham[14] | BARKING
21 | Barkingside | Redbridge[15] | ILFORD
22 | Barnehurst | Bexley[15] | BEXLEYHEATH
23 | Barnes | Richmond upon Thames[15] | LONDON
24 | Barnes Cray | Bexley[16] | DARTFORD
25 | Barnet G

In [106]:
yiewsley_index = (533-1)*6 + 2
elements[yiewsley_index].get_text()

'Yiewsley'
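The index arithmetic in the cell above can be checked in isolation. The sketch below assumes, as the scraping loop does, that the first location cell is at index 2 of the flat cell list and that each table row contributes six cells; the helper name `cell_index` is made up for this illustration.

```python
CELLS_PER_ROW = 6         # stride used by the scraping loop
FIRST_LOCATION_INDEX = 2  # index of the first location cell in `elements`

def cell_index(row_number, first=FIRST_LOCATION_INDEX, stride=CELLS_PER_ROW):
    """Return the index of the location cell for a 1-based table row."""
    return (row_number - 1) * stride + first

first_index = cell_index(1)
last_index = cell_index(533)

# Stepping from the first to the last location cell visits one cell per row
row_count = len(range(first_index, last_index + 1, CELLS_PER_ROW))
```

With 533 rows, the last location cell lands on index 3194, so a loop over `range(2, 3195, 6)` visits each row exactly once.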

The previous step collected 533 rows of data. The last location in the table is 'Yiewsley', and its index in the _elements_ list is 3194. Let's transform the raw data into a list of lists, skipping every location whose _Post Town_ is not 'LONDON'. We also add two zeros to each row as initial geographical coordinates.

In [7]:
# Creating a new list of rows
lst = []
for i in range(2, 3195, 6):
    location, borough, postal_town = elements[i].getText(), elements[i+1].getText(), elements[i+2].getText()
    if postal_town != 'LONDON':
        continue
    lst.append([location, borough, postal_town, 0, 0])
lst[25:34]

[['Bloomsbury', 'Camden[29]', 'LONDON', 0, 0],
 ['Bounds Green', 'Haringey[31]', 'LONDON', 0, 0],
 ['Bow', 'Tower Hamlets[31]', 'LONDON', 0, 0],
 ['Bowes Park', 'Haringey[32]', 'LONDON', 0, 0],
 ['Brent Cross', 'Barnet', 'LONDON', 0, 0],
 ['Brent Park', 'Brent', 'LONDON', 0, 0],
 ['Brixton', 'Lambeth[34]', 'LONDON', 0, 0],
 ['Brockley', 'Lewisham[34]', 'LONDON', 0, 0],
 ['Bromley (also Bromley-by-Bow)', 'Tower Hamlets[36]', 'LONDON', 0, 0]]

As we can see, there is some garbage in our data, for example in the last row of the previous output: ['Bromley (also Bromley-by-Bow)', 'Tower Hamlets[36]', 'LONDON', 0, 0].  
Let's clean our data by deleting the trailing text in parentheses or square brackets using regular expressions.

In [8]:
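# Strip a trailing "(...)" or "[...]" note from location and borough names.
# Raw strings keep the backslashes in the pattern from being read as string escapes.
for i in range(len(lst)):
    loc, bor = lst[i][0], lst[i][1]
    if loc.endswith(')') or loc.endswith(']'):
        lst[i][0] = re.sub(r'(\s?\(.*?\)$)|(\s?\[.*?\]$)', '', loc)
    if bor.endswith(')') or bor.endswith(']'):
        lst[i][1] = re.sub(r'(\s?\(.*?\)$)|(\s?\[.*?\]$)', '', bor)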
lst[25:34]

[['Bloomsbury', 'Camden', 'LONDON', 0, 0],
 ['Bounds Green', 'Haringey', 'LONDON', 0, 0],
 ['Bow', 'Tower Hamlets', 'LONDON', 0, 0],
 ['Bowes Park', 'Haringey', 'LONDON', 0, 0],
 ['Brent Cross', 'Barnet', 'LONDON', 0, 0],
 ['Brent Park', 'Brent', 'LONDON', 0, 0],
 ['Brixton', 'Lambeth', 'LONDON', 0, 0],
 ['Brockley', 'Lewisham', 'LONDON', 0, 0],
 ['Bromley', 'Tower Hamlets', 'LONDON', 0, 0]]
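To see what the pattern does on its own, here is a minimal sketch that applies the same regular expression to a few strings taken from the outputs above. The helper name `strip_trailing_note` is an assumption made for this example, not part of the notebook.

```python
import re

# Same pattern as in the cleaning cell, compiled once
TRAILING_NOTE = re.compile(r'(\s?\(.*?\)$)|(\s?\[.*?\]$)')

def strip_trailing_note(text):
    """Remove a trailing '(...)' or '[...]' group and one preceding space, if any."""
    return TRAILING_NOTE.sub('', text)

samples = [
    'Bromley (also Bromley-by-Bow)',
    'Tower Hamlets[36]',
    'Bexley,  Greenwich [7]',
    'Brent Cross',                   # nothing to strip
]
cleaned = [strip_trailing_note(s) for s in samples]
```

Because both alternatives are anchored with `$`, only a note at the very end of the string is removed; text without a trailing note passes through unchanged.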

So our dataset is clean enough and ready to be transformed into a pandas dataframe. Fine! By the way, how many locations do we have?

In [9]:
print('Now we have {} rows of relevant data.'.format(len(lst)))

Now we have 299 rows of relevant data.


Let's transform them.

In [10]:
london_df = pd.DataFrame(lst, columns=['Location', 'Borough', 'PostalTown', 'Latitude', 'Longitude'])
london_df.head()

| | Location | Borough | PostalTown | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich | LONDON | 0 | 0 |
| 1 | Acton | Ealing, Hammersmith and Fulham | LONDON | 0 | 0 |
| 2 | Aldgate | City | LONDON | 0 | 0 |
| 3 | Aldwych | Westminster | LONDON | 0 | 0 |
| 4 | Anerley | Bromley | LONDON | 0 | 0 |


Confirm the size:

In [11]:
london_df.shape

(299, 5)

### 3.2. Adding Coordinates

In order to utilize the Foursquare location data, we need to get latitude and longitude coordinates for each neighborhood in the dataframe.  
We will use the geopy library for that purpose. Let's try it with the first address, Abbey Wood, Bexley, Greenwich, London.

In [12]:
# Getting the address string
address = ', '.join(list(london_df.iloc[0, :3]))
address

'Abbey Wood, Bexley,  Greenwich, LONDON'

In [13]:
# Using geopy
geolocator = Nominatim(user_agent='opening_restaurant_london')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of {0} are {1}, {2}.'.format(address, latitude, longitude))

The geographical coordinates of Abbey Wood, Bexley,  Greenwich, LONDON are 51.4855716, 0.11968682027131783.


In [14]:
# Make changes to the dataframe
london_df.iloc[0,3] = latitude
london_df.iloc[0,4] = longitude
london_df.head()

| | Location | Borough | PostalTown | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich | LONDON | 51.485572 | 0.119687 |
| 1 | Acton | Ealing, Hammersmith and Fulham | LONDON | 0.0 | 0.0 |
| 2 | Aldgate | City | LONDON | 0.0 | 0.0 |
| 3 | Aldwych | Westminster | LONDON | 0.0 | 0.0 |
| 4 | Anerley | Bromley | LONDON | 0.0 | 0.0 |
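The notebook does not show how the remaining rows were geocoded. One way to repeat the single lookup above for every row might look like the sketch below. It is an assumption, not the author's code: `geocode_rows` is a hypothetical helper, and the geocoder is passed in as a plain callable so the loop can be tried without network access. With geopy, `geolocator.geocode` could be passed in, since it returns `None` when nothing is found.

```python
import time
from types import SimpleNamespace

def geocode_rows(rows, geocode, delay_seconds=1.0, sleep=time.sleep):
    """Fill latitude/longitude (positions 3 and 4) of each row in place.

    `geocode` takes an address string and returns an object with
    `.latitude` and `.longitude`, or None when the address is not found.
    Returns the names of the locations that could not be geocoded.
    """
    failed = []
    for row in rows:
        address = ', '.join(row[:3])
        result = geocode(address)
        if result is None:
            failed.append(row[0])      # keep the 0, 0 placeholder
        else:
            row[3], row[4] = result.latitude, result.longitude
        sleep(delay_seconds)           # pause between requests to the service
    return failed

# Stand-in geocoder for a dry run (hypothetical; knows a single address)
def fake_geocode(address):
    known = {'Aldgate, City, LONDON': SimpleNamespace(latitude=51.514248, longitude=-0.075719)}
    return known.get(address)

rows = [['Aldgate', 'City', 'LONDON', 0, 0],
        ['Acton', 'Ealing, Hammersmith and Fulham', 'LONDON', 0, 0]]
failed = geocode_rows(rows, fake_geocode, delay_seconds=0, sleep=lambda s: None)
```

Rows that cannot be resolved keep their `0, 0` placeholder, which is what the later cleaning step relies on.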


Load the coordinates for all locations from `london_coordinates.csv`.

In [16]:
lon_df = pd.read_csv('london_coordinates.csv')
lon_df.head()

| | Location | Borough | PostalTown | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Abbey Wood | Greenwich | LONDON | 51.487621 | 0.11405 |
| 1 | Acton | Ealing, Hammersmith and Fulham | LONDON | 0.0 | 0.0 |
| 2 | Aldgate | City | LONDON | 51.514248 | -0.075719 |
| 3 | Aldwych | Westminster | LONDON | 51.51294 | -0.118101 |
| 4 | Anerley | Bromley | LONDON | 51.412848 | -0.065301 |


The next step is to drop rows that still contain 0 as a latitude or longitude.

In [17]:
# Check initial shape
lon_df.shape

(299, 5)

In [18]:
# Replace all zeros with NaN
lon_df = lon_df.replace(0, np.nan)

# Drop all rows with a missing latitude or longitude
lon_df.dropna(subset=['Latitude', 'Longitude'], axis=0, inplace=True)
lon_df.reset_index(drop=True, inplace=True)
print('Now the London dataframe has {0} data rows.'.format(lon_df.shape[0]))

Now the London dataframe has 290 data rows.
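Note that `replace(0, np.nan)` is applied to the whole dataframe, so any legitimate zero would be treated as missing too. A slightly more targeted variant, sketched here on toy rows copied from the table above, drops a row only when both coordinates still hold the placeholder. This is an alternative for illustration, not what the notebook does.

```python
import pandas as pd

toy = pd.DataFrame({
    'Location': ['Abbey Wood', 'Acton', 'Aldgate'],
    'Latitude': [51.487621, 0.0, 51.514248],
    'Longitude': [0.11405, 0.0, -0.075719],
})

# A row counts as "not geocoded" only when both coordinates are still 0
not_geocoded = (toy['Latitude'] == 0) & (toy['Longitude'] == 0)
geocoded = toy[~not_geocoded].reset_index(drop=True)
```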


Check whether any location names are not unique.

In [19]:
len(lon_df['Location'].unique())

288

In [20]:
# Printing these locations
for i in range(len(lon_df)):
    loc = lon_df.iloc[i,0]
    for j in range(i+1, len(lon_df)):
        if lon_df.iloc[j,0] == loc:
            print(j, loc)

54 Church End
103 Grove Park
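The nested loop above compares every pair of rows. pandas can report the same repeats directly with `Series.duplicated`; the sketch below uses a toy dataframe whose borough values are placeholders, not real data.

```python
import pandas as pd

toy = pd.DataFrame({
    'Location': ['Church End', 'Grove Park', 'Church End', 'Grove Park', 'Angel'],
    'Borough': ['Borough A', 'Borough B', 'Borough C', 'Borough D', 'Borough E'],
})

# keep='first' flags only the second and later occurrences of a name
repeats = toy[toy['Location'].duplicated(keep='first')]

# keep=False flags every copy, which helps when comparing the boroughs
all_copies = toy[toy['Location'].duplicated(keep=False)]
```

Here `repeats` holds the rows that `drop_duplicates(subset='Location', keep='first')` would remove.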


For illustration purposes, let's simplify things and drop the duplicated locations.  
_(Strictly speaking, we should not do that, because each pair of a repeated location name and its borough is still unique. However, only 2 of the 290 locations are affected, so the loss is negligible.)_

In [21]:
lon_df.drop_duplicates(subset='Location', keep='first', inplace=True)
if lon_df['Location'].unique().shape[0] == lon_df.shape[0]:
    print('Duplicates were removed successfully.')

Duplicates were removed successfully.


Confirm the new size.

In [22]:
lon_df.shape

(288, 5)

So 288 London neighborhoods are ready to be shown on a map.  
We will use the folium library for this purpose.

In [23]:
# Get the London "central" point
london_address = 'London, England'
geolocator = Nominatim(user_agent='opening_restaurant_london')
location = geolocator.geocode(london_address)
london_lat = location.latitude
london_lon = location.longitude
print('The geographical coordinates of {0} are {1}, {2}.'.format(london_address, london_lat, london_lon))

The geographical coordinates of London, England are 51.5073219, -0.1276474.


In [27]:
# create map of London using starting point coordinates
london_map = folium.Map(location=[london_lat, london_lon], zoom_start=11)

# add markers to map
for lat, lng, bor, loc in zip(lon_df['Latitude'], lon_df['Longitude'], lon_df['Borough'], lon_df['Location']):
    label = '{}, {}'.format(loc, bor)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        ).add_to(london_map)
    
london_map

## 4. Exploring London Restaurants

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

### 4.1. Collecting Restaurants

Let's explore one neighborhood in our dataframe, the one with index label 16.

In [25]:
lon_df.loc[16, 'Location']

'Bellingham'

Get the neighborhood's latitude and longitude values.

In [26]:
loc_latitude = lon_df.loc[16, 'Latitude'] # neighborhood latitude value
loc_longitude = lon_df.loc[16, 'Longitude'] # neighborhood longitude value

loc_name = lon_df.loc[16, 'Location'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(loc_name, 
                                                               loc_latitude, 
                                                               loc_longitude))

Latitude and longitude values of Bellingham are 51.4329965, -0.019337599999999996.


Now, let's get the top 100 restaurants within a radius of 500 meters of Bellingham.

In [29]:
# Foursquare request parameters
radius = 500
LIMIT = 100
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
url = 'https://api.foursquare.com/v2/venues/explore?client_id={0}&client_secret={1}&ll={2},{3}&v={4}&radius={5}&limit={6}&query=restaurant'.format(CLIENT_ID, CLIENT_SECRET, loc_latitude, loc_longitude, VERSION, radius, LIMIT)
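The request URL can also be assembled from a dictionary of parameters, which keeps each value next to its name and takes care of URL encoding. This is a sketch using only the standard library; `build_explore_url` is a hypothetical helper, the credentials are placeholders, and the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, lat, lng,
                      version='20180605', radius=500, limit=100, query='restaurant'):
    """Build the venues/explore request URL from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
        'query': query,
    }
    return '{}?{}'.format(EXPLORE_ENDPOINT, urlencode(params))

example_url = build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET',
                                51.4329965, -0.0193376)

# Parse the query string back to confirm what will be sent
query_back = parse_qs(urlsplit(example_url).query)
```

Reading credentials from outside the notebook (for example, from environment variables) also avoids publishing them together with the code.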

In [31]:
results = requests.get(url).json()

In [32]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [33]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Turkuaz | Turkish Restaurant | 51.43532 | -0.017902 |
| 1 | Papa John's Pizza | Pizza Place | 51.433644 | -0.017322 |
| 2 | Chicago Hot Pizza & Sandwich Bar | Pizza Place | 51.43331 | -0.017467 |
| 3 | Morley's | Fried Chicken Joint | 51.432677 | -0.017359 |


In [34]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

4 venues were returned by Foursquare.
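The cells above rely on a particular nesting in the response: `response` → `groups[0]` → `items`, with each item holding a `venue`. The sketch below walks that structure with plain Python on a hand-made payload that contains only the keys used in this notebook. The second venue is hypothetical and shows the case of an empty category list.

```python
def extract_venues(payload):
    """Return (name, category, lat, lng) tuples from an explore-style response."""
    items = payload['response']['groups'][0]['items']
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # Use the first category name, or None when the list is empty
        category = categories[0]['name'] if categories else None
        location = venue['location']
        rows.append((venue['name'], category, location['lat'], location['lng']))
    return rows

# Hand-made payload for illustration; not a real API response
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Turkuaz',
               'categories': [{'name': 'Turkish Restaurant'}],
               'location': {'lat': 51.43532, 'lng': -0.017902}}},
    {'venue': {'name': 'Example Venue',
               'categories': [],
               'location': {'lat': 51.4330, 'lng': -0.0174}}},
]}]}}

venue_rows = extract_venues(sample_payload)
```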


Let's create a function that repeats the same process for all the neighborhoods in London.

In [37]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={0}&client_secret={1}&v={2}&ll={3},{4}&radius={5}&limit={6}&query=restaurant'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            #v['venue']['location']['lat'], 
            #v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Location', 
                  'Latitude', 
                  'Longitude', 
                  'Venue', 
                  #'Venue Latitude', 
                  #'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

Now run the above function on each neighborhood and create a new dataframe called *london_venues*.

In [38]:
london_venues = getNearbyVenues(names=lon_df['Location'],
                                   latitudes=lon_df['Latitude'],
                                   longitudes=lon_df['Longitude']
                                  )

Let's check the size of the resulting dataframe.

In [39]:
print(london_venues.shape)
london_venues.head()

(6357, 5)


| | Location | Latitude | Longitude | Venue | Venue Category |
|---|---|---|---|---|---|
| 0 | Abbey Wood | 51.487621 | 0.11405 | Frank's Fish Bar | Fish & Chips Shop |
| 1 | Aldgate | 51.514248 | -0.075719 | Benk + Bo | Bakery |
| 2 | Aldgate | 51.514248 | -0.075719 | 1n1 Fashion Pizza | Pizza Place |
| 3 | Aldgate | 51.514248 | -0.075719 | Bife | Argentinian Restaurant |
| 4 | Aldgate | 51.514248 | -0.075719 | The Japanese Canteen | Japanese Restaurant |


Let's check how many restaurants were returned for each neighborhood.

In [41]:
london_venues[['Location', 'Venue']].groupby('Location').count()

| Location | Venue |
|---|---|
| Abbey Wood | 1 |
| Aldgate | 100 |
| Aldwych | 100 |
| Anerley | 3 |
| Angel | 68 |
| Archway | 24 |
| Arnos Grove | 8 |
| Balham | 35 |
| Bankside | 62 |
| Barbican | 51 |


Now check whether the Foursquare API returned no restaurants for some locations.

In [42]:
x = london_venues[['Location', 'Venue']].groupby('Location').count().shape[0]
y = lon_df.shape[0]
empty_locations = []
if x != y:
    print('Missing data for {0} locations:'.format(y-x))
    # And print them
    for i in range(lon_df.shape[0]):
        loc = lon_df.iloc[i,0]
        k = 0
        for j in range(london_venues.shape[0]):
            if loc == london_venues.iloc[j,0]:
                k += 1
        if k == 0:
            print(i,loc)
            empty_locations.append(loc)

Missing data for 4 locations:
51 Chinbrook
62 Crossness
163 Mill Hill
287 Wormwood Scrubs
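The same check can be written as a membership test against a set, which avoids scanning all venue rows for every location. The sketch below uses short toy lists; in the notebook the two inputs would be the `Location` columns of `lon_df` and `london_venues`, and with pandas a mask such as `~lon_df['Location'].isin(london_venues['Location'])` should give the same result.

```python
# Toy stand-ins for lon_df['Location'] and london_venues['Location']
all_locations = ['Abbey Wood', 'Chinbrook', 'Aldgate', 'Crossness']
venue_locations = ['Abbey Wood', 'Aldgate', 'Aldgate', 'Aldgate']

# Locations that appear at least once in the venue data
with_venues = set(venue_locations)

# Keep the original order of the locations that have no venues at all
missing = [loc for loc in all_locations if loc not in with_venues]
```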


Let's find out how many unique categories can be curated from all the returned restaurants.

In [43]:
print('There are {0} unique categories.'.format(len(london_venues['Venue Category'].unique())))

There are 130 unique categories.


### 4.2. Exploring Restaurants

To begin the analysis, we need to transform the collected information using one-hot encoding.

In [44]:
# one hot encoding
london_onehot = pd.get_dummies(london_venues[['Venue Category']], prefix="", prefix_sep="")

# add location column back to dataframe
london_onehot['Location'] = london_venues['Location'] 

# move location column to the first column
fixed_columns = [london_onehot.columns[-1]] + list(london_onehot.columns[:-1])
london_onehot = london_onehot[fixed_columns]

london_onehot.head()

Unnamed: 0,Location,Afghan Restaurant,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,Austrian Restaurant,BBQ Joint,Bagel Shop,Bakery,Belgian Restaurant,Bistro,Brasserie,Brazilian Restaurant,Breakfast Spot,Buffet,Bulgarian Restaurant,Burger Joint,Burrito Place,Cafeteria,Café,Cajun / Creole Restaurant,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Chaat Place,Chinese Restaurant,Churrascaria,Cigkofte Place,Creperie,Cuban Restaurant,Currywurst Joint,Czech Restaurant,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Food,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Halal Restaurant,Hawaiian Restaurant,Himalayan Restaurant,Hot Dog Joint,Hunan Restaurant,Hungarian Restaurant,Indian Restaurant,Indonesian Restaurant,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Korean Restaurant,Kosher Restaurant,Kurdish Restaurant,Latin American Restaurant,Lebanese Restaurant,Malay Restaurant,Mamak Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mineiro Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,New American Restaurant,Noodle House,North Indian Restaurant,Okonomiyaki Restaurant,Paella Restaurant,Pakistani Restaurant,Persian Restaurant,Peruvian Restaurant,Pizza Place,Poke Place,Polish Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Russian Restaurant,Salad Place,Sandwich Place,Scandinavian Restaurant,Seafood Restaurant,Shabu-Shabu Restaurant,Snack Place,Soup Place,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spanish Restaurant,Sri Lankan Restaurant,Steakhouse,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tapas Restaurant,Thai Restaurant,Turkish Restaurant,Udon Restaurant,Ukrainian Restaurant,Vegetarian / Vegan Restaurant,Veneto Restaurant,Vietnamese Restaurant,Wings Joint,Xinjiang Restaurant,Yoshoku Restaurant
0,Abbey Wood,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Aldgate,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Aldgate,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Aldgate,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Aldgate,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [45]:
london_onehot.shape

(6357, 131)

Next, let's group the rows by neighborhood and take the mean frequency of occurrence of each category, preparing the dataframe for clustering.

In [46]:
london_grouped = london_onehot.groupby('Location').mean().reset_index()
london_grouped

Unnamed: 0,Location,Afghan Restaurant,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,Austrian Restaurant,BBQ Joint,Bagel Shop,Bakery,Belgian Restaurant,Bistro,Brasserie,Brazilian Restaurant,Breakfast Spot,Buffet,Bulgarian Restaurant,Burger Joint,Burrito Place,Cafeteria,Café,Cajun / Creole Restaurant,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Chaat Place,Chinese Restaurant,Churrascaria,Cigkofte Place,Creperie,Cuban Restaurant,Currywurst Joint,Czech Restaurant,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Food,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Halal Restaurant,Hawaiian Restaurant,Himalayan Restaurant,Hot Dog Joint,Hunan Restaurant,Hungarian Restaurant,Indian Restaurant,Indonesian Restaurant,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Korean Restaurant,Kosher Restaurant,Kurdish Restaurant,Latin American Restaurant,Lebanese Restaurant,Malay Restaurant,Mamak Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mineiro Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,New American Restaurant,Noodle House,North Indian Restaurant,Okonomiyaki Restaurant,Paella Restaurant,Pakistani Restaurant,Persian Restaurant,Peruvian Restaurant,Pizza Place,Poke Place,Polish Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Russian Restaurant,Salad Place,Sandwich Place,Scandinavian Restaurant,Seafood Restaurant,Shabu-Shabu Restaurant,Snack Place,Soup Place,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spanish Restaurant,Sri Lankan Restaurant,Steakhouse,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tapas Restaurant,Thai Restaurant,Turkish Restaurant,Udon Restaurant,Ukrainian Restaurant,Vegetarian / Vegan Restaurant,Veneto Restaurant,Vietnamese Restaurant,Wings Joint,Xinjiang Restaurant,Yoshoku Restaurant
0,Abbey Wood,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Aldgate,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.1,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.05,0.0,0.04,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.05,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.01,0.0,0.06,0.0,0.05,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.06,0.01,0.0,0.0,0.05,0.02,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0
2,Aldwych,0.0,0.0,0.03,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.04,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.06,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.08,0.0,0.03,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.12,0.0,0.0,0.07,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.04,0.0,0.0,0.02,0.02,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0
3,Anerley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Angel,0.014706,0.0,0.0,0.0,0.0,0.014706,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.044118,0.044118,0.0,0.117647,0.014706,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.029412,0.029412,0.014706,0.014706,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.029412,0.014706,0.0,0.0,0.058824,0.0,0.014706,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.014706,0.058824,0.0,0.0,0.044118,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.088235,0.0,0.0,0.014706,0.014706,0.014706,0.0,0.0,0.029412,0.0,0.029412,0.0,0.0,0.0
5,Archway,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.083333,0.0,0.083333,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0
6,Arnos Grove,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Balham,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.085714,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.057143,0.0,0.0,0.171429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.057143,0.0,0.057143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.085714,0.0,0.0,0.0,0.028571,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.028571,0.0,0.028571,0.0,0.0,0.057143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Bankside,0.0,0.0,0.0,0.0,0.0,0.048387,0.016129,0.0,0.0,0.0,0.032258,0.0,0.016129,0.0,0.0,0.016129,0.0,0.0,0.0,0.016129,0.0,0.080645,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.016129,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.032258,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.048387,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.048387,0.0,0.0,0.0,0.080645,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.016129,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.048387,0.032258,0.048387,0.0,0.0,0.032258,0.0,0.048387,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.048387,0.0,0.016129,0.016129,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.016129,0.0,0.0,0.0
9,Barbican,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.019608,0.0,0.156863,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.039216,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.019608,0.098039,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.039216,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.098039,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.019608,0.0,0.058824,0.058824,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.039216,0.0,0.0,0.0


Let's confirm the new size.

In [47]:
london_grouped.shape

(284, 131)
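The two transformations above, one-hot encoding followed by a grouped mean, are easier to follow on a tiny example. The sketch below uses four toy rows with location and category names taken from the outputs above; it illustrates the technique and is not a rerun of the notebook.

```python
import pandas as pd

toy_venues = pd.DataFrame({
    'Location': ['Anerley', 'Anerley', 'Anerley', 'Abbey Wood'],
    'Venue Category': ['Burger Joint', 'Pizza Place', 'Sandwich Place',
                       'Fish & Chips Shop'],
})

# One column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=float)
onehot.insert(0, 'Location', toy_venues['Location'])

# The mean of the indicator columns is the share of each category per location
grouped = onehot.groupby('Location').mean().reset_index()
anerley = grouped[grouped['Location'] == 'Anerley'].iloc[0]
```

Each row of `grouped` sums to 1 across the category columns, so a location with three venues of three different types gets one third in each of them, as Anerley does in the table above.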

Let's investigate each neighborhood along with its 10 most common venue categories.

In [50]:
# Function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [49]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Location']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # past 'rd', fall back to the generic 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Location'] = london_grouped['Location']

for ind in np.arange(london_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(london_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Location,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbey Wood,Fish & Chips Shop,Yoshoku Restaurant,Currywurst Joint,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant
1,Aldgate,Café,Indian Restaurant,Sushi Restaurant,Restaurant,Salad Place,Thai Restaurant,Middle Eastern Restaurant,Italian Restaurant,Japanese Restaurant,Sandwich Place
2,Aldwych,Restaurant,Italian Restaurant,Sandwich Place,Café,Burger Joint,French Restaurant,Indian Restaurant,Sushi Restaurant,Bakery,English Restaurant
3,Anerley,Sandwich Place,Burger Joint,Pizza Place,Dumpling Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,Empanada Restaurant,Eastern European Restaurant,Yoshoku Restaurant
4,Angel,Café,Sushi Restaurant,Restaurant,Italian Restaurant,Sandwich Place,Burger Joint,Burrito Place,English Restaurant,Pizza Place,Mexican Restaurant
5,Archway,Café,Pizza Place,Fast Food Restaurant,Italian Restaurant,Japanese Restaurant,Sandwich Place,Indian Restaurant,Asian Restaurant,Bakery,Vegetarian / Vegan Restaurant
6,Arnos Grove,Steakhouse,Kebab Restaurant,Italian Restaurant,Fish & Chips Shop,Café,French Restaurant,Mediterranean Restaurant,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant
7,Balham,Café,Pizza Place,Bakery,Indian Restaurant,Sandwich Place,Fast Food Restaurant,Fish & Chips Shop,Burger Joint,Portuguese Restaurant,Asian Restaurant
8,Bankside,Café,Italian Restaurant,Indian Restaurant,Asian Restaurant,Portuguese Restaurant,Restaurant,Spanish Restaurant,Seafood Restaurant,Food Truck,Chinese Restaurant
9,Barbican,Café,Italian Restaurant,French Restaurant,Sandwich Place,Steakhouse,Sushi Restaurant,Vietnamese Restaurant,Modern European Restaurant,English Restaurant,Moroccan Restaurant


### 4.3. Clustering Restaurants

Run *k*-means to group the neighborhoods into 5 clusters.

In [51]:
# set number of clusters
kclusters = 5

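# keep only the numeric frequency columns; the keyword form also works on pandas 2.x,
# where the positional axis argument of drop() is no longer accepted
london_grouped_clustering = london_grouped.drop(columns='Location')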

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=4).fit(london_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int32)
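
Since nine of the first ten labels fall into the same cluster, it may be worth checking how many locations each cluster holds. A minimal sketch, using only the ten labels printed above as input; in the notebook one would presumably pass `kmeans.labels_` and `kclusters` instead.

```python
import numpy as np

# The first ten labels from the output above; in the notebook use kmeans.labels_
labels = np.array([4, 2, 2, 2, 2, 2, 2, 2, 2, 2])

# Count how many locations fall into each of the 5 clusters (labels 0..4)
counts = np.bincount(labels, minlength=5)

for cluster, n in enumerate(counts):
    print('Cluster label {}: {} locations'.format(cluster, n))
```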

Let's create a new dataframe that includes the clusters as well as the top 10 venues for each neighborhood.  
Remember that some locations returned no data from the Foursquare API, and we stored them in the `empty_locations` list.  
Therefore we have to exclude them from the resulting dataset.

In [52]:
# work on a copy so the in-place operations below do not modify lon_df
london_merged = lon_df.copy()

# replace every location that returned no venues with NaN
for loc in empty_locations:
    london_merged = london_merged.replace(loc, np.nan)

# then drop all rows whose Location is NaN
london_merged.dropna(subset=['Location'], axis=0, inplace=True)
london_merged.reset_index(drop=True, inplace=True)
print('Now the cluster dataframe has {0} data rows.'.format(london_merged.shape[0]))

# add clustering labels; this positional assignment assumes that london_merged
# has the same row order as london_grouped
london_merged['Cluster Labels'] = kmeans.labels_

# join neighborhoods_venues_sorted onto london_merged to add the top 10 venues for each location
london_merged = london_merged.join(neighborhoods_venues_sorted.set_index('Location'), on='Location')

london_merged.head()

Now the cluster dataframe has 284 data rows.


Unnamed: 0,Location,Borough,PostalTown,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbey Wood,Greenwich,LONDON,51.487621,0.11405,4,Fish & Chips Shop,Yoshoku Restaurant,Currywurst Joint,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant
1,Aldgate,City,LONDON,51.514248,-0.075719,2,Café,Indian Restaurant,Sushi Restaurant,Restaurant,Salad Place,Thai Restaurant,Middle Eastern Restaurant,Italian Restaurant,Japanese Restaurant,Sandwich Place
2,Aldwych,Westminster,LONDON,51.51294,-0.118101,2,Restaurant,Italian Restaurant,Sandwich Place,Café,Burger Joint,French Restaurant,Indian Restaurant,Sushi Restaurant,Bakery,English Restaurant
3,Anerley,Bromley,LONDON,51.412848,-0.065301,2,Sandwich Place,Burger Joint,Pizza Place,Dumpling Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,Empanada Restaurant,Eastern European Restaurant,Yoshoku Restaurant
4,Angel,Islington,LONDON,51.531946,-0.106106,2,Café,Sushi Restaurant,Restaurant,Italian Restaurant,Sandwich Place,Burger Joint,Burrito Place,English Restaurant,Pizza Place,Mexican Restaurant
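
The positional assignment of `kmeans.labels_` only works when both dataframes list the locations in the same order. A minimal sketch of a key-based alternative, which attaches the labels by `Location` instead; all names and values below (`lon_toy`, `grouped_locations`, the town names) are hypothetical stand-ins, not the notebook's data.

```python
import pandas as pd

# Hypothetical stand-ins for london_grouped['Location'] and kmeans.labels_
grouped_locations = ['A Town', 'B Town', 'C Town']
labels = [2, 1, 0]

# Hypothetical stand-in for lon_df, deliberately in a different row order
lon_toy = pd.DataFrame({
    'Location': ['C Town', 'A Town', 'B Town'],
    'Borough': ['Borough Z', 'Borough X', 'Borough Y'],
})

# Pair each label with its location name, then merge on that key
label_frame = pd.DataFrame({'Location': grouped_locations, 'Cluster Labels': labels})
merged = lon_toy.merge(label_frame, on='Location', how='inner')
print(merged)
```

With `how='inner'`, locations without a label drop out of the result, so the merge would also make the separate exclusion step unnecessary in this toy setting.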


## 5. Results

We are now ready to present the results.

### 5.1. Examine Clusters

Let's examine each cluster and the restaurant categories that distinguish it from the others.
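
Besides reading the tables row by row, one could summarize each cluster by its average category frequencies. A minimal sketch under stated assumptions: the frame `toy` is hypothetical and stands in for `london_grouped` with a `Cluster Labels` column added.

```python
import pandas as pd

# Hypothetical frequency table shaped like london_grouped plus cluster labels
toy = pd.DataFrame({
    'Location': ['A', 'B', 'C', 'D'],
    'Cluster Labels': [0, 0, 1, 1],
    'Café': [0.6, 0.4, 0.1, 0.1],
    'Indian Restaurant': [0.1, 0.1, 0.7, 0.5],
    'Pizza Place': [0.3, 0.5, 0.2, 0.4],
})

# Average frequency of every category within each cluster
cluster_means = toy.drop(columns='Location').groupby('Cluster Labels').mean()

# The category with the highest average frequency in each cluster
top_per_cluster = cluster_means.idxmax(axis=1)
print(top_per_cluster)
```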

#### Cluster 1

In [53]:
london_merged.loc[london_merged['Cluster Labels'] == 0, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Ealing,0,Café,Mediterranean Restaurant,French Restaurant,Pizza Place,Middle Eastern Restaurant,Bakery,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,Yoshoku Restaurant
19,Tower Hamlets,0,Café,Fast Food Restaurant,Breakfast Spot,Pizza Place,French Restaurant,Eastern European Restaurant,Restaurant,Sandwich Place,Falafel Restaurant,Korean Restaurant
22,Greenwich,0,Café,Deli / Bodega,Chinese Restaurant,Fast Food Restaurant,Fish & Chips Shop,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant
24,Camden,0,Café,Restaurant,Sandwich Place,Indian Restaurant,Burger Joint,Italian Restaurant,Deli / Bodega,Pizza Place,Turkish Restaurant,French Restaurant
26,Tower Hamlets,0,Burger Joint,Fast Food Restaurant,Café,Bakery,English Restaurant,Filipino Restaurant,Falafel Restaurant,Ethiopian Restaurant,Yoshoku Restaurant,Fish & Chips Shop
28,Barnet,0,Café,Fast Food Restaurant,Pizza Place,Italian Restaurant,Snack Place,Sandwich Place,Sushi Restaurant,Donut Shop,Burger Joint,Portuguese Restaurant
31,Lewisham,0,Chinese Restaurant,Café,Pizza Place,Breakfast Spot,Malay Restaurant,Fried Chicken Joint,Fast Food Restaurant,Fish & Chips Shop,Diner,Dumpling Restaurant
32,Tower Hamlets,0,Burger Joint,Café,Fast Food Restaurant,Yoshoku Restaurant,Fish & Chips Shop,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant
33,"Kensington and Chelsea,Hammersmith and Fulham",0,Café,Pizza Place,Indian Restaurant,Restaurant,Middle Eastern Restaurant,Burger Joint,BBQ Joint,Italian Restaurant,Mediterranean Restaurant,Donut Shop
35,Barnet,0,Turkish Restaurant,Diner,Café,Eastern European Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,Empanada Restaurant,Yoshoku Restaurant


#### Cluster 2

In [55]:
london_merged.loc[london_merged['Cluster Labels'] == 1, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
29,Brent,1,Indian Restaurant,Fast Food Restaurant,Burger Joint,Fish & Chips Shop,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant
47,Greenwich,1,Indian Restaurant,Thai Restaurant,Café,Fish & Chips Shop,Food Truck,Breakfast Spot,Greek Restaurant,Gluten-free Restaurant,Deli / Bodega,Dim Sum Restaurant
57,Barnet,1,Indian Restaurant,Chinese Restaurant,Food Truck,Café,Eastern European Restaurant,Pizza Place,Doner Restaurant,Donut Shop,Fast Food Restaurant,Diner
63,Tower Hamlets,1,Indian Restaurant,Café,Food Truck,Empanada Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,Eastern European Restaurant,Fish & Chips Shop
66,Dartford,1,Indian Restaurant,Café,Fish & Chips Shop,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant
83,Barnet,1,Indian Restaurant,Café,Fast Food Restaurant,Fish & Chips Shop,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant
97,Enfield,1,Indian Restaurant,English Restaurant,Currywurst Joint,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant
114,Lambeth,1,Pizza Place,Café,Fish & Chips Shop,Bakery,Restaurant,English Restaurant,Middle Eastern Restaurant,Deli / Bodega,Doner Restaurant,Donut Shop
152,Westminster,1,Café,Thai Restaurant,Deli / Bodega,Pizza Place,Bakery,Chinese Restaurant,Mediterranean Restaurant,Gastropub,Fish & Chips Shop,Japanese Restaurant
153,Hackney,1,Pizza Place,Café,Fast Food Restaurant,Restaurant,Bagel Shop,Thai Restaurant,Diner,Doner Restaurant,Donut Shop,Dim Sum Restaurant


#### Cluster 3

In [57]:
london_merged.loc[london_merged['Cluster Labels'] == 2, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,City,2,Café,Indian Restaurant,Sushi Restaurant,Restaurant,Salad Place,Thai Restaurant,Middle Eastern Restaurant,Italian Restaurant,Japanese Restaurant,Sandwich Place
2,Westminster,2,Restaurant,Italian Restaurant,Sandwich Place,Café,Burger Joint,French Restaurant,Indian Restaurant,Sushi Restaurant,Bakery,English Restaurant
3,Bromley,2,Sandwich Place,Burger Joint,Pizza Place,Dumpling Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,Empanada Restaurant,Eastern European Restaurant,Yoshoku Restaurant
4,Islington,2,Café,Sushi Restaurant,Restaurant,Italian Restaurant,Sandwich Place,Burger Joint,Burrito Place,English Restaurant,Pizza Place,Mexican Restaurant
5,Islington,2,Café,Pizza Place,Fast Food Restaurant,Italian Restaurant,Japanese Restaurant,Sandwich Place,Indian Restaurant,Asian Restaurant,Bakery,Vegetarian / Vegan Restaurant
6,Enfield,2,Steakhouse,Kebab Restaurant,Italian Restaurant,Fish & Chips Shop,Café,French Restaurant,Mediterranean Restaurant,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant
7,Wandsworth,2,Café,Pizza Place,Bakery,Indian Restaurant,Sandwich Place,Fast Food Restaurant,Fish & Chips Shop,Burger Joint,Portuguese Restaurant,Asian Restaurant
8,Southwark,2,Café,Italian Restaurant,Indian Restaurant,Asian Restaurant,Portuguese Restaurant,Restaurant,Spanish Restaurant,Seafood Restaurant,Food Truck,Chinese Restaurant
9,City,2,Café,Italian Restaurant,French Restaurant,Sandwich Place,Steakhouse,Sushi Restaurant,Vietnamese Restaurant,Modern European Restaurant,English Restaurant,Moroccan Restaurant
10,Richmond upon Thames,2,Thai Restaurant,Pizza Place,Breakfast Spot,Café,Restaurant,Bakery,Eastern European Restaurant,Ethiopian Restaurant,English Restaurant,Empanada Restaurant


#### Cluster 4

In [59]:
london_merged.loc[london_merged['Cluster Labels'] == 3, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
108,Camden,3,Café,Bakery,Italian Restaurant,Pizza Place,Japanese Restaurant,Sushi Restaurant,Deli / Bodega,Creperie,English Restaurant,Chinese Restaurant
121,Islington,3,Café,Fast Food Restaurant,Bakery,Chinese Restaurant,Indian Restaurant,Eastern European Restaurant,Ramen Restaurant,Mediterranean Restaurant,Sandwich Place,Ethiopian Restaurant
163,Richmond upon Thames,3,Indian Restaurant,Restaurant,Gastropub,Café,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,Empanada Restaurant,Eastern European Restaurant,Dumpling Restaurant
177,Kensington and Chelsea,3,Italian Restaurant,Bakery,Restaurant,Deli / Bodega,Indian Restaurant,Burger Joint,Gastropub,Japanese Restaurant,Korean Restaurant,Latin American Restaurant
178,Southwark,3,Indian Restaurant,Chinese Restaurant,Café,Bakery,Tapas Restaurant,Thai Restaurant,Pizza Place,Diner,Doner Restaurant,Donut Shop
179,Barnet,3,Café,Fried Chicken Joint,Yoshoku Restaurant,Empanada Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,Eastern European Restaurant,Fish & Chips Shop
204,Tower Hamlets,3,Fast Food Restaurant,Indian Restaurant,Fried Chicken Joint,Burger Joint,Restaurant,Snack Place,Pizza Place,Bakery,Italian Restaurant,Doner Restaurant
234,Southwark,3,Café,Fast Food Restaurant,Pizza Place,American Restaurant,Fish & Chips Shop,Chinese Restaurant,Thai Restaurant,Vietnamese Restaurant,Diner,Dim Sum Restaurant
235,Camden,3,Café,Italian Restaurant,Fast Food Restaurant,Restaurant,Chinese Restaurant,Bakery,Japanese Restaurant,Moroccan Restaurant,Sushi Restaurant,Sandwich Place
283,Greenwich,3,Café,Yoshoku Restaurant,Fish & Chips Shop,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant


#### Cluster 5

In [66]:
london_merged.loc[london_merged['Cluster Labels'] == 4, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Greenwich,4,Fish & Chips Shop,Yoshoku Restaurant,Currywurst Joint,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant
16,Lewisham,4,Pizza Place,Fried Chicken Joint,Turkish Restaurant,Yoshoku Restaurant,Eastern European Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,Empanada Restaurant,Dumpling Restaurant
36,Barnet,4,Fast Food Restaurant,Sushi Restaurant,Pizza Place,Fried Chicken Joint,Noodle House,Sandwich Place,Korean Restaurant,Japanese Restaurant,Breakfast Spot,Italian Restaurant
41,Newham,4,Fast Food Restaurant,Turkish Restaurant,Bakery,African Restaurant,Pizza Place,Breakfast Spot,Café,Fish & Chips Shop,English Restaurant,Yoshoku Restaurant
53,"Lambeth, Wandsworth",4,Fast Food Restaurant,Indian Restaurant,Gastropub,Caucasian Restaurant,Mexican Restaurant,Pizza Place,Breakfast Spot,Fish & Chips Shop,German Restaurant,Gluten-free Restaurant
56,Merton,4,Indian Restaurant,American Restaurant,Fast Food Restaurant,Italian Restaurant,Fish & Chips Shop,Portuguese Restaurant,Café,Middle Eastern Restaurant,Lebanese Restaurant,Turkish Restaurant
59,"Barnet, Brent, Camden",4,Fast Food Restaurant,Café,Breakfast Spot,Afghan Restaurant,Italian Restaurant,Asian Restaurant,Sandwich Place,Bagel Shop,English Restaurant,Falafel Restaurant
70,Brent,4,Fast Food Restaurant,Bakery,Pizza Place,Vegetarian / Vegan Restaurant,Portuguese Restaurant,Restaurant,Thai Restaurant,Ethiopian Restaurant,English Restaurant,Empanada Restaurant
77,Newham,4,Indian Restaurant,Turkish Restaurant,Café,Sandwich Place,Fast Food Restaurant,Thai Restaurant,Chinese Restaurant,Doner Restaurant,Donut Shop,Diner
81,Greenwich,4,Mexican Restaurant,Restaurant,Chinese Restaurant,Pizza Place,Dumpling Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,Empanada Restaurant,Eastern European Restaurant


### 5.2. Visualizing Clusters

Finally, let's visualize the resulting clusters.

In [67]:
# create map
map_clusters = folium.Map(location=[london_lat, london_lon], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

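# add markers to the map
# note: cluster labels are 0-based, so rainbow[cluster-1] gives label 0 the last
# color of the palette and labels 1-4 the first four; the map legend follows this mapping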
for lat, lon, poi, cluster in zip(london_merged['Latitude'], london_merged['Longitude'], london_merged['Location'], london_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster+1), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

**MAP LEGEND**  
Cluster 1 - red dots  
Cluster 2 - purple dots  
Cluster 3 - blue dots  
Cluster 4 - green dots  
Cluster 5 - orange dots  
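
The legend order follows from the `rainbow[cluster-1]` indexing in the code above: the labels start at 0, so label 0 wraps around to the last color. A minimal sketch of that mapping; the color names in `palette` are approximate placeholders for the five hex values that `cm.rainbow` produces, not values taken from the notebook.

```python
# Approximate names for the five rainbow samples, violet end first (an assumption)
palette = ['purple', 'blue', 'green', 'orange', 'red']
kclusters = 5

# Cluster number shown in the popup is label + 1; color index is label - 1,
# so label 0 uses index -1, which in Python means the last element
legend = {label + 1: palette[label - 1] for label in range(kclusters)}

for cluster_number, color in legend.items():
    print('Cluster {} - {} dots'.format(cluster_number, color))
```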

## 6. Discussion

When analyzing the most common restaurant types in each cluster, the stakeholder should prefer the *least* common types as a safer choice: there is no sense in opening the 17th pizzeria on the same street. Of course, a location may contain more than 10 types, and one might object that, by this logic, the stakeholder should pick the very last type in the full list rather than the 10th. But bear in mind that further down the list there may be no demand at all for that type of food, and we would open a restaurant that the location does not need. Interested customers are a must for a successful business. That is why our recommendations stop at the 10th and 9th positions.

Recommendations, based on the description of each cluster:  
**Cluster 1 Locations:** Eastern European or Dumpling Restaurant  
**Cluster 2 Locations:** Empanada or Ethiopian Restaurant  
**Cluster 3 Locations:** Eastern European or Ethiopian Restaurant  
**Cluster 4 Locations:** Eastern European or Dumpling Restaurant  
**Cluster 5 Locations:** Eastern European or Dumpling Restaurant  
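
One way to derive such recommendations is to tally which types appear most often in the 9th and 10th positions within each cluster. A minimal sketch of that tally; the `rows` below are hypothetical examples, not the notebook's results, and the report does not state that the recommendations were computed this way.

```python
from collections import Counter

# Hypothetical rows: (cluster label, 9th most common venue, 10th most common venue)
rows = [
    (0, 'Dumpling Restaurant', 'Eastern European Restaurant'),
    (0, 'Donut Shop', 'Dumpling Restaurant'),
    (0, 'Eastern European Restaurant', 'Dumpling Restaurant'),
    (1, 'Ethiopian Restaurant', 'Empanada Restaurant'),
    (1, 'Empanada Restaurant', 'Ethiopian Restaurant'),
    (1, 'Empanada Restaurant', 'Diner'),
]

# Count, per cluster, how often each type occupies the 9th or 10th position
tail_counts = {}
for cluster, ninth, tenth in rows:
    tail_counts.setdefault(cluster, Counter()).update([ninth, tenth])

# Keep the two most frequent tail types per cluster as candidate recommendations
suggestions = {
    cluster: [name for name, _ in counts.most_common(2)]
    for cluster, counts in tail_counts.items()
}
print(suggestions)
```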

After the type of restaurant is chosen, it is time to select the right place. The map created in 5.2 and its legend show which locations belong to the corresponding cluster.

## 7. Conclusion

In this report we worked out a methodology to determine what the most promising type of restaurant is and where it should be opened.  

We collected information about London boroughs from Wikipedia and mapped them using geospatial libraries. Using the Foursquare API, we collected the top 100 restaurants and their types for each location within a radius of 500 meters from its central point. Then we grouped the collected restaurants by location and took the mean frequency of occurrence of each type to prepare the data for clustering. Finally, we clustered the locations with the k-means algorithm and analyzed the top 10 most common restaurant types in each cluster. We then visualized the clusters on the map, showing the candidate locations for opening the chosen type of restaurant.

This type of analysis can be applied to any city of your choice that has available geospatial information.

This type of analysis can be applied to any type of venue (shopping, clubs, etc.) that is available in the Foursquare database.