# A NEW RESTAURANT IN LONDON

##### 1.DESCRIPTION OF THE PROBLEM AND A DISCUSSION OF THE BACKGROUND

It is well known that London is one of the most multicultural cities in the world. There you can find cuisines from all over the world and try delicious food that you never had the chance to taste before. That is also why opening a new restaurant there can be an extremely challenging task. When choosing a restaurant type and a good spot, an entrepreneur usually relies only on common sense and domain knowledge. Too often, an ill-considered decision leads to poor income and eventual bankruptcy. According to several surveys, up to 40% of such start-ups fail in their very first year. Suppose an investor has enough time and money, as well as the passion, to open the best eating spot in London. What type of restaurant should it be? What would be the best place for it? Is there a better way to answer these questions than guessing?
What if there is a way to cluster city neighborhoods based on their restaurant similarity? What if we can visualize these clusters on a map? What if we could find which type of restaurant is the most and the least popular in each location? Equipped with that knowledge, we might be able to make a smart choice from a huge number of restaurant types and available places.
Let us allow machine learning to get the job done. Using reliable venue data, it can investigate the city neighborhoods and show us dependencies that we are not aware of.


###### Target audience:  
Investors, entrepreneurs, and chefs interested in opening a restaurant in London who need objective advice on what type of restaurant is likely to be more successful and where exactly it should be opened.

###### 2.DESCRIPTION OF THE DATA AND HOW IT WILL BE USED TO SOLVE THE PROBLEM

Step 1. Using a table on https://en.wikipedia.org/wiki/List_of_areas_of_London, collect information about London boroughs and locations, excluding records whose "Post Town" is not London.

Step 2. Use the Geopy and Folium libraries to get the coordinates of every location and plot the geospatial data on a London map.

Step 3. Using the Foursquare API, collect the top 100 restaurants and their categories for each location within a radius of 600 meters (the radius used in the code below).

Step 4. Group the collected restaurants by location and take the mean of the frequency of occurrence of each type, preparing the data for clustering.

Step 5. Cluster the locations with the k-means algorithm and analyze the top 10 most common restaurant types in each cluster.

Step 6. Visualize clusters on the map, thus showing the best locations for opening the chosen restaurant.

###### 3.METHODOLOGY SECTION

Before starting our exploratory data analysis, let's install and import all the dependencies that we will need.

In [1]:
import time # for time delay while working with API

import requests # library to handle requests

import bs4 # library to parse webpages

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe (in pandas >= 1.0 use pd.json_normalize)

# Convert an address into latitude and longitude values
!conda install -c conda-forge geopy --yes 
import geopy.geocoders
from geopy.geocoders import Nominatim

import json # library to handle JSON files

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# k-means for the clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
# Map rendering library
import folium

# regular expressions
import re

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    openssl-1.1.1d             |       h516909a_0         2.1 MB  conda-forge
    certifi-2019.11.28         |           py36_0         149 KB  conda-forge
    ca-certificates-2019.11.28 |       hecc5488_0         145 KB  conda-forge
    geopy-1.20.0               |             py_0          57 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.5 MB

The following NEW packages will be INSTALLED:

    geographiclib:   1.50-py_0         conda-forge
    geopy:           1.20.0-py_0       conda-forge

The following packages will be UPDATED:

    ca-

###### Section 1

In this section we collect all London neighborhoods. We start by writing a web-scraping script that collects neighborhood information from the table on https://en.wikipedia.org/wiki/List_of_areas_of_London, using the following columns: Location, Borough, and Post town.

In [2]:
# Download the webpage
url = 'https://en.wikipedia.org/wiki/List_of_areas_of_London'
res = requests.get(url)
res.raise_for_status()

In [3]:
# Create a beautifulSoup object
london_soup = bs4.BeautifulSoup(res.text, 'html.parser') # name the parser explicitly to avoid a warning

In [4]:
# Selecting all elements inside the corresponding tags
elements = london_soup.select('div table tbody tr td')

In [5]:
for i in range(2, len(elements), 6):
    # row number | location | borough | post town
    print('{0} | {1} | {2} | {3}'.format(i//6 + 1, elements[i].getText(), elements[i+1].getText(), elements[i+2].getText()))
    if elements[i].getText() == 'Yiewsley': # the last location in the table
        break

1 | Abbey Wood | Bexley,  Greenwich [7] | LONDON
2 | Acton | Ealing, Hammersmith and Fulham[8] | LONDON
3 | Addington | Croydon[8] | CROYDON
4 | Addiscombe | Croydon[8] | CROYDON
5 | Albany Park | Bexley | BEXLEY, SIDCUP
6 | Aldborough Hatch | Redbridge[9] | ILFORD
7 | Aldgate | City[10] | LONDON
8 | Aldwych | Westminster[10] | LONDON
9 | Alperton | Brent[11] | WEMBLEY
10 | Anerley | Bromley[11] | LONDON
11 | Angel | Islington[8] | LONDON
12 | Aperfield | Bromley[11] | WESTERHAM
13 | Archway | Islington[12] | LONDON
14 | Ardleigh Green | Havering[12] | HORNCHURCH
15 | Arkley | Barnet[12] | BARNET, LONDON
16 | Arnos Grove | Enfield[12] | LONDON
17 | Balham | Wandsworth[13] | LONDON
18 | Bankside | Southwark[14] | LONDON
19 | Barbican | City[14] | LONDON
20 | Barking | Barking and Dagenham[14] | BARKING
21 | Barkingside | Redbridge[15] | ILFORD
22 | Barnehurst | Bexley[15] | BEXLEYHEATH
23 | Barnes | Richmond upon Thames[15] | LONDON
24 | Barnes Cray | Bexley[16] | DARTFORD
25 | Barnet G

In [6]:
yiewsley_index = (533-1)*6 + 2
elements[yiewsley_index].get_text()

'Yiewsley'

In the previous step we collected 533 rows of data. The last location in the table is 'Yiewsley', and its index in the elements list is (533-1)*6 + 2 = 3194. Let's transform the raw data into a list of lists, skipping every location whose postal town is not exactly 'LONDON'. We also add two zeros to each row as initial geographical coordinates.
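The index arithmetic can be checked with a small sketch. It assumes, as the loops in this notebook do, six cells per table row and the first Location cell at index 2. The helper names and the toy cell values (`'x'`, `'c4'`, and so on) are illustrative placeholders, not part of the notebook or of the scraped table.

```python
CELLS_PER_ROW = 6          # stride used by the loops in this notebook
FIRST_LOCATION_OFFSET = 2  # index of the first Location cell in the flat list


def location_index(row_number):
    """Index of the Location cell for a 1-based table row number."""
    return (row_number - 1) * CELLS_PER_ROW + FIRST_LOCATION_OFFSET


def iter_rows(cells, last_row):
    """Yield (location, borough, post_town) triples from a flat list of cell texts."""
    for row_number in range(1, last_row + 1):
        i = location_index(row_number)
        yield cells[i], cells[i + 1], cells[i + 2]


# Toy flat list: two leading cells, then two rows of six cells each.
cells = ['x', 'y',
         'Abbey Wood', 'Bexley', 'LONDON', 'c4', 'c5', 'c6',
         'Acton', 'Ealing', 'LONDON', 'c4', 'c5', 'c6']

rows = list(iter_rows(cells, last_row=2))
print(location_index(533))
print(rows)
```

Computing the index from the row number keeps the "magic" value 3194 tied to the row count, so the loop bound can be derived rather than typed by hand.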

In [7]:
# Creating a new list of rows
lst = []
for i in range(2, 3195, 6):
    location, borough, postal_town = elements[i].getText(), elements[i+1].getText(), elements[i+2].getText()
    if postal_town != 'LONDON':
        continue
    lst.append([location, borough, postal_town, 0, 0])
lst[25:34]

[['Bloomsbury', 'Camden[29]', 'LONDON', 0, 0],
 ['Bounds Green', 'Haringey[31]', 'LONDON', 0, 0],
 ['Bow', 'Tower Hamlets[31]', 'LONDON', 0, 0],
 ['Bowes Park', 'Haringey[32]', 'LONDON', 0, 0],
 ['Brent Cross', 'Barnet', 'LONDON', 0, 0],
 ['Brent Park', 'Brent', 'LONDON', 0, 0],
 ['Brixton', 'Lambeth[34]', 'LONDON', 0, 0],
 ['Brockley', 'Lewisham[34]', 'LONDON', 0, 0],
 ['Bromley (also Bromley-by-Bow)', 'Tower Hamlets[36]', 'LONDON', 0, 0]]

As we can see, there is some garbage in our data, for example in the last row of the previous output: ['Bromley (also Bromley-by-Bow)', 'Tower Hamlets[36]', 'LONDON', 0, 0].
Let's clean our data by deleting trailing text in parentheses or square brackets using regular expressions.

In [8]:
for i in range(len(lst)):
    loc, bor = lst[i][0], lst[i][1]
    # raw strings keep the backslashes in the pattern from being read as string escapes
    if loc.endswith(')') or loc.endswith(']'):
        lst[i][0] = re.sub(r'(\s?\(.*?\)$)|(\s?\[.*?\]$)', '', loc)
    if bor.endswith(')') or bor.endswith(']'):
        lst[i][1] = re.sub(r'(\s?\(.*?\)$)|(\s?\[.*?\]$)', '', bor)
lst[25:34]

[['Bloomsbury', 'Camden', 'LONDON', 0, 0],
 ['Bounds Green', 'Haringey', 'LONDON', 0, 0],
 ['Bow', 'Tower Hamlets', 'LONDON', 0, 0],
 ['Bowes Park', 'Haringey', 'LONDON', 0, 0],
 ['Brent Cross', 'Barnet', 'LONDON', 0, 0],
 ['Brent Park', 'Brent', 'LONDON', 0, 0],
 ['Brixton', 'Lambeth', 'LONDON', 0, 0],
 ['Brockley', 'Lewisham', 'LONDON', 0, 0],
 ['Bromley', 'Tower Hamlets', 'LONDON', 0, 0]]
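The clean-up pattern can also be tried in isolation. The sketch below applies the same regular expression to a few strings taken from the outputs above; the helper name `strip_trailing_note` is an illustrative assumption and does not appear in the notebook.

```python
import re

# Same pattern as in the notebook, compiled once and written as a raw string.
TRAILING_NOTE = re.compile(r'(\s?\(.*?\)$)|(\s?\[.*?\]$)')


def strip_trailing_note(text):
    """Remove one trailing '(...)' or '[...]' note, plus an optional space before it."""
    return TRAILING_NOTE.sub('', text)


samples = [
    'Bromley (also Bromley-by-Bow)',
    'Tower Hamlets[36]',
    'Bexley,  Greenwich [7]',
    'Barnet',
]
cleaned = [strip_trailing_note(s) for s in samples]
print(cleaned)
```

Strings without a trailing note, such as 'Barnet', pass through unchanged because both alternatives are anchored to the end of the string.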

Now our dataset is clear enough and ready to be transformed into a pandas dataframe.

In [9]:
print('We have {} rows of relevant data.'.format(len(lst)))

We have 299 rows of relevant data.


Let's build a pandas dataframe.

In [10]:
london_df = pd.DataFrame(lst, columns=['Location', 'Borough', 'PostalTown', 'Latitude', 'Longitude'])
london_df.head()

Unnamed: 0,Location,Borough,PostalTown,Latitude,Longitude
0,Abbey Wood,"Bexley, Greenwich",LONDON,0,0
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,0,0
2,Aldgate,City,LONDON,0,0
3,Aldwych,Westminster,LONDON,0,0
4,Anerley,Bromley,LONDON,0,0


Confirm the size:

In [11]:
london_df.shape

(299, 5)

###### Section 2

In this section we add geographical coordinates (latitude, longitude) to our pandas dataframe so that we can query the Foursquare location data next. We will use the geopy library for that purpose. Let's try it with the first address, which is Abbey Wood, Bexley, Greenwich, London.

In [12]:
# Getting the address string
address = ', '.join(list(london_df.iloc[0, :3]))
address

'Abbey Wood, Bexley,  Greenwich, LONDON'

In [13]:
# Using geopy
geolocator = Nominatim(user_agent='opening_restaurant_london')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of {0} are {1}, {2}.'.format(address, latitude, longitude))

The geographical coordinates of Abbey Wood, Bexley,  Greenwich, LONDON are 51.4855716, 0.119686820271318.


In [14]:
# Make changes to the dataframe
london_df.iloc[0,3] = latitude
london_df.iloc[0,4] = longitude
london_df.head()

Unnamed: 0,Location,Borough,PostalTown,Latitude,Longitude
0,Abbey Wood,"Bexley, Greenwich",LONDON,51.485572,0.119687
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,0.0,0.0
2,Aldgate,City,LONDON,0.0,0.0
3,Aldwych,Westminster,LONDON,0.0,0.0
4,Anerley,Bromley,LONDON,0.0,0.0


Now we are ready to apply a for loop to go through all addresses in the dataframe and get the corresponding coordinates.
Attention: due to various API restrictions, the following script takes several minutes to complete the task.

In [15]:
# Create the geocoder once and reuse it for every address
geolocator = Nominatim(user_agent='opening_restaurant_london')
for i in range(len(london_df)):
    address = ', '.join(list(london_df.iloc[i, :3]))
    location = geolocator.geocode(address, timeout=1000)
    time.sleep(1) # pause between requests to respect the geocoding service's rate limit
    if location is None: # address not found, keep the zeros
        continue
    london_df.iloc[i, 3] = location.latitude
    london_df.iloc[i, 4] = location.longitude

london_df.head()

Unnamed: 0,Location,Borough,PostalTown,Latitude,Longitude
0,Abbey Wood,"Bexley, Greenwich",LONDON,51.485572,0.119687
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,0.0,0.0
2,Aldgate,City,LONDON,51.514248,-0.075719
3,Aldwych,Westminster,LONDON,51.512625,-0.118568
4,Anerley,Bromley,LONDON,51.407599,-0.061939
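The pacing of such a loop can be sketched without any network access. The example below is a minimal illustration, not the notebook's method: `geocode_all` and `fake_geocode` are hypothetical names, and the stand-in lookup only knows one address taken from the output above. geopy also provides `geopy.extra.rate_limiter.RateLimiter`, which may be a more convenient way to get the same effect with a real geocoder.

```python
import time


def geocode_all(addresses, geocode, min_delay_seconds=1.0, sleep=time.sleep):
    """Look up each address once, pausing between consecutive requests."""
    coordinates = {}
    for n, address in enumerate(addresses):
        if n > 0:
            sleep(min_delay_seconds)  # throttle every request after the first
        coordinates[address] = geocode(address)  # None marks a failed lookup
    return coordinates


# Stand-in for a real geocoder so that the sketch runs offline.
KNOWN = {'Aldgate, City, LONDON': (51.514248, -0.075719)}


def fake_geocode(address):
    return KNOWN.get(address)


pauses = []  # records the requested delays instead of actually sleeping
result = geocode_all(
    ['Aldgate, City, LONDON', 'Acton, Ealing, Hammersmith and Fulham, LONDON'],
    fake_geocode,
    sleep=pauses.append,
)
print(result)
print(pauses)
```

Passing the `sleep` function as a parameter makes the delay easy to test; with the default `time.sleep`, roughly 300 addresses at one second each take about five minutes.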


The next step is to drop rows that still contain 0 as a latitude or longitude.

In [16]:
# Check initial shape
london_df.shape

(299, 5)

In [17]:
# Substitute all zeros by NAN
london_df = london_df.replace(0, np.nan)

# Drop all rows containing NAN
london_df.dropna(subset=['Latitude', 'Longitude'], axis=0, inplace=True)
london_df.reset_index(drop=True, inplace=True)
print('Now the London dataframe has {0} data rows.'.format(london_df.shape[0]))

Now the London dataframe has 290 data rows.
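Note that `replace(0, np.nan)` is applied to the whole dataframe. A slightly narrower variant, sketched here on three toy rows taken from the outputs above, touches only the coordinate columns. This is an optional alternative under the assumption that zeros in other columns should be left alone; it is not what the notebook runs.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Location': ['Abbey Wood', 'Acton', 'Aldgate'],
    'Latitude': [51.485572, 0.0, 51.514248],
    'Longitude': [0.119687, 0.0, -0.075719],
})

coords = ['Latitude', 'Longitude']
# Treat a zero coordinate as "not geocoded", but only in the coordinate columns.
toy[coords] = toy[coords].replace(0, np.nan)
cleaned = toy.dropna(subset=coords).reset_index(drop=True)
print(cleaned)
```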


Check whether any location names are duplicated.

In [18]:
len(london_df['Location'].unique())

288

In [19]:
# Printing these locations
for i in range(len(london_df)):
    loc = london_df.iloc[i,0]
    for j in range(i+1, len(london_df)):
        if london_df.iloc[j,0] == loc:
            print(j, loc)

53 Church End
102 Grove Park


Let's simplify things and drop the duplicated locations, keeping the first occurrence of each.

In [20]:
london_df.drop_duplicates(subset='Location', keep='first', inplace=True)
if london_df['Location'].unique().shape[0] == london_df.shape[0]:
    print('Duplicates were removed successfully.')

Duplicates were removed successfully.


Confirm the new size.

In [21]:
london_df.shape

(288, 5)

So 288 London neighborhoods are ready to be shown on a map.
We will use the folium library for this purpose.

In [22]:
# Get the London "central" point
london_address = 'London, England'
geolocator = Nominatim(user_agent='opening_restaurant_london')
location = geolocator.geocode(london_address)
london_lat = location.latitude
london_lon = location.longitude
print('The geographical coordinates of {0} are {1}, {2}.'.format(london_address, london_lat, london_lon))

The geographical coordinates of London, England are 51.5073219, -0.1276474.


In [23]:
# create map of London using starting point coordinates
london_map = folium.Map(location=[london_lat, london_lon], zoom_start=11)

# add markers to map
for lat, lng, bor, loc in zip(london_df['Latitude'], london_df['Longitude'], london_df['Borough'], london_df['Location']):
    label = '{}, {}'.format(loc, bor)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        ).add_to(london_map)
    
london_map

###### 4.EXPLORING LONDON RESTAURANTS

Now, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

###### Section 1-Collecting Restaurant Data 

Let's explore a single neighborhood in our dataframe first, the one at index 16.

In [24]:
london_df.loc[16, 'Location']

'Bellingham'

Get the neighborhood's latitude and longitude values.

In [25]:
loc_latitude = london_df.loc[16, 'Latitude'] # neighborhood latitude value
loc_longitude = london_df.loc[16, 'Longitude'] # neighborhood longitude value

loc_name = london_df.loc[16, 'Location'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(loc_name, 
                                                               loc_latitude, 
                                                               loc_longitude))

Latitude and longitude values of Bellingham are 51.4310809, -0.0245145.


Now, let's get the top 100 venues that are in Bellingham within a radius of 600 meters.

In [26]:
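radius = 600 # search radius in meters
LIMIT = 100 # maximum number of venues returned per request
CLIENT_ID = 'YOUR_CLIENT_ID' # replace with your own Foursquare client id
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # replace with your own Foursquare client secret (never publish it)
VERSION = '20180605' # Foursquare API version
url = 'https://api.foursquare.com/v2/venues/explore?client_id={0}&client_secret={1}&ll={2},{3}&v={4}&radius={5}&limit={6}&query=restaurant'.format(CLIENT_ID, CLIENT_SECRET, loc_latitude, loc_longitude, VERSION, radius, LIMIT)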
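As a hedged alternative to formatting the URL by hand, the query string can be assembled from a dictionary with the standard library. In this sketch the function name `build_explore_url` and the environment variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions made for illustration; the endpoint and parameter names are the ones used in the cell above.

```python
import os
from urllib.parse import parse_qs, urlencode, urlsplit

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(lat, lng, radius=600, limit=100, version='20180605',
                      client_id=None, client_secret=None):
    """Build the venue search URL; credentials fall back to environment variables."""
    params = {
        # Hypothetical variable names; any secure configuration source works.
        'client_id': client_id or os.environ.get('FOURSQUARE_CLIENT_ID', ''),
        'client_secret': client_secret or os.environ.get('FOURSQUARE_CLIENT_SECRET', ''),
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
        'query': 'restaurant',
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


url = build_explore_url(51.4310809, -0.0245145,
                        client_id='demo-id', client_secret='demo-secret')
query = parse_qs(urlsplit(url).query)  # parse it back to inspect the parameters
print(query)
```

Keeping credentials out of the notebook source means the file can be shared without exposing them, and `urlencode` takes care of escaping each value.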

In [27]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e0b3f8b9fcb92001bbbe531'},
 'response': {'headerLocation': 'Bellingham',
  'headerFullLocation': 'Bellingham, London',
  'headerLocationGranularity': 'neighborhood',
  'query': 'restaurant',
  'totalResults': 7,
  'suggestedBounds': {'ne': {'lat': 51.4364809054,
    'lng': -0.015869259932335608},
   'sw': {'lat': 51.42568089459999, 'lng': -0.033159740067664395}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '531f3dbf498e08abe4af40ae',
       'name': 'Rhubarb and Custard',
       'location': {'lat': 51.432321,
        'lng': -0.019559,
        'labeledLatLngs': [{'label': 'display',
          'lat': 51.432321,
          'lng': -0.019559}],
        'distance': 370,
        'cc': 'GB',
        'country': 'United Kingdom',
        'for

In [28]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError: # flattened JSON uses the 'venue.' prefix
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [29]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Rhubarb and Custard,Café,51.432321,-0.019559
1,Twins Cafe,Café,51.432912,-0.018629
2,Jerklan Grill,Caribbean Restaurant,51.433042,-0.018148
3,Morley's,Fried Chicken Joint,51.432677,-0.017359
4,Ayten's cafe,Café,51.427317,-0.029772


In [30]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

7 venues were returned by Foursquare.


Let's create a function that repeats the same process for all the neighborhoods in London.

In [39]:
def getNearbyVenues(names, latitudes, longitudes, radius=600):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={0}&client_secret={1}&v={2}&ll={3},{4}&radius={5}&limit={6}&query=restaurant'.format(
            'YOUR_CLIENT_ID', # client id: replace with your own
            'YOUR_CLIENT_SECRET', # client secret: replace with your own and never publish it
            '20180605', #version
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            #v['venue']['location']['lat'], 
            #v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Location', 
                  'Latitude', 
                  'Longitude', 
                  'Venue', 
                  #'Venue Latitude', 
                  #'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
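The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, which would raise an `IndexError` for a venue whose category list is empty (the case that `get_category_type` handles by returning `None`). A guarded extraction could look like the sketch below; `first_category_name` is a hypothetical helper, and the second toy venue is made up to show the empty case.

```python
def first_category_name(venue):
    """Return the name of the venue's first category, or None if it has none."""
    categories = venue.get('categories') or []
    return categories[0]['name'] if categories else None


# Toy items shaped like the 'items' list in the API response shown earlier.
items = [
    {'venue': {'name': 'Rhubarb and Custard', 'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Uncategorised venue', 'categories': []}},
]

rows = [(item['venue']['name'], first_category_name(item['venue'])) for item in items]
print(rows)
```

Rows with a `None` category can then be dropped or labelled explicitly before one-hot encoding.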

Now we run the above function on each neighborhood and create a new dataframe called london_venues.

In [40]:
london_venues = getNearbyVenues(names=london_df['Location'],
                                   latitudes=london_df['Latitude'],
                                   longitudes=london_df['Longitude']
                                  )

Let's check the size of the resulting dataframe.

In [41]:
print(london_venues.shape)

(7801, 5)


In [42]:
london_venues.head()

Unnamed: 0,Location,Latitude,Longitude,Venue,Venue Category
0,Abbey Wood,51.485572,0.119687,Greggs,Bakery
1,Abbey Wood,51.485572,0.119687,Abbey Cafe,Café
2,Abbey Wood,51.485572,0.119687,The Crafty Cafe by Sharon,Café
3,Abbey Wood,51.485572,0.119687,Frank's Fish Bar,Fish & Chips Shop
4,Aldgate,51.514248,-0.075719,Benk + Bo,Bakery


Let's check how many restaurants were returned for each neighborhood.

In [43]:
london_venues[['Location', 'Venue']].groupby('Location').count()

Unnamed: 0_level_0,Venue
Location,Unnamed: 1_level_1
Abbey Wood,4
Aldgate,100
Aldwych,100
Anerley,6
Angel,75
Archway,27
Arnos Grove,4
Balham,37
Bankside,92
Barbican,94


In [44]:
x = london_venues[['Location', 'Venue']].groupby('Location').count().shape[0]
y = london_df.shape[0]
empty_locations = []
if x != y:
    print('Missing data for {0} locations:'.format(y-x))
    # And print them
    for i in range(london_df.shape[0]):
        loc = london_df.iloc[i,0]
        k = 0
        for j in range(london_venues.shape[0]):
            if loc == london_venues.iloc[j,0]:
                k += 1
        if k == 0:
            print(i,loc)
            empty_locations.append(loc)

Missing data for 4 locations:
50 Chinbrook
61 Crossness
104 Hackney Marshes
163 Mill Hill
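The nested loops above compare every location with every venue row. A set lookup gives the same list with far fewer comparisons, as in this sketch on toy data; the function name is an illustrative assumption, and in the notebook the two arguments would be `london_df['Location']` and `london_venues['Location']`.

```python
def locations_without_venues(all_locations, venue_locations):
    """Return (position, name) pairs for locations that have no venue rows."""
    seen = set(venue_locations)  # membership tests on a set are constant time on average
    return [(i, loc) for i, loc in enumerate(all_locations) if loc not in seen]


all_locations = ['Abbey Wood', 'Chinbrook', 'Aldgate', 'Crossness']
venue_locations = ['Abbey Wood', 'Abbey Wood', 'Aldgate']

missing = locations_without_venues(all_locations, venue_locations)
print('Missing data for {0} locations:'.format(len(missing)))
for position, name in missing:
    print(position, name)
```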


Let's find out how many unique categories appear among all the returned restaurants.

In [45]:
print('There are {0} unique categories.'.format(len(london_venues['Venue Category'].unique())))

There are 130 unique categories.


###### Section 2-Exploring Restaurants

To begin the analysis, we need to transform the collected information using the one-hot encoding method.

In [46]:
# one hot encoding
london_onehot = pd.get_dummies(london_venues[['Venue Category']], prefix="", prefix_sep="")

# add location column back to dataframe
london_onehot['Location'] = london_venues['Location'] 

# move location column to the first column
fixed_columns = [london_onehot.columns[-1]] + list(london_onehot.columns[:-1])
london_onehot = london_onehot[fixed_columns]

london_onehot.head()

Unnamed: 0,Location,Afghan Restaurant,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,Austrian Restaurant,BBQ Joint,Bagel Shop,Bakery,Belgian Restaurant,Bistro,Brasserie,Brazilian Restaurant,Breakfast Spot,Buffet,Burger Joint,Burrito Place,Cafeteria,Café,Cajun / Creole Restaurant,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Chaat Place,Chinese Restaurant,Churrascaria,Cigkofte Place,Colombian Restaurant,Creperie,Cuban Restaurant,Currywurst Joint,Czech Restaurant,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Food,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Halal Restaurant,Himalayan Restaurant,Hot Dog Joint,Hunan Restaurant,Indian Restaurant,Indonesian Restaurant,Iraqi Restaurant,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Korean Restaurant,Kosher Restaurant,Latin American Restaurant,Lebanese Restaurant,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mineiro Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,New American Restaurant,Noodle House,North Indian Restaurant,Okonomiyaki Restaurant,Paella Restaurant,Pakistani Restaurant,Persian Restaurant,Peruvian Restaurant,Pizza Place,Poke Place,Polish Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Russian Restaurant,Salad Place,Sandwich Place,Scandinavian Restaurant,Scottish Restaurant,Seafood Restaurant,Shabu-Shabu Restaurant,Snack Place,Soup Place,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spanish Restaurant,Sri Lankan Restaurant,Steakhouse,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Turkish Restaurant,Udon Restaurant,Ukrainian Restaurant,Vegetarian / Vegan Restaurant,Veneto Restaurant,Vietnamese Restaurant,Wings Joint,Xinjiang Restaurant,Yakitori Restaurant,Yoshoku Restaurant
0,Abbey Wood,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Abbey Wood,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Abbey Wood,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Abbey Wood,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Aldgate,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [47]:
london_onehot.shape

(7801, 131)

Next, let's group the rows by neighborhood and take the mean of the frequency of occurrence of each category, which prepares the dataframe for clustering.

In [48]:
london_grouped = london_onehot.groupby('Location').mean().reset_index()
london_grouped

Unnamed: 0,Location,Afghan Restaurant,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,Austrian Restaurant,BBQ Joint,Bagel Shop,Bakery,Belgian Restaurant,Bistro,Brasserie,Brazilian Restaurant,Breakfast Spot,Buffet,Burger Joint,Burrito Place,Cafeteria,Café,Cajun / Creole Restaurant,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Chaat Place,Chinese Restaurant,Churrascaria,Cigkofte Place,Colombian Restaurant,Creperie,Cuban Restaurant,Currywurst Joint,Czech Restaurant,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Food,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Halal Restaurant,Himalayan Restaurant,Hot Dog Joint,Hunan Restaurant,Indian Restaurant,Indonesian Restaurant,Iraqi Restaurant,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Korean Restaurant,Kosher Restaurant,Latin American Restaurant,Lebanese Restaurant,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mineiro Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,New American Restaurant,Noodle House,North Indian Restaurant,Okonomiyaki Restaurant,Paella Restaurant,Pakistani Restaurant,Persian Restaurant,Peruvian Restaurant,Pizza Place,Poke Place,Polish Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Russian Restaurant,Salad Place,Sandwich Place,Scandinavian Restaurant,Scottish Restaurant,Seafood Restaurant,Shabu-Shabu Restaurant,Snack Place,Soup Place,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spanish Restaurant,Sri Lankan Restaurant,Steakhouse,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Turkish Restaurant,Udon Restaurant,Ukrainian Restaurant,Vegetarian / Vegan Restaurant,Veneto Restaurant,Vietnamese Restaurant,Wings Joint,Xinjiang Restaurant,Yakitori Restaurant,Yoshoku Restaurant
0,Abbey Wood,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Aldgate,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.05,0.0,0.04,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.03,0.01,0.04,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.01,0.0,0.0,0.0,0.08,0.0,0.05,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.01,0.0,0.01,0.04,0.0,0.03,0.0,0.0,0.02,0.0,0.02,0.01,0.0,0.0,0.0
2,Aldwych,0.0,0.0,0.03,0.0,0.01,0.02,0.0,0.0,0.01,0.0,0.06,0.01,0.0,0.01,0.0,0.02,0.0,0.07,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.01,0.0,0.05,0.0,0.03,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.01,0.11,0.0,0.0,0.03,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.03,0.06,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0
3,Anerley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Angel,0.013333,0.0,0.0,0.0,0.0,0.013333,0.0,0.013333,0.0,0.0,0.013333,0.0,0.0,0.0,0.013333,0.013333,0.0,0.04,0.04,0.0,0.133333,0.013333,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.04,0.013333,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.04,0.013333,0.0,0.0,0.0,0.066667,0.0,0.026667,0.0,0.0,0.026667,0.0,0.0,0.0,0.0,0.026667,0.026667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026667,0.0,0.0,0.0,0.013333,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.013333,0.08,0.0,0.0,0.013333,0.013333,0.0,0.013333,0.0,0.0,0.026667,0.0,0.026667,0.0,0.0,0.0,0.0
5,Archway,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.259259,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.074074,0.0,0.074074,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.074074,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.074074,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0
6,Arnos Grove,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Balham,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.081081,0.0,0.0,0.0,0.0,0.054054,0.0,0.054054,0.0,0.0,0.162162,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.054054,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.081081,0.0,0.0,0.0,0.0,0.054054,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.135135,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Bankside,0.0,0.0,0.0,0.0,0.0,0.043478,0.01087,0.0,0.01087,0.0,0.043478,0.0,0.01087,0.0,0.0,0.021739,0.0,0.043478,0.021739,0.0,0.054348,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.032609,0.0,0.0,0.032609,0.0,0.0,0.0,0.0,0.01087,0.032609,0.01087,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.054348,0.0,0.0,0.01087,0.0,0.086957,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.032609,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.021739,0.021739,0.054348,0.0,0.01087,0.01087,0.01087,0.0,0.054348,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032609,0.0,0.021739,0.0,0.0,0.01087,0.021739,0.0,0.0,0.01087,0.01087,0.0,0.01087,0.0,0.021739,0.0,0.0,0.0,0.0
9,Barbican,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.010638,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.010638,0.031915,0.0,0.117021,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.021277,0.021277,0.0,0.010638,0.0,0.0,0.010638,0.06383,0.085106,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.148936,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.021277,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.010638,0.0,0.021277,0.06383,0.0,0.0,0.0,0.0,0.0,0.010638,0.010638,0.0,0.0,0.0,0.010638,0.0,0.010638,0.06383,0.0,0.010638,0.0,0.0,0.0,0.021277,0.0,0.0,0.010638,0.0,0.053191,0.0,0.0,0.0,0.0


Let's confirm the new size.

In [49]:
london_grouped.shape

(284, 131)

Let's investigate each neighborhood along with its most common venues.

In [50]:
# Function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
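To see what the helper returns, here is a minimal sketch on a single made-up row. The location name, category names and frequencies are illustrative assumptions, not values from the Foursquare data.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the location name) and rank the remaining frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical row: a location name followed by per-category frequencies
toy_row = pd.Series(
    {"Location": "Toyville", "Bakery": 0.1, "Café": 0.6, "Pizza Place": 0.3}
)

top_two = list(return_most_common_venues(toy_row, 2))
print(top_two)
```

The function returns category names only, so the frequencies themselves have to be looked up in the grouped table if they are needed later.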

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [51]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Location']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the 3rd use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Location'] = london_grouped['Location']

for ind in np.arange(london_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(london_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Location,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbey Wood,Café,Fish & Chips Shop,Bakery,English Restaurant,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,Yoshoku Restaurant,Food
1,Aldgate,Restaurant,Café,Indian Restaurant,Italian Restaurant,Salad Place,French Restaurant,Middle Eastern Restaurant,Japanese Restaurant,Pizza Place,Thai Restaurant
2,Aldwych,Restaurant,French Restaurant,Burger Joint,Sushi Restaurant,Bakery,Italian Restaurant,Indian Restaurant,Pizza Place,Café,Deli / Bodega
3,Anerley,Café,Gastropub,Fast Food Restaurant,Chinese Restaurant,Fish & Chips Shop,Yoshoku Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,Eastern European Restaurant
4,Angel,Café,Sushi Restaurant,Italian Restaurant,Burrito Place,Burger Joint,French Restaurant,Sandwich Place,Restaurant,Indian Restaurant,Mexican Restaurant
5,Archway,Café,Indian Restaurant,Fast Food Restaurant,Italian Restaurant,Sandwich Place,Japanese Restaurant,Pizza Place,Asian Restaurant,Kebab Restaurant,Seafood Restaurant
6,Arnos Grove,Indian Restaurant,Café,Chinese Restaurant,Fish & Chips Shop,Yoshoku Restaurant,Empanada Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant
7,Balham,Café,Pizza Place,Bakery,Indian Restaurant,Burger Joint,Fish & Chips Shop,Sandwich Place,Breakfast Spot,Restaurant,Fast Food Restaurant
8,Bankside,Italian Restaurant,Seafood Restaurant,Café,Restaurant,Indian Restaurant,Burger Joint,Asian Restaurant,Bakery,Spanish Restaurant,Modern European Restaurant
9,Barbican,Italian Restaurant,Café,French Restaurant,Food Truck,Sandwich Place,Sushi Restaurant,Vietnamese Restaurant,Burrito Place,Gastropub,English Restaurant
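Note that a top-10 list is always filled, even when a location has fewer than 10 observed categories. In the frequency table above, Anerley has only five non-zero values and Arnos Grove only four, so their remaining ranks are categories with a frequency of 0.0. A quick way to check how many ranks are backed by real venues is to count the non-zero frequencies per location. The sketch below does this on a toy stand-in for london_grouped; the column names and values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for london_grouped (hypothetical categories and values)
toy_grouped = pd.DataFrame(
    {
        "Location": ["A", "B"],
        "Café": [0.5, 0.25],
        "Bakery": [0.5, 0.25],
        "Diner": [0.0, 0.25],
        "Gastropub": [0.0, 0.25],
    }
)

# Number of categories actually observed (frequency > 0) in each location
nonzero_counts = (toy_grouped.drop(columns="Location") > 0).sum(axis=1)
nonzero_counts.index = toy_grouped["Location"]
print(nonzero_counts.to_dict())
```

Ranks beyond this count carry no information about local popularity and should be read with care.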


###### Section 3-Clustering Restaurants

In [52]:
# set number of clusters
kclusters = 5

london_grouped_clustering = london_grouped.drop('Location', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=4).fit(london_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 0, 0, 1, 0, 4, 4, 0, 0, 0], dtype=int32)
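The value kclusters = 5 is set by hand. One common way to sanity-check such a choice is to compare silhouette scores for several values of k. The sketch below is only an illustration under stated assumptions: it runs on synthetic blobs as a stand-in for london_grouped_clustering, so the numbers it prints say nothing about the London data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Synthetic stand-in for london_grouped_clustering: three well-separated blobs
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((40, 2)) for c in centers])

# Fit k-means for several k and record the mean silhouette score of each fit
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=4, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k, {k: round(v, 3) for k, v in scores.items()})
```

On the real data the same loop could be run over london_grouped_clustering, with the caveat that a high score alone does not guarantee clusters that are useful for the business question.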

Let's create a new dataframe that includes the clusters as well as the top 10 venues for each neighborhood.
Do not forget that some locations returned no data from the Foursquare API, and we collected them in the empty_locations list.
Therefore we have to exclude them from the resulting dataset.

In [54]:
london_merged = london_df.copy()  # work on a copy so that london_df itself is not modified

# Substitute all empty locations by NAN
for loc in empty_locations:
    london_merged = london_merged.replace(loc, np.nan)

# then drop all rows containing NAN
london_merged.dropna(subset=['Location'], axis=0, inplace=True)
london_merged.reset_index(drop=True, inplace=True)
print('Now the cluster dataframe has {0} data rows.'.format(london_merged.shape[0]))

# add clustering labels (positional: assumes the rows are in the same order as london_grouped)
london_merged['Cluster Labels'] = kmeans.labels_

# join the top-venue table onto the location data, which already holds latitude/longitude
london_merged = london_merged.join(neighborhoods_venues_sorted.set_index('Location'), on='Location')

london_merged.head()

Now the cluster dataframe has 284 data rows.


Unnamed: 0,Location,Borough,PostalTown,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbey Wood,"Bexley, Greenwich",LONDON,51.485572,0.119687,1,Café,Fish & Chips Shop,Bakery,English Restaurant,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,Yoshoku Restaurant,Food
1,Aldgate,City,LONDON,51.514248,-0.075719,0,Restaurant,Café,Indian Restaurant,Italian Restaurant,Salad Place,French Restaurant,Middle Eastern Restaurant,Japanese Restaurant,Pizza Place,Thai Restaurant
2,Aldwych,Westminster,LONDON,51.512625,-0.118568,0,Restaurant,French Restaurant,Burger Joint,Sushi Restaurant,Bakery,Italian Restaurant,Indian Restaurant,Pizza Place,Café,Deli / Bodega
3,Anerley,Bromley,LONDON,51.407599,-0.061939,1,Café,Gastropub,Fast Food Restaurant,Chinese Restaurant,Fish & Chips Shop,Yoshoku Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,Eastern European Restaurant
4,Angel,Islington,LONDON,51.531946,-0.106106,0,Café,Sushi Restaurant,Italian Restaurant,Burrito Place,Burger Joint,French Restaurant,Sandwich Place,Restaurant,Indian Restaurant,Mexican Restaurant
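Assigning kmeans.labels_ directly to a column works only while both tables list the locations in the same order. A more defensive variant is to pair each label with its location name first and then merge on that key. The sketch below shows the idea on a few toy rows; the variable names toy_df, grouped_locations and label_table are hypothetical stand-ins for london_df, london_grouped['Location'] and the label table.

```python
import pandas as pd

# Toy stand-in for london_df, deliberately not in alphabetical order
toy_df = pd.DataFrame(
    {
        "Location": ["Balham", "Angel", "Archway"],
        "Borough": ["Wandsworth", "Islington", "Islington"],
    }
)

# Toy stand-ins for london_grouped['Location'] (groupby order) and kmeans.labels_
grouped_locations = pd.Series(["Angel", "Archway", "Balham"], name="Location")
labels = [0, 4, 0]

# Pair each label with its location, then merge on the key instead of on position
label_table = pd.DataFrame({"Location": grouped_locations, "Cluster Labels": labels})
merged = toy_df.merge(label_table, on="Location", how="inner")
print(merged)
```

With how="inner", locations that have no label (for example those without Foursquare data) drop out of the result automatically.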


###### 5.RESULTS

Now we are ready to conclude our report.

###### Section 1-Examine Clusters

We will examine each cluster and the restaurant categories that distinguish it.

###### Cluster 1

In [63]:
london_merged.loc[london_merged['Cluster Labels'] == 0, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,City,0,Restaurant,Café,Indian Restaurant,Italian Restaurant,Salad Place,French Restaurant,Middle Eastern Restaurant,Japanese Restaurant,Pizza Place,Thai Restaurant
2,Westminster,0,Restaurant,French Restaurant,Burger Joint,Sushi Restaurant,Bakery,Italian Restaurant,Indian Restaurant,Pizza Place,Café,Deli / Bodega
4,Islington,0,Café,Sushi Restaurant,Italian Restaurant,Burrito Place,Burger Joint,French Restaurant,Sandwich Place,Restaurant,Indian Restaurant,Mexican Restaurant
7,Wandsworth,0,Café,Pizza Place,Bakery,Indian Restaurant,Burger Joint,Fish & Chips Shop,Sandwich Place,Breakfast Spot,Restaurant,Fast Food Restaurant
8,Southwark,0,Italian Restaurant,Seafood Restaurant,Café,Restaurant,Indian Restaurant,Burger Joint,Asian Restaurant,Bakery,Spanish Restaurant,Modern European Restaurant
9,City,0,Italian Restaurant,Café,French Restaurant,Food Truck,Sandwich Place,Sushi Restaurant,Vietnamese Restaurant,Burrito Place,Gastropub,English Restaurant
10,Richmond upon Thames,0,Pizza Place,Gastropub,Breakfast Spot,French Restaurant,Bakery,Thai Restaurant,Café,Italian Restaurant,Restaurant,Doner Restaurant
12,Wandsworth,0,Italian Restaurant,Bakery,Thai Restaurant,Café,Japanese Restaurant,Chinese Restaurant,Restaurant,Lebanese Restaurant,Sushi Restaurant,Bistro
13,Westminster,0,Café,Italian Restaurant,Chinese Restaurant,Restaurant,Pizza Place,Indian Restaurant,Greek Restaurant,Sandwich Place,Bakery,Malay Restaurant
14,Ealing,0,Pizza Place,Café,Bakery,Middle Eastern Restaurant,Japanese Restaurant,Fast Food Restaurant,Burger Joint,Modern European Restaurant,Brasserie,Sushi Restaurant


In [69]:
cluster_1 = london_merged.loc[london_merged['Cluster Labels'] == 0, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]
cluster_1.describe(include='all')

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,134,134.0,134,134,134,134,134,134,134,134,134,134
unique,33,,20,27,36,37,49,46,50,49,45,52
top,Tower Hamlets,,Café,Café,Restaurant,Café,Italian Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,Dumpling Restaurant
freq,12,,40,27,12,18,9,10,12,11,12,9
mean,,0.0,,,,,,,,,,
std,,0.0,,,,,,,,,,
min,,0.0,,,,,,,,,,
25%,,0.0,,,,,,,,,,
50%,,0.0,,,,,,,,,,
75%,,0.0,,,,,,,,,,


###### Cluster 2

In [64]:
london_merged.loc[london_merged['Cluster Labels'] == 1, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Bexley, Greenwich",1,Café,Fish & Chips Shop,Bakery,English Restaurant,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,Yoshoku Restaurant,Food
3,Bromley,1,Café,Gastropub,Fast Food Restaurant,Chinese Restaurant,Fish & Chips Shop,Yoshoku Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,Eastern European Restaurant
16,Lewisham,1,Café,Fast Food Restaurant,Caribbean Restaurant,Pizza Place,Fried Chicken Joint,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,Empanada Restaurant,Yoshoku Restaurant
22,Greenwich,1,Café,Deli / Bodega,Chinese Restaurant,Fast Food Restaurant,Fish & Chips Shop,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant
34,Barnet,1,Café,Turkish Restaurant,Diner,Yoshoku Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,Empanada Restaurant,Eastern European Restaurant
63,Newham,1,Café,Fast Food Restaurant,Yoshoku Restaurant,Fish & Chips Shop,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant
70,Southwark,1,Café,Restaurant,Gastropub,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,Empanada Restaurant,Eastern European Restaurant,Dumpling Restaurant,Filipino Restaurant
86,Newham,1,Café,Fast Food Restaurant,Bakery,Chinese Restaurant,Fish & Chips Shop,Yoshoku Restaurant,English Restaurant,Falafel Restaurant,Ethiopian Restaurant,Eastern European Restaurant
94,Camden,1,Café,Gastropub,Breakfast Spot,French Restaurant,Pizza Place,Wings Joint,Italian Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant
95,Barnet,1,Café,Bakery,Yoshoku Restaurant,English Restaurant,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,Empanada Restaurant,Food


In [70]:
cluster_2 = london_merged.loc[london_merged['Cluster Labels'] == 1, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]
cluster_2.describe(include='all')

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,30,30.0,30,30,30,30,30,30,30,30,30,30
unique,17,,8,16,19,21,18,19,21,19,18,17
top,Barnet,,Café,Fast Food Restaurant,Bakery,Pizza Place,Fast Food Restaurant,Diner,Burger Joint,Falafel Restaurant,Dumpling Restaurant,Eastern European Restaurant
freq,4,,23,5,5,3,3,3,3,4,4,7
mean,,1.0,,,,,,,,,,
std,,0.0,,,,,,,,,,
min,,1.0,,,,,,,,,,
25%,,1.0,,,,,,,,,,
50%,,1.0,,,,,,,,,,
75%,,1.0,,,,,,,,,,


###### Cluster 3

In [65]:
london_merged.loc[london_merged['Cluster Labels'] == 2, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
250,Croydon,2,English Restaurant,Yoshoku Restaurant,Currywurst Joint,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant


In [71]:
cluster_3 = london_merged.loc[london_merged['Cluster Labels'] == 2, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]
cluster_3.describe(include='all')

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,1,1.0,1,1,1,1,1,1,1,1,1,1
unique,1,,1,1,1,1,1,1,1,1,1,1
top,Croydon,,English Restaurant,Yoshoku Restaurant,Currywurst Joint,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant
freq,1,,1,1,1,1,1,1,1,1,1,1
mean,,2.0,,,,,,,,,,
std,,,,,,,,,,,,
min,,2.0,,,,,,,,,,
25%,,2.0,,,,,,,,,,
50%,,2.0,,,,,,,,,,
75%,,2.0,,,,,,,,,,


###### Cluster 4

In [66]:
london_merged.loc[london_merged['Cluster Labels'] == 3, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
29,Brent,3,Indian Restaurant,Fast Food Restaurant,Mediterranean Restaurant,Sandwich Place,Eastern European Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,Empanada Restaurant,Dumpling Restaurant
76,Newham,3,Indian Restaurant,Fast Food Restaurant,Sandwich Place,Bakery,Eastern European Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,Empanada Restaurant,Dumpling Restaurant
96,Enfield,3,Indian Restaurant,English Restaurant,Yoshoku Restaurant,Fish & Chips Shop,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant
106,Camden,3,Café,Bakery,Italian Restaurant,Indian Restaurant,Greek Restaurant,Bagel Shop,Pizza Place,Japanese Restaurant,Dim Sum Restaurant,Bistro
150,Westminster,3,Café,Thai Restaurant,Pizza Place,Bakery,Indian Restaurant,Deli / Bodega,Restaurant,Italian Restaurant,Chinese Restaurant,Fast Food Restaurant
252,Newham,3,Indian Restaurant,Bakery,Fast Food Restaurant,Sandwich Place,Eastern European Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,Empanada Restaurant,Dumpling Restaurant
267,Bexley,3,Indian Restaurant,Yoshoku Restaurant,Fish & Chips Shop,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant


In [72]:
cluster_4 = london_merged.loc[london_merged['Cluster Labels'] == 3, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]
cluster_4.describe(include='all')

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,7,7.0,7,7,7,7,7,7,7,7,7,7
unique,6,,2,5,7,5,5,5,5,5,5,4
top,Newham,,Indian Restaurant,Bakery,Mediterranean Restaurant,Bakery,Eastern European Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant,Empanada Restaurant,Dumpling Restaurant
freq,2,,5,2,1,2,3,3,3,3,3,4
mean,,3.0,,,,,,,,,,
std,,0.0,,,,,,,,,,
min,,3.0,,,,,,,,,,
25%,,3.0,,,,,,,,,,
50%,,3.0,,,,,,,,,,
75%,,3.0,,,,,,,,,,


###### Cluster 5

In [67]:
london_merged.loc[london_merged['Cluster Labels'] == 4, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Islington,4,Café,Indian Restaurant,Fast Food Restaurant,Italian Restaurant,Sandwich Place,Japanese Restaurant,Pizza Place,Asian Restaurant,Kebab Restaurant,Seafood Restaurant
6,Enfield,4,Indian Restaurant,Café,Chinese Restaurant,Fish & Chips Shop,Yoshoku Restaurant,Empanada Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant
11,Islington,4,Café,Gastropub,Caucasian Restaurant,Fast Food Restaurant,Chinese Restaurant,African Restaurant,Vietnamese Restaurant,Ethiopian Restaurant,Italian Restaurant,Restaurant
17,Camden,4,Café,Deli / Bodega,Italian Restaurant,Bakery,Indian Restaurant,Pizza Place,Restaurant,Tapas Restaurant,Middle Eastern Restaurant,Mediterranean Restaurant
18,Southwark,4,Café,Burger Joint,Ramen Restaurant,Bakery,Fried Chicken Joint,Fish & Chips Shop,Pizza Place,Chinese Restaurant,Breakfast Spot,Brazilian Restaurant
24,Camden,4,Café,Italian Restaurant,Sandwich Place,Turkish Restaurant,Pizza Place,Bakery,Restaurant,Food Court,Deli / Bodega,Indian Restaurant
26,Tower Hamlets,4,Burger Joint,Café,Fast Food Restaurant,Bakery,Yoshoku Restaurant,English Restaurant,Filipino Restaurant,Falafel Restaurant,Ethiopian Restaurant,Empanada Restaurant
28,Barnet,4,Café,Fast Food Restaurant,Portuguese Restaurant,Sushi Restaurant,Deli / Bodega,Restaurant,Sandwich Place,Snack Place,Burger Joint,Pizza Place
31,Lewisham,4,Café,Chinese Restaurant,Fish & Chips Shop,Pizza Place,Diner,Fast Food Restaurant,Breakfast Spot,Malay Restaurant,Fried Chicken Joint,Ethiopian Restaurant
32,Tower Hamlets,4,Café,Burger Joint,Fish & Chips Shop,Fast Food Restaurant,Bakery,English Restaurant,Filipino Restaurant,Falafel Restaurant,Ethiopian Restaurant,Yoshoku Restaurant


In [73]:
cluster_5 = london_merged.loc[london_merged['Cluster Labels'] == 4, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]
cluster_5.describe(include='all')

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,112,112.0,112,112,112,112,112,112,112,112,112,112
unique,29,,16,27,32,40,36,42,39,46,44,44
top,Barnet,,Café,Café,Italian Restaurant,Italian Restaurant,Sandwich Place,Fast Food Restaurant,Falafel Restaurant,Falafel Restaurant,English Restaurant,English Restaurant
freq,11,,73,18,13,10,11,11,12,12,11,8
mean,,4.0,,,,,,,,,,
std,,0.0,,,,,,,,,,
min,,4.0,,,,,,,,,,
25%,,4.0,,,,,,,,,,
50%,,4.0,,,,,,,,,,
75%,,4.0,,,,,,,,,,
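The five describe() cells above can also be condensed into one table. The sketch below is one possible way to do it, using a grouped aggregation on a toy stand-in for london_merged; the rows are made up, and the output column names locations and top are hypothetical.

```python
import pandas as pd

# Toy stand-in for london_merged (hypothetical rows)
toy_merged = pd.DataFrame(
    {
        "Cluster Labels": [0, 0, 0, 1, 1, 2],
        "1st Most Common Venue": [
            "Café", "Café", "Restaurant",
            "Indian Restaurant", "Indian Restaurant",
            "English Restaurant",
        ],
    }
)

# One row per cluster: number of locations and the most frequent 1st-ranked venue
summary = toy_merged.groupby("Cluster Labels").agg(
    locations=("1st Most Common Venue", "count"),
    top=("1st Most Common Venue", lambda s: s.value_counts().idxmax()),
)
print(summary)
```

Seeing the cluster sizes side by side makes it easier to spot very small clusters, such as the single-location Cluster 3 above.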


###### Section 2-Visualization of Clusters

Finally, let's visualize the resulting clusters.

In [68]:
# create map
map_clusters = folium.Map(location=[london_lat, london_lon], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
# note: rainbow[cluster-1] maps label 0 to the last color in the palette
for lat, lon, poi, cluster in zip(london_merged['Latitude'], london_merged['Longitude'], london_merged['Location'], london_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster+1), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

###### MAP LEGEND
Cluster 1 - red dots <br/>
Cluster 2 - purple dots <br/>
Cluster 3 - blue dot <br/>
Cluster 4 - green dots <br/>
Cluster 5 - orange dots
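The legend can also be derived from the palette instead of being typed by hand. The sketch below rebuilds the same five-color rainbow palette and applies the same rainbow[cluster-1] indexing as the map code, so label 0 wraps around to the last color. It assumes matplotlib's cm and colors modules, as used in the map cell.

```python
import numpy as np
from matplotlib import cm, colors

kclusters = 5

# Same palette as in the map cell: one rainbow color per cluster
rainbow = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# The map indexes the palette with rainbow[cluster - 1],
# so cluster label 0 (shown as "Cluster 1") gets the last color
legend = {cluster + 1: rainbow[cluster - 1] for cluster in range(kclusters)}

for name, hex_color in legend.items():
    print("Cluster {}: {}".format(name, hex_color))
```

Printing the hex codes next to the cluster numbers keeps the legend correct if kclusters or the colormap is changed later.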

###### 6.DISCUSSION

Having analyzed the most popular restaurant types in each cluster, the stakeholder should prefer the least popular types as a safe choice. For example, there is little sense in opening an Indian restaurant or a café in the boroughs of Cluster 4. Of course, a location may have more than 10 types, and one might object that, following this logic, the stakeholder should prefer the last type in the full list rather than the 10th. But bear in mind that, descending the popularity list, we may reach types for which there is no demand, and open a restaurant that is not needed in that particular location. The presence of interested customers is a must for a successful business. That is why our recommendations stop at the 9th and 10th positions.

Recommendations, based on description of each cluster:<br/>
Cluster 1 Locations: English or Dumpling Restaurant<br/>
Cluster 2 Locations: Dumpling or Eastern European Restaurant<br/>
Cluster 3 Locations: Dumpling or Eastern European Restaurant<br/>
Cluster 4 Locations: Empanada or Dumpling Restaurant<br/>
Cluster 5 Locations: English or Ethiopian Restaurant<br/><br/>
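The "least popular, but still in demand" argument can also be checked numerically. The sketch below is an alternative illustration, not the method used in this report: it averages the category frequencies within each cluster, ignores categories that are absent from the whole cluster, and picks the rarest one that remains. The table toy_freq and its values are made up.

```python
import pandas as pd

# Toy stand-in for the frequency table with cluster labels attached (hypothetical values)
toy_freq = pd.DataFrame(
    {
        "Cluster Labels": [0, 0, 1, 1],
        "Café": [0.5, 0.4, 0.2, 0.3],
        "Dumpling Restaurant": [0.1, 0.0, 0.0, 0.0],
        "English Restaurant": [0.4, 0.6, 0.8, 0.7],
    }
)

# Mean frequency of each category within each cluster
cluster_means = toy_freq.groupby("Cluster Labels").mean()

# Mask categories that never occur in a cluster, then take the smallest remaining mean
present = cluster_means.where(cluster_means > 0)
rarest_present = present.idxmin(axis=1)
print(rarest_present.to_dict())
```

A category that never appears in a cluster is excluded here because its absence may signal missing demand rather than an opportunity.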

After the type of restaurant is chosen, it is time to select the right place. The map created in Section 2 of Part 5, together with its legend, shows which locations belong to the chosen cluster.

###### 7.CONCLUSION

In this report we worked out a methodology to determine what the most promising type of restaurant is and where it should be opened.<br/><br/>

We collected information about London boroughs from Wikipedia and mapped them using geospatial libraries. Using the Foursquare API, we collected the top 100 restaurants and their types for each location within a radius of 600 meters of its central point. We then grouped the collected restaurants by location and took the mean frequency of occurrence of each type, which prepared the data for clustering. Finally, we clustered the locations with the k-means algorithm and analyzed the 10 most common restaurant types in each cluster, making useful observations. We then visualized the clusters on the map, showing the best locations for opening the chosen type of restaurant.<br/><br/>

This type of analysis can be applied to any city of your choice that has available geospatial information.<br/><br/>

This type of analysis can be applied to any type of venue (shopping, clubs, etc.) that is available in the Foursquare database.