# Market analysis for a sushi restaurant company: should they open a restaurant in London or in Madrid?

## 1. Introduction

A leading Japanese sushi restaurant company has decided to start operating in Europe: so far, all of its many restaurants are in Japan. The owner, considered a visionary, is fascinated by the idea of opening a restaurant in a major European capital, but he is not sure which one it should be. Northern and southern European citizens can be quite different in the way they live their cities.
For this reason, the company asks for a comparison of two very different cities and, more generally, of two different lifestyles: London and Madrid. Both cities could be good places to expand the business, but the company wants to find the ideal one.

To complete this task you have no specific requirements except one: since the company is very proud of the quality of its food and, more generally, of the "experience" it offers to customers, the restaurant will not be particularly cheap, so your analysis should focus only on the richest areas of each city.

## 2. Business Problem

To find the ideal city, you decide to focus your research only on the top 5 areas of each city, where the lifestyle can reasonably be assumed to be more expensive.
Once the most favourable boroughs have been found, you have to look for the most common venues in them in order to advise your client.

The *__top 5 richest boroughs in London__* are:

1. Camden
2. Hackney
3. Hammersmith and Fulham
4. Kensington and Chelsea
5. Westminster

And the *__top 5 richest boroughs in Madrid__* are:

1. Centro
2. Chamartín
3. Chamberí
4. Retiro
5. Salamanca

## 3. Developing the model

The entire model is developed in Python, using the following packages:

* __bs4__: for web scraping;
* __folium__: to generate maps;
* __geopy__: to convert an address into latitude and longitude values;
* __matplotlib__: to refine maps and possibly plot graphs;
* __numpy__: to exploit some of its mathematical methods;
* __pandas__: to create and manipulate DataFrames;
* __sklearn__: to create the clusters;
* __requests__: to manage HTTP requests.

## 4. Data Collection

Information about the boroughs of London and the districts of Madrid can be scraped from https://en.wikipedia.org/wiki/List_of_London_boroughs and https://en.wikipedia.org/wiki/Districts_of_Madrid.

To find out about venues and places, **Foursquare** is used.
Its API makes it possible to retrieve information about places in a city and incorporate it into the code:
this is crucial, since the business model is based on this real-world location data, which makes it possible to cluster the venues in each city.
Here is the information to be gathered:

1. Name of the Borough;
2. Latitude of the Borough;
3. Longitude of the Borough;
4. Venue: name of the venue;
5. Venue Latitude: latitude of the venue;
6. Venue Longitude: longitude of the venue;
7. Venue Category: category of the venue.

## 5. Coding

### Installing required libraries

In [1]:
!pip install geopy



In [2]:
! pip install folium

Collecting folium
  Downloading folium-0.12.1-py2.py3-none-any.whl (94 kB)
Collecting branca>=0.3.0
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.12.1


### Importing libraries

In [3]:
from bs4 import BeautifulSoup
import folium
import geopy
import matplotlib.pyplot as plt
import pandas as pd
import requests
from sklearn.cluster import KMeans

### Retrieving data from the source

In [4]:
url = "https://en.wikipedia.org/wiki/List_of_London_boroughs"
response = requests.get(url)
wiki_data = response.text

# Checking that the request succeeded and returned a non-empty page
if response.ok and wiki_data.strip():
    print("Data retrieved successfully!")

Data retrieved successfully!


### Parsing the html data

In [5]:
soup = BeautifulSoup(wiki_data, "html5lib")

### Scraping the data in the Table

In [6]:
table = soup.find_all("table", {'class':'wikitable sortable'})
df = pd.read_html(str(table[0]), index_col=None, header=0)[0]
df.head()

Unnamed: 0,Borough,Inner,Status,Local authority,Political control,Headquarters,Area (sq mi),Population (2019 est)[1],Co-ordinates,Nr. in map
0,Barking and Dagenham [note 1],,,Barking and Dagenham London Borough Council,Labour,"Town Hall, 1 Town Square",13.93,212906,".mw-parser-output .geo-default,.mw-parser-outp...",25
1,Barnet,,,Barnet London Borough Council,Conservative,"Barnet House, 2 Bristol Avenue, Colindale",33.49,395896,51°37′31″N 0°09′06″W / 51.6252°N 0.1517°W,31
2,Bexley,,,Bexley London Borough Council,Conservative,"Civic Offices, 2 Watling Street",23.38,248287,51°27′18″N 0°09′02″E / 51.4549°N 0.1505°E,23
3,Brent,,,Brent London Borough Council,Labour,"Brent Civic Centre, Engineers Way",16.7,329771,51°33′32″N 0°16′54″W / 51.5588°N 0.2817°W,12
4,Bromley,,,Bromley London Borough Council,Conservative,"Civic Centre, Stockwell Close",57.97,332336,51°24′14″N 0°01′11″E / 51.4039°N 0.0198°E,20


In [7]:
# Verifying the shape
df.shape

(32, 10)

In [8]:
# Info about the created df
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 10 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Borough                   32 non-null     object 
 1   Inner                     3 non-null      object 
 2   Status                    4 non-null      object 
 3   Local authority           32 non-null     object 
 4   Political control         32 non-null     object 
 5   Headquarters              32 non-null     object 
 6   Area (sq mi)              32 non-null     float64
 7   Population (2019 est)[1]  32 non-null     int64  
 8   Co-ordinates              32 non-null     object 
 9   Nr. in map                32 non-null     int64  
dtypes: float64(1), int64(2), object(7)
memory usage: 2.6+ KB


### Cleaning the DataFrame

In [9]:
# Creating a copy of the DataFrame (a plain assignment would only create a second name for df)
df_london = df.copy()
df_london.head()

Unnamed: 0,Borough,Inner,Status,Local authority,Political control,Headquarters,Area (sq mi),Population (2019 est)[1],Co-ordinates,Nr. in map
0,Barking and Dagenham [note 1],,,Barking and Dagenham London Borough Council,Labour,"Town Hall, 1 Town Square",13.93,212906,".mw-parser-output .geo-default,.mw-parser-outp...",25
1,Barnet,,,Barnet London Borough Council,Conservative,"Barnet House, 2 Bristol Avenue, Colindale",33.49,395896,51°37′31″N 0°09′06″W / 51.6252°N 0.1517°W,31
2,Bexley,,,Bexley London Borough Council,Conservative,"Civic Offices, 2 Watling Street",23.38,248287,51°27′18″N 0°09′02″E / 51.4549°N 0.1505°E,23
3,Brent,,,Brent London Borough Council,Labour,"Brent Civic Centre, Engineers Way",16.7,329771,51°33′32″N 0°16′54″W / 51.5588°N 0.2817°W,12
4,Bromley,,,Bromley London Borough Council,Conservative,"Civic Centre, Stockwell Close",57.97,332336,51°24′14″N 0°01′11″E / 51.4039°N 0.0198°E,20


In [10]:
# Dropping unwanted columns
df_london.drop(["Inner", "Status", "Local authority", "Political control", "Headquarters", "Co-ordinates", "Nr. in map"], axis=1, inplace=True)
df_london.head()

Unnamed: 0,Borough,Area (sq mi),Population (2019 est)[1]
0,Barking and Dagenham [note 1],13.93,212906
1,Barnet,33.49,395896
2,Bexley,23.38,248287
3,Brent,16.7,329771
4,Bromley,57.97,332336


In [11]:
# Renaming the columns
df_london.rename(columns={"Population (2019 est)[1]": "Population (2019)"}, inplace=True)
df_london

Unnamed: 0,Borough,Area (sq mi),Population (2019)
0,Barking and Dagenham [note 1],13.93,212906
1,Barnet,33.49,395896
2,Bexley,23.38,248287
3,Brent,16.7,329771
4,Bromley,57.97,332336
5,Camden,8.4,270029
6,Croydon,33.41,386710
7,Ealing,21.44,341806
8,Enfield,31.74,333794
9,Greenwich [note 2],18.28,287942


A few things still need to be removed, specifically the *[note 1]*, *[note 2]* and *[note 4]* markers in the Borough column. The code below is not very "smart", but since there are only three of these errors they can be fixed manually:

In [12]:
df_london.at[0, "Borough"] = "Barking and Dagenham"
df_london.at[9, "Borough"] = "Greenwich"
df_london.at[11, "Borough"] = "Hammersmith and Fulham"
df_london

Unnamed: 0,Borough,Area (sq mi),Population (2019)
0,Barking and Dagenham,13.93,212906
1,Barnet,33.49,395896
2,Bexley,23.38,248287
3,Brent,16.7,329771
4,Bromley,57.97,332336
5,Camden,8.4,270029
6,Croydon,33.41,386710
7,Ealing,21.44,341806
8,Enfield,31.74,333794
9,Greenwich,18.28,287942
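A more general alternative removes every `[note N]` marker with a regular expression instead of hard-coding row indices. The sketch below runs on a small hypothetical sample that only mimics the scraped values; it is not the real table:

```python
import pandas as pd

# Hypothetical sample mimicking the scraped "Borough" values with Wikipedia footnote markers
boroughs = pd.DataFrame({"Borough": [
    "Barking and Dagenham [note 1]",
    "Barnet",
    "Greenwich [note 2]",
    "Hammersmith and Fulham [note 4]",
]})

# Strip any "[note N]" marker together with the whitespace that precedes it
boroughs["Borough"] = boroughs["Borough"].str.replace(r"\s*\[note \d+\]", "", regex=True)
print(boroughs["Borough"].tolist())
```

On the real data the same call would be `df_london["Borough"] = df_london["Borough"].str.replace(r"\s*\[note \d+\]", "", regex=True)`. That version would keep working if the rows were reordered or new notes appeared.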


### Getting the latitude and the longitude coordinates of each neighborhood 

*__Note__*: *although the table scraped from Wikipedia already contains coordinates, for the purpose of this analysis I use a different procedure (geocoding with geopy) in order to apply some of the different techniques learnt in the course.*
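For comparison, the decimal coordinates already present in the scraped `Co-ordinates` column could be parsed directly. An example of such a value is `51°37′31″N 0°09′06″W / 51.6252°N 0.1517°W` for Barnet. The following is a minimal sketch that assumes the decimal pair always appears in the `51.6252°N 0.1517°W` form seen in the rows above. Cells where no such pair is found return `None`. The function name `parse_decimal_coords` is hypothetical.

```python
import re

# Decimal part of a Wikipedia coordinate cell, e.g. "51.6252°N 0.1517°W"
DECIMAL_COORD = re.compile(r"(\d+(?:\.\d+)?)°([NS])\s+(\d+(?:\.\d+)?)°([EW])")

def parse_decimal_coords(text):
    """Return (latitude, longitude) as signed floats, or None if no decimal pair is found."""
    match = DECIMAL_COORD.search(text)
    if match is None:
        return None
    lat, ns, lng, ew = match.groups()
    latitude = float(lat) if ns == "N" else -float(lat)   # south is negative
    longitude = float(lng) if ew == "E" else -float(lng)  # west is negative
    return latitude, longitude

print(parse_decimal_coords("51°37′31″N 0°09′06″W / 51.6252°N 0.1517°W"))  # (51.6252, -0.1517)
```

The degrees-minutes-seconds part does not match, because a hemisphere letter never follows its `°` sign directly. Applied to the scraped `df` before the `Co-ordinates` column is dropped, `df["Co-ordinates"].apply(parse_decimal_coords)` would yield one tuple (or `None`) per borough.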

In [13]:
from geopy.geocoders import Nominatim # Convert an address into latitude and longitude values

address = 'london'
geolocator = Nominatim(user_agent="london_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London are {}, {}.'.format(latitude, longitude))

The geographical coordinates of London are 51.5073219, -0.1276474.


Here's the procedure:

1. Building a list to store all the boroughs I scraped;
2. Using geocode to retrieve coordinates of the elements in the list;
3. Building a new DataFrame (df_coord) to store the information;
4. Combining the new DataFrame (df_coord) with the previous one (df_london).

###### 1. Building a list to store all the boroughs I scraped

In [14]:
bor = df_london["Borough"].tolist()

###### 2. Using geocode to retrieve coordinates of the elements in the list

In [15]:
city = "London"
lats, lngs, adds = [], [], []

for borough in bor:
    address = borough + ", " + city
    location = geolocator.geocode(address)
    lat = location.latitude
    lng = location.longitude
    lats.append(lat)
    lngs.append(lng)
    adds.append(address)
    print(address, lat, lng)

Barking and Dagenham, London 51.5541171 0.15050434261994267
Barnet, London 51.65309 -0.2002261
Bexley, London 51.4416793 0.150488
Brent, London 51.563825800000004 -0.2757596561855699
Bromley, London 51.4028046 0.0148142
Camden, London 51.5423045 -0.1395604
Croydon, London 51.3713049 -0.101957
Ealing, London 51.5126553 -0.3051952
Enfield, London 51.6520851 -0.0810175
Greenwich, London 51.4820845 -0.0045417
Hackney, London 51.5432402 -0.0493621
Hammersmith and Fulham, London 51.4920377 -0.2236401
Haringey, London 51.6014736 -0.1117815
Harrow, London 51.596827149999996 -0.33731605402671094
Havering, London 51.5443851 -0.14430716398919305
Hillingdon, London 51.542519299999995 -0.44833493117949663
Hounslow, London 51.4686132 -0.3613471
Islington, London 51.5384287 -0.0999051
Kensington and Chelsea, London 51.498480400000005 -0.1990432138025393
Kingston upon Thames, London 51.4096275 -0.3062621
Lambeth, London 51.5013012 -0.117287
Lewisham, London 51.4624325 -0.0101331
Merton, London 51.4108
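The loop above has two weaknesses. It stops with an `AttributeError` as soon as `geolocator.geocode` returns `None` for an address it cannot find, and it sends its requests back to back. Below is a minimal sketch of a more defensive version, assuming Nominatim's limit of one request per second. The function name `geocode_all` is hypothetical. Any callable that returns an object with `latitude`/`longitude` attributes (or `None`) can be passed as `geocode`.

```python
import time
from collections import namedtuple

def geocode_all(names, city, geocode, min_delay=1.0):
    """Geocode "<name>, <city>" for every name.

    Returns a list of (address, latitude, longitude) tuples; latitude and longitude
    are None when the geocoder finds no match, instead of raising AttributeError.
    """
    rows = []
    for i, name in enumerate(names):
        if i > 0:
            time.sleep(min_delay)  # Nominatim's usage policy allows at most 1 request per second
        address = f"{name}, {city}"
        location = geocode(address)
        if location is None:
            rows.append((address, None, None))
        else:
            rows.append((address, location.latitude, location.longitude))
    return rows

# Example with a stand-in geocoder (hypothetical); in the notebook you would pass geolocator.geocode
Point = namedtuple("Point", "latitude longitude")
fake = {"Camden, London": Point(51.5423045, -0.1395604)}
rows = geocode_all(["Camden", "Nowhere"], "London", fake.get, min_delay=0)
print(rows)
```

geopy also provides `geopy.extra.rate_limiter.RateLimiter`, which can wrap `geolocator.geocode` to enforce the same kind of delay.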

###### 3. Building a new DataFrame (df_coord) to store the information

In [16]:
df_coord = pd.DataFrame(list(zip(adds, lats, lngs)), columns=["Borough", "Latitude", "Longitude"])

In [17]:
### Verifying shape of the df_coord
df_coord.shape

(32, 3)

In [18]:
### Printing 
df_coord

Unnamed: 0,Borough,Latitude,Longitude
0,"Barking and Dagenham, London",51.554117,0.150504
1,"Barnet, London",51.65309,-0.200226
2,"Bexley, London",51.441679,0.150488
3,"Brent, London",51.563826,-0.27576
4,"Bromley, London",51.402805,0.014814
5,"Camden, London",51.542305,-0.13956
6,"Croydon, London",51.371305,-0.101957
7,"Ealing, London",51.512655,-0.305195
8,"Enfield, London",51.652085,-0.081018
9,"Greenwich, London",51.482084,-0.004542


###### 4. Combining the new DataFrame (df_coord) with the previous one (df_london)

In [19]:
# Adding a column for the Latitude
df_london["Latitude"] = df_coord["Latitude"]

In [20]:
df_london

Unnamed: 0,Borough,Area (sq mi),Population (2019),Latitude
0,Barking and Dagenham,13.93,212906,51.554117
1,Barnet,33.49,395896,51.65309
2,Bexley,23.38,248287,51.441679
3,Brent,16.7,329771,51.563826
4,Bromley,57.97,332336,51.402805
5,Camden,8.4,270029,51.542305
6,Croydon,33.41,386710,51.371305
7,Ealing,21.44,341806,51.512655
8,Enfield,31.74,333794,51.652085
9,Greenwich,18.28,287942,51.482084


In [21]:
# Adding a column for the Longitude
df_london["Longitude"] = df_coord["Longitude"]
df_london

Unnamed: 0,Borough,Area (sq mi),Population (2019),Latitude,Longitude
0,Barking and Dagenham,13.93,212906,51.554117,0.150504
1,Barnet,33.49,395896,51.65309,-0.200226
2,Bexley,23.38,248287,51.441679,0.150488
3,Brent,16.7,329771,51.563826,-0.27576
4,Bromley,57.97,332336,51.402805,0.014814
5,Camden,8.4,270029,51.542305,-0.13956
6,Croydon,33.41,386710,51.371305,-0.101957
7,Ealing,21.44,341806,51.512655,-0.305195
8,Enfield,31.74,333794,51.652085,-0.081018
9,Greenwich,18.28,287942,51.482084,-0.004542


*__Note__*: there was a problem retrieving the coordinates of "Tower Hamlets": the geocoder placed it in Canterbury. The correct coordinates were looked up and inserted manually.

In [22]:
df_london.at[28, "Longitude"] = 0.0293
df_london

Unnamed: 0,Borough,Area (sq mi),Population (2019),Latitude,Longitude
0,Barking and Dagenham,13.93,212906,51.554117,0.150504
1,Barnet,33.49,395896,51.65309,-0.200226
2,Bexley,23.38,248287,51.441679,0.150488
3,Brent,16.7,329771,51.563826,-0.27576
4,Bromley,57.97,332336,51.402805,0.014814
5,Camden,8.4,270029,51.542305,-0.13956
6,Croydon,33.41,386710,51.371305,-0.101957
7,Ealing,21.44,341806,51.512655,-0.305195
8,Enfield,31.74,333794,51.652085,-0.081018
9,Greenwich,18.28,287942,51.482084,-0.004542


In [23]:
df_london.at[28, "Latitude"] = 51.5203
df_london

Unnamed: 0,Borough,Area (sq mi),Population (2019),Latitude,Longitude
0,Barking and Dagenham,13.93,212906,51.554117,0.150504
1,Barnet,33.49,395896,51.65309,-0.200226
2,Bexley,23.38,248287,51.441679,0.150488
3,Brent,16.7,329771,51.563826,-0.27576
4,Bromley,57.97,332336,51.402805,0.014814
5,Camden,8.4,270029,51.542305,-0.13956
6,Croydon,33.41,386710,51.371305,-0.101957
7,Ealing,21.44,341806,51.512655,-0.305195
8,Enfield,31.74,333794,51.652085,-0.081018
9,Greenwich,18.28,287942,51.482084,-0.004542
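Errors like the misplaced Tower Hamlets can be caught automatically by measuring each point's distance from the city centre. The same check would catch the `TOTAL` row in the Madrid section, which the geocoder places near 48.95°N, 2.49°E, far from Madrid. The following is a minimal sketch using the haversine great-circle formula. The 30 km threshold and the function names are assumptions, not values from this analysis.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def flag_outliers(points, centre, max_km=30.0):
    """Return the names of points lying more than max_km from the city centre."""
    c_lat, c_lng = centre
    return [name for name, lat, lng in points if haversine_km(c_lat, c_lng, lat, lng) > max_km]

# Coordinates copied from the geocoding output of the Madrid section
madrid_centre = (40.4167047, -3.7035825)
points = [
    ("Centro", 40.417652700000005, -3.7079137662915533),
    ("Fuencarral-El Pardo", 40.55634555, -3.7785905137518054),
    ("TOTAL", 48.951263, 2.4894989391771016),
]
print(flag_outliers(points, madrid_centre))
```

Fuencarral-El Pardo lies roughly 17 km from the centre, so it passes the check. A threshold that suits a compact city may need raising for one that covers a larger area.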


### Creating the map of neighborhoods in London

In [24]:
# create map of London using latitude and longitude values
map_london = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough in zip(df_london['Latitude'], df_london['Longitude'], df_london['Borough']):
    label = '{}'.format(borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)  
    
map_london

### Utilizing Foursquare API to explore the neighborhoods

#### Defining Foursquare Credential and Version

In [25]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (never publish real credentials)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (never publish real credentials)
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET
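Hard-coding credentials in a notebook and printing them exposes them to anyone who reads it. Below is a minimal sketch that reads them from environment variables instead. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are hypothetical choices, not something Foursquare defines. The helper names are also hypothetical.

```python
import os

def load_foursquare_credentials(id_var="FOURSQUARE_CLIENT_ID",
                                secret_var="FOURSQUARE_CLIENT_SECRET"):
    """Read the Foursquare client ID and secret from (hypothetically named) environment variables."""
    client_id = os.environ.get(id_var)
    client_secret = os.environ.get(secret_var)
    if not client_id or not client_secret:
        raise RuntimeError(f"Set {id_var} and {secret_var} before calling the Foursquare API")
    return client_id, client_secret

def mask(secret, visible=4):
    """Hide all but the last `visible` characters; short secrets are hidden completely."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

Usage would be `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()`, followed by `print('CLIENT_SECRET: ' + mask(CLIENT_SECRET))` if you want to check that the right secret was loaded.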


#### Creating a function to repeat the process to all the neighborhoods in London

In [26]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough', 
                  'Borough Latitude', 
                  'Borough Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
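`getNearbyVenues` makes two assumptions: every response contains `groups`, and every venue has at least one category. A response without `groups` (as may happen when a request fails) would stop the loop with a `KeyError`, and an uncategorised venue would stop it with an `IndexError`. The sketch below shows two helpers that fail more gracefully. They keep the same endpoint and response fields as the function above, and the helper names are hypothetical. Passing the query through `params` simply lets the HTTP library build and encode the URL.

```python
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"  # same endpoint as in getNearbyVenues

def explore_items(http, client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Call the explore endpoint and return its list of items ([] if the response has none).

    `http` is anything with a requests-style get(url, params=..., timeout=...), e.g. the requests module.
    """
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    resp = http.get(EXPLORE_URL, params=params, timeout=10)
    resp.raise_for_status()  # surface HTTP errors instead of failing later with a KeyError
    groups = resp.json().get("response", {}).get("groups", [])
    return groups[0].get("items", []) if groups else []

def venue_row(borough, lat, lng, item):
    """Flatten one explore item into the 7 columns used by london_venues; category may be None."""
    venue = item["venue"]
    categories = venue.get("categories") or []
    category = categories[0]["name"] if categories else None
    return (borough, lat, lng, venue["name"],
            venue["location"]["lat"], venue["location"]["lng"], category)
```

Inside the loop, the following would replace the request and the list comprehension:

- `results = explore_items(requests, CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT)`
- `[venue_row(name, lat, lng, v) for v in results]`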

#### Running the above function on each neighborhood and creating a new DataFrame named "london_venues"

In [27]:
london_venues = getNearbyVenues(names=df_london['Borough'],
                                   latitudes=df_london['Latitude'],
                                   longitudes=df_london['Longitude']
                                  )

Barking and Dagenham
Barnet
Bexley
Brent
Bromley
Camden
Croydon
Ealing
Enfield
Greenwich
Hackney
Hammersmith and Fulham
Haringey
Harrow
Havering
Hillingdon
Hounslow
Islington
Kensington and Chelsea
Kingston upon Thames
Lambeth
Lewisham
Merton
Newham
Redbridge
Richmond upon Thames
Southwark
Sutton
Tower Hamlets
Waltham Forest
Wandsworth
Westminster


#### Checking the new DataFrame

In [28]:
print(london_venues.shape)
london_venues.head()

(1220, 7)


Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Barking and Dagenham,51.554117,0.150504,Tesco Express,51.551536,0.152784,Grocery Store
1,Barking and Dagenham,51.554117,0.150504,Connor Road Bus Stop,51.554345,0.147162,Bus Stop
2,Barking and Dagenham,51.554117,0.150504,Oglethorpe Road Bus Stop,51.555221,0.147136,Bus Stop
3,Barking and Dagenham,51.554117,0.150504,Five Elms Off Licence,51.553878,0.145531,Liquor Store
4,Barnet,51.65309,-0.200226,Ye Old Mitre Inne,51.65294,-0.199507,Pub


#### How many venues in each neighborhood?

In [29]:
london_venues.groupby('Borough').count()

Unnamed: 0_level_0,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Borough,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Barking and Dagenham,4,4,4,4,4,4
Barnet,30,30,30,30,30,30
Bexley,12,12,12,12,12,12
Brent,15,15,15,15,15,15
Bromley,41,41,41,41,41,41
Camden,89,89,89,89,89,89
Croydon,26,26,26,26,26,26
Ealing,93,93,93,93,93,93
Enfield,58,58,58,58,58,58
Greenwich,59,59,59,59,59,59


In [30]:
print('There are {} unique categories.'.format(len(london_venues['Venue Category'].unique())))

There are 218 unique categories.


#### To acquire more info about the neighborhoods, it's time to analyze each one of them

In [31]:
# one hot encoding
london_onehot = pd.get_dummies(london_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
london_onehot['Borough'] = london_venues['Borough'] 

# move neighborhood column to the first column
fixed_columns = [london_onehot.columns[-1]] + list(london_onehot.columns[:-1])
london_onehot = london_onehot[fixed_columns]

london_onehot.head()

Unnamed: 0,Borough,Afghan Restaurant,African Restaurant,American Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Austrian Restaurant,...,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Yoga Studio
0,Barking and Dagenham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Barking and Dagenham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Barking and Dagenham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Barking and Dagenham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Barnet,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


#### Grouping rows by Borough and taking the mean frequency of occurrence of each category

In [32]:
london_grouped = london_onehot.groupby('Borough').mean().reset_index()
london_grouped

Unnamed: 0,Borough,Afghan Restaurant,African Restaurant,American Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Austrian Restaurant,...,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Yoga Studio
0,Barking and Dagenham,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Barnet,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bexley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Brent,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bromley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,...,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Camden,0.0,0.011236,0.011236,0.0,0.0,0.0,0.0,0.011236,0.0,...,0.0,0.0,0.022472,0.0,0.022472,0.0,0.0,0.0,0.0,0.0
6,Croydon,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Ealing,0.0,0.0,0.010753,0.0,0.010753,0.0,0.0,0.010753,0.0,...,0.0,0.0,0.0,0.010753,0.021505,0.010753,0.010753,0.0,0.0,0.0
8,Enfield,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.017241,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0
9,Greenwich,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,...,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0
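This step works because of how the one-hot encoding is built. Each venue row has a single 1, in its own category column. So the mean of a column within a borough is the share of that borough's venues that belong to that category.

The toy illustration below uses hypothetical data. Borough `A` is chosen to reproduce the 0.50 / 0.25 / 0.25 split that the top-venues listing reports for Barking and Dagenham. `dtype=int` is passed so that the indicator columns are numeric regardless of the pandas version.

```python
import pandas as pd

# Toy venues table (hypothetical data) with the same columns used above
venues = pd.DataFrame({
    "Borough": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Bus Stop", "Bus Stop", "Liquor Store", "Grocery Store", "Pub", "Coffee Shop"],
})

# One 0/1 column per category, named after the category itself
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
onehot["Borough"] = venues["Borough"]

# Mean of the 0/1 indicators per borough = share of that borough's venues in each category
freq = onehot.groupby("Borough").mean()
print(freq)
```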


#### Printing each neighborhood along with the top 5 most common venues

In [33]:
num_top_venues = 5

for hood in london_grouped['Borough']:
    print("----"+hood+"----")
    temp = london_grouped[london_grouped['Borough'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Barking and Dagenham----
           venue  freq
0       Bus Stop  0.50
1   Liquor Store  0.25
2  Grocery Store  0.25
3         Museum  0.00
4    Music Venue  0.00


----Barnet----
         venue  freq
0  Coffee Shop  0.13
1  Pizza Place  0.07
2          Pub  0.07
3    Bookstore  0.07
4         Park  0.07


----Bexley----
                  venue  freq
0                   Pub  0.17
1  Fast Food Restaurant  0.17
2         Train Station  0.08
3    Italian Restaurant  0.08
4      Greek Restaurant  0.08


----Brent----
               venue  freq
0        Coffee Shop  0.20
1        Supermarket  0.13
2              Hotel  0.13
3  Indian Restaurant  0.07
4  Electronics Store  0.07


----Bromley----
                   venue  freq
0         Clothing Store  0.15
1            Coffee Shop  0.12
2           Burger Joint  0.05
3  Portuguese Restaurant  0.05
4                    Pub  0.05


----Camden----
                venue  freq
0                 Pub  0.11
1         Coffee Shop  0.09
2         

#### Putting the results in a DataFrame

But first, writing a function to sort the venues in descending order:

In [34]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
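`sort_values` uses an unstable sort by default. As a result, categories that tie come out in an arbitrary order. This matters most for the many zero-frequency categories that fill the lower ranks of boroughs with few venues, such as Barking and Dagenham.

The sketch below is a hedged alternative called `top_venues` (a hypothetical name). It uses `Series.nlargest`, which with `keep="first"` returns tied values in their order of appearance. For the one-hot columns, that order is alphabetical.

```python
import pandas as pd

def top_venues(row, num_top_venues):
    """Return the num_top_venues most frequent categories of one london_grouped row.

    The first entry of the row is the Borough name and is skipped; ties keep column order.
    """
    freqs = row.iloc[1:].astype(float)
    return freqs.nlargest(num_top_venues, keep="first").index.tolist()

# Hypothetical row in the same layout as london_grouped (Borough first, then frequencies)
row = pd.Series({"Borough": "Barking and Dagenham", "Bus Stop": 0.5, "Fish Market": 0.0,
                 "Grocery Store": 0.25, "Liquor Store": 0.25, "Yoga Studio": 0.0})
print(top_venues(row, 4))
```

It could replace `return_most_common_venues` in the loop that fills `borough_venues_sorted`. Alternatively, `sort_values(ascending=False, kind="mergesort")` requests a stable sort inside the existing function.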

#### Creating the new DataFrame and displaying the top 10 venues for each Borough

In [35]:
import numpy as np

In [36]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Borough']
for ind in np.arange(num_top_venues):
    suffix = indicators[ind] if ind < len(indicators) else 'th'
    columns.append('{}{} Most Common Venue'.format(ind + 1, suffix))

# create a new dataframe
borough_venues_sorted = pd.DataFrame(columns=columns)
borough_venues_sorted['Borough'] = london_grouped['Borough']

for ind in np.arange(london_grouped.shape[0]):
    borough_venues_sorted.iloc[ind, 1:] = return_most_common_venues(london_grouped.iloc[ind, :], num_top_venues)

borough_venues_sorted.head()

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Barking and Dagenham,Bus Stop,Liquor Store,Grocery Store,Yoga Studio,Falafel Restaurant,Food Court,Food & Drink Shop,Flea Market,Fish Market,Fish & Chips Shop
1,Barnet,Coffee Shop,Pharmacy,Park,Restaurant,Pizza Place,Convenience Store,Pub,Bookstore,Fast Food Restaurant,Bakery
2,Bexley,Pub,Fast Food Restaurant,Greek Restaurant,Chinese Restaurant,Toy / Game Store,Train Station,Italian Restaurant,Tennis Court,Indian Restaurant,Breakfast Spot
3,Brent,Coffee Shop,Hotel,Supermarket,Pedestrian Plaza,Electronics Store,Café,Bus Stop,Food Court,Burger Joint,Sports Bar
4,Bromley,Clothing Store,Coffee Shop,Burger Joint,Pub,Pizza Place,Portuguese Restaurant,Park,Chocolate Shop,Gelato Shop,Stationery Store


In [37]:
london_rich = ('Westminster', 'Kensington and Chelsea', 'Camden', 'Hammersmith and Fulham', 'Hackney')

In [38]:
df_london_rich_venues = borough_venues_sorted.loc[borough_venues_sorted['Borough'].isin(london_rich)]
df_london_rich_venues

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Camden,Pub,Coffee Shop,Café,Burger Joint,Italian Restaurant,Ice Cream Shop,Beer Bar,Vegetarian / Vegan Restaurant,Caribbean Restaurant,Vietnamese Restaurant
10,Hackney,Coffee Shop,Pub,Café,Supermarket,Brewery,Flea Market,Beer Store,Sporting Goods Shop,Boutique,Yoga Studio
11,Hammersmith and Fulham,Café,Pub,Coffee Shop,Hotel,Gym / Fitness Center,Grocery Store,Sandwich Place,Thai Restaurant,Breakfast Spot,Portuguese Restaurant
18,Kensington and Chelsea,Café,Pub,Italian Restaurant,Persian Restaurant,Burger Joint,Clothing Store,Supermarket,Breakfast Spot,Mediterranean Restaurant,Filipino Restaurant
31,Westminster,Coffee Shop,Pub,Sandwich Place,Historic Site,Outdoor Sculpture,Plaza,Café,Monument / Landmark,Hotel,Garden


## Repeating the same process for Madrid

*__Note__*: *To keep the notebook short, no further markdown commentary is written from here on, since the process has already been detailed above.*

In [39]:
url_2 = "https://en.wikipedia.org/wiki/Districts_of_Madrid"
response_2 = requests.get(url_2)
wiki_data_2 = response_2.text

# Checking that the request succeeded and returned a non-empty page
if response_2.ok and wiki_data_2.strip():
    print("Data retrieved successfully!")

Data retrieved successfully!


In [40]:
soup_2 = BeautifulSoup(wiki_data_2, "html5lib")

In [41]:
table_2 = soup_2.find_all("table", {'class':'wikitable sortable'})
df_2 = pd.read_html(str(table_2[0]), index_col=None, header=0)[0]
df_2.head()

Unnamed: 0,District Number,Name,District area[n 1] (Ha.),Population,Population density(Hab./Ha.),Location,Administrative wards
0,1.0,Centro,522.82,131928,252.34,,Palacio (11)Embajadores (12)Cortes (13)Justici...
1,2.0,Arganzuela,646.22,151965,235.16,,Imperial (21)Acacias (22)Chopera (23)Legazpi (...
2,3.0,Retiro,546.62,118516,216.82,,Pacífico (31)Adelfas (32)Estrella (33)Ibiza (3...
3,4.0,Salamanca,539.24,143800,266.67,,Recoletos (41)Goya (42)Fuente del Berro (43)Gu...
4,5.0,Chamartín,917.55,143424,156.31,,El Viso (51)Prosperidad (52)Ciudad Jardín (53)...


In [42]:
df_2.shape

(22, 7)

In [43]:
df_2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22 entries, 0 to 21
Data columns (total 7 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   District Number               21 non-null     float64
 1   Name                          22 non-null     object 
 2   District area[n 1] (Ha.)      22 non-null     float64
 3   Population                    22 non-null     int64  
 4   Population density(Hab./Ha.)  22 non-null     float64
 5   Location                      0 non-null      float64
 6   Administrative wards          21 non-null     object 
dtypes: float64(4), int64(1), object(2)
memory usage: 1.3+ KB


In [44]:
# Creating a copy of the DataFrame (a plain assignment would only create a second name for df_2)
df_madrid = df_2.copy()
df_madrid.head()

Unnamed: 0,District Number,Name,District area[n 1] (Ha.),Population,Population density(Hab./Ha.),Location,Administrative wards
0,1.0,Centro,522.82,131928,252.34,,Palacio (11)Embajadores (12)Cortes (13)Justici...
1,2.0,Arganzuela,646.22,151965,235.16,,Imperial (21)Acacias (22)Chopera (23)Legazpi (...
2,3.0,Retiro,546.62,118516,216.82,,Pacífico (31)Adelfas (32)Estrella (33)Ibiza (3...
3,4.0,Salamanca,539.24,143800,266.67,,Recoletos (41)Goya (42)Fuente del Berro (43)Gu...
4,5.0,Chamartín,917.55,143424,156.31,,El Viso (51)Prosperidad (52)Ciudad Jardín (53)...


In [45]:
df_madrid.columns

Index(['District Number', 'Name', 'District area[n 1] (Ha.)', 'Population',
       'Population density(Hab./Ha.)', 'Location', 'Administrative wards'],
      dtype='object')

In [46]:
# Dropping unwanted columns
df_madrid.drop(["District Number", "District area[n 1] (Ha.)", "Population density(Hab./Ha.)", "Location", "Administrative wards"], axis=1, inplace=True)

In [47]:
df_madrid

Unnamed: 0,Name,Population
0,Centro,131928
1,Arganzuela,151965
2,Retiro,118516
3,Salamanca,143800
4,Chamartín,143424
5,Tetuán,153789
6,Chamberí,137401
7,Fuencarral-El Pardo,238756
8,Moncloa-Aravaca,116903
9,Latina,233808


In [48]:
# Without inplace=True this only displays a copy without row 21; df_madrid itself is unchanged
df_madrid.drop(df_madrid.index[[21]])

Unnamed: 0,Name,Population
0,Centro,131928
1,Arganzuela,151965
2,Retiro,118516
3,Salamanca,143800
4,Chamartín,143424
5,Tetuán,153789
6,Chamberí,137401
7,Fuencarral-El Pardo,238756
8,Moncloa-Aravaca,116903
9,Latina,233808


In [49]:
address_2 = 'Madrid'
geolocator_2 = Nominatim(user_agent="madrid_explorer")
location_2 = geolocator_2.geocode(address_2)
print(location_2)

Madrid, Área metropolitana de Madrid y Corredor del Henares, Comunidad de Madrid, 28001, España


In [50]:
latitude_m = location_2.latitude
longitude_m = location_2.longitude
print('The geographical coordinates of Madrid are {}, {}.'.format(latitude_m, longitude_m))

The geographical coordinates of Madrid are 40.4167047, -3.7035825.


In [51]:
df_madrid.rename(columns={'Name': 'Borough'}, inplace=True)

In [52]:
bor_madrid = df_madrid["Borough"].tolist()

In [53]:
city_2 = "Madrid"
lats_2, lngs_2, adds_2 = [], [], []

for borough_2 in bor_madrid:
    address_2 = borough_2 + ", " + city_2
    location_madrid = geolocator_2.geocode(address_2)
    lat_2 = location_madrid.latitude
    lng_2 = location_madrid.longitude
    lats_2.append(lat_2)
    lngs_2.append(lng_2)
    adds_2.append(address_2)
    print(address_2, lat_2, lng_2)

Centro, Madrid 40.417652700000005 -3.7079137662915533
Arganzuela, Madrid 40.3969535 -3.6972891
Retiro, Madrid 40.4111495 -3.6760566
Salamanca, Madrid 40.4270451 -3.6806024
Chamartín, Madrid 40.4589872 -3.6761288
Tetuán, Madrid 40.4605781 -3.6982806
Chamberí, Madrid 40.43624735 -3.7038303534513837
Fuencarral-El Pardo, Madrid 40.55634555 -3.7785905137518054
Moncloa-Aravaca, Madrid 40.43949485 -3.7442035396547055
Latina, Madrid 40.4035317 -3.736152
Carabanchel, Madrid 40.3742112 -3.744676
Usera, Madrid 40.383894 -3.7064459
Puente de Vallecas, Madrid 40.3835532 -3.65453548036571
Moratalaz, Madrid 40.4059332 -3.6448737
Ciudad Lineal, Madrid 40.4484305 -3.650495
Hortaleza, Madrid 40.4725491 -3.6425515
Villaverde, Madrid 40.3456104 -3.6959556
Villa de Vallecas, Madrid 40.3739576 -3.6121632
Vicálvaro, Madrid 40.3965841 -3.5766216
San Blas-Canillejas, Madrid 40.428919050000005 -3.604002428077398
Barajas, Madrid 40.4733176 -3.5798446
TOTAL, Madrid 48.951263 2.4894989391771016
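The loop above also geocodes the `TOTAL` summary row, which lands on a point outside Spain (48.95, 2.49). It would also stop with an `AttributeError` if a query had no match, because geopy's `geocode()` returns `None` in that case. Below is a minimal sketch of a more defensive helper. It assumes any geocoder whose `geocode(query)` returns either `None` or an object with `.latitude`/`.longitude`, as geopy's `Nominatim.geocode` does. The function name `geocode_boroughs` and its `skip` parameter are hypothetical.

```python
import math
from typing import Callable, Iterable, Optional

import pandas as pd


def geocode_boroughs(
    boroughs: Iterable[str],
    city: str,
    geocode: Callable[[str], Optional[object]],
    skip: Iterable[str] = ("TOTAL",),
) -> pd.DataFrame:
    """Hypothetical helper: geocode "<borough>, <city>" for each borough.

    `geocode` is any callable returning None or an object with
    .latitude/.longitude (e.g. geopy's Nominatim(...).geocode).
    Names listed in `skip` (summary rows such as TOTAL) are never queried.
    """
    skip_set = set(skip)
    rows = []
    for borough in boroughs:
        if borough in skip_set:
            continue
        location = geocode(f"{borough}, {city}")
        if location is None:
            # no match: keep the borough but mark its coordinates as missing
            lat, lng = math.nan, math.nan
        else:
            lat, lng = location.latitude, location.longitude
        rows.append({"Borough": borough, "Latitude": lat, "Longitude": lng})
    return pd.DataFrame(rows, columns=["Borough", "Latitude", "Longitude"])
```

With the notebook's objects, the call would be `geocode_boroughs(bor_madrid, "Madrid", geolocator_2.geocode)`. Nominatim's usage policy allows at most one request per second. You can pass geopy's `RateLimiter(geolocator_2.geocode, min_delay_seconds=1)` (from `geopy.extra.rate_limiter`) in place of the bare `geocode` method.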


In [54]:
df_madrid_coord = pd.DataFrame(list(zip(adds_2, lats_2, lngs_2)), columns=["Borough", "Latitude", "Longitude"])

In [55]:
df_madrid_coord

Unnamed: 0,Borough,Latitude,Longitude
0,"Centro, Madrid",40.417653,-3.707914
1,"Arganzuela, Madrid",40.396954,-3.697289
2,"Retiro, Madrid",40.41115,-3.676057
3,"Salamanca, Madrid",40.427045,-3.680602
4,"Chamartín, Madrid",40.458987,-3.676129
5,"Tetuán, Madrid",40.460578,-3.698281
6,"Chamberí, Madrid",40.436247,-3.70383
7,"Fuencarral-El Pardo, Madrid",40.556346,-3.778591
8,"Moncloa-Aravaca, Madrid",40.439495,-3.744204
9,"Latina, Madrid",40.403532,-3.736152


In [56]:
df_madrid["Latitude"] = df_madrid_coord["Latitude"]

In [57]:
df_madrid["Longitude"] = df_madrid_coord["Longitude"]

In [58]:
df_madrid

Unnamed: 0,Borough,Population,Latitude,Longitude
0,Centro,131928,40.417653,-3.707914
1,Arganzuela,151965,40.396954,-3.697289
2,Retiro,118516,40.41115,-3.676057
3,Salamanca,143800,40.427045,-3.680602
4,Chamartín,143424,40.458987,-3.676129
5,Tetuán,153789,40.460578,-3.698281
6,Chamberí,137401,40.436247,-3.70383
7,Fuencarral-El Pardo,238756,40.556346,-3.778591
8,Moncloa-Aravaca,116903,40.439495,-3.744204
9,Latina,233808,40.403532,-3.736152


In [59]:
# drop the TOTAL summary row (index 21) from df_madrid itself
df_madrid.drop(df_madrid.index[[21]], inplace=True)

In [60]:
df_madrid

Unnamed: 0,Borough,Population,Latitude,Longitude
0,Centro,131928,40.417653,-3.707914
1,Arganzuela,151965,40.396954,-3.697289
2,Retiro,118516,40.41115,-3.676057
3,Salamanca,143800,40.427045,-3.680602
4,Chamartín,143424,40.458987,-3.676129
5,Tetuán,153789,40.460578,-3.698281
6,Chamberí,137401,40.436247,-3.70383
7,Fuencarral-El Pardo,238756,40.556346,-3.778591
8,Moncloa-Aravaca,116903,40.439495,-3.744204
9,Latina,233808,40.403532,-3.736152


In [61]:
# create map of Madrid using latitude and longitude values
map_madrid = folium.Map(location=[latitude_m, longitude_m], zoom_start=10)

# add markers to map
for latt, lngg, borough_name in zip(df_madrid['Latitude'], df_madrid['Longitude'], df_madrid['Borough']):
    label_2 = folium.Popup(borough_name, parse_html=True)
    folium.CircleMarker(
        [latt, lngg],
        radius=5,
        popup=label_2,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_madrid)
    
map_madrid

In [62]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (placeholder)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (placeholder)
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET
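Hard-coding the Foursquare client ID and secret in a notebook, and printing them, exposes them to anyone who reads it. Below is a minimal sketch of loading them from environment variables instead. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_foursquare_credentials` and `mask` are hypothetical choices, not something Foursquare prescribes.

```python
import os


def load_foursquare_credentials(
    id_var: str = "FOURSQUARE_CLIENT_ID",
    secret_var: str = "FOURSQUARE_CLIENT_SECRET",
) -> tuple:
    """Read the (hypothetically named) credential variables from the environment."""
    client_id = os.environ.get(id_var)
    client_secret = os.environ.get(secret_var)
    if not client_id or not client_secret:
        raise RuntimeError(f"Set {id_var} and {secret_var} before querying Foursquare")
    return client_id, client_secret


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters, e.g. to confirm which key is loaded."""
    return "*" * max(len(secret) - visible, 0) + secret[-visible:]
```

A notebook cell would then do `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` and print `mask(CLIENT_SECRET)` rather than the secret itself.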


In [63]:
madrid_venues = getNearbyVenues(names=df_madrid['Borough'],
                                latitudes=df_madrid['Latitude'],
                                longitudes=df_madrid['Longitude'])

Centro
Arganzuela
Retiro
Salamanca
Chamartín
Tetuán
Chamberí
Fuencarral-El Pardo
Moncloa-Aravaca
Latina
Carabanchel
Usera
Puente de Vallecas
Moratalaz
Ciudad Lineal
Hortaleza
Villaverde
Villa de Vallecas
Vicálvaro
San Blas-Canillejas
Barajas


In [64]:
print(madrid_venues.shape)
madrid_venues.head()

(576, 7)


Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Centro,40.417653,-3.707914,Plaza de Isabel II,40.418114,-3.709397,Plaza
1,Centro,40.417653,-3.707914,Cerveceria Erte,40.419241,-3.70747,Bar
2,Centro,40.417653,-3.707914,Amorino,40.416065,-3.708383,Ice Cream Shop
3,Centro,40.417653,-3.707914,TOC Hostel,40.417264,-3.705928,Hostel
4,Centro,40.417653,-3.707914,Casa Jaguar,40.419019,-3.708516,Bistro


In [65]:
madrid_venues.groupby('Borough').count()

Unnamed: 0_level_0,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Borough,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Arganzuela,35,35,35,35,35,35
Barajas,42,42,42,42,42,42
Carabanchel,13,13,13,13,13,13
Centro,76,76,76,76,76,76
Chamartín,43,43,43,43,43,43
Chamberí,81,81,81,81,81,81
Ciudad Lineal,29,29,29,29,29,29
Hortaleza,22,22,22,22,22,22
Latina,7,7,7,7,7,7
Moncloa-Aravaca,3,3,3,3,3,3


In [66]:
print('There are {} unique categories.'.format(len(madrid_venues['Venue Category'].unique())))

There are 136 unique categories.


In [67]:
# one hot encoding (dtype=int keeps the 0/1 columns shown below on newer pandas, which default to bool)
madrid_onehot = pd.get_dummies(madrid_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)

# add borough column back to dataframe
madrid_onehot['Borough'] = madrid_venues['Borough']

# move borough column to the first column
fixed_columns_2 = [madrid_onehot.columns[-1]] + list(madrid_onehot.columns[:-1])
madrid_onehot = madrid_onehot[fixed_columns_2]

madrid_onehot.head()

Unnamed: 0,Borough,Accessories Store,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Asian Restaurant,BBQ Joint,Bakery,Bar,...,Student Center,Supermarket,Sushi Restaurant,Tapas Restaurant,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Wine Bar,Women's Store
0,Centro,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Centro,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
2,Centro,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Centro,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Centro,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [68]:
madrid_grouped = madrid_onehot.groupby('Borough').mean().reset_index()
madrid_grouped

Unnamed: 0,Borough,Accessories Store,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Asian Restaurant,BBQ Joint,Bakery,Bar,...,Student Center,Supermarket,Sushi Restaurant,Tapas Restaurant,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Wine Bar,Women's Store
0,Arganzuela,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.057143,0.057143,...,0.0,0.0,0.0,0.114286,0.0,0.0,0.0,0.0,0.0,0.0
1,Barajas,0.0,0.0,0.0,0.047619,0.0,0.02381,0.0,0.0,0.0,...,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.02381,0.0
2,Carabanchel,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.076923,0.0,0.153846,0.0,0.0,0.0,0.0,0.0,0.0
3,Centro,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,...,0.0,0.0,0.0,0.039474,0.0,0.026316,0.0,0.0,0.013158,0.0
4,Chamartín,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.023256,0.046512,...,0.0,0.046512,0.023256,0.046512,0.0,0.0,0.0,0.0,0.0,0.0
5,Chamberí,0.0,0.012346,0.0,0.0,0.0,0.012346,0.012346,0.037037,0.098765,...,0.0,0.012346,0.012346,0.111111,0.0,0.049383,0.0,0.012346,0.012346,0.0
6,Ciudad Lineal,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.034483,0.0,...,0.0,0.068966,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0
7,Hortaleza,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,...,0.0,0.090909,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.045455
8,Latina,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,...,0.142857,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Moncloa-Aravaca,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
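The one-hot encoding followed by `groupby(...).mean()` turns raw venue rows into one row per borough. Each category column then holds the share of that borough's returned venues that fall in the category. A toy example with made-up venue rows (not Foursquare data) shows the arithmetic: a borough with two bars and one café gets Bar = 2/3 and Café = 1/3.

```python
import pandas as pd

# Made-up venue rows, shaped like madrid_venues (one row per venue)
venues = pd.DataFrame({
    "Borough": ["A", "A", "A", "B", "B"],
    "Venue Category": ["Bar", "Bar", "Café", "Café", "Plaza"],
})

# One 0/1 column per category: 1 if the venue belongs to that category
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
onehot.insert(0, "Borough", venues["Borough"])

# Mean of the indicator columns per borough = relative frequency of each category
grouped = onehot.groupby("Borough").mean().reset_index()
print(grouped)
```

Because each row of `grouped` sums to 1, boroughs with very different venue counts (Chamberí returned 81 venues, Moncloa-Aravaca only 3) become comparable. The frequencies for small boroughs, however, rest on very few venues.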


In [69]:
num_top_venues_madrid = 10  # the output below lists the 10 most frequent categories per borough

for hood in madrid_grouped['Borough']:
    print("----"+hood+"----")
    temp_madrid = madrid_grouped[madrid_grouped['Borough'] == hood].T.reset_index()
    temp_madrid.columns = ['venue','freq']
    temp_madrid = temp_madrid.iloc[1:]
    temp_madrid['freq'] = temp_madrid['freq'].astype(float)
    temp_madrid = temp_madrid.round({'freq': 2})
    print(temp_madrid.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues_madrid))
    print('\n')

----Arganzuela----
                      venue  freq
0          Tapas Restaurant  0.11
1                    Bakery  0.06
2        Spanish Restaurant  0.06
3  Mediterranean Restaurant  0.06
4               Beer Garden  0.06
5                       Bar  0.06
6                Restaurant  0.06
7                    Market  0.06
8                       Gym  0.03
9      Gym / Fitness Center  0.03


----Barajas----
                    venue  freq
0                   Hotel  0.24
1      Spanish Restaurant  0.12
2              Restaurant  0.07
3  Argentinian Restaurant  0.05
4        Tapas Restaurant  0.05
5             Pizza Place  0.02
6          Boarding House  0.02
7          Sandwich Place  0.02
8                   Plaza  0.02
9                 Brewery  0.02


----Carabanchel----
                venue  freq
0    Tapas Restaurant  0.15
1          Restaurant  0.15
2  Spanish Restaurant  0.15
3       Metro Station  0.08
4                Café  0.08
5               Hotel  0.08
6   Food & Drink Sh

In [70]:
num_top_venues_madrid = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Borough']
for ind in np.arange(num_top_venues_madrid):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # 4th, 5th, ... fall back to the "th" suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
borough_venues_sorted_madrid = pd.DataFrame(columns=columns)
borough_venues_sorted_madrid['Borough'] = madrid_grouped['Borough']

for ind in np.arange(madrid_grouped.shape[0]):
    borough_venues_sorted_madrid.iloc[ind, 1:] = return_most_common_venues(madrid_grouped.iloc[ind, :], num_top_venues_madrid)

borough_venues_sorted_madrid.head()

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Arganzuela,Tapas Restaurant,Restaurant,Mediterranean Restaurant,Market,Beer Garden,Spanish Restaurant,Bakery,Bar,Burger Joint,Italian Restaurant
1,Barajas,Hotel,Spanish Restaurant,Restaurant,Argentinian Restaurant,Tapas Restaurant,Himalayan Restaurant,Brewery,Plaza,Pizza Place,Café
2,Carabanchel,Spanish Restaurant,Tapas Restaurant,Restaurant,Candy Store,Hotel,Food & Drink Shop,Metro Station,Café,Grocery Store,Supermarket
3,Centro,Plaza,Spanish Restaurant,Hotel,Gourmet Shop,Bookstore,Hostel,Tapas Restaurant,Restaurant,Department Store,Mexican Restaurant
4,Chamartín,Restaurant,Spanish Restaurant,Mediterranean Restaurant,Grocery Store,Gym,Tapas Restaurant,Plaza,Supermarket,Cocktail Bar,Bar
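`return_most_common_venues` is defined earlier in the notebook and is not shown in this section. Below is a minimal sketch of a helper that behaves the way it is used here. It assumes the input is one row of `madrid_grouped` (borough name first, then category frequencies) and that the output is the top-N category names in descending order of frequency. It is named `top_categories` to avoid implying it is the notebook's own code. The hypothetical `ordinal` helper generalizes the `st`/`nd`/`rd`/`th` column naming beyond ten columns.

```python
import pandas as pd

SUFFIXES = {1: "st", 2: "nd", 3: "rd"}


def top_categories(row: pd.Series, n: int) -> list:
    """Return the n category names with the highest frequency in `row`.

    Assumes the first entry is the borough name and the remaining entries
    are numeric frequencies, like a row of madrid_grouped.
    """
    freqs = row.iloc[1:].astype(float)
    return list(freqs.sort_values(ascending=False).index[:n])


def ordinal(k: int) -> str:
    """1 -> '1st', 2 -> '2nd', 11 -> '11th', 23 -> '23rd'."""
    suffix = "th" if 10 <= k % 100 <= 20 else SUFFIXES.get(k % 10, "th")
    return f"{k}{suffix}"
```

For example, `['Borough'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]` rebuilds the same column list without a `try`/`except`. Ties in frequency (common when a borough returns few venues) are ordered arbitrarily, so the lower ranks for boroughs like Latina or Moncloa-Aravaca should not be over-interpreted.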


In [71]:
madrid_rich = ('Salamanca', 'Retiro', 'Chamberí', 'Centro', 'Chamartín')

In [72]:
df_madrid_rich_venues = borough_venues_sorted_madrid.loc[borough_venues_sorted_madrid['Borough'].isin(madrid_rich)]
df_madrid_rich_venues

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Centro,Plaza,Spanish Restaurant,Hotel,Gourmet Shop,Bookstore,Hostel,Tapas Restaurant,Restaurant,Department Store,Mexican Restaurant
4,Chamartín,Restaurant,Spanish Restaurant,Mediterranean Restaurant,Grocery Store,Gym,Tapas Restaurant,Plaza,Supermarket,Cocktail Bar,Bar
5,Chamberí,Spanish Restaurant,Tapas Restaurant,Bar,Café,Restaurant,Theater,Bakery,Plaza,Mediterranean Restaurant,Beer Bar
12,Retiro,Spanish Restaurant,Plaza,Garden,Supermarket,Dog Run,Diner,Jazz Club,Dessert Shop,Board Shop,Pizza Place
13,Salamanca,Restaurant,Spanish Restaurant,Tapas Restaurant,Furniture / Home Store,Italian Restaurant,Burger Joint,Mediterranean Restaurant,Bakery,Ice Cream Shop,Café


### Comparing the most common venues in the richest boroughs of London and Madrid

In [73]:
df_london_rich_venues

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Camden,Pub,Coffee Shop,Café,Burger Joint,Italian Restaurant,Ice Cream Shop,Beer Bar,Vegetarian / Vegan Restaurant,Caribbean Restaurant,Vietnamese Restaurant
10,Hackney,Coffee Shop,Pub,Café,Supermarket,Brewery,Flea Market,Beer Store,Sporting Goods Shop,Boutique,Yoga Studio
11,Hammersmith and Fulham,Café,Pub,Coffee Shop,Hotel,Gym / Fitness Center,Grocery Store,Sandwich Place,Thai Restaurant,Breakfast Spot,Portuguese Restaurant
18,Kensington and Chelsea,Café,Pub,Italian Restaurant,Persian Restaurant,Burger Joint,Clothing Store,Supermarket,Breakfast Spot,Mediterranean Restaurant,Filipino Restaurant
31,Westminster,Coffee Shop,Pub,Sandwich Place,Historic Site,Outdoor Sculpture,Plaza,Café,Monument / Landmark,Hotel,Garden
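The comparison can be made more concrete by counting, for each city, how many of the five rich boroughs have a given category among their top venues. The sketch below uses only the first two "Most Common Venue" columns, copied from the London and Madrid tables above. "Pub" is in the top two in all five London boroughs, and "Spanish Restaurant" in all five Madrid boroughs. The helper name `count_in_top` is hypothetical.

```python
import pandas as pd

# 1st and 2nd most common venues, copied from the London and Madrid tables above
london_top2 = pd.DataFrame({
    "Borough": ["Camden", "Hackney", "Hammersmith and Fulham",
                "Kensington and Chelsea", "Westminster"],
    "1st": ["Pub", "Coffee Shop", "Café", "Café", "Coffee Shop"],
    "2nd": ["Coffee Shop", "Pub", "Pub", "Pub", "Pub"],
})
madrid_top2 = pd.DataFrame({
    "Borough": ["Centro", "Chamartín", "Chamberí", "Retiro", "Salamanca"],
    "1st": ["Plaza", "Restaurant", "Spanish Restaurant", "Spanish Restaurant", "Restaurant"],
    "2nd": ["Spanish Restaurant", "Spanish Restaurant", "Tapas Restaurant", "Plaza",
            "Spanish Restaurant"],
})


def count_in_top(df: pd.DataFrame, category: str) -> int:
    """Number of boroughs (rows) whose ranked venue columns contain `category`."""
    ranked = df.drop(columns="Borough")
    return int((ranked == category).any(axis=1).sum())


print(count_in_top(london_top2, "Pub"))                 # 5
print(count_in_top(madrid_top2, "Spanish Restaurant"))  # 5
```

The same function applies unchanged to `df_london_rich_venues` and `df_madrid_rich_venues`, which have a `Borough` column followed by the ten ranked columns.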


### Machine Learning to cluster the venues

#### Clustering London by K-Means

In [94]:
# set number of clusters
kclusters_london = 5

# cluster only the five richest boroughs
london_grouped_clustering = london_grouped.loc[london_grouped['Borough'].isin(london_rich)]
london_grouped_clustering = london_grouped_clustering.drop(columns='Borough')

# run k-means clustering
kmeans_london = KMeans(n_clusters=kclusters_london, random_state=0).fit(london_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans_london.labels_[0:100] 

array([4, 1, 2, 3, 0], dtype=int32)
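With only the five rich London boroughs as input, `n_clusters=5` effectively forces each borough into its own cluster, which is what the labels `[4, 1, 2, 3, 0]` show. The clustering then carries no grouping information. One common way to pick a smaller k is the silhouette score, which scikit-learn defines only for 2 ≤ k ≤ n_samples - 1. With five boroughs, that leaves k = 2, 3, 4 as candidates. Below is a minimal sketch on made-up two-dimensional points (stand-ins for borough feature vectors), assuming scikit-learn's `KMeans` and `silhouette_score`. `best_k_by_silhouette` is a hypothetical helper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(X: np.ndarray, k_values, random_state: int = 0):
    """Return (best_k, scores), where scores maps each k to its silhouette score."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Made-up points with two obvious groups
X = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1], [5.1, 5.0]])
best_k, scores = best_k_by_silhouette(X, range(2, len(X)))  # tries k = 2, 3, 4
print(best_k, scores)
```

On real data, `london_grouped_clustering.to_numpy()` would take the place of `X`. With so few boroughs, any silhouette comparison is fragile, so it is best read as a sanity check rather than a definitive choice of k.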

Creating a df to include the clusters and the top 10 venues for each Borough

In [95]:
kmeans_london

KMeans(n_clusters=5, random_state=0)

In [101]:
#borough_venues_sorted.insert(0, 'Cluster Labels', kmeans_london.labels_)

In [102]:
borough_venues_sorted

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Barking and Dagenham,Bus Stop,Liquor Store,Grocery Store,Yoga Studio,Falafel Restaurant,Food Court,Food & Drink Shop,Flea Market,Fish Market,Fish & Chips Shop
1,Barnet,Coffee Shop,Pharmacy,Park,Restaurant,Pizza Place,Convenience Store,Pub,Bookstore,Fast Food Restaurant,Bakery
2,Bexley,Pub,Fast Food Restaurant,Greek Restaurant,Chinese Restaurant,Toy / Game Store,Train Station,Italian Restaurant,Tennis Court,Indian Restaurant,Breakfast Spot
3,Brent,Coffee Shop,Hotel,Supermarket,Pedestrian Plaza,Electronics Store,Café,Bus Stop,Food Court,Burger Joint,Sports Bar
4,Bromley,Clothing Store,Coffee Shop,Burger Joint,Pub,Pizza Place,Portuguese Restaurant,Park,Chocolate Shop,Gelato Shop,Stationery Store
5,Camden,Pub,Coffee Shop,Café,Burger Joint,Italian Restaurant,Ice Cream Shop,Beer Bar,Vegetarian / Vegan Restaurant,Caribbean Restaurant,Vietnamese Restaurant
6,Croydon,Pub,Coffee Shop,Spanish Restaurant,Burger Joint,Caribbean Restaurant,Mediterranean Restaurant,Furniture / Home Store,Malay Restaurant,Bookstore,Gaming Cafe
7,Ealing,Coffee Shop,Pub,Clothing Store,Grocery Store,Italian Restaurant,Café,Park,Bakery,Burger Joint,Bus Stop
8,Enfield,Clothing Store,Coffee Shop,Supermarket,Optical Shop,Pub,Café,Video Game Store,Shopping Mall,Bookstore,Gift Shop
9,Greenwich,Pub,Boat or Ferry,Burger Joint,Pizza Place,Garden,Bakery,Market,History Museum,Grocery Store,Pier


In [103]:
# Dropping NaN to prevent errors
#london_merged_clean = london_merged.dropna(subset=['Cluster Labels'])

In [104]:
london_merged = df_london

london_merged = london_merged.join(borough_venues_sorted.set_index('Borough'), on='Borough')

london_merged.head()

Unnamed: 0,Borough,Area (sq mi),Population (2019),Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Barking and Dagenham,13.93,212906,51.554117,0.150504,Bus Stop,Liquor Store,Grocery Store,Yoga Studio,Falafel Restaurant,Food Court,Food & Drink Shop,Flea Market,Fish Market,Fish & Chips Shop
1,Barnet,33.49,395896,51.65309,-0.200226,Coffee Shop,Pharmacy,Park,Restaurant,Pizza Place,Convenience Store,Pub,Bookstore,Fast Food Restaurant,Bakery
2,Bexley,23.38,248287,51.441679,0.150488,Pub,Fast Food Restaurant,Greek Restaurant,Chinese Restaurant,Toy / Game Store,Train Station,Italian Restaurant,Tennis Court,Indian Restaurant,Breakfast Spot
3,Brent,16.7,329771,51.563826,-0.27576,Coffee Shop,Hotel,Supermarket,Pedestrian Plaza,Electronics Store,Café,Bus Stop,Food Court,Burger Joint,Sports Bar
4,Bromley,57.97,332336,51.402805,0.014814,Clothing Store,Coffee Shop,Burger Joint,Pub,Pizza Place,Portuguese Restaurant,Park,Chocolate Shop,Gelato Shop,Stationery Store


In [80]:
# Dropping NaN to prevent errors
#london_merged_clean = london_merged.dropna(subset=['Cluster Labels'])

In [81]:
# Keeping only the richest areas
london_rich_clean = london_merged.loc[london_merged['Borough'].isin(london_rich)]

In [82]:
london_rich_clean

Unnamed: 0,Borough,Area (sq mi),Population (2019),Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Camden,8.4,270029,51.542305,-0.13956,Pub,Coffee Shop,Café,Burger Joint,Italian Restaurant,Ice Cream Shop,Beer Bar,Vegetarian / Vegan Restaurant,Caribbean Restaurant,Vietnamese Restaurant
10,Hackney,7.36,281120,51.54324,-0.049362,Coffee Shop,Pub,Café,Supermarket,Brewery,Flea Market,Beer Store,Sporting Goods Shop,Boutique,Yoga Studio
11,Hammersmith and Fulham,6.33,185143,51.492038,-0.22364,Café,Pub,Coffee Shop,Hotel,Gym / Fitness Center,Grocery Store,Sandwich Place,Thai Restaurant,Breakfast Spot,Portuguese Restaurant
18,Kensington and Chelsea,4.68,156129,51.49848,-0.199043,Café,Pub,Italian Restaurant,Persian Restaurant,Burger Joint,Clothing Store,Supermarket,Breakfast Spot,Mediterranean Restaurant,Filipino Restaurant
31,Westminster,8.29,261317,51.500444,-0.12654,Coffee Shop,Pub,Sandwich Place,Historic Site,Outdoor Sculpture,Plaza,Café,Monument / Landmark,Hotel,Garden


In [83]:
import matplotlib.cm as cm
import matplotlib.colors as colors

In [107]:
# create map
map_clusters_london = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters_london)
ys = [i + x + (i*x)**2 for i in range(kclusters_london)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

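# add markers to the map
markers_colors = []
# map each rich borough to its k-means label; london_grouped is sorted by Borough,
# so the rows used for clustering appear in the same order as kmeans_london.labels_
rich_boroughs = london_grouped.loc[london_grouped['Borough'].isin(london_rich), 'Borough']
london_labels = dict(zip(rich_boroughs, kmeans_london.labels_))

for lat, lon, poi in zip(london_rich_clean['Latitude'], london_rich_clean['Longitude'], london_rich_clean['Borough']):
    cluster = london_labels[poi]
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters_london)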
       
map_clusters_london
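KMeans numbers clusters from 0 to k-1, so a cluster's colour can be looked up directly in a palette of k colours with no offset. Below is a minimal sketch of building such a palette, assuming matplotlib's `rainbow` colormap as in the map cells. `cluster_colors` is a hypothetical helper name.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors


def cluster_colors(k: int) -> list:
    """One hex colour per cluster label 0..k-1, sampled evenly from the rainbow colormap."""
    rgba = cm.rainbow(np.linspace(0, 1, k))
    return [colors.rgb2hex(c) for c in rgba]


palette = cluster_colors(5)
# label 0 -> palette[0], ..., label 4 -> palette[4]
```

Inside the marker loop, `color=palette[int(cluster)]` then gives the same colour to every borough in a cluster.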

### Examining Clusters in London

In [121]:
# Cluster 1:
london_rich_clean.loc[london_merged['Cluster Labels'] == 0, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Area (sq mi),Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,8.4,0,Pub,Coffee Shop,Café,Burger Joint,Italian Restaurant,Ice Cream Shop,Beer Bar,Vegetarian / Vegan Restaurant,Caribbean Restaurant,Vietnamese Restaurant
10,7.36,0,Coffee Shop,Pub,Café,Supermarket,Brewery,Flea Market,Beer Store,Sporting Goods Shop,Boutique,Yoga Studio
11,6.33,0,Café,Pub,Coffee Shop,Hotel,Gym / Fitness Center,Grocery Store,Sandwich Place,Thai Restaurant,Breakfast Spot,Portuguese Restaurant
18,4.68,0,Café,Pub,Italian Restaurant,Persian Restaurant,Burger Joint,Clothing Store,Supermarket,Breakfast Spot,Mediterranean Restaurant,Filipino Restaurant
31,8.29,0,Coffee Shop,Pub,Sandwich Place,Historic Site,Outdoor Sculpture,Plaza,Café,Monument / Landmark,Hotel,Garden


In [122]:
# Cluster 2:
london_rich_clean.loc[london_merged['Cluster Labels'] == 1, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Area (sq mi),Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


In [123]:
# Cluster 3:
london_rich_clean.loc[london_merged['Cluster Labels'] == 2, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Area (sq mi),Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


In [124]:
# Cluster 4:
london_rich_clean.loc[london_merged['Cluster Labels'] == 3, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Area (sq mi),Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


In [125]:
# Cluster 5:
london_rich_clean.loc[london_merged['Cluster Labels'] == 4, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Area (sq mi),Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


#### Clustering Madrid by K-Means

In [85]:
# set number of clusters
kclusters_madrid = 5

madrid_grouped_clustering = madrid_grouped.drop(columns='Borough')

# run k-means clustering
kmeans_madrid = KMeans(n_clusters=kclusters_madrid, random_state=0).fit(madrid_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans_madrid.labels_[0:10] 

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 3], dtype=int32)

In [86]:
kmeans_madrid

KMeans(n_clusters=5, random_state=0)

In [87]:
borough_venues_sorted_madrid.insert(0, 'Cluster Labels', kmeans_madrid.labels_)

In [88]:
madrid_merged = df_madrid

madrid_merged = madrid_merged.join(borough_venues_sorted_madrid.set_index('Borough'), on='Borough')

madrid_merged.head()

Unnamed: 0,Borough,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Centro,131928,40.417653,-3.707914,0.0,Plaza,Spanish Restaurant,Hotel,Gourmet Shop,Bookstore,Hostel,Tapas Restaurant,Restaurant,Department Store,Mexican Restaurant
1,Arganzuela,151965,40.396954,-3.697289,0.0,Tapas Restaurant,Restaurant,Mediterranean Restaurant,Market,Beer Garden,Spanish Restaurant,Bakery,Bar,Burger Joint,Italian Restaurant
2,Retiro,118516,40.41115,-3.676057,0.0,Spanish Restaurant,Plaza,Garden,Supermarket,Dog Run,Diner,Jazz Club,Dessert Shop,Board Shop,Pizza Place
3,Salamanca,143800,40.427045,-3.680602,0.0,Restaurant,Spanish Restaurant,Tapas Restaurant,Furniture / Home Store,Italian Restaurant,Burger Joint,Mediterranean Restaurant,Bakery,Ice Cream Shop,Café
4,Chamartín,143424,40.458987,-3.676129,0.0,Restaurant,Spanish Restaurant,Mediterranean Restaurant,Grocery Store,Gym,Tapas Restaurant,Plaza,Supermarket,Cocktail Bar,Bar


In [89]:
# Dropping NaN to prevent errors
madrid_merged_clean = madrid_merged.dropna(subset=['Cluster Labels'])

In [90]:
# Keeping only the richest areas
madrid_rich_clean = madrid_merged_clean.loc[madrid_merged_clean['Borough'].isin(madrid_rich)]

In [91]:
# Matplotlib and plotting module associated
import matplotlib.cm as cm
import matplotlib.colors as colors

In [92]:
madrid_rich_clean

Unnamed: 0,Borough,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Centro,131928,40.417653,-3.707914,0.0,Plaza,Spanish Restaurant,Hotel,Gourmet Shop,Bookstore,Hostel,Tapas Restaurant,Restaurant,Department Store,Mexican Restaurant
2,Retiro,118516,40.41115,-3.676057,0.0,Spanish Restaurant,Plaza,Garden,Supermarket,Dog Run,Diner,Jazz Club,Dessert Shop,Board Shop,Pizza Place
3,Salamanca,143800,40.427045,-3.680602,0.0,Restaurant,Spanish Restaurant,Tapas Restaurant,Furniture / Home Store,Italian Restaurant,Burger Joint,Mediterranean Restaurant,Bakery,Ice Cream Shop,Café
4,Chamartín,143424,40.458987,-3.676129,0.0,Restaurant,Spanish Restaurant,Mediterranean Restaurant,Grocery Store,Gym,Tapas Restaurant,Plaza,Supermarket,Cocktail Bar,Bar
6,Chamberí,137401,40.436247,-3.70383,0.0,Spanish Restaurant,Tapas Restaurant,Bar,Café,Restaurant,Theater,Bakery,Plaza,Mediterranean Restaurant,Beer Bar


In [93]:
# create map
map_clusters_madrid = folium.Map(location=[latitude_m, longitude_m], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters_madrid)
ys = [i + x + (i*x)**2 for i in range(kclusters_madrid)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(madrid_rich_clean['Latitude'], madrid_rich_clean['Longitude'], madrid_rich_clean['Borough'], madrid_rich_clean['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters_madrid)
       
map_clusters_madrid

### Examining Clusters in Madrid

In [100]:
# Cluster 1:
madrid_rich_clean.loc[madrid_merged['Cluster Labels'] == 0, madrid_merged.columns[[1] + list(range(5, madrid_merged.shape[1]))]]

Unnamed: 0,Population,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,131928,Plaza,Spanish Restaurant,Hotel,Gourmet Shop,Bookstore,Hostel,Tapas Restaurant,Restaurant,Department Store,Mexican Restaurant
2,118516,Spanish Restaurant,Plaza,Garden,Supermarket,Dog Run,Diner,Jazz Club,Dessert Shop,Board Shop,Pizza Place
3,143800,Restaurant,Spanish Restaurant,Tapas Restaurant,Furniture / Home Store,Italian Restaurant,Burger Joint,Mediterranean Restaurant,Bakery,Ice Cream Shop,Café
4,143424,Restaurant,Spanish Restaurant,Mediterranean Restaurant,Grocery Store,Gym,Tapas Restaurant,Plaza,Supermarket,Cocktail Bar,Bar
6,137401,Spanish Restaurant,Tapas Restaurant,Bar,Café,Restaurant,Theater,Bakery,Plaza,Mediterranean Restaurant,Beer Bar


In [101]:
# Cluster 2:
madrid_rich_clean.loc[madrid_merged['Cluster Labels'] == 1, madrid_merged.columns[[1] + list(range(5, madrid_merged.shape[1]))]]

Unnamed: 0,Population,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


In [102]:
# Cluster 3:
madrid_rich_clean.loc[madrid_merged['Cluster Labels'] == 2, madrid_merged.columns[[1] + list(range(5, madrid_merged.shape[1]))]]

Unnamed: 0,Population,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


In [103]:
# Cluster 4:
madrid_rich_clean.loc[madrid_merged['Cluster Labels'] == 3, madrid_merged.columns[[1] + list(range(5, madrid_merged.shape[1]))]]

Unnamed: 0,Population,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


In [104]:
# Cluster 5:
madrid_rich_clean.loc[madrid_merged['Cluster Labels'] == 4, madrid_merged.columns[[1] + list(range(5, madrid_merged.shape[1]))]]

Unnamed: 0,Population,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
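Writing one cell per label makes it easy to query a label that does not exist (with `n_clusters=5`, KMeans labels run from 0 to 4). Below is a minimal sketch that loops over the labels actually present instead. It is shown on a small made-up frame with the same `Borough` and `Cluster Labels` columns, and `boroughs_by_cluster` is a hypothetical name.

```python
import pandas as pd


def boroughs_by_cluster(df: pd.DataFrame, label_col: str = "Cluster Labels") -> dict:
    """Map each cluster label present in `df` to the list of boroughs carrying it."""
    return {
        int(label): group["Borough"].tolist()
        for label, group in df.groupby(label_col)
    }


toy = pd.DataFrame({
    "Borough": ["A", "B", "C", "D"],
    "Cluster Labels": [0.0, 0.0, 3.0, 1.0],  # floats, as in madrid_merged after the join
})
print(boroughs_by_cluster(toy))  # {0: ['A', 'B'], 1: ['D'], 3: ['C']}
```

Applied to `madrid_rich_clean`, this would return a single entry for label 0, matching the cluster tables above in which all five rich Madrid boroughs fall in the first cluster.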


## 6. Results

In which city should the sushi restaurant be opened?

London is highly multicultural: it is home to people whose origins span the whole world. Madrid also has many foreign residents, but most of them are probably European. Even though the research targets only the five most exclusive boroughs of each city, the venue data show that London offers a wide range of cuisines, both Asian and European. Thai, Vietnamese, Filipino, Persian and Italian restaurants all appear among the top 10 venues of its richest boroughs. Madrid, by contrast, remains more attached to European styles, Spanish above all.

A second point must also be highlighted. Even in the richest areas, the top venues in London are pubs and coffee shops (a pub is the first or second most common venue in all five boroughs), while in Madrid the top spots go to traditional restaurants (a Spanish restaurant is first or second in all five). This suggests that people in Madrid have a different lifestyle, one that appears more receptive to a sushi restaurant, since such places are known not for beer and casual food but for a sophisticated cuisine. In addition, since London is already full of restaurants of every kind, a sushi restaurant risks becoming just one of many, while in Madrid it could become a landmark.

For these reasons, *__Madrid should be chosen to open the sushi restaurant__* if the goal is for the restaurant to become a landmark.