## IBM Data Science Professional Certificate
### Capstone Project - The Battle of Neighborhoods

Damien Azzopardi - July 2021

<h2>Table of Contents</h2>
<br>
<ol>
    <li><a href="#introduction"><b>Introduction</b></a></li>
<br>
    <li><a href="#data"><b>Data</b></a>
        <ul>
            <li><a href="#neighborhoods_and_coordinates">Neighborhoods and coordinates</a></li>
            <li><a href="#districts">Districts</a></li>
            <li><a href="#venues">Venues</a></li>
        </ul>
    </li>
<br>
    <li><a href="#data_manipulation"><b>Data Extraction & Manipulation</b></a>
        <ul>
            <li><a href="#scrap_neighborhoods">Scrape Barcelona's neighborhoods and coordinates</a></li>
            <li><a href="#import_discticts">Import Barcelona's districts</a></li>
            <li><a href="#map_neighborhoods">Map Barcelona's neighborhoods</a></li>
        </ul>
    </li>
<br>
    <li><a href="#analysis"><b>Finding the best neighborhood in Barcelona using K-Means</b></a></li>
<br>
    <li><a href="#conclusionn"><b>Conclusion</b></a></li>
</ol>

<h2 id="introduction">Introduction</h2>

**The Green Alternative** is a group of vegetarian restaurants, which started operating in Madrid, Spain, in 2010. We are currently running six different restaurants across different neighborhoods in Madrid, oriented towards locals. As our group is becoming successful in the Spanish capital, this year, we would like to expand our operations and open a vegetarian restaurant in Barcelona.

The question we are trying to answer is **what is the best neighborhood to open a vegetarian restaurant in Barcelona?**

After conducting market research and reviewing the data collected from our six current restaurants in Madrid, we found that our most successful locations are in neighborhoods which, **in order of priority**:
1. Are close to a **metro** or **train station**, where the flow of people is high.
2. Have a **gym** close by, as most of our customers come for lunch or dinner after training at the gym.
3. Have a **park** or **garden** close by, where our customers like to have lunch.

Knowing this, we'll use Foursquare location data to calculate the density of metro and train stations, gyms, parks and gardens for each neighborhood in Barcelona, and pick the neighborhood with the highest density of these venues to open our first vegetarian restaurant in the city.

<h2 id="data">Data</h2>

The data we will be using to help us answer our question comes from the following sources.

<h3 id="neighborhoods_and_coordinates">Neighborhoods and coordinates</h3>

<h4>Metabolism of Cities</h4>

The full list of Barcelona's neighborhoods, along with their corresponding coordinates, is available on [this](https://data.metabolismofcities.org/library/maps/577245/view/) page (*metabolismofcities.org*). It consists of a table with two columns, **Neighborhoods** and **Coordinates**. We will scrape this table directly in this workbook.


<h3 id="districts">Districts</h3>

<h4>Wikipedia</h4>

The full list of Barcelona's districts, along with their corresponding neighborhoods, is available on [this](https://en.wikipedia.org/wiki/Districts_of_Barcelona) page (*wikipedia.org*). We will export a CSV containing two columns, **Districts** and **Neighborhoods**, read it directly in this workbook, and join it with the first dataset containing the **Neighborhoods** and **Coordinates**.


<h3 id="venues">Venues</h3>

<h4>Foursquare</h4>

We will use Foursquare location data to calculate the density of the venue categories selected for the analysis, and join it with the first two datasets containing the **Districts**, **Neighborhoods** and **Coordinates**.

<h2 id="data_manipulation">Data Extraction & Manipulation</h2>

In [1]:
# load libraries
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim
from pandas import json_normalize  # the pandas.io.json import path is deprecated
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium

<h3 id="scrap_neighborhoods">Scrape Barcelona's neighborhoods and coordinates</h3>

The full list of Barcelona's neighborhoods, along with their corresponding coordinates is available in [this](https://data.metabolismofcities.org/library/maps/577245/view/) page (*metabolismofcities.org*).

In [2]:
# scrape Barcelona's neighborhoods and coordinates table
url = 'https://data.metabolismofcities.org/library/maps/577245/view/'

r = requests.get(url)
html = r.text

soup = BeautifulSoup(html, 'lxml')
table = soup.find('table')
rows = table.find_all('tr')
data = []
for row in rows[1:]:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])

# convert to dataframe
df = pd.DataFrame(data)

# rename columns
df.columns = ['Neighborhood', 'Coordinates']

df.head()

Unnamed: 0,Neighborhood,Coordinates
0,Baró de Viver,"[41.44581467347341, 2.19899775842406]"
1,Can Baró,"[41.4167603624773, 2.1623865539676492]"
2,Can Peguera,"[41.43484212038238, 2.1664501320817235]"
3,Canyelles,"[41.445032990983854, 2.1634504252403164]"
4,Ciutat Meridiana,"[41.46120773644666, 2.1748476502321963]"
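
The scraping step above can be tried offline on a small HTML fragment. The sketch below is illustrative only: `SAMPLE_HTML` and `parse_table` are hypothetical names, the fragment merely imitates a two-column table, and the built-in `html.parser` is used so that no extra parser needs to be installed.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for the real page
SAMPLE_HTML = """
<table>
  <tr><th>Neighborhood</th><th>Coordinates</th></tr>
  <tr><td>Baró de Viver</td><td>[41.44581467347341, 2.19899775842406]</td></tr>
  <tr><td>Can Baró</td><td>[41.4167603624773, 2.1623865539676492]</td></tr>
</table>
"""

def parse_table(html):
    """Return the rows of the first table as a two-column DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> element found")
    records = []
    for row in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) == 2:  # the header row has only <th> cells, so it is skipped
            records.append(cells)
    return pd.DataFrame(records, columns=["Neighborhood", "Coordinates"])

sample_df = parse_table(SAMPLE_HTML)
print(sample_df)
```

Checking for a missing table before iterating gives a clearer error than an `AttributeError` if the page layout ever changes.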


Split the 'Coordinates' column into two separate 'Latitude' and 'Longitude' columns.

In [3]:
# split the 'Coordinates' column
df[['Latitude','Longitude']] = df.Coordinates.str.split(', ', expand = True)

# drop the 'Coordinates' column
df_bcn_neighborhoods = df.drop(['Coordinates'], axis = 1)

df_bcn_neighborhoods.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Baró de Viver,[41.44581467347341,2.19899775842406]
1,Can Baró,[41.4167603624773,2.1623865539676492]
2,Can Peguera,[41.43484212038238,2.1664501320817235]
3,Canyelles,[41.445032990983854,2.1634504252403164]
4,Ciutat Meridiana,[41.46120773644666,2.1748476502321963]


Remove the special characters in the 'Latitude' and 'Longitude' columns and check the column types.

In [4]:
# special characters to remove from the dataframe
spec_chars = ["[","]"]

# removing special characters from the 'Latitude' column
# (regex=False treats "[" and "]" as literal characters, not regex syntax)
for char in spec_chars:
    df_bcn_neighborhoods['Latitude'] = df_bcn_neighborhoods['Latitude'].str.replace(char,'', regex=False)

# removing special characters from the 'Longitude' column
for char in spec_chars:
    df_bcn_neighborhoods['Longitude'] = df_bcn_neighborhoods['Longitude'].str.replace(char,'', regex=False)

# check column type
df_bcn_neighborhoods.dtypes

Neighborhood    object
Latitude        object
Longitude       object
dtype: object

Convert the 'Latitude' and 'Longitude' columns to **float** so they can be used properly in the following steps.

In [5]:
# change column type 
df_bcn_neighborhoods = df_bcn_neighborhoods.astype({"Neighborhood": str, "Latitude": float, "Longitude": float})

# check column type
df_bcn_neighborhoods.dtypes

Neighborhood     object
Latitude        float64
Longitude       float64
dtype: object

Check the final dataset containing each neighborhood along with its corresponding latitude and longitude.

In [6]:
df_bcn_neighborhoods.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Baró de Viver,41.445815,2.198998
1,Can Baró,41.41676,2.162387
2,Can Peguera,41.434842,2.16645
3,Canyelles,41.445033,2.16345
4,Ciutat Meridiana,41.461208,2.174848
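
The split, clean and convert steps above can also be written as a single helper. This is a minimal sketch rather than the method used in this workbook: `split_coordinates` is a hypothetical name, and it assumes every value has the form `[lat, lng]`.

```python
import pandas as pd

def split_coordinates(df, column="Coordinates"):
    """Return a copy of df with float Latitude/Longitude columns replacing `column`."""
    # strip the surrounding brackets, then split into one column per number
    parts = df[column].str.strip("[]").str.split(",", expand=True)
    out = df.drop(columns=[column]).copy()
    # strip stray spaces before converting the text to numbers
    out["Latitude"] = pd.to_numeric(parts[0].str.strip())
    out["Longitude"] = pd.to_numeric(parts[1].str.strip())
    return out

raw = pd.DataFrame({
    "Neighborhood": ["Baró de Viver", "Can Baró"],
    "Coordinates": ["[41.44581467347341, 2.19899775842406]",
                    "[41.4167603624773, 2.1623865539676492]"],
})
coords = split_coordinates(raw)
print(coords.dtypes)
```

`pd.to_numeric` raises on malformed values, so a badly scraped row fails loudly instead of ending up as text in a numeric column.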


<h3 id="import_discticts">Import Barcelona's districts</h3>

The full list of Barcelona's districts, along with their corresponding neighborhoods, is available on [this](https://en.wikipedia.org/wiki/Districts_of_Barcelona) page (*wikipedia.org*).

In [7]:
# import dataset with Districts
df_bcn_districts = pd.read_csv("/Users/damienazzopardi/Documents/GitHub/Coursera_Capstone/Districts_Barcelona.csv")

df_bcn_districts.head()

Unnamed: 0,Neighborhoods,District
0,Baró de Viver,Sant Andreu
1,Can Baró,Horta-Guinardó
2,Can Peguera,Nou Barris
3,Canyelles,Nou Barris
4,Ciutat Meridiana,Nou Barris


Merge both datasets containing neighborhoods, coordinates and districts together, and check the final dataset.

In [8]:
# merge both dataframes into one
df_bcn = pd.merge(df_bcn_neighborhoods, df_bcn_districts, how = 'left', left_on = 'Neighborhood', right_on = 'Neighborhoods')
df_bcn.drop("Neighborhoods", axis = 1, inplace = True)

df_bcn.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,District
0,Baró de Viver,41.445815,2.198998,Sant Andreu
1,Can Baró,41.41676,2.162387,Horta-Guinardó
2,Can Peguera,41.434842,2.16645,Nou Barris
3,Canyelles,41.445033,2.16345,Nou Barris
4,Ciutat Meridiana,41.461208,2.174848,Nou Barris
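
Because the two sources may spell neighborhood names differently, it can be worth checking that the left join matched every row. The sketch below uses the `indicator` option of `pd.merge` on toy data; the row "Hypothetical Barri" is invented for illustration and does not come from either dataset.

```python
import pandas as pd

neighborhoods = pd.DataFrame({
    "Neighborhood": ["Baró de Viver", "Can Baró", "Hypothetical Barri"],
    "Latitude": [41.445815, 41.41676, 41.4],
    "Longitude": [2.198998, 2.162387, 2.2],
})
districts = pd.DataFrame({
    "Neighborhoods": ["Baró de Viver", "Can Baró"],
    "District": ["Sant Andreu", "Horta-Guinardó"],
})

# indicator=True adds a "_merge" column telling where each row was found
merged = pd.merge(neighborhoods, districts, how="left",
                  left_on="Neighborhood", right_on="Neighborhoods",
                  indicator=True)

# rows present only in the left table have no district assigned
unmatched = merged.loc[merged["_merge"] == "left_only", "Neighborhood"].tolist()
print(unmatched)
```

An empty `unmatched` list would confirm that every neighborhood received a district.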


<h3 id="map_neighborhoods">Map Barcelona's neighborhoods</h3>

Define an instance of the geocoder for Barcelona.

In [9]:
address = 'Barcelona, Spain'
geolocator = Nominatim(user_agent="barcelona_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print('The geographical coordinates of Barcelona are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Barcelona are 41.3828939, 2.1774322.


Create a map of Barcelona with neighborhoods superimposed on top.

In [10]:
# create map of Barcelona using latitude and longitude values
map_bcn = folium.Map(location=[latitude, longitude], tiles="Stamen Terrain", zoom_start=12)

# add neighborhoods markers to map
for lat, lng, district, neighborhoods in zip(df_bcn['Latitude'], df_bcn['Longitude'], df_bcn['District'], df_bcn['Neighborhood']):
    label = '{}, {}'.format(neighborhoods, district)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_bcn)  
    
map_bcn

<h2 id="analysis">Finding the best neighborhood in Barcelona using K-Means</h2>

Define the Foursquare credentials and API version.

In [11]:
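# read the credentials from the environment instead of hard-coding them
# (the variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are assumptions)
import os
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']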
VERSION = '20180605'
LIMIT = 100

Create a function that extracts the category of the venue.

In [12]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Create a function that gets all the venues within a given radius (1,000 meters by default) of each neighborhood's coordinates.

In [13]:
# function to repeat the same process to all neighborhoods in Barcelona
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
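
The function above assumes that every response contains at least one group and that every venue has at least one category. The sketch below shows one defensive way to flatten such a response. `extract_venues` and `fake_payload` are hypothetical: the payload only imitates the `response` / `groups` / `items` structure accessed above, and the second venue is invented.

```python
def extract_venues(payload, neighborhood):
    """Flatten one explore-style response into (neighborhood, venue, lat, lng, category) tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # venues without a category get None instead of raising IndexError
        category = categories[0]["name"] if categories else None
        rows.append((neighborhood,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows

# Hypothetical payload shaped like the response parsed above
fake_payload = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Apple La Maquinista",
                           "location": {"lat": 41.440802, "lng": 2.198427},
                           "categories": [{"name": "Electronics Store"}]}},
                {"venue": {"name": "Uncategorized place (hypothetical)",
                           "location": {"lat": 41.44, "lng": 2.2},
                           "categories": []}},
            ]
        }]
    }
}

rows = extract_venues(fake_payload, "Baró de Viver")
print(rows)
```

Returning an empty list for an empty or malformed response lets the loop over neighborhoods continue instead of stopping at the first neighborhood with no venues.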

In [14]:
bcn_venues = getNearbyVenues(names=df_bcn['Neighborhood'],
                                   latitudes=df_bcn['Latitude'],
                                   longitudes=df_bcn['Longitude']
                                  )

Baró de Viver
Can Baró
Can Peguera
Canyelles
Ciutat Meridiana
Diagonal Mar i el Front Marítim del Poblenou
Horta
Hostafrancs
Montbau
Navas
Pedralbes
Porta
Provençals del Poblenou
Sant Andreu
Sant Antoni
Sant Genís dels Agudells
Sant Gervasi - Galvany
Sant Gervasi - la Bonanova
Sant Martí de Provençals
Sant Pere, Santa Caterina i la Ribera
Sants
Sants - Badal
Sarrià
Torre Baró
Vallbona
Vallcarca i els Penitents
Vallvidrera, el Tibidabo i les Planes
Verdun
Vilapicina i la Torre Llobeta
el Baix Guinardó
el Barri Gòtic
el Besòs i el Maresme
el Bon Pastor
el Camp d'en Grassot i Gràcia Nova
el Camp de l'Arpa del Clot
el Carmel
el Clot
el Coll
el Congrés i els Indians
el Fort Pienc
el Guinardó
el Parc i la Llacuna del Poblenou
el Poble-sec
el Poblenou
el Putxet i el Farró
el Raval
el Turó de la Peira
l'Antiga Esquerra de l'Eixample
la Barceloneta
la Bordeta
la Clota
la Dreta de l'Eixample
la Font d'en Fargues
la Font de la Guatlla
la Guineueta
la Marina de Port
la Marina del Prat Vermell
la M

Check the final dataset containing all venues within 1,000 meters of each neighborhood, along with their category and coordinates.

In [15]:
bcn_venues.head(5)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Baró de Viver,41.445815,2.198998,Apple La Maquinista,41.440802,2.198427,Electronics Store
1,Baró de Viver,41.445815,2.198998,C.C. La Maquinista,41.440464,2.198219,Shopping Mall
2,Baró de Viver,41.445815,2.198998,The 1982 Birres & Burgers,41.4512,2.204559,Burger Joint
3,Baró de Viver,41.445815,2.198998,Nespresso,41.440537,2.198042,Coffee Shop
4,Baró de Viver,41.445815,2.198998,Restaurant Enriqueta,41.445684,2.206801,Spanish Restaurant


Convert each venue category into its own column and assign it a binary value of 1 or 0 using one-hot encoding.

In [16]:
# one hot encoding
df_bcn_onehot = pd.get_dummies(bcn_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
df_bcn_onehot['Neighborhood'] = bcn_venues['Neighborhood']

# move neighborhood column to the first column
fixed_columns = [df_bcn_onehot.columns[-1]] + list(df_bcn_onehot.columns[:-1])
df_bcn_onehot = df_bcn_onehot[fixed_columns]

df_bcn_onehot.head()

# group rows by neighborhood, taking the mean frequency of occurrence of each category
df_bcn_grouped = df_bcn_onehot.groupby('Neighborhood').mean().reset_index()

df_bcn_grouped.head()

Unnamed: 0,Neighborhood,Zoo,Accessories Store,African Restaurant,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Arcade,Argentinian Restaurant,...,Vacation Rental,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Winery,Women's Store,Yoga Studio
0,Baró de Viver,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0
1,Can Baró,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Can Peguera,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0
3,Canyelles,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Ciutat Meridiana,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
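
To see why the mean of one-hot columns gives a frequency, consider a toy example. The neighborhoods "A" and "B" and their venues are invented for illustration.

```python
import pandas as pd

venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Gym", "Park", "Metro Station", "Gym", "Gym"],
})

# one column per category, 1.0 where the venue belongs to it, 0.0 elsewhere
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)

# insert() raises ValueError if a category is itself named "Neighborhood",
# which makes a column-name collision visible instead of silent
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# the mean of a 0/1 column is the share of venues in that category
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Each row of `freq` sums to 1, so the values are shares of the venues returned for a neighborhood rather than absolute counts. A neighborhood with few venues can therefore show a high share for a category from a single venue.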


As we're only interested in a specific set of venues to answer our question, let's filter the dataset above and pick only the following venue categories:
- **Metro Station**
- **Train Station**
- **Park**
- **Garden**
- **Gym**

In [17]:
df_bcn_final = df_bcn_grouped[['Neighborhood','Park','Garden','Metro Station', 'Train Station', 'Gym']]

df_bcn_final = df_bcn_final.astype({"Park": float, "Garden": float, "Metro Station": float, "Train Station": float, 'Gym': float})

df_bcn_final.head()

Unnamed: 0,Neighborhood,Park,Garden,Metro Station,Train Station,Gym
0,Baró de Viver,0.03,0.0,0.0,0.0,0.0
1,Can Baró,0.06,0.0,0.0,0.0,0.05
2,Can Peguera,0.03,0.0,0.0,0.0,0.0
3,Canyelles,0.04,0.0,0.04,0.0,0.04
4,Ciutat Meridiana,0.111111,0.0,0.333333,0.111111,0.0
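
One way to make the "in order of priority" requirement explicit is to sort the neighborhoods by transit first, then gyms, then green spaces. The sketch below is a plain lexicographic ranking, not the k-means approach used in this workbook. It reuses the five rows shown above, and the `Transit` and `Green` column names are assumptions introduced for illustration.

```python
import pandas as pd

sample = pd.DataFrame({
    "Neighborhood": ["Baró de Viver", "Can Baró", "Can Peguera",
                     "Canyelles", "Ciutat Meridiana"],
    "Park": [0.03, 0.06, 0.03, 0.04, 0.111111],
    "Garden": [0.0, 0.0, 0.0, 0.0, 0.0],
    "Metro Station": [0.0, 0.0, 0.0, 0.04, 0.333333],
    "Train Station": [0.0, 0.0, 0.0, 0.0, 0.111111],
    "Gym": [0.0, 0.05, 0.0, 0.04, 0.0],
})

ranked = (
    sample.assign(
        Transit=sample["Metro Station"] + sample["Train Station"],  # priority 1
        Green=sample["Park"] + sample["Garden"],                    # priority 3
    )
    # sort by priority 1, break ties with priority 2 (Gym), then priority 3
    .sort_values(["Transit", "Gym", "Green"], ascending=False)
    .reset_index(drop=True)
)
print(ranked[["Neighborhood", "Transit", "Gym", "Green"]])
```

On these five rows, Ciutat Meridiana ranks first on transit, followed by Canyelles and Can Baró. The two remaining neighborhoods tie on all three keys.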


Create a function that sorts the venue categories of a neighborhood by frequency, in descending order.

In [18]:
# function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Create a dataframe containing the top 5 venues for each neighborhood.

In [19]:
# top 5 venues for each neighborhood
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = df_bcn_final['Neighborhood']

for ind in np.arange(df_bcn_final.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(df_bcn_final.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Baró de Viver,Park,Garden,Metro Station,Train Station,Gym
1,Can Baró,Park,Gym,Garden,Metro Station,Train Station
2,Can Peguera,Park,Garden,Metro Station,Train Station,Gym
3,Canyelles,Park,Metro Station,Gym,Garden,Train Station
4,Ciutat Meridiana,Metro Station,Park,Train Station,Garden,Gym
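
When reading a ranking like the one above, keep in mind that categories with a frequency of 0 still appear in it, in arbitrary order. For Baró de Viver, for example, only Park has a non-zero frequency, so "Garden" in second place does not mean a garden was found. The sketch below ranks categories row by row on two of the rows shown earlier; `top_categories` is a hypothetical helper, not part of this workbook.

```python
import pandas as pd

def top_categories(freq, n):
    """Return, for each row, the n column labels with the largest values."""
    ranked = freq.apply(
        lambda row: row.sort_values(ascending=False).index[:n].tolist(),
        axis=1,
        result_type="expand",
    )
    ranked.columns = [f"Top {i + 1}" for i in range(n)]
    return ranked

freq = pd.DataFrame(
    {
        "Park": [0.06, 0.111111],
        "Garden": [0.0, 0.0],
        "Metro Station": [0.0, 0.333333],
        "Train Station": [0.0, 0.111111],
        "Gym": [0.05, 0.0],
    },
    index=["Can Baró", "Ciutat Meridiana"],
)

top = top_categories(freq, 2)
print(top)
```

Only the positions backed by distinct non-zero frequencies are meaningful. For Can Baró these are Park, then Gym.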


Cluster the neighborhoods using the k-means method.

In [20]:
# set number of clusters
kclusters = 5

bcn_grouped_clustering = df_bcn_final.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(bcn_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 0, 2, 2, 4, 2, 0, 2, 0, 2], dtype=int32)
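
Cluster labels are arbitrary integers, so it helps to look at the cluster centers to see what each cluster represents. The sketch below does this on invented data with two clearly separated groups. The feature values and the choice of two clusters are assumptions for illustration only.

```python
import pandas as pd
from sklearn.cluster import KMeans

features = ["Metro Station", "Gym", "Park"]

# Hypothetical frequencies: two transit-heavy rows, two park-heavy rows
toy = pd.DataFrame(
    [[0.30, 0.05, 0.02],
     [0.28, 0.04, 0.03],
     [0.00, 0.01, 0.10],
     [0.01, 0.00, 0.12]],
    columns=features,
)

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(toy)

# one row per cluster, giving the mean frequency of each category in it
centers = pd.DataFrame(model.cluster_centers_, columns=features)

# the cluster whose center has the highest metro frequency
transit_cluster = int(centers["Metro Station"].idxmax())
print(centers)
print("transit-heavy cluster:", transit_cluster)
```

Comparing the centers column by column shows which cluster is strongest on each criterion, which is more informative than the label numbers alone.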

In [21]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

df_bcn_final = df_bcn

# merge datasets
df_bcn_final = df_bcn_final.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

df_bcn_final.head() # check the last columns!

Unnamed: 0,Neighborhood,Latitude,Longitude,District,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Baró de Viver,41.445815,2.198998,Sant Andreu,2,Park,Garden,Metro Station,Train Station,Gym
1,Can Baró,41.41676,2.162387,Horta-Guinardó,0,Park,Gym,Garden,Metro Station,Train Station
2,Can Peguera,41.434842,2.16645,Nou Barris,2,Park,Garden,Metro Station,Train Station,Gym
3,Canyelles,41.445033,2.16345,Nou Barris,2,Park,Metro Station,Gym,Garden,Train Station
4,Ciutat Meridiana,41.461208,2.174848,Nou Barris,4,Metro Station,Park,Train Station,Garden,Gym


Create a map of Barcelona with clusters superimposed on top.

In [22]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], tiles="Stamen Terrain", zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(df_bcn_final['Latitude'], df_bcn_final['Longitude'], df_bcn_final['Neighborhood'], df_bcn_final['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Now let's examine each cluster.

### Cluster 1

In [23]:
df_bcn_final.loc[df_bcn_final['Cluster Labels'] == 0, df_bcn_final.columns[[0] + list(range(5, df_bcn_final.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,Can Baró,Park,Gym,Garden,Metro Station,Train Station
6,Horta,Park,Garden,Metro Station,Train Station,Gym
8,Montbau,Park,Garden,Metro Station,Train Station,Gym
10,Pedralbes,Park,Garden,Metro Station,Train Station,Gym
25,Vallcarca i els Penitents,Park,Metro Station,Train Station,Gym,Garden
35,el Carmel,Park,Gym,Garden,Metro Station,Train Station
37,el Coll,Park,Metro Station,Garden,Train Station,Gym
42,el Poble-sec,Park,Garden,Metro Station,Train Station,Gym
50,la Clota,Park,Metro Station,Gym,Garden,Train Station
59,la Prosperitat,Park,Metro Station,Gym,Garden,Train Station


### Cluster 2

In [24]:
df_bcn_final.loc[df_bcn_final['Cluster Labels'] == 1, df_bcn_final.columns[[0] + list(range(5, df_bcn_final.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
23,Torre Baró,Metro Station,Train Station,Gym,Park,Garden


### Cluster 3

In [25]:
df_bcn_final.loc[df_bcn_final['Cluster Labels'] == 2, df_bcn_final.columns[[0] + list(range(5, df_bcn_final.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Baró de Viver,Park,Garden,Metro Station,Train Station,Gym
2,Can Peguera,Park,Garden,Metro Station,Train Station,Gym
3,Canyelles,Park,Metro Station,Gym,Garden,Train Station
5,Diagonal Mar i el Front Marítim del Poblenou,Park,Garden,Metro Station,Train Station,Gym
7,Hostafrancs,Park,Gym,Garden,Metro Station,Train Station
9,Navas,Park,Gym,Garden,Metro Station,Train Station
11,Porta,Park,Gym,Garden,Metro Station,Train Station
12,Provençals del Poblenou,Park,Gym,Garden,Metro Station,Train Station
13,Sant Andreu,Park,Gym,Garden,Metro Station,Train Station
14,Sant Antoni,Park,Garden,Metro Station,Train Station,Gym


### Cluster 4

In [26]:
df_bcn_final.loc[df_bcn_final['Cluster Labels'] == 3, df_bcn_final.columns[[0] + list(range(5, df_bcn_final.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
26,"Vallvidrera, el Tibidabo i les Planes",Train Station,Park,Garden,Metro Station,Gym


### Cluster 5

In [27]:
df_bcn_final.loc[df_bcn_final['Cluster Labels'] == 4, df_bcn_final.columns[[0] + list(range(5, df_bcn_final.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
4,Ciutat Meridiana,Metro Station,Park,Train Station,Garden,Gym
24,Vallbona,Metro Station,Park,Train Station,Garden,Gym


<h2 id="conclusionn">Conclusion</h2>

Recall our original conditions, **in order of priority**, for finding the best neighborhood in Barcelona for our future restaurant:
1. Close to a **metro** or **train station**, where the flow of people is high.
2. Have a **gym** close by, as most of our customers come for lunch or dinner after training at the gym.
3. Have a **park** or **garden** close by, where our customers like to have lunch.

The only cluster whose venue ranking matches this order (metro station, train station, gym, park, garden) is **cluster 2**, which contains a single neighborhood, **Torre Baró**.

In [28]:
df_bcn_final.loc[df_bcn_final['Cluster Labels'] == 1, df_bcn_final.columns[[0] + list(range(5, df_bcn_final.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
23,Torre Baró,Metro Station,Train Station,Gym,Park,Garden


Based on the data previously collected in the city of Madrid and the analysis run on Barcelona's neighborhoods, we conclude that the **Torre Baró** neighborhood is the best area to open a new **The Green Alternative** restaurant.