## IBM Data Science Professional Certificate
### Capstone Project - The Battle of Neighborhoods

Damien Azzopardi - July 2021

<h2>Table of Contents</h2>
<br>
<ol>
    <li><a href="#introduction"><b>Introduction</b></a></li>
<br>
<br>
    <li><a href="#data"><b>Data</b></a>
        <ul>
            <li><a href="#neighborhoods_and_coordinates">Neighborhoods and coordinates</a></li>
            <li><a href="#districts">Districts</a></li>
            <li><a href="#venues">Venues</a></li>
        </ul>
    </li>
<br>
    <li><a href="#data_manipulation"><b>Data Extraction & Manipulation</b></a>
        <ul>
            <li><a href="#scrap_neighborhoods">Scrape Barcelona's neighborhoods and coordinates</a></li>
            <li><a href="#import_discticts">Import Barcelona's districts</a></li>
            <li><a href="#map_neighborhoods">Map Barcelona's neighborhoods</a></li>
        </ul>
    </li>
</ol>

<h2 id="introduction">Introduction</h2>

**The Green Alternative** is a group of vegetarian restaurants that started operating in Madrid, Spain, in 2010. We currently run six restaurants across different neighborhoods in Madrid, oriented towards locals. As our group is becoming successful in the Spanish capital, we would like to expand our operations this year and open a vegetarian restaurant in Barcelona.

The question we are trying to answer is: **What is the best neighborhood to open a vegetarian restaurant in Barcelona?**

After conducting market research and reviewing the data collected from our six current restaurants in Madrid, we found that our most successful locations are in neighborhoods which:
- Are close to a **metro** or **train station**, where the flow of people is high.
- Have a **park** or **garden** nearby, where our customers like to have lunch.
- Have a **gym** nearby, as most of our customers come for lunch or dinner after training at the gym.

Knowing this, we'll leverage the Foursquare location data to calculate the density of metro and train stations, parks, gardens, and gyms for each neighborhood in Barcelona, and pick the neighborhood with the highest density of these venues to open our first vegetarian restaurant in the city.
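As a rough illustration of the ranking step described above, the sketch below counts the selected venue types per neighborhood and sorts neighborhoods by their total. The venue rows, the category labels, the placeholder neighborhood names, and the helper name `rank_neighborhoods` are all made up for illustration; real rows would come from the Foursquare data, and a true density would also need to divide by neighborhood area, which is not part of the datasets described here.

```python
import pandas as pd

# Hypothetical venue rows; real ones would come from the Foursquare data.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B", "C"],
    "Category": ["Metro Station", "Park", "Gym", "Gym", "Café", "Park"],
})

# Venue categories the analysis cares about (the exact labels are assumptions).
TARGET_CATEGORIES = ["Metro Station", "Train Station", "Park", "Garden", "Gym"]

def rank_neighborhoods(venues, targets):
    """Count target venues per neighborhood and sort by the total, highest first."""
    selected = venues[venues["Category"].isin(targets)]
    counts = pd.crosstab(selected["Neighborhood"], selected["Category"])
    counts["Total"] = counts.sum(axis=1)
    return counts.sort_values("Total", ascending=False)

ranking = rank_neighborhoods(venues, TARGET_CATEGORIES)
print(ranking)
```

Keeping one column per category, rather than only the total, makes it possible to see whether a neighborhood scores well on all three criteria or only on one.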

<h2 id="data">Data</h2>

The data we will be using to help us answer our question comes from the following sources.

<h3 id="neighborhoods_and_coordinates">Neighborhoods and coordinates</h3>

<h4>Metabolism of Cities</h4>

The full list of Barcelona's neighborhoods, along with their corresponding coordinates, is available on [this](https://data.metabolismofcities.org/library/maps/577245/view/) page (*metabolismofcities.org*). It consists of a table with two columns, **Neighborhoods** and **Coordinates**. We will scrape this table directly in this workbook.


<h3 id="districts">Districts</h3>

<h4>Wikipedia</h4>

The full list of Barcelona's districts, along with their corresponding neighborhoods, is available on [this](https://en.wikipedia.org/wiki/Districts_of_Barcelona) page (*wikipedia.org*). We will export it as a CSV file with two columns, **Districts** and **Neighborhoods**, read that file directly in this workbook, and join it with the first dataset containing the **Neighborhoods** and **Coordinates**.


<h3 id="venues">Venues</h3>

<h4>Foursquare</h4>

We will leverage the Foursquare location data in order to calculate the density of the venues we have selected for the analysis. We will join it with the first two datasets containing the **District**, **Neighborhoods** and **Coordinates**.
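For orientation, a venue search around one neighborhood could be requested roughly as follows. This is only a sketch: it assumes the legacy Foursquare Places API v2 `venues/explore` endpoint and its `client_id`, `client_secret`, `v`, `ll`, `radius`, and `limit` parameters, the credentials are placeholders, and the helper name `build_explore_url` is hypothetical. The endpoint and parameters may differ in the API version available to you. The block only builds the URL and sends no request.

```python
from urllib.parse import urlencode

# Assumed legacy endpoint; check the current Foursquare documentation before use.
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(lat, lng, client_id, client_secret,
                      version="20210701", radius=500, limit=100):
    """Build a venue search URL centered on one pair of coordinates."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date, YYYYMMDD
        "ll": f"{lat},{lng}",  # latitude,longitude of the search center
        "radius": radius,      # search radius in meters
        "limit": limit,        # maximum number of venues returned
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"

# Example coordinates for one neighborhood; the credentials are placeholders.
url = build_explore_url(41.445815, 2.198998, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(url)
```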

<h2 id="data_manipulation">Data Extraction & Manipulation</h2>

In [1]:
# load libraries
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim
from pandas import json_normalize  # the pandas.io.json import path is deprecated
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium

<h3 id="scrap_neighborhoods">Scrape Barcelona's neighborhoods and coordinates</h3>

The full list of Barcelona's neighborhoods, along with their corresponding coordinates, is available on [this](https://data.metabolismofcities.org/library/maps/577245/view/) page (*metabolismofcities.org*).

In [2]:
# scrape Barcelona's neighborhoods and coordinates table
url = 'https://data.metabolismofcities.org/library/maps/577245/view/'

r = requests.get(url)
html = r.text

soup = BeautifulSoup(html, 'lxml')
table = soup.find('table')
rows = table.find_all('tr')
data = []
for row in rows[1:]:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])

# convert to dataframe
df = pd.DataFrame(data)

# rename columns
df.columns = ['Neighborhood', 'Coordinates']

df.head()

| | Neighborhood | Coordinates |
|---|---|---|
| 0 | Baró de Viver | [41.44581467347341, 2.19899775842406] |
| 1 | Can Baró | [41.4167603624773, 2.1623865539676492] |
| 2 | Can Peguera | [41.43484212038238, 2.1664501320817235] |
| 3 | Canyelles | [41.445032990983854, 2.1634504252403164] |
| 4 | Ciutat Meridiana | [41.46120773644666, 2.1748476502321963] |


Split the 'Coordinates' column into two separate 'Latitude' and 'Longitude' columns.

In [3]:
# split the 'Coordinates' column
df[['Latitude','Longitude']] = df.Coordinates.str.split(', ', expand = True)

# drop the 'Coordinates' column
df_bcn_neighborhoods = df.drop(['Coordinates'], axis = 1)

df_bcn_neighborhoods.head()

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Baró de Viver | [41.44581467347341 | 2.19899775842406] |
| 1 | Can Baró | [41.4167603624773 | 2.1623865539676492] |
| 2 | Can Peguera | [41.43484212038238 | 2.1664501320817235] |
| 3 | Canyelles | [41.445032990983854 | 2.1634504252403164] |
| 4 | Ciutat Meridiana | [41.46120773644666 | 2.1748476502321963] |


Remove the special characters from the 'Latitude' and 'Longitude' columns and check the column types.

In [4]:
# special characters to remove from the dataframe
spec_chars = ["[","]"]

# remove the special characters from the 'Latitude' and 'Longitude' columns
# regex=False treats "[" and "]" as literal characters ("[" on its own is not a valid regular expression)
for column in ['Latitude', 'Longitude']:
    for char in spec_chars:
        df_bcn_neighborhoods[column] = df_bcn_neighborhoods[column].str.replace(char, '', regex=False)

# check column type
df_bcn_neighborhoods.dtypes

Neighborhood    object
Latitude        object
Longitude       object
dtype: object

Change the type of the 'Latitude' and 'Longitude' columns to **float** so they can be used properly in the following steps.

In [5]:
# change column type 
df_bcn_neighborhoods = df_bcn_neighborhoods.astype({"Neighborhood": str, "Latitude": float, "Longitude": float})

# check column type
df_bcn_neighborhoods.dtypes

Neighborhood     object
Latitude        float64
Longitude       float64
dtype: object

Check the final dataset containing each neighborhood along with its corresponding latitude and longitude.

In [6]:
df_bcn_neighborhoods.head()

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Baró de Viver | 41.445815 | 2.198998 |
| 1 | Can Baró | 41.41676 | 2.162387 |
| 2 | Can Peguera | 41.434842 | 2.16645 |
| 3 | Canyelles | 41.445033 | 2.16345 |
| 4 | Ciutat Meridiana | 41.461208 | 2.174848 |
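The split, strip, and cast steps above can also be done in one pass, because each 'Coordinates' value is a bracketed pair that happens to be valid JSON. The sketch below shows that alternative on its own, using only the standard library. The helper name `parse_coordinates` is hypothetical, and the sample strings are copied from the scraped table shown earlier.

```python
import json

def parse_coordinates(text):
    """Turn a string like '[41.44, 2.19]' into a (latitude, longitude) tuple of floats."""
    latitude, longitude = json.loads(text)
    return float(latitude), float(longitude)

# Two values copied from the 'Coordinates' column of the scraped table.
samples = [
    "[41.44581467347341, 2.19899775842406]",
    "[41.4167603624773, 2.1623865539676492]",
]
parsed = [parse_coordinates(s) for s in samples]
print(parsed)
```

Applied to the dataframe, the same helper could be passed to `df['Coordinates'].apply(...)`. The step-by-step version above has the advantage of showing each intermediate state of the columns.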


<h3 id="import_discticts">Import Barcelona's districts</h3>

The full list of Barcelona's districts, along with their corresponding neighborhoods, is available on [this](https://en.wikipedia.org/wiki/Districts_of_Barcelona) page (*wikipedia.org*).

In [7]:
# import dataset with Districts
df_bcn_districts = pd.read_csv("/Users/damienazzopardi/Documents/GitHub/Coursera_Capstone/Districts_Barcelona.csv")

df_bcn_districts.head()

| | Neighborhoods | District |
|---|---|---|
| 0 | Baró de Viver | Sant Andreu |
| 1 | Can Baró | Horta-Guinardó |
| 2 | Can Peguera | Nou Barris |
| 3 | Canyelles | Nou Barris |
| 4 | Ciutat Meridiana | Nou Barris |


Merge the two datasets so that each neighborhood has both its coordinates and its district.

In [8]:
# merge both dataframes into one
df_bcn = pd.merge(df_bcn_neighborhoods, df_bcn_districts, how = 'left', left_on = 'Neighborhood', right_on = 'Neighborhoods')
df_bcn.drop("Neighborhoods", axis = 1, inplace = True)

df_bcn.head()

| | Neighborhood | Latitude | Longitude | District |
|---|---|---|---|---|
| 0 | Baró de Viver | 41.445815 | 2.198998 | Sant Andreu |
| 1 | Can Baró | 41.41676 | 2.162387 | Horta-Guinardó |
| 2 | Can Peguera | 41.434842 | 2.16645 | Nou Barris |
| 3 | Canyelles | 41.445033 | 2.16345 | Nou Barris |
| 4 | Ciutat Meridiana | 41.461208 | 2.174848 | Nou Barris |
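Because the two tables come from different sources, a neighborhood name spelled differently in each one would silently end up with a missing district after the left join. One way to check for that is the `indicator` option of `pd.merge`. The sketch below uses small stand-in tables with the same column names as above; the 'Example Barri' row is made up to show what an unmatched name looks like.

```python
import pandas as pd

# Small stand-ins for the two tables; 'Example Barri' is a made-up name with no district.
neighborhoods = pd.DataFrame({
    "Neighborhood": ["Baró de Viver", "Can Baró", "Example Barri"],
})
districts = pd.DataFrame({
    "Neighborhoods": ["Baró de Viver", "Can Baró"],
    "District": ["Sant Andreu", "Horta-Guinardó"],
})

# indicator=True adds a '_merge' column telling where each row was found.
checked = pd.merge(
    neighborhoods, districts,
    how="left", left_on="Neighborhood", right_on="Neighborhoods",
    indicator=True,
)

# Rows present only in the left table have no matching district.
unmatched = checked.loc[checked["_merge"] == "left_only", "Neighborhood"].tolist()
print(unmatched)
```

An empty list would mean every neighborhood found a district; any names it does contain can be corrected in the CSV before the merge is rerun.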


<h3 id="map_neighborhoods">Map Barcelona's neighborhoods</h3>

Create a geocoder instance and use it to look up the coordinates of Barcelona.

In [9]:
address = 'Barcelona, Spain'
geolocator = Nominatim(user_agent="barcelona_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print('The geographical coordinates of Barcelona are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Barcelona are 41.3828939, 2.1774322.
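Geocoding depends on an external service, and `geocode` returns `None` when it finds no match, in which case reading `.latitude` would fail. The sketch below wraps the lookup with a fallback. `resolve_center` and `FALLBACK_CENTER` are hypothetical names, the fallback values are the coordinates printed above, and the geocoder is passed in as a plain callable so the example runs without network access.

```python
from types import SimpleNamespace

# Coordinates printed above, used when the lookup finds nothing.
FALLBACK_CENTER = (41.3828939, 2.1774322)

def resolve_center(geocode, address, fallback=FALLBACK_CENTER):
    """Return (latitude, longitude) for address, or fallback if there is no match."""
    location = geocode(address)
    if location is None:
        return fallback
    return location.latitude, location.longitude

# Stand-in geocoders for illustration; in the notebook this would be geolocator.geocode.
def found(address):
    return SimpleNamespace(latitude=41.38, longitude=2.17)

def not_found(address):
    return None

print(resolve_center(found, "Barcelona, Spain"))
print(resolve_center(not_found, "Nowhere"))
```

This only covers the no-match case. Timeouts and other service errors raised by the geocoder would still need their own handling.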


Create a map of Barcelona with neighborhoods superimposed on top.

In [18]:
# create map of Barcelona using latitude and longitude values
map_bcn = folium.Map(location=[latitude, longitude], tiles="Stamen Terrain", zoom_start=12)

# add neighborhoods markers to map
for lat, lng, district, neighborhood in zip(df_bcn['Latitude'], df_bcn['Longitude'], df_bcn['District'], df_bcn['Neighborhood']):
    # popup text shown when a marker is clicked: "neighborhood, district"
    label = '{}, {}'.format(neighborhood, district)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_bcn)
    
map_bcn