## IBM Data Science Professional Certificate
### Capstone Project - The Battle of Neighborhoods

Damien Azzopardi - July 2021

<h2>Table of Contents</h2>

<ol>
    <li><a href="#introduction"><b>Introduction</b></a>
        <ul>
            <li><a href="#business_problem">Business Problem</a></li>
        </ul>
    </li>
    <li><a href="#data"><b>Data</b></a>
        <ul>
            <li><a href="#data_1">Data 1</a></li>
        </ul>
    </li>
</ol>
<hr>

<h2 id="introduction">Introduction</h2>

Clearly define a problem or an idea of your choice, where you would need to leverage the Foursquare location data to solve or execute. Remember that data science problems always target an audience and are meant to help a group of stakeholders solve a problem, so make sure that you explicitly describe your audience and why they would care about your problem.

<h3 id="business_problem">Business Problem</h3>

XXX

<h2 id="data">Data</h2>

Describe the data that you will be using to solve the problem or execute your idea. Remember that you will need to use the Foursquare location data to solve the problem or execute your idea. You can absolutely use other datasets in combination with the Foursquare location data. So make sure that you provide adequate explanation and discussion, with examples, of the data that you will be using, even if it is only Foursquare location data.

- **Neighborhoods list**:
The list of neighborhoods in Barcelona comes from the [Districts of Barcelona Wikipedia page](https://en.wikipedia.org/wiki/Districts_of_Barcelona). The data manipulation needed to scrape the list and bring it into the proper format is done directly in the notebook.


- **Neighborhoods coordinates**:

https://data.metabolismofcities.org/library/maps/577245/view/

https://data.metabolismofcities.org/referencespaces/view/577264/

XXX

- **Foursquare**:

<h2 id="xxx">Segmenting and Clustering Neighborhoods in Barcelona</h2>

### Data extraction and manipulation

```python
# load libraries
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium
```

<h3 id="xxx">1. Scrape Barcelona's neighborhoods and coordinates</h3>

The full list of Barcelona's neighborhoods, along with their coordinates, is available on [this Metabolism of Cities page](https://data.metabolismofcities.org/library/maps/577245/view/).

```python
# scrape Barcelona's neighborhoods and coordinates table
url = 'https://data.metabolismofcities.org/library/maps/577245/view/'

r = requests.get(url)
r.raise_for_status()  # stop early if the page could not be fetched
html = r.text

soup = BeautifulSoup(html, 'lxml')  # requires the lxml parser to be installed
table = soup.find('table')
rows = table.find_all('tr')
data = []
for row in rows[1:]:  # skip the header row
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])  # drop empty cells

# convert to dataframe
df = pd.DataFrame(data)

# rename columns
df.columns = ['Neighborhoods', 'Coordinates']

df.head()
```

|   | Neighborhoods | Coordinates |
|---|---|---|
| 0 | Baró de Viver | [41.44581467347341, 2.19899775842406] |
| 1 | Can Baró | [41.4167603624773, 2.1623865539676492] |
| 2 | Can Peguera | [41.43484212038238, 2.1664501320817235] |
| 3 | Canyelles | [41.445032990983854, 2.1634504252403164] |
| 4 | Ciutat Meridiana | [41.46120773644666, 2.1748476502321963] |
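The scraping cell above filters out empty cells with `if ele`, so a row with a blank cell would shift its remaining values into the wrong column. As a minimal sketch of an alternative that keeps empty cells in place, the standard-library `html.parser` can collect the cells of the first `<table>` row by row. The class and function names (`TableRowParser`, `parse_first_table`) are hypothetical, and the sketch assumes the target table is not nested inside another table.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row, from the first <table>."""

    def __init__(self):
        super().__init__()
        self.rows = []          # list of rows, each a list of cell strings
        self._in_table = False
        self._done = False      # set once the first table has closed
        self._row = None        # cells of the row currently being read
        self._cell = None       # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "table":
            self._in_table = True
        elif self._in_table and tag == "tr":
            self._row = []
        elif self._in_table and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if self._done or not self._in_table:
            return
        if tag in ("td", "th") and self._cell is not None:
            # keep empty cells so that later values never shift columns
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "table":
            self._in_table = False
            self._done = True

    def handle_data(self, data):
        # text can arrive in several chunks, so accumulate it
        if self._cell is not None:
            self._cell.append(data)


def parse_first_table(html):
    """Return the rows of the first table in `html` as lists of strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

As in the original cell, the header row can then be skipped with `parse_first_table(html)[1:]` before building the DataFrame.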


```python
# split the 'Coordinates' column into two new columns 'Latitude' and 'Longitude'
df[['Latitude', 'Longitude']] = df.Coordinates.str.split(', ', expand=True)

df.head()
```

|   | Neighborhoods | Coordinates | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Baró de Viver | [41.44581467347341, 2.19899775842406] | [41.44581467347341 | 2.19899775842406] |
| 1 | Can Baró | [41.4167603624773, 2.1623865539676492] | [41.4167603624773 | 2.1623865539676492] |
| 2 | Can Peguera | [41.43484212038238, 2.1664501320817235] | [41.43484212038238 | 2.1664501320817235] |
| 3 | Canyelles | [41.445032990983854, 2.1634504252403164] | [41.445032990983854 | 2.1634504252403164] |
| 4 | Ciutat Meridiana | [41.46120773644666, 2.1748476502321963] | [41.46120773644666 | 2.1748476502321963] |


```python
# drop the 'Coordinates' column
df_bcn = df.drop(['Coordinates'], axis=1)

df_bcn.head()
```

|   | Neighborhoods | Latitude | Longitude |
|---|---|---|---|
| 0 | Baró de Viver | [41.44581467347341 | 2.19899775842406] |
| 1 | Can Baró | [41.4167603624773 | 2.1623865539676492] |
| 2 | Can Peguera | [41.43484212038238 | 2.1664501320817235] |
| 3 | Canyelles | [41.445032990983854 | 2.1634504252403164] |
| 4 | Ciutat Meridiana | [41.46120773644666 | 2.1748476502321963] |


```python
# remove the leftover square brackets from the 'Latitude' and 'Longitude' columns;
# str.strip('[]') trims those characters from both ends without any regex,
# whereas str.replace('[', '', regex=True) can treat '[' as an unterminated
# regex character class and fail
for col in ['Latitude', 'Longitude']:
    df_bcn[col] = df_bcn[col].str.strip('[]')

df_bcn.head()
```

|   | Neighborhoods | Latitude | Longitude |
|---|---|---|---|
| 0 | Baró de Viver | 41.44581467347341 | 2.19899775842406 |
| 1 | Can Baró | 41.4167603624773 | 2.162386553967649 |
| 2 | Can Peguera | 41.43484212038238 | 2.1664501320817235 |
| 3 | Canyelles | 41.445032990983854 | 2.1634504252403164 |
| 4 | Ciutat Meridiana | 41.46120773644666 | 2.1748476502321963 |
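The split, bracket-stripping, and type-conversion steps can also be collapsed into a single parse, because each scraped value such as `[41.44581467347341, 2.19899775842406]` happens to be a valid JSON array. The sketch below assumes every value in the `Coordinates` column has exactly that two-number form; `parse_coordinates` is a hypothetical helper, not part of the original notebook.

```python
import json


def parse_coordinates(text):
    """Turn a string like "[41.4458, 2.1989]" into a (latitude, longitude) tuple of floats.

    Assumes the string is a two-element JSON array; anything else raises an error
    (json.JSONDecodeError or ValueError) instead of silently producing bad data.
    """
    lat, lng = json.loads(text)
    return float(lat), float(lng)


# usage on the first two scraped values
samples = ["[41.44581467347341, 2.19899775842406]", "[41.4167603624773, 2.1623865539676492]"]
parsed = [parse_coordinates(s) for s in samples]
```

In the notebook this could be applied column-wise, for example with `df['Coordinates'].map(parse_coordinates)`, and the resulting tuples expanded into `Latitude` and `Longitude`. One design advantage is that a malformed row fails loudly at parse time rather than surviving as a string until `astype(float)`.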


```python
# check column types
df_bcn.dtypes
```

```
Neighborhoods    object
Latitude         object
Longitude        object
dtype: object
```

```python
# convert the coordinate columns from strings to floats
df_bcn = df_bcn.astype({"Neighborhoods": str, "Latitude": float, "Longitude": float})

# check column types
df_bcn.dtypes
```

```
Neighborhoods     object
Latitude         float64
Longitude        float64
dtype: object
```
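A correct `float64` dtype does not guarantee correct values: a swapped latitude/longitude pair would still convert cleanly. A quick sanity check is to flag any neighborhood that falls outside a rough bounding box around Barcelona. The box below is an approximate assumption chosen loosely around the city, not a figure from the source data, and `out_of_bounds` is a hypothetical helper; the last sample row is deliberately swapped to show what gets flagged.

```python
# Approximate bounding box around Barcelona (assumed values, not from the source data)
LAT_MIN, LAT_MAX = 41.30, 41.50
LNG_MIN, LNG_MAX = 2.05, 2.25


def out_of_bounds(rows):
    """Return the names of neighborhoods whose coordinates fall outside the assumed box.

    rows: iterable of (name, latitude, longitude) tuples with float coordinates.
    """
    return [
        name
        for name, lat, lng in rows
        if not (LAT_MIN <= lat <= LAT_MAX and LNG_MIN <= lng <= LNG_MAX)
    ]


sample = [
    ("Baró de Viver", 41.44581467347341, 2.19899775842406),
    ("Ciutat Meridiana", 41.46120773644666, 2.1748476502321963),
    ("Swapped", 2.1634504252403164, 41.445032990983854),  # lat/lng swapped on purpose
]
print(out_of_bounds(sample))  # → ['Swapped']
```

With pandas, the same check could be expressed with `Series.between` on the `Latitude` and `Longitude` columns of `df_bcn`.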

```python
# import the dataset mapping each neighborhood to its district
df_bcn_districts = pd.read_csv("/Users/damienazzopardi/Documents/GitHub/Coursera_Capstone/Districts_Barcelona.csv")
df_bcn_districts.head()
```

|   | Neighborhood | District |
|---|---|---|
| 0 | Baró de Viver | Sant Andreu |
| 1 | Can Baró | Horta-Guinardó |
| 2 | Can Peguera | Nou Barris |
| 3 | Canyelles | Nou Barris |
| 4 | Ciutat Meridiana | Nou Barris |


```python
# merge both dataframes into one, keeping every neighborhood that has coordinates
df_barcelona = pd.merge(df_bcn, df_bcn_districts, how='left', left_on='Neighborhoods', right_on='Neighborhood')
df_barcelona.drop("Neighborhood", axis=1, inplace=True)
df_barcelona.head()
```

|   | Neighborhoods | Latitude | Longitude | District |
|---|---|---|---|---|
| 0 | Baró de Viver | 41.445815 | 2.198998 | Sant Andreu |
| 1 | Can Baró | 41.41676 | 2.162387 | Horta-Guinardó |
| 2 | Can Peguera | 41.434842 | 2.16645 | Nou Barris |
| 3 | Canyelles | 41.445033 | 2.16345 | Nou Barris |
| 4 | Ciutat Meridiana | 41.461208 | 2.174848 | Nou Barris |
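Because the merge uses `how='left'` on exact name matches, a neighborhood spelled differently in the two sources (for example, a missing accent or a doubled letter) keeps its coordinates but silently gets an empty `District`. A minimal sketch of how to surface such rows uses the `indicator` parameter of `pd.merge`, which adds a `_merge` column, and `validate="m:1"`, which raises if the district table lists the same neighborhood twice. The two small frames are hypothetical illustration data, with one name deliberately misspelled.

```python
import pandas as pd

# Hypothetical illustration data: 'Canyeles' is deliberately misspelled
coords = pd.DataFrame({
    "Neighborhoods": ["Can Baró", "Canyelles"],
    "Latitude": [41.416760, 41.445033],
    "Longitude": [2.162387, 2.163450],
})
districts = pd.DataFrame({
    "Neighborhood": ["Can Baró", "Canyeles"],
    "District": ["Horta-Guinardó", "Nou Barris"],
})

merged = pd.merge(
    coords, districts, how="left",
    left_on="Neighborhoods", right_on="Neighborhood",
    indicator=True,     # adds a '_merge' column: 'both' or 'left_only'
    validate="m:1",     # raises MergeError if a neighborhood appears twice in districts
)

# rows tagged 'left_only' found no district because their names did not match exactly
unmatched = merged.loc[merged["_merge"] == "left_only", "Neighborhoods"].tolist()
print(unmatched)  # → ['Canyelles']
```

Applied to `df_bcn` and `df_bcn_districts`, an empty `unmatched` list would confirm that every neighborhood received a district.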


```python
# geocode the city to get the map center
address = 'Barcelona, Spain'

geolocator = Nominatim(user_agent="barcelona_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Barcelona are {}, {}.'.format(latitude, longitude))
```

```
The geographical coordinates of Barcelona are 41.3828939, 2.1774322.
```
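Geocoding with Nominatim needs network access and is subject to the OpenStreetMap Nominatim usage policy, which limits request rates. As an offline alternative (an assumption, not what the notebook does), the map could be centered on the mean of the neighborhood coordinates already in `df_barcelona`. The hypothetical `centroid` helper below takes a plain average of latitudes and longitudes; that is not a true geodesic centroid, but over a city-sized area the difference should be small enough for centering a map.

```python
def centroid(points):
    """Arithmetic mean of (latitude, longitude) pairs.

    A plain average, not a geodesic centroid; adequate for centering a map
    over a small area such as a single city.
    """
    points = list(points)
    if not points:
        raise ValueError("need at least one point")
    lat = sum(p[0] for p in points) / len(points)
    lng = sum(p[1] for p in points) / len(points)
    return lat, lng


center = centroid([(41.0, 2.0), (42.0, 3.0)])
print(center)  # → (41.5, 2.5)
```

With pandas, the equivalent would be `df_barcelona[['Latitude', 'Longitude']].mean()`.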


```python
# create map of Barcelona using latitude and longitude values
map_barcelona = folium.Map(location=[latitude, longitude], zoom_start=12)

# add neighborhood markers to the map
for lat, lng, district, neighborhood in zip(df_barcelona['Latitude'], df_barcelona['Longitude'], df_barcelona['District'], df_barcelona['Neighborhoods']):
    label = '{}, {}'.format(neighborhood, district)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_barcelona)

map_barcelona
```