<a id='toc'></a>
<center><h1>The Battle of Neighborhoods</h1></center>
<center>Segmenting and Clustering Neighborhoods of Madrid and Barcelona</center>  


# 0. Table of Contents
- 1_    [Introduction](#introduction)
- 2_    [Objectives](#objective)
- 3_    [Required Libs](#libs)
- 4_    [Data](#data)
- 4.1_  [Data. Import Credentials to IBM WATSON Notebook](#data_credentials)
- 4.2_  [Data. Import pandas dataframe](#data_pandas)
- 5_    [Methodology](#methodology)
- 6_    [Foursquare. Credentials and version](#foursquare)
- 7_    [Madrid](#madrid)
- 7.1_  [Madrid Analyze](#amadrid)
- 7.2_  [Madrid: Getting coordinates and plot the map](#cmadrid) 
- 7.3_  [Madrid: Plotting all neighbourhood into map](#nmadrid) 
- 7.4_  [Madrid: Exploring the neighborhoods](#emadrid)
- 7.5_  [Madrid: Clustering the neighborhoods](#kmadrid)
- 8_    [Barcelona](#barcelona)
- 8.1_  [Barcelona Analyze](#abarcelona)
- 8.2_  [Barcelona: Getting coordinates and plot the map](#cbarcelona)
- 8.3_  [Barcelona: Plotting all neighbourhood into map](#nbarcelona)  
- 8.4_  [Barcelona: Exploring the neighborhoods](#ebarcelona)
- 8.5_  [Barcelona: Clustering the neighborhoods](#kbarcelona)
- 9_    [Results](#results)
- 10_   [Discussion](#discussion)
- 11_   [Conclusion](#conclusion)

<a id='introduction'></a>
# 1. Introduction

Madrid and Barcelona are Spain's two premier cities, and you can't go wrong with either. But if you had to choose one, which should it be?

Brief information about both cities:

- <b>Madrid</b> is the capital of Spain and the largest municipality in both the Community of Madrid and Spain as a whole. The city has almost 3.3 million inhabitants and a metropolitan area population of approximately 6.5 million. It is the third-largest city in the European Union (EU), smaller only than London and Berlin, and its monocentric metropolitan area is the third-largest in the EU, smaller only than those of London and Paris. The municipality covers 604.3 km² (233.3 sq mi). The Madrid urban agglomeration has the third-largest GDP in the European Union, and its influence in politics, education, entertainment, environment, media, fashion, science, culture, and the arts contributes to its status as one of the world's major global cities. Madrid is home to two world-famous football clubs, Real Madrid and Atlético Madrid. Due to its economic output, high standard of living, and market size, Madrid is considered the leading economic hub of the Iberian Peninsula and of Southern Europe. (source: https://en.wikipedia.org/wiki/Madrid)  



- <b>Barcelona</b> is one of the most popular cities in Spain. It is the capital and largest city of the autonomous community of Catalonia, as well as the second most populous municipality of Spain. It has a population of 1.6 million within city limits, and its urban area extends to numerous neighbouring municipalities within the Province of Barcelona and is home to around 4.8 million people, making it the sixth most populous urban area in the European Union after Paris, London, Madrid, the Ruhr area and Milan. It is one of the largest metropolises on the Mediterranean Sea, located on the coast between the mouths of the rivers Llobregat and Besòs, and bounded to the west by the Serra de Collserola mountain range, the tallest peak of which is 512 metres (1,680 feet) high.  (source: https://en.wikipedia.org/wiki/Barcelona)  


<a id='objective'></a>
# 2. Objective
In this project, we study area classification in detail, using Foursquare data together with machine-learning segmentation and clustering.
The aim is to segment areas of Madrid and Barcelona based on the most common venue categories retrieved from Foursquare.

Using segmentation and clustering, we will try to determine:

- a. How similar or dissimilar the two cities are
- b. How areas within each city can be classified, for example as tourist, residential, or other

<a id='libs'></a>
# 3. Required Libraries

In [1]:
!pip install folium
import requests                                  # library to handle HTTP requests
import pandas as pd                              # library for data analysis
import numpy as np                               # library to handle data in a vectorized manner
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium                                    # map plotting library

import types
import ibm_boto3
from botocore.client import Config
from geopy.geocoders import Nominatim            # module to convert an address into latitude and longitude values
from IPython.display import Image                # library for displaying images
from IPython.core.display import HTML            # library for displaying HTML
from sklearn.cluster import KMeans
from pandas.io.json import json_normalize        # transforms a JSON file into a pandas dataframe

 

Collecting folium
  Downloading https://files.pythonhosted.org/packages/72/ff/004bfe344150a064e558cb2aedeaa02ecbf75e60e148a55a9198f0c41765/folium-0.10.0-py2.py3-none-any.whl (91kB)
Collecting branca>=0.3.0 (from folium)
  Downloading https://files.pythonhosted.org/packages/63/36/1c93318e9653f4e414a2e0c3b98fc898b4970e939afeedeee6075dd3b703/branca-0.3.1-py3-none-any.whl
Installing collected packages: branca, folium
Successfully installed branca-0.3.1 folium-0.10.0


<a id='data'></a>
# 4. Data

The data were acquired from:

 URLs: https://opendata-ajuntament.barcelona.cat/data/es/organization/territori?q=Barrios&sort=fecha_publicacion+desc,
       https://datos.madrid.es/sites/v/index.jsp?vgnextoid=374512b9ace9f310VgnVCM100000171f5a0aRCRD&buscar=true&Texto=Distrito&Sector=&Formato=&Periodicidad=&orderByCombo=CONTENT_INSTANCE_NAME_DECODE   
 
 The data acquired from these URLs were restructured, for easier manipulation and reading, into two files:

 - barcelona.csv 
 - madrid.csv 
     
 These files were uploaded to the GitHub repository:
 
 https://github.com/alcupe/ds_capstone_example


<a id='data_credentials'></a>
## 4.1 Data. Import Credentials to IBM WATSON Notebook


In [2]:
# Barcelona 
# @hidden_cell
# The following code contains the credentials for a file in your IBM Cloud Object Storage.
# You might want to remove those credentials before you share your notebook.
credentials_1 = {
    'IAM_SERVICE_ID': 'iam-ServiceId-ff6aa9f5-b3fe-4e98-bbf2-af312097ec44',
    'IBM_API_KEY_ID': '6KJAVnk2cpJtasN25yOsAY8axOdBVIvfvAmsHBE0vpJB',
    'ENDPOINT': 'https://s3.eu-geo.objectstorage.service.networklayer.com',
    'IBM_AUTH_ENDPOINT': 'https://iam.eu-gb.bluemix.net/oidc/token',
    'BUCKET': 'dscoursera-donotdelete-pr-0vrzjcrfstxmux',
    'FILE': 'BARCELONA.csv'
}


In [3]:
# Madrid
# @hidden_cell
# The following code contains the credentials for a file in your IBM Cloud Object Storage.
# You might want to remove those credentials before you share your notebook.
credentials_2 = {
    'IAM_SERVICE_ID': 'iam-ServiceId-ff6aa9f5-b3fe-4e98-bbf2-af312097ec44',
    'IBM_API_KEY_ID': '6KJAVnk2cpJtasN25yOsAY8axOdBVIvfvAmsHBE0vpJB',
    'ENDPOINT': 'https://s3.eu-geo.objectstorage.service.networklayer.com',
    'IBM_AUTH_ENDPOINT': 'https://iam.eu-gb.bluemix.net/oidc/token',
    'BUCKET': 'dscoursera-donotdelete-pr-0vrzjcrfstxmux',
    'FILE': 'MADRID.csv'
}


<a id='data_pandas'></a>
## 4.2 Data. Import pandas dataframe


In [4]:
def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share your notebook.
client_4c4528dd32ec472798b66ad7d2e3df91 = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='6KJAVnk2cpJtasN25yOsAY8axOdBVIvfvAmsHBE0vpJB',
    ibm_auth_endpoint="https://iam.eu-gb.bluemix.net/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3.eu-geo.objectstorage.service.networklayer.com')


<a id='methodology'></a>
# 5.  Methodology

- Use the Foursquare API to explore neighborhoods in both cities, Madrid and Barcelona.
- Write a function that extracts the most common venue categories in each neighborhood, then use these features to group the neighborhoods into clusters.
- Use the k-means clustering algorithm to form the clusters.
- Use the Folium library to visualize the neighborhoods of Madrid and Barcelona and their emerging clusters.
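The feature-building step in this list can be illustrated without calling the API. The sketch below is a minimal, standard-library-only illustration and not the notebook's implementation, which uses pandas further down; the venue records and the helper name `category_frequencies` are made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical (neighborhood, venue category) pairs standing in for Foursquare results.
toy_venues = [
    ("A", "Bar"), ("A", "Bar"), ("A", "Café"), ("A", "Plaza"),
    ("B", "Park"), ("B", "Park"), ("B", "Bar"),
]

def category_frequencies(venues):
    """Return {neighborhood: {category: share of that neighborhood's venues}}."""
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1
    profiles = {}
    for neighborhood, counter in counts.items():
        total = sum(counter.values())
        profiles[neighborhood] = {cat: n / total for cat, n in counter.items()}
    return profiles

profiles = category_frequencies(toy_venues)
for hood, profile in sorted(profiles.items()):
    top = max(profile, key=profile.get)   # most common category in this neighborhood
    print(hood, top, round(profile[top], 2))
```

Each neighborhood's profile sums to 1, so neighborhoods with very different venue counts remain comparable when they are clustered.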

<a id='foursquare'></a>
# 6.  Foursquare. Credentials and version


In [5]:
CLIENT_ID = 'QL2OMYOY1B24UKBMDUGFY0HY0IKMSQIUCIPLHRK334LUQYZZ'
CLIENT_SECRET = 'K5KVFRWRKBQFTESD24TEQVG2EBWC4DCGO1HYHNS0WUOPFDTA' 
VERSION = '20180604'
LIMIT = 30
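
Hard-coding API keys in a shared notebook exposes them to every reader. One possible alternative is to read them from environment variables, as in the minimal sketch below. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are assumptions made for this example; neither the notebook nor Foursquare defines them.

```python
import os

def load_foursquare_credentials(env=None):
    """Read the client id and secret from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    keys = ("FOURSQUARE_CLIENT_ID", "FOURSQUARE_CLIENT_SECRET")
    missing = [k for k in keys if not env.get(k)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[keys[0]], env[keys[1]]

# Demo with an explicit mapping, so the sketch runs without touching the real environment.
demo_id, demo_secret = load_foursquare_credentials(
    {"FOURSQUARE_CLIENT_ID": "demo-id", "FOURSQUARE_CLIENT_SECRET": "demo-secret"}
)
print(demo_id)
```

In the notebook, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` would then replace the literal strings.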

<a id='madrid'></a>
# 7.  Madrid


In [6]:
body = client_4c4528dd32ec472798b66ad7d2e3df91.get_object(Bucket='dscoursera-donotdelete-pr-0vrzjcrfstxmux',Key='MADRID.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

# MADRID dataframe
dfm= pd.read_csv(body, encoding='latin-1')
dfm.head()

Unnamed: 0,City,District,Neighborhood,Postal,Latitude,Longitude
0,Madrid,SALAMANCA,RECOLETOS,28001,40.41669,-3.700346
1,Madrid,CHAMARTIN,PROSPERIDAD,28002,40.443681,-3.673545
2,Madrid,CHAMARTIN,CIUDAD JARDIN,28002,40.451219,-3.667724
3,Madrid,CHAMARTIN,EL VISO,28002,40.444624,-3.678637
4,Madrid,CHAMBERI,RIOS ROSAS,28003,40.446613,-3.703114


<a id='amadrid'></a>
## 7.1 Madrid Analyze


In [7]:
# Note: dfm.shape[0] counts table rows; a neighborhood name can appear in more than one row.
print('Madrid has {} districts and {} neighborhoods.'.format(len(dfm['District'].unique()),dfm.shape[0]))

dfmg = dfm.groupby(["Postal", "District"])["Neighborhood"].apply(", ".join).reset_index()
dfmg.head()



Madrid has 22 districts and 247 neighborhoods.


Unnamed: 0,Postal,District,Neighborhood
0,28001,SALAMANCA,RECOLETOS
1,28002,CHAMARTIN,"PROSPERIDAD, CIUDAD JARDIN, EL VISO"
2,28003,CHAMBERI,"RIOS ROSAS, VALLEHERMOSO, VALLEHERMOSO"
3,28004,CENTRO,"JUSTICIA, UNIVERSIDAD"
4,28005,ARGANZUELA,"IMPERIAL, ACACIAS, IMPERIAL, ACACIAS, IMPERIAL"
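
The joined strings above repeat names such as VALLEHERMOSO and IMPERIAL because the source table lists some neighborhoods more than once per postal code. A minimal sketch of an order-preserving join that drops the repeats follows; the helper name `join_unique` is hypothetical.

```python
def join_unique(names, sep=", "):
    """Join names, keeping only the first occurrence of each, in original order."""
    # dict.fromkeys preserves insertion order (Python 3.7+) and discards duplicates.
    return sep.join(dict.fromkeys(names))

print(join_unique(["RIOS ROSAS", "VALLEHERMOSO", "VALLEHERMOSO"]))
# RIOS ROSAS, VALLEHERMOSO
```

Assuming the grouped values are plain strings, `join_unique` should be usable in place of `", ".join` in the `groupby(...).apply(...)` call.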


<a id='cmadrid'></a>
## 7.2 Madrid: Getting coordinates and plot the map

In [8]:
address = "Madrid"
geolocator = Nominatim(user_agent="madrid_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('Coordinates of Madrid:  {}, {}'.format(latitude, longitude))
# create map using latitude and longitude values
madrid_m = folium.Map(location=[latitude, longitude], zoom_start=10)
madrid_m

Coordinates of Madrid:  40.4167047, -3.7035825


<a id='nmadrid'></a>
## 7.3 Madrid: Plotting all neighbourhood into map

In [9]:
for lat, lng, District, Neighborhood in zip(dfm['Latitude'], dfm['Longitude'], dfm['District'], dfm['Neighborhood']):
    label = '{}, {}'.format(Neighborhood, District)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(madrid_m)  
madrid_m

<a id='emadrid'></a>
## 7.4 Madrid: Exploring the neighborhoods


In [10]:
# Get the neighborhood's coordinates
neighborhood_n = dfm.loc[0, 'Neighborhood']
neighborhood_lat = dfm.loc[0, 'Latitude']
neighborhood_lon = dfm.loc[0, 'Longitude']



In [11]:
# Get the top 100 venues within a radius of 500 meters
# defining radius and limit of venues to get
radius=500
LIMIT=100
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, neighborhood_lat, neighborhood_lon, radius, LIMIT)
# get the result to a json file
results = requests.get(url).json()
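
The request URL above is assembled by string formatting. As a hedged alternative, the sketch below builds the same query string with `urllib.parse.urlencode`, which escapes each value. The helper name `build_explore_url` and the placeholder credentials are assumptions for illustration, and no network request is made.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the explore URL with the same parameters the notebook passes."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),   # latitude,longitude pair
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Placeholder credentials; the coordinates are those of the first row of the Madrid table.
demo_url = build_explore_url("ID", "SECRET", "20180604", 40.41669, -3.700346)
print(demo_url)
```

Note that `urlencode` percent-encodes the comma in `ll` as `%2C`; this is standard URL escaping of the same value.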

In [12]:
# Clean the JSON and structure it into a pandas dataframe
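def get_category_type(row):
    # the category list lives under 'categories' or, after flattening, under 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    # venues without any category yield None
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']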


In [13]:
venues = results['response']['groups'][0]['items']

# flatten JSON
nearby_venues = json_normalize(venues) 

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
nearby_venues

Unnamed: 0,name,categories,lat,lng
0,La Cabaña Argentina,Argentinian Restaurant,40.415696,-3.698974
1,Salmon Gurú,Cocktail Bar,40.414867,-3.699532
2,Bacoa,Burger Joint,40.416698,-3.701682
3,Apple Puerta del Sol,Electronics Store,40.416804,-3.702221
4,Casino de Madrid,Casino,40.417705,-3.700037
5,La Pulpería de Victoria,Seafood Restaurant,40.416506,-3.701709
6,El Inti de Oro,Peruvian Restaurant,40.415751,-3.698969
7,Arrocería Marina Ventura,Paella Restaurant,40.415353,-3.698941
8,Plaza de Santa Ana,Plaza,40.414631,-3.701033
9,InClan Brutal Bar,Tapas Restaurant,40.415023,-3.701864


In [14]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood','Neighborhood Latitude','Neighborhood Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude','Venue Category']
    
    return(nearby_venues)
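
`getNearbyVenues` indexes `v['venue']['categories'][0]` directly, which would raise an `IndexError` for a venue returned without categories. The sketch below shows one way to flatten the items more defensively. The helper name `venue_rows` is hypothetical, and the mock payload only mirrors the keys that the function above reads; its second entry is invented to exercise the empty-category case.

```python
def venue_rows(name, lat, lng, items):
    """Turn Foursquare-style items into flat tuples, tolerating missing fields."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # Use the first category name when present, otherwise None.
        category = categories[0].get("name") if categories else None
        rows.append((name, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows

# Mock payload (illustrative values, not an actual API response).
mock_items = [
    {"venue": {"name": "Bacoa", "location": {"lat": 40.416698, "lng": -3.701682},
               "categories": [{"name": "Burger Joint"}]}},
    {"venue": {"name": "Uncategorized venue", "location": {"lat": 40.4, "lng": -3.7},
               "categories": []}},
]
rows = venue_rows("RECOLETOS", 40.41669, -3.700346, mock_items)
for row in rows:
    print(row)
```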

In [15]:
# Venues for each neighborhood

madrid_v = getNearbyVenues(names = dfm['Neighborhood'], latitudes = dfm['Latitude'], longitudes = dfm['Longitude'])

RECOLETOS           
PROSPERIDAD
CIUDAD JARDIN
EL VISO
RIOS ROSAS
VALLEHERMOSO
VALLEHERMOSO
JUSTICIA
UNIVERSIDAD
EMBAJADORES
IMPERIAL
ACACIAS
IMPERIAL
PALACIO
ACACIAS
PALACIO
EMBAJADORES
IMPERIAL
PALACIO
EL VISO
EL VISO
CASTELLANA
ESTRELLA
SANTA EUGENIA
PACIFICO
ADELFAS
CASA DE CAMPO
CASA DE CAMPO
ARGUELLES
NIÑO JESUS
IBIZA
TRAFALGAR
LUCERO
PUERTA DEL ANGEL
PUERTA DEL ANGEL
PUERTA DEL ANGEL
EMBAJADORES
CORTES
PACIFICO
ARAPILES
ARAPILES
NUEVA ESPAÑA
NUEVA ESPAÑA
NUEVA ESPAÑA
PUEBLO NUEVO
VENTAS
VENTAS
VENTAS
PALOMERAS SURESTE
PALOMERAS SURESTE
PALOMERAS SURESTE
PALOMERAS SURESTE
PALOMERAS BAJAS
PALOMERAS BAJAS
PALOMERAS BAJAS
PALOMERAS BAJAS
COMILLAS
SAN ISIDRO
SAN ISIDRO
COMILLAS
OPAÑEL
CUATRO CAMINOS
CASTILLEJOS
LOS ROSALES
BUTARQUE
SAN CRISTOBAL
SAN ANDRES
BUTARQUE
BUTARQUE
SAN ANDRES
SAN CRISTOBAL
SAN ANDRES
SAN CRISTOBAL
LOS ANGELES
SAN CRISTOBAL
SAN ANDRES
SAN ANDRES
HELLIN
HELLIN
REJAS
ARCOS
ROSAS
CANILLEJAS
ROSAS
CANILLEJAS
ROSAS
ARAVACA
ARAVACA
LAS AGUILAS
ALUCHE
CAMPAMENTO
CAM

In [16]:
madrid_v.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,RECOLETOS,40.41669,-3.700346,La Cabaña Argentina,40.415696,-3.698974,Argentinian Restaurant
1,RECOLETOS,40.41669,-3.700346,Salmon Gurú,40.414867,-3.699532,Cocktail Bar
2,RECOLETOS,40.41669,-3.700346,Bacoa,40.416698,-3.701682,Burger Joint
3,RECOLETOS,40.41669,-3.700346,Apple Puerta del Sol,40.416804,-3.702221,Electronics Store
4,RECOLETOS,40.41669,-3.700346,Casino de Madrid,40.417705,-3.700037,Casino


In [17]:
madrid_v.shape

(5222, 7)

In [18]:
# Categories from all venues
madrid_v.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
ABRANTES,51,51,51,51,51,51
ACACIAS,137,137,137,137,137,137
ADELFAS,34,34,34,34,34,34
ALAMEDA DE OSUNA,23,23,23,23,23,23
ALMENARA,63,63,63,63,63,63
ALMENDRALES,33,33,33,33,33,33
ALUCHE,90,90,90,90,90,90
AMBROZ,11,11,11,11,11,11
APOSTOL SANTIAGO,7,7,7,7,7,7
ARAPILES,149,149,149,149,149,149


In [19]:
print('There are {} unique categories.'.format(len(madrid_v['Venue Category'].unique())))

There are 276 unique categories.


In [20]:
# Analyze each neighborhood
# one hot encoding
madrid_oh = pd.get_dummies(madrid_v[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
madrid_oh['Neighborhood'] = madrid_v['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [madrid_oh.columns[-1]] + list(madrid_oh.columns[:-1])
madrid_oh = madrid_oh[fixed_columns]

# examine the dataframe size after one hot encoding
print('{} rows were returned after one hot encoding.'.format(madrid_oh.shape[0]))

# group rows by neighborhood, taking the mean frequency of occurrence of each category
madrid_g = madrid_oh.groupby('Neighborhood').mean().reset_index()

# examine the dataframe size after grouping
print('{} rows were returned after grouping.'.format(madrid_g.shape[0]))

madrid_oh.head()

5222 rows were returned after one hot encoding.
103 rows were returned after grouping.


Unnamed: 0,Yoga Studio,Accessories Store,African Restaurant,Airport,American Restaurant,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,...,Train Station,Turkish Restaurant,Used Bookstore,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store
0,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [21]:
madrid_oh.shape

(5222, 276)

In [22]:
#print each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in madrid_g['Neighborhood']:
    print("----"+hood+"----")
    temp = madrid_g[madrid_g['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----ABRANTES----
          venue  freq
0         Plaza  0.10
1        Bakery  0.10
2           Bar  0.08
3  Soccer Field  0.08
4  Burger Joint  0.06


----ACACIAS----
                venue  freq
0  Spanish Restaurant  0.09
1    Tapas Restaurant  0.08
2                 Bar  0.07
3          Restaurant  0.06
4         Pizza Place  0.04


----ADELFAS----
               venue  freq
0      Grocery Store  0.06
1               Café  0.06
2                Bar  0.06
3        Pizza Place  0.06
4  Food & Drink Shop  0.06


----ALAMEDA DE OSUNA----
              venue  freq
0        Restaurant  0.13
1     Metro Station  0.09
2     Grocery Store  0.09
3  Tapas Restaurant  0.09
4             Hotel  0.04


----ALMENARA----
                      venue  freq
0        Spanish Restaurant  0.14
1                Restaurant  0.08
2        Italian Restaurant  0.05
3  Mediterranean Restaurant  0.05
4                     Hotel  0.05


----ALMENDRALES----
                venue  freq
0  Spanish Restaurant  0.12
1

In [23]:
madrid_gr = madrid_oh.groupby('Neighborhood').mean().reset_index()
madrid_gr.head()

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,African Restaurant,Airport,American Restaurant,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,...,Train Station,Turkish Restaurant,Used Bookstore,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store
0,ABRANTES,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,ACACIAS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043796,...,0.0,0.0,0.0,0.007299,0.0,0.0,0.0,0.0,0.0,0.0
2,ADELFAS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,ALAMEDA DE OSUNA,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,ALMENARA,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,...,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0


In [24]:
def common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

In [25]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']
# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe (once, after all column names have been built)
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = madrid_gr['Neighborhood']

for ind in np.arange(madrid_gr.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = common_venues(madrid_gr.iloc[ind, :], num_top_venues)
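
The column names above rely on a list lookup that falls back to "th" once the `['st', 'nd', 'rd']` list is exhausted, which is correct for up to 20 columns. A small sketch of a general ordinal helper is given below in case more columns are ever needed; the function name `ordinal` is hypothetical.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"                      # 11th, 12th, 13th, 111th, ...
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)
]
print(columns)
```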

In [26]:
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ABRANTES,Plaza,Bakery,Soccer Field,Bar,Colombian Restaurant,Burger Joint,Brazilian Restaurant,Metro Station,Gym / Fitness Center,Park
1,ACACIAS,Spanish Restaurant,Tapas Restaurant,Bar,Restaurant,Café,Art Gallery,Indie Theater,Pizza Place,Theater,Market
2,ADELFAS,Grocery Store,Pizza Place,Food & Drink Shop,Bar,Café,Korean Restaurant,Chinese Restaurant,Farmers Market,Coffee Shop,Gift Shop
3,ALAMEDA DE OSUNA,Restaurant,Grocery Store,Metro Station,Tapas Restaurant,Mexican Restaurant,Bakery,Park,Coffee Shop,Asian Restaurant,Cocktail Bar
4,ALMENARA,Spanish Restaurant,Restaurant,Chinese Restaurant,Hotel,Mediterranean Restaurant,Italian Restaurant,Gym / Fitness Center,BBQ Joint,Fast Food Restaurant,Plaza


<a id='kmadrid'></a>
## 7.5 Madrid: Clustering the neighborhoods

In [27]:
# Cluster neighborhoods
# Run k-means to cluster the neighborhoods into 5 clusters.
# number of clusters
kclusters = 5
madrid_gc = madrid_gr.drop('Neighborhood', axis=1)
# run k-means 
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(madrid_gc)
# check cluster labels 
kmeans.labels_[0:10]

array([1, 1, 1, 0, 3, 3, 1, 0, 0, 3], dtype=int32)
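
To make the clustering step less of a black box, the sketch below implements the basic k-means iteration (Lloyd's algorithm) in plain Python on a toy set of two-category frequency vectors. It is an illustration only: scikit-learn's `KMeans` uses k-means++ initialization by default and a more careful implementation, so its labels will generally not match this toy. The function name `simple_kmeans` and the data points are assumptions for the example.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def simple_kmeans(points, k, iterations=20, seed=0):
    """Basic Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)      # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Toy frequency vectors (share of bars, share of parks) for four made-up neighborhoods.
toy_points = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
labels, centroids = simple_kmeans(toy_points, k=2)
print(labels)
```

The two bar-heavy points end up in one cluster and the two park-heavy points in the other; which cluster receives label 0 depends on the random initialization.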

In [28]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
neighborhoods_venues_sorted.head()

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,ABRANTES,Plaza,Bakery,Soccer Field,Bar,Colombian Restaurant,Burger Joint,Brazilian Restaurant,Metro Station,Gym / Fitness Center,Park
1,1,ACACIAS,Spanish Restaurant,Tapas Restaurant,Bar,Restaurant,Café,Art Gallery,Indie Theater,Pizza Place,Theater,Market
2,1,ADELFAS,Grocery Store,Pizza Place,Food & Drink Shop,Bar,Café,Korean Restaurant,Chinese Restaurant,Farmers Market,Coffee Shop,Gift Shop
3,0,ALAMEDA DE OSUNA,Restaurant,Grocery Store,Metro Station,Tapas Restaurant,Mexican Restaurant,Bakery,Park,Coffee Shop,Asian Restaurant,Cocktail Bar
4,3,ALMENARA,Spanish Restaurant,Restaurant,Chinese Restaurant,Hotel,Mediterranean Restaurant,Italian Restaurant,Gym / Fitness Center,BBQ Joint,Fast Food Restaurant,Plaza


In [29]:
madrid_m = dfm
# merging 
madrid_m = madrid_m.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
madrid_m.head()

Unnamed: 0,City,District,Neighborhood,Postal,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Madrid,SALAMANCA,RECOLETOS,28001,40.41669,-3.700346,3,Spanish Restaurant,Hotel,Restaurant,Tapas Restaurant,Plaza,Nightclub,Theater,Cocktail Bar,Deli / Bodega,Pub
1,Madrid,CHAMARTIN,PROSPERIDAD,28002,40.443681,-3.673545,1,Spanish Restaurant,Bar,Supermarket,Restaurant,Café,Tapas Restaurant,Gym,Metro Station,Coffee Shop,Plaza
2,Madrid,CHAMARTIN,CIUDAD JARDIN,28002,40.451219,-3.667724,3,Coffee Shop,Seafood Restaurant,Café,Bakery,Asian Restaurant,Tapas Restaurant,Rock Club,Music Venue,Pub,Thai Restaurant
3,Madrid,CHAMARTIN,EL VISO,28002,40.444624,-3.678637,3,Spanish Restaurant,Restaurant,Plaza,Coffee Shop,Bar,Theme Restaurant,Pizza Place,Snack Place,Burger Joint,Japanese Restaurant
4,Madrid,CHAMBERI,RIOS ROSAS,28003,40.446613,-3.703114,1,Spanish Restaurant,Bar,Tapas Restaurant,Restaurant,Grocery Store,Café,Supermarket,Latin American Restaurant,Pizza Place,Taco Place


In [30]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(
    madrid_m['Latitude'],
    madrid_m['Longitude'],
    madrid_m['Neighborhood'],
    madrid_m['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)

    # labels run from 0 to kclusters-1, so they index the color list directly
    folium.CircleMarker(
                    [lat, lon],
                    radius = 5,
                    popup = label,
                    color = rainbow[int(cluster)],
                    fill = True,
                    fill_color = rainbow[int(cluster)],
                    fill_opacity=0.7).add_to(map_clusters)
map_clusters
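
The map assigns one color to each cluster label. As a sketch of the same idea without matplotlib, the snippet below generates `k` evenly spaced hues with the standard-library `colorsys` module. The helper name `cluster_palette` and the saturation and brightness values are arbitrary choices for illustration, not the notebook's color scheme.

```python
import colorsys

def cluster_palette(k):
    """Return k evenly spaced hex colors, one per cluster label 0..k-1."""
    palette = []
    for i in range(k):
        # Spread hues evenly around the color wheel; fixed saturation and value.
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(
            int(r * 255), int(g * 255), int(b * 255)))
    return palette

palette = cluster_palette(5)
print(palette)
```

Indexing with `palette[cluster]` maps label 0 to the first color and label `k-1` to the last.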

In [33]:
# Madrid: Cluster 0
madrid_m.loc[madrid_m['Cluster Labels'] == 0, madrid_m.columns[[1] + list(range(5, madrid_m.shape[1]))]]

Unnamed: 0,District,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
48,PUENTE DE VALLECAS,-3.642605,0,Grocery Store,Pizza Place,Spanish Restaurant,Fast Food Restaurant,Café,Bar,Fruit & Vegetable Store,Chinese Restaurant,Park,Dessert Shop
49,PUENTE DE VALLECAS,-3.639028,0,Grocery Store,Pizza Place,Spanish Restaurant,Fast Food Restaurant,Café,Bar,Fruit & Vegetable Store,Chinese Restaurant,Park,Dessert Shop
50,PUENTE DE VALLECAS,-3.64395,0,Grocery Store,Pizza Place,Spanish Restaurant,Fast Food Restaurant,Café,Bar,Fruit & Vegetable Store,Chinese Restaurant,Park,Dessert Shop
51,PUENTE DE VALLECAS,-3.642006,0,Grocery Store,Pizza Place,Spanish Restaurant,Fast Food Restaurant,Café,Bar,Fruit & Vegetable Store,Chinese Restaurant,Park,Dessert Shop
57,CARABANCHEL,-3.72324,0,Grocery Store,Coffee Shop,Fast Food Restaurant,Moving Target,Stadium,Supermarket,Flea Market,Metro Station,Gym / Fitness Center,Plaza
58,CARABANCHEL,-3.736809,0,Grocery Store,Coffee Shop,Fast Food Restaurant,Moving Target,Stadium,Supermarket,Flea Market,Metro Station,Gym / Fitness Center,Plaza
63,VILLAVERDE,-3.686185,0,Metro Station,Pet Store,Spanish Restaurant,Snack Place,Pizza Place,Supermarket,Basketball Court,Bus Station,Shopping Mall,Brewery
65,VILLAVERDE,-3.687001,0,Furniture / Home Store,Train Station,Breakfast Spot,Snack Place,Park,Athletics & Sports,Restaurant,Smoke Shop,Food & Drink Shop,Locksmith
70,VILLAVERDE,-3.685296,0,Furniture / Home Store,Train Station,Breakfast Spot,Snack Place,Park,Athletics & Sports,Restaurant,Smoke Shop,Food & Drink Shop,Locksmith
72,VILLAVERDE,-3.689582,0,Furniture / Home Store,Train Station,Breakfast Spot,Snack Place,Park,Athletics & Sports,Restaurant,Smoke Shop,Food & Drink Shop,Locksmith


In [32]:
# Madrid: Cluster 1
madrid_m.loc[madrid_m['Cluster Labels'] == 1, madrid_m.columns[[1] + list(range(5, madrid_m.shape[1]))]]

Unnamed: 0,District,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,SALAMANCA,-3.700346,1,Spanish Restaurant,Hotel,Tapas Restaurant,Restaurant,Plaza,Nightclub,Theater,Cocktail Bar,Peruvian Restaurant,Pub
1,CHAMARTIN,-3.673545,1,Spanish Restaurant,Bar,Restaurant,Supermarket,Café,Grocery Store,Tapas Restaurant,Italian Restaurant,Bakery,Coffee Shop
2,CHAMARTIN,-3.667724,1,Coffee Shop,Bakery,Café,Seafood Restaurant,Tapas Restaurant,Asian Restaurant,Burger Joint,Breakfast Spot,Spanish Restaurant,Middle Eastern Restaurant
3,CHAMARTIN,-3.678637,1,Spanish Restaurant,Restaurant,Coffee Shop,Plaza,Bar,Burger Joint,Theme Restaurant,Snack Place,Japanese Restaurant,Nightclub
4,CHAMBERI,-3.703114,1,Spanish Restaurant,Tapas Restaurant,Bar,Grocery Store,Restaurant,Café,Italian Restaurant,Soccer Field,Supermarket,Bakery
5,CHAMBERI,-3.708835,1,Bar,Café,Spanish Restaurant,Beer Garden,Seafood Restaurant,Theater,Restaurant,Sandwich Place,Burrito Place,Pizza Place
6,CHAMBERI,-3.710968,1,Bar,Café,Spanish Restaurant,Beer Garden,Seafood Restaurant,Theater,Restaurant,Sandwich Place,Burrito Place,Pizza Place
7,CENTRO,-3.699723,1,Restaurant,Bar,Cocktail Bar,Plaza,Bookstore,Spanish Restaurant,Tapas Restaurant,Ice Cream Shop,Hotel,Italian Restaurant
8,CENTRO,-3.704582,1,Bar,Tapas Restaurant,Restaurant,Cocktail Bar,Bookstore,Café,Plaza,Gift Shop,Mediterranean Restaurant,Ice Cream Shop
9,CENTRO,-3.705528,1,Bar,Café,Tapas Restaurant,Spanish Restaurant,Theater,Coffee Shop,Restaurant,Plaza,Hotel,Art Gallery


In [37]:
# Madrid: Cluster 2
madrid_m.loc[madrid_m['Cluster Labels'] == 2, madrid_m.columns[[1] + list(range(5, madrid_m.shape[1]))]]

Unnamed: 0,District,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,LATINA,-3.749873,2,Theme Park Ride / Attraction,Restaurant,Supermarket,Tapas Restaurant,Snack Place,Theme Park,Seafood Restaurant,Park,Fast Food Restaurant,Falafel Restaurant


In [35]:
# Madrid: Cluster 3
madrid_m.loc[madrid_m['Cluster Labels'] == 3, madrid_m.columns[[1] + list(range(5, madrid_m.shape[1]))]]

Unnamed: 0,District,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,SALAMANCA,-3.700346,3,Spanish Restaurant,Hotel,Restaurant,Tapas Restaurant,Plaza,Nightclub,Theater,Cocktail Bar,Deli / Bodega,Pub
2,CHAMARTIN,-3.667724,3,Coffee Shop,Seafood Restaurant,Café,Bakery,Asian Restaurant,Tapas Restaurant,Rock Club,Music Venue,Pub,Thai Restaurant
3,CHAMARTIN,-3.678637,3,Spanish Restaurant,Restaurant,Plaza,Coffee Shop,Bar,Theme Restaurant,Pizza Place,Snack Place,Burger Joint,Japanese Restaurant
19,CHAMARTIN,-3.681618,3,Spanish Restaurant,Restaurant,Plaza,Coffee Shop,Bar,Theme Restaurant,Pizza Place,Snack Place,Burger Joint,Japanese Restaurant
20,CHAMARTIN,-3.686146,3,Spanish Restaurant,Restaurant,Plaza,Coffee Shop,Bar,Theme Restaurant,Pizza Place,Snack Place,Burger Joint,Japanese Restaurant
21,SALAMANCA,-3.679669,3,Spanish Restaurant,Restaurant,Coffee Shop,Tapas Restaurant,Hotel,Seafood Restaurant,Mediterranean Restaurant,Bar,Nightclub,Asian Restaurant
28,MONCLOA-ARAVACA,-3.714912,3,Hotel,Spanish Restaurant,Tapas Restaurant,Bar,Mexican Restaurant,Mediterranean Restaurant,Coffee Shop,Breakfast Spot,Restaurant,Movie Theater
30,RETIRO,-3.677873,3,Spanish Restaurant,Restaurant,Tapas Restaurant,Ice Cream Shop,Indian Restaurant,Seafood Restaurant,Mediterranean Restaurant,Gastropub,Bakery,Cupcake Shop
37,CENTRO,-3.692921,3,Spanish Restaurant,Restaurant,Bar,Hotel,Café,Art Museum,Garden,Tapas Restaurant,Breakfast Spot,Bookstore
39,CHAMBERI,-3.709739,3,Spanish Restaurant,Bar,Tapas Restaurant,Restaurant,Café,Theater,Supermarket,Italian Restaurant,Pub,Gastropub


In [38]:
# Madrid: Cluster 4
madrid_m.loc[madrid_m['Cluster Labels'] == 4, madrid_m.columns[[1] + list(range(5, madrid_m.shape[1]))]]

Unnamed: 0,District,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
64,VILLAVERDE,-3.674747,4,Park,Sandwich Place,Plaza,Sports Bar,Playground,Trail,Bar,Latin American Restaurant,Grocery Store,Flea Market
67,VILLAVERDE,-3.674981,4,Park,Sandwich Place,Plaza,Sports Bar,Playground,Trail,Bar,Latin American Restaurant,Grocery Store,Flea Market
68,VILLAVERDE,-3.677835,4,Park,Sandwich Place,Plaza,Sports Bar,Playground,Trail,Bar,Latin American Restaurant,Grocery Store,Flea Market


<a id='barcelona'></a>
# 8.  Barcelona

In [39]:
body = client_4c4528dd32ec472798b66ad7d2e3df91.get_object(Bucket='dscoursera-donotdelete-pr-0vrzjcrfstxmux',Key='BARCELONA.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

# BARCELONA dataframe
dfb = pd.read_csv(body, encoding='latin-1')
dfb.head()

Unnamed: 0,City,District,Neighborhood,Postal,Latitude,Longitude
0,Barcelona,Ciutat Vella,El Raval,8001,41.373601,2.168384
1,Barcelona,Ciutat Vella,El Barrio Gótico,8002,41.378367,2.175353
2,Barcelona,Ciutat Vella,La Barceloneta,8003,41.379057,2.190086
3,Barcelona,Ciutat Vella,"Sant Pere, Santa Caterina i la Ribera",8003,41.38669,2.177982
4,Barcelona,Eixample,El Fort Pienc,8013,41.390645,2.178475


<a id='abarcelona'></a>
## 8.1 Barcelona Analysis


In [40]:
print('Barcelona has {} districts and {} neighborhoods.'.format(len(dfb['District'].unique()),dfb.shape[0]))

# list the neighborhoods that share each postal code and district
dfbg = dfb.groupby(["Postal", "District"])["Neighborhood"].apply(", ".join).reset_index()
dfbg.head()


Barcelona has 10 districts and 73 neighborhoods.


Unnamed: 0,Postal,District,Neighborhood
0,8001,Ciutat Vella,El Raval
1,8002,Ciutat Vella,El Barrio Gótico
2,8003,Ciutat Vella,"La Barceloneta, Sant Pere, Santa Caterina i la..."
3,8004,Sants Montjuic,"El Poble Sec, La Font de la Guatlla"
4,8005,Sant Martí,"La Vila Olímpica del Poblenou, El Poblenou"


<a id='cbarcelona'></a>
## 8.2 Barcelona: Getting coordinates and plotting the map

In [41]:
address = "Barcelona"
geolocator = Nominatim(user_agent="barcelona_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('Coordinates of Barcelona:  {}, {}'.format(latitude, longitude))
# create map using latitude and longitude values
barcelona_m = folium.Map(location=[latitude, longitude], zoom_start=10)
barcelona_m

Coordinates of Barcelona:  41.3828939, 2.1774322


<a id='nbarcelona'></a>
## 8.3 Barcelona: Plotting all neighbourhoods on the map
    

In [42]:
# add one circle marker per neighborhood, labelled "Neighborhood, District"
for lat, lng, district, neighborhood in zip(dfb['Latitude'], dfb['Longitude'], dfb['District'], dfb['Neighborhood']):
    label = '{}, {}'.format(neighborhood, district)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(barcelona_m)
barcelona_m

<a id='ebarcelona'></a>
## 8.4 Barcelona: Exploring the neighborhoods


In [43]:
# use the first neighborhood in the dataframe (El Raval) as a sample
neighborhood_n =   dfb.loc[0, 'Neighborhood']
neighborhood_lat = dfb.loc[0, 'Latitude']
neighborhood_lon = dfb.loc[0, 'Longitude']

In [44]:
# Get the top 100 venues within a radius of 500 meters
# radius (in meters) and maximum number of venues to request
radius = 500
LIMIT = 100
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, neighborhood_lat, neighborhood_lon, radius, LIMIT)
# send the request and parse the JSON response
results = requests.get(url).json()
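
The request URL above is assembled by string formatting. As a hedged alternative, the sketch below builds the same query string with the standard library's `urllib.parse.urlencode`, which escapes each value. The helper name `build_explore_url`, the placeholder credentials and the version string are illustrative assumptions, not part of the original notebook.

```python
from urllib.parse import urlencode

# same endpoint as the one used in the notebook
EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Return the explore URL for one point (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude pair
        'radius': radius,                # meters
        'limit': limit,                  # maximum number of venues
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

# placeholder credentials and version, for illustration only
example_url = build_explore_url('MY_ID', 'MY_SECRET', '20180605', 41.373601, 2.168384)
print(example_url)
```

The comma in the `ll` value is percent-encoded by `urlencode`, which is the usual behaviour for query strings.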

In [45]:
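def get_category_type(row):
    # the category list may sit under 'categories' or 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    # return the name of the first category, or None if the venue has none
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']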


In [46]:
venues = results['response']['groups'][0]['items']

# flatten JSON
nearby_venues = json_normalize(venues) 

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
nearby_venues

Unnamed: 0,name,categories,lat,lng
0,Palo Cortao,Tapas Restaurant,41.372803,2.167719
1,Koska,Tapas Restaurant,41.373124,2.166040
2,Teatre Victoria,Theater,41.374573,2.168835
3,Sala Apolo,Concert Hall,41.374355,2.169668
4,Quimet & Quimet,Tapas Restaurant,41.373997,2.165522
5,La Chana,Tapas Restaurant,41.374469,2.165774
6,La Confitería,Cocktail Bar,41.375520,2.167962
7,Abirradero,Brewery,41.374280,2.168565
8,La Tasqueta de Blai,Tapas Restaurant,41.373358,2.165361
9,Hotel Brummell,Hotel,41.371698,2.166362


In [47]:
# Explore neighborhoods
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood','Neighborhood Latitude','Neighborhood Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude','Venue Category']
    
    return(nearby_venues)
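
`getNearbyVenues` indexes `categories[0]` for every venue, which would raise an `IndexError` if a venue came back with an empty category list (the earlier `get_category_type` guards against that case). Below is a minimal sketch of a row extractor that tolerates it, assuming the response items have the `venue` structure used above. The helper name `venue_rows` and the sample items are hypothetical.

```python
def venue_rows(name, lat, lng, items):
    """Turn Foursquare-style 'items' into flat tuples; category is None when missing."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# hand-written sample in the same shape as the API items (not real API output)
sample_items = [
    {'venue': {'name': 'Palo Cortao',
               'location': {'lat': 41.372803, 'lng': 2.167719},
               'categories': [{'name': 'Tapas Restaurant'}]}},
    {'venue': {'name': 'Venue without category',
               'location': {'lat': 41.3740, 'lng': 2.1680},
               'categories': []}},
]
rows = venue_rows('El Raval', 41.373601, 2.168384, sample_items)
```

Rows with a `None` category can then be dropped or kept before one-hot encoding, depending on the analysis.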

In [48]:
# Venues for each neighborhood
barcelona_v = getNearbyVenues(names = dfb['Neighborhood'], latitudes = dfb['Latitude'], longitudes = dfb['Longitude'])

El Raval
El Barrio Gótico
La Barceloneta
Sant Pere, Santa Caterina i la Ribera
El Fort Pienc
La Sagrada Familia
La Dreta de l'Eixample
L'Antiga Esquerra de l'Eixample
La Nova Esquerra de l'Eixample
Sant Antoni
El Poble Sec
La Marina del Prat Vermell
La Marina de Port
La Font de la Guatlla
Hostafrancs
La Bordeta
Sants - Badal
Sants
Les Corts
La Maternitat i Sant Ramon
Pedralbes
Vallvidrera, el Tibidabo i les Planes
Sarriá
Les Tres Torres
Sant Gervasi - La Bonanova
Sant Gervasi - Galvany
El Putget i Farró
Vallcarca i els Penitents
El Coll
La Salut
La Vila de Gràcia
El Camp d'en Grassot i Gràcia Nova
El Baix Guinard
Can Baró
El Guinardó
La Font d'en Fargues
El Carmel
La Teixonera
Sant Genís dels Agudells
Montbau
La Vall d'Hebron
La Clota
Horta
Vilapicina i la Torre Llobeta
Porta
El Turó de la Peira
Can Peguera
La Guineueta
Canyelles
Les Roquetes
Verdun
La Prosperitat
La Trinitat Nova
Torre Baró
Ciutat Meridiana
Vallbona
La Trinitat Vella
Baro De Viver 
El Bon Pastor
Sant Andreu
La Sagre

In [49]:
barcelona_v.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,El Raval,41.373601,2.168384,Palo Cortao,41.372803,2.167719,Tapas Restaurant
1,El Raval,41.373601,2.168384,Koska,41.373124,2.16604,Tapas Restaurant
2,El Raval,41.373601,2.168384,Teatre Victoria,41.374573,2.168835,Theater
3,El Raval,41.373601,2.168384,Sala Apolo,41.374355,2.169668,Concert Hall
4,El Raval,41.373601,2.168384,Quimet & Quimet,41.373997,2.165522,Tapas Restaurant


In [50]:
barcelona_v.shape

(3120, 7)

In [51]:
# Number of venues returned for each neighborhood
barcelona_v.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Provençals del Poblenou,77,77,77,77,77,77
Baro De Viver,4,4,4,4,4,4
Can Baró,28,28,28,28,28,28
Can Peguera,45,45,45,45,45,45
Canyelles,12,12,12,12,12,12
Ciutat Meridiana,6,6,6,6,6,6
Diagonal Mar i el Front Marítim del Poblenou,41,41,41,41,41,41
El Baix Guinard,59,59,59,59,59,59
El Barrio Gótico,100,100,100,100,100,100
El Besòs i el Maresm,29,29,29,29,29,29


In [52]:
print('There are {} unique categories.'.format(len(barcelona_v['Venue Category'].unique())))

There are 279 unique categories.


In [53]:
# Analyze each neighborhood

# one hot encoding
barcelona_oh = pd.get_dummies(barcelona_v[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
barcelona_oh['Neighborhood'] = barcelona_v['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [barcelona_oh.columns[-1]] + list(barcelona_oh.columns[:-1])
barcelona_oh = barcelona_oh[fixed_columns]

# examine the dataframe size after one hot encoding
print('{} rows were returned after one hot encoding.'.format(barcelona_oh.shape[0]))

# group rows by neighborhood, taking the mean of the frequency of occurrence of each category
barcelona_g = barcelona_oh.groupby('Neighborhood').mean().reset_index()

# examine the dataframe size after grouping
print('{} rows were returned after grouping.'.format(barcelona_g.shape[0]))


3120 rows were returned after one hot encoding.
73 rows were returned after grouping.
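
Taking the mean of the one-hot columns per neighborhood is equivalent to computing, for each neighborhood, the share of its venues that fall into each category. The sketch below reproduces that calculation in plain Python on a few hand-written rows, which may help when checking grouped values by hand. The `toy_venues` list and the function name `category_frequencies` are illustrative assumptions, not the notebook's data or code.

```python
from collections import Counter, defaultdict

# (neighborhood, venue category) pairs, made up for illustration
toy_venues = [
    ('A', 'Tapas Restaurant'),
    ('A', 'Tapas Restaurant'),
    ('A', 'Hotel'),
    ('A', 'Café'),
    ('B', 'Park'),
    ('B', 'Metro Station'),
]

def category_frequencies(pairs):
    """Return {neighborhood: {category: share of that neighborhood's venues}}."""
    counts = defaultdict(Counter)
    for neighborhood, category in pairs:
        counts[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(c.values()) for cat, n in c.items()}
        for hood, c in counts.items()
    }

freqs = category_frequencies(toy_venues)
```

Categories that do not occur in a neighborhood are simply absent here, whereas the one-hot table stores them as 0.0. A neighborhood with four venues in four different categories gets 0.25 for each.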


In [54]:
barcelona_oh.shape

(3120, 279)

In [55]:
barcelona_gr = barcelona_oh.groupby('Neighborhood').mean().reset_index()
barcelona_gr.head()

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,American Restaurant,Amphitheater,Animal Shelter,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,...,University,Vacation Rental,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Winery,Women's Store
0,Provençals del Poblenou,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987
1,Baro De Viver,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Can Baró,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Can Peguera,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.022222,0.0
4,Canyelles,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [56]:
def common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


In [57]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']
# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe with one row per neighborhood
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = barcelona_gr['Neighborhood']

# fill in the top venue categories for each neighborhood
for ind in np.arange(barcelona_gr.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = common_venues(barcelona_gr.iloc[ind, :], num_top_venues)
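
The column labels above rely on catching an exception once the suffix list runs out. A hedged alternative is a small helper that returns the English ordinal suffix directly. `ordinal` and `venue_columns` are hypothetical names introduced for this sketch.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 10th to 20th (and 110th to 120th, ...) always take 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

def venue_columns(num_top_venues):
    """Build the column labels used for the sorted-venues table."""
    return ['Neighborhood'] + [
        '{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top_venues + 1)
    ]

columns = venue_columns(10)
```

This also stays correct if `num_top_venues` is raised above 20, where a fixed `['st', 'nd', 'rd']` list would label the 21st column as "21th".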
    

In [58]:
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Provençals del Poblenou,Clothing Store,Hotel,Café,Restaurant,Coffee Shop,Tapas Restaurant,Italian Restaurant,Mediterranean Restaurant,Diner,Burger Joint
1,Baro De Viver,Spanish Restaurant,Metro Station,Soccer Field,Tapas Restaurant,Flea Market,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market
2,Can Baró,Spanish Restaurant,Plaza,Tapas Restaurant,Dessert Shop,Grocery Store,Chinese Restaurant,Park,Soccer Field,Restaurant,Café
3,Can Peguera,Tapas Restaurant,Spanish Restaurant,Café,Breakfast Spot,Bakery,Pizza Place,Pub,Burger Joint,Sandwich Place,Supermarket
4,Canyelles,Market,Soccer Field,Food & Drink Shop,Café,Grocery Store,Mediterranean Restaurant,Plaza,Hot Spring,Metro Station,Food Court


In [59]:
#print each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in barcelona_g['Neighborhood']:
    print("----"+hood+"----")
    temp = barcelona_g[barcelona_g['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

---- Provençals del Poblenou ----
                venue  freq
0               Hotel  0.06
1      Clothing Store  0.06
2                Café  0.06
3          Restaurant  0.05
4  Italian Restaurant  0.04


----Baro De Viver ----
                venue  freq
0    Tapas Restaurant  0.25
1       Metro Station  0.25
2        Soccer Field  0.25
3  Spanish Restaurant  0.25
4         Yoga Studio  0.00


----Can Baró----
                venue  freq
0  Spanish Restaurant  0.18
1       Grocery Store  0.07
2  Chinese Restaurant  0.07
3               Plaza  0.07
4    Tapas Restaurant  0.07


----Can Peguera----
                venue  freq
0    Tapas Restaurant  0.11
1  Spanish Restaurant  0.09
2         Pizza Place  0.04
3                Café  0.04
4                 Pub  0.04


----Canyelles----
           venue  freq
0   Soccer Field  0.17
1         Market  0.17
2     Hot Spring  0.08
3  Grocery Store  0.08
4           Café  0.08


----Ciutat Meridiana----
           venue  freq
0  Metro Station  0.

<a id='kbarcelona'></a>
## 8.5 Barcelona: Clustering the neighborhoods

In [60]:
# Cluster neighborhoods
# Run k-means to cluster the neighborhoods into 5 clusters.
# number of clusters
kclusters = 5
# keep only the category frequencies as features
barcelona_gc = barcelona_gr.drop('Neighborhood', axis=1)
# run k-means 
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(barcelona_gc)
# check the cluster labels of the first 10 neighborhoods
kmeans.labels_[0:10]

array([1, 4, 4, 4, 4, 0, 1, 1, 4, 1], dtype=int32)
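
For readers unfamiliar with what `KMeans` does with the frequency vectors, the following is a simplified pure-Python sketch of Lloyd's algorithm on toy 2-D points. It is not scikit-learn's implementation, which adds k-means++ initialisation and other refinements. The function name `lloyd_kmeans`, the fixed starting centroids and the toy points are assumptions made for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def lloyd_kmeans(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until the labels stop changing."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # assignment step: index of the nearest centroid for every point
        new_labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# two well-separated toy groups, made up for illustration
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
labels, centroids = lloyd_kmeans(toy_points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

In the notebook each point is a neighborhood and each dimension is the frequency of one venue category, so neighborhoods with a similar mix of venues end up with the same label.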

In [61]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
neighborhoods_venues_sorted.head()

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,Provençals del Poblenou,Clothing Store,Hotel,Café,Restaurant,Coffee Shop,Tapas Restaurant,Italian Restaurant,Mediterranean Restaurant,Diner,Burger Joint
1,4,Baro De Viver,Spanish Restaurant,Metro Station,Soccer Field,Tapas Restaurant,Flea Market,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market
2,4,Can Baró,Spanish Restaurant,Plaza,Tapas Restaurant,Dessert Shop,Grocery Store,Chinese Restaurant,Park,Soccer Field,Restaurant,Café
3,4,Can Peguera,Tapas Restaurant,Spanish Restaurant,Café,Breakfast Spot,Bakery,Pizza Place,Pub,Burger Joint,Sandwich Place,Supermarket
4,4,Canyelles,Market,Soccer Field,Food & Drink Shop,Café,Grocery Store,Mediterranean Restaurant,Plaza,Hot Spring,Metro Station,Food Court


In [62]:
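# start from the neighborhood table (this reuses the name of the folium map created earlier)
barcelona_m = dfb
# add the cluster label and the top venues of each neighborhood
barcelona_m = barcelona_m.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
barcelona_m.head()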

Unnamed: 0,City,District,Neighborhood,Postal,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Barcelona,Ciutat Vella,El Raval,8001,41.373601,2.168384,4,Tapas Restaurant,Hotel,Cocktail Bar,Brewery,Wine Bar,Mexican Restaurant,Theater,Hostel,Restaurant,Spanish Restaurant
1,Barcelona,Ciutat Vella,El Barrio Gótico,8002,41.378367,2.175353,4,Tapas Restaurant,Spanish Restaurant,Italian Restaurant,Bar,Hotel,Art Gallery,Plaza,Pizza Place,Cocktail Bar,Coffee Shop
2,Barcelona,Ciutat Vella,La Barceloneta,8003,41.379057,2.190086,4,Tapas Restaurant,Mediterranean Restaurant,Paella Restaurant,Bar,Seafood Restaurant,Spanish Restaurant,Restaurant,Burger Joint,Beach Bar,Pizza Place
3,Barcelona,Ciutat Vella,"Sant Pere, Santa Caterina i la Ribera",8003,41.38669,2.177982,1,Hotel,Tapas Restaurant,Café,Vegetarian / Vegan Restaurant,Bar,Spanish Restaurant,Plaza,Restaurant,Cocktail Bar,Italian Restaurant
4,Barcelona,Eixample,El Fort Pienc,8013,41.390645,2.178475,1,Hotel,Tapas Restaurant,Hostel,Bar,Coffee Shop,Chinese Restaurant,Vegetarian / Vegan Restaurant,Café,Italian Restaurant,Breakfast Spot


In [63]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(
    barcelona_m['Latitude'],
    barcelona_m['Longitude'],
    barcelona_m['Neighborhood'],
    barcelona_m['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)

    folium.CircleMarker(
                    [lat, lon],
                    radius = 5,
                    popup = label,
                    color = rainbow[cluster-1],
                    fill = True,
                    fill_color = rainbow[cluster-1],
                    fill_opacity=0.7).add_to(map_clusters)
map_clusters

In [67]:
# Barcelona: Cluster 0
barcelona_m.loc[barcelona_m['Cluster Labels'] == 0, barcelona_m.columns[[1] + list(range(5, barcelona_m.shape[1]))]]

Unnamed: 0,District,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
54,Nou Barris,2.176489,0,Metro Station,Plaza,Park,Train Station,Supermarket,Fast Food Restaurant,Event Space,Exhibit,Falafel Restaurant,Farmers Market
55,Nou Barris,2.171685,0,Metro Station,Park,Mediterranean Restaurant,Building,Women's Store,Flea Market,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market
56,Nou Barris,2.183948,0,Plaza,Metro Station,Park,Train Station,Women's Store,Fish & Chips Shop,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant


In [68]:
# Barcelona: Cluster 1
barcelona_m.loc[barcelona_m['Cluster Labels'] == 1, barcelona_m.columns[[1] + list(range(5, barcelona_m.shape[1]))]]

Unnamed: 0,District,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Ciutat Vella,2.177982,1,Hotel,Tapas Restaurant,Café,Vegetarian / Vegan Restaurant,Bar,Spanish Restaurant,Plaza,Restaurant,Cocktail Bar,Italian Restaurant
4,Eixample,2.178475,1,Hotel,Tapas Restaurant,Hostel,Bar,Coffee Shop,Chinese Restaurant,Vegetarian / Vegan Restaurant,Café,Italian Restaurant,Breakfast Spot
5,Eixample,2.175593,1,Café,Spanish Restaurant,Hotel,Mexican Restaurant,Grocery Store,Restaurant,Korean Restaurant,Plaza,Pizza Place,Coffee Shop
6,Eixample,2.164419,1,Hotel,Tapas Restaurant,Clothing Store,Spanish Restaurant,Boutique,Mediterranean Restaurant,Hostel,Restaurant,Furniture / Home Store,Cosmetics Shop
7,Eixample,2.158034,1,Hotel,Tapas Restaurant,Cocktail Bar,Mediterranean Restaurant,Spanish Restaurant,Café,Japanese Restaurant,Dessert Shop,Beer Bar,Restaurant
8,Eixample,2.153657,1,Café,Mediterranean Restaurant,Coffee Shop,Hotel,Tapas Restaurant,Restaurant,Brewery,Japanese Restaurant,Italian Restaurant,Bakery
12,Sants Montjuic,2.137948,1,Spanish Restaurant,Flea Market,Café,Bus Stop,Farmers Market,Restaurant,Diner,Tennis Court,Mediterranean Restaurant,Furniture / Home Store
15,Sants Montjuic,2.140996,1,Hotel,Food & Drink Shop,Supermarket,Peruvian Restaurant,Park,Spanish Restaurant,Sports Club,Dog Run,Restaurant,Tennis Court
16,Sants Montjuic,2.129023,1,Supermarket,Bakery,Mexican Restaurant,Bar,Tapas Restaurant,Pizza Place,Mediterranean Restaurant,Cheese Shop,Spa,Gym / Fitness Center
17,Sants Montjuic,2.135835,1,Tapas Restaurant,Bar,Mediterranean Restaurant,Bakery,Supermarket,Café,Italian Restaurant,Plaza,Gym / Fitness Center,Wine Bar


In [71]:
# Barcelona: Cluster 2
barcelona_m.loc[barcelona_m['Cluster Labels'] == 2, barcelona_m.columns[[1] + list(range(5, barcelona_m.shape[1]))]]

Unnamed: 0,District,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
39,Horta-Guinardó,2.13027,2,Scenic Lookout,Mountain,Women's Store,Flea Market,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flower Shop
53,Nou Barris,2.173977,2,Scenic Lookout,Women's Store,Fish Market,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Flea Market,Empanada Restaurant


In [72]:
# Barcelona: Cluster 3
barcelona_m.loc[barcelona_m['Cluster Labels'] == 3, barcelona_m.columns[[1] + list(range(5, barcelona_m.shape[1]))]]

Unnamed: 0,District,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Sants Montjuic,2.159555,3,Pier,Boat or Ferry,Art Gallery,Fountain,Food Truck,Food Stand,Food Court,Food & Drink Shop,Food,Event Space


In [73]:
# Barcelona: Cluster 4
barcelona_m.loc[barcelona_m['Cluster Labels'] == 4, barcelona_m.columns[[1] + list(range(5, barcelona_m.shape[1]))]]

Unnamed: 0,District,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Ciutat Vella,2.168384,4,Tapas Restaurant,Hotel,Cocktail Bar,Brewery,Wine Bar,Mexican Restaurant,Theater,Hostel,Restaurant,Spanish Restaurant
1,Ciutat Vella,2.175353,4,Tapas Restaurant,Spanish Restaurant,Italian Restaurant,Bar,Hotel,Art Gallery,Plaza,Pizza Place,Cocktail Bar,Coffee Shop
2,Ciutat Vella,2.190086,4,Tapas Restaurant,Mediterranean Restaurant,Paella Restaurant,Bar,Seafood Restaurant,Spanish Restaurant,Restaurant,Burger Joint,Beach Bar,Pizza Place
9,Eixample,2.164592,4,Tapas Restaurant,Bar,Café,Mediterranean Restaurant,Cocktail Bar,Pizza Place,Hotel,Spanish Restaurant,Brewery,Burger Joint
10,Sants Montjuic,2.15149,4,Plaza,Gym / Fitness Center,Stadium,Garden,Park,Skate Park,General College & University,Basketball Stadium,Rock Climbing Spot,Diner
13,Sants Montjuic,2.146517,4,Spanish Restaurant,Plaza,Hotel,Nightclub,Art Gallery,Breakfast Spot,Historic Site,Track,Tapas Restaurant,Café
14,Sants Montjuic,2.146076,4,Spanish Restaurant,Tapas Restaurant,Pizza Place,Mediterranean Restaurant,Restaurant,Plaza,Hotel,Track,Breakfast Spot,Japanese Restaurant
27,Gracia,2.133941,4,Bar,Cocktail Bar,Park,Plaza,Restaurant,Snack Place,Breakfast Spot,Spanish Restaurant,Animal Shelter,Mediterranean Restaurant
30,Gracia,2.154199,4,Bar,Tapas Restaurant,Plaza,Café,Restaurant,Mediterranean Restaurant,Gastropub,Cocktail Bar,Donut Shop,Theater
33,Gracia,2.159781,4,Spanish Restaurant,Plaza,Tapas Restaurant,Dessert Shop,Grocery Store,Chinese Restaurant,Park,Soccer Field,Restaurant,Café
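
The most frequent "1st Most Common Venue" of each cluster can be derived mechanically rather than read off the tables. The sketch below assumes a list of `(cluster label, 1st most common venue)` pairs. The pairs are copied from the Barcelona tables for clusters 0, 2 and 3 shown above, and `top_first_venue` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

# (cluster label, 1st most common venue), taken from the cluster 0, 2 and 3 tables
first_venues = [
    (0, 'Metro Station'),
    (0, 'Metro Station'),
    (0, 'Plaza'),
    (2, 'Scenic Lookout'),
    (2, 'Scenic Lookout'),
    (3, 'Pier'),
]

def top_first_venue(pairs):
    """Return {cluster: most frequent first venue among its neighborhoods}."""
    by_cluster = defaultdict(Counter)
    for cluster, venue in pairs:
        by_cluster[cluster][venue] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in sorted(by_cluster.items())}

summary = top_first_venue(first_venues)
```

The same function could be fed the full `Cluster Labels` and `1st Most Common Venue` columns of the merged dataframe to cover every cluster.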


<a id='results'></a>
# 9. Results

### Madrid
- Districts: 22
- Neighborhoods: 247
- Unique categories: 270

#### 1st Most Common Venues
- Cluster 0: Grocery Store
- Cluster 1: Spanish Restaurant
- Cluster 2: Theme Park Ride / Attraction
- Cluster 3: Spanish Restaurant
- Cluster 4: Park
        
### Barcelona
- Districts: 10
- Neighborhoods: 73
- Unique categories: 279

#### 1st Most Common Venues
- Cluster 0: Metro Station
- Cluster 1: Hotel
- Cluster 2: Scenic Lookout
- Cluster 3: Pier
- Cluster 4: Tapas Restaurant

For clusters 1 and 4 the entry is the most frequent value among the rows displayed in section 8.5.
      
        
        
        



<a id='discussion'></a>
# 10. Discussions

- Are the results sufficient to identify and distinguish the different districts and to describe the correlation between the most common venues registered in Foursquare?
- Similar cities may or may not have similar venues. A further step in this classification would therefore be to find a method that extracts these common venues and integrates spatial correlations between neighborhoods or districts.
- Nevertheless, the proposed segmentation and clustering is a first step towards a quantitative and systematic comparison of the two cities.

<a id='conclusion'></a>
# 11. Conclusions

- 1. We can capture data on common venues all around the world using the Foursquare API.
- 2. Barcelona, despite having fewer districts and neighborhoods than Madrid, has a similar offer of venues.
- 3. Further study is needed to relate the acquired data and to draw more significant and objective conclusions from them.

By Alberto Cuesta / a_cuesta@hotmail.com 