## A Battle of Neighbourhoods - Clustering the Neighbourhoods of Barcelona and Madrid


Nehaal Patel

To view the notebook with full maps: https://nbviewer.jupyter.org/github/Nehaal-Patel/Coursera_Capstone/blob/master/Battle-of-Neighbourhoods-Barcelona-Madrid.ipynb

# 1. Introduction


Madrid and Barcelona are two of the largest, most popular and most culturally vibrant cities in Spain. Their cultures differ in many ways: Madrid is the Spanish capital, while Barcelona is the capital of the autonomous community of Catalonia.

Both cities are popular tourist and holiday destinations for people from all around the world. They are diverse, multicultural cities that offer many sought-after experiences. In this project we group the neighbourhoods of Barcelona and Madrid and compare and contrast what each city uniquely has to offer. Madrid and Barcelona are also the names of provinces that together contain hundreds of municipalities, so this project focuses only on the two main cities.

# 2. Business Problem

The goal is to help tourists choose their destinations based on the experiences that each neighbourhood of the two cities has to offer. A visitor can use the results to decide where to go for the kind of experience they are looking for, and to plan the trip ahead. Our findings go beyond tourist attractions: we also aim to cover the entertainment venues and cuisines available in each neighbourhood.

# 3. Data

We require geographical location data for both Barcelona and Madrid. Each neighbourhood has a neighbourhood code, a district code and a name, which we can use to determine its coordinates. Using the coordinates, we can find the venues in each neighbourhood and their most common venue categories.





### Barcelona Data:

To derive the geographical data of Barcelona, we use the Open Data Service portal of the Barcelona City government, which provides the data in CSV format: https://opendata-ajuntament.barcelona.cat/data/dataset/808daafa-d9ce-48c0-925a-fa5afdb1ed41/resource/4cc59b76-a977-40ac-8748-61217c8ff367/download/districtes_i_barris_170705.csv

The CSV file has data about all the neighbourhoods in Barcelona:
***
1. *CODI_DISTRICTE*: District Code 
2. *NOM_DISTRICTE*: District Name
3. *CODI_BARRI*: Neighbourhood Code
4. *NOM_BARRI*: Neighbourhood Name
***


### Madrid Data:

To derive the geographical data of Madrid, we use the Open Data Service portal of the Madrid City government, which provides the data in CSV format: https://datos.madrid.es/egob/catalogo/200078-1-distritos-barrios.csv

The CSV file has data about all the neighbourhoods in Madrid. We will only use the following attributes, as needed:
***
1. *Codigo de barrio*: Neighbourhood Code rename to: *CODI_BARRI*
2. *Codigo de distrito al que pertenece*: District code to which it belongs rename to: *CODI_DISTRICTE*
3. *Nombre de barrio*: Neighbourhood Name rename to: *NOM_BARRI*
***

###  Geocoder/ArcGIS API

ArcGIS Online enables you to connect people, locations, and data using interactive maps. Work with smart, data-driven styles and intuitive analysis tools that deliver location intelligence. Share your insights with the world or specific groups.



For this project we will use the geocoder package to call the ArcGIS API, retrieve the coordinates of all of the neighbourhoods in both cities, and add the following attributes to both datasets:
***
1. *Latitude*: Latitude for Neighbourhood
2. *Longitude*: Longitude for Neighbourhood
***

### Venue Data using Foursquare API

We will need data about the different venues in each neighbourhood. To obtain it we will use the Foursquare API, which provides information on all manner of venues and events within an area of interest, such as venue names, locations, menus and even photos.

We will connect to the Foursquare API to gather information about venues inside each and every neighbourhood. For each neighbourhood, we have chosen the radius to be 500 meters.
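
To get a feel for what a 500 metre radius means, the sketch below computes the great-circle (haversine) distance between two adjacent Barcelona neighbourhoods. The coordinates are the ones the geocoding step later in this notebook returns for el Raval and el Barri Gòtic; the helper name `haversine_m` is only illustrative and is not part of the original notebook.

```python
import math

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    earth_radius_m = 6371000.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

RADIUS_M = 500  # search radius used for every neighbourhood

# Coordinates of two adjacent neighbourhoods, as returned by the geocoding step
el_raval = (41.37763, 2.17145)
el_barri_gotic = (41.38042, 2.17992)

distance_m = haversine_m(*el_raval, *el_barri_gotic)
# Two search circles overlap when their centres are closer than twice the radius
circles_overlap = distance_m < 2 * RADIUS_M
print(round(distance_m), circles_overlap)
```

When the circles overlap, the same venue can be returned for more than one neighbourhood, which is worth keeping in mind when comparing densely packed central neighbourhoods.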

The data retrieved from Foursquare contains the venues within the specified distance of each neighbourhood's latitude and longitude. The information kept per venue is as follows:

1. *Neighbourhood*: Name of the Neighbourhood
2. *Neighbourhood Latitude*: Latitude of the Neighbourhood
3. *Neighbourhood Longitude*: Longitude of the Neighbourhood
4. *Venue*: Name of the Venue
5. *Venue Category*: Category of the Venue

Based on all the information collected for both Barcelona and Madrid, we will have sufficient data to build our model. We will cluster the neighbourhoods based on similar venue categories and then present our observations and findings. Using these results, our stakeholders can make the necessary decisions.



# 4. Methodology
We will be creating our model with the help of Python so we start off by importing all the required packages.

In [2]:
import pandas as pd 
import geocoder
from geopy.geocoders import Nominatim
pd.options.display.max_rows = 10000
import requests
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium
from sklearn.cluster import KMeans


Python Packages:

- pandas: to read the CSV data, manipulate it and analyse it
- geocoder: to retrieve the coordinates of each neighbourhood
- geopy: to look up the coordinates of each city with Nominatim
- requests: to handle HTTP requests
- numpy: to handle arrays
- matplotlib: to colour the generated maps
- folium: to generate the maps of Barcelona and Madrid
- sklearn: to import the K-Means clustering model

# Exploring Barcelona 

### Data Collection

We begin by reading the csv data and collecting coordinates on neighbourhoods of Barcelona

In [3]:
!wget -q -O 'barcelona_data.csv' https://opendata-ajuntament.barcelona.cat/data/dataset/808daafa-d9ce-48c0-925a-fa5afdb1ed41/resource/4cc59b76-a977-40ac-8748-61217c8ff367/download/districtes_i_barris_170705.csv
print('Data downloaded!')

Data downloaded!


In [4]:
dfb = pd.read_csv('barcelona_data.csv')
dfb.head()

Unnamed: 0,CODI_DISTRICTE,NOM_DISTRICTE,CODI_BARRI,NOM_BARRI
0,1,Ciutat Vella,1,el Raval
1,1,Ciutat Vella,2,el Barri Gòtic
2,1,Ciutat Vella,3,la Barceloneta
3,1,Ciutat Vella,4,"Sant Pere, Santa Caterina i la Ribera"
4,2,Eixample,5,el Fort Pienc


In [5]:
dfb.keys()

Index(['CODI_DISTRICTE', 'NOM_DISTRICTE', 'CODI_BARRI', 'NOM_BARRI'], dtype='object')

### Geolocations of the Barcelona Neighbourhoods

We will use the geocoder library and the ArcGIS API to obtain the coordinates of each neighbourhood. The inputs are the name of the neighbourhood (NOM_BARRI) followed by the city and province (Barcelona, Barcelona).

In [6]:
def get_latlng(NOM_BARRI):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Barcelona, Barcelona'.format(NOM_BARRI))
        lat_lng_coords = g.latlng
    return lat_lng_coords
# Call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(NOM_BARRI) for NOM_BARRI in dfb["NOM_BARRI"].tolist()]
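
The loop above keeps retrying for as long as the geocoder returns nothing, so a name that can never be resolved would block the notebook. A minimal sketch of a bounded alternative follows. The helper `geocode_with_retries` and its parameters are hypothetical and not part of the original notebook; the geocoder is passed in as a plain callable so that the example runs without network access.

```python
import time

def geocode_with_retries(query, geocode, max_attempts=5, pause_s=0.0):
    """Call geocode(query) until it returns coordinates or the attempts run out.

    `geocode` is any callable returning [lat, lng] or None, for example
    lambda q: geocoder.arcgis(q).latlng (assumed wiring, not exercised here).
    """
    for _ in range(max_attempts):
        coords = geocode(query)
        if coords is not None:
            return coords
        time.sleep(pause_s)  # wait before the next attempt
    # Two None values keep the Latitude/Longitude columns aligned in a DataFrame
    return [None, None]


# Stand-in geocoder for illustration: fails twice, then succeeds
calls = {'n': 0}

def flaky_geocode(query):
    calls['n'] += 1
    return None if calls['n'] < 3 else [41.37763, 2.17145]

found = geocode_with_retries('el Raval, Barcelona, Barcelona', flaky_geocode)
missing = geocode_with_retries('nowhere', lambda q: None, max_attempts=2)
print(found, missing)
```

Neighbourhoods that come back as `[None, None]` can then be inspected or dropped explicitly instead of stalling the run.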

We will place the coordinates in a separate dataframe and then merge them with our original data.

In [7]:
dfb_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
# Merge the coordinates into the original dataframe
dfb['Latitude'] = dfb_coords['Latitude']
dfb['Longitude'] = dfb_coords['Longitude']
print(dfb.shape)

(73, 6)


In [8]:
dfb

Unnamed: 0,CODI_DISTRICTE,NOM_DISTRICTE,CODI_BARRI,NOM_BARRI,Latitude,Longitude
0,1,Ciutat Vella,1,el Raval,41.37763,2.17145
1,1,Ciutat Vella,2,el Barri Gòtic,41.38042,2.17992
2,1,Ciutat Vella,3,la Barceloneta,41.38185,2.19151
3,1,Ciutat Vella,4,"Sant Pere, Santa Caterina i la Ribera",41.389547,2.179
4,2,Eixample,5,el Fort Pienc,41.39519,2.18307
5,2,Eixample,6,la Sagrada Família,41.40408,2.17623
6,2,Eixample,7,la Dreta de l'Eixample,41.39498,2.16748
7,2,Eixample,8,l'Antiga Esquerra de l'Eixample,41.39069,2.14494
8,2,Eixample,9,la Nova Esquerra de l'Eixample,41.38571,2.14254
9,2,Eixample,10,Sant Antoni,41.37564,2.15923


## Map of Barcelona with all of the Neighbourhoods

We will first use Nominatim to obtain the coordinates of the city and then use folium to map the neighbourhoods.

In [9]:
address = 'Barcelona, Spain'

geolocator = Nominatim(user_agent="barcelona_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The coordinates of Barcelona are {}, {}.'.format(latitude, longitude))

The coordinates of Barcelona are 41.3828939, 2.1774322.


In [10]:
map_Barcelona = folium.Map(location=[latitude, longitude], zoom_start=11.4)

# adding markers to map; lat/lng are used so the city-centre latitude/longitude are not overwritten
for lat, lng, district, barri in zip(dfb['Latitude'], dfb['Longitude'], dfb['NOM_DISTRICTE'], dfb['NOM_BARRI']):
    label = '{}, {}'.format(barri, district)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True
        ).add_to(map_Barcelona)  
    
map_Barcelona

### Venues in Barcelona


Now we will use the Foursquare API to get all of the venues in each neighbourhood.
First we define the Foursquare API credentials.

In [11]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # placeholder: replace with your own Foursquare client ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # placeholder: replace with your own secret and never publish it
VERSION = '20201205' # Foursquare API version


print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID


Defining a function to get the nearby venues in the neighbourhood.



In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius
            )
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Category']
    
    return(nearby_venues)
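
The function above indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, which raises an error if a response has no groups or a venue has no category. The sketch below shows one way to flatten such a payload defensively. The helper `extract_venues`, the `'Uncategorized'` fallback and the sample payload are all illustrative assumptions; only the nesting of the keys is taken from the function above.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore-style response into (neighbourhood, lat, lng, venue, category) rows."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to a placeholder when a venue has no category
        category = categories[0]['name'] if categories else 'Uncategorized'
        rows.append((name, lat, lng, venue['name'], category))
    return rows


# Hand-written payload that mimics the structure indexed by getNearbyVenues
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Pizza Circus', 'categories': [{'name': 'Pizza Place'}]}},
    {'venue': {'name': 'Unnamed kiosk', 'categories': []}},
]}]}}

rows = extract_venues(sample, 'el Raval', 41.37763, 2.17145)
empty = extract_venues({'response': {}}, 'Vallbona', 0.0, 0.0)
print(rows)
print(empty)
```

A neighbourhood whose response yields no rows simply contributes nothing to the venue table, which is why some neighbourhoods later end up without a cluster label.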

Getting all of the venues in Barcelona

In [13]:
barcelona_venues = getNearbyVenues(dfb['NOM_BARRI'], dfb['Latitude'], dfb['Longitude'])


el Raval
el Barri Gòtic
la Barceloneta
Sant Pere, Santa Caterina i la Ribera
el Fort Pienc
la Sagrada Família
la Dreta de l'Eixample
l'Antiga Esquerra de l'Eixample
la Nova Esquerra de l'Eixample
Sant Antoni
el Poble Sec
la Marina del Prat Vermell
la Marina de Port
la Font de la Guatlla
Hostafrancs
la Bordeta
Sants - Badal
Sants
les Corts
la Maternitat i Sant Ramon
Pedralbes
Vallvidrera, el Tibidabo i les Planes
Sarrià
les Tres Torres
Sant Gervasi - la Bonanova
Sant Gervasi - Galvany
el Putxet i el Farró
Vallcarca i els Penitents
el Coll
la Salut
la Vila de Gràcia
el Camp d'en Grassot i Gràcia Nova
el Baix Guinardó
Can Baró
el Guinardó
la Font d'en Fargues
el Carmel
la Teixonera
Sant Genís dels Agudells
Montbau
la Vall d'Hebron
la Clota
Horta
Vilapicina i la Torre Llobeta
Porta
el Turó de la Peira
Can Peguera
la Guineueta
Canyelles
les Roquetes
Verdun
la Prosperitat
la Trinitat Nova
Torre Baró
Ciutat Meridiana
Vallbona
la Trinitat Vella
Baró de Viver
el Bon Pastor
Sant Andreu
la Sagrer

In [14]:
barcelona_venues.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,el Raval,41.37763,2.17145,Pizza Circus,Pizza Place
1,el Raval,41.37763,2.17145,Filmoteca de Catalunya,Movie Theater
2,el Raval,41.37763,2.17145,Cassette Bar,Bar
3,el Raval,41.37763,2.17145,Cañete,Tapas Restaurant
4,el Raval,41.37763,2.17145,Frankie Gallo Cha Cha Cha,Pizza Place


In [15]:
barcelona_venues.shape

(1685, 5)

### Grouping by Venue Categories


In [16]:
barcelona_venues.groupby('Neighbourhood').count()

Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
Baró de Viver,4,4,4,4
Can Baró,22,22,22,22
Can Peguera,13,13,13,13
Canyelles,10,10,10,10
Ciutat Meridiana,8,8,8,8
Diagonal Mar i el Front Marítim del Poblenou,30,30,30,30
Horta,10,10,10,10
Hostafrancs,30,30,30,30
Montbau,17,17,17,17
Navas,30,30,30,30


In [17]:
print('There are {} unique categories.'.format(len(barcelona_venues['Venue Category'].unique())))

There are 223 unique categories.


### Analyze each Neighbourhood


In [18]:
# one hot encoding
barcelona_onehot = pd.get_dummies(barcelona_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
barcelona_onehot['Neighbourhood'] = barcelona_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [barcelona_onehot.columns[-1]] + list(barcelona_onehot.columns[:-1])
barcelona_onehot = barcelona_onehot[fixed_columns]

barcelona_onehot.head()

Unnamed: 0,Neighbourhood,African Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,...,Train Station,Tram Station,Transportation Service,Tunnel,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Winery,Yoga Studio
0,el Raval,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,el Raval,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,el Raval,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,el Raval,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,el Raval,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


We will group the rows by Neighbourhood and calculate the mean of each one-hot column, which gives the share of each venue category among the venues of that Neighbourhood.



In [19]:
barcelona_grouped = barcelona_onehot.groupby('Neighbourhood').mean().reset_index()
barcelona_grouped

Unnamed: 0,Neighbourhood,African Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,...,Train Station,Tram Station,Transportation Service,Tunnel,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Winery,Yoga Studio
0,Baró de Viver,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Can Baró,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Can Peguera,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Canyelles,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Ciutat Meridiana,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Diagonal Mar i el Front Marítim del Poblenou,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,...,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0
6,Horta,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Hostafrancs,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0
8,Montbau,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Navas,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,...,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
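
To see why the grouped values are frequencies, here is a minimal sketch of the same two steps on a toy table. The neighbourhood names `A` and `B` and their venue categories are made up for illustration.

```python
import pandas as pd

# Toy venue table: two neighbourhoods with made-up venue categories
venues = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Bar', 'Bar', 'Café', 'Park', 'Park', 'Park'],
})

# One indicator column per category (cast to float so the means are plain numbers)
onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="").astype(float)
onehot.insert(0, 'Neighbourhood', venues['Neighbourhood'])

# The mean of 0/1 indicators is the share of venues that fall in each category
grouped = onehot.groupby('Neighbourhood').mean().reset_index()
print(grouped)
```

Each row of `grouped` sums to 1, so a value such as 0.033333 in the Barcelona table means one venue out of 30 in that neighbourhood belongs to the category.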


We print the 10 most frequent venue categories of each neighbourhood.

In [20]:
num_top_venues = 10

for hood in barcelona_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = barcelona_grouped[barcelona_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

       Pizza Place  0.09
3        Soccer Field  0.09
4         Supermarket  0.09
5    Tapas Restaurant  0.09
6           Pawn Shop  0.04
7         Coffee Shop  0.04
8      Farmers Market  0.04
9                Café  0.04


----Sant Pere, Santa Caterina i la Ribera----
                           venue  freq
0                          Hotel  0.20
1  Vegetarian / Vegan Restaurant  0.07
2                         Hostel  0.07
3            Japanese Restaurant  0.03
4               Toy / Game Store  0.03
5                     Comic Shop  0.03
6               Tapas Restaurant  0.03
7                      Bookstore  0.03
8                         Bistro  0.03
9            Monument / Landmark  0.03


----Sants----
                       venue  freq
0                      Plaza  0.07
1                   Wine Bar  0.07
2                        Bar  0.07
3  Middle Eastern Restaurant  0.07
4           Tapas Restaurant  0.07
5   Mediterranean Restaurant  0.07
6             Ice Cream Shop  0.03
7     

In [21]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
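
The sketch below applies the same sort-and-slice idea to a toy frequency row. It also illustrates a side effect: when a neighbourhood has fewer distinct categories than `num_top_venues`, the remaining slots are filled with categories whose frequency is zero. The helper `top_categories` and its `drop_zero` option are hypothetical additions for illustration, not part of the original notebook.

```python
import pandas as pd

def top_categories(row, num_top_venues, drop_zero=False):
    """Return the names of the largest entries of a row, skipping the first (name) field."""
    freqs = row.iloc[1:].astype(float)
    if drop_zero:
        # Keep only categories that actually occur in the neighbourhood
        freqs = freqs[freqs > 0]
    return list(freqs.sort_values(ascending=False).index[:num_top_venues])


# Toy frequency row with made-up values: only two categories actually occur
row = pd.Series({'Neighbourhood': 'A', 'Bar': 0.75, 'Zoo': 0.0, 'Park': 0.25, 'Farm': 0.0})

padded = top_categories(row, 3)                     # third slot is a zero-frequency category
observed = top_categories(row, 3, drop_zero=True)   # only categories that occur
print(padded, observed)
```

This is likely why a neighbourhood with only a handful of venues can still show ten "most common" categories in the table that follows.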

### Top venue categories in Barcelona

In [22]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = barcelona_grouped['Neighbourhood']

for ind in np.arange(barcelona_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(barcelona_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Baró de Viver,Plaza,Metro Station,Café,Park,Electronics Store,Flower Shop,Flea Market,Fast Food Restaurant,Farmers Market,Farm
1,Can Baró,Spanish Restaurant,Chinese Restaurant,Grocery Store,Scenic Lookout,Pool,Tapas Restaurant,Bookstore,Basketball Court,Italian Restaurant,Bakery
2,Can Peguera,Park,Food & Drink Shop,Escape Room,Hostel,Fast Food Restaurant,Plaza,Brewery,Supermarket,Grocery Store,Tapas Restaurant
3,Canyelles,Soccer Field,Market,Brewery,Plaza,Skate Park,Grocery Store,Metro Station,Mediterranean Restaurant,Falafel Restaurant,Ethiopian Restaurant
4,Ciutat Meridiana,Metro Station,Park,Plaza,Grocery Store,Train Station,Donut Shop,Flea Market,Fast Food Restaurant,Farmers Market,Farm


## Model Building

K-Means: clustering the neighbourhoods of Barcelona into 5 clusters

In [23]:
# number of clusters
kclusters = 5

barcelona_grouped_clustering = barcelona_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(barcelona_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([4, 3, 1, 0, 4, 0, 2, 1, 0, 1], dtype=int32)
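
As a small illustration of what K-Means does with frequency rows like these, the sketch below clusters a toy matrix with two obviously different groups. The data are made up, and `n_init=10` is set explicitly only to keep the behaviour the same across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy frequency matrix: rows are neighbourhoods, columns are venue-category shares
X = np.array([
    [0.8, 0.2, 0.0],   # restaurant-heavy
    [0.7, 0.3, 0.0],   # restaurant-heavy
    [0.0, 0.1, 0.9],   # park-heavy
    [0.1, 0.0, 0.9],   # park-heavy
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_

# Cluster ids are arbitrary; only which rows share an id is meaningful
sizes = np.bincount(labels, minlength=2)
print(labels, sizes)
```

The same caveat applies to the Barcelona labels above: the numbers 0 to 4 carry no ordering, and counting the members of each cluster with `np.bincount` is a quick check for very small or very large clusters.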

New dataframe that includes the cluster as well as the top 10 venues for each neighborhood.


In [24]:
# add clustering labels
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

barcelona_merged = dfb.copy() # copy so that dfb itself is not changed by the later dropna

barcelona_merged = barcelona_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='NOM_BARRI')

barcelona_merged.head() # check the last columns!

Unnamed: 0,CODI_DISTRICTE,NOM_DISTRICTE,CODI_BARRI,NOM_BARRI,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,Ciutat Vella,1,el Raval,41.37763,2.17145,1,Spanish Restaurant,Cocktail Bar,Pizza Place,Tapas Restaurant,Plaza,Bar,Theater,Opera House,Mediterranean Restaurant,Café
1,1,Ciutat Vella,2,el Barri Gòtic,41.38042,2.17992,0,Hotel,Tapas Restaurant,Italian Restaurant,Bike Rental / Bike Share,Bar,Spanish Restaurant,Hotel Pool,Circus,Ramen Restaurant,Diner
2,1,Ciutat Vella,3,la Barceloneta,41.38185,2.19151,0,Tapas Restaurant,Mediterranean Restaurant,Bar,Beer Bar,Beach,Pizza Place,Bakery,Market,Soccer Field,Salon / Barbershop
3,1,Ciutat Vella,4,"Sant Pere, Santa Caterina i la Ribera",41.389547,2.179,0,Hotel,Vegetarian / Vegan Restaurant,Hostel,Tapas Restaurant,Coffee Shop,Food & Drink Shop,Camera Store,Monument / Landmark,Breakfast Spot,Pastry Shop
4,2,Eixample,5,el Fort Pienc,41.39519,2.18307,1,Coffee Shop,Restaurant,Sushi Restaurant,Supermarket,Bistro,Food & Drink Shop,Café,Chinese Restaurant,Yoga Studio,Juice Bar


Drop the neighbourhoods that have no cluster label, that is, those for which no venues were returned.



In [25]:
barcelona_merged.dropna(subset=['Cluster Labels'], inplace=True)
barcelona_merged.isnull().sum()

CODI_DISTRICTE            0
NOM_DISTRICTE             0
CODI_BARRI                0
NOM_BARRI                 0
Latitude                  0
Longitude                 0
Cluster Labels            0
1st Most Common Venue     0
2nd Most Common Venue     0
3rd Most Common Venue     0
4th Most Common Venue     0
5th Most Common Venue     0
6th Most Common Venue     0
7th Most Common Venue     0
8th Most Common Venue     0
9th Most Common Venue     0
10th Most Common Venue    0
dtype: int64
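
The missing labels come from the join: a neighbourhood with no rows in the venue table has no entry in the grouped table, so the left join leaves its cluster label empty. A minimal sketch on toy data follows; the names `A`, `B` and `C` and the labels are made up.

```python
import pandas as pd

# Toy stand-ins: three neighbourhoods, but cluster labels for only two of them
barris = pd.DataFrame({'NOM_BARRI': ['A', 'B', 'C']})
labelled = pd.DataFrame({'Neighbourhood': ['A', 'C'], 'Cluster Labels': [0, 1]})

# Left join on the neighbourhood name, as in the cell that builds barcelona_merged
merged = barris.join(labelled.set_index('Neighbourhood'), on='NOM_BARRI')
unlabelled = merged.loc[merged['Cluster Labels'].isna(), 'NOM_BARRI'].tolist()

# Drop the unlabelled rows and restore the integer type lost to the NaN values
merged = merged.dropna(subset=['Cluster Labels'])
merged['Cluster Labels'] = merged['Cluster Labels'].astype(int)
print(unlabelled)
print(merged)
```

Listing the unlabelled neighbourhoods before dropping them makes it explicit which parts of the city are absent from the clustered map.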

## Visualizing Barcelona with clustered neighbourhoods

In [26]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11.4)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(barcelona_merged['Latitude'], barcelona_merged['Longitude'], barcelona_merged['NOM_BARRI'], barcelona_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Examining our Clusters

Cluster 1

In [27]:
barcelona_merged.loc[barcelona_merged['Cluster Labels'] == 0, barcelona_merged.columns[[1] + list(range(5, barcelona_merged.shape[1]))]]


Unnamed: 0,NOM_DISTRICTE,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Ciutat Vella,2.17992,0,Hotel,Tapas Restaurant,Italian Restaurant,Bike Rental / Bike Share,Bar,Spanish Restaurant,Hotel Pool,Circus,Ramen Restaurant,Diner
2,Ciutat Vella,2.19151,0,Tapas Restaurant,Mediterranean Restaurant,Bar,Beer Bar,Beach,Pizza Place,Bakery,Market,Soccer Field,Salon / Barbershop
3,Ciutat Vella,2.179,0,Hotel,Vegetarian / Vegan Restaurant,Hostel,Tapas Restaurant,Coffee Shop,Food & Drink Shop,Camera Store,Monument / Landmark,Breakfast Spot,Pastry Shop
5,Eixample,2.17623,0,Italian Restaurant,Pizza Place,Plaza,Wine Bar,Burger Joint,Historic Site,Sandwich Place,Seafood Restaurant,Mediterranean Restaurant,Breakfast Spot
6,Eixample,2.16748,0,Hotel,Hostel,Tapas Restaurant,Pizza Place,Boutique,Beer Bar,Gastropub,Sandwich Place,Burger Joint,Roof Deck
13,Sants-Montjuïc,2.14645,0,Plaza,Spanish Restaurant,Art Gallery,Nightclub,Pizza Place,Hotel,Monument / Landmark,Candy Store,Restaurant,Fountain
17,Sants-Montjuïc,2.1376,0,Tapas Restaurant,Middle Eastern Restaurant,Plaza,Mediterranean Restaurant,Bar,Wine Bar,Peruvian Restaurant,Seafood Restaurant,Pedestrian Plaza,Dessert Shop
19,Les Corts,2.12137,0,Soccer Stadium,Soccer Field,Café,Mediterranean Restaurant,Supermarket,Museum,College Cafeteria,Burger Joint,Tapas Restaurant,Food & Drink Shop
20,Les Corts,2.11111,0,Mediterranean Restaurant,Garden,Pizza Place,Café,Garden Center,Nightclub,College Cafeteria,Roof Deck,Trail,Hotel
23,Sarrià-Sant Gervasi,2.13035,0,Hotel,Pizza Place,Mediterranean Restaurant,Spanish Restaurant,Café,Restaurant,Train Station,Japanese Restaurant,Mexican Restaurant,Sushi Restaurant


Cluster 2

In [28]:
barcelona_merged.loc[barcelona_merged['Cluster Labels'] == 1, barcelona_merged.columns[[1] + list(range(5, barcelona_merged.shape[1]))]]


Unnamed: 0,NOM_DISTRICTE,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Ciutat Vella,2.17145,1,Spanish Restaurant,Cocktail Bar,Pizza Place,Tapas Restaurant,Plaza,Bar,Theater,Opera House,Mediterranean Restaurant,Café
4,Eixample,2.18307,1,Coffee Shop,Restaurant,Sushi Restaurant,Supermarket,Bistro,Food & Drink Shop,Café,Chinese Restaurant,Yoga Studio,Juice Bar
7,Eixample,2.14494,1,Restaurant,Wine Shop,Spanish Restaurant,Theater,Hotel,French Restaurant,Big Box Store,Supermarket,Bookstore,Japanese Restaurant
8,Eixample,2.14254,1,Mediterranean Restaurant,Restaurant,Gym,Paella Restaurant,Café,Spanish Restaurant,Pizza Place,Tech Startup,Mobile Phone Shop,Breakfast Spot
9,Eixample,2.15923,1,Tapas Restaurant,Café,Coffee Shop,Bar,Thai Restaurant,Mexican Restaurant,Burger Joint,Mediterranean Restaurant,Liquor Store,Restaurant
10,Sants-Montjuïc,2.16668,1,Tapas Restaurant,Brewery,Park,Café,Yoga Studio,Hostel,Winery,Hotel,Lounge,Monument / Landmark
14,Sants-Montjuïc,2.14597,1,Burger Joint,Mediterranean Restaurant,Spanish Restaurant,Restaurant,Tapas Restaurant,Scenic Lookout,Pizza Place,Bar,Café,Event Space
15,Sants-Montjuïc,2.13622,1,Pizza Place,Italian Restaurant,Burger Joint,Mediterranean Restaurant,Park,Hotel,Supermarket,Japanese Restaurant,Food & Drink Shop,Music Venue
16,Sants-Montjuïc,2.1273,1,Pizza Place,Bakery,Supermarket,Tapas Restaurant,Hostel,Latin American Restaurant,Mediterranean Restaurant,Mexican Restaurant,Escape Room,Coffee Shop
18,Les Corts,2.13631,1,Restaurant,Café,Asian Restaurant,Sporting Goods Shop,Pedestrian Plaza,Thai Restaurant,Shopping Mall,Sandwich Place,Cosmetics Shop,Electronics Store


Cluster 3

In [29]:
barcelona_merged.loc[barcelona_merged['Cluster Labels'] == 2, barcelona_merged.columns[[1] + list(range(5, barcelona_merged.shape[1]))]]



Unnamed: 0,NOM_DISTRICTE,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,Sarrià-Sant Gervasi,2.11609,2,Museum,Park,Hostel,Dog Run,Gym Pool,Tunnel,Hotel,Soccer Field,Diner,Tapas Restaurant
42,Horta-Guinardó,2.15795,2,Soccer Field,Bakery,Hotel,Outdoors & Recreation,Supermarket,Park,Tapas Restaurant,Hostel,Farmers Market,Farm
56,Sant Andreu,2.19129,2,Soccer Field,Park,Spanish Restaurant,Metro Station,Tapas Restaurant,Empanada Restaurant,Flower Shop,Flea Market,Fast Food Restaurant,Farmers Market


Cluster 4

In [30]:
barcelona_merged.loc[barcelona_merged['Cluster Labels'] == 3, barcelona_merged.columns[[1] + list(range(5, barcelona_merged.shape[1]))]]



Unnamed: 0,NOM_DISTRICTE,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Sants-Montjuïc,2.13847,3,Spanish Restaurant,Café,Grocery Store,Mediterranean Restaurant,Furniture / Home Store,Supermarket,Bus Station,Transportation Service,Asian Restaurant,Plaza
12,Sants-Montjuïc,2.13847,3,Spanish Restaurant,Café,Grocery Store,Mediterranean Restaurant,Furniture / Home Store,Supermarket,Bus Station,Transportation Service,Asian Restaurant,Plaza
33,Horta-Guinardó,2.16253,3,Spanish Restaurant,Chinese Restaurant,Grocery Store,Scenic Lookout,Pool,Tapas Restaurant,Bookstore,Basketball Court,Italian Restaurant,Bakery
37,Horta-Guinardó,2.14482,3,Metro Station,Spanish Restaurant,Bus Station,Soccer Stadium,Plaza,Stadium,Café,Bar,Bakery,Farmers Market
41,Horta-Guinardó,2.15268,3,Spanish Restaurant,Coffee Shop,Hostel,Supermarket,Outdoor Sculpture,Plaza,Asian Restaurant,Gym,Grocery Store,Café
50,Nou Barris,2.17485,3,Spanish Restaurant,Grocery Store,Park,Plaza,Brewery,Sushi Restaurant,South American Restaurant,Soccer Stadium,Snack Place,Skate Park
51,Nou Barris,2.18164,3,Spanish Restaurant,Grocery Store,Park,Pizza Place,Restaurant,Italian Restaurant,Breakfast Spot,Bus Station,Sports Club,Supermarket
52,Nou Barris,2.18532,3,Spanish Restaurant,Metro Station,Park,Pharmacy,Plaza,Tapas Restaurant,Breakfast Spot,Grocery Store,Escape Room,Ethiopian Restaurant
53,Nou Barris,2.17619,3,Spanish Restaurant,Art Gallery,Scenic Lookout,Optical Shop,Electronics Store,Flower Shop,Flea Market,Fast Food Restaurant,Farmers Market,Farm


Cluster 5

In [31]:
barcelona_merged.loc[barcelona_merged['Cluster Labels'] == 4, barcelona_merged.columns[[1] + list(range(5, barcelona_merged.shape[1]))]]



Unnamed: 0,NOM_DISTRICTE,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
54,Nou Barris,2.17722,4,Metro Station,Park,Plaza,Grocery Store,Train Station,Donut Shop,Flea Market,Fast Food Restaurant,Farmers Market,Farm
55,Nou Barris,2.18313,4,Plaza,Train Station,Grocery Store,Park,Dog Run,Flea Market,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
57,Sant Andreu,2.19963,4,Plaza,Metro Station,Café,Park,Electronics Store,Flower Shop,Flea Market,Fast Food Restaurant,Farmers Market,Farm


***

# Exploring Madrid

### Data Collection

We begin by reading the csv data and collecting coordinates on neighbourhoods of Madrid

In [32]:
!wget -q -O 'madrid_data.csv' https://datos.madrid.es/egob/catalogo/200078-1-distritos-barrios.csv
print('Data downloaded!')

Data downloaded!


In [33]:
dfm = pd.read_csv('madrid_data.csv', sep=';', encoding='latin-1')
dfm.head()

Unnamed: 0,Codigo de barrio,Codigo de distrito al que pertenece,Nombre de barrio,Nombre acentuado del barrio,Superficie (m2),Perimetro (m)
0,1,1,PALACIO,PALACIO,1471085,5754
1,1,2,IMPERIAL,IMPERIAL,967500,4557
2,1,3,PACIFICO,PACÍFICO,750065,4005
3,1,4,RECOLETOS,RECOLETOS,870857,3927
4,1,5,EL VISO,EL VISO,1708046,5269


### Feature Engineering 

The column names in this file contain spaces, so to make them easier to work with we rename the attributes we need:
1. *Codigo de barrio*: Neighbourhood Code rename to: *CODI_BARRI*
2. *Codigo de distrito al que pertenece*: District code to which it belongs rename to: *CODI_DISTRICTE*
3. *Nombre de barrio*: Neighbourhood Name rename to: *NOM_BARRI*

In [34]:
dfm = dfm.rename(columns = {"Codigo de barrio":"CODI_BARRI", "Codigo de distrito al que pertenece":"CODI_DISTRICTE", "Nombre de barrio":"NOM_BARRI"})

dfm

Unnamed: 0,CODI_BARRI,CODI_DISTRICTE,NOM_BARRI,Nombre acentuado del barrio,Superficie (m2),Perimetro (m)
0,1,1,PALACIO,PALACIO,1471085,5754
1,1,2,IMPERIAL,IMPERIAL,967500,4557
2,1,3,PACIFICO,PACÍFICO,750065,4005
3,1,4,RECOLETOS,RECOLETOS,870857,3927
4,1,5,EL VISO,EL VISO,1708046,5269
5,1,6,BELLAS VISTAS,BELLAS VISTAS,716261,3443
6,1,7,GAZTAMBIDE,GAZTAMBIDE,506596,2969
7,1,8,EL PARDO,EL PARDO,187642916,87125
8,1,9,CASA DE CAMPO,CASA DE CAMPO,17470075,19233
9,1,10,LOS CARMENES,LOS CÁRMENES,1292235,6186


In [35]:
dfm.keys()

Index(['CODI_BARRI', 'CODI_DISTRICTE', 'NOM_BARRI',
       'Nombre acentuado del barrio', 'Superficie (m2)', 'Perimetro (m)'],
      dtype='object')
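
The neighbourhood names printed later in the notebook appear padded with trailing spaces (for example `----MARROQUINA          ----`). A minimal sketch, assuming the padding comes from the CSV itself, of stripping it before the names are used in geocoding queries or joins. The toy frame below is illustrative and is not the real data.

```python
import pandas as pd

# Toy frame standing in for dfm; the padded names are an assumption based on
# the padded names visible in the printed output.
toy = pd.DataFrame({"NOM_BARRI": ["PALACIO             ", "IMPERIAL            "]})

# Remove leading/trailing whitespace so queries and joins use clean names.
toy["NOM_BARRI"] = toy["NOM_BARRI"].str.strip()

print(toy["NOM_BARRI"].tolist())
```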

In [36]:
def get_latlng(NOM_BARRI):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Madrid, Spain'.format(NOM_BARRI))
        lat_lng_coords = g.latlng
    return lat_lng_coords
# Call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(NOM_BARRI) for NOM_BARRI in dfm["NOM_BARRI"].tolist()]
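
The `while` loop above retries indefinitely, so a name the geocoder cannot resolve would stall the notebook. A hedged sketch of a bounded-retry variant follows. `get_latlng_bounded`, `lookup`, `max_attempts` and `pause` are hypothetical names introduced here; in the notebook, `lookup` could wrap `geocoder.arcgis(query).latlng`.

```python
import time

def get_latlng_bounded(name, lookup, max_attempts=3, pause=0.0):
    """Return [lat, lng] for name, or None after max_attempts failed lookups.

    `lookup` is a hypothetical callable that takes a query string and
    returns a [lat, lng] list or None.
    """
    query = '{}, Madrid, Spain'.format(name.strip())
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(pause)  # brief pause before the next attempt
    return None

# Fake lookup used only to demonstrate the behaviour without network access.
def fake_lookup(query):
    return [40.41517, -3.71273] if query.startswith('PALACIO') else None

print(get_latlng_bounded('PALACIO', fake_lookup))
print(get_latlng_bounded('NOWHERE', fake_lookup))
```

Neighbourhoods that come back as `None` can then be inspected or dropped explicitly instead of blocking the run.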

In [37]:
dfm_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
# Merge the coordinates into the original dataframe
dfm['Latitude'] = dfm_coords['Latitude']
dfm['Longitude'] = dfm_coords['Longitude']
print(dfm.shape)

(131, 8)


In [38]:
dfm.head()

Unnamed: 0,CODI_BARRI,CODI_DISTRICTE,NOM_BARRI,Nombre acentuado del barrio,Superficie (m2),Perimetro (m),Latitude,Longitude
0,1,1,PALACIO,PALACIO,1471085,5754,40.41517,-3.71273
1,1,2,IMPERIAL,IMPERIAL,967500,4557,40.40833,-3.71865
2,1,3,PACIFICO,PACÍFICO,750065,4005,40.40191,-3.67603
3,1,4,RECOLETOS,RECOLETOS,870857,3927,40.4253,-3.68651
4,1,5,EL VISO,EL VISO,1708046,5269,40.44746,-3.68543


## Map of Madrid with all of the Neighbourhoods

We use Nominatim (from geopy) to get the coordinates of the city first, and then folium to map the neighbourhoods.

In [39]:
address = 'Madrid, Spain'

geolocator = Nominatim(user_agent="madrid_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The coordinates of Madrid are {}, {}.'.format(latitude, longitude))

The coordinates of Madrid are 40.4167047, -3.7035825.


In [40]:
map_Madrid = folium.Map(location=[latitude, longitude], zoom_start=10.4)

# adding markers to map
for lat, lng, NOM_BARRI in zip(dfm['Latitude'], dfm['Longitude'], dfm['NOM_BARRI']):
    # use lat/lng so the city-centre latitude/longitude are not overwritten
    label = '{}'.format(NOM_BARRI)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True
        ).add_to(map_Madrid)  
    
map_Madrid

### Venues in Madrid


We define a function that queries the Foursquare API for the venues within a given radius (500 m by default) of each neighbourhood's coordinates.



In [41]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius
            )
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Category']
    
    return(nearby_venues)
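
The function above indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, which raises if a response has no groups or a venue has no category. A minimal sketch of a more defensive flattening step follows. `extract_venues` is a hypothetical helper, and the sample payload is hand-built to mirror only the keys the function above reads; it is not a real API response.

```python
def extract_venues(name, lat, lng, payload):
    """Flatten one explore-style payload into (name, lat, lng, venue, category) rows."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        categories = venue.get('categories') or []
        if not categories:
            continue  # skip venues that carry no category
        rows.append((name, lat, lng, venue.get('name'), categories[0]['name']))
    return rows

# Hand-built sample payload (assumption: same nesting as the keys used above).
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Plaza de la Villa', 'categories': [{'name': 'Historic Site'}]}},
    {'venue': {'name': 'Uncategorised place', 'categories': []}},
]}]}}

rows = extract_venues('PALACIO', 40.41517, -3.71273, sample)
empty = extract_venues('EL PARDO', 40.0, -3.0, {'response': {}})
print(rows)
print(empty)
```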

Getting all of the venues in Madrid

In [42]:
madrid_venues = getNearbyVenues(dfm['NOM_BARRI'], dfm['Latitude'], dfm['Longitude'])


PALACIO             
IMPERIAL            
PACIFICO            
RECOLETOS           
EL VISO             
BELLAS VISTAS       
GAZTAMBIDE          
EL PARDO            
CASA DE CAMPO       
LOS CARMENES        
COMILLAS            
ORCASITAS           
ENTREVIAS           
PAVONES             
VENTAS              
PALOMAS             
VILLAVERDE ALTO C.H.
CASCO H.VALLECAS    
CASCO H.VICALVARO   
SIMANCAS            
ALAMEDA DE OSUNA    
EMBAJADORES         
ACACIAS             
ADELFAS             
GOYA                
PROSPERIDAD         
CUATRO CAMINOS      
ARAPILES            
FUENTELARREINA      
ARGUELLES           
PUERTA DEL ANGEL    
OPAÑEL              
ORCASUR             
SAN DIEGO           
HORCAJO             
PUEBLO NUEVO        
PIOVERA             
SAN CRISTOBAL       
SANTA EUGENIA       
VALDEBERNARDO       
HELLIN              
AEROPUERTO          
CORTES              
CHOPERA             
ESTRELLA            
FUENTE DEL BERRO    
CIUDAD JARDIN       
CASTILLEJOS  

In [43]:
madrid_venues.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,PALACIO,40.41517,-3.71273,Plaza de la Villa,Historic Site
1,PALACIO,40.41517,-3.71273,Cervecería La Mayor,Beer Bar
2,PALACIO,40.41517,-3.71273,Santa Iglesia Catedral de Santa María la Real ...,Church
3,PALACIO,40.41517,-3.71273,Plaza de La Almudena,Plaza
4,PALACIO,40.41517,-3.71273,la gastroteca de santiago,Restaurant


In [44]:
madrid_venues.shape

(2118, 5)

### Grouping by Venue Categories


In [45]:
madrid_venues.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
ABRANTES,6,6,6,6
ACACIAS,30,30,30,30
ADELFAS,30,30,30,30
ALAMEDA DE OSUNA,24,24,24,24
ALMAGRO,30,30,30,30
ALMENARA,4,4,4,4
ALMENDRALES,17,17,17,17
ALUCHE,15,15,15,15
AMPOSTA,9,9,9,9
APOSTOL SANTIAGO,7,7,7,7


In [46]:
print('There are {} unique categories.'.format(len(madrid_venues['Venue Category'].unique())))

There are 232 unique categories.


### Analyze each Neighbourhood


In [47]:
# one hot encoding
madrid_onehot = pd.get_dummies(madrid_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
madrid_onehot['Neighbourhood'] = madrid_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [madrid_onehot.columns[-1]] + list(madrid_onehot.columns[:-1])
madrid_onehot = madrid_onehot[fixed_columns]

madrid_onehot.head()

Unnamed: 0,Neighbourhood,Accessories Store,American Restaurant,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Art Studio,Asian Restaurant,...,Trail,Train,Train Station,Used Bookstore,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio,Zoo
0,PALACIO,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,PALACIO,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,PALACIO,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,PALACIO,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,PALACIO,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


We group the rows by neighbourhood and take the mean of each one-hot column, which gives the relative frequency of each venue category in that neighbourhood.



In [48]:
madrid_grouped = madrid_onehot.groupby('Neighbourhood').mean().reset_index()
madrid_grouped

Unnamed: 0,Neighbourhood,Accessories Store,American Restaurant,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Art Studio,Asian Restaurant,...,Trail,Train,Train Station,Used Bookstore,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio,Zoo
0,ABRANTES,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,ACACIAS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,...,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,ADELFAS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,ALAMEDA DE OSUNA,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,ALMAGRO,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,ALMENARA,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,ALMENDRALES,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,ALUCHE,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,AMPOSTA,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,APOSTOL SANTIAGO,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
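
The one-hot and group-mean steps can be checked on toy data. The sketch below uses a made-up four-row venue table (not the Madrid data) to show that the mean of the one-hot columns is the share of each category within a neighbourhood. `dtype=int` is passed so the dummy columns are 0/1 integers regardless of pandas version.

```python
import pandas as pd

# Made-up venues: neighbourhood A has two bars and one park, B has one park.
toy_venues = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Bar', 'Bar', 'Park', 'Park'],
})

# One-hot encode the category, then put the neighbourhood back as first column.
onehot = pd.get_dummies(toy_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)
onehot.insert(0, 'Neighbourhood', toy_venues['Neighbourhood'])

# Mean of 0/1 columns per neighbourhood = relative frequency of each category.
grouped = onehot.groupby('Neighbourhood').mean().reset_index()
print(grouped)
```

Each row of `grouped` sums to 1 across the category columns, so neighbourhoods with many venues and neighbourhoods with few are put on the same scale before clustering.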


We print the 10 most frequent venue categories for each neighbourhood.

In [49]:
num_top_venues = 10

for hood in madrid_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = madrid_grouped[madrid_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

 & Drink Shop  0.08
7           Snack Place  0.08
8    Italian Restaurant  0.08
9  Fast Food Restaurant  0.08


----MARROQUINA          ----
                  venue  freq
0          Soccer Field   0.2
1            Restaurant   0.2
2             Nightclub   0.2
3                Bakery   0.2
4                  Park   0.2
5     Accessories Store   0.0
6  Other Great Outdoors   0.0
7             Multiplex   0.0
8                Museum   0.0
9          Music School   0.0


----MEDIA LEGUA         ----
                  venue  freq
0           Pizza Place  0.11
1                   Bar  0.11
2            Restaurant  0.11
3  Fast Food Restaurant  0.11
4   American Restaurant  0.05
5    Chinese Restaurant  0.05
6         Grocery Store  0.05
7         Big Box Store  0.05
8        Sandwich Place  0.05
9                   Gym  0.05


----MIRASIERRA          ----
               venue  freq
0            Theater  0.25
1  Convenience Store  0.25
2      Metro Station  0.25
3      Grocery Store  0.25
4 

In [50]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
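
A short usage sketch of the function above on a made-up row (the frequencies are illustrative, not taken from the Madrid data). The function is repeated so the block runs on its own.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the leading Neighbourhood entry
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Made-up frequency row in the same layout as one row of madrid_grouped.
toy_row = pd.Series({'Neighbourhood': 'A', 'Bar': 0.5, 'Park': 0.3, 'Café': 0.2, 'Zoo': 0.0})

top = return_most_common_venues(toy_row, 3)
print(list(top))
```

Note that the function always returns `num_top_venues` names. When a neighbourhood has fewer distinct categories than that, the remaining slots are filled with categories whose frequency is 0, as in the `MARROQUINA` output above where ranks 5 to 9 have `freq` 0.0.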

### Top venue categories in Madrid

In [51]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = madrid_grouped['Neighbourhood']

for ind in np.arange(madrid_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(madrid_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ABRANTES,Fast Food Restaurant,Athletics & Sports,Plaza,Pizza Place,Park,Gym / Fitness Center,Fabric Shop,Falafel Restaurant,Farm,Farmers Market
1,ACACIAS,Bar,Park,Spanish Restaurant,Café,Tapas Restaurant,Pizza Place,Indie Theater,Brewery,Pub,Bookstore
2,ADELFAS,Supermarket,Grocery Store,Bakery,Tapas Restaurant,Burger Joint,Korean Restaurant,Bookstore,Breakfast Spot,Brewery,Football Stadium
3,ALAMEDA DE OSUNA,Bakery,Restaurant,Smoke Shop,Scenic Lookout,Fried Chicken Joint,Bistro,Spanish Restaurant,Tapas Restaurant,Plaza,Market
4,ALMAGRO,Restaurant,Japanese Restaurant,French Restaurant,Hotel,Bistro,Sports Club,Boutique,Spanish Restaurant,Furniture / Home Store,Market
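
The column names above are built by catching the failed lookup in `indicators`. An alternative sketch computes the suffix directly; `ordinal` is a hypothetical helper introduced here, not part of the notebook. It also handles values such as 11, 12 and 21 that the three-element list would not.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 10th to 20th all take 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['Neighbourhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns)
```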


## Model Building

K-means: clustering the neighbourhoods of Madrid into 5 clusters

In [52]:
# number of clusters
kclusters = 5

madrid_grouped_clustering = madrid_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(madrid_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([3, 3, 3, 3, 3, 3, 1, 3, 0, 0], dtype=int32)
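
The notebook fixes `kclusters = 5` without showing how that value was chosen. One common way to compare candidate values is the silhouette score. The sketch below is not the method used in this notebook and runs on synthetic 2-D points rather than the Madrid frequencies; the three cluster centres and the candidate range 2 to 6 are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three well-separated blobs of 30 points each (illustrative only).
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centres])

# Fit k-means for each candidate k and record the mean silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The candidate with the highest score is the best-separated clustering.
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

On the real data the same loop could be run over `madrid_grouped_clustering`; the scores there may be much closer together than in this toy example.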

We build a new dataframe that includes the cluster label as well as the top 10 venues for each neighbourhood.


In [53]:
# add clustering labels
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

madrid_merged = dfm

madrid_merged = madrid_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='NOM_BARRI')

madrid_merged.head() # check the last columns!

Unnamed: 0,CODI_BARRI,CODI_DISTRICTE,NOM_BARRI,Nombre acentuado del barrio,Superficie (m2),Perimetro (m),Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,1,PALACIO,PALACIO,1471085,5754,40.41517,-3.71273,3.0,Plaza,Spanish Restaurant,Historic Site,Ice Cream Shop,Pastry Shop,Café,Church,Cocktail Bar,Restaurant,Performing Arts Venue
1,1,2,IMPERIAL,IMPERIAL,967500,4557,40.40833,-3.71865,1.0,Spanish Restaurant,Park,Hotel,Pizza Place,Japanese Restaurant,Chinese Restaurant,Church,Spa,Coffee Shop,Garden
2,1,3,PACIFICO,PACÍFICO,750065,4005,40.40191,-3.67603,3.0,Food & Drink Shop,Restaurant,Spanish Restaurant,Bakery,Pizza Place,Tapas Restaurant,Sandwich Place,Café,Farmers Market,Burger Joint
3,1,4,RECOLETOS,RECOLETOS,870857,3927,40.4253,-3.68651,1.0,Restaurant,Spanish Restaurant,Clothing Store,Accessories Store,Tapas Restaurant,Hotel,Japanese Restaurant,Jewelry Store,Furniture / Home Store,Market
4,1,5,EL VISO,EL VISO,1708046,5269,40.44746,-3.68543,1.0,Spanish Restaurant,Pizza Place,Café,Bakery,Mediterranean Restaurant,Sandwich Place,Bistro,Sushi Restaurant,Restaurant,Gourmet Shop


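Drop the neighbourhoods that have no cluster label, i.e. those that did not appear in the venue data and so were left empty by the join.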



In [54]:
madrid_merged.dropna(subset=['Cluster Labels'], inplace=True)
madrid_merged.isnull().sum()

CODI_BARRI                     0
CODI_DISTRICTE                 0
NOM_BARRI                      0
Nombre acentuado del barrio    0
Superficie (m2)                0
Perimetro (m)                  0
Latitude                       0
Longitude                      0
Cluster Labels                 0
1st Most Common Venue          0
2nd Most Common Venue          0
3rd Most Common Venue          0
4th Most Common Venue          0
5th Most Common Venue          0
6th Most Common Venue          0
7th Most Common Venue          0
8th Most Common Venue          0
9th Most Common Venue          0
10th Most Common Venue         0
dtype: int64
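
After the unlabelled rows are dropped, `Cluster Labels` is still a float column (3.0, 1.0, ...) because the join introduced missing values. A minimal sketch, on a made-up frame rather than `madrid_merged`, of casting the labels to integers and counting the neighbourhoods per cluster, which is a quick way to spot very small clusters:

```python
import pandas as pd

# Made-up stand-in for madrid_merged after dropna (labels are floats).
toy_merged = pd.DataFrame({
    'NOM_BARRI': ['A', 'B', 'C', 'D', 'E'],
    'Cluster Labels': [3.0, 1.0, 3.0, 0.0, 1.0],
})

# Safe to cast once no NaN labels remain.
toy_merged['Cluster Labels'] = toy_merged['Cluster Labels'].astype(int)

# Number of neighbourhoods in each cluster, ordered by label.
sizes = toy_merged['Cluster Labels'].value_counts().sort_index()
print(sizes)
```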

## Visualizing Madrid with clustered neighbourhoods

In [55]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11.4)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(madrid_merged['Latitude'], madrid_merged['Longitude'], madrid_merged['NOM_BARRI'], madrid_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Examining our Clusters

Cluster 1

In [56]:
madrid_merged.loc[madrid_merged['Cluster Labels'] == 0, madrid_merged.columns[[1] + list(range(5, madrid_merged.shape[1]))]]


Unnamed: 0,CODI_DISTRICTE,Perimetro (m),Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,13,11057,40.37925,-3.67212,0.0,Pizza Place,Train Station,Grocery Store,Bakery,Gym / Fitness Center,Park,Restaurant,Diner,Embassy / Consulate,Flea Market
14,15,8207,40.42238,-3.6502,0.0,Soccer Field,Grocery Store,Food,Bakery,Pizza Place,Restaurant,Falafel Restaurant,Metro Station,Spanish Restaurant,Trail
16,17,13710,40.34922,-3.71211,0.0,Home Service,Bar,Construction & Landscaping,Train,Bakery,Park,Zoo,Fast Food Restaurant,Falafel Restaurant,Farm
17,18,31924,40.38333,-3.61667,0.0,Spanish Restaurant,Plaza,Soccer Field,Park,Zoo,Event Space,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop
18,19,33326,40.4,-3.6,0.0,Platform,Supermarket,Falafel Restaurant,Metro Station,Grocery Store,Train Station,Park,Farmers Market,Fabric Shop,Farm
28,8,7691,40.4802,-3.73934,0.0,Park,Bakery,Zoo,Exhibit,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop
30,10,5373,40.40886,-3.73392,0.0,Grocery Store,Pizza Place,Spanish Restaurant,Student Center,Snack Place,Restaurant,Other Great Outdoors,Supermarket,Park,Cosmetics Shop
52,11,6149,40.39588,-3.73048,0.0,Bar,Food Service,Beer Bar,Rock Club,Grocery Store,Stadium,Fish Market,Fish & Chips Shop,Flea Market,Fast Food Restaurant
54,13,5649,40.38756,-3.66029,0.0,Park,Soccer Stadium,Tapas Restaurant,Supermarket,Concert Hall,Fast Food Restaurant,Falafel Restaurant,Farm,Farmers Market,Zoo
55,14,6746,40.41176,-3.64618,0.0,Soccer Field,Park,Restaurant,Bakery,Nightclub,Zoo,Exhibit,Flower Shop,Flea Market,Fish Market


Cluster 2

In [57]:
madrid_merged.loc[madrid_merged['Cluster Labels'] == 1, madrid_merged.columns[[1] + list(range(5, madrid_merged.shape[1]))]]


Unnamed: 0,CODI_DISTRICTE,Perimetro (m),Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,2,4557,40.40833,-3.71865,1.0,Spanish Restaurant,Park,Hotel,Pizza Place,Japanese Restaurant,Chinese Restaurant,Church,Spa,Coffee Shop,Garden
3,4,3927,40.4253,-3.68651,1.0,Restaurant,Spanish Restaurant,Clothing Store,Accessories Store,Tapas Restaurant,Hotel,Japanese Restaurant,Jewelry Store,Furniture / Home Store,Market
4,5,5269,40.44746,-3.68543,1.0,Spanish Restaurant,Pizza Place,Café,Bakery,Mediterranean Restaurant,Sandwich Place,Bistro,Sushi Restaurant,Restaurant,Gourmet Shop
5,6,3443,40.45457,-3.70552,1.0,Spanish Restaurant,Bar,Tapas Restaurant,Bakery,Supermarket,Seafood Restaurant,Museum,Grocery Store,Donut Shop,Frozen Yogurt Shop
10,11,4257,40.39495,-3.70976,1.0,Spanish Restaurant,Park,Tapas Restaurant,Bar,Beer Garden,Playground,Grocery Store,Bridge,Soccer Field,Fast Food Restaurant
13,14,4134,40.40004,-3.633,1.0,Tea Room,Breakfast Spot,Diner,Spanish Restaurant,Bus Station,Grocery Store,Athletics & Sports,Fish & Chips Shop,Farm,Farmers Market
19,20,6678,40.43577,-3.62488,1.0,Hotel,Spanish Restaurant,Restaurant,Café,Bar,Brewery,Fast Food Restaurant,Mediterranean Restaurant,Sandwich Place,Music Venue
24,4,3473,40.42547,-3.67418,1.0,Spanish Restaurant,Seafood Restaurant,Bookstore,Supermarket,Italian Restaurant,Basketball Stadium,Burger Joint,French Restaurant,Spa,Restaurant
25,5,5250,40.4418,-3.66925,1.0,Spanish Restaurant,Tapas Restaurant,Café,Supermarket,Hotel,Restaurant,Grocery Store,Chinese Restaurant,Brazilian Restaurant,Fried Chicken Joint
27,7,3096,40.43286,-3.7084,1.0,Spanish Restaurant,Restaurant,Café,Cheese Shop,Ice Cream Shop,Supermarket,Bar,Theater,Market,Embassy / Consulate


Cluster 3

In [58]:
madrid_merged.loc[madrid_merged['Cluster Labels'] == 2, madrid_merged.columns[[1] + list(range(5, madrid_merged.shape[1]))]]


Unnamed: 0,CODI_DISTRICTE,Perimetro (m),Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
36,16,7812,40.45653,-3.6371,2.0,Cheese Shop,Zoo,Fabric Shop,Food Service,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop


Cluster 4

In [59]:
madrid_merged.loc[madrid_merged['Cluster Labels'] == 3, madrid_merged.columns[[1] + list(range(5, madrid_merged.shape[1]))]]


Unnamed: 0,CODI_DISTRICTE,Perimetro (m),Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,5754,40.41517,-3.71273,3.0,Plaza,Spanish Restaurant,Historic Site,Ice Cream Shop,Pastry Shop,Café,Church,Cocktail Bar,Restaurant,Performing Arts Venue
2,3,4005,40.40191,-3.67603,3.0,Food & Drink Shop,Restaurant,Spanish Restaurant,Bakery,Pizza Place,Tapas Restaurant,Sandwich Place,Café,Farmers Market,Burger Joint
6,7,2969,40.4349,-3.71551,3.0,Ice Cream Shop,Spanish Restaurant,Japanese Restaurant,Salad Place,Burrito Place,Burger Joint,Restaurant,Boxing Gym,Cocktail Bar,Tapas Restaurant
8,9,19233,40.41336,-3.75982,3.0,Exhibit,Zoo,Farm,Food Service,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop
9,10,6186,40.4028,-3.73178,3.0,Electronics Store,Restaurant,Grocery Store,Event Space,Food,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop,Fast Food Restaurant
11,12,4664,40.36985,-3.71231,3.0,Plaza,Sporting Goods Shop,Café,Farmers Market,Grocery Store,Zoo,Exhibit,Food,Flower Shop,Flea Market
15,16,4988,40.45383,-3.61501,3.0,Restaurant,Café,Asian Restaurant,Spanish Restaurant,Cosmetics Shop,Gym,Coffee Shop,Cocktail Bar,Sandwich Place,Fast Food Restaurant
20,21,6043,40.45818,-3.58953,3.0,Bakery,Restaurant,Smoke Shop,Scenic Lookout,Fried Chicken Joint,Bistro,Spanish Restaurant,Tapas Restaurant,Plaza,Market
21,1,4267,40.40803,-3.70067,3.0,Art Gallery,Tapas Restaurant,Plaza,Market,Restaurant,Pizza Place,Bar,Seafood Restaurant,Mediterranean Restaurant,Liquor Store
22,2,3950,40.40137,-3.70669,3.0,Bar,Park,Spanish Restaurant,Café,Tapas Restaurant,Pizza Place,Indie Theater,Brewery,Pub,Bookstore


Cluster 5

In [60]:
madrid_merged.loc[madrid_merged['Cluster Labels'] == 4, madrid_merged.columns[[1] + list(range(5, madrid_merged.shape[1]))]]


Unnamed: 0,CODI_DISTRICTE,Perimetro (m),Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
91,9,8300,40.46996,-3.77641,4.0,Tennis Court,Arcade,Zoo,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop,Fast Food Restaurant


# Results and Discussion

The neighbourhoods of Barcelona and Madrid offer both similar and different multicultural experiences. Starting with geography, Madrid is a bustling capital on arid plains in the heart of Spain; it is the bigger city, with 131 neighbourhoods in the data to explore. Barcelona is relatively small, with 73 neighbourhoods. Barcelona is more of a Mediterranean city and offers more picturesque natural scenery. Both have historical sites and museums, although Barcelona has more extraordinary art and architecture to enjoy on the street, as in the Gothic Quarter; it is a combination of city and beach. Meanwhile, Madrid is a cosmopolitan city and offers more possibilities in terms of entertainment and Spanish atmosphere. In terms of cuisine, both Barcelona and Madrid offer a wide variety of international food. Barcelona offers more Catalan cuisine and is more vegetarian- and vegan-friendly, while Madrid offers more traditional Spanish food and a gourmet experience. Both have good street markets, such as Barcelona's La Rambla and Madrid's El Rastro. If you are looking for a cosmopolitan, bustling city with lots of entertainment, Madrid will be the choice. If you are looking for a combination of city and beach or nature, with architecture and art on the street, Barcelona will be the choice.

# Conclusion

The purpose of this project was to explore two of the biggest cities in Spain (Barcelona and Madrid) and see how attractive they are to potential tourists and migrants. We explored both cities by neighbourhood, extracted the most common venues in each neighbourhood, and concluded by clustering similar neighbourhoods together.

We can conclude that the neighbourhoods in both cities are culturally diverse and offer a wide variety of experiences, each unique in its own way.

Both cities (Barcelona and Madrid) seem to offer a nice vacation stay, with a lot of places to explore and a variety of unique activities. Overall, it is up to the individual to decide which experience they prefer: a more cosmopolitan one with lots of entertainment possibilities (Madrid), or a combination of city and natural scenery with extraordinary art and architecture integrated into its streets (Barcelona).