## A Battle of Neighbourhoods - Clustering the Neighbourhoods of Barcelona and Madrid


Dimas Dwi Putra

To view the notebook with full maps: https://dataplatform.cloud.ibm.com/analytics/notebooks/v2/8524c923-8f6a-4a1e-85d0-fcf752d7c48b/view?access_token=4b5f1bff1b940bcf3d5289d7dff694f8a9c77fe0469bd5cc0a90bf6265a60c38

# 1. Introduction


Madrid and Barcelona are two of the largest, most culturally vibrant and most popular cities in Spain. Their cultures differ in many ways: Madrid is the Spanish capital, while Barcelona is the capital of the autonomous community of Catalonia.

Both Barcelona and Madrid are popular tourist and holiday destinations for people from all around the world. They are diverse, multicultural cities that offer many sought-after experiences. In this project we will group the neighbourhoods of Barcelona and Madrid into clusters, then compare and contrast what each city has to offer. Madrid and Barcelona are also the names of provinces that contain several hundred municipalities between them, so this project focuses on the two main cities only.

# 2. Business Problem

The goal is to help tourists choose a destination based on the experiences that each neighbourhood in the two cities has to offer. Knowing which kind of experience each neighbourhood provides lets a visitor decide where to go and plan the trip ahead of time. Our findings go beyond tourist attractions: we also aim to cover the entertainment venues and cuisines available in each neighbourhood.

# 3. Data

We require geographical location data for both Barcelona and Madrid. For each city we need the neighbourhood code, the district code and the neighbourhood name, which we then use to look up coordinates. With the coordinates we can find the venues in each neighbourhood and their most common venue categories.





### Barcelona Data:

To obtain the geographical data of Barcelona, we use the Open Data Service portal provided by the Barcelona City government, which publishes the data in CSV format: https://opendata-ajuntament.barcelona.cat/data/dataset/808daafa-d9ce-48c0-925a-fa5afdb1ed41/resource/4cc59b76-a977-40ac-8748-61217c8ff367/download/districtes_i_barris_170705.csv

The CSV file has data about all the neighbourhoods in Barcelona, with the following attributes:
***
1. *CODI_DISTRICTE*: District Code 
2. *NOM_DISTRICTE*: District Name
3. *CODI_BARRI*: Neighbourhood Code
4. *NOM_BARRI*: Neighbourhood Name
***


### Madrid Data:

To obtain the geographical data of Madrid, we use the Open Data Service portal provided by the Madrid City government, which publishes the data in CSV format: https://datos.madrid.es/egob/catalogo/200078-1-distritos-barrios.csv

The CSV file has data about all the neighbourhoods in Madrid. We will only use the following attributes, renamed to match the Barcelona dataset:
***
1. *Codigo de barrio*: Neighbourhood Code rename to: *CODI_BARRI*
2. *Codigo de distrito al que pertenece*: District code to which it belongs rename to: *CODI_DISTRICTE*
3. *Nombre de barrio*: Neighbourhood Name rename to: *NOM_BARRI*
***

###  Geocoder/ArcGIS API

ArcGIS Online enables you to connect people, locations, and data using interactive maps. Work with smart, data-driven styles and intuitive analysis tools that deliver location intelligence. Share your insights with the world or specific groups.



For this project we use the geocoder package to call the ArcGIS API, retrieve the coordinates of every neighbourhood in both cities, and add the following attributes to both datasets:
***
1. *Latitude*: Latitude for Neighbourhood
2. *Longitude*: Longitude for Neighbourhood
***
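
The lookup described above can be sketched as follows. This is a minimal illustration rather than the notebook's exact code: `geocode_with_retry`, `add_coordinates` and `fake_geocode` are hypothetical names, and the stub stands in for a real call such as `geocoder.arcgis(query).latlng` so that the example runs without network access. The coordinates in the stub are placeholders.

```python
from typing import Callable, List, Optional, Tuple

Coords = Optional[Tuple[float, float]]


def geocode_with_retry(query: str,
                       geocode: Callable[[str], Coords],
                       max_tries: int = 3) -> Coords:
    """Call `geocode` up to `max_tries` times; return None if every attempt fails."""
    for _ in range(max_tries):
        result = geocode(query)
        if result is not None:
            return result
    return None


def add_coordinates(rows: List[dict], city: str,
                    geocode: Callable[[str], Coords]) -> List[dict]:
    """Return copies of `rows` with 'Latitude' and 'Longitude' keys added."""
    enriched = []
    for row in rows:
        query = '{}, {}'.format(row['NOM_BARRI'], city)
        coords = geocode_with_retry(query, geocode)
        lat, lng = coords if coords is not None else (None, None)
        enriched.append({**row, 'Latitude': lat, 'Longitude': lng})
    return enriched


# Hypothetical stub: placeholder coordinates, not real geocoder output.
def fake_geocode(query: str) -> Coords:
    known = {'el Raval, Barcelona': (41.38, 2.17)}
    return known.get(query)


neighbourhoods = [{'NOM_BARRI': 'el Raval'}, {'NOM_BARRI': 'unknown place'}]
result = add_coordinates(neighbourhoods, 'Barcelona', fake_geocode)
print(result)
```

Bounding the number of attempts means a name that the geocoder cannot resolve ends up with empty coordinates instead of blocking the whole run.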

### Venue Data using Foursquare API

We will need data about the different venues in each neighbourhood. To obtain it we use the Foursquare API, which provides information on all manner of venues and events within an area of interest, such as venue names, locations, menus and even photos.

We will connect to the Foursquare API to gather information about venues inside each and every neighbourhood. For each neighbourhood, we have chosen the radius to be 500 meters.
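
To make the 500-metre figure concrete, the sketch below computes the great-circle (haversine) distance between a neighbourhood centre and a venue. It is only an illustration: Foursquare applies the radius on its own side, `haversine_m` and `within_radius` are hypothetical helper names, the coordinates are placeholders, and the Earth radius is the usual mean-radius approximation.

```python
import math


def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in metres between two (latitude, longitude) points."""
    earth_radius_m = 6371000.0  # mean Earth radius, an approximation
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def within_radius(centre, venue, radius_m=500):
    """True if `venue` (lat, lng) lies within `radius_m` metres of `centre`."""
    return haversine_m(centre[0], centre[1], venue[0], venue[1]) <= radius_m


centre = (41.38, 2.17)          # placeholder neighbourhood centre
near = (41.38 + 0.003, 2.17)    # about 0.003 degrees of latitude to the north
far = (41.38 + 0.010, 2.17)     # about 0.010 degrees of latitude to the north
print(within_radius(centre, near), within_radius(centre, far))
```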

The data retrieved from Foursquare contains the venues within the specified distance of each neighbourhood's latitude and longitude. The information obtained per venue is as follows:

1. *Neighbourhood* : Name of the Neighbourhood
2. *Neighbourhood Latitude* : Latitude of the Neighbourhood
3. *Neighbourhood Longitude* : Longitude of the Neighbourhood
4. *Venue* : Name of the Venue
5. *Venue Latitude* : Latitude of Venue
6. *Venue Longitude* : Longitude of Venue
7. *Venue Category* : Category of Venue
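
As a rough sketch, one response can be flattened into rows with these seven fields as shown below. The `response -> groups -> items -> venue` nesting follows the request code used later in this notebook; the `location.lat` / `location.lng` keys and the sample payload are assumptions made for illustration, and `flatten_explore_response` is a hypothetical helper name.

```python
def flatten_explore_response(name, lat, lng, response_json):
    """Turn one explore-style JSON response into a list of 7-field venue rows."""
    items = response_json['response']['groups'][0]['items']
    rows = []
    for item in items:
        venue = item['venue']
        # guard against venues that come back without any category
        categories = venue.get('categories') or [{}]
        rows.append({
            'Neighbourhood': name,
            'Neighbourhood Latitude': lat,
            'Neighbourhood Longitude': lng,
            'Venue': venue['name'],
            'Venue Latitude': venue['location']['lat'],
            'Venue Longitude': venue['location']['lng'],
            'Venue Category': categories[0].get('name'),
        })
    return rows


# Hand-written sample payload; the venue and its coordinates are made up.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Tapas Bar',
               'location': {'lat': 41.381, 'lng': 2.171},
               'categories': [{'name': 'Tapas Restaurant'}]}},
]}]}}

rows = flatten_explore_response('el Raval', 41.38, 2.17, sample)
print(rows[0]['Venue'], '|', rows[0]['Venue Category'])
```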

With all the information collected for both Barcelona and Madrid, we have sufficient data to build our model. We will cluster the neighbourhoods based on similar venue categories and then present our observations and findings. Using this data, our stakeholders can make the necessary decisions.



# 4. Methodology
We will be creating our model with the help of Python so we start off by importing all the required packages.

In [1]:
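# Uncomment the next line if any of the packages are missing from the environment
# !pip install geocoder geopy folium

import numpy as np
import pandas as pd
import requests

import geocoder  # ArcGIS geocoding of neighbourhood names
from geopy.geocoders import Nominatim  # geocoding of the city centres

import matplotlib.cm as cm
import matplotlib.colors as colors
import folium  # interactive maps
from sklearn.cluster import KMeans

pd.options.display.max_rows = 10000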

Python Packages:

- pandas: read and manipulate the CSV data and carry out the analysis
- numpy: numeric arrays used when building tables and colour scales
- geocoder: retrieve the coordinates of each neighbourhood through ArcGIS
- geopy (Nominatim): retrieve the coordinates of each city
- requests: handle HTTP requests
- matplotlib: colour schemes for the generated maps
- folium: generate the maps of Barcelona and Madrid
- sklearn: provides the K-Means clustering model

# Exploring Barcelona 

### Data Collection

We begin by reading the CSV data and collecting the coordinates of the neighbourhoods of Barcelona.

In [2]:
!wget -q -O 'barcelona_data.csv' https://opendata-ajuntament.barcelona.cat/data/dataset/808daafa-d9ce-48c0-925a-fa5afdb1ed41/resource/4cc59b76-a977-40ac-8748-61217c8ff367/download/districtes_i_barris_170705.csv
print('Data downloaded!')

Data downloaded!


In [3]:
dfb = pd.read_csv('barcelona_data.csv')
dfb.head()

| | CODI_DISTRICTE | NOM_DISTRICTE | CODI_BARRI | NOM_BARRI |
|---|---|---|---|---|
| 0 | 1 | Ciutat Vella | 1 | el Raval |
| 1 | 1 | Ciutat Vella | 2 | el Barri Gòtic |
| 2 | 1 | Ciutat Vella | 3 | la Barceloneta |
| 3 | 1 | Ciutat Vella | 4 | Sant Pere, Santa Caterina i la Ribera |
| 4 | 2 | Eixample | 5 | el Fort Pienc |


In [4]:
dfb.keys()

Index(['CODI_DISTRICTE', 'NOM_DISTRICTE', 'CODI_BARRI', 'NOM_BARRI'], dtype='object')

### Geolocations of the Barcelona Neighbourhoods

We will use the geocoder library and the ArcGIS API to obtain the coordinates of each neighbourhood. The inputs are the neighbourhood name (NOM_BARRI) followed by the city and province (Barcelona, Barcelona).

In [5]:
def get_latlng(NOM_BARRI, max_tries=5):
    """Return [latitude, longitude] for a Barcelona neighbourhood, or [None, None] if geocoding fails."""
    # retry a bounded number of times instead of looping forever on a failed lookup
    for _ in range(max_tries):
        g = geocoder.arcgis('{}, Barcelona, Barcelona'.format(NOM_BARRI))
        if g.latlng is not None:
            return g.latlng
    return [None, None]

# Call the function to get the coordinates, store in a new list using list comprehension
coords = [get_latlng(NOM_BARRI) for NOM_BARRI in dfb["NOM_BARRI"].tolist()]

We will place the coordinates in a separate dataframe and then merge them into our original data.

In [None]:
dfb_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
# Merge the coordinates into the original dataframe
dfb['Latitude'] = dfb_coords['Latitude']
dfb['Longitude'] = dfb_coords['Longitude']
print(dfb.shape)

In [None]:
dfb

## Map of Barcelona with all of the Neighbourhoods

We will use Nominatim (from geopy) to obtain the coordinates of the city first, and then the folium library to map the neighbourhoods.

In [None]:
address = 'Barcelona, Spain'

geolocator = Nominatim(user_agent="barcelona_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The coordinates of Barcelona are {}, {}.'.format(latitude, longitude))

In [6]:
map_Barcelona = folium.Map(location=[latitude, longitude], zoom_start=11.4)

# adding markers to map; lat/lng are used as loop names so the city-centre
# latitude/longitude stay available for the later cluster map
for lat, lng, NOM_DISTRICTE, NOM_BARRI in zip(dfb['Latitude'], dfb['Longitude'], dfb['NOM_DISTRICTE'], dfb['NOM_BARRI']):
    label = '{}, {}'.format(NOM_DISTRICTE, NOM_BARRI)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True
        ).add_to(map_Barcelona)

map_Barcelona

### Venues in Barcelona


Now we will use the Foursquare API to get the venues in each neighbourhood.
First we define the Foursquare API credentials.

In [7]:
CLIENT_ID = 'LDCRRZGOPCB3PXVGQCPCHFP1TXDTPMWCM2J3ZVLUQJC35EXW' 
CLIENT_SECRET = 'IJ2PLHTJ0G1VSIQB23B2U2B3VJDXVPSILD1TYGUUC2THLEOI'
VERSION = '20201205' # Foursquare API version


print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: LDCRRZGOPCB3PXVGQCPCHFP1TXDTPMWCM2J3ZVLUQJC35EXW
CLIENT_SECRET:IJ2PLHTJ0G1VSIQB23B2U2B3VJDXVPSILD1TYGUUC2THLEOI


Defining a function to get the nearby venues in the neighbourhood.



In [8]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Collect the Foursquare venues within `radius` metres of each neighbourhood."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius
            )

        # make the GET request and keep the list of returned items
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighbourhood lists into a single dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood',
                  'Neighbourhood Latitude',
                  'Neighbourhood Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues

Getting all of the venues in Barcelona

In [9]:
barcelona_venues = getNearbyVenues(dfb['NOM_BARRI'], dfb['Latitude'], dfb['Longitude'])



In [10]:
barcelona_venues.head()


In [11]:
barcelona_venues.shape


### Grouping by Venue Categories


In [12]:
barcelona_venues.groupby('Neighbourhood').count()


In [13]:
print('There are {} unique categories.'.format(len(barcelona_venues['Venue Category'].unique())))

### Analyze each Neighbourhood


In [14]:
# one hot encoding
barcelona_onehot = pd.get_dummies(barcelona_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
barcelona_onehot['Neighbourhood'] = barcelona_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [barcelona_onehot.columns[-1]] + list(barcelona_onehot.columns[:-1])
barcelona_onehot = barcelona_onehot[fixed_columns]

barcelona_onehot.head()


We group the rows by neighbourhood and take the mean of each one-hot column, which gives the relative frequency of every venue category in each neighbourhood.
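
The toy example below shows why averaging one-hot rows yields category frequencies. It is a plain-Python sketch on invented data, written without pandas so that the arithmetic is visible; the notebook itself does the same thing with `pd.get_dummies` followed by `groupby(...).mean()`.

```python
from collections import defaultdict

# Toy (neighbourhood, venue category) pairs, invented for illustration.
venues = [
    ('A', 'Bar'), ('A', 'Bar'), ('A', 'Park'), ('A', 'Museum'),
    ('B', 'Park'), ('B', 'Park'),
]

categories = sorted({cat for _, cat in venues})


def one_hot(category):
    """Return a 0/1 vector with a single 1 at the position of `category`."""
    return [1 if category == c else 0 for c in categories]


# Group the one-hot rows by neighbourhood, then average each column.
rows_by_hood = defaultdict(list)
for hood, cat in venues:
    rows_by_hood[hood].append(one_hot(cat))

grouped = {
    hood: [sum(col) / len(rows) for col in zip(*rows)]
    for hood, rows in rows_by_hood.items()
}

for hood, freqs in grouped.items():
    print(hood, dict(zip(categories, freqs)))
```

Neighbourhood A has two bars among four venues, so its "Bar" value is 0.5. Each row of the grouped table sums to 1, which makes neighbourhoods with different venue counts comparable.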



In [15]:
barcelona_grouped = barcelona_onehot.groupby('Neighbourhood').mean().reset_index()
barcelona_grouped


We print the 10 most frequent venue categories for each neighbourhood.
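
Ranking the categories of one neighbourhood by frequency can be sketched as below. The frequencies are invented and `top_categories` is a hypothetical helper; the notebook performs the equivalent ranking with `sort_values` on a dataframe row.

```python
def top_categories(freqs, n=10):
    """Return the names of the `n` categories with the highest frequency."""
    ranked = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:n]]


# Invented frequencies for a single neighbourhood.
sample_freqs = {'Bar': 0.40, 'Park': 0.10, 'Museum': 0.30, 'Bakery': 0.20}
top3 = top_categories(sample_freqs, n=3)
print(top3)
```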

In [16]:
num_top_venues = 10

for hood in barcelona_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = barcelona_grouped[barcelona_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')


In [17]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

### Top venue categories in Barcelona

In [18]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # only the first three ordinals have a special suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = barcelona_grouped['Neighbourhood']

for ind in np.arange(barcelona_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(barcelona_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()


## Model Building

K-Means: clustering the neighbourhoods of Barcelona into 5 clusters
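
For intuition about what the clustering step does, here is a from-scratch sketch of Lloyd's algorithm on invented two-category frequency vectors. It is not scikit-learn's implementation: the notebook uses `sklearn.cluster.KMeans`, which by default seeds centroids with k-means++, whereas this sketch simply seeds them with the first `k` points. The function names are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # assignment step: each point joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_point(members)
    return labels, centroids


# Invented (Bar, Park) frequencies for four neighbourhoods.
points = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
labels, centroids = kmeans(points, k=2)
print(labels)
```

The bar-heavy neighbourhoods (first and third) end up in one cluster and the park-heavy ones in the other, which is the kind of grouping the notebook looks for across all venue categories.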

In [19]:
# number of clusters
kclusters = 5

barcelona_grouped_clustering = barcelona_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(barcelona_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

We create a new dataframe that includes the cluster label as well as the top 10 venue categories for each neighbourhood.


In [20]:
# add clustering labels
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

barcelona_merged = dfb

barcelona_merged = barcelona_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='NOM_BARRI')

barcelona_merged.head() # check the last columns!


Drop the neighbourhoods that have no cluster label, that is, those with no matching row in the venue table.



In [21]:
barcelona_merged.dropna(subset=['Cluster Labels'], inplace=True)
barcelona_merged.isnull().sum()


## Visualizing Barcelona with clustered neighbourhoods

In [22]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11.4)

# set color scheme for the clusters: one evenly spaced rainbow colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(barcelona_merged['Latitude'], barcelona_merged['Longitude'], barcelona_merged['NOM_BARRI'], barcelona_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters


## Examining our Clusters

Cluster 1

In [23]:
barcelona_merged.loc[barcelona_merged['Cluster Labels'] == 0, barcelona_merged.columns[[1] + list(range(5, barcelona_merged.shape[1]))]]



Cluster 2

In [24]:
barcelona_merged.loc[barcelona_merged['Cluster Labels'] == 1, barcelona_merged.columns[[1] + list(range(5, barcelona_merged.shape[1]))]]



Cluster 3

In [25]:
barcelona_merged.loc[barcelona_merged['Cluster Labels'] == 2, barcelona_merged.columns[[1] + list(range(5, barcelona_merged.shape[1]))]]




Cluster 4

In [26]:
barcelona_merged.loc[barcelona_merged['Cluster Labels'] == 3, barcelona_merged.columns[[1] + list(range(5, barcelona_merged.shape[1]))]]




Cluster 5

In [27]:
barcelona_merged.loc[barcelona_merged['Cluster Labels'] == 4, barcelona_merged.columns[[1] + list(range(5, barcelona_merged.shape[1]))]]




***

# Exploring Madrid

### Data Collection

We begin by reading the CSV data and collecting the coordinates of the neighbourhoods of Madrid.

In [28]:
!wget -q -O 'madrid_data.csv' https://datos.madrid.es/egob/catalogo/200078-1-distritos-barrios.csv
print('Data downloaded!')

Data downloaded!


In [29]:
dfm = pd.read_csv('madrid_data.csv', sep=';', encoding='latin-1')
dfm.head()


### Feature Engineering 

Because the Madrid column names are long and contain spaces, we rename the required attributes to match the Barcelona dataset:
1. *Codigo de barrio*: Neighbourhood Code rename to: *CODI_BARRI*
2. *Codigo de distrito al que pertenece*: District code to which it belongs rename to: *CODI_DISTRICTE*
3. *Nombre de barrio*: Neighbourhood Name rename to: *NOM_BARRI*
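
The renaming can be illustrated on a tiny in-memory sample. This is a standard-library sketch with invented rows in the same semicolon-separated layout; the notebook applies the same mapping to the full dataframe with pandas' `rename(columns=...)`.

```python
import csv
import io

RENAME = {
    'Codigo de barrio': 'CODI_BARRI',
    'Codigo de distrito al que pertenece': 'CODI_DISTRICTE',
    'Nombre de barrio': 'NOM_BARRI',
}

# Invented two-row sample; the values are placeholders, not real Madrid data.
raw = (
    'Codigo de barrio;Codigo de distrito al que pertenece;Nombre de barrio\n'
    '11;1;Barrio Uno\n'
    '12;1;Barrio Dos\n'
)

reader = csv.DictReader(io.StringIO(raw), delimiter=';')
# Apply the mapping to every header; unknown headers are kept unchanged.
records = [
    {RENAME.get(col, col): value for col, value in row.items()}
    for row in reader
]
print(records[0])
```

Sharing column names across the two cities lets the same geocoding and venue functions run on both datasets without changes.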

In [None]:
dfm = dfm.rename(columns = {"Codigo de barrio":"CODI_BARRI", "Codigo de distrito al que pertenece":"CODI_DISTRICTE", "Nombre de barrio":"NOM_BARRI"})

dfm

In [None]:
dfm.keys()

In [None]:
def get_latlng(NOM_BARRI, max_tries=5):
    """Return [latitude, longitude] for a Madrid neighbourhood, or [None, None] if geocoding fails."""
    # retry a bounded number of times instead of looping forever on a failed lookup
    for _ in range(max_tries):
        g = geocoder.arcgis('{}, Madrid, Spain'.format(NOM_BARRI))
        if g.latlng is not None:
            return g.latlng
    return [None, None]

# Call the function to get the coordinates, store in a new list using list comprehension
coords = [get_latlng(NOM_BARRI) for NOM_BARRI in dfm["NOM_BARRI"].tolist()]

In [None]:
dfm_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
# Merge the coordinates into the original dataframe
dfm['Latitude'] = dfm_coords['Latitude']
dfm['Longitude'] = dfm_coords['Longitude']
print(dfm.shape)

In [None]:
dfm.head()

## Map of Madrid with all of the Neighbourhoods

We will use Nominatim (from geopy) to obtain the coordinates of the city first, and then the folium library to map the neighbourhoods.

In [None]:
address = 'Madrid, Spain'

geolocator = Nominatim(user_agent="madrid_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The coordinates of Madrid are {}, {}.'.format(latitude, longitude))

In [30]:
map_Madrid = folium.Map(location=[latitude, longitude], zoom_start=10.4)

# adding markers to map; lat/lng are used as loop names so the city-centre
# latitude/longitude stay available for the later cluster map
for lat, lng, NOM_BARRI in zip(dfm['Latitude'], dfm['Longitude'], dfm['NOM_BARRI']):
    label = '{}'.format(NOM_BARRI)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True
        ).add_to(map_Madrid)

map_Madrid

### Venues in Madrid


Defining a function to get the nearby venues in the neighbourhood.



In [31]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Collect the Foursquare venues within `radius` metres of each neighbourhood."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius
            )

        # make the GET request and keep the list of returned items
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighbourhood lists into a single dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood',
                  'Neighbourhood Latitude',
                  'Neighbourhood Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues

Getting all of the venues in Madrid

In [32]:
madrid_venues = getNearbyVenues(dfm['NOM_BARRI'], dfm['Latitude'], dfm['Longitude'])



In [33]:
madrid_venues.head()


In [34]:
madrid_venues.shape


### Grouping by Venue Categories


In [35]:
madrid_venues.groupby('Neighbourhood').count()


In [36]:
print('There are {} unique categories.'.format(len(madrid_venues['Venue Category'].unique())))

### Analyze each Neighbourhood


In [37]:
# one hot encoding
madrid_onehot = pd.get_dummies(madrid_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
madrid_onehot['Neighbourhood'] = madrid_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [madrid_onehot.columns[-1]] + list(madrid_onehot.columns[:-1])
madrid_onehot = madrid_onehot[fixed_columns]

madrid_onehot.head()


We group the rows by neighbourhood and take the mean of each one-hot column, which gives the relative frequency of every venue category in each neighbourhood.



In [38]:
madrid_grouped = madrid_onehot.groupby('Neighbourhood').mean().reset_index()
madrid_grouped


We print the 10 most frequent venue categories for each neighbourhood.

In [39]:
num_top_venues = 10

for hood in madrid_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = madrid_grouped[madrid_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')


In [40]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

### Top venue categories in Madrid

In [41]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # only the first three ordinals have a special suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = madrid_grouped['Neighbourhood']

for ind in np.arange(madrid_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(madrid_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()


## Model Building

K-Means: clustering the neighbourhoods of Madrid into 5 clusters

In [42]:
# number of clusters
kclusters = 5

madrid_grouped_clustering = madrid_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(madrid_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

We create a new dataframe that includes the cluster label as well as the top 10 venue categories for each neighbourhood.


In [43]:
# add clustering labels
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

madrid_merged = dfm

madrid_merged = madrid_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='NOM_BARRI')

madrid_merged.head() # check the last columns!


Drop the neighbourhoods that have no cluster label, that is, those with no matching row in the venue table.



In [44]:
madrid_merged.dropna(subset=['Cluster Labels'], inplace=True)
madrid_merged.isnull().sum()


## Visualizing Madrid with clustered neighbourhoods

In [45]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11.4)

# set color scheme for the clusters: one evenly spaced rainbow colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(madrid_merged['Latitude'], madrid_merged['Longitude'], madrid_merged['NOM_BARRI'], madrid_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters


## Examining our Clusters

Cluster 1

In [46]:
madrid_merged.loc[madrid_merged['Cluster Labels'] == 0, madrid_merged.columns[[1] + list(range(5, madrid_merged.shape[1]))]]



Cluster 2

In [47]:
madrid_merged.loc[madrid_merged['Cluster Labels'] == 1, madrid_merged.columns[[1] + list(range(5, madrid_merged.shape[1]))]]



Cluster 3

In [48]:
madrid_merged.loc[madrid_merged['Cluster Labels'] == 2, madrid_merged.columns[[1] + list(range(5, madrid_merged.shape[1]))]]



Cluster 4

In [49]:
madrid_merged.loc[madrid_merged['Cluster Labels'] == 3, madrid_merged.columns[[1] + list(range(5, madrid_merged.shape[1]))]]



Cluster 5

In [50]:
madrid_merged.loc[madrid_merged['Cluster Labels'] == 4, madrid_merged.columns[[1] + list(range(5, madrid_merged.shape[1]))]]



# Results and Discussion

The neighbourhoods of Barcelona and Madrid offer multicultural experiences that are similar in some respects and different in others.

Let's start with geography. Madrid is a bustling capital on the arid plains in the heart of Spain. It is the bigger of the two cities, with about 132 neighbourhoods to explore. Barcelona is relatively small, with 73 neighbourhoods to explore. It is a Mediterranean city and offers far more natural, picturesque scenery.

Both cities have historical sites and museums, although Barcelona has more extraordinary art and architecture to enjoy on the street, as in the Gothic Quarter, and it is a perfect combination of city and beach. Madrid, meanwhile, is a cosmopolitan city that offers many more possibilities in terms of entertainment and Spanish atmosphere.

In terms of food, both Barcelona and Madrid offer a wide variety of international cuisines. Barcelona offers more Catalan cuisine and is more vegetarian and vegan friendly, while Madrid offers more traditional Spanish food and a gourmet experience. Both have very good street markets, such as Barcelona's La Rambla and Madrid's Rastro.

If you are looking for a cosmopolitan, bustling city with lots of entertainment, Madrid will be the choice. If you are looking for a combination of city and beach or nature, with architecture and art on the street, then Barcelona will be the choice.

# Conclusion

The purpose of this project was to explore two of the biggest cities in Spain (Barcelona and Madrid) and see how attractive they are to potential tourists and migrants. We explored both cities by neighbourhood, extracted the most common venues in each neighbourhood, and finally clustered similar neighbourhoods together.

We can conclude that the neighbourhoods in both cities are culturally diverse and offer a wide variety of experiences, each unique in its own way.

Both cities (Barcelona and Madrid) seem to offer a pleasant holiday stay, with a lot of places to explore and a variety of unique activities to do. Overall, it is up to the individual to decide which experience they would prefer: a more cosmopolitan city with lots of entertainment possibilities (Madrid), or a combination of city and natural scenery with extraordinary art and architecture integrated into its streets (Barcelona).