# Capstone Project - The Battle of the Neighborhoods (Contd)

## Moving to Madrid

### Table of contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 2>

1. <a href="#item1">Introduction: Business Understanding</a>
2. <a href="#item2">Data</a>
3. <a href="#item3">Methodology</a>
4. <a href="#item4">Analysis</a>
5. <a href="#item5">Results and Discussion</a>    
6. <a href="#item6">Conclusion</a>  
</font>
</div>

## 1. Introduction: Business Understanding

When deciding which neighborhood to move to, several key aspects come into play. One, of course, is budget. Another is the crime rate. Another is transport connections, and the last is whether the neighborhood offers what matters most to the person who is moving. We are going to focus on the last one.

In this project, we are going to use data science and Foursquare to recommend to families interested in **moving to Madrid** which neighborhood(s) to choose, depending on the venues that matter most to them: schools, daycare centers, parks and grocery stores.

## 2. Data

We will get the data from several sources:
- Boroughs and neighborhoods in Madrid, from Wikipedia: https://es.wikipedia.org/wiki/Anexo:Barrios_administrativos_de_Madrid
- Latitude and longitude of each neighborhood, using the Geopy library
- High-crime and noisy boroughs (discarded because they are less suitable for families), from the City Council of Madrid: https://datos.madrid.es/egob/catalogo/212616-74-policia-estadisticas.xlsx
- Number of each desired venue and top 10 venues by neighborhood, from Foursquare data
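
The coordinates step could look roughly like the sketch below. It only illustrates how a coordinates table might be produced, not necessarily how the one used here was built: the helper name `geocode_neighborhoods` and the address format are assumptions, and the geocoder is passed in as a callable so the sketch runs without network access.

```python
def geocode_neighborhoods(pairs, geocode):
    """Geocode (borough, neighborhood) pairs into a list of row dicts.

    `geocode` is any callable that takes an address string and returns an
    object with `.latitude` and `.longitude`, or None when nothing is found.
    """
    rows = []
    for borough, neighborhood in pairs:
        # Assumed address format: "<neighborhood>, <borough>, Madrid"
        address = '{}, {}, Madrid'.format(neighborhood, borough)
        location = geocode(address)
        rows.append({
            'Borough': borough,
            'Neighborhood': neighborhood,
            'Address': address,
            'Latitude': location.latitude if location is not None else None,
            'Longitude': location.longitude if location is not None else None,
        })
    return rows


# With geopy installed and network access, the callable could be built as:
#   from geopy.geocoders import Nominatim
#   from geopy.extra.rate_limiter import RateLimiter
#   geolocator = Nominatim(user_agent="madrid_explorer")
#   geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
#   rows = geocode_neighborhoods([('Centro', 'Palacio')], geocode)
```

The resulting list of dicts can be passed straight to `pd.DataFrame(rows)`. Rate limiting matters here because public geocoding services usually restrict how often they can be queried.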

First, we load the necessary libraries.

In [1]:
pip install geocoder

Collecting geocoder
  Downloading https://files.pythonhosted.org/packages/4f/6b/13166c909ad2f2d76b929a4227c952630ebaf0d729f6317eb09cbceccbab/geocoder-1.38.1-py2.py3-none-any.whl (98kB)
Collecting click (from geocoder)
  Downloading https://files.pythonhosted.org/packages/d2/3d/fa76db83bf75c4f8d338c2fd15c8d33fdd7ad23a9b5e57eb6c5de26b430e/click-7.1.2-py2.py3-none-any.whl (82kB)
Collecting ratelim (from geocoder)
  Downloading https://files.pythonhosted.org/packages/f2/98/7e6d147fd16a10a5f821db6e25f192265d6ecca3d82957a4fdd592cad49c/ratelim-0.1.6-py2.py3-none-any.whl
Collecting future (from geocoder)
  Downloading https://files.pythonhosted.org/packages/45/0b/38b06fd9b92dc2b68d58b75f900e97884c45bedd2ff83203d933cf5851c9/future-0.18.2.tar.gz (829kB)
Building wheel

In [2]:
!conda install -c conda-forge geopy --yes 

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    geopy-1.22.0               |     pyh9f0ad1d_0          63 KB  conda-forge
    ------------------------------------------------------------
                                           Total:          97 KB

The following NEW packages will be INSTALLED:

  geographiclib      conda-forge/noarch::geographiclib-1.50-py_0
  geopy              conda-forge/noarch::geopy-1.22.0-pyh9f0ad1d_0



Downloading and Extracting Packages
geopy-1.22.0         | 63 KB     | ##################################### | 100% 
geographiclib-1.50   | 34 KB     | ###############################

In [3]:
import csv
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [4]:
from pandas import ExcelWriter
from pandas import ExcelFile

In [5]:
import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # install folium; comment this line out if it is already installed
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    altair-4.1.0               |             py_1         614 KB  conda-forge
    branca-0.4.1               |             py_0          26 KB  conda-forge
    brotlipy-0.7.0             |py36h8c4c3a4_1000         346 KB  conda-forge
    chardet-3.0.4              |py36h9f0ad1d_1006         188 KB  conda-forge
    cryptography-2.9.2         |   py36h45558ae_0         613 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    pandas-1.0.3               |   py36h83

Then, we load the data on Madrid's boroughs, neighborhoods and coordinates, which we previously collected from Wikipedia and prepared.

In [6]:
neighborhoods_Madrid=pd.read_csv('Madrid_data_coord.csv')
neighborhoods_Madrid.head()

Unnamed: 0.1,Unnamed: 0,Borough,Neighborhood,Address,Latitude,Longitude
0,0,Centro,Palacio,"Palacio, Centro, Madrid",40.40963,-3.87979
1,1,Centro,Embajadores,"Embajadores, Centro, Madrid",40.39107,-3.69273
2,2,Centro,Cortes,"Cortes, Centro, Madrid",40.41641,-3.69887
3,3,Centro,Justicia,"Justicia, Centro, Madrid",40.42446,-3.69672
4,4,Centro,Universidad,"Universidad, Centro, Madrid",40.42565,-3.70726


In [7]:
neighborhoods_Madrid = neighborhoods_Madrid.drop(columns='Unnamed: 0')

Next, we reduce the number of boroughs in this study by discarding those with high crime and noise levels, since they are less suitable for families. For that we use data from the City Council of Madrid: https://datos.madrid.es/egob/catalogo/212616-74-policia-estadisticas.xlsx

In [8]:
criminality_Madrid = pd.ExcelFile('https://datos.madrid.es/egob/catalogo/212616-74-policia-estadisticas.xlsx')
arrested_Madrid = pd.read_excel(criminality_Madrid, 'PERS. DETENIDAS X DISTRITOS')
alcohol_complaints_Madrid = pd.read_excel(criminality_Madrid, 'CONSUMO ALCOHOL')
businesses_complaints_Madrid = pd.read_excel(criminality_Madrid, 'LOCALES')

In [9]:
arrested_Madrid.rename(columns={'PERSONAS DETENIDAS E INVESTIGADAS':'Borough', 'Unnamed: 1':'Arrested people'}, 
                 inplace=True)
arrested_Madrid.drop(arrested_Madrid.index[[0,1]],inplace=True)
arrested_Madrid.sort_values(['Arrested people'], ascending=False).head(5)

Unnamed: 0,Borough,Arrested people
24,TOTAL,789
2,CENTRO,141
14,PUENTE DE VALLECAS,98
7,TETUÁN,59
5,SALAMANCA,50


So we are going to discard Centro and Puente de Vallecas.

In [10]:
alcohol_complaints_Madrid.rename(columns={'CONSUMO DE ALCOHOL EN VÍA PÚBLICA':'Borough', 'Unnamed: 1':'Above 18 years old', 'Unnamed: 2':'Under age'}, 
                 inplace=True)
alcohol_complaints_Madrid.drop(alcohol_complaints_Madrid.index[[0,1]],inplace=True)
alcohol_complaints_Madrid.sort_values(['Above 18 years old'], ascending=False).head(5)

Unnamed: 0,Borough,Above 18 years old,Under age
24,TOTAL,2940,52
2,CENTRO,729,1
13,USERA,426,1
8,CHAMBERÍ,270,14
7,TETUÁN,269,7


So we are discarding Usera, Chamberí and Tetuán as well.

In [11]:
businesses_complaints_Madrid.rename(columns={'INSPECCIONES Y ACTUACIONES EN LOCALES DE ESPECTÁCULOS':'Borough', 'Unnamed: 1':'Inspections', 'Unnamed: 2':'Reports'}, 
                 inplace=True)
businesses_complaints_Madrid.drop(businesses_complaints_Madrid.index[[0,1]],inplace=True)
businesses_complaints_Madrid.sort_values(['Reports'], ascending=False).head(5)

Unnamed: 0,Borough,Inspections,Reports
24,TOTAL,1452,2565
2,CENTRO,325,574
8,CHAMBERÍ,128,315
5,SALAMANCA,136,252
7,TETUÁN,74,165


So we are discarding Salamanca too. We are also going to discard the outlying boroughs: Fuencarral-El Pardo, Latina, Moncloa-Aravaca, Carabanchel, Moratalaz, Ciudad Lineal, Hortaleza, Villaverde, Vicálvaro, San Blas-Canillejas and Barajas. This narrows the boroughs of interest to four: Arganzuela, Retiro, Chamartín and Villa de Vallecas.
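
The selection above was made by reading the sorted tables. As a rough sketch of how the same step could be scripted, the hypothetical helper `top_boroughs` below ranks boroughs on one column while skipping the `TOTAL` row. The arrests figures are copied from the table shown earlier; the small `neighborhoods` frame is toy data for illustration.

```python
import pandas as pd


def top_boroughs(df, column, n):
    """Return the n boroughs with the highest `column`, skipping the TOTAL row."""
    rows = df[df['Borough'] != 'TOTAL'].copy()
    # Columns read from the spreadsheet may not be numeric yet.
    rows[column] = pd.to_numeric(rows[column])
    return rows.nlargest(n, column)['Borough'].tolist()


# Values copied from the arrests table shown earlier.
arrested = pd.DataFrame({
    'Borough': ['TOTAL', 'CENTRO', 'PUENTE DE VALLECAS', 'TETUÁN', 'SALAMANCA'],
    'Arrested people': [789, 141, 98, 59, 50],
})
discarded = set(top_boroughs(arrested, 'Arrested people', 2))

# The council data uses upper-case names, so compare in upper case.
neighborhoods = pd.DataFrame({
    'Borough': ['Centro', 'Arganzuela', 'Retiro'],
    'Neighborhood': ['Palacio', 'Imperial', 'Pacífico'],
})
kept = neighborhoods[~neighborhoods['Borough'].str.upper().isin(discarded)]
print(kept['Neighborhood'].tolist())
```

How many boroughs to drop from each table remains a judgment call; the helper only makes the ranking step repeatable.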

In [12]:
reduced_Madrid=neighborhoods_Madrid[neighborhoods_Madrid['Borough'].isin(['Arganzuela', 'Retiro', 'Chamartín', 'Villa de Vallecas']) ]

In [13]:
reduced_Madrid.shape

(22, 5)

In [14]:
reduced_Madrid.head()

Unnamed: 0,Borough,Neighborhood,Address,Latitude,Longitude
6,Arganzuela,Imperial,"Imperial, Arganzuela, Madrid",40.40833,-3.71865
7,Arganzuela,Acacias,"Acacias, Arganzuela, Madrid",40.40137,-3.70669
8,Arganzuela,Chopera,"Chopera, Arganzuela, Madrid",40.3935,-3.69845
9,Arganzuela,Legazpi,"Legazpi, Arganzuela, Madrid",40.38702,-3.6899
10,Arganzuela,Delicias,"Delicias, Arganzuela, Madrid",40.39613,-3.68946


In [15]:
reduced_Madrid2 = reduced_Madrid.reset_index()

In [17]:
reduced_Madrid2.head()

Unnamed: 0,index,Borough,Neighborhood,Address,Latitude,Longitude
0,6,Arganzuela,Imperial,"Imperial, Arganzuela, Madrid",40.40833,-3.71865
1,7,Arganzuela,Acacias,"Acacias, Arganzuela, Madrid",40.40137,-3.70669
2,8,Arganzuela,Chopera,"Chopera, Arganzuela, Madrid",40.3935,-3.69845
3,9,Arganzuela,Legazpi,"Legazpi, Arganzuela, Madrid",40.38702,-3.6899
4,10,Arganzuela,Delicias,"Delicias, Arganzuela, Madrid",40.39613,-3.68946


In [20]:
reduced_Madrid2 = reduced_Madrid2.drop(columns='index')
reduced_Madrid2.head()

Unnamed: 0,level_0,Borough,Neighborhood,Address,Latitude,Longitude
0,0,Arganzuela,Imperial,"Imperial, Arganzuela, Madrid",40.40833,-3.71865
1,1,Arganzuela,Acacias,"Acacias, Arganzuela, Madrid",40.40137,-3.70669
2,2,Arganzuela,Chopera,"Chopera, Arganzuela, Madrid",40.3935,-3.69845
3,3,Arganzuela,Legazpi,"Legazpi, Arganzuela, Madrid",40.38702,-3.6899
4,4,Arganzuela,Delicias,"Delicias, Arganzuela, Madrid",40.39613,-3.68946


In [21]:
reduced_Madrid2 = reduced_Madrid2.drop(columns='level_0')
reduced_Madrid2.head()

Unnamed: 0,Borough,Neighborhood,Address,Latitude,Longitude
0,Arganzuela,Imperial,"Imperial, Arganzuela, Madrid",40.40833,-3.71865
1,Arganzuela,Acacias,"Acacias, Arganzuela, Madrid",40.40137,-3.70669
2,Arganzuela,Chopera,"Chopera, Arganzuela, Madrid",40.3935,-3.69845
3,Arganzuela,Legazpi,"Legazpi, Arganzuela, Madrid",40.38702,-3.6899
4,Arganzuela,Delicias,"Delicias, Arganzuela, Madrid",40.39613,-3.68946


In [22]:
reduced_Madrid2.loc[0, 'Neighborhood']

'Imperial'

In [23]:
reduced_Madrid2.dtypes

Borough          object
Neighborhood     object
Address          object
Latitude        float64
Longitude       float64
dtype: object

This dataframe is the dataset we are going to use for the rest of the analysis.
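
The index clean-up above takes several cells: the index is reset and the leftover helper columns are dropped afterwards. As a possible shortcut, `reset_index(drop=True)` renumbers the rows without adding the old index as a column. A toy sketch, where the frame `subset` stands in for the filtered table:

```python
import pandas as pd

subset = pd.DataFrame(
    {'Borough': ['Arganzuela', 'Arganzuela'],
     'Neighborhood': ['Imperial', 'Acacias']},
    index=[6, 7],  # labels left over from the unfiltered table
)

# drop=True discards the old labels instead of storing them in a new column.
clean = subset.reset_index(drop=True)
print(clean)
```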

## 3. Methodology

### 3.1. Madrid Map

First, we can use Folium to situate Madrid (Spain) and the neighborhoods on a map, using the coordinates we calculated previously.

In [24]:
address = 'Madrid'

geolocator = Nominatim(user_agent="madrid_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Madrid are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Madrid are 40.4167047, -3.7035825.


In [29]:
# create map and display it
madrid_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# display the map of Madrid
madrid_map

In [27]:
madrid_map = folium.Map(location=[latitude, longitude], zoom_start=10, tiles='Stamen Terrain')
madrid_map

In [141]:
# create map of Madrid's neighborhoods using latitude and longitude values
map_madrid = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods_Madrid['Latitude'], neighborhoods_Madrid['Longitude'], neighborhoods_Madrid['Borough'], neighborhoods_Madrid['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=12,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_madrid)
    
map_madrid

40.4167047

In [30]:
# create map of Madrid's neighborhoods using latitude and longitude values
map_madrid = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, borough, neighborhood in zip(reduced_Madrid['Latitude'], reduced_Madrid['Longitude'], reduced_Madrid['Borough'], reduced_Madrid['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=12,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_madrid)
    
map_madrid

### 3.2. Foursquare

We use the Foursquare API to explore the neighborhoods and segment them.

First, we need to define the Foursquare credentials and API version.

In [31]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET


Let's explore the first neighborhood in our dataframe and get the top 100 venues within a radius of 500 meters of it.

We start with the neighborhood's latitude and longitude values.

In [32]:
neighborhood_latitude = reduced_Madrid2.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = reduced_Madrid2.loc[0, 'Longitude'] # neighborhood longitude value

In [33]:
neighborhood_name = reduced_Madrid2.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Imperial are 40.408330000000035, -3.718649999999968.


Next, we create the GET request URL.

In [34]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format( CLIENT_ID, CLIENT_SECRET, VERSION, neighborhood_latitude, neighborhood_longitude, radius, LIMIT)
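
As an alternative to formatting the query string by hand, the parameters can be kept in a dictionary and encoded with the standard library. The sketch below is an illustration only: the helper name `build_explore_url` and the placeholder credentials are assumptions, while the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Build a venues/explore URL with URL-encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)


# Placeholder credentials; the coordinates are those of the first neighborhood.
example_url = build_explore_url('ID', 'SECRET', '20180605', 40.40833, -3.71865)

# Parse the query back to check what would be sent.
query = parse_qs(urlsplit(example_url).query)
print(query['ll'], query['radius'], query['limit'])
```

Keeping the parameters in one dictionary makes it harder to mix up their order and takes care of escaping special characters.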

Then we send the GET request and examine the results.

In [35]:
results = requests.get(url).json()

All the information is in the items key. Before we proceed, let's define a function to get the category of the venue.

In [36]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a pandas dataframe.

In [37]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



Unnamed: 0,name,categories,lat,lng
0,Madrid Río (Sector Norte),Park,40.408791,-3.722992
1,Seoul,Korean Restaurant,40.411059,-3.71809
2,El Landó,Spanish Restaurant,40.4119,-3.715076
3,Parque de Atenas,Park,40.41133,-3.719384
4,El Camarote,Coffee Shop,40.40839,-3.716242


This gives us the venues that Foursquare returns for the neighborhood (at most `LIMIT`). Let's find out how many venues were returned.

In [38]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

25 venues were returned by Foursquare.


### 3.3. Venues in the city

Now we need to explore all the neighborhoods of Madrid that remain on our shortlist. Let's create a function that repeats the same process for each of them.

In [104]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

Now we write the code to run the above function on each neighborhood and create a new dataframe called madrid_venues.

In [105]:
madrid_venues = getNearbyVenues(names=reduced_Madrid2['Neighborhood'],
                                   latitudes=reduced_Madrid2['Latitude'],
                                   longitudes=reduced_Madrid2['Longitude']
                                  )

Imperial
Acacias
Chopera
Legazpi
Delicias
Palos de Moguer
Atocha
Pacífico
Adelfas
Estrella
Ibiza
Jerónimos
Niño Jesús
El Viso
Prosperidad
Ciudad Jardín
Hispanoamérica
Nueva España
Castilla
Casco Histórico de Vallecas
Santa Eugenia
Ensanche de Vallecas
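
The parsing step inside `getNearbyVenues` can also be written as a separate function, which makes it easy to try out without calling the API. The sketch below is one possible variant, not the function used above: the name `items_to_rows` and the sample items are hypothetical, and the items contain only the fields that `getNearbyVenues` reads. Unlike the original, it tolerates a venue with an empty category list.

```python
def items_to_rows(name, lat, lng, items):
    """Turn the `items` list of one explore response into flat tuples."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to None instead of failing when no category is listed.
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


# Hypothetical items, shaped like the fields read by getNearbyVenues.
sample_items = [
    {'venue': {'name': 'Parque de Atenas',
               'location': {'lat': 40.41133, 'lng': -3.719384},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Venue without category',
               'location': {'lat': 40.41, 'lng': -3.72},
               'categories': []}},
]
rows = items_to_rows('Imperial', 40.40833, -3.71865, sample_items)
print(rows)
```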


Let's check the size of the resulting dataframe:

In [106]:
print(madrid_venues.shape)
madrid_venues.head()

(757, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Imperial,40.40833,-3.71865,Madrid Río (Sector Norte),40.408791,-3.722992,Park
1,Imperial,40.40833,-3.71865,Seoul,40.411059,-3.71809,Korean Restaurant
2,Imperial,40.40833,-3.71865,El Landó,40.4119,-3.715076,Spanish Restaurant
3,Imperial,40.40833,-3.71865,Parque de Atenas,40.41133,-3.719384,Park
4,Imperial,40.40833,-3.71865,El Camarote,40.40839,-3.716242,Coffee Shop


Let's check how many venues were returned for each neighborhood:

In [107]:
madrid_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Acacias,52,52,52,52,52,52
Adelfas,50,50,50,50,50,50
Atocha,54,54,54,54,54,54
Casco Histórico de Vallecas,3,3,3,3,3,3
Castilla,2,2,2,2,2,2
Chopera,46,46,46,46,46,46
Ciudad Jardín,35,35,35,35,35,35
Delicias,29,29,29,29,29,29
El Viso,16,16,16,16,16,16
Ensanche de Vallecas,11,11,11,11,11,11


Let's find out how many unique categories there are among the returned venues:

In [108]:
print('There are {} unique categories.'.format(len(madrid_venues['Venue Category'].unique())))

There are 143 unique categories.


We one-hot encode the venue categories: each row is one venue, with a 1 in the column of its category and 0 in every other column.

In [109]:
# one hot encoding
madrid_onehot = pd.get_dummies(madrid_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
madrid_onehot['Neighborhood'] = madrid_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [madrid_onehot.columns[-1]] + list(madrid_onehot.columns[:-1])
madrid_onehot = madrid_onehot[fixed_columns]

madrid_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Beer Bar,Beer Garden,Big Box Store,Bistro,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Burger Joint,Café,Candy Store,Chinese Restaurant,Church,City Hall,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Eastern European Restaurant,Electronics Store,Event Space,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish Market,Flea Market,Food & Drink Shop,Food Truck,Football Stadium,Fountain,Fried Chicken Joint,Furniture / Home Store,Garden,Gastropub,Gay Bar,General Entertainment,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Gymnastics Gym,Health & Beauty Service,History Museum,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Italian Restaurant,Japanese Restaurant,Jazz Club,Karaoke Bar,Korean Restaurant,Lake,Latin American Restaurant,Market,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Molecular Gastronomy Restaurant,Monument / Landmark,Movie Theater,Museum,Music Venue,Nightclub,Other Great Outdoors,Other Nightlife,Outdoors & Recreation,Paella Restaurant,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pizza Place,Platform,Playground,Plaza,Polish Restaurant,Pool,Portuguese Restaurant,Pub,Public Art,Restaurant,Road,Rock Club,Roof Deck,Salvadoran Restaurant,Sandwich Place,Scenic Lookout,Science Museum,Seafood Restaurant,Shopping Mall,Skate Park,Snack Place,Soccer Field,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Club,Steakhouse,Supermarket,Sushi Restaurant,Tailor Shop,Tapas Restaurant,Tattoo Parlor,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Trade School,Train,Train Station,Used Bookstore,Vegetarian / 
Vegan Restaurant,Wine Bar
0,Imperial,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Imperial,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Imperial,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Imperial,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Imperial,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
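
To make the encoding easier to read than the 144-column table, here is a small sketch using only the categories of the first five Imperial venues listed earlier. The `.astype(int)` call is an addition so that the result shows 0/1 regardless of the pandas version.

```python
import pandas as pd

# Categories of the first five Imperial venues shown earlier.
venues = pd.DataFrame({
    'Neighborhood': ['Imperial'] * 5,
    'Venue Category': ['Park', 'Korean Restaurant', 'Spanish Restaurant',
                       'Park', 'Coffee Shop'],
})

# One column per category; each row has a single 1 for its own category.
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='').astype(int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])
print(onehot)
```

Each row sums to 1 across the category columns, because a venue is assigned exactly one category here.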


Let's examine the new dataframe size.

In [110]:
madrid_onehot.shape

(757, 144)

Let's save it to csv.

In [111]:
madrid_onehot.to_csv('onehot_Madrid.csv')

Next, let's group the rows by neighborhood and take the mean of each category column, which gives the frequency of occurrence of each category in the neighborhood.

In [112]:
madrid_grouped = madrid_onehot.groupby('Neighborhood').mean().reset_index()
madrid_grouped

Unnamed: 0,Neighborhood,American Restaurant,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Beer Bar,Beer Garden,Big Box Store,Bistro,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Burger Joint,Café,Candy Store,Chinese Restaurant,Church,City Hall,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Eastern European Restaurant,Electronics Store,Event Space,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish Market,Flea Market,Food & Drink Shop,Food Truck,Football Stadium,Fountain,Fried Chicken Joint,Furniture / Home Store,Garden,Gastropub,Gay Bar,General Entertainment,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Gymnastics Gym,Health & Beauty Service,History Museum,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Italian Restaurant,Japanese Restaurant,Jazz Club,Karaoke Bar,Korean Restaurant,Lake,Latin American Restaurant,Market,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Molecular Gastronomy Restaurant,Monument / Landmark,Movie Theater,Museum,Music Venue,Nightclub,Other Great Outdoors,Other Nightlife,Outdoors & Recreation,Paella Restaurant,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pizza Place,Platform,Playground,Plaza,Polish Restaurant,Pool,Portuguese Restaurant,Pub,Public Art,Restaurant,Road,Rock Club,Roof Deck,Salvadoran Restaurant,Sandwich Place,Scenic Lookout,Science Museum,Seafood Restaurant,Shopping Mall,Skate Park,Snack Place,Soccer Field,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Club,Steakhouse,Supermarket,Sushi Restaurant,Tailor Shop,Tapas Restaurant,Tattoo Parlor,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Trade School,Train,Train Station,Used Bookstore,Vegetarian / 
Vegan Restaurant,Wine Bar
0,Acacias,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.057692,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.019231,0.0,0.019231,0.038462,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.038462,0.0,0.0,0.0,0.0,0.019231,0.0,0.019231,0.0,0.019231,0.038462,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.038462,0.0,0.019231,0.0,0.0,0.057692,0.0,0.019231,0.0,0.0,0.019231,0.019231,0.038462,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.019231,0.0,0.0,0.115385,0.0,0.0,0.0,0.057692,0.019231,0.0,0.057692,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.019231,0.0,0.0,0.0
1,Adelfas,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.04,0.08,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.06,0.02,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.04,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.06,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.06,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0
2,Atocha,0.018519,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.092593,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.018519,0.0,0.018519,0.0,0.0,0.037037,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.018519,0.018519,0.0,0.0,0.0,0.0,0.018519,0.018519,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.018519,0.018519,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.018519,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.092593,0.0,0.0,0.0,0.0,0.018519,0.0,0.166667,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.055556,0.0
3,Casco Histórico de Vallecas,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Castilla,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Chopera,0.0,0.021739,0.0,0.021739,0.043478,0.021739,0.0,0.0,0.021739,0.021739,0.021739,0.0,0.043478,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.021739,0.0,0.021739,0.0,0.0,0.043478,0.0,0.065217,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.021739,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.021739,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.021739,0.0,0.043478,0.0,0.0,0.0,0.021739,0.0,0.0,0.021739,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.043478,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.021739,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Ciudad Jardín,0.0,0.0,0.0,0.0,0.0,0.0,0.057143,0.0,0.0,0.057143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.057143,0.057143,0.0,0.0,0.0,0.0,0.0,0.0,0.057143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.028571,0.0,0.0,0.028571,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.028571,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.028571,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.114286,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.028571,0.0,0.0,0.057143,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0
7,Delicias,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.034483,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.068966,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.034483,0.0,0.103448,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.068966,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.034483,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0
8,El Viso,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Ensanche de Vallecas,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Let's check the new size:

In [113]:
madrid_grouped.shape

(22, 144)

Let's put the most common venues of each neighborhood into a pandas dataframe. First, let's write a function that sorts the venue categories of a row by frequency, in descending order.

In [114]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
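
To make the behavior of this function concrete, here is a minimal sketch on a toy row. The neighborhood name `Toyville` and the three frequencies are made up for illustration; only the function body comes from the cell above.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighborhood name) and rank the remaining categories.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical row: one label followed by venue-category frequencies.
toy_row = pd.Series({"Neighborhood": "Toyville", "Bar": 0.2, "Park": 0.5, "Bakery": 0.3})

top_two = list(return_most_common_venues(toy_row, 2))
print(top_two)  # ['Park', 'Bakery']
```

Note that the function always returns `num_top_venues` names, so a neighborhood with very few venues will have its list padded with categories whose frequency is 0.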

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [115]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = madrid_grouped['Neighborhood']

for ind in np.arange(madrid_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(madrid_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Acacias,Spanish Restaurant,Pizza Place,Tapas Restaurant,Bar,Supermarket,Café,Pub,Park,Gym,Gym / Fitness Center
1,Adelfas,Bar,Breakfast Spot,Grocery Store,Spanish Restaurant,Bakery,Gym,Food & Drink Shop,Pizza Place,Tapas Restaurant,Supermarket
2,Atocha,Tapas Restaurant,Spanish Restaurant,Bar,Café,Vegetarian / Vegan Restaurant,Restaurant,Cocktail Bar,Plaza,Flea Market,Church
3,Casco Histórico de Vallecas,Bakery,Pizza Place,Scenic Lookout,Wine Bar,Fountain,Football Stadium,Food Truck,Food & Drink Shop,Flea Market,Fish Market
4,Castilla,Restaurant,Tailor Shop,Wine Bar,Fast Food Restaurant,Fountain,Football Stadium,Food Truck,Food & Drink Shop,Flea Market,Fish Market
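
The column labels above rely on a three-element suffix list plus an exception for every later position. As an alternative sketch (not the notebook's method), the suffix can be computed directly. The helper name `ordinal` is hypothetical.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th and 13th are irregular; the rest of the teens also take 'th'.
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top_venues = 10
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns[1], "|", columns[-1])  # 1st Most Common Venue | 10th Most Common Venue
```

This produces the same ten labels as the loop in the notebook and stays correct if `num_top_venues` grows beyond 20.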


In [116]:
neighborhoods_venues_sorted.to_csv('neighborhoods_venues_sorted_Madrid.csv')

### 3.4. Cluster Neighborhoods

We run k-means to group the neighborhoods into 4 clusters.

In [117]:
# set number of clusters
kclusters = 4

madrid_grouped_clustering = madrid_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(madrid_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 0, 2, 1, 1, 1, 1, 1], dtype=int32)
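
Before interpreting the clusters, it can help to check how many neighborhoods fall into each one. The sketch below counts only the ten labels printed above; in the notebook the same call could be applied to the full `kmeans.labels_` array.

```python
from collections import Counter

# The first ten labels, copied from the output above.
first_ten_labels = [1, 1, 1, 0, 2, 1, 1, 1, 1, 1]

sizes = Counter(first_ten_labels)
print(sorted(sizes.items()))  # [(0, 1), (1, 8), (2, 1)]
```

A very uneven split like this one suggests that most neighborhoods look alike in terms of venue categories and that the small clusters are outliers.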

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [118]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [119]:
madrid_merged = reduced_Madrid2

# join reduced_Madrid2 with neighborhoods_venues_sorted to add the cluster label and top venues to each neighborhood
madrid_merged = madrid_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

madrid_merged.head()

Unnamed: 0,Borough,Neighborhood,Address,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Arganzuela,Imperial,"Imperial, Arganzuela, Madrid",40.40833,-3.71865,1,Spanish Restaurant,Park,Hotel,Gym / Fitness Center,Gym,Japanese Restaurant,Grocery Store,Korean Restaurant,Garden,Pizza Place
1,Arganzuela,Acacias,"Acacias, Arganzuela, Madrid",40.40137,-3.70669,1,Spanish Restaurant,Pizza Place,Tapas Restaurant,Bar,Supermarket,Café,Pub,Park,Gym,Gym / Fitness Center
2,Arganzuela,Chopera,"Chopera, Arganzuela, Madrid",40.3935,-3.69845,1,Coffee Shop,Burger Joint,Plaza,Italian Restaurant,Spanish Restaurant,Grocery Store,Clothing Store,Beer Garden,Tapas Restaurant,Art Gallery
3,Arganzuela,Legazpi,"Legazpi, Arganzuela, Madrid",40.38702,-3.6899,1,Supermarket,Bar,Café,Spanish Restaurant,Mexican Restaurant,Tapas Restaurant,Coffee Shop,Restaurant,Bistro,General Entertainment
4,Arganzuela,Delicias,"Delicias, Arganzuela, Madrid",40.39613,-3.68946,1,Restaurant,Grocery Store,Mediterranean Restaurant,Snack Place,Plaza,Spanish Restaurant,Farmers Market,Museum,Coffee Shop,Pub


Finally, let's visualize the resulting clusters. Before drawing the map, we check the merged dataframe for missing values.

In [120]:
madrid_merged.dropna()

Unnamed: 0,Borough,Neighborhood,Address,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Arganzuela,Imperial,"Imperial, Arganzuela, Madrid",40.40833,-3.71865,1,Spanish Restaurant,Park,Hotel,Gym / Fitness Center,Gym,Japanese Restaurant,Grocery Store,Korean Restaurant,Garden,Pizza Place
1,Arganzuela,Acacias,"Acacias, Arganzuela, Madrid",40.40137,-3.70669,1,Spanish Restaurant,Pizza Place,Tapas Restaurant,Bar,Supermarket,Café,Pub,Park,Gym,Gym / Fitness Center
2,Arganzuela,Chopera,"Chopera, Arganzuela, Madrid",40.3935,-3.69845,1,Coffee Shop,Burger Joint,Plaza,Italian Restaurant,Spanish Restaurant,Grocery Store,Clothing Store,Beer Garden,Tapas Restaurant,Art Gallery
3,Arganzuela,Legazpi,"Legazpi, Arganzuela, Madrid",40.38702,-3.6899,1,Supermarket,Bar,Café,Spanish Restaurant,Mexican Restaurant,Tapas Restaurant,Coffee Shop,Restaurant,Bistro,General Entertainment
4,Arganzuela,Delicias,"Delicias, Arganzuela, Madrid",40.39613,-3.68946,1,Restaurant,Grocery Store,Mediterranean Restaurant,Snack Place,Plaza,Spanish Restaurant,Farmers Market,Museum,Coffee Shop,Pub
5,Arganzuela,Palos de Moguer,"Palos de Moguer, Arganzuela, Madrid",40.40301,-3.69358,1,Restaurant,Spanish Restaurant,Bakery,Tapas Restaurant,Coffee Shop,Grocery Store,Pizza Place,Platform,Chinese Restaurant,Hotel
6,Arganzuela,Atocha,"Atocha, Arganzuela, Madrid",40.40879,-3.71011,1,Tapas Restaurant,Spanish Restaurant,Bar,Café,Vegetarian / Vegan Restaurant,Restaurant,Cocktail Bar,Plaza,Flea Market,Church
7,Retiro,Pacífico,"Pacífico, Retiro, Madrid",40.40191,-3.67603,1,Spanish Restaurant,Grocery Store,Bar,Bakery,Asian Restaurant,Food & Drink Shop,Tapas Restaurant,Pizza Place,Café,Restaurant
8,Retiro,Adelfas,"Adelfas, Retiro, Madrid",40.40173,-3.67288,1,Bar,Breakfast Spot,Grocery Store,Spanish Restaurant,Bakery,Gym,Food & Drink Shop,Pizza Place,Tapas Restaurant,Supermarket
9,Retiro,Estrella,"Estrella, Retiro, Madrid",40.41117,-3.66593,1,Coffee Shop,Asian Restaurant,Bar,Spanish Restaurant,Plaza,Sports Club,Gym,Italian Restaurant,Jazz Club,Grocery Store


In [121]:
madrid_merged.dropna(axis = 1, how = 'all')

Unnamed: 0,Borough,Neighborhood,Address,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Arganzuela,Imperial,"Imperial, Arganzuela, Madrid",40.40833,-3.71865,1,Spanish Restaurant,Park,Hotel,Gym / Fitness Center,Gym,Japanese Restaurant,Grocery Store,Korean Restaurant,Garden,Pizza Place
1,Arganzuela,Acacias,"Acacias, Arganzuela, Madrid",40.40137,-3.70669,1,Spanish Restaurant,Pizza Place,Tapas Restaurant,Bar,Supermarket,Café,Pub,Park,Gym,Gym / Fitness Center
2,Arganzuela,Chopera,"Chopera, Arganzuela, Madrid",40.3935,-3.69845,1,Coffee Shop,Burger Joint,Plaza,Italian Restaurant,Spanish Restaurant,Grocery Store,Clothing Store,Beer Garden,Tapas Restaurant,Art Gallery
3,Arganzuela,Legazpi,"Legazpi, Arganzuela, Madrid",40.38702,-3.6899,1,Supermarket,Bar,Café,Spanish Restaurant,Mexican Restaurant,Tapas Restaurant,Coffee Shop,Restaurant,Bistro,General Entertainment
4,Arganzuela,Delicias,"Delicias, Arganzuela, Madrid",40.39613,-3.68946,1,Restaurant,Grocery Store,Mediterranean Restaurant,Snack Place,Plaza,Spanish Restaurant,Farmers Market,Museum,Coffee Shop,Pub
5,Arganzuela,Palos de Moguer,"Palos de Moguer, Arganzuela, Madrid",40.40301,-3.69358,1,Restaurant,Spanish Restaurant,Bakery,Tapas Restaurant,Coffee Shop,Grocery Store,Pizza Place,Platform,Chinese Restaurant,Hotel
6,Arganzuela,Atocha,"Atocha, Arganzuela, Madrid",40.40879,-3.71011,1,Tapas Restaurant,Spanish Restaurant,Bar,Café,Vegetarian / Vegan Restaurant,Restaurant,Cocktail Bar,Plaza,Flea Market,Church
7,Retiro,Pacífico,"Pacífico, Retiro, Madrid",40.40191,-3.67603,1,Spanish Restaurant,Grocery Store,Bar,Bakery,Asian Restaurant,Food & Drink Shop,Tapas Restaurant,Pizza Place,Café,Restaurant
8,Retiro,Adelfas,"Adelfas, Retiro, Madrid",40.40173,-3.67288,1,Bar,Breakfast Spot,Grocery Store,Spanish Restaurant,Bakery,Gym,Food & Drink Shop,Pizza Place,Tapas Restaurant,Supermarket
9,Retiro,Estrella,"Estrella, Retiro, Madrid",40.41117,-3.66593,1,Coffee Shop,Asian Restaurant,Bar,Spanish Restaurant,Plaza,Sports Club,Gym,Italian Restaurant,Jazz Club,Grocery Store


In [122]:
madrid_merged = madrid_merged.fillna(0)

In [123]:
madrid_merged['Cluster Labels'] = madrid_merged['Cluster Labels'].astype(int)
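
One caveat about filling missing values with 0: a neighborhood that received no cluster label would silently be assigned to cluster 0. The toy sketch below (made-up neighborhoods `A`, `B`, `C`) contrasts that with dropping unlabeled rows. Whether any row of `madrid_merged` is actually affected is not shown in the outputs above, so treat this as an illustration only.

```python
import pandas as pd

# Hypothetical merged frame: neighborhood B has no cluster label.
toy = pd.DataFrame({
    "Neighborhood": ["A", "B", "C"],
    "Cluster Labels": [1.0, None, 0.0],
})

# Option used in the notebook: fill with 0, so B ends up in cluster 0.
filled = toy.fillna(0)
filled["Cluster Labels"] = filled["Cluster Labels"].astype(int)

# Alternative: keep only the rows that really were clustered.
dropped = toy.dropna(subset=["Cluster Labels"]).copy()
dropped["Cluster Labels"] = dropped["Cluster Labels"].astype(int)

print(filled["Cluster Labels"].tolist())  # [1, 0, 0]
print(dropped["Neighborhood"].tolist())   # ['A', 'C']
```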

In [124]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(madrid_merged['Latitude'], madrid_merged['Longitude'], madrid_merged['Neighborhood'], madrid_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
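
In the map cell, the color is looked up with `rainbow[cluster-1]`, so label 0 wraps around to the last color. That still gives four distinct colors, but indexing by the label itself is easier to follow. The sketch below builds a palette with the standard-library `colorsys` module instead of matplotlib. The helper `cluster_palette` and the saturation and brightness values are assumptions made for illustration.

```python
import colorsys

def cluster_palette(n_clusters):
    """Return n_clusters hex colors with evenly spaced hues."""
    palette = []
    for i in range(n_clusters):
        # Spread hues evenly around the color wheel; 0.8 and 0.9 are arbitrary choices.
        r, g, b = colorsys.hsv_to_rgb(i / n_clusters, 0.8, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(int(r * 255), int(g * 255), int(b * 255)))
    return palette

palette = cluster_palette(4)

# Index directly by cluster label (0..3); no "- 1" offset is needed.
for cluster in range(4):
    print(cluster, palette[cluster])
```

Each hex string can be passed as the `color` and `fill_color` arguments of the markers.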

In [125]:
madrid_merged.head()

Unnamed: 0,Borough,Neighborhood,Address,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Arganzuela,Imperial,"Imperial, Arganzuela, Madrid",40.40833,-3.71865,1,Spanish Restaurant,Park,Hotel,Gym / Fitness Center,Gym,Japanese Restaurant,Grocery Store,Korean Restaurant,Garden,Pizza Place
1,Arganzuela,Acacias,"Acacias, Arganzuela, Madrid",40.40137,-3.70669,1,Spanish Restaurant,Pizza Place,Tapas Restaurant,Bar,Supermarket,Café,Pub,Park,Gym,Gym / Fitness Center
2,Arganzuela,Chopera,"Chopera, Arganzuela, Madrid",40.3935,-3.69845,1,Coffee Shop,Burger Joint,Plaza,Italian Restaurant,Spanish Restaurant,Grocery Store,Clothing Store,Beer Garden,Tapas Restaurant,Art Gallery
3,Arganzuela,Legazpi,"Legazpi, Arganzuela, Madrid",40.38702,-3.6899,1,Supermarket,Bar,Café,Spanish Restaurant,Mexican Restaurant,Tapas Restaurant,Coffee Shop,Restaurant,Bistro,General Entertainment
4,Arganzuela,Delicias,"Delicias, Arganzuela, Madrid",40.39613,-3.68946,1,Restaurant,Grocery Store,Mediterranean Restaurant,Snack Place,Plaza,Spanish Restaurant,Farmers Market,Museum,Coffee Shop,Pub


## 5. Results and Discussion

Now, we can examine each cluster and determine the discriminating venue categories that distinguish each of them. Based on the defining categories, we can then assign a name to each cluster.
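
As a complement to reading each table by eye, the venue columns can be tallied per cluster to see which categories dominate. This is a sketch on a made-up three-row frame with only two venue columns; the column names follow the notebook, everything else is hypothetical.

```python
import pandas as pd

# Hypothetical miniature of madrid_merged.
toy = pd.DataFrame({
    "Neighborhood": ["A", "B", "C"],
    "Cluster Labels": [1, 1, 0],
    "1st Most Common Venue": ["Spanish Restaurant", "Bar", "Bakery"],
    "2nd Most Common Venue": ["Park", "Spanish Restaurant", "Pizza Place"],
})

venue_cols = [c for c in toy.columns if c.endswith("Most Common Venue")]

# Reshape to one row per (cluster, venue) pair, then count venues within each cluster.
long_form = toy.melt(id_vars=["Cluster Labels"], value_vars=venue_cols, value_name="Venue")
counts = long_form.groupby("Cluster Labels")["Venue"].value_counts()
print(counts)
```

Categories with the highest counts in a cluster are natural candidates for its name.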

Cluster 1

In [126]:
madrid_merged.loc[madrid_merged['Cluster Labels'] == 0, madrid_merged.columns[[1] + list(range(5, madrid_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,Casco Histórico de Vallecas,0,Bakery,Pizza Place,Scenic Lookout,Wine Bar,Fountain,Football Stadium,Food Truck,Food & Drink Shop,Flea Market,Fish Market


This neighborhood is dominated by food shops, with a few outdoor venues.

Cluster 2

In [127]:
madrid_merged.loc[madrid_merged['Cluster Labels'] == 1, madrid_merged.columns[[1] + list(range(5, madrid_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Imperial,1,Spanish Restaurant,Park,Hotel,Gym / Fitness Center,Gym,Japanese Restaurant,Grocery Store,Korean Restaurant,Garden,Pizza Place
1,Acacias,1,Spanish Restaurant,Pizza Place,Tapas Restaurant,Bar,Supermarket,Café,Pub,Park,Gym,Gym / Fitness Center
2,Chopera,1,Coffee Shop,Burger Joint,Plaza,Italian Restaurant,Spanish Restaurant,Grocery Store,Clothing Store,Beer Garden,Tapas Restaurant,Art Gallery
3,Legazpi,1,Supermarket,Bar,Café,Spanish Restaurant,Mexican Restaurant,Tapas Restaurant,Coffee Shop,Restaurant,Bistro,General Entertainment
4,Delicias,1,Restaurant,Grocery Store,Mediterranean Restaurant,Snack Place,Plaza,Spanish Restaurant,Farmers Market,Museum,Coffee Shop,Pub
5,Palos de Moguer,1,Restaurant,Spanish Restaurant,Bakery,Tapas Restaurant,Coffee Shop,Grocery Store,Pizza Place,Platform,Chinese Restaurant,Hotel
6,Atocha,1,Tapas Restaurant,Spanish Restaurant,Bar,Café,Vegetarian / Vegan Restaurant,Restaurant,Cocktail Bar,Plaza,Flea Market,Church
7,Pacífico,1,Spanish Restaurant,Grocery Store,Bar,Bakery,Asian Restaurant,Food & Drink Shop,Tapas Restaurant,Pizza Place,Café,Restaurant
8,Adelfas,1,Bar,Breakfast Spot,Grocery Store,Spanish Restaurant,Bakery,Gym,Food & Drink Shop,Pizza Place,Tapas Restaurant,Supermarket
9,Estrella,1,Coffee Shop,Asian Restaurant,Bar,Spanish Restaurant,Plaza,Sports Club,Gym,Italian Restaurant,Jazz Club,Grocery Store


Cluster 3 - Markets and restaurants

In [128]:
madrid_merged.loc[madrid_merged['Cluster Labels'] == 2, madrid_merged.columns[[1] + list(range(5, madrid_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,Castilla,2,Restaurant,Tailor Shop,Wine Bar,Fast Food Restaurant,Fountain,Football Stadium,Food Truck,Food & Drink Shop,Flea Market,Fish Market


In this neighborhood most venues are markets and restaurants, so it is not a good fit for our purpose.

Cluster 4 - Other businesses

In [129]:
madrid_merged.loc[madrid_merged['Cluster Labels'] == 3, madrid_merged.columns[[1] + list(range(5, madrid_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
20,Santa Eugenia,3,Department Store,Platform,Gym,Metro Station,Fish Market,Fountain,Football Stadium,Food Truck,Food & Drink Shop,Flea Market


In this neighborhood there are gyms and other types of businesses unrelated to our purpose.

Unfortunately, Foursquare does not include information about schools and daycare centers, so we need another source. We used the Google APIs to search by proximity and saved the results in an Excel file:

In [81]:
schools = pd.ExcelFile('schools_neighborhoods_Madrid.xlsx')
schools_Madrid = pd.read_excel(schools)

In [86]:
reduced_schools=schools_Madrid[schools_Madrid['Borough'].isin(['Arganzuela', 'Retiro', 'Chamartín', 'Villa de Vallecas'])] 

In [87]:
reduced_schools.head()

Unnamed: 0.1,Unnamed: 0,Borough,Neighborhood,Address,Latitude,Longitude,Schools,Daycare centers,Total schools
6,6,Arganzuela,Imperial,"Imperial, Arganzuela, Madrid",40408330000000000,-37186499999999600,4.0,3.0,7.0
7,7,Arganzuela,Acacias,"Acacias, Arganzuela, Madrid",4040137000000000,-37066899999999800,6.0,13.0,19.0
8,8,Arganzuela,Chopera,"Chopera, Arganzuela, Madrid",4039349997056970,-3698450003895090,5.0,4.0,9.0
9,9,Arganzuela,Legazpi,"Legazpi, Arganzuela, Madrid",40387020000000000,-3689899999999960,4.0,8.0,12.0
10,10,Arganzuela,Delicias,"Delicias, Arganzuela, Madrid",4039613000000000,-368945999999994,2.0,6.0,8.0
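
The `Latitude` and `Longitude` values in this output look as if they lost their decimal separator somewhere on the way through Excel: Imperial appears as 40.40833, -3.71865 in the earlier tables. If the coordinates were needed again, one possible repair is to re-insert the decimal point after a known number of integer digits. The helper `restore_decimal` is hypothetical, and the digit counts (2 for latitude, 1 for longitude around Madrid) are assumptions.

```python
def restore_decimal(raw, int_digits):
    """Re-insert a decimal point after `int_digits` leading digits of `raw`."""
    sign = -1 if raw < 0 else 1
    digits = str(abs(int(raw)))
    return sign * float(digits[:int_digits] + "." + digits[int_digits:])

# Raw values for Imperial, copied from the table above.
lat = restore_decimal(40408330000000000, 2)
lon = restore_decimal(-37186499999999600, 1)
print(round(lat, 5), round(lon, 5))
```

The school and daycare counts are not affected, so the analysis that follows does not depend on this repair.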


In [88]:
reduced_schools.sort_values(['Total schools'], ascending=False).head(5)

Unnamed: 0.1,Unnamed: 0,Borough,Neighborhood,Address,Latitude,Longitude,Schools,Daycare centers,Total schools
112,112,Villa de Vallecas,Santa Eugenia,"Santa Eugenia, Villa de Vallecas, Madrid",4038544011651400,-3621275467099940,14.0,10.0,24.0
113,113,Villa de Vallecas,Ensanche de Vallecas,"Ensanche de Vallecas, Villa de Vallecas, Madrid",40369798344688800,-3617079086507530,10.0,14.0,24.0
18,18,Retiro,Niño Jesús,"Niño Jesús, Retiro, Madrid",4041095000000000,-367229999999995,14.0,9.0,23.0
15,15,Retiro,Estrella,"Estrella, Retiro, Madrid",4041117000000000,-3665929999999940,13.0,8.0,21.0
7,7,Arganzuela,Acacias,"Acacias, Arganzuela, Madrid",4040137000000000,-37066899999999800,6.0,13.0,19.0


In [89]:
reduced_schools.sort_values(['Daycare centers'], ascending=False).head(5)

Unnamed: 0.1,Unnamed: 0,Borough,Neighborhood,Address,Latitude,Longitude,Schools,Daycare centers,Total schools
113,113,Villa de Vallecas,Ensanche de Vallecas,"Ensanche de Vallecas, Villa de Vallecas, Madrid",40369798344688800,-3617079086507530,10.0,14.0,24.0
7,7,Arganzuela,Acacias,"Acacias, Arganzuela, Madrid",4040137000000000,-37066899999999800,6.0,13.0,19.0
112,112,Villa de Vallecas,Santa Eugenia,"Santa Eugenia, Villa de Vallecas, Madrid",4038544011651400,-3621275467099940,14.0,10.0,24.0
18,18,Retiro,Niño Jesús,"Niño Jesús, Retiro, Madrid",4041095000000000,-367229999999995,14.0,9.0,23.0
9,9,Arganzuela,Legazpi,"Legazpi, Arganzuela, Madrid",40387020000000000,-3689899999999960,4.0,8.0,12.0


The neighborhoods with the most schools and daycare centers are Ensanche de Vallecas, Acacias, Niño Jesús and Santa Eugenia. As discussed above, Santa Eugenia belongs to Cluster 4 (other businesses), so it is not recommended for our purpose.
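
One way to reproduce this shortlist is to intersect the two top-5 rankings. The sketch below uses only the nine neighborhoods whose counts are visible in the outputs above, so it is an illustration rather than a rerun of the full analysis.

```python
# (neighborhood, daycare centers, total schools), in the original row order.
rows = [
    ("Imperial", 3, 7),
    ("Acacias", 13, 19),
    ("Chopera", 4, 9),
    ("Legazpi", 8, 12),
    ("Delicias", 6, 8),
    ("Estrella", 8, 21),
    ("Niño Jesús", 9, 23),
    ("Santa Eugenia", 10, 24),
    ("Ensanche de Vallecas", 14, 24),
]

# sorted() is stable, so ties keep the original row order
# (Legazpi and Estrella are tied at 8 daycare centers).
top_total = [r[0] for r in sorted(rows, key=lambda r: -r[2])[:5]]
top_daycare = [r[0] for r in sorted(rows, key=lambda r: -r[1])[:5]]

shortlist = set(top_total) & set(top_daycare)
print(sorted(shortlist))
```

The tie for fifth place in the daycare ranking means the shortlist is sensitive to how ties are broken, which is worth keeping in mind when reading the result.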

## 6. Conclusion

We have narrowed the candidate neighborhoods down to three. Let's see if we can narrow the list further or rank them.

In [90]:
madrid_interest = madrid_grouped[['Neighborhood','Bakery', 'Food & Drink Shop', 'Garden', 'Grocery Store', 'Ice Cream Shop', 'Other Great Outdoors', 'Park', 'Playground', 'Plaza', 'Shopping Mall']]

In [92]:
madrid_interest.head()

Unnamed: 0,Neighborhood,Bakery,Food & Drink Shop,Garden,Grocery Store,Ice Cream Shop,Other Great Outdoors,Park,Playground,Plaza,Shopping Mall
0,Acacias,0.0,0.019231,0.0,0.0,0.019231,0.0,0.038462,0.019231,0.0,0.0
1,Adelfas,0.04,0.04,0.0,0.06,0.0,0.0,0.02,0.0,0.0,0.0
2,Atocha,0.0,0.018519,0.018519,0.0,0.0,0.0,0.018519,0.0,0.037037,0.0
3,Casco Histórico de Vallecas,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Castilla,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [102]:
madrid_interest_reduced=madrid_interest[madrid_interest['Neighborhood'].isin(['Acacias', 'Ensanche de Vallecas', 'Niño Jesús'])] 

In [103]:
madrid_interest_reduced.head()

Unnamed: 0,Neighborhood,Bakery,Food & Drink Shop,Garden,Grocery Store,Ice Cream Shop,Other Great Outdoors,Park,Playground,Plaza,Shopping Mall
0,Acacias,0.0,0.019231,0.0,0.0,0.019231,0.0,0.038462,0.019231,0.0,0.0
9,Ensanche de Vallecas,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0
16,Niño Jesús,0.023256,0.0,0.0,0.023256,0.0,0.0,0.046512,0.0,0.069767,0.0


As we can see, Ensanche de Vallecas has no parks or playgrounds, so it is not recommended for families with children. Niño Jesús is the neighborhood families would be most interested in living in, followed by Acacias, because both have everything they need close by.
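
As a rough cross-check of this ranking, the selected venue frequencies can be added up per neighborhood. The values below are copied from the table above; giving every category equal weight is an assumption made here, not something the analysis prescribes.

```python
import pandas as pd

# Frequencies for the three candidate neighborhoods, copied from the output above.
interest = pd.DataFrame({
    "Neighborhood": ["Acacias", "Ensanche de Vallecas", "Niño Jesús"],
    "Bakery": [0.0, 0.0, 0.023256],
    "Food & Drink Shop": [0.019231, 0.0, 0.0],
    "Garden": [0.0, 0.0, 0.0],
    "Grocery Store": [0.0, 0.090909, 0.023256],
    "Ice Cream Shop": [0.019231, 0.0, 0.0],
    "Other Great Outdoors": [0.0, 0.0, 0.0],
    "Park": [0.038462, 0.0, 0.046512],
    "Playground": [0.019231, 0.0, 0.0],
    "Plaza": [0.0, 0.0, 0.069767],
    "Shopping Mall": [0.0, 0.0, 0.0],
})

# Equal-weight score: sum of all family-relevant categories (an assumption).
scores = interest.set_index("Neighborhood").sum(axis=1).sort_values(ascending=False)
ranking = list(scores.index)
print(ranking)  # ['Niño Jesús', 'Acacias', 'Ensanche de Vallecas']
```

Different weights, for example counting only parks and playgrounds, could change the order of the first two, so the score is best read as supporting evidence rather than a definitive ranking.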