# Capstone Project - THE BATTLE OF NEIGHBORHOODS

## Description of the problem and background

This report aims to analyze the city of Madrid and establish, based on different criteria, the best place to install a dental clinic.

Madrid is the capital of Spain: a large, densely populated city and one of the most important capitals in Europe.

It is also a major tourist destination, so many areas of the city are occupied by businesses that cater to tourists.

A dental clinic, however, depends on a stable base of residents who can become regular customers. A suitable location therefore avoids the most tourist-heavy areas and favors residential areas with high economic activity.

Therefore, the aspects to study when evaluating a suitable location for a dental clinic are:

- Number of inhabitants
- Per capita income of the inhabitants
- Existing businesses in the surroundings
- Good transport connections with the rest of the city
- Tourist attractions nearby

# Description of the data

To solve the problem, the following data is needed:

- Madrid district data (code, name, coordinates, area, inhabitants...): this information was obtained from different websites and stored as shp and csv files.
- Businesses in each neighborhood: the Foursquare API provides information about the venues in each neighborhood.

# Approach

- Collect the information on the districts of Madrid
- Use the Foursquare API to find the venues in each district
- Finally, considering the venues, the transport connections and the number of inhabitants, choose the best location for the dental clinic

# LIBRARIES REQUIRED

In [106]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

# conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON data into a pandas dataframe (top-level import, available since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

import geopandas as gpd

print('Libraries imported.')

Libraries imported.


# DATA COLLECTION

In [107]:
dtypes = {'Codigo': 'str', 'Nombre': 'str', 'Latitude':'float', 'Longitude':'float', 'Superficie (ha)':'float', 'Poblacion':'int', 'Densidad(hab/ha)':'float'}

In [108]:
dfDistritos=pd.read_csv('Distritos.csv', sep=';', decimal=',',  encoding='latin-1', dtype=dtypes)
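The `sep=';'` and `decimal=','` arguments matter because the file apparently follows Spanish CSV conventions. The following is a minimal sketch with an in-memory sample, assuming a layout like the one implied by the `dtypes` dictionary. The single data row reuses values for Centro shown later in this notebook; the real `Distritos.csv` may be laid out differently.

```python
import io
import pandas as pd

# Hypothetical two-line sample in the assumed layout of Distritos.csv:
# semicolon-separated fields and decimal commas.
sample = (
    "Codigo;Nombre;Latitude;Longitude;Superficie (ha);Densidad(hab/ha)\n"
    "1;Centro;40,4183;-3,70275;522,82;252,34\n"
)

df_sample = pd.read_csv(
    io.StringIO(sample),
    sep=";",        # fields are separated by semicolons
    decimal=",",    # "252,34" is parsed as a float, not as text
    dtype={"Codigo": "str", "Nombre": "str"},  # keep the district code as text
)

print(df_sample.dtypes)
print(df_sample.loc[0, "Densidad(hab/ha)"])
```

Without `decimal=','` the numeric columns would be read as strings, and the choropleth and clustering steps further down would not work on them.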

In [109]:
dfDistritos.columns = map(str.upper, dfDistritos.columns)

In [110]:
dfDistritos.rename(columns={'CODIGO':'CODDISTRIT'}, inplace=True)


In [111]:
dfDistritos.columns

Index(['CODDISTRIT', 'NOMBRE', 'LATITUDE', 'LONGITUDE', 'SUPERFICIE (HA)',
       'POBLACIÓN', 'DENSIDAD(HAB/HA)'],
      dtype='object')

In [112]:
dfDistritos.loc[0,:]

CODDISTRIT                1
NOMBRE               Centro
LATITUDE            40.4183
LONGITUDE          -3.70275
SUPERFICIE (HA)      522.82
POBLACIÓN            131928
DENSIDAD(HAB/HA)     252.34
Name: 0, dtype: object

In [113]:
address = 'Madrid, Spain'

geolocator = Nominatim(user_agent="T_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Madrid are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Madrid are 40.4167047, -3.7035825.


In [116]:
map_Madrid = folium.Map(location=[latitude, longitude], zoom_start=11)

map_Madrid.choropleth(
    geo_data='Distritos.geojson',
    name='choropleth',
    data=dfDistritos,
    columns=['CODDISTRIT', 'DENSIDAD(HAB/HA)'],
    key_on = 'feature.properties.CODDISTRIT',
    fill_color='YlGn',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='DENSIDAD (HAB/HA)'
)





folium.LayerControl().add_to(map_Madrid)
map_Madrid





In [114]:
map_Madrid = folium.Map(location=[latitude, longitude], zoom_start=10)

map_Madrid.choropleth(
    geo_data='Distritos.geojson',
    name='choropleth',
    data=dfDistritos,
    columns=['CODDISTRIT', 'DENSIDAD(HAB/HA)'],
    key_on = 'feature.properties.CODDISTRIT',
    fill_color='YlGn',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='DENSIDAD (HAB/HA)'
)



map_Madrid.choropleth(open('ParadasMetro.geojson',  encoding="utf8").read())



folium.LayerControl().add_to(map_Madrid)
map_Madrid
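The map overlays the metro stops, but the notebook does not show how the stops were counted per district. One possible way, sketched here in plain Python, is a ray-casting point-in-polygon test. The toy square "districts" and stop coordinates are invented for illustration; real code would read the polygons from `Distritos.geojson` and the points from `ParadasMetro.geojson`, whose exact structure is not shown in this notebook.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: True if (lon, lat) lies inside the polygon ring.

    `ring` is a list of (lon, lat) vertex pairs, like a GeoJSON exterior ring.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray starting at the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def count_stops_per_district(districts, stops):
    """Count how many stop points fall inside each district polygon."""
    counts = {name: 0 for name in districts}
    for lon, lat in stops:
        for name, ring in districts.items():
            if point_in_polygon(lon, lat, ring):
                counts[name] += 1
                break  # a stop is assigned to at most one district
    return counts


# Invented toy data: two adjacent unit squares and three stops.
toy_districts = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
toy_stops = [(0.5, 0.5), (1.5, 0.2), (1.7, 0.9)]

stop_counts = count_stops_per_district(toy_districts, toy_stops)
print(stop_counts)
```

With real data, a spatial join in `geopandas` would probably be the more convenient route; the plain-Python version only makes the underlying idea explicit.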





In [58]:
#@hidden_cell
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # placeholder: your Foursquare ID (keep real credentials out of the notebook)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # placeholder: your Foursquare Secret
VERSION = '20190802' # Foursquare API version

In [59]:
neighborhood_latitude = dfDistritos.loc[0, 'LATITUDE'] # neighborhood latitude value
neighborhood_longitude = dfDistritos.loc[0, 'LONGITUDE'] # neighborhood longitude value
neighborhood_name = dfDistritos.loc[0, 'NOMBRE'] # neighborhood name

In [60]:
LIMIT = 30 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url 

'https://api.foursquare.com/v2/venues/explore?&client_id=YOUR_FOURSQUARE_CLIENT_ID&client_secret=YOUR_FOURSQUARE_CLIENT_SECRET&v=20190802&ll=40.4183083,-3.70275&radius=500&limit=30'

In [61]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
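`getNearbyVenues` indexes straight into the JSON response, so a venue without categories, or a response without groups, raises an exception. A more defensive parsing step could look like the sketch below. The helper name `parse_explore_items` and the fake payload are illustrative; only the nested keys already used above (`response`, `groups`, `items`, `venue`, `location`, `categories`) are taken from the notebook.

```python
def parse_explore_items(payload, name, lat, lng):
    """Flatten one 'explore' JSON payload into a list of venue tuples.

    Missing groups yield an empty list; venues without a category are
    labelled 'Unknown' instead of raising IndexError.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else "Unknown"
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Hand-written payload that mimics the structure used above (not real data).
fake_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Venue A",
                   "location": {"lat": 40.0, "lng": -3.0},
                   "categories": [{"name": "Café"}]}},
        {"venue": {"name": "Venue B",
                   "location": {"lat": 40.1, "lng": -3.1},
                   "categories": []}},
    ]}]}
}

rows = parse_explore_items(fake_payload, "Centro", 40.4183, -3.70275)
empty = parse_explore_items({"response": {}}, "Centro", 40.4183, -3.70275)
print(rows)
print(empty)
```

Inside the loop of `getNearbyVenues`, the list comprehension could then be replaced by a call to such a helper on `requests.get(url).json()`.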

In [62]:
LIMIT=100
Madrid_venues = getNearbyVenues(names=dfDistritos['NOMBRE'],
                                   latitudes=dfDistritos['LATITUDE'],
                                   longitudes=dfDistritos['LONGITUDE'])


Centro
Arganzuela
Retiro
Salamanca
Chamartín
Tetuán
Chamberí
Fuencarral-El Pardo
Moncloa-Aravaca
Latina
Carabanchel
Usera
Puente de Vallecas
Moratalaz
Ciudad Lineal
Hortaleza
Villaverde
Villa de Vallecas
Vicálvaro
San Blas-Canillejas
Barajas


In [63]:
Madrid_onehot = pd.get_dummies(Madrid_venues[['Venue Category']], prefix="", prefix_sep="")


In [64]:
Madrid_venues.columns

Index(['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude',
       'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category'],
      dtype='object')

In [65]:
# add neighborhood column back to dataframe
Madrid_onehot['NOMBRE'] = Madrid_venues['Neighborhood'] 

In [66]:
fixed_columns = [Madrid_onehot.columns[-1]] + list(Madrid_onehot.columns[:-1])
Madrid_onehot = Madrid_onehot[fixed_columns]


In [67]:
Madrid_grouped= Madrid_onehot.groupby('NOMBRE').mean().reset_index()
Madrid_grouped

Unnamed: 0,NOMBRE,Accessories Store,American Restaurant,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bistro,Boarding House,Bookstore,Boutique,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Burger Joint,Burrito Place,Café,Camera Store,Casino,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,Comedy Club,Comfort Food Restaurant,Concert Hall,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Electronics Store,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Flea Market,Flower Shop,Food,Fountain,French Restaurant,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gastropub,General Entertainment,Gift Shop,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Health Food Store,Herbs & Spices Store,Historic Site,Hookah Bar,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Italian Restaurant,Japanese Restaurant,Juice Bar,Korean Restaurant,Lake,Liquor Store,Lounge,Market,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Monument / Landmark,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music Venue,Neighborhood,Nightclub,Office,Optical Shop,Outdoor Sculpture,Paella Restaurant,Park,Pastry Shop,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pool,Pub,Ramen Restaurant,Restaurant,Rock Club,Sandwich Place,Science Museum,Seafood Restaurant,Shopping Mall,Shopping Plaza,Skate Park,Snack Place,Soccer Field,Soccer Stadium,South American Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Steakhouse,Supermarket,Sushi Restaurant,Tapas Restaurant,Tattoo Parlor,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Trade School,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Wine Bar,Wine Shop
0,Arganzuela,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.057692,0.0,0.019231,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.019231,0.0,0.0,0.038462,0.0,0.019231,0.038462,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.019231,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.057692,0.019231,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.019231,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.019231,0.019231,0.0,0.0,0.0,0.0,0.019231,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.057692,0.0,0.019231,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.076923,0.019231,0.019231,0.0,0.0,0.0,0.019231,0.019231,0.0,0.0,0.0,0.0,0.0
1,Barajas,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.028571,0.0,0.0,0.0,0.0,0.028571,0.028571,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.057143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.171429,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.028571,0.0,0.028571,0.0,0.0,0.0,0.085714,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.085714,0.0,0.0,0.0,0.0,0.057143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.028571,0.0
2,Carabanchel,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Centro,0.01,0.0,0.0,0.0,0.04,0.0,0.03,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.0,0.06,0.02,0.02,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.01,0.02,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.01,0.03,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.11,0.01,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.02,0.0,0.03,0.0,0.01,0.0,0.06,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.03,0.0,0.0,0.01,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Chamartín,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.12,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.04,0.04,0.04,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Chamberí,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.04,0.06,0.02,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.04,0.03,0.01,0.04,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.0,0.02,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.04,0.0,0.02,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.02,0.0,0.01,0.0,0.1,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.11,0.0,0.0,0.01,0.01,0.06,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0
6,Ciudad Lineal,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.032258,0.0,0.032258,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.064516,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.129032,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.129032,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Hortaleza,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Latina,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.153846,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.230769,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Moncloa-Aravaca,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
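Each value in `Madrid_grouped` is the share of a district's returned venues that fall in a given category, because the mean of a 0/1 indicator column is a relative frequency. A toy illustration with invented districts and venues:

```python
import pandas as pd

# Invented toy data: district "A" has 2 venues, district "B" has 4.
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B", "B", "B", "B"],
    "Venue Category": ["Bar", "Hotel", "Bar", "Bar", "Bar", "Park"],
})

# One indicator column per category, as in the notebook.
toy_onehot = pd.get_dummies(toy_venues[["Venue Category"]], prefix="", prefix_sep="")
toy_onehot = toy_onehot.astype(float)  # explicit 0.0 / 1.0 values
toy_onehot.insert(0, "NOMBRE", toy_venues["Neighborhood"])

# Mean of the indicators per district = share of venues in each category.
toy_grouped = toy_onehot.groupby("NOMBRE").mean()
print(toy_grouped)
```

Because the values are shares, a district with very few returned venues gets coarse, large frequencies, which is worth keeping in mind when the clusters are compared.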


In [68]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
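To see what this function returns, it can be applied to a single hand-made row. The frequencies and the district name below are invented; the function body repeats the notebook's logic so that the snippet runs on its own.

```python
import pandas as pd


def return_most_common_venues(row, num_top_venues):
    # Same logic as the notebook function: skip the name, sort descending.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


# Invented frequencies for a hypothetical district.
toy_row = pd.Series({"NOMBRE": "Toy district", "Bar": 0.2, "Hotel": 0.5, "Park": 0.3})

top_two = list(return_most_common_venues(toy_row, 2))
print(top_two)
```

Note that when a district has fewer than `num_top_venues` categories with a nonzero frequency, the remaining slots are filled with zero-frequency categories. This appears to be the case for Carabanchel in the table above, where only five categories are nonzero.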

In [69]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['NOMBRE']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no special suffix beyond 'rd', fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['NOMBRE'] = Madrid_grouped['NOMBRE']

for ind in np.arange(Madrid_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Madrid_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

| | NOMBRE | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Arganzuela | Tapas Restaurant | Spanish Restaurant | Grocery Store | Restaurant | Bakery | Gym / Fitness Center | Falafel Restaurant | Coffee Shop | Beer Garden | Chinese Restaurant |
| 1 | Barajas | Hotel | Restaurant | Spanish Restaurant | Tapas Restaurant | Gastropub | Japanese Restaurant | Plaza | Pizza Place | Bistro | Boarding House |
| 2 | Carabanchel | Coffee Shop | Fast Food Restaurant | Supermarket | Grocery Store | Flea Market | Wine Shop | Diner | Farmers Market | Falafel Restaurant | Electronics Store |
| 3 | Centro | Hotel | Spanish Restaurant | Restaurant | Clothing Store | Tapas Restaurant | Argentinian Restaurant | Gourmet Shop | Plaza | Art Museum | Bookstore |
| 4 | Chamartín | Restaurant | Spanish Restaurant | Tapas Restaurant | Steakhouse | Sushi Restaurant | Metro Station | Seafood Restaurant | Plaza | Big Box Store | Deli / Bodega |
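The `try`/`except` trick used for the column suffixes works for 1 to 10 but would mislabel larger values (for example `21th`). If `num_top_venues` were ever raised, a small helper such as the hypothetical `ordinal` below could build the labels instead:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th and 13th are irregular
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["NOMBRE"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns)
```

For `num_top_venues = 10` this produces the same column names as the cell above.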


In [70]:
neighborhoods_venues_sorted.loc[:, ['NOMBRE', '1st Most Common Venue']]

| | NOMBRE | 1st Most Common Venue |
|---|---|---|
| 0 | Arganzuela | Tapas Restaurant |
| 1 | Barajas | Hotel |
| 2 | Carabanchel | Coffee Shop |
| 3 | Centro | Hotel |
| 4 | Chamartín | Restaurant |
| 5 | Chamberí | Spanish Restaurant |
| 6 | Ciudad Lineal | Grocery Store |
| 7 | Hortaleza | Spanish Restaurant |
| 8 | Latina | Grocery Store |
| 9 | Moncloa-Aravaca | Coffee Shop |


In [71]:

# import k-means from clustering stage
from sklearn.cluster import KMeans

In [72]:
# set number of clusters
kclusters = 5

Madrid_grouped_clustering = Madrid_grouped.drop(columns='NOMBRE')  # keep only the numeric frequency columns

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Madrid_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 1, 0, 0, 0, 1, 1, 1, 4])

In [73]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [74]:
neighborhoods_venues_sorted.columns

Index(['Cluster Labels', 'NOMBRE', '1st Most Common Venue',
       '2nd Most Common Venue', '3rd Most Common Venue',
       '4th Most Common Venue', '5th Most Common Venue',
       '6th Most Common Venue', '7th Most Common Venue',
       '8th Most Common Venue', '9th Most Common Venue',
       '10th Most Common Venue'],
      dtype='object')

In [75]:
Madrid_merged = dfDistritos

# join the district data with the cluster labels and most common venues of each district
Madrid_merged = Madrid_merged.join(neighborhoods_venues_sorted.set_index('NOMBRE'), on='NOMBRE')

Madrid_merged.head() # check the last columns!

| | CODDISTRIT | NOMBRE | LATITUDE | LONGITUDE | SUPERFICIE (HA) | POBLACIÓN | DENSIDAD(HAB/HA) | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Centro | 40.418308 | -3.70275 | 522.82 | 131928 | 252.34 | 0.0 | Hotel | Spanish Restaurant | Restaurant | Clothing Store | Tapas Restaurant | Argentinian Restaurant | Gourmet Shop | Plaza | Art Museum | Bookstore |
| 1 | 2 | Arganzuela | 40.400211 | -3.69618 | 646.22 | 151965 | 235.16 | 0.0 | Tapas Restaurant | Spanish Restaurant | Grocery Store | Restaurant | Bakery | Gym / Fitness Center | Falafel Restaurant | Coffee Shop | Beer Garden | Chinese Restaurant |
| 2 | 3 | Retiro | 40.41317 | -3.68307 | 546.62 | 118516 | 216.82 | 3.0 | Garden | Café | Monument / Landmark | Fountain | Lake | Plaza | Park | Shopping Plaza | Diner | Snack Place |
| 3 | 4 | Salamanca | 40.429722 | -3.67975 | 539.24 | 143800 | 266.67 | 0.0 | Spanish Restaurant | Restaurant | Tapas Restaurant | Boutique | Coffee Shop | Seafood Restaurant | Bakery | Japanese Restaurant | Mexican Restaurant | Mediterranean Restaurant |
| 4 | 5 | Chamartín | 40.462059 | -3.6766 | 917.55 | 143424 | 156.31 | 0.0 | Restaurant | Spanish Restaurant | Tapas Restaurant | Steakhouse | Sushi Restaurant | Metro Station | Seafood Restaurant | Plaza | Big Box Store | Deli / Bodega |


In [95]:
Madrid_merged=Madrid_merged.dropna(subset=['Cluster Labels'])
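The `dropna` step is needed because a district that returned no venues has no row in `neighborhoods_venues_sorted`. This appears to be the case for Fuencarral-El Pardo, which is requested but absent from the cluster listing. The left join therefore leaves `NaN` in `Cluster Labels` and turns the whole column into floats. A toy reproduction with invented data, including an optional cast back to integers:

```python
import pandas as pd

# Invented toy data: district "C" has no venue-based cluster label.
districts = pd.DataFrame({"NOMBRE": ["A", "B", "C"], "POBLACION": [100, 200, 300]})
labels = pd.DataFrame({"NOMBRE": ["A", "B"], "Cluster Labels": [0, 1]})

# Left join on the district name, as in the notebook.
merged = districts.join(labels.set_index("NOMBRE"), on="NOMBRE")
print(merged["Cluster Labels"].tolist())  # the unmatched district gets NaN

# Drop unmatched districts, then restore integer labels.
cleaned = merged.dropna(subset=["Cluster Labels"]).astype({"Cluster Labels": int})
print(cleaned)
```

Casting back to integers is optional here; the notebook keeps the float labels and converts them with `int(cluster)` when picking the marker colors.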

In [96]:
# create map
map_clusters = folium.Map(location=[neighborhood_latitude, neighborhood_longitude], zoom_start=11)

In [97]:
# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

In [117]:
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Madrid_merged['LATITUDE'], Madrid_merged['LONGITUDE'], Madrid_merged['NOMBRE'], Madrid_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    print(str(poi) + ' Cluster ' + str(cluster))
    
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Centro Cluster 0.0
Arganzuela Cluster 0.0
Retiro Cluster 3.0
Salamanca Cluster 0.0
Chamartín Cluster 0.0
Tetuán Cluster 0.0
Chamberí Cluster 0.0
Moncloa-Aravaca Cluster 4.0
Latina Cluster 1.0
Carabanchel Cluster 1.0
Usera Cluster 0.0
Puente de Vallecas Cluster 0.0
Moratalaz Cluster 0.0
Ciudad Lineal Cluster 1.0
Hortaleza Cluster 1.0
Villaverde Cluster 2.0
Villa de Vallecas Cluster 0.0
Vicálvaro Cluster 3.0
San Blas-Canillejas Cluster 1.0
Barajas Cluster 0.0


In [100]:
cluster=0

Madrid_merged.loc[Madrid_merged['Cluster Labels'] == cluster, Madrid_merged.columns[[1] + list(range(5, Madrid_merged.shape[1]))]]

| | NOMBRE | POBLACIÓN | DENSIDAD(HAB/HA) | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Centro | 131928 | 252.34 | 0.0 | Hotel | Spanish Restaurant | Restaurant | Clothing Store | Tapas Restaurant | Argentinian Restaurant | Gourmet Shop | Plaza | Art Museum | Bookstore |
| 1 | Arganzuela | 151965 | 235.16 | 0.0 | Tapas Restaurant | Spanish Restaurant | Grocery Store | Restaurant | Bakery | Gym / Fitness Center | Falafel Restaurant | Coffee Shop | Beer Garden | Chinese Restaurant |
| 3 | Salamanca | 143800 | 266.67 | 0.0 | Spanish Restaurant | Restaurant | Tapas Restaurant | Boutique | Coffee Shop | Seafood Restaurant | Bakery | Japanese Restaurant | Mexican Restaurant | Mediterranean Restaurant |
| 4 | Chamartín | 143424 | 156.31 | 0.0 | Restaurant | Spanish Restaurant | Tapas Restaurant | Steakhouse | Sushi Restaurant | Metro Station | Seafood Restaurant | Plaza | Big Box Store | Deli / Bodega |
| 5 | Tetuán | 153789 | 286.13 | 0.0 | Spanish Restaurant | Restaurant | Seafood Restaurant | Brazilian Restaurant | Supermarket | Brewery | Coffee Shop | Chinese Restaurant | Hotel | Grocery Store |
| 6 | Chamberí | 137401 | 293.64 | 0.0 | Spanish Restaurant | Restaurant | Tapas Restaurant | Bar | Mexican Restaurant | Brewery | Bakery | Café | Burger Joint | Coffee Shop |
| 11 | Usera | 134791 | 173.3 | 0.0 | Clothing Store | Coffee Shop | Beer Garden | Spanish Restaurant | Gastropub | Electronics Store | Plaza | Farmers Market | Performing Arts Venue | Fast Food Restaurant |
| 12 | Puente de Vallecas | 227595 | 152.05 | 0.0 | Pizza Place | Gym | Music Venue | Pub | Chinese Restaurant | Italian Restaurant | Rock Club | Burger Joint | Concert Hall | Soccer Stadium |
| 13 | Moratalaz | 94197 | 154.34 | 0.0 | Bar | Plaza | Coffee Shop | Optical Shop | Pizza Place | Nightclub | Pub | Café | Brewery | Breakfast Spot |
| 17 | Villa de Vallecas | 104421 | 19.86 | 0.0 | Clothing Store | Fast Food Restaurant | Coffee Shop | Italian Restaurant | Sandwich Place | Restaurant | Brazilian Restaurant | Shopping Mall | Electronics Store | Boutique |


In [101]:
cluster=1

Madrid_merged.loc[Madrid_merged['Cluster Labels'] == cluster, Madrid_merged.columns[[1] + list(range(5, Madrid_merged.shape[1]))]]

| | NOMBRE | POBLACIÓN | DENSIDAD(HAB/HA) | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | Latina | 233808 | 91.95 | 1.0 | Grocery Store | Fast Food Restaurant | Pizza Place | Thrift / Vintage Store | Plaza | Breakfast Spot | Comedy Club | Spanish Restaurant | Metro Station | Indie Theater |
| 10 | Carabanchel | 243998 | 173.68 | 1.0 | Coffee Shop | Fast Food Restaurant | Supermarket | Grocery Store | Flea Market | Wine Shop | Diner | Farmers Market | Falafel Restaurant | Electronics Store |
| 14 | Ciudad Lineal | 212529 | 186.01 | 1.0 | Grocery Store | Spanish Restaurant | Coffee Shop | Restaurant | Pizza Place | Italian Restaurant | Hotel | Park | Asian Restaurant | Athletics & Sports |
| 15 | Hortaleza | 180462 | 65.81 | 1.0 | Spanish Restaurant | Pizza Place | Garden | Park | Café | Restaurant | Metro Station | Supermarket | Bar | Tapas Restaurant |
| 19 | San Blas-Canillejas | 154357 | 69.24 | 1.0 | Rock Club | Spanish Restaurant | Pet Store | Grocery Store | Pizza Place | Park | Gastropub | Plaza | Sandwich Place | Bakery |


In [102]:
cluster=2

Madrid_merged.loc[Madrid_merged['Cluster Labels'] == cluster, Madrid_merged.columns[[1] + list(range(5, Madrid_merged.shape[1]))]]

| | NOMBRE | POBLACIÓN | DENSIDAD(HAB/HA) | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16 | Villaverde | 142608 | 70.64 | 2.0 | Spanish Restaurant | Train Station | Mediterranean Restaurant | Grocery Store | Gastropub | Wine Shop | Diner | Fast Food Restaurant | Farmers Market | Falafel Restaurant |


In [103]:
cluster=3

Madrid_merged.loc[Madrid_merged['Cluster Labels'] == cluster, Madrid_merged.columns[[1] + list(range(5, Madrid_merged.shape[1]))]]

| | NOMBRE | POBLACIÓN | DENSIDAD(HAB/HA) | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Retiro | 118516 | 216.82 | 3.0 | Garden | Café | Monument / Landmark | Fountain | Lake | Plaza | Park | Shopping Plaza | Diner | Snack Place |
| 18 | Vicálvaro | 70051 | 19.86 | 3.0 | Café | Falafel Restaurant | Train Station | Park | Grocery Store | Supermarket | Dessert Shop | Fast Food Restaurant | Farmers Market | Electronics Store |


In [104]:
cluster=4

Madrid_merged.loc[Madrid_merged['Cluster Labels'] == cluster, Madrid_merged.columns[[1] + list(range(5, Madrid_merged.shape[1]))]]

| | NOMBRE | POBLACIÓN | DENSIDAD(HAB/HA) | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | Moncloa-Aravaca | 116903 | 25.12 | 4.0 | Coffee Shop | Pub | Gym | Pool | Garden | College Cafeteria | Hookah Bar | Gym / Fitness Center | Farmers Market | Comfort Food Restaurant |


# Conclusion

To choose the district for the dental clinic, we look for a residential area with a large number of residents and high commercial activity. It must also be a well-connected district, which is why the metro stops of the city of Madrid have been analyzed.


The appropriate areas are those included in cluster 1, since they are residential areas that are not heavily influenced by tourism.


Within cluster 1, the districts that combine a large population with a high population density, and are therefore best suited to the new business, are Carabanchel and Ciudad Lineal.


Since Carabanchel is better connected, with a greater number of metro stops, and has a commercial activity more compatible with the new business, it is considered the most appropriate district for a new dental clinic.
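The comparison behind this choice can be reproduced from the figures in the cluster tables above. The sketch below simply sorts the five districts that share a cluster with Carabanchel by population and by density. The numbers are copied from those tables; the variable names are illustrative.

```python
# (population, density in inhabitants per hectare), copied from the cluster table.
districts = {
    "Latina": (233808, 91.95),
    "Carabanchel": (243998, 173.68),
    "Ciudad Lineal": (212529, 186.01),
    "Hortaleza": (180462, 65.81),
    "San Blas-Canillejas": (154357, 69.24),
}

by_population = sorted(districts, key=lambda d: districts[d][0], reverse=True)
by_density = sorted(districts, key=lambda d: districts[d][1], reverse=True)

print("By population:", by_population)
print("By density:   ", by_density)
```

Carabanchel ranks first by population and second by density, while Ciudad Lineal ranks first by density. Latina has the second-largest population but roughly half the density of either.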

Thank you!