## Define Task

Madrid, Spain's central capital, is a city of elegant boulevards, expansive parks and beautiful historical scenery. It's renowned for its rich repositories of European art, including the Prado Museum's works by Goya and Velazquez. It is also a region with a great variety of cuisines. Its central location and the many people originally from other regions of the country have influenced the cuisine of Madrid. The population size currently stands at 6.5 million inhabitants and the city also enjoys a large wave of tourists and international students moving to experience living in the Spanish capital.

Restaurants are spread all over the city with a massive variation ranging from oriental to mediterranean and cuisines from different European countries. One can imagine that opening up a restaurant in Madrid is not going to very easy due to the high competition that there is. Many restaurants that open end up failing due to underplanning and lacking of a solid business strategy.

I would like to open an Italian restaurant in Madrid but I want to make sure that it is indeed a good idea. The main business problem here would be whether the investment in opening an Italian restaurant will be profitable and worth the time, effort and money spent.

In order to fully understand the market and whether to procceed with the investment or not, I will take a look at population data and venues data to see what are the most common venues in the city. I will also study where would be the ideal location to open the restaurant based on different factors.

## Data Cleaning & Preparation

#### Import all packages that will be used for the analyis

In [1]:
#Import packages
import pandas as pd
import numpy as np
import requests
import json
from bs4 import BeautifulSoup
from sklearn.cluster import KMeans
import seaborn as sns
import matplotlib.pyplot as plt
from geopy.geocoders import Nominatim
import folium
from pandas.io.json import json_normalize
import matplotlib.cm as cm
import matplotlib.colors as colors

print("All libraries are imported.")

All libraries are imported.


#### Crawl wikipedia to get list of districts in Madrid and then merge coordinates data to create final dataset

In [2]:
#Extract districts of Madrid along with population and size
url = 'https://es.wikipedia.org/w/index.php?title=Anexo:Distritos_de_Madrid&oldid=119991001'
Madrid = requests.get(url)

#Use beautifulsoup to extract the table of interest from the page
soup = BeautifulSoup(Madrid.content, 'lxml')
Districts = soup.find_all('table')[0]
df = pd.read_html(url, flavor = 'bs4')
df = pd.read_json(df[0].to_json(orient = 'records'))

#Drop unnecessary columns and rows
df.drop(['Número','Imagen', 'Superficie[n. 1]​ (Ha.)'], axis = 1, inplace = True)
df.drop(df.tail(1).index, inplace = True)

#Change the title of the columns
df.columns = ['District', 'Population', 'Pop Density', 'Streets']

#Clean streets column from paranthesis
df['Streets'] = df['Streets'].str.replace(r"\((.*?)\)",", ", regex = True)

#Import dataset with coordinates and zip codes & merge it with current dataset
df1 = pd.read_csv('/Users/user/Desktop/Geospatial Coordinates Madrid.csv')
madrid_data = pd.merge(df, df1, on = 'District')

#Reorder columns
madrid_data = madrid_data[['Zip Code', 'District', 'Population', 'Pop Density', 'Streets', 'Latitude', 'Longitude']]

#Display data
madrid_data

Unnamed: 0,Zip Code,District,Population,Pop Density,Streets,Latitude,Longitude
0,28013,Centro,131 928,25234,"Palacio , Embajadores , Cortes , Justicia , Un...",40.419,-3.7118
1,28005,Arganzuela,151 965,23516,"Imperial , Acacias , Chopera , Legazpi , Delic...",40.405,-3.7105
2,28009,Retiro,118 516,21682,"Pacífico , Adelfas , Estrella , Ibiza , Jeróni...",40.4162,-3.6801
3,28001,Salamanca,143 800,26667,"Recoletos , Goya , Fuente del Berro , Guindale...",40.4262,-3.6851
4,28036,Chamartín,143 424,15631,"El Viso , Prosperidad , Ciudad Jardín , Hispan...",40.4618,-3.6851
5,28013,Tetuán,153 789,28613,"Bellas Vistas , Cuatro Caminos , Castillejos ,...",40.419,-3.7118
6,28015,Chamberí,137 401,29364,"Gaztambide , Arapiles , Trafalgar , Almagro , ...",40.4305,-3.7105
7,28004,Fuencarral-El Pardo,238 756,1004,"El Pardo , Fuentelarreina , Peñagrande , Pilar...",40.4245,-3.6991
8,28008,Moncloa-Aravaca,116 903,2512,"Casa de Campo , Argüelles , Ciudad Universitar...",40.43,-3.7257
9,28047,Latina,233 808,9195,"Los Cármenes , Puerta del Ángel , Lucero , Alu...",40.3961,-3.7486


The above information will be used to plot the map of Madrid based on the coordinates (longitude and latitude) in order to later check what are the most common venues in each district.

## Madrid: Geospatial Analysis

### A. Create a map and access the foursquare API

#### Create a map of Madrid

In [3]:
#Create an agent for the Madrid map and use geolocator
address = 'Madrid, Spain'
geolocator = Nominatim(user_agent = "mad_explorer")
location = geolocator.geocode(address)
Latitude = location.latitude
Longitude = location.longitude
print('The geograpical coordinates of Madrid are {}, {}.'.format(Latitude, Longitude))

#Create a map of Madrid using longitude and latitude values
map_madrid = folium.Map(location = [Latitude, Longitude], zoom_start = 11)

#Add markers on the map
for lat, lng, label in zip(madrid_data['Latitude'], madrid_data['Longitude'], madrid_data['District']):
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False).add_to(map_madrid)  
    
#Display the map
map_madrid

The geograpical coordinates of Madrid are 40.4167047, -3.7035825.


#### Access Foursquare API to get venue details

I will use the foursquare API to be able to identify the venues that are located in every district. This is useful because then I will see what are the most common venues in each district which can help in identifying the ideal location to open the Italian restaurant.

In [4]:
#Access foursquare to get all the different venues in all different districts in Madrid
CLIENT_ID = 'TQT1M33FH3JO1HO2M0TPR1MPERDTRV0GTTWVM5WQ1DQEWYUN'
CLIENT_SECRET = '5SJJCS23XPAKSKMR2WGIB1D3W4YQEZB1MXFYBFVKYKP2ZFUB'
VERSION = '20180605'

print('My credentials for Foursquare are:')
print('CLIENT_ID:', CLIENT_ID)
print('CLIENT_SECRET:', CLIENT_SECRET)

My credentials for Foursquare are:
CLIENT_ID: TQT1M33FH3JO1HO2M0TPR1MPERDTRV0GTTWVM5WQ1DQEWYUN
CLIENT_SECRET: 5SJJCS23XPAKSKMR2WGIB1D3W4YQEZB1MXFYBFVKYKP2ZFUB


### B. Explore the first neighborhood (Centro)

#### Get coordinates data of the centro district

In [5]:
#Lock dataset to display the first district/neighborhood (Centro)
madrid_data.loc[0, 'District']

#Get the longitude and latitude of the Centro district
centro_latitude = madrid_data.loc[0, 'Latitude']
centro_longitude = madrid_data.loc[0, 'Longitude']
centro = madrid_data.loc[0, 'District']

#Display the coordinates of the centro district
print('Latitude and longitude values of {} are {}, {}.'.format(centro, centro_latitude, centro_longitude))

Latitude and longitude values of Centro are 40.419000000000004, -3.7118.


#### Get information on venues in the centro district

In [6]:
#Get the top 100 venues in centro whithin a radius of 500m
LIMIT = 100
RADIUS = 500

#Create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    centro_latitude, 
    centro_longitude, 
    RADIUS, 
    LIMIT)
print(url)

#Get results
results = requests.get(url).json()
results

#Function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

https://api.foursquare.com/v2/venues/explore?&client_id=TQT1M33FH3JO1HO2M0TPR1MPERDTRV0GTTWVM5WQ1DQEWYUN&client_secret=5SJJCS23XPAKSKMR2WGIB1D3W4YQEZB1MXFYBFVKYKP2ZFUB&v=20180605&ll=40.419000000000004,-3.7118&radius=500&limit=100


### C. Explore all venues around all districts of Madrid

#### Build and clean venues in a new dataset

In [7]:
#Clean and structure dataset in pandas df
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues) #flatten JSON

#Filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

#Filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis = 1)

#Clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

#Display dataset
nearby_venues.head()

#Print the number of venues returned by foursquare API
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


#### Retrieve information on all venues located in Madrid

In [8]:
#Define function that gets information about all venues in Madrid
def getNearbyVenues(names, latitudes, longitudes, radius = 500):
    
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['District', 
                  'District Latitude', 
                  'District Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)


#Run above function and check output
madrid_venues = getNearbyVenues(names = madrid_data['District'],
                                   latitudes = madrid_data['Latitude'],
                                   longitudes = madrid_data['Longitude']
                                  )

#Check size of df created
print(madrid_venues.shape)
madrid_venues.head()

Centro
Arganzuela
Retiro
Salamanca
Chamartín
Tetuán
Chamberí
Fuencarral-El Pardo
Moncloa-Aravaca
Latina
Carabanchel
Usera
Puente de Vallecas
Moratalaz
Ciudad Lineal
Hortaleza
Villaverde
Villa de Vallecas
Vicálvaro
San Blas-Canillejas
Barajas
(849, 7)


Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Centro,40.419,-3.7118,Plaza de Oriente,40.418326,-3.712196,Plaza
1,Centro,40.419,-3.7118,Jardines de Sabatini,40.419954,-3.713126,Garden
2,Centro,40.419,-3.7118,Teatro Real de Madrid,40.418226,-3.711064,Opera House
3,Centro,40.419,-3.7118,Gran Meliá Palacio de los Duques *****,40.419835,-3.709494,Hotel
4,Centro,40.419,-3.7118,El Mollete,40.419913,-3.710503,Tapas Restaurant


#### Check number of venues returned for each neighborhood

In [9]:
#Check how many venues returned for each neighborhood
madrid_venues.groupby('District').count()

Unnamed: 0_level_0,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
District,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Arganzuela,28,28,28,28,28,28
Barajas,40,40,40,40,40,40
Carabanchel,3,3,3,3,3,3
Centro,100,100,100,100,100,100
Chamartín,40,40,40,40,40,40
Chamberí,83,83,83,83,83,83
Ciudad Lineal,13,13,13,13,13,13
Fuencarral-El Pardo,100,100,100,100,100,100
Hortaleza,100,100,100,100,100,100
Latina,8,8,8,8,8,8


#### Check number of unique venue categories

In [10]:
#Check how many unique categories
print('There are {} uniques categories.'.format(len(madrid_venues['Venue Category'].unique())))

There are 148 uniques categories.


#### Dummy encode categorical variables to be able to use them

In [11]:
#One hot encoding to change format to numerical
madrid_onehot = pd.get_dummies(madrid_venues[['Venue Category']], prefix = "", prefix_sep = "")

#Add district column back to dataframe
madrid_onehot['District'] = madrid_venues['District'] 

#Move district column to make it as first column
fixed_columns = [madrid_onehot.columns[-1]] + list(madrid_onehot.columns[:-1])
madrid_onehot = madrid_onehot[fixed_columns]
madrid_onehot.head()

#Examine new shape after OHE
madrid_onehot.shape

(849, 149)

In [12]:
#Group rows by district by taking the mean of the frequency of occurence of each category
madrid_grouped = madrid_onehot.groupby('District').mean().reset_index()
madrid_grouped

Unnamed: 0,District,Accessories Store,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,...,Taco Place,Tapas Restaurant,Thai Restaurant,Theater,Thrift / Vintage Store,Train Station,Vegetarian / Vegan Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,Arganzuela,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,...,0.0,0.107143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Barajas,0.025,0.05,0.05,0.1,0.1,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Carabanchel,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Centro,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,...,0.0,0.03,0.0,0.02,0.0,0.0,0.0,0.02,0.01,0.0
4,Chamartín,0.0,0.0,0.0,0.0,0.0,0.025,0.025,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Chamberí,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,...,0.012048,0.036145,0.0,0.036145,0.0,0.0,0.0,0.0,0.0,0.0
6,Ciudad Lineal,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.153846,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Fuencarral-El Pardo,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,...,0.0,0.05,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01
8,Hortaleza,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,...,0.0,0.05,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01
9,Latina,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0


In [34]:
#Print each district along with top 5 most common venues
num_top_venues = 5

for hood in madrid_grouped['District']:
    print("----"+hood+"----")
    temp = madrid_grouped[madrid_grouped['District'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending = False).reset_index(drop = True).head(num_top_venues))
    print('\n')

----Arganzuela----
              venue  freq
0               Bar  0.14
1  Tapas Restaurant  0.11
2             Plaza  0.07
3       Pizza Place  0.07
4       Art Gallery  0.07


----Barajas----
                 venue  freq
0  Rental Car Location  0.12
1      Airport Service  0.10
2     Airport Terminal  0.10
3       Duty-free Shop  0.10
4                 Café  0.05


----Carabanchel----
                 venue  freq
0        Metro Station  0.67
1                Motel  0.33
2    Accessories Store  0.00
3         Noodle House  0.00
4  Monument / Landmark  0.00


----Centro----
                venue  freq
0               Hotel  0.09
1  Spanish Restaurant  0.07
2               Plaza  0.06
3         Coffee Shop  0.04
4                 Bar  0.04


----Chamartín----
                      venue  freq
0        Spanish Restaurant  0.18
1                Restaurant  0.12
2  Mediterranean Restaurant  0.08
3        Seafood Restaurant  0.05
4      Gym / Fitness Center  0.05


----Chamberí----
         

In [13]:
#Sort venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [14]:
#Create new dataframe to display top 10 venues
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

#Create columns according to number of top venues
columns = ['District']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind + 1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind + 1))

#Create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns = columns)
neighborhoods_venues_sorted['District'] = madrid_grouped['District']

for ind in np.arange(madrid_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(madrid_grouped.iloc[ind, :], num_top_venues)

#Display dataframe with districts and most common venues in every district
neighborhoods_venues_sorted.head()

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Arganzuela,Bar,Tapas Restaurant,Plaza,Pizza Place,Gym / Fitness Center,Art Gallery,Spanish Restaurant,Gym,Brewery,Restaurant
1,Barajas,Rental Car Location,Airport Service,Airport Terminal,Duty-free Shop,Airport Gate,Airport Lounge,Convenience Store,Metro Station,Café,Accessories Store
2,Carabanchel,Metro Station,Motel,Food Service,Flea Market,Fast Food Restaurant,Embassy / Consulate,Electronics Store,Duty-free Shop,Dumpling Restaurant,Donut Shop
3,Centro,Hotel,Spanish Restaurant,Plaza,Ice Cream Shop,Coffee Shop,Bar,Seafood Restaurant,Tapas Restaurant,Record Shop,Italian Restaurant
4,Chamartín,Spanish Restaurant,Restaurant,Mediterranean Restaurant,Nightclub,Gym / Fitness Center,Seafood Restaurant,Bar,Steakhouse,Plaza,Pizza Place


## Cluster analysis on venues

#### Initialize Kmeans cluster

In [15]:
#Initialize number of clusters
kclusters = 5

madrid_grouped_clustering = madrid_grouped.drop('District', 1)

#Run k-means clustering
kmeans = KMeans(n_clusters = kclusters, random_state = 0).fit(madrid_grouped_clustering)

#Check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 2, 3, 2, 2, 2, 2, 2, 2, 2], dtype=int32)

#### Create new dataframe that includes clusters along with top 10 venues

In [None]:
#Add cluster labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
madrid_merged = madrid_data

#Merge to get final data
madrid_merged = madrid_merged.join(neighborhoods_venues_sorted.set_index('District'), on = 'District')

In [23]:
#Display dataset
madrid_merged

Unnamed: 0,Zip Code,District,Population,Pop Density,Streets,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,28013,Centro,131 928,25234,"Palacio , Embajadores , Cortes , Justicia , Un...",40.419,-3.7118,2.0,Hotel,Spanish Restaurant,Plaza,Ice Cream Shop,Coffee Shop,Bar,Seafood Restaurant,Tapas Restaurant,Record Shop,Italian Restaurant
1,28005,Arganzuela,151 965,23516,"Imperial , Acacias , Chopera , Legazpi , Delic...",40.405,-3.7105,2.0,Bar,Tapas Restaurant,Plaza,Pizza Place,Gym / Fitness Center,Art Gallery,Spanish Restaurant,Gym,Brewery,Restaurant
2,28009,Retiro,118 516,21682,"Pacífico , Adelfas , Estrella , Ibiza , Jeróni...",40.4162,-3.6801,2.0,Café,Bakery,Restaurant,Fountain,Park,Gastropub,Snack Place,Spanish Restaurant,Paella Restaurant,Lake
3,28001,Salamanca,143 800,26667,"Recoletos , Goya , Fuente del Berro , Guindale...",40.4262,-3.6851,2.0,Spanish Restaurant,Restaurant,Hotel,Clothing Store,Boutique,Furniture / Home Store,Tapas Restaurant,Italian Restaurant,Bakery,Accessories Store
4,28036,Chamartín,143 424,15631,"El Viso , Prosperidad , Ciudad Jardín , Hispan...",40.4618,-3.6851,2.0,Spanish Restaurant,Restaurant,Mediterranean Restaurant,Nightclub,Gym / Fitness Center,Seafood Restaurant,Bar,Steakhouse,Plaza,Pizza Place
5,28013,Tetuán,153 789,28613,"Bellas Vistas , Cuatro Caminos , Castillejos ,...",40.419,-3.7118,2.0,Hotel,Spanish Restaurant,Plaza,Ice Cream Shop,Coffee Shop,Bar,Seafood Restaurant,Tapas Restaurant,Record Shop,Italian Restaurant
6,28015,Chamberí,137 401,29364,"Gaztambide , Arapiles , Trafalgar , Almagro , ...",40.4305,-3.7105,2.0,Spanish Restaurant,Supermarket,Coffee Shop,Bar,Restaurant,Department Store,Hotel,Theater,Tapas Restaurant,Café
7,28004,Fuencarral-El Pardo,238 756,1004,"El Pardo , Fuentelarreina , Peñagrande , Pilar...",40.4245,-3.6991,2.0,Restaurant,Spanish Restaurant,Tapas Restaurant,Cocktail Bar,Hotel,Bar,Gay Bar,Italian Restaurant,Bookstore,Ice Cream Shop
8,28008,Moncloa-Aravaca,116 903,2512,"Casa de Campo , Argüelles , Ciudad Universitar...",40.43,-3.7257,2.0,Paella Restaurant,Tapas Restaurant,Bar,Spanish Restaurant,Restaurant,Pub,Movie Theater,Athletics & Sports,Argentinian Restaurant,Cocktail Bar
9,28047,Latina,233 808,9195,"Los Cármenes , Puerta del Ángel , Lucero , Alu...",40.3961,-3.7486,2.0,Scenic Lookout,Asian Restaurant,Peruvian Restaurant,Pizza Place,Plaza,Soccer Field,Paella Restaurant,Train Station,Gift Shop,Grocery Store


#### Clean dataset from missing values

In [32]:
#Remove NAs from dataset
madrid_merged.dropna(axis = 0, how = 'any', thresh = None, subset = None, inplace = True)
madrid_merged

Unnamed: 0,Zip Code,District,Population,Pop Density,Streets,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,28013,Centro,131 928,25234,"Palacio , Embajadores , Cortes , Justicia , Un...",40.419,-3.7118,2.0,Hotel,Spanish Restaurant,Plaza,Ice Cream Shop,Coffee Shop,Bar,Seafood Restaurant,Tapas Restaurant,Record Shop,Italian Restaurant
1,28005,Arganzuela,151 965,23516,"Imperial , Acacias , Chopera , Legazpi , Delic...",40.405,-3.7105,2.0,Bar,Tapas Restaurant,Plaza,Pizza Place,Gym / Fitness Center,Art Gallery,Spanish Restaurant,Gym,Brewery,Restaurant
2,28009,Retiro,118 516,21682,"Pacífico , Adelfas , Estrella , Ibiza , Jeróni...",40.4162,-3.6801,2.0,Café,Bakery,Restaurant,Fountain,Park,Gastropub,Snack Place,Spanish Restaurant,Paella Restaurant,Lake
3,28001,Salamanca,143 800,26667,"Recoletos , Goya , Fuente del Berro , Guindale...",40.4262,-3.6851,2.0,Spanish Restaurant,Restaurant,Hotel,Clothing Store,Boutique,Furniture / Home Store,Tapas Restaurant,Italian Restaurant,Bakery,Accessories Store
4,28036,Chamartín,143 424,15631,"El Viso , Prosperidad , Ciudad Jardín , Hispan...",40.4618,-3.6851,2.0,Spanish Restaurant,Restaurant,Mediterranean Restaurant,Nightclub,Gym / Fitness Center,Seafood Restaurant,Bar,Steakhouse,Plaza,Pizza Place
5,28013,Tetuán,153 789,28613,"Bellas Vistas , Cuatro Caminos , Castillejos ,...",40.419,-3.7118,2.0,Hotel,Spanish Restaurant,Plaza,Ice Cream Shop,Coffee Shop,Bar,Seafood Restaurant,Tapas Restaurant,Record Shop,Italian Restaurant
6,28015,Chamberí,137 401,29364,"Gaztambide , Arapiles , Trafalgar , Almagro , ...",40.4305,-3.7105,2.0,Spanish Restaurant,Supermarket,Coffee Shop,Bar,Restaurant,Department Store,Hotel,Theater,Tapas Restaurant,Café
7,28004,Fuencarral-El Pardo,238 756,1004,"El Pardo , Fuentelarreina , Peñagrande , Pilar...",40.4245,-3.6991,2.0,Restaurant,Spanish Restaurant,Tapas Restaurant,Cocktail Bar,Hotel,Bar,Gay Bar,Italian Restaurant,Bookstore,Ice Cream Shop
8,28008,Moncloa-Aravaca,116 903,2512,"Casa de Campo , Argüelles , Ciudad Universitar...",40.43,-3.7257,2.0,Paella Restaurant,Tapas Restaurant,Bar,Spanish Restaurant,Restaurant,Pub,Movie Theater,Athletics & Sports,Argentinian Restaurant,Cocktail Bar
9,28047,Latina,233 808,9195,"Los Cármenes , Puerta del Ángel , Lucero , Alu...",40.3961,-3.7486,2.0,Scenic Lookout,Asian Restaurant,Peruvian Restaurant,Pizza Place,Plaza,Soccer Field,Paella Restaurant,Train Station,Gift Shop,Grocery Store


#### Visualize results

In [48]:
#Create map
map_clusters = folium.Map(location = [Latitude, Longitude], zoom_start = 11)

#Set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

#Add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(madrid_merged['Latitude'], madrid_merged['Longitude'], madrid_merged['District'], madrid_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html = True)
    folium.CircleMarker(
        [lat, lon],
        radius = 5,
        popup = label,
        color = rainbow[int(cluster)-1],
        fill = True,
        fill_color = rainbow[int(cluster)-1],
        fill_opacity = 0.7).add_to(map_clusters)
       
map_clusters

### Examine cluster results

In [36]:
#Cluster 0
madrid_merged.loc[madrid_merged['Cluster Labels'] == 0, madrid_merged.columns[[1] + list(range(5, madrid_merged.shape[1]))]]

Unnamed: 0,District,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Puente de Vallecas,40.3966,-3.6547,0.0,Grocery Store,Diner,Big Box Store,Beer Garden,Electronics Store,Scenic Lookout,Gym / Fitness Center,Park,Creperie,Cupcake Shop


In [37]:
#Cluster 1
madrid_merged.loc[madrid_merged['Cluster Labels'] == 1, madrid_merged.columns[[1] + list(range(5, madrid_merged.shape[1]))]]

Unnamed: 0,District,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Usera,40.3848,-3.7054,1.0,Fast Food Restaurant,Seafood Restaurant,Mobile Phone Shop,Metro Station,Bubble Tea Shop,Pub,Pool,Spanish Restaurant,Bakery,BBQ Joint


In [40]:
#Cluster 2
madrid_merged.loc[madrid_merged['Cluster Labels'] == 2, madrid_merged.columns[[1] + list(range(5, madrid_merged.shape[1]))]]

Unnamed: 0,District,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Centro,40.419,-3.7118,2.0,Hotel,Spanish Restaurant,Plaza,Ice Cream Shop,Coffee Shop,Bar,Seafood Restaurant,Tapas Restaurant,Record Shop,Italian Restaurant
1,Arganzuela,40.405,-3.7105,2.0,Bar,Tapas Restaurant,Plaza,Pizza Place,Gym / Fitness Center,Art Gallery,Spanish Restaurant,Gym,Brewery,Restaurant
2,Retiro,40.4162,-3.6801,2.0,Café,Bakery,Restaurant,Fountain,Park,Gastropub,Snack Place,Spanish Restaurant,Paella Restaurant,Lake
3,Salamanca,40.4262,-3.6851,2.0,Spanish Restaurant,Restaurant,Hotel,Clothing Store,Boutique,Furniture / Home Store,Tapas Restaurant,Italian Restaurant,Bakery,Accessories Store
4,Chamartín,40.4618,-3.6851,2.0,Spanish Restaurant,Restaurant,Mediterranean Restaurant,Nightclub,Gym / Fitness Center,Seafood Restaurant,Bar,Steakhouse,Plaza,Pizza Place
5,Tetuán,40.419,-3.7118,2.0,Hotel,Spanish Restaurant,Plaza,Ice Cream Shop,Coffee Shop,Bar,Seafood Restaurant,Tapas Restaurant,Record Shop,Italian Restaurant
6,Chamberí,40.4305,-3.7105,2.0,Spanish Restaurant,Supermarket,Coffee Shop,Bar,Restaurant,Department Store,Hotel,Theater,Tapas Restaurant,Café
7,Fuencarral-El Pardo,40.4245,-3.6991,2.0,Restaurant,Spanish Restaurant,Tapas Restaurant,Cocktail Bar,Hotel,Bar,Gay Bar,Italian Restaurant,Bookstore,Ice Cream Shop
8,Moncloa-Aravaca,40.43,-3.7257,2.0,Paella Restaurant,Tapas Restaurant,Bar,Spanish Restaurant,Restaurant,Pub,Movie Theater,Athletics & Sports,Argentinian Restaurant,Cocktail Bar
9,Latina,40.3961,-3.7486,2.0,Scenic Lookout,Asian Restaurant,Peruvian Restaurant,Pizza Place,Plaza,Soccer Field,Paella Restaurant,Train Station,Gift Shop,Grocery Store


In [41]:
#Cluster 3
madrid_merged.loc[madrid_merged['Cluster Labels'] == 3, madrid_merged.columns[[1] + list(range(5, madrid_merged.shape[1]))]]

Unnamed: 0,District,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Carabanchel,40.3787,-3.7359,3.0,Metro Station,Motel,Food Service,Flea Market,Fast Food Restaurant,Embassy / Consulate,Electronics Store,Duty-free Shop,Dumpling Restaurant,Donut Shop


In [42]:
#Cluster 4
madrid_merged.loc[madrid_merged['Cluster Labels'] == 4, madrid_merged.columns[[1] + list(range(5, madrid_merged.shape[1]))]]

Unnamed: 0,District,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,Villaverde,40.3367,-3.6978,4.0,Food & Drink Shop,Spanish Restaurant,Electronics Store,Donut Shop,Flea Market,Fast Food Restaurant,Embassy / Consulate,Duty-free Shop,Dumpling Restaurant,Yoga Studio
