# Capstone Project

## Table of Contents of the report

<div style="margin-top: 20px">

<font size = 3>

1.  <a>Introduction and Business proposition</a>

2.  <a>Data source and preprocessing</a>

3.  <a>Methodology</a>

4.  <a>Results</a>

5.  <a>Discussion</a>

6.  <a>Conclusion</a>  
    </font>
    </div>



## 1. Introduction and Business Proposition

In Spain, wealth is concentrated mainly in two cities: Madrid (the capital) and Barcelona. They have the highest population density and concentrate most of the country's industrial fabric. They are also known for the stress of city life, traffic jams, and pollution.
The COVID-19 pandemic has led many traditional companies to allow their staff to work from home.

This situation has lasted for almost a year now, and many people are considering making the change permanent. Many companies have accepted it, even those that previously considered in-person work indispensable.
As the situation has dragged on, many people have moved to remote destinations, needing only a good internet connection to telework.

People who relocate tend to avoid mainstream tourist locations and instead move (albeit temporarily) to lesser-known destinations that are cheap and close to nature (mountains, beaches, etc.).

### Objectives

The purpose of this project is to help decide which destinations are most attractive for people leaving the main cities for semi-permanent telework.
This analysis could be used by construction companies, or by new companies providing services to this new generation of workers moving away from cities.


### Project definition and scope

The analysis will try to find a balance between working far from big cities and having access to the conveniences and amenities of modern life. We therefore discard deeply rural zones, where the internet connection may be a problem and access to supermarkets, pharmacies and leisure is not always available.

The project will analyse which Spanish cities, excluding the main ones, offer the best balance between 'rural' life and access to 'city conveniences'.

## 2. Data source and preprocessing

### Description of the data

We will use the following site to extract information about the cities of Spain:
https://códigospostales.es/listado-de-codigos-postales-de-espana/

The site provides a CSV file (listado-codigos-postales-con-LatyLon.csv) listing provinces, cities and postal codes together with their latitude and longitude.

Below we describe the process used to collect the data and transform it into a pandas DataFrame, which is later used to conduct the analysis and cluster the different locations.

### Data downloading and preprocessing

In [2]:
import csv
import xml
import urllib.request
import numpy as np
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if geopy is not installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle HTTP requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if folium is not installed
import folium # map rendering library

### Let's download the data and store it in the local folder for later processing

In [3]:
url= "https://xn--cdigospostales-lob.es/wp-content/uploads/2018/09/listado-codigos-postales-con-LatyLon.csv"
urllib.request.urlretrieve(url, 'listado-codigos-postales-con-LatyLon.csv')

('listado-codigos-postales-con-LatyLon.csv',
 <http.client.HTTPMessage at 0x1054b8670>)
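
The download URL above uses the Punycode (IDNA) form of the host, because the site's domain name contains a non-ASCII character (`ó`). As a small sketch, Python's built-in `idna` codec can reproduce that conversion; the variable names are illustrative only.

```python
# Convert the human-readable host into the ASCII-only form used in URLs.
readable_host = 'códigospostales.es'
ascii_host = readable_host.encode('idna').decode('ascii')

print(ascii_host)
```

The printed host should match the one that appears in the `url` variable of the download cell, which is a quick way to confirm that both addresses point to the same site.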

In [4]:
cities = pd.read_csv('listado-codigos-postales-con-LatyLon.csv', delimiter=';') 
cities.head()

Unnamed: 0,provincia,poblacion,codigopostalid,lat,lon
0,Araba/Álava,Alegría-Dulantzi,240,-2.712437,42.939812
1,Araba/Álava,Alegría-Dulantzi,1193,-2.712437,42.939812
2,Araba/Álava,Amurrio,1450,-3.000073,43.054278
3,Araba/Álava,Amurrio,1468,-3.000073,43.054278
4,Araba/Álava,Amurrio,1470,-3.000073,43.054278


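### Latitude and longitude are swapped

It is important to note that the latitude and longitude columns in the CSV are swapped: the `lat` column actually holds the longitude and the `lon` column holds the latitude. The first row, for example, reads lat = -2.71 and lon = 42.94, while mainland Spain lies at roughly 36° to 44° N. So the first step is to rename the columns correctly.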

In [5]:
cities = cities.rename(columns={'lat':'longitude', 'lon':'latitude'})
cities.head()

Unnamed: 0,provincia,poblacion,codigopostalid,longitude,latitude
0,Araba/Álava,Alegría-Dulantzi,240,-2.712437,42.939812
1,Araba/Álava,Alegría-Dulantzi,1193,-2.712437,42.939812
2,Araba/Álava,Amurrio,1450,-3.000073,43.054278
3,Araba/Álava,Amurrio,1468,-3.000073,43.054278
4,Araba/Álava,Amurrio,1470,-3.000073,43.054278
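
The swap described above was spotted by eye. A minimal sketch of an automatic check is shown below; it assumes a rough bounding box for Spain (Canary Islands included), and both the box limits and the function names are assumptions made for this example rather than part of the original analysis.

```python
# Approximate bounding box for Spain, Canary Islands included (assumed values).
LAT_RANGE = (27.0, 44.0)
LON_RANGE = (-19.0, 5.0)

def in_spain(lat, lon):
    """Return True if (lat, lon) falls inside the approximate bounding box."""
    return (LAT_RANGE[0] <= lat <= LAT_RANGE[1]
            and LON_RANGE[0] <= lon <= LON_RANGE[1])

def columns_look_swapped(rows):
    """rows: iterable of (lat, lon) pairs exactly as labelled in the file.

    Returns True when no pair is plausible as given, but every pair becomes
    plausible once the two values are exchanged.
    """
    rows = list(rows)
    if not rows:
        return False
    none_ok_as_given = not any(in_spain(lat, lon) for lat, lon in rows)
    all_ok_swapped = all(in_spain(lon, lat) for lat, lon in rows)
    return none_ok_as_given and all_ok_swapped

# First rows of the raw CSV, with values as labelled in its 'lat' and 'lon' columns.
raw_rows = [(-2.712437, 42.939812), (-3.000073, 43.054278)]
print(columns_look_swapped(raw_rows))
```

A check like this only flags the problem; the fix is still the column rename shown in the cell above.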


In [6]:
print('The dataframe has {} provinces and {} postal-code rows.'.format(
        len(cities['provincia'].unique()),
        cities.shape[0]
    )
)

The dataframe has 52 provinces and 14665 postal-code rows.


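#### Remove unused data

Each province contains a number of towns, and its capital often has the same name as the province. We keep only the rows where the town name matches the province name, which drops the minor towns (and also any capital whose name differs from its province's, e.g. Vitoria-Gasteiz in Araba/Álava). We then exclude the big cities: Madrid, Barcelona and Valencia.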


In [7]:
# keep only rows whose town name equals the province name (provincial capitals)
main_cities = cities[cities['provincia']==cities['poblacion']]
main_cities = main_cities.drop_duplicates(['provincia','poblacion'], keep='last')

# exclude the biggest cities
big_cities = ['Madrid', 'Barcelona', 'Valencia']
main_cities = main_cities[~main_cities['poblacion'].isin(big_cities)]
main_cities = main_cities.reset_index()
main_cities.head()

Unnamed: 0,index,provincia,poblacion,codigopostalid,longitude,latitude
0,148,Albacete,Albacete,2512,-1.855747,38.995881
1,349,Alicante/Alacant,Alicante/Alacant,3699,-0.483183,38.345487
2,586,Almería,Almería,4160,-2.464132,36.838924
3,791,Ávila,Ávila,5197,-4.697713,40.65587
4,1099,Badajoz,Badajoz,6195,-6.970997,38.878743
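
The name-matching filter above silently drops every province whose capital has a different name. The sketch below illustrates this on a handful of (provincia, poblacion) pairs; the first four pairs appear in the tables above, and the last one is added purely for illustration.

```python
# Toy (provincia, poblacion) pairs used to illustrate the filter.
rows = [
    ('Araba/Álava', 'Alegría-Dulantzi'),
    ('Araba/Álava', 'Amurrio'),
    ('Albacete', 'Albacete'),
    ('Badajoz', 'Badajoz'),
    ('Albacete', 'Hellín'),  # illustrative extra row
]

# Provinces that survive the "town name equals province name" filter.
kept = {prov for prov, town in rows if prov == town}

# Provinces that have no row left after the filter.
all_provinces = {prov for prov, _ in rows}
dropped = sorted(all_provinces - kept)

print(dropped)
```

On the real data, the equivalent comparison would be `set(cities['provincia']) - set(main_cities['provincia'])`, which would also list the provinces of the cities that were excluded on purpose.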


In [8]:
# create map of Spain using latitude and longitude values
map_spain = folium.Map(location=[36.976, -4.27], zoom_start=6)

# add markers to map
for lat, lng, province, city in zip(main_cities['latitude'], main_cities['longitude'], main_cities['provincia'], main_cities['poblacion']):
    label = '{}, {}'.format(city, province)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_spain)  
    
map_spain

In [9]:
import os

# Read the Foursquare credentials from the environment instead of hard-coding them
# (the environment variable names below are an assumption; use your own)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
ACCESS_TOKEN = os.environ['FOURSQUARE_ACCESS_TOKEN'] # your Foursquare Access Token
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Credentials loaded.')

Credentials loaded.


In [10]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query the Foursquare explore endpoint around each city and return one row per venue."""
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&oauth_token={}&radius={}&limit={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        lat, 
        lng,
        ACCESS_TOKEN,
        radius, 
        LIMIT)

        # make the GET request and fail early on HTTP errors
        response = requests.get(url)
        response.raise_for_status()
        results = response.json()["response"]['groups'][0]['items']
        
        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-city lists into a single dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['City', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
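
The function above indexes straight into the response, so a city with no result group or a venue with an empty category list would raise an error. As a hedged sketch, the helper below shows one way to parse the same nesting defensively. The function name `extract_venues`, the `'Unknown'` placeholder and the sample payload are all hypothetical and hand-written for this example; only the key names are taken from the function above.

```python
def extract_venues(payload, city):
    """Flatten an explore-style payload into (city, venue name, category) tuples.

    Assumes the same nesting that getNearbyVenues reads:
    payload['response']['groups'][0]['items'][i]['venue'].
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        categories = venue.get('categories') or []
        # Fall back to a placeholder instead of raising IndexError.
        category = categories[0]['name'] if categories else 'Unknown'
        rows.append((city, venue.get('name'), category))
    return rows

# Hypothetical payload, hand-written for this example (not real API output).
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Venue A', 'categories': [{'name': 'Theater'}]}},
    {'venue': {'name': 'Venue B', 'categories': []}},
]}]}}

print(extract_venues(sample, 'Albacete'))
print(extract_venues({'response': {}}, 'Albacete'))
```

Whether to keep venues without a category or to drop them is a design choice; a placeholder keeps the venue counts per city intact, at the cost of adding one extra category to the later one-hot encoding.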

In [11]:
# select the city names and coordinates to query
city = main_cities.loc[:, 'poblacion']
latitudes = main_cities.loc[:, 'latitude']
longitudes = main_cities.loc[:, 'longitude']

spain_venues = getNearbyVenues(city, latitudes, longitudes, radius=500)

Albacete
Alicante/Alacant
Almería
Ávila
Badajoz
Burgos
Cáceres
Cádiz
Ciudad Real
Córdoba
A Coruña
Cuenca
Girona
Granada
Guadalajara
Huelva
Huesca
Jaén
León
Lleida
Lugo
Málaga
Murcia
Ourense
Palencia
Pontevedra
Salamanca
Santa Cruz de Tenerife
Segovia
Sevilla
Soria
Tarragona
Teruel
Toledo
Valladolid
Zamora
Zaragoza
Ceuta
Melilla


In [12]:
print(spain_venues.shape)
spain_venues.head()

(2787, 7)


Unnamed: 0,City,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Albacete,38.995881,-1.855747,Asador Concepción,38.994365,-1.855289,Spanish Restaurant
1,Albacete,38.995881,-1.855747,Gran Hotel Albacete,38.994349,-1.85396,Hotel
2,Albacete,38.995881,-1.855747,La Bodega de Serapio,38.99508,-1.856989,Winery
3,Albacete,38.995881,-1.855747,Piacere Gelato dil giorno,38.994309,-1.855284,Ice Cream Shop
4,Albacete,38.995881,-1.855747,Teatro Circo,38.995807,-1.854121,Theater


## 3. Methodology


In [13]:
spain_venues.groupby('City').count()

City,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
A Coruña,100,100,100,100,100,100
Albacete,84,84,84,84,84,84
Alicante/Alacant,100,100,100,100,100,100
Almería,100,100,100,100,100,100
Badajoz,53,53,53,53,53,53
Burgos,100,100,100,100,100,100
Ceuta,32,32,32,32,32,32
Ciudad Real,48,48,48,48,48,48
Cuenca,38,38,38,38,38,38
Cáceres,76,76,76,76,76,76


In [14]:
print('There are {} unique categories.'.format(len(spain_venues['Venue Category'].unique())))

There are 207 unique categories.


### Analysing cities

In [15]:
# one hot encoding
spain_onehot = pd.get_dummies(spain_venues[['Venue Category']], prefix="", prefix_sep="")

# add the city column back to the dataframe
spain_onehot['City'] = spain_venues['City'] 

# move the city column to the first position
fixed_columns = [spain_onehot.columns[-1]] + list(spain_onehot.columns[:-1])
spain_onehot = spain_onehot[fixed_columns]

spain_onehot.head()

Unnamed: 0,City,ATM,Accessories Store,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bakery,Bank,Bar,Bay,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Betting Shop,Bike Rental / Bike Share,Bistro,Board Shop,Boarding House,Bookstore,Botanical Garden,Boutique,Breakfast Spot,Brewery,Bridal Shop,Buffet,Burger Joint,Burrito Place,Bus Station,Business Service,Cafeteria,Café,Camera Store,Candy Store,Casino,Castle,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,City Hall,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Creperie,Cultural Center,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Dive Bar,Donut Shop,Electronics Store,Empanada Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Financial or Legal Service,Fish & Chips Shop,Fish Market,Flea Market,Food,Food & Drink Shop,Food Court,French Restaurant,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Harbor / Marina,Health & Beauty Service,Heliport,Historic Site,History Museum,Hobby Shop,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Housing Development,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Insurance Office,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Kebab Restaurant,Kids Store,Lawyer,Lingerie Store,Lounge,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Mosque,Motel,Movie Theater,Multiplex,Museum,Music Venue,Nail Salon,Neighborhood,Nightclub,Nightlife Spot,Optical Shop,Other Nightlife,Outdoors & Recreation,Paella Restaurant,Palace,Paper / Office Supplies Store,Park,Parking,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Pharmacy,Photography Studio,Pie Shop,Pizza Place,Playground,Plaza,Pub,Public Art,Record Shop,Rental Car Location,Residential Building (Apartment / Condo),Rest Area,Restaurant,Road,Rock Club,Roof Deck,Rooftop Bar,Sake Bar,Salad Place,Sandwich Place,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Snack Place,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Tapas Restaurant,Tea Room,Tech Startup,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Tourist Information Center,Toy / Game Store,Trail,Train Station,Travel Agency,Udon Restaurant,Vegetarian / Vegan Restaurant,Water Park,Whisky Bar,Wine Bar,Winery,Women's Store
0,Albacete,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Albacete,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Albacete,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
3,Albacete,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Albacete,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [16]:
spain_onehot.shape

(2787, 208)

Next, let's group the rows by city and take the mean of each one-hot column, which gives the frequency of occurrence of each category in each city.

In [17]:
spain_grouped = spain_onehot.groupby('City').mean().reset_index()
spain_grouped

Unnamed: 0,City,ATM,Accessories Store,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bakery,Bank,Bar,Bay,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Betting Shop,Bike Rental / Bike Share,Bistro,Board Shop,Boarding House,Bookstore,Botanical Garden,Boutique,Breakfast Spot,Brewery,Bridal Shop,Buffet,Burger Joint,Burrito Place,Bus Station,Business Service,Cafeteria,Café,Camera Store,Candy Store,Casino,Castle,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,City Hall,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Creperie,Cultural Center,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Dive Bar,Donut Shop,Electronics Store,Empanada Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Financial or Legal Service,Fish & Chips Shop,Fish Market,Flea Market,Food,Food & Drink Shop,Food Court,French Restaurant,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Harbor / Marina,Health & Beauty Service,Heliport,Historic Site,History Museum,Hobby Shop,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Housing Development,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Insurance Office,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Kebab Restaurant,Kids Store,Lawyer,Lingerie Store,Lounge,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Mosque,Motel,Movie Theater,Multiplex,Museum,Music Venue,Nail Salon,Neighborhood,Nightclub,Nightlife Spot,Optical Shop,Other Nightlife,Outdoors & Recreation,Paella Restaurant,Palace,Paper / Office Supplies Store,Park,Parking,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Pharmacy,Photography Studio,Pie Shop,Pizza Place,Playground,Plaza,Pub,Public Art,Record Shop,Rental Car Location,Residential Building (Apartment / Condo),Rest Area,Restaurant,Road,Rock Club,Roof Deck,Rooftop Bar,Sake Bar,Salad Place,Sandwich Place,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Snack Place,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Tapas Restaurant,Tea Room,Tech Startup,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Tourist Information Center,Toy / Game Store,Trail,Train Station,Travel Agency,Udon Restaurant,Vegetarian / Vegan Restaurant,Water Park,Whisky Bar,Wine Bar,Winery,Women's Store
0,A Coruña,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.12,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.05,0.02,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.12,0.01,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0
1,Albacete,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.059524,0.011905,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.011905,0.047619,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.011905,0.0,0.011905,0.011905,0.0,0.0,0.011905,0.0,0.011905,0.0,0.0,0.0,0.02381,0.011905,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.011905,0.0,0.011905,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,0.011905,0.0,0.0,0.0,0.011905,0.011905,0.0,0.011905,0.0,0.011905,0.0,0.011905,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,0.035714,0.047619,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,0.011905,0.047619,0.0,0.0,0.0,0.0,0.011905,0.0,0.071429,0.02381,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,0.011905,0.0
2,Alicante/Alacant,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.07,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.06,0.02,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.16,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.03,0.0,0.0
3,Almería,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.09,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.03,0.07,0.0,0.0,0.0,0.0,0.0,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.06,0.01,0.0,0.0,0.0,0.0,0.0,0.16,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0
4,Badajoz,0.018868,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.0,0.056604,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.056604,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.075472,0.037736,0.018868,0.0,0.0,0.018868,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09434,0.056604,0.0,0.0,0.0,0.0,0.0,0.037736,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.09434,0.0,0.0,0.0,0.0,0.018868,0.0,0.09434,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Burgos,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.08,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.08,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.09,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.16,0.0,0.0,0.0,0.0,0.0,0.01,0.05,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0
6,Ceuta,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.03125,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.03125,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.03125,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.09375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0
7,Ciudad Real,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.020833,0.020833,0.0,0.020833,0.0,0.0,0.0,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0625,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.020833,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667
8,Cuenca,0.0,0.026316,0.026316,0.026316,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.026316,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.078947,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.078947,0.026316,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.210526,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Cáceres,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.026316,0.0,0.0,0.065789,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.039474,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.026316,0.0,0.0,0.0,0.078947,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.065789,0.039474,0.0,0.0,0.0,0.0,0.0,0.092105,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.013158,0.0,0.0,0.013158,0.131579,0.0,0.013158,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [18]:
spain_grouped.shape

(39, 208)
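
To make the grouping step concrete, the sketch below computes the same kind of per-city category frequency by hand, using only the standard library. The city and category values are made up for illustration.

```python
from collections import Counter, defaultdict

# Toy (city, category) pairs, made up for illustration.
venues = [
    ('City A', 'Bar'), ('City A', 'Bar'), ('City A', 'Plaza'), ('City A', 'Hotel'),
    ('City B', 'Hotel'), ('City B', 'Plaza'),
]

# Count how often each category appears in each city.
counts = defaultdict(Counter)
for city, category in venues:
    counts[city][category] += 1

# Share of each category among a city's venues; this equals the mean of the
# corresponding one-hot column within that city.
frequencies = {
    city: {cat: n / sum(cat_counts.values()) for cat, n in cat_counts.items()}
    for city, cat_counts in counts.items()
}

print(frequencies['City A'])
print(frequencies['City B'])
```

Because each value is a share of that city's venues, cities with few venues get coarse frequencies: with the 32 venues returned for Ceuta, a single venue already accounts for 1/32 = 0.03125, which is the smallest non-zero value in its row of the table above.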

In [19]:
num_top_venues = 5

# print the most frequent venue categories for each city
for city_name in spain_grouped['City']:
    print("----"+city_name+"----")
    temp = spain_grouped[spain_grouped['City'] == city_name].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:] # drop the 'City' row left over from the transpose
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----A Coruña----
                venue  freq
0    Tapas Restaurant  0.12
1                 Bar  0.12
2  Spanish Restaurant  0.06
3          Restaurant  0.05
4               Plaza  0.05


----Albacete----
              venue  freq
0  Tapas Restaurant  0.07
1              Café  0.06
2        Restaurant  0.05
3       Coffee Shop  0.05
4               Pub  0.05


----Alicante/Alacant----
                venue  freq
0    Tapas Restaurant  0.16
1          Restaurant  0.10
2  Spanish Restaurant  0.07
3  Italian Restaurant  0.07
4               Plaza  0.06


----Almería----
                venue  freq
0    Tapas Restaurant  0.16
1                 Bar  0.09
2                 Pub  0.07
3  Spanish Restaurant  0.06
4          Restaurant  0.04


----Badajoz----
                venue  freq
0    Tapas Restaurant  0.09
1               Plaza  0.09
2  Spanish Restaurant  0.09
3      Clothing Store  0.08
4                Café  0.06


----Burgos----
                venue  freq
0  Spanish Restaurant  0.16


In [20]:
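def return_most_common_venues(row, num_top_venues):
    """Return the names of the num_top_venues most frequent categories in a row of spain_grouped."""
    row_categories = row.iloc[1:] # skip the 'City' entry
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]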

In [30]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['City']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # no 'st'/'nd'/'rd' suffix left, fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['City'] = spain_grouped['City']

for ind in np.arange(spain_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(spain_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,A Coruña,Bar,Tapas Restaurant,Spanish Restaurant,Plaza,Restaurant,Wine Bar,Ice Cream Shop,Seafood Restaurant,Cupcake Shop,Coffee Shop
1,Albacete,Tapas Restaurant,Café,Restaurant,Coffee Shop,Spanish Restaurant,Pub,Plaza,Hotel,Bar,Grocery Store
2,Alicante/Alacant,Tapas Restaurant,Restaurant,Italian Restaurant,Spanish Restaurant,Plaza,Mediterranean Restaurant,Ice Cream Shop,Coffee Shop,Pizza Place,Wine Bar
3,Almería,Tapas Restaurant,Bar,Pub,Spanish Restaurant,Hotel,Restaurant,Plaza,Mediterranean Restaurant,Coffee Shop,Café
4,Badajoz,Spanish Restaurant,Tapas Restaurant,Plaza,Clothing Store,Café,Pub,Bar,Cocktail Bar,Bakery,Restaurant


In [31]:
# set number of clusters
kclusters = 5

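# keep only the numeric frequency columns (the positional axis argument was removed in pandas 2.0)
spain_grouped_clustering = spain_grouped.drop(columns='City')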

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(spain_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 1, 2, 2, 1, 3, 1, 2, 3, 3, 1, 3, 2, 2, 1, 1, 2, 2, 1, 2, 1, 4,
       2, 2, 2, 1, 2, 1, 1, 3, 2, 1, 2, 3, 3, 1, 0, 2, 3], dtype=int32)
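
The value `kclusters = 5` is set by hand, and in the labels above clusters 0 and 4 each contain a single city, which is one sign that the number of clusters may be worth revisiting. One common check is to compare the mean silhouette score across several values of k. The following is a minimal sketch and not part of the original analysis: `silhouette_by_k` is a hypothetical helper, and `toy_features` is random data standing in for `spain_grouped_clustering`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(features, k_values, random_state=0):
    """Return {k: mean silhouette score} for each candidate number of clusters."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(features)
        scores[k] = float(silhouette_score(features, labels))
    return scores


# Synthetic stand-in for spain_grouped_clustering: 39 rows of category
# frequencies that each sum to 1 (random values, not real venue data).
rng = np.random.default_rng(0)
toy_features = rng.dirichlet(np.ones(8), size=39)

scores = silhouette_by_k(toy_features, range(2, 7))
best_k = max(scores, key=scores.get)
print(scores)
print("k with the highest silhouette score:", best_k)
```

On the real data the call would be `silhouette_by_k(spain_grouped_clustering, range(2, 11))`. A higher score indicates better separated clusters, but it is only a heuristic and should be read together with the cluster contents.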

In [32]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

spain_merged = main_cities

# join the cluster labels and top venues onto main_cities (which holds latitude/longitude), matching 'poblacion' to 'City'
spain_merged = spain_merged.join(neighborhoods_venues_sorted.set_index('City'), on='poblacion')

spain_merged.head() # check the last columns!

Unnamed: 0,index,provincia,poblacion,codigopostalid,longitude,latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,148,Albacete,Albacete,2512,-1.855747,38.995881,1,Tapas Restaurant,Café,Restaurant,Coffee Shop,Spanish Restaurant,Pub,Plaza,Hotel,Bar,Grocery Store
1,349,Alicante/Alacant,Alicante/Alacant,3699,-0.483183,38.345487,2,Tapas Restaurant,Restaurant,Italian Restaurant,Spanish Restaurant,Plaza,Mediterranean Restaurant,Ice Cream Shop,Coffee Shop,Pizza Place,Wine Bar
2,586,Almería,Almería,4160,-2.464132,36.838924,2,Tapas Restaurant,Bar,Pub,Spanish Restaurant,Hotel,Restaurant,Plaza,Mediterranean Restaurant,Coffee Shop,Café
3,791,Ávila,Ávila,5197,-4.697713,40.65587,3,Spanish Restaurant,Restaurant,Hotel,Plaza,Bar,Café,Dessert Shop,Pharmacy,Food,Electronics Store
4,1099,Badajoz,Badajoz,6195,-6.970997,38.878743,1,Spanish Restaurant,Tapas Restaurant,Plaza,Clothing Store,Café,Pub,Bar,Cocktail Bar,Bakery,Restaurant
5,1992,Burgos,Burgos,9199,-3.704198,42.34113,3,Spanish Restaurant,Restaurant,Bar,Café,Hotel,Tapas Restaurant,Italian Restaurant,Gastropub,Clothing Store,Pizza Place
6,2500,Cáceres,Cáceres,10920,-6.371211,39.473168,3,Spanish Restaurant,Tapas Restaurant,Restaurant,Hotel,Plaza,Bar,Historic Site,Pub,Café,Art Gallery
7,2793,Cádiz,Cádiz,11012,-6.284146,36.521712,1,Park,Tapas Restaurant,Italian Restaurant,Plaza,Bar,Spanish Restaurant,ATM,Furniture / Home Store,Gastropub,Bistro
8,3103,Ciudad Real,Ciudad Real,13197,-3.93132,38.986518,2,Tapas Restaurant,Clothing Store,Hotel,Mobile Phone Shop,Coffee Shop,Plaza,Supermarket,Ice Cream Shop,Restaurant,Café
9,3267,Córdoba,Córdoba,14912,-4.780325,37.879542,3,Spanish Restaurant,Hotel,Restaurant,Tapas Restaurant,Plaza,History Museum,Hostel,Bar,Spa,Historic Site


In [33]:
# create map
map_clusters = folium.Map(location=[40.6558, -4.6977], zoom_start=5)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(spain_merged['latitude'], spain_merged['longitude'], spain_merged['poblacion'], spain_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Analysing each cluster

In [25]:
spain_merged.loc[spain_merged['Cluster Labels'] == 0, spain_merged.columns[[1] + list(range(5, spain_merged.shape[1]))]][0:5]

Unnamed: 0,provincia,latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
35,Zamora,41.49914,0,Spanish Restaurant,Park,Gastropub,Scenic Lookout,Castle,Women's Store,Empanada Restaurant,Food,Flea Market,Fish Market


In [26]:
spain_merged.loc[spain_merged['Cluster Labels'] == 1, spain_merged.columns[[1] + list(range(5, spain_merged.shape[1]))]][0:5]

Unnamed: 0,provincia,latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Albacete,38.995881,1,Tapas Restaurant,Café,Restaurant,Coffee Shop,Spanish Restaurant,Pub,Plaza,Hotel,Bar,Grocery Store
4,Badajoz,38.878743,1,Spanish Restaurant,Tapas Restaurant,Plaza,Clothing Store,Café,Pub,Bar,Cocktail Bar,Bakery,Restaurant
7,Cádiz,36.521712,1,Park,Tapas Restaurant,Italian Restaurant,Plaza,Bar,Spanish Restaurant,ATM,Furniture / Home Store,Gastropub,Bistro
10,A Coruña,43.371266,1,Bar,Tapas Restaurant,Spanish Restaurant,Plaza,Restaurant,Wine Bar,Ice Cream Shop,Seafood Restaurant,Cupcake Shop,Coffee Shop
14,Guadalajara,40.634355,1,Mobile Phone Shop,Bar,Spanish Restaurant,Brewery,Plaza,Pub,Restaurant,Tech Startup,Park,Optical Shop


In [27]:
spain_merged.loc[spain_merged['Cluster Labels'] == 2, spain_merged.columns[[1] + list(range(5, spain_merged.shape[1]))]][0:5]

Unnamed: 0,provincia,latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Alicante/Alacant,38.345487,2,Tapas Restaurant,Restaurant,Italian Restaurant,Spanish Restaurant,Plaza,Mediterranean Restaurant,Ice Cream Shop,Coffee Shop,Pizza Place,Wine Bar
2,Almería,36.838924,2,Tapas Restaurant,Bar,Pub,Spanish Restaurant,Hotel,Restaurant,Plaza,Mediterranean Restaurant,Coffee Shop,Café
8,Ciudad Real,38.986518,2,Tapas Restaurant,Clothing Store,Hotel,Mobile Phone Shop,Coffee Shop,Plaza,Supermarket,Ice Cream Shop,Restaurant,Café
12,Girona,41.981861,2,Restaurant,Mediterranean Restaurant,Plaza,Spanish Restaurant,Café,Hotel,Bar,Wine Bar,Bakery,Ice Cream Shop
13,Granada,37.176419,2,Tapas Restaurant,Hotel,Spanish Restaurant,Plaza,Bar,Café,Gift Shop,Italian Restaurant,Seafood Restaurant,Moroccan Restaurant


In [28]:
spain_merged.loc[spain_merged['Cluster Labels'] == 3, spain_merged.columns[[1] + list(range(5, spain_merged.shape[1]))]][0:5]

Unnamed: 0,provincia,latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Ávila,40.65587,3,Spanish Restaurant,Restaurant,Hotel,Plaza,Bar,Café,Dessert Shop,Pharmacy,Food,Electronics Store
5,Burgos,42.34113,3,Spanish Restaurant,Restaurant,Bar,Café,Hotel,Tapas Restaurant,Italian Restaurant,Gastropub,Clothing Store,Pizza Place
6,Cáceres,39.473168,3,Spanish Restaurant,Tapas Restaurant,Restaurant,Hotel,Plaza,Bar,Historic Site,Pub,Café,Art Gallery
9,Córdoba,37.879542,3,Spanish Restaurant,Hotel,Restaurant,Tapas Restaurant,Plaza,History Museum,Hostel,Bar,Spa,Historic Site
11,Cuenca,40.076538,3,Spanish Restaurant,Historic Site,Plaza,Restaurant,Hotel,Arts & Crafts Store,Bar,Cocktail Bar,Pizza Place,Science Museum


## 4. Results
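The clustering suggests that the most common venue category weighs heavily in the k-means assignment: in the samples above, all five cities shown for cluster 3 have Spanish Restaurant as their most common venue, and four of the five shown for cluster 2 have Tapas Restaurant. These results do not give an explicit answer to which city is the best semi-permanent destination for a teleworker, since that depends on each person's profile and preferences.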
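
One way to check how much individual categories drive each cluster is to look at the cluster centroids. At convergence, each k-means centroid is essentially the mean of the rows assigned to it, so averaging the frequency columns per label recovers them. This is a minimal sketch, assuming a frame shaped like `spain_grouped` (a `City` column plus one frequency column per category); the helper name `top_categories_per_cluster` and the toy numbers are hypothetical.

```python
import pandas as pd

# Toy stand-in for spain_grouped (hypothetical cities and frequencies).
toy_grouped = pd.DataFrame(
    {
        "City": ["City A", "City B", "City C", "City D"],
        "Tapas Restaurant": [0.50, 0.40, 0.10, 0.20],
        "Spanish Restaurant": [0.20, 0.30, 0.50, 0.40],
        "Park": [0.10, 0.10, 0.15, 0.30],
        "Hotel": [0.20, 0.20, 0.25, 0.10],
    }
)
toy_labels = [0, 0, 1, 1]  # stand-in for kmeans.labels_


def top_categories_per_cluster(grouped, labels, n=3):
    """Return {cluster: the n categories with the highest mean frequency}."""
    features = grouped.drop(columns="City")
    label_series = pd.Series(labels, index=features.index, name="Cluster Labels")
    centroids = features.groupby(label_series).mean()
    return {
        int(cluster): row.sort_values(ascending=False).index[:n].tolist()
        for cluster, row in centroids.iterrows()
    }


top = top_categories_per_cluster(toy_grouped, toy_labels, n=2)
print(top)
```

On the real data the call would be `top_categories_per_cluster(spain_grouped, kmeans.labels_)`. If a single category dominates every centroid, that supports the observation about the most common venue.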

For example, for somebody who prefers green spaces and culture, the clustering points to Zamora as one of the main options: 'Park', 'Scenic Lookout' and 'Castle' are all listed among its most common venues.
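
This kind of preference-based reading can be made explicit by scoring each city against a set of weighted categories. The sketch below is not part of the original analysis: it assumes a frame shaped like `spain_grouped`, and the helper `rank_cities`, the weights and the toy frequencies are all hypothetical.

```python
import pandas as pd

# Toy stand-in for spain_grouped (hypothetical cities and frequencies).
toy_grouped = pd.DataFrame(
    {
        "City": ["City A", "City B", "City C"],
        "Park": [0.10, 0.02, 0.05],
        "Castle": [0.05, 0.00, 0.01],
        "Scenic Lookout": [0.05, 0.01, 0.00],
        "Tapas Restaurant": [0.10, 0.30, 0.20],
    }
)


def rank_cities(grouped, weights):
    """Score each city as the weighted sum of its category frequencies."""
    features = grouped.drop(columns="City")
    # Categories without a weight count as 0; weights for categories that
    # are not columns of the frame are ignored.
    weight_series = pd.Series(weights, dtype=float).reindex(
        features.columns, fill_value=0.0
    )
    scores = features.dot(weight_series)
    ranking = pd.DataFrame({"City": grouped["City"], "score": scores})
    return ranking.sort_values("score", ascending=False).reset_index(drop=True)


# Example profile: someone who values green spaces and heritage sites.
preferences = {"Park": 1.0, "Castle": 1.0, "Scenic Lookout": 1.0}
ranking = rank_cities(toy_grouped, preferences)
print(ranking)
```

The weights encode one person's profile, so different teleworkers would get different rankings from the same venue data.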




## 5. Discussion
The analysis has some limitations. One is that we do not consider whether a city is on the coast, or the quality of its beaches. This could be addressed by upgrading the Foursquare account or by crossing the initial data with a dataset that contains that kind of information.

Weather is another variable that should be taken into account, as it differs considerably between cities in the north and the south.
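
Crossing the venue data with an external dataset could look like the following. This is a minimal sketch under stated assumptions: the external table, its columns `is_coastal` and `mean_temp_c`, the helper `add_external_features` and every value shown are hypothetical placeholders, not real data.

```python
import pandas as pd

# Toy stand-in for the per-city venue frequencies (hypothetical numbers).
venues = pd.DataFrame(
    {
        "City": ["City A", "City B", "City C"],
        "Park": [0.10, 0.02, 0.05],
        "Tapas Restaurant": [0.10, 0.30, 0.20],
    }
)

# Hypothetical external dataset; note that City C is missing from it.
extra = pd.DataFrame(
    {
        "City": ["City A", "City B"],
        "is_coastal": [1, 0],
        "mean_temp_c": [18.0, 12.0],
    }
)


def add_external_features(venues, extra, key="City"):
    """Left-join extra per-city columns and list the cities left unmatched."""
    merged = venues.merge(
        extra, on=key, how="left", validate="one_to_one", indicator=True
    )
    unmatched = merged.loc[merged["_merge"] == "left_only", key].tolist()
    return merged.drop(columns="_merge"), unmatched


merged, unmatched = add_external_features(venues, extra)
print(merged)
print("Cities without external data:", unmatched)
```

Because k-means relies on Euclidean distance, a column such as a temperature in degrees would need to be rescaled to a range comparable with the venue frequencies before clustering, and cities without external data would need to be filled in or dropped.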

## 6. Conclusion

This case study suggests that there are some misconceptions about which cities are the 'best' to live in in Spain. However, the results are only meaningful when read in light of each person's preferences.

Also, some of the limitations could be overcome with better data.