# Lisbon Rebuilt: Notebook 2

## C. Lisbon Venues 

### C.1 Lisbon Venues with Foursquare

In this subsection, we extract a list of Venues in Lisbon Municipality using the Foursquare API. As our aim is to study the Venues per Parish, we use the coordinates of each parish's Parish Council as the search location and define a radius around it. This is an approximation in two ways: on the one hand, the Parish Council is not always at the geographical center of the Parish; on the other hand, parishes vary in shape, i.e., their areas are not circular. Since we have the area of each parish, we use a variable radius computed from it. This subsection produces three dataframes and a map: 1 - a dataframe containing the categorized venues of one chosen parish; 2 - a dataframe containing the categorized venues of all parishes of Lisbon; 3 - a dataframe containing the number of venues per parish, from which we create a new dataset to be used elsewhere; 4 - an interactive map of Lisbon marking the position of each Parish Council.

We start by importing the previously prepared dataset, containing data about the area of each parish and the geographical coordinates of the respective parish council:

In [1]:
import pandas as pd

In [2]:
# dfpc: dataframe parish coordinates

dfpc=pd.read_csv('parishcoord.csv')
dfpc.head()

Unnamed: 0.1,Unnamed: 0,Number,Parish,Population 2013,Area km^2,Latitude,Longitude
0,0,1,Ajuda,15617,2.88,38.705781,-9.201325
1,1,2,Alcântra,13943,5.07,38.704844,-9.179878
2,2,3,Alvalade,31813,5.34,38.746914,-9.140536
3,3,4,Areeiro,20131,1.74,38.745119,-9.142653
4,4,5,Arroios,31653,2.13,38.721008,-9.134658


In [3]:
dfpc.drop('Unnamed: 0',axis=1,inplace=True)
dfpc.head()

Unnamed: 0,Number,Parish,Population 2013,Area km^2,Latitude,Longitude
0,1,Ajuda,15617,2.88,38.705781,-9.201325
1,2,Alcântra,13943,5.07,38.704844,-9.179878
2,3,Alvalade,31813,5.34,38.746914,-9.140536
3,4,Areeiro,20131,1.74,38.745119,-9.142653
4,5,Arroios,31653,2.13,38.721008,-9.134658


In order to gain insight into the dimensions of each parish, we compute the average area per parish:

In [4]:
avgarea=dfpc['Area km^2'].sum()/len(dfpc)  # mean area over all parishes in the dataset
avgarea

4.168333333333333

In [5]:
dfpc.describe()

Unnamed: 0,Number,Population 2013,Area km^2,Latitude,Longitude
count,24.0,24.0,24.0,24.0,24.0
mean,12.5,23029.166667,4.168333,38.734246,-9.153767
std,7.071068,9519.08835,2.400032,0.023613,0.030096
min,1.0,11836.0,1.49,38.698186,-9.206761
25%,6.75,15429.75,2.3925,38.714471,-9.171669
50%,12.5,20578.0,3.185,38.728058,-9.152394
75%,18.25,31693.0,5.365,38.747278,-9.133956
max,24.0,45605.0,10.43,38.782592,-9.097667


Computing the average, minimal, and maximal radius of parishes in the dataset, assuming each parish was a circle:

In [6]:
import math

In [7]:
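# radius of a circle with area A: r = sqrt(A/pi)
avr=math.sqrt(avgarea/math.pi)
minr=math.sqrt(dfpc['Area km^2'].min()/math.pi)  # smallest parish: 1.49 km^2
maxr=math.sqrt(dfpc['Area km^2'].max()/math.pi)  # largest parish: 10.43 km^2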
print('Average radius: %.2f km' %avr)
print('Minimal radius: %.2f km' %minr)
print('Maximal radius: %.2f km' %maxr)

Average radius: 1.15 km
Minimal radius: 0.69 km
Maximal radius: 1.82 km


The radii range from 0.69 km to 1.82 km, which is different enough to justify using a variable radius in our study.
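
The circle approximation amounts to r = sqrt(A/π). The following is a minimal sketch of that conversion; the helper name `parish_radius_m` is hypothetical (it does not appear in the notebook), and it returns metres because the Foursquare `radius` parameter used later is given in metres.

```python
import math

def parish_radius_m(area_km2):
    """Radius (in metres) of a circle with the given area (in km^2).

    Hypothetical helper for illustration; it assumes the parish is circular.
    """
    if area_km2 < 0:
        raise ValueError("area must be non-negative")
    return math.sqrt(area_km2 / math.pi) * 1000.0

# Areas taken from the summary statistics above (min, mean, max).
for label, area in [("min", 1.49), ("mean", 4.168333), ("max", 10.43)]:
    print(f"{label}: {parish_radius_m(area):.0f} m")
```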

<!-- We notice that choosing the average radius is likely to lead to the categorization of a number of Venues in more than one parish, which may blur our results. However, if we use a small radius, we may be excluding important Venues within the larger parishes. For this reason, we will perform the study for two different radius, r1=500 m, and r2=1000 m. -->

First, using the `folium` library, we draw an interactive map marking the position of each parish council in Lisbon Municipality.

In [8]:
!pip install folium

Defaulting to user installation because normal site-packages is not writeable


In [9]:
import folium

We also use `Nominatim` from the `geopy` library to find the geographical coordinates of Lisbon city.

In [10]:
!pip install geopy

Defaulting to user installation because normal site-packages is not writeable


In [11]:
from geopy.geocoders import Nominatim

In [12]:
address = 'Lisbon, PT'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Lisbon are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Lisbon are 38.7077507, -9.1365919.


So now we are ready to draw our map:

In [13]:
# drawing the folium map of Lisbon 
lisbon_map=folium.Map(location=[latitude, longitude], zoom_start=11)

# adding markers to the map:
# each marker signals the position of a parish council
# and shows the parish name when clicked.
for lat, lng, parish in zip(dfpc['Latitude'], dfpc['Longitude'],dfpc['Parish']):
    label = folium.Popup(str(parish), parse_html=True)
    folium.CircleMarker([lat, lng],radius=5,popup=label,color='red',
        fill=True,fill_color='green',fill_opacity=0.7).add_to(lisbon_map)
    
lisbon_map

In [14]:
lisbon_map.save("lisbon_parishes.html")

In order to access Foursquare data, one needs to pass in the credentials (hidden for publication).
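
Since the credentials cell is hidden, here is a minimal sketch of one way to supply them without writing them into the notebook. The environment variable names are hypothetical and the version date is an assumption; only the names `CLIENT_ID`, `CLIENT_SECRET` and `VERSION` come from the cells below.

```python
import os

# Hypothetical environment variable names; the notebook's hidden cell may differ.
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")
# The v2 API expects a version date in YYYYMMDD form; this particular date is an assumption.
VERSION = "20180605"

if not CLIENT_ID or not CLIENT_SECRET:
    print("Foursquare credentials are not set; API requests will fail.")
```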

In order to see the Foursquare API working, we search for the Venues of a particular parish:

In [16]:
dfpc.loc[20,'Parish']

' Santa Maria Maior'

In [17]:
myplat=dfpc.loc[20,'Latitude'] 
myplong=dfpc.loc[20,'Longitude']

mypname = dfpc.loc[20, 'Parish']

print('Latitude and longitude values of {} are {}, {}.'.format(mypname, myplat, myplong))

Latitude and longitude values of  Santa Maria Maior are 38.71141666666667, -9.138008333333334.


In [18]:
area=dfpc.loc[20,'Area km^2']
radius=math.sqrt(area/math.pi)
print('Area: %.2f km^2' %area)
print('Radius: %.2f km' %radius)

Area: 3.01 km^2
Radius: 0.98 km


Let us choose a radius of 980 m for venues in 'Santa Maria Maior':

In [19]:
radius = 980
LIMIT=100

url = ('https://api.foursquare.com/v2/venues/explore'
       '?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'
      ).format(CLIENT_ID, CLIENT_SECRET, VERSION, myplat, myplong, radius, LIMIT)
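
As an alternative to formatting the query string by hand, the same request can be described with a `params` dictionary and left to `requests` to encode. The sketch below only prepares the request and does not send it; the credential strings are placeholders and the version date is an assumption.

```python
import requests

# Placeholder values for illustration; real credentials are needed for an actual call.
params = {
    "client_id": "YOUR_CLIENT_ID",
    "client_secret": "YOUR_CLIENT_SECRET",
    "v": "20180605",          # assumed version date (YYYYMMDD)
    "ll": "38.7114,-9.1380",  # latitude,longitude of the parish council
    "radius": 980,            # metres
    "limit": 100,
}

# Prepare the request without sending it, to inspect the encoded URL.
prepared = requests.Request(
    "GET", "https://api.foursquare.com/v2/venues/explore", params=params
).prepare()
print(prepared.url)
```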

As the venues will be returned in JSON format, we need to import the libraries `json` and `requests` to handle the extracted information. For normalizing the results, we also import `json_normalize` from the `pandas` library.
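
To illustrate what `json_normalize` does, here is a small sketch on a hand-written record that mimics the shape of one item of an explore response. The values are illustrative and not real API output.

```python
import pandas as pd

# A hand-written record mimicking one item of an explore response
# (values are illustrative, not real API output).
sample_items = [
    {
        "venue": {
            "name": "Example Hostel",
            "location": {"lat": 38.71, "lng": -9.14},
            "categories": [{"name": "Hostel"}],
        }
    }
]

flat = pd.json_normalize(sample_items)
# Nested dictionaries become dotted column names; lists are kept as-is.
print(sorted(flat.columns))
```

This is why the later cells select columns such as `venue.name` and `venue.location.lat`, and why the category still has to be pulled out of a list.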

In [20]:
# necessary libraries to handle the extracted information
import json
import requests
#from pandas.io.json import json_normalize
from pandas import json_normalize

In [21]:
results=requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '6021537ea82a1d192ba65ea0'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'São Nicolau',
  'headerFullLocation': 'São Nicolau, Lisbon',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 234,
  'suggestedBounds': {'ne': {'lat': 38.72023667548668,
    'lng': -9.126726165783705},
   'sw': {'lat': 38.70259665784666, 'lng': -9.149290500882962}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4cc2ef7682388cfa79e26335',
       'name': 'Home Lisbon Hostel',
       'location': {'address': 'Rua de São Nicolau, 13, 2E',
        'lat': 38.710544539177334,
        'lng': -9.136180303413738,
        'labeledLatLngs': [{'label': 'display',
   

From the JSON response, we now organize the Venues data in a dataframe.

In [22]:
# function that extracts the category of the venue
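def get_category_type(row):
    # the category list sits under 'categories' or 'venue.categories',
    # depending on whether the columns have already been renamed
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']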

In [23]:

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(15)

Unnamed: 0,name,categories,lat,lng
0,Home Lisbon Hostel,Hostel,38.710545,-9.13618
1,O Arco,Portuguese Restaurant,38.711836,-9.138565
2,Santini,Ice Cream Shop,38.712458,-9.139571
3,Fabrica da Nata - Rua Augusta,Bakery,38.712704,-9.138492
4,Escape Hunt Lisbon,Escape Room,38.710133,-9.136165
5,Hotel Santa Justa,Hotel,38.712692,-9.138045
6,Papabubble,Candy Store,38.709519,-9.137672
7,Living Lounge Hostel,Hostel,38.711034,-9.139089
8,Garrafeira Nacional,Wine Shop,38.712656,-9.137134
9,Brown ´s Hotel Central,Hotel,38.711454,-9.138375


In [24]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


We notice that most Venues in 'Santa Maria Maior' parish are either accommodation places or restaurants, which, in the context of our study, indicates a focus on tourist activity in this parish.

Now, we define a function that will extract the Venues of all parishes:

In [25]:
def getNearbyVenues(names, area, latitudes, longitudes):
    """Return a dataframe with the venues found around each parish council.

    `area` holds the parish areas in km^2; the search radius (in metres) is
    that of a circle with the same area.
    """
    venues_list=[]
    for name, ar, lat, lng in zip(names, area, latitudes, longitudes):
        # computing the radius (in metres) for the corresponding parish
        radius=math.sqrt(ar/math.pi)*1000
        print(name,':', round(radius,0))
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, lat, lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Parish', 'Parish Latitude', 'Parish Longitude', 
                  'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
    
    return nearby_venues

We now call the function defined above to get the Venues of all parishes:

In [26]:
# defining the parameters
name=dfpc['Parish']
lat=dfpc['Latitude']
long=dfpc['Longitude']
area=dfpc['Area km^2']

# calling the function
lisbon_venues=getNearbyVenues(name, area, lat, long)

 Ajuda : 957.0
 Alcântra : 1270.0
 Alvalade : 1304.0
 Areeiro : 744.0
 Arroios : 823.0
 Avenidas Novas : 976.0
 Beato : 885.0
 Belém : 1822.0
 Benfica : 1599.0
 Campo de Ourique : 725.0
 Campolide : 939.0
 Carnide : 1084.0
 Estrela : 1210.0
 Lumiar : 1446.0
 Marvila : 1505.0
 Misericórdia : 835.0
 Olivais : 1605.0
 Parque das Nações : 1316.0
 Penha de França : 929.0
 Santa Clara : 1034.0
 Santa Maria Maior : 979.0
 Santo António : 689.0
 São Domingos de Benfica : 1169.0
 São Vicente : 796.0


In [27]:
print('Number of venues in Lisbon:', lisbon_venues.shape[0],'\n')
lisbon_venues.head()

Number of venues in Lisbon: 1984 



Unnamed: 0,Parish,Parish Latitude,Parish Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Ajuda,38.705781,-9.201325,Restaurante Andorinhas,38.704911,-9.199349,Restaurant
1,Ajuda,38.705781,-9.201325,Palácio Nacional da Ajuda,38.707653,-9.197758,Historic Site
2,Ajuda,38.705781,-9.201325,Jardim Botânico da Ajuda,38.70643,-9.201222,Botanical Garden
3,Ajuda,38.705781,-9.201325,Churrasqueira do Marquês,38.703996,-9.199402,BBQ Joint
4,Ajuda,38.705781,-9.201325,Parque Recreativo dos Moinhos de Santana,38.705849,-9.205103,Park


In [28]:
lisbon_venues.groupby('Parish')[['Venue']].count()

Unnamed: 0_level_0,Venue
Parish,Unnamed: 1_level_1
Ajuda,49
Alcântra,70
Alvalade,100
Areeiro,97
Arroios,100
Avenidas Novas,100
Beato,8
Belém,85
Benfica,100
Campo de Ourique,37
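
Several of the counts above equal 100, which is the value of `LIMIT`, so those parishes may have more venues than were returned. The sketch below flags such parishes on a toy stand-in for the per-parish counts; the four counts are copied from the table above.

```python
import pandas as pd

LIMIT = 100  # same cap as in the request above

# Toy stand-in: a few per-parish counts copied from the table above.
counts = pd.Series({"Ajuda": 49, "Alvalade": 100, "Arroios": 100, "Beato": 8},
                   name="Venue")

capped = counts[counts >= LIMIT].index.tolist()
print("Parishes that hit the request limit:", capped)
```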


In [29]:
lisbon_venues.to_csv('lisbonvenuesvarr.csv')

In [30]:
print('There are {} unique categories.'.format(len(lisbon_venues['Venue Category'].unique())))

There are 216 unique categories.


### C.2 Clustering Lisbon by Venues

In this subsection, our aim is to cluster the parishes in Lisbon Municipality according to their venues, using the k-Means Machine Learning algorithm. This task is accomplished through the following steps: 1) preparation of the dataframe for clustering, a normalized dataframe called `lisbon_grouped`, derived from `lisbon_venues`, that contains the venues in each parish weighted by category; 2) building of a dataframe named `parish_venues_sorted`, containing the most frequent venues in each parish and the assigned cluster label; 3) drawing of a folium map of Lisbon showing the clustering results; 4) exploration of the results.

We start by organizing the venues by parish and by category:

In [31]:
# one hot encoding
lisbon_onehot = pd.get_dummies(lisbon_venues[['Venue Category']], prefix="", prefix_sep="")

# add parish column 
lisbon_onehot['Parish'] = lisbon_venues['Parish'] 

# move parish column to the first column
fixed_columns = [lisbon_onehot.columns[-1]] + list(lisbon_onehot.columns[:-1])
lisbon_onehot = lisbon_onehot[fixed_columns]

lisbon_onehot.head()

Unnamed: 0,Parish,Accessories Store,African Restaurant,Airport,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Amphitheater,Aquarium,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Veterinarian,Waterfront,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Ajuda,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Ajuda,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Ajuda,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Ajuda,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Ajuda,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [32]:
lisbon_onehot.shape

(1984, 217)

Now we group the venues by parish and compute the mean of each venue category per parish, which measures how frequently that category occurs among the parish's venues. This data will be used directly for clustering.
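
To see why the mean of a one-hot column is an occurrence rate, consider this small sketch on toy data. The parishes and categories are illustrative; `dtype=float` is passed only to make the numeric type explicit.

```python
import pandas as pd

# Toy data: two parishes with a handful of venues each (illustrative only).
toy = pd.DataFrame({
    "Parish": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Hotel", "Park", "Hotel", "Hotel"],
})

onehot = pd.get_dummies(toy[["Venue Category"]], prefix="", prefix_sep="", dtype=float)
onehot["Parish"] = toy["Parish"]

# The mean of a 0/1 column is the share of venues in that category.
freq = onehot.groupby("Parish").mean()
print(freq)
```

Each row sums to 1, so parishes with very different numbers of venues become comparable.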

In [33]:
lisbon_grouped = lisbon_onehot.groupby('Parish').mean().reset_index()
lisbon_grouped.head(10)

Unnamed: 0,Parish,Accessories Store,African Restaurant,Airport,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Amphitheater,Aquarium,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Veterinarian,Waterfront,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Ajuda,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Alcântra,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Alvalade,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Areeiro,0.010309,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.010309,0.010309,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Arroios,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,...,0.0,0.0,0.03,0.0,0.0,0.02,0.0,0.0,0.0,0.0
5,Avenidas Novas,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,...,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Beato,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Belém,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Benfica,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Campo de Ourique,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [34]:
lisbon_grouped.shape

(24, 217)

Here we extract, from the previous dataframe, the top 5 venues per parish and the corresponding frequency:

In [35]:
num_top_venues = 5

for hood in lisbon_grouped['Parish']:
    print("----"+hood+"----")
    temp = lisbon_grouped[lisbon_grouped['Parish'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

---- Ajuda----
                   venue  freq
0  Portuguese Restaurant  0.22
1             Restaurant  0.10
2                   Café  0.06
3              BBQ Joint  0.06
4            Supermarket  0.04


---- Alcântra----
                   venue  freq
0             Restaurant  0.11
1  Portuguese Restaurant  0.09
2                   Café  0.06
3                 Bakery  0.04
4                  Plaza  0.04


---- Alvalade----
                   venue  freq
0  Portuguese Restaurant  0.15
1             Restaurant  0.08
2         Ice Cream Shop  0.04
3                  Hotel  0.04
4   Gym / Fitness Center  0.03


---- Areeiro----
                   venue  freq
0  Portuguese Restaurant  0.09
1                  Hotel  0.08
2                    Bar  0.06
3             Restaurant  0.06
4                   Café  0.05


---- Arroios----
                   venue  freq
0                   Café  0.12
1  Portuguese Restaurant  0.09
2                  Hotel  0.07
3      Indian Restaurant  0.05
4       

Now we build up a dataframe containing the top 7 venues per parish:

In [36]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [37]:
import numpy as np

num_top_venues = 7

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Parish']
for ind in np.arange(num_top_venues):
    if ind < len(indicators):
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    else:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
parish_venues_sorted = pd.DataFrame(columns=columns)
parish_venues_sorted['Parish'] = lisbon_grouped['Parish']

for ind in np.arange(lisbon_grouped.shape[0]):
    parish_venues_sorted.iloc[ind, 1:] = return_most_common_venues(lisbon_grouped.iloc[ind, :], num_top_venues)

parish_venues_sorted.head()

Unnamed: 0,Parish,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
0,Ajuda,Portuguese Restaurant,Restaurant,Café,BBQ Joint,Supermarket,Bakery,Chinese Restaurant
1,Alcântra,Restaurant,Portuguese Restaurant,Café,Dessert Shop,Plaza,Bakery,Snack Place
2,Alvalade,Portuguese Restaurant,Restaurant,Hotel,Ice Cream Shop,Bar,Sushi Restaurant,Bakery
3,Areeiro,Portuguese Restaurant,Hotel,Restaurant,Bar,Café,Bakery,Pizza Place
4,Arroios,Café,Portuguese Restaurant,Hotel,Indian Restaurant,Scenic Lookout,Pizza Place,Plaza
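
The same ranking can also be sketched with `Series.nlargest`. This is an alternative shown on a toy frequency table, not the notebook's own method; the frame `toy_grouped` and its values are illustrative.

```python
import pandas as pd

# Toy frequency table in the same layout as `lisbon_grouped` (values are illustrative).
toy_grouped = pd.DataFrame({
    "Parish": ["A", "B"],
    "Café": [0.5, 0.1],
    "Hotel": [0.3, 0.6],
    "Park": [0.2, 0.3],
})

top_n = 2
top = {
    row["Parish"]: row.drop("Parish").astype(float).nlargest(top_n).index.tolist()
    for _, row in toy_grouped.iterrows()
}
print(top)
```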


For clustering, we will use the k-Means algorithm. We start by importing the necessary libraries.

In [38]:
!pip install scikit-learn

Defaulting to user installation because normal site-packages is not writeable


In [39]:
from sklearn.cluster import KMeans

We choose to organize our parishes in 7 clusters, i.e. k=7 in the k-Means algorithm. Applying the algorithm assigns a cluster label to each parish.

In [40]:
# set number of clusters
kclusters = 7

lisbon_grouped_clustering = lisbon_grouped.drop(columns='Parish')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(lisbon_grouped_clustering)

# cluster labels generated for each parish in the dataframe
kmeans.labels_[0:24]

array([0, 1, 6, 6, 3, 6, 2, 1, 1, 0, 6, 1, 0, 1, 4, 0, 1, 1, 3, 4, 3, 5,
       1, 3], dtype=int32)
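
The text does not say how k=7 was chosen. One common way to compare candidate values of k is the silhouette score; the sketch below illustrates the idea on synthetic data, not on the Lisbon dataset, and the range of k values tried is an arbitrary assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in data: three well-separated blobs (not the Lisbon data).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest average silhouette is the best-separated partition.
best_k = max(scores, key=scores.get)
print("Silhouette score per k:", {k: round(float(s), 3) for k, s in scores.items()})
print("Best k:", best_k)
```

To try this on the notebook's data, `X` would be replaced by `lisbon_grouped_clustering`; with only 24 parishes the scores may be noisy.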

In [41]:
# add clustering labels
parish_venues_sorted.insert(0,'Cluster Labels', kmeans.labels_)

lisbon_merged = dfpc

# merge data on lisbon parishes with data on parishes venues
lisbon_merged = lisbon_merged.join(parish_venues_sorted.set_index('Parish'), on='Parish')

lisbon_merged.head() 

Unnamed: 0,Number,Parish,Population 2013,Area km^2,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
0,1,Ajuda,15617,2.88,38.705781,-9.201325,0,Portuguese Restaurant,Restaurant,Café,BBQ Joint,Supermarket,Bakery,Chinese Restaurant
1,2,Alcântra,13943,5.07,38.704844,-9.179878,1,Restaurant,Portuguese Restaurant,Café,Dessert Shop,Plaza,Bakery,Snack Place
2,3,Alvalade,31813,5.34,38.746914,-9.140536,6,Portuguese Restaurant,Restaurant,Hotel,Ice Cream Shop,Bar,Sushi Restaurant,Bakery
3,4,Areeiro,20131,1.74,38.745119,-9.142653,6,Portuguese Restaurant,Hotel,Restaurant,Bar,Café,Bakery,Pizza Place
4,5,Arroios,31653,2.13,38.721008,-9.134658,3,Café,Portuguese Restaurant,Hotel,Indian Restaurant,Scenic Lookout,Pizza Place,Plaza


Finally, we display a folium map with the clustering results:

In [42]:
import matplotlib.cm as cm
import matplotlib.colors as colors
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [43]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(lisbon_merged['Latitude'],lisbon_merged['Longitude'],\
                                  lisbon_merged['Parish'],lisbon_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=1.).add_to(map_clusters)
       
map_clusters

In [44]:
map_clusters.save("k7_varr_wrwc.html")

We can study the properties of each cluster by examining its most common venues:

In [45]:
lisbon_merged.loc[lisbon_merged['Cluster Labels'] == 0, lisbon_merged.columns[[1]\
                    + list(range(6, lisbon_merged.shape[1]))]]

Unnamed: 0,Parish,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
0,Ajuda,0,Portuguese Restaurant,Restaurant,Café,BBQ Joint,Supermarket,Bakery,Chinese Restaurant
9,Campo de Ourique,0,Portuguese Restaurant,Coffee Shop,Bakery,Bar,Seafood Restaurant,Restaurant,Grocery Store
12,Estrela,0,Portuguese Restaurant,Café,Coffee Shop,Seafood Restaurant,Garden,Italian Restaurant,Restaurant
15,Misericórdia,0,Portuguese Restaurant,Wine Bar,Café,Coffee Shop,Lounge,Bar,Italian Restaurant


In [46]:
lisbon_merged.loc[lisbon_merged['Cluster Labels'] == 1, lisbon_merged.columns[[1]\
                                + list(range(6, lisbon_merged.shape[1]))]]

Unnamed: 0,Parish,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
1,Alcântra,1,Restaurant,Portuguese Restaurant,Café,Dessert Shop,Plaza,Bakery,Snack Place
7,Belém,1,Portuguese Restaurant,Café,Restaurant,Monument / Landmark,Italian Restaurant,Ice Cream Shop,Bakery
8,Benfica,1,Portuguese Restaurant,Café,Restaurant,Seafood Restaurant,Park,Theater,Soccer Stadium
11,Carnide,1,Portuguese Restaurant,Restaurant,Sushi Restaurant,Café,Soccer Stadium,Sporting Goods Shop,Burger Joint
13,Lumiar,1,Café,Bakery,Supermarket,Restaurant,Park,Portuguese Restaurant,History Museum
16,Olivais,1,Café,Portuguese Restaurant,Rental Car Location,Bakery,Airport Service,Coffee Shop,Grocery Store
17,Parque das Nações,1,Portuguese Restaurant,Restaurant,Café,Ice Cream Shop,Sushi Restaurant,Burger Joint,Hotel
22,São Domingos de Benfica,1,Café,Portuguese Restaurant,Restaurant,Clothing Store,Ice Cream Shop,Burger Joint,Sporting Goods Shop


In [47]:
lisbon_merged.loc[lisbon_merged['Cluster Labels'] == 2, lisbon_merged.columns[[1] +\
                                    list(range(6, lisbon_merged.shape[1]))]]

Unnamed: 0,Parish,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
6,Beato,2,Restaurant,Cafeteria,Museum,Theater,BBQ Joint,Supermarket,Yoga Studio


In [48]:
lisbon_merged.loc[lisbon_merged['Cluster Labels'] == 3, lisbon_merged.columns[[1]\
                                            + list(range(6, lisbon_merged.shape[1]))]]

Unnamed: 0,Parish,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
4,Arroios,3,Café,Portuguese Restaurant,Hotel,Indian Restaurant,Scenic Lookout,Pizza Place,Plaza
18,Penha de França,3,Portuguese Restaurant,Café,Indian Restaurant,Hotel,Hostel,Plaza,Italian Restaurant
20,Santa Maria Maior,3,Plaza,Portuguese Restaurant,Hostel,Wine Bar,Hotel,Bar,Coffee Shop
23,São Vicente,3,Café,Portuguese Restaurant,Hotel,Scenic Lookout,Vegetarian / Vegan Restaurant,Indian Restaurant,Coffee Shop


In [49]:
lisbon_merged.loc[lisbon_merged['Cluster Labels'] == 4, lisbon_merged.columns[[1]\
                                        + list(range(6, lisbon_merged.shape[1]))]]

Unnamed: 0,Parish,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
14,Marvila,4,Restaurant,Café,Portuguese Restaurant,Hotel,Plaza,Gym / Fitness Center,Brewery
19,Santa Clara,4,Restaurant,Supermarket,Café,Portuguese Restaurant,Gym / Fitness Center,Park,Plaza


In [50]:
lisbon_merged.loc[lisbon_merged['Cluster Labels'] == 5, lisbon_merged.columns[[1]\
                                + list(range(6, lisbon_merged.shape[1]))]]

Unnamed: 0,Parish,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
21,Santo António,5,Hotel,Restaurant,Bakery,Hotel Bar,Italian Restaurant,Portuguese Restaurant,Café


In [51]:
lisbon_merged.loc[lisbon_merged['Cluster Labels'] == 6, lisbon_merged.columns[[1]\
                                + list(range(6, lisbon_merged.shape[1]))]]

Unnamed: 0,Parish,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
2,Alvalade,6,Portuguese Restaurant,Restaurant,Hotel,Ice Cream Shop,Bar,Sushi Restaurant,Bakery
3,Areeiro,6,Portuguese Restaurant,Hotel,Restaurant,Bar,Café,Bakery,Pizza Place
5,Avenidas Novas,6,Portuguese Restaurant,Restaurant,Vegetarian / Vegan Restaurant,Pizza Place,Bakery,Coffee Shop,Sushi Restaurant
10,Campolide,6,Portuguese Restaurant,Restaurant,Bakery,Hotel,Italian Restaurant,Hotel Bar,Bar
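
The seven near-identical cells above could also be written as a single loop over the cluster labels. Here is a sketch on a toy frame; `toy_merged` is a hypothetical stand-in for `lisbon_merged` with illustrative rows.

```python
import pandas as pd

# Toy stand-in for `lisbon_merged` (illustrative rows only).
toy_merged = pd.DataFrame({
    "Parish": ["A", "B", "C"],
    "Cluster Labels": [0, 1, 0],
    "1st Most Common Venue": ["Café", "Hotel", "Park"],
})

# groupby yields one sub-frame per cluster label, in sorted label order.
clusters = {int(label): group["Parish"].tolist()
            for label, group in toy_merged.groupby("Cluster Labels")}
for label, parishes in clusters.items():
    print(f"Cluster {label}: {parishes}")
```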


As we can see, many of the venues found are restaurants and coffee shops. However, within the context of our study, we can assume that eating places are not a distinguishing feature for differentiating between residents, tourists, and external workers. Therefore, we repeat the clustering procedure, this time excluding most venues that are eating places.

In [52]:
# drop every category column whose name matches one of these eating-place keywords
food_pattern = 'Restaurant|Pizza|BBQ|Steakhouse|Burger|Café|Coffee|Ice Cream'
lisbon_onehot = lisbon_onehot.drop(columns=lisbon_onehot.filter(regex=food_pattern).columns)
lisbon_onehot.head()

Unnamed: 0,Parish,Accessories Store,Airport,Airport Lounge,Airport Service,Airport Terminal,Amphitheater,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Athletics & Sports,Automotive Shop,Bagel Shop,Bakery,Bar,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Big Box Store,Bistro,Board Shop,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Breakfast Spot,Brewery,Bubble Tea Shop,Buffet,Burrito Place,Cafeteria,Campground,Candy Store,Capitol Building,Casino,Castle,Cheese Shop,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,College Academic Building,College Cafeteria,Concert Hall,Convenience Store,Cosmetics Shop,Creperie,Cultural Center,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Dive Bar,Duty-free Shop,Electronics Store,Escape Room,Event Space,Exhibit,Farm,Farmers Market,Flower Shop,Food,Food Court,Food Truck,Fountain,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gastropub,General Entertainment,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Harbor / Marina,Hardware Store,Health Food Store,Historic Site,History Museum,Hockey Arena,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Indie Movie Theater,Jewelry Store,Juice Bar,Kitchen Supply Store,Laundromat,Lingerie Store,Lounge,Market,Men's Store,Metro Station,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Motel,Motorcycle Shop,Movie Theater,Museum,Music Venue,Nightclub,Noodle House,Office,Opera House,Other Great Outdoors,Other Nightlife,Palace,Paper / Office Supplies Store,Park,Pastry Shop,Performing Arts Venue,Pet Store,Pharmacy,Photography Lab,Pie Shop,Pier,Planetarium,Playground,Plaza,Poke Place,Pool,Pub,Rental Car Location,Road,Roof Deck,Salad Place,Sandwich Place,Sausage Shop,Scenic Lookout,Science Museum,Sculpture Garden,Shoe Store,Shop & Service,Shopping Mall,Skating Rink,Snack Place,Soccer Field,Soccer Stadium,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Club,Stables,Stadium,Supermarket,Tea Room,Tennis Court,Theater,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Veterinarian,Waterfront,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Ajuda,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Ajuda,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Ajuda,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Ajuda,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Ajuda,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [53]:
lisbon_grouped = lisbon_onehot.groupby('Parish').mean().reset_index()
lisbon_grouped.head()

Unnamed: 0,Parish,Accessories Store,Airport,Airport Lounge,Airport Service,Airport Terminal,Amphitheater,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Athletics & Sports,Automotive Shop,Bagel Shop,Bakery,Bar,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Big Box Store,Bistro,Board Shop,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Breakfast Spot,Brewery,Bubble Tea Shop,Buffet,Burrito Place,Cafeteria,Campground,Candy Store,Capitol Building,Casino,Castle,Cheese Shop,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,College Academic Building,College Cafeteria,Concert Hall,Convenience Store,Cosmetics Shop,Creperie,Cultural Center,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Dive Bar,Duty-free Shop,Electronics Store,Escape Room,Event Space,Exhibit,Farm,Farmers Market,Flower Shop,Food,Food Court,Food Truck,Fountain,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gastropub,General Entertainment,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Harbor / Marina,Hardware Store,Health Food Store,Historic Site,History Museum,Hockey Arena,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Indie Movie Theater,Jewelry Store,Juice Bar,Kitchen Supply Store,Laundromat,Lingerie Store,Lounge,Market,Men's Store,Metro Station,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Motel,Motorcycle Shop,Movie Theater,Museum,Music Venue,Nightclub,Noodle House,Office,Opera House,Other Great Outdoors,Other Nightlife,Palace,Paper / Office Supplies Store,Park,Pastry Shop,Performing Arts Venue,Pet Store,Pharmacy,Photography Lab,Pie Shop,Pier,Planetarium,Playground,Plaza,Poke Place,Pool,Pub,Rental Car Location,Road,Roof Deck,Salad Place,Sandwich Place,Sausage Shop,Scenic Lookout,Science Museum,Sculpture Garden,Shoe Store,Shop & Service,Shopping Mall,Skating Rink,Snack Place,Soccer Field,Soccer Stadium,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Club,Stables,Stadium,Supermarket,Tea Room,Tennis Court,Theater,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Veterinarian,Waterfront,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Ajuda,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040816,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.020408,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.020408,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040816,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Alcântra,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.042857,0.014286,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042857,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.014286,0.014286,0.0,0.0,0.0,0.014286,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.014286,0.028571,0.0,0.014286,0.0,0.0,0.014286,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Alvalade,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.03,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Areeiro,0.010309,0.0,0.0,0.0,0.0,0.0,0.0,0.010309,0.0,0.0,0.0,0.0,0.0,0.041237,0.061856,0.0,0.0,0.0,0.0,0.0,0.010309,0.0,0.0,0.030928,0.0,0.0,0.0,0.0,0.010309,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010309,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010309,0.0,0.0,0.0,0.0,0.010309,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010309,0.0,0.0,0.010309,0.0,0.0,0.0,0.010309,0.020619,0.020619,0.0,0.0,0.0,0.0,0.010309,0.010309,0.0,0.0,0.0,0.0,0.082474,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010309,0.0,0.0,0.0,0.010309,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030928,0.0,0.010309,0.010309,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010309,0.0,0.010309,0.0,0.0,0.010309,0.0,0.0,0.0,0.0,0.0,0.0,0.030928,0.010309,0.0,0.010309,0.0,0.0,0.0,0.010309,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Arroios,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0
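
Since each row of the one-hot table carries a single 1 in the column of the venue's category, the per-parish mean is simply the share of that parish's venues falling in each category. A minimal sketch on made-up data (the toy venues and the names `toy_venues`, `toy_onehot`, `toy_grouped` and `Venue Category` are illustrative assumptions, not taken from the notebook):

```python
import pandas as pd

# Hypothetical toy data: one row per venue.
toy_venues = pd.DataFrame({
    "Parish": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Bakery", "Bakery", "Hotel", "Park", "Hotel", "Hotel"],
})

# One-hot encode the category, then put the parish name back in front.
toy_onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=float)
toy_onehot.insert(0, "Parish", toy_venues["Parish"])

# Mean of the 0/1 indicators per parish = relative frequency of each category.
toy_grouped = toy_onehot.groupby("Parish").mean().reset_index()
print(toy_grouped)
```

Each row of the grouped table sums to 1, so parishes with very different venue counts become comparable.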


In [54]:
num_top_venues = 5

for hood in lisbon_grouped['Parish']:
    print("----"+hood+"----")
    temp = lisbon_grouped[lisbon_grouped['Parish'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

---- Ajuda----
              venue  freq
0            Bakery  0.04
1       Supermarket  0.04
2          Pie Shop  0.02
3  Capitol Building  0.02
4    Breakfast Spot  0.02


---- Alcântra----
          venue  freq
0         Plaza  0.04
1  Dessert Shop  0.04
2        Bakery  0.04
3   Snack Place  0.03
4        Museum  0.03


---- Alvalade----
                  venue  freq
0                 Hotel  0.04
1                Bakery  0.03
2             Bookstore  0.03
3  Gym / Fitness Center  0.03
4                 Plaza  0.03


---- Areeiro----
         venue  freq
0        Hotel  0.08
1          Bar  0.06
2       Bakery  0.04
3    Bookstore  0.03
4  Supermarket  0.03


---- Arroios----
            venue  freq
0           Hotel  0.07
1  Scenic Lookout  0.04
2           Plaza  0.03
3         Theater  0.03
4  Breakfast Spot  0.02


---- Avenidas Novas----
                  venue  freq
0                Bakery  0.04
1                   Bar  0.03
2                 Hotel  0.03
3  Gym / Fitness Center

In [55]:
num_top_venues = 7

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Parish']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no 'st'/'nd'/'rd' suffix left: fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
parish_venues_sorted = pd.DataFrame(columns=columns)
parish_venues_sorted['Parish'] = lisbon_grouped['Parish']

for ind in np.arange(lisbon_grouped.shape[0]):
    parish_venues_sorted.iloc[ind, 1:] = return_most_common_venues(lisbon_grouped.iloc[ind, :], num_top_venues)

parish_venues_sorted.head()

Unnamed: 0,Parish,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
0,Ajuda,Supermarket,Bakery,Pie Shop,Soccer Stadium,Historic Site,History Museum,Hostel
1,Alcântra,Dessert Shop,Bakery,Plaza,Nightclub,Snack Place,Museum,Lounge
2,Alvalade,Hotel,Bar,Bakery,Gym / Fitness Center,Bookstore,Plaza,Pub
3,Areeiro,Hotel,Bar,Bakery,Supermarket,Bookstore,Plaza,Gym
4,Arroios,Hotel,Scenic Lookout,Plaza,Theater,Arts & Crafts Store,Garden,Beer Bar
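
The helper `return_most_common_venues` is called in the cell above but is not shown in this part of the notebook. A minimal sketch of what such a helper could look like, assuming it receives one row of `lisbon_grouped` (parish name first, category frequencies after) and returns the names of the categories with the highest frequencies; the actual definition may differ:

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    """Return the names of the most frequent venue categories in a row.

    Assumes the first entry of `row` is the parish name and the remaining
    entries are numeric frequencies (an assumption about the helper).
    """
    row_categories = row.iloc[1:].astype(float)
    row_sorted = row_categories.sort_values(ascending=False)
    return row_sorted.index.values[0:num_top_venues]

# Hypothetical row in the same layout as lisbon_grouped.
toy_row = pd.Series({"Parish": "X", "Bakery": 0.04, "Hotel": 0.08,
                     "Park": 0.0, "Bar": 0.06})
print(return_most_common_venues(toy_row, 3))
```

When many categories share the same frequency, the order among ties depends on the sorting algorithm, which may be why the printed top 5 and the table above list tied categories in a different order.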


In [56]:
# set number of clusters
kclusters = 7

lisbon_grouped_clustering = lisbon_grouped.drop('Parish', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(lisbon_grouped_clustering)

# cluster labels generated for each parish in the dataframe
kmeans.labels_[0:24]

array([6, 4, 4, 4, 0, 4, 2, 4, 1, 4, 4, 1, 6, 5, 4, 6, 4, 1, 0, 5, 0, 3,
       1, 0], dtype=int32)
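
The size of each cluster can be read directly from the label array. A small sketch using the labels printed above (copied by hand; `cluster_sizes` is an illustrative name):

```python
import numpy as np

# Labels copied from the kmeans.labels_ output above (one per parish).
labels = np.array([6, 4, 4, 4, 0, 4, 2, 4, 1, 4, 4, 1, 6, 5, 4, 6, 4, 1, 0, 5, 0, 3,
                   1, 0])

# cluster_sizes[k] is the number of parishes assigned to cluster k.
cluster_sizes = np.bincount(labels, minlength=7)
print(cluster_sizes)  # [4 4 1 1 9 2 3]
```

Cluster 4 holds 9 of the 24 parishes, while clusters 2 and 3 contain a single parish each.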

In [57]:
# add clustering labels
parish_venues_sorted.insert(0,'Cluster Labels', kmeans.labels_)

lisbon_merged = dfpc

# merge data on lisbon parishes with data on parishes venues
lisbon_merged = lisbon_merged.join(parish_venues_sorted.set_index('Parish'), on='Parish')

lisbon_merged.head() 

Unnamed: 0,Number,Parish,Population 2013,Area km^2,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
0,1,Ajuda,15617,2.88,38.705781,-9.201325,6,Supermarket,Bakery,Pie Shop,Soccer Stadium,Historic Site,History Museum,Hostel
1,2,Alcântra,13943,5.07,38.704844,-9.179878,4,Dessert Shop,Bakery,Plaza,Nightclub,Snack Place,Museum,Lounge
2,3,Alvalade,31813,5.34,38.746914,-9.140536,4,Hotel,Bar,Bakery,Gym / Fitness Center,Bookstore,Plaza,Pub
3,4,Areeiro,20131,1.74,38.745119,-9.142653,4,Hotel,Bar,Bakery,Supermarket,Bookstore,Plaza,Gym
4,5,Arroios,31653,2.13,38.721008,-9.134658,0,Hotel,Scenic Lookout,Plaza,Theater,Arts & Crafts Store,Garden,Beer Bar


In [58]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(lisbon_merged['Latitude'], lisbon_merged['Longitude'],
                                  lisbon_merged['Parish'], lisbon_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    # cluster-1 is -1 for label 0, which wraps around to the last colour,
    # so every cluster still gets its own colour
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=1.).add_to(map_clusters)
       
map_clusters

In [59]:
map_clusters.save("k7_varr_nornoc.html")

In [60]:
lisbon_merged.loc[lisbon_merged['Cluster Labels'] == 0, lisbon_merged.columns[[1]\
                    + list(range(6, lisbon_merged.shape[1]))]]

Unnamed: 0,Parish,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
4,Arroios,0,Hotel,Scenic Lookout,Plaza,Theater,Arts & Crafts Store,Garden,Beer Bar
18,Penha de França,0,Hotel,Hostel,Supermarket,Plaza,Bakery,Gym / Fitness Center,Scenic Lookout
20,Santa Maria Maior,0,Plaza,Hostel,Wine Bar,Hotel,Bar,Scenic Lookout,Theater
23,São Vicente,0,Hotel,Scenic Lookout,Plaza,Theater,Bar,Wine Bar,Bakery


In [61]:
lisbon_merged.loc[lisbon_merged['Cluster Labels'] == 1, lisbon_merged.columns[[1]\
                                + list(range(6, lisbon_merged.shape[1]))]]

Unnamed: 0,Parish,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
8,Benfica,1,Park,Pet Store,Theater,Soccer Stadium,Gym,Sporting Goods Shop,Bakery
11,Carnide,1,Soccer Stadium,Sporting Goods Shop,Garden,Bakery,Theater,Toy / Game Store,Electronics Store
17,Parque das Nações,1,Electronics Store,Bar,Hotel,Lounge,Sporting Goods Shop,Cosmetics Shop,Gym / Fitness Center
22,São Domingos de Benfica,1,Clothing Store,Sporting Goods Shop,Gym,Electronics Store,Soccer Stadium,Cosmetics Shop,Toy / Game Store


In [62]:
lisbon_merged.loc[lisbon_merged['Cluster Labels'] == 2, lisbon_merged.columns[[1]\
                                + list(range(6, lisbon_merged.shape[1]))]]

Unnamed: 0,Parish,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
6,Beato,2,Theater,Museum,Supermarket,Cafeteria,Dance Studio,Electronics Store,Duty-free Shop


In [63]:
lisbon_merged.loc[lisbon_merged['Cluster Labels'] == 3, lisbon_merged.columns[[1]\
                                + list(range(6, lisbon_merged.shape[1]))]]

Unnamed: 0,Parish,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
21,Santo António,3,Hotel,Bakery,Hotel Bar,Gym,Hostel,Snack Place,Bar


In [64]:
lisbon_merged.loc[lisbon_merged['Cluster Labels'] == 4, lisbon_merged.columns[[1]\
                                + list(range(6, lisbon_merged.shape[1]))]]

Unnamed: 0,Parish,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
1,Alcântra,4,Dessert Shop,Bakery,Plaza,Nightclub,Snack Place,Museum,Lounge
2,Alvalade,4,Hotel,Bar,Bakery,Gym / Fitness Center,Bookstore,Plaza,Pub
3,Areeiro,4,Hotel,Bar,Bakery,Supermarket,Bookstore,Plaza,Gym
5,Avenidas Novas,4,Bakery,Bar,Hotel,Gym / Fitness Center,Gourmet Shop,Grocery Store,Art Museum
7,Belém,4,Monument / Landmark,Bakery,Park,Garden,Sandwich Place,Museum,Hotel
9,Campo de Ourique,4,Bar,Bakery,Pet Store,Pool,Market,Furniture / Home Store,Garden
10,Campolide,4,Bakery,Hotel,Hotel Bar,Bar,Historic Site,Furniture / Home Store,Shopping Mall
14,Marvila,4,Hotel,Supermarket,Brewery,Gym / Fitness Center,Plaza,Art Gallery,Climbing Gym
16,Olivais,4,Rental Car Location,Bakery,Airport Service,Grocery Store,Airport Lounge,Hotel,Electronics Store


In [65]:
lisbon_merged.loc[lisbon_merged['Cluster Labels'] == 5, lisbon_merged.columns[[1]\
                                + list(range(6, lisbon_merged.shape[1]))]]

Unnamed: 0,Parish,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
13,Lumiar,5,Bakery,Supermarket,Park,Gym / Fitness Center,Airport Terminal,Plaza,Sporting Goods Shop
19,Santa Clara,5,Supermarket,Gym / Fitness Center,Park,Grocery Store,Gym Pool,Photography Lab,Plaza


In [66]:
lisbon_merged.loc[lisbon_merged['Cluster Labels'] == 6, lisbon_merged.columns[[1]\
                                + list(range(6, lisbon_merged.shape[1]))]]

Unnamed: 0,Parish,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
0,Ajuda,6,Supermarket,Bakery,Pie Shop,Soccer Stadium,Historic Site,History Museum,Hostel
12,Estrela,6,Lounge,Garden,Breakfast Spot,Hotel,Bar,Bakery,Scenic Lookout
15,Misericórdia,6,Wine Bar,Lounge,Bar,Garden,Deli / Bodega,Hostel,Bakery
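
The seven cells above repeat the same selection with a different label. As a sketch, the same listing could be produced in one loop with `groupby`; the small frame below is a stand-in for `lisbon_merged` built from a few rows shown above, and `toy_merged` and `cluster_members` are illustrative names:

```python
import pandas as pd

# Small stand-in for lisbon_merged, using a few rows from the outputs above.
toy_merged = pd.DataFrame({
    "Parish": ["Ajuda", "Arroios", "Beato", "Estrela", "São Vicente"],
    "Cluster Labels": [6, 0, 2, 6, 0],
    "1st Most Common Venue": ["Supermarket", "Hotel", "Theater", "Lounge", "Hotel"],
})

# One sub-table per cluster, instead of one cell per cluster.
for label, members in toy_merged.groupby("Cluster Labels"):
    print(f"Cluster {label}")
    print(members.drop(columns="Cluster Labels").to_string(index=False))
    print()

# Compact view: cluster label -> list of parishes.
cluster_members = toy_merged.groupby("Cluster Labels")["Parish"].apply(list).to_dict()
```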


The results are discussed in the associated report.

End of Notebook 2.