# Investing in a property in Lisbon, Portugal

This notebook is my capstone project for the IBM Data Science Professional Certificate course. For this assignment each student could come up with a problem, that could be solved by leveraging Foursquare location data and machine learning algorithms.

## Business Problem

Portugal has seen a huge increase in tourists in the past few years, especially in Lisbon, the biggest city in the country that has been competing with Porto as the number one destination in Portugal. Due to this growth in the tourism sector in Lisbon, more and more investors are now looking into investing in real estate in this city. Now they are looking to identify those neighborhoods that have the most interesting venues that makes them better for living and therefore a better investment and potential residency.

## Data

The data needed to solve this problem is: 

- List of neighborhoods in Lisbon. We get the list on https://en.wikipedia.org/wiki/Category:Parishes_of_Lisbon and create our neighborhoods data frame.
- Coordinates of the neighborhoods. These are obtained with geopy library based on district names and added to the data frame.
- Information about the venues and landmarks in the neighborhoods. The Foursquare API is used to collect this information. 

In [155]:
district_names = ['Ajuda','Alcântara','Alvalade','Areeiro','Arroios','Beato','Belém','Benfica','Campo de Ourique','Campolide','Carnide','Estrela','Lumiar','Marvila','Misericórdia','Olivais','Parque das Nações','Penha de França','Santa Clara (Lisbon)','Santa Maria Maior','São Domingos de Benfica','São Vicente']

Number of districts in Lisbon

In [156]:
len(district_names)

22

Importing pachakes required

In [142]:
!conda install -c conda-forge geopy --yes
!conda install -c conda-forge folium=0.5.0 --yes

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.



Getting the coordinates of Lisbon

In [157]:
# convert an address into latitude and longitude values
from geopy.geocoders import Nominatim

# map rendering library
import folium

city_address = 'Lisbon, Portugal'

geolocator = Nominatim(user_agent='lisbon_explorer')
location = geolocator.geocode(city_address)
city_latitude = location.latitude
city_longitude = location.longitude

print(f'The geograpical coordinate of Lisbon are {city_latitude}, {city_longitude}.')

The geograpical coordinate of Lisbon are 38.7077507, -9.1365919.


Getting the coordinates of each district

In [177]:
del district_locations
get_location = lambda name: geolocator.geocode(f'{name}, Lisbon, Portugal')
district_locations = []
for name in district_names:
    loc = get_location(name)
    if loc == None:
        print(f'Could not find location for {name}')
    else:
        district_locations.append((name, loc.latitude, loc.longitude))
district_locations

Could not find location for Santa Clara (Lisbon)


[('Ajuda', 38.71232685, -9.20124086834067),
 ('Alcântara', 38.7031126, -9.1806854),
 ('Alvalade', 38.753034, -9.1439777),
 ('Areeiro', 38.7423786, -9.1333962),
 ('Arroios', 38.731932, -9.1342465),
 ('Beato', 38.7326216, -9.110239570540049),
 ('Belém', 38.6977695, -9.2094318),
 ('Benfica', 38.7443647, -9.199569147811829),
 ('Campo de Ourique', 38.718212699999995, -9.16522274104384),
 ('Campolide', 38.73182715, -9.167911166284306),
 ('Carnide', 38.7592057, -9.1926491),
 ('Estrela', 38.7129797, -9.158298),
 ('Lumiar', 38.7727302, -9.160113),
 ('Marvila', 38.7460112, -9.1056191),
 ('Misericórdia', 38.7106835, -9.148208839964797),
 ('Olivais', 38.77032355, -9.125823414435962),
 ('Parque das Nações', 38.7750147, -9.0972562382713),
 ('Penha de França', 38.7261609, -9.1269126),
 ('Santa Maria Maior', 38.71244015, -9.13281443966602),
 ('São Domingos de Benfica', 38.74620965, -9.176214740466666),
 ('São Vicente', 38.7155455, -9.1234337)]

In [178]:
# Add location for Santa Clara and Santo António manually 
district_locations.append(('Santo António', 38.7233, 9.1483))
district_locations.append(('Santa Clara', 38.7168, 9.1251))
district_locations

[('Ajuda', 38.71232685, -9.20124086834067),
 ('Alcântara', 38.7031126, -9.1806854),
 ('Alvalade', 38.753034, -9.1439777),
 ('Areeiro', 38.7423786, -9.1333962),
 ('Arroios', 38.731932, -9.1342465),
 ('Beato', 38.7326216, -9.110239570540049),
 ('Belém', 38.6977695, -9.2094318),
 ('Benfica', 38.7443647, -9.199569147811829),
 ('Campo de Ourique', 38.718212699999995, -9.16522274104384),
 ('Campolide', 38.73182715, -9.167911166284306),
 ('Carnide', 38.7592057, -9.1926491),
 ('Estrela', 38.7129797, -9.158298),
 ('Lumiar', 38.7727302, -9.160113),
 ('Marvila', 38.7460112, -9.1056191),
 ('Misericórdia', 38.7106835, -9.148208839964797),
 ('Olivais', 38.77032355, -9.125823414435962),
 ('Parque das Nações', 38.7750147, -9.0972562382713),
 ('Penha de França', 38.7261609, -9.1269126),
 ('Santa Maria Maior', 38.71244015, -9.13281443966602),
 ('São Domingos de Benfica', 38.74620965, -9.176214740466666),
 ('São Vicente', 38.7155455, -9.1234337),
 ('Santo António', 38.7233, 9.1483),
 ('Santa Clara', 38.7

Let’s take a look at Lisbon and its districts on the map

In [179]:
lisbon_map = folium.Map(location=[city_latitude, city_longitude], zoom_start=11.5)

for name, lat, lng in district_locations:
    label = folium.Popup(name, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(lisbon_map)  
    
lisbon_map

In [180]:
#4square credentials
CLIENT_ID = '0S0MX5RWWDSBVSD4SK3SPPAU5TU1Q1K40CIA21VLIDNNFECL' # your Foursquare ID
CLIENT_SECRET = 'NQ5PNKOTTO3CLFJVWBAZMMFDGDHWJ2ZRBXRWHX4G1TUA4FI1' # your Foursquare Secret
VERSION = '20200404'
LIMIT = 10
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

# defining radius and limit of venues to get
radius=500
LIMIT=100

Your credentails:
CLIENT_ID: 0S0MX5RWWDSBVSD4SK3SPPAU5TU1Q1K40CIA21VLIDNNFECL
CLIENT_SECRET:NQ5PNKOTTO3CLFJVWBAZMMFDGDHWJ2ZRBXRWHX4G1TUA4FI1


Explore Neighborhoods in Lisbon

The idea is to search for all the nightlife spots in lisbon. To do that, I’m gonna use venues/explore API and specify nightlife in the category

In [181]:
import requests
import pandas as pd

venues_list = []
categoryId = '4d4b7105d754a06376d81259' # nightlife category
for name, lat, lng in district_locations:
    # create the API request URL
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&categoryId={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION,
            categoryId,
            lat, 
            lng, 
            radius, 
            LIMIT)

    # make the GET request
    results = requests.get(url).json()['response']['groups'][0]['items']

    # return only relevant information for each nearby venue
    venues_list.append([(
        name, 
        lat, 
        lng, 
        v['venue']['name'], 
        v['venue']['location']['lat'], 
        v['venue']['location']['lng'],  
        v['venue']['categories'][0]['name']) for v in results])
    
nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
nearby_venues.columns = ['Neighborhood', 
              'Neighborhood Latitude', 
              'Neighborhood Longitude', 
              'Venue', 
              'Venue Latitude', 
              'Venue Longitude', 
              'Venue Category']

In [182]:
nearby_venues.head(10)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Alcântara,38.703113,-9.180685,LX Factory,38.703091,-9.178833,General Entertainment
1,Alcântara,38.703113,-9.180685,Rio Maravilha,38.701798,-9.178076,Portuguese Restaurant
2,Alcântara,38.703113,-9.180685,Radio Hotel,38.700596,-9.180995,Lounge
3,Alcântara,38.703113,-9.180685,Alcântara Café,38.703599,-9.176503,Nightclub
4,Alcântara,38.703113,-9.180685,Conga Club,38.70183,-9.177626,Bar
5,Alcântara,38.703113,-9.180685,Dogs,38.701937,-9.177866,Hot Dog Joint
6,Alcântara,38.703113,-9.180685,Central Da Avenida,38.702702,-9.178575,Wine Bar
7,Alcântara,38.703113,-9.180685,Vinyl Bar Cafe,38.698896,-9.182031,Bar
8,Alcântara,38.703113,-9.180685,Hawaii,38.700207,-9.176987,Nightclub
9,Alcântara,38.703113,-9.180685,Bosq,38.701781,-9.177723,Nightclub


What’s our total amount of nightlife venues

In [183]:
len(nearby_venues)

222

What’s our total amount of venue categories?

In [184]:
print('There are {} uniques categories.'.format(len(nearby_venues['Venue Category'].unique())))

There are 30 uniques categories.


Using the obtained data on nightlife places, I’m gonna compare the districts of Lisbon, applying K-means clustering approach.

In [185]:
nearby_venues['Venue Category'].unique()

array(['General Entertainment', 'Portuguese Restaurant', 'Lounge',
       'Nightclub', 'Bar', 'Hot Dog Joint', 'Wine Bar', 'Pub', 'Brewery',
       'Ice Cream Shop', 'Music Venue', 'Beer Garden', 'Beer Bar',
       'Gastropub', 'Juice Bar', 'Burger Joint', 'Café', 'Sports Bar',
       'Karaoke Bar', 'Coffee Shop', 'Cocktail Bar', 'Tea Room',
       'Tapas Restaurant', 'Speakeasy', 'Restaurant', 'Gay Bar',
       'Hotel Bar', 'Liquor Store', 'Roof Deck', 'Dive Bar'], dtype=object)

## Methodology

In this project I’m providing investores with information on Lisbon nightlife and helping to compare them. As a first step I’m going to analyze the most common nightlife venues in each district. Then I’m going to apply K-means clustering to put the districts in clusters for better comparison. Finally I’m going to compare each cluster.

### Analysis

#### Analyzing districts

Let’s see what venue categories exist in each district.

In [186]:
venues_onehot = pd.get_dummies(nearby_venues[['Venue Category']], prefix='', prefix_sep='')
venues_onehot['Neighborhood'] = nearby_venues['Neighborhood']
fixed_columns = [venues_onehot.columns[-1]] + list(venues_onehot.columns[:-1])
venues_onehot = venues_onehot[fixed_columns]
venues_onehot.head()

Unnamed: 0,Neighborhood,Bar,Beer Bar,Beer Garden,Brewery,Burger Joint,Café,Cocktail Bar,Coffee Shop,Dive Bar,...,Nightclub,Portuguese Restaurant,Pub,Restaurant,Roof Deck,Speakeasy,Sports Bar,Tapas Restaurant,Tea Room,Wine Bar
0,Alcântara,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Alcântara,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
2,Alcântara,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Alcântara,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
4,Alcântara,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Now let’s group venue occurances:

In [187]:
venues_grouped = venues_onehot.groupby('Neighborhood').mean().reset_index()
venues_grouped

Unnamed: 0,Neighborhood,Bar,Beer Bar,Beer Garden,Brewery,Burger Joint,Café,Cocktail Bar,Coffee Shop,Dive Bar,...,Nightclub,Portuguese Restaurant,Pub,Restaurant,Roof Deck,Speakeasy,Sports Bar,Tapas Restaurant,Tea Room,Wine Bar
0,Alcântara,0.15,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,...,0.3,0.05,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.1
1,Alvalade,0.6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Areeiro,0.25,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Arroios,0.4,0.2,0.0,0.2,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2
4,Beato,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Belém,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.222222,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111
6,Campo de Ourique,0.6,0.0,0.0,0.2,0.2,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Carnide,0.666667,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Estrela,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.25
9,Lumiar,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


The data listed above already gives some idea of the kind of places we can see in each district. Now we can see top 5 places per district:

In [188]:
import numpy as np

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 5
indicators = ['st', 'nd', 'rd']
columns = ['Neighborhood']

for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind + 1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind + 1))

venues_sorted = pd.DataFrame(columns=columns)
venues_sorted['Neighborhood'] = venues_grouped['Neighborhood']

for ind in np.arange(venues_grouped.shape[0]):
    venues_sorted.iloc[ind, 1:] = return_most_common_venues(venues_grouped.iloc[ind, :], num_top_venues)

venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Alcântara,Nightclub,Bar,Lounge,Wine Bar,Hot Dog Joint
1,Alvalade,Bar,Nightclub,Music Venue,Hotel Bar,Beer Bar
2,Areeiro,Bar,Beer Garden,Pub,Lounge,Hotel Bar
3,Arroios,Bar,Brewery,Beer Bar,Wine Bar,Speakeasy
4,Beato,Bar,Tea Room,Beer Bar,Beer Garden,Brewery
5,Belém,Gastropub,Nightclub,Wine Bar,Portuguese Restaurant,Juice Bar
6,Campo de Ourique,Bar,Brewery,Burger Joint,Tea Room,Beer Bar
7,Carnide,Bar,Café,Tea Room,Beer Bar,Beer Garden
8,Estrela,Bar,Sports Bar,Wine Bar,Tapas Restaurant,Hot Dog Joint
9,Lumiar,Brewery,Karaoke Bar,Wine Bar,Hotel Bar,Beer Bar


In [189]:
nearby_venues['Venue Category'].value_counts()

Bar                      90
Wine Bar                 30
Nightclub                18
Lounge                   16
Cocktail Bar             13
Café                      7
Pub                       7
Brewery                   6
Portuguese Restaurant     5
Gastropub                 5
Gay Bar                   3
Beer Bar                  3
General Entertainment     2
Roof Deck                 1
Dive Bar                  1
Ice Cream Shop            1
Beer Garden               1
Karaoke Bar               1
Restaurant                1
Burger Joint              1
Music Venue               1
Tea Room                  1
Sports Bar                1
Liquor Store              1
Hotel Bar                 1
Speakeasy                 1
Juice Bar                 1
Hot Dog Joint             1
Tapas Restaurant          1
Coffee Shop               1
Name: Venue Category, dtype: int64

### Clustering

In this section I’m gonna apply K-means clustering to group the districts into clusters. First of all let’s put the district location data into a dataframe:

In [190]:
district_data = pd.DataFrame(district_locations, columns=['Neighborhood', 'Latitude', 'Longitude'])
district_data

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Ajuda,38.712327,-9.201241
1,Alcântara,38.703113,-9.180685
2,Alvalade,38.753034,-9.143978
3,Areeiro,38.742379,-9.133396
4,Arroios,38.731932,-9.134246
5,Beato,38.732622,-9.11024
6,Belém,38.697769,-9.209432
7,Benfica,38.744365,-9.199569
8,Campo de Ourique,38.718213,-9.165223
9,Campolide,38.731827,-9.167911


Now we can do some clustering:

In [191]:
from sklearn.cluster import KMeans 

kclusters = 5

venues_grouped_clustering = venues_grouped.drop('Neighborhood', 1)

kmeans = KMeans(init="k-means++", n_clusters=kclusters, n_init=12).fit(venues_grouped_clustering)
venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
venues_merged = district_data
venues_merged = venues_merged.join(venues_sorted.set_index('Neighborhood'), on='Neighborhood')

venues_merged

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Ajuda,38.712327,-9.201241,,,,,,
1,Alcântara,38.703113,-9.180685,0.0,Nightclub,Bar,Lounge,Wine Bar,Hot Dog Joint
2,Alvalade,38.753034,-9.143978,3.0,Bar,Nightclub,Music Venue,Hotel Bar,Beer Bar
3,Areeiro,38.742379,-9.133396,4.0,Bar,Beer Garden,Pub,Lounge,Hotel Bar
4,Arroios,38.731932,-9.134246,3.0,Bar,Brewery,Beer Bar,Wine Bar,Speakeasy
5,Beato,38.732622,-9.11024,1.0,Bar,Tea Room,Beer Bar,Beer Garden,Brewery
6,Belém,38.697769,-9.209432,0.0,Gastropub,Nightclub,Wine Bar,Portuguese Restaurant,Juice Bar
7,Benfica,38.744365,-9.199569,,,,,,
8,Campo de Ourique,38.718213,-9.165223,1.0,Bar,Brewery,Burger Joint,Tea Room,Beer Bar
9,Campolide,38.731827,-9.167911,,,,,,


In [192]:
# Remove Ajuda and Campolide as they have no venues 
venues_merged = venues_merged.dropna()
venues_merged['Cluster Labels'] = venues_merged['Cluster Labels'].astype(int)
venues_merged

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  This is separate from the ipykernel package so we can avoid doing imports until


Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,Alcântara,38.703113,-9.180685,0,Nightclub,Bar,Lounge,Wine Bar,Hot Dog Joint
2,Alvalade,38.753034,-9.143978,3,Bar,Nightclub,Music Venue,Hotel Bar,Beer Bar
3,Areeiro,38.742379,-9.133396,4,Bar,Beer Garden,Pub,Lounge,Hotel Bar
4,Arroios,38.731932,-9.134246,3,Bar,Brewery,Beer Bar,Wine Bar,Speakeasy
5,Beato,38.732622,-9.11024,1,Bar,Tea Room,Beer Bar,Beer Garden,Brewery
6,Belém,38.697769,-9.209432,0,Gastropub,Nightclub,Wine Bar,Portuguese Restaurant,Juice Bar
8,Campo de Ourique,38.718213,-9.165223,1,Bar,Brewery,Burger Joint,Tea Room,Beer Bar
10,Carnide,38.759206,-9.192649,3,Bar,Café,Tea Room,Beer Bar,Beer Garden
11,Estrela,38.71298,-9.158298,3,Bar,Sports Bar,Wine Bar,Tapas Restaurant,Hot Dog Joint
12,Lumiar,38.77273,-9.160113,2,Brewery,Karaoke Bar,Wine Bar,Hotel Bar,Beer Bar


### Visualize the clusters

In [176]:
import matplotlib.cm as cm
import matplotlib.colors as colors

map_clusters = folium.Map(location=[city_latitude, city_longitude], zoom_start=11.3)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(venues_merged['Latitude'], venues_merged['Longitude'], venues_merged['Neighborhood'], venues_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Having our 5 clusters on the map, we can see one interesting observation: the districts within a cluster aren’t necesserily close to each other geographically. For example districts in Cluster 1, 2 and 3 are not in the same parts of the city.

Let’s do further analysis of each cluster to see their key features.

In [193]:
def show_cluster(cluster_id):
    return venues_merged.loc[venues_merged['Cluster Labels'] == cluster_id, venues_merged.columns[[0] + list(range(4, venues_merged.shape[1]))]]

#### Cluster 0

In [194]:
show_cluster(0)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,Alcântara,Nightclub,Bar,Lounge,Wine Bar,Hot Dog Joint
6,Belém,Gastropub,Nightclub,Wine Bar,Portuguese Restaurant,Juice Bar
13,Marvila,Bar,Nightclub,Hotel Bar,Beer Bar,Beer Garden
16,Parque das Nações,Café,Bar,Nightclub,Lounge,Hotel Bar


In these districts we will mostly find nightclubs and different kinds of bars. This would be a good location to open a bar

#### Cluster 1

In [195]:
show_cluster(1)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
5,Beato,Bar,Tea Room,Beer Bar,Beer Garden,Brewery
8,Campo de Ourique,Bar,Brewery,Burger Joint,Tea Room,Beer Bar
19,São Domingos de Benfica,Bar,Brewery,Tea Room,Beer Bar,Beer Garden


In these districts we find mostly bars, tea rooms and breweries. These are places with bars already but no nightclubs so not the best place to open a bar.

#### Cluster 2

In [200]:
show_cluster(2)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
12,Lumiar,Brewery,Karaoke Bar,Wine Bar,Hotel Bar,Beer Bar


This district has a karaoke bar and some bars and breweries. Not the best place to open a bar

#### Cluster 3

In [202]:
show_cluster(3)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
2,Alvalade,Bar,Nightclub,Music Venue,Hotel Bar,Beer Bar
4,Arroios,Bar,Brewery,Beer Bar,Wine Bar,Speakeasy
10,Carnide,Bar,Café,Tea Room,Beer Bar,Beer Garden
11,Estrela,Bar,Sports Bar,Wine Bar,Tapas Restaurant,Hot Dog Joint
14,Misericórdia,Bar,Cocktail Bar,Wine Bar,Nightclub,Lounge
18,Santa Maria Maior,Bar,Wine Bar,Portuguese Restaurant,Lounge,Café
20,São Vicente,Bar,Wine Bar,Portuguese Restaurant,Nightclub,Café


These districts have bars and some nightclubs and cafés. These would be good locations to open a bar too.

#### Cluster 4

In [201]:
show_cluster(4)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
3,Areeiro,Bar,Beer Garden,Pub,Lounge,Hotel Bar
15,Olivais,Hotel Bar,Pub,Lounge,Wine Bar,Beer Bar
17,Penha de França,Lounge,General Entertainment,Wine Bar,Hotel Bar,Beer Bar


These districts have lounges, hotel bars and pubs. Not the best place to open a bar since it has to nightclubs

## Results and Discussion

My analysis shows that districts in Lisbon have different nighlife spots, and therefore it’s possible to cluster them in various categories.

I learned that the most common in Lisbon  are bars, nightclubs and cafés.

Bars are prevailing in the city center, while restaurants are prevailing outside the city centre. Interestingly, only two clusters showed a good number of nightclubs.

Finally, I’d like to note that the analysis was done on the places returned by Foursquare API with 'nightlife spot' category. Analyzing nightlife spots for other purposed may give a different analysis, but the goal of this project was to focus on places to open a pre-drinking bar, so it was important that it was located near nightclubs.


## Conclusion


This project help nightlife investores to better understand the city districts from a nightlife perspective. I grouped similar districts into clusters and showed their differences. There’s no better cluster, everyone can decide what’s best for them, because each cluster has many nightlife spots. The project can be used as part of a bigger study on Lisbon districts to find different investment oppotunities using another category, not just bars but many venue categories.