## Introduction

Located on the north coast of Portugal and famous for its food, wine and beaches, Porto has become one of the most attractive destinations worldwide. This is reflected not only in the prizes it has won (World Travel Awards: https://www.worldtravelawards.com/profile-34429-porto-tourism) but also in the number of tourist arrivals, which grew by over 70% from 2013 to 2019.

This trend has been accompanied by new investments in Porto across diverse sectors, from real estate to the opening of subsidiaries and offices of many multinational companies in the city.

With that in mind, in this section we will examine the different municipalities in the Porto district and select the best locations to open an Italian restaurant.

## Data overview

To find the best municipalities in which to open the restaurant, we will start by retrieving their latitudes and longitudes from https://simplemaps.com/data/pt-cities. The same file also gives the number of people living in each municipality, which can be a good indicator of market size. We do this by reading the CSV file and cleaning the data.

As we will see, the Porto district is composed of 18 municipalities and has roughly 2 million inhabitants.

The second step will be to get the venue data for each municipality using the Foursquare API.

In [110]:
import pandas as pd
import requests
import numpy as np

In [111]:
pt = pd.read_csv(r'C:\Users\nunos\Downloads\pt.csv')

pt.head()

Unnamed: 0,city,lat,lng,country,iso2,admin_name,capital,population,population_proper
0,Lisbon,38.7452,-9.1604,Portugal,PT,Lisboa,primary,506654.0,506654.0
1,Vila Nova de Gaia,41.1333,-8.6167,Portugal,PT,Porto,minor,302295.0,302295.0
2,Porto,41.1495,-8.6108,Portugal,PT,Porto,admin,237591.0,237591.0
3,Braga,41.5333,-8.4167,Portugal,PT,Braga,admin,181494.0,181494.0
4,Matosinhos,41.183,-8.67977,Portugal,PT,Porto,minor,175478.0,175478.0


In [112]:
# The first step will be to drop the unnecessary columns
pt.drop(columns = ['iso2', 'country', 'population_proper'], inplace = True)

# In the second step we will rename the columns for easier visualization
pt.rename(columns = {'city': 'City',
                    'lat': 'Latitude',
                    'lng': 'Longitude',
                    'admin_name': 'District',
                    'capital': 'City type',
                    'population': 'Population'
                    },
         inplace = True
         )

# Check the dataframe
pt.head()

Unnamed: 0,City,Latitude,Longitude,District,City type,Population
0,Lisbon,38.7452,-9.1604,Lisboa,primary,506654.0
1,Vila Nova de Gaia,41.1333,-8.6167,Porto,minor,302295.0
2,Porto,41.1495,-8.6108,Porto,admin,237591.0
3,Braga,41.5333,-8.4167,Braga,admin,181494.0
4,Matosinhos,41.183,-8.67977,Porto,minor,175478.0


In [113]:
# Retrieve only the cities located in the Porto district

pt_porto = pt.loc[pt['District'] == 'Porto']

# Checking our dataset

pt_porto.head()

Unnamed: 0,City,Latitude,Longitude,District,City type,Population
1,Vila Nova de Gaia,41.1333,-8.6167,Porto,minor,302295.0
2,Porto,41.1495,-8.6108,Porto,admin,237591.0
4,Matosinhos,41.183,-8.67977,Porto,minor,175478.0
8,Gondomar,41.15,-8.5333,Porto,minor,168027.0
13,Maia,41.2333,-8.6167,Porto,minor,135306.0


In [114]:
pt_porto.set_index('City')

City,Latitude,Longitude,District,City type,Population
Vila Nova de Gaia,41.1333,-8.6167,Porto,minor,302295.0
Porto,41.1495,-8.6108,Porto,admin,237591.0
Matosinhos,41.183,-8.67977,Porto,minor,175478.0
Gondomar,41.15,-8.5333,Porto,minor,168027.0
Maia,41.2333,-8.6167,Porto,minor,135306.0
Valongo,41.1833,-8.5,Porto,minor,93858.0
Paredes,41.2,-8.3333,Porto,minor,86854.0
Vila do Conde,41.35,-8.75,Porto,minor,79533.0
Penafiel,41.2,-8.2833,Porto,minor,72265.0
Póvoa de Varzim,41.3916,-8.7571,Porto,minor,63408.0


In [115]:
# Let's drop the observations that have NaN

pt_porto = pt_porto.dropna(subset = ['City type']).reset_index(drop = True)

In [116]:

pt_porto

Unnamed: 0,City,Latitude,Longitude,District,City type,Population
0,Vila Nova de Gaia,41.1333,-8.6167,Porto,minor,302295.0
1,Porto,41.1495,-8.6108,Porto,admin,237591.0
2,Matosinhos,41.183,-8.67977,Porto,minor,175478.0
3,Gondomar,41.15,-8.5333,Porto,minor,168027.0
4,Maia,41.2333,-8.6167,Porto,minor,135306.0
5,Valongo,41.1833,-8.5,Porto,minor,93858.0
6,Paredes,41.2,-8.3333,Porto,minor,86854.0
7,Vila do Conde,41.35,-8.75,Porto,minor,79533.0
8,Penafiel,41.2,-8.2833,Porto,minor,72265.0
9,Póvoa de Varzim,41.3916,-8.7571,Porto,minor,63408.0
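The municipality count and total population quoted in the data overview can be checked directly on the cleaned frame. The sketch below is a minimal, self-contained illustration: it rebuilds only the ten rows displayed above (the remaining rows of the district are not shown in this output), so its totals cover that subset rather than the whole district. The name `sample` is an assumption introduced for the example.

```python
import pandas as pd

# Only the ten rows visible in the output above; the full pt_porto frame has more.
sample = pd.DataFrame({
    'City': ['Vila Nova de Gaia', 'Porto', 'Matosinhos', 'Gondomar', 'Maia',
             'Valongo', 'Paredes', 'Vila do Conde', 'Penafiel', 'Póvoa de Varzim'],
    'Population': [302295.0, 237591.0, 175478.0, 168027.0, 135306.0,
                   93858.0, 86854.0, 79533.0, 72265.0, 63408.0],
})

n_cities = sample['City'].nunique()                 # number of distinct municipalities
total_population = int(sample['Population'].sum())  # market-size proxy for this subset

print(n_cities, total_population)
```

Running the same two expressions on `pt_porto` would give the figures for the whole district.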


In [117]:
# Defining Foursquare credentials
# (placeholders: real keys should not be stored in a shared notebook)

CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20210515'

In [118]:
def getNearbyVenues(names, latitudes, longitudes):
    radius=1000
    LIMIT=100
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
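        # make the GET request, failing early on timeouts or HTTP errors
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        results = response.json()["response"]['groups'][0]['items']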
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
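The parsing step inside `getNearbyVenues` assumes every venue has at least one category, and would raise an `IndexError` otherwise. A minimal sketch of a more defensive version follows. The helper name `parse_items`, the `'Unknown'` fallback label and the hand-written `fake_items` payload are assumptions made for illustration; the payload only mimics the nested `venue` / `location` / `categories` layout that the function above reads.

```python
def parse_items(name, lat, lng, items):
    """Flatten Foursquare-style 'items' into rows, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to a placeholder label instead of failing on an empty list.
        category = categories[0]['name'] if categories else 'Unknown'
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


# Hand-written payload for illustration only (not real API output).
fake_items = [
    {'venue': {'name': 'Venue A', 'location': {'lat': 41.13, 'lng': -8.61},
               'categories': [{'name': 'Winery'}]}},
    {'venue': {'name': 'Venue B', 'location': {'lat': 41.14, 'lng': -8.62},
               'categories': []}},
]

rows = parse_items('Vila Nova de Gaia', 41.1333, -8.6167, fake_items)
print(rows)
```

Each tuple has the same seven fields, in the same order, as the columns assigned to `nearby_venues`.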

In [119]:
venues_portugal = getNearbyVenues(names = pt_porto['City'],
                                   latitudes = pt_porto['Latitude'],
                                   longitudes = pt_porto['Longitude']
                                  )

Vila Nova de Gaia
Porto
Matosinhos
Gondomar
Maia
Valongo
Paredes
Vila do Conde
Penafiel
Póvoa de Varzim
Felgueiras
Paços de Ferreira
Amarante
Marco de Canavezes
Lousada
Trofa
Baião
Santo Tirso


In [120]:
# Get the venues for each city

venues_portugal.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Vila Nova de Gaia,41.1333,-8.6167,Croft Port,41.134585,-8.614832,Wine Shop
1,Vila Nova de Gaia,41.1333,-8.6167,Caves Taylor's,41.134341,-8.614405,Winery
2,Vila Nova de Gaia,41.1333,-8.6167,The Yeatman,41.133652,-8.612981,Hotel
3,Vila Nova de Gaia,41.1333,-8.6167,Barão Fladgate,41.134561,-8.614298,Portuguese Restaurant
4,Vila Nova de Gaia,41.1333,-8.6167,Yeatman Restaurant,41.133967,-8.613222,Restaurant


In [121]:
# Get the total number of venues by city

venues_portugal.groupby(by = 'Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Amarante,12,12,12,12,12,12
Baião,4,4,4,4,4,4
Felgueiras,1,1,1,1,1,1
Gondomar,4,4,4,4,4,4
Lousada,1,1,1,1,1,1
Maia,21,21,21,21,21,21
Marco de Canavezes,24,24,24,24,24,24
Matosinhos,3,3,3,3,3,3
Paredes,8,8,8,8,8,8
Paços de Ferreira,6,6,6,6,6,6


In [122]:
# One-hot encoding -> converting the venue categories into dummy variables

porto_1h = pd.get_dummies(venues_portugal[['Venue Category']], prefix = '', prefix_sep = '')
porto_1h.head()

Unnamed: 0,Art Gallery,Arts & Crafts Store,Asian Restaurant,Auto Garage,BBQ Joint,Bakery,Bar,Beach,Beer Bar,Beer Garden,...,Theater,Theme Park,Theme Restaurant,Toll Plaza,Train Station,Waterfront,Wine Bar,Wine Shop,Winery,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [123]:
# Add the City as our first column

porto_1h.insert(loc=0, column = 'City', value = venues_portugal['Neighborhood'])
porto_1h.head()

Unnamed: 0,City,Art Gallery,Arts & Crafts Store,Asian Restaurant,Auto Garage,BBQ Joint,Bakery,Bar,Beach,Beer Bar,...,Theater,Theme Park,Theme Restaurant,Toll Plaza,Train Station,Waterfront,Wine Bar,Wine Shop,Winery,Yoga Studio
0,Vila Nova de Gaia,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
1,Vila Nova de Gaia,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
2,Vila Nova de Gaia,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Vila Nova de Gaia,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Vila Nova de Gaia,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [124]:
# Now let's group it by city and take the mean frequency of each category

porto_1h = porto_1h.groupby('City').mean().reset_index()
porto_1h

Unnamed: 0,City,Art Gallery,Arts & Crafts Store,Asian Restaurant,Auto Garage,BBQ Joint,Bakery,Bar,Beach,Beer Bar,...,Theater,Theme Park,Theme Restaurant,Toll Plaza,Train Station,Waterfront,Wine Bar,Wine Shop,Winery,Yoga Studio
0,Amarante,0.0,0.0,0.0,0.0,0.0,0.083333,0.083333,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Baião,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Felgueiras,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Gondomar,0.0,0.0,0.0,0.0,0.25,0.25,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Lousada,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Maia,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,...,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0
6,Marco de Canavezes,0.0,0.0,0.0,0.0,0.0,0.083333,0.166667,0.0,0.0,...,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0
7,Matosinhos,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Paredes,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Paços de Ferreira,0.0,0.0,0.0,0.166667,0.0,0.166667,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
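To make the numbers in this table concrete: after one-hot encoding, each venue is a row with a single 1 in its category column, so the per-city mean of a column is the share of that city's venues falling in that category. The sketch below reproduces the two steps on a made-up four-venue frame. The city names `A` and `B` and the venue categories are illustrative assumptions, and `dtype=int` is passed so the dummies are 0/1 integers regardless of the pandas version.

```python
import pandas as pd

# Toy data: city A has two venues, city B has two venues.
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B', 'B'],
    'Venue Category': ['Bakery', 'Bar', 'Bakery', 'Bakery'],
})

# One-hot encode the category, then put the city back as the first column.
toy_1h = pd.get_dummies(toy[['Venue Category']], prefix='', prefix_sep='', dtype=int)
toy_1h.insert(loc=0, column='City', value=toy['Neighborhood'])

# The mean of each 0/1 column is the category's share of venues in that city.
toy_freq = toy_1h.groupby('City').mean().reset_index()
print(toy_freq)
```

In city `A`, one venue in two is a bakery, so its `Bakery` value is 0.5. In city `B`, both venues are bakeries, so the value is 1.0.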


In [125]:
# Create a function to retrieve the most common venues

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [126]:
# Get the top 10 venue categories in each city

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['City']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # no special suffix beyond 'rd', so fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['City'] = porto_1h['City']

for ind in np.arange(porto_1h.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(porto_1h.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Amarante,Restaurant,Hotel,Plaza,Bakery,Bar,Church,Lounge,Pastelaria,Café,Park
1,Baião,Ice Cream Shop,Harbor / Marina,Bakery,Restaurant,Flea Market,Department Store,Dessert Shop,Diner,Dutch Restaurant,Eastern European Restaurant
2,Felgueiras,Restaurant,Yoga Studio,Department Store,Dessert Shop,Diner,Dutch Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Farm
3,Gondomar,Restaurant,BBQ Joint,Bakery,Snack Place,Flea Market,Department Store,Dessert Shop,Diner,Dutch Restaurant,Eastern European Restaurant
4,Lousada,Resort,Cupcake Shop,Department Store,Dessert Shop,Diner,Dutch Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Farm
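Note that cities with very few venues, such as Felgueiras or Lousada (one venue each), still get ten entries in this table: once the categories actually present run out, the ranking is padded with categories whose frequency is zero. One possible variant, sketched below under the assumption that such padding is unwanted, keeps only categories with a positive frequency. The function name `top_present_categories` and the toy row are hypothetical.

```python
import pandas as pd

def top_present_categories(row, num_top_venues):
    """Return up to num_top_venues categories, skipping those with zero frequency."""
    freqs = row.iloc[1:].astype(float)        # drop the leading 'City' entry
    freqs = freqs[freqs > 0].sort_values(ascending=False)
    return list(freqs.index[:num_top_venues])


# Toy row shaped like one row of porto_1h: city name first, then frequencies.
toy_row = pd.Series({'City': 'Example', 'Bakery': 0.25, 'Bar': 0.0,
                     'Restaurant': 0.75, 'Yoga Studio': 0.0})

print(top_present_categories(toy_row, 10))
```

Because the returned list can be shorter than `num_top_venues`, it would need padding (for example with `None`) before being written into a fixed-width frame such as `neighborhoods_venues_sorted`.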


### Clustering

In [127]:
from sklearn.cluster import KMeans

In [128]:
# Defining the number of clusters
kclusters = 5

porto_clusters = porto_1h.drop(columns='City')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(porto_clusters)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 0, 3, 0, 2, 1, 1, 4, 1, 1])
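A quick way to read these labels is to count how many cities fall into each cluster. The sketch below does so for the ten labels printed above only, not for the full set of cities; `labels_shown` is a name introduced here for illustration.

```python
from collections import Counter

# The ten labels displayed in the output above.
labels_shown = [1, 0, 3, 0, 2, 1, 1, 4, 1, 1]

cluster_sizes = Counter(labels_shown)
for cluster, size in sorted(cluster_sizes.items()):
    print(f'Cluster {cluster}: {size} cities')
```

Among these ten, half belong to cluster 1 while clusters 2, 3 and 4 each contain a single city, which suggests that five clusters may be more than such a small dataset can support.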

In [129]:
# Add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

porto_merged = pt_porto

In [130]:
# merge the cluster labels and top venues into pt_porto, which holds the latitude/longitude of each city
porto_merged = porto_merged.join(neighborhoods_venues_sorted.set_index('City'), on='City')

porto_merged.head()

Unnamed: 0,City,Latitude,Longitude,District,City type,Population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Vila Nova de Gaia,41.1333,-8.6167,Porto,minor,302295.0,1,Portuguese Restaurant,Restaurant,Wine Bar,Winery,Wine Shop,Bar,Hotel,Pizza Place,Park,Market
1,Porto,41.1495,-8.6108,Porto,admin,237591.0,1,Portuguese Restaurant,Bar,Hotel,Restaurant,Plaza,Ice Cream Shop,Hostel,Coffee Shop,Breakfast Spot,Tapas Restaurant
2,Matosinhos,41.183,-8.67977,Porto,minor,175478.0,4,BBQ Joint,Farm,Board Shop,Yoga Studio,Food Court,Dessert Shop,Diner,Dutch Restaurant,Eastern European Restaurant,Electronics Store
3,Gondomar,41.15,-8.5333,Porto,minor,168027.0,0,Restaurant,BBQ Joint,Bakery,Snack Place,Flea Market,Department Store,Dessert Shop,Diner,Dutch Restaurant,Eastern European Restaurant
4,Maia,41.2333,-8.6167,Porto,minor,135306.0,1,Pharmacy,Portuguese Restaurant,Bakery,Gym,Sushi Restaurant,Dessert Shop,Park,Café,Shopping Mall,Eastern European Restaurant


### Mapping

In [131]:
import matplotlib.cm as cm
import matplotlib.colors as colors
!conda install -c conda-forge folium=0.5.0 --yes 
import folium
from geopy.geocoders import Nominatim 

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.



In [132]:
address = 'Porto'

geolocator = Nominatim(user_agent="porto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Porto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Porto are 41.1494512, -8.6107884.


In [136]:
# Mapping the clusters

latitude = location.latitude
longitude = location.longitude

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(porto_merged['Latitude'], porto_merged['Longitude'], porto_merged['City'], porto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels run from 0 to kclusters-1, so index directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [134]:
porto_merged.sort_values(by = 'Population', ascending = False)

Unnamed: 0,City,Latitude,Longitude,District,City type,Population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Vila Nova de Gaia,41.1333,-8.6167,Porto,minor,302295.0,1,Portuguese Restaurant,Restaurant,Wine Bar,Winery,Wine Shop,Bar,Hotel,Pizza Place,Park,Market
1,Porto,41.1495,-8.6108,Porto,admin,237591.0,1,Portuguese Restaurant,Bar,Hotel,Restaurant,Plaza,Ice Cream Shop,Hostel,Coffee Shop,Breakfast Spot,Tapas Restaurant
2,Matosinhos,41.183,-8.67977,Porto,minor,175478.0,4,BBQ Joint,Farm,Board Shop,Yoga Studio,Food Court,Dessert Shop,Diner,Dutch Restaurant,Eastern European Restaurant,Electronics Store
3,Gondomar,41.15,-8.5333,Porto,minor,168027.0,0,Restaurant,BBQ Joint,Bakery,Snack Place,Flea Market,Department Store,Dessert Shop,Diner,Dutch Restaurant,Eastern European Restaurant
4,Maia,41.2333,-8.6167,Porto,minor,135306.0,1,Pharmacy,Portuguese Restaurant,Bakery,Gym,Sushi Restaurant,Dessert Shop,Park,Café,Shopping Mall,Eastern European Restaurant
5,Valongo,41.1833,-8.5,Porto,minor,93858.0,1,Café,Gym / Fitness Center,Lounge,Yoga Studio,Flea Market,Dessert Shop,Diner,Dutch Restaurant,Eastern European Restaurant,Electronics Store
6,Paredes,41.2,-8.3333,Porto,minor,86854.0,1,Coffee Shop,Supermarket,Portuguese Restaurant,Grocery Store,Bakery,Snack Place,Café,Electronics Store,Dessert Shop,Diner
7,Vila do Conde,41.35,-8.75,Porto,minor,79533.0,1,Café,Beach,Fast Food Restaurant,Tapas Restaurant,Portuguese Restaurant,Dessert Shop,Park,Candy Store,Seafood Restaurant,Shopping Mall
8,Penafiel,41.2,-8.2833,Porto,minor,72265.0,1,Bar,Yoga Studio,BBQ Joint,Shopping Mall,Theme Park,Bus Stop,Flea Market,Dessert Shop,Diner,Dutch Restaurant
9,Póvoa de Varzim,41.3916,-8.7571,Porto,minor,63408.0,1,Fast Food Restaurant,Bakery,Electronics Store,Bus Station,Stadium,Supermarket,Yoga Studio,Dessert Shop,Diner,Dutch Restaurant
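One way to turn this table into a shortlist, sketched below, is to rank cities by population and flag those whose top venue categories already include direct competitors. The set `competitor_categories` is an assumption chosen for illustration: only 'Pizza Place' is visible in the output above, and 'Italian Restaurant' is assumed to be a possible category label. The dictionary reproduces just three rows of the table.

```python
# Three rows copied from the table above: population and the ten ranked categories.
cities = {
    'Vila Nova de Gaia': (302295, ['Portuguese Restaurant', 'Restaurant', 'Wine Bar', 'Winery',
                                   'Wine Shop', 'Bar', 'Hotel', 'Pizza Place', 'Park', 'Market']),
    'Porto': (237591, ['Portuguese Restaurant', 'Bar', 'Hotel', 'Restaurant', 'Plaza',
                       'Ice Cream Shop', 'Hostel', 'Coffee Shop', 'Breakfast Spot',
                       'Tapas Restaurant']),
    'Maia': (135306, ['Pharmacy', 'Portuguese Restaurant', 'Bakery', 'Gym', 'Sushi Restaurant',
                      'Dessert Shop', 'Park', 'Café', 'Shopping Mall',
                      'Eastern European Restaurant']),
}

# Assumed labels for direct competitors of an Italian restaurant.
competitor_categories = {'Italian Restaurant', 'Pizza Place'}

# Cities, largest first, with a flag for competitors among their top categories.
shortlist = [
    (city, population, bool(competitor_categories & set(top)))
    for city, (population, top) in sorted(cities.items(), key=lambda kv: -kv[1][0])
]
for row in shortlist:
    print(row)
```

A flagged city is not necessarily a worse choice: existing pizza places may signal demand as much as competition, so the flag is only a starting point for a closer look.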
