# Capstone Project - The Battle of Neighborhoods - Lisbon

### Topic: Finding the best place to open a restaurant in Lisbon - Restaurant Data

#### Author: Daniel Leite

##### First, import the required libraries:

In [24]:
#Import Libs
#!conda install -c conda-forge geopy --yes 
import json
import folium
import requests
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values (OpenStreetMaps)
from pandas import json_normalize # transform a JSON file into a pandas dataframe (top-level import, available since pandas 1.0)

### Data Section

#### Restaurant Data

In order to make a good choice for the restaurant investment, the following data is required: 
1. List/Information on restaurants in Lisbon with their Geodata.
2. Information on the category of each restaurant/bar.
3. Number of reviews/likes associated with every restaurant/bar.


In [25]:
#FourSquare loading

#GET Foursquare key data to variables 
data = pd.read_csv('foursquare_data2.csv')
data['CLIENT_ID']= data['CLIENT_ID'].astype(str)
data['CLIENT_SECRET']= data['CLIENT_SECRET'].astype(str)
data['VERSION']= data['VERSION'].astype(str)
CLIENT_ID = data.iloc[0]['CLIENT_ID']
CLIENT_SECRET = data.iloc[0]['CLIENT_SECRET']
VERSION = data.iloc[0]['VERSION']

##### A CSV file with the Lisbon boroughs, their population, area and geographical coordinates was created in "1_Lisbon_Borough". The CSV file "nnnn.csv" has the structure shown below. It was used to search Foursquare for every tourist-oriented venue, including hotels, bars, museums, historic sites...

In [26]:
df = pd.read_csv('Lisboa_Borough.csv', index_col=0)
df.head()

Unnamed: 0,Borough,Population,Area(km²),Latitude,Longitude
0,Ajuda,15 617,288,38.7075,-9.198333
1,Alcântara,13 943,507,38.706389,-9.174167
2,Alvalade,31 813,534,38.746944,-9.136111
3,Areeiro,20 131,174,38.740278,-9.128056
4,Arroios,31 653,213,38.728889,-9.138889


##### The loop below queries Foursquare once per borough and collects all the tourist places of interest in the Lisbon neighborhoods.

In [27]:
LIMIT = 100   # maximum number of venues returned per Foursquare call (same value as in the assignment, for consistency)
RADIUS = 500  # search radius in meters around each borough's coordinates
FILTERED_COLUMNS = ['venue.name', 'venue.id', 'venue.categories', 'venue.location.lat', 'venue.location.lng']

def get_category_type(row):
    # return the name of the venue's first category, or None if it has none
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

df_restaurant = pd.DataFrame()
for n in range(len(df)):
    latitude = df['Latitude'].iloc[n]
    longitude = df['Longitude'].iloc[n]
    # create the Foursquare "explore" URL for this borough
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        latitude,
        longitude,
        RADIUS,
        LIMIT)
    results = requests.get(url).json()

    # pull the venue list out of the Foursquare response
    venues = results['response']['groups'][0]['items']
    nearby_venues = json_normalize(venues)
    nearby_venues = nearby_venues.loc[:, FILTERED_COLUMNS]

    # keep only the first category name for each venue
    nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

    # strip the 'venue.' and 'location.' prefixes from the column names
    nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
    df_restaurant = pd.concat([df_restaurant, nearby_venues])
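
The flattening step inside the loop can be tried offline, without spending API calls. The sketch below applies the same `json_normalize`, column selection and renaming logic to a small hand-written payload. The `sample_items` structure is an assumption based only on the keys the loop reads; the names, IDs and coordinates are taken from the first two rows of the output table, and the second venue's categories are emptied on purpose to exercise the `None` branch.

```python
import pandas as pd

# Hand-written stand-in for results['response']['groups'][0]['items'].
# The nesting mirrors the keys read in the loop; the payload itself is illustrative.
sample_items = [
    {"venue": {"name": "Palácio Nacional da Ajuda",
               "id": "4b0588a3f964a5207bd122e3",
               "categories": [{"name": "Historic Site"}],
               "location": {"lat": 38.707653, "lng": -9.197758}}},
    {"venue": {"name": "Restaurante Andorinhas",
               "id": "4d9885d59079b1f7a0182d0a",
               "categories": [],  # emptied on purpose: a venue without categories
               "location": {"lat": 38.704911, "lng": -9.199349}}},
]

# Nested dictionaries become dotted column names such as 'venue.location.lat'.
flat = pd.json_normalize(sample_items)

columns = ["venue.name", "venue.id", "venue.categories",
           "venue.location.lat", "venue.location.lng"]
flat = flat.loc[:, columns]

# Keep the first category name, or None when the list is empty.
flat["venue.categories"] = flat["venue.categories"].apply(
    lambda cats: cats[0]["name"] if cats else None
)

# Keep only the last part of each dotted column name.
flat.columns = [col.split(".")[-1] for col in flat.columns]
print(flat)
```

Running this kind of check on a fixed payload makes it easier to see what each transformation does before looping over all boroughs.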

In [29]:
df_restaurant = df_restaurant.reset_index(drop=True)
#df_restaurant.to_csv('Restaurant_Data.csv')
df_restaurant

Unnamed: 0,name,id,categories,lat,lng
0,Palácio Nacional da Ajuda,4b0588a3f964a5207bd122e3,Historic Site,38.707653,-9.197758
1,Restaurante Andorinhas,4d9885d59079b1f7a0182d0a,Restaurant,38.704911,-9.199349
2,Páteo Alfacinha,4c532ced72cf0f47267c71d2,Restaurant,38.706537,-9.194202
3,Jardim Botânico da Ajuda,4c8b582be51e6dcb8e7671de,Botanical Garden,38.706430,-9.201222
4,Churrasqueira do Marquês,4c48033e76d72d7fa2043f4d,BBQ Joint,38.703996,-9.199402
...,...,...,...,...,...
1001,Mercado de Santa Clara,4e886bc5be7b88449a912b01,Event Space,38.715564,-9.125582
1002,Jardim Botto Machado,4c962b6e82b56dcbd0f9deaa,Garden,38.715877,-9.123740
1003,Feira da Ladra,4b0588a8f964a520cfd222e3,Flea Market,38.715368,-9.125244
1004,Cafe De Calcada,54d74df1498ec1066a06efaf,Bistro,38.718287,-9.131190


##### With the dataframe of all tourist places in hand, the next step was to keep only the restaurants/bars and other food/drink places. This step was needed because the goal of this investment project is to open a new restaurant/bar in Lisbon. Given this objective, there was no need to analyze historic sites, hotels and other places that do not represent a threat to the new restaurant.

In [30]:
# find a list of unique categories from the API so we can see what may or may not fit for restaurants
#nearby_venues['categories'].unique()
#Keep only the categories related to eating and drinking places (na=False treats venues without a category as non-matches)
df_rest = df_restaurant[df_restaurant['categories'].str.contains('Restaurant|Food|Drink|Bar|Snack|Pizza|Beer|Cafe', na=False)]
df_rest = df_rest.rename(columns={'name': 'Places', 'id': 'ID', 'categories': 'Categories', 'lat': 'Latitude', 'lng': 'Longitude'})
#df_rest['Categories'].value_counts()
df_rest = df_rest.reset_index(drop=True)
df_rest

Unnamed: 0,Places,ID,Categories,Latitude,Longitude
0,Restaurante Andorinhas,4d9885d59079b1f7a0182d0a,Restaurant,38.704911,-9.199349
1,Páteo Alfacinha,4c532ced72cf0f47267c71d2,Restaurant,38.706537,-9.194202
2,Estufa Real,4b0588a4f964a520ced122e3,Restaurant,38.706840,-9.201975
3,Alcântara 50,50899fb2e4b0167a9c2eddf4,Portuguese Restaurant,38.705462,-9.173533
4,O Palácio,4c5c82867735c9b6507f8c72,Seafood Restaurant,38.706357,-9.173442
...,...,...,...,...,...
433,Penalva da Graça,4f89e174e4b00a6262549ad1,Seafood Restaurant,38.720722,-9.130070
434,Taproom Oitava Colina,5b4928789f8a9f002c28cc08,Beer Bar,38.718390,-9.131880
435,O Cardoso do Estrela de Ouro,4c892b94a0ffb60c7f4228c5,Portuguese Restaurant,38.720650,-9.130091
436,Tazza In Giro,5a09ef5d2619ee11bd25fffc,Vegetarian / Vegan Restaurant,38.715800,-9.125121
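
The keyword filter above relies on plain substring matching, which can behave in ways that are easy to overlook. The sketch below is a toy comparison, not the filter used in this notebook: the category names are illustrative rather than taken from the Foursquare results, and the stricter pattern (`strict_pattern`) is a hypothetical alternative that matches whole words only and adds an accented spelling.

```python
import pandas as pd

# Illustrative category names, not taken from the Foursquare results.
categories = pd.Series(["Beer Bar", "Barbershop", "Café", "Seafood Restaurant", None])

# Same keyword list as in the notebook.
pattern = "Restaurant|Food|Drink|Bar|Snack|Pizza|Beer|Cafe"

# Plain substring matching; na=False treats missing categories as non-matches.
substring_mask = categories.str.contains(pattern, na=False)

# Hypothetical stricter variant: whole words only, case-insensitive,
# with the accented spelling 'Café' added to the keyword list.
strict_pattern = r"\b(?:Restaurant|Food|Drink|Bar|Snack|Pizza|Beer|Cafe|Café)\b"
strict_mask = categories.str.contains(strict_pattern, case=False, na=False)

comparison = pd.DataFrame({
    "category": categories,
    "substring": substring_mask,
    "whole_word": strict_mask,
})
print(comparison)
```

In this toy data the substring version keeps "Barbershop" (it contains "Bar") and drops "Café" (the keyword is spelled without the accent), while the whole-word version does the opposite. Inspecting `df_restaurant['categories'].unique()` would show whether either case occurs in the real data.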


##### With the "Restaurant Dataframe" ready, a map was created to visualize the geographical distribution of the data.

In [31]:
address = 'Lisbon,Portugal'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
lx_latitude = location.latitude
lx_longitude = location.longitude
print('The geographical coordinates of Lisbon are latitude:{}, longitude:{}.'.format(lx_latitude, lx_longitude))

The geographical coordinates of Lisbon are latitude:38.7077507, longitude:-9.1365919.


In [32]:
lx_map = folium.Map(location=[lx_latitude, lx_longitude], zoom_start=12)
# instantiate a feature group for the restaurants in the dataframe
restaur = folium.map.FeatureGroup()

# loop through the restaurants and add each one to the feature group
for Latitude, Longitude in zip(df_rest.Latitude, df_rest.Longitude):
    restaur.add_child(
        folium.features.CircleMarker(
            [Latitude, Longitude],
            radius=5, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='Red',
            fill_opacity=0.6
        )
    )

# add the restaurants to the map
lx_map.add_child(restaur)

##### The reasoning behind this step is that likes are a proxy for quality: the more likes a restaurant has, the better it is assumed to be. This might be incorrect, but the limit on free API calls holds me back from getting price / rating data. I will then bin the likes into a categorical quality variable so the venues can be clustered appropriately. Taking this into account, this solution seemed the most logical way to get a comparable measure across venues.

In [33]:
#Getting a list of venues ID
venue_id_list = df_rest['ID'].tolist()

#set up to pull the likes from the API based on venue ID
url_list = []
like_list = []
json_list = []

for i in venue_id_list:
    venue_url = 'https://api.foursquare.com/v2/venues/{}/likes?client_id={}&client_secret={}&v={}'.format(i, CLIENT_ID, CLIENT_SECRET, VERSION)
    url_list.append(venue_url)
for link in url_list:
    result = requests.get(link).json()
    likes = result['response']['likes']['count']
    like_list.append(likes)

print('Length of Venue IDs List: {}, Length of Number of likes List: {}.'.format(len(venue_id_list), len(like_list)))

Length of Venue IDs List: 438, Length of Number of likes List: 438.
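
Because free API calls are limited, a request may come back without the expected `likes` block, and indexing `result['response']['likes']['count']` directly would then stop the loop. The sketch below shows one possible defensive variant. The helper name `extract_like_count` is hypothetical, and the shape of `error_payload` is an assumption made for illustration, not a documented Foursquare error format.

```python
from typing import Optional


def extract_like_count(payload: dict) -> Optional[int]:
    """Return the like count from a likes-style payload, or None if it is missing."""
    try:
        return int(payload["response"]["likes"]["count"])
    except (KeyError, TypeError, ValueError):
        # Missing keys or an unexpected structure: record the venue as unknown.
        return None


# A payload with the same nesting the notebook reads from.
ok_payload = {"response": {"likes": {"count": 23}}}
# Assumed failure shape (for example, after running out of free calls).
error_payload = {"meta": {"code": 429}, "response": {}}

sample_likes = [extract_like_count(p) for p in (ok_payload, error_payload)]
print(sample_likes)  # [23, None]
```

Keeping one entry per venue, even when it is `None`, preserves the alignment between the list of likes and the list of venue IDs.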


In [37]:
#Create a dataframe with all acquired data
lx_df = df_rest.copy()
lx_df['Likes'] = like_list
#lx_df.to_csv('Like_data.csv')
lx_df

Unnamed: 0,Places,ID,Categories,Latitude,Longitude,Likes
0,Restaurante Andorinhas,4d9885d59079b1f7a0182d0a,Restaurant,38.704911,-9.199349,23
1,Páteo Alfacinha,4c532ced72cf0f47267c71d2,Restaurant,38.706537,-9.194202,44
2,Estufa Real,4b0588a4f964a520ced122e3,Restaurant,38.706840,-9.201975,25
3,Alcântara 50,50899fb2e4b0167a9c2eddf4,Portuguese Restaurant,38.705462,-9.173533,27
4,O Palácio,4c5c82867735c9b6507f8c72,Seafood Restaurant,38.706357,-9.173442,86
...,...,...,...,...,...,...
433,Penalva da Graça,4f89e174e4b00a6262549ad1,Seafood Restaurant,38.720722,-9.130070,12
434,Taproom Oitava Colina,5b4928789f8a9f002c28cc08,Beer Bar,38.718390,-9.131880,12
435,O Cardoso do Estrela de Ouro,4c892b94a0ffb60c7f4228c5,Portuguese Restaurant,38.720650,-9.130091,9
436,Tazza In Giro,5a09ef5d2619ee11bd25fffc,Vegetarian / Vegan Restaurant,38.715800,-9.125121,6
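
The binning of likes into a quality category, mentioned earlier, could look like the sketch below. It is only one possible approach: the use of `pd.qcut`, the choice of three bins and the labels `low`, `medium` and `high` are all assumptions, and the like counts are the nine values visible in the table above rather than the full dataset.

```python
import pandas as pd

# Like counts copied from the nine rows visible in the table above.
likes = pd.Series([23, 44, 25, 27, 86, 12, 12, 9, 6], name="Likes")

# Three equal-frequency bins; the number of bins and the labels are assumptions.
quality = pd.qcut(likes, q=3, labels=["low", "medium", "high"])

binned = pd.DataFrame({"Likes": likes, "Quality": quality})
print(binned)
```

Quantile-based bins adapt to the skewed distribution of likes, so each category holds a similar number of venues. Fixed thresholds with `pd.cut` would be an alternative if absolute like counts matter more than relative rank.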
