# 1. Description of the problem and a discussion of the background

### Applied Data Science Capstone by IBM/Coursera


## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Conclusion](#results)


## Introduction: Business Problem <a name="introduction"></a>

This report is aimed at stakeholders interested in opening a **hotel business in Barcelona (Spain)**.

Barcelona is one of the **world's leading tourist destinations** and offers a wide range of accommodation options. The aim of this project is to identify the **areas less crowded with hotels**, especially those **with tourist landmarks nearby**, which are the most likely to attract potential customers.

Data science tools will be used to identify the most promising areas based on these criteria. 

## Data <a name="data"></a>

The factors that will influence the decision will be:
* Number of existing tourist attractions in the neighborhoods 
* Number of existing hotels in the neighborhoods
* Existing public transport stops (mainly metro) nearby

The following data sources will be used:
* Tourist attractions and their location in every neighborhood will be obtained from the **OPEN DATA BCN** website
* Lodging businesses and their location in every neighborhood will also be obtained from the **OPEN DATA BCN** website
* A GeoJSON file with the neighborhood boundaries will be obtained from https://github.com/martgnz/bcn-geodata/tree/master/districtes

The **Foursquare API** is not used in the analysis phase, since the data retrieved from this source is insufficient for the purpose of this report. It is used only after a suitable location has been selected, to explore the restaurant options nearby, which are also an important factor for visitors.

## Methodology <a name="methodology"></a>

In this project we will explore areas of Barcelona that are high in tourist attractions and have a low or moderately low hotel density.

* First, we will collect data on the tourist landmarks of the city.
* Second, we will explore 'hotel density' across the different neighborhoods of Barcelona. We will use choropleth maps to better identify and visualize the areas.
* Finally, once a candidate location is selected, we will look at other aspects that are also key for a hotel location: access to public transport and the restaurants nearby.


## Analysis <a name="analysis"></a>

The data with the Tourist Attractions of the city can be obtained from the OPEN DATA BCN website (http://www.bcn.cat/tercerlloc/pits_opendata_en.xml).

We first download the available data as an XML file:

In [3]:
import urllib.request

print('Beginning file download with urllib2...')

url = 'http://www.bcn.cat/tercerlloc/pits_opendata_en.xml'
urllib.request.urlretrieve(url, 'C:/Users/I/Documents/Cursos - Educació/Coursera/IBM Applied Data Science Capstone/Projecte/Tourist_attractions.xml') 

Beginning file download with urllib2...


('C:/Users/I/Documents/Cursos - Educació/Coursera/IBM Applied Data Science Capstone/Projecte/Tourist_attractions.xml',
 <http.client.HTTPMessage at 0x20705961b08>)

In [4]:
# Assumes the notebook's working directory is the folder where the XML file was saved above
xml_data = "Tourist_attractions.xml"

In [5]:
# Parsing the XML data to create a Dataframe
import pandas as pd
import xml.etree.ElementTree as ET

tree = ET.parse(xml_data)
root = tree.getroot()
 
df = pd.DataFrame({'Name': [], 'Latitude': [], 'Longitude': [],'Neighborhood':[],'Information':[]})
for item in root.iter('row'):
    for name in item.iter('title'):
        name = name.text
        for lat in item.iter('gmapx'):
            latitude = lat.text
            for long in item.iter('gmapy'):
                longitude = long.text
                for neig in item.iter('district'):
                    neighborhood = neig.text
                    for descr in item.iter('text-twitter-internacional'):
                        info = descr.text
                        df = df.append({'Name': name, 'Latitude':latitude , 'Longitude': longitude,'Neighborhood': neighborhood, 'Information':info}, ignore_index=True)
                    
df.head()

,Name,Latitude,Longitude,Neighborhood,Information
0,The Auditori,41.398741180321,2.1851413447996,Eixample,Much more than a concert hall. Drop by the Aud...
1,The Auditori,41.398741180321,2.1851413447996,Eixample,Much more than a concert hall. Drop by the Aud...
2,The Auditori,41.398741180321,2.1851413,Eixample,Much more than a concert hall. Drop by the Aud...
3,The Auditori,41.398741180321,2.1851413,Eixample,Much more than a concert hall. Drop by the Aud...
4,The Auditori,41.398743,2.1851413447996,Eixample,Much more than a concert hall. Drop by the Aud...
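The repeated rows above are most likely a side effect of the nested loops, which append one row for every combination of matching tags. As a minimal sketch of an alternative, the snippet below reads each field once per `<row>` and collects the records in a list. The two-row XML sample and its descriptions are made up for illustration, and the real feed is only assumed to place these tags somewhere inside each `<row>`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample that mimics the tags read in the cell above.
SAMPLE_XML = """
<rows>
  <row>
    <title>The Auditori</title>
    <gmapx>41.398743</gmapx>
    <gmapy>2.1851413</gmapy>
    <district>Eixample</district>
    <text-twitter-internacional>Sample description 1</text-twitter-internacional>
  </row>
  <row>
    <title>Camp Nou</title>
    <gmapx>41.380775</gmapx>
    <gmapy>2.1228578</gmapy>
    <district>Les Corts</district>
    <text-twitter-internacional>Sample description 2</text-twitter-internacional>
  </row>
</rows>
"""

# Dataframe column name -> XML tag, as in the original cell
FIELDS = {
    "Name": "title",
    "Latitude": "gmapx",
    "Longitude": "gmapy",
    "Neighborhood": "district",
    "Information": "text-twitter-internacional",
}


def parse_rows(root):
    """Return one dict per <row>, using the first match of each tag."""
    records = []
    for item in root.iter("row"):
        # './/tag' searches all descendants of the row, like item.iter(tag)
        records.append({col: item.findtext(f".//{tag}") for col, tag in FIELDS.items()})
    return records


records = parse_rows(ET.fromstring(SAMPLE_XML.strip()))
for record in records:
    print(record)
```

Passing `records` to `pd.DataFrame(records)` would then build the table in one step, without appending row by row.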


In [18]:
# Let's drop the duplicate rows:
df.drop_duplicates(subset ="Name", 
                     keep = "last", inplace = True) 
df.head(20)

,Name,Latitude,Longitude,Neighborhood,Information
7,The Auditori,41.398743,2.1851413,Eixample,Much more than a concert hall. Drop by the Aud...
15,Camp Nou,41.380775,2.1228578,Les Corts,Did you know that the Museu del Camp Nou is Sp...
23,Magic Fountain,41.371197,2.1517797,Sants-Montjuïc,Visit the Magic Fountain light and colour show...
31,Estació del Nord and Parc de l’Estació del Nord,41.394295,2.1823204,Eixample,The Estació del Nord bus station and the park ...
39,Museu Nacional d'Art de Catalunya,41.368855,2.1533628,Sants-Montjuïc,"The Museu Nacional d'Art de Catalunya, a palac..."
47,La Casa de la Caritat (CCCB),41.383884,2.1667948,Ciutat Vella,"Casa de la Caritat, contemporary culture in an..."
55,Estació de França,41.384426,2.1853333,Ciutat Vella,"Come and see #Estació de França, a station of ..."
63,Piscines Bernat Picornell,41.3663,2.150775,Sants-Montjuïc,"The Picornell pools, part of the Barcelona 92 ..."
71,Palau Sant Jordi,41.36197,2.1523776,Sants-Montjuïc,A symbol of sport and big occasions. Discover ...
79,Mar Bella Beach,41.39855,2.2123103,Sant Martí,"Cosmopolitan, urban and always buzzing. Come t..."


In [19]:
# We reset the index
df.reset_index(drop=True, inplace=True)
df.head(20)

,Name,Latitude,Longitude,Neighborhood,Information
0,The Auditori,41.398743,2.1851413,Eixample,Much more than a concert hall. Drop by the Aud...
1,Camp Nou,41.380775,2.1228578,Les Corts,Did you know that the Museu del Camp Nou is Sp...
2,Magic Fountain,41.371197,2.1517797,Sants-Montjuïc,Visit the Magic Fountain light and colour show...
3,Estació del Nord and Parc de l’Estació del Nord,41.394295,2.1823204,Eixample,The Estació del Nord bus station and the park ...
4,Museu Nacional d'Art de Catalunya,41.368855,2.1533628,Sants-Montjuïc,"The Museu Nacional d'Art de Catalunya, a palac..."
5,La Casa de la Caritat (CCCB),41.383884,2.1667948,Ciutat Vella,"Casa de la Caritat, contemporary culture in an..."
6,Estació de França,41.384426,2.1853333,Ciutat Vella,"Come and see #Estació de França, a station of ..."
7,Piscines Bernat Picornell,41.3663,2.150775,Sants-Montjuïc,"The Picornell pools, part of the Barcelona 92 ..."
8,Palau Sant Jordi,41.36197,2.1523776,Sants-Montjuïc,A symbol of sport and big occasions. Discover ...
9,Mar Bella Beach,41.39855,2.2123103,Sant Martí,"Cosmopolitan, urban and always buzzing. Come t..."


In [113]:
df.shape

(521, 5)

In [20]:
# Let's find the geographical coordinates of Barcelona City:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
address = 'Barcelona'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
lat_bcn = location.latitude
long_bcn = location.longitude
print('The geographical coordinates of Barcelona are {}, {}.'.format(lat_bcn, long_bcn))

The geographical coordinates of Barcelona are 41.3828939, 2.1774322.


In [21]:
# We then create a map of Barcelona
import folium # map rendering library

# creating map of Barcelona using latitude and longitude found above
map_bcn = folium.Map(location=[lat_bcn, long_bcn], zoom_start=13)

# adding markers to map
for lat, lng, label in zip(df['Latitude'], df['Longitude'], df['Name']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=True).add_to(map_bcn)  


    
map_bcn

In [24]:
df_neighborhood=df.groupby(['Neighborhood'])['Name'].count().to_frame()
df_neighborhood = df_neighborhood.sort_values(by=['Name'], ascending=False)
df_neighborhood.rename(columns = {'Name':'Count'}, inplace = True)
df_neighborhood

Neighborhood,Count
Ciutat Vella,91
Eixample,81
Sants-Montjuïc,64
Sant Martí,56
Sarrià-Sant Gervasi,54
Horta-Guinardó,49
Gràcia,38
Les Corts,33
Sant Andreu,33
Nou Barris,22


We now have a list of the Barcelona neighborhoods sorted by their number of tourist attractions. The ones with the most attractions are Ciutat Vella (Old Town), Eixample and Sants-Montjuïc.

We will assign each neighborhood its numerical code, since it will be needed later. The numerical codes of the neighborhoods can be found here: https://opendata-ajuntament.barcelona.cat/data/dataset/808daafa-d9ce-48c0-925a-fa5afdb1ed41/resource/4cc59b76-a977-40ac-8748-61217c8ff367/download/districtes_i_barris_170705.csv

In [25]:
Code = ['01', '02', '03', '10','05','07','06','04','09','08']
df_neighborhood['Code']=Code
df_neighborhood.head()

Neighborhood,Count,Code
Ciutat Vella,91,01
Eixample,81,02
Sants-Montjuïc,64,03
Sant Martí,56,10
Sarrià-Sant Gervasi,54,05
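Assigning the codes from a positional list only works while the rows stay in exactly this order. A possible alternative, sketched here with plain Python, is to look each code up by name. The `DISTRICT_CODES` dictionary is a hypothetical helper built from the code list and table above.

```python
# Name -> two-digit code, taken from the code list and table above
DISTRICT_CODES = {
    "Ciutat Vella": "01", "Eixample": "02", "Sants-Montjuïc": "03",
    "Les Corts": "04", "Sarrià-Sant Gervasi": "05", "Gràcia": "06",
    "Horta-Guinardó": "07", "Nou Barris": "08", "Sant Andreu": "09",
    "Sant Martí": "10",
}

# Attraction counts per neighborhood, copied from the table above
attraction_counts = {
    "Ciutat Vella": 91, "Eixample": 81, "Sants-Montjuïc": 64, "Sant Martí": 56,
    "Sarrià-Sant Gervasi": 54, "Horta-Guinardó": 49, "Gràcia": 38,
    "Les Corts": 33, "Sant Andreu": 33, "Nou Barris": 22,
}

# Sort by count (descending) and attach the code by name, not by position
rows = [
    (name, count, DISTRICT_CODES[name])
    for name, count in sorted(attraction_counts.items(), key=lambda kv: kv[1], reverse=True)
]

for name, count, code in rows:
    print(f"{code}  {name:<22} {count}")
```

With pandas, the same idea could be written as `df_neighborhood.index.map(DISTRICT_CODES)`, which does not depend on how the rows are sorted.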


In [26]:
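# We now download the GeoJSON file with the perimeter of each neighborhood.
# The raw.githubusercontent.com URL returns the file itself (the github.com/.../blob/... page is HTML),
# and $path_to_map lets IPython substitute the Python variable into the shell command.
path_to_map='https://raw.githubusercontent.com/martgnz/bcn-geodata/master/districtes/districtes.geojson'
!wget --quiet $path_to_map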
   
print('GeoJSON file downloaded!')

GeoJSON file downloaded!


In [27]:
bcn_geo = r'districtes.geojson' # geojson file

# We create a plain map
bcn_map2 = folium.Map(location=[lat_bcn, long_bcn], zoom_start=12)

In [28]:
import numpy as np  # numerical computing library

# We build 6 linearly spaced class boundaries, from the minimum to the maximum number of tourist attractions
threshold_scale = np.linspace(df_neighborhood['Count'].min(),
                              df_neighborhood['Count'].max(),
                              6, dtype=int)
threshold_scale = threshold_scale.tolist() # change the numpy array to a list
threshold_scale[-1] = threshold_scale[-1] + 1 # make sure the last boundary is greater than the maximum count

# We start again from a plain map and draw the choropleth with our own scale
bcn_map2 = folium.Map(location=[lat_bcn, long_bcn], zoom_start=12)

# Map.choropleth() is deprecated in favour of folium.Choropleth:
# threshold_scale is passed as bins, and the reset argument no longer exists
folium.Choropleth(
    geo_data=bcn_geo,
    data=df_neighborhood,
    columns=['Code', 'Count'],
    key_on='feature.properties.DISTRICTE',
    bins=threshold_scale,
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Number of Tourist Landmarks'
).add_to(bcn_map2)
bcn_map2



The choropleth map now shows what we observed in the table above: the neighborhoods with the most tourist attractions are Old Town and Eixample (in red) and Sants-Montjuïc (in dark orange). The neighborhoods with fewer tourist attractions are shown in light orange and pale yellow.
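The colour classes on the map come from simple linear spacing between the smallest and the largest count. As a rough illustration of that arithmetic, the sketch below recomputes the six boundaries in plain Python. `linear_bins` is a hypothetical helper, not part of folium or numpy, and it truncates to integers on the assumption that `np.linspace(..., dtype=int)` does the same.

```python
def linear_bins(low, high, n_edges=6):
    """Return n_edges integer class boundaries from low to high (inclusive),
    then bump the last one by 1 so the maximum falls inside the last class."""
    last = n_edges - 1
    # Integer arithmetic keeps the interpolation exact; // truncates like int()
    # for the non-negative counts used here.
    edges = [(low * (last - i) + high * i) // last for i in range(n_edges)]
    edges[-1] += 1
    return edges


# 22 and 91 are the minimum and maximum attraction counts in the table above
attraction_bins = linear_bins(22, 91)
print(attraction_bins)
```

The extra `+ 1` on the last boundary matters because the district with the maximum count would otherwise sit exactly on the upper edge of the scale.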

We will now focus on the existing hotel businesses in the city, extracting the data from the Economic Census.

In [6]:
# The Economic census of the city can be retrieved from the following url and easily converted to a dataframe for further treatment
filename ="https://opendata-ajuntament.barcelona.cat/data/dataset/62fb990e-4cc3-457a-aea1-497604e15659/resource/c897c912-0f3c-4463-bdf2-a67ee97786ac/download"
df1 = pd.read_csv(filename)
df1.head()


,ID_Bcn_2019,ID_Bcn_2016,Codi_Principal_Activitat,Nom_Principal_Activitat,Codi_Sector_Activitat,Nom_Sector_Activitat,Codi_Grup_Activitat,Nom_Grup_Activitat,Codi_Activitat_2019,Nom_Activitat,...,Solar,Codi_Parcela,Codi_Illa,Seccio_Censal,Codi_Barri,Nom_Barri,Codi_Districte,Nom_Districte,Referencia_cadastral,Data_Revisio
0,1075454,,1,Actiu,2,Serveis,16,Altres,1600400,Serveis a les empreses i oficines,...,,,,25.0,12,la Marina del Prat Vermell,3,Sants-Montjuïc,,20190925.0
1,1075453,,1,Actiu,2,Serveis,16,Altres,1600102,Activitats emmagatzematge,...,,,,25.0,12,la Marina del Prat Vermell,3,Sants-Montjuïc,,20190925.0
2,1075451,,1,Actiu,2,Serveis,16,Altres,1600400,Serveis a les empreses i oficines,...,,,,25.0,12,la Marina del Prat Vermell,3,Sants-Montjuïc,,20190925.0
3,1075449,,1,Actiu,3,Altres,17,Altres,1700100,Administració,...,,,,25.0,12,la Marina del Prat Vermell,3,Sants-Montjuïc,,20190925.0
4,1075448,,1,Actiu,2,Serveis,16,Altres,1600101,Activitats de transport,...,,,,25.0,12,la Marina del Prat Vermell,3,Sants-Montjuïc,,20190925.0


In [7]:
# We drop the rows of closed businesses
df_filt=df1[df1.Nom_Principal_Activitat != 'Sense activitat Econòmica'] # 'Sense activitat econòmica' means 'No economic activity' in Catalan

In [8]:
# We select columns: business name, type of activity and geographical location
df2=df_filt[['Nom_Activitat', 'Nom_Local','Latitud','Longitud', "Codi_Districte"]]
df2.head()

,Nom_Activitat,Nom_Local,Latitud,Longitud,Codi_Districte
0,Serveis a les empreses i oficines,SORIGUE,41.346101,2.130166,3
1,Activitats emmagatzematge,CEJIDOS SIVILA S.A,41.345939,2.12956,3
2,Serveis a les empreses i oficines,QUALITY ESPRESO,41.345591,2.128543,3
3,Administració,CLD,41.346262,2.130599,3
4,Activitats de transport,"CATALANA DEL BUTANO,S.A",41.346514,2.131271,3


In [10]:
# Let's rename the columns (translate them into English)
# rename() returns a new dataframe here; inplace=True on a slice of df_filt triggers a SettingWithCopyWarning
df2 = df2.rename(columns={'Nom_Activitat':'Category','Nom_Local':'Name','Latitud':'Latitude','Longitud':'Longitude', 'Codi_Districte':'Neighborhood'})
df2.head()


,Category,Name,Latitude,Longitude,Neighborhood
0,Serveis a les empreses i oficines,SORIGUE,41.346101,2.130166,3
1,Activitats emmagatzematge,CEJIDOS SIVILA S.A,41.345939,2.12956,3
2,Serveis a les empreses i oficines,QUALITY ESPRESO,41.345591,2.128543,3
3,Administració,CLD,41.346262,2.130599,3
4,Activitats de transport,"CATALANA DEL BUTANO,S.A",41.346514,2.131271,3


In [11]:
# We create a boolean mask to select the category we want (in this case, lodging services)
hotels = df2['Category'] == "serveis d'allotjament" # "serveis d'allotjament" means 'lodging services' in Catalan

In [12]:
# We create a dataframe with the lodging businesses of the city of Barcelona
df_hotels = df2[hotels].reset_index(drop=True) # we filter and reset the index in one step (this returns a new dataframe)
df_hotels=df_hotels.drop(['Category'], axis=1) # we drop the Category column since we no longer need it
df_hotels.head(20)

,Name,Latitude,Longitude,Neighborhood
0,CASA MACA,41.39734,2.165709,2
1,GRAN HOTEL BARCINO,41.383098,2.177786,1
2,HOSTAL EUROPA,41.381504,2.174413,1
3,SUNOTEL CLUB CENTRAL,41.387976,2.156864,2
4,YELLOW NEST HOSTEL,41.377135,2.123807,4
5,GOLDEN TULIP,41.400258,2.190726,10
6,MAJESTIC RESIDENCE,41.393373,2.162895,2
7,HOTEL SB GLOW,41.402442,2.19084,10
8,CIUTAT BARCELONA HOTEL,41.385945,2.181089,1
9,HOSTAL PARIS,41.381497,2.173453,1


In [36]:
df_hotels2=df_hotels.groupby(['Neighborhood'])['Name'].count().to_frame()
df_hotels2 = df_hotels2.sort_values(by=['Name'], ascending=False) # we sort the values 
df_hotels2.rename(columns = {'Name':'Count'}, inplace = True) # we rename the 'Name' column
df_hotels2

Neighborhood,Count
2,261
1,192
3,72
10,61
5,57
6,41
4,30
7,17
9,5
8,4


In [40]:
# We change the numerical code to the string value, since it will be needed when we use the geojson file
Codes_2 = ['02', '01', '03', '10','05','06','04','07','09','08']
df_hotels2['Code']=Codes_2
df_hotels2

Neighborhood,Count,Code
2,261,02
1,192,01
3,72,03
10,61,10
5,57,05
6,41,06
4,30,04
7,17,07
9,5,09
8,4,08


We can see here that the neighborhoods with the most lodging businesses are 02 (Eixample), 01 (Old Town) and 03 (Sants-Montjuïc).
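The string codes need a leading zero because the GeoJSON file identifies each area with a two-digit string. Rather than typing the list by hand, the codes can be derived from the integer codes themselves. This is a minimal sketch in plain Python, with the counts copied from the table above.

```python
# Integer district code -> number of lodging businesses (from the table above)
hotel_counts = {2: 261, 1: 192, 3: 72, 10: 61, 5: 57, 6: 41, 4: 30, 7: 17, 9: 5, 8: 4}

# Format each integer code as a zero-padded, two-character string
coded = [(f"{code:02d}", count) for code, count in hotel_counts.items()]

for code, count in coded:
    print(code, count)
```

In pandas, something along the lines of `df_hotels2.index.astype(str).str.zfill(2)` should give the same strings without relying on row order.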

In [38]:
bcn_geo = r'districtes.geojson' # geojson file

# We create a plain map
bcn_map3 = folium.Map(location=[lat_bcn, long_bcn], zoom_start=12)

In [39]:
import numpy as np  # numerical computing library

# We build 6 linearly spaced class boundaries, from the minimum to the maximum number of lodging businesses
threshold_scale = np.linspace(df_hotels2['Count'].min(),
                              df_hotels2['Count'].max(),
                              6, dtype=int)
threshold_scale = threshold_scale.tolist() # change the numpy array to a list
threshold_scale[-1] = threshold_scale[-1] + 1 # make sure the last boundary is greater than the maximum count

# We start again from a plain map and draw the choropleth with our own scale
bcn_map3 = folium.Map(location=[lat_bcn, long_bcn], zoom_start=12)

# Map.choropleth() is deprecated in favour of folium.Choropleth:
# threshold_scale is passed as bins, and the reset argument no longer exists
folium.Choropleth(
    geo_data=bcn_geo,
    data=df_hotels2,
    columns=['Code', 'Count'],
    key_on='feature.properties.DISTRICTE',
    bins=threshold_scale,
    fill_color='YlGnBu',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Number of Hotels'
).add_to(bcn_map3)
bcn_map3



The choropleth map now shows what we observed in the table above: the neighborhoods with the most hotels are Eixample (in dark blue) and Old Town (in blue), while the neighborhoods with the fewest hotels are shown in light green and pale yellow.

We saw that the borough of Sants-Montjuïc has a fairly large number of tourist attractions (64) but only a moderate number of hotels (72). This area (especially the zones bordering the Old Town and Eixample) is therefore a good candidate for opening a new hotel.
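To make the comparison concrete, the counts for the three areas discussed here can be put side by side. The sketch below computes hotels per tourist attraction from the figures in the two tables above. This ratio is only an illustrative indicator and is not the selection criterion used in this report.

```python
# Counts copied from the attraction and hotel tables above
attractions = {"Ciutat Vella": 91, "Eixample": 81, "Sants-Montjuïc": 64}
hotels = {"Ciutat Vella": 192, "Eixample": 261, "Sants-Montjuïc": 72}

# Hotels per tourist attraction: lower means less competition per landmark
hotels_per_attraction = {name: hotels[name] / attractions[name] for name in attractions}

for name, ratio in hotels_per_attraction.items():
    print(f"{name:<16} {ratio:.2f} hotels per attraction")

least_crowded = min(hotels_per_attraction, key=hotels_per_attraction.get)
print("Fewest hotels per attraction:", least_crowded)
```

Among these three areas, Sants-Montjuïc has close to one hotel per attraction, while the Old Town and Eixample have two to three.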

Let's explore the neighborhood and its possibilities:

In [43]:
# We find the geographical coordinates of Montjuïc in Barcelona City:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
address = 'Montjuïc, Barcelona'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
lat_bcn2 = location.latitude
long_bcn2 = location.longitude
print('The geographical coordinates of Montjuïc are {}, {}.'.format(lat_bcn2, long_bcn2))

The geographical coordinates of Montjuïc are 41.3647625, 2.154233.


In [46]:
# We create a map
map_bcn4 = folium.Map(location=[lat_bcn2, long_bcn2], zoom_start=14)
# add the hotels as red circle markers
for lat, lng, label in zip(df_hotels.Latitude, df_hotels.Longitude, df_hotels.Name):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='red',
        popup=label,
        fill = True,
        fill_color='red',
        fill_opacity=0.6
    ).add_to(map_bcn4)

# display map
map_bcn4

We can see that the area of Sants-Montjuïc is less crowded with hotels than the neighboring ones. The blue labels on the map are metro and train stations. Ideally, a hotel should be close to one of them, since tourists usually get around the city on foot and a nearby station keeps the rest of the city within reach. A good option would be the area around the 'Magòria-La Campana' station, in the center of the map. Its coordinates (41°22′03″N 2°08′22″E) can be found on Wikipedia: https://es.wikipedia.org/wiki/Estaci%C3%B3n_de_Magoria-La_Campana

Now that we have our location candidate, let's use Foursquare API to get info on restaurants nearby.

Foursquare credentials:

In [53]:

import os

# Credentials are read from environment variables so that they never appear in the notebook or its output.
# The variable names below are an assumption; use whatever names your environment defines.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30


In [97]:
latitude = 41.3675
longitude = 2.139444
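The decimal values used for the candidate location are the degrees, minutes and seconds of the station (41°22′03″N 2°08′22″E) converted to decimal degrees. A short sketch of that conversion follows; `dms_to_decimal` is a hypothetical helper written for this illustration.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are negative by convention
    return -value if hemisphere in ("S", "W") else value


station_lat = dms_to_decimal(41, 22, 3, "N")
station_lng = dms_to_decimal(2, 8, 22, "E")
print(round(station_lat, 6), round(station_lng, 6))
```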

Let's define a query to search for restaurants within 700 metres of our candidate location for the hotel:

In [119]:
search_query = 'Restaurant'
radius = 700
print(search_query + ' .... OK!')

Restaurant .... OK!


Let's define the corresponding url:

In [120]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&ll=41.3675,2.139444&v=20180604&query=Restaurant&radius=700&limit=30'
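Building the query string by hand works for these values, but it breaks as soon as a parameter contains a space or another reserved character. As a hedged alternative, the sketch below assembles the same URL with `urllib.parse.urlencode` from the standard library. `build_search_url` is a hypothetical helper, and the credentials are placeholders.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://api.foursquare.com/v2/venues/search"


def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Assemble the search URL, letting urlencode escape every parameter."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    return BASE_URL + "?" + urlencode(params)


# Placeholder credentials; the other values are the ones used in this notebook
search_url = build_search_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET",
                              41.3675, 2.139444, "20180604", "Restaurant", 700, 30)
print(search_url)

# Decoding the query string recovers the original parameters
decoded = parse_qs(urlsplit(search_url).query)
print(decoded["ll"], decoded["radius"])
```

Note that `urlencode` escapes the comma in `ll` as `%2C`, which decodes to the same value on the server side. Passing the dictionary to `requests.get(BASE_URL, params=params)` should have the same effect.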

In [121]:
import requests # library to handle HTTP requests

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e923ab3d03993001b363c22'},
 'response': {'venues': [{'id': '4e899ca11081c38019bdf1d4',
    'name': 'Bar Restaurant Castilla',
    'location': {'address': 'C. Parcerisa, 15',
     'lat': 41.3682581710997,
     'lng': 2.1346306800842285,
     'labeledLatLngs': [{'label': 'display',
       'lat': 41.3682581710997,
       'lng': 2.1346306800842285}],
     'distance': 410,
     'cc': 'ES',
     'city': 'Barcelona',
     'state': 'Cataluña',
     'country': 'España',
     'formattedAddress': ['C. Parcerisa, 15', 'Barcelona Cataluña', 'España']},
    'categories': [{'id': '4bf58dd8d48988d1c4941735',
      'name': 'Restaurant',
      'pluralName': 'Restaurants',
      'shortName': 'Restaurant',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/default_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1586641671',
    'hasPerk': False},
   {'id': '4adcda54f964a5209f4221e3',
    'name': 'Restaurant Jiu',
    'loca

Now we get the relevant part of JSON and transform it into a pandas dataframe:

In [122]:
# We transform the JSON response into a pandas dataframe
# (pd.json_normalize replaces the deprecated pandas.io.json.json_normalize import; it requires pandas >= 1.0)

# assign relevant part of JSON to venues
venues = results['response']['venues']

# We transform venues into a dataframe
dataframe = pd.json_normalize(venues)
dataframe.head()

  


,id,name,categories,referralId,hasPerk,location.address,location.lat,location.lng,location.labeledLatLngs,location.distance,location.cc,location.city,location.state,location.country,location.formattedAddress,location.postalCode,location.crossStreet,venuePage.id,location.neighborhood
0,4e899ca11081c38019bdf1d4,Bar Restaurant Castilla,"[{'id': '4bf58dd8d48988d1c4941735', 'name': 'R...",v-1586641671,False,"C. Parcerisa, 15",41.368258,2.134631,"[{'label': 'display', 'lat': 41.3682581710997,...",410,ES,Barcelona,Cataluña,España,"[C. Parcerisa, 15, Barcelona Cataluña, España]",,,,
1,4adcda54f964a5209f4221e3,Restaurant Jiu,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1586641671,False,Carrer de Sant Fructuós 133,41.36913,2.141392,"[{'label': 'display', 'lat': 41.36913030784869...",243,ES,Barcelona,Cataluña,España,"[Carrer de Sant Fructuós 133, 08004 Barcelona ...",8004.0,,,
2,4d695a34fd7ea35d4550a44a,Restaurant Can Moreu,[],v-1586641671,False,"crta sant feliu codines, avda rodolf batlle",41.370197,2.13807,"[{'label': 'display', 'lat': 41.37019665000000...",321,ES,Centelles,Cataluña,España,"[crta sant feliu codines, avda rodolf batlle (...",0.0,descon,,
3,51607982e4b0bbcaef9104d3,restaurant trallers,"[{'id': '4bf58dd8d48988d155941735', 'name': 'G...",v-1586641671,False,,41.364529,2.134583,"[{'label': 'display', 'lat': 41.36452865600586...",523,ES,,,España,[España],,,,
4,4ba7817cf964a520df9839e3,Restaurant Gran Muralla China,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1586641671,False,"Suria, 8",41.369812,2.133413,"[{'label': 'display', 'lat': 41.36981175584534...",565,ES,Barcelona,Cataluña,España,"[Suria, 8, 08014 Barcelona Cataluña, España]",8014.0,,,


In [123]:
# We keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy() # explicit copy, so the edits below do not touch `dataframe`

# This function extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError: # fall back to the alternative key when 'categories' is missing
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# We filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# We clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

,name,categories,address,lat,lng,labeledLatLngs,distance,cc,city,state,country,formattedAddress,postalCode,crossStreet,neighborhood,id
0,Bar Restaurant Castilla,Restaurant,"C. Parcerisa, 15",41.368258,2.134631,"[{'label': 'display', 'lat': 41.3682581710997,...",410,ES,Barcelona,Cataluña,España,"[C. Parcerisa, 15, Barcelona Cataluña, España]",,,,4e899ca11081c38019bdf1d4
1,Restaurant Jiu,Chinese Restaurant,Carrer de Sant Fructuós 133,41.36913,2.141392,"[{'label': 'display', 'lat': 41.36913030784869...",243,ES,Barcelona,Cataluña,España,"[Carrer de Sant Fructuós 133, 08004 Barcelona ...",08004,,,4adcda54f964a5209f4221e3
2,Restaurant Can Moreu,,"crta sant feliu codines, avda rodolf batlle",41.370197,2.13807,"[{'label': 'display', 'lat': 41.37019665000000...",321,ES,Centelles,Cataluña,España,"[crta sant feliu codines, avda rodolf batlle (...",00000,descon,,4d695a34fd7ea35d4550a44a
3,restaurant trallers,Gastropub,,41.364529,2.134583,"[{'label': 'display', 'lat': 41.36452865600586...",523,ES,,,España,[España],,,,51607982e4b0bbcaef9104d3
4,Restaurant Gran Muralla China,Chinese Restaurant,"Suria, 8",41.369812,2.133413,"[{'label': 'display', 'lat': 41.36981175584534...",565,ES,Barcelona,Cataluña,España,"[Suria, 8, 08014 Barcelona Cataluña, España]",08014,,,4ba7817cf964a520df9839e3
5,Restaurant Enric i Pau,Spanish Restaurant,"Mineria, 4-6",41.363991,2.136895,"[{'label': 'display', 'lat': 41.36399101617513...",444,ES,Barcelona,Cataluña,España,"[Mineria, 4-6, 08038 Barcelona Cataluña, España]",08038,,,4b94fa00f964a520128a34e3
6,Restaurant catalunya Pita house,Falafel Restaurant,,41.372521,2.141071,"[{'label': 'display', 'lat': 41.37252138215384...",575,ES,,,España,[España],,,,52c95fb711d252d4f8314a9c
7,Restaurant Pizzeria Candela,Pizza Place,"Gran Vía, 325",41.371907,2.145185,"[{'label': 'display', 'lat': 41.37190692791032...",686,ES,Barcelona,Cataluña,España,"[Gran Vía, 325, Barcelona Cataluña, España]",,,,4e887f7655038939f9527b02
8,Restaurant UMA,Spanish Restaurant,"Carrer de Rossend Arús, 12",41.37335,2.137367,"[{'label': 'display', 'lat': 41.37335, 'lng': ...",673,ES,Barcelona,Cataluña,España,"[Carrer de Rossend Arús, 12, 08014 Barcelona C...",08014,,,57b4e4b7498ebc73ebb81a2c
9,Restaurant Ramallo,Sports Bar,,41.373612,2.142626,"[{'label': 'display', 'lat': 41.373612, 'lng':...",730,ES,,,España,[España],,,,518170f5498e85106fd5478c
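The `distance` column is reported by Foursquare in metres from the search point. As a sanity check, the sketch below recomputes one of these distances with the haversine formula, using the candidate location and the coordinates of Bar Restaurant Castilla from the response above. The mean Earth radius of 6,371 km is an assumption of this sketch, so the result should land close to, but not exactly on, the reported 410 m.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


candidate = (41.3675, 2.139444)                     # candidate hotel location
castilla = (41.3682581710997, 2.1346306800842285)   # from the Foursquare response

distance_m = haversine_m(*candidate, *castilla)
print(f"{distance_m:.0f} m")
```

The same function could be used to filter the results against the 700-metre radius, since the table shows that the API can return venues slightly beyond it.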


In [124]:
dataframe_filtered.shape

(30, 16)

In [125]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=16) # generate map centred around the candidate location

# display a red circle marker to represent the candidate location
folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Candidate Location',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# display markers for Tourist attractions to map
for lat, lng, label in zip(df['Latitude'], df['Longitude'], df['Information']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='yellow',
        fill_opacity=0.7,
        parse_html=True).add_to(venues_map)

# display the restaurants as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

## Results and Conclusion <a name="results"></a>

Barcelona has a large number of tourist attractions (more than 500 according to Open Data BCN). The areas where they are densest, Ciutat Vella (Old Town) and Eixample, are also very crowded with accommodation options, so opening a hotel business there could be high risk.

Instead, our attention focused on an area with moderately low hotel density that still offers many tourist attractions and access to public transport. The candidate area is the borough of Sants-Montjuïc, on the southern side of Barcelona. This Olympic-inspired district is full of attractions: Montjuïc Park, museums, a castle, and fantastic city views. We selected a hotel-free area around the Magòria-La Campana station. A fair number of restaurants were found near the candidate location, which adds to the appeal of the eventual hotel.
