# Capstone Project - The Battle of the Neighborhoods

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project we look for an optimal location for a 5 star hotel. This report is aimed at stakeholders interested in opening a **5 star hotel** in **Barcelona**, Spain. Because Barcelona is a well-known tourist destination in Europe, opening a luxury hotel there can be considered a good investment.

Since Barcelona already has many hotels, we will try to detect **locations that are not already crowded with hotels**. We are also particularly interested in **areas with no 5 star hotels in the vicinity**. We would also prefer locations **as close to the city center as possible**, assuming the first two conditions are met.

We will use data science techniques to identify a few of the most promising neighborhoods based on these criteria. The advantages of each area will then be stated clearly so that stakeholders can choose the best final location.

## Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:
* number of existing hotels in the neighborhood (any type of hotel)
* number of and distance to 5 star hotels in the neighborhood, if any
* distance of neighborhood from city center
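
The distance factors above need a way to measure how far apart two latitude/longitude points are. A minimal sketch, assuming the haversine great-circle formula and a mean Earth radius of 6371 km (the function name `haversine_km` is illustrative and not part of the original notebook):

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # haversine of the central angle between the two points
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Calling `haversine_km(center_lat, center_lon, hotel_lat, hotel_lon)` for each hotel would give the distance of that hotel from the chosen city center in kilometres.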


The following data sources will be needed to extract or generate the required information:
* the list of neighbourhoods will be obtained from the Wikipedia page https://en.wikipedia.org/wiki/Districts_of_Barcelona
* centers of candidate areas will be generated algorithmically, and approximate addresses of those centers will be obtained using **Google Maps API reverse geocoding**
* the number of hotels in every neighborhood, with their type and location, will be obtained using the **Foursquare API**
* the coordinates of the Barcelona center will be obtained using **Google Maps API geocoding** of a well-known Barcelona location (Camp Nou)
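
The text does not say how candidate area centers are generated algorithmically. One possible sketch, assuming a square grid of points spaced `step_km` apart and kept only when they fall within `radius_km` of the center (the function name, both parameters and the flat-earth conversion constant are assumptions for illustration):

```python
import math


def candidate_centers(center_lat, center_lon, radius_km, step_km):
    """Return (lat, lon) grid points lying within radius_km of the center."""
    km_per_deg_lat = 111.32  # approximate length of one degree of latitude
    # one degree of longitude shrinks with the cosine of the latitude
    km_per_deg_lon = km_per_deg_lat * math.cos(math.radians(center_lat))
    n = int(radius_km // step_km)  # grid steps on each side of the center
    points = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            dx, dy = i * step_km, j * step_km  # east/north offsets in km
            if math.hypot(dx, dy) <= radius_km:
                points.append((center_lat + dy / km_per_deg_lat,
                               center_lon + dx / km_per_deg_lon))
    return points
```

The local conversion from kilometres to degrees is only reasonable over city-sized areas; a projected coordinate system would be more accurate over larger distances.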

### Neighborhood Candidates

Let's create latitude and longitude coordinates for the centroids of our candidate neighborhoods.
Let's first find the latitude and longitude of the Barcelona city center, using a specific, well-known address and the Google Maps geocoding API.
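
Looking up the coordinates of an address can be sketched as a small helper. This is a hedged example, not the notebook's own cell: `address_to_lat_lon` is a hypothetical name, and the `geocode` argument is assumed to behave like geopy's `Nominatim(...).geocode`, which returns an object with `.latitude` and `.longitude`, or `None` when nothing matches.

```python
def address_to_lat_lon(address, geocode):
    """Resolve an address to (latitude, longitude) with the supplied geocode callable."""
    location = geocode(address)
    if location is None:
        # the geocoder found no match; fail loudly instead of returning None
        raise ValueError(f"Could not geocode address: {address!r}")
    return location.latitude, location.longitude


# Example with geopy (needs network access; the user_agent string is arbitrary):
# from geopy.geocoders import Nominatim
# geolocator = Nominatim(user_agent="barcelona_explorer")
# lat, lon = address_to_lat_lon("Camp Nou, Barcelona, Spain", geolocator.geocode)
```

Passing the geocoder in as an argument keeps the helper independent of any one service, so the same function would work with a Google Maps based callable that returns an object of the same shape.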

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment this line out if folium is already installed
import folium # map rendering library

print('Libraries imported.')

The geographical coordinates of Camp Nou are 41.38089905, 2.122922500751749.


In [3]:
from bs4 import BeautifulSoup
import pandas as pd
import requests
import lxml.html as lh

In [4]:
from urllib.request import urlopen
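# download the Wikipedia page and parse it with the lxml parser
url = "https://en.wikipedia.org/wiki/Districts_of_Barcelona"
source = requests.get(url)
source.raise_for_status()  # stop early if the page could not be downloaded
soup = BeautifulSoup(source.text, 'lxml')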

In [5]:
# iterate over the rows of the first .wikitable on the page and collect the cell text
data = []
columns = []
table = soup.find(class_='wikitable')
for index, tr in enumerate(table.find_all('tr')):
    # strip trailing whitespace (including the newline) from each cell
    section = [cell.text.rstrip() for cell in tr.find_all(['th', 'td'])]

    # the first row of the table is the header
    if index == 0:
        columns = section
    else:
        data.append(section)

In [6]:
barca_df = pd.DataFrame(data=data, columns=columns)
barca_df.head()

| | Number | District | Size km2 | Population | Density inhabitants/km2 | Neighbourhoods | Councilman[2] | Party |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Ciutat Vella | 4.49 | 111290 | 24786 | La Barceloneta, El Gòtic, El Raval, Sant Pere,... | Jordi Rabassa i Massons | Barcelona en Comú |
| 1 | 2 | Eixample | 7.46 | 262485 | 35586 | L'Antiga Esquerra de l'Eixample, La Nova Esque... | Jordi Martí Grau | Barcelona en Comú |
| 2 | 3 | Sants-Montjuïc | 21.35 | 177636 | 8321 | La Bordeta, la Font de la Guatlla, Hostafrancs... | Marc Serra Solé | Barcelona en Comú |
| 3 | 4 | Les Corts | 6.08 | 82588 | 13584 | les Corts, la Maternitat i Sant Ramon, Pedralbes | Xavier Marcé Carol | Socialists' Party of Catalonia |
| 4 | 5 | Sarrià-Sant Gervasi | 20.09 | 140461 | 6992 | El Putget i Farró, Sarrià, Sant Gervasi - la B... | Albert Batlle i Bastardas | Socialists' Party of Catalonia |


In [7]:
# keep only the columns needed to list the neighbourhoods of each district
barca_df = barca_df.drop(['Size km2', 'Population', 'Density inhabitants/km2', 'Councilman[2]', 'Party'], axis=1)

In [9]:
barca_df.head()

| | Number | District | Neighbourhoods |
|---|---|---|---|
| 0 | 1 | Ciutat Vella | La Barceloneta, El Gòtic, El Raval, Sant Pere,... |
| 1 | 2 | Eixample | L'Antiga Esquerra de l'Eixample, La Nova Esque... |
| 2 | 3 | Sants-Montjuïc | La Bordeta, la Font de la Guatlla, Hostafrancs... |
| 3 | 4 | Les Corts | les Corts, la Maternitat i Sant Ramon, Pedralbes |
| 4 | 5 | Sarrià-Sant Gervasi | El Putget i Farró, Sarrià, Sant Gervasi - la B... |


In [10]:
locator = Nominatim(user_agent="myGeocoder")

In [11]:
!pip install googlemaps
from googlemaps import Client as GoogleMaps
import pandas as pd 


In [None]:
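import os

# read the API key from an environment variable instead of hard-coding it in the notebook
# (the variable name GOOGLE_MAPS_API_KEY is an assumption; use whatever name you configured)
gmaps = GoogleMaps(os.environ['GOOGLE_MAPS_API_KEY'])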

In [12]:
from itertools import chain

# return a flat list of names from a series of comma-separated strings,
# stripping the space that follows each comma
def chainer(s):
    return [name.strip() for name in chain.from_iterable(s.str.split(','))]

# calculate lengths of splits
lens = barca_df['Neighbourhoods'].str.split(',').map(len)

# create new dataframe, repeating or chaining as appropriate
barca_df1 = pd.DataFrame({'Number': np.repeat(barca_df['Number'], lens),
                    'District': np.repeat(barca_df['District'], lens),
                    'Neighbourhoods': chainer(barca_df['Neighbourhoods'])})

print(barca_df1)

  Number             District                                 Neighbourhoods
0      1         Ciutat Vella                                 La Barceloneta
0      1         Ciutat Vella                                       El Gòtic
0      1         Ciutat Vella                                       El Raval
0      1         Ciutat Vella                                      Sant Pere
0      1         Ciutat Vella                     Santa Caterina i la Ribera
1      2             Eixample                L'Antiga Esquerra de l'Eixample
1      2             Eixample                 La Nova Esquerra de l'Eixample
1      2             Eixample                            Dreta de l'Eixample
1      2             Eixample                                     Fort Pienc
1      2             Eixample                                Sagrada Família
1      2             Eixample                                    Sant Antoni
2      3       Sants-Montjuïc                                     La Bordeta
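
The same one-row-per-neighbourhood reshaping can also be sketched with `DataFrame.explode`, available in pandas 0.25 and later. This is an alternative to the `np.repeat`/`chain` approach, not the notebook's own method; the helper name `split_neighbourhoods` is hypothetical, and the sample row is taken from the Les Corts entry of the table shown earlier.

```python
import pandas as pd


def split_neighbourhoods(df, column="Neighbourhoods"):
    """Return one row per neighbourhood from a comma-separated column."""
    # turn each string into a list, then give every list element its own row
    out = df.assign(**{column: df[column].str.split(",")}).explode(column)
    out = out.reset_index(drop=True)
    out[column] = out[column].str.strip()  # drop spaces left after each comma
    return out


sample = pd.DataFrame({
    "Number": ["4"],
    "District": ["Les Corts"],
    "Neighbourhoods": ["les Corts, la Maternitat i Sant Ramon, Pedralbes"],
})
result = split_neighbourhoods(sample)
print(result)
```

Unlike the version above, this resets the index, so each neighbourhood gets its own row label instead of repeating the label of its district.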

In [13]:
locator = Nominatim(user_agent="myGeocoder")
from geopy.extra.rate_limiter import RateLimiter

# 1 - convenient function to delay between geocoding calls
geocode = RateLimiter(locator.geocode, min_delay_seconds=1)
# 2 - create location column
barca_df1['location'] = barca_df1['Neighbourhoods'].apply(geocode)
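
To make the idea of a minimum delay between geocoding calls concrete, here is a simplified stand-in written with the standard library only. It is a sketch of the throttling idea, not geopy's implementation; geopy's own `RateLimiter` does more, such as retrying failed calls. The name `min_delay` and the injectable `clock` and `sleep` parameters are assumptions added for illustration.

```python
import time


def min_delay(func, min_delay_seconds, clock=time.monotonic, sleep=time.sleep):
    """Wrap func so that consecutive calls start at least min_delay_seconds apart."""
    last_call = None  # time at which the previous call started

    def wrapper(*args, **kwargs):
        nonlocal last_call
        if last_call is not None:
            # wait for whatever part of the delay has not yet elapsed
            wait = min_delay_seconds - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        return func(*args, **kwargs)

    return wrapper
```

Public geocoding services such as Nominatim ask clients to limit their request rate, which is why the neighbourhood lookups are spaced one second apart.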

In [None]:
barca_df1.head()

In [None]:
# 3 - create a (latitude, longitude, altitude) tuple from the location column
barca_df1['point'] = barca_df1['location'].apply(lambda loc: tuple(loc.point) if loc else None)
# 4 - split the point column into latitude, longitude and altitude columns
barca_df1[['latitude', 'longitude', 'altitude']] = pd.DataFrame(barca_df1['point'].tolist(), index=barca_df1.index)

In [None]:
barca_df1.head()

In [None]:
# drop the helper columns; assign the result back, since drop() returns a new dataframe
barca_df1 = barca_df1.drop(columns=['location', 'point', 'altitude'])

In [None]:
# remove neighbourhoods that could not be geocoded
barca_df1 = barca_df1.dropna()
print(barca_df1)

In [None]:
address = 'Barcelona'

geolocator = Nominatim(user_agent="barcelona_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Barcelona are {}, {}.'.format(latitude, longitude))

In [None]:
map_barcelona = folium.Map(location=[latitude, longitude], zoom_start=11)

# add a circle marker to the map for every geocoded neighbourhood
for lat, lng, label in zip(barca_df1['latitude'], barca_df1['longitude'], barca_df1['Neighbourhoods']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_barcelona)  

map_barcelona
    