# Capstone Project - The Battle of Neighborhoods (Week 1)

 **_How to choose the best hotel in Paris and nearby areas?_**

## I. Business Problem Section

### Introduction

Paris is one of the world capitals of art, culture, gastronomy and fashion, and millions of travelers visit it each year to explore attractions such as the Eiffel Tower, Musée du Louvre, Cathédrale Notre-Dame de Paris, Avenue des Champs-Élysées, Disneyland, the Palace of Versailles and Musée d'Orsay. Many travel agencies offer deals on flights, hotel stays and rental cars, but some people prefer to plan their holiday on their own.

Anyone planning a trip has many things to consider: choosing the right location, accommodation, flights and rental cars, as well as attractions, restaurants, stores and other facilities.

In both cases, whether travelers plan on their own or with a travel agency, they need a list of recommendations and criteria for choosing the best option. A good idea would therefore be to develop an application that uses machine learning techniques and Foursquare location data to cluster the neighborhoods of a city, in our case Paris, to recommend venues, and to help people looking for the right hotel make the best decision.

### Business Problem

In this scenario, the business problem I am trying to solve is: how can I support different stakeholders (individual travelers or travel agencies) in choosing the best accommodation? Which place would I recommend as the best place to stay?

To solve this business problem, we will use Foursquare location data and we will create machine learning models to cluster Paris neighborhoods in order to recommend profitable hotels based on different surrounding facilities such as venues, restaurants, stores, attractions and so on.
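As a rough illustration of the clustering idea, the sketch below groups four made-up hotels by the mix of venues around them. The hotel names, the three venue categories, the frequency values and the choice of two clusters are all assumptions for the example, not values from this project.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical venue-category frequencies around four hotels (one row each).
# Columns: share of restaurants, museums and shops among nearby venues.
hotel_names = ["Hotel A", "Hotel B", "Hotel C", "Hotel D"]
venue_profile = np.array([
    [0.70, 0.10, 0.20],  # restaurant-heavy surroundings
    [0.65, 0.15, 0.20],
    [0.10, 0.70, 0.20],  # museum-heavy surroundings
    [0.15, 0.65, 0.20],
])

# Group hotels whose surroundings look alike; k=2 is an arbitrary choice here.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(venue_profile)
cluster_of = dict(zip(hotel_names, kmeans.labels_.tolist()))
print(cluster_of)
```

Hotels that share a cluster label have similar surroundings, so a traveler who liked one of them could be pointed to the others in the same cluster. The numeric labels themselves carry no meaning.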

With these models, stakeholders will receive a wide range of accommodation recommendations, will know which facilities they can enjoy on vacation and, in this way, will know which hotel suits them best.

## II. Data Section

This project uses data from two sources. Data about hotels in Paris and nearby areas was taken from https://www.accorhotels.com. The postal code, hotel name and address were collected and assembled into a dataset of 40 observations, one per hotel. To support a better analysis, the hotels were selected from different areas and with different facilities.

The second source is Foursquare location data, used to explore and target recommended locations across different venues. Everything was arranged into a pandas dataframe for exploration, visualization and modeling.
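To make the "arranged into a pandas dataframe" step concrete, here is a minimal sketch that flattens a nested JSON payload into a table. The payload is hand-written for illustration and its field names are assumptions, not the documented Foursquare schema. It uses `pd.json_normalize`, which is available in pandas 1.0 and later.

```python
import pandas as pd

# Hypothetical payload: field names are illustrative, not a documented schema.
sample_response = {
    "venues": [
        {"name": "Café Exemple", "category": "Café",
         "location": {"lat": 48.8738, "lng": 2.3080}},
        {"name": "Musée Exemple", "category": "Museum",
         "location": {"lat": 48.8606, "lng": 2.3376}},
    ]
}

# json_normalize flattens nested dicts into dotted column names
# such as "location.lat", which are then renamed for convenience.
venues = pd.json_normalize(sample_response["venues"])
venues = venues.rename(columns={"location.lat": "lat", "location.lng": "lng"})
print(venues[["name", "category", "lat", "lng"]])
```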

The final dataset, which combines Foursquare location data with the hotel data for Paris and nearby areas, will be used to develop our machine learning models and to cluster the neighborhoods of Paris and nearby areas, in order to provide the best hotel recommendations based on a wide range of surrounding facilities.
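One possible way to combine the two sources is sketched below, assuming each venue row already records which hotel it was found near. The frames `hotels_demo` and `venues_demo`, their rows and the `Venue_Category` column are hypothetical stand-ins, not the project's data.

```python
import pandas as pd

# Hypothetical rows: names and categories are made up for illustration.
hotels_demo = pd.DataFrame({
    "Hotel_Name": ["Hotel A", "Hotel B"],
    "PostalCode": [75008, 75009],
})
venues_demo = pd.DataFrame({
    "Hotel_Name": ["Hotel A", "Hotel A", "Hotel B", "Hotel B"],
    "Venue_Category": ["Restaurant", "Museum", "Restaurant", "Restaurant"],
})

# One row per (hotel, nearby venue), carrying the hotel attributes along.
combined = hotels_demo.merge(venues_demo, on="Hotel_Name", how="left")

# One-hot encode the categories, then average per hotel to get frequencies.
onehot = pd.get_dummies(combined["Venue_Category"], dtype=float)
onehot.insert(0, "Hotel_Name", combined["Hotel_Name"])
profile = onehot.groupby("Hotel_Name").mean()
print(profile)
```

Each row of `profile` describes the share of each venue category around one hotel, which is the kind of numeric table a clustering model can work with.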

In [9]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if it is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe (on pandas >= 1.0, use pd.json_normalize instead)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # installs folium; comment this line out if it is already installed
import folium # map rendering library

print('Libraries imported.')

Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
geopy                     1.18.1                     py_0    conda-forge
Fetching package metadata .............
Solving package specifications: .

Package plan for installation in environment /opt/conda/envs/DSX-Python35:

The following NEW packages will be INSTALLED:

    altair:  2.2.2-py35_1 conda-forge
    branca:  0.3.1-py_0   conda-forge
    folium:  0.5.0-py_0   conda-forge
    vincent: 0.4.4-py_1   conda-forge

altair-2.2.2-p 100% |################################| Time: 0:00:00  36.07 MB/s
branca-0.3.1-p 100% |################################| Time: 0:00:00  32.88 MB/s
vincent-0.4.4- 100% |################################| Time: 0:00:00  38.51 MB/s
folium-0.5.0-p 100% |################################| Time: 0:00:00  46.43 MB/s
Libraries imported.


In [10]:
# Read the data (Source: www.accorhotels.com)
hotels = pd.read_csv('https://raw.githubusercontent.com/OanaStr/Coursera_Capstone/master/Hotels.csv', encoding="ISO-8859-1")
hotels.head(5)

Unnamed: 0,PostalCode,Hotel_Name,Address
0,75008,Mercure Paris Opera Garnier Hotel,4 rue de l 'Isly
1,75009,Scribe Paris Opéra by Sofitel,1 rue Scribe
2,75017,Mercure Paris St Lazare Monceau hotel,99 bis Rue de Rome
3,75002,Hôtel Stendhal Place Vendôme Paris - MGallery ...,22 rue Danielle Casanova
4,75018,Mercure Paris Montmartre Sacré-Coeur Hotel,3 rue Caulaincourt


In [11]:
hotels.shape

(40, 3)

In [12]:
address = 'Paris, FR'
geolocator = Nominatim(user_agent="p_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Paris are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Paris are 48.8566101, 2.3514992.


In [19]:
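# Geocode each postal code and keep the result as a (latitude, longitude) tuple.
# Note: geocode() returns None when nothing is found, which would make the lambda fail.
geolocator = Nominatim(user_agent="my-application")
hotels['city_coord'] = hotels['PostalCode'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))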
hotels.head(5)

Unnamed: 0,PostalCode,Hotel_Name,Address,Latitude,Longitude,city_coord
0,75008,Mercure Paris Opera Garnier Hotel,4 rue de l 'Isly,48.873601,2.307613,"(48.87360115, 2.30761301337209)"
1,75009,Scribe Paris Opéra by Sofitel,1 rue Scribe,48.87751,2.336875,"(48.8775101429297, 2.33687546379827)"
2,75017,Mercure Paris St Lazare Monceau hotel,99 bis Rue de Rome,48.883574,2.304989,"(48.8835744925048, 2.30498942147319)"
3,75002,Hôtel Stendhal Place Vendôme Paris - MGallery ...,22 rue Danielle Casanova,48.867317,2.344443,"(48.8673173622243, 2.34444344701296)"
4,75018,Mercure Paris Montmartre Sacré-Coeur Hotel,3 rue Caulaincourt,48.89008,2.34984,"(48.8900799653822, 2.34984008370641)"
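The geocoding cell above assumes every lookup succeeds. A more defensive variant could look like the sketch below, which returns missing values instead of raising when a lookup finds nothing. The helper `safe_coords`, the `FakeLocation` class, the stand-in `fake_geocode` and the `", France"` suffix are hypothetical names and choices made for this example. In a real run one would pass `geolocator.geocode` instead of the stand-in.

```python
import math

def safe_coords(geocode, query):
    """Return (lat, lon) for query, or (nan, nan) if the lookup finds nothing."""
    location = geocode(query)
    if location is None:
        return (math.nan, math.nan)
    return (location.latitude, location.longitude)

# Stand-in result object and geocoder, used only so the demo runs offline.
class FakeLocation:
    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude

def fake_geocode(query):
    known = {"75008, France": FakeLocation(48.8736, 2.3076)}
    return known.get(query)  # None for unknown queries

found = safe_coords(fake_geocode, "75008, France")
missing = safe_coords(fake_geocode, "00000, France")
print(found, missing)
```

Rows that come back as `nan` can then be inspected or re-queried by hand rather than stopping the whole `apply` call.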


In [20]:
# Split the (latitude, longitude) tuples into two separate columns
hotels[['Latitude', 'Longitude']] = hotels['city_coord'].apply(pd.Series)

In [21]:
# Drop the helper column now that the coordinates are stored separately
hotels = hotels.drop(columns=['city_coord'])

In [22]:
hotels.head(5)

Unnamed: 0,PostalCode,Hotel_Name,Address,Latitude,Longitude
0,75008,Mercure Paris Opera Garnier Hotel,4 rue de l 'Isly,48.873601,2.307613
1,75009,Scribe Paris Opéra by Sofitel,1 rue Scribe,48.87751,2.336875
2,75017,Mercure Paris St Lazare Monceau hotel,99 bis Rue de Rome,48.883574,2.304989
3,75002,Hôtel Stendhal Place Vendôme Paris - MGallery ...,22 rue Danielle Casanova,48.867317,2.344443
4,75018,Mercure Paris Montmartre Sacré-Coeur Hotel,3 rue Caulaincourt,48.89008,2.34984
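The coordinate split done with `hotels['city_coord'].apply(pd.Series)` can also be written by building both columns directly from the list of tuples, which tends to be faster on larger frames. The sketch below uses a hypothetical two-row frame named `demo` in place of the real `hotels` dataframe.

```python
import pandas as pd

# Hypothetical two-row frame mirroring the 'city_coord' tuple column.
demo = pd.DataFrame({
    "PostalCode": [75008, 75009],
    "city_coord": [(48.8736, 2.3076), (48.8775, 2.3369)],
})

# Build both coordinate columns in one step from the list of tuples,
# keeping the original index so the join lines up row by row.
coords = pd.DataFrame(demo["city_coord"].tolist(),
                      columns=["Latitude", "Longitude"], index=demo.index)
demo = demo.drop(columns=["city_coord"]).join(coords)
print(demo)
```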
