# Capstone Project - The Battle of Neighborhoods (Week 2)


**_How to choose the best hotel in Paris and nearby areas?_**


## I. Business Problem Section

### Introduction

Paris is one of the world capitals of art, culture, gastronomy and fashion. Millions of travelers visit each year to explore attractions such as the Eiffel Tower, Musée du Louvre, Cathédrale Notre-Dame de Paris, Avenue des Champs-Élysées, Disneyland, the Palace of Versailles and Musée d'Orsay. Many travel agencies offer deals on flights, hotel stays and rental cars, but some people prefer to plan their holiday on their own.

Planning a trip involves many choices: the right location, accommodation, flights and rental cars, as well as attractions, restaurants, stores and other facilities.

In both cases, whether travelers plan on their own or through a travel agency, they need a list of recommendations and criteria for choosing the best option. A useful approach is therefore to build an application that combines machine learning techniques with Foursquare location data to cluster the neighborhoods of a city (in our case Paris), recommend venues, and help people who are looking for the right hotel make the best decision.

### Business Problem

In this scenario, the business problem I am trying to solve is: how can I support different stakeholders (travelers or tourism agencies) in choosing the best accommodation? Which place would I recommend as the best one to stay?

To solve this business problem, we will use Foursquare location data and build machine learning models that cluster the neighborhoods of Paris and nearby areas, in order to recommend suitable hotels based on surrounding facilities such as venues, restaurants, stores and attractions.

Through these models, stakeholders will receive a wide range of accommodation recommendations and will know which facilities they can enjoy on vacation, so they can identify the hotel that is most suitable for them.

## II. Data Section

Data from two different sources was used. Data about hotels in Paris and nearby areas was taken from https://www.accorhotels.com. Information on the postal code, hotel name and address was collected and integrated into a database that contains 40 observations, one per hotel. For a better analysis, hotels were selected from different areas and with different facilities.

The second source is Foursquare location data, used to explore and target recommended locations across different venues. Everything was arranged into a pandas dataframe for exploration, visualization and modeling.

The final database, which combines Foursquare location data with the hotel data for Paris and nearby areas, will be used to develop our machine learning models and to cluster the neighborhoods of Paris and nearby areas, in order to provide the best recommendations for choosing a hotel based on a wide range of surrounding facilities.

## III. Methodology Section

The methodology section presents the main components of the report, namely:
1. Data Collection: to bring all the required information into a single dataframe
2. Data Exploration and Understanding: to inspect the data and to extract and analyze its hidden insights
3. Data Modeling (K-means Clustering): to cluster the hotels in Paris and nearby areas, see how they group together and on what information, with the final purpose of providing useful information for accommodation strategies

#### Download all the dependencies

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe (in pandas 1.0 and later, use pandas.json_normalize instead)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # installs folium; comment this line out if folium is already installed
import folium # map rendering library

print('Libraries imported.')

Fetching package metadata .............
Solving package specifications: .

Package plan for installation in environment /opt/conda/envs/DSX-Python35:

The following NEW packages will be INSTALLED:

    geographiclib: 1.49-py_0   conda-forge
    geopy:         1.18.1-py_0 conda-forge

geographiclib- 100% |################################| Time: 0:00:00  23.10 MB/s
geopy-1.18.1-p 100% |################################| Time: 0:00:00  35.45 MB/s
Fetching package metadata .............
Solving package specifications: .

Package plan for installation in environment /opt/conda/envs/DSX-Python35:

The following NEW packages will be INSTALLED:

    altair:  2.2.2-py35_1 conda-forge
    branca:  0.3.1-py_0   conda-forge
    folium:  0.5.0-py_0   conda-forge
    vincent: 0.4.4-py_1   conda-forge

altair-2.2.2-p 100% |################################| Time: 0:00:00  47.85 MB/s
branca-0.3.1-p 100% |################################| Time: 0:00:00  33.46 MB/s
vincent-0.4.4- 100% |###################

### 1. Data Collection

As mentioned previously, data from two different sources was used. The first database covers hotels in Paris and nearby areas and contains the postal code, hotel name and address of 40 hotels. For a better analysis, hotels were selected from different areas and with different facilities. The second source is Foursquare location data, used to explore and target recommended locations across different venues. Everything was arranged into a single pandas dataframe, shown below. I collected the hotel data myself and processed it before importing it into this notebook.

In [3]:
# Read the Paris hotels data (Source: www.accorhotels.com)
hotels = pd.read_csv('https://raw.githubusercontent.com/OanaStr/Coursera_Capstone/master/Hotels.csv',encoding = "ISO-8859-1")
hotels

Unnamed: 0,PostalCode,Hotel_Name,Address
0,75008,Mercure Paris Opera Garnier Hotel,4 rue de l 'Isly
1,75009,Scribe Paris Opéra by Sofitel,1 rue Scribe
2,75017,Mercure Paris St Lazare Monceau hotel,99 bis Rue de Rome
3,75002,Hôtel Stendhal Place Vendôme Paris - MGallery ...,22 rue Danielle Casanova
4,75018,Mercure Paris Montmartre Sacré-Coeur Hotel,3 rue Caulaincourt
5,75010,Hôtel L'Échiquier Opéra Paris - MGallery by So...,38 Rue De L Echiquier
6,75015,ibis Paris Tour Montparnasse 15th,22 Avenue du Maine
7,75014,Mercure Paris Gare Montparnasse Hotel,20 rue de la Gaité
8,75012,Mercure Paris Gare de Lyon TGV hotel,2 place Louis Armand
9,75013,Mercure Paris Bercy Bibliothèque Hotel,"6, boulevard Vincent Auriol"


In [4]:
hotels.shape

(40, 3)

In [5]:
# Use geopy library to get the latitude and longitude values of Paris city
address = 'Paris,France'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Paris are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Paris are 48.8566101, 2.3514992.


In [6]:
# Geocode each postal code and store the resulting (latitude, longitude) pair
geolocator = Nominatim(user_agent="my-application")
hotels['city_coord'] = hotels['PostalCode'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))
hotels.head(5)

Unnamed: 0,PostalCode,Hotel_Name,Address,city_coord
0,75008,Mercure Paris Opera Garnier Hotel,4 rue de l 'Isly,"(48.87360115, 2.30761301337209)"
1,75009,Scribe Paris Opéra by Sofitel,1 rue Scribe,"(48.88137175, 2.33357002448946)"
2,75017,Mercure Paris St Lazare Monceau hotel,99 bis Rue de Rome,"(48.8848133, 2.3028393)"
3,75002,Hôtel Stendhal Place Vendôme Paris - MGallery ...,22 rue Danielle Casanova,"(48.8673173622242, 2.34444344701296)"
4,75018,Mercure Paris Montmartre Sacré-Coeur Hotel,3 rue Caulaincourt,"(48.886698, 2.3526602)"
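
The one-line `apply` chain above fails with an `AttributeError` as soon as the geocoder returns `None` for a postal code, and it sends all requests back to back. The following is a minimal sketch of a more defensive loop, not the method used in this notebook. The names `geocode_postal_codes`, `Location` and `fake_geocode` are hypothetical; in the notebook, `geocode_fn` would be `geolocator.geocode`. The one-second default pause is an assumption intended to stay within the public Nominatim usage policy.

```python
import time
from collections import namedtuple

# Stand-in for the object returned by a geocoder: it only needs
# .latitude and .longitude, like the results used in the notebook.
Location = namedtuple("Location", ["latitude", "longitude"])


def geocode_postal_codes(postal_codes, geocode_fn, pause_seconds=1.0):
    """Map each postal code to (latitude, longitude), or None if not found.

    geocode_fn is any callable that takes a query string and returns an
    object with .latitude/.longitude, or None when nothing matches.
    """
    coords = {}
    for code in postal_codes:
        if code in coords:          # skip postal codes already resolved
            continue
        location = geocode_fn(str(code))
        if location is not None:
            coords[code] = (location.latitude, location.longitude)
        else:
            coords[code] = None     # keep the gap visible instead of raising
        time.sleep(pause_seconds)   # pause between requests to the service
    return coords


# Hypothetical offline geocoder so the sketch runs without network access.
# The two coordinates are copied from the table above.
def fake_geocode(query):
    known = {"75017": Location(48.8848133, 2.3028393),
             "75018": Location(48.886698, 2.3526602)}
    return known.get(query)


result = geocode_postal_codes([75017, 75018, 75017, 99999], fake_geocode, pause_seconds=0)
print(result)
```

Postal codes that could not be resolved stay in the result as `None`, so they can be inspected or dropped before the latitude and longitude columns are created.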


In [7]:
hotels[['Latitude', 'Longitude']] = hotels['city_coord'].apply(pd.Series)

In [8]:
Hotels = hotels.drop(columns=['city_coord'])

In [9]:
Hotels

Unnamed: 0,PostalCode,Hotel_Name,Address,Latitude,Longitude
0,75008,Mercure Paris Opera Garnier Hotel,4 rue de l 'Isly,48.873601,2.307613
1,75009,Scribe Paris Opéra by Sofitel,1 rue Scribe,48.881372,2.33357
2,75017,Mercure Paris St Lazare Monceau hotel,99 bis Rue de Rome,48.884813,2.302839
3,75002,Hôtel Stendhal Place Vendôme Paris - MGallery ...,22 rue Danielle Casanova,48.867317,2.344443
4,75018,Mercure Paris Montmartre Sacré-Coeur Hotel,3 rue Caulaincourt,48.886698,2.35266
5,75010,Hôtel L'Échiquier Opéra Paris - MGallery by So...,38 Rue De L Echiquier,48.877463,2.358011
6,75015,ibis Paris Tour Montparnasse 15th,22 Avenue du Maine,48.841429,2.295176
7,75014,Mercure Paris Gare Montparnasse Hotel,20 rue de la Gaité,48.828877,2.326006
8,75012,Mercure Paris Gare de Lyon TGV hotel,2 place Louis Armand,48.842645,2.387497
9,75013,Mercure Paris Bercy Bibliothèque Hotel,"6, boulevard Vincent Auriol",48.82931,2.362457


In [10]:
# Create a map of Paris using latitude and longitude values
map_paris = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
# the loop variable is named hotel_name so that it does not overwrite the hotels dataframe
for lat, lng, hotel_name, street in zip(Hotels['Latitude'], Hotels['Longitude'], Hotels['Hotel_Name'], Hotels['Address']):
    label = '{}, {}'.format(street, hotel_name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_paris)  

map_paris

Using the latitude and longitude values, a map of Paris and nearby areas was created. Each marker is placed at the coordinates geocoded from the hotel's postal code, and its popup shows the hotel's address and name. For a better analysis, hotels near Paris with different surrounding facilities were also included.

In [11]:
# Define Foursquare Credentials and Version
CLIENT_ID = 'IP15XLXSVGNVS2OAB01MRXZ2TLBSKVAYWAOJLY4LDBRM3XL3' # your Foursquare ID
CLIENT_SECRET = 'FBROZZGDJ03YQEKCLETRLVICLJLUILMWZ02KWOHT4LOYVNMA' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: IP15XLXSVGNVS2OAB01MRXZ2TLBSKVAYWAOJLY4LDBRM3XL3
CLIENT_SECRET: FBROZZGDJ03YQEKCLETRLVICLJLUILMWZ02KWOHT4LOYVNMA
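
Hard-coding and printing API credentials exposes them to anyone who can read the notebook. One common alternative, sketched here under stated assumptions, is to read them from environment variables and mask them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_foursquare_credentials` and `mask` are hypothetical conventions for this sketch, not something Foursquare requires.

```python
import os


def load_foursquare_credentials(environ=os.environ):
    """Read the Foursquare credentials from environment variables.

    The variable names are a convention chosen for this sketch.
    """
    try:
        return environ["FOURSQUARE_CLIENT_ID"], environ["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("Missing environment variable: {}".format(missing)) from None


def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + "*" * (len(secret) - visible)


# Demonstration with a stand-in mapping instead of the real environment.
demo_env = {
    "FOURSQUARE_CLIENT_ID": "EXAMPLE_ID_123",
    "FOURSQUARE_CLIENT_SECRET": "EXAMPLE_SECRET_456",
}
CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(CLIENT_ID))
print("CLIENT_SECRET:", mask(CLIENT_SECRET))
```

In a real run, the call would be `load_foursquare_credentials()` with the two variables set in the shell or in the notebook environment before starting the kernel.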


### 2. Data Exploration

#### 2.1. Data Understanding

In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return a dataframe with one row per venue found within `radius` meters of each hotel.

    Relies on the module-level names CLIENT_ID, CLIENT_SECRET, VERSION and LIMIT;
    LIMIT must be defined before the function is called.
    """
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request and keep only the list of returned items
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-hotel lists into a single dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Hotel',
                  'Hotel Latitude',
                  'Hotel Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues
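
The list comprehension in `getNearbyVenues` assumes that every returned venue has at least one category; an empty `categories` list would raise an `IndexError`. The sketch below isolates the row-building step so that this case can be handled. The helper `venue_row` and the `"Unknown"` placeholder are hypothetical, and `sample_items` is a hand-written payload containing only the keys that the function above reads (the first item reuses values from the notebook's output, the second is invented to show the fallback).

```python
def venue_row(hotel, hotel_lat, hotel_lng, item):
    """Build one output row from a single item of the explore response."""
    venue = item["venue"]
    categories = venue.get("categories") or []
    # Fall back to a placeholder instead of raising IndexError on an empty list.
    category = categories[0]["name"] if categories else "Unknown"
    return (hotel, hotel_lat, hotel_lng,
            venue["name"], venue["location"]["lat"], venue["location"]["lng"],
            category)


# Hypothetical payload with only the keys that getNearbyVenues reads.
sample_items = [
    {"venue": {"name": "Apicius",
               "location": {"lat": 48.873471, "lng": 2.307362},
               "categories": [{"name": "French Restaurant"}]}},
    {"venue": {"name": "Venue without category",
               "location": {"lat": 48.87, "lng": 2.30},
               "categories": []}},
]

rows = [venue_row("Mercure Paris Opera Garnier Hotel", 48.873601, 2.307613, item)
        for item in sample_items]
for row in rows:
    print(row)
```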

In [13]:
# Run the above function on each hotel and create a new dataframe called hotels_venues
LIMIT = 100   # maximum number of venues returned per hotel
radius = 500  # search radius in meters around each hotel
hotels_venues = getNearbyVenues(names=Hotels['Hotel_Name'],
                                latitudes=Hotels['Latitude'],
                                longitudes=Hotels['Longitude'],
                                radius=radius)

Mercure Paris Opera Garnier Hotel
Scribe Paris Opéra by Sofitel
Mercure Paris St Lazare Monceau hotel
Hôtel Stendhal Place Vendôme Paris - MGallery by Sofitel
Mercure Paris Montmartre Sacré-Coeur Hotel
Hôtel L'Échiquier Opéra Paris - MGallery by Sofitel
ibis Paris Tour Montparnasse 15th
Mercure Paris Gare Montparnasse Hotel
Mercure Paris Gare de Lyon TGV hotel
Mercure Paris Bercy Bibliothèque Hotel
Novotel Paris Les Halles
Aparthotel Adagio access Paris La Villette
Novotel Suites Paris Stade de France hotel
Pullman Paris La Défense
ibis Poissy
Hôtel Mercure Paris Ouest Saint Germain
ibis budget Chambourcy Saint Germain
Hôtel Le Louis Versailles Château - MGallery by Sofitel
bis Paris Coeur d'Orly Airport
Novotel Paris Orly Rungis
ibis Styles Antony Paris Sud
ibis Massy
hotelF1 Chilly Mazarin les Champarts
ibis budget Villeneuve le Roi
bis budget Vitry sur Seine N7
hotelF1 Paris Porte de Montreuil
ibis Styles Paris Val de Fontenay
ibis budget Villemomble
ibis budget Orly Chevilly Tram 7

In [14]:
# Check the size of the resulting dataframe
print(hotels_venues.shape)
hotels_venues.head(5)

(1133, 7)


Unnamed: 0,Hotel,Hotel Latitude,Hotel Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Mercure Paris Opera Garnier Hotel,48.873601,2.307613,Hôtel Champs-Élysées Plaza,48.873726,2.306323,Hotel
1,Mercure Paris Opera Garnier Hotel,48.873601,2.307613,Hôtel Daniel,48.872914,2.307275,Hotel
2,Mercure Paris Opera Garnier Hotel,48.873601,2.307613,Apicius,48.873471,2.307362,French Restaurant
3,Mercure Paris Opera Garnier Hotel,48.873601,2.307613,Sens Unique,48.871747,2.306467,French Restaurant
4,Mercure Paris Opera Garnier Hotel,48.873601,2.307613,Hôtel Bradford-Élysées,48.872856,2.308091,Hotel


In [15]:
# Check how many venues were returned for each hotel
hotels_venues.groupby('Hotel').count()

Hotel,Hotel Latitude,Hotel Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Aparthotel Adagio access Paris La Villette,24,24,24,24,24,24
Hôtel L'Échiquier Opéra Paris - MGallery by Sofitel,100,100,100,100,100,100
Hôtel Le Louis Versailles Château - MGallery by Sofitel,47,47,47,47,47,47
Hôtel Mercure Paris Ouest Saint Germain,5,5,5,5,5,5
Hôtel Stendhal Place Vendôme Paris - MGallery by Sofitel,100,100,100,100,100,100
Mercure Paris Bercy Bibliothèque Hotel,49,49,49,49,49,49
Mercure Paris Gare Montparnasse Hotel,30,30,30,30,30,30
Mercure Paris Gare de Lyon TGV hotel,52,52,52,52,52,52
Mercure Paris Montmartre Sacré-Coeur Hotel,67,67,67,67,67,67
Mercure Paris Opera Garnier Hotel,100,100,100,100,100,100


This table shows how many venues were returned for each hotel. Hotels in the center of Paris have the most nearby facilities (e.g. Hôtel L'Échiquier Opéra Paris - MGallery by Sofitel, Hôtel Stendhal Place Vendôme Paris - MGallery by Sofitel, Mercure Paris Opera Garnier Hotel, Novotel Paris Les Halles, Scribe Paris Opéra by Sofitel), compared to hotels located further from the center (e.g. ibis Poissy, bis budget Goussainville CDG, Hôtel Mercure Paris Ouest Saint Germain, Novotel Paris Charles-de-Gaulle Airport). Because each request is capped at LIMIT = 100 venues, a count of 100 means the cap was reached, and the true number of venues within the radius may be higher.
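
Since a count equal to the request limit may hide additional venues, it can help to flag those hotels explicitly. This is a small sketch using a subset of the counts from the table above; the dictionary `venue_counts` is typed in by hand for illustration, whereas in the notebook the counts would come from `hotels_venues.groupby('Hotel').count()`.

```python
LIMIT = 100  # same cap as in the notebook

# Venue counts copied from the table above (a subset of the hotels).
venue_counts = {
    "Aparthotel Adagio access Paris La Villette": 24,
    "Hôtel L'Échiquier Opéra Paris - MGallery by Sofitel": 100,
    "Hôtel Le Louis Versailles Château - MGallery by Sofitel": 47,
    "Hôtel Mercure Paris Ouest Saint Germain": 5,
    "Mercure Paris Opera Garnier Hotel": 100,
}

# Hotels whose count reaches the cap may have more venues nearby than were returned.
capped = sorted(name for name, count in venue_counts.items() if count >= LIMIT)
print(capped)
```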

In [16]:
# Find out how many unique categories can be curated from all the returned venues
print('There are {} unique categories.'.format(len(hotels_venues['Venue Category'].unique())))

There are 203 unique categories.


#### 2.2. Analyze each hotel

In [17]:
# one hot encoding
hotels_onehot = pd.get_dummies(hotels_venues[['Venue Category']], prefix="", prefix_sep="")

# add hotel column back to dataframe
# (note: 'Hotel' is also a venue category, so this overwrites that indicator column)
hotels_onehot['Hotel'] = hotels_venues['Hotel']

# move the last column to the first position
# (because no column was appended above, this moves 'Yoga Studio', not 'Hotel')
fixed_columns = [hotels_onehot.columns[-1]] + list(hotels_onehot.columns[:-1])
hotels_onehot = hotels_onehot[fixed_columns]

hotels_onehot.head(5)

Unnamed: 0,Yoga Studio,African Restaurant,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Bagel Shop,Bakery,Bar,Basketball Court,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Belgian Restaurant,Big Box Store,Bike Rental / Bike Share,Bistro,Bookstore,Boutique,Bowling Alley,Brasserie,Breakfast Spot,Breton Restaurant,Brewery,Bridge,Burger Joint,Bus Station,Bus Stop,Cafeteria,Café,Cajun / Creole Restaurant,Cambodian Restaurant,Camera Store,Candy Store,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Creperie,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Disc Golf,Dive Bar,Donut Shop,Duty-free Shop,EV Charging Station,Electronics Store,Entertainment Service,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Film Studio,Fish & Chips Shop,Food & Drink Shop,Food Truck,Franconian Restaurant,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Gas Station,Gastropub,German Restaurant,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Government Building,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hawaiian Restaurant,Health Food Store,Historic Site,Hookah Bar,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indonesian Restaurant,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kebab Restaurant,Kids Store,Korean Restaurant,Lebanese Restaurant,Lounge,Market,Martial Arts Dojo,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Moroccan Restaurant,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Nightclub,Office,Optical Shop,Organic Grocery,Other Repair Shop,Outdoor Sculpture,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Peruvian Restaurant,Pharmacy,Pizza Place,Playground,Plaza,Poke Place,Pool,Pub,Public Art,Ramen Restaurant,Rental Car Location,Resort,Restaurant,Russian Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Seafood Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Smoke Shop,Snack Place,Soccer Field,Soup Place,Southwestern French Restaurant,Souvenir Shop,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Stables,Steakhouse,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Tapas Restaurant,Tea Room,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Toll Plaza,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Mercure Paris Opera Garnier Hotel,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Mercure Paris Opera Garnier Hotel,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Mercure Paris Opera Garnier Hotel,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Mercure Paris Opera Garnier Hotel,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Mercure Paris Opera Garnier Hotel,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [18]:
# Examine the new dataframe size
hotels_onehot.shape

(1133, 203)
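
The shape `(1133, 203)` has exactly as many columns as there are unique venue categories, which suggests that no separate column was added for the hotel name. One of the venue categories is itself called `Hotel`, so assigning to `hotels_onehot['Hotel']` appears to replace that indicator column rather than append a new one. The plain-Python model below illustrates the effect with dictionaries standing in for dataframe rows; the helper `one_hot` and the key `Hotel_Name` are hypothetical names for this sketch.

```python
def one_hot(categories, all_categories):
    """Return a dict with one 0/1 indicator per category, like a one-hot row."""
    return {c: int(c in categories) for c in all_categories}


all_categories = ["French Restaurant", "Hotel", "Yoga Studio"]

# A venue of category "Hotel" found near a hotel.
row = one_hot({"Hotel"}, all_categories)
n_columns_before = len(row)

# Same key as the venue category: the indicator is overwritten, no column is added.
row["Hotel"] = "Mercure Paris Opera Garnier Hotel"
n_columns_after_collision = len(row)

# A distinct key keeps both the indicator and the hotel name.
safe_row = one_hot({"Hotel"}, all_categories)
safe_row["Hotel_Name"] = "Mercure Paris Opera Garnier Hotel"

print(n_columns_before, n_columns_after_collision, len(safe_row))
```

In the dataframe, using a column name that cannot clash with a venue category would keep the `Hotel` category available as a feature for the later clustering step.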

In [19]:
# Group rows by hotel and take the mean of each one-hot column, i.e. the frequency of occurrence of each category
hotels_grouped = hotels_onehot.groupby('Hotel').mean().reset_index()
hotels_grouped

Unnamed: 0,Hotel,Yoga Studio,African Restaurant,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Bagel Shop,Bakery,Bar,Basketball Court,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Belgian Restaurant,Big Box Store,Bike Rental / Bike Share,Bistro,Bookstore,Boutique,Bowling Alley,Brasserie,Breakfast Spot,Breton Restaurant,Brewery,Bridge,Burger Joint,Bus Station,Bus Stop,Cafeteria,Café,Cajun / Creole Restaurant,Cambodian Restaurant,Camera Store,Candy Store,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Creperie,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Disc Golf,Dive Bar,Donut Shop,Duty-free Shop,EV Charging Station,Electronics Store,Entertainment Service,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Film Studio,Fish & Chips Shop,Food & Drink Shop,Food Truck,Franconian Restaurant,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Gas Station,Gastropub,German Restaurant,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Government Building,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hawaiian Restaurant,Health Food Store,Historic Site,Hookah Bar,Hostel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indonesian Restaurant,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kebab Restaurant,Kids Store,Korean Restaurant,Lebanese Restaurant,Lounge,Market,Martial Arts Dojo,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Moroccan Restaurant,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Nightclub,Office,Optical Shop,Organic Grocery,Other Repair Shop,Outdoor Sculpture,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Peruvian Restaurant,Pharmacy,Pizza Place,Playground,Plaza,Poke Place,Pool,Pub,Public Art,Ramen Restaurant,Rental Car Location,Resort,Restaurant,Russian Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Seafood Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Smoke Shop,Snack Place,Soccer Field,Soup Place,Southwestern French Restaurant,Souvenir Shop,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Stables,Steakhouse,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Tapas Restaurant,Tea Room,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Toll Plaza,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store
0,Aparthotel Adagio access Paris La Villette,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Hôtel L'Échiquier Opéra Paris - MGallery by So...,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.05,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.15,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.05,0.0,0.0,0.0,0.03,0.05,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.02,0.0
2,Hôtel Le Louis Versailles Château - MGallery b...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.042553,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.06383,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.212766,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.021277,0.0,0.0,0.0,0.021277,0.021277,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.06383,0.0,0.0,0.085106,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.021277,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.021277,0.0
3,Hôtel Mercure Paris Ouest Saint Germain,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0
4,Hôtel Stendhal Place Vendôme Paris - MGallery ...,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.04,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.06,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.1,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.04,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.01,0.01,0.0,0.03,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.05,0.01,0.01
5,Mercure Paris Bercy Bibliothèque Hotel,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.163265,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.081633,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.061224,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040816,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.061224,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.244898,0.0,0.0,0.0
6,Mercure Paris Gare Montparnasse Hotel,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.266667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.033333,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Mercure Paris Gare de Lyon TGV hotel,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.019231,0.057692,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.019231,0.0,0.019231,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.019231,0.019231,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.019231,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.096154,0.019231,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.019231,0.0,0.0,0.019231,0.0,0.0,0.019231,0.0,0.019231,0.019231,0.0,0.0
8,Mercure Paris Montmartre Sacré-Coeur Hotel,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014925,0.029851,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.059701,0.104478,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.044776,0.029851,0.0,0.0,0.014925,0.0,0.0,0.014925,0.0,0.0,0.0,0.014925,0.0,0.059701,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.044776,0.0,0.014925,0.0,0.044776,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014925,0.014925,0.0,0.0,0.0,0.0,0.0,0.059701,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029851,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.014925,0.014925,0.029851,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029851,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0
9,Mercure Paris Opera Garnier Hotel,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.03,0.03,0.01,0.01,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.13,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.03,0.03,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


This table shows which venue categories are available near each hotel and how frequent each one is. The variety is large: local and foreign restaurants (French, African, American, Argentinian, Asian, Belgian, Chinese, Hawaiian and so on), airport services, art galleries, art museums, bus and train stations, shops, churches, comedy clubs, grocery stores, markets, jazz clubs, medical centers, museums, nightclubs, parks, theaters and many others.
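Each value in the table appears to be a share: the number of venues of one category near a hotel divided by the total number of venues found near that hotel. A value such as 0.020408 is consistent with one venue out of 49 (1/49 ≈ 0.020408). The minimal sketch below shows one common way to build such a table with one-hot encoding and a grouped mean. The hotel names and venue rows are hypothetical, and the sketch is an illustration under that assumption rather than the exact code used earlier in this notebook.

```python
import pandas as pd

# Hypothetical venue list: one row per (hotel, venue category) pair.
venues = pd.DataFrame({
    "Hotel": ["Hotel A", "Hotel A", "Hotel A", "Hotel A", "Hotel B", "Hotel B"],
    "Venue Category": ["Bakery", "Café", "Café", "Park", "Bakery", "Pub"],
})

# One-hot encode the categories; dtype=float keeps the indicator columns numeric.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Hotel", venues["Hotel"])

# The mean of each indicator column per hotel is that category's share.
freq = onehot.groupby("Hotel").mean().reset_index()
print(freq)
```

Because every venue contributes exactly one indicator, the shares in each row add up to 1, which is a quick sanity check for a table of this kind.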

In [20]:
# Confirm the new size
hotels_grouped.shape

(36, 203)

In [21]:
# Print each hotel along with the top 5 most common venues
num_top_venues = 5

for Code in hotels_grouped['Hotel']:
    print("Hotel:", Code)
    temp = hotels_grouped[hotels_grouped['Hotel'] == Code].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

Hotel: Aparthotel Adagio access Paris La Villette
               venue  freq
0  French Restaurant  0.12
1             Bakery  0.08
2               Café  0.08
3                Bar  0.08
4      Grocery Store  0.04


Hotel: Hôtel L'Échiquier Opéra Paris - MGallery by Sofitel
                 venue  freq
0    French Restaurant  0.15
1               Bistro  0.05
2                 Café  0.05
3  Japanese Restaurant  0.05
4                  Bar  0.05


Hotel: Hôtel Le Louis Versailles Château - MGallery by Sofitel
               venue  freq
0  French Restaurant  0.21
1                Pub  0.09
2              Plaza  0.06
3           Creperie  0.06
4             Bakery  0.04


Hotel: Hôtel Mercure Paris Ouest Saint Germain
                  venue  freq
0          Soccer Field   0.2
1         Train Station   0.2
2           Supermarket   0.2
3  Gym / Fitness Center   0.2
4        Tennis Stadium   0.2


Hotel: Hôtel Stendhal Place Vendôme Paris - MGallery by Sofitel
               venue  freq
0  F

For each hotel we can see the five most common venue categories. Hotels in the central area are surrounded mainly by local and foreign restaurants, bistros, cafés, bars, bakeries, stores, plazas and lounges, and they are close to the biggest tourist attractions: the Eiffel Tower, the Louvre Museum, Notre-Dame de Paris, the Panthéon and the Arc de Triomphe.

In [22]:
# Helper: return a hotel's venue categories sorted by descending frequency
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [23]:
# Create the new dataframe and display the top 10 venues for each hotel
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Hotel']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

In [24]:
# Show for each hotel the most common venue
venues_sorted = pd.DataFrame(columns=columns)
venues_sorted['Hotel'] = hotels_grouped['Hotel']

for ind in np.arange(hotels_grouped.shape[0]):
    venues_sorted.iloc[ind, 1:] = return_most_common_venues(hotels_grouped.iloc[ind, :], num_top_venues)

venues_sorted.head()

Unnamed: 0,Hotel,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Aparthotel Adagio access Paris La Villette,French Restaurant,Café,Bar,Bakery,Indian Restaurant,Sandwich Place,Supermarket,Restaurant,Dessert Shop,Food Truck
1,Hôtel L'Échiquier Opéra Paris - MGallery by So...,French Restaurant,Indian Restaurant,Café,Japanese Restaurant,Bar,Bistro,Italian Restaurant,Burger Joint,Restaurant,Train Station
2,Hôtel Le Louis Versailles Château - MGallery b...,French Restaurant,Pub,Plaza,Creperie,Pizza Place,Bakery,Coffee Shop,Bar,Burger Joint,Sandwich Place
3,Hôtel Mercure Paris Ouest Saint Germain,Soccer Field,Train Station,Gym / Fitness Center,Supermarket,Tennis Stadium,Women's Store,Disc Golf,Falafel Restaurant,Exhibit,Entertainment Service
4,Hôtel Stendhal Place Vendôme Paris - MGallery ...,French Restaurant,Cocktail Bar,Wine Bar,Bistro,Italian Restaurant,Bakery,Thai Restaurant,Pizza Place,Beer Bar,Boutique


The table lists the 10 most common venue categories for each hotel, which makes the hotels easy to compare at a glance.
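One caveat when reading this table: a hotel with fewer than 10 venue categories nearby still gets 10 entries, and the remaining slots are filled with categories whose frequency is zero. The sketch below is a possible vectorised alternative to the row-by-row loop that also masks those zero-frequency slots. The small `grouped` table and the `"-"` placeholder are hypothetical choices made for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical frequency table with the same layout as hotels_grouped.
grouped = pd.DataFrame({
    "Hotel": ["Hotel A", "Hotel B"],
    "Bakery": [0.25, 0.50],
    "Café": [0.50, 0.00],
    "Park": [0.25, 0.10],
    "Pub": [0.00, 0.40],
})

top_n = 4
categories = grouped.columns[1:].to_numpy()
values = grouped.iloc[:, 1:].to_numpy()

# Sort each row by descending frequency; a stable sort keeps ties in column order.
order = np.argsort(-values, axis=1, kind="stable")[:, :top_n]
top_values = np.take_along_axis(values, order, axis=1)

# Show a placeholder instead of a category name when the frequency is zero.
names = np.where(top_values > 0, categories[order], "-")

top = pd.DataFrame(names, columns=[f"Top {i + 1}" for i in range(top_n)])
top.insert(0, "Hotel", grouped["Hotel"])
print(top)
```

In the notebook output, Hôtel Mercure Paris Ouest Saint Germain has five venue categories at 0.2 each, so positions 6 to 10 in its row can only be zero-frequency categories. Masking them avoids reading those entries as real nearby venues.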

### 3. Data Modeling

In [25]:
# Run k-means to cluster the hotels into 7 clusters
# Set number of clusters
kclusters = 7
hotels_grouped_clustering = hotels_grouped.drop(columns='Hotel')

# Run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(hotels_grouped_clustering)

# Check cluster labels generated for each row in the dataframe
kmeans.labels_[0:50]

array([6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 4, 6, 6, 6, 6, 6, 6, 6, 6, 1, 6,
       3, 6, 6, 6, 6, 2, 6, 0, 5, 0, 0, 4, 6], dtype=int32)
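Before interpreting the clusters, it helps to check how many hotels fall into each one. The short sketch below counts the labels copied from the output above. It uses only the standard library and assumes nothing beyond that printed array.

```python
from collections import Counter

# Cluster labels copied from the kmeans.labels_ output above.
labels = [6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 4, 6, 6, 6, 6, 6, 6, 6, 6, 1, 6,
          3, 6, 6, 6, 6, 2, 6, 0, 5, 0, 0, 4, 6]

sizes = Counter(labels)
for cluster in sorted(sizes):
    print(f"cluster {cluster}: {sizes[cluster]} hotel(s)")
```

With these labels, cluster 6 holds 27 of the 36 clustered hotels, while clusters 1, 2, 3 and 5 contain a single hotel each. The split is therefore very uneven, which is worth keeping in mind when comparing clusters.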

In [26]:
# Add labels
venues_sorted.insert(0,'Cluster Labels', kmeans.labels_)
hotels_merged = Hotels
hotels_merged.rename(columns={'Hotel_Name': 'Hotel'}, inplace=True)

In [27]:
# merge venues_sorted with the hotel table to add the cluster label and top venues for each hotel
hotels_merged = hotels_merged.join(venues_sorted.set_index('Hotel'), on='Hotel')
hotels_merged.head()

Unnamed: 0,PostalCode,Hotel,Address,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,75008,Mercure Paris Opera Garnier Hotel,4 rue de l 'Isly,48.873601,2.307613,6.0,French Restaurant,Café,Salad Place,Clothing Store,Cosmetics Shop,Bakery,Italian Restaurant,Japanese Restaurant,Cocktail Bar,Art Gallery
1,75009,Scribe Paris Opéra by Sofitel,1 rue Scribe,48.881372,2.33357,6.0,French Restaurant,Cocktail Bar,Bar,Italian Restaurant,Plaza,Lounge,Japanese Restaurant,Comedy Club,Music Venue,Theater
2,75017,Mercure Paris St Lazare Monceau hotel,99 bis Rue de Rome,48.884813,2.302839,6.0,French Restaurant,Bakery,Italian Restaurant,Bistro,Plaza,Restaurant,Japanese Restaurant,Supermarket,Asian Restaurant,Bar
3,75002,Hôtel Stendhal Place Vendôme Paris - MGallery ...,22 rue Danielle Casanova,48.867317,2.344443,6.0,French Restaurant,Cocktail Bar,Wine Bar,Bistro,Italian Restaurant,Bakery,Thai Restaurant,Pizza Place,Beer Bar,Boutique
4,75018,Mercure Paris Montmartre Sacré-Coeur Hotel,3 rue Caulaincourt,48.886698,2.35266,6.0,Bar,Bakery,Café,French Restaurant,Coffee Shop,Bistro,Convenience Store,Restaurant,Bookstore,Art Gallery


In [28]:
hotels_merged = hotels_merged.fillna(0)
hotels_merged['Cluster Labels']=hotels_merged['Cluster Labels'].astype(int)
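Note that `fillna(0)` gives hotels with no venue data the label 0, so in the tables that follow they appear alongside the hotels that k-means actually assigned to cluster 0. A possible alternative, sketched below on hypothetical data, is to mark such hotels with a sentinel label. The value `-1`, the text `"No venues found"` and the three example hotels are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the Hotels and venues_sorted tables.
hotels = pd.DataFrame({"Hotel": ["Hotel A", "Hotel B", "Hotel C"]})
clustered = pd.DataFrame({
    "Hotel": ["Hotel A", "Hotel C"],
    "Cluster Labels": [6, 0],
    "1st Most Common Venue": ["Café", "Gas Station"],
})

# Left join: Hotel B has no venue data, so its new columns are NaN.
merged = hotels.join(clustered.set_index("Hotel"), on="Hotel")

# -1 marks hotels that k-means never saw, so they cannot be mistaken for cluster 0.
merged["Cluster Labels"] = merged["Cluster Labels"].fillna(-1).astype(int)
venue_cols = [c for c in merged.columns if c.endswith("Most Common Venue")]
merged[venue_cols] = merged[venue_cols].fillna("No venues found")
print(merged)
```

If this approach were adopted, the map cell would need its own colour for the `-1` label, since it indexes the colour list by cluster number.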

In [29]:
# Visualize the resulting clusters
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(hotels_merged['Latitude'], hotels_merged['Longitude'], hotels_merged['Hotel'], hotels_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

The map shows how the hotels are grouped into clusters. Hotels that share common venue types and are close to many tourist attractions are grouped together, separately from hotels located in less crowded areas with fewer facilities and venues.

In [30]:
#  Examine each cluster
# 1st cluster
hotels_merged.loc[hotels_merged['Cluster Labels'] == 0, hotels_merged.columns[[1] + list(range(5, hotels_merged.shape[1]))]]

Unnamed: 0,Hotel,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,ibis budget Chambourcy Saint Germain,0,American Restaurant,Deli / Bodega,Medical Center,Café,Women's Store,Donut Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Exhibit
19,Novotel Paris Orly Rungis,0,0,0,0,0,0,0,0,0,0,0
28,ibis budget Orly Chevilly Tram 7,0,Disc Golf,Dive Bar,Soccer Field,Park,Café,BBQ Joint,Farmers Market,Falafel Restaurant,Exhibit,Entertainment Service
32,Hotel Mercure Paris CDG Airport & Convention,0,0,0,0,0,0,0,0,0,0,0
34,ibis budget Roissy CDG Paris Nord 2,0,0,0,0,0,0,0,0,0,0,0
35,ibis Styles Parc des Expositions de Villepinte,0,American Restaurant,Café,Pizza Place,Sports Bar,Grocery Store,Disc Golf,Farmers Market,Falafel Restaurant,Exhibit,Entertainment Service
36,ibis Villepinte Parc Expos,0,0,0,0,0,0,0,0,0,0,0


In [31]:
# 2nd cluster
hotels_merged.loc[hotels_merged['Cluster Labels'] == 1, hotels_merged.columns[[1] + list(range(5, hotels_merged.shape[1]))]]

Unnamed: 0,Hotel,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
37,bis budget Goussainville CDG,1,Construction & Landscaping,Women's Store,Dive Bar,Film Studio,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Exhibit,Entertainment Service,Electronics Store


In [32]:
# 3rd cluster
hotels_merged.loc[hotels_merged['Cluster Labels'] == 2, hotels_merged.columns[[1] + list(range(5, hotels_merged.shape[1]))]]

Unnamed: 0,Hotel,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,ibis Poissy,2,Gas Station,Women's Store,Dive Bar,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Exhibit,Entertainment Service,Electronics Store,EV Charging Station


In [33]:
# 4th cluster
hotels_merged.loc[hotels_merged['Cluster Labels'] == 3, hotels_merged.columns[[1] + list(range(5, hotels_merged.shape[1]))]]

Unnamed: 0,Hotel,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
39,hotelF1 Aulnay Garonor A1,3,Pool,Chinese Restaurant,Diner,Farmers Market,Falafel Restaurant,Exhibit,Entertainment Service,Electronics Store,EV Charging Station,Duty-free Shop


In [34]:
# 5th cluster
hotels_merged.loc[hotels_merged['Cluster Labels'] == 4, hotels_merged.columns[[1] + list(range(5, hotels_merged.shape[1]))]]

Unnamed: 0,Hotel,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
27,ibis budget Villemomble,4,Fast Food Restaurant,Chinese Restaurant,Gas Station,Tailor Shop,Dive Bar,Farmers Market,Falafel Restaurant,Exhibit,Entertainment Service,Electronics Store
30,Novotel Paris Charles-de-Gaulle Airport,4,Gas Station,Other Repair Shop,Auto Workshop,Women's Store,Film Studio,Farmers Market,Falafel Restaurant,Exhibit,Entertainment Service,Electronics Store


In [35]:
# 6th cluster
hotels_merged.loc[hotels_merged['Cluster Labels'] == 5, hotels_merged.columns[[1] + list(range(5, hotels_merged.shape[1]))]]

Unnamed: 0,Hotel,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
26,ibis Styles Paris Val de Fontenay,5,Pizza Place,Pub,Miscellaneous Shop,Bowling Alley,Restaurant,Park,Disc Golf,Exhibit,Entertainment Service,Electronics Store


In [36]:
# 7th cluster
hotels_merged.loc[hotels_merged['Cluster Labels'] == 6, hotels_merged.columns[[1] + list(range(5, hotels_merged.shape[1]))]]

Unnamed: 0,Hotel,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Mercure Paris Opera Garnier Hotel,6,French Restaurant,Café,Salad Place,Clothing Store,Cosmetics Shop,Bakery,Italian Restaurant,Japanese Restaurant,Cocktail Bar,Art Gallery
1,Scribe Paris Opéra by Sofitel,6,French Restaurant,Cocktail Bar,Bar,Italian Restaurant,Plaza,Lounge,Japanese Restaurant,Comedy Club,Music Venue,Theater
2,Mercure Paris St Lazare Monceau hotel,6,French Restaurant,Bakery,Italian Restaurant,Bistro,Plaza,Restaurant,Japanese Restaurant,Supermarket,Asian Restaurant,Bar
3,Hôtel Stendhal Place Vendôme Paris - MGallery ...,6,French Restaurant,Cocktail Bar,Wine Bar,Bistro,Italian Restaurant,Bakery,Thai Restaurant,Pizza Place,Beer Bar,Boutique
4,Mercure Paris Montmartre Sacré-Coeur Hotel,6,Bar,Bakery,Café,French Restaurant,Coffee Shop,Bistro,Convenience Store,Restaurant,Bookstore,Art Gallery
5,Hôtel L'Échiquier Opéra Paris - MGallery by So...,6,French Restaurant,Indian Restaurant,Café,Japanese Restaurant,Bar,Bistro,Italian Restaurant,Burger Joint,Restaurant,Train Station
6,ibis Paris Tour Montparnasse 15th,6,Italian Restaurant,French Restaurant,Bistro,Lebanese Restaurant,Restaurant,Supermarket,Bakery,Indian Restaurant,Park,Coffee Shop
7,Mercure Paris Gare Montparnasse Hotel,6,French Restaurant,Pizza Place,Fast Food Restaurant,Bakery,Steakhouse,Supermarket,Sushi Restaurant,Szechuan Restaurant,Brasserie,Food & Drink Shop
8,Mercure Paris Gare de Lyon TGV hotel,6,Supermarket,Bistro,Japanese Restaurant,Bar,Chinese Restaurant,Asian Restaurant,Sandwich Place,Soup Place,Beer Garden,Organic Grocery
9,Mercure Paris Bercy Bibliothèque Hotel,6,Vietnamese Restaurant,Asian Restaurant,Chinese Restaurant,French Restaurant,Thai Restaurant,Japanese Restaurant,Coffee Shop,Metro Station,Sandwich Place,Dessert Shop


## IV. Result Section

Using the latitude and longitude values, we created a map of Paris and nearby areas that shows the position of every hotel in the dataset, located by postal code and address. To broaden the analysis, we also included hotels near Paris, which have different surrounding facilities.

We then examined which venues are available near each hotel. Hotels in the center of Paris (e.g. Hôtel L'Échiquier Opéra Paris - MGallery by Sofitel, Hôtel Stendhal Place Vendôme Paris - MGallery by Sofitel, Mercure Paris Opera Garnier Hotel, Novotel Paris Les Halles, Scribe Paris Opéra by Sofitel) have the most facilities, compared with hotels located farther from the center (e.g. ibis Poissy, bis budget Goussainville CDG, Hôtel Mercure Paris Ouest Saint Germain, Novotel Paris Charles-de-Gaulle Airport). The variety of facilities is large: local and foreign restaurants (French, African, American, Argentinian, Asian, Belgian, Chinese, Hawaiian and so on), airport services, art galleries, art museums, bus and train stations, shops, churches, comedy clubs, grocery stores, markets, jazz clubs, medical centers, museums, nightclubs, parks, theaters and many others.
Hotels in the central area are surrounded mainly by local and foreign restaurants, bistros, cafés, bars, bakeries, stores, plazas and lounges, and they are close to the biggest tourist attractions: the Eiffel Tower, the Louvre Museum, Notre-Dame de Paris, the Panthéon and the Arc de Triomphe.

The map and the last seven tables show how the hotels are grouped into clusters by similarity. Hotels with common venue types and many nearby tourist attractions are grouped together, separately from hotels located in less crowded areas with fewer facilities and venues. The last cluster (cluster 6) is the largest: it contains the most hotels, and these hotels have the largest number of facilities and benefit from many tourist attractions. They are also the most expensive ones. The price of accommodation differs depending on the facilities and surroundings: the closer the hotel is to the tourist attractions, the more expensive it will be.


## V. Discussion Section 

You will find a huge variety of hotel choices when planning a visit to France. The cost of accommodation varies according to region and type. While hotel rooms are inevitably more expensive in Paris, there are also many interesting hotels near Paris.

It is interesting to note that, although the hotels in the center of Paris (e.g. Hôtel L'Échiquier Opéra Paris - MGallery by Sofitel, Hôtel Stendhal Place Vendôme Paris - MGallery by Sofitel, Mercure Paris Opera Garnier Hotel, Novotel Paris Les Halles, Scribe Paris Opéra by Sofitel) might be considered very expensive because of all the surrounding facilities (local and foreign restaurants (French, African, American, Argentinian, Asian, Belgian, Chinese, Hawaiian and so on), airport services, art galleries, art museums, bus and train stations, shops, churches, comedy clubs, grocery stores, markets, medical centers, museums, nightclubs, parks, theaters), hotels near the center also have a wide range of facilities and tourist attractions nearby: Parc des Expositions, Roland Garros, the Palace of Versailles, Stade de France, Parc de la Villette and so on.

Although all the clusters offer a good range of facilities, I found two main patterns. The first pattern, clusters 0, 1, 2, 3, 4 and 5, covers hotels that have fewer facilities and are located in less crowded areas. The second pattern, cluster 6, covers hotels that have the largest number of facilities and benefit from many tourist attractions.

## VI. Conclusion Section 

To solve this business problem, we clustered hotels in Paris and nearby areas in order to recommend venues and to help people who are looking for the right hotel make the best decision.

We gathered data from https://www.accorhotels.com and Foursquare. We explored and targeted recommended locations across different venues, extracted and analyzed the data to discover hidden insights, and finally used the k-means clustering technique to cluster hotels in Paris and nearby areas to see how they group together, with the final purpose of providing useful information about accommodation strategies.

In conclusion, Paris is one of the most popular destinations in Europe and beyond. It offers a wide range of attractions: food, fashion, culture, nature, art and so on. Anyone who wants to travel has many things to consider, from choosing the right location, accommodation, flights and rental cars to attractions, restaurants, stores and other facilities. The price of accommodation depends on the number of venues and surrounding facilities: the closer the hotel is to the tourist attractions, the more expensive it will be. However, the analysis shows that there are very good hotels a short distance from the center and that it is not necessary to stay in the city center to visit the most beautiful attractions.
