### Hotel Recommendation System 

#### Problem Statement

Investigate how customer reviews reflect hotel satisfaction and identify the key factors that influence customer sentiment, so that hotels can improve the guest experience and make informed business decisions.

**Objectives**

Data Preparation: Preprocess and clean the hotel reviews dataset to ensure data quality and remove any irrelevant or redundant information.

Sentiment Analysis: Perform sentiment analysis on the customer reviews to determine the overall sentiment expressed by customers, such as positive, negative, or neutral.

Identify Key Factors: Analyze the customer reviews and extract the key factors that influence customer sentiment and satisfaction. These factors could include aspects like cleanliness, service quality, amenities, location, pricing, and staff behavior.
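
As a rough illustration of the key-factor objective, the sketch below counts how many reviews mention each aspect at least once. The keyword map `ASPECT_KEYWORDS` and the function `count_aspects` are hypothetical names introduced here, the sample reviews are made up, and tokenization is plain whitespace splitting, so this is a starting point rather than the method used in this notebook.

```python
from collections import Counter

# Hypothetical aspect -> keyword map; a real project would curate or learn this.
ASPECT_KEYWORDS = {
    "cleanliness": {"clean", "dirty", "spotless"},
    "staff": {"staff", "reception", "friendly", "rude"},
    "location": {"location", "central", "metro"},
}

def count_aspects(reviews):
    """Count how many reviews mention each aspect at least once."""
    counts = Counter()
    for text in reviews:
        tokens = set(text.lower().split())
        for aspect, keywords in ASPECT_KEYWORDS.items():
            if tokens & keywords:
                counts[aspect] += 1
    return counts

# Made-up negative reviews, including the 'No Negative' placeholder.
negative = ["My room was dirty", "Rude staff at reception", "No Negative"]
print(count_aspects(negative))
```

Running the same count over positive and negative review text separately would show which aspects are mostly praised and which are mostly criticized.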

The project is needed because customer review data can reveal the factors that influence hotel satisfaction, helping hotels improve guest experiences and make informed decisions in the competitive hospitality industry.

### Success Metrics

Customer Satisfaction Score: Measure the overall customer satisfaction based on sentiment analysis of hotel reviews. This can be calculated as the percentage of positive reviews out of the total number of reviews.

Key Factors Identification: Evaluate the effectiveness of identifying key factors influencing customer sentiment by comparing the identified factors with expert knowledge or industry benchmarks. Measure the precision and recall of the identified factors.
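
Both metrics above reduce to simple ratios. The sketch below shows one way they could be computed, assuming sentiment labels are already available as strings and that a reference list of factors exists. The function names, labels, and factor sets are illustrative assumptions, not results from this dataset.

```python
def satisfaction_score(labels):
    """Percentage of reviews labelled 'positive'."""
    if not labels:
        return 0.0
    positives = sum(1 for label in labels if label == "positive")
    return 100.0 * positives / len(labels)

def precision_recall(identified, reference):
    """Precision and recall of identified factors against a reference set."""
    identified, reference = set(identified), set(reference)
    hits = len(identified & reference)
    precision = hits / len(identified) if identified else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall

# Made-up labels: 2 positive reviews out of 4.
labels = ["positive", "negative", "positive", "neutral"]
print(satisfaction_score(labels))

# Made-up factor sets: 2 of 3 identified factors are in the reference list.
identified = {"cleanliness", "location", "wifi"}
reference = {"cleanliness", "location", "staff", "pricing"}
print(precision_recall(identified, reference))
```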

## Data Understanding

This dataset contains about 515,000 customer reviews and scores for 1,493 luxury hotels across Europe. The geographical location of each hotel is also provided for further analysis.

**Data Content**
The CSV file contains 17 fields, described below:

Hotel_Address: Address of hotel.

Review_Date: Date when reviewer posted the corresponding review.

Average_Score: Average score of the hotel, calculated from the latest comments in the last year.

Hotel_Name: Name of Hotel

Reviewer_Nationality: Nationality of Reviewer

Negative_Review: Negative review the reviewer gave to the hotel. If the reviewer did not give a negative review, the value is 'No Negative'.

Review_Total_Negative_Word_Counts: Total number of words in the negative review.

Positive_Review: Positive review the reviewer gave to the hotel. If the reviewer did not give a positive review, the value is 'No Positive'.

Review_Total_Positive_Word_Counts: Total number of words in the positive review.

Reviewer_Score: Score the reviewer gave to the hotel, based on their experience.

Total_Number_of_Reviews_Reviewer_Has_Given: Number of reviews the reviewer has given in the past.

Total_Number_of_Reviews: Total number of valid reviews the hotel has.

Tags: Tags reviewer gave the hotel.

days_since_review: Duration between the review date and scrape date.

Additional_Number_of_Scoring: Some guests gave only a score without writing a review. This number indicates how many such valid scores the hotel has.

lat: Latitude of the hotel

lng: Longitude of the hotel
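
Two of these fields are stored as text that needs parsing before it can be used numerically or as a list: `Tags` is a string that looks like a Python list, and `days_since_review` is a string such as `10 days`. The sketch below shows one possible way to parse them on two made-up rows. The new column names `tag_list` and `days` are assumptions introduced for illustration.

```python
import ast
import pandas as pd

# Two made-up rows in the same shape as the dataset's raw fields.
sample = pd.DataFrame({
    "Tags": ["[' Leisure trip ', ' Couple ', ' Stayed 2 nights ']",
             "[' Business trip ', ' Solo traveler ']"],
    "days_since_review": ["0 days", "10 days"],
})

# Tags is a string that looks like a Python list; literal_eval parses it safely.
sample["tag_list"] = sample["Tags"].apply(
    lambda s: [t.strip() for t in ast.literal_eval(s)]
)

# Keep only the leading integer of strings such as "10 days".
sample["days"] = (
    sample["days_since_review"].str.extract(r"(\d+)", expand=False).astype(int)
)

print(sample[["tag_list", "days"]])
```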

## Loading the Data

In [1]:
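# Import libraries
import numpy as np
import pandas as pd
import matplotlib

import nltk
from nltk import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# Download the NLTK resources used by the recommender (only needed once);
# recent NLTK releases may also need 'punkt_tab'
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')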

In [2]:
# Read the data
data= pd.read_csv('Hotel_Reviews.csv')
data.head()

Unnamed: 0,Hotel_Address,Additional_Number_of_Scoring,Review_Date,Average_Score,Hotel_Name,Reviewer_Nationality,Negative_Review,Review_Total_Negative_Word_Counts,Total_Number_of_Reviews,Positive_Review,Review_Total_Positive_Word_Counts,Total_Number_of_Reviews_Reviewer_Has_Given,Reviewer_Score,Tags,days_since_review,lat,lng
0,s Gravesandestraat 55 Oost 1092 AA Amsterdam ...,194,8/3/2017,7.7,Hotel Arena,Russia,I am so angry that i made this post available...,397,1403,Only the park outside of the hotel was beauti...,11,7,2.9,"[' Leisure trip ', ' Couple ', ' Duplex Double...",0 days,52.360576,4.915968
1,s Gravesandestraat 55 Oost 1092 AA Amsterdam ...,194,8/3/2017,7.7,Hotel Arena,Ireland,No Negative,0,1403,No real complaints the hotel was great great ...,105,7,7.5,"[' Leisure trip ', ' Couple ', ' Duplex Double...",0 days,52.360576,4.915968
2,s Gravesandestraat 55 Oost 1092 AA Amsterdam ...,194,7/31/2017,7.7,Hotel Arena,Australia,Rooms are nice but for elderly a bit difficul...,42,1403,Location was good and staff were ok It is cut...,21,9,7.1,"[' Leisure trip ', ' Family with young childre...",3 days,52.360576,4.915968
3,s Gravesandestraat 55 Oost 1092 AA Amsterdam ...,194,7/31/2017,7.7,Hotel Arena,United Kingdom,My room was dirty and I was afraid to walk ba...,210,1403,Great location in nice surroundings the bar a...,26,1,3.8,"[' Leisure trip ', ' Solo traveler ', ' Duplex...",3 days,52.360576,4.915968
4,s Gravesandestraat 55 Oost 1092 AA Amsterdam ...,194,7/24/2017,7.7,Hotel Arena,New Zealand,You When I booked with your company on line y...,140,1403,Amazing location and building Romantic setting,8,3,6.7,"[' Leisure trip ', ' Couple ', ' Suite ', ' St...",10 days,52.360576,4.915968


In [3]:
#Rows and columns of the dataframe
data.shape

(515738, 17)

In [4]:
# summary of the dataframe
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 515738 entries, 0 to 515737
Data columns (total 17 columns):
 #   Column                                      Non-Null Count   Dtype  
---  ------                                      --------------   -----  
 0   Hotel_Address                               515738 non-null  object 
 1   Additional_Number_of_Scoring                515738 non-null  int64  
 2   Review_Date                                 515738 non-null  object 
 3   Average_Score                               515738 non-null  float64
 4   Hotel_Name                                  515738 non-null  object 
 5   Reviewer_Nationality                        515738 non-null  object 
 6   Negative_Review                             515738 non-null  object 
 7   Review_Total_Negative_Word_Counts           515738 non-null  int64  
 8   Total_Number_of_Reviews                     515738 non-null  int64  
 9   Positive_Review                             515738 non-null  object 
 

In [5]:
#Total number of the null values
data.isna().sum()

Hotel_Address                                    0
Additional_Number_of_Scoring                     0
Review_Date                                      0
Average_Score                                    0
Hotel_Name                                       0
Reviewer_Nationality                             0
Negative_Review                                  0
Review_Total_Negative_Word_Counts                0
Total_Number_of_Reviews                          0
Positive_Review                                  0
Review_Total_Positive_Word_Counts                0
Total_Number_of_Reviews_Reviewer_Has_Given       0
Reviewer_Score                                   0
Tags                                             0
days_since_review                                0
lat                                           3268
lng                                           3268
dtype: int64
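
Only `lat` and `lng` have missing values (3268 rows each). To see which hotels are affected, something like the following could be used. The rows below are made up and only the column names match the dataset, so treat it as a sketch rather than output from this notebook.

```python
import pandas as pd

# Made-up rows; only the column names match the dataset.
sample = pd.DataFrame({
    "Hotel_Name": ["A", "A", "B", "C"],
    "lat": [52.36, 52.36, None, 48.2],
    "lng": [4.91, 4.91, None, 16.37],
})

# Rows where either coordinate is missing.
missing = sample[sample["lat"].isna() | sample["lng"].isna()]

print(missing["Hotel_Name"].nunique(), "hotel(s) without coordinates")
print(missing["Hotel_Name"].value_counts())
```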

In [6]:
print(data.Hotel_Address.head())

0     s Gravesandestraat 55 Oost 1092 AA Amsterdam ...
1     s Gravesandestraat 55 Oost 1092 AA Amsterdam ...
2     s Gravesandestraat 55 Oost 1092 AA Amsterdam ...
3     s Gravesandestraat 55 Oost 1092 AA Amsterdam ...
4     s Gravesandestraat 55 Oost 1092 AA Amsterdam ...
Name: Hotel_Address, dtype: object


In [7]:
# Replace country names with short codes so the country is the last token of the address
# ("UK" is used here rather than the ISO Alpha-2 code "GB")
data.Hotel_Address = data.Hotel_Address.str.replace("Netherlands","NL")
data.Hotel_Address = data.Hotel_Address.str.replace("United Kingdom","UK")
# Note: the pattern "france" is lowercase, so it does not match "France";
# French addresses keep "France" as their last token
data.Hotel_Address = data.Hotel_Address.str.replace("france","FR")
data.Hotel_Address = data.Hotel_Address.str.replace("Spain","ES")
data.Hotel_Address = data.Hotel_Address.str.replace("Italy","IT")
data.Hotel_Address = data.Hotel_Address.str.replace("Austria","AT")

In [8]:
# Split each address on spaces and keep the last token, which is the country,
# and store it in a new "countries" column
data["countries"] = data.Hotel_Address.apply(lambda x: x.split(' ')[-1])

In [9]:
# checking on countries in the new column 
print(data.countries.unique())

['NL' 'UK' 'France' 'ES' 'IT' 'AT']
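
An alternative to chained string replacements is to look up the country name that ends each address in a single mapping. The sketch below is a minimal version of that idea. `COUNTRY_CODES` and `country_code` are hypothetical names, and the sample addresses are illustrative. Note that it yields `FR` for French hotels, whereas the cells in this notebook keep `France`.

```python
# Hypothetical mapping from the country name ending an address to a short code.
COUNTRY_CODES = {
    "Netherlands": "NL",
    "United Kingdom": "UK",
    "France": "FR",
    "Spain": "ES",
    "Italy": "IT",
    "Austria": "AT",
}

def country_code(address):
    """Return the code of the country name the address ends with, or None."""
    cleaned = address.strip()
    for name, code in COUNTRY_CODES.items():
        if cleaned.endswith(name):
            return code
    return None

# Illustrative addresses in the same style as Hotel_Address.
addresses = [
    "s Gravesandestraat 55 Oost 1092 AA Amsterdam Netherlands",
    "1 Example Street London United Kingdom",
    "55 rue Monge 5th arr 75005 Paris France",
]
print([country_code(a) for a in addresses])
```

Returning `None` for an unknown country makes unexpected addresses easy to spot instead of silently producing a wrong token.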


In [10]:
# List all columns to choose the ones required for the recommender
data.columns

Index(['Hotel_Address', 'Additional_Number_of_Scoring', 'Review_Date',
       'Average_Score', 'Hotel_Name', 'Reviewer_Nationality',
       'Negative_Review', 'Review_Total_Negative_Word_Counts',
       'Total_Number_of_Reviews', 'Positive_Review',
       'Review_Total_Positive_Word_Counts',
       'Total_Number_of_Reviews_Reviewer_Has_Given', 'Reviewer_Score', 'Tags',
       'days_since_review', 'lat', 'lng', 'countries'],
      dtype='object')

In [11]:
# Dropping unnecessary columns
data.drop(columns=['Additional_Number_of_Scoring', 'Review_Date',
       'Reviewer_Nationality', 'Negative_Review', 'Review_Total_Negative_Word_Counts',
       'Total_Number_of_Reviews', 'Positive_Review',
       'Review_Total_Positive_Word_Counts',
       'Total_Number_of_Reviews_Reviewer_Has_Given', 'Reviewer_Score',
       'days_since_review', 'lat', 'lng'], inplace=True)


In [12]:
data.head()

Unnamed: 0,Hotel_Address,Average_Score,Hotel_Name,Tags,countries
0,s Gravesandestraat 55 Oost 1092 AA Amsterdam NL,7.7,Hotel Arena,"[' Leisure trip ', ' Couple ', ' Duplex Double...",NL
1,s Gravesandestraat 55 Oost 1092 AA Amsterdam NL,7.7,Hotel Arena,"[' Leisure trip ', ' Couple ', ' Duplex Double...",NL
2,s Gravesandestraat 55 Oost 1092 AA Amsterdam NL,7.7,Hotel Arena,"[' Leisure trip ', ' Family with young childre...",NL
3,s Gravesandestraat 55 Oost 1092 AA Amsterdam NL,7.7,Hotel Arena,"[' Leisure trip ', ' Solo traveler ', ' Duplex...",NL
4,s Gravesandestraat 55 Oost 1092 AA Amsterdam NL,7.7,Hotel Arena,"[' Leisure trip ', ' Couple ', ' Suite ', ' St...",NL


## Preprocessing

In [14]:
import string

# Define the inpute function to remove punctuation
def inpute(tags):
    tags = tags.translate(str.maketrans("", "", string.punctuation))
    return tags

# Apply the inpute function to the "Tags" column in the dataset
data["Tags"] = data["Tags"].apply(inpute)
data.head()



Unnamed: 0,Hotel_Address,Average_Score,Hotel_Name,Tags,countries
0,s Gravesandestraat 55 Oost 1092 AA Amsterdam NL,7.7,Hotel Arena,Leisure trip Couple Duplex Double Room ...,NL
1,s Gravesandestraat 55 Oost 1092 AA Amsterdam NL,7.7,Hotel Arena,Leisure trip Couple Duplex Double Room ...,NL
2,s Gravesandestraat 55 Oost 1092 AA Amsterdam NL,7.7,Hotel Arena,Leisure trip Family with young children D...,NL
3,s Gravesandestraat 55 Oost 1092 AA Amsterdam NL,7.7,Hotel Arena,Leisure trip Solo traveler Duplex Double ...,NL
4,s Gravesandestraat 55 Oost 1092 AA Amsterdam NL,7.7,Hotel Arena,Leisure trip Couple Suite Stayed 2 nigh...,NL
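
To make the effect of the punctuation step concrete, the sketch below applies the same `str.translate` call to a single made-up tag string and then collapses the leftover runs of spaces. The function name `strip_punctuation` is an assumption, and the whitespace normalization is an optional extra that the cell above does not perform.

```python
import string

def strip_punctuation(text):
    """Remove every character listed in string.punctuation."""
    return text.translate(str.maketrans("", "", string.punctuation))

# Made-up value in the same shape as the raw Tags field.
raw = "[' Leisure trip ', ' Couple ', ' Stayed 2 nights ']"
cleaned = strip_punctuation(raw)

# Brackets, quotes and commas are gone, but extra spaces remain.
print(repr(cleaned))

# Optional: collapse repeated whitespace into single spaces.
normalized = " ".join(cleaned.split())
print(normalized)
```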


In [15]:
# change to lower case
data['Tags'] = data['Tags'].str.lower()
data['countries']= data['countries'].str.lower()

In [16]:
# Defining the recommender function
def recommender(location, description):
    # Lowercase the description and split it into word tokens
    description = description.lower()
    description_tokens = word_tokenize(description)

    # English stop words to filter out
    stop_words = set(stopwords.words('english'))

    # Lemmatizer used to reduce words to their base form
    lemm = WordNetLemmatizer()

    # Remove stop words from the description, then lemmatize what remains
    filtered = [word for word in description_tokens if word not in stop_words]
    filtered_set = set(lemm.lemmatize(word) for word in filtered)

    # Keep only the reviews of hotels in the requested country
    country = data[data['countries'] == location.lower()]
    country = country.reset_index(drop=True)

    # Similarity = number of lemmatized words shared by a review's tags and the description
    cos = []
    for i in range(country.shape[0]):
        temp_token = word_tokenize(country.loc[i, 'Tags'])
        temp_set = {word for word in temp_token if word not in stop_words}
        temp2_set = set(lemm.lemmatize(s) for s in temp_set)
        vector = temp2_set.intersection(filtered_set)
        cos.append(len(vector))

    country['similarity'] = cos
    # Keep the best-matching row for each hotel
    country = country.sort_values(by='similarity', ascending=False)
    country.drop_duplicates(subset='Hotel_Name', keep='first', inplace=True)
    # The final ranking is by average score
    country.sort_values('Average_Score', ascending=False, inplace=True)
    country.reset_index(inplace=True, drop=True)
    
    return country[['Hotel_Name', 'Average_Score', 'Hotel_Address']].head(10)
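
In `recommender`, the last sort is by `Average_Score` alone, so the tag overlap only decides which row is kept per hotel and does not affect the final order. If the description should influence the ranking, one option is to sort by overlap first and use the average score as a tie-breaker. The sketch below illustrates that variation on made-up data, with plain whitespace tokenization instead of NLTK. `rank_hotels` and the sample tuples are hypothetical and not part of this notebook.

```python
def rank_hotels(hotels, description, top_n=3):
    """Rank hotels by tag overlap with the description, then by average score."""
    wanted = set(description.lower().split())
    best = {}
    for name, score, tags in hotels:
        overlap = len(wanted & set(tags.lower().split()))
        # Keep the best-matching review row per hotel.
        if name not in best or overlap > best[name][0]:
            best[name] = (overlap, score)
    # Sort by (overlap, score), highest first.
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, overlap, score) for name, (overlap, score) in ranked[:top_n]]

# Made-up (hotel name, average score, cleaned tags) rows.
hotels = [
    ("Hotel A", 9.6, "leisure trip couple"),
    ("Hotel A", 9.6, "business trip solo traveler"),
    ("Hotel B", 9.1, "business trip group"),
    ("Hotel C", 9.8, "leisure trip family"),
]
print(rank_hotels(hotels, "Business trip"))
```

Here the highest-scoring hotel comes last because its tags match only one of the two description words.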
  

In [17]:
# Valid locations: 'NL', 'UK', 'France', 'ES', 'IT', 'AT'
# Enter a location and a trip description
recommender('UK', 'Business trip')

Unnamed: 0,Hotel_Name,Average_Score,Hotel_Address
0,41,9.6,41 Buckingham Palace Road Westminster Borough ...
1,Haymarket Hotel,9.6,1 Suffolk Place Westminster Borough London SW1...
2,Charlotte Street Hotel,9.5,15 17 Charlotte Street Hotel Westminster Borou...
3,Taj 51 Buckingham Gate Suites and Residences,9.5,Buckingham Gate Westminster Borough London SW1...
4,The Soho Hotel,9.5,4 Richmond Mews Westminster Borough London W1D...
5,Milestone Hotel Kensington,9.5,1 Kensington Court Kensington and Chelsea Lond...
6,Ham Yard Hotel,9.5,One Ham Yard Westminster Borough London W1D 7D...
7,The Lanesborough,9.4,Hyde Park Corner Westminster Borough London SW...
8,Lansbury Heritage Hotel,9.4,117 Poplar High Street Tower Hamlets London E1...
9,The Goring,9.4,15 Beeston Place Westminster Borough London SW...


In [18]:

# Enter your input
recommender('France', 'Business trip')

Unnamed: 0,Hotel_Name,Average_Score,Hotel_Address
0,Ritz Paris,9.8,15 Place Vend me 1st arr 75001 Paris France
1,H tel de La Tamise Esprit de France,9.6,4 rue d Alger 1st arr 75001 Paris France
2,Hotel The Peninsula Paris,9.5,19 avenue Kleber 16th arr 75116 Paris France
3,Le Narcisse Blanc Spa,9.5,19 Boulevard De La Tour Maubourg 7th arr 75007...
4,Hotel Monge,9.4,55 rue Monge 5th arr 75005 Paris France
5,Hotel Eiffel Blomet,9.4,78 Rue Blomet 15th arr 75015 Paris France
6,Nolinski Paris,9.4,16 Avenue de l Opera 1st arr 75001 Paris France
7,Goralska R sidences H tel Paris Bastille,9.4,7 Boulevard Bourdon 4th arr 75004 Paris France
8,La Chambre du Marais,9.4,85 87 RUE DES ARCHIVES 3rd arr 75003 Paris France
9,Splendide Royal Paris,9.4,18 Rue du Cirque 8th arr 75008 Paris France


In [19]:

# Enter your input
recommender('ES', 'Business trip')

Unnamed: 0,Hotel_Name,Average_Score,Hotel_Address
0,Hotel The Serras,9.6,Passeig de Colom 9 Ciutat Vella 08002 Barcelon...
1,H10 Casa Mimosa 4 Sup,9.6,Pau Claris 179 Eixample 08037 Barcelona ES
2,Hotel Casa Camper,9.6,Elisabets 11 Ciutat Vella 08001 Barcelona ES
3,Mercer Hotel Barcelona,9.5,Dels Lledo 7 Ciutat Vella 08003 Barcelona ES
4,Catalonia Square 4 Sup,9.4,Ronda Sant Pere 9 Eixample 08010 Barcelona ES
5,The One Barcelona GL,9.4,277 Carrer de Proven a Eixample 08037 Barcelon...
6,The Wittmore Adults Only,9.4,Riudarenes 7 Ciutat Vella 08002 Barcelona ES
7,Hotel Margot House,9.4,Paseo de Gracia 46 Eixample 08007 Barcelona ES
8,Catalonia Magdalenes,9.4,Magdalenes 13 15 Ciutat Vella 08002 Barcelona ES
9,Hotel Palace GL,9.4,Gran Via de les Corts Catalanes 668 Eixample 0...


In [20]:
# Enter your input
recommender('IT', 'Business trip')

Unnamed: 0,Hotel_Name,Average_Score,Hotel_Address
0,Excelsior Hotel Gallia Luxury Collection Hotel,9.4,Piazza Duca D Aosta 9 Central Station 20124 Mi...
1,Palazzo Parigi Hotel Grand Spa Milano,9.3,Corso Di Porta Nuova 1 Milan City Center 20121...
2,UNA Maison Milano,9.3,Via Mazzini 4 Milan City Center 20123 Milan IT
3,Hotel Spadari Al Duomo,9.3,Via Spadari 11 Milan City Center 20123 Milan IT
4,Room Mate Giulia,9.3,Silvio Pellico 4 Milan City Center 20121 Milan IT
5,Armani Hotel Milano,9.2,Via Manzoni 31 Milan City Center 20121 Milan IT
6,Hotel Manzoni,9.2,Via Santo Spirito 20 Milan City Center 20121 M...
7,ME Milan Il Duca,9.2,Piazza della Repubblica 13 Central Station 201...
8,Hotel Santa Marta Suites,9.2,Via Santa Marta 4 Milan City Center 20123 Mila...
9,The Yard Milano,9.2,Piazza XXIV Maggio 8 Milan City Center 20123 M...


In [21]:
# Enter your input
recommender('AT', 'Business trip')

Unnamed: 0,Hotel_Name,Average_Score,Hotel_Address
0,Palais Coburg Residenz,9.5,Coburgbastei 4 01 Innere Stadt 1010 Vienna AT
1,Hotel Sacher Wien,9.5,Philharmoniker Stra e 4 01 Innere Stadt 1010 V...
2,Best Western Premier Kaiserhof Wien,9.4,Frankenberggasse 10 04 Wieden 1040 Vienna AT
3,Hotel Sans Souci Wien,9.4,Burggasse 2 07 Neubau 1070 Vienna AT
4,The Guesthouse Vienna,9.4,F hrichgasse 10 01 Innere Stadt 1010 Vienna AT
5,Boutiquehotel Das Tyrol,9.4,Mariahilfer Stra e 15 06 Mariahilf 1060 Vienna AT
6,Hotel K nig von Ungarn,9.3,Schulerstra e 10 01 Innere Stadt 1010 Vienna AT
7,Hollmann Beletage Design Boutique,9.3,K llnerhofgasse 6 01 Innere Stadt 1010 Vienna AT
8,Hotel Rathaus Wein Design,9.3,Lange Gasse 13 08 Josefstadt 1080 Vienna AT
9,Hotel Am Stephansplatz,9.3,Stephansplatz 9 01 Innere Stadt 1010 Vienna AT


In [22]:
# Enter your input
recommender('NL', 'Business trip')

Unnamed: 0,Hotel_Name,Average_Score,Hotel_Address
0,Waldorf Astoria Amsterdam,9.5,Herengracht 542 556 Amsterdam City Center 1017...
1,Pillows Anna van den Vondel Amsterdam,9.4,Anna van den Vondelstraat 6 Oud West 1054 GZ A...
2,The Toren,9.4,Keizersgracht 164 Amsterdam City Center 1015 C...
3,Andaz Amsterdam Prinsengracht A Hyatt Hotel,9.3,Prinsengracht 587 Amsterdam City Center 1067 H...
4,Canal House,9.3,Keizersgracht 148 Amsterdam City Center 1015 C...
5,Luxury Suites Amsterdam,9.3,Oudeschans 75 Amsterdam City Center 1011 KW Am...
6,The Hoxton Amsterdam,9.3,Herengracht 255 Amsterdam City Center 1016 BJ ...
7,Ambassade Hotel,9.3,Herengracht 341 Amsterdam City Center 1016 AZ ...
8,Banks Mansion All Inclusive Hotel,9.2,Herengracht 519 525 Amsterdam City Center 1017...
9,Conservatorium Hotel,9.2,Van Baerlestraat 27 Oud Zuid 1071 AN Amsterdam NL
